path: root/arch
Age | Commit message | Author | Files | Lines
2025-10-18 | KVM: SVM: Limit AVIC physical max index based on configured max_vcpu_ids | Naveen N Rao | 1 | -2/+5
KVM allows VMMs to specify the maximum possible APIC ID for a virtual machine through KVM_CAP_MAX_VCPU_ID capability so as to limit data structures related to APIC/x2APIC. Utilize the same to set the AVIC physical max index in the VMCB, similar to VMX. This helps hardware limit the number of entries to be scanned in the physical APIC ID table speeding up IPI broadcasts for virtual machines with smaller number of vCPUs. Unlike VMX, SVM AVIC requires a single page to be allocated for the Physical APIC ID table and the Logical APIC ID table, so retain the existing approach of allocating those during VM init. Signed-off-by: Naveen N Rao (AMD) <naveen@kernel.org> Link: https://lore.kernel.org/r/adb07ccdb3394cd79cb372ba6bcc69a4e4d4ef54.1757009416.git.naveen@kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-18 | KVM: nVMX: Add an off-by-default module param to WARN on missed consistency checks | Sean Christopherson | 1 | -0/+43
Add an off-by-default param, "warn_on_missed_cc", to have KVM WARN on a missed VMX Consistency Check on nested VM-Enter, specifically so that KVM developers and maintainers can more easily detect missing checks. KVM's goal/intent is that KVM detect *all* VM-Fail conditions in software, as relying on hardware leads to false passes when KVM's nested support is a subset of hardware support, e.g. see commit 095686e6fcb4 ("KVM: nVMX: Check vmcs12->guest_ia32_debugctl on nested VM-Enter"). With one notable exception, KVM now detects all VM-Fail scenarios for which there is known test coverage, i.e. KVM developers can enable the param and expect a clean run, and thus can use the param to detect missed checks, e.g. when enabling new features, when writing new tests, etc. The one exception is an unfortunate consistency check on vTPR. Because the vTPR for L2 comes from the virtual APIC page provided by L1, L2's vTPR is fully writable at all times, i.e. is inherently subject to TOCTOU issues with respect to checks in software versus consumption in hardware. Further complicating matters is KVM's deferred handling of vmcs12 pages when loading nested state; KVM flat out cannot check vTPR during KVM_SET_NESTED_STATE without breaking setups that do on-demand paging, e.g. for live migration and/or live update. To fudge around the vTPR issue, add a "late" controls check for vTPR and also treat an invalid virtual APIC as VM-Fail, but gate the check on warn_on_missed_cc being enabled to avoid unwanted false positives, i.e. to avoid breaking KVM in production. Cc: Jim Mattson <jmattson@google.com> Link: https://lore.kernel.org/r/20250919005955.1366256-10-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-18 | KVM: nVMX: Remove support for "early" consistency checks via hardware | Sean Christopherson | 1 | -118/+12
Remove nested_early_check and all associated code, as it's quite obviously not being used or tested (it's been broken for 4+ years without a single bug report). More importantly, KVM's software-based consistency checks have matured since the option to do hardware-based checks was added; KVM appears to be missing only _one_ consistency check, on vTPR. And even *more* importantly, that consistency check can't be prevented by an early hardware check due to L1 being able to modify the virtual APIC at any time, i.e. there's an inherent TOCTOU flaw that could cause KVM to "miss" a consistency check VM-Fail, regardless of whether the check is performed by software or by hardware. In other words, KVM _must_ be able to unwind from a late VM-Fail (which was a big motivation for doing early checks). I.e. now that KVM provides (almost) all necessary consistency checks, what's really needed is a way to detect missing checks in KVM, not a way to avoid having to unwind from a late VM-Fail. And that can be done much more simply, e.g. by a simple module param to guard a WARN (which, sadly, must be off-by-default to avoid splats due to the aforementioned TOCTOU issue). For all intents and purposes, this reverts commit 52017608da33 ("KVM: nVMX: add option to perform early consistency checks via H/W"). Link: https://lore.kernel.org/r/20250919005955.1366256-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-18 | KVM: nVMX: Stuff vmcs02.TSC_MULTIPLIER early on for nested early checks | Sean Christopherson | 1 | -0/+7
If KVM is doing "early" nested VM-Enter consistency checks and TSC scaling is supported, stuff vmcs02's TSC Multiplier early on to avoid getting a false positive VM-Fail due to trying to do VM-Enter with TSC_MULTIPLIER=0. To minimize complexity around L1 vs. L2 TSC, KVM sets the actual TSC Multiplier rather late during VM-Entry, i.e. may have '0' at the time of early consistency checks. If vmcs12 has TSC Scaling enabled, use the multiplier from vmcs12 so that nested early checks actually check vmcs12 state, otherwise throw in an arbitrary value of '1' (anything non-zero is legal). Fixes: d041b5ea9335 ("KVM: nVMX: Enable nested TSC scaling") Link: https://lore.kernel.org/r/20250919005955.1366256-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-18 | KVM: nVMX: Add consistency check for TSC_MULTIPLIER=0 | Sean Christopherson | 1 | -0/+4
Add a missing consistency check on the TSC Multiplier being '0'. Per the SDM: If the "use TSC scaling" VM-execution control is 1, the TSC-multiplier must not be zero. Fixes: d041b5ea9335 ("KVM: nVMX: Enable nested TSC scaling") Link: https://lore.kernel.org/r/20250919005955.1366256-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
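The SDM rule quoted in the TSC_MULTIPLIER=0 commit above can be written as a small predicate. This is only an illustrative sketch, not the code from the patch: `struct vmcs12_sketch` and `tsc_multiplier_is_consistent()` are hypothetical names, and the structure holds just the two fields the rule mentions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the relevant vmcs12 fields. */
struct vmcs12_sketch {
	bool use_tsc_scaling;     /* "use TSC scaling" VM-execution control */
	uint64_t tsc_multiplier;  /* TSC-multiplier field */
};

/* Returns true if the fields satisfy the rule quoted from the SDM. */
static bool tsc_multiplier_is_consistent(const struct vmcs12_sketch *v)
{
	/* The multiplier is only constrained when TSC scaling is enabled. */
	if (!v->use_tsc_scaling)
		return true;

	/* With TSC scaling enabled, a zero multiplier is a VM-Fail condition. */
	return v->tsc_multiplier != 0;
}
```

With scaling enabled, a multiplier of 0 fails the check and any non-zero value passes; with scaling disabled the field is not examined at all.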
2025-10-18 | KVM: nVMX: Add consistency check for TPR_THRESHOLD[31:4]!=0 without VID | Sean Christopherson | 1 | -0/+3
Add a missing consistency check on the TPR Threshold. Per the SDM: If the "use TPR shadow" VM-execution control is 1 and the "virtual-interrupt delivery" VM-execution control is 0, bits 31:4 of the TPR threshold VM-execution control field must be 0. Note, nested_vmx_check_tpr_shadow_controls() bails early if "use TPR shadow" is 0. Link: https://lore.kernel.org/r/20250919005955.1366256-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
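As a rough illustration of the TPR threshold rule quoted above, the check reduces to a mask test on bits 31:4. The names below (`struct tpr_controls_sketch`, `tpr_threshold_is_consistent()`) are hypothetical and the sketch covers only this one rule, not the other checks performed by nested_vmx_check_tpr_shadow_controls().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of the controls named in the SDM text. */
struct tpr_controls_sketch {
	bool use_tpr_shadow;         /* "use TPR shadow" control */
	bool virtual_intr_delivery;  /* "virtual-interrupt delivery" control */
	uint32_t tpr_threshold;      /* TPR threshold field */
};

static bool tpr_threshold_is_consistent(const struct tpr_controls_sketch *c)
{
	/* Mirrors the early bail: nothing to check without a TPR shadow. */
	if (!c->use_tpr_shadow)
		return true;

	/* Without virtual-interrupt delivery, bits 31:4 must all be zero. */
	if (!c->virtual_intr_delivery && (c->tpr_threshold & 0xfffffff0u))
		return false;

	return true;
}
```

A threshold of 0x0f is accepted in every configuration, while 0x10 is rejected only when the TPR shadow is in use and virtual-interrupt delivery is off.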
2025-10-18 | KVM: VMX: Use kvm_mmu_page role to construct EPTP, not current vCPU state | Sean Christopherson | 1 | -11/+30
Use the role for the to-be-loaded/invalidated EPT root to compute the root's level and A/D enablement instead of pulling the information from the vCPU (e.g. by passing in the root level and querying vmcs12). Not making unnecessary assumptions about the root will allow invalidating arbitrary EPT roots (which sadly requires a full EPTP) at any given time. No functional change intended (the end result should be the same). Link: https://lore.kernel.org/r/20250919005955.1366256-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-18 | KVM: x86/mmu: Move "dummy root" helpers to spte.h | Sean Christopherson | 2 | -10/+10
Move the helpers to get/query a dummy root from mmu_internal.h to spte.h so that VMX can detect and handle dummy roots when constructing EPTPs. This will allow using the root's role to build the EPTP instead of pulling equivalent information out of the vCPU structure. No functional change intended. Link: https://lore.kernel.org/r/20250919005955.1366256-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-18 | KVM: nVMX: Hardcode dummy EPTP used for early nested consistency checks | Sean Christopherson | 3 | -7/+4
Hardcode the dummy EPTP used for "early" consistency checks as there's no need to use 5-level EPT based on the guest.MAXPHYADDR (the EPTP just needs to be valid, it's never truly consumed). This will allow breaking construct_eptp()'s dependency on having access to the vCPU, which in turn will (much further in the future) allow for eliding per-root TLB flushes when a vCPU is migrated between pCPUs (a flush is needed if and only if that particular pCPU hasn't already flushed the vCPU's roots). Link: https://lore.kernel.org/r/20250919005955.1366256-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-18 | KVM: VMX: Hoist construct_eptp() "up" in vmx.c | Sean Christopherson | 1 | -14/+14
Move construct_eptp() further up in vmx.c so that it's above vmx_flush_tlb_current(), its "first" user in vmx.c. This will allow a future patch to opportunistically make construct_eptp() local to vmx.c. No functional change intended. Link: https://lore.kernel.org/r/20250919005955.1366256-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-18 | arm64: dts: qcom: ipq5424: correct the TF-A reserved memory to 512K | Kathiravan Thirumoorthy | 1 | -1/+1
Correct the reserved memory size for TF-A to 512K, as it was mistakenly marked as 500K. Update the reserved memory node accordingly. Fixes: 8517204c982b ("arm64: dts: qcom: ipq5424: Add reserved memory for TF-A") Signed-off-by: Kathiravan Thirumoorthy <kathiravan.thirumoorthy@oss.qualcomm.com> Link: https://lore.kernel.org/r/20251014-tfa-reserved-mem-v1-1-48c82033c8a7@oss.qualcomm.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-17 | riscv: dts: thead: add zfh for th1520 | Han Gao | 1 | -4/+4
The th1520 supports the Zfh ISA extension. It supports the same RISC-V extensions as SG2042. See commit cb074bed1186 ("riscv: dts: sophgo: add zfh for sg2042"). Signed-off-by: Han Gao <rabenda.cn@gmail.com> Reviewed-by: Drew Fustini <fustini@kernel.org> Signed-off-by: Drew Fustini <fustini@kernel.org>
2025-10-17 | riscv: dts: thead: add ziccrse for th1520 | Han Gao | 1 | -8/+16
Existing rv64 hardware conforms to the rva20 profile. Ziccrse is an additional extension required by the rva20 profile, so th1520 has this extension. Signed-off-by: Han Gao <rabenda.cn@gmail.com> Reviewed-by: Drew Fustini <fustini@kernel.org> Signed-off-by: Drew Fustini <fustini@kernel.org>
2025-10-17 | riscv: dts: thead: add xtheadvector to the th1520 devicetree | Han Gao | 1 | -4/+8
The th1520 supports xtheadvector [1], so it can be included in the devicetree. Also include vlenb for the cpu, and set vlenb=16 [2]. This can be tested by passing the "mitigations=off" kernel parameter. Link: https://lore.kernel.org/linux-riscv/20241113-xtheadvector-v11-4-236c22791ef9@rivosinc.com/ [1] Link: https://lore.kernel.org/linux-riscv/aCO44SAoS2kIP61r@ghost/ [2] Signed-off-by: Han Gao <rabenda.cn@gmail.com> Reviewed-by: Drew Fustini <fustini@kernel.org> Signed-off-by: Drew Fustini <fustini@kernel.org>
2025-10-17 | arm64: debug: always unmask interrupts in el0_softstp() | Ada Couprie Diaz | 1 | -3/+5
We intend that EL0 exception handlers unmask all DAIF exceptions before calling exit_to_user_mode(). When completing single-step of a suspended breakpoint, we do not call local_daif_restore(DAIF_PROCCTX) before calling exit_to_user_mode(), leaving all DAIF exceptions masked. When pseudo-NMIs are not in use this is benign. When pseudo-NMIs are in use, this is unsound. At this point interrupts are masked by both DAIF.IF and PMR_EL1, and subsequent irq flag manipulation may not work correctly. For example, a subsequent local_irq_enable() within exit_to_user_mode_loop() will only unmask interrupts via PMR_EL1 (leaving those masked via DAIF.IF), and anything depending on interrupts being unmasked (e.g. delivery of signals) will not work correctly. This was detected by CONFIG_ARM64_DEBUG_PRIORITY_MASKING. Move the call to `try_step_suspended_breakpoints()` outside of the check so that interrupts can be unmasked even if we don't call the step handler. Fixes: 0ac7584c08ce ("arm64: debug: split single stepping exception entry") Cc: <stable@vger.kernel.org> # 6.17 Signed-off-by: Ada Couprie Diaz <ada.coupriediaz@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> [catalin.marinas@arm.com: added Mark's rewritten commit log and some whitespace] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-10-17 | arm64/sysreg: Fix GIC CDEOI instruction encoding | Lorenzo Pieralisi | 1 | -1/+10
The GIC CDEOI system instruction requires the Rt field to be set to 0b11111, otherwise the instruction behaviour becomes CONSTRAINED UNPREDICTABLE. Currently, its usage is encoded as a system register write, with a constant 0 value: write_sysreg_s(0, GICV5_OP_GIC_CDEOI) While compiling with GCC, the 0 constant value, through these asm constraints and modifiers ('x' modifier and 'Z' constraint combo): asm volatile(__msr_s(r, "%x0") : : "rZ" (__val)); forces the compiler to issue the XZR register for the MSR operation (i.e. that corresponds to Rt == 0b11111), issuing the right instruction encoding. Unfortunately LLVM does not yet understand that modifier/constraint combo, so it ends up issuing a different register from XZR for the MSR source, which in turn means that it encodes the GIC CDEOI instruction wrongly and the instruction behaviour becomes CONSTRAINED UNPREDICTABLE, which we must prevent. Add a conditional to the write_sysreg_s() macro that detects whether it is passed a constant 0 value and issues an MSR write with XZR as source register, explicitly doing what the asm modifier/constraint combo is meant to achieve and fixing the LLVM compilation issue. Fixes: 7ec80fb3f025 ("irqchip/gic-v5: Add GICv5 PPI support") Suggested-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Acked-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Cc: Sascha Bischoff <sascha.bischoff@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Marc Zyngier <maz@kernel.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
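The "detect a constant 0 argument" idea from the GIC CDEOI commit above can be sketched in portable userspace C. This is not the kernel macro: `would_use_xzr()` and `source_operand_for_zero_literal()` are hypothetical names, no MSR is issued, and the sketch only shows how a compile-time-constant zero can be told apart from a runtime value. It assumes the GCC/Clang builtin `__builtin_constant_p()`.

```c
/*
 * Hypothetical helper: true only when the argument is a compile-time
 * constant equal to 0, i.e. the case where a real macro could name XZR
 * explicitly instead of relying on asm constraints and modifiers.
 * __builtin_constant_p() is a GCC/Clang builtin.
 */
#define would_use_xzr(v) (__builtin_constant_p(v) && (v) == 0)

/* Describes the operand choice for a literal 0 argument. */
static const char *source_operand_for_zero_literal(void)
{
	return would_use_xzr(0) ? "xzr" : "general-purpose register";
}
```

A literal 0 takes the explicit XZR path, while a non-zero constant or a value only known at run time would fall back to an ordinary register operand.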
2025-10-17 | RISC-V: KVM: Read HGEIP CSR on the correct cpu | Fangyu Yu | 1 | -2/+14
When kvm_riscv_vcpu_aia_has_interrupts executes, the vCPU may have migrated while the IMSIC VS-file has not yet been updated. In that case the HGEIP CSR should be read from imsic->vsfile_cpu (the pCPU before migration) via on_each_cpu_mask, but this triggers an IPI call, and repeated IPIs within a short period of time are expensive on many-core systems. Simply letting the vCPU execute and update the correct IMSIC VS-file via kvm_riscv_vcpu_aia_imsic_update may be a simple solution. Fixes: 4cec89db80ba ("RISC-V: KVM: Move HGEI[E|P] CSR access to IMSIC virtualization") Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com> Reviewed-by: Guo Ren <guoren@kernel.org> Reviewed-by: Anup Patel <anup@brainfault.org> Tested-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20251016012659.82998-1-fangyu.yu@linux.alibaba.com Signed-off-by: Anup Patel <anup@brainfault.org>
2025-10-17 | ARM: dts: aspeed: santabarbara: Add eeprom device node for PRoT module | Fred Chen | 1 | -0/+5
Add eeprom device node for PRoT module FRU. Signed-off-by: Fred Chen <fredchen.openbmc@gmail.com> Signed-off-by: Andrew Jeffery <andrew@codeconstruct.com.au>
2025-10-17 | ARM: dts: aspeed: santabarbara: Add AMD APML interface support | Fred Chen | 1 | -1/+15
Enable AMD APML related features - add amd sbrmi node for SoC power reading - add amd sbtsi node for SoC temperature reading - rename the P0_I3C_APML_ALERT_L GPIO to align with the naming convention expected by the AMD APML tool Signed-off-by: Fred Chen <fredchen.openbmc@gmail.com> Signed-off-by: Andrew Jeffery <andrew@codeconstruct.com.au>
2025-10-17 | ARM: dts: aspeed: santabarbara: Add gpio line name | Fred Chen | 1 | -5/+13
Add GPIO line name for userspace control or monitoring - Add leak-related line names to report chassis leak event - Add debug-card-mux to control debug card access - Add FM_MAIN_PWREN_RMC_EN_ISO_R to receive RMC power control signal Signed-off-by: Fred Chen <fredchen.openbmc@gmail.com> Signed-off-by: Andrew Jeffery <andrew@codeconstruct.com.au>
2025-10-17 | ARM: dts: aspeed: santabarbara: Add bmc_ready_noled Led | Fred Chen | 1 | -0/+5
Add a 'bmc_ready_noled' LED on GPIOB3 with GPIO_TRANSITORY to ensure its state resets on BMC reboot. Signed-off-by: Fred Chen <fredchen.openbmc@gmail.com> Signed-off-by: Andrew Jeffery <andrew@codeconstruct.com.au>
2025-10-17 | ARM: dts: aspeed: santabarbara: Enable MCTP for frontend NIC | Fred Chen | 1 | -0/+7
Add the mctp-controller property and MCTP node to enable frontend NIC management via PLDM over MCTP. Signed-off-by: Fred Chen <fredchen.openbmc@gmail.com> Signed-off-by: Andrew Jeffery <andrew@codeconstruct.com.au>
2025-10-17 | ARM: dts: aspeed: santabarbara: Add sensor support for extension boards | Fred Chen | 1 | -0/+848
Add power monitor and temperature sensors for extension boards on buses 6, 8, 10 and 13. Signed-off-by: Fred Chen <fredchen.openbmc@gmail.com> Signed-off-by: Andrew Jeffery <andrew@codeconstruct.com.au>
2025-10-17 | ARM: dts: aspeed: santabarbara: Add blank lines between nodes for readability | Fred Chen | 1 | -0/+20
Add missing blank lines between DT nodes to follow the devicetree coding style and improve readability. No functional changes. Signed-off-by: Fred Chen <fredchen.openbmc@gmail.com> Signed-off-by: Andrew Jeffery <andrew@codeconstruct.com.au>
2025-10-17 | ARM: dts: aspeed: fuji-data64: Enable mac3 controller | Tao Ren | 1 | -0/+14
"mac3" controller was removed from the initial version of fuji-data64 dts because the rgmii setting is incorrect, but dropping mac3 leads to regression in the existing fuji platform, because fuji.dts simply includes fuji-data64.dts. This patch adds mac3 back to fuji-data64.dts to fix the fuji regression[1], and rgmii settings need to be fixed later. Fixes: b0f294fdfc3e ("ARM: dts: aspeed: facebook-fuji: Include facebook-fuji-data64.dts") Link: https://lore.kernel.org/all/79ddc7b9-ef26-4959-9a16-aa4e006eb145@roeck-us.net/ [1] Signed-off-by: Tao Ren <rentao.bupt@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Andrew Jeffery <andrew@codeconstruct.com.au>
2025-10-17 | ARM: dts: aspeed: yosemite5: Add Meta Yosemite5 BMC | Kevin Tung | 2 | -0/+1068
Add device tree for the Meta (Facebook) Yosemite5 compute node, based on the AST2600 BMC. The Yosemite5 platform provides monitoring of voltages, power, temperatures, and other critical parameters across the motherboard, CXL board, E1.S expansion board, and NIC components. The BMC also logs relevant events and performs appropriate system actions in response to abnormal conditions. Signed-off-by: Kevin Tung <kevin.tung.openbmc@gmail.com> Signed-off-by: Andrew Jeffery <andrew@codeconstruct.com.au>
2025-10-17 | x86/sgx: Enable automatic SVN updates for SGX enclaves | Elena Reshetova | 1 | -2/+18
== Background == ENCLS[EUPDATESVN] is a new SGX instruction [1] which allows enclave attestation to include information about updated microcode SVN without a reboot. Before an EUPDATESVN operation can be successful, all SGX memory (aka. EPC) must be marked as “unused” in the SGX hardware metadata (aka. EPCM). This requirement ensures that no compromised enclave can survive the EUPDATESVN procedure and provides an opportunity to generate new cryptographic assets. == Solution == Attempt to execute ENCLS[EUPDATESVN] every time the first file descriptor is obtained via sgx_(vepc_)open(). In the most common case the microcode SVN is already up-to-date, and the operation succeeds without updating SVN. Note: while in such cases the underlying crypto assets are regenerated, it does not affect enclaves' visible keys obtained via the EGETKEY instruction. If it fails with any error code other than SGX_INSUFFICIENT_ENTROPY, this is considered unexpected and the *open() returns an error. This should not happen in practice. On the contrary, SGX_INSUFFICIENT_ENTROPY might happen due to pressure on the system's DRNG (RDSEED), and therefore the *open() can be safely retried to allow normal enclave operation. [1] Runtime Microcode Updates with Intel Software Guard Extensions, https://cdrdv2.intel.com/v1/dl/getContent/648682 Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Tested-by: Nataliia Bondarevska <bondarn@google.com>
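The "attempt the update when the first user opens the device" flow described above can be sketched as a usage counter. Everything here is hypothetical and simplified: the names end in `_sketch`, the hardware is replaced by a test hook (`fake_hw_status`), the errno values (-EAGAIN for the retryable entropy case, -EIO for anything unexpected) are assumptions chosen for the example, and the locking a real driver would need is omitted.

```c
#include <errno.h>

/* Hypothetical outcomes of the update attempt; not the real SGX error codes. */
enum svn_status_sketch {
	SVN_UPDATED_OR_CURRENT,    /* update done, or SVN already up to date */
	SVN_INSUFFICIENT_ENTROPY,  /* transient: the caller may retry the open */
	SVN_UNEXPECTED_ERROR       /* anything else */
};

static int usage_count_sketch;  /* number of active users */
static enum svn_status_sketch fake_hw_status = SVN_UPDATED_OR_CURRENT;

/* Stand-in for the instruction wrapper; maps status to an errno (assumed). */
static int update_svn_sketch(void)
{
	switch (fake_hw_status) {
	case SVN_UPDATED_OR_CURRENT:
		return 0;
	case SVN_INSUFFICIENT_ENTROPY:
		return -EAGAIN;
	default:
		return -EIO;
	}
}

/* Called on open: only the first user triggers the update attempt. */
static int inc_usage_count_sketch(void)
{
	if (usage_count_sketch == 0) {
		int ret = update_svn_sketch();

		if (ret)
			return ret;  /* the open fails; the count stays at zero */
	}
	usage_count_sketch++;
	return 0;
}

/* Called on release. */
static void dec_usage_count_sketch(void)
{
	if (usage_count_sketch > 0)
		usage_count_sketch--;
}
```

Because the count stays at zero when the attempt fails, a retried open goes through the update path again, which matches the retry behaviour the commit message describes for the entropy case.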
2025-10-17 | x86/sgx: Implement ENCLS[EUPDATESVN] | Elena Reshetova | 3 | -15/+96
All running enclaves and cryptographic assets (such as internal SGX encryption keys) are assumed to be compromised whenever an SGX-related microcode update occurs. To mitigate this assumed compromise the new supervisor SGX instruction ENCLS[EUPDATESVN] can generate fresh cryptographic assets. Before executing EUPDATESVN, all SGX memory must be marked as unused. This requirement ensures that no potentially compromised enclave survives the update and allows the system to safely regenerate cryptographic assets. Add the method to perform ENCLS[EUPDATESVN]. However, until the follow up patch that wires calling sgx_update_svn() from sgx_inc_usage_count(), this code is not reachable. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Tested-by: Nataliia Bondarevska <bondarn@google.com>
2025-10-17 | x86/sgx: Define error codes for use by ENCLS[EUPDATESVN] | Elena Reshetova | 1 | -0/+6
Add error codes for ENCLS[EUPDATESVN] so that the SGX CPUSVN update process can know the execution state of EUPDATESVN and notify userspace. EUPDATESVN will be called only when it is guaranteed that there are no active SGX users. Only add the error codes that can legally happen. E.g., it could also fail due to "SGX not ready" when there are SGX users, but that wouldn't happen in this implementation. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Tested-by: Nataliia Bondarevska <bondarn@google.com>
2025-10-17 | x86/cpufeatures: Add X86_FEATURE_SGX_EUPDATESVN feature flag | Elena Reshetova | 3 | -0/+3
Add a flag indicating whether the ENCLS[EUPDATESVN] SGX instruction is supported. This will be used by the SGX driver to perform CPU SVN updates. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Tested-by: Nataliia Bondarevska <bondarn@google.com>
2025-10-17 | x86/sgx: Introduce functions to count the sgx_(vepc_)open() | Elena Reshetova | 5 | -2/+51
Currently, when SGX is compromised and the microcode update fix is applied, the machine needs to be rebooted to invalidate old SGX crypto-assets and bring SGX into an updated safe state. It's not friendly for the cloud. To avoid having to reboot, a new ENCLS[EUPDATESVN] is introduced to update the SGX environment at runtime. This process needs to be done when there are no SGX users, to make sure no compromised enclaves can survive the update and to allow the system to regenerate crypto-assets. For now there's no counter to track the active SGX users of host enclave and virtual EPC. Introduce such a counter mechanism so that the EUPDATESVN can be done only when there are no SGX users. Define placeholder functions sgx_inc/dec_usage_count() that are used to increment and decrement such a counter. Also, wire the call sites for these functions. Encapsulate the current sgx_(vepc_)open() in __sgx_(vepc_)open() to make the new sgx_(vepc_)open() easy to read. The definition of the counter itself and the actual implementation of sgx_inc/dec_usage_count() functions come next. Note: The EUPDATESVN, which may fail, will be done in sgx_inc_usage_count(). Make it return 'int' to make subsequent patches which implement EUPDATESVN easier to review. For now it always returns success. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Tested-by: Nataliia Bondarevska <bondarn@google.com>
2025-10-16 | x86/idtentry: Add missing '*' to kernel-doc lines | Randy Dunlap | 1 | -2/+2
Fix kernel-doc warnings by adding the missing '*' to each line. Warning: include/asm/idtentry.h:395 bad line: when raised from kernel mode Warning: include/asm/idtentry.h:405 bad line: when raised from user mode Since this is in a kernel-doc block, these lines need a leading " *" on each line to prevent the warnings. Fixes: a13644f3a53d ("x86/entry/64: Add entry code for #VC handler") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2025-10-16 | RISC-V: KVM: Fix check for local interrupts on riscv32 | Samuel Holland | 1 | -1/+1
To set all 64 bits in the mask on a 32-bit system, the constant must have type `unsigned long long`. Fixes: 6b1e8ba4bac4 ("RISC-V: KVM: Use bitmap for irqs_pending and irqs_pending_mask") Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20251016001714.3889380-1-samuel.holland@sifive.com Signed-off-by: Anup Patel <anup@brainfault.org>
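The width issue behind the riscv32 fix above can be reproduced without a 32-bit toolchain by using a fixed-width type in place of `unsigned long`. This is only a sketch of the general pitfall, not the patched kernel expression; `ulong32_t` and both function names are hypothetical.

```c
#include <stdint.h>

/* Stand-in for `unsigned long` on a 32-bit target such as riscv32. */
typedef uint32_t ulong32_t;

/*
 * All-ones built from a 32-bit constant: widening to 64 bits
 * zero-extends, so only the low 32 bits end up set.
 */
static uint64_t mask_from_32bit_constant(void)
{
	return (uint64_t)(ulong32_t)-1;
}

/* All-ones built from an `unsigned long long` constant: all 64 bits set. */
static uint64_t mask_from_64bit_constant(void)
{
	return -1ULL;
}
```

The first helper yields 0x00000000ffffffff and the second 0xffffffffffffffff, which is why the constant needs the `unsigned long long` type when the mask must cover 64 bits.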
2025-10-16 | sched/topology,x86: Fix build warning | Peter Zijlstra | 1 | -0/+2
A compile warning slipped through: arch/x86/kernel/smpboot.c:548:5: warning: no previous prototype for function 'arch_sched_node_distance' [-Wmissing-prototypes] Fixes: 4d6dd05d07d0 ("sched/topology: Fix sched domain build error for GNR, CWF in SNC-3 mode") Reported-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2025-10-16 | arm64: tegra: Mark Jetson Xavier NX's PHY as a wakeup source | Russell King (Oracle) | 1 | -0/+1
Mark the RTL8211F PHY as a wakeup source for the Jetson Xavier NX. This allows the reworked RTL8211F driver to know that the PHY is wired to wakeup capable hardware, and thus to expose WoL capabilities. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Acked-by: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Thierry Reding <treding@nvidia.com>
2025-10-16 | sched/topology: Fix sched domain build error for GNR, CWF in SNC-3 mode | Tim Chen | 1 | -0/+70
It is possible for Granite Rapids (GNR) and Clearwater Forest (CWF) to have up to 3 dies per package. When sub-NUMA cluster (SNC-3) mode is enabled, each die becomes a separate NUMA node in the package, with different distances between dies within the same package.

For example, on GNR, we see the following NUMA distances for a 2-socket system with 3 dies per socket:

        package 1      package 2
            ----------------
            |              |
        ---------      ---------
        |   0   |      |   3   |
        ---------      ---------
            |              |
        ---------      ---------
        |   1   |      |   4   |
        ---------      ---------
            |              |
        ---------      ---------
        |   2   |      |   5   |
        ---------      ---------
            |              |
            ----------------

node distances:
node     0    1    2    3    4    5
   0:   10   15   17   21   28   26
   1:   15   10   15   23   26   23
   2:   17   15   10   26   23   21
   3:   21   28   26   10   15   17
   4:   23   26   23   15   10   15
   5:   26   23   21   17   15   10

The node distances above led to 2 problems:

1. Asymmetric routes taken between nodes in different packages led to an asymmetric scheduler domain perspective depending on which node you are on. Current scheduler code failed to build domains properly with asymmetric distances.

2. Multiple remote distances to the respective tiles on the remote package create too many levels of domain hierarchies grouping different nodes between remote packages.

For example, the above GNR topology leads to the NUMA domains below.

Sched domains from the perspective of a CPU in node 0, where the numbers in brackets represent node numbers:

    NUMA-level 1    [0,1] [2]
    NUMA-level 2    [0,1,2] [3]
    NUMA-level 3    [0,1,2,3] [5]
    NUMA-level 4    [0,1,2,3,5] [4]

Sched domains from the perspective of a CPU in node 4:

    NUMA-level 1    [4] [3,5]
    NUMA-level 2    [3,4,5] [0,2]
    NUMA-level 3    [0,2,3,4,5] [1]

Scheduler group peers for load balancing from the perspective of CPU 0 and 4 are different. An improper task could be chosen for load balancing between groups such as [0,2,3,4,5] [1]. Ideally you should first choose nodes 0 or 2, which are in the same package as node 1. But instead tasks in the remote package nodes 3, 4, 5 could be chosen with an equal chance, which could lead to excessive remote package migrations and an imbalance of load between packages. We should not group partial remote nodes and local nodes together.

Simplify the remote distances for CWF and GNR for the purpose of sched domains building, which maintains symmetry and leads to a more reasonable load balance hierarchy.

The sched domains from the perspective of a CPU in node 0 are now:

    NUMA-level 1    [0,1] [2]
    NUMA-level 2    [0,1,2] [3,4,5]

The sched domains from the perspective of a CPU in node 4 are now:

    NUMA-level 1    [4] [3,5]
    NUMA-level 2    [3,4,5] [0,1,2]

We have the same balancing perspective from node 0 or node 4. Loads are now balanced equally between packages.

Co-developed-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Chen Yu <yu.c.chen@intel.com>
Tested-by: Zhao Liu <zhao1.liu@intel.com>
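The effect of simplifying the remote distances can be checked on the GNR table quoted in the commit message above. The sketch below is not the kernel implementation: the message does not say how the distances are simplified, so replacing every cross-package distance with the integer mean of all cross-package distances is an assumption made for illustration, and all function names are hypothetical.

```c
#include <stdbool.h>

#define NR_NODES       6
#define NODES_PER_PKG  3

/* Node distances quoted in the commit message (row = from, column = to). */
static int gnr_distance[NR_NODES][NR_NODES] = {
	{ 10, 15, 17, 21, 28, 26 },
	{ 15, 10, 15, 23, 26, 23 },
	{ 17, 15, 10, 26, 23, 21 },
	{ 21, 28, 26, 10, 15, 17 },
	{ 23, 26, 23, 15, 10, 15 },
	{ 26, 23, 21, 17, 15, 10 },
};

static int pkg_of(int node)
{
	return node / NODES_PER_PKG;
}

/* ASSUMPTION: collapse every cross-package distance to the integer mean. */
static void simplify_remote(int in[NR_NODES][NR_NODES],
			    int out[NR_NODES][NR_NODES])
{
	int sum = 0, count = 0;

	for (int i = 0; i < NR_NODES; i++)
		for (int j = 0; j < NR_NODES; j++)
			if (pkg_of(i) != pkg_of(j)) {
				sum += in[i][j];
				count++;
			}

	int avg = count ? sum / count : 0;

	for (int i = 0; i < NR_NODES; i++)
		for (int j = 0; j < NR_NODES; j++)
			out[i][j] = pkg_of(i) != pkg_of(j) ? avg : in[i][j];
}

/* True if distance(i, j) == distance(j, i) for every pair. */
static bool is_symmetric(int d[NR_NODES][NR_NODES])
{
	for (int i = 0; i < NR_NODES; i++)
		for (int j = 0; j < i; j++)
			if (d[i][j] != d[j][i])
				return false;
	return true;
}

/* Number of distinct distance values seen from one node (including itself). */
static int distinct_from(int d[NR_NODES][NR_NODES], int node)
{
	int n = 0;

	for (int j = 0; j < NR_NODES; j++) {
		bool seen = false;

		for (int k = 0; k < j; k++)
			if (d[node][k] == d[node][j])
				seen = true;
		if (!seen)
			n++;
	}
	return n;
}
```

On the quoted table, the original matrix is asymmetric (node 0 to node 4 is 28, node 4 to node 0 is 23) and node 0 sees six distinct distances. After the assumed averaging, the matrix is symmetric and node 0 sees four, since the whole remote package sits at a single distance.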
2025-10-16 | x86/insn: Simplify for_each_insn_prefix() | Peter Zijlstra | 5 | -18/+11
Use the new-found freedom of allowing variable declarations inside for() to simplify the for_each_insn_prefix() iterator to no longer need an external temporary. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
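The general pattern, an iterator macro that declares its own index inside for() so callers no longer pass a temporary, can be sketched as follows. This is not the kernel's for_each_insn_prefix(): `struct prefixes_sketch`, `for_each_prefix_sketch()` and `count_prefix()` are hypothetical names used only to show the technique.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container: up to four prefix bytes plus a count. */
struct prefixes_sketch {
	uint8_t bytes[4];
	uint8_t nbytes;
};

/*
 * The index lives inside the for() statement, so callers only supply
 * the variable that receives each byte. The comma expression assigns
 * the current byte and always evaluates to 1, so the loop stops only
 * when the index reaches nbytes.
 */
#define for_each_prefix_sketch(p, byte_var)				\
	for (size_t idx_ = 0;						\
	     idx_ < (p)->nbytes && ((byte_var) = (p)->bytes[idx_], 1);	\
	     idx_++)

/* Counts how many stored prefix bytes equal `target`. */
static int count_prefix(const struct prefixes_sketch *p, uint8_t target)
{
	uint8_t b;
	int n = 0;

	for_each_prefix_sketch(p, b)
		if (b == target)
			n++;
	return n;
}
```

Keeping the index private to the macro removes one argument at every call site and avoids leaking a loop counter into the caller's scope.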
2025-10-16 | x86/insn,uprobes,alternative: Unify insn_is_nop() | Peter Zijlstra | 4 | -48/+151
Both uprobes and alternatives have insn_is_nop() variants, unify them and make sure insn_is_nop() works for both x86_64 and i386. Specifically, uprobe must not compare userspace instructions to kernel nops as that does not work right in the compat case. For the uprobe case we therefore must recognise common 32bit and 64bit nops. Because uprobe will consume the instruction as a nop, it must not mistakenly claim a non-nop instruction to be a nop. E.g. 'REX.b3 NOP' is 'xchg %r8,%rax' - not a nop. For the kernel case similar constraints apply; it is used to optimize NOPs by replacing strings of short(er) nops with longer nops. It must not claim an instruction is a nop if it really isn't. Not recognising a nop is non-fatal. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
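The 'REX.b3 NOP' example above can be made concrete with a deliberately narrow classifier. This is not the unified insn_is_nop(): it handles 64-bit mode only, recognises just the one-byte NOP (0x90) with at most one REX prefix, and errs on the side of answering "not a nop", which the commit message describes as non-fatal. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * 64-bit mode only. Accepts 0x90, optionally preceded by a single REX
 * prefix that does not set REX.B. Multi-byte NOPs are intentionally not
 * recognised by this sketch.
 */
static bool is_one_byte_nop_64(const uint8_t *insn, size_t len)
{
	size_t i = 0;
	bool rex_b = false;

	/* In 64-bit mode 0x40..0x4f are REX prefixes; bit 0 is REX.B. */
	if (i < len && (insn[i] & 0xf0) == 0x40) {
		rex_b = (insn[i] & 0x01) != 0;
		i++;
	}

	/* Exactly one opcode byte must remain, and it must be 0x90. */
	if (i + 1 != len || insn[i] != 0x90)
		return false;

	/* With REX.B set, 0x90 exchanges r8 with the accumulator. */
	return !rex_b;
}
```

The byte sequence 0x41 0x90 is rejected because REX.B turns the encoding into an exchange with r8, while a bare 0x90 is accepted.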
2025-10-16 | perf/x86/amd: Check event before enable to avoid GPF | George Kennedy | 1 | -1/+6
On AMD machines cpuc->events[idx] can become NULL in a subtle race condition with NMI->throttle->x86_pmu_stop(). Check event for NULL in amd_pmu_enable_all() before enable to avoid a GPF. This appears to be an AMD only issue.
Syzkaller reported a GPF in amd_pmu_enable_all.
INFO: NMI handler (perf_event_nmi_handler) took too long to run: 13.143 msecs
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000034: 0000 PREEMPT SMP KASAN NOPTI
KASAN: null-ptr-deref in range [0x00000000000001a0-0x00000000000001a7]
CPU: 0 UID: 0 PID: 328415 Comm: repro_36674776 Not tainted 6.12.0-rc1-syzk
RIP: 0010:x86_pmu_enable_event (arch/x86/events/perf_event.h:1195 arch/x86/events/core.c:1430)
RSP: 0018:ffff888118009d60 EFLAGS: 00010012
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000034 RSI: 0000000000000000 RDI: 00000000000001a0
RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002
R13: ffff88811802a440 R14: ffff88811802a240 R15: ffff8881132d8601
FS: 00007f097dfaa700(0000) GS:ffff888118000000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000200001c0 CR3: 0000000103d56000 CR4: 00000000000006f0
Call Trace:
<IRQ>
amd_pmu_enable_all (arch/x86/events/amd/core.c:760 (discriminator 2))
x86_pmu_enable (arch/x86/events/core.c:1360)
event_sched_out (kernel/events/core.c:1191 kernel/events/core.c:1186 kernel/events/core.c:2346)
__perf_remove_from_context (kernel/events/core.c:2435)
event_function (kernel/events/core.c:259)
remote_function (kernel/events/core.c:92 (discriminator 1) kernel/events/core.c:72 (discriminator 1))
__flush_smp_call_function_queue (./arch/x86/include/asm/jump_label.h:27 ./include/linux/jump_label.h:207 ./include/trace/events/csd.h:64 kernel/smp.c:135 kernel/smp.c:540)
__sysvec_call_function_single (./arch/x86/include/asm/jump_label.h:27 ./include/linux/jump_label.h:207 ./arch/x86/include/asm/trace/irq_vectors.h:99 arch/x86/kernel/smp.c:272)
sysvec_call_function_single (arch/x86/kernel/smp.c:266 (discriminator 47) arch/x86/kernel/smp.c:266 (discriminator 47))
</IRQ>
Reported-by: syzkaller <syzkaller@googlegroups.com>
Signed-off-by: George Kennedy <george.kennedy@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
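The guard described in this commit can be sketched roughly as follows. This is a simplified model, not the kernel code: `struct event`, `struct cpu_counters`, `NUM_COUNTERS`, and `enable_all()` are hypothetical stand-ins for the per-CPU PMU state and amd_pmu_enable_all().

```c
#include <stddef.h>

#define NUM_COUNTERS 4 /* hypothetical counter count for this sketch */

/* Hypothetical stand-in for a perf event. */
struct event {
    int enabled;
};

/* Hypothetical stand-in for the per-CPU counter state. */
struct cpu_counters {
    struct event *events[NUM_COUNTERS]; /* a slot may be NULL if stopped concurrently */
    unsigned long active_mask;          /* bit idx set means counter idx is in use */
};

/*
 * Enable every active counter, skipping any slot whose event pointer
 * is NULL. Returns the number of events that were enabled.
 */
static int enable_all(struct cpu_counters *c)
{
    int enabled = 0;

    for (int idx = 0; idx < NUM_COUNTERS; idx++) {
        struct event *ev;

        if (!(c->active_mask & (1UL << idx)))
            continue;

        ev = c->events[idx];
        if (!ev) /* the NULL check: do not dereference a cleared slot */
            continue;

        ev->enabled = 1;
        enabled++;
    }
    return enabled;
}
```

Without the NULL check, an active bit whose slot was cleared by a racing stop would lead to a dereference of a near-NULL address, which matches the KASAN null-ptr-deref range in the report above.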
2025-10-16parisc: entry: set W bit for !compat tasks in syscall_restore_rfi()Sven Schnelle2-1/+6
When the kernel leaves to userspace via syscall_restore_rfi(), the W bit is not set in the new PSW. This doesn't currently cause any problems because there's no 64-bit userspace for parisc, and simple static binaries are usually loaded at addresses way below the 32-bit limit, so the W bit doesn't matter. Fix this by setting the W bit when TIF_32BIT is not set. Signed-off-by: Sven Schnelle <svens@stackframe.org> Cc: stable@vger.kernel.org Signed-off-by: Helge Deller <deller@gmx.de>
2025-10-15arm64: defconfig: Enable DW HDMI QP CEC supportCristian Ciocaltea1-0/+1
Enable support for the CEC interface of the Synopsys DesignWare HDMI QP IP block. This is used by all boards based on RK3588 & RK3576 SoCs. Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com> Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-10-15x86/CPU/AMD: Prevent reset reasons from being retained across rebootRong Zhang1-2/+14
The S5_RESET_STATUS register is parsed on boot and printed to kmsg. However, this could sometimes be misleading and lead to users wasting a lot of time on meaningless debugging for two reasons:
* Some bits are never cleared by hardware. It's the software's responsibility to clear them as per the Processor Programming Reference (see [1]).
* Some rare hardware-initiated platform resets do not update the register at all.
In both cases, a previous reboot could leave its trace in the register, resulting in users seeing unrelated reboot reasons while debugging random reboots afterward.
Write the read value back to the register in order to clear all reason bits since they are write-1-to-clear while the others must be preserved.
[1]: https://bugzilla.kernel.org/show_bug.cgi?id=206537#attach_303991
[ bp: Massage commit message. ]
Fixes: ab8131028710 ("x86/CPU/AMD: Print the reason for the last reset")
Signed-off-by: Rong Zhang <i@rong.moe>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/all/20250913144245.23237-1-i@rong.moe/
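Why writing the read value back clears the reason bits while preserving the rest can be illustrated with a small model of write-1-to-clear semantics. This is a sketch under assumptions: `REASON_MASK` is a made-up bit layout (the real one is in the PPR), and `reg_write()` only simulates the register behaviour in plain memory.

```c
#include <stdint.h>

/* Hypothetical mask of write-1-to-clear reason bits (not the real layout). */
#define REASON_MASK 0x0000FFFFu

/*
 * Simulated register write: a reason bit is cleared when 1 is written
 * to it and kept when 0 is written; all other bits store what is written.
 */
static uint32_t reg_write(uint32_t current, uint32_t written)
{
    uint32_t reason = current & REASON_MASK & ~written;
    uint32_t other = written & ~REASON_MASK;

    return reason | other;
}

/*
 * Read the register and write the same value back. Every reason bit
 * that was set is written as 1 and therefore cleared; the other bits
 * are written with their current value and therefore preserved.
 * Returns the value that was read, for reporting.
 */
static uint32_t clear_reset_reasons(uint32_t *reg)
{
    uint32_t value = *reg;

    *reg = reg_write(*reg, value);
    return value;
}
```

In this model, writing zero instead of the read value would leave the reason bits set and wipe the preserved bits, which is the opposite of the intended effect.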
2025-10-15KVM: SVM: Disallow EFER.LMSLE when not supported by hardwareJim Mattson1-1/+3
Modern AMD CPUs do not support segment limit checks in 64-bit mode (i.e. EFER.LMSLE must be zero). Do not allow a guest to set EFER.LMSLE on a CPU that requires the bit to be zero. For backwards compatibility, allow EFER.LMSLE to be set on CPUs that support segment limit checks in 64-bit mode, even though KVM's implementation of the feature is incomplete (e.g. KVM's emulator does not enforce segment limits in 64-bit mode). Fixes: eec4b140c924 ("KVM: SVM: Allow EFER.LMSLE to be set with nested svm") Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Nikunj A Dadhania <nikunj@amd.com> Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://lore.kernel.org/r/20251001001529.1119031-3-jmattson@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
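The rule in this commit can be sketched as a predicate on the value a guest tries to write. EFER.LMSLE is bit 13 of EFER; `efer_write_allowed()` and its `host_has_lmsle` parameter are hypothetical names for illustration, and the log does not show how KVM actually wires up the check.

```c
#include <stdbool.h>
#include <stdint.h>

#define EFER_LMSLE (1ULL << 13) /* Long Mode Segment Limit Enable */

/*
 * Return true if the guest may write new_efer, considering only the
 * LMSLE rule: the bit is rejected when the host CPU requires it to be
 * zero, and still accepted (for backwards compatibility) when the host
 * supports segment limit checks in 64-bit mode.
 */
static bool efer_write_allowed(uint64_t new_efer, bool host_has_lmsle)
{
    if ((new_efer & EFER_LMSLE) && !host_has_lmsle)
        return false;

    return true;
}
```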
2025-10-15Remove long-stale ext3 defconfig optionLinus Torvalds31-31/+0
Inspired by commit c065b6046b34 ("Use CONFIG_EXT4_FS instead of CONFIG_EXT3_FS in all of the defconfigs") I looked around for any other left-over EXT3 config options, and found some old defconfig files still mentioned CONFIG_EXT3_DEFAULTS_TO_ORDERED. That config option was removed a decade ago in commit c290ea01abb7 ("fs: Remove ext3 filesystem driver"). It had a good run, but let's remove it for good. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-10-15Merge tag 'ext4_for_linus-6.18-rc2' of ↵Linus Torvalds106-173/+173
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 bug fixes from Ted Ts'o:
- Fix regression caused by removing CONFIG_EXT3_FS when testing some very old defconfigs
- Avoid a BUG_ON when opening a file on a maliciously corrupted file system
- Avoid mm warnings when freeing very large orphan file metadata
- Avoid theoretical races between metadata writeback and checkpoints (very hard to hit in practice, since the race requires that the writeback take a very long time)
* tag 'ext4_for_linus-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  Use CONFIG_EXT4_FS instead of CONFIG_EXT3_FS in all of the defconfigs
  ext4: free orphan info with kvfree
  ext4: detect invalid INLINE_DATA + EXTENTS flag combination
  ext4, doc: fix and improve directory hash tree description
  ext4: wait for ongoing I/O to complete before freeing blocks
  jbd2: ensure that all ongoing I/O complete before freeing blocks
2025-10-15x86/microcode/intel: Enable staging when availableChang S. Bae2-0/+25
With staging support implemented, enable it when the CPU reports the feature. [ bp: Sort in the MSR properly. ] Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Tested-by: Anselm Busse <abusse@amazon.de> Link: https://lore.kernel.org/20250320234104.8288-1-chang.seok.bae@intel.com
2025-10-15x86/microcode/intel: Support mailbox transferChang S. Bae1-6/+166
The functions for sending microcode data and retrieving the next offset were previously placeholders, as they need to handle a specific mailbox format. While the kernel supports similar mailboxes, none of them are compatible with this one. Attempts to share code led to unnecessary complexity, so add a dedicated implementation instead. [ bp: Sort the include properly. ] Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Tested-by: Anselm Busse <abusse@amazon.de> Link: https://lore.kernel.org/20250320234104.8288-1-chang.seok.bae@intel.com
2025-10-15x86/microcode/intel: Implement staging handlerChang S. Bae1-3/+120
Previously, per-package staging invocations and their associated state data were established. The next step is to implement the actual staging handler according to the specified protocol. Below are key aspects to note:
(a) Each staging process must begin by resetting the staging hardware.
(b) The staging hardware processes up to a page-sized chunk of the microcode image per iteration, requiring software to submit data incrementally.
(c) Once a data chunk is processed, the hardware responds with an offset in the image for the next chunk.
(d) The offset may indicate completion or request retransmission of an already transferred chunk. As long as the total transferred data remains within the predefined limit (twice the image size), retransmissions should be acceptable.
Incorporate them in the handler, while data transmission and mailbox format handling are implemented separately.
[ bp: Sort the headers in a reversed name-length order. ]
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Tested-by: Anselm Busse <abusse@amazon.de>
Link: https://lore.kernel.org/20250320234104.8288-1-chang.seok.bae@intel.com
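The control flow in (a) through (d) can be sketched as a loop driven by the offset the hardware returns. Everything named here is an assumption made for illustration: `struct staging_hw`, `OFFSET_DONE`, `stage_image()`, and the toy `fake_hw` device are hypothetical, and the real handler talks to an MMIO mailbox rather than callbacks.

```c
#include <stddef.h>

#define CHUNK_SIZE 4096u         /* page-sized chunk, per (b) */
#define OFFSET_DONE ((size_t)-1) /* hypothetical "staging complete" marker */

/* Hypothetical hardware interface: reset, and send one chunk. */
struct staging_hw {
    void (*reset)(void *ctx);
    size_t (*send_chunk)(void *ctx, size_t offset, size_t len); /* returns next offset */
    void *ctx;
};

/* Returns 0 on completion, -1 on a bad offset or when the transfer limit is hit. */
static int stage_image(const struct staging_hw *hw, size_t image_size)
{
    size_t offset = 0, sent = 0;

    hw->reset(hw->ctx); /* (a) always start from a reset */

    while (offset != OFFSET_DONE) {
        size_t len;

        if (offset >= image_size)
            return -1; /* hardware asked for data outside the image */

        len = image_size - offset;
        if (len > CHUNK_SIZE)
            len = CHUNK_SIZE; /* (b) at most one page per iteration */

        if (sent + len > 2 * image_size)
            return -1; /* (d) too many retransmissions */
        sent += len;

        offset = hw->send_chunk(hw->ctx, offset, len); /* (c) next offset */
    }
    return 0;
}

/* Toy device model: asks for `retransmits` repeats, then advances normally. */
struct fake_hw {
    size_t image_size;
    int retransmits;
    int resets;
};

static void fake_reset(void *ctx)
{
    ((struct fake_hw *)ctx)->resets++;
}

static size_t fake_send(void *ctx, size_t offset, size_t len)
{
    struct fake_hw *dev = ctx;

    if (dev->retransmits > 0) {
        dev->retransmits--;
        return offset; /* request the same chunk again */
    }
    return offset + len >= dev->image_size ? OFFSET_DONE : offset + len;
}
```

Bounding the total bytes sent, rather than counting retries per chunk, keeps the loop guaranteed to terminate even if the device keeps requesting old offsets.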
2025-10-15x86/microcode/intel: Define staging state structChang S. Bae1-0/+17
Define a staging_state struct to simplify function prototypes by consolidating relevant data, instead of passing multiple local variables. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Tested-by: Anselm Busse <abusse@amazon.de> Link: https://lore.kernel.org/20250320234104.8288-1-chang.seok.bae@intel.com
2025-10-15x86/microcode/intel: Establish staging control logicChang S. Bae2-0/+53
When microcode staging is initiated, operations are carried out through an MMIO interface. Each package has a unique interface specified by the IA32_MCU_STAGING_MBOX_ADDR MSR, which maps to a set of 32-bit registers.
Prepare staging with the following steps:
1. Ensure the microcode image is 32-bit aligned to match the MMIO register size.
2. Identify each MMIO interface based on its per-package scope.
3. Invoke the staging function for each identified interface, which will be implemented separately.
[ bp: Improve error logging. ]
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Tested-by: Anselm Busse <abusse@amazon.de>
Link: https://lore.kernel.org/all/871pznq229.ffs@tglx
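The preparation steps above can be sketched as a helper that validates the image size and collects one mailbox address per package. This is only an illustration under assumptions: `collect_mailboxes()`, `MAX_PACKAGES`, and the `cpu_pkg`/`cpu_mbox` arrays are hypothetical, with `cpu_mbox[i]` standing in for the value CPU i would read from IA32_MCU_STAGING_MBOX_ADDR, and "32-bit aligned" is interpreted here as a size that is a multiple of four bytes.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_PACKAGES 8 /* hypothetical upper bound for this sketch */

/*
 * Step 1: reject an image whose size is not a multiple of 4 bytes.
 * Step 2: walk the CPUs and record the mailbox address of each package
 *         the first time that package is seen.
 * Step 3 is left to the caller: invoke the staging function once for
 *         each address stored in out[].
 * Returns the number of mailboxes found, or -1 on error.
 */
static int collect_mailboxes(size_t image_size,
                             const int *cpu_pkg, const uint64_t *cpu_mbox,
                             size_t ncpus, uint64_t *out, size_t out_cap)
{
    unsigned int seen = 0; /* bitmap of package ids already handled */
    size_t count = 0;

    if (image_size % sizeof(uint32_t) != 0)
        return -1;

    for (size_t i = 0; i < ncpus; i++) {
        int pkg = cpu_pkg[i];

        if (pkg < 0 || pkg >= MAX_PACKAGES)
            return -1;
        if (seen & (1u << pkg))
            continue; /* this package's interface is already recorded */
        if (count >= out_cap)
            return -1;

        seen |= 1u << pkg;
        out[count++] = cpu_mbox[i];
    }
    return (int)count;
}
```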