path: root/arch
Age | Commit message | Author | Files | Lines
2026-03-04 | Merge tag 'riscv-soc-fixes-for-v7.0-rc1' of https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux into arm/fixes | Arnd Bergmann | 1 | -0/+2
RISC-V soc fixes for v7.0-rc1 drivers: Fix leaks in probe/init function teardown code in three drivers. microchip: Fix a warning introduced by a recent binding change that made resets required on Polarfire SoC's CAN IP. Signed-off-by: Conor Dooley <conor.dooley@microchip.com> * tag 'riscv-soc-fixes-for-v7.0-rc1' of https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux: cache: ax45mp: Fix device node reference leak in ax45mp_cache_init() cache: starfive: fix device node leak in starlink_cache_init() riscv: dts: microchip: add can resets to mpfs soc: microchip: mpfs: Fix memory leak in mpfs_sys_controller_probe() Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2026-03-04 | arm64: dts: qcom: monaco: Fix UART10 pinconf | Loic Poulain | 1 | -2/+2
UART10 RTS and TX pins were incorrectly mapped to gpio84 and gpio85. Correct them to gpio85 (RTS) and gpio86 (TX) to match the hardware I/O mapping. Fixes: 467284a3097f ("arm64: dts: qcom: qcs8300: Add QUPv3 configuration") Signed-off-by: Loic Poulain <loic.poulain@oss.qualcomm.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260202155611.1568-1-loic.poulain@oss.qualcomm.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2026-03-04 | arm64: dts: qcom: monaco: Add EL2 overlay | Mukesh Ojha | 2 | -0/+37
All the Monaco IOT variant boards use the Gunyah hypervisor, which means that, so far, a Linux-based OS could only boot in EL1 on those devices. However, it is possible for us to boot Linux at EL2 on these devices [1]. When running under Gunyah, the remote processor firmware IOMMU streams are controlled by Gunyah. However, without Gunyah, the IOMMU is managed by the consumer of this DeviceTree. Therefore, describe the firmware streams for each remote processor. Add an EL2-specific DT overlay and apply it to Monaco IOT variant devices to create an -el2.dtb for each of them alongside the "normal" dtb. [1] https://docs.qualcomm.com/bundle/publicresource/topics/80-70020-4/boot-developer-touchpoints.html#uefi Signed-off-by: Mukesh Ojha <mukesh.ojha@oss.qualcomm.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260127-talos-el2-overlay-v2-2-b6a2266532c4@oss.qualcomm.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2026-03-04 | arm64: dts: qcom: lemans: disable zap-shader for EL2 configuration | Mukesh Ojha | 1 | -0/+4
We don't need to use the zap shader in EL2 as Linux can zap the GPU on its own. Let's disable zap-shader for the Lemans EL2 configuration. Signed-off-by: Mukesh Ojha <mukesh.ojha@oss.qualcomm.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260127-talos-el2-overlay-v2-1-b6a2266532c4@oss.qualcomm.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2026-03-04 | arm64: dts: qcom: hamoa: Add EL2 overlay for hamoa-evk | Xin Liu | 1 | -0/+4
Add support for building an EL2 combined DTB for the hamoa-evk in the Qualcomm DTS Makefile. The new hamoa-iot-evk-el2.dtb is generated by combining the base hamoa-iot-evk.dtb with the x1-el2.dtbo overlay, enabling EL2-specific configurations required by the platform. Signed-off-by: Xin Liu <xin.liu@oss.qualcomm.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260127062425.1084673-1-xin.liu@oss.qualcomm.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2026-03-04 | arm64: dts: qcom: talos: Add missing clock-names to GCC | Konrad Dybcio | 1 | -0/+3
The binding for this clock controller requires that clock-names are present. They're not really used by the kernel driver, but they're marked as required, so someone might have assumed it's done on purpose (where in reality we try to stay away from that since index-based references are faster, take up less space and are already widely used) and referenced it in drivers for another OS. Hence, do the least painful thing and add the missing entries. Fixes: 8e266654a2fe ("arm64: dts: qcom: add QCS615 platform") Signed-off-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Reviewed-by: Taniya Das <taniya.das@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260126-topic-talos_dt_warn-v1-1-c452afc647ad@oss.qualcomm.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2026-03-04 | arm64: dts: qcom: ipq9574: remove MP5496 regulator references from SoC dtsi | Gabor Juhos | 2 | -5/+17
The 'cpu-supply' properties in the IPQ9574 SoC dtsi reference a regulator provided by an MP5496 PMIC via the RPM firmware, whose node is defined externally in the common RDP dtsi file. Since the PMIC is not part of the SoC, it should not be referenced from the SoC-specific dtsi, so remove the properties from there and define them in the common RDP dtsi instead. While at it, also change the prefix of the label from 'ipq9574' to 'mp5496' to keep it consistent with the labels of the l{2,5} regulators provided by the same PMIC. No functional changes. According to dtx_diff, there are no differences between the ipq9574*.dtb files built with and without the change. Signed-off-by: Gabor Juhos <j4g8y7@gmail.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260123-ipq9574-mp5496-cleanup-v1-1-9fa86f72b873@gmail.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2026-03-04 | arm64: dts: qcom: kodiak: Fix PCIe1 PHY ref clock voting | Krishna Chaitanya Chundru | 1 | -1/+1
GCC_PCIE_CLKREF_EN controls a repeater that provides the reference clock only to the PCIe0 PHY. PCIe1 PHY receives its refclk directly from the CXO source. If the PCIe1 driver in HLOS votes for or against GCC_PCIE_CLKREF_EN, it will inadvertently modify the refclk to PCIe0 as well. Since PCIe0 is managed by WPSS while PCIe1 is managed in HLOS, there is no mechanism to coordinate these votes. As a result, HLOS may disable this repeater during suspend and cut off the PCIe0 PHY refclk while PCIe0 is still active. Replace the unused GCC_PCIE_CLKREF_EN clock entry with RPMH_CXO_CLK to reflect the actual hardware wiring and prevent unintended changes to PCIe0 clocking. Fixes: 92e0ee9f83b3 ("arm64: dts: qcom: sc7280: Add PCIe and PHY related nodes") Cc: stable@vger.kernel.org Signed-off-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260123-fix_pcie1_phy_clk-v1-1-38f82ea01792@oss.qualcomm.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
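The coordination problem described above can be illustrated with a small userspace model. This is a hypothetical sketch, not the clock framework's code: the `hlos_clk` struct and the `clk_vote()`/`clk_unvote()`/`pcie0_has_refclk()` helpers are invented names. The model assumes HLOS counts only its own votes, so when PCIe1 drops the last HLOS vote on GCC_PCIE_CLKREF_EN, the gate turns off even though PCIe0, driven by WPSS, still depends on it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a gated clock as seen from HLOS. Only HLOS
 * consumers are counted; WPSS, which drives PCIe0, is invisible here. */
struct hlos_clk {
	const char *name;
	unsigned int votes; /* HLOS-side enable count */
	bool enabled;       /* physical gate state */
};

static void clk_vote(struct hlos_clk *c)
{
	if (c->votes++ == 0)
		c->enabled = true;
}

static void clk_unvote(struct hlos_clk *c)
{
	assert(c->votes > 0);
	/* Last HLOS vote gone: HLOS believes nobody needs the clock. */
	if (--c->votes == 0)
		c->enabled = false;
}

/* In this model the repeater behind GCC_PCIE_CLKREF_EN feeds only PCIe0. */
static bool pcie0_has_refclk(const struct hlos_clk *clkref_en)
{
	return clkref_en->enabled;
}
```

With the old wiring, a PCIe1 vote/unvote pair on the repeater gate (for example around suspend) leaves `pcie0_has_refclk()` false. With the fix, PCIe1 votes on RPMH_CXO_CLK instead and never touches the repeater.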
2026-03-04 | arm64: dts: qcom: Add support for ECS LIVA QC710 | Val Packett | 2 | -0/+618
Add a device tree for the ECS LIVA QC710 (Snapdragon 7c) mini PC/devkit. Working: - Wi-Fi (wcn3990 hw1.0) - Bluetooth - USB Type-A (USB3 and USB2) - Ethernet (over USB2) - HDMI Display - eMMC - SDHC (microSD slot) Not included: - HDMI Audio - EC (IT8987) Signed-off-by: Val Packett <val@packett.cool> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260120234029.419825-10-val@packett.cool [bjorn: Reordered apps_rsc and tlmm nodes] Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2026-03-04 | arm64: dts: qcom: sdm630: add SPI7 interface | Gianluca Boiano | 1 | -0/+34
Add spi7 interface to SDM630 device tree. Signed-off-by: Gianluca Boiano <morf3089@gmail.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260120193634.1089688-1-morf3089@gmail.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2026-03-04 | arm64: dts: qcom: Add base PURWA-IOT-EVK board | Yijie Yang | 2 | -0/+1550
The PURWA-IOT-EVK is an evaluation platform for IoT products, composed of the Purwa IoT SoM and a carrier board. Together, they form a complete embedded system capable of booting to UART. Unlike HAMOA-IOT-EVK, PURWA-IOT-EVK uses the PS8833 as a retimer for USB0. Meanwhile, USB0 bypasses the SBU selector FSUSB42. Enable the following peripherals on the carrier board: - UART - On-board regulators - USB Type-C mux - Pinctrl - Embedded USB (EUSB) repeaters - NVMe - pmic-glink - USB DisplayPorts - Bluetooth - WLAN - Audio - PCIe ports for PCIe3 through PCIe6a - TPM Signed-off-by: Yijie Yang <yijie.yang@oss.qualcomm.com> Reviewed-by: Abel Vesa <abel.vesa@oss.qualcomm.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260202073555.1345260-4-yijie.yang@oss.qualcomm.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2026-03-04 | arm64: dts: qcom: Add PURWA-IOT-SOM platform | Yijie Yang | 1 | -0/+685
The PURWA-IOT-SOM is a compact computing module that integrates a System on Chip (SoC), specifically the x1p42100, along with essential components optimized for IoT applications. It is designed to be mounted on carrier boards, enabling the development of complete embedded systems. Purwa uses a slightly different Iris HW revision (8.1.2 on Hamoa, 8.1.11 on Purwa); Iris support will be added later. Enable the following peripherals on the SOM: - Regulators on the SOM - Reserved memory regions - PCIe3, PCIe4, PCIe5, PCIe6a - USB0 through USB6 and their PHYs - ADSP, CDSP - Graphics Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Signed-off-by: Yijie Yang <yijie.yang@oss.qualcomm.com> Reviewed-by: Abel Vesa <abel.vesa@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260202073555.1345260-3-yijie.yang@oss.qualcomm.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2026-03-04 | arm64: dts: qcom: qcs6490-rubikpi3: Use lt9611 DSI Port B | Hongyang Zhao | 1 | -4/+4
The LT9611 HDMI bridge on RubikPi3 has DSI physically connected to Port B. Update the devicetree to use port@1 which corresponds to Port B input on the LT9611. Fixes: f055a39f6874 ("arm64: dts: qcom: Add qcs6490-rubikpi3 board dts") Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Signed-off-by: Hongyang Zhao <hongyang.zhao@thundersoft.com> Reviewed-by: Roger Shimizu <rosh@debian.org> Tested-by: Roger Shimizu <rosh@debian.org> Link: https://lore.kernel.org/r/20260207-rubikpi-next-20260116-v3-3-23b9aa189a3a@thundersoft.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2026-03-04 | arm64: dts: qcom: talos: Mark usb controllers as wakeup capable devices | Krishna Kurapati | 1 | -0/+3
USB controllers on talos are wakeup capable. Hence, add the wakeup-source property to both controller nodes. Signed-off-by: Krishna Kurapati <krishna.kurapati@oss.qualcomm.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260128062720.437712-3-krishna.kurapati@oss.qualcomm.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2026-03-04 | arm64: dts: qcom: talos: Flatten usb controller nodes | Krishna Kurapati | 2 | -63/+43
Flatten the USB controller nodes and update them to use the latest bindings and the flattened driver approach. Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Signed-off-by: Krishna Kurapati <krishna.kurapati@oss.qualcomm.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260128062720.437712-2-krishna.kurapati@oss.qualcomm.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2026-03-04 | arm64: dts: qcom: Add Redmi Note 8T | Barnabás Czémán | 4 | -291/+319
Redmi Note 8T (willow) is very similar to Redmi Note 8 (ginkgo); the only difference is that willow has NFC. Make a common base from the ginkgo devicetree for both devices. Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Signed-off-by: Barnabás Czémán <barnabas.czeman@mainlining.org> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260126-xiaomi-willow-v3-7-aad7b106c311@mainlining.org Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2026-03-04 | arm64: dts: qcom: sm6125-xiaomi-ginkgo: Fix reserved gpio ranges | Barnabás Czémán | 1 | -1/+1
The device was crashing on boot because the reserved GPIO ranges were wrongly defined. Correct the ranges to avoid pinctrl crashes. Fixes: 9b1a6c925c88 ("arm64: dts: qcom: sm6125: Initial support for xiaomi-ginkgo") Tested-by: Biswapriyo Nath <nathbappai@gmail.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Signed-off-by: Barnabás Czémán <barnabas.czeman@mainlining.org> Link: https://lore.kernel.org/r/20260126-xiaomi-willow-v3-5-aad7b106c311@mainlining.org Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2026-03-04 | arm64: dts: qcom: sm6125-xiaomi-ginkgo: Remove extcon | Barnabás Czémán | 1 | -9/+0
GPIO pin 102 is related to DisplayPort, which is not supported by this device and is also disabled downstream, so remove the unnecessary extcon-usb node. Fixes: 9b1a6c925c88 ("arm64: dts: qcom: sm6125: Initial support for xiaomi-ginkgo") Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Signed-off-by: Barnabás Czémán <barnabas.czeman@mainlining.org> Link: https://lore.kernel.org/r/20260126-xiaomi-willow-v3-4-aad7b106c311@mainlining.org Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2026-03-04 | arm64: dts: qcom: sm6125-xiaomi-ginkgo: Set memory-region for framebuffer | Barnabás Czémán | 1 | -3/+3
Use the memory-region property for the framebuffer instead of reg. Signed-off-by: Barnabás Czémán <barnabas.czeman@mainlining.org> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260126-xiaomi-willow-v3-3-aad7b106c311@mainlining.org Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2026-03-04 | arm64: dts: qcom: sm6125-xiaomi-ginkgo: Correct reserved memory ranges | Barnabás Czémán | 1 | -12/+29
The device was crashing under high memory load because the reserved memory ranges were wrongly defined. Correct the ranges to avoid the crashes. Change the ramoops memory range to match the values from the recovery, so that the results can be retrieved from the device. Fixes: 9b1a6c925c88 ("arm64: dts: qcom: sm6125: Initial support for xiaomi-ginkgo") Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Signed-off-by: Barnabás Czémán <barnabas.czeman@mainlining.org> Link: https://lore.kernel.org/r/20260126-xiaomi-willow-v3-2-aad7b106c311@mainlining.org Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2026-03-04 | arm64: dts: qcom: sm6125-xiaomi-ginkgo: Remove board-id | Barnabás Czémán | 1 | -2/+0
Remove board-id; it is not necessary for the bootloader. Fixes: 9b1a6c925c88 ("arm64: dts: qcom: sm6125: Initial support for xiaomi-ginkgo") Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Signed-off-by: Barnabás Czémán <barnabas.czeman@mainlining.org> Link: https://lore.kernel.org/r/20260126-xiaomi-willow-v3-1-aad7b106c311@mainlining.org Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2026-03-04 | ARM: dts: qcom: Drop unused .dtsi | Rob Herring (Arm) | 6 | -164/+0
These .dtsi files are not included anywhere in the tree and can't be tested. Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Link: https://lore.kernel.org/r/20251212203226.458694-2-robh@kernel.org Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2026-03-04 | KVM: nSVM: Always use NextRIP as vmcb02's NextRIP after first L2 VMRUN | Yosry Ahmed | 1 | -10/+18
For guests with NRIPS disabled, L1 does not provide NextRIP when running an L2 with an injected soft interrupt; instead, it advances the current RIP before running it. KVM uses the current RIP as the NextRIP in vmcb02 to emulate a CPU without NRIPS. However, after L2 runs the first time, NextRIP will be updated by the CPU and/or KVM, and the current RIP is no longer the correct value to use in vmcb02. Hence, after save/restore, use the current RIP if and only if a nested run is pending; otherwise, use NextRIP. Give soft_int_next_rip the same treatment, as it's the same logic, just for a narrower use case. Fixes: cc440cdad5b7 ("KVM: nSVM: implement KVM_GET_NESTED_STATE and KVM_SET_NESTED_STATE") CC: stable@vger.kernel.org Signed-off-by: Yosry Ahmed <yosry@kernel.org> Link: https://patch.msgid.link/20260225005950.3739782-6-yosry@kernel.org [sean: give soft_int_next_rip the same treatment] Signed-off-by: Sean Christopherson <seanjc@google.com>
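The selection rule for an L1 without NRIPS reduces to a single condition. The sketch below is a hypothetical, simplified model, not KVM's code: `struct l2_rips` and `vmcb02_next_rip()` are invented names, and the NRIPS-enabled path is deliberately left out. Per the commit text, the same choice also applies to soft_int_next_rip.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model (L1 without NRIPS) of the values KVM can choose from
 * when filling vmcb02's NextRIP after nested state is restored. */
struct l2_rips {
	uint64_t rip;      /* current L2 RIP */
	uint64_t next_rip; /* NextRIP maintained by the CPU and/or KVM */
};

static uint64_t vmcb02_next_rip(bool nested_run_pending, const struct l2_rips *s)
{
	/*
	 * Before L2's first VMRUN completes, L1 has already advanced RIP past
	 * the soft-interrupt instruction, so RIP stands in for NextRIP. Once
	 * L2 has run, the current RIP is no longer the right value and the
	 * tracked NextRIP must be kept.
	 */
	return nested_run_pending ? s->rip : s->next_rip;
}
```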
2026-03-04 | x86/mm/pat: Convert split_large_page() to use ptdescs | Vishal Moola (Oracle) | 1 | -6/+7
Use the ptdesc APIs for all page table allocation and free sites to allow their separate allocation from struct page in the future. Update split_large_page() to allocate a ptdesc instead of allocating a page for use as a page table. Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Link: https://patch.msgid.link/20260303194828.1406905-5-vishal.moola@gmail.com
2026-03-04 | x86/mm/pat: Convert populate_pgd() to use page table apis | Vishal Moola (Oracle) | 1 | -2/+10
Use the ptdesc APIs for all page table allocation and free sites to allow their separate allocation from struct page in the future. Convert the remaining get_zeroed_page() calls to the generic page table APIs, as they already use ptdescs. Pass through init_mm since these are kernel page tables, as both functions require it to identify kernel page tables. Because the generic implementations do not use the second argument, pass a placeholder to avoid reimplementing them or risking breakage on other architectures. It is not obvious whether these pages are freed. Regardless, convert the remaining free paths as needed, noting that the only other possible free paths have already been converted and that a frozen page table test kernel has not reported any issues. Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Link: https://patch.msgid.link/20260303194828.1406905-4-vishal.moola@gmail.com
2026-03-04 | x86/mm/pat: Convert pmd code to use page table apis | Vishal Moola (Oracle) | 1 | -2/+6
Use the ptdesc APIs for all page table allocation and free sites to allow their separate allocation from struct page in the future. Convert the PMD allocation and free sites to use the generic page table APIs, as they already use ptdescs. Pass through init_mm since these are kernel page tables, as pmd_alloc_one() requires it to identify kernel page tables. Because the generic implementation does not use the second argument, pass a placeholder to avoid reimplementing it or risking breakage on other architectures. Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Link: https://patch.msgid.link/20260303194828.1406905-3-vishal.moola@gmail.com
2026-03-04 | x86/mm/pat: Convert pte code to use page table apis | Vishal Moola (Oracle) | 1 | -2/+2
Use the ptdesc APIs for all page table allocation and free sites to allow their separate allocation from struct page in the future. Convert the PTE allocation and free sites to use the generic page table APIs, as they already use ptdescs. Pass through init_mm since these are kernel page tables; otherwise, pte_alloc_one_kernel() becomes a no-op. Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Link: https://patch.msgid.link/20260303194828.1406905-2-vishal.moola@gmail.com
2026-03-04 | KVM: TDX: Fold tdx_bringup() into tdx_hardware_setup() | Sean Christopherson | 3 | -41/+25
Now that TDX doesn't need to manually enable virtualization through _KVM_ APIs during setup, fold tdx_bringup() into tdx_hardware_setup() where the code belongs, e.g. so that KVM doesn't leave the S-EPT kvm_x86_ops wired up when TDX is disabled. The weird ordering (and naming) was necessary to allow KVM TDX to use kvm_enable_virtualization(), which in turn had a hard dependency on kvm_x86_ops.enable_virtualization_cpu and thus kvm_x86_vendor_init(). Tested-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Sagi Shahar <sagis@google.com> Link: https://patch.msgid.link/20260214012702.2368778-17-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-03-04 | x86/virt/tdx: Use ida_is_empty() to detect if any TDs may be running | Sean Christopherson | 1 | -13/+4
Drop nr_configured_hkid and instead use ida_is_empty() to detect if any HKIDs have been allocated/configured. Suggested-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Chao Gao <chao.gao@intel.com> Tested-by: Chao Gao <chao.gao@intel.com> Tested-by: Sagi Shahar <sagis@google.com> Link: https://patch.msgid.link/20260214012702.2368778-15-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
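The idea of deriving "are any TDs possibly running?" from the allocator itself, rather than from a separate counter, can be sketched in userspace. This is a hypothetical stand-in, not the kernel IDA: `struct hkid_pool`, `hkid_alloc()`, `hkid_free()` and `hkid_pool_empty()` are invented names, and a 64-slot bitmap replaces the real ID allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an IDA: a 64-slot bitmap of HKIDs. */
struct hkid_pool {
	uint64_t used;
};

static int hkid_alloc(struct hkid_pool *p)
{
	for (int id = 0; id < 64; id++) {
		uint64_t bit = UINT64_C(1) << id;

		if (!(p->used & bit)) {
			p->used |= bit;
			return id;
		}
	}
	return -1; /* pool exhausted */
}

static void hkid_free(struct hkid_pool *p, int id)
{
	assert(id >= 0 && id < 64);
	assert(p->used & (UINT64_C(1) << id));
	p->used &= ~(UINT64_C(1) << id);
}

/* Analogue of ida_is_empty(): no separate counter has to be kept in sync. */
static bool hkid_pool_empty(const struct hkid_pool *p)
{
	return p->used == 0;
}
```

Because emptiness is read directly from the allocation state, there is no window in which a counter and the set of allocated IDs can disagree.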
2026-03-04 | x86/virt/tdx: KVM: Consolidate TDX CPU hotplug handling | Chao Gao | 2 | -69/+47
The core kernel registers a CPU hotplug callback to do VMX and TDX init and deinit, while KVM registers a separate CPU offline callback to block offlining the last online CPU in a socket. Splitting TDX-related CPU hotplug handling across two components is odd and adds unnecessary complexity. Consolidate TDX-related CPU hotplug handling by integrating KVM's tdx_offline_cpu() into the one in the core kernel. Also move nr_configured_hkid to the core kernel because tdx_offline_cpu() references it. Since HKID allocation and free are handled in the core kernel, it's more natural to track used HKIDs there. Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Chao Gao <chao.gao@intel.com> Tested-by: Chao Gao <chao.gao@intel.com> Tested-by: Sagi Shahar <sagis@google.com> Link: https://patch.msgid.link/20260214012702.2368778-14-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
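The offline rule described above (block offlining the last online CPU of a socket) can be modelled as a pure function over a topology snapshot. This is a hypothetical sketch in the spirit of tdx_offline_cpu(), not its actual code: `struct topo`, `MODEL_NR_CPUS` and `tdx_offline_check()` are invented, and it assumes the check only matters while at least one HKID is in use.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 8

/* Hypothetical topology snapshot: package (socket) id and online state. */
struct topo {
	int package[MODEL_NR_CPUS];
	bool online[MODEL_NR_CPUS];
};

/*
 * While any HKID is in use, refuse to offline the last online CPU of a
 * package. Returns 0 to allow, -1 (think -EBUSY) to refuse.
 */
static int tdx_offline_check(const struct topo *t, int cpu, bool hkids_in_use)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);

	if (!hkids_in_use)
		return 0;

	for (int i = 0; i < MODEL_NR_CPUS; i++) {
		if (i != cpu && t->online[i] && t->package[i] == t->package[cpu])
			return 0; /* another CPU in this package stays online */
	}
	return -1;
}
```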
2026-03-04 | x86/virt/tdx: Tag a pile of functions as __init, and globals as __ro_after_init | Sean Christopherson | 2 | -63/+66
Now that TDX-Module initialization is done during subsys init, tag all related functions as __init, and relevant data as __ro_after_init. Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Chao Gao <chao.gao@intel.com> Tested-by: Chao Gao <chao.gao@intel.com> Tested-by: Sagi Shahar <sagis@google.com> Link: https://patch.msgid.link/20260214012702.2368778-13-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-03-04 | KVM: x86/tdx: Do VMXON and TDX-Module initialization during subsys init | Sean Christopherson | 4 | -200/+122
Now that VMXON can be done without bouncing through KVM, do TDX-Module initialization during subsys init (specifically before module_init() so that it runs before KVM when both are built-in). Aside from the obvious benefits of separating core TDX code from KVM, this will allow tagging a pile of TDX functions and globals as being __init and __ro_after_init. Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Chao Gao <chao.gao@intel.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Tested-by: Chao Gao <chao.gao@intel.com> Tested-by: Sagi Shahar <sagis@google.com> Link: https://patch.msgid.link/20260214012702.2368778-12-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-03-04 | x86/virt/tdx: Drop the outdated requirement that TDX be enabled in IRQ context | Sean Christopherson | 2 | -16/+2
Remove TDX's outdated requirement that per-CPU enabling be done via an IPI function call, which was a stale artifact left over from early versions of the TDX enablement series. The requirement that IRQs be disabled should have been dropped as part of the revamped series that relied on the KVM rework to enable VMX at module load. In other words, the kernel's "requirement" was never a requirement at all, but instead a reflection of how KVM enabled VMX (via IPI callback) when the TDX subsystem code was merged. Note, accessing per-CPU information is safe even without disabling IRQs, as tdx_online_cpu() is invoked via a cpuhp callback, i.e. from a per-CPU thread. Link: https://lore.kernel.org/all/ZyJOiPQnBz31qLZ7@google.com Tested-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Sagi Shahar <sagis@google.com> Link: https://patch.msgid.link/20260214012702.2368778-11-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-03-04 | x86/virt: Add refcounting of VMX/SVM usage to support multiple in-kernel users | Sean Christopherson | 4 | -30/+53
Implement a per-CPU refcounting scheme so that "users" of hardware virtualization, e.g. KVM and the future TDX code, can co-exist without pulling the rug out from under each other. E.g. if KVM were to disable VMX on module unload or when the last KVM VM was destroyed, SEAMCALLs from the TDX subsystem would #UD and panic the kernel. Disable preemption in the get/put APIs to ensure virtualization is fully enabled/disabled before returning to the caller. E.g. if the task were preempted after a 0=>1 transition, the new task would see a 1=>2 and thus return without enabling virtualization. Explicitly disable preemption instead of requiring the caller to do so, because the need to disable preemption is an artifact of the implementation. E.g. from KVM's perspective there is no _need_ to disable preemption as KVM guarantees the pCPU on which it is running is stable (but preemption is enabled). Opportunistically abstract away SVM vs. VMX in the public APIs by using X86_FEATURE_{SVM,VMX} to communicate what technology the caller wants to enable and use. Cc: Xu Yilun <yilun.xu@linux.intel.com> Reviewed-by: Chao Gao <chao.gao@intel.com> Tested-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Sagi Shahar <sagis@google.com> Link: https://patch.msgid.link/20260214012702.2368778-10-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
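The refcounting scheme can be illustrated with a single-threaded userspace model. This is a hypothetical sketch, not the kernel API: `virt_get()`, `virt_put()`, `hw_virt_enable()` and `hw_virt_disable()` are invented names, a plain array stands in for per-CPU data, and the preemption handling the commit describes appears only as comments because it cannot be expressed in userspace.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4

/* Hypothetical stand-ins for per-CPU state. */
static unsigned int virt_users[MODEL_NR_CPUS];
static bool virt_on[MODEL_NR_CPUS];

static void hw_virt_enable(int cpu)  { virt_on[cpu] = true; }  /* e.g. VMXON */
static void hw_virt_disable(int cpu) { virt_on[cpu] = false; } /* e.g. VMXOFF */

static void virt_get(int cpu)
{
	/* kernel: preempt_disable() so the count and hardware change together */
	if (virt_users[cpu]++ == 0)
		hw_virt_enable(cpu); /* 0 => 1: first user turns it on */
	/* kernel: preempt_enable() */
}

static void virt_put(int cpu)
{
	/* kernel: preempt_disable() */
	assert(virt_users[cpu] > 0);
	if (--virt_users[cpu] == 0)
		hw_virt_disable(cpu); /* 1 => 0: last user turns it off */
	/* kernel: preempt_enable() */
}
```

In this model, KVM dropping its reference while the TDX code still holds one leaves virtualization enabled on that CPU, which is exactly the "rug pull" the commit aims to prevent.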
2026-03-04 | KVM: x86: Move bulk of emergency virtualization logic to virt subsystem | Sean Christopherson | 9 | -94/+138
Move the majority of the code related to disabling hardware virtualization in emergency from KVM into the virt subsystem so that virt can take full ownership of the state of SVM/VMX. This will allow refcounting usage of SVM/VMX so that KVM and the TDX subsystem can enable VMX without stomping on each other. To route the emergency callback to the "right" vendor code without mixing vendor and generic code, implement an x86_virt_ops structure to track the emergency callback, along with the SVM vs. VMX (vs. "none") feature that is active. To avoid having to choose between SVM and VMX, simply refuse to enable either if both are somehow supported. No known CPU supports both SVM and VMX, and it's comically unlikely such a CPU will ever exist. Leave KVM's clearing of loaded VMCSes and MSR_VM_HSAVE_PA in KVM, via a callback explicitly scoped to KVM. Loading VMCSes and saving/restoring host state are firmly tied to running VMs, and thus are (a) KVM's responsibility and (b) operations that are still exclusively reserved for KVM (as far as in-tree code is concerned). I.e. the contract being established is that non-KVM subsystems can utilize virtualization, but for all intents and purposes cannot act as full-blown hypervisors. Reviewed-by: Chao Gao <chao.gao@intel.com> Tested-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Sagi Shahar <sagis@google.com> Link: https://patch.msgid.link/20260214012702.2368778-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
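The "refuse if both" selection rule and the ops-table routing can be sketched as follows. This is a hypothetical shape, not the real x86_virt_ops definition: `enum virt_feature`, `struct virt_ops`, `pick_virt_ops()` and the two emergency stubs are invented for illustration, and the stubs do nothing beyond naming what the real callbacks would do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum virt_feature { VIRT_NONE, VIRT_SVM, VIRT_VMX };

/* Hypothetical ops table in the spirit of the x86_virt_ops described above. */
struct virt_ops {
	enum virt_feature feature;
	void (*emergency_disable)(void);
};

static void svm_emergency_disable(void) { /* e.g. clear EFER.SVME */ }
static void vmx_emergency_disable(void) { /* e.g. VMXOFF */ }

static struct virt_ops pick_virt_ops(bool has_svm, bool has_vmx)
{
	struct virt_ops ops = { VIRT_NONE, NULL };

	/* Refuse to choose if both are somehow reported. */
	if (has_svm && has_vmx)
		return ops;
	if (has_svm)
		ops = (struct virt_ops){ VIRT_SVM, svm_emergency_disable };
	else if (has_vmx)
		ops = (struct virt_ops){ VIRT_VMX, vmx_emergency_disable };
	return ops;
}
```

Generic code then only needs to call `ops.emergency_disable` when it is non-NULL, without knowing which vendor is active.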
2026-03-04 | KVM: SVM: Move core EFER.SVME enablement to kernel | Sean Christopherson | 3 | -27/+66
Move the innermost EFER.SVME logic out of KVM and into core x86 to land the SVM support alongside VMX support. This will allow providing a more unified API from the kernel to KVM, and will allow moving the bulk of the emergency disabling insanity out of KVM without having a weird split between kernel and KVM for SVM vs. VMX. No functional change intended. Tested-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Sagi Shahar <sagis@google.com> Link: https://patch.msgid.link/20260214012702.2368778-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-03-04 | KVM: VMX: Move core VMXON enablement to kernel | Sean Christopherson | 4 | -73/+92
Move the innermost VMXON+VMXOFF logic out of KVM and into core x86 so that TDX can (eventually) force VMXON without having to rely on KVM being loaded, e.g. to do SEAMCALLs during initialization. Opportunistically update the comment regarding emergency disabling via NMI to clarify that virt_rebooting will be set by _another_ emergency callback, i.e. that virt_rebooting doesn't need to be set before VMCLEAR, only before _this_ invocation does VMXOFF. Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Tested-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Sagi Shahar <sagis@google.com> Link: https://patch.msgid.link/20260214012702.2368778-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-03-04 | x86/virt: Force-clear X86_FEATURE_VMX if configuring root VMCS fails | Sean Christopherson | 2 | -5/+16
If allocating and configuring a root VMCS fails, clear X86_FEATURE_VMX in all CPUs so that KVM doesn't need to manually check root_vmcs. As added bonuses, clearing VMX will reflect that VMX is unusable in /proc/cpuinfo, and will avoid a futile auto-probe of kvm-intel.ko. WARN if allocating a root VMCS page fails, e.g. to help users figure out why VMX is broken in the unlikely scenario something goes sideways during boot (and because the allocation should succeed unless there's a kernel bug). Tweak KVM's error message to suggest checking kernel logs if VMX is unsupported (in addition to checking BIOS). Tested-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Sagi Shahar <sagis@google.com> Link: https://patch.msgid.link/20260214012702.2368778-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-03-04 | KVM: VMX: Unconditionally allocate root VMCSes during boot CPU bringup | Sean Christopherson | 4 | -55/+89
Allocate the root VMCS (misleadingly called "vmxarea" and "kvm_area" in KVM) for each possible CPU during early boot CPU bringup, before early TDX initialization, so that TDX can eventually do VMXON on-demand (to make SEAMCALLs) without needing to load kvm-intel.ko. Allocate the pages early on, e.g. instead of trying to do so on-demand, to avoid having to juggle allocation failures at runtime. Opportunistically rename the per-CPU pointers to better reflect the role of the VMCS. Use Intel's "root VMCS" terminology, e.g. from various VMCS patents[1][2] and older SDMs, not the more opaque "VMXON region" used in recent versions of the SDM. While it's possible the VMCS passed to VMXON no longer serves as _the_ root VMCS on modern CPUs, it is still in effect a "root mode VMCS", as described in the patents. Link: https://patentimages.storage.googleapis.com/c7/e4/32/d7a7def5580667/WO2013101191A1.pdf [1] Link: https://patentimages.storage.googleapis.com/13/f6/8d/1361fab8c33373/US20080163205A1.pdf [2] Tested-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Sagi Shahar <sagis@google.com> Link: https://patch.msgid.link/20260214012702.2368778-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-03-04 | KVM: x86: Move "kvm_rebooting" to kernel as "virt_rebooting" | Sean Christopherson | 10 | -20/+41
Move "kvm_rebooting" to the kernel, exported for KVM, as one of many steps towards extracting the innermost VMXON and EFER.SVME management logic out of KVM and into core x86. For lack of a better name, call the new file "hw.c", to yield "virt hardware" when combined with its parent directory. No functional change intended. Tested-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Sagi Shahar <sagis@google.com> Link: https://patch.msgid.link/20260214012702.2368778-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-03-04  KVM: VMX: Move architectural "vmcs" and "vmcs_hdr" structures to public vmx.h  Sean Christopherson  2 files  -11/+11
Move "struct vmcs" and "struct vmcs_hdr" to asm/vmx.h in anticipation of moving VMXON/VMXOFF to the core kernel (VMXON requires a "root" VMCS with the appropriate revision ID in its header). No functional change intended. Tested-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Sagi Shahar <sagis@google.com> Link: https://patch.msgid.link/20260214012702.2368778-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
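For context, the Intel SDM defines the first 32 bits of a VMCS region as the VMCS revision identifier (bits 30:0) plus a shadow-VMCS indicator (bit 31), followed by a 32-bit VMX-abort indicator. Software must write the revision identifier reported in bits 30:0 of the IA32_VMX_BASIC MSR into the region, with bit 31 clear, before VMXON will accept it. The sketch below mirrors that layout in userspace; the MSR value is made up and root_vmcs_init() is a hypothetical helper, not the kernel's code.

```c
#include <stdint.h>
#include <string.h>

/* Layout modeled on the SDM's VMCS region format. */
struct vmcs_hdr {
	uint32_t revision_id : 31;  /* must match IA32_VMX_BASIC[30:0] */
	uint32_t shadow_vmcs : 1;   /* must be 0 for a VMXON (root) region */
};

struct vmcs {
	struct vmcs_hdr hdr;
	uint32_t abort;             /* VMX-abort indicator */
	char data[];                /* implementation-specific, opaque to software */
};

/* Hypothetical helper: prepare a zeroed page for use as the root VMCS. */
static void root_vmcs_init(struct vmcs *vmcs, size_t size, uint64_t vmx_basic_msr)
{
	memset(vmcs, 0, size);
	/* Only bits 30:0 of IA32_VMX_BASIC carry the revision identifier. */
	vmcs->hdr.revision_id = (uint32_t)(vmx_basic_msr & 0x7fffffffu);
	vmcs->hdr.shadow_vmcs = 0;
}
```

Because only the header (and not the opaque data area) is architecturally defined, the structure is small enough to live in a shared header next to the code that performs VMXON.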
2026-03-04  KVM: x86: Move kvm_rebooting to x86  Sean Christopherson  2 files  -0/+23
Move kvm_rebooting, which is only read by x86, to KVM x86 so that it can be moved again to core x86 code. Add a "shutdown" arch hook to facilitate setting the flag in KVM x86, along with a pile of comments to provide more context around what KVM x86 is doing and why. Reviewed-by: Chao Gao <chao.gao@intel.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Tested-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Sagi Shahar <sagis@google.com> Link: https://patch.msgid.link/20260214012702.2368778-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
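The "arch hook" pattern can be sketched as generic code calling an optional architecture-provided callback before tearing down virtualization, so that only the architecture that needs the flag (here x86) owns it. All names below (struct virt_arch_ops, x86_shutdown, generic_virt_shutdown, x86_rebooting) are hypothetical; the message does not give the real hook's name or signature, and a function-pointer table stands in for whatever mechanism the kernel actually uses.

```c
#include <stdbool.h>

/* Hypothetical arch operations; generic code knows nothing about the flag. */
struct virt_arch_ops {
	void (*shutdown)(void);       /* optional arch hook */
	void (*disable_virt)(void);   /* e.g. VMXOFF or clearing EFER.SVME */
};

/* Owned by the x86 side only. */
static bool x86_rebooting;
static bool x86_virt_enabled = true;

static void x86_shutdown(void)
{
	/*
	 * Set before virtualization is torn down; in KVM this flag lets fault
	 * handling tolerate VMX/SVM instruction faults that race with reboot.
	 */
	x86_rebooting = true;
}

static void x86_disable_virt(void)
{
	x86_virt_enabled = false;
}

static const struct virt_arch_ops x86_ops = {
	.shutdown = x86_shutdown,
	.disable_virt = x86_disable_virt,
};

/* Generic shutdown path: run the arch hook first, then disable. */
static void generic_virt_shutdown(const struct virt_arch_ops *ops)
{
	if (ops->shutdown)
		ops->shutdown();
	ops->disable_virt();
}
```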
2026-03-04  arm64: make runtime const not usable by modules  Jisheng Zhang  1 file  -0/+4
Similar to what commit 284922f4c563 ("x86: uaccess: don't use runtime-const rewriting in modules") did for x86, make arm64's runtime constants unusable by modules too, to "make sure this doesn't get forgotten the next time somebody wants to do runtime constant optimizations". The reason is well explained in the above commit: "The runtime-const infrastructure was never designed to handle the modular case, because the constant fixup is only done at boot time for core kernel code." Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Signed-off-by: Will Deacon <will@kernel.org>
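Why modules are excluded can be modeled in a few lines: the fixup pass runs once at boot over the sites known to the core kernel, so a site that only appears later (as a module's would) keeps its build-time placeholder. This is a hypothetical userspace model (runtime_const_site, runtime_const_fixup_all and the placeholder value are invented), not the arm64 or x86 implementation. The `#ifdef MODULE` / `#error` guard at the top illustrates the kind of compile-time check such a header can use to refuse modular builds; its exact wording in the kernel may differ.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative compile-time guard: refuse to build this into a module. */
#ifdef MODULE
#error "Cannot use runtime-const infrastructure from modules"
#endif

#define PLACEHOLDER 0x0123456789abcdefULL  /* value emitted at build time */

/* Hypothetical model of one patch site: an instruction immediate. */
struct runtime_const_site {
	unsigned long long imm;
};

/* Sites known to the core kernel image at boot. */
static struct runtime_const_site core_sites[3] = {
	{ PLACEHOLDER }, { PLACEHOLDER }, { PLACEHOLDER },
};
static bool fixups_done;

/* Runs exactly once during boot: patch every core-kernel site. */
static void runtime_const_fixup_all(unsigned long long value)
{
	for (size_t i = 0; i < sizeof(core_sites) / sizeof(core_sites[0]); i++)
		core_sites[i].imm = value;
	fixups_done = true;
}
```

A site created after runtime_const_fixup_all() has run (standing in for a module loaded after boot) still holds PLACEHOLDER, which is exactly the silent failure the guard is meant to prevent.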
2026-03-04  x86/entry/vdso32: Work around libgcc unwinder bug  H. Peter Anvin  1 file  -0/+30
The unwinder code in libgcc has a long-standing bug which causes it to fail to pick up the signal frame CFI flag. This is a generic bug across all platforms. It affects the __kernel_sigreturn and __kernel_rt_sigreturn vdso entry points on i386. The x86-64 kernel doesn't provide a sigreturn stub, and so there is no kernel-provided code that is affected on x86-64. libgcc does have a legacy fallback path which happens to work as long as the bytes immediately before each of the sigreturn functions fall outside any function. This patch adds a nop before the ALIGN to each of the sigreturn stubs to ensure that this is, indeed, the case. The rest of the patch is just a comment which documents the invariants that need to be maintained for this legacy path to work correctly. This is a manifest bug: in the current vdso, __kernel_vsyscall is a multiple of 16 bytes long and thus __kernel_sigreturn does not have any padding in front of it. Closes: https://lore.kernel.org/lkml/f3412cc3e8f66d1853cc9d572c0f2fab076872b1.camel@xry111.site Fixes: 884961618ee5 ("x86/entry/vdso32: Remove open-coded DWARF in sigreturn.S") Reported-by: Xi Ruoyao <xry111@xry111.site> Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124050 Link: https://patch.msgid.link/20260227010308.310342-1-hpa@zytor.com
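Why one nop is enough can be seen from a simplified model of how libgcc picks the FDE for a frame. For an ordinary frame it searches for the return address minus one (inside the call instruction); only for a frame it recognizes as a signal frame does it search for the address itself. The legacy fallback is consulted only when that search finds no FDE at all. The address ranges below are invented, and find_fde() and fde_lookup_addr() are hypothetical helpers modeling this behaviour, not libgcc functions.

```c
#include <stdbool.h>
#include <stddef.h>

/* An FDE covers the half-open address range [start, end). */
struct fde {
	const char *func;
	unsigned long start, end;
};

/* Return the FDE covering addr, or NULL if addr lies outside every function. */
static const struct fde *find_fde(const struct fde *fdes, size_t n,
				  unsigned long addr)
{
	for (size_t i = 0; i < n; i++)
		if (addr >= fdes[i].start && addr < fdes[i].end)
			return &fdes[i];
	return NULL;
}

/*
 * Address the unwinder searches for: ra - 1 for an ordinary frame, ra itself
 * for a recognized signal frame.  With the flag lost, ra - 1 is used for the
 * sigreturn stub too.
 */
static unsigned long fde_lookup_addr(unsigned long ra, bool is_signal_frame)
{
	return ra + (is_signal_frame ? 1 : 0) - 1;
}
```

For example, with __kernel_vsyscall covering [0x100, 0x110) and __kernel_sigreturn starting right at 0x110, a lost flag makes the lookup at 0x10f return the __kernel_vsyscall FDE, so the fallback never runs. With a nop at 0x110 and the stub aligned to 0x120, the lookup at 0x11f finds no FDE, which is what lets the legacy path take over.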
2026-03-04  x86/resctrl: Fix SNC detection  Tony Luck  1 file  -31/+5
Now that the x86 topology code has a sensible nodes-per-package measure that does not depend on the online status of CPUs, use this to divinate the SNC mode. Note that when Cluster on Die (CoD) is configured on older systems, this will also show multiple NUMA nodes per package. Intel Resource Director Technology is incompatible with CoD; in that case, print a warning and do not use the MSR_RMID_SNC_CONFIG fixup. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ingo Molnar <mingo@kernel.org> Tested-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Chen Yu <yu.c.chen@intel.com> Link: https://patch.msgid.link/aaCxbbgjL6OZ6VMd@agluck-desk3 Link: https://patch.msgid.link/20260303110100.367976706@infradead.org
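A minimal sketch of the decision as described: take the SNC node count from nodes-per-package, but refuse the MSR_RMID_SNC_CONFIG fixup when the extra nodes come from CoD. The function name and the cod_configured input are hypothetical; the message does not say how the real code tells CoD apart from SNC, and the sketch assumes one L3 cache per package.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical: number of SNC nodes sharing one L3 (1 means "no SNC fixup").
 * nodes_per_package comes from the topology code; cod_configured is an
 * assumed input that distinguishes Cluster on Die from SNC.
 */
static unsigned int snc_nodes_per_l3(unsigned int nodes_per_package,
				     bool cod_configured)
{
	if (nodes_per_package <= 1)
		return 1;

	if (cod_configured) {
		/* RDT is incompatible with CoD: warn and skip the SNC fixup. */
		fprintf(stderr, "resctrl: CoD detected, not applying SNC fixup\n");
		return 1;
	}

	return nodes_per_package;
}
```

Because the input no longer depends on which CPUs happen to be online, the answer is the same whenever the function is called.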
2026-03-04  x86/topo: Fix SNC topology mess  Peter Zijlstra  1 file  -47/+143
Per 4d6dd05d07d0 ("sched/topology: Fix sched domain build error for GNR, CWF in SNC-3 mode"), the original crazy SNC-3 SLIT table was:

  node distances:
  node   0  1  2  3  4  5
     0: 10 15 17 21 28 26
     1: 15 10 15 23 26 23
     2: 17 15 10 26 23 21
     3: 21 28 26 10 15 17
     4: 23 26 23 15 10 15
     5: 26 23 21 17 15 10

And per:

  https://lore.kernel.org/lkml/20250825075642.GQ3245006@noisy.programming.kicks-ass.net/

The suggestion was to average the off-trace clusters to restore sanity.

However, 4d6dd05d07d0 implements this under various assumptions:

  - anything GNR/CWF with numa_in_package;
  - there will never be more than 2 packages;
  - the off-trace cluster will have distance >20

And then HPE shows up with a machine that matches the Vendor-Family-Model checks but looks like this:

  Here's an 8-socket (2 chassis) HPE system with SNC enabled:

  node   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
     0: 10 12 16 16 16 16 18 18 40 40 40 40 40 40 40 40
     1: 12 10 16 16 16 16 18 18 40 40 40 40 40 40 40 40
     2: 16 16 10 12 18 18 16 16 40 40 40 40 40 40 40 40
     3: 16 16 12 10 18 18 16 16 40 40 40 40 40 40 40 40
     4: 16 16 18 18 10 12 16 16 40 40 40 40 40 40 40 40
     5: 16 16 18 18 12 10 16 16 40 40 40 40 40 40 40 40
     6: 18 18 16 16 16 16 10 12 40 40 40 40 40 40 40 40
     7: 18 18 16 16 16 16 12 10 40 40 40 40 40 40 40 40
     8: 40 40 40 40 40 40 40 40 10 12 16 16 16 16 18 18
     9: 40 40 40 40 40 40 40 40 12 10 16 16 16 16 18 18
    10: 40 40 40 40 40 40 40 40 16 16 10 12 18 18 16 16
    11: 40 40 40 40 40 40 40 40 16 16 12 10 18 18 16 16
    12: 40 40 40 40 40 40 40 40 16 16 18 18 10 12 16 16
    13: 40 40 40 40 40 40 40 40 16 16 18 18 12 10 16 16
    14: 40 40 40 40 40 40 40 40 18 18 16 16 16 16 10 12
    15: 40 40 40 40 40 40 40 40 18 18 16 16 16 16 12 10

  10 = Same chassis and socket
  12 = Same chassis and socket (SNC)
  16 = Same chassis and adjacent socket
  18 = Same chassis and non-adjacent socket
  40 = Different chassis

Turns out, the 'max 2 packages' thing is only relevant to the SNC-3 parts; the smaller parts do 8 sockets (like usual). The above SLIT table is sane, but violates the previous assumptions and trips a WARN.

Now that the topology code has a sensible measure of nodes-per-package, we can use that to divinate the SNC mode at hand, and only fix up SNC-3 topologies.

There is a 'healthy' amount of paranoia code validating the assumptions on the SLIT table; on failure it does a simple pr_err(FW_BUG) print and falls back to using the regular table. Let's see how long this lasts :-)

Fixes: 4d6dd05d07d0 ("sched/topology: Fix sched domain build error for GNR, CWF in SNC-3 mode")
Reported-by: Kyle Meyer <kyle.meyer@hpe.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Chen Yu <yu.c.chen@intel.com>
Tested-by: Kyle Meyer <kyle.meyer@hpe.com>
Link: https://patch.msgid.link/20260303110100.238361290@infradead.org
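The "average the off-trace clusters" idea, applied to the 6-node SNC-3 table quoted in this commit, can be sketched as follows. The sketch assumes exactly two packages of three nodes each, numbered package by package, uses plain integer division, and validates that every cross-package distance exceeds every in-package one; snc3_average_off_trace() is a hypothetical helper, not the patch's code. A failed check leaves the table untouched, mirroring the fallback to the regular table that the message describes.

```c
#include <stdbool.h>

#define MAX_NODES 16

/*
 * Hypothetical sketch: for a 2-package SNC-3 layout (nodes 0-2 on package 0,
 * nodes 3-5 on package 1), replace every cross-package ("off-trace") distance
 * with the integer average of all of them.  Returns false and leaves the
 * table unchanged if the layout assumptions do not hold.
 */
static bool snc3_average_off_trace(int d[MAX_NODES][MAX_NODES], int nodes,
				   int nodes_per_pkg)
{
	int max_local = 0, min_remote = -1, sum = 0, count = 0;

	if (nodes_per_pkg != 3 || nodes != 2 * nodes_per_pkg)
		return false;

	for (int i = 0; i < nodes; i++) {
		for (int j = 0; j < nodes; j++) {
			bool same_pkg = (i / nodes_per_pkg) == (j / nodes_per_pkg);

			if (same_pkg) {
				if (d[i][j] > max_local)
					max_local = d[i][j];
			} else {
				if (min_remote < 0 || d[i][j] < min_remote)
					min_remote = d[i][j];
				sum += d[i][j];
				count++;
			}
		}
	}

	/* Paranoia: every remote distance must exceed every local one. */
	if (min_remote <= max_local)
		return false;

	for (int i = 0; i < nodes; i++)
		for (int j = 0; j < nodes; j++)
			if (i / nodes_per_pkg != j / nodes_per_pkg)
				d[i][j] = sum / count;
	return true;
}
```

On the quoted SNC-3 table, the 18 cross-package entries sum to 434, so each becomes 434 / 18 = 24 (integer division), while in-package distances such as 15 and 17 are untouched. The 16-node HPE table does not match the SNC-3 layout and is left alone.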
2026-03-04  x86/topo: Replace x86_has_numa_in_package  Peter Zijlstra  1 file  -10/+3
.. with the brand spanking new topology_num_nodes_per_package(). Having the topology setup determine this value during MADT/SRAT parsing before SMP bringup avoids having to detect this situation when building the SMP topology masks. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ingo Molnar <mingo@kernel.org> Tested-by: Tony Luck <tony.luck@intel.com> Tested-by: K Prateek Nayak <kprateek.nayak@amd.com> Tested-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Chen Yu <yu.c.chen@intel.com> Tested-by: Kyle Meyer <kyle.meyer@hpe.com> Link: https://patch.msgid.link/20260303110100.123701837@infradead.org
2026-03-04  x86/topo: Add topology_num_nodes_per_package()  Peter Zijlstra  3 files  -2/+20
Use the MADT and SRAT table data to compute __num_nodes_per_package. Specifically, SRAT has already been parsed in x86_numa_init(), which is called before acpi_boot_init() which parses MADT. So both are available in topology_init_possible_cpus(). This number is useful to divinate the various Intel CoD/SNC and AMD NPS modes, since the platforms do not otherwise provide it. Doing it this way is independent of the number of online CPUs and other such shenanigans. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ingo Molnar <mingo@kernel.org> Tested-by: Tony Luck <tony.luck@intel.com> Tested-by: K Prateek Nayak <kprateek.nayak@amd.com> Tested-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Chen Yu <yu.c.chen@intel.com> Tested-by: Kyle Meyer <kyle.meyer@hpe.com> Link: https://patch.msgid.link/20260303110100.004091624@infradead.org
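A sketch of the computation, assuming the value is simply the number of physical SRAT nodes divided by the number of packages enumerated from MADT; the message does not spell out the exact formula or its rounding, and both function names are hypothetical. The predicate shows how a boolean like the old x86_has_numa_in_package falls out of the count.

```c
#include <stdbool.h>

/* Hypothetical: physical nodes (from SRAT) spread over packages (from MADT). */
static unsigned int compute_nodes_per_package(unsigned int srat_nodes,
					      unsigned int madt_packages)
{
	if (!madt_packages || srat_nodes <= madt_packages)
		return 1;
	return srat_nodes / madt_packages;
}

/* Equivalent of a "NUMA nodes inside a package" boolean. */
static bool numa_in_package(unsigned int nodes_per_package)
{
	return nodes_per_package > 1;
}
```

Under this assumption, the 16-node, 8-socket HPE system described in the SNC topology commit gives 2, and a 2-socket SNC-3 part with 6 nodes gives 3; neither depends on how many CPUs are online.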
2026-03-04  x86/numa: Store extra copy of numa_nodes_parsed  Peter Zijlstra  3 files  -0/+16
The topology setup code needs to know the total number of physical nodes enumerated in SRAT; however, NUMA_EMU can cause the existing numa_nodes_parsed bitmap to be fictitious. Therefore, keep a copy of the bitmap specifically to retain the physical node count. Suggested-by: K Prateek Nayak <kprateek.nayak@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ingo Molnar <mingo@kernel.org> Tested-by: K Prateek Nayak <kprateek.nayak@amd.com> Tested-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Chen Yu <yu.c.chen@intel.com> Tested-by: Kyle Meyer <kyle.meyer@hpe.com> Link: https://patch.msgid.link/20260303110059.889884023@infradead.org
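The idea can be sketched with a 64-bit mask standing in for the kernel's nodemask: snapshot the SRAT-parsed mask before emulation rewrites it, then count physical nodes from the snapshot. The variable and function names are hypothetical, and the "emulation" here (doubling the node count) is only a stand-in for whatever NUMA_EMU is configured to do.

```c
#include <stdint.h>

/* Stand-ins for the kernel's nodemask bitmaps. */
static uint64_t nodes_parsed;           /* may be rewritten by emulation */
static uint64_t nodes_parsed_physical;  /* untouched copy taken from SRAT */

static unsigned int count_nodes(uint64_t mask)
{
	unsigned int n = 0;

	for (; mask; mask &= mask - 1)  /* clear the lowest set bit each step */
		n++;
	return n;
}

/* SRAT parsing done: record the physical nodes and keep a private copy. */
static void srat_parsed(uint64_t mask)
{
	nodes_parsed = mask;
	nodes_parsed_physical = mask;
}

/* Toy NUMA emulation: pretend every physical node is split in two. */
static void fake_numa_emulation(void)
{
	unsigned int n = count_nodes(nodes_parsed) * 2;

	nodes_parsed = (n >= 64) ? UINT64_MAX : ((UINT64_C(1) << n) - 1);
}
```

After parsing two physical nodes and running the toy emulation, the working mask reports four nodes while the copy still reports two, which is the count the topology code needs.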
2026-03-04  arm64: mm: Add PTE_DIRTY back to PAGE_KERNEL* to fix kexec/hibernation  Catalin Marinas  1 file  -5/+5
Commit 143937ca51cc ("arm64, mm: avoid always making PTE dirty in pte_mkwrite()") changed pte_mkwrite_novma() to only clear PTE_RDONLY when PTE_DIRTY is set. This was to allow writable-clean PTEs for swap pages that haven't actually been written. However, this broke kexec and hibernation for some platforms. Both go through trans_pgd_create_copy() -> _copy_pte(), which calls pte_mkwrite_novma() to make the temporary linear-map copy fully writable. With the updated pte_mkwrite_novma(), read-only kernel pages (without PTE_DIRTY) remain read-only in the temporary mapping. While such behaviour is fine for user pages, where hardware DBM or trapping will make them writable, subsequent in-kernel writes by the kexec relocation code will fault. Add PTE_DIRTY back to all _PAGE_KERNEL* protection definitions. This was the case prior to the 5.4 commit aa57157be69f ("arm64: Ensure VM_WRITE|VM_SHARED ptes are clean by default"). With the kernel linear-map PTEs always having PTE_DIRTY set, pte_mkwrite_novma() correctly clears PTE_RDONLY. Fixes: 143937ca51cc ("arm64, mm: avoid always making PTE dirty in pte_mkwrite()") Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: stable@vger.kernel.org Reported-by: Jianpeng Chang <jianpeng.chang.cn@windriver.com> Link: https://lore.kernel.org/r/20251204062722.3367201-1-jianpeng.chang.cn@windriver.com Cc: Will Deacon <will@kernel.org> Cc: Huang, Ying <ying.huang@linux.alibaba.com> Cc: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Huang Ying <ying.huang@linux.alibaba.com> Signed-off-by: Will Deacon <will@kernel.org>
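The interaction can be modeled with three bits. The pte_mkwrite_novma() body below follows the behaviour the message describes: always set the write (DBM) bit, and clear PTE_RDONLY only when PTE_DIRTY is set. The bit positions mirror arm64's but should be treated as illustrative, and KPROT_RO_OLD/KPROT_RO_NEW are hypothetical simplified stand-ins for a read-only _PAGE_KERNEL* value before and after this fix.

```c
#include <stdint.h>

typedef uint64_t pte_t;

/* Illustrative bit positions for the three bits involved. */
#define PTE_RDONLY  (UINT64_C(1) << 7)   /* hardware: writes not permitted */
#define PTE_WRITE   (UINT64_C(1) << 51)  /* DBM: writable once RDONLY clears */
#define PTE_DIRTY   (UINT64_C(1) << 55)  /* software dirty bit */

/* Behaviour after commit 143937ca51cc, as described in the message. */
static pte_t pte_mkwrite_novma(pte_t pte)
{
	pte |= PTE_WRITE;
	if (pte & PTE_DIRTY)
		pte &= ~PTE_RDONLY;
	return pte;
}

/* Without PTE_RDONLY, in-kernel writes do not fault. */
static int pte_hw_writable(pte_t pte)
{
	return !(pte & PTE_RDONLY);
}

/* Simplified read-only kernel protections, before and after the fix. */
#define KPROT_RO_OLD  (PTE_RDONLY)
#define KPROT_RO_NEW  (PTE_RDONLY | PTE_DIRTY)
```

Under this model, pte_mkwrite_novma(KPROT_RO_OLD) sets PTE_WRITE but keeps PTE_RDONLY, so the kexec relocation code would fault on its writes, whereas pte_mkwrite_novma(KPROT_RO_NEW) clears PTE_RDONLY and yields a writable temporary mapping.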