summaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)AuthorFilesLines
2025-10-15x86/microcode: Introduce staging step to reduce late-loading timeChang S. Bae2-1/+14
As microcode patch sizes continue to grow, late-loading latency spikes can lead to timeouts and disruptions in running workloads. This trend of increasing patch sizes is expected to continue, so a foundational solution is needed to address the issue. To mitigate the problem, introduce a microcode staging feature. This option processes most of the microcode update (excluding activation) on a non-critical path, allowing CPUs to remain operational during the majority of the update. By offloading work from the critical path, staging can significantly reduce latency spikes. Integrate staging as a preparatory step in late-loading. Introduce a new callback for staging, which is invoked at the beginning of load_late_stop_cpus(), before CPUs enter the rendezvous phase. Staging follows an opportunistic model: * If successful, it reduces CPU rendezvous time * Even though it fails, the process falls back to the legacy path to finish the loading process but with potentially higher latency. Extend struct microcode_ops to incorporate staging properties, which will be implemented in the vendor code separately. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Tested-by: Anselm Busse <abusse@amazon.de> Link: https://lore.kernel.org/20250320234104.8288-1-chang.seok.bae@intel.com
2025-10-15x86/cpu/topology: Make primary thread mask available with SMP=nChang S. Bae4-13/+9
cpu_primary_thread_mask is only defined when CONFIG_SMP=y. However, even in UP kernels there is always exactly one CPU, which can reasonably be treated as the primary thread. Historically, topology_is_primary_thread() always returned true with CONFIG_SMP=n. A recent commit: 4b455f59945aa ("cpu/SMT: Provide a default topology_is_primary_thread()") replaced it with a generic implementation with the note: "When disabling SMT, the primary thread of the SMT will remain enabled/active. Architectures that have a special primary thread (e.g. x86) need to override this function. ..." For consistency and clarity, make the primary thread mask available regardless of SMP, similar to cpu_possible_mask and cpu_present_mask. Move __cpu_primary_thread_mask into common code to prevent build issues. Let cpu_mark_primary_thread() configure the mask even for UP kernels, alongside other masks. Then, topology_is_primary_thread() can consistently reference it. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20250320234104.8288-1-chang.seok.bae@intel.com
2025-10-15arm64: dts: agilex5: Add GMAC0 node for NAND daughter cardBoon Khai Ng1-0/+18
Enable the GMAC0 node for the Agilex5 device when using the NAND daughter card. Signed-off-by: Boon Khai Ng <boon.khai.ng@altera.com> Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
2025-10-15arm64: dts: socfpga: agilex5: Add 4-bit SPI bus widthFong, Yan Kei1-0/+2
Add spi-tx-bus-width and spi-rx-bus-width properties with value 4 to the agilex5 device tree. This update configures the SPI controller to use a 4-bit bus width for both transmission and reception, potentially improving SPI throughput and matching the hardware capabilities more closely. Signed-off-by: Fong, Yan Kei <yan.kei.fong@altera.com> Reviewed-by: Khairul Anuar Romli <khairul.anuar.romli@altera.com> Reviewed-by: Matthew Gerlach <matthew.gerlach@altera.com> Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
2025-10-15arm64: dts: socfpga: agilex: Add 4-bit SPI bus widthFong, Yan Kei1-0/+2
Add spi-tx-bus-width and spi-rx-bus-width properties with value 4 to the agilex device tree. This update configures the SPI controller to use a 4-bit bus width for both transmission and reception, potentially improving SPI throughput and matching the hardware capabilities more closely. Signed-off-by: Fong, Yan Kei <yan.kei.fong@altera.com> Reviewed-by: Khairul Anuar Romli <khairul.anuar.romli@altera.com> Reviewed-by: Matthew Gerlach <matthew.gerlach@altera.com> Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
2025-10-15arm64: dts: socfpga: stratix10: Add 4-bit SPI bus widthFong, Yan Kei1-0/+2
Add spi-tx-bus-width and spi-rx-bus-width properties with value 4 to the stratix10 device tree. This update configures the SPI controller to use a 4-bit bus width for both transmission and reception, potentially improving SPI throughput and matching the hardware capabilities more closely. Signed-off-by: Fong, Yan Kei <yan.kei.fong@altera.com> Reviewed-by: Khairul Anuar Romli <khairul.anuar.romli@altera.com> Reviewed-by: Matthew Gerlach <matthew.gerlach@altera.com> Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
2025-10-15arm64: dts: socfpga: n5x: Add 4-bit SPI bus widthFong, Yan Kei1-0/+2
Add spi-tx-bus-width and spi-rx-bus-width properties with value 4 to the n5x device tree. This update configures the SPI controller to use a 4-bit bus width for both transmission and reception, potentially improving SPI throughput and matching the hardware capabilities more closely. Signed-off-by: Fong, Yan Kei <yan.kei.fong@altera.com> Reviewed-by: Khairul Anuar Romli <khairul.anuar.romli@altera.com> Reviewed-by: Matthew Gerlach <matthew.gerlach@altera.com> Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
2025-10-15KVM: x86: Advertise EferLmsleUnsupported to userspaceJim Mattson2-0/+2
CPUID.80000008H:EBX.EferLmsleUnsupported[bit 20] is a defeature bit. When this bit is clear, EFER.LMSLE is supported. When this bit is set, EFER.LMLSE is unsupported. KVM has never _emulated_ EFER.LMSLE, so KVM cannot truly support a 0-setting of this bit. However, KVM has allowed the guest to enable EFER.LMSLE in hardware since commit eec4b140c924 ("KVM: SVM: Allow EFER.LMSLE to be set with nested svm"), i.e. KVM partially virtualizes long-mode segment limits _if_ they are supported by the underlying hardware. Pass through the bit in KVM_GET_SUPPORTED_CPUID to advertise the unavailability of EFER.LMSLE to userspace based on the raw underlying hardware. Attempting to enable EFER.LSMLE on such CPUs simply doesn't work, e.g. immediately crashes on VMRUN. Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Nikunj A Dadhania <nikunj@amd.com> Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://lore.kernel.org/r/20251001001529.1119031-2-jmattson@google.com [sean: add context about partial virtualization, use PASSTHROUGH_F] Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-15livepatch: Add CONFIG_KLP_BUILDJosh Poimboeuf1-0/+1
In preparation for introducing klp-build, add a new CONFIG_KLP_BUILD option. The initial version will only be supported on x86-64. Acked-by: Petr Mladek <pmladek@suse.com> Tested-by: Joe Lawrence <joe.lawrence@redhat.com> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
2025-10-15x86/asm: Annotate special section entriesJosh Poimboeuf5-0/+12
In preparation for the objtool klp diff subcommand, add annotations for special section entries. This will enable objtool to determine the size and location of the entries and to extract them when needed. Acked-by: Petr Mladek <pmladek@suse.com> Tested-by: Joe Lawrence <joe.lawrence@redhat.com> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
2025-10-15x86/alternative: Refactor INT3 call emulation selftestJosh Poimboeuf1-23/+28
The INT3 call emulation selftest is a bit fragile as it relies on the compiler not inserting any extra instructions before the int3_selftest_ip() definition. Also, the int3_selftest_ip() symbol overlaps with the int3_selftest symbol(), which can confuse objtool. Fix those issues by slightly reworking the functionality and moving int3_selftest_ip() to a separate asm function. While at it, improve the naming. Acked-by: Petr Mladek <pmladek@suse.com> Tested-by: Joe Lawrence <joe.lawrence@redhat.com> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
2025-10-15x86/kprobes: Remove STACK_FRAME_NON_STANDARD annotationJosh Poimboeuf1-4/+0
Since commit 877b145f0f47 ("x86/kprobes: Move trampoline code into RODATA"), the optprobe template code is no longer analyzed by objtool so it doesn't need to be ignored. Acked-by: Petr Mladek <pmladek@suse.com> Tested-by: Joe Lawrence <joe.lawrence@redhat.com> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
2025-10-15x86/module: Improve relocation error messagesJosh Poimboeuf1-6/+9
Add the section number and reloc index to relocation error messages to help find the faulty relocation. Acked-by: Petr Mladek <pmladek@suse.com> Tested-by: Joe Lawrence <joe.lawrence@redhat.com> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
2025-10-15s390/vmlinux.lds.S: Prevent thunk functions from getting placed with normal textJosh Poimboeuf2-2/+2
The s390 indirect thunks are placed in the .text.__s390_indirect_jump_* sections. Certain config options which enable -ffunction-sections have a custom version of the TEXT_TEXT macro: .text.[0-9a-zA-Z_]* That unintentionally matches the thunk sections, causing them to get grouped with normal text rather than being handled by their intended rule later in the script: *(.text.*_indirect_*) Fix that by adding another period to the thunk section names, following the kernel's general convention for distinguishing code-generated text sections from compiler-generated ones. Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Petr Mladek <pmladek@suse.com> Tested-by: Joe Lawrence <joe.lawrence@redhat.com> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
2025-10-14KVM: SVM: Mark VMCB_NPT as dirty on nested VMRUNJim Mattson1-0/+1
Mark the VMCB_NPT bit as dirty in nested_vmcb02_prepare_save() on every nested VMRUN. If L1 changes the PAT MSR between two VMRUN instructions on the same L1 vCPU, the g_pat field in the associated vmcb02 will change, and the VMCB_NPT clean bit should be cleared. Fixes: 4bb170a5430b ("KVM: nSVM: do not mark all VMCB02 fields dirty on nested vmexit") Cc: stable@vger.kernel.org Signed-off-by: Jim Mattson <jmattson@google.com> Link: https://lore.kernel.org/r/20250922162935.621409-3-jmattson@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-14KVM: SVM: Mark VMCB_PERM_MAP as dirty on nested VMRUNJim Mattson1-0/+1
Mark the VMCB_PERM_MAP bit as dirty in nested_vmcb02_prepare_control() on every nested VMRUN. If L1 changes MSR interception (INTERCEPT_MSR_PROT) between two VMRUN instructions on the same L1 vCPU, the msrpm_base_pa in the associated vmcb02 will change, and the VMCB_PERM_MAP clean bit should be cleared. Fixes: 4bb170a5430b ("KVM: nSVM: do not mark all VMCB02 fields dirty on nested vmexit") Reported-by: Matteo Rizzo <matteorizzo@google.com> Cc: stable@vger.kernel.org Signed-off-by: Jim Mattson <jmattson@google.com> Link: https://lore.kernel.org/r/20250922162935.621409-2-jmattson@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-14arm64: dts: rockchip: Make RK3588 GPU OPP table naming less genericDragan Simic2-2/+2
Unify the naming of the existing GPU OPP table nodes found in the RK3588 and RK3588J SoC dtsi files with the other SoC's GPU OPP nodes, following the more "modern" node naming scheme. Fixes: a7b2070505a2 ("arm64: dts: rockchip: Split GPU OPPs of RK3588 and RK3588j") Signed-off-by: Dragan Simic <dsimic@manjaro.org> [opp-table also is way too generic on systems with like 4-5 opp-tables] Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-10-14arm64: dts: rockchip: Enable DisplayPort for rk3588-evb2Chaoyi Chen1-0/+48
The rk3588 evb2 board has a full size DisplayPort connector, enable for it. Signed-off-by: Chaoyi Chen <chaoyi.chen@rock-chips.com> Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-10-14arm64: dts: rockchip: Add devicetree for the FriendlyElec NanoPi R76STianling Shen2-0/+861
The NanoPi R76S (as "R76S") is an open-sourced mini IoT gateway device with two 2.5G, designed and developed by FriendlyElec. Specification: - Rockchip RK3576 - 2/4GB LPDDR4X RAM - 2x 2500Base-T (PCIe, rtl8125b) - 3x LEDs (Power, LAN, WAN) - 32GB eMMC - MicroSD Slot - MDMI 1.4/2.0 OUT - M.2 E-Key SDIO slot - USB 3.0 Port - USB Type-C 5V Power Signed-off-by: Tianling Shen <cnsztl@gmail.com> Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-10-14arm64: dts: rockchip: Drop 'rockchip,grf' prop from tsadc on rk3328Diederik de Haas1-1/+0
The 'rockchip,grf' property for tsadc in rk3328 wasn't actually used in the driver and is no longer allowed in the DT since commit e881662aa06a ("dt-bindings: thermal: rockchip: Tighten grf requirements") So remove that property which fixes the following DT validation issue tsadc@ff250000 (rockchip,rk3328-tsadc): rockchip,grf: False schema does not allow [[58]] Signed-off-by: Diederik de Haas <didi.debian@cknow.org> Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-10-14arm64: dts: rockchip: Remove non-functioning CPU OPPs from RK3576Alexey Charkov1-12/+0
Drop the top-frequency OPPs from both the LITTLE and big CPU clusters on RK3576, as neither the opensource TF-A [1] nor the recent (after v1.08) binary BL31 images provided by Rockchip expose those. This fixes the problem [2] when the cpufreq governor tries to jump directly to the highest-frequency OPP, which results in a failed SCMI call leaving the system stuck at the previous OPP before the attempted change. [1] https://github.com/ARM-software/arm-trusted-firmware/blob/master/plat/rockchip/rk3576/scmi/rk3576_clk.c#L264-L304 [2] https://lore.kernel.org/linux-rockchip/CABjd4Yz4NbqzZH4Qsed3ias56gcga9K6CmYA+BLDBxtbG915Ag@mail.gmail.com/ Fixes: 57b1ce903966 ("arm64: dts: rockchip: Add rk3576 SoC base DT") Cc: stable@vger.kernel.org Signed-off-by: Alexey Charkov <alchark@gmail.com> Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-10-14arm64: dts: rockchip: Fix PCIe power enable pin for BigTreeTech CB2 and Pi2Andrey Leonchikov1-1/+1
Fix typo into regulator GPIO definition. With current definition, PCIe doesn't start up. Valid definition is already used in "pinctrl" section, "pcie_drv" (gpio4, RK_PB1). Fixes: bfbc663d2733a ("arm64: dts: rockchip: Add BigTreeTech CB2 and Pi2") Signed-off-by: Andrey Leonchikov <andreil499@gmail.com> Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-10-14arm64: defconfig: Enable Rockchip extensions for Synopsys DW DPAndy Yan1-0/+1
Enable Rockchip specific extensions for Synopsys DesignWare DisplayPort driver. This is used to provide DisplayPort output support for many boards based on RK3588 SoC. Signed-off-by: Andy Yan <andyshrk@163.com> Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-10-14arm64: dts: rockchip: Set correct pinctrl for I2S1 8ch TX on odroid-m1Anand Moon1-0/+2
Enable proper pin multiplexing for the I2S1 8-channel transmit interface by adding the default pinctrl configuration which esures correct signal routing and avoids pinmux conflicts during audio playback. Changes fix the error [ 116.856643] [ T782] rockchip-pinctrl pinctrl: pin gpio1-10 already requested by affinity_hint; cannot claim for fe410000.i2s [ 116.857567] [ T782] rockchip-pinctrl pinctrl: error -EINVAL: pin-42 (fe410000.i2s) [ 116.857618] [ T782] rockchip-pinctrl pinctrl: error -EINVAL: could not request pin 42 (gpio1-10) from group i2s1m0-sdi1 on device rockchip-pinctrl [ 116.857659] [ T782] rockchip-i2s-tdm fe410000.i2s: Error applying setting, reverse things back I2S1 on the M1 to the codec in the RK809 only uses the SCLK, LRCK, SDI0 and SDO0 signals, so limit the claimed pins to those. With this change audio output works as expected: $ aplay -l **** List of PLAYBACK Hardware Devices **** card 0: HDMI [HDMI], device 0: fe400000.i2s-i2s-hifi i2s-hifi-0 [fe400000.i2s-i2s-hifi i2s-hifi-0] Subdevices: 1/1 Subdevice #0: subdevice #0 card 1: RK817 [Analog RK817], device 0: fe410000.i2s-rk817-hifi rk817-hifi-0 [fe410000.i2s-rk817-hifi rk817-hifi-0] Subdevices: 1/1 Subdevice #0: subdevice #0 Fixes: 78f858447cb7 ("arm64: dts: rockchip: Add analog audio on ODROID-M1") Cc: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Anand Moon <linux.amoon@gmail.com> [adapted the commit message a bit] Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-10-14arm64: dts: rockchip: Add DSI for RK3368WeiHao Li1-0/+37
Add the Designware MIPI DSI controller and it's port nodes. Signed-off-by: WeiHao Li <cn.liweihao@gmail.com> [removed endpoint address, as there is only one vop leading to DSI] Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-10-14arm64: dts: rockchip: Add D-PHY for RK3368WeiHao Li1-0/+11
RK3368 has a InnoSilicon D-PHY which supports DSI/LVDS/TTL with maximum trasnfer rate of 1 Gbps per lane. Signed-off-by: WeiHao Li <cn.liweihao@gmail.com> Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-10-14arm64: dts: rockchip: Add display subsystem for RK3368WeiHao Li1-0/+25
Add vop and display-subsystem nodes to RK3368's device tree. Signed-off-by: WeiHao Li <cn.liweihao@gmail.com> Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-10-14s390/mm: Support removal of boot-allocated virtual memory mapSumanth Korikkar2-9/+14
On s390, memory blocks are not currently removed via arch_remove_memory(). With upcoming dynamic memory (de)configuration support, runtime removal of memory blocks is possible. This internally involves tearing down identity mapping, virtual memory mappings and freeing the physical memory backing the struct pages metadata. During early boot, physical memory used to back the struct pages metadata in vmemmap is allocated through: setup_arch() -> sparse_init() -> sparse_init_nid() -> __populate_section_memmap() -> vmemmap_alloc_block_buf() -> sparse_buffer_alloc() -> memblock_alloc() Here, sparse_init_nid() sets up virtual-to-physical mapping for struct pages backed by memblock_alloc(). This differs from runtime addition of hotplug memory which uses the buddy allocator later. To correctly free identity mappings, vmemmap mappings during hot-remove, boot-time and runtime allocations must be distinguished using the PageReserved bit: * Boot-time memory, such as identity-mapped page tables allocated via boot_crst_alloc() and reserved via reserve_pgtables() is marked PageReserved in memmap_init_reserved_pages(). * Physical memory backing vmemmap (struct pages from memblock_alloc()) is also marked PageReserved similarly. During teardown, PageReserved bit is checked to distinguish between boot-time allocation or buddy allocation. This is similar to commit 645d5ce2f7d6 ("powerpc/mm/radix: Fix PTE/PMD fragment count for early page table mappings") Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-10-14ARM: dts: mediatek: drop wrong syscon hifsys compatible for MT2701/7623Christian Marangi2-3/+2
The syscon compatible for the hifsys node for Mediatek MT2701/MT7623 SoC was wrongly added following the pattern of other clock node but it's actually not needed as the register are not used by other device on the SoC. On top of this it's against the schema for hifsys YAML and causes a dtbs_check warning. Drop the "syscon" compatible to mute the warning and reflect the compatible property described in the mediatek,mt2701-hifsys.yaml schema. Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2025-10-14x86/alternative: Drop not needed test after call of alt_replace_call()Juergen Gross1-6/+3
alt_replace_call() will never return a negative value, so testing the return value to be less than zero can be dropped. This makes it possible to switch the return type of alt_replace_call() and the type of insn_buff_sz to unsigned int. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2025-10-14arm64: Revamp HCR_EL2.E2H RES1 detectionMarc Zyngier1-6/+32
We currently have two ways to identify CPUs that only implement FEAT_VHE and not FEAT_E2H0: - either they advertise it via ID_AA64MMFR4_EL1.E2H0, - or the HCR_EL2.E2H bit is RAO/WI However, there is a third category of "cpus" that fall between these two cases: on CPUs that do not implement FEAT_FGT, it is IMPDEF whether an access to ID_AA64MMFR4_EL1 can trap to EL2 when the register value is zero. A consequence of this is that on systems such as Neoverse V2, a NV guest cannot reliably detect that it is in a VHE-only configuration (E2H is writable, and ID_AA64MMFR0_EL1 is 0), despite the hypervisor's best effort to repaint the id register. Replace the RAO/WI test by a sequence that makes use of the VHE register remnapping between EL1 and EL2 to detect this situation, and work out whether we get the VHE behaviour even after having set HCR_EL2.E2H to 0. This solves the NV problem, and provides a more reliable acid test for CPUs that do not completely follow the letter of the architecture while providing a RES1 behaviour for HCR_EL2.E2H. Suggested-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Tested-by: Jan Kotas <jank@cadence.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/15A85F2B-1A0C-4FA7-9FE4-EEC2203CC09E@global.cadence.com
2025-10-14Use CONFIG_EXT4_FS instead of CONFIG_EXT3_FS in all of the defconfigsTheodore Ts'o109-182/+182
Commit d6ace46c82fd ("ext4: remove obsolete EXT3 config options") removed the obsolete EXT3_CONFIG options, since it had been over a decade since fs/ext3 had been removed. Unfortunately, there were a number of defconfigs that still used CONFIG_EXT3_FS which the cleanup commit didn't fix up. This led to a large number of defconfig test builds to fail. Oops. Fixes: d6ace46c82fd ("ext4: remove obsolete EXT3 config options") Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-10-14KVM: TDX: Replace kmalloc + copy_from_user with memdup_user in tdx_td_init()Thorsten Blum1-20/+10
Use get_user() to retrieve the number of entries instead of allocating memory for 'init_vm' with the maximum size, copying 'cmd->data' to it, only to then read the actual entry count 'cpuid.nent' from the copy. Use memdup_user() to allocate just enough memory to fit all entries and to copy 'cmd->data' from userspace. Use struct_size() instead of manually calculating the number of bytes to allocate and copy. No functional changes intended. Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Link: https://lore.kernel.org/r/20250916213129.2535597-2-thorsten.blum@linux.dev [sean: s/user_init_vm/user_data] Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-14KVM: nVMX: Use vcpu instead of vmx->vcpu when vcpu is availableXin Li1-4/+4
Prefer using vcpu directly when available, instead of accessing it through vmx->vcpu. Signed-off-by: Xin Li (Intel) <xin@zytor.com> Link: https://lore.kernel.org/r/20250924145421.2046822-1-xin@zytor.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-14KVM: VMX: Remove stale vmx_set_dr6() declarationDmytro Maluka1-1/+0
Remove leftover after commit 80c64c7afea1 ("KVM: x86: Drop kvm_x86_ops.set_dr6() in favor of a new KVM_RUN flag") which removed vmx_set_dr6(). Signed-off-by: Dmytro Maluka <dmaluka@chromium.org> Link: https://lore.kernel.org/r/20250926155724.1619716-1-dmaluka@chromium.org Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-14KVM: x86/mmu: Skip MMIO SPTE invalidation if enable_mmio_caching=0Dmytro Maluka1-0/+3
If MMIO caching is disabled, there are no MMIO SPTEs to invalidate, so the costly zapping of all pages is unnecessary even in the unlikely case when the MMIO generation number has wrapped. Signed-off-by: Dmytro Maluka <dmaluka@chromium.org> Link: https://lore.kernel.org/r/20250926135139.1597781-1-dmaluka@chromium.org Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-14x86/fred: Fix 64bit identifier in fred_ssAndrew Cooper3-5/+5
FRED can only be enabled in Long Mode. This is the 64bit mode (as opposed to compatibility mode) identifier, rather than being something hard-wired at 1. No functional change. Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Xin Li (Intel) <xin@zytor.com> Reviewed-by: H. Peter Anvin (Intel) <hpa@zytor.com> Acked-by: H. Peter Anvin (Intel) <hpa@zytor.com>
2025-10-13x86/mm: Fix SMP ordering in switch_mm_irqs_off()Ingo Molnar1-2/+22
Stephen noted that it is possible to not have an smp_mb() between the loaded_mm store and the tlb_gen load in switch_mm(), meaning the ordering against flush_tlb_mm_range() goes out the window, and it becomes possible for switch_mm() to not observe a recent tlb_gen update and fail to flush the TLBs. [ dhansen: merge conflict fixed by Ingo ] Fixes: 209954cbc7d0 ("x86/mm/tlb: Update mm_cpumask lazily") Reported-by: Stephen Dolan <sdolan@janestreet.com> Closes: https://lore.kernel.org/all/CAHDw0oGd0B4=uuv8NGqbUQ_ZVmSheU2bN70e4QhFXWvuAZdt2w@mail.gmail.com/ Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
2025-10-13x86/mm: Fix overflow in __cpa_addr()Rik van Riel1-1/+1
The change to have cpa_flush() call flush_kernel_pages() introduced a bug where __cpa_addr() can access an address one larger than the largest one in the cpa->pages array. KASAN reports the issue like this: BUG: KASAN: slab-out-of-bounds in __cpa_addr arch/x86/mm/pat/set_memory.c:309 [inline] BUG: KASAN: slab-out-of-bounds in __cpa_addr+0x1d3/0x220 arch/x86/mm/pat/set_memory.c:306 Read of size 8 at addr ffff88801f75e8f8 by task syz.0.17/5978 This bug could cause cpa_flush() to not properly flush memory, which somehow never showed any symptoms in my tests, possibly because cpa_flush() is called so rarely, but could potentially cause issues for other people. Fix the issue by directly calculating the flush end address from the start address. Fixes: 86e6815b316e ("x86/mm: Change cpa_flush() to call flush_kernel_range() directly") Reported-by: syzbot+afec6555eef563c66c97@syzkaller.appspotmail.com Signed-off-by: Rik van Riel <riel@surriel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Kiryl Shutsemau <kas@kernel.org> Link: https://lore.kernel.org/all/68e2ff90.050a0220.2c17c1.0038.GAE@google.com/
2025-10-13parisc: Drop padding fields and layers entries from inventory logHelge Deller1-7/+1
Drop those, as they are zero and not used by HPPA-SeaBIOS. Signed-off-by: Helge Deller <deller@gmx.de>
2025-10-13x86/resctrl: Fix miscount of bandwidth event when reactivating previously ↵Babu Moger1-4/+10
unavailable RMID Users can create as many monitoring groups as the number of RMIDs supported by the hardware. However, on AMD systems, only a limited number of RMIDs are guaranteed to be actively tracked by the hardware. RMIDs that exceed this limit are placed in an "Unavailable" state. When a bandwidth counter is read for such an RMID, the hardware sets MSR_IA32_QM_CTR.Unavailable (bit 62). When such an RMID starts being tracked again the hardware counter is reset to zero. MSR_IA32_QM_CTR.Unavailable remains set on first read after tracking re-starts and is clear on all subsequent reads as long as the RMID is tracked. resctrl miscounts the bandwidth events after an RMID transitions from the "Unavailable" state back to being tracked. This happens because when the hardware starts counting again after resetting the counter to zero, resctrl in turn compares the new count against the counter value stored from the previous time the RMID was tracked. This results in resctrl computing an event value that is either undercounting (when new counter is more than stored counter) or a mistaken overflow (when new counter is less than stored counter). Reset the stored value (arch_mbm_state::prev_msr) of MSR_IA32_QM_CTR to zero whenever the RMID is in the "Unavailable" state to ensure accurate counting after the RMID resets to zero when it starts to be tracked again. Example scenario that results in mistaken overflow ================================================== 1. The resctrl filesystem is mounted, and a task is assigned to a monitoring group. $mount -t resctrl resctrl /sys/fs/resctrl $mkdir /sys/fs/resctrl/mon_groups/test1/ $echo 1234 > /sys/fs/resctrl/mon_groups/test1/tasks $cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_*/mbm_total_bytes 21323 <- Total bytes on domain 0 "Unavailable" <- Total bytes on domain 1 Task is running on domain 0. Counter on domain 1 is "Unavailable". 2. The task runs on domain 0 for a while and then moves to domain 1. The counter starts incrementing on domain 1. $cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_*/mbm_total_bytes 7345357 <- Total bytes on domain 0 4545 <- Total bytes on domain 1 3. At some point, the RMID in domain 0 transitions to the "Unavailable" state because the task is no longer executing in that domain. $cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_*/mbm_total_bytes "Unavailable" <- Total bytes on domain 0 434341 <- Total bytes on domain 1 4. Since the task continues to migrate between domains, it may eventually return to domain 0. $cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_*/mbm_total_bytes 17592178699059 <- Overflow on domain 0 3232332 <- Total bytes on domain 1 In this case, the RMID on domain 0 transitions from "Unavailable" state to active state. The hardware sets MSR_IA32_QM_CTR.Unavailable (bit 62) when the counter is read and begins tracking the RMID counting from 0. Subsequent reads succeed but return a value smaller than the previously saved MSR value (7345357). Consequently, the resctrl's overflow logic is triggered, it compares the previous value (7345357) with the new, smaller value and incorrectly interprets this as a counter overflow, adding a large delta. In reality, this is a false positive: the counter did not overflow but was simply reset when the RMID transitioned from "Unavailable" back to active state. Here is the text from APM [1] available from [2]. "In PQOS Version 2.0 or higher, the MBM hardware will set the U bit on the first QM_CTR read when it begins tracking an RMID that it was not previously tracking. The U bit will be zero for all subsequent reads from that RMID while it is still tracked by the hardware. Therefore, a QM_CTR read with the U bit set when that RMID is in use by a processor can be considered 0 when calculating the difference with a subsequent read." [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming Publication # 24593 Revision 3.41 section 19.3.3 Monitoring L3 Memory Bandwidth (MBM). [ bp: Split commit message into smaller paragraph chunks for better consumption. ] Fixes: 4d05bf71f157d ("x86/resctrl: Introduce AMD QOS feature") Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Reinette Chatre <reinette.chatre@intel.com> Cc: stable@vger.kernel.org # needs adjustments for <= v6.17 Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 # [2]
2025-10-13ARM: dts: broadcom: rpi: Switch to V3D firmware clockStefan Wahren2-0/+17
Until commit 919d6924ae9b ("clk: bcm: rpi: Turn firmware clock on/off when preparing/unpreparing") the clk-raspberrypi driver wasn't able to change the state of the V3D clock. Only the clk-bcm2835 was able to do this before. After this commit both drivers were able to work against each other, which could result in a system freeze. One step to avoid this conflict is to switch all V3D consumer to the firmware clock. Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Closes: https://lore.kernel.org/linux-arm-kernel/727aa0c8-2981-4662-adf3-69cac2da956d@samsung.com/ Fixes: 919d6924ae9b ("clk: bcm: rpi: Turn firmware clock on/off when preparing/unpreparing") Signed-off-by: Stefan Wahren <wahrenst@gmx.net> Co-developed-by: Melissa Wen <mwen@igalia.com> Signed-off-by: Melissa Wen <mwen@igalia.com> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20251005113816.6721-1-wahrenst@gmx.net Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
2025-10-13arm64: dts: broadcom: bcm2712: Define VGIC interruptPeter Robinson1-0/+2
Define the interrupt in the GICv2 for vGIC so KVM can be used, it was missed from the original upstream DTB for some reason. Signed-off-by: Peter Robinson <pbrobinson@gmail.com> Cc: Andrea della Porta <andrea.porta@suse.com> Cc: Phil Elwell <phil@raspberrypi.com> Fixes: faa3381267d0 ("arm64: dts: broadcom: Add minimal support for Raspberry Pi 5") Link: https://lore.kernel.org/r/20250924085612.1039247-1-pbrobinson@gmail.com Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
2025-10-13x86/resctrl: Support Sub-NUMA Cluster (SNC) mode on Clearwater ForestChen Yu1-0/+1
Clearwater Forest supports SNC mode. Add it to the snc_cpu_ids[] table. Signed-off-by: Chen Yu <yu.c.chen@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Acked-by: Tony Luck <tony.luck@intel.com>
2025-10-13x86/cpufeatures: Make X86_FEATURE leaf 17 Linux-specificBorislav Petkov (AMD)5-9/+10
That cpuinfo_x86.x86_capability[] element was supposed to mirror CPUID flags from CPUID_0x80000007_EBX but that leaf has still to this day only three bits defined in it. So move those bits to scattered.c and free the capability element for synthetic flags. No functional changes. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2025-10-13riscv: dts: spacemit: add i2c aliases on BPI-F3Aurelien Jarno1-0/+2
Add i2c aliases for i2c2 and i2c8 on BPI-F3. This is useful to keep a stable number for the /dev entries after loading the i2c-dev module. Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Reviewed-by: Troy Mitchell <troy.mitchell@linux.spacemit.com> Reviewed-by: Vivian Wang <wangruikang@iscas.ac.cn> Reviewed-by: Yixun Lan <dlan@gentoo.org> Link: https://lore.kernel.org/r/20250926175833.3048516-4-aurelien@aurel32.net Signed-off-by: Yixun Lan <dlan@gentoo.org>
2025-10-13riscv: dts: spacemit: add 24c02 eeprom on BPI-F3Aurelien Jarno1-1/+24
The BPI-F3 board includes a 24c02 eeprom, that stores the MAC addresses of the two network interfaces and the board's serial number. These values are also exposed via an onie,tlv-layout nvmem layout. The eeprom is marked as read-only since its contents are not supposed to be modified. Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Reviewed-by: Troy Mitchell <troy.mitchell@linux.spacemit.com> Reviewed-by: Vivian Wang <wangruikang@iscas.ac.cn> Reviewed-by: Yixun Lan <dlan@gentoo.org> Link: https://lore.kernel.org/r/20250926175833.3048516-3-aurelien@aurel32.net Signed-off-by: Yixun Lan <dlan@gentoo.org>
2025-10-13riscv: dts: spacemit: enable the i2c2 adapter on BPI-F3Aurelien Jarno3-0/+26
Define properties for the I2C adapter, and enable it on the BPI-F3. It will be used by the 24c02 eeprom. Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Reviewed-by: Troy Mitchell <troy.mitchell@linux.spacemit.com> Reviewed-by: Vivian Wang <wangruikang@iscas.ac.cn> Reviewed-by: Yixun Lan <dlan@gentoo.org> Link: https://lore.kernel.org/r/20250926175833.3048516-2-aurelien@aurel32.net Signed-off-by: Yixun Lan <dlan@gentoo.org>
2025-10-13KVM: arm64: nv: Use FGT write trap of MDSCR_EL1 when availableOliver Upton2-4/+15
Marc reports that the performance of running an L3 guest has regressed by 60% as a result of setting MDCR_EL2.TDA to hide bad architecture. That's of course terrible for the single user of recursive NV ;-) While there's nothing to be done on non-FGT systems, take advantage of the precise write trap of MDSCR_EL1 and leave the rest of the debug registers untrapped. Reported-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-10-13KVM: arm64: Compute per-vCPU FGTs at vcpu_load()Oliver Upton5-131/+151
To date KVM has used the fine-grained traps for the sake of UNDEF enforcement (so-called FGUs), meaning the constituent parts could be computed on a per-VM basis and folded into the effective value when programmed. Prepare for traps changing based on the vCPU context by computing the whole mess of them at vcpu_load(). Aggressively inline all the helpers to preserve the build-time checks that were there before. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>