| Age | Commit message (Collapse) | Author | Files | Lines |
|
As microcode patch sizes continue to grow, late-loading latency spikes can
lead to timeouts and disruptions in running workloads. This trend of
increasing patch sizes is expected to continue, so a foundational solution is
needed to address the issue.
To mitigate the problem, introduce a microcode staging feature. This option
processes most of the microcode update (excluding activation) on
a non-critical path, allowing CPUs to remain operational during the majority
of the update. By offloading work from the critical path, staging can
significantly reduce latency spikes.
Integrate staging as a preparatory step in late-loading. Introduce a new
callback for staging, which is invoked at the beginning of
load_late_stop_cpus(), before CPUs enter the rendezvous phase.
Staging follows an opportunistic model:
* If successful, it reduces CPU rendezvous time
* Even though it fails, the process falls back to the legacy path to
finish the loading process but with potentially higher latency.
Extend struct microcode_ops to incorporate staging properties, which will be
implemented in the vendor code separately.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Tested-by: Anselm Busse <abusse@amazon.de>
Link: https://lore.kernel.org/20250320234104.8288-1-chang.seok.bae@intel.com
|
|
cpu_primary_thread_mask is only defined when CONFIG_SMP=y. However, even
in UP kernels there is always exactly one CPU, which can reasonably be
treated as the primary thread.
Historically, topology_is_primary_thread() always returned true with
CONFIG_SMP=n. A recent commit:
4b455f59945aa ("cpu/SMT: Provide a default topology_is_primary_thread()")
replaced it with a generic implementation with the note:
"When disabling SMT, the primary thread of the SMT will remain
enabled/active. Architectures that have a special primary thread (e.g.
x86) need to override this function. ..."
For consistency and clarity, make the primary thread mask available
regardless of SMP, similar to cpu_possible_mask and cpu_present_mask.
Move __cpu_primary_thread_mask into common code to prevent build issues.
Let cpu_mark_primary_thread() configure the mask even for UP kernels,
alongside other masks. Then, topology_is_primary_thread() can
consistently reference it.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20250320234104.8288-1-chang.seok.bae@intel.com
|
|
Enable the GMAC0 node for the Agilex5 device when using the NAND
daughter card.
Signed-off-by: Boon Khai Ng <boon.khai.ng@altera.com>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
|
|
Add spi-tx-bus-width and spi-rx-bus-width properties with value 4 to the
agilex5 device tree. This update configures the SPI controller to use a
4-bit bus width for both transmission and reception, potentially improving
SPI throughput and matching the hardware capabilities more closely.
Signed-off-by: Fong, Yan Kei <yan.kei.fong@altera.com>
Reviewed-by: Khairul Anuar Romli <khairul.anuar.romli@altera.com>
Reviewed-by: Matthew Gerlach <matthew.gerlach@altera.com>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
|
|
Add spi-tx-bus-width and spi-rx-bus-width properties with value 4 to the
agilex device tree. This update configures the SPI controller to use a
4-bit bus width for both transmission and reception, potentially improving
SPI throughput and matching the hardware capabilities more closely.
Signed-off-by: Fong, Yan Kei <yan.kei.fong@altera.com>
Reviewed-by: Khairul Anuar Romli <khairul.anuar.romli@altera.com>
Reviewed-by: Matthew Gerlach <matthew.gerlach@altera.com>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
|
|
Add spi-tx-bus-width and spi-rx-bus-width properties with value 4 to the
stratix10 device tree. This update configures the SPI controller to use a
4-bit bus width for both transmission and reception, potentially improving
SPI throughput and matching the hardware capabilities more closely.
Signed-off-by: Fong, Yan Kei <yan.kei.fong@altera.com>
Reviewed-by: Khairul Anuar Romli <khairul.anuar.romli@altera.com>
Reviewed-by: Matthew Gerlach <matthew.gerlach@altera.com>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
|
|
Add spi-tx-bus-width and spi-rx-bus-width properties with value 4 to the
n5x device tree. This update configures the SPI controller to use a 4-bit
bus width for both transmission and reception, potentially improving SPI
throughput and matching the hardware capabilities more closely.
Signed-off-by: Fong, Yan Kei <yan.kei.fong@altera.com>
Reviewed-by: Khairul Anuar Romli <khairul.anuar.romli@altera.com>
Reviewed-by: Matthew Gerlach <matthew.gerlach@altera.com>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
|
|
CPUID.80000008H:EBX.EferLmsleUnsupported[bit 20] is a defeature
bit. When this bit is clear, EFER.LMSLE is supported. When this bit is
set, EFER.LMLSE is unsupported. KVM has never _emulated_ EFER.LMSLE, so
KVM cannot truly support a 0-setting of this bit.
However, KVM has allowed the guest to enable EFER.LMSLE in hardware
since commit eec4b140c924 ("KVM: SVM: Allow EFER.LMSLE to be set with
nested svm"), i.e. KVM partially virtualizes long-mode segment limits _if_
they are supported by the underlying hardware.
Pass through the bit in KVM_GET_SUPPORTED_CPUID to advertise the
unavailability of EFER.LMSLE to userspace based on the raw underlying
hardware. Attempting to enable EFER.LSMLE on such CPUs simply doesn't
work, e.g. immediately crashes on VMRUN.
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Nikunj A Dadhania <nikunj@amd.com>
Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://lore.kernel.org/r/20251001001529.1119031-2-jmattson@google.com
[sean: add context about partial virtualization, use PASSTHROUGH_F]
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
In preparation for introducing klp-build, add a new CONFIG_KLP_BUILD
option. The initial version will only be supported on x86-64.
Acked-by: Petr Mladek <pmladek@suse.com>
Tested-by: Joe Lawrence <joe.lawrence@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
|
|
In preparation for the objtool klp diff subcommand, add annotations for
special section entries. This will enable objtool to determine the size
and location of the entries and to extract them when needed.
Acked-by: Petr Mladek <pmladek@suse.com>
Tested-by: Joe Lawrence <joe.lawrence@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
|
|
The INT3 call emulation selftest is a bit fragile as it relies on the
compiler not inserting any extra instructions before the
int3_selftest_ip() definition.
Also, the int3_selftest_ip() symbol overlaps with the int3_selftest
symbol(), which can confuse objtool.
Fix those issues by slightly reworking the functionality and moving
int3_selftest_ip() to a separate asm function. While at it, improve the
naming.
Acked-by: Petr Mladek <pmladek@suse.com>
Tested-by: Joe Lawrence <joe.lawrence@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
|
|
Since commit 877b145f0f47 ("x86/kprobes: Move trampoline code into
RODATA"), the optprobe template code is no longer analyzed by objtool so
it doesn't need to be ignored.
Acked-by: Petr Mladek <pmladek@suse.com>
Tested-by: Joe Lawrence <joe.lawrence@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
|
|
Add the section number and reloc index to relocation error messages to
help find the faulty relocation.
Acked-by: Petr Mladek <pmladek@suse.com>
Tested-by: Joe Lawrence <joe.lawrence@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
|
|
The s390 indirect thunks are placed in the .text.__s390_indirect_jump_*
sections.
Certain config options which enable -ffunction-sections have a custom
version of the TEXT_TEXT macro:
.text.[0-9a-zA-Z_]*
That unintentionally matches the thunk sections, causing them to get
grouped with normal text rather than being handled by their intended
rule later in the script:
*(.text.*_indirect_*)
Fix that by adding another period to the thunk section names, following
the kernel's general convention for distinguishing code-generated text
sections from compiler-generated ones.
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Petr Mladek <pmladek@suse.com>
Tested-by: Joe Lawrence <joe.lawrence@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
|
|
Mark the VMCB_NPT bit as dirty in nested_vmcb02_prepare_save()
on every nested VMRUN.
If L1 changes the PAT MSR between two VMRUN instructions on the same
L1 vCPU, the g_pat field in the associated vmcb02 will change, and the
VMCB_NPT clean bit should be cleared.
Fixes: 4bb170a5430b ("KVM: nSVM: do not mark all VMCB02 fields dirty on nested vmexit")
Cc: stable@vger.kernel.org
Signed-off-by: Jim Mattson <jmattson@google.com>
Link: https://lore.kernel.org/r/20250922162935.621409-3-jmattson@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Mark the VMCB_PERM_MAP bit as dirty in nested_vmcb02_prepare_control()
on every nested VMRUN.
If L1 changes MSR interception (INTERCEPT_MSR_PROT) between two VMRUN
instructions on the same L1 vCPU, the msrpm_base_pa in the associated
vmcb02 will change, and the VMCB_PERM_MAP clean bit should be cleared.
Fixes: 4bb170a5430b ("KVM: nSVM: do not mark all VMCB02 fields dirty on nested vmexit")
Reported-by: Matteo Rizzo <matteorizzo@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jim Mattson <jmattson@google.com>
Link: https://lore.kernel.org/r/20250922162935.621409-2-jmattson@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Unify the naming of the existing GPU OPP table nodes found in the RK3588
and RK3588J SoC dtsi files with the other SoC's GPU OPP nodes, following
the more "modern" node naming scheme.
Fixes: a7b2070505a2 ("arm64: dts: rockchip: Split GPU OPPs of RK3588 and RK3588j")
Signed-off-by: Dragan Simic <dsimic@manjaro.org>
[opp-table also is way too generic on systems with like 4-5 opp-tables]
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
|
|
The rk3588 evb2 board has a full size DisplayPort connector, enable
for it.
Signed-off-by: Chaoyi Chen <chaoyi.chen@rock-chips.com>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
|
|
The NanoPi R76S (as "R76S") is an open-sourced mini IoT gateway
device with two 2.5G, designed and developed by FriendlyElec.
Specification:
- Rockchip RK3576
- 2/4GB LPDDR4X RAM
- 2x 2500Base-T (PCIe, rtl8125b)
- 3x LEDs (Power, LAN, WAN)
- 32GB eMMC
- MicroSD Slot
- MDMI 1.4/2.0 OUT
- M.2 E-Key SDIO slot
- USB 3.0 Port
- USB Type-C 5V Power
Signed-off-by: Tianling Shen <cnsztl@gmail.com>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
|
|
The 'rockchip,grf' property for tsadc in rk3328 wasn't actually used in
the driver and is no longer allowed in the DT since commit
e881662aa06a ("dt-bindings: thermal: rockchip: Tighten grf requirements")
So remove that property which fixes the following DT validation issue
tsadc@ff250000 (rockchip,rk3328-tsadc): rockchip,grf: False schema does not allow [[58]]
Signed-off-by: Diederik de Haas <didi.debian@cknow.org>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
|
|
Drop the top-frequency OPPs from both the LITTLE and big CPU clusters on
RK3576, as neither the opensource TF-A [1] nor the recent (after v1.08)
binary BL31 images provided by Rockchip expose those.
This fixes the problem [2] when the cpufreq governor tries to jump
directly to the highest-frequency OPP, which results in a failed SCMI call
leaving the system stuck at the previous OPP before the attempted change.
[1] https://github.com/ARM-software/arm-trusted-firmware/blob/master/plat/rockchip/rk3576/scmi/rk3576_clk.c#L264-L304
[2] https://lore.kernel.org/linux-rockchip/CABjd4Yz4NbqzZH4Qsed3ias56gcga9K6CmYA+BLDBxtbG915Ag@mail.gmail.com/
Fixes: 57b1ce903966 ("arm64: dts: rockchip: Add rk3576 SoC base DT")
Cc: stable@vger.kernel.org
Signed-off-by: Alexey Charkov <alchark@gmail.com>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
|
|
Fix typo into regulator GPIO definition. With current definition, PCIe
doesn't start up. Valid definition is already used in "pinctrl" section,
"pcie_drv" (gpio4, RK_PB1).
Fixes: bfbc663d2733a ("arm64: dts: rockchip: Add BigTreeTech CB2 and Pi2")
Signed-off-by: Andrey Leonchikov <andreil499@gmail.com>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
|
|
Enable Rockchip specific extensions for Synopsys DesignWare DisplayPort
driver. This is used to provide DisplayPort output support for many boards
based on RK3588 SoC.
Signed-off-by: Andy Yan <andyshrk@163.com>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
|
|
Enable proper pin multiplexing for the I2S1 8-channel transmit interface by
adding the default pinctrl configuration which esures correct signal routing
and avoids pinmux conflicts during audio playback.
Changes fix the error
[ 116.856643] [ T782] rockchip-pinctrl pinctrl: pin gpio1-10 already requested by affinity_hint; cannot claim for fe410000.i2s
[ 116.857567] [ T782] rockchip-pinctrl pinctrl: error -EINVAL: pin-42 (fe410000.i2s)
[ 116.857618] [ T782] rockchip-pinctrl pinctrl: error -EINVAL: could not request pin 42 (gpio1-10) from group i2s1m0-sdi1 on device rockchip-pinctrl
[ 116.857659] [ T782] rockchip-i2s-tdm fe410000.i2s: Error applying setting, reverse things back
I2S1 on the M1 to the codec in the RK809 only uses the SCLK, LRCK, SDI0
and SDO0 signals, so limit the claimed pins to those.
With this change audio output works as expected:
$ aplay -l
**** List of PLAYBACK Hardware Devices ****
card 0: HDMI [HDMI], device 0: fe400000.i2s-i2s-hifi i2s-hifi-0 [fe400000.i2s-i2s-hifi i2s-hifi-0]
Subdevices: 1/1
Subdevice #0: subdevice #0
card 1: RK817 [Analog RK817], device 0: fe410000.i2s-rk817-hifi rk817-hifi-0 [fe410000.i2s-rk817-hifi rk817-hifi-0]
Subdevices: 1/1
Subdevice #0: subdevice #0
Fixes: 78f858447cb7 ("arm64: dts: rockchip: Add analog audio on ODROID-M1")
Cc: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Anand Moon <linux.amoon@gmail.com>
[adapted the commit message a bit]
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
|
|
Add the Designware MIPI DSI controller and it's port nodes.
Signed-off-by: WeiHao Li <cn.liweihao@gmail.com>
[removed endpoint address, as there is only one vop leading to DSI]
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
|
|
RK3368 has a InnoSilicon D-PHY which supports DSI/LVDS/TTL with maximum
trasnfer rate of 1 Gbps per lane.
Signed-off-by: WeiHao Li <cn.liweihao@gmail.com>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
|
|
Add vop and display-subsystem nodes to RK3368's device tree.
Signed-off-by: WeiHao Li <cn.liweihao@gmail.com>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
|
|
On s390, memory blocks are not currently removed via
arch_remove_memory(). With upcoming dynamic memory (de)configuration
support, runtime removal of memory blocks is possible. This internally
involves tearing down identity mapping, virtual memory mappings and
freeing the physical memory backing the struct pages metadata.
During early boot, physical memory used to back the struct pages
metadata in vmemmap is allocated through:
setup_arch()
-> sparse_init()
-> sparse_init_nid()
-> __populate_section_memmap()
-> vmemmap_alloc_block_buf()
-> sparse_buffer_alloc()
-> memblock_alloc()
Here, sparse_init_nid() sets up virtual-to-physical mapping for struct
pages backed by memblock_alloc(). This differs from runtime addition of
hotplug memory which uses the buddy allocator later.
To correctly free identity mappings, vmemmap mappings during hot-remove,
boot-time and runtime allocations must be distinguished using the
PageReserved bit:
* Boot-time memory, such as identity-mapped page tables allocated via
boot_crst_alloc() and reserved via reserve_pgtables() is marked
PageReserved in memmap_init_reserved_pages().
* Physical memory backing vmemmap (struct pages from memblock_alloc())
is also marked PageReserved similarly.
During teardown, PageReserved bit is checked to distinguish between
boot-time allocation or buddy allocation.
This is similar to commit 645d5ce2f7d6 ("powerpc/mm/radix: Fix PTE/PMD
fragment count for early page table mappings")
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
The syscon compatible for the hifsys node for Mediatek MT2701/MT7623 SoC
was wrongly added following the pattern of other clock node but it's
actually not needed as the register are not used by other device on the
SoC.
On top of this it's against the schema for hifsys YAML and causes a
dtbs_check warning.
Drop the "syscon" compatible to mute the warning and reflect the
compatible property described in the mediatek,mt2701-hifsys.yaml schema.
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
|
|
alt_replace_call() will never return a negative value, so testing the
return value to be less than zero can be dropped.
This makes it possible to switch the return type of alt_replace_call()
and the type of insn_buff_sz to unsigned int.
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
|
|
We currently have two ways to identify CPUs that only implement FEAT_VHE
and not FEAT_E2H0:
- either they advertise it via ID_AA64MMFR4_EL1.E2H0,
- or the HCR_EL2.E2H bit is RAO/WI
However, there is a third category of "cpus" that fall between these
two cases: on CPUs that do not implement FEAT_FGT, it is IMPDEF whether
an access to ID_AA64MMFR4_EL1 can trap to EL2 when the register value
is zero.
A consequence of this is that on systems such as Neoverse V2, a NV
guest cannot reliably detect that it is in a VHE-only configuration
(E2H is writable, and ID_AA64MMFR0_EL1 is 0), despite the hypervisor's
best effort to repaint the id register.
Replace the RAO/WI test by a sequence that makes use of the VHE
register remnapping between EL1 and EL2 to detect this situation,
and work out whether we get the VHE behaviour even after having
set HCR_EL2.E2H to 0.
This solves the NV problem, and provides a more reliable acid test
for CPUs that do not completely follow the letter of the architecture
while providing a RES1 behaviour for HCR_EL2.E2H.
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Tested-by: Jan Kotas <jank@cadence.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/15A85F2B-1A0C-4FA7-9FE4-EEC2203CC09E@global.cadence.com
|
|
Commit d6ace46c82fd ("ext4: remove obsolete EXT3 config options")
removed the obsolete EXT3_CONFIG options, since it had been over a
decade since fs/ext3 had been removed. Unfortunately, there were a
number of defconfigs that still used CONFIG_EXT3_FS which the cleanup
commit didn't fix up. This led to a large number of defconfig test
builds to fail. Oops.
Fixes: d6ace46c82fd ("ext4: remove obsolete EXT3 config options")
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Use get_user() to retrieve the number of entries instead of allocating
memory for 'init_vm' with the maximum size, copying 'cmd->data' to it,
only to then read the actual entry count 'cpuid.nent' from the copy.
Use memdup_user() to allocate just enough memory to fit all entries and
to copy 'cmd->data' from userspace. Use struct_size() instead of
manually calculating the number of bytes to allocate and copy.
No functional changes intended.
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Link: https://lore.kernel.org/r/20250916213129.2535597-2-thorsten.blum@linux.dev
[sean: s/user_init_vm/user_data]
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Prefer using vcpu directly when available, instead of accessing it
through vmx->vcpu.
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
Link: https://lore.kernel.org/r/20250924145421.2046822-1-xin@zytor.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Remove leftover after commit 80c64c7afea1 ("KVM: x86: Drop
kvm_x86_ops.set_dr6() in favor of a new KVM_RUN flag") which removed
vmx_set_dr6().
Signed-off-by: Dmytro Maluka <dmaluka@chromium.org>
Link: https://lore.kernel.org/r/20250926155724.1619716-1-dmaluka@chromium.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
If MMIO caching is disabled, there are no MMIO SPTEs to invalidate, so
the costly zapping of all pages is unnecessary even in the unlikely case
when the MMIO generation number has wrapped.
Signed-off-by: Dmytro Maluka <dmaluka@chromium.org>
Link: https://lore.kernel.org/r/20250926135139.1597781-1-dmaluka@chromium.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
FRED can only be enabled in Long Mode. This is the 64bit mode (as opposed to
compatibility mode) identifier, rather than being something hard-wired at 1.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Xin Li (Intel) <xin@zytor.com>
Reviewed-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Acked-by: H. Peter Anvin (Intel) <hpa@zytor.com>
|
|
Stephen noted that it is possible to not have an smp_mb() between
the loaded_mm store and the tlb_gen load in switch_mm(), meaning the
ordering against flush_tlb_mm_range() goes out the window, and it
becomes possible for switch_mm() to not observe a recent tlb_gen
update and fail to flush the TLBs.
[ dhansen: merge conflict fixed by Ingo ]
Fixes: 209954cbc7d0 ("x86/mm/tlb: Update mm_cpumask lazily")
Reported-by: Stephen Dolan <sdolan@janestreet.com>
Closes: https://lore.kernel.org/all/CAHDw0oGd0B4=uuv8NGqbUQ_ZVmSheU2bN70e4QhFXWvuAZdt2w@mail.gmail.com/
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
|
|
The change to have cpa_flush() call flush_kernel_pages() introduced
a bug where __cpa_addr() can access an address one larger than the
largest one in the cpa->pages array.
KASAN reports the issue like this:
BUG: KASAN: slab-out-of-bounds in __cpa_addr arch/x86/mm/pat/set_memory.c:309 [inline]
BUG: KASAN: slab-out-of-bounds in __cpa_addr+0x1d3/0x220 arch/x86/mm/pat/set_memory.c:306
Read of size 8 at addr ffff88801f75e8f8 by task syz.0.17/5978
This bug could cause cpa_flush() to not properly flush memory,
which somehow never showed any symptoms in my tests, possibly
because cpa_flush() is called so rarely, but could potentially
cause issues for other people.
Fix the issue by directly calculating the flush end address
from the start address.
Fixes: 86e6815b316e ("x86/mm: Change cpa_flush() to call flush_kernel_range() directly")
Reported-by: syzbot+afec6555eef563c66c97@syzkaller.appspotmail.com
Signed-off-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Kiryl Shutsemau <kas@kernel.org>
Link: https://lore.kernel.org/all/68e2ff90.050a0220.2c17c1.0038.GAE@google.com/
|
|
Drop those, as they are zero and not used by HPPA-SeaBIOS.
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
unavailable RMID
Users can create as many monitoring groups as the number of RMIDs supported
by the hardware. However, on AMD systems, only a limited number of RMIDs
are guaranteed to be actively tracked by the hardware. RMIDs that exceed
this limit are placed in an "Unavailable" state.
When a bandwidth counter is read for such an RMID, the hardware sets
MSR_IA32_QM_CTR.Unavailable (bit 62). When such an RMID starts being tracked
again the hardware counter is reset to zero. MSR_IA32_QM_CTR.Unavailable
remains set on first read after tracking re-starts and is clear on all
subsequent reads as long as the RMID is tracked.
resctrl miscounts the bandwidth events after an RMID transitions from the
"Unavailable" state back to being tracked. This happens because when the
hardware starts counting again after resetting the counter to zero, resctrl
in turn compares the new count against the counter value stored from the
previous time the RMID was tracked.
This results in resctrl computing an event value that is either undercounting
(when new counter is more than stored counter) or a mistaken overflow (when
new counter is less than stored counter).
Reset the stored value (arch_mbm_state::prev_msr) of MSR_IA32_QM_CTR to
zero whenever the RMID is in the "Unavailable" state to ensure accurate
counting after the RMID resets to zero when it starts to be tracked again.
Example scenario that results in mistaken overflow
==================================================
1. The resctrl filesystem is mounted, and a task is assigned to a
monitoring group.
$mount -t resctrl resctrl /sys/fs/resctrl
$mkdir /sys/fs/resctrl/mon_groups/test1/
$echo 1234 > /sys/fs/resctrl/mon_groups/test1/tasks
$cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_*/mbm_total_bytes
21323 <- Total bytes on domain 0
"Unavailable" <- Total bytes on domain 1
Task is running on domain 0. Counter on domain 1 is "Unavailable".
2. The task runs on domain 0 for a while and then moves to domain 1. The
counter starts incrementing on domain 1.
$cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_*/mbm_total_bytes
7345357 <- Total bytes on domain 0
4545 <- Total bytes on domain 1
3. At some point, the RMID in domain 0 transitions to the "Unavailable"
state because the task is no longer executing in that domain.
$cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_*/mbm_total_bytes
"Unavailable" <- Total bytes on domain 0
434341 <- Total bytes on domain 1
4. Since the task continues to migrate between domains, it may eventually
return to domain 0.
$cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_*/mbm_total_bytes
17592178699059 <- Overflow on domain 0
3232332 <- Total bytes on domain 1
In this case, the RMID on domain 0 transitions from "Unavailable" state to
active state. The hardware sets MSR_IA32_QM_CTR.Unavailable (bit 62) when
the counter is read and begins tracking the RMID counting from 0.
Subsequent reads succeed but return a value smaller than the previously
saved MSR value (7345357). Consequently, the resctrl's overflow logic is
triggered, it compares the previous value (7345357) with the new, smaller
value and incorrectly interprets this as a counter overflow, adding a large
delta.
In reality, this is a false positive: the counter did not overflow but was
simply reset when the RMID transitioned from "Unavailable" back to active
state.
Here is the text from APM [1] available from [2].
"In PQOS Version 2.0 or higher, the MBM hardware will set the U bit on the
first QM_CTR read when it begins tracking an RMID that it was not
previously tracking. The U bit will be zero for all subsequent reads from
that RMID while it is still tracked by the hardware. Therefore, a QM_CTR
read with the U bit set when that RMID is in use by a processor can be
considered 0 when calculating the difference with a subsequent read."
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3 Monitoring L3 Memory
Bandwidth (MBM).
[ bp: Split commit message into smaller paragraph chunks for better
consumption. ]
Fixes: 4d05bf71f157d ("x86/resctrl: Introduce AMD QOS feature")
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Cc: stable@vger.kernel.org # needs adjustments for <= v6.17
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 # [2]
|
|
Until commit 919d6924ae9b ("clk: bcm: rpi: Turn firmware clock on/off
when preparing/unpreparing") the clk-raspberrypi driver wasn't able
to change the state of the V3D clock. Only the clk-bcm2835 was able
to do this before. After this commit both drivers were able to work
against each other, which could result in a system freeze. One step
to avoid this conflict is to switch all V3D consumer to the firmware
clock.
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Closes: https://lore.kernel.org/linux-arm-kernel/727aa0c8-2981-4662-adf3-69cac2da956d@samsung.com/
Fixes: 919d6924ae9b ("clk: bcm: rpi: Turn firmware clock on/off when preparing/unpreparing")
Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
Co-developed-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Melissa Wen <mwen@igalia.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20251005113816.6721-1-wahrenst@gmx.net
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
|
|
Define the interrupt in the GICv2 for vGIC so KVM
can be used, it was missed from the original upstream
DTB for some reason.
Signed-off-by: Peter Robinson <pbrobinson@gmail.com>
Cc: Andrea della Porta <andrea.porta@suse.com>
Cc: Phil Elwell <phil@raspberrypi.com>
Fixes: faa3381267d0 ("arm64: dts: broadcom: Add minimal support for Raspberry Pi 5")
Link: https://lore.kernel.org/r/20250924085612.1039247-1-pbrobinson@gmail.com
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
|
|
Clearwater Forest supports SNC mode. Add it to the snc_cpu_ids[] table.
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Acked-by: Tony Luck <tony.luck@intel.com>
|
|
That cpuinfo_x86.x86_capability[] element was supposed to mirror CPUID flags
from CPUID_0x80000007_EBX but that leaf has still to this day only three bits
defined in it. So move those bits to scattered.c and free the capability
element for synthetic flags.
No functional changes.
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
|
|
Add i2c aliases for i2c2 and i2c8 on BPI-F3. This is useful to keep a
stable number for the /dev entries after loading the i2c-dev module.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Troy Mitchell <troy.mitchell@linux.spacemit.com>
Reviewed-by: Vivian Wang <wangruikang@iscas.ac.cn>
Reviewed-by: Yixun Lan <dlan@gentoo.org>
Link: https://lore.kernel.org/r/20250926175833.3048516-4-aurelien@aurel32.net
Signed-off-by: Yixun Lan <dlan@gentoo.org>
|
|
The BPI-F3 board includes a 24c02 eeprom, that stores the MAC addresses
of the two network interfaces and the board's serial number. These
values are also exposed via an onie,tlv-layout nvmem layout.
The eeprom is marked as read-only since its contents are not supposed to
be modified.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Troy Mitchell <troy.mitchell@linux.spacemit.com>
Reviewed-by: Vivian Wang <wangruikang@iscas.ac.cn>
Reviewed-by: Yixun Lan <dlan@gentoo.org>
Link: https://lore.kernel.org/r/20250926175833.3048516-3-aurelien@aurel32.net
Signed-off-by: Yixun Lan <dlan@gentoo.org>
|
|
Define properties for the I2C adapter, and enable it on the BPI-F3. It
will be used by the 24c02 eeprom.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Troy Mitchell <troy.mitchell@linux.spacemit.com>
Reviewed-by: Vivian Wang <wangruikang@iscas.ac.cn>
Reviewed-by: Yixun Lan <dlan@gentoo.org>
Link: https://lore.kernel.org/r/20250926175833.3048516-2-aurelien@aurel32.net
Signed-off-by: Yixun Lan <dlan@gentoo.org>
|
|
Marc reports that the performance of running an L3 guest has regressed
by 60% as a result of setting MDCR_EL2.TDA to hide bad architecture.
That's of course terrible for the single user of recursive NV ;-)
While there's nothing to be done on non-FGT systems, take advantage of
the precise write trap of MDSCR_EL1 and leave the rest of the debug
registers untrapped.
Reported-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
To date KVM has used the fine-grained traps for the sake of UNDEF
enforcement (so-called FGUs), meaning the constituent parts could be
computed on a per-VM basis and folded into the effective value when
programmed.
Prepare for traps changing based on the vCPU context by computing the
whole mess of them at vcpu_load(). Aggressively inline all the helpers
to preserve the build-time checks that were there before.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|