Age | Commit message (Collapse) | Author | Files | Lines |
|
Enable the SD_ASYM_PACKING domain flag for the PKG domain on AMD heterogeneous
processors. This flag is beneficial for processors with one or more CCDs and
relies on x86_sched_itmt_flags().
Signed-off-by: Perry Yuan <perry.yuan@amd.com>
Co-developed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Link: https://lore.kernel.org/r/20241025171459.1093-4-mario.limonciello@amd.com
|
|
CPUID leaf 0x80000026 advertises core types with different efficiency
rankings.
Bit 30 indicates the heterogeneous core topology feature, if the bit
set, it means not all instances at the current hierarchical level have
the same core topology.
This is described in the AMD64 Architecture Programmers Manual Volume
2 and 3, doc ID #25493 and #25494.
Signed-off-by: Perry Yuan <perry.yuan@amd.com>
Co-developed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20241025171459.1093-3-mario.limonciello@amd.com
|
|
This feature is an AMD unique feature of some processors, so put
AMD into the name.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20241025171459.1093-2-mario.limonciello@amd.com
|
|
It turns out that AMD has a "Meltdown Lite(tm)" issue with non-canonical
accesses in kernel space. And so using just the high bit to decide
whether an access is in user space or kernel space ends up with the good
old "leak speculative data" if you have the right gadget using the
result:
CVE-2020-12965 “Transient Execution of Non-Canonical Accesses“
Now, the kernel surrounds the access with a STAC/CLAC pair, and those
instructions end up serializing execution on older Zen architectures,
which closes the speculation window.
But that was true only up until Zen 5, which renames the AC bit [1].
That improves performance of STAC/CLAC a lot, but also means that the
speculation window is now open.
Note that this affects not just the new address masking, but also the
regular valid_user_address() check used by access_ok(), and the asm
version of the sign bit check in the get_user() helpers.
It does not affect put_user() or clear_user() variants, since there's no
speculative result to be used in a gadget for those operations.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Link: https://lore.kernel.org/all/80d94591-1297-4afb-b510-c665efd37f10@citrix.com/
Link: https://lore.kernel.org/all/20241023094448.GAZxjFkEOOF_DM83TQ@fat_crate.local/ [1]
Link: https://www.amd.com/en/resources/product-security/bulletin/amd-sb-1010.html
Link: https://arxiv.org/pdf/2108.10771
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Tested-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com> # LAM case
Fixes: 2865baf54077 ("x86: support user address masking instead of non-speculative conditional")
Fixes: 6014bc27561f ("x86-64: make access_ok() independent of LAM")
Fixes: b19b74bc99b1 ("x86/mm: Rework address range check in get_user() and put_user()")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Currently, an unconditional cache flush is performed during every
microcode update. Although the original changelog did not mention
a specific erratum, this measure was primarily intended to address
a specific microcode bug, the load of which has already been blocked by
is_blacklisted(). Therefore, this cache flush is no longer necessary.
Additionally, the side effects of doing this have been overlooked. It
increases CPU rendezvous time during late loading, where the cache flush
takes between 1x to 3.5x longer than the actual microcode update.
Remove native_wbinvd() and update the erratum name to align with the
latest errata documentation, document ID 334163 Version 022US.
[ bp: Zap the flaky documentation URL. ]
Fixes: 91df9fdf5149 ("x86/microcode/intel: Writeback and invalidate caches before updating microcode")
Reported-by: Yan Hua Wu <yanhua1.wu@intel.com>
Reported-by: William Xie <william.xie@intel.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Ashok Raj <ashok.raj@intel.com>
Tested-by: Yan Hua Wu <yanhua1.wu@intel.com>
Link: https://lore.kernel.org/r/20241001161042.465584-2-chang.seok.bae@intel.com
|
|
This function should've been split a long time ago because it is used in
two paths:
1) On the late loading path, when the microcode is loaded through the
request_firmware interface
2) In the save_microcode_in_initrd() path which collects all the
microcode patches which are relevant for the current system before
the initrd with the microcode container has been jettisoned.
In that path, it is not really necessary to iterate over the nodes on
a system and match a patch however it didn't cause any trouble so it
was left for a later cleanup
However, that later cleanup was expedited by the fact that Jens was
enabling "Use L3 as a NUMA node" in the BIOS setting in his machine and
so this causes the NUMA CPU masks used in cpumask_of_node() to be
generated *after* 2) above happened on the first node. Which means, all
those masks were funky, wrong, uninitialized and whatnot, leading to
explosions when dereffing c->microcode in load_microcode_amd().
So split that function and do only the necessary work needed at each
stage.
Fixes: 94838d230a6c ("x86/microcode/AMD: Use the family,model,stepping encoded in the patch ID")
Reported-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/91194406-3fdf-4e38-9838-d334af538f74@kernel.dk
|
|
Commit in Fixes changed how a microcode patch is loaded on Zen and newer but
the patch matching needs to happen with different rigidity, depending on what
is being done:
1) When the patch is added to the patches cache, the stepping must be ignored
because the driver still supports different steppings per system
2) When the patch is matched for loading, then the stepping must be taken into
account because each CPU needs the patch matching its exact stepping
Take care of that by making the matching smarter.
Fixes: 94838d230a6c ("x86/microcode/AMD: Use the family,model,stepping encoded in the patch ID")
Reported-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/91194406-3fdf-4e38-9838-d334af538f74@kernel.dk
|
|
Pull kvm fixes from Paolo Bonzini:
"ARM64:
- Fix the guest view of the ID registers, making the relevant fields
writable from userspace (affecting ID_AA64DFR0_EL1 and
ID_AA64PFR1_EL1)
- Correcly expose S1PIE to guests, fixing a regression introduced in
6.12-rc1 with the S1POE support
- Fix the recycling of stage-2 shadow MMUs by tracking the context
(are we allowed to block or not) as well as the recycling state
- Address a couple of issues with the vgic when userspace
misconfigures the emulation, resulting in various splats. Headaches
courtesy of our Syzkaller friends
- Stop wasting space in the HYP idmap, as we are dangerously close to
the 4kB limit, and this has already exploded in -next
- Fix another race in vgic_init()
- Fix a UBSAN error when faking the cache topology with MTE enabled
RISCV:
- RISCV: KVM: use raw_spinlock for critical section in imsic
x86:
- A bandaid for lack of XCR0 setup in selftests, which causes trouble
if the compiler is configured to have x86-64-v3 (with AVX) as the
default ISA. Proper XCR0 setup will come in the next merge window.
- Fix an issue where KVM would not ignore low bits of the nested CR3
and potentially leak up to 31 bytes out of the guest memory's
bounds
- Fix case in which an out-of-date cached value for the segments
could by returned by KVM_GET_SREGS.
- More cleanups for KVM_X86_QUIRK_SLOT_ZAP_ALL
- Override MTRR state for KVM confidential guests, making it WB by
default as is already the case for Hyper-V guests.
Generic:
- Remove a couple of unused functions"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (27 commits)
RISCV: KVM: use raw_spinlock for critical section in imsic
KVM: selftests: Fix out-of-bounds reads in CPUID test's array lookups
KVM: selftests: x86: Avoid using SSE/AVX instructions
KVM: nSVM: Ignore nCR3[4:0] when loading PDPTEs from memory
KVM: VMX: reset the segment cache after segment init in vmx_vcpu_reset()
KVM: x86: Clean up documentation for KVM_X86_QUIRK_SLOT_ZAP_ALL
KVM: x86/mmu: Add lockdep assert to enforce safe usage of kvm_unmap_gfn_range()
KVM: x86/mmu: Zap only SPs that shadow gPTEs when deleting memslot
x86/kvm: Override default caching mode for SEV-SNP and TDX
KVM: Remove unused kvm_vcpu_gfn_to_pfn_atomic
KVM: Remove unused kvm_vcpu_gfn_to_pfn
KVM: arm64: Ensure vgic_ready() is ordered against MMIO registration
KVM: arm64: vgic: Don't check for vgic_ready() when setting NR_IRQS
KVM: arm64: Fix shift-out-of-bounds bug
KVM: arm64: Shave a few bytes from the EL2 idmap code
KVM: arm64: Don't eagerly teardown the vgic on init error
KVM: arm64: Expose S1PIE to guests
KVM: arm64: nv: Clarify safety of allowing TLBI unmaps to reschedule
KVM: arm64: nv: Punt stage-2 recycling to a vCPU request
KVM: arm64: nv: Do not block when unmapping stage-2 if disallowed
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Explicitly disable the TSC deadline timer when going idle to address
some CPU errata in that area
- Do not apply the Zenbleed fix on anything else except AMD Zen2 on the
late microcode loading path
- Clear CPU buffers later in the NMI exit path on 32-bit to avoid
register clearing while they still contain sensitive data, for the
RDFS mitigation
- Do not clobber EFLAGS.ZF with VERW on the opportunistic SYSRET exit
path on 32-bit
- Fix parsing issues of memory bandwidth specification in sysfs for
resctrl's memory bandwidth allocation feature
- Other small cleanups and improvements
* tag 'x86_urgent_for_v6.12_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/apic: Always explicitly disarm TSC-deadline timer
x86/CPU/AMD: Only apply Zenbleed fix for Zen2 during late microcode load
x86/bugs: Use code segment selector for VERW operand
x86/entry_32: Clear CPU buffers after register restore in NMI return
x86/entry_32: Do not clobber user EFLAGS.ZF
x86/resctrl: Annotate get_mem_config() functions as __init
x86/resctrl: Avoid overflow in MB settings in bw_validate()
x86/amd_nb: Add new PCI ID for AMD family 1Ah model 20h
|
|
AMD SEV-SNP and Intel TDX have limited access to MTRR: either it is not
advertised in CPUID or it cannot be programmed (on TDX, due to #VE on
CR0.CD clear).
This results in guests using uncached mappings where it shouldn't and
pmd/pud_set_huge() failures due to non-uniform memory type reported by
mtrr_type_lookup().
Override MTRR state, making it WB by default as the kernel does for
Hyper-V guests.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Suggested-by: Binbin Wu <binbin.wu@intel.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Message-ID: <20241015095818.357915-1-kirill.shutemov@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
When arch_stack_walk_reliable() is called to unwind for newly forked
tasks, the return value is negative which means the call stack is
unreliable. This obviously does not meet expectations.
The root cause is that after commit 3aec4ecb3d1f ("x86: Rewrite
ret_from_fork() in C"), the 'ret_addr' of newly forked task is changed
to 'ret_from_fork_asm' (see copy_thread()), then at the start of the
unwind, it is incorrectly interprets not as a "signal" one because
'ret_from_fork' is still used to determine the initial "signal" (see
__unwind_start()). Then the address gets incorrectly decremented in the
call to orc_find() (see unwind_next_frame()) and resulting in the
incorrect ORC data.
To fix it, check 'ret_from_fork_asm' rather than 'ret_from_fork' in
__unwind_start().
Fixes: 3aec4ecb3d1f ("x86: Rewrite ret_from_fork() in C")
Signed-off-by: Zheng Yejian <zhengyejian@huaweicloud.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
|
|
When kernel IBT is enabled, objtool detects all text references in order
to determine which functions can be indirectly branched to.
In text, such references look like one of the following:
mov $0x0,%rax R_X86_64_32S .init.text+0x7e0a0
lea 0x0(%rip),%rax R_X86_64_PC32 autoremove_wake_function-0x4
Either way the function pointer is denoted by a relocation, so objtool
just reads that.
However there are some "lea xxx(%rip)" cases which don't use relocations
because they're referencing code in the same translation unit. Objtool
doesn't have visibility to those.
The only currently known instances of that are a few hand-coded asm text
references which don't actually need ENDBR. So it's not actually a
problem at the moment.
However if we enable -fpie, the compiler would start generating them and
there would definitely be bugs in the IBT sealing.
Detect non-relocated text references and handle them appropriately.
[ Note: I removed the manual static_call_tramp check -- that should
already be handled by the noendbr check. ]
Reported-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
|
|
Use the irq_get_nr_irqs() and irq_set_nr_irqs() functions instead of the
global variable 'nr_irqs'. Prepare for changing 'nr_irqs' from an
exported global variable into a variable with file scope.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20241015190953.1266194-7-bvanassche@acm.org
|
|
New processors have become pickier about the local APIC timer state
before entering low power modes. These low power modes are used (for
example) when you close your laptop lid and suspend. If you put your
laptop in a bag and it is not in this low power mode, it is likely
to get quite toasty while it quickly sucks the battery dry.
The problem boils down to some CPUs' inability to power down until the
CPU recognizes that the local APIC timer is shut down. The current
kernel code works in one-shot and periodic modes but does not work for
deadline mode. Deadline mode has been the supported and preferred mode
on Intel CPUs for over a decade and uses an MSR to drive the timer
instead of an APIC register.
Disable the TSC Deadline timer in lapic_timer_shutdown() by writing to
MSR_IA32_TSC_DEADLINE when in TSC-deadline mode. Also avoid writing
to the initial-count register (APIC_TMICT) which is ignored in
TSC-deadline mode.
Note: The APIC_LVTT|=APIC_LVT_MASKED operation should theoretically be
enough to tell the hardware that the timer will not fire in any of the
timer modes. But mitigating AMD erratum 411[1] also requires clearing
out APIC_TMICT. Solely setting APIC_LVT_MASKED is also ineffective in
practice on Intel Lunar Lake systems, which is the motivation for this
change.
1. 411 Processor May Exit Message-Triggered C1E State Without an Interrupt if Local APIC Timer Reaches Zero - https://www.amd.com/content/dam/amd/en/documents/archived-tech-docs/revision-guides/41322_10h_Rev_Gd.pdf
Fixes: 279f1461432c ("x86: apic: Use tsc deadline for oneshot when available")
Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Tested-by: Todd Brandt <todd.e.brandt@intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20241015061522.25288-1-rui.zhang%40intel.com
|
|
'mon_info' is already zeroed in the list_for_each_entry() loop below. There
is no need to explicitly initialize it here. It just wastes some space and
cycles.
Remove this un-needed code.
On a x86_64, with allmodconfig:
Before:
======
text data bss dec hex filename
74967 5103 1880 81950 1401e arch/x86/kernel/cpu/resctrl/rdtgroup.o
After:
=====
text data bss dec hex filename
74903 5103 1880 81886 13fde arch/x86/kernel/cpu/resctrl/rdtgroup.o
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/r/b2ebc809c8b6c6440d17b12ccf7c2d29aaafd488.1720868538.git.christophe.jaillet@wanadoo.fr
|
|
Commit
f69759be251d ("x86/CPU/AMD: Move Zenbleed check to the Zen2 init function")
causes a bit in the DE_CFG MSR to get set erroneously after a microcode late
load.
The microcode late load path calls into amd_check_microcode() and subsequently
zen2_zenbleed_check(). Since the above commit removes the cpu_has_amd_erratum()
call from zen2_zenbleed_check(), this will cause all non-Zen2 CPUs to go
through the function and set the bit in the DE_CFG MSR.
Call into the Zenbleed fix path on Zen2 CPUs only.
[ bp: Massage commit message, use cpu_feature_enabled(). ]
Fixes: f69759be251d ("x86/CPU/AMD: Move Zenbleed check to the Zen2 init function")
Signed-off-by: John Allen <john.allen@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20240923164404.27227-1-john.allen@amd.com
|
|
ftrace_regs was created to hold registers that store information to save
function parameters, return value and stack. Since it is a subset of
pt_regs, it should only be used by its accessor functions. But because
pt_regs can easily be taken from ftrace_regs (on most archs), it is
tempting to use it directly. But when running on other architectures, it
may fail to build or worse, build but crash the kernel!
Instead, make struct ftrace_regs an empty structure and have the
architectures define __arch_ftrace_regs and all the accessor functions
will typecast to it to get to the actual fields. This will help avoid
usage of ftrace_regs directly.
Link: https://lore.kernel.org/all/20241007171027.629bdafd@gandalf.local.home/
Cc: "linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>
Cc: "x86@kernel.org" <x86@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Naveen N Rao <naveen@kernel.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/20241008230628.958778821@goodmis.org
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com> # s390
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Since X86_FEATURE_ENTRY_IBPB will invalidate all harmful predictions
with IBPB, no software-based untraining of returns is needed anymore.
Currently, this change affects retbleed and SRSO mitigations so if
either of the mitigations is doing IBPB and the other one does the
software sequence, the latter is not needed anymore.
[ bp: Massage commit message. ]
Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Johannes Wikner <kwikner@ethz.ch>
Cc: <stable@kernel.org>
|
|
entry_ibpb() is designed to follow Intel's IBPB specification regardless
of CPU. This includes invalidating RSB entries.
Hence, if IBPB on VMEXIT has been selected, entry_ibpb() as part of the
RET untraining in the VMEXIT path will take care of all BTB and RSB
clearing so there's no need to explicitly fill the RSB anymore.
[ bp: Massage commit message. ]
Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Johannes Wikner <kwikner@ethz.ch>
Cc: <stable@kernel.org>
|
|
Set this flag if the CPU has an IBPB implementation that does not
invalidate return target predictions. Zen generations < 4 do not flush
the RSB when executing an IBPB and this bug flag denotes that.
[ bp: Massage. ]
Signed-off-by: Johannes Wikner <kwikner@ethz.ch>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org>
|
|
After a recent LLVM change [1] that deduces __cold on functions that only call
cold code (such as __init functions), there is a section mismatch warning from
__get_mem_config_intel(), which got moved to .text.unlikely. as a result of
that optimization:
WARNING: modpost: vmlinux: section mismatch in reference: \
__get_mem_config_intel+0x77 (section: .text.unlikely.) -> thread_throttle_mode_init (section: .init.text)
Mark __get_mem_config_intel() as __init as well since it is only called
from __init code, which clears up the warning.
While __rdt_get_mem_config_amd() does not exhibit a warning because it
does not call any __init code, it is a similar function that is only
called from __init code like __get_mem_config_intel(), so mark it __init
as well to keep the code symmetrical.
CONFIG_SECTION_MISMATCH_WARN_ONLY=n would turn this into a fatal error.
Fixes: 05b93417ce5b ("x86/intel_rdt/mba: Add primary support for Memory Bandwidth Allocation (MBA)")
Fixes: 4d05bf71f157 ("x86/resctrl: Introduce AMD QOS feature")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Cc: <stable@kernel.org>
Link: https://github.com/llvm/llvm-project/commit/6b11573b8c5e3d36beee099dbe7347c2a007bf53 [1]
Link: https://lore.kernel.org/r/20240917-x86-restctrl-get_mem_config_intel-init-v3-1-10d521256284@kernel.org
|
|
The resctrl schemata file supports specifying memory bandwidth associated with
the Memory Bandwidth Allocation (MBA) feature via a percentage (this is the
default) or bandwidth in MiBps (when resctrl is mounted with the "mba_MBps"
option).
The allowed range for the bandwidth percentage is from
/sys/fs/resctrl/info/MB/min_bandwidth to 100, using a granularity of
/sys/fs/resctrl/info/MB/bandwidth_gran. The supported range for the MiBps
bandwidth is 0 to U32_MAX.
There are two issues with parsing of MiBps memory bandwidth:
* The user provided MiBps is mistakenly rounded up to the granularity
that is unique to percentage input.
* The user provided MiBps is parsed using unsigned long (thus accepting
values up to ULONG_MAX), and then assigned to u32 that could result in
overflow.
Do not round up the MiBps value and parse user provided bandwidth as the u32
it is intended to be. Use the appropriate kstrtou32() that can detect out of
range values.
Fixes: 8205a078ba78 ("x86/intel_rdt/mba_sc: Add schemata support")
Fixes: 6ce1560d35f6 ("x86/resctrl: Switch over to the resctrl mbps_val list")
Co-developed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Martin Kletzander <nert.pinx@gmail.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
|
|
Add new PCI ID for Device 18h and Function 4.
Signed-off-by: Richard Gong <richard.gong@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Link: https://lore.kernel.org/r/20240913162903.649519-1-richard.gong@amd.com
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
|
|
Define helper get_this_hybrid_cpu_native_id() to return the CPU core
native ID. This core native ID combining with core type can be used to
figure out the CPU core uarch uniquely.
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Link: https://lkml.kernel.org/r/20240820073853.1974746-3-dapeng1.mi@linux.intel.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 6.12, take #1
- Fix pKVM error path on init, making sure we do not change critical
system registers as we're about to fail
- Make sure that the host's vector length is at capped by a value
common to all CPUs
- Fix kvm_has_feat*() handling of "negative" features, as the current
code is pretty broken
- Promote Joey to the status of official reviewer, while James steps
down -- hopefully only temporarly
|
|
Guard them with CONFIG_KVM_X86_COMMON rather than the two vendor modules.
In practice this has no functional change, because CONFIG_KVM_X86_COMMON
is set if and only if at least one vendor-specific module is being built.
However, it is cleaner to specify CONFIG_KVM_X86_COMMON for functions that
are used in kvm.ko.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Fixes: 590b09b1d88e ("KVM: x86: Register "emergency disable" callbacks when virt is enabled")
Fixes: 6d55a94222db ("x86/reboot: Unconditionally define cpu_emergency_virt_cb typedef")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Pull x86 kvm updates from Paolo Bonzini:
"x86:
- KVM currently invalidates the entirety of the page tables, not just
those for the memslot being touched, when a memslot is moved or
deleted.
This does not traditionally have particularly noticeable overhead,
but Intel's TDX will require the guest to re-accept private pages
if they are dropped from the secure EPT, which is a non starter.
Actually, the only reason why this is not already being done is a
bug which was never fully investigated and caused VM instability
with assigned GeForce GPUs, so allow userspace to opt into the new
behavior.
- Advertise AVX10.1 to userspace (effectively prep work for the
"real" AVX10 functionality that is on the horizon)
- Rework common MSR handling code to suppress errors on userspace
accesses to unsupported-but-advertised MSRs
This will allow removing (almost?) all of KVM's exemptions for
userspace access to MSRs that shouldn't exist based on the vCPU
model (the actual cleanup is non-trivial future work)
- Rework KVM's handling of x2APIC ICR, again, because AMD (x2AVIC)
splits the 64-bit value into the legacy ICR and ICR2 storage,
whereas Intel (APICv) stores the entire 64-bit value at the ICR
offset
- Fix a bug where KVM would fail to exit to userspace if one was
triggered by a fastpath exit handler
- Add fastpath handling of HLT VM-Exit to expedite re-entering the
guest when there's already a pending wake event at the time of the
exit
- Fix a WARN caused by RSM entering a nested guest from SMM with
invalid guest state, by forcing the vCPU out of guest mode prior to
signalling SHUTDOWN (the SHUTDOWN hits the VM altogether, not the
nested guest)
- Overhaul the "unprotect and retry" logic to more precisely identify
cases where retrying is actually helpful, and to harden all retry
paths against putting the guest into an infinite retry loop
- Add support for yielding, e.g. to honor NEED_RESCHED, when zapping
rmaps in the shadow MMU
- Refactor pieces of the shadow MMU related to aging SPTEs in
prepartion for adding multi generation LRU support in KVM
- Don't stuff the RSB after VM-Exit when RETPOLINE=y and AutoIBRS is
enabled, i.e. when the CPU has already flushed the RSB
- Trace the per-CPU host save area as a VMCB pointer to improve
readability and cleanup the retrieval of the SEV-ES host save area
- Remove unnecessary accounting of temporary nested VMCB related
allocations
- Set FINAL/PAGE in the page fault error code for EPT violations if
and only if the GVA is valid. If the GVA is NOT valid, there is no
guest-side page table walk and so stuffing paging related metadata
is nonsensical
- Fix a bug where KVM would incorrectly synthesize a nested VM-Exit
instead of emulating posted interrupt delivery to L2
- Add a lockdep assertion to detect unsafe accesses of vmcs12
structures
- Harden eVMCS loading against an impossible NULL pointer deref
(really truly should be impossible)
- Minor SGX fix and a cleanup
- Misc cleanups
Generic:
- Register KVM's cpuhp and syscore callbacks when enabling
virtualization in hardware, as the sole purpose of said callbacks
is to disable and re-enable virtualization as needed
- Enable virtualization when KVM is loaded, not right before the
first VM is created
Together with the previous change, this simplifies a lot the logic
of the callbacks, because their very existence implies
virtualization is enabled
- Fix a bug that results in KVM prematurely exiting to userspace for
coalesced MMIO/PIO in many cases, clean up the related code, and
add a testcase
- Fix a bug in kvm_clear_guest() where it would trigger a buffer
overflow _if_ the gpa+len crosses a page boundary, which thankfully
is guaranteed to not happen in the current code base. Add WARNs in
more helpers that read/write guest memory to detect similar bugs
Selftests:
- Fix a goof that caused some Hyper-V tests to be skipped when run on
bare metal, i.e. NOT in a VM
- Add a regression test for KVM's handling of SHUTDOWN for an SEV-ES
guest
- Explicitly include one-off assets in .gitignore. Past Sean was
completely wrong about not being able to detect missing .gitignore
entries
- Verify userspace single-stepping works when KVM happens to handle a
VM-Exit in its fastpath
- Misc cleanups"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (127 commits)
Documentation: KVM: fix warning in "make htmldocs"
s390: Enable KVM_S390_UCONTROL config in debug_defconfig
selftests: kvm: s390: Add VM run test case
KVM: SVM: let alternatives handle the cases when RSB filling is required
KVM: VMX: Set PFERR_GUEST_{FINAL,PAGE}_MASK if and only if the GVA is valid
KVM: x86/mmu: Use KVM_PAGES_PER_HPAGE() instead of an open coded equivalent
KVM: x86/mmu: Add KVM_RMAP_MANY to replace open coded '1' and '1ul' literals
KVM: x86/mmu: Fold mmu_spte_age() into kvm_rmap_age_gfn_range()
KVM: x86/mmu: Morph kvm_handle_gfn_range() into an aging specific helper
KVM: x86/mmu: Honor NEED_RESCHED when zapping rmaps and blocking is allowed
KVM: x86/mmu: Add a helper to walk and zap rmaps for a memslot
KVM: x86/mmu: Plumb a @can_yield parameter into __walk_slot_rmaps()
KVM: x86/mmu: Move walk_slot_rmaps() up near for_each_slot_rmap_range()
KVM: x86/mmu: WARN on MMIO cache hit when emulating write-protected gfn
KVM: x86/mmu: Detect if unprotect will do anything based on invalid_list
KVM: x86/mmu: Subsume kvm_mmu_unprotect_page() into the and_retry() version
KVM: x86: Rename reexecute_instruction()=>kvm_unprotect_and_retry_on_failure()
KVM: x86: Update retry protection fields when forcing retry on emulation failure
KVM: x86: Apply retry protection to "unprotect on failure" path
KVM: x86: Check EMULTYPE_WRITE_PF_TO_SP before unprotecting gfn
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull more xen updates from Juergen Gross:
"A second round of Xen related changes and features:
- a small fix of the xen-pciback driver for a warning issued by
sparse
- support PCI passthrough when using a PVH dom0
- enable loading the kernel in PVH mode at arbitrary addresses,
avoiding conflicts with the memory map when running as a Xen dom0
using the host memory layout"
* tag 'for-linus-6.12-rc1a-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
x86/pvh: Add 64bit relocation page tables
x86/kernel: Move page table macros to header
x86/pvh: Set phys_base when calling xen_prepare_pvh()
x86/pvh: Make PVH entrypoint PIC for x86-64
xen: sync elfnote.h from xen tree
xen/pciback: fix cast to restricted pci_ers_result_t and pci_power_t
xen/privcmd: Add new syscall to get gsi from dev
xen/pvh: Setup gsi for passthrough device
xen/pci: Add a function to reset device for xen
|
|
no_llseek had been defined to NULL two years ago, in commit 868941b14441
("fs: remove no_llseek")
To quote that commit,
At -rc1 we'll need do a mechanical removal of no_llseek -
git grep -l -w no_llseek | grep -v porting.rst | while read i; do
sed -i '/\<no_llseek\>/d' $i
done
would do it.
Unfortunately, that hadn't been done. Linus, could you do that now, so
that we could finally put that thing to rest? All instances are of the
form
.llseek = no_llseek,
so it's obviously safe.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic updates from Arnd Bergmann:
"These are only two small patches, one cleanup for arch/alpha and a
preparation patch cleaning up the handling of runtime constants in the
linker scripts"
* tag 'asm-generic-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
runtime constants: move list of constants to vmlinux.lds.h
alpha: no need to include asm/xchg.h twice
|
|
The PVH entry point will need an additional set of prebuild page tables.
Move the macros and defines to pgtable_64.h, so they can be re-used.
Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Message-ID: <20240823193630.2583107-5-jason.andryuk@amd.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull 'struct fd' updates from Al Viro:
"Just the 'struct fd' layout change, with conversion to accessor
helpers"
* tag 'pull-stable-struct_fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
add struct fd constructors, get rid of __to_fd()
struct fd: representation change
introduce fd_file(), convert all accessors to it.
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
"Along with the usual shower of singleton patches, notable patch series
in this pull request are:
- "Align kvrealloc() with krealloc()" from Danilo Krummrich. Adds
consistency to the APIs and behaviour of these two core allocation
functions. This also simplifies/enables Rustification.
- "Some cleanups for shmem" from Baolin Wang. No functional changes -
mode code reuse, better function naming, logic simplifications.
- "mm: some small page fault cleanups" from Josef Bacik. No
functional changes - code cleanups only.
- "Various memory tiering fixes" from Zi Yan. A small fix and a
little cleanup.
- "mm/swap: remove boilerplate" from Yu Zhao. Code cleanups and
simplifications and .text shrinkage.
- "Kernel stack usage histogram" from Pasha Tatashin and Shakeel
Butt. This is a feature, it adds new feilds to /proc/vmstat such as
$ grep kstack /proc/vmstat
kstack_1k 3
kstack_2k 188
kstack_4k 11391
kstack_8k 243
kstack_16k 0
which tells us that 11391 processes used 4k of stack while none at
all used 16k. Useful for some system tuning things, but
partivularly useful for "the dynamic kernel stack project".
- "kmemleak: support for percpu memory leak detect" from Pavel
Tikhomirov. Teaches kmemleak to detect leaksage of percpu memory.
- "mm: memcg: page counters optimizations" from Roman Gushchin. "3
independent small optimizations of page counters".
- "mm: split PTE/PMD PT table Kconfig cleanups+clarifications" from
David Hildenbrand. Improves PTE/PMD splitlock detection, makes
powerpc/8xx work correctly by design rather than by accident.
- "mm: remove arch_make_page_accessible()" from David Hildenbrand.
Some folio conversions which make arch_make_page_accessible()
unneeded.
- "mm, memcg: cg2 memory{.swap,}.peak write handlers" fro David
Finkel. Cleans up and fixes our handling of the resetting of the
cgroup/process peak-memory-use detector.
- "Make core VMA operations internal and testable" from Lorenzo
Stoakes. Rationalizaion and encapsulation of the VMA manipulation
APIs. With a view to better enable testing of the VMA functions,
even from a userspace-only harness.
- "mm: zswap: fixes for global shrinker" from Takero Funaki. Fix
issues in the zswap global shrinker, resulting in improved
performance.
- "mm: print the promo watermark in zoneinfo" from Kaiyang Zhao. Fill
in some missing info in /proc/zoneinfo.
- "mm: replace follow_page() by folio_walk" from David Hildenbrand.
Code cleanups and rationalizations (conversion to folio_walk())
resulting in the removal of follow_page().
- "improving dynamic zswap shrinker protection scheme" from Nhat
Pham. Some tuning to improve zswap's dynamic shrinker. Significant
reductions in swapin and improvements in performance are shown.
- "mm: Fix several issues with unaccepted memory" from Kirill
Shutemov. Improvements to the new unaccepted memory feature,
- "mm/mprotect: Fix dax puds" from Peter Xu. Implements mprotect on
DAX PUDs. This was missing, although nobody seems to have notied
yet.
- "Introduce a store type enum for the Maple tree" from Sidhartha
Kumar. Cleanups and modest performance improvements for the maple
tree library code.
- "memcg: further decouple v1 code from v2" from Shakeel Butt. Move
more cgroup v1 remnants away from the v2 memcg code.
- "memcg: initiate deprecation of v1 features" from Shakeel Butt.
Adds various warnings telling users that memcg v1 features are
deprecated.
- "mm: swap: mTHP swap allocator base on swap cluster order" from
Chris Li. Greatly improves the success rate of the mTHP swap
allocation.
- "mm: introduce numa_memblks" from Mike Rapoport. Moves various
disparate per-arch implementations of numa_memblk code into generic
code.
- "mm: batch free swaps for zap_pte_range()" from Barry Song. Greatly
improves the performance of munmap() of swap-filled ptes.
- "support large folio swap-out and swap-in for shmem" from Baolin
Wang. With this series we no longer split shmem large folios into
simgle-page folios when swapping out shmem.
- "mm/hugetlb: alloc/free gigantic folios" from Yu Zhao. Nice
performance improvements and code reductions for gigantic folios.
- "support shmem mTHP collapse" from Baolin Wang. Adds support for
khugepaged's collapsing of shmem mTHP folios.
- "mm: Optimize mseal checks" from Pedro Falcato. Fixes an mprotect()
performance regression due to the addition of mseal().
- "Increase the number of bits available in page_type" from Matthew
Wilcox. Increases the number of bits available in page_type!
- "Simplify the page flags a little" from Matthew Wilcox. Many legacy
page flags are now folio flags, so the page-based flags and their
accessors/mutators can be removed.
- "mm: store zero pages to be swapped out in a bitmap" from Usama
Arif. An optimization which permits us to avoid writing/reading
zero-filled zswap pages to backing store.
- "Avoid MAP_FIXED gap exposure" from Liam Howlett. Fixes a race
window which occurs when a MAP_FIXED operqtion is occurring during
an unrelated vma tree walk.
- "mm: remove vma_merge()" from Lorenzo Stoakes. Major rotorooting of
the vma_merge() functionality, making ot cleaner, more testable and
better tested.
- "misc fixups for DAMON {self,kunit} tests" from SeongJae Park.
Minor fixups of DAMON selftests and kunit tests.
- "mm: memory_hotplug: improve do_migrate_range()" from Kefeng Wang.
Code cleanups and folio conversions.
- "Shmem mTHP controls and stats improvements" from Ryan Roberts.
Cleanups for shmem controls and stats.
- "mm: count the number of anonymous THPs per size" from Barry Song.
Expose additional anon THP stats to userspace for improved tuning.
- "mm: finish isolate/putback_lru_page()" from Kefeng Wang: more
folio conversions and removal of now-unused page-based APIs.
- "replace per-quota region priorities histogram buffer with
per-context one" from SeongJae Park. DAMON histogram
rationalization.
- "Docs/damon: update GitHub repo URLs and maintainer-profile" from
SeongJae Park. DAMON documentation updates.
- "mm/vdpa: correct misuse of non-direct-reclaim __GFP_NOFAIL and
improve related doc and warn" from Jason Wang: fixes usage of page
allocator __GFP_NOFAIL and GFP_ATOMIC flags.
- "mm: split underused THPs" from Yu Zhao. Improve THP=always policy.
This was overprovisioning THPs in sparsely accessed memory areas.
- "zram: introduce custom comp backends API" frm Sergey Senozhatsky.
Add support for zram run-time compression algorithm tuning.
- "mm: Care about shadow stack guard gap when getting an unmapped
area" from Mark Brown. Fix up the various arch_get_unmapped_area()
implementations to better respect guard areas.
- "Improve mem_cgroup_iter()" from Kinsey Ho. Improve the reliability
of mem_cgroup_iter() and various code cleanups.
- "mm: Support huge pfnmaps" from Peter Xu. Extends the usage of huge
pfnmap support.
- "resource: Fix region_intersects() vs add_memory_driver_managed()"
from Huang Ying. Fix a bug in region_intersects() for systems with
CXL memory.
- "mm: hwpoison: two more poison recovery" from Kefeng Wang. Teaches
a couple more code paths to correctly recover from the encountering
of poisoned memry.
- "mm: enable large folios swap-in support" from Barry Song. Support
the swapin of mTHP memory into appropriately-sized folios, rather
than into single-page folios"
* tag 'mm-stable-2024-09-20-02-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (416 commits)
zram: free secondary algorithms names
uprobes: turn xol_area->pages[2] into xol_area->page
uprobes: introduce the global struct vm_special_mapping xol_mapping
Revert "uprobes: use vm_special_mapping close() functionality"
mm: support large folios swap-in for sync io devices
mm: add nr argument in mem_cgroup_swapin_uncharge_swap() helper to support large folios
mm: fix swap_read_folio_zeromap() for large folios with partial zeromap
mm/debug_vm_pgtable: Use pxdp_get() for accessing page table entries
set_memory: add __must_check to generic stubs
mm/vma: return the exact errno in vms_gather_munmap_vmas()
memcg: cleanup with !CONFIG_MEMCG_V1
mm/show_mem.c: report alloc tags in human readable units
mm: support poison recovery from copy_present_page()
mm: support poison recovery from do_cow_fault()
resource, kunit: add test case for region_intersects()
resource: make alloc_free_mem_region() works for iomem_resource
mm: z3fold: deprecate CONFIG_Z3FOLD
vfio/pci: implement huge_fault support
mm/arm64: support large pfn mappings
mm/x86: support large pfn mappings
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen updates from Juergen Gross:
- fix a boot problem as a Xen dom0 on some AMD systems
- fix Xen PVH boot problems with KASAN enabled
- fix for a build warning
- fixes to swiotlb-xen
* tag 'for-linus-6.12-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/swiotlb: fix allocated size
xen/swiotlb: add alignment check for dma buffers
xen/pci: Avoid -Wflex-array-member-not-at-end warning
xen/xenbus: Convert to use ERR_CAST()
xen, pvh: fix unbootable VMs by inlining memset() in xen_prepare_pvh()
x86/cpu: fix unbootable VMs by inlining memcmp() in hypervisor_cpuid_base()
xen, pvh: fix unbootable VMs (PVH + KASAN - AMD_MEM_ENCRYPT)
xen: tolerate ACPI NVS memory overlapping with Xen allocated memory
xen: allow mapping ACPI data using a different physical address
xen: add capability to remap non-RAM pages to different PFNs
xen: move max_pfn in xen_memory_setup() out of function scope
xen: move checks for e820 conflicts further up
xen: introduce generic helper checking for memory map conflicts
xen: use correct end address of kernel for conflict checking
|
|
into HEAD
KVM VMX and x86 PAT MSR macro cleanup for 6.12:
- Add common defines for the x86 architectural memory types, i.e. the types
that are shared across PAT, MTRRs, VMCSes, and EPTPs.
- Clean up the various VMX MSR macros to make the code self-documenting
(inasmuch as possible), and to make it less painful to add new macros.
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 timer updates from Thomas Gleixner:
- Use the topology information of number of packages for making the
decision about TSC trust instead of using the number of online nodes
which is not reflecting the real topology.
- Stop the PIT timer 0 when its not in use as to stop pointless
emulation in the VMM.
- Fix the PIT timer stop sequence for timer 0 so it truly stops both
real hardware and buggy VMM emulations.
* tag 'x86-timers-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/tsc: Check for sockets instead of CPUs to make code match comment
clockevents/drivers/i8253: Fix stop sequence for timer 0
x86/i8253: Disable PIT timer 0 when not in use
x86/tsc: Use topology_max_packages() to get package number
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 updates from Thomas Gleixner:
- Rework kcpuid to handle the the autogenerated CSV file correctly and
update the CSV file to cover the whole zoo of CPUID.
- Avoid memcpy() for ia32 syscall_get_arguments() and use direct
assignments as fortified memcpy() is unhappy about writing/reading
beyond the end of the addresses destination/source struct member
- A few new PCI IDs for AMD
- Update MAINTAINERS to cover x86 specific selftests
* tag 'x86-misc-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
MAINTAINERS: Add selftests/x86 entry
x86/amd_nb: Add new PCI IDs for AMD family 1Ah model 60h-70h
x86/syscall: Avoid memcpy() for ia32 syscall_get_arguments()
MAINTAINERS: Add x86 cpuid database entry
tools/x86/kcpuid: Introduce a complete cpuid bitfields CSV file
tools/x86/kcpuid: Parse subleaf ranges if provided
tools/x86/kcpuid: Recognize all leaves with subleaves
tools/x86/kcpuid: Strip bitfield names leading/trailing whitespace
tools/x86/kcpuid: Protect against faulty "max subleaf" values
tools/x86/kcpuid: Set max possible subleaves count to 64
tools/x86/kcpuid: Properly align long-description columns
tools/x86/kcpuid: Remove unused variable
x86/amd_nb: Add new PCI IDs for AMD family 1Ah model 60h
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 memory management updates from Thomas Gleixner:
- Make LAM enablement safe vs. kernel threads using a process mm
temporarily as switching back to the process would not update CR3 and
therefore not enable LAM causing faults in user space when using
tagged pointers. Cure it by synchronizing LAM enablement via IPIs to
all CPUs which use the related mm.
- Cure a LAM harmless inconsistency between CR3 and the state during
context switch. It's both confusing and prone to lead to real bugs
- Handle alt stack handling for threads which run with a non-zero
protection key. The non-zero key prevents the kernel to access the
alternate stack. Cure it by temporarily enabling all protection keys
for the alternate stack setup/restore operations.
- Provide a EFI config table identity mapping for kexec kernel to
prevent kexec fails because the new kernel cannot access the config
table array
- Use GB pages only when a full GB is mapped in the identity map as
otherwise the CPU can speculate into reserved areas after the end of
memory which causes malfunction on UV systems.
- Remove the noisy and pointless SRAT table dump during boot
- Use is_ioremap_addr() for iounmap() address range checks instead of
high_memory. is_ioremap_addr() is more precise.
* tag 'x86-mm-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/ioremap: Improve iounmap() address range checks
x86/mm: Remove duplicate check from build_cr3()
x86/mm: Remove unused NX related declarations
x86/mm: Remove unused CR3_HW_ASID_BITS
x86/mm: Don't print out SRAT table information
x86/mm/ident_map: Use gbpages only where full GB page should be mapped.
x86/kexec: Add EFI config table identity mapping for kexec kernel
selftests/mm: Add new testcases for pkeys
x86/pkeys: Restore altstack access in sigreturn()
x86/pkeys: Update PKRU to enable all pkeys before XSAVE
x86/pkeys: Add helper functions to update PKRU on the sigframe
x86/pkeys: Add PKRU as a parameter in signal handling functions
x86/mm: Cleanup prctl_enable_tagged_addr() nr_bits error checking
x86/mm: Fix LAM inconsistency during context switch
x86/mm: Use IPIs to synchronize LAM enablement
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 FRED updates from Thomas Gleixner:
- Enable FRED right after init_mem_mapping() because at that point the
early IDT fault handler is replaced by the real fault handler. The
real fault handler retrieves the faulting address from the stack
frame and not from CR2 when the FRED feature is set. But that
obviously only works when FRED is enabled in the CPU as well.
- Set SS to __KERNEL_DS when enabling FRED to prevent a corner case
where ERETS can observe a SS mismatch and raises a #GP.
* tag 'x86-fred-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/entry: Set FRED RSP0 on return to userspace instead of context switch
x86/msr: Switch between WRMSRNS and WRMSR with the alternatives mechanism
x86/entry: Test ti_work for zero before processing individual bits
x86/fred: Set SS to __KERNEL_DS when enabling FRED
x86/fred: Enable FRED right after init_mem_mapping()
x86/fred: Move FRED RSP initialization into a separate function
x86/fred: Parse cmdline param "fred=" in cpu_parse_early_param()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fpu updates from Thomas Gleixner:
"Provide FPU buffer layout in core dumps:
Debuggers have guess the FPU buffer layout in core dumps, which is
error prone. This is because AMD and Intel layouts differ.
To avoid buggy heuristics add a ELF section which describes the buffer
layout which can be retrieved by tools"
* tag 'x86-fpu-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/elf: Add a new FPU buffer layout info to x86 core files
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 core update from Thomas Gleixner:
"Enable UBSAN traps for x86, which provides better reporting through
metadata encodeded into UD1"
* tag 'x86-core-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/traps: Enable UBSAN traps on x86
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 APIC updates from Thomas Gleixner:
- Handle an allocation failure in the IO/APIC code gracefully instead
of crashing the machine.
- Remove support for APIC local destination mode on 64bit
Logical destination mode of the local APIC is used for systems with
up to 8 CPUs. It has an advantage over physical destination mode as
it allows to target multiple CPUs at once with IPIs. That advantage
was definitely worth it when systems with up to 8 CPUs were state of
the art for servers and workstations, but that's history.
In the recent past there were quite some reports of new laptops
failing to boot with logical destination mode, but they work fine
with physical destination mode. That's not a suprise because physical
destination mode is guaranteed to work as it's the only way to get a
CPU up and running via the INIT/INIT/STARTUP sequence. Some of the
affected systems were cured by BIOS updates, but not all OEMs provide
them.
As the number of CPUs keep increasing, logical destination mode
becomes less used and the benefit for small systems, like laptops, is
not really worth the trouble. So just remove logical destination mode
support for 64bit and be done with it.
- Code and comment cleanups in the APIC area.
* tag 'x86-apic-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/irq: Fix comment on IRQ vector layout
x86/apic: Remove unused extern declarations
x86/apic: Remove logical destination mode for 64-bit
x86/apic: Remove unused inline function apic_set_eoi_cb()
x86/ioapic: Cleanup remaining coding style issues
x86/ioapic: Cleanup line breaks
x86/ioapic: Cleanup bracket usage
x86/ioapic: Cleanup comments
x86/ioapic: Move replace_pin_at_irq_node() to the call site
iommu/vt-d: Cleanup apic_printk()
x86/mpparse: Cleanup apic_printk()s
x86/ioapic: Cleanup guarded debug printk()s
x86/ioapic: Cleanup apic_printk()s
x86/apic: Cleanup apic_printk()s
x86/apic: Provide apic_printk() helpers
x86/ioapic: Use guard() for locking where applicable
x86/ioapic: Cleanup structs
x86/ioapic: Mark mp_alloc_timer_irq() __init
x86/ioapic: Handle allocation failures gracefully
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Thomas Gleixner:
"A set of cleanups across x86:
- Use memremap() for the EISA probe instead of ioremap(). EISA is
strictly memory and not MMIO
- Cleanups and enhancement all over the place"
* tag 'x86-cleanups-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/EISA: Dereference memory directly instead of using readl()
x86/extable: Remove unused declaration fixup_bug()
x86/boot/64: Strip percpu address space when setting up GDT descriptors
x86/cpu: Clarify the error message when BIOS does not support SGX
x86/kexec: Add comments around swap_pages() assembly to improve readability
x86/kexec: Fix a comment of swap_pages() assembly
x86/sgx: Fix a W=1 build warning in function comment
x86/EISA: Use memremap() to probe for the EISA BIOS signature
x86/mtrr: Remove obsolete declaration for mtrr_bp_restore()
x86/cpu_entry_area: Annotate percpu_setup_exception_stacks() as __init
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 build updates from Thomas Gleixner:
"Updates for KCOV instrumentation on x86:
- Prevent spurious KCOV coverage in common_interrupt()
- Fixup the KCOV Makefile directive which got stale due to a source
file rename
- Exclude stack unwinding from KCOV as it creates large amounts of
uninteresting coverage
- Provide a self test to validate that KCOV coverage of the interrupt
handling code starts not before preempt count got updated"
* tag 'x86-build-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Ignore stack unwinding in KCOV
module: Fix KCOV-ignored file name
kcov: Add interrupt handling self test
x86/entry: Remove unwanted instrumentation in common_interrupt()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
"Core:
- Overhaul of posix-timers in preparation of removing the workaround
for periodic timers which have signal delivery ignored.
- Remove the historical extra jiffie in msleep()
msleep() adds an extra jiffie to the timeout value to ensure
minimal sleep time. The timer wheel ensures minimal sleep time
since the large rewrite to a non-cascading wheel, but the extra
jiffie in msleep() remained unnoticed. Remove it.
- Make the timer slack handling correct for realtime tasks.
The procfs interface is inconsistent and does neither reflect
reality nor conforms to the man page. Show the correct 0 slack for
real time tasks and enforce it at the core level instead of having
inconsistent individual checks in various timer setup functions.
- The usual set of updates and enhancements all over the place.
Drivers:
- Allow the ACPI PM timer to be turned off during suspend
- No new drivers
- The usual updates and enhancements in various drivers"
* tag 'timers-core-2024-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (43 commits)
ntp: Make sure RTC is synchronized when time goes backwards
treewide: Fix wrong singular form of jiffies in comments
cpu: Use already existing usleep_range()
timers: Rename next_expiry_recalc() to be unique
platform/x86:intel/pmc: Fix comment for the pmc_core_acpi_pm_timer_suspend_resume function
clocksource/drivers/jcore: Use request_percpu_irq()
clocksource/drivers/cadence-ttc: Add missing clk_disable_unprepare in ttc_setup_clockevent
clocksource/drivers/asm9260: Add missing clk_disable_unprepare in asm9260_timer_init
clocksource/drivers/qcom: Add missing iounmap() on errors in msm_dt_timer_init()
clocksource/drivers/ingenic: Use devm_clk_get_enabled() helpers
platform/x86:intel/pmc: Enable the ACPI PM Timer to be turned off when suspended
clocksource: acpi_pm: Add external callback for suspend/resume
clocksource/drivers/arm_arch_timer: Using for_each_available_child_of_node_scoped()
dt-bindings: timer: rockchip: Add rk3576 compatible
timers: Annotate possible non critical data race of next_expiry
timers: Remove historical extra jiffie for timeout in msleep()
hrtimer: Use and report correct timerslack values for realtime tasks
hrtimer: Annotate hrtimer_cpu_base_.*_expiry() for sparse.
timers: Add sparse annotation for timer_sync_wait_running().
signal: Replace BUG_ON()s
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"By the number of new lines of code, the most visible change here is
the addition of hybrid CPU capacity scaling support to the
intel_pstate driver. Next are the amd-pstate driver changes related to
the calculation of the AMD boost numerator and preferred core
detection.
As far as new hardware support is concerned, the intel_idle driver
will now handle Granite Rapids Xeon processors natively, the
intel_rapl power capping driver will recognize family 1Ah of AMD
processors and Intel ArrowLake-U chipos, and intel_pstate will handle
Granite Rapids and Sierra Forest chips in the out-of-band (OOB) mode.
Apart from the above, there is a usual collection of assorted fixes
and code cleanups in many places and there are tooling updates.
Specifics:
- Remove LATENCY_MULTIPLIER from cpufreq (Qais Yousef)
- Add support for Granite Rapids and Sierra Forest in OOB mode to the
intel_pstate cpufreq driver (Srinivas Pandruvada)
- Add basic support for CPU capacity scaling on x86 and make the
intel_pstate driver set asymmetric CPU capacity on hybrid systems
without SMT (Rafael Wysocki)
- Add missing MODULE_DESCRIPTION() macros to the powerpc cpufreq
driver (Jeff Johnson)
- Several OF related cleanups in cpufreq drivers (Rob Herring)
- Enable COMPILE_TEST for ARM drivers (Rob Herrring)
- Introduce quirks for syscon failures and use socinfo to get
revision for TI cpufreq driver (Dhruva Gole, Nishanth Menon)
- Minor cleanups in amd-pstate driver (Anastasia Belova, Dhananjay
Ugwekar)
- Minor cleanups for loongson, cpufreq-dt and powernv cpufreq drivers
(Danila Tikhonov, Huacai Chen, and Liu Jing)
- Make amd-pstate validate return of any attempt to update EPP
limits, which fixes the masking hardware problems (Mario
Limonciello)
- Move the calculation of the AMD boost numerator outside of
amd-pstate, correcting acpi-cpufreq on systems with preferred cores
(Mario Limonciello)
- Harden preferred core detection in amd-pstate to avoid potential
false positives (Mario Limonciello)
- Add extra unit test coverage for mode state machine (Mario
Limonciello)
- Fix an "Uninitialized variables" issue in amd-pstste (Qianqiang
Liu)
- Add Granite Rapids Xeon support to intel_idle (Artem Bityutskiy)
- Disable promotion to C1E on Jasper Lake and Elkhart Lake in
intel_idle (Kai-Heng Feng)
- Use scoped device node handling to fix missing of_node_put() and
simplify walking OF children in the riscv-sbi cpuidle driver
(Krzysztof Kozlowski)
- Remove dead code from cpuidle_enter_state() (Dhruva Gole)
- Change an error pointer to NULL to fix error handling in the
intel_rapl power capping driver (Dan Carpenter)
- Fix off by one in get_rpi() in the intel_rapl power capping driver
(Dan Carpenter)
- Add support for ArrowLake-U to the intel_rapl power capping driver
(Sumeet Pawnikar)
- Fix the energy-pkg event for AMD CPUs in the intel_rapl power
capping driver (Dhananjay Ugwekar)
- Add support for AMD family 1Ah processors to the intel_rapl power
capping driver (Dhananjay Ugwekar)
- Remove unused stub for saveable_highmem_page() and remove
deprecated macros from power management documentation (Andy
Shevchenko)
- Use ysfs_emit() and sysfs_emit_at() in "show" functions in the PM
sysfs interface (Xueqin Luo)
- Update the maintainers information for the
operating-points-v2-ti-cpu DT binding (Dhruva Gole)
- Drop unnecessary of_match_ptr() from ti-opp-supply (Rob Herring)
- Add missing MODULE_DESCRIPTION() macros to devfreq governors (Jeff
Johnson)
- Use devm_clk_get_enabled() in the exynos-bus devfreq driver (Anand
Moon)
- Use of_property_present() instead of of_get_property() in the
imx-bus devfreq driver (Rob Herring)
- Update directory handling and installation process in the pm-graph
Makefile and add .gitignore to ignore sleepgraph.py artifacts to
pm-graph (Amit Vadhavana, Yo-Jung Lin)
- Make cpupower display residency value in idle-info (Aboorva
Devarajan)
- Add missing powercap_set_enabled() stub function to cpupower (John
B. Wyatt IV)
- Add SWIG support to cpupower (John B. Wyatt IV)"
* tag 'pm-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (62 commits)
cpufreq/amd-pstate-ut: Fix an "Uninitialized variables" issue
cpufreq/amd-pstate-ut: Add test case for mode switches
cpufreq/amd-pstate: Export symbols for changing modes
amd-pstate: Add missing documentation for `amd_pstate_prefcore_ranking`
cpufreq: amd-pstate: Add documentation for `amd_pstate_hw_prefcore`
cpufreq: amd-pstate: Optimize amd_pstate_update_limits()
cpufreq: amd-pstate: Merge amd_pstate_highest_perf_set() into amd_get_boost_ratio_numerator()
x86/amd: Detect preferred cores in amd_get_boost_ratio_numerator()
x86/amd: Move amd_get_highest_perf() out of amd-pstate
ACPI: CPPC: Adjust debug messages in amd_set_max_freq_ratio() to warn
ACPI: CPPC: Drop check for non zero perf ratio
x86/amd: Rename amd_get_highest_perf() to amd_get_boost_ratio_numerator()
ACPI: CPPC: Adjust return code for inline functions in !CONFIG_ACPI_CPPC_LIB
x86/amd: Move amd_get_highest_perf() from amd.c to cppc.c
PM: hibernate: Remove unused stub for saveable_highmem_page()
pm:cpupower: Add error warning when SWIG is not installed
MAINTAINERS: Add Maintainers for SWIG Python bindings
pm:cpupower: Include test_raw_pylibcpupower.py
pm:cpupower: Add SWIG bindings files for libcpupower
pm:cpupower: Add missing powercap_set_enabled() stub function
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 SGX updates from Dave Hansen:
"These fix a deadlock in the SGX NUMA allocator.
It's probably only triggerable today on servers with buggy BIOSes, but
it's theoretically possible it can happen on less goofy systems"
* tag 'x86_sgx_for_6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sgx: Log information when a node lacks an EPC section
x86/sgx: Fix deadlock in SGX NUMA node search
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 hw mitigation updates from Borislav Petkov:
- Add CONFIG_ option for every hw CPU mitigation. The intent is to
support configurations and scenarios where the mitigations code is
irrelevant
- Other small fixlets and improvements
* tag 'x86_bugs_for_v6.12_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/bugs: Fix handling when SRSO mitigation is disabled
x86/bugs: Add missing NO_SSB flag
Documentation/srso: Document a method for checking safe RET operates properly
x86/bugs: Add a separate config for GDS
x86/bugs: Remove GDS Force Kconfig option
x86/bugs: Add a separate config for SSB
x86/bugs: Add a separate config for Spectre V2
x86/bugs: Add a separate config for SRBDS
x86/bugs: Add a separate config for Spectre v1
x86/bugs: Add a separate config for RETBLEED
x86/bugs: Add a separate config for L1TF
x86/bugs: Add a separate config for MMIO Stable Data
x86/bugs: Add a separate config for TAA
x86/bugs: Add a separate config for MDS
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpuid updates from Borislav Petkov:
- Add the final conversions to the new Intel VFM CPU model matching
macros which include the vendor and finally drop the old ones which
hardcode family 6
* tag 'x86_cpu_for_v6.12_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu/vfm: Delete all the *_FAM6_ CPU #defines
x86/cpu/vfm: Delete X86_MATCH_INTEL_FAM6_MODEL[_STEPPING]() macros
extcon: axp288: Switch to new Intel CPU model defines
x86/cpu/intel: Replace PAT erratum model/family magic numbers with symbolic IFM references
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 RAS updates from Borislav Petkov:
- Reorganize the struct mce populating functions so that MCA errors
reported through BIOS' BERT method can report the correct CPU number
the error has been detected on
* tag 'ras_core_for_v6.12_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce: Use mce_prep_record() helpers for apei_smca_report_x86_error()
x86/mce: Define mce_prep_record() helpers for common and per-CPU fields
x86/mce: Rename mce_setup() to mce_prep_record()
|