| Age | Commit message (Collapse) | Author | Files | Lines |
|
Right now e820__range_remove() has two parameters to control the
E820 type of the range removed:
extern void e820__range_remove(u64 start, u64 size, enum e820_type old_type, bool check_type);
Since E820 types start at 1, zero has a natural meaning of 'no type.
Consolidate the (old_type,check_type) parameters into a single (filter_type)
parameter:
extern void e820__range_remove(u64 start, u64 size, enum e820_type filter_type);
Note that both e820__mapped_raw_any() and e820__mapped_any()
already have such semantics for their 'type' parameter, although
it's currently not used with '0' by in-kernel code.
Also, the __e820__mapped_all() internal helper already has such
semantics implemented as well, and the e820__get_entry_type() API
uses the '0' type to such effect.
This simplifies not just e820__range_remove(), and synchronizes its
use of type filters with other E820 API functions, but simplifies
usage sites as well, such as parse_memmap_one(), beyond the reduction
of the number of parameters:
- else if (from)
- e820__range_remove(start_at, mem_size, from, 1);
else
- e820__range_remove(start_at, mem_size, 0, 0);
+ e820__range_remove(start_at, mem_size, from);
The generated code gets smaller as well:
add/remove: 0/0 grow/shrink: 0/5 up/down: 0/-66 (-66)
Function old new delta
parse_memopt 112 107 -5
efi_init 1048 1039 -9
setup_arch 2719 2709 -10
e820__range_remove 283 273 -10
parse_memmap_opt 559 527 -32
Total: Before=22,675,600, After=22,675,534, chg -0.00%
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: H . Peter Anvin <hpa@zytor.com>
Cc: Andy Shevchenko <andy@kernel.org>
Cc: Arnd Bergmann <arnd@kernel.org>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Paul Menzel <pmenzel@molgen.mpg.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://patch.msgid.link/20250515120549.2820541-28-mingo@kernel.org
|
|
None of the usage sites make use of the 'real_removed_size'
return parameter of e820__range_remove(), and it's hard
to contemplate much constructive use: E820 maps can have
holes, and removing a fixed range may result in removal
of any number of bytes from 0 to the requested size.
So remove this pointless calculation. This simplifies
the function a bit:
text data bss dec hex filename
7645 44072 0 51717 ca05 e820.o.before
7597 44072 0 51669 c9d5 e820.o.after
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: H . Peter Anvin <hpa@zytor.com>
Cc: Andy Shevchenko <andy@kernel.org>
Cc: Arnd Bergmann <arnd@kernel.org>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Paul Menzel <pmenzel@molgen.mpg.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://patch.msgid.link/20250515120549.2820541-27-mingo@kernel.org
|
|
single-entry tables
So append_e820_table() begins with this weird condition that checks 'nr_entries':
static int __init append_e820_table(struct boot_e820_entry *entries, u32 nr_entries)
{
/* Only one memory region (or negative)? Ignore it */
if (nr_entries < 2)
return -1;
Firstly, 'nr_entries' has been an u32 since 2017 and cannot be negative.
Secondly, there's nothing inherently wrong with single-entry E820 maps,
especially in virtualized environments.
So remove this restriction and remove the __append_e820_table()
indirection.
Also:
- fix/update comments
- remove obsolete comments
This shrinks the generated code a bit as well:
text data bss dec hex filename
7549 44072 0 51621 c9a5 e820.o.before
7533 44072 0 51605 c995 e820.o.after
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Shevchenko <andy@kernel.org>
Cc: Arnd Bergmann <arnd@kernel.org>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: H . Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Paul Menzel <pmenzel@molgen.mpg.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://patch.msgid.link/20250515120549.2820541-26-mingo@kernel.org
|
|
So the e820.c file has a hodgepodge of __init and __initdata tag
placements:
static int __init e820_search_gap(unsigned long *max_gap_start, unsigned long *max_gap_size)
__init void e820__setup_pci_gap(void)
__init void e820__reallocate_tables(void)
void __init e820__memory_setup_extended(u64 phys_addr, u32 data_len)
void __init e820__register_nosave_regions(unsigned long limit_pfn)
static int __init e820__register_nvs_regions(void)
u64 __init e820__memblock_alloc_reserved(u64 size, u64 align)
Standardize on the style used by e820__setup_pci_gap() and place
them before the storage class.
In addition to the consistency, as a bonus this makes the grep output
rather clean looking:
__init void e820__range_remove(u64 start, u64 size, enum e820_type filter_type)
__init void e820__update_table_print(void)
__init static void e820__update_table_kexec(void)
__init static int e820_search_gap(unsigned long *max_gap_start, unsigned long *max_gap_size)
__init void e820__setup_pci_gap(void)
__init void e820__reallocate_tables(void)
__init void e820__memory_setup_extended(u64 phys_addr, u32 data_len)
__init void e820__register_nosave_regions(unsigned long limit_pfn)
__init static int e820__register_nvs_regions(void)
... and if one learns to just ignore the leftmost '__init' noise then
the rest of the line looks just like a regular C function definition.
With the 'mixed' tag placement style the __init tag breaks up the function's
prototype for no good reason.
Do the same for __initdata.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: H . Peter Anvin <hpa@zytor.com>
Cc: Andy Shevchenko <andy@kernel.org>
Cc: Arnd Bergmann <arnd@kernel.org>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Paul Menzel <pmenzel@molgen.mpg.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://patch.msgid.link/20250515120549.2820541-25-mingo@kernel.org
|
|
Use 'entry_new' to make clear we are allocating a new entry.
Change the table-full message to say that the table is full.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: H . Peter Anvin <hpa@zytor.com>
Cc: Andy Shevchenko <andy@kernel.org>
Cc: Arnd Bergmann <arnd@kernel.org>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Paul Menzel <pmenzel@molgen.mpg.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://patch.msgid.link/20250515120549.2820541-24-mingo@kernel.org
|
|
e820_search_gap() et al
The PCI gap searching functions pass around pointers to the
gap_start/gap_size variables, which refer to the maximum
size gap found so far.
Rename the variables to say so, and disambiguate their namespace
from 'current gap' variables.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: H . Peter Anvin <hpa@zytor.com>
Cc: Andy Shevchenko <andy@kernel.org>
Cc: Arnd Bergmann <arnd@kernel.org>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Paul Menzel <pmenzel@molgen.mpg.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://patch.msgid.link/20250515120549.2820541-23-mingo@kernel.org
|
|
PCI gap
Right now the main x86 function that determines the position and size
of the 'PCI gap', e820_search_gap(), has this curious property:
while (--idx >= 0) {
...
if (gap >= *gap_size) {
I.e. it will iterate the E820 table backwards, from its end to the beginning,
and will search for larger and larger gaps in the memory map below 4GB,
until it finishes with the table.
This logic will, should there be two gaps with the same size, pick the
one with the lower physical address - which is contrary to usual
practice that the PCI gap is just below 4GB.
Furthermore, the commit that introduced this weird logic 16 years ago:
3381959da5a0 ("x86: cleanup e820_setup_gap(), add e820_search_gap(), v2")
- if (gap > gapsize) {
+ if (gap >= *gapsize) {
didn't even declare this change, the title says it's a cleanup,
and the changelog declares it as a preparatory refactoring
for a later bugfix:
809d9a8f93bd ("x86/PCI: ACPI based PCI gap calculation")
which bugfix was reverted only 1 day later without much of an
explanation, and was never reintroduced:
58b6e5538460 ("Revert "x86/PCI: ACPI based PCI gap calculation"")
So based on the Git archeology and by the plain reading of the
code I declare this '>=' change an unintended bug and side effect.
Change it to '>' again.
It should not make much of a difference in practice, as the
likelihood of having *two* largest gaps with exactly the same
size are very low outside of weird user-provided memory maps.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: H . Peter Anvin <hpa@zytor.com>
Cc: Andy Shevchenko <andy@kernel.org>
Cc: Arnd Bergmann <arnd@kernel.org>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Paul Menzel <pmenzel@molgen.mpg.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://patch.msgid.link/20250515120549.2820541-22-mingo@kernel.org
|
|
Apply misc cleanups:
- Use a bit more readable variable names, we haven't run out of
underscore characters in the kernel yet.
- s/0x400000/SZ_4M
- s/1024*1024/SZ_1M
Suggested-by: Andy Shevchenko <andy@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Arnd Bergmann <arnd@kernel.org>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: H . Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Paul Menzel <pmenzel@molgen.mpg.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://patch.msgid.link/20250515120549.2820541-21-mingo@kernel.org
|
|
__u32 is for UAPI headers, and this definition is the only place
in the kernel-internal E820 code that uses __u32. Change it to u32.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: H . Peter Anvin <hpa@zytor.com>
Cc: Andy Shevchenko <andy@kernel.org>
Cc: Arnd Bergmann <arnd@kernel.org>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Paul Menzel <pmenzel@molgen.mpg.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://patch.msgid.link/20250515120549.2820541-20-mingo@kernel.org
|
|
So we have 'idx' types of 'int' and 'unsigned int', and sometimes
we assign 'u32' fields such as e820_table::nr_entries to these 'int'
values.
While there's no real risk of overflow with these tables, make it
all cleaner by standardizing on a single type: u32.
This also happens to shrink the code a bit:
text data bss dec hex filename
7745 44072 0 51817 ca69 e820.o.before
7613 44072 0 51685 c9e5 e820.o.after
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: H . Peter Anvin <hpa@zytor.com>
Cc: Andy Shevchenko <andy@kernel.org>
Cc: Arnd Bergmann <arnd@kernel.org>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Paul Menzel <pmenzel@molgen.mpg.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://patch.msgid.link/20250515120549.2820541-19-mingo@kernel.org
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Shevchenko <andy@kernel.org>
Cc: Arnd Bergmann <arnd@kernel.org>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: H . Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Paul Menzel <pmenzel@molgen.mpg.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://patch.msgid.link/20250515120549.2820541-18-mingo@kernel.org
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: H . Peter Anvin <hpa@zytor.com>
Cc: Andy Shevchenko <andy@kernel.org>
Cc: Arnd Bergmann <arnd@kernel.org>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Paul Menzel <pmenzel@molgen.mpg.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://patch.msgid.link/20250515120549.2820541-17-mingo@kernel.org
|
|
So __refdata, like __init, is more of a storage class specifier,
so move the attribute in front of the type, not after the variable
name. This also aligns it vertically.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Shevchenko <andy@kernel.org>
Cc: Arnd Bergmann <arnd@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Paul Menzel <pmenzel@molgen.mpg.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Link: https://patch.msgid.link/20250515120549.2820541-16-mingo@kernel.org
|
|
- Use 'idx' index variable instead of a weird 'x'
- Make the error message E820-specific
- Group the code a bit better
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Shevchenko <andy@kernel.org>
Cc: Arnd Bergmann <arnd@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Paul Menzel <pmenzel@molgen.mpg.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Link: https://patch.msgid.link/20250515120549.2820541-15-mingo@kernel.org
|
|
For E820_TYPE_RESERVED, print:
'reserved' -> 'device reserved'
For E820_TYPE_PRAM and E820_TYPE_PMEM:
'persistent' -> 'persistent RAM'
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Shevchenko <andy@kernel.org>
Cc: Arnd Bergmann <arnd@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Paul Menzel <pmenzel@molgen.mpg.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Link: https://patch.msgid.link/20250515120549.2820541-14-mingo@kernel.org
|
|
E820 related resource allocations
So the E820 code has a rather confusing area of code at around
e820__reserve_resources(), which is, by its plain reading,
rather self-contradictory. For example, the comment explaining
e820__reserve_resources() claims:
- '* Mark E820 reserved areas as busy for the resource manager'
By 'E820 reserved areas' one can naively conclude that it's
talking about E820_TYPE_RESERVED areas - while those areas
are treated in exactly the opposite fashion by do_mark_busy():
switch (type) {
case E820_TYPE_RESERVED:
case E820_TYPE_SOFT_RESERVED:
case E820_TYPE_PRAM:
case E820_TYPE_PMEM:
return false;
Ie. E820_TYPE_RESERVED areas are *not* marked busy for the
resource manager, because E820_TYPE_RESERVED areas are
device regions that might eventually be claimed by a device driver.
This type of confusion permeates this whole area of code,
making it exceedingly difficult to read (for me at least).
So untangle it bit by bit:
- Instead of talking about ambiguous 'reserved areas',
talk about 'E820 device address regions' instead,
and 'register'/'lock' them.
- The do_mark_busy() function is a misnomer as well, because despite
its name it 'does' nothing - it only determines what type
of resource handling an E820 type should receive from the
kernel. Rename it to e820_device_region() and negate its
meaning, to avoid the 'busy/reserved' confusion. Because
that's what this code is really about: filtering out
device regions such as E820_TYPE_RESERVED, E820_TYPE_PRAM,
E820_TYPE_PMEM, etc., and allowing them to be claimed
by device drivers later on.
- All other E820 regions (system regions) are registered and
locked early on, before the PCI resource manager does its
search for device BAR addresses, etc.
Also fix this somewhat misleading comment:
/*
* Try to bump up RAM regions to reasonable boundaries, to
* avoid stolen RAM:
*/
and explain that here we register artificial 'gap' resources
at the end of suspiciously sized RAM regions, as heuristics
to try to avoid buggy firmware with undeclared 'stolen RAM' regions:
/*
* Create additional 'gaps' at the end of RAM regions,
* rounding them up to 64k/1MB/64MB boundaries, should
* they be weirdly sized, and register extra, locked
* resource regions for them, to make sure drivers
* won't claim those addresses.
*
* These are basically blind guesses and heuristics to
* avoid resource conflicts with broken firmware that
* doesn't properly list 'stolen RAM' as a system region
* in the E820 map.
*/
Also improve the printout of this extra resource a bit: make the
message more unambiguous, and upgrade it from pr_debug() (where
very few people will see it), to pr_info() (where it will make
it into the syslog on default distro configs).
Also fix spelling and improve comment placement.
No change in functionality intended.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Shevchenko <andy@kernel.org>
Cc: Arnd Bergmann <arnd@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Paul Menzel <pmenzel@molgen.mpg.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Link: https://patch.msgid.link/20250515120549.2820541-13-mingo@kernel.org
|
|
early_panic() is a pointless wrapper around panic():
static void __init early_panic(char *msg)
{
early_printk(msg);
panic(msg);
}
panic() will already do a printk() of 'msg', and an early_printk() if
earlyprintk is enabled. There's no need to print it separately.
Remove the function.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Shevchenko <andy@kernel.org>
Cc: Arnd Bergmann <arnd@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Paul Menzel <pmenzel@molgen.mpg.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Link: https://patch.msgid.link/20250515120549.2820541-12-mingo@kernel.org
|
|
There's a number of structure fields and local variables related
to E820 entry physical addresses that are defined as 'unsigned long long',
but then are compared to u64 fields.
Make the types all consistently u64.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Shevchenko <andy@kernel.org>
Cc: Arnd Bergmann <arnd@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Paul Menzel <pmenzel@molgen.mpg.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Link: https://patch.msgid.link/20250515120549.2820541-11-mingo@kernel.org
|
|
It is a bit weird and inconsistent that the PCI gap is
advertised during bootup as 'mem'ory:
[mem 0xc0000000-0xfed1bfff] available for PCI devices
^^^
It's not really memory, it's a gap that PCI devices can decode
and use and they often do not map it to any memory themselves.
So advertise it for what it is, a gap:
[gap 0xc0000000-0xfed1bfff] available for PCI devices
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Shevchenko <andy@kernel.org>
Cc: Arnd Bergmann <arnd@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Paul Menzel <pmenzel@molgen.mpg.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Link: https://patch.msgid.link/20250515120549.2820541-10-mingo@kernel.org
|
|
So it is a bit weird that the actual RAM entries of the E820 table
are not actually called RAM, but 'usable':
BIOS-e820: [mem 0x0000000000100000-0x000000007ffdbfff] 1.9 GB usable
'usable' is pretty passive-aggressive in that context and ambiguous,
most E820 entries denote 'usable' address ranges - reserved ranges
may be used by devices, or the platform.
Clarify and disambiguate this by making the boot log entry
explicitly say 'System RAM', like in /proc/iomem:
BIOS-e820: [mem 0x0000000000100000-0x000000007ffdbfff] 1.9 GB System RAM
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Shevchenko <andy@kernel.org>
Cc: Arnd Bergmann <arnd@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Paul Menzel <pmenzel@molgen.mpg.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Link: https://patch.msgid.link/20250515120549.2820541-9-mingo@kernel.org
|
|
e820_print_type()
We are going to add more columns to the E820 table printout,
so make e820_print_type()'s field separator (space character)
part of the function itself.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: H . Peter Anvin <hpa@zytor.com>
Cc: Andy Shevchenko <andy@kernel.org>
Cc: Arnd Bergmann <arnd@kernel.org>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Paul Menzel <pmenzel@molgen.mpg.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://patch.msgid.link/20250515120549.2820541-7-mingo@kernel.org
|
|
Gaps in the E820 table are not obvious at a glance and can
easily be overlooked.
Print out gaps in the E820 table:
Before:
BIOS-provided physical RAM map:
BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
BIOS-e820: [mem 0x0000000000100000-0x000000007ffdbfff] usable
BIOS-e820: [mem 0x000000007ffdc000-0x000000007fffffff] reserved
BIOS-e820: [mem 0x00000000b0000000-0x00000000bfffffff] reserved
BIOS-e820: [mem 0x00000000fed1c000-0x00000000fed1ffff] reserved
BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
BIOS-e820: [mem 0x000000fd00000000-0x000000ffffffffff] reserved
After:
BIOS-provided physical RAM map:
BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
BIOS-e820: [gap 0x00000000000a0000-0x00000000000effff]
BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
BIOS-e820: [mem 0x0000000000100000-0x000000007ffdbfff] usable
BIOS-e820: [mem 0x000000007ffdc000-0x000000007fffffff] reserved
BIOS-e820: [gap 0x0000000080000000-0x00000000afffffff]
BIOS-e820: [mem 0x00000000b0000000-0x00000000bfffffff] reserved
BIOS-e820: [gap 0x00000000c0000000-0x00000000fed1bfff]
BIOS-e820: [mem 0x00000000fed1c000-0x00000000fed1ffff] reserved
BIOS-e820: [gap 0x00000000fed20000-0x00000000feffbfff]
BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
BIOS-e820: [gap 0x00000000ff000000-0x00000000fffbffff]
BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
BIOS-e820: [gap 0x0000000100000000-0x000000fcffffffff]
BIOS-e820: [mem 0x000000fd00000000-0x000000ffffffffff] reserved
Also warn about badly ordered E820 table entries:
BUG: out of order E820 entry!
( this is printed before the entry is printed, so there's no need to
print any additional data with the warning. )
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: H . Peter Anvin <hpa@zytor.com>
Cc: Andy Shevchenko <andy@kernel.org>
Cc: Arnd Bergmann <arnd@kernel.org>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Paul Menzel <pmenzel@molgen.mpg.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://patch.msgid.link/20250515120549.2820541-6-mingo@kernel.org
|
|
There are no external users of this function left.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: H . Peter Anvin <hpa@zytor.com>
Cc: Andy Shevchenko <andy@kernel.org>
Cc: Arnd Bergmann <arnd@kernel.org>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Paul Menzel <pmenzel@molgen.mpg.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://patch.msgid.link/20250515120549.2820541-5-mingo@kernel.org
|
|
No need to print out the table - users won't really be able
to tell much from it anyway and the messages around this
erratum are unnecessarily obtuse.
Instead clearly inform the user that a 256 kB hole is being
punched in their memory map at the 1.75 GB physical address.
Not that there are many PPro users left. :-)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: H . Peter Anvin <hpa@zytor.com>
Cc: Andy Shevchenko <andy@kernel.org>
Cc: Arnd Bergmann <arnd@kernel.org>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Paul Menzel <pmenzel@molgen.mpg.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://patch.msgid.link/20250515120549.2820541-4-mingo@kernel.org
|
|
Introduce 'entry' for the current table entry and shorten
repetitious use of e820_table->entries[i].
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: H . Peter Anvin <hpa@zytor.com>
Cc: Andy Shevchenko <andy@kernel.org>
Cc: Arnd Bergmann <arnd@kernel.org>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Paul Menzel <pmenzel@molgen.mpg.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://patch.msgid.link/20250515120549.2820541-3-mingo@kernel.org
|
|
function name, rename it to e820_type_mergeable()
It's a bad practice to put inverted logic into function names,
flip it back and rename it to e820_type_mergeable().
Add/update a few comments about this function while at it.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Link: https://patch.msgid.link/20250515120549.2820541-2-mingo@kernel.org
|
|
ACPI v6.3 defined a new "Online Capable" MADT LAPIC flag. This bit is
used in conjunction with the "Enabled" MADT LAPIC flag to determine if
a CPU can be enabled/hotplugged by the OS after boot.
Before the new bit was defined, the "Enabled" bit was explicitly
described like this (ACPI v6.0 wording provided):
"If zero, this processor is unusable, and the operating system
support will not attempt to use it"
This means that CPU hotplug (based on MADT) is not possible. Many BIOS
implementations follow this guidance. They may include LAPIC entries in
MADT for unavailable CPUs, but since these entries are marked with
"Enabled=0" it is expected that the OS will completely ignore these
entries.
However, QEMU will do the same (include entries with "Enabled=0") for
the purpose of allowing CPU hotplug within the guest.
Comment from QEMU function pc_madt_cpu_entry():
/* ACPI spec says that LAPIC entry for non present
* CPU may be omitted from MADT or it must be marked
* as disabled. However omitting non present CPU from
* MADT breaks hotplug on linux. So possible CPUs
* should be put in MADT but kept disabled.
*/
Recent Linux topology changes broke the QEMU use case. A following fix
for the QEMU use case broke bare metal topology enumeration.
Rework the Linux MADT LAPIC flags check to allow the QEMU use case only
for guests and to maintain the ACPI spec behavior for bare metal.
Remove an unnecessary check added to fix a bare metal case introduced by
the QEMU "fix".
[ bp: Change logic as Michal suggested. ]
[ mingo: Removed misapplied -stable tag. ]
Fixes: fed8d8773b8e ("x86/acpi/boot: Correct acpi_is_processor_usable() check")
Fixes: f0551af02130 ("x86/topology: Ignore non-present APIC IDs in a present package")
Closes: https://lore.kernel.org/r/20251024204658.3da9bf3f.michal.pecio@gmail.com
Reported-by: Michal Pecio <michal.pecio@gmail.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Michal Pecio <michal.pecio@gmail.com>
Tested-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Link: https://lore.kernel.org/20251111145357.4031846-1-yazen.ghannam@amd.com
Cc: stable@vger.kernel.org
|
|
When UBSAN is enabled, multiple array-index-out-of-bounds messages are
printed:
[ 0.000000] [ T0] UBSAN: array-index-out-of-bounds in arch/x86/kernel/apic/x2apic_uv_x.c:276:23
[ 0.000000] [ T0] index 1 is out of range for type '<unknown> [1]'
...
[ 0.000000] [ T0] UBSAN: array-index-out-of-bounds in arch/x86/kernel/apic/x2apic_uv_x.c:277:32
[ 0.000000] [ T0] index 1 is out of range for type '<unknown> [1]'
...
[ 0.000000] [ T0] UBSAN: array-index-out-of-bounds in arch/x86/kernel/apic/x2apic_uv_x.c:282:16
[ 0.000000] [ T0] index 1 is out of range for type '<unknown> [1]'
...
[ 0.515850] [ T1] UBSAN: array-index-out-of-bounds in arch/x86/kernel/apic/x2apic_uv_x.c:1344:23
[ 0.519851] [ T1] index 1 is out of range for type '<unknown> [1]'
...
[ 0.603850] [ T1] UBSAN: array-index-out-of-bounds in arch/x86/kernel/apic/x2apic_uv_x.c:1345:32
[ 0.607850] [ T1] index 1 is out of range for type '<unknown> [1]'
...
[ 0.691850] [ T1] UBSAN: array-index-out-of-bounds in arch/x86/kernel/apic/x2apic_uv_x.c:1353:20
[ 0.695850] [ T1] index 1 is out of range for type '<unknown> [1]'
One-element arrays have been deprecated:
https://docs.kernel.org/process/deprecated.html#zero-length-and-one-element-arrays
Switch entry in struct uv_systab to a flexible array member to fix UBSAN
array-index-out-of-bounds messages.
sizeof(struct uv_systab) is passed to early_memremap() and ioremap(). The
flexible array member is not accessed until the UV system table size is used to
remap the entire UV system table, so changes to sizeof(struct uv_systab) have no
impact.
Signed-off-by: Kyle Meyer <kyle.meyer@hpe.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://patch.msgid.link/aTxksN-3otY41WvQ@hpe.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf event fixes from Ingo Molnar:
- Fix NULL pointer dereference crash in the Intel PMU driver
- Fix missing read event generation on task exit
- Fix AMD uncore driver init error handling
- Fix whitespace noise
* tag 'perf-urgent-2025-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Fix NULL event dereference crash in handle_pmi_common()
perf/core: Fix missing read event generation on task exit
perf/x86/amd/uncore: Fix the return value of amd_uncore_df_event_init() on error
perf/uprobes: Remove <space><Tab> whitespace noise
|
|
There is no opening quote. Remove the unmatched closing quote.
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Link: https://patch.msgid.link/20251210125628.544916-1-thorsten.blum@linux.dev
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull more MM updates from Andrew Morton:
- "powerpc/pseries/cmm: two smaller fixes" (David Hildenbrand)
fixes a couple of minor things in ppc land
- "Improve folio split related functions" (Zi Yan)
some cleanups and minorish fixes in the folio splitting code
* tag 'mm-stable-2025-12-11-11-39' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm/damon/tests/core-kunit: avoid damos_test_commit stack warning
mm: vmscan: correct nr_requested tracing in scan_folios
MAINTAINERS: add idr core-api doc file to XARRAY
mm/hugetlb: fix incorrect error return from hugetlb_reserve_pages()
mm: fix CONFIG_STACK_GROWSUP typo in mm.h
mm/huge_memory: fix folio split stats counting
mm/huge_memory: make min_order_for_split() always return an order
mm/huge_memory: replace can_split_folio() with direct refcount calculation
mm/huge_memory: change folio_split_supported() to folio_check_splittable()
mm/sparse: fix sparse_vmemmap_init_nid_early definition without CONFIG_SPARSEMEM
powerpc/pseries/cmm: adjust BALLOON_MIGRATE when migrating pages
powerpc/pseries/cmm: call balloon_devinfo_init() also without CONFIG_BALLOON_COMPACTION
|
|
Commit 7bfe3b8ea6e3 ("Drivers: hv: Introduce mshv_vtl driver") added a
new generated header file for the offsets into the mshv_vtl_cpu_context
structure to be used by the low-level assembly code. But it didn't add
the .gitignore file to go with it, so 'git status' and friends will
mention it.
Let's add the gitignore file before somebody thinks that generated
header should be committed.
Fixes: 7bfe3b8ea6e3 ("Drivers: hv: Introduce mshv_vtl driver")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"The only slightly large change is the enablement of CIX HD-audio
controller, which took a bit time to be cooked up, while most of other
changes are device-specific small trivial fixes:
- Default disablement of the kconfig for decades old pre-release
alsa-lib PCM API; it's only the default config value change, so it
can't lead to any regressions for the existing setups
- Support for CIX HD-audio controller
- A few ASoC ACP fixes
- Fixes for ASoC cirrus, bcm, wcd, qcom, ak platforms
- Trivial hardening for FireWire and USB-audio
- HD-audio Intel binding fix and quirks"
* tag 'sound-fix-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (30 commits)
ALSA: hda/tas2781: Add new quirk for HP new project
ALSA: hda: cix-ipbloq: Use modern PM ops
ALSA: hda: intel-dsp-config: Prefer legacy driver as fallback
ASoC: amd: acp: update tdm channels for specific DAI
ASoC: cs35l56: Fix incorrect select SND_SOC_CS35L56_CAL_SYSFS_COMMON
ALSA: firewire-motu: add bounds check in put_user loop for DSP events
ASoC: cs35l41: Always return 0 when a subsystem ID is found
ALSA: uapi: Fix typo in asound.h comment
ALSA: Do not build obsolete API
ALSA: hda: add CIX IPBLOQ HDA controller support
ALSA: hda/core: add addr_offset field for bus address translation
ALSA: hda: dt-bindings: add CIX IPBLOQ HDA controller support
ALSA: hda/realtek: Add support for ASUS UM3406GA
ALSA: hda/realtek: Add support for HP Turbine Laptops
ALSA: usb-audio: Initialize status1 to fix uninitialized symbol errors
ALSA: firewire-motu: fix buffer overflow in hwdep read for DSP events
ALSA: hda: cs35l41: Fix NULL pointer dereference in cs35l41_hda_read_acpi()
ASoC: cros_ec_codec: Remove unnecessary selection of CRYPTO
ASoc: qcom: q6afe: fix bad guard conversion
ASoC: rockchip: Fix Wvoid-pointer-to-enum-cast warning (again)
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch updates from Huacai Chen:
- Add basic LoongArch32 support
Note: Build infrastructures of LoongArch32 are not enabled yet,
because we need to adjust irqchip drivers and wait for GNU toolchain
be upstream first.
- Select HAVE_ARCH_BITREVERSE in Kconfig
- Fix build and boot for CONFIG_RANDSTRUCT
- Correct the calculation logic of thread_count
- Some bug fixes and other small changes
* tag 'loongarch-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson: (22 commits)
LoongArch: Adjust default config files for 32BIT/64BIT
LoongArch: Adjust VDSO/VSYSCALL for 32BIT/64BIT
LoongArch: Adjust misc routines for 32BIT/64BIT
LoongArch: Adjust user accessors for 32BIT/64BIT
LoongArch: Adjust system call for 32BIT/64BIT
LoongArch: Adjust module loader for 32BIT/64BIT
LoongArch: Adjust time routines for 32BIT/64BIT
LoongArch: Adjust process management for 32BIT/64BIT
LoongArch: Adjust memory management for 32BIT/64BIT
LoongArch: Adjust boot & setup for 32BIT/64BIT
LoongArch: Adjust common macro definitions for 32BIT/64BIT
LoongArch: Add adaptive CSR accessors for 32BIT/64BIT
LoongArch: Add atomic operations for 32BIT/64BIT
LoongArch: Add new PCI ID for pci_fixup_vgadev()
LoongArch: Add and use some macros for AVEC
LoongArch: Correct the calculation logic of thread_count
LoongArch: Use unsigned long for _end and _text
LoongArch: Use __pmd()/__pte() for swap entry conversions
LoongArch: Fix arch_dup_task_struct() for CONFIG_RANDSTRUCT
LoongArch: Fix build errors for CONFIG_RANDSTRUCT
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull crypto library fixes from Eric Biggers:
"Fixes for some recent regressions as well as some longstanding issues:
- Fix incorrect output from the arm64 NEON implementation of GHASH
- Merge the ksimd scopes in the arm64 XTS code to reduce stack usage
- Roll up the BLAKE2b round loop on 32-bit kernels to greatly reduce
code size and stack usage
- Add missing RISCV_EFFICIENT_VECTOR_UNALIGNED_ACCESS dependency
- Fix chacha-riscv64-zvkb.S to not use frame pointer for data"
* tag 'libcrypto-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
crypto: arm64/ghash - Fix incorrect output from ghash-neon
crypto/arm64: sm4/xts - Merge ksimd scopes to reduce stack bloat
crypto/arm64: aes/xts - Use single ksimd scope to reduce stack bloat
lib/crypto: blake2s: Replace manual unrolling with unrolled_full
lib/crypto: blake2b: Roll up BLAKE2b round loop on 32-bit
lib/crypto: riscv: Depend on RISCV_EFFICIENT_VECTOR_UNALIGNED_ACCESS
lib/crypto: riscv/chacha: Avoid s0/fp register
|
|
handle_pmi_common() may observe an active bit set in cpuc->active_mask
while the corresponding cpuc->events[] entry has already been cleared,
which leads to a NULL pointer dereference.
This can happen when interrupt throttling stops all events in a group
while PEBS processing is still in progress. perf_event_overflow() can
trigger perf_event_throttle_group(), which stops the group and clears
the cpuc->events[] entry, but the active bit may still be set when
handle_pmi_common() iterates over the events.
The following recent fix:
7e772a93eb61 ("perf/x86: Fix NULL event access and potential PEBS record loss")
moved the cpuc->events[] clearing from x86_pmu_stop() to x86_pmu_del() and
relied on cpuc->active_mask/pebs_enabled checks. However,
handle_pmi_common() can still encounter a NULL cpuc->events[] entry
despite the active bit being set.
Add an explicit NULL check on the event pointer before using it,
to cover this legitimate scenario and avoid the NULL dereference crash.
Fixes: 7e772a93eb61 ("perf/x86: Fix NULL event access and potential PEBS record loss")
Reported-by: kitta <kitta@linux.alibaba.com>
Co-developed-by: kitta <kitta@linux.alibaba.com>
Signed-off-by: Evan Li <evan.li@linux.alibaba.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://patch.msgid.link/20251212084943.2124787-1-evan.li@linux.alibaba.com
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220855
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull more s390 updates from Heiko Carstens:
- Use the MSI parent domain API instead of the legacy API for setup and
teardown of PCI MSI IRQs
- Select POSIX_CPU_TIMERS_TASK_WORK now that VIRT_XFER_TO_GUEST_WORK
has been implemented for s390
- Fix a KVM bug which can lead to guest memory corruption
- Fix KASAN shadow memory mapping for hotplugged memory
- Minor bug fixes and improvements
* tag 's390-6.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/bug: Add missing alignment
s390/bug: Add missing CONFIG_BUG ifdef again
KVM: s390: Fix gmap_helper_zap_one_page() again
s390/pci: Migrate s390 IRQ logic to IRQ domain API
genirq: Change hwirq parameter to irq_hw_number_t
s390: Select POSIX_CPU_TIMERS_TASK_WORK
s390: Unmap early KASAN shadow on memory offlining
s390/vmem: Support 2G page splitting for KASAN shadow freeing
s390/boot: Use entire page for PTEs
s390/vmur: Use scnprintf() instead of sprintf()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/lindholm/alpha
Pull alpha updates from Magnus Lindholm:
"Two small uapi fixes. One patch hardcodes TC* ioctl values that
previously depended on the deprecated termio struct, avoiding build
issues with newer glibc versions. The other patch switches uapi
headers to use the compiler-defined __ASSEMBLER__ macro for better
consistency between kernel and userspace.
- don't reference obsolete termio struct for TC* constants
- Replace __ASSEMBLY__ with __ASSEMBLER__ in the alpha headers"
* tag 'alpha-for-v6.19-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/lindholm/alpha:
alpha: don't reference obsolete termio struct for TC* constants
alpha: Replace __ASSEMBLY__ with __ASSEMBLER__ in the alpha headers
|
|
Pull ARM updates from Russell King:
- disable jump label and high PTE for PREEMPT RT kernels
- fix input operand modification in load_unaligned_zeropad()
- fix hash_name() / fault path induced warnings
- fix branch predictor hardening
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux:
ARM: fix branch predictor hardening
ARM: fix hash_name() fault
ARM: allow __do_kernel_fault() to report execution of memory faults
ARM: group is_permission_fault() with is_translation_fault()
ARM: 9464/1: fix input-only operand modification in load_unaligned_zeropad()
ARM: 9461/1: Disable HIGHPTE on PREEMPT_RT kernels
ARM: 9459/1: Disable jump-label on PREEMPT_RT
|
|
Commit 9a7c987fb92b ("crypto: arm64/ghash - Use API partial block
handling") made ghash_finup() pass the wrong buffer to
ghash_do_simd_update(). As a result, ghash-neon now produces incorrect
outputs when the message length isn't divisible by 16 bytes. Fix this.
(I didn't notice this earlier because this code is reached only on CPUs
that support NEON but not PMULL. I haven't yet found a way to get
qemu-system-aarch64 to emulate that configuration.)
Fixes: 9a7c987fb92b ("crypto: arm64/ghash - Use API partial block handling")
Cc: stable@vger.kernel.org
Reported-by: Diederik de Haas <diederik@cknow-tech.com>
Closes: https://lore.kernel.org/linux-crypto/DETXT7QI62KE.F3CGH2VWX1SC@cknow-tech.com/
Tested-by: Diederik de Haas <diederik@cknow-tech.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Link: https://lore.kernel.org/r/20251209223417.112294-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
|
|
__do_user_fault() may be called with indeterminent interrupt enable
state, which means we may be preemptive at this point. This causes
problems when calling harden_branch_predictor(). For example, when
called from a data abort, do_alignment_fault()->do_bad_area().
Move harden_branch_predictor() out of __do_user_fault() and into the
calling contexts.
Moving it into do_kernel_address_page_fault(), we can be sure that
interrupts will be disabled here.
Converting do_translation_fault() to use do_kernel_address_page_fault()
rather than do_bad_area() means that we keep branch predictor handling
for translation faults. Interrupts will also be disabled at this call
site.
do_sect_fault() needs special handling, so detect user mode accesses
to kernel-addresses, and add an explicit call to branch predictor
hardening.
Finally, add branch predictor hardening to do_alignment() for the
faulting case (user mode accessing kernel addresses) before interrupts
are enabled.
This should cover all cases where harden_branch_predictor() is called,
ensuring that it is always has interrupts disabled, also ensuring that
it is called early in each call path.
Reviewed-by: Xie Yuanbin <xieyuanbin1@huawei.com>
Tested-by: Xie Yuanbin <xieyuanbin1@huawei.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
|
|
Zizhi Wo reports:
"During the execution of hash_name()->load_unaligned_zeropad(), a
potential memory access beyond the PAGE boundary may occur. For
example, when the filename length is near the PAGE_SIZE boundary.
This triggers a page fault, which leads to a call to
do_page_fault()->mmap_read_trylock(). If we can't acquire the lock,
we have to fall back to the mmap_read_lock() path, which calls
might_sleep(). This breaks RCU semantics because path lookup occurs
under an RCU read-side critical section."
This is seen with CONFIG_DEBUG_ATOMIC_SLEEP=y and CONFIG_KFENCE=y.
Kernel addresses (with the exception of the vectors/kuser helper
page) do not have VMAs associated with them. If the vectors/kuser
helper page faults, then there are two possibilities:
1. if the fault happened while in kernel mode, then we're basically
dead, because the CPU won't be able to vector through this page
to handle the fault.
2. if the fault happened while in user mode, that means the page was
protected from user access, and we want to fault anyway.
Thus, we can handle kernel addresses from any context entirely
separately without going anywhere near the mmap lock. This gives us
an entirely non-sleeping path for all kernel mode kernel address
faults.
As we handle the kernel address faults before interrupts are enabled,
this change has the side effect of improving the branch predictor
hardening, but does not completely solve the issue.
Reported-by: Zizhi Wo <wozizhi@huaweicloud.com>
Reported-by: Xie Yuanbin <xieyuanbin1@huawei.com>
Link: https://lore.kernel.org/r/20251126090505.3057219-1-wozizhi@huaweicloud.com
Reviewed-by: Xie Yuanbin <xieyuanbin1@huawei.com>
Tested-by: Xie Yuanbin <xieyuanbin1@huawei.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
|
|
Allow __do_kernel_fault() to detect the execution of memory, so we can
provide the same fault message as do_page_fault() would do. This is
required when we split the kernel address fault handling from the
main do_page_fault() code path.
Reviewed-by: Xie Yuanbin <xieyuanbin1@huawei.com>
Tested-by: Xie Yuanbin <xieyuanbin1@huawei.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
|
|
Zero can be a valid value of num_records. For example, on Intel Atom x6425RE,
only x87 and SSE are supported (features 0, 1), and fpu_user_cfg.max_features
is 3. The for_each_extended_xfeature() loop only iterates feature 2, which is
not enabled, so num_records = 0. This is valid and should not cause core dump
failure.
The issue is that dump_xsave_layout_desc() returns 0 for both genuine errors
(dump_emit() failure) and valid cases (no extended features). Use negative
return values for errors and only abort on genuine failures.
Fixes: ba386777a30b ("x86/elf: Add a new FPU buffer layout info to x86 core files")
Signed-off-by: Yongxin Liu <yongxin.liu@windriver.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://patch.msgid.link/20251210000219.4094353-2-yongxin.liu@windriver.com
|
|
BPF JIT programs and trampolines use a frame pointer, so the current ORC
unwinder strategy of falling back to frame pointers (when an ORC entry
is missing) usually works in practice when unwinding through BPF JIT
stack frames.
However, that frame pointer fallback is just a guess, so the unwind gets
marked unreliable for live patching, which can cause livepatch
transition stalls.
Make the common case reliable by calling the bpf_has_frame_pointer()
helper to detect the valid frame pointer region of BPF JIT programs and
trampolines.
Fixes: ee9f8fce9964 ("x86/unwind: Add the ORC unwinder")
Reported-by: Andrey Grodzovsky <andrey.grodzovsky@crowdstrike.com>
Closes: https://lore.kernel.org/0e555733-c670-4e84-b2e6-abb8b84ade38@crowdstrike.com
Acked-by: Song Liu <song@kernel.org>
Acked-and-tested-by: Andrey Grodzovsky <andrey.grodzovsky@crowdstrike.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Link: https://lore.kernel.org/r/a18505975662328c8ffb1090dded890c6f8c1004.1764818927.git.jpoimboe@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
|
|
Introduce a bpf_has_frame_pointer() helper that unwinders can call to
determine whether a given instruction pointer is within the valid frame
pointer region of a BPF JIT program or trampoline (i.e., after the
prologue, before the epilogue).
This will enable livepatch (with the ORC unwinder) to reliably unwind
through BPF JIT frames.
Acked-by: Song Liu <song@kernel.org>
Acked-and-tested-by: Andrey Grodzovsky <andrey.grodzovsky@crowdstrike.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Link: https://lore.kernel.org/r/fd2bc5b4e261a680774b28f6100509fd5ebad2f0.1764818927.git.jpoimboe@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
|
|
Analogically to the x86 commit 881a9c9cb785 ("bpf: Do not audit
capability check in do_jit()"), change the capable() call to
ns_capable_noaudit() in order to avoid spurious SELinux denials in audit
log.
The commit log from that commit applies here as well:
"""
The failure of this check only results in a security mitigation being
applied, slightly affecting performance of the compiled BPF program. It
doesn't result in a failed syscall, an thus auditing a failed LSM
permission check for it is unwanted. For example with SELinux, it causes
a denial to be reported for confined processes running as root, which
tends to be flagged as a problem to be fixed in the policy. Yet
dontauditing or allowing CAP_SYS_ADMIN to the domain may not be
desirable, as it would allow/silence also other checks - either going
against the principle of least privilege or making debugging potentially
harder.
Fix it by changing it from capable() to ns_capable_noaudit(), which
instructs the LSMs to not audit the resulting denials.
"""
Fixes: f300769ead03 ("arm64: bpf: Only mitigate cBPF programs loaded by unprivileged users")
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Link: https://lore.kernel.org/r/20251204125916.441021-1-omosnace@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Pull csky updates from Guo Ren:
- Remove compile warning for CONFIG_SMP
- Fix __ASSEMBLER__ typo in headers
- Fix csky_cmpxchg_fixup
* tag 'csky-for-linus-6.19' of https://github.com/c-sky/csky-linux:
csky: Remove compile warning for CONFIG_SMP
csky: Replace __ASSEMBLY__ with __ASSEMBLER__ in uapi header
csky: Replace __ASSEMBLY__ with __ASSEMBLER__ in non-uapi headers
csky: fix csky_cmpxchg_fixup not working
|
|
Merge the two ksimd scopes in the implementation of SM4-XTS to prevent
stack bloat in cases where the compiler fails to combine the stack slots
for the kernel mode FP/SIMD buffers.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20251203163803.157541-6-ardb@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|