kernel/linux.git - Linux kernel stable tree (mirror)

Age	Commit message (Collapse)	Author	Files	Lines
2025-06-27	efi/libstub: Describe missing 'out' parameter in efi_load_initrd	Hans Zhang	1	-0/+1
	[ Upstream commit c8e1927e7f7d63721e32ec41d27ccb0eb1a1b0fc ] The function efi_load_initrd() had a documentation warning due to the missing description for the 'out' parameter. Add the parameter description to the kernel-doc comment to resolve the warning and improve API documentation. Fixes the following compiler warning: drivers/firmware/efi/libstub/efi-stub-helper.c:611: warning: Function parameter or struct member 'out' not described in 'efi_load_initrd' Fixes: f4dc7fffa987 ("efi: libstub: unify initrd loading between architectures") Signed-off-by: Hans Zhang <18255117159@163.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-03-28	efi/libstub: Avoid physical address 0x0 when doing random allocation	Ard Biesheuvel	1	-0/+4
	commit cb16dfed0093217a68c0faa9394fa5823927e04c upstream. Ben reports spurious EFI zboot failures on a system where physical RAM starts at 0x0. When doing random memory allocation from the EFI stub on such a platform, a random seed of 0x0 (which means no entropy source is available) will result in the allocation to be placed at address 0x0 if sufficient space is available. When this allocation is subsequently passed on to the decompression code, the 0x0 address is mistaken for NULL and the code complains and gives up. So avoid address 0x0 when doing random allocation, and set the minimum address to the minimum alignment. Cc: <stable@vger.kernel.org> Reported-by: Ben Schneider <ben@bens.haus> Tested-by: Ben Schneider <ben@bens.haus> Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-21	efi: Avoid cold plugged memory for placing the kernel	Ard Biesheuvel	2	-0/+6
	commit ba69e0750b0362870294adab09339a0c39c3beaf upstream. UEFI 2.11 introduced EFI_MEMORY_HOT_PLUGGABLE to annotate system memory regions that are 'cold plugged' at boot, i.e., hot pluggable memory that is available from early boot, and described as system RAM by the firmware. Existing loaders and EFI applications running in the boot context will happily use this memory for allocating data structures that cannot be freed or moved at runtime, and this prevents the memory from being unplugged. Going forward, the new EFI_MEMORY_HOT_PLUGGABLE attribute should be tested, and memory annotated as such should be avoided for such allocations. In the EFI stub, there are a couple of occurrences where, instead of the high-level AllocatePages() UEFI boot service, a low-level code sequence is used that traverses the EFI memory map and carves out the requested number of pages from a free region. This is needed, e.g., for allocating as low as possible, or for allocating pages at random. While AllocatePages() should presumably avoid special purpose memory and cold plugged regions, this manual approach needs to incorporate this logic itself, in order to prevent the kernel itself from ending up in a hot unpluggable region, preventing it from being unplugged. So add the EFI_MEMORY_HOTPLUGGABLE macro definition, and check for it where appropriate. Cc: stable@vger.kernel.org Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-21	efi: libstub: Use '-std=gnu11' to fix build with GCC 15	Nathan Chancellor	1	-1/+1
	commit 8ba14d9f490aef9fd535c04e9e62e1169eb7a055 upstream. GCC 15 changed the default C standard version to C23, which should not have impacted the kernel because it requests the gnu11 standard via '-std=' in the main Makefile. However, the EFI libstub Makefile uses its own set of KBUILD_CFLAGS for x86 without a '-std=' value (i.e., using the default), resulting in errors from the kernel's definitions of bool, true, and false in stddef.h, which are reserved keywords under C23. ./include/linux/stddef.h:11:9: error: expected identifier before ‘false’ 11 \| false = 0, ./include/linux/types.h:35:33: error: two or more data types in declaration specifiers 35 \| typedef _Bool bool; Set '-std=gnu11' in the x86 cflags to resolve the error and consistently use the same C standard version for the entire kernel. All other architectures reuse KBUILD_CFLAGS from the rest of the kernel, so this issue is not visible for them. Cc: stable@vger.kernel.org Reported-by: Kostadin Shishmanov <kostadinshishmanov@protonmail.com> Closes: https://lore.kernel.org/4OAhbllK7x4QJGpZjkYjtBYNLd_2whHx9oFiuZcGwtVR4hIzvduultkgfAIRZI3vQpZylu7Gl929HaYFRGeMEalWCpeMzCIIhLxxRhq4U-Y=@protonmail.com/ Reported-by: Jakub Jelinek <jakub@redhat.com> Closes: https://lore.kernel.org/Z4467umXR2PZ0M1H@tucnak/ Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-17	efistub/tpm: Use ACPI reclaim memory for event log to avoid corruption	Ard Biesheuvel	1	-1/+1
	commit 77d48d39e99170b528e4f2e9fc5d1d64cdedd386 upstream. The TPM event log table is a Linux specific construct, where the data produced by the GetEventLog() boot service is cached in memory, and passed on to the OS using an EFI configuration table. The use of EFI_LOADER_DATA here results in the region being left unreserved in the E820 memory map constructed by the EFI stub, and this is the memory description that is passed on to the incoming kernel by kexec, which is therefore unaware that the region should be reserved. Even though the utility of the TPM2 event log after a kexec is questionable, any corruption might send the parsing code off into the weeds and crash the kernel. So let's use EFI_ACPI_RECLAIM_MEMORY instead, which is always treated as reserved by the E820 conversion logic. Cc: <stable@vger.kernel.org> Reported-by: Breno Leitao <leitao@debian.org> Tested-by: Usama Arif <usamaarif642@gmail.com> Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-08-03	x86/efistub: Revert to heap allocated boot_params for PE entrypoint	Ard Biesheuvel	1	-5/+15
	commit ae835a96d72cd025421910edb0e8faf706998727 upstream. This is a partial revert of commit 8117961d98f ("x86/efi: Disregard setup header of loaded image") which triggers boot issues on older Dell laptops. As it turns out, switching back to a heap allocation for the struct boot_params constructed by the EFI stub works around this, even though it is unclear why. Cc: Christian Heusel <christian@heusel.eu> Reported-by: <mavrix#kernel@simplelogin.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-08-03	x86/efistub: Avoid returning EFI_SUCCESS on error	Ard Biesheuvel	1	-4/+1
	commit fb318ca0a522295edd6d796fb987e99ec41f0ee5 upstream. The fail label is only used in a situation where the previous EFI API call succeeded, and so status will be set to EFI_SUCCESS. Fix this, by dropping the goto entirely, and call efi_exit() with the correct error code. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-25	efi/libstub: zboot.lds: Discard .discard sections	Nathan Chancellor	1	-0/+1
	[ Upstream commit 5134acb15d9ef27aa2b90aad46d4e89fcef79fdc ] When building ARCH=loongarch defconfig + CONFIG_UNWINDER_ORC=y using LLVM, there is a warning from ld.lld when linking the EFI zboot image due to the use of unreachable() in number() in vsprintf.c: ld.lld: warning: drivers/firmware/efi/libstub/lib.a(vsprintf.stub.o):(.discard.unreachable+0x0): has non-ABS relocation R_LARCH_32_PCREL against symbol '' If the compiler cannot eliminate the default case for any reason, the .discard.unreachable section will remain in the final binary but the entire point of any section prefixed with .discard is that it is only used at compile time, so it can be discarded via /DISCARD/ in a linker script. The asm-generic vmlinux.lds.h includes .discard and .discard.* in the COMMON_DISCARDS macro but that is not used for zboot.lds, as it is not a kernel image linker script. Add .discard and .discard.* to /DISCARD/ in zboot.lds, so that any sections meant to be discarded at link time are not included in the final zboot image. This issue is not specific to LoongArch, it is just the first architecture to select CONFIG_OBJTOOL, which defines annotate_unreachable() as an asm statement to add the .discard.unreachable section, and use the EFI stub. Closes: https://github.com/ClangBuiltLinux/linux/issues/2023 Signed-off-by: Nathan Chancellor <nathan@kernel.org> Acked-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-12	efi: libstub: only free priv.runtime_map when allocated	Hagar Hemdan	1	-2/+2
	commit 4b2543f7e1e6b91cfc8dd1696e3cdf01c3ac8974 upstream. priv.runtime_map is only allocated when efi_novamap is not set. Otherwise, it is an uninitialized value. In the error path, it is freed unconditionally. Avoid passing an uninitialized value to free_pool. Free priv.runtime_map only when it was allocated. This bug was discovered and resolved using Coverity Static Analysis Security Testing (SAST) by Synopsys, Inc. Fixes: f80d26043af9 ("efi: libstub: avoid efi_get_memory_map() for allocating the virt map") Cc: <stable@vger.kernel.org> Signed-off-by: Hagar Hemdan <hagarhem@amazon.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-12	x86/efistub: Omit physical KASLR when memory reservations exist	Ard Biesheuvel	1	-2/+26
	commit 15aa8fb852f995dd234a57f12dfb989044968bb6 upstream. The legacy decompressor has elaborate logic to ensure that the randomized physical placement of the decompressed kernel image does not conflict with any memory reservations, including ones specified on the command line using mem=, memmap=, efi_fake_mem= or hugepages=, which are taken into account by the kernel proper at a later stage. When booting in EFI mode, it is the firmware's job to ensure that the chosen range does not conflict with any memory reservations that it knows about, and this is trivially achieved by using the firmware's memory allocation APIs. That leaves reservations specified on the command line, though, which the firmware knows nothing about, as these regions have no other special significance to the platform. Since commit a1b87d54f4e4 ("x86/efistub: Avoid legacy decompressor when doing EFI boot") these reservations are not taken into account when randomizing the physical placement, which may result in conflicts where the memory cannot be reserved by the kernel proper because its own executable image resides there. To avoid having to duplicate or reuse the existing complicated logic, disable physical KASLR entirely when such overrides are specified. These are mostly diagnostic tools or niche features, and physical KASLR (as opposed to virtual KASLR, which is much more important as it affects the memory addresses observed by code executing in the kernel) is something we can live without. Closes: https://lkml.kernel.org/r/FA5F6719-8824-4B04-803E-82990E65E627%40akamai.com Reported-by: Ben Chaney <bchaney@akamai.com> Fixes: a1b87d54f4e4 ("x86/efistub: Avoid legacy decompressor when doing EFI boot") Cc: <stable@vger.kernel.org> # v6.1+ Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-27	x86/efistub: Remap kernel text read-only before dropping NX attribute	Ard Biesheuvel	1	-1/+10
	[ Commit 9c55461040a9264b7e44444c53d26480b438eda6 upstream ] Currently, the EFI stub invokes the EFI memory attributes protocol to strip any NX restrictions from the entire loaded kernel, resulting in all code and data being mapped read-write-execute. The point of the EFI memory attributes protocol is to remove the need for all memory allocations to be mapped with both write and execute permissions by default, and make it the OS loader's responsibility to transition data mappings to code mappings where appropriate. Even though the UEFI specification does not appear to leave room for denying memory attribute changes based on security policy, let's be cautious and avoid relying on the ability to create read-write-execute mappings. This is trivially achievable, given that the amount of kernel code executing via the firmware's 1:1 mapping is rather small and limited to the .head.text region. So let's drop the NX restrictions only on that subregion, but not before remapping it as read-only first. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-27	x86/efistub: Reinstate soft limit for initrd loading	Ard Biesheuvel	1	-0/+1
	[ Commit decd347c2a75d32984beb8807d470b763a53b542 upstream ] Commit 8117961d98fb2 ("x86/efi: Disregard setup header of loaded image") dropped the memcopy of the image's setup header into the boot_params struct provided to the core kernel, on the basis that EFI boot does not need it and should rely only on a single protocol to interface with the boot chain. It is also a prerequisite for being able to increase the section alignment to 4k, which is needed to enable memory protections when running in the boot services. So only the setup_header fields that matter to the core kernel are populated explicitly, and everything else is ignored. One thing was overlooked, though: the initrd_addr_max field in the setup_header is not used by the core kernel, but it is used by the EFI stub itself when it loads the initrd, where its default value of INT_MAX is used as the soft limit for memory allocation. This means that, in the old situation, the initrd was virtually always loaded in the lower 2G of memory, but now, due to initrd_addr_max being 0x0, the initrd may end up anywhere in memory. This should not be an issue principle, as most systems can deal with this fine. However, it does appear to tickle some problems in older UEFI implementations, where the memory ends up being corrupted, resulting in errors when unpacking the initramfs. So set the initrd_addr_max field to INT_MAX like it was before. Fixes: 8117961d98fb2 ("x86/efi: Disregard setup header of loaded image") Reported-by: Radek Podgorny <radek@podgorny.cz> Closes: https://lore.kernel.org/all/a99a831a-8ad5-4cb0-bff9-be637311f771@podgorny.cz Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-27	x86/efi: Disregard setup header of loaded image	Ard Biesheuvel	1	-40/+6
	[ Commit 7e50262229faad0c7b8c54477cd1c883f31cc4a7 upstream ] The native EFI entrypoint does not take a struct boot_params from the loader, but instead, it constructs one from scratch, using the setup header data placed at the start of the image. This setup header is placed in a way that permits legacy loaders to manipulate the contents (i.e., to pass the kernel command line or the address and size of an initial ramdisk), but EFI boot does not use it in that way - it only copies the contents that were placed there at build time, but EFI loaders will not (and should not) manipulate the setup header to configure the boot. (Commit 63bf28ceb3ebbe76 "efi: x86: Wipe setup_data on pure EFI boot" deals with some of the fallout of using setup_data in a way that breaks EFI boot.) Given that none of the non-zero values that are copied from the setup header into the EFI stub's struct boot_params are relevant to the boot now that the EFI stub no longer enters via the legacy decompressor, the copy can be omitted altogether. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230912090051.4014114-19-ardb@google.com Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-27	x86/efi: Drop EFI stub .bss from .data section	Ard Biesheuvel	1	-7/+0
	[ Commit 5f51c5d0e905608ba7be126737f7c84a793ae1aa upstream ] Now that the EFI stub always zero inits its BSS section upon entry, there is no longer a need to place the BSS symbols carried by the stub into the .data section. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230912090051.4014114-18-ardb@google.com Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-03	efi/libstub: Cast away type warning in use of max()	Ard Biesheuvel	1	-1/+1
	commit 61d130f261a3c15ae2c4b6f3ac3517d5d5b78855 upstream. Avoid a type mismatch warning in max() by switching to max_t() and providing the type explicitly. Fixes: 3cb4a4827596abc82e ("efi/libstub: fix efi_random_alloc() ...") Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-03	efi/libstub: fix efi_random_alloc() to allocate memory at alloc_min or ↵	KONDO KAZUMA(近藤　和真)	1	-1/+1
	higher address [ Upstream commit 3cb4a4827596abc82e55b80364f509d0fefc3051 ] Following warning is sometimes observed while booting my servers: [ 3.594838] DMA: preallocated 4096 KiB GFP_KERNEL pool for atomic allocations [ 3.602918] swapper/0: page allocation failure: order:10, mode:0xcc1(GFP_KERNEL\|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0-1 ... [ 3.851862] DMA: preallocated 1024 KiB GFP_KERNEL\|GFP_DMA pool for atomic allocation If 'nokaslr' boot option is set, the warning always happens. On x86, ZONE_DMA is small zone at the first 16MB of physical address space. When this problem happens, most of that space seems to be used by decompressed kernel. Thereby, there is not enough space at DMA_ZONE to meet the request of DMA pool allocation. The commit 2f77465b05b1 ("x86/efistub: Avoid placing the kernel below LOAD_PHYSICAL_ADDR") tried to fix this problem by introducing lower bound of allocation. But the fix is not complete. efi_random_alloc() allocates pages by following steps. 1. Count total available slots ('total_slots') 2. Select a slot ('target_slot') to allocate randomly 3. Calculate a starting address ('target') to be included target_slot 4. Allocate pages, which starting address is 'target' In step 1, 'alloc_min' is used to offset the starting address of memory chunk. But in step 3 'alloc_min' is not considered at all. As the result, 'target' can be miscalculated and become lower than 'alloc_min'. When KASLR is disabled, 'target_slot' is always 0 and the problem happens everytime if the EFI memory map of the system meets the condition. Fix this problem by calculating 'target' considering 'alloc_min'. Cc: linux-efi@vger.kernel.org Cc: Tom Englund <tomenglund26@gmail.com> Cc: linux-kernel@vger.kernel.org Fixes: 2f77465b05b1 ("x86/efistub: Avoid placing the kernel below LOAD_PHYSICAL_ADDR") Signed-off-by: Kazuma Kondo <kazuma-kondo@nec.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-27	x86/efistub: Don't clear BSS twice in mixed mode	Ard Biesheuvel	1	-1/+2
	[ Upstream commit df7ecce842b846a04d087ba85fdb79a90e26a1b0 ] Clearing BSS should only be done once, at the very beginning. efi_pe_entry() is the entrypoint from the firmware, which may not clear BSS and so it is done explicitly. However, efi_pe_entry() is also used as an entrypoint by the mixed mode startup code, in which case BSS will already have been cleared, and doing it again at this point will corrupt global variables holding the firmware's GDT/IDT and segment selectors. So make the memset() conditional on whether the EFI stub is running in native mode. Fixes: b3810c5a2cc4a666 ("x86/efistub: Clear decompressor BSS in native EFI entrypoint") Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-27	x86/efistub: Clear decompressor BSS in native EFI entrypoint	Ard Biesheuvel	1	-2/+4
	[ Upstream commit b3810c5a2cc4a6665f7a65bed5393c75ce3f3aa2 ] The EFI stub on x86 no longer invokes the decompressor as a subsequent boot stage, but calls into the decompression code directly while running in the context of the EFI boot services. This means that when using the native EFI entrypoint (as opposed to the EFI handover protocol, which clears BSS explicitly), the firmware PE image loader is being relied upon to ensure that BSS is zeroed before the EFI stub is entered from the firmware. As Radek's report proves, this is a bad idea. Not all loaders do this correctly, which means some global variables that should be statically initialized to 0x0 may have junk in them. So clear BSS explicitly when entering via efi_pe_entry(). Note that zeroing BSS from C code is not generally safe, but in this case, the following assignment and dereference of a global pointer variable ensures that the memset() cannot be deferred or reordered. Cc: <stable@kernel.org> # v6.1+ Reported-by: Radek Podgorny <radek@podgorny.cz> Closes: https://lore.kernel.org/all/a99a831a-8ad5-4cb0-bff9-be637311f771@podgorny.cz Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-06	x86/efistub: Give up if memory attribute protocol returns an error	Ard Biesheuvel	2	-12/+16
	commit a7a6a01f88e87dec4bf2365571dd2dc7403d52d0 upstream. The recently introduced EFI memory attributes protocol should be used if it exists to ensure that the memory allocation created for the kernel permits execution. This is needed for compatibility with tightened requirements related to Windows logo certification for x86 PCs. Currently, we simply strip the execute protect (XP) attribute from the entire range, but this might be rejected under some firmware security policies, and so in a subsequent patch, this will be changed to only strip XP from the executable region that runs early, and make it read-only (RO) as well. In order to catch any issues early, ensure that the memory attribute protocol works as intended, and give up if it produces spurious errors. Note that the DXE services based fallback was always based on best effort, so don't propagate any errors returned by that API. Fixes: a1b87d54f4e4 ("x86/efistub: Avoid legacy decompressor when doing EFI boot") Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-06	efi/x86: Fix the missing KASLR_FLAG bit in boot_params->hdr.loadflags	Ard Biesheuvel	1	-0/+2
	From: Yuntao Wang <ytcoode@gmail.com> [ Commit 01638431c465741e071ab34acf3bef3c2570f878 upstream ] When KASLR is enabled, the KASLR_FLAG bit in boot_params->hdr.loadflags should be set to 1 to propagate KASLR status from compressed kernel to kernel, just as the choose_random_location() function does. Currently, when the kernel is booted via the EFI stub, the KASLR_FLAG bit in boot_params->hdr.loadflags is not set, even though it should be. This causes some functions, such as kernel_randomize_memory(), not to execute as expected. Fix it. Fixes: a1b87d54f4e4 ("x86/efistub: Avoid legacy decompressor when doing EFI boot") Signed-off-by: Yuntao Wang <ytcoode@gmail.com> [ardb: drop 'else' branch clearing KASLR_FLAG] Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-06	x86/boot: efistub: Assign global boot_params variable	Ard Biesheuvel	1	-0/+2
	From: Ard Biesheuvel <ardb@kernel.org> [ Commit 50dcc2e0d62e3c4a54f39673c4dc3dcde7c74d52 upstream ] Now that the x86 EFI stub calls into some APIs exposed by the decompressor (e.g., kaslr_get_random_long()), it is necessary to ensure that the global boot_params variable is set correctly before doing so. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-06	x86/efistub: Avoid placing the kernel below LOAD_PHYSICAL_ADDR	Ard Biesheuvel	4	-7/+11
	From: Ard Biesheuvel <ardb@kernel.org> [ Commit 2f77465b05b1270c832b5e2ee27037672ad2a10a upstream ] The EFI stub's kernel placement logic randomizes the physical placement of the kernel by taking all available memory into account, and picking a region at random, based on a random seed. When KASLR is disabled, this seed is set to 0x0, and this results in the lowest available region of memory to be selected for loading the kernel, even if this is below LOAD_PHYSICAL_ADDR. Some of this memory is typically reserved for the GFP_DMA region, to accommodate masters that can only access the first 16 MiB of system memory. Even if such devices are rare these days, we may still end up with a warning in the kernel log, as reported by Tom: swapper/0: page allocation failure: order:10, mode:0xcc1(GFP_KERNEL\|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0 Fix this by tweaking the random allocation logic to accept a low bound on the placement, and set it to LOAD_PHYSICAL_ADDR. Fixes: a1b87d54f4e4 ("x86/efistub: Avoid legacy decompressor when doing EFI boot") Reported-by: Tom Englund <tomenglund26@gmail.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218404 Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-06	efi/x86: Avoid physical KASLR on older Dell systems	Ard Biesheuvel	1	-7/+24
	From: Ard Biesheuvel <ardb@kernel.org> [ Commit 50d7cdf7a9b1ab6f4f74a69c84e974d5dc0c1bf1 upstream ] River reports boot hangs with v6.6 and v6.7, and the bisect points to commit a1b87d54f4e4 ("x86/efistub: Avoid legacy decompressor when doing EFI boot") which moves the memory allocation and kernel decompression from the legacy decompressor (which executes after ExitBootServices()) to the EFI stub, using boot services for allocating the memory. The memory allocation succeeds but the subsequent call to decompress_kernel() never returns, resulting in a failed boot and a hanging system. As it turns out, this issue only occurs when physical address randomization (KASLR) is enabled, and given that this is a feature we can live without (virtual KASLR is much more important), let's disable the physical part of KASLR when booting on AMI UEFI firmware claiming to implement revision v2.0 of the specification (which was released in 2006), as this is the version these systems advertise. Fixes: a1b87d54f4e4 ("x86/efistub: Avoid legacy decompressor when doing EFI boot") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218173 Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-06	x86/efistub: Avoid legacy decompressor when doing EFI boot	Ard Biesheuvel	1	-94/+72
	From: Ard Biesheuvel <ardb@kernel.org> [ Commit a1b87d54f4e45ff5e0d081fb1d9db3bf1a8fb39a upstream ] The bare metal decompressor code was never really intended to run in a hosted environment such as the EFI boot services, and does a few things that are becoming problematic in the context of EFI boot now that the logo requirements are getting tighter: EFI executables will no longer be allowed to consist of a single executable section that is mapped with read, write and execute permissions if they are intended for use in a context where Secure Boot is enabled (and where Microsoft's set of certificates is used, i.e., every x86 PC built to run Windows). To avoid stepping on reserved memory before having inspected the E820 tables, and to ensure the correct placement when running a kernel build that is non-relocatable, the bare metal decompressor moves its own executable image to the end of the allocation that was reserved for it, in order to perform the decompression in place. This means the region in question requires both write and execute permissions, which either need to be given upfront (which EFI will no longer permit), or need to be applied on demand using the existing page fault handling framework. However, the physical placement of the kernel is usually randomized anyway, and even if it isn't, a dedicated decompression output buffer can be allocated anywhere in memory using EFI APIs when still running in the boot services, given that EFI support already implies a relocatable kernel. This means that decompression in place is never necessary, nor is moving the compressed image from one end to the other. Since EFI already maps all of memory 1:1, it is also unnecessary to create new page tables or handle page faults when decompressing the kernel. That means there is also no need to replace the special exception handlers for SEV. Generally, there is little need to do any of the things that the decompressor does beyond - initialize SEV encryption, if needed, - perform the 4/5 level paging switch, if needed, - decompress the kernel - relocate the kernel So do all of this from the EFI stub code, and avoid the bare metal decompressor altogether. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230807162720.545787-24-ardb@kernel.org Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-06	x86/efistub: Perform SNP feature test while running in the firmware	Ard Biesheuvel	1	-0/+17
	From: Ard Biesheuvel <ardb@kernel.org> [ Commit 31c77a50992e8dd136feed7b67073bb5f1f978cc upstream ] Before refactoring the EFI stub boot flow to avoid the legacy bare metal decompressor, duplicate the SNP feature check in the EFI stub before handing over to the kernel proper. The SNP feature check can be performed while running under the EFI boot services, which means it can force the boot to fail gracefully and return an error to the bootloader if the loaded kernel does not implement support for all the features that the hypervisor enabled. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230807162720.545787-23-ardb@kernel.org Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-06	x86/efistub: Prefer EFI memory attributes protocol over DXE services	Ard Biesheuvel	1	-8/+21
	From: Ard Biesheuvel <ardb@kernel.org> [ Commit 11078876b7a6a1b7226344fecab968945c806832 upstream ] Currently, the EFI stub relies on DXE services in some cases to clear non-execute restrictions from page allocations that need to be executable. This is dodgy, because DXE services are not specified by UEFI but by PI, and they are not intended for consumption by OS loaders. However, no alternative existed at the time. Now, there is a new UEFI protocol that should be used instead, so if it exists, prefer it over the DXE services calls. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230807162720.545787-18-ardb@kernel.org Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-06	x86/efistub: Perform 4/5 level paging switch from the stub	Ard Biesheuvel	6	-26/+130
	From: Ard Biesheuvel <ardb@kernel.org> [ Commit cb380000dd23cbbf8bd7d023b51896804c1f7e68 upstream ] In preparation for updating the EFI stub boot flow to avoid the bare metal decompressor code altogether, implement the support code for switching between 4 and 5 levels of paging before jumping to the kernel proper. This reuses the newly refactored trampoline that the bare metal decompressor uses, but relies on EFI APIs to allocate 32-bit addressable memory and remap it with the appropriate permissions. Given that the bare metal decompressor will no longer call into the trampoline if the number of paging levels is already set correctly, it is no longer needed to remove NX restrictions from the memory range where this trampoline may end up. Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-06	efi/libstub: Add limit argument to efi_random_alloc()	Ard Biesheuvel	3	-6/+8
	From: Ard Biesheuvel <ardb@kernel.org> [ Commit bc5ddceff4c14494d83449ad45c985e6cd353fce upstream ] x86 will need to limit the kernel memory allocation to the lowest 512 MiB of memory, to match the behavior of the existing bare metal KASLR physical randomization logic. So in preparation for that, add a limit parameter to efi_random_alloc() and wire it up. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230807162720.545787-22-ardb@kernel.org Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-06	efi/libstub: Add memory attribute protocol definitions	Ard Biesheuvel	1	-0/+20
	From: Evgeniy Baskov <baskov@ispras.ru> [ Commit 79729f26b074a5d2722c27fa76cc45ef721e65cd upstream ] EFI_MEMORY_ATTRIBUTE_PROTOCOL servers as a better alternative to DXE services for setting memory attributes in EFI Boot Services environment. This protocol is better since it is a part of UEFI specification itself and not UEFI PI specification like DXE services. Add EFI_MEMORY_ATTRIBUTE_PROTOCOL definitions. Support mixed mode properly for its calls. Tested-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Evgeniy Baskov <baskov@ispras.ru> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-06	x86/efistub: Clear BSS in EFI handover protocol entrypoint	Ard Biesheuvel	1	-2/+11
	From: Ard Biesheuvel <ardb@kernel.org> [ Commit d7156b986d4cc0657fa6dc05c9fcf51c3d55a0fe upstream ] The so-called EFI handover protocol is value-add from the distros that permits a loader to simply copy a PE kernel image into memory and call an alternative entrypoint that is described by an embedded boot_params structure. Most implementations of this protocol do not bother to check the PE header for minimum alignment, section placement, etc, and therefore also don't clear the image's BSS, or even allocate enough memory for it. Allocating more memory on the fly is rather difficult, but at least clear the BSS region explicitly when entering in this manner, so that the EFI stub code does not get confused by global variables that were not zero-initialized correctly. When booting in mixed mode, this BSS clearing must occur before any global state is created, so clear it in the 32-bit asm entry point. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230807162720.545787-7-ardb@kernel.org Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-06	x86/efistub: Simplify and clean up handover entry code	Ard Biesheuvel	1	-4/+16
	From: Ard Biesheuvel <ardb@kernel.org> [ Commit df9215f15206c2a81909ccf60f21d170801dce38 upstream ] Now that the EFI entry code in assembler is only used by the optional and deprecated EFI handover protocol, and given that the EFI stub C code no longer returns to it, most of it can simply be dropped. While at it, clarify the symbol naming, by merging efi_main() and efi_stub_entry(), making the latter the shared entry point for all different boot modes that enter via the EFI stub. The efi32_stub_entry() and efi64_stub_entry() names are referenced explicitly by the tooling that populates the setup header, so these must be retained, but can be emitted as aliases of efi_stub_entry() where appropriate. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230807162720.545787-5-ardb@kernel.org Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-06	arm64: efi: Limit allocations to 48-bit addressable physical region	Ard Biesheuvel	5	-3/+12
	From: Ard Biesheuvel <ardb@kernel.org> [ Commit a37dac5c5dcfe0f1fd58513c16cdbc280a47f628 upstream ] The UEFI spec does not mention or reason about the configured size of the virtual address space at all, but it does mention that all memory should be identity mapped using a page size of 4 KiB. This means that a LPA2 capable system that has any system memory outside of the 48-bit addressable physical range and follows the spec to the letter may serve page allocation requests from regions of memory that the kernel cannot access unless it was built with LPA2 support and enables it at runtime. So let's ensure that all page allocations are limited to the 48-bit range. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-06	x86/efistub: Branch straight to kernel entry point from C code	Ard Biesheuvel	1	-5/+17
	commit d2d7a54f69b67cd0a30e0ebb5307cb2de625baac upstream. Instead of returning to the calling code in assembler that does nothing more than perform an indirect call with the boot_params pointer in register ESI/RSI, perform the jump directly from the EFI stub C code. This will allow the asm entrypoint code to be dropped entirely in subsequent patches. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230807162720.545787-4-ardb@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-06	x86/boot/compressed, efi: Merge multiple definitions of image_offset into one	Ard Biesheuvel	1	-1/+1
	commit 4b52016247aeaa55ca3e3bc2e03cd91114c145c2 upstream. There is no need for head_32.S and head_64.S both declaring a copy of the global 'image_offset' variable, so drop those and make the extern C declaration the definition. When image_offset is moved to the .c file, it needs to be placed particularly in the .data section because it lands by default in the .bss section which is cleared too late, in .Lrelocated, before the first access to it and thus garbage gets read, leading to SEV guests exploding in early boot. This happens only when the SEV guest kernel is loaded through grub. If supplied with qemu's -kernel command line option, that memory is always cleared upfront by qemu and all is fine there. [ bp: Expand commit message with SEV aspect. ] Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20221122161017.2426828-8-ardb@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-06	efi: libstub: use EFI_LOADER_CODE region when moving the kernel in memory	Ard Biesheuvel	5	-9/+16
	commit 9cf42bca30e98a1c6c9e8abf876940a551eaa3d1 upstream. The EFI spec is not very clear about which permissions are being given when allocating pages of a certain type. However, it is quite obvious that EFI_LOADER_CODE is more likely to permit execution than EFI_LOADER_DATA, which becomes relevant once we permit booting the kernel proper with the firmware's 1:1 mapping still active. Ostensibly, recent systems such as the Surface Pro X grant executable permissions to EFI_LOADER_CODE regions but not EFI_LOADER_DATA regions. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-01	riscv/efistub: Ensure GP-relative addressing is not used	Jan Kiszka	1	-1/+1
	commit afb2a4fb84555ef9e61061f6ea63ed7087b295d5 upstream. The cflags for the RISC-V efistub were missing -mno-relax, thus were under the risk that the compiler could use GP-relative addressing. That happened for _edata with binutils-2.41 and kernel 6.1, causing the relocation to fail due to an invalid kernel_size in handle_kernel_image. It was not yet observed with newer versions, but that may just be luck. Cc: <stable@vger.kernel.org> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-09-13	x86/efistub: Fix PCI ROM preservation in mixed mode	Mikel Rychliski	1	-1/+1
	[ Upstream commit 8b94da92559f7e403dc7ab81937cc50f949ee2fd ] preserve_pci_rom_image() was accessing the romsize field in efi_pci_io_protocol_t directly instead of using the efi_table_attr() helper. This prevents the ROM image from being saved correctly during a mixed mode boot. Fixes: 2c3625cb9fa2 ("efi/x86: Fold __setup_efi_pci32() and __setup_efi_pci64() into one function") Signed-off-by: Mikel Rychliski <mikel@mikelr.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19	efi/libstub: Disable PCI DMA before grabbing the EFI memory map	Ard Biesheuvel	1	-3/+3
	[ Upstream commit 2e28a798c3092ea42b968fa16ac835969d124898 ] Currently, the EFI stub will disable PCI DMA as the very last thing it does before calling ExitBootServices(), to avoid interfering with the firmware's normal operation as much as possible. However, the stub will invoke DisconnectController() on all endpoints downstream of the PCI bridges it disables, and this may affect the layout of the EFI memory map, making it substantially more likely that ExitBootServices() will fail the first time around, and that the EFI memory map needs to be reloaded. This, in turn, increases the likelihood that the slack space we allocated is insufficient (and we can no longer allocate memory via boot services after having called ExitBootServices() once), causing the second call to GetMemoryMap (and therefore the boot) to fail. This makes the PCI DMA disable feature a bit more fragile than it already is, so let's make it more robust, by allocating the space for the EFI memory map after disabling PCI DMA. Fixes: 4444f8541dad16fe ("efi: Allow disabling PCI busmastering on bridges during boot") Reported-by: Glenn Washburn <development@efficientek.com> Acked-by: Matthew Garrett <mjg59@srcf.ucam.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-09	arm64: efi: Use SMBIOS processor version to key off Ampere quirk	Ard Biesheuvel	3	-13/+80
	commit eb684408f3ea4856639675d6465f0024e498e4b1 upstream. Instead of using the SMBIOS type 1 record 'family' field, which is often modified by OEMs, use the type 4 'processor ID' and 'processor version' fields, which are set to a small set of probe-able values on all known Ampere EFI systems in the field. Fixes: 550b33cfd4452968 ("arm64: efi: Force the use of ...") Tested-by: Andrea Righi <andrea.righi@canonical.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-30	efi/libstub: smbios: Use length member instead of record struct size	Ard Biesheuvel	1	-1/+1
	[ Upstream commit 34343eb06afc04af9178a9883d9354dc12beede0 ] The type 1 SMBIOS record happens to always be the same size, but there are other record types which have been augmented over time, and so we should really use the length field in the header to decide where the string table starts. Fixes: 550b33cfd4452968 ("arm64: efi: Force the use of ...") Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-02-14	arm64: efi: Force the use of SetVirtualAddressMap() on eMAG and Altra Max ↵	Darren Hart	1	-3/+6
	machines commit 190233164cd77115f8dea718cbac561f557092c6 upstream. Commit 550b33cfd445 ("arm64: efi: Force the use of SetVirtualAddressMap() on Altra machines") identifies the Altra family via the family field in the type#1 SMBIOS record. eMAG and Altra Max machines are similarly affected but not detected with the strict strcmp test. The type1_family smbios string is not an entirely reliable means of identifying systems with this issue as OEMs can, and do, use their own strings for these fields. However, until we have a better solution, capture the bulk of these systems by adding strcmp matching for "eMAG" and "Altra Max". Fixes: 550b33cfd445 ("arm64: efi: Force the use of SetVirtualAddressMap() on Altra machines") Cc: <stable@vger.kernel.org> # 6.1.x Cc: Alexandru Elisei <alexandru.elisei@gmail.com> Signed-off-by: Darren Hart <darren@os.amperecomputing.com> Tested-by: Justin He <justin.he@arm.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-01-12	efi: random: combine bootloader provided RNG seed with RNG protocol output	Ard Biesheuvel	2	-6/+38
	commit 196dff2712ca5a2e651977bb2fe6b05474111a83 upstream. Instead of blindly creating the EFI random seed configuration table if the RNG protocol is implemented and works, check whether such a EFI configuration table was provided by an earlier boot stage and if so, concatenate the existing and the new seeds, leaving it up to the core code to mix it in and credit it the way it sees fit. This can be used for, e.g., systemd-boot, to pass an additional seed to Linux in a way that can be consumed by the kernel very early. In that case, the following definitions should be used to pass the seed to the EFI stub: struct linux_efi_random_seed { u32 size; // of the 'seed' array in bytes u8 seed[]; }; The memory for the struct must be allocated as EFI_ACPI_RECLAIM_MEMORY pool memory, and the address of the struct in memory should be installed as a EFI configuration table using the following GUID: LINUX_EFI_RANDOM_SEED_TABLE_GUID 1ce1e5bc-7ceb-42f2-81e5-8aadf180f57b Note that doing so is safe even on kernels that were built without this patch applied, but the seed will simply be overwritten with a seed derived from the EFI RNG protocol, if available. The recommended seed size is 32 bytes, and seeds larger than 512 bytes are considered corrupted and ignored entirely. In order to preserve forward secrecy, seeds from previous bootloaders are memzero'd out, and in order to preserve memory, those older seeds are also freed from memory. Freeing from memory without first memzeroing is not safe to do, as it's possible that nothing else will ever overwrite those pages used by EFI. Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com> [ardb: incorporate Jason's followup changes to extend the maximum seed size on the consumer end, memzero() it and drop a needless printk] Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-11	arm64: efi: Force the use of SetVirtualAddressMap() on Altra machines	Ard Biesheuvel	4	-2/+93
	Ampere Altra machines are reported to misbehave when the SetTime() EFI runtime service is called after ExitBootServices() but before calling SetVirtualAddressMap(). Given that the latter is horrid, pointless and explicitly documented as optional by the EFI spec, we no longer invoke it at boot if the configured size of the VA space guarantees that the EFI runtime memory regions can remain mapped 1:1 like they are at boot time. On Ampere Altra machines, this results in SetTime() calls issued by the rtc-efi driver triggering synchronous exceptions during boot. We can now recover from those without bringing down the system entirely, due to commit 23715a26c8d81291 ("arm64: efi: Recover from synchronous exceptions occurring in firmware"). However, it would be better to avoid the issue entirely, given that the firmware appears to remain in a funny state after this. So attempt to identify these machines based on the 'family' field in the type #1 SMBIOS record, and call SetVirtualAddressMap() unconditionally in that case. Tested-by: Alexandru Elisei <alexandru.elisei@gmail.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2022-10-24	efi: random: Use 'ACPI reclaim' memory for random seed	Ard Biesheuvel	1	-1/+6
	EFI runtime services data is guaranteed to be preserved by the OS, making it a suitable candidate for the EFI random seed table, which may be passed to kexec kernels as well (after refreshing the seed), and so we need to ensure that the memory is preserved without support from the OS itself. However, runtime services data is intended for allocations that are relevant to the implementations of the runtime services themselves, and so they are unmapped from the kernel linear map, and mapped into the EFI page tables that are active while runtime service invocations are in progress. None of this is needed for the RNG seed. So let's switch to EFI 'ACPI reclaim' memory: in spite of the name, there is nothing exclusively ACPI about it, it is simply a type of allocation that carries firmware provided data which may or may not be relevant to the OS, and it is left up to the OS to decide whether to reclaim it after having consumed its contents. Given that in Linux, we never reclaim these allocations, it is a good choice for the EFI RNG seed, as the allocation is guaranteed to survive kexec reboots. One additional reason for changing this now is to align it with the upcoming recommendation for EFI bootloader provided RNG seeds, which must not use EFI runtime services code/data allocations. Cc: <stable@vger.kernel.org> # v4.14+ Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
2022-10-21	efi: runtime: Don't assume virtual mappings are missing if VA == PA == 0	Ard Biesheuvel	1	-4/+4
	The generic EFI stub can be instructed to avoid SetVirtualAddressMap(), and simply run with the firmware's 1:1 mapping. In this case, it populates the virtual address fields of the runtime regions in the memory map with the physical address of each region, so that the mapping code has to be none the wiser. Only if SetVirtualAddressMap() fails, the virtual addresses are wiped and the kernel code knows that the regions cannot be mapped. However, wiping amounts to setting it to zero, and if a runtime region happens to live at physical address 0, its valid 1:1 mapped virtual address could be mistaken for a wiped field, resulting on loss of access to the EFI services at runtime. So let's only assume that VA == 0 means 'no runtime services' if the region in question does not live at PA 0x0. Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2022-10-21	efi: libstub: Fix incorrect payload size in zboot header	Ard Biesheuvel	1	-1/+2
	The linker script symbol definition that captures the size of the compressed payload inside the zboot decompressor (which is exposed via the image header) refers to '.' for the end of the region, which does not give the correct result as the expression is not placed at the end of the payload. So use the symbol name explicitly. Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2022-10-21	efi: libstub: Give efi_main() asmlinkage qualification	Ard Biesheuvel	1	-3/+3
	To stop the bots from sending sparse warnings to me and the list about efi_main() not having a prototype, decorate it with asmlinkage so that it is clear that it is called from assembly, and therefore needs to remain external, even if it is never declared in a header file. Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2022-10-21	efi: libstub: Remove zboot signing from build options	Ard Biesheuvel	1	-25/+4
	The zboot decompressor series introduced a feature to sign the PE/COFF kernel image for secure boot as part of the kernel build. This was necessary because there are actually two images that need to be signed: the kernel with the EFI stub attached, and the decompressor application. This is a bit of a burden, because it means that the images must be signed on the the same system that performs the build, and this is not realistic for distros. During the next cycle, we will introduce changes to the zboot code so that the inner image no longer needs to be signed. This means that the outer PE/COFF image can be handled as usual, and be signed later in the release process. Let's remove the associated Kconfig options now so that they don't end up in a LTS release while already being deprecated. Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2022-10-11	Merge tag 'mm-stable-2022-10-08' of ↵	Linus Torvalds	1	-0/+1
	git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in linux-next for a couple of months without, to my knowledge, any negative reports (or any positive ones, come to that). - Also the Maple Tree from Liam Howlett. An overlapping range-based tree for vmas. It it apparently slightly more efficient in its own right, but is mainly targeted at enabling work to reduce mmap_lock contention. Liam has identified a number of other tree users in the kernel which could be beneficially onverted to mapletrees. Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat at [1]. This has yet to be addressed due to Liam's unfortunately timed vacation. He is now back and we'll get this fixed up. - Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer. It uses clang-generated instrumentation to detect used-unintialized bugs down to the single bit level. KMSAN keeps finding bugs. New ones, as well as the legacy ones. - Yang Shi adds a userspace mechanism (madvise) to induce a collapse of memory into THPs. - Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to support file/shmem-backed pages. - userfaultfd updates from Axel Rasmussen - zsmalloc cleanups from Alexey Romanov - cleanups from Miaohe Lin: vmscan, hugetlb_cgroup, hugetlb and memory-failure - Huang Ying adds enhancements to NUMA balancing memory tiering mode's page promotion, with a new way of detecting hot pages. - memcg updates from Shakeel Butt: charging optimizations and reduced memory consumption. - memcg cleanups from Kairui Song. - memcg fixes and cleanups from Johannes Weiner. - Vishal Moola provides more folio conversions - Zhang Yi removed ll_rw_block() :( - migration enhancements from Peter Xu - migration error-path bugfixes from Huang Ying - Aneesh Kumar added ability for a device driver to alter the memory tiering promotion paths. For optimizations by PMEM drivers, DRM drivers, etc. - vma merging improvements from Jakub Matěn. - NUMA hinting cleanups from David Hildenbrand. - xu xin added aditional userspace visibility into KSM merging activity. - THP & KSM code consolidation from Qi Zheng. - more folio work from Matthew Wilcox. - KASAN updates from Andrey Konovalov. - DAMON cleanups from Kaixu Xia. - DAMON work from SeongJae Park: fixes, cleanups. - hugetlb sysfs cleanups from Muchun Song. - Mike Kravetz fixes locking issues in hugetlbfs and in hugetlb core. Link: https://lkml.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com [1] * tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (555 commits) hugetlb: allocate vma lock for all sharable vmas hugetlb: take hugetlb vma_lock when clearing vma_lock->vma pointer hugetlb: fix vma lock handling during split vma and range unmapping mglru: mm/vmscan.c: fix imprecise comments mm/mglru: don't sync disk for each aging cycle mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol mm: memcontrol: use do_memsw_account() in a few more places mm: memcontrol: deprecate swapaccounting=0 mode mm: memcontrol: don't allocate cgroup swap arrays when memcg is disabled mm/secretmem: remove reduntant return value mm/hugetlb: add available_huge_pages() func mm: remove unused inline functions from include/linux/mm_inline.h selftests/vm: add selftest for MADV_COLLAPSE of uffd-minor memory selftests/vm: add file/shmem MADV_COLLAPSE selftest for cleared pmd selftests/vm: add thp collapse shmem testing selftests/vm: add thp collapse file and tmpfs testing selftests/vm: modularize thp collapse memory operations selftests/vm: dedup THP helpers mm/khugepaged: add tracepoint to hpage_collapse_scan_file() mm/madvise: add file and shmem support to MADV_COLLAPSE ...
2022-10-09	Merge tag 'efi-next-for-v6.1' of ↵	Linus Torvalds	18	-428/+1177
	git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull EFI updates from Ard Biesheuvel: "A bit more going on than usual in the EFI subsystem. The main driver for this has been the introduction of the LoonArch architecture last cycle, which inspired some cleanup and refactoring of the EFI code. Another driver for EFI changes this cycle and in the future is confidential compute. The LoongArch architecture does not use either struct bootparams or DT natively [yet], and so passing information between the EFI stub and the core kernel using either of those is undesirable. And in general, overloading DT has been a source of issues on arm64, so using DT for this on new architectures is a to avoid for the time being (even if we might converge on something DT based for non-x86 architectures in the future). For this reason, in addition to the patch that enables EFI boot for LoongArch, there are a number of refactoring patches applied on top of which separate the DT bits from the generic EFI stub bits. These changes are on a separate topich branch that has been shared with the LoongArch maintainers, who will include it in their pull request as well. This is not ideal, but the best way to manage the conflicts without stalling LoongArch for another cycle. Another development inspired by LoongArch is the newly added support for EFI based decompressors. Instead of adding yet another arch-specific incarnation of this pattern for LoongArch, we are introducing an EFI app based on the existing EFI libstub infrastructure that encapulates the decompression code we use on other architectures, but in a way that is fully generic. This has been developed and tested in collaboration with distro and systemd folks, who are eager to start using this for systemd-boot and also for arm64 secure boot on Fedora. Note that the EFI zimage files this introduces can also be decompressed by non-EFI bootloaders if needed, as the image header describes the location of the payload inside the image, and the type of compression that was used. (Note that Fedora's arm64 GRUB is buggy [0] so you'll need a recent version or switch to systemd-boot in order to use this.) Finally, we are adding TPM measurement of the kernel command line provided by EFI. There is an oversight in the TCG spec which results in a blind spot for command line arguments passed to loaded images, which means that either the loader or the stub needs to take the measurement. Given the combinatorial explosion I am anticipating when it comes to firmware/bootloader stacks and firmware based attestation protocols (SEV-SNP, TDX, DICE, DRTM), it is good to set a baseline now when it comes to EFI measured boot, which is that the kernel measures the initrd and command line. Intermediate loaders can measure additional assets if needed, but with the baseline in place, we can deploy measured boot in a meaningful way even if you boot into Linux straight from the EFI firmware. Summary: - implement EFI boot support for LoongArch - implement generic EFI compressed boot support for arm64, RISC-V and LoongArch, none of which implement a decompressor today - measure the kernel command line into the TPM if measured boot is in effect - refactor the EFI stub code in order to isolate DT dependencies for architectures other than x86 - avoid calling SetVirtualAddressMap() on arm64 if the configured size of the VA space guarantees that doing so is unnecessary - move some ARM specific code out of the generic EFI source files - unmap kernel code from the x86 mixed mode 1:1 page tables" * tag 'efi-next-for-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: (24 commits) efi/arm64: libstub: avoid SetVirtualAddressMap() when possible efi: zboot: create MemoryMapped() device path for the parent if needed efi: libstub: fix up the last remaining open coded boot service call efi/arm: libstub: move ARM specific code out of generic routines efi/libstub: measure EFI LoadOptions efi/libstub: refactor the initrd measuring functions efi/loongarch: libstub: remove dependency on flattened DT efi: libstub: install boot-time memory map as config table efi: libstub: remove DT dependency from generic stub efi: libstub: unify initrd loading between architectures efi: libstub: remove pointless goto kludge efi: libstub: simplify efi_get_memory_map() and struct efi_boot_memmap efi: libstub: avoid efi_get_memory_map() for allocating the virt map efi: libstub: drop pointless get_memory_map() call efi: libstub: fix type confusion for load_options_size arm64: efi: enable generic EFI compressed boot loongarch: efi: enable generic EFI compressed boot riscv: efi: enable generic EFI compressed boot efi/libstub: implement generic EFI zboot efi/libstub: move efi_system_table global var into separate object ...