path: root/arch/x86/kernel
Age | Commit message | Author | Files | Lines
2023-10-24x86/microcode/intel: Add a minimum required revision for late loadingAshok Raj1-4/+33
In general, users don't have the necessary information to determine whether late loading of a new microcode version is safe and does not modify anything which the currently running kernel uses already, e.g. removal of CPUID bits or behavioural changes of MSRs. To address this issue, Intel has added a "minimum required version" field to a previously reserved field in the microcode header. Microcode updates should only be applied if the current microcode version is equal to or greater than this minimum required version. Thomas made some suggestions on how meta-data in the microcode file could provide Linux with information to decide if the new microcode is a suitable candidate for late loading. But even the "simpler" option requires a lot of metadata and corresponding kernel code to parse it, so the final suggestion was to add the 'minimum required version' field in the header. When microcode changes visible features, it will set the minimum required version to its own revision, which prevents late loading. Old microcode blobs have the minimum revision field always set to 0, which indicates that there is no information and the kernel considers it unsafe. This is a pure OS software mechanism. The hardware/firmware ignores this header field. For early loading there is no restriction because OS visible features are enumerated after the early load and therefore a change has no effect. The check is always enabled, but by default not enforced. It can be enforced via Kconfig or the kernel command line. If enforced, the kernel refuses to late load microcode with a minimum required version field which is zero or when the currently loaded microcode revision is smaller than the minimum required revision. If not enforced, the load happens independently of the revision check to stay compatible with the existing behaviour, but it influences the decision whether the kernel is tainted or not. If the check signals that the late load is safe, then the kernel is not tainted. Early loading is not affected by this. [ tglx: Massaged changelog and fixed up the implementation ] Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231002115903.776467264@linutronix.de
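[ Editor's note: the following is an illustrative sketch of the acceptance rule described above, not code from the patch; the helper and parameter names (ucode_late_load_is_safe, min_req_ver, cur_rev) are placeholders. ]

    /*
     * Late loading acceptance rule as described in the changelog: a zero
     * minimum required revision means "no information" and is treated as
     * unsafe; otherwise the currently loaded revision must be at least
     * the minimum required one.
     */
    static bool ucode_late_load_is_safe(unsigned int min_req_ver, unsigned int cur_rev)
    {
            /* Old blobs carry 0 here: no information -> unsafe */
            if (!min_req_ver)
                    return false;

            return cur_rev >= min_req_ver;
    }

Whether an unsafe result aborts the load or merely taints the kernel depends on the enforcement setting described above.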
2023-10-24x86/microcode: Prepare for minimal revision checkThomas Gleixner4-5/+22
Applying microcode late can be fatal for the running kernel when the update changes functionality which is already in use in an incompatible way, e.g. by removing a CPUID bit. There is no way for admins who do not have access to the vendor's deep technical support to decide whether late loading of such a microcode is safe or not. Intel has added a new field to the microcode header which specifies the minimal microcode revision that is required to be active in the CPU in order to be safe. Provide infrastructure for handling this in the core code and a command line switch which allows enforcing it. If the update is considered safe, the kernel is not tainted and the annoying warning message is not emitted. If it's enforced and the currently loaded microcode revision is not safe for late loading, then the load is aborted. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231017211724.079611170@linutronix.de
2023-10-24x86/microcode: Handle "offline" CPUs correctlyThomas Gleixner3-6/+112
Offline CPUs need to be parked in a safe loop when a microcode update is in progress on the primary CPU. Currently, offline CPUs are parked in mwait_play_dead(), and for Intel CPUs that is not safe, because the MWAIT instruction can be patched by the new microcode update, which can cause instability. - Add a new microcode state 'UCODE_OFFLINE' to report status on a per-CPU basis. - Force NMI on the offline CPUs. Wake up offline CPUs while the update is in progress and then return them to mwait_play_dead() after the microcode update is complete. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231002115903.660850472@linutronix.de
2023-10-24x86/apic: Provide apic_force_nmi_on_cpu()Thomas Gleixner4-0/+12
When SMT siblings are soft-offlined and parked in one of the play_dead() variants they still react on NMI, which is problematic on affected Intel CPUs. The default play_dead() variant uses MWAIT on modern CPUs, which is not guaranteed to be safe when updated concurrently. Right now late loading is prevented when not all SMT siblings are online, but as they still react on NMI, it is possible to bring them out of their park position into a trivial rendezvous handler. Provide a function which allows doing that. It does sanity checks whether the target is in the cpus_booted_once_mask and whether the APIC driver supports it. Mark X2APIC and XAPIC as capable, but exclude 32bit and the UV and NUMACHIP variants as those need feedback from the relevant experts. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231002115903.603100036@linutronix.de
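[ Editor's note: a minimal sketch of the sanity checks described above, not the code from the patch; the APIC callback name (nmi_to_offline_cpu) and its argument are assumptions, and the real callback may take an APIC ID rather than a Linux CPU number. ]

    /* Force an NMI on a parked CPU, but only when it is sane to do so. */
    void apic_force_nmi_on_cpu(int cpu)
    {
            /* The active APIC driver must advertise the capability ... */
            if (WARN_ON_ONCE(!apic->nmi_to_offline_cpu))
                    return;
            /* ... and the target must have been booted at least once. */
            if (WARN_ON_ONCE(!cpumask_test_cpu(cpu, &cpus_booted_once_mask)))
                    return;

            apic->nmi_to_offline_cpu(cpu);
    }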
2023-10-24x86/microcode: Protect against instrumentationThomas Gleixner1-28/+83
The wait-for-control loop in which the siblings are waiting for the microcode update on the primary thread must be protected against instrumentation as instrumentation can end up in #INT3, #DB or #PF, which then returns with IRET. That IRET re-enables NMI, which is the opposite of what the NMI rendezvous is trying to achieve. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231002115903.545969323@linutronix.de
2023-10-24x86/microcode: Rendezvous and load in NMIThomas Gleixner4-5/+45
stop_machine() does not prevent the spin-waiting sibling from handling an NMI, which obviously violates the whole concept of a rendezvous. Implement a static branch right at the beginning of the NMI handler which is nopped out except when enabled by the late loading mechanism. The late loader enables the static branch before stop_machine() is invoked. Each CPU has an nmi_enable flag in its control structure which indicates whether the CPU should go into the update routine. This is required to bridge the gap between enabling the branch and actually being at the point where it is required to enter the loader wait loop. Each CPU which arrives in the stopper thread function sets that flag and issues a self NMI right after that. If the NMI function sees the flag clear, it returns. If it's set, it clears the flag and enters the rendezvous. This is safe against a real NMI which hits in between setting the flag and sending the NMI to itself. The real NMI will be swallowed by the microcode update and the self NMI will then let stuff continue. Otherwise this would end up with a spurious NMI. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231002115903.489900814@linutronix.de
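[ Editor's note: a hedged sketch of the fast path described above; the static key and the per-CPU flag name are placeholders, not necessarily the identifiers used in the patch. ]

    /* Armed by the late loader right before stop_machine() is invoked. */
    static DEFINE_STATIC_KEY_FALSE(microcode_nmi_handler_enable);

    /* Invoked right at the beginning of the NMI entry path. */
    bool microcode_nmi_handler(void)
    {
            /* Nopped out in the common case */
            if (!static_branch_unlikely(&microcode_nmi_handler_enable))
                    return false;

            /*
             * Only CPUs which set their flag in the stopper function and
             * then sent themselves an NMI enter the rendezvous.
             */
            if (!this_cpu_read(ucode_ctrl.nmi_enabled))
                    return false;

            this_cpu_write(ucode_ctrl.nmi_enabled, false);
            /* ... wait for the primary and apply the update here ... */
            return true;
    }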
2023-10-24x86/microcode: Replace the all-in-one rendezvous handlerThomas Gleixner1-42/+9
with a new handler which just separates the control flow of primary and secondary CPUs. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231002115903.433704135@linutronix.de
2023-10-24x86/microcode: Provide new control functionsThomas Gleixner1-0/+84
The current all-in-one code is unreadable and really not suited for adding future features like uniform loading with package or system scope. Provide a set of new control functions which split the handling of the primary and secondary CPUs. These will replace the current all-in-one rendezvous function in the next step. This is intentionally a separate change because diff makes a completely unreadable mess otherwise. So the flow separates the primary and the secondary CPUs into their own functions which use the control field in the per CPU ucode_ctrl struct:

    primary()                   secondary()
     wait_for_all()              wait_for_all()
     apply_ucode()               wait_for_release()
     release()                   apply_ucode()

Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231002115903.377922731@linutronix.de
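[ Editor's note: a hedged sketch of the split flow shown above; the structure members, helpers and constants (ucode_ctrl, secondaries, SCTRL_*, wait_for_all(), apply_microcode_local()) are illustrative placeholders rather than the exact names from the patch. ]

    /* Secondary sibling: wait for everybody, then for the primary's release. */
    static void load_secondary(unsigned int cpu)
    {
            struct ucode_ctrl *ctrl = this_cpu_ptr(&ucode_ctrl);

            wait_for_all();                                 /* wait_for_all()     */
            while (READ_ONCE(ctrl->ctrl) == SCTRL_WAIT)     /* wait_for_release() */
                    cpu_relax();
            apply_microcode_local();                        /* apply_ucode()      */
    }

    /* Primary thread: wait for everybody, update, then release the siblings. */
    static void load_primary(unsigned int cpu)
    {
            unsigned int sibling;

            wait_for_all();                                 /* wait_for_all()     */
            apply_microcode_local();                        /* apply_ucode()      */

            /* release(): flip the control field of the parked siblings */
            for_each_cpu(sibling, this_cpu_ptr(&ucode_ctrl)->secondaries)
                    WRITE_ONCE(per_cpu_ptr(&ucode_ctrl, sibling)->ctrl, SCTRL_APPLY);
    }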
2023-10-24x86/microcode: Add per CPU control fieldThomas Gleixner1-2/+18
Add a per CPU control field to ucode_ctrl and define constants for it which are going to be used to control the loading state machine. In theory this could be a global control field, but a global control does not cover the following case: 15 primary CPUs load microcode successfully, but 1 primary CPU fails and returns with an error code. With global control the sibling of the failed CPU would either try again or the whole operation would be aborted with the consequence that the 15 siblings do not invoke the apply path and end up with inconsistent software state. The result in dmesg would be inconsistent too. There are two additional fields added and initialized: ctrl_cpu and secondaries. ctrl_cpu is the CPU number of the primary thread for now, but with the upcoming uniform loading at package or system scope this will be one CPU per package or just one CPU. The secondaries field hands the control CPU a CPU mask which is required to release the secondary CPUs from the wait loop. Preparatory change for implementing a properly split control flow for primary and secondary CPUs. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231002115903.319959519@linutronix.de
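[ Editor's note: a placeholder layout which follows the description above; field and constant names are illustrative, and enum ucode_state refers to the existing per-CPU result state introduced by the previous change. ]

    enum sibling_ctrl {
            SCTRL_WAIT,     /* secondary waits for release by the primary */
            SCTRL_APPLY,    /* secondary may invoke the apply path        */
            SCTRL_DONE,     /* nothing (more) to do                       */
    };

    struct ucode_ctrl {
            enum sibling_ctrl       ctrl;           /* loading state machine     */
            enum ucode_state        result;         /* per CPU result state      */
            unsigned int            ctrl_cpu;       /* primary/control CPU       */
            struct cpumask          *secondaries;   /* CPUs to release from wait */
    };

    static DEFINE_PER_CPU(struct ucode_ctrl, ucode_ctrl);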
2023-10-24x86/microcode: Add per CPU result stateThomas Gleixner2-47/+68
The microcode rendezvous is purely acting on global state, which does not allow analyzing failures in a coherent way. Introduce per CPU state which the results are written into, which allows analyzing the return codes of the individual CPUs. Initialize the state when walking the cpu_present_mask in the online check to avoid another for_each_cpu() loop. Enhance the result printout with that. The structure is intentionally named ucode_ctrl as it will gain control fields in subsequent changes. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231017211723.632681010@linutronix.de
2023-10-24x86/microcode: Sanitize __wait_for_cpus()Thomas Gleixner1-22/+17
The code is too complicated for no reason: - The return value is pointless as this is a strict boolean. - It's way simpler to count down from num_online_cpus() and check for zero. - The timeout argument is pointless as this is always one second. - Touching the NMI watchdog every 100ns does not make any sense, nor does checking every 100ns. This is really not a hotpath operation. Preload the atomic counter with the number of online CPUs and simplify the whole timeout logic. Delay for one microsecond and touch the NMI watchdog once per millisecond. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231002115903.204251527@linutronix.de
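[ Editor's note: an illustrative sketch of the simplified countdown described above (names are placeholders): the counter is preloaded with num_online_cpus() before stop_machine(), every arriving CPU decrements it, and the wait polls in one microsecond steps for at most one second while touching the NMI watchdog once per millisecond. ]

    static atomic_t late_cpus_in;

    static bool wait_for_cpus(atomic_t *cnt)
    {
            unsigned int timeout;

            WARN_ON_ONCE(atomic_dec_return(cnt) < 0);

            for (timeout = 0; timeout < USEC_PER_SEC; timeout++) {
                    if (!atomic_read(cnt))
                            return true;

                    udelay(1);

                    if (!(timeout % USEC_PER_MSEC))
                            touch_nmi_watchdog();
            }

            /* Prevent the late comers from making progress and let them time out. */
            atomic_inc(cnt);
            return false;
    }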
2023-10-24x86/microcode: Clarify the late load logicThomas Gleixner1-22/+19
reload_store() is way too complicated. Split the inner workings out and make the following enhancements: - Taint the kernel only when the microcode was actually updated. If, e.g., the rendezvous fails, then nothing happened and there is no reason for tainting. - Return useful error codes Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Link: https://lore.kernel.org/r/20231002115903.145048840@linutronix.de
2023-10-24x86/microcode: Handle "nosmt" correctlyThomas Gleixner3-30/+43
On CPUs where microcode loading is not NMI-safe the SMT siblings which are parked in one of the play_dead() variants still react to NMIs. So if an NMI hits while the primary thread updates the microcode the resulting behaviour is undefined. The default play_dead() implementation on modern CPUs is using MWAIT which is not guaranteed to be safe against a microcode update which affects MWAIT. Take the cpus_booted_once_mask into account to detect this case and refuse to load late if the vendor specific driver does not advertise that late loading is NMI safe. AMD stated that this is safe, so mark the AMD driver accordingly. This requirement will be partially lifted in later changes. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231002115903.087472735@linutronix.de
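[ Editor's note: an illustrative sketch of the refusal logic described above; the helper name and the nmi_safe flag in microcode_ops are assumptions based on the changelog, not verified identifiers. ]

    /*
     * Refuse late loading when a CPU which has been booted at least once
     * is not online (e.g. parked via "nosmt"), unless the vendor driver
     * declares its late loading NMI-safe.
     */
    static bool ensure_cpus_are_online(void)
    {
            unsigned int cpu;

            for_each_cpu_and(cpu, cpu_present_mask, &cpus_booted_once_mask) {
                    if (!cpu_online(cpu) && !microcode_ops->nmi_safe) {
                            pr_err("CPU %u not online, aborting microcode update\n", cpu);
                            return false;
                    }
            }
            return true;
    }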
2023-10-24x86/microcode: Clean up mc_cpu_down_prep()Thomas Gleixner1-7/+1
This function has nothing to do with suspend. It's a hotplug callback. Remove the bogus comment. Drop the pointless debug printk. The hotplug core provides tracepoints which track the invocation of those callbacks. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231002115903.028651784@linutronix.de
2023-10-24x86/microcode: Get rid of the schedule work indirectionThomas Gleixner1-19/+10
Scheduling work on all CPUs to collect the microcode information is just another extra step for no value. Let the CPU hotplug callback registration do it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231017211723.354748138@linutronix.de
2023-10-24x86/microcode: Mop up early loading leftoversThomas Gleixner2-17/+1
Get rid of the initrd_gone hack which was required to keep find_microcode_in_initrd() functional after init. As find_microcode_in_initrd() is now only used during init, mark it accordingly. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231017211723.298854846@linutronix.de
2023-10-24x86/microcode/amd: Use cached microcode for AP loadThomas Gleixner3-24/+13
Now that the microcode cache is initialized before the APs are brought up, there is no point in scanning builtin/initrd microcode during AP loading. Convert the AP loader to utilize the cache. This in turn makes the CPU hotplug callback which applies the microcode after initrd/builtin is gone obsolete, as the early loading during late hotplug operations, including the resume path, now depends only on the cache. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231017211723.243426023@linutronix.de
2023-10-24x86/microcode/amd: Cache builtin/initrd microcode earlyThomas Gleixner2-17/+11
There is no reason to scan builtin/initrd microcode on each AP. Cache the builtin/initrd microcode in an early initcall so that the early AP loader can utilize the cache. The existing fs initcall which invoked save_microcode_in_initrd_amd() is still required to maintain the initrd_gone flag. Rename it accordingly. This will be removed once the AP loader code is converted to use the cache. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231017211723.187566507@linutronix.de
2023-10-24x86/microcode/amd: Cache builtin microcode tooThomas Gleixner1-1/+1
save_microcode_in_initrd_amd() fails to cache builtin microcode and only scans initrd. Use find_blobs_in_containers() instead which covers both. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231010150702.495139089@linutronix.de
2023-10-24x86/microcode/amd: Use correct per CPU ucode_cpu_infoThomas Gleixner1-3/+3
find_blobs_in_containers() is invoked on every CPU but unconditionally overwrites the ucode_cpu_info of CPU0. Fix this by using the proper CPU data and move the assignment into the call site apply_ucode_from_containers() so that the function can be reused. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231010150702.433454320@linutronix.de
2023-10-24x86/microcode: Remove pointless apply() invocationThomas Gleixner1-17/+6
Microcode is applied on the APs during early bringup. There is no point in trying to apply the microcode again during the hotplug operations and neither at the point where the microcode device is initialized. Collect CPU info and microcode revision in setup_online_cpu() for now. This will move to the CPU hotplug callback later. [ bp: Leave the starting notifier for the following scenario: - boot, late load, suspend to disk, resume without the starting notifier, only the last core manages to update the microcode upon resume:

    # rdmsr -a 0x8b
    10000bf
    10000bf
    10000bf
    10000bf
    10000bf
    10000dc   <----

This is on an AMD F10h machine. For the future, one should check whether potential unification of the CPU init path could cover the resume path too so that this can be simplified even more. tglx: This is caused by the odd handling of APs which try to find the microcode blob in builtin or initrd instead of caching the microcode blob during early init before the APs are brought up. Will be cleaned up in a later step. ] Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20231017211723.018821624@linutronix.de
2023-10-24x86/microcode/intel: Rework intel_find_matching_signature()Thomas Gleixner1-12/+19
Take a cpu_signature argument and work from there. Move the match() helper next to the callsite as there is no point in having it in a header. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231002115902.797820205@linutronix.de
2023-10-24x86/microcode/intel: Reuse intel_cpu_collect_info()Thomas Gleixner1-15/+1
No point in keeping an almost duplicate function. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231002115902.741173606@linutronix.de
2023-10-24x86/microcode/intel: Rework intel_cpu_collect_info()Thomas Gleixner1-24/+9
Nothing needs struct ucode_cpu_info. Make it take struct cpu_signature, let it return a boolean and simplify the implementation. Rename it now that the silly name clash with collect_cpu_info() is gone. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231017211722.851573238@linutronix.de
2023-10-24x86/microcode/intel: Unify microcode apply() functionsThomas Gleixner1-68/+36
Deduplicate the early and late apply() functions. [ bp: Rename the function which does the actual application to __apply_microcode() to differentiate it from microcode_ops.apply_microcode(). ] Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20231017211722.795508212@linutronix.de
2023-10-24x86/microcode/intel: Switch to kvmalloc()Thomas Gleixner1-23/+25
Microcode blobs are getting larger and might soon reach the kmalloc() limit. Switch over to kvmalloc(). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231002115902.564323243@linutronix.de
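[ Editor's note: a minimal sketch of the allocation pattern described above, not the patch itself; the helper name is a placeholder. kvmalloc() falls back to vmalloc() when the request is too large or memory is too fragmented for kmalloc(), and kvfree() releases either kind of allocation. ]

    #include <linux/mm.h>
    #include <linux/string.h>

    static void *copy_ucode_blob(const void *src, size_t size)
    {
            void *mc = kvmalloc(size, GFP_KERNEL);

            if (!mc)
                    return NULL;

            memcpy(mc, src, size);
            return mc;      /* the caller releases it with kvfree() */
    }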
2023-10-24x86/microcode/intel: Save the microcode only after a successful late-loadThomas Gleixner3-15/+20
There are situations where the late microcode is loaded into memory but is not applied: 1) The rendezvous fails 2) The microcode is rejected by the CPUs If either of these happens, then the pointer which was updated at firmware load time is stale and subsequent CPU hotplug operations either fail to update or create inconsistent microcode state. Save the loaded microcode in a separate pointer before the late load is attempted and, when successful, update the hotplug pointer accordingly via a new microcode_ops callback. Remove the pointless fallback in the loader to a microcode pointer which is never populated. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231002115902.505491309@linutronix.de
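[ Editor's note: a minimal sketch of the described callback mechanism; the names (ucode_patch_late, update_ucode_pointer, finalize_late_load) are placeholders for illustration. ]

    /* Blob staged for the late load attempt; not yet visible to hotplug. */
    static struct microcode_intel *ucode_patch_late;

    /* New microcode_ops callback invoked once the late load attempt is over. */
    static void finalize_late_load(int result)
    {
            if (!result)
                    update_ucode_pointer(ucode_patch_late); /* success: publish for hotplug/resume */
            else
                    kvfree(ucode_patch_late);               /* failure: discard the stale blob     */

            ucode_patch_late = NULL;
    }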
2023-10-24x86/microcode/intel: Simplify early loadingThomas Gleixner3-93/+79
The early loading code is overly complicated: - It scans the builtin/initrd for microcode not only on the BSP, but also on all APs during early boot and then later in the boot process it scans again to duplicate and save the microcode before initrd goes away. That's a pointless exercise because this can be simply done before bringing up the APs when the memory allocator is up and running. - Saving the microcode from within the scan loop is completely non-obvious and a leftover of the microcode cache. This can be done at the call site now which makes it obvious. Rework the code so that only the BSP scans the builtin/initrd microcode once during early boot and saves it away in an early initcall for later use. [ bp: Test and fold in a fix from tglx on top which handles the need to distinguish what save_microcode() does depending on when it is called: - when on the BSP during early load, it needs to find a newer revision than the one currently loaded on the BSP - later, before SMP init, it still runs on the BSP and gets the BSP revision just loaded and uses that revision to know which patch to save for the APs. For that it needs to find the exact same one as on the BSP. ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231017211722.629085215@linutronix.de
2023-10-23Merge tag 'v6.6-rc7' into sched/core, to pick up fixesIngo Molnar11-99/+175
Pick up recent sched/urgent fixes merged upstream. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2023-10-23x86/percpu: Introduce const-qualified const_pcpu_hot to micro-optimize code generationUros Bizjak2-0/+2
Some variables in pcpu_hot, currently current_task and top_of_stack, are actually per-thread variables implemented as per-CPU variables and thus stable for the duration of the respective task. There is already an attempt to eliminate redundant reads from these variables using the this_cpu_read_stable() asm macro, which hides the dependency on the read memory address. However, the compiler has limited ability to eliminate asm common subexpressions, so this approach results in limited success. The solution is to allow more aggressive elimination by aliasing pcpu_hot into a const-qualified const_pcpu_hot, and to read stable per-CPU variables from this constant copy. The current per-CPU infrastructure does not support reads from const-qualified variables. However, when the compiler supports segment qualifiers, it is possible to declare the const-aliased variable in the relevant named address space. The compiler considers access to the variable, declared in this way, as a read from a constant location, and will optimize reads from the variable accordingly. By implementing const-qualified const_pcpu_hot, the compiler can eliminate redundant reads from the constant variables, reducing the number of loads from current_task from 3766 to 3217 on a test build, a -14.6% reduction. The reduction of loads translates to the following code savings:

          text       data     bss       dec      hex  filename
    25,477,353    4389456  808452  30675261  1d4113d  vmlinux-old.o
    25,476,074    4389440  808452  30673966  1d40c2e  vmlinux-new.o

representing a code size reduction of -1279 bytes. [ mingo: Updated the changelog, EXPORT(const_pcpu_hot). ] Co-developed-by: Nadav Amit <namit@vmware.com> Signed-off-by: Nadav Amit <namit@vmware.com> Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20231020162004.135244-1-ubizjak@gmail.com
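[ Editor's note: a compressed, hypothetical illustration of the described technique, not the kernel's actual declarations. It assumes a GCC build with x86 named address space support (__seg_gs) and that both symbols are made to alias the same per-CPU storage at link time. ]

    struct pcpu_hot_demo {
            void            *current_task;
            unsigned long   top_of_stack;
    };

    /* Writable view of the per-CPU hot data, as it is normally accessed ... */
    extern __seg_gs struct pcpu_hot_demo pcpu_hot_demo;

    /*
     * ... and a const-qualified declaration of the same storage in the %gs
     * named address space.  Reads through it are treated as loads from
     * constant memory, so the compiler is free to eliminate redundant ones.
     */
    extern const __seg_gs struct pcpu_hot_demo const_pcpu_hot_demo;

    void *demo_get_current(void)
    {
            /* Repeated uses of this expression can be merged into one load. */
            return const_pcpu_hot_demo.current_task;
    }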
2023-10-20x86/callthunks: Delete unused "struct thunk_desc"Alexey Dobriyan1-5/+0
It looks like it was never used. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-kernel@vger.kernel.org Link: https://lore.kernel.org/r/843bf596-db67-4b33-a865-2bae4a4418e5@p183
2023-10-20x86/srso: Remove unnecessary semicolonYang Li1-1/+1
scripts/coccinelle/misc/semicolon.cocci reports: arch/x86/kernel/cpu/bugs.c:713:2-3: Unneeded semicolon Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230810010550.25733-1-yang.lee@linux.alibaba.com
2023-10-20x86/calldepth: Rename __x86_return_skl() to call_depth_return_thunk()Josh Poimboeuf1-1/+2
For consistency with the other return thunks, rename __x86_return_skl() to call_depth_return_thunk(). Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/ae44e9f9976934e3b5b47a458d523ccb15867561.1693889988.git.jpoimboe@kernel.org
2023-10-20x86/srso: Disentangle rethunk-dependent optionsJosh Poimboeuf2-8/+4
CONFIG_RETHUNK, CONFIG_CPU_UNRET_ENTRY and CONFIG_CPU_SRSO are all tangled up. De-spaghettify the code a bit. Some of the rethunk-related code has been shuffled around within the '.text..__x86.return_thunk' section, but otherwise there are no functional changes. srso_alias_untrain_ret() and srso_alias_safe_ret() (which are very address-sensitive) haven't moved. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/2845084ed303d8384905db3b87b77693945302b4.1693889988.git.jpoimboe@kernel.org
2023-10-20x86/srso: Move retbleed IBPB check into existing 'has_microcode' code blockJosh Poimboeuf1-3/+1
Simplify the code flow a bit by moving the retbleed IBPB check into the existing 'has_microcode' block. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/0a22b86b1f6b07f9046a9ab763fc0e0d1b7a91d4.1693889988.git.jpoimboe@kernel.org
2023-10-20x86/bugs: Remove default case for fully switched enumsJosh Poimboeuf1-10/+7
For enum switch statements which handle all possible cases, remove the default case so a compiler warning gets printed if one of the enums gets accidentally omitted from the switch statement. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/fcf6feefab991b72e411c2aed688b18e65e06aed.1693889988.git.jpoimboe@kernel.org
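[ Editor's note: a generic, self-contained illustration of the rationale above, not code from the patch. With -Wswitch (enabled by -Wall), a switch over an enum which handles every enumerator and has no default case makes the compiler warn as soon as a new enumerator is added but not handled. ]

    enum demo_mitigation {
            DEMO_NONE,
            DEMO_SAFE_RET,
            DEMO_IBPB,
    };

    static const char *demo_mitigation_name(enum demo_mitigation m)
    {
            switch (m) {
            case DEMO_NONE:         return "none";
            case DEMO_SAFE_RET:     return "safe-ret";
            case DEMO_IBPB:         return "ibpb";
            /*
             * No default case: adding a new enumerator without handling it
             * here now triggers a -Wswitch warning.
             */
            }
            return "unknown";
    }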
2023-10-20x86/srso: Remove 'pred_cmd' labelJosh Poimboeuf1-8/+13
SBPB is only enabled in two distinct cases: 1) when SRSO has been disabled with srso=off 2) when SRSO has been fixed (in future HW) Simplify the control flow by getting rid of the 'pred_cmd' label and moving the SBPB enablement check to the two corresponding code sites. This makes it more clear when exactly SBPB gets enabled. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/bb20e8569cfa144def5e6f25e610804bc4974de2.1693889988.git.jpoimboe@kernel.org
2023-10-20x86/percpu: Correct PER_CPU_VAR() usage to include symbol and its addendUros Bizjak1-1/+1
The PER_CPU_VAR() macro should be applied to a symbol and its addend. Inconsistent usage is currently harmless, but needs to be corrected before %rip-relative addressing is introduced to the PER_CPU_VAR() macro. No functional changes intended. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: linux-kernel@vger.kernel.org Cc: Brian Gerst <brgerst@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Sean Christopherson <seanjc@google.com>
2023-10-20x86/srso: Fix vulnerability reporting for missing microcodeJosh Poimboeuf1-14/+22
The SRSO default safe-ret mitigation is reported as "mitigated" even if microcode hasn't been updated. That's wrong because userspace may still be vulnerable to SRSO attacks due to IBPB not flushing branch type predictions. Report the safe-ret + !microcode case as vulnerable. Also report the microcode-only case as vulnerable as it leaves the kernel open to attacks. Fixes: fb3bd914b3ec ("x86/srso: Add a Speculative RAS Overflow mitigation") Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/a8a14f97d1b0e03ec255c81637afdf4cf0ae9c99.1693889988.git.jpoimboe@kernel.org
2023-10-20x86/srso: Print mitigation for retbleed IBPB caseJosh Poimboeuf1-3/+3
When overriding the requested mitigation with IBPB due to retbleed=ibpb, print the mitigation in the usual format instead of a custom error message. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/ec3af919e267773d896c240faf30bfc6a1fd6304.1693889988.git.jpoimboe@kernel.org
2023-10-20x86/srso: Print actual mitigation if requested mitigation isn't possibleJosh Poimboeuf1-3/+0
If the kernel wasn't compiled to support the requested option, print the actual option that ends up getting used. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/7e7a12ea9d85a9f76ca16a3efb71f262dee46ab1.1693889988.git.jpoimboe@kernel.org
2023-10-20x86/srso: Fix SBPB enablement for (possible) future fixed HWJosh Poimboeuf1-1/+1
Make the SBPB check more robust against the (possible) case where future HW has SRSO fixed but doesn't have the SRSO_NO bit set. Fixes: 1b5277c0ea0b ("x86/srso: Add SRSO_NO support") Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/cee5050db750b391c9f35f5334f8ff40e66c01b9.1693889988.git.jpoimboe@kernel.org
2023-10-20x86/fpu: Clean up FPU switching in the middle of task switchingLinus Torvalds2-8/+6
It happens to work, but it's very very wrong, because our 'current' macro is magic that is supposedly loading a stable value. It just happens to be not quite stable enough and the compilers re-load the value enough for this code to work. But it's wrong. The whole struct fpu *prev_fpu = &prev->fpu; thing in __switch_to() is pretty ugly. There's no reason why we should look at that 'prev_fpu' pointer there, or pass it down. And it only generates worse code, in how it loads 'current' when __switch_to() has the right task pointers. The attached patch not only cleans this up, it actually generates better code too: (a) it removes one push/pop pair at entry/exit because there's one less register used (no 'current') (b) it removes that pointless load of 'current' because it just uses the right argument:

    -	movq	%gs:pcpu_hot(%rip), %r12
    -	testq	$16384, (%r12)
    +	testq	$16384, (%rdi)

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20231018184227.446318-1-ubizjak@gmail.com
2023-10-20Merge tag 'sev_fixes_for_v6.6' of //git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds2-9/+74
Pull x86 fixes from Borislav Petkov: "Take care of a race between when the #VC exception is raised and when the guest kernel gets to emulate certain instructions in SEV-{ES,SNP} guests by: - disabling emulation of MMIO instructions when coming from user mode - checking the IO permission bitmap before emulating IO instructions and verifying the memory operands of INS/OUTS insns" * tag 'sev_fixes_for_v6.6' of //git.kernel.org/pub/scm/linux/kernel/git/tip/tip:

    x86/sev: Check for user-space IOIO pointing to kernel space
    x86/sev: Check IOBM for IOIO exceptions from user-space
    x86/sev: Disable MMIO emulation from user mode
2023-10-19x86/microcode/intel: Cleanup code furtherThomas Gleixner1-44/+32
Sanitize the microcode scan loop, fixup printks and move the loading function for builtin microcode next to the place where it is used and mark it __init. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231002115902.389400871@linutronix.de
2023-10-19x86/microcode/intel: Simplify and rename generic_load_microcode()Thomas Gleixner1-30/+17
so it becomes less obfuscated and rename it because there is nothing generic about it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231002115902.330295409@linutronix.de
2023-10-19x86/microcode/intel: Simplify scan_microcode()Thomas Gleixner1-21/+7
Make it readable and comprehensible. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231002115902.271940980@linutronix.de
2023-10-19x86/microcode/intel: Rip out mixed stepping support for Intel CPUsAshok Raj4-121/+34
Mixed steppings aren't supported on Intel CPUs. Only one microcode patch is required for the entire system. The caching of microcode blobs which match the family and model is therefore pointless and in fact is dysfunctional as CPU hotplug updates use only a single microcode blob, i.e. the one *intel_ucode_patch points to. Remove the microcode cache and make it an AMD local feature. [ tglx: - save only at the end. Otherwise random microcode ends up in the pointer for early loading - free the ucode patch pointer in save_microcode_patch() only after kmemdup() has succeeded, as reported by Andrew Cooper ] Originally-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231017211722.404362809@linutronix.de
2023-10-18x86/microcode/32: Move early loading after paging enableThomas Gleixner8-265/+71
32-bit loads microcode before paging is enabled. The commit which introduced that has zero justification in the changelog. The cover letter has slightly more content, but it does not give any technical justification either: "The problem in current microcode loading method is that we load a microcode way, way too late; ideally we should load it before turning paging on. This may only be practical on 32 bits since we can't get to 64-bit mode without paging on, but we should still do it as early as at all possible." Handwaving word salad with zero technical content.

Someone claimed in an offlist conversation that this is required for curing the ATOM erratum AAE44/AAF40/AAG38/AAH41. That erratum requires a microcode update in order to make the usage of PSE safe. But during early boot, PSE is completely irrelevant and it is evaluated way later. Neither is it relevant for the AP on single core HT enabled CPUs as the microcode loading on the AP is not doing anything. On dual core CPUs there is a theoretical problem if a split of an executable large page between enabling paging including PSE and loading the microcode happens. But that's only theoretical, it's practically irrelevant because the affected dual core CPUs are 64bit enabled and therefore have paging and PSE enabled before loading the microcode on the second core. So why would it work on 64-bit but not on 32-bit? The erratum:

  "AAG38 Code Fetch May Occur to Incorrect Address After a Large Page is Split Into 4-Kbyte Pages Problem: If software clears the PS (page size) bit in a present PDE (page directory entry), that will cause linear addresses mapped through this PDE to use 4-KByte pages instead of using a large page after old TLB entries are invalidated. Due to this erratum, if a code fetch uses this PDE before the TLB entry for the large page is invalidated then it may fetch from a different physical address than specified by either the old large page translation or the new 4-KByte page translation. This erratum may also cause speculative code fetches from incorrect addresses."

The practical relevance for this is exactly zero because there is no splitting of large text pages during early boot-time, i.e. between paging enable and microcode loading, and neither during CPU hotplug. IOW, this "load microcode before paging enable" is yet another voodoo programming solution in search of a problem. What's worse is that it causes at least two serious problems:

1) When stackprotector is enabled, the microcode loader code has the stackprotector mechanics enabled. The read from the per CPU variable __stack_chk_guard is always accessing the virtual address either directly on UP or via %fs on SMP. In physical address mode this results in an access to memory above 3GB. So this works by chance as the hardware returns the same value when there is no RAM at this physical address. When there is RAM populated above 3G then the read is by chance the same as nothing changes that memory during the very early boot stage. That's not necessarily true during runtime CPU hotplug.

2) When function tracing is enabled, the relevant microcode loader functions and the functions invoked from there will call into the tracing code and evaluate global and per CPU variables in physical address mode. What could potentially go wrong?

Cure this and move the microcode loading after the early paging enable, use the new temporary initrd mapping and remove the gunk in the microcode loader which is required to handle physical address mode.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231017211722.348298216@linutronix.de
2023-10-18x86/boot/32: Temporarily map initrd for microcode loadingThomas Gleixner1-2/+52
Early microcode loading on 32-bit runs in physical address mode because the initrd is not covered by the initial page tables. That results in a horrible mess all over the microcode loader code. Provide a temporary mapping for the initrd in the initial page tables by appending it to the actual initial mapping starting with a new PGD or PMD depending on the configured page table levels ([non-]PAE). The page table entries are located after _brk_end so they are not permanently using memory space. The mapping is invalidated right away in i386_start_kernel() after the early microcode loader has run. This prepares for removing the physical address mode oddities from all over the microcode loader code, which in turn allows further cleanups. Provide the map and unmap code and document the place where the microcode loader needs to be invoked with a comment. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231017211722.292291436@linutronix.de