summaryrefslogtreecommitdiff
path: root/arch/x86/kernel
AgeCommit message (Collapse)AuthorFilesLines
2021-06-23x86/cpu: Write the default PKRU value when enabling PKEThomas Gleixner1-0/+2
In preparation of making the PKRU management more independent from XSTATES, write the default PKRU value into the hardware right after enabling PKRU in CR4. This ensures that switch_to() and copy_thread() have the correct setting for init task and the per CPU idle threads right away. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121455.622983906@linutronix.de
2021-06-23x86/cpu: Sanitize X86_FEATURE_OSPKEThomas Gleixner4-16/+14
X86_FEATURE_OSPKE is enabled first on the boot CPU and the feature flag is set. Secondary CPUs have to enable CR4.PKE as well and set their per CPU feature flag. That's ineffective because all call sites have checks for boot_cpu_data. Make it smarter and force the feature flag when PKU is enabled on the boot cpu which allows then to use cpu_feature_enabled(X86_FEATURE_OSPKE) all over the place. That either compiles the code out when PKEY support is disabled in Kconfig or uses a static_cpu_has() for the feature check which makes a significant difference in hotpaths, e.g. context switch. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121455.305113644@linutronix.de
2021-06-23x86/fpu: Rename and sanitize fpu__save/copy()Thomas Gleixner4-14/+13
Both function names are a misnomer. fpu__save() is actually about synchronizing the hardware register state into the task's memory state so that either coredump or a math exception handler can inspect the state at the time where the problem happens. The function guarantees to preserve the register state, while "save" is a common terminology for saving the current state so it can be modified and restored later. This is clearly not the case here. Rename it to fpu_sync_fpstate(). fpu__copy() is used to clone the current task's FPU state when duplicating task_struct. While the register state is a copy the rest of the FPU state is not. Name it accordingly and remove the really pointless @src argument along with the warning which comes along with it. Nothing can ever copy the FPU state of a non-current task. It's clearly just a consequence of arch_dup_task_struct(), but it makes no sense to proliferate that further. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121455.196727450@linutronix.de
2021-06-23x86/pkeys: Move read_pkru() and write_pkru()Dave Hansen1-0/+1
write_pkru() was originally used just to write to the PKRU register. It was mercifully short and sweet and was not out of place in pgtable.h with some other pkey-related code. But, later work included a requirement to also modify the task XSAVE buffer when updating the register. This really is more related to the XSAVE architecture than to paging. Move the read/write_pkru() to asm/pkru.h. pgtable.h won't miss them. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121455.102647114@linutronix.de
2021-06-23x86/fpu/xstate: Sanitize handling of independent featuresThomas Gleixner1-50/+48
The copy functions for the independent features are horribly named and the supervisor and independent part is just overengineered. The point is that the supplied mask has either to be a subset of the independent features or a subset of the task->fpu.xstate managed features. Rewrite it so it checks for invalid overlaps of these areas in the caller supplied feature mask. Rename it so it follows the new naming convention for these operations. Mop up the function documentation. This allows to use that function for other purposes as well. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lkml.kernel.org/r/20210623121455.004880675@linutronix.de
2021-06-23x86/fpu: Rename "dynamic" XSTATEs to "independent"Andy Lutomirski1-31/+31
The salient feature of "dynamic" XSTATEs is that they are not part of the main task XSTATE buffer. The fact that they are dynamically allocated is irrelevant and will become quite confusing when user math XSTATEs start being dynamically allocated. Rename them to "independent" because they are independent of the main XSTATE code. This is just a search-and-replace with some whitespace updates to keep things aligned. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/1eecb0e4f3e07828ebe5d737ec77dc3b708fad2d.1623388344.git.luto@kernel.org Link: https://lkml.kernel.org/r/20210623121454.911450390@linutronix.de
2021-06-23x86/fpu: Rename initstate copy functionsThomas Gleixner1-3/+3
Again this not a copy. It's restoring register state from kernel memory. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121454.816581630@linutronix.de
2021-06-23x86/fpu: Get rid of the FNSAVE optimizationThomas Gleixner1-31/+23
The FNSAVE support requires conditionals in quite some call paths because FNSAVE reinitializes the FPU hardware. If the save has to preserve the FPU register state then the caller has to conditionally restore it from memory when FNSAVE is in use. This also requires a conditional in context switch because the restore avoidance optimization cannot work with FNSAVE. As this only affects 20+ years old CPUs there is really no reason to keep this optimization effective for FNSAVE. It's about time to not optimize for antiques anymore. Just unconditionally FRSTOR the save content to the registers and clean up the conditionals all over the place. Suggested-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121454.617369268@linutronix.de
2021-06-23x86/fpu: Rename copy_fpregs_to_fpstate() to save_fpregs_to_fpstate()Thomas Gleixner1-5/+5
A copy is guaranteed to leave the source intact, which is not the case when FNSAVE is used as that reinitilizes the registers. Save does not make such guarantees and it matches what this is about, i.e. to save the state for a later restore. Rename it to save_fpregs_to_fpstate(). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121454.508853062@linutronix.de
2021-06-23x86/fpu: Deduplicate copy_uabi_from_user/kernel_to_xstate()Thomas Gleixner1-90/+47
copy_uabi_from_user_to_xstate() and copy_uabi_from_kernel_to_xstate() are almost identical except for the copy function. Unify them. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lkml.kernel.org/r/20210623121454.414215896@linutronix.de
2021-06-23x86/fpu: Rename xstate copy functions which are related to UABIThomas Gleixner3-4/+5
Rename them to reflect that these functions deal with user space format XSAVE buffers. copy_kernel_to_xstate() -> copy_uabi_from_kernel_to_xstate() copy_user_to_xstate() -> copy_sigframe_from_user_to_xstate() Again a clear statement that these functions deal with user space ABI. Suggested-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121454.318485015@linutronix.de
2021-06-23x86/fpu: Rename fregs-related copy functionsThomas Gleixner2-4/+4
The function names for fnsave/fnrstor operations are horribly named and a permanent source of confusion. Rename: copy_kernel_to_fregs() to frstor() copy_fregs_to_user() to fnsave_to_user_sigframe() copy_user_to_fregs() to frstor_from_user_sigframe() so it's clear what these are doing. All these functions are really low level wrappers around the equally named instructions, so mapping to the documentation is just natural. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121454.223594101@linutronix.de
2021-06-23x86/fpu: Rename fxregs-related copy functionsThomas Gleixner2-8/+8
The function names for fxsave/fxrstor operations are horribly named and a permanent source of confusion. Rename: copy_fxregs_to_kernel() to fxsave() copy_kernel_to_fxregs() to fxrstor() copy_fxregs_to_user() to fxsave_to_user_sigframe() copy_user_to_fxregs() to fxrstor_from_user_sigframe() so it's clear what these are doing. All these functions are really low level wrappers around the equally named instructions, so mapping to the documentation is just natural. While at it, replace the static_cpu_has(X86_FEATURE_FXSR) with use_fxsr() to be consistent with the rest of the code. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121454.017863494@linutronix.de
2021-06-23x86/fpu: Rename copy_user_to_xregs() and copy_xregs_to_user()Thomas Gleixner1-2/+2
The function names for xsave[s]/xrstor[s] operations are horribly named and a permanent source of confusion. Rename: copy_xregs_to_user() to xsave_to_user_sigframe() copy_user_to_xregs() to xrstor_from_user_sigframe() so it's entirely clear what this is about. This is also a clear indicator of the potentially different storage format because this is user ABI and cannot use compacted format. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121453.924266705@linutronix.de
2021-06-23x86/fpu: Rename copy_xregs_to_kernel() and copy_kernel_to_xregs()Thomas Gleixner3-15/+15
The function names for xsave[s]/xrstor[s] operations are horribly named and a permanent source of confusion. Rename: copy_xregs_to_kernel() to os_xsave() copy_kernel_to_xregs() to os_xrstor() These are truly low level wrappers around the actual instructions XSAVE[OPT]/XRSTOR and XSAVES/XRSTORS with the twist that the selection based on the available CPU features happens with an alternative to avoid conditionals all over the place and to provide the best performance for hot paths. The os_ prefix tells that this is the OS selected mechanism. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121453.830239347@linutronix.de
2021-06-23x86/fpu: Get rid of copy_supervisor_to_kernel()Thomas Gleixner2-60/+8
If the fast path of restoring the FPU state on sigreturn fails or is not taken and the current task's FPU is active then the FPU has to be deactivated for the slow path to allow a safe update of the tasks FPU memory state. With supervisor states enabled, this requires to save the supervisor state in the memory state first. Supervisor states require XSAVES so saving only the supervisor state requires to reshuffle the memory buffer because XSAVES uses the compacted format and therefore stores the supervisor states at the beginning of the memory state. That's just an overengineered optimization. Get rid of it and save the full state for this case. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121453.734561971@linutronix.de
2021-06-23x86/fpu: Cleanup arch_set_user_pkey_access()Thomas Gleixner1-5/+6
The function does a sanity check with a WARN_ON_ONCE() but happily proceeds when the pkey argument is out of range. Clean it up. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121453.635764326@linutronix.de
2021-06-23x86/fpu: Get rid of using_compacted_format()Thomas Gleixner1-18/+4
This function is pointlessly global and a complete misnomer because it's usage is related to both supervisor state checks and compacted format checks. Remove it and just make the conditions check the XSAVES feature. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121453.425493349@linutronix.de
2021-06-23x86/fpu: Move fpu__write_begin() to regsetThomas Gleixner2-27/+22
The only usecase for fpu__write_begin is the set() callback of regset, so the function is pointlessly global. Move it to the regset code and rename it to fpu_force_restore() which is exactly decribing what the function does. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121453.328652975@linutronix.de
2021-06-23x86/fpu/regset: Move fpu__read_begin() into regsetThomas Gleixner2-23/+19
The function can only be used from the regset get() callbacks safely. So there is no reason to have it globally exposed. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121453.234942936@linutronix.de
2021-06-23x86/fpu: Remove fpstate_sanitize_xstate()Thomas Gleixner1-79/+0
No more users. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121453.124819167@linutronix.de
2021-06-23x86/fpu: Use copy_xstate_to_uabi_buf() in fpregs_get()Thomas Gleixner1-10/+20
Use the new functionality of copy_xstate_to_uabi_buf() to retrieve the FX state when XSAVE* is in use. This avoids to overwrite the FPU state buffer with fpstate_sanitize_xstate() which is error prone and duplicated code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121453.014441775@linutronix.de
2021-06-23x86/fpu: Use copy_xstate_to_uabi_buf() in xfpregs_get()Thomas Gleixner1-3/+8
Use the new functionality of copy_xstate_to_uabi_buf() to retrieve the FX state when XSAVE* is in use. This avoids overwriting the FPU state buffer with fpstate_sanitize_xstate() which is error prone and duplicated code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121452.901736860@linutronix.de
2021-06-23x86/fpu: Make copy_xstate_to_kernel() usable for [x]fpregs_get()Thomas Gleixner2-12/+32
When xsave with init state optimization is used then a component's state in the task's xsave buffer can be stale when the corresponding feature bit is not set. fpregs_get() and xfpregs_get() invoke fpstate_sanitize_xstate() to update the task's xsave buffer before retrieving the FX or FP state. That's just duplicated code as copy_xstate_to_kernel() already handles this correctly. Add a copy mode argument to the function which allows to restrict the state copy to the FP and SSE features. Also rename the function to copy_xstate_to_uabi_buf() so the name reflects what it is doing. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121452.805327286@linutronix.de
2021-06-23x86/fpu: Clean up fpregs_set()Andy Lutomirski1-15/+16
fpregs_set() has unnecessary complexity to support short or nonzero-offset writes and to handle the case in which a copy from userspace overwrites some of the target buffer and then fails. Support for partial writes is useless -- just require that the write has offset 0 and the correct size, and copy into a temporary kernel buffer to avoid clobbering the state if the user access fails. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121452.710467587@linutronix.de
2021-06-23x86/fpu: Fail ptrace() requests that try to set invalid MXCSR valuesAndy Lutomirski1-2/+3
There is no benefit from accepting and silently changing an invalid MXCSR value supplied via ptrace(). Instead, return -EINVAL on invalid input. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121452.613614842@linutronix.de
2021-06-23x86/fpu: Rewrite xfpregs_set()Andy Lutomirski1-14/+23
xfpregs_set() was incomprehensible. Almost all of the complexity was due to trying to support nonsensically sized writes or -EFAULT errors that would have partially or completely overwritten the destination before failing. Nonsensically sized input would only have been possible using PTRACE_SETREGSET on REGSET_XFP. Fortunately, it appears (based on Debian code search results) that no one uses that API at all, let alone with the wrong sized buffer. Failed user access can be handled more cleanly by first copying to kernel memory. Just rewrite it to require sensible input. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121452.504234607@linutronix.de
2021-06-23x86/fpu: Simplify PTRACE_GETREGS codeDave Hansen2-24/+6
ptrace() has interfaces that let a ptracer inspect a ptracee's register state. This includes XSAVE state. The ptrace() ABI includes a hardware-format XSAVE buffer for both the SETREGS and GETREGS interfaces. In the old days, the kernel buffer and the ptrace() ABI buffer were the same boring non-compacted format. But, since the advent of supervisor states and the compacted format, the kernel buffer has diverged from the format presented in the ABI. This leads to two paths in the kernel: 1. Effectively a verbatim copy_to_user() which just copies the kernel buffer out to userspace. This is used when the kernel buffer is kept in the non-compacted form which means that it shares a format with the ptrace ABI. 2. A one-state-at-a-time path: copy_xstate_to_kernel(). This is theoretically slower since it does a bunch of piecemeal copies. Remove the verbatim copy case. Speed probably does not matter in this path, and the vast majority of new hardware will use the one-state-at-a-time path anyway. This ensures greater testing for the "slow" path. This also makes enabling PKRU in this interface easier since a single path can be patched instead of two. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121452.408457100@linutronix.de
2021-06-23x86/fpu: Reject invalid MXCSR values in copy_kernel_to_xstate()Thomas Gleixner1-3/+16
Instead of masking out reserved bits, check them and reject the provided state as invalid if not zero. Suggested-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121452.308388343@linutronix.de
2021-06-23x86/fpu: Sanitize xstateregs_set()Thomas Gleixner2-31/+25
xstateregs_set() operates on a stopped task and tries to copy the provided buffer into the task's fpu.state.xsave buffer. Any error while copying or invalid state detected after copying results in wiping the target task's FPU state completely including supervisor states. That's just wrong. The caller supplied invalid data or has a problem with unmapped memory, so there is absolutely no justification to corrupt the target state. Fix this with the following modifications: 1) If data has to be copied from userspace, allocate a buffer and copy from user first. 2) Use copy_kernel_to_xstate() unconditionally so that header checking works correctly. 3) Return on error without corrupting the target state. This prevents corrupting states and lets the caller deal with the problem it caused in the first place. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121452.214903673@linutronix.de
2021-06-23x86/fpu: Limit xstate copy size in xstateregs_set()Thomas Gleixner1-1/+1
If the count argument is larger than the xstate size, this will happily copy beyond the end of xstate. Fixes: 91c3dba7dbc1 ("x86/fpu/xstate: Fix PTRACE frames for XSAVES") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121452.120741557@linutronix.de
2021-06-23x86/fpu: Move inlines where they belongThomas Gleixner1-0/+15
They are only used in fpstate_init() and there is no point to have them in a header just to make reading the code harder. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121452.023118522@linutronix.de
2021-06-23x86/fpu: Remove unused get_xsave_field_ptr()Thomas Gleixner1-30/+0
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121451.915614415@linutronix.de
2021-06-23x86/fpu: Get rid of fpu__get_supported_xfeatures_mask()Thomas Gleixner3-15/+5
This function is really not doing what the comment advertises: "Find supported xfeatures based on cpu features and command-line input. This must be called after fpu__init_parse_early_param() is called and xfeatures_mask is enumerated." fpu__init_parse_early_param() does not exist anymore and the function just returns a constant. Remove it and fix the caller and get rid of further references to fpu__init_parse_early_param(). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121451.816404717@linutronix.de
2021-06-23x86/fpu: Make xfeatures_mask_all __ro_after_initThomas Gleixner1-13/+15
Nothing has to modify this after init. But of course there is code which unconditionally masks xfeatures_mask_all on CPU hotplug. This goes unnoticed during boot hotplug because at that point the variable is still RW mapped. This is broken in several ways: 1) Masking this in post init CPU hotplug means that any modification of this state goes unnoticed until actual hotplug happens. 2) If that ever happens then these bogus feature bits are already populated all over the place and the system is in inconsistent state vs. the compacted XSTATE offsets. If at all then this has to panic the machine because the inconsistency cannot be undone anymore. Make this a one-time paranoia check in xstate init code and disable xsave when this happens. Reported-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121451.712803952@linutronix.de
2021-06-23x86/fpu: Mark various FPU state variables __ro_after_initThomas Gleixner2-7/+11
Nothing modifies these after booting. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Reviewed-by: Andy Lutomirski <luto@kernel.org> Link: https://lkml.kernel.org/r/20210623121451.611751529@linutronix.de
2021-06-23x86/pkeys: Revert a5eff7259790 ("x86/pkeys: Add PKRU value to init_fpstate")Thomas Gleixner1-5/+0
This cannot work and it's unclear how that ever made a difference. init_fpstate.xsave.header.xfeatures is always 0 so get_xsave_addr() will always return a NULL pointer, which will prevent storing the default PKRU value in init_fpstate. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121451.451391598@linutronix.de
2021-06-23x86/fpu: Fix copy_xstate_to_kernel() gap handlingThomas Gleixner1-44/+61
The gap handling in copy_xstate_to_kernel() is wrong when XSAVES is in use. Using init_fpstate for copying the init state of features which are not set in the xstate header is only correct for the legacy area, but not for the extended features area because when XSAVES is in use then init_fpstate is in compacted form which means the xstate offsets which are used to copy from init_fpstate are not valid. Fortunately, this is not a real problem today because all extended features in use have an all-zeros init state, but it is wrong nevertheless and with a potentially dynamically sized init_fpstate this would result in an access outside of the init_fpstate. Fix this by keeping track of the last copied state in the target buffer and explicitly zero it when there is a feature or alignment gap. Use the compacted offset when accessing the extended feature space in init_fpstate. As this is not a functional issue on older kernels this is intentionally not tagged for stable. Fixes: b8be15d58806 ("x86/fpu/xstate: Re-enable XSAVES") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121451.294282032@linutronix.de
2021-06-23Merge x86/urgent into x86/fpuBorislav Petkov11-184/+274
Pick up dependent changes which either went mainline (x86/urgent is based on -rc7 and that contains them) as urgent fixes and the current x86/urgent branch which contains two more urgent fixes, so that the bigger FPU rework can base off ontop. Signed-off-by: Borislav Petkov <bp@suse.de>
2021-06-23Merge branch 'topic/ppc-kvm' of ↵Paolo Bonzini10-24/+40
https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux into HEAD - Support for the H_RPT_INVALIDATE hypercall - Conversion of Book3S entry/exit to C - Bug fixes
2021-06-23x86/sev: Use "SEV: " prefix for messages from sev.cJoerg Roedel1-1/+1
The source file has been renamed froms sev-es.c to sev.c, but the messages are still prefixed with "SEV-ES: ". Change that to "SEV: " to make it consistent. Fixes: e759959fe3b8 ("x86/sev-es: Rename sev-es.{ch} to sev.{ch}") Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210622144825.27588-4-joro@8bytes.org
2021-06-23Backmerge tag 'v5.13-rc7' into drm-nextDave Airlie8-110/+135
Backmerge Linux 5.13-rc7 to make some pulls from later bases apply, and to bake in the conflicts so far.
2021-06-22clocksource: Reduce clocksource-skew thresholdPaul E. McKenney1-0/+1
Currently, WATCHDOG_THRESHOLD is set to detect a 62.5-millisecond skew in a 500-millisecond WATCHDOG_INTERVAL. This requires that clocks be skewed by more than 12.5% in order to be marked unstable. Except that a clock that is skewed by that much is probably destroying unsuspecting software right and left. And given that there are now checks for false-positive skews due to delays between reading the two clocks, it should be possible to greatly decrease WATCHDOG_THRESHOLD, at least for fine-grained clocks such as TSC. Therefore, add a new uncertainty_margin field to the clocksource structure that contains the maximum uncertainty in nanoseconds for the corresponding clock. This field may be initialized manually, as it is for clocksource_tsc_early and clocksource_jiffies, which is copied to refined_jiffies. If the field is not initialized manually, it will be computed at clock-registry time as the period of the clock in question based on the scale and freq parameters to __clocksource_update_freq_scale() function. If either of those two parameters are zero, the tens-of-milliseconds WATCHDOG_THRESHOLD is used as a cowardly alternative to dividing by zero. No matter how the uncertainty_margin field is calculated, it is bounded below by twice WATCHDOG_MAX_SKEW, that is, by 100 microseconds. Note that manually initialized uncertainty_margin fields are not adjusted, but there is a WARN_ON_ONCE() that triggers if any such field is less than twice WATCHDOG_MAX_SKEW. This WARN_ON_ONCE() is intended to discourage production use of the one-nanosecond uncertainty_margin values that are used to test the clock-skew code itself. The actual clock-skew check uses the sum of the uncertainty_margin fields of the two clocksource structures being compared. Integer overflow is avoided because the largest computed value of the uncertainty_margin fields is one billion (10^9), and double that value fits into an unsigned int. However, if someone manually specifies (say) UINT_MAX, they will get what they deserve. Note that the refined_jiffies uncertainty_margin field is initialized to TICK_NSEC, which means that skew checks involving this clocksource will be sufficently forgiving. In a similar vein, the clocksource_tsc_early uncertainty_margin field is initialized to 32*NSEC_PER_MSEC, which replicates the current behavior and allows custom setting if needed in order to address the rare skews detected for this clocksource in current mainline. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Feng Tang <feng.tang@intel.com> Link: https://lore.kernel.org/r/20210527190124.440372-4-paulmck@kernel.org
2021-06-22clocksource: Check per-CPU clock synchronization when marked unstablePaul E. McKenney1-1/+2
Some sorts of per-CPU clock sources have a history of going out of synchronization with each other. However, this problem has purportedy been solved in the past ten years. Except that it is all too possible that the problem has instead simply been made less likely, which might mean that some of the occasional "Marking clocksource 'tsc' as unstable" messages might be due to desynchronization. How would anyone know? Therefore apply CPU-to-CPU synchronization checking to newly unstable clocksource that are marked with the new CLOCK_SOURCE_VERIFY_PERCPU flag. Lists of desynchronized CPUs are printed, with the caveat that if it is the reporting CPU that is itself desynchronized, it will appear that all the other clocks are wrong. Just like in real life. Reported-by: Chris Mason <clm@fb.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Feng Tang <feng.tang@intel.com> Link: https://lore.kernel.org/r/20210527190124.440372-2-paulmck@kernel.org
2021-06-22x86/fpu: Make init_fpstate correct with optimized XSAVEThomas Gleixner1-3/+38
The XSAVE init code initializes all enabled and supported components with XRSTOR(S) to init state. Then it XSAVEs the state of the components back into init_fpstate which is used in several places to fill in the init state of components. This works correctly with XSAVE, but not with XSAVEOPT and XSAVES because those use the init optimization and skip writing state of components which are in init state. So init_fpstate.xsave still contains all zeroes after this operation. There are two ways to solve that: 1) Use XSAVE unconditionally, but that requires to reshuffle the buffer when XSAVES is enabled because XSAVES uses compacted format. 2) Save the components which are known to have a non-zero init state by other means. Looking deeper, #2 is the right thing to do because all components the kernel supports have all-zeroes init state except the legacy features (FP, SSE). Those cannot be hard coded because the states are not identical on all CPUs, but they can be saved with FXSAVE which avoids all conditionals. Use FXSAVE to save the legacy FP/SSE components in init_fpstate along with a BUILD_BUG_ON() which reminds developers to validate that a newly added component has all zeroes init state. As a bonus remove the now unused copy_xregs_to_kernel_booting() crutch. The XSAVE and reshuffle method can still be implemented in the unlikely case that components are added which have a non-zero init state and no other means to save them. For now, FXSAVE is just simple and good enough. [ bp: Fix a typo or two in the text. ] Fixes: 6bad06b76892 ("x86, xsave: Use xsaveopt in context-switch path when supported") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20210618143444.587311343@linutronix.de
2021-06-22x86/fpu: Preserve supervisor states in sanitize_restored_user_xstate()Thomas Gleixner1-18/+8
sanitize_restored_user_xstate() preserves the supervisor states only when the fx_only argument is zero, which allows unprivileged user space to put supervisor states back into init state. Preserve them unconditionally. [ bp: Fix a typo or two in the text. ] Fixes: 5d6b6a6f9b5c ("x86/fpu/xstate: Update sanitize_restored_xstate() for supervisor xstates") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20210618143444.438635017@linutronix.de
2021-06-21x86/sev: Split up runtime #VC handler for correct state trackingJoerg Roedel1-69/+79
Split up the #VC handler code into a from-user and a from-kernel part. This allows clean and correct state tracking, as the #VC handler needs to enter NMI-state when raised from kernel mode and plain IRQ state when raised from user-mode. Fixes: 62441a1fb532 ("x86/sev-es: Correctly track IRQ states in runtime #VC handler") Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210618115409.22735-3-joro@8bytes.org
2021-06-21x86/sev: Make sure IRQs are disabled while GHCB is activeJoerg Roedel1-12/+22
The #VC handler only cares about IRQs being disabled while the GHCB is active, as it must not be interrupted by something which could cause another #VC while it holds the GHCB (NMI is the exception for which the backup GHCB exits). Make sure nothing interrupts the code path while the GHCB is active by making sure that callers of __sev_{get,put}_ghcb() have disabled interrupts upfront. [ bp: Massage commit message. ] Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210618115409.22735-2-joro@8bytes.org
2021-06-20Merge tag 'x86_urgent_for_v5.13_rc6' of ↵Linus Torvalds2-19/+36
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: "A first set of urgent fixes to the FPU/XSTATE handling mess^W code. (There's a lot more in the pipe): - Prevent corruption of the XSTATE buffer in signal handling by validating what is being copied from userspace first. - Invalidate other task's preserved FPU registers on XRSTOR failure (#PF) because latter can still modify some of them. - Restore the proper PKRU value in case userspace modified it - Reset FPU state when signal restoring fails Other: - Map EFI boot services data memory as encrypted in a SEV guest so that the guest can access it and actually boot properly - Two SGX correctness fixes: proper resources freeing and a NUMA fix" * tag 'x86_urgent_for_v5.13_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Avoid truncating memblocks for SGX memory x86/sgx: Add missing xa_destroy() when virtual EPC is destroyed x86/fpu: Reset state for all signal restore failures x86/pkru: Write hardware init value to PKRU when xstate is init x86/process: Check PF_KTHREAD and not current->mm for kernel threads x86/fpu: Invalidate FPU state after a failed XRSTOR from a user buffer x86/fpu: Prevent state corruption in __fpu__restore_sig() x86/ioremap: Map EFI-reserved memory as encrypted for SEV
2021-06-18sched: Introduce task_is_running()Peter Zijlstra1-2/+2
Replace a bunch of 'p->state == TASK_RUNNING' with a new helper: task_is_running(p). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Davidlohr Bueso <dave@stgolabs.net> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20210611082838.222401495@infradead.org