summaryrefslogtreecommitdiff
path: root/arch/x86/include
AgeCommit message (Collapse)AuthorFilesLines
2021-06-25KVM: x86/mmu: Drop "nx" from MMU context now that there are no readersSean Christopherson1-2/+0
Drop kvm_mmu.nx as there no consumers left. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-39-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25KVM: x86/mmu: Rename "nxe" role bit to "efer_nx" for macro shenanigansSean Christopherson1-2/+2
Rename "nxe" to "efer_nx" so that future macro magic can use the pattern <reg>_<bit> for all CR0, CR4, and EFER bits that included in the role. Using "efer_nx" also makes it clear that the role bit reflects EFER.NX, not the NX bit in the corresponding PTE. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-25-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25Revert "KVM: MMU: record maximum physical address width in ↵Sean Christopherson1-1/+0
kvm_mmu_extended_role" Drop MAXPHYADDR from mmu_role now that all MMUs have their role invalidated after a CPUID update. Invalidating the role forces all MMUs to re-evaluate the guest's MAXPHYADDR, and the guest's MAXPHYADDR can only be changed only through a CPUID update. This reverts commit de3ccd26fafc707b09792d9b633c8b5b48865315. Cc: Yu Zhang <yu.c.zhang@linux.intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-9-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25KVM: x86: Alert userspace that KVM_SET_CPUID{,2} after KVM_RUN is brokenSean Christopherson1-1/+1
Warn userspace that KVM_SET_CPUID{,2} after KVM_RUN "may" cause guest instability. Initialize last_vmentry_cpu to -1 and use it to detect if the vCPU has been run at least once when its CPUID model is changed. KVM does not correctly handle changes to paging related settings in the guest's vCPU model after KVM_RUN, e.g. MAXPHYADDR, GBPAGES, etc... KVM could theoretically zap all shadow pages, but actually making that happen is a mess due to lock inversion (vcpu->mutex is held). And even then, updating paging settings on the fly would only work if all vCPUs are stopped, updated in concert with identical settings, then restarted. To support running vCPUs with different vCPU models (that affect paging), KVM would need to track all relevant information in kvm_mmu_page_role. Note, that's the _page_ role, not the full mmu_role. Updating mmu_role isn't sufficient as a vCPU can reuse a shadow page translation that was created by a vCPU with different settings and thus completely skip the reserved bit checks (that are tied to CPUID). Tracking CPUID state in kvm_mmu_page_role is _extremely_ undesirable as it would require doubling gfn_track from a u16 to a u32, i.e. would increase KVM's memory footprint by 2 bytes for every 4kb of guest memory. E.g. MAXPHYADDR (6 bits), GBPAGES, AMD vs. INTEL = 1 bit, and SEV C-BIT would all need to be tracked. In practice, there is no remotely sane use case for changing any paging related CPUID entries on the fly, so just sweep it under the rug (after yelling at userspace). Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-8-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25KVM: x86: Force all MMUs to reinitialize if guest CPUID is modifiedSean Christopherson1-0/+1
Invalidate all MMUs' roles after a CPUID update to force reinitizliation of the MMU context/helpers. Despite the efforts of commit de3ccd26fafc ("KVM: MMU: record maximum physical address width in kvm_mmu_extended_role"), there are still a handful of CPUID-based properties that affect MMU behavior but are not incorporated into mmu_role. E.g. 1gb hugepage support, AMD vs. Intel handling of bit 8, and SEV's C-Bit location all factor into the guest's reserved PTE bits. The obvious alternative would be to add all such properties to mmu_role, but doing so provides no benefit over simply forcing a reinitialization on every CPUID update, as setting guest CPUID is a rare operation. Note, reinitializing all MMUs after a CPUID update does not fix all of KVM's woes. Specifically, kvm_mmu_page_role doesn't track the CPUID properties, which means that a vCPU can reuse shadow pages that should not exist for the new vCPU model, e.g. that map GPAs that are now illegal (due to MAXPHYADDR changes) or that set bits that are now reserved (PAGE_SIZE for 1gb pages), etc... Tracking the relevant CPUID properties in kvm_mmu_page_role would address the majority of problems, but fully tracking that much state in the shadow page role comes with an unpalatable cost as it would require a non-trivial increase in KVM's memory footprint. The GBPAGES case is even worse, as neither Intel nor AMD provides a way to disable 1gb hugepage support in the hardware page walker, i.e. it's a virtualization hole that can't be closed when using TDP. In other words, resetting the MMU after a CPUID update is largely a superficial fix. But, it will allow reverting the tracking of MAXPHYADDR in the mmu_role, and that case in particular needs to mostly work because KVM's shadow_root_level depends on guest MAXPHYADDR when 5-level paging is supported. For cases where KVM botches guest behavior, the damage is limited to that guest. But for the shadow_root_level, a misconfigured MMU can cause KVM to incorrectly access memory, e.g. due to walking off the end of its shadow page tables. Fixes: 7dcd57552008 ("x86/kvm/mmu: check if tdp/shadow MMU reconfiguration is needed") Cc: Yu Zhang <yu.c.zhang@linux.intel.com> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25Revert "KVM: x86/mmu: Drop kvm_mmu_extended_role.cr4_la57 hack"Sean Christopherson1-0/+1
Restore CR4.LA57 to the mmu_role to fix an amusing edge case with nested virtualization. When KVM (L0) is using TDP, CR4.LA57 is not reflected in mmu_role.base.level because that tracks the shadow root level, i.e. TDP level. Normally, this is not an issue because LA57 can't be toggled while long mode is active, i.e. the guest has to first disable paging, then toggle LA57, then re-enable paging, thus ensuring an MMU reinitialization. But if L1 is crafty, it can load a new CR4 on VM-Exit and toggle LA57 without having to bounce through an unpaged section. L1 can also load a new CR3 on exit, i.e. it doesn't even need to play crazy paging games, a single entry PML5 is sufficient. Such shenanigans are only problematic if L0 and L1 use TDP, otherwise L1 and L2 share an MMU that gets reinitialized on nested VM-Enter/VM-Exit due to mmu_role.base.guest_mode. Note, in the L2 case with nested TDP, even though L1 can switch between L2s with different LA57 settings, thus bypassing the paging requirement, in that case KVM's nested_mmu will track LA57 in base.level. This reverts commit 8053f924cad30bf9f9a24e02b6c8ddfabf5202ea. Fixes: 8053f924cad3 ("KVM: x86/mmu: Drop kvm_mmu_extended_role.cr4_la57 hack") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24KVM: stats: Separate generic stats from architecture specific onesJing Zhang1-7/+2
Generic KVM stats are those collected in architecture independent code or those supported by all architectures; put all generic statistics in a separate structure. This ensures that they are defined the same way in the statistics API which is being added, removing duplication among different architectures in the declaration of the descriptors. No functional change intended. Reviewed-by: David Matlack <dmatlack@google.com> Reviewed-by: Ricardo Koller <ricarkol@google.com> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Jing Zhang <jingzhangos@google.com> Message-Id: <20210618222709.1858088-2-jingzhangos@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-23x86/fpu: Return proper error codes from user access functionsThomas Gleixner1-7/+12
When *RSTOR from user memory raises an exception, there is no way to differentiate them. That's bad because it forces the slow path even when the failure was not a fault. If the operation raised eg. #GP then going through the slow path is pointless. Use _ASM_EXTABLE_FAULT() which stores the trap number and let the exception fixup return the negated trap number as error. This allows to separate the fast path and let it handle faults directly and avoid the slow path for all other exceptions. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121457.601480369@linutronix.de
2021-06-23x86/pkru: Remove xstate fiddling from write_pkru()Thomas Gleixner2-26/+5
The PKRU value of a task is stored in task->thread.pkru when the task is scheduled out. PKRU is restored on schedule in from there. So keeping the XSAVE buffer up to date is a pointless exercise. Remove the xstate fiddling and cleanup all related functions. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121456.897372712@linutronix.de
2021-06-23x86/fpu: Remove PKRU handling from switch_fpu_finish()Thomas Gleixner1-30/+4
PKRU is already updated and the xstate is not longer the proper source of information. [ bp: Use cpu_feature_enabled() ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121456.708180184@linutronix.de
2021-06-23x86/fpu: Mask PKRU from kernel XRSTOR[S] operationsThomas Gleixner2-2/+12
As the PKRU state is managed separately restoring it from the xstate buffer would be counterproductive as it might either restore a stale value or reinit the PKRU state to 0. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121456.606745195@linutronix.de
2021-06-23x86/fpu: Hook up PKRU into ptrace()Dave Hansen1-1/+1
One nice thing about having PKRU be XSAVE-managed is that it gets naturally exposed into the XSAVE-using ABIs. Now that XSAVE will not be used to manage PKRU, these ABIs need to be manually enabled to deal with PKRU. ptrace() uses copy_uabi_xstate_to_kernel() to collect the tracee's XSTATE. As PKRU is not in the task's XSTATE buffer, use task->thread.pkru for filling in up the ptrace buffer. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121456.508770763@linutronix.de
2021-06-23x86/fpu: Add PKRU storage outside of task XSAVE bufferDave Hansen1-0/+9
PKRU is currently partly XSAVE-managed and partly not. It has space in the task XSAVE buffer and is context-switched by XSAVE/XRSTOR. However, it is switched more eagerly than FPU because there may be a need for PKRU to be up-to-date for things like copy_to/from_user() since PKRU affects user-permission memory accesses, not just accesses from userspace itself. This leaves PKRU in a very odd position. XSAVE brings very little value to the table for how Linux uses PKRU except for signal related XSTATE handling. Prepare to move PKRU away from being XSAVE-managed. Allocate space in the thread_struct for it and save/restore it in the context-switch path separately from the XSAVE-managed features. task->thread_struct.pkru is only valid when the task is scheduled out. For the current task the authoritative source is the hardware, i.e. it has to be retrieved via rdpkru(). Leave the XSAVE code in place for now to ensure bisectability. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121456.399107624@linutronix.de
2021-06-23x86/fpu: Dont restore PKRU in fpregs_restore_userspace()Thomas Gleixner2-1/+34
switch_to() and flush_thread() write the task's PKRU value eagerly so the PKRU value of current is always valid in the hardware. That means there is no point in restoring PKRU on exit to user or when reactivating the task's FPU registers in the signal frame setup path. This allows to remove all the xstate buffer updates with PKRU values once the PKRU state is stored in thread struct while a task is scheduled out. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121456.303919033@linutronix.de
2021-06-23x86/fpu: Rename xfeatures_mask_user() to xfeatures_mask_uabi()Thomas Gleixner2-2/+11
Rename it so it's clear that this is about user ABI features which can differ from the feature set which the kernel saves and restores because the kernel handles e.g. PKRU differently. But the user ABI (ptrace, signal frame) expects it to be there. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121456.211585137@linutronix.de
2021-06-23x86/fpu: Move FXSAVE_LEAK quirk info __copy_kernel_to_fpregs()Thomas Gleixner1-24/+1
copy_kernel_to_fpregs() restores all xfeatures but it is also the place where the AMD FXSAVE_LEAK bug is handled. That prevents fpregs_restore_userregs() to limit the restored features, which is required to untangle PKRU and XSTATE handling and also for the upcoming supervisor state management. Move the FXSAVE_LEAK quirk into __copy_kernel_to_fpregs() and deinline that function which has become rather fat. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121456.114271278@linutronix.de
2021-06-23x86/fpu: Rename __fpregs_load_activate() to fpregs_restore_userregs()Thomas Gleixner1-4/+2
Rename it so that it becomes entirely clear what this function is about. It's purpose is to restore the FPU registers to the state which was saved in the task's FPU memory state either at context switch or by an in kernel FPU user. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121456.018867925@linutronix.de
2021-06-23x86/fpu: Rename fpu__clear_all() to fpu_flush_thread()Thomas Gleixner1-1/+2
Make it clear what the function is about. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121455.827979263@linutronix.de
2021-06-23x86/fpu: Use pkru_write_default() in copy_init_fpstate_to_fpregs()Thomas Gleixner1-1/+0
There is no point in using copy_init_pkru_to_fpregs() which in turn calls write_pkru(). write_pkru() tries to fiddle with the task's xstate buffer for nothing because the XRSTOR[S](init_fpstate) just cleared the xfeature flag in the xstate header which makes get_xsave_addr() fail. It's a useless exercise anyway because the reinitialization activates the FPU so before the task's xstate buffer can be used again a XRSTOR[S] must happen which in turn dumps the PKRU value. Get rid of the now unused copy_init_pkru_to_fpregs(). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121455.732508792@linutronix.de
2021-06-23x86/pkru: Provide pkru_write_default()Thomas Gleixner1-0/+8
Provide a simple and trivial helper which just writes the PKRU default value without trying to fiddle with the task's xsave buffer. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121455.513729794@linutronix.de
2021-06-23x86/pkru: Provide pkru_get_init_value()Thomas Gleixner1-0/+2
When CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS is disabled then the following code fails to compile: if (cpu_feature_enabled(X86_FEATURE_OSPKE)) { u32 pkru = READ_ONCE(init_pkru_value); .. } because init_pkru_value is defined as '0' which makes READ_ONCE() upset. Provide an accessor macro to avoid #ifdeffery all over the place. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121455.404880646@linutronix.de
2021-06-23x86/cpu: Sanitize X86_FEATURE_OSPKEThomas Gleixner2-6/+6
X86_FEATURE_OSPKE is enabled first on the boot CPU and the feature flag is set. Secondary CPUs have to enable CR4.PKE as well and set their per CPU feature flag. That's ineffective because all call sites have checks for boot_cpu_data. Make it smarter and force the feature flag when PKU is enabled on the boot cpu which allows then to use cpu_feature_enabled(X86_FEATURE_OSPKE) all over the place. That either compiles the code out when PKEY support is disabled in Kconfig or uses a static_cpu_has() for the feature check which makes a significant difference in hotpaths, e.g. context switch. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121455.305113644@linutronix.de
2021-06-23x86/fpu: Rename and sanitize fpu__save/copy()Thomas Gleixner1-2/+4
Both function names are a misnomer. fpu__save() is actually about synchronizing the hardware register state into the task's memory state so that either coredump or a math exception handler can inspect the state at the time where the problem happens. The function guarantees to preserve the register state, while "save" is a common terminology for saving the current state so it can be modified and restored later. This is clearly not the case here. Rename it to fpu_sync_fpstate(). fpu__copy() is used to clone the current task's FPU state when duplicating task_struct. While the register state is a copy the rest of the FPU state is not. Name it accordingly and remove the really pointless @src argument along with the warning which comes along with it. Nothing can ever copy the FPU state of a non-current task. It's clearly just a consequence of arch_dup_task_struct(), but it makes no sense to proliferate that further. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121455.196727450@linutronix.de
2021-06-23x86/pkeys: Move read_pkru() and write_pkru()Dave Hansen3-56/+63
write_pkru() was originally used just to write to the PKRU register. It was mercifully short and sweet and was not out of place in pgtable.h with some other pkey-related code. But, later work included a requirement to also modify the task XSAVE buffer when updating the register. This really is more related to the XSAVE architecture than to paging. Move the read/write_pkru() to asm/pkru.h. pgtable.h won't miss them. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121455.102647114@linutronix.de
2021-06-23x86/fpu/xstate: Sanitize handling of independent featuresThomas Gleixner1-2/+3
The copy functions for the independent features are horribly named and the supervisor and independent part is just overengineered. The point is that the supplied mask has either to be a subset of the independent features or a subset of the task->fpu.xstate managed features. Rewrite it so it checks for invalid overlaps of these areas in the caller supplied feature mask. Rename it so it follows the new naming convention for these operations. Mop up the function documentation. This allows to use that function for other purposes as well. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lkml.kernel.org/r/20210623121455.004880675@linutronix.de
2021-06-23x86/fpu: Rename "dynamic" XSTATEs to "independent"Andy Lutomirski1-11/+11
The salient feature of "dynamic" XSTATEs is that they are not part of the main task XSTATE buffer. The fact that they are dynamically allocated is irrelevant and will become quite confusing when user math XSTATEs start being dynamically allocated. Rename them to "independent" because they are independent of the main XSTATE code. This is just a search-and-replace with some whitespace updates to keep things aligned. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/1eecb0e4f3e07828ebe5d737ec77dc3b708fad2d.1623388344.git.luto@kernel.org Link: https://lkml.kernel.org/r/20210623121454.911450390@linutronix.de
2021-06-23x86/fpu: Rename copy_kernel_to_fpregs() to restore_fpregs_from_fpstate()Thomas Gleixner1-4/+4
This is not a copy functionality. It restores the register state from the supplied kernel buffer. No functional changes. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121454.716058365@linutronix.de
2021-06-23x86/fpu: Get rid of the FNSAVE optimizationThomas Gleixner1-7/+11
The FNSAVE support requires conditionals in quite some call paths because FNSAVE reinitializes the FPU hardware. If the save has to preserve the FPU register state then the caller has to conditionally restore it from memory when FNSAVE is in use. This also requires a conditional in context switch because the restore avoidance optimization cannot work with FNSAVE. As this only affects 20+ years old CPUs there is really no reason to keep this optimization effective for FNSAVE. It's about time to not optimize for antiques anymore. Just unconditionally FRSTOR the save content to the registers and clean up the conditionals all over the place. Suggested-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121454.617369268@linutronix.de
2021-06-23x86/fpu: Rename copy_fpregs_to_fpstate() to save_fpregs_to_fpstate()Thomas Gleixner1-2/+2
A copy is guaranteed to leave the source intact, which is not the case when FNSAVE is used as that reinitilizes the registers. Save does not make such guarantees and it matches what this is about, i.e. to save the state for a later restore. Rename it to save_fpregs_to_fpstate(). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121454.508853062@linutronix.de
2021-06-23x86/fpu: Rename xstate copy functions which are related to UABIThomas Gleixner1-2/+2
Rename them to reflect that these functions deal with user space format XSAVE buffers. copy_kernel_to_xstate() -> copy_uabi_from_kernel_to_xstate() copy_user_to_xstate() -> copy_sigframe_from_user_to_xstate() Again a clear statement that these functions deal with user space ABI. Suggested-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121454.318485015@linutronix.de
2021-06-23x86/fpu: Rename fregs-related copy functionsThomas Gleixner1-5/+5
The function names for fnsave/fnrstor operations are horribly named and a permanent source of confusion. Rename: copy_kernel_to_fregs() to frstor() copy_fregs_to_user() to fnsave_to_user_sigframe() copy_user_to_fregs() to frstor_from_user_sigframe() so it's clear what these are doing. All these functions are really low level wrappers around the equally named instructions, so mapping to the documentation is just natural. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121454.223594101@linutronix.de
2021-06-23x86/fpu: Rename fxregs-related copy functionsThomas Gleixner1-13/+5
The function names for fxsave/fxrstor operations are horribly named and a permanent source of confusion. Rename: copy_fxregs_to_kernel() to fxsave() copy_kernel_to_fxregs() to fxrstor() copy_fxregs_to_user() to fxsave_to_user_sigframe() copy_user_to_fxregs() to fxrstor_from_user_sigframe() so it's clear what these are doing. All these functions are really low level wrappers around the equally named instructions, so mapping to the documentation is just natural. While at it, replace the static_cpu_has(X86_FEATURE_FXSR) with use_fxsr() to be consistent with the rest of the code. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121454.017863494@linutronix.de
2021-06-23x86/fpu: Rename copy_user_to_xregs() and copy_xregs_to_user()Thomas Gleixner1-2/+2
The function names for xsave[s]/xrstor[s] operations are horribly named and a permanent source of confusion. Rename: copy_xregs_to_user() to xsave_to_user_sigframe() copy_user_to_xregs() to xrstor_from_user_sigframe() so it's entirely clear what this is about. This is also a clear indicator of the potentially different storage format because this is user ABI and cannot use compacted format. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121453.924266705@linutronix.de
2021-06-23x86/fpu: Rename copy_xregs_to_kernel() and copy_kernel_to_xregs()Thomas Gleixner1-6/+11
The function names for xsave[s]/xrstor[s] operations are horribly named and a permanent source of confusion. Rename: copy_xregs_to_kernel() to os_xsave() copy_kernel_to_xregs() to os_xrstor() These are truly low level wrappers around the actual instructions XSAVE[OPT]/XRSTOR and XSAVES/XRSTORS with the twist that the selection based on the available CPU features happens with an alternative to avoid conditionals all over the place and to provide the best performance for hot paths. The os_ prefix tells that this is the OS selected mechanism. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121453.830239347@linutronix.de
2021-06-23x86/fpu: Get rid of copy_supervisor_to_kernel()Thomas Gleixner1-1/+0
If the fast path of restoring the FPU state on sigreturn fails or is not taken and the current task's FPU is active then the FPU has to be deactivated for the slow path to allow a safe update of the tasks FPU memory state. With supervisor states enabled, this requires to save the supervisor state in the memory state first. Supervisor states require XSAVES so saving only the supervisor state requires to reshuffle the memory buffer because XSAVES uses the compacted format and therefore stores the supervisor states at the beginning of the memory state. That's just an overengineered optimization. Get rid of it and save the full state for this case. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121453.734561971@linutronix.de
2021-06-23x86/fpu: Get rid of using_compacted_format()Thomas Gleixner1-1/+0
This function is pointlessly global and a complete misnomer because it's usage is related to both supervisor state checks and compacted format checks. Remove it and just make the conditions check the XSAVES feature. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121453.425493349@linutronix.de
2021-06-23x86/fpu: Move fpu__write_begin() to regsetThomas Gleixner1-1/+0
The only usecase for fpu__write_begin is the set() callback of regset, so the function is pointlessly global. Move it to the regset code and rename it to fpu_force_restore() which is exactly decribing what the function does. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121453.328652975@linutronix.de
2021-06-23x86/fpu/regset: Move fpu__read_begin() into regsetThomas Gleixner1-1/+0
The function can only be used from the regset get() callbacks safely. So there is no reason to have it globally exposed. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121453.234942936@linutronix.de
2021-06-23x86/fpu: Remove fpstate_sanitize_xstate()Thomas Gleixner1-2/+0
No more users. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121453.124819167@linutronix.de
2021-06-23x86/fpu: Make copy_xstate_to_kernel() usable for [x]fpregs_get()Thomas Gleixner1-2/+10
When xsave with init state optimization is used then a component's state in the task's xsave buffer can be stale when the corresponding feature bit is not set. fpregs_get() and xfpregs_get() invoke fpstate_sanitize_xstate() to update the task's xsave buffer before retrieving the FX or FP state. That's just duplicated code as copy_xstate_to_kernel() already handles this correctly. Add a copy mode argument to the function which allows to restrict the state copy to the FP and SSE features. Also rename the function to copy_xstate_to_uabi_buf() so the name reflects what it is doing. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121452.805327286@linutronix.de
2021-06-23x86/fpu: Sanitize xstateregs_set()Thomas Gleixner1-4/+0
xstateregs_set() operates on a stopped task and tries to copy the provided buffer into the task's fpu.state.xsave buffer. Any error while copying or invalid state detected after copying results in wiping the target task's FPU state completely including supervisor states. That's just wrong. The caller supplied invalid data or has a problem with unmapped memory, so there is absolutely no justification to corrupt the target state. Fix this with the following modifications: 1) If data has to be copied from userspace, allocate a buffer and copy from user first. 2) Use copy_kernel_to_xstate() unconditionally so that header checking works correctly. 3) Return on error without corrupting the target state. This prevents corrupting states and lets the caller deal with the problem it caused in the first place. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121452.214903673@linutronix.de
2021-06-23x86/fpu: Move inlines where they belongThomas Gleixner1-14/+0
They are only used in fpstate_init() and there is no point to have them in a header just to make reading the code harder. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121452.023118522@linutronix.de
2021-06-23x86/fpu: Remove unused get_xsave_field_ptr()Thomas Gleixner1-1/+0
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121451.915614415@linutronix.de
2021-06-23x86/fpu: Get rid of fpu__get_supported_xfeatures_mask()Thomas Gleixner1-1/+0
This function is really not doing what the comment advertises: "Find supported xfeatures based on cpu features and command-line input. This must be called after fpu__init_parse_early_param() is called and xfeatures_mask is enumerated." fpu__init_parse_early_param() does not exist anymore and the function just returns a constant. Remove it and fix the caller and get rid of further references to fpu__init_parse_early_param(). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121451.816404717@linutronix.de
2021-06-23Merge x86/urgent into x86/fpuBorislav Petkov7-43/+27
Pick up dependent changes which either went mainline (x86/urgent is based on -rc7 and that contains them) as urgent fixes and the current x86/urgent branch which contains two more urgent fixes, so that the bigger FPU rework can base off ontop. Signed-off-by: Borislav Petkov <bp@suse.de>
2021-06-23Merge branch 'topic/ppc-kvm' of ↵Paolo Bonzini5-29/+73
https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux into HEAD - Support for the H_RPT_INVALIDATE hypercall - Conversion of Book3S entry/exit to C - Bug fixes
2021-06-23x86/sev: Add defines for GHCB version 2 MSR protocol requestsBrijesh Singh1-1/+15
Add the necessary defines for supporting the GHCB version 2 protocol. This includes defines for: - MSR-based AP hlt request/response - Hypervisor Feature request/response This is the bare minimum of requests that need to be supported by a GHCB version 2 implementation. There are more requests in the specification, but those depend on Secure Nested Paging support being available. These defines are shared between SEV host and guest support. [ bp: Fold in https://lkml.kernel.org/r/20210622144825.27588-2-joro@8bytes.org too. Simplify the brewing macro maze into readability. ] Co-developed-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/YNLXQIZ5e1wjkshG@8bytes.org
2021-06-22x86: Always inline task_size_max()Peter Zijlstra1-1/+1
Fix: vmlinux.o: warning: objtool: handle_bug()+0x10: call to task_size_max() leaves .noinstr.text section When #UD isn't a BUG, we shouldn't violate noinstr (we'll still probably die, but that's another story). Fixes: 025768a966a3 ("x86/cpu: Use alternative to generate the TASK_SIZE_MAX constant") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210621120120.682468274@infradead.org
2021-06-22x86/fpu: Make init_fpstate correct with optimized XSAVEThomas Gleixner1-22/+8
The XSAVE init code initializes all enabled and supported components with XRSTOR(S) to init state. Then it XSAVEs the state of the components back into init_fpstate which is used in several places to fill in the init state of components. This works correctly with XSAVE, but not with XSAVEOPT and XSAVES because those use the init optimization and skip writing state of components which are in init state. So init_fpstate.xsave still contains all zeroes after this operation. There are two ways to solve that: 1) Use XSAVE unconditionally, but that requires to reshuffle the buffer when XSAVES is enabled because XSAVES uses compacted format. 2) Save the components which are known to have a non-zero init state by other means. Looking deeper, #2 is the right thing to do because all components the kernel supports have all-zeroes init state except the legacy features (FP, SSE). Those cannot be hard coded because the states are not identical on all CPUs, but they can be saved with FXSAVE which avoids all conditionals. Use FXSAVE to save the legacy FP/SSE components in init_fpstate along with a BUILD_BUG_ON() which reminds developers to validate that a newly added component has all zeroes init state. As a bonus remove the now unused copy_xregs_to_kernel_booting() crutch. The XSAVE and reshuffle method can still be implemented in the unlikely case that components are added which have a non-zero init state and no other means to save them. For now, FXSAVE is just simple and good enough. [ bp: Fix a typo or two in the text. ] Fixes: 6bad06b76892 ("x86, xsave: Use xsaveopt in context-switch path when supported") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20210618143444.587311343@linutronix.de
2021-06-21x86/sev: Split up runtime #VC handler for correct state trackingJoerg Roedel1-19/+10
Split up the #VC handler code into a from-user and a from-kernel part. This allows clean and correct state tracking, as the #VC handler needs to enter NMI-state when raised from kernel mode and plain IRQ state when raised from user-mode. Fixes: 62441a1fb532 ("x86/sev-es: Correctly track IRQ states in runtime #VC handler") Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210618115409.22735-3-joro@8bytes.org