summaryrefslogtreecommitdiff
path: root/arch/arm64/include/asm/fpsimd.h
AgeCommit message (Collapse)AuthorFilesLines
2018-03-28arm64: fpsimd: Split cpu field out from struct fpsimd_stateDave Martin1-27/+2
In preparation for using a common representation of the FPSIMD state for tasks and KVM vcpus, this patch separates out the "cpu" field that is used to track the cpu on which the state was most recently loaded. This will allow common code to operate on task and vcpu contexts without requiring the cpu field to be stored at the same offset from the FPSIMD register data in both cases. This should avoid the need for messing with the definition of those parts of struct vcpu_arch that are exposed in the KVM user ABI. The resulting change is also convenient for grouping and defining the set of thread_struct fields that are supposed to be accessible to copy_{to,from}_user(), which includes user_fpsimd_state but should exclude the cpu field. This patch does not amend the usercopy whitelist to match: that will be addressed in a subsequent patch. Signed-off-by: Dave Martin <Dave.Martin@arm.com> [will: inline fpsimd_flush_state for now] Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-03-27arm64: fpsimd: include <linux/init.h> in fpsimd.hWill Deacon1-0/+1
fpsimd.h uses the __init annotation, so pull in linux/init.h Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-03-26arm64: capabilities: Update prototype for enable call backDave Martin1-1/+3
We issue the enable() call back for all CPU hwcaps capabilities available on the system, on all the CPUs. So far we have ignored the argument passed to the call back, which had a prototype to accept a "void *" for use with on_each_cpu() and later with stop_machine(). However, with commit 0a0d111d40fd1 ("arm64: cpufeature: Pass capability structure to ->enable callback"), there are some users of the argument who wants the matching capability struct pointer where there are multiple matching criteria for a single capability. Clean up the declaration of the call back to make it clear. 1) Renamed to cpu_enable(), to imply taking necessary actions on the called CPU for the entry. 2) Pass const pointer to the capability, to allow the call back to check the entry. (e.,g to check if any action is needed on the CPU) 3) We don't care about the result of the call back, turning this to a void. Cc: Will Deacon <will.deacon@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Andre Przywara <andre.przywara@arm.com> Cc: James Morse <james.morse@arm.com> Acked-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Dave Martin <dave.martin@arm.com> [suzuki: convert more users, rename call back and drop results] Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-01-16arm64: fpsimd: Fix state leakage when migrating after sigreturnDave Martin1-1/+1
When refactoring the sigreturn code to handle SVE, I changed the sigreturn implementation to store the new FPSIMD state from the user sigframe into task_struct before reloading the state into the CPU regs. This makes it easier to convert the data for SVE when needed. However, it turns out that the fpsimd_state structure passed into fpsimd_update_current_state is not fully initialised, so assigning the structure as a whole corrupts current->thread.fpsimd_state.cpu with uninitialised data. This means that if the garbage data written to .cpu happens to be a valid cpu number, and the task is subsequently migrated to the cpu identified by the that number, and then tries to enter userspace, the CPU FPSIMD regs will be assumed to be correct for the task and not reloaded as they should be. This can result in returning to userspace with the FPSIMD registers containing data that is stale or that belongs to another task or to the kernel. Knowingly handing around a kernel structure that is incompletely initialised with user data is a potential source of mistakes, especially across source file boundaries. To help avoid a repeat of this issue, this patch adapts the relevant internal API to hand around the user-accessible subset only: struct user_fpsimd_state. To avoid future surprises, this patch also converts all uses of struct fpsimd_state that really only access the user subset, to use struct user_fpsimd_state. A few missing consts are added to function prototypes for good measure. Thanks to Will for spotting the cause of the bug here. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Dave Martin <Dave.Martin@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-11-03arm64/sve: KVM: Prevent guests from using SVEDave Martin1-0/+1
Until KVM has full SVE support, guests must not be allowed to execute SVE instructions. This patch enables the necessary traps, and also ensures that the traps are disabled again on exit from the guest so that the host can still use SVE if it wants to. On guest exit, high bits of the SVE Zn registers may have been clobbered as a side-effect the execution of FPSIMD instructions in the guest. The existing KVM host FPSIMD restore code is not sufficient to restore these bits, so this patch explicitly marks the CPU as not containing cached vector state for any task, thus forcing a reload on the next return to userspace. This is an interim measure, in advance of adding full SVE awareness to KVM. This marking of cached vector state in the CPU as invalid is done using __this_cpu_write(fpsimd_last_state, NULL) in fpsimd.c. Due to the repeated use of this rather obscure operation, it makes sense to factor it out as a separate helper with a clearer name. This patch factors it out as fpsimd_flush_cpu_state(), and ports all callers to use it. As a side effect of this refactoring, a this_cpu_write() in fpsimd_cpu_pm_notifier() is changed to __this_cpu_write(). This should be fine, since cpu_pm_enter() is supposed to be called only with interrupts disabled. Signed-off-by: Dave Martin <Dave.Martin@arm.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-11-03arm64/sve: Add prctl controls for userspace vector length managementDave Martin1-0/+14
This patch adds two arm64-specific prctls, to permit userspace to control its vector length: * PR_SVE_SET_VL: set the thread's SVE vector length and vector length inheritance mode. * PR_SVE_GET_VL: get the same information. Although these prctls resemble instruction set features in the SVE architecture, they provide additional control: the vector length inheritance mode is Linux-specific and nothing to do with the architecture, and the architecture does not permit EL0 to set its own vector length directly. Both can be used in portable tools without requiring the use of SVE instructions. Signed-off-by: Dave Martin <Dave.Martin@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Alex Bennée <alex.bennee@linaro.org> [will: Fixed up prctl constants to avoid clash with PDEATHSIG] Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-11-03arm64/sve: ptrace and ELF coredump supportDave Martin1-1/+11
This patch defines and implements a new regset NT_ARM_SVE, which describes a thread's SVE register state. This allows a debugger to manipulate the SVE state, as well as being included in ELF coredumps for post-mortem debugging. Because the regset size and layout are dependent on the thread's current vector length, it is not possible to define a C struct to describe the regset contents as is done for existing regsets. Instead, and for the same reasons, NT_ARM_SVE is based on the freeform variable-layout approach used for the SVE signal frame. Additionally, to reduce debug overhead when debugging threads that might or might not have live SVE register state, NT_ARM_SVE may be presented in one of two different formats: the old struct user_fpsimd_state format is embedded for describing the state of a thread with no live SVE state, whereas a new variable-layout structure is embedded for describing live SVE state. This avoids a debugger needing to poll NT_PRFPREG in addition to NT_ARM_SVE, and allows existing userspace code to handle the non-SVE case without too much modification. For this to work, NT_ARM_SVE is defined with a fixed-format header of type struct user_sve_header, which the recipient can use to figure out the content, size and layout of the reset of the regset. Accessor macros are defined to allow the vector-length-dependent parts of the regset to be manipulated. Signed-off-by: Alan Hayward <alan.hayward@arm.com> Signed-off-by: Dave Martin <Dave.Martin@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Alex Bennée <alex.bennee@linaro.org> Cc: Okamoto Takayuki <tokamoto@jp.fujitsu.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-11-03arm64/sve: Probe SVE capabilities and usable vector lengthsDave Martin1-0/+14
This patch uses the cpufeatures framework to determine common SVE capabilities and vector lengths, and configures the runtime SVE support code appropriately. ZCR_ELx is not really a feature register, but it is convenient to use it as a template for recording the maximum vector length supported by a CPU, using the LEN field. This field is similar to a feature field in that it is a contiguous bitfield for which we want to determine the minimum system-wide value. This patch adds ZCR as a pseudo-register in cpuinfo/cpufeatures, with appropriate custom code to populate it. Finding the minimum supported value of the LEN field is left to the cpufeatures framework in the usual way. The meaning of ID_AA64ZFR0_EL1 is not architecturally defined yet, so for now we just require it to be zero. Note that much of this code is dormant and SVE still won't be used yet, since system_supports_sve() remains hardwired to false. Signed-off-by: Dave Martin <Dave.Martin@arm.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-11-03arm64/sve: Backend logic for setting the vector lengthDave Martin1-0/+8
This patch implements the core logic for changing a task's vector length on request from userspace. This will be used by the ptrace and prctl frontends that are implemented in later patches. The SVE architecture permits, but does not require, implementations to support vector lengths that are not a power of two. To handle this, logic is added to check a requested vector length against a possibly sparse bitmap of available vector lengths at runtime, so that the best supported value can be chosen. Signed-off-by: Dave Martin <Dave.Martin@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-11-03arm64/sve: Signal handling supportDave Martin1-0/+1
This patch implements support for saving and restoring the SVE registers around signals. A fixed-size header struct sve_context is always included in the signal frame encoding the thread's vector length at the time of signal delivery, optionally followed by a variable-layout structure encoding the SVE registers. Because of the need to preserve backwards compatibility, the FPSIMD view of the SVE registers is always dumped as a struct fpsimd_context in the usual way, in addition to any sve_context. The SVE vector registers are dumped in full, including bits 127:0 of each register which alias the corresponding FPSIMD vector registers in the hardware. To avoid any ambiguity about which alias to restore during sigreturn, the kernel always restores bits 127:0 of each SVE vector register from the fpsimd_context in the signal frame (which must be present): userspace needs to take this into account if it wants to modify the SVE vector register contents on return from a signal. FPSR and FPCR, which are used by both FPSIMD and SVE, are not included in sve_context because they are always present in fpsimd_context anyway. For signal delivery, a new helper fpsimd_signal_preserve_current_state() is added to update _both_ the FPSIMD and SVE views in the task struct, to make it easier to populate this information into the signal frame. Because of the redundancy between the two views of the state, only one is updated otherwise. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Dave Martin <Dave.Martin@arm.com> Cc: Alex Bennée <alex.bennee@linaro.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-11-03arm64/sve: Core task context handlingDave Martin1-0/+16
This patch adds the core support for switching and managing the SVE architectural state of user tasks. Calls to the existing FPSIMD low-level save/restore functions are factored out as new functions task_fpsimd_{save,load}(), since SVE now dynamically may or may not need to be handled at these points depending on the kernel configuration, hardware features discovered at boot, and the runtime state of the task. To make these decisions as fast as possible, const cpucaps are used where feasible, via the system_supports_sve() helper. The SVE registers are only tracked for threads that have explicitly used SVE, indicated by the new thread flag TIF_SVE. Otherwise, the FPSIMD view of the architectural state is stored in thread.fpsimd_state as usual. When in use, the SVE registers are not stored directly in thread_struct due to their potentially large and variable size. Because the task_struct slab allocator must be configured very early during kernel boot, it is also tricky to configure it correctly to match the maximum vector length provided by the hardware, since this depends on examining secondary CPUs as well as the primary. Instead, a pointer sve_state in thread_struct points to a dynamically allocated buffer containing the SVE register data, and code is added to allocate and free this buffer at appropriate times. TIF_SVE is set when taking an SVE access trap from userspace, if suitable hardware support has been detected. This enables SVE for the thread: a subsequent return to userspace will disable the trap accordingly. If such a trap is taken without sufficient system- wide hardware support, SIGILL is sent to the thread instead as if an undefined instruction had been executed: this may happen if userspace tries to use SVE in a system where not all CPUs support it for example. The kernel will clear TIF_SVE and disable SVE for the thread whenever an explicit syscall is made by userspace. For backwards compatibility reasons and conformance with the spirit of the base AArch64 procedure call standard, the subset of the SVE register state that aliases the FPSIMD registers is still preserved across a syscall even if this happens. The remainder of the SVE register state logically becomes zero at syscall entry, though the actual zeroing work is currently deferred until the thread next tries to use SVE, causing another trap to the kernel. This implementation is suboptimal: in the future, the fastpath case may be optimised to zero the registers in-place and leave SVE enabled for the task, where beneficial. TIF_SVE is also cleared in the following slowpath cases, which are taken as reasonable hints that the task may no longer use SVE: * exec * fork and clone Code is added to sync data between thread.fpsimd_state and thread.sve_state whenever enabling/disabling SVE, in a manner consistent with the SVE architectural programmer's model. Signed-off-by: Dave Martin <Dave.Martin@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Alex Bennée <alex.bennee@linaro.org> [will: added #include to fix allnoconfig build] [will: use enable_daif in do_sve_acc] Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-11-03arm64/sve: Low-level SVE architectural state manipulation functionsDave Martin1-0/+5
Manipulating the SVE architectural state, including the vector and predicate registers, first-fault register and the vector length, requires the use of dedicated instructions added by SVE. This patch adds suitable assembly functions for saving and restoring the SVE registers and querying the vector length. Setting of the vector length is done as part of register restore. Since people building kernels may not all get an SVE-enabled toolchain for a while, this patch uses macros that generate explicit opcodes in place of assembler mnemonics. Signed-off-by: Dave Martin <Dave.Martin@arm.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-08-04arm64: neon: Remove support for nested or hardirq kernel-mode NEONDave Martin1-14/+0
Support for kernel-mode NEON to be nested and/or used in hardirq context adds significant complexity, and the benefits may be marginal. In practice, kernel-mode NEON is not used in hardirq context, and is rarely used in softirq context (by certain mac80211 drivers). This patch implements an arm64 may_use_simd() function to allow clients to check whether kernel-mode NEON is usable in the current context, and simplifies kernel_neon_{begin,end}() to handle only saving of the task FPSIMD state (if any). Without nesting, there is no other state to save. The partial fpsimd save/restore functions become redundant as a result of these changes, so they are removed too. The save/restore model is changed to operate directly on task_struct without additional percpu storage. This simplifies the code and saves a bit of memory, but means that softirqs must now be disabled when manipulating the task fpsimd state from task context: correspondingly, preempt_{en,dis}sable() calls are upgraded to local_bh_{en,dis}able() as appropriate. fpsimd_thread_switch() already runs with hardirqs disabled and so is already protected from softirqs. These changes should make it easier to support kernel-mode NEON in the presence of the Scalable Vector extension in the future. Signed-off-by: Dave Martin <Dave.Martin@arm.com> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-08-04arm64: neon: Allow EFI runtime services to use FPSIMD in irq contextDave Martin1-0/+4
In order to be able to cope with kernel-mode NEON being unavailable in hardirq/nmi context and non-nestable, we need special handling for EFI runtime service calls that may be made during an interrupt that interrupted a kernel_neon_begin()..._end() block. This will occur if the kernel tries to write diagnostic data to EFI persistent storage during a panic triggered by an NMI for example. EFI runtime services specify an ABI that clobbers the FPSIMD state, rather than being able to use it optionally as an accelerator. This means that EFI is really a special case and can be handled specially. To enable EFI calls from interrupts, this patch creates dedicated __efi_fpsimd_{begin,end}() helpers solely for this purpose, which save/restore to a separate percpu buffer if called in a context where kernel_neon_begin() is not usable. Signed-off-by: Dave Martin <Dave.Martin@arm.com> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-05-08arm64: add support for kernel mode NEON in interrupt contextArd Biesheuvel1-0/+15
This patch modifies kernel_neon_begin() and kernel_neon_end(), so they may be called from any context. To address the case where only a couple of registers are needed, kernel_neon_begin_partial(u32) is introduced which takes as a parameter the number of bottom 'n' NEON q-registers required. To mark the end of such a partial section, the regular kernel_neon_end() should be used. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
2014-05-08arm64: defer reloading a task's FPSIMD state to userland resumeArd Biesheuvel1-0/+5
If a task gets scheduled out and back in again and nothing has touched its FPSIMD state in the mean time, there is really no reason to reload it from memory. Similarly, repeated calls to kernel_neon_begin() and kernel_neon_end() will preserve and restore the FPSIMD state every time. This patch defers the FPSIMD state restore to the last possible moment, i.e., right before the task returns to userland. If a task does not return to userland at all (for any reason), the existing FPSIMD state is preserved and may be reused by the owning task if it gets scheduled in again on the same CPU. This patch adds two more functions to abstract away from straight FPSIMD register file saves and restores: - fpsimd_restore_current_state -> ensure current's FPSIMD state is loaded - fpsimd_flush_task_state -> invalidate live copies of a task's FPSIMD state Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
2014-05-08arm64: add abstractions for FPSIMD state manipulationArd Biesheuvel1-0/+3
There are two tacit assumptions in the FPSIMD handling code that will no longer hold after the next patch that optimizes away some FPSIMD state restores: . the FPSIMD registers of this CPU contain the userland FPSIMD state of task 'current'; . when switching to a task, its FPSIMD state will always be restored from memory. This patch adds the following functions to abstract away from straight FPSIMD register file saves and restores: - fpsimd_preserve_current_state -> ensure current's FPSIMD state is saved - fpsimd_update_current_state -> replace current's FPSIMD state Where necessary, the signal handling and fork code are updated to use the above wrappers instead of poking into the FPSIMD registers directly. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
2012-11-08arm64: elf: fix core dumping definitions for GP and FP registersWill Deacon1-3/+2
struct user_fp does not exist for arm64, so use struct user_fpsimd_state instead for the ELF core dumping definitions. Furthermore, since we use regset-based core dumping, we do not need definitions for dump_task_regs and dump_fpu. Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2012-09-17arm64: Floating point and SIMDCatalin Marinas1-0/+64
This patch adds support for FP/ASIMD register bank saving and restoring during context switch and FP exception handling to generate SIGFPE. There are 32 128-bit registers and the context switching is currently done non-lazily. Benchmarks on real hardware are required before implementing lazy FP state saving/restoring. Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Tony Lindgren <tony@atomide.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Nicolas Pitre <nico@linaro.org> Acked-by: Olof Johansson <olof@lixom.net> Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>