diff options
Diffstat (limited to 'Documentation/arm64/sme.rst')
-rw-r--r-- | Documentation/arm64/sme.rst | 468 |
1 files changed, 0 insertions, 468 deletions
diff --git a/Documentation/arm64/sme.rst b/Documentation/arm64/sme.rst deleted file mode 100644 index 1c43ea12eb4f..000000000000 --- a/Documentation/arm64/sme.rst +++ /dev/null @@ -1,468 +0,0 @@ -=================================================== -Scalable Matrix Extension support for AArch64 Linux -=================================================== - -This document outlines briefly the interface provided to userspace by Linux in -order to support use of the ARM Scalable Matrix Extension (SME). - -This is an outline of the most important features and issues only and not -intended to be exhaustive. It should be read in conjunction with the SVE -documentation in sve.rst which provides details on the Streaming SVE mode -included in SME. - -This document does not aim to describe the SME architecture or programmer's -model. To aid understanding, a minimal description of relevant programmer's -model features for SME is included in Appendix A. - - -1. General ------------ - -* PSTATE.SM, PSTATE.ZA, the streaming mode vector length, the ZA and (when - present) ZTn register state and TPIDR2_EL0 are tracked per thread. - -* The presence of SME is reported to userspace via HWCAP2_SME in the aux vector - AT_HWCAP2 entry. Presence of this flag implies the presence of the SME - instructions and registers, and the Linux-specific system interfaces - described in this document. SME is reported in /proc/cpuinfo as "sme". - -* The presence of SME2 is reported to userspace via HWCAP2_SME2 in the - aux vector AT_HWCAP2 entry. Presence of this flag implies the presence of - the SME2 instructions and ZT0, and the Linux-specific system interfaces - described in this document. SME2 is reported in /proc/cpuinfo as "sme2". - -* Support for the execution of SME instructions in userspace can also be - detected by reading the CPU ID register ID_AA64PFR1_EL1 using an MRS - instruction, and checking that the value of the SME field is nonzero. [3] - - It does not guarantee the presence of the system interfaces described in the - following sections: software that needs to verify that those interfaces are - present must check for HWCAP2_SME instead. - -* There are a number of optional SME features, presence of these is reported - through AT_HWCAP2 through: - - HWCAP2_SME_I16I64 - HWCAP2_SME_F64F64 - HWCAP2_SME_I8I32 - HWCAP2_SME_F16F32 - HWCAP2_SME_B16F32 - HWCAP2_SME_F32F32 - HWCAP2_SME_FA64 - HWCAP2_SME2 - - This list may be extended over time as the SME architecture evolves. - - These extensions are also reported via the CPU ID register ID_AA64SMFR0_EL1, - which userspace can read using an MRS instruction. See elf_hwcaps.txt and - cpu-feature-registers.txt for details. - -* Debuggers should restrict themselves to interacting with the target via the - NT_ARM_SVE, NT_ARM_SSVE, NT_ARM_ZA and NT_ARM_ZT regsets. The recommended - way of detecting support for these regsets is to connect to a target process - first and then attempt a - - ptrace(PTRACE_GETREGSET, pid, NT_ARM_<regset>, &iov). - -* Whenever ZA register values are exchanged in memory between userspace and - the kernel, the register value is encoded in memory as a series of horizontal - vectors from 0 to VL/8-1 stored in the same endianness invariant format as is - used for SVE vectors. - -* On thread creation TPIDR2_EL0 is preserved unless CLONE_SETTLS is specified, - in which case it is set to 0. - -2. Vector lengths ------------------- - -SME defines a second vector length similar to the SVE vector length which is -controls the size of the streaming mode SVE vectors and the ZA matrix array. -The ZA matrix is square with each side having as many bytes as a streaming -mode SVE vector. - - -3. Sharing of streaming and non-streaming mode SVE state ---------------------------------------------------------- - -It is implementation defined which if any parts of the SVE state are shared -between streaming and non-streaming modes. When switching between modes -via software interfaces such as ptrace if no register content is provided as -part of switching no state will be assumed to be shared and everything will -be zeroed. - - -4. System call behaviour -------------------------- - -* On syscall PSTATE.ZA is preserved, if PSTATE.ZA==1 then the contents of the - ZA matrix and ZTn (if present) are preserved. - -* On syscall PSTATE.SM will be cleared and the SVE registers will be handled - as per the standard SVE ABI. - -* None of the SVE registers, ZA or ZTn are used to pass arguments to - or receive results from any syscall. - -* On process creation (eg, clone()) the newly created process will have - PSTATE.SM cleared. - -* All other SME state of a thread, including the currently configured vector - length, the state of the PR_SME_VL_INHERIT flag, and the deferred vector - length (if any), is preserved across all syscalls, subject to the specific - exceptions for execve() described in section 6. - - -5. Signal handling -------------------- - -* Signal handlers are invoked with streaming mode and ZA disabled. - -* A new signal frame record TPIDR2_MAGIC is added formatted as a struct - tpidr2_context to allow access to TPIDR2_EL0 from signal handlers. - -* A new signal frame record za_context encodes the ZA register contents on - signal delivery. [1] - -* The signal frame record for ZA always contains basic metadata, in particular - the thread's vector length (in za_context.vl). - -* The ZA matrix may or may not be included in the record, depending on - the value of PSTATE.ZA. The registers are present if and only if: - za_context.head.size >= ZA_SIG_CONTEXT_SIZE(sve_vq_from_vl(za_context.vl)) - in which case PSTATE.ZA == 1. - -* If matrix data is present, the remainder of the record has a vl-dependent - size and layout. Macros ZA_SIG_* are defined [1] to facilitate access to - them. - -* The matrix is stored as a series of horizontal vectors in the same format as - is used for SVE vectors. - -* If the ZA context is too big to fit in sigcontext.__reserved[], then extra - space is allocated on the stack, an extra_context record is written in - __reserved[] referencing this space. za_context is then written in the - extra space. Refer to [1] for further details about this mechanism. - -* If ZTn is supported and PSTATE.ZA==1 then a signal frame record for ZTn will - be generated. - -* The signal record for ZTn has magic ZT_MAGIC (0x5a544e01) and consists of a - standard signal frame header followed by a struct zt_context specifying - the number of ZTn registers supported by the system, then zt_context.nregs - blocks of 64 bytes of data per register. - - -5. Signal return ------------------ - -When returning from a signal handler: - -* If there is no za_context record in the signal frame, or if the record is - present but contains no register data as described in the previous section, - then ZA is disabled. - -* If za_context is present in the signal frame and contains matrix data then - PSTATE.ZA is set to 1 and ZA is populated with the specified data. - -* The vector length cannot be changed via signal return. If za_context.vl in - the signal frame does not match the current vector length, the signal return - attempt is treated as illegal, resulting in a forced SIGSEGV. - -* If ZTn is not supported or PSTATE.ZA==0 then it is illegal to have a - signal frame record for ZTn, resulting in a forced SIGSEGV. - - -6. prctl extensions --------------------- - -Some new prctl() calls are added to allow programs to manage the SME vector -length: - -prctl(PR_SME_SET_VL, unsigned long arg) - - Sets the vector length of the calling thread and related flags, where - arg == vl | flags. Other threads of the calling process are unaffected. - - vl is the desired vector length, where sve_vl_valid(vl) must be true. - - flags: - - PR_SME_VL_INHERIT - - Inherit the current vector length across execve(). Otherwise, the - vector length is reset to the system default at execve(). (See - Section 9.) - - PR_SME_SET_VL_ONEXEC - - Defer the requested vector length change until the next execve() - performed by this thread. - - The effect is equivalent to implicit execution of the following - call immediately after the next execve() (if any) by the thread: - - prctl(PR_SME_SET_VL, arg & ~PR_SME_SET_VL_ONEXEC) - - This allows launching of a new program with a different vector - length, while avoiding runtime side effects in the caller. - - Without PR_SME_SET_VL_ONEXEC, the requested change takes effect - immediately. - - - Return value: a nonnegative on success, or a negative value on error: - EINVAL: SME not supported, invalid vector length requested, or - invalid flags. - - - On success: - - * Either the calling thread's vector length or the deferred vector length - to be applied at the next execve() by the thread (dependent on whether - PR_SME_SET_VL_ONEXEC is present in arg), is set to the largest value - supported by the system that is less than or equal to vl. If vl == - SVE_VL_MAX, the value set will be the largest value supported by the - system. - - * Any previously outstanding deferred vector length change in the calling - thread is cancelled. - - * The returned value describes the resulting configuration, encoded as for - PR_SME_GET_VL. The vector length reported in this value is the new - current vector length for this thread if PR_SME_SET_VL_ONEXEC was not - present in arg; otherwise, the reported vector length is the deferred - vector length that will be applied at the next execve() by the calling - thread. - - * Changing the vector length causes all of ZA, ZTn, P0..P15, FFR and all - bits of Z0..Z31 except for Z0 bits [127:0] .. Z31 bits [127:0] to become - unspecified, including both streaming and non-streaming SVE state. - Calling PR_SME_SET_VL with vl equal to the thread's current vector - length, or calling PR_SME_SET_VL with the PR_SVE_SET_VL_ONEXEC flag, - does not constitute a change to the vector length for this purpose. - - * Changing the vector length causes PSTATE.ZA and PSTATE.SM to be cleared. - Calling PR_SME_SET_VL with vl equal to the thread's current vector - length, or calling PR_SME_SET_VL with the PR_SVE_SET_VL_ONEXEC flag, - does not constitute a change to the vector length for this purpose. - - -prctl(PR_SME_GET_VL) - - Gets the vector length of the calling thread. - - The following flag may be OR-ed into the result: - - PR_SME_VL_INHERIT - - Vector length will be inherited across execve(). - - There is no way to determine whether there is an outstanding deferred - vector length change (which would only normally be the case between a - fork() or vfork() and the corresponding execve() in typical use). - - To extract the vector length from the result, bitwise and it with - PR_SME_VL_LEN_MASK. - - Return value: a nonnegative value on success, or a negative value on error: - EINVAL: SME not supported. - - -7. ptrace extensions ---------------------- - -* A new regset NT_ARM_SSVE is defined for access to streaming mode SVE - state via PTRACE_GETREGSET and PTRACE_SETREGSET, this is documented in - sve.rst. - -* A new regset NT_ARM_ZA is defined for ZA state for access to ZA state via - PTRACE_GETREGSET and PTRACE_SETREGSET. - - Refer to [2] for definitions. - -The regset data starts with struct user_za_header, containing: - - size - - Size of the complete regset, in bytes. - This depends on vl and possibly on other things in the future. - - If a call to PTRACE_GETREGSET requests less data than the value of - size, the caller can allocate a larger buffer and retry in order to - read the complete regset. - - max_size - - Maximum size in bytes that the regset can grow to for the target - thread. The regset won't grow bigger than this even if the target - thread changes its vector length etc. - - vl - - Target thread's current streaming vector length, in bytes. - - max_vl - - Maximum possible streaming vector length for the target thread. - - flags - - Zero or more of the following flags, which have the same - meaning and behaviour as the corresponding PR_SET_VL_* flags: - - SME_PT_VL_INHERIT - - SME_PT_VL_ONEXEC (SETREGSET only). - -* The effects of changing the vector length and/or flags are equivalent to - those documented for PR_SME_SET_VL. - - The caller must make a further GETREGSET call if it needs to know what VL is - actually set by SETREGSET, unless is it known in advance that the requested - VL is supported. - -* The size and layout of the payload depends on the header fields. The - SME_PT_ZA_*() macros are provided to facilitate access to the data. - -* In either case, for SETREGSET it is permissible to omit the payload, in which - case the vector length and flags are changed and PSTATE.ZA is set to 0 - (along with any consequences of those changes). If a payload is provided - then PSTATE.ZA will be set to 1. - -* For SETREGSET, if the requested VL is not supported, the effect will be the - same as if the payload were omitted, except that an EIO error is reported. - No attempt is made to translate the payload data to the correct layout - for the vector length actually set. It is up to the caller to translate the - payload layout for the actual VL and retry. - -* The effect of writing a partial, incomplete payload is unspecified. - -* A new regset NT_ARM_ZT is defined for access to ZTn state via - PTRACE_GETREGSET and PTRACE_SETREGSET. - -* The NT_ARM_ZT regset consists of a single 512 bit register. - -* When PSTATE.ZA==0 reads of NT_ARM_ZT will report all bits of ZTn as 0. - -* Writes to NT_ARM_ZT will set PSTATE.ZA to 1. - - -8. ELF coredump extensions ---------------------------- - -* NT_ARM_SSVE notes will be added to each coredump for - each thread of the dumped process. The contents will be equivalent to the - data that would have been read if a PTRACE_GETREGSET of the corresponding - type were executed for each thread when the coredump was generated. - -* A NT_ARM_ZA note will be added to each coredump for each thread of the - dumped process. The contents will be equivalent to the data that would have - been read if a PTRACE_GETREGSET of NT_ARM_ZA were executed for each thread - when the coredump was generated. - -* A NT_ARM_ZT note will be added to each coredump for each thread of the - dumped process. The contents will be equivalent to the data that would have - been read if a PTRACE_GETREGSET of NT_ARM_ZT were executed for each thread - when the coredump was generated. - -* The NT_ARM_TLS note will be extended to two registers, the second register - will contain TPIDR2_EL0 on systems that support SME and will be read as - zero with writes ignored otherwise. - -9. System runtime configuration --------------------------------- - -* To mitigate the ABI impact of expansion of the signal frame, a policy - mechanism is provided for administrators, distro maintainers and developers - to set the default vector length for userspace processes: - -/proc/sys/abi/sme_default_vector_length - - Writing the text representation of an integer to this file sets the system - default vector length to the specified value, unless the value is greater - than the maximum vector length supported by the system in which case the - default vector length is set to that maximum. - - The result can be determined by reopening the file and reading its - contents. - - At boot, the default vector length is initially set to 32 or the maximum - supported vector length, whichever is smaller and supported. This - determines the initial vector length of the init process (PID 1). - - Reading this file returns the current system default vector length. - -* At every execve() call, the new vector length of the new process is set to - the system default vector length, unless - - * PR_SME_VL_INHERIT (or equivalently SME_PT_VL_INHERIT) is set for the - calling thread, or - - * a deferred vector length change is pending, established via the - PR_SME_SET_VL_ONEXEC flag (or SME_PT_VL_ONEXEC). - -* Modifying the system default vector length does not affect the vector length - of any existing process or thread that does not make an execve() call. - - -Appendix A. SME programmer's model (informative) -================================================= - -This section provides a minimal description of the additions made by SME to the -ARMv8-A programmer's model that are relevant to this document. - -Note: This section is for information only and not intended to be complete or -to replace any architectural specification. - -A.1. Registers ---------------- - -In A64 state, SME adds the following: - -* A new mode, streaming mode, in which a subset of the normal FPSIMD and SVE - features are available. When supported EL0 software may enter and leave - streaming mode at any time. - - For best system performance it is strongly encouraged for software to enable - streaming mode only when it is actively being used. - -* A new vector length controlling the size of ZA and the Z registers when in - streaming mode, separately to the vector length used for SVE when not in - streaming mode. There is no requirement that either the currently selected - vector length or the set of vector lengths supported for the two modes in - a given system have any relationship. The streaming mode vector length - is referred to as SVL. - -* A new ZA matrix register. This is a square matrix of SVLxSVL bits. Most - operations on ZA require that streaming mode be enabled but ZA can be - enabled without streaming mode in order to load, save and retain data. - - For best system performance it is strongly encouraged for software to enable - ZA only when it is actively being used. - -* A new ZT0 register is introduced when SME2 is present. This is a 512 bit - register which is accessible when PSTATE.ZA is set, as ZA itself is. - -* Two new 1 bit fields in PSTATE which may be controlled via the SMSTART and - SMSTOP instructions or by access to the SVCR system register: - - * PSTATE.ZA, if this is 1 then the ZA matrix is accessible and has valid - data while if it is 0 then ZA can not be accessed. When PSTATE.ZA is - changed from 0 to 1 all bits in ZA are cleared. - - * PSTATE.SM, if this is 1 then the PE is in streaming mode. When the value - of PSTATE.SM is changed then it is implementation defined if the subset - of the floating point register bits valid in both modes may be retained. - Any other bits will be cleared. - - -References -========== - -[1] arch/arm64/include/uapi/asm/sigcontext.h - AArch64 Linux signal ABI definitions - -[2] arch/arm64/include/uapi/asm/ptrace.h - AArch64 Linux ptrace ABI definitions - -[3] Documentation/arm64/cpu-feature-registers.rst |