diff options
Diffstat (limited to 'Documentation/arch')
86 files changed, 12549 insertions, 2 deletions
diff --git a/Documentation/arch/arm/arm.rst b/Documentation/arch/arm/arm.rst new file mode 100644 index 000000000000..99d660fdf73f --- /dev/null +++ b/Documentation/arch/arm/arm.rst @@ -0,0 +1,212 @@ +======================= +ARM Linux 2.6 and upper +======================= + + Please check <ftp://ftp.arm.linux.org.uk/pub/armlinux> for + updates. + +Compilation of kernel +--------------------- + + In order to compile ARM Linux, you will need a compiler capable of + generating ARM ELF code with GNU extensions. GCC 3.3 is known to be + a good compiler. Fortunately, you needn't guess. The kernel will report + an error if your compiler is a recognized offender. + + To build ARM Linux natively, you shouldn't have to alter the ARCH = line + in the top level Makefile. However, if you don't have the ARM Linux ELF + tools installed as default, then you should change the CROSS_COMPILE + line as detailed below. + + If you wish to cross-compile, then alter the following lines in the top + level make file:: + + ARCH = <whatever> + + with:: + + ARCH = arm + + and:: + + CROSS_COMPILE= + + to:: + + CROSS_COMPILE=<your-path-to-your-compiler-without-gcc> + + eg.:: + + CROSS_COMPILE=arm-linux- + + Do a 'make config', followed by 'make Image' to build the kernel + (arch/arm/boot/Image). A compressed image can be built by doing a + 'make zImage' instead of 'make Image'. + + +Bug reports etc +--------------- + + Please send patches to the patch system. For more information, see + http://www.arm.linux.org.uk/developer/patches/info.php Always include some + explanation as to what the patch does and why it is needed. + + Bug reports should be sent to linux-arm-kernel@lists.arm.linux.org.uk, + or submitted through the web form at + http://www.arm.linux.org.uk/developer/ + + When sending bug reports, please ensure that they contain all relevant + information, eg. the kernel messages that were printed before/during + the problem, what you were doing, etc. + + +Include files +------------- + + Several new include directories have been created under include/asm-arm, + which are there to reduce the clutter in the top-level directory. These + directories, and their purpose is listed below: + + ============= ========================================================== + `arch-*` machine/platform specific header files + `hardware` driver-internal ARM specific data structures/definitions + `mach` descriptions of generic ARM to specific machine interfaces + `proc-*` processor dependent header files (currently only two + categories) + ============= ========================================================== + + +Machine/Platform support +------------------------ + + The ARM tree contains support for a lot of different machine types. To + continue supporting these differences, it has become necessary to split + machine-specific parts by directory. For this, the machine category is + used to select which directories and files get included (we will use + $(MACHINE) to refer to the category) + + To this end, we now have arch/arm/mach-$(MACHINE) directories which are + designed to house the non-driver files for a particular machine (eg, PCI, + memory management, architecture definitions etc). For all future + machines, there should be a corresponding arch/arm/mach-$(MACHINE)/include/mach + directory. + + +Modules +------- + + Although modularisation is supported (and required for the FP emulator), + each module on an ARM2/ARM250/ARM3 machine when is loaded will take + memory up to the next 32k boundary due to the size of the pages. + Therefore, is modularisation on these machines really worth it? + + However, ARM6 and up machines allow modules to take multiples of 4k, and + as such Acorn RiscPCs and other architectures using these processors can + make good use of modularisation. + + +ADFS Image files +---------------- + + You can access image files on your ADFS partitions by mounting the ADFS + partition, and then using the loopback device driver. You must have + losetup installed. + + Please note that the PCEmulator DOS partitions have a partition table at + the start, and as such, you will have to give '-o offset' to losetup. + + +Request to developers +--------------------- + + When writing device drivers which include a separate assembler file, please + include it in with the C file, and not the arch/arm/lib directory. This + allows the driver to be compiled as a loadable module without requiring + half the code to be compiled into the kernel image. + + In general, try to avoid using assembler unless it is really necessary. It + makes drivers far less easy to port to other hardware. + + +ST506 hard drives +----------------- + + The ST506 hard drive controllers seem to be working fine (if a little + slowly). At the moment they will only work off the controllers on an + A4x0's motherboard, but for it to work off a Podule just requires + someone with a podule to add the addresses for the IRQ mask and the + HDC base to the source. + + As of 31/3/96 it works with two drives (you should get the ADFS + `*configure` harddrive set to 2). I've got an internal 20MB and a great + big external 5.25" FH 64MB drive (who could ever want more :-) ). + + I've just got 240K/s off it (a dd with bs=128k); thats about half of what + RiscOS gets; but it's a heck of a lot better than the 50K/s I was getting + last week :-) + + Known bug: Drive data errors can cause a hang; including cases where + the controller has fixed the error using ECC. (Possibly ONLY + in that case...hmm). + + +1772 Floppy +----------- + This also seems to work OK, but hasn't been stressed much lately. It + hasn't got any code for disc change detection in there at the moment which + could be a bit of a problem! Suggestions on the correct way to do this + are welcome. + + +`CONFIG_MACH_` and `CONFIG_ARCH_` +--------------------------------- + A change was made in 2003 to the macro names for new machines. + Historically, `CONFIG_ARCH_` was used for the bonafide architecture, + e.g. SA1100, as well as implementations of the architecture, + e.g. Assabet. It was decided to change the implementation macros + to read `CONFIG_MACH_` for clarity. Moreover, a retroactive fixup has + not been made because it would complicate patching. + + Previous registrations may be found online. + + <http://www.arm.linux.org.uk/developer/machines/> + +Kernel entry (head.S) +--------------------- + The initial entry into the kernel is via head.S, which uses machine + independent code. The machine is selected by the value of 'r1' on + entry, which must be kept unique. + + Due to the large number of machines which the ARM port of Linux provides + for, we have a method to manage this which ensures that we don't end up + duplicating large amounts of code. + + We group machine (or platform) support code into machine classes. A + class typically based around one or more system on a chip devices, and + acts as a natural container around the actual implementations. These + classes are given directories - arch/arm/mach-<class> - which contain + the source files and include/mach/ to support the machine class. + + For example, the SA1100 class is based upon the SA1100 and SA1110 SoC + devices, and contains the code to support the way the on-board and off- + board devices are used, or the device is setup, and provides that + machine specific "personality." + + For platforms that support device tree (DT), the machine selection is + controlled at runtime by passing the device tree blob to the kernel. At + compile-time, support for the machine type must be selected. This allows for + a single multiplatform kernel build to be used for several machine types. + + For platforms that do not use device tree, this machine selection is + controlled by the machine type ID, which acts both as a run-time and a + compile-time code selection method. You can register a new machine via the + web site at: + + <http://www.arm.linux.org.uk/developer/machines/> + + Note: Please do not register a machine type for DT-only platforms. If your + platform is DT-only, you do not need a registered machine type. + +--- + +Russell King (15/03/2004) diff --git a/Documentation/arch/arm/booting.rst b/Documentation/arch/arm/booting.rst new file mode 100644 index 000000000000..5974e37b3d20 --- /dev/null +++ b/Documentation/arch/arm/booting.rst @@ -0,0 +1,237 @@ +================= +Booting ARM Linux +================= + +Author: Russell King + +Date : 18 May 2002 + +The following documentation is relevant to 2.4.18-rmk6 and beyond. + +In order to boot ARM Linux, you require a boot loader, which is a small +program that runs before the main kernel. The boot loader is expected +to initialise various devices, and eventually call the Linux kernel, +passing information to the kernel. + +Essentially, the boot loader should provide (as a minimum) the +following: + +1. Setup and initialise the RAM. +2. Initialise one serial port. +3. Detect the machine type. +4. Setup the kernel tagged list. +5. Load initramfs. +6. Call the kernel image. + + +1. Setup and initialise RAM +--------------------------- + +Existing boot loaders: + MANDATORY +New boot loaders: + MANDATORY + +The boot loader is expected to find and initialise all RAM that the +kernel will use for volatile data storage in the system. It performs +this in a machine dependent manner. (It may use internal algorithms +to automatically locate and size all RAM, or it may use knowledge of +the RAM in the machine, or any other method the boot loader designer +sees fit.) + + +2. Initialise one serial port +----------------------------- + +Existing boot loaders: + OPTIONAL, RECOMMENDED +New boot loaders: + OPTIONAL, RECOMMENDED + +The boot loader should initialise and enable one serial port on the +target. This allows the kernel serial driver to automatically detect +which serial port it should use for the kernel console (generally +used for debugging purposes, or communication with the target.) + +As an alternative, the boot loader can pass the relevant 'console=' +option to the kernel via the tagged lists specifying the port, and +serial format options as described in + + Documentation/admin-guide/kernel-parameters.rst. + + +3. Detect the machine type +-------------------------- + +Existing boot loaders: + OPTIONAL +New boot loaders: + MANDATORY except for DT-only platforms + +The boot loader should detect the machine type its running on by some +method. Whether this is a hard coded value or some algorithm that +looks at the connected hardware is beyond the scope of this document. +The boot loader must ultimately be able to provide a MACH_TYPE_xxx +value to the kernel. (see linux/arch/arm/tools/mach-types). This +should be passed to the kernel in register r1. + +For DT-only platforms, the machine type will be determined by device +tree. set the machine type to all ones (~0). This is not strictly +necessary, but assures that it will not match any existing types. + +4. Setup boot data +------------------ + +Existing boot loaders: + OPTIONAL, HIGHLY RECOMMENDED +New boot loaders: + MANDATORY + +The boot loader must provide either a tagged list or a dtb image for +passing configuration data to the kernel. The physical address of the +boot data is passed to the kernel in register r2. + +4a. Setup the kernel tagged list +-------------------------------- + +The boot loader must create and initialise the kernel tagged list. +A valid tagged list starts with ATAG_CORE and ends with ATAG_NONE. +The ATAG_CORE tag may or may not be empty. An empty ATAG_CORE tag +has the size field set to '2' (0x00000002). The ATAG_NONE must set +the size field to zero. + +Any number of tags can be placed in the list. It is undefined +whether a repeated tag appends to the information carried by the +previous tag, or whether it replaces the information in its +entirety; some tags behave as the former, others the latter. + +The boot loader must pass at a minimum the size and location of +the system memory, and root filesystem location. Therefore, the +minimum tagged list should look:: + + +-----------+ + base -> | ATAG_CORE | | + +-----------+ | + | ATAG_MEM | | increasing address + +-----------+ | + | ATAG_NONE | | + +-----------+ v + +The tagged list should be stored in system RAM. + +The tagged list must be placed in a region of memory where neither +the kernel decompressor nor initrd 'bootp' program will overwrite +it. The recommended placement is in the first 16KiB of RAM. + +4b. Setup the device tree +------------------------- + +The boot loader must load a device tree image (dtb) into system ram +at a 64bit aligned address and initialize it with the boot data. The +dtb format is documented at https://www.devicetree.org/specifications/. +The kernel will look for the dtb magic value of 0xd00dfeed at the dtb +physical address to determine if a dtb has been passed instead of a +tagged list. + +The boot loader must pass at a minimum the size and location of the +system memory, and the root filesystem location. The dtb must be +placed in a region of memory where the kernel decompressor will not +overwrite it, while remaining within the region which will be covered +by the kernel's low-memory mapping. + +A safe location is just above the 128MiB boundary from start of RAM. + +5. Load initramfs. +------------------ + +Existing boot loaders: + OPTIONAL +New boot loaders: + OPTIONAL + +If an initramfs is in use then, as with the dtb, it must be placed in +a region of memory where the kernel decompressor will not overwrite it +while also with the region which will be covered by the kernel's +low-memory mapping. + +A safe location is just above the device tree blob which itself will +be loaded just above the 128MiB boundary from the start of RAM as +recommended above. + +6. Calling the kernel image +--------------------------- + +Existing boot loaders: + MANDATORY +New boot loaders: + MANDATORY + +There are two options for calling the kernel zImage. If the zImage +is stored in flash, and is linked correctly to be run from flash, +then it is legal for the boot loader to call the zImage in flash +directly. + +The zImage may also be placed in system RAM and called there. The +kernel should be placed in the first 128MiB of RAM. It is recommended +that it is loaded above 32MiB in order to avoid the need to relocate +prior to decompression, which will make the boot process slightly +faster. + +When booting a raw (non-zImage) kernel the constraints are tighter. +In this case the kernel must be loaded at an offset into system equal +to TEXT_OFFSET - PAGE_OFFSET. + +In any case, the following conditions must be met: + +- Quiesce all DMA capable devices so that memory does not get + corrupted by bogus network packets or disk data. This will save + you many hours of debug. + +- CPU register settings + + - r0 = 0, + - r1 = machine type number discovered in (3) above. + - r2 = physical address of tagged list in system RAM, or + physical address of device tree block (dtb) in system RAM + +- CPU mode + + All forms of interrupts must be disabled (IRQs and FIQs) + + For CPUs which do not include the ARM virtualization extensions, the + CPU must be in SVC mode. (A special exception exists for Angel) + + CPUs which include support for the virtualization extensions can be + entered in HYP mode in order to enable the kernel to make full use of + these extensions. This is the recommended boot method for such CPUs, + unless the virtualisations are already in use by a pre-installed + hypervisor. + + If the kernel is not entered in HYP mode for any reason, it must be + entered in SVC mode. + +- Caches, MMUs + + The MMU must be off. + + Instruction cache may be on or off. + + Data cache must be off. + + If the kernel is entered in HYP mode, the above requirements apply to + the HYP mode configuration in addition to the ordinary PL1 (privileged + kernel modes) configuration. In addition, all traps into the + hypervisor must be disabled, and PL1 access must be granted for all + peripherals and CPU resources for which this is architecturally + possible. Except for entering in HYP mode, the system configuration + should be such that a kernel which does not include support for the + virtualization extensions can boot correctly without extra help. + +- The boot loader is expected to call the kernel image by jumping + directly to the first instruction of the kernel image. + + On CPUs supporting the ARM instruction set, the entry must be + made in ARM state, even for a Thumb-2 kernel. + + On CPUs supporting only the Thumb instruction set such as + Cortex-M class CPUs, the entry must be made in Thumb state. diff --git a/Documentation/arch/arm/cluster-pm-race-avoidance.rst b/Documentation/arch/arm/cluster-pm-race-avoidance.rst new file mode 100644 index 000000000000..aa58603d3f28 --- /dev/null +++ b/Documentation/arch/arm/cluster-pm-race-avoidance.rst @@ -0,0 +1,533 @@ +========================================================= +Cluster-wide Power-up/power-down race avoidance algorithm +========================================================= + +This file documents the algorithm which is used to coordinate CPU and +cluster setup and teardown operations and to manage hardware coherency +controls safely. + +The section "Rationale" explains what the algorithm is for and why it is +needed. "Basic model" explains general concepts using a simplified view +of the system. The other sections explain the actual details of the +algorithm in use. + + +Rationale +--------- + +In a system containing multiple CPUs, it is desirable to have the +ability to turn off individual CPUs when the system is idle, reducing +power consumption and thermal dissipation. + +In a system containing multiple clusters of CPUs, it is also desirable +to have the ability to turn off entire clusters. + +Turning entire clusters off and on is a risky business, because it +involves performing potentially destructive operations affecting a group +of independently running CPUs, while the OS continues to run. This +means that we need some coordination in order to ensure that critical +cluster-level operations are only performed when it is truly safe to do +so. + +Simple locking may not be sufficient to solve this problem, because +mechanisms like Linux spinlocks may rely on coherency mechanisms which +are not immediately enabled when a cluster powers up. Since enabling or +disabling those mechanisms may itself be a non-atomic operation (such as +writing some hardware registers and invalidating large caches), other +methods of coordination are required in order to guarantee safe +power-down and power-up at the cluster level. + +The mechanism presented in this document describes a coherent memory +based protocol for performing the needed coordination. It aims to be as +lightweight as possible, while providing the required safety properties. + + +Basic model +----------- + +Each cluster and CPU is assigned a state, as follows: + + - DOWN + - COMING_UP + - UP + - GOING_DOWN + +:: + + +---------> UP ----------+ + | v + + COMING_UP GOING_DOWN + + ^ | + +--------- DOWN <--------+ + + +DOWN: + The CPU or cluster is not coherent, and is either powered off or + suspended, or is ready to be powered off or suspended. + +COMING_UP: + The CPU or cluster has committed to moving to the UP state. + It may be part way through the process of initialisation and + enabling coherency. + +UP: + The CPU or cluster is active and coherent at the hardware + level. A CPU in this state is not necessarily being used + actively by the kernel. + +GOING_DOWN: + The CPU or cluster has committed to moving to the DOWN + state. It may be part way through the process of teardown and + coherency exit. + + +Each CPU has one of these states assigned to it at any point in time. +The CPU states are described in the "CPU state" section, below. + +Each cluster is also assigned a state, but it is necessary to split the +state value into two parts (the "cluster" state and "inbound" state) and +to introduce additional states in order to avoid races between different +CPUs in the cluster simultaneously modifying the state. The cluster- +level states are described in the "Cluster state" section. + +To help distinguish the CPU states from cluster states in this +discussion, the state names are given a `CPU_` prefix for the CPU states, +and a `CLUSTER_` or `INBOUND_` prefix for the cluster states. + + +CPU state +--------- + +In this algorithm, each individual core in a multi-core processor is +referred to as a "CPU". CPUs are assumed to be single-threaded: +therefore, a CPU can only be doing one thing at a single point in time. + +This means that CPUs fit the basic model closely. + +The algorithm defines the following states for each CPU in the system: + + - CPU_DOWN + - CPU_COMING_UP + - CPU_UP + - CPU_GOING_DOWN + +:: + + cluster setup and + CPU setup complete policy decision + +-----------> CPU_UP ------------+ + | v + + CPU_COMING_UP CPU_GOING_DOWN + + ^ | + +----------- CPU_DOWN <----------+ + policy decision CPU teardown complete + or hardware event + + +The definitions of the four states correspond closely to the states of +the basic model. + +Transitions between states occur as follows. + +A trigger event (spontaneous) means that the CPU can transition to the +next state as a result of making local progress only, with no +requirement for any external event to happen. + + +CPU_DOWN: + A CPU reaches the CPU_DOWN state when it is ready for + power-down. On reaching this state, the CPU will typically + power itself down or suspend itself, via a WFI instruction or a + firmware call. + + Next state: + CPU_COMING_UP + Conditions: + none + + Trigger events: + a) an explicit hardware power-up operation, resulting + from a policy decision on another CPU; + + b) a hardware event, such as an interrupt. + + +CPU_COMING_UP: + A CPU cannot start participating in hardware coherency until the + cluster is set up and coherent. If the cluster is not ready, + then the CPU will wait in the CPU_COMING_UP state until the + cluster has been set up. + + Next state: + CPU_UP + Conditions: + The CPU's parent cluster must be in CLUSTER_UP. + Trigger events: + Transition of the parent cluster to CLUSTER_UP. + + Refer to the "Cluster state" section for a description of the + CLUSTER_UP state. + + +CPU_UP: + When a CPU reaches the CPU_UP state, it is safe for the CPU to + start participating in local coherency. + + This is done by jumping to the kernel's CPU resume code. + + Note that the definition of this state is slightly different + from the basic model definition: CPU_UP does not mean that the + CPU is coherent yet, but it does mean that it is safe to resume + the kernel. The kernel handles the rest of the resume + procedure, so the remaining steps are not visible as part of the + race avoidance algorithm. + + The CPU remains in this state until an explicit policy decision + is made to shut down or suspend the CPU. + + Next state: + CPU_GOING_DOWN + Conditions: + none + Trigger events: + explicit policy decision + + +CPU_GOING_DOWN: + While in this state, the CPU exits coherency, including any + operations required to achieve this (such as cleaning data + caches). + + Next state: + CPU_DOWN + Conditions: + local CPU teardown complete + Trigger events: + (spontaneous) + + +Cluster state +------------- + +A cluster is a group of connected CPUs with some common resources. +Because a cluster contains multiple CPUs, it can be doing multiple +things at the same time. This has some implications. In particular, a +CPU can start up while another CPU is tearing the cluster down. + +In this discussion, the "outbound side" is the view of the cluster state +as seen by a CPU tearing the cluster down. The "inbound side" is the +view of the cluster state as seen by a CPU setting the CPU up. + +In order to enable safe coordination in such situations, it is important +that a CPU which is setting up the cluster can advertise its state +independently of the CPU which is tearing down the cluster. For this +reason, the cluster state is split into two parts: + + "cluster" state: The global state of the cluster; or the state + on the outbound side: + + - CLUSTER_DOWN + - CLUSTER_UP + - CLUSTER_GOING_DOWN + + "inbound" state: The state of the cluster on the inbound side. + + - INBOUND_NOT_COMING_UP + - INBOUND_COMING_UP + + + The different pairings of these states results in six possible + states for the cluster as a whole:: + + CLUSTER_UP + +==========> INBOUND_NOT_COMING_UP -------------+ + # | + | + CLUSTER_UP <----+ | + INBOUND_COMING_UP | v + + ^ CLUSTER_GOING_DOWN CLUSTER_GOING_DOWN + # INBOUND_COMING_UP <=== INBOUND_NOT_COMING_UP + + CLUSTER_DOWN | | + INBOUND_COMING_UP <----+ | + | + ^ | + +=========== CLUSTER_DOWN <------------+ + INBOUND_NOT_COMING_UP + + Transitions -----> can only be made by the outbound CPU, and + only involve changes to the "cluster" state. + + Transitions ===##> can only be made by the inbound CPU, and only + involve changes to the "inbound" state, except where there is no + further transition possible on the outbound side (i.e., the + outbound CPU has put the cluster into the CLUSTER_DOWN state). + + The race avoidance algorithm does not provide a way to determine + which exact CPUs within the cluster play these roles. This must + be decided in advance by some other means. Refer to the section + "Last man and first man selection" for more explanation. + + + CLUSTER_DOWN/INBOUND_NOT_COMING_UP is the only state where the + cluster can actually be powered down. + + The parallelism of the inbound and outbound CPUs is observed by + the existence of two different paths from CLUSTER_GOING_DOWN/ + INBOUND_NOT_COMING_UP (corresponding to GOING_DOWN in the basic + model) to CLUSTER_DOWN/INBOUND_COMING_UP (corresponding to + COMING_UP in the basic model). The second path avoids cluster + teardown completely. + + CLUSTER_UP/INBOUND_COMING_UP is equivalent to UP in the basic + model. The final transition to CLUSTER_UP/INBOUND_NOT_COMING_UP + is trivial and merely resets the state machine ready for the + next cycle. + + Details of the allowable transitions follow. + + The next state in each case is notated + + <cluster state>/<inbound state> (<transitioner>) + + where the <transitioner> is the side on which the transition + can occur; either the inbound or the outbound side. + + +CLUSTER_DOWN/INBOUND_NOT_COMING_UP: + Next state: + CLUSTER_DOWN/INBOUND_COMING_UP (inbound) + Conditions: + none + + Trigger events: + a) an explicit hardware power-up operation, resulting + from a policy decision on another CPU; + + b) a hardware event, such as an interrupt. + + +CLUSTER_DOWN/INBOUND_COMING_UP: + + In this state, an inbound CPU sets up the cluster, including + enabling of hardware coherency at the cluster level and any + other operations (such as cache invalidation) which are required + in order to achieve this. + + The purpose of this state is to do sufficient cluster-level + setup to enable other CPUs in the cluster to enter coherency + safely. + + Next state: + CLUSTER_UP/INBOUND_COMING_UP (inbound) + Conditions: + cluster-level setup and hardware coherency complete + Trigger events: + (spontaneous) + + +CLUSTER_UP/INBOUND_COMING_UP: + + Cluster-level setup is complete and hardware coherency is + enabled for the cluster. Other CPUs in the cluster can safely + enter coherency. + + This is a transient state, leading immediately to + CLUSTER_UP/INBOUND_NOT_COMING_UP. All other CPUs on the cluster + should consider treat these two states as equivalent. + + Next state: + CLUSTER_UP/INBOUND_NOT_COMING_UP (inbound) + Conditions: + none + Trigger events: + (spontaneous) + + +CLUSTER_UP/INBOUND_NOT_COMING_UP: + + Cluster-level setup is complete and hardware coherency is + enabled for the cluster. Other CPUs in the cluster can safely + enter coherency. + + The cluster will remain in this state until a policy decision is + made to power the cluster down. + + Next state: + CLUSTER_GOING_DOWN/INBOUND_NOT_COMING_UP (outbound) + Conditions: + none + Trigger events: + policy decision to power down the cluster + + +CLUSTER_GOING_DOWN/INBOUND_NOT_COMING_UP: + + An outbound CPU is tearing the cluster down. The selected CPU + must wait in this state until all CPUs in the cluster are in the + CPU_DOWN state. + + When all CPUs are in the CPU_DOWN state, the cluster can be torn + down, for example by cleaning data caches and exiting + cluster-level coherency. + + To avoid wasteful unnecessary teardown operations, the outbound + should check the inbound cluster state for asynchronous + transitions to INBOUND_COMING_UP. Alternatively, individual + CPUs can be checked for entry into CPU_COMING_UP or CPU_UP. + + + Next states: + + CLUSTER_DOWN/INBOUND_NOT_COMING_UP (outbound) + Conditions: + cluster torn down and ready to power off + Trigger events: + (spontaneous) + + CLUSTER_GOING_DOWN/INBOUND_COMING_UP (inbound) + Conditions: + none + + Trigger events: + a) an explicit hardware power-up operation, + resulting from a policy decision on another + CPU; + + b) a hardware event, such as an interrupt. + + +CLUSTER_GOING_DOWN/INBOUND_COMING_UP: + + The cluster is (or was) being torn down, but another CPU has + come online in the meantime and is trying to set up the cluster + again. + + If the outbound CPU observes this state, it has two choices: + + a) back out of teardown, restoring the cluster to the + CLUSTER_UP state; + + b) finish tearing the cluster down and put the cluster + in the CLUSTER_DOWN state; the inbound CPU will + set up the cluster again from there. + + Choice (a) permits the removal of some latency by avoiding + unnecessary teardown and setup operations in situations where + the cluster is not really going to be powered down. + + + Next states: + + CLUSTER_UP/INBOUND_COMING_UP (outbound) + Conditions: + cluster-level setup and hardware + coherency complete + + Trigger events: + (spontaneous) + + CLUSTER_DOWN/INBOUND_COMING_UP (outbound) + Conditions: + cluster torn down and ready to power off + + Trigger events: + (spontaneous) + + +Last man and First man selection +-------------------------------- + +The CPU which performs cluster tear-down operations on the outbound side +is commonly referred to as the "last man". + +The CPU which performs cluster setup on the inbound side is commonly +referred to as the "first man". + +The race avoidance algorithm documented above does not provide a +mechanism to choose which CPUs should play these roles. + + +Last man: + +When shutting down the cluster, all the CPUs involved are initially +executing Linux and hence coherent. Therefore, ordinary spinlocks can +be used to select a last man safely, before the CPUs become +non-coherent. + + +First man: + +Because CPUs may power up asynchronously in response to external wake-up +events, a dynamic mechanism is needed to make sure that only one CPU +attempts to play the first man role and do the cluster-level +initialisation: any other CPUs must wait for this to complete before +proceeding. + +Cluster-level initialisation may involve actions such as configuring +coherency controls in the bus fabric. + +The current implementation in mcpm_head.S uses a separate mutual exclusion +mechanism to do this arbitration. This mechanism is documented in +detail in vlocks.txt. + + +Features and Limitations +------------------------ + +Implementation: + + The current ARM-based implementation is split between + arch/arm/common/mcpm_head.S (low-level inbound CPU operations) and + arch/arm/common/mcpm_entry.c (everything else): + + __mcpm_cpu_going_down() signals the transition of a CPU to the + CPU_GOING_DOWN state. + + __mcpm_cpu_down() signals the transition of a CPU to the CPU_DOWN + state. + + A CPU transitions to CPU_COMING_UP and then to CPU_UP via the + low-level power-up code in mcpm_head.S. This could + involve CPU-specific setup code, but in the current + implementation it does not. + + __mcpm_outbound_enter_critical() and __mcpm_outbound_leave_critical() + handle transitions from CLUSTER_UP to CLUSTER_GOING_DOWN + and from there to CLUSTER_DOWN or back to CLUSTER_UP (in + the case of an aborted cluster power-down). + + These functions are more complex than the __mcpm_cpu_*() + functions due to the extra inter-CPU coordination which + is needed for safe transitions at the cluster level. + + A cluster transitions from CLUSTER_DOWN back to CLUSTER_UP via + the low-level power-up code in mcpm_head.S. This + typically involves platform-specific setup code, + provided by the platform-specific power_up_setup + function registered via mcpm_sync_init. + +Deep topologies: + + As currently described and implemented, the algorithm does not + support CPU topologies involving more than two levels (i.e., + clusters of clusters are not supported). The algorithm could be + extended by replicating the cluster-level states for the + additional topological levels, and modifying the transition + rules for the intermediate (non-outermost) cluster levels. + + +Colophon +-------- + +Originally created and documented by Dave Martin for Linaro Limited, in +collaboration with Nicolas Pitre and Achin Gupta. + +Copyright (C) 2012-2013 Linaro Limited +Distributed under the terms of Version 2 of the GNU General Public +License, as defined in linux/COPYING. diff --git a/Documentation/arch/arm/features.rst b/Documentation/arch/arm/features.rst new file mode 100644 index 000000000000..7414ec03dd15 --- /dev/null +++ b/Documentation/arch/arm/features.rst @@ -0,0 +1,3 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. kernel-feat:: $srctree/Documentation/features arm diff --git a/Documentation/arch/arm/firmware.rst b/Documentation/arch/arm/firmware.rst new file mode 100644 index 000000000000..efd844baec1d --- /dev/null +++ b/Documentation/arch/arm/firmware.rst @@ -0,0 +1,72 @@ +========================================================================== +Interface for registering and calling firmware-specific operations for ARM +========================================================================== + +Written by Tomasz Figa <t.figa@samsung.com> + +Some boards are running with secure firmware running in TrustZone secure +world, which changes the way some things have to be initialized. This makes +a need to provide an interface for such platforms to specify available firmware +operations and call them when needed. + +Firmware operations can be specified by filling in a struct firmware_ops +with appropriate callbacks and then registering it with register_firmware_ops() +function:: + + void register_firmware_ops(const struct firmware_ops *ops) + +The ops pointer must be non-NULL. More information about struct firmware_ops +and its members can be found in arch/arm/include/asm/firmware.h header. + +There is a default, empty set of operations provided, so there is no need to +set anything if platform does not require firmware operations. + +To call a firmware operation, a helper macro is provided:: + + #define call_firmware_op(op, ...) \ + ((firmware_ops->op) ? firmware_ops->op(__VA_ARGS__) : (-ENOSYS)) + +the macro checks if the operation is provided and calls it or otherwise returns +-ENOSYS to signal that given operation is not available (for example, to allow +fallback to legacy operation). + +Example of registering firmware operations:: + + /* board file */ + + static int platformX_do_idle(void) + { + /* tell platformX firmware to enter idle */ + return 0; + } + + static int platformX_cpu_boot(int i) + { + /* tell platformX firmware to boot CPU i */ + return 0; + } + + static const struct firmware_ops platformX_firmware_ops = { + .do_idle = exynos_do_idle, + .cpu_boot = exynos_cpu_boot, + /* other operations not available on platformX */ + }; + + /* init_early callback of machine descriptor */ + static void __init board_init_early(void) + { + register_firmware_ops(&platformX_firmware_ops); + } + +Example of using a firmware operation:: + + /* some platform code, e.g. SMP initialization */ + + __raw_writel(__pa_symbol(exynos4_secondary_startup), + CPU1_BOOT_REG); + + /* Call Exynos specific smc call */ + if (call_firmware_op(cpu_boot, cpu) == -ENOSYS) + cpu_boot_legacy(...); /* Try legacy way */ + + gic_raise_softirq(cpumask_of(cpu), 1); diff --git a/Documentation/arch/arm/google/chromebook-boot-flow.rst b/Documentation/arch/arm/google/chromebook-boot-flow.rst new file mode 100644 index 000000000000..36da77684bba --- /dev/null +++ b/Documentation/arch/arm/google/chromebook-boot-flow.rst @@ -0,0 +1,69 @@ +.. SPDX-License-Identifier: GPL-2.0 + +====================================== +Chromebook Boot Flow +====================================== + +Most recent Chromebooks that use device tree are using the opensource +depthcharge_ bootloader. Depthcharge_ expects the OS to be packaged as a `FIT +Image`_ which contains an OS image as well as a collection of device trees. It +is up to depthcharge_ to pick the right device tree from the `FIT Image`_ and +provide it to the OS. + +The scheme that depthcharge_ uses to pick the device tree takes into account +three variables: + +- Board name, specified at depthcharge_ compile time. This is $(BOARD) below. +- Board revision number, determined at runtime (perhaps by reading GPIO + strappings, perhaps via some other method). This is $(REV) below. +- SKU number, read from GPIO strappings at boot time. This is $(SKU) below. + +For recent Chromebooks, depthcharge_ creates a match list that looks like this: + +- google,$(BOARD)-rev$(REV)-sku$(SKU) +- google,$(BOARD)-rev$(REV) +- google,$(BOARD)-sku$(SKU) +- google,$(BOARD) + +Note that some older Chromebooks use a slightly different list that may +not include SKU matching or may prioritize SKU/rev differently. + +Note that for some boards there may be extra board-specific logic to inject +extra compatibles into the list, but this is uncommon. + +Depthcharge_ will look through all device trees in the `FIT Image`_ trying to +find one that matches the most specific compatible. It will then look +through all device trees in the `FIT Image`_ trying to find the one that +matches the *second most* specific compatible, etc. + +When searching for a device tree, depthcharge_ doesn't care where the +compatible string falls within a device tree's root compatible string array. +As an example, if we're on board "lazor", rev 4, SKU 0 and we have two device +trees: + +- "google,lazor-rev5-sku0", "google,lazor-rev4-sku0", "qcom,sc7180" +- "google,lazor", "qcom,sc7180" + +Then depthcharge_ will pick the first device tree even though +"google,lazor-rev4-sku0" was the second compatible listed in that device tree. +This is because it is a more specific compatible than "google,lazor". + +It should be noted that depthcharge_ does not have any smarts to try to +match board or SKU revisions that are "close by". That is to say that +if depthcharge_ knows it's on "rev4" of a board but there is no "rev4" +device tree then depthcharge_ *won't* look for a "rev3" device tree. + +In general when any significant changes are made to a board the board +revision number is increased even if none of those changes need to +be reflected in the device tree. Thus it's fairly common to see device +trees with multiple revisions. + +It should be noted that, taking into account the above system that +depthcharge_ has, the most flexibility is achieved if the device tree +supporting the newest revision(s) of a board omits the "-rev{REV}" +compatible strings. When this is done then if you get a new board +revision and try to run old software on it then we'll at pick the +newest device tree we know about. + +.. _depthcharge: https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/platform/depthcharge/ +.. _`FIT Image`: https://doc.coreboot.org/lib/payloads/fit.html diff --git a/Documentation/arch/arm/index.rst b/Documentation/arch/arm/index.rst new file mode 100644 index 000000000000..fd43502ae924 --- /dev/null +++ b/Documentation/arch/arm/index.rst @@ -0,0 +1,85 @@ +.. SPDX-License-Identifier: GPL-2.0 + +================ +ARM Architecture +================ + +.. toctree:: + :maxdepth: 1 + + arm + booting + cluster-pm-race-avoidance + firmware + interrupts + kernel_mode_neon + kernel_user_helpers + memory + mem_alignment + tcm + setup + swp_emulation + uefi + vlocks + porting + + features + +SoC-specific documents +====================== + +.. toctree:: + :maxdepth: 1 + + google/chromebook-boot-flow + + ixp4xx + + marvell + microchip + + netwinder + nwfpe/index + + keystone/overview + keystone/knav-qmss + + omap/index + + pxa/mfp + + + sa1100/index + + stm32/stm32f746-overview + stm32/overview + stm32/stm32h743-overview + stm32/stm32h750-overview + stm32/stm32f769-overview + stm32/stm32f429-overview + stm32/stm32mp13-overview + stm32/stm32mp151-overview + stm32/stm32mp157-overview + stm32/stm32-dma-mdma-chaining + + sunxi + + samsung/index + + sunxi/clocks + + spear/overview + + sti/stih407-overview + sti/stih418-overview + sti/overview + + vfp/release-notes + + +.. only:: subproject and html + + Indices + ======= + + * :ref:`genindex` diff --git a/Documentation/arch/arm/interrupts.rst b/Documentation/arch/arm/interrupts.rst new file mode 100644 index 000000000000..2ae70e0e9732 --- /dev/null +++ b/Documentation/arch/arm/interrupts.rst @@ -0,0 +1,169 @@ +========== +Interrupts +========== + +2.5.2-rmk5: + This is the first kernel that contains a major shake up of some of the + major architecture-specific subsystems. + +Firstly, it contains some pretty major changes to the way we handle the +MMU TLB. Each MMU TLB variant is now handled completely separately - +we have TLB v3, TLB v4 (without write buffer), TLB v4 (with write buffer), +and finally TLB v4 (with write buffer, with I TLB invalidate entry). +There is more assembly code inside each of these functions, mainly to +allow more flexible TLB handling for the future. + +Secondly, the IRQ subsystem. + +The 2.5 kernels will be having major changes to the way IRQs are handled. +Unfortunately, this means that machine types that touch the irq_desc[] +array (basically all machine types) will break, and this means every +machine type that we currently have. + +Lets take an example. On the Assabet with Neponset, we have:: + + GPIO25 IRR:2 + SA1100 ------------> Neponset -----------> SA1111 + IIR:1 + -----------> USAR + IIR:0 + -----------> SMC9196 + +The way stuff currently works, all SA1111 interrupts are mutually +exclusive of each other - if you're processing one interrupt from the +SA1111 and another comes in, you have to wait for that interrupt to +finish processing before you can service the new interrupt. Eg, an +IDE PIO-based interrupt on the SA1111 excludes all other SA1111 and +SMC9196 interrupts until it has finished transferring its multi-sector +data, which can be a long time. Note also that since we loop in the +SA1111 IRQ handler, SA1111 IRQs can hold off SMC9196 IRQs indefinitely. + + +The new approach brings several new ideas... + +We introduce the concept of a "parent" and a "child". For example, +to the Neponset handler, the "parent" is GPIO25, and the "children"d +are SA1111, SMC9196 and USAR. + +We also bring the idea of an IRQ "chip" (mainly to reduce the size of +the irqdesc array). This doesn't have to be a real "IC"; indeed the +SA11x0 IRQs are handled by two separate "chip" structures, one for +GPIO0-10, and another for all the rest. It is just a container for +the various operations (maybe this'll change to a better name). +This structure has the following operations:: + + struct irqchip { + /* + * Acknowledge the IRQ. + * If this is a level-based IRQ, then it is expected to mask the IRQ + * as well. + */ + void (*ack)(unsigned int irq); + /* + * Mask the IRQ in hardware. + */ + void (*mask)(unsigned int irq); + /* + * Unmask the IRQ in hardware. + */ + void (*unmask)(unsigned int irq); + /* + * Re-run the IRQ + */ + void (*rerun)(unsigned int irq); + /* + * Set the type of the IRQ. + */ + int (*type)(unsigned int irq, unsigned int, type); + }; + +ack + - required. May be the same function as mask for IRQs + handled by do_level_IRQ. +mask + - required. +unmask + - required. +rerun + - optional. Not required if you're using do_level_IRQ for all + IRQs that use this 'irqchip'. Generally expected to re-trigger + the hardware IRQ if possible. If not, may call the handler + directly. +type + - optional. If you don't support changing the type of an IRQ, + it should be null so people can detect if they are unable to + set the IRQ type. + +For each IRQ, we keep the following information: + + - "disable" depth (number of disable_irq()s without enable_irq()s) + - flags indicating what we can do with this IRQ (valid, probe, + noautounmask) as before + - status of the IRQ (probing, enable, etc) + - chip + - per-IRQ handler + - irqaction structure list + +The handler can be one of the 3 standard handlers - "level", "edge" and +"simple", or your own specific handler if you need to do something special. + +The "level" handler is what we currently have - its pretty simple. +"edge" knows about the brokenness of such IRQ implementations - that you +need to leave the hardware IRQ enabled while processing it, and queueing +further IRQ events should the IRQ happen again while processing. The +"simple" handler is very basic, and does not perform any hardware +manipulation, nor state tracking. This is useful for things like the +SMC9196 and USAR above. + +So, what's changed? +=================== + +1. Machine implementations must not write to the irqdesc array. + +2. New functions to manipulate the irqdesc array. The first 4 are expected + to be useful only to machine specific code. The last is recommended to + only be used by machine specific code, but may be used in drivers if + absolutely necessary. + + set_irq_chip(irq,chip) + Set the mask/unmask methods for handling this IRQ + + set_irq_handler(irq,handler) + Set the handler for this IRQ (level, edge, simple) + + set_irq_chained_handler(irq,handler) + Set a "chained" handler for this IRQ - automatically + enables this IRQ (eg, Neponset and SA1111 handlers). + + set_irq_flags(irq,flags) + Set the valid/probe/noautoenable flags. + + set_irq_type(irq,type) + Set active the IRQ edge(s)/level. This replaces the + SA1111 INTPOL manipulation, and the set_GPIO_IRQ_edge() + function. Type should be one of IRQ_TYPE_xxx defined in + <linux/irq.h> + +3. set_GPIO_IRQ_edge() is obsolete, and should be replaced by set_irq_type. + +4. Direct access to SA1111 INTPOL is deprecated. Use set_irq_type instead. + +5. A handler is expected to perform any necessary acknowledgement of the + parent IRQ via the correct chip specific function. For instance, if + the SA1111 is directly connected to a SA1110 GPIO, then you should + acknowledge the SA1110 IRQ each time you re-read the SA1111 IRQ status. + +6. For any child which doesn't have its own IRQ enable/disable controls + (eg, SMC9196), the handler must mask or acknowledge the parent IRQ + while the child handler is called, and the child handler should be the + "simple" handler (not "edge" nor "level"). After the handler completes, + the parent IRQ should be unmasked, and the status of all children must + be re-checked for pending events. (see the Neponset IRQ handler for + details). + +7. fixup_irq() is gone, as is `arch/arm/mach-*/include/mach/irq.h` + +Please note that this will not solve all problems - some of them are +hardware based. Mixing level-based and edge-based IRQs on the same +parent signal (eg neponset) is one such area where a software based +solution can't provide the full answer to low IRQ latency. diff --git a/Documentation/arch/arm/ixp4xx.rst b/Documentation/arch/arm/ixp4xx.rst new file mode 100644 index 000000000000..a57235616294 --- /dev/null +++ b/Documentation/arch/arm/ixp4xx.rst @@ -0,0 +1,173 @@ +=========================================================== +Release Notes for Linux on Intel's IXP4xx Network Processor +=========================================================== + +Maintained by Deepak Saxena <dsaxena@plexity.net> +------------------------------------------------------------------------- + +1. Overview + +Intel's IXP4xx network processor is a highly integrated SOC that +is targeted for network applications, though it has become popular +in industrial control and other areas due to low cost and power +consumption. The IXP4xx family currently consists of several processors +that support different network offload functions such as encryption, +routing, firewalling, etc. The IXP46x family is an updated version which +supports faster speeds, new memory and flash configurations, and more +integration such as an on-chip I2C controller. + +For more information on the various versions of the CPU, see: + + http://developer.intel.com/design/network/products/npfamily/ixp4xx.htm + +Intel also made the IXCP1100 CPU for sometime which is an IXP4xx +stripped of much of the network intelligence. + +2. Linux Support + +Linux currently supports the following features on the IXP4xx chips: + +- Dual serial ports +- PCI interface +- Flash access (MTD/JFFS) +- I2C through GPIO on IXP42x +- GPIO for input/output/interrupts + See arch/arm/mach-ixp4xx/include/mach/platform.h for access functions. +- Timers (watchdog, OS) + +The following components of the chips are not supported by Linux and +require the use of Intel's proprietary CSR software: + +- USB device interface +- Network interfaces (HSS, Utopia, NPEs, etc) +- Network offload functionality + +If you need to use any of the above, you need to download Intel's +software from: + + http://developer.intel.com/design/network/products/npfamily/ixp425.htm + +DO NOT POST QUESTIONS TO THE LINUX MAILING LISTS REGARDING THE PROPRIETARY +SOFTWARE. + +There are several websites that provide directions/pointers on using +Intel's software: + + - http://sourceforge.net/projects/ixp4xx-osdg/ + Open Source Developer's Guide for using uClinux and the Intel libraries + + - http://gatewaymaker.sourceforge.net/ + Simple one page summary of building a gateway using an IXP425 and Linux + + - http://ixp425.sourceforge.net/ + ATM device driver for IXP425 that relies on Intel's libraries + +3. Known Issues/Limitations + +3a. Limited inbound PCI window + +The IXP4xx family allows for up to 256MB of memory but the PCI interface +can only expose 64MB of that memory to the PCI bus. This means that if +you are running with > 64MB, all PCI buffers outside of the accessible +range will be bounced using the routines in arch/arm/common/dmabounce.c. + +3b. Limited outbound PCI window + +IXP4xx provides two methods of accessing PCI memory space: + +1) A direct mapped window from 0x48000000 to 0x4bffffff (64MB). + To access PCI via this space, we simply ioremap() the BAR + into the kernel and we can use the standard read[bwl]/write[bwl] + macros. This is the preffered method due to speed but it + limits the system to just 64MB of PCI memory. This can be + problamatic if using video cards and other memory-heavy devices. + +2) If > 64MB of memory space is required, the IXP4xx can be + configured to use indirect registers to access PCI This allows + for up to 128MB (0x48000000 to 0x4fffffff) of memory on the bus. + The disadvantage of this is that every PCI access requires + three local register accesses plus a spinlock, but in some + cases the performance hit is acceptable. In addition, you cannot + mmap() PCI devices in this case due to the indirect nature + of the PCI window. + +By default, the direct method is used for performance reasons. If +you need more PCI memory, enable the IXP4XX_INDIRECT_PCI config option. + +3c. GPIO as Interrupts + +Currently the code only handles level-sensitive GPIO interrupts + +4. Supported platforms + +ADI Engineering Coyote Gateway Reference Platform +http://www.adiengineering.com/productsCoyote.html + + The ADI Coyote platform is reference design for those building + small residential/office gateways. One NPE is connected to a 10/100 + interface, one to 4-port 10/100 switch, and the third to and ADSL + interface. In addition, it also supports to POTs interfaces connected + via SLICs. Note that those are not supported by Linux ATM. Finally, + the platform has two mini-PCI slots used for 802.11[bga] cards. + Finally, there is an IDE port hanging off the expansion bus. + +Gateworks Avila Network Platform +http://www.gateworks.com/support/overview.php + + The Avila platform is basically and IXDP425 with the 4 PCI slots + replaced with mini-PCI slots and a CF IDE interface hanging off + the expansion bus. + +Intel IXDP425 Development Platform +http://www.intel.com/design/network/products/npfamily/ixdpg425.htm + + This is Intel's standard reference platform for the IXDP425 and is + also known as the Richfield board. It contains 4 PCI slots, 16MB + of flash, two 10/100 ports and one ADSL port. + +Intel IXDP465 Development Platform +http://www.intel.com/design/network/products/npfamily/ixdp465.htm + + This is basically an IXDP425 with an IXP465 and 32M of flash instead + of just 16. + +Intel IXDPG425 Development Platform + + This is basically and ADI Coyote board with a NEC EHCI controller + added. One issue with this board is that the mini-PCI slots only + have the 3.3v line connected, so you can't use a PCI to mini-PCI + adapter with an E100 card. So to NFS root you need to use either + the CSR or a WiFi card and a ramdisk that BOOTPs and then does + a pivot_root to NFS. + +Motorola PrPMC1100 Processor Mezanine Card +http://www.fountainsys.com + + The PrPMC1100 is based on the IXCP1100 and is meant to plug into + and IXP2400/2800 system to act as the system controller. It simply + contains a CPU and 16MB of flash on the board and needs to be + plugged into a carrier board to function. Currently Linux only + supports the Motorola PrPMC carrier board for this platform. + +5. TODO LIST + +- Add support for Coyote IDE +- Add support for edge-based GPIO interrupts +- Add support for CF IDE on expansion bus + +6. Thanks + +The IXP4xx work has been funded by Intel Corp. and MontaVista Software, Inc. + +The following people have contributed patches/comments/etc: + +- Lennerty Buytenhek +- Lutz Jaenicke +- Justin Mayfield +- Robert E. Ranslam + +[I know I've forgotten others, please email me to be added] + +------------------------------------------------------------------------- + +Last Update: 01/04/2005 diff --git a/Documentation/arch/arm/kernel_mode_neon.rst b/Documentation/arch/arm/kernel_mode_neon.rst new file mode 100644 index 000000000000..9bfb71a2a9b9 --- /dev/null +++ b/Documentation/arch/arm/kernel_mode_neon.rst @@ -0,0 +1,124 @@ +================ +Kernel mode NEON +================ + +TL;DR summary +------------- +* Use only NEON instructions, or VFP instructions that don't rely on support + code +* Isolate your NEON code in a separate compilation unit, and compile it with + '-march=armv7-a -mfpu=neon -mfloat-abi=softfp' +* Put kernel_neon_begin() and kernel_neon_end() calls around the calls into your + NEON code +* Don't sleep in your NEON code, and be aware that it will be executed with + preemption disabled + + +Introduction +------------ +It is possible to use NEON instructions (and in some cases, VFP instructions) in +code that runs in kernel mode. However, for performance reasons, the NEON/VFP +register file is not preserved and restored at every context switch or taken +exception like the normal register file is, so some manual intervention is +required. Furthermore, special care is required for code that may sleep [i.e., +may call schedule()], as NEON or VFP instructions will be executed in a +non-preemptible section for reasons outlined below. + + +Lazy preserve and restore +------------------------- +The NEON/VFP register file is managed using lazy preserve (on UP systems) and +lazy restore (on both SMP and UP systems). This means that the register file is +kept 'live', and is only preserved and restored when multiple tasks are +contending for the NEON/VFP unit (or, in the SMP case, when a task migrates to +another core). Lazy restore is implemented by disabling the NEON/VFP unit after +every context switch, resulting in a trap when subsequently a NEON/VFP +instruction is issued, allowing the kernel to step in and perform the restore if +necessary. + +Any use of the NEON/VFP unit in kernel mode should not interfere with this, so +it is required to do an 'eager' preserve of the NEON/VFP register file, and +enable the NEON/VFP unit explicitly so no exceptions are generated on first +subsequent use. This is handled by the function kernel_neon_begin(), which +should be called before any kernel mode NEON or VFP instructions are issued. +Likewise, the NEON/VFP unit should be disabled again after use to make sure user +mode will hit the lazy restore trap upon next use. This is handled by the +function kernel_neon_end(). + + +Interruptions in kernel mode +---------------------------- +For reasons of performance and simplicity, it was decided that there shall be no +preserve/restore mechanism for the kernel mode NEON/VFP register contents. This +implies that interruptions of a kernel mode NEON section can only be allowed if +they are guaranteed not to touch the NEON/VFP registers. For this reason, the +following rules and restrictions apply in the kernel: +* NEON/VFP code is not allowed in interrupt context; +* NEON/VFP code is not allowed to sleep; +* NEON/VFP code is executed with preemption disabled. + +If latency is a concern, it is possible to put back to back calls to +kernel_neon_end() and kernel_neon_begin() in places in your code where none of +the NEON registers are live. (Additional calls to kernel_neon_begin() should be +reasonably cheap if no context switch occurred in the meantime) + + +VFP and support code +-------------------- +Earlier versions of VFP (prior to version 3) rely on software support for things +like IEEE-754 compliant underflow handling etc. When the VFP unit needs such +software assistance, it signals the kernel by raising an undefined instruction +exception. The kernel responds by inspecting the VFP control registers and the +current instruction and arguments, and emulates the instruction in software. + +Such software assistance is currently not implemented for VFP instructions +executed in kernel mode. If such a condition is encountered, the kernel will +fail and generate an OOPS. + + +Separating NEON code from ordinary code +--------------------------------------- +The compiler is not aware of the special significance of kernel_neon_begin() and +kernel_neon_end(), i.e., that it is only allowed to issue NEON/VFP instructions +between calls to these respective functions. Furthermore, GCC may generate NEON +instructions of its own at -O3 level if -mfpu=neon is selected, and even if the +kernel is currently compiled at -O2, future changes may result in NEON/VFP +instructions appearing in unexpected places if no special care is taken. + +Therefore, the recommended and only supported way of using NEON/VFP in the +kernel is by adhering to the following rules: + +* isolate the NEON code in a separate compilation unit and compile it with + '-march=armv7-a -mfpu=neon -mfloat-abi=softfp'; +* issue the calls to kernel_neon_begin(), kernel_neon_end() as well as the calls + into the unit containing the NEON code from a compilation unit which is *not* + built with the GCC flag '-mfpu=neon' set. + +As the kernel is compiled with '-msoft-float', the above will guarantee that +both NEON and VFP instructions will only ever appear in designated compilation +units at any optimization level. + + +NEON assembler +-------------- +NEON assembler is supported with no additional caveats as long as the rules +above are followed. + + +NEON code generated by GCC +-------------------------- +The GCC option -ftree-vectorize (implied by -O3) tries to exploit implicit +parallelism, and generates NEON code from ordinary C source code. This is fully +supported as long as the rules above are followed. + + +NEON intrinsics +--------------- +NEON intrinsics are also supported. However, as code using NEON intrinsics +relies on the GCC header <arm_neon.h>, (which #includes <stdint.h>), you should +observe the following in addition to the rules above: + +* Compile the unit containing the NEON intrinsics with '-ffreestanding' so GCC + uses its builtin version of <stdint.h> (this is a C99 header which the kernel + does not supply); +* Include <arm_neon.h> last, or at least after <linux/types.h> diff --git a/Documentation/arch/arm/kernel_user_helpers.rst b/Documentation/arch/arm/kernel_user_helpers.rst new file mode 100644 index 000000000000..eb6f3d916622 --- /dev/null +++ b/Documentation/arch/arm/kernel_user_helpers.rst @@ -0,0 +1,268 @@ +============================ +Kernel-provided User Helpers +============================ + +These are segment of kernel provided user code reachable from user space +at a fixed address in kernel memory. This is used to provide user space +with some operations which require kernel help because of unimplemented +native feature and/or instructions in many ARM CPUs. The idea is for this +code to be executed directly in user mode for best efficiency but which is +too intimate with the kernel counter part to be left to user libraries. +In fact this code might even differ from one CPU to another depending on +the available instruction set, or whether it is a SMP systems. In other +words, the kernel reserves the right to change this code as needed without +warning. Only the entry points and their results as documented here are +guaranteed to be stable. + +This is different from (but doesn't preclude) a full blown VDSO +implementation, however a VDSO would prevent some assembly tricks with +constants that allows for efficient branching to those code segments. And +since those code segments only use a few cycles before returning to user +code, the overhead of a VDSO indirect far call would add a measurable +overhead to such minimalistic operations. + +User space is expected to bypass those helpers and implement those things +inline (either in the code emitted directly by the compiler, or part of +the implementation of a library call) when optimizing for a recent enough +processor that has the necessary native support, but only if resulting +binaries are already to be incompatible with earlier ARM processors due to +usage of similar native instructions for other things. In other words +don't make binaries unable to run on earlier processors just for the sake +of not using these kernel helpers if your compiled code is not going to +use new instructions for other purpose. + +New helpers may be added over time, so an older kernel may be missing some +helpers present in a newer kernel. For this reason, programs must check +the value of __kuser_helper_version (see below) before assuming that it is +safe to call any particular helper. This check should ideally be +performed only once at process startup time, and execution aborted early +if the required helpers are not provided by the kernel version that +process is running on. + +kuser_helper_version +-------------------- + +Location: 0xffff0ffc + +Reference declaration:: + + extern int32_t __kuser_helper_version; + +Definition: + + This field contains the number of helpers being implemented by the + running kernel. User space may read this to determine the availability + of a particular helper. + +Usage example:: + + #define __kuser_helper_version (*(int32_t *)0xffff0ffc) + + void check_kuser_version(void) + { + if (__kuser_helper_version < 2) { + fprintf(stderr, "can't do atomic operations, kernel too old\n"); + abort(); + } + } + +Notes: + + User space may assume that the value of this field never changes + during the lifetime of any single process. This means that this + field can be read once during the initialisation of a library or + startup phase of a program. + +kuser_get_tls +------------- + +Location: 0xffff0fe0 + +Reference prototype:: + + void * __kuser_get_tls(void); + +Input: + + lr = return address + +Output: + + r0 = TLS value + +Clobbered registers: + + none + +Definition: + + Get the TLS value as previously set via the __ARM_NR_set_tls syscall. + +Usage example:: + + typedef void * (__kuser_get_tls_t)(void); + #define __kuser_get_tls (*(__kuser_get_tls_t *)0xffff0fe0) + + void foo() + { + void *tls = __kuser_get_tls(); + printf("TLS = %p\n", tls); + } + +Notes: + + - Valid only if __kuser_helper_version >= 1 (from kernel version 2.6.12). + +kuser_cmpxchg +------------- + +Location: 0xffff0fc0 + +Reference prototype:: + + int __kuser_cmpxchg(int32_t oldval, int32_t newval, volatile int32_t *ptr); + +Input: + + r0 = oldval + r1 = newval + r2 = ptr + lr = return address + +Output: + + r0 = success code (zero or non-zero) + C flag = set if r0 == 0, clear if r0 != 0 + +Clobbered registers: + + r3, ip, flags + +Definition: + + Atomically store newval in `*ptr` only if `*ptr` is equal to oldval. + Return zero if `*ptr` was changed or non-zero if no exchange happened. + The C flag is also set if `*ptr` was changed to allow for assembly + optimization in the calling code. + +Usage example:: + + typedef int (__kuser_cmpxchg_t)(int oldval, int newval, volatile int *ptr); + #define __kuser_cmpxchg (*(__kuser_cmpxchg_t *)0xffff0fc0) + + int atomic_add(volatile int *ptr, int val) + { + int old, new; + + do { + old = *ptr; + new = old + val; + } while(__kuser_cmpxchg(old, new, ptr)); + + return new; + } + +Notes: + + - This routine already includes memory barriers as needed. + + - Valid only if __kuser_helper_version >= 2 (from kernel version 2.6.12). + +kuser_memory_barrier +-------------------- + +Location: 0xffff0fa0 + +Reference prototype:: + + void __kuser_memory_barrier(void); + +Input: + + lr = return address + +Output: + + none + +Clobbered registers: + + none + +Definition: + + Apply any needed memory barrier to preserve consistency with data modified + manually and __kuser_cmpxchg usage. + +Usage example:: + + typedef void (__kuser_dmb_t)(void); + #define __kuser_dmb (*(__kuser_dmb_t *)0xffff0fa0) + +Notes: + + - Valid only if __kuser_helper_version >= 3 (from kernel version 2.6.15). + +kuser_cmpxchg64 +--------------- + +Location: 0xffff0f60 + +Reference prototype:: + + int __kuser_cmpxchg64(const int64_t *oldval, + const int64_t *newval, + volatile int64_t *ptr); + +Input: + + r0 = pointer to oldval + r1 = pointer to newval + r2 = pointer to target value + lr = return address + +Output: + + r0 = success code (zero or non-zero) + C flag = set if r0 == 0, clear if r0 != 0 + +Clobbered registers: + + r3, lr, flags + +Definition: + + Atomically store the 64-bit value pointed by `*newval` in `*ptr` only if `*ptr` + is equal to the 64-bit value pointed by `*oldval`. Return zero if `*ptr` was + changed or non-zero if no exchange happened. + + The C flag is also set if `*ptr` was changed to allow for assembly + optimization in the calling code. + +Usage example:: + + typedef int (__kuser_cmpxchg64_t)(const int64_t *oldval, + const int64_t *newval, + volatile int64_t *ptr); + #define __kuser_cmpxchg64 (*(__kuser_cmpxchg64_t *)0xffff0f60) + + int64_t atomic_add64(volatile int64_t *ptr, int64_t val) + { + int64_t old, new; + + do { + old = *ptr; + new = old + val; + } while(__kuser_cmpxchg64(&old, &new, ptr)); + + return new; + } + +Notes: + + - This routine already includes memory barriers as needed. + + - Due to the length of this sequence, this spans 2 conventional kuser + "slots", therefore 0xffff0f80 is not used as a valid entry point. + + - Valid only if __kuser_helper_version >= 5 (from kernel version 3.1). diff --git a/Documentation/arch/arm/keystone/knav-qmss.rst b/Documentation/arch/arm/keystone/knav-qmss.rst new file mode 100644 index 000000000000..7f7638d80b42 --- /dev/null +++ b/Documentation/arch/arm/keystone/knav-qmss.rst @@ -0,0 +1,60 @@ +====================================================================== +Texas Instruments Keystone Navigator Queue Management SubSystem driver +====================================================================== + +Driver source code path + drivers/soc/ti/knav_qmss.c + drivers/soc/ti/knav_qmss_acc.c + +The QMSS (Queue Manager Sub System) found on Keystone SOCs is one of +the main hardware sub system which forms the backbone of the Keystone +multi-core Navigator. QMSS consist of queue managers, packed-data structure +processors(PDSP), linking RAM, descriptor pools and infrastructure +Packet DMA. +The Queue Manager is a hardware module that is responsible for accelerating +management of the packet queues. Packets are queued/de-queued by writing or +reading descriptor address to a particular memory mapped location. The PDSPs +perform QMSS related functions like accumulation, QoS, or event management. +Linking RAM registers are used to link the descriptors which are stored in +descriptor RAM. Descriptor RAM is configurable as internal or external memory. +The QMSS driver manages the PDSP setups, linking RAM regions, +queue pool management (allocation, push, pop and notify) and descriptor +pool management. + +knav qmss driver provides a set of APIs to drivers to open/close qmss queues, +allocate descriptor pools, map the descriptors, push/pop to queues etc. For +details of the available APIs, please refers to include/linux/soc/ti/knav_qmss.h + +DT documentation is available at +Documentation/devicetree/bindings/soc/ti/keystone-navigator-qmss.txt + +Accumulator QMSS queues using PDSP firmware +============================================ +The QMSS PDSP firmware support accumulator channel that can monitor a single +queue or multiple contiguous queues. drivers/soc/ti/knav_qmss_acc.c is the +driver that interface with the accumulator PDSP. This configures +accumulator channels defined in DTS (example in DT documentation) to monitor +1 or 32 queues per channel. More description on the firmware is available in +CPPI/QMSS Low Level Driver document (docs/CPPI_QMSS_LLD_SDS.pdf) at + + git://git.ti.com/keystone-rtos/qmss-lld.git + +k2_qmss_pdsp_acc48_k2_le_1_0_0_9.bin firmware supports upto 48 accumulator +channels. This firmware is available under ti-keystone folder of +firmware.git at + + git://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git + +To use copy the firmware image to lib/firmware folder of the initramfs or +ubifs file system and provide a sym link to k2_qmss_pdsp_acc48_k2_le_1_0_0_9.bin +in the file system and boot up the kernel. User would see + + "firmware file ks2_qmss_pdsp_acc48.bin downloaded for PDSP" + +in the boot up log if loading of firmware to PDSP is successful. + +Use of accumulated queues requires the firmware image to be present in the +file system. The driver doesn't acc queues to the supported queue range if +PDSP is not running in the SoC. The API call fails if there is a queue open +request to an acc queue and PDSP is not running. So make sure to copy firmware +to file system before using these queue types. diff --git a/Documentation/arch/arm/keystone/overview.rst b/Documentation/arch/arm/keystone/overview.rst new file mode 100644 index 000000000000..cd90298c493c --- /dev/null +++ b/Documentation/arch/arm/keystone/overview.rst @@ -0,0 +1,74 @@ +========================== +TI Keystone Linux Overview +========================== + +Introduction +------------ +Keystone range of SoCs are based on ARM Cortex-A15 MPCore Processors +and c66x DSP cores. This document describes essential information required +for users to run Linux on Keystone based EVMs from Texas Instruments. + +Following SoCs & EVMs are currently supported:- + +K2HK SoC and EVM +================= + +a.k.a Keystone 2 Hawking/Kepler SoC +TCI6636K2H & TCI6636K2K: See documentation at + + http://www.ti.com/product/tci6638k2k + http://www.ti.com/product/tci6638k2h + +EVM: + http://www.advantech.com/Support/TI-EVM/EVMK2HX_sd.aspx + +K2E SoC and EVM +=============== + +a.k.a Keystone 2 Edison SoC + +K2E - 66AK2E05: + +See documentation at + + http://www.ti.com/product/66AK2E05/technicaldocuments + +EVM: + https://www.einfochips.com/index.php/partnerships/texas-instruments/k2e-evm.html + +K2L SoC and EVM +=============== + +a.k.a Keystone 2 Lamarr SoC + +K2L - TCI6630K2L: + +See documentation at + http://www.ti.com/product/TCI6630K2L/technicaldocuments + +EVM: + https://www.einfochips.com/index.php/partnerships/texas-instruments/k2l-evm.html + +Configuration +------------- + +All of the K2 SoCs/EVMs share a common defconfig, keystone_defconfig and same +image is used to boot on individual EVMs. The platform configuration is +specified through DTS. Following are the DTS used: + + K2HK EVM: + k2hk-evm.dts + K2E EVM: + k2e-evm.dts + K2L EVM: + k2l-evm.dts + +The device tree documentation for the keystone machines are located at + + Documentation/devicetree/bindings/arm/keystone/keystone.txt + +Document Author +--------------- +Murali Karicheri <m-karicheri2@ti.com> + +Copyright 2015 Texas Instruments diff --git a/Documentation/arch/arm/marvell.rst b/Documentation/arch/arm/marvell.rst new file mode 100644 index 000000000000..3d369a566038 --- /dev/null +++ b/Documentation/arch/arm/marvell.rst @@ -0,0 +1,527 @@ +================ +ARM Marvell SoCs +================ + +This document lists all the ARM Marvell SoCs that are currently +supported in mainline by the Linux kernel. As the Marvell families of +SoCs are large and complex, it is hard to understand where the support +for a particular SoC is available in the Linux kernel. This document +tries to help in understanding where those SoCs are supported, and to +match them with their corresponding public datasheet, when available. + +Orion family +------------ + + Flavors: + - 88F5082 + - 88F5181 a.k.a Orion-1 + - 88F5181L a.k.a Orion-VoIP + - 88F5182 a.k.a Orion-NAS + + - Datasheet: https://web.archive.org/web/20210124231420/http://csclub.uwaterloo.ca/~board/ts7800/MV88F5182-datasheet.pdf + - Programmer's User Guide: https://web.archive.org/web/20210124231536/http://csclub.uwaterloo.ca/~board/ts7800/MV88F5182-opensource-manual.pdf + - User Manual: https://web.archive.org/web/20210124231631/http://csclub.uwaterloo.ca/~board/ts7800/MV88F5182-usermanual.pdf + - Functional Errata: https://web.archive.org/web/20210704165540/https://www.digriz.org.uk/ts78xx/88F5182_Functional_Errata.pdf + - 88F5281 a.k.a Orion-2 + + - Datasheet: https://web.archive.org/web/20131028144728/http://www.ocmodshop.com/images/reviews/networking/qnap_ts409u/marvel_88f5281_data_sheet.pdf + - 88F6183 a.k.a Orion-1-90 + Homepage: + https://web.archive.org/web/20080607215437/http://www.marvell.com/products/media/index.jsp + Core: + Feroceon 88fr331 (88f51xx) or 88fr531-vd (88f52xx) ARMv5 compatible + Linux kernel mach directory: + arch/arm/mach-orion5x + Linux kernel plat directory: + arch/arm/plat-orion + +Kirkwood family +--------------- + + Flavors: + - 88F6282 a.k.a Armada 300 + + - Product Brief : https://web.archive.org/web/20111027032509/http://www.marvell.com/embedded-processors/armada-300/assets/armada_310.pdf + - 88F6283 a.k.a Armada 310 + + - Product Brief : https://web.archive.org/web/20111027032509/http://www.marvell.com/embedded-processors/armada-300/assets/armada_310.pdf + - 88F6190 + + - Product Brief : https://web.archive.org/web/20130730072715/http://www.marvell.com/embedded-processors/kirkwood/assets/88F6190-003_WEB.pdf + - Hardware Spec : https://web.archive.org/web/20121021182835/http://www.marvell.com/embedded-processors/kirkwood/assets/HW_88F619x_OpenSource.pdf + - Functional Spec: https://web.archive.org/web/20130730091033/http://www.marvell.com/embedded-processors/kirkwood/assets/FS_88F6180_9x_6281_OpenSource.pdf + - 88F6192 + + - Product Brief : https://web.archive.org/web/20131113121446/http://www.marvell.com/embedded-processors/kirkwood/assets/88F6192-003_ver1.pdf + - Hardware Spec : https://web.archive.org/web/20121021182835/http://www.marvell.com/embedded-processors/kirkwood/assets/HW_88F619x_OpenSource.pdf + - Functional Spec: https://web.archive.org/web/20130730091033/http://www.marvell.com/embedded-processors/kirkwood/assets/FS_88F6180_9x_6281_OpenSource.pdf + - 88F6182 + - 88F6180 + + - Product Brief : https://web.archive.org/web/20120616201621/http://www.marvell.com/embedded-processors/kirkwood/assets/88F6180-003_ver1.pdf + - Hardware Spec : https://web.archive.org/web/20130730091654/http://www.marvell.com/embedded-processors/kirkwood/assets/HW_88F6180_OpenSource.pdf + - Functional Spec: https://web.archive.org/web/20130730091033/http://www.marvell.com/embedded-processors/kirkwood/assets/FS_88F6180_9x_6281_OpenSource.pdf + - 88F6280 + + - Product Brief : https://web.archive.org/web/20130730091058/http://www.marvell.com/embedded-processors/kirkwood/assets/88F6280_SoC_PB-001.pdf + - 88F6281 + + - Product Brief : https://web.archive.org/web/20120131133709/http://www.marvell.com/embedded-processors/kirkwood/assets/88F6281-004_ver1.pdf + - Hardware Spec : https://web.archive.org/web/20120620073511/http://www.marvell.com/embedded-processors/kirkwood/assets/HW_88F6281_OpenSource.pdf + - Functional Spec: https://web.archive.org/web/20130730091033/http://www.marvell.com/embedded-processors/kirkwood/assets/FS_88F6180_9x_6281_OpenSource.pdf + - 88F6321 + - 88F6322 + - 88F6323 + + - Product Brief : https://web.archive.org/web/20120616201639/http://www.marvell.com/embedded-processors/kirkwood/assets/88f632x_pb.pdf + Homepage: + https://web.archive.org/web/20160513194943/http://www.marvell.com/embedded-processors/kirkwood/ + Core: + Feroceon 88fr131 ARMv5 compatible + Linux kernel mach directory: + arch/arm/mach-mvebu + Linux kernel plat directory: + none + +Discovery family +---------------- + + Flavors: + - MV78100 + + - Product Brief : https://web.archive.org/web/20120616194711/http://www.marvell.com/embedded-processors/discovery-innovation/assets/MV78100-003_WEB.pdf + - Hardware Spec : https://web.archive.org/web/20141005120451/http://www.marvell.com/embedded-processors/discovery-innovation/assets/HW_MV78100_OpenSource.pdf + - Functional Spec: https://web.archive.org/web/20111110081125/http://www.marvell.com/embedded-processors/discovery-innovation/assets/FS_MV76100_78100_78200_OpenSource.pdf + - MV78200 + + - Product Brief : https://web.archive.org/web/20140801121623/http://www.marvell.com/embedded-processors/discovery-innovation/assets/MV78200-002_WEB.pdf + - Hardware Spec : https://web.archive.org/web/20141005120458/http://www.marvell.com/embedded-processors/discovery-innovation/assets/HW_MV78200_OpenSource.pdf + - Functional Spec: https://web.archive.org/web/20111110081125/http://www.marvell.com/embedded-processors/discovery-innovation/assets/FS_MV76100_78100_78200_OpenSource.pdf + + - MV76100 + + - Product Brief : https://web.archive.org/web/20140722064429/http://www.marvell.com/embedded-processors/discovery-innovation/assets/MV76100-002_WEB.pdf + - Hardware Spec : https://web.archive.org/web/20140722064425/http://www.marvell.com/embedded-processors/discovery-innovation/assets/HW_MV76100_OpenSource.pdf + - Functional Spec: https://web.archive.org/web/20111110081125/http://www.marvell.com/embedded-processors/discovery-innovation/assets/FS_MV76100_78100_78200_OpenSource.pdf + + Not supported by the Linux kernel. + + Homepage: + https://web.archive.org/web/20110924171043/http://www.marvell.com/embedded-processors/discovery-innovation/ + Core: + Feroceon 88fr571-vd ARMv5 compatible + + Linux kernel mach directory: + arch/arm/mach-mv78xx0 + Linux kernel plat directory: + arch/arm/plat-orion + +EBU Armada family +----------------- + + Armada 370 Flavors: + - 88F6710 + - 88F6707 + - 88F6W11 + + - Product infos: https://web.archive.org/web/20141002083258/http://www.marvell.com/embedded-processors/armada-370/ + - Product Brief: https://web.archive.org/web/20121115063038/http://www.marvell.com/embedded-processors/armada-300/assets/Marvell_ARMADA_370_SoC.pdf + - Hardware Spec: https://web.archive.org/web/20140617183747/http://www.marvell.com/embedded-processors/armada-300/assets/ARMADA370-datasheet.pdf + - Functional Spec: https://web.archive.org/web/20140617183701/http://www.marvell.com/embedded-processors/armada-300/assets/ARMADA370-FunctionalSpec-datasheet.pdf + + Core: + Sheeva ARMv7 compatible PJ4B + + Armada XP Flavors: + - MV78230 + - MV78260 + - MV78460 + + NOTE: + not to be confused with the non-SMP 78xx0 SoCs + + - Product infos: https://web.archive.org/web/20150101215721/http://www.marvell.com/embedded-processors/armada-xp/ + - Product Brief: https://web.archive.org/web/20121021173528/http://www.marvell.com/embedded-processors/armada-xp/assets/Marvell-ArmadaXP-SoC-product%20brief.pdf + - Functional Spec: https://web.archive.org/web/20180829171131/http://www.marvell.com/embedded-processors/armada-xp/assets/ARMADA-XP-Functional-SpecDatasheet.pdf + - Hardware Specs: + - https://web.archive.org/web/20141127013651/http://www.marvell.com/embedded-processors/armada-xp/assets/HW_MV78230_OS.PDF + - https://web.archive.org/web/20141222000224/http://www.marvell.com/embedded-processors/armada-xp/assets/HW_MV78260_OS.PDF + - https://web.archive.org/web/20141222000230/http://www.marvell.com/embedded-processors/armada-xp/assets/HW_MV78460_OS.PDF + + Core: + Sheeva ARMv7 compatible Dual-core or Quad-core PJ4B-MP + + Armada 375 Flavors: + - 88F6720 + + - Product infos: https://web.archive.org/web/20140108032402/http://www.marvell.com/embedded-processors/armada-375/ + - Product Brief: https://web.archive.org/web/20131216023516/http://www.marvell.com/embedded-processors/armada-300/assets/ARMADA_375_SoC-01_product_brief.pdf + + Core: + ARM Cortex-A9 + + Armada 38x Flavors: + - 88F6810 Armada 380 + - 88F6811 Armada 381 + - 88F6821 Armada 382 + - 88F6W21 Armada 383 + - 88F6820 Armada 385 + - 88F6825 + - 88F6828 Armada 388 + + - Product infos: https://web.archive.org/web/20181006144616/http://www.marvell.com/embedded-processors/armada-38x/ + - Functional Spec: https://web.archive.org/web/20200420191927/https://www.marvell.com/content/dam/marvell/en/public-collateral/embedded-processors/marvell-embedded-processors-armada-38x-functional-specifications-2015-11.pdf + - Hardware Spec: https://web.archive.org/web/20180713105318/https://www.marvell.com/docs/embedded-processors/assets/marvell-embedded-processors-armada-38x-hardware-specifications-2017-03.pdf + - Design guide: https://web.archive.org/web/20180712231737/https://www.marvell.com/docs/embedded-processors/assets/marvell-embedded-processors-armada-38x-hardware-design-guide-2017-08.pdf + + Core: + ARM Cortex-A9 + + Armada 39x Flavors: + - 88F6920 Armada 390 + - 88F6925 Armada 395 + - 88F6928 Armada 398 + + - Product infos: https://web.archive.org/web/20181020222559/http://www.marvell.com/embedded-processors/armada-39x/ + + Core: + ARM Cortex-A9 + + Linux kernel mach directory: + arch/arm/mach-mvebu + Linux kernel plat directory: + none + +EBU Armada family ARMv8 +----------------------- + + Armada 3710/3720 Flavors: + - 88F3710 + - 88F3720 + + Core: + ARM Cortex A53 (ARMv8) + + Homepage: + https://web.archive.org/web/20181103003602/http://www.marvell.com/embedded-processors/armada-3700/ + + Product Brief: + https://web.archive.org/web/20210121194810/https://www.marvell.com/content/dam/marvell/en/public-collateral/embedded-processors/marvell-embedded-processors-armada-37xx-product-brief-2016-01.pdf + + Hardware Spec: + https://web.archive.org/web/20210202162011/http://www.marvell.com/content/dam/marvell/en/public-collateral/embedded-processors/marvell-embedded-processors-armada-37xx-hardware-specifications-2019-09.pdf + + Device tree files: + arch/arm64/boot/dts/marvell/armada-37* + + Armada 7K Flavors: + - 88F6040 (AP806 Quad 600 MHz + one CP110) + - 88F7020 (AP806 Dual + one CP110) + - 88F7040 (AP806 Quad + one CP110) + + Core: ARM Cortex A72 + + Homepage: + https://web.archive.org/web/20181020222606/http://www.marvell.com/embedded-processors/armada-70xx/ + + Product Brief: + - https://web.archive.org/web/20161010105541/http://www.marvell.com/embedded-processors/assets/Armada7020PB-Jan2016.pdf + - https://web.archive.org/web/20160928154533/http://www.marvell.com/embedded-processors/assets/Armada7040PB-Jan2016.pdf + + Device tree files: + arch/arm64/boot/dts/marvell/armada-70* + + Armada 8K Flavors: + - 88F8020 (AP806 Dual + two CP110) + - 88F8040 (AP806 Quad + two CP110) + Core: + ARM Cortex A72 + + Homepage: + https://web.archive.org/web/20181022004830/http://www.marvell.com/embedded-processors/armada-80xx/ + + Product Brief: + - https://web.archive.org/web/20210124233728/https://www.marvell.com/content/dam/marvell/en/public-collateral/embedded-processors/marvell-embedded-processors-armada-8020-product-brief-2017-12.pdf + - https://web.archive.org/web/20161010105532/http://www.marvell.com/embedded-processors/assets/Armada8040PB-Jan2016.pdf + + Device tree files: + arch/arm64/boot/dts/marvell/armada-80* + + Octeon TX2 CN913x Flavors: + - CN9130 (AP807 Quad + one internal CP115) + - CN9131 (AP807 Quad + one internal CP115 + one external CP115 / 88F8215) + - CN9132 (AP807 Quad + one internal CP115 + two external CP115 / 88F8215) + + Core: + ARM Cortex A72 + + Homepage: + https://web.archive.org/web/20200803150818/https://www.marvell.com/products/infrastructure-processors/multi-core-processors/octeon-tx2/octeon-tx2-cn9130.html + + Product Brief: + https://web.archive.org/web/20200803150818/https://www.marvell.com/content/dam/marvell/en/public-collateral/embedded-processors/marvell-infrastructure-processors-octeon-tx2-cn913x-product-brief-2020-02.pdf + + Device tree files: + arch/arm64/boot/dts/marvell/cn913* + +Avanta family +------------- + + Flavors: + - 88F6500 + - 88F6510 + - 88F6530P + - 88F6550 + - 88F6560 + - 88F6601 + + Homepage: + https://web.archive.org/web/20181005145041/http://www.marvell.com/broadband/ + + Product Brief: + https://web.archive.org/web/20180829171057/http://www.marvell.com/broadband/assets/Marvell_Avanta_88F6510_305_060-001_product_brief.pdf + + No public datasheet available. + + Core: + ARMv5 compatible + + Linux kernel mach directory: + no code in mainline yet, planned for the future + Linux kernel plat directory: + no code in mainline yet, planned for the future + +Storage family +-------------- + + Armada SP: + - 88RC1580 + + Product infos: + https://web.archive.org/web/20191129073953/http://www.marvell.com/storage/armada-sp/ + + Core: + Sheeva ARMv7 compatible Quad-core PJ4C + + (not supported in upstream Linux kernel) + +Dove family (application processor) +----------------------------------- + + Flavors: + - 88AP510 a.k.a Armada 510 + + Product Brief: + https://web.archive.org/web/20111102020643/http://www.marvell.com/application-processors/armada-500/assets/Marvell_Armada510_SoC.pdf + + Hardware Spec: + https://web.archive.org/web/20160428160231/http://www.marvell.com/application-processors/armada-500/assets/Armada-510-Hardware-Spec.pdf + + Functional Spec: + https://web.archive.org/web/20120130172443/http://www.marvell.com/application-processors/armada-500/assets/Armada-510-Functional-Spec.pdf + + Homepage: + https://web.archive.org/web/20160822232651/http://www.marvell.com/application-processors/armada-500/ + + Core: + ARMv7 compatible + + Directory: + - arch/arm/mach-mvebu (DT enabled platforms) + - arch/arm/mach-dove (non-DT enabled platforms) + +PXA 2xx/3xx/93x/95x family +-------------------------- + + Flavors: + - PXA21x, PXA25x, PXA26x + - Application processor only + - Core: ARMv5 XScale1 core + - PXA270, PXA271, PXA272 + - Product Brief : https://web.archive.org/web/20150927135510/http://www.marvell.com/application-processors/pxa-family/assets/pxa_27x_pb.pdf + - Design guide : https://web.archive.org/web/20120111181937/http://www.marvell.com/application-processors/pxa-family/assets/pxa_27x_design_guide.pdf + - Developers manual : https://web.archive.org/web/20150927164805/http://www.marvell.com/application-processors/pxa-family/assets/pxa_27x_dev_man.pdf + - Specification : https://web.archive.org/web/20140211221535/http://www.marvell.com/application-processors/pxa-family/assets/pxa_27x_emts.pdf + - Specification update : https://web.archive.org/web/20120111104906/http://www.marvell.com/application-processors/pxa-family/assets/pxa_27x_spec_update.pdf + - Application processor only + - Core: ARMv5 XScale2 core + - PXA300, PXA310, PXA320 + - PXA 300 Product Brief : https://web.archive.org/web/20120111121203/http://www.marvell.com/application-processors/pxa-family/assets/PXA300_PB_R4.pdf + - PXA 310 Product Brief : https://web.archive.org/web/20120111104515/http://www.marvell.com/application-processors/pxa-family/assets/PXA310_PB_R4.pdf + - PXA 320 Product Brief : https://web.archive.org/web/20121021182826/http://www.marvell.com/application-processors/pxa-family/assets/PXA320_PB_R4.pdf + - Design guide : https://web.archive.org/web/20130727144625/http://www.marvell.com/application-processors/pxa-family/assets/PXA3xx_Design_Guide.pdf + - Developers manual : https://web.archive.org/web/20130727144605/http://www.marvell.com/application-processors/pxa-family/assets/PXA3xx_Developers_Manual.zip + - Specifications : https://web.archive.org/web/20130727144559/http://www.marvell.com/application-processors/pxa-family/assets/PXA3xx_EMTS.pdf + - Specification Update : https://web.archive.org/web/20150927183411/http://www.marvell.com/application-processors/pxa-family/assets/PXA3xx_Spec_Update.zip + - Reference Manual : https://web.archive.org/web/20120111103844/http://www.marvell.com/application-processors/pxa-family/assets/PXA3xx_TavorP_BootROM_Ref_Manual.pdf + - Application processor only + - Core: ARMv5 XScale3 core + - PXA930, PXA935 + - Application processor with Communication processor + - Core: ARMv5 XScale3 core + - PXA955 + - Application processor with Communication processor + - Core: ARMv7 compatible Sheeva PJ4 core + + Comments: + + * This line of SoCs originates from the XScale family developed by + Intel and acquired by Marvell in ~2006. The PXA21x, PXA25x, + PXA26x, PXA27x, PXA3xx and PXA93x were developed by Intel, while + the later PXA95x were developed by Marvell. + + * Due to their XScale origin, these SoCs have virtually nothing in + common with the other (Kirkwood, Dove, etc.) families of Marvell + SoCs, except with the MMP/MMP2 family of SoCs. + + Linux kernel mach directory: + arch/arm/mach-pxa + +MMP/MMP2/MMP3 family (communication processor) +---------------------------------------------- + + Flavors: + - PXA168, a.k.a Armada 168 + - Homepage : https://web.archive.org/web/20110926014256/http://www.marvell.com/application-processors/armada-100/armada-168.jsp + - Product brief : https://web.archive.org/web/20111102030100/http://www.marvell.com/application-processors/armada-100/assets/pxa_168_pb.pdf + - Hardware manual : https://web.archive.org/web/20160428165359/http://www.marvell.com/application-processors/armada-100/assets/armada_16x_datasheet.pdf + - Software manual : https://web.archive.org/web/20160428154454/http://www.marvell.com/application-processors/armada-100/assets/armada_16x_software_manual.pdf + - Specification update : https://web.archive.org/web/20150927160338/http://www.marvell.com/application-processors/armada-100/assets/ARMADA16x_Spec_update.pdf + - Boot ROM manual : https://web.archive.org/web/20130727205559/http://www.marvell.com/application-processors/armada-100/assets/armada_16x_ref_manual.pdf + - App node package : https://web.archive.org/web/20141005090706/http://www.marvell.com/application-processors/armada-100/assets/armada_16x_app_note_package.pdf + - Application processor only + - Core: ARMv5 compatible Marvell PJ1 88sv331 (Mohawk) + - PXA910/PXA920 + - Homepage : https://web.archive.org/web/20150928121236/http://www.marvell.com/communication-processors/pxa910/ + - Product Brief : https://archive.org/download/marvell-pxa910-pb/Marvell_PXA910_Platform-001_PB.pdf + - Application processor with Communication processor + - Core: ARMv5 compatible Marvell PJ1 88sv331 (Mohawk) + - PXA688, a.k.a. MMP2, a.k.a Armada 610 (OLPC XO-1.75) + - Product Brief : https://web.archive.org/web/20111102023255/http://www.marvell.com/application-processors/armada-600/assets/armada610_pb.pdf + - Application processor only + - Core: ARMv7 compatible Sheeva PJ4 88sv581x core + - PXA2128, a.k.a. MMP3, a.k.a Armada 620 (OLPC XO-4) + - Product Brief : https://web.archive.org/web/20120824055155/http://www.marvell.com/application-processors/armada/pxa2128/assets/Marvell-ARMADA-PXA2128-SoC-PB.pdf + - Application processor only + - Core: Dual-core ARMv7 compatible Sheeva PJ4C core + - PXA960/PXA968/PXA978 (Linux support not upstream) + - Application processor with Communication Processor + - Core: ARMv7 compatible Sheeva PJ4 core + - PXA986/PXA988 (Linux support not upstream) + - Application processor with Communication Processor + - Core: Dual-core ARMv7 compatible Sheeva PJ4B-MP core + - PXA1088/PXA1920 (Linux support not upstream) + - Application processor with Communication Processor + - Core: quad-core ARMv7 Cortex-A7 + - PXA1908/PXA1928/PXA1936 + - Application processor with Communication Processor + - Core: multi-core ARMv8 Cortex-A53 + + Comments: + + * This line of SoCs originates from the XScale family developed by + Intel and acquired by Marvell in ~2006. All the processors of + this MMP/MMP2 family were developed by Marvell. + + * Due to their XScale origin, these SoCs have virtually nothing in + common with the other (Kirkwood, Dove, etc.) families of Marvell + SoCs, except with the PXA family of SoCs listed above. + + Linux kernel mach directory: + arch/arm/mach-mmp + +Berlin family (Multimedia Solutions) +------------------------------------- + + - Flavors: + - 88DE3010, Armada 1000 (no Linux support) + - Core: Marvell PJ1 (ARMv5TE), Dual-core + - Product Brief: https://web.archive.org/web/20131103162620/http://www.marvell.com/digital-entertainment/assets/armada_1000_pb.pdf + - 88DE3005, Armada 1500 Mini + - Design name: BG2CD + - Core: ARM Cortex-A9, PL310 L2CC + - 88DE3006, Armada 1500 Mini Plus + - Design name: BG2CDP + - Core: Dual Core ARM Cortex-A7 + - 88DE3100, Armada 1500 + - Design name: BG2 + - Core: Marvell PJ4B-MP (ARMv7), Tauros3 L2CC + - 88DE3114, Armada 1500 Pro + - Design name: BG2Q + - Core: Quad Core ARM Cortex-A9, PL310 L2CC + - 88DE3214, Armada 1500 Pro 4K + - Design name: BG3 + - Core: ARM Cortex-A15, CA15 integrated L2CC + - 88DE3218, ARMADA 1500 Ultra + - Core: ARM Cortex-A53 + + Homepage: https://www.synaptics.com/products/multimedia-solutions + Directory: arch/arm/mach-berlin + + Comments: + + * This line of SoCs is based on Marvell Sheeva or ARM Cortex CPUs + with Synopsys DesignWare (IRQ, GPIO, Timers, ...) and PXA IP (SDHCI, USB, ETH, ...). + + * The Berlin family was acquired by Synaptics from Marvell in 2017. + +CPU Cores +--------- + +The XScale cores were designed by Intel, and shipped by Marvell in the older +PXA processors. Feroceon is a Marvell designed core that developed in-house, +and that evolved into Sheeva. The XScale and Feroceon cores were phased out +over time and replaced with Sheeva cores in later products, which subsequently +got replaced with licensed ARM Cortex-A cores. + + XScale 1 + CPUID 0x69052xxx + ARMv5, iWMMXt + XScale 2 + CPUID 0x69054xxx + ARMv5, iWMMXt + XScale 3 + CPUID 0x69056xxx or 0x69056xxx + ARMv5, iWMMXt + Feroceon-1850 88fr331 "Mohawk" + CPUID 0x5615331x or 0x41xx926x + ARMv5TE, single issue + Feroceon-2850 88fr531-vd "Jolteon" + CPUID 0x5605531x or 0x41xx926x + ARMv5TE, VFP, dual-issue + Feroceon 88fr571-vd "Jolteon" + CPUID 0x5615571x + ARMv5TE, VFP, dual-issue + Feroceon 88fr131 "Mohawk-D" + CPUID 0x5625131x + ARMv5TE, single-issue in-order + Sheeva PJ1 88sv331 "Mohawk" + CPUID 0x561584xx + ARMv5, single-issue iWMMXt v2 + Sheeva PJ4 88sv581x "Flareon" + CPUID 0x560f581x + ARMv7, idivt, optional iWMMXt v2 + Sheeva PJ4B 88sv581x + CPUID 0x561f581x + ARMv7, idivt, optional iWMMXt v2 + Sheeva PJ4B-MP / PJ4C + CPUID 0x562f584x + ARMv7, idivt/idiva, LPAE, optional iWMMXt v2 and/or NEON + +Long-term plans +--------------- + + * Unify the mach-dove/, mach-mv78xx0/, mach-orion5x/ into the + mach-mvebu/ to support all SoCs from the Marvell EBU (Engineering + Business Unit) in a single mach-<foo> directory. The plat-orion/ + would therefore disappear. + +Credits +------- + +- Maen Suleiman <maen@marvell.com> +- Lior Amsalem <alior@marvell.com> +- Thomas Petazzoni <thomas.petazzoni@free-electrons.com> +- Andrew Lunn <andrew@lunn.ch> +- Nicolas Pitre <nico@fluxnic.net> +- Eric Miao <eric.y.miao@gmail.com> diff --git a/Documentation/arch/arm/mem_alignment.rst b/Documentation/arch/arm/mem_alignment.rst new file mode 100644 index 000000000000..aa22893b62bc --- /dev/null +++ b/Documentation/arch/arm/mem_alignment.rst @@ -0,0 +1,63 @@ +================ +Memory alignment +================ + +Too many problems popped up because of unnoticed misaligned memory access in +kernel code lately. Therefore the alignment fixup is now unconditionally +configured in for SA11x0 based targets. According to Alan Cox, this is a +bad idea to configure it out, but Russell King has some good reasons for +doing so on some f***ed up ARM architectures like the EBSA110. However +this is not the case on many design I'm aware of, like all SA11x0 based +ones. + +Of course this is a bad idea to rely on the alignment trap to perform +unaligned memory access in general. If those access are predictable, you +are better to use the macros provided by include/asm/unaligned.h. The +alignment trap can fixup misaligned access for the exception cases, but at +a high performance cost. It better be rare. + +Now for user space applications, it is possible to configure the alignment +trap to SIGBUS any code performing unaligned access (good for debugging bad +code), or even fixup the access by software like for kernel code. The later +mode isn't recommended for performance reasons (just think about the +floating point emulation that works about the same way). Fix your code +instead! + +Please note that randomly changing the behaviour without good thought is +real bad - it changes the behaviour of all unaligned instructions in user +space, and might cause programs to fail unexpectedly. + +To change the alignment trap behavior, simply echo a number into +/proc/cpu/alignment. The number is made up from various bits: + +=== ======================================================== +bit behavior when set +=== ======================================================== +0 A user process performing an unaligned memory access + will cause the kernel to print a message indicating + process name, pid, pc, instruction, address, and the + fault code. + +1 The kernel will attempt to fix up the user process + performing the unaligned access. This is of course + slow (think about the floating point emulator) and + not recommended for production use. + +2 The kernel will send a SIGBUS signal to the user process + performing the unaligned access. +=== ======================================================== + +Note that not all combinations are supported - only values 0 through 5. +(6 and 7 don't make sense). + +For example, the following will turn on the warnings, but without +fixing up or sending SIGBUS signals:: + + echo 1 > /proc/cpu/alignment + +You can also read the content of the same file to get statistical +information on unaligned access occurrences plus the current mode of +operation for user space code. + + +Nicolas Pitre, Mar 13, 2001. Modified Russell King, Nov 30, 2001. diff --git a/Documentation/arch/arm/memory.rst b/Documentation/arch/arm/memory.rst new file mode 100644 index 000000000000..0cb1e2938823 --- /dev/null +++ b/Documentation/arch/arm/memory.rst @@ -0,0 +1,103 @@ +================================= +Kernel Memory Layout on ARM Linux +================================= + + Russell King <rmk@arm.linux.org.uk> + + November 17, 2005 (2.6.15) + +This document describes the virtual memory layout which the Linux +kernel uses for ARM processors. It indicates which regions are +free for platforms to use, and which are used by generic code. + +The ARM CPU is capable of addressing a maximum of 4GB virtual memory +space, and this must be shared between user space processes, the +kernel, and hardware devices. + +As the ARM architecture matures, it becomes necessary to reserve +certain regions of VM space for use for new facilities; therefore +this document may reserve more VM space over time. + +=============== =============== =============================================== +Start End Use +=============== =============== =============================================== +ffff8000 ffffffff copy_user_page / clear_user_page use. + For SA11xx and Xscale, this is used to + setup a minicache mapping. + +ffff4000 ffffffff cache aliasing on ARMv6 and later CPUs. + +ffff1000 ffff7fff Reserved. + Platforms must not use this address range. + +ffff0000 ffff0fff CPU vector page. + The CPU vectors are mapped here if the + CPU supports vector relocation (control + register V bit.) + +fffe0000 fffeffff XScale cache flush area. This is used + in proc-xscale.S to flush the whole data + cache. (XScale does not have TCM.) + +fffe8000 fffeffff DTCM mapping area for platforms with + DTCM mounted inside the CPU. + +fffe0000 fffe7fff ITCM mapping area for platforms with + ITCM mounted inside the CPU. + +ffc80000 ffefffff Fixmap mapping region. Addresses provided + by fix_to_virt() will be located here. + +ffc00000 ffc7ffff Guard region + +ff800000 ffbfffff Permanent, fixed read-only mapping of the + firmware provided DT blob + +fee00000 feffffff Mapping of PCI I/O space. This is a static + mapping within the vmalloc space. + +VMALLOC_START VMALLOC_END-1 vmalloc() / ioremap() space. + Memory returned by vmalloc/ioremap will + be dynamically placed in this region. + Machine specific static mappings are also + located here through iotable_init(). + VMALLOC_START is based upon the value + of the high_memory variable, and VMALLOC_END + is equal to 0xff800000. + +PAGE_OFFSET high_memory-1 Kernel direct-mapped RAM region. + This maps the platforms RAM, and typically + maps all platform RAM in a 1:1 relationship. + +PKMAP_BASE PAGE_OFFSET-1 Permanent kernel mappings + One way of mapping HIGHMEM pages into kernel + space. + +MODULES_VADDR MODULES_END-1 Kernel module space + Kernel modules inserted via insmod are + placed here using dynamic mappings. + +TASK_SIZE MODULES_VADDR-1 KASAn shadow memory when KASan is in use. + The range from MODULES_VADDR to the top + of the memory is shadowed here with 1 bit + per byte of memory. + +00001000 TASK_SIZE-1 User space mappings + Per-thread mappings are placed here via + the mmap() system call. + +00000000 00000fff CPU vector page / null pointer trap + CPUs which do not support vector remapping + place their vector page here. NULL pointer + dereferences by both the kernel and user + space are also caught via this mapping. +=============== =============== =============================================== + +Please note that mappings which collide with the above areas may result +in a non-bootable kernel, or may cause the kernel to (eventually) panic +at run time. + +Since future CPUs may impact the kernel mapping layout, user programs +must not access any memory which is not mapped inside their 0x0001000 +to TASK_SIZE address range. If they wish to access these areas, they +must set up their own mappings using open() and mmap(). diff --git a/Documentation/arch/arm/microchip.rst b/Documentation/arch/arm/microchip.rst new file mode 100644 index 000000000000..e721d855f2c9 --- /dev/null +++ b/Documentation/arch/arm/microchip.rst @@ -0,0 +1,230 @@ +============================= +ARM Microchip SoCs (aka AT91) +============================= + + +Introduction +------------ +This document gives useful information about the ARM Microchip SoCs that are +currently supported in Linux Mainline (you know, the one on kernel.org). + +It is important to note that the Microchip (previously Atmel) ARM-based MPU +product line is historically named "AT91" or "at91" throughout the Linux kernel +development process even if this product prefix has completely disappeared from +the official Microchip product name. Anyway, files, directories, git trees, +git branches/tags and email subject always contain this "at91" sub-string. + + +AT91 SoCs +--------- +Documentation and detailed datasheet for each product are available on +the Microchip website: http://www.microchip.com. + + Flavors: + * ARM 920 based SoC + - at91rm9200 + + * Datasheet + + http://ww1.microchip.com/downloads/en/DeviceDoc/Atmel-1768-32-bit-ARM920T-Embedded-Microprocessor-AT91RM9200_Datasheet.pdf + + * ARM 926 based SoCs + - at91sam9260 + + * Datasheet + + http://ww1.microchip.com/downloads/en/DeviceDoc/Atmel-6221-32-bit-ARM926EJ-S-Embedded-Microprocessor-SAM9260_Datasheet.pdf + + - at91sam9xe + + * Datasheet + + http://ww1.microchip.com/downloads/en/DeviceDoc/Atmel-6254-32-bit-ARM926EJ-S-Embedded-Microprocessor-SAM9XE_Datasheet.pdf + + - at91sam9261 + + * Datasheet + + http://ww1.microchip.com/downloads/en/DeviceDoc/Atmel-6062-ARM926EJ-S-Microprocessor-SAM9261_Datasheet.pdf + + - at91sam9263 + + * Datasheet + + http://ww1.microchip.com/downloads/en/DeviceDoc/Atmel-6249-32-bit-ARM926EJ-S-Embedded-Microprocessor-SAM9263_Datasheet.pdf + + - at91sam9rl + + * Datasheet + + http://ww1.microchip.com/downloads/en/DeviceDoc/doc6289.pdf + + - at91sam9g20 + + * Datasheet + + http://ww1.microchip.com/downloads/en/DeviceDoc/DS60001516A.pdf + + - at91sam9g45 family + - at91sam9g45 + - at91sam9g46 + - at91sam9m10 + - at91sam9m11 (device superset) + + * Datasheet + + http://ww1.microchip.com/downloads/en/DeviceDoc/Atmel-6437-32-bit-ARM926-Embedded-Microprocessor-SAM9M11_Datasheet.pdf + + - at91sam9x5 family (aka "The 5 series") + - at91sam9g15 + - at91sam9g25 + - at91sam9g35 + - at91sam9x25 + - at91sam9x35 + + * Datasheet (can be considered as covering the whole family) + + http://ww1.microchip.com/downloads/en/DeviceDoc/Atmel-11055-32-bit-ARM926EJ-S-Microcontroller-SAM9X35_Datasheet.pdf + + - at91sam9n12 + + * Datasheet + + http://ww1.microchip.com/downloads/en/DeviceDoc/DS60001517A.pdf + + - sam9x60 + + * Datasheet + + http://ww1.microchip.com/downloads/en/DeviceDoc/SAM9X60-Data-Sheet-DS60001579A.pdf + + * ARM Cortex-A5 based SoCs + - sama5d3 family + + - sama5d31 + - sama5d33 + - sama5d34 + - sama5d35 + - sama5d36 (device superset) + + * Datasheet + + http://ww1.microchip.com/downloads/en/DeviceDoc/Atmel-11121-32-bit-Cortex-A5-Microcontroller-SAMA5D3_Datasheet_B.pdf + + * ARM Cortex-A5 + NEON based SoCs + - sama5d4 family + + - sama5d41 + - sama5d42 + - sama5d43 + - sama5d44 (device superset) + + * Datasheet + + http://ww1.microchip.com/downloads/en/DeviceDoc/60001525A.pdf + + - sama5d2 family + + - sama5d21 + - sama5d22 + - sama5d23 + - sama5d24 + - sama5d26 + - sama5d27 (device superset) + - sama5d28 (device superset + environmental monitors) + + * Datasheet + + http://ww1.microchip.com/downloads/en/DeviceDoc/DS60001476B.pdf + + * ARM Cortex-A7 based SoCs + - sama7g5 family + + - sama7g51 + - sama7g52 + - sama7g53 + - sama7g54 (device superset) + + * Datasheet + + Coming soon + + - lan966 family + - lan9662 + - lan9668 + + * Datasheet + + Coming soon + + * ARM Cortex-M7 MCUs + - sams70 family + + - sams70j19 + - sams70j20 + - sams70j21 + - sams70n19 + - sams70n20 + - sams70n21 + - sams70q19 + - sams70q20 + - sams70q21 + + - samv70 family + + - samv70j19 + - samv70j20 + - samv70n19 + - samv70n20 + - samv70q19 + - samv70q20 + + - samv71 family + + - samv71j19 + - samv71j20 + - samv71j21 + - samv71n19 + - samv71n20 + - samv71n21 + - samv71q19 + - samv71q20 + - samv71q21 + + * Datasheet + + http://ww1.microchip.com/downloads/en/DeviceDoc/SAM-E70-S70-V70-V71-Family-Data-Sheet-DS60001527D.pdf + + +Linux kernel information +------------------------ +Linux kernel mach directory: arch/arm/mach-at91 +MAINTAINERS entry is: "ARM/Microchip (AT91) SoC support" + + +Device Tree for AT91 SoCs and boards +------------------------------------ +All AT91 SoCs are converted to Device Tree. Since Linux 3.19, these products +must use this method to boot the Linux kernel. + +Work In Progress statement: +Device Tree files and Device Tree bindings that apply to AT91 SoCs and boards are +considered as "Unstable". To be completely clear, any at91 binding can change at +any time. So, be sure to use a Device Tree Binary and a Kernel Image generated from +the same source tree. +Please refer to the Documentation/devicetree/bindings/ABI.rst file for a +definition of a "Stable" binding/ABI. +This statement will be removed by AT91 MAINTAINERS when appropriate. + +Naming conventions and best practice: + +- SoCs Device Tree Source Include files are named after the official name of + the product (at91sam9g20.dtsi or sama5d33.dtsi for instance). +- Device Tree Source Include files (.dtsi) are used to collect common nodes that can be + shared across SoCs or boards (sama5d3.dtsi or at91sam9x5cm.dtsi for instance). + When collecting nodes for a particular peripheral or topic, the identifier have to + be placed at the end of the file name, separated with a "_" (at91sam9x5_can.dtsi + or sama5d3_gmac.dtsi for example). +- board Device Tree Source files (.dts) are prefixed by the string "at91-" so + that they can be identified easily. Note that some files are historical exceptions + to this rule (sama5d3[13456]ek.dts, usb_a9g20.dts or animeo_ip.dts for example). diff --git a/Documentation/arch/arm/netwinder.rst b/Documentation/arch/arm/netwinder.rst new file mode 100644 index 000000000000..8eab66caa2ac --- /dev/null +++ b/Documentation/arch/arm/netwinder.rst @@ -0,0 +1,85 @@ +================================ +NetWinder specific documentation +================================ + +The NetWinder is a small low-power computer, primarily designed +to run Linux. It is based around the StrongARM RISC processor, +DC21285 PCI bridge, with PC-type hardware glued around it. + +Port usage +========== + +======= ====== =============================== +Min Max Description +======= ====== =============================== +0x0000 0x000f DMA1 +0x0020 0x0021 PIC1 +0x0060 0x006f Keyboard +0x0070 0x007f RTC +0x0080 0x0087 DMA1 +0x0088 0x008f DMA2 +0x00a0 0x00a3 PIC2 +0x00c0 0x00df DMA2 +0x0180 0x0187 IRDA +0x01f0 0x01f6 ide0 +0x0201 Game port +0x0203 RWA010 configuration read +0x0220 ? SoundBlaster +0x0250 ? WaveArtist +0x0279 RWA010 configuration index +0x02f8 0x02ff Serial ttyS1 +0x0300 0x031f Ether10 +0x0338 GPIO1 +0x033a GPIO2 +0x0370 0x0371 W83977F configuration registers +0x0388 ? AdLib +0x03c0 0x03df VGA +0x03f6 ide0 +0x03f8 0x03ff Serial ttyS0 +0x0400 0x0408 DC21143 +0x0480 0x0487 DMA1 +0x0488 0x048f DMA2 +0x0a79 RWA010 configuration write +0xe800 0xe80f ide0/ide1 BM DMA +======= ====== =============================== + + +Interrupt usage +=============== + +======= ======= ======================== +IRQ type Description +======= ======= ======================== + 0 ISA 100Hz timer + 1 ISA Keyboard + 2 ISA cascade + 3 ISA Serial ttyS1 + 4 ISA Serial ttyS0 + 5 ISA PS/2 mouse + 6 ISA IRDA + 7 ISA Printer + 8 ISA RTC alarm + 9 ISA +10 ISA GP10 (Orange reset button) +11 ISA +12 ISA WaveArtist +13 ISA +14 ISA hda1 +15 ISA +======= ======= ======================== + +DMA usage +========= + +======= ======= =========== +DMA type Description +======= ======= =========== + 0 ISA IRDA + 1 ISA + 2 ISA cascade + 3 ISA WaveArtist + 4 ISA + 5 ISA + 6 ISA + 7 ISA WaveArtist +======= ======= =========== diff --git a/Documentation/arch/arm/nwfpe/index.rst b/Documentation/arch/arm/nwfpe/index.rst new file mode 100644 index 000000000000..3c4d2f9aa10e --- /dev/null +++ b/Documentation/arch/arm/nwfpe/index.rst @@ -0,0 +1,13 @@ +.. SPDX-License-Identifier: GPL-2.0 + +=================================== +NetWinder's floating point emulator +=================================== + +.. toctree:: + :maxdepth: 1 + + nwfpe + netwinder-fpe + notes + todo diff --git a/Documentation/arch/arm/nwfpe/netwinder-fpe.rst b/Documentation/arch/arm/nwfpe/netwinder-fpe.rst new file mode 100644 index 000000000000..cbb320960fc4 --- /dev/null +++ b/Documentation/arch/arm/nwfpe/netwinder-fpe.rst @@ -0,0 +1,162 @@ +============= +Current State +============= + +The following describes the current state of the NetWinder's floating point +emulator. + +In the following nomenclature is used to describe the floating point +instructions. It follows the conventions in the ARM manual. + +:: + + <S|D|E> = <single|double|extended>, no default + {P|M|Z} = {round to +infinity,round to -infinity,round to zero}, + default = round to nearest + +Note: items enclosed in {} are optional. + +Floating Point Coprocessor Data Transfer Instructions (CPDT) +------------------------------------------------------------ + +LDF/STF - load and store floating + +<LDF|STF>{cond}<S|D|E> Fd, Rn +<LDF|STF>{cond}<S|D|E> Fd, [Rn, #<expression>]{!} +<LDF|STF>{cond}<S|D|E> Fd, [Rn], #<expression> + +These instructions are fully implemented. + +LFM/SFM - load and store multiple floating + +Form 1 syntax: +<LFM|SFM>{cond}<S|D|E> Fd, <count>, [Rn] +<LFM|SFM>{cond}<S|D|E> Fd, <count>, [Rn, #<expression>]{!} +<LFM|SFM>{cond}<S|D|E> Fd, <count>, [Rn], #<expression> + +Form 2 syntax: +<LFM|SFM>{cond}<FD,EA> Fd, <count>, [Rn]{!} + +These instructions are fully implemented. They store/load three words +for each floating point register into the memory location given in the +instruction. The format in memory is unlikely to be compatible with +other implementations, in particular the actual hardware. Specific +mention of this is made in the ARM manuals. + +Floating Point Coprocessor Register Transfer Instructions (CPRT) +---------------------------------------------------------------- + +Conversions, read/write status/control register instructions + +FLT{cond}<S,D,E>{P,M,Z} Fn, Rd Convert integer to floating point +FIX{cond}{P,M,Z} Rd, Fn Convert floating point to integer +WFS{cond} Rd Write floating point status register +RFS{cond} Rd Read floating point status register +WFC{cond} Rd Write floating point control register +RFC{cond} Rd Read floating point control register + +FLT/FIX are fully implemented. + +RFS/WFS are fully implemented. + +RFC/WFC are fully implemented. RFC/WFC are supervisor only instructions, and +presently check the CPU mode, and do an invalid instruction trap if not called +from supervisor mode. + +Compare instructions + +CMF{cond} Fn, Fm Compare floating +CMFE{cond} Fn, Fm Compare floating with exception +CNF{cond} Fn, Fm Compare negated floating +CNFE{cond} Fn, Fm Compare negated floating with exception + +These are fully implemented. + +Floating Point Coprocessor Data Instructions (CPDT) +--------------------------------------------------- + +Dyadic operations: + +ADF{cond}<S|D|E>{P,M,Z} Fd, Fn, <Fm,#value> - add +SUF{cond}<S|D|E>{P,M,Z} Fd, Fn, <Fm,#value> - subtract +RSF{cond}<S|D|E>{P,M,Z} Fd, Fn, <Fm,#value> - reverse subtract +MUF{cond}<S|D|E>{P,M,Z} Fd, Fn, <Fm,#value> - multiply +DVF{cond}<S|D|E>{P,M,Z} Fd, Fn, <Fm,#value> - divide +RDV{cond}<S|D|E>{P,M,Z} Fd, Fn, <Fm,#value> - reverse divide + +These are fully implemented. + +FML{cond}<S|D|E>{P,M,Z} Fd, Fn, <Fm,#value> - fast multiply +FDV{cond}<S|D|E>{P,M,Z} Fd, Fn, <Fm,#value> - fast divide +FRD{cond}<S|D|E>{P,M,Z} Fd, Fn, <Fm,#value> - fast reverse divide + +These are fully implemented as well. They use the same algorithm as the +non-fast versions. Hence, in this implementation their performance is +equivalent to the MUF/DVF/RDV instructions. This is acceptable according +to the ARM manual. The manual notes these are defined only for single +operands, on the actual FPA11 hardware they do not work for double or +extended precision operands. The emulator currently does not check +the requested permissions conditions, and performs the requested operation. + +RMF{cond}<S|D|E>{P,M,Z} Fd, Fn, <Fm,#value> - IEEE remainder + +This is fully implemented. + +Monadic operations: + +MVF{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - move +MNF{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - move negated + +These are fully implemented. + +ABS{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - absolute value +SQT{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - square root +RND{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - round + +These are fully implemented. + +URD{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - unnormalized round +NRM{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - normalize + +These are implemented. URD is implemented using the same code as the RND +instruction. Since URD cannot return a unnormalized number, NRM becomes +a NOP. + +Library calls: + +POW{cond}<S|D|E>{P,M,Z} Fd, Fn, <Fm,#value> - power +RPW{cond}<S|D|E>{P,M,Z} Fd, Fn, <Fm,#value> - reverse power +POL{cond}<S|D|E>{P,M,Z} Fd, Fn, <Fm,#value> - polar angle (arctan2) + +LOG{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - logarithm to base 10 +LGN{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - logarithm to base e +EXP{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - exponent +SIN{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - sine +COS{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - cosine +TAN{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - tangent +ASN{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - arcsine +ACS{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - arccosine +ATN{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - arctangent + +These are not implemented. They are not currently issued by the compiler, +and are handled by routines in libc. These are not implemented by the FPA11 +hardware, but are handled by the floating point support code. They should +be implemented in future versions. + +Signalling: + +Signals are implemented. However current ELF kernels produced by Rebel.com +have a bug in them that prevents the module from generating a SIGFPE. This +is caused by a failure to alias fp_current to the kernel variable +current_set[0] correctly. + +The kernel provided with this distribution (vmlinux-nwfpe-0.93) contains +a fix for this problem and also incorporates the current version of the +emulator directly. It is possible to run with no floating point module +loaded with this kernel. It is provided as a demonstration of the +technology and for those who want to do floating point work that depends +on signals. It is not strictly necessary to use the module. + +A module (either the one provided by Russell King, or the one in this +distribution) can be loaded to replace the functionality of the emulator +built into the kernel. diff --git a/Documentation/arch/arm/nwfpe/notes.rst b/Documentation/arch/arm/nwfpe/notes.rst new file mode 100644 index 000000000000..102e55af8439 --- /dev/null +++ b/Documentation/arch/arm/nwfpe/notes.rst @@ -0,0 +1,32 @@ +Notes +===== + +There seems to be a problem with exp(double) and our emulator. I haven't +been able to track it down yet. This does not occur with the emulator +supplied by Russell King. + +I also found one oddity in the emulator. I don't think it is serious but +will point it out. The ARM calling conventions require floating point +registers f4-f7 to be preserved over a function call. The compiler quite +often uses an stfe instruction to save f4 on the stack upon entry to a +function, and an ldfe instruction to restore it before returning. + +I was looking at some code, that calculated a double result, stored it in f4 +then made a function call. Upon return from the function call the number in +f4 had been converted to an extended value in the emulator. + +This is a side effect of the stfe instruction. The double in f4 had to be +converted to extended, then stored. If an lfm/sfm combination had been used, +then no conversion would occur. This has performance considerations. The +result from the function call and f4 were used in a multiplication. If the +emulator sees a multiply of a double and extended, it promotes the double to +extended, then does the multiply in extended precision. + +This code will cause this problem: + +double x, y, z; +z = log(x)/log(y); + +The result of log(x) (a double) will be calculated, returned in f0, then +moved to f4 to preserve it over the log(y) call. The division will be done +in extended precision, due to the stfe instruction used to save f4 in log(y). diff --git a/Documentation/arch/arm/nwfpe/nwfpe.rst b/Documentation/arch/arm/nwfpe/nwfpe.rst new file mode 100644 index 000000000000..35cd90dacbff --- /dev/null +++ b/Documentation/arch/arm/nwfpe/nwfpe.rst @@ -0,0 +1,74 @@ +Introduction +============ + +This directory contains the version 0.92 test release of the NetWinder +Floating Point Emulator. + +The majority of the code was written by me, Scott Bambrough It is +written in C, with a small number of routines in inline assembler +where required. It was written quickly, with a goal of implementing a +working version of all the floating point instructions the compiler +emits as the first target. I have attempted to be as optimal as +possible, but there remains much room for improvement. + +I have attempted to make the emulator as portable as possible. One of +the problems is with leading underscores on kernel symbols. Elf +kernels have no leading underscores, a.out compiled kernels do. I +have attempted to use the C_SYMBOL_NAME macro wherever this may be +important. + +Another choice I made was in the file structure. I have attempted to +contain all operating system specific code in one module (fpmodule.*). +All the other files contain emulator specific code. This should allow +others to port the emulator to NetBSD for instance relatively easily. + +The floating point operations are based on SoftFloat Release 2, by +John Hauser. SoftFloat is a software implementation of floating-point +that conforms to the IEC/IEEE Standard for Binary Floating-point +Arithmetic. As many as four formats are supported: single precision, +double precision, extended double precision, and quadruple precision. +All operations required by the standard are implemented, except for +conversions to and from decimal. We use only the single precision, +double precision and extended double precision formats. The port of +SoftFloat to the ARM was done by Phil Blundell, based on an earlier +port of SoftFloat version 1 by Neil Carson for NetBSD/arm32. + +The file README.FPE contains a description of what has been implemented +so far in the emulator. The file TODO contains a information on what +remains to be done, and other ideas for the emulator. + +Bug reports, comments, suggestions should be directed to me at +<scottb@netwinder.org>. General reports of "this program doesn't +work correctly when your emulator is installed" are useful for +determining that bugs still exist; but are virtually useless when +attempting to isolate the problem. Please report them, but don't +expect quick action. Bugs still exist. The problem remains in isolating +which instruction contains the bug. Small programs illustrating a specific +problem are a godsend. + +Legal Notices +------------- + +The NetWinder Floating Point Emulator is free software. Everything Rebel.com +has written is provided under the GNU GPL. See the file COPYING for copying +conditions. Excluded from the above is the SoftFloat code. John Hauser's +legal notice for SoftFloat is included below. + +------------------------------------------------------------------------------- + +SoftFloat Legal Notice + +SoftFloat was written by John R. Hauser. This work was made possible in +part by the International Computer Science Institute, located at Suite 600, +1947 Center Street, Berkeley, California 94704. Funding was partially +provided by the National Science Foundation under grant MIP-9311980. The +original version of this code was written as part of a project to build +a fixed-point vector processor in collaboration with the University of +California at Berkeley, overseen by Profs. Nelson Morgan and John Wawrzynek. + +THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort +has been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT +TIMES RESULT IN INCORRECT BEHAVIOR. USE OF THIS SOFTWARE IS RESTRICTED TO +PERSONS AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ANY +AND ALL LOSSES, COSTS, OR OTHER PROBLEMS ARISING FROM ITS USE. +------------------------------------------------------------------------------- diff --git a/Documentation/arch/arm/nwfpe/todo.rst b/Documentation/arch/arm/nwfpe/todo.rst new file mode 100644 index 000000000000..393f11b14540 --- /dev/null +++ b/Documentation/arch/arm/nwfpe/todo.rst @@ -0,0 +1,72 @@ +TODO LIST +========= + +:: + + POW{cond}<S|D|E>{P,M,Z} Fd, Fn, <Fm,#value> - power + RPW{cond}<S|D|E>{P,M,Z} Fd, Fn, <Fm,#value> - reverse power + POL{cond}<S|D|E>{P,M,Z} Fd, Fn, <Fm,#value> - polar angle (arctan2) + + LOG{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - logarithm to base 10 + LGN{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - logarithm to base e + EXP{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - exponent + SIN{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - sine + COS{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - cosine + TAN{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - tangent + ASN{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - arcsine + ACS{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - arccosine + ATN{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - arctangent + +These are not implemented. They are not currently issued by the compiler, +and are handled by routines in libc. These are not implemented by the FPA11 +hardware, but are handled by the floating point support code. They should +be implemented in future versions. + +There are a couple of ways to approach the implementation of these. One +method would be to use accurate table methods for these routines. I have +a couple of papers by S. Gal from IBM's research labs in Haifa, Israel that +seem to promise extreme accuracy (in the order of 99.8%) and reasonable speed. +These methods are used in GLIBC for some of the transcendental functions. + +Another approach, which I know little about is CORDIC. This stands for +Coordinate Rotation Digital Computer, and is a method of computing +transcendental functions using mostly shifts and adds and a few +multiplications and divisions. The ARM excels at shifts and adds, +so such a method could be promising, but requires more research to +determine if it is feasible. + +Rounding Methods +---------------- + +The IEEE standard defines 4 rounding modes. Round to nearest is the +default, but rounding to + or - infinity or round to zero are also allowed. +Many architectures allow the rounding mode to be specified by modifying bits +in a control register. Not so with the ARM FPA11 architecture. To change +the rounding mode one must specify it with each instruction. + +This has made porting some benchmarks difficult. It is possible to +introduce such a capability into the emulator. The FPCR contains +bits describing the rounding mode. The emulator could be altered to +examine a flag, which if set forced it to ignore the rounding mode in +the instruction, and use the mode specified in the bits in the FPCR. + +This would require a method of getting/setting the flag, and the bits +in the FPCR. This requires a kernel call in ArmLinux, as WFC/RFC are +supervisor only instructions. If anyone has any ideas or comments I +would like to hear them. + +NOTE: + pulled out from some docs on ARM floating point, specifically + for the Acorn FPE, but not limited to it: + + The floating point control register (FPCR) may only be present in some + implementations: it is there to control the hardware in an implementation- + specific manner, for example to disable the floating point system. The user + mode of the ARM is not permitted to use this register (since the right is + reserved to alter it between implementations) and the WFC and RFC + instructions will trap if tried in user mode. + + Hence, the answer is yes, you could do this, but then you will run a high + risk of becoming isolated if and when hardware FP emulation comes out + + -- Russell. diff --git a/Documentation/arch/arm/omap/dss.rst b/Documentation/arch/arm/omap/dss.rst new file mode 100644 index 000000000000..a40c4d9c717a --- /dev/null +++ b/Documentation/arch/arm/omap/dss.rst @@ -0,0 +1,372 @@ +========================= +OMAP2/3 Display Subsystem +========================= + +This is an almost total rewrite of the OMAP FB driver in drivers/video/omap +(let's call it DSS1). The main differences between DSS1 and DSS2 are DSI, +TV-out and multiple display support, but there are lots of small improvements +also. + +The DSS2 driver (omapdss module) is in arch/arm/plat-omap/dss/, and the FB, +panel and controller drivers are in drivers/video/omap2/. DSS1 and DSS2 live +currently side by side, you can choose which one to use. + +Features +-------- + +Working and tested features include: + +- MIPI DPI (parallel) output +- MIPI DSI output in command mode +- MIPI DBI (RFBI) output +- SDI output +- TV output +- All pieces can be compiled as a module or inside kernel +- Use DISPC to update any of the outputs +- Use CPU to update RFBI or DSI output +- OMAP DISPC planes +- RGB16, RGB24 packed, RGB24 unpacked +- YUV2, UYVY +- Scaling +- Adjusting DSS FCK to find a good pixel clock +- Use DSI DPLL to create DSS FCK + +Tested boards include: +- OMAP3 SDP board +- Beagle board +- N810 + +omapdss driver +-------------- + +The DSS driver does not itself have any support for Linux framebuffer, V4L or +such like the current ones, but it has an internal kernel API that upper level +drivers can use. + +The DSS driver models OMAP's overlays, overlay managers and displays in a +flexible way to enable non-common multi-display configuration. In addition to +modelling the hardware overlays, omapdss supports virtual overlays and overlay +managers. These can be used when updating a display with CPU or system DMA. + +omapdss driver support for audio +-------------------------------- +There exist several display technologies and standards that support audio as +well. Hence, it is relevant to update the DSS device driver to provide an audio +interface that may be used by an audio driver or any other driver interested in +the functionality. + +The audio_enable function is intended to prepare the relevant +IP for playback (e.g., enabling an audio FIFO, taking in/out of reset +some IP, enabling companion chips, etc). It is intended to be called before +audio_start. The audio_disable function performs the reverse operation and is +intended to be called after audio_stop. + +While a given DSS device driver may support audio, it is possible that for +certain configurations audio is not supported (e.g., an HDMI display using a +VESA video timing). The audio_supported function is intended to query whether +the current configuration of the display supports audio. + +The audio_config function is intended to configure all the relevant audio +parameters of the display. In order to make the function independent of any +specific DSS device driver, a struct omap_dss_audio is defined. Its purpose +is to contain all the required parameters for audio configuration. At the +moment, such structure contains pointers to IEC-60958 channel status word +and CEA-861 audio infoframe structures. This should be enough to support +HDMI and DisplayPort, as both are based on CEA-861 and IEC-60958. + +The audio_enable/disable, audio_config and audio_supported functions could be +implemented as functions that may sleep. Hence, they should not be called +while holding a spinlock or a readlock. + +The audio_start/audio_stop function is intended to effectively start/stop audio +playback after the configuration has taken place. These functions are designed +to be used in an atomic context. Hence, audio_start should return quickly and be +called only after all the needed resources for audio playback (audio FIFOs, +DMA channels, companion chips, etc) have been enabled to begin data transfers. +audio_stop is designed to only stop the audio transfers. The resources used +for playback are released using audio_disable. + +The enum omap_dss_audio_state may be used to help the implementations of +the interface to keep track of the audio state. The initial state is _DISABLED; +then, the state transitions to _CONFIGURED, and then, when it is ready to +play audio, to _ENABLED. The state _PLAYING is used when the audio is being +rendered. + + +Panel and controller drivers +---------------------------- + +The drivers implement panel or controller specific functionality and are not +usually visible to users except through omapfb driver. They register +themselves to the DSS driver. + +omapfb driver +------------- + +The omapfb driver implements arbitrary number of standard linux framebuffers. +These framebuffers can be routed flexibly to any overlays, thus allowing very +dynamic display architecture. + +The driver exports some omapfb specific ioctls, which are compatible with the +ioctls in the old driver. + +The rest of the non standard features are exported via sysfs. Whether the final +implementation will use sysfs, or ioctls, is still open. + +V4L2 drivers +------------ + +V4L2 is being implemented in TI. + +From omapdss point of view the V4L2 drivers should be similar to framebuffer +driver. + +Architecture +-------------------- + +Some clarification what the different components do: + + - Framebuffer is a memory area inside OMAP's SRAM/SDRAM that contains the + pixel data for the image. Framebuffer has width and height and color + depth. + - Overlay defines where the pixels are read from and where they go on the + screen. The overlay may be smaller than framebuffer, thus displaying only + part of the framebuffer. The position of the overlay may be changed if + the overlay is smaller than the display. + - Overlay manager combines the overlays in to one image and feeds them to + display. + - Display is the actual physical display device. + +A framebuffer can be connected to multiple overlays to show the same pixel data +on all of the overlays. Note that in this case the overlay input sizes must be +the same, but, in case of video overlays, the output size can be different. Any +framebuffer can be connected to any overlay. + +An overlay can be connected to one overlay manager. Also DISPC overlays can be +connected only to DISPC overlay managers, and virtual overlays can be only +connected to virtual overlays. + +An overlay manager can be connected to one display. There are certain +restrictions which kinds of displays an overlay manager can be connected: + + - DISPC TV overlay manager can be only connected to TV display. + - Virtual overlay managers can only be connected to DBI or DSI displays. + - DISPC LCD overlay manager can be connected to all displays, except TV + display. + +Sysfs +----- +The sysfs interface is mainly used for testing. I don't think sysfs +interface is the best for this in the final version, but I don't quite know +what would be the best interfaces for these things. + +The sysfs interface is divided to two parts: DSS and FB. + +/sys/class/graphics/fb? directory: +mirror 0=off, 1=on +rotate Rotation 0-3 for 0, 90, 180, 270 degrees +rotate_type 0 = DMA rotation, 1 = VRFB rotation +overlays List of overlay numbers to which framebuffer pixels go +phys_addr Physical address of the framebuffer +virt_addr Virtual address of the framebuffer +size Size of the framebuffer + +/sys/devices/platform/omapdss/overlay? directory: +enabled 0=off, 1=on +input_size width,height (ie. the framebuffer size) +manager Destination overlay manager name +name +output_size width,height +position x,y +screen_width width +global_alpha global alpha 0-255 0=transparent 255=opaque + +/sys/devices/platform/omapdss/manager? directory: +display Destination display +name +alpha_blending_enabled 0=off, 1=on +trans_key_enabled 0=off, 1=on +trans_key_type gfx-destination, video-source +trans_key_value transparency color key (RGB24) +default_color default background color (RGB24) + +/sys/devices/platform/omapdss/display? directory: + +=============== ============================================================= +ctrl_name Controller name +mirror 0=off, 1=on +update_mode 0=off, 1=auto, 2=manual +enabled 0=off, 1=on +name +rotate Rotation 0-3 for 0, 90, 180, 270 degrees +timings Display timings (pixclock,xres/hfp/hbp/hsw,yres/vfp/vbp/vsw) + When writing, two special timings are accepted for tv-out: + "pal" and "ntsc" +panel_name +tear_elim Tearing elimination 0=off, 1=on +output_type Output type (video encoder only): "composite" or "svideo" +=============== ============================================================= + +There are also some debugfs files at <debugfs>/omapdss/ which show information +about clocks and registers. + +Examples +-------- + +The following definitions have been made for the examples below:: + + ovl0=/sys/devices/platform/omapdss/overlay0 + ovl1=/sys/devices/platform/omapdss/overlay1 + ovl2=/sys/devices/platform/omapdss/overlay2 + + mgr0=/sys/devices/platform/omapdss/manager0 + mgr1=/sys/devices/platform/omapdss/manager1 + + lcd=/sys/devices/platform/omapdss/display0 + dvi=/sys/devices/platform/omapdss/display1 + tv=/sys/devices/platform/omapdss/display2 + + fb0=/sys/class/graphics/fb0 + fb1=/sys/class/graphics/fb1 + fb2=/sys/class/graphics/fb2 + +Default setup on OMAP3 SDP +-------------------------- + +Here's the default setup on OMAP3 SDP board. All planes go to LCD. DVI +and TV-out are not in use. The columns from left to right are: +framebuffers, overlays, overlay managers, displays. Framebuffers are +handled by omapfb, and the rest by the DSS:: + + FB0 --- GFX -\ DVI + FB1 --- VID1 --+- LCD ---- LCD + FB2 --- VID2 -/ TV ----- TV + +Example: Switch from LCD to DVI +------------------------------- + +:: + + w=`cat $dvi/timings | cut -d "," -f 2 | cut -d "/" -f 1` + h=`cat $dvi/timings | cut -d "," -f 3 | cut -d "/" -f 1` + + echo "0" > $lcd/enabled + echo "" > $mgr0/display + fbset -fb /dev/fb0 -xres $w -yres $h -vxres $w -vyres $h + # at this point you have to switch the dvi/lcd dip-switch from the omap board + echo "dvi" > $mgr0/display + echo "1" > $dvi/enabled + +After this the configuration looks like::: + + FB0 --- GFX -\ -- DVI + FB1 --- VID1 --+- LCD -/ LCD + FB2 --- VID2 -/ TV ----- TV + +Example: Clone GFX overlay to LCD and TV +---------------------------------------- + +:: + + w=`cat $tv/timings | cut -d "," -f 2 | cut -d "/" -f 1` + h=`cat $tv/timings | cut -d "," -f 3 | cut -d "/" -f 1` + + echo "0" > $ovl0/enabled + echo "0" > $ovl1/enabled + + echo "" > $fb1/overlays + echo "0,1" > $fb0/overlays + + echo "$w,$h" > $ovl1/output_size + echo "tv" > $ovl1/manager + + echo "1" > $ovl0/enabled + echo "1" > $ovl1/enabled + + echo "1" > $tv/enabled + +After this the configuration looks like (only relevant parts shown):: + + FB0 +-- GFX ---- LCD ---- LCD + \- VID1 ---- TV ---- TV + +Misc notes +---------- + +OMAP FB allocates the framebuffer memory using the standard dma allocator. You +can enable Contiguous Memory Allocator (CONFIG_CMA) to improve the dma +allocator, and if CMA is enabled, you use "cma=" kernel parameter to increase +the global memory area for CMA. + +Using DSI DPLL to generate pixel clock it is possible produce the pixel clock +of 86.5MHz (max possible), and with that you get 1280x1024@57 output from DVI. + +Rotation and mirroring currently only supports RGB565 and RGB8888 modes. VRFB +does not support mirroring. + +VRFB rotation requires much more memory than non-rotated framebuffer, so you +probably need to increase your vram setting before using VRFB rotation. Also, +many applications may not work with VRFB if they do not pay attention to all +framebuffer parameters. + +Kernel boot arguments +--------------------- + +omapfb.mode=<display>:<mode>[,...] + - Default video mode for specified displays. For example, + "dvi:800x400MR-24@60". See drivers/video/modedb.c. + There are also two special modes: "pal" and "ntsc" that + can be used to tv out. + +omapfb.vram=<fbnum>:<size>[@<physaddr>][,...] + - VRAM allocated for a framebuffer. Normally omapfb allocates vram + depending on the display size. With this you can manually allocate + more or define the physical address of each framebuffer. For example, + "1:4M" to allocate 4M for fb1. + +omapfb.debug=<y|n> + - Enable debug printing. You have to have OMAPFB debug support enabled + in kernel config. + +omapfb.test=<y|n> + - Draw test pattern to framebuffer whenever framebuffer settings change. + You need to have OMAPFB debug support enabled in kernel config. + +omapfb.vrfb=<y|n> + - Use VRFB rotation for all framebuffers. + +omapfb.rotate=<angle> + - Default rotation applied to all framebuffers. + 0 - 0 degree rotation + 1 - 90 degree rotation + 2 - 180 degree rotation + 3 - 270 degree rotation + +omapfb.mirror=<y|n> + - Default mirror for all framebuffers. Only works with DMA rotation. + +omapdss.def_disp=<display> + - Name of default display, to which all overlays will be connected. + Common examples are "lcd" or "tv". + +omapdss.debug=<y|n> + - Enable debug printing. You have to have DSS debug support enabled in + kernel config. + +TODO +---- + +DSS locking + +Error checking + +- Lots of checks are missing or implemented just as BUG() + +System DMA update for DSI + +- Can be used for RGB16 and RGB24P modes. Probably not for RGB24U (how + to skip the empty byte?) + +OMAP1 support + +- Not sure if needed diff --git a/Documentation/arch/arm/omap/index.rst b/Documentation/arch/arm/omap/index.rst new file mode 100644 index 000000000000..8b365b212e49 --- /dev/null +++ b/Documentation/arch/arm/omap/index.rst @@ -0,0 +1,12 @@ +.. SPDX-License-Identifier: GPL-2.0 + +======= +TI OMAP +======= + +.. toctree:: + :maxdepth: 1 + + omap + omap_pm + dss diff --git a/Documentation/arch/arm/omap/omap.rst b/Documentation/arch/arm/omap/omap.rst new file mode 100644 index 000000000000..f440c0f4613f --- /dev/null +++ b/Documentation/arch/arm/omap/omap.rst @@ -0,0 +1,18 @@ +============ +OMAP history +============ + +This file contains documentation for running mainline +kernel on omaps. + +====== ====================================================== +KERNEL NEW DEPENDENCIES +====== ====================================================== +v4.3+ Update is needed for custom .config files to make sure + CONFIG_REGULATOR_PBIAS is enabled for MMC1 to work + properly. + +v4.18+ Update is needed for custom .config files to make sure + CONFIG_MMC_SDHCI_OMAP is enabled for all MMC instances + to work in DRA7 and K2G based boards. +====== ====================================================== diff --git a/Documentation/arch/arm/omap/omap_pm.rst b/Documentation/arch/arm/omap/omap_pm.rst new file mode 100644 index 000000000000..a335e4c8ce2c --- /dev/null +++ b/Documentation/arch/arm/omap/omap_pm.rst @@ -0,0 +1,165 @@ +===================== +The OMAP PM interface +===================== + +This document describes the temporary OMAP PM interface. Driver +authors use these functions to communicate minimum latency or +throughput constraints to the kernel power management code. +Over time, the intention is to merge features from the OMAP PM +interface into the Linux PM QoS code. + +Drivers need to express PM parameters which: + +- support the range of power management parameters present in the TI SRF; + +- separate the drivers from the underlying PM parameter + implementation, whether it is the TI SRF or Linux PM QoS or Linux + latency framework or something else; + +- specify PM parameters in terms of fundamental units, such as + latency and throughput, rather than units which are specific to OMAP + or to particular OMAP variants; + +- allow drivers which are shared with other architectures (e.g., + DaVinci) to add these constraints in a way which won't affect non-OMAP + systems, + +- can be implemented immediately with minimal disruption of other + architectures. + + +This document proposes the OMAP PM interface, including the following +five power management functions for driver code: + +1. Set the maximum MPU wakeup latency:: + + (*pdata->set_max_mpu_wakeup_lat)(struct device *dev, unsigned long t) + +2. Set the maximum device wakeup latency:: + + (*pdata->set_max_dev_wakeup_lat)(struct device *dev, unsigned long t) + +3. Set the maximum system DMA transfer start latency (CORE pwrdm):: + + (*pdata->set_max_sdma_lat)(struct device *dev, long t) + +4. Set the minimum bus throughput needed by a device:: + + (*pdata->set_min_bus_tput)(struct device *dev, u8 agent_id, unsigned long r) + +5. Return the number of times the device has lost context:: + + (*pdata->get_dev_context_loss_count)(struct device *dev) + + +Further documentation for all OMAP PM interface functions can be +found in arch/arm/plat-omap/include/mach/omap-pm.h. + + +The OMAP PM layer is intended to be temporary +--------------------------------------------- + +The intention is that eventually the Linux PM QoS layer should support +the range of power management features present in OMAP3. As this +happens, existing drivers using the OMAP PM interface can be modified +to use the Linux PM QoS code; and the OMAP PM interface can disappear. + + +Driver usage of the OMAP PM functions +------------------------------------- + +As the 'pdata' in the above examples indicates, these functions are +exposed to drivers through function pointers in driver .platform_data +structures. The function pointers are initialized by the `board-*.c` +files to point to the corresponding OMAP PM functions: + +- set_max_dev_wakeup_lat will point to + omap_pm_set_max_dev_wakeup_lat(), etc. Other architectures which do + not support these functions should leave these function pointers set + to NULL. Drivers should use the following idiom:: + + if (pdata->set_max_dev_wakeup_lat) + (*pdata->set_max_dev_wakeup_lat)(dev, t); + +The most common usage of these functions will probably be to specify +the maximum time from when an interrupt occurs, to when the device +becomes accessible. To accomplish this, driver writers should use the +set_max_mpu_wakeup_lat() function to constrain the MPU wakeup +latency, and the set_max_dev_wakeup_lat() function to constrain the +device wakeup latency (from clk_enable() to accessibility). For +example:: + + /* Limit MPU wakeup latency */ + if (pdata->set_max_mpu_wakeup_lat) + (*pdata->set_max_mpu_wakeup_lat)(dev, tc); + + /* Limit device powerdomain wakeup latency */ + if (pdata->set_max_dev_wakeup_lat) + (*pdata->set_max_dev_wakeup_lat)(dev, td); + + /* total wakeup latency in this example: (tc + td) */ + +The PM parameters can be overwritten by calling the function again +with the new value. The settings can be removed by calling the +function with a t argument of -1 (except in the case of +set_max_bus_tput(), which should be called with an r argument of 0). + +The fifth function above, omap_pm_get_dev_context_loss_count(), +is intended as an optimization to allow drivers to determine whether the +device has lost its internal context. If context has been lost, the +driver must restore its internal context before proceeding. + + +Other specialized interface functions +------------------------------------- + +The five functions listed above are intended to be usable by any +device driver. DSPBridge and CPUFreq have a few special requirements. +DSPBridge expresses target DSP performance levels in terms of OPP IDs. +CPUFreq expresses target MPU performance levels in terms of MPU +frequency. The OMAP PM interface contains functions for these +specialized cases to convert that input information (OPPs/MPU +frequency) into the form that the underlying power management +implementation needs: + +6. `(*pdata->dsp_get_opp_table)(void)` + +7. `(*pdata->dsp_set_min_opp)(u8 opp_id)` + +8. `(*pdata->dsp_get_opp)(void)` + +9. `(*pdata->cpu_get_freq_table)(void)` + +10. `(*pdata->cpu_set_freq)(unsigned long f)` + +11. `(*pdata->cpu_get_freq)(void)` + +Customizing OPP for platform +============================ +Defining CONFIG_PM should enable OPP layer for the silicon +and the registration of OPP table should take place automatically. +However, in special cases, the default OPP table may need to be +tweaked, for e.g.: + + * enable default OPPs which are disabled by default, but which + could be enabled on a platform + * Disable an unsupported OPP on the platform + * Define and add a custom opp table entry + in these cases, the board file needs to do additional steps as follows: + +arch/arm/mach-omapx/board-xyz.c:: + + #include "pm.h" + .... + static void __init omap_xyz_init_irq(void) + { + .... + /* Initialize the default table */ + omapx_opp_init(); + /* Do customization to the defaults */ + .... + } + +NOTE: + omapx_opp_init will be omap3_opp_init or as required + based on the omap family. diff --git a/Documentation/arch/arm/porting.rst b/Documentation/arch/arm/porting.rst new file mode 100644 index 000000000000..bd21958bdb2d --- /dev/null +++ b/Documentation/arch/arm/porting.rst @@ -0,0 +1,137 @@ +======= +Porting +======= + +Taken from list archive at http://lists.arm.linux.org.uk/pipermail/linux-arm-kernel/2001-July/004064.html + +Initial definitions +------------------- + +The following symbol definitions rely on you knowing the translation that +__virt_to_phys() does for your machine. This macro converts the passed +virtual address to a physical address. Normally, it is simply: + + phys = virt - PAGE_OFFSET + PHYS_OFFSET + + +Decompressor Symbols +-------------------- + +ZTEXTADDR + Start address of decompressor. There's no point in talking about + virtual or physical addresses here, since the MMU will be off at + the time when you call the decompressor code. You normally call + the kernel at this address to start it booting. This doesn't have + to be located in RAM, it can be in flash or other read-only or + read-write addressable medium. + +ZBSSADDR + Start address of zero-initialised work area for the decompressor. + This must be pointing at RAM. The decompressor will zero initialise + this for you. Again, the MMU will be off. + +ZRELADDR + This is the address where the decompressed kernel will be written, + and eventually executed. The following constraint must be valid: + + __virt_to_phys(TEXTADDR) == ZRELADDR + + The initial part of the kernel is carefully coded to be position + independent. + +INITRD_PHYS + Physical address to place the initial RAM disk. Only relevant if + you are using the bootpImage stuff (which only works on the old + struct param_struct). + +INITRD_VIRT + Virtual address of the initial RAM disk. The following constraint + must be valid: + + __virt_to_phys(INITRD_VIRT) == INITRD_PHYS + +PARAMS_PHYS + Physical address of the struct param_struct or tag list, giving the + kernel various parameters about its execution environment. + + +Kernel Symbols +-------------- + +PHYS_OFFSET + Physical start address of the first bank of RAM. + +PAGE_OFFSET + Virtual start address of the first bank of RAM. During the kernel + boot phase, virtual address PAGE_OFFSET will be mapped to physical + address PHYS_OFFSET, along with any other mappings you supply. + This should be the same value as TASK_SIZE. + +TASK_SIZE + The maximum size of a user process in bytes. Since user space + always starts at zero, this is the maximum address that a user + process can access+1. The user space stack grows down from this + address. + + Any virtual address below TASK_SIZE is deemed to be user process + area, and therefore managed dynamically on a process by process + basis by the kernel. I'll call this the user segment. + + Anything above TASK_SIZE is common to all processes. I'll call + this the kernel segment. + + (In other words, you can't put IO mappings below TASK_SIZE, and + hence PAGE_OFFSET). + +TEXTADDR + Virtual start address of kernel, normally PAGE_OFFSET + 0x8000. + This is where the kernel image ends up. With the latest kernels, + it must be located at 32768 bytes into a 128MB region. Previous + kernels placed a restriction of 256MB here. + +DATAADDR + Virtual address for the kernel data segment. Must not be defined + when using the decompressor. + +VMALLOC_START / VMALLOC_END + Virtual addresses bounding the vmalloc() area. There must not be + any static mappings in this area; vmalloc will overwrite them. + The addresses must also be in the kernel segment (see above). + Normally, the vmalloc() area starts VMALLOC_OFFSET bytes above the + last virtual RAM address (found using variable high_memory). + +VMALLOC_OFFSET + Offset normally incorporated into VMALLOC_START to provide a hole + between virtual RAM and the vmalloc area. We do this to allow + out of bounds memory accesses (eg, something writing off the end + of the mapped memory map) to be caught. Normally set to 8MB. + +Architecture Specific Macros +---------------------------- + +BOOT_MEM(pram,pio,vio) + `pram` specifies the physical start address of RAM. Must always + be present, and should be the same as PHYS_OFFSET. + + `pio` is the physical address of an 8MB region containing IO for + use with the debugging macros in arch/arm/kernel/debug-armv.S. + + `vio` is the virtual address of the 8MB debugging region. + + It is expected that the debugging region will be re-initialised + by the architecture specific code later in the code (via the + MAPIO function). + +BOOT_PARAMS + Same as, and see PARAMS_PHYS. + +FIXUP(func) + Machine specific fixups, run before memory subsystems have been + initialised. + +MAPIO(func) + Machine specific function to map IO areas (including the debug + region above). + +INITIRQ(func) + Machine specific function to initialise interrupts. diff --git a/Documentation/arch/arm/pxa/mfp.rst b/Documentation/arch/arm/pxa/mfp.rst new file mode 100644 index 000000000000..ac34e5d7ee44 --- /dev/null +++ b/Documentation/arch/arm/pxa/mfp.rst @@ -0,0 +1,288 @@ +============================================== +MFP Configuration for PXA2xx/PXA3xx Processors +============================================== + + Eric Miao <eric.miao@marvell.com> + +MFP stands for Multi-Function Pin, which is the pin-mux logic on PXA3xx and +later PXA series processors. This document describes the existing MFP API, +and how board/platform driver authors could make use of it. + +Basic Concept +============= + +Unlike the GPIO alternate function settings on PXA25x and PXA27x, a new MFP +mechanism is introduced from PXA3xx to completely move the pin-mux functions +out of the GPIO controller. In addition to pin-mux configurations, the MFP +also controls the low power state, driving strength, pull-up/down and event +detection of each pin. Below is a diagram of internal connections between +the MFP logic and the remaining SoC peripherals:: + + +--------+ + | |--(GPIO19)--+ + | GPIO | | + | |--(GPIO...) | + +--------+ | + | +---------+ + +--------+ +------>| | + | PWM2 |--(PWM_OUT)-------->| MFP | + +--------+ +------>| |-------> to external PAD + | +---->| | + +--------+ | | +-->| | + | SSP2 |---(TXD)----+ | | +---------+ + +--------+ | | + | | + +--------+ | | + | Keypad |--(MKOUT4)----+ | + +--------+ | + | + +--------+ | + | UART2 |---(TXD)--------+ + +--------+ + +NOTE: the external pad is named as MFP_PIN_GPIO19, it doesn't necessarily +mean it's dedicated for GPIO19, only as a hint that internally this pin +can be routed from GPIO19 of the GPIO controller. + +To better understand the change from PXA25x/PXA27x GPIO alternate function +to this new MFP mechanism, here are several key points: + + 1. GPIO controller on PXA3xx is now a dedicated controller, same as other + internal controllers like PWM, SSP and UART, with 128 internal signals + which can be routed to external through one or more MFPs (e.g. GPIO<0> + can be routed through either MFP_PIN_GPIO0 as well as MFP_PIN_GPIO0_2, + see arch/arm/mach-pxa/mfp-pxa300.h) + + 2. Alternate function configuration is removed from this GPIO controller, + the remaining functions are pure GPIO-specific, i.e. + + - GPIO signal level control + - GPIO direction control + - GPIO level change detection + + 3. Low power state for each pin is now controlled by MFP, this means the + PGSRx registers on PXA2xx are now useless on PXA3xx + + 4. Wakeup detection is now controlled by MFP, PWER does not control the + wakeup from GPIO(s) any more, depending on the sleeping state, ADxER + (as defined in pxa3xx-regs.h) controls the wakeup from MFP + +NOTE: with such a clear separation of MFP and GPIO, by GPIO<xx> we normally +mean it is a GPIO signal, and by MFP<xxx> or pin xxx, we mean a physical +pad (or ball). + +MFP API Usage +============= + +For board code writers, here are some guidelines: + +1. include ONE of the following header files in your <board>.c: + + - #include "mfp-pxa25x.h" + - #include "mfp-pxa27x.h" + - #include "mfp-pxa300.h" + - #include "mfp-pxa320.h" + - #include "mfp-pxa930.h" + + NOTE: only one file in your <board>.c, depending on the processors used, + because pin configuration definitions may conflict in these file (i.e. + same name, different meaning and settings on different processors). E.g. + for zylonite platform, which support both PXA300/PXA310 and PXA320, two + separate files are introduced: zylonite_pxa300.c and zylonite_pxa320.c + (in addition to handle MFP configuration differences, they also handle + the other differences between the two combinations). + + NOTE: PXA300 and PXA310 are almost identical in pin configurations (with + PXA310 supporting some additional ones), thus the difference is actually + covered in a single mfp-pxa300.h. + +2. prepare an array for the initial pin configurations, e.g.:: + + static unsigned long mainstone_pin_config[] __initdata = { + /* Chip Select */ + GPIO15_nCS_1, + + /* LCD - 16bpp Active TFT */ + GPIOxx_TFT_LCD_16BPP, + GPIO16_PWM0_OUT, /* Backlight */ + + /* MMC */ + GPIO32_MMC_CLK, + GPIO112_MMC_CMD, + GPIO92_MMC_DAT_0, + GPIO109_MMC_DAT_1, + GPIO110_MMC_DAT_2, + GPIO111_MMC_DAT_3, + + ... + + /* GPIO */ + GPIO1_GPIO | WAKEUP_ON_EDGE_BOTH, + }; + + a) once the pin configurations are passed to pxa{2xx,3xx}_mfp_config(), + and written to the actual registers, they are useless and may discard, + adding '__initdata' will help save some additional bytes here. + + b) when there is only one possible pin configurations for a component, + some simplified definitions can be used, e.g. GPIOxx_TFT_LCD_16BPP on + PXA25x and PXA27x processors + + c) if by board design, a pin can be configured to wake up the system + from low power state, it can be 'OR'ed with any of: + + WAKEUP_ON_EDGE_BOTH + WAKEUP_ON_EDGE_RISE + WAKEUP_ON_EDGE_FALL + WAKEUP_ON_LEVEL_HIGH - specifically for enabling of keypad GPIOs, + + to indicate that this pin has the capability of wake-up the system, + and on which edge(s). This, however, doesn't necessarily mean the + pin _will_ wakeup the system, it will only when set_irq_wake() is + invoked with the corresponding GPIO IRQ (GPIO_IRQ(xx) or gpio_to_irq()) + and eventually calls gpio_set_wake() for the actual register setting. + + d) although PXA3xx MFP supports edge detection on each pin, the + internal logic will only wakeup the system when those specific bits + in ADxER registers are set, which can be well mapped to the + corresponding peripheral, thus set_irq_wake() can be called with + the peripheral IRQ to enable the wakeup. + + +MFP on PXA3xx +============= + +Every external I/O pad on PXA3xx (excluding those for special purpose) has +one MFP logic associated, and is controlled by one MFP register (MFPR). + +The MFPR has the following bit definitions (for PXA300/PXA310/PXA320):: + + 31 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 + +-------------------------+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + | RESERVED |PS|PU|PD| DRIVE |SS|SD|SO|EC|EF|ER|--| AF_SEL | + +-------------------------+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + + Bit 3: RESERVED + Bit 4: EDGE_RISE_EN - enable detection of rising edge on this pin + Bit 5: EDGE_FALL_EN - enable detection of falling edge on this pin + Bit 6: EDGE_CLEAR - disable edge detection on this pin + Bit 7: SLEEP_OE_N - enable outputs during low power modes + Bit 8: SLEEP_DATA - output data on the pin during low power modes + Bit 9: SLEEP_SEL - selection control for low power modes signals + Bit 13: PULLDOWN_EN - enable the internal pull-down resistor on this pin + Bit 14: PULLUP_EN - enable the internal pull-up resistor on this pin + Bit 15: PULL_SEL - pull state controlled by selected alternate function + (0) or by PULL{UP,DOWN}_EN bits (1) + + Bit 0 - 2: AF_SEL - alternate function selection, 8 possibilities, from 0-7 + Bit 10-12: DRIVE - drive strength and slew rate + 0b000 - fast 1mA + 0b001 - fast 2mA + 0b002 - fast 3mA + 0b003 - fast 4mA + 0b004 - slow 6mA + 0b005 - fast 6mA + 0b006 - slow 10mA + 0b007 - fast 10mA + +MFP Design for PXA2xx/PXA3xx +============================ + +Due to the difference of pin-mux handling between PXA2xx and PXA3xx, a unified +MFP API is introduced to cover both series of processors. + +The basic idea of this design is to introduce definitions for all possible pin +configurations, these definitions are processor and platform independent, and +the actual API invoked to convert these definitions into register settings and +make them effective there-after. + +Files Involved +-------------- + + - arch/arm/mach-pxa/include/mach/mfp.h + + for + 1. Unified pin definitions - enum constants for all configurable pins + 2. processor-neutral bit definitions for a possible MFP configuration + + - arch/arm/mach-pxa/mfp-pxa3xx.h + + for PXA3xx specific MFPR register bit definitions and PXA3xx common pin + configurations + + - arch/arm/mach-pxa/mfp-pxa2xx.h + + for PXA2xx specific definitions and PXA25x/PXA27x common pin configurations + + - arch/arm/mach-pxa/mfp-pxa25x.h + arch/arm/mach-pxa/mfp-pxa27x.h + arch/arm/mach-pxa/mfp-pxa300.h + arch/arm/mach-pxa/mfp-pxa320.h + arch/arm/mach-pxa/mfp-pxa930.h + + for processor specific definitions + + - arch/arm/mach-pxa/mfp-pxa3xx.c + - arch/arm/mach-pxa/mfp-pxa2xx.c + + for implementation of the pin configuration to take effect for the actual + processor. + +Pin Configuration +----------------- + + The following comments are copied from mfp.h (see the actual source code + for most updated info):: + + /* + * a possible MFP configuration is represented by a 32-bit integer + * + * bit 0.. 9 - MFP Pin Number (1024 Pins Maximum) + * bit 10..12 - Alternate Function Selection + * bit 13..15 - Drive Strength + * bit 16..18 - Low Power Mode State + * bit 19..20 - Low Power Mode Edge Detection + * bit 21..22 - Run Mode Pull State + * + * to facilitate the definition, the following macros are provided + * + * MFP_CFG_DEFAULT - default MFP configuration value, with + * alternate function = 0, + * drive strength = fast 3mA (MFP_DS03X) + * low power mode = default + * edge detection = none + * + * MFP_CFG - default MFPR value with alternate function + * MFP_CFG_DRV - default MFPR value with alternate function and + * pin drive strength + * MFP_CFG_LPM - default MFPR value with alternate function and + * low power mode + * MFP_CFG_X - default MFPR value with alternate function, + * pin drive strength and low power mode + */ + + Examples of pin configurations are:: + + #define GPIO94_SSP3_RXD MFP_CFG_X(GPIO94, AF1, DS08X, FLOAT) + + which reads GPIO94 can be configured as SSP3_RXD, with alternate function + selection of 1, driving strength of 0b101, and a float state in low power + modes. + + NOTE: this is the default setting of this pin being configured as SSP3_RXD + which can be modified a bit in board code, though it is not recommended to + do so, simply because this default setting is usually carefully encoded, + and is supposed to work in most cases. + +Register Settings +----------------- + + Register settings on PXA3xx for a pin configuration is actually very + straight-forward, most bits can be converted directly into MFPR value + in a easier way. Two sets of MFPR values are calculated: the run-time + ones and the low power mode ones, to allow different settings. + + The conversion from a generic pin configuration to the actual register + settings on PXA2xx is a bit complicated: many registers are involved, + including GAFRx, GPDRx, PGSRx, PWER, PKWR, PFER and PRER. Please see + mfp-pxa2xx.c for how the conversion is made. diff --git a/Documentation/arch/arm/sa1100/assabet.rst b/Documentation/arch/arm/sa1100/assabet.rst new file mode 100644 index 000000000000..a761e128fb08 --- /dev/null +++ b/Documentation/arch/arm/sa1100/assabet.rst @@ -0,0 +1,301 @@ +============================================ +The Intel Assabet (SA-1110 evaluation) board +============================================ + +Please see: +http://developer.intel.com + +Also some notes from John G Dorsey <jd5q@andrew.cmu.edu>: +http://www.cs.cmu.edu/~wearable/software/assabet.html + + +Building the kernel +------------------- + +To build the kernel with current defaults:: + + make assabet_defconfig + make oldconfig + make zImage + +The resulting kernel image should be available in linux/arch/arm/boot/zImage. + + +Installing a bootloader +----------------------- + +A couple of bootloaders able to boot Linux on Assabet are available: + +BLOB (http://www.lartmaker.nl/lartware/blob/) + + BLOB is a bootloader used within the LART project. Some contributed + patches were merged into BLOB to add support for Assabet. + +Compaq's Bootldr + John Dorsey's patch for Assabet support +(http://www.handhelds.org/Compaq/bootldr.html) +(http://www.wearablegroup.org/software/bootldr/) + + Bootldr is the bootloader developed by Compaq for the iPAQ Pocket PC. + John Dorsey has produced add-on patches to add support for Assabet and + the JFFS filesystem. + +RedBoot (http://sources.redhat.com/redboot/) + + RedBoot is a bootloader developed by Red Hat based on the eCos RTOS + hardware abstraction layer. It supports Assabet amongst many other + hardware platforms. + +RedBoot is currently the recommended choice since it's the only one to have +networking support, and is the most actively maintained. + +Brief examples on how to boot Linux with RedBoot are shown below. But first +you need to have RedBoot installed in your flash memory. A known to work +precompiled RedBoot binary is available from the following location: + +- ftp://ftp.netwinder.org/users/n/nico/ +- ftp://ftp.arm.linux.org.uk/pub/linux/arm/people/nico/ +- ftp://ftp.handhelds.org/pub/linux/arm/sa-1100-patches/ + +Look for redboot-assabet*.tgz. Some installation infos are provided in +redboot-assabet*.txt. + + +Initial RedBoot configuration +----------------------------- + +The commands used here are explained in The RedBoot User's Guide available +on-line at http://sources.redhat.com/ecos/docs.html. +Please refer to it for explanations. + +If you have a CF network card (my Assabet kit contained a CF+ LP-E from +Socket Communications Inc.), you should strongly consider using it for TFTP +file transfers. You must insert it before RedBoot runs since it can't detect +it dynamically. + +To initialize the flash directory:: + + fis init -f + +To initialize the non-volatile settings, like whether you want to use BOOTP or +a static IP address, etc, use this command:: + + fconfig -i + + +Writing a kernel image into flash +--------------------------------- + +First, the kernel image must be loaded into RAM. If you have the zImage file +available on a TFTP server:: + + load zImage -r -b 0x100000 + +If you rather want to use Y-Modem upload over the serial port:: + + load -m ymodem -r -b 0x100000 + +To write it to flash:: + + fis create "Linux kernel" -b 0x100000 -l 0xc0000 + + +Booting the kernel +------------------ + +The kernel still requires a filesystem to boot. A ramdisk image can be loaded +as follows:: + + load ramdisk_image.gz -r -b 0x800000 + +Again, Y-Modem upload can be used instead of TFTP by replacing the file name +by '-y ymodem'. + +Now the kernel can be retrieved from flash like this:: + + fis load "Linux kernel" + +or loaded as described previously. To boot the kernel:: + + exec -b 0x100000 -l 0xc0000 + +The ramdisk image could be stored into flash as well, but there are better +solutions for on-flash filesystems as mentioned below. + + +Using JFFS2 +----------- + +Using JFFS2 (the Second Journalling Flash File System) is probably the most +convenient way to store a writable filesystem into flash. JFFS2 is used in +conjunction with the MTD layer which is responsible for low-level flash +management. More information on the Linux MTD can be found on-line at: +http://www.linux-mtd.infradead.org/. A JFFS howto with some infos about +creating JFFS/JFFS2 images is available from the same site. + +For instance, a sample JFFS2 image can be retrieved from the same FTP sites +mentioned below for the precompiled RedBoot image. + +To load this file:: + + load sample_img.jffs2 -r -b 0x100000 + +The result should look like:: + + RedBoot> load sample_img.jffs2 -r -b 0x100000 + Raw file loaded 0x00100000-0x00377424 + +Now we must know the size of the unallocated flash:: + + fis free + +Result:: + + RedBoot> fis free + 0x500E0000 .. 0x503C0000 + +The values above may be different depending on the size of the filesystem and +the type of flash. See their usage below as an example and take care of +substituting yours appropriately. + +We must determine some values:: + + size of unallocated flash: 0x503c0000 - 0x500e0000 = 0x2e0000 + size of the filesystem image: 0x00377424 - 0x00100000 = 0x277424 + +We want to fit the filesystem image of course, but we also want to give it all +the remaining flash space as well. To write it:: + + fis unlock -f 0x500E0000 -l 0x2e0000 + fis erase -f 0x500E0000 -l 0x2e0000 + fis write -b 0x100000 -l 0x277424 -f 0x500E0000 + fis create "JFFS2" -n -f 0x500E0000 -l 0x2e0000 + +Now the filesystem is associated to a MTD "partition" once Linux has discovered +what they are in the boot process. From Redboot, the 'fis list' command +displays them:: + + RedBoot> fis list + Name FLASH addr Mem addr Length Entry point + RedBoot 0x50000000 0x50000000 0x00020000 0x00000000 + RedBoot config 0x503C0000 0x503C0000 0x00020000 0x00000000 + FIS directory 0x503E0000 0x503E0000 0x00020000 0x00000000 + Linux kernel 0x50020000 0x00100000 0x000C0000 0x00000000 + JFFS2 0x500E0000 0x500E0000 0x002E0000 0x00000000 + +However Linux should display something like:: + + SA1100 flash: probing 32-bit flash bus + SA1100 flash: Found 2 x16 devices at 0x0 in 32-bit mode + Using RedBoot partition definition + Creating 5 MTD partitions on "SA1100 flash": + 0x00000000-0x00020000 : "RedBoot" + 0x00020000-0x000e0000 : "Linux kernel" + 0x000e0000-0x003c0000 : "JFFS2" + 0x003c0000-0x003e0000 : "RedBoot config" + 0x003e0000-0x00400000 : "FIS directory" + +What's important here is the position of the partition we are interested in, +which is the third one. Within Linux, this correspond to /dev/mtdblock2. +Therefore to boot Linux with the kernel and its root filesystem in flash, we +need this RedBoot command:: + + fis load "Linux kernel" + exec -b 0x100000 -l 0xc0000 -c "root=/dev/mtdblock2" + +Of course other filesystems than JFFS might be used, like cramfs for example. +You might want to boot with a root filesystem over NFS, etc. It is also +possible, and sometimes more convenient, to flash a filesystem directly from +within Linux while booted from a ramdisk or NFS. The Linux MTD repository has +many tools to deal with flash memory as well, to erase it for example. JFFS2 +can then be mounted directly on a freshly erased partition and files can be +copied over directly. Etc... + + +RedBoot scripting +----------------- + +All the commands above aren't so useful if they have to be typed in every +time the Assabet is rebooted. Therefore it's possible to automate the boot +process using RedBoot's scripting capability. + +For example, I use this to boot Linux with both the kernel and the ramdisk +images retrieved from a TFTP server on the network:: + + RedBoot> fconfig + Run script at boot: false true + Boot script: + Enter script, terminate with empty line + >> load zImage -r -b 0x100000 + >> load ramdisk_ks.gz -r -b 0x800000 + >> exec -b 0x100000 -l 0xc0000 + >> + Boot script timeout (1000ms resolution): 3 + Use BOOTP for network configuration: true + GDB connection port: 9000 + Network debug at boot time: false + Update RedBoot non-volatile configuration - are you sure (y/n)? y + +Then, rebooting the Assabet is just a matter of waiting for the login prompt. + + + +Nicolas Pitre +nico@fluxnic.net + +June 12, 2001 + + +Status of peripherals in -rmk tree (updated 14/10/2001) +------------------------------------------------------- + +Assabet: + Serial ports: + Radio: TX, RX, CTS, DSR, DCD, RI + - PM: Not tested. + - COM: TX, RX, CTS, DSR, DCD, RTS, DTR, PM + - PM: Not tested. + - I2C: Implemented, not fully tested. + - L3: Fully tested, pass. + - PM: Not tested. + + Video: + - LCD: Fully tested. PM + + (LCD doesn't like being blanked with neponset connected) + + - Video out: Not fully + + Audio: + UDA1341: + - Playback: Fully tested, pass. + - Record: Implemented, not tested. + - PM: Not tested. + + UCB1200: + - Audio play: Implemented, not heavily tested. + - Audio rec: Implemented, not heavily tested. + - Telco audio play: Implemented, not heavily tested. + - Telco audio rec: Implemented, not heavily tested. + - POTS control: No + - Touchscreen: Yes + - PM: Not tested. + + Other: + - PCMCIA: + - LPE: Fully tested, pass. + - USB: No + - IRDA: + - SIR: Fully tested, pass. + - FIR: Fully tested, pass. + - PM: Not tested. + +Neponset: + Serial ports: + - COM1,2: TX, RX, CTS, DSR, DCD, RTS, DTR + - PM: Not tested. + - USB: Implemented, not heavily tested. + - PCMCIA: Implemented, not heavily tested. + - CF: Implemented, not heavily tested. + - PM: Not tested. + +More stuff can be found in the -np (Nicolas Pitre's) tree. diff --git a/Documentation/arch/arm/sa1100/cerf.rst b/Documentation/arch/arm/sa1100/cerf.rst new file mode 100644 index 000000000000..7fa71b609bf9 --- /dev/null +++ b/Documentation/arch/arm/sa1100/cerf.rst @@ -0,0 +1,35 @@ +============== +CerfBoard/Cube +============== + +*** The StrongARM version of the CerfBoard/Cube has been discontinued *** + +The Intrinsyc CerfBoard is a StrongARM 1110-based computer on a board +that measures approximately 2" square. It includes an Ethernet +controller, an RS232-compatible serial port, a USB function port, and +one CompactFlash+ slot on the back. Pictures can be found at the +Intrinsyc website, http://www.intrinsyc.com. + +This document describes the support in the Linux kernel for the +Intrinsyc CerfBoard. + +Supported in this version +========================= + + - CompactFlash+ slot (select PCMCIA in General Setup and any options + that may be required) + - Onboard Crystal CS8900 Ethernet controller (Cerf CS8900A support in + Network Devices) + - Serial ports with a serial console (hardcoded to 38400 8N1) + +In order to get this kernel onto your Cerf, you need a server that runs +both BOOTP and TFTP. Detailed instructions should have come with your +evaluation kit on how to use the bootloader. This series of commands +will suffice:: + + make ARCH=arm CROSS_COMPILE=arm-linux- cerfcube_defconfig + make ARCH=arm CROSS_COMPILE=arm-linux- zImage + make ARCH=arm CROSS_COMPILE=arm-linux- modules + cp arch/arm/boot/zImage <TFTP directory> + +support@intrinsyc.com diff --git a/Documentation/arch/arm/sa1100/index.rst b/Documentation/arch/arm/sa1100/index.rst new file mode 100644 index 000000000000..c9aed43280ff --- /dev/null +++ b/Documentation/arch/arm/sa1100/index.rst @@ -0,0 +1,13 @@ +.. SPDX-License-Identifier: GPL-2.0 + +==================== +Intel StrongARM 1100 +==================== + +.. toctree:: + :maxdepth: 1 + + assabet + cerf + lart + serial_uart diff --git a/Documentation/arch/arm/sa1100/lart.rst b/Documentation/arch/arm/sa1100/lart.rst new file mode 100644 index 000000000000..94c0568d1095 --- /dev/null +++ b/Documentation/arch/arm/sa1100/lart.rst @@ -0,0 +1,15 @@ +==================================== +Linux Advanced Radio Terminal (LART) +==================================== + +The LART is a small (7.5 x 10cm) SA-1100 board, designed for embedded +applications. It has 32 MB DRAM, 4MB Flash ROM, double RS232 and all +other StrongARM-gadgets. Almost all SA signals are directly accessible +through a number of connectors. The powersupply accepts voltages +between 3.5V and 16V and is overdimensioned to support a range of +daughterboards. A quad Ethernet / IDE / PS2 / sound daughterboard +is under development, with plenty of others in different stages of +planning. + +The hardware designs for this board have been released under an open license; +see the LART page at http://www.lartmaker.nl/ for more information. diff --git a/Documentation/arch/arm/sa1100/serial_uart.rst b/Documentation/arch/arm/sa1100/serial_uart.rst new file mode 100644 index 000000000000..ea983642b9be --- /dev/null +++ b/Documentation/arch/arm/sa1100/serial_uart.rst @@ -0,0 +1,51 @@ +================== +SA1100 serial port +================== + +The SA1100 serial port had its major/minor numbers officially assigned:: + + > Date: Sun, 24 Sep 2000 21:40:27 -0700 + > From: H. Peter Anvin <hpa@transmeta.com> + > To: Nicolas Pitre <nico@CAM.ORG> + > Cc: Device List Maintainer <device@lanana.org> + > Subject: Re: device + > + > Okay. Note that device numbers 204 and 205 are used for "low density + > serial devices", so you will have a range of minors on those majors (the + > tty device layer handles this just fine, so you don't have to worry about + > doing anything special.) + > + > So your assignments are: + > + > 204 char Low-density serial ports + > 5 = /dev/ttySA0 SA1100 builtin serial port 0 + > 6 = /dev/ttySA1 SA1100 builtin serial port 1 + > 7 = /dev/ttySA2 SA1100 builtin serial port 2 + > + > 205 char Low-density serial ports (alternate device) + > 5 = /dev/cusa0 Callout device for ttySA0 + > 6 = /dev/cusa1 Callout device for ttySA1 + > 7 = /dev/cusa2 Callout device for ttySA2 + > + +You must create those inodes in /dev on the root filesystem used +by your SA1100-based device:: + + mknod ttySA0 c 204 5 + mknod ttySA1 c 204 6 + mknod ttySA2 c 204 7 + mknod cusa0 c 205 5 + mknod cusa1 c 205 6 + mknod cusa2 c 205 7 + +In addition to the creation of the appropriate device nodes above, you +must ensure your user space applications make use of the correct device +name. The classic example is the content of the /etc/inittab file where +you might have a getty process started on ttyS0. + +In this case: + +- replace occurrences of ttyS0 with ttySA0, ttyS1 with ttySA1, etc. + +- don't forget to add 'ttySA0', 'console', or the appropriate tty name + in /etc/securetty for root to be allowed to login as well. diff --git a/Documentation/arch/arm/samsung/bootloader-interface.rst b/Documentation/arch/arm/samsung/bootloader-interface.rst new file mode 100644 index 000000000000..a56f325dae78 --- /dev/null +++ b/Documentation/arch/arm/samsung/bootloader-interface.rst @@ -0,0 +1,81 @@ +========================================================== +Interface between kernel and boot loaders on Exynos boards +========================================================== + +Author: Krzysztof Kozlowski + +Date : 6 June 2015 + +The document tries to describe currently used interface between Linux kernel +and boot loaders on Samsung Exynos based boards. This is not a definition +of interface but rather a description of existing state, a reference +for information purpose only. + +In the document "boot loader" means any of following: U-boot, proprietary +SBOOT or any other firmware for ARMv7 and ARMv8 initializing the board before +executing kernel. + + +1. Non-Secure mode + +Address: sysram_ns_base_addr + +============= ============================================ ================== +Offset Value Purpose +============= ============================================ ================== +0x08 exynos_cpu_resume_ns, mcpm_entry_point System suspend +0x0c 0x00000bad (Magic cookie) System suspend +0x1c exynos4_secondary_startup Secondary CPU boot +0x1c + 4*cpu exynos4_secondary_startup (Exynos4412) Secondary CPU boot +0x20 0xfcba0d10 (Magic cookie) AFTR +0x24 exynos_cpu_resume_ns AFTR +0x28 + 4*cpu 0x8 (Magic cookie, Exynos3250) AFTR +0x28 0x0 or last value during resume (Exynos542x) System suspend +============= ============================================ ================== + + +2. Secure mode + +Address: sysram_base_addr + +============= ============================================ ================== +Offset Value Purpose +============= ============================================ ================== +0x00 exynos4_secondary_startup Secondary CPU boot +0x04 exynos4_secondary_startup (Exynos542x) Secondary CPU boot +4*cpu exynos4_secondary_startup (Exynos4412) Secondary CPU boot +0x20 exynos_cpu_resume (Exynos4210 r1.0) AFTR +0x24 0xfcba0d10 (Magic cookie, Exynos4210 r1.0) AFTR +============= ============================================ ================== + +Address: pmu_base_addr + +============= ============================================ ================== +Offset Value Purpose +============= ============================================ ================== +0x0800 exynos_cpu_resume AFTR, suspend +0x0800 mcpm_entry_point (Exynos542x with MCPM) AFTR, suspend +0x0804 0xfcba0d10 (Magic cookie) AFTR +0x0804 0x00000bad (Magic cookie) System suspend +0x0814 exynos4_secondary_startup (Exynos4210 r1.1) Secondary CPU boot +0x0818 0xfcba0d10 (Magic cookie, Exynos4210 r1.1) AFTR +0x081C exynos_cpu_resume (Exynos4210 r1.1) AFTR +============= ============================================ ================== + +3. Other (regardless of secure/non-secure mode) + +Address: pmu_base_addr + +============= =============================== =============================== +Offset Value Purpose +============= =============================== =============================== +0x0908 Non-zero Secondary CPU boot up indicator + on Exynos3250 and Exynos542x +============= =============================== =============================== + + +4. Glossary + +AFTR - ARM Off Top Running, a low power mode, Cortex cores and many other +modules are power gated, except the TOP modules +MCPM - Multi-Cluster Power Management diff --git a/Documentation/arch/arm/samsung/clksrc-change-registers.awk b/Documentation/arch/arm/samsung/clksrc-change-registers.awk new file mode 100755 index 000000000000..7be1b8aa7cd9 --- /dev/null +++ b/Documentation/arch/arm/samsung/clksrc-change-registers.awk @@ -0,0 +1,166 @@ +#!/usr/bin/awk -f +# +# Copyright 2010 Ben Dooks <ben-linux@fluff.org> +# +# Released under GPLv2 + +# example usage +# ./clksrc-change-registers.awk arch/arm/plat-s5pc1xx/include/plat/regs-clock.h < src > dst + +function extract_value(s) +{ + eqat = index(s, "=") + comat = index(s, ",") + return substr(s, eqat+2, (comat-eqat)-2) +} + +function remove_brackets(b) +{ + return substr(b, 2, length(b)-2) +} + +function splitdefine(l, p) +{ + r = split(l, tp) + + p[0] = tp[2] + p[1] = remove_brackets(tp[3]) +} + +function find_length(f) +{ + if (0) + printf "find_length " f "\n" > "/dev/stderr" + + if (f ~ /0x1/) + return 1 + else if (f ~ /0x3/) + return 2 + else if (f ~ /0x7/) + return 3 + else if (f ~ /0xf/) + return 4 + + printf "unknown length " f "\n" > "/dev/stderr" + exit +} + +function find_shift(s) +{ + id = index(s, "<") + if (id <= 0) { + printf "cannot find shift " s "\n" > "/dev/stderr" + exit + } + + return substr(s, id+2) +} + + +BEGIN { + if (ARGC < 2) { + print "too few arguments" > "/dev/stderr" + exit + } + +# read the header file and find the mask values that we will need +# to replace and create an associative array of values + + while (getline line < ARGV[1] > 0) { + if (line ~ /\#define.*_MASK/ && + !(line ~ /USB_SIG_MASK/)) { + splitdefine(line, fields) + name = fields[0] + if (0) + printf "MASK " line "\n" > "/dev/stderr" + dmask[name,0] = find_length(fields[1]) + dmask[name,1] = find_shift(fields[1]) + if (0) + printf "=> '" name "' LENGTH=" dmask[name,0] " SHIFT=" dmask[name,1] "\n" > "/dev/stderr" + } else { + } + } + + delete ARGV[1] +} + +/clksrc_clk.*=.*{/ { + shift="" + mask="" + divshift="" + reg_div="" + reg_src="" + indent=1 + + print $0 + + for(; indent >= 1;) { + if ((getline line) <= 0) { + printf "unexpected end of file" > "/dev/stderr" + exit 1; + } + + if (line ~ /\.shift/) { + shift = extract_value(line) + } else if (line ~ /\.mask/) { + mask = extract_value(line) + } else if (line ~ /\.reg_divider/) { + reg_div = extract_value(line) + } else if (line ~ /\.reg_source/) { + reg_src = extract_value(line) + } else if (line ~ /\.divider_shift/) { + divshift = extract_value(line) + } else if (line ~ /{/) { + indent++ + print line + } else if (line ~ /}/) { + indent-- + + if (indent == 0) { + if (0) { + printf "shift '" shift "' ='" dmask[shift,0] "'\n" > "/dev/stderr" + printf "mask '" mask "'\n" > "/dev/stderr" + printf "dshft '" divshift "'\n" > "/dev/stderr" + printf "rdiv '" reg_div "'\n" > "/dev/stderr" + printf "rsrc '" reg_src "'\n" > "/dev/stderr" + } + + generated = mask + sub(reg_src, reg_div, generated) + + if (0) { + printf "/* rsrc " reg_src " */\n" + printf "/* rdiv " reg_div " */\n" + printf "/* shift " shift " */\n" + printf "/* mask " mask " */\n" + printf "/* generated " generated " */\n" + } + + if (reg_div != "") { + printf "\t.reg_div = { " + printf ".reg = " reg_div ", " + printf ".shift = " dmask[generated,1] ", " + printf ".size = " dmask[generated,0] ", " + printf "},\n" + } + + printf "\t.reg_src = { " + printf ".reg = " reg_src ", " + printf ".shift = " dmask[mask,1] ", " + printf ".size = " dmask[mask,0] ", " + + printf "},\n" + + } + + print line + } else { + print line + } + + if (0) + printf indent ":" line "\n" > "/dev/stderr" + } +} + +// && ! /clksrc_clk.*=.*{/ { print $0 } diff --git a/Documentation/arch/arm/samsung/gpio.rst b/Documentation/arch/arm/samsung/gpio.rst new file mode 100644 index 000000000000..27fae0d50361 --- /dev/null +++ b/Documentation/arch/arm/samsung/gpio.rst @@ -0,0 +1,32 @@ +=========================== +Samsung GPIO implementation +=========================== + +Introduction +------------ + +This outlines the Samsung GPIO implementation and the architecture +specific calls provided alongside the drivers/gpio core. + + +GPIOLIB integration +------------------- + +The gpio implementation uses gpiolib as much as possible, only providing +specific calls for the items that require Samsung specific handling, such +as pin special-function or pull resistor control. + +GPIO numbering is synchronised between the Samsung and gpiolib system. + + +PIN configuration +----------------- + +Pin configuration is specific to the Samsung architecture, with each SoC +registering the necessary information for the core gpio configuration +implementation to configure pins as necessary. + +The s3c_gpio_cfgpin() and s3c_gpio_setpull() provide the means for a +driver or machine to change gpio configuration. + +See arch/arm/mach-s3c/gpio-cfg.h for more information on these functions. diff --git a/Documentation/arch/arm/samsung/index.rst b/Documentation/arch/arm/samsung/index.rst new file mode 100644 index 000000000000..8142cce3d23e --- /dev/null +++ b/Documentation/arch/arm/samsung/index.rst @@ -0,0 +1,12 @@ +.. SPDX-License-Identifier: GPL-2.0 + +=========== +Samsung SoC +=========== + +.. toctree:: + :maxdepth: 1 + + gpio + bootloader-interface + overview diff --git a/Documentation/arch/arm/samsung/overview.rst b/Documentation/arch/arm/samsung/overview.rst new file mode 100644 index 000000000000..8b15a190169b --- /dev/null +++ b/Documentation/arch/arm/samsung/overview.rst @@ -0,0 +1,76 @@ +========================== +Samsung ARM Linux Overview +========================== + +Introduction +------------ + + The Samsung range of ARM SoCs spans many similar devices, from the initial + ARM9 through to the newest ARM cores. This document shows an overview of + the current kernel support, how to use it and where to find the code + that supports this. + + The currently supported SoCs are: + + - S3C64XX: S3C6400 and S3C6410 + - S5PC110 / S5PV210 + + +Configuration +------------- + + A number of configurations are supplied, as there is no current way of + unifying all the SoCs into one kernel. + + s5pc110_defconfig + - S5PC110 specific default configuration + s5pv210_defconfig + - S5PV210 specific default configuration + + +Layout +------ + + The directory layout is currently being restructured, and consists of + several platform directories and then the machine specific directories + of the CPUs being built for. + + plat-samsung provides the base for all the implementations, and is the + last in the line of include directories that are processed for the build + specific information. It contains the base clock, GPIO and device definitions + to get the system running. + + plat-s5p is for s5p specific builds, and contains common support for the + S5P specific systems. Not all S5Ps use all the features in this directory + due to differences in the hardware. + + +Layout changes +-------------- + + The old plat-s3c and plat-s5pc1xx directories have been removed, with + support moved to either plat-samsung or plat-s5p as necessary. These moves + where to simplify the include and dependency issues involved with having + so many different platform directories. + + +Port Contributors +----------------- + + Ben Dooks (BJD) + Vincent Sanders + Herbert Potzl + Arnaud Patard (RTP) + Roc Wu + Klaus Fetscher + Dimitry Andric + Shannon Holland + Guillaume Gourat (NexVision) + Christer Weinigel (wingel) (Acer N30) + Lucas Correia Villa Real (S3C2400 port) + + +Document Author +--------------- + +Copyright 2009-2010 Ben Dooks <ben-linux@fluff.org> diff --git a/Documentation/arch/arm/setup.rst b/Documentation/arch/arm/setup.rst new file mode 100644 index 000000000000..8e12ef3fb9a7 --- /dev/null +++ b/Documentation/arch/arm/setup.rst @@ -0,0 +1,108 @@ +============================================= +Kernel initialisation parameters on ARM Linux +============================================= + +The following document describes the kernel initialisation parameter +structure, otherwise known as 'struct param_struct' which is used +for most ARM Linux architectures. + +This structure is used to pass initialisation parameters from the +kernel loader to the Linux kernel proper, and may be short lived +through the kernel initialisation process. As a general rule, it +should not be referenced outside of arch/arm/kernel/setup.c:setup_arch(). + +There are a lot of parameters listed in there, and they are described +below: + + page_size + This parameter must be set to the page size of the machine, and + will be checked by the kernel. + + nr_pages + This is the total number of pages of memory in the system. If + the memory is banked, then this should contain the total number + of pages in the system. + + If the system contains separate VRAM, this value should not + include this information. + + ramdisk_size + This is now obsolete, and should not be used. + + flags + Various kernel flags, including: + + ===== ======================== + bit 0 1 = mount root read only + bit 1 unused + bit 2 0 = load ramdisk + bit 3 0 = prompt for ramdisk + ===== ======================== + + rootdev + major/minor number pair of device to mount as the root filesystem. + + video_num_cols / video_num_rows + These two together describe the character size of the dummy console, + or VGA console character size. They should not be used for any other + purpose. + + It's generally a good idea to set these to be either standard VGA, or + the equivalent character size of your fbcon display. This then allows + all the bootup messages to be displayed correctly. + + video_x / video_y + This describes the character position of cursor on VGA console, and + is otherwise unused. (should not be used for other console types, and + should not be used for other purposes). + + memc_control_reg + MEMC chip control register for Acorn Archimedes and Acorn A5000 + based machines. May be used differently by different architectures. + + sounddefault + Default sound setting on Acorn machines. May be used differently by + different architectures. + + adfsdrives + Number of ADFS/MFM disks. May be used differently by different + architectures. + + bytes_per_char_h / bytes_per_char_v + These are now obsolete, and should not be used. + + pages_in_bank[4] + Number of pages in each bank of the systems memory (used for RiscPC). + This is intended to be used on systems where the physical memory + is non-contiguous from the processors point of view. + + pages_in_vram + Number of pages in VRAM (used on Acorn RiscPC). This value may also + be used by loaders if the size of the video RAM can't be obtained + from the hardware. + + initrd_start / initrd_size + This describes the kernel virtual start address and size of the + initial ramdisk. + + rd_start + Start address in sectors of the ramdisk image on a floppy disk. + + system_rev + system revision number. + + system_serial_low / system_serial_high + system 64-bit serial number + + mem_fclk_21285 + The speed of the external oscillator to the 21285 (footbridge), + which control's the speed of the memory bus, timer & serial port. + Depending upon the speed of the cpu its value can be between + 0-66 MHz. If no params are passed or a value of zero is passed, + then a value of 50 Mhz is the default on 21285 architectures. + + paths[8][128] + These are now obsolete, and should not be used. + + commandline + Kernel command line parameters. Details can be found elsewhere. diff --git a/Documentation/arch/arm/spear/overview.rst b/Documentation/arch/arm/spear/overview.rst new file mode 100644 index 000000000000..1a77f6b213b6 --- /dev/null +++ b/Documentation/arch/arm/spear/overview.rst @@ -0,0 +1,66 @@ +======================== +SPEAr ARM Linux Overview +======================== + +Introduction +------------ + + SPEAr (Structured Processor Enhanced Architecture). + weblink : http://www.st.com/spear + + The ST Microelectronics SPEAr range of ARM9/CortexA9 System-on-Chip CPUs are + supported by the 'spear' platform of ARM Linux. Currently SPEAr1310, + SPEAr1340, SPEAr300, SPEAr310, SPEAr320 and SPEAr600 SOCs are supported. + + Hierarchy in SPEAr is as follows: + + SPEAr (Platform) + + - SPEAr3XX (3XX SOC series, based on ARM9) + - SPEAr300 (SOC) + - SPEAr300 Evaluation Board + - SPEAr310 (SOC) + - SPEAr310 Evaluation Board + - SPEAr320 (SOC) + - SPEAr320 Evaluation Board + - SPEAr6XX (6XX SOC series, based on ARM9) + - SPEAr600 (SOC) + - SPEAr600 Evaluation Board + - SPEAr13XX (13XX SOC series, based on ARM CORTEXA9) + - SPEAr1310 (SOC) + - SPEAr1310 Evaluation Board + - SPEAr1340 (SOC) + - SPEAr1340 Evaluation Board + +Configuration +------------- + + A generic configuration is provided for each machine, and can be used as the + default by:: + + make spear13xx_defconfig + make spear3xx_defconfig + make spear6xx_defconfig + +Layout +------ + + The common files for multiple machine families (SPEAr3xx, SPEAr6xx and + SPEAr13xx) are located in the platform code contained in arch/arm/plat-spear + with headers in plat/. + + Each machine series have a directory with name arch/arm/mach-spear followed by + series name. Like mach-spear3xx, mach-spear6xx and mach-spear13xx. + + Common file for machines of spear3xx family is mach-spear3xx/spear3xx.c, for + spear6xx is mach-spear6xx/spear6xx.c and for spear13xx family is + mach-spear13xx/spear13xx.c. mach-spear* also contain soc/machine specific + files, like spear1310.c, spear1340.c spear300.c, spear310.c, spear320.c and + spear600.c. mach-spear* doesn't contains board specific files as they fully + support Flattened Device Tree. + + +Document Author +--------------- + + Viresh Kumar <vireshk@kernel.org>, (c) 2010-2012 ST Microelectronics diff --git a/Documentation/arch/arm/sti/overview.rst b/Documentation/arch/arm/sti/overview.rst new file mode 100644 index 000000000000..ae16aced800f --- /dev/null +++ b/Documentation/arch/arm/sti/overview.rst @@ -0,0 +1,32 @@ +====================== +STi ARM Linux Overview +====================== + +Introduction +------------ + + The ST Microelectronics Multimedia and Application Processors range of + CortexA9 System-on-Chip are supported by the 'STi' platform of + ARM Linux. Currently STiH407, STiH410 and STiH418 are supported. + + +configuration +------------- + + The configuration for the STi platform is supported via the multi_v7_defconfig. + +Layout +------ + + All the files for multiple machine families (STiH407, STiH410, and STiH418) + are located in the platform code contained in arch/arm/mach-sti + + There is a generic board board-dt.c in the mach folder which support + Flattened Device Tree, which means, It works with any compatible board with + Device Trees. + + +Document Author +--------------- + + Srinivas Kandagatla <srinivas.kandagatla@st.com>, (c) 2013 ST Microelectronics diff --git a/Documentation/arch/arm/sti/stih407-overview.rst b/Documentation/arch/arm/sti/stih407-overview.rst new file mode 100644 index 000000000000..027e75bc7b7c --- /dev/null +++ b/Documentation/arch/arm/sti/stih407-overview.rst @@ -0,0 +1,19 @@ +================ +STiH407 Overview +================ + +Introduction +------------ + + The STiH407 is the new generation of SoC for Multi-HD, AVC set-top boxes + and server/connected client application for satellite, cable, terrestrial + and IP-STB markets. + + Features + - ARM Cortex-A9 1.5 GHz dual core CPU (28nm) + - SATA2, USB 3.0, PCIe, Gbit Ethernet + +Document Author +--------------- + + Maxime Coquelin <maxime.coquelin@st.com>, (c) 2014 ST Microelectronics diff --git a/Documentation/arch/arm/sti/stih418-overview.rst b/Documentation/arch/arm/sti/stih418-overview.rst new file mode 100644 index 000000000000..b563c1f4fe5a --- /dev/null +++ b/Documentation/arch/arm/sti/stih418-overview.rst @@ -0,0 +1,21 @@ +================ +STiH418 Overview +================ + +Introduction +------------ + + The STiH418 is the new generation of SoC for UHDp60 set-top boxes + and server/connected client application for satellite, cable, terrestrial + and IP-STB markets. + + Features + - ARM Cortex-A9 1.5 GHz quad core CPU (28nm) + - SATA2, USB 3.0, PCIe, Gbit Ethernet + - HEVC L5.1 Main 10 + - VP9 + +Document Author +--------------- + + Maxime Coquelin <maxime.coquelin@st.com>, (c) 2015 ST Microelectronics diff --git a/Documentation/arch/arm/stm32/overview.rst b/Documentation/arch/arm/stm32/overview.rst new file mode 100644 index 000000000000..85cfc8410798 --- /dev/null +++ b/Documentation/arch/arm/stm32/overview.rst @@ -0,0 +1,34 @@ +======================== +STM32 ARM Linux Overview +======================== + +Introduction +------------ + +The STMicroelectronics STM32 family of Cortex-A microprocessors (MPUs) and +Cortex-M microcontrollers (MCUs) are supported by the 'STM32' platform of +ARM Linux. + +Configuration +------------- + +For MCUs, use the provided default configuration: + make stm32_defconfig +For MPUs, use multi_v7 configuration: + make multi_v7_defconfig + +Layout +------ + +All the files for multiple machine families are located in the platform code +contained in arch/arm/mach-stm32 + +There is a generic board board-dt.c in the mach folder which support +Flattened Device Tree, which means, it works with any compatible board with +Device Trees. + +:Authors: + +- Maxime Coquelin <mcoquelin.stm32@gmail.com> +- Ludovic Barre <ludovic.barre@st.com> +- Gerald Baeza <gerald.baeza@st.com> diff --git a/Documentation/arch/arm/stm32/stm32-dma-mdma-chaining.rst b/Documentation/arch/arm/stm32/stm32-dma-mdma-chaining.rst new file mode 100644 index 000000000000..2945e0e33104 --- /dev/null +++ b/Documentation/arch/arm/stm32/stm32-dma-mdma-chaining.rst @@ -0,0 +1,415 @@ +.. SPDX-License-Identifier: GPL-2.0 + +======================= +STM32 DMA-MDMA chaining +======================= + + +Introduction +------------ + + This document describes the STM32 DMA-MDMA chaining feature. But before going + further, let's introduce the peripherals involved. + + To offload data transfers from the CPU, STM32 microprocessors (MPUs) embed + direct memory access controllers (DMA). + + STM32MP1 SoCs embed both STM32 DMA and STM32 MDMA controllers. STM32 DMA + request routing capabilities are enhanced by a DMA request multiplexer + (STM32 DMAMUX). + + **STM32 DMAMUX** + + STM32 DMAMUX routes any DMA request from a given peripheral to any STM32 DMA + controller (STM32MP1 counts two STM32 DMA controllers) channels. + + **STM32 DMA** + + STM32 DMA is mainly used to implement central data buffer storage (usually in + the system SRAM) for different peripheral. It can access external RAMs but + without the ability to generate convenient burst transfer ensuring the best + load of the AXI. + + **STM32 MDMA** + + STM32 MDMA (Master DMA) is mainly used to manage direct data transfers between + RAM data buffers without CPU intervention. It can also be used in a + hierarchical structure that uses STM32 DMA as first level data buffer + interfaces for AHB peripherals, while the STM32 MDMA acts as a second level + DMA with better performance. As a AXI/AHB master, STM32 MDMA can take control + of the AXI/AHB bus. + + +Principles +---------- + + STM32 DMA-MDMA chaining feature relies on the strengths of STM32 DMA and + STM32 MDMA controllers. + + STM32 DMA has a circular Double Buffer Mode (DBM). At each end of transaction + (when DMA data counter - DMA_SxNDTR - reaches 0), the memory pointers + (configured with DMA_SxSM0AR and DMA_SxM1AR) are swapped and the DMA data + counter is automatically reloaded. This allows the SW or the STM32 MDMA to + process one memory area while the second memory area is being filled/used by + the STM32 DMA transfer. + + With STM32 MDMA linked-list mode, a single request initiates the data array + (collection of nodes) to be transferred until the linked-list pointer for the + channel is null. The channel transfer complete of the last node is the end of + transfer, unless first and last nodes are linked to each other, in such a + case, the linked-list loops on to create a circular MDMA transfer. + + STM32 MDMA has direct connections with STM32 DMA. This enables autonomous + communication and synchronization between peripherals, thus saving CPU + resources and bus congestion. Transfer Complete signal of STM32 DMA channel + can triggers STM32 MDMA transfer. STM32 MDMA can clear the request generated + by the STM32 DMA by writing to its Interrupt Clear register (whose address is + stored in MDMA_CxMAR, and bit mask in MDMA_CxMDR). + + .. table:: STM32 MDMA interconnect table with STM32 DMA + + +--------------+----------------+-----------+------------+ + | STM32 DMAMUX | STM32 DMA | STM32 DMA | STM32 MDMA | + | channels | channels | Transfer | request | + | | | complete | | + | | | signal | | + +==============+================+===========+============+ + | Channel *0* | DMA1 channel 0 | dma1_tcf0 | *0x00* | + +--------------+----------------+-----------+------------+ + | Channel *1* | DMA1 channel 1 | dma1_tcf1 | *0x01* | + +--------------+----------------+-----------+------------+ + | Channel *2* | DMA1 channel 2 | dma1_tcf2 | *0x02* | + +--------------+----------------+-----------+------------+ + | Channel *3* | DMA1 channel 3 | dma1_tcf3 | *0x03* | + +--------------+----------------+-----------+------------+ + | Channel *4* | DMA1 channel 4 | dma1_tcf4 | *0x04* | + +--------------+----------------+-----------+------------+ + | Channel *5* | DMA1 channel 5 | dma1_tcf5 | *0x05* | + +--------------+----------------+-----------+------------+ + | Channel *6* | DMA1 channel 6 | dma1_tcf6 | *0x06* | + +--------------+----------------+-----------+------------+ + | Channel *7* | DMA1 channel 7 | dma1_tcf7 | *0x07* | + +--------------+----------------+-----------+------------+ + | Channel *8* | DMA2 channel 0 | dma2_tcf0 | *0x08* | + +--------------+----------------+-----------+------------+ + | Channel *9* | DMA2 channel 1 | dma2_tcf1 | *0x09* | + +--------------+----------------+-----------+------------+ + | Channel *10* | DMA2 channel 2 | dma2_tcf2 | *0x0A* | + +--------------+----------------+-----------+------------+ + | Channel *11* | DMA2 channel 3 | dma2_tcf3 | *0x0B* | + +--------------+----------------+-----------+------------+ + | Channel *12* | DMA2 channel 4 | dma2_tcf4 | *0x0C* | + +--------------+----------------+-----------+------------+ + | Channel *13* | DMA2 channel 5 | dma2_tcf5 | *0x0D* | + +--------------+----------------+-----------+------------+ + | Channel *14* | DMA2 channel 6 | dma2_tcf6 | *0x0E* | + +--------------+----------------+-----------+------------+ + | Channel *15* | DMA2 channel 7 | dma2_tcf7 | *0x0F* | + +--------------+----------------+-----------+------------+ + + STM32 DMA-MDMA chaining feature then uses a SRAM buffer. STM32MP1 SoCs embed + three fast access static internal RAMs of various size, used for data storage. + Due to STM32 DMA legacy (within microcontrollers), STM32 DMA performances are + bad with DDR, while they are optimal with SRAM. Hence the SRAM buffer used + between STM32 DMA and STM32 MDMA. This buffer is split in two equal periods + and STM32 DMA uses one period while STM32 MDMA uses the other period + simultaneously. + :: + + dma[1:2]-tcf[0:7] + .----------------. + ____________ ' _________ V____________ + | STM32 DMA | / __|>_ \ | STM32 MDMA | + |------------| | / \ | |------------| + | DMA_SxM0AR |<=>| | SRAM | |<=>| []-[]...[] | + | DMA_SxM1AR | | \_____/ | | | + |____________| \___<|____/ |____________| + + STM32 DMA-MDMA chaining uses (struct dma_slave_config).peripheral_config to + exchange the parameters needed to configure MDMA. These parameters are + gathered into a u32 array with three values: + + * the STM32 MDMA request (which is actually the DMAMUX channel ID), + * the address of the STM32 DMA register to clear the Transfer Complete + interrupt flag, + * the mask of the Transfer Complete interrupt flag of the STM32 DMA channel. + +Device Tree updates for STM32 DMA-MDMA chaining support +------------------------------------------------------- + + **1. Allocate a SRAM buffer** + + SRAM device tree node is defined in SoC device tree. You can refer to it in + your board device tree to define your SRAM pool. + :: + + &sram { + my_foo_device_dma_pool: dma-sram@0 { + reg = <0x0 0x1000>; + }; + }; + + Be careful of the start index, in case there are other SRAM consumers. + Define your pool size strategically: to optimise chaining, the idea is that + STM32 DMA and STM32 MDMA can work simultaneously, on each buffer of the + SRAM. + If the SRAM period is greater than the expected DMA transfer, then STM32 DMA + and STM32 MDMA will work sequentially instead of simultaneously. It is not a + functional issue but it is not optimal. + + Don't forget to refer to your SRAM pool in your device node. You need to + define a new property. + :: + + &my_foo_device { + ... + my_dma_pool = &my_foo_device_dma_pool; + }; + + Then get this SRAM pool in your foo driver and allocate your SRAM buffer. + + **2. Allocate a STM32 DMA channel and a STM32 MDMA channel** + + You need to define an extra channel in your device tree node, in addition to + the one you should already have for "classic" DMA operation. + + This new channel must be taken from STM32 MDMA channels, so, the phandle of + the DMA controller to use is the MDMA controller's one. + :: + + &my_foo_device { + [...] + my_dma_pool = &my_foo_device_dma_pool; + dmas = <&dmamux1 ...>, // STM32 DMA channel + <&mdma1 0 0x3 0x1200000a 0 0>; // + STM32 MDMA channel + }; + + Concerning STM32 MDMA bindings: + + 1. The request line number : whatever the value here, it will be overwritten + by MDMA driver with the STM32 DMAMUX channel ID passed through + (struct dma_slave_config).peripheral_config + + 2. The priority level : choose Very High (0x3) so that your channel will + take priority other the other during request arbitration + + 3. A 32bit mask specifying the DMA channel configuration : source and + destination address increment, block transfer with 128 bytes per single + transfer + + 4. The 32bit value specifying the register to be used to acknowledge the + request: it will be overwritten by MDMA driver, with the DMA channel + interrupt flag clear register address passed through + (struct dma_slave_config).peripheral_config + + 5. The 32bit mask specifying the value to be written to acknowledge the + request: it will be overwritten by MDMA driver, with the DMA channel + Transfer Complete flag passed through + (struct dma_slave_config).peripheral_config + +Driver updates for STM32 DMA-MDMA chaining support in foo driver +---------------------------------------------------------------- + + **0. (optional) Refactor the original sg_table if dmaengine_prep_slave_sg()** + + In case of dmaengine_prep_slave_sg(), the original sg_table can't be used as + is. Two new sg_tables must be created from the original one. One for + STM32 DMA transfer (where memory address targets now the SRAM buffer instead + of DDR buffer) and one for STM32 MDMA transfer (where memory address targets + the DDR buffer). + + The new sg_list items must fit SRAM period length. Here is an example for + DMA_DEV_TO_MEM: + :: + + /* + * Assuming sgl and nents, respectively the initial scatterlist and its + * length. + * Assuming sram_dma_buf and sram_period, respectively the memory + * allocated from the pool for DMA usage, and the length of the period, + * which is half of the sram_buf size. + */ + struct sg_table new_dma_sgt, new_mdma_sgt; + struct scatterlist *s, *_sgl; + dma_addr_t ddr_dma_buf; + u32 new_nents = 0, len; + int i; + + /* Count the number of entries needed */ + for_each_sg(sgl, s, nents, i) + if (sg_dma_len(s) > sram_period) + new_nents += DIV_ROUND_UP(sg_dma_len(s), sram_period); + else + new_nents++; + + /* Create sg table for STM32 DMA channel */ + ret = sg_alloc_table(&new_dma_sgt, new_nents, GFP_ATOMIC); + if (ret) + dev_err(dev, "DMA sg table alloc failed\n"); + + for_each_sg(new_dma_sgt.sgl, s, new_dma_sgt.nents, i) { + _sgl = sgl; + sg_dma_len(s) = min(sg_dma_len(_sgl), sram_period); + /* Targets the beginning = first half of the sram_buf */ + s->dma_address = sram_buf; + /* + * Targets the second half of the sram_buf + * for odd indexes of the item of the sg_list + */ + if (i & 1) + s->dma_address += sram_period; + } + + /* Create sg table for STM32 MDMA channel */ + ret = sg_alloc_table(&new_mdma_sgt, new_nents, GFP_ATOMIC); + if (ret) + dev_err(dev, "MDMA sg_table alloc failed\n"); + + _sgl = sgl; + len = sg_dma_len(sgl); + ddr_dma_buf = sg_dma_address(sgl); + for_each_sg(mdma_sgt.sgl, s, mdma_sgt.nents, i) { + size_t bytes = min_t(size_t, len, sram_period); + + sg_dma_len(s) = bytes; + sg_dma_address(s) = ddr_dma_buf; + len -= bytes; + + if (!len && sg_next(_sgl)) { + _sgl = sg_next(_sgl); + len = sg_dma_len(_sgl); + ddr_dma_buf = sg_dma_address(_sgl); + } else { + ddr_dma_buf += bytes; + } + } + + Don't forget to release these new sg_tables after getting the descriptors + with dmaengine_prep_slave_sg(). + + **1. Set controller specific parameters** + + First, use dmaengine_slave_config() with a struct dma_slave_config to + configure STM32 DMA channel. You just have to take care of DMA addresses, + the memory address (depending on the transfer direction) must point on your + SRAM buffer, and set (struct dma_slave_config).peripheral_size != 0. + + STM32 DMA driver will check (struct dma_slave_config).peripheral_size to + determine if chaining is being used or not. If it is used, then STM32 DMA + driver fills (struct dma_slave_config).peripheral_config with an array of + three u32 : the first one containing STM32 DMAMUX channel ID, the second one + the channel interrupt flag clear register address, and the third one the + channel Transfer Complete flag mask. + + Then, use dmaengine_slave_config with another struct dma_slave_config to + configure STM32 MDMA channel. Take care of DMA addresses, the device address + (depending on the transfer direction) must point on your SRAM buffer, and + the memory address must point to the buffer originally used for "classic" + DMA operation. Use the previous (struct dma_slave_config).peripheral_size + and .peripheral_config that have been updated by STM32 DMA driver, to set + (struct dma_slave_config).peripheral_size and .peripheral_config of the + struct dma_slave_config to configure STM32 MDMA channel. + :: + + struct dma_slave_config dma_conf; + struct dma_slave_config mdma_conf; + + memset(&dma_conf, 0, sizeof(dma_conf)); + [...] + config.direction = DMA_DEV_TO_MEM; + config.dst_addr = sram_dma_buf; // SRAM buffer + config.peripheral_size = 1; // peripheral_size != 0 => chaining + + dmaengine_slave_config(dma_chan, &dma_config); + + memset(&mdma_conf, 0, sizeof(mdma_conf)); + config.direction = DMA_DEV_TO_MEM; + mdma_conf.src_addr = sram_dma_buf; // SRAM buffer + mdma_conf.dst_addr = rx_dma_buf; // original memory buffer + mdma_conf.peripheral_size = dma_conf.peripheral_size; // <- dma_conf + mdma_conf.peripheral_config = dma_config.peripheral_config; // <- dma_conf + + dmaengine_slave_config(mdma_chan, &mdma_conf); + + **2. Get a descriptor for STM32 DMA channel transaction** + + In the same way you get your descriptor for your "classic" DMA operation, + you just have to replace the original sg_list (in case of + dmaengine_prep_slave_sg()) with the new sg_list using SRAM buffer, or to + replace the original buffer address, length and period (in case of + dmaengine_prep_dma_cyclic()) with the new SRAM buffer. + + **3. Get a descriptor for STM32 MDMA channel transaction** + + If you previously get descriptor (for STM32 DMA) with + + * dmaengine_prep_slave_sg(), then use dmaengine_prep_slave_sg() for + STM32 MDMA; + * dmaengine_prep_dma_cyclic(), then use dmaengine_prep_dma_cyclic() for + STM32 MDMA. + + Use the new sg_list using SRAM buffer (in case of dmaengine_prep_slave_sg()) + or, depending on the transfer direction, either the original DDR buffer (in + case of DMA_DEV_TO_MEM) or the SRAM buffer (in case of DMA_MEM_TO_DEV), the + source address being previously set with dmaengine_slave_config(). + + **4. Submit both transactions** + + Before submitting your transactions, you may need to define on which + descriptor you want a callback to be called at the end of the transfer + (dmaengine_prep_slave_sg()) or the period (dmaengine_prep_dma_cyclic()). + Depending on the direction, set the callback on the descriptor that finishes + the overal transfer: + + * DMA_DEV_TO_MEM: set the callback on the "MDMA" descriptor + * DMA_MEM_TO_DEV: set the callback on the "DMA" descriptor + + Then, submit the descriptors whatever the order, with dmaengine_tx_submit(). + + **5. Issue pending requests (and wait for callback notification)** + + As STM32 MDMA channel transfer is triggered by STM32 DMA, you must issue + STM32 MDMA channel before STM32 DMA channel. + + If any, your callback will be called to warn you about the end of the overal + transfer or the period completion. + + Don't forget to terminate both channels. STM32 DMA channel is configured in + cyclic Double-Buffer mode so it won't be disabled by HW, you need to terminate + it. STM32 MDMA channel will be stopped by HW in case of sg transfer, but not + in case of cyclic transfer. You can terminate it whatever the kind of transfer. + + **STM32 DMA-MDMA chaining DMA_MEM_TO_DEV special case** + + STM32 DMA-MDMA chaining in DMA_MEM_TO_DEV is a special case. Indeed, the + STM32 MDMA feeds the SRAM buffer with the DDR data, and the STM32 DMA reads + data from SRAM buffer. So some data (the first period) have to be copied in + SRAM buffer when the STM32 DMA starts to read. + + A trick could be pausing the STM32 DMA channel (that will raise a Transfer + Complete signal, triggering the STM32 MDMA channel), but the first data read + by the STM32 DMA could be "wrong". The proper way is to prepare the first SRAM + period with dmaengine_prep_dma_memcpy(). Then this first period should be + "removed" from the sg or the cyclic transfer. + + Due to this complexity, rather use the STM32 DMA-MDMA chaining for + DMA_DEV_TO_MEM and keep the "classic" DMA usage for DMA_MEM_TO_DEV, unless + you're not afraid. + +Resources +--------- + + Application note, datasheet and reference manual are available on ST website + (STM32MP1_). + + Dedicated focus on three application notes (AN5224_, AN4031_ & AN5001_) + dealing with STM32 DMAMUX, STM32 DMA and STM32 MDMA. + +.. _STM32MP1: https://www.st.com/en/microcontrollers-microprocessors/stm32mp1-series.html +.. _AN5224: https://www.st.com/resource/en/application_note/an5224-stm32-dmamux-the-dma-request-router-stmicroelectronics.pdf +.. _AN4031: https://www.st.com/resource/en/application_note/dm00046011-using-the-stm32f2-stm32f4-and-stm32f7-series-dma-controller-stmicroelectronics.pdf +.. _AN5001: https://www.st.com/resource/en/application_note/an5001-stm32cube-expansion-package-for-stm32h7-series-mdma-stmicroelectronics.pdf + +:Authors: + +- Amelie Delaunay <amelie.delaunay@foss.st.com>
\ No newline at end of file diff --git a/Documentation/arch/arm/stm32/stm32f429-overview.rst b/Documentation/arch/arm/stm32/stm32f429-overview.rst new file mode 100644 index 000000000000..a7ebe8ea6697 --- /dev/null +++ b/Documentation/arch/arm/stm32/stm32f429-overview.rst @@ -0,0 +1,25 @@ +================== +STM32F429 Overview +================== + +Introduction +------------ + +The STM32F429 is a Cortex-M4 MCU aimed at various applications. +It features: + +- ARM Cortex-M4 up to 180MHz with FPU +- 2MB internal Flash Memory +- External memory support through FMC controller (PSRAM, SDRAM, NOR, NAND) +- I2C, SPI, SAI, CAN, USB OTG, Ethernet controllers +- LCD controller & Camera interface +- Cryptographic processor + +Resources +--------- + +Datasheet and reference manual are publicly available on ST website (STM32F429_). + +.. _STM32F429: http://www.st.com/web/en/catalog/mmc/FM141/SC1169/SS1577/LN1806?ecmp=stm32f429-439_pron_pr-ces2014_nov2013 + +:Authors: Maxime Coquelin <mcoquelin.stm32@gmail.com> diff --git a/Documentation/arch/arm/stm32/stm32f746-overview.rst b/Documentation/arch/arm/stm32/stm32f746-overview.rst new file mode 100644 index 000000000000..78befddc7740 --- /dev/null +++ b/Documentation/arch/arm/stm32/stm32f746-overview.rst @@ -0,0 +1,32 @@ +================== +STM32F746 Overview +================== + +Introduction +------------ + +The STM32F746 is a Cortex-M7 MCU aimed at various applications. +It features: + +- Cortex-M7 core running up to @216MHz +- 1MB internal flash, 320KBytes internal RAM (+4KB of backup SRAM) +- FMC controller to connect SDRAM, NOR and NAND memories +- Dual mode QSPI +- SD/MMC/SDIO support +- Ethernet controller +- USB OTFG FS & HS controllers +- I2C, SPI, CAN busses support +- Several 16 & 32 bits general purpose timers +- Serial Audio interface +- LCD controller +- HDMI-CEC +- SPDIFRX + +Resources +--------- + +Datasheet and reference manual are publicly available on ST website (STM32F746_). + +.. _STM32F746: http://www.st.com/content/st_com/en/products/microcontrollers/stm32-32-bit-arm-cortex-mcus/stm32f7-series/stm32f7x6/stm32f746ng.html + +:Authors: Alexandre Torgue <alexandre.torgue@st.com> diff --git a/Documentation/arch/arm/stm32/stm32f769-overview.rst b/Documentation/arch/arm/stm32/stm32f769-overview.rst new file mode 100644 index 000000000000..e482980ddf21 --- /dev/null +++ b/Documentation/arch/arm/stm32/stm32f769-overview.rst @@ -0,0 +1,34 @@ +================== +STM32F769 Overview +================== + +Introduction +------------ + +The STM32F769 is a Cortex-M7 MCU aimed at various applications. +It features: + +- Cortex-M7 core running up to @216MHz +- 2MB internal flash, 512KBytes internal RAM (+4KB of backup SRAM) +- FMC controller to connect SDRAM, NOR and NAND memories +- Dual mode QSPI +- SD/MMC/SDIO support*2 +- Ethernet controller +- USB OTFG FS & HS controllers +- I2C*4, SPI*6, CAN*3 busses support +- Several 16 & 32 bits general purpose timers +- Serial Audio interface*2 +- LCD controller +- HDMI-CEC +- DSI +- SPDIFRX +- MDIO salave interface + +Resources +--------- + +Datasheet and reference manual are publicly available on ST website (STM32F769_). + +.. _STM32F769: http://www.st.com/content/st_com/en/products/microcontrollers/stm32-32-bit-arm-cortex-mcus/stm32-high-performance-mcus/stm32f7-series/stm32f7x9/stm32f769ni.html + +:Authors: Alexandre Torgue <alexandre.torgue@st.com> diff --git a/Documentation/arch/arm/stm32/stm32h743-overview.rst b/Documentation/arch/arm/stm32/stm32h743-overview.rst new file mode 100644 index 000000000000..4e15f1a42730 --- /dev/null +++ b/Documentation/arch/arm/stm32/stm32h743-overview.rst @@ -0,0 +1,33 @@ +================== +STM32H743 Overview +================== + +Introduction +------------ + +The STM32H743 is a Cortex-M7 MCU aimed at various applications. +It features: + +- Cortex-M7 core running up to @400MHz +- 2MB internal flash, 1MBytes internal RAM +- FMC controller to connect SDRAM, NOR and NAND memories +- Dual mode QSPI +- SD/MMC/SDIO support +- Ethernet controller +- USB OTFG FS & HS controllers +- I2C, SPI, CAN busses support +- Several 16 & 32 bits general purpose timers +- Serial Audio interface +- LCD controller +- HDMI-CEC +- SPDIFRX +- DFSDM + +Resources +--------- + +Datasheet and reference manual are publicly available on ST website (STM32H743_). + +.. _STM32H743: http://www.st.com/en/microcontrollers/stm32h7x3.html?querycriteria=productId=LN2033 + +:Authors: Alexandre Torgue <alexandre.torgue@st.com> diff --git a/Documentation/arch/arm/stm32/stm32h750-overview.rst b/Documentation/arch/arm/stm32/stm32h750-overview.rst new file mode 100644 index 000000000000..0e51235c9547 --- /dev/null +++ b/Documentation/arch/arm/stm32/stm32h750-overview.rst @@ -0,0 +1,34 @@ +================== +STM32H750 Overview +================== + +Introduction +------------ + +The STM32H750 is a Cortex-M7 MCU aimed at various applications. +It features: + +- Cortex-M7 core running up to @480MHz +- 128K internal flash, 1MBytes internal RAM +- FMC controller to connect SDRAM, NOR and NAND memories +- Dual mode QSPI +- SD/MMC/SDIO support +- Ethernet controller +- USB OTFG FS & HS controllers +- I2C, SPI, CAN busses support +- Several 16 & 32 bits general purpose timers +- Serial Audio interface +- LCD controller +- HDMI-CEC +- SPDIFRX +- DFSDM + +Resources +--------- + +Datasheet and reference manual are publicly available on ST website (STM32H750_). + +.. _STM32H750: https://www.st.com/en/microcontrollers-microprocessors/stm32h750-value-line.html + +:Authors: Dillon Min <dillon.minfei@gmail.com> + diff --git a/Documentation/arch/arm/stm32/stm32mp13-overview.rst b/Documentation/arch/arm/stm32/stm32mp13-overview.rst new file mode 100644 index 000000000000..3bb9492dad49 --- /dev/null +++ b/Documentation/arch/arm/stm32/stm32mp13-overview.rst @@ -0,0 +1,37 @@ +=================== +STM32MP13 Overview +=================== + +Introduction +------------ + +The STM32MP131/STM32MP133/STM32MP135 are Cortex-A MPU aimed at various applications. +They feature: + +- One Cortex-A7 application core +- Standard memories interface support +- Standard connectivity, widely inherited from the STM32 MCU family +- Comprehensive security support + +More details: + +- Cortex-A7 core running up to @900MHz +- FMC controller to connect SDRAM, NOR and NAND memories +- QSPI +- SD/MMC/SDIO support +- 2*Ethernet controller +- CAN +- ADC/DAC +- USB EHCI/OHCI controllers +- USB OTG +- I2C, SPI, CAN busses support +- Several general purpose timers +- Serial Audio interface +- LCD controller +- DCMIPP +- SPDIFRX +- DFSDM + +:Authors: + +- Alexandre Torgue <alexandre.torgue@foss.st.com> diff --git a/Documentation/arch/arm/stm32/stm32mp151-overview.rst b/Documentation/arch/arm/stm32/stm32mp151-overview.rst new file mode 100644 index 000000000000..f42a2ac309c0 --- /dev/null +++ b/Documentation/arch/arm/stm32/stm32mp151-overview.rst @@ -0,0 +1,36 @@ +=================== +STM32MP151 Overview +=================== + +Introduction +------------ + +The STM32MP151 is a Cortex-A MPU aimed at various applications. +It features: + +- Single Cortex-A7 application core +- Standard memories interface support +- Standard connectivity, widely inherited from the STM32 MCU family +- Comprehensive security support + +More details: + +- Cortex-A7 core running up to @800MHz +- FMC controller to connect SDRAM, NOR and NAND memories +- QSPI +- SD/MMC/SDIO support +- Ethernet controller +- ADC/DAC +- USB EHCI/OHCI controllers +- USB OTG +- I2C, SPI busses support +- Several general purpose timers +- Serial Audio interface +- LCD-TFT controller +- DCMIPP +- SPDIFRX +- DFSDM + +:Authors: + +- Roan van Dijk <roan@protonic.nl> diff --git a/Documentation/arch/arm/stm32/stm32mp157-overview.rst b/Documentation/arch/arm/stm32/stm32mp157-overview.rst new file mode 100644 index 000000000000..f62fdc8e7d8d --- /dev/null +++ b/Documentation/arch/arm/stm32/stm32mp157-overview.rst @@ -0,0 +1,20 @@ +=================== +STM32MP157 Overview +=================== + +Introduction +------------ + +The STM32MP157 is a Cortex-A MPU aimed at various applications. +It features: + +- Dual core Cortex-A7 application core +- 2D/3D image composition with GPU +- Standard memories interface support +- Standard connectivity, widely inherited from the STM32 MCU family +- Comprehensive security support + +:Authors: + +- Ludovic Barre <ludovic.barre@st.com> +- Gerald Baeza <gerald.baeza@st.com> diff --git a/Documentation/arch/arm/sunxi.rst b/Documentation/arch/arm/sunxi.rst new file mode 100644 index 000000000000..b85d1e2f2d47 --- /dev/null +++ b/Documentation/arch/arm/sunxi.rst @@ -0,0 +1,170 @@ +================== +ARM Allwinner SoCs +================== + +This document lists all the ARM Allwinner SoCs that are currently +supported in mainline by the Linux kernel. This document will also +provide links to documentation and/or datasheet for these SoCs. + +SunXi family +------------ + Linux kernel mach directory: arch/arm/mach-sunxi + + Flavors: + + * ARM926 based SoCs + - Allwinner F20 (sun3i) + + * Not Supported + + * ARM Cortex-A8 based SoCs + - Allwinner A10 (sun4i) + + * Datasheet + + http://dl.linux-sunxi.org/A10/A10%20Datasheet%20-%20v1.21%20%282012-04-06%29.pdf + * User Manual + + http://dl.linux-sunxi.org/A10/A10%20User%20Manual%20-%20v1.20%20%282012-04-09%2c%20DECRYPTED%29.pdf + + - Allwinner A10s (sun5i) + + * Datasheet + + http://dl.linux-sunxi.org/A10s/A10s%20Datasheet%20-%20v1.20%20%282012-03-27%29.pdf + + - Allwinner A13 / R8 (sun5i) + + * Datasheet + + http://dl.linux-sunxi.org/A13/A13%20Datasheet%20-%20v1.12%20%282012-03-29%29.pdf + * User Manual + + http://dl.linux-sunxi.org/A13/A13%20User%20Manual%20-%20v1.2%20%282013-01-08%29.pdf + + - Next Thing Co GR8 (sun5i) + + * Single ARM Cortex-A7 based SoCs + - Allwinner V3s (sun8i) + + * Datasheet + + http://linux-sunxi.org/File:Allwinner_V3s_Datasheet_V1.0.pdf + + * Dual ARM Cortex-A7 based SoCs + - Allwinner A20 (sun7i) + + * User Manual + + http://dl.linux-sunxi.org/A20/A20%20User%20Manual%202013-03-22.pdf + + - Allwinner A23 (sun8i) + + * Datasheet + + http://dl.linux-sunxi.org/A23/A23%20Datasheet%20V1.0%2020130830.pdf + + * User Manual + + http://dl.linux-sunxi.org/A23/A23%20User%20Manual%20V1.0%2020130830.pdf + + * Quad ARM Cortex-A7 based SoCs + - Allwinner A31 (sun6i) + + * Datasheet + + http://dl.linux-sunxi.org/A31/A3x_release_document/A31/IC/A31%20datasheet%20V1.3%2020131106.pdf + + * User Manual + + http://dl.linux-sunxi.org/A31/A3x_release_document/A31/IC/A31%20user%20manual%20V1.1%2020130630.pdf + + - Allwinner A31s (sun6i) + + * Datasheet + + http://dl.linux-sunxi.org/A31/A3x_release_document/A31s/IC/A31s%20datasheet%20V1.3%2020131106.pdf + + * User Manual + + http://dl.linux-sunxi.org/A31/A3x_release_document/A31s/IC/A31s%20User%20Manual%20%20V1.0%2020130322.pdf + + - Allwinner A33 (sun8i) + + * Datasheet + + http://dl.linux-sunxi.org/A33/A33%20Datasheet%20release%201.1.pdf + + * User Manual + + http://dl.linux-sunxi.org/A33/A33%20user%20manual%20release%201.1.pdf + + - Allwinner H2+ (sun8i) + + * No document available now, but is known to be working properly with + H3 drivers and memory map. + + - Allwinner H3 (sun8i) + + * Datasheet + + https://linux-sunxi.org/images/4/4b/Allwinner_H3_Datasheet_V1.2.pdf + + - Allwinner R40 (sun8i) + + * Datasheet + + https://github.com/tinalinux/docs/raw/r40-v1.y/R40_Datasheet_V1.0.pdf + + * User Manual + + https://github.com/tinalinux/docs/raw/r40-v1.y/Allwinner_R40_User_Manual_V1.0.pdf + + * Quad ARM Cortex-A15, Quad ARM Cortex-A7 based SoCs + - Allwinner A80 + + * Datasheet + + http://dl.linux-sunxi.org/A80/A80_Datasheet_Revision_1.0_0404.pdf + + * Octa ARM Cortex-A7 based SoCs + - Allwinner A83T + + * Datasheet + + https://github.com/allwinner-zh/documents/raw/master/A83T/A83T_Datasheet_v1.3_20150510.pdf + + * User Manual + + https://github.com/allwinner-zh/documents/raw/master/A83T/A83T_User_Manual_v1.5.1_20150513.pdf + + * Quad ARM Cortex-A53 based SoCs + - Allwinner A64 + + * Datasheet + + http://dl.linux-sunxi.org/A64/A64_Datasheet_V1.1.pdf + + * User Manual + + http://dl.linux-sunxi.org/A64/Allwinner%20A64%20User%20Manual%20v1.0.pdf + + - Allwinner H6 + + * Datasheet + + https://linux-sunxi.org/images/5/5c/Allwinner_H6_V200_Datasheet_V1.1.pdf + + * User Manual + + https://linux-sunxi.org/images/4/46/Allwinner_H6_V200_User_Manual_V1.1.pdf + + - Allwinner H616 + + * Datasheet + + https://linux-sunxi.org/images/b/b9/H616_Datasheet_V1.0_cleaned.pdf + + * User Manual + + https://linux-sunxi.org/images/2/24/H616_User_Manual_V1.0_cleaned.pdf diff --git a/Documentation/arch/arm/sunxi/clocks.rst b/Documentation/arch/arm/sunxi/clocks.rst new file mode 100644 index 000000000000..23bd03f3e21f --- /dev/null +++ b/Documentation/arch/arm/sunxi/clocks.rst @@ -0,0 +1,57 @@ +======================================================= +Frequently asked questions about the sunxi clock system +======================================================= + +This document contains useful bits of information that people tend to ask +about the sunxi clock system, as well as accompanying ASCII art when adequate. + +Q: Why is the main 24MHz oscillator gatable? Wouldn't that break the + system? + +A: The 24MHz oscillator allows gating to save power. Indeed, if gated + carelessly the system would stop functioning, but with the right + steps, one can gate it and keep the system running. Consider this + simplified suspend example: + + While the system is operational, you would see something like:: + + 24MHz 32kHz + | + PLL1 + \ + \_ CPU Mux + | + [CPU] + + When you are about to suspend, you switch the CPU Mux to the 32kHz + oscillator:: + + 24Mhz 32kHz + | | + PLL1 | + / + CPU Mux _/ + | + [CPU] + + Finally you can gate the main oscillator:: + + 32kHz + | + | + / + CPU Mux _/ + | + [CPU] + +Q: Were can I learn more about the sunxi clocks? + +A: The linux-sunxi wiki contains a page documenting the clock registers, + you can find it at + + http://linux-sunxi.org/A10/CCM + + The authoritative source for information at this time is the ccmu driver + released by Allwinner, you can find it at + + https://github.com/linux-sunxi/linux-sunxi/tree/sunxi-3.0/arch/arm/mach-sun4i/clock/ccmu diff --git a/Documentation/arch/arm/swp_emulation.rst b/Documentation/arch/arm/swp_emulation.rst new file mode 100644 index 000000000000..6a608a9c3715 --- /dev/null +++ b/Documentation/arch/arm/swp_emulation.rst @@ -0,0 +1,27 @@ +Software emulation of deprecated SWP instruction (CONFIG_SWP_EMULATE) +--------------------------------------------------------------------- + +ARMv6 architecture deprecates use of the SWP/SWPB instructions, and recommeds +moving to the load-locked/store-conditional instructions LDREX and STREX. + +ARMv7 multiprocessing extensions introduce the ability to disable these +instructions, triggering an undefined instruction exception when executed. +Trapped instructions are emulated using an LDREX/STREX or LDREXB/STREXB +sequence. If a memory access fault (an abort) occurs, a segmentation fault is +signalled to the triggering process. + +/proc/cpu/swp_emulation holds some statistics/information, including the PID of +the last process to trigger the emulation to be invocated. For example:: + + Emulated SWP: 12 + Emulated SWPB: 0 + Aborted SWP{B}: 1 + Last process: 314 + + +NOTE: + when accessing uncached shared regions, LDREX/STREX rely on an external + transaction monitoring block called a global monitor to maintain update + atomicity. If your system does not implement a global monitor, this option can + cause programs that perform SWP operations to uncached memory to deadlock, as + the STREX operation will always fail. diff --git a/Documentation/arch/arm/tcm.rst b/Documentation/arch/arm/tcm.rst new file mode 100644 index 000000000000..1dc6c39220f9 --- /dev/null +++ b/Documentation/arch/arm/tcm.rst @@ -0,0 +1,161 @@ +================================================== +ARM TCM (Tightly-Coupled Memory) handling in Linux +================================================== + +Written by Linus Walleij <linus.walleij@stericsson.com> + +Some ARM SoCs have a so-called TCM (Tightly-Coupled Memory). +This is usually just a few (4-64) KiB of RAM inside the ARM +processor. + +Due to being embedded inside the CPU, the TCM has a +Harvard-architecture, so there is an ITCM (instruction TCM) +and a DTCM (data TCM). The DTCM can not contain any +instructions, but the ITCM can actually contain data. +The size of DTCM or ITCM is minimum 4KiB so the typical +minimum configuration is 4KiB ITCM and 4KiB DTCM. + +ARM CPUs have special registers to read out status, physical +location and size of TCM memories. arch/arm/include/asm/cputype.h +defines a CPUID_TCM register that you can read out from the +system control coprocessor. Documentation from ARM can be found +at http://infocenter.arm.com, search for "TCM Status Register" +to see documents for all CPUs. Reading this register you can +determine if ITCM (bits 1-0) and/or DTCM (bit 17-16) is present +in the machine. + +There is further a TCM region register (search for "TCM Region +Registers" at the ARM site) that can report and modify the location +size of TCM memories at runtime. This is used to read out and modify +TCM location and size. Notice that this is not a MMU table: you +actually move the physical location of the TCM around. At the +place you put it, it will mask any underlying RAM from the +CPU so it is usually wise not to overlap any physical RAM with +the TCM. + +The TCM memory can then be remapped to another address again using +the MMU, but notice that the TCM is often used in situations where +the MMU is turned off. To avoid confusion the current Linux +implementation will map the TCM 1 to 1 from physical to virtual +memory in the location specified by the kernel. Currently Linux +will map ITCM to 0xfffe0000 and on, and DTCM to 0xfffe8000 and +on, supporting a maximum of 32KiB of ITCM and 32KiB of DTCM. + +Newer versions of the region registers also support dividing these +TCMs in two separate banks, so for example an 8KiB ITCM is divided +into two 4KiB banks with its own control registers. The idea is to +be able to lock and hide one of the banks for use by the secure +world (TrustZone). + +TCM is used for a few things: + +- FIQ and other interrupt handlers that need deterministic + timing and cannot wait for cache misses. + +- Idle loops where all external RAM is set to self-refresh + retention mode, so only on-chip RAM is accessible by + the CPU and then we hang inside ITCM waiting for an + interrupt. + +- Other operations which implies shutting off or reconfiguring + the external RAM controller. + +There is an interface for using TCM on the ARM architecture +in <asm/tcm.h>. Using this interface it is possible to: + +- Define the physical address and size of ITCM and DTCM. + +- Tag functions to be compiled into ITCM. + +- Tag data and constants to be allocated to DTCM and ITCM. + +- Have the remaining TCM RAM added to a special + allocation pool with gen_pool_create() and gen_pool_add() + and provice tcm_alloc() and tcm_free() for this + memory. Such a heap is great for things like saving + device state when shutting off device power domains. + +A machine that has TCM memory shall select HAVE_TCM from +arch/arm/Kconfig for itself. Code that needs to use TCM shall +#include <asm/tcm.h> + +Functions to go into itcm can be tagged like this: +int __tcmfunc foo(int bar); + +Since these are marked to become long_calls and you may want +to have functions called locally inside the TCM without +wasting space, there is also the __tcmlocalfunc prefix that +will make the call relative. + +Variables to go into dtcm can be tagged like this:: + + int __tcmdata foo; + +Constants can be tagged like this:: + + int __tcmconst foo; + +To put assembler into TCM just use:: + + .section ".tcm.text" or .section ".tcm.data" + +respectively. + +Example code:: + + #include <asm/tcm.h> + + /* Uninitialized data */ + static u32 __tcmdata tcmvar; + /* Initialized data */ + static u32 __tcmdata tcmassigned = 0x2BADBABEU; + /* Constant */ + static const u32 __tcmconst tcmconst = 0xCAFEBABEU; + + static void __tcmlocalfunc tcm_to_tcm(void) + { + int i; + for (i = 0; i < 100; i++) + tcmvar ++; + } + + static void __tcmfunc hello_tcm(void) + { + /* Some abstract code that runs in ITCM */ + int i; + for (i = 0; i < 100; i++) { + tcmvar ++; + } + tcm_to_tcm(); + } + + static void __init test_tcm(void) + { + u32 *tcmem; + int i; + + hello_tcm(); + printk("Hello TCM executed from ITCM RAM\n"); + + printk("TCM variable from testrun: %u @ %p\n", tcmvar, &tcmvar); + tcmvar = 0xDEADBEEFU; + printk("TCM variable: 0x%x @ %p\n", tcmvar, &tcmvar); + + printk("TCM assigned variable: 0x%x @ %p\n", tcmassigned, &tcmassigned); + + printk("TCM constant: 0x%x @ %p\n", tcmconst, &tcmconst); + + /* Allocate some TCM memory from the pool */ + tcmem = tcm_alloc(20); + if (tcmem) { + printk("TCM Allocated 20 bytes of TCM @ %p\n", tcmem); + tcmem[0] = 0xDEADBEEFU; + tcmem[1] = 0x2BADBABEU; + tcmem[2] = 0xCAFEBABEU; + tcmem[3] = 0xDEADBEEFU; + tcmem[4] = 0x2BADBABEU; + for (i = 0; i < 5; i++) + printk("TCM tcmem[%d] = %08x\n", i, tcmem[i]); + tcm_free(tcmem, 20); + } + } diff --git a/Documentation/arch/arm/uefi.rst b/Documentation/arch/arm/uefi.rst new file mode 100644 index 000000000000..baebe688a006 --- /dev/null +++ b/Documentation/arch/arm/uefi.rst @@ -0,0 +1,70 @@ +================================================ +The Unified Extensible Firmware Interface (UEFI) +================================================ + +UEFI, the Unified Extensible Firmware Interface, is a specification +governing the behaviours of compatible firmware interfaces. It is +maintained by the UEFI Forum - http://www.uefi.org/. + +UEFI is an evolution of its predecessor 'EFI', so the terms EFI and +UEFI are used somewhat interchangeably in this document and associated +source code. As a rule, anything new uses 'UEFI', whereas 'EFI' refers +to legacy code or specifications. + +UEFI support in Linux +===================== +Booting on a platform with firmware compliant with the UEFI specification +makes it possible for the kernel to support additional features: + +- UEFI Runtime Services +- Retrieving various configuration information through the standardised + interface of UEFI configuration tables. (ACPI, SMBIOS, ...) + +For actually enabling [U]EFI support, enable: + +- CONFIG_EFI=y +- CONFIG_EFIVAR_FS=y or m + +The implementation depends on receiving information about the UEFI environment +in a Flattened Device Tree (FDT) - so is only available with CONFIG_OF. + +UEFI stub +========= +The "stub" is a feature that extends the Image/zImage into a valid UEFI +PE/COFF executable, including a loader application that makes it possible to +load the kernel directly from the UEFI shell, boot menu, or one of the +lightweight bootloaders like Gummiboot or rEFInd. + +The kernel image built with stub support remains a valid kernel image for +booting in non-UEFI environments. + +UEFI kernel support on ARM +========================== +UEFI kernel support on the ARM architectures (arm and arm64) is only available +when boot is performed through the stub. + +When booting in UEFI mode, the stub deletes any memory nodes from a provided DT. +Instead, the kernel reads the UEFI memory map. + +The stub populates the FDT /chosen node with (and the kernel scans for) the +following parameters: + +========================== ====== =========================================== +Name Size Description +========================== ====== =========================================== +linux,uefi-system-table 64-bit Physical address of the UEFI System Table. + +linux,uefi-mmap-start 64-bit Physical address of the UEFI memory map, + populated by the UEFI GetMemoryMap() call. + +linux,uefi-mmap-size 32-bit Size in bytes of the UEFI memory map + pointed to in previous entry. + +linux,uefi-mmap-desc-size 32-bit Size in bytes of each entry in the UEFI + memory map. + +linux,uefi-mmap-desc-ver 32-bit Version of the mmap descriptor format. + +kaslr-seed 64-bit Entropy used to randomize the kernel image + base address location. +========================== ====== =========================================== diff --git a/Documentation/arch/arm/vfp/release-notes.rst b/Documentation/arch/arm/vfp/release-notes.rst new file mode 100644 index 000000000000..c6b04937cee3 --- /dev/null +++ b/Documentation/arch/arm/vfp/release-notes.rst @@ -0,0 +1,57 @@ +=============================================== +Release notes for Linux Kernel VFP support code +=============================================== + +Date: 20 May 2004 + +Author: Russell King + +This is the first release of the Linux Kernel VFP support code. It +provides support for the exceptions bounced from VFP hardware found +on ARM926EJ-S. + +This release has been validated against the SoftFloat-2b library by +John R. Hauser using the TestFloat-2a test suite. Details of this +library and test suite can be found at: + + http://www.jhauser.us/arithmetic/SoftFloat.html + +The operations which have been tested with this package are: + + - fdiv + - fsub + - fadd + - fmul + - fcmp + - fcmpe + - fcvtd + - fcvts + - fsito + - ftosi + - fsqrt + +All the above pass softfloat tests with the following exceptions: + +- fadd/fsub shows some differences in the handling of +0 / -0 results + when input operands differ in signs. +- the handling of underflow exceptions is slightly different. If a + result underflows before rounding, but becomes a normalised number + after rounding, we do not signal an underflow exception. + +Other operations which have been tested by basic assembly-only tests +are: + + - fcpy + - fabs + - fneg + - ftoui + - ftosiz + - ftouiz + +The combination operations have not been tested: + + - fmac + - fnmac + - fmsc + - fnmsc + - fnmul diff --git a/Documentation/arch/arm/vlocks.rst b/Documentation/arch/arm/vlocks.rst new file mode 100644 index 000000000000..a40a1742110b --- /dev/null +++ b/Documentation/arch/arm/vlocks.rst @@ -0,0 +1,212 @@ +====================================== +vlocks for Bare-Metal Mutual Exclusion +====================================== + +Voting Locks, or "vlocks" provide a simple low-level mutual exclusion +mechanism, with reasonable but minimal requirements on the memory +system. + +These are intended to be used to coordinate critical activity among CPUs +which are otherwise non-coherent, in situations where the hardware +provides no other mechanism to support this and ordinary spinlocks +cannot be used. + + +vlocks make use of the atomicity provided by the memory system for +writes to a single memory location. To arbitrate, every CPU "votes for +itself", by storing a unique number to a common memory location. The +final value seen in that memory location when all the votes have been +cast identifies the winner. + +In order to make sure that the election produces an unambiguous result +in finite time, a CPU will only enter the election in the first place if +no winner has been chosen and the election does not appear to have +started yet. + + +Algorithm +--------- + +The easiest way to explain the vlocks algorithm is with some pseudo-code:: + + + int currently_voting[NR_CPUS] = { 0, }; + int last_vote = -1; /* no votes yet */ + + bool vlock_trylock(int this_cpu) + { + /* signal our desire to vote */ + currently_voting[this_cpu] = 1; + if (last_vote != -1) { + /* someone already volunteered himself */ + currently_voting[this_cpu] = 0; + return false; /* not ourself */ + } + + /* let's suggest ourself */ + last_vote = this_cpu; + currently_voting[this_cpu] = 0; + + /* then wait until everyone else is done voting */ + for_each_cpu(i) { + while (currently_voting[i] != 0) + /* wait */; + } + + /* result */ + if (last_vote == this_cpu) + return true; /* we won */ + return false; + } + + bool vlock_unlock(void) + { + last_vote = -1; + } + + +The currently_voting[] array provides a way for the CPUs to determine +whether an election is in progress, and plays a role analogous to the +"entering" array in Lamport's bakery algorithm [1]. + +However, once the election has started, the underlying memory system +atomicity is used to pick the winner. This avoids the need for a static +priority rule to act as a tie-breaker, or any counters which could +overflow. + +As long as the last_vote variable is globally visible to all CPUs, it +will contain only one value that won't change once every CPU has cleared +its currently_voting flag. + + +Features and limitations +------------------------ + + * vlocks are not intended to be fair. In the contended case, it is the + _last_ CPU which attempts to get the lock which will be most likely + to win. + + vlocks are therefore best suited to situations where it is necessary + to pick a unique winner, but it does not matter which CPU actually + wins. + + * Like other similar mechanisms, vlocks will not scale well to a large + number of CPUs. + + vlocks can be cascaded in a voting hierarchy to permit better scaling + if necessary, as in the following hypothetical example for 4096 CPUs:: + + /* first level: local election */ + my_town = towns[(this_cpu >> 4) & 0xf]; + I_won = vlock_trylock(my_town, this_cpu & 0xf); + if (I_won) { + /* we won the town election, let's go for the state */ + my_state = states[(this_cpu >> 8) & 0xf]; + I_won = vlock_lock(my_state, this_cpu & 0xf)); + if (I_won) { + /* and so on */ + I_won = vlock_lock(the_whole_country, this_cpu & 0xf]; + if (I_won) { + /* ... */ + } + vlock_unlock(the_whole_country); + } + vlock_unlock(my_state); + } + vlock_unlock(my_town); + + +ARM implementation +------------------ + +The current ARM implementation [2] contains some optimisations beyond +the basic algorithm: + + * By packing the members of the currently_voting array close together, + we can read the whole array in one transaction (providing the number + of CPUs potentially contending the lock is small enough). This + reduces the number of round-trips required to external memory. + + In the ARM implementation, this means that we can use a single load + and comparison:: + + LDR Rt, [Rn] + CMP Rt, #0 + + ...in place of code equivalent to:: + + LDRB Rt, [Rn] + CMP Rt, #0 + LDRBEQ Rt, [Rn, #1] + CMPEQ Rt, #0 + LDRBEQ Rt, [Rn, #2] + CMPEQ Rt, #0 + LDRBEQ Rt, [Rn, #3] + CMPEQ Rt, #0 + + This cuts down on the fast-path latency, as well as potentially + reducing bus contention in contended cases. + + The optimisation relies on the fact that the ARM memory system + guarantees coherency between overlapping memory accesses of + different sizes, similarly to many other architectures. Note that + we do not care which element of currently_voting appears in which + bits of Rt, so there is no need to worry about endianness in this + optimisation. + + If there are too many CPUs to read the currently_voting array in + one transaction then multiple transations are still required. The + implementation uses a simple loop of word-sized loads for this + case. The number of transactions is still fewer than would be + required if bytes were loaded individually. + + + In principle, we could aggregate further by using LDRD or LDM, but + to keep the code simple this was not attempted in the initial + implementation. + + + * vlocks are currently only used to coordinate between CPUs which are + unable to enable their caches yet. This means that the + implementation removes many of the barriers which would be required + when executing the algorithm in cached memory. + + packing of the currently_voting array does not work with cached + memory unless all CPUs contending the lock are cache-coherent, due + to cache writebacks from one CPU clobbering values written by other + CPUs. (Though if all the CPUs are cache-coherent, you should be + probably be using proper spinlocks instead anyway). + + + * The "no votes yet" value used for the last_vote variable is 0 (not + -1 as in the pseudocode). This allows statically-allocated vlocks + to be implicitly initialised to an unlocked state simply by putting + them in .bss. + + An offset is added to each CPU's ID for the purpose of setting this + variable, so that no CPU uses the value 0 for its ID. + + +Colophon +-------- + +Originally created and documented by Dave Martin for Linaro Limited, for +use in ARM-based big.LITTLE platforms, with review and input gratefully +received from Nicolas Pitre and Achin Gupta. Thanks to Nicolas for +grabbing most of this text out of the relevant mail thread and writing +up the pseudocode. + +Copyright (C) 2012-2013 Linaro Limited +Distributed under the terms of Version 2 of the GNU General Public +License, as defined in linux/COPYING. + + +References +---------- + +[1] Lamport, L. "A New Solution of Dijkstra's Concurrent Programming + Problem", Communications of the ACM 17, 8 (August 1974), 453-455. + + https://en.wikipedia.org/wiki/Lamport%27s_bakery_algorithm + +[2] linux/arch/arm/common/vlock.S, www.kernel.org. diff --git a/Documentation/arch/arm64/acpi_object_usage.rst b/Documentation/arch/arm64/acpi_object_usage.rst new file mode 100644 index 000000000000..1da22200fdf8 --- /dev/null +++ b/Documentation/arch/arm64/acpi_object_usage.rst @@ -0,0 +1,809 @@ +=========== +ACPI Tables +=========== + +The expectations of individual ACPI tables are discussed in the list that +follows. + +If a section number is used, it refers to a section number in the ACPI +specification where the object is defined. If "Signature Reserved" is used, +the table signature (the first four bytes of the table) is the only portion +of the table recognized by the specification, and the actual table is defined +outside of the UEFI Forum (see Section 5.2.6 of the specification). + +For ACPI on arm64, tables also fall into the following categories: + + - Required: DSDT, FADT, GTDT, MADT, MCFG, RSDP, SPCR, XSDT + + - Recommended: BERT, EINJ, ERST, HEST, PCCT, SSDT + + - Optional: AGDI, BGRT, CEDT, CPEP, CSRT, DBG2, DRTM, ECDT, FACS, FPDT, + HMAT, IBFT, IORT, MCHI, MPAM, MPST, MSCT, NFIT, PMTT, PPTT, RASF, SBST, + SDEI, SLIT, SPMI, SRAT, STAO, TCPA, TPM2, UEFI, XENV + + - Not supported: AEST, APMT, BOOT, DBGP, DMAR, ETDT, HPET, IVRS, LPIT, + MSDM, OEMx, PDTT, PSDT, RAS2, RSDT, SLIC, WAET, WDAT, WDRT, WPBT + +====== ======================================================================== +Table Usage for ARMv8 Linux +====== ======================================================================== +AEST Signature Reserved (signature == "AEST") + + **Arm Error Source Table** + + This table informs the OS of any error nodes in the system that are + compliant with the Arm RAS architecture. + +AGDI Signature Reserved (signature == "AGDI") + + **Arm Generic diagnostic Dump and Reset Device Interface Table** + + This table describes a non-maskable event, that is used by the platform + firmware, to request the OS to generate a diagnostic dump and reset the device. + +APMT Signature Reserved (signature == "APMT") + + **Arm Performance Monitoring Table** + + This table describes the properties of PMU support implmented by + components in the system. + +BERT Section 18.3 (signature == "BERT") + + **Boot Error Record Table** + + Must be supplied if RAS support is provided by the platform. It + is recommended this table be supplied. + +BOOT Signature Reserved (signature == "BOOT") + + **simple BOOT flag table** + + Microsoft only table, will not be supported. + +BGRT Section 5.2.22 (signature == "BGRT") + + **Boot Graphics Resource Table** + + Optional, not currently supported, with no real use-case for an + ARM server. + +CEDT Signature Reserved (signature == "CEDT") + + **CXL Early Discovery Table** + + This table allows the OS to discover any CXL Host Bridges and the Host + Bridge registers. + +CPEP Section 5.2.18 (signature == "CPEP") + + **Corrected Platform Error Polling table** + + Optional, not currently supported, and not recommended until such + time as ARM-compatible hardware is available, and the specification + suitably modified. + +CSRT Signature Reserved (signature == "CSRT") + + **Core System Resources Table** + + Optional, not currently supported. + +DBG2 Signature Reserved (signature == "DBG2") + + **DeBuG port table 2** + + License has changed and should be usable. Optional if used instead + of earlycon=<device> on the command line. + +DBGP Signature Reserved (signature == "DBGP") + + **DeBuG Port table** + + Microsoft only table, will not be supported. + +DSDT Section 5.2.11.1 (signature == "DSDT") + + **Differentiated System Description Table** + + A DSDT is required; see also SSDT. + + ACPI tables contain only one DSDT but can contain one or more SSDTs, + which are optional. Each SSDT can only add to the ACPI namespace, + but cannot modify or replace anything in the DSDT. + +DMAR Signature Reserved (signature == "DMAR") + + **DMA Remapping table** + + x86 only table, will not be supported. + +DRTM Signature Reserved (signature == "DRTM") + + **Dynamic Root of Trust for Measurement table** + + Optional, not currently supported. + +ECDT Section 5.2.16 (signature == "ECDT") + + **Embedded Controller Description Table** + + Optional, not currently supported, but could be used on ARM if and + only if one uses the GPE_BIT field to represent an IRQ number, since + there are no GPE blocks defined in hardware reduced mode. This would + need to be modified in the ACPI specification. + +EINJ Section 18.6 (signature == "EINJ") + + **Error Injection table** + + This table is very useful for testing platform response to error + conditions; it allows one to inject an error into the system as + if it had actually occurred. However, this table should not be + shipped with a production system; it should be dynamically loaded + and executed with the ACPICA tools only during testing. + +ERST Section 18.5 (signature == "ERST") + + **Error Record Serialization Table** + + On a platform supports RAS, this table must be supplied if it is not + UEFI-based; if it is UEFI-based, this table may be supplied. When this + table is not present, UEFI run time service will be utilized to save + and retrieve hardware error information to and from a persistent store. + +ETDT Signature Reserved (signature == "ETDT") + + **Event Timer Description Table** + + Obsolete table, will not be supported. + +FACS Section 5.2.10 (signature == "FACS") + + **Firmware ACPI Control Structure** + + It is unlikely that this table will be terribly useful. If it is + provided, the Global Lock will NOT be used since it is not part of + the hardware reduced profile, and only 64-bit address fields will + be considered valid. + +FADT Section 5.2.9 (signature == "FACP") + + **Fixed ACPI Description Table** + Required for arm64. + + + The HW_REDUCED_ACPI flag must be set. All of the fields that are + to be ignored when HW_REDUCED_ACPI is set are expected to be set to + zero. + + If an FACS table is provided, the X_FIRMWARE_CTRL field is to be + used, not FIRMWARE_CTRL. + + If PSCI is used (as is recommended), make sure that ARM_BOOT_ARCH is + filled in properly - that the PSCI_COMPLIANT flag is set and that + PSCI_USE_HVC is set or unset as needed (see table 5-37). + + For the DSDT that is also required, the X_DSDT field is to be used, + not the DSDT field. + +FPDT Section 5.2.23 (signature == "FPDT") + + **Firmware Performance Data Table** + + Optional, useful for boot performance profiling. + +GTDT Section 5.2.24 (signature == "GTDT") + + **Generic Timer Description Table** + + Required for arm64. + +HEST Section 18.3.2 (signature == "HEST") + + **Hardware Error Source Table** + + ARM-specific error sources have been defined; please use those or the + PCI types such as type 6 (AER Root Port), 7 (AER Endpoint), or 8 (AER + Bridge), or use type 9 (Generic Hardware Error Source). Firmware first + error handling is possible if and only if Trusted Firmware is being + used on arm64. + + Must be supplied if RAS support is provided by the platform. It + is recommended this table be supplied. + +HMAT Section 5.2.28 (signature == "HMAT") + + **Heterogeneous Memory Attribute Table** + + This table describes the memory attributes, such as memory side cache + attributes and bandwidth and latency details, related to Memory Proximity + Domains. The OS uses this information to optimize the system memory + configuration. + +HPET Signature Reserved (signature == "HPET") + + **High Precision Event timer Table** + + x86 only table, will not be supported. + +IBFT Signature Reserved (signature == "IBFT") + + **iSCSI Boot Firmware Table** + + Microsoft defined table, support TBD. + +IORT Signature Reserved (signature == "IORT") + + **Input Output Remapping Table** + + arm64 only table, required in order to describe IO topology, SMMUs, + and GIC ITSs, and how those various components are connected together, + such as identifying which components are behind which SMMUs/ITSs. + This table will only be required on certain SBSA platforms (e.g., + when using GICv3-ITS and an SMMU); on SBSA Level 0 platforms, it + remains optional. + +IVRS Signature Reserved (signature == "IVRS") + + **I/O Virtualization Reporting Structure** + + x86_64 (AMD) only table, will not be supported. + +LPIT Signature Reserved (signature == "LPIT") + + **Low Power Idle Table** + + x86 only table as of ACPI 5.1; starting with ACPI 6.0, processor + descriptions and power states on ARM platforms should use the DSDT + and define processor container devices (_HID ACPI0010, Section 8.4, + and more specifically 8.4.3 and 8.4.4). + +MADT Section 5.2.12 (signature == "APIC") + + **Multiple APIC Description Table** + + Required for arm64. Only the GIC interrupt controller structures + should be used (types 0xA - 0xF). + +MCFG Signature Reserved (signature == "MCFG") + + **Memory-mapped ConFiGuration space** + + If the platform supports PCI/PCIe, an MCFG table is required. + +MCHI Signature Reserved (signature == "MCHI") + + **Management Controller Host Interface table** + + Optional, not currently supported. + +MPAM Signature Reserved (signature == "MPAM") + + **Memory Partitioning And Monitoring table** + + This table allows the OS to discover the MPAM controls implemented by + the subsystems. + +MPST Section 5.2.21 (signature == "MPST") + + **Memory Power State Table** + + Optional, not currently supported. + +MSCT Section 5.2.19 (signature == "MSCT") + + **Maximum System Characteristic Table** + + Optional, not currently supported. + +MSDM Signature Reserved (signature == "MSDM") + + **Microsoft Data Management table** + + Microsoft only table, will not be supported. + +NFIT Section 5.2.25 (signature == "NFIT") + + **NVDIMM Firmware Interface Table** + + Optional, not currently supported. + +OEMx Signature of "OEMx" only + + **OEM Specific Tables** + + All tables starting with a signature of "OEM" are reserved for OEM + use. Since these are not meant to be of general use but are limited + to very specific end users, they are not recommended for use and are + not supported by the kernel for arm64. + +PCCT Section 14.1 (signature == "PCCT) + + **Platform Communications Channel Table** + + Recommend for use on arm64; use of PCC is recommended when using CPPC + to control performance and power for platform processors. + +PDTT Section 5.2.29 (signature == "PDTT") + + **Platform Debug Trigger Table** + + This table describes PCC channels used to gather debug logs of + non-architectural features. + + +PMTT Section 5.2.21.12 (signature == "PMTT") + + **Platform Memory Topology Table** + + Optional, not currently supported. + +PPTT Section 5.2.30 (signature == "PPTT") + + **Processor Properties Topology Table** + + This table provides the processor and cache topology. + +PSDT Section 5.2.11.3 (signature == "PSDT") + + **Persistent System Description Table** + + Obsolete table, will not be supported. + +RAS2 Section 5.2.21 (signature == "RAS2") + + **RAS Features 2 table** + + This table provides interfaces for the RAS capabilities implemented in + the platform. + +RASF Section 5.2.20 (signature == "RASF") + + **RAS Feature table** + + Optional, not currently supported. + +RSDP Section 5.2.5 (signature == "RSD PTR") + + **Root System Description PoinTeR** + + Required for arm64. + +RSDT Section 5.2.7 (signature == "RSDT") + + **Root System Description Table** + + Since this table can only provide 32-bit addresses, it is deprecated + on arm64, and will not be used. If provided, it will be ignored. + +SBST Section 5.2.14 (signature == "SBST") + + **Smart Battery Subsystem Table** + + Optional, not currently supported. + +SDEI Signature Reserved (signature == "SDEI") + + **Software Delegated Exception Interface table** + + This table advertises the presence of the SDEI interface. + +SLIC Signature Reserved (signature == "SLIC") + + **Software LIcensing table** + + Microsoft only table, will not be supported. + +SLIT Section 5.2.17 (signature == "SLIT") + + **System Locality distance Information Table** + + Optional in general, but required for NUMA systems. + +SPCR Signature Reserved (signature == "SPCR") + + **Serial Port Console Redirection table** + + Required for arm64. + +SPMI Signature Reserved (signature == "SPMI") + + **Server Platform Management Interface table** + + Optional, not currently supported. + +SRAT Section 5.2.16 (signature == "SRAT") + + **System Resource Affinity Table** + + Optional, but if used, only the GICC Affinity structures are read. + To support arm64 NUMA, this table is required. + +SSDT Section 5.2.11.2 (signature == "SSDT") + + **Secondary System Description Table** + + These tables are a continuation of the DSDT; these are recommended + for use with devices that can be added to a running system, but can + also serve the purpose of dividing up device descriptions into more + manageable pieces. + + An SSDT can only ADD to the ACPI namespace. It cannot modify or + replace existing device descriptions already in the namespace. + + These tables are optional, however. ACPI tables should contain only + one DSDT but can contain many SSDTs. + +STAO Signature Reserved (signature == "STAO") + + **_STA Override table** + + Optional, but only necessary in virtualized environments in order to + hide devices from guest OSs. + +TCPA Signature Reserved (signature == "TCPA") + + **Trusted Computing Platform Alliance table** + + Optional, not currently supported, and may need changes to fully + interoperate with arm64. + +TPM2 Signature Reserved (signature == "TPM2") + + **Trusted Platform Module 2 table** + + Optional, not currently supported, and may need changes to fully + interoperate with arm64. + +UEFI Signature Reserved (signature == "UEFI") + + **UEFI ACPI data table** + + Optional, not currently supported. No known use case for arm64, + at present. + +WAET Signature Reserved (signature == "WAET") + + **Windows ACPI Emulated devices Table** + + Microsoft only table, will not be supported. + +WDAT Signature Reserved (signature == "WDAT") + + **Watch Dog Action Table** + + Microsoft only table, will not be supported. + +WDRT Signature Reserved (signature == "WDRT") + + **Watch Dog Resource Table** + + Microsoft only table, will not be supported. + +WPBT Signature Reserved (signature == "WPBT") + + **Windows Platform Binary Table** + + Microsoft only table, will not be supported. + +XENV Signature Reserved (signature == "XENV") + + **Xen project table** + + Optional, used only by Xen at present. + +XSDT Section 5.2.8 (signature == "XSDT") + + **eXtended System Description Table** + + Required for arm64. +====== ======================================================================== + +ACPI Objects +------------ +The expectations on individual ACPI objects that are likely to be used are +shown in the list that follows; any object not explicitly mentioned below +should be used as needed for a particular platform or particular subsystem, +such as power management or PCI. + +===== ================ ======================================================== +Name Section Usage for ARMv8 Linux +===== ================ ======================================================== +_CCA 6.2.17 This method must be defined for all bus masters + on arm64 - there are no assumptions made about + whether such devices are cache coherent or not. + The _CCA value is inherited by all descendants of + these devices so it does not need to be repeated. + Without _CCA on arm64, the kernel does not know what + to do about setting up DMA for the device. + + NB: this method provides default cache coherency + attributes; the presence of an SMMU can be used to + modify that, however. For example, a master could + default to non-coherent, but be made coherent with + the appropriate SMMU configuration (see Table 17 of + the IORT specification, ARM Document DEN 0049B). + +_CID 6.1.2 Use as needed, see also _HID. + +_CLS 6.1.3 Use as needed, see also _HID. + +_CPC 8.4.7.1 Use as needed, power management specific. CPPC is + recommended on arm64. + +_CRS 6.2.2 Required on arm64. + +_CSD 8.4.2.2 Use as needed, used only in conjunction with _CST. + +_CST 8.4.2.1 Low power idle states (8.4.4) are recommended instead + of C-states. + +_DDN 6.1.4 This field can be used for a device name. However, + it is meant for DOS device names (e.g., COM1), so be + careful of its use across OSes. + +_DSD 6.2.5 To be used with caution. If this object is used, try + to use it within the constraints already defined by the + Device Properties UUID. Only in rare circumstances + should it be necessary to create a new _DSD UUID. + + In either case, submit the _DSD definition along with + any driver patches for discussion, especially when + device properties are used. A driver will not be + considered complete without a corresponding _DSD + description. Once approved by kernel maintainers, + the UUID or device properties must then be registered + with the UEFI Forum; this may cause some iteration as + more than one OS will be registering entries. + +_DSM 9.1.1 Do not use this method. It is not standardized, the + return values are not well documented, and it is + currently a frequent source of error. + +\_GL 5.7.1 This object is not to be used in hardware reduced + mode, and therefore should not be used on arm64. + +_GLK 6.5.7 This object requires a global lock be defined; there + is no global lock on arm64 since it runs in hardware + reduced mode. Hence, do not use this object on arm64. + +\_GPE 5.3.1 This namespace is for x86 use only. Do not use it + on arm64. + +_HID 6.1.5 This is the primary object to use in device probing, + though _CID and _CLS may also be used. + +_INI 6.5.1 Not required, but can be useful in setting up devices + when UEFI leaves them in a state that may not be what + the driver expects before it starts probing. + +_LPI 8.4.4.3 Recommended for use with processor definitions (_HID + ACPI0010) on arm64. See also _RDI. + +_MLS 6.1.7 Highly recommended for use in internationalization. + +_OFF 7.2.2 It is recommended to define this method for any device + that can be turned on or off. + +_ON 7.2.3 It is recommended to define this method for any device + that can be turned on or off. + +\_OS 5.7.3 This method will return "Linux" by default (this is + the value of the macro ACPI_OS_NAME on Linux). The + command line parameter acpi_os=<string> can be used + to set it to some other value. + +_OSC 6.2.11 This method can be a global method in ACPI (i.e., + \_SB._OSC), or it may be associated with a specific + device (e.g., \_SB.DEV0._OSC), or both. When used + as a global method, only capabilities published in + the ACPI specification are allowed. When used as + a device-specific method, the process described for + using _DSD MUST be used to create an _OSC definition; + out-of-process use of _OSC is not allowed. That is, + submit the device-specific _OSC usage description as + part of the kernel driver submission, get it approved + by the kernel community, then register it with the + UEFI Forum. + +\_OSI 5.7.2 Deprecated on ARM64. As far as ACPI firmware is + concerned, _OSI is not to be used to determine what + sort of system is being used or what functionality + is provided. The _OSC method is to be used instead. + +_PDC 8.4.1 Deprecated, do not use on arm64. + +\_PIC 5.8.1 The method should not be used. On arm64, the only + interrupt model available is GIC. + +\_PR 5.3.1 This namespace is for x86 use only on legacy systems. + Do not use it on arm64. + +_PRT 6.2.13 Required as part of the definition of all PCI root + devices. + +_PRx 7.3.8-11 Use as needed; power management specific. If _PR0 is + defined, _PR3 must also be defined. + +_PSx 7.3.2-5 Use as needed; power management specific. If _PS0 is + defined, _PS3 must also be defined. If clocks or + regulators need adjusting to be consistent with power + usage, change them in these methods. + +_RDI 8.4.4.4 Recommended for use with processor definitions (_HID + ACPI0010) on arm64. This should only be used in + conjunction with _LPI. + +\_REV 5.7.4 Always returns the latest version of ACPI supported. + +\_SB 5.3.1 Required on arm64; all devices must be defined in this + namespace. + +_SLI 6.2.15 Use is recommended when SLIT table is in use. + +_STA 6.3.7, It is recommended to define this method for any device + 7.2.4 that can be turned on or off. See also the STAO table + that provides overrides to hide devices in virtualized + environments. + +_SRS 6.2.16 Use as needed; see also _PRS. + +_STR 6.1.10 Recommended for conveying device names to end users; + this is preferred over using _DDN. + +_SUB 6.1.9 Use as needed; _HID or _CID are preferred. + +_SUN 6.1.11 Use as needed, but recommended. + +_SWS 7.4.3 Use as needed; power management specific; this may + require specification changes for use on arm64. + +_UID 6.1.12 Recommended for distinguishing devices of the same + class; define it if at all possible. +===== ================ ======================================================== + + + + +ACPI Event Model +---------------- +Do not use GPE block devices; these are not supported in the hardware reduced +profile used by arm64. Since there are no GPE blocks defined for use on ARM +platforms, ACPI events must be signaled differently. + +There are two options: GPIO-signaled interrupts (Section 5.6.5), and +interrupt-signaled events (Section 5.6.9). Interrupt-signaled events are a +new feature in the ACPI 6.1 specification. Either - or both - can be used +on a given platform, and which to use may be dependent of limitations in any +given SoC. If possible, interrupt-signaled events are recommended. + + +ACPI Processor Control +---------------------- +Section 8 of the ACPI specification changed significantly in version 6.0. +Processors should now be defined as Device objects with _HID ACPI0007; do +not use the deprecated Processor statement in ASL. All multiprocessor systems +should also define a hierarchy of processors, done with Processor Container +Devices (see Section 8.4.3.1, _HID ACPI0010); do not use processor aggregator +devices (Section 8.5) to describe processor topology. Section 8.4 of the +specification describes the semantics of these object definitions and how +they interrelate. + +Most importantly, the processor hierarchy defined also defines the low power +idle states that are available to the platform, along with the rules for +determining which processors can be turned on or off and the circumstances +that control that. Without this information, the processors will run in +whatever power state they were left in by UEFI. + +Note too, that the processor Device objects defined and the entries in the +MADT for GICs are expected to be in synchronization. The _UID of the Device +object must correspond to processor IDs used in the MADT. + +It is recommended that CPPC (8.4.5) be used as the primary model for processor +performance control on arm64. C-states and P-states may become available at +some point in the future, but most current design work appears to favor CPPC. + +Further, it is essential that the ARMv8 SoC provide a fully functional +implementation of PSCI; this will be the only mechanism supported by ACPI +to control CPU power state. Booting of secondary CPUs using the ACPI +parking protocol is possible, but discouraged, since only PSCI is supported +for ARM servers. + + +ACPI System Address Map Interfaces +---------------------------------- +In Section 15 of the ACPI specification, several methods are mentioned as +possible mechanisms for conveying memory resource information to the kernel. +For arm64, we will only support UEFI for booting with ACPI, hence the UEFI +GetMemoryMap() boot service is the only mechanism that will be used. + + +ACPI Platform Error Interfaces (APEI) +------------------------------------- +The APEI tables supported are described above. + +APEI requires the equivalent of an SCI and an NMI on ARMv8. The SCI is used +to notify the OSPM of errors that have occurred but can be corrected and the +system can continue correct operation, even if possibly degraded. The NMI is +used to indicate fatal errors that cannot be corrected, and require immediate +attention. + +Since there is no direct equivalent of the x86 SCI or NMI, arm64 handles +these slightly differently. The SCI is handled as a high priority interrupt; +given that these are corrected (or correctable) errors being reported, this +is sufficient. The NMI is emulated as the highest priority interrupt +possible. This implies some caution must be used since there could be +interrupts at higher privilege levels or even interrupts at the same priority +as the emulated NMI. In Linux, this should not be the case but one should +be aware it could happen. + + +ACPI Objects Not Supported on ARM64 +----------------------------------- +While this may change in the future, there are several classes of objects +that can be defined, but are not currently of general interest to ARM servers. +Some of these objects have x86 equivalents, and may actually make sense in ARM +servers. However, there is either no hardware available at present, or there +may not even be a non-ARM implementation yet. Hence, they are not currently +supported. + +The following classes of objects are not supported: + + - Section 9.2: ambient light sensor devices + + - Section 9.3: battery devices + + - Section 9.4: lids (e.g., laptop lids) + + - Section 9.8.2: IDE controllers + + - Section 9.9: floppy controllers + + - Section 9.10: GPE block devices + + - Section 9.15: PC/AT RTC/CMOS devices + + - Section 9.16: user presence detection devices + + - Section 9.17: I/O APIC devices; all GICs must be enumerable via MADT + + - Section 9.18: time and alarm devices (see 9.15) + + - Section 10: power source and power meter devices + + - Section 11: thermal management + + - Section 12: embedded controllers interface + + - Section 13: SMBus interfaces + + +This also means that there is no support for the following objects: + +==== =========================== ==== ========== +Name Section Name Section +==== =========================== ==== ========== +_ALC 9.3.4 _FDM 9.10.3 +_ALI 9.3.2 _FIX 6.2.7 +_ALP 9.3.6 _GAI 10.4.5 +_ALR 9.3.5 _GHL 10.4.7 +_ALT 9.3.3 _GTM 9.9.2.1.1 +_BCT 10.2.2.10 _LID 9.5.1 +_BDN 6.5.3 _PAI 10.4.4 +_BIF 10.2.2.1 _PCL 10.3.2 +_BIX 10.2.2.1 _PIF 10.3.3 +_BLT 9.2.3 _PMC 10.4.1 +_BMA 10.2.2.4 _PMD 10.4.8 +_BMC 10.2.2.12 _PMM 10.4.3 +_BMD 10.2.2.11 _PRL 10.3.4 +_BMS 10.2.2.5 _PSR 10.3.1 +_BST 10.2.2.6 _PTP 10.4.2 +_BTH 10.2.2.7 _SBS 10.1.3 +_BTM 10.2.2.9 _SHL 10.4.6 +_BTP 10.2.2.8 _STM 9.9.2.1.1 +_DCK 6.5.2 _UPD 9.16.1 +_EC 12.12 _UPP 9.16.2 +_FDE 9.10.1 _WPC 10.5.2 +_FDI 9.10.2 _WPP 10.5.3 +==== =========================== ==== ========== diff --git a/Documentation/arch/arm64/amu.rst b/Documentation/arch/arm64/amu.rst new file mode 100644 index 000000000000..01f2de2b0450 --- /dev/null +++ b/Documentation/arch/arm64/amu.rst @@ -0,0 +1,119 @@ +.. _amu_index: + +======================================================= +Activity Monitors Unit (AMU) extension in AArch64 Linux +======================================================= + +Author: Ionela Voinescu <ionela.voinescu@arm.com> + +Date: 2019-09-10 + +This document briefly describes the provision of Activity Monitors Unit +support in AArch64 Linux. + + +Architecture overview +--------------------- + +The activity monitors extension is an optional extension introduced by the +ARMv8.4 CPU architecture. + +The activity monitors unit, implemented in each CPU, provides performance +counters intended for system management use. The AMU extension provides a +system register interface to the counter registers and also supports an +optional external memory-mapped interface. + +Version 1 of the Activity Monitors architecture implements a counter group +of four fixed and architecturally defined 64-bit event counters. + + - CPU cycle counter: increments at the frequency of the CPU. + - Constant counter: increments at the fixed frequency of the system + clock. + - Instructions retired: increments with every architecturally executed + instruction. + - Memory stall cycles: counts instruction dispatch stall cycles caused by + misses in the last level cache within the clock domain. + +When in WFI or WFE these counters do not increment. + +The Activity Monitors architecture provides space for up to 16 architected +event counters. Future versions of the architecture may use this space to +implement additional architected event counters. + +Additionally, version 1 implements a counter group of up to 16 auxiliary +64-bit event counters. + +On cold reset all counters reset to 0. + + +Basic support +------------- + +The kernel can safely run a mix of CPUs with and without support for the +activity monitors extension. Therefore, when CONFIG_ARM64_AMU_EXTN is +selected we unconditionally enable the capability to allow any late CPU +(secondary or hotplugged) to detect and use the feature. + +When the feature is detected on a CPU, we flag the availability of the +feature but this does not guarantee the correct functionality of the +counters, only the presence of the extension. + +Firmware (code running at higher exception levels, e.g. arm-tf) support is +needed to: + + - Enable access for lower exception levels (EL2 and EL1) to the AMU + registers. + - Enable the counters. If not enabled these will read as 0. + - Save/restore the counters before/after the CPU is being put/brought up + from the 'off' power state. + +When using kernels that have this feature enabled but boot with broken +firmware the user may experience panics or lockups when accessing the +counter registers. Even if these symptoms are not observed, the values +returned by the register reads might not correctly reflect reality. Most +commonly, the counters will read as 0, indicating that they are not +enabled. + +If proper support is not provided in firmware it's best to disable +CONFIG_ARM64_AMU_EXTN. To be noted that for security reasons, this does not +bypass the setting of AMUSERENR_EL0 to trap accesses from EL0 (userspace) to +EL1 (kernel). Therefore, firmware should still ensure accesses to AMU registers +are not trapped in EL2/EL3. + +The fixed counters of AMUv1 are accessible though the following system +register definitions: + + - SYS_AMEVCNTR0_CORE_EL0 + - SYS_AMEVCNTR0_CONST_EL0 + - SYS_AMEVCNTR0_INST_RET_EL0 + - SYS_AMEVCNTR0_MEM_STALL_EL0 + +Auxiliary platform specific counters can be accessed using +SYS_AMEVCNTR1_EL0(n), where n is a value between 0 and 15. + +Details can be found in: arch/arm64/include/asm/sysreg.h. + + +Userspace access +---------------- + +Currently, access from userspace to the AMU registers is disabled due to: + + - Security reasons: they might expose information about code executed in + secure mode. + - Purpose: AMU counters are intended for system management use. + +Also, the presence of the feature is not visible to userspace. + + +Virtualization +-------------- + +Currently, access from userspace (EL0) and kernelspace (EL1) on the KVM +guest side is disabled due to: + + - Security reasons: they might expose information about code executed + by other guests or the host. + +Any attempt to access the AMU registers will result in an UNDEFINED +exception being injected into the guest. diff --git a/Documentation/arch/arm64/arm-acpi.rst b/Documentation/arch/arm64/arm-acpi.rst new file mode 100644 index 000000000000..94274a8d84cf --- /dev/null +++ b/Documentation/arch/arm64/arm-acpi.rst @@ -0,0 +1,575 @@ +=================== +ACPI on Arm systems +=================== + +ACPI can be used for Armv8 and Armv9 systems designed to follow +the BSA (Arm Base System Architecture) [0] and BBR (Arm +Base Boot Requirements) [1] specifications. Both BSA and BBR are publicly +accessible documents. +Arm Servers, in addition to being BSA compliant, comply with a set +of rules defined in SBSA (Server Base System Architecture) [2]. + +The Arm kernel implements the reduced hardware model of ACPI version +5.1 or later. Links to the specification and all external documents +it refers to are managed by the UEFI Forum. The specification is +available at http://www.uefi.org/specifications and documents referenced +by the specification can be found via http://www.uefi.org/acpi. + +If an Arm system does not meet the requirements of the BSA and BBR, +or cannot be described using the mechanisms defined in the required ACPI +specifications, then ACPI may not be a good fit for the hardware. + +While the documents mentioned above set out the requirements for building +industry-standard Arm systems, they also apply to more than one operating +system. The purpose of this document is to describe the interaction between +ACPI and Linux only, on an Arm system -- that is, what Linux expects of +ACPI and what ACPI can expect of Linux. + + +Why ACPI on Arm? +---------------- +Before examining the details of the interface between ACPI and Linux, it is +useful to understand why ACPI is being used. Several technologies already +exist in Linux for describing non-enumerable hardware, after all. In this +section we summarize a blog post [3] from Grant Likely that outlines the +reasoning behind ACPI on Arm systems. Actually, we snitch a good portion +of the summary text almost directly, to be honest. + +The short form of the rationale for ACPI on Arm is: + +- ACPI’s byte code (AML) allows the platform to encode hardware behavior, + while DT explicitly does not support this. For hardware vendors, being + able to encode behavior is a key tool used in supporting operating + system releases on new hardware. + +- ACPI’s OSPM defines a power management model that constrains what the + platform is allowed to do into a specific model, while still providing + flexibility in hardware design. + +- In the enterprise server environment, ACPI has established bindings (such + as for RAS) which are currently used in production systems. DT does not. + Such bindings could be defined in DT at some point, but doing so means Arm + and x86 would end up using completely different code paths in both firmware + and the kernel. + +- Choosing a single interface to describe the abstraction between a platform + and an OS is important. Hardware vendors would not be required to implement + both DT and ACPI if they want to support multiple operating systems. And, + agreeing on a single interface instead of being fragmented into per OS + interfaces makes for better interoperability overall. + +- The new ACPI governance process works well and Linux is now at the same + table as hardware vendors and other OS vendors. In fact, there is no + longer any reason to feel that ACPI only belongs to Windows or that + Linux is in any way secondary to Microsoft in this arena. The move of + ACPI governance into the UEFI forum has significantly opened up the + specification development process, and currently, a large portion of the + changes being made to ACPI are being driven by Linux. + +Key to the use of ACPI is the support model. For servers in general, the +responsibility for hardware behaviour cannot solely be the domain of the +kernel, but rather must be split between the platform and the kernel, in +order to allow for orderly change over time. ACPI frees the OS from needing +to understand all the minute details of the hardware so that the OS doesn’t +need to be ported to each and every device individually. It allows the +hardware vendors to take responsibility for power management behaviour without +depending on an OS release cycle which is not under their control. + +ACPI is also important because hardware and OS vendors have already worked +out the mechanisms for supporting a general purpose computing ecosystem. The +infrastructure is in place, the bindings are in place, and the processes are +in place. DT does exactly what Linux needs it to when working with vertically +integrated devices, but there are no good processes for supporting what the +server vendors need. Linux could potentially get there with DT, but doing so +really just duplicates something that already works. ACPI already does what +the hardware vendors need, Microsoft won’t collaborate on DT, and hardware +vendors would still end up providing two completely separate firmware +interfaces -- one for Linux and one for Windows. + + +Kernel Compatibility +-------------------- +One of the primary motivations for ACPI is standardization, and using that +to provide backward compatibility for Linux kernels. In the server market, +software and hardware are often used for long periods. ACPI allows the +kernel and firmware to agree on a consistent abstraction that can be +maintained over time, even as hardware or software change. As long as the +abstraction is supported, systems can be updated without necessarily having +to replace the kernel. + +When a Linux driver or subsystem is first implemented using ACPI, it by +definition ends up requiring a specific version of the ACPI specification +-- it's baseline. ACPI firmware must continue to work, even though it may +not be optimal, with the earliest kernel version that first provides support +for that baseline version of ACPI. There may be a need for additional drivers, +but adding new functionality (e.g., CPU power management) should not break +older kernel versions. Further, ACPI firmware must also work with the most +recent version of the kernel. + + +Relationship with Device Tree +----------------------------- +ACPI support in drivers and subsystems for Arm should never be mutually +exclusive with DT support at compile time. + +At boot time the kernel will only use one description method depending on +parameters passed from the boot loader (including kernel bootargs). + +Regardless of whether DT or ACPI is used, the kernel must always be capable +of booting with either scheme (in kernels with both schemes enabled at compile +time). + + +Booting using ACPI tables +------------------------- +The only defined method for passing ACPI tables to the kernel on Arm +is via the UEFI system configuration table. Just so it is explicit, this +means that ACPI is only supported on platforms that boot via UEFI. + +When an Arm system boots, it can either have DT information, ACPI tables, +or in some very unusual cases, both. If no command line parameters are used, +the kernel will try to use DT for device enumeration; if there is no DT +present, the kernel will try to use ACPI tables, but only if they are present. +In neither is available, the kernel will not boot. If acpi=force is used +on the command line, the kernel will attempt to use ACPI tables first, but +fall back to DT if there are no ACPI tables present. The basic idea is that +the kernel will not fail to boot unless it absolutely has no other choice. + +Processing of ACPI tables may be disabled by passing acpi=off on the kernel +command line; this is the default behavior. + +In order for the kernel to load and use ACPI tables, the UEFI implementation +MUST set the ACPI_20_TABLE_GUID to point to the RSDP table (the table with +the ACPI signature "RSD PTR "). If this pointer is incorrect and acpi=force +is used, the kernel will disable ACPI and try to use DT to boot instead; the +kernel has, in effect, determined that ACPI tables are not present at that +point. + +If the pointer to the RSDP table is correct, the table will be mapped into +the kernel by the ACPI core, using the address provided by UEFI. + +The ACPI core will then locate and map in all other ACPI tables provided by +using the addresses in the RSDP table to find the XSDT (eXtended System +Description Table). The XSDT in turn provides the addresses to all other +ACPI tables provided by the system firmware; the ACPI core will then traverse +this table and map in the tables listed. + +The ACPI core will ignore any provided RSDT (Root System Description Table). +RSDTs have been deprecated and are ignored on arm64 since they only allow +for 32-bit addresses. + +Further, the ACPI core will only use the 64-bit address fields in the FADT +(Fixed ACPI Description Table). Any 32-bit address fields in the FADT will +be ignored on arm64. + +Hardware reduced mode (see Section 4.1 of the ACPI 6.1 specification) will +be enforced by the ACPI core on arm64. Doing so allows the ACPI core to +run less complex code since it no longer has to provide support for legacy +hardware from other architectures. Any fields that are not to be used for +hardware reduced mode must be set to zero. + +For the ACPI core to operate properly, and in turn provide the information +the kernel needs to configure devices, it expects to find the following +tables (all section numbers refer to the ACPI 6.5 specification): + + - RSDP (Root System Description Pointer), section 5.2.5 + + - XSDT (eXtended System Description Table), section 5.2.8 + + - FADT (Fixed ACPI Description Table), section 5.2.9 + + - DSDT (Differentiated System Description Table), section + 5.2.11.1 + + - MADT (Multiple APIC Description Table), section 5.2.12 + + - GTDT (Generic Timer Description Table), section 5.2.24 + + - PPTT (Processor Properties Topology Table), section 5.2.30 + + - DBG2 (DeBuG port table 2), section 5.2.6, specifically Table 5-6. + + - APMT (Arm Performance Monitoring unit Table), section 5.2.6, specifically Table 5-6. + + - AGDI (Arm Generic diagnostic Dump and Reset Device Interface Table), section 5.2.6, specifically Table 5-6. + + - If PCI is supported, the MCFG (Memory mapped ConFiGuration + Table), section 5.2.6, specifically Table 5-6. + + - If booting without a console=<device> kernel parameter is + supported, the SPCR (Serial Port Console Redirection table), + section 5.2.6, specifically Table 5-6. + + - If necessary to describe the I/O topology, SMMUs and GIC ITSs, + the IORT (Input Output Remapping Table, section 5.2.6, specifically + Table 5-6). + + - If NUMA is supported, the following tables are required: + + - SRAT (System Resource Affinity Table), section 5.2.16 + + - SLIT (System Locality distance Information Table), section 5.2.17 + + - If NUMA is supported, and the system contains heterogeneous memory, + the HMAT (Heterogeneous Memory Attribute Table), section 5.2.28. + + - If the ACPI Platform Error Interfaces are required, the following + tables are conditionally required: + + - BERT (Boot Error Record Table, section 18.3.1) + + - EINJ (Error INJection table, section 18.6.1) + + - ERST (Error Record Serialization Table, section 18.5) + + - HEST (Hardware Error Source Table, section 18.3.2) + + - SDEI (Software Delegated Exception Interface table, section 5.2.6, + specifically Table 5-6) + + - AEST (Arm Error Source Table, section 5.2.6, + specifically Table 5-6) + + - RAS2 (ACPI RAS2 feature table, section 5.2.21) + + - If the system contains controllers using PCC channel, the + PCCT (Platform Communications Channel Table), section 14.1 + + - If the system contains a controller to capture board-level system state, + and communicates with the host via PCC, the PDTT (Platform Debug Trigger + Table), section 5.2.29. + + - If NVDIMM is supported, the NFIT (NVDIMM Firmware Interface Table), section 5.2.26 + + - If video framebuffer is present, the BGRT (Boot Graphics Resource Table), section 5.2.23 + + - If IPMI is implemented, the SPMI (Server Platform Management Interface), + section 5.2.6, specifically Table 5-6. + + - If the system contains a CXL Host Bridge, the CEDT (CXL Early Discovery + Table), section 5.2.6, specifically Table 5-6. + + - If the system supports MPAM, the MPAM (Memory Partitioning And Monitoring table), section 5.2.6, + specifically Table 5-6. + + - If the system lacks persistent storage, the IBFT (ISCSI Boot Firmware + Table), section 5.2.6, specifically Table 5-6. + + +If the above tables are not all present, the kernel may or may not be +able to boot properly since it may not be able to configure all of the +devices available. This list of tables is not meant to be all inclusive; +in some environments other tables may be needed (e.g., any of the APEI +tables from section 18) to support specific functionality. + + +ACPI Detection +-------------- +Drivers should determine their probe() type by checking for a null +value for ACPI_HANDLE, or checking .of_node, or other information in +the device structure. This is detailed further in the "Driver +Recommendations" section. + +In non-driver code, if the presence of ACPI needs to be detected at +run time, then check the value of acpi_disabled. If CONFIG_ACPI is not +set, acpi_disabled will always be 1. + + +Device Enumeration +------------------ +Device descriptions in ACPI should use standard recognized ACPI interfaces. +These may contain less information than is typically provided via a Device +Tree description for the same device. This is also one of the reasons that +ACPI can be useful -- the driver takes into account that it may have less +detailed information about the device and uses sensible defaults instead. +If done properly in the driver, the hardware can change and improve over +time without the driver having to change at all. + +Clocks provide an excellent example. In DT, clocks need to be specified +and the drivers need to take them into account. In ACPI, the assumption +is that UEFI will leave the device in a reasonable default state, including +any clock settings. If for some reason the driver needs to change a clock +value, this can be done in an ACPI method; all the driver needs to do is +invoke the method and not concern itself with what the method needs to do +to change the clock. Changing the hardware can then take place over time +by changing what the ACPI method does, and not the driver. + +In DT, the parameters needed by the driver to set up clocks as in the example +above are known as "bindings"; in ACPI, these are known as "Device Properties" +and provided to a driver via the _DSD object. + +ACPI tables are described with a formal language called ASL, the ACPI +Source Language (section 19 of the specification). This means that there +are always multiple ways to describe the same thing -- including device +properties. For example, device properties could use an ASL construct +that looks like this: Name(KEY0, "value0"). An ACPI device driver would +then retrieve the value of the property by evaluating the KEY0 object. +However, using Name() this way has multiple problems: (1) ACPI limits +names ("KEY0") to four characters unlike DT; (2) there is no industry +wide registry that maintains a list of names, minimizing re-use; (3) +there is also no registry for the definition of property values ("value0"), +again making re-use difficult; and (4) how does one maintain backward +compatibility as new hardware comes out? The _DSD method was created +to solve precisely these sorts of problems; Linux drivers should ALWAYS +use the _DSD method for device properties and nothing else. + +The _DSM object (ACPI Section 9.14.1) could also be used for conveying +device properties to a driver. Linux drivers should only expect it to +be used if _DSD cannot represent the data required, and there is no way +to create a new UUID for the _DSD object. Note that there is even less +regulation of the use of _DSM than there is of _DSD. Drivers that depend +on the contents of _DSM objects will be more difficult to maintain over +time because of this; as of this writing, the use of _DSM is the cause +of quite a few firmware problems and is not recommended. + +Drivers should look for device properties in the _DSD object ONLY; the _DSD +object is described in the ACPI specification section 6.2.5, but this only +describes how to define the structure of an object returned via _DSD, and +how specific data structures are defined by specific UUIDs. Linux should +only use the _DSD Device Properties UUID [4]: + + - UUID: daffd814-6eba-4d8c-8a91-bc9bbf4aa301 + +Common device properties can be registered by creating a pull request to [4] so +that they may be used across all operating systems supporting ACPI. +Device properties that have not been registered with the UEFI Forum can be used +but not as "uefi-" common properties. + +Before creating new device properties, check to be sure that they have not +been defined before and either registered in the Linux kernel documentation +as DT bindings, or the UEFI Forum as device properties. While we do not want +to simply move all DT bindings into ACPI device properties, we can learn from +what has been previously defined. + +If it is necessary to define a new device property, or if it makes sense to +synthesize the definition of a binding so it can be used in any firmware, +both DT bindings and ACPI device properties for device drivers have review +processes. Use them both. When the driver itself is submitted for review +to the Linux mailing lists, the device property definitions needed must be +submitted at the same time. A driver that supports ACPI and uses device +properties will not be considered complete without their definitions. Once +the device property has been accepted by the Linux community, it must be +registered with the UEFI Forum [4], which will review it again for consistency +within the registry. This may require iteration. The UEFI Forum, though, +will always be the canonical site for device property definitions. + +It may make sense to provide notice to the UEFI Forum that there is the +intent to register a previously unused device property name as a means of +reserving the name for later use. Other operating system vendors will +also be submitting registration requests and this may help smooth the +process. + +Once registration and review have been completed, the kernel provides an +interface for looking up device properties in a manner independent of +whether DT or ACPI is being used. This API should be used [5]; it can +eliminate some duplication of code paths in driver probing functions and +discourage divergence between DT bindings and ACPI device properties. + + +Programmable Power Control Resources +------------------------------------ +Programmable power control resources include such resources as voltage/current +providers (regulators) and clock sources. + +With ACPI, the kernel clock and regulator framework is not expected to be used +at all. + +The kernel assumes that power control of these resources is represented with +Power Resource Objects (ACPI section 7.1). The ACPI core will then handle +correctly enabling and disabling resources as they are needed. In order to +get that to work, ACPI assumes each device has defined D-states and that these +can be controlled through the optional ACPI methods _PS0, _PS1, _PS2, and _PS3; +in ACPI, _PS0 is the method to invoke to turn a device full on, and _PS3 is for +turning a device full off. + +There are two options for using those Power Resources. They can: + + - be managed in a _PSx method which gets called on entry to power + state Dx. + + - be declared separately as power resources with their own _ON and _OFF + methods. They are then tied back to D-states for a particular device + via _PRx which specifies which power resources a device needs to be on + while in Dx. Kernel then tracks number of devices using a power resource + and calls _ON/_OFF as needed. + +The kernel ACPI code will also assume that the _PSx methods follow the normal +ACPI rules for such methods: + + - If either _PS0 or _PS3 is implemented, then the other method must also + be implemented. + + - If a device requires usage or setup of a power resource when on, the ASL + should organize that it is allocated/enabled using the _PS0 method. + + - Resources allocated or enabled in the _PS0 method should be disabled + or de-allocated in the _PS3 method. + + - Firmware will leave the resources in a reasonable state before handing + over control to the kernel. + +Such code in _PSx methods will of course be very platform specific. But, +this allows the driver to abstract out the interface for operating the device +and avoid having to read special non-standard values from ACPI tables. Further, +abstracting the use of these resources allows the hardware to change over time +without requiring updates to the driver. + + +Clocks +------ +ACPI makes the assumption that clocks are initialized by the firmware -- +UEFI, in this case -- to some working value before control is handed over +to the kernel. This has implications for devices such as UARTs, or SoC-driven +LCD displays, for example. + +When the kernel boots, the clocks are assumed to be set to reasonable +working values. If for some reason the frequency needs to change -- e.g., +throttling for power management -- the device driver should expect that +process to be abstracted out into some ACPI method that can be invoked +(please see the ACPI specification for further recommendations on standard +methods to be expected). The only exceptions to this are CPU clocks where +CPPC provides a much richer interface than ACPI methods. If the clocks +are not set, there is no direct way for Linux to control them. + +If an SoC vendor wants to provide fine-grained control of the system clocks, +they could do so by providing ACPI methods that could be invoked by Linux +drivers. However, this is NOT recommended and Linux drivers should NOT use +such methods, even if they are provided. Such methods are not currently +standardized in the ACPI specification, and using them could tie a kernel +to a very specific SoC, or tie an SoC to a very specific version of the +kernel, both of which we are trying to avoid. + + +Driver Recommendations +---------------------- +DO NOT remove any DT handling when adding ACPI support for a driver. The +same device may be used on many different systems. + +DO try to structure the driver so that it is data-driven. That is, set up +a struct containing internal per-device state based on defaults and whatever +else must be discovered by the driver probe function. Then, have the rest +of the driver operate off of the contents of that struct. Doing so should +allow most divergence between ACPI and DT functionality to be kept local to +the probe function instead of being scattered throughout the driver. For +example:: + + static int device_probe_dt(struct platform_device *pdev) + { + /* DT specific functionality */ + ... + } + + static int device_probe_acpi(struct platform_device *pdev) + { + /* ACPI specific functionality */ + ... + } + + static int device_probe(struct platform_device *pdev) + { + ... + struct device_node node = pdev->dev.of_node; + ... + + if (node) + ret = device_probe_dt(pdev); + else if (ACPI_HANDLE(&pdev->dev)) + ret = device_probe_acpi(pdev); + else + /* other initialization */ + ... + /* Continue with any generic probe operations */ + ... + } + +DO keep the MODULE_DEVICE_TABLE entries together in the driver to make it +clear the different names the driver is probed for, both from DT and from +ACPI:: + + static struct of_device_id virtio_mmio_match[] = { + { .compatible = "virtio,mmio", }, + { } + }; + MODULE_DEVICE_TABLE(of, virtio_mmio_match); + + static const struct acpi_device_id virtio_mmio_acpi_match[] = { + { "LNRO0005", }, + { } + }; + MODULE_DEVICE_TABLE(acpi, virtio_mmio_acpi_match); + + +ASWG +---- +The ACPI specification changes regularly. During the year 2014, for instance, +version 5.1 was released and version 6.0 substantially completed, with most of +the changes being driven by Arm-specific requirements. Proposed changes are +presented and discussed in the ASWG (ACPI Specification Working Group) which +is a part of the UEFI Forum. The current version of the ACPI specification +is 6.5 release in August 2022. + +Participation in this group is open to all UEFI members. Please see +http://www.uefi.org/workinggroup for details on group membership. + +It is the intent of the Arm ACPI kernel code to follow the ACPI specification +as closely as possible, and to only implement functionality that complies with +the released standards from UEFI ASWG. As a practical matter, there will be +vendors that provide bad ACPI tables or violate the standards in some way. +If this is because of errors, quirks and fix-ups may be necessary, but will +be avoided if possible. If there are features missing from ACPI that preclude +it from being used on a platform, ECRs (Engineering Change Requests) should be +submitted to ASWG and go through the normal approval process; for those that +are not UEFI members, many other members of the Linux community are and would +likely be willing to assist in submitting ECRs. + + +Linux Code +---------- +Individual items specific to Linux on Arm, contained in the Linux +source code, are in the list that follows: + +ACPI_OS_NAME + This macro defines the string to be returned when + an ACPI method invokes the _OS method. On Arm + systems, this macro will be "Linux" by default. + The command line parameter acpi_os=<string> + can be used to set it to some other value. The + default value for other architectures is "Microsoft + Windows NT", for example. + +ACPI Objects +------------ +Detailed expectations for ACPI tables and object are listed in the file +Documentation/arch/arm64/acpi_object_usage.rst. + + +References +---------- +[0] https://developer.arm.com/documentation/den0094/latest + document Arm-DEN-0094: "Arm Base System Architecture", version 1.0C, dated 6 Oct 2022 + +[1] https://developer.arm.com/documentation/den0044/latest + Document Arm-DEN-0044: "Arm Base Boot Requirements", version 2.0G, dated 15 Apr 2022 + +[2] https://developer.arm.com/documentation/den0029/latest + Document Arm-DEN-0029: "Arm Server Base System Architecture", version 7.1, dated 06 Oct 2022 + +[3] http://www.secretlab.ca/archives/151, + 10 Jan 2015, Copyright (c) 2015, + Linaro Ltd., written by Grant Likely. + +[4] _DSD (Device Specific Data) Implementation Guide + https://github.com/UEFI/DSD-Guide/blob/main/dsd-guide.pdf + +[5] Kernel code for the unified device + property interface can be found in + include/linux/property.h and drivers/base/property.c. + + +Authors +------- +- Al Stone <al.stone@linaro.org> +- Graeme Gregory <graeme.gregory@linaro.org> +- Hanjun Guo <hanjun.guo@linaro.org> + +- Grant Likely <grant.likely@linaro.org>, for the "Why ACPI on ARM?" section diff --git a/Documentation/arch/arm64/asymmetric-32bit.rst b/Documentation/arch/arm64/asymmetric-32bit.rst new file mode 100644 index 000000000000..64a0b505da7d --- /dev/null +++ b/Documentation/arch/arm64/asymmetric-32bit.rst @@ -0,0 +1,155 @@ +====================== +Asymmetric 32-bit SoCs +====================== + +Author: Will Deacon <will@kernel.org> + +This document describes the impact of asymmetric 32-bit SoCs on the +execution of 32-bit (``AArch32``) applications. + +Date: 2021-05-17 + +Introduction +============ + +Some Armv9 SoCs suffer from a big.LITTLE misfeature where only a subset +of the CPUs are capable of executing 32-bit user applications. On such +a system, Linux by default treats the asymmetry as a "mismatch" and +disables support for both the ``PER_LINUX32`` personality and +``execve(2)`` of 32-bit ELF binaries, with the latter returning +``-ENOEXEC``. If the mismatch is detected during late onlining of a +64-bit-only CPU, then the onlining operation fails and the new CPU is +unavailable for scheduling. + +Surprisingly, these SoCs have been produced with the intention of +running legacy 32-bit binaries. Unsurprisingly, that doesn't work very +well with the default behaviour of Linux. + +It seems inevitable that future SoCs will drop 32-bit support +altogether, so if you're stuck in the unenviable position of needing to +run 32-bit code on one of these transitionary platforms then you would +be wise to consider alternatives such as recompilation, emulation or +retirement. If neither of those options are practical, then read on. + +Enabling kernel support +======================= + +Since the kernel support is not completely transparent to userspace, +allowing 32-bit tasks to run on an asymmetric 32-bit system requires an +explicit "opt-in" and can be enabled by passing the +``allow_mismatched_32bit_el0`` parameter on the kernel command-line. + +For the remainder of this document we will refer to an *asymmetric +system* to mean an asymmetric 32-bit SoC running Linux with this kernel +command-line option enabled. + +Userspace impact +================ + +32-bit tasks running on an asymmetric system behave in mostly the same +way as on a homogeneous system, with a few key differences relating to +CPU affinity. + +sysfs +----- + +The subset of CPUs capable of running 32-bit tasks is described in +``/sys/devices/system/cpu/aarch32_el0`` and is documented further in +``Documentation/ABI/testing/sysfs-devices-system-cpu``. + +**Note:** CPUs are advertised by this file as they are detected and so +late-onlining of 32-bit-capable CPUs can result in the file contents +being modified by the kernel at runtime. Once advertised, CPUs are never +removed from the file. + +``execve(2)`` +------------- + +On a homogeneous system, the CPU affinity of a task is preserved across +``execve(2)``. This is not always possible on an asymmetric system, +specifically when the new program being executed is 32-bit yet the +affinity mask contains 64-bit-only CPUs. In this situation, the kernel +determines the new affinity mask as follows: + + 1. If the 32-bit-capable subset of the affinity mask is not empty, + then the affinity is restricted to that subset and the old affinity + mask is saved. This saved mask is inherited over ``fork(2)`` and + preserved across ``execve(2)`` of 32-bit programs. + + **Note:** This step does not apply to ``SCHED_DEADLINE`` tasks. + See `SCHED_DEADLINE`_. + + 2. Otherwise, the cpuset hierarchy of the task is walked until an + ancestor is found containing at least one 32-bit-capable CPU. The + affinity of the task is then changed to match the 32-bit-capable + subset of the cpuset determined by the walk. + + 3. On failure (i.e. out of memory), the affinity is changed to the set + of all 32-bit-capable CPUs of which the kernel is aware. + +A subsequent ``execve(2)`` of a 64-bit program by the 32-bit task will +invalidate the affinity mask saved in (1) and attempt to restore the CPU +affinity of the task using the saved mask if it was previously valid. +This restoration may fail due to intervening changes to the deadline +policy or cpuset hierarchy, in which case the ``execve(2)`` continues +with the affinity unchanged. + +Calls to ``sched_setaffinity(2)`` for a 32-bit task will consider only +the 32-bit-capable CPUs of the requested affinity mask. On success, the +affinity for the task is updated and any saved mask from a prior +``execve(2)`` is invalidated. + +``SCHED_DEADLINE`` +------------------ + +Explicit admission of a 32-bit deadline task to the default root domain +(e.g. by calling ``sched_setattr(2)``) is rejected on an asymmetric +32-bit system unless admission control is disabled by writing -1 to +``/proc/sys/kernel/sched_rt_runtime_us``. + +``execve(2)`` of a 32-bit program from a 64-bit deadline task will +return ``-ENOEXEC`` if the root domain for the task contains any +64-bit-only CPUs and admission control is enabled. Concurrent offlining +of 32-bit-capable CPUs may still necessitate the procedure described in +`execve(2)`_, in which case step (1) is skipped and a warning is +emitted on the console. + +**Note:** It is recommended that a set of 32-bit-capable CPUs are placed +into a separate root domain if ``SCHED_DEADLINE`` is to be used with +32-bit tasks on an asymmetric system. Failure to do so is likely to +result in missed deadlines. + +Cpusets +------- + +The affinity of a 32-bit task on an asymmetric system may include CPUs +that are not explicitly allowed by the cpuset to which it is attached. +This can occur as a result of the following two situations: + + - A 64-bit task attached to a cpuset which allows only 64-bit CPUs + executes a 32-bit program. + + - All of the 32-bit-capable CPUs allowed by a cpuset containing a + 32-bit task are offlined. + +In both of these cases, the new affinity is calculated according to step +(2) of the process described in `execve(2)`_ and the cpuset hierarchy is +unchanged irrespective of the cgroup version. + +CPU hotplug +----------- + +On an asymmetric system, the first detected 32-bit-capable CPU is +prevented from being offlined by userspace and any such attempt will +return ``-EPERM``. Note that suspend is still permitted even if the +primary CPU (i.e. CPU 0) is 64-bit-only. + +KVM +--- + +Although KVM will not advertise 32-bit EL0 support to any vCPUs on an +asymmetric system, a broken guest at EL1 could still attempt to execute +32-bit code at EL0. In this case, an exit from a vCPU thread in 32-bit +mode will return to host userspace with an ``exit_reason`` of +``KVM_EXIT_FAIL_ENTRY`` and will remain non-runnable until successfully +re-initialised by a subsequent ``KVM_ARM_VCPU_INIT`` operation. diff --git a/Documentation/arch/arm64/booting.rst b/Documentation/arch/arm64/booting.rst new file mode 100644 index 000000000000..b57776a68f15 --- /dev/null +++ b/Documentation/arch/arm64/booting.rst @@ -0,0 +1,463 @@ +===================== +Booting AArch64 Linux +===================== + +Author: Will Deacon <will.deacon@arm.com> + +Date : 07 September 2012 + +This document is based on the ARM booting document by Russell King and +is relevant to all public releases of the AArch64 Linux kernel. + +The AArch64 exception model is made up of a number of exception levels +(EL0 - EL3), with EL0, EL1 and EL2 having a secure and a non-secure +counterpart. EL2 is the hypervisor level, EL3 is the highest priority +level and exists only in secure mode. Both are architecturally optional. + +For the purposes of this document, we will use the term `boot loader` +simply to define all software that executes on the CPU(s) before control +is passed to the Linux kernel. This may include secure monitor and +hypervisor code, or it may just be a handful of instructions for +preparing a minimal boot environment. + +Essentially, the boot loader should provide (as a minimum) the +following: + +1. Setup and initialise the RAM +2. Setup the device tree +3. Decompress the kernel image +4. Call the kernel image + + +1. Setup and initialise RAM +--------------------------- + +Requirement: MANDATORY + +The boot loader is expected to find and initialise all RAM that the +kernel will use for volatile data storage in the system. It performs +this in a machine dependent manner. (It may use internal algorithms +to automatically locate and size all RAM, or it may use knowledge of +the RAM in the machine, or any other method the boot loader designer +sees fit.) + + +2. Setup the device tree +------------------------- + +Requirement: MANDATORY + +The device tree blob (dtb) must be placed on an 8-byte boundary and must +not exceed 2 megabytes in size. Since the dtb will be mapped cacheable +using blocks of up to 2 megabytes in size, it must not be placed within +any 2M region which must be mapped with any specific attributes. + +NOTE: versions prior to v4.2 also require that the DTB be placed within +the 512 MB region starting at text_offset bytes below the kernel Image. + +3. Decompress the kernel image +------------------------------ + +Requirement: OPTIONAL + +The AArch64 kernel does not currently provide a decompressor and +therefore requires decompression (gzip etc.) to be performed by the boot +loader if a compressed Image target (e.g. Image.gz) is used. For +bootloaders that do not implement this requirement, the uncompressed +Image target is available instead. + + +4. Call the kernel image +------------------------ + +Requirement: MANDATORY + +The decompressed kernel image contains a 64-byte header as follows:: + + u32 code0; /* Executable code */ + u32 code1; /* Executable code */ + u64 text_offset; /* Image load offset, little endian */ + u64 image_size; /* Effective Image size, little endian */ + u64 flags; /* kernel flags, little endian */ + u64 res2 = 0; /* reserved */ + u64 res3 = 0; /* reserved */ + u64 res4 = 0; /* reserved */ + u32 magic = 0x644d5241; /* Magic number, little endian, "ARM\x64" */ + u32 res5; /* reserved (used for PE COFF offset) */ + + +Header notes: + +- As of v3.17, all fields are little endian unless stated otherwise. + +- code0/code1 are responsible for branching to stext. + +- when booting through EFI, code0/code1 are initially skipped. + res5 is an offset to the PE header and the PE header has the EFI + entry point (efi_stub_entry). When the stub has done its work, it + jumps to code0 to resume the normal boot process. + +- Prior to v3.17, the endianness of text_offset was not specified. In + these cases image_size is zero and text_offset is 0x80000 in the + endianness of the kernel. Where image_size is non-zero image_size is + little-endian and must be respected. Where image_size is zero, + text_offset can be assumed to be 0x80000. + +- The flags field (introduced in v3.17) is a little-endian 64-bit field + composed as follows: + + ============= =============================================================== + Bit 0 Kernel endianness. 1 if BE, 0 if LE. + Bit 1-2 Kernel Page size. + + * 0 - Unspecified. + * 1 - 4K + * 2 - 16K + * 3 - 64K + Bit 3 Kernel physical placement + + 0 + 2MB aligned base should be as close as possible + to the base of DRAM, since memory below it is not + accessible via the linear mapping + 1 + 2MB aligned base such that all image_size bytes + counted from the start of the image are within + the 48-bit addressable range of physical memory + Bits 4-63 Reserved. + ============= =============================================================== + +- When image_size is zero, a bootloader should attempt to keep as much + memory as possible free for use by the kernel immediately after the + end of the kernel image. The amount of space required will vary + depending on selected features, and is effectively unbound. + +The Image must be placed text_offset bytes from a 2MB aligned base +address anywhere in usable system RAM and called there. The region +between the 2 MB aligned base address and the start of the image has no +special significance to the kernel, and may be used for other purposes. +At least image_size bytes from the start of the image must be free for +use by the kernel. +NOTE: versions prior to v4.6 cannot make use of memory below the +physical offset of the Image so it is recommended that the Image be +placed as close as possible to the start of system RAM. + +If an initrd/initramfs is passed to the kernel at boot, it must reside +entirely within a 1 GB aligned physical memory window of up to 32 GB in +size that fully covers the kernel Image as well. + +Any memory described to the kernel (even that below the start of the +image) which is not marked as reserved from the kernel (e.g., with a +memreserve region in the device tree) will be considered as available to +the kernel. + +Before jumping into the kernel, the following conditions must be met: + +- Quiesce all DMA capable devices so that memory does not get + corrupted by bogus network packets or disk data. This will save + you many hours of debug. + +- Primary CPU general-purpose register settings: + + - x0 = physical address of device tree blob (dtb) in system RAM. + - x1 = 0 (reserved for future use) + - x2 = 0 (reserved for future use) + - x3 = 0 (reserved for future use) + +- CPU mode + + All forms of interrupts must be masked in PSTATE.DAIF (Debug, SError, + IRQ and FIQ). + The CPU must be in non-secure state, either in EL2 (RECOMMENDED in order + to have access to the virtualisation extensions), or in EL1. + +- Caches, MMUs + + The MMU must be off. + + The instruction cache may be on or off, and must not hold any stale + entries corresponding to the loaded kernel image. + + The address range corresponding to the loaded kernel image must be + cleaned to the PoC. In the presence of a system cache or other + coherent masters with caches enabled, this will typically require + cache maintenance by VA rather than set/way operations. + System caches which respect the architected cache maintenance by VA + operations must be configured and may be enabled. + System caches which do not respect architected cache maintenance by VA + operations (not recommended) must be configured and disabled. + +- Architected timers + + CNTFRQ must be programmed with the timer frequency and CNTVOFF must + be programmed with a consistent value on all CPUs. If entering the + kernel at EL1, CNTHCTL_EL2 must have EL1PCTEN (bit 0) set where + available. + +- Coherency + + All CPUs to be booted by the kernel must be part of the same coherency + domain on entry to the kernel. This may require IMPLEMENTATION DEFINED + initialisation to enable the receiving of maintenance operations on + each CPU. + +- System registers + + All writable architected system registers at or below the exception + level where the kernel image will be entered must be initialised by + software at a higher exception level to prevent execution in an UNKNOWN + state. + + For all systems: + - If EL3 is present: + + - SCR_EL3.FIQ must have the same value across all CPUs the kernel is + executing on. + - The value of SCR_EL3.FIQ must be the same as the one present at boot + time whenever the kernel is executing. + + - If EL3 is present and the kernel is entered at EL2: + + - SCR_EL3.HCE (bit 8) must be initialised to 0b1. + + For systems with a GICv3 interrupt controller to be used in v3 mode: + - If EL3 is present: + + - ICC_SRE_EL3.Enable (bit 3) must be initialised to 0b1. + - ICC_SRE_EL3.SRE (bit 0) must be initialised to 0b1. + - ICC_CTLR_EL3.PMHE (bit 6) must be set to the same value across + all CPUs the kernel is executing on, and must stay constant + for the lifetime of the kernel. + + - If the kernel is entered at EL1: + + - ICC.SRE_EL2.Enable (bit 3) must be initialised to 0b1 + - ICC_SRE_EL2.SRE (bit 0) must be initialised to 0b1. + + - The DT or ACPI tables must describe a GICv3 interrupt controller. + + For systems with a GICv3 interrupt controller to be used in + compatibility (v2) mode: + + - If EL3 is present: + + ICC_SRE_EL3.SRE (bit 0) must be initialised to 0b0. + + - If the kernel is entered at EL1: + + ICC_SRE_EL2.SRE (bit 0) must be initialised to 0b0. + + - The DT or ACPI tables must describe a GICv2 interrupt controller. + + For CPUs with pointer authentication functionality: + + - If EL3 is present: + + - SCR_EL3.APK (bit 16) must be initialised to 0b1 + - SCR_EL3.API (bit 17) must be initialised to 0b1 + + - If the kernel is entered at EL1: + + - HCR_EL2.APK (bit 40) must be initialised to 0b1 + - HCR_EL2.API (bit 41) must be initialised to 0b1 + + For CPUs with Activity Monitors Unit v1 (AMUv1) extension present: + + - If EL3 is present: + + - CPTR_EL3.TAM (bit 30) must be initialised to 0b0 + - CPTR_EL2.TAM (bit 30) must be initialised to 0b0 + - AMCNTENSET0_EL0 must be initialised to 0b1111 + - AMCNTENSET1_EL0 must be initialised to a platform specific value + having 0b1 set for the corresponding bit for each of the auxiliary + counters present. + + - If the kernel is entered at EL1: + + - AMCNTENSET0_EL0 must be initialised to 0b1111 + - AMCNTENSET1_EL0 must be initialised to a platform specific value + having 0b1 set for the corresponding bit for each of the auxiliary + counters present. + + For CPUs with the Fine Grained Traps (FEAT_FGT) extension present: + + - If EL3 is present and the kernel is entered at EL2: + + - SCR_EL3.FGTEn (bit 27) must be initialised to 0b1. + + For CPUs with support for HCRX_EL2 (FEAT_HCX) present: + + - If EL3 is present and the kernel is entered at EL2: + + - SCR_EL3.HXEn (bit 38) must be initialised to 0b1. + + For CPUs with Advanced SIMD and floating point support: + + - If EL3 is present: + + - CPTR_EL3.TFP (bit 10) must be initialised to 0b0. + + - If EL2 is present and the kernel is entered at EL1: + + - CPTR_EL2.TFP (bit 10) must be initialised to 0b0. + + For CPUs with the Scalable Vector Extension (FEAT_SVE) present: + + - if EL3 is present: + + - CPTR_EL3.EZ (bit 8) must be initialised to 0b1. + + - ZCR_EL3.LEN must be initialised to the same value for all CPUs the + kernel is executed on. + + - If the kernel is entered at EL1 and EL2 is present: + + - CPTR_EL2.TZ (bit 8) must be initialised to 0b0. + + - CPTR_EL2.ZEN (bits 17:16) must be initialised to 0b11. + + - ZCR_EL2.LEN must be initialised to the same value for all CPUs the + kernel will execute on. + + For CPUs with the Scalable Matrix Extension (FEAT_SME): + + - If EL3 is present: + + - CPTR_EL3.ESM (bit 12) must be initialised to 0b1. + + - SCR_EL3.EnTP2 (bit 41) must be initialised to 0b1. + + - SMCR_EL3.LEN must be initialised to the same value for all CPUs the + kernel will execute on. + + - If the kernel is entered at EL1 and EL2 is present: + + - CPTR_EL2.TSM (bit 12) must be initialised to 0b0. + + - CPTR_EL2.SMEN (bits 25:24) must be initialised to 0b11. + + - SCTLR_EL2.EnTP2 (bit 60) must be initialised to 0b1. + + - SMCR_EL2.LEN must be initialised to the same value for all CPUs the + kernel will execute on. + + - HWFGRTR_EL2.nTPIDR2_EL0 (bit 55) must be initialised to 0b01. + + - HWFGWTR_EL2.nTPIDR2_EL0 (bit 55) must be initialised to 0b01. + + - HWFGRTR_EL2.nSMPRI_EL1 (bit 54) must be initialised to 0b01. + + - HWFGWTR_EL2.nSMPRI_EL1 (bit 54) must be initialised to 0b01. + + For CPUs with the Scalable Matrix Extension FA64 feature (FEAT_SME_FA64): + + - If EL3 is present: + + - SMCR_EL3.FA64 (bit 31) must be initialised to 0b1. + + - If the kernel is entered at EL1 and EL2 is present: + + - SMCR_EL2.FA64 (bit 31) must be initialised to 0b1. + + For CPUs with the Memory Tagging Extension feature (FEAT_MTE2): + + - If EL3 is present: + + - SCR_EL3.ATA (bit 26) must be initialised to 0b1. + + - If the kernel is entered at EL1 and EL2 is present: + + - HCR_EL2.ATA (bit 56) must be initialised to 0b1. + + For CPUs with the Scalable Matrix Extension version 2 (FEAT_SME2): + + - If EL3 is present: + + - SMCR_EL3.EZT0 (bit 30) must be initialised to 0b1. + + - If the kernel is entered at EL1 and EL2 is present: + + - SMCR_EL2.EZT0 (bit 30) must be initialised to 0b1. + + For CPUs with Memory Copy and Memory Set instructions (FEAT_MOPS): + + - If the kernel is entered at EL1 and EL2 is present: + + - HCRX_EL2.MSCEn (bit 11) must be initialised to 0b1. + + For CPUs with the Extended Translation Control Register feature (FEAT_TCR2): + + - If EL3 is present: + + - SCR_EL3.TCR2En (bit 43) must be initialised to 0b1. + + - If the kernel is entered at EL1 and EL2 is present: + + - HCRX_EL2.TCR2En (bit 14) must be initialised to 0b1. + + For CPUs with the Stage 1 Permission Indirection Extension feature (FEAT_S1PIE): + + - If EL3 is present: + + - SCR_EL3.PIEn (bit 45) must be initialised to 0b1. + + - If the kernel is entered at EL1 and EL2 is present: + + - HFGRTR_EL2.nPIR_EL1 (bit 58) must be initialised to 0b1. + + - HFGWTR_EL2.nPIR_EL1 (bit 58) must be initialised to 0b1. + + - HFGRTR_EL2.nPIRE0_EL1 (bit 57) must be initialised to 0b1. + + - HFGRWR_EL2.nPIRE0_EL1 (bit 57) must be initialised to 0b1. + +The requirements described above for CPU mode, caches, MMUs, architected +timers, coherency and system registers apply to all CPUs. All CPUs must +enter the kernel in the same exception level. Where the values documented +disable traps it is permissible for these traps to be enabled so long as +those traps are handled transparently by higher exception levels as though +the values documented were set. + +The boot loader is expected to enter the kernel on each CPU in the +following manner: + +- The primary CPU must jump directly to the first instruction of the + kernel image. The device tree blob passed by this CPU must contain + an 'enable-method' property for each cpu node. The supported + enable-methods are described below. + + It is expected that the bootloader will generate these device tree + properties and insert them into the blob prior to kernel entry. + +- CPUs with a "spin-table" enable-method must have a 'cpu-release-addr' + property in their cpu node. This property identifies a + naturally-aligned 64-bit zero-initalised memory location. + + These CPUs should spin outside of the kernel in a reserved area of + memory (communicated to the kernel by a /memreserve/ region in the + device tree) polling their cpu-release-addr location, which must be + contained in the reserved region. A wfe instruction may be inserted + to reduce the overhead of the busy-loop and a sev will be issued by + the primary CPU. When a read of the location pointed to by the + cpu-release-addr returns a non-zero value, the CPU must jump to this + value. The value will be written as a single 64-bit little-endian + value, so CPUs must convert the read value to their native endianness + before jumping to it. + +- CPUs with a "psci" enable method should remain outside of + the kernel (i.e. outside of the regions of memory described to the + kernel in the memory node, or in a reserved area of memory described + to the kernel by a /memreserve/ region in the device tree). The + kernel will issue CPU_ON calls as described in ARM document number ARM + DEN 0022A ("Power State Coordination Interface System Software on ARM + processors") to bring CPUs into the kernel. + + The device tree should contain a 'psci' node, as described in + Documentation/devicetree/bindings/arm/psci.yaml. + +- Secondary CPU general-purpose register settings + + - x0 = 0 (reserved for future use) + - x1 = 0 (reserved for future use) + - x2 = 0 (reserved for future use) + - x3 = 0 (reserved for future use) diff --git a/Documentation/arch/arm64/cpu-feature-registers.rst b/Documentation/arch/arm64/cpu-feature-registers.rst new file mode 100644 index 000000000000..4e4625f2455f --- /dev/null +++ b/Documentation/arch/arm64/cpu-feature-registers.rst @@ -0,0 +1,402 @@ +=========================== +ARM64 CPU Feature Registers +=========================== + +Author: Suzuki K Poulose <suzuki.poulose@arm.com> + + +This file describes the ABI for exporting the AArch64 CPU ID/feature +registers to userspace. The availability of this ABI is advertised +via the HWCAP_CPUID in HWCAPs. + +1. Motivation +------------- + +The ARM architecture defines a set of feature registers, which describe +the capabilities of the CPU/system. Access to these system registers is +restricted from EL0 and there is no reliable way for an application to +extract this information to make better decisions at runtime. There is +limited information available to the application via HWCAPs, however +there are some issues with their usage. + + a) Any change to the HWCAPs requires an update to userspace (e.g libc) + to detect the new changes, which can take a long time to appear in + distributions. Exposing the registers allows applications to get the + information without requiring updates to the toolchains. + + b) Access to HWCAPs is sometimes limited (e.g prior to libc, or + when ld is initialised at startup time). + + c) HWCAPs cannot represent non-boolean information effectively. The + architecture defines a canonical format for representing features + in the ID registers; this is well defined and is capable of + representing all valid architecture variations. + + +2. Requirements +--------------- + + a) Safety: + + Applications should be able to use the information provided by the + infrastructure to run safely across the system. This has greater + implications on a system with heterogeneous CPUs. + The infrastructure exports a value that is safe across all the + available CPU on the system. + + e.g, If at least one CPU doesn't implement CRC32 instructions, while + others do, we should report that the CRC32 is not implemented. + Otherwise an application could crash when scheduled on the CPU + which doesn't support CRC32. + + b) Security: + + Applications should only be able to receive information that is + relevant to the normal operation in userspace. Hence, some of the + fields are masked out(i.e, made invisible) and their values are set to + indicate the feature is 'not supported'. See Section 4 for the list + of visible features. Also, the kernel may manipulate the fields + based on what it supports. e.g, If FP is not supported by the + kernel, the values could indicate that the FP is not available + (even when the CPU provides it). + + c) Implementation Defined Features + + The infrastructure doesn't expose any register which is + IMPLEMENTATION DEFINED as per ARMv8-A Architecture. + + d) CPU Identification: + + MIDR_EL1 is exposed to help identify the processor. On a + heterogeneous system, this could be racy (just like getcpu()). The + process could be migrated to another CPU by the time it uses the + register value, unless the CPU affinity is set. Hence, there is no + guarantee that the value reflects the processor that it is + currently executing on. The REVIDR is not exposed due to this + constraint, as REVIDR makes sense only in conjunction with the + MIDR. Alternately, MIDR_EL1 and REVIDR_EL1 are exposed via sysfs + at:: + + /sys/devices/system/cpu/cpu$ID/regs/identification/ + \- midr + \- revidr + +3. Implementation +-------------------- + +The infrastructure is built on the emulation of the 'MRS' instruction. +Accessing a restricted system register from an application generates an +exception and ends up in SIGILL being delivered to the process. +The infrastructure hooks into the exception handler and emulates the +operation if the source belongs to the supported system register space. + +The infrastructure emulates only the following system register space:: + + Op0=3, Op1=0, CRn=0, CRm=0,2,3,4,5,6,7 + +(See Table C5-6 'System instruction encodings for non-Debug System +register accesses' in ARMv8 ARM DDI 0487A.h, for the list of +registers). + +The following rules are applied to the value returned by the +infrastructure: + + a) The value of an 'IMPLEMENTATION DEFINED' field is set to 0. + b) The value of a reserved field is populated with the reserved + value as defined by the architecture. + c) The value of a 'visible' field holds the system wide safe value + for the particular feature (except for MIDR_EL1, see section 4). + d) All other fields (i.e, invisible fields) are set to indicate + the feature is missing (as defined by the architecture). + +4. List of registers with visible features +------------------------------------------- + + 1) ID_AA64ISAR0_EL1 - Instruction Set Attribute Register 0 + + +------------------------------+---------+---------+ + | Name | bits | visible | + +------------------------------+---------+---------+ + | RNDR | [63-60] | y | + +------------------------------+---------+---------+ + | TS | [55-52] | y | + +------------------------------+---------+---------+ + | FHM | [51-48] | y | + +------------------------------+---------+---------+ + | DP | [47-44] | y | + +------------------------------+---------+---------+ + | SM4 | [43-40] | y | + +------------------------------+---------+---------+ + | SM3 | [39-36] | y | + +------------------------------+---------+---------+ + | SHA3 | [35-32] | y | + +------------------------------+---------+---------+ + | RDM | [31-28] | y | + +------------------------------+---------+---------+ + | ATOMICS | [23-20] | y | + +------------------------------+---------+---------+ + | CRC32 | [19-16] | y | + +------------------------------+---------+---------+ + | SHA2 | [15-12] | y | + +------------------------------+---------+---------+ + | SHA1 | [11-8] | y | + +------------------------------+---------+---------+ + | AES | [7-4] | y | + +------------------------------+---------+---------+ + + + 2) ID_AA64PFR0_EL1 - Processor Feature Register 0 + + +------------------------------+---------+---------+ + | Name | bits | visible | + +------------------------------+---------+---------+ + | DIT | [51-48] | y | + +------------------------------+---------+---------+ + | SVE | [35-32] | y | + +------------------------------+---------+---------+ + | GIC | [27-24] | n | + +------------------------------+---------+---------+ + | AdvSIMD | [23-20] | y | + +------------------------------+---------+---------+ + | FP | [19-16] | y | + +------------------------------+---------+---------+ + | EL3 | [15-12] | n | + +------------------------------+---------+---------+ + | EL2 | [11-8] | n | + +------------------------------+---------+---------+ + | EL1 | [7-4] | n | + +------------------------------+---------+---------+ + | EL0 | [3-0] | n | + +------------------------------+---------+---------+ + + + 3) ID_AA64PFR1_EL1 - Processor Feature Register 1 + + +------------------------------+---------+---------+ + | Name | bits | visible | + +------------------------------+---------+---------+ + | MTE | [11-8] | y | + +------------------------------+---------+---------+ + | SSBS | [7-4] | y | + +------------------------------+---------+---------+ + | BT | [3-0] | y | + +------------------------------+---------+---------+ + + + 4) MIDR_EL1 - Main ID Register + + +------------------------------+---------+---------+ + | Name | bits | visible | + +------------------------------+---------+---------+ + | Implementer | [31-24] | y | + +------------------------------+---------+---------+ + | Variant | [23-20] | y | + +------------------------------+---------+---------+ + | Architecture | [19-16] | y | + +------------------------------+---------+---------+ + | PartNum | [15-4] | y | + +------------------------------+---------+---------+ + | Revision | [3-0] | y | + +------------------------------+---------+---------+ + + NOTE: The 'visible' fields of MIDR_EL1 will contain the value + as available on the CPU where it is fetched and is not a system + wide safe value. + + 5) ID_AA64ISAR1_EL1 - Instruction set attribute register 1 + + +------------------------------+---------+---------+ + | Name | bits | visible | + +------------------------------+---------+---------+ + | I8MM | [55-52] | y | + +------------------------------+---------+---------+ + | DGH | [51-48] | y | + +------------------------------+---------+---------+ + | BF16 | [47-44] | y | + +------------------------------+---------+---------+ + | SB | [39-36] | y | + +------------------------------+---------+---------+ + | FRINTTS | [35-32] | y | + +------------------------------+---------+---------+ + | GPI | [31-28] | y | + +------------------------------+---------+---------+ + | GPA | [27-24] | y | + +------------------------------+---------+---------+ + | LRCPC | [23-20] | y | + +------------------------------+---------+---------+ + | FCMA | [19-16] | y | + +------------------------------+---------+---------+ + | JSCVT | [15-12] | y | + +------------------------------+---------+---------+ + | API | [11-8] | y | + +------------------------------+---------+---------+ + | APA | [7-4] | y | + +------------------------------+---------+---------+ + | DPB | [3-0] | y | + +------------------------------+---------+---------+ + + 6) ID_AA64MMFR0_EL1 - Memory model feature register 0 + + +------------------------------+---------+---------+ + | Name | bits | visible | + +------------------------------+---------+---------+ + | ECV | [63-60] | y | + +------------------------------+---------+---------+ + + 7) ID_AA64MMFR2_EL1 - Memory model feature register 2 + + +------------------------------+---------+---------+ + | Name | bits | visible | + +------------------------------+---------+---------+ + | AT | [35-32] | y | + +------------------------------+---------+---------+ + + 8) ID_AA64ZFR0_EL1 - SVE feature ID register 0 + + +------------------------------+---------+---------+ + | Name | bits | visible | + +------------------------------+---------+---------+ + | F64MM | [59-56] | y | + +------------------------------+---------+---------+ + | F32MM | [55-52] | y | + +------------------------------+---------+---------+ + | I8MM | [47-44] | y | + +------------------------------+---------+---------+ + | SM4 | [43-40] | y | + +------------------------------+---------+---------+ + | SHA3 | [35-32] | y | + +------------------------------+---------+---------+ + | BF16 | [23-20] | y | + +------------------------------+---------+---------+ + | BitPerm | [19-16] | y | + +------------------------------+---------+---------+ + | AES | [7-4] | y | + +------------------------------+---------+---------+ + | SVEVer | [3-0] | y | + +------------------------------+---------+---------+ + + 8) ID_AA64MMFR1_EL1 - Memory model feature register 1 + + +------------------------------+---------+---------+ + | Name | bits | visible | + +------------------------------+---------+---------+ + | AFP | [47-44] | y | + +------------------------------+---------+---------+ + + 9) ID_AA64ISAR2_EL1 - Instruction set attribute register 2 + + +------------------------------+---------+---------+ + | Name | bits | visible | + +------------------------------+---------+---------+ + | MOPS | [19-16] | y | + +------------------------------+---------+---------+ + | RPRES | [7-4] | y | + +------------------------------+---------+---------+ + | WFXT | [3-0] | y | + +------------------------------+---------+---------+ + + 10) MVFR0_EL1 - AArch32 Media and VFP Feature Register 0 + + +------------------------------+---------+---------+ + | Name | bits | visible | + +------------------------------+---------+---------+ + | FPDP | [11-8] | y | + +------------------------------+---------+---------+ + + 11) MVFR1_EL1 - AArch32 Media and VFP Feature Register 1 + + +------------------------------+---------+---------+ + | Name | bits | visible | + +------------------------------+---------+---------+ + | SIMDFMAC | [31-28] | y | + +------------------------------+---------+---------+ + | SIMDSP | [19-16] | y | + +------------------------------+---------+---------+ + | SIMDInt | [15-12] | y | + +------------------------------+---------+---------+ + | SIMDLS | [11-8] | y | + +------------------------------+---------+---------+ + + 12) ID_ISAR5_EL1 - AArch32 Instruction Set Attribute Register 5 + + +------------------------------+---------+---------+ + | Name | bits | visible | + +------------------------------+---------+---------+ + | CRC32 | [19-16] | y | + +------------------------------+---------+---------+ + | SHA2 | [15-12] | y | + +------------------------------+---------+---------+ + | SHA1 | [11-8] | y | + +------------------------------+---------+---------+ + | AES | [7-4] | y | + +------------------------------+---------+---------+ + + +Appendix I: Example +------------------- + +:: + + /* + * Sample program to demonstrate the MRS emulation ABI. + * + * Copyright (C) 2015-2016, ARM Ltd + * + * Author: Suzuki K Poulose <suzuki.poulose@arm.com> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + */ + + #include <asm/hwcap.h> + #include <stdio.h> + #include <sys/auxv.h> + + #define get_cpu_ftr(id) ({ \ + unsigned long __val; \ + asm("mrs %0, "#id : "=r" (__val)); \ + printf("%-20s: 0x%016lx\n", #id, __val); \ + }) + + int main(void) + { + + if (!(getauxval(AT_HWCAP) & HWCAP_CPUID)) { + fputs("CPUID registers unavailable\n", stderr); + return 1; + } + + get_cpu_ftr(ID_AA64ISAR0_EL1); + get_cpu_ftr(ID_AA64ISAR1_EL1); + get_cpu_ftr(ID_AA64MMFR0_EL1); + get_cpu_ftr(ID_AA64MMFR1_EL1); + get_cpu_ftr(ID_AA64PFR0_EL1); + get_cpu_ftr(ID_AA64PFR1_EL1); + get_cpu_ftr(ID_AA64DFR0_EL1); + get_cpu_ftr(ID_AA64DFR1_EL1); + + get_cpu_ftr(MIDR_EL1); + get_cpu_ftr(MPIDR_EL1); + get_cpu_ftr(REVIDR_EL1); + + #if 0 + /* Unexposed register access causes SIGILL */ + get_cpu_ftr(ID_MMFR0_EL1); + #endif + + return 0; + } diff --git a/Documentation/arch/arm64/elf_hwcaps.rst b/Documentation/arch/arm64/elf_hwcaps.rst new file mode 100644 index 000000000000..8c8addb4194c --- /dev/null +++ b/Documentation/arch/arm64/elf_hwcaps.rst @@ -0,0 +1,312 @@ +.. _elf_hwcaps_index: + +================ +ARM64 ELF hwcaps +================ + +This document describes the usage and semantics of the arm64 ELF hwcaps. + + +1. Introduction +--------------- + +Some hardware or software features are only available on some CPU +implementations, and/or with certain kernel configurations, but have no +architected discovery mechanism available to userspace code at EL0. The +kernel exposes the presence of these features to userspace through a set +of flags called hwcaps, exposed in the auxiliary vector. + +Userspace software can test for features by acquiring the AT_HWCAP or +AT_HWCAP2 entry of the auxiliary vector, and testing whether the relevant +flags are set, e.g.:: + + bool floating_point_is_present(void) + { + unsigned long hwcaps = getauxval(AT_HWCAP); + if (hwcaps & HWCAP_FP) + return true; + + return false; + } + +Where software relies on a feature described by a hwcap, it should check +the relevant hwcap flag to verify that the feature is present before +attempting to make use of the feature. + +Features cannot be probed reliably through other means. When a feature +is not available, attempting to use it may result in unpredictable +behaviour, and is not guaranteed to result in any reliable indication +that the feature is unavailable, such as a SIGILL. + + +2. Interpretation of hwcaps +--------------------------- + +The majority of hwcaps are intended to indicate the presence of features +which are described by architected ID registers inaccessible to +userspace code at EL0. These hwcaps are defined in terms of ID register +fields, and should be interpreted with reference to the definition of +these fields in the ARM Architecture Reference Manual (ARM ARM). + +Such hwcaps are described below in the form:: + + Functionality implied by idreg.field == val. + +Such hwcaps indicate the availability of functionality that the ARM ARM +defines as being present when idreg.field has value val, but do not +indicate that idreg.field is precisely equal to val, nor do they +indicate the absence of functionality implied by other values of +idreg.field. + +Other hwcaps may indicate the presence of features which cannot be +described by ID registers alone. These may be described without +reference to ID registers, and may refer to other documentation. + + +3. The hwcaps exposed in AT_HWCAP +--------------------------------- + +HWCAP_FP + Functionality implied by ID_AA64PFR0_EL1.FP == 0b0000. + +HWCAP_ASIMD + Functionality implied by ID_AA64PFR0_EL1.AdvSIMD == 0b0000. + +HWCAP_EVTSTRM + The generic timer is configured to generate events at a frequency of + approximately 10KHz. + +HWCAP_AES + Functionality implied by ID_AA64ISAR0_EL1.AES == 0b0001. + +HWCAP_PMULL + Functionality implied by ID_AA64ISAR0_EL1.AES == 0b0010. + +HWCAP_SHA1 + Functionality implied by ID_AA64ISAR0_EL1.SHA1 == 0b0001. + +HWCAP_SHA2 + Functionality implied by ID_AA64ISAR0_EL1.SHA2 == 0b0001. + +HWCAP_CRC32 + Functionality implied by ID_AA64ISAR0_EL1.CRC32 == 0b0001. + +HWCAP_ATOMICS + Functionality implied by ID_AA64ISAR0_EL1.Atomic == 0b0010. + +HWCAP_FPHP + Functionality implied by ID_AA64PFR0_EL1.FP == 0b0001. + +HWCAP_ASIMDHP + Functionality implied by ID_AA64PFR0_EL1.AdvSIMD == 0b0001. + +HWCAP_CPUID + EL0 access to certain ID registers is available, to the extent + described by Documentation/arch/arm64/cpu-feature-registers.rst. + + These ID registers may imply the availability of features. + +HWCAP_ASIMDRDM + Functionality implied by ID_AA64ISAR0_EL1.RDM == 0b0001. + +HWCAP_JSCVT + Functionality implied by ID_AA64ISAR1_EL1.JSCVT == 0b0001. + +HWCAP_FCMA + Functionality implied by ID_AA64ISAR1_EL1.FCMA == 0b0001. + +HWCAP_LRCPC + Functionality implied by ID_AA64ISAR1_EL1.LRCPC == 0b0001. + +HWCAP_DCPOP + Functionality implied by ID_AA64ISAR1_EL1.DPB == 0b0001. + +HWCAP_SHA3 + Functionality implied by ID_AA64ISAR0_EL1.SHA3 == 0b0001. + +HWCAP_SM3 + Functionality implied by ID_AA64ISAR0_EL1.SM3 == 0b0001. + +HWCAP_SM4 + Functionality implied by ID_AA64ISAR0_EL1.SM4 == 0b0001. + +HWCAP_ASIMDDP + Functionality implied by ID_AA64ISAR0_EL1.DP == 0b0001. + +HWCAP_SHA512 + Functionality implied by ID_AA64ISAR0_EL1.SHA2 == 0b0010. + +HWCAP_SVE + Functionality implied by ID_AA64PFR0_EL1.SVE == 0b0001. + +HWCAP_ASIMDFHM + Functionality implied by ID_AA64ISAR0_EL1.FHM == 0b0001. + +HWCAP_DIT + Functionality implied by ID_AA64PFR0_EL1.DIT == 0b0001. + +HWCAP_USCAT + Functionality implied by ID_AA64MMFR2_EL1.AT == 0b0001. + +HWCAP_ILRCPC + Functionality implied by ID_AA64ISAR1_EL1.LRCPC == 0b0010. + +HWCAP_FLAGM + Functionality implied by ID_AA64ISAR0_EL1.TS == 0b0001. + +HWCAP_SSBS + Functionality implied by ID_AA64PFR1_EL1.SSBS == 0b0010. + +HWCAP_SB + Functionality implied by ID_AA64ISAR1_EL1.SB == 0b0001. + +HWCAP_PACA + Functionality implied by ID_AA64ISAR1_EL1.APA == 0b0001 or + ID_AA64ISAR1_EL1.API == 0b0001, as described by + Documentation/arch/arm64/pointer-authentication.rst. + +HWCAP_PACG + Functionality implied by ID_AA64ISAR1_EL1.GPA == 0b0001 or + ID_AA64ISAR1_EL1.GPI == 0b0001, as described by + Documentation/arch/arm64/pointer-authentication.rst. + +HWCAP2_DCPODP + Functionality implied by ID_AA64ISAR1_EL1.DPB == 0b0010. + +HWCAP2_SVE2 + Functionality implied by ID_AA64ZFR0_EL1.SVEVer == 0b0001. + +HWCAP2_SVEAES + Functionality implied by ID_AA64ZFR0_EL1.AES == 0b0001. + +HWCAP2_SVEPMULL + Functionality implied by ID_AA64ZFR0_EL1.AES == 0b0010. + +HWCAP2_SVEBITPERM + Functionality implied by ID_AA64ZFR0_EL1.BitPerm == 0b0001. + +HWCAP2_SVESHA3 + Functionality implied by ID_AA64ZFR0_EL1.SHA3 == 0b0001. + +HWCAP2_SVESM4 + Functionality implied by ID_AA64ZFR0_EL1.SM4 == 0b0001. + +HWCAP2_FLAGM2 + Functionality implied by ID_AA64ISAR0_EL1.TS == 0b0010. + +HWCAP2_FRINT + Functionality implied by ID_AA64ISAR1_EL1.FRINTTS == 0b0001. + +HWCAP2_SVEI8MM + Functionality implied by ID_AA64ZFR0_EL1.I8MM == 0b0001. + +HWCAP2_SVEF32MM + Functionality implied by ID_AA64ZFR0_EL1.F32MM == 0b0001. + +HWCAP2_SVEF64MM + Functionality implied by ID_AA64ZFR0_EL1.F64MM == 0b0001. + +HWCAP2_SVEBF16 + Functionality implied by ID_AA64ZFR0_EL1.BF16 == 0b0001. + +HWCAP2_I8MM + Functionality implied by ID_AA64ISAR1_EL1.I8MM == 0b0001. + +HWCAP2_BF16 + Functionality implied by ID_AA64ISAR1_EL1.BF16 == 0b0001. + +HWCAP2_DGH + Functionality implied by ID_AA64ISAR1_EL1.DGH == 0b0001. + +HWCAP2_RNG + Functionality implied by ID_AA64ISAR0_EL1.RNDR == 0b0001. + +HWCAP2_BTI + Functionality implied by ID_AA64PFR0_EL1.BT == 0b0001. + +HWCAP2_MTE + Functionality implied by ID_AA64PFR1_EL1.MTE == 0b0010, as described + by Documentation/arch/arm64/memory-tagging-extension.rst. + +HWCAP2_ECV + Functionality implied by ID_AA64MMFR0_EL1.ECV == 0b0001. + +HWCAP2_AFP + Functionality implied by ID_AA64MFR1_EL1.AFP == 0b0001. + +HWCAP2_RPRES + Functionality implied by ID_AA64ISAR2_EL1.RPRES == 0b0001. + +HWCAP2_MTE3 + Functionality implied by ID_AA64PFR1_EL1.MTE == 0b0011, as described + by Documentation/arch/arm64/memory-tagging-extension.rst. + +HWCAP2_SME + Functionality implied by ID_AA64PFR1_EL1.SME == 0b0001, as described + by Documentation/arch/arm64/sme.rst. + +HWCAP2_SME_I16I64 + Functionality implied by ID_AA64SMFR0_EL1.I16I64 == 0b1111. + +HWCAP2_SME_F64F64 + Functionality implied by ID_AA64SMFR0_EL1.F64F64 == 0b1. + +HWCAP2_SME_I8I32 + Functionality implied by ID_AA64SMFR0_EL1.I8I32 == 0b1111. + +HWCAP2_SME_F16F32 + Functionality implied by ID_AA64SMFR0_EL1.F16F32 == 0b1. + +HWCAP2_SME_B16F32 + Functionality implied by ID_AA64SMFR0_EL1.B16F32 == 0b1. + +HWCAP2_SME_F32F32 + Functionality implied by ID_AA64SMFR0_EL1.F32F32 == 0b1. + +HWCAP2_SME_FA64 + Functionality implied by ID_AA64SMFR0_EL1.FA64 == 0b1. + +HWCAP2_WFXT + Functionality implied by ID_AA64ISAR2_EL1.WFXT == 0b0010. + +HWCAP2_EBF16 + Functionality implied by ID_AA64ISAR1_EL1.BF16 == 0b0010. + +HWCAP2_SVE_EBF16 + Functionality implied by ID_AA64ZFR0_EL1.BF16 == 0b0010. + +HWCAP2_CSSC + Functionality implied by ID_AA64ISAR2_EL1.CSSC == 0b0001. + +HWCAP2_RPRFM + Functionality implied by ID_AA64ISAR2_EL1.RPRFM == 0b0001. + +HWCAP2_SVE2P1 + Functionality implied by ID_AA64ZFR0_EL1.SVEver == 0b0010. + +HWCAP2_SME2 + Functionality implied by ID_AA64SMFR0_EL1.SMEver == 0b0001. + +HWCAP2_SME2P1 + Functionality implied by ID_AA64SMFR0_EL1.SMEver == 0b0010. + +HWCAP2_SMEI16I32 + Functionality implied by ID_AA64SMFR0_EL1.I16I32 == 0b0101 + +HWCAP2_SMEBI32I32 + Functionality implied by ID_AA64SMFR0_EL1.BI32I32 == 0b1 + +HWCAP2_SMEB16B16 + Functionality implied by ID_AA64SMFR0_EL1.B16B16 == 0b1 + +HWCAP2_SMEF16F16 + Functionality implied by ID_AA64SMFR0_EL1.F16F16 == 0b1 + +HWCAP2_MOPS + Functionality implied by ID_AA64ISAR2_EL1.MOPS == 0b0001. + +4. Unused AT_HWCAP bits +----------------------- + +For interoperation with userspace, the kernel guarantees that bits 62 +and 63 of AT_HWCAP will always be returned as 0. diff --git a/Documentation/arch/arm64/features.rst b/Documentation/arch/arm64/features.rst new file mode 100644 index 000000000000..dfa4cb3cd3ef --- /dev/null +++ b/Documentation/arch/arm64/features.rst @@ -0,0 +1,3 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. kernel-feat:: $srctree/Documentation/features arm64 diff --git a/Documentation/arch/arm64/hugetlbpage.rst b/Documentation/arch/arm64/hugetlbpage.rst new file mode 100644 index 000000000000..a110124c11e3 --- /dev/null +++ b/Documentation/arch/arm64/hugetlbpage.rst @@ -0,0 +1,43 @@ +.. _hugetlbpage_index: + +==================== +HugeTLBpage on ARM64 +==================== + +Hugepage relies on making efficient use of TLBs to improve performance of +address translations. The benefit depends on both - + + - the size of hugepages + - size of entries supported by the TLBs + +The ARM64 port supports two flavours of hugepages. + +1) Block mappings at the pud/pmd level +-------------------------------------- + +These are regular hugepages where a pmd or a pud page table entry points to a +block of memory. Regardless of the supported size of entries in TLB, block +mappings reduce the depth of page table walk needed to translate hugepage +addresses. + +2) Using the Contiguous bit +--------------------------- + +The architecture provides a contiguous bit in the translation table entries +(D4.5.3, ARM DDI 0487C.a) that hints to the MMU to indicate that it is one of a +contiguous set of entries that can be cached in a single TLB entry. + +The contiguous bit is used in Linux to increase the mapping size at the pmd and +pte (last) level. The number of supported contiguous entries varies by page size +and level of the page table. + + +The following hugepage sizes are supported - + + ====== ======== ==== ======== === + - CONT PTE PMD CONT PMD PUD + ====== ======== ==== ======== === + 4K: 64K 2M 32M 1G + 16K: 2M 32M 1G + 64K: 2M 512M 16G + ====== ======== ==== ======== === diff --git a/Documentation/arch/arm64/index.rst b/Documentation/arch/arm64/index.rst new file mode 100644 index 000000000000..d08e924204bf --- /dev/null +++ b/Documentation/arch/arm64/index.rst @@ -0,0 +1,38 @@ +.. _arm64_index: + +================== +ARM64 Architecture +================== + +.. toctree:: + :maxdepth: 1 + + acpi_object_usage + amu + arm-acpi + asymmetric-32bit + booting + cpu-feature-registers + elf_hwcaps + hugetlbpage + kdump + legacy_instructions + memory + memory-tagging-extension + perf + pointer-authentication + ptdump + silicon-errata + sme + sve + tagged-address-abi + tagged-pointers + + features + +.. only:: subproject and html + + Indices + ======= + + * :ref:`genindex` diff --git a/Documentation/arch/arm64/kasan-offsets.sh b/Documentation/arch/arm64/kasan-offsets.sh new file mode 100644 index 000000000000..2dc5f9e18039 --- /dev/null +++ b/Documentation/arch/arm64/kasan-offsets.sh @@ -0,0 +1,26 @@ +#!/bin/sh + +# Print out the KASAN_SHADOW_OFFSETS required to place the KASAN SHADOW +# start address at the top of the linear region + +print_kasan_offset () { + printf "%02d\t" $1 + printf "0x%08x00000000\n" $(( (0xffffffff & (-1 << ($1 - 1 - 32))) \ + - (1 << (64 - 32 - $2)) )) +} + +echo KASAN_SHADOW_SCALE_SHIFT = 3 +printf "VABITS\tKASAN_SHADOW_OFFSET\n" +print_kasan_offset 48 3 +print_kasan_offset 47 3 +print_kasan_offset 42 3 +print_kasan_offset 39 3 +print_kasan_offset 36 3 +echo +echo KASAN_SHADOW_SCALE_SHIFT = 4 +printf "VABITS\tKASAN_SHADOW_OFFSET\n" +print_kasan_offset 48 4 +print_kasan_offset 47 4 +print_kasan_offset 42 4 +print_kasan_offset 39 4 +print_kasan_offset 36 4 diff --git a/Documentation/arch/arm64/kdump.rst b/Documentation/arch/arm64/kdump.rst new file mode 100644 index 000000000000..56a89f45df28 --- /dev/null +++ b/Documentation/arch/arm64/kdump.rst @@ -0,0 +1,92 @@ +======================================= +crashkernel memory reservation on arm64 +======================================= + +Author: Baoquan He <bhe@redhat.com> + +Kdump mechanism is used to capture a corrupted kernel vmcore so that +it can be subsequently analyzed. In order to do this, a preliminarily +reserved memory is needed to pre-load the kdump kernel and boot such +kernel if corruption happens. + +That reserved memory for kdump is adapted to be able to minimally +accommodate the kdump kernel and the user space programs needed for the +vmcore collection. + +Kernel parameter +================ + +Through the kernel parameters below, memory can be reserved accordingly +during the early stage of the first kernel booting so that a continuous +large chunk of memomy can be found. The low memory reservation needs to +be considered if the crashkernel is reserved from the high memory area. + +- crashkernel=size@offset +- crashkernel=size +- crashkernel=size,high crashkernel=size,low + +Low memory and high memory +========================== + +For kdump reservations, low memory is the memory area under a specific +limit, usually decided by the accessible address bits of the DMA-capable +devices needed by the kdump kernel to run. Those devices not related to +vmcore dumping can be ignored. On arm64, the low memory upper bound is +not fixed: it is 1G on the RPi4 platform but 4G on most other systems. +On special kernels built with CONFIG_ZONE_(DMA|DMA32) disabled, the +whole system RAM is low memory. Outside of the low memory described +above, the rest of system RAM is considered high memory. + +Implementation +============== + +1) crashkernel=size@offset +-------------------------- + +The crashkernel memory must be reserved at the user-specified region or +fail if already occupied. + + +2) crashkernel=size +------------------- + +The crashkernel memory region will be reserved in any available position +according to the search order: + +Firstly, the kernel searches the low memory area for an available region +with the specified size. + +If searching for low memory fails, the kernel falls back to searching +the high memory area for an available region of the specified size. If +the reservation in high memory succeeds, a default size reservation in +the low memory will be done. Currently the default size is 128M, +sufficient for the low memory needs of the kdump kernel. + +Note: crashkernel=size is the recommended option for crashkernel kernel +reservations. The user would not need to know the system memory layout +for a specific platform. + +3) crashkernel=size,high crashkernel=size,low +--------------------------------------------- + +crashkernel=size,(high|low) are an important supplement to +crashkernel=size. They allows the user to specify how much memory needs +to be allocated from the high memory and low memory respectively. On +many systems the low memory is precious and crashkernel reservations +from this area should be kept to a minimum. + +To reserve memory for crashkernel=size,high, searching is first +attempted from the high memory region. If the reservation succeeds, the +low memory reservation will be done subsequently. + +If reservation from the high memory failed, the kernel falls back to +searching the low memory with the specified size in crashkernel=,high. +If it succeeds, no further reservation for low memory is needed. + +Notes: + +- If crashkernel=,low is not specified, the default low memory + reservation will be done automatically. + +- if crashkernel=0,low is specified, it means that the low memory + reservation is omitted intentionally. diff --git a/Documentation/arch/arm64/legacy_instructions.rst b/Documentation/arch/arm64/legacy_instructions.rst new file mode 100644 index 000000000000..54401b22cb8f --- /dev/null +++ b/Documentation/arch/arm64/legacy_instructions.rst @@ -0,0 +1,68 @@ +=================== +Legacy instructions +=================== + +The arm64 port of the Linux kernel provides infrastructure to support +emulation of instructions which have been deprecated, or obsoleted in +the architecture. The infrastructure code uses undefined instruction +hooks to support emulation. Where available it also allows turning on +the instruction execution in hardware. + +The emulation mode can be controlled by writing to sysctl nodes +(/proc/sys/abi). The following explains the different execution +behaviours and the corresponding values of the sysctl nodes - + +* Undef + Value: 0 + + Generates undefined instruction abort. Default for instructions that + have been obsoleted in the architecture, e.g., SWP + +* Emulate + Value: 1 + + Uses software emulation. To aid migration of software, in this mode + usage of emulated instruction is traced as well as rate limited + warnings are issued. This is the default for deprecated + instructions, .e.g., CP15 barriers + +* Hardware Execution + Value: 2 + + Although marked as deprecated, some implementations may support the + enabling/disabling of hardware support for the execution of these + instructions. Using hardware execution generally provides better + performance, but at the loss of ability to gather runtime statistics + about the use of the deprecated instructions. + +The default mode depends on the status of the instruction in the +architecture. Deprecated instructions should default to emulation +while obsolete instructions must be undefined by default. + +Note: Instruction emulation may not be possible in all cases. See +individual instruction notes for further information. + +Supported legacy instructions +----------------------------- +* SWP{B} + +:Node: /proc/sys/abi/swp +:Status: Obsolete +:Default: Undef (0) + +* CP15 Barriers + +:Node: /proc/sys/abi/cp15_barrier +:Status: Deprecated +:Default: Emulate (1) + +* SETEND + +:Node: /proc/sys/abi/setend +:Status: Deprecated +:Default: Emulate (1)* + + Note: All the cpus on the system must have mixed endian support at EL0 + for this feature to be enabled. If a new CPU - which doesn't support mixed + endian - is hotplugged in after this feature has been enabled, there could + be unexpected results in the application. diff --git a/Documentation/arch/arm64/memory-tagging-extension.rst b/Documentation/arch/arm64/memory-tagging-extension.rst new file mode 100644 index 000000000000..679725030731 --- /dev/null +++ b/Documentation/arch/arm64/memory-tagging-extension.rst @@ -0,0 +1,375 @@ +=============================================== +Memory Tagging Extension (MTE) in AArch64 Linux +=============================================== + +Authors: Vincenzo Frascino <vincenzo.frascino@arm.com> + Catalin Marinas <catalin.marinas@arm.com> + +Date: 2020-02-25 + +This document describes the provision of the Memory Tagging Extension +functionality in AArch64 Linux. + +Introduction +============ + +ARMv8.5 based processors introduce the Memory Tagging Extension (MTE) +feature. MTE is built on top of the ARMv8.0 virtual address tagging TBI +(Top Byte Ignore) feature and allows software to access a 4-bit +allocation tag for each 16-byte granule in the physical address space. +Such memory range must be mapped with the Normal-Tagged memory +attribute. A logical tag is derived from bits 59-56 of the virtual +address used for the memory access. A CPU with MTE enabled will compare +the logical tag against the allocation tag and potentially raise an +exception on mismatch, subject to system registers configuration. + +Userspace Support +================= + +When ``CONFIG_ARM64_MTE`` is selected and Memory Tagging Extension is +supported by the hardware, the kernel advertises the feature to +userspace via ``HWCAP2_MTE``. + +PROT_MTE +-------- + +To access the allocation tags, a user process must enable the Tagged +memory attribute on an address range using a new ``prot`` flag for +``mmap()`` and ``mprotect()``: + +``PROT_MTE`` - Pages allow access to the MTE allocation tags. + +The allocation tag is set to 0 when such pages are first mapped in the +user address space and preserved on copy-on-write. ``MAP_SHARED`` is +supported and the allocation tags can be shared between processes. + +**Note**: ``PROT_MTE`` is only supported on ``MAP_ANONYMOUS`` and +RAM-based file mappings (``tmpfs``, ``memfd``). Passing it to other +types of mapping will result in ``-EINVAL`` returned by these system +calls. + +**Note**: The ``PROT_MTE`` flag (and corresponding memory type) cannot +be cleared by ``mprotect()``. + +**Note**: ``madvise()`` memory ranges with ``MADV_DONTNEED`` and +``MADV_FREE`` may have the allocation tags cleared (set to 0) at any +point after the system call. + +Tag Check Faults +---------------- + +When ``PROT_MTE`` is enabled on an address range and a mismatch between +the logical and allocation tags occurs on access, there are three +configurable behaviours: + +- *Ignore* - This is the default mode. The CPU (and kernel) ignores the + tag check fault. + +- *Synchronous* - The kernel raises a ``SIGSEGV`` synchronously, with + ``.si_code = SEGV_MTESERR`` and ``.si_addr = <fault-address>``. The + memory access is not performed. If ``SIGSEGV`` is ignored or blocked + by the offending thread, the containing process is terminated with a + ``coredump``. + +- *Asynchronous* - The kernel raises a ``SIGSEGV``, in the offending + thread, asynchronously following one or multiple tag check faults, + with ``.si_code = SEGV_MTEAERR`` and ``.si_addr = 0`` (the faulting + address is unknown). + +- *Asymmetric* - Reads are handled as for synchronous mode while writes + are handled as for asynchronous mode. + +The user can select the above modes, per thread, using the +``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)`` system call where ``flags`` +contains any number of the following values in the ``PR_MTE_TCF_MASK`` +bit-field: + +- ``PR_MTE_TCF_NONE``  - *Ignore* tag check faults + (ignored if combined with other options) +- ``PR_MTE_TCF_SYNC`` - *Synchronous* tag check fault mode +- ``PR_MTE_TCF_ASYNC`` - *Asynchronous* tag check fault mode + +If no modes are specified, tag check faults are ignored. If a single +mode is specified, the program will run in that mode. If multiple +modes are specified, the mode is selected as described in the "Per-CPU +preferred tag checking modes" section below. + +The current tag check fault configuration can be read using the +``prctl(PR_GET_TAGGED_ADDR_CTRL, 0, 0, 0, 0)`` system call. If +multiple modes were requested then all will be reported. + +Tag checking can also be disabled for a user thread by setting the +``PSTATE.TCO`` bit with ``MSR TCO, #1``. + +**Note**: Signal handlers are always invoked with ``PSTATE.TCO = 0``, +irrespective of the interrupted context. ``PSTATE.TCO`` is restored on +``sigreturn()``. + +**Note**: There are no *match-all* logical tags available for user +applications. + +**Note**: Kernel accesses to the user address space (e.g. ``read()`` +system call) are not checked if the user thread tag checking mode is +``PR_MTE_TCF_NONE`` or ``PR_MTE_TCF_ASYNC``. If the tag checking mode is +``PR_MTE_TCF_SYNC``, the kernel makes a best effort to check its user +address accesses, however it cannot always guarantee it. Kernel accesses +to user addresses are always performed with an effective ``PSTATE.TCO`` +value of zero, regardless of the user configuration. + +Excluding Tags in the ``IRG``, ``ADDG`` and ``SUBG`` instructions +----------------------------------------------------------------- + +The architecture allows excluding certain tags to be randomly generated +via the ``GCR_EL1.Exclude`` register bit-field. By default, Linux +excludes all tags other than 0. A user thread can enable specific tags +in the randomly generated set using the ``prctl(PR_SET_TAGGED_ADDR_CTRL, +flags, 0, 0, 0)`` system call where ``flags`` contains the tags bitmap +in the ``PR_MTE_TAG_MASK`` bit-field. + +**Note**: The hardware uses an exclude mask but the ``prctl()`` +interface provides an include mask. An include mask of ``0`` (exclusion +mask ``0xffff``) results in the CPU always generating tag ``0``. + +Per-CPU preferred tag checking mode +----------------------------------- + +On some CPUs the performance of MTE in stricter tag checking modes +is similar to that of less strict tag checking modes. This makes it +worthwhile to enable stricter checks on those CPUs when a less strict +checking mode is requested, in order to gain the error detection +benefits of the stricter checks without the performance downsides. To +support this scenario, a privileged user may configure a stricter +tag checking mode as the CPU's preferred tag checking mode. + +The preferred tag checking mode for each CPU is controlled by +``/sys/devices/system/cpu/cpu<N>/mte_tcf_preferred``, to which a +privileged user may write the value ``async``, ``sync`` or ``asymm``. The +default preferred mode for each CPU is ``async``. + +To allow a program to potentially run in the CPU's preferred tag +checking mode, the user program may set multiple tag check fault mode +bits in the ``flags`` argument to the ``prctl(PR_SET_TAGGED_ADDR_CTRL, +flags, 0, 0, 0)`` system call. If both synchronous and asynchronous +modes are requested then asymmetric mode may also be selected by the +kernel. If the CPU's preferred tag checking mode is in the task's set +of provided tag checking modes, that mode will be selected. Otherwise, +one of the modes in the task's mode will be selected by the kernel +from the task's mode set using the preference order: + + 1. Asynchronous + 2. Asymmetric + 3. Synchronous + +Note that there is no way for userspace to request multiple modes and +also disable asymmetric mode. + +Initial process state +--------------------- + +On ``execve()``, the new process has the following configuration: + +- ``PR_TAGGED_ADDR_ENABLE`` set to 0 (disabled) +- No tag checking modes are selected (tag check faults ignored) +- ``PR_MTE_TAG_MASK`` set to 0 (all tags excluded) +- ``PSTATE.TCO`` set to 0 +- ``PROT_MTE`` not set on any of the initial memory maps + +On ``fork()``, the new process inherits the parent's configuration and +memory map attributes with the exception of the ``madvise()`` ranges +with ``MADV_WIPEONFORK`` which will have the data and tags cleared (set +to 0). + +The ``ptrace()`` interface +-------------------------- + +``PTRACE_PEEKMTETAGS`` and ``PTRACE_POKEMTETAGS`` allow a tracer to read +the tags from or set the tags to a tracee's address space. The +``ptrace()`` system call is invoked as ``ptrace(request, pid, addr, +data)`` where: + +- ``request`` - one of ``PTRACE_PEEKMTETAGS`` or ``PTRACE_POKEMTETAGS``. +- ``pid`` - the tracee's PID. +- ``addr`` - address in the tracee's address space. +- ``data`` - pointer to a ``struct iovec`` where ``iov_base`` points to + a buffer of ``iov_len`` length in the tracer's address space. + +The tags in the tracer's ``iov_base`` buffer are represented as one +4-bit tag per byte and correspond to a 16-byte MTE tag granule in the +tracee's address space. + +**Note**: If ``addr`` is not aligned to a 16-byte granule, the kernel +will use the corresponding aligned address. + +``ptrace()`` return value: + +- 0 - tags were copied, the tracer's ``iov_len`` was updated to the + number of tags transferred. This may be smaller than the requested + ``iov_len`` if the requested address range in the tracee's or the + tracer's space cannot be accessed or does not have valid tags. +- ``-EPERM`` - the specified process cannot be traced. +- ``-EIO`` - the tracee's address range cannot be accessed (e.g. invalid + address) and no tags copied. ``iov_len`` not updated. +- ``-EFAULT`` - fault on accessing the tracer's memory (``struct iovec`` + or ``iov_base`` buffer) and no tags copied. ``iov_len`` not updated. +- ``-EOPNOTSUPP`` - the tracee's address does not have valid tags (never + mapped with the ``PROT_MTE`` flag). ``iov_len`` not updated. + +**Note**: There are no transient errors for the requests above, so user +programs should not retry in case of a non-zero system call return. + +``PTRACE_GETREGSET`` and ``PTRACE_SETREGSET`` with ``addr == +``NT_ARM_TAGGED_ADDR_CTRL`` allow ``ptrace()`` access to the tagged +address ABI control and MTE configuration of a process as per the +``prctl()`` options described in +Documentation/arch/arm64/tagged-address-abi.rst and above. The corresponding +``regset`` is 1 element of 8 bytes (``sizeof(long))``). + +Core dump support +----------------- + +The allocation tags for user memory mapped with ``PROT_MTE`` are dumped +in the core file as additional ``PT_AARCH64_MEMTAG_MTE`` segments. The +program header for such segment is defined as: + +:``p_type``: ``PT_AARCH64_MEMTAG_MTE`` +:``p_flags``: 0 +:``p_offset``: segment file offset +:``p_vaddr``: segment virtual address, same as the corresponding + ``PT_LOAD`` segment +:``p_paddr``: 0 +:``p_filesz``: segment size in file, calculated as ``p_mem_sz / 32`` + (two 4-bit tags cover 32 bytes of memory) +:``p_memsz``: segment size in memory, same as the corresponding + ``PT_LOAD`` segment +:``p_align``: 0 + +The tags are stored in the core file at ``p_offset`` as two 4-bit tags +in a byte. With the tag granule of 16 bytes, a 4K page requires 128 +bytes in the core file. + +Example of correct usage +======================== + +*MTE Example code* + +.. code-block:: c + + /* + * To be compiled with -march=armv8.5-a+memtag + */ + #include <errno.h> + #include <stdint.h> + #include <stdio.h> + #include <stdlib.h> + #include <unistd.h> + #include <sys/auxv.h> + #include <sys/mman.h> + #include <sys/prctl.h> + + /* + * From arch/arm64/include/uapi/asm/hwcap.h + */ + #define HWCAP2_MTE (1 << 18) + + /* + * From arch/arm64/include/uapi/asm/mman.h + */ + #define PROT_MTE 0x20 + + /* + * From include/uapi/linux/prctl.h + */ + #define PR_SET_TAGGED_ADDR_CTRL 55 + #define PR_GET_TAGGED_ADDR_CTRL 56 + # define PR_TAGGED_ADDR_ENABLE (1UL << 0) + # define PR_MTE_TCF_SHIFT 1 + # define PR_MTE_TCF_NONE (0UL << PR_MTE_TCF_SHIFT) + # define PR_MTE_TCF_SYNC (1UL << PR_MTE_TCF_SHIFT) + # define PR_MTE_TCF_ASYNC (2UL << PR_MTE_TCF_SHIFT) + # define PR_MTE_TCF_MASK (3UL << PR_MTE_TCF_SHIFT) + # define PR_MTE_TAG_SHIFT 3 + # define PR_MTE_TAG_MASK (0xffffUL << PR_MTE_TAG_SHIFT) + + /* + * Insert a random logical tag into the given pointer. + */ + #define insert_random_tag(ptr) ({ \ + uint64_t __val; \ + asm("irg %0, %1" : "=r" (__val) : "r" (ptr)); \ + __val; \ + }) + + /* + * Set the allocation tag on the destination address. + */ + #define set_tag(tagged_addr) do { \ + asm volatile("stg %0, [%0]" : : "r" (tagged_addr) : "memory"); \ + } while (0) + + int main() + { + unsigned char *a; + unsigned long page_sz = sysconf(_SC_PAGESIZE); + unsigned long hwcap2 = getauxval(AT_HWCAP2); + + /* check if MTE is present */ + if (!(hwcap2 & HWCAP2_MTE)) + return EXIT_FAILURE; + + /* + * Enable the tagged address ABI, synchronous or asynchronous MTE + * tag check faults (based on per-CPU preference) and allow all + * non-zero tags in the randomly generated set. + */ + if (prctl(PR_SET_TAGGED_ADDR_CTRL, + PR_TAGGED_ADDR_ENABLE | PR_MTE_TCF_SYNC | PR_MTE_TCF_ASYNC | + (0xfffe << PR_MTE_TAG_SHIFT), + 0, 0, 0)) { + perror("prctl() failed"); + return EXIT_FAILURE; + } + + a = mmap(0, page_sz, PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); + if (a == MAP_FAILED) { + perror("mmap() failed"); + return EXIT_FAILURE; + } + + /* + * Enable MTE on the above anonymous mmap. The flag could be passed + * directly to mmap() and skip this step. + */ + if (mprotect(a, page_sz, PROT_READ | PROT_WRITE | PROT_MTE)) { + perror("mprotect() failed"); + return EXIT_FAILURE; + } + + /* access with the default tag (0) */ + a[0] = 1; + a[1] = 2; + + printf("a[0] = %hhu a[1] = %hhu\n", a[0], a[1]); + + /* set the logical and allocation tags */ + a = (unsigned char *)insert_random_tag(a); + set_tag(a); + + printf("%p\n", a); + + /* non-zero tag access */ + a[0] = 3; + printf("a[0] = %hhu a[1] = %hhu\n", a[0], a[1]); + + /* + * If MTE is enabled correctly the next instruction will generate an + * exception. + */ + printf("Expecting SIGSEGV...\n"); + a[16] = 0xdd; + + /* this should not be printed in the PR_MTE_TCF_SYNC mode */ + printf("...haven't got one\n"); + + return EXIT_FAILURE; + } diff --git a/Documentation/arch/arm64/memory.rst b/Documentation/arch/arm64/memory.rst new file mode 100644 index 000000000000..55a55f30eed8 --- /dev/null +++ b/Documentation/arch/arm64/memory.rst @@ -0,0 +1,167 @@ +============================== +Memory Layout on AArch64 Linux +============================== + +Author: Catalin Marinas <catalin.marinas@arm.com> + +This document describes the virtual memory layout used by the AArch64 +Linux kernel. The architecture allows up to 4 levels of translation +tables with a 4KB page size and up to 3 levels with a 64KB page size. + +AArch64 Linux uses either 3 levels or 4 levels of translation tables +with the 4KB page configuration, allowing 39-bit (512GB) or 48-bit +(256TB) virtual addresses, respectively, for both user and kernel. With +64KB pages, only 2 levels of translation tables, allowing 42-bit (4TB) +virtual address, are used but the memory layout is the same. + +ARMv8.2 adds optional support for Large Virtual Address space. This is +only available when running with a 64KB page size and expands the +number of descriptors in the first level of translation. + +User addresses have bits 63:48 set to 0 while the kernel addresses have +the same bits set to 1. TTBRx selection is given by bit 63 of the +virtual address. The swapper_pg_dir contains only kernel (global) +mappings while the user pgd contains only user (non-global) mappings. +The swapper_pg_dir address is written to TTBR1 and never written to +TTBR0. + + +AArch64 Linux memory layout with 4KB pages + 4 levels (48-bit):: + + Start End Size Use + ----------------------------------------------------------------------- + 0000000000000000 0000ffffffffffff 256TB user + ffff000000000000 ffff7fffffffffff 128TB kernel logical memory map + [ffff600000000000 ffff7fffffffffff] 32TB [kasan shadow region] + ffff800000000000 ffff80007fffffff 2GB modules + ffff800080000000 fffffbffefffffff 124TB vmalloc + fffffbfff0000000 fffffbfffdffffff 224MB fixed mappings (top down) + fffffbfffe000000 fffffbfffe7fffff 8MB [guard region] + fffffbfffe800000 fffffbffff7fffff 16MB PCI I/O space + fffffbffff800000 fffffbffffffffff 8MB [guard region] + fffffc0000000000 fffffdffffffffff 2TB vmemmap + fffffe0000000000 ffffffffffffffff 2TB [guard region] + + +AArch64 Linux memory layout with 64KB pages + 3 levels (52-bit with HW support):: + + Start End Size Use + ----------------------------------------------------------------------- + 0000000000000000 000fffffffffffff 4PB user + fff0000000000000 ffff7fffffffffff ~4PB kernel logical memory map + [fffd800000000000 ffff7fffffffffff] 512TB [kasan shadow region] + ffff800000000000 ffff80007fffffff 2GB modules + ffff800080000000 fffffbffefffffff 124TB vmalloc + fffffbfff0000000 fffffbfffdffffff 224MB fixed mappings (top down) + fffffbfffe000000 fffffbfffe7fffff 8MB [guard region] + fffffbfffe800000 fffffbffff7fffff 16MB PCI I/O space + fffffbffff800000 fffffbffffffffff 8MB [guard region] + fffffc0000000000 ffffffdfffffffff ~4TB vmemmap + ffffffe000000000 ffffffffffffffff 128GB [guard region] + + +Translation table lookup with 4KB pages:: + + +--------+--------+--------+--------+--------+--------+--------+--------+ + |63 56|55 48|47 40|39 32|31 24|23 16|15 8|7 0| + +--------+--------+--------+--------+--------+--------+--------+--------+ + | | | | | | + | | | | | v + | | | | | [11:0] in-page offset + | | | | +-> [20:12] L3 index + | | | +-----------> [29:21] L2 index + | | +---------------------> [38:30] L1 index + | +-------------------------------> [47:39] L0 index + +-------------------------------------------------> [63] TTBR0/1 + + +Translation table lookup with 64KB pages:: + + +--------+--------+--------+--------+--------+--------+--------+--------+ + |63 56|55 48|47 40|39 32|31 24|23 16|15 8|7 0| + +--------+--------+--------+--------+--------+--------+--------+--------+ + | | | | | + | | | | v + | | | | [15:0] in-page offset + | | | +----------> [28:16] L3 index + | | +--------------------------> [41:29] L2 index + | +-------------------------------> [47:42] L1 index (48-bit) + | [51:42] L1 index (52-bit) + +-------------------------------------------------> [63] TTBR0/1 + + +When using KVM without the Virtualization Host Extensions, the +hypervisor maps kernel pages in EL2 at a fixed (and potentially +random) offset from the linear mapping. See the kern_hyp_va macro and +kvm_update_va_mask function for more details. MMIO devices such as +GICv2 gets mapped next to the HYP idmap page, as do vectors when +ARM64_SPECTRE_V3A is enabled for particular CPUs. + +When using KVM with the Virtualization Host Extensions, no additional +mappings are created, since the host kernel runs directly in EL2. + +52-bit VA support in the kernel +------------------------------- +If the ARMv8.2-LVA optional feature is present, and we are running +with a 64KB page size; then it is possible to use 52-bits of address +space for both userspace and kernel addresses. However, any kernel +binary that supports 52-bit must also be able to fall back to 48-bit +at early boot time if the hardware feature is not present. + +This fallback mechanism necessitates the kernel .text to be in the +higher addresses such that they are invariant to 48/52-bit VAs. Due +to the kasan shadow being a fraction of the entire kernel VA space, +the end of the kasan shadow must also be in the higher half of the +kernel VA space for both 48/52-bit. (Switching from 48-bit to 52-bit, +the end of the kasan shadow is invariant and dependent on ~0UL, +whilst the start address will "grow" towards the lower addresses). + +In order to optimise phys_to_virt and virt_to_phys, the PAGE_OFFSET +is kept constant at 0xFFF0000000000000 (corresponding to 52-bit), +this obviates the need for an extra variable read. The physvirt +offset and vmemmap offsets are computed at early boot to enable +this logic. + +As a single binary will need to support both 48-bit and 52-bit VA +spaces, the VMEMMAP must be sized large enough for 52-bit VAs and +also must be sized large enough to accommodate a fixed PAGE_OFFSET. + +Most code in the kernel should not need to consider the VA_BITS, for +code that does need to know the VA size the variables are +defined as follows: + +VA_BITS constant the *maximum* VA space size + +VA_BITS_MIN constant the *minimum* VA space size + +vabits_actual variable the *actual* VA space size + + +Maximum and minimum sizes can be useful to ensure that buffers are +sized large enough or that addresses are positioned close enough for +the "worst" case. + +52-bit userspace VAs +-------------------- +To maintain compatibility with software that relies on the ARMv8.0 +VA space maximum size of 48-bits, the kernel will, by default, +return virtual addresses to userspace from a 48-bit range. + +Software can "opt-in" to receiving VAs from a 52-bit space by +specifying an mmap hint parameter that is larger than 48-bit. + +For example: + +.. code-block:: c + + maybe_high_address = mmap(~0UL, size, prot, flags,...); + +It is also possible to build a debug kernel that returns addresses +from a 52-bit space by enabling the following kernel config options: + +.. code-block:: sh + + CONFIG_EXPERT=y && CONFIG_ARM64_FORCE_52BIT=y + +Note that this option is only intended for debugging applications +and should not be used in production. diff --git a/Documentation/arch/arm64/perf.rst b/Documentation/arch/arm64/perf.rst new file mode 100644 index 000000000000..1f87b57c2332 --- /dev/null +++ b/Documentation/arch/arm64/perf.rst @@ -0,0 +1,166 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. _perf_index: + +==== +Perf +==== + +Perf Event Attributes +===================== + +:Author: Andrew Murray <andrew.murray@arm.com> +:Date: 2019-03-06 + +exclude_user +------------ + +This attribute excludes userspace. + +Userspace always runs at EL0 and thus this attribute will exclude EL0. + + +exclude_kernel +-------------- + +This attribute excludes the kernel. + +The kernel runs at EL2 with VHE and EL1 without. Guest kernels always run +at EL1. + +For the host this attribute will exclude EL1 and additionally EL2 on a VHE +system. + +For the guest this attribute will exclude EL1. Please note that EL2 is +never counted within a guest. + + +exclude_hv +---------- + +This attribute excludes the hypervisor. + +For a VHE host this attribute is ignored as we consider the host kernel to +be the hypervisor. + +For a non-VHE host this attribute will exclude EL2 as we consider the +hypervisor to be any code that runs at EL2 which is predominantly used for +guest/host transitions. + +For the guest this attribute has no effect. Please note that EL2 is +never counted within a guest. + + +exclude_host / exclude_guest +---------------------------- + +These attributes exclude the KVM host and guest, respectively. + +The KVM host may run at EL0 (userspace), EL1 (non-VHE kernel) and EL2 (VHE +kernel or non-VHE hypervisor). + +The KVM guest may run at EL0 (userspace) and EL1 (kernel). + +Due to the overlapping exception levels between host and guests we cannot +exclusively rely on the PMU's hardware exception filtering - therefore we +must enable/disable counting on the entry and exit to the guest. This is +performed differently on VHE and non-VHE systems. + +For non-VHE systems we exclude EL2 for exclude_host - upon entering and +exiting the guest we disable/enable the event as appropriate based on the +exclude_host and exclude_guest attributes. + +For VHE systems we exclude EL1 for exclude_guest and exclude both EL0,EL2 +for exclude_host. Upon entering and exiting the guest we modify the event +to include/exclude EL0 as appropriate based on the exclude_host and +exclude_guest attributes. + +The statements above also apply when these attributes are used within a +non-VHE guest however please note that EL2 is never counted within a guest. + + +Accuracy +-------- + +On non-VHE hosts we enable/disable counters on the entry/exit of host/guest +transition at EL2 - however there is a period of time between +enabling/disabling the counters and entering/exiting the guest. We are +able to eliminate counters counting host events on the boundaries of guest +entry/exit when counting guest events by filtering out EL2 for +exclude_host. However when using !exclude_hv there is a small blackout +window at the guest entry/exit where host events are not captured. + +On VHE systems there are no blackout windows. + +Perf Userspace PMU Hardware Counter Access +========================================== + +Overview +-------- +The perf userspace tool relies on the PMU to monitor events. It offers an +abstraction layer over the hardware counters since the underlying +implementation is cpu-dependent. +Arm64 allows userspace tools to have access to the registers storing the +hardware counters' values directly. + +This targets specifically self-monitoring tasks in order to reduce the overhead +by directly accessing the registers without having to go through the kernel. + +How-to +------ +The focus is set on the armv8 PMUv3 which makes sure that the access to the pmu +registers is enabled and that the userspace has access to the relevant +information in order to use them. + +In order to have access to the hardware counters, the global sysctl +kernel/perf_user_access must first be enabled: + +.. code-block:: sh + + echo 1 > /proc/sys/kernel/perf_user_access + +It is necessary to open the event using the perf tool interface with config1:1 +attr bit set: the sys_perf_event_open syscall returns a fd which can +subsequently be used with the mmap syscall in order to retrieve a page of memory +containing information about the event. The PMU driver uses this page to expose +to the user the hardware counter's index and other necessary data. Using this +index enables the user to access the PMU registers using the `mrs` instruction. +Access to the PMU registers is only valid while the sequence lock is unchanged. +In particular, the PMSELR_EL0 register is zeroed each time the sequence lock is +changed. + +The userspace access is supported in libperf using the perf_evsel__mmap() +and perf_evsel__read() functions. See `tools/lib/perf/tests/test-evsel.c`_ for +an example. + +About heterogeneous systems +--------------------------- +On heterogeneous systems such as big.LITTLE, userspace PMU counter access can +only be enabled when the tasks are pinned to a homogeneous subset of cores and +the corresponding PMU instance is opened by specifying the 'type' attribute. +The use of generic event types is not supported in this case. + +Have a look at `tools/perf/arch/arm64/tests/user-events.c`_ for an example. It +can be run using the perf tool to check that the access to the registers works +correctly from userspace: + +.. code-block:: sh + + perf test -v user + +About chained events and counter sizes +-------------------------------------- +The user can request either a 32-bit (config1:0 == 0) or 64-bit (config1:0 == 1) +counter along with userspace access. The sys_perf_event_open syscall will fail +if a 64-bit counter is requested and the hardware doesn't support 64-bit +counters. Chained events are not supported in conjunction with userspace counter +access. If a 32-bit counter is requested on hardware with 64-bit counters, then +userspace must treat the upper 32-bits read from the counter as UNKNOWN. The +'pmc_width' field in the user page will indicate the valid width of the counter +and should be used to mask the upper bits as needed. + +.. Links +.. _tools/perf/arch/arm64/tests/user-events.c: + https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/arch/arm64/tests/user-events.c +.. _tools/lib/perf/tests/test-evsel.c: + https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/lib/perf/tests/test-evsel.c diff --git a/Documentation/arch/arm64/pointer-authentication.rst b/Documentation/arch/arm64/pointer-authentication.rst new file mode 100644 index 000000000000..e5dad2e40aa8 --- /dev/null +++ b/Documentation/arch/arm64/pointer-authentication.rst @@ -0,0 +1,142 @@ +======================================= +Pointer authentication in AArch64 Linux +======================================= + +Author: Mark Rutland <mark.rutland@arm.com> + +Date: 2017-07-19 + +This document briefly describes the provision of pointer authentication +functionality in AArch64 Linux. + + +Architecture overview +--------------------- + +The ARMv8.3 Pointer Authentication extension adds primitives that can be +used to mitigate certain classes of attack where an attacker can corrupt +the contents of some memory (e.g. the stack). + +The extension uses a Pointer Authentication Code (PAC) to determine +whether pointers have been modified unexpectedly. A PAC is derived from +a pointer, another value (such as the stack pointer), and a secret key +held in system registers. + +The extension adds instructions to insert a valid PAC into a pointer, +and to verify/remove the PAC from a pointer. The PAC occupies a number +of high-order bits of the pointer, which varies dependent on the +configured virtual address size and whether pointer tagging is in use. + +A subset of these instructions have been allocated from the HINT +encoding space. In the absence of the extension (or when disabled), +these instructions behave as NOPs. Applications and libraries using +these instructions operate correctly regardless of the presence of the +extension. + +The extension provides five separate keys to generate PACs - two for +instruction addresses (APIAKey, APIBKey), two for data addresses +(APDAKey, APDBKey), and one for generic authentication (APGAKey). + + +Basic support +------------- + +When CONFIG_ARM64_PTR_AUTH is selected, and relevant HW support is +present, the kernel will assign random key values to each process at +exec*() time. The keys are shared by all threads within the process, and +are preserved across fork(). + +Presence of address authentication functionality is advertised via +HWCAP_PACA, and generic authentication functionality via HWCAP_PACG. + +The number of bits that the PAC occupies in a pointer is 55 minus the +virtual address size configured by the kernel. For example, with a +virtual address size of 48, the PAC is 7 bits wide. + +When ARM64_PTR_AUTH_KERNEL is selected, the kernel will be compiled +with HINT space pointer authentication instructions protecting +function returns. Kernels built with this option will work on hardware +with or without pointer authentication support. + +In addition to exec(), keys can also be reinitialized to random values +using the PR_PAC_RESET_KEYS prctl. A bitmask of PR_PAC_APIAKEY, +PR_PAC_APIBKEY, PR_PAC_APDAKEY, PR_PAC_APDBKEY and PR_PAC_APGAKEY +specifies which keys are to be reinitialized; specifying 0 means "all +keys". + + +Debugging +--------- + +When CONFIG_ARM64_PTR_AUTH is selected, and HW support for address +authentication is present, the kernel will expose the position of TTBR0 +PAC bits in the NT_ARM_PAC_MASK regset (struct user_pac_mask), which +userspace can acquire via PTRACE_GETREGSET. + +The regset is exposed only when HWCAP_PACA is set. Separate masks are +exposed for data pointers and instruction pointers, as the set of PAC +bits can vary between the two. Note that the masks apply to TTBR0 +addresses, and are not valid to apply to TTBR1 addresses (e.g. kernel +pointers). + +Additionally, when CONFIG_CHECKPOINT_RESTORE is also set, the kernel +will expose the NT_ARM_PACA_KEYS and NT_ARM_PACG_KEYS regsets (struct +user_pac_address_keys and struct user_pac_generic_keys). These can be +used to get and set the keys for a thread. + + +Virtualization +-------------- + +Pointer authentication is enabled in KVM guest when each virtual cpu is +initialised by passing flags KVM_ARM_VCPU_PTRAUTH_[ADDRESS/GENERIC] and +requesting these two separate cpu features to be enabled. The current KVM +guest implementation works by enabling both features together, so both +these userspace flags are checked before enabling pointer authentication. +The separate userspace flag will allow to have no userspace ABI changes +if support is added in the future to allow these two features to be +enabled independently of one another. + +As Arm Architecture specifies that Pointer Authentication feature is +implemented along with the VHE feature so KVM arm64 ptrauth code relies +on VHE mode to be present. + +Additionally, when these vcpu feature flags are not set then KVM will +filter out the Pointer Authentication system key registers from +KVM_GET/SET_REG_* ioctls and mask those features from cpufeature ID +register. Any attempt to use the Pointer Authentication instructions will +result in an UNDEFINED exception being injected into the guest. + + +Enabling and disabling keys +--------------------------- + +The prctl PR_PAC_SET_ENABLED_KEYS allows the user program to control which +PAC keys are enabled in a particular task. It takes two arguments, the +first being a bitmask of PR_PAC_APIAKEY, PR_PAC_APIBKEY, PR_PAC_APDAKEY +and PR_PAC_APDBKEY specifying which keys shall be affected by this prctl, +and the second being a bitmask of the same bits specifying whether the key +should be enabled or disabled. For example:: + + prctl(PR_PAC_SET_ENABLED_KEYS, + PR_PAC_APIAKEY | PR_PAC_APIBKEY | PR_PAC_APDAKEY | PR_PAC_APDBKEY, + PR_PAC_APIBKEY, 0, 0); + +disables all keys except the IB key. + +The main reason why this is useful is to enable a userspace ABI that uses PAC +instructions to sign and authenticate function pointers and other pointers +exposed outside of the function, while still allowing binaries conforming to +the ABI to interoperate with legacy binaries that do not sign or authenticate +pointers. + +The idea is that a dynamic loader or early startup code would issue this +prctl very early after establishing that a process may load legacy binaries, +but before executing any PAC instructions. + +For compatibility with previous kernel versions, processes start up with IA, +IB, DA and DB enabled, and are reset to this state on exec(). Processes created +via fork() and clone() inherit the key enabled state from the calling process. + +It is recommended to avoid disabling the IA key, as this has higher performance +overhead than disabling any of the other keys. diff --git a/Documentation/arch/arm64/ptdump.rst b/Documentation/arch/arm64/ptdump.rst new file mode 100644 index 000000000000..5dcfc5d7cddf --- /dev/null +++ b/Documentation/arch/arm64/ptdump.rst @@ -0,0 +1,96 @@ +====================== +Kernel page table dump +====================== + +ptdump is a debugfs interface that provides a detailed dump of the +kernel page tables. It offers a comprehensive overview of the kernel +virtual memory layout as well as the attributes associated with the +various regions in a human-readable format. It is useful to dump the +kernel page tables to verify permissions and memory types. Examining the +page table entries and permissions helps identify potential security +vulnerabilities such as mappings with overly permissive access rights or +improper memory protections. + +Memory hotplug allows dynamic expansion or contraction of available +memory without requiring a system reboot. To maintain the consistency +and integrity of the memory management data structures, arm64 makes use +of the ``mem_hotplug_lock`` semaphore in write mode. Additionally, in +read mode, ``mem_hotplug_lock`` supports an efficient implementation of +``get_online_mems()`` and ``put_online_mems()``. These protect the +offlining of memory being accessed by the ptdump code. + +In order to dump the kernel page tables, enable the following +configurations and mount debugfs:: + + CONFIG_GENERIC_PTDUMP=y + CONFIG_PTDUMP_CORE=y + CONFIG_PTDUMP_DEBUGFS=y + + mount -t debugfs nodev /sys/kernel/debug + cat /sys/kernel/debug/kernel_page_tables + +On analysing the output of ``cat /sys/kernel/debug/kernel_page_tables`` +one can derive information about the virtual address range of the entry, +followed by size of the memory region covered by this entry, the +hierarchical structure of the page tables and finally the attributes +associated with each page. The page attributes provide information about +access permissions, execution capability, type of mapping such as leaf +level PTE or block level PGD, PMD and PUD, and access status of a page +within the kernel memory. Assessing these attributes can assist in +understanding the memory layout, access patterns and security +characteristics of the kernel pages. + +Kernel virtual memory layout example:: + + start address end address size attributes + +---------------------------------------------------------------------------------------+ + | ---[ Linear Mapping start ]---------------------------------------------------------- | + | .................. | + | 0xfff0000000000000-0xfff0000000210000 2112K PTE RW NX SHD AF UXN MEM/NORMAL-TAGGED | + | 0xfff0000000210000-0xfff0000001c00000 26560K PTE ro NX SHD AF UXN MEM/NORMAL | + | .................. | + | ---[ Linear Mapping end ]------------------------------------------------------------ | + +---------------------------------------------------------------------------------------+ + | ---[ Modules start ]----------------------------------------------------------------- | + | .................. | + | 0xffff800000000000-0xffff800008000000 128M PTE | + | .................. | + | ---[ Modules end ]------------------------------------------------------------------- | + +---------------------------------------------------------------------------------------+ + | ---[ vmalloc() area ]---------------------------------------------------------------- | + | .................. | + | 0xffff800008010000-0xffff800008200000 1984K PTE ro x SHD AF UXN MEM/NORMAL | + | 0xffff800008200000-0xffff800008e00000 12M PTE ro x SHD AF CON UXN MEM/NORMAL | + | .................. | + | ---[ vmalloc() end ]----------------------------------------------------------------- | + +---------------------------------------------------------------------------------------+ + | ---[ Fixmap start ]------------------------------------------------------------------ | + | .................. | + | 0xfffffbfffdb80000-0xfffffbfffdb90000 64K PTE ro x SHD AF UXN MEM/NORMAL | + | 0xfffffbfffdb90000-0xfffffbfffdba0000 64K PTE ro NX SHD AF UXN MEM/NORMAL | + | .................. | + | ---[ Fixmap end ]-------------------------------------------------------------------- | + +---------------------------------------------------------------------------------------+ + | ---[ PCI I/O start ]----------------------------------------------------------------- | + | .................. | + | 0xfffffbfffe800000-0xfffffbffff800000 16M PTE | + | .................. | + | ---[ PCI I/O end ]------------------------------------------------------------------- | + +---------------------------------------------------------------------------------------+ + | ---[ vmemmap start ]----------------------------------------------------------------- | + | .................. | + | 0xfffffc0002000000-0xfffffc0002200000 2M PTE RW NX SHD AF UXN MEM/NORMAL | + | 0xfffffc0002200000-0xfffffc0020000000 478M PTE | + | .................. | + | ---[ vmemmap end ]------------------------------------------------------------------- | + +---------------------------------------------------------------------------------------+ + +``cat /sys/kernel/debug/kernel_page_tables`` output:: + + 0xfff0000001c00000-0xfff0000080000000 2020M PTE RW NX SHD AF UXN MEM/NORMAL-TAGGED + 0xfff0000080000000-0xfff0000800000000 30G PMD + 0xfff0000800000000-0xfff0000800700000 7M PTE RW NX SHD AF UXN MEM/NORMAL-TAGGED + 0xfff0000800700000-0xfff0000800710000 64K PTE ro NX SHD AF UXN MEM/NORMAL-TAGGED + 0xfff0000800710000-0xfff0000880000000 2089920K PTE RW NX SHD AF UXN MEM/NORMAL-TAGGED + 0xfff0000880000000-0xfff0040000000000 4062G PMD + 0xfff0040000000000-0xffff800000000000 3964T PGD diff --git a/Documentation/arch/arm64/silicon-errata.rst b/Documentation/arch/arm64/silicon-errata.rst new file mode 100644 index 000000000000..f093a9d8bc5c --- /dev/null +++ b/Documentation/arch/arm64/silicon-errata.rst @@ -0,0 +1,224 @@ +======================================= +Silicon Errata and Software Workarounds +======================================= + +Author: Will Deacon <will.deacon@arm.com> + +Date : 27 November 2015 + +It is an unfortunate fact of life that hardware is often produced with +so-called "errata", which can cause it to deviate from the architecture +under specific circumstances. For hardware produced by ARM, these +errata are broadly classified into the following categories: + + ========== ======================================================== + Category A A critical error without a viable workaround. + Category B A significant or critical error with an acceptable + workaround. + Category C A minor error that is not expected to occur under normal + operation. + ========== ======================================================== + +For more information, consult one of the "Software Developers Errata +Notice" documents available on infocenter.arm.com (registration +required). + +As far as Linux is concerned, Category B errata may require some special +treatment in the operating system. For example, avoiding a particular +sequence of code, or configuring the processor in a particular way. A +less common situation may require similar actions in order to declassify +a Category A erratum into a Category C erratum. These are collectively +known as "software workarounds" and are only required in the minority of +cases (e.g. those cases that both require a non-secure workaround *and* +can be triggered by Linux). + +For software workarounds that may adversely impact systems unaffected by +the erratum in question, a Kconfig entry is added under "Kernel +Features" -> "ARM errata workarounds via the alternatives framework". +These are enabled by default and patched in at runtime when an affected +CPU is detected. For less-intrusive workarounds, a Kconfig option is not +available and the code is structured (preferably with a comment) in such +a way that the erratum will not be hit. + +This approach can make it slightly onerous to determine exactly which +errata are worked around in an arbitrary kernel source tree, so this +file acts as a registry of software workarounds in the Linux Kernel and +will be updated when new workarounds are committed and backported to +stable kernels. + ++----------------+-----------------+-----------------+-----------------------------+ +| Implementor | Component | Erratum ID | Kconfig | ++================+=================+=================+=============================+ +| Allwinner | A64/R18 | UNKNOWN1 | SUN50I_ERRATUM_UNKNOWN1 | ++----------------+-----------------+-----------------+-----------------------------+ ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A510 | #2457168 | ARM64_ERRATUM_2457168 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A510 | #2064142 | ARM64_ERRATUM_2064142 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A510 | #2038923 | ARM64_ERRATUM_2038923 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A510 | #1902691 | ARM64_ERRATUM_1902691 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A53 | #826319 | ARM64_ERRATUM_826319 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A53 | #827319 | ARM64_ERRATUM_827319 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A53 | #824069 | ARM64_ERRATUM_824069 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A53 | #819472 | ARM64_ERRATUM_819472 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A53 | #845719 | ARM64_ERRATUM_845719 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A53 | #843419 | ARM64_ERRATUM_843419 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A55 | #1024718 | ARM64_ERRATUM_1024718 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A55 | #1530923 | ARM64_ERRATUM_1530923 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A55 | #2441007 | ARM64_ERRATUM_2441007 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A57 | #832075 | ARM64_ERRATUM_832075 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A57 | #852523 | N/A | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A57 | #834220 | ARM64_ERRATUM_834220 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A57 | #1319537 | ARM64_ERRATUM_1319367 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A57 | #1742098 | ARM64_ERRATUM_1742098 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A72 | #853709 | N/A | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A72 | #1319367 | ARM64_ERRATUM_1319367 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A72 | #1655431 | ARM64_ERRATUM_1742098 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A73 | #858921 | ARM64_ERRATUM_858921 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A76 | #1188873,1418040| ARM64_ERRATUM_1418040 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A76 | #1165522 | ARM64_ERRATUM_1165522 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A76 | #1286807 | ARM64_ERRATUM_1286807 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A76 | #1463225 | ARM64_ERRATUM_1463225 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A77 | #1508412 | ARM64_ERRATUM_1508412 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A510 | #2051678 | ARM64_ERRATUM_2051678 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A510 | #2077057 | ARM64_ERRATUM_2077057 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A510 | #2441009 | ARM64_ERRATUM_2441009 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A510 | #2658417 | ARM64_ERRATUM_2658417 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A710 | #2119858 | ARM64_ERRATUM_2119858 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A710 | #2054223 | ARM64_ERRATUM_2054223 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A710 | #2224489 | ARM64_ERRATUM_2224489 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A715 | #2645198 | ARM64_ERRATUM_2645198 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-X2 | #2119858 | ARM64_ERRATUM_2119858 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-X2 | #2224489 | ARM64_ERRATUM_2224489 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Neoverse-N1 | #1188873,1418040| ARM64_ERRATUM_1418040 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Neoverse-N1 | #1349291 | N/A | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Neoverse-N1 | #1542419 | ARM64_ERRATUM_1542419 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Neoverse-N2 | #2139208 | ARM64_ERRATUM_2139208 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Neoverse-N2 | #2067961 | ARM64_ERRATUM_2067961 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Neoverse-N2 | #2253138 | ARM64_ERRATUM_2253138 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | MMU-500 | #841119,826419 | N/A | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | MMU-600 | #1076982,1209401| N/A | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | MMU-700 | #2268618,2812531| N/A | ++----------------+-----------------+-----------------+-----------------------------+ ++----------------+-----------------+-----------------+-----------------------------+ +| Broadcom | Brahma-B53 | N/A | ARM64_ERRATUM_845719 | ++----------------+-----------------+-----------------+-----------------------------+ +| Broadcom | Brahma-B53 | N/A | ARM64_ERRATUM_843419 | ++----------------+-----------------+-----------------+-----------------------------+ ++----------------+-----------------+-----------------+-----------------------------+ +| Cavium | ThunderX ITS | #22375,24313 | CAVIUM_ERRATUM_22375 | ++----------------+-----------------+-----------------+-----------------------------+ +| Cavium | ThunderX ITS | #23144 | CAVIUM_ERRATUM_23144 | ++----------------+-----------------+-----------------+-----------------------------+ +| Cavium | ThunderX GICv3 | #23154,38545 | CAVIUM_ERRATUM_23154 | ++----------------+-----------------+-----------------+-----------------------------+ +| Cavium | ThunderX GICv3 | #38539 | N/A | ++----------------+-----------------+-----------------+-----------------------------+ +| Cavium | ThunderX Core | #27456 | CAVIUM_ERRATUM_27456 | ++----------------+-----------------+-----------------+-----------------------------+ +| Cavium | ThunderX Core | #30115 | CAVIUM_ERRATUM_30115 | ++----------------+-----------------+-----------------+-----------------------------+ +| Cavium | ThunderX SMMUv2 | #27704 | N/A | ++----------------+-----------------+-----------------+-----------------------------+ +| Cavium | ThunderX2 SMMUv3| #74 | N/A | ++----------------+-----------------+-----------------+-----------------------------+ +| Cavium | ThunderX2 SMMUv3| #126 | N/A | ++----------------+-----------------+-----------------+-----------------------------+ +| Cavium | ThunderX2 Core | #219 | CAVIUM_TX2_ERRATUM_219 | ++----------------+-----------------+-----------------+-----------------------------+ ++----------------+-----------------+-----------------+-----------------------------+ +| Marvell | ARM-MMU-500 | #582743 | N/A | ++----------------+-----------------+-----------------+-----------------------------+ ++----------------+-----------------+-----------------+-----------------------------+ +| NVIDIA | Carmel Core | N/A | NVIDIA_CARMEL_CNP_ERRATUM | ++----------------+-----------------+-----------------+-----------------------------+ +| NVIDIA | T241 GICv3/4.x | T241-FABRIC-4 | N/A | ++----------------+-----------------+-----------------+-----------------------------+ ++----------------+-----------------+-----------------+-----------------------------+ +| Freescale/NXP | LS2080A/LS1043A | A-008585 | FSL_ERRATUM_A008585 | ++----------------+-----------------+-----------------+-----------------------------+ ++----------------+-----------------+-----------------+-----------------------------+ +| Hisilicon | Hip0{5,6,7} | #161010101 | HISILICON_ERRATUM_161010101 | ++----------------+-----------------+-----------------+-----------------------------+ +| Hisilicon | Hip0{6,7} | #161010701 | N/A | ++----------------+-----------------+-----------------+-----------------------------+ +| Hisilicon | Hip0{6,7} | #161010803 | N/A | ++----------------+-----------------+-----------------+-----------------------------+ +| Hisilicon | Hip07 | #161600802 | HISILICON_ERRATUM_161600802 | ++----------------+-----------------+-----------------+-----------------------------+ +| Hisilicon | Hip08 SMMU PMCG | #162001800 | N/A | ++----------------+-----------------+-----------------+-----------------------------+ ++----------------+-----------------+-----------------+-----------------------------+ +| Qualcomm Tech. | Kryo/Falkor v1 | E1003 | QCOM_FALKOR_ERRATUM_1003 | ++----------------+-----------------+-----------------+-----------------------------+ +| Qualcomm Tech. | Kryo/Falkor v1 | E1009 | QCOM_FALKOR_ERRATUM_1009 | ++----------------+-----------------+-----------------+-----------------------------+ +| Qualcomm Tech. | QDF2400 ITS | E0065 | QCOM_QDF2400_ERRATUM_0065 | ++----------------+-----------------+-----------------+-----------------------------+ +| Qualcomm Tech. | Falkor v{1,2} | E1041 | QCOM_FALKOR_ERRATUM_1041 | ++----------------+-----------------+-----------------+-----------------------------+ +| Qualcomm Tech. | Kryo4xx Gold | N/A | ARM64_ERRATUM_1463225 | ++----------------+-----------------+-----------------+-----------------------------+ +| Qualcomm Tech. | Kryo4xx Gold | N/A | ARM64_ERRATUM_1418040 | ++----------------+-----------------+-----------------+-----------------------------+ +| Qualcomm Tech. | Kryo4xx Silver | N/A | ARM64_ERRATUM_1530923 | ++----------------+-----------------+-----------------+-----------------------------+ +| Qualcomm Tech. | Kryo4xx Silver | N/A | ARM64_ERRATUM_1024718 | ++----------------+-----------------+-----------------+-----------------------------+ +| Qualcomm Tech. | Kryo4xx Gold | N/A | ARM64_ERRATUM_1286807 | ++----------------+-----------------+-----------------+-----------------------------+ ++----------------+-----------------+-----------------+-----------------------------+ +| Rockchip | RK3588 | #3588001 | ROCKCHIP_ERRATUM_3588001 | ++----------------+-----------------+-----------------+-----------------------------+ + ++----------------+-----------------+-----------------+-----------------------------+ +| Fujitsu | A64FX | E#010001 | FUJITSU_ERRATUM_010001 | ++----------------+-----------------+-----------------+-----------------------------+ + ++----------------+-----------------+-----------------+-----------------------------+ +| ASR | ASR8601 | #8601001 | N/A | ++----------------+-----------------+-----------------+-----------------------------+ diff --git a/Documentation/arch/arm64/sme.rst b/Documentation/arch/arm64/sme.rst new file mode 100644 index 000000000000..ba529a1dc606 --- /dev/null +++ b/Documentation/arch/arm64/sme.rst @@ -0,0 +1,468 @@ +=================================================== +Scalable Matrix Extension support for AArch64 Linux +=================================================== + +This document outlines briefly the interface provided to userspace by Linux in +order to support use of the ARM Scalable Matrix Extension (SME). + +This is an outline of the most important features and issues only and not +intended to be exhaustive. It should be read in conjunction with the SVE +documentation in sve.rst which provides details on the Streaming SVE mode +included in SME. + +This document does not aim to describe the SME architecture or programmer's +model. To aid understanding, a minimal description of relevant programmer's +model features for SME is included in Appendix A. + + +1. General +----------- + +* PSTATE.SM, PSTATE.ZA, the streaming mode vector length, the ZA and (when + present) ZTn register state and TPIDR2_EL0 are tracked per thread. + +* The presence of SME is reported to userspace via HWCAP2_SME in the aux vector + AT_HWCAP2 entry. Presence of this flag implies the presence of the SME + instructions and registers, and the Linux-specific system interfaces + described in this document. SME is reported in /proc/cpuinfo as "sme". + +* The presence of SME2 is reported to userspace via HWCAP2_SME2 in the + aux vector AT_HWCAP2 entry. Presence of this flag implies the presence of + the SME2 instructions and ZT0, and the Linux-specific system interfaces + described in this document. SME2 is reported in /proc/cpuinfo as "sme2". + +* Support for the execution of SME instructions in userspace can also be + detected by reading the CPU ID register ID_AA64PFR1_EL1 using an MRS + instruction, and checking that the value of the SME field is nonzero. [3] + + It does not guarantee the presence of the system interfaces described in the + following sections: software that needs to verify that those interfaces are + present must check for HWCAP2_SME instead. + +* There are a number of optional SME features, presence of these is reported + through AT_HWCAP2 through: + + HWCAP2_SME_I16I64 + HWCAP2_SME_F64F64 + HWCAP2_SME_I8I32 + HWCAP2_SME_F16F32 + HWCAP2_SME_B16F32 + HWCAP2_SME_F32F32 + HWCAP2_SME_FA64 + HWCAP2_SME2 + + This list may be extended over time as the SME architecture evolves. + + These extensions are also reported via the CPU ID register ID_AA64SMFR0_EL1, + which userspace can read using an MRS instruction. See elf_hwcaps.txt and + cpu-feature-registers.txt for details. + +* Debuggers should restrict themselves to interacting with the target via the + NT_ARM_SVE, NT_ARM_SSVE, NT_ARM_ZA and NT_ARM_ZT regsets. The recommended + way of detecting support for these regsets is to connect to a target process + first and then attempt a + + ptrace(PTRACE_GETREGSET, pid, NT_ARM_<regset>, &iov). + +* Whenever ZA register values are exchanged in memory between userspace and + the kernel, the register value is encoded in memory as a series of horizontal + vectors from 0 to VL/8-1 stored in the same endianness invariant format as is + used for SVE vectors. + +* On thread creation TPIDR2_EL0 is preserved unless CLONE_SETTLS is specified, + in which case it is set to 0. + +2. Vector lengths +------------------ + +SME defines a second vector length similar to the SVE vector length which is +controls the size of the streaming mode SVE vectors and the ZA matrix array. +The ZA matrix is square with each side having as many bytes as a streaming +mode SVE vector. + + +3. Sharing of streaming and non-streaming mode SVE state +--------------------------------------------------------- + +It is implementation defined which if any parts of the SVE state are shared +between streaming and non-streaming modes. When switching between modes +via software interfaces such as ptrace if no register content is provided as +part of switching no state will be assumed to be shared and everything will +be zeroed. + + +4. System call behaviour +------------------------- + +* On syscall PSTATE.ZA is preserved, if PSTATE.ZA==1 then the contents of the + ZA matrix and ZTn (if present) are preserved. + +* On syscall PSTATE.SM will be cleared and the SVE registers will be handled + as per the standard SVE ABI. + +* None of the SVE registers, ZA or ZTn are used to pass arguments to + or receive results from any syscall. + +* On process creation (eg, clone()) the newly created process will have + PSTATE.SM cleared. + +* All other SME state of a thread, including the currently configured vector + length, the state of the PR_SME_VL_INHERIT flag, and the deferred vector + length (if any), is preserved across all syscalls, subject to the specific + exceptions for execve() described in section 6. + + +5. Signal handling +------------------- + +* Signal handlers are invoked with streaming mode and ZA disabled. + +* A new signal frame record TPIDR2_MAGIC is added formatted as a struct + tpidr2_context to allow access to TPIDR2_EL0 from signal handlers. + +* A new signal frame record za_context encodes the ZA register contents on + signal delivery. [1] + +* The signal frame record for ZA always contains basic metadata, in particular + the thread's vector length (in za_context.vl). + +* The ZA matrix may or may not be included in the record, depending on + the value of PSTATE.ZA. The registers are present if and only if: + za_context.head.size >= ZA_SIG_CONTEXT_SIZE(sve_vq_from_vl(za_context.vl)) + in which case PSTATE.ZA == 1. + +* If matrix data is present, the remainder of the record has a vl-dependent + size and layout. Macros ZA_SIG_* are defined [1] to facilitate access to + them. + +* The matrix is stored as a series of horizontal vectors in the same format as + is used for SVE vectors. + +* If the ZA context is too big to fit in sigcontext.__reserved[], then extra + space is allocated on the stack, an extra_context record is written in + __reserved[] referencing this space. za_context is then written in the + extra space. Refer to [1] for further details about this mechanism. + +* If ZTn is supported and PSTATE.ZA==1 then a signal frame record for ZTn will + be generated. + +* The signal record for ZTn has magic ZT_MAGIC (0x5a544e01) and consists of a + standard signal frame header followed by a struct zt_context specifying + the number of ZTn registers supported by the system, then zt_context.nregs + blocks of 64 bytes of data per register. + + +5. Signal return +----------------- + +When returning from a signal handler: + +* If there is no za_context record in the signal frame, or if the record is + present but contains no register data as described in the previous section, + then ZA is disabled. + +* If za_context is present in the signal frame and contains matrix data then + PSTATE.ZA is set to 1 and ZA is populated with the specified data. + +* The vector length cannot be changed via signal return. If za_context.vl in + the signal frame does not match the current vector length, the signal return + attempt is treated as illegal, resulting in a forced SIGSEGV. + +* If ZTn is not supported or PSTATE.ZA==0 then it is illegal to have a + signal frame record for ZTn, resulting in a forced SIGSEGV. + + +6. prctl extensions +-------------------- + +Some new prctl() calls are added to allow programs to manage the SME vector +length: + +prctl(PR_SME_SET_VL, unsigned long arg) + + Sets the vector length of the calling thread and related flags, where + arg == vl | flags. Other threads of the calling process are unaffected. + + vl is the desired vector length, where sve_vl_valid(vl) must be true. + + flags: + + PR_SME_VL_INHERIT + + Inherit the current vector length across execve(). Otherwise, the + vector length is reset to the system default at execve(). (See + Section 9.) + + PR_SME_SET_VL_ONEXEC + + Defer the requested vector length change until the next execve() + performed by this thread. + + The effect is equivalent to implicit execution of the following + call immediately after the next execve() (if any) by the thread: + + prctl(PR_SME_SET_VL, arg & ~PR_SME_SET_VL_ONEXEC) + + This allows launching of a new program with a different vector + length, while avoiding runtime side effects in the caller. + + Without PR_SME_SET_VL_ONEXEC, the requested change takes effect + immediately. + + + Return value: a nonnegative on success, or a negative value on error: + EINVAL: SME not supported, invalid vector length requested, or + invalid flags. + + + On success: + + * Either the calling thread's vector length or the deferred vector length + to be applied at the next execve() by the thread (dependent on whether + PR_SME_SET_VL_ONEXEC is present in arg), is set to the largest value + supported by the system that is less than or equal to vl. If vl == + SVE_VL_MAX, the value set will be the largest value supported by the + system. + + * Any previously outstanding deferred vector length change in the calling + thread is cancelled. + + * The returned value describes the resulting configuration, encoded as for + PR_SME_GET_VL. The vector length reported in this value is the new + current vector length for this thread if PR_SME_SET_VL_ONEXEC was not + present in arg; otherwise, the reported vector length is the deferred + vector length that will be applied at the next execve() by the calling + thread. + + * Changing the vector length causes all of ZA, ZTn, P0..P15, FFR and all + bits of Z0..Z31 except for Z0 bits [127:0] .. Z31 bits [127:0] to become + unspecified, including both streaming and non-streaming SVE state. + Calling PR_SME_SET_VL with vl equal to the thread's current vector + length, or calling PR_SME_SET_VL with the PR_SVE_SET_VL_ONEXEC flag, + does not constitute a change to the vector length for this purpose. + + * Changing the vector length causes PSTATE.ZA and PSTATE.SM to be cleared. + Calling PR_SME_SET_VL with vl equal to the thread's current vector + length, or calling PR_SME_SET_VL with the PR_SVE_SET_VL_ONEXEC flag, + does not constitute a change to the vector length for this purpose. + + +prctl(PR_SME_GET_VL) + + Gets the vector length of the calling thread. + + The following flag may be OR-ed into the result: + + PR_SME_VL_INHERIT + + Vector length will be inherited across execve(). + + There is no way to determine whether there is an outstanding deferred + vector length change (which would only normally be the case between a + fork() or vfork() and the corresponding execve() in typical use). + + To extract the vector length from the result, bitwise and it with + PR_SME_VL_LEN_MASK. + + Return value: a nonnegative value on success, or a negative value on error: + EINVAL: SME not supported. + + +7. ptrace extensions +--------------------- + +* A new regset NT_ARM_SSVE is defined for access to streaming mode SVE + state via PTRACE_GETREGSET and PTRACE_SETREGSET, this is documented in + sve.rst. + +* A new regset NT_ARM_ZA is defined for ZA state for access to ZA state via + PTRACE_GETREGSET and PTRACE_SETREGSET. + + Refer to [2] for definitions. + +The regset data starts with struct user_za_header, containing: + + size + + Size of the complete regset, in bytes. + This depends on vl and possibly on other things in the future. + + If a call to PTRACE_GETREGSET requests less data than the value of + size, the caller can allocate a larger buffer and retry in order to + read the complete regset. + + max_size + + Maximum size in bytes that the regset can grow to for the target + thread. The regset won't grow bigger than this even if the target + thread changes its vector length etc. + + vl + + Target thread's current streaming vector length, in bytes. + + max_vl + + Maximum possible streaming vector length for the target thread. + + flags + + Zero or more of the following flags, which have the same + meaning and behaviour as the corresponding PR_SET_VL_* flags: + + SME_PT_VL_INHERIT + + SME_PT_VL_ONEXEC (SETREGSET only). + +* The effects of changing the vector length and/or flags are equivalent to + those documented for PR_SME_SET_VL. + + The caller must make a further GETREGSET call if it needs to know what VL is + actually set by SETREGSET, unless is it known in advance that the requested + VL is supported. + +* The size and layout of the payload depends on the header fields. The + SME_PT_ZA_*() macros are provided to facilitate access to the data. + +* In either case, for SETREGSET it is permissible to omit the payload, in which + case the vector length and flags are changed and PSTATE.ZA is set to 0 + (along with any consequences of those changes). If a payload is provided + then PSTATE.ZA will be set to 1. + +* For SETREGSET, if the requested VL is not supported, the effect will be the + same as if the payload were omitted, except that an EIO error is reported. + No attempt is made to translate the payload data to the correct layout + for the vector length actually set. It is up to the caller to translate the + payload layout for the actual VL and retry. + +* The effect of writing a partial, incomplete payload is unspecified. + +* A new regset NT_ARM_ZT is defined for access to ZTn state via + PTRACE_GETREGSET and PTRACE_SETREGSET. + +* The NT_ARM_ZT regset consists of a single 512 bit register. + +* When PSTATE.ZA==0 reads of NT_ARM_ZT will report all bits of ZTn as 0. + +* Writes to NT_ARM_ZT will set PSTATE.ZA to 1. + + +8. ELF coredump extensions +--------------------------- + +* NT_ARM_SSVE notes will be added to each coredump for + each thread of the dumped process. The contents will be equivalent to the + data that would have been read if a PTRACE_GETREGSET of the corresponding + type were executed for each thread when the coredump was generated. + +* A NT_ARM_ZA note will be added to each coredump for each thread of the + dumped process. The contents will be equivalent to the data that would have + been read if a PTRACE_GETREGSET of NT_ARM_ZA were executed for each thread + when the coredump was generated. + +* A NT_ARM_ZT note will be added to each coredump for each thread of the + dumped process. The contents will be equivalent to the data that would have + been read if a PTRACE_GETREGSET of NT_ARM_ZT were executed for each thread + when the coredump was generated. + +* The NT_ARM_TLS note will be extended to two registers, the second register + will contain TPIDR2_EL0 on systems that support SME and will be read as + zero with writes ignored otherwise. + +9. System runtime configuration +-------------------------------- + +* To mitigate the ABI impact of expansion of the signal frame, a policy + mechanism is provided for administrators, distro maintainers and developers + to set the default vector length for userspace processes: + +/proc/sys/abi/sme_default_vector_length + + Writing the text representation of an integer to this file sets the system + default vector length to the specified value, unless the value is greater + than the maximum vector length supported by the system in which case the + default vector length is set to that maximum. + + The result can be determined by reopening the file and reading its + contents. + + At boot, the default vector length is initially set to 32 or the maximum + supported vector length, whichever is smaller and supported. This + determines the initial vector length of the init process (PID 1). + + Reading this file returns the current system default vector length. + +* At every execve() call, the new vector length of the new process is set to + the system default vector length, unless + + * PR_SME_VL_INHERIT (or equivalently SME_PT_VL_INHERIT) is set for the + calling thread, or + + * a deferred vector length change is pending, established via the + PR_SME_SET_VL_ONEXEC flag (or SME_PT_VL_ONEXEC). + +* Modifying the system default vector length does not affect the vector length + of any existing process or thread that does not make an execve() call. + + +Appendix A. SME programmer's model (informative) +================================================= + +This section provides a minimal description of the additions made by SME to the +ARMv8-A programmer's model that are relevant to this document. + +Note: This section is for information only and not intended to be complete or +to replace any architectural specification. + +A.1. Registers +--------------- + +In A64 state, SME adds the following: + +* A new mode, streaming mode, in which a subset of the normal FPSIMD and SVE + features are available. When supported EL0 software may enter and leave + streaming mode at any time. + + For best system performance it is strongly encouraged for software to enable + streaming mode only when it is actively being used. + +* A new vector length controlling the size of ZA and the Z registers when in + streaming mode, separately to the vector length used for SVE when not in + streaming mode. There is no requirement that either the currently selected + vector length or the set of vector lengths supported for the two modes in + a given system have any relationship. The streaming mode vector length + is referred to as SVL. + +* A new ZA matrix register. This is a square matrix of SVLxSVL bits. Most + operations on ZA require that streaming mode be enabled but ZA can be + enabled without streaming mode in order to load, save and retain data. + + For best system performance it is strongly encouraged for software to enable + ZA only when it is actively being used. + +* A new ZT0 register is introduced when SME2 is present. This is a 512 bit + register which is accessible when PSTATE.ZA is set, as ZA itself is. + +* Two new 1 bit fields in PSTATE which may be controlled via the SMSTART and + SMSTOP instructions or by access to the SVCR system register: + + * PSTATE.ZA, if this is 1 then the ZA matrix is accessible and has valid + data while if it is 0 then ZA can not be accessed. When PSTATE.ZA is + changed from 0 to 1 all bits in ZA are cleared. + + * PSTATE.SM, if this is 1 then the PE is in streaming mode. When the value + of PSTATE.SM is changed then it is implementation defined if the subset + of the floating point register bits valid in both modes may be retained. + Any other bits will be cleared. + + +References +========== + +[1] arch/arm64/include/uapi/asm/sigcontext.h + AArch64 Linux signal ABI definitions + +[2] arch/arm64/include/uapi/asm/ptrace.h + AArch64 Linux ptrace ABI definitions + +[3] Documentation/arch/arm64/cpu-feature-registers.rst diff --git a/Documentation/arch/arm64/sve.rst b/Documentation/arch/arm64/sve.rst new file mode 100644 index 000000000000..0d9a426e9f85 --- /dev/null +++ b/Documentation/arch/arm64/sve.rst @@ -0,0 +1,616 @@ +=================================================== +Scalable Vector Extension support for AArch64 Linux +=================================================== + +Author: Dave Martin <Dave.Martin@arm.com> + +Date: 4 August 2017 + +This document outlines briefly the interface provided to userspace by Linux in +order to support use of the ARM Scalable Vector Extension (SVE), including +interactions with Streaming SVE mode added by the Scalable Matrix Extension +(SME). + +This is an outline of the most important features and issues only and not +intended to be exhaustive. + +This document does not aim to describe the SVE architecture or programmer's +model. To aid understanding, a minimal description of relevant programmer's +model features for SVE is included in Appendix A. + + +1. General +----------- + +* SVE registers Z0..Z31, P0..P15 and FFR and the current vector length VL, are + tracked per-thread. + +* In streaming mode FFR is not accessible unless HWCAP2_SME_FA64 is present + in the system, when it is not supported and these interfaces are used to + access streaming mode FFR is read and written as zero. + +* The presence of SVE is reported to userspace via HWCAP_SVE in the aux vector + AT_HWCAP entry. Presence of this flag implies the presence of the SVE + instructions and registers, and the Linux-specific system interfaces + described in this document. SVE is reported in /proc/cpuinfo as "sve". + +* Support for the execution of SVE instructions in userspace can also be + detected by reading the CPU ID register ID_AA64PFR0_EL1 using an MRS + instruction, and checking that the value of the SVE field is nonzero. [3] + + It does not guarantee the presence of the system interfaces described in the + following sections: software that needs to verify that those interfaces are + present must check for HWCAP_SVE instead. + +* On hardware that supports the SVE2 extensions, HWCAP2_SVE2 will also + be reported in the AT_HWCAP2 aux vector entry. In addition to this, + optional extensions to SVE2 may be reported by the presence of: + + HWCAP2_SVE2 + HWCAP2_SVEAES + HWCAP2_SVEPMULL + HWCAP2_SVEBITPERM + HWCAP2_SVESHA3 + HWCAP2_SVESM4 + HWCAP2_SVE2P1 + + This list may be extended over time as the SVE architecture evolves. + + These extensions are also reported via the CPU ID register ID_AA64ZFR0_EL1, + which userspace can read using an MRS instruction. See elf_hwcaps.txt and + cpu-feature-registers.txt for details. + +* On hardware that supports the SME extensions, HWCAP2_SME will also be + reported in the AT_HWCAP2 aux vector entry. Among other things SME adds + streaming mode which provides a subset of the SVE feature set using a + separate SME vector length and the same Z/V registers. See sme.rst + for more details. + +* Debuggers should restrict themselves to interacting with the target via the + NT_ARM_SVE regset. The recommended way of detecting support for this regset + is to connect to a target process first and then attempt a + ptrace(PTRACE_GETREGSET, pid, NT_ARM_SVE, &iov). Note that when SME is + present and streaming SVE mode is in use the FPSIMD subset of registers + will be read via NT_ARM_SVE and NT_ARM_SVE writes will exit streaming mode + in the target. + +* Whenever SVE scalable register values (Zn, Pn, FFR) are exchanged in memory + between userspace and the kernel, the register value is encoded in memory in + an endianness-invariant layout, with bits [(8 * i + 7) : (8 * i)] encoded at + byte offset i from the start of the memory representation. This affects for + example the signal frame (struct sve_context) and ptrace interface + (struct user_sve_header) and associated data. + + Beware that on big-endian systems this results in a different byte order than + for the FPSIMD V-registers, which are stored as single host-endian 128-bit + values, with bits [(127 - 8 * i) : (120 - 8 * i)] of the register encoded at + byte offset i. (struct fpsimd_context, struct user_fpsimd_state). + + +2. Vector length terminology +----------------------------- + +The size of an SVE vector (Z) register is referred to as the "vector length". + +To avoid confusion about the units used to express vector length, the kernel +adopts the following conventions: + +* Vector length (VL) = size of a Z-register in bytes + +* Vector quadwords (VQ) = size of a Z-register in units of 128 bits + +(So, VL = 16 * VQ.) + +The VQ convention is used where the underlying granularity is important, such +as in data structure definitions. In most other situations, the VL convention +is used. This is consistent with the meaning of the "VL" pseudo-register in +the SVE instruction set architecture. + + +3. System call behaviour +------------------------- + +* On syscall, V0..V31 are preserved (as without SVE). Thus, bits [127:0] of + Z0..Z31 are preserved. All other bits of Z0..Z31, and all of P0..P15 and FFR + become zero on return from a syscall. + +* The SVE registers are not used to pass arguments to or receive results from + any syscall. + +* In practice the affected registers/bits will be preserved or will be replaced + with zeros on return from a syscall, but userspace should not make + assumptions about this. The kernel behaviour may vary on a case-by-case + basis. + +* All other SVE state of a thread, including the currently configured vector + length, the state of the PR_SVE_VL_INHERIT flag, and the deferred vector + length (if any), is preserved across all syscalls, subject to the specific + exceptions for execve() described in section 6. + + In particular, on return from a fork() or clone(), the parent and new child + process or thread share identical SVE configuration, matching that of the + parent before the call. + + +4. Signal handling +------------------- + +* A new signal frame record sve_context encodes the SVE registers on signal + delivery. [1] + +* This record is supplementary to fpsimd_context. The FPSR and FPCR registers + are only present in fpsimd_context. For convenience, the content of V0..V31 + is duplicated between sve_context and fpsimd_context. + +* The record contains a flag field which includes a flag SVE_SIG_FLAG_SM which + if set indicates that the thread is in streaming mode and the vector length + and register data (if present) describe the streaming SVE data and vector + length. + +* The signal frame record for SVE always contains basic metadata, in particular + the thread's vector length (in sve_context.vl). + +* The SVE registers may or may not be included in the record, depending on + whether the registers are live for the thread. The registers are present if + and only if: + sve_context.head.size >= SVE_SIG_CONTEXT_SIZE(sve_vq_from_vl(sve_context.vl)). + +* If the registers are present, the remainder of the record has a vl-dependent + size and layout. Macros SVE_SIG_* are defined [1] to facilitate access to + the members. + +* Each scalable register (Zn, Pn, FFR) is stored in an endianness-invariant + layout, with bits [(8 * i + 7) : (8 * i)] stored at byte offset i from the + start of the register's representation in memory. + +* If the SVE context is too big to fit in sigcontext.__reserved[], then extra + space is allocated on the stack, an extra_context record is written in + __reserved[] referencing this space. sve_context is then written in the + extra space. Refer to [1] for further details about this mechanism. + + +5. Signal return +----------------- + +When returning from a signal handler: + +* If there is no sve_context record in the signal frame, or if the record is + present but contains no register data as described in the previous section, + then the SVE registers/bits become non-live and take unspecified values. + +* If sve_context is present in the signal frame and contains full register + data, the SVE registers become live and are populated with the specified + data. However, for backward compatibility reasons, bits [127:0] of Z0..Z31 + are always restored from the corresponding members of fpsimd_context.vregs[] + and not from sve_context. The remaining bits are restored from sve_context. + +* Inclusion of fpsimd_context in the signal frame remains mandatory, + irrespective of whether sve_context is present or not. + +* The vector length cannot be changed via signal return. If sve_context.vl in + the signal frame does not match the current vector length, the signal return + attempt is treated as illegal, resulting in a forced SIGSEGV. + +* It is permitted to enter or leave streaming mode by setting or clearing + the SVE_SIG_FLAG_SM flag but applications should take care to ensure that + when doing so sve_context.vl and any register data are appropriate for the + vector length in the new mode. + + +6. prctl extensions +-------------------- + +Some new prctl() calls are added to allow programs to manage the SVE vector +length: + +prctl(PR_SVE_SET_VL, unsigned long arg) + + Sets the vector length of the calling thread and related flags, where + arg == vl | flags. Other threads of the calling process are unaffected. + + vl is the desired vector length, where sve_vl_valid(vl) must be true. + + flags: + + PR_SVE_VL_INHERIT + + Inherit the current vector length across execve(). Otherwise, the + vector length is reset to the system default at execve(). (See + Section 9.) + + PR_SVE_SET_VL_ONEXEC + + Defer the requested vector length change until the next execve() + performed by this thread. + + The effect is equivalent to implicit execution of the following + call immediately after the next execve() (if any) by the thread: + + prctl(PR_SVE_SET_VL, arg & ~PR_SVE_SET_VL_ONEXEC) + + This allows launching of a new program with a different vector + length, while avoiding runtime side effects in the caller. + + + Without PR_SVE_SET_VL_ONEXEC, the requested change takes effect + immediately. + + + Return value: a nonnegative on success, or a negative value on error: + EINVAL: SVE not supported, invalid vector length requested, or + invalid flags. + + + On success: + + * Either the calling thread's vector length or the deferred vector length + to be applied at the next execve() by the thread (dependent on whether + PR_SVE_SET_VL_ONEXEC is present in arg), is set to the largest value + supported by the system that is less than or equal to vl. If vl == + SVE_VL_MAX, the value set will be the largest value supported by the + system. + + * Any previously outstanding deferred vector length change in the calling + thread is cancelled. + + * The returned value describes the resulting configuration, encoded as for + PR_SVE_GET_VL. The vector length reported in this value is the new + current vector length for this thread if PR_SVE_SET_VL_ONEXEC was not + present in arg; otherwise, the reported vector length is the deferred + vector length that will be applied at the next execve() by the calling + thread. + + * Changing the vector length causes all of P0..P15, FFR and all bits of + Z0..Z31 except for Z0 bits [127:0] .. Z31 bits [127:0] to become + unspecified. Calling PR_SVE_SET_VL with vl equal to the thread's current + vector length, or calling PR_SVE_SET_VL with the PR_SVE_SET_VL_ONEXEC + flag, does not constitute a change to the vector length for this purpose. + + +prctl(PR_SVE_GET_VL) + + Gets the vector length of the calling thread. + + The following flag may be OR-ed into the result: + + PR_SVE_VL_INHERIT + + Vector length will be inherited across execve(). + + There is no way to determine whether there is an outstanding deferred + vector length change (which would only normally be the case between a + fork() or vfork() and the corresponding execve() in typical use). + + To extract the vector length from the result, bitwise and it with + PR_SVE_VL_LEN_MASK. + + Return value: a nonnegative value on success, or a negative value on error: + EINVAL: SVE not supported. + + +7. ptrace extensions +--------------------- + +* New regsets NT_ARM_SVE and NT_ARM_SSVE are defined for use with + PTRACE_GETREGSET and PTRACE_SETREGSET. NT_ARM_SSVE describes the + streaming mode SVE registers and NT_ARM_SVE describes the + non-streaming mode SVE registers. + + In this description a register set is referred to as being "live" when + the target is in the appropriate streaming or non-streaming mode and is + using data beyond the subset shared with the FPSIMD Vn registers. + + Refer to [2] for definitions. + +The regset data starts with struct user_sve_header, containing: + + size + + Size of the complete regset, in bytes. + This depends on vl and possibly on other things in the future. + + If a call to PTRACE_GETREGSET requests less data than the value of + size, the caller can allocate a larger buffer and retry in order to + read the complete regset. + + max_size + + Maximum size in bytes that the regset can grow to for the target + thread. The regset won't grow bigger than this even if the target + thread changes its vector length etc. + + vl + + Target thread's current vector length, in bytes. + + max_vl + + Maximum possible vector length for the target thread. + + flags + + at most one of + + SVE_PT_REGS_FPSIMD + + SVE registers are not live (GETREGSET) or are to be made + non-live (SETREGSET). + + The payload is of type struct user_fpsimd_state, with the same + meaning as for NT_PRFPREG, starting at offset + SVE_PT_FPSIMD_OFFSET from the start of user_sve_header. + + Extra data might be appended in the future: the size of the + payload should be obtained using SVE_PT_FPSIMD_SIZE(vq, flags). + + vq should be obtained using sve_vq_from_vl(vl). + + or + + SVE_PT_REGS_SVE + + SVE registers are live (GETREGSET) or are to be made live + (SETREGSET). + + The payload contains the SVE register data, starting at offset + SVE_PT_SVE_OFFSET from the start of user_sve_header, and with + size SVE_PT_SVE_SIZE(vq, flags); + + ... OR-ed with zero or more of the following flags, which have the same + meaning and behaviour as the corresponding PR_SET_VL_* flags: + + SVE_PT_VL_INHERIT + + SVE_PT_VL_ONEXEC (SETREGSET only). + + If neither FPSIMD nor SVE flags are provided then no register + payload is available, this is only possible when SME is implemented. + + +* The effects of changing the vector length and/or flags are equivalent to + those documented for PR_SVE_SET_VL. + + The caller must make a further GETREGSET call if it needs to know what VL is + actually set by SETREGSET, unless is it known in advance that the requested + VL is supported. + +* In the SVE_PT_REGS_SVE case, the size and layout of the payload depends on + the header fields. The SVE_PT_SVE_*() macros are provided to facilitate + access to the members. + +* In either case, for SETREGSET it is permissible to omit the payload, in which + case only the vector length and flags are changed (along with any + consequences of those changes). + +* In systems supporting SME when in streaming mode a GETREGSET for + NT_REG_SVE will return only the user_sve_header with no register data, + similarly a GETREGSET for NT_REG_SSVE will not return any register data + when not in streaming mode. + +* A GETREGSET for NT_ARM_SSVE will never return SVE_PT_REGS_FPSIMD. + +* For SETREGSET, if an SVE_PT_REGS_SVE payload is present and the + requested VL is not supported, the effect will be the same as if the + payload were omitted, except that an EIO error is reported. No + attempt is made to translate the payload data to the correct layout + for the vector length actually set. The thread's FPSIMD state is + preserved, but the remaining bits of the SVE registers become + unspecified. It is up to the caller to translate the payload layout + for the actual VL and retry. + +* Where SME is implemented it is not possible to GETREGSET the register + state for normal SVE when in streaming mode, nor the streaming mode + register state when in normal mode, regardless of the implementation defined + behaviour of the hardware for sharing data between the two modes. + +* Any SETREGSET of NT_ARM_SVE will exit streaming mode if the target was in + streaming mode and any SETREGSET of NT_ARM_SSVE will enter streaming mode + if the target was not in streaming mode. + +* The effect of writing a partial, incomplete payload is unspecified. + + +8. ELF coredump extensions +--------------------------- + +* NT_ARM_SVE and NT_ARM_SSVE notes will be added to each coredump for + each thread of the dumped process. The contents will be equivalent to the + data that would have been read if a PTRACE_GETREGSET of the corresponding + type were executed for each thread when the coredump was generated. + +9. System runtime configuration +-------------------------------- + +* To mitigate the ABI impact of expansion of the signal frame, a policy + mechanism is provided for administrators, distro maintainers and developers + to set the default vector length for userspace processes: + +/proc/sys/abi/sve_default_vector_length + + Writing the text representation of an integer to this file sets the system + default vector length to the specified value, unless the value is greater + than the maximum vector length supported by the system in which case the + default vector length is set to that maximum. + + The result can be determined by reopening the file and reading its + contents. + + At boot, the default vector length is initially set to 64 or the maximum + supported vector length, whichever is smaller. This determines the initial + vector length of the init process (PID 1). + + Reading this file returns the current system default vector length. + +* At every execve() call, the new vector length of the new process is set to + the system default vector length, unless + + * PR_SVE_VL_INHERIT (or equivalently SVE_PT_VL_INHERIT) is set for the + calling thread, or + + * a deferred vector length change is pending, established via the + PR_SVE_SET_VL_ONEXEC flag (or SVE_PT_VL_ONEXEC). + +* Modifying the system default vector length does not affect the vector length + of any existing process or thread that does not make an execve() call. + +10. Perf extensions +-------------------------------- + +* The arm64 specific DWARF standard [5] added the VG (Vector Granule) register + at index 46. This register is used for DWARF unwinding when variable length + SVE registers are pushed onto the stack. + +* Its value is equivalent to the current SVE vector length (VL) in bits divided + by 64. + +* The value is included in Perf samples in the regs[46] field if + PERF_SAMPLE_REGS_USER is set and the sample_regs_user mask has bit 46 set. + +* The value is the current value at the time the sample was taken, and it can + change over time. + +* If the system doesn't support SVE when perf_event_open is called with these + settings, the event will fail to open. + +Appendix A. SVE programmer's model (informative) +================================================= + +This section provides a minimal description of the additions made by SVE to the +ARMv8-A programmer's model that are relevant to this document. + +Note: This section is for information only and not intended to be complete or +to replace any architectural specification. + +A.1. Registers +--------------- + +In A64 state, SVE adds the following: + +* 32 8VL-bit vector registers Z0..Z31 + For each Zn, Zn bits [127:0] alias the ARMv8-A vector register Vn. + + A register write using a Vn register name zeros all bits of the corresponding + Zn except for bits [127:0]. + +* 16 VL-bit predicate registers P0..P15 + +* 1 VL-bit special-purpose predicate register FFR (the "first-fault register") + +* a VL "pseudo-register" that determines the size of each vector register + + The SVE instruction set architecture provides no way to write VL directly. + Instead, it can be modified only by EL1 and above, by writing appropriate + system registers. + +* The value of VL can be configured at runtime by EL1 and above: + 16 <= VL <= VLmax, where VL must be a multiple of 16. + +* The maximum vector length is determined by the hardware: + 16 <= VLmax <= 256. + + (The SVE architecture specifies 256, but permits future architecture + revisions to raise this limit.) + +* FPSR and FPCR are retained from ARMv8-A, and interact with SVE floating-point + operations in a similar way to the way in which they interact with ARMv8 + floating-point operations:: + + 8VL-1 128 0 bit index + +---- //// -----------------+ + Z0 | : V0 | + : : + Z7 | : V7 | + Z8 | : * V8 | + : : : + Z15 | : *V15 | + Z16 | : V16 | + : : + Z31 | : V31 | + +---- //// -----------------+ + 31 0 + VL-1 0 +-------+ + +---- //// --+ FPSR | | + P0 | | +-------+ + : | | *FPCR | | + P15 | | +-------+ + +---- //// --+ + FFR | | +-----+ + +---- //// --+ VL | | + +-----+ + +(*) callee-save: + This only applies to bits [63:0] of Z-/V-registers. + FPCR contains callee-save and caller-save bits. See [4] for details. + + +A.2. Procedure call standard +----------------------------- + +The ARMv8-A base procedure call standard is extended as follows with respect to +the additional SVE register state: + +* All SVE register bits that are not shared with FP/SIMD are caller-save. + +* Z8 bits [63:0] .. Z15 bits [63:0] are callee-save. + + This follows from the way these bits are mapped to V8..V15, which are caller- + save in the base procedure call standard. + + +Appendix B. ARMv8-A FP/SIMD programmer's model +=============================================== + +Note: This section is for information only and not intended to be complete or +to replace any architectural specification. + +Refer to [4] for more information. + +ARMv8-A defines the following floating-point / SIMD register state: + +* 32 128-bit vector registers V0..V31 +* 2 32-bit status/control registers FPSR, FPCR + +:: + + 127 0 bit index + +---------------+ + V0 | | + : : : + V7 | | + * V8 | | + : : : : + *V15 | | + V16 | | + : : : + V31 | | + +---------------+ + + 31 0 + +-------+ + FPSR | | + +-------+ + *FPCR | | + +-------+ + +(*) callee-save: + This only applies to bits [63:0] of V-registers. + FPCR contains a mixture of callee-save and caller-save bits. + + +References +========== + +[1] arch/arm64/include/uapi/asm/sigcontext.h + AArch64 Linux signal ABI definitions + +[2] arch/arm64/include/uapi/asm/ptrace.h + AArch64 Linux ptrace ABI definitions + +[3] Documentation/arch/arm64/cpu-feature-registers.rst + +[4] ARM IHI0055C + http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055c/IHI0055C_beta_aapcs64.pdf + http://infocenter.arm.com/help/topic/com.arm.doc.subset.swdev.abi/index.html + Procedure Call Standard for the ARM 64-bit Architecture (AArch64) + +[5] https://github.com/ARM-software/abi-aa/blob/main/aadwarf64/aadwarf64.rst diff --git a/Documentation/arch/arm64/tagged-address-abi.rst b/Documentation/arch/arm64/tagged-address-abi.rst new file mode 100644 index 000000000000..fe24a3f158c5 --- /dev/null +++ b/Documentation/arch/arm64/tagged-address-abi.rst @@ -0,0 +1,179 @@ +========================== +AArch64 TAGGED ADDRESS ABI +========================== + +Authors: Vincenzo Frascino <vincenzo.frascino@arm.com> + Catalin Marinas <catalin.marinas@arm.com> + +Date: 21 August 2019 + +This document describes the usage and semantics of the Tagged Address +ABI on AArch64 Linux. + +1. Introduction +--------------- + +On AArch64 the ``TCR_EL1.TBI0`` bit is set by default, allowing +userspace (EL0) to perform memory accesses through 64-bit pointers with +a non-zero top byte. This document describes the relaxation of the +syscall ABI that allows userspace to pass certain tagged pointers to +kernel syscalls. + +2. AArch64 Tagged Address ABI +----------------------------- + +From the kernel syscall interface perspective and for the purposes of +this document, a "valid tagged pointer" is a pointer with a potentially +non-zero top-byte that references an address in the user process address +space obtained in one of the following ways: + +- ``mmap()`` syscall where either: + + - flags have the ``MAP_ANONYMOUS`` bit set or + - the file descriptor refers to a regular file (including those + returned by ``memfd_create()``) or ``/dev/zero`` + +- ``brk()`` syscall (i.e. the heap area between the initial location of + the program break at process creation and its current location). + +- any memory mapped by the kernel in the address space of the process + during creation and with the same restrictions as for ``mmap()`` above + (e.g. data, bss, stack). + +The AArch64 Tagged Address ABI has two stages of relaxation depending on +how the user addresses are used by the kernel: + +1. User addresses not accessed by the kernel but used for address space + management (e.g. ``mprotect()``, ``madvise()``). The use of valid + tagged pointers in this context is allowed with these exceptions: + + - ``brk()``, ``mmap()`` and the ``new_address`` argument to + ``mremap()`` as these have the potential to alias with existing + user addresses. + + NOTE: This behaviour changed in v5.6 and so some earlier kernels may + incorrectly accept valid tagged pointers for the ``brk()``, + ``mmap()`` and ``mremap()`` system calls. + + - The ``range.start``, ``start`` and ``dst`` arguments to the + ``UFFDIO_*`` ``ioctl()``s used on a file descriptor obtained from + ``userfaultfd()``, as fault addresses subsequently obtained by reading + the file descriptor will be untagged, which may otherwise confuse + tag-unaware programs. + + NOTE: This behaviour changed in v5.14 and so some earlier kernels may + incorrectly accept valid tagged pointers for this system call. + +2. User addresses accessed by the kernel (e.g. ``write()``). This ABI + relaxation is disabled by default and the application thread needs to + explicitly enable it via ``prctl()`` as follows: + + - ``PR_SET_TAGGED_ADDR_CTRL``: enable or disable the AArch64 Tagged + Address ABI for the calling thread. + + The ``(unsigned int) arg2`` argument is a bit mask describing the + control mode used: + + - ``PR_TAGGED_ADDR_ENABLE``: enable AArch64 Tagged Address ABI. + Default status is disabled. + + Arguments ``arg3``, ``arg4``, and ``arg5`` must be 0. + + - ``PR_GET_TAGGED_ADDR_CTRL``: get the status of the AArch64 Tagged + Address ABI for the calling thread. + + Arguments ``arg2``, ``arg3``, ``arg4``, and ``arg5`` must be 0. + + The ABI properties described above are thread-scoped, inherited on + clone() and fork() and cleared on exec(). + + Calling ``prctl(PR_SET_TAGGED_ADDR_CTRL, PR_TAGGED_ADDR_ENABLE, 0, 0, 0)`` + returns ``-EINVAL`` if the AArch64 Tagged Address ABI is globally + disabled by ``sysctl abi.tagged_addr_disabled=1``. The default + ``sysctl abi.tagged_addr_disabled`` configuration is 0. + +When the AArch64 Tagged Address ABI is enabled for a thread, the +following behaviours are guaranteed: + +- All syscalls except the cases mentioned in section 3 can accept any + valid tagged pointer. + +- The syscall behaviour is undefined for invalid tagged pointers: it may + result in an error code being returned, a (fatal) signal being raised, + or other modes of failure. + +- The syscall behaviour for a valid tagged pointer is the same as for + the corresponding untagged pointer. + + +A definition of the meaning of tagged pointers on AArch64 can be found +in Documentation/arch/arm64/tagged-pointers.rst. + +3. AArch64 Tagged Address ABI Exceptions +----------------------------------------- + +The following system call parameters must be untagged regardless of the +ABI relaxation: + +- ``prctl()`` other than pointers to user data either passed directly or + indirectly as arguments to be accessed by the kernel. + +- ``ioctl()`` other than pointers to user data either passed directly or + indirectly as arguments to be accessed by the kernel. + +- ``shmat()`` and ``shmdt()``. + +- ``brk()`` (since kernel v5.6). + +- ``mmap()`` (since kernel v5.6). + +- ``mremap()``, the ``new_address`` argument (since kernel v5.6). + +Any attempt to use non-zero tagged pointers may result in an error code +being returned, a (fatal) signal being raised, or other modes of +failure. + +4. Example of correct usage +--------------------------- +.. code-block:: c + + #include <stdlib.h> + #include <string.h> + #include <unistd.h> + #include <sys/mman.h> + #include <sys/prctl.h> + + #define PR_SET_TAGGED_ADDR_CTRL 55 + #define PR_TAGGED_ADDR_ENABLE (1UL << 0) + + #define TAG_SHIFT 56 + + int main(void) + { + int tbi_enabled = 0; + unsigned long tag = 0; + char *ptr; + + /* check/enable the tagged address ABI */ + if (!prctl(PR_SET_TAGGED_ADDR_CTRL, PR_TAGGED_ADDR_ENABLE, 0, 0, 0)) + tbi_enabled = 1; + + /* memory allocation */ + ptr = mmap(NULL, sysconf(_SC_PAGE_SIZE), PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); + if (ptr == MAP_FAILED) + return 1; + + /* set a non-zero tag if the ABI is available */ + if (tbi_enabled) + tag = rand() & 0xff; + ptr = (char *)((unsigned long)ptr | (tag << TAG_SHIFT)); + + /* memory access to a tagged address */ + strcpy(ptr, "tagged pointer\n"); + + /* syscall with a tagged pointer */ + write(1, ptr, strlen(ptr)); + + return 0; + } diff --git a/Documentation/arch/arm64/tagged-pointers.rst b/Documentation/arch/arm64/tagged-pointers.rst new file mode 100644 index 000000000000..81b6c2a770dd --- /dev/null +++ b/Documentation/arch/arm64/tagged-pointers.rst @@ -0,0 +1,88 @@ +========================================= +Tagged virtual addresses in AArch64 Linux +========================================= + +Author: Will Deacon <will.deacon@arm.com> + +Date : 12 June 2013 + +This document briefly describes the provision of tagged virtual +addresses in the AArch64 translation system and their potential uses +in AArch64 Linux. + +The kernel configures the translation tables so that translations made +via TTBR0 (i.e. userspace mappings) have the top byte (bits 63:56) of +the virtual address ignored by the translation hardware. This frees up +this byte for application use. + + +Passing tagged addresses to the kernel +-------------------------------------- + +All interpretation of userspace memory addresses by the kernel assumes +an address tag of 0x00, unless the application enables the AArch64 +Tagged Address ABI explicitly +(Documentation/arch/arm64/tagged-address-abi.rst). + +This includes, but is not limited to, addresses found in: + + - pointer arguments to system calls, including pointers in structures + passed to system calls, + + - the stack pointer (sp), e.g. when interpreting it to deliver a + signal, + + - the frame pointer (x29) and frame records, e.g. when interpreting + them to generate a backtrace or call graph. + +Using non-zero address tags in any of these locations when the +userspace application did not enable the AArch64 Tagged Address ABI may +result in an error code being returned, a (fatal) signal being raised, +or other modes of failure. + +For these reasons, when the AArch64 Tagged Address ABI is disabled, +passing non-zero address tags to the kernel via system calls is +forbidden, and using a non-zero address tag for sp is strongly +discouraged. + +Programs maintaining a frame pointer and frame records that use non-zero +address tags may suffer impaired or inaccurate debug and profiling +visibility. + + +Preserving tags +--------------- + +When delivering signals, non-zero tags are not preserved in +siginfo.si_addr unless the flag SA_EXPOSE_TAGBITS was set in +sigaction.sa_flags when the signal handler was installed. This means +that signal handlers in applications making use of tags cannot rely +on the tag information for user virtual addresses being maintained +in these fields unless the flag was set. + +Due to architecture limitations, bits 63:60 of the fault address +are not preserved in response to synchronous tag check faults +(SEGV_MTESERR) even if SA_EXPOSE_TAGBITS was set. Applications should +treat the values of these bits as undefined in order to accommodate +future architecture revisions which may preserve the bits. + +For signals raised in response to watchpoint debug exceptions, the +tag information will be preserved regardless of the SA_EXPOSE_TAGBITS +flag setting. + +Non-zero tags are never preserved in sigcontext.fault_address +regardless of the SA_EXPOSE_TAGBITS flag setting. + +The architecture prevents the use of a tagged PC, so the upper byte will +be set to a sign-extension of bit 55 on exception return. + +This behaviour is maintained when the AArch64 Tagged Address ABI is +enabled. + + +Other considerations +-------------------- + +Special care should be taken when using tagged pointers, since it is +likely that C compilers will not hazard two virtual addresses differing +only in the upper byte. diff --git a/Documentation/arch/index.rst b/Documentation/arch/index.rst index 80ee31016584..8458b88e9b79 100644 --- a/Documentation/arch/index.rst +++ b/Documentation/arch/index.rst @@ -10,8 +10,8 @@ implementation. :maxdepth: 2 arc/index - ../arm/index - ../arm64/index + arm/index + arm64/index ia64/index ../loongarch/index m68k/index diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst index 387ccbcb558f..cb05d90111b4 100644 --- a/Documentation/arch/x86/resctrl.rst +++ b/Documentation/arch/x86/resctrl.rst @@ -287,6 +287,13 @@ Removing a directory will move all tasks and cpus owned by the group it represents to the parent. Removing one of the created CTRL_MON groups will automatically remove all MON groups below it. +Moving MON group directories to a new parent CTRL_MON group is supported +for the purpose of changing the resource allocations of a MON group +without impacting its monitoring data or assigned tasks. This operation +is not allowed for MON groups which monitor CPUs. No other move +operation is currently allowed other than simply renaming a CTRL_MON or +MON group. + All groups contain the following files: "tasks": |