diff options
Diffstat (limited to 'Documentation/driver-api')
-rw-r--r-- | Documentation/driver-api/dmaengine/client.rst | 87 | ||||
-rw-r--r-- | Documentation/driver-api/dmaengine/provider.rst | 48 | ||||
-rw-r--r-- | Documentation/driver-api/driver-model/devres.rst | 3 | ||||
-rw-r--r-- | Documentation/driver-api/gpio/driver.rst | 5 | ||||
-rw-r--r-- | Documentation/driver-api/gpio/drivers-on-gpio.rst | 8 | ||||
-rw-r--r-- | Documentation/driver-api/gpio/index.rst | 1 | ||||
-rw-r--r-- | Documentation/driver-api/gpio/using-gpio.rst | 50 | ||||
-rw-r--r-- | Documentation/driver-api/interconnect.rst | 22 | ||||
-rw-r--r-- | Documentation/driver-api/thermal/cpu-idle-cooling.rst | 189 | ||||
-rw-r--r-- | Documentation/driver-api/thermal/exynos_thermal.rst | 8 |
10 files changed, 407 insertions, 14 deletions
diff --git a/Documentation/driver-api/dmaengine/client.rst b/Documentation/driver-api/dmaengine/client.rst index 45953f171500..a9a7a3c84c63 100644 --- a/Documentation/driver-api/dmaengine/client.rst +++ b/Documentation/driver-api/dmaengine/client.rst @@ -151,6 +151,93 @@ The details of these operations are: Note that callbacks will always be invoked from the DMA engines tasklet, never from interrupt context. + Optional: per descriptor metadata + --------------------------------- + DMAengine provides two ways for metadata support. + + DESC_METADATA_CLIENT + + The metadata buffer is allocated/provided by the client driver and it is + attached to the descriptor. + + .. code-block:: c + + int dmaengine_desc_attach_metadata(struct dma_async_tx_descriptor *desc, + void *data, size_t len); + + DESC_METADATA_ENGINE + + The metadata buffer is allocated/managed by the DMA driver. The client + driver can ask for the pointer, maximum size and the currently used size of + the metadata and can directly update or read it. + + Becasue the DMA driver manages the memory area containing the metadata, + clients must make sure that they do not try to access or get the pointer + after their transfer completion callback has run for the descriptor. + If no completion callback has been defined for the transfer, then the + metadata must not be accessed after issue_pending. + In other words: if the aim is to read back metadata after the transfer is + completed, then the client must use completion callback. + + .. code-block:: c + + void *dmaengine_desc_get_metadata_ptr(struct dma_async_tx_descriptor *desc, + size_t *payload_len, size_t *max_len); + + int dmaengine_desc_set_metadata_len(struct dma_async_tx_descriptor *desc, + size_t payload_len); + + Client drivers can query if a given mode is supported with: + + .. code-block:: c + + bool dmaengine_is_metadata_mode_supported(struct dma_chan *chan, + enum dma_desc_metadata_mode mode); + + Depending on the used mode client drivers must follow different flow. + + DESC_METADATA_CLIENT + + - DMA_MEM_TO_DEV / DEV_MEM_TO_MEM: + 1. prepare the descriptor (dmaengine_prep_*) + construct the metadata in the client's buffer + 2. use dmaengine_desc_attach_metadata() to attach the buffer to the + descriptor + 3. submit the transfer + - DMA_DEV_TO_MEM: + 1. prepare the descriptor (dmaengine_prep_*) + 2. use dmaengine_desc_attach_metadata() to attach the buffer to the + descriptor + 3. submit the transfer + 4. when the transfer is completed, the metadata should be available in the + attached buffer + + DESC_METADATA_ENGINE + + - DMA_MEM_TO_DEV / DEV_MEM_TO_MEM: + 1. prepare the descriptor (dmaengine_prep_*) + 2. use dmaengine_desc_get_metadata_ptr() to get the pointer to the + engine's metadata area + 3. update the metadata at the pointer + 4. use dmaengine_desc_set_metadata_len() to tell the DMA engine the + amount of data the client has placed into the metadata buffer + 5. submit the transfer + - DMA_DEV_TO_MEM: + 1. prepare the descriptor (dmaengine_prep_*) + 2. submit the transfer + 3. on transfer completion, use dmaengine_desc_get_metadata_ptr() to get + the pointer to the engine's metadata area + 4. read out the metadata from the pointer + + .. note:: + + When DESC_METADATA_ENGINE mode is used the metadata area for the descriptor + is no longer valid after the transfer has been completed (valid up to the + point when the completion callback returns if used). + + Mixed use of DESC_METADATA_CLIENT / DESC_METADATA_ENGINE is not allowed, + client drivers must use either of the modes per descriptor. + 4. Submit the transaction Once the descriptor has been prepared and the callback information diff --git a/Documentation/driver-api/dmaengine/provider.rst b/Documentation/driver-api/dmaengine/provider.rst index dfc4486b5743..790a15089f1f 100644 --- a/Documentation/driver-api/dmaengine/provider.rst +++ b/Documentation/driver-api/dmaengine/provider.rst @@ -247,6 +247,54 @@ after each transfer. In case of a ring buffer, they may loop (DMA_CYCLIC). Addresses pointing to a device's register (e.g. a FIFO) are typically fixed. +Per descriptor metadata support +------------------------------- +Some data movement architecture (DMA controller and peripherals) uses metadata +associated with a transaction. The DMA controller role is to transfer the +payload and the metadata alongside. +The metadata itself is not used by the DMA engine itself, but it contains +parameters, keys, vectors, etc for peripheral or from the peripheral. + +The DMAengine framework provides a generic ways to facilitate the metadata for +descriptors. Depending on the architecture the DMA driver can implement either +or both of the methods and it is up to the client driver to choose which one +to use. + +- DESC_METADATA_CLIENT + + The metadata buffer is allocated/provided by the client driver and it is + attached (via the dmaengine_desc_attach_metadata() helper to the descriptor. + + From the DMA driver the following is expected for this mode: + - DMA_MEM_TO_DEV / DEV_MEM_TO_MEM + The data from the provided metadata buffer should be prepared for the DMA + controller to be sent alongside of the payload data. Either by copying to a + hardware descriptor, or highly coupled packet. + - DMA_DEV_TO_MEM + On transfer completion the DMA driver must copy the metadata to the client + provided metadata buffer before notifying the client about the completion. + After the transfer completion, DMA drivers must not touch the metadata + buffer provided by the client. + +- DESC_METADATA_ENGINE + + The metadata buffer is allocated/managed by the DMA driver. The client driver + can ask for the pointer, maximum size and the currently used size of the + metadata and can directly update or read it. dmaengine_desc_get_metadata_ptr() + and dmaengine_desc_set_metadata_len() is provided as helper functions. + + From the DMA driver the following is expected for this mode: + - get_metadata_ptr + Should return a pointer for the metadata buffer, the maximum size of the + metadata buffer and the currently used / valid (if any) bytes in the buffer. + - set_metadata_len + It is called by the clients after it have placed the metadata to the buffer + to let the DMA driver know the number of valid bytes provided. + + Note: since the client will ask for the metadata pointer in the completion + callback (in DMA_DEV_TO_MEM case) the DMA driver must ensure that the + descriptor is not freed up prior the callback is called. + Device operations ----------------- diff --git a/Documentation/driver-api/driver-model/devres.rst b/Documentation/driver-api/driver-model/devres.rst index 13046fcf0a5d..46c13780994c 100644 --- a/Documentation/driver-api/driver-model/devres.rst +++ b/Documentation/driver-api/driver-model/devres.rst @@ -267,6 +267,8 @@ DRM GPIO devm_gpiod_get() + devm_gpiod_get_array() + devm_gpiod_get_array_optional() devm_gpiod_get_index() devm_gpiod_get_index_optional() devm_gpiod_get_optional() @@ -313,7 +315,6 @@ IOMAP devm_ioport_map() devm_ioport_unmap() devm_ioremap() - devm_ioremap_nocache() devm_ioremap_uc() devm_ioremap_wc() devm_ioremap_resource() : checks resource, requests memory region, ioremaps diff --git a/Documentation/driver-api/gpio/driver.rst b/Documentation/driver-api/gpio/driver.rst index 2ff743105927..871922529332 100644 --- a/Documentation/driver-api/gpio/driver.rst +++ b/Documentation/driver-api/gpio/driver.rst @@ -507,11 +507,6 @@ available but we try to move away from this: cascaded irq has to be handled by a threaded interrupt handler. Apart from that it works exactly like the chained irqchip. -- DEPRECATED: gpiochip_set_chained_irqchip(): sets up a chained cascaded irq - handler for a gpio_chip from a parent IRQ and passes the struct gpio_chip* - as handler data. Notice that we pass is as the handler data, since the - irqchip data is likely used by the parent irqchip. - - gpiochip_set_nested_irqchip(): sets up a nested cascaded irq handler for a gpio_chip from a parent IRQ. As the parent IRQ has usually been explicitly requested by the driver, this does very little more than diff --git a/Documentation/driver-api/gpio/drivers-on-gpio.rst b/Documentation/driver-api/gpio/drivers-on-gpio.rst index f3a189320e11..820b403d50f6 100644 --- a/Documentation/driver-api/gpio/drivers-on-gpio.rst +++ b/Documentation/driver-api/gpio/drivers-on-gpio.rst @@ -95,7 +95,7 @@ to emulate MCTRL (modem control) signals CTS/RTS by using two GPIO lines. The MTD NOR flash has add-ons for extra GPIO lines too, though the address bus is usually connected directly to the flash. -Use those instead of talking directly to the GPIOs using sysfs; they integrate -with kernel frameworks better than your userspace code could. Needless to say, -just using the appropriate kernel drivers will simplify and speed up your -embedded hacking in particular by providing ready-made components. +Use those instead of talking directly to the GPIOs from userspace; they +integrate with kernel frameworks better than your userspace code could. +Needless to say, just using the appropriate kernel drivers will simplify and +speed up your embedded hacking in particular by providing ready-made components. diff --git a/Documentation/driver-api/gpio/index.rst b/Documentation/driver-api/gpio/index.rst index 5b61032aa4ea..1d48fe248f05 100644 --- a/Documentation/driver-api/gpio/index.rst +++ b/Documentation/driver-api/gpio/index.rst @@ -8,6 +8,7 @@ Contents: :maxdepth: 2 intro + using-gpio driver consumer board diff --git a/Documentation/driver-api/gpio/using-gpio.rst b/Documentation/driver-api/gpio/using-gpio.rst new file mode 100644 index 000000000000..dda069444032 --- /dev/null +++ b/Documentation/driver-api/gpio/using-gpio.rst @@ -0,0 +1,50 @@ +========================= +Using GPIO Lines in Linux +========================= + +The Linux kernel exists to abstract and present hardware to users. GPIO lines +as such are normally not user facing abstractions. The most obvious, natural +and preferred way to use GPIO lines is to let kernel hardware drivers deal +with them. + +For examples of already existing generic drivers that will also be good +examples for any other kernel drivers you want to author, refer to +:doc:`drivers-on-gpio` + +For any kind of mass produced system you want to support, such as servers, +laptops, phones, tablets, routers, and any consumer or office or business goods +using appropriate kernel drivers is paramount. Submit your code for inclusion +in the upstream Linux kernel when you feel it is mature enough and you will get +help to refine it, see :doc:`../../process/submitting-patches`. + +In Linux GPIO lines also have a userspace ABI. + +The userspace ABI is intended for one-off deployments. Examples are prototypes, +factory lines, maker community projects, workshop specimen, production tools, +industrial automation, PLC-type use cases, door controllers, in short a piece +of specialized equipment that is not produced by the numbers, requiring +operators to have a deep knowledge of the equipment and knows about the +software-hardware interface to be set up. They should not have a natural fit +to any existing kernel subsystem and not be a good fit for an operating system, +because of not being reusable or abstract enough, or involving a lot of non +computer hardware related policy. + +Applications that have a good reason to use the industrial I/O (IIO) subsystem +from userspace will likely be a good fit for using GPIO lines from userspace as +well. + +Do not under any circumstances abuse the GPIO userspace ABI to cut corners in +any product development projects. If you use it for prototyping, then do not +productify the prototype: rewrite it using proper kernel drivers. Do not under +any circumstances deploy any uniform products using GPIO from userspace. + +The userspace ABI is a character device for each GPIO hardware unit (GPIO chip). +These devices will appear on the system as ``/dev/gpiochip0`` thru +``/dev/gpiochipN``. Examples of how to directly use the userspace ABI can be +found in the kernel tree ``tools/gpio`` subdirectory. + +For structured and managed applications, we recommend that you make use of the +libgpiod_ library. This provides helper abstractions, command line utlities +and arbitration for multiple simultaneous consumers on the same GPIO chip. + +.. _libgpiod: https://git.kernel.org/pub/scm/libs/libgpiod/libgpiod.git/ diff --git a/Documentation/driver-api/interconnect.rst b/Documentation/driver-api/interconnect.rst index cdeb5825f314..5ed4f57a6bac 100644 --- a/Documentation/driver-api/interconnect.rst +++ b/Documentation/driver-api/interconnect.rst @@ -91,3 +91,25 @@ Interconnect consumers are the clients which use the interconnect APIs to get paths between endpoints and set their bandwidth/latency/QoS requirements for these interconnect paths. These interfaces are not currently documented. + +Interconnect debugfs interfaces +------------------------------- + +Like several other subsystems interconnect will create some files for debugging +and introspection. Files in debugfs are not considered ABI so application +software shouldn't rely on format details change between kernel versions. + +``/sys/kernel/debug/interconnect/interconnect_summary``: + +Show all interconnect nodes in the system with their aggregated bandwidth +request. Indented under each node show bandwidth requests from each device. + +``/sys/kernel/debug/interconnect/interconnect_graph``: + +Show the interconnect graph in the graphviz dot format. It shows all +interconnect nodes and links in the system and groups together nodes from the +same provider as subgraphs. The format is human-readable and can also be piped +through dot to generate diagrams in many graphical formats:: + + $ cat /sys/kernel/debug/interconnect/interconnect_graph | \ + dot -Tsvg > interconnect_graph.svg diff --git a/Documentation/driver-api/thermal/cpu-idle-cooling.rst b/Documentation/driver-api/thermal/cpu-idle-cooling.rst new file mode 100644 index 000000000000..e4f0859486c7 --- /dev/null +++ b/Documentation/driver-api/thermal/cpu-idle-cooling.rst @@ -0,0 +1,189 @@ + +Situation: +---------- + +Under certain circumstances a SoC can reach a critical temperature +limit and is unable to stabilize the temperature around a temperature +control. When the SoC has to stabilize the temperature, the kernel can +act on a cooling device to mitigate the dissipated power. When the +critical temperature is reached, a decision must be taken to reduce +the temperature, that, in turn impacts performance. + +Another situation is when the silicon temperature continues to +increase even after the dynamic leakage is reduced to its minimum by +clock gating the component. This runaway phenomenon can continue due +to the static leakage. The only solution is to power down the +component, thus dropping the dynamic and static leakage that will +allow the component to cool down. + +Last but not least, the system can ask for a specific power budget but +because of the OPP density, we can only choose an OPP with a power +budget lower than the requested one and under-utilize the CPU, thus +losing performance. In other words, one OPP under-utilizes the CPU +with a power less than the requested power budget and the next OPP +exceeds the power budget. An intermediate OPP could have been used if +it were present. + +Solutions: +---------- + +If we can remove the static and the dynamic leakage for a specific +duration in a controlled period, the SoC temperature will +decrease. Acting on the idle state duration or the idle cycle +injection period, we can mitigate the temperature by modulating the +power budget. + +The Operating Performance Point (OPP) density has a great influence on +the control precision of cpufreq, however different vendors have a +plethora of OPP density, and some have large power gap between OPPs, +that will result in loss of performance during thermal control and +loss of power in other scenarios. + +At a specific OPP, we can assume that injecting idle cycle on all CPUs +belong to the same cluster, with a duration greater than the cluster +idle state target residency, we lead to dropping the static and the +dynamic leakage for this period (modulo the energy needed to enter +this state). So the sustainable power with idle cycles has a linear +relation with the OPP’s sustainable power and can be computed with a +coefficient similar to: + + Power(IdleCycle) = Coef x Power(OPP) + +Idle Injection: +--------------- + +The base concept of the idle injection is to force the CPU to go to an +idle state for a specified time each control cycle, it provides +another way to control CPU power and heat in addition to +cpufreq. Ideally, if all CPUs belonging to the same cluster, inject +their idle cycles synchronously, the cluster can reach its power down +state with a minimum power consumption and reduce the static leakage +to almost zero. However, these idle cycles injection will add extra +latencies as the CPUs will have to wakeup from a deep sleep state. + +We use a fixed duration of idle injection that gives an acceptable +performance penalty and a fixed latency. Mitigation can be increased +or decreased by modulating the duty cycle of the idle injection. + + ^ + | + | + |------- ------- + |_______|_______________________|_______|___________ + + <------> + idle <----------------------> + running + + <-----------------------------> + duty cycle 25% + + +The implementation of the cooling device bases the number of states on +the duty cycle percentage. When no mitigation is happening the cooling +device state is zero, meaning the duty cycle is 0%. + +When the mitigation begins, depending on the governor's policy, a +starting state is selected. With a fixed idle duration and the duty +cycle (aka the cooling device state), the running duration can be +computed. + +The governor will change the cooling device state thus the duty cycle +and this variation will modulate the cooling effect. + + ^ + | + | + |------- ------- + |_______|_______________|_______|___________ + + <------> + idle <--------------> + running + + <-----------------------------> + duty cycle 33% + + + ^ + | + | + |------- ------- + |_______|_______|_______|___________ + + <------> + idle <------> + running + + <-------------> + duty cycle 50% + +The idle injection duration value must comply with the constraints: + +- It is less than or equal to the latency we tolerate when the + mitigation begins. It is platform dependent and will depend on the + user experience, reactivity vs performance trade off we want. This + value should be specified. + +- It is greater than the idle state’s target residency we want to go + for thermal mitigation, otherwise we end up consuming more energy. + +Power considerations +-------------------- + +When we reach the thermal trip point, we have to sustain a specified +power for a specific temperature but at this time we consume: + + Power = Capacitance x Voltage^2 x Frequency x Utilisation + +... which is more than the sustainable power (or there is something +wrong in the system setup). The ‘Capacitance’ and ‘Utilisation’ are a +fixed value, ‘Voltage’ and the ‘Frequency’ are fixed artificially +because we don’t want to change the OPP. We can group the +‘Capacitance’ and the ‘Utilisation’ into a single term which is the +‘Dynamic Power Coefficient (Cdyn)’ Simplifying the above, we have: + + Pdyn = Cdyn x Voltage^2 x Frequency + +The power allocator governor will ask us somehow to reduce our power +in order to target the sustainable power defined in the device +tree. So with the idle injection mechanism, we want an average power +(Ptarget) resulting in an amount of time running at full power on a +specific OPP and idle another amount of time. That could be put in a +equation: + + P(opp)target = ((Trunning x (P(opp)running) + (Tidle x P(opp)idle)) / + (Trunning + Tidle) + ... + + Tidle = Trunning x ((P(opp)running / P(opp)target) - 1) + +At this point if we know the running period for the CPU, that gives us +the idle injection we need. Alternatively if we have the idle +injection duration, we can compute the running duration with: + + Trunning = Tidle / ((P(opp)running / P(opp)target) - 1) + +Practically, if the running power is less than the targeted power, we +end up with a negative time value, so obviously the equation usage is +bound to a power reduction, hence a higher OPP is needed to have the +running power greater than the targeted power. + +However, in this demonstration we ignore three aspects: + + * The static leakage is not defined here, we can introduce it in the + equation but assuming it will be zero most of the time as it is + difficult to get the values from the SoC vendors + + * The idle state wake up latency (or entry + exit latency) is not + taken into account, it must be added in the equation in order to + rigorously compute the idle injection + + * The injected idle duration must be greater than the idle state + target residency, otherwise we end up consuming more energy and + potentially invert the mitigation effect + +So the final equation is: + + Trunning = (Tidle - Twakeup ) x + (((P(opp)dyn + P(opp)static ) - P(opp)target) / P(opp)target ) diff --git a/Documentation/driver-api/thermal/exynos_thermal.rst b/Documentation/driver-api/thermal/exynos_thermal.rst index 5bd556566c70..764df4ab584d 100644 --- a/Documentation/driver-api/thermal/exynos_thermal.rst +++ b/Documentation/driver-api/thermal/exynos_thermal.rst @@ -4,7 +4,7 @@ Kernel driver exynos_tmu Supported chips: -* ARM SAMSUNG EXYNOS4, EXYNOS5 series of SoC +* ARM Samsung Exynos4, Exynos5 series of SoC Datasheet: Not publicly available @@ -14,7 +14,7 @@ Authors: Amit Daniel <amit.daniel@samsung.com> TMU controller Description: --------------------------- -This driver allows to read temperature inside SAMSUNG EXYNOS4/5 series of SoC. +This driver allows to read temperature inside Samsung Exynos4/5 series of SoC. The chip only exposes the measured 8-bit temperature code value through a register. @@ -43,7 +43,7 @@ The three equations are: Trimming info for 85 degree Celsius (stored at TRIMINFO register) Temperature code measured at 85 degree Celsius which is unchanged -TMU(Thermal Management Unit) in EXYNOS4/5 generates interrupt +TMU(Thermal Management Unit) in Exynos4/5 generates interrupt when temperature exceeds pre-defined levels. The maximum number of configurable threshold is five. The threshold levels are defined as follows:: @@ -67,7 +67,7 @@ TMU driver description: The exynos thermal driver is structured as:: Kernel Core thermal framework - (thermal_core.c, step_wise.c, cpu_cooling.c) + (thermal_core.c, step_wise.c, cpufreq_cooling.c) ^ | | |