diff options
Diffstat (limited to 'Documentation/driver-api')
27 files changed, 2911 insertions, 8 deletions
diff --git a/Documentation/driver-api/80211/cfg80211.rst b/Documentation/driver-api/80211/cfg80211.rst index b1e149ea6fee..eca534ab6172 100644 --- a/Documentation/driver-api/80211/cfg80211.rst +++ b/Documentation/driver-api/80211/cfg80211.rst @@ -45,6 +45,9 @@ Device registration :functions: wiphy_new .. kernel-doc:: include/net/cfg80211.h + :functions: wiphy_read_of_freq_limits + +.. kernel-doc:: include/net/cfg80211.h :functions: wiphy_register .. kernel-doc:: include/net/cfg80211.h diff --git a/Documentation/driver-api/device-io.rst b/Documentation/driver-api/device-io.rst new file mode 100644 index 000000000000..b00b23903078 --- /dev/null +++ b/Documentation/driver-api/device-io.rst @@ -0,0 +1,201 @@ +.. Copyright 2001 Matthew Wilcox +.. +.. This documentation is free software; you can redistribute +.. it and/or modify it under the terms of the GNU General Public +.. License as published by the Free Software Foundation; either +.. version 2 of the License, or (at your option) any later +.. version. + +=============================== +Bus-Independent Device Accesses +=============================== + +:Author: Matthew Wilcox +:Author: Alan Cox + +Introduction +============ + +Linux provides an API which abstracts performing IO across all busses +and devices, allowing device drivers to be written independently of bus +type. + +Memory Mapped IO +================ + +Getting Access to the Device +---------------------------- + +The most widely supported form of IO is memory mapped IO. That is, a +part of the CPU's address space is interpreted not as accesses to +memory, but as accesses to a device. Some architectures define devices +to be at a fixed address, but most have some method of discovering +devices. The PCI bus walk is a good example of such a scheme. This +document does not cover how to receive such an address, but assumes you +are starting with one. Physical addresses are of type unsigned long. + +This address should not be used directly. Instead, to get an address +suitable for passing to the accessor functions described below, you +should call :c:func:`ioremap()`. An address suitable for accessing +the device will be returned to you. + +After you've finished using the device (say, in your module's exit +routine), call :c:func:`iounmap()` in order to return the address +space to the kernel. Most architectures allocate new address space each +time you call :c:func:`ioremap()`, and they can run out unless you +call :c:func:`iounmap()`. + +Accessing the device +-------------------- + +The part of the interface most used by drivers is reading and writing +memory-mapped registers on the device. Linux provides interfaces to read +and write 8-bit, 16-bit, 32-bit and 64-bit quantities. Due to a +historical accident, these are named byte, word, long and quad accesses. +Both read and write accesses are supported; there is no prefetch support +at this time. + +The functions are named readb(), readw(), readl(), readq(), +readb_relaxed(), readw_relaxed(), readl_relaxed(), readq_relaxed(), +writeb(), writew(), writel() and writeq(). + +Some devices (such as framebuffers) would like to use larger transfers than +8 bytes at a time. For these devices, the :c:func:`memcpy_toio()`, +:c:func:`memcpy_fromio()` and :c:func:`memset_io()` functions are +provided. Do not use memset or memcpy on IO addresses; they are not +guaranteed to copy data in order. + +The read and write functions are defined to be ordered. That is the +compiler is not permitted to reorder the I/O sequence. When the ordering +can be compiler optimised, you can use __readb() and friends to +indicate the relaxed ordering. Use this with care. + +While the basic functions are defined to be synchronous with respect to +each other and ordered with respect to each other the busses the devices +sit on may themselves have asynchronicity. In particular many authors +are burned by the fact that PCI bus writes are posted asynchronously. A +driver author must issue a read from the same device to ensure that +writes have occurred in the specific cases the author cares. This kind +of property cannot be hidden from driver writers in the API. In some +cases, the read used to flush the device may be expected to fail (if the +card is resetting, for example). In that case, the read should be done +from config space, which is guaranteed to soft-fail if the card doesn't +respond. + +The following is an example of flushing a write to a device when the +driver would like to ensure the write's effects are visible prior to +continuing execution:: + + static inline void + qla1280_disable_intrs(struct scsi_qla_host *ha) + { + struct device_reg *reg; + + reg = ha->iobase; + /* disable risc and host interrupts */ + WRT_REG_WORD(®->ictrl, 0); + /* + * The following read will ensure that the above write + * has been received by the device before we return from this + * function. + */ + RD_REG_WORD(®->ictrl); + ha->flags.ints_enabled = 0; + } + +In addition to write posting, on some large multiprocessing systems +(e.g. SGI Challenge, Origin and Altix machines) posted writes won't be +strongly ordered coming from different CPUs. Thus it's important to +properly protect parts of your driver that do memory-mapped writes with +locks and use the :c:func:`mmiowb()` to make sure they arrive in the +order intended. Issuing a regular readX() will also ensure write ordering, +but should only be used when the +driver has to be sure that the write has actually arrived at the device +(not that it's simply ordered with respect to other writes), since a +full readX() is a relatively expensive operation. + +Generally, one should use :c:func:`mmiowb()` prior to releasing a spinlock +that protects regions using :c:func:`writeb()` or similar functions that +aren't surrounded by readb() calls, which will ensure ordering +and flushing. The following pseudocode illustrates what might occur if +write ordering isn't guaranteed via :c:func:`mmiowb()` or one of the +readX() functions:: + + CPU A: spin_lock_irqsave(&dev_lock, flags) + CPU A: ... + CPU A: writel(newval, ring_ptr); + CPU A: spin_unlock_irqrestore(&dev_lock, flags) + ... + CPU B: spin_lock_irqsave(&dev_lock, flags) + CPU B: writel(newval2, ring_ptr); + CPU B: ... + CPU B: spin_unlock_irqrestore(&dev_lock, flags) + +In the case above, newval2 could be written to ring_ptr before newval. +Fixing it is easy though:: + + CPU A: spin_lock_irqsave(&dev_lock, flags) + CPU A: ... + CPU A: writel(newval, ring_ptr); + CPU A: mmiowb(); /* ensure no other writes beat us to the device */ + CPU A: spin_unlock_irqrestore(&dev_lock, flags) + ... + CPU B: spin_lock_irqsave(&dev_lock, flags) + CPU B: writel(newval2, ring_ptr); + CPU B: ... + CPU B: mmiowb(); + CPU B: spin_unlock_irqrestore(&dev_lock, flags) + +See tg3.c for a real world example of how to use :c:func:`mmiowb()` + +PCI ordering rules also guarantee that PIO read responses arrive after any +outstanding DMA writes from that bus, since for some devices the result of +a readb() call may signal to the driver that a DMA transaction is +complete. In many cases, however, the driver may want to indicate that the +next readb() call has no relation to any previous DMA writes +performed by the device. The driver can use readb_relaxed() for +these cases, although only some platforms will honor the relaxed +semantics. Using the relaxed read functions will provide significant +performance benefits on platforms that support it. The qla2xxx driver +provides examples of how to use readX_relaxed(). In many cases, a majority +of the driver's readX() calls can safely be converted to readX_relaxed() +calls, since only a few will indicate or depend on DMA completion. + +Port Space Accesses +=================== + +Port Space Explained +-------------------- + +Another form of IO commonly supported is Port Space. This is a range of +addresses separate to the normal memory address space. Access to these +addresses is generally not as fast as accesses to the memory mapped +addresses, and it also has a potentially smaller address space. + +Unlike memory mapped IO, no preparation is required to access port +space. + +Accessing Port Space +-------------------- + +Accesses to this space are provided through a set of functions which +allow 8-bit, 16-bit and 32-bit accesses; also known as byte, word and +long. These functions are :c:func:`inb()`, :c:func:`inw()`, +:c:func:`inl()`, :c:func:`outb()`, :c:func:`outw()` and +:c:func:`outl()`. + +Some variants are provided for these functions. Some devices require +that accesses to their ports are slowed down. This functionality is +provided by appending a ``_p`` to the end of the function. +There are also equivalents to memcpy. The :c:func:`ins()` and +:c:func:`outs()` functions copy bytes, words or longs to the given +port. + +Public Functions Provided +========================= + +.. kernel-doc:: arch/x86/include/asm/io.h + :internal: + +.. kernel-doc:: lib/pci_iomap.c + :export: diff --git a/Documentation/driver-api/device_link.rst b/Documentation/driver-api/device_link.rst index 5f5713448703..70e328e16aad 100644 --- a/Documentation/driver-api/device_link.rst +++ b/Documentation/driver-api/device_link.rst @@ -1,3 +1,6 @@ +.. |struct dev_pm_domain| replace:: :c:type:`struct dev_pm_domain <dev_pm_domain>` +.. |struct generic_pm_domain| replace:: :c:type:`struct generic_pm_domain <generic_pm_domain>` + ============ Device links ============ @@ -120,12 +123,11 @@ Examples is the same as if the MMU was the parent of the master device. The fact that both devices share the same power domain would normally - suggest usage of a :c:type:`struct dev_pm_domain` or :c:type:`struct - generic_pm_domain`, however these are not independent devices that - happen to share a power switch, but rather the MMU device serves the - busmaster device and is useless without it. A device link creates a - synthetic hierarchical relationship between the devices and is thus - more apt. + suggest usage of a |struct dev_pm_domain| or |struct generic_pm_domain|, + however these are not independent devices that happen to share a power + switch, but rather the MMU device serves the busmaster device and is + useless without it. A device link creates a synthetic hierarchical + relationship between the devices and is thus more apt. * A Thunderbolt host controller comprises a number of PCIe hotplug ports and an NHI device to manage the PCIe switch. On resume from system sleep, @@ -157,7 +159,7 @@ Examples Alternatives ============ -* A :c:type:`struct dev_pm_domain` can be used to override the bus, +* A |struct dev_pm_domain| can be used to override the bus, class or device type callbacks. It is intended for devices sharing a single on/off switch, however it does not guarantee a specific suspend/resume ordering, this needs to be implemented separately. @@ -166,7 +168,7 @@ Alternatives suspended. Furthermore it cannot be used to enforce a specific shutdown ordering or a driver presence dependency. -* A :c:type:`struct generic_pm_domain` is a lot more heavyweight than a +* A |struct generic_pm_domain| is a lot more heavyweight than a device link and does not allow for shutdown ordering or driver presence dependencies. It also cannot be used on ACPI systems. diff --git a/Documentation/driver-api/firmware/built-in-fw.rst b/Documentation/driver-api/firmware/built-in-fw.rst new file mode 100644 index 000000000000..7300e66857f8 --- /dev/null +++ b/Documentation/driver-api/firmware/built-in-fw.rst @@ -0,0 +1,38 @@ +================= +Built-in firmware +================= + +Firmware can be built-in to the kernel, this means building the firmware +into vmlinux directly, to enable avoiding having to look for firmware from +the filesystem. Instead, firmware can be looked for inside the kernel +directly. You can enable built-in firmware using the kernel configuration +options: + + * CONFIG_EXTRA_FIRMWARE + * CONFIG_EXTRA_FIRMWARE_DIR + +This should not be confused with CONFIG_FIRMWARE_IN_KERNEL, this is for drivers +which enables firmware to be built as part of the kernel build process. This +option, CONFIG_FIRMWARE_IN_KERNEL, will build all firmware for all drivers +enabled which ship its firmware inside the Linux kernel source tree. + +There are a few reasons why you might want to consider building your firmware +into the kernel with CONFIG_EXTRA_FIRMWARE though: + +* Speed +* Firmware is needed for accessing the boot device, and the user doesn't + want to stuff the firmware into the boot initramfs. + +Even if you have these needs there are a few reasons why you may not be +able to make use of built-in firmware: + +* Legalese - firmware is non-GPL compatible +* Some firmware may be optional +* Firmware upgrades are possible, therefore a new firmware would implicate + a complete kernel rebuild. +* Some firmware files may be really large in size. The remote-proc subsystem + is an example subsystem which deals with these sorts of firmware +* The firmware may need to be scraped out from some device specific location + dynamically, an example is calibration data for for some WiFi chipsets. This + calibration data can be unique per sold device. + diff --git a/Documentation/driver-api/firmware/core.rst b/Documentation/driver-api/firmware/core.rst new file mode 100644 index 000000000000..1d1688cbc078 --- /dev/null +++ b/Documentation/driver-api/firmware/core.rst @@ -0,0 +1,16 @@ +========================== +Firmware API core features +========================== + +The firmware API has a rich set of core features available. This section +documents these features. + +.. toctree:: + + fw_search_path + built-in-fw + firmware_cache + direct-fs-lookup + fallback-mechanisms + lookup-order + diff --git a/Documentation/driver-api/firmware/direct-fs-lookup.rst b/Documentation/driver-api/firmware/direct-fs-lookup.rst new file mode 100644 index 000000000000..82b4d585a213 --- /dev/null +++ b/Documentation/driver-api/firmware/direct-fs-lookup.rst @@ -0,0 +1,30 @@ +======================== +Direct filesystem lookup +======================== + +Direct filesystem lookup is the most common form of firmware lookup performed +by the kernel. The kernel looks for the firmware directly on the root +filesystem in the paths documented in the section 'Firmware search paths'. +The filesystem lookup is implemented in fw_get_filesystem_firmware(), it +uses common core kernel file loader facility kernel_read_file_from_path(). +The max path allowed is PATH_MAX -- currently this is 4096 characters. + +It is recommended you keep /lib/firmware paths on your root filesystem, +avoid having a separate partition for them in order to avoid possible +races with lookups and avoid uses of the custom fallback mechanisms +documented below. + +Firmware and initramfs +---------------------- + +Drivers which are built-in to the kernel should have the firmware integrated +also as part of the initramfs used to boot the kernel given that otherwise +a race is possible with loading the driver and the real rootfs not yet being +available. Stuffing the firmware into initramfs resolves this race issue, +however note that using initrd does not suffice to address the same race. + +There are circumstances that justify not wanting to include firmware into +initramfs, such as dealing with large firmware firmware files for the +remote-proc subsystem. For such cases using a userspace fallback mechanism +is currently the only viable solution as only userspace can know for sure +when the real rootfs is ready and mounted. diff --git a/Documentation/driver-api/firmware/fallback-mechanisms.rst b/Documentation/driver-api/firmware/fallback-mechanisms.rst new file mode 100644 index 000000000000..d19354794e67 --- /dev/null +++ b/Documentation/driver-api/firmware/fallback-mechanisms.rst @@ -0,0 +1,195 @@ +=================== +Fallback mechanisms +=================== + +A fallback mechanism is supported to allow to overcome failures to do a direct +filesystem lookup on the root filesystem or when the firmware simply cannot be +installed for practical reasons on the root filesystem. The kernel +configuration options related to supporting the firmware fallback mechanism are: + + * CONFIG_FW_LOADER_USER_HELPER: enables building the firmware fallback + mechanism. Most distributions enable this option today. If enabled but + CONFIG_FW_LOADER_USER_HELPER_FALLBACK is disabled, only the custom fallback + mechanism is available and for the request_firmware_nowait() call. + * CONFIG_FW_LOADER_USER_HELPER_FALLBACK: force enables each request to + enable the kobject uevent fallback mechanism on all firmware API calls + except request_firmware_direct(). Most distributions disable this option + today. The call request_firmware_nowait() allows for one alternative + fallback mechanism: if this kconfig option is enabled and your second + argument to request_firmware_nowait(), uevent, is set to false you are + informing the kernel that you have a custom fallback mechanism and it will + manually load the firmware. Read below for more details. + +Note that this means when having this configuration: + +CONFIG_FW_LOADER_USER_HELPER=y +CONFIG_FW_LOADER_USER_HELPER_FALLBACK=n + +the kobject uevent fallback mechanism will never take effect even +for request_firmware_nowait() when uevent is set to true. + +Justifying the firmware fallback mechanism +========================================== + +Direct filesystem lookups may fail for a variety of reasons. Known reasons for +this are worth itemizing and documenting as it justifies the need for the +fallback mechanism: + +* Race against access with the root filesystem upon bootup. + +* Races upon resume from suspend. This is resolved by the firmware cache, but + the firmware cache is only supported if you use uevents, and its not + supported for request_firmware_into_buf(). + +* Firmware is not accessible through typical means: + * It cannot be installed into the root filesystem + * The firmware provides very unique device specific data tailored for + the unit gathered with local information. An example is calibration + data for WiFi chipsets for mobile devices. This calibration data is + not common to all units, but tailored per unit. Such information may + be installed on a separate flash partition other than where the root + filesystem is provided. + +Types of fallback mechanisms +============================ + +There are really two fallback mechanisms available using one shared sysfs +interface as a loading facility: + +* Kobject uevent fallback mechanism +* Custom fallback mechanism + +First lets document the shared sysfs loading facility. + +Firmware sysfs loading facility +=============================== + +In order to help device drivers upload firmware using a fallback mechanism +the firmware infrastructure creates a sysfs interface to enable userspace +to load and indicate when firmware is ready. The sysfs directory is created +via fw_create_instance(). This call creates a new struct device named after +the firmware requested, and establishes it in the device hierarchy by +associating the device used to make the request as the device's parent. +The sysfs directory's file attributes are defined and controlled through +the new device's class (firmare_class) and group (fw_dev_attr_groups). +This is actually where the original firmware_class.c file name comes from, +as originally the only firmware loading mechanism available was the +mechanism we now use as a fallback mechanism. + +To load firmware using the sysfs interface we expose a loading indicator, +and a file upload firmware into: + + * /sys/$DEVPATH/loading + * /sys/$DEVPATH/data + +To upload firmware you will echo 1 onto the loading file to indicate +you are loading firmware. You then cat the firmware into the data file, +and you notify the kernel the firmware is ready by echo'ing 0 onto +the loading file. + +The firmware device used to help load firmware using sysfs is only created if +direct firmware loading fails and if the fallback mechanism is enabled for your +firmware request, this is set up with fw_load_from_user_helper(). It is +important to re-iterate that no device is created if a direct filesystem lookup +succeeded. + +Using:: + + echo 1 > /sys/$DEVPATH/loading + +Will clean any previous partial load at once and make the firmware API +return an error. When loading firmware the firmware_class grows a buffer +for the firmware in PAGE_SIZE increments to hold the image as it comes in. + +firmware_data_read() and firmware_loading_show() are just provided for the +test_firmware driver for testing, they are not called in normal use or +expected to be used regularly by userspace. + +Firmware kobject uevent fallback mechanism +========================================== + +Since a device is created for the sysfs interface to help load firmware as a +fallback mechanism userspace can be informed of the addition of the device by +relying on kobject uevents. The addition of the device into the device +hierarchy means the fallback mechanism for firmware loading has been initiated. +For details of implementation refer to _request_firmware_load(), in particular +on the use of dev_set_uevent_suppress() and kobject_uevent(). + +The kernel's kobject uevent mechanism is implemented in lib/kobject_uevent.c, +it issues uevents to userspace. As a supplement to kobject uevents Linux +distributions could also enable CONFIG_UEVENT_HELPER_PATH, which makes use of +core kernel's usermode helper (UMH) functionality to call out to a userspace +helper for kobject uevents. In practice though no standard distribution has +ever used the CONFIG_UEVENT_HELPER_PATH. If CONFIG_UEVENT_HELPER_PATH is +enabled this binary would be called each time kobject_uevent_env() gets called +in the kernel for each kobject uevent triggered. + +Different implementations have been supported in userspace to take advantage of +this fallback mechanism. When firmware loading was only possible using the +sysfs mechanism the userspace component "hotplug" provided the functionality of +monitoring for kobject events. Historically this was superseded be systemd's +udev, however firmware loading support was removed from udev as of systemd +commit be2ea723b1d0 ("udev: remove userspace firmware loading support") +as of v217 on August, 2014. This means most Linux distributions today are +not using or taking advantage of the firmware fallback mechanism provided +by kobject uevents. This is specially exacerbated due to the fact that most +distributions today disable CONFIG_FW_LOADER_USER_HELPER_FALLBACK. + +Refer to do_firmware_uevent() for details of the kobject event variables +setup. Variables passwdd with a kobject add event: + +* FIRMWARE=firmware name +* TIMEOUT=timeout value +* ASYNC=whether or not the API request was asynchronous + +By default DEVPATH is set by the internal kernel kobject infrastructure. +Below is an example simple kobject uevent script:: + + # Both $DEVPATH and $FIRMWARE are already provided in the environment. + MY_FW_DIR=/lib/firmware/ + echo 1 > /sys/$DEVPATH/loading + cat $MY_FW_DIR/$FIRMWARE > /sys/$DEVPATH/data + echo 0 > /sys/$DEVPATH/loading + +Firmware custom fallback mechanism +================================== + +Users of the request_firmware_nowait() call have yet another option available +at their disposal: rely on the sysfs fallback mechanism but request that no +kobject uevents be issued to userspace. The original logic behind this +was that utilities other than udev might be required to lookup firmware +in non-traditional paths -- paths outside of the listing documented in the +section 'Direct filesystem lookup'. This option is not available to any of +the other API calls as uevents are always forced for them. + +Since uevents are only meaningful if the fallback mechanism is enabled +in your kernel it would seem odd to enable uevents with kernels that do not +have the fallback mechanism enabled in their kernels. Unfortunately we also +rely on the uevent flag which can be disabled by request_firmware_nowait() to +also setup the firmware cache for firmware requests. As documented above, +the firmware cache is only set up if uevent is enabled for an API call. +Although this can disable the firmware cache for request_firmware_nowait() +calls, users of this API should not use it for the purposes of disabling +the cache as that was not the original purpose of the flag. Not setting +the uevent flag means you want to opt-in for the firmware fallback mechanism +but you want to suppress kobject uevents, as you have a custom solution which +will monitor for your device addition into the device hierarchy somehow and +load firmware for you through a custom path. + +Firmware fallback timeout +========================= + +The firmware fallback mechanism has a timeout. If firmware is not loaded +onto the sysfs interface by the timeout value an error is sent to the +driver. By default the timeout is set to 60 seconds if uevents are +desirable, otherwise MAX_JIFFY_OFFSET is used (max timeout possible). +The logic behind using MAX_JIFFY_OFFSET for non-uevents is that a custom +solution will have as much time as it needs to load firmware. + +You can customize the firmware timeout by echo'ing your desired timeout into +the following file: + +* /sys/class/firmware/timeout + +If you echo 0 into it means MAX_JIFFY_OFFSET will be used. The data type +for the timeout is an int. diff --git a/Documentation/driver-api/firmware/firmware_cache.rst b/Documentation/driver-api/firmware/firmware_cache.rst new file mode 100644 index 000000000000..2210e5bfb332 --- /dev/null +++ b/Documentation/driver-api/firmware/firmware_cache.rst @@ -0,0 +1,51 @@ +============== +Firmware cache +============== + +When Linux resumes from suspend some device drivers require firmware lookups to +re-initialize devices. During resume there may be a period of time during which +firmware lookups are not possible, during this short period of time firmware +requests will fail. Time is of essence though, and delaying drivers to wait for +the root filesystem for firmware delays user experience with device +functionality. In order to support these requirements the firmware +infrastructure implements a firmware cache for device drivers for most API +calls, automatically behind the scenes. + +The firmware cache makes using certain firmware API calls safe during a device +driver's suspend and resume callback. Users of these API calls needn't cache +the firmware by themselves for dealing with firmware loss during system resume. + +The firmware cache works by requesting for firmware prior to suspend and +caching it in memory. Upon resume device drivers using the firmware API will +have access to the firmware immediately, without having to wait for the root +filesystem to mount or dealing with possible race issues with lookups as the +root filesystem mounts. + +Some implementation details about the firmware cache setup: + +* The firmware cache is setup by adding a devres entry for each device that + uses all synchronous call except :c:func:`request_firmware_into_buf`. + +* If an asynchronous call is used the firmware cache is only set up for a + device if if the second argument (uevent) to request_firmware_nowait() is + true. When uevent is true it requests that a kobject uevent be sent to + userspace for the firmware request. For details refer to the Fackback + mechanism documented below. + +* If the firmware cache is determined to be needed as per the above two + criteria the firmware cache is setup by adding a devres entry for the + device making the firmware request. + +* The firmware devres entry is maintained throughout the lifetime of the + device. This means that even if you release_firmware() the firmware cache + will still be used on resume from suspend. + +* The timeout for the fallback mechanism is temporarily reduced to 10 seconds + as the firmware cache is set up during suspend, the timeout is set back to + the old value you had configured after the cache is set up. + +* Upon suspend any pending non-uevent firmware requests are killed to avoid + stalling the kernel, this is done with kill_requests_without_uevent(). Kernel + calls requiring the non-uevent therefore need to implement their own firmware + cache mechanism but must not use the firmware API on suspend. + diff --git a/Documentation/driver-api/firmware/fw_search_path.rst b/Documentation/driver-api/firmware/fw_search_path.rst new file mode 100644 index 000000000000..a360f1009fa3 --- /dev/null +++ b/Documentation/driver-api/firmware/fw_search_path.rst @@ -0,0 +1,26 @@ +===================== +Firmware search paths +===================== + +The following search paths are used to look for firmware on your +root filesystem. + +* fw_path_para - module parameter - default is empty so this is ignored +* /lib/firmware/updates/UTS_RELEASE/ +* /lib/firmware/updates/ +* /lib/firmware/UTS_RELEASE/ +* /lib/firmware/ + +The module parameter ''path'' can be passed to the firmware_class module +to activate the first optional custom fw_path_para. The custom path can +only be up to 256 characters long. The kernel parameter passed would be: + +* 'firmware_class.path=$CUSTOMIZED_PATH' + +There is an alternative to customize the path at run time after bootup, you +can use the file: + +* /sys/module/firmware_class/parameters/path + +You would echo into it your custom path and firmware requested will be +searched for there first. diff --git a/Documentation/driver-api/firmware/index.rst b/Documentation/driver-api/firmware/index.rst new file mode 100644 index 000000000000..1abe01793031 --- /dev/null +++ b/Documentation/driver-api/firmware/index.rst @@ -0,0 +1,16 @@ +================== +Linux Firmware API +================== + +.. toctree:: + + introduction + core + request_firmware + +.. only:: subproject and html + + Indices + ======= + + * :ref:`genindex` diff --git a/Documentation/driver-api/firmware/introduction.rst b/Documentation/driver-api/firmware/introduction.rst new file mode 100644 index 000000000000..211cb44eb972 --- /dev/null +++ b/Documentation/driver-api/firmware/introduction.rst @@ -0,0 +1,27 @@ +============ +Introduction +============ + +The firmware API enables kernel code to request files required +for functionality from userspace, the uses vary: + +* Microcode for CPU errata +* Device driver firmware, required to be loaded onto device + microcontrollers +* Device driver information data (calibration data, EEPROM overrides), + some of which can be completely optional. + +Types of firmware requests +========================== + +There are two types of calls: + +* Synchronous +* Asynchronous + +Which one you use vary depending on your requirements, the rule of thumb +however is you should strive to use the asynchronous APIs unless you also +are already using asynchronous initialization mechanisms which will not +stall or delay boot. Even if loading firmware does not take a lot of time +processing firmware might, and this can still delay boot or initialization, +as such mechanisms such as asynchronous probe can help supplement drivers. diff --git a/Documentation/driver-api/firmware/lookup-order.rst b/Documentation/driver-api/firmware/lookup-order.rst new file mode 100644 index 000000000000..88c81739683c --- /dev/null +++ b/Documentation/driver-api/firmware/lookup-order.rst @@ -0,0 +1,18 @@ +===================== +Firmware lookup order +===================== + +Different functionality is available to enable firmware to be found. +Below is chronological order of how firmware will be looked for once +a driver issues a firmware API call. + +* The ''Built-in firmware'' is checked first, if the firmware is present we + return it immediately +* The ''Firmware cache'' is looked at next. If the firmware is found we + return it immediately +* The ''Direct filesystem lookup'' is performed next, if found we + return it immediately +* If no firmware has been found and the fallback mechanism was enabled + the sysfs interface is created. After this either a kobject uevent + is issued or the custom firmware loading is relied upon for firmware + loading up to the timeout value. diff --git a/Documentation/driver-api/firmware/request_firmware.rst b/Documentation/driver-api/firmware/request_firmware.rst new file mode 100644 index 000000000000..cc0aea880824 --- /dev/null +++ b/Documentation/driver-api/firmware/request_firmware.rst @@ -0,0 +1,56 @@ +==================== +request_firmware API +==================== + +You would typically load firmware and then load it into your device somehow. +The typical firmware work flow is reflected below:: + + if(request_firmware(&fw_entry, $FIRMWARE, device) == 0) + copy_fw_to_device(fw_entry->data, fw_entry->size); + release_firmware(fw_entry); + +Synchronous firmware requests +============================= + +Synchronous firmware requests will wait until the firmware is found or until +an error is returned. + +request_firmware +---------------- +.. kernel-doc:: drivers/base/firmware_class.c + :functions: request_firmware + +request_firmware_direct +----------------------- +.. kernel-doc:: drivers/base/firmware_class.c + :functions: request_firmware_direct + +request_firmware_into_buf +------------------------- +.. kernel-doc:: drivers/base/firmware_class.c + :functions: request_firmware_into_buf + +Asynchronous firmware requests +============================== + +Asynchronous firmware requests allow driver code to not have to wait +until the firmware or an error is returned. Function callbacks are +provided so that when the firmware or an error is found the driver is +informed through the callback. request_firmware_nowait() cannot be called +in atomic contexts. + +request_firmware_nowait +----------------------- +.. kernel-doc:: drivers/base/firmware_class.c + :functions: request_firmware_nowait + +request firmware API expected driver use +======================================== + +Once an API call returns you process the firmware and then release the +firmware. For example if you used request_firmware() and it returns, +the driver has the firmware image accessible in fw_entry->{data,size}. +If something went wrong request_firmware() returns non-zero and fw_entry +is set to NULL. Once your driver is done with processing the firmware it +can call call release_firmware(fw_entry) to release the firmware image +and any related resource. diff --git a/Documentation/driver-api/iio/buffers.rst b/Documentation/driver-api/iio/buffers.rst new file mode 100644 index 000000000000..02c99a6bee18 --- /dev/null +++ b/Documentation/driver-api/iio/buffers.rst @@ -0,0 +1,125 @@ +======= +Buffers +======= + +* struct :c:type:`iio_buffer` — general buffer structure +* :c:func:`iio_validate_scan_mask_onehot` — Validates that exactly one channel + is selected +* :c:func:`iio_buffer_get` — Grab a reference to the buffer +* :c:func:`iio_buffer_put` — Release the reference to the buffer + +The Industrial I/O core offers a way for continuous data capture based on a +trigger source. Multiple data channels can be read at once from +:file:`/dev/iio:device{X}` character device node, thus reducing the CPU load. + +IIO buffer sysfs interface +========================== +An IIO buffer has an associated attributes directory under +:file:`/sys/bus/iio/iio:device{X}/buffer/*`. Here are some of the existing +attributes: + +* :file:`length`, the total number of data samples (capacity) that can be + stored by the buffer. +* :file:`enable`, activate buffer capture. + +IIO buffer setup +================ + +The meta information associated with a channel reading placed in a buffer is +called a scan element . The important bits configuring scan elements are +exposed to userspace applications via the +:file:`/sys/bus/iio/iio:device{X}/scan_elements/*` directory. This file contains +attributes of the following form: + +* :file:`enable`, used for enabling a channel. If and only if its attribute + is non *zero*, then a triggered capture will contain data samples for this + channel. +* :file:`type`, description of the scan element data storage within the buffer + and hence the form in which it is read from user space. + Format is [be|le]:[s|u]bits/storagebitsXrepeat[>>shift] . + * *be* or *le*, specifies big or little endian. + * *s* or *u*, specifies if signed (2's complement) or unsigned. + * *bits*, is the number of valid data bits. + * *storagebits*, is the number of bits (after padding) that it occupies in the + buffer. + * *shift*, if specified, is the shift that needs to be applied prior to + masking out unused bits. + * *repeat*, specifies the number of bits/storagebits repetitions. When the + repeat element is 0 or 1, then the repeat value is omitted. + +For example, a driver for a 3-axis accelerometer with 12 bit resolution where +data is stored in two 8-bits registers as follows:: + + 7 6 5 4 3 2 1 0 + +---+---+---+---+---+---+---+---+ + |D3 |D2 |D1 |D0 | X | X | X | X | (LOW byte, address 0x06) + +---+---+---+---+---+---+---+---+ + + 7 6 5 4 3 2 1 0 + +---+---+---+---+---+---+---+---+ + |D11|D10|D9 |D8 |D7 |D6 |D5 |D4 | (HIGH byte, address 0x07) + +---+---+---+---+---+---+---+---+ + +will have the following scan element type for each axis:: + + $ cat /sys/bus/iio/devices/iio:device0/scan_elements/in_accel_y_type + le:s12/16>>4 + +A user space application will interpret data samples read from the buffer as +two byte little endian signed data, that needs a 4 bits right shift before +masking out the 12 valid bits of data. + +For implementing buffer support a driver should initialize the following +fields in iio_chan_spec definition:: + + struct iio_chan_spec { + /* other members */ + int scan_index + struct { + char sign; + u8 realbits; + u8 storagebits; + u8 shift; + u8 repeat; + enum iio_endian endianness; + } scan_type; + }; + +The driver implementing the accelerometer described above will have the +following channel definition:: + + struct struct iio_chan_spec accel_channels[] = { + { + .type = IIO_ACCEL, + .modified = 1, + .channel2 = IIO_MOD_X, + /* other stuff here */ + .scan_index = 0, + .scan_type = { + .sign = 's', + .realbits = 12, + .storagebits = 16, + .shift = 4, + .endianness = IIO_LE, + }, + } + /* similar for Y (with channel2 = IIO_MOD_Y, scan_index = 1) + * and Z (with channel2 = IIO_MOD_Z, scan_index = 2) axis + */ + } + +Here **scan_index** defines the order in which the enabled channels are placed +inside the buffer. Channels with a lower **scan_index** will be placed before +channels with a higher index. Each channel needs to have a unique +**scan_index**. + +Setting **scan_index** to -1 can be used to indicate that the specific channel +does not support buffered capture. In this case no entries will be created for +the channel in the scan_elements directory. + +More details +============ +.. kernel-doc:: include/linux/iio/buffer.h +.. kernel-doc:: drivers/iio/industrialio-buffer.c + :export: + diff --git a/Documentation/driver-api/iio/core.rst b/Documentation/driver-api/iio/core.rst new file mode 100644 index 000000000000..9a34ae03b679 --- /dev/null +++ b/Documentation/driver-api/iio/core.rst @@ -0,0 +1,182 @@ +============= +Core elements +============= + +The Industrial I/O core offers a unified framework for writing drivers for +many different types of embedded sensors. a standard interface to user space +applications manipulating sensors. The implementation can be found under +:file:`drivers/iio/industrialio-*` + +Industrial I/O Devices +---------------------- + +* struct :c:type:`iio_dev` - industrial I/O device +* :c:func:`iio_device_alloc()` - alocate an :c:type:`iio_dev` from a driver +* :c:func:`iio_device_free()` - free an :c:type:`iio_dev` from a driver +* :c:func:`iio_device_register()` - register a device with the IIO subsystem +* :c:func:`iio_device_unregister()` - unregister a device from the IIO + subsystem + +An IIO device usually corresponds to a single hardware sensor and it +provides all the information needed by a driver handling a device. +Let's first have a look at the functionality embedded in an IIO device +then we will show how a device driver makes use of an IIO device. + +There are two ways for a user space application to interact with an IIO driver. + +1. :file:`/sys/bus/iio/iio:device{X}/`, this represents a hardware sensor + and groups together the data channels of the same chip. +2. :file:`/dev/iio:device{X}`, character device node interface used for + buffered data transfer and for events information retrieval. + +A typical IIO driver will register itself as an :doc:`I2C <../i2c>` or +:doc:`SPI <../spi>` driver and will create two routines, probe and remove. + +At probe: + +1. Call :c:func:`iio_device_alloc()`, which allocates memory for an IIO device. +2. Initialize IIO device fields with driver specific information (e.g. + device name, device channels). +3. Call :c:func:`iio_device_register()`, this registers the device with the + IIO core. After this call the device is ready to accept requests from user + space applications. + +At remove, we free the resources allocated in probe in reverse order: + +1. :c:func:`iio_device_unregister()`, unregister the device from the IIO core. +2. :c:func:`iio_device_free()`, free the memory allocated for the IIO device. + +IIO device sysfs interface +========================== + +Attributes are sysfs files used to expose chip info and also allowing +applications to set various configuration parameters. For device with +index X, attributes can be found under /sys/bus/iio/iio:deviceX/ directory. +Common attributes are: + +* :file:`name`, description of the physical chip. +* :file:`dev`, shows the major:minor pair associated with + :file:`/dev/iio:deviceX` node. +* :file:`sampling_frequency_available`, available discrete set of sampling + frequency values for device. +* Available standard attributes for IIO devices are described in the + :file:`Documentation/ABI/testing/sysfs-bus-iio` file in the Linux kernel + sources. + +IIO device channels +=================== + +struct :c:type:`iio_chan_spec` - specification of a single channel + +An IIO device channel is a representation of a data channel. An IIO device can +have one or multiple channels. For example: + +* a thermometer sensor has one channel representing the temperature measurement. +* a light sensor with two channels indicating the measurements in the visible + and infrared spectrum. +* an accelerometer can have up to 3 channels representing acceleration on X, Y + and Z axes. + +An IIO channel is described by the struct :c:type:`iio_chan_spec`. +A thermometer driver for the temperature sensor in the example above would +have to describe its channel as follows:: + + static const struct iio_chan_spec temp_channel[] = { + { + .type = IIO_TEMP, + .info_mask_separate = BIT(IIO_CHAN_INFO_PROCESSED), + }, + }; + +Channel sysfs attributes exposed to userspace are specified in the form of +bitmasks. Depending on their shared info, attributes can be set in one of the +following masks: + +* **info_mask_separate**, attributes will be specific to + this channel +* **info_mask_shared_by_type**, attributes are shared by all channels of the + same type +* **info_mask_shared_by_dir**, attributes are shared by all channels of the same + direction +* **info_mask_shared_by_all**, attributes are shared by all channels + +When there are multiple data channels per channel type we have two ways to +distinguish between them: + +* set **.modified** field of :c:type:`iio_chan_spec` to 1. Modifiers are + specified using **.channel2** field of the same :c:type:`iio_chan_spec` + structure and are used to indicate a physically unique characteristic of the + channel such as its direction or spectral response. For example, a light + sensor can have two channels, one for infrared light and one for both + infrared and visible light. +* set **.indexed** field of :c:type:`iio_chan_spec` to 1. In this case the + channel is simply another instance with an index specified by the **.channel** + field. + +Here is how we can make use of the channel's modifiers:: + + static const struct iio_chan_spec light_channels[] = { + { + .type = IIO_INTENSITY, + .modified = 1, + .channel2 = IIO_MOD_LIGHT_IR, + .info_mask_separate = BIT(IIO_CHAN_INFO_RAW), + .info_mask_shared = BIT(IIO_CHAN_INFO_SAMP_FREQ), + }, + { + .type = IIO_INTENSITY, + .modified = 1, + .channel2 = IIO_MOD_LIGHT_BOTH, + .info_mask_separate = BIT(IIO_CHAN_INFO_RAW), + .info_mask_shared = BIT(IIO_CHAN_INFO_SAMP_FREQ), + }, + { + .type = IIO_LIGHT, + .info_mask_separate = BIT(IIO_CHAN_INFO_PROCESSED), + .info_mask_shared = BIT(IIO_CHAN_INFO_SAMP_FREQ), + }, + } + +This channel's definition will generate two separate sysfs files for raw data +retrieval: + +* :file:`/sys/bus/iio/iio:device{X}/in_intensity_ir_raw` +* :file:`/sys/bus/iio/iio:device{X}/in_intensity_both_raw` + +one file for processed data: + +* :file:`/sys/bus/iio/iio:device{X}/in_illuminance_input` + +and one shared sysfs file for sampling frequency: + +* :file:`/sys/bus/iio/iio:device{X}/sampling_frequency`. + +Here is how we can make use of the channel's indexing:: + + static const struct iio_chan_spec light_channels[] = { + { + .type = IIO_VOLTAGE, + .indexed = 1, + .channel = 0, + .info_mask_separate = BIT(IIO_CHAN_INFO_RAW), + }, + { + .type = IIO_VOLTAGE, + .indexed = 1, + .channel = 1, + .info_mask_separate = BIT(IIO_CHAN_INFO_RAW), + }, + } + +This will generate two separate attributes files for raw data retrieval: + +* :file:`/sys/bus/iio/devices/iio:device{X}/in_voltage0_raw`, representing + voltage measurement for channel 0. +* :file:`/sys/bus/iio/devices/iio:device{X}/in_voltage1_raw`, representing + voltage measurement for channel 1. + +More details +============ +.. kernel-doc:: include/linux/iio/iio.h +.. kernel-doc:: drivers/iio/industrialio-core.c + :export: diff --git a/Documentation/driver-api/iio/index.rst b/Documentation/driver-api/iio/index.rst new file mode 100644 index 000000000000..e5c3922d1b6f --- /dev/null +++ b/Documentation/driver-api/iio/index.rst @@ -0,0 +1,17 @@ +.. include:: <isonum.txt> + +Industrial I/O +============== + +**Copyright** |copy| 2015 Intel Corporation + +Contents: + +.. toctree:: + :maxdepth: 2 + + intro + core + buffers + triggers + triggered-buffers diff --git a/Documentation/driver-api/iio/intro.rst b/Documentation/driver-api/iio/intro.rst new file mode 100644 index 000000000000..3653fbd57069 --- /dev/null +++ b/Documentation/driver-api/iio/intro.rst @@ -0,0 +1,33 @@ +.. include:: <isonum.txt> + +============ +Introduction +============ + +The main purpose of the Industrial I/O subsystem (IIO) is to provide support +for devices that in some sense perform either +analog-to-digital conversion (ADC) or digital-to-analog conversion (DAC) +or both. The aim is to fill the gap between the somewhat similar hwmon and +:doc:`input <../input>` subsystems. Hwmon is directed at low sample rate +sensors used to monitor and control the system itself, like fan speed control +or temperature measurement. :doc:`Input <../input>` is, as its name suggests, +focused on human interaction input devices (keyboard, mouse, touchscreen). +In some cases there is considerable overlap between these and IIO. + +Devices that fall into this category include: + +* analog to digital converters (ADCs) +* accelerometers +* capacitance to digital converters (CDCs) +* digital to analog converters (DACs) +* gyroscopes +* inertial measurement units (IMUs) +* color and light sensors +* magnetometers +* pressure sensors +* proximity sensors +* temperature sensors + +Usually these sensors are connected via :doc:`SPI <../spi>` or +:doc:`I2C <../i2c>`. A common use case of the sensors devices is to have +combined functionality (e.g. light plus proximity sensor). diff --git a/Documentation/driver-api/iio/triggered-buffers.rst b/Documentation/driver-api/iio/triggered-buffers.rst new file mode 100644 index 000000000000..0db12660cc90 --- /dev/null +++ b/Documentation/driver-api/iio/triggered-buffers.rst @@ -0,0 +1,69 @@ +================= +Triggered Buffers +================= + +Now that we know what buffers and triggers are let's see how they work together. + +IIO triggered buffer setup +========================== + +* :c:func:`iio_triggered_buffer_setup` — Setup triggered buffer and pollfunc +* :c:func:`iio_triggered_buffer_cleanup` — Free resources allocated by + :c:func:`iio_triggered_buffer_setup` +* struct :c:type:`iio_buffer_setup_ops` — buffer setup related callbacks + +A typical triggered buffer setup looks like this:: + + const struct iio_buffer_setup_ops sensor_buffer_setup_ops = { + .preenable = sensor_buffer_preenable, + .postenable = sensor_buffer_postenable, + .postdisable = sensor_buffer_postdisable, + .predisable = sensor_buffer_predisable, + }; + + irqreturn_t sensor_iio_pollfunc(int irq, void *p) + { + pf->timestamp = iio_get_time_ns((struct indio_dev *)p); + return IRQ_WAKE_THREAD; + } + + irqreturn_t sensor_trigger_handler(int irq, void *p) + { + u16 buf[8]; + int i = 0; + + /* read data for each active channel */ + for_each_set_bit(bit, active_scan_mask, masklength) + buf[i++] = sensor_get_data(bit) + + iio_push_to_buffers_with_timestamp(indio_dev, buf, timestamp); + + iio_trigger_notify_done(trigger); + return IRQ_HANDLED; + } + + /* setup triggered buffer, usually in probe function */ + iio_triggered_buffer_setup(indio_dev, sensor_iio_polfunc, + sensor_trigger_handler, + sensor_buffer_setup_ops); + +The important things to notice here are: + +* :c:type:`iio_buffer_setup_ops`, the buffer setup functions to be called at + predefined points in the buffer configuration sequence (e.g. before enable, + after disable). If not specified, the IIO core uses the default + iio_triggered_buffer_setup_ops. +* **sensor_iio_pollfunc**, the function that will be used as top half of poll + function. It should do as little processing as possible, because it runs in + interrupt context. The most common operation is recording of the current + timestamp and for this reason one can use the IIO core defined + :c:func:`iio_pollfunc_store_time` function. +* **sensor_trigger_handler**, the function that will be used as bottom half of + the poll function. This runs in the context of a kernel thread and all the + processing takes place here. It usually reads data from the device and + stores it in the internal buffer together with the timestamp recorded in the + top half. + +More details +============ +.. kernel-doc:: drivers/iio/buffer/industrialio-triggered-buffer.c diff --git a/Documentation/driver-api/iio/triggers.rst b/Documentation/driver-api/iio/triggers.rst new file mode 100644 index 000000000000..f89d37e7dd82 --- /dev/null +++ b/Documentation/driver-api/iio/triggers.rst @@ -0,0 +1,80 @@ +======== +Triggers +======== + +* struct :c:type:`iio_trigger` — industrial I/O trigger device +* :c:func:`devm_iio_trigger_alloc` — Resource-managed iio_trigger_alloc +* :c:func:`devm_iio_trigger_free` — Resource-managed iio_trigger_free +* :c:func:`devm_iio_trigger_register` — Resource-managed iio_trigger_register +* :c:func:`devm_iio_trigger_unregister` — Resource-managed + iio_trigger_unregister +* :c:func:`iio_trigger_validate_own_device` — Check if a trigger and IIO + device belong to the same device + +In many situations it is useful for a driver to be able to capture data based +on some external event (trigger) as opposed to periodically polling for data. +An IIO trigger can be provided by a device driver that also has an IIO device +based on hardware generated events (e.g. data ready or threshold exceeded) or +provided by a separate driver from an independent interrupt source (e.g. GPIO +line connected to some external system, timer interrupt or user space writing +a specific file in sysfs). A trigger may initiate data capture for a number of +sensors and also it may be completely unrelated to the sensor itself. + +IIO trigger sysfs interface +=========================== + +There are two locations in sysfs related to triggers: + +* :file:`/sys/bus/iio/devices/trigger{Y}/*`, this file is created once an + IIO trigger is registered with the IIO core and corresponds to trigger + with index Y. + Because triggers can be very different depending on type there are few + standard attributes that we can describe here: + + * :file:`name`, trigger name that can be later used for association with a + device. + * :file:`sampling_frequency`, some timer based triggers use this attribute to + specify the frequency for trigger calls. + +* :file:`/sys/bus/iio/devices/iio:device{X}/trigger/*`, this directory is + created once the device supports a triggered buffer. We can associate a + trigger with our device by writing the trigger's name in the + :file:`current_trigger` file. + +IIO trigger setup +================= + +Let's see a simple example of how to setup a trigger to be used by a driver:: + + struct iio_trigger_ops trigger_ops = { + .set_trigger_state = sample_trigger_state, + .validate_device = sample_validate_device, + } + + struct iio_trigger *trig; + + /* first, allocate memory for our trigger */ + trig = iio_trigger_alloc(dev, "trig-%s-%d", name, idx); + + /* setup trigger operations field */ + trig->ops = &trigger_ops; + + /* now register the trigger with the IIO core */ + iio_trigger_register(trig); + +IIO trigger ops +=============== + +* struct :c:type:`iio_trigger_ops` — operations structure for an iio_trigger. + +Notice that a trigger has a set of operations attached: + +* :file:`set_trigger_state`, switch the trigger on/off on demand. +* :file:`validate_device`, function to validate the device when the current + trigger gets changed. + +More details +============ +.. kernel-doc:: include/linux/iio/trigger.h +.. kernel-doc:: drivers/iio/industrialio-trigger.c + :export: diff --git a/Documentation/driver-api/index.rst b/Documentation/driver-api/index.rst index 5475a2807e7a..60db00d1532b 100644 --- a/Documentation/driver-api/index.rst +++ b/Documentation/driver-api/index.rst @@ -16,11 +16,15 @@ available subsections can be seen below. basics infrastructure + pm/index + device-io dma-buf device_link message-based sound frame-buffer + regulator + iio/index input usb spi @@ -30,6 +34,8 @@ available subsections can be seen below. miscellaneous vme 80211/index + uio-howto + firmware/index .. only:: subproject and html diff --git a/Documentation/driver-api/pm/conf.py b/Documentation/driver-api/pm/conf.py new file mode 100644 index 000000000000..a89fac11272f --- /dev/null +++ b/Documentation/driver-api/pm/conf.py @@ -0,0 +1,10 @@ +# -*- coding: utf-8; mode: python -*- + +project = "Device Power Management" + +tags.add("subproject") + +latex_documents = [ + ('index', 'pm.tex', project, + 'The kernel development community', 'manual'), +] diff --git a/Documentation/driver-api/pm/devices.rst b/Documentation/driver-api/pm/devices.rst new file mode 100644 index 000000000000..bedd32388dac --- /dev/null +++ b/Documentation/driver-api/pm/devices.rst @@ -0,0 +1,736 @@ +.. |struct dev_pm_ops| replace:: :c:type:`struct dev_pm_ops <dev_pm_ops>` +.. |struct dev_pm_domain| replace:: :c:type:`struct dev_pm_domain <dev_pm_domain>` +.. |struct bus_type| replace:: :c:type:`struct bus_type <bus_type>` +.. |struct device_type| replace:: :c:type:`struct device_type <device_type>` +.. |struct class| replace:: :c:type:`struct class <class>` +.. |struct wakeup_source| replace:: :c:type:`struct wakeup_source <wakeup_source>` +.. |struct device| replace:: :c:type:`struct device <device>` + +============================== +Device Power Management Basics +============================== + +:: + + Copyright (c) 2010-2011 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc. + Copyright (c) 2010 Alan Stern <stern@rowland.harvard.edu> + Copyright (c) 2016 Intel Corp., Rafael J. Wysocki <rafael.j.wysocki@intel.com> + +Most of the code in Linux is device drivers, so most of the Linux power +management (PM) code is also driver-specific. Most drivers will do very +little; others, especially for platforms with small batteries (like cell +phones), will do a lot. + +This writeup gives an overview of how drivers interact with system-wide +power management goals, emphasizing the models and interfaces that are +shared by everything that hooks up to the driver model core. Read it as +background for the domain-specific work you'd do with any specific driver. + + +Two Models for Device Power Management +====================================== + +Drivers will use one or both of these models to put devices into low-power +states: + + System Sleep model: + + Drivers can enter low-power states as part of entering system-wide + low-power states like "suspend" (also known as "suspend-to-RAM"), or + (mostly for systems with disks) "hibernation" (also known as + "suspend-to-disk"). + + This is something that device, bus, and class drivers collaborate on + by implementing various role-specific suspend and resume methods to + cleanly power down hardware and software subsystems, then reactivate + them without loss of data. + + Some drivers can manage hardware wakeup events, which make the system + leave the low-power state. This feature may be enabled or disabled + using the relevant :file:`/sys/devices/.../power/wakeup` file (for + Ethernet drivers the ioctl interface used by ethtool may also be used + for this purpose); enabling it may cost some power usage, but let the + whole system enter low-power states more often. + + Runtime Power Management model: + + Devices may also be put into low-power states while the system is + running, independently of other power management activity in principle. + However, devices are not generally independent of each other (for + example, a parent device cannot be suspended unless all of its child + devices have been suspended). Moreover, depending on the bus type the + device is on, it may be necessary to carry out some bus-specific + operations on the device for this purpose. Devices put into low power + states at run time may require special handling during system-wide power + transitions (suspend or hibernation). + + For these reasons not only the device driver itself, but also the + appropriate subsystem (bus type, device type or device class) driver and + the PM core are involved in runtime power management. As in the system + sleep power management case, they need to collaborate by implementing + various role-specific suspend and resume methods, so that the hardware + is cleanly powered down and reactivated without data or service loss. + +There's not a lot to be said about those low-power states except that they are +very system-specific, and often device-specific. Also, that if enough devices +have been put into low-power states (at runtime), the effect may be very similar +to entering some system-wide low-power state (system sleep) ... and that +synergies exist, so that several drivers using runtime PM might put the system +into a state where even deeper power saving options are available. + +Most suspended devices will have quiesced all I/O: no more DMA or IRQs (except +for wakeup events), no more data read or written, and requests from upstream +drivers are no longer accepted. A given bus or platform may have different +requirements though. + +Examples of hardware wakeup events include an alarm from a real time clock, +network wake-on-LAN packets, keyboard or mouse activity, and media insertion +or removal (for PCMCIA, MMC/SD, USB, and so on). + +Interfaces for Entering System Sleep States +=========================================== + +There are programming interfaces provided for subsystems (bus type, device type, +device class) and device drivers to allow them to participate in the power +management of devices they are concerned with. These interfaces cover both +system sleep and runtime power management. + + +Device Power Management Operations +---------------------------------- + +Device power management operations, at the subsystem level as well as at the +device driver level, are implemented by defining and populating objects of type +|struct dev_pm_ops| defined in :file:`include/linux/pm.h`. The roles of the +methods included in it will be explained in what follows. For now, it should be +sufficient to remember that the last three methods are specific to runtime power +management while the remaining ones are used during system-wide power +transitions. + +There also is a deprecated "old" or "legacy" interface for power management +operations available at least for some subsystems. This approach does not use +|struct dev_pm_ops| objects and it is suitable only for implementing system +sleep power management methods in a limited way. Therefore it is not described +in this document, so please refer directly to the source code for more +information about it. + + +Subsystem-Level Methods +----------------------- + +The core methods to suspend and resume devices reside in +|struct dev_pm_ops| pointed to by the :c:member:`ops` member of +|struct dev_pm_domain|, or by the :c:member:`pm` member of |struct bus_type|, +|struct device_type| and |struct class|. They are mostly of interest to the +people writing infrastructure for platforms and buses, like PCI or USB, or +device type and device class drivers. They also are relevant to the writers of +device drivers whose subsystems (PM domains, device types, device classes and +bus types) don't provide all power management methods. + +Bus drivers implement these methods as appropriate for the hardware and the +drivers using it; PCI works differently from USB, and so on. Not many people +write subsystem-level drivers; most driver code is a "device driver" that builds +on top of bus-specific framework code. + +For more information on these driver calls, see the description later; +they are called in phases for every device, respecting the parent-child +sequencing in the driver model tree. + + +:file:`/sys/devices/.../power/wakeup` files +------------------------------------------- + +All device objects in the driver model contain fields that control the handling +of system wakeup events (hardware signals that can force the system out of a +sleep state). These fields are initialized by bus or device driver code using +:c:func:`device_set_wakeup_capable()` and :c:func:`device_set_wakeup_enable()`, +defined in :file:`include/linux/pm_wakeup.h`. + +The :c:member:`power.can_wakeup` flag just records whether the device (and its +driver) can physically support wakeup events. The +:c:func:`device_set_wakeup_capable()` routine affects this flag. The +:c:member:`power.wakeup` field is a pointer to an object of type +|struct wakeup_source| used for controlling whether or not the device should use +its system wakeup mechanism and for notifying the PM core of system wakeup +events signaled by the device. This object is only present for wakeup-capable +devices (i.e. devices whose :c:member:`can_wakeup` flags are set) and is created +(or removed) by :c:func:`device_set_wakeup_capable()`. + +Whether or not a device is capable of issuing wakeup events is a hardware +matter, and the kernel is responsible for keeping track of it. By contrast, +whether or not a wakeup-capable device should issue wakeup events is a policy +decision, and it is managed by user space through a sysfs attribute: the +:file:`power/wakeup` file. User space can write the "enabled" or "disabled" +strings to it to indicate whether or not, respectively, the device is supposed +to signal system wakeup. This file is only present if the +:c:member:`power.wakeup` object exists for the given device and is created (or +removed) along with that object, by :c:func:`device_set_wakeup_capable()`. +Reads from the file will return the corresponding string. + +The initial value in the :file:`power/wakeup` file is "disabled" for the +majority of devices; the major exceptions are power buttons, keyboards, and +Ethernet adapters whose WoL (wake-on-LAN) feature has been set up with ethtool. +It should also default to "enabled" for devices that don't generate wakeup +requests on their own but merely forward wakeup requests from one bus to another +(like PCI Express ports). + +The :c:func:`device_may_wakeup()` routine returns true only if the +:c:member:`power.wakeup` object exists and the corresponding :file:`power/wakeup` +file contains the "enabled" string. This information is used by subsystems, +like the PCI bus type code, to see whether or not to enable the devices' wakeup +mechanisms. If device wakeup mechanisms are enabled or disabled directly by +drivers, they also should use :c:func:`device_may_wakeup()` to decide what to do +during a system sleep transition. Device drivers, however, are not expected to +call :c:func:`device_set_wakeup_enable()` directly in any case. + +It ought to be noted that system wakeup is conceptually different from "remote +wakeup" used by runtime power management, although it may be supported by the +same physical mechanism. Remote wakeup is a feature allowing devices in +low-power states to trigger specific interrupts to signal conditions in which +they should be put into the full-power state. Those interrupts may or may not +be used to signal system wakeup events, depending on the hardware design. On +some systems it is impossible to trigger them from system sleep states. In any +case, remote wakeup should always be enabled for runtime power management for +all devices and drivers that support it. + + +:file:`/sys/devices/.../power/control` files +-------------------------------------------- + +Each device in the driver model has a flag to control whether it is subject to +runtime power management. This flag, :c:member:`runtime_auto`, is initialized +by the bus type (or generally subsystem) code using :c:func:`pm_runtime_allow()` +or :c:func:`pm_runtime_forbid()`; the default is to allow runtime power +management. + +The setting can be adjusted by user space by writing either "on" or "auto" to +the device's :file:`power/control` sysfs file. Writing "auto" calls +:c:func:`pm_runtime_allow()`, setting the flag and allowing the device to be +runtime power-managed by its driver. Writing "on" calls +:c:func:`pm_runtime_forbid()`, clearing the flag, returning the device to full +power if it was in a low-power state, and preventing the +device from being runtime power-managed. User space can check the current value +of the :c:member:`runtime_auto` flag by reading that file. + +The device's :c:member:`runtime_auto` flag has no effect on the handling of +system-wide power transitions. In particular, the device can (and in the +majority of cases should and will) be put into a low-power state during a +system-wide transition to a sleep state even though its :c:member:`runtime_auto` +flag is clear. + +For more information about the runtime power management framework, refer to +:file:`Documentation/power/runtime_pm.txt`. + + +Calling Drivers to Enter and Leave System Sleep States +====================================================== + +When the system goes into a sleep state, each device's driver is asked to +suspend the device by putting it into a state compatible with the target +system state. That's usually some version of "off", but the details are +system-specific. Also, wakeup-enabled devices will usually stay partly +functional in order to wake the system. + +When the system leaves that low-power state, the device's driver is asked to +resume it by returning it to full power. The suspend and resume operations +always go together, and both are multi-phase operations. + +For simple drivers, suspend might quiesce the device using class code +and then turn its hardware as "off" as possible during suspend_noirq. The +matching resume calls would then completely reinitialize the hardware +before reactivating its class I/O queues. + +More power-aware drivers might prepare the devices for triggering system wakeup +events. + + +Call Sequence Guarantees +------------------------ + +To ensure that bridges and similar links needing to talk to a device are +available when the device is suspended or resumed, the device hierarchy is +walked in a bottom-up order to suspend devices. A top-down order is +used to resume those devices. + +The ordering of the device hierarchy is defined by the order in which devices +get registered: a child can never be registered, probed or resumed before +its parent; and can't be removed or suspended after that parent. + +The policy is that the device hierarchy should match hardware bus topology. +[Or at least the control bus, for devices which use multiple busses.] +In particular, this means that a device registration may fail if the parent of +the device is suspending (i.e. has been chosen by the PM core as the next +device to suspend) or has already suspended, as well as after all of the other +devices have been suspended. Device drivers must be prepared to cope with such +situations. + + +System Power Management Phases +------------------------------ + +Suspending or resuming the system is done in several phases. Different phases +are used for suspend-to-idle, shallow (standby), and deep ("suspend-to-RAM") +sleep states and the hibernation state ("suspend-to-disk"). Each phase involves +executing callbacks for every device before the next phase begins. Not all +buses or classes support all these callbacks and not all drivers use all the +callbacks. The various phases always run after tasks have been frozen and +before they are unfrozen. Furthermore, the ``*_noirq phases`` run at a time +when IRQ handlers have been disabled (except for those marked with the +IRQF_NO_SUSPEND flag). + +All phases use PM domain, bus, type, class or driver callbacks (that is, methods +defined in ``dev->pm_domain->ops``, ``dev->bus->pm``, ``dev->type->pm``, +``dev->class->pm`` or ``dev->driver->pm``). These callbacks are regarded by the +PM core as mutually exclusive. Moreover, PM domain callbacks always take +precedence over all of the other callbacks and, for example, type callbacks take +precedence over bus, class and driver callbacks. To be precise, the following +rules are used to determine which callback to execute in the given phase: + + 1. If ``dev->pm_domain`` is present, the PM core will choose the callback + provided by ``dev->pm_domain->ops`` for execution. + + 2. Otherwise, if both ``dev->type`` and ``dev->type->pm`` are present, the + callback provided by ``dev->type->pm`` will be chosen for execution. + + 3. Otherwise, if both ``dev->class`` and ``dev->class->pm`` are present, + the callback provided by ``dev->class->pm`` will be chosen for + execution. + + 4. Otherwise, if both ``dev->bus`` and ``dev->bus->pm`` are present, the + callback provided by ``dev->bus->pm`` will be chosen for execution. + +This allows PM domains and device types to override callbacks provided by bus +types or device classes if necessary. + +The PM domain, type, class and bus callbacks may in turn invoke device- or +driver-specific methods stored in ``dev->driver->pm``, but they don't have to do +that. + +If the subsystem callback chosen for execution is not present, the PM core will +execute the corresponding method from the ``dev->driver->pm`` set instead if +there is one. + + +Entering System Suspend +----------------------- + +When the system goes into the freeze, standby or memory sleep state, +the phases are: ``prepare``, ``suspend``, ``suspend_late``, ``suspend_noirq``. + + 1. The ``prepare`` phase is meant to prevent races by preventing new + devices from being registered; the PM core would never know that all the + children of a device had been suspended if new children could be + registered at will. [By contrast, from the PM core's perspective, + devices may be unregistered at any time.] Unlike the other + suspend-related phases, during the ``prepare`` phase the device + hierarchy is traversed top-down. + + After the ``->prepare`` callback method returns, no new children may be + registered below the device. The method may also prepare the device or + driver in some way for the upcoming system power transition, but it + should not put the device into a low-power state. + + For devices supporting runtime power management, the return value of the + prepare callback can be used to indicate to the PM core that it may + safely leave the device in runtime suspend (if runtime-suspended + already), provided that all of the device's descendants are also left in + runtime suspend. Namely, if the prepare callback returns a positive + number and that happens for all of the descendants of the device too, + and all of them (including the device itself) are runtime-suspended, the + PM core will skip the ``suspend``, ``suspend_late`` and + ``suspend_noirq`` phases as well as all of the corresponding phases of + the subsequent device resume for all of these devices. In that case, + the ``->complete`` callback will be invoked directly after the + ``->prepare`` callback and is entirely responsible for putting the + device into a consistent state as appropriate. + + Note that this direct-complete procedure applies even if the device is + disabled for runtime PM; only the runtime-PM status matters. It follows + that if a device has system-sleep callbacks but does not support runtime + PM, then its prepare callback must never return a positive value. This + is because all such devices are initially set to runtime-suspended with + runtime PM disabled. + + 2. The ``->suspend`` methods should quiesce the device to stop it from + performing I/O. They also may save the device registers and put it into + the appropriate low-power state, depending on the bus type the device is + on, and they may enable wakeup events. + + 3. For a number of devices it is convenient to split suspend into the + "quiesce device" and "save device state" phases, in which cases + ``suspend_late`` is meant to do the latter. It is always executed after + runtime power management has been disabled for the device in question. + + 4. The ``suspend_noirq`` phase occurs after IRQ handlers have been disabled, + which means that the driver's interrupt handler will not be called while + the callback method is running. The ``->suspend_noirq`` methods should + save the values of the device's registers that weren't saved previously + and finally put the device into the appropriate low-power state. + + The majority of subsystems and device drivers need not implement this + callback. However, bus types allowing devices to share interrupt + vectors, like PCI, generally need it; otherwise a driver might encounter + an error during the suspend phase by fielding a shared interrupt + generated by some other device after its own device had been set to low + power. + +At the end of these phases, drivers should have stopped all I/O transactions +(DMA, IRQs), saved enough state that they can re-initialize or restore previous +state (as needed by the hardware), and placed the device into a low-power state. +On many platforms they will gate off one or more clock sources; sometimes they +will also switch off power supplies or reduce voltages. [Drivers supporting +runtime PM may already have performed some or all of these steps.] + +If :c:func:`device_may_wakeup(dev)` returns ``true``, the device should be +prepared for generating hardware wakeup signals to trigger a system wakeup event +when the system is in the sleep state. For example, :c:func:`enable_irq_wake()` +might identify GPIO signals hooked up to a switch or other external hardware, +and :c:func:`pci_enable_wake()` does something similar for the PCI PME signal. + +If any of these callbacks returns an error, the system won't enter the desired +low-power state. Instead, the PM core will unwind its actions by resuming all +the devices that were suspended. + + +Leaving System Suspend +---------------------- + +When resuming from freeze, standby or memory sleep, the phases are: +``resume_noirq``, ``resume_early``, ``resume``, ``complete``. + + 1. The ``->resume_noirq`` callback methods should perform any actions + needed before the driver's interrupt handlers are invoked. This + generally means undoing the actions of the ``suspend_noirq`` phase. If + the bus type permits devices to share interrupt vectors, like PCI, the + method should bring the device and its driver into a state in which the + driver can recognize if the device is the source of incoming interrupts, + if any, and handle them correctly. + + For example, the PCI bus type's ``->pm.resume_noirq()`` puts the device + into the full-power state (D0 in the PCI terminology) and restores the + standard configuration registers of the device. Then it calls the + device driver's ``->pm.resume_noirq()`` method to perform device-specific + actions. + + 2. The ``->resume_early`` methods should prepare devices for the execution + of the resume methods. This generally involves undoing the actions of + the preceding ``suspend_late`` phase. + + 3. The ``->resume`` methods should bring the device back to its operating + state, so that it can perform normal I/O. This generally involves + undoing the actions of the ``suspend`` phase. + + 4. The ``complete`` phase should undo the actions of the ``prepare`` phase. + For this reason, unlike the other resume-related phases, during the + ``complete`` phase the device hierarchy is traversed bottom-up. + + Note, however, that new children may be registered below the device as + soon as the ``->resume`` callbacks occur; it's not necessary to wait + until the ``complete`` phase with that. + + Moreover, if the preceding ``->prepare`` callback returned a positive + number, the device may have been left in runtime suspend throughout the + whole system suspend and resume (the ``suspend``, ``suspend_late``, + ``suspend_noirq`` phases of system suspend and the ``resume_noirq``, + ``resume_early``, ``resume`` phases of system resume may have been + skipped for it). In that case, the ``->complete`` callback is entirely + responsible for putting the device into a consistent state after system + suspend if necessary. [For example, it may need to queue up a runtime + resume request for the device for this purpose.] To check if that is + the case, the ``->complete`` callback can consult the device's + ``power.direct_complete`` flag. Namely, if that flag is set when the + ``->complete`` callback is being run, it has been called directly after + the preceding ``->prepare`` and special actions may be required + to make the device work correctly afterward. + +At the end of these phases, drivers should be as functional as they were before +suspending: I/O can be performed using DMA and IRQs, and the relevant clocks are +gated on. + +However, the details here may again be platform-specific. For example, +some systems support multiple "run" states, and the mode in effect at +the end of resume might not be the one which preceded suspension. +That means availability of certain clocks or power supplies changed, +which could easily affect how a driver works. + +Drivers need to be able to handle hardware which has been reset since all of the +suspend methods were called, for example by complete reinitialization. +This may be the hardest part, and the one most protected by NDA'd documents +and chip errata. It's simplest if the hardware state hasn't changed since +the suspend was carried out, but that can only be guaranteed if the target +system sleep entered was suspend-to-idle. For the other system sleep states +that may not be the case (and usually isn't for ACPI-defined system sleep +states, like S3). + +Drivers must also be prepared to notice that the device has been removed +while the system was powered down, whenever that's physically possible. +PCMCIA, MMC, USB, Firewire, SCSI, and even IDE are common examples of busses +where common Linux platforms will see such removal. Details of how drivers +will notice and handle such removals are currently bus-specific, and often +involve a separate thread. + +These callbacks may return an error value, but the PM core will ignore such +errors since there's nothing it can do about them other than printing them in +the system log. + + +Entering Hibernation +-------------------- + +Hibernating the system is more complicated than putting it into sleep states, +because it involves creating and saving a system image. Therefore there are +more phases for hibernation, with a different set of callbacks. These phases +always run after tasks have been frozen and enough memory has been freed. + +The general procedure for hibernation is to quiesce all devices ("freeze"), +create an image of the system memory while everything is stable, reactivate all +devices ("thaw"), write the image to permanent storage, and finally shut down +the system ("power off"). The phases used to accomplish this are: ``prepare``, +``freeze``, ``freeze_late``, ``freeze_noirq``, ``thaw_noirq``, ``thaw_early``, +``thaw``, ``complete``, ``prepare``, ``poweroff``, ``poweroff_late``, +``poweroff_noirq``. + + 1. The ``prepare`` phase is discussed in the "Entering System Suspend" + section above. + + 2. The ``->freeze`` methods should quiesce the device so that it doesn't + generate IRQs or DMA, and they may need to save the values of device + registers. However the device does not have to be put in a low-power + state, and to save time it's best not to do so. Also, the device should + not be prepared to generate wakeup events. + + 3. The ``freeze_late`` phase is analogous to the ``suspend_late`` phase + described earlier, except that the device should not be put into a + low-power state and should not be allowed to generate wakeup events. + + 4. The ``freeze_noirq`` phase is analogous to the ``suspend_noirq`` phase + discussed earlier, except again that the device should not be put into + a low-power state and should not be allowed to generate wakeup events. + +At this point the system image is created. All devices should be inactive and +the contents of memory should remain undisturbed while this happens, so that the +image forms an atomic snapshot of the system state. + + 5. The ``thaw_noirq`` phase is analogous to the ``resume_noirq`` phase + discussed earlier. The main difference is that its methods can assume + the device is in the same state as at the end of the ``freeze_noirq`` + phase. + + 6. The ``thaw_early`` phase is analogous to the ``resume_early`` phase + described above. Its methods should undo the actions of the preceding + ``freeze_late``, if necessary. + + 7. The ``thaw`` phase is analogous to the ``resume`` phase discussed + earlier. Its methods should bring the device back to an operating + state, so that it can be used for saving the image if necessary. + + 8. The ``complete`` phase is discussed in the "Leaving System Suspend" + section above. + +At this point the system image is saved, and the devices then need to be +prepared for the upcoming system shutdown. This is much like suspending them +before putting the system into the suspend-to-idle, shallow or deep sleep state, +and the phases are similar. + + 9. The ``prepare`` phase is discussed above. + + 10. The ``poweroff`` phase is analogous to the ``suspend`` phase. + + 11. The ``poweroff_late`` phase is analogous to the ``suspend_late`` phase. + + 12. The ``poweroff_noirq`` phase is analogous to the ``suspend_noirq`` phase. + +The ``->poweroff``, ``->poweroff_late`` and ``->poweroff_noirq`` callbacks +should do essentially the same things as the ``->suspend``, ``->suspend_late`` +and ``->suspend_noirq`` callbacks, respectively. The only notable difference is +that they need not store the device register values, because the registers +should already have been stored during the ``freeze``, ``freeze_late`` or +``freeze_noirq`` phases. + + +Leaving Hibernation +------------------- + +Resuming from hibernation is, again, more complicated than resuming from a sleep +state in which the contents of main memory are preserved, because it requires +a system image to be loaded into memory and the pre-hibernation memory contents +to be restored before control can be passed back to the image kernel. + +Although in principle the image might be loaded into memory and the +pre-hibernation memory contents restored by the boot loader, in practice this +can't be done because boot loaders aren't smart enough and there is no +established protocol for passing the necessary information. So instead, the +boot loader loads a fresh instance of the kernel, called "the restore kernel", +into memory and passes control to it in the usual way. Then the restore kernel +reads the system image, restores the pre-hibernation memory contents, and passes +control to the image kernel. Thus two different kernel instances are involved +in resuming from hibernation. In fact, the restore kernel may be completely +different from the image kernel: a different configuration and even a different +version. This has important consequences for device drivers and their +subsystems. + +To be able to load the system image into memory, the restore kernel needs to +include at least a subset of device drivers allowing it to access the storage +medium containing the image, although it doesn't need to include all of the +drivers present in the image kernel. After the image has been loaded, the +devices managed by the boot kernel need to be prepared for passing control back +to the image kernel. This is very similar to the initial steps involved in +creating a system image, and it is accomplished in the same way, using +``prepare``, ``freeze``, and ``freeze_noirq`` phases. However, the devices +affected by these phases are only those having drivers in the restore kernel; +other devices will still be in whatever state the boot loader left them. + +Should the restoration of the pre-hibernation memory contents fail, the restore +kernel would go through the "thawing" procedure described above, using the +``thaw_noirq``, ``thaw_early``, ``thaw``, and ``complete`` phases, and then +continue running normally. This happens only rarely. Most often the +pre-hibernation memory contents are restored successfully and control is passed +to the image kernel, which then becomes responsible for bringing the system back +to the working state. + +To achieve this, the image kernel must restore the devices' pre-hibernation +functionality. The operation is much like waking up from a sleep state (with +the memory contents preserved), although it involves different phases: +``restore_noirq``, ``restore_early``, ``restore``, ``complete``. + + 1. The ``restore_noirq`` phase is analogous to the ``resume_noirq`` phase. + + 2. The ``restore_early`` phase is analogous to the ``resume_early`` phase. + + 3. The ``restore`` phase is analogous to the ``resume`` phase. + + 4. The ``complete`` phase is discussed above. + +The main difference from ``resume[_early|_noirq]`` is that +``restore[_early|_noirq]`` must assume the device has been accessed and +reconfigured by the boot loader or the restore kernel. Consequently, the state +of the device may be different from the state remembered from the ``freeze``, +``freeze_late`` and ``freeze_noirq`` phases. The device may even need to be +reset and completely re-initialized. In many cases this difference doesn't +matter, so the ``->resume[_early|_noirq]`` and ``->restore[_early|_norq]`` +method pointers can be set to the same routines. Nevertheless, different +callback pointers are used in case there is a situation where it actually does +matter. + + +Power Management Notifiers +========================== + +There are some operations that cannot be carried out by the power management +callbacks discussed above, because the callbacks occur too late or too early. +To handle these cases, subsystems and device drivers may register power +management notifiers that are called before tasks are frozen and after they have +been thawed. Generally speaking, the PM notifiers are suitable for performing +actions that either require user space to be available, or at least won't +interfere with user space. + +For details refer to :doc:`notifiers`. + + +Device Low-Power (suspend) States +================================= + +Device low-power states aren't standard. One device might only handle +"on" and "off", while another might support a dozen different versions of +"on" (how many engines are active?), plus a state that gets back to "on" +faster than from a full "off". + +Some buses define rules about what different suspend states mean. PCI +gives one example: after the suspend sequence completes, a non-legacy +PCI device may not perform DMA or issue IRQs, and any wakeup events it +issues would be issued through the PME# bus signal. Plus, there are +several PCI-standard device states, some of which are optional. + +In contrast, integrated system-on-chip processors often use IRQs as the +wakeup event sources (so drivers would call :c:func:`enable_irq_wake`) and +might be able to treat DMA completion as a wakeup event (sometimes DMA can stay +active too, it'd only be the CPU and some peripherals that sleep). + +Some details here may be platform-specific. Systems may have devices that +can be fully active in certain sleep states, such as an LCD display that's +refreshed using DMA while most of the system is sleeping lightly ... and +its frame buffer might even be updated by a DSP or other non-Linux CPU while +the Linux control processor stays idle. + +Moreover, the specific actions taken may depend on the target system state. +One target system state might allow a given device to be very operational; +another might require a hard shut down with re-initialization on resume. +And two different target systems might use the same device in different +ways; the aforementioned LCD might be active in one product's "standby", +but a different product using the same SOC might work differently. + + +Device Power Management Domains +=============================== + +Sometimes devices share reference clocks or other power resources. In those +cases it generally is not possible to put devices into low-power states +individually. Instead, a set of devices sharing a power resource can be put +into a low-power state together at the same time by turning off the shared +power resource. Of course, they also need to be put into the full-power state +together, by turning the shared power resource on. A set of devices with this +property is often referred to as a power domain. A power domain may also be +nested inside another power domain. The nested domain is referred to as the +sub-domain of the parent domain. + +Support for power domains is provided through the :c:member:`pm_domain` field of +|struct device|. This field is a pointer to an object of type +|struct dev_pm_domain|, defined in :file:`include/linux/pm.h``, providing a set +of power management callbacks analogous to the subsystem-level and device driver +callbacks that are executed for the given device during all power transitions, +instead of the respective subsystem-level callbacks. Specifically, if a +device's :c:member:`pm_domain` pointer is not NULL, the ``->suspend()`` callback +from the object pointed to by it will be executed instead of its subsystem's +(e.g. bus type's) ``->suspend()`` callback and analogously for all of the +remaining callbacks. In other words, power management domain callbacks, if +defined for the given device, always take precedence over the callbacks provided +by the device's subsystem (e.g. bus type). + +The support for device power management domains is only relevant to platforms +needing to use the same device driver power management callbacks in many +different power domain configurations and wanting to avoid incorporating the +support for power domains into subsystem-level callbacks, for example by +modifying the platform bus type. Other platforms need not implement it or take +it into account in any way. + +Devices may be defined as IRQ-safe which indicates to the PM core that their +runtime PM callbacks may be invoked with disabled interrupts (see +:file:`Documentation/power/runtime_pm.txt` for more information). If an +IRQ-safe device belongs to a PM domain, the runtime PM of the domain will be +disallowed, unless the domain itself is defined as IRQ-safe. However, it +makes sense to define a PM domain as IRQ-safe only if all the devices in it +are IRQ-safe. Moreover, if an IRQ-safe domain has a parent domain, the runtime +PM of the parent is only allowed if the parent itself is IRQ-safe too with the +additional restriction that all child domains of an IRQ-safe parent must also +be IRQ-safe. + + +Runtime Power Management +======================== + +Many devices are able to dynamically power down while the system is still +running. This feature is useful for devices that are not being used, and +can offer significant power savings on a running system. These devices +often support a range of runtime power states, which might use names such +as "off", "sleep", "idle", "active", and so on. Those states will in some +cases (like PCI) be partially constrained by the bus the device uses, and will +usually include hardware states that are also used in system sleep states. + +A system-wide power transition can be started while some devices are in low +power states due to runtime power management. The system sleep PM callbacks +should recognize such situations and react to them appropriately, but the +necessary actions are subsystem-specific. + +In some cases the decision may be made at the subsystem level while in other +cases the device driver may be left to decide. In some cases it may be +desirable to leave a suspended device in that state during a system-wide power +transition, but in other cases the device must be put back into the full-power +state temporarily, for example so that its system wakeup capability can be +disabled. This all depends on the hardware and the design of the subsystem and +device driver in question. + +During system-wide resume from a sleep state it's easiest to put devices into +the full-power state, as explained in :file:`Documentation/power/runtime_pm.txt`. +Refer to that document for more information regarding this particular issue as +well as for information on the device runtime power management framework in +general. diff --git a/Documentation/driver-api/pm/index.rst b/Documentation/driver-api/pm/index.rst new file mode 100644 index 000000000000..2f6d0e9cf6b7 --- /dev/null +++ b/Documentation/driver-api/pm/index.rst @@ -0,0 +1,16 @@ +======================= +Device Power Management +======================= + +.. toctree:: + + devices + notifiers + types + +.. only:: subproject and html + + Indices + ======= + + * :ref:`genindex` diff --git a/Documentation/driver-api/pm/notifiers.rst b/Documentation/driver-api/pm/notifiers.rst new file mode 100644 index 000000000000..62f860026992 --- /dev/null +++ b/Documentation/driver-api/pm/notifiers.rst @@ -0,0 +1,70 @@ +============================= +Suspend/Hibernation Notifiers +============================= + +:: + + Copyright (c) 2016 Intel Corp., Rafael J. Wysocki <rafael.j.wysocki@intel.com> + +There are some operations that subsystems or drivers may want to carry out +before hibernation/suspend or after restore/resume, but they require the system +to be fully functional, so the drivers' and subsystems' ``->suspend()`` and +``->resume()`` or even ``->prepare()`` and ``->complete()`` callbacks are not +suitable for this purpose. + +For example, device drivers may want to upload firmware to their devices after +resume/restore, but they cannot do it by calling :c:func:`request_firmware()` +from their ``->resume()`` or ``->complete()`` callback routines (user land +processes are frozen at these points). The solution may be to load the firmware +into memory before processes are frozen and upload it from there in the +``->resume()`` routine. A suspend/hibernation notifier may be used for that. + +Subsystems or drivers having such needs can register suspend notifiers that +will be called upon the following events by the PM core: + +``PM_HIBERNATION_PREPARE`` + The system is going to hibernate, tasks will be frozen immediately. This + is different from ``PM_SUSPEND_PREPARE`` below, because in this case + additional work is done between the notifiers and the invocation of PM + callbacks for the "freeze" transition. + +``PM_POST_HIBERNATION`` + The system memory state has been restored from a hibernation image or an + error occurred during hibernation. Device restore callbacks have been + executed and tasks have been thawed. + +``PM_RESTORE_PREPARE`` + The system is going to restore a hibernation image. If all goes well, + the restored image kernel will issue a ``PM_POST_HIBERNATION`` + notification. + +``PM_POST_RESTORE`` + An error occurred during restore from hibernation. Device restore + callbacks have been executed and tasks have been thawed. + +``PM_SUSPEND_PREPARE`` + The system is preparing for suspend. + +``PM_POST_SUSPEND`` + The system has just resumed or an error occurred during suspend. Device + resume callbacks have been executed and tasks have been thawed. + +It is generally assumed that whatever the notifiers do for +``PM_HIBERNATION_PREPARE``, should be undone for ``PM_POST_HIBERNATION``. +Analogously, operations carried out for ``PM_SUSPEND_PREPARE`` should be +reversed for ``PM_POST_SUSPEND``. + +Moreover, if one of the notifiers fails for the ``PM_HIBERNATION_PREPARE`` or +``PM_SUSPEND_PREPARE`` event, the notifiers that have already succeeded for that +event will be called for ``PM_POST_HIBERNATION`` or ``PM_POST_SUSPEND``, +respectively. + +The hibernation and suspend notifiers are called with :c:data:`pm_mutex` held. +They are defined in the usual way, but their last argument is meaningless (it is +always NULL). + +To register and/or unregister a suspend notifier use +:c:func:`register_pm_notifier()` and :c:func:`unregister_pm_notifier()`, +respectively (both defined in :file:`include/linux/suspend.h`). If you don't +need to unregister the notifier, you can also use the :c:func:`pm_notifier()` +macro defined in :file:`include/linux/suspend.h`. diff --git a/Documentation/driver-api/pm/types.rst b/Documentation/driver-api/pm/types.rst new file mode 100644 index 000000000000..3ebdecc54104 --- /dev/null +++ b/Documentation/driver-api/pm/types.rst @@ -0,0 +1,5 @@ +================================== +Device Power Management Data Types +================================== + +.. kernel-doc:: include/linux/pm.h diff --git a/Documentation/driver-api/regulator.rst b/Documentation/driver-api/regulator.rst new file mode 100644 index 000000000000..520da0a5251d --- /dev/null +++ b/Documentation/driver-api/regulator.rst @@ -0,0 +1,170 @@ +.. Copyright 2007-2008 Wolfson Microelectronics + +.. This documentation is free software; you can redistribute +.. it and/or modify it under the terms of the GNU General Public +.. License version 2 as published by the Free Software Foundation. + +================================= +Voltage and current regulator API +================================= + +:Author: Liam Girdwood +:Author: Mark Brown + +Introduction +============ + +This framework is designed to provide a standard kernel interface to +control voltage and current regulators. + +The intention is to allow systems to dynamically control regulator power +output in order to save power and prolong battery life. This applies to +both voltage regulators (where voltage output is controllable) and +current sinks (where current limit is controllable). + +Note that additional (and currently more complete) documentation is +available in the Linux kernel source under +``Documentation/power/regulator``. + +Glossary +-------- + +The regulator API uses a number of terms which may not be familiar: + +Regulator + + Electronic device that supplies power to other devices. Most regulators + can enable and disable their output and some can also control their + output voltage or current. + +Consumer + + Electronic device which consumes power provided by a regulator. These + may either be static, requiring only a fixed supply, or dynamic, + requiring active management of the regulator at runtime. + +Power Domain + + The electronic circuit supplied by a given regulator, including the + regulator and all consumer devices. The configuration of the regulator + is shared between all the components in the circuit. + +Power Management Integrated Circuit (PMIC) + + An IC which contains numerous regulators and often also other + subsystems. In an embedded system the primary PMIC is often equivalent + to a combination of the PSU and southbridge in a desktop system. + +Consumer driver interface +========================= + +This offers a similar API to the kernel clock framework. Consumer +drivers use `get <#API-regulator-get>`__ and +`put <#API-regulator-put>`__ operations to acquire and release +regulators. Functions are provided to `enable <#API-regulator-enable>`__ +and `disable <#API-regulator-disable>`__ the regulator and to get and +set the runtime parameters of the regulator. + +When requesting regulators consumers use symbolic names for their +supplies, such as "Vcc", which are mapped into actual regulator devices +by the machine interface. + +A stub version of this API is provided when the regulator framework is +not in use in order to minimise the need to use ifdefs. + +Enabling and disabling +---------------------- + +The regulator API provides reference counted enabling and disabling of +regulators. Consumer devices use the :c:func:`regulator_enable()` and +:c:func:`regulator_disable()` functions to enable and disable +regulators. Calls to the two functions must be balanced. + +Note that since multiple consumers may be using a regulator and machine +constraints may not allow the regulator to be disabled there is no +guarantee that calling :c:func:`regulator_disable()` will actually +cause the supply provided by the regulator to be disabled. Consumer +drivers should assume that the regulator may be enabled at all times. + +Configuration +------------- + +Some consumer devices may need to be able to dynamically configure their +supplies. For example, MMC drivers may need to select the correct +operating voltage for their cards. This may be done while the regulator +is enabled or disabled. + +The :c:func:`regulator_set_voltage()` and +:c:func:`regulator_set_current_limit()` functions provide the primary +interface for this. Both take ranges of voltages and currents, supporting +drivers that do not require a specific value (eg, CPU frequency scaling +normally permits the CPU to use a wider range of supply voltages at lower +frequencies but does not require that the supply voltage be lowered). Where +an exact value is required both minimum and maximum values should be +identical. + +Callbacks +--------- + +Callbacks may also be registered for events such as regulation failures. + +Regulator driver interface +========================== + +Drivers for regulator chips register the regulators with the regulator +core, providing operations structures to the core. A notifier interface +allows error conditions to be reported to the core. + +Registration should be triggered by explicit setup done by the platform, +supplying a struct :c:type:`regulator_init_data` for the regulator +containing constraint and supply information. + +Machine interface +================= + +This interface provides a way to define how regulators are connected to +consumers on a given system and what the valid operating parameters are +for the system. + +Supplies +-------- + +Regulator supplies are specified using struct +:c:type:`regulator_consumer_supply`. This is done at driver registration +time as part of the machine constraints. + +Constraints +----------- + +As well as defining the connections the machine interface also provides +constraints defining the operations that clients are allowed to perform +and the parameters that may be set. This is required since generally +regulator devices will offer more flexibility than it is safe to use on +a given system, for example supporting higher supply voltages than the +consumers are rated for. + +This is done at driver registration time` by providing a +struct :c:type:`regulation_constraints`. + +The constraints may also specify an initial configuration for the +regulator in the constraints, which is particularly useful for use with +static consumers. + +API reference +============= + +Due to limitations of the kernel documentation framework and the +existing layout of the source code the entire regulator API is +documented here. + +.. kernel-doc:: include/linux/regulator/consumer.h + :internal: + +.. kernel-doc:: include/linux/regulator/machine.h + :internal: + +.. kernel-doc:: include/linux/regulator/driver.h + :internal: + +.. kernel-doc:: drivers/regulator/core.c + :export: diff --git a/Documentation/driver-api/uio-howto.rst b/Documentation/driver-api/uio-howto.rst new file mode 100644 index 000000000000..f73d660b2956 --- /dev/null +++ b/Documentation/driver-api/uio-howto.rst @@ -0,0 +1,705 @@ +======================= +The Userspace I/O HOWTO +======================= + +:Author: Hans-Jürgen Koch Linux developer, Linutronix +:Date: 2006-12-11 + +About this document +=================== + +Translations +------------ + +If you know of any translations for this document, or you are interested +in translating it, please email me hjk@hansjkoch.de. + +Preface +------- + +For many types of devices, creating a Linux kernel driver is overkill. +All that is really needed is some way to handle an interrupt and provide +access to the memory space of the device. The logic of controlling the +device does not necessarily have to be within the kernel, as the device +does not need to take advantage of any of other resources that the +kernel provides. One such common class of devices that are like this are +for industrial I/O cards. + +To address this situation, the userspace I/O system (UIO) was designed. +For typical industrial I/O cards, only a very small kernel module is +needed. The main part of the driver will run in user space. This +simplifies development and reduces the risk of serious bugs within a +kernel module. + +Please note that UIO is not an universal driver interface. Devices that +are already handled well by other kernel subsystems (like networking or +serial or USB) are no candidates for an UIO driver. Hardware that is +ideally suited for an UIO driver fulfills all of the following: + +- The device has memory that can be mapped. The device can be + controlled completely by writing to this memory. + +- The device usually generates interrupts. + +- The device does not fit into one of the standard kernel subsystems. + +Acknowledgments +--------------- + +I'd like to thank Thomas Gleixner and Benedikt Spranger of Linutronix, +who have not only written most of the UIO code, but also helped greatly +writing this HOWTO by giving me all kinds of background information. + +Feedback +-------- + +Find something wrong with this document? (Or perhaps something right?) I +would love to hear from you. Please email me at hjk@hansjkoch.de. + +About UIO +========= + +If you use UIO for your card's driver, here's what you get: + +- only one small kernel module to write and maintain. + +- develop the main part of your driver in user space, with all the + tools and libraries you're used to. + +- bugs in your driver won't crash the kernel. + +- updates of your driver can take place without recompiling the kernel. + +How UIO works +------------- + +Each UIO device is accessed through a device file and several sysfs +attribute files. The device file will be called ``/dev/uio0`` for the +first device, and ``/dev/uio1``, ``/dev/uio2`` and so on for subsequent +devices. + +``/dev/uioX`` is used to access the address space of the card. Just use +:c:func:`mmap()` to access registers or RAM locations of your card. + +Interrupts are handled by reading from ``/dev/uioX``. A blocking +:c:func:`read()` from ``/dev/uioX`` will return as soon as an +interrupt occurs. You can also use :c:func:`select()` on +``/dev/uioX`` to wait for an interrupt. The integer value read from +``/dev/uioX`` represents the total interrupt count. You can use this +number to figure out if you missed some interrupts. + +For some hardware that has more than one interrupt source internally, +but not separate IRQ mask and status registers, there might be +situations where userspace cannot determine what the interrupt source +was if the kernel handler disables them by writing to the chip's IRQ +register. In such a case, the kernel has to disable the IRQ completely +to leave the chip's register untouched. Now the userspace part can +determine the cause of the interrupt, but it cannot re-enable +interrupts. Another cornercase is chips where re-enabling interrupts is +a read-modify-write operation to a combined IRQ status/acknowledge +register. This would be racy if a new interrupt occurred simultaneously. + +To address these problems, UIO also implements a write() function. It is +normally not used and can be ignored for hardware that has only a single +interrupt source or has separate IRQ mask and status registers. If you +need it, however, a write to ``/dev/uioX`` will call the +:c:func:`irqcontrol()` function implemented by the driver. You have +to write a 32-bit value that is usually either 0 or 1 to disable or +enable interrupts. If a driver does not implement +:c:func:`irqcontrol()`, :c:func:`write()` will return with +``-ENOSYS``. + +To handle interrupts properly, your custom kernel module can provide its +own interrupt handler. It will automatically be called by the built-in +handler. + +For cards that don't generate interrupts but need to be polled, there is +the possibility to set up a timer that triggers the interrupt handler at +configurable time intervals. This interrupt simulation is done by +calling :c:func:`uio_event_notify()` from the timer's event +handler. + +Each driver provides attributes that are used to read or write +variables. These attributes are accessible through sysfs files. A custom +kernel driver module can add its own attributes to the device owned by +the uio driver, but not added to the UIO device itself at this time. +This might change in the future if it would be found to be useful. + +The following standard attributes are provided by the UIO framework: + +- ``name``: The name of your device. It is recommended to use the name + of your kernel module for this. + +- ``version``: A version string defined by your driver. This allows the + user space part of your driver to deal with different versions of the + kernel module. + +- ``event``: The total number of interrupts handled by the driver since + the last time the device node was read. + +These attributes appear under the ``/sys/class/uio/uioX`` directory. +Please note that this directory might be a symlink, and not a real +directory. Any userspace code that accesses it must be able to handle +this. + +Each UIO device can make one or more memory regions available for memory +mapping. This is necessary because some industrial I/O cards require +access to more than one PCI memory region in a driver. + +Each mapping has its own directory in sysfs, the first mapping appears +as ``/sys/class/uio/uioX/maps/map0/``. Subsequent mappings create +directories ``map1/``, ``map2/``, and so on. These directories will only +appear if the size of the mapping is not 0. + +Each ``mapX/`` directory contains four read-only files that show +attributes of the memory: + +- ``name``: A string identifier for this mapping. This is optional, the + string can be empty. Drivers can set this to make it easier for + userspace to find the correct mapping. + +- ``addr``: The address of memory that can be mapped. + +- ``size``: The size, in bytes, of the memory pointed to by addr. + +- ``offset``: The offset, in bytes, that has to be added to the pointer + returned by :c:func:`mmap()` to get to the actual device memory. + This is important if the device's memory is not page aligned. + Remember that pointers returned by :c:func:`mmap()` are always + page aligned, so it is good style to always add this offset. + +From userspace, the different mappings are distinguished by adjusting +the ``offset`` parameter of the :c:func:`mmap()` call. To map the +memory of mapping N, you have to use N times the page size as your +offset:: + + offset = N * getpagesize(); + +Sometimes there is hardware with memory-like regions that can not be +mapped with the technique described here, but there are still ways to +access them from userspace. The most common example are x86 ioports. On +x86 systems, userspace can access these ioports using +:c:func:`ioperm()`, :c:func:`iopl()`, :c:func:`inb()`, +:c:func:`outb()`, and similar functions. + +Since these ioport regions can not be mapped, they will not appear under +``/sys/class/uio/uioX/maps/`` like the normal memory described above. +Without information about the port regions a hardware has to offer, it +becomes difficult for the userspace part of the driver to find out which +ports belong to which UIO device. + +To address this situation, the new directory +``/sys/class/uio/uioX/portio/`` was added. It only exists if the driver +wants to pass information about one or more port regions to userspace. +If that is the case, subdirectories named ``port0``, ``port1``, and so +on, will appear underneath ``/sys/class/uio/uioX/portio/``. + +Each ``portX/`` directory contains four read-only files that show name, +start, size, and type of the port region: + +- ``name``: A string identifier for this port region. The string is + optional and can be empty. Drivers can set it to make it easier for + userspace to find a certain port region. + +- ``start``: The first port of this region. + +- ``size``: The number of ports in this region. + +- ``porttype``: A string describing the type of port. + +Writing your own kernel module +============================== + +Please have a look at ``uio_cif.c`` as an example. The following +paragraphs explain the different sections of this file. + +struct uio_info +--------------- + +This structure tells the framework the details of your driver, Some of +the members are required, others are optional. + +- ``const char *name``: Required. The name of your driver as it will + appear in sysfs. I recommend using the name of your module for this. + +- ``const char *version``: Required. This string appears in + ``/sys/class/uio/uioX/version``. + +- ``struct uio_mem mem[ MAX_UIO_MAPS ]``: Required if you have memory + that can be mapped with :c:func:`mmap()`. For each mapping you + need to fill one of the ``uio_mem`` structures. See the description + below for details. + +- ``struct uio_port port[ MAX_UIO_PORTS_REGIONS ]``: Required if you + want to pass information about ioports to userspace. For each port + region you need to fill one of the ``uio_port`` structures. See the + description below for details. + +- ``long irq``: Required. If your hardware generates an interrupt, it's + your modules task to determine the irq number during initialization. + If you don't have a hardware generated interrupt but want to trigger + the interrupt handler in some other way, set ``irq`` to + ``UIO_IRQ_CUSTOM``. If you had no interrupt at all, you could set + ``irq`` to ``UIO_IRQ_NONE``, though this rarely makes sense. + +- ``unsigned long irq_flags``: Required if you've set ``irq`` to a + hardware interrupt number. The flags given here will be used in the + call to :c:func:`request_irq()`. + +- ``int (*mmap)(struct uio_info *info, struct vm_area_struct *vma)``: + Optional. If you need a special :c:func:`mmap()` + function, you can set it here. If this pointer is not NULL, your + :c:func:`mmap()` will be called instead of the built-in one. + +- ``int (*open)(struct uio_info *info, struct inode *inode)``: + Optional. You might want to have your own :c:func:`open()`, + e.g. to enable interrupts only when your device is actually used. + +- ``int (*release)(struct uio_info *info, struct inode *inode)``: + Optional. If you define your own :c:func:`open()`, you will + probably also want a custom :c:func:`release()` function. + +- ``int (*irqcontrol)(struct uio_info *info, s32 irq_on)``: + Optional. If you need to be able to enable or disable interrupts + from userspace by writing to ``/dev/uioX``, you can implement this + function. The parameter ``irq_on`` will be 0 to disable interrupts + and 1 to enable them. + +Usually, your device will have one or more memory regions that can be +mapped to user space. For each region, you have to set up a +``struct uio_mem`` in the ``mem[]`` array. Here's a description of the +fields of ``struct uio_mem``: + +- ``const char *name``: Optional. Set this to help identify the memory + region, it will show up in the corresponding sysfs node. + +- ``int memtype``: Required if the mapping is used. Set this to + ``UIO_MEM_PHYS`` if you you have physical memory on your card to be + mapped. Use ``UIO_MEM_LOGICAL`` for logical memory (e.g. allocated + with :c:func:`kmalloc()`). There's also ``UIO_MEM_VIRTUAL`` for + virtual memory. + +- ``phys_addr_t addr``: Required if the mapping is used. Fill in the + address of your memory block. This address is the one that appears in + sysfs. + +- ``resource_size_t size``: Fill in the size of the memory block that + ``addr`` points to. If ``size`` is zero, the mapping is considered + unused. Note that you *must* initialize ``size`` with zero for all + unused mappings. + +- ``void *internal_addr``: If you have to access this memory region + from within your kernel module, you will want to map it internally by + using something like :c:func:`ioremap()`. Addresses returned by + this function cannot be mapped to user space, so you must not store + it in ``addr``. Use ``internal_addr`` instead to remember such an + address. + +Please do not touch the ``map`` element of ``struct uio_mem``! It is +used by the UIO framework to set up sysfs files for this mapping. Simply +leave it alone. + +Sometimes, your device can have one or more port regions which can not +be mapped to userspace. But if there are other possibilities for +userspace to access these ports, it makes sense to make information +about the ports available in sysfs. For each region, you have to set up +a ``struct uio_port`` in the ``port[]`` array. Here's a description of +the fields of ``struct uio_port``: + +- ``char *porttype``: Required. Set this to one of the predefined + constants. Use ``UIO_PORT_X86`` for the ioports found in x86 + architectures. + +- ``unsigned long start``: Required if the port region is used. Fill in + the number of the first port of this region. + +- ``unsigned long size``: Fill in the number of ports in this region. + If ``size`` is zero, the region is considered unused. Note that you + *must* initialize ``size`` with zero for all unused regions. + +Please do not touch the ``portio`` element of ``struct uio_port``! It is +used internally by the UIO framework to set up sysfs files for this +region. Simply leave it alone. + +Adding an interrupt handler +--------------------------- + +What you need to do in your interrupt handler depends on your hardware +and on how you want to handle it. You should try to keep the amount of +code in your kernel interrupt handler low. If your hardware requires no +action that you *have* to perform after each interrupt, then your +handler can be empty. + +If, on the other hand, your hardware *needs* some action to be performed +after each interrupt, then you *must* do it in your kernel module. Note +that you cannot rely on the userspace part of your driver. Your +userspace program can terminate at any time, possibly leaving your +hardware in a state where proper interrupt handling is still required. + +There might also be applications where you want to read data from your +hardware at each interrupt and buffer it in a piece of kernel memory +you've allocated for that purpose. With this technique you could avoid +loss of data if your userspace program misses an interrupt. + +A note on shared interrupts: Your driver should support interrupt +sharing whenever this is possible. It is possible if and only if your +driver can detect whether your hardware has triggered the interrupt or +not. This is usually done by looking at an interrupt status register. If +your driver sees that the IRQ bit is actually set, it will perform its +actions, and the handler returns IRQ_HANDLED. If the driver detects +that it was not your hardware that caused the interrupt, it will do +nothing and return IRQ_NONE, allowing the kernel to call the next +possible interrupt handler. + +If you decide not to support shared interrupts, your card won't work in +computers with no free interrupts. As this frequently happens on the PC +platform, you can save yourself a lot of trouble by supporting interrupt +sharing. + +Using uio_pdrv for platform devices +----------------------------------- + +In many cases, UIO drivers for platform devices can be handled in a +generic way. In the same place where you define your +``struct platform_device``, you simply also implement your interrupt +handler and fill your ``struct uio_info``. A pointer to this +``struct uio_info`` is then used as ``platform_data`` for your platform +device. + +You also need to set up an array of ``struct resource`` containing +addresses and sizes of your memory mappings. This information is passed +to the driver using the ``.resource`` and ``.num_resources`` elements of +``struct platform_device``. + +You now have to set the ``.name`` element of ``struct platform_device`` +to ``"uio_pdrv"`` to use the generic UIO platform device driver. This +driver will fill the ``mem[]`` array according to the resources given, +and register the device. + +The advantage of this approach is that you only have to edit a file you +need to edit anyway. You do not have to create an extra driver. + +Using uio_pdrv_genirq for platform devices +------------------------------------------ + +Especially in embedded devices, you frequently find chips where the irq +pin is tied to its own dedicated interrupt line. In such cases, where +you can be really sure the interrupt is not shared, we can take the +concept of ``uio_pdrv`` one step further and use a generic interrupt +handler. That's what ``uio_pdrv_genirq`` does. + +The setup for this driver is the same as described above for +``uio_pdrv``, except that you do not implement an interrupt handler. The +``.handler`` element of ``struct uio_info`` must remain ``NULL``. The +``.irq_flags`` element must not contain ``IRQF_SHARED``. + +You will set the ``.name`` element of ``struct platform_device`` to +``"uio_pdrv_genirq"`` to use this driver. + +The generic interrupt handler of ``uio_pdrv_genirq`` will simply disable +the interrupt line using :c:func:`disable_irq_nosync()`. After +doing its work, userspace can reenable the interrupt by writing +0x00000001 to the UIO device file. The driver already implements an +:c:func:`irq_control()` to make this possible, you must not +implement your own. + +Using ``uio_pdrv_genirq`` not only saves a few lines of interrupt +handler code. You also do not need to know anything about the chip's +internal registers to create the kernel part of the driver. All you need +to know is the irq number of the pin the chip is connected to. + +Using uio_dmem_genirq for platform devices +------------------------------------------ + +In addition to statically allocated memory ranges, they may also be a +desire to use dynamically allocated regions in a user space driver. In +particular, being able to access memory made available through the +dma-mapping API, may be particularly useful. The ``uio_dmem_genirq`` +driver provides a way to accomplish this. + +This driver is used in a similar manner to the ``"uio_pdrv_genirq"`` +driver with respect to interrupt configuration and handling. + +Set the ``.name`` element of ``struct platform_device`` to +``"uio_dmem_genirq"`` to use this driver. + +When using this driver, fill in the ``.platform_data`` element of +``struct platform_device``, which is of type +``struct uio_dmem_genirq_pdata`` and which contains the following +elements: + +- ``struct uio_info uioinfo``: The same structure used as the + ``uio_pdrv_genirq`` platform data + +- ``unsigned int *dynamic_region_sizes``: Pointer to list of sizes of + dynamic memory regions to be mapped into user space. + +- ``unsigned int num_dynamic_regions``: Number of elements in + ``dynamic_region_sizes`` array. + +The dynamic regions defined in the platform data will be appended to the +`` mem[] `` array after the platform device resources, which implies +that the total number of static and dynamic memory regions cannot exceed +``MAX_UIO_MAPS``. + +The dynamic memory regions will be allocated when the UIO device file, +``/dev/uioX`` is opened. Similar to static memory resources, the memory +region information for dynamic regions is then visible via sysfs at +``/sys/class/uio/uioX/maps/mapY/*``. The dynamic memory regions will be +freed when the UIO device file is closed. When no processes are holding +the device file open, the address returned to userspace is ~0. + +Writing a driver in userspace +============================= + +Once you have a working kernel module for your hardware, you can write +the userspace part of your driver. You don't need any special libraries, +your driver can be written in any reasonable language, you can use +floating point numbers and so on. In short, you can use all the tools +and libraries you'd normally use for writing a userspace application. + +Getting information about your UIO device +----------------------------------------- + +Information about all UIO devices is available in sysfs. The first thing +you should do in your driver is check ``name`` and ``version`` to make +sure your talking to the right device and that its kernel driver has the +version you expect. + +You should also make sure that the memory mapping you need exists and +has the size you expect. + +There is a tool called ``lsuio`` that lists UIO devices and their +attributes. It is available here: + +http://www.osadl.org/projects/downloads/UIO/user/ + +With ``lsuio`` you can quickly check if your kernel module is loaded and +which attributes it exports. Have a look at the manpage for details. + +The source code of ``lsuio`` can serve as an example for getting +information about an UIO device. The file ``uio_helper.c`` contains a +lot of functions you could use in your userspace driver code. + +mmap() device memory +-------------------- + +After you made sure you've got the right device with the memory mappings +you need, all you have to do is to call :c:func:`mmap()` to map the +device's memory to userspace. + +The parameter ``offset`` of the :c:func:`mmap()` call has a special +meaning for UIO devices: It is used to select which mapping of your +device you want to map. To map the memory of mapping N, you have to use +N times the page size as your offset:: + + offset = N * getpagesize(); + +N starts from zero, so if you've got only one memory range to map, set +``offset = 0``. A drawback of this technique is that memory is always +mapped beginning with its start address. + +Waiting for interrupts +---------------------- + +After you successfully mapped your devices memory, you can access it +like an ordinary array. Usually, you will perform some initialization. +After that, your hardware starts working and will generate an interrupt +as soon as it's finished, has some data available, or needs your +attention because an error occurred. + +``/dev/uioX`` is a read-only file. A :c:func:`read()` will always +block until an interrupt occurs. There is only one legal value for the +``count`` parameter of :c:func:`read()`, and that is the size of a +signed 32 bit integer (4). Any other value for ``count`` causes +:c:func:`read()` to fail. The signed 32 bit integer read is the +interrupt count of your device. If the value is one more than the value +you read the last time, everything is OK. If the difference is greater +than one, you missed interrupts. + +You can also use :c:func:`select()` on ``/dev/uioX``. + +Generic PCI UIO driver +====================== + +The generic driver is a kernel module named uio_pci_generic. It can +work with any device compliant to PCI 2.3 (circa 2002) and any compliant +PCI Express device. Using this, you only need to write the userspace +driver, removing the need to write a hardware-specific kernel module. + +Making the driver recognize the device +-------------------------------------- + +Since the driver does not declare any device ids, it will not get loaded +automatically and will not automatically bind to any devices, you must +load it and allocate id to the driver yourself. For example:: + + modprobe uio_pci_generic + echo "8086 10f5" > /sys/bus/pci/drivers/uio_pci_generic/new_id + +If there already is a hardware specific kernel driver for your device, +the generic driver still won't bind to it, in this case if you want to +use the generic driver (why would you?) you'll have to manually unbind +the hardware specific driver and bind the generic driver, like this:: + + echo -n 0000:00:19.0 > /sys/bus/pci/drivers/e1000e/unbind + echo -n 0000:00:19.0 > /sys/bus/pci/drivers/uio_pci_generic/bind + +You can verify that the device has been bound to the driver by looking +for it in sysfs, for example like the following:: + + ls -l /sys/bus/pci/devices/0000:00:19.0/driver + +Which if successful should print:: + + .../0000:00:19.0/driver -> ../../../bus/pci/drivers/uio_pci_generic + +Note that the generic driver will not bind to old PCI 2.2 devices. If +binding the device failed, run the following command:: + + dmesg + +and look in the output for failure reasons. + +Things to know about uio_pci_generic +------------------------------------ + +Interrupts are handled using the Interrupt Disable bit in the PCI +command register and Interrupt Status bit in the PCI status register. +All devices compliant to PCI 2.3 (circa 2002) and all compliant PCI +Express devices should support these bits. uio_pci_generic detects +this support, and won't bind to devices which do not support the +Interrupt Disable Bit in the command register. + +On each interrupt, uio_pci_generic sets the Interrupt Disable bit. +This prevents the device from generating further interrupts until the +bit is cleared. The userspace driver should clear this bit before +blocking and waiting for more interrupts. + +Writing userspace driver using uio_pci_generic +------------------------------------------------ + +Userspace driver can use pci sysfs interface, or the libpci library that +wraps it, to talk to the device and to re-enable interrupts by writing +to the command register. + +Example code using uio_pci_generic +---------------------------------- + +Here is some sample userspace driver code using uio_pci_generic:: + + #include <stdlib.h> + #include <stdio.h> + #include <unistd.h> + #include <sys/types.h> + #include <sys/stat.h> + #include <fcntl.h> + #include <errno.h> + + int main() + { + int uiofd; + int configfd; + int err; + int i; + unsigned icount; + unsigned char command_high; + + uiofd = open("/dev/uio0", O_RDONLY); + if (uiofd < 0) { + perror("uio open:"); + return errno; + } + configfd = open("/sys/class/uio/uio0/device/config", O_RDWR); + if (configfd < 0) { + perror("config open:"); + return errno; + } + + /* Read and cache command value */ + err = pread(configfd, &command_high, 1, 5); + if (err != 1) { + perror("command config read:"); + return errno; + } + command_high &= ~0x4; + + for(i = 0;; ++i) { + /* Print out a message, for debugging. */ + if (i == 0) + fprintf(stderr, "Started uio test driver.\n"); + else + fprintf(stderr, "Interrupts: %d\n", icount); + + /****************************************/ + /* Here we got an interrupt from the + device. Do something to it. */ + /****************************************/ + + /* Re-enable interrupts. */ + err = pwrite(configfd, &command_high, 1, 5); + if (err != 1) { + perror("config write:"); + break; + } + + /* Wait for next interrupt. */ + err = read(uiofd, &icount, 4); + if (err != 4) { + perror("uio read:"); + break; + } + + } + return errno; + } + +Generic Hyper-V UIO driver +========================== + +The generic driver is a kernel module named uio_hv_generic. It +supports devices on the Hyper-V VMBus similar to uio_pci_generic on +PCI bus. + +Making the driver recognize the device +-------------------------------------- + +Since the driver does not declare any device GUID's, it will not get +loaded automatically and will not automatically bind to any devices, you +must load it and allocate id to the driver yourself. For example, to use +the network device GUID:: + + modprobe uio_hv_generic + echo "f8615163-df3e-46c5-913f-f2d2f965ed0e" > /sys/bus/vmbus/drivers/uio_hv_generic/new_id + +If there already is a hardware specific kernel driver for the device, +the generic driver still won't bind to it, in this case if you want to +use the generic driver (why would you?) you'll have to manually unbind +the hardware specific driver and bind the generic driver, like this:: + + echo -n vmbus-ed963694-e847-4b2a-85af-bc9cfc11d6f3 > /sys/bus/vmbus/drivers/hv_netvsc/unbind + echo -n vmbus-ed963694-e847-4b2a-85af-bc9cfc11d6f3 > /sys/bus/vmbus/drivers/uio_hv_generic/bind + +You can verify that the device has been bound to the driver by looking +for it in sysfs, for example like the following:: + + ls -l /sys/bus/vmbus/devices/vmbus-ed963694-e847-4b2a-85af-bc9cfc11d6f3/driver + +Which if successful should print:: + + .../vmbus-ed963694-e847-4b2a-85af-bc9cfc11d6f3/driver -> ../../../bus/vmbus/drivers/uio_hv_generic + +Things to know about uio_hv_generic +----------------------------------- + +On each interrupt, uio_hv_generic sets the Interrupt Disable bit. This +prevents the device from generating further interrupts until the bit is +cleared. The userspace driver should clear this bit before blocking and +waiting for more interrupts. + +Further information +=================== + +- `OSADL homepage. <http://www.osadl.org>`_ + +- `Linutronix homepage. <http://www.linutronix.de>`_ |