From 49b2fd6ea63d7fe9c81f00e6d0117827db1d30c6 Mon Sep 17 00:00:00 2001 From: Jonathan Cameron Date: Sun, 1 Jan 2017 12:32:45 +0000 Subject: docs: IIO documentation sphinx conversion This is a manual conversion of the existing DocBook documentation for IIO. The intent is not to substantially change any of the content in this patch, but to give a base to build upon. Signed-off-by: Jonathan Cameron Signed-off-by: Jonathan Corbet --- Documentation/driver-api/index.rst | 1 + 1 file changed, 1 insertion(+) (limited to 'Documentation/driver-api/index.rst') diff --git a/Documentation/driver-api/index.rst b/Documentation/driver-api/index.rst index 5475a2807e7a..a2e5db07756c 100644 --- a/Documentation/driver-api/index.rst +++ b/Documentation/driver-api/index.rst @@ -21,6 +21,7 @@ available subsections can be seen below. message-based sound frame-buffer + iio/index input usb spi -- cgit v1.2.3 From 113ccc38378b6f0b24c0993040c6044e35163a51 Mon Sep 17 00:00:00 2001 From: "Luis R. Rodriguez" Date: Fri, 16 Dec 2016 03:10:36 -0800 Subject: firmware: revamp firmware documentation Understanding this code is getting out of control without any notes. Give the firmware_class driver a much needed documentation love, and while at it convert it to the new sphinx documentation format. v2: typos and small fixes Signed-off-by: Luis R. Rodriguez Signed-off-by: Greg Kroah-Hartman --- Documentation/driver-api/firmware/built-in-fw.rst | 38 ++++ Documentation/driver-api/firmware/core.rst | 16 ++ .../driver-api/firmware/direct-fs-lookup.rst | 30 ++++ .../driver-api/firmware/fallback-mechanisms.rst | 195 +++++++++++++++++++++ .../driver-api/firmware/firmware_cache.rst | 51 ++++++ .../driver-api/firmware/fw_search_path.rst | 26 +++ Documentation/driver-api/firmware/index.rst | 16 ++ Documentation/driver-api/firmware/introduction.rst | 27 +++ Documentation/driver-api/firmware/lookup-order.rst | 18 ++ .../driver-api/firmware/request_firmware.rst | 56 ++++++ Documentation/driver-api/index.rst | 1 + Documentation/firmware_class/README | 128 -------------- 12 files changed, 474 insertions(+), 128 deletions(-) create mode 100644 Documentation/driver-api/firmware/built-in-fw.rst create mode 100644 Documentation/driver-api/firmware/core.rst create mode 100644 Documentation/driver-api/firmware/direct-fs-lookup.rst create mode 100644 Documentation/driver-api/firmware/fallback-mechanisms.rst create mode 100644 Documentation/driver-api/firmware/firmware_cache.rst create mode 100644 Documentation/driver-api/firmware/fw_search_path.rst create mode 100644 Documentation/driver-api/firmware/index.rst create mode 100644 Documentation/driver-api/firmware/introduction.rst create mode 100644 Documentation/driver-api/firmware/lookup-order.rst create mode 100644 Documentation/driver-api/firmware/request_firmware.rst delete mode 100644 Documentation/firmware_class/README (limited to 'Documentation/driver-api/index.rst') diff --git a/Documentation/driver-api/firmware/built-in-fw.rst b/Documentation/driver-api/firmware/built-in-fw.rst new file mode 100644 index 000000000000..7300e66857f8 --- /dev/null +++ b/Documentation/driver-api/firmware/built-in-fw.rst @@ -0,0 +1,38 @@ +================= +Built-in firmware +================= + +Firmware can be built-in to the kernel, this means building the firmware +into vmlinux directly, to enable avoiding having to look for firmware from +the filesystem. Instead, firmware can be looked for inside the kernel +directly. You can enable built-in firmware using the kernel configuration +options: + + * CONFIG_EXTRA_FIRMWARE + * CONFIG_EXTRA_FIRMWARE_DIR + +This should not be confused with CONFIG_FIRMWARE_IN_KERNEL, this is for drivers +which enables firmware to be built as part of the kernel build process. This +option, CONFIG_FIRMWARE_IN_KERNEL, will build all firmware for all drivers +enabled which ship its firmware inside the Linux kernel source tree. + +There are a few reasons why you might want to consider building your firmware +into the kernel with CONFIG_EXTRA_FIRMWARE though: + +* Speed +* Firmware is needed for accessing the boot device, and the user doesn't + want to stuff the firmware into the boot initramfs. + +Even if you have these needs there are a few reasons why you may not be +able to make use of built-in firmware: + +* Legalese - firmware is non-GPL compatible +* Some firmware may be optional +* Firmware upgrades are possible, therefore a new firmware would implicate + a complete kernel rebuild. +* Some firmware files may be really large in size. The remote-proc subsystem + is an example subsystem which deals with these sorts of firmware +* The firmware may need to be scraped out from some device specific location + dynamically, an example is calibration data for for some WiFi chipsets. This + calibration data can be unique per sold device. + diff --git a/Documentation/driver-api/firmware/core.rst b/Documentation/driver-api/firmware/core.rst new file mode 100644 index 000000000000..1d1688cbc078 --- /dev/null +++ b/Documentation/driver-api/firmware/core.rst @@ -0,0 +1,16 @@ +========================== +Firmware API core features +========================== + +The firmware API has a rich set of core features available. This section +documents these features. + +.. toctree:: + + fw_search_path + built-in-fw + firmware_cache + direct-fs-lookup + fallback-mechanisms + lookup-order + diff --git a/Documentation/driver-api/firmware/direct-fs-lookup.rst b/Documentation/driver-api/firmware/direct-fs-lookup.rst new file mode 100644 index 000000000000..82b4d585a213 --- /dev/null +++ b/Documentation/driver-api/firmware/direct-fs-lookup.rst @@ -0,0 +1,30 @@ +======================== +Direct filesystem lookup +======================== + +Direct filesystem lookup is the most common form of firmware lookup performed +by the kernel. The kernel looks for the firmware directly on the root +filesystem in the paths documented in the section 'Firmware search paths'. +The filesystem lookup is implemented in fw_get_filesystem_firmware(), it +uses common core kernel file loader facility kernel_read_file_from_path(). +The max path allowed is PATH_MAX -- currently this is 4096 characters. + +It is recommended you keep /lib/firmware paths on your root filesystem, +avoid having a separate partition for them in order to avoid possible +races with lookups and avoid uses of the custom fallback mechanisms +documented below. + +Firmware and initramfs +---------------------- + +Drivers which are built-in to the kernel should have the firmware integrated +also as part of the initramfs used to boot the kernel given that otherwise +a race is possible with loading the driver and the real rootfs not yet being +available. Stuffing the firmware into initramfs resolves this race issue, +however note that using initrd does not suffice to address the same race. + +There are circumstances that justify not wanting to include firmware into +initramfs, such as dealing with large firmware firmware files for the +remote-proc subsystem. For such cases using a userspace fallback mechanism +is currently the only viable solution as only userspace can know for sure +when the real rootfs is ready and mounted. diff --git a/Documentation/driver-api/firmware/fallback-mechanisms.rst b/Documentation/driver-api/firmware/fallback-mechanisms.rst new file mode 100644 index 000000000000..d19354794e67 --- /dev/null +++ b/Documentation/driver-api/firmware/fallback-mechanisms.rst @@ -0,0 +1,195 @@ +=================== +Fallback mechanisms +=================== + +A fallback mechanism is supported to allow to overcome failures to do a direct +filesystem lookup on the root filesystem or when the firmware simply cannot be +installed for practical reasons on the root filesystem. The kernel +configuration options related to supporting the firmware fallback mechanism are: + + * CONFIG_FW_LOADER_USER_HELPER: enables building the firmware fallback + mechanism. Most distributions enable this option today. If enabled but + CONFIG_FW_LOADER_USER_HELPER_FALLBACK is disabled, only the custom fallback + mechanism is available and for the request_firmware_nowait() call. + * CONFIG_FW_LOADER_USER_HELPER_FALLBACK: force enables each request to + enable the kobject uevent fallback mechanism on all firmware API calls + except request_firmware_direct(). Most distributions disable this option + today. The call request_firmware_nowait() allows for one alternative + fallback mechanism: if this kconfig option is enabled and your second + argument to request_firmware_nowait(), uevent, is set to false you are + informing the kernel that you have a custom fallback mechanism and it will + manually load the firmware. Read below for more details. + +Note that this means when having this configuration: + +CONFIG_FW_LOADER_USER_HELPER=y +CONFIG_FW_LOADER_USER_HELPER_FALLBACK=n + +the kobject uevent fallback mechanism will never take effect even +for request_firmware_nowait() when uevent is set to true. + +Justifying the firmware fallback mechanism +========================================== + +Direct filesystem lookups may fail for a variety of reasons. Known reasons for +this are worth itemizing and documenting as it justifies the need for the +fallback mechanism: + +* Race against access with the root filesystem upon bootup. + +* Races upon resume from suspend. This is resolved by the firmware cache, but + the firmware cache is only supported if you use uevents, and its not + supported for request_firmware_into_buf(). + +* Firmware is not accessible through typical means: + * It cannot be installed into the root filesystem + * The firmware provides very unique device specific data tailored for + the unit gathered with local information. An example is calibration + data for WiFi chipsets for mobile devices. This calibration data is + not common to all units, but tailored per unit. Such information may + be installed on a separate flash partition other than where the root + filesystem is provided. + +Types of fallback mechanisms +============================ + +There are really two fallback mechanisms available using one shared sysfs +interface as a loading facility: + +* Kobject uevent fallback mechanism +* Custom fallback mechanism + +First lets document the shared sysfs loading facility. + +Firmware sysfs loading facility +=============================== + +In order to help device drivers upload firmware using a fallback mechanism +the firmware infrastructure creates a sysfs interface to enable userspace +to load and indicate when firmware is ready. The sysfs directory is created +via fw_create_instance(). This call creates a new struct device named after +the firmware requested, and establishes it in the device hierarchy by +associating the device used to make the request as the device's parent. +The sysfs directory's file attributes are defined and controlled through +the new device's class (firmare_class) and group (fw_dev_attr_groups). +This is actually where the original firmware_class.c file name comes from, +as originally the only firmware loading mechanism available was the +mechanism we now use as a fallback mechanism. + +To load firmware using the sysfs interface we expose a loading indicator, +and a file upload firmware into: + + * /sys/$DEVPATH/loading + * /sys/$DEVPATH/data + +To upload firmware you will echo 1 onto the loading file to indicate +you are loading firmware. You then cat the firmware into the data file, +and you notify the kernel the firmware is ready by echo'ing 0 onto +the loading file. + +The firmware device used to help load firmware using sysfs is only created if +direct firmware loading fails and if the fallback mechanism is enabled for your +firmware request, this is set up with fw_load_from_user_helper(). It is +important to re-iterate that no device is created if a direct filesystem lookup +succeeded. + +Using:: + + echo 1 > /sys/$DEVPATH/loading + +Will clean any previous partial load at once and make the firmware API +return an error. When loading firmware the firmware_class grows a buffer +for the firmware in PAGE_SIZE increments to hold the image as it comes in. + +firmware_data_read() and firmware_loading_show() are just provided for the +test_firmware driver for testing, they are not called in normal use or +expected to be used regularly by userspace. + +Firmware kobject uevent fallback mechanism +========================================== + +Since a device is created for the sysfs interface to help load firmware as a +fallback mechanism userspace can be informed of the addition of the device by +relying on kobject uevents. The addition of the device into the device +hierarchy means the fallback mechanism for firmware loading has been initiated. +For details of implementation refer to _request_firmware_load(), in particular +on the use of dev_set_uevent_suppress() and kobject_uevent(). + +The kernel's kobject uevent mechanism is implemented in lib/kobject_uevent.c, +it issues uevents to userspace. As a supplement to kobject uevents Linux +distributions could also enable CONFIG_UEVENT_HELPER_PATH, which makes use of +core kernel's usermode helper (UMH) functionality to call out to a userspace +helper for kobject uevents. In practice though no standard distribution has +ever used the CONFIG_UEVENT_HELPER_PATH. If CONFIG_UEVENT_HELPER_PATH is +enabled this binary would be called each time kobject_uevent_env() gets called +in the kernel for each kobject uevent triggered. + +Different implementations have been supported in userspace to take advantage of +this fallback mechanism. When firmware loading was only possible using the +sysfs mechanism the userspace component "hotplug" provided the functionality of +monitoring for kobject events. Historically this was superseded be systemd's +udev, however firmware loading support was removed from udev as of systemd +commit be2ea723b1d0 ("udev: remove userspace firmware loading support") +as of v217 on August, 2014. This means most Linux distributions today are +not using or taking advantage of the firmware fallback mechanism provided +by kobject uevents. This is specially exacerbated due to the fact that most +distributions today disable CONFIG_FW_LOADER_USER_HELPER_FALLBACK. + +Refer to do_firmware_uevent() for details of the kobject event variables +setup. Variables passwdd with a kobject add event: + +* FIRMWARE=firmware name +* TIMEOUT=timeout value +* ASYNC=whether or not the API request was asynchronous + +By default DEVPATH is set by the internal kernel kobject infrastructure. +Below is an example simple kobject uevent script:: + + # Both $DEVPATH and $FIRMWARE are already provided in the environment. + MY_FW_DIR=/lib/firmware/ + echo 1 > /sys/$DEVPATH/loading + cat $MY_FW_DIR/$FIRMWARE > /sys/$DEVPATH/data + echo 0 > /sys/$DEVPATH/loading + +Firmware custom fallback mechanism +================================== + +Users of the request_firmware_nowait() call have yet another option available +at their disposal: rely on the sysfs fallback mechanism but request that no +kobject uevents be issued to userspace. The original logic behind this +was that utilities other than udev might be required to lookup firmware +in non-traditional paths -- paths outside of the listing documented in the +section 'Direct filesystem lookup'. This option is not available to any of +the other API calls as uevents are always forced for them. + +Since uevents are only meaningful if the fallback mechanism is enabled +in your kernel it would seem odd to enable uevents with kernels that do not +have the fallback mechanism enabled in their kernels. Unfortunately we also +rely on the uevent flag which can be disabled by request_firmware_nowait() to +also setup the firmware cache for firmware requests. As documented above, +the firmware cache is only set up if uevent is enabled for an API call. +Although this can disable the firmware cache for request_firmware_nowait() +calls, users of this API should not use it for the purposes of disabling +the cache as that was not the original purpose of the flag. Not setting +the uevent flag means you want to opt-in for the firmware fallback mechanism +but you want to suppress kobject uevents, as you have a custom solution which +will monitor for your device addition into the device hierarchy somehow and +load firmware for you through a custom path. + +Firmware fallback timeout +========================= + +The firmware fallback mechanism has a timeout. If firmware is not loaded +onto the sysfs interface by the timeout value an error is sent to the +driver. By default the timeout is set to 60 seconds if uevents are +desirable, otherwise MAX_JIFFY_OFFSET is used (max timeout possible). +The logic behind using MAX_JIFFY_OFFSET for non-uevents is that a custom +solution will have as much time as it needs to load firmware. + +You can customize the firmware timeout by echo'ing your desired timeout into +the following file: + +* /sys/class/firmware/timeout + +If you echo 0 into it means MAX_JIFFY_OFFSET will be used. The data type +for the timeout is an int. diff --git a/Documentation/driver-api/firmware/firmware_cache.rst b/Documentation/driver-api/firmware/firmware_cache.rst new file mode 100644 index 000000000000..2210e5bfb332 --- /dev/null +++ b/Documentation/driver-api/firmware/firmware_cache.rst @@ -0,0 +1,51 @@ +============== +Firmware cache +============== + +When Linux resumes from suspend some device drivers require firmware lookups to +re-initialize devices. During resume there may be a period of time during which +firmware lookups are not possible, during this short period of time firmware +requests will fail. Time is of essence though, and delaying drivers to wait for +the root filesystem for firmware delays user experience with device +functionality. In order to support these requirements the firmware +infrastructure implements a firmware cache for device drivers for most API +calls, automatically behind the scenes. + +The firmware cache makes using certain firmware API calls safe during a device +driver's suspend and resume callback. Users of these API calls needn't cache +the firmware by themselves for dealing with firmware loss during system resume. + +The firmware cache works by requesting for firmware prior to suspend and +caching it in memory. Upon resume device drivers using the firmware API will +have access to the firmware immediately, without having to wait for the root +filesystem to mount or dealing with possible race issues with lookups as the +root filesystem mounts. + +Some implementation details about the firmware cache setup: + +* The firmware cache is setup by adding a devres entry for each device that + uses all synchronous call except :c:func:`request_firmware_into_buf`. + +* If an asynchronous call is used the firmware cache is only set up for a + device if if the second argument (uevent) to request_firmware_nowait() is + true. When uevent is true it requests that a kobject uevent be sent to + userspace for the firmware request. For details refer to the Fackback + mechanism documented below. + +* If the firmware cache is determined to be needed as per the above two + criteria the firmware cache is setup by adding a devres entry for the + device making the firmware request. + +* The firmware devres entry is maintained throughout the lifetime of the + device. This means that even if you release_firmware() the firmware cache + will still be used on resume from suspend. + +* The timeout for the fallback mechanism is temporarily reduced to 10 seconds + as the firmware cache is set up during suspend, the timeout is set back to + the old value you had configured after the cache is set up. + +* Upon suspend any pending non-uevent firmware requests are killed to avoid + stalling the kernel, this is done with kill_requests_without_uevent(). Kernel + calls requiring the non-uevent therefore need to implement their own firmware + cache mechanism but must not use the firmware API on suspend. + diff --git a/Documentation/driver-api/firmware/fw_search_path.rst b/Documentation/driver-api/firmware/fw_search_path.rst new file mode 100644 index 000000000000..a360f1009fa3 --- /dev/null +++ b/Documentation/driver-api/firmware/fw_search_path.rst @@ -0,0 +1,26 @@ +===================== +Firmware search paths +===================== + +The following search paths are used to look for firmware on your +root filesystem. + +* fw_path_para - module parameter - default is empty so this is ignored +* /lib/firmware/updates/UTS_RELEASE/ +* /lib/firmware/updates/ +* /lib/firmware/UTS_RELEASE/ +* /lib/firmware/ + +The module parameter ''path'' can be passed to the firmware_class module +to activate the first optional custom fw_path_para. The custom path can +only be up to 256 characters long. The kernel parameter passed would be: + +* 'firmware_class.path=$CUSTOMIZED_PATH' + +There is an alternative to customize the path at run time after bootup, you +can use the file: + +* /sys/module/firmware_class/parameters/path + +You would echo into it your custom path and firmware requested will be +searched for there first. diff --git a/Documentation/driver-api/firmware/index.rst b/Documentation/driver-api/firmware/index.rst new file mode 100644 index 000000000000..1abe01793031 --- /dev/null +++ b/Documentation/driver-api/firmware/index.rst @@ -0,0 +1,16 @@ +================== +Linux Firmware API +================== + +.. toctree:: + + introduction + core + request_firmware + +.. only:: subproject and html + + Indices + ======= + + * :ref:`genindex` diff --git a/Documentation/driver-api/firmware/introduction.rst b/Documentation/driver-api/firmware/introduction.rst new file mode 100644 index 000000000000..211cb44eb972 --- /dev/null +++ b/Documentation/driver-api/firmware/introduction.rst @@ -0,0 +1,27 @@ +============ +Introduction +============ + +The firmware API enables kernel code to request files required +for functionality from userspace, the uses vary: + +* Microcode for CPU errata +* Device driver firmware, required to be loaded onto device + microcontrollers +* Device driver information data (calibration data, EEPROM overrides), + some of which can be completely optional. + +Types of firmware requests +========================== + +There are two types of calls: + +* Synchronous +* Asynchronous + +Which one you use vary depending on your requirements, the rule of thumb +however is you should strive to use the asynchronous APIs unless you also +are already using asynchronous initialization mechanisms which will not +stall or delay boot. Even if loading firmware does not take a lot of time +processing firmware might, and this can still delay boot or initialization, +as such mechanisms such as asynchronous probe can help supplement drivers. diff --git a/Documentation/driver-api/firmware/lookup-order.rst b/Documentation/driver-api/firmware/lookup-order.rst new file mode 100644 index 000000000000..88c81739683c --- /dev/null +++ b/Documentation/driver-api/firmware/lookup-order.rst @@ -0,0 +1,18 @@ +===================== +Firmware lookup order +===================== + +Different functionality is available to enable firmware to be found. +Below is chronological order of how firmware will be looked for once +a driver issues a firmware API call. + +* The ''Built-in firmware'' is checked first, if the firmware is present we + return it immediately +* The ''Firmware cache'' is looked at next. If the firmware is found we + return it immediately +* The ''Direct filesystem lookup'' is performed next, if found we + return it immediately +* If no firmware has been found and the fallback mechanism was enabled + the sysfs interface is created. After this either a kobject uevent + is issued or the custom firmware loading is relied upon for firmware + loading up to the timeout value. diff --git a/Documentation/driver-api/firmware/request_firmware.rst b/Documentation/driver-api/firmware/request_firmware.rst new file mode 100644 index 000000000000..cc0aea880824 --- /dev/null +++ b/Documentation/driver-api/firmware/request_firmware.rst @@ -0,0 +1,56 @@ +==================== +request_firmware API +==================== + +You would typically load firmware and then load it into your device somehow. +The typical firmware work flow is reflected below:: + + if(request_firmware(&fw_entry, $FIRMWARE, device) == 0) + copy_fw_to_device(fw_entry->data, fw_entry->size); + release_firmware(fw_entry); + +Synchronous firmware requests +============================= + +Synchronous firmware requests will wait until the firmware is found or until +an error is returned. + +request_firmware +---------------- +.. kernel-doc:: drivers/base/firmware_class.c + :functions: request_firmware + +request_firmware_direct +----------------------- +.. kernel-doc:: drivers/base/firmware_class.c + :functions: request_firmware_direct + +request_firmware_into_buf +------------------------- +.. kernel-doc:: drivers/base/firmware_class.c + :functions: request_firmware_into_buf + +Asynchronous firmware requests +============================== + +Asynchronous firmware requests allow driver code to not have to wait +until the firmware or an error is returned. Function callbacks are +provided so that when the firmware or an error is found the driver is +informed through the callback. request_firmware_nowait() cannot be called +in atomic contexts. + +request_firmware_nowait +----------------------- +.. kernel-doc:: drivers/base/firmware_class.c + :functions: request_firmware_nowait + +request firmware API expected driver use +======================================== + +Once an API call returns you process the firmware and then release the +firmware. For example if you used request_firmware() and it returns, +the driver has the firmware image accessible in fw_entry->{data,size}. +If something went wrong request_firmware() returns non-zero and fw_entry +is set to NULL. Once your driver is done with processing the firmware it +can call call release_firmware(fw_entry) to release the firmware image +and any related resource. diff --git a/Documentation/driver-api/index.rst b/Documentation/driver-api/index.rst index 5475a2807e7a..d6f4ad1a872d 100644 --- a/Documentation/driver-api/index.rst +++ b/Documentation/driver-api/index.rst @@ -30,6 +30,7 @@ available subsections can be seen below. miscellaneous vme 80211/index + firmware/index .. only:: subproject and html diff --git a/Documentation/firmware_class/README b/Documentation/firmware_class/README deleted file mode 100644 index cafdca8b3b15..000000000000 --- a/Documentation/firmware_class/README +++ /dev/null @@ -1,128 +0,0 @@ - - request_firmware() hotplug interface: - ------------------------------------ - Copyright (C) 2003 Manuel Estrada Sainz - - Why: - --- - - Today, the most extended way to use firmware in the Linux kernel is linking - it statically in a header file. Which has political and technical issues: - - 1) Some firmware is not legal to redistribute. - 2) The firmware occupies memory permanently, even though it often is just - used once. - 3) Some people, like the Debian crowd, don't consider some firmware free - enough and remove entire drivers (e.g.: keyspan). - - High level behavior (mixed): - ============================ - - 1), kernel(driver): - - calls request_firmware(&fw_entry, $FIRMWARE, device) - - kernel searches the firmware image with name $FIRMWARE directly - in the below search path of root filesystem: - User customized search path by module parameter 'path'[1] - "/lib/firmware/updates/" UTS_RELEASE, - "/lib/firmware/updates", - "/lib/firmware/" UTS_RELEASE, - "/lib/firmware" - - If found, goto 7), else goto 2) - - [1], the 'path' is a string parameter which length should be less - than 256, user should pass 'firmware_class.path=$CUSTOMIZED_PATH' - if firmware_class is built in kernel(the general situation) - - 2), userspace: - - /sys/class/firmware/xxx/{loading,data} appear. - - hotplug gets called with a firmware identifier in $FIRMWARE - and the usual hotplug environment. - - hotplug: echo 1 > /sys/class/firmware/xxx/loading - - 3), kernel: Discard any previous partial load. - - 4), userspace: - - hotplug: cat appropriate_firmware_image > \ - /sys/class/firmware/xxx/data - - 5), kernel: grows a buffer in PAGE_SIZE increments to hold the image as it - comes in. - - 6), userspace: - - hotplug: echo 0 > /sys/class/firmware/xxx/loading - - 7), kernel: request_firmware() returns and the driver has the firmware - image in fw_entry->{data,size}. If something went wrong - request_firmware() returns non-zero and fw_entry is set to - NULL. - - 8), kernel(driver): Driver code calls release_firmware(fw_entry) releasing - the firmware image and any related resource. - - High level behavior (driver code): - ================================== - - if(request_firmware(&fw_entry, $FIRMWARE, device) == 0) - copy_fw_to_device(fw_entry->data, fw_entry->size); - release_firmware(fw_entry); - - Sample/simple hotplug script: - ============================ - - # Both $DEVPATH and $FIRMWARE are already provided in the environment. - - HOTPLUG_FW_DIR=/usr/lib/hotplug/firmware/ - - echo 1 > /sys/$DEVPATH/loading - cat $HOTPLUG_FW_DIR/$FIRMWARE > /sys/$DEVPATH/data - echo 0 > /sys/$DEVPATH/loading - - Random notes: - ============ - - - "echo -1 > /sys/class/firmware/xxx/loading" will cancel the load at - once and make request_firmware() return with error. - - - firmware_data_read() and firmware_loading_show() are just provided - for testing and completeness, they are not called in normal use. - - - There is also /sys/class/firmware/timeout which holds a timeout in - seconds for the whole load operation. - - - request_firmware_nowait() is also provided for convenience in - user contexts to request firmware asynchronously, but can't be called - in atomic contexts. - - - about in-kernel persistence: - --------------------------- - Under some circumstances, as explained below, it would be interesting to keep - firmware images in non-swappable kernel memory or even in the kernel image - (probably within initramfs). - - Note that this functionality has not been implemented. - - - Why OPTIONAL in-kernel persistence may be a good idea sometimes: - - - If the device that needs the firmware is needed to access the - filesystem. When upon some error the device has to be reset and the - firmware reloaded, it won't be possible to get it from userspace. - e.g.: - - A diskless client with a network card that needs firmware. - - The filesystem is stored in a disk behind an scsi device - that needs firmware. - - Replacing buggy DSDT/SSDT ACPI tables on boot. - Note: this would require the persistent objects to be included - within the kernel image, probably within initramfs. - - And the same device can be needed to access the filesystem or not depending - on the setup, so I think that the choice on what firmware to make - persistent should be left to userspace. - - about firmware cache: - -------------------- - After firmware cache mechanism is introduced during system sleep, - request_firmware can be called safely inside device's suspend and - resume callback, and callers needn't cache the firmware by - themselves any more for dealing with firmware loss during system - resume. -- cgit v1.2.3 From cadf8106661c061ab5041282a8e088de4e470526 Mon Sep 17 00:00:00 2001 From: Alexander Dahl Date: Sat, 28 Jan 2017 10:45:32 +0100 Subject: doc: convert UIO howto from docbook to sphinx MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Converted with tmplcvt. Only some tiny things needed manual fixing. Signed-off-by: Alexander Dahl Cc: Hans-Jürgen Koch Cc: Greg Kroah-Hartman Signed-off-by: Greg Kroah-Hartman --- Documentation/DocBook/Makefile | 2 +- Documentation/DocBook/uio-howto.tmpl | 1112 -------------------------------- Documentation/driver-api/index.rst | 1 + Documentation/driver-api/uio-howto.rst | 705 ++++++++++++++++++++ MAINTAINERS | 2 +- 5 files changed, 708 insertions(+), 1114 deletions(-) delete mode 100644 Documentation/DocBook/uio-howto.tmpl create mode 100644 Documentation/driver-api/uio-howto.rst (limited to 'Documentation/driver-api/index.rst') diff --git a/Documentation/DocBook/Makefile b/Documentation/DocBook/Makefile index a6eb7dcd4dd5..5fd8f5effd0c 100644 --- a/Documentation/DocBook/Makefile +++ b/Documentation/DocBook/Makefile @@ -11,7 +11,7 @@ DOCBOOKS := z8530book.xml \ writing_usb_driver.xml networking.xml \ kernel-api.xml filesystems.xml lsm.xml kgdb.xml \ gadget.xml libata.xml mtdnand.xml librs.xml rapidio.xml \ - genericirq.xml s390-drivers.xml uio-howto.xml scsi.xml \ + genericirq.xml s390-drivers.xml scsi.xml \ sh.xml regulator.xml w1.xml \ writing_musb_glue_layer.xml iio.xml diff --git a/Documentation/DocBook/uio-howto.tmpl b/Documentation/DocBook/uio-howto.tmpl deleted file mode 100644 index 5210f8a577c6..000000000000 --- a/Documentation/DocBook/uio-howto.tmpl +++ /dev/null @@ -1,1112 +0,0 @@ - - - - - -The Userspace I/O HOWTO - - - Hans-Jürgen - Koch - Linux developer, Linutronix - - - Linutronix - - -
- hjk@hansjkoch.de -
-
-
- - - 2006-2008 - Hans-Jürgen Koch. - - - 2009 - Red Hat Inc, Michael S. Tsirkin (mst@redhat.com) - - - - -This documentation is Free Software licensed under the terms of the -GPL version 2. - - - -2006-12-11 - - - This HOWTO describes concept and usage of Linux kernel's - Userspace I/O system. - - - - - 0.10 - 2016-10-17 - sch - Added generic hyperv driver - - - - 0.9 - 2009-07-16 - mst - Added generic pci driver - - - - 0.8 - 2008-12-24 - hjk - Added name attributes in mem and portio sysfs directories. - - - - 0.7 - 2008-12-23 - hjk - Added generic platform drivers and offset attribute. - - - 0.6 - 2008-12-05 - hjk - Added description of portio sysfs attributes. - - - 0.5 - 2008-05-22 - hjk - Added description of write() function. - - - 0.4 - 2007-11-26 - hjk - Removed section about uio_dummy. - - - 0.3 - 2007-04-29 - hjk - Added section about userspace drivers. - - - 0.2 - 2007-02-13 - hjk - Update after multiple mappings were added. - - - 0.1 - 2006-12-11 - hjk - First draft. - - -
- - - -About this document - - - -Translations - -If you know of any translations for this document, or you are -interested in translating it, please email me -hjk@hansjkoch.de. - - - - -Preface - - For many types of devices, creating a Linux kernel driver is - overkill. All that is really needed is some way to handle an - interrupt and provide access to the memory space of the - device. The logic of controlling the device does not - necessarily have to be within the kernel, as the device does - not need to take advantage of any of other resources that the - kernel provides. One such common class of devices that are - like this are for industrial I/O cards. - - - To address this situation, the userspace I/O system (UIO) was - designed. For typical industrial I/O cards, only a very small - kernel module is needed. The main part of the driver will run in - user space. This simplifies development and reduces the risk of - serious bugs within a kernel module. - - - Please note that UIO is not an universal driver interface. Devices - that are already handled well by other kernel subsystems (like - networking or serial or USB) are no candidates for an UIO driver. - Hardware that is ideally suited for an UIO driver fulfills all of - the following: - - - - The device has memory that can be mapped. The device can be - controlled completely by writing to this memory. - - - The device usually generates interrupts. - - - The device does not fit into one of the standard kernel - subsystems. - - - - - -Acknowledgments - I'd like to thank Thomas Gleixner and Benedikt Spranger of - Linutronix, who have not only written most of the UIO code, but also - helped greatly writing this HOWTO by giving me all kinds of background - information. - - - -Feedback - Find something wrong with this document? (Or perhaps something - right?) I would love to hear from you. Please email me at - hjk@hansjkoch.de. - - - - - -About UIO - -If you use UIO for your card's driver, here's what you get: - - - - only one small kernel module to write and maintain. - - - develop the main part of your driver in user space, - with all the tools and libraries you're used to. - - - bugs in your driver won't crash the kernel. - - - updates of your driver can take place without recompiling - the kernel. - - - - -How UIO works - - Each UIO device is accessed through a device file and several - sysfs attribute files. The device file will be called - /dev/uio0 for the first device, and - /dev/uio1, /dev/uio2 - and so on for subsequent devices. - - - /dev/uioX is used to access the - address space of the card. Just use - mmap() to access registers or RAM - locations of your card. - - - - Interrupts are handled by reading from - /dev/uioX. A blocking - read() from - /dev/uioX will return as soon as an - interrupt occurs. You can also use - select() on - /dev/uioX to wait for an interrupt. The - integer value read from /dev/uioX - represents the total interrupt count. You can use this number - to figure out if you missed some interrupts. - - - For some hardware that has more than one interrupt source internally, - but not separate IRQ mask and status registers, there might be - situations where userspace cannot determine what the interrupt source - was if the kernel handler disables them by writing to the chip's IRQ - register. In such a case, the kernel has to disable the IRQ completely - to leave the chip's register untouched. Now the userspace part can - determine the cause of the interrupt, but it cannot re-enable - interrupts. Another cornercase is chips where re-enabling interrupts - is a read-modify-write operation to a combined IRQ status/acknowledge - register. This would be racy if a new interrupt occurred - simultaneously. - - - To address these problems, UIO also implements a write() function. It - is normally not used and can be ignored for hardware that has only a - single interrupt source or has separate IRQ mask and status registers. - If you need it, however, a write to /dev/uioX - will call the irqcontrol() function implemented - by the driver. You have to write a 32-bit value that is usually either - 0 or 1 to disable or enable interrupts. If a driver does not implement - irqcontrol(), write() will - return with -ENOSYS. - - - - To handle interrupts properly, your custom kernel module can - provide its own interrupt handler. It will automatically be - called by the built-in handler. - - - - For cards that don't generate interrupts but need to be - polled, there is the possibility to set up a timer that - triggers the interrupt handler at configurable time intervals. - This interrupt simulation is done by calling - uio_event_notify() - from the timer's event handler. - - - - Each driver provides attributes that are used to read or write - variables. These attributes are accessible through sysfs - files. A custom kernel driver module can add its own - attributes to the device owned by the uio driver, but not added - to the UIO device itself at this time. This might change in the - future if it would be found to be useful. - - - - The following standard attributes are provided by the UIO - framework: - - - - - name: The name of your device. It is - recommended to use the name of your kernel module for this. - - - - - version: A version string defined by your - driver. This allows the user space part of your driver to deal - with different versions of the kernel module. - - - - - event: The total number of interrupts - handled by the driver since the last time the device node was - read. - - - - - These attributes appear under the - /sys/class/uio/uioX directory. Please - note that this directory might be a symlink, and not a real - directory. Any userspace code that accesses it must be able - to handle this. - - - Each UIO device can make one or more memory regions available for - memory mapping. This is necessary because some industrial I/O cards - require access to more than one PCI memory region in a driver. - - - Each mapping has its own directory in sysfs, the first mapping - appears as /sys/class/uio/uioX/maps/map0/. - Subsequent mappings create directories map1/, - map2/, and so on. These directories will only - appear if the size of the mapping is not 0. - - - Each mapX/ directory contains four read-only files - that show attributes of the memory: - - - - - name: A string identifier for this mapping. This - is optional, the string can be empty. Drivers can set this to make it - easier for userspace to find the correct mapping. - - - - - addr: The address of memory that can be mapped. - - - - - size: The size, in bytes, of the memory - pointed to by addr. - - - - - offset: The offset, in bytes, that has to be - added to the pointer returned by mmap() to get - to the actual device memory. This is important if the device's memory - is not page aligned. Remember that pointers returned by - mmap() are always page aligned, so it is good - style to always add this offset. - - - - - - From userspace, the different mappings are distinguished by adjusting - the offset parameter of the - mmap() call. To map the memory of mapping N, you - have to use N times the page size as your offset: - - -offset = N * getpagesize(); - - - - Sometimes there is hardware with memory-like regions that can not be - mapped with the technique described here, but there are still ways to - access them from userspace. The most common example are x86 ioports. - On x86 systems, userspace can access these ioports using - ioperm(), iopl(), - inb(), outb(), and similar - functions. - - - Since these ioport regions can not be mapped, they will not appear under - /sys/class/uio/uioX/maps/ like the normal memory - described above. Without information about the port regions a hardware - has to offer, it becomes difficult for the userspace part of the - driver to find out which ports belong to which UIO device. - - - To address this situation, the new directory - /sys/class/uio/uioX/portio/ was added. It only - exists if the driver wants to pass information about one or more port - regions to userspace. If that is the case, subdirectories named - port0, port1, and so on, - will appear underneath - /sys/class/uio/uioX/portio/. - - - Each portX/ directory contains four read-only - files that show name, start, size, and type of the port region: - - - - - name: A string identifier for this port region. - The string is optional and can be empty. Drivers can set it to make it - easier for userspace to find a certain port region. - - - - - start: The first port of this region. - - - - - size: The number of ports in this region. - - - - - porttype: A string describing the type of port. - - - - - - - - - - -Writing your own kernel module - - Please have a look at uio_cif.c as an - example. The following paragraphs explain the different - sections of this file. - - - -struct uio_info - - This structure tells the framework the details of your driver, - Some of the members are required, others are optional. - - - - -const char *name: Required. The name of your driver as -it will appear in sysfs. I recommend using the name of your module for this. - - - -const char *version: Required. This string appears in -/sys/class/uio/uioX/version. - - - -struct uio_mem mem[ MAX_UIO_MAPS ]: Required if you -have memory that can be mapped with mmap(). For each -mapping you need to fill one of the uio_mem structures. -See the description below for details. - - - -struct uio_port port[ MAX_UIO_PORTS_REGIONS ]: Required -if you want to pass information about ioports to userspace. For each port -region you need to fill one of the uio_port structures. -See the description below for details. - - - -long irq: Required. If your hardware generates an -interrupt, it's your modules task to determine the irq number during -initialization. If you don't have a hardware generated interrupt but -want to trigger the interrupt handler in some other way, set -irq to UIO_IRQ_CUSTOM. -If you had no interrupt at all, you could set -irq to UIO_IRQ_NONE, though this -rarely makes sense. - - - -unsigned long irq_flags: Required if you've set -irq to a hardware interrupt number. The flags given -here will be used in the call to request_irq(). - - - -int (*mmap)(struct uio_info *info, struct vm_area_struct -*vma): Optional. If you need a special -mmap() function, you can set it here. If this -pointer is not NULL, your mmap() will be called -instead of the built-in one. - - - -int (*open)(struct uio_info *info, struct inode *inode) -: Optional. You might want to have your own -open(), e.g. to enable interrupts only when your -device is actually used. - - - -int (*release)(struct uio_info *info, struct inode *inode) -: Optional. If you define your own -open(), you will probably also want a custom -release() function. - - - -int (*irqcontrol)(struct uio_info *info, s32 irq_on) -: Optional. If you need to be able to enable or disable -interrupts from userspace by writing to /dev/uioX, -you can implement this function. The parameter irq_on -will be 0 to disable interrupts and 1 to enable them. - - - - -Usually, your device will have one or more memory regions that can be mapped -to user space. For each region, you have to set up a -struct uio_mem in the mem[] array. -Here's a description of the fields of struct uio_mem: - - - - -const char *name: Optional. Set this to help identify -the memory region, it will show up in the corresponding sysfs node. - - - -int memtype: Required if the mapping is used. Set this to -UIO_MEM_PHYS if you you have physical memory on your -card to be mapped. Use UIO_MEM_LOGICAL for logical -memory (e.g. allocated with kmalloc()). There's also -UIO_MEM_VIRTUAL for virtual memory. - - - -phys_addr_t addr: Required if the mapping is used. -Fill in the address of your memory block. This address is the one that -appears in sysfs. - - - -resource_size_t size: Fill in the size of the -memory block that addr points to. If size -is zero, the mapping is considered unused. Note that you -must initialize size with zero for -all unused mappings. - - - -void *internal_addr: If you have to access this memory -region from within your kernel module, you will want to map it internally by -using something like ioremap(). Addresses -returned by this function cannot be mapped to user space, so you must not -store it in addr. Use internal_addr -instead to remember such an address. - - - - -Please do not touch the map element of -struct uio_mem! It is used by the UIO framework -to set up sysfs files for this mapping. Simply leave it alone. - - - -Sometimes, your device can have one or more port regions which can not be -mapped to userspace. But if there are other possibilities for userspace to -access these ports, it makes sense to make information about the ports -available in sysfs. For each region, you have to set up a -struct uio_port in the port[] array. -Here's a description of the fields of struct uio_port: - - - - -char *porttype: Required. Set this to one of the predefined -constants. Use UIO_PORT_X86 for the ioports found in x86 -architectures. - - - -unsigned long start: Required if the port region is used. -Fill in the number of the first port of this region. - - - -unsigned long size: Fill in the number of ports in this -region. If size is zero, the region is considered unused. -Note that you must initialize size -with zero for all unused regions. - - - - -Please do not touch the portio element of -struct uio_port! It is used internally by the UIO -framework to set up sysfs files for this region. Simply leave it alone. - - - - - -Adding an interrupt handler - - What you need to do in your interrupt handler depends on your - hardware and on how you want to handle it. You should try to - keep the amount of code in your kernel interrupt handler low. - If your hardware requires no action that you - have to perform after each interrupt, - then your handler can be empty. If, on the other - hand, your hardware needs some action to - be performed after each interrupt, then you - must do it in your kernel module. Note - that you cannot rely on the userspace part of your driver. Your - userspace program can terminate at any time, possibly leaving - your hardware in a state where proper interrupt handling is - still required. - - - - There might also be applications where you want to read data - from your hardware at each interrupt and buffer it in a piece - of kernel memory you've allocated for that purpose. With this - technique you could avoid loss of data if your userspace - program misses an interrupt. - - - - A note on shared interrupts: Your driver should support - interrupt sharing whenever this is possible. It is possible if - and only if your driver can detect whether your hardware has - triggered the interrupt or not. This is usually done by looking - at an interrupt status register. If your driver sees that the - IRQ bit is actually set, it will perform its actions, and the - handler returns IRQ_HANDLED. If the driver detects that it was - not your hardware that caused the interrupt, it will do nothing - and return IRQ_NONE, allowing the kernel to call the next - possible interrupt handler. - - - - If you decide not to support shared interrupts, your card - won't work in computers with no free interrupts. As this - frequently happens on the PC platform, you can save yourself a - lot of trouble by supporting interrupt sharing. - - - - -Using uio_pdrv for platform devices - - In many cases, UIO drivers for platform devices can be handled in a - generic way. In the same place where you define your - struct platform_device, you simply also implement - your interrupt handler and fill your - struct uio_info. A pointer to this - struct uio_info is then used as - platform_data for your platform device. - - - You also need to set up an array of struct resource - containing addresses and sizes of your memory mappings. This - information is passed to the driver using the - .resource and .num_resources - elements of struct platform_device. - - - You now have to set the .name element of - struct platform_device to - "uio_pdrv" to use the generic UIO platform device - driver. This driver will fill the mem[] array - according to the resources given, and register the device. - - - The advantage of this approach is that you only have to edit a file - you need to edit anyway. You do not have to create an extra driver. - - - - -Using uio_pdrv_genirq for platform devices - - Especially in embedded devices, you frequently find chips where the - irq pin is tied to its own dedicated interrupt line. In such cases, - where you can be really sure the interrupt is not shared, we can take - the concept of uio_pdrv one step further and use a - generic interrupt handler. That's what - uio_pdrv_genirq does. - - - The setup for this driver is the same as described above for - uio_pdrv, except that you do not implement an - interrupt handler. The .handler element of - struct uio_info must remain - NULL. The .irq_flags element - must not contain IRQF_SHARED. - - - You will set the .name element of - struct platform_device to - "uio_pdrv_genirq" to use this driver. - - - The generic interrupt handler of uio_pdrv_genirq - will simply disable the interrupt line using - disable_irq_nosync(). After doing its work, - userspace can reenable the interrupt by writing 0x00000001 to the UIO - device file. The driver already implements an - irq_control() to make this possible, you must not - implement your own. - - - Using uio_pdrv_genirq not only saves a few lines of - interrupt handler code. You also do not need to know anything about - the chip's internal registers to create the kernel part of the driver. - All you need to know is the irq number of the pin the chip is - connected to. - - - - -Using uio_dmem_genirq for platform devices - - In addition to statically allocated memory ranges, they may also be - a desire to use dynamically allocated regions in a user space driver. - In particular, being able to access memory made available through the - dma-mapping API, may be particularly useful. The - uio_dmem_genirq driver provides a way to accomplish - this. - - - This driver is used in a similar manner to the - "uio_pdrv_genirq" driver with respect to interrupt - configuration and handling. - - - Set the .name element of - struct platform_device to - "uio_dmem_genirq" to use this driver. - - - When using this driver, fill in the .platform_data - element of struct platform_device, which is of type - struct uio_dmem_genirq_pdata and which contains the - following elements: - - - struct uio_info uioinfo: The same - structure used as the uio_pdrv_genirq platform - data - unsigned int *dynamic_region_sizes: - Pointer to list of sizes of dynamic memory regions to be mapped into - user space. - - unsigned int num_dynamic_regions: - Number of elements in dynamic_region_sizes array. - - - - The dynamic regions defined in the platform data will be appended to - the mem[] array after the platform device - resources, which implies that the total number of static and dynamic - memory regions cannot exceed MAX_UIO_MAPS. - - - The dynamic memory regions will be allocated when the UIO device file, - /dev/uioX is opened. - Similar to static memory resources, the memory region information for - dynamic regions is then visible via sysfs at - /sys/class/uio/uioX/maps/mapY/*. - The dynamic memory regions will be freed when the UIO device file is - closed. When no processes are holding the device file open, the address - returned to userspace is ~0. - - - - - - - -Writing a driver in userspace - - Once you have a working kernel module for your hardware, you can - write the userspace part of your driver. You don't need any special - libraries, your driver can be written in any reasonable language, - you can use floating point numbers and so on. In short, you can - use all the tools and libraries you'd normally use for writing a - userspace application. - - - -Getting information about your UIO device - - Information about all UIO devices is available in sysfs. The - first thing you should do in your driver is check - name and version to - make sure your talking to the right device and that its kernel - driver has the version you expect. - - - You should also make sure that the memory mapping you need - exists and has the size you expect. - - - There is a tool called lsuio that lists - UIO devices and their attributes. It is available here: - - - - http://www.osadl.org/projects/downloads/UIO/user/ - - - With lsuio you can quickly check if your - kernel module is loaded and which attributes it exports. - Have a look at the manpage for details. - - - The source code of lsuio can serve as an - example for getting information about an UIO device. - The file uio_helper.c contains a lot of - functions you could use in your userspace driver code. - - - - -mmap() device memory - - After you made sure you've got the right device with the - memory mappings you need, all you have to do is to call - mmap() to map the device's memory - to userspace. - - - The parameter offset of the - mmap() call has a special meaning - for UIO devices: It is used to select which mapping of - your device you want to map. To map the memory of - mapping N, you have to use N times the page size as - your offset: - - - offset = N * getpagesize(); - - - N starts from zero, so if you've got only one memory - range to map, set offset = 0. - A drawback of this technique is that memory is always - mapped beginning with its start address. - - - - -Waiting for interrupts - - After you successfully mapped your devices memory, you - can access it like an ordinary array. Usually, you will - perform some initialization. After that, your hardware - starts working and will generate an interrupt as soon - as it's finished, has some data available, or needs your - attention because an error occurred. - - - /dev/uioX is a read-only file. A - read() will always block until an - interrupt occurs. There is only one legal value for the - count parameter of - read(), and that is the size of a - signed 32 bit integer (4). Any other value for - count causes read() - to fail. The signed 32 bit integer read is the interrupt - count of your device. If the value is one more than the value - you read the last time, everything is OK. If the difference - is greater than one, you missed interrupts. - - - You can also use select() on - /dev/uioX. - - - - - - - -Generic PCI UIO driver - - The generic driver is a kernel module named uio_pci_generic. - It can work with any device compliant to PCI 2.3 (circa 2002) and - any compliant PCI Express device. Using this, you only need to - write the userspace driver, removing the need to write - a hardware-specific kernel module. - - - -Making the driver recognize the device - -Since the driver does not declare any device ids, it will not get loaded -automatically and will not automatically bind to any devices, you must load it -and allocate id to the driver yourself. For example: - - modprobe uio_pci_generic - echo "8086 10f5" > /sys/bus/pci/drivers/uio_pci_generic/new_id - - - -If there already is a hardware specific kernel driver for your device, the -generic driver still won't bind to it, in this case if you want to use the -generic driver (why would you?) you'll have to manually unbind the hardware -specific driver and bind the generic driver, like this: - - echo -n 0000:00:19.0 > /sys/bus/pci/drivers/e1000e/unbind - echo -n 0000:00:19.0 > /sys/bus/pci/drivers/uio_pci_generic/bind - - - -You can verify that the device has been bound to the driver -by looking for it in sysfs, for example like the following: - - ls -l /sys/bus/pci/devices/0000:00:19.0/driver - -Which if successful should print - - .../0000:00:19.0/driver -> ../../../bus/pci/drivers/uio_pci_generic - -Note that the generic driver will not bind to old PCI 2.2 devices. -If binding the device failed, run the following command: - - dmesg - -and look in the output for failure reasons - - - - -Things to know about uio_pci_generic - -Interrupts are handled using the Interrupt Disable bit in the PCI command -register and Interrupt Status bit in the PCI status register. All devices -compliant to PCI 2.3 (circa 2002) and all compliant PCI Express devices should -support these bits. uio_pci_generic detects this support, and won't bind to -devices which do not support the Interrupt Disable Bit in the command register. - - -On each interrupt, uio_pci_generic sets the Interrupt Disable bit. -This prevents the device from generating further interrupts -until the bit is cleared. The userspace driver should clear this -bit before blocking and waiting for more interrupts. - - - -Writing userspace driver using uio_pci_generic - -Userspace driver can use pci sysfs interface, or the -libpci libray that wraps it, to talk to the device and to -re-enable interrupts by writing to the command register. - - - -Example code using uio_pci_generic - -Here is some sample userspace driver code using uio_pci_generic: - -#include <stdlib.h> -#include <stdio.h> -#include <unistd.h> -#include <sys/types.h> -#include <sys/stat.h> -#include <fcntl.h> -#include <errno.h> - -int main() -{ - int uiofd; - int configfd; - int err; - int i; - unsigned icount; - unsigned char command_high; - - uiofd = open("/dev/uio0", O_RDONLY); - if (uiofd < 0) { - perror("uio open:"); - return errno; - } - configfd = open("/sys/class/uio/uio0/device/config", O_RDWR); - if (configfd < 0) { - perror("config open:"); - return errno; - } - - /* Read and cache command value */ - err = pread(configfd, &command_high, 1, 5); - if (err != 1) { - perror("command config read:"); - return errno; - } - command_high &= ~0x4; - - for(i = 0;; ++i) { - /* Print out a message, for debugging. */ - if (i == 0) - fprintf(stderr, "Started uio test driver.\n"); - else - fprintf(stderr, "Interrupts: %d\n", icount); - - /****************************************/ - /* Here we got an interrupt from the - device. Do something to it. */ - /****************************************/ - - /* Re-enable interrupts. */ - err = pwrite(configfd, &command_high, 1, 5); - if (err != 1) { - perror("config write:"); - break; - } - - /* Wait for next interrupt. */ - err = read(uiofd, &icount, 4); - if (err != 4) { - perror("uio read:"); - break; - } - - } - return errno; -} - - - - - - - - - -Generic Hyper-V UIO driver - - The generic driver is a kernel module named uio_hv_generic. - It supports devices on the Hyper-V VMBus similar to uio_pci_generic - on PCI bus. - - - -Making the driver recognize the device - -Since the driver does not declare any device GUID's, it will not get loaded -automatically and will not automatically bind to any devices, you must load it -and allocate id to the driver yourself. For example, to use the network device -GUID: - - modprobe uio_hv_generic - echo "f8615163-df3e-46c5-913f-f2d2f965ed0e" > /sys/bus/vmbus/drivers/uio_hv_generic/new_id - - - -If there already is a hardware specific kernel driver for the device, the -generic driver still won't bind to it, in this case if you want to use the -generic driver (why would you?) you'll have to manually unbind the hardware -specific driver and bind the generic driver, like this: - - echo -n vmbus-ed963694-e847-4b2a-85af-bc9cfc11d6f3 > /sys/bus/vmbus/drivers/hv_netvsc/unbind - echo -n vmbus-ed963694-e847-4b2a-85af-bc9cfc11d6f3 > /sys/bus/vmbus/drivers/uio_hv_generic/bind - - - -You can verify that the device has been bound to the driver -by looking for it in sysfs, for example like the following: - - ls -l /sys/bus/vmbus/devices/vmbus-ed963694-e847-4b2a-85af-bc9cfc11d6f3/driver - -Which if successful should print - - .../vmbus-ed963694-e847-4b2a-85af-bc9cfc11d6f3/driver -> ../../../bus/vmbus/drivers/uio_hv_generic - - - - - -Things to know about uio_hv_generic - -On each interrupt, uio_hv_generic sets the Interrupt Disable bit. -This prevents the device from generating further interrupts -until the bit is cleared. The userspace driver should clear this -bit before blocking and waiting for more interrupts. - - - - - -Further information - - - - OSADL homepage. - - - - Linutronix homepage. - - - - -
diff --git a/Documentation/driver-api/index.rst b/Documentation/driver-api/index.rst index 5475a2807e7a..c5a1cd0a4ae7 100644 --- a/Documentation/driver-api/index.rst +++ b/Documentation/driver-api/index.rst @@ -30,6 +30,7 @@ available subsections can be seen below. miscellaneous vme 80211/index + uio-howto .. only:: subproject and html diff --git a/Documentation/driver-api/uio-howto.rst b/Documentation/driver-api/uio-howto.rst new file mode 100644 index 000000000000..f73d660b2956 --- /dev/null +++ b/Documentation/driver-api/uio-howto.rst @@ -0,0 +1,705 @@ +======================= +The Userspace I/O HOWTO +======================= + +:Author: Hans-Jürgen Koch Linux developer, Linutronix +:Date: 2006-12-11 + +About this document +=================== + +Translations +------------ + +If you know of any translations for this document, or you are interested +in translating it, please email me hjk@hansjkoch.de. + +Preface +------- + +For many types of devices, creating a Linux kernel driver is overkill. +All that is really needed is some way to handle an interrupt and provide +access to the memory space of the device. The logic of controlling the +device does not necessarily have to be within the kernel, as the device +does not need to take advantage of any of other resources that the +kernel provides. One such common class of devices that are like this are +for industrial I/O cards. + +To address this situation, the userspace I/O system (UIO) was designed. +For typical industrial I/O cards, only a very small kernel module is +needed. The main part of the driver will run in user space. This +simplifies development and reduces the risk of serious bugs within a +kernel module. + +Please note that UIO is not an universal driver interface. Devices that +are already handled well by other kernel subsystems (like networking or +serial or USB) are no candidates for an UIO driver. Hardware that is +ideally suited for an UIO driver fulfills all of the following: + +- The device has memory that can be mapped. The device can be + controlled completely by writing to this memory. + +- The device usually generates interrupts. + +- The device does not fit into one of the standard kernel subsystems. + +Acknowledgments +--------------- + +I'd like to thank Thomas Gleixner and Benedikt Spranger of Linutronix, +who have not only written most of the UIO code, but also helped greatly +writing this HOWTO by giving me all kinds of background information. + +Feedback +-------- + +Find something wrong with this document? (Or perhaps something right?) I +would love to hear from you. Please email me at hjk@hansjkoch.de. + +About UIO +========= + +If you use UIO for your card's driver, here's what you get: + +- only one small kernel module to write and maintain. + +- develop the main part of your driver in user space, with all the + tools and libraries you're used to. + +- bugs in your driver won't crash the kernel. + +- updates of your driver can take place without recompiling the kernel. + +How UIO works +------------- + +Each UIO device is accessed through a device file and several sysfs +attribute files. The device file will be called ``/dev/uio0`` for the +first device, and ``/dev/uio1``, ``/dev/uio2`` and so on for subsequent +devices. + +``/dev/uioX`` is used to access the address space of the card. Just use +:c:func:`mmap()` to access registers or RAM locations of your card. + +Interrupts are handled by reading from ``/dev/uioX``. A blocking +:c:func:`read()` from ``/dev/uioX`` will return as soon as an +interrupt occurs. You can also use :c:func:`select()` on +``/dev/uioX`` to wait for an interrupt. The integer value read from +``/dev/uioX`` represents the total interrupt count. You can use this +number to figure out if you missed some interrupts. + +For some hardware that has more than one interrupt source internally, +but not separate IRQ mask and status registers, there might be +situations where userspace cannot determine what the interrupt source +was if the kernel handler disables them by writing to the chip's IRQ +register. In such a case, the kernel has to disable the IRQ completely +to leave the chip's register untouched. Now the userspace part can +determine the cause of the interrupt, but it cannot re-enable +interrupts. Another cornercase is chips where re-enabling interrupts is +a read-modify-write operation to a combined IRQ status/acknowledge +register. This would be racy if a new interrupt occurred simultaneously. + +To address these problems, UIO also implements a write() function. It is +normally not used and can be ignored for hardware that has only a single +interrupt source or has separate IRQ mask and status registers. If you +need it, however, a write to ``/dev/uioX`` will call the +:c:func:`irqcontrol()` function implemented by the driver. You have +to write a 32-bit value that is usually either 0 or 1 to disable or +enable interrupts. If a driver does not implement +:c:func:`irqcontrol()`, :c:func:`write()` will return with +``-ENOSYS``. + +To handle interrupts properly, your custom kernel module can provide its +own interrupt handler. It will automatically be called by the built-in +handler. + +For cards that don't generate interrupts but need to be polled, there is +the possibility to set up a timer that triggers the interrupt handler at +configurable time intervals. This interrupt simulation is done by +calling :c:func:`uio_event_notify()` from the timer's event +handler. + +Each driver provides attributes that are used to read or write +variables. These attributes are accessible through sysfs files. A custom +kernel driver module can add its own attributes to the device owned by +the uio driver, but not added to the UIO device itself at this time. +This might change in the future if it would be found to be useful. + +The following standard attributes are provided by the UIO framework: + +- ``name``: The name of your device. It is recommended to use the name + of your kernel module for this. + +- ``version``: A version string defined by your driver. This allows the + user space part of your driver to deal with different versions of the + kernel module. + +- ``event``: The total number of interrupts handled by the driver since + the last time the device node was read. + +These attributes appear under the ``/sys/class/uio/uioX`` directory. +Please note that this directory might be a symlink, and not a real +directory. Any userspace code that accesses it must be able to handle +this. + +Each UIO device can make one or more memory regions available for memory +mapping. This is necessary because some industrial I/O cards require +access to more than one PCI memory region in a driver. + +Each mapping has its own directory in sysfs, the first mapping appears +as ``/sys/class/uio/uioX/maps/map0/``. Subsequent mappings create +directories ``map1/``, ``map2/``, and so on. These directories will only +appear if the size of the mapping is not 0. + +Each ``mapX/`` directory contains four read-only files that show +attributes of the memory: + +- ``name``: A string identifier for this mapping. This is optional, the + string can be empty. Drivers can set this to make it easier for + userspace to find the correct mapping. + +- ``addr``: The address of memory that can be mapped. + +- ``size``: The size, in bytes, of the memory pointed to by addr. + +- ``offset``: The offset, in bytes, that has to be added to the pointer + returned by :c:func:`mmap()` to get to the actual device memory. + This is important if the device's memory is not page aligned. + Remember that pointers returned by :c:func:`mmap()` are always + page aligned, so it is good style to always add this offset. + +From userspace, the different mappings are distinguished by adjusting +the ``offset`` parameter of the :c:func:`mmap()` call. To map the +memory of mapping N, you have to use N times the page size as your +offset:: + + offset = N * getpagesize(); + +Sometimes there is hardware with memory-like regions that can not be +mapped with the technique described here, but there are still ways to +access them from userspace. The most common example are x86 ioports. On +x86 systems, userspace can access these ioports using +:c:func:`ioperm()`, :c:func:`iopl()`, :c:func:`inb()`, +:c:func:`outb()`, and similar functions. + +Since these ioport regions can not be mapped, they will not appear under +``/sys/class/uio/uioX/maps/`` like the normal memory described above. +Without information about the port regions a hardware has to offer, it +becomes difficult for the userspace part of the driver to find out which +ports belong to which UIO device. + +To address this situation, the new directory +``/sys/class/uio/uioX/portio/`` was added. It only exists if the driver +wants to pass information about one or more port regions to userspace. +If that is the case, subdirectories named ``port0``, ``port1``, and so +on, will appear underneath ``/sys/class/uio/uioX/portio/``. + +Each ``portX/`` directory contains four read-only files that show name, +start, size, and type of the port region: + +- ``name``: A string identifier for this port region. The string is + optional and can be empty. Drivers can set it to make it easier for + userspace to find a certain port region. + +- ``start``: The first port of this region. + +- ``size``: The number of ports in this region. + +- ``porttype``: A string describing the type of port. + +Writing your own kernel module +============================== + +Please have a look at ``uio_cif.c`` as an example. The following +paragraphs explain the different sections of this file. + +struct uio_info +--------------- + +This structure tells the framework the details of your driver, Some of +the members are required, others are optional. + +- ``const char *name``: Required. The name of your driver as it will + appear in sysfs. I recommend using the name of your module for this. + +- ``const char *version``: Required. This string appears in + ``/sys/class/uio/uioX/version``. + +- ``struct uio_mem mem[ MAX_UIO_MAPS ]``: Required if you have memory + that can be mapped with :c:func:`mmap()`. For each mapping you + need to fill one of the ``uio_mem`` structures. See the description + below for details. + +- ``struct uio_port port[ MAX_UIO_PORTS_REGIONS ]``: Required if you + want to pass information about ioports to userspace. For each port + region you need to fill one of the ``uio_port`` structures. See the + description below for details. + +- ``long irq``: Required. If your hardware generates an interrupt, it's + your modules task to determine the irq number during initialization. + If you don't have a hardware generated interrupt but want to trigger + the interrupt handler in some other way, set ``irq`` to + ``UIO_IRQ_CUSTOM``. If you had no interrupt at all, you could set + ``irq`` to ``UIO_IRQ_NONE``, though this rarely makes sense. + +- ``unsigned long irq_flags``: Required if you've set ``irq`` to a + hardware interrupt number. The flags given here will be used in the + call to :c:func:`request_irq()`. + +- ``int (*mmap)(struct uio_info *info, struct vm_area_struct *vma)``: + Optional. If you need a special :c:func:`mmap()` + function, you can set it here. If this pointer is not NULL, your + :c:func:`mmap()` will be called instead of the built-in one. + +- ``int (*open)(struct uio_info *info, struct inode *inode)``: + Optional. You might want to have your own :c:func:`open()`, + e.g. to enable interrupts only when your device is actually used. + +- ``int (*release)(struct uio_info *info, struct inode *inode)``: + Optional. If you define your own :c:func:`open()`, you will + probably also want a custom :c:func:`release()` function. + +- ``int (*irqcontrol)(struct uio_info *info, s32 irq_on)``: + Optional. If you need to be able to enable or disable interrupts + from userspace by writing to ``/dev/uioX``, you can implement this + function. The parameter ``irq_on`` will be 0 to disable interrupts + and 1 to enable them. + +Usually, your device will have one or more memory regions that can be +mapped to user space. For each region, you have to set up a +``struct uio_mem`` in the ``mem[]`` array. Here's a description of the +fields of ``struct uio_mem``: + +- ``const char *name``: Optional. Set this to help identify the memory + region, it will show up in the corresponding sysfs node. + +- ``int memtype``: Required if the mapping is used. Set this to + ``UIO_MEM_PHYS`` if you you have physical memory on your card to be + mapped. Use ``UIO_MEM_LOGICAL`` for logical memory (e.g. allocated + with :c:func:`kmalloc()`). There's also ``UIO_MEM_VIRTUAL`` for + virtual memory. + +- ``phys_addr_t addr``: Required if the mapping is used. Fill in the + address of your memory block. This address is the one that appears in + sysfs. + +- ``resource_size_t size``: Fill in the size of the memory block that + ``addr`` points to. If ``size`` is zero, the mapping is considered + unused. Note that you *must* initialize ``size`` with zero for all + unused mappings. + +- ``void *internal_addr``: If you have to access this memory region + from within your kernel module, you will want to map it internally by + using something like :c:func:`ioremap()`. Addresses returned by + this function cannot be mapped to user space, so you must not store + it in ``addr``. Use ``internal_addr`` instead to remember such an + address. + +Please do not touch the ``map`` element of ``struct uio_mem``! It is +used by the UIO framework to set up sysfs files for this mapping. Simply +leave it alone. + +Sometimes, your device can have one or more port regions which can not +be mapped to userspace. But if there are other possibilities for +userspace to access these ports, it makes sense to make information +about the ports available in sysfs. For each region, you have to set up +a ``struct uio_port`` in the ``port[]`` array. Here's a description of +the fields of ``struct uio_port``: + +- ``char *porttype``: Required. Set this to one of the predefined + constants. Use ``UIO_PORT_X86`` for the ioports found in x86 + architectures. + +- ``unsigned long start``: Required if the port region is used. Fill in + the number of the first port of this region. + +- ``unsigned long size``: Fill in the number of ports in this region. + If ``size`` is zero, the region is considered unused. Note that you + *must* initialize ``size`` with zero for all unused regions. + +Please do not touch the ``portio`` element of ``struct uio_port``! It is +used internally by the UIO framework to set up sysfs files for this +region. Simply leave it alone. + +Adding an interrupt handler +--------------------------- + +What you need to do in your interrupt handler depends on your hardware +and on how you want to handle it. You should try to keep the amount of +code in your kernel interrupt handler low. If your hardware requires no +action that you *have* to perform after each interrupt, then your +handler can be empty. + +If, on the other hand, your hardware *needs* some action to be performed +after each interrupt, then you *must* do it in your kernel module. Note +that you cannot rely on the userspace part of your driver. Your +userspace program can terminate at any time, possibly leaving your +hardware in a state where proper interrupt handling is still required. + +There might also be applications where you want to read data from your +hardware at each interrupt and buffer it in a piece of kernel memory +you've allocated for that purpose. With this technique you could avoid +loss of data if your userspace program misses an interrupt. + +A note on shared interrupts: Your driver should support interrupt +sharing whenever this is possible. It is possible if and only if your +driver can detect whether your hardware has triggered the interrupt or +not. This is usually done by looking at an interrupt status register. If +your driver sees that the IRQ bit is actually set, it will perform its +actions, and the handler returns IRQ_HANDLED. If the driver detects +that it was not your hardware that caused the interrupt, it will do +nothing and return IRQ_NONE, allowing the kernel to call the next +possible interrupt handler. + +If you decide not to support shared interrupts, your card won't work in +computers with no free interrupts. As this frequently happens on the PC +platform, you can save yourself a lot of trouble by supporting interrupt +sharing. + +Using uio_pdrv for platform devices +----------------------------------- + +In many cases, UIO drivers for platform devices can be handled in a +generic way. In the same place where you define your +``struct platform_device``, you simply also implement your interrupt +handler and fill your ``struct uio_info``. A pointer to this +``struct uio_info`` is then used as ``platform_data`` for your platform +device. + +You also need to set up an array of ``struct resource`` containing +addresses and sizes of your memory mappings. This information is passed +to the driver using the ``.resource`` and ``.num_resources`` elements of +``struct platform_device``. + +You now have to set the ``.name`` element of ``struct platform_device`` +to ``"uio_pdrv"`` to use the generic UIO platform device driver. This +driver will fill the ``mem[]`` array according to the resources given, +and register the device. + +The advantage of this approach is that you only have to edit a file you +need to edit anyway. You do not have to create an extra driver. + +Using uio_pdrv_genirq for platform devices +------------------------------------------ + +Especially in embedded devices, you frequently find chips where the irq +pin is tied to its own dedicated interrupt line. In such cases, where +you can be really sure the interrupt is not shared, we can take the +concept of ``uio_pdrv`` one step further and use a generic interrupt +handler. That's what ``uio_pdrv_genirq`` does. + +The setup for this driver is the same as described above for +``uio_pdrv``, except that you do not implement an interrupt handler. The +``.handler`` element of ``struct uio_info`` must remain ``NULL``. The +``.irq_flags`` element must not contain ``IRQF_SHARED``. + +You will set the ``.name`` element of ``struct platform_device`` to +``"uio_pdrv_genirq"`` to use this driver. + +The generic interrupt handler of ``uio_pdrv_genirq`` will simply disable +the interrupt line using :c:func:`disable_irq_nosync()`. After +doing its work, userspace can reenable the interrupt by writing +0x00000001 to the UIO device file. The driver already implements an +:c:func:`irq_control()` to make this possible, you must not +implement your own. + +Using ``uio_pdrv_genirq`` not only saves a few lines of interrupt +handler code. You also do not need to know anything about the chip's +internal registers to create the kernel part of the driver. All you need +to know is the irq number of the pin the chip is connected to. + +Using uio_dmem_genirq for platform devices +------------------------------------------ + +In addition to statically allocated memory ranges, they may also be a +desire to use dynamically allocated regions in a user space driver. In +particular, being able to access memory made available through the +dma-mapping API, may be particularly useful. The ``uio_dmem_genirq`` +driver provides a way to accomplish this. + +This driver is used in a similar manner to the ``"uio_pdrv_genirq"`` +driver with respect to interrupt configuration and handling. + +Set the ``.name`` element of ``struct platform_device`` to +``"uio_dmem_genirq"`` to use this driver. + +When using this driver, fill in the ``.platform_data`` element of +``struct platform_device``, which is of type +``struct uio_dmem_genirq_pdata`` and which contains the following +elements: + +- ``struct uio_info uioinfo``: The same structure used as the + ``uio_pdrv_genirq`` platform data + +- ``unsigned int *dynamic_region_sizes``: Pointer to list of sizes of + dynamic memory regions to be mapped into user space. + +- ``unsigned int num_dynamic_regions``: Number of elements in + ``dynamic_region_sizes`` array. + +The dynamic regions defined in the platform data will be appended to the +`` mem[] `` array after the platform device resources, which implies +that the total number of static and dynamic memory regions cannot exceed +``MAX_UIO_MAPS``. + +The dynamic memory regions will be allocated when the UIO device file, +``/dev/uioX`` is opened. Similar to static memory resources, the memory +region information for dynamic regions is then visible via sysfs at +``/sys/class/uio/uioX/maps/mapY/*``. The dynamic memory regions will be +freed when the UIO device file is closed. When no processes are holding +the device file open, the address returned to userspace is ~0. + +Writing a driver in userspace +============================= + +Once you have a working kernel module for your hardware, you can write +the userspace part of your driver. You don't need any special libraries, +your driver can be written in any reasonable language, you can use +floating point numbers and so on. In short, you can use all the tools +and libraries you'd normally use for writing a userspace application. + +Getting information about your UIO device +----------------------------------------- + +Information about all UIO devices is available in sysfs. The first thing +you should do in your driver is check ``name`` and ``version`` to make +sure your talking to the right device and that its kernel driver has the +version you expect. + +You should also make sure that the memory mapping you need exists and +has the size you expect. + +There is a tool called ``lsuio`` that lists UIO devices and their +attributes. It is available here: + +http://www.osadl.org/projects/downloads/UIO/user/ + +With ``lsuio`` you can quickly check if your kernel module is loaded and +which attributes it exports. Have a look at the manpage for details. + +The source code of ``lsuio`` can serve as an example for getting +information about an UIO device. The file ``uio_helper.c`` contains a +lot of functions you could use in your userspace driver code. + +mmap() device memory +-------------------- + +After you made sure you've got the right device with the memory mappings +you need, all you have to do is to call :c:func:`mmap()` to map the +device's memory to userspace. + +The parameter ``offset`` of the :c:func:`mmap()` call has a special +meaning for UIO devices: It is used to select which mapping of your +device you want to map. To map the memory of mapping N, you have to use +N times the page size as your offset:: + + offset = N * getpagesize(); + +N starts from zero, so if you've got only one memory range to map, set +``offset = 0``. A drawback of this technique is that memory is always +mapped beginning with its start address. + +Waiting for interrupts +---------------------- + +After you successfully mapped your devices memory, you can access it +like an ordinary array. Usually, you will perform some initialization. +After that, your hardware starts working and will generate an interrupt +as soon as it's finished, has some data available, or needs your +attention because an error occurred. + +``/dev/uioX`` is a read-only file. A :c:func:`read()` will always +block until an interrupt occurs. There is only one legal value for the +``count`` parameter of :c:func:`read()`, and that is the size of a +signed 32 bit integer (4). Any other value for ``count`` causes +:c:func:`read()` to fail. The signed 32 bit integer read is the +interrupt count of your device. If the value is one more than the value +you read the last time, everything is OK. If the difference is greater +than one, you missed interrupts. + +You can also use :c:func:`select()` on ``/dev/uioX``. + +Generic PCI UIO driver +====================== + +The generic driver is a kernel module named uio_pci_generic. It can +work with any device compliant to PCI 2.3 (circa 2002) and any compliant +PCI Express device. Using this, you only need to write the userspace +driver, removing the need to write a hardware-specific kernel module. + +Making the driver recognize the device +-------------------------------------- + +Since the driver does not declare any device ids, it will not get loaded +automatically and will not automatically bind to any devices, you must +load it and allocate id to the driver yourself. For example:: + + modprobe uio_pci_generic + echo "8086 10f5" > /sys/bus/pci/drivers/uio_pci_generic/new_id + +If there already is a hardware specific kernel driver for your device, +the generic driver still won't bind to it, in this case if you want to +use the generic driver (why would you?) you'll have to manually unbind +the hardware specific driver and bind the generic driver, like this:: + + echo -n 0000:00:19.0 > /sys/bus/pci/drivers/e1000e/unbind + echo -n 0000:00:19.0 > /sys/bus/pci/drivers/uio_pci_generic/bind + +You can verify that the device has been bound to the driver by looking +for it in sysfs, for example like the following:: + + ls -l /sys/bus/pci/devices/0000:00:19.0/driver + +Which if successful should print:: + + .../0000:00:19.0/driver -> ../../../bus/pci/drivers/uio_pci_generic + +Note that the generic driver will not bind to old PCI 2.2 devices. If +binding the device failed, run the following command:: + + dmesg + +and look in the output for failure reasons. + +Things to know about uio_pci_generic +------------------------------------ + +Interrupts are handled using the Interrupt Disable bit in the PCI +command register and Interrupt Status bit in the PCI status register. +All devices compliant to PCI 2.3 (circa 2002) and all compliant PCI +Express devices should support these bits. uio_pci_generic detects +this support, and won't bind to devices which do not support the +Interrupt Disable Bit in the command register. + +On each interrupt, uio_pci_generic sets the Interrupt Disable bit. +This prevents the device from generating further interrupts until the +bit is cleared. The userspace driver should clear this bit before +blocking and waiting for more interrupts. + +Writing userspace driver using uio_pci_generic +------------------------------------------------ + +Userspace driver can use pci sysfs interface, or the libpci library that +wraps it, to talk to the device and to re-enable interrupts by writing +to the command register. + +Example code using uio_pci_generic +---------------------------------- + +Here is some sample userspace driver code using uio_pci_generic:: + + #include + #include + #include + #include + #include + #include + #include + + int main() + { + int uiofd; + int configfd; + int err; + int i; + unsigned icount; + unsigned char command_high; + + uiofd = open("/dev/uio0", O_RDONLY); + if (uiofd < 0) { + perror("uio open:"); + return errno; + } + configfd = open("/sys/class/uio/uio0/device/config", O_RDWR); + if (configfd < 0) { + perror("config open:"); + return errno; + } + + /* Read and cache command value */ + err = pread(configfd, &command_high, 1, 5); + if (err != 1) { + perror("command config read:"); + return errno; + } + command_high &= ~0x4; + + for(i = 0;; ++i) { + /* Print out a message, for debugging. */ + if (i == 0) + fprintf(stderr, "Started uio test driver.\n"); + else + fprintf(stderr, "Interrupts: %d\n", icount); + + /****************************************/ + /* Here we got an interrupt from the + device. Do something to it. */ + /****************************************/ + + /* Re-enable interrupts. */ + err = pwrite(configfd, &command_high, 1, 5); + if (err != 1) { + perror("config write:"); + break; + } + + /* Wait for next interrupt. */ + err = read(uiofd, &icount, 4); + if (err != 4) { + perror("uio read:"); + break; + } + + } + return errno; + } + +Generic Hyper-V UIO driver +========================== + +The generic driver is a kernel module named uio_hv_generic. It +supports devices on the Hyper-V VMBus similar to uio_pci_generic on +PCI bus. + +Making the driver recognize the device +-------------------------------------- + +Since the driver does not declare any device GUID's, it will not get +loaded automatically and will not automatically bind to any devices, you +must load it and allocate id to the driver yourself. For example, to use +the network device GUID:: + + modprobe uio_hv_generic + echo "f8615163-df3e-46c5-913f-f2d2f965ed0e" > /sys/bus/vmbus/drivers/uio_hv_generic/new_id + +If there already is a hardware specific kernel driver for the device, +the generic driver still won't bind to it, in this case if you want to +use the generic driver (why would you?) you'll have to manually unbind +the hardware specific driver and bind the generic driver, like this:: + + echo -n vmbus-ed963694-e847-4b2a-85af-bc9cfc11d6f3 > /sys/bus/vmbus/drivers/hv_netvsc/unbind + echo -n vmbus-ed963694-e847-4b2a-85af-bc9cfc11d6f3 > /sys/bus/vmbus/drivers/uio_hv_generic/bind + +You can verify that the device has been bound to the driver by looking +for it in sysfs, for example like the following:: + + ls -l /sys/bus/vmbus/devices/vmbus-ed963694-e847-4b2a-85af-bc9cfc11d6f3/driver + +Which if successful should print:: + + .../vmbus-ed963694-e847-4b2a-85af-bc9cfc11d6f3/driver -> ../../../bus/vmbus/drivers/uio_hv_generic + +Things to know about uio_hv_generic +----------------------------------- + +On each interrupt, uio_hv_generic sets the Interrupt Disable bit. This +prevents the device from generating further interrupts until the bit is +cleared. The userspace driver should clear this bit before blocking and +waiting for more interrupts. + +Further information +=================== + +- `OSADL homepage. `_ + +- `Linutronix homepage. `_ diff --git a/MAINTAINERS b/MAINTAINERS index be8de24fd6dd..6f6efd2e706a 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -12966,7 +12966,7 @@ USERSPACE I/O (UIO) M: Greg Kroah-Hartman S: Maintained T: git git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git -F: Documentation/DocBook/uio-howto.tmpl +F: Documentation/driver-api/uio-howto.rst F: drivers/uio/ F: include/linux/uio*.h -- cgit v1.2.3 From 8a8a602fdb834ffce9cf3e9f6021a86cdada78f0 Mon Sep 17 00:00:00 2001 From: Jonathan Corbet Date: Fri, 27 Jan 2017 15:43:01 -0700 Subject: docs: Convert the deviceio template to RST Convert deviceiobook.tmpl to RST and incorporate it into the driver API manual. Like the rest of our documentation, this one could use some work. There's no mention of ioremap() and friends, no mention of io_read*() and friends. But we have nice documentation for all those folks writing new drivers that do port I/O :). The :c:func: notation has been left off of all the read*/write* functions. There's no kerneldoc comments for them anyway, so those links will never be live, and writing a bunch of repetitive "read a byte from I/O memory" comments lacks appeal. Cc: Matthew Wilcox Cc: Alan Cox Signed-off-by: Jonathan Corbet --- Documentation/DocBook/deviceiobook.tmpl | 323 -------------------------------- Documentation/driver-api/device-io.rst | 201 ++++++++++++++++++++ Documentation/driver-api/index.rst | 1 + 3 files changed, 202 insertions(+), 323 deletions(-) delete mode 100644 Documentation/DocBook/deviceiobook.tmpl create mode 100644 Documentation/driver-api/device-io.rst (limited to 'Documentation/driver-api/index.rst') diff --git a/Documentation/DocBook/deviceiobook.tmpl b/Documentation/DocBook/deviceiobook.tmpl deleted file mode 100644 index 54199a0dcf9a..000000000000 --- a/Documentation/DocBook/deviceiobook.tmpl +++ /dev/null @@ -1,323 +0,0 @@ - - - - - - Bus-Independent Device Accesses - - - - Matthew - Wilcox - -
- matthew@wil.cx -
-
-
-
- - - - Alan - Cox - -
- alan@lxorguk.ukuu.org.uk -
-
-
-
- - - 2001 - Matthew Wilcox - - - - - This documentation is free software; you can redistribute - it and/or modify it under the terms of the GNU General Public - License as published by the Free Software Foundation; either - version 2 of the License, or (at your option) any later - version. - - - - This program is distributed in the hope that it will be - useful, but WITHOUT ANY WARRANTY; without even the implied - warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. - See the GNU General Public License for more details. - - - - You should have received a copy of the GNU General Public - License along with this program; if not, write to the Free - Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, - MA 02111-1307 USA - - - - For more details see the file COPYING in the source - distribution of Linux. - - -
- - - - - Introduction - - Linux provides an API which abstracts performing IO across all busses - and devices, allowing device drivers to be written independently of - bus type. - - - - - Known Bugs And Assumptions - - None. - - - - - Memory Mapped IO - - Getting Access to the Device - - The most widely supported form of IO is memory mapped IO. - That is, a part of the CPU's address space is interpreted - not as accesses to memory, but as accesses to a device. Some - architectures define devices to be at a fixed address, but most - have some method of discovering devices. The PCI bus walk is a - good example of such a scheme. This document does not cover how - to receive such an address, but assumes you are starting with one. - Physical addresses are of type unsigned long. - - - - This address should not be used directly. Instead, to get an - address suitable for passing to the accessor functions described - below, you should call ioremap. - An address suitable for accessing the device will be returned to you. - - - - After you've finished using the device (say, in your module's - exit routine), call iounmap in order to return - the address space to the kernel. Most architectures allocate new - address space each time you call ioremap, and - they can run out unless you call iounmap. - - - - - Accessing the device - - The part of the interface most used by drivers is reading and - writing memory-mapped registers on the device. Linux provides - interfaces to read and write 8-bit, 16-bit, 32-bit and 64-bit - quantities. Due to a historical accident, these are named byte, - word, long and quad accesses. Both read and write accesses are - supported; there is no prefetch support at this time. - - - - The functions are named readb, - readw, readl, - readq, readb_relaxed, - readw_relaxed, readl_relaxed, - readq_relaxed, writeb, - writew, writel and - writeq. - - - - Some devices (such as framebuffers) would like to use larger - transfers than 8 bytes at a time. For these devices, the - memcpy_toio, memcpy_fromio - and memset_io functions are provided. - Do not use memset or memcpy on IO addresses; they - are not guaranteed to copy data in order. - - - - The read and write functions are defined to be ordered. That is the - compiler is not permitted to reorder the I/O sequence. When the - ordering can be compiler optimised, you can use - __readb and friends to indicate the relaxed ordering. Use - this with care. - - - - While the basic functions are defined to be synchronous with respect - to each other and ordered with respect to each other the busses the - devices sit on may themselves have asynchronicity. In particular many - authors are burned by the fact that PCI bus writes are posted - asynchronously. A driver author must issue a read from the same - device to ensure that writes have occurred in the specific cases the - author cares. This kind of property cannot be hidden from driver - writers in the API. In some cases, the read used to flush the device - may be expected to fail (if the card is resetting, for example). In - that case, the read should be done from config space, which is - guaranteed to soft-fail if the card doesn't respond. - - - - The following is an example of flushing a write to a device when - the driver would like to ensure the write's effects are visible prior - to continuing execution. - - - -static inline void -qla1280_disable_intrs(struct scsi_qla_host *ha) -{ - struct device_reg *reg; - - reg = ha->iobase; - /* disable risc and host interrupts */ - WRT_REG_WORD(&reg->ictrl, 0); - /* - * The following read will ensure that the above write - * has been received by the device before we return from this - * function. - */ - RD_REG_WORD(&reg->ictrl); - ha->flags.ints_enabled = 0; -} - - - - In addition to write posting, on some large multiprocessing systems - (e.g. SGI Challenge, Origin and Altix machines) posted writes won't - be strongly ordered coming from different CPUs. Thus it's important - to properly protect parts of your driver that do memory-mapped writes - with locks and use the mmiowb to make sure they - arrive in the order intended. Issuing a regular readX - will also ensure write ordering, but should only be used - when the driver has to be sure that the write has actually arrived - at the device (not that it's simply ordered with respect to other - writes), since a full readX is a relatively - expensive operation. - - - - Generally, one should use mmiowb prior to - releasing a spinlock that protects regions using writeb - or similar functions that aren't surrounded by - readb calls, which will ensure ordering and flushing. The - following pseudocode illustrates what might occur if write ordering - isn't guaranteed via mmiowb or one of the - readX functions. - - - -CPU A: spin_lock_irqsave(&dev_lock, flags) -CPU A: ... -CPU A: writel(newval, ring_ptr); -CPU A: spin_unlock_irqrestore(&dev_lock, flags) - ... -CPU B: spin_lock_irqsave(&dev_lock, flags) -CPU B: writel(newval2, ring_ptr); -CPU B: ... -CPU B: spin_unlock_irqrestore(&dev_lock, flags) - - - - In the case above, newval2 could be written to ring_ptr before - newval. Fixing it is easy though: - - - -CPU A: spin_lock_irqsave(&dev_lock, flags) -CPU A: ... -CPU A: writel(newval, ring_ptr); -CPU A: mmiowb(); /* ensure no other writes beat us to the device */ -CPU A: spin_unlock_irqrestore(&dev_lock, flags) - ... -CPU B: spin_lock_irqsave(&dev_lock, flags) -CPU B: writel(newval2, ring_ptr); -CPU B: ... -CPU B: mmiowb(); -CPU B: spin_unlock_irqrestore(&dev_lock, flags) - - - - See tg3.c for a real world example of how to use mmiowb - - - - - PCI ordering rules also guarantee that PIO read responses arrive - after any outstanding DMA writes from that bus, since for some devices - the result of a readb call may signal to the - driver that a DMA transaction is complete. In many cases, however, - the driver may want to indicate that the next - readb call has no relation to any previous DMA - writes performed by the device. The driver can use - readb_relaxed for these cases, although only - some platforms will honor the relaxed semantics. Using the relaxed - read functions will provide significant performance benefits on - platforms that support it. The qla2xxx driver provides examples - of how to use readX_relaxed. In many cases, - a majority of the driver's readX calls can - safely be converted to readX_relaxed calls, since - only a few will indicate or depend on DMA completion. - - - - - - - Port Space Accesses - - Port Space Explained - - - Another form of IO commonly supported is Port Space. This is a - range of addresses separate to the normal memory address space. - Access to these addresses is generally not as fast as accesses - to the memory mapped addresses, and it also has a potentially - smaller address space. - - - - Unlike memory mapped IO, no preparation is required - to access port space. - - - - - Accessing Port Space - - Accesses to this space are provided through a set of functions - which allow 8-bit, 16-bit and 32-bit accesses; also - known as byte, word and long. These functions are - inb, inw, - inl, outb, - outw and outl. - - - - Some variants are provided for these functions. Some devices - require that accesses to their ports are slowed down. This - functionality is provided by appending a _p - to the end of the function. There are also equivalents to memcpy. - The ins and outs - functions copy bytes, words or longs to the given port. - - - - - - - Public Functions Provided -!Iarch/x86/include/asm/io.h -!Elib/pci_iomap.c - - -
diff --git a/Documentation/driver-api/device-io.rst b/Documentation/driver-api/device-io.rst new file mode 100644 index 000000000000..b00b23903078 --- /dev/null +++ b/Documentation/driver-api/device-io.rst @@ -0,0 +1,201 @@ +.. Copyright 2001 Matthew Wilcox +.. +.. This documentation is free software; you can redistribute +.. it and/or modify it under the terms of the GNU General Public +.. License as published by the Free Software Foundation; either +.. version 2 of the License, or (at your option) any later +.. version. + +=============================== +Bus-Independent Device Accesses +=============================== + +:Author: Matthew Wilcox +:Author: Alan Cox + +Introduction +============ + +Linux provides an API which abstracts performing IO across all busses +and devices, allowing device drivers to be written independently of bus +type. + +Memory Mapped IO +================ + +Getting Access to the Device +---------------------------- + +The most widely supported form of IO is memory mapped IO. That is, a +part of the CPU's address space is interpreted not as accesses to +memory, but as accesses to a device. Some architectures define devices +to be at a fixed address, but most have some method of discovering +devices. The PCI bus walk is a good example of such a scheme. This +document does not cover how to receive such an address, but assumes you +are starting with one. Physical addresses are of type unsigned long. + +This address should not be used directly. Instead, to get an address +suitable for passing to the accessor functions described below, you +should call :c:func:`ioremap()`. An address suitable for accessing +the device will be returned to you. + +After you've finished using the device (say, in your module's exit +routine), call :c:func:`iounmap()` in order to return the address +space to the kernel. Most architectures allocate new address space each +time you call :c:func:`ioremap()`, and they can run out unless you +call :c:func:`iounmap()`. + +Accessing the device +-------------------- + +The part of the interface most used by drivers is reading and writing +memory-mapped registers on the device. Linux provides interfaces to read +and write 8-bit, 16-bit, 32-bit and 64-bit quantities. Due to a +historical accident, these are named byte, word, long and quad accesses. +Both read and write accesses are supported; there is no prefetch support +at this time. + +The functions are named readb(), readw(), readl(), readq(), +readb_relaxed(), readw_relaxed(), readl_relaxed(), readq_relaxed(), +writeb(), writew(), writel() and writeq(). + +Some devices (such as framebuffers) would like to use larger transfers than +8 bytes at a time. For these devices, the :c:func:`memcpy_toio()`, +:c:func:`memcpy_fromio()` and :c:func:`memset_io()` functions are +provided. Do not use memset or memcpy on IO addresses; they are not +guaranteed to copy data in order. + +The read and write functions are defined to be ordered. That is the +compiler is not permitted to reorder the I/O sequence. When the ordering +can be compiler optimised, you can use __readb() and friends to +indicate the relaxed ordering. Use this with care. + +While the basic functions are defined to be synchronous with respect to +each other and ordered with respect to each other the busses the devices +sit on may themselves have asynchronicity. In particular many authors +are burned by the fact that PCI bus writes are posted asynchronously. A +driver author must issue a read from the same device to ensure that +writes have occurred in the specific cases the author cares. This kind +of property cannot be hidden from driver writers in the API. In some +cases, the read used to flush the device may be expected to fail (if the +card is resetting, for example). In that case, the read should be done +from config space, which is guaranteed to soft-fail if the card doesn't +respond. + +The following is an example of flushing a write to a device when the +driver would like to ensure the write's effects are visible prior to +continuing execution:: + + static inline void + qla1280_disable_intrs(struct scsi_qla_host *ha) + { + struct device_reg *reg; + + reg = ha->iobase; + /* disable risc and host interrupts */ + WRT_REG_WORD(®->ictrl, 0); + /* + * The following read will ensure that the above write + * has been received by the device before we return from this + * function. + */ + RD_REG_WORD(®->ictrl); + ha->flags.ints_enabled = 0; + } + +In addition to write posting, on some large multiprocessing systems +(e.g. SGI Challenge, Origin and Altix machines) posted writes won't be +strongly ordered coming from different CPUs. Thus it's important to +properly protect parts of your driver that do memory-mapped writes with +locks and use the :c:func:`mmiowb()` to make sure they arrive in the +order intended. Issuing a regular readX() will also ensure write ordering, +but should only be used when the +driver has to be sure that the write has actually arrived at the device +(not that it's simply ordered with respect to other writes), since a +full readX() is a relatively expensive operation. + +Generally, one should use :c:func:`mmiowb()` prior to releasing a spinlock +that protects regions using :c:func:`writeb()` or similar functions that +aren't surrounded by readb() calls, which will ensure ordering +and flushing. The following pseudocode illustrates what might occur if +write ordering isn't guaranteed via :c:func:`mmiowb()` or one of the +readX() functions:: + + CPU A: spin_lock_irqsave(&dev_lock, flags) + CPU A: ... + CPU A: writel(newval, ring_ptr); + CPU A: spin_unlock_irqrestore(&dev_lock, flags) + ... + CPU B: spin_lock_irqsave(&dev_lock, flags) + CPU B: writel(newval2, ring_ptr); + CPU B: ... + CPU B: spin_unlock_irqrestore(&dev_lock, flags) + +In the case above, newval2 could be written to ring_ptr before newval. +Fixing it is easy though:: + + CPU A: spin_lock_irqsave(&dev_lock, flags) + CPU A: ... + CPU A: writel(newval, ring_ptr); + CPU A: mmiowb(); /* ensure no other writes beat us to the device */ + CPU A: spin_unlock_irqrestore(&dev_lock, flags) + ... + CPU B: spin_lock_irqsave(&dev_lock, flags) + CPU B: writel(newval2, ring_ptr); + CPU B: ... + CPU B: mmiowb(); + CPU B: spin_unlock_irqrestore(&dev_lock, flags) + +See tg3.c for a real world example of how to use :c:func:`mmiowb()` + +PCI ordering rules also guarantee that PIO read responses arrive after any +outstanding DMA writes from that bus, since for some devices the result of +a readb() call may signal to the driver that a DMA transaction is +complete. In many cases, however, the driver may want to indicate that the +next readb() call has no relation to any previous DMA writes +performed by the device. The driver can use readb_relaxed() for +these cases, although only some platforms will honor the relaxed +semantics. Using the relaxed read functions will provide significant +performance benefits on platforms that support it. The qla2xxx driver +provides examples of how to use readX_relaxed(). In many cases, a majority +of the driver's readX() calls can safely be converted to readX_relaxed() +calls, since only a few will indicate or depend on DMA completion. + +Port Space Accesses +=================== + +Port Space Explained +-------------------- + +Another form of IO commonly supported is Port Space. This is a range of +addresses separate to the normal memory address space. Access to these +addresses is generally not as fast as accesses to the memory mapped +addresses, and it also has a potentially smaller address space. + +Unlike memory mapped IO, no preparation is required to access port +space. + +Accessing Port Space +-------------------- + +Accesses to this space are provided through a set of functions which +allow 8-bit, 16-bit and 32-bit accesses; also known as byte, word and +long. These functions are :c:func:`inb()`, :c:func:`inw()`, +:c:func:`inl()`, :c:func:`outb()`, :c:func:`outw()` and +:c:func:`outl()`. + +Some variants are provided for these functions. Some devices require +that accesses to their ports are slowed down. This functionality is +provided by appending a ``_p`` to the end of the function. +There are also equivalents to memcpy. The :c:func:`ins()` and +:c:func:`outs()` functions copy bytes, words or longs to the given +port. + +Public Functions Provided +========================= + +.. kernel-doc:: arch/x86/include/asm/io.h + :internal: + +.. kernel-doc:: lib/pci_iomap.c + :export: diff --git a/Documentation/driver-api/index.rst b/Documentation/driver-api/index.rst index a2e5db07756c..365ce64abd7c 100644 --- a/Documentation/driver-api/index.rst +++ b/Documentation/driver-api/index.rst @@ -16,6 +16,7 @@ available subsections can be seen below. basics infrastructure + device-io dma-buf device_link message-based -- cgit v1.2.3 From 028f25332c4f7e8befb22e12eaedd105cd45acb2 Mon Sep 17 00:00:00 2001 From: Jonathan Corbet Date: Fri, 27 Jan 2017 16:50:34 -0700 Subject: docs: Convert the regulator docbook to RST A fairly straightforward conversion to RST; the document is then added to the driver-api manual. Of course, this document has seen no substantive changes since 2008, so chances are it needs work in other areas as well. Cc: Mark Brown Signed-off-by: Jonathan Corbet --- Documentation/DocBook/regulator.tmpl | 304 --------------------------------- Documentation/driver-api/index.rst | 1 + Documentation/driver-api/regulator.rst | 170 ++++++++++++++++++ 3 files changed, 171 insertions(+), 304 deletions(-) delete mode 100644 Documentation/DocBook/regulator.tmpl create mode 100644 Documentation/driver-api/regulator.rst (limited to 'Documentation/driver-api/index.rst') diff --git a/Documentation/DocBook/regulator.tmpl b/Documentation/DocBook/regulator.tmpl deleted file mode 100644 index 3b08a085d2c7..000000000000 --- a/Documentation/DocBook/regulator.tmpl +++ /dev/null @@ -1,304 +0,0 @@ - - - - - - Voltage and current regulator API - - - - Liam - Girdwood - -
- lrg@slimlogic.co.uk -
-
-
- - Mark - Brown - - Wolfson Microelectronics -
- broonie@opensource.wolfsonmicro.com -
-
-
-
- - - 2007-2008 - Wolfson Microelectronics - - - 2008 - Liam Girdwood - - - - - This documentation is free software; you can redistribute - it and/or modify it under the terms of the GNU General Public - License version 2 as published by the Free Software Foundation. - - - - This program is distributed in the hope that it will be - useful, but WITHOUT ANY WARRANTY; without even the implied - warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. - See the GNU General Public License for more details. - - - - You should have received a copy of the GNU General Public - License along with this program; if not, write to the Free - Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, - MA 02111-1307 USA - - - - For more details see the file COPYING in the source - distribution of Linux. - - -
- - - - - Introduction - - This framework is designed to provide a standard kernel - interface to control voltage and current regulators. - - - The intention is to allow systems to dynamically control - regulator power output in order to save power and prolong - battery life. This applies to both voltage regulators (where - voltage output is controllable) and current sinks (where current - limit is controllable). - - - Note that additional (and currently more complete) documentation - is available in the Linux kernel source under - Documentation/power/regulator. - - - - Glossary - - The regulator API uses a number of terms which may not be - familiar: - - - - - Regulator - - - Electronic device that supplies power to other devices. Most - regulators can enable and disable their output and some can also - control their output voltage or current. - - - - - - Consumer - - - Electronic device which consumes power provided by a regulator. - These may either be static, requiring only a fixed supply, or - dynamic, requiring active management of the regulator at - runtime. - - - - - - Power Domain - - - The electronic circuit supplied by a given regulator, including - the regulator and all consumer devices. The configuration of - the regulator is shared between all the components in the - circuit. - - - - - - Power Management Integrated Circuit - PMIC - - - An IC which contains numerous regulators and often also other - subsystems. In an embedded system the primary PMIC is often - equivalent to a combination of the PSU and southbridge in a - desktop system. - - - - - - - - - Consumer driver interface - - This offers a similar API to the kernel clock framework. - Consumer drivers use get and put operations to acquire and - release regulators. Functions are - provided to enable - and disable the - regulator and to get and set the runtime parameters of the - regulator. - - - When requesting regulators consumers use symbolic names for their - supplies, such as "Vcc", which are mapped into actual regulator - devices by the machine interface. - - - A stub version of this API is provided when the regulator - framework is not in use in order to minimise the need to use - ifdefs. - - - - Enabling and disabling - - The regulator API provides reference counted enabling and - disabling of regulators. Consumer devices use the regulator_enable - and regulator_disable - functions to enable and disable regulators. Calls - to the two functions must be balanced. - - - Note that since multiple consumers may be using a regulator and - machine constraints may not allow the regulator to be disabled - there is no guarantee that calling - regulator_disable will actually cause the - supply provided by the regulator to be disabled. Consumer - drivers should assume that the regulator may be enabled at all - times. - - - - - Configuration - - Some consumer devices may need to be able to dynamically - configure their supplies. For example, MMC drivers may need to - select the correct operating voltage for their cards. This may - be done while the regulator is enabled or disabled. - - - The regulator_set_voltage - and regulator_set_current_limit - functions provide the primary interface for this. - Both take ranges of voltages and currents, supporting drivers - that do not require a specific value (eg, CPU frequency scaling - normally permits the CPU to use a wider range of supply - voltages at lower frequencies but does not require that the - supply voltage be lowered). Where an exact value is required - both minimum and maximum values should be identical. - - - - - Callbacks - - Callbacks may also be registered - for events such as regulation failures. - - - - - - Regulator driver interface - - Drivers for regulator chips register the regulators - with the regulator core, providing operations structures to the - core. A notifier interface - allows error conditions to be reported to the core. - - - Registration should be triggered by explicit setup done by the - platform, supplying a struct - regulator_init_data for the regulator containing - constraint and - supply information. - - - - - Machine interface - - This interface provides a way to define how regulators are - connected to consumers on a given system and what the valid - operating parameters are for the system. - - - - Supplies - - Regulator supplies are specified using struct - regulator_consumer_supply. This is done at - driver registration - time as part of the machine constraints. - - - - - Constraints - - As well as defining the connections the machine interface - also provides constraints defining the operations that - clients are allowed to perform and the parameters that may be - set. This is required since generally regulator devices will - offer more flexibility than it is safe to use on a given - system, for example supporting higher supply voltages than the - consumers are rated for. - - - This is done at driver - registration time by providing a struct - regulation_constraints. - - - The constraints may also specify an initial configuration for the - regulator in the constraints, which is particularly useful for - use with static consumers. - - - - - - API reference - - Due to limitations of the kernel documentation framework and the - existing layout of the source code the entire regulator API is - documented here. - -!Iinclude/linux/regulator/consumer.h -!Iinclude/linux/regulator/machine.h -!Iinclude/linux/regulator/driver.h -!Edrivers/regulator/core.c - -
diff --git a/Documentation/driver-api/index.rst b/Documentation/driver-api/index.rst index 365ce64abd7c..9cdf81b59ec5 100644 --- a/Documentation/driver-api/index.rst +++ b/Documentation/driver-api/index.rst @@ -22,6 +22,7 @@ available subsections can be seen below. message-based sound frame-buffer + regulator iio/index input usb diff --git a/Documentation/driver-api/regulator.rst b/Documentation/driver-api/regulator.rst new file mode 100644 index 000000000000..520da0a5251d --- /dev/null +++ b/Documentation/driver-api/regulator.rst @@ -0,0 +1,170 @@ +.. Copyright 2007-2008 Wolfson Microelectronics + +.. This documentation is free software; you can redistribute +.. it and/or modify it under the terms of the GNU General Public +.. License version 2 as published by the Free Software Foundation. + +================================= +Voltage and current regulator API +================================= + +:Author: Liam Girdwood +:Author: Mark Brown + +Introduction +============ + +This framework is designed to provide a standard kernel interface to +control voltage and current regulators. + +The intention is to allow systems to dynamically control regulator power +output in order to save power and prolong battery life. This applies to +both voltage regulators (where voltage output is controllable) and +current sinks (where current limit is controllable). + +Note that additional (and currently more complete) documentation is +available in the Linux kernel source under +``Documentation/power/regulator``. + +Glossary +-------- + +The regulator API uses a number of terms which may not be familiar: + +Regulator + + Electronic device that supplies power to other devices. Most regulators + can enable and disable their output and some can also control their + output voltage or current. + +Consumer + + Electronic device which consumes power provided by a regulator. These + may either be static, requiring only a fixed supply, or dynamic, + requiring active management of the regulator at runtime. + +Power Domain + + The electronic circuit supplied by a given regulator, including the + regulator and all consumer devices. The configuration of the regulator + is shared between all the components in the circuit. + +Power Management Integrated Circuit (PMIC) + + An IC which contains numerous regulators and often also other + subsystems. In an embedded system the primary PMIC is often equivalent + to a combination of the PSU and southbridge in a desktop system. + +Consumer driver interface +========================= + +This offers a similar API to the kernel clock framework. Consumer +drivers use `get <#API-regulator-get>`__ and +`put <#API-regulator-put>`__ operations to acquire and release +regulators. Functions are provided to `enable <#API-regulator-enable>`__ +and `disable <#API-regulator-disable>`__ the regulator and to get and +set the runtime parameters of the regulator. + +When requesting regulators consumers use symbolic names for their +supplies, such as "Vcc", which are mapped into actual regulator devices +by the machine interface. + +A stub version of this API is provided when the regulator framework is +not in use in order to minimise the need to use ifdefs. + +Enabling and disabling +---------------------- + +The regulator API provides reference counted enabling and disabling of +regulators. Consumer devices use the :c:func:`regulator_enable()` and +:c:func:`regulator_disable()` functions to enable and disable +regulators. Calls to the two functions must be balanced. + +Note that since multiple consumers may be using a regulator and machine +constraints may not allow the regulator to be disabled there is no +guarantee that calling :c:func:`regulator_disable()` will actually +cause the supply provided by the regulator to be disabled. Consumer +drivers should assume that the regulator may be enabled at all times. + +Configuration +------------- + +Some consumer devices may need to be able to dynamically configure their +supplies. For example, MMC drivers may need to select the correct +operating voltage for their cards. This may be done while the regulator +is enabled or disabled. + +The :c:func:`regulator_set_voltage()` and +:c:func:`regulator_set_current_limit()` functions provide the primary +interface for this. Both take ranges of voltages and currents, supporting +drivers that do not require a specific value (eg, CPU frequency scaling +normally permits the CPU to use a wider range of supply voltages at lower +frequencies but does not require that the supply voltage be lowered). Where +an exact value is required both minimum and maximum values should be +identical. + +Callbacks +--------- + +Callbacks may also be registered for events such as regulation failures. + +Regulator driver interface +========================== + +Drivers for regulator chips register the regulators with the regulator +core, providing operations structures to the core. A notifier interface +allows error conditions to be reported to the core. + +Registration should be triggered by explicit setup done by the platform, +supplying a struct :c:type:`regulator_init_data` for the regulator +containing constraint and supply information. + +Machine interface +================= + +This interface provides a way to define how regulators are connected to +consumers on a given system and what the valid operating parameters are +for the system. + +Supplies +-------- + +Regulator supplies are specified using struct +:c:type:`regulator_consumer_supply`. This is done at driver registration +time as part of the machine constraints. + +Constraints +----------- + +As well as defining the connections the machine interface also provides +constraints defining the operations that clients are allowed to perform +and the parameters that may be set. This is required since generally +regulator devices will offer more flexibility than it is safe to use on +a given system, for example supporting higher supply voltages than the +consumers are rated for. + +This is done at driver registration time` by providing a +struct :c:type:`regulation_constraints`. + +The constraints may also specify an initial configuration for the +regulator in the constraints, which is particularly useful for use with +static consumers. + +API reference +============= + +Due to limitations of the kernel documentation framework and the +existing layout of the source code the entire regulator API is +documented here. + +.. kernel-doc:: include/linux/regulator/consumer.h + :internal: + +.. kernel-doc:: include/linux/regulator/machine.h + :internal: + +.. kernel-doc:: include/linux/regulator/driver.h + :internal: + +.. kernel-doc:: drivers/regulator/core.c + :export: -- cgit v1.2.3 From 2728b2d2e5be4b828a523a06089cd605419fc65c Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Thu, 2 Feb 2017 01:32:13 +0100 Subject: PM / core / docs: Convert sleep states API document to reST Move the document describing the system sleep state transitions API for devices to Documentation/driver-api/pm/, convert it to reST and update it to use current terminology. Also remove the remaining reference to the old version of it from pm.h. The new document still contains references to some documents in the .txt format that will be converted later. Signed-off-by: Rafael J. Wysocki Signed-off-by: Jonathan Corbet --- Documentation/driver-api/index.rst | 1 + Documentation/driver-api/pm/conf.py | 10 + Documentation/driver-api/pm/devices.rst | 732 ++++++++++++++++++++++++++++++++ Documentation/driver-api/pm/index.rst | 15 + Documentation/driver-api/pm/types.rst | 5 + Documentation/power/devices.txt | 716 ------------------------------- include/linux/pm.h | 3 - 7 files changed, 763 insertions(+), 719 deletions(-) create mode 100644 Documentation/driver-api/pm/conf.py create mode 100644 Documentation/driver-api/pm/devices.rst create mode 100644 Documentation/driver-api/pm/index.rst create mode 100644 Documentation/driver-api/pm/types.rst delete mode 100644 Documentation/power/devices.txt (limited to 'Documentation/driver-api/index.rst') diff --git a/Documentation/driver-api/index.rst b/Documentation/driver-api/index.rst index 9cdf81b59ec5..ea580c0aa232 100644 --- a/Documentation/driver-api/index.rst +++ b/Documentation/driver-api/index.rst @@ -16,6 +16,7 @@ available subsections can be seen below. basics infrastructure + pm/index device-io dma-buf device_link diff --git a/Documentation/driver-api/pm/conf.py b/Documentation/driver-api/pm/conf.py new file mode 100644 index 000000000000..a89fac11272f --- /dev/null +++ b/Documentation/driver-api/pm/conf.py @@ -0,0 +1,10 @@ +# -*- coding: utf-8; mode: python -*- + +project = "Device Power Management" + +tags.add("subproject") + +latex_documents = [ + ('index', 'pm.tex', project, + 'The kernel development community', 'manual'), +] diff --git a/Documentation/driver-api/pm/devices.rst b/Documentation/driver-api/pm/devices.rst new file mode 100644 index 000000000000..f165cf6f733e --- /dev/null +++ b/Documentation/driver-api/pm/devices.rst @@ -0,0 +1,732 @@ +.. |struct| replace:: :c:type:`struct` + +============================== +Device Power Management Basics +============================== + +:: + + Copyright (c) 2010-2011 Rafael J. Wysocki , Novell Inc. + Copyright (c) 2010 Alan Stern + Copyright (c) 2016 Intel Corp., Rafael J. Wysocki + +Most of the code in Linux is device drivers, so most of the Linux power +management (PM) code is also driver-specific. Most drivers will do very +little; others, especially for platforms with small batteries (like cell +phones), will do a lot. + +This writeup gives an overview of how drivers interact with system-wide +power management goals, emphasizing the models and interfaces that are +shared by everything that hooks up to the driver model core. Read it as +background for the domain-specific work you'd do with any specific driver. + + +Two Models for Device Power Management +====================================== + +Drivers will use one or both of these models to put devices into low-power +states: + + System Sleep model: + + Drivers can enter low-power states as part of entering system-wide + low-power states like "suspend" (also known as "suspend-to-RAM"), or + (mostly for systems with disks) "hibernation" (also known as + "suspend-to-disk"). + + This is something that device, bus, and class drivers collaborate on + by implementing various role-specific suspend and resume methods to + cleanly power down hardware and software subsystems, then reactivate + them without loss of data. + + Some drivers can manage hardware wakeup events, which make the system + leave the low-power state. This feature may be enabled or disabled + using the relevant :file:`/sys/devices/.../power/wakeup` file (for + Ethernet drivers the ioctl interface used by ethtool may also be used + for this purpose); enabling it may cost some power usage, but let the + whole system enter low-power states more often. + + Runtime Power Management model: + + Devices may also be put into low-power states while the system is + running, independently of other power management activity in principle. + However, devices are not generally independent of each other (for + example, a parent device cannot be suspended unless all of its child + devices have been suspended). Moreover, depending on the bus type the + device is on, it may be necessary to carry out some bus-specific + operations on the device for this purpose. Devices put into low power + states at run time may require special handling during system-wide power + transitions (suspend or hibernation). + + For these reasons not only the device driver itself, but also the + appropriate subsystem (bus type, device type or device class) driver and + the PM core are involved in runtime power management. As in the system + sleep power management case, they need to collaborate by implementing + various role-specific suspend and resume methods, so that the hardware + is cleanly powered down and reactivated without data or service loss. + +There's not a lot to be said about those low-power states except that they are +very system-specific, and often device-specific. Also, that if enough devices +have been put into low-power states (at runtime), the effect may be very similar +to entering some system-wide low-power state (system sleep) ... and that +synergies exist, so that several drivers using runtime PM might put the system +into a state where even deeper power saving options are available. + +Most suspended devices will have quiesced all I/O: no more DMA or IRQs (except +for wakeup events), no more data read or written, and requests from upstream +drivers are no longer accepted. A given bus or platform may have different +requirements though. + +Examples of hardware wakeup events include an alarm from a real time clock, +network wake-on-LAN packets, keyboard or mouse activity, and media insertion +or removal (for PCMCIA, MMC/SD, USB, and so on). + +Interfaces for Entering System Sleep States +=========================================== + +There are programming interfaces provided for subsystems (bus type, device type, +device class) and device drivers to allow them to participate in the power +management of devices they are concerned with. These interfaces cover both +system sleep and runtime power management. + + +Device Power Management Operations +---------------------------------- + +Device power management operations, at the subsystem level as well as at the +device driver level, are implemented by defining and populating objects of type +|struct| :c:type:`dev_pm_ops` defined in :file:`include/linux/pm.h`. +The roles of the methods included in it will be explained in what follows. For +now, it should be sufficient to remember that the last three methods are +specific to runtime power management while the remaining ones are used during +system-wide power transitions. + +There also is a deprecated "old" or "legacy" interface for power management +operations available at least for some subsystems. This approach does not use +|struct| :c:type:`dev_pm_ops` objects and it is suitable only for implementing +system sleep power management methods in a limited way. Therefore it is not +described in this document, so please refer directly to the source code for more +information about it. + + +Subsystem-Level Methods +----------------------- + +The core methods to suspend and resume devices reside in +|struct| :c:type:`dev_pm_ops` pointed to by the :c:member:`ops` +member of |struct| :c:type:`dev_pm_domain`, or by the :c:member:`pm` +member of |struct| :c:type:`bus_type`, |struct| :c:type:`device_type` and +|struct| :c:type:`class`. They are mostly of interest to the people writing +infrastructure for platforms and buses, like PCI or USB, or device type and +device class drivers. They also are relevant to the writers of device drivers +whose subsystems (PM domains, device types, device classes and bus types) don't +provide all power management methods. + +Bus drivers implement these methods as appropriate for the hardware and the +drivers using it; PCI works differently from USB, and so on. Not many people +write subsystem-level drivers; most driver code is a "device driver" that builds +on top of bus-specific framework code. + +For more information on these driver calls, see the description later; +they are called in phases for every device, respecting the parent-child +sequencing in the driver model tree. + + +:file:`/sys/devices/.../power/wakeup` files +------------------------------------------- + +All device objects in the driver model contain fields that control the handling +of system wakeup events (hardware signals that can force the system out of a +sleep state). These fields are initialized by bus or device driver code using +:c:func:`device_set_wakeup_capable()` and :c:func:`device_set_wakeup_enable()`, +defined in :file:`include/linux/pm_wakeup.h`. + +The :c:member:`power.can_wakeup` flag just records whether the device (and its +driver) can physically support wakeup events. The +:c:func:`device_set_wakeup_capable()` routine affects this flag. The +:c:member:`power.wakeup` field is a pointer to an object of type +|struct| :c:type:`wakeup_source` used for controlling whether or not +the device should use its system wakeup mechanism and for notifying the +PM core of system wakeup events signaled by the device. This object is only +present for wakeup-capable devices (i.e. devices whose +:c:member:`can_wakeup` flags are set) and is created (or removed) by +:c:func:`device_set_wakeup_capable()`. + +Whether or not a device is capable of issuing wakeup events is a hardware +matter, and the kernel is responsible for keeping track of it. By contrast, +whether or not a wakeup-capable device should issue wakeup events is a policy +decision, and it is managed by user space through a sysfs attribute: the +:file:`power/wakeup` file. User space can write the "enabled" or "disabled" +strings to it to indicate whether or not, respectively, the device is supposed +to signal system wakeup. This file is only present if the +:c:member:`power.wakeup` object exists for the given device and is created (or +removed) along with that object, by :c:func:`device_set_wakeup_capable()`. +Reads from the file will return the corresponding string. + +The initial value in the :file:`power/wakeup` file is "disabled" for the +majority of devices; the major exceptions are power buttons, keyboards, and +Ethernet adapters whose WoL (wake-on-LAN) feature has been set up with ethtool. +It should also default to "enabled" for devices that don't generate wakeup +requests on their own but merely forward wakeup requests from one bus to another +(like PCI Express ports). + +The :c:func:`device_may_wakeup()` routine returns true only if the +:c:member:`power.wakeup` object exists and the corresponding :file:`power/wakeup` +file contains the "enabled" string. This information is used by subsystems, +like the PCI bus type code, to see whether or not to enable the devices' wakeup +mechanisms. If device wakeup mechanisms are enabled or disabled directly by +drivers, they also should use :c:func:`device_may_wakeup()` to decide what to do +during a system sleep transition. Device drivers, however, are not expected to +call :c:func:`device_set_wakeup_enable()` directly in any case. + +It ought to be noted that system wakeup is conceptually different from "remote +wakeup" used by runtime power management, although it may be supported by the +same physical mechanism. Remote wakeup is a feature allowing devices in +low-power states to trigger specific interrupts to signal conditions in which +they should be put into the full-power state. Those interrupts may or may not +be used to signal system wakeup events, depending on the hardware design. On +some systems it is impossible to trigger them from system sleep states. In any +case, remote wakeup should always be enabled for runtime power management for +all devices and drivers that support it. + + +:file:`/sys/devices/.../power/control` files +-------------------------------------------- + +Each device in the driver model has a flag to control whether it is subject to +runtime power management. This flag, :c:member:`runtime_auto`, is initialized +by the bus type (or generally subsystem) code using :c:func:`pm_runtime_allow()` +or :c:func:`pm_runtime_forbid()`; the default is to allow runtime power +management. + +The setting can be adjusted by user space by writing either "on" or "auto" to +the device's :file:`power/control` sysfs file. Writing "auto" calls +:c:func:`pm_runtime_allow()`, setting the flag and allowing the device to be +runtime power-managed by its driver. Writing "on" calls +:c:func:`pm_runtime_forbid()`, clearing the flag, returning the device to full +power if it was in a low-power state, and preventing the +device from being runtime power-managed. User space can check the current value +of the :c:member:`runtime_auto` flag by reading that file. + +The device's :c:member:`runtime_auto` flag has no effect on the handling of +system-wide power transitions. In particular, the device can (and in the +majority of cases should and will) be put into a low-power state during a +system-wide transition to a sleep state even though its :c:member:`runtime_auto` +flag is clear. + +For more information about the runtime power management framework, refer to +:file:`Documentation/power/runtime_pm.txt`. + + +Calling Drivers to Enter and Leave System Sleep States +====================================================== + +When the system goes into a sleep state, each device's driver is asked to +suspend the device by putting it into a state compatible with the target +system state. That's usually some version of "off", but the details are +system-specific. Also, wakeup-enabled devices will usually stay partly +functional in order to wake the system. + +When the system leaves that low-power state, the device's driver is asked to +resume it by returning it to full power. The suspend and resume operations +always go together, and both are multi-phase operations. + +For simple drivers, suspend might quiesce the device using class code +and then turn its hardware as "off" as possible during suspend_noirq. The +matching resume calls would then completely reinitialize the hardware +before reactivating its class I/O queues. + +More power-aware drivers might prepare the devices for triggering system wakeup +events. + + +Call Sequence Guarantees +------------------------ + +To ensure that bridges and similar links needing to talk to a device are +available when the device is suspended or resumed, the device hierarchy is +walked in a bottom-up order to suspend devices. A top-down order is +used to resume those devices. + +The ordering of the device hierarchy is defined by the order in which devices +get registered: a child can never be registered, probed or resumed before +its parent; and can't be removed or suspended after that parent. + +The policy is that the device hierarchy should match hardware bus topology. +[Or at least the control bus, for devices which use multiple busses.] +In particular, this means that a device registration may fail if the parent of +the device is suspending (i.e. has been chosen by the PM core as the next +device to suspend) or has already suspended, as well as after all of the other +devices have been suspended. Device drivers must be prepared to cope with such +situations. + + +System Power Management Phases +------------------------------ + +Suspending or resuming the system is done in several phases. Different phases +are used for suspend-to-idle, shallow (standby), and deep ("suspend-to-RAM") +sleep states and the hibernation state ("suspend-to-disk"). Each phase involves +executing callbacks for every device before the next phase begins. Not all +buses or classes support all these callbacks and not all drivers use all the +callbacks. The various phases always run after tasks have been frozen and +before they are unfrozen. Furthermore, the ``*_noirq phases`` run at a time +when IRQ handlers have been disabled (except for those marked with the +IRQF_NO_SUSPEND flag). + +All phases use PM domain, bus, type, class or driver callbacks (that is, methods +defined in ``dev->pm_domain->ops``, ``dev->bus->pm``, ``dev->type->pm``, +``dev->class->pm`` or ``dev->driver->pm``). These callbacks are regarded by the +PM core as mutually exclusive. Moreover, PM domain callbacks always take +precedence over all of the other callbacks and, for example, type callbacks take +precedence over bus, class and driver callbacks. To be precise, the following +rules are used to determine which callback to execute in the given phase: + + 1. If ``dev->pm_domain`` is present, the PM core will choose the callback + provided by ``dev->pm_domain->ops`` for execution. + + 2. Otherwise, if both ``dev->type`` and ``dev->type->pm`` are present, the + callback provided by ``dev->type->pm`` will be chosen for execution. + + 3. Otherwise, if both ``dev->class`` and ``dev->class->pm`` are present, + the callback provided by ``dev->class->pm`` will be chosen for + execution. + + 4. Otherwise, if both ``dev->bus`` and ``dev->bus->pm`` are present, the + callback provided by ``dev->bus->pm`` will be chosen for execution. + +This allows PM domains and device types to override callbacks provided by bus +types or device classes if necessary. + +The PM domain, type, class and bus callbacks may in turn invoke device- or +driver-specific methods stored in ``dev->driver->pm``, but they don't have to do +that. + +If the subsystem callback chosen for execution is not present, the PM core will +execute the corresponding method from the ``dev->driver->pm`` set instead if +there is one. + + +Entering System Suspend +----------------------- + +When the system goes into the freeze, standby or memory sleep state, +the phases are: ``prepare``, ``suspend``, ``suspend_late``, ``suspend_noirq``. + + 1. The ``prepare`` phase is meant to prevent races by preventing new + devices from being registered; the PM core would never know that all the + children of a device had been suspended if new children could be + registered at will. [By contrast, from the PM core's perspective, + devices may be unregistered at any time.] Unlike the other + suspend-related phases, during the ``prepare`` phase the device + hierarchy is traversed top-down. + + After the ``->prepare`` callback method returns, no new children may be + registered below the device. The method may also prepare the device or + driver in some way for the upcoming system power transition, but it + should not put the device into a low-power state. + + For devices supporting runtime power management, the return value of the + prepare callback can be used to indicate to the PM core that it may + safely leave the device in runtime suspend (if runtime-suspended + already), provided that all of the device's descendants are also left in + runtime suspend. Namely, if the prepare callback returns a positive + number and that happens for all of the descendants of the device too, + and all of them (including the device itself) are runtime-suspended, the + PM core will skip the ``suspend``, ``suspend_late`` and + ``suspend_noirq`` phases as well as all of the corresponding phases of + the subsequent device resume for all of these devices. In that case, + the ``->complete`` callback will be invoked directly after the + ``->prepare`` callback and is entirely responsible for putting the + device into a consistent state as appropriate. + + Note that this direct-complete procedure applies even if the device is + disabled for runtime PM; only the runtime-PM status matters. It follows + that if a device has system-sleep callbacks but does not support runtime + PM, then its prepare callback must never return a positive value. This + is because all such devices are initially set to runtime-suspended with + runtime PM disabled. + + 2. The ``->suspend`` methods should quiesce the device to stop it from + performing I/O. They also may save the device registers and put it into + the appropriate low-power state, depending on the bus type the device is + on, and they may enable wakeup events. + + 3. For a number of devices it is convenient to split suspend into the + "quiesce device" and "save device state" phases, in which cases + ``suspend_late`` is meant to do the latter. It is always executed after + runtime power management has been disabled for the device in question. + + 4. The ``suspend_noirq`` phase occurs after IRQ handlers have been disabled, + which means that the driver's interrupt handler will not be called while + the callback method is running. The ``->suspend_noirq`` methods should + save the values of the device's registers that weren't saved previously + and finally put the device into the appropriate low-power state. + + The majority of subsystems and device drivers need not implement this + callback. However, bus types allowing devices to share interrupt + vectors, like PCI, generally need it; otherwise a driver might encounter + an error during the suspend phase by fielding a shared interrupt + generated by some other device after its own device had been set to low + power. + +At the end of these phases, drivers should have stopped all I/O transactions +(DMA, IRQs), saved enough state that they can re-initialize or restore previous +state (as needed by the hardware), and placed the device into a low-power state. +On many platforms they will gate off one or more clock sources; sometimes they +will also switch off power supplies or reduce voltages. [Drivers supporting +runtime PM may already have performed some or all of these steps.] + +If :c:func:`device_may_wakeup(dev)` returns ``true``, the device should be +prepared for generating hardware wakeup signals to trigger a system wakeup event +when the system is in the sleep state. For example, :c:func:`enable_irq_wake()` +might identify GPIO signals hooked up to a switch or other external hardware, +and :c:func:`pci_enable_wake()` does something similar for the PCI PME signal. + +If any of these callbacks returns an error, the system won't enter the desired +low-power state. Instead, the PM core will unwind its actions by resuming all +the devices that were suspended. + + +Leaving System Suspend +---------------------- + +When resuming from freeze, standby or memory sleep, the phases are: +``resume_noirq``, ``resume_early``, ``resume``, ``complete``. + + 1. The ``->resume_noirq`` callback methods should perform any actions + needed before the driver's interrupt handlers are invoked. This + generally means undoing the actions of the ``suspend_noirq`` phase. If + the bus type permits devices to share interrupt vectors, like PCI, the + method should bring the device and its driver into a state in which the + driver can recognize if the device is the source of incoming interrupts, + if any, and handle them correctly. + + For example, the PCI bus type's ``->pm.resume_noirq()`` puts the device + into the full-power state (D0 in the PCI terminology) and restores the + standard configuration registers of the device. Then it calls the + device driver's ``->pm.resume_noirq()`` method to perform device-specific + actions. + + 2. The ``->resume_early`` methods should prepare devices for the execution + of the resume methods. This generally involves undoing the actions of + the preceding ``suspend_late`` phase. + + 3. The ``->resume`` methods should bring the device back to its operating + state, so that it can perform normal I/O. This generally involves + undoing the actions of the ``suspend`` phase. + + 4. The ``complete`` phase should undo the actions of the ``prepare`` phase. + For this reason, unlike the other resume-related phases, during the + ``complete`` phase the device hierarchy is traversed bottom-up. + + Note, however, that new children may be registered below the device as + soon as the ``->resume`` callbacks occur; it's not necessary to wait + until the ``complete`` phase with that. + + Moreover, if the preceding ``->prepare`` callback returned a positive + number, the device may have been left in runtime suspend throughout the + whole system suspend and resume (the ``suspend``, ``suspend_late``, + ``suspend_noirq`` phases of system suspend and the ``resume_noirq``, + ``resume_early``, ``resume`` phases of system resume may have been + skipped for it). In that case, the ``->complete`` callback is entirely + responsible for putting the device into a consistent state after system + suspend if necessary. [For example, it may need to queue up a runtime + resume request for the device for this purpose.] To check if that is + the case, the ``->complete`` callback can consult the device's + ``power.direct_complete`` flag. Namely, if that flag is set when the + ``->complete`` callback is being run, it has been called directly after + the preceding ``->prepare`` and special actions may be required + to make the device work correctly afterward. + +At the end of these phases, drivers should be as functional as they were before +suspending: I/O can be performed using DMA and IRQs, and the relevant clocks are +gated on. + +However, the details here may again be platform-specific. For example, +some systems support multiple "run" states, and the mode in effect at +the end of resume might not be the one which preceded suspension. +That means availability of certain clocks or power supplies changed, +which could easily affect how a driver works. + +Drivers need to be able to handle hardware which has been reset since all of the +suspend methods were called, for example by complete reinitialization. +This may be the hardest part, and the one most protected by NDA'd documents +and chip errata. It's simplest if the hardware state hasn't changed since +the suspend was carried out, but that can only be guaranteed if the target +system sleep entered was suspend-to-idle. For the other system sleep states +that may not be the case (and usually isn't for ACPI-defined system sleep +states, like S3). + +Drivers must also be prepared to notice that the device has been removed +while the system was powered down, whenever that's physically possible. +PCMCIA, MMC, USB, Firewire, SCSI, and even IDE are common examples of busses +where common Linux platforms will see such removal. Details of how drivers +will notice and handle such removals are currently bus-specific, and often +involve a separate thread. + +These callbacks may return an error value, but the PM core will ignore such +errors since there's nothing it can do about them other than printing them in +the system log. + + +Entering Hibernation +-------------------- + +Hibernating the system is more complicated than putting it into sleep states, +because it involves creating and saving a system image. Therefore there are +more phases for hibernation, with a different set of callbacks. These phases +always run after tasks have been frozen and enough memory has been freed. + +The general procedure for hibernation is to quiesce all devices ("freeze"), +create an image of the system memory while everything is stable, reactivate all +devices ("thaw"), write the image to permanent storage, and finally shut down +the system ("power off"). The phases used to accomplish this are: ``prepare``, +``freeze``, ``freeze_late``, ``freeze_noirq``, ``thaw_noirq``, ``thaw_early``, +``thaw``, ``complete``, ``prepare``, ``poweroff``, ``poweroff_late``, +``poweroff_noirq``. + + 1. The ``prepare`` phase is discussed in the "Entering System Suspend" + section above. + + 2. The ``->freeze`` methods should quiesce the device so that it doesn't + generate IRQs or DMA, and they may need to save the values of device + registers. However the device does not have to be put in a low-power + state, and to save time it's best not to do so. Also, the device should + not be prepared to generate wakeup events. + + 3. The ``freeze_late`` phase is analogous to the ``suspend_late`` phase + described earlier, except that the device should not be put into a + low-power state and should not be allowed to generate wakeup events. + + 4. The ``freeze_noirq`` phase is analogous to the ``suspend_noirq`` phase + discussed earlier, except again that the device should not be put into + a low-power state and should not be allowed to generate wakeup events. + +At this point the system image is created. All devices should be inactive and +the contents of memory should remain undisturbed while this happens, so that the +image forms an atomic snapshot of the system state. + + 5. The ``thaw_noirq`` phase is analogous to the ``resume_noirq`` phase + discussed earlier. The main difference is that its methods can assume + the device is in the same state as at the end of the ``freeze_noirq`` + phase. + + 6. The ``thaw_early`` phase is analogous to the ``resume_early`` phase + described above. Its methods should undo the actions of the preceding + ``freeze_late``, if necessary. + + 7. The ``thaw`` phase is analogous to the ``resume`` phase discussed + earlier. Its methods should bring the device back to an operating + state, so that it can be used for saving the image if necessary. + + 8. The ``complete`` phase is discussed in the "Leaving System Suspend" + section above. + +At this point the system image is saved, and the devices then need to be +prepared for the upcoming system shutdown. This is much like suspending them +before putting the system into the suspend-to-idle, shallow or deep sleep state, +and the phases are similar. + + 9. The ``prepare`` phase is discussed above. + + 10. The ``poweroff`` phase is analogous to the ``suspend`` phase. + + 11. The ``poweroff_late`` phase is analogous to the ``suspend_late`` phase. + + 12. The ``poweroff_noirq`` phase is analogous to the ``suspend_noirq`` phase. + +The ``->poweroff``, ``->poweroff_late`` and ``->poweroff_noirq`` callbacks +should do essentially the same things as the ``->suspend``, ``->suspend_late`` +and ``->suspend_noirq`` callbacks, respectively. The only notable difference is +that they need not store the device register values, because the registers +should already have been stored during the ``freeze``, ``freeze_late`` or +``freeze_noirq`` phases. + + +Leaving Hibernation +------------------- + +Resuming from hibernation is, again, more complicated than resuming from a sleep +state in which the contents of main memory are preserved, because it requires +a system image to be loaded into memory and the pre-hibernation memory contents +to be restored before control can be passed back to the image kernel. + +Although in principle the image might be loaded into memory and the +pre-hibernation memory contents restored by the boot loader, in practice this +can't be done because boot loaders aren't smart enough and there is no +established protocol for passing the necessary information. So instead, the +boot loader loads a fresh instance of the kernel, called "the restore kernel", +into memory and passes control to it in the usual way. Then the restore kernel +reads the system image, restores the pre-hibernation memory contents, and passes +control to the image kernel. Thus two different kernel instances are involved +in resuming from hibernation. In fact, the restore kernel may be completely +different from the image kernel: a different configuration and even a different +version. This has important consequences for device drivers and their +subsystems. + +To be able to load the system image into memory, the restore kernel needs to +include at least a subset of device drivers allowing it to access the storage +medium containing the image, although it doesn't need to include all of the +drivers present in the image kernel. After the image has been loaded, the +devices managed by the boot kernel need to be prepared for passing control back +to the image kernel. This is very similar to the initial steps involved in +creating a system image, and it is accomplished in the same way, using +``prepare``, ``freeze``, and ``freeze_noirq`` phases. However, the devices +affected by these phases are only those having drivers in the restore kernel; +other devices will still be in whatever state the boot loader left them. + +Should the restoration of the pre-hibernation memory contents fail, the restore +kernel would go through the "thawing" procedure described above, using the +``thaw_noirq``, ``thaw_early``, ``thaw``, and ``complete`` phases, and then +continue running normally. This happens only rarely. Most often the +pre-hibernation memory contents are restored successfully and control is passed +to the image kernel, which then becomes responsible for bringing the system back +to the working state. + +To achieve this, the image kernel must restore the devices' pre-hibernation +functionality. The operation is much like waking up from a sleep state (with +the memory contents preserved), although it involves different phases: +``restore_noirq``, ``restore_early``, ``restore``, ``complete``. + + 1. The ``restore_noirq`` phase is analogous to the ``resume_noirq`` phase. + + 2. The ``restore_early`` phase is analogous to the ``resume_early`` phase. + + 3. The ``restore`` phase is analogous to the ``resume`` phase. + + 4. The ``complete`` phase is discussed above. + +The main difference from ``resume[_early|_noirq]`` is that +``restore[_early|_noirq]`` must assume the device has been accessed and +reconfigured by the boot loader or the restore kernel. Consequently, the state +of the device may be different from the state remembered from the ``freeze``, +``freeze_late`` and ``freeze_noirq`` phases. The device may even need to be +reset and completely re-initialized. In many cases this difference doesn't +matter, so the ``->resume[_early|_noirq]`` and ``->restore[_early|_norq]`` +method pointers can be set to the same routines. Nevertheless, different +callback pointers are used in case there is a situation where it actually does +matter. + + +Power Management Notifiers +========================== + +There are some operations that cannot be carried out by the power management +callbacks discussed above, because the callbacks occur too late or too early. +To handle these cases, subsystems and device drivers may register power +management notifiers that are called before tasks are frozen and after they have +been thawed. Generally speaking, the PM notifiers are suitable for performing +actions that either require user space to be available, or at least won't +interfere with user space. + +For details refer to :file:`Documentation/power/notifiers.txt`. + + +Device Low-Power (suspend) States +================================= + +Device low-power states aren't standard. One device might only handle +"on" and "off", while another might support a dozen different versions of +"on" (how many engines are active?), plus a state that gets back to "on" +faster than from a full "off". + +Some buses define rules about what different suspend states mean. PCI +gives one example: after the suspend sequence completes, a non-legacy +PCI device may not perform DMA or issue IRQs, and any wakeup events it +issues would be issued through the PME# bus signal. Plus, there are +several PCI-standard device states, some of which are optional. + +In contrast, integrated system-on-chip processors often use IRQs as the +wakeup event sources (so drivers would call :c:func:`enable_irq_wake`) and +might be able to treat DMA completion as a wakeup event (sometimes DMA can stay +active too, it'd only be the CPU and some peripherals that sleep). + +Some details here may be platform-specific. Systems may have devices that +can be fully active in certain sleep states, such as an LCD display that's +refreshed using DMA while most of the system is sleeping lightly ... and +its frame buffer might even be updated by a DSP or other non-Linux CPU while +the Linux control processor stays idle. + +Moreover, the specific actions taken may depend on the target system state. +One target system state might allow a given device to be very operational; +another might require a hard shut down with re-initialization on resume. +And two different target systems might use the same device in different +ways; the aforementioned LCD might be active in one product's "standby", +but a different product using the same SOC might work differently. + + +Device Power Management Domains +=============================== + +Sometimes devices share reference clocks or other power resources. In those +cases it generally is not possible to put devices into low-power states +individually. Instead, a set of devices sharing a power resource can be put +into a low-power state together at the same time by turning off the shared +power resource. Of course, they also need to be put into the full-power state +together, by turning the shared power resource on. A set of devices with this +property is often referred to as a power domain. A power domain may also be +nested inside another power domain. The nested domain is referred to as the +sub-domain of the parent domain. + +Support for power domains is provided through the :c:member:`pm_domain` field of +|struct| :c:type:`device`. This field is a pointer to an object of +type |struct| :c:type:`dev_pm_domain`, defined in :file:`include/linux/pm.h``, +providing a set of power management callbacks analogous to the subsystem-level +and device driver callbacks that are executed for the given device during all +power transitions, instead of the respective subsystem-level callbacks. +Specifically, if a device's :c:member:`pm_domain` pointer is not NULL, the +``->suspend()`` callback from the object pointed to by it will be executed +instead of its subsystem's (e.g. bus type's) ``->suspend()`` callback and +analogously for all of the remaining callbacks. In other words, power +management domain callbacks, if defined for the given device, always take +precedence over the callbacks provided by the device's subsystem (e.g. bus type). + +The support for device power management domains is only relevant to platforms +needing to use the same device driver power management callbacks in many +different power domain configurations and wanting to avoid incorporating the +support for power domains into subsystem-level callbacks, for example by +modifying the platform bus type. Other platforms need not implement it or take +it into account in any way. + +Devices may be defined as IRQ-safe which indicates to the PM core that their +runtime PM callbacks may be invoked with disabled interrupts (see +:file:`Documentation/power/runtime_pm.txt` for more information). If an +IRQ-safe device belongs to a PM domain, the runtime PM of the domain will be +disallowed, unless the domain itself is defined as IRQ-safe. However, it +makes sense to define a PM domain as IRQ-safe only if all the devices in it +are IRQ-safe. Moreover, if an IRQ-safe domain has a parent domain, the runtime +PM of the parent is only allowed if the parent itself is IRQ-safe too with the +additional restriction that all child domains of an IRQ-safe parent must also +be IRQ-safe. + + +Runtime Power Management +======================== + +Many devices are able to dynamically power down while the system is still +running. This feature is useful for devices that are not being used, and +can offer significant power savings on a running system. These devices +often support a range of runtime power states, which might use names such +as "off", "sleep", "idle", "active", and so on. Those states will in some +cases (like PCI) be partially constrained by the bus the device uses, and will +usually include hardware states that are also used in system sleep states. + +A system-wide power transition can be started while some devices are in low +power states due to runtime power management. The system sleep PM callbacks +should recognize such situations and react to them appropriately, but the +necessary actions are subsystem-specific. + +In some cases the decision may be made at the subsystem level while in other +cases the device driver may be left to decide. In some cases it may be +desirable to leave a suspended device in that state during a system-wide power +transition, but in other cases the device must be put back into the full-power +state temporarily, for example so that its system wakeup capability can be +disabled. This all depends on the hardware and the design of the subsystem and +device driver in question. + +During system-wide resume from a sleep state it's easiest to put devices into +the full-power state, as explained in :file:`Documentation/power/runtime_pm.txt`. +Refer to that document for more information regarding this particular issue as +well as for information on the device runtime power management framework in +general. diff --git a/Documentation/driver-api/pm/index.rst b/Documentation/driver-api/pm/index.rst new file mode 100644 index 000000000000..f7b3ce9e9ba0 --- /dev/null +++ b/Documentation/driver-api/pm/index.rst @@ -0,0 +1,15 @@ +======================= +Device Power Management +======================= + +.. toctree:: + + devices + types + +.. only:: subproject and html + + Indices + ======= + + * :ref:`genindex` diff --git a/Documentation/driver-api/pm/types.rst b/Documentation/driver-api/pm/types.rst new file mode 100644 index 000000000000..3ebdecc54104 --- /dev/null +++ b/Documentation/driver-api/pm/types.rst @@ -0,0 +1,5 @@ +================================== +Device Power Management Data Types +================================== + +.. kernel-doc:: include/linux/pm.h diff --git a/Documentation/power/devices.txt b/Documentation/power/devices.txt deleted file mode 100644 index 73ddea39a9ce..000000000000 --- a/Documentation/power/devices.txt +++ /dev/null @@ -1,716 +0,0 @@ -Device Power Management - -Copyright (c) 2010-2011 Rafael J. Wysocki , Novell Inc. -Copyright (c) 2010 Alan Stern -Copyright (c) 2014 Intel Corp., Rafael J. Wysocki - - -Most of the code in Linux is device drivers, so most of the Linux power -management (PM) code is also driver-specific. Most drivers will do very -little; others, especially for platforms with small batteries (like cell -phones), will do a lot. - -This writeup gives an overview of how drivers interact with system-wide -power management goals, emphasizing the models and interfaces that are -shared by everything that hooks up to the driver model core. Read it as -background for the domain-specific work you'd do with any specific driver. - - -Two Models for Device Power Management -====================================== -Drivers will use one or both of these models to put devices into low-power -states: - - System Sleep model: - Drivers can enter low-power states as part of entering system-wide - low-power states like "suspend" (also known as "suspend-to-RAM"), or - (mostly for systems with disks) "hibernation" (also known as - "suspend-to-disk"). - - This is something that device, bus, and class drivers collaborate on - by implementing various role-specific suspend and resume methods to - cleanly power down hardware and software subsystems, then reactivate - them without loss of data. - - Some drivers can manage hardware wakeup events, which make the system - leave the low-power state. This feature may be enabled or disabled - using the relevant /sys/devices/.../power/wakeup file (for Ethernet - drivers the ioctl interface used by ethtool may also be used for this - purpose); enabling it may cost some power usage, but let the whole - system enter low-power states more often. - - Runtime Power Management model: - Devices may also be put into low-power states while the system is - running, independently of other power management activity in principle. - However, devices are not generally independent of each other (for - example, a parent device cannot be suspended unless all of its child - devices have been suspended). Moreover, depending on the bus type the - device is on, it may be necessary to carry out some bus-specific - operations on the device for this purpose. Devices put into low power - states at run time may require special handling during system-wide power - transitions (suspend or hibernation). - - For these reasons not only the device driver itself, but also the - appropriate subsystem (bus type, device type or device class) driver and - the PM core are involved in runtime power management. As in the system - sleep power management case, they need to collaborate by implementing - various role-specific suspend and resume methods, so that the hardware - is cleanly powered down and reactivated without data or service loss. - -There's not a lot to be said about those low-power states except that they are -very system-specific, and often device-specific. Also, that if enough devices -have been put into low-power states (at runtime), the effect may be very similar -to entering some system-wide low-power state (system sleep) ... and that -synergies exist, so that several drivers using runtime PM might put the system -into a state where even deeper power saving options are available. - -Most suspended devices will have quiesced all I/O: no more DMA or IRQs (except -for wakeup events), no more data read or written, and requests from upstream -drivers are no longer accepted. A given bus or platform may have different -requirements though. - -Examples of hardware wakeup events include an alarm from a real time clock, -network wake-on-LAN packets, keyboard or mouse activity, and media insertion -or removal (for PCMCIA, MMC/SD, USB, and so on). - - -Interfaces for Entering System Sleep States -=========================================== -There are programming interfaces provided for subsystems (bus type, device type, -device class) and device drivers to allow them to participate in the power -management of devices they are concerned with. These interfaces cover both -system sleep and runtime power management. - - -Device Power Management Operations ----------------------------------- -Device power management operations, at the subsystem level as well as at the -device driver level, are implemented by defining and populating objects of type -struct dev_pm_ops: - -struct dev_pm_ops { - int (*prepare)(struct device *dev); - void (*complete)(struct device *dev); - int (*suspend)(struct device *dev); - int (*resume)(struct device *dev); - int (*freeze)(struct device *dev); - int (*thaw)(struct device *dev); - int (*poweroff)(struct device *dev); - int (*restore)(struct device *dev); - int (*suspend_late)(struct device *dev); - int (*resume_early)(struct device *dev); - int (*freeze_late)(struct device *dev); - int (*thaw_early)(struct device *dev); - int (*poweroff_late)(struct device *dev); - int (*restore_early)(struct device *dev); - int (*suspend_noirq)(struct device *dev); - int (*resume_noirq)(struct device *dev); - int (*freeze_noirq)(struct device *dev); - int (*thaw_noirq)(struct device *dev); - int (*poweroff_noirq)(struct device *dev); - int (*restore_noirq)(struct device *dev); - int (*runtime_suspend)(struct device *dev); - int (*runtime_resume)(struct device *dev); - int (*runtime_idle)(struct device *dev); -}; - -This structure is defined in include/linux/pm.h and the methods included in it -are also described in that file. Their roles will be explained in what follows. -For now, it should be sufficient to remember that the last three methods are -specific to runtime power management while the remaining ones are used during -system-wide power transitions. - -There also is a deprecated "old" or "legacy" interface for power management -operations available at least for some subsystems. This approach does not use -struct dev_pm_ops objects and it is suitable only for implementing system sleep -power management methods. Therefore it is not described in this document, so -please refer directly to the source code for more information about it. - - -Subsystem-Level Methods ------------------------ -The core methods to suspend and resume devices reside in struct dev_pm_ops -pointed to by the ops member of struct dev_pm_domain, or by the pm member of -struct bus_type, struct device_type and struct class. They are mostly of -interest to the people writing infrastructure for platforms and buses, like PCI -or USB, or device type and device class drivers. They also are relevant to the -writers of device drivers whose subsystems (PM domains, device types, device -classes and bus types) don't provide all power management methods. - -Bus drivers implement these methods as appropriate for the hardware and the -drivers using it; PCI works differently from USB, and so on. Not many people -write subsystem-level drivers; most driver code is a "device driver" that builds -on top of bus-specific framework code. - -For more information on these driver calls, see the description later; -they are called in phases for every device, respecting the parent-child -sequencing in the driver model tree. - - -/sys/devices/.../power/wakeup files ------------------------------------ -All device objects in the driver model contain fields that control the handling -of system wakeup events (hardware signals that can force the system out of a -sleep state). These fields are initialized by bus or device driver code using -device_set_wakeup_capable() and device_set_wakeup_enable(), defined in -include/linux/pm_wakeup.h. - -The "power.can_wakeup" flag just records whether the device (and its driver) can -physically support wakeup events. The device_set_wakeup_capable() routine -affects this flag. The "power.wakeup" field is a pointer to an object of type -struct wakeup_source used for controlling whether or not the device should use -its system wakeup mechanism and for notifying the PM core of system wakeup -events signaled by the device. This object is only present for wakeup-capable -devices (i.e. devices whose "can_wakeup" flags are set) and is created (or -removed) by device_set_wakeup_capable(). - -Whether or not a device is capable of issuing wakeup events is a hardware -matter, and the kernel is responsible for keeping track of it. By contrast, -whether or not a wakeup-capable device should issue wakeup events is a policy -decision, and it is managed by user space through a sysfs attribute: the -"power/wakeup" file. User space can write the strings "enabled" or "disabled" -to it to indicate whether or not, respectively, the device is supposed to signal -system wakeup. This file is only present if the "power.wakeup" object exists -for the given device and is created (or removed) along with that object, by -device_set_wakeup_capable(). Reads from the file will return the corresponding -string. - -The "power/wakeup" file is supposed to contain the "disabled" string initially -for the majority of devices; the major exceptions are power buttons, keyboards, -and Ethernet adapters whose WoL (wake-on-LAN) feature has been set up with -ethtool. It should also default to "enabled" for devices that don't generate -wakeup requests on their own but merely forward wakeup requests from one bus to -another (like PCI Express ports). - -The device_may_wakeup() routine returns true only if the "power.wakeup" object -exists and the corresponding "power/wakeup" file contains the string "enabled". -This information is used by subsystems, like the PCI bus type code, to see -whether or not to enable the devices' wakeup mechanisms. If device wakeup -mechanisms are enabled or disabled directly by drivers, they also should use -device_may_wakeup() to decide what to do during a system sleep transition. -Device drivers, however, are not supposed to call device_set_wakeup_enable() -directly in any case. - -It ought to be noted that system wakeup is conceptually different from "remote -wakeup" used by runtime power management, although it may be supported by the -same physical mechanism. Remote wakeup is a feature allowing devices in -low-power states to trigger specific interrupts to signal conditions in which -they should be put into the full-power state. Those interrupts may or may not -be used to signal system wakeup events, depending on the hardware design. On -some systems it is impossible to trigger them from system sleep states. In any -case, remote wakeup should always be enabled for runtime power management for -all devices and drivers that support it. - -/sys/devices/.../power/control files ------------------------------------- -Each device in the driver model has a flag to control whether it is subject to -runtime power management. This flag, called runtime_auto, is initialized by the -bus type (or generally subsystem) code using pm_runtime_allow() or -pm_runtime_forbid(); the default is to allow runtime power management. - -The setting can be adjusted by user space by writing either "on" or "auto" to -the device's power/control sysfs file. Writing "auto" calls pm_runtime_allow(), -setting the flag and allowing the device to be runtime power-managed by its -driver. Writing "on" calls pm_runtime_forbid(), clearing the flag, returning -the device to full power if it was in a low-power state, and preventing the -device from being runtime power-managed. User space can check the current value -of the runtime_auto flag by reading the file. - -The device's runtime_auto flag has no effect on the handling of system-wide -power transitions. In particular, the device can (and in the majority of cases -should and will) be put into a low-power state during a system-wide transition -to a sleep state even though its runtime_auto flag is clear. - -For more information about the runtime power management framework, refer to -Documentation/power/runtime_pm.txt. - - -Calling Drivers to Enter and Leave System Sleep States -====================================================== -When the system goes into a sleep state, each device's driver is asked to -suspend the device by putting it into a state compatible with the target -system state. That's usually some version of "off", but the details are -system-specific. Also, wakeup-enabled devices will usually stay partly -functional in order to wake the system. - -When the system leaves that low-power state, the device's driver is asked to -resume it by returning it to full power. The suspend and resume operations -always go together, and both are multi-phase operations. - -For simple drivers, suspend might quiesce the device using class code -and then turn its hardware as "off" as possible during suspend_noirq. The -matching resume calls would then completely reinitialize the hardware -before reactivating its class I/O queues. - -More power-aware drivers might prepare the devices for triggering system wakeup -events. - - -Call Sequence Guarantees ------------------------- -To ensure that bridges and similar links needing to talk to a device are -available when the device is suspended or resumed, the device tree is -walked in a bottom-up order to suspend devices. A top-down order is -used to resume those devices. - -The ordering of the device tree is defined by the order in which devices -get registered: a child can never be registered, probed or resumed before -its parent; and can't be removed or suspended after that parent. - -The policy is that the device tree should match hardware bus topology. -(Or at least the control bus, for devices which use multiple busses.) -In particular, this means that a device registration may fail if the parent of -the device is suspending (i.e. has been chosen by the PM core as the next -device to suspend) or has already suspended, as well as after all of the other -devices have been suspended. Device drivers must be prepared to cope with such -situations. - - -System Power Management Phases ------------------------------- -Suspending or resuming the system is done in several phases. Different phases -are used for freeze, standby, and memory sleep states ("suspend-to-RAM") and the -hibernation state ("suspend-to-disk"). Each phase involves executing callbacks -for every device before the next phase begins. Not all busses or classes -support all these callbacks and not all drivers use all the callbacks. The -various phases always run after tasks have been frozen and before they are -unfrozen. Furthermore, the *_noirq phases run at a time when IRQ handlers have -been disabled (except for those marked with the IRQF_NO_SUSPEND flag). - -All phases use PM domain, bus, type, class or driver callbacks (that is, methods -defined in dev->pm_domain->ops, dev->bus->pm, dev->type->pm, dev->class->pm or -dev->driver->pm). These callbacks are regarded by the PM core as mutually -exclusive. Moreover, PM domain callbacks always take precedence over all of the -other callbacks and, for example, type callbacks take precedence over bus, class -and driver callbacks. To be precise, the following rules are used to determine -which callback to execute in the given phase: - - 1. If dev->pm_domain is present, the PM core will choose the callback - included in dev->pm_domain->ops for execution - - 2. Otherwise, if both dev->type and dev->type->pm are present, the callback - included in dev->type->pm will be chosen for execution. - - 3. Otherwise, if both dev->class and dev->class->pm are present, the - callback included in dev->class->pm will be chosen for execution. - - 4. Otherwise, if both dev->bus and dev->bus->pm are present, the callback - included in dev->bus->pm will be chosen for execution. - -This allows PM domains and device types to override callbacks provided by bus -types or device classes if necessary. - -The PM domain, type, class and bus callbacks may in turn invoke device- or -driver-specific methods stored in dev->driver->pm, but they don't have to do -that. - -If the subsystem callback chosen for execution is not present, the PM core will -execute the corresponding method from dev->driver->pm instead if there is one. - - -Entering System Suspend ------------------------ -When the system goes into the freeze, standby or memory sleep state, -the phases are: - - prepare, suspend, suspend_late, suspend_noirq. - - 1. The prepare phase is meant to prevent races by preventing new devices - from being registered; the PM core would never know that all the - children of a device had been suspended if new children could be - registered at will. (By contrast, devices may be unregistered at any - time.) Unlike the other suspend-related phases, during the prepare - phase the device tree is traversed top-down. - - After the prepare callback method returns, no new children may be - registered below the device. The method may also prepare the device or - driver in some way for the upcoming system power transition, but it - should not put the device into a low-power state. - - For devices supporting runtime power management, the return value of the - prepare callback can be used to indicate to the PM core that it may - safely leave the device in runtime suspend (if runtime-suspended - already), provided that all of the device's descendants are also left in - runtime suspend. Namely, if the prepare callback returns a positive - number and that happens for all of the descendants of the device too, - and all of them (including the device itself) are runtime-suspended, the - PM core will skip the suspend, suspend_late and suspend_noirq suspend - phases as well as the resume_noirq, resume_early and resume phases of - the following system resume for all of these devices. In that case, - the complete callback will be called directly after the prepare callback - and is entirely responsible for bringing the device back to the - functional state as appropriate. - - Note that this direct-complete procedure applies even if the device is - disabled for runtime PM; only the runtime-PM status matters. It follows - that if a device has system-sleep callbacks but does not support runtime - PM, then its prepare callback must never return a positive value. This - is because all devices are initially set to runtime-suspended with - runtime PM disabled. - - 2. The suspend methods should quiesce the device to stop it from performing - I/O. They also may save the device registers and put it into the - appropriate low-power state, depending on the bus type the device is on, - and they may enable wakeup events. - - 3 For a number of devices it is convenient to split suspend into the - "quiesce device" and "save device state" phases, in which cases - suspend_late is meant to do the latter. It is always executed after - runtime power management has been disabled for all devices. - - 4. The suspend_noirq phase occurs after IRQ handlers have been disabled, - which means that the driver's interrupt handler will not be called while - the callback method is running. The methods should save the values of - the device's registers that weren't saved previously and finally put the - device into the appropriate low-power state. - - The majority of subsystems and device drivers need not implement this - callback. However, bus types allowing devices to share interrupt - vectors, like PCI, generally need it; otherwise a driver might encounter - an error during the suspend phase by fielding a shared interrupt - generated by some other device after its own device had been set to low - power. - -At the end of these phases, drivers should have stopped all I/O transactions -(DMA, IRQs), saved enough state that they can re-initialize or restore previous -state (as needed by the hardware), and placed the device into a low-power state. -On many platforms they will gate off one or more clock sources; sometimes they -will also switch off power supplies or reduce voltages. (Drivers supporting -runtime PM may already have performed some or all of these steps.) - -If device_may_wakeup(dev) returns true, the device should be prepared for -generating hardware wakeup signals to trigger a system wakeup event when the -system is in the sleep state. For example, enable_irq_wake() might identify -GPIO signals hooked up to a switch or other external hardware, and -pci_enable_wake() does something similar for the PCI PME signal. - -If any of these callbacks returns an error, the system won't enter the desired -low-power state. Instead the PM core will unwind its actions by resuming all -the devices that were suspended. - - -Leaving System Suspend ----------------------- -When resuming from freeze, standby or memory sleep, the phases are: - - resume_noirq, resume_early, resume, complete. - - 1. The resume_noirq callback methods should perform any actions needed - before the driver's interrupt handlers are invoked. This generally - means undoing the actions of the suspend_noirq phase. If the bus type - permits devices to share interrupt vectors, like PCI, the method should - bring the device and its driver into a state in which the driver can - recognize if the device is the source of incoming interrupts, if any, - and handle them correctly. - - For example, the PCI bus type's ->pm.resume_noirq() puts the device into - the full-power state (D0 in the PCI terminology) and restores the - standard configuration registers of the device. Then it calls the - device driver's ->pm.resume_noirq() method to perform device-specific - actions. - - 2. The resume_early methods should prepare devices for the execution of - the resume methods. This generally involves undoing the actions of the - preceding suspend_late phase. - - 3 The resume methods should bring the device back to its operating - state, so that it can perform normal I/O. This generally involves - undoing the actions of the suspend phase. - - 4. The complete phase should undo the actions of the prepare phase. Note, - however, that new children may be registered below the device as soon as - the resume callbacks occur; it's not necessary to wait until the - complete phase. - - Moreover, if the preceding prepare callback returned a positive number, - the device may have been left in runtime suspend throughout the whole - system suspend and resume (the suspend, suspend_late, suspend_noirq - phases of system suspend and the resume_noirq, resume_early, resume - phases of system resume may have been skipped for it). In that case, - the complete callback is entirely responsible for bringing the device - back to the functional state after system suspend if necessary. [For - example, it may need to queue up a runtime resume request for the device - for this purpose.] To check if that is the case, the complete callback - can consult the device's power.direct_complete flag. Namely, if that - flag is set when the complete callback is being run, it has been called - directly after the preceding prepare and special action may be required - to make the device work correctly afterward. - -At the end of these phases, drivers should be as functional as they were before -suspending: I/O can be performed using DMA and IRQs, and the relevant clocks are -gated on. - -However, the details here may again be platform-specific. For example, -some systems support multiple "run" states, and the mode in effect at -the end of resume might not be the one which preceded suspension. -That means availability of certain clocks or power supplies changed, -which could easily affect how a driver works. - -Drivers need to be able to handle hardware which has been reset since the -suspend methods were called, for example by complete reinitialization. -This may be the hardest part, and the one most protected by NDA'd documents -and chip errata. It's simplest if the hardware state hasn't changed since -the suspend was carried out, but that can't be guaranteed (in fact, it usually -is not the case). - -Drivers must also be prepared to notice that the device has been removed -while the system was powered down, whenever that's physically possible. -PCMCIA, MMC, USB, Firewire, SCSI, and even IDE are common examples of busses -where common Linux platforms will see such removal. Details of how drivers -will notice and handle such removals are currently bus-specific, and often -involve a separate thread. - -These callbacks may return an error value, but the PM core will ignore such -errors since there's nothing it can do about them other than printing them in -the system log. - - -Entering Hibernation --------------------- -Hibernating the system is more complicated than putting it into the other -sleep states, because it involves creating and saving a system image. -Therefore there are more phases for hibernation, with a different set of -callbacks. These phases always run after tasks have been frozen and memory has -been freed. - -The general procedure for hibernation is to quiesce all devices (freeze), create -an image of the system memory while everything is stable, reactivate all -devices (thaw), write the image to permanent storage, and finally shut down the -system (poweroff). The phases used to accomplish this are: - - prepare, freeze, freeze_late, freeze_noirq, thaw_noirq, thaw_early, - thaw, complete, prepare, poweroff, poweroff_late, poweroff_noirq - - 1. The prepare phase is discussed in the "Entering System Suspend" section - above. - - 2. The freeze methods should quiesce the device so that it doesn't generate - IRQs or DMA, and they may need to save the values of device registers. - However the device does not have to be put in a low-power state, and to - save time it's best not to do so. Also, the device should not be - prepared to generate wakeup events. - - 3. The freeze_late phase is analogous to the suspend_late phase described - above, except that the device should not be put in a low-power state and - should not be allowed to generate wakeup events by it. - - 4. The freeze_noirq phase is analogous to the suspend_noirq phase discussed - above, except again that the device should not be put in a low-power - state and should not be allowed to generate wakeup events. - -At this point the system image is created. All devices should be inactive and -the contents of memory should remain undisturbed while this happens, so that the -image forms an atomic snapshot of the system state. - - 5. The thaw_noirq phase is analogous to the resume_noirq phase discussed - above. The main difference is that its methods can assume the device is - in the same state as at the end of the freeze_noirq phase. - - 6. The thaw_early phase is analogous to the resume_early phase described - above. Its methods should undo the actions of the preceding - freeze_late, if necessary. - - 7. The thaw phase is analogous to the resume phase discussed above. Its - methods should bring the device back to an operating state, so that it - can be used for saving the image if necessary. - - 8. The complete phase is discussed in the "Leaving System Suspend" section - above. - -At this point the system image is saved, and the devices then need to be -prepared for the upcoming system shutdown. This is much like suspending them -before putting the system into the freeze, standby or memory sleep state, -and the phases are similar. - - 9. The prepare phase is discussed above. - - 10. The poweroff phase is analogous to the suspend phase. - - 11. The poweroff_late phase is analogous to the suspend_late phase. - - 12. The poweroff_noirq phase is analogous to the suspend_noirq phase. - -The poweroff, poweroff_late and poweroff_noirq callbacks should do essentially -the same things as the suspend, suspend_late and suspend_noirq callbacks, -respectively. The only notable difference is that they need not store the -device register values, because the registers should already have been stored -during the freeze, freeze_late or freeze_noirq phases. - - -Leaving Hibernation -------------------- -Resuming from hibernation is, again, more complicated than resuming from a sleep -state in which the contents of main memory are preserved, because it requires -a system image to be loaded into memory and the pre-hibernation memory contents -to be restored before control can be passed back to the image kernel. - -Although in principle, the image might be loaded into memory and the -pre-hibernation memory contents restored by the boot loader, in practice this -can't be done because boot loaders aren't smart enough and there is no -established protocol for passing the necessary information. So instead, the -boot loader loads a fresh instance of the kernel, called the boot kernel, into -memory and passes control to it in the usual way. Then the boot kernel reads -the system image, restores the pre-hibernation memory contents, and passes -control to the image kernel. Thus two different kernels are involved in -resuming from hibernation. In fact, the boot kernel may be completely different -from the image kernel: a different configuration and even a different version. -This has important consequences for device drivers and their subsystems. - -To be able to load the system image into memory, the boot kernel needs to -include at least a subset of device drivers allowing it to access the storage -medium containing the image, although it doesn't need to include all of the -drivers present in the image kernel. After the image has been loaded, the -devices managed by the boot kernel need to be prepared for passing control back -to the image kernel. This is very similar to the initial steps involved in -creating a system image, and it is accomplished in the same way, using prepare, -freeze, and freeze_noirq phases. However the devices affected by these phases -are only those having drivers in the boot kernel; other devices will still be in -whatever state the boot loader left them. - -Should the restoration of the pre-hibernation memory contents fail, the boot -kernel would go through the "thawing" procedure described above, using the -thaw_noirq, thaw, and complete phases, and then continue running normally. This -happens only rarely. Most often the pre-hibernation memory contents are -restored successfully and control is passed to the image kernel, which then -becomes responsible for bringing the system back to the working state. - -To achieve this, the image kernel must restore the devices' pre-hibernation -functionality. The operation is much like waking up from the memory sleep -state, although it involves different phases: - - restore_noirq, restore_early, restore, complete - - 1. The restore_noirq phase is analogous to the resume_noirq phase. - - 2. The restore_early phase is analogous to the resume_early phase. - - 3. The restore phase is analogous to the resume phase. - - 4. The complete phase is discussed above. - -The main difference from resume[_early|_noirq] is that restore[_early|_noirq] -must assume the device has been accessed and reconfigured by the boot loader or -the boot kernel. Consequently the state of the device may be different from the -state remembered from the freeze, freeze_late and freeze_noirq phases. The -device may even need to be reset and completely re-initialized. In many cases -this difference doesn't matter, so the resume[_early|_noirq] and -restore[_early|_norq] method pointers can be set to the same routines. -Nevertheless, different callback pointers are used in case there is a situation -where it actually does matter. - - -Device Power Management Domains -------------------------------- -Sometimes devices share reference clocks or other power resources. In those -cases it generally is not possible to put devices into low-power states -individually. Instead, a set of devices sharing a power resource can be put -into a low-power state together at the same time by turning off the shared -power resource. Of course, they also need to be put into the full-power state -together, by turning the shared power resource on. A set of devices with this -property is often referred to as a power domain. A power domain may also be -nested inside another power domain. The nested domain is referred to as the -sub-domain of the parent domain. - -Support for power domains is provided through the pm_domain field of struct -device. This field is a pointer to an object of type struct dev_pm_domain, -defined in include/linux/pm.h, providing a set of power management callbacks -analogous to the subsystem-level and device driver callbacks that are executed -for the given device during all power transitions, instead of the respective -subsystem-level callbacks. Specifically, if a device's pm_domain pointer is -not NULL, the ->suspend() callback from the object pointed to by it will be -executed instead of its subsystem's (e.g. bus type's) ->suspend() callback and -analogously for all of the remaining callbacks. In other words, power -management domain callbacks, if defined for the given device, always take -precedence over the callbacks provided by the device's subsystem (e.g. bus -type). - -The support for device power management domains is only relevant to platforms -needing to use the same device driver power management callbacks in many -different power domain configurations and wanting to avoid incorporating the -support for power domains into subsystem-level callbacks, for example by -modifying the platform bus type. Other platforms need not implement it or take -it into account in any way. - -Devices may be defined as IRQ-safe which indicates to the PM core that their -runtime PM callbacks may be invoked with disabled interrupts (see -Documentation/power/runtime_pm.txt for more information). If an IRQ-safe -device belongs to a PM domain, the runtime PM of the domain will be -disallowed, unless the domain itself is defined as IRQ-safe. However, it -makes sense to define a PM domain as IRQ-safe only if all the devices in it -are IRQ-safe. Moreover, if an IRQ-safe domain has a parent domain, the runtime -PM of the parent is only allowed if the parent itself is IRQ-safe too with the -additional restriction that all child domains of an IRQ-safe parent must also -be IRQ-safe. - -Device Low Power (suspend) States ---------------------------------- -Device low-power states aren't standard. One device might only handle -"on" and "off", while another might support a dozen different versions of -"on" (how many engines are active?), plus a state that gets back to "on" -faster than from a full "off". - -Some busses define rules about what different suspend states mean. PCI -gives one example: after the suspend sequence completes, a non-legacy -PCI device may not perform DMA or issue IRQs, and any wakeup events it -issues would be issued through the PME# bus signal. Plus, there are -several PCI-standard device states, some of which are optional. - -In contrast, integrated system-on-chip processors often use IRQs as the -wakeup event sources (so drivers would call enable_irq_wake) and might -be able to treat DMA completion as a wakeup event (sometimes DMA can stay -active too, it'd only be the CPU and some peripherals that sleep). - -Some details here may be platform-specific. Systems may have devices that -can be fully active in certain sleep states, such as an LCD display that's -refreshed using DMA while most of the system is sleeping lightly ... and -its frame buffer might even be updated by a DSP or other non-Linux CPU while -the Linux control processor stays idle. - -Moreover, the specific actions taken may depend on the target system state. -One target system state might allow a given device to be very operational; -another might require a hard shut down with re-initialization on resume. -And two different target systems might use the same device in different -ways; the aforementioned LCD might be active in one product's "standby", -but a different product using the same SOC might work differently. - - -Power Management Notifiers --------------------------- -There are some operations that cannot be carried out by the power management -callbacks discussed above, because the callbacks occur too late or too early. -To handle these cases, subsystems and device drivers may register power -management notifiers that are called before tasks are frozen and after they have -been thawed. Generally speaking, the PM notifiers are suitable for performing -actions that either require user space to be available, or at least won't -interfere with user space. - -For details refer to Documentation/power/notifiers.txt. - - -Runtime Power Management -======================== -Many devices are able to dynamically power down while the system is still -running. This feature is useful for devices that are not being used, and -can offer significant power savings on a running system. These devices -often support a range of runtime power states, which might use names such -as "off", "sleep", "idle", "active", and so on. Those states will in some -cases (like PCI) be partially constrained by the bus the device uses, and will -usually include hardware states that are also used in system sleep states. - -A system-wide power transition can be started while some devices are in low -power states due to runtime power management. The system sleep PM callbacks -should recognize such situations and react to them appropriately, but the -necessary actions are subsystem-specific. - -In some cases the decision may be made at the subsystem level while in other -cases the device driver may be left to decide. In some cases it may be -desirable to leave a suspended device in that state during a system-wide power -transition, but in other cases the device must be put back into the full-power -state temporarily, for example so that its system wakeup capability can be -disabled. This all depends on the hardware and the design of the subsystem and -device driver in question. - -During system-wide resume from a sleep state it's easiest to put devices into -the full-power state, as explained in Documentation/power/runtime_pm.txt. Refer -to that document for more information regarding this particular issue as well as -for information on the device runtime power management framework in general. diff --git a/include/linux/pm.h b/include/linux/pm.h index 10867b11d503..a0894bc52bb4 100644 --- a/include/linux/pm.h +++ b/include/linux/pm.h @@ -276,9 +276,6 @@ typedef struct pm_message { * example, if it detects that a child was unplugged while the system was * asleep). * - * Refer to Documentation/power/devices.txt for more information about the role - * of the above callbacks in the system suspend process. - * * There also are callbacks related to runtime power management of devices. * Again, as a rule these callbacks are executed by the PM core for subsystems * (PM domains, device types, classes and bus types) and the subsystem-level -- cgit v1.2.3