diff options
Diffstat (limited to 'Documentation')
308 files changed, 14181 insertions, 5028 deletions
diff --git a/Documentation/ABI/testing/sysfs-class-led b/Documentation/ABI/testing/sysfs-class-led index 3646ec85d513..86ace287d48b 100644 --- a/Documentation/ABI/testing/sysfs-class-led +++ b/Documentation/ABI/testing/sysfs-class-led @@ -24,7 +24,8 @@ Description: of led events. You can change triggers in a similar manner to the way an IO scheduler is chosen. Trigger specific parameters can appear in - /sys/class/leds/<led> once a given trigger is selected. + /sys/class/leds/<led> once a given trigger is selected. For + their documentation see sysfs-class-led-trigger-*. What: /sys/class/leds/<led>/inverted Date: January 2011 diff --git a/Documentation/ABI/testing/sysfs-class-led-trigger-oneshot b/Documentation/ABI/testing/sysfs-class-led-trigger-oneshot new file mode 100644 index 000000000000..378a3a4df3c8 --- /dev/null +++ b/Documentation/ABI/testing/sysfs-class-led-trigger-oneshot @@ -0,0 +1,36 @@ +What: /sys/class/leds/<led>/delay_on +Date: Jun 2012 +KernelVersion: 3.6 +Contact: linux-leds@vger.kernel.org +Description: + Specifies for how many milliseconds the LED has to stay at + LED_FULL brightness after it has been armed. + Defaults to 100 ms. + +What: /sys/class/leds/<led>/delay_off +Date: Jun 2012 +KernelVersion: 3.6 +Contact: linux-leds@vger.kernel.org +Description: + Specifies for how many milliseconds the LED has to stay at + LED_OFF brightness after it has been armed. + Defaults to 100 ms. + +What: /sys/class/leds/<led>/invert +Date: Jun 2012 +KernelVersion: 3.6 +Contact: linux-leds@vger.kernel.org +Description: + Reverse the blink logic. If set to 0 (default) blink on for + delay_on ms, then blink off for delay_off ms, leaving the LED + normally off. If set to 1, blink off for delay_off ms, then + blink on for delay_on ms, leaving the LED normally on. + Setting this value also immediately changes the LED state. + +What: /sys/class/leds/<led>/shot +Date: Jun 2012 +KernelVersion: 3.6 +Contact: linux-leds@vger.kernel.org +Description: + Write any non-empty string to signal an events, this starts a + blink sequence if not already running. diff --git a/Documentation/ABI/testing/sysfs-class-led-trigger-usbport b/Documentation/ABI/testing/sysfs-class-led-trigger-usbport new file mode 100644 index 000000000000..f440e690daef --- /dev/null +++ b/Documentation/ABI/testing/sysfs-class-led-trigger-usbport @@ -0,0 +1,12 @@ +What: /sys/class/leds/<led>/ports/<port> +Date: September 2016 +KernelVersion: 4.9 +Contact: linux-leds@vger.kernel.org + linux-usb@vger.kernel.org +Description: + Every dir entry represents a single USB port that can be + selected for the USB port trigger. Selecting ports makes trigger + observing them for any connected devices and lighting on LED if + there are any. + Echoing "1" value selects USB port. Echoing "0" unselects it. + Current state can be also read. diff --git a/Documentation/ABI/testing/sysfs-class-mic.txt b/Documentation/ABI/testing/sysfs-class-mic.txt index d45eed2bf128..6ef682603179 100644 --- a/Documentation/ABI/testing/sysfs-class-mic.txt +++ b/Documentation/ABI/testing/sysfs-class-mic.txt @@ -153,7 +153,7 @@ Description: What: /sys/class/mic/mic(x)/heartbeat_enable Date: March 2015 -KernelVersion: 3.20 +KernelVersion: 4.4 Contact: Ashutosh Dixit <ashutosh.dixit@intel.com> Description: The MIC drivers detect and inform user space about card crashes diff --git a/Documentation/ABI/testing/sysfs-i2c-bmp085 b/Documentation/ABI/testing/sysfs-i2c-bmp085 deleted file mode 100644 index 585962ad0465..000000000000 --- a/Documentation/ABI/testing/sysfs-i2c-bmp085 +++ /dev/null @@ -1,31 +0,0 @@ -What: /sys/bus/i2c/devices/<busnum>-<devaddr>/pressure0_input -Date: June 2010 -Contact: Christoph Mair <christoph.mair@gmail.com> -Description: Start a pressure measurement and read the result. Values - represent the ambient air pressure in pascal (0.01 millibar). - - Reading: returns the current air pressure. - - -What: /sys/bus/i2c/devices/<busnum>-<devaddr>/temp0_input -Date: June 2010 -Contact: Christoph Mair <christoph.mair@gmail.com> -Description: Measure the ambient temperature. The returned value represents - the ambient temperature in units of 0.1 degree celsius. - - Reading: returns the current temperature. - - -What: /sys/bus/i2c/devices/<busnum>-<devaddr>/oversampling -Date: June 2010 -Contact: Christoph Mair <christoph.mair@gmail.com> -Description: Tell the bmp085 to use more samples to calculate a pressure - value. When writing to this file the chip will use 2^x samples - to calculate the next pressure value with x being the value - written. Using this feature will decrease RMS noise and - increase the measurement time. - - Reading: returns the current oversampling setting. - - Writing: sets a new oversampling setting. - Accepted values: 0..3. diff --git a/Documentation/ABI/testing/sysfs-kernel-irq b/Documentation/ABI/testing/sysfs-kernel-irq new file mode 100644 index 000000000000..eb074b100986 --- /dev/null +++ b/Documentation/ABI/testing/sysfs-kernel-irq @@ -0,0 +1,53 @@ +What: /sys/kernel/irq +Date: September 2016 +KernelVersion: 4.9 +Contact: Craig Gallek <kraig@google.com> +Description: Directory containing information about the system's IRQs. + Specifically, data from the associated struct irq_desc. + The information here is similar to that in /proc/interrupts + but in a more machine-friendly format. This directory contains + one subdirectory for each Linux IRQ number. + +What: /sys/kernel/irq/<irq>/actions +Date: September 2016 +KernelVersion: 4.9 +Contact: Craig Gallek <kraig@google.com> +Description: The IRQ action chain. A comma-separated list of zero or more + device names associated with this interrupt. + +What: /sys/kernel/irq/<irq>/chip_name +Date: September 2016 +KernelVersion: 4.9 +Contact: Craig Gallek <kraig@google.com> +Description: Human-readable chip name supplied by the associated device + driver. + +What: /sys/kernel/irq/<irq>/hwirq +Date: September 2016 +KernelVersion: 4.9 +Contact: Craig Gallek <kraig@google.com> +Description: When interrupt translation domains are used, this file contains + the underlying hardware IRQ number used for this Linux IRQ. + +What: /sys/kernel/irq/<irq>/name +Date: September 2016 +KernelVersion: 4.9 +Contact: Craig Gallek <kraig@google.com> +Description: Human-readable flow handler name as defined by the irq chip + driver. + +What: /sys/kernel/irq/<irq>/per_cpu_count +Date: September 2016 +KernelVersion: 4.9 +Contact: Craig Gallek <kraig@google.com> +Description: The number of times the interrupt has fired since boot. This + is a comma-separated list of counters; one per CPU in CPU id + order. NOTE: This file consistently shows counters for all + CPU ids. This differs from the behavior of /proc/interrupts + which only shows counters for online CPUs. + +What: /sys/kernel/irq/<irq>/type +Date: September 2016 +KernelVersion: 4.9 +Contact: Craig Gallek <kraig@google.com> +Description: The type of the interrupt. Either the string 'level' or 'edge'. diff --git a/Documentation/Changes b/Documentation/Changes index ec97b77c8b00..22797a15dc24 100644 --- a/Documentation/Changes +++ b/Documentation/Changes @@ -1,8 +1,13 @@ +.. _changes: + +Minimal requerements to compile the Kernel +++++++++++++++++++++++++++++++++++++++++++ + Intro ===== This document is designed to provide a list of the minimum levels of -software necessary to run the 3.0 kernels. +software necessary to run the 4.x kernels. This document is originally based on my "Changes" file for 2.0.x kernels and therefore owes credit to the same people as that file (Jared Mauch, @@ -10,9 +15,9 @@ Axel Boldt, Alessandro Sigala, and countless other users all over the 'net). Current Minimal Requirements -============================ +**************************** -Upgrade to at *least* these software revisions before thinking you've +Upgrade to at **least** these software revisions before thinking you've encountered a bug! If you're unsure what version you're currently running, the suggested command should tell you. @@ -21,34 +26,40 @@ running a Linux kernel. Also, not all tools are necessary on all systems; obviously, if you don't have any ISDN hardware, for example, you probably needn't concern yourself with isdn4k-utils. -o GNU C 3.2 # gcc --version -o GNU make 3.80 # make --version -o binutils 2.12 # ld -v -o util-linux 2.10o # fdformat --version -o module-init-tools 0.9.10 # depmod -V -o e2fsprogs 1.41.4 # e2fsck -V -o jfsutils 1.1.3 # fsck.jfs -V -o reiserfsprogs 3.6.3 # reiserfsck -V -o xfsprogs 2.6.0 # xfs_db -V -o squashfs-tools 4.0 # mksquashfs -version -o btrfs-progs 0.18 # btrfsck -o pcmciautils 004 # pccardctl -V -o quota-tools 3.09 # quota -V -o PPP 2.4.0 # pppd --version -o isdn4k-utils 3.1pre1 # isdnctrl 2>&1|grep version -o nfs-utils 1.0.5 # showmount --version -o procps 3.2.0 # ps --version -o oprofile 0.9 # oprofiled --version -o udev 081 # udevd --version -o grub 0.93 # grub --version || grub-install --version -o mcelog 0.6 # mcelog --version -o iptables 1.4.2 # iptables -V -o openssl & libcrypto 1.0.0 # openssl version -o bc 1.06.95 # bc --version - +====================== =============== ======================================== + Program Minimal version Command to check the version +====================== =============== ======================================== +GNU C 3.2 gcc --version +GNU make 3.80 make --version +binutils 2.12 ld -v +util-linux 2.10o fdformat --version +module-init-tools 0.9.10 depmod -V +e2fsprogs 1.41.4 e2fsck -V +jfsutils 1.1.3 fsck.jfs -V +reiserfsprogs 3.6.3 reiserfsck -V +xfsprogs 2.6.0 xfs_db -V +squashfs-tools 4.0 mksquashfs -version +btrfs-progs 0.18 btrfsck +pcmciautils 004 pccardctl -V +quota-tools 3.09 quota -V +PPP 2.4.0 pppd --version +isdn4k-utils 3.1pre1 isdnctrl 2>&1|grep version +nfs-utils 1.0.5 showmount --version +procps 3.2.0 ps --version +oprofile 0.9 oprofiled --version +udev 081 udevd --version +grub 0.93 grub --version || grub-install --version +mcelog 0.6 mcelog --version +iptables 1.4.2 iptables -V +openssl & libcrypto 1.0.0 openssl version +bc 1.06.95 bc --version +Sphinx\ [#f1]_ 1.2 sphinx-build --version +====================== =============== ======================================== + +.. [#f1] Sphinx is needed only to build the Kernel documentation Kernel compilation -================== +****************** GCC --- @@ -64,16 +75,16 @@ You will need GNU make 3.80 or later to build the kernel. Binutils -------- -Linux on IA-32 has recently switched from using as86 to using gas for -assembling the 16-bit boot code, removing the need for as86 to compile +Linux on IA-32 has recently switched from using ``as86`` to using ``gas`` for +assembling the 16-bit boot code, removing the need for ``as86`` to compile your kernel. This change does, however, mean that you need a recent release of binutils. Perl ---- -You will need perl 5 and the following modules: Getopt::Long, Getopt::Std, -File::Basename, and File::Find to build the kernel. +You will need perl 5 and the following modules: ``Getopt::Long``, +``Getopt::Std``, ``File::Basename``, and ``File::Find`` to build the kernel. BC -- @@ -93,7 +104,7 @@ and higher. System utilities -================ +**************** Architectural changes --------------------- @@ -115,7 +126,7 @@ well as the desired DocBook stylesheets. Util-linux ---------- -New versions of util-linux provide *fdisk support for larger disks, +New versions of util-linux provide ``fdisk`` support for larger disks, support new options to mount, recognize more supported partition types, have a fdformat which works with 2.4 kernels, and similar goodies. You'll probably want to upgrade. @@ -125,54 +136,57 @@ Ksymoops If the unthinkable happens and your kernel oopses, you may need the ksymoops tool to decode it, but in most cases you don't. -It is generally preferred to build the kernel with CONFIG_KALLSYMS so +It is generally preferred to build the kernel with ``CONFIG_KALLSYMS`` so that it produces readable dumps that can be used as-is (this also produces better output than ksymoops). If for some reason your kernel -is not build with CONFIG_KALLSYMS and you have no way to rebuild and +is not build with ``CONFIG_KALLSYMS`` and you have no way to rebuild and reproduce the Oops with that option, then you can still decode that Oops with ksymoops. Module-Init-Tools ----------------- -A new module loader is now in the kernel that requires module-init-tools +A new module loader is now in the kernel that requires ``module-init-tools`` to use. It is backward compatible with the 2.4.x series kernels. Mkinitrd -------- -These changes to the /lib/modules file tree layout also require that +These changes to the ``/lib/modules`` file tree layout also require that mkinitrd be upgraded. E2fsprogs --------- -The latest version of e2fsprogs fixes several bugs in fsck and +The latest version of ``e2fsprogs`` fixes several bugs in fsck and debugfs. Obviously, it's a good idea to upgrade. JFSutils -------- -The jfsutils package contains the utilities for the file system. +The ``jfsutils`` package contains the utilities for the file system. The following utilities are available: -o fsck.jfs - initiate replay of the transaction log, and check + +- ``fsck.jfs`` - initiate replay of the transaction log, and check and repair a JFS formatted partition. -o mkfs.jfs - create a JFS formatted partition. -o other file system utilities are also available in this package. + +- ``mkfs.jfs`` - create a JFS formatted partition. + +- other file system utilities are also available in this package. Reiserfsprogs ------------- The reiserfsprogs package should be used for reiserfs-3.6.x (Linux kernels 2.4.x). It is a combined package and contains working -versions of mkreiserfs, resize_reiserfs, debugreiserfs and -reiserfsck. These utils work on both i386 and alpha platforms. +versions of ``mkreiserfs``, ``resize_reiserfs``, ``debugreiserfs`` and +``reiserfsck``. These utils work on both i386 and alpha platforms. Xfsprogs -------- -The latest version of xfsprogs contains mkfs.xfs, xfs_db, and the -xfs_repair utilities, among others, for the XFS filesystem. It is +The latest version of ``xfsprogs`` contains ``mkfs.xfs``, ``xfs_db``, and the +``xfs_repair`` utilities, among others, for the XFS filesystem. It is architecture independent and any version from 2.0.0 onward should work correctly with this version of the XFS kernel code (2.6.0 or later is recommended, due to some significant improvements). @@ -180,7 +194,7 @@ later is recommended, due to some significant improvements). PCMCIAutils ----------- -PCMCIAutils replaces pcmcia-cs. It properly sets up +PCMCIAutils replaces ``pcmcia-cs``. It properly sets up PCMCIA sockets at system startup and loads the appropriate modules for 16-bit PCMCIA devices if the kernel is modularized and the hotplug subsystem is used. @@ -198,19 +212,20 @@ Intel IA32 microcode A driver has been added to allow updating of Intel IA32 microcode, accessible as a normal (misc) character device. If you are not using -udev you may need to: +udev you may need to:: -mkdir /dev/cpu -mknod /dev/cpu/microcode c 10 184 -chmod 0644 /dev/cpu/microcode + mkdir /dev/cpu + mknod /dev/cpu/microcode c 10 184 + chmod 0644 /dev/cpu/microcode as root before you can use this. You'll probably also want to get the user-space microcode_ctl utility to use with this. udev ---- -udev is a userspace application for populating /dev dynamically with -only entries for devices actually present. udev replaces the basic + +``udev`` is a userspace application for populating ``/dev`` dynamically with +only entries for devices actually present. ``udev`` replaces the basic functionality of devfs, while allowing persistent device naming for devices. @@ -218,10 +233,10 @@ FUSE ---- Needs libfuse 2.4.0 or later. Absolute minimum is 2.3.0 but mount -options 'direct_io' and 'kernel_cache' won't work. +options ``direct_io`` and ``kernel_cache`` won't work. Networking -========== +********** General changes --------------- @@ -243,9 +258,9 @@ enable it to operate over diverse media layers. If you use PPP, upgrade pppd to at least 2.4.0. If you are not using udev, you must have the device file /dev/ppp -which can be made by: +which can be made by:: -mknod /dev/ppp c 108 0 + mknod /dev/ppp c 108 0 as root. @@ -260,22 +275,22 @@ NFS-utils In ancient (2.4 and earlier) kernels, the nfs server needed to know about any client that expected to be able to access files via NFS. This -information would be given to the kernel by "mountd" when the client -mounted the filesystem, or by "exportfs" at system startup. exportfs -would take information about active clients from /var/lib/nfs/rmtab. +information would be given to the kernel by ``mountd`` when the client +mounted the filesystem, or by ``exportfs`` at system startup. exportfs +would take information about active clients from ``/var/lib/nfs/rmtab``. This approach is quite fragile as it depends on rmtab being correct which is not always easy, particularly when trying to implement -fail-over. Even when the system is working well, rmtab suffers from +fail-over. Even when the system is working well, ``rmtab`` suffers from getting lots of old entries that never get removed. With modern kernels we have the option of having the kernel tell mountd when it gets a request from an unknown host, and mountd can give appropriate export information to the kernel. This removes the -dependency on rmtab and means that the kernel only needs to know about +dependency on ``rmtab`` and means that the kernel only needs to know about currently active clients. -To enable this new functionality, you need to: +To enable this new functionality, you need to:: mount -t nfsd nfsd /proc/fs/nfsd @@ -287,8 +302,32 @@ mcelog ------ On x86 kernels the mcelog utility is needed to process and log machine check -events when CONFIG_X86_MCE is enabled. Machine check events are errors reported -by the CPU. Processing them is strongly encouraged. +events when ``CONFIG_X86_MCE`` is enabled. Machine check events are errors +reported by the CPU. Processing them is strongly encouraged. + +Kernel documentation +******************** + +Sphinx +------ + +The ReST markups currently used by the Documentation/ files are meant to be +built with ``Sphinx`` version 1.2 or upper. If you're desiring to build +PDF outputs, it is recommended to use version 1.4.6. + +.. note:: + + Please notice that, for PDF and LaTeX output, you'll also need ``XeLaTeX`` + version 3.14159265. Depending on the distribution, you may also need + to install a series of ``texlive`` packages that provide the minimal + set of functionalities required for ``XeLaTex`` to work. + +Other tools +----------- + +In order to produce documentation from DocBook, you'll also need ``xmlto``. +Please notice, however, that we're currently migrating all documents to use +``Sphinx``. Getting updated software ======================== @@ -298,114 +337,149 @@ Kernel compilation gcc --- -o <ftp://ftp.gnu.org/gnu/gcc/> + +- <ftp://ftp.gnu.org/gnu/gcc/> Make ---- -o <ftp://ftp.gnu.org/gnu/make/> + +- <ftp://ftp.gnu.org/gnu/make/> Binutils -------- -o <ftp://ftp.kernel.org/pub/linux/devel/binutils/> + +- <ftp://ftp.kernel.org/pub/linux/devel/binutils/> OpenSSL ------- -o <https://www.openssl.org/> + +- <https://www.openssl.org/> System utilities **************** Util-linux ---------- -o <ftp://ftp.kernel.org/pub/linux/utils/util-linux/> + +- <ftp://ftp.kernel.org/pub/linux/utils/util-linux/> Ksymoops -------- -o <ftp://ftp.kernel.org/pub/linux/utils/kernel/ksymoops/v2.4/> + +- <ftp://ftp.kernel.org/pub/linux/utils/kernel/ksymoops/v2.4/> Module-Init-Tools ----------------- -o <ftp://ftp.kernel.org/pub/linux/kernel/people/rusty/modules/> + +- <ftp://ftp.kernel.org/pub/linux/kernel/people/rusty/modules/> Mkinitrd -------- -o <https://code.launchpad.net/initrd-tools/main> + +- <https://code.launchpad.net/initrd-tools/main> E2fsprogs --------- -o <http://prdownloads.sourceforge.net/e2fsprogs/e2fsprogs-1.29.tar.gz> + +- <http://prdownloads.sourceforge.net/e2fsprogs/e2fsprogs-1.29.tar.gz> JFSutils -------- -o <http://jfs.sourceforge.net/> + +- <http://jfs.sourceforge.net/> Reiserfsprogs ------------- -o <http://www.kernel.org/pub/linux/utils/fs/reiserfs/> + +- <http://www.kernel.org/pub/linux/utils/fs/reiserfs/> Xfsprogs -------- -o <ftp://oss.sgi.com/projects/xfs/> + +- <ftp://oss.sgi.com/projects/xfs/> Pcmciautils ----------- -o <ftp://ftp.kernel.org/pub/linux/utils/kernel/pcmcia/> + +- <ftp://ftp.kernel.org/pub/linux/utils/kernel/pcmcia/> Quota-tools ----------- -o <http://sourceforge.net/projects/linuxquota/> +----------- + +- <http://sourceforge.net/projects/linuxquota/> DocBook Stylesheets ------------------- -o <http://sourceforge.net/projects/docbook/files/docbook-dsssl/> + +- <http://sourceforge.net/projects/docbook/files/docbook-dsssl/> XMLTO XSLT Frontend ------------------- -o <http://cyberelk.net/tim/xmlto/> + +- <http://cyberelk.net/tim/xmlto/> Intel P6 microcode ------------------ -o <https://downloadcenter.intel.com/> + +- <https://downloadcenter.intel.com/> udev ---- -o <http://www.freedesktop.org/software/systemd/man/udev.html> + +- <http://www.freedesktop.org/software/systemd/man/udev.html> FUSE ---- -o <http://sourceforge.net/projects/fuse> + +- <http://sourceforge.net/projects/fuse> mcelog ------ -o <http://www.mcelog.org/> + +- <http://www.mcelog.org/> Networking ********** PPP --- -o <ftp://ftp.samba.org/pub/ppp/> + +- <ftp://ftp.samba.org/pub/ppp/> Isdn4k-utils ------------ -o <ftp://ftp.isdn4linux.de/pub/isdn4linux/utils/> + +- <ftp://ftp.isdn4linux.de/pub/isdn4linux/utils/> NFS-utils --------- -o <http://sourceforge.net/project/showfiles.php?group_id=14> + +- <http://sourceforge.net/project/showfiles.php?group_id=14> Iptables -------- -o <http://www.iptables.org/downloads.html> + +- <http://www.iptables.org/downloads.html> Ip-route2 --------- -o <https://www.kernel.org/pub/linux/utils/net/iproute2/> + +- <https://www.kernel.org/pub/linux/utils/net/iproute2/> OProfile -------- -o <http://oprofile.sf.net/download/> + +- <http://oprofile.sf.net/download/> NFS-Utils --------- -o <http://nfs.sourceforge.net/> + +- <http://nfs.sourceforge.net/> + +Kernel documentation +******************** + +Sphinx +------ + +- <http://www.sphinx-doc.org/> diff --git a/Documentation/CodeOfConflict b/Documentation/CodeOfConflict index 1684d0b4efa6..49a8ecc157a2 100644 --- a/Documentation/CodeOfConflict +++ b/Documentation/CodeOfConflict @@ -19,7 +19,7 @@ please contact the Linux Foundation's Technical Advisory Board at will work to resolve the issue to the best of their ability. For more information on who is on the Technical Advisory Board and what their role is, please see: - http://www.linuxfoundation.org/programs/advisory-councils/tab + http://www.linuxfoundation.org/projects/linux/tab As a reviewer of code, please strive to keep things civil and focused on the technical issues involved. We are all humans, and frustrations can diff --git a/Documentation/CodingStyle b/Documentation/CodingStyle index a096836723ca..9c61c039ccd9 100644 --- a/Documentation/CodingStyle +++ b/Documentation/CodingStyle @@ -1,8 +1,10 @@ +.. _codingstyle: - Linux kernel coding style +Linux kernel coding style +========================= This is a short document describing the preferred coding style for the -linux kernel. Coding style is very personal, and I won't _force_ my +linux kernel. Coding style is very personal, and I won't **force** my views on anybody, but this is what goes for anything that I have to be able to maintain, and I'd prefer it for most other things too. Please at least consider the points made here. @@ -13,7 +15,8 @@ and NOT read it. Burn them, it's a great symbolic gesture. Anyway, here goes: - Chapter 1: Indentation +1) Indentation +-------------- Tabs are 8 characters, and thus indentations are also 8 characters. There are heretic movements that try to make indentations 4 (or even 2!) @@ -36,8 +39,10 @@ benefit of warning you when you're nesting your functions too deep. Heed that warning. The preferred way to ease multiple indentation levels in a switch statement is -to align the "switch" and its subordinate "case" labels in the same column -instead of "double-indenting" the "case" labels. E.g.: +to align the ``switch`` and its subordinate ``case`` labels in the same column +instead of ``double-indenting`` the ``case`` labels. E.g.: + +.. code-block:: c switch (suffix) { case 'G': @@ -59,6 +64,8 @@ instead of "double-indenting" the "case" labels. E.g.: Don't put multiple statements on a single line unless you have something to hide: +.. code-block:: c + if (condition) do_this; do_something_everytime; @@ -71,7 +78,8 @@ used for indentation, and the above example is deliberately broken. Get a decent editor and don't leave whitespace at the end of lines. - Chapter 2: Breaking long lines and strings +2) Breaking long lines and strings +---------------------------------- Coding style is all about readability and maintainability using commonly available tools. @@ -87,7 +95,8 @@ with a long argument list. However, never break user-visible strings such as printk messages, because that breaks the ability to grep for them. - Chapter 3: Placing Braces and Spaces +3) Placing Braces and Spaces +---------------------------- The other issue that always comes up in C styling is the placement of braces. Unlike the indent size, there are few technical reasons to @@ -95,6 +104,8 @@ choose one placement strategy over the other, but the preferred way, as shown to us by the prophets Kernighan and Ritchie, is to put the opening brace last on the line, and put the closing brace first, thusly: +.. code-block:: c + if (x is true) { we do y } @@ -102,6 +113,8 @@ brace last on the line, and put the closing brace first, thusly: This applies to all non-function statement blocks (if, switch, for, while, do). E.g.: +.. code-block:: c + switch (action) { case KOBJ_ADD: return "add"; @@ -116,6 +129,8 @@ while, do). E.g.: However, there is one special case, namely functions: they have the opening brace at the beginning of the next line, thus: +.. code-block:: c + int function(int x) { body of function @@ -123,20 +138,24 @@ opening brace at the beginning of the next line, thus: Heretic people all over the world have claimed that this inconsistency is ... well ... inconsistent, but all right-thinking people know that -(a) K&R are _right_ and (b) K&R are right. Besides, functions are +(a) K&R are **right** and (b) K&R are right. Besides, functions are special anyway (you can't nest them in C). -Note that the closing brace is empty on a line of its own, _except_ in +Note that the closing brace is empty on a line of its own, **except** in the cases where it is followed by a continuation of the same statement, -ie a "while" in a do-statement or an "else" in an if-statement, like +ie a ``while`` in a do-statement or an ``else`` in an if-statement, like this: +.. code-block:: c + do { body of do-loop } while (condition); and +.. code-block:: c + if (x == y) { .. } else if (x > y) { @@ -155,11 +174,15 @@ comments on. Do not unnecessarily use braces where a single statement will do. +.. code-block:: c + if (condition) action(); and +.. code-block:: none + if (condition) do_this(); else @@ -168,6 +191,8 @@ and This does not apply if only one branch of a conditional statement is a single statement; in the latter case use braces in both branches: +.. code-block:: c + if (condition) { do_this(); do_that(); @@ -175,57 +200,67 @@ statement; in the latter case use braces in both branches: otherwise(); } - 3.1: Spaces +3.1) Spaces +*********** Linux kernel style for use of spaces depends (mostly) on function-versus-keyword usage. Use a space after (most) keywords. The notable exceptions are sizeof, typeof, alignof, and __attribute__, which look somewhat like functions (and are usually used with parentheses in Linux, -although they are not required in the language, as in: "sizeof info" after -"struct fileinfo info;" is declared). +although they are not required in the language, as in: ``sizeof info`` after +``struct fileinfo info;`` is declared). -So use a space after these keywords: +So use a space after these keywords:: if, switch, case, for, do, while but not with sizeof, typeof, alignof, or __attribute__. E.g., +.. code-block:: c + + s = sizeof(struct file); Do not add spaces around (inside) parenthesized expressions. This example is -*bad*: +**bad**: + +.. code-block:: c + s = sizeof( struct file ); When declaring pointer data or a function that returns a pointer type, the -preferred use of '*' is adjacent to the data name or function name and not +preferred use of ``*`` is adjacent to the data name or function name and not adjacent to the type name. Examples: +.. code-block:: c + + char *linux_banner; unsigned long long memparse(char *ptr, char **retptr); char *match_strdup(substring_t *s); Use one space around (on each side of) most binary and ternary operators, -such as any of these: +such as any of these:: = + - < > * / % | & ^ <= >= == != ? : -but no space after unary operators: +but no space after unary operators:: & * + - ~ ! sizeof typeof alignof __attribute__ defined -no space before the postfix increment & decrement unary operators: +no space before the postfix increment & decrement unary operators:: ++ -- -no space after the prefix increment & decrement unary operators: +no space after the prefix increment & decrement unary operators:: ++ -- -and no space around the '.' and "->" structure member operators. +and no space around the ``.`` and ``->`` structure member operators. Do not leave trailing whitespace at the ends of lines. Some editors with -"smart" indentation will insert whitespace at the beginning of new lines as +``smart`` indentation will insert whitespace at the beginning of new lines as appropriate, so you can start typing the next line of code right away. However, some such editors do not remove the whitespace if you end up not putting a line of code there, such as if you leave a blank line. As a result, @@ -237,22 +272,23 @@ of patches, this may make later patches in the series fail by changing their context lines. - Chapter 4: Naming +4) Naming +--------- C is a Spartan language, and so should your naming be. Unlike Modula-2 and Pascal programmers, C programmers do not use cute names like ThisVariableIsATemporaryCounter. A C programmer would call that -variable "tmp", which is much easier to write, and not the least more +variable ``tmp``, which is much easier to write, and not the least more difficult to understand. HOWEVER, while mixed-case names are frowned upon, descriptive names for -global variables are a must. To call a global function "foo" is a +global variables are a must. To call a global function ``foo`` is a shooting offense. -GLOBAL variables (to be used only if you _really_ need them) need to +GLOBAL variables (to be used only if you **really** need them) need to have descriptive names, as do global functions. If you have a function that counts the number of active users, you should call that -"count_active_users()" or similar, you should _not_ call it "cntusr()". +``count_active_users()`` or similar, you should **not** call it ``cntusr()``. Encoding the type of a function into the name (so-called Hungarian notation) is brain damaged - the compiler knows the types anyway and can @@ -260,9 +296,9 @@ check those, and it only confuses the programmer. No wonder MicroSoft makes buggy programs. LOCAL variable names should be short, and to the point. If you have -some random integer loop counter, it should probably be called "i". -Calling it "loop_counter" is non-productive, if there is no chance of it -being mis-understood. Similarly, "tmp" can be just about any type of +some random integer loop counter, it should probably be called ``i``. +Calling it ``loop_counter`` is non-productive, if there is no chance of it +being mis-understood. Similarly, ``tmp`` can be just about any type of variable that is used to hold a temporary value. If you are afraid to mix up your local variable names, you have another @@ -270,59 +306,69 @@ problem, which is called the function-growth-hormone-imbalance syndrome. See chapter 6 (Functions). - Chapter 5: Typedefs +5) Typedefs +----------- + +Please don't use things like ``vps_t``. +It's a **mistake** to use typedef for structures and pointers. When you see a + +.. code-block:: c -Please don't use things like "vps_t". -It's a _mistake_ to use typedef for structures and pointers. When you see a vps_t a; in the source, what does it mean? In contrast, if it says +.. code-block:: c + struct virtual_container *a; -you can actually tell what "a" is. +you can actually tell what ``a`` is. -Lots of people think that typedefs "help readability". Not so. They are +Lots of people think that typedefs ``help readability``. Not so. They are useful only for: - (a) totally opaque objects (where the typedef is actively used to _hide_ + (a) totally opaque objects (where the typedef is actively used to **hide** what the object is). - Example: "pte_t" etc. opaque objects that you can only access using + Example: ``pte_t`` etc. opaque objects that you can only access using the proper accessor functions. - NOTE! Opaqueness and "accessor functions" are not good in themselves. - The reason we have them for things like pte_t etc. is that there - really is absolutely _zero_ portably accessible information there. + .. note:: + + Opaqueness and ``accessor functions`` are not good in themselves. + The reason we have them for things like pte_t etc. is that there + really is absolutely **zero** portably accessible information there. - (b) Clear integer types, where the abstraction _helps_ avoid confusion - whether it is "int" or "long". + (b) Clear integer types, where the abstraction **helps** avoid confusion + whether it is ``int`` or ``long``. u8/u16/u32 are perfectly fine typedefs, although they fit into category (d) better than here. - NOTE! Again - there needs to be a _reason_ for this. If something is - "unsigned long", then there's no reason to do + .. note:: + + Again - there needs to be a **reason** for this. If something is + ``unsigned long``, then there's no reason to do typedef unsigned long myflags_t; but if there is a clear reason for why it under certain circumstances - might be an "unsigned int" and under other configurations might be - "unsigned long", then by all means go ahead and use a typedef. + might be an ``unsigned int`` and under other configurations might be + ``unsigned long``, then by all means go ahead and use a typedef. - (c) when you use sparse to literally create a _new_ type for + (c) when you use sparse to literally create a **new** type for type-checking. (d) New types which are identical to standard C99 types, in certain exceptional circumstances. Although it would only take a short amount of time for the eyes and - brain to become accustomed to the standard types like 'uint32_t', + brain to become accustomed to the standard types like ``uint32_t``, some people object to their use anyway. - Therefore, the Linux-specific 'u8/u16/u32/u64' types and their + Therefore, the Linux-specific ``u8/u16/u32/u64`` types and their signed equivalents which are identical to standard types are permitted -- although they are not mandatory in new code of your own. @@ -333,7 +379,7 @@ useful only for: (e) Types safe for use in userspace. In certain structures which are visible to userspace, we cannot - require C99 types and cannot use the 'u32' form above. Thus, we + require C99 types and cannot use the ``u32`` form above. Thus, we use __u32 and similar types in all structures which are shared with userspace. @@ -341,10 +387,11 @@ Maybe there are other cases too, but the rule should basically be to NEVER EVER use a typedef unless you can clearly match one of those rules. In general, a pointer, or a struct that has elements that can reasonably -be directly accessed should _never_ be a typedef. +be directly accessed should **never** be a typedef. - Chapter 6: Functions +6) Functions +------------ Functions should be short and sweet, and do just one thing. They should fit on one or two screenfuls of text (the ISO/ANSI screen size is 80x24, @@ -372,8 +419,10 @@ and it gets confused. You know you're brilliant, but maybe you'd like to understand what you did 2 weeks from now. In source files, separate functions with one blank line. If the function is -exported, the EXPORT* macro for it should follow immediately after the closing -function brace line. E.g.: +exported, the **EXPORT** macro for it should follow immediately after the +closing function brace line. E.g.: + +.. code-block:: c int system_is_up(void) { @@ -386,7 +435,8 @@ Although this is not required by the C language, it is preferred in Linux because it is a simple way to add valuable information for the reader. - Chapter 7: Centralized exiting of functions +7) Centralized exiting of functions +----------------------------------- Albeit deprecated by some people, the equivalent of the goto statement is used frequently by compilers in form of the unconditional jump instruction. @@ -396,18 +446,21 @@ locations and some common work such as cleanup has to be done. If there is no cleanup needed then just return directly. Choose label names which say what the goto does or why the goto exists. An -example of a good name could be "out_buffer:" if the goto frees "buffer". Avoid -using GW-BASIC names like "err1:" and "err2:". Also don't name them after the -goto location like "err_kmalloc_failed:" +example of a good name could be ``out_free_buffer:`` if the goto frees ``buffer``. +Avoid using GW-BASIC names like ``err1:`` and ``err2:``, as you would have to +renumber them if you ever add or remove exit paths, and they make correctness +difficult to verify anyway. The rationale for using gotos is: - unconditional statements are easier to understand and follow - nesting is reduced - errors by not updating individual exit points when making - modifications are prevented + modifications are prevented - saves the compiler work to optimize redundant code away ;) +.. code-block:: c + int fun(int a) { int result = 0; @@ -425,27 +478,41 @@ The rationale for using gotos is: goto out_buffer; } ... - out_buffer: + out_free_buffer: kfree(buffer); return result; } -A common type of bug to be aware of is "one err bugs" which look like this: +A common type of bug to be aware of is ``one err bugs`` which look like this: + +.. code-block:: c err: kfree(foo->bar); kfree(foo); return ret; -The bug in this code is that on some exit paths "foo" is NULL. Normally the -fix for this is to split it up into two error labels "err_bar:" and "err_foo:". +The bug in this code is that on some exit paths ``foo`` is NULL. Normally the +fix for this is to split it up into two error labels ``err_free_bar:`` and +``err_free_foo:``: +.. code-block:: c - Chapter 8: Commenting + err_free_bar: + kfree(foo->bar); + err_free_foo: + kfree(foo); + return ret; + +Ideally you should simulate errors to test all exit paths. + + +8) Commenting +------------- Comments are good, but there is also a danger of over-commenting. NEVER try to explain HOW your code works in a comment: it's much better to -write the code so that the _working_ is obvious, and it's a waste of +write the code so that the **working** is obvious, and it's a waste of time to explain badly written code. Generally, you want your comments to tell WHAT your code does, not HOW. @@ -461,11 +528,10 @@ When commenting the kernel API functions, please use the kernel-doc format. See the files Documentation/kernel-documentation.rst and scripts/kernel-doc for details. -Linux style for comments is the C89 "/* ... */" style. -Don't use C99-style "// ..." comments. - The preferred style for long (multi-line) comments is: +.. code-block:: c + /* * This is the preferred style for multi-line * comments in the Linux kernel source code. @@ -478,6 +544,8 @@ The preferred style for long (multi-line) comments is: For files in net/ and drivers/net/ the preferred style for long (multi-line) comments is a little different. +.. code-block:: c + /* The preferred comment style for files in net/ and drivers/net * looks like this. * @@ -491,10 +559,11 @@ multiple data declarations). This leaves you room for a small comment on each item, explaining its use. - Chapter 9: You've made a mess of it +9) You've made a mess of it +--------------------------- That's OK, we all do. You've probably been told by your long-time Unix -user helper that "GNU emacs" automatically formats the C sources for +user helper that ``GNU emacs`` automatically formats the C sources for you, and you've noticed that yes, it does do that, but the defaults it uses are less than desirable (in fact, they are worse than random typing - an infinite number of monkeys typing into GNU emacs would never @@ -503,63 +572,66 @@ make a good program). So, you can either get rid of GNU emacs, or change it to use saner values. To do the latter, you can stick the following in your .emacs file: -(defun c-lineup-arglist-tabs-only (ignored) - "Line up argument lists by tabs, not spaces" - (let* ((anchor (c-langelem-pos c-syntactic-element)) - (column (c-langelem-2nd-pos c-syntactic-element)) - (offset (- (1+ column) anchor)) - (steps (floor offset c-basic-offset))) - (* (max steps 1) - c-basic-offset))) - -(add-hook 'c-mode-common-hook - (lambda () - ;; Add kernel style - (c-add-style - "linux-tabs-only" - '("linux" (c-offsets-alist - (arglist-cont-nonempty - c-lineup-gcc-asm-reg - c-lineup-arglist-tabs-only)))))) - -(add-hook 'c-mode-hook - (lambda () - (let ((filename (buffer-file-name))) - ;; Enable kernel mode for the appropriate files - (when (and filename - (string-match (expand-file-name "~/src/linux-trees") - filename)) - (setq indent-tabs-mode t) - (setq show-trailing-whitespace t) - (c-set-style "linux-tabs-only"))))) +.. code-block:: none + + (defun c-lineup-arglist-tabs-only (ignored) + "Line up argument lists by tabs, not spaces" + (let* ((anchor (c-langelem-pos c-syntactic-element)) + (column (c-langelem-2nd-pos c-syntactic-element)) + (offset (- (1+ column) anchor)) + (steps (floor offset c-basic-offset))) + (* (max steps 1) + c-basic-offset))) + + (add-hook 'c-mode-common-hook + (lambda () + ;; Add kernel style + (c-add-style + "linux-tabs-only" + '("linux" (c-offsets-alist + (arglist-cont-nonempty + c-lineup-gcc-asm-reg + c-lineup-arglist-tabs-only)))))) + + (add-hook 'c-mode-hook + (lambda () + (let ((filename (buffer-file-name))) + ;; Enable kernel mode for the appropriate files + (when (and filename + (string-match (expand-file-name "~/src/linux-trees") + filename)) + (setq indent-tabs-mode t) + (setq show-trailing-whitespace t) + (c-set-style "linux-tabs-only"))))) This will make emacs go better with the kernel coding style for C -files below ~/src/linux-trees. +files below ``~/src/linux-trees``. But even if you fail in getting emacs to do sane formatting, not -everything is lost: use "indent". +everything is lost: use ``indent``. Now, again, GNU indent has the same brain-dead settings that GNU emacs has, which is why you need to give it a few command line options. However, that's not too bad, because even the makers of GNU indent recognize the authority of K&R (the GNU people aren't evil, they are just severely misguided in this matter), so you just give indent the -options "-kr -i8" (stands for "K&R, 8 character indents"), or use -"scripts/Lindent", which indents in the latest style. +options ``-kr -i8`` (stands for ``K&R, 8 character indents``), or use +``scripts/Lindent``, which indents in the latest style. -"indent" has a lot of options, and especially when it comes to comment +``indent`` has a lot of options, and especially when it comes to comment re-formatting you may want to take a look at the man page. But -remember: "indent" is not a fix for bad programming. +remember: ``indent`` is not a fix for bad programming. - Chapter 10: Kconfig configuration files +10) Kconfig configuration files +------------------------------- For all of the Kconfig* configuration files throughout the source tree, -the indentation is somewhat different. Lines under a "config" definition +the indentation is somewhat different. Lines under a ``config`` definition are indented with one tab, while help text is indented an additional two -spaces. Example: +spaces. Example:: -config AUDIT + config AUDIT bool "Auditing support" depends on NET help @@ -569,9 +641,9 @@ config AUDIT auditing without CONFIG_AUDITSYSCALL. Seriously dangerous features (such as write support for certain -filesystems) should advertise this prominently in their prompt string: +filesystems) should advertise this prominently in their prompt string:: -config ADFS_FS_RW + config ADFS_FS_RW bool "ADFS write support (DANGEROUS)" depends on ADFS_FS ... @@ -580,41 +652,45 @@ For full documentation on the configuration files, see the file Documentation/kbuild/kconfig-language.txt. - Chapter 11: Data structures +11) Data structures +------------------- Data structures that have visibility outside the single-threaded environment they are created and destroyed in should always have reference counts. In the kernel, garbage collection doesn't exist (and outside the kernel garbage collection is slow and inefficient), which -means that you absolutely _have_ to reference count all your uses. +means that you absolutely **have** to reference count all your uses. Reference counting means that you can avoid locking, and allows multiple users to have access to the data structure in parallel - and not having to worry about the structure suddenly going away from under them just because they slept or did something else for a while. -Note that locking is _not_ a replacement for reference counting. +Note that locking is **not** a replacement for reference counting. Locking is used to keep data structures coherent, while reference counting is a memory management technique. Usually both are needed, and they are not to be confused with each other. Many data structures can indeed have two levels of reference counting, -when there are users of different "classes". The subclass count counts +when there are users of different ``classes``. The subclass count counts the number of subclass users, and decrements the global count just once when the subclass count goes to zero. -Examples of this kind of "multi-level-reference-counting" can be found in -memory management ("struct mm_struct": mm_users and mm_count), and in -filesystem code ("struct super_block": s_count and s_active). +Examples of this kind of ``multi-level-reference-counting`` can be found in +memory management (``struct mm_struct``: mm_users and mm_count), and in +filesystem code (``struct super_block``: s_count and s_active). Remember: if another thread can find your data structure, and you don't have a reference count on it, you almost certainly have a bug. - Chapter 12: Macros, Enums and RTL +12) Macros, Enums and RTL +------------------------- Names of macros defining constants and labels in enums are capitalized. +.. code-block:: c + #define CONSTANT 0x12345 Enums are preferred when defining several related constants. @@ -626,7 +702,9 @@ Generally, inline functions are preferable to macros resembling functions. Macros with multiple statements should be enclosed in a do - while block: - #define macrofun(a, b, c) \ +.. code-block:: c + + #define macrofun(a, b, c) \ do { \ if (a == 5) \ do_this(b, c); \ @@ -636,17 +714,21 @@ Things to avoid when using macros: 1) macros that affect control flow: +.. code-block:: c + #define FOO(x) \ do { \ if (blah(x) < 0) \ return -EBUGGERED; \ } while (0) -is a _very_ bad idea. It looks like a function call but exits the "calling" +is a **very** bad idea. It looks like a function call but exits the ``calling`` function; don't break the internal parsers of those who will read the code. 2) macros that depend on having a local variable with a magic name: +.. code-block:: c + #define FOO(val) bar(index, val) might look like a good thing, but it's confusing as hell when one reads the @@ -659,18 +741,22 @@ bite you if somebody e.g. turns FOO into an inline function. must enclose the expression in parentheses. Beware of similar issues with macros using parameters. +.. code-block:: c + #define CONSTANT 0x4000 #define CONSTEXP (CONSTANT | 3) 5) namespace collisions when defining local variables in macros resembling functions: -#define FOO(x) \ -({ \ - typeof(x) ret; \ - ret = calc_ret(x); \ - (ret); \ -}) +.. code-block:: c + + #define FOO(x) \ + ({ \ + typeof(x) ret; \ + ret = calc_ret(x); \ + (ret); \ + }) ret is a common name for a local variable - __foo_ret is less likely to collide with an existing variable. @@ -679,11 +765,12 @@ The cpp manual deals with macros exhaustively. The gcc internals manual also covers RTL which is used frequently with assembly language in the kernel. - Chapter 13: Printing kernel messages +13) Printing kernel messages +---------------------------- Kernel developers like to be seen as literate. Do mind the spelling of kernel messages to make a good impression. Do not use crippled -words like "dont"; use "do not" or "don't" instead. Make the messages +words like ``dont``; use ``do not`` or ``don't`` instead. Make the messages concise, clear, and unambiguous. Kernel messages do not have to be terminated with a period. @@ -713,7 +800,8 @@ already inside a debug-related #ifdef section, printk(KERN_DEBUG ...) can be used. - Chapter 14: Allocating memory +14) Allocating memory +--------------------- The kernel provides the following general purpose memory allocators: kmalloc(), kzalloc(), kmalloc_array(), kcalloc(), vmalloc(), and @@ -722,6 +810,8 @@ about them. The preferred form for passing a size of a struct is the following: +.. code-block:: c + p = kmalloc(sizeof(*p), ...); The alternative form where struct name is spelled out hurts readability and @@ -734,20 +824,25 @@ language. The preferred form for allocating an array is the following: +.. code-block:: c + p = kmalloc_array(n, sizeof(...), ...); The preferred form for allocating a zeroed array is the following: +.. code-block:: c + p = kcalloc(n, sizeof(...), ...); Both forms check for overflow on the allocation size n * sizeof(...), and return NULL if that occurred. - Chapter 15: The inline disease +15) The inline disease +---------------------- There appears to be a common misperception that gcc has a magic "make me -faster" speedup option called "inline". While the use of inlines can be +faster" speedup option called ``inline``. While the use of inlines can be appropriate (for example as a means of replacing macros, see Chapter 12), it very often is not. Abundant use of the inline keyword leads to a much bigger kernel, which in turn slows the system as a whole down, due to a bigger @@ -771,26 +866,27 @@ appears outweighs the potential value of the hint that tells gcc to do something it would have done anyway. - Chapter 16: Function return values and names +16) Function return values and names +------------------------------------ Functions can return values of many different kinds, and one of the most common is a value indicating whether the function succeeded or failed. Such a value can be represented as an error-code integer -(-Exxx = failure, 0 = success) or a "succeeded" boolean (0 = failure, +(-Exxx = failure, 0 = success) or a ``succeeded`` boolean (0 = failure, non-zero = success). Mixing up these two sorts of representations is a fertile source of difficult-to-find bugs. If the C language included a strong distinction between integers and booleans then the compiler would find these mistakes for us... but it doesn't. To help prevent such bugs, always follow this -convention: +convention:: If the name of a function is an action or an imperative command, the function should return an error-code integer. If the name is a predicate, the function should return a "succeeded" boolean. -For example, "add work" is a command, and the add_work() function returns 0 -for success or -EBUSY for failure. In the same way, "PCI device present" is +For example, ``add work`` is a command, and the add_work() function returns 0 +for success or -EBUSY for failure. In the same way, ``PCI device present`` is a predicate, and the pci_dev_present() function returns 1 if it succeeds in finding a matching device or 0 if it doesn't. @@ -805,17 +901,22 @@ result. Typical examples would be functions that return pointers; they use NULL or the ERR_PTR mechanism to report failure. - Chapter 17: Don't re-invent the kernel macros +17) Don't re-invent the kernel macros +------------------------------------- The header file include/linux/kernel.h contains a number of macros that you should use, rather than explicitly coding some variant of them yourself. For example, if you need to calculate the length of an array, take advantage of the macro +.. code-block:: c + #define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0])) Similarly, if you need to calculate the size of some structure member, use +.. code-block:: c + #define FIELD_SIZEOF(t, f) (sizeof(((t*)0)->f)) There are also min() and max() macros that do strict type checking if you @@ -823,16 +924,21 @@ need them. Feel free to peruse that header file to see what else is already defined that you shouldn't reproduce in your code. - Chapter 18: Editor modelines and other cruft +18) Editor modelines and other cruft +------------------------------------ Some editors can interpret configuration information embedded in source files, indicated with special markers. For example, emacs interprets lines marked like this: +.. code-block:: c + -*- mode: c -*- Or like this: +.. code-block:: c + /* Local Variables: compile-command: "gcc -DMAGIC_DEBUG_FLAG foo.c" @@ -841,6 +947,8 @@ Or like this: Vim interprets markers that look like this: +.. code-block:: c + /* vim:set sw=8 noet */ Do not include any of these in source files. People have their own personal @@ -850,7 +958,8 @@ own custom mode, or may have some other magic method for making indentation work correctly. - Chapter 19: Inline assembly +19) Inline assembly +------------------- In architecture-specific code, you may need to use inline assembly to interface with CPU or platform functionality. Don't hesitate to do so when necessary. @@ -863,7 +972,7 @@ that inline assembly can use C parameters. Large, non-trivial assembly functions should go in .S files, with corresponding C prototypes defined in C header files. The C prototypes for assembly -functions should use "asmlinkage". +functions should use ``asmlinkage``. You may need to mark your asm statement as volatile, to prevent GCC from removing it if GCC doesn't notice any side effects. You don't always need to @@ -874,12 +983,15 @@ instructions, put each instruction on a separate line in a separate quoted string, and end each string except the last with \n\t to properly indent the next instruction in the assembly output: +.. code-block:: c + asm ("magic %reg1, #42\n\t" "more_magic %reg2, %reg3" : /* outputs */ : /* inputs */ : /* clobbers */); - Chapter 20: Conditional Compilation +20) Conditional Compilation +--------------------------- Wherever possible, don't use preprocessor conditionals (#if, #ifdef) in .c files; doing so makes code harder to read and logic harder to follow. Instead, @@ -903,6 +1015,8 @@ unused, delete it.) Within code, where possible, use the IS_ENABLED macro to convert a Kconfig symbol into a C boolean expression, and use it in a normal C conditional: +.. code-block:: c + if (IS_ENABLED(CONFIG_SOMETHING)) { ... } @@ -918,12 +1032,15 @@ At the end of any non-trivial #if or #ifdef block (more than a few lines), place a comment after the #endif on the same line, noting the conditional expression used. For instance: +.. code-block:: c + #ifdef CONFIG_SOMETHING ... #endif /* CONFIG_SOMETHING */ - Appendix I: References +Appendix I) References +---------------------- The C Programming Language, Second Edition by Brian W. Kernighan and Dennis M. Ritchie. @@ -943,4 +1060,3 @@ language C, URL: http://www.open-std.org/JTC1/SC22/WG14/ Kernel CodingStyle, by greg@kroah.com at OLS 2002: http://www.kroah.com/linux/talks/ols_2002_kernel_codingstyle_talk/html/ - diff --git a/Documentation/DMA-API-HOWTO.txt b/Documentation/DMA-API-HOWTO.txt index 781024ef9050..979228bc9035 100644 --- a/Documentation/DMA-API-HOWTO.txt +++ b/Documentation/DMA-API-HOWTO.txt @@ -699,7 +699,7 @@ to use the dma_sync_*() interfaces. dma_addr_t mapping; mapping = dma_map_single(cp->dev, buffer, len, DMA_FROM_DEVICE); - if (dma_mapping_error(cp->dev, dma_handle)) { + if (dma_mapping_error(cp->dev, mapping)) { /* * reduce current DMA mapping usage, * delay and try again later or @@ -931,10 +931,8 @@ to "Closing". 1) Struct scatterlist requirements. - Don't invent the architecture specific struct scatterlist; just use - <asm-generic/scatterlist.h>. You need to enable - CONFIG_NEED_SG_DMA_LENGTH if the architecture supports IOMMUs - (including software IOMMU). + You need to enable CONFIG_NEED_SG_DMA_LENGTH if the architecture + supports IOMMUs (including software IOMMU). 2) ARCH_DMA_MINALIGN diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt index 1d26eeb6b5f6..6b20128fab8a 100644 --- a/Documentation/DMA-API.txt +++ b/Documentation/DMA-API.txt @@ -277,14 +277,26 @@ and <size> parameters are provided to do partial page mapping, it is recommended that you never use these unless you really know what the cache width is. +dma_addr_t +dma_map_resource(struct device *dev, phys_addr_t phys_addr, size_t size, + enum dma_data_direction dir, unsigned long attrs) + +void +dma_unmap_resource(struct device *dev, dma_addr_t addr, size_t size, + enum dma_data_direction dir, unsigned long attrs) + +API for mapping and unmapping for MMIO resources. All the notes and +warnings for the other mapping APIs apply here. The API should only be +used to map device MMIO resources, mapping of RAM is not permitted. + int dma_mapping_error(struct device *dev, dma_addr_t dma_addr) -In some circumstances dma_map_single() and dma_map_page() will fail to create -a mapping. A driver can check for these errors by testing the returned -DMA address with dma_mapping_error(). A non-zero return value means the mapping -could not be created and the driver should take appropriate action (e.g. -reduce current DMA mapping usage or delay and try again later). +In some circumstances dma_map_single(), dma_map_page() and dma_map_resource() +will fail to create a mapping. A driver can check for these errors by testing +the returned DMA address with dma_mapping_error(). A non-zero return value +means the mapping could not be created and the driver should take appropriate +action (e.g. reduce current DMA mapping usage or delay and try again later). int dma_map_sg(struct device *dev, struct scatterlist *sg, diff --git a/Documentation/DocBook/Makefile b/Documentation/DocBook/Makefile index 64460a897f56..736f5916daea 100644 --- a/Documentation/DocBook/Makefile +++ b/Documentation/DocBook/Makefile @@ -6,7 +6,7 @@ # To add a new book the only step required is to add the book to the # list of DOCBOOKS. -DOCBOOKS := z8530book.xml device-drivers.xml \ +DOCBOOKS := z8530book.xml \ kernel-hacking.xml kernel-locking.xml deviceiobook.xml \ writing_usb_driver.xml networking.xml \ kernel-api.xml filesystems.xml lsm.xml usb.xml kgdb.xml \ @@ -22,9 +22,15 @@ ifeq ($(DOCBOOKS),) # Skip DocBook build if the user explicitly requested no DOCBOOKS. .DEFAULT: @echo " SKIP DocBook $@ target (DOCBOOKS=\"\" specified)." +else +ifneq ($(SPHINXDIRS),) +# Skip DocBook build if the user explicitly requested a sphinx dir +.DEFAULT: + @echo " SKIP DocBook $@ target (SPHINXDIRS specified)." else + ### # The build process is as follows (targets): # (xmldocs) [by docproc] @@ -66,6 +72,7 @@ installmandocs: mandocs # no-op for the DocBook toolchain epubdocs: +latexdocs: ### #External programs used @@ -221,6 +228,7 @@ silent_gen_xml = : echo "</programlisting>") > $@ endif # DOCBOOKS="" +endif # SPHINDIR=... ### # Help targets as used by the top-level makefile diff --git a/Documentation/DocBook/device-drivers.tmpl b/Documentation/DocBook/device-drivers.tmpl deleted file mode 100644 index 9c10030eb2be..000000000000 --- a/Documentation/DocBook/device-drivers.tmpl +++ /dev/null @@ -1,521 +0,0 @@ -<?xml version="1.0" encoding="UTF-8"?> -<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN" - "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []> - -<book id="LinuxDriversAPI"> - <bookinfo> - <title>Linux Device Drivers</title> - - <legalnotice> - <para> - This documentation is free software; you can redistribute - it and/or modify it under the terms of the GNU General Public - License as published by the Free Software Foundation; either - version 2 of the License, or (at your option) any later - version. - </para> - - <para> - This program is distributed in the hope that it will be - useful, but WITHOUT ANY WARRANTY; without even the implied - warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. - See the GNU General Public License for more details. - </para> - - <para> - You should have received a copy of the GNU General Public - License along with this program; if not, write to the Free - Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, - MA 02111-1307 USA - </para> - - <para> - For more details see the file COPYING in the source - distribution of Linux. - </para> - </legalnotice> - </bookinfo> - -<toc></toc> - - <chapter id="Basics"> - <title>Driver Basics</title> - <sect1><title>Driver Entry and Exit points</title> -!Iinclude/linux/init.h - </sect1> - - <sect1><title>Atomic and pointer manipulation</title> -!Iarch/x86/include/asm/atomic.h - </sect1> - - <sect1><title>Delaying, scheduling, and timer routines</title> -!Iinclude/linux/sched.h -!Ekernel/sched/core.c -!Ikernel/sched/cpupri.c -!Ikernel/sched/fair.c -!Iinclude/linux/completion.h -!Ekernel/time/timer.c - </sect1> - <sect1><title>Wait queues and Wake events</title> -!Iinclude/linux/wait.h -!Ekernel/sched/wait.c - </sect1> - <sect1><title>High-resolution timers</title> -!Iinclude/linux/ktime.h -!Iinclude/linux/hrtimer.h -!Ekernel/time/hrtimer.c - </sect1> - <sect1><title>Workqueues and Kevents</title> -!Iinclude/linux/workqueue.h -!Ekernel/workqueue.c - </sect1> - <sect1><title>Internal Functions</title> -!Ikernel/exit.c -!Ikernel/signal.c -!Iinclude/linux/kthread.h -!Ekernel/kthread.c - </sect1> - - <sect1><title>Kernel objects manipulation</title> -<!-- -X!Iinclude/linux/kobject.h ---> -!Elib/kobject.c - </sect1> - - <sect1><title>Kernel utility functions</title> -!Iinclude/linux/kernel.h -!Ekernel/printk/printk.c -!Ekernel/panic.c -!Ekernel/sys.c -!Ekernel/rcu/srcu.c -!Ekernel/rcu/tree.c -!Ekernel/rcu/tree_plugin.h -!Ekernel/rcu/update.c - </sect1> - - <sect1><title>Device Resource Management</title> -!Edrivers/base/devres.c - </sect1> - - </chapter> - - <chapter id="devdrivers"> - <title>Device drivers infrastructure</title> - <sect1><title>The Basic Device Driver-Model Structures </title> -!Iinclude/linux/device.h - </sect1> - <sect1><title>Device Drivers Base</title> -!Idrivers/base/init.c -!Edrivers/base/driver.c -!Edrivers/base/core.c -!Edrivers/base/syscore.c -!Edrivers/base/class.c -!Idrivers/base/node.c -!Edrivers/base/firmware_class.c -!Edrivers/base/transport_class.c -<!-- Cannot be included, because - attribute_container_add_class_device_adapter - and attribute_container_classdev_to_container - exceed allowed 44 characters maximum -X!Edrivers/base/attribute_container.c ---> -!Edrivers/base/dd.c -<!-- -X!Edrivers/base/interface.c ---> -!Iinclude/linux/platform_device.h -!Edrivers/base/platform.c -!Edrivers/base/bus.c - </sect1> - <sect1> - <title>Buffer Sharing and Synchronization</title> - <para> - The dma-buf subsystem provides the framework for sharing buffers - for hardware (DMA) access across multiple device drivers and - subsystems, and for synchronizing asynchronous hardware access. - </para> - <para> - This is used, for example, by drm "prime" multi-GPU support, but - is of course not limited to GPU use cases. - </para> - <para> - The three main components of this are: (1) dma-buf, representing - a sg_table and exposed to userspace as a file descriptor to allow - passing between devices, (2) fence, which provides a mechanism - to signal when one device as finished access, and (3) reservation, - which manages the shared or exclusive fence(s) associated with - the buffer. - </para> - <sect2><title>dma-buf</title> -!Edrivers/dma-buf/dma-buf.c -!Iinclude/linux/dma-buf.h - </sect2> - <sect2><title>reservation</title> -!Pdrivers/dma-buf/reservation.c Reservation Object Overview -!Edrivers/dma-buf/reservation.c -!Iinclude/linux/reservation.h - </sect2> - <sect2><title>fence</title> -!Edrivers/dma-buf/fence.c -!Iinclude/linux/fence.h -!Edrivers/dma-buf/seqno-fence.c -!Iinclude/linux/seqno-fence.h -!Edrivers/dma-buf/fence-array.c -!Iinclude/linux/fence-array.h -!Edrivers/dma-buf/reservation.c -!Iinclude/linux/reservation.h -!Edrivers/dma-buf/sync_file.c -!Iinclude/linux/sync_file.h - </sect2> - </sect1> - <sect1><title>Device Drivers DMA Management</title> -!Edrivers/base/dma-coherent.c -!Edrivers/base/dma-mapping.c - </sect1> - <sect1><title>Device Drivers Power Management</title> -!Edrivers/base/power/main.c - </sect1> - <sect1><title>Device Drivers ACPI Support</title> -<!-- Internal functions only -X!Edrivers/acpi/sleep/main.c -X!Edrivers/acpi/sleep/wakeup.c -X!Edrivers/acpi/motherboard.c -X!Edrivers/acpi/bus.c ---> -!Edrivers/acpi/scan.c -!Idrivers/acpi/scan.c -<!-- No correct structured comments -X!Edrivers/acpi/pci_bind.c ---> - </sect1> - <sect1><title>Device drivers PnP support</title> -!Idrivers/pnp/core.c -<!-- No correct structured comments -X!Edrivers/pnp/system.c - --> -!Edrivers/pnp/card.c -!Idrivers/pnp/driver.c -!Edrivers/pnp/manager.c -!Edrivers/pnp/support.c - </sect1> - <sect1><title>Userspace IO devices</title> -!Edrivers/uio/uio.c -!Iinclude/linux/uio_driver.h - </sect1> - </chapter> - - <chapter id="parportdev"> - <title>Parallel Port Devices</title> -!Iinclude/linux/parport.h -!Edrivers/parport/ieee1284.c -!Edrivers/parport/share.c -!Idrivers/parport/daisy.c - </chapter> - - <chapter id="message_devices"> - <title>Message-based devices</title> - <sect1><title>Fusion message devices</title> -!Edrivers/message/fusion/mptbase.c -!Idrivers/message/fusion/mptbase.c -!Edrivers/message/fusion/mptscsih.c -!Idrivers/message/fusion/mptscsih.c -!Idrivers/message/fusion/mptctl.c -!Idrivers/message/fusion/mptspi.c -!Idrivers/message/fusion/mptfc.c -!Idrivers/message/fusion/mptlan.c - </sect1> - </chapter> - - <chapter id="snddev"> - <title>Sound Devices</title> -!Iinclude/sound/core.h -!Esound/sound_core.c -!Iinclude/sound/pcm.h -!Esound/core/pcm.c -!Esound/core/device.c -!Esound/core/info.c -!Esound/core/rawmidi.c -!Esound/core/sound.c -!Esound/core/memory.c -!Esound/core/pcm_memory.c -!Esound/core/init.c -!Esound/core/isadma.c -!Esound/core/control.c -!Esound/core/pcm_lib.c -!Esound/core/hwdep.c -!Esound/core/pcm_native.c -!Esound/core/memalloc.c -<!-- FIXME: Removed for now since no structured comments in source -X!Isound/sound_firmware.c ---> - </chapter> - - - <chapter id="uart16x50"> - <title>16x50 UART Driver</title> -!Edrivers/tty/serial/serial_core.c -!Edrivers/tty/serial/8250/8250_core.c - </chapter> - - <chapter id="fbdev"> - <title>Frame Buffer Library</title> - - <para> - The frame buffer drivers depend heavily on four data structures. - These structures are declared in include/linux/fb.h. They are - fb_info, fb_var_screeninfo, fb_fix_screeninfo and fb_monospecs. - The last three can be made available to and from userland. - </para> - - <para> - fb_info defines the current state of a particular video card. - Inside fb_info, there exists a fb_ops structure which is a - collection of needed functions to make fbdev and fbcon work. - fb_info is only visible to the kernel. - </para> - - <para> - fb_var_screeninfo is used to describe the features of a video card - that are user defined. With fb_var_screeninfo, things such as - depth and the resolution may be defined. - </para> - - <para> - The next structure is fb_fix_screeninfo. This defines the - properties of a card that are created when a mode is set and can't - be changed otherwise. A good example of this is the start of the - frame buffer memory. This "locks" the address of the frame buffer - memory, so that it cannot be changed or moved. - </para> - - <para> - The last structure is fb_monospecs. In the old API, there was - little importance for fb_monospecs. This allowed for forbidden things - such as setting a mode of 800x600 on a fix frequency monitor. With - the new API, fb_monospecs prevents such things, and if used - correctly, can prevent a monitor from being cooked. fb_monospecs - will not be useful until kernels 2.5.x. - </para> - - <sect1><title>Frame Buffer Memory</title> -!Edrivers/video/fbdev/core/fbmem.c - </sect1> -<!-- - <sect1><title>Frame Buffer Console</title> -X!Edrivers/video/console/fbcon.c - </sect1> ---> - <sect1><title>Frame Buffer Colormap</title> -!Edrivers/video/fbdev/core/fbcmap.c - </sect1> -<!-- FIXME: - drivers/video/fbgen.c has no docs, which stuffs up the sgml. Comment - out until somebody adds docs. KAO - <sect1><title>Frame Buffer Generic Functions</title> -X!Idrivers/video/fbgen.c - </sect1> -KAO --> - <sect1><title>Frame Buffer Video Mode Database</title> -!Idrivers/video/fbdev/core/modedb.c -!Edrivers/video/fbdev/core/modedb.c - </sect1> - <sect1><title>Frame Buffer Macintosh Video Mode Database</title> -!Edrivers/video/fbdev/macmodes.c - </sect1> - <sect1><title>Frame Buffer Fonts</title> - <para> - Refer to the file lib/fonts/fonts.c for more information. - </para> -<!-- FIXME: Removed for now since no structured comments in source -X!Ilib/fonts/fonts.c ---> - </sect1> - </chapter> - - <chapter id="input_subsystem"> - <title>Input Subsystem</title> - <sect1><title>Input core</title> -!Iinclude/linux/input.h -!Edrivers/input/input.c -!Edrivers/input/ff-core.c -!Edrivers/input/ff-memless.c - </sect1> - <sect1><title>Multitouch Library</title> -!Iinclude/linux/input/mt.h -!Edrivers/input/input-mt.c - </sect1> - <sect1><title>Polled input devices</title> -!Iinclude/linux/input-polldev.h -!Edrivers/input/input-polldev.c - </sect1> - <sect1><title>Matrix keyboards/keypads</title> -!Iinclude/linux/input/matrix_keypad.h - </sect1> - <sect1><title>Sparse keymap support</title> -!Iinclude/linux/input/sparse-keymap.h -!Edrivers/input/sparse-keymap.c - </sect1> - </chapter> - - <chapter id="spi"> - <title>Serial Peripheral Interface (SPI)</title> - <para> - SPI is the "Serial Peripheral Interface", widely used with - embedded systems because it is a simple and efficient - interface: basically a multiplexed shift register. - Its three signal wires hold a clock (SCK, often in the range - of 1-20 MHz), a "Master Out, Slave In" (MOSI) data line, and - a "Master In, Slave Out" (MISO) data line. - SPI is a full duplex protocol; for each bit shifted out the - MOSI line (one per clock) another is shifted in on the MISO line. - Those bits are assembled into words of various sizes on the - way to and from system memory. - An additional chipselect line is usually active-low (nCS); - four signals are normally used for each peripheral, plus - sometimes an interrupt. - </para> - <para> - The SPI bus facilities listed here provide a generalized - interface to declare SPI busses and devices, manage them - according to the standard Linux driver model, and perform - input/output operations. - At this time, only "master" side interfaces are supported, - where Linux talks to SPI peripherals and does not implement - such a peripheral itself. - (Interfaces to support implementing SPI slaves would - necessarily look different.) - </para> - <para> - The programming interface is structured around two kinds of driver, - and two kinds of device. - A "Controller Driver" abstracts the controller hardware, which may - be as simple as a set of GPIO pins or as complex as a pair of FIFOs - connected to dual DMA engines on the other side of the SPI shift - register (maximizing throughput). Such drivers bridge between - whatever bus they sit on (often the platform bus) and SPI, and - expose the SPI side of their device as a - <structname>struct spi_master</structname>. - SPI devices are children of that master, represented as a - <structname>struct spi_device</structname> and manufactured from - <structname>struct spi_board_info</structname> descriptors which - are usually provided by board-specific initialization code. - A <structname>struct spi_driver</structname> is called a - "Protocol Driver", and is bound to a spi_device using normal - driver model calls. - </para> - <para> - The I/O model is a set of queued messages. Protocol drivers - submit one or more <structname>struct spi_message</structname> - objects, which are processed and completed asynchronously. - (There are synchronous wrappers, however.) Messages are - built from one or more <structname>struct spi_transfer</structname> - objects, each of which wraps a full duplex SPI transfer. - A variety of protocol tweaking options are needed, because - different chips adopt very different policies for how they - use the bits transferred with SPI. - </para> -!Iinclude/linux/spi/spi.h -!Fdrivers/spi/spi.c spi_register_board_info -!Edrivers/spi/spi.c - </chapter> - - <chapter id="i2c"> - <title>I<superscript>2</superscript>C and SMBus Subsystem</title> - - <para> - I<superscript>2</superscript>C (or without fancy typography, "I2C") - is an acronym for the "Inter-IC" bus, a simple bus protocol which is - widely used where low data rate communications suffice. - Since it's also a licensed trademark, some vendors use another - name (such as "Two-Wire Interface", TWI) for the same bus. - I2C only needs two signals (SCL for clock, SDA for data), conserving - board real estate and minimizing signal quality issues. - Most I2C devices use seven bit addresses, and bus speeds of up - to 400 kHz; there's a high speed extension (3.4 MHz) that's not yet - found wide use. - I2C is a multi-master bus; open drain signaling is used to - arbitrate between masters, as well as to handshake and to - synchronize clocks from slower clients. - </para> - - <para> - The Linux I2C programming interfaces support only the master - side of bus interactions, not the slave side. - The programming interface is structured around two kinds of driver, - and two kinds of device. - An I2C "Adapter Driver" abstracts the controller hardware; it binds - to a physical device (perhaps a PCI device or platform_device) and - exposes a <structname>struct i2c_adapter</structname> representing - each I2C bus segment it manages. - On each I2C bus segment will be I2C devices represented by a - <structname>struct i2c_client</structname>. Those devices will - be bound to a <structname>struct i2c_driver</structname>, - which should follow the standard Linux driver model. - (At this writing, a legacy model is more widely used.) - There are functions to perform various I2C protocol operations; at - this writing all such functions are usable only from task context. - </para> - - <para> - The System Management Bus (SMBus) is a sibling protocol. Most SMBus - systems are also I2C conformant. The electrical constraints are - tighter for SMBus, and it standardizes particular protocol messages - and idioms. Controllers that support I2C can also support most - SMBus operations, but SMBus controllers don't support all the protocol - options that an I2C controller will. - There are functions to perform various SMBus protocol operations, - either using I2C primitives or by issuing SMBus commands to - i2c_adapter devices which don't support those I2C operations. - </para> - -!Iinclude/linux/i2c.h -!Fdrivers/i2c/i2c-boardinfo.c i2c_register_board_info -!Edrivers/i2c/i2c-core.c - </chapter> - - <chapter id="hsi"> - <title>High Speed Synchronous Serial Interface (HSI)</title> - - <para> - High Speed Synchronous Serial Interface (HSI) is a - serial interface mainly used for connecting application - engines (APE) with cellular modem engines (CMT) in cellular - handsets. - - HSI provides multiplexing for up to 16 logical channels, - low-latency and full duplex communication. - </para> - -!Iinclude/linux/hsi/hsi.h -!Edrivers/hsi/hsi_core.c - </chapter> - - <chapter id="pwm"> - <title>Pulse-Width Modulation (PWM)</title> - <para> - Pulse-width modulation is a modulation technique primarily used to - control power supplied to electrical devices. - </para> - <para> - The PWM framework provides an abstraction for providers and consumers - of PWM signals. A controller that provides one or more PWM signals is - registered as <structname>struct pwm_chip</structname>. Providers are - expected to embed this structure in a driver-specific structure. This - structure contains fields that describe a particular chip. - </para> - <para> - A chip exposes one or more PWM signal sources, each of which exposed - as a <structname>struct pwm_device</structname>. Operations can be - performed on PWM devices to control the period, duty cycle, polarity - and active state of the signal. - </para> - <para> - Note that PWM devices are exclusive resources: they can always only be - used by one consumer at a time. - </para> -!Iinclude/linux/pwm.h -!Edrivers/pwm/core.c - </chapter> - -</book> diff --git a/Documentation/HOWTO b/Documentation/HOWTO index 1f345da28ec5..5f042349f987 100644 --- a/Documentation/HOWTO +++ b/Documentation/HOWTO @@ -1,5 +1,5 @@ HOWTO do Linux kernel development ---------------------------------- +================================= This is the be-all, end-all document on this topic. It contains instructions on how to become a Linux kernel developer and how to learn @@ -28,6 +28,7 @@ kernel development. Assembly (any architecture) is not required unless you plan to do low-level development for that architecture. Though they are not a good substitute for a solid C education and/or years of experience, the following books are good for, if anything, reference: + - "The C Programming Language" by Kernighan and Ritchie [Prentice Hall] - "Practical C Programming" by Steve Oualline [O'Reilly] - "C: A Reference Manual" by Harbison and Steele [Prentice Hall] @@ -64,7 +65,8 @@ people on the mailing lists are not lawyers, and you should not rely on their statements on legal matters. For common questions and answers about the GPL, please see: - http://www.gnu.org/licenses/gpl-faq.html + + https://www.gnu.org/licenses/gpl-faq.html Documentation @@ -82,96 +84,118 @@ linux-api@vger.kernel.org. Here is a list of files that are in the kernel source tree that are required reading: + README This file gives a short background on the Linux kernel and describes what is necessary to do to configure and build the kernel. People who are new to the kernel should start here. - Documentation/Changes + :ref:`Documentation/Changes <changes>` This file gives a list of the minimum levels of various software packages that are necessary to build and run the kernel successfully. - Documentation/CodingStyle + :ref:`Documentation/CodingStyle <codingstyle>` This describes the Linux kernel coding style, and some of the rationale behind it. All new code is expected to follow the guidelines in this document. Most maintainers will only accept patches if these rules are followed, and many people will only review code if it is in the proper style. - Documentation/SubmittingPatches - Documentation/SubmittingDrivers + :ref:`Documentation/SubmittingPatches <submittingpatches>` and :ref:`Documentation/SubmittingDrivers <submittingdrivers>` These files describe in explicit detail how to successfully create and send a patch, including (but not limited to): + - Email contents - Email format - Who to send it to + Following these rules will not guarantee success (as all patches are subject to scrutiny for content and style), but not following them will almost always prevent it. Other excellent descriptions of how to create patches properly are: + "The Perfect Patch" - http://www.ozlabs.org/~akpm/stuff/tpp.txt + https://www.ozlabs.org/~akpm/stuff/tpp.txt + "Linux kernel patch submission format" http://linux.yyz.us/patch-format.html - Documentation/stable_api_nonsense.txt + :ref:`Documentation/stable_api_nonsense.txt <stable_api_nonsense>` This file describes the rationale behind the conscious decision to not have a stable API within the kernel, including things like: + - Subsystem shim-layers (for compatibility?) - Driver portability between Operating Systems. - Mitigating rapid change within the kernel source tree (or preventing rapid change) + This document is crucial for understanding the Linux development philosophy and is very important for people moving to Linux from development on other Operating Systems. - Documentation/SecurityBugs + :ref:`Documentation/SecurityBugs <securitybugs>` If you feel you have found a security problem in the Linux kernel, please follow the steps in this document to help notify the kernel developers, and help solve the issue. - Documentation/ManagementStyle + :ref:`Documentation/ManagementStyle <managementstyle>` This document describes how Linux kernel maintainers operate and the shared ethos behind their methodologies. This is important reading for anyone new to kernel development (or anyone simply curious about it), as it resolves a lot of common misconceptions and confusion about the unique behavior of kernel maintainers. - Documentation/stable_kernel_rules.txt + :ref:`Documentation/stable_kernel_rules.txt <stable_kernel_rules>` This file describes the rules on how the stable kernel releases happen, and what to do if you want to get a change into one of these releases. - Documentation/kernel-docs.txt + :ref:`Documentation/kernel-docs.txt <kernel_docs>` A list of external documentation that pertains to kernel development. Please consult this list if you do not find what you are looking for within the in-kernel documentation. - Documentation/applying-patches.txt + :ref:`Documentation/applying-patches.txt <applying_patches>` A good introduction describing exactly what a patch is and how to apply it to the different development branches of the kernel. The kernel also has a large number of documents that can be -automatically generated from the source code itself. This includes a +automatically generated from the source code itself or from +ReStructuredText markups (ReST), like this one. This includes a full description of the in-kernel API, and rules on how to handle -locking properly. The documents will be created in the -Documentation/DocBook/ directory and can be generated as PDF, -Postscript, HTML, and man pages by running: +locking properly. + +All such documents can be generated as PDF or HTML by running:: + make pdfdocs - make psdocs make htmldocs - make mandocs + respectively from the main kernel source directory. +The documents that uses ReST markup will be generated at Documentation/output. +They can also be generated on LaTeX and ePub formats with:: + + make latexdocs + make epubdocs + +Currently, there are some documents written on DocBook that are in +the process of conversion to ReST. Such documents will be created in the +Documentation/DocBook/ directory and can be generated also as +Postscript or man pages by running:: + + make psdocs + make mandocs Becoming A Kernel Developer --------------------------- If you do not know anything about Linux kernel development, you should look at the Linux KernelNewbies project: - http://kernelnewbies.org + + https://kernelnewbies.org + It consists of a helpful mailing list where you can ask almost any type of basic kernel development question (make sure to search the archives first, before asking something that has already been answered in the @@ -187,7 +211,9 @@ apply a patch. If you do not know where you want to start, but you want to look for some task to start doing to join into the kernel development community, go to the Linux Kernel Janitor's project: - http://kernelnewbies.org/KernelJanitors + + https://kernelnewbies.org/KernelJanitors + It is a great place to start. It describes a list of relatively simple problems that need to be cleaned up and fixed within the Linux kernel source tree. Working with the developers in charge of this project, you @@ -199,7 +225,8 @@ If you already have a chunk of code that you want to put into the kernel tree, but need some help getting it in the proper form, the kernel-mentors project was created to help you out with this. It is a mailing list, and can be found at: - http://selenic.com/mailman/listinfo/kernel-mentors + + https://selenic.com/mailman/listinfo/kernel-mentors Before making any actual modifications to the Linux kernel code, it is imperative to understand how the code in question works. For this @@ -209,6 +236,7 @@ tools. One such tool that is particularly recommended is the Linux Cross-Reference project, which is able to present source code in a self-referential, indexed webpage format. An excellent up-to-date repository of the kernel code may be found at: + http://lxr.free-electrons.com/ @@ -218,6 +246,7 @@ The development process Linux kernel development process currently consists of a few different main kernel "branches" and lots of different subsystem-specific kernel branches. These different branches are: + - main 4.x kernel tree - 4.x.y -stable kernel tree - 4.x -git kernel patches @@ -227,14 +256,15 @@ branches. These different branches are: 4.x kernel tree ----------------- 4.x kernels are maintained by Linus Torvalds, and can be found on -kernel.org in the pub/linux/kernel/v4.x/ directory. Its development +https://kernel.org in the pub/linux/kernel/v4.x/ directory. Its development process is as follows: + - As soon as a new kernel is released a two weeks window is open, during this period of time maintainers can submit big diffs to Linus, usually the patches that have already been included in the -next kernel for a few weeks. The preferred way to submit big changes is using git (the kernel's source management tool, more information - can be found at http://git-scm.com/) but plain patches are also just + can be found at https://git-scm.com/) but plain patches are also just fine. - After two weeks a -rc1 kernel is released it is now possible to push only patches that do not include new features that could affect the @@ -253,9 +283,10 @@ process is as follows: It is worth mentioning what Andrew Morton wrote on the linux-kernel mailing list about kernel releases: - "Nobody knows when a kernel will be released, because it's + + *"Nobody knows when a kernel will be released, because it's released according to perceived bug status, not according to a - preconceived timeline." + preconceived timeline."* 4.x.y -stable kernel tree ------------------------- @@ -301,7 +332,7 @@ submission and other already ongoing work are avoided. Most of these repositories are git trees, but there are also other SCMs in use, or patch queues being published as quilt series. Addresses of these subsystem repositories are listed in the MAINTAINERS file. Many -of them can be browsed at http://git.kernel.org/. +of them can be browsed at https://git.kernel.org/. Before a proposed patch is committed to such a subsystem tree, it is subject to review which primarily happens on mailing lists (see the @@ -310,7 +341,7 @@ process is tracked with the tool patchwork. Patchwork offers a web interface which shows patch postings, any comments on a patch or revisions to it, and maintainers can mark patches as under review, accepted, or rejected. Most of these patchwork sites are listed at -http://patchwork.kernel.org/. +https://patchwork.kernel.org/. 4.x -next kernel tree for integration tests ------------------------------------------- @@ -318,7 +349,8 @@ Before updates from subsystem trees are merged into the mainline 4.x tree, they need to be integration-tested. For this purpose, a special testing repository exists into which virtually all subsystem trees are pulled on an almost daily basis: - http://git.kernel.org/?p=linux/kernel/git/next/linux-next.git + + https://git.kernel.org/?p=linux/kernel/git/next/linux-next.git This way, the -next kernel gives a summary outlook onto what will be expected to go into the mainline kernel at the next merge period. @@ -328,10 +360,11 @@ Adventurous testers are very welcome to runtime-test the -next kernel. Bug Reporting ------------- -bugzilla.kernel.org is where the Linux kernel developers track kernel +https://bugzilla.kernel.org is where the Linux kernel developers track kernel bugs. Users are encouraged to report all bugs that they find in this tool. For details on how to use the kernel bugzilla, please see: - http://bugzilla.kernel.org/page.cgi?id=faq.html + + https://bugzilla.kernel.org/page.cgi?id=faq.html The file REPORTING-BUGS in the main kernel source directory has a good template for how to report a possible kernel bug, and details what kind @@ -349,13 +382,14 @@ your skills, and other developers will be aware of your presence. Fixing bugs is one of the best ways to get merits among other developers, because not many people like wasting time fixing other people's bugs. -To work in the already reported bug reports, go to http://bugzilla.kernel.org. +To work in the already reported bug reports, go to https://bugzilla.kernel.org. If you want to be advised of the future bug reports, you can subscribe to the bugme-new mailing list (only new bug reports are mailed here) or to the bugme-janitor mailing list (every change in the bugzilla is mailed here) - http://lists.linux-foundation.org/mailman/listinfo/bugme-new - http://lists.linux-foundation.org/mailman/listinfo/bugme-janitors + https://lists.linux-foundation.org/mailman/listinfo/bugme-new + + https://lists.linux-foundation.org/mailman/listinfo/bugme-janitors @@ -365,10 +399,14 @@ Mailing lists As some of the above documents describe, the majority of the core kernel developers participate on the Linux Kernel Mailing list. Details on how to subscribe and unsubscribe from the list can be found at: + http://vger.kernel.org/vger-lists.html#linux-kernel + There are archives of the mailing list on the web in many different places. Use a search engine to find these archives. For example: + http://dir.gmane.org/gmane.linux.kernel + It is highly recommended that you search the archives about the topic you want to bring up, before you post it to the list. A lot of things already discussed in detail are only recorded at the mailing list @@ -381,11 +419,13 @@ groups. Many of the lists are hosted on kernel.org. Information on them can be found at: + http://vger.kernel.org/vger-lists.html Please remember to follow good behavioral habits when using the lists. Though a bit cheesy, the following URL has some simple guidelines for interacting with the list (or any list): + http://www.albion.com/netiquette/ If multiple people respond to your mail, the CC: list of recipients may @@ -400,13 +440,14 @@ add your statements between the individual quoted sections instead of writing at the top of the mail. If you add patches to your mail, make sure they are plain readable text -as stated in Documentation/SubmittingPatches. Kernel developers don't -want to deal with attachments or compressed patches; they may want -to comment on individual lines of your patch, which works only that way. -Make sure you use a mail program that does not mangle spaces and tab -characters. A good first test is to send the mail to yourself and try -to apply your own patch by yourself. If that doesn't work, get your -mail program fixed or change it until it works. +as stated in Documentation/SubmittingPatches. +Kernel developers don't want to deal with +attachments or compressed patches; they may want to comment on +individual lines of your patch, which works only that way. Make sure you +use a mail program that does not mangle spaces and tab characters. A +good first test is to send the mail to yourself and try to apply your +own patch by yourself. If that doesn't work, get your mail program fixed +or change it until it works. Above all, please remember to show respect to other subscribers. @@ -418,6 +459,7 @@ The goal of the kernel community is to provide the best possible kernel there is. When you submit a patch for acceptance, it will be reviewed on its technical merits and those alone. So, what should you be expecting? + - criticism - comments - requests for change @@ -432,6 +474,7 @@ If there are no responses to your posting, wait a few days and try again, sometimes things get lost in the huge volume. What should you not do? + - expect your patch to be accepted without question - become defensive - ignore comments @@ -445,8 +488,8 @@ Remember, being wrong is acceptable as long as you are willing to work toward a solution that is right. It is normal that the answers to your first patch might simply be a list -of a dozen things you should correct. This does _not_ imply that your -patch will not be accepted, and it is _not_ meant against you +of a dozen things you should correct. This does **not** imply that your +patch will not be accepted, and it is **not** meant against you personally. Simply correct all issues raised against your patch and resend it. @@ -457,7 +500,9 @@ Differences between the kernel community and corporate structures The kernel community works differently than most traditional corporate development environments. Here are a list of things that you can try to do to avoid problems: + Good things to say regarding your proposed changes: + - "This solves multiple problems." - "This deletes 2000 lines of code." - "Here is a patch that explains what I am trying to describe." @@ -466,6 +511,7 @@ do to avoid problems: - "This increases performance on typical machines..." Bad things you should avoid saying: + - "We did it this way in AIX/ptx/Solaris, so therefore it must be good..." - "I've being doing this for 20 years, so..." @@ -527,17 +573,18 @@ The reasons for breaking things up are the following: and simplify (or simply re-order) patches before submitting them. Here is an analogy from kernel developer Al Viro: - "Think of a teacher grading homework from a math student. The + + *"Think of a teacher grading homework from a math student. The teacher does not want to see the student's trials and errors before they came up with the solution. They want to see the cleanest, most elegant answer. A good student knows this, and would never submit her intermediate work before the final - solution." + solution.* - The same is true of kernel development. The maintainers and + *The same is true of kernel development. The maintainers and reviewers do not want to see the thought process behind the solution to the problem one is solving. They want to see a - simple and elegant solution." + simple and elegant solution."* It may be challenging to keep the balance between presenting an elegant solution and working together with the community and discussing your @@ -565,6 +612,7 @@ When sending in your patches, pay special attention to what you say in the text in your email. This information will become the ChangeLog information for the patch, and will be preserved for everyone to see for all time. It should describe the patch completely, containing: + - why the change is necessary - the overall design approach in the patch - implementation details @@ -572,12 +620,11 @@ all time. It should describe the patch completely, containing: For more details on what this should all look like, please see the ChangeLog section of the document: + "The Perfect Patch" http://www.ozlabs.org/~akpm/stuff/tpp.txt - - All of these things are sometimes very hard to do. It can take years to perfect these practices (if at all). It's a continuous process of improvement that requires a lot of patience and determination. But @@ -588,8 +635,9 @@ start exactly where you are now. ---------- + Thanks to Paolo Ciarrocchi who allowed the "Development Process" -(http://lwn.net/Articles/94386/) section +(https://lwn.net/Articles/94386/) section to be based on text he had written, and to Randy Dunlap and Gerrit Huizenga for some of the list of things you should and should not say. Also thanks to Pat Mochel, Hanna Linder, Randy Dunlap, Kay Sievers, diff --git a/Documentation/Makefile.sphinx b/Documentation/Makefile.sphinx index 857f1e273418..92deea30b183 100644 --- a/Documentation/Makefile.sphinx +++ b/Documentation/Makefile.sphinx @@ -5,6 +5,9 @@ # You can set these variables from the command line. SPHINXBUILD = sphinx-build SPHINXOPTS = +SPHINXDIRS = . +_SPHINXDIRS = $(patsubst $(srctree)/Documentation/%/conf.py,%,$(wildcard $(srctree)/Documentation/*/conf.py)) +SPHINX_CONF = conf.py PAPER = BUILDDIR = $(obj)/output @@ -25,38 +28,62 @@ else ifneq ($(DOCBOOKS),) else # HAVE_SPHINX -# User-friendly check for rst2pdf -HAVE_RST2PDF := $(shell if python -c "import rst2pdf" >/dev/null 2>&1; then echo 1; else echo 0; fi) +# User-friendly check for pdflatex +HAVE_PDFLATEX := $(shell if which xelatex >/dev/null 2>&1; then echo 1; else echo 0; fi) # Internal variables. PAPEROPT_a4 = -D latex_paper_size=a4 PAPEROPT_letter = -D latex_paper_size=letter KERNELDOC = $(srctree)/scripts/kernel-doc KERNELDOC_CONF = -D kerneldoc_srctree=$(srctree) -D kerneldoc_bin=$(KERNELDOC) -ALLSPHINXOPTS = -D version=$(KERNELVERSION) -D release=$(KERNELRELEASE) -d $(BUILDDIR)/.doctrees $(KERNELDOC_CONF) $(PAPEROPT_$(PAPER)) -c $(srctree)/$(src) $(SPHINXOPTS) $(srctree)/$(src) +ALLSPHINXOPTS = $(KERNELDOC_CONF) $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) # the i18n builder cannot share the environment and doctrees with the others I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) . -quiet_cmd_sphinx = SPHINX $@ - cmd_sphinx = BUILDDIR=$(BUILDDIR) $(SPHINXBUILD) -b $2 $(ALLSPHINXOPTS) $(BUILDDIR)/$2 +# commands; the 'cmd' from scripts/Kbuild.include is not *loopable* +loop_cmd = $(echo-cmd) $(cmd_$(1)) + +# $2 sphinx builder e.g. "html" +# $3 name of the build subfolder / e.g. "media", used as: +# * dest folder relative to $(BUILDDIR) and +# * cache folder relative to $(BUILDDIR)/.doctrees +# $4 dest subfolder e.g. "man" for man pages at media/man +# $5 reST source folder relative to $(srctree)/$(src), +# e.g. "media" for the linux-tv book-set at ./Documentation/media + +quiet_cmd_sphinx = SPHINX $@ --> file://$(abspath $(BUILDDIR)/$3/$4); + cmd_sphinx = $(MAKE) BUILDDIR=$(abspath $(BUILDDIR)) $(build)=Documentation/media all;\ + BUILDDIR=$(abspath $(BUILDDIR)) SPHINX_CONF=$(abspath $(srctree)/$(src)/$5/$(SPHINX_CONF)) \ + $(SPHINXBUILD) \ + -b $2 \ + -c $(abspath $(srctree)/$(src)) \ + -d $(abspath $(BUILDDIR)/.doctrees/$3) \ + -D version=$(KERNELVERSION) -D release=$(KERNELRELEASE) \ + $(ALLSPHINXOPTS) \ + $(abspath $(srctree)/$(src)/$5) \ + $(abspath $(BUILDDIR)/$3/$4); htmldocs: - $(MAKE) BUILDDIR=$(BUILDDIR) -f $(srctree)/Documentation/media/Makefile $@ - $(call cmd,sphinx,html) + @$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,html,$(var),,$(var))) -pdfdocs: -ifeq ($(HAVE_RST2PDF),0) - $(warning The Python 'rst2pdf' module was not found. Make sure you have the module installed to produce PDF output.) +latexdocs: +ifeq ($(HAVE_PDFLATEX),0) + $(warning The 'xelatex' command was not found. Make sure you have it installed and in PATH to produce PDF output.) @echo " SKIP Sphinx $@ target." -else # HAVE_RST2PDF - $(call cmd,sphinx,pdf) -endif # HAVE_RST2PDF +else # HAVE_PDFLATEX + @$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,latex,$(var),latex,$(var))) +endif # HAVE_PDFLATEX + +pdfdocs: latexdocs +ifneq ($(HAVE_PDFLATEX),0) + $(foreach var,$(SPHINXDIRS), $(MAKE) PDFLATEX=xelatex LATEXOPTS="-interaction=nonstopmode" -C $(BUILDDIR)/$(var)/latex) +endif # HAVE_PDFLATEX epubdocs: - $(call cmd,sphinx,epub) + @$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,epub,$(var),epub,$(var))) xmldocs: - $(call cmd,sphinx,xml) + @$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,xml,$(var),xml,$(var))) # no-ops for the Sphinx toolchain sgmldocs: @@ -72,7 +99,14 @@ endif # HAVE_SPHINX dochelp: @echo ' Linux kernel internal documentation in different formats (Sphinx):' @echo ' htmldocs - HTML' + @echo ' latexdocs - LaTeX' @echo ' pdfdocs - PDF' @echo ' epubdocs - EPUB' @echo ' xmldocs - XML' @echo ' cleandocs - clean all generated files' + @echo + @echo ' make SPHINXDIRS="s1 s2" [target] Generate only docs of folder s1, s2' + @echo ' valid values for SPHINXDIRS are: $(_SPHINXDIRS)' + @echo + @echo ' make SPHINX_CONF={conf-file} [target] use *additional* sphinx-build' + @echo ' configuration. This is e.g. useful to build with nit-picking config.' diff --git a/Documentation/ManagementStyle b/Documentation/ManagementStyle index a211ee8d8b44..dea2e66c9a10 100644 --- a/Documentation/ManagementStyle +++ b/Documentation/ManagementStyle @@ -1,10 +1,12 @@ +.. _managementstyle: - Linux kernel management style +Linux kernel management style +============================= This is a short document describing the preferred (or made up, depending on who you ask) management style for the linux kernel. It's meant to mirror the CodingStyle document to some degree, and mainly written to -avoid answering (*) the same (or similar) questions over and over again. +avoid answering [#f1]_ the same (or similar) questions over and over again. Management style is very personal and much harder to quantify than simple coding style rules, so this document may or may not have anything @@ -14,50 +16,52 @@ might not actually be true. You'll have to decide for yourself. Btw, when talking about "kernel manager", it's all about the technical lead persons, not the people who do traditional management inside companies. If you sign purchase orders or you have any clue about the -budget of your group, you're almost certainly not a kernel manager. -These suggestions may or may not apply to you. +budget of your group, you're almost certainly not a kernel manager. +These suggestions may or may not apply to you. First off, I'd suggest buying "Seven Habits of Highly Effective -People", and NOT read it. Burn it, it's a great symbolic gesture. +People", and NOT read it. Burn it, it's a great symbolic gesture. -(*) This document does so not so much by answering the question, but by -making it painfully obvious to the questioner that we don't have a clue -to what the answer is. +.. [#f1] This document does so not so much by answering the question, but by + making it painfully obvious to the questioner that we don't have a clue + to what the answer is. Anyway, here goes: +.. _decisions: - Chapter 1: Decisions +1) Decisions +------------ Everybody thinks managers make decisions, and that decision-making is important. The bigger and more painful the decision, the bigger the manager must be to make it. That's very deep and obvious, but it's not -actually true. +actually true. -The name of the game is to _avoid_ having to make a decision. In +The name of the game is to **avoid** having to make a decision. In particular, if somebody tells you "choose (a) or (b), we really need you to decide on this", you're in trouble as a manager. The people you manage had better know the details better than you, so if they come to you for a technical decision, you're screwed. You're clearly not -competent to make that decision for them. +competent to make that decision for them. (Corollary:if the people you manage don't know the details better than -you, you're also screwed, although for a totally different reason. -Namely that you are in the wrong job, and that _they_ should be managing -your brilliance instead). +you, you're also screwed, although for a totally different reason. +Namely that you are in the wrong job, and that **they** should be managing +your brilliance instead). -So the name of the game is to _avoid_ decisions, at least the big and +So the name of the game is to **avoid** decisions, at least the big and painful ones. Making small and non-consequential decisions is fine, and makes you look like you know what you're doing, so what a kernel manager needs to do is to turn the big and painful ones into small things where -nobody really cares. +nobody really cares. It helps to realize that the key difference between a big decision and a small one is whether you can fix your decision afterwards. Any decision can be made small by just always making sure that if you were wrong (and -you _will_ be wrong), you can always undo the damage later by +you **will** be wrong), you can always undo the damage later by backtracking. Suddenly, you get to be doubly managerial for making -_two_ inconsequential decisions - the wrong one _and_ the right one. +**two** inconsequential decisions - the wrong one **and** the right one. And people will even see that as true leadership (*cough* bullshit *cough*). @@ -65,10 +69,10 @@ And people will even see that as true leadership (*cough* bullshit Thus the key to avoiding big decisions becomes to just avoiding to do things that can't be undone. Don't get ushered into a corner from which you cannot escape. A cornered rat may be dangerous - a cornered manager -is just pitiful. +is just pitiful. It turns out that since nobody would be stupid enough to ever really let -a kernel manager have huge fiscal responsibility _anyway_, it's usually +a kernel manager have huge fiscal responsibility **anyway**, it's usually fairly easy to backtrack. Since you're not going to be able to waste huge amounts of money that you might not be able to repay, the only thing you can backtrack on is a technical decision, and there @@ -76,113 +80,118 @@ back-tracking is very easy: just tell everybody that you were an incompetent nincompoop, say you're sorry, and undo all the worthless work you had people work on for the last year. Suddenly the decision you made a year ago wasn't a big decision after all, since it could be -easily undone. +easily undone. It turns out that some people have trouble with this approach, for two reasons: + - admitting you were an idiot is harder than it looks. We all like to maintain appearances, and coming out in public to say that you were - wrong is sometimes very hard indeed. + wrong is sometimes very hard indeed. - having somebody tell you that what you worked on for the last year wasn't worthwhile after all can be hard on the poor lowly engineers - too, and while the actual _work_ was easy enough to undo by just + too, and while the actual **work** was easy enough to undo by just deleting it, you may have irrevocably lost the trust of that engineer. And remember: "irrevocable" was what we tried to avoid in the first place, and your decision ended up being a big one after - all. + all. Happily, both of these reasons can be mitigated effectively by just admitting up-front that you don't have a friggin' clue, and telling people ahead of the fact that your decision is purely preliminary, and might be the wrong thing. You should always reserve the right to change -your mind, and make people very _aware_ of that. And it's much easier -to admit that you are stupid when you haven't _yet_ done the really +your mind, and make people very **aware** of that. And it's much easier +to admit that you are stupid when you haven't **yet** done the really stupid thing. Then, when it really does turn out to be stupid, people just roll their -eyes and say "Oops, he did it again". +eyes and say "Oops, he did it again". This preemptive admission of incompetence might also make the people who actually do the work also think twice about whether it's worth doing or -not. After all, if _they_ aren't certain whether it's a good idea, you +not. After all, if **they** aren't certain whether it's a good idea, you sure as hell shouldn't encourage them by promising them that what they work on will be included. Make them at least think twice before they -embark on a big endeavor. +embark on a big endeavor. Remember: they'd better know more about the details than you do, and they usually already think they have the answer to everything. The best thing you can do as a manager is not to instill confidence, but rather a -healthy dose of critical thinking on what they do. +healthy dose of critical thinking on what they do. Btw, another way to avoid a decision is to plaintively just whine "can't we just do both?" and look pitiful. Trust me, it works. If it's not clear which approach is better, they'll eventually figure it out. The answer may end up being that both teams get so frustrated by the -situation that they just give up. +situation that they just give up. That may sound like a failure, but it's usually a sign that there was something wrong with both projects, and the reason the people involved couldn't decide was that they were both wrong. You end up coming up smelling like roses, and you avoided yet another decision that you could -have screwed up on. +have screwed up on. - Chapter 2: People +2) People +--------- Most people are idiots, and being a manager means you'll have to deal -with it, and perhaps more importantly, that _they_ have to deal with -_you_. +with it, and perhaps more importantly, that **they** have to deal with +**you**. It turns out that while it's easy to undo technical mistakes, it's not as easy to undo personality disorders. You just have to live with -theirs - and yours. +theirs - and yours. However, in order to prepare yourself as a kernel manager, it's best to remember not to burn any bridges, bomb any innocent villagers, or alienate too many kernel developers. It turns out that alienating people is fairly easy, and un-alienating them is hard. Thus "alienating" immediately falls under the heading of "not reversible", and becomes a -no-no according to Chapter 1. +no-no according to :ref:`decisions`. There's just a few simple rules here: + (1) don't call people d*ckheads (at least not in public) (2) learn how to apologize when you forgot rule (1) The problem with #1 is that it's very easy to do, since you can say -"you're a d*ckhead" in millions of different ways (*), sometimes without +"you're a d*ckhead" in millions of different ways [#f2]_, sometimes without even realizing it, and almost always with a white-hot conviction that -you are right. +you are right. And the more convinced you are that you are right (and let's face it, -you can call just about _anybody_ a d*ckhead, and you often _will_ be -right), the harder it ends up being to apologize afterwards. +you can call just about **anybody** a d*ckhead, and you often **will** be +right), the harder it ends up being to apologize afterwards. To solve this problem, you really only have two options: + - get really good at apologies - spread the "love" out so evenly that nobody really ends up feeling like they get unfairly targeted. Make it inventive enough, and they - might even be amused. + might even be amused. The option of being unfailingly polite really doesn't exist. Nobody will trust somebody who is so clearly hiding his true character. -(*) Paul Simon sang "Fifty Ways to Leave Your Lover", because quite -frankly, "A Million Ways to Tell a Developer He Is a D*ckhead" doesn't -scan nearly as well. But I'm sure he thought about it. +.. [#f2] Paul Simon sang "Fifty Ways to Leave Your Lover", because quite + frankly, "A Million Ways to Tell a Developer He Is a D*ckhead" doesn't + scan nearly as well. But I'm sure he thought about it. - Chapter 3: People II - the Good Kind +3) People II - the Good Kind +---------------------------- While it turns out that most people are idiots, the corollary to that is sadly that you are one too, and that while we can all bask in the secure knowledge that we're better than the average person (let's face it, nobody ever believes that they're average or below-average), we should also admit that we're not the sharpest knife around, and there will be -other people that are less of an idiot than you are. +other people that are less of an idiot than you are. -Some people react badly to smart people. Others take advantage of them. +Some people react badly to smart people. Others take advantage of them. -Make sure that you, as a kernel maintainer, are in the second group. +Make sure that you, as a kernel maintainer, are in the second group. Suck up to them, because they are the people who will make your job easier. In particular, they'll be able to make your decisions for you, which is what the game is all about. @@ -191,7 +200,7 @@ So when you find somebody smarter than you are, just coast along. Your management responsibilities largely become ones of saying "Sounds like a good idea - go wild", or "That sounds good, but what about xxx?". The second version in particular is a great way to either learn something -new about "xxx" or seem _extra_ managerial by pointing out something the +new about "xxx" or seem **extra** managerial by pointing out something the smarter person hadn't thought about. In either case, you win. One thing to look out for is to realize that greatness in one area does @@ -199,47 +208,49 @@ not necessarily translate to other areas. So you might prod people in specific directions, but let's face it, they might be good at what they do, and suck at everything else. The good news is that people tend to naturally gravitate back to what they are good at, so it's not like you -are doing something irreversible when you _do_ prod them in some +are doing something irreversible when you **do** prod them in some direction, just don't push too hard. - Chapter 4: Placing blame +4) Placing blame +---------------- Things will go wrong, and people want somebody to blame. Tag, you're it. It's not actually that hard to accept the blame, especially if people -kind of realize that it wasn't _all_ your fault. Which brings us to the +kind of realize that it wasn't **all** your fault. Which brings us to the best way of taking the blame: do it for another guy. You'll feel good for taking the fall, he'll feel good about not getting blamed, and the guy who lost his whole 36GB porn-collection because of your incompetence will grudgingly admit that you at least didn't try to weasel out of it. Then make the developer who really screwed up (if you can find him) know -_in_private_ that he screwed up. Not just so he can avoid it in the +**in_private** that he screwed up. Not just so he can avoid it in the future, but so that he knows he owes you one. And, perhaps even more importantly, he's also likely the person who can fix it. Because, let's -face it, it sure ain't you. +face it, it sure ain't you. -Taking the blame is also why you get to be manager in the first place. +Taking the blame is also why you get to be manager in the first place. It's part of what makes people trust you, and allow you the potential glory, because you're the one who gets to say "I screwed up". And if you've followed the previous rules, you'll be pretty good at saying that -by now. +by now. - Chapter 5: Things to avoid +5) Things to avoid +------------------ There's one thing people hate even more than being called "d*ckhead", and that is being called a "d*ckhead" in a sanctimonious voice. The first you can apologize for, the second one you won't really get the chance. They likely will no longer be listening even if you otherwise -do a good job. +do a good job. We all think we're better than anybody else, which means that when -somebody else puts on airs, it _really_ rubs us the wrong way. You may +somebody else puts on airs, it **really** rubs us the wrong way. You may be morally and intellectually superior to everybody around you, but -don't try to make it too obvious unless you really _intend_ to irritate -somebody (*). +don't try to make it too obvious unless you really **intend** to irritate +somebody [#f3]_. Similarly, don't be too polite or subtle about things. Politeness easily ends up going overboard and hiding the problem, and as they say, "On the @@ -251,15 +262,16 @@ Some humor can help pad both the bluntness and the moralizing. Going overboard to the point of being ridiculous can drive a point home without making it painful to the recipient, who just thinks you're being silly. It can thus help get through the personal mental block we all -have about criticism. +have about criticism. -(*) Hint: internet newsgroups that are not directly related to your work -are great ways to take out your frustrations at other people. Write -insulting posts with a sneer just to get into a good flame every once in -a while, and you'll feel cleansed. Just don't crap too close to home. +.. [#f3] Hint: internet newsgroups that are not directly related to your work + are great ways to take out your frustrations at other people. Write + insulting posts with a sneer just to get into a good flame every once in + a while, and you'll feel cleansed. Just don't crap too close to home. - Chapter 6: Why me? +6) Why me? +---------- Since your main responsibility seems to be to take the blame for other peoples mistakes, and make it painfully obvious to everybody else that @@ -268,9 +280,9 @@ first place? First off, while you may or may not get screaming teenage girls (or boys, let's not be judgmental or sexist here) knocking on your dressing -room door, you _will_ get an immense feeling of personal accomplishment +room door, you **will** get an immense feeling of personal accomplishment for being "in charge". Never mind the fact that you're really leading by trying to keep up with everybody else and running after them as fast -as you can. Everybody will still think you're the person in charge. +as you can. Everybody will still think you're the person in charge. It's a great job if you can hack it. diff --git a/Documentation/RCU/Design/Requirements/Requirements.html b/Documentation/RCU/Design/Requirements/Requirements.html index ece410f40436..a4d3838130e4 100644 --- a/Documentation/RCU/Design/Requirements/Requirements.html +++ b/Documentation/RCU/Design/Requirements/Requirements.html @@ -2493,6 +2493,28 @@ or some future “lazy” variant of <tt>call_rcu()</tt> that might one day be created for energy-efficiency purposes. +<p> +That said, there are limits. +RCU requires that the <tt>rcu_head</tt> structure be aligned to a +two-byte boundary, and passing a misaligned <tt>rcu_head</tt> +structure to one of the <tt>call_rcu()</tt> family of functions +will result in a splat. +It is therefore necessary to exercise caution when packing +structures containing fields of type <tt>rcu_head</tt>. +Why not a four-byte or even eight-byte alignment requirement? +Because the m68k architecture provides only two-byte alignment, +and thus acts as alignment's least common denominator. + +<p> +The reason for reserving the bottom bit of pointers to +<tt>rcu_head</tt> structures is to leave the door open to +“lazy” callbacks whose invocations can safely be deferred. +Deferring invocation could potentially have energy-efficiency +benefits, but only if the rate of non-lazy callbacks decreases +significantly for some important workload. +In the meantime, reserving the bottom bit keeps this option open +in case it one day becomes useful. + <h3><a name="Performance, Scalability, Response Time, and Reliability"> Performance, Scalability, Response Time, and Reliability</a></h3> diff --git a/Documentation/RCU/torture.txt b/Documentation/RCU/torture.txt index 118e7c176ce7..278f6a9383b6 100644 --- a/Documentation/RCU/torture.txt +++ b/Documentation/RCU/torture.txt @@ -10,21 +10,6 @@ status messages via printk(), which can be examined via the dmesg command (perhaps grepping for "torture"). The test is started when the module is loaded, and stops when the module is unloaded. -CONFIG_RCU_TORTURE_TEST_RUNNABLE - -It is also possible to specify CONFIG_RCU_TORTURE_TEST=y, which will -result in the tests being loaded into the base kernel. In this case, -the CONFIG_RCU_TORTURE_TEST_RUNNABLE config option is used to specify -whether the RCU torture tests are to be started immediately during -boot or whether the /proc/sys/kernel/rcutorture_runnable file is used -to enable them. This /proc file can be used to repeatedly pause and -restart the tests, regardless of the initial state specified by the -CONFIG_RCU_TORTURE_TEST_RUNNABLE config option. - -You will normally -not- want to start the RCU torture tests during boot -(and thus the default is CONFIG_RCU_TORTURE_TEST_RUNNABLE=n), but doing -this can sometimes be useful in finding boot-time bugs. - MODULE PARAMETERS diff --git a/Documentation/SecurityBugs b/Documentation/SecurityBugs index a660d494c8ed..342d769834f6 100644 --- a/Documentation/SecurityBugs +++ b/Documentation/SecurityBugs @@ -1,9 +1,15 @@ +.. _securitybugs: + +Security bugs +============= + Linux kernel developers take security very seriously. As such, we'd like to know when a security bug is found so that it can be fixed and disclosed as quickly as possible. Please report security bugs to the Linux kernel security team. 1) Contact +---------- The Linux kernel security team can be contacted by email at <security@kernel.org>. This is a private list of security officers @@ -18,6 +24,7 @@ Any exploit code is very helpful and will not be released without consent from the reporter unless it has already been made public. 2) Disclosure +------------- The goal of the Linux kernel security team is to work with the bug submitter to bug resolution as well as disclosure. We prefer @@ -33,6 +40,7 @@ to a few weeks. As a basic default policy, we expect report date to disclosure date to be on the order of 7 days. 3) Non-disclosure agreements +---------------------------- The Linux kernel security team is not a formal body and therefore unable to enter any non-disclosure agreements. diff --git a/Documentation/SubmitChecklist b/Documentation/SubmitChecklist index 2b7e32dfe00d..894289b22b15 100644 --- a/Documentation/SubmitChecklist +++ b/Documentation/SubmitChecklist @@ -1,109 +1,120 @@ +.. _submitchecklist: + Linux Kernel patch submission checklist -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Here are some basic things that developers should do if they want to see their kernel patch submissions accepted more quickly. These are all above and beyond the documentation that is provided in -Documentation/SubmittingPatches and elsewhere regarding submitting Linux -kernel patches. +:ref:`Documentation/SubmittingPatches <submittingpatches>` +and elsewhere regarding submitting Linux kernel patches. -1: If you use a facility then #include the file that defines/declares +1) If you use a facility then #include the file that defines/declares that facility. Don't depend on other header files pulling in ones that you use. -2: Builds cleanly with applicable or modified CONFIG options =y, =m, and - =n. No gcc warnings/errors, no linker warnings/errors. +2) Builds cleanly: + + a) with applicable or modified ``CONFIG`` options ``=y``, ``=m``, and + ``=n``. No ``gcc`` warnings/errors, no linker warnings/errors. -2b: Passes allnoconfig, allmodconfig + b) Passes ``allnoconfig``, ``allmodconfig`` -2c: Builds successfully when using O=builddir + c) Builds successfully when using ``O=builddir`` -3: Builds on multiple CPU architectures by using local cross-compile tools +3) Builds on multiple CPU architectures by using local cross-compile tools or some other build farm. -4: ppc64 is a good architecture for cross-compilation checking because it - tends to use `unsigned long' for 64-bit quantities. +4) ppc64 is a good architecture for cross-compilation checking because it + tends to use ``unsigned long`` for 64-bit quantities. -5: Check your patch for general style as detailed in - Documentation/CodingStyle. Check for trivial violations with the - patch style checker prior to submission (scripts/checkpatch.pl). +5) Check your patch for general style as detailed in + :ref:`Documentation/CodingStyle <codingstyle>`. + Check for trivial violations with the patch style checker prior to + submission (``scripts/checkpatch.pl``). You should be able to justify all violations that remain in your patch. -6: Any new or modified CONFIG options don't muck up the config menu. +6) Any new or modified ``CONFIG`` options don't muck up the config menu. -7: All new Kconfig options have help text. +7) All new ``Kconfig`` options have help text. -8: Has been carefully reviewed with respect to relevant Kconfig +8) Has been carefully reviewed with respect to relevant ``Kconfig`` combinations. This is very hard to get right with testing -- brainpower pays off here. -9: Check cleanly with sparse. +9) Check cleanly with sparse. + +10) Use ``make checkstack`` and ``make namespacecheck`` and fix any problems + that they find. + + .. note:: -10: Use 'make checkstack' and 'make namespacecheck' and fix any problems - that they find. Note: checkstack does not point out problems explicitly, - but any one function that uses more than 512 bytes on the stack is a - candidate for change. + ``checkstack`` does not point out problems explicitly, + but any one function that uses more than 512 bytes on the stack is a + candidate for change. -11: Include kernel-doc to document global kernel APIs. (Not required for - static functions, but OK there also.) Use 'make htmldocs' or 'make - mandocs' to check the kernel-doc and fix any issues. +11) Include :ref:`kernel-doc <kernel_doc>` to document global kernel APIs. + (Not required for static functions, but OK there also.) Use + ``make htmldocs`` or ``make pdfdocs`` to check the + :ref:`kernel-doc <kernel_doc>` and fix any issues. -12: Has been tested with CONFIG_PREEMPT, CONFIG_DEBUG_PREEMPT, - CONFIG_DEBUG_SLAB, CONFIG_DEBUG_PAGEALLOC, CONFIG_DEBUG_MUTEXES, - CONFIG_DEBUG_SPINLOCK, CONFIG_DEBUG_ATOMIC_SLEEP, CONFIG_PROVE_RCU - and CONFIG_DEBUG_OBJECTS_RCU_HEAD all simultaneously enabled. +12) Has been tested with ``CONFIG_PREEMPT``, ``CONFIG_DEBUG_PREEMPT``, + ``CONFIG_DEBUG_SLAB``, ``CONFIG_DEBUG_PAGEALLOC``, ``CONFIG_DEBUG_MUTEXES``, + ``CONFIG_DEBUG_SPINLOCK``, ``CONFIG_DEBUG_ATOMIC_SLEEP``, + ``CONFIG_PROVE_RCU`` and ``CONFIG_DEBUG_OBJECTS_RCU_HEAD`` all + simultaneously enabled. -13: Has been build- and runtime tested with and without CONFIG_SMP and - CONFIG_PREEMPT. +13) Has been build- and runtime tested with and without ``CONFIG_SMP`` and + ``CONFIG_PREEMPT.`` -14: If the patch affects IO/Disk, etc: has been tested with and without - CONFIG_LBDAF. +14) If the patch affects IO/Disk, etc: has been tested with and without + ``CONFIG_LBDAF.`` -15: All codepaths have been exercised with all lockdep features enabled. +15) All codepaths have been exercised with all lockdep features enabled. -16: All new /proc entries are documented under Documentation/ +16) All new ``/proc`` entries are documented under ``Documentation/`` -17: All new kernel boot parameters are documented in - Documentation/kernel-parameters.txt. +17) All new kernel boot parameters are documented in + ``Documentation/kernel-parameters.txt``. -18: All new module parameters are documented with MODULE_PARM_DESC() +18) All new module parameters are documented with ``MODULE_PARM_DESC()`` -19: All new userspace interfaces are documented in Documentation/ABI/. - See Documentation/ABI/README for more information. +19) All new userspace interfaces are documented in ``Documentation/ABI/``. + See ``Documentation/ABI/README`` for more information. Patches that change userspace interfaces should be CCed to linux-api@vger.kernel.org. -20: Check that it all passes `make headers_check'. +20) Check that it all passes ``make headers_check``. -21: Has been checked with injection of at least slab and page-allocation - failures. See Documentation/fault-injection/. +21) Has been checked with injection of at least slab and page-allocation + failures. See ``Documentation/fault-injection/``. If the new code is substantial, addition of subsystem-specific fault injection might be appropriate. -22: Newly-added code has been compiled with `gcc -W' (use "make - EXTRA_CFLAGS=-W"). This will generate lots of noise, but is good for - finding bugs like "warning: comparison between signed and unsigned". +22) Newly-added code has been compiled with ``gcc -W`` (use + ``make EXTRA_CFLAGS=-W``). This will generate lots of noise, but is good + for finding bugs like "warning: comparison between signed and unsigned". -23: Tested after it has been merged into the -mm patchset to make sure +23) Tested after it has been merged into the -mm patchset to make sure that it still works with all of the other queued patches and various changes in the VM, VFS, and other subsystems. -24: All memory barriers {e.g., barrier(), rmb(), wmb()} need a comment in the - source code that explains the logic of what they are doing and why. +24) All memory barriers {e.g., ``barrier()``, ``rmb()``, ``wmb()``} need a + comment in the source code that explains the logic of what they are doing + and why. -25: If any ioctl's are added by the patch, then also update - Documentation/ioctl/ioctl-number.txt. +25) If any ioctl's are added by the patch, then also update + ``Documentation/ioctl/ioctl-number.txt``. -26: If your modified source code depends on or uses any of the kernel - APIs or features that are related to the following kconfig symbols, - then test multiple builds with the related kconfig symbols disabled - and/or =m (if that option is available) [not all of these at the +26) If your modified source code depends on or uses any of the kernel + APIs or features that are related to the following ``Kconfig`` symbols, + then test multiple builds with the related ``Kconfig`` symbols disabled + and/or ``=m`` (if that option is available) [not all of these at the same time, just various/random combinations of them]: - CONFIG_SMP, CONFIG_SYSFS, CONFIG_PROC_FS, CONFIG_INPUT, CONFIG_PCI, - CONFIG_BLOCK, CONFIG_PM, CONFIG_MAGIC_SYSRQ, - CONFIG_NET, CONFIG_INET=n (but latter with CONFIG_NET=y) + ``CONFIG_SMP``, ``CONFIG_SYSFS``, ``CONFIG_PROC_FS``, ``CONFIG_INPUT``, ``CONFIG_PCI``, ``CONFIG_BLOCK``, ``CONFIG_PM``, ``CONFIG_MAGIC_SYSRQ``, + ``CONFIG_NET``, ``CONFIG_INET=n`` (but latter with ``CONFIG_NET=y``). diff --git a/Documentation/SubmittingDrivers b/Documentation/SubmittingDrivers index 31d372609ac0..252b77a23fad 100644 --- a/Documentation/SubmittingDrivers +++ b/Documentation/SubmittingDrivers @@ -1,5 +1,7 @@ +.. _submittingdrivers: + Submitting Drivers For The Linux Kernel ---------------------------------------- +======================================= This document is intended to explain how to submit device drivers to the various kernel trees. Note that if you are interested in video card drivers @@ -38,42 +40,48 @@ Linux 2.4: maintainer does not respond or you cannot find the appropriate maintainer then please contact Willy Tarreau <w@1wt.eu>. -Linux 2.6: +Linux 2.6 and upper: The same rules apply as 2.4 except that you should follow linux-kernel - to track changes in API's. The final contact point for Linux 2.6 + to track changes in API's. The final contact point for Linux 2.6+ submissions is Andrew Morton. What Criteria Determine Acceptance ---------------------------------- -Licensing: The code must be released to us under the +Licensing: + The code must be released to us under the GNU General Public License. We don't insist on any kind of exclusive GPL licensing, and if you wish the driver to be useful to other communities such as BSD you may well wish to release under multiple licenses. See accepted licenses at include/linux/module.h -Copyright: The copyright owner must agree to use of GPL. +Copyright: + The copyright owner must agree to use of GPL. It's best if the submitter and copyright owner are the same person/entity. If not, the name of the person/entity authorizing use of GPL should be listed in case it's necessary to verify the will of the copyright owner. -Interfaces: If your driver uses existing interfaces and behaves like +Interfaces: + If your driver uses existing interfaces and behaves like other drivers in the same class it will be much more likely to be accepted than if it invents gratuitous new ones. If you need to implement a common API over Linux and NT drivers do it in userspace. -Code: Please use the Linux style of code formatting as documented - in Documentation/CodingStyle. If you have sections of code +Code: + Please use the Linux style of code formatting as documented + in :ref:`Documentation/CodingStyle <codingStyle>`. + If you have sections of code that need to be in other formats, for example because they are shared with a windows driver kit and you want to maintain them just once separate them out nicely and note this fact. -Portability: Pointers are not always 32bits, not all computers are little +Portability: + Pointers are not always 32bits, not all computers are little endian, people do not all have floating point and you shouldn't use inline x86 assembler in your driver without careful thought. Pure x86 drivers generally are not popular. @@ -81,12 +89,14 @@ Portability: Pointers are not always 32bits, not all computers are little but it is easy to make sure the code can easily be made portable. -Clarity: It helps if anyone can see how to fix the driver. It helps +Clarity: + It helps if anyone can see how to fix the driver. It helps you because you get patches not bug reports. If you submit a driver that intentionally obfuscates how the hardware works it will go in the bitbucket. -PM support: Since Linux is used on many portable and desktop systems, your +PM support: + Since Linux is used on many portable and desktop systems, your driver is likely to be used on such a system and therefore it should support basic power management by implementing, if necessary, the .suspend and .resume methods used during the @@ -101,7 +111,8 @@ PM support: Since Linux is used on many portable and desktop systems, your complete overview of the power management issues related to drivers see Documentation/power/devices.txt . -Control: In general if there is active maintenance of a driver by +Control: + In general if there is active maintenance of a driver by the author then patches will be redirected to them unless they are totally obvious and without need of checking. If you want to be the contact and update point for the @@ -111,13 +122,15 @@ Control: In general if there is active maintenance of a driver by What Criteria Do Not Determine Acceptance ----------------------------------------- -Vendor: Being the hardware vendor and maintaining the driver is +Vendor: + Being the hardware vendor and maintaining the driver is often a good thing. If there is a stable working driver from other people already in the tree don't expect 'we are the vendor' to get your driver chosen. Ideally work with the existing driver author to build a single perfect driver. -Author: It doesn't matter if a large Linux company wrote the driver, +Author: + It doesn't matter if a large Linux company wrote the driver, or you did. Nobody has any special access to the kernel tree. Anyone who tells you otherwise isn't telling the whole story. @@ -127,8 +140,10 @@ Resources --------- Linux kernel master tree: - ftp.??.kernel.org:/pub/linux/kernel/... - ?? == your country code, such as "us", "uk", "fr", etc. + ftp.\ *country_code*\ .kernel.org:/pub/linux/kernel/... + + where *country_code* == your country code, such as + **us**, **uk**, **fr**, etc. http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git @@ -141,14 +156,19 @@ Linux Device Drivers, Third Edition (covers 2.6.10): LWN.net: Weekly summary of kernel development activity - http://lwn.net/ + 2.6 API changes: + http://lwn.net/Articles/2.6-kernel-api/ + Porting drivers from prior kernels to 2.6: + http://lwn.net/Articles/driver-porting/ KernelNewbies: Documentation and assistance for new kernel programmers - http://kernelnewbies.org/ + + http://kernelnewbies.org/ Linux USB project: http://www.linux-usb.org/ diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches index 8c79f1d53731..36f1dedc944c 100644 --- a/Documentation/SubmittingPatches +++ b/Documentation/SubmittingPatches @@ -1,9 +1,7 @@ +.. _submittingpatches: - How to Get Your Change Into the Linux Kernel - or - Care And Operation Of Your Linus Torvalds - - +How to Get Your Change Into the Linux Kernel or Care And Operation Of Your Linus Torvalds +========================================================================================= For a person or company who wishes to submit a change to the Linux kernel, the process can sometimes be daunting if you're not familiar @@ -12,57 +10,59 @@ can greatly increase the chances of your change being accepted. This document contains a large number of suggestions in a relatively terse format. For detailed information on how the kernel development process -works, see Documentation/development-process. Also, read -Documentation/SubmitChecklist for a list of items to check before +works, see :ref:`Documentation/development-process <development_process_main>`. +Also, read :ref:`Documentation/SubmitChecklist <submitchecklist>` +for a list of items to check before submitting code. If you are submitting a driver, also read -Documentation/SubmittingDrivers; for device tree binding patches, read +:ref:`Documentation/SubmittingDrivers <submittingdrivers>`; +for device tree binding patches, read Documentation/devicetree/bindings/submitting-patches.txt. -Many of these steps describe the default behavior of the git version -control system; if you use git to prepare your patches, you'll find much +Many of these steps describe the default behavior of the ``git`` version +control system; if you use ``git`` to prepare your patches, you'll find much of the mechanical work done for you, though you'll still need to prepare -and document a sensible set of patches. In general, use of git will make +and document a sensible set of patches. In general, use of ``git`` will make your life as a kernel developer easier. --------------------------------------------- -SECTION 1 - CREATING AND SENDING YOUR CHANGE --------------------------------------------- +Creating and Sending your Change +******************************** 0) Obtain a current source tree ------------------------------- If you do not have a repository with the current kernel source handy, use -git to obtain one. You'll want to start with the mainline repository, -which can be grabbed with: +``git`` to obtain one. You'll want to start with the mainline repository, +which can be grabbed with:: - git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git + git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git Note, however, that you may not want to develop against the mainline tree directly. Most subsystem maintainers run their own trees and want to see -patches prepared against those trees. See the "T:" entry for the subsystem +patches prepared against those trees. See the **T:** entry for the subsystem in the MAINTAINERS file to find that tree, or simply ask the maintainer if the tree is not listed there. It is still possible to download kernel releases via tarballs (as described in the next section), but that is the hard way to do kernel development. -1) "diff -up" ------------- +1) ``diff -up`` +--------------- -If you must generate your patches by hand, use "diff -up" or "diff -uprN" +If you must generate your patches by hand, use ``diff -up`` or ``diff -uprN`` to create patches. Git generates patches in this form by default; if -you're using git, you can skip this section entirely. +you're using ``git``, you can skip this section entirely. All changes to the Linux kernel occur in the form of patches, as -generated by diff(1). When creating your patch, make sure to create it -in "unified diff" format, as supplied by the '-u' argument to diff(1). -Also, please use the '-p' argument which shows which C function each -change is in - that makes the resultant diff a lot easier to read. +generated by :manpage:`diff(1)`. When creating your patch, make sure to +create it in "unified diff" format, as supplied by the ``-u`` argument +to :manpage:`diff(1)`. +Also, please use the ``-p`` argument which shows which C function each +change is in - that makes the resultant ``diff`` a lot easier to read. Patches should be based in the root kernel source directory, not in any lower subdirectory. -To create a patch for a single file, it is often sufficient to do: +To create a patch for a single file, it is often sufficient to do:: SRCTREE= linux MYFILE= drivers/net/mydriver.c @@ -74,8 +74,8 @@ To create a patch for a single file, it is often sufficient to do: diff -up $SRCTREE/$MYFILE{.orig,} > /tmp/patch To create a patch for multiple files, you should unpack a "vanilla", -or unmodified kernel source tree, and generate a diff against your -own source tree. For example: +or unmodified kernel source tree, and generate a ``diff`` against your +own source tree. For example:: MYSRC= /devel/linux @@ -84,27 +84,27 @@ own source tree. For example: diff -uprN -X linux-3.19-vanilla/Documentation/dontdiff \ linux-3.19-vanilla $MYSRC > /tmp/patch -"dontdiff" is a list of files which are generated by the kernel during -the build process, and should be ignored in any diff(1)-generated +``dontdiff`` is a list of files which are generated by the kernel during +the build process, and should be ignored in any :manpage:`diff(1)`-generated patch. Make sure your patch does not include any extra files which do not belong in a patch submission. Make sure to review your patch -after- -generating it with diff(1), to ensure accuracy. +generating it with :manpage:`diff(1)`, to ensure accuracy. If your changes produce a lot of deltas, you need to split them into -individual patches which modify things in logical stages; see section -#3. This will facilitate review by other kernel developers, +individual patches which modify things in logical stages; see +:ref:`split_changes`. This will facilitate review by other kernel developers, very important if you want your patch accepted. -If you're using git, "git rebase -i" can help you with this process. If -you're not using git, quilt <http://savannah.nongnu.org/projects/quilt> +If you're using ``git``, ``git rebase -i`` can help you with this process. If +you're not using ``git``, ``quilt`` <http://savannah.nongnu.org/projects/quilt> is another popular alternative. +.. _describe_changes: - -2) Describe your changes. -------------------------- +2) Describe your changes +------------------------ Describe your problem. Whether your patch is a one-line bug fix or 5000 lines of a new feature, there must be an underlying problem that @@ -137,11 +137,11 @@ as you intend it to. The maintainer will thank you if you write your patch description in a form which can be easily pulled into Linux's source code management -system, git, as a "commit log". See #15, below. +system, ``git``, as a "commit log". See :ref:`explicit_in_reply_to`. Solve only one problem per patch. If your description starts to get long, that's a sign that you probably need to split up your patch. -See #3, next. +See :ref:`split_changes`. When you submit or resubmit a patch or patch series, include the complete patch description and justification for it. Don't just @@ -160,7 +160,7 @@ its behaviour. If the patch fixes a logged bug entry, refer to that bug entry by number and URL. If the patch follows from a mailing list discussion, give a URL to the mailing list archive; use the https://lkml.kernel.org/ -redirector with a Message-Id, to ensure that the links cannot become +redirector with a ``Message-Id``, to ensure that the links cannot become stale. However, try to make your explanation understandable without external @@ -171,7 +171,7 @@ patch as submitted. If you want to refer to a specific commit, don't just refer to the SHA-1 ID of the commit. Please also include the oneline summary of the commit, to make it easier for reviewers to know what it is about. -Example: +Example:: Commit e21d2170f36602ae2708 ("video: remove unnecessary platform_set_drvdata()") removed the unnecessary @@ -185,23 +185,25 @@ there is no collision with your six-character ID now, that condition may change five years from now. If your patch fixes a bug in a specific commit, e.g. you found an issue using -git-bisect, please use the 'Fixes:' tag with the first 12 characters of the -SHA-1 ID, and the one line summary. For example: +``git bisect``, please use the 'Fixes:' tag with the first 12 characters of +the SHA-1 ID, and the one line summary. For example:: Fixes: e21d2170f366 ("video: remove unnecessary platform_set_drvdata()") -The following git-config settings can be used to add a pretty format for -outputting the above style in the git log or git show commands +The following ``git config`` settings can be used to add a pretty format for +outputting the above style in the ``git log`` or ``git show`` commands:: [core] abbrev = 12 [pretty] fixes = Fixes: %h (\"%s\") -3) Separate your changes. -------------------------- +.. _split_changes: + +3) Separate your changes +------------------------ -Separate each _logical change_ into a separate patch. +Separate each **logical change** into a separate patch. For example, if your changes include both bug fixes and performance enhancements for a single driver, separate those changes into two @@ -217,12 +219,12 @@ change that can be verified by reviewers. Each patch should be justifiable on its own merits. If one patch depends on another patch in order for a change to be -complete, that is OK. Simply note "this patch depends on patch X" +complete, that is OK. Simply note **"this patch depends on patch X"** in your patch description. When dividing your change into a series of patches, take special care to ensure that the kernel builds and runs properly after each patch in the -series. Developers using "git bisect" to track down a problem can end up +series. Developers using ``git bisect`` to track down a problem can end up splitting your patch series at any point; they will not thank you if you introduce bugs in the middle. @@ -231,11 +233,13 @@ then only post say 15 or so at a time and wait for review and integration. -4) Style-check your changes. ----------------------------- +4) Style-check your changes +--------------------------- Check your patch for basic style violations, details of which can be -found in Documentation/CodingStyle. Failure to do so simply wastes +found in +:ref:`Documentation/CodingStyle <codingstyle>`. +Failure to do so simply wastes the reviewers time and will get your patch rejected, probably without even being read. @@ -260,8 +264,8 @@ You should be able to justify all violations that remain in your patch. -5) Select the recipients for your patch. ----------------------------------------- +5) Select the recipients for your patch +--------------------------------------- You should always copy the appropriate subsystem maintainer(s) on any patch to code that they maintain; look through the MAINTAINERS file and the @@ -295,13 +299,14 @@ to allow distributors to get the patch out to users; in such cases, obviously, the patch should not be sent to any public lists. Patches that fix a severe bug in a released kernel should be directed -toward the stable maintainers by putting a line like this: +toward the stable maintainers by putting a line like this:: Cc: stable@vger.kernel.org into the sign-off area of your patch (note, NOT an email recipient). You -should also read Documentation/stable_kernel_rules.txt in addition to this -file. +should also read +:ref:`Documentation/stable_kernel_rules.txt <stable_kernel_rules>` +in addition to this file. Note, however, that some subsystem maintainers want to come to their own conclusions on which patches should go to the stable trees. The networking @@ -312,28 +317,30 @@ If changes affect userland-kernel interfaces, please send the MAN-PAGES maintainer (as listed in the MAINTAINERS file) a man-pages patch, or at least a notification of the change, so that some information makes its way into the manual pages. User-space API changes should also be copied to -linux-api@vger.kernel.org. +linux-api@vger.kernel.org. For small patches you may want to CC the Trivial Patch Monkey trivial@kernel.org which collects "trivial" patches. Have a look into the MAINTAINERS file for its current manager. + Trivial patches must qualify for one of the following rules: - Spelling fixes in documentation - Spelling fixes for errors which could break grep(1) - Warning fixes (cluttering with useless warnings is bad) - Compilation fixes (only if they are actually correct) - Runtime fixes (only if they actually fix things) - Removing use of deprecated functions/macros - Contact detail and documentation fixes - Non-portable code replaced by portable code (even in arch-specific, - since people copy, as long as it's trivial) - Any fix by the author/maintainer of the file (ie. patch monkey - in re-transmission mode) + +- Spelling fixes in documentation +- Spelling fixes for errors which could break :manpage:`grep(1)` +- Warning fixes (cluttering with useless warnings is bad) +- Compilation fixes (only if they are actually correct) +- Runtime fixes (only if they actually fix things) +- Removing use of deprecated functions/macros +- Contact detail and documentation fixes +- Non-portable code replaced by portable code (even in arch-specific, + since people copy, as long as it's trivial) +- Any fix by the author/maintainer of the file (ie. patch monkey + in re-transmission mode) -6) No MIME, no links, no compression, no attachments. Just plain text. ------------------------------------------------------------------------ +6) No MIME, no links, no compression, no attachments. Just plain text +---------------------------------------------------------------------- Linus and other kernel developers need to be able to read and comment on the changes you are submitting. It is important for a kernel @@ -341,8 +348,11 @@ developer to be able to "quote" your changes, using standard e-mail tools, so that they may comment on specific portions of your code. For this reason, all patches should be submitted by e-mail "inline". -WARNING: Be wary of your editor's word-wrap corrupting your patch, -if you choose to cut-n-paste your patch. + +.. warning:: + + Be wary of your editor's word-wrap corrupting your patch, + if you choose to cut-n-paste your patch. Do not attach the patch as a MIME attachment, compressed or not. Many popular e-mail applications will not always transmit a MIME @@ -353,11 +363,12 @@ decreasing the likelihood of your MIME-attached change being accepted. Exception: If your mailer is mangling patches then someone may ask you to re-send them using MIME. -See Documentation/email-clients.txt for hints about configuring -your e-mail client so that it sends your patches untouched. +See :ref:`Documentation/email-clients.txt <email_clients>` +for hints about configuring your e-mail client so that it sends your patches +untouched. -7) E-mail size. ---------------- +7) E-mail size +-------------- Large changes are not appropriate for mailing lists, and some maintainers. If your patch, uncompressed, exceeds 300 kB in size, @@ -366,8 +377,8 @@ server, and provide instead a URL (link) pointing to your patch. But note that if your patch exceeds 300 kB, it almost certainly needs to be broken up anyway. -8) Respond to review comments. ------------------------------- +8) Respond to review comments +----------------------------- Your patch will almost certainly get comments from reviewers on ways in which the patch can be improved. You must respond to those comments; @@ -382,8 +393,8 @@ reviewers sometimes get grumpy. Even in that case, though, respond politely and address the problems they have pointed out. -9) Don't get discouraged - or impatient. ----------------------------------------- +9) Don't get discouraged - or impatient +--------------------------------------- After you have submitted your change, be patient and wait. Reviewers are busy people and may not get to your patch right away. @@ -419,9 +430,10 @@ patch, which certifies that you wrote it or otherwise have the right to pass it on as an open-source patch. The rules are pretty simple: if you can certify the below: - Developer's Certificate of Origin 1.1 +Developer's Certificate of Origin 1.1 +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - By making a contribution to this project, I certify that: +By making a contribution to this project, I certify that: (a) The contribution was created in whole or in part by me and I have the right to submit it under the open source license @@ -445,7 +457,7 @@ can certify the below: maintained indefinitely and may be redistributed consistent with this project or the open source license(s) involved. -then you just add a line saying +then you just add a line saying:: Signed-off-by: Random J Developer <random@developer.example.org> @@ -466,7 +478,7 @@ you add a line between the last Signed-off-by header and yours, indicating the nature of your changes. While there is nothing mandatory about this, it seems like prepending the description with your mail and/or name, all enclosed in square brackets, is noticeable enough to make it obvious that -you are responsible for last-minute changes. Example : +you are responsible for last-minute changes. Example:: Signed-off-by: Random J Developer <random@developer.example.org> [lucky@maintainer.example.org: struct foo moved from foo.c to foo.h] @@ -481,15 +493,15 @@ which appears in the changelog. Special note to back-porters: It seems to be a common and useful practice to insert an indication of the origin of a patch at the top of the commit message (just after the subject line) to facilitate tracking. For instance, -here's what we see in a 3.x-stable release: +here's what we see in a 3.x-stable release:: -Date: Tue Oct 7 07:26:38 2014 -0400 + Date: Tue Oct 7 07:26:38 2014 -0400 libata: Un-break ATA blacklist commit 1c40279960bcd7d52dbdf1d466b20d24b99176c8 upstream. -And here's what might appear in an older kernel once a patch is backported: +And here's what might appear in an older kernel once a patch is backported:: Date: Tue May 13 22:12:27 2008 +0200 @@ -529,7 +541,7 @@ When in doubt people should refer to the original discussion in the mailing list archives. If a person has had the opportunity to comment on a patch, but has not -provided such comments, you may optionally add a "Cc:" tag to the patch. +provided such comments, you may optionally add a ``Cc:`` tag to the patch. This is the only tag which might be added without an explicit action by the person it names - but it should indicate that this person was copied on the patch. This tag documents that potentially interested parties @@ -552,11 +564,12 @@ future patches, and ensures credit for the testers. Reviewed-by:, instead, indicates that the patch has been reviewed and found acceptable according to the Reviewer's Statement: - Reviewer's statement of oversight +Reviewer's statement of oversight +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - By offering my Reviewed-by: tag, I state that: +By offering my Reviewed-by: tag, I state that: - (a) I have carried out a technical review of this patch to + (a) I have carried out a technical review of this patch to evaluate its appropriateness and readiness for inclusion into the mainline kernel. @@ -594,24 +607,25 @@ A Fixes: tag indicates that the patch fixes an issue in a previous commit. It is used to make it easy to determine where a bug originated, which can help review a bug fix. This tag also assists the stable kernel team in determining which stable kernel versions should receive your fix. This is the preferred -method for indicating a bug fixed by the patch. See #2 above for more details. +method for indicating a bug fixed by the patch. See :ref:`describe_changes` +for more details. 14) The canonical patch format ------------------------------ This section describes how the patch itself should be formatted. Note -that, if you have your patches stored in a git repository, proper patch -formatting can be had with "git format-patch". The tools cannot create +that, if you have your patches stored in a ``git`` repository, proper patch +formatting can be had with ``git format-patch``. The tools cannot create the necessary text, though, so read the instructions below anyway. -The canonical patch subject line is: +The canonical patch subject line is:: Subject: [PATCH 001/123] subsystem: summary phrase The canonical patch message body contains the following: - - A "from" line specifying the patch author (only needed if the person + - A ``from`` line specifying the patch author (only needed if the person sending the patch is not the author). - An empty line. @@ -619,46 +633,46 @@ The canonical patch message body contains the following: - The body of the explanation, line wrapped at 75 columns, which will be copied to the permanent changelog to describe this patch. - - The "Signed-off-by:" lines, described above, which will + - The ``Signed-off-by:`` lines, described above, which will also go in the changelog. - - A marker line containing simply "---". + - A marker line containing simply ``---``. - Any additional comments not suitable for the changelog. - - The actual patch (diff output). + - The actual patch (``diff`` output). The Subject line format makes it very easy to sort the emails alphabetically by subject line - pretty much any email reader will support that - since because the sequence number is zero-padded, the numerical and alphabetic sort is the same. -The "subsystem" in the email's Subject should identify which +The ``subsystem`` in the email's Subject should identify which area or subsystem of the kernel is being patched. -The "summary phrase" in the email's Subject should concisely -describe the patch which that email contains. The "summary -phrase" should not be a filename. Do not use the same "summary -phrase" for every patch in a whole patch series (where a "patch -series" is an ordered sequence of multiple, related patches). +The ``summary phrase`` in the email's Subject should concisely +describe the patch which that email contains. The ``summary +phrase`` should not be a filename. Do not use the same ``summary +phrase`` for every patch in a whole patch series (where a ``patch +series`` is an ordered sequence of multiple, related patches). -Bear in mind that the "summary phrase" of your email becomes a +Bear in mind that the ``summary phrase`` of your email becomes a globally-unique identifier for that patch. It propagates all the way -into the git changelog. The "summary phrase" may later be used in +into the ``git`` changelog. The ``summary phrase`` may later be used in developer discussions which refer to the patch. People will want to -google for the "summary phrase" to read discussion regarding that +google for the ``summary phrase`` to read discussion regarding that patch. It will also be the only thing that people may quickly see when, two or three months later, they are going through perhaps -thousands of patches using tools such as "gitk" or "git log ---oneline". +thousands of patches using tools such as ``gitk`` or ``git log +--oneline``. -For these reasons, the "summary" must be no more than 70-75 +For these reasons, the ``summary`` must be no more than 70-75 characters, and it must describe both what the patch changes, as well as why the patch might be necessary. It is challenging to be both succinct and descriptive, but that is what a well-written summary should do. -The "summary phrase" may be prefixed by tags enclosed in square +The ``summary phrase`` may be prefixed by tags enclosed in square brackets: "Subject: [PATCH <tag>...] <summary phrase>". The tags are not considered part of the summary phrase, but describe how the patch should be treated. Common tags might include a version descriptor if @@ -670,19 +684,19 @@ that developers understand the order in which the patches should be applied and that they have reviewed or applied all of the patches in the patch series. -A couple of example Subjects: +A couple of example Subjects:: Subject: [PATCH 2/5] ext2: improve scalability of bitmap searching Subject: [PATCH v2 01/27] x86: fix eflags tracking -The "from" line must be the very first line in the message body, +The ``from`` line must be the very first line in the message body, and has the form: From: Original Author <author@example.com> -The "from" line specifies who will be credited as the author of the -patch in the permanent changelog. If the "from" line is missing, -then the "From:" line from the email header will be used to determine +The ``from`` line specifies who will be credited as the author of the +patch in the permanent changelog. If the ``from`` line is missing, +then the ``From:`` line from the email header will be used to determine the patch author in the changelog. The explanation body will be committed to the permanent source @@ -694,35 +708,37 @@ especially useful for people who might be searching the commit logs looking for the applicable patch. If a patch fixes a compile failure, it may not be necessary to include _all_ of the compile failures; just enough that it is likely that someone searching for the patch can find -it. As in the "summary phrase", it is important to be both succinct as +it. As in the ``summary phrase``, it is important to be both succinct as well as descriptive. -The "---" marker line serves the essential purpose of marking for patch +The ``---`` marker line serves the essential purpose of marking for patch handling tools where the changelog message ends. -One good use for the additional comments after the "---" marker is for -a diffstat, to show what files have changed, and the number of -inserted and deleted lines per file. A diffstat is especially useful +One good use for the additional comments after the ``---`` marker is for +a ``diffstat``, to show what files have changed, and the number of +inserted and deleted lines per file. A ``diffstat`` is especially useful on bigger patches. Other comments relevant only to the moment or the maintainer, not suitable for the permanent changelog, should also go -here. A good example of such comments might be "patch changelogs" +here. A good example of such comments might be ``patch changelogs`` which describe what has changed between the v1 and v2 version of the patch. -If you are going to include a diffstat after the "---" marker, please -use diffstat options "-p 1 -w 70" so that filenames are listed from +If you are going to include a ``diffstat`` after the ``---`` marker, please +use ``diffstat`` options ``-p 1 -w 70`` so that filenames are listed from the top of the kernel source tree and don't use too much horizontal -space (easily fit in 80 columns, maybe with some indentation). (git +space (easily fit in 80 columns, maybe with some indentation). (``git`` generates appropriate diffstats by default.) See more details on the proper patch format in the following references. +.. _explicit_in_reply_to: + 15) Explicit In-Reply-To headers -------------------------------- It can be helpful to manually add In-Reply-To: headers to a patch -(e.g., when using "git send-email") to associate the patch with +(e.g., when using ``git send-email``) to associate the patch with previous relevant discussion, e.g. to link a bug fix to the email with the bug report. However, for a multi-patch series, it is generally best to avoid using In-Reply-To: to link to older versions of the @@ -732,12 +748,12 @@ helpful, you can use the https://lkml.kernel.org/ redirector (e.g., in the cover email text) to link to an earlier version of the patch series. -16) Sending "git pull" requests -------------------------------- +16) Sending ``git pull`` requests +--------------------------------- If you have a series of patches, it may be most convenient to have the maintainer pull them directly into the subsystem repository with a -"git pull" operation. Note, however, that pulling patches from a developer +``git pull`` operation. Note, however, that pulling patches from a developer requires a higher degree of trust than taking patches from a mailing list. As a result, many subsystem maintainers are reluctant to take pull requests, especially from new, unknown developers. If in doubt you can use @@ -746,7 +762,7 @@ series, giving the maintainer the option of using either. A pull request should have [GIT] or [PULL] in the subject line. The request itself should include the repository name and the branch of -interest on a single line; it should look something like: +interest on a single line; it should look something like:: Please pull from @@ -755,10 +771,10 @@ interest on a single line; it should look something like: to get these changes: A pull request should also include an overall message saying what will be -included in the request, a "git shortlog" listing of the patches -themselves, and a diffstat showing the overall effect of the patch series. +included in the request, a ``git shortlog`` listing of the patches +themselves, and a ``diffstat`` showing the overall effect of the patch series. The easiest way to get all this information together is, of course, to let -git do it for you with the "git request-pull" command. +``git`` do it for you with the ``git request-pull`` command. Some maintainers (including Linus) want to see pull requests from signed commits; that increases their confidence that the request actually came @@ -770,8 +786,8 @@ signed by one or more core kernel developers. This step can be hard for new developers, but there is no way around it. Attending conferences can be a good way to find developers who can sign your key. -Once you have prepared a patch series in git that you wish to have somebody -pull, create a signed tag with "git tag -s". This will create a new tag +Once you have prepared a patch series in ``git`` that you wish to have somebody +pull, create a signed tag with ``git tag -s``. This will create a new tag identifying the last commit in the series and containing a signature created with your private key. You will also have the opportunity to add a changelog-style message to the tag; this is an ideal place to describe the @@ -782,14 +798,13 @@ are working from, don't forget to push the signed tag explicitly to the public tree. When generating your pull request, use the signed tag as the target. A -command like this will do the trick: +command like this will do the trick:: git request-pull master git://my.public.tree/linux.git my-signed-tag ----------------------- -SECTION 2 - REFERENCES ----------------------- +REFERENCES +********** Andrew Morton, "The perfect patch" (tpp). <http://www.ozlabs.org/~akpm/stuff/tpp.txt> @@ -799,23 +814,28 @@ Jeff Garzik, "Linux kernel patch submission format". Greg Kroah-Hartman, "How to piss off a kernel subsystem maintainer". <http://www.kroah.com/log/linux/maintainer.html> + <http://www.kroah.com/log/linux/maintainer-02.html> + <http://www.kroah.com/log/linux/maintainer-03.html> + <http://www.kroah.com/log/linux/maintainer-04.html> + <http://www.kroah.com/log/linux/maintainer-05.html> + <http://www.kroah.com/log/linux/maintainer-06.html> NO!!!! No more huge patch bombs to linux-kernel@vger.kernel.org people! <https://lkml.org/lkml/2005/7/11/336> Kernel Documentation/CodingStyle: - <Documentation/CodingStyle> + :ref:`Documentation/CodingStyle <codingstyle>` Linus Torvalds's mail on the canonical patch format: <http://lkml.org/lkml/2005/4/7/183> Andi Kleen, "On submitting kernel patches" Some strategies to get difficult or controversial changes in. + http://halobates.de/on-submitting-patches.pdf --- diff --git a/Documentation/acpi/acpi-lid.txt b/Documentation/acpi/acpi-lid.txt new file mode 100644 index 000000000000..effe7af3a5af --- /dev/null +++ b/Documentation/acpi/acpi-lid.txt @@ -0,0 +1,96 @@ +Special Usage Model of the ACPI Control Method Lid Device + +Copyright (C) 2016, Intel Corporation +Author: Lv Zheng <lv.zheng@intel.com> + + +Abstract: + +Platforms containing lids convey lid state (open/close) to OSPMs using a +control method lid device. To implement this, the AML tables issue +Notify(lid_device, 0x80) to notify the OSPMs whenever the lid state has +changed. The _LID control method for the lid device must be implemented to +report the "current" state of the lid as either "opened" or "closed". + +For most platforms, both the _LID method and the lid notifications are +reliable. However, there are exceptions. In order to work with these +exceptional buggy platforms, special restrictions and expections should be +taken into account. This document describes the restrictions and the +expections of the Linux ACPI lid device driver. + + +1. Restrictions of the returning value of the _LID control method + +The _LID control method is described to return the "current" lid state. +However the word of "current" has ambiguity, some buggy AML tables return +the lid state upon the last lid notification instead of returning the lid +state upon the last _LID evaluation. There won't be difference when the +_LID control method is evaluated during the runtime, the problem is its +initial returning value. When the AML tables implement this control method +with cached value, the initial returning value is likely not reliable. +There are platforms always retun "closed" as initial lid state. + +2. Restrictions of the lid state change notifications + +There are buggy AML tables never notifying when the lid device state is +changed to "opened". Thus the "opened" notification is not guaranteed. But +it is guaranteed that the AML tables always notify "closed" when the lid +state is changed to "closed". The "closed" notification is normally used to +trigger some system power saving operations on Windows. Since it is fully +tested, it is reliable from all AML tables. + +3. Expections for the userspace users of the ACPI lid device driver + +The ACPI button driver exports the lid state to the userspace via the +following file: + /proc/acpi/button/lid/LID0/state +This file actually calls the _LID control method described above. And given +the previous explanation, it is not reliable enough on some platforms. So +it is advised for the userspace program to not to solely rely on this file +to determine the actual lid state. + +The ACPI button driver emits the following input event to the userspace: + SW_LID +The ACPI lid device driver is implemented to try to deliver the platform +triggered events to the userspace. However, given the fact that the buggy +firmware cannot make sure "opened"/"closed" events are paired, the ACPI +button driver uses the following 3 modes in order not to trigger issues. + +If the userspace hasn't been prepared to ignore the unreliable "opened" +events and the unreliable initial state notification, Linux users can use +the following kernel parameters to handle the possible issues: +A. button.lid_init_state=method: + When this option is specified, the ACPI button driver reports the + initial lid state using the returning value of the _LID control method + and whether the "opened"/"closed" events are paired fully relies on the + firmware implementation. + This option can be used to fix some platforms where the returning value + of the _LID control method is reliable but the initial lid state + notification is missing. + This option is the default behavior during the period the userspace + isn't ready to handle the buggy AML tables. +B. button.lid_init_state=open: + When this option is specified, the ACPI button driver always reports the + initial lid state as "opened" and whether the "opened"/"closed" events + are paired fully relies on the firmware implementation. + This may fix some platforms where the returning value of the _LID + control method is not reliable and the initial lid state notification is + missing. + +If the userspace has been prepared to ignore the unreliable "opened" events +and the unreliable initial state notification, Linux users should always +use the following kernel parameter: +C. button.lid_init_state=ignore: + When this option is specified, the ACPI button driver never reports the + initial lid state and there is a compensation mechanism implemented to + ensure that the reliable "closed" notifications can always be delievered + to the userspace by always pairing "closed" input events with complement + "opened" input events. But there is still no guarantee that the "opened" + notifications can be delivered to the userspace when the lid is actually + opens given that some AML tables do not send "opened" notifications + reliably. + In this mode, if everything is correctly implemented by the platform + firmware, the old userspace programs should still work. Otherwise, the + new userspace programs are required to work with the ACPI button driver. + This option will be the default behavior after the userspace is ready to + handle the buggy AML tables. diff --git a/Documentation/acpi/gpio-properties.txt b/Documentation/acpi/gpio-properties.txt index f35dad11f0de..5aafe0b351a1 100644 --- a/Documentation/acpi/gpio-properties.txt +++ b/Documentation/acpi/gpio-properties.txt @@ -28,8 +28,8 @@ index, like the ASL example below shows: ToUUID("daffd814-6eba-4d8c-8a91-bc9bbf4aa301"), Package () { - Package () {"reset-gpio", Package() {^BTH, 1, 1, 0 }}, - Package () {"shutdown-gpio", Package() {^BTH, 0, 0, 0 }}, + Package () {"reset-gpios", Package() {^BTH, 1, 1, 0 }}, + Package () {"shutdown-gpios", Package() {^BTH, 0, 0, 0 }}, } }) } @@ -48,7 +48,7 @@ Since ACPI GpioIo() resource does not have a field saying whether it is active low or high, the "active_low" argument can be used here. Setting it to 1 marks the GPIO as active low. -In our Bluetooth example the "reset-gpio" refers to the second GpioIo() +In our Bluetooth example the "reset-gpios" refers to the second GpioIo() resource, second pin in that resource with the GPIO number of 31. ACPI GPIO Mappings Provided by Drivers @@ -83,8 +83,8 @@ static const struct acpi_gpio_params reset_gpio = { 1, 1, false }; static const struct acpi_gpio_params shutdown_gpio = { 0, 0, false }; static const struct acpi_gpio_mapping bluetooth_acpi_gpios[] = { - { "reset-gpio", &reset_gpio, 1 }, - { "shutdown-gpio", &shutdown_gpio, 1 }, + { "reset-gpios", &reset_gpio, 1 }, + { "shutdown-gpios", &shutdown_gpio, 1 }, { }, }; diff --git a/Documentation/applying-patches.txt b/Documentation/applying-patches.txt index 77df55b0225a..02ce4924468e 100644 --- a/Documentation/applying-patches.txt +++ b/Documentation/applying-patches.txt @@ -1,9 +1,13 @@ +.. _applying_patches: - Applying Patches To The Linux Kernel - ------------------------------------ +Applying Patches To The Linux Kernel +++++++++++++++++++++++++++++++++++++ - Original by: Jesper Juhl, August 2005 - Last update: 2006-01-05 +Original by: + Jesper Juhl, August 2005 + +Last update: + 2016-09-14 A frequently asked question on the Linux Kernel Mailing List is how to apply @@ -17,10 +21,12 @@ their specific patches) is also provided. What is a patch? ---- - A patch is a small text document containing a delta of changes between two -different versions of a source tree. Patches are created with the `diff' +================ + +A patch is a small text document containing a delta of changes between two +different versions of a source tree. Patches are created with the ``diff`` program. + To correctly apply a patch you need to know what base it was generated from and what new version the patch will change the source tree into. These should both be present in the patch file metadata or be possible to deduce @@ -28,8 +34,9 @@ from the filename. How do I apply or revert a patch? ---- - You apply a patch with the `patch' program. The patch program reads a diff +================================= + +You apply a patch with the ``patch`` program. The patch program reads a diff (or patch) file and makes the changes to the source tree described in it. Patches for the Linux kernel are generated relative to the parent directory @@ -38,26 +45,33 @@ holding the kernel source dir. This means that paths to files inside the patch file contain the name of the kernel source directories it was generated against (or some other directory names like "a/" and "b/"). + Since this is unlikely to match the name of the kernel source dir on your local machine (but is often useful info to see what version an otherwise unlabeled patch was generated against) you should change into your kernel source directory and then strip the first element of the path from filenames -in the patch file when applying it (the -p1 argument to `patch' does this). +in the patch file when applying it (the ``-p1`` argument to ``patch`` does +this). To revert a previously applied patch, use the -R argument to patch. -So, if you applied a patch like this: +So, if you applied a patch like this:: + patch -p1 < ../patch-x.y.z -You can revert (undo) it like this: +You can revert (undo) it like this:: + patch -R -p1 < ../patch-x.y.z -How do I feed a patch/diff file to `patch'? ---- - This (as usual with Linux and other UNIX like operating systems) can be +How do I feed a patch/diff file to ``patch``? +============================================= + +This (as usual with Linux and other UNIX like operating systems) can be done in several different ways. + In all the examples below I feed the file (in uncompressed form) to patch -via stdin using the following syntax: +via stdin using the following syntax:: + patch -p1 < path/to/patch-x.y.z If you just want to be able to follow the examples below and don't want to @@ -65,35 +79,40 @@ know of more than one way to use patch, then you can stop reading this section here. Patch can also get the name of the file to use via the -i argument, like -this: +this:: + patch -p1 -i path/to/patch-x.y.z -If your patch file is compressed with gzip or bzip2 and you don't want to +If your patch file is compressed with gzip or xz and you don't want to uncompress it before applying it, then you can feed it to patch like this -instead: - zcat path/to/patch-x.y.z.gz | patch -p1 - bzcat path/to/patch-x.y.z.bz2 | patch -p1 +instead:: + + xzcat path/to/patch-x.y.z.xz | patch -p1 + bzcat path/to/patch-x.y.z.gz | patch -p1 If you wish to uncompress the patch file by hand first before applying it (what I assume you've done in the examples below), then you simply run -gunzip or bunzip2 on the file -- like this: +gunzip or xz on the file -- like this:: + gunzip patch-x.y.z.gz - bunzip2 patch-x.y.z.bz2 + xz -d patch-x.y.z.xz Which will leave you with a plain text patch-x.y.z file that you can feed to -patch via stdin or the -i argument, as you prefer. +patch via stdin or the ``-i`` argument, as you prefer. -A few other nice arguments for patch are -s which causes patch to be silent +A few other nice arguments for patch are ``-s`` which causes patch to be silent except for errors which is nice to prevent errors from scrolling out of the -screen too fast, and --dry-run which causes patch to just print a listing of -what would happen, but doesn't actually make any changes. Finally --verbose +screen too fast, and ``--dry-run`` which causes patch to just print a listing of +what would happen, but doesn't actually make any changes. Finally ``--verbose`` tells patch to print more information about the work being done. Common errors when patching ---- - When patch applies a patch file it attempts to verify the sanity of the +=========================== + +When patch applies a patch file it attempts to verify the sanity of the file in different ways. + Checking that the file looks like a valid patch file and checking the code around the bits being modified matches the context provided in the patch are just two of the basic sanity checks patch does. @@ -111,13 +130,13 @@ everything looks good it has just moved up or down a bit, and patch will usually adjust the line numbers and apply the patch. Whenever patch applies a patch that it had to modify a bit to make it fit -it'll tell you about it by saying the patch applied with 'fuzz'. +it'll tell you about it by saying the patch applied with **fuzz**. You should be wary of such changes since even though patch probably got it right it doesn't /always/ get it right, and the result will sometimes be wrong. When patch encounters a change that it can't fix up with fuzz it rejects it -outright and leaves a file with a .rej extension (a reject file). You can +outright and leaves a file with a ``.rej`` extension (a reject file). You can read this file to see exactly what change couldn't be applied, so you can go fix it up by hand if you wish. @@ -132,43 +151,47 @@ to start with a fresh tree downloaded in full from kernel.org. Let's look a bit more at some of the messages patch can produce. -If patch stops and presents a "File to patch:" prompt, then patch could not +If patch stops and presents a ``File to patch:`` prompt, then patch could not find a file to be patched. Most likely you forgot to specify -p1 or you are in the wrong directory. Less often, you'll find patches that need to be -applied with -p0 instead of -p1 (reading the patch file should reveal if +applied with ``-p0`` instead of ``-p1`` (reading the patch file should reveal if this is the case -- if so, then this is an error by the person who created the patch but is not fatal). -If you get "Hunk #2 succeeded at 1887 with fuzz 2 (offset 7 lines)." or a +If you get ``Hunk #2 succeeded at 1887 with fuzz 2 (offset 7 lines).`` or a message similar to that, then it means that patch had to adjust the location of the change (in this example it needed to move 7 lines from where it expected to make the change to make it fit). + The resulting file may or may not be OK, depending on the reason the file was different than expected. + This often happens if you try to apply a patch that was generated against a different kernel version than the one you are trying to patch. -If you get a message like "Hunk #3 FAILED at 2387.", then it means that the +If you get a message like ``Hunk #3 FAILED at 2387.``, then it means that the patch could not be applied correctly and the patch program was unable to -fuzz its way through. This will generate a .rej file with the change that -caused the patch to fail and also a .orig file showing you the original +fuzz its way through. This will generate a ``.rej`` file with the change that +caused the patch to fail and also a ``.orig`` file showing you the original content that couldn't be changed. -If you get "Reversed (or previously applied) patch detected! Assume -R? [n]" +If you get ``Reversed (or previously applied) patch detected! Assume -R? [n]`` then patch detected that the change contained in the patch seems to have already been made. + If you actually did apply this patch previously and you just re-applied it in error, then just say [n]o and abort this patch. If you applied this patch previously and actually intended to revert it, but forgot to specify -R, -then you can say [y]es here to make patch revert it for you. +then you can say [**y**]es here to make patch revert it for you. + This can also happen if the creator of the patch reversed the source and destination directories when creating the patch, and in that case reverting the patch will in fact apply it. -A message similar to "patch: **** unexpected end of file in patch" or "patch -unexpectedly ends in middle of line" means that patch could make no sense of -the file you fed to it. Either your download is broken, you tried to feed -patch a compressed patch file without uncompressing it first, or the patch +A message similar to ``patch: **** unexpected end of file in patch`` or +``patch unexpectedly ends in middle of line`` means that patch could make no +sense of the file you fed to it. Either your download is broken, you tried to +feed patch a compressed patch file without uncompressing it first, or the patch file that you are using has been mangled by a mail client or mail transfer agent along the way somewhere, e.g., by splitting a long line into two lines. Often these warnings can easily be fixed by joining (concatenating) the @@ -182,28 +205,32 @@ to start over with a fresh download of a full kernel tree and the patch you wish to apply. -Are there any alternatives to `patch'? ---- - Yes there are alternatives. +Are there any alternatives to ``patch``? +======================================== - You can use the `interdiff' program (http://cyberelk.net/tim/patchutils/) to + +Yes there are alternatives. + +You can use the ``interdiff`` program (http://cyberelk.net/tim/patchutils/) to generate a patch representing the differences between two patches and then apply the result. -This will let you move from something like 2.6.12.2 to 2.6.12.3 in a single + +This will let you move from something like 4.7.2 to 4.7.3 in a single step. The -z flag to interdiff will even let you feed it patches in gzip or bzip2 compressed form directly without the use of zcat or bzcat or manual decompression. -Here's how you'd go from 2.6.12.2 to 2.6.12.3 in a single step: - interdiff -z ../patch-2.6.12.2.bz2 ../patch-2.6.12.3.gz | patch -p1 +Here's how you'd go from 4.7.2 to 4.7.3 in a single step:: + + interdiff -z ../patch-4.7.2.gz ../patch-4.7.3.gz | patch -p1 Although interdiff may save you a step or two you are generally advised to do the additional steps since interdiff can get things wrong in some cases. - Another alternative is `ketchup', which is a python script for automatic +Another alternative is ``ketchup``, which is a python script for automatic downloading and applying of patches (http://www.selenic.com/ketchup/). - Other nice tools are diffstat, which shows a summary of changes made by a +Other nice tools are diffstat, which shows a summary of changes made by a patch; lsdiff, which displays a short listing of affected files in a patch file, along with (optionally) the line numbers of the start of each patch; and grepdiff, which displays a list of the files modified by a patch where @@ -211,99 +238,103 @@ the patch contains a given regular expression. Where can I download the patches? ---- - The patches are available at http://kernel.org/ +================================= + +The patches are available at http://kernel.org/ Most recent patches are linked from the front page, but they also have specific homes. -The 2.6.x.y (-stable) and 2.6.x patches live at - ftp://ftp.kernel.org/pub/linux/kernel/v2.6/ +The 4.x.y (-stable) and 4.x patches live at -The -rc patches live at - ftp://ftp.kernel.org/pub/linux/kernel/v2.6/testing/ + ftp://ftp.kernel.org/pub/linux/kernel/v4.x/ -The -git patches live at - ftp://ftp.kernel.org/pub/linux/kernel/v2.6/snapshots/ +The -rc patches live at -The -mm kernels live at - ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/ + ftp://ftp.kernel.org/pub/linux/kernel/v4.x/testing/ -In place of ftp.kernel.org you can use ftp.cc.kernel.org, where cc is a +In place of ``ftp.kernel.org`` you can use ``ftp.cc.kernel.org``, where cc is a country code. This way you'll be downloading from a mirror site that's most likely geographically closer to you, resulting in faster downloads for you, less bandwidth used globally and less load on the main kernel.org servers -- these are good things, so do use mirrors when possible. -The 2.6.x kernels ---- - These are the base stable releases released by Linus. The highest numbered +The 4.x kernels +=============== + +These are the base stable releases released by Linus. The highest numbered release is the most recent. If regressions or other serious flaws are found, then a -stable fix patch -will be released (see below) on top of this base. Once a new 2.6.x base +will be released (see below) on top of this base. Once a new 4.x base kernel is released, a patch is made available that is a delta between the -previous 2.6.x kernel and the new one. +previous 4.x kernel and the new one. + +To apply a patch moving from 4.6 to 4.7, you'd do the following (note +that such patches do **NOT** apply on top of 4.x.y kernels but on top of the +base 4.x kernel -- if you need to move from 4.x.y to 4.x+1 you need to +first revert the 4.x.y patch). + +Here are some examples:: -To apply a patch moving from 2.6.11 to 2.6.12, you'd do the following (note -that such patches do *NOT* apply on top of 2.6.x.y kernels but on top of the -base 2.6.x kernel -- if you need to move from 2.6.x.y to 2.6.x+1 you need to -first revert the 2.6.x.y patch). + # moving from 4.6 to 4.7 -Here are some examples: + $ cd ~/linux-4.6 # change to kernel source dir + $ patch -p1 < ../patch-4.7 # apply the 4.7 patch + $ cd .. + $ mv linux-4.6 linux-4.7 # rename source dir -# moving from 2.6.11 to 2.6.12 -$ cd ~/linux-2.6.11 # change to kernel source dir -$ patch -p1 < ../patch-2.6.12 # apply the 2.6.12 patch -$ cd .. -$ mv linux-2.6.11 linux-2.6.12 # rename source dir + # moving from 4.6.1 to 4.7 -# moving from 2.6.11.1 to 2.6.12 -$ cd ~/linux-2.6.11.1 # change to kernel source dir -$ patch -p1 -R < ../patch-2.6.11.1 # revert the 2.6.11.1 patch - # source dir is now 2.6.11 -$ patch -p1 < ../patch-2.6.12 # apply new 2.6.12 patch -$ cd .. -$ mv linux-2.6.11.1 linux-2.6.12 # rename source dir + $ cd ~/linux-4.6.1 # change to kernel source dir + $ patch -p1 -R < ../patch-4.6.1 # revert the 4.6.1 patch + # source dir is now 4.6 + $ patch -p1 < ../patch-4.7 # apply new 4.7 patch + $ cd .. + $ mv linux-4.6.1 linux-4.7 # rename source dir -The 2.6.x.y kernels ---- - Kernels with 4-digit versions are -stable kernels. They contain small(ish) +The 4.x.y kernels +================= + +Kernels with 3-digit versions are -stable kernels. They contain small(ish) critical fixes for security problems or significant regressions discovered -in a given 2.6.x kernel. +in a given 4.x kernel. This is the recommended branch for users who want the most recent stable kernel and are not interested in helping test development/experimental versions. -If no 2.6.x.y kernel is available, then the highest numbered 2.6.x kernel is +If no 4.x.y kernel is available, then the highest numbered 4.x kernel is the current stable kernel. - note: the -stable team usually do make incremental patches available as well +.. note:: + + The -stable team usually do make incremental patches available as well as patches against the latest mainline release, but I only cover the non-incremental ones below. The incremental ones can be found at - ftp://ftp.kernel.org/pub/linux/kernel/v2.6/incr/ + ftp://ftp.kernel.org/pub/linux/kernel/v4.x/incr/ -These patches are not incremental, meaning that for example the 2.6.12.3 -patch does not apply on top of the 2.6.12.2 kernel source, but rather on top -of the base 2.6.12 kernel source . -So, in order to apply the 2.6.12.3 patch to your existing 2.6.12.2 kernel -source you have to first back out the 2.6.12.2 patch (so you are left with a -base 2.6.12 kernel source) and then apply the new 2.6.12.3 patch. +These patches are not incremental, meaning that for example the 4.7.3 +patch does not apply on top of the 4.7.2 kernel source, but rather on top +of the base 4.7 kernel source. -Here's a small example: +So, in order to apply the 4.7.3 patch to your existing 4.7.2 kernel +source you have to first back out the 4.7.2 patch (so you are left with a +base 4.7 kernel source) and then apply the new 4.7.3 patch. -$ cd ~/linux-2.6.12.2 # change into the kernel source dir -$ patch -p1 -R < ../patch-2.6.12.2 # revert the 2.6.12.2 patch -$ patch -p1 < ../patch-2.6.12.3 # apply the new 2.6.12.3 patch -$ cd .. -$ mv linux-2.6.12.2 linux-2.6.12.3 # rename the kernel source dir +Here's a small example:: + $ cd ~/linux-4.7.2 # change to the kernel source dir + $ patch -p1 -R < ../patch-4.7.2 # revert the 4.7.2 patch + $ patch -p1 < ../patch-4.7.3 # apply the new 4.7.3 patch + $ cd .. + $ mv linux-4.7.2 linux-4.7.3 # rename the kernel source dir The -rc kernels ---- - These are release-candidate kernels. These are development kernels released +=============== + +These are release-candidate kernels. These are development kernels released by Linus whenever he deems the current git (the kernel's source management tool) tree to be in a reasonably sane state adequate for testing. @@ -317,39 +348,44 @@ This is a good branch to run for people who want to help out testing development kernels but do not want to run some of the really experimental stuff (such people should see the sections about -git and -mm kernels below). -The -rc patches are not incremental, they apply to a base 2.6.x kernel, just -like the 2.6.x.y patches described above. The kernel version before the -rcN +The -rc patches are not incremental, they apply to a base 4.x kernel, just +like the 4.x.y patches described above. The kernel version before the -rcN suffix denotes the version of the kernel that this -rc kernel will eventually turn into. -So, 2.6.13-rc5 means that this is the fifth release candidate for the 2.6.13 -kernel and the patch should be applied on top of the 2.6.12 kernel source. -Here are 3 examples of how to apply these patches: +So, 4.8-rc5 means that this is the fifth release candidate for the 4.8 +kernel and the patch should be applied on top of the 4.7 kernel source. -# first an example of moving from 2.6.12 to 2.6.13-rc3 -$ cd ~/linux-2.6.12 # change into the 2.6.12 source dir -$ patch -p1 < ../patch-2.6.13-rc3 # apply the 2.6.13-rc3 patch -$ cd .. -$ mv linux-2.6.12 linux-2.6.13-rc3 # rename the source dir +Here are 3 examples of how to apply these patches:: -# now let's move from 2.6.13-rc3 to 2.6.13-rc5 -$ cd ~/linux-2.6.13-rc3 # change into the 2.6.13-rc3 dir -$ patch -p1 -R < ../patch-2.6.13-rc3 # revert the 2.6.13-rc3 patch -$ patch -p1 < ../patch-2.6.13-rc5 # apply the new 2.6.13-rc5 patch -$ cd .. -$ mv linux-2.6.13-rc3 linux-2.6.13-rc5 # rename the source dir + # first an example of moving from 4.7 to 4.8-rc3 -# finally let's try and move from 2.6.12.3 to 2.6.13-rc5 -$ cd ~/linux-2.6.12.3 # change to the kernel source dir -$ patch -p1 -R < ../patch-2.6.12.3 # revert the 2.6.12.3 patch -$ patch -p1 < ../patch-2.6.13-rc5 # apply new 2.6.13-rc5 patch -$ cd .. -$ mv linux-2.6.12.3 linux-2.6.13-rc5 # rename the kernel source dir + $ cd ~/linux-4.7 # change to the 4.7 source dir + $ patch -p1 < ../patch-4.8-rc3 # apply the 4.8-rc3 patch + $ cd .. + $ mv linux-4.7 linux-4.8-rc3 # rename the source dir + + # now let's move from 4.8-rc3 to 4.8-rc5 + + $ cd ~/linux-4.8-rc3 # change to the 4.8-rc3 dir + $ patch -p1 -R < ../patch-4.8-rc3 # revert the 4.8-rc3 patch + $ patch -p1 < ../patch-4.8-rc5 # apply the new 4.8-rc5 patch + $ cd .. + $ mv linux-4.8-rc3 linux-4.8-rc5 # rename the source dir + + # finally let's try and move from 4.7.3 to 4.8-rc5 + + $ cd ~/linux-4.7.3 # change to the kernel source dir + $ patch -p1 -R < ../patch-4.7.3 # revert the 4.7.3 patch + $ patch -p1 < ../patch-4.8-rc5 # apply new 4.8-rc5 patch + $ cd .. + $ mv linux-4.7.3 linux-4.8-rc5 # rename the kernel source dir The -git kernels ---- - These are daily snapshots of Linus' kernel tree (managed in a git +================ + +These are daily snapshots of Linus' kernel tree (managed in a git repository, hence the name). These patches are usually released daily and represent the current state of @@ -357,91 +393,66 @@ Linus's tree. They are more experimental than -rc kernels since they are generated automatically without even a cursory glance to see if they are sane. --git patches are not incremental and apply either to a base 2.6.x kernel or -a base 2.6.x-rc kernel -- you can see which from their name. -A patch named 2.6.12-git1 applies to the 2.6.12 kernel source and a patch -named 2.6.13-rc3-git2 applies to the source of the 2.6.13-rc3 kernel. - -Here are some examples of how to apply these patches: - -# moving from 2.6.12 to 2.6.12-git1 -$ cd ~/linux-2.6.12 # change to the kernel source dir -$ patch -p1 < ../patch-2.6.12-git1 # apply the 2.6.12-git1 patch -$ cd .. -$ mv linux-2.6.12 linux-2.6.12-git1 # rename the kernel source dir - -# moving from 2.6.12-git1 to 2.6.13-rc2-git3 -$ cd ~/linux-2.6.12-git1 # change to the kernel source dir -$ patch -p1 -R < ../patch-2.6.12-git1 # revert the 2.6.12-git1 patch - # we now have a 2.6.12 kernel -$ patch -p1 < ../patch-2.6.13-rc2 # apply the 2.6.13-rc2 patch - # the kernel is now 2.6.13-rc2 -$ patch -p1 < ../patch-2.6.13-rc2-git3 # apply the 2.6.13-rc2-git3 patch - # the kernel is now 2.6.13-rc2-git3 -$ cd .. -$ mv linux-2.6.12-git1 linux-2.6.13-rc2-git3 # rename source dir - - -The -mm kernels ---- - These are experimental kernels released by Andrew Morton. - -The -mm tree serves as a sort of proving ground for new features and other -experimental patches. -Once a patch has proved its worth in -mm for a while Andrew pushes it on to -Linus for inclusion in mainline. - -Although it's encouraged that patches flow to Linus via the -mm tree, this -is not always enforced. -Subsystem maintainers (or individuals) sometimes push their patches directly -to Linus, even though (or after) they have been merged and tested in -mm (or -sometimes even without prior testing in -mm). - -You should generally strive to get your patches into mainline via -mm to -ensure maximum testing. - -This branch is in constant flux and contains many experimental features, a +-git patches are not incremental and apply either to a base 4.x kernel or +a base 4.x-rc kernel -- you can see which from their name. +A patch named 4.7-git1 applies to the 4.7 kernel source and a patch +named 4.8-rc3-git2 applies to the source of the 4.8-rc3 kernel. + +Here are some examples of how to apply these patches:: + + # moving from 4.7 to 4.7-git1 + + $ cd ~/linux-4.7 # change to the kernel source dir + $ patch -p1 < ../patch-4.7-git1 # apply the 4.7-git1 patch + $ cd .. + $ mv linux-4.7 linux-4.7-git1 # rename the kernel source dir + + # moving from 4.7-git1 to 4.8-rc2-git3 + + $ cd ~/linux-4.7-git1 # change to the kernel source dir + $ patch -p1 -R < ../patch-4.7-git1 # revert the 4.7-git1 patch + # we now have a 4.7 kernel + $ patch -p1 < ../patch-4.8-rc2 # apply the 4.8-rc2 patch + # the kernel is now 4.8-rc2 + $ patch -p1 < ../patch-4.8-rc2-git3 # apply the 4.8-rc2-git3 patch + # the kernel is now 4.8-rc2-git3 + $ cd .. + $ mv linux-4.7-git1 linux-4.8-rc2-git3 # rename source dir + + +The -mm patches and the linux-next tree +======================================= + +The -mm patches are experimental patches released by Andrew Morton. + +In the past, -mm tree were used to also test subsystem patches, but this +function is now done via the +:ref:`linux-next <https://www.kernel.org/doc/man-pages/linux-next.html>` +tree. The Subsystem maintainers push their patches first to linux-next, +and, during the merge window, sends them directly to Linus. + +The -mm patches serve as a sort of proving ground for new features and other +experimental patches that aren't merged via a subsystem tree. +Once such patches has proved its worth in -mm for a while Andrew pushes +it on to Linus for inclusion in mainline. + +The linux-next tree is daily updated, and includes the -mm patches. +Both are in constant flux and contains many experimental features, a lot of debugging patches not appropriate for mainline etc., and is the most experimental of the branches described in this document. -These kernels are not appropriate for use on systems that are supposed to be +These patches are not appropriate for use on systems that are supposed to be stable and they are more risky to run than any of the other branches (make sure you have up-to-date backups -- that goes for any experimental kernel but -even more so for -mm kernels). - -These kernels in addition to all the other experimental patches they contain -usually also contain any changes in the mainline -git kernels available at -the time of release. - -Testing of -mm kernels is greatly appreciated since the whole point of the -tree is to weed out regressions, crashes, data corruption bugs, build -breakage (and any other bug in general) before changes are merged into the -more stable mainline Linus tree. -But testers of -mm should be aware that breakage in this tree is more common -than in any other tree. - -The -mm kernels are not released on a fixed schedule, but usually a few -mm -kernels are released in between each -rc kernel (1 to 3 is common). -The -mm kernels apply to either a base 2.6.x kernel (when no -rc kernels -have been released yet) or to a Linus -rc kernel. - -Here are some examples of applying the -mm patches: - -# moving from 2.6.12 to 2.6.12-mm1 -$ cd ~/linux-2.6.12 # change to the 2.6.12 source dir -$ patch -p1 < ../2.6.12-mm1 # apply the 2.6.12-mm1 patch -$ cd .. -$ mv linux-2.6.12 linux-2.6.12-mm1 # rename the source appropriately - -# moving from 2.6.12-mm1 to 2.6.13-rc3-mm3 -$ cd ~/linux-2.6.12-mm1 -$ patch -p1 -R < ../2.6.12-mm1 # revert the 2.6.12-mm1 patch - # we now have a 2.6.12 source -$ patch -p1 < ../patch-2.6.13-rc3 # apply the 2.6.13-rc3 patch - # we now have a 2.6.13-rc3 source -$ patch -p1 < ../2.6.13-rc3-mm3 # apply the 2.6.13-rc3-mm3 patch -$ cd .. -$ mv linux-2.6.12-mm1 linux-2.6.13-rc3-mm3 # rename the source dir +even more so for -mm patches or using a Kernel from the linux-next tree). + +Testing of -mm patches and linux-next is greatly appreciated since the whole +point of those are to weed out regressions, crashes, data corruption bugs, +build breakage (and any other bug in general) before changes are merged into +the more stable mainline Linus tree. + +But testers of -mm and linux-next should be aware that breakages are +more common than in any other tree. This concludes this list of explanations of the various kernel trees. diff --git a/Documentation/arm/CCN.txt b/Documentation/arm/CCN.txt index ffca443a19b4..15cdb7bc57c3 100644 --- a/Documentation/arm/CCN.txt +++ b/Documentation/arm/CCN.txt @@ -18,13 +18,17 @@ and config2 fields of the perf_event_attr structure. The "events" directory provides configuration templates for all documented events, that can be used with perf tool. For example "xp_valid_flit" is an equivalent of "type=0x8,event=0x4". Other parameters must be -explicitly specified. For events originating from device, "node" -defines its index. All crosspoint events require "xp" (index), -"port" (device port number) and "vc" (virtual channel ID) and -"dir" (direction). Watchpoints (special "event" value 0xfe) also -require comparator values ("cmp_l" and "cmp_h") and "mask", being -index of the comparator mask. +explicitly specified. +For events originating from device, "node" defines its index. + +Crosspoint PMU events require "xp" (index), "bus" (bus number) +and "vc" (virtual channel ID). + +Crosspoint watchpoint-based events (special "event" value 0xfe) +require "xp" and "vc" as as above plus "port" (device port index), +"dir" (transmit/receive direction), comparator values ("cmp_l" +and "cmp_h") and "mask", being index of the comparator mask. Masks are defined separately from the event description (due to limited number of the config values) in the "cmp_mask" directory, with first 8 configurable by user and additional diff --git a/Documentation/arm/sunxi/README b/Documentation/arm/sunxi/README index e5a115f24471..c7a0554523da 100644 --- a/Documentation/arm/sunxi/README +++ b/Documentation/arm/sunxi/README @@ -73,4 +73,13 @@ SunXi family * Octa ARM Cortex-A7 based SoCs - Allwinner A83T + Datasheet - http://dl.linux-sunxi.org/A83T/A83T_datasheet_Revision_1.1.pdf + https://github.com/allwinner-zh/documents/raw/master/A83T/A83T_Datasheet_v1.3_20150510.pdf + + User Manual + https://github.com/allwinner-zh/documents/raw/master/A83T/A83T_User_Manual_v1.5.1_20150513.pdf + + * Quad ARM Cortex-A53 based SoCs + - Allwinner A64 + + Datasheet + http://dl.linux-sunxi.org/A64/A64_Datasheet_V1.1.pdf + + User Manual + http://dl.linux-sunxi.org/A64/Allwinner%20A64%20User%20Manual%20v1.0.pdf diff --git a/Documentation/arm64/silicon-errata.txt b/Documentation/arm64/silicon-errata.txt index ccc60324e738..405da11fc3e4 100644 --- a/Documentation/arm64/silicon-errata.txt +++ b/Documentation/arm64/silicon-errata.txt @@ -61,3 +61,5 @@ stable kernels. | Cavium | ThunderX GICv3 | #23154 | CAVIUM_ERRATUM_23154 | | Cavium | ThunderX Core | #27456 | CAVIUM_ERRATUM_27456 | | Cavium | ThunderX SMMUv2 | #27704 | N/A | +| | | | | +| Freescale/NXP | LS2080A/LS1043A | A-008585 | FSL_ERRATUM_A008585 | diff --git a/Documentation/clk.txt b/Documentation/clk.txt index 5c4bc4d01d0c..22f026aa2f34 100644 --- a/Documentation/clk.txt +++ b/Documentation/clk.txt @@ -31,24 +31,25 @@ serve as a convenient shorthand for the implementation of the hardware-specific bits for the hypothetical "foo" hardware. Tying the two halves of this interface together is struct clk_hw, which -is defined in struct clk_foo and pointed to within struct clk. This +is defined in struct clk_foo and pointed to within struct clk_core. This allows for easy navigation between the two discrete halves of the common clock interface. Part 2 - common data structures and api -Below is the common struct clk definition from -include/linux/clk-private.h, modified for brevity: +Below is the common struct clk_core definition from +drivers/clk/clk.c, modified for brevity: - struct clk { + struct clk_core { const char *name; const struct clk_ops *ops; struct clk_hw *hw; - char **parent_names; - struct clk **parents; - struct clk *parent; - struct hlist_head children; - struct hlist_node child_node; + struct module *owner; + struct clk_core *parent; + const char **parent_names; + struct clk_core **parents; + u8 num_parents; + u8 new_parent_index; ... }; @@ -56,16 +57,19 @@ The members above make up the core of the clk tree topology. The clk api itself defines several driver-facing functions which operate on struct clk. That api is documented in include/linux/clk.h. -Platforms and devices utilizing the common struct clk use the struct -clk_ops pointer in struct clk to perform the hardware-specific parts of -the operations defined in clk.h: +Platforms and devices utilizing the common struct clk_core use the struct +clk_ops pointer in struct clk_core to perform the hardware-specific parts of +the operations defined in clk-provider.h: struct clk_ops { int (*prepare)(struct clk_hw *hw); void (*unprepare)(struct clk_hw *hw); + int (*is_prepared)(struct clk_hw *hw); + void (*unprepare_unused)(struct clk_hw *hw); int (*enable)(struct clk_hw *hw); void (*disable)(struct clk_hw *hw); int (*is_enabled)(struct clk_hw *hw); + void (*disable_unused)(struct clk_hw *hw); unsigned long (*recalc_rate)(struct clk_hw *hw, unsigned long parent_rate); long (*round_rate)(struct clk_hw *hw, @@ -84,6 +88,8 @@ the operations defined in clk.h: u8 index); unsigned long (*recalc_accuracy)(struct clk_hw *hw, unsigned long parent_accuracy); + int (*get_phase)(struct clk_hw *hw); + int (*set_phase)(struct clk_hw *hw, int degrees); void (*init)(struct clk_hw *hw); int (*debug_init)(struct clk_hw *hw, struct dentry *dentry); @@ -91,7 +97,7 @@ the operations defined in clk.h: Part 3 - hardware clk implementations -The strength of the common struct clk comes from its .ops and .hw pointers +The strength of the common struct clk_core comes from its .ops and .hw pointers which abstract the details of struct clk from the hardware-specific bits, and vice versa. To illustrate consider the simple gateable clk implementation in drivers/clk/clk-gate.c: @@ -107,7 +113,7 @@ struct clk_gate contains struct clk_hw hw as well as hardware-specific knowledge about which register and bit controls this clk's gating. Nothing about clock topology or accounting, such as enable_count or notifier_count, is needed here. That is all handled by the common -framework code and struct clk. +framework code and struct clk_core. Let's walk through enabling this clk from driver code: @@ -139,22 +145,18 @@ static void clk_gate_set_bit(struct clk_gate *gate) Note that to_clk_gate is defined as: -#define to_clk_gate(_hw) container_of(_hw, struct clk_gate, clk) +#define to_clk_gate(_hw) container_of(_hw, struct clk_gate, hw) This pattern of abstraction is used for every clock hardware representation. Part 4 - supporting your own clk hardware -When implementing support for a new type of clock it only necessary to +When implementing support for a new type of clock it is only necessary to include the following header: #include <linux/clk-provider.h> -include/linux/clk.h is included within that header and clk-private.h -must never be included from the code which implements the operations for -a clock. More on that below in Part 5. - To construct a clk hardware structure for your platform you must define the following: diff --git a/Documentation/conf.py b/Documentation/conf.py index 106ae9c740b9..0cc8765d3f98 100644 --- a/Documentation/conf.py +++ b/Documentation/conf.py @@ -14,11 +14,17 @@ import sys import os +import sphinx + +# Get Sphinx version +major, minor, patch = map(int, sphinx.__version__.split(".")) + # If extensions (or modules to document with autodoc) are in another directory, # add these directories to sys.path here. If the directory is relative to the # documentation root, use os.path.abspath to make it absolute, like shown here. sys.path.insert(0, os.path.abspath('sphinx')) +from load_config import loadConfig # -- General configuration ------------------------------------------------ @@ -28,14 +34,13 @@ sys.path.insert(0, os.path.abspath('sphinx')) # Add any Sphinx extension module names here, as strings. They can be # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom # ones. -extensions = ['kernel-doc', 'rstFlatTable', 'kernel_include'] +extensions = ['kernel-doc', 'rstFlatTable', 'kernel_include', 'cdomain'] -# Gracefully handle missing rst2pdf. -try: - import rst2pdf - extensions += ['rst2pdf.pdfbuilder'] -except ImportError: - pass +# The name of the math extension changed on Sphinx 1.4 +if minor > 3: + extensions.append("sphinx.ext.imgmath") +else: + extensions.append("sphinx.ext.pngmath") # Add any paths that contain templates here, relative to this directory. templates_path = ['_templates'] @@ -252,23 +257,90 @@ htmlhelp_basename = 'TheLinuxKerneldoc' latex_elements = { # The paper size ('letterpaper' or 'a4paper'). -#'papersize': 'letterpaper', +'papersize': 'a4paper', # The font size ('10pt', '11pt' or '12pt'). -#'pointsize': '10pt', - -# Additional stuff for the LaTeX preamble. -#'preamble': '', +'pointsize': '8pt', # Latex figure (float) alignment #'figure_align': 'htbp', + +# Don't mangle with UTF-8 chars +'inputenc': '', +'utf8extra': '', + +# Additional stuff for the LaTeX preamble. + 'preamble': ''' + % Adjust margins + \\usepackage[margin=0.5in, top=1in, bottom=1in]{geometry} + + % Allow generate some pages in landscape + \\usepackage{lscape} + + % Put notes in color and let them be inside a table + \\definecolor{NoteColor}{RGB}{204,255,255} + \\definecolor{WarningColor}{RGB}{255,204,204} + \\definecolor{AttentionColor}{RGB}{255,255,204} + \\definecolor{OtherColor}{RGB}{204,204,204} + \\newlength{\\mynoticelength} + \\makeatletter\\newenvironment{coloredbox}[1]{% + \\setlength{\\fboxrule}{1pt} + \\setlength{\\fboxsep}{7pt} + \\setlength{\\mynoticelength}{\\linewidth} + \\addtolength{\\mynoticelength}{-2\\fboxsep} + \\addtolength{\\mynoticelength}{-2\\fboxrule} + \\begin{lrbox}{\\@tempboxa}\\begin{minipage}{\\mynoticelength}}{\\end{minipage}\\end{lrbox}% + \\ifthenelse% + {\\equal{\\py@noticetype}{note}}% + {\\colorbox{NoteColor}{\\usebox{\\@tempboxa}}}% + {% + \\ifthenelse% + {\\equal{\\py@noticetype}{warning}}% + {\\colorbox{WarningColor}{\\usebox{\\@tempboxa}}}% + {% + \\ifthenelse% + {\\equal{\\py@noticetype}{attention}}% + {\\colorbox{AttentionColor}{\\usebox{\\@tempboxa}}}% + {\\colorbox{OtherColor}{\\usebox{\\@tempboxa}}}% + }% + }% + }\\makeatother + + \\makeatletter + \\renewenvironment{notice}[2]{% + \\def\\py@noticetype{#1} + \\begin{coloredbox}{#1} + \\bf\\it + \\par\\strong{#2} + \\csname py@noticestart@#1\\endcsname + } + { + \\csname py@noticeend@\\py@noticetype\\endcsname + \\end{coloredbox} + } + \\makeatother + + % Use some font with UTF-8 support with XeLaTeX + \\usepackage{fontspec} + \\setsansfont{DejaVu Serif} + \\setromanfont{DejaVu Sans} + \\setmonofont{DejaVu Sans Mono} + + % To allow adjusting table sizes + \\usepackage{adjustbox} + + ''' } # Grouping the document tree into LaTeX files. List of tuples # (source start file, target name, title, # author, documentclass [howto, manual, or own class]). latex_documents = [ - (master_doc, 'TheLinuxKernel.tex', 'The Linux Kernel Documentation', + ('kernel-documentation', 'kernel-documentation.tex', 'The Linux Kernel Documentation', + 'The kernel development community', 'manual'), + ('development-process/index', 'development-process.tex', 'Linux Kernel Development Documentation', + 'The kernel development community', 'manual'), + ('gpu/index', 'gpu.tex', 'Linux GPU Driver Developer\'s Guide', 'The kernel development community', 'manual'), ] @@ -419,3 +491,9 @@ pdf_documents = [ # line arguments. kerneldoc_bin = '../scripts/kernel-doc' kerneldoc_srctree = '..' + +# ------------------------------------------------------------------------------ +# Since loadConfig overwrites settings from the global namespace, it has to be +# the last statement in the conf.py file +# ------------------------------------------------------------------------------ +loadConfig(globals()) diff --git a/Documentation/cpu-freq/cpufreq-stats.txt b/Documentation/cpu-freq/cpufreq-stats.txt index fc647492e940..8d9773f23550 100644 --- a/Documentation/cpu-freq/cpufreq-stats.txt +++ b/Documentation/cpu-freq/cpufreq-stats.txt @@ -103,7 +103,7 @@ Config Main Menu Power management options (ACPI, APM) ---> CPU Frequency scaling ---> [*] CPU Frequency scaling - <*> CPU frequency translation statistics + [*] CPU frequency translation statistics [*] CPU frequency translation statistics details diff --git a/Documentation/coccinelle.txt b/Documentation/dev-tools/coccinelle.rst index 01fb1dae3163..4a64b4c69d3f 100644 --- a/Documentation/coccinelle.txt +++ b/Documentation/dev-tools/coccinelle.rst @@ -1,10 +1,18 @@ -Copyright 2010 Nicolas Palix <npalix@diku.dk> -Copyright 2010 Julia Lawall <julia@diku.dk> -Copyright 2010 Gilles Muller <Gilles.Muller@lip6.fr> +.. Copyright 2010 Nicolas Palix <npalix@diku.dk> +.. Copyright 2010 Julia Lawall <julia@diku.dk> +.. Copyright 2010 Gilles Muller <Gilles.Muller@lip6.fr> +.. highlight:: none - Getting Coccinelle -~~~~~~~~~~~~~~~~~~~~ +Coccinelle +========== + +Coccinelle is a tool for pattern matching and text transformation that has +many uses in kernel development, including the application of complex, +tree-wide patches and detection of problematic programming patterns. + +Getting Coccinelle +------------------- The semantic patches included in the kernel use features and options which are provided by Coccinelle version 1.0.0-rc11 and above. @@ -22,24 +30,23 @@ of many distributions, e.g. : - NetBSD - FreeBSD - You can get the latest version released from the Coccinelle homepage at http://coccinelle.lip6.fr/ Information and tips about Coccinelle are also provided on the wiki pages at http://cocci.ekstranet.diku.dk/wiki/doku.php -Once you have it, run the following command: +Once you have it, run the following command:: ./configure make -as a regular user, and install it with +as a regular user, and install it with:: sudo make install - Supplemental documentation -~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Supplemental documentation +--------------------------- For supplemental documentation refer to the wiki: @@ -47,49 +54,52 @@ https://bottest.wiki.kernel.org/coccicheck The wiki documentation always refers to the linux-next version of the script. - Using Coccinelle on the Linux kernel -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Using Coccinelle on the Linux kernel +------------------------------------ A Coccinelle-specific target is defined in the top level -Makefile. This target is named 'coccicheck' and calls the 'coccicheck' -front-end in the 'scripts' directory. +Makefile. This target is named ``coccicheck`` and calls the ``coccicheck`` +front-end in the ``scripts`` directory. -Four basic modes are defined: patch, report, context, and org. The mode to -use is specified by setting the MODE variable with 'MODE=<mode>'. +Four basic modes are defined: ``patch``, ``report``, ``context``, and +``org``. The mode to use is specified by setting the MODE variable with +``MODE=<mode>``. -'patch' proposes a fix, when possible. +- ``patch`` proposes a fix, when possible. -'report' generates a list in the following format: +- ``report`` generates a list in the following format: file:line:column-column: message -'context' highlights lines of interest and their context in a -diff-like style.Lines of interest are indicated with '-'. +- ``context`` highlights lines of interest and their context in a + diff-like style.Lines of interest are indicated with ``-``. -'org' generates a report in the Org mode format of Emacs. +- ``org`` generates a report in the Org mode format of Emacs. Note that not all semantic patches implement all modes. For easy use of Coccinelle, the default mode is "report". Two other modes provide some common combinations of these modes. -'chain' tries the previous modes in the order above until one succeeds. +- ``chain`` tries the previous modes in the order above until one succeeds. + +- ``rep+ctxt`` runs successively the report mode and the context mode. + It should be used with the C option (described later) + which checks the code on a file basis. -'rep+ctxt' runs successively the report mode and the context mode. - It should be used with the C option (described later) - which checks the code on a file basis. +Examples +~~~~~~~~ -Examples: - To make a report for every semantic patch, run the following command: +To make a report for every semantic patch, run the following command:: make coccicheck MODE=report - To produce patches, run: +To produce patches, run:: make coccicheck MODE=patch The coccicheck target applies every semantic patch available in the -sub-directories of 'scripts/coccinelle' to the entire Linux kernel. +sub-directories of ``scripts/coccinelle`` to the entire Linux kernel. For each semantic patch, a commit message is proposed. It gives a description of the problem being checked by the semantic patch, and @@ -99,15 +109,15 @@ As any static code analyzer, Coccinelle produces false positives. Thus, reports must be carefully checked, and patches reviewed. -To enable verbose messages set the V= variable, for example: +To enable verbose messages set the V= variable, for example:: make coccicheck MODE=report V=1 - Coccinelle parallelization -~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Coccinelle parallelization +--------------------------- By default, coccicheck tries to run as parallel as possible. To change -the parallelism, set the J= variable. For example, to run across 4 CPUs: +the parallelism, set the J= variable. For example, to run across 4 CPUs:: make coccicheck MODE=report J=4 @@ -115,44 +125,47 @@ As of Coccinelle 1.0.2 Coccinelle uses Ocaml parmap for parallelization, if support for this is detected you will benefit from parmap parallelization. When parmap is enabled coccicheck will enable dynamic load balancing by using -'--chunksize 1' argument, this ensures we keep feeding threads with work +``--chunksize 1`` argument, this ensures we keep feeding threads with work one by one, so that we avoid the situation where most work gets done by only a few threads. With dynamic load balancing, if a thread finishes early we keep feeding it more work. When parmap is enabled, if an error occurs in Coccinelle, this error -value is propagated back, the return value of the 'make coccicheck' +value is propagated back, the return value of the ``make coccicheck`` captures this return value. - Using Coccinelle with a single semantic patch -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Using Coccinelle with a single semantic patch +--------------------------------------------- The optional make variable COCCI can be used to check a single semantic patch. In that case, the variable must be initialized with the name of the semantic patch to apply. -For instance: +For instance:: make coccicheck COCCI=<my_SP.cocci> MODE=patch -or + +or:: + make coccicheck COCCI=<my_SP.cocci> MODE=report - Controlling Which Files are Processed by Coccinelle -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Controlling Which Files are Processed by Coccinelle +--------------------------------------------------- + By default the entire kernel source tree is checked. -To apply Coccinelle to a specific directory, M= can be used. -For example, to check drivers/net/wireless/ one may write: +To apply Coccinelle to a specific directory, ``M=`` can be used. +For example, to check drivers/net/wireless/ one may write:: make coccicheck M=drivers/net/wireless/ To apply Coccinelle on a file basis, instead of a directory basis, the -following command may be used: +following command may be used:: make C=1 CHECK="scripts/coccicheck" -To check only newly edited code, use the value 2 for the C flag, i.e. +To check only newly edited code, use the value 2 for the C flag, i.e.:: make C=2 CHECK="scripts/coccicheck" @@ -166,8 +179,8 @@ semantic patch as shown in the previous section. The "report" mode is the default. You can select another one with the MODE variable explained above. - Debugging Coccinelle SmPL patches -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Debugging Coccinelle SmPL patches +--------------------------------- Using coccicheck is best as it provides in the spatch command line include options matching the options used when we compile the kernel. @@ -177,8 +190,8 @@ manually run Coccinelle with debug options added. Alternatively you can debug running Coccinelle against SmPL patches by asking for stderr to be redirected to stderr, by default stderr is redirected to /dev/null, if you'd like to capture stderr you -can specify the DEBUG_FILE="file.txt" option to coccicheck. For -instance: +can specify the ``DEBUG_FILE="file.txt"`` option to coccicheck. For +instance:: rm -f cocci.err make coccicheck COCCI=scripts/coccinelle/free/kfree.cocci MODE=report DEBUG_FILE=cocci.err @@ -186,7 +199,7 @@ instance: You can use SPFLAGS to add debugging flags, for instance you may want to add both --profile --show-trying to SPFLAGS when debugging. For instance -you may want to use: +you may want to use:: rm -f err.log export COCCI=scripts/coccinelle/misc/irqf_oneshot.cocci @@ -198,24 +211,24 @@ work. DEBUG_FILE support is only supported when using coccinelle >= 1.2. - .cocciconfig support -~~~~~~~~~~~~~~~~~~~~~~ +.cocciconfig support +-------------------- Coccinelle supports reading .cocciconfig for default Coccinelle options that should be used every time spatch is spawned, the order of precedence for variables for .cocciconfig is as follows: - o Your current user's home directory is processed first - o Your directory from which spatch is called is processed next - o The directory provided with the --dir option is processed last, if used +- Your current user's home directory is processed first +- Your directory from which spatch is called is processed next +- The directory provided with the --dir option is processed last, if used Since coccicheck runs through make, it naturally runs from the kernel proper dir, as such the second rule above would be implied for picking up a -.cocciconfig when using 'make coccicheck'. +.cocciconfig when using ``make coccicheck``. -'make coccicheck' also supports using M= targets.If you do not supply +``make coccicheck`` also supports using M= targets.If you do not supply any M= target, it is assumed you want to target the entire kernel. -The kernel coccicheck script has: +The kernel coccicheck script has:: if [ "$KBUILD_EXTMOD" = "" ] ; then OPTIONS="--dir $srctree $COCCIINCLUDE" @@ -235,12 +248,12 @@ override any of the kernel's .coccicheck's settings using SPFLAGS. We help Coccinelle when used against Linux with a set of sensible defaults options for Linux with our own Linux .cocciconfig. This hints to coccinelle -git can be used for 'git grep' queries over coccigrep. A timeout of 200 +git can be used for ``git grep`` queries over coccigrep. A timeout of 200 seconds should suffice for now. The options picked up by coccinelle when reading a .cocciconfig do not appear as arguments to spatch processes running on your system, to confirm what -options will be used by Coccinelle run: +options will be used by Coccinelle run:: spatch --print-options-only @@ -252,219 +265,227 @@ carries its own .cocciconfig, you will need to use SPFLAGS to use idutils if desired. See below section "Additional flags" for more details on how to use idutils. - Additional flags -~~~~~~~~~~~~~~~~~~ +Additional flags +---------------- Additional flags can be passed to spatch through the SPFLAGS variable. This works as Coccinelle respects the last flags -given to it when options are in conflict. +given to it when options are in conflict. :: make SPFLAGS=--use-glimpse coccicheck Coccinelle supports idutils as well but requires coccinelle >= 1.0.6. When no ID file is specified coccinelle assumes your ID database file is in the file .id-utils.index on the top level of the kernel, coccinelle -carries a script scripts/idutils_index.sh which creates the database with +carries a script scripts/idutils_index.sh which creates the database with:: mkid -i C --output .id-utils.index If you have another database filename you can also just symlink with this -name. +name. :: make SPFLAGS=--use-idutils coccicheck Alternatively you can specify the database filename explicitly, for -instance: +instance:: make SPFLAGS="--use-idutils /full-path/to/ID" coccicheck -See spatch --help to learn more about spatch options. +See ``spatch --help`` to learn more about spatch options. -Note that the '--use-glimpse' and '--use-idutils' options +Note that the ``--use-glimpse`` and ``--use-idutils`` options require external tools for indexing the code. None of them is thus active by default. However, by indexing the code with one of these tools, and according to the cocci file used, spatch could proceed the entire code base more quickly. - SmPL patch specific options -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +SmPL patch specific options +--------------------------- SmPL patches can have their own requirements for options passed to Coccinelle. SmPL patch specific options can be provided by -providing them at the top of the SmPL patch, for instance: +providing them at the top of the SmPL patch, for instance:: -// Options: --no-includes --include-headers + // Options: --no-includes --include-headers - SmPL patch Coccinelle requirements -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +SmPL patch Coccinelle requirements +---------------------------------- As Coccinelle features get added some more advanced SmPL patches may require newer versions of Coccinelle. If an SmPL patch requires at least a version of Coccinelle, this can be specified as follows, -as an example if requiring at least Coccinelle >= 1.0.5: +as an example if requiring at least Coccinelle >= 1.0.5:: -// Requires: 1.0.5 + // Requires: 1.0.5 - Proposing new semantic patches -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Proposing new semantic patches +------------------------------- New semantic patches can be proposed and submitted by kernel developers. For sake of clarity, they should be organized in the -sub-directories of 'scripts/coccinelle/'. +sub-directories of ``scripts/coccinelle/``. - Detailed description of the 'report' mode -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Detailed description of the ``report`` mode +------------------------------------------- + +``report`` generates a list in the following format:: -'report' generates a list in the following format: file:line:column-column: message -Example: +Example +~~~~~~~ -Running +Running:: make coccicheck MODE=report COCCI=scripts/coccinelle/api/err_cast.cocci -will execute the following part of the SmPL script. +will execute the following part of the SmPL script:: -<smpl> -@r depends on !context && !patch && (org || report)@ -expression x; -position p; -@@ + <smpl> + @r depends on !context && !patch && (org || report)@ + expression x; + position p; + @@ - ERR_PTR@p(PTR_ERR(x)) + ERR_PTR@p(PTR_ERR(x)) -@script:python depends on report@ -p << r.p; -x << r.x; -@@ + @script:python depends on report@ + p << r.p; + x << r.x; + @@ -msg="ERR_CAST can be used with %s" % (x) -coccilib.report.print_report(p[0], msg) -</smpl> + msg="ERR_CAST can be used with %s" % (x) + coccilib.report.print_report(p[0], msg) + </smpl> This SmPL excerpt generates entries on the standard output, as -illustrated below: +illustrated below:: -/home/user/linux/crypto/ctr.c:188:9-16: ERR_CAST can be used with alg -/home/user/linux/crypto/authenc.c:619:9-16: ERR_CAST can be used with auth -/home/user/linux/crypto/xts.c:227:9-16: ERR_CAST can be used with alg + /home/user/linux/crypto/ctr.c:188:9-16: ERR_CAST can be used with alg + /home/user/linux/crypto/authenc.c:619:9-16: ERR_CAST can be used with auth + /home/user/linux/crypto/xts.c:227:9-16: ERR_CAST can be used with alg - Detailed description of the 'patch' mode -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Detailed description of the ``patch`` mode +------------------------------------------ -When the 'patch' mode is available, it proposes a fix for each problem +When the ``patch`` mode is available, it proposes a fix for each problem identified. -Example: +Example +~~~~~~~ + +Running:: -Running make coccicheck MODE=patch COCCI=scripts/coccinelle/api/err_cast.cocci -will execute the following part of the SmPL script. +will execute the following part of the SmPL script:: -<smpl> -@ depends on !context && patch && !org && !report @ -expression x; -@@ + <smpl> + @ depends on !context && patch && !org && !report @ + expression x; + @@ -- ERR_PTR(PTR_ERR(x)) -+ ERR_CAST(x) -</smpl> + - ERR_PTR(PTR_ERR(x)) + + ERR_CAST(x) + </smpl> This SmPL excerpt generates patch hunks on the standard output, as -illustrated below: +illustrated below:: -diff -u -p a/crypto/ctr.c b/crypto/ctr.c ---- a/crypto/ctr.c 2010-05-26 10:49:38.000000000 +0200 -+++ b/crypto/ctr.c 2010-06-03 23:44:49.000000000 +0200 -@@ -185,7 +185,7 @@ static struct crypto_instance *crypto_ct + diff -u -p a/crypto/ctr.c b/crypto/ctr.c + --- a/crypto/ctr.c 2010-05-26 10:49:38.000000000 +0200 + +++ b/crypto/ctr.c 2010-06-03 23:44:49.000000000 +0200 + @@ -185,7 +185,7 @@ static struct crypto_instance *crypto_ct alg = crypto_attr_alg(tb[1], CRYPTO_ALG_TYPE_CIPHER, CRYPTO_ALG_TYPE_MASK); if (IS_ERR(alg)) -- return ERR_PTR(PTR_ERR(alg)); -+ return ERR_CAST(alg); - + - return ERR_PTR(PTR_ERR(alg)); + + return ERR_CAST(alg); + /* Block size must be >= 4 bytes. */ err = -EINVAL; - Detailed description of the 'context' mode -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Detailed description of the ``context`` mode +-------------------------------------------- -'context' highlights lines of interest and their context +``context`` highlights lines of interest and their context in a diff-like style. -NOTE: The diff-like output generated is NOT an applicable patch. The - intent of the 'context' mode is to highlight the important lines - (annotated with minus, '-') and gives some surrounding context + **NOTE**: The diff-like output generated is NOT an applicable patch. The + intent of the ``context`` mode is to highlight the important lines + (annotated with minus, ``-``) and gives some surrounding context lines around. This output can be used with the diff mode of Emacs to review the code. -Example: +Example +~~~~~~~ + +Running:: -Running make coccicheck MODE=context COCCI=scripts/coccinelle/api/err_cast.cocci -will execute the following part of the SmPL script. +will execute the following part of the SmPL script:: -<smpl> -@ depends on context && !patch && !org && !report@ -expression x; -@@ + <smpl> + @ depends on context && !patch && !org && !report@ + expression x; + @@ -* ERR_PTR(PTR_ERR(x)) -</smpl> + * ERR_PTR(PTR_ERR(x)) + </smpl> This SmPL excerpt generates diff hunks on the standard output, as -illustrated below: +illustrated below:: -diff -u -p /home/user/linux/crypto/ctr.c /tmp/nothing ---- /home/user/linux/crypto/ctr.c 2010-05-26 10:49:38.000000000 +0200 -+++ /tmp/nothing -@@ -185,7 +185,6 @@ static struct crypto_instance *crypto_ct + diff -u -p /home/user/linux/crypto/ctr.c /tmp/nothing + --- /home/user/linux/crypto/ctr.c 2010-05-26 10:49:38.000000000 +0200 + +++ /tmp/nothing + @@ -185,7 +185,6 @@ static struct crypto_instance *crypto_ct alg = crypto_attr_alg(tb[1], CRYPTO_ALG_TYPE_CIPHER, CRYPTO_ALG_TYPE_MASK); if (IS_ERR(alg)) -- return ERR_PTR(PTR_ERR(alg)); - + - return ERR_PTR(PTR_ERR(alg)); + /* Block size must be >= 4 bytes. */ err = -EINVAL; - Detailed description of the 'org' mode -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Detailed description of the ``org`` mode +---------------------------------------- + +``org`` generates a report in the Org mode format of Emacs. -'org' generates a report in the Org mode format of Emacs. +Example +~~~~~~~ -Example: +Running:: -Running make coccicheck MODE=org COCCI=scripts/coccinelle/api/err_cast.cocci -will execute the following part of the SmPL script. +will execute the following part of the SmPL script:: -<smpl> -@r depends on !context && !patch && (org || report)@ -expression x; -position p; -@@ + <smpl> + @r depends on !context && !patch && (org || report)@ + expression x; + position p; + @@ - ERR_PTR@p(PTR_ERR(x)) + ERR_PTR@p(PTR_ERR(x)) -@script:python depends on org@ -p << r.p; -x << r.x; -@@ + @script:python depends on org@ + p << r.p; + x << r.x; + @@ -msg="ERR_CAST can be used with %s" % (x) -msg_safe=msg.replace("[","@(").replace("]",")") -coccilib.org.print_todo(p[0], msg_safe) -</smpl> + msg="ERR_CAST can be used with %s" % (x) + msg_safe=msg.replace("[","@(").replace("]",")") + coccilib.org.print_todo(p[0], msg_safe) + </smpl> This SmPL excerpt generates Org entries on the standard output, as -illustrated below: +illustrated below:: -* TODO [[view:/home/user/linux/crypto/ctr.c::face=ovl-face1::linb=188::colb=9::cole=16][ERR_CAST can be used with alg]] -* TODO [[view:/home/user/linux/crypto/authenc.c::face=ovl-face1::linb=619::colb=9::cole=16][ERR_CAST can be used with auth]] -* TODO [[view:/home/user/linux/crypto/xts.c::face=ovl-face1::linb=227::colb=9::cole=16][ERR_CAST can be used with alg]] + * TODO [[view:/home/user/linux/crypto/ctr.c::face=ovl-face1::linb=188::colb=9::cole=16][ERR_CAST can be used with alg]] + * TODO [[view:/home/user/linux/crypto/authenc.c::face=ovl-face1::linb=619::colb=9::cole=16][ERR_CAST can be used with auth]] + * TODO [[view:/home/user/linux/crypto/xts.c::face=ovl-face1::linb=227::colb=9::cole=16][ERR_CAST can be used with alg]] diff --git a/Documentation/dev-tools/gcov.rst b/Documentation/dev-tools/gcov.rst new file mode 100644 index 000000000000..19eedfea8800 --- /dev/null +++ b/Documentation/dev-tools/gcov.rst @@ -0,0 +1,256 @@ +Using gcov with the Linux kernel +================================ + +gcov profiling kernel support enables the use of GCC's coverage testing +tool gcov_ with the Linux kernel. Coverage data of a running kernel +is exported in gcov-compatible format via the "gcov" debugfs directory. +To get coverage data for a specific file, change to the kernel build +directory and use gcov with the ``-o`` option as follows (requires root):: + + # cd /tmp/linux-out + # gcov -o /sys/kernel/debug/gcov/tmp/linux-out/kernel spinlock.c + +This will create source code files annotated with execution counts +in the current directory. In addition, graphical gcov front-ends such +as lcov_ can be used to automate the process of collecting data +for the entire kernel and provide coverage overviews in HTML format. + +Possible uses: + +* debugging (has this line been reached at all?) +* test improvement (how do I change my test to cover these lines?) +* minimizing kernel configurations (do I need this option if the + associated code is never run?) + +.. _gcov: http://gcc.gnu.org/onlinedocs/gcc/Gcov.html +.. _lcov: http://ltp.sourceforge.net/coverage/lcov.php + + +Preparation +----------- + +Configure the kernel with:: + + CONFIG_DEBUG_FS=y + CONFIG_GCOV_KERNEL=y + +select the gcc's gcov format, default is autodetect based on gcc version:: + + CONFIG_GCOV_FORMAT_AUTODETECT=y + +and to get coverage data for the entire kernel:: + + CONFIG_GCOV_PROFILE_ALL=y + +Note that kernels compiled with profiling flags will be significantly +larger and run slower. Also CONFIG_GCOV_PROFILE_ALL may not be supported +on all architectures. + +Profiling data will only become accessible once debugfs has been +mounted:: + + mount -t debugfs none /sys/kernel/debug + + +Customization +------------- + +To enable profiling for specific files or directories, add a line +similar to the following to the respective kernel Makefile: + +- For a single file (e.g. main.o):: + + GCOV_PROFILE_main.o := y + +- For all files in one directory:: + + GCOV_PROFILE := y + +To exclude files from being profiled even when CONFIG_GCOV_PROFILE_ALL +is specified, use:: + + GCOV_PROFILE_main.o := n + +and:: + + GCOV_PROFILE := n + +Only files which are linked to the main kernel image or are compiled as +kernel modules are supported by this mechanism. + + +Files +----- + +The gcov kernel support creates the following files in debugfs: + +``/sys/kernel/debug/gcov`` + Parent directory for all gcov-related files. + +``/sys/kernel/debug/gcov/reset`` + Global reset file: resets all coverage data to zero when + written to. + +``/sys/kernel/debug/gcov/path/to/compile/dir/file.gcda`` + The actual gcov data file as understood by the gcov + tool. Resets file coverage data to zero when written to. + +``/sys/kernel/debug/gcov/path/to/compile/dir/file.gcno`` + Symbolic link to a static data file required by the gcov + tool. This file is generated by gcc when compiling with + option ``-ftest-coverage``. + + +Modules +------- + +Kernel modules may contain cleanup code which is only run during +module unload time. The gcov mechanism provides a means to collect +coverage data for such code by keeping a copy of the data associated +with the unloaded module. This data remains available through debugfs. +Once the module is loaded again, the associated coverage counters are +initialized with the data from its previous instantiation. + +This behavior can be deactivated by specifying the gcov_persist kernel +parameter:: + + gcov_persist=0 + +At run-time, a user can also choose to discard data for an unloaded +module by writing to its data file or the global reset file. + + +Separated build and test machines +--------------------------------- + +The gcov kernel profiling infrastructure is designed to work out-of-the +box for setups where kernels are built and run on the same machine. In +cases where the kernel runs on a separate machine, special preparations +must be made, depending on where the gcov tool is used: + +a) gcov is run on the TEST machine + + The gcov tool version on the test machine must be compatible with the + gcc version used for kernel build. Also the following files need to be + copied from build to test machine: + + from the source tree: + - all C source files + headers + + from the build tree: + - all C source files + headers + - all .gcda and .gcno files + - all links to directories + + It is important to note that these files need to be placed into the + exact same file system location on the test machine as on the build + machine. If any of the path components is symbolic link, the actual + directory needs to be used instead (due to make's CURDIR handling). + +b) gcov is run on the BUILD machine + + The following files need to be copied after each test case from test + to build machine: + + from the gcov directory in sysfs: + - all .gcda files + - all links to .gcno files + + These files can be copied to any location on the build machine. gcov + must then be called with the -o option pointing to that directory. + + Example directory setup on the build machine:: + + /tmp/linux: kernel source tree + /tmp/out: kernel build directory as specified by make O= + /tmp/coverage: location of the files copied from the test machine + + [user@build] cd /tmp/out + [user@build] gcov -o /tmp/coverage/tmp/out/init main.c + + +Troubleshooting +--------------- + +Problem + Compilation aborts during linker step. + +Cause + Profiling flags are specified for source files which are not + linked to the main kernel or which are linked by a custom + linker procedure. + +Solution + Exclude affected source files from profiling by specifying + ``GCOV_PROFILE := n`` or ``GCOV_PROFILE_basename.o := n`` in the + corresponding Makefile. + +Problem + Files copied from sysfs appear empty or incomplete. + +Cause + Due to the way seq_file works, some tools such as cp or tar + may not correctly copy files from sysfs. + +Solution + Use ``cat``' to read ``.gcda`` files and ``cp -d`` to copy links. + Alternatively use the mechanism shown in Appendix B. + + +Appendix A: gather_on_build.sh +------------------------------ + +Sample script to gather coverage meta files on the build machine +(see 6a):: + + #!/bin/bash + + KSRC=$1 + KOBJ=$2 + DEST=$3 + + if [ -z "$KSRC" ] || [ -z "$KOBJ" ] || [ -z "$DEST" ]; then + echo "Usage: $0 <ksrc directory> <kobj directory> <output.tar.gz>" >&2 + exit 1 + fi + + KSRC=$(cd $KSRC; printf "all:\n\t@echo \${CURDIR}\n" | make -f -) + KOBJ=$(cd $KOBJ; printf "all:\n\t@echo \${CURDIR}\n" | make -f -) + + find $KSRC $KOBJ \( -name '*.gcno' -o -name '*.[ch]' -o -type l \) -a \ + -perm /u+r,g+r | tar cfz $DEST -P -T - + + if [ $? -eq 0 ] ; then + echo "$DEST successfully created, copy to test system and unpack with:" + echo " tar xfz $DEST -P" + else + echo "Could not create file $DEST" + fi + + +Appendix B: gather_on_test.sh +----------------------------- + +Sample script to gather coverage data files on the test machine +(see 6b):: + + #!/bin/bash -e + + DEST=$1 + GCDA=/sys/kernel/debug/gcov + + if [ -z "$DEST" ] ; then + echo "Usage: $0 <output.tar.gz>" >&2 + exit 1 + fi + + TEMPDIR=$(mktemp -d) + echo Collecting data.. + find $GCDA -type d -exec mkdir -p $TEMPDIR/\{\} \; + find $GCDA -name '*.gcda' -exec sh -c 'cat < $0 > '$TEMPDIR'/$0' {} \; + find $GCDA -name '*.gcno' -exec sh -c 'cp -d $0 '$TEMPDIR'/$0' {} \; + tar czf $DEST -C $TEMPDIR sys + rm -rf $TEMPDIR + + echo "$DEST successfully created, copy to build system and unpack with:" + echo " tar xfz $DEST" diff --git a/Documentation/gdb-kernel-debugging.txt b/Documentation/dev-tools/gdb-kernel-debugging.rst index 7050ce8794b9..5e93c9bc6619 100644 --- a/Documentation/gdb-kernel-debugging.txt +++ b/Documentation/dev-tools/gdb-kernel-debugging.rst @@ -1,3 +1,5 @@ +.. highlight:: none + Debugging kernel and modules via gdb ==================================== @@ -13,54 +15,58 @@ be transferred to the other gdb stubs as well. Requirements ------------ - o gdb 7.2+ (recommended: 7.4+) with python support enabled (typically true - for distributions) +- gdb 7.2+ (recommended: 7.4+) with python support enabled (typically true + for distributions) Setup ----- - o Create a virtual Linux machine for QEMU/KVM (see www.linux-kvm.org and - www.qemu.org for more details). For cross-development, - http://landley.net/aboriginal/bin keeps a pool of machine images and - toolchains that can be helpful to start from. +- Create a virtual Linux machine for QEMU/KVM (see www.linux-kvm.org and + www.qemu.org for more details). For cross-development, + http://landley.net/aboriginal/bin keeps a pool of machine images and + toolchains that can be helpful to start from. - o Build the kernel with CONFIG_GDB_SCRIPTS enabled, but leave - CONFIG_DEBUG_INFO_REDUCED off. If your architecture supports - CONFIG_FRAME_POINTER, keep it enabled. +- Build the kernel with CONFIG_GDB_SCRIPTS enabled, but leave + CONFIG_DEBUG_INFO_REDUCED off. If your architecture supports + CONFIG_FRAME_POINTER, keep it enabled. - o Install that kernel on the guest. +- Install that kernel on the guest. + Alternatively, QEMU allows to boot the kernel directly using -kernel, + -append, -initrd command line switches. This is generally only useful if + you do not depend on modules. See QEMU documentation for more details on + this mode. - Alternatively, QEMU allows to boot the kernel directly using -kernel, - -append, -initrd command line switches. This is generally only useful if - you do not depend on modules. See QEMU documentation for more details on - this mode. +- Enable the gdb stub of QEMU/KVM, either - o Enable the gdb stub of QEMU/KVM, either - at VM startup time by appending "-s" to the QEMU command line - or + + or + - during runtime by issuing "gdbserver" from the QEMU monitor console - o cd /path/to/linux-build +- cd /path/to/linux-build - o Start gdb: gdb vmlinux +- Start gdb: gdb vmlinux - Note: Some distros may restrict auto-loading of gdb scripts to known safe - directories. In case gdb reports to refuse loading vmlinux-gdb.py, add + Note: Some distros may restrict auto-loading of gdb scripts to known safe + directories. In case gdb reports to refuse loading vmlinux-gdb.py, add:: add-auto-load-safe-path /path/to/linux-build - to ~/.gdbinit. See gdb help for more details. + to ~/.gdbinit. See gdb help for more details. + +- Attach to the booted guest:: - o Attach to the booted guest: (gdb) target remote :1234 Examples of using the Linux-provided gdb helpers ------------------------------------------------ - o Load module (and main kernel) symbols: +- Load module (and main kernel) symbols:: + (gdb) lx-symbols loading vmlinux scanning for modules in /home/user/linux/build @@ -72,17 +78,20 @@ Examples of using the Linux-provided gdb helpers ... loading @0xffffffffa0000000: /home/user/linux/build/drivers/ata/ata_generic.ko - o Set a breakpoint on some not yet loaded module function, e.g.: +- Set a breakpoint on some not yet loaded module function, e.g.:: + (gdb) b btrfs_init_sysfs Function "btrfs_init_sysfs" not defined. Make breakpoint pending on future shared library load? (y or [n]) y Breakpoint 1 (btrfs_init_sysfs) pending. - o Continue the target +- Continue the target:: + (gdb) c - o Load the module on the target and watch the symbols being loaded as well as - the breakpoint hit: +- Load the module on the target and watch the symbols being loaded as well as + the breakpoint hit:: + loading @0xffffffffa0034000: /home/user/linux/build/lib/libcrc32c.ko loading @0xffffffffa0050000: /home/user/linux/build/lib/lzo/lzo_compress.ko loading @0xffffffffa006e000: /home/user/linux/build/lib/zlib_deflate/zlib_deflate.ko @@ -91,7 +100,8 @@ Examples of using the Linux-provided gdb helpers Breakpoint 1, btrfs_init_sysfs () at /home/user/linux/fs/btrfs/sysfs.c:36 36 btrfs_kset = kset_create_and_add("btrfs", NULL, fs_kobj); - o Dump the log buffer of the target kernel: +- Dump the log buffer of the target kernel:: + (gdb) lx-dmesg [ 0.000000] Initializing cgroup subsys cpuset [ 0.000000] Initializing cgroup subsys cpu @@ -102,19 +112,22 @@ Examples of using the Linux-provided gdb helpers [ 0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved .... - o Examine fields of the current task struct: +- Examine fields of the current task struct:: + (gdb) p $lx_current().pid $1 = 4998 (gdb) p $lx_current().comm $2 = "modprobe\000\000\000\000\000\000\000" - o Make use of the per-cpu function for the current or a specified CPU: +- Make use of the per-cpu function for the current or a specified CPU:: + (gdb) p $lx_per_cpu("runqueues").nr_running $3 = 1 (gdb) p $lx_per_cpu("runqueues", 2).nr_running $4 = 0 - o Dig into hrtimers using the container_of helper: +- Dig into hrtimers using the container_of helper:: + (gdb) set $next = $lx_per_cpu("hrtimer_bases").clock_base[0].active.next (gdb) p *$container_of($next, "struct hrtimer", "node") $5 = { @@ -144,7 +157,7 @@ List of commands and functions ------------------------------ The number of commands and convenience functions may evolve over the time, -this is just a snapshot of the initial version: +this is just a snapshot of the initial version:: (gdb) apropos lx function lx_current -- Return current task diff --git a/Documentation/dev-tools/kasan.rst b/Documentation/dev-tools/kasan.rst new file mode 100644 index 000000000000..f7a18f274357 --- /dev/null +++ b/Documentation/dev-tools/kasan.rst @@ -0,0 +1,173 @@ +The Kernel Address Sanitizer (KASAN) +==================================== + +Overview +-------- + +KernelAddressSANitizer (KASAN) is a dynamic memory error detector. It provides +a fast and comprehensive solution for finding use-after-free and out-of-bounds +bugs. + +KASAN uses compile-time instrumentation for checking every memory access, +therefore you will need a GCC version 4.9.2 or later. GCC 5.0 or later is +required for detection of out-of-bounds accesses to stack or global variables. + +Currently KASAN is supported only for the x86_64 and arm64 architectures. + +Usage +----- + +To enable KASAN configure kernel with:: + + CONFIG_KASAN = y + +and choose between CONFIG_KASAN_OUTLINE and CONFIG_KASAN_INLINE. Outline and +inline are compiler instrumentation types. The former produces smaller binary +the latter is 1.1 - 2 times faster. Inline instrumentation requires a GCC +version 5.0 or later. + +KASAN works with both SLUB and SLAB memory allocators. +For better bug detection and nicer reporting, enable CONFIG_STACKTRACE. + +To disable instrumentation for specific files or directories, add a line +similar to the following to the respective kernel Makefile: + +- For a single file (e.g. main.o):: + + KASAN_SANITIZE_main.o := n + +- For all files in one directory:: + + KASAN_SANITIZE := n + +Error reports +~~~~~~~~~~~~~ + +A typical out of bounds access report looks like this:: + + ================================================================== + BUG: AddressSanitizer: out of bounds access in kmalloc_oob_right+0x65/0x75 [test_kasan] at addr ffff8800693bc5d3 + Write of size 1 by task modprobe/1689 + ============================================================================= + BUG kmalloc-128 (Not tainted): kasan error + ----------------------------------------------------------------------------- + + Disabling lock debugging due to kernel taint + INFO: Allocated in kmalloc_oob_right+0x3d/0x75 [test_kasan] age=0 cpu=0 pid=1689 + __slab_alloc+0x4b4/0x4f0 + kmem_cache_alloc_trace+0x10b/0x190 + kmalloc_oob_right+0x3d/0x75 [test_kasan] + init_module+0x9/0x47 [test_kasan] + do_one_initcall+0x99/0x200 + load_module+0x2cb3/0x3b20 + SyS_finit_module+0x76/0x80 + system_call_fastpath+0x12/0x17 + INFO: Slab 0xffffea0001a4ef00 objects=17 used=7 fp=0xffff8800693bd728 flags=0x100000000004080 + INFO: Object 0xffff8800693bc558 @offset=1368 fp=0xffff8800693bc720 + + Bytes b4 ffff8800693bc548: 00 00 00 00 00 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a ........ZZZZZZZZ + Object ffff8800693bc558: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk + Object ffff8800693bc568: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk + Object ffff8800693bc578: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk + Object ffff8800693bc588: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk + Object ffff8800693bc598: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk + Object ffff8800693bc5a8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk + Object ffff8800693bc5b8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk + Object ffff8800693bc5c8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 kkkkkkkkkkkkkkk. + Redzone ffff8800693bc5d8: cc cc cc cc cc cc cc cc ........ + Padding ffff8800693bc718: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ + CPU: 0 PID: 1689 Comm: modprobe Tainted: G B 3.18.0-rc1-mm1+ #98 + Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014 + ffff8800693bc000 0000000000000000 ffff8800693bc558 ffff88006923bb78 + ffffffff81cc68ae 00000000000000f3 ffff88006d407600 ffff88006923bba8 + ffffffff811fd848 ffff88006d407600 ffffea0001a4ef00 ffff8800693bc558 + Call Trace: + [<ffffffff81cc68ae>] dump_stack+0x46/0x58 + [<ffffffff811fd848>] print_trailer+0xf8/0x160 + [<ffffffffa00026a7>] ? kmem_cache_oob+0xc3/0xc3 [test_kasan] + [<ffffffff811ff0f5>] object_err+0x35/0x40 + [<ffffffffa0002065>] ? kmalloc_oob_right+0x65/0x75 [test_kasan] + [<ffffffff8120b9fa>] kasan_report_error+0x38a/0x3f0 + [<ffffffff8120a79f>] ? kasan_poison_shadow+0x2f/0x40 + [<ffffffff8120b344>] ? kasan_unpoison_shadow+0x14/0x40 + [<ffffffff8120a79f>] ? kasan_poison_shadow+0x2f/0x40 + [<ffffffffa00026a7>] ? kmem_cache_oob+0xc3/0xc3 [test_kasan] + [<ffffffff8120a995>] __asan_store1+0x75/0xb0 + [<ffffffffa0002601>] ? kmem_cache_oob+0x1d/0xc3 [test_kasan] + [<ffffffffa0002065>] ? kmalloc_oob_right+0x65/0x75 [test_kasan] + [<ffffffffa0002065>] kmalloc_oob_right+0x65/0x75 [test_kasan] + [<ffffffffa00026b0>] init_module+0x9/0x47 [test_kasan] + [<ffffffff810002d9>] do_one_initcall+0x99/0x200 + [<ffffffff811e4e5c>] ? __vunmap+0xec/0x160 + [<ffffffff81114f63>] load_module+0x2cb3/0x3b20 + [<ffffffff8110fd70>] ? m_show+0x240/0x240 + [<ffffffff81115f06>] SyS_finit_module+0x76/0x80 + [<ffffffff81cd3129>] system_call_fastpath+0x12/0x17 + Memory state around the buggy address: + ffff8800693bc300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc + ffff8800693bc380: fc fc 00 00 00 00 00 00 00 00 00 00 00 00 00 fc + ffff8800693bc400: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc + ffff8800693bc480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc + ffff8800693bc500: fc fc fc fc fc fc fc fc fc fc fc 00 00 00 00 00 + >ffff8800693bc580: 00 00 00 00 00 00 00 00 00 00 03 fc fc fc fc fc + ^ + ffff8800693bc600: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc + ffff8800693bc680: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc + ffff8800693bc700: fc fc fc fc fb fb fb fb fb fb fb fb fb fb fb fb + ffff8800693bc780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb + ffff8800693bc800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb + ================================================================== + +The header of the report discribe what kind of bug happened and what kind of +access caused it. It's followed by the description of the accessed slub object +(see 'SLUB Debug output' section in Documentation/vm/slub.txt for details) and +the description of the accessed memory page. + +In the last section the report shows memory state around the accessed address. +Reading this part requires some understanding of how KASAN works. + +The state of each 8 aligned bytes of memory is encoded in one shadow byte. +Those 8 bytes can be accessible, partially accessible, freed or be a redzone. +We use the following encoding for each shadow byte: 0 means that all 8 bytes +of the corresponding memory region are accessible; number N (1 <= N <= 7) means +that the first N bytes are accessible, and other (8 - N) bytes are not; +any negative value indicates that the entire 8-byte word is inaccessible. +We use different negative values to distinguish between different kinds of +inaccessible memory like redzones or freed memory (see mm/kasan/kasan.h). + +In the report above the arrows point to the shadow byte 03, which means that +the accessed address is partially accessible. + + +Implementation details +---------------------- + +From a high level, our approach to memory error detection is similar to that +of kmemcheck: use shadow memory to record whether each byte of memory is safe +to access, and use compile-time instrumentation to check shadow memory on each +memory access. + +AddressSanitizer dedicates 1/8 of kernel memory to its shadow memory +(e.g. 16TB to cover 128TB on x86_64) and uses direct mapping with a scale and +offset to translate a memory address to its corresponding shadow address. + +Here is the function which translates an address to its corresponding shadow +address:: + + static inline void *kasan_mem_to_shadow(const void *addr) + { + return ((unsigned long)addr >> KASAN_SHADOW_SCALE_SHIFT) + + KASAN_SHADOW_OFFSET; + } + +where ``KASAN_SHADOW_SCALE_SHIFT = 3``. + +Compile-time instrumentation used for checking memory accesses. Compiler inserts +function calls (__asan_load*(addr), __asan_store*(addr)) before each memory +access of size 1, 2, 4, 8 or 16. These functions check whether memory access is +valid or not by checking corresponding shadow memory. + +GCC 5.0 has possibility to perform inline instrumentation. Instead of making +function calls GCC directly inserts the code to check the shadow memory. +This option significantly enlarges kernel but it gives x1.1-x2 performance +boost over outline instrumented kernel. diff --git a/Documentation/kcov.txt b/Documentation/dev-tools/kcov.rst index 779ff4ab1c1d..aca0e27ca197 100644 --- a/Documentation/kcov.txt +++ b/Documentation/dev-tools/kcov.rst @@ -12,38 +12,38 @@ To achieve this goal it does not collect coverage in soft/hard interrupts and instrumentation of some inherently non-deterministic parts of kernel is disbled (e.g. scheduler, locking). -Usage: -====== +Usage +----- -Configure kernel with: +Configure the kernel with:: CONFIG_KCOV=y CONFIG_KCOV requires gcc built on revision 231296 or later. -Profiling data will only become accessible once debugfs has been mounted: +Profiling data will only become accessible once debugfs has been mounted:: mount -t debugfs none /sys/kernel/debug -The following program demonstrates kcov usage from within a test program: - -#include <stdio.h> -#include <stddef.h> -#include <stdint.h> -#include <stdlib.h> -#include <sys/types.h> -#include <sys/stat.h> -#include <sys/ioctl.h> -#include <sys/mman.h> -#include <unistd.h> -#include <fcntl.h> - -#define KCOV_INIT_TRACE _IOR('c', 1, unsigned long) -#define KCOV_ENABLE _IO('c', 100) -#define KCOV_DISABLE _IO('c', 101) -#define COVER_SIZE (64<<10) - -int main(int argc, char **argv) -{ +The following program demonstrates kcov usage from within a test program:: + + #include <stdio.h> + #include <stddef.h> + #include <stdint.h> + #include <stdlib.h> + #include <sys/types.h> + #include <sys/stat.h> + #include <sys/ioctl.h> + #include <sys/mman.h> + #include <unistd.h> + #include <fcntl.h> + + #define KCOV_INIT_TRACE _IOR('c', 1, unsigned long) + #define KCOV_ENABLE _IO('c', 100) + #define KCOV_DISABLE _IO('c', 101) + #define COVER_SIZE (64<<10) + + int main(int argc, char **argv) + { int fd; unsigned long *cover, n, i; @@ -83,24 +83,24 @@ int main(int argc, char **argv) if (close(fd)) perror("close"), exit(1); return 0; -} - -After piping through addr2line output of the program looks as follows: - -SyS_read -fs/read_write.c:562 -__fdget_pos -fs/file.c:774 -__fget_light -fs/file.c:746 -__fget_light -fs/file.c:750 -__fget_light -fs/file.c:760 -__fdget_pos -fs/file.c:784 -SyS_read -fs/read_write.c:562 + } + +After piping through addr2line output of the program looks as follows:: + + SyS_read + fs/read_write.c:562 + __fdget_pos + fs/file.c:774 + __fget_light + fs/file.c:746 + __fget_light + fs/file.c:750 + __fget_light + fs/file.c:760 + __fdget_pos + fs/file.c:784 + SyS_read + fs/read_write.c:562 If a program needs to collect coverage from several threads (independently), it needs to open /sys/kernel/debug/kcov in each thread separately. diff --git a/Documentation/dev-tools/kmemcheck.rst b/Documentation/dev-tools/kmemcheck.rst new file mode 100644 index 000000000000..7f3d1985de74 --- /dev/null +++ b/Documentation/dev-tools/kmemcheck.rst @@ -0,0 +1,733 @@ +Getting started with kmemcheck +============================== + +Vegard Nossum <vegardno@ifi.uio.no> + + +Introduction +------------ + +kmemcheck is a debugging feature for the Linux Kernel. More specifically, it +is a dynamic checker that detects and warns about some uses of uninitialized +memory. + +Userspace programmers might be familiar with Valgrind's memcheck. The main +difference between memcheck and kmemcheck is that memcheck works for userspace +programs only, and kmemcheck works for the kernel only. The implementations +are of course vastly different. Because of this, kmemcheck is not as accurate +as memcheck, but it turns out to be good enough in practice to discover real +programmer errors that the compiler is not able to find through static +analysis. + +Enabling kmemcheck on a kernel will probably slow it down to the extent that +the machine will not be usable for normal workloads such as e.g. an +interactive desktop. kmemcheck will also cause the kernel to use about twice +as much memory as normal. For this reason, kmemcheck is strictly a debugging +feature. + + +Downloading +----------- + +As of version 2.6.31-rc1, kmemcheck is included in the mainline kernel. + + +Configuring and compiling +------------------------- + +kmemcheck only works for the x86 (both 32- and 64-bit) platform. A number of +configuration variables must have specific settings in order for the kmemcheck +menu to even appear in "menuconfig". These are: + +- ``CONFIG_CC_OPTIMIZE_FOR_SIZE=n`` + This option is located under "General setup" / "Optimize for size". + + Without this, gcc will use certain optimizations that usually lead to + false positive warnings from kmemcheck. An example of this is a 16-bit + field in a struct, where gcc may load 32 bits, then discard the upper + 16 bits. kmemcheck sees only the 32-bit load, and may trigger a + warning for the upper 16 bits (if they're uninitialized). + +- ``CONFIG_SLAB=y`` or ``CONFIG_SLUB=y`` + This option is located under "General setup" / "Choose SLAB + allocator". + +- ``CONFIG_FUNCTION_TRACER=n`` + This option is located under "Kernel hacking" / "Tracers" / "Kernel + Function Tracer" + + When function tracing is compiled in, gcc emits a call to another + function at the beginning of every function. This means that when the + page fault handler is called, the ftrace framework will be called + before kmemcheck has had a chance to handle the fault. If ftrace then + modifies memory that was tracked by kmemcheck, the result is an + endless recursive page fault. + +- ``CONFIG_DEBUG_PAGEALLOC=n`` + This option is located under "Kernel hacking" / "Memory Debugging" + / "Debug page memory allocations". + +In addition, I highly recommend turning on ``CONFIG_DEBUG_INFO=y``. This is also +located under "Kernel hacking". With this, you will be able to get line number +information from the kmemcheck warnings, which is extremely valuable in +debugging a problem. This option is not mandatory, however, because it slows +down the compilation process and produces a much bigger kernel image. + +Now the kmemcheck menu should be visible (under "Kernel hacking" / "Memory +Debugging" / "kmemcheck: trap use of uninitialized memory"). Here follows +a description of the kmemcheck configuration variables: + +- ``CONFIG_KMEMCHECK`` + This must be enabled in order to use kmemcheck at all... + +- ``CONFIG_KMEMCHECK_``[``DISABLED`` | ``ENABLED`` | ``ONESHOT``]``_BY_DEFAULT`` + This option controls the status of kmemcheck at boot-time. "Enabled" + will enable kmemcheck right from the start, "disabled" will boot the + kernel as normal (but with the kmemcheck code compiled in, so it can + be enabled at run-time after the kernel has booted), and "one-shot" is + a special mode which will turn kmemcheck off automatically after + detecting the first use of uninitialized memory. + + If you are using kmemcheck to actively debug a problem, then you + probably want to choose "enabled" here. + + The one-shot mode is mostly useful in automated test setups because it + can prevent floods of warnings and increase the chances of the machine + surviving in case something is really wrong. In other cases, the one- + shot mode could actually be counter-productive because it would turn + itself off at the very first error -- in the case of a false positive + too -- and this would come in the way of debugging the specific + problem you were interested in. + + If you would like to use your kernel as normal, but with a chance to + enable kmemcheck in case of some problem, it might be a good idea to + choose "disabled" here. When kmemcheck is disabled, most of the run- + time overhead is not incurred, and the kernel will be almost as fast + as normal. + +- ``CONFIG_KMEMCHECK_QUEUE_SIZE`` + Select the maximum number of error reports to store in an internal + (fixed-size) buffer. Since errors can occur virtually anywhere and in + any context, we need a temporary storage area which is guaranteed not + to generate any other page faults when accessed. The queue will be + emptied as soon as a tasklet may be scheduled. If the queue is full, + new error reports will be lost. + + The default value of 64 is probably fine. If some code produces more + than 64 errors within an irqs-off section, then the code is likely to + produce many, many more, too, and these additional reports seldom give + any more information (the first report is usually the most valuable + anyway). + + This number might have to be adjusted if you are not using serial + console or similar to capture the kernel log. If you are using the + "dmesg" command to save the log, then getting a lot of kmemcheck + warnings might overflow the kernel log itself, and the earlier reports + will get lost in that way instead. Try setting this to 10 or so on + such a setup. + +- ``CONFIG_KMEMCHECK_SHADOW_COPY_SHIFT`` + Select the number of shadow bytes to save along with each entry of the + error-report queue. These bytes indicate what parts of an allocation + are initialized, uninitialized, etc. and will be displayed when an + error is detected to help the debugging of a particular problem. + + The number entered here is actually the logarithm of the number of + bytes that will be saved. So if you pick for example 5 here, kmemcheck + will save 2^5 = 32 bytes. + + The default value should be fine for debugging most problems. It also + fits nicely within 80 columns. + +- ``CONFIG_KMEMCHECK_PARTIAL_OK`` + This option (when enabled) works around certain GCC optimizations that + produce 32-bit reads from 16-bit variables where the upper 16 bits are + thrown away afterwards. + + The default value (enabled) is recommended. This may of course hide + some real errors, but disabling it would probably produce a lot of + false positives. + +- ``CONFIG_KMEMCHECK_BITOPS_OK`` + This option silences warnings that would be generated for bit-field + accesses where not all the bits are initialized at the same time. This + may also hide some real bugs. + + This option is probably obsolete, or it should be replaced with + the kmemcheck-/bitfield-annotations for the code in question. The + default value is therefore fine. + +Now compile the kernel as usual. + + +How to use +---------- + +Booting +~~~~~~~ + +First some information about the command-line options. There is only one +option specific to kmemcheck, and this is called "kmemcheck". It can be used +to override the default mode as chosen by the ``CONFIG_KMEMCHECK_*_BY_DEFAULT`` +option. Its possible settings are: + +- ``kmemcheck=0`` (disabled) +- ``kmemcheck=1`` (enabled) +- ``kmemcheck=2`` (one-shot mode) + +If SLUB debugging has been enabled in the kernel, it may take precedence over +kmemcheck in such a way that the slab caches which are under SLUB debugging +will not be tracked by kmemcheck. In order to ensure that this doesn't happen +(even though it shouldn't by default), use SLUB's boot option ``slub_debug``, +like this: ``slub_debug=-`` + +In fact, this option may also be used for fine-grained control over SLUB vs. +kmemcheck. For example, if the command line includes +``kmemcheck=1 slub_debug=,dentry``, then SLUB debugging will be used only +for the "dentry" slab cache, and with kmemcheck tracking all the other +caches. This is advanced usage, however, and is not generally recommended. + + +Run-time enable/disable +~~~~~~~~~~~~~~~~~~~~~~~ + +When the kernel has booted, it is possible to enable or disable kmemcheck at +run-time. WARNING: This feature is still experimental and may cause false +positive warnings to appear. Therefore, try not to use this. If you find that +it doesn't work properly (e.g. you see an unreasonable amount of warnings), I +will be happy to take bug reports. + +Use the file ``/proc/sys/kernel/kmemcheck`` for this purpose, e.g.:: + + $ echo 0 > /proc/sys/kernel/kmemcheck # disables kmemcheck + +The numbers are the same as for the ``kmemcheck=`` command-line option. + + +Debugging +~~~~~~~~~ + +A typical report will look something like this:: + + WARNING: kmemcheck: Caught 32-bit read from uninitialized memory (ffff88003e4a2024) + 80000000000000000000000000000000000000000088ffff0000000000000000 + i i i i u u u u i i i i i i i i u u u u u u u u u u u u u u u u + ^ + + Pid: 1856, comm: ntpdate Not tainted 2.6.29-rc5 #264 945P-A + RIP: 0010:[<ffffffff8104ede8>] [<ffffffff8104ede8>] __dequeue_signal+0xc8/0x190 + RSP: 0018:ffff88003cdf7d98 EFLAGS: 00210002 + RAX: 0000000000000030 RBX: ffff88003d4ea968 RCX: 0000000000000009 + RDX: ffff88003e5d6018 RSI: ffff88003e5d6024 RDI: ffff88003cdf7e84 + RBP: ffff88003cdf7db8 R08: ffff88003e5d6000 R09: 0000000000000000 + R10: 0000000000000080 R11: 0000000000000000 R12: 000000000000000e + R13: ffff88003cdf7e78 R14: ffff88003d530710 R15: ffff88003d5a98c8 + FS: 0000000000000000(0000) GS:ffff880001982000(0063) knlGS:00000 + CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033 + CR2: ffff88003f806ea0 CR3: 000000003c036000 CR4: 00000000000006a0 + DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 + DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400 + [<ffffffff8104f04e>] dequeue_signal+0x8e/0x170 + [<ffffffff81050bd8>] get_signal_to_deliver+0x98/0x390 + [<ffffffff8100b87d>] do_notify_resume+0xad/0x7d0 + [<ffffffff8100c7b5>] int_signal+0x12/0x17 + [<ffffffffffffffff>] 0xffffffffffffffff + +The single most valuable information in this report is the RIP (or EIP on 32- +bit) value. This will help us pinpoint exactly which instruction that caused +the warning. + +If your kernel was compiled with ``CONFIG_DEBUG_INFO=y``, then all we have to do +is give this address to the addr2line program, like this:: + + $ addr2line -e vmlinux -i ffffffff8104ede8 + arch/x86/include/asm/string_64.h:12 + include/asm-generic/siginfo.h:287 + kernel/signal.c:380 + kernel/signal.c:410 + +The "``-e vmlinux``" tells addr2line which file to look in. **IMPORTANT:** +This must be the vmlinux of the kernel that produced the warning in the +first place! If not, the line number information will almost certainly be +wrong. + +The "``-i``" tells addr2line to also print the line numbers of inlined +functions. In this case, the flag was very important, because otherwise, +it would only have printed the first line, which is just a call to +``memcpy()``, which could be called from a thousand places in the kernel, and +is therefore not very useful. These inlined functions would not show up in +the stack trace above, simply because the kernel doesn't load the extra +debugging information. This technique can of course be used with ordinary +kernel oopses as well. + +In this case, it's the caller of ``memcpy()`` that is interesting, and it can be +found in ``include/asm-generic/siginfo.h``, line 287:: + + 281 static inline void copy_siginfo(struct siginfo *to, struct siginfo *from) + 282 { + 283 if (from->si_code < 0) + 284 memcpy(to, from, sizeof(*to)); + 285 else + 286 /* _sigchld is currently the largest know union member */ + 287 memcpy(to, from, __ARCH_SI_PREAMBLE_SIZE + sizeof(from->_sifields._sigchld)); + 288 } + +Since this was a read (kmemcheck usually warns about reads only, though it can +warn about writes to unallocated or freed memory as well), it was probably the +"from" argument which contained some uninitialized bytes. Following the chain +of calls, we move upwards to see where "from" was allocated or initialized, +``kernel/signal.c``, line 380:: + + 359 static void collect_signal(int sig, struct sigpending *list, siginfo_t *info) + 360 { + ... + 367 list_for_each_entry(q, &list->list, list) { + 368 if (q->info.si_signo == sig) { + 369 if (first) + 370 goto still_pending; + 371 first = q; + ... + 377 if (first) { + 378 still_pending: + 379 list_del_init(&first->list); + 380 copy_siginfo(info, &first->info); + 381 __sigqueue_free(first); + ... + 392 } + 393 } + +Here, it is ``&first->info`` that is being passed on to ``copy_siginfo()``. The +variable ``first`` was found on a list -- passed in as the second argument to +``collect_signal()``. We continue our journey through the stack, to figure out +where the item on "list" was allocated or initialized. We move to line 410:: + + 395 static int __dequeue_signal(struct sigpending *pending, sigset_t *mask, + 396 siginfo_t *info) + 397 { + ... + 410 collect_signal(sig, pending, info); + ... + 414 } + +Now we need to follow the ``pending`` pointer, since that is being passed on to +``collect_signal()`` as ``list``. At this point, we've run out of lines from the +"addr2line" output. Not to worry, we just paste the next addresses from the +kmemcheck stack dump, i.e.:: + + [<ffffffff8104f04e>] dequeue_signal+0x8e/0x170 + [<ffffffff81050bd8>] get_signal_to_deliver+0x98/0x390 + [<ffffffff8100b87d>] do_notify_resume+0xad/0x7d0 + [<ffffffff8100c7b5>] int_signal+0x12/0x17 + + $ addr2line -e vmlinux -i ffffffff8104f04e ffffffff81050bd8 \ + ffffffff8100b87d ffffffff8100c7b5 + kernel/signal.c:446 + kernel/signal.c:1806 + arch/x86/kernel/signal.c:805 + arch/x86/kernel/signal.c:871 + arch/x86/kernel/entry_64.S:694 + +Remember that since these addresses were found on the stack and not as the +RIP value, they actually point to the _next_ instruction (they are return +addresses). This becomes obvious when we look at the code for line 446:: + + 422 int dequeue_signal(struct task_struct *tsk, sigset_t *mask, siginfo_t *info) + 423 { + ... + 431 signr = __dequeue_signal(&tsk->signal->shared_pending, + 432 mask, info); + 433 /* + 434 * itimer signal ? + 435 * + 436 * itimers are process shared and we restart periodic + 437 * itimers in the signal delivery path to prevent DoS + 438 * attacks in the high resolution timer case. This is + 439 * compliant with the old way of self restarting + 440 * itimers, as the SIGALRM is a legacy signal and only + 441 * queued once. Changing the restart behaviour to + 442 * restart the timer in the signal dequeue path is + 443 * reducing the timer noise on heavy loaded !highres + 444 * systems too. + 445 */ + 446 if (unlikely(signr == SIGALRM)) { + ... + 489 } + +So instead of looking at 446, we should be looking at 431, which is the line +that executes just before 446. Here we see that what we are looking for is +``&tsk->signal->shared_pending``. + +Our next task is now to figure out which function that puts items on this +``shared_pending`` list. A crude, but efficient tool, is ``git grep``:: + + $ git grep -n 'shared_pending' kernel/ + ... + kernel/signal.c:828: pending = group ? &t->signal->shared_pending : &t->pending; + kernel/signal.c:1339: pending = group ? &t->signal->shared_pending : &t->pending; + ... + +There were more results, but none of them were related to list operations, +and these were the only assignments. We inspect the line numbers more closely +and find that this is indeed where items are being added to the list:: + + 816 static int send_signal(int sig, struct siginfo *info, struct task_struct *t, + 817 int group) + 818 { + ... + 828 pending = group ? &t->signal->shared_pending : &t->pending; + ... + 851 q = __sigqueue_alloc(t, GFP_ATOMIC, (sig < SIGRTMIN && + 852 (is_si_special(info) || + 853 info->si_code >= 0))); + 854 if (q) { + 855 list_add_tail(&q->list, &pending->list); + ... + 890 } + +and:: + + 1309 int send_sigqueue(struct sigqueue *q, struct task_struct *t, int group) + 1310 { + .... + 1339 pending = group ? &t->signal->shared_pending : &t->pending; + 1340 list_add_tail(&q->list, &pending->list); + .... + 1347 } + +In the first case, the list element we are looking for, ``q``, is being +returned from the function ``__sigqueue_alloc()``, which looks like an +allocation function. Let's take a look at it:: + + 187 static struct sigqueue *__sigqueue_alloc(struct task_struct *t, gfp_t flags, + 188 int override_rlimit) + 189 { + 190 struct sigqueue *q = NULL; + 191 struct user_struct *user; + 192 + 193 /* + 194 * We won't get problems with the target's UID changing under us + 195 * because changing it requires RCU be used, and if t != current, the + 196 * caller must be holding the RCU readlock (by way of a spinlock) and + 197 * we use RCU protection here + 198 */ + 199 user = get_uid(__task_cred(t)->user); + 200 atomic_inc(&user->sigpending); + 201 if (override_rlimit || + 202 atomic_read(&user->sigpending) <= + 203 t->signal->rlim[RLIMIT_SIGPENDING].rlim_cur) + 204 q = kmem_cache_alloc(sigqueue_cachep, flags); + 205 if (unlikely(q == NULL)) { + 206 atomic_dec(&user->sigpending); + 207 free_uid(user); + 208 } else { + 209 INIT_LIST_HEAD(&q->list); + 210 q->flags = 0; + 211 q->user = user; + 212 } + 213 + 214 return q; + 215 } + +We see that this function initializes ``q->list``, ``q->flags``, and +``q->user``. It seems that now is the time to look at the definition of +``struct sigqueue``, e.g.:: + + 14 struct sigqueue { + 15 struct list_head list; + 16 int flags; + 17 siginfo_t info; + 18 struct user_struct *user; + 19 }; + +And, you might remember, it was a ``memcpy()`` on ``&first->info`` that +caused the warning, so this makes perfect sense. It also seems reasonable +to assume that it is the caller of ``__sigqueue_alloc()`` that has the +responsibility of filling out (initializing) this member. + +But just which fields of the struct were uninitialized? Let's look at +kmemcheck's report again:: + + WARNING: kmemcheck: Caught 32-bit read from uninitialized memory (ffff88003e4a2024) + 80000000000000000000000000000000000000000088ffff0000000000000000 + i i i i u u u u i i i i i i i i u u u u u u u u u u u u u u u u + ^ + +These first two lines are the memory dump of the memory object itself, and +the shadow bytemap, respectively. The memory object itself is in this case +``&first->info``. Just beware that the start of this dump is NOT the start +of the object itself! The position of the caret (^) corresponds with the +address of the read (ffff88003e4a2024). + +The shadow bytemap dump legend is as follows: + +- i: initialized +- u: uninitialized +- a: unallocated (memory has been allocated by the slab layer, but has not + yet been handed off to anybody) +- f: freed (memory has been allocated by the slab layer, but has been freed + by the previous owner) + +In order to figure out where (relative to the start of the object) the +uninitialized memory was located, we have to look at the disassembly. For +that, we'll need the RIP address again:: + + RIP: 0010:[<ffffffff8104ede8>] [<ffffffff8104ede8>] __dequeue_signal+0xc8/0x190 + + $ objdump -d --no-show-raw-insn vmlinux | grep -C 8 ffffffff8104ede8: + ffffffff8104edc8: mov %r8,0x8(%r8) + ffffffff8104edcc: test %r10d,%r10d + ffffffff8104edcf: js ffffffff8104ee88 <__dequeue_signal+0x168> + ffffffff8104edd5: mov %rax,%rdx + ffffffff8104edd8: mov $0xc,%ecx + ffffffff8104eddd: mov %r13,%rdi + ffffffff8104ede0: mov $0x30,%eax + ffffffff8104ede5: mov %rdx,%rsi + ffffffff8104ede8: rep movsl %ds:(%rsi),%es:(%rdi) + ffffffff8104edea: test $0x2,%al + ffffffff8104edec: je ffffffff8104edf0 <__dequeue_signal+0xd0> + ffffffff8104edee: movsw %ds:(%rsi),%es:(%rdi) + ffffffff8104edf0: test $0x1,%al + ffffffff8104edf2: je ffffffff8104edf5 <__dequeue_signal+0xd5> + ffffffff8104edf4: movsb %ds:(%rsi),%es:(%rdi) + ffffffff8104edf5: mov %r8,%rdi + ffffffff8104edf8: callq ffffffff8104de60 <__sigqueue_free> + +As expected, it's the "``rep movsl``" instruction from the ``memcpy()`` +that causes the warning. We know about ``REP MOVSL`` that it uses the register +``RCX`` to count the number of remaining iterations. By taking a look at the +register dump again (from the kmemcheck report), we can figure out how many +bytes were left to copy:: + + RAX: 0000000000000030 RBX: ffff88003d4ea968 RCX: 0000000000000009 + +By looking at the disassembly, we also see that ``%ecx`` is being loaded +with the value ``$0xc`` just before (ffffffff8104edd8), so we are very +lucky. Keep in mind that this is the number of iterations, not bytes. And +since this is a "long" operation, we need to multiply by 4 to get the +number of bytes. So this means that the uninitialized value was encountered +at 4 * (0xc - 0x9) = 12 bytes from the start of the object. + +We can now try to figure out which field of the "``struct siginfo``" that +was not initialized. This is the beginning of the struct:: + + 40 typedef struct siginfo { + 41 int si_signo; + 42 int si_errno; + 43 int si_code; + 44 + 45 union { + .. + 92 } _sifields; + 93 } siginfo_t; + +On 64-bit, the int is 4 bytes long, so it must the union member that has +not been initialized. We can verify this using gdb:: + + $ gdb vmlinux + ... + (gdb) p &((struct siginfo *) 0)->_sifields + $1 = (union {...} *) 0x10 + +Actually, it seems that the union member is located at offset 0x10 -- which +means that gcc has inserted 4 bytes of padding between the members ``si_code`` +and ``_sifields``. We can now get a fuller picture of the memory dump:: + + _----------------------------=> si_code + / _--------------------=> (padding) + | / _------------=> _sifields(._kill._pid) + | | / _----=> _sifields(._kill._uid) + | | | / + -------|-------|-------|-------| + 80000000000000000000000000000000000000000088ffff0000000000000000 + i i i i u u u u i i i i i i i i u u u u u u u u u u u u u u u u + +This allows us to realize another important fact: ``si_code`` contains the +value 0x80. Remember that x86 is little endian, so the first 4 bytes +"80000000" are really the number 0x00000080. With a bit of research, we +find that this is actually the constant ``SI_KERNEL`` defined in +``include/asm-generic/siginfo.h``:: + + 144 #define SI_KERNEL 0x80 /* sent by the kernel from somewhere */ + +This macro is used in exactly one place in the x86 kernel: In ``send_signal()`` +in ``kernel/signal.c``:: + + 816 static int send_signal(int sig, struct siginfo *info, struct task_struct *t, + 817 int group) + 818 { + ... + 828 pending = group ? &t->signal->shared_pending : &t->pending; + ... + 851 q = __sigqueue_alloc(t, GFP_ATOMIC, (sig < SIGRTMIN && + 852 (is_si_special(info) || + 853 info->si_code >= 0))); + 854 if (q) { + 855 list_add_tail(&q->list, &pending->list); + 856 switch ((unsigned long) info) { + ... + 865 case (unsigned long) SEND_SIG_PRIV: + 866 q->info.si_signo = sig; + 867 q->info.si_errno = 0; + 868 q->info.si_code = SI_KERNEL; + 869 q->info.si_pid = 0; + 870 q->info.si_uid = 0; + 871 break; + ... + 890 } + +Not only does this match with the ``.si_code`` member, it also matches the place +we found earlier when looking for where siginfo_t objects are enqueued on the +``shared_pending`` list. + +So to sum up: It seems that it is the padding introduced by the compiler +between two struct fields that is uninitialized, and this gets reported when +we do a ``memcpy()`` on the struct. This means that we have identified a false +positive warning. + +Normally, kmemcheck will not report uninitialized accesses in ``memcpy()`` calls +when both the source and destination addresses are tracked. (Instead, we copy +the shadow bytemap as well). In this case, the destination address clearly +was not tracked. We can dig a little deeper into the stack trace from above:: + + arch/x86/kernel/signal.c:805 + arch/x86/kernel/signal.c:871 + arch/x86/kernel/entry_64.S:694 + +And we clearly see that the destination siginfo object is located on the +stack:: + + 782 static void do_signal(struct pt_regs *regs) + 783 { + 784 struct k_sigaction ka; + 785 siginfo_t info; + ... + 804 signr = get_signal_to_deliver(&info, &ka, regs, NULL); + ... + 854 } + +And this ``&info`` is what eventually gets passed to ``copy_siginfo()`` as the +destination argument. + +Now, even though we didn't find an actual error here, the example is still a +good one, because it shows how one would go about to find out what the report +was all about. + + +Annotating false positives +~~~~~~~~~~~~~~~~~~~~~~~~~~ + +There are a few different ways to make annotations in the source code that +will keep kmemcheck from checking and reporting certain allocations. Here +they are: + +- ``__GFP_NOTRACK_FALSE_POSITIVE`` + This flag can be passed to ``kmalloc()`` or ``kmem_cache_alloc()`` + (therefore also to other functions that end up calling one of + these) to indicate that the allocation should not be tracked + because it would lead to a false positive report. This is a "big + hammer" way of silencing kmemcheck; after all, even if the false + positive pertains to particular field in a struct, for example, we + will now lose the ability to find (real) errors in other parts of + the same struct. + + Example:: + + /* No warnings will ever trigger on accessing any part of x */ + x = kmalloc(sizeof *x, GFP_KERNEL | __GFP_NOTRACK_FALSE_POSITIVE); + +- ``kmemcheck_bitfield_begin(name)``/``kmemcheck_bitfield_end(name)`` and + ``kmemcheck_annotate_bitfield(ptr, name)`` + The first two of these three macros can be used inside struct + definitions to signal, respectively, the beginning and end of a + bitfield. Additionally, this will assign the bitfield a name, which + is given as an argument to the macros. + + Having used these markers, one can later use + kmemcheck_annotate_bitfield() at the point of allocation, to indicate + which parts of the allocation is part of a bitfield. + + Example:: + + struct foo { + int x; + + kmemcheck_bitfield_begin(flags); + int flag_a:1; + int flag_b:1; + kmemcheck_bitfield_end(flags); + + int y; + }; + + struct foo *x = kmalloc(sizeof *x); + + /* No warnings will trigger on accessing the bitfield of x */ + kmemcheck_annotate_bitfield(x, flags); + + Note that ``kmemcheck_annotate_bitfield()`` can be used even before the + return value of ``kmalloc()`` is checked -- in other words, passing NULL + as the first argument is legal (and will do nothing). + + +Reporting errors +---------------- + +As we have seen, kmemcheck will produce false positive reports. Therefore, it +is not very wise to blindly post kmemcheck warnings to mailing lists and +maintainers. Instead, I encourage maintainers and developers to find errors +in their own code. If you get a warning, you can try to work around it, try +to figure out if it's a real error or not, or simply ignore it. Most +developers know their own code and will quickly and efficiently determine the +root cause of a kmemcheck report. This is therefore also the most efficient +way to work with kmemcheck. + +That said, we (the kmemcheck maintainers) will always be on the lookout for +false positives that we can annotate and silence. So whatever you find, +please drop us a note privately! Kernel configs and steps to reproduce (if +available) are of course a great help too. + +Happy hacking! + + +Technical description +--------------------- + +kmemcheck works by marking memory pages non-present. This means that whenever +somebody attempts to access the page, a page fault is generated. The page +fault handler notices that the page was in fact only hidden, and so it calls +on the kmemcheck code to make further investigations. + +When the investigations are completed, kmemcheck "shows" the page by marking +it present (as it would be under normal circumstances). This way, the +interrupted code can continue as usual. + +But after the instruction has been executed, we should hide the page again, so +that we can catch the next access too! Now kmemcheck makes use of a debugging +feature of the processor, namely single-stepping. When the processor has +finished the one instruction that generated the memory access, a debug +exception is raised. From here, we simply hide the page again and continue +execution, this time with the single-stepping feature turned off. + +kmemcheck requires some assistance from the memory allocator in order to work. +The memory allocator needs to + + 1. Tell kmemcheck about newly allocated pages and pages that are about to + be freed. This allows kmemcheck to set up and tear down the shadow memory + for the pages in question. The shadow memory stores the status of each + byte in the allocation proper, e.g. whether it is initialized or + uninitialized. + + 2. Tell kmemcheck which parts of memory should be marked uninitialized. + There are actually a few more states, such as "not yet allocated" and + "recently freed". + +If a slab cache is set up using the SLAB_NOTRACK flag, it will never return +memory that can take page faults because of kmemcheck. + +If a slab cache is NOT set up using the SLAB_NOTRACK flag, callers can still +request memory with the __GFP_NOTRACK or __GFP_NOTRACK_FALSE_POSITIVE flags. +This does not prevent the page faults from occurring, however, but marks the +object in question as being initialized so that no warnings will ever be +produced for this object. + +Currently, the SLAB and SLUB allocators are supported by kmemcheck. diff --git a/Documentation/kmemleak.txt b/Documentation/dev-tools/kmemleak.rst index 18e24abb3ecf..1788722d5495 100644 --- a/Documentation/kmemleak.txt +++ b/Documentation/dev-tools/kmemleak.rst @@ -1,15 +1,12 @@ Kernel Memory Leak Detector =========================== -Introduction ------------- - Kmemleak provides a way of detecting possible kernel memory leaks in a way similar to a tracing garbage collector (https://en.wikipedia.org/wiki/Garbage_collection_%28computer_science%29#Tracing_garbage_collectors), with the difference that the orphan objects are not freed but only reported via /sys/kernel/debug/kmemleak. A similar method is used by the -Valgrind tool (memcheck --leak-check) to detect the memory leaks in +Valgrind tool (``memcheck --leak-check``) to detect the memory leaks in user-space applications. Kmemleak is supported on x86, arm, powerpc, sparc, sh, microblaze, ppc, mips, s390, metag and tile. @@ -19,20 +16,20 @@ Usage CONFIG_DEBUG_KMEMLEAK in "Kernel hacking" has to be enabled. A kernel thread scans the memory every 10 minutes (by default) and prints the number of new unreferenced objects found. To display the details of all -the possible memory leaks: +the possible memory leaks:: # mount -t debugfs nodev /sys/kernel/debug/ # cat /sys/kernel/debug/kmemleak -To trigger an intermediate memory scan: +To trigger an intermediate memory scan:: # echo scan > /sys/kernel/debug/kmemleak -To clear the list of all current possible memory leaks: +To clear the list of all current possible memory leaks:: # echo clear > /sys/kernel/debug/kmemleak -New leaks will then come up upon reading /sys/kernel/debug/kmemleak +New leaks will then come up upon reading ``/sys/kernel/debug/kmemleak`` again. Note that the orphan objects are listed in the order they were allocated @@ -40,22 +37,31 @@ and one object at the beginning of the list may cause other subsequent objects to be reported as orphan. Memory scanning parameters can be modified at run-time by writing to the -/sys/kernel/debug/kmemleak file. The following parameters are supported: - - off - disable kmemleak (irreversible) - stack=on - enable the task stacks scanning (default) - stack=off - disable the tasks stacks scanning - scan=on - start the automatic memory scanning thread (default) - scan=off - stop the automatic memory scanning thread - scan=<secs> - set the automatic memory scanning period in seconds - (default 600, 0 to stop the automatic scanning) - scan - trigger a memory scan - clear - clear list of current memory leak suspects, done by - marking all current reported unreferenced objects grey, - or free all kmemleak objects if kmemleak has been disabled. - dump=<addr> - dump information about the object found at <addr> - -Kmemleak can also be disabled at boot-time by passing "kmemleak=off" on +``/sys/kernel/debug/kmemleak`` file. The following parameters are supported: + +- off + disable kmemleak (irreversible) +- stack=on + enable the task stacks scanning (default) +- stack=off + disable the tasks stacks scanning +- scan=on + start the automatic memory scanning thread (default) +- scan=off + stop the automatic memory scanning thread +- scan=<secs> + set the automatic memory scanning period in seconds + (default 600, 0 to stop the automatic scanning) +- scan + trigger a memory scan +- clear + clear list of current memory leak suspects, done by + marking all current reported unreferenced objects grey, + or free all kmemleak objects if kmemleak has been disabled. +- dump=<addr> + dump information about the object found at <addr> + +Kmemleak can also be disabled at boot-time by passing ``kmemleak=off`` on the kernel command line. Memory may be allocated or freed before kmemleak is initialised and @@ -63,13 +69,14 @@ these actions are stored in an early log buffer. The size of this buffer is configured via the CONFIG_DEBUG_KMEMLEAK_EARLY_LOG_SIZE option. If CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF are enabled, the kmemleak is -disabled by default. Passing "kmemleak=on" on the kernel command +disabled by default. Passing ``kmemleak=on`` on the kernel command line enables the function. Basic Algorithm --------------- -The memory allocations via kmalloc, vmalloc, kmem_cache_alloc and +The memory allocations via :c:func:`kmalloc`, :c:func:`vmalloc`, +:c:func:`kmem_cache_alloc` and friends are traced and the pointers, together with additional information like size and stack trace, are stored in a rbtree. The corresponding freeing function calls are tracked and the pointers @@ -113,13 +120,13 @@ when doing development. To work around these situations you can use the you can find new unreferenced objects; this should help with testing specific sections of code. -To test a critical section on demand with a clean kmemleak do: +To test a critical section on demand with a clean kmemleak do:: # echo clear > /sys/kernel/debug/kmemleak ... test your kernel or modules ... # echo scan > /sys/kernel/debug/kmemleak -Then as usual to get your report with: +Then as usual to get your report with:: # cat /sys/kernel/debug/kmemleak @@ -131,7 +138,7 @@ disabled by the user or due to an fatal error, internal kmemleak objects won't be freed when kmemleak is disabled, and those objects may occupy a large part of physical memory. -In this situation, you may reclaim memory with: +In this situation, you may reclaim memory with:: # echo clear > /sys/kernel/debug/kmemleak @@ -140,20 +147,20 @@ Kmemleak API See the include/linux/kmemleak.h header for the functions prototype. -kmemleak_init - initialize kmemleak -kmemleak_alloc - notify of a memory block allocation -kmemleak_alloc_percpu - notify of a percpu memory block allocation -kmemleak_free - notify of a memory block freeing -kmemleak_free_part - notify of a partial memory block freeing -kmemleak_free_percpu - notify of a percpu memory block freeing -kmemleak_update_trace - update object allocation stack trace -kmemleak_not_leak - mark an object as not a leak -kmemleak_ignore - do not scan or report an object as leak -kmemleak_scan_area - add scan areas inside a memory block -kmemleak_no_scan - do not scan a memory block -kmemleak_erase - erase an old value in a pointer variable -kmemleak_alloc_recursive - as kmemleak_alloc but checks the recursiveness -kmemleak_free_recursive - as kmemleak_free but checks the recursiveness +- ``kmemleak_init`` - initialize kmemleak +- ``kmemleak_alloc`` - notify of a memory block allocation +- ``kmemleak_alloc_percpu`` - notify of a percpu memory block allocation +- ``kmemleak_free`` - notify of a memory block freeing +- ``kmemleak_free_part`` - notify of a partial memory block freeing +- ``kmemleak_free_percpu`` - notify of a percpu memory block freeing +- ``kmemleak_update_trace`` - update object allocation stack trace +- ``kmemleak_not_leak`` - mark an object as not a leak +- ``kmemleak_ignore`` - do not scan or report an object as leak +- ``kmemleak_scan_area`` - add scan areas inside a memory block +- ``kmemleak_no_scan`` - do not scan a memory block +- ``kmemleak_erase`` - erase an old value in a pointer variable +- ``kmemleak_alloc_recursive`` - as kmemleak_alloc but checks the recursiveness +- ``kmemleak_free_recursive`` - as kmemleak_free but checks the recursiveness Dealing with false positives/negatives -------------------------------------- diff --git a/Documentation/sparse.txt b/Documentation/dev-tools/sparse.rst index eceab1308a8c..8c250e8a2105 100644 --- a/Documentation/sparse.txt +++ b/Documentation/dev-tools/sparse.rst @@ -1,11 +1,20 @@ -Copyright 2004 Linus Torvalds -Copyright 2004 Pavel Machek <pavel@ucw.cz> -Copyright 2006 Bob Copeland <me@bobcopeland.com> +.. Copyright 2004 Linus Torvalds +.. Copyright 2004 Pavel Machek <pavel@ucw.cz> +.. Copyright 2006 Bob Copeland <me@bobcopeland.com> + +Sparse +====== + +Sparse is a semantic checker for C programs; it can be used to find a +number of potential problems with kernel code. See +https://lwn.net/Articles/689907/ for an overview of sparse; this document +contains some kernel-specific sparse information. + Using sparse for typechecking -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +----------------------------- -"__bitwise" is a type attribute, so you have to do something like this: +"__bitwise" is a type attribute, so you have to do something like this:: typedef int __bitwise pm_request_t; @@ -20,13 +29,13 @@ but in this case we really _do_ want to force the conversion). And because the enum values are all the same type, now "enum pm_request" will be that type too. -And with gcc, all the __bitwise/__force stuff goes away, and it all ends -up looking just like integers to gcc. +And with gcc, all the "__bitwise"/"__force stuff" goes away, and it all +ends up looking just like integers to gcc. Quite frankly, you don't need the enum there. The above all really just boils down to one special "int __bitwise" type. -So the simpler way is to just do +So the simpler way is to just do:: typedef int __bitwise pm_request_t; @@ -50,7 +59,7 @@ __bitwise - noisy stuff; in particular, __le*/__be* are that. We really don't want to drown in noise unless we'd explicitly asked for it. Using sparse for lock checking -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +------------------------------ The following macros are undefined for gcc and defined during a sparse run to use the "context" tracking feature of sparse, applied to @@ -69,22 +78,22 @@ annotation is needed. The tree annotations above are for cases where sparse would otherwise report a context imbalance. Getting sparse -~~~~~~~~~~~~~~ +-------------- You can get latest released versions from the Sparse homepage at https://sparse.wiki.kernel.org/index.php/Main_Page Alternatively, you can get snapshots of the latest development version -of sparse using git to clone.. +of sparse using git to clone:: git://git.kernel.org/pub/scm/devel/sparse/sparse.git -DaveJ has hourly generated tarballs of the git tree available at.. +DaveJ has hourly generated tarballs of the git tree available at:: http://www.codemonkey.org.uk/projects/git-snapshots/sparse/ -Once you have it, just do +Once you have it, just do:: make make install @@ -92,7 +101,7 @@ Once you have it, just do as a regular user, and it will install sparse in your ~/bin directory. Using sparse -~~~~~~~~~~~~ +------------ Do a kernel make with "make C=1" to run sparse on all the C files that get recompiled, or use "make C=2" to run sparse on the files whether they need to @@ -101,7 +110,7 @@ have already built it. The optional make variable CF can be used to pass arguments to sparse. The build system passes -Wbitwise to sparse automatically. To perform endianness -checks, you may define __CHECK_ENDIAN__: +checks, you may define __CHECK_ENDIAN__:: make C=2 CF="-D__CHECK_ENDIAN__" diff --git a/Documentation/dev-tools/tools.rst b/Documentation/dev-tools/tools.rst new file mode 100644 index 000000000000..824ae8e54dd5 --- /dev/null +++ b/Documentation/dev-tools/tools.rst @@ -0,0 +1,25 @@ +================================ +Development tools for the kernel +================================ + +This document is a collection of documents about development tools that can +be used to work on the kernel. For now, the documents have been pulled +together without any significant effot to integrate them into a coherent +whole; patches welcome! + +.. class:: toc-title + + Table of contents + +.. toctree:: + :maxdepth: 2 + + coccinelle + sparse + kcov + gcov + kasan + ubsan + kmemleak + kmemcheck + gdb-kernel-debugging diff --git a/Documentation/ubsan.txt b/Documentation/dev-tools/ubsan.rst index f58215ef5797..655e6b63c227 100644 --- a/Documentation/ubsan.txt +++ b/Documentation/dev-tools/ubsan.rst @@ -1,7 +1,5 @@ -Undefined Behavior Sanitizer - UBSAN - -Overview --------- +The Undefined Behavior Sanitizer - UBSAN +======================================== UBSAN is a runtime undefined behaviour checker. @@ -10,11 +8,13 @@ Compiler inserts code that perform certain kinds of checks before operations that may cause UB. If check fails (i.e. UB detected) __ubsan_handle_* function called to print error message. -GCC has that feature since 4.9.x [1] (see -fsanitize=undefined option and -its suboptions). GCC 5.x has more checkers implemented [2]. +GCC has that feature since 4.9.x [1_] (see ``-fsanitize=undefined`` option and +its suboptions). GCC 5.x has more checkers implemented [2_]. Report example ---------------- +-------------- + +:: ================================================================================ UBSAN: Undefined behaviour in ../include/linux/bitops.h:110:33 @@ -47,29 +47,33 @@ Report example Usage ----- -To enable UBSAN configure kernel with: +To enable UBSAN configure kernel with:: CONFIG_UBSAN=y -and to check the entire kernel: +and to check the entire kernel:: CONFIG_UBSAN_SANITIZE_ALL=y To enable instrumentation for specific files or directories, add a line similar to the following to the respective kernel Makefile: - For a single file (e.g. main.o): - UBSAN_SANITIZE_main.o := y +- For a single file (e.g. main.o):: + + UBSAN_SANITIZE_main.o := y - For all files in one directory: - UBSAN_SANITIZE := y +- For all files in one directory:: + + UBSAN_SANITIZE := y To exclude files from being instrumented even if -CONFIG_UBSAN_SANITIZE_ALL=y, use: +``CONFIG_UBSAN_SANITIZE_ALL=y``, use:: + + UBSAN_SANITIZE_main.o := n + +and:: - UBSAN_SANITIZE_main.o := n - and: - UBSAN_SANITIZE := n + UBSAN_SANITIZE := n Detection of unaligned accesses controlled through the separate option - CONFIG_UBSAN_ALIGNMENT. It's off by default on architectures that support @@ -80,5 +84,5 @@ reports. References ---------- -[1] - https://gcc.gnu.org/onlinedocs/gcc-4.9.0/gcc/Debugging-Options.html -[2] - https://gcc.gnu.org/onlinedocs/gcc/Debugging-Options.html +.. _1: https://gcc.gnu.org/onlinedocs/gcc-4.9.0/gcc/Debugging-Options.html +.. _2: https://gcc.gnu.org/onlinedocs/gcc/Debugging-Options.html diff --git a/Documentation/development-process/1.Intro b/Documentation/development-process/1.Intro.rst index 9b614480aa84..22642b3fe903 100644 --- a/Documentation/development-process/1.Intro +++ b/Documentation/development-process/1.Intro.rst @@ -1,16 +1,8 @@ -1: A GUIDE TO THE KERNEL DEVELOPMENT PROCESS +Introdution +=========== -The purpose of this document is to help developers (and their managers) -work with the development community with a minimum of frustration. It is -an attempt to document how this community works in a way which is -accessible to those who are not intimately familiar with Linux kernel -development (or, indeed, free software development in general). While -there is some technical material here, this is very much a process-oriented -discussion which does not require a deep knowledge of kernel programming to -understand. - - -1.1: EXECUTIVE SUMMARY +Executive summary +----------------- The rest of this section covers the scope of the kernel development process and the kinds of frustrations that developers and their employers can @@ -20,41 +12,41 @@ availability to users, community support in many forms, and the ability to influence the direction of kernel development. Code contributed to the Linux kernel must be made available under a GPL-compatible license. -Section 2 introduces the development process, the kernel release cycle, and -the mechanics of the merge window. The various phases in the patch -development, review, and merging cycle are covered. There is some +:ref:`development_process` introduces the development process, the kernel +release cycle, and the mechanics of the merge window. The various phases in +the patch development, review, and merging cycle are covered. There is some discussion of tools and mailing lists. Developers wanting to get started with kernel development are encouraged to track down and fix bugs as an initial exercise. -Section 3 covers early-stage project planning, with an emphasis on -involving the development community as soon as possible. +:ref:`development_early_stage` covers early-stage project planning, with an +emphasis on involving the development community as soon as possible. -Section 4 is about the coding process; several pitfalls which have been -encountered by other developers are discussed. Some requirements for +:ref:`development_coding` is about the coding process; several pitfalls which +have been encountered by other developers are discussed. Some requirements for patches are covered, and there is an introduction to some of the tools which can help to ensure that kernel patches are correct. -Section 5 talks about the process of posting patches for review. To be -taken seriously by the development community, patches must be properly -formatted and described, and they must be sent to the right place. +:ref:`development_posting` talks about the process of posting patches for +review. To be taken seriously by the development community, patches must be +properly formatted and described, and they must be sent to the right place. Following the advice in this section should help to ensure the best possible reception for your work. -Section 6 covers what happens after posting patches; the job is far from -done at that point. Working with reviewers is a crucial part of the -development process; this section offers a number of tips on how to avoid -problems at this important stage. Developers are cautioned against +:ref:`development_followthrough` covers what happens after posting patches; the +job is far from done at that point. Working with reviewers is a crucial part +of the development process; this section offers a number of tips on how to +avoid problems at this important stage. Developers are cautioned against assuming that the job is done when a patch is merged into the mainline. -Section 7 introduces a couple of "advanced" topics: managing patches with -git and reviewing patches posted by others. - -Section 8 concludes the document with pointers to sources for more -information on kernel development. +:ref:`development_advancedtopics` introduces a couple of "advanced" topics: +managing patches with git and reviewing patches posted by others. +:ref:`development_conclusion` concludes the document with pointers to sources +for more information on kernel development. -1.2: WHAT THIS DOCUMENT IS ABOUT +What this document is about +--------------------------- The Linux kernel, at over 8 million lines of code and well over 1000 contributors to each release, is one of the largest and most active free @@ -108,8 +100,8 @@ community is always in need of developers who will help to make the kernel better; the following text should help you - or those who work for you - join our community. - -1.3: CREDITS +Credits +------- This document was written by Jonathan Corbet, corbet@lwn.net. It has been improved by comments from Johannes Berg, James Berry, Alex Chiang, Roland @@ -120,8 +112,8 @@ Jochen Voร. This work was supported by the Linux Foundation; thanks especially to Amanda McPherson, who saw the value of this effort and made it all happen. - -1.4: THE IMPORTANCE OF GETTING CODE INTO THE MAINLINE +The importance of getting code into the mainline +------------------------------------------------ Some companies and developers occasionally wonder why they should bother learning how to work with the kernel community and get their code into the @@ -233,8 +225,8 @@ commercial life, after which a new version must be released. At that point, vendors whose code is in the mainline and well maintained will be much better positioned to get the new product ready for market quickly. - -1.5: LICENSING +Licensing +--------- Code is contributed to the Linux kernel under a number of licenses, but all code must be compatible with version 2 of the GNU General Public License diff --git a/Documentation/development-process/2.Process b/Documentation/development-process/2.Process.rst index c24e156a6118..ce5561bb3f8e 100644 --- a/Documentation/development-process/2.Process +++ b/Documentation/development-process/2.Process.rst @@ -1,4 +1,7 @@ -2: HOW THE DEVELOPMENT PROCESS WORKS +.. _development_process: + +How the development process works +================================= Linux kernel development in the early 1990's was a pretty loose affair, with relatively small numbers of users and developers involved. With a @@ -7,19 +10,21 @@ course of one year, the kernel has since had to evolve a number of processes to keep development happening smoothly. A solid understanding of how the process works is required in order to be an effective part of it. - -2.1: THE BIG PICTURE +The big picture +--------------- The kernel developers use a loosely time-based release process, with a new major kernel release happening every two or three months. The recent release history looks like this: + ====== ================= 2.6.38 March 14, 2011 2.6.37 January 4, 2011 2.6.36 October 20, 2010 2.6.35 August 1, 2010 2.6.34 May 15, 2010 2.6.33 February 24, 2010 + ====== ================= Every 2.6.x release is a major kernel release with new features, internal API changes, and more. A typical 2.6 release can contain nearly 10,000 @@ -68,6 +73,7 @@ At that point the whole process starts over again. As an example, here is how the 2.6.38 development cycle went (all dates in 2011): + ============== =============================== January 4 2.6.37 stable release January 18 2.6.38-rc1, merge window closes January 21 2.6.38-rc2 @@ -78,6 +84,7 @@ As an example, here is how the 2.6.38 development cycle went (all dates in March 1 2.6.38-rc7 March 7 2.6.38-rc8 March 14 2.6.38 stable release + ============== =============================== How do the developers decide when to close the development cycle and create the stable release? The most significant metric used is the list of @@ -105,11 +112,13 @@ next development kernel. Kernels will typically receive stable updates for a little more than one development cycle past their initial release. So, for example, the 2.6.36 kernel's history looked like: + ============== =============================== October 10 2.6.36 stable release November 22 2.6.36.1 December 9 2.6.36.2 January 7 2.6.36.3 February 17 2.6.36.4 + ============== =============================== 2.6.36.4 was the final stable update for the 2.6.36 release. @@ -117,9 +126,11 @@ Some kernels are designated "long term" kernels; they will receive support for a longer period. As of this writing, the current long term kernels and their maintainers are: + ====== ====================== =========================== 2.6.27 Willy Tarreau (Deep-frozen stable kernel) 2.6.32 Greg Kroah-Hartman 2.6.35 Andi Kleen (Embedded flag kernel) + ====== ====================== =========================== The selection of a kernel for long-term support is purely a matter of a maintainer having the need and the time to maintain that release. There @@ -127,7 +138,8 @@ are no known plans for long-term support for any specific upcoming release. -2.2: THE LIFECYCLE OF A PATCH +The lifecycle of a patch +------------------------ Patches do not go directly from the developer's keyboard into the mainline kernel. There is, instead, a somewhat involved (if somewhat informal) @@ -195,8 +207,8 @@ is to try to cut the process down to a single "merging into the mainline" step. This approach invariably leads to frustration for everybody involved. - -2.3: HOW PATCHES GET INTO THE KERNEL +How patches get into the Kernel +------------------------------- There is exactly one person who can merge patches into the mainline kernel repository: Linus Torvalds. But, of the over 9,500 patches which went @@ -242,7 +254,8 @@ finding the right maintainer. Sending patches directly to Linus is not normally the right way to go. -2.4: NEXT TREES +Next trees +---------- The chain of subsystem trees guides the flow of patches into the kernel, but it also raises an interesting question: what if somebody wants to look @@ -294,7 +307,8 @@ all patches merged during a given merge window should really have found their way into linux-next some time before the merge window opens. -2.4.1: STAGING TREES +Staging trees +------------- The kernel source tree contains the drivers/staging/ directory, where many sub-directories for drivers or filesystems that are on their way to @@ -322,7 +336,8 @@ staging drivers. So staging is, at best, a stop on the way toward becoming a proper mainline driver. -2.5: TOOLS +Tools +----- As can be seen from the above text, the kernel development process depends heavily on the ability to herd collections of patches in various @@ -368,7 +383,8 @@ upstream. For the management of certain kinds of trees (-mm, for example), quilt is the best tool for the job. -2.6: MAILING LISTS +Mailing lists +------------- A great deal of Linux kernel development work is done by way of mailing lists. It is hard to be a fully-functioning member of the community @@ -436,7 +452,8 @@ filesystem, etc. subsystems. The best place to look for mailing lists is in the MAINTAINERS file packaged with the kernel source. -2.7: GETTING STARTED WITH KERNEL DEVELOPMENT +Getting started with Kernel development +--------------------------------------- Questions about how to get started with the kernel development process are common - from both individuals and companies. Equally common are missteps @@ -463,6 +480,8 @@ they wish for by these means. Andrew Morton gives this advice for aspiring kernel developers +:: + The #1 project for all kernel beginners should surely be "make sure that the kernel runs perfectly at all times on all machines which you can lay your hands on". Usually the way to do this is to work diff --git a/Documentation/development-process/3.Early-stage b/Documentation/development-process/3.Early-stage.rst index f87ba7b3fbac..af2c0af931d6 100644 --- a/Documentation/development-process/3.Early-stage +++ b/Documentation/development-process/3.Early-stage.rst @@ -1,4 +1,7 @@ -3: EARLY-STAGE PLANNING +.. _development_early_stage: + +Early-stage planning +==================== When contemplating a Linux kernel development project, it can be tempting to jump right in and start coding. As with any significant project, @@ -7,7 +10,8 @@ line of code is written. Some time spent in early planning and communication can save far more time later on. -3.1: SPECIFYING THE PROBLEM +Specifying the problem +---------------------- Like any engineering project, a successful kernel enhancement starts with a clear description of the problem to be solved. In some cases, this step is @@ -64,7 +68,8 @@ answers to a short set of questions: Only then does it make sense to start considering possible solutions. -3.2: EARLY DISCUSSION +Early discussion +---------------- When planning a kernel development project, it makes great sense to hold discussions with the community before launching into implementation. Early @@ -117,7 +122,8 @@ In each of these cases, a great deal of pain and extra work could have been avoided with some early discussion with the kernel developers. -3.3: WHO DO YOU TALK TO? +Who do you talk to? +------------------- When developers decide to take their plans public, the next question will be: where do we start? The answer is to find the right mailing list(s) and @@ -141,6 +147,8 @@ development project. The task of finding the right maintainer is sometimes challenging enough that the kernel developers have added a script to ease the process: +:: + .../scripts/get_maintainer.pl This script will return the current maintainer(s) for a given file or @@ -155,7 +163,8 @@ If all else fails, talking to Andrew Morton can be an effective way to track down a maintainer for a specific piece of code. -3.4: WHEN TO POST? +When to post? +------------- If possible, posting your plans during the early stages can only be helpful. Describe the problem being solved and any plans that have been @@ -179,7 +188,8 @@ idea. The best thing to do in this situation is to proceed, keeping the community informed as you go. -3.5: GETTING OFFICIAL BUY-IN +Getting official buy-in +----------------------- If your work is being done in a corporate environment - as most Linux kernel work is - you must, obviously, have permission from suitably diff --git a/Documentation/development-process/4.Coding b/Documentation/development-process/4.Coding.rst index 9a3ee77cefb1..9d5cef996f7f 100644 --- a/Documentation/development-process/4.Coding +++ b/Documentation/development-process/4.Coding.rst @@ -1,4 +1,7 @@ -4: GETTING THE CODE RIGHT +.. _development_coding: + +Getting the code right +====================== While there is much to be said for a solid and community-oriented design process, the proof of any kernel development project is in the resulting @@ -12,9 +15,11 @@ will shift toward doing things right and the tools which can help in that quest. -4.1: PITFALLS +Pitfalls +--------- -* Coding style +Coding style +************ The kernel has long had a standard coding style, described in Documentation/CodingStyle. For much of that time, the policies described @@ -54,7 +59,8 @@ style (a line which becomes far less readable if split to fit within the 80-column limit, for example), just do it. -* Abstraction layers +Abstraction layers +****************** Computer Science professors teach students to make extensive use of abstraction layers in the name of flexibility and information hiding. @@ -87,7 +93,8 @@ implement that functionality at a higher level. There is no value in replicating the same code throughout the kernel. -* #ifdef and preprocessor use in general +#ifdef and preprocessor use in general +************************************** The C preprocessor seems to present a powerful temptation to some C programmers, who see it as a way to efficiently encode a great deal of @@ -113,7 +120,8 @@ easier to read, do not evaluate their arguments multiple times, and allow the compiler to perform type checking on the arguments and return value. -* Inline functions +Inline functions +**************** Inline functions present a hazard of their own, though. Programmers can become enamored of the perceived efficiency inherent in avoiding a function @@ -137,7 +145,8 @@ placement of "inline" keywords may not just be excessive; it could also be irrelevant. -* Locking +Locking +******* In May, 2006, the "Devicescape" networking stack was, with great fanfare, released under the GPL and made available for inclusion in the @@ -151,7 +160,7 @@ This code showed a number of signs of having been developed behind corporate doors. But one large problem in particular was that it was not designed to work on multiprocessor systems. Before this networking stack (now called mac80211) could be merged, a locking scheme needed to be -retrofitted onto it. +retrofitted onto it. Once upon a time, Linux kernel code could be developed without thinking about the concurrency issues presented by multiprocessor systems. Now, @@ -169,7 +178,8 @@ enough to pick the right tool for the job. Code which shows a lack of attention to concurrency will have a difficult path into the mainline. -* Regressions +Regressions +*********** One final hazard worth mentioning is this: it can be tempting to make a change (which may bring big improvements) which causes something to break @@ -185,6 +195,8 @@ change if it brings new functionality to ten systems for each one it breaks? The best answer to this question was expressed by Linus in July, 2007: +:: + So we don't fix bugs by introducing new problems. That way lies madness, and nobody ever knows if you actually make any real progress at all. Is it two steps forwards, one step back, or one @@ -201,8 +213,8 @@ reason, a great deal of thought, clear documentation, and wide review for user-space interfaces is always required. - -4.2: CODE CHECKING TOOLS +Code checking tools +------------------- For now, at least, the writing of error-free code remains an ideal that few of us can reach. What we can hope to do, though, is to catch and fix as @@ -250,7 +262,7 @@ testing purposes. In particular, you should turn on: There are quite a few other debugging options, some of which will be discussed below. Some of them have a significant performance impact and should not be used all of the time. But some time spent learning the -available options will likely be paid back many times over in short order. +available options will likely be paid back many times over in short order. One of the heavier debugging tools is the locking checker, or "lockdep." This tool will track the acquisition and release of every lock (spinlock or @@ -263,7 +275,7 @@ occasion, deadlock. This kind of problem can be painful (for both developers and users) in a deployed system; lockdep allows them to be found in an automated manner ahead of time. Code with any sort of non-trivial locking should be run with lockdep enabled before being submitted for -inclusion. +inclusion. As a diligent kernel programmer, you will, beyond doubt, check the return status of any operation (such as a memory allocation) which can fail. The @@ -300,7 +312,7 @@ Documentation/coccinelle.txt for more information. Other kinds of portability errors are best found by compiling your code for other architectures. If you do not happen to have an S/390 system or a Blackfin development board handy, you can still perform the compilation -step. A large set of cross compilers for x86 systems can be found at +step. A large set of cross compilers for x86 systems can be found at http://www.kernel.org/pub/tools/crosstool/ @@ -308,7 +320,8 @@ Some time spent installing and using these compilers will help avoid embarrassment later. -4.3: DOCUMENTATION +Documentation +------------- Documentation has often been more the exception than the rule with kernel development. Even so, adequate documentation will help to ease the merging @@ -364,7 +377,8 @@ out. Anything which might tempt a code janitor to make an incorrect "cleanup" needs a comment saying why it is done the way it is. And so on. -4.4: INTERNAL API CHANGES +Internal API changes +-------------------- The binary interface provided by the kernel to user space cannot be broken except under the most severe circumstances. The kernel's internal diff --git a/Documentation/development-process/5.Posting b/Documentation/development-process/5.Posting.rst index 8a48c9b62864..b511ddf7e82a 100644 --- a/Documentation/development-process/5.Posting +++ b/Documentation/development-process/5.Posting.rst @@ -1,4 +1,7 @@ -5: POSTING PATCHES +.. _development_posting: + +Posting patches +=============== Sooner or later, the time comes when your work is ready to be presented to the community for review and, eventually, inclusion into the mainline @@ -11,7 +14,8 @@ SubmittingDrivers, and SubmitChecklist in the kernel documentation directory. -5.1: WHEN TO POST +When to post +------------ There is a constant temptation to avoid posting patches before they are completely "ready." For simple patches, that is not a problem. If the @@ -27,7 +31,8 @@ patches which are known to be half-baked, but those who do will come in with the idea that they can help you drive the work in the right direction. -5.2: BEFORE CREATING PATCHES +Before creating patches +----------------------- There are a number of things which should be done before you consider sending patches to the development community. These include: @@ -52,7 +57,8 @@ As a general rule, putting in some extra thought before posting code almost always pays back the effort in short order. -5.3: PATCH PREPARATION +Patch preparation +----------------- The preparation of patches for posting can be a surprising amount of work, but, once again, attempting to save time here is not generally advisable @@ -122,7 +128,8 @@ which takes quite a bit of time and thought after the "real work" has been done. When done properly, though, it is time well spent. -5.4: PATCH FORMATTING AND CHANGELOGS +Patch formatting and changelogs +------------------------------- So now you have a perfect series of patches for posting, but the work is not done quite yet. Each patch needs to be formatted into a message which @@ -140,6 +147,8 @@ that end, each patch will be composed of the following: subsystem name first, followed by the purpose of the patch. For example: + :: + gpio: fix build on CONFIG_GPIO_SYSFS=n - A blank line followed by a detailed description of the contents of the @@ -192,6 +201,8 @@ been associated with the development of this patch. They are described in detail in the SubmittingPatches document; what follows here is a brief summary. Each of these lines has the format: +:: + tag: Full Name <email address> optional-other-stuff The tags in common use are: @@ -225,7 +236,8 @@ Be careful in the addition of tags to your patches: only Cc: is appropriate for addition without the explicit permission of the person named. -5.5: SENDING THE PATCH +Sending the patch +----------------- Before you mail your patches, there are a couple of other things you should take care of: @@ -287,6 +299,8 @@ obvious maintainer, Andrew Morton is often the patch target of last resort. Patches need good subject lines. The canonical format for a patch line is something like: +:: + [PATCH nn/mm] subsys: one-line description of the patch where "nn" is the ordinal number of the patch, "mm" is the total number of diff --git a/Documentation/development-process/6.Followthrough b/Documentation/development-process/6.Followthrough.rst index 41d324a9420d..a173cd5f93d2 100644 --- a/Documentation/development-process/6.Followthrough +++ b/Documentation/development-process/6.Followthrough.rst @@ -1,4 +1,7 @@ -6: FOLLOWTHROUGH +.. _development_followthrough: + +Followthrough +============= At this point, you have followed the guidelines given so far and, with the addition of your own engineering skills, have posted a perfect series of @@ -16,7 +19,8 @@ standards. A failure to participate in this process is quite likely to prevent the inclusion of your patches into the mainline. -6.1: WORKING WITH REVIEWERS +Working with reviewers +---------------------- A patch of any significance will result in a number of comments from other developers as they review the code. Working with reviewers can be, for @@ -97,7 +101,8 @@ though, and not before all other alternatives have been explored. And bear in mind, of course, that he may not agree with you either. -6.2: WHAT HAPPENS NEXT +What happens next +----------------- If a patch is considered to be a good thing to add to the kernel, and once most of the review issues have been resolved, the next step is usually @@ -177,7 +182,8 @@ it with the assumption that you will not be around to maintain it afterward. -6.3: OTHER THINGS THAT CAN HAPPEN +Other things that can happen +----------------------------- One day, you may open your mail client and see that somebody has mailed you a patch to your code. That is one of the advantages of having your code diff --git a/Documentation/development-process/7.AdvancedTopics b/Documentation/development-process/7.AdvancedTopics.rst index 26dc3fa196e4..81d61c5d62dd 100644 --- a/Documentation/development-process/7.AdvancedTopics +++ b/Documentation/development-process/7.AdvancedTopics.rst @@ -1,11 +1,15 @@ -7: ADVANCED TOPICS +.. _development_advancedtopics: + +Advanced topics +=============== At this point, hopefully, you have a handle on how the development process works. There is still more to learn, however! This section will cover a number of topics which can be helpful for developers wanting to become a regular part of the Linux kernel development process. -7.1: MANAGING PATCHES WITH GIT +Managing patches with git +------------------------- The use of distributed version control for the kernel began in early 2002, when Linus first started playing with the proprietary BitKeeper @@ -114,6 +118,8 @@ radar. Kernel developers tend to get unhappy when they see that kind of thing happening; putting up a git tree with unreviewed or off-topic patches can affect your ability to get trees pulled in the future. Quoting Linus: +:: + You can send me patches, but for me to pull a git patch from you, I need to know that you know what you're doing, and I need to be able to trust things *without* then having to go and check every @@ -141,7 +147,8 @@ format the request as other developers expect, and will also check to be sure that you have remembered to push those changes to the public server. -7.2: REVIEWING PATCHES +Reviewing patches +----------------- Some readers will certainly object to putting this section with "advanced topics" on the grounds that even beginning kernel developers should be diff --git a/Documentation/development-process/8.Conclusion b/Documentation/development-process/8.Conclusion.rst index caef69022e9c..23ec7cbc2d2b 100644 --- a/Documentation/development-process/8.Conclusion +++ b/Documentation/development-process/8.Conclusion.rst @@ -1,4 +1,7 @@ -8: FOR MORE INFORMATION +.. _development_conclusion: + +For more information +==================== There are numerous sources of information on Linux kernel development and related topics. First among those will always be the Documentation @@ -47,7 +50,8 @@ Documentation for git can be found at: http://www.kernel.org/pub/software/scm/git/docs/user-manual.html -9: CONCLUSION +Conclusion +========== Congratulations to anybody who has made it through this long-winded document. Hopefully it has provided a helpful understanding of how the diff --git a/Documentation/development-process/conf.py b/Documentation/development-process/conf.py new file mode 100644 index 000000000000..4b4a12dace02 --- /dev/null +++ b/Documentation/development-process/conf.py @@ -0,0 +1,10 @@ +# -*- coding: utf-8; mode: python -*- + +project = 'Linux Kernel Development Documentation' + +tags.add("subproject") + +latex_documents = [ + ('index', 'development-process.tex', 'Linux Kernel Development Documentation', + 'The kernel development community', 'manual'), +] diff --git a/Documentation/development-process/development-process.rst b/Documentation/development-process/development-process.rst new file mode 100644 index 000000000000..bd1399f7202a --- /dev/null +++ b/Documentation/development-process/development-process.rst @@ -0,0 +1,29 @@ +.. _development_process_main: + +A guide to the Kernel Development Process +========================================= + +Contents: + +.. toctree:: + :numbered: + :maxdepth: 2 + + 1.Intro + 2.Process + 3.Early-stage + 4.Coding + 5.Posting + 6.Followthrough + 7.AdvancedTopics + 8.Conclusion + +The purpose of this document is to help developers (and their managers) +work with the development community with a minimum of frustration. It is +an attempt to document how this community works in a way which is +accessible to those who are not intimately familiar with Linux kernel +development (or, indeed, free software development in general). While +there is some technical material here, this is very much a process-oriented +discussion which does not require a deep knowledge of kernel programming to +understand. + diff --git a/Documentation/development-process/index.rst b/Documentation/development-process/index.rst new file mode 100644 index 000000000000..c37475d91090 --- /dev/null +++ b/Documentation/development-process/index.rst @@ -0,0 +1,9 @@ +Linux Kernel Development Documentation +====================================== + +Contents: + +.. toctree:: + :maxdepth: 2 + + development-process diff --git a/Documentation/devicetree/bindings/arm/altera/socfpga-eccmgr.txt b/Documentation/devicetree/bindings/arm/altera/socfpga-eccmgr.txt index b545856a444f..4a1714f96bab 100644 --- a/Documentation/devicetree/bindings/arm/altera/socfpga-eccmgr.txt +++ b/Documentation/devicetree/bindings/arm/altera/socfpga-eccmgr.txt @@ -90,6 +90,47 @@ Required Properties: - interrupts : Should be single bit error interrupt, then double bit error interrupt, in this order. +NAND FIFO ECC +Required Properties: +- compatible : Should be "altr,socfpga-nand-ecc" +- reg : Address and size for ECC block registers. +- altr,ecc-parent : phandle to parent NAND node. +- interrupts : Should be single bit error interrupt, then double bit error + interrupt, in this order. + +DMA FIFO ECC +Required Properties: +- compatible : Should be "altr,socfpga-dma-ecc" +- reg : Address and size for ECC block registers. +- altr,ecc-parent : phandle to parent DMA node. +- interrupts : Should be single bit error interrupt, then double bit error + interrupt, in this order. + +USB FIFO ECC +Required Properties: +- compatible : Should be "altr,socfpga-usb-ecc" +- reg : Address and size for ECC block registers. +- altr,ecc-parent : phandle to parent USB node. +- interrupts : Should be single bit error interrupt, then double bit error + interrupt, in this order. + +QSPI FIFO ECC +Required Properties: +- compatible : Should be "altr,socfpga-qspi-ecc" +- reg : Address and size for ECC block registers. +- altr,ecc-parent : phandle to parent QSPI node. +- interrupts : Should be single bit error interrupt, then double bit error + interrupt, in this order. + +SDMMC FIFO ECC +Required Properties: +- compatible : Should be "altr,socfpga-sdmmc-ecc" +- reg : Address and size for ECC block registers. +- altr,ecc-parent : phandle to parent SD/MMC node. +- interrupts : Should be single bit error interrupt, then double bit error + interrupt, in this order for port A, and then single bit error interrupt, + then double bit error interrupt in this order for port B. + Example: eccmgr: eccmgr@ffd06000 { @@ -132,4 +173,61 @@ Example: interrupts = <5 IRQ_TYPE_LEVEL_HIGH>, <37 IRQ_TYPE_LEVEL_HIGH>; }; + + nand-buf-ecc@ff8c2000 { + compatible = "altr,socfpga-nand-ecc"; + reg = <0xff8c2000 0x400>; + altr,ecc-parent = <&nand>; + interrupts = <11 IRQ_TYPE_LEVEL_HIGH>, + <43 IRQ_TYPE_LEVEL_HIGH>; + }; + + nand-rd-ecc@ff8c2400 { + compatible = "altr,socfpga-nand-ecc"; + reg = <0xff8c2400 0x400>; + altr,ecc-parent = <&nand>; + interrupts = <13 IRQ_TYPE_LEVEL_HIGH>, + <45 IRQ_TYPE_LEVEL_HIGH>; + }; + + nand-wr-ecc@ff8c2800 { + compatible = "altr,socfpga-nand-ecc"; + reg = <0xff8c2800 0x400>; + altr,ecc-parent = <&nand>; + interrupts = <12 IRQ_TYPE_LEVEL_HIGH>, + <44 IRQ_TYPE_LEVEL_HIGH>; + }; + + dma-ecc@ff8c8000 { + compatible = "altr,socfpga-dma-ecc"; + reg = <0xff8c8000 0x400>; + altr,ecc-parent = <&pdma>; + interrupts = <10 IRQ_TYPE_LEVEL_HIGH>, + <42 IRQ_TYPE_LEVEL_HIGH>; + + usb0-ecc@ff8c8800 { + compatible = "altr,socfpga-usb-ecc"; + reg = <0xff8c8800 0x400>; + altr,ecc-parent = <&usb0>; + interrupts = <2 IRQ_TYPE_LEVEL_HIGH>, + <34 IRQ_TYPE_LEVEL_HIGH>; + }; + + qspi-ecc@ff8c8400 { + compatible = "altr,socfpga-qspi-ecc"; + reg = <0xff8c8400 0x400>; + altr,ecc-parent = <&qspi>; + interrupts = <14 IRQ_TYPE_LEVEL_HIGH>, + <46 IRQ_TYPE_LEVEL_HIGH>; + }; + + sdmmc-ecc@ff8c2c00 { + compatible = "altr,socfpga-sdmmc-ecc"; + reg = <0xff8c2c00 0x400>; + altr,ecc-parent = <&mmc>; + interrupts = <15 IRQ_TYPE_LEVEL_HIGH>, + <47 IRQ_TYPE_LEVEL_HIGH>, + <16 IRQ_TYPE_LEVEL_HIGH>, + <48 IRQ_TYPE_LEVEL_HIGH>; + }; }; diff --git a/Documentation/devicetree/bindings/arm/arch_timer.txt b/Documentation/devicetree/bindings/arm/arch_timer.txt index e774128935d5..ef5fbe9a77c7 100644 --- a/Documentation/devicetree/bindings/arm/arch_timer.txt +++ b/Documentation/devicetree/bindings/arm/arch_timer.txt @@ -25,6 +25,12 @@ to deliver its interrupts via SPIs. - always-on : a boolean property. If present, the timer is powered through an always-on power domain, therefore it never loses context. +- fsl,erratum-a008585 : A boolean property. Indicates the presence of + QorIQ erratum A-008585, which says that reading the counter is + unreliable unless the same value is returned by back-to-back reads. + This also affects writes to the tval register, due to the implicit + counter read. + ** Optional properties: - arm,cpu-registers-not-fw-configured : Firmware does not initialize diff --git a/Documentation/devicetree/bindings/arm/mediatek/mediatek,apmixedsys.txt b/Documentation/devicetree/bindings/arm/mediatek/mediatek,apmixedsys.txt index 936166fbee09..cb0054ac7121 100644 --- a/Documentation/devicetree/bindings/arm/mediatek/mediatek,apmixedsys.txt +++ b/Documentation/devicetree/bindings/arm/mediatek/mediatek,apmixedsys.txt @@ -5,7 +5,8 @@ The Mediatek apmixedsys controller provides the PLLs to the system. Required Properties: -- compatible: Should be: +- compatible: Should be one of: + - "mediatek,mt2701-apmixedsys" - "mediatek,mt8135-apmixedsys" - "mediatek,mt8173-apmixedsys" - #clock-cells: Must be 1 diff --git a/Documentation/devicetree/bindings/arm/mediatek/mediatek,bdpsys.txt b/Documentation/devicetree/bindings/arm/mediatek/mediatek,bdpsys.txt new file mode 100644 index 000000000000..4137196dd686 --- /dev/null +++ b/Documentation/devicetree/bindings/arm/mediatek/mediatek,bdpsys.txt @@ -0,0 +1,22 @@ +Mediatek bdpsys controller +============================ + +The Mediatek bdpsys controller provides various clocks to the system. + +Required Properties: + +- compatible: Should be: + - "mediatek,mt2701-bdpsys", "syscon" +- #clock-cells: Must be 1 + +The bdpsys controller uses the common clk binding from +Documentation/devicetree/bindings/clock/clock-bindings.txt +The available clocks are defined in dt-bindings/clock/mt*-clk.h. + +Example: + +bdpsys: clock-controller@1c000000 { + compatible = "mediatek,mt2701-bdpsys", "syscon"; + reg = <0 0x1c000000 0 0x1000>; + #clock-cells = <1>; +}; diff --git a/Documentation/devicetree/bindings/arm/mediatek/mediatek,ethsys.txt b/Documentation/devicetree/bindings/arm/mediatek/mediatek,ethsys.txt new file mode 100644 index 000000000000..768f3a5bc055 --- /dev/null +++ b/Documentation/devicetree/bindings/arm/mediatek/mediatek,ethsys.txt @@ -0,0 +1,22 @@ +Mediatek ethsys controller +============================ + +The Mediatek ethsys controller provides various clocks to the system. + +Required Properties: + +- compatible: Should be: + - "mediatek,mt2701-ethsys", "syscon" +- #clock-cells: Must be 1 + +The ethsys controller uses the common clk binding from +Documentation/devicetree/bindings/clock/clock-bindings.txt +The available clocks are defined in dt-bindings/clock/mt*-clk.h. + +Example: + +ethsys: clock-controller@1b000000 { + compatible = "mediatek,mt2701-ethsys", "syscon"; + reg = <0 0x1b000000 0 0x1000>; + #clock-cells = <1>; +}; diff --git a/Documentation/devicetree/bindings/arm/mediatek/mediatek,hifsys.txt b/Documentation/devicetree/bindings/arm/mediatek/mediatek,hifsys.txt new file mode 100644 index 000000000000..beed7b594cea --- /dev/null +++ b/Documentation/devicetree/bindings/arm/mediatek/mediatek,hifsys.txt @@ -0,0 +1,24 @@ +Mediatek hifsys controller +============================ + +The Mediatek hifsys controller provides various clocks and reset +outputs to the system. + +Required Properties: + +- compatible: Should be: + - "mediatek,mt2701-hifsys", "syscon" +- #clock-cells: Must be 1 + +The hifsys controller uses the common clk binding from +Documentation/devicetree/bindings/clock/clock-bindings.txt +The available clocks are defined in dt-bindings/clock/mt*-clk.h. + +Example: + +hifsys: clock-controller@1a000000 { + compatible = "mediatek,mt2701-hifsys", "syscon"; + reg = <0 0x1a000000 0 0x1000>; + #clock-cells = <1>; + #reset-cells = <1>; +}; diff --git a/Documentation/devicetree/bindings/arm/mediatek/mediatek,imgsys.txt b/Documentation/devicetree/bindings/arm/mediatek/mediatek,imgsys.txt index b1f2ce17dff8..f6a916686f4c 100644 --- a/Documentation/devicetree/bindings/arm/mediatek/mediatek,imgsys.txt +++ b/Documentation/devicetree/bindings/arm/mediatek/mediatek,imgsys.txt @@ -5,7 +5,8 @@ The Mediatek imgsys controller provides various clocks to the system. Required Properties: -- compatible: Should be: +- compatible: Should be one of: + - "mediatek,mt2701-imgsys", "syscon" - "mediatek,mt8173-imgsys", "syscon" - #clock-cells: Must be 1 diff --git a/Documentation/devicetree/bindings/arm/mediatek/mediatek,infracfg.txt b/Documentation/devicetree/bindings/arm/mediatek/mediatek,infracfg.txt index aaf8d1460c4d..1620ec2a5a3f 100644 --- a/Documentation/devicetree/bindings/arm/mediatek/mediatek,infracfg.txt +++ b/Documentation/devicetree/bindings/arm/mediatek/mediatek,infracfg.txt @@ -6,7 +6,8 @@ outputs to the system. Required Properties: -- compatible: Should be: +- compatible: Should be one of: + - "mediatek,mt2701-infracfg", "syscon" - "mediatek,mt8135-infracfg", "syscon" - "mediatek,mt8173-infracfg", "syscon" - #clock-cells: Must be 1 diff --git a/Documentation/devicetree/bindings/arm/mediatek/mediatek,mmsys.txt b/Documentation/devicetree/bindings/arm/mediatek/mediatek,mmsys.txt index 4385946eadef..67dd2e473d25 100644 --- a/Documentation/devicetree/bindings/arm/mediatek/mediatek,mmsys.txt +++ b/Documentation/devicetree/bindings/arm/mediatek/mediatek,mmsys.txt @@ -5,7 +5,8 @@ The Mediatek mmsys controller provides various clocks to the system. Required Properties: -- compatible: Should be: +- compatible: Should be one of: + - "mediatek,mt2701-mmsys", "syscon" - "mediatek,mt8173-mmsys", "syscon" - #clock-cells: Must be 1 diff --git a/Documentation/devicetree/bindings/arm/mediatek/mediatek,pericfg.txt b/Documentation/devicetree/bindings/arm/mediatek/mediatek,pericfg.txt index 2f6ff86df49f..e494366782aa 100644 --- a/Documentation/devicetree/bindings/arm/mediatek/mediatek,pericfg.txt +++ b/Documentation/devicetree/bindings/arm/mediatek/mediatek,pericfg.txt @@ -6,7 +6,8 @@ outputs to the system. Required Properties: -- compatible: Should be: +- compatible: Should be one of: + - "mediatek,mt2701-pericfg", "syscon" - "mediatek,mt8135-pericfg", "syscon" - "mediatek,mt8173-pericfg", "syscon" - #clock-cells: Must be 1 diff --git a/Documentation/devicetree/bindings/arm/mediatek/mediatek,topckgen.txt b/Documentation/devicetree/bindings/arm/mediatek/mediatek,topckgen.txt index f9e917994ced..9f2fe7860114 100644 --- a/Documentation/devicetree/bindings/arm/mediatek/mediatek,topckgen.txt +++ b/Documentation/devicetree/bindings/arm/mediatek/mediatek,topckgen.txt @@ -5,7 +5,8 @@ The Mediatek topckgen controller provides various clocks to the system. Required Properties: -- compatible: Should be: +- compatible: Should be one of: + - "mediatek,mt2701-topckgen" - "mediatek,mt8135-topckgen" - "mediatek,mt8173-topckgen" - #clock-cells: Must be 1 diff --git a/Documentation/devicetree/bindings/arm/mediatek/mediatek,vdecsys.txt b/Documentation/devicetree/bindings/arm/mediatek/mediatek,vdecsys.txt index 1faacf1c1b25..2440f73450c3 100644 --- a/Documentation/devicetree/bindings/arm/mediatek/mediatek,vdecsys.txt +++ b/Documentation/devicetree/bindings/arm/mediatek/mediatek,vdecsys.txt @@ -5,7 +5,8 @@ The Mediatek vdecsys controller provides various clocks to the system. Required Properties: -- compatible: Should be: +- compatible: Should be one of: + - "mediatek,mt2701-vdecsys", "syscon" - "mediatek,mt8173-vdecsys", "syscon" - #clock-cells: Must be 1 diff --git a/Documentation/devicetree/bindings/clock/amlogic,gxbb-aoclkc.txt b/Documentation/devicetree/bindings/clock/amlogic,gxbb-aoclkc.txt new file mode 100644 index 000000000000..a55d31b48d6e --- /dev/null +++ b/Documentation/devicetree/bindings/clock/amlogic,gxbb-aoclkc.txt @@ -0,0 +1,45 @@ +* Amlogic GXBB AO Clock and Reset Unit + +The Amlogic GXBB AO clock controller generates and supplies clock to various +controllers within the Always-On part of the SoC. + +Required Properties: + +- compatible: should be "amlogic,gxbb-aoclkc" +- reg: physical base address of the clock controller and length of memory + mapped region. + +- #clock-cells: should be 1. + +Each clock is assigned an identifier and client nodes can use this identifier +to specify the clock which they consume. All available clocks are defined as +preprocessor macros in the dt-bindings/clock/gxbb-aoclkc.h header and can be +used in device tree sources. + +- #reset-cells: should be 1. + +Each reset is assigned an identifier and client nodes can use this identifier +to specify the reset which they consume. All available resets are defined as +preprocessor macros in the dt-bindings/reset/gxbb-aoclkc.h header and can be +used in device tree sources. + +Example: AO Clock controller node: + + clkc_AO: clock-controller@040 { + compatible = "amlogic,gxbb-aoclkc"; + reg = <0x0 0x040 0x0 0x4>; + #clock-cells = <1>; + #reset-cells = <1>; + }; + +Example: UART controller node that consumes the clock and reset generated + by the clock controller: + + uart_AO: serial@4c0 { + compatible = "amlogic,meson-uart"; + reg = <0x4c0 0x14>; + interrupts = <0 90 1>; + clocks = <&clkc_AO CLKID_AO_UART1>; + resets = <&clkc_AO RESET_AO_UART1>; + status = "disabled"; + }; diff --git a/Documentation/devicetree/bindings/clock/arm-syscon-icst.txt b/Documentation/devicetree/bindings/clock/arm-syscon-icst.txt index 8b7177cecb36..27468119fd94 100644 --- a/Documentation/devicetree/bindings/clock/arm-syscon-icst.txt +++ b/Documentation/devicetree/bindings/clock/arm-syscon-icst.txt @@ -5,20 +5,50 @@ Technology (IDT). ARM integrated these oscillators deeply into their reference designs by adding special control registers that manage such oscillators to their system controllers. -The ARM system controller contains logic to serialize and initialize +The various ARM system controllers contain logic to serialize and initialize an ICST clock request after a write to the 32 bit register at an offset into the system controller. Furthermore, to even be able to alter one of these frequencies, the system controller must first be unlocked by writing a special token to another offset in the system controller. +Some ARM hardware contain special versions of the serial interface that only +connects the low 8 bits of the VDW (missing one bit), hardwires RDW to +different values and sometimes also hardwire the output divider. They +therefore have special compatible strings as per this table (the OD value is +the value on the pins, not the resulting output divider): + +Hardware variant: RDW OD VDW + +Integrator/AP 22 1 Bit 8 0, rest variable +integratorap-cm + +Integrator/AP 46 3 Bit 8 0, rest variable +integratorap-sys + +Integrator/AP 22 or 1 17 or (33 or 25 MHz) +integratorap-pci 14 1 14 + +Integrator/CP 22 variable Bit 8 0, rest variable +integratorcp-cm-core + +Integrator/CP 22 variable Bit 8 0, rest variable +integratorcp-cm-mem + The ICST oscillator must be provided inside a system controller node. Required properties: +- compatible: must be one of + "arm,syscon-icst525" + "arm,syscon-icst307" + "arm,syscon-icst525-integratorap-cm" + "arm,syscon-icst525-integratorap-sys" + "arm,syscon-icst525-integratorap-pci" + "arm,syscon-icst525-integratorcp-cm-core" + "arm,syscon-icst525-integratorcp-cm-mem" - lock-offset: the offset address into the system controller where the unlocking register is located - vco-offset: the offset address into the system controller where the ICST control register is located (even 32 bit address) -- compatible: must be one of "arm,syscon-icst525" or "arm,syscon-icst307" - #clock-cells: must be <0> - clocks: parent clock, since the ICST needs a parent clock to derive its frequency from, this attribute is compulsory. diff --git a/Documentation/devicetree/bindings/clock/armada3700-periph-clock.txt b/Documentation/devicetree/bindings/clock/armada3700-periph-clock.txt new file mode 100644 index 000000000000..1e3370ba189f --- /dev/null +++ b/Documentation/devicetree/bindings/clock/armada3700-periph-clock.txt @@ -0,0 +1,70 @@ +* Peripheral Clock bindings for Marvell Armada 37xx SoCs + +Marvell Armada 37xx SoCs provide peripheral clocks which are +used as clock source for the peripheral of the SoC. + +There are two different blocks associated to north bridge and south +bridge. + +The peripheral clock consumer should specify the desired clock by +having the clock ID in its "clocks" phandle cell. + +The following is a list of provided IDs for Armada 370 North bridge clocks: +ID Clock name Description +----------------------------------- +0 mmc MMC controller +1 sata_host Sata Host +2 sec_at Security AT +3 sac_dap Security DAP +4 tsecm Security Engine +5 setm_tmx Serial Embedded Trace Module +6 avs Adaptive Voltage Scaling +7 sqf SPI +8 pwm PWM +9 i2c_2 I2C 2 +10 i2c_1 I2C 1 +11 ddr_phy DDR PHY +12 ddr_fclk DDR F clock +13 trace Trace +14 counter Counter +15 eip97 EIP 97 +16 cpu CPU + +The following is a list of provided IDs for Armada 370 South bridge clocks: +ID Clock name Description +----------------------------------- +0 gbe-50 50 MHz parent clock for Gigabit Ethernet +1 gbe-core parent clock for Gigabit Ethernet core +2 gbe-125 125 MHz parent clock for Gigabit Ethernet +3 gbe1-50 50 MHz clock for Gigabit Ethernet port 1 +4 gbe0-50 50 MHz clock for Gigabit Ethernet port 0 +5 gbe1-125 125 MHz clock for Gigabit Ethernet port 1 +6 gbe0-125 125 MHz clock for Gigabit Ethernet port 0 +7 gbe1-core Gigabit Ethernet core port 1 +8 gbe0-core Gigabit Ethernet core port 0 +9 gbe-bm Gigabit Ethernet Buffer Manager +10 sdio SDIO +11 usb32-sub2-sys USB 2 clock +12 usb32-ss-sys USB 3 clock + +Required properties: + +- compatible : shall be "marvell,armada-3700-periph-clock-nb" for the + north bridge block, or + "marvell,armada-3700-periph-clock-sb" for the south bridge block +- reg : must be the register address of North/South Bridge Clock register +- #clock-cells : from common clock binding; shall be set to 1 + +- clocks : list of the parent clock phandle in the following order: + TBG-A P, TBG-B P, TBG-A S, TBG-B S and finally the xtal clock. + + +Example: + +nb_perih_clk: nb-periph-clk@13000{ + compatible = "marvell,armada-3700-periph-clock-nb"; + reg = <0x13000 0x1000>; + clocks = <&tbg 0>, <&tbg 1>, <&tbg 2>, + <&tbg 3>, <&xtalclk>; + #clock-cells = <1>; +}; diff --git a/Documentation/devicetree/bindings/clock/armada3700-tbg-clock.txt b/Documentation/devicetree/bindings/clock/armada3700-tbg-clock.txt new file mode 100644 index 000000000000..0ba1d83ff363 --- /dev/null +++ b/Documentation/devicetree/bindings/clock/armada3700-tbg-clock.txt @@ -0,0 +1,27 @@ +* Time Base Generator Clock bindings for Marvell Armada 37xx SoCs + +Marvell Armada 37xx SoCs provde Time Base Generator clocks which are +used as parent clocks for the peripheral clocks. + +The TBG clock consumer should specify the desired clock by having the +clock ID in its "clocks" phandle cell. + +The following is a list of provided IDs and clock names on Armada 3700: + 0 = TBG A P + 1 = TBG B P + 2 = TBG A S + 3 = TBG B S + +Required properties: +- compatible : shall be "marvell,armada-3700-tbg-clock" +- reg : must be the register address of North Bridge PLL register +- #clock-cells : from common clock binding; shall be set to 1 + +Example: + +tbg: tbg@13200 { + compatible = "marvell,armada-3700-tbg-clock"; + reg = <0x13200 0x1000>; + clocks = <&xtalclk>; + #clock-cells = <1>; +}; diff --git a/Documentation/devicetree/bindings/clock/armada3700-xtal-clock.txt b/Documentation/devicetree/bindings/clock/armada3700-xtal-clock.txt new file mode 100644 index 000000000000..a88f1f05fbd6 --- /dev/null +++ b/Documentation/devicetree/bindings/clock/armada3700-xtal-clock.txt @@ -0,0 +1,28 @@ +* Xtal Clock bindings for Marvell Armada 37xx SoCs + +Marvell Armada 37xx SoCs allow to determine the xtal clock frequencies by +reading the gpio latch register. + +This node must be a subnode of the node exposing the register address +of the GPIO block where the gpio latch is located. + +Required properties: +- compatible : shall be one of the following: + "marvell,armada-3700-xtal-clock" +- #clock-cells : from common clock binding; shall be set to 0 + +Optional properties: +- clock-output-names : from common clock binding; allows overwrite default clock + output names ("xtal") + +Example: +gpio1: gpio@13800 { + compatible = "marvell,armada-3700-gpio", "syscon", "simple-mfd"; + reg = <0x13800 0x1000>; + + xtalclk: xtal-clk { + compatible = "marvell,armada-3700-xtal-clock"; + clock-output-names = "xtal"; + #clock-cells = <0>; + }; +}; diff --git a/Documentation/devicetree/bindings/clock/at91-clock.txt b/Documentation/devicetree/bindings/clock/at91-clock.txt index 181bc8ac4e3a..5f3ad65daf69 100644 --- a/Documentation/devicetree/bindings/clock/at91-clock.txt +++ b/Documentation/devicetree/bindings/clock/at91-clock.txt @@ -6,7 +6,8 @@ This binding uses the common clock binding[1]. Required properties: - compatible : shall be one of the following: - "atmel,at91sam9x5-sckc": + "atmel,at91sam9x5-sckc" or + "atmel,sama5d4-sckc": at91 SCKC (Slow Clock Controller) This node contains the slow clock definitions. diff --git a/Documentation/devicetree/bindings/clock/brcm,bcm53573-ilp.txt b/Documentation/devicetree/bindings/clock/brcm,bcm53573-ilp.txt new file mode 100644 index 000000000000..2ebb107331dd --- /dev/null +++ b/Documentation/devicetree/bindings/clock/brcm,bcm53573-ilp.txt @@ -0,0 +1,36 @@ +Broadcom BCM53573 ILP clock +=========================== + +This binding uses the common clock binding: + Documentation/devicetree/bindings/clock/clock-bindings.txt + +This binding is used for ILP clock (sometimes referred as "slow clock") +on Broadcom BCM53573 devices using Cortex-A7 CPU. + +ILP's rate has to be calculated on runtime and it depends on ALP clock +which has to be referenced. + +This clock is part of PMU (Power Management Unit), a Broadcom's device +handing power-related aspects. Its node must be sub-node of the PMU +device. + +Required properties: +- compatible: "brcm,bcm53573-ilp" +- clocks: has to reference an ALP clock +- #clock-cells: should be <0> +- clock-output-names: from common clock bindings, should contain clock + name + +Example: + +pmu@18012000 { + compatible = "simple-mfd", "syscon"; + reg = <0x18012000 0x00001000>; + + ilp { + compatible = "brcm,bcm53573-ilp"; + clocks = <&alp>; + #clock-cells = <0>; + clock-output-names = "ilp"; + }; +}; diff --git a/Documentation/devicetree/bindings/clock/clk-exynos-audss.txt b/Documentation/devicetree/bindings/clock/clk-exynos-audss.txt index 180e8835569e..0c3d6015868d 100644 --- a/Documentation/devicetree/bindings/clock/clk-exynos-audss.txt +++ b/Documentation/devicetree/bindings/clock/clk-exynos-audss.txt @@ -10,6 +10,8 @@ Required Properties: - "samsung,exynos4210-audss-clock" - controller compatible with all Exynos4 SoCs. - "samsung,exynos5250-audss-clock" - controller compatible with Exynos5250 SoCs. + - "samsung,exynos5410-audss-clock" - controller compatible with Exynos5410 + SoCs. - "samsung,exynos5420-audss-clock" - controller compatible with Exynos5420 SoCs. - reg: physical base address and length of the controller's register set. @@ -91,5 +93,5 @@ i2s0: i2s@03830000 { <&clock_audss EXYNOS_MOUT_AUDSS>, <&clock_audss EXYNOS_MOUT_I2S>; clock-names = "iis", "i2s_opclk0", "i2s_opclk1", - "mout_audss", "mout_i2s"; + "mout_audss", "mout_i2s"; }; diff --git a/Documentation/devicetree/bindings/clock/exynos5410-clock.txt b/Documentation/devicetree/bindings/clock/exynos5410-clock.txt index aeab635b07b5..4527de3ea205 100644 --- a/Documentation/devicetree/bindings/clock/exynos5410-clock.txt +++ b/Documentation/devicetree/bindings/clock/exynos5410-clock.txt @@ -12,24 +12,29 @@ Required Properties: - #clock-cells: should be 1. +- clocks: should contain an entry specifying the root clock from external + oscillator supplied through XXTI or XusbXTI pin. This clock should be + defined using standard clock bindings with "fin_pll" clock-output-name. + That clock is being passed internally to the 9 PLLs. + All available clocks are defined as preprocessor macros in dt-bindings/clock/exynos5410.h header and can be used in device tree sources. -External clock: - -There is clock that is generated outside the SoC. It -is expected that it is defined using standard clock bindings -with following clock-output-name: - - - "fin_pll" - PLL input clock from XXTI - Example 1: An example of a clock controller node is listed below. + fin_pll: xxti { + compatible = "fixed-clock"; + clock-frequency = <24000000>; + clock-output-names = "fin_pll"; + #clock-cells = <0>; + }; + clock: clock-controller@0x10010000 { compatible = "samsung,exynos5410-clock"; reg = <0x10010000 0x30000>; #clock-cells = <1>; + clocks = <&fin_pll>; }; Example 2: UART controller node that consumes the clock generated by the clock diff --git a/Documentation/devicetree/bindings/clock/maxim,max77686.txt b/Documentation/devicetree/bindings/clock/maxim,max77686.txt index 9c40739a661a..8398a3a5e106 100644 --- a/Documentation/devicetree/bindings/clock/maxim,max77686.txt +++ b/Documentation/devicetree/bindings/clock/maxim,max77686.txt @@ -1,10 +1,24 @@ -Binding for Maxim MAX77686 32k clock generator block +Binding for Maxim MAX77686/MAX77802/MAX77620 32k clock generator block -This is a part of device tree bindings of MAX77686 multi-function device. -More information can be found in bindings/mfd/max77686.txt file. +This is a part of device tree bindings of MAX77686/MAX77802/MAX77620 +multi-function device. More information can be found in MFD DT binding +doc as follows: + bindings/mfd/max77686.txt for MAX77686 and + bindings/mfd/max77802.txt for MAX77802 and + bindings/mfd/max77620.txt for MAX77620. The MAX77686 contains three 32.768khz clock outputs that can be controlled -(gated/ungated) over I2C. +(gated/ungated) over I2C. Clocks are defined as preprocessor macros in +dt-bindings/clock/maxim,max77686.h. + + +The MAX77802 contains two 32.768khz clock outputs that can be controlled +(gated/ungated) over I2C. Clocks are defined as preprocessor macros in +dt-bindings/clock/maxim,max77802.h. + +The MAX77686 contains one 32.768khz clock outputs that can be controlled +(gated/ungated) over I2C. Clocks are defined as preprocessor macros in +dt-bindings/clock/maxim,max77620.h. Following properties should be presend in main device node of the MFD chip. @@ -17,30 +31,84 @@ Optional properties: Each clock is assigned an identifier and client nodes can use this identifier to specify the clock which they consume. Following indices are allowed: - - 0: 32khz_ap clock, - - 1: 32khz_cp clock, - - 2: 32khz_pmic clock. + - 0: 32khz_ap clock (max77686, max77802), 32khz_out0 (max77620) + - 1: 32khz_cp clock (max77686, max77802), + - 2: 32khz_pmic clock (max77686). + +Clocks are defined as preprocessor macros in above dt-binding header for +respective chips. + +Example: + +1. With MAX77686: + +#include <dt-bindings/clock/maxim,max77686.h> +/* ... */ + + Node of the MFD chip + max77686: max77686@09 { + compatible = "maxim,max77686"; + interrupt-parent = <&wakeup_eint>; + interrupts = <26 0>; + reg = <0x09>; + #clock-cells = <1>; + + /* ... */ + }; + + Clock consumer node + + foo@0 { + compatible = "bar,foo"; + /* ... */ + clock-names = "my-clock"; + clocks = <&max77686 MAX77686_CLK_PMIC>; + }; + +2. With MAX77802: + +#include <dt-bindings/clock/maxim,max77802.h> +/* ... */ + + Node of the MFD chip + max77802: max77802@09 { + compatible = "maxim,max77802"; + interrupt-parent = <&wakeup_eint>; + interrupts = <26 0>; + reg = <0x09>; + #clock-cells = <1>; + + /* ... */ + }; + + Clock consumer node + + foo@0 { + compatible = "bar,foo"; + /* ... */ + clock-names = "my-clock"; + clocks = <&max77802 MAX77802_CLK_32K_AP>; + }; -Clocks are defined as preprocessor macros in dt-bindings/clock/maxim,max77686.h -header and can be used in device tree sources. -Example: Node of the MFD chip +3. With MAX77620: - max77686: max77686@09 { - compatible = "maxim,max77686"; - interrupt-parent = <&wakeup_eint>; - interrupts = <26 0>; - reg = <0x09>; - #clock-cells = <1>; +#include <dt-bindings/clock/maxim,max77620.h> +/* ... */ - /* ... */ - }; + Node of the MFD chip + max77620: max77620@3c { + compatible = "maxim,max77620"; + reg = <0x3c>; + #clock-cells = <1>; + /* ... */ + }; -Example: Clock consumer node + Clock consumer node - foo@0 { - compatible = "bar,foo"; - /* ... */ - clock-names = "my-clock"; - clocks = <&max77686 MAX77686_CLK_PMIC>; - }; + foo@0 { + compatible = "bar,foo"; + /* ... */ + clock-names = "my-clock"; + clocks = <&max77620 MAX77620_CLK_32K_OUT0>; + }; diff --git a/Documentation/devicetree/bindings/clock/maxim,max77802.txt b/Documentation/devicetree/bindings/clock/maxim,max77802.txt deleted file mode 100644 index c6dc7835f06c..000000000000 --- a/Documentation/devicetree/bindings/clock/maxim,max77802.txt +++ /dev/null @@ -1,44 +0,0 @@ -Binding for Maxim MAX77802 32k clock generator block - -This is a part of device tree bindings of MAX77802 multi-function device. -More information can be found in bindings/mfd/max77802.txt file. - -The MAX77802 contains two 32.768khz clock outputs that can be controlled -(gated/ungated) over I2C. - -Following properties should be present in main device node of the MFD chip. - -Required properties: -- #clock-cells: From common clock binding; shall be set to 1. - -Optional properties: -- clock-output-names: From common clock binding. - -Each clock is assigned an identifier and client nodes can use this identifier -to specify the clock which they consume. Following indices are allowed: - - 0: 32khz_ap clock, - - 1: 32khz_cp clock. - -Clocks are defined as preprocessor macros in dt-bindings/clock/maxim,max77802.h -header and can be used in device tree sources. - -Example: Node of the MFD chip - - max77802: max77802@09 { - compatible = "maxim,max77802"; - interrupt-parent = <&wakeup_eint>; - interrupts = <26 0>; - reg = <0x09>; - #clock-cells = <1>; - - /* ... */ - }; - -Example: Clock consumer node - - foo@0 { - compatible = "bar,foo"; - /* ... */ - clock-names = "my-clock"; - clocks = <&max77802 MAX77802_CLK_32K_AP>; - }; diff --git a/Documentation/devicetree/bindings/clock/mvebu-gated-clock.txt b/Documentation/devicetree/bindings/clock/mvebu-gated-clock.txt index 660e64912cce..cb8542d910b3 100644 --- a/Documentation/devicetree/bindings/clock/mvebu-gated-clock.txt +++ b/Documentation/devicetree/bindings/clock/mvebu-gated-clock.txt @@ -86,6 +86,8 @@ ID Clock Peripheral 7 pex3 PCIe 3 8 pex0 PCIe 0 9 usb3h0 USB3 Host 0 +10 usb3h1 USB3 Host 1 +15 sata0 SATA 0 17 sdio SDIO 22 xor0 XOR 0 28 xor1 XOR 1 diff --git a/Documentation/devicetree/bindings/clock/qcom,gcc.txt b/Documentation/devicetree/bindings/clock/qcom,gcc.txt index 9a60fde32b02..869a2f0e2ff6 100644 --- a/Documentation/devicetree/bindings/clock/qcom,gcc.txt +++ b/Documentation/devicetree/bindings/clock/qcom,gcc.txt @@ -15,6 +15,7 @@ Required properties : "qcom,gcc-msm8974pro" "qcom,gcc-msm8974pro-ac" "qcom,gcc-msm8996" + "qcom,gcc-mdm9615" - reg : shall contain base register location and length - #clock-cells : shall contain 1 diff --git a/Documentation/devicetree/bindings/clock/qcom,lcc.txt b/Documentation/devicetree/bindings/clock/qcom,lcc.txt index dd755be63a01..a3c78aa88038 100644 --- a/Documentation/devicetree/bindings/clock/qcom,lcc.txt +++ b/Documentation/devicetree/bindings/clock/qcom,lcc.txt @@ -7,6 +7,7 @@ Required properties : "qcom,lcc-msm8960" "qcom,lcc-apq8064" "qcom,lcc-ipq8064" + "qcom,lcc-mdm9615" - reg : shall contain base register location and length - #clock-cells : shall contain 1 diff --git a/Documentation/devicetree/bindings/clock/st/st,clkgen-divmux.txt b/Documentation/devicetree/bindings/clock/st/st,clkgen-divmux.txt deleted file mode 100644 index 6247652044a0..000000000000 --- a/Documentation/devicetree/bindings/clock/st/st,clkgen-divmux.txt +++ /dev/null @@ -1,49 +0,0 @@ -Binding for a ST divider and multiplexer clock driver. - -This binding uses the common clock binding[1]. -Base address is located to the parent node. See clock binding[2] - -[1] Documentation/devicetree/bindings/clock/clock-bindings.txt -[2] Documentation/devicetree/bindings/clock/st/st,clkgen.txt - -Required properties: - -- compatible : shall be: - "st,clkgena-divmux-c65-hs", "st,clkgena-divmux" - "st,clkgena-divmux-c65-ls", "st,clkgena-divmux" - "st,clkgena-divmux-c32-odf0", "st,clkgena-divmux" - "st,clkgena-divmux-c32-odf1", "st,clkgena-divmux" - "st,clkgena-divmux-c32-odf2", "st,clkgena-divmux" - "st,clkgena-divmux-c32-odf3", "st,clkgena-divmux" - -- #clock-cells : From common clock binding; shall be set to 1. - -- clocks : From common clock binding - -- clock-output-names : From common clock binding. - -Example: - - clockgen-a@fd345000 { - reg = <0xfd345000 0xb50>; - - clk_m_a1_div1: clk-m-a1-div1 { - #clock-cells = <1>; - compatible = "st,clkgena-divmux-c32-odf1", - "st,clkgena-divmux"; - - clocks = <&clk_m_a1_osc_prediv>, - <&clk_m_a1_pll0 1>, /* PLL0 PHI1 */ - <&clk_m_a1_pll1 1>; /* PLL1 PHI1 */ - - clock-output-names = "clk-m-rx-icn-ts", - "clk-m-rx-icn-vdp-0", - "", /* unused */ - "clk-m-prv-t1-bus", - "clk-m-icn-reg-12", - "clk-m-icn-reg-10", - "", /* unused */ - "clk-m-icn-st231"; - }; - }; - diff --git a/Documentation/devicetree/bindings/clock/st/st,clkgen-mux.txt b/Documentation/devicetree/bindings/clock/st/st,clkgen-mux.txt index f1fa91c68768..9a46cb1d7a04 100644 --- a/Documentation/devicetree/bindings/clock/st/st,clkgen-mux.txt +++ b/Documentation/devicetree/bindings/clock/st/st,clkgen-mux.txt @@ -10,14 +10,7 @@ This binding uses the common clock binding[1]. Required properties: - compatible : shall be: - "st,stih416-clkgenc-vcc-hd", "st,clkgen-mux" - "st,stih416-clkgenf-vcc-fvdp", "st,clkgen-mux" - "st,stih416-clkgenf-vcc-hva", "st,clkgen-mux" - "st,stih416-clkgenf-vcc-hd", "st,clkgen-mux" - "st,stih416-clkgenf-vcc-sd", "st,clkgen-mux" - "st,stih415-clkgen-a9-mux", "st,clkgen-mux" - "st,stih416-clkgen-a9-mux", "st,clkgen-mux" - "st,stih407-clkgen-a9-mux", "st,clkgen-mux" + "st,stih407-clkgen-a9-mux" - #clock-cells : from common clock binding; shall be set to 0. @@ -27,10 +20,13 @@ Required properties: Example: - clk_m_hva: clk-m-hva@fd690868 { + clk_m_a9: clk-m-a9@92b0000 { #clock-cells = <0>; - compatible = "st,stih416-clkgenf-vcc-hva", "st,clkgen-mux"; - reg = <0xfd690868 4>; + compatible = "st,stih407-clkgen-a9-mux"; + reg = <0x92b0000 0x10000>; - clocks = <&clockgen_f 1>, <&clk_m_a1_div0 3>; + clocks = <&clockgen_a9_pll 0>, + <&clockgen_a9_pll 0>, + <&clk_s_c0_flexgen 13>, + <&clk_m_a9_ext2f_div2>; }; diff --git a/Documentation/devicetree/bindings/clock/st/st,clkgen-pll.txt b/Documentation/devicetree/bindings/clock/st/st,clkgen-pll.txt index 844b3a0976bf..f207053e0550 100644 --- a/Documentation/devicetree/bindings/clock/st/st,clkgen-pll.txt +++ b/Documentation/devicetree/bindings/clock/st/st,clkgen-pll.txt @@ -9,24 +9,10 @@ Base address is located to the parent node. See clock binding[2] Required properties: - compatible : shall be: - "st,clkgena-prediv-c65", "st,clkgena-prediv" - "st,clkgena-prediv-c32", "st,clkgena-prediv" - - "st,clkgena-plls-c65" - "st,plls-c32-a1x-0", "st,clkgen-plls-c32" - "st,plls-c32-a1x-1", "st,clkgen-plls-c32" - "st,stih415-plls-c32-a9", "st,clkgen-plls-c32" - "st,stih415-plls-c32-ddr", "st,clkgen-plls-c32" - "st,stih416-plls-c32-a9", "st,clkgen-plls-c32" - "st,stih416-plls-c32-ddr", "st,clkgen-plls-c32" - "st,stih407-plls-c32-a0", "st,clkgen-plls-c32" - "st,stih407-plls-c32-a9", "st,clkgen-plls-c32" - "sst,plls-c32-cx_0", "st,clkgen-plls-c32" - "sst,plls-c32-cx_1", "st,clkgen-plls-c32" - "st,stih418-plls-c28-a9", "st,clkgen-plls-c32" - - "st,stih415-gpu-pll-c32", "st,clkgengpu-pll-c32" - "st,stih416-gpu-pll-c32", "st,clkgengpu-pll-c32" + "st,clkgen-pll0" + "st,clkgen-pll1" + "st,stih407-clkgen-plla9" + "st,stih418-clkgen-plla9" - #clock-cells : From common clock binding; shall be set to 1. @@ -36,17 +22,16 @@ Required properties: Example: - clockgen-a@fee62000 { - reg = <0xfee62000 0xb48>; + clockgen-a9@92b0000 { + compatible = "st,clkgen-c32"; + reg = <0x92b0000 0xffff>; - clk_s_a0_pll: clk-s-a0-pll { + clockgen_a9_pll: clockgen-a9-pll { #clock-cells = <1>; - compatible = "st,clkgena-plls-c65"; + compatible = "st,stih407-clkgen-plla9"; clocks = <&clk_sysin>; - clock-output-names = "clk-s-a0-pll0-hs", - "clk-s-a0-pll0-ls", - "clk-s-a0-pll1"; + clock-output-names = "clockgen-a9-pll-odf"; }; }; diff --git a/Documentation/devicetree/bindings/clock/st/st,clkgen-prediv.txt b/Documentation/devicetree/bindings/clock/st/st,clkgen-prediv.txt deleted file mode 100644 index 604766c2619e..000000000000 --- a/Documentation/devicetree/bindings/clock/st/st,clkgen-prediv.txt +++ /dev/null @@ -1,36 +0,0 @@ -Binding for a ST pre-divider clock driver. - -This binding uses the common clock binding[1]. -Base address is located to the parent node. See clock binding[2] - -[1] Documentation/devicetree/bindings/clock/clock-bindings.txt -[2] Documentation/devicetree/bindings/clock/st/st,clkgen.txt - -Required properties: - -- compatible : shall be: - "st,clkgena-prediv-c65", "st,clkgena-prediv" - "st,clkgena-prediv-c32", "st,clkgena-prediv" - -- #clock-cells : From common clock binding; shall be set to 0. - -- clocks : From common clock binding - -- clock-output-names : From common clock binding. - -Example: - - clockgen-a@fd345000 { - reg = <0xfd345000 0xb50>; - - clk_m_a2_osc_prediv: clk-m-a2-osc-prediv { - #clock-cells = <0>; - compatible = "st,clkgena-prediv-c32", - "st,clkgena-prediv"; - - clocks = <&clk_sysin>; - - clock-output-names = "clk-m-a2-osc-prediv"; - }; - }; - diff --git a/Documentation/devicetree/bindings/clock/st/st,clkgen-vcc.txt b/Documentation/devicetree/bindings/clock/st/st,clkgen-vcc.txt deleted file mode 100644 index 109b3eddcb17..000000000000 --- a/Documentation/devicetree/bindings/clock/st/st,clkgen-vcc.txt +++ /dev/null @@ -1,61 +0,0 @@ -Binding for a type of STMicroelectronics clock crossbar (VCC). - -The crossbar can take up to 4 input clocks and control up to 16 -output clocks. Not all inputs or outputs have to be in use in a -particular instantiation. Each output can be individually enabled, -select any of the input clocks and apply a divide (by 1,2,4 or 8) to -that selected clock. - -This binding uses the common clock binding[1]. - -[1] Documentation/devicetree/bindings/clock/clock-bindings.txt - -Required properties: - -- compatible : shall be: - "st,stih416-clkgenc", "st,vcc" - "st,stih416-clkgenf", "st,vcc" - -- #clock-cells : from common clock binding; shall be set to 1. - -- reg : A Base address and length of the register set. - -- clocks : from common clock binding - -- clock-output-names : From common clock binding. The block has 16 - clock outputs but not all of them in a specific instance - have to be used in the SoC. If a clock name is left as - an empty string then no clock will be created for the - output associated with that string index. If fewer than - 16 strings are provided then no clocks will be created - for the remaining outputs. - -Example: - - clockgen_c_vcc: clockgen-c-vcc@0xfe8308ac { - #clock-cells = <1>; - compatible = "st,stih416-clkgenc", "st,clkgen-vcc"; - reg = <0xfe8308ac 12>; - - clocks = <&clk_s_vcc_hd>, - <&clockgen_c 1>, - <&clk_s_tmds_fromphy>, - <&clockgen_c 2>; - - clock-output-names = "clk-s-pix-hdmi", - "clk-s-pix-dvo", - "clk-s-out-dvo", - "clk-s-pix-hd", - "clk-s-hddac", - "clk-s-denc", - "clk-s-sddac", - "clk-s-pix-main", - "clk-s-pix-aux", - "clk-s-stfe-frc-0", - "clk-s-ref-mcru", - "clk-s-slave-mcru", - "clk-s-tmds-hdmi", - "clk-s-hdmi-reject-pll", - "clk-s-thsens"; - }; - diff --git a/Documentation/devicetree/bindings/clock/st/st,clkgen.txt b/Documentation/devicetree/bindings/clock/st/st,clkgen.txt index b18bf86f926f..c35390f60545 100644 --- a/Documentation/devicetree/bindings/clock/st/st,clkgen.txt +++ b/Documentation/devicetree/bindings/clock/st/st,clkgen.txt @@ -13,14 +13,6 @@ address is common of all subnode. ... }; - prediv_node { - ... - }; - - divmux_node { - ... - }; - quadfs_node { ... }; @@ -29,10 +21,6 @@ address is common of all subnode. ... }; - vcc_node { - ... - }; - flexgen_node { ... }; @@ -43,11 +31,8 @@ This binding uses the common clock binding[1]. Each subnode should use the binding described in [2]..[7] [1] Documentation/devicetree/bindings/clock/clock-bindings.txt -[2] Documentation/devicetree/bindings/clock/st,clkgen-divmux.txt [3] Documentation/devicetree/bindings/clock/st,clkgen-mux.txt [4] Documentation/devicetree/bindings/clock/st,clkgen-pll.txt -[5] Documentation/devicetree/bindings/clock/st,clkgen-prediv.txt -[6] Documentation/devicetree/bindings/clock/st,vcc.txt [7] Documentation/devicetree/bindings/clock/st,quadfs.txt [8] Documentation/devicetree/bindings/clock/st,flexgen.txt @@ -57,44 +42,27 @@ Required properties: Example: - clockgen-a@fee62000 { - - reg = <0xfee62000 0xb48>; + clockgen-a@090ff000 { + compatible = "st,clkgen-c32"; + reg = <0x90ff000 0x1000>; clk_s_a0_pll: clk-s-a0-pll { #clock-cells = <1>; - compatible = "st,clkgena-plls-c65"; - - clocks = <&clk-sysin>; - - clock-output-names = "clk-s-a0-pll0-hs", - "clk-s-a0-pll0-ls", - "clk-s-a0-pll1"; - }; - - clk_s_a0_osc_prediv: clk-s-a0-osc-prediv { - #clock-cells = <0>; - compatible = "st,clkgena-prediv-c65", - "st,clkgena-prediv"; + compatible = "st,clkgen-pll0"; clocks = <&clk_sysin>; - clock-output-names = "clk-s-a0-osc-prediv"; + clock-output-names = "clk-s-a0-pll-ofd-0"; }; - clk_s_a0_hs: clk-s-a0-hs { + clk_s_a0_flexgen: clk-s-a0-flexgen { + compatible = "st,flexgen"; + #clock-cells = <1>; - compatible = "st,clkgena-divmux-c65-hs", - "st,clkgena-divmux"; - clocks = <&clk-s_a0_osc_prediv>, - <&clk-s_a0_pll 0>, /* pll0 hs */ - <&clk-s_a0_pll 2>; /* pll1 */ + clocks = <&clk_s_a0_pll 0>, + <&clk_sysin>; - clock-output-names = "clk-s-fdma-0", - "clk-s-fdma-1", - ""; /* clk-s-jit-sense */ - /* fourth output unused */ + clock-output-names = "clk-ic-lmi0"; }; }; - diff --git a/Documentation/devicetree/bindings/clock/st/st,flexgen.txt b/Documentation/devicetree/bindings/clock/st/st,flexgen.txt index b7ee5c7e0f75..7ff77fc57dff 100644 --- a/Documentation/devicetree/bindings/clock/st/st,flexgen.txt +++ b/Documentation/devicetree/bindings/clock/st/st,flexgen.txt @@ -60,6 +60,10 @@ This binding uses the common clock binding[2]. Required properties: - compatible : shall be: "st,flexgen" + "st,flexgen-audio", "st,flexgen" (enable clock propagation on parent for + audio use case) + "st,flexgen-video", "st,flexgen" (enable clock propagation on parent + and activate synchronous mode) - #clock-cells : from common clock binding; shall be set to 1 (multiple clock outputs). diff --git a/Documentation/devicetree/bindings/clock/st/st,quadfs.txt b/Documentation/devicetree/bindings/clock/st/st,quadfs.txt index cedeb9cc8208..d93d49342e60 100644 --- a/Documentation/devicetree/bindings/clock/st/st,quadfs.txt +++ b/Documentation/devicetree/bindings/clock/st/st,quadfs.txt @@ -11,12 +11,8 @@ This binding uses the common clock binding[1]. Required properties: - compatible : shall be: - "st,stih416-quadfs216", "st,quadfs" - "st,stih416-quadfs432", "st,quadfs" - "st,stih416-quadfs660-E", "st,quadfs" - "st,stih416-quadfs660-F", "st,quadfs" - "st,stih407-quadfs660-C", "st,quadfs" - "st,stih407-quadfs660-D", "st,quadfs" + "st,quadfs" + "st,quadfs-pll" - #clock-cells : from common clock binding; shall be set to 1. @@ -35,14 +31,15 @@ Required properties: Example: - clockgen_e: clockgen-e@fd3208bc { - #clock-cells = <1>; - compatible = "st,stih416-quadfs660-E", "st,quadfs"; - reg = <0xfd3208bc 0xB0>; - - clocks = <&clk_sysin>; - clock-output-names = "clk-m-pix-mdtp-0", - "clk-m-pix-mdtp-1", - "clk-m-pix-mdtp-2", - "clk-m-mpelpc"; - }; + clk_s_c0_quadfs: clk-s-c0-quadfs@9103000 { + #clock-cells = <1>; + compatible = "st,quadfs-pll"; + reg = <0x9103000 0x1000>; + + clocks = <&clk_sysin>; + + clock-output-names = "clk-s-c0-fs0-ch0", + "clk-s-c0-fs0-ch1", + "clk-s-c0-fs0-ch2", + "clk-s-c0-fs0-ch3"; + }; diff --git a/Documentation/devicetree/bindings/clock/sunxi-ccu.txt b/Documentation/devicetree/bindings/clock/sunxi-ccu.txt index cb91507ffb1e..3868458a5feb 100644 --- a/Documentation/devicetree/bindings/clock/sunxi-ccu.txt +++ b/Documentation/devicetree/bindings/clock/sunxi-ccu.txt @@ -2,7 +2,10 @@ Allwinner Clock Control Unit Binding ------------------------------------ Required properties : -- compatible: must contain one of the following compatible: +- compatible: must contain one of the following compatibles: + - "allwinner,sun6i-a31-ccu" + - "allwinner,sun8i-a23-ccu" + - "allwinner,sun8i-a33-ccu" - "allwinner,sun8i-h3-ccu" - reg: Must contain the registers base address and length diff --git a/Documentation/devicetree/bindings/clock/uniphier-clock.txt b/Documentation/devicetree/bindings/clock/uniphier-clock.txt new file mode 100644 index 000000000000..c7179d3b5c33 --- /dev/null +++ b/Documentation/devicetree/bindings/clock/uniphier-clock.txt @@ -0,0 +1,134 @@ +UniPhier clock controller + + +System clock +------------ + +Required properties: +- compatible: should be one of the following: + "socionext,uniphier-sld3-clock" - for sLD3 SoC. + "socionext,uniphier-ld4-clock" - for LD4 SoC. + "socionext,uniphier-pro4-clock" - for Pro4 SoC. + "socionext,uniphier-sld8-clock" - for sLD8 SoC. + "socionext,uniphier-pro5-clock" - for Pro5 SoC. + "socionext,uniphier-pxs2-clock" - for PXs2/LD6b SoC. + "socionext,uniphier-ld11-clock" - for LD11 SoC. + "socionext,uniphier-ld20-clock" - for LD20 SoC. +- #clock-cells: should be 1. + +Example: + + sysctrl@61840000 { + compatible = "socionext,uniphier-sysctrl", + "simple-mfd", "syscon"; + reg = <0x61840000 0x4000>; + + clock { + compatible = "socionext,uniphier-ld20-clock"; + #clock-cells = <1>; + }; + + other nodes ... + }; + +Provided clocks: + + 8: ST DMAC +12: GIO (Giga bit stream I/O) +14: USB3 ch0 host +15: USB3 ch1 host +16: USB3 ch0 PHY0 +17: USB3 ch0 PHY1 +20: USB3 ch1 PHY0 +21: USB3 ch1 PHY1 + + +Media I/O (MIO) clock +--------------------- + +Required properties: +- compatible: should be one of the following: + "socionext,uniphier-sld3-mio-clock" - for sLD3 SoC. + "socionext,uniphier-ld4-mio-clock" - for LD4 SoC. + "socionext,uniphier-pro4-mio-clock" - for Pro4 SoC. + "socionext,uniphier-sld8-mio-clock" - for sLD8 SoC. + "socionext,uniphier-pro5-mio-clock" - for Pro5 SoC. + "socionext,uniphier-pxs2-mio-clock" - for PXs2/LD6b SoC. + "socionext,uniphier-ld11-mio-clock" - for LD11 SoC. + "socionext,uniphier-ld20-mio-clock" - for LD20 SoC. +- #clock-cells: should be 1. + +Example: + + mioctrl@59810000 { + compatible = "socionext,uniphier-mioctrl", + "simple-mfd", "syscon"; + reg = <0x59810000 0x800>; + + clock { + compatible = "socionext,uniphier-ld20-mio-clock"; + #clock-cells = <1>; + }; + + other nodes ... + }; + +Provided clocks: + + 0: SD ch0 host + 1: eMMC host + 2: SD ch1 host + 7: MIO DMAC + 8: USB2 ch0 host + 9: USB2 ch1 host +10: USB2 ch2 host +11: USB2 ch3 host +12: USB2 ch0 PHY +13: USB2 ch1 PHY +14: USB2 ch2 PHY +15: USB2 ch3 PHY + + +Peripheral clock +---------------- + +Required properties: +- compatible: should be one of the following: + "socionext,uniphier-sld3-peri-clock" - for sLD3 SoC. + "socionext,uniphier-ld4-peri-clock" - for LD4 SoC. + "socionext,uniphier-pro4-peri-clock" - for Pro4 SoC. + "socionext,uniphier-sld8-peri-clock" - for sLD8 SoC. + "socionext,uniphier-pro5-peri-clock" - for Pro5 SoC. + "socionext,uniphier-pxs2-peri-clock" - for PXs2/LD6b SoC. + "socionext,uniphier-ld11-peri-clock" - for LD11 SoC. + "socionext,uniphier-ld20-peri-clock" - for LD20 SoC. +- #clock-cells: should be 1. + +Example: + + perictrl@59820000 { + compatible = "socionext,uniphier-perictrl", + "simple-mfd", "syscon"; + reg = <0x59820000 0x200>; + + clock { + compatible = "socionext,uniphier-ld20-peri-clock"; + #clock-cells = <1>; + }; + + other nodes ... + }; + +Provided clocks: + + 0: UART ch0 + 1: UART ch1 + 2: UART ch2 + 3: UART ch3 + 4: I2C ch0 + 5: I2C ch1 + 6: I2C ch2 + 7: I2C ch3 + 8: I2C ch4 + 9: I2C ch5 +10: I2C ch6 diff --git a/Documentation/devicetree/bindings/clock/xgene.txt b/Documentation/devicetree/bindings/clock/xgene.txt index 82f9638121db..8233e771711b 100644 --- a/Documentation/devicetree/bindings/clock/xgene.txt +++ b/Documentation/devicetree/bindings/clock/xgene.txt @@ -8,6 +8,7 @@ Required properties: - compatible : shall be one of the following: "apm,xgene-socpll-clock" - for a X-Gene SoC PLL clock "apm,xgene-pcppll-clock" - for a X-Gene PCP PLL clock + "apm,xgene-pmd-clock" - for a X-Gene PMD clock "apm,xgene-device-clock" - for a X-Gene device clock "apm,xgene-socpll-v2-clock" - for a X-Gene SoC PLL v2 clock "apm,xgene-pcppll-v2-clock" - for a X-Gene PCP PLL v2 clock @@ -22,6 +23,15 @@ Required properties for SoC or PCP PLL clocks: Optional properties for PLL clocks: - clock-names : shall be the name of the PLL. If missing, use the device name. +Required properties for PMD clocks: +- reg : shall be the physical register address for the pmd clock. +- clocks : shall be the input parent clock phandle for the clock. +- #clock-cells : shall be set to 1. +- clock-output-names : shall be the name of the clock referenced by derive + clock. +Optional properties for PLL clocks: +- clock-names : shall be the name of the clock. If missing, use the device name. + Required properties for device clocks: - reg : shall be a list of address and length pairs describing the CSR reset and/or the divider. Either may be omitted, but at least @@ -59,6 +69,14 @@ For example: type = <0>; }; + pmd0clk: pmd0clk@7e200200 { + compatible = "apm,xgene-pmd-clock"; + #clock-cells = <1>; + clocks = <&pmdpll 0>; + reg = <0x0 0x7e200200 0x0 0x10>; + clock-output-names = "pmd0clk"; + }; + socpll: socpll@17000120 { compatible = "apm,xgene-socpll-clock"; #clock-cells = <1>; diff --git a/Documentation/devicetree/bindings/clock/zx296718-clk.txt b/Documentation/devicetree/bindings/clock/zx296718-clk.txt new file mode 100644 index 000000000000..8c18b7b237bf --- /dev/null +++ b/Documentation/devicetree/bindings/clock/zx296718-clk.txt @@ -0,0 +1,35 @@ +Device Tree Clock bindings for ZTE zx296718 + +This binding uses the common clock binding[1]. + +[1] Documentation/devicetree/bindings/clock/clock-bindings.txt + +Required properties: +- compatible : shall be one of the following: + "zte,zx296718-topcrm": + zx296718 top clock selection, divider and gating + + "zte,zx296718-lsp0crm" and + "zte,zx296718-lsp1crm": + zx296718 device level clock selection and gating + +- reg: Address and length of the register set + +The clock consumer should specify the desired clock by having the clock +ID in its "clocks" phandle cell. See include/dt-bindings/clock/zx296718-clock.h +for the full list of zx296718 clock IDs. + + +topclk: topcrm@1461000 { + compatible = "zte,zx296718-topcrm-clk"; + reg = <0x01461000 0x1000>; + #clock-cells = <1>; +}; + +usbphy0:usb-phy0 { + compatible = "zte,zx296718-usb-phy"; + #phy-cells = <0>; + clocks = <&topclk USB20_PHY_CLK>; + clock-names = "phyclk"; + status = "okay"; +}; diff --git a/Documentation/devicetree/bindings/devfreq/event/rockchip-dfi.txt b/Documentation/devicetree/bindings/devfreq/event/rockchip-dfi.txt new file mode 100644 index 000000000000..f2233138eba9 --- /dev/null +++ b/Documentation/devicetree/bindings/devfreq/event/rockchip-dfi.txt @@ -0,0 +1,19 @@ + +* Rockchip rk3399 DFI device + +Required properties: +- compatible: Must be "rockchip,rk3399-dfi". +- reg: physical base address of each DFI and length of memory mapped region +- rockchip,pmu: phandle to the syscon managing the "pmu general register files" +- clocks: phandles for clock specified in "clock-names" property +- clock-names : the name of clock used by the DFI, must be "pclk_ddr_mon"; + +Example: + dfi: dfi@0xff630000 { + compatible = "rockchip,rk3399-dfi"; + reg = <0x00 0xff630000 0x00 0x4000>; + rockchip,pmu = <&pmugrf>; + clocks = <&cru PCLK_DDR_MON>; + clock-names = "pclk_ddr_mon"; + status = "disabled"; + }; diff --git a/Documentation/devicetree/bindings/devfreq/rk3399_dmc.txt b/Documentation/devicetree/bindings/devfreq/rk3399_dmc.txt new file mode 100644 index 000000000000..7a9e8603c150 --- /dev/null +++ b/Documentation/devicetree/bindings/devfreq/rk3399_dmc.txt @@ -0,0 +1,209 @@ +* Rockchip rk3399 DMC(Dynamic Memory Controller) device + +Required properties: +- compatible: Must be "rockchip,rk3399-dmc". +- devfreq-events: Node to get DDR loading, Refer to + Documentation/devicetree/bindings/devfreq/ + rockchip-dfi.txt +- interrupts: The interrupt number to the CPU. The interrupt + specifier format depends on the interrupt controller. + It should be DCF interrupts, when DDR dvfs finish, + it will happen. +- clocks: Phandles for clock specified in "clock-names" property +- clock-names : The name of clock used by the DFI, must be + "pclk_ddr_mon"; +- operating-points-v2: Refer to Documentation/devicetree/bindings/power/opp.txt + for details. +- center-supply: DMC supply node. +- status: Marks the node enabled/disabled. + +Following properties are ddr timing: + +- rockchip,dram_speed_bin : Value reference include/dt-bindings/clock/ddr.h, + it select ddr3 cl-trp-trcd type, default value + "DDR3_DEFAULT".it must selected according to + "Speed Bin" in ddr3 datasheet, DO NOT use + smaller "Speed Bin" than ddr3 exactly is. + +- rockchip,pd_idle : Config the PD_IDLE value, defined the power-down + idle period, memories are places into power-down + mode if bus is idle for PD_IDLE DFI clocks. + +- rockchip,sr_idle : Configure the SR_IDLE value, defined the + selfrefresh idle period, memories are places + into self-refresh mode if bus is idle for + SR_IDLE*1024 DFI clocks (DFI clocks freq is + half of dram's clocks), defaule value is "0". + +- rockchip,sr_mc_gate_idle : Defined the self-refresh with memory and + controller clock gating idle period, memories + are places into self-refresh mode and memory + controller clock arg gating if bus is idle for + sr_mc_gate_idle*1024 DFI clocks. + +- rockchip,srpd_lite_idle : Defined the self-refresh power down idle + period, memories are places into self-refresh + power down mode if bus is idle for + srpd_lite_idle*1024 DFI clocks. This parameter + is for LPDDR4 only. + +- rockchip,standby_idle : Defined the standby idle period, memories are + places into self-refresh than controller, pi, + phy and dram clock will gating if bus is idle + for standby_idle * DFI clocks. + +- rockchip,dram_dll_disb_freq : It's defined the DDR3 dll bypass frequency in + MHz, when ddr freq less than DRAM_DLL_DISB_FREQ, + ddr3 dll will bypssed note: if dll was bypassed, + the odt also stop working. + +- rockchip,phy_dll_disb_freq : Defined the PHY dll bypass frequency in + MHz (Mega Hz), when ddr freq less than + DRAM_DLL_DISB_FREQ, phy dll will bypssed. + note: phy dll and phy odt are independent. + +- rockchip,ddr3_odt_disb_freq : When dram type is DDR3, this parameter defined + the odt disable frequency in MHz (Mega Hz), + when ddr frequency less then ddr3_odt_disb_freq, + the odt on dram side and controller side are + both disabled. + +- rockchip,ddr3_drv : When dram type is DDR3, this parameter define + the dram side driver stength in ohm, default + value is DDR3_DS_40ohm. + +- rockchip,ddr3_odt : When dram type is DDR3, this parameter define + the dram side ODT stength in ohm, default value + is DDR3_ODT_120ohm. + +- rockchip,phy_ddr3_ca_drv : When dram type is DDR3, this parameter define + the phy side CA line(incluing command line, + address line and clock line) driver strength. + Default value is PHY_DRV_ODT_40. + +- rockchip,phy_ddr3_dq_drv : When dram type is DDR3, this parameter define + the phy side DQ line(incluing DQS/DQ/DM line) + driver strength. default value is PHY_DRV_ODT_40. + +- rockchip,phy_ddr3_odt : When dram type is DDR3, this parameter define the + phy side odt strength, default value is + PHY_DRV_ODT_240. + +- rockchip,lpddr3_odt_disb_freq : When dram type is LPDDR3, this parameter defined + then odt disable frequency in MHz (Mega Hz), + when ddr frequency less then ddr3_odt_disb_freq, + the odt on dram side and controller side are + both disabled. + +- rockchip,lpddr3_drv : When dram type is LPDDR3, this parameter define + the dram side driver stength in ohm, default + value is LP3_DS_34ohm. + +- rockchip,lpddr3_odt : When dram type is LPDDR3, this parameter define + the dram side ODT stength in ohm, default value + is LP3_ODT_240ohm. + +- rockchip,phy_lpddr3_ca_drv : When dram type is LPDDR3, this parameter define + the phy side CA line(incluing command line, + address line and clock line) driver strength. + default value is PHY_DRV_ODT_40. + +- rockchip,phy_lpddr3_dq_drv : When dram type is LPDDR3, this parameter define + the phy side DQ line(incluing DQS/DQ/DM line) + driver strength. default value is + PHY_DRV_ODT_40. + +- rockchip,phy_lpddr3_odt : When dram type is LPDDR3, this parameter define + the phy side odt strength, default value is + PHY_DRV_ODT_240. + +- rockchip,lpddr4_odt_disb_freq : When dram type is LPDDR4, this parameter + defined the odt disable frequency in + MHz (Mega Hz), when ddr frequency less then + ddr3_odt_disb_freq, the odt on dram side and + controller side are both disabled. + +- rockchip,lpddr4_drv : When dram type is LPDDR4, this parameter define + the dram side driver stength in ohm, default + value is LP4_PDDS_60ohm. + +- rockchip,lpddr4_dq_odt : When dram type is LPDDR4, this parameter define + the dram side ODT on dqs/dq line stength in ohm, + default value is LP4_DQ_ODT_40ohm. + +- rockchip,lpddr4_ca_odt : When dram type is LPDDR4, this parameter define + the dram side ODT on ca line stength in ohm, + default value is LP4_CA_ODT_40ohm. + +- rockchip,phy_lpddr4_ca_drv : When dram type is LPDDR4, this parameter define + the phy side CA line(incluing command address + line) driver strength. default value is + PHY_DRV_ODT_40. + +- rockchip,phy_lpddr4_ck_cs_drv : When dram type is LPDDR4, this parameter define + the phy side clock line and cs line driver + strength. default value is PHY_DRV_ODT_80. + +- rockchip,phy_lpddr4_dq_drv : When dram type is LPDDR4, this parameter define + the phy side DQ line(incluing DQS/DQ/DM line) + driver strength. default value is PHY_DRV_ODT_80. + +- rockchip,phy_lpddr4_odt : When dram type is LPDDR4, this parameter define + the phy side odt strength, default value is + PHY_DRV_ODT_60. + +Example: + dmc_opp_table: dmc_opp_table { + compatible = "operating-points-v2"; + + opp00 { + opp-hz = /bits/ 64 <300000000>; + opp-microvolt = <900000>; + }; + opp01 { + opp-hz = /bits/ 64 <666000000>; + opp-microvolt = <900000>; + }; + }; + + dmc: dmc { + compatible = "rockchip,rk3399-dmc"; + devfreq-events = <&dfi>; + interrupts = <GIC_SPI 1 IRQ_TYPE_LEVEL_HIGH>; + clocks = <&cru SCLK_DDRCLK>; + clock-names = "dmc_clk"; + operating-points-v2 = <&dmc_opp_table>; + center-supply = <&ppvar_centerlogic>; + upthreshold = <15>; + downdifferential = <10>; + rockchip,ddr3_speed_bin = <21>; + rockchip,pd_idle = <0x40>; + rockchip,sr_idle = <0x2>; + rockchip,sr_mc_gate_idle = <0x3>; + rockchip,srpd_lite_idle = <0x4>; + rockchip,standby_idle = <0x2000>; + rockchip,dram_dll_dis_freq = <300>; + rockchip,phy_dll_dis_freq = <125>; + rockchip,auto_pd_dis_freq = <666>; + rockchip,ddr3_odt_dis_freq = <333>; + rockchip,ddr3_drv = <DDR3_DS_40ohm>; + rockchip,ddr3_odt = <DDR3_ODT_120ohm>; + rockchip,phy_ddr3_ca_drv = <PHY_DRV_ODT_40>; + rockchip,phy_ddr3_dq_drv = <PHY_DRV_ODT_40>; + rockchip,phy_ddr3_odt = <PHY_DRV_ODT_240>; + rockchip,lpddr3_odt_dis_freq = <333>; + rockchip,lpddr3_drv = <LP3_DS_34ohm>; + rockchip,lpddr3_odt = <LP3_ODT_240ohm>; + rockchip,phy_lpddr3_ca_drv = <PHY_DRV_ODT_40>; + rockchip,phy_lpddr3_dq_drv = <PHY_DRV_ODT_40>; + rockchip,phy_lpddr3_odt = <PHY_DRV_ODT_240>; + rockchip,lpddr4_odt_dis_freq = <333>; + rockchip,lpddr4_drv = <LP4_PDDS_60ohm>; + rockchip,lpddr4_dq_odt = <LP4_DQ_ODT_40ohm>; + rockchip,lpddr4_ca_odt = <LP4_CA_ODT_40ohm>; + rockchip,phy_lpddr4_ca_drv = <PHY_DRV_ODT_40>; + rockchip,phy_lpddr4_ck_cs_drv = <PHY_DRV_ODT_80>; + rockchip,phy_lpddr4_dq_drv = <PHY_DRV_ODT_80>; + rockchip,phy_lpddr4_odt = <PHY_DRV_ODT_60>; + status = "disabled"; + }; diff --git a/Documentation/devicetree/bindings/dma/fsl-imx-sdma.txt b/Documentation/devicetree/bindings/dma/fsl-imx-sdma.txt index 175f0e44ed85..3c9a57a8443b 100644 --- a/Documentation/devicetree/bindings/dma/fsl-imx-sdma.txt +++ b/Documentation/devicetree/bindings/dma/fsl-imx-sdma.txt @@ -8,6 +8,7 @@ Required properties: "fsl,imx51-sdma" "fsl,imx53-sdma" "fsl,imx6q-sdma" + "fsl,imx7d-sdma" The -to variants should be preferred since they allow to determine the correct ROM script addresses needed for the driver to work without additional firmware. diff --git a/Documentation/devicetree/bindings/dma/renesas,rcar-dmac.txt b/Documentation/devicetree/bindings/dma/renesas,rcar-dmac.txt index 5b902ac8d97e..5f2ce669789a 100644 --- a/Documentation/devicetree/bindings/dma/renesas,rcar-dmac.txt +++ b/Documentation/devicetree/bindings/dma/renesas,rcar-dmac.txt @@ -1,4 +1,4 @@ -* Renesas R-Car DMA Controller Device Tree bindings +* Renesas R-Car (RZ/G) DMA Controller Device Tree bindings Renesas R-Car Generation 2 SoCs have multiple multi-channel DMA controller instances named DMAC capable of serving multiple clients. Channels @@ -16,6 +16,8 @@ Required Properties: - compatible: "renesas,dmac-<soctype>", "renesas,rcar-dmac" as fallback. Examples with soctypes are: + - "renesas,dmac-r8a7743" (RZ/G1M) + - "renesas,dmac-r8a7745" (RZ/G1E) - "renesas,dmac-r8a7790" (R-Car H2) - "renesas,dmac-r8a7791" (R-Car M2-W) - "renesas,dmac-r8a7792" (R-Car V2H) diff --git a/Documentation/devicetree/bindings/dma/sun6i-dma.txt b/Documentation/devicetree/bindings/dma/sun6i-dma.txt index d13c136cef8c..6b267045f522 100644 --- a/Documentation/devicetree/bindings/dma/sun6i-dma.txt +++ b/Documentation/devicetree/bindings/dma/sun6i-dma.txt @@ -7,6 +7,7 @@ Required properties: - compatible: Must be one of "allwinner,sun6i-a31-dma" "allwinner,sun8i-a23-dma" + "allwinner,sun8i-a83t-dma" "allwinner,sun8i-h3-dma" - reg: Should contain the registers base address and length - interrupts: Should contain a reference to the interrupt used by this device diff --git a/Documentation/devicetree/bindings/extcon/qcom,pm8941-misc.txt b/Documentation/devicetree/bindings/extcon/qcom,pm8941-misc.txt new file mode 100644 index 000000000000..35383adb10f1 --- /dev/null +++ b/Documentation/devicetree/bindings/extcon/qcom,pm8941-misc.txt @@ -0,0 +1,41 @@ +Qualcomm's PM8941 USB ID Extcon device + +Some Qualcomm PMICs have a "misc" module that can be used to detect when +the USB ID pin has been pulled low or high. + +PROPERTIES + +- compatible: + Usage: required + Value type: <string> + Definition: Should contain "qcom,pm8941-misc"; + +- reg: + Usage: required + Value type: <u32> + Definition: Should contain the offset to the misc address space + +- interrupts: + Usage: required + Value type: <prop-encoded-array> + Definition: Should contain the usb id interrupt + +- interrupt-names: + Usage: required + Value type: <stringlist> + Definition: Should contain the string "usb_id" for the usb id interrupt + +Example: + + pmic { + usb_id: misc@900 { + compatible = "qcom,pm8941-misc"; + reg = <0x900>; + interrupts = <0x0 0x9 0 IRQ_TYPE_EDGE_BOTH>; + interrupt-names = "usb_id"; + }; + } + + usb-controller { + extcon = <&usb_id>; + }; diff --git a/Documentation/devicetree/bindings/gpio/brcm,bcm6345-gpio.txt b/Documentation/devicetree/bindings/gpio/brcm,bcm6345-gpio.txt new file mode 100644 index 000000000000..e7853143fa42 --- /dev/null +++ b/Documentation/devicetree/bindings/gpio/brcm,bcm6345-gpio.txt @@ -0,0 +1,46 @@ +Bindings for the Broadcom's brcm,bcm6345-gpio memory-mapped GPIO controllers. + +These bindings can be used on any BCM63xx SoC. However, BCM6338 and BCM6345 +are the only ones which don't need a pinctrl driver. +BCM6338 have 8-bit data and dirout registers, where GPIO state can be read +and/or written, and the direction changed from input to output. +BCM6345 have 16-bit data and dirout registers, where GPIO state can be read +and/or written, and the direction changed from input to output. + +Required properties: + - compatible: should be "brcm,bcm6345-gpio" + - reg-names: must contain + "dat" - data register + "dirout" - direction (output) register + - reg: address + size pairs describing the GPIO register sets; + order must correspond with the order of entries in reg-names + - #gpio-cells: must be set to 2. The first cell is the pin number and + the second cell is used to specify the gpio polarity: + 0 = active high + 1 = active low + - gpio-controller: Marks the device node as a gpio controller. + +Optional properties: + - native-endian: use native endian memory. + +Examples: + - BCM6338: + gpio: gpio-controller@fffe0407 { + compatible = "brcm,bcm6345-gpio"; + reg-names = "dirout", "dat"; + reg = <0xfffe0407 1>, <0xfffe040f 1>; + + #gpio-cells = <2>; + gpio-controller; + }; + + - BCM6345: + gpio: gpio-controller@fffe0406 { + compatible = "brcm,bcm6345-gpio"; + reg-names = "dirout", "dat"; + reg = <0xfffe0406 2>, <0xfffe040a 2>; + native-endian; + + #gpio-cells = <2>; + gpio-controller; + }; diff --git a/Documentation/devicetree/bindings/gpio/gpio-aspeed.txt b/Documentation/devicetree/bindings/gpio/gpio-aspeed.txt new file mode 100644 index 000000000000..393bb2ed8a77 --- /dev/null +++ b/Documentation/devicetree/bindings/gpio/gpio-aspeed.txt @@ -0,0 +1,36 @@ +Aspeed GPIO controller Device Tree Bindings +------------------------------------------- + +Required properties: +- compatible : Either "aspeed,ast2400-gpio" or "aspeed,ast2500-gpio" + +- #gpio-cells : Should be two + - First cell is the GPIO line number + - Second cell is used to specify optional + parameters (unused) + +- reg : Address and length of the register set for the device +- gpio-controller : Marks the device node as a GPIO controller. +- interrupts : Interrupt specifier (see interrupt bindings for + details) +- interrupt-controller : Mark the GPIO controller as an interrupt-controller + +Optional properties: + +- interrupt-parent : The parent interrupt controller, optional if inherited + +The gpio and interrupt properties are further described in their respective +bindings documentation: + +- Documentation/devicetree/bindings/gpio/gpio.txt +- Documentation/devicetree/bindings/interrupt-controller/interrupts.txt + + Example: + gpio@1e780000 { + #gpio-cells = <2>; + compatible = "aspeed,ast2400-gpio"; + gpio-controller; + interrupts = <20>; + reg = <0x1e780000 0x1000>; + interrupt-controller; + }; diff --git a/Documentation/devicetree/bindings/gpio/gpio-axp209.txt b/Documentation/devicetree/bindings/gpio/gpio-axp209.txt new file mode 100644 index 000000000000..a6611304dd3c --- /dev/null +++ b/Documentation/devicetree/bindings/gpio/gpio-axp209.txt @@ -0,0 +1,30 @@ +AXP209 GPIO controller + +This driver follows the usual GPIO bindings found in +Documentation/devicetree/bindings/gpio/gpio.txt + +Required properties: +- compatible: Should be "x-powers,axp209-gpio" +- #gpio-cells: Should be two. The first cell is the pin number and the + second is the GPIO flags. +- gpio-controller: Marks the device node as a GPIO controller. + +This node must be a subnode of the axp20x PMIC, documented in +Documentation/devicetree/bindings/mfd/axp20x.txt + +Example: + +axp209: pmic@34 { + compatible = "x-powers,axp209"; + reg = <0x34>; + interrupt-parent = <&nmi_intc>; + interrupts = <0 IRQ_TYPE_LEVEL_LOW>; + interrupt-controller; + #interrupt-cells = <1>; + + axp_gpio: gpio { + compatible = "x-powers,axp209-gpio"; + gpio-controller; + #gpio-cells = <2>; + }; +}; diff --git a/Documentation/devicetree/bindings/gpio/gpio-tpic2810.txt b/Documentation/devicetree/bindings/gpio/gpio-tpic2810.txt new file mode 100644 index 000000000000..1afc2de7a537 --- /dev/null +++ b/Documentation/devicetree/bindings/gpio/gpio-tpic2810.txt @@ -0,0 +1,16 @@ +TPIC2810 GPIO controller bindings + +Required properties: + - compatible : Should be "ti,tpic2810". + - reg : The I2C address of the device + - gpio-controller : Marks the device node as a GPIO controller. + - #gpio-cells : Should be two. For consumer use see gpio.txt. + +Example: + + gpio@60 { + compatible = "ti,tpic2810"; + reg = <0x60>; + gpio-controller; + #gpio-cells = <2>; + }; diff --git a/Documentation/devicetree/bindings/gpio/gpio-tps65086.txt b/Documentation/devicetree/bindings/gpio/gpio-tps65086.txt deleted file mode 100644 index ba051074bedc..000000000000 --- a/Documentation/devicetree/bindings/gpio/gpio-tps65086.txt +++ /dev/null @@ -1,16 +0,0 @@ -* TPS65086 GPO Controller bindings - -Required properties: - - compatible : Should be "ti,tps65086-gpio". - - gpio-controller : Marks the device node as a GPIO Controller. - - #gpio-cells : Should be two. The first cell is the pin number - and the second cell is used to specify flags. - See ../gpio/gpio.txt for possible values. - -Example: - - gpio4: gpio { - compatible = "ti,tps65086-gpio"; - gpio-controller; - #gpio-cells = <2>; - }; diff --git a/Documentation/devicetree/bindings/gpio/gpio-ts4900.txt b/Documentation/devicetree/bindings/gpio/gpio-ts4900.txt new file mode 100644 index 000000000000..3f8e71b1ab2a --- /dev/null +++ b/Documentation/devicetree/bindings/gpio/gpio-ts4900.txt @@ -0,0 +1,30 @@ +* Technologic Systems I2C-FPGA's GPIO controller bindings + +This bindings describes the GPIO controller for Technologic's FPGA core. +TS-4900's FPGA encodes the GPIO state on 3 bits, whereas the TS-7970's FPGA +uses 2 bits: it doesn't use a dedicated input bit. + +Required properties: +- compatible: Should be one of the following + "technologic,ts4900-gpio" + "technologic,ts7970-gpio" +- reg: Physical base address of the controller and length + of memory mapped region. +- #gpio-cells: Should be two. The first cell is the pin number. +- gpio-controller: Marks the device node as a gpio controller. + +Optional property: +- ngpios: Number of GPIOs this controller is instantiated with, + the default is 32. See gpio.txt for more details. + +Example: + +&i2c2 { + gpio8: gpio@28 { + compatible = "technologic,ts4900-gpio"; + reg = <0x28>; + #gpio-cells = <2>; + gpio-controller; + ngpios = <32>; + }; +}; diff --git a/Documentation/devicetree/bindings/gpio/mrvl-gpio.txt b/Documentation/devicetree/bindings/gpio/mrvl-gpio.txt index 98d198396956..c3d016532d8e 100644 --- a/Documentation/devicetree/bindings/gpio/mrvl-gpio.txt +++ b/Documentation/devicetree/bindings/gpio/mrvl-gpio.txt @@ -44,26 +44,3 @@ Example for a PXA3xx platform: interrupt-controller; #interrupt-cells = <0x2>; }; - -* Marvell Orion GPIO Controller - -Required properties: -- compatible : Should be "marvell,orion-gpio" -- reg : Address and length of the register set for controller. -- gpio-controller : So we know this is a gpio controller. -- ngpio : How many gpios this controller has. -- interrupts : Up to 4 Interrupts for the controller. - -Optional properties: -- mask-offset : For SMP Orions, offset for Nth CPU - -Example: - - gpio0: gpio@10100 { - compatible = "marvell,orion-gpio"; - #gpio-cells = <2>; - gpio-controller; - reg = <0x10100 0x40>; - ngpio = <32>; - interrupts = <35>, <36>, <37>, <38>; - }; diff --git a/Documentation/devicetree/bindings/gpio/renesas,gpio-rcar.txt b/Documentation/devicetree/bindings/gpio/renesas,gpio-rcar.txt index 8da26b35b5c3..7c1ab3b3254f 100644 --- a/Documentation/devicetree/bindings/gpio/renesas,gpio-rcar.txt +++ b/Documentation/devicetree/bindings/gpio/renesas,gpio-rcar.txt @@ -11,6 +11,7 @@ Required Properties: - "renesas,gpio-r8a7793": for R8A7793 (R-Car M2-N) compatible GPIO controller. - "renesas,gpio-r8a7794": for R8A7794 (R-Car E2) compatible GPIO controller. - "renesas,gpio-r8a7795": for R8A7795 (R-Car H3) compatible GPIO controller. + - "renesas,gpio-r8a7796": for R8A7796 (R-Car M3-W) compatible GPIO controller. - "renesas,gpio-rcar": for generic R-Car GPIO controller. - reg: Base address and length of each memory resource used by the GPIO diff --git a/Documentation/devicetree/bindings/hwmon/ltc4151.txt b/Documentation/devicetree/bindings/hwmon/ltc4151.txt new file mode 100644 index 000000000000..d008a5ef525a --- /dev/null +++ b/Documentation/devicetree/bindings/hwmon/ltc4151.txt @@ -0,0 +1,18 @@ +LTC4151 High Voltage I2C Current and Voltage Monitor + +Required properties: +- compatible: Must be "lltc,ltc4151" +- reg: I2C address + +Optional properties: +- shunt-resistor-micro-ohms + Shunt resistor value in micro-Ohms + Defaults to <1000> if unset. + +Example: + +ltc4151@6e { + compatible = "lltc,ltc4151"; + reg = <0x6e>; + shunt-resistor-micro-ohms = <1500>; +}; diff --git a/Documentation/devicetree/bindings/hwmon/max6650.txt b/Documentation/devicetree/bindings/hwmon/max6650.txt new file mode 100644 index 000000000000..f6bd87d8e284 --- /dev/null +++ b/Documentation/devicetree/bindings/hwmon/max6650.txt @@ -0,0 +1,28 @@ +Bindings for MAX6651 and MAX6650 I2C fan controllers + +Reference: +[1] https://datasheets.maximintegrated.com/en/ds/MAX6650-MAX6651.pdf + +Required properties: +- compatible : One of "maxim,max6650" or "maxim,max6651" +- reg : I2C address, one of 0x1b, 0x1f, 0x4b, 0x48. + +Optional properties, default is to retain the chip's current setting: +- maxim,fan-microvolt : The supply voltage of the fan, either 5000000 uV or + 12000000 uV. +- maxim,fan-prescale : Pre-scaling value, as per datasheet [1]. Lower values + allow more fine-grained control of slower fans. + Valid: 1, 2, 4, 8, 16. +- maxim,fan-target-rpm: Initial requested fan rotation speed. If specified, the + driver selects closed-loop mode and the requested speed. + This ensures the fan is already running before userspace + takes over. + +Example: + fan-max6650: max6650@1b { + reg = <0x1b>; + compatible = "maxim,max6650"; + maxim,fan-microvolt = <12000000>; + maxim,fan-prescale = <4>; + maxim,fan-target-rpm = <1200>; + }; diff --git a/Documentation/devicetree/bindings/i2c/trivial-devices.txt b/Documentation/devicetree/bindings/i2c/trivial-devices.txt index 5c70ce9c1954..1416c6a0d2cd 100644 --- a/Documentation/devicetree/bindings/i2c/trivial-devices.txt +++ b/Documentation/devicetree/bindings/i2c/trivial-devices.txt @@ -38,6 +38,7 @@ dallas,ds4510 CPU Supervisor with Nonvolatile Memory and Programmable I/O dallas,ds75 Digital Thermometer and Thermostat dlg,da9053 DA9053: flexible system level PMIC with multicore support dlg,da9063 DA9063: system PMIC for quad-core application processors +domintech,dmard09 DMARD09: 3-axis Accelerometer epson,rx8010 I2C-BUS INTERFACE REAL TIME CLOCK MODULE epson,rx8025 High-Stability. I2C-Bus INTERFACE REAL TIME CLOCK MODULE epson,rx8581 I2C-BUS INTERFACE REAL TIME CLOCK MODULE @@ -56,6 +57,7 @@ maxim,ds1050 5 Bit Programmable, Pulse-Width Modulator maxim,max1237 Low-Power, 4-/12-Channel, 2-Wire Serial, 12-Bit ADCs maxim,max6625 9-Bit/12-Bit Temperature Sensors with IยฒC-Compatible Serial Interface mc,rv3029c2 Real Time Clock Module with I2C-Bus +mcube,mc3230 mCube 3-axis 8-bit digital accelerometer microchip,mcp4531-502 Microchip 7-bit Single I2C Digital Potentiometer (5k) microchip,mcp4531-103 Microchip 7-bit Single I2C Digital Potentiometer (10k) microchip,mcp4531-503 Microchip 7-bit Single I2C Digital Potentiometer (50k) diff --git a/Documentation/devicetree/bindings/iio/accel/dmard06.txt b/Documentation/devicetree/bindings/iio/accel/dmard06.txt new file mode 100644 index 000000000000..ce105a12c645 --- /dev/null +++ b/Documentation/devicetree/bindings/iio/accel/dmard06.txt @@ -0,0 +1,19 @@ +Device tree bindings for Domintech DMARD05, DMARD06, DMARD07 accelerometers + +Required properties: + - compatible : Should be "domintech,dmard05" + or "domintech,dmard06" + or "domintech,dmard07" + - reg : I2C address of the chip. Should be 0x1c + +Example: + &i2c1 { + /* ... */ + + accelerometer@1c { + compatible = "domintech,dmard06"; + reg = <0x1c>; + }; + + /* ... */ + }; diff --git a/Documentation/devicetree/bindings/iio/accel/kionix,kxsd9.txt b/Documentation/devicetree/bindings/iio/accel/kionix,kxsd9.txt new file mode 100644 index 000000000000..b25bf3a77e0f --- /dev/null +++ b/Documentation/devicetree/bindings/iio/accel/kionix,kxsd9.txt @@ -0,0 +1,22 @@ +Kionix KXSD9 Accelerometer device tree bindings + +Required properties: + - compatible: should be set to "kionix,kxsd9" + - reg: i2c slave address + +Optional properties: + - vdd-supply: The input supply for VDD + - iovdd-supply: The input supply for IOVDD + - interrupts: The movement detection interrupt + - mount-matrix: See mount-matrix.txt + +Example: + +kxsd9@18 { + compatible = "kionix,kxsd9"; + reg = <0x18>; + interrupt-parent = <&foo>; + interrupts = <57 IRQ_TYPE_EDGE_FALLING>; + iovdd-supply = <&bar>; + vdd-supply = <&baz>; +}; diff --git a/Documentation/devicetree/bindings/iio/adc/mt6577_auxadc.txt b/Documentation/devicetree/bindings/iio/adc/mt6577_auxadc.txt new file mode 100644 index 000000000000..68c45cbbe3d9 --- /dev/null +++ b/Documentation/devicetree/bindings/iio/adc/mt6577_auxadc.txt @@ -0,0 +1,29 @@ +* Mediatek AUXADC - Analog to Digital Converter on Mediatek mobile soc (mt65xx/mt81xx/mt27xx) +=============== + +The Auxiliary Analog/Digital Converter (AUXADC) is an ADC found +in some Mediatek SoCs which among other things measures the temperatures +in the SoC. It can be used directly with register accesses, but it is also +used by thermal controller which reads the temperatures from the AUXADC +directly via its own bus interface. See +Documentation/devicetree/bindings/thermal/mediatek-thermal.txt +for the Thermal Controller which holds a phandle to the AUXADC. + +Required properties: + - compatible: Should be one of: + - "mediatek,mt2701-auxadc": For MT2701 family of SoCs + - "mediatek,mt8173-auxadc": For MT8173 family of SoCs + - reg: Address range of the AUXADC unit. + - clocks: Should contain a clock specifier for each entry in clock-names + - clock-names: Should contain "main". + - #io-channel-cells: Should be 1, see ../iio-bindings.txt + +Example: + +auxadc: adc@11001000 { + compatible = "mediatek,mt2701-auxadc"; + reg = <0 0x11001000 0 0x1000>; + clocks = <&pericfg CLK_PERI_AUXADC>; + clock-names = "main"; + #io-channel-cells = <1>; +}; diff --git a/Documentation/devicetree/bindings/iio/adc/ti-adc12138.txt b/Documentation/devicetree/bindings/iio/adc/ti-adc12138.txt new file mode 100644 index 000000000000..049a1d36f013 --- /dev/null +++ b/Documentation/devicetree/bindings/iio/adc/ti-adc12138.txt @@ -0,0 +1,37 @@ +* Texas Instruments' ADC12130/ADC12132/ADC12138 + +Required properties: + - compatible: Should be one of + * "ti,adc12130" + * "ti,adc12132" + * "ti,adc12138" + - reg: SPI chip select number for the device + - interrupts: Should contain interrupt for EOC (end of conversion) + - clocks: phandle to conversion clock input + - spi-max-frequency: Definision as per + Documentation/devicetree/bindings/spi/spi-bus.txt + - vref-p-supply: The regulator supply for positive analog voltage reference + +Optional properties: + - vref-n-supply: The regulator supply for negative analog voltage reference + (Note that this must not go below GND or exceed vref-p) + If not specified, this is assumed to be analog ground. + - ti,acquisition-time: The number of conversion clock periods for the S/H's + acquisition time. Should be one of 6, 10, 18, 34. If not specified, + default value of 10 is used. + For high source impedances, this value can be increased to 18 or 34. + For less ADC accuracy and/or slower CCLK frequencies this value may be + decreased to 6. See section 6.0 INPUT SOURCE RESISTANCE in the + datasheet for details. + +Example: +adc@0 { + compatible = "ti,adc12138"; + reg = <0>; + interrupts = <28 IRQ_TYPE_EDGE_RISING>; + interrupt-parent = <&gpio1>; + clocks = <&cclk>; + vref-p-supply = <&ldo4_reg>; + spi-max-frequency = <5000000>; + ti,acquisition-time = <6>; +}; diff --git a/Documentation/devicetree/bindings/iio/adc/ti-adc161s626.txt b/Documentation/devicetree/bindings/iio/adc/ti-adc161s626.txt new file mode 100644 index 000000000000..9ed2315781e4 --- /dev/null +++ b/Documentation/devicetree/bindings/iio/adc/ti-adc161s626.txt @@ -0,0 +1,16 @@ +* Texas Instruments ADC141S626 and ADC161S626 chips + +Required properties: + - compatible: Should be "ti,adc141s626" or "ti,adc161s626" + - reg: spi chip select number for the device + +Recommended properties: + - spi-max-frequency: Definition as per + Documentation/devicetree/bindings/spi/spi-bus.txt + +Example: +adc@0 { + compatible = "ti,adc161s626"; + reg = <0>; + spi-max-frequency = <4300000>; +}; diff --git a/Documentation/devicetree/bindings/iio/chemical/atlas,orp-sm.txt b/Documentation/devicetree/bindings/iio/chemical/atlas,orp-sm.txt new file mode 100644 index 000000000000..5d8b687d5edc --- /dev/null +++ b/Documentation/devicetree/bindings/iio/chemical/atlas,orp-sm.txt @@ -0,0 +1,22 @@ +* Atlas Scientific ORP-SM OEM sensor + +https://www.atlas-scientific.com/_files/_datasheets/_oem/ORP_oem_datasheet.pdf + +Required properties: + + - compatible: must be "atlas,orp-sm" + - reg: the I2C address of the sensor + - interrupt-parent: should be the phandle for the interrupt controller + - interrupts: the sole interrupt generated by the device + + Refer to interrupt-controller/interrupts.txt for generic interrupt client + node bindings. + +Example: + +atlas@66 { + compatible = "atlas,orp-sm"; + reg = <0x66>; + interrupt-parent = <&gpio1>; + interrupts = <16 2>; +}; diff --git a/Documentation/devicetree/bindings/iio/magnetometer/ak8974.txt b/Documentation/devicetree/bindings/iio/magnetometer/ak8974.txt new file mode 100644 index 000000000000..77d5aba1bd8c --- /dev/null +++ b/Documentation/devicetree/bindings/iio/magnetometer/ak8974.txt @@ -0,0 +1,29 @@ +* Asahi Kasei AK8974 magnetometer sensor + +Required properties: + +- compatible : should be "asahi-kasei,ak8974" +- reg : the I2C address of the magnetometer + +Optional properties: + +- avdd-supply: regulator supply for the analog voltage + (see regulator/regulator.txt) +- dvdd-supply: regulator supply for the digital voltage + (see regulator/regulator.txt) +- interrupts: data ready (DRDY) and interrupt (INT1) lines + from the chip, the DRDY interrupt must be placed first. + The interrupts can be triggered on rising or falling + edges alike. +- mount-matrix: an optional 3x3 mounting rotation matrix + +Example: + +ak8974@0f { + compatible = "asahi-kasei,ak8974"; + reg = <0x0f>; + avdd-supply = <&foo_reg>; + dvdd-supply = <&bar_reg>; + interrupts = <0 IRQ_TYPE_EDGE_RISING>, + <1 IRQ_TYPE_EDGE_RISING>; +}; diff --git a/Documentation/devicetree/bindings/iio/pressure/zpa2326.txt b/Documentation/devicetree/bindings/iio/pressure/zpa2326.txt new file mode 100644 index 000000000000..fb85de676e03 --- /dev/null +++ b/Documentation/devicetree/bindings/iio/pressure/zpa2326.txt @@ -0,0 +1,31 @@ +Murata ZPA2326 pressure sensor + +Pressure sensor from Murata with SPI and I2C bus interfaces. + +Required properties: +- compatible: "murata,zpa2326" +- reg: the I2C address or SPI chip select the device will respond to + +Recommended properties for SPI bus usage: +- spi-max-frequency: maximum SPI bus frequency as documented in + Documentation/devicetree/bindings/spi/spi-bus.txt + +Optional properties: +- vref-supply: an optional regulator that needs to be on to provide VREF + power to the sensor +- vdd-supply: an optional regulator that needs to be on to provide VDD + power to the sensor +- interrupt-parent: phandle to the parent interrupt controller as documented in + Documentation/devicetree/bindings/interrupt-controller/interrupts.txt +- interrupts: interrupt mapping for IRQ as documented in + Documentation/devicetree/bindings/interrupt-controller/interrupts.txt + +Example: + +zpa2326@5c { + compatible = "murata,zpa2326"; + reg = <0x5c>; + interrupt-parent = <&gpio>; + interrupts = <12>; + vdd-supply = <&ldo_1v8_gnss>; +}; diff --git a/Documentation/devicetree/bindings/iio/proximity/sx9500.txt b/Documentation/devicetree/bindings/iio/proximity/sx9500.txt new file mode 100644 index 000000000000..b301dd2b35da --- /dev/null +++ b/Documentation/devicetree/bindings/iio/proximity/sx9500.txt @@ -0,0 +1,24 @@ +Semtech's SX9500 capacitive proximity button device driver + +Required properties: + - compatible: must be "semtech,sx9500" + - reg: i2c address where to find the device + - interrupt-parent : should be the phandle for the interrupt controller + - interrupts : the sole interrupt generated by the device + + Refer to interrupt-controller/interrupts.txt for generic + interrupt client node bindings. + +Optional properties: + - reset-gpios: Reference to the GPIO connected to the device's active + low reset pin. + +Example: + +sx9500@28 { + compatible = "semtech,sx9500"; + reg = <0x28>; + interrupt-parent = <&gpio2>; + interrupts = <16 IRQ_TYPE_LEVEL_LOW>; + reset-gpios = <&gpio2 10 GPIO_ACTIVE_LOW>; +}; diff --git a/Documentation/devicetree/bindings/iio/temperature/maxim_thermocouple.txt b/Documentation/devicetree/bindings/iio/temperature/maxim_thermocouple.txt new file mode 100644 index 000000000000..28bc5c4d965b --- /dev/null +++ b/Documentation/devicetree/bindings/iio/temperature/maxim_thermocouple.txt @@ -0,0 +1,21 @@ +Maxim thermocouple support + +* https://datasheets.maximintegrated.com/en/ds/MAX6675.pdf +* https://datasheets.maximintegrated.com/en/ds/MAX31855.pdf + +Required properties: + + - compatible: must be "maxim,max31855" or "maxim,max6675" + - reg: SPI chip select number for the device + - spi-max-frequency: must be 4300000 + - spi-cpha: must be defined for max6675 to enable SPI mode 1 + + Refer to spi/spi-bus.txt for generic SPI slave bindings. + +Example: + + max31855@0 { + compatible = "maxim,max31855"; + reg = <0>; + spi-max-frequency = <4300000>; + }; diff --git a/Documentation/devicetree/bindings/input/touchscreen/silead_gsl1680.txt b/Documentation/devicetree/bindings/input/touchscreen/silead_gsl1680.txt index 1112e0d794e1..820fee4b77b6 100644 --- a/Documentation/devicetree/bindings/input/touchscreen/silead_gsl1680.txt +++ b/Documentation/devicetree/bindings/input/touchscreen/silead_gsl1680.txt @@ -13,6 +13,7 @@ Required properties: - touchscreen-size-y : See touchscreen.txt Optional properties: +- firmware-name : File basename (string) for board specific firmware - touchscreen-inverted-x : See touchscreen.txt - touchscreen-inverted-y : See touchscreen.txt - touchscreen-swapped-x-y : See touchscreen.txt diff --git a/Documentation/devicetree/bindings/interrupt-controller/jcore,aic.txt b/Documentation/devicetree/bindings/interrupt-controller/jcore,aic.txt new file mode 100644 index 000000000000..ee2ad36f8df8 --- /dev/null +++ b/Documentation/devicetree/bindings/interrupt-controller/jcore,aic.txt @@ -0,0 +1,26 @@ +J-Core Advanced Interrupt Controller + +Required properties: + +- compatible: Should be "jcore,aic1" for the (obsolete) first-generation aic + with 8 interrupt lines with programmable priorities, or "jcore,aic2" for + the "aic2" core with 64 interrupts. + +- reg: Memory region(s) for configuration. For SMP, there should be one + region per cpu, indexed by the sequential, zero-based hardware cpu + number. + +- interrupt-controller: Identifies the node as an interrupt controller + +- #interrupt-cells: Specifies the number of cells needed to encode an + interrupt source. The value shall be 1. + + +Example: + +aic: interrupt-controller@200 { + compatible = "jcore,aic2"; + reg = < 0x200 0x30 0x500 0x30 >; + interrupt-controller; + #interrupt-cells = <1>; +}; diff --git a/Documentation/devicetree/bindings/interrupt-controller/marvell,armada-8k-pic.txt b/Documentation/devicetree/bindings/interrupt-controller/marvell,armada-8k-pic.txt new file mode 100644 index 000000000000..86a7b4cd03f5 --- /dev/null +++ b/Documentation/devicetree/bindings/interrupt-controller/marvell,armada-8k-pic.txt @@ -0,0 +1,25 @@ +Marvell Armada 7K/8K PIC Interrupt controller +--------------------------------------------- + +This is the Device Tree binding for the PIC, a secondary interrupt +controller available on the Marvell Armada 7K/8K ARM64 SoCs, and +typically connected to the GIC as the primary interrupt controller. + +Required properties: +- compatible: should be "marvell,armada-8k-pic" +- interrupt-controller: identifies the node as an interrupt controller +- #interrupt-cells: the number of cells to define interrupts on this + controller. Should be 1 +- reg: the register area for the PIC interrupt controller +- interrupts: the interrupt to the primary interrupt controller, + typically the GIC + +Example: + + pic: interrupt-controller@3f0100 { + compatible = "marvell,armada-8k-pic"; + reg = <0x3f0100 0x10>; + #interrupt-cells = <1>; + interrupt-controller; + interrupts = <GIC_PPI 15 IRQ_TYPE_LEVEL_HIGH>; + }; diff --git a/Documentation/devicetree/bindings/interrupt-controller/marvell,odmi-controller.txt b/Documentation/devicetree/bindings/interrupt-controller/marvell,odmi-controller.txt index 8af0a8e613ab..3f6442c7f867 100644 --- a/Documentation/devicetree/bindings/interrupt-controller/marvell,odmi-controller.txt +++ b/Documentation/devicetree/bindings/interrupt-controller/marvell,odmi-controller.txt @@ -31,7 +31,7 @@ Required properties: Example: odmi: odmi@300000 { - compatible = "marvell,ap806-odm-controller", + compatible = "marvell,ap806-odmi-controller", "marvell,odmi-controller"; interrupt-controller; msi-controller; diff --git a/Documentation/devicetree/bindings/interrupt-controller/renesas,irqc.txt b/Documentation/devicetree/bindings/interrupt-controller/renesas,irqc.txt index ae5054c27c99..e3f052d8c11a 100644 --- a/Documentation/devicetree/bindings/interrupt-controller/renesas,irqc.txt +++ b/Documentation/devicetree/bindings/interrupt-controller/renesas,irqc.txt @@ -1,10 +1,12 @@ -DT bindings for the R-Mobile/R-Car interrupt controller +DT bindings for the R-Mobile/R-Car/RZ/G interrupt controller Required properties: - compatible: has to be "renesas,irqc-<soctype>", "renesas,irqc" as fallback. Examples with soctypes are: - "renesas,irqc-r8a73a4" (R-Mobile APE6) + - "renesas,irqc-r8a7743" (RZ/G1M) + - "renesas,irqc-r8a7745" (RZ/G1E) - "renesas,irqc-r8a7790" (R-Car H2) - "renesas,irqc-r8a7791" (R-Car M2-W) - "renesas,irqc-r8a7792" (R-Car V2H) diff --git a/Documentation/devicetree/bindings/interrupt-controller/st,stm32-exti.txt b/Documentation/devicetree/bindings/interrupt-controller/st,stm32-exti.txt new file mode 100644 index 000000000000..6e7703d4ff5b --- /dev/null +++ b/Documentation/devicetree/bindings/interrupt-controller/st,stm32-exti.txt @@ -0,0 +1,20 @@ +STM32 External Interrupt Controller + +Required properties: + +- compatible: Should be "st,stm32-exti" +- reg: Specifies base physical address and size of the registers +- interrupt-controller: Indentifies the node as an interrupt controller +- #interrupt-cells: Specifies the number of cells to encode an interrupt + specifier, shall be 2 +- interrupts: interrupts references to primary interrupt controller + +Example: + +exti: interrupt-controller@40013c00 { + compatible = "st,stm32-exti"; + interrupt-controller; + #interrupt-cells = <2>; + reg = <0x40013C00 0x400>; + interrupts = <1>, <2>, <3>, <6>, <7>, <8>, <9>, <10>, <23>, <40>, <41>, <42>, <62>, <76>; +}; diff --git a/Documentation/devicetree/bindings/leds/common.txt b/Documentation/devicetree/bindings/leds/common.txt index 93ef6e6e43b5..696be5792625 100644 --- a/Documentation/devicetree/bindings/leds/common.txt +++ b/Documentation/devicetree/bindings/leds/common.txt @@ -19,6 +19,13 @@ Optional properties for child nodes: a device, i.e. no other LED class device can be assigned the same label. +- default-state : The initial state of the LED. Valid values are "on", "off", + and "keep". If the LED is already on or off and the default-state property is + set the to same value, then no glitch should be produced where the LED + momentarily turns off (or on). The "keep" setting will keep the LED at + whatever its current state is, without producing a glitch. The default is + off if this property is not present. + - linux,default-trigger : This parameter, if present, is a string defining the trigger assigned to the LED. Current triggers are: "backlight" - LED will act as a back-light, controlled by the framebuffer diff --git a/Documentation/devicetree/bindings/leds/leds-bcm6328.txt b/Documentation/devicetree/bindings/leds/leds-bcm6328.txt index 3f48c1eaf085..ccebce597f37 100644 --- a/Documentation/devicetree/bindings/leds/leds-bcm6328.txt +++ b/Documentation/devicetree/bindings/leds/leds-bcm6328.txt @@ -49,7 +49,7 @@ LED sub-node optional properties: - active-low : Boolean, makes LED active low. Default : false - default-state : see - Documentation/devicetree/bindings/leds/leds-gpio.txt + Documentation/devicetree/bindings/leds/common.txt - linux,default-trigger : see Documentation/devicetree/bindings/leds/common.txt diff --git a/Documentation/devicetree/bindings/leds/leds-bcm6358.txt b/Documentation/devicetree/bindings/leds/leds-bcm6358.txt index b22a55bcc65d..da5708e7b43b 100644 --- a/Documentation/devicetree/bindings/leds/leds-bcm6358.txt +++ b/Documentation/devicetree/bindings/leds/leds-bcm6358.txt @@ -28,7 +28,7 @@ LED sub-node optional properties: - active-low : Boolean, makes LED active low. Default : false - default-state : see - Documentation/devicetree/bindings/leds/leds-gpio.txt + Documentation/devicetree/bindings/leds/common.txt - linux,default-trigger : see Documentation/devicetree/bindings/leds/common.txt diff --git a/Documentation/devicetree/bindings/leds/leds-gpio.txt b/Documentation/devicetree/bindings/leds/leds-gpio.txt index 5b1b43a64265..76535ca37120 100644 --- a/Documentation/devicetree/bindings/leds/leds-gpio.txt +++ b/Documentation/devicetree/bindings/leds/leds-gpio.txt @@ -14,13 +14,8 @@ LED sub-node properties: see Documentation/devicetree/bindings/leds/common.txt - linux,default-trigger : (optional) see Documentation/devicetree/bindings/leds/common.txt -- default-state: (optional) The initial state of the LED. Valid - values are "on", "off", and "keep". If the LED is already on or off - and the default-state property is set the to same value, then no - glitch should be produced where the LED momentarily turns off (or - on). The "keep" setting will keep the LED at whatever its current - state is, without producing a glitch. The default is off if this - property is not present. +- default-state: (optional) The initial state of the LED. + see Documentation/devicetree/bindings/leds/common.txt - retain-state-suspended: (optional) The suspend state can be retained.Such as charge-led gpio. - panic-indicator : (optional) diff --git a/Documentation/devicetree/bindings/leds/leds-is31fl319x.txt b/Documentation/devicetree/bindings/leds/leds-is31fl319x.txt new file mode 100644 index 000000000000..fc2603484544 --- /dev/null +++ b/Documentation/devicetree/bindings/leds/leds-is31fl319x.txt @@ -0,0 +1,59 @@ +LEDs connected to is31fl319x LED controller chip + +Required properties: +- compatible : Should be any of + "issi,is31fl3190" + "issi,is31fl3191" + "issi,is31fl3193" + "issi,is31fl3196" + "issi,is31fl3199" + "si-en,sn3199". +- #address-cells: Must be 1. +- #size-cells: Must be 0. +- reg: 0x64, 0x65, 0x66, or 0x67. + +Optional properties: +- audio-gain-db : audio gain selection for external analog modulation input. + Valid values: 0 - 21, step by 3 (rounded down) + Default: 0 + +Each led is represented as a sub-node of the issi,is31fl319x device. +There can be less leds subnodes than the chip can support but not more. + +Required led sub-node properties: +- reg : number of LED line + Valid values: 1 - number of leds supported by the chip variant. + +Optional led sub-node properties: +- label : see Documentation/devicetree/bindings/leds/common.txt. +- linux,default-trigger : + see Documentation/devicetree/bindings/leds/common.txt. +- led-max-microamp : (optional) + Valid values: 5000 - 40000, step by 5000 (rounded down) + Default: 20000 (20 mA) + Note: a driver will take the lowest of all led limits since the + chip has a single global setting. The lowest value will be chosen + due to the PWM specificity, where lower brightness is achieved + by reducing the dury-cycle of pulses and not the current, which + will always have its peak value equal to led-max-microamp. + +Examples: + +fancy_leds: leds@65 { + compatible = "issi,is31fl3196"; + #address-cells = <1>; + #size-cells = <0>; + reg = <0x65>; + + red_aux: led@1 { + label = "red:aux"; + reg = <1>; + led-max-microamp = <10000>; + }; + + green_power: led@5 { + label = "green:power"; + reg = <5>; + linux,default-trigger = "default-on"; + }; +}; diff --git a/Documentation/devicetree/bindings/leds/leds-pm8058.txt b/Documentation/devicetree/bindings/leds/leds-pm8058.txt new file mode 100644 index 000000000000..89584c49aab2 --- /dev/null +++ b/Documentation/devicetree/bindings/leds/leds-pm8058.txt @@ -0,0 +1,67 @@ +Qualcomm PM8058 LED driver + +The Qualcomm PM8058 is a multi-functional device which contains +an LED driver block for up to six LEDs: three normal LEDs, two +"flash" LEDs and one "keypad backlight" LED. The names are +quoted because sometimes these LED drivers are used for wildly +different things than flash or keypad backlight: their names +are more of a suggestion than a hard-wired usecase. + +Hardware-wise the different LEDs support slightly different +output currents. The "flash" LEDs do not need to charge nor +do they support external triggers. They are just powerful LED +drivers. + +The LEDs appear as children to the PM8058 device, with the +proper compatible string. For the PM8058 bindings see: +mfd/qcom-pm8xxx.txt. + +Each LED is represented as a sub-node of the syscon device. Each +node's name represents the name of the corresponding LED. + +LED sub-node properties: + +Required properties: +- compatible: one of + "qcom,pm8058-led" (for the normal LEDs at 0x131, 0x132 and 0x133) + "qcom,pm8058-keypad-led" (for the "keypad" LED at 0x48) + "qcom,pm8058-flash-led" (for the "flash" LEDs at 0x49 and 0xFB) + +Optional properties: +- label: see Documentation/devicetree/bindings/leds/common.txt +- default-state: see Documentation/devicetree/bindings/leds/common.txt +- linux,default-trigger: see Documentation/devicetree/bindings/leds/common.txt + +Example: + +qcom,ssbi@500000 { + pmicintc: pmic@0 { + compatible = "qcom,pm8058"; + led@48 { + compatible = "qcom,pm8058-keypad-led"; + reg = <0x48>; + label = "pm8050:white:keypad"; + default-state = "off"; + }; + led@131 { + compatible = "qcom,pm8058-led"; + reg = <0x131>; + label = "pm8058:red"; + default-state = "off"; + }; + led@132 { + compatible = "qcom,pm8058-led"; + reg = <0x132>; + label = "pm8058:yellow"; + default-state = "off"; + linux,default-trigger = "mmc0"; + }; + led@133 { + compatible = "qcom,pm8058-led"; + reg = <0x133>; + label = "pm8058:green"; + default-state = "on"; + linux,default-trigger = "heartbeat"; + }; + }; +}; diff --git a/Documentation/devicetree/bindings/leds/register-bit-led.txt b/Documentation/devicetree/bindings/leds/register-bit-led.txt index 379cefdc0bda..59b56365f648 100644 --- a/Documentation/devicetree/bindings/leds/register-bit-led.txt +++ b/Documentation/devicetree/bindings/leds/register-bit-led.txt @@ -23,13 +23,8 @@ Optional properties: see Documentation/devicetree/bindings/leds/common.txt - linux,default-trigger : (optional) see Documentation/devicetree/bindings/leds/common.txt -- default-state: (optional) The initial state of the LED. Valid - values are "on", "off", and "keep". If the LED is already on or off - and the default-state property is set the to same value, then no - glitch should be produced where the LED momentarily turns off (or - on). The "keep" setting will keep the LED at whatever its current - state is, without producing a glitch. The default is off if this - property is not present. +- default-state: (optional) The initial state of the LED + see Documentation/devicetree/bindings/leds/common.txt Example: diff --git a/Documentation/devicetree/bindings/powerpc/fsl/mem-ctrlr.txt b/Documentation/devicetree/bindings/memory-controllers/fsl/ddr.txt index f87856faf1ab..dde6d837083a 100644 --- a/Documentation/devicetree/bindings/powerpc/fsl/mem-ctrlr.txt +++ b/Documentation/devicetree/bindings/memory-controllers/fsl/ddr.txt @@ -7,6 +7,8 @@ Properties: "fsl,qoriq-memory-controller". - reg : Address and size of DDR controller registers - interrupts : Error interrupt of DDR controller +- little-endian : Specifies little-endian access to registers + If omitted, big-endian will be used. Example 1: diff --git a/Documentation/devicetree/bindings/mfd/max77693.txt b/Documentation/devicetree/bindings/mfd/max77693.txt index d3425846aa5b..6a1ae3a2b77f 100644 --- a/Documentation/devicetree/bindings/mfd/max77693.txt +++ b/Documentation/devicetree/bindings/mfd/max77693.txt @@ -17,28 +17,28 @@ Required properties: - interrupt-parent : The parent interrupt controller. Optional properties: -- regulators : The regulators of max77693 have to be instantiated under subnod +- regulators : The regulators of max77693 have to be instantiated under subnode named "regulators" using the following format. regulators { - regualtor-compatible = ESAFEOUT1/ESAFEOUT2/CHARGER - standard regulator constratints[*]. + regulator-compatible = ESAFEOUT1/ESAFEOUT2/CHARGER + standard regulator constraints[*]. }; [*] refer Documentation/devicetree/bindings/regulator/regulator.txt - haptic : The MAX77693 haptic device utilises a PWM controlled motor to provide users with tactile feedback. PWM period and duty-cycle are varied in - order to provide the approprite level of feedback. + order to provide the appropriate level of feedback. Required properties: - - compatible : Must be "maxim,max77693-hpatic" + - compatible : Must be "maxim,max77693-haptic" - haptic-supply : power supply for the haptic motor [*] refer Documentation/devicetree/bindings/regulator/regulator.txt - pwms : phandle to the physical PWM(Pulse Width Modulation) device. PWM properties should be named "pwms". And number of cell is different for each pwm device. - To get more informations, please refer to documentaion. + To get more information, please refer to documentation. [*] refer Documentation/devicetree/bindings/pwm/pwm.txt - charger : Node configuring the charger driver. diff --git a/Documentation/devicetree/bindings/mfd/stmpe.txt b/Documentation/devicetree/bindings/mfd/stmpe.txt index 3fb68bfefc8b..f9065a5781a2 100644 --- a/Documentation/devicetree/bindings/mfd/stmpe.txt +++ b/Documentation/devicetree/bindings/mfd/stmpe.txt @@ -4,7 +4,7 @@ STMPE is an MFD device which may expose the following inbuilt devices: gpio, keypad, touchscreen, adc, pwm, rotator. Required properties: - - compatible : "st,stmpe[610|801|811|1601|2401|2403]" + - compatible : "st,stmpe[610|801|811|1600|1601|2401|2403]" - reg : I2C/SPI address of the device Optional properties: diff --git a/Documentation/devicetree/bindings/mmc/arasan,sdhci.txt b/Documentation/devicetree/bindings/mmc/arasan,sdhci.txt index 3404afa9b938..49df630bd44f 100644 --- a/Documentation/devicetree/bindings/mmc/arasan,sdhci.txt +++ b/Documentation/devicetree/bindings/mmc/arasan,sdhci.txt @@ -36,6 +36,9 @@ Optional Properties: - #clock-cells: If specified this should be the value <0>. With this property in place we will export a clock representing the Card Clock. This clock is expected to be consumed by our PHY. You must also specify + - xlnx,fails-without-test-cd: when present, the controller doesn't work when + the CD line is not connected properly, and the line is not connected + properly. Test mode can be used to force the controller to function. Example: sdhci@e0100000 { diff --git a/Documentation/devicetree/bindings/mmc/brcm,bcm7425-sdhci.txt b/Documentation/devicetree/bindings/mmc/brcm,sdhci-brcmstb.txt index 82847174c37d..733b64a4d8eb 100644 --- a/Documentation/devicetree/bindings/mmc/brcm,bcm7425-sdhci.txt +++ b/Documentation/devicetree/bindings/mmc/brcm,sdhci-brcmstb.txt @@ -8,7 +8,9 @@ on Device Tree properties to enable them for SoC/Board combinations that support them. Required properties: -- compatible: "brcm,bcm7425-sdhci" +- compatible: should be one of the following + - "brcm,bcm7425-sdhci" + - "brcm,bcm7445-sdhci" Refer to clocks/clock-bindings.txt for generic clock consumer properties. diff --git a/Documentation/devicetree/bindings/mmc/mmc-pwrseq-simple.txt b/Documentation/devicetree/bindings/mmc/mmc-pwrseq-simple.txt index ce0e76749671..e25436861867 100644 --- a/Documentation/devicetree/bindings/mmc/mmc-pwrseq-simple.txt +++ b/Documentation/devicetree/bindings/mmc/mmc-pwrseq-simple.txt @@ -16,6 +16,8 @@ Optional properties: See ../clocks/clock-bindings.txt for details. - clock-names : Must include the following entry: "ext_clock" (External clock provided to the card). +- post-power-on-delay-ms : Delay in ms after powering the card and + de-asserting the reset-gpios (if any) Example: diff --git a/Documentation/devicetree/bindings/mmc/mmc.txt b/Documentation/devicetree/bindings/mmc/mmc.txt index 22d1e1f3f38b..8a377827695b 100644 --- a/Documentation/devicetree/bindings/mmc/mmc.txt +++ b/Documentation/devicetree/bindings/mmc/mmc.txt @@ -75,6 +75,17 @@ Optional SDIO properties: - wakeup-source: Enables wake up of host system on SDIO IRQ assertion (Legacy property supported: "enable-sdio-wakeup") +MMC power +--------- + +Controllers may implement power control from both the connected cards and +the IO signaling (for example to change to high-speed 1.8V signalling). If +the system supports this, then the following two properties should point +to valid regulator nodes: + +- vqmmc-supply: supply node for IO line power +- vmmc-supply: supply node for card's power + MMC power sequences: -------------------- @@ -102,11 +113,13 @@ Required host node properties when using function subnodes: - #size-cells: should be zero. Required function subnode properties: -- compatible: name of SDIO function following generic names recommended practice - reg: Must contain the SDIO function number of the function this subnode describes. A value of 0 denotes the memory SD function, values from 1 to 7 denote the SDIO functions. +Optional function subnode properties: +- compatible: name of SDIO function following generic names recommended practice + Examples -------- diff --git a/Documentation/devicetree/bindings/mmc/sdhci-st.txt b/Documentation/devicetree/bindings/mmc/sdhci-st.txt index 88faa91125bf..3cd4c43a3260 100644 --- a/Documentation/devicetree/bindings/mmc/sdhci-st.txt +++ b/Documentation/devicetree/bindings/mmc/sdhci-st.txt @@ -10,7 +10,7 @@ Required properties: subsystem (mmcss) inside the FlashSS (available in STiH407 SoC family). -- clock-names: Should be "mmc". +- clock-names: Should be "mmc" and "icn". (NB: The latter is not compulsory) See: Documentation/devicetree/bindings/resource-names.txt - clocks: Phandle to the clock. See: Documentation/devicetree/bindings/clock/clock-bindings.txt diff --git a/Documentation/devicetree/bindings/mmc/sunxi-mmc.txt b/Documentation/devicetree/bindings/mmc/sunxi-mmc.txt index 4bf41d833804..55cdd804cdba 100644 --- a/Documentation/devicetree/bindings/mmc/sunxi-mmc.txt +++ b/Documentation/devicetree/bindings/mmc/sunxi-mmc.txt @@ -8,7 +8,12 @@ as the speed of SD standard 3.0. Absolute maximum transfer rate is 200MB/s Required properties: - - compatible : "allwinner,sun4i-a10-mmc" or "allwinner,sun5i-a13-mmc" + - compatible : should be one of: + * "allwinner,sun4i-a10-mmc" + * "allwinner,sun5i-a13-mmc" + * "allwinner,sun7i-a20-mmc" + * "allwinner,sun9i-a80-mmc" + * "allwinner,sun50i-a64-mmc" - reg : mmc controller base registers - clocks : a list with 4 phandle + clock specifier pairs - clock-names : must contain "ahb", "mmc", "output" and "sample" diff --git a/Documentation/devicetree/bindings/mmc/synopsys-dw-mshc.txt b/Documentation/devicetree/bindings/mmc/synopsys-dw-mshc.txt index 8636f5ae97e5..4e00e859e885 100644 --- a/Documentation/devicetree/bindings/mmc/synopsys-dw-mshc.txt +++ b/Documentation/devicetree/bindings/mmc/synopsys-dw-mshc.txt @@ -39,6 +39,10 @@ Required Properties: Optional properties: +* resets: phandle + reset specifier pair, intended to represent hardware + reset signal present internally in some host controller IC designs. + See Documentation/devicetree/bindings/reset/reset.txt for details. + * clocks: from common clock binding: handle to biu and ciu clocks for the bus interface unit clock and the card interface unit clock. diff --git a/Documentation/devicetree/bindings/mmc/tmio_mmc.txt b/Documentation/devicetree/bindings/mmc/tmio_mmc.txt index 0f610d4b5b00..13df9c2399c3 100644 --- a/Documentation/devicetree/bindings/mmc/tmio_mmc.txt +++ b/Documentation/devicetree/bindings/mmc/tmio_mmc.txt @@ -23,6 +23,7 @@ Required properties: "renesas,sdhi-r8a7793" - SDHI IP on R8A7793 SoC "renesas,sdhi-r8a7794" - SDHI IP on R8A7794 SoC "renesas,sdhi-r8a7795" - SDHI IP on R8A7795 SoC + "renesas,sdhi-r8a7796" - SDHI IP on R8A7796 SoC Optional properties: - toshiba,mmc-wrprotect-disable: write-protect detection is unavailable diff --git a/Documentation/devicetree/bindings/net/apm-xgene-enet.txt b/Documentation/devicetree/bindings/net/apm-xgene-enet.txt index e41b2d59ca7f..f591ab782dbc 100644 --- a/Documentation/devicetree/bindings/net/apm-xgene-enet.txt +++ b/Documentation/devicetree/bindings/net/apm-xgene-enet.txt @@ -47,6 +47,9 @@ Optional properties: Valid values are between 0 to 7, that maps to 273, 589, 899, 1222, 1480, 1806, 2147, 2464 ps Default value is 2, which corresponds to 899 ps +- rxlos-gpios: Input gpio from SFP+ module to indicate availability of + incoming signal. + Example: menetclk: menetclk { diff --git a/Documentation/devicetree/bindings/net/brcm,bcm7445-switch-v4.0.txt b/Documentation/devicetree/bindings/net/brcm,bcm7445-switch-v4.0.txt index 30d487597ecb..fb40891ee606 100644 --- a/Documentation/devicetree/bindings/net/brcm,bcm7445-switch-v4.0.txt +++ b/Documentation/devicetree/bindings/net/brcm,bcm7445-switch-v4.0.txt @@ -6,9 +6,13 @@ Required properties: - reg: addresses and length of the register sets for the device, must be 6 pairs of register addresses and lengths - interrupts: interrupts for the devices, must be two interrupts +- #address-cells: must be 1, see dsa/dsa.txt +- #size-cells: must be 0, see dsa/dsa.txt + +Deprecated binding required properties: + - dsa,mii-bus: phandle to the MDIO bus controller, see dsa/dsa.txt - dsa,ethernet: phandle to the CPU network interface controller, see dsa/dsa.txt -- #size-cells: must be 0 - #address-cells: must be 2, see dsa/dsa.txt Subnodes: @@ -48,6 +52,45 @@ switch_top@f0b00000 { ethernet_switch@0 { compatible = "brcm,bcm7445-switch-v4.0"; #size-cells = <0>; + #address-cells = <1>; + reg = <0x0 0x40000 + 0x40000 0x110 + 0x40340 0x30 + 0x40380 0x30 + 0x40400 0x34 + 0x40600 0x208>; + reg-names = "core", "reg", intrl2_0", "intrl2_1", + "fcb, "acb"; + interrupts = <0 0x18 0 + 0 0x19 0>; + brcm,num-gphy = <1>; + brcm,num-rgmii-ports = <2>; + brcm,fcb-pause-override; + brcm,acb-packets-inflight; + + ports { + #address-cells = <1>; + #size-cells = <0>; + + port@0 { + label = "gphy"; + reg = <0>; + }; + }; + }; +}; + +Example using the old DSA DeviceTree binding: + +switch_top@f0b00000 { + compatible = "simple-bus"; + #size-cells = <1>; + #address-cells = <1>; + ranges = <0 0xf0b00000 0x40804>; + + ethernet_switch@0 { + compatible = "brcm,bcm7445-switch-v4.0"; + #size-cells = <0>; #address-cells = <2>; reg = <0x0 0x40000 0x40000 0x110 diff --git a/Documentation/devicetree/bindings/net/dsa/qca8k.txt b/Documentation/devicetree/bindings/net/dsa/qca8k.txt new file mode 100644 index 000000000000..9c67ee4890d7 --- /dev/null +++ b/Documentation/devicetree/bindings/net/dsa/qca8k.txt @@ -0,0 +1,89 @@ +* Qualcomm Atheros QCA8xxx switch family + +Required properties: + +- compatible: should be "qca,qca8337" +- #size-cells: must be 0 +- #address-cells: must be 1 + +Subnodes: + +The integrated switch subnode should be specified according to the binding +described in dsa/dsa.txt. As the QCA8K switches do not have a N:N mapping of +port and PHY id, each subnode describing a port needs to have a valid phandle +referencing the internal PHY connected to it. The CPU port of this switch is +always port 0. + +Example: + + + &mdio0 { + phy_port1: phy@0 { + reg = <0>; + }; + + phy_port2: phy@1 { + reg = <1>; + }; + + phy_port3: phy@2 { + reg = <2>; + }; + + phy_port4: phy@3 { + reg = <3>; + }; + + phy_port5: phy@4 { + reg = <4>; + }; + + switch0@0 { + compatible = "qca,qca8337"; + #address-cells = <1>; + #size-cells = <0>; + + reg = <0>; + + ports { + #address-cells = <1>; + #size-cells = <0>; + port@0 { + reg = <0>; + label = "cpu"; + ethernet = <&gmac1>; + phy-mode = "rgmii"; + }; + + port@1 { + reg = <1>; + label = "lan1"; + phy-handle = <&phy_port1>; + }; + + port@2 { + reg = <2>; + label = "lan2"; + phy-handle = <&phy_port2>; + }; + + port@3 { + reg = <3>; + label = "lan3"; + phy-handle = <&phy_port3>; + }; + + port@4 { + reg = <4>; + label = "lan4"; + phy-handle = <&phy_port4>; + }; + + port@5 { + reg = <5>; + label = "wan"; + phy-handle = <&phy_port5>; + }; + }; + }; + }; diff --git a/Documentation/devicetree/bindings/net/ethernet.txt b/Documentation/devicetree/bindings/net/ethernet.txt index 5d88f37480b6..e1d76812419c 100644 --- a/Documentation/devicetree/bindings/net/ethernet.txt +++ b/Documentation/devicetree/bindings/net/ethernet.txt @@ -11,8 +11,8 @@ The following properties are common to the Ethernet controllers: the maximum frame size (there's contradiction in ePAPR). - phy-mode: string, operation mode of the PHY interface; supported values are "mii", "gmii", "sgmii", "qsgmii", "tbi", "rev-mii", "rmii", "rgmii", "rgmii-id", - "rgmii-rxid", "rgmii-txid", "rtbi", "smii", "xgmii"; this is now a de-facto - standard property; + "rgmii-rxid", "rgmii-txid", "rtbi", "smii", "xgmii", "trgmii"; this is now a + de-facto standard property; - phy-connection-type: the same as "phy-mode" property but described in ePAPR; - phy-handle: phandle, specifies a reference to a node representing a PHY device; this property is described in ePAPR and so preferred; diff --git a/Documentation/devicetree/bindings/net/macb.txt b/Documentation/devicetree/bindings/net/macb.txt index b5a42df4c928..1506e948610c 100644 --- a/Documentation/devicetree/bindings/net/macb.txt +++ b/Documentation/devicetree/bindings/net/macb.txt @@ -21,6 +21,7 @@ Required properties: - clock-names: Tuple listing input clock names. Required elements: 'pclk', 'hclk' Optional elements: 'tx_clk' + Optional elements: 'rx_clk' applies to cdns,zynqmp-gem - clocks: Phandles to input clocks. Optional properties for PHY child node: diff --git a/Documentation/devicetree/bindings/net/mediatek-net.txt b/Documentation/devicetree/bindings/net/mediatek-net.txt index 32eaaca04d9b..f09525772369 100644 --- a/Documentation/devicetree/bindings/net/mediatek-net.txt +++ b/Documentation/devicetree/bindings/net/mediatek-net.txt @@ -24,14 +24,17 @@ Required properties: Optional properties: - interrupt-parent: Should be the phandle for the interrupt controller that services interrupts for this device - +- mediatek,hwlro: the capability if the hardware supports LRO functions * Ethernet MAC node Required properties: - compatible: Should be "mediatek,eth-mac" - reg: The number of the MAC -- phy-handle: see ethernet.txt file in the same directory. +- phy-handle: see ethernet.txt file in the same directory and + the phy-mode "trgmii" required being provided when reg + is equal to 0 and the MAC uses fixed-link to connect + with internal switch such as MT7530. Example: @@ -51,6 +54,7 @@ eth: ethernet@1b100000 { reset-names = "eth"; mediatek,ethsys = <ðsys>; mediatek,pctl = <&syscfg_pctl_a>; + mediatek,hwlro; #address-cells = <1>; #size-cells = <0>; diff --git a/Documentation/devicetree/bindings/net/mscc-phy-vsc8531.txt b/Documentation/devicetree/bindings/net/mscc-phy-vsc8531.txt new file mode 100644 index 000000000000..99c7eb0a00c8 --- /dev/null +++ b/Documentation/devicetree/bindings/net/mscc-phy-vsc8531.txt @@ -0,0 +1,58 @@ +* Microsemi - vsc8531 Giga bit ethernet phy + +Required properties: +- compatible : Should contain phy id as "ethernet-phy-idAAAA.BBBB" + The PHY device uses the binding described in + Documentation/devicetree/bindings/net/phy.txt + +Optional properties: +- vsc8531,vddmac : The vddmac in mV. +- vsc8531,edge-slowdown : % the edge should be slowed down relative to + the fastest possible edge time. Native sign + need not enter. + Edge rate sets the drive strength of the MAC + interface output signals. Changing the drive + strength will affect the edge rate of the output + signal. The goal of this setting is to help + reduce electrical emission (EMI) by being able + to reprogram drive strength and in effect slow + down the edge rate if desired. Table 1 shows the + impact to the edge rate per VDDMAC supply for each + drive strength setting. + Ref: Table:1 - Edge rate change below. + +Note: see dt-bindings/net/mscc-phy-vsc8531.h for applicable values + +Table: 1 - Edge rate change +----------------------------------------------------------------| +| Edge Rate Change (VDDMAC) | +| | +| 3300 mV 2500 mV 1800 mV 1500 mV | +|---------------------------------------------------------------| +| Default Deafult Default Default | +| (Fastest) (recommended) (recommended) | +|---------------------------------------------------------------| +| -2% -3% -5% -6% | +|---------------------------------------------------------------| +| -4% -6% -9% -14% | +|---------------------------------------------------------------| +| -7% -10% -16% -21% | +|(recommended) (recommended) | +|---------------------------------------------------------------| +| -10% -14% -23% -29% | +|---------------------------------------------------------------| +| -17% -23% -35% -42% | +|---------------------------------------------------------------| +| -29% -37% -52% -58% | +|---------------------------------------------------------------| +| -53% -63% -76% -77% | +| (slowest) | +|---------------------------------------------------------------| + +Example: + + vsc8531_0: ethernet-phy@0 { + compatible = "ethernet-phy-id0007.0570"; + vsc8531,vddmac = <3300>; + vsc8531,edge-slowdown = <21>; + }; diff --git a/Documentation/devicetree/bindings/net/qcom-emac.txt b/Documentation/devicetree/bindings/net/qcom-emac.txt new file mode 100644 index 000000000000..346e6c7f47b7 --- /dev/null +++ b/Documentation/devicetree/bindings/net/qcom-emac.txt @@ -0,0 +1,111 @@ +Qualcomm Technologies EMAC Gigabit Ethernet Controller + +This network controller consists of two devices: a MAC and an SGMII +internal PHY. Each device is represented by a device tree node. A phandle +connects the MAC node to its corresponding internal phy node. Another +phandle points to the external PHY node. + +Required properties: + +MAC node: +- compatible : Should be "qcom,fsm9900-emac". +- reg : Offset and length of the register regions for the device +- interrupts : Interrupt number used by this controller +- mac-address : The 6-byte MAC address. If present, it is the default + MAC address. +- internal-phy : phandle to the internal PHY node +- phy-handle : phandle the the external PHY node + +Internal PHY node: +- compatible : Should be "qcom,fsm9900-emac-sgmii" or "qcom,qdf2432-emac-sgmii". +- reg : Offset and length of the register region(s) for the device +- interrupts : Interrupt number used by this controller + +The external phy child node: +- reg : The phy address + +Example: + +FSM9900: + +soc { + #address-cells = <1>; + #size-cells = <1>; + + emac0: ethernet@feb20000 { + compatible = "qcom,fsm9900-emac"; + reg = <0xfeb20000 0x10000>, + <0xfeb36000 0x1000>; + interrupts = <76>; + + clocks = <&gcc 0>, <&gcc 1>, <&gcc 3>, <&gcc 4>, <&gcc 5>, + <&gcc 6>, <&gcc 7>; + clock-names = "axi_clk", "cfg_ahb_clk", "high_speed_clk", + "mdio_clk", "tx_clk", "rx_clk", "sys_clk"; + + internal-phy = <&emac_sgmii>; + + phy-handle = <&phy0>; + + #address-cells = <1>; + #size-cells = <0>; + phy0: ethernet-phy@0 { + reg = <0>; + }; + + pinctrl-names = "default"; + pinctrl-0 = <&mdio_pins_a>; + }; + + emac_sgmii: ethernet@feb38000 { + compatible = "qcom,fsm9900-emac-sgmii"; + reg = <0xfeb38000 0x1000>; + interrupts = <80>; + }; + + tlmm: pinctrl@fd510000 { + compatible = "qcom,fsm9900-pinctrl"; + + mdio_pins_a: mdio { + state { + pins = "gpio123", "gpio124"; + function = "mdio"; + }; + }; + }; + + +QDF2432: + +soc { + #address-cells = <2>; + #size-cells = <2>; + + emac0: ethernet@38800000 { + compatible = "qcom,fsm9900-emac"; + reg = <0x0 0x38800000 0x0 0x10000>, + <0x0 0x38816000 0x0 0x1000>; + interrupts = <0 256 4>; + + clocks = <&gcc 0>, <&gcc 1>, <&gcc 3>, <&gcc 4>, <&gcc 5>, + <&gcc 6>, <&gcc 7>; + clock-names = "axi_clk", "cfg_ahb_clk", "high_speed_clk", + "mdio_clk", "tx_clk", "rx_clk", "sys_clk"; + + internal-phy = <&emac_sgmii>; + + phy-handle = <&phy0>; + + #address-cells = <1>; + #size-cells = <0>; + phy0: ethernet-phy@4 { + reg = <4>; + }; + }; + + emac_sgmii: ethernet@410400 { + compatible = "qcom,qdf2432-emac-sgmii"; + reg = <0x0 0x00410400 0x0 0xc00>, /* Base address */ + <0x0 0x00410000 0x0 0x400>; /* Per-lane digital */ + interrupts = <0 254 1>; + }; diff --git a/Documentation/devicetree/bindings/net/rockchip-dwmac.txt b/Documentation/devicetree/bindings/net/rockchip-dwmac.txt index cccd945fc45b..95383c5131fc 100644 --- a/Documentation/devicetree/bindings/net/rockchip-dwmac.txt +++ b/Documentation/devicetree/bindings/net/rockchip-dwmac.txt @@ -3,8 +3,12 @@ Rockchip SoC RK3288 10/100/1000 Ethernet driver(GMAC) The device node has following properties. Required properties: - - compatible: Can be one of "rockchip,rk3228-gmac", "rockchip,rk3288-gmac", - "rockchip,rk3368-gmac" + - compatible: should be "rockchip,<name>-gamc" + "rockchip,rk3228-gmac": found on RK322x SoCs + "rockchip,rk3288-gmac": found on RK3288 SoCs + "rockchip,rk3366-gmac": found on RK3366 SoCs + "rockchip,rk3368-gmac": found on RK3368 SoCs + "rockchip,rk3399-gmac": found on RK3399 SoCs - reg: addresses and length of the register sets for the device. - interrupts: Should contain the GMAC interrupts. - interrupt-names: Should contain the interrupt names "macirq". diff --git a/Documentation/devicetree/bindings/net/sh_eth.txt b/Documentation/devicetree/bindings/net/sh_eth.txt index 2f6ec85fda8e..0115c85a2425 100644 --- a/Documentation/devicetree/bindings/net/sh_eth.txt +++ b/Documentation/devicetree/bindings/net/sh_eth.txt @@ -5,6 +5,8 @@ interface contains. Required properties: - compatible: "renesas,gether-r8a7740" if the device is a part of R8A7740 SoC. + "renesas,ether-r8a7743" if the device is a part of R8A7743 SoC. + "renesas,ether-r8a7745" if the device is a part of R8A7745 SoC. "renesas,ether-r8a7778" if the device is a part of R8A7778 SoC. "renesas,ether-r8a7779" if the device is a part of R8A7779 SoC. "renesas,ether-r8a7790" if the device is a part of R8A7790 SoC. diff --git a/Documentation/devicetree/bindings/net/smsc911x.txt b/Documentation/devicetree/bindings/net/smsc911x.txt index 3fed3c124411..16c3a9501f5d 100644 --- a/Documentation/devicetree/bindings/net/smsc911x.txt +++ b/Documentation/devicetree/bindings/net/smsc911x.txt @@ -3,9 +3,11 @@ Required properties: - compatible : Should be "smsc,lan<model>", "smsc,lan9115" - reg : Address and length of the io space for SMSC LAN -- interrupts : Should contain SMSC LAN interrupt line -- interrupt-parent : Should be the phandle for the interrupt controller - that services interrupts for this device +- interrupts : one or two interrupt specifiers + - The first interrupt is the SMSC LAN interrupt line + - The second interrupt (if present) is the PME (power + management event) interrupt that is able to wake up the host + system with a 50ms pulse on network activity - phy-mode : See ethernet.txt file in the same directory Optional properties: @@ -21,6 +23,10 @@ Optional properties: external PHY - smsc,save-mac-address : Indicates that mac address needs to be saved before resetting the controller +- reset-gpios : a GPIO line connected to the RESET (active low) signal + of the device. On many systems this is wired high so the device goes + out of reset at power-on, but if it is under program control, this + optional GPIO can wake up in response to it. Examples: @@ -29,7 +35,8 @@ lan9220@f4000000 { reg = <0xf4000000 0x2000000>; phy-mode = "mii"; interrupt-parent = <&gpio1>; - interrupts = <31>; + interrupts = <31>, <32>; + reset-gpios = <&gpio1 30 GPIO_ACTIVE_LOW>; reg-io-width = <4>; smsc,irq-push-pull; }; diff --git a/Documentation/devicetree/bindings/net/snps,dwc-qos-ethernet.txt b/Documentation/devicetree/bindings/net/snps,dwc-qos-ethernet.txt index 51f8d2eba8d8..d93f71ce8346 100644 --- a/Documentation/devicetree/bindings/net/snps,dwc-qos-ethernet.txt +++ b/Documentation/devicetree/bindings/net/snps,dwc-qos-ethernet.txt @@ -1,21 +1,111 @@ * Synopsys DWC Ethernet QoS IP version 4.10 driver (GMAC) +This binding supports the Synopsys Designware Ethernet QoS (Quality Of Service) +IP block. The IP supports multiple options for bus type, clocking and reset +structure, and feature list. Consequently, a number of properties and list +entries in properties are marked as optional, or only required in specific HW +configurations. Required properties: -- compatible: Should be "snps,dwc-qos-ethernet-4.10" +- compatible: One of: + - "axis,artpec6-eqos", "snps,dwc-qos-ethernet-4.10" + Represents the IP core when integrated into the Axis ARTPEC-6 SoC. + - "nvidia,tegra186-eqos", "snps,dwc-qos-ethernet-4.10" + Represents the IP core when integrated into the NVIDIA Tegra186 SoC. + - "snps,dwc-qos-ethernet-4.10" + This combination is deprecated. It should be treated as equivalent to + "axis,artpec6-eqos", "snps,dwc-qos-ethernet-4.10". It is supported to be + compatible with earlier revisions of this binding. - reg: Address and length of the register set for the device -- clocks: Phandles to the reference clock and the bus clock -- clock-names: Should be "phy_ref_clk" for the reference clock and "apb_pclk" - for the bus clock. +- clocks: Phandle and clock specifiers for each entry in clock-names, in the + same order. See ../clock/clock-bindings.txt. +- clock-names: May contain any/all of the following depending on the IP + configuration, in any order: + - "tx" + The EQOS transmit path clock. The HW signal name is clk_tx_i. + In some configurations (e.g. GMII/RGMII), this clock also drives the PHY TX + path. In other configurations, other clocks (such as tx_125, rmii) may + drive the PHY TX path. + - "rx" + The EQOS receive path clock. The HW signal name is clk_rx_i. + In some configurations (e.g. GMII/RGMII), this clock is derived from the + PHY's RX clock output. In other configurations, other clocks (such as + rx_125, rmii) may drive the EQOS RX path. + In cases where the PHY clock is directly fed into the EQOS receive path + without intervening logic, the DT need not represent this clock, since it + is assumed to be fully under the control of the PHY device/driver. In + cases where SoC integration adds additional logic to this path, such as a + SW-controlled clock gate, this clock should be represented in DT. + - "slave_bus" + The CPU/slave-bus (CSR) interface clock. This applies to any bus type; + APB, AHB, AXI, etc. The HW signal name is hclk_i (AHB) or clk_csr_i (other + buses). + - "master_bus" + The master bus interface clock. Only required in configurations that use a + separate clock for the master and slave bus interfaces. The HW signal name + is hclk_i (AHB) or aclk_i (AXI). + - "ptp_ref" + The PTP reference clock. The HW signal name is clk_ptp_ref_i. + - "phy_ref_clk" + This clock is deprecated and should not be used by new compatible values. + It is equivalent to "tx". + - "apb_pclk" + This clock is deprecated and should not be used by new compatible values. + It is equivalent to "slave_bus". + + Note: Support for additional IP configurations may require adding the + following clocks to this list in the future: clk_rx_125_i, clk_tx_125_i, + clk_pmarx_0_i, clk_pmarx1_i, clk_rmii_i, clk_revmii_rx_i, clk_revmii_tx_i. + Configurations exist where multiple similar clocks are used at once, e.g. all + of clk_rx_125_i, clk_pmarx_0_i, clk_pmarx1_i. For this reason it is best to + extend the binding with a separate clock-names entry for each of those RX + clocks, rather than repurposing the existing "rx" clock-names entry as a + generic/logical clock in a similar fashion to "master_bus" and "slave_bus". + This will allow easy support for configurations that support multiple PHY + interfaces using a mux, and hence need to have explicit control over + specific RX clocks. + + The following compatible values require the following set of clocks: + - "nvidia,tegra186-eqos", "snps,dwc-qos-ethernet-4.10": + - "slave_bus" + - "master_bus" + - "rx" + - "tx" + - "ptp_ref" + - "axis,artpec6-eqos", "snps,dwc-qos-ethernet-4.10": + - "slave_bus" + - "master_bus" + - "tx" + - "ptp_ref" + - "snps,dwc-qos-ethernet-4.10" (deprecated): + - "phy_ref_clk" + - "apb_clk" - interrupt-parent: Should be the phandle for the interrupt controller that services interrupts for this device - interrupts: Should contain the core's combined interrupt signal - phy-mode: See ethernet.txt file in the same directory +- resets: Phandle and reset specifiers for each entry in reset-names, in the + same order. See ../reset/reset.txt. +- reset-names: May contain any/all of the following depending on the IP + configuration, in any order: + - "eqos". The reset to the entire module. The HW signal name is hreset_n + (AHB) or aresetn_i (AXI). + + The following compatible values require the following set of resets: + (the reset properties may be omitted if empty) + - "nvidia,tegra186-eqos", "snps,dwc-qos-ethernet-4.10": + - "eqos". + - "axis,artpec6-eqos", "snps,dwc-qos-ethernet-4.10": + - None. + - "snps,dwc-qos-ethernet-4.10" (deprecated): + - None. Optional properties: - dma-coherent: Present if dma operations are coherent - mac-address: See ethernet.txt in the same directory - local-mac-address: See ethernet.txt in the same directory +- phy-reset-gpios: Phandle and specifier for any GPIO used to reset the PHY. + See ../gpio/gpio.txt. - snps,en-lpi: If present it enables use of the AXI low-power interface - snps,write-requests: Number of write requests that the AXI port can issue. It depends on the SoC configuration. @@ -52,6 +142,7 @@ ethernet2@40010000 { reg = <0x40010000 0x4000>; phy-handle = <&phy2>; phy-mode = "gmii"; + phy-reset-gpios = <&gpioctlr 43 GPIO_ACTIVE_LOW>; snps,en-tx-lpi-clockgating; snps,en-lpi; diff --git a/Documentation/devicetree/bindings/net/stm32-dwmac.txt b/Documentation/devicetree/bindings/net/stm32-dwmac.txt new file mode 100644 index 000000000000..c35afb7e956a --- /dev/null +++ b/Documentation/devicetree/bindings/net/stm32-dwmac.txt @@ -0,0 +1,32 @@ +STMicroelectronics STM32 / MCU DWMAC glue layer controller + +This file documents platform glue layer for stmmac. +Please see stmmac.txt for the other unchanged properties. + +The device node has following properties. + +Required properties: +- compatible: Should be "st,stm32-dwmac" to select glue, and + "snps,dwmac-3.50a" to select IP version. +- clocks: Must contain a phandle for each entry in clock-names. +- clock-names: Should be "stmmaceth" for the host clock. + Should be "mac-clk-tx" for the MAC TX clock. + Should be "mac-clk-rx" for the MAC RX clock. +- st,syscon : Should be phandle/offset pair. The phandle to the syscon node which + encompases the glue register, and the offset of the control register. +Example: + + ethernet@40028000 { + compatible = "st,stm32-dwmac", "snps,dwmac-3.50a"; + status = "disabled"; + reg = <0x40028000 0x8000>; + reg-names = "stmmaceth"; + interrupts = <0 61 0>, <0 62 0>; + interrupt-names = "macirq", "eth_wake_irq"; + clock-names = "stmmaceth", "mac-clk-tx", "mac-clk-rx"; + clocks = <&rcc 0 25>, <&rcc 0 26>, <&rcc 0 27>; + st,syscon = <&syscfg 0x4>; + snps,pbl = <8>; + snps,mixed-burst; + dma-ranges; + }; diff --git a/Documentation/devicetree/bindings/net/wireless/esp,esp8089.txt b/Documentation/devicetree/bindings/net/wireless/esp,esp8089.txt new file mode 100644 index 000000000000..19331bb4ff6e --- /dev/null +++ b/Documentation/devicetree/bindings/net/wireless/esp,esp8089.txt @@ -0,0 +1,31 @@ +Espressif ESP8089 wireless SDIO devices + +This node provides properties for controlling the ESP8089 wireless device. +The node is expected to be specified as a child node to the SDIO controller +that connects the device to the system. + +Required properties: + + - compatible : Should be "esp,esp8089". + +Optional properties: + - esp,crystal-26M-en: Integer value for the crystal_26M_en firmware parameter + +Example: + +&mmc1 { + #address-cells = <1>; + #size-cells = <0>; + + vmmc-supply = <®_dldo1>; + mmc-pwrseq = <&wifi_pwrseq>; + bus-width = <4>; + non-removable; + status = "okay"; + + esp8089: sdio_wifi@1 { + compatible = "esp,esp8089"; + reg = <1>; + esp,crystal-26M-en = <2>; + }; +}; diff --git a/Documentation/devicetree/bindings/net/xilinx_gmii2rgmii.txt b/Documentation/devicetree/bindings/net/xilinx_gmii2rgmii.txt new file mode 100644 index 000000000000..038dda48b8e6 --- /dev/null +++ b/Documentation/devicetree/bindings/net/xilinx_gmii2rgmii.txt @@ -0,0 +1,35 @@ +XILINX GMIITORGMII Converter Driver Device Tree Bindings +-------------------------------------------------------- + +The Gigabit Media Independent Interface (GMII) to Reduced Gigabit Media +Independent Interface (RGMII) core provides the RGMII between RGMII-compliant +Ethernet physical media devices (PHY) and the Gigabit Ethernet controller. +This core can be used in all three modes of operation(10/100/1000 Mb/s). +The Management Data Input/Output (MDIO) interface is used to configure the +Speed of operation. This core can switch dynamically between the three +Different speed modes by configuring the conveter register through mdio write. + +This converter sits between the ethernet MAC and the external phy. +MAC <==> GMII2RGMII <==> RGMII_PHY + +For more details about mdio please refer phy.txt file in the same directory. + +Required properties: +- compatible : Should be "xlnx,gmii-to-rgmii-1.0" +- reg : The ID number for the phy, usually a small integer +- phy-handle : Should point to the external phy device. + See ethernet.txt file in the same directory. + +Example: + mdio { + #address-cells = <1>; + #size-cells = <0>; + phy: ethernet-phy@0 { + ...... + }; + gmiitorgmii: gmiitorgmii@8 { + compatible = "xlnx,gmii-to-rgmii-1.0"; + reg = <8>; + phy-handle = <&phy>; + }; + }; diff --git a/Documentation/devicetree/bindings/nvmem/rockchip-efuse.txt b/Documentation/devicetree/bindings/nvmem/rockchip-efuse.txt index 8f86ab3b1046..94aeeeabadd5 100644 --- a/Documentation/devicetree/bindings/nvmem/rockchip-efuse.txt +++ b/Documentation/devicetree/bindings/nvmem/rockchip-efuse.txt @@ -1,11 +1,20 @@ = Rockchip eFuse device tree bindings = Required properties: -- compatible: Should be "rockchip,rockchip-efuse" +- compatible: Should be one of the following. + - "rockchip,rk3066a-efuse" - for RK3066a SoCs. + - "rockchip,rk3188-efuse" - for RK3188 SoCs. + - "rockchip,rk3288-efuse" - for RK3288 SoCs. + - "rockchip,rk3399-efuse" - for RK3399 SoCs. - reg: Should contain the registers location and exact eFuse size - clocks: Should be the clock id of eFuse - clock-names: Should be "pclk_efuse" +Deprecated properties: +- compatible: "rockchip,rockchip-efuse" + Old efuse compatible value compatible to rk3066a, rk3188 and rk3288 + efuses + = Data cells = Are child nodes of eFuse, bindings of which as described in bindings/nvmem/nvmem.txt @@ -13,7 +22,7 @@ bindings/nvmem/nvmem.txt Example: efuse: efuse@ffb40000 { - compatible = "rockchip,rockchip-efuse"; + compatible = "rockchip,rk3288-efuse"; reg = <0xffb40000 0x20>; #address-cells = <1>; #size-cells = <1>; diff --git a/Documentation/devicetree/bindings/pci/axis,artpec6-pcie.txt b/Documentation/devicetree/bindings/pci/axis,artpec6-pcie.txt index 330a45b5f0b5..5ecaea1e6eee 100644 --- a/Documentation/devicetree/bindings/pci/axis,artpec6-pcie.txt +++ b/Documentation/devicetree/bindings/pci/axis,artpec6-pcie.txt @@ -24,16 +24,17 @@ Example: compatible = "axis,artpec6-pcie", "snps,dw-pcie"; reg = <0xf8050000 0x2000 0xf8040000 0x1000 - 0xc0000000 0x1000>; + 0xc0000000 0x2000>; reg-names = "dbi", "phy", "config"; #address-cells = <3>; #size-cells = <2>; device_type = "pci"; /* downstream I/O */ - ranges = <0x81000000 0 0x00010000 0xc0010000 0 0x00010000 + ranges = <0x81000000 0 0 0xc0002000 0 0x00010000 /* non-prefetchable memory */ - 0x82000000 0 0xc0020000 0xc0020000 0 0x1ffe0000>; + 0x82000000 0 0xc0012000 0xc0012000 0 0x1ffee000>; num-lanes = <2>; + bus-range = <0x00 0xff>; interrupts = <GIC_SPI 148 IRQ_TYPE_LEVEL_HIGH>; interrupt-names = "msi"; #interrupt-cells = <1>; diff --git a/Documentation/devicetree/bindings/pci/xilinx-nwl-pcie.txt b/Documentation/devicetree/bindings/pci/xilinx-nwl-pcie.txt index 337fc97d18c9..3259798a1192 100644 --- a/Documentation/devicetree/bindings/pci/xilinx-nwl-pcie.txt +++ b/Documentation/devicetree/bindings/pci/xilinx-nwl-pcie.txt @@ -55,9 +55,10 @@ nwl_pcie: pcie@fd0e0000 { msi-parent = <&nwl_pcie>; reg = <0x0 0xfd0e0000 0x0 0x1000>, <0x0 0xfd480000 0x0 0x1000>, - <0x0 0xe0000000 0x0 0x1000000>; + <0x80 0x00000000 0x0 0x1000000>; reg-names = "breg", "pcireg", "cfg"; - ranges = <0x02000000 0x00000000 0xe1000000 0x00000000 0xe1000000 0 0x0f000000>; + ranges = <0x02000000 0x00000000 0xe0000000 0x00000000 0xe0000000 0x00000000 0x10000000 /* non-prefetchable memory */ + 0x43000000 0x00000006 0x00000000 0x00000006 0x00000000 0x00000002 0x00000000>;/* prefetchable memory */ pcie_intc: legacy-interrupt-controller { interrupt-controller; diff --git a/Documentation/devicetree/bindings/phy/bcm-ns-usb3-phy.txt b/Documentation/devicetree/bindings/phy/bcm-ns-usb3-phy.txt new file mode 100644 index 000000000000..09aeba94538d --- /dev/null +++ b/Documentation/devicetree/bindings/phy/bcm-ns-usb3-phy.txt @@ -0,0 +1,23 @@ +Driver for Broadcom Northstar USB 3.0 PHY + +Required properties: + +- compatible: one of: "brcm,ns-ax-usb3-phy", "brcm,ns-bx-usb3-phy". +- reg: register mappings for DMP (Device Management Plugin) and ChipCommon B + MMI. +- reg-names: "dmp" and "ccb-mii" + +Initialization of USB 3.0 PHY depends on Northstar version. There are currently +three known series: Ax, Bx and Cx. +Known A0: BCM4707 rev 0 +Known B0: BCM4707 rev 4, BCM53573 rev 2 +Known B1: BCM4707 rev 6 +Known C0: BCM47094 rev 0 + +Example: + usb3-phy { + compatible = "brcm,ns-ax-usb3-phy"; + reg = <0x18105000 0x1000>, <0x18003000 0x1000>; + reg-names = "dmp", "ccb-mii"; + #phy-cells = <0>; + }; diff --git a/Documentation/devicetree/bindings/phy/mxs-usb-phy.txt b/Documentation/devicetree/bindings/phy/mxs-usb-phy.txt index 379b84a567cc..1d25b04cd05e 100644 --- a/Documentation/devicetree/bindings/phy/mxs-usb-phy.txt +++ b/Documentation/devicetree/bindings/phy/mxs-usb-phy.txt @@ -12,6 +12,16 @@ Required properties: - interrupts: Should contain phy interrupt - fsl,anatop: phandle for anatop register, it is only for imx6 SoC series +Optional properties: +- fsl,tx-cal-45-dn-ohms: Integer [30-55]. Resistance (in ohms) of switchable + high-speed trimming resistor connected in parallel with the 45 ohm resistor + that terminates the DN output signal. Default: 45 +- fsl,tx-cal-45-dp-ohms: Integer [30-55]. Resistance (in ohms) of switchable + high-speed trimming resistor connected in parallel with the 45 ohm resistor + that terminates the DP output signal. Default: 45 +- fsl,tx-d-cal: Integer [79-119]. Current trimming value (as a percentage) of + the 17.78mA TX reference current. Default: 100 + Example: usbphy1: usbphy@020c9000 { compatible = "fsl,imx6q-usbphy", "fsl,imx23-usbphy"; diff --git a/Documentation/devicetree/bindings/phy/phy-rockchip-inno-usb2.txt b/Documentation/devicetree/bindings/phy/phy-rockchip-inno-usb2.txt new file mode 100644 index 000000000000..3c29c77a7018 --- /dev/null +++ b/Documentation/devicetree/bindings/phy/phy-rockchip-inno-usb2.txt @@ -0,0 +1,64 @@ +ROCKCHIP USB2.0 PHY WITH INNO IP BLOCK + +Required properties (phy (parent) node): + - compatible : should be one of the listed compatibles: + * "rockchip,rk3366-usb2phy" + * "rockchip,rk3399-usb2phy" + - reg : the address offset of grf for usb-phy configuration. + - #clock-cells : should be 0. + - clock-output-names : specify the 480m output clock name. + +Optional properties: + - clocks : phandle + phy specifier pair, for the input clock of phy. + - clock-names : input clock name of phy, must be "phyclk". + +Required nodes : a sub-node is required for each port the phy provides. + The sub-node name is used to identify host or otg port, + and shall be the following entries: + * "otg-port" : the name of otg port. + * "host-port" : the name of host port. + +Required properties (port (child) node): + - #phy-cells : must be 0. See ./phy-bindings.txt for details. + - interrupts : specify an interrupt for each entry in interrupt-names. + - interrupt-names : a list which shall be the following entries: + * "otg-id" : for the otg id interrupt. + * "otg-bvalid" : for the otg vbus interrupt. + * "linestate" : for the host/otg linestate interrupt. + +Optional properties: + - phy-supply : phandle to a regulator that provides power to VBUS. + See ./phy-bindings.txt for details. + +Example: + +grf: syscon@ff770000 { + compatible = "rockchip,rk3366-grf", "syscon", "simple-mfd"; + #address-cells = <1>; + #size-cells = <1>; + +... + + u2phy: usb2-phy@700 { + compatible = "rockchip,rk3366-usb2phy"; + reg = <0x700 0x2c>; + #clock-cells = <0>; + clock-output-names = "sclk_otgphy0_480m"; + + u2phy_otg: otg-port { + #phy-cells = <0>; + interrupts = <GIC_SPI 93 IRQ_TYPE_LEVEL_HIGH>, + <GIC_SPI 94 IRQ_TYPE_LEVEL_HIGH>, + <GIC_SPI 95 IRQ_TYPE_LEVEL_HIGH>; + interrupt-names = "otg-id", "otg-bvalid", "linestate"; + status = "okay"; + }; + + u2phy_host: host-port { + #phy-cells = <0>; + interrupts = <GIC_SPI 96 IRQ_TYPE_LEVEL_HIGH>; + interrupt-names = "linestate"; + status = "okay"; + }; + }; +}; diff --git a/Documentation/devicetree/bindings/phy/phy-rockchip-typec.txt b/Documentation/devicetree/bindings/phy/phy-rockchip-typec.txt new file mode 100644 index 000000000000..6ea867e3176f --- /dev/null +++ b/Documentation/devicetree/bindings/phy/phy-rockchip-typec.txt @@ -0,0 +1,101 @@ +* ROCKCHIP type-c PHY +--------------------- + +Required properties: + - compatible : must be "rockchip,rk3399-typec-phy" + - reg: Address and length of the usb phy control register set + - rockchip,grf : phandle to the syscon managing the "general + register files" + - clocks : phandle + clock specifier for the phy clocks + - clock-names : string, clock name, must be "tcpdcore", "tcpdphy-ref"; + - assigned-clocks: main clock, should be <&cru SCLK_UPHY0_TCPDCORE> or + <&cru SCLK_UPHY1_TCPDCORE>; + - assigned-clock-rates : the phy core clk frequency, shall be: 50000000 + - resets : a list of phandle + reset specifier pairs + - reset-names : string reset name, must be: + "uphy", "uphy-pipe", "uphy-tcphy" + - extcon : extcon specifier for the Power Delivery + +Note, there are 2 type-c phys for RK3399, and they are almost identical, except +these registers(description below), every register node contains 3 sections: +offset, enable bit, write mask bit. + - rockchip,typec-conn-dir : the register of type-c connector direction, + for type-c phy0, it must be <0xe580 0 16>; + for type-c phy1, it must be <0xe58c 0 16>; + - rockchip,usb3tousb2-en : the register of type-c force usb3 to usb2 enable + control. + for type-c phy0, it must be <0xe580 3 19>; + for type-c phy1, it must be <0xe58c 3 19>; + - rockchip,external-psm : the register of type-c phy external psm clock + selection. + for type-c phy0, it must be <0xe588 14 30>; + for type-c phy1, it must be <0xe594 14 30>; + - rockchip,pipe-status : the register of type-c phy pipe status. + for type-c phy0, it must be <0xe5c0 0 0>; + for type-c phy1, it must be <0xe5c0 16 16>; + +Required nodes : a sub-node is required for each port the phy provides. + The sub-node name is used to identify dp or usb3 port, + and shall be the following entries: + * "dp-port" : the name of DP port. + * "usb3-port" : the name of USB3 port. + +Required properties (port (child) node): +- #phy-cells : must be 0, See ./phy-bindings.txt for details. + +Example: + tcphy0: phy@ff7c0000 { + compatible = "rockchip,rk3399-typec-phy"; + reg = <0x0 0xff7c0000 0x0 0x40000>; + rockchip,grf = <&grf>; + extcon = <&fusb0>; + clocks = <&cru SCLK_UPHY0_TCPDCORE>, + <&cru SCLK_UPHY0_TCPDPHY_REF>; + clock-names = "tcpdcore", "tcpdphy-ref"; + assigned-clocks = <&cru SCLK_UPHY0_TCPDCORE>; + assigned-clock-rates = <50000000>; + resets = <&cru SRST_UPHY0>, + <&cru SRST_UPHY0_PIPE_L00>, + <&cru SRST_P_UPHY0_TCPHY>; + reset-names = "uphy", "uphy-pipe", "uphy-tcphy"; + rockchip,typec-conn-dir = <0xe580 0 16>; + rockchip,usb3tousb2-en = <0xe580 3 19>; + rockchip,external-psm = <0xe588 14 30>; + rockchip,pipe-status = <0xe5c0 0 0>; + + tcphy0_dp: dp-port { + #phy-cells = <0>; + }; + + tcphy0_usb3: usb3-port { + #phy-cells = <0>; + }; + }; + + tcphy1: phy@ff800000 { + compatible = "rockchip,rk3399-typec-phy"; + reg = <0x0 0xff800000 0x0 0x40000>; + rockchip,grf = <&grf>; + extcon = <&fusb1>; + clocks = <&cru SCLK_UPHY1_TCPDCORE>, + <&cru SCLK_UPHY1_TCPDPHY_REF>; + clock-names = "tcpdcore", "tcpdphy-ref"; + assigned-clocks = <&cru SCLK_UPHY1_TCPDCORE>; + assigned-clock-rates = <50000000>; + resets = <&cru SRST_UPHY1>, + <&cru SRST_UPHY1_PIPE_L00>, + <&cru SRST_P_UPHY1_TCPHY>; + reset-names = "uphy", "uphy-pipe", "uphy-tcphy"; + rockchip,typec-conn-dir = <0xe58c 0 16>; + rockchip,usb3tousb2-en = <0xe58c 3 19>; + rockchip,external-psm = <0xe594 14 30>; + rockchip,pipe-status = <0xe5c0 16 16>; + + tcphy1_dp: dp-port { + #phy-cells = <0>; + }; + + tcphy1_usb3: usb3-port { + #phy-cells = <0>; + }; + }; diff --git a/Documentation/devicetree/bindings/phy/rcar-gen3-phy-usb2.txt b/Documentation/devicetree/bindings/phy/rcar-gen3-phy-usb2.txt index 2281d6cdecb1..ace9cce2704a 100644 --- a/Documentation/devicetree/bindings/phy/rcar-gen3-phy-usb2.txt +++ b/Documentation/devicetree/bindings/phy/rcar-gen3-phy-usb2.txt @@ -6,6 +6,8 @@ This file provides information on what the device node for the R-Car generation Required properties: - compatible: "renesas,usb2-phy-r8a7795" if the device is a part of an R8A7795 SoC. + "renesas,usb2-phy-r8a7796" if the device is a part of an R8A7796 + SoC. "renesas,rcar-gen3-usb2-phy" for a generic R-Car Gen3 compatible device. When compatible with the generic version, nodes must list the @@ -30,11 +32,11 @@ Example (R-Car H3): compatible = "renesas,usb2-phy-r8a7795", "renesas,rcar-gen3-usb2-phy"; reg = <0 0xee080200 0 0x700>; interrupts = <GIC_SPI 108 IRQ_TYPE_LEVEL_HIGH>; - clocks = <&mstp7_clks R8A7795_CLK_EHCI0>; + clocks = <&cpg CPG_MOD 703>; }; usb-phy@ee0a0200 { compatible = "renesas,usb2-phy-r8a7795", "renesas,rcar-gen3-usb2-phy"; reg = <0 0xee0a0200 0 0x700>; - clocks = <&mstp7_clks R8A7795_CLK_EHCI0>; + clocks = <&cpg CPG_MOD 702>; }; diff --git a/Documentation/devicetree/bindings/phy/rockchip-pcie-phy.txt b/Documentation/devicetree/bindings/phy/rockchip-pcie-phy.txt new file mode 100644 index 000000000000..0f6222a672ce --- /dev/null +++ b/Documentation/devicetree/bindings/phy/rockchip-pcie-phy.txt @@ -0,0 +1,31 @@ +Rockchip PCIE PHY +----------------------- + +Required properties: + - compatible: rockchip,rk3399-pcie-phy + - #phy-cells: must be 0 + - clocks: Must contain an entry in clock-names. + See ../clocks/clock-bindings.txt for details. + - clock-names: Must be "refclk" + - resets: Must contain an entry in reset-names. + See ../reset/reset.txt for details. + - reset-names: Must be "phy" + +Example: + +grf: syscon@ff770000 { + compatible = "rockchip,rk3399-grf", "syscon", "simple-mfd"; + #address-cells = <1>; + #size-cells = <1>; + + ... + + pcie_phy: pcie-phy { + compatible = "rockchip,rk3399-pcie-phy"; + #phy-cells = <0>; + clocks = <&cru SCLK_PCIEPHY_REF>; + clock-names = "refclk"; + resets = <&cru SRST_PCIEPHY>; + reset-names = "phy"; + }; +}; diff --git a/Documentation/devicetree/bindings/phy/rockchip-usb-phy.txt b/Documentation/devicetree/bindings/phy/rockchip-usb-phy.txt index cc6be9680a6d..57dc388e2fa2 100644 --- a/Documentation/devicetree/bindings/phy/rockchip-usb-phy.txt +++ b/Documentation/devicetree/bindings/phy/rockchip-usb-phy.txt @@ -27,6 +27,9 @@ Optional Properties: - clocks : phandle + clock specifier for the phy clocks - clock-names: string, clock name, must be "phyclk" - #clock-cells: for users of the phy-pll, should be 0 +- reset-names: Only allow the following entries: + - phy-reset +- resets: Must contain an entry for each entry in reset-names. Example: diff --git a/Documentation/devicetree/bindings/phy/sun4i-usb-phy.txt b/Documentation/devicetree/bindings/phy/sun4i-usb-phy.txt index 95736d77fbb7..287150db6db4 100644 --- a/Documentation/devicetree/bindings/phy/sun4i-usb-phy.txt +++ b/Documentation/devicetree/bindings/phy/sun4i-usb-phy.txt @@ -10,6 +10,7 @@ Required properties: * allwinner,sun8i-a23-usb-phy * allwinner,sun8i-a33-usb-phy * allwinner,sun8i-h3-usb-phy + * allwinner,sun50i-a64-usb-phy - reg : a list of offset + length pairs - reg-names : * "phy_ctrl" diff --git a/Documentation/devicetree/bindings/phy/ti-phy.txt b/Documentation/devicetree/bindings/phy/ti-phy.txt index a3b394587874..cd13e6157088 100644 --- a/Documentation/devicetree/bindings/phy/ti-phy.txt +++ b/Documentation/devicetree/bindings/phy/ti-phy.txt @@ -31,6 +31,8 @@ OMAP USB2 PHY Required properties: - compatible: Should be "ti,omap-usb2" + Should be "ti,dra7x-usb2" for the 1st instance of USB2 PHY on + DRA7x Should be "ti,dra7x-usb2-phy2" for the 2nd instance of USB2 PHY in DRA7x - reg : Address and length of the register set for the device. diff --git a/Documentation/devicetree/bindings/pinctrl/allwinner,sunxi-pinctrl.txt b/Documentation/devicetree/bindings/pinctrl/allwinner,sunxi-pinctrl.txt index 69617220c5d6..1685821eea41 100644 --- a/Documentation/devicetree/bindings/pinctrl/allwinner,sunxi-pinctrl.txt +++ b/Documentation/devicetree/bindings/pinctrl/allwinner,sunxi-pinctrl.txt @@ -23,6 +23,7 @@ Required properties: "allwinner,sun8i-h3-pinctrl" "allwinner,sun8i-h3-r-pinctrl" "allwinner,sun50i-a64-pinctrl" + "nextthing,gr8-pinctrl" - reg: Should contain the register physical address and length for the pin controller. diff --git a/Documentation/devicetree/bindings/pinctrl/pinctrl-aspeed.txt b/Documentation/devicetree/bindings/pinctrl/pinctrl-aspeed.txt new file mode 100644 index 000000000000..5e60ad18f147 --- /dev/null +++ b/Documentation/devicetree/bindings/pinctrl/pinctrl-aspeed.txt @@ -0,0 +1,65 @@ +Aspeed Pin Controllers +---------------------- + +The Aspeed SoCs vary in functionality inside a generation but have a common mux +device register layout. + +Required properties: +- compatible : Should be any one of the following: + "aspeed,ast2400-pinctrl" + "aspeed,g4-pinctrl" + "aspeed,ast2500-pinctrl" + "aspeed,g5-pinctrl" + +The pin controller node should be a child of a syscon node with the required +property: +- compatible: "syscon", "simple-mfd" + +Refer to the the bindings described in +Documentation/devicetree/bindings/mfd/syscon.txt + +Subnode Format +-------------- + +The required properties of child nodes are (as defined in pinctrl-bindings): +- function +- groups + +Each function has only one associated pin group. Each group is named by its +function. The following values for the function and groups properties are +supported: + +aspeed,ast2400-pinctrl, aspeed,g4-pinctrl: + +ACPI BMCINT DDCCLK DDCDAT FLACK FLBUSY FLWP GPID0 GPIE0 GPIE2 GPIE4 GPIE6 I2C10 +I2C11 I2C12 I2C13 I2C3 I2C4 I2C5 I2C6 I2C7 I2C8 I2C9 LPCPD LPCPME LPCSMI MDIO1 +MDIO2 NCTS1 NCTS3 NCTS4 NDCD1 NDCD3 NDCD4 NDSR1 NDSR3 NDTR1 NDTR3 NRI1 NRI3 +NRI4 NRTS1 NRTS3 PWM0 PWM1 PWM2 PWM3 PWM4 PWM5 PWM6 PWM7 RGMII1 RMII1 ROM16 +ROM8 ROMCS1 ROMCS2 ROMCS3 ROMCS4 RXD1 RXD3 RXD4 SD1 SGPMI SIOPBI SIOPBO TIMER3 +TIMER5 TIMER6 TIMER7 TIMER8 TXD1 TXD3 TXD4 UART6 VGAHS VGAVS VPI18 VPI24 VPI30 +VPO12 VPO24 + +aspeed,ast2500-pinctrl, aspeed,g5-pinctrl: + +GPID0 GPID2 GPIE0 I2C10 I2C11 I2C12 I2C13 I2C14 I2C3 I2C4 I2C5 I2C6 I2C7 I2C8 +I2C9 MAC1LINK MDIO1 MDIO2 OSCCLK PEWAKE PWM0 PWM1 PWM2 PWM3 PWM4 PWM5 PWM6 PWM7 +RGMII1 RGMII2 RMII1 RMII2 SD1 SPI1 TIMER4 TIMER5 TIMER6 TIMER7 TIMER8 + +Examples: + +syscon: scu@1e6e2000 { + compatible = "syscon", "simple-mfd"; + reg = <0x1e6e2000 0x1a8>; + + pinctrl: pinctrl { + compatible = "aspeed,g4-pinctrl"; + + pinctrl_i2c3_default: i2c3_default { + function = "I2C3"; + groups = "I2C3"; + }; + }; +}; + +Please refer to pinctrl-bindings.txt in this directory for details of the +common pinctrl bindings used by client devices. diff --git a/Documentation/devicetree/bindings/pinctrl/pinctrl-st.txt b/Documentation/devicetree/bindings/pinctrl/pinctrl-st.txt index 26bcb18f4e60..013c675b5b64 100644 --- a/Documentation/devicetree/bindings/pinctrl/pinctrl-st.txt +++ b/Documentation/devicetree/bindings/pinctrl/pinctrl-st.txt @@ -30,8 +30,7 @@ Second type has a dedicated interrupt per gpio bank. Pin controller node: Required properties: -- compatible : should be "st,<SOC>-<pio-block>-pinctrl" - like st,stih415-sbc-pinctrl, st,stih415-front-pinctrl and so on. +- compatible : should be "st,stih407-<pio-block>-pinctrl" - st,syscfg : Should be a phandle of the syscfg node. - st,retime-pin-mask : Should be mask to specify which pins can be retimed. If the property is not present, it is assumed that all the pins in the @@ -50,7 +49,11 @@ Optional properties: GPIO controller/bank node. Required properties: - gpio-controller : Indicates this device is a GPIO controller -- #gpio-cells : Should be one. The first cell is the pin number. +- #gpio-cells : Must be two. + - First cell: specifies the pin number inside the controller + - Second cell: specifies whether the pin is logically inverted. + - 0 = active high + - 1 = active low - st,bank-name : Should be a name string for this bank as specified in datasheet. @@ -76,23 +79,23 @@ include/dt-bindings/interrupt-controller/irq.h Example: pin-controller-sbc { - #address-cells = <1>; - #size-cells = <1>; - compatible = "st,stih415-sbc-pinctrl"; - st,syscfg = <&syscfg_sbc>; - reg = <0xfe61f080 0x4>; - reg-names = "irqmux"; - interrupts = <GIC_SPI 180 IRQ_TYPE_LEVEL_HIGH>; - interrupt-names = "irqmux"; - ranges = <0 0xfe610000 0x5000>; - - PIO0: gpio@fe610000 { + #address-cells = <1>; + #size-cells = <1>; + compatible = "st,stih407-sbc-pinctrl"; + st,syscfg = <&syscfg_sbc>; + reg = <0x0961f080 0x4>; + reg-names = "irqmux"; + interrupts = <GIC_SPI 188 IRQ_TYPE_NONE>; + interrupt-names = "irqmux"; + ranges = <0 0x09610000 0x6000>; + + pio0: gpio@09610000 { gpio-controller; - #gpio-cells = <1>; + #gpio-cells = <2>; interrupt-controller; #interrupt-cells = <2>; - reg = <0 0x100>; - st,bank-name = "PIO0"; + reg = <0x0 0x100>; + st,bank-name = "PIO0"; }; ... pin-functions nodes follow... @@ -162,7 +165,7 @@ pin-controller { sdhci0:sdhci@fe810000{ ... - interrupt-parent = <&PIO3>; + interrupt-parent = <&pio3>; #interrupt-cells = <2>; interrupts = <3 IRQ_TYPE_LEVEL_HIGH>; /* Interrupt line via PIO3-3 */ interrupt-names = "card-detect"; diff --git a/Documentation/devicetree/bindings/pinctrl/qcom,pmic-gpio.txt b/Documentation/devicetree/bindings/pinctrl/qcom,pmic-gpio.txt index a54c39ebbf8b..8d893a874634 100644 --- a/Documentation/devicetree/bindings/pinctrl/qcom,pmic-gpio.txt +++ b/Documentation/devicetree/bindings/pinctrl/qcom,pmic-gpio.txt @@ -17,6 +17,9 @@ PMIC's from Qualcomm. "qcom,pm8994-gpio" "qcom,pma8084-gpio" + And must contain either "qcom,spmi-gpio" or "qcom,ssbi-gpio" + if the device is on an spmi bus or an ssbi bus respectively + - reg: Usage: required Value type: <prop-encoded-array> @@ -183,7 +186,7 @@ to specify in a pin configuration subnode: Example: pm8921_gpio: gpio@150 { - compatible = "qcom,pm8921-gpio"; + compatible = "qcom,pm8921-gpio", "qcom,ssbi-gpio"; reg = <0x150 0x160>; interrupts = <192 1>, <193 1>, <194 1>, <195 1>, <196 1>, <197 1>, diff --git a/Documentation/devicetree/bindings/pinctrl/qcom,pmic-mpp.txt b/Documentation/devicetree/bindings/pinctrl/qcom,pmic-mpp.txt index b484ba1af78c..2ab95bc26066 100644 --- a/Documentation/devicetree/bindings/pinctrl/qcom,pmic-mpp.txt +++ b/Documentation/devicetree/bindings/pinctrl/qcom,pmic-mpp.txt @@ -19,6 +19,9 @@ of PMIC's from Qualcomm. "qcom,pm8994-mpp", "qcom,pma8084-mpp", + And must contain either "qcom,spmi-mpp" or "qcom,ssbi-mpp" + if the device is on an spmi bus or an ssbi bus respectively. + - reg: Usage: required Value type: <prop-encoded-array> @@ -158,7 +161,7 @@ to specify in a pin configuration subnode: Example: mpps@a000 { - compatible = "qcom,pm8841-mpp"; + compatible = "qcom,pm8841-mpp", "qcom,spmi-mpp"; reg = <0xa000>; gpio-controller; #gpio-cells = <2>; diff --git a/Documentation/devicetree/bindings/pinctrl/renesas,pfc-pinctrl.txt b/Documentation/devicetree/bindings/pinctrl/renesas,pfc-pinctrl.txt index e4cf022c992e..13df9498311a 100644 --- a/Documentation/devicetree/bindings/pinctrl/renesas,pfc-pinctrl.txt +++ b/Documentation/devicetree/bindings/pinctrl/renesas,pfc-pinctrl.txt @@ -17,9 +17,11 @@ Required Properties: - "renesas,pfc-r8a7779": for R8A7779 (R-Car H1) compatible pin-controller. - "renesas,pfc-r8a7790": for R8A7790 (R-Car H2) compatible pin-controller. - "renesas,pfc-r8a7791": for R8A7791 (R-Car M2-W) compatible pin-controller. + - "renesas,pfc-r8a7792": for R8A7792 (R-Car V2H) compatible pin-controller. - "renesas,pfc-r8a7793": for R8A7793 (R-Car M2-N) compatible pin-controller. - "renesas,pfc-r8a7794": for R8A7794 (R-Car E2) compatible pin-controller. - "renesas,pfc-r8a7795": for R8A7795 (R-Car H3) compatible pin-controller. + - "renesas,pfc-r8a7796": for R8A7796 (R-Car M3-W) compatible pin-controller. - "renesas,pfc-sh73a0": for SH73A0 (SH-Mobile AG5) compatible pin-controller. - reg: Base address and length of each memory resource used by the pin diff --git a/Documentation/devicetree/bindings/pinctrl/st,stm32-pinctrl.txt b/Documentation/devicetree/bindings/pinctrl/st,stm32-pinctrl.txt index 587bffb9cbc6..f9753c416974 100644 --- a/Documentation/devicetree/bindings/pinctrl/st,stm32-pinctrl.txt +++ b/Documentation/devicetree/bindings/pinctrl/st,stm32-pinctrl.txt @@ -14,6 +14,11 @@ Required properies: - #size-cells : The value of this property must be 1 - ranges : defines mapping between pin controller node (parent) to gpio-bank node (children). + - interrupt-parent: phandle of the interrupt parent to which the external + GPIO interrupts are forwarded to. + - st,syscfg: Should be phandle/offset pair. The phandle to the syscon node + which includes IRQ mux selection register, and the offset of the IRQ mux + selection register. - pins-are-numbered: Specify the subnodes are using numbered pinmux to specify pins. diff --git a/Documentation/devicetree/bindings/regulator/ltc3676.txt b/Documentation/devicetree/bindings/regulator/ltc3676.txt new file mode 100644 index 000000000000..d4eb366ce18c --- /dev/null +++ b/Documentation/devicetree/bindings/regulator/ltc3676.txt @@ -0,0 +1,94 @@ +Linear Technology LTC3676 8-output regulators + +Required properties: +- compatible: "lltc,ltc3676" +- reg: I2C slave address + +Required child node: +- regulators: Contains eight regulator child nodes sw1, sw2, sw3, sw4, + ldo1, ldo2, ldo3, and ldo4, specifying the initialization data as + documented in Documentation/devicetree/bindings/regulator/regulator.txt. + +Each regulator is defined using the standard binding for regulators. The +nodes for sw1, sw2, sw3, sw4, ldo1, ldo2 and ldo4 additionally need to specify +the resistor values of their external feedback voltage dividers: + +Required properties (not on ldo3): +- lltc,fb-voltage-divider: An array of two integers containing the resistor + values R1 and R2 of the feedback voltage divider in ohms. + +Regulators sw1, sw2, sw3, sw4 can regulate the feedback reference from: +412.5mV to 800mV in 12.5 mV steps. The output voltage thus ranges between +0.4125 * (1 + R1/R2) V and 0.8 * (1 + R1/R2) V. + +Regulators ldo1, ldo2, and ldo4 have a fixed 0.725 V reference and thus output +0.725 * (1 + R1/R2) V. The ldo3 regulator is fixed to 1.8 V. The ldo1 standby +regulator can not be disabled and thus should have the regulator-always-on +property set. + +Example: + + ltc3676: pmic@3c { + compatible = "lltc,ltc3676"; + reg = <0x3c>; + + regulators { + sw1_reg: sw1 { + regulator-min-microvolt = <674400>; + regulator-max-microvolt = <1308000>; + lltc,fb-voltage-divider = <127000 200000>; + regulator-ramp-delay = <7000>; + regulator-boot-on; + regulator-always-on; + }; + + sw2_reg: sw2 { + regulator-min-microvolt = <1033310>; + regulator-max-microvolt = <200400>; + lltc,fb-voltage-divider = <301000 200000>; + regulator-ramp-delay = <7000>; + regulator-boot-on; + regulator-always-on; + }; + + sw3_reg: sw3 { + regulator-min-microvolt = <674400>; + regulator-max-microvolt = <130800>; + lltc,fb-voltage-divider = <127000 200000>; + regulator-ramp-delay = <7000>; + regulator-boot-on; + regulator-always-on; + }; + + sw4_reg: sw4 { + regulator-min-microvolt = <868310>; + regulator-max-microvolt = <168400>; + lltc,fb-voltage-divider = <221000 200000>; + regulator-ramp-delay = <7000>; + regulator-boot-on; + regulator-always-on; + }; + + ldo2_reg: ldo2 { + regulator-min-microvolt = <2490375>; + regulator-max-microvolt = <2490375>; + lltc,fb-voltage-divider = <487000 200000>; + regulator-boot-on; + regulator-always-on; + }; + + ldo3_reg: ldo3 { + regulator-min-microvolt = <1800000>; + regulator-max-microvolt = <1800000>; + regulator-boot-on; + }; + + ldo4_reg: ldo4 { + regulator-min-microvolt = <3023250>; + regulator-max-microvolt = <3023250>; + lltc,fb-voltage-divider = <634000 200000>; + regulator-boot-on; + regulator-always-on; + }; + }; + }; diff --git a/Documentation/devicetree/bindings/regulator/pv88080.txt b/Documentation/devicetree/bindings/regulator/pv88080.txt index 38a614210dcb..e6e4b9c82d89 100644 --- a/Documentation/devicetree/bindings/regulator/pv88080.txt +++ b/Documentation/devicetree/bindings/regulator/pv88080.txt @@ -1,22 +1,28 @@ * Powerventure Semiconductor PV88080 Voltage Regulator Required properties: -- compatible: "pvs,pv88080". -- reg: I2C slave address, usually 0x49. +- compatible: Must be one of the following, depending on the + silicon version: + - "pvs,pv88080" (DEPRECATED) + + - "pvs,pv88080-aa" for PV88080 AA or AB silicon + - "pvs,pv88080-ba" for PV88080 BA or BB silicon + NOTE: The use of the compatibles with no silicon version is deprecated. +- reg: I2C slave address, usually 0x49 - interrupts: the interrupt outputs of the controller - regulators: A node that houses a sub-node for each regulator within the device. Each sub-node is identified using the node's name, with valid values listed below. The content of each sub-node is defined by the standard binding for regulators; see regulator.txt. - BUCK1, BUCK2, and BUCK3. + BUCK1, BUCK2, BUCK3 and HVBUCK. Optional properties: - Any optional property defined in regulator.txt -Example +Example: pmic: pv88080@49 { - compatible = "pvs,pv88080"; + compatible = "pvs,pv88080-ba"; reg = <0x49>; interrupt-parent = <&gpio>; interrupts = <24 24>; @@ -45,5 +51,12 @@ Example regulator-min-microamp = <1496000>; regulator-max-microamp = <4189000>; }; + + HVBUCK { + regulator-name = "hvbuck"; + regulator-min-microvolt = < 5000>; + regulator-max-microvolt = <1275000>; + }; }; }; + diff --git a/Documentation/devicetree/bindings/regulator/regulator.txt b/Documentation/devicetree/bindings/regulator/regulator.txt index ecfc593cac15..6ab5aef619d9 100644 --- a/Documentation/devicetree/bindings/regulator/regulator.txt +++ b/Documentation/devicetree/bindings/regulator/regulator.txt @@ -13,7 +13,7 @@ Optional properties: - regulator-allow-bypass: allow the regulator to go into bypass mode - regulator-allow-set-load: allow the regulator performance level to be configured - <name>-supply: phandle to the parent supply/regulator node -- regulator-ramp-delay: ramp delay for regulator(in uV/uS) +- regulator-ramp-delay: ramp delay for regulator(in uV/us) For hardware which supports disabling ramp rate, it should be explicitly initialised to zero (regulator-ramp-delay = <0>) for disabling ramp delay. - regulator-enable-ramp-delay: The time taken, in microseconds, for the supply diff --git a/Documentation/devicetree/bindings/remoteproc/qcom,wcnss-pil.txt b/Documentation/devicetree/bindings/remoteproc/qcom,wcnss-pil.txt new file mode 100644 index 000000000000..0d2361ebe3d7 --- /dev/null +++ b/Documentation/devicetree/bindings/remoteproc/qcom,wcnss-pil.txt @@ -0,0 +1,132 @@ +Qualcomm WCNSS Peripheral Image Loader + +This document defines the binding for a component that loads and boots firmware +on the Qualcomm WCNSS core. + +- compatible: + Usage: required + Value type: <string> + Definition: must be one of: + "qcom,riva-pil", + "qcom,pronto-v1-pil", + "qcom,pronto-v2-pil" + +- reg: + Usage: required + Value type: <prop-encoded-array> + Definition: must specify the base address and size of the CCU, DXE and + PMU register blocks + +- reg-names: + Usage: required + Value type: <stringlist> + Definition: must be "ccu", "dxe", "pmu" + +- interrupts-extended: + Usage: required + Value type: <prop-encoded-array> + Definition: must list the watchdog and fatal IRQs and may specify the + ready, handover and stop-ack IRQs + +- interrupt-names: + Usage: required + Value type: <stringlist> + Definition: should be "wdog", "fatal", optionally followed by "ready", + "handover", "stop-ack" + +- vddmx-supply: +- vddcx-supply: +- vddpx-supply: + Usage: required + Value type: <phandle> + Definition: reference to the regulators to be held on behalf of the + booting of the WCNSS core + +- qcom,smem-states: + Usage: optional + Value type: <prop-encoded-array> + Definition: reference to the SMEM state used to indicate to WCNSS that + it should shut down + +- qcom,smem-state-names: + Usage: optional + Value type: <stringlist> + Definition: should be "stop" + +- memory-region: + Usage: required + Value type: <prop-encoded-array> + Definition: reference to reserved-memory node for the remote processor + see ../reserved-memory/reserved-memory.txt + += SUBNODES +A single subnode of the WCNSS PIL describes the attached rf module and its +resource dependencies. + +- compatible: + Usage: required + Value type: <string> + Definition: must be one of: + "qcom,wcn3620", + "qcom,wcn3660", + "qcom,wcn3680" + +- clocks: + Usage: required + Value type: <prop-encoded-array> + Definition: should specify the xo clock and optionally the rf clock + +- clock-names: + Usage: required + Value type: <stringlist> + Definition: should be "xo", optionally followed by "rf" + +- vddxo-supply: +- vddrfa-supply: +- vddpa-supply: +- vdddig-supply: + Usage: required + Value type: <phandle> + Definition: reference to the regulators to be held on behalf of the + booting of the WCNSS core + += EXAMPLE +The following example describes the resources needed to boot control the WCNSS, +with attached WCN3680, as it is commonly found on MSM8974 boards. + +pronto@fb204000 { + compatible = "qcom,pronto-v2-pil"; + reg = <0xfb204000 0x2000>, <0xfb202000 0x1000>, <0xfb21b000 0x3000>; + reg-names = "ccu", "dxe", "pmu"; + + interrupts-extended = <&intc 0 149 1>, + <&wcnss_smp2p_slave 0 0>, + <&wcnss_smp2p_slave 1 0>, + <&wcnss_smp2p_slave 2 0>, + <&wcnss_smp2p_slave 3 0>; + interrupt-names = "wdog", "fatal", "ready", "handover", "stop-ack"; + + vddmx-supply = <&pm8841_s1>; + vddcx-supply = <&pm8841_s2>; + vddpx-supply = <&pm8941_s3>; + + qcom,smem-states = <&wcnss_smp2p_out 0>; + qcom,smem-state-names = "stop"; + + memory-region = <&wcnss_region>; + + pinctrl-names = "default"; + pinctrl-0 = <&wcnss_pin_a>; + + iris { + compatible = "qcom,wcn3680"; + + clocks = <&rpmcc RPM_CXO_CLK_SRC>, <&rpmcc RPM_CXO_A2>; + clock-names = "xo", "rf"; + + vddxo-supply = <&pm8941_l6>; + vddrfa-supply = <&pm8941_l11>; + vddpa-supply = <&pm8941_l19>; + vdddig-supply = <&pm8941_s3>; + }; +}; diff --git a/Documentation/devicetree/bindings/serial/8250.txt b/Documentation/devicetree/bindings/serial/8250.txt index 936ab5b87324..f86bb06c39e9 100644 --- a/Documentation/devicetree/bindings/serial/8250.txt +++ b/Documentation/devicetree/bindings/serial/8250.txt @@ -42,6 +42,8 @@ Optional properties: - auto-flow-control: one way to enable automatic flow control support. The driver is allowed to detect support for the capability even without this property. +- tx-threshold: Specify the TX FIFO low water indication for parts with + programmable TX FIFO thresholds. Note: * fsl,ns16550: diff --git a/Documentation/devicetree/bindings/serial/st,stm32-usart.txt b/Documentation/devicetree/bindings/serial/st,stm32-usart.txt new file mode 100644 index 000000000000..85ec5f2b1996 --- /dev/null +++ b/Documentation/devicetree/bindings/serial/st,stm32-usart.txt @@ -0,0 +1,46 @@ +* STMicroelectronics STM32 USART + +Required properties: +- compatible: Can be either "st,stm32-usart", "st,stm32-uart", +"st,stm32f7-usart" or "st,stm32f7-uart" depending on whether +the device supports synchronous mode and is compatible with +stm32(f4) or stm32f7. +- reg: The address and length of the peripheral registers space +- interrupts: The interrupt line of the USART instance +- clocks: The input clock of the USART instance + +Optional properties: +- pinctrl: The reference on the pins configuration +- st,hw-flow-ctrl: bool flag to enable hardware flow control. +- dmas: phandle(s) to DMA controller node(s). Refer to stm32-dma.txt +- dma-names: "rx" and/or "tx" + +Examples: +usart4: serial@40004c00 { + compatible = "st,stm32-uart"; + reg = <0x40004c00 0x400>; + interrupts = <52>; + clocks = <&clk_pclk1>; + pinctrl-names = "default"; + pinctrl-0 = <&pinctrl_usart4>; +}; + +usart2: serial@40004400 { + compatible = "st,stm32-usart", "st,stm32-uart"; + reg = <0x40004400 0x400>; + interrupts = <38>; + clocks = <&clk_pclk1>; + st,hw-flow-ctrl; + pinctrl-names = "default"; + pinctrl-0 = <&pinctrl_usart2 &pinctrl_usart2_rtscts>; +}; + +usart1: serial@40011000 { + compatible = "st,stm32-usart", "st,stm32-uart"; + reg = <0x40011000 0x400>; + interrupts = <37>; + clocks = <&rcc 0 164>; + dmas = <&dma2 2 4 0x414 0x0>, + <&dma2 7 4 0x414 0x0>; + dma-names = "rx", "tx"; +}; diff --git a/Documentation/devicetree/bindings/soc/mediatek/auxadc.txt b/Documentation/devicetree/bindings/soc/mediatek/auxadc.txt deleted file mode 100644 index bdb782918a72..000000000000 --- a/Documentation/devicetree/bindings/soc/mediatek/auxadc.txt +++ /dev/null @@ -1,21 +0,0 @@ -MediaTek AUXADC -=============== - -The Auxiliary Analog/Digital Converter (AUXADC) is an ADC found -in some Mediatek SoCs which among other things measures the temperatures -in the SoC. It can be used directly with register accesses, but it is also -used by thermal controller which reads the temperatures from the AUXADC -directly via its own bus interface. See -Documentation/devicetree/bindings/thermal/mediatek-thermal.txt -for the Thermal Controller which holds a phandle to the AUXADC. - -Required properties: -- compatible: Must be "mediatek,mt8173-auxadc" -- reg: Address range of the AUXADC unit - -Example: - -auxadc: auxadc@11001000 { - compatible = "mediatek,mt8173-auxadc"; - reg = <0 0x11001000 0 0x1000>; -}; diff --git a/Documentation/devicetree/bindings/sound/nau8810.txt b/Documentation/devicetree/bindings/sound/nau8810.txt new file mode 100644 index 000000000000..05830e477acd --- /dev/null +++ b/Documentation/devicetree/bindings/sound/nau8810.txt @@ -0,0 +1,16 @@ +NAU8810 audio CODEC + +This device supports I2C only. + +Required properties: + + - compatible : "nuvoton,nau8810" + + - reg : the I2C address of the device. + +Example: + +codec: nau8810@1a { + compatible = "nuvoton,nau8810"; + reg = <0x1a>; +}; diff --git a/Documentation/devicetree/bindings/sound/nvidia,tegra-audio-sgtl5000.txt b/Documentation/devicetree/bindings/sound/nvidia,tegra-audio-sgtl5000.txt new file mode 100644 index 000000000000..5da7da4ea07a --- /dev/null +++ b/Documentation/devicetree/bindings/sound/nvidia,tegra-audio-sgtl5000.txt @@ -0,0 +1,42 @@ +NVIDIA Tegra audio complex, with SGTL5000 CODEC + +Required properties: +- compatible : "nvidia,tegra-audio-sgtl5000" +- clocks : Must contain an entry for each entry in clock-names. + See ../clocks/clock-bindings.txt for details. +- clock-names : Must include the following entries: + - pll_a + - pll_a_out0 + - mclk (The Tegra cdev1/extern1 clock, which feeds the CODEC's mclk) +- nvidia,model : The user-visible name of this sound complex. +- nvidia,audio-routing : A list of the connections between audio components. + Each entry is a pair of strings, the first being the connection's sink, + the second being the connection's source. Valid names for sources and + sinks are the SGTL5000's pins (as documented in its binding), and the jacks + on the board: + + * Headphone Jack + * Line In Jack + * Mic Jack + +- nvidia,i2s-controller : The phandle of the Tegra I2S controller that's + connected to the CODEC. +- nvidia,audio-codec : The phandle of the SGTL5000 audio codec. + +Example: + +sound { + compatible = "toradex,tegra-audio-sgtl5000-apalis_t30", + "nvidia,tegra-audio-sgtl5000"; + nvidia,model = "Toradex Apalis T30"; + nvidia,audio-routing = + "Headphone Jack", "HP_OUT", + "LINE_IN", "Line In Jack", + "MIC_IN", "Mic Jack"; + nvidia,i2s-controller = <&tegra_i2s2>; + nvidia,audio-codec = <&sgtl5000>; + clocks = <&tegra_car TEGRA30_CLK_PLL_A>, + <&tegra_car TEGRA30_CLK_PLL_A_OUT0>, + <&tegra_car TEGRA30_CLK_EXTERN1>; + clock-names = "pll_a", "pll_a_out0", "mclk"; +}; diff --git a/Documentation/devicetree/bindings/sound/qcom,apq8016-sbc.txt b/Documentation/devicetree/bindings/sound/qcom,apq8016-sbc.txt index 48129368d4d9..d9d8635ff94c 100644 --- a/Documentation/devicetree/bindings/sound/qcom,apq8016-sbc.txt +++ b/Documentation/devicetree/bindings/sound/qcom,apq8016-sbc.txt @@ -16,6 +16,24 @@ Required properties: * "spkr-iomux" - qcom,model : Name of the sound card. +- qcom,audio-routing : A list of the connections between audio components. + Each entry is a pair of strings, the first being the + connection's sink, the second being the connection's + source. Valid names could be power supplies, MicBias + of msm8x16_wcd codec and the jacks on the board: + + Power supplies: + * MIC BIAS External1 + * MIC BIAS External2 + * MIC BIAS Internal1 + * MIC BIAS Internal2 + + Board connectors: + * Headset Mic + * Secondary Mic", + * DMIC + * Ext Spk + Dai-link subnode properties and subnodes: Required dai-link subnodes: @@ -37,6 +55,18 @@ sound: sound { reg-names = "mic-iomux", "spkr-iomux"; qcom,model = "DB410c"; + qcom,audio-routing = + "MIC BIAS External1", "Handset Mic", + "MIC BIAS Internal2", "Headset Mic", + "MIC BIAS External1", "Secondary Mic", + "AMIC1", "MIC BIAS External1", + "AMIC2", "MIC BIAS Internal2", + "AMIC3", "MIC BIAS External1", + "DMIC1", "MIC BIAS Internal1", + "MIC BIAS Internal1", "Digital Mic1", + "DMIC2", "MIC BIAS Internal1", + "MIC BIAS Internal1", "Digital Mic2"; + /* I2S - Internal codec */ internal-dai-link@0 { cpu { /* PRIMARY */ diff --git a/Documentation/devicetree/bindings/sound/renesas,rsrc-card.txt b/Documentation/devicetree/bindings/sound/renesas,rsrc-card.txt deleted file mode 100644 index 255ece3043ad..000000000000 --- a/Documentation/devicetree/bindings/sound/renesas,rsrc-card.txt +++ /dev/null @@ -1,75 +0,0 @@ -Renesas Sampling Rate Convert Sound Card: - -Renesas Sampling Rate Convert Sound Card specifies audio DAI connections of SoC <-> codec. - -Required properties: - -- compatible : "renesas,rsrc-card{,<board>}" - Examples with boards are: - - "renesas,rsrc-card" - - "renesas,rsrc-card,lager" - - "renesas,rsrc-card,koelsch" -Optional properties: - -- card_name : User specified audio sound card name, one string - property. -- cpu : CPU sub-node -- codec : CODEC sub-node - -Optional subnode properties: - -- format : CPU/CODEC common audio format. - "i2s", "right_j", "left_j" , "dsp_a" - "dsp_b", "ac97", "pdm", "msb", "lsb" -- frame-master : Indicates dai-link frame master. - phandle to a cpu or codec subnode. -- bitclock-master : Indicates dai-link bit clock master. - phandle to a cpu or codec subnode. -- bitclock-inversion : bool property. Add this if the - dai-link uses bit clock inversion. -- frame-inversion : bool property. Add this if the - dai-link uses frame clock inversion. -- convert-rate : platform specified sampling rate convert -- convert-channels : platform specified converted channel size (2 - 8 ch) -- audio-prefix : see audio-routing -- audio-routing : A list of the connections between audio components. - Each entry is a pair of strings, the first being the connection's sink, - the second being the connection's source. Valid names for sources. - use audio-prefix if some components is using same sink/sources naming. - it can be used if compatible was "renesas,rsrc-card"; - -Required CPU/CODEC subnodes properties: - -- sound-dai : phandle and port of CPU/CODEC - -Optional CPU/CODEC subnodes properties: - -- clocks / system-clock-frequency : specify subnode's clock if needed. - it can be specified via "clocks" if system has - clock node (= common clock), or "system-clock-frequency" - (if system doens't support common clock) - If a clock is specified, it is - enabled with clk_prepare_enable() - in dai startup() and disabled with - clk_disable_unprepare() in dai - shutdown(). - -Example - -sound { - compatible = "renesas,rsrc-card,lager"; - - card-name = "rsnd-ak4643"; - format = "left_j"; - bitclock-master = <&sndcodec>; - frame-master = <&sndcodec>; - - sndcpu: cpu { - sound-dai = <&rcar_sound>; - }; - - sndcodec: codec { - sound-dai = <&ak4643>; - system-clock-frequency = <11289600>; - }; -}; diff --git a/Documentation/devicetree/bindings/sound/rockchip,rk3399-gru-sound.txt b/Documentation/devicetree/bindings/sound/rockchip,rk3399-gru-sound.txt new file mode 100644 index 000000000000..eac91db07178 --- /dev/null +++ b/Documentation/devicetree/bindings/sound/rockchip,rk3399-gru-sound.txt @@ -0,0 +1,22 @@ +ROCKCHIP with MAX98357A/RT5514/DA7219 codecs on GRU boards + +Required properties: +- compatible: "rockchip,rk3399-gru-sound" +- rockchip,cpu: The phandle of the Rockchip I2S controller that's + connected to the codecs +- rockchip,codec: The phandle of the MAX98357A/RT5514/DA7219 codecs + +Optional properties: +- dmic-wakeup-delay-ms : specify delay time (ms) for DMIC ready. + If this option is specified, which means it's required dmic need + delay for DMIC to ready so that rt5514 can avoid recording before + DMIC send valid data + +Example: + +sound { + compatible = "rockchip,rk3399-gru-sound"; + rockchip,cpu = <&i2s0>; + rockchip,codec = <&max98357a &rt5514 &da7219>; + dmic-wakeup-delay-ms = <20>; +}; diff --git a/Documentation/devicetree/bindings/sound/rt5659.txt b/Documentation/devicetree/bindings/sound/rt5659.txt index 5f79e7fde032..1766e0543fc5 100644 --- a/Documentation/devicetree/bindings/sound/rt5659.txt +++ b/Documentation/devicetree/bindings/sound/rt5659.txt @@ -12,6 +12,9 @@ Required properties: Optional properties: +- clocks: The phandle of the master clock to the CODEC +- clock-names: Should be "mclk" + - realtek,in1-differential - realtek,in3-differential - realtek,in4-differential diff --git a/Documentation/devicetree/bindings/sound/rt5660.txt b/Documentation/devicetree/bindings/sound/rt5660.txt new file mode 100644 index 000000000000..30be5f921930 --- /dev/null +++ b/Documentation/devicetree/bindings/sound/rt5660.txt @@ -0,0 +1,47 @@ +RT5660 audio CODEC + +This device supports I2C only. + +Required properties: + +- compatible : "realtek,rt5660". + +- reg : The I2C address of the device. + +Optional properties: + +- clocks: The phandle of the master clock to the CODEC +- clock-names: Should be "mclk" + +- realtek,in1-differential +- realtek,in3-differential + Boolean. Indicate MIC1/3 input are differential, rather than single-ended. + +- realtek,poweroff-in-suspend + Boolean. If the codec will be powered off in suspend, the resume should be + added delay time for waiting codec power ready. + +- realtek,dmic1-data-pin + 0: dmic1 is not used + 1: using GPIO2 pin as dmic1 data pin + 2: using IN1P pin as dmic1 data pin + +Pins on the device (for linking into audio routes) for RT5660: + + * DMIC L1 + * DMIC R1 + * IN1P + * IN1N + * IN2P + * IN3P + * IN3N + * SPO + * LOUTL + * LOUTR + +Example: + +rt5660 { + compatible = "realtek,rt5660"; + reg = <0x1c>; +}; diff --git a/Documentation/devicetree/bindings/sound/rt5663.txt b/Documentation/devicetree/bindings/sound/rt5663.txt new file mode 100644 index 000000000000..7d3c974c6e2e --- /dev/null +++ b/Documentation/devicetree/bindings/sound/rt5663.txt @@ -0,0 +1,30 @@ +RT5663/RT5668 audio CODEC + +This device supports I2C only. + +Required properties: + +- compatible : One of "realtek,rt5663" or "realtek,rt5668". + +- reg : The I2C address of the device. + +- interrupts : The CODEC's interrupt output. + +Optional properties: + +Pins on the device (for linking into audio routes) for RT5663/RT5668: + + * IN1P + * IN1N + * IN2P + * IN2N + * HPOL + * HPOR + +Example: + +codec: rt5663@12 { + compatible = "realtek,rt5663"; + reg = <0x12>; + interrupts = <7 IRQ_TYPE_EDGE_FALLING>; +}; diff --git a/Documentation/devicetree/bindings/sound/simple-card.txt b/Documentation/devicetree/bindings/sound/simple-card.txt index 59d862801e59..c7a93931fad2 100644 --- a/Documentation/devicetree/bindings/sound/simple-card.txt +++ b/Documentation/devicetree/bindings/sound/simple-card.txt @@ -22,6 +22,8 @@ Optional properties: headphones are attached. - simple-audio-card,mic-det-gpio : Reference to GPIO that signals when a microphone is attached. +- simple-audio-card,aux-devs : List of phandles pointing to auxiliary devices, such + as amplifiers, to be added to the sound card. Optional subnodes: @@ -162,3 +164,38 @@ sound { }; }; }; + +Example 3 - route audio from IMX6 SSI2 through TLV320DAC3100 codec +through TPA6130A2 amplifier to headphones: + +&i2c0 { + codec: tlv320dac3100@18 { + compatible = "ti,tlv320dac3100"; + ... + } + + amp: tpa6130a2@60 { + compatible = "ti,tpa6130a2"; + ... + } +} + +sound { + compatible = "simple-audio-card"; + ... + simple-audio-card,widgets = + "Headphone", "Headphone Jack"; + simple-audio-card,routing = + "Headphone Jack", "HPLEFT", + "Headphone Jack", "HPRIGHT", + "LEFTIN", "HPL", + "RIGHTIN", "HPR"; + simple-audio-card,aux-devs = <&>; + simple-audio-card,cpu { + sound-dai = <&ssi2>; + }; + simple-audio-card,codec { + sound-dai = <&codec>; + clocks = ... + }; +}; diff --git a/Documentation/devicetree/bindings/sound/simple-scu-card.txt b/Documentation/devicetree/bindings/sound/simple-scu-card.txt new file mode 100644 index 000000000000..d6fe47ed09af --- /dev/null +++ b/Documentation/devicetree/bindings/sound/simple-scu-card.txt @@ -0,0 +1,110 @@ +ASoC simple SCU Sound Card + +Simple-Card specifies audio DAI connections of SoC <-> codec. + +Required properties: + +- compatible : "simple-scu-audio-card" + "renesas,rsrc-card" + +Optional properties: + +- simple-audio-card,name : User specified audio sound card name, one string + property. +- simple-audio-card,cpu : CPU sub-node +- simple-audio-card,codec : CODEC sub-node + +Optional subnode properties: + +- simple-audio-card,format : CPU/CODEC common audio format. + "i2s", "right_j", "left_j" , "dsp_a" + "dsp_b", "ac97", "pdm", "msb", "lsb" +- simple-audio-card,frame-master : Indicates dai-link frame master. + phandle to a cpu or codec subnode. +- simple-audio-card,bitclock-master : Indicates dai-link bit clock master. + phandle to a cpu or codec subnode. +- simple-audio-card,bitclock-inversion : bool property. Add this if the + dai-link uses bit clock inversion. +- simple-audio-card,frame-inversion : bool property. Add this if the + dai-link uses frame clock inversion. +- simple-audio-card,convert-rate : platform specified sampling rate convert +- simple-audio-card,convert-channels : platform specified converted channel size (2 - 8 ch) +- simple-audio-card,prefix : see audio-routing +- simple-audio-card,routing : A list of the connections between audio components. + Each entry is a pair of strings, the first being the connection's sink, + the second being the connection's source. Valid names for sources. + use audio-prefix if some components is using same sink/sources naming. + it can be used if compatible was "renesas,rsrc-card"; + +Required CPU/CODEC subnodes properties: + +- sound-dai : phandle and port of CPU/CODEC + +Optional CPU/CODEC subnodes properties: + +- clocks / system-clock-frequency : specify subnode's clock if needed. + it can be specified via "clocks" if system has + clock node (= common clock), or "system-clock-frequency" + (if system doens't support common clock) + If a clock is specified, it is + enabled with clk_prepare_enable() + in dai startup() and disabled with + clk_disable_unprepare() in dai + shutdown(). + +Example 1. Sampling Rate Covert + +sound { + compatible = "simple-scu-audio-card"; + + simple-audio-card,name = "rsnd-ak4643"; + simple-audio-card,format = "left_j"; + simple-audio-card,format = "left_j"; + simple-audio-card,bitclock-master = <&sndcodec>; + simple-audio-card,frame-master = <&sndcodec>; + + simple-audio-card,convert-rate = <48000>; /* see audio_clk_a */ + + simple-audio-card,prefix = "ak4642"; + simple-audio-card,routing = "ak4642 Playback", "DAI0 Playback", + "DAI0 Capture", "ak4642 Capture"; + + sndcpu: simple-audio-card,cpu { + sound-dai = <&rcar_sound>; + }; + + sndcodec: simple-audio-card,codec { + sound-dai = <&ak4643>; + system-clock-frequency = <11289600>; + }; +}; + +Example 2. 2 CPU 1 Codec + +sound { + compatible = "renesas,rsrc-card"; + + card-name = "rsnd-ak4643"; + format = "left_j"; + bitclock-master = <&dpcmcpu>; + frame-master = <&dpcmcpu>; + + convert-rate = <48000>; /* see audio_clk_a */ + + audio-prefix = "ak4642"; + audio-routing = "ak4642 Playback", "DAI0 Playback", + "ak4642 Playback", "DAI1 Playback"; + + dpcmcpu: cpu@0 { + sound-dai = <&rcar_sound 0>; + }; + + cpu@1 { + sound-dai = <&rcar_sound 1>; + }; + + codec { + sound-dai = <&ak4643>; + clocks = <&audio_clock>; + }; +}; diff --git a/Documentation/devicetree/bindings/sound/st,sti-asoc-card.txt b/Documentation/devicetree/bindings/sound/st,sti-asoc-card.txt index 16bcdfb6760e..745dc62f76ea 100644 --- a/Documentation/devicetree/bindings/sound/st,sti-asoc-card.txt +++ b/Documentation/devicetree/bindings/sound/st,sti-asoc-card.txt @@ -11,7 +11,9 @@ Documentation/devicetree/bindings/sound/simple-card.txt. --------------------------------------- Required properties: - - compatible: "st,sti-uni-player" or "st,sti-uni-reader" + - compatible: "st,stih407-uni-player-hdmi", "st,stih407-uni-player-pcm-out", + "st,stih407-uni-player-dac", "st,stih407-uni-player-spdif", + "st,stih407-uni-reader-pcm_in", "st,stih407-uni-reader-hdmi", - st,syscfg: phandle to boot-device system configuration registers @@ -33,32 +35,24 @@ Required properties: "tx" for "st,sti-uni-player" compatibility "rx" for "st,sti-uni-reader" compatibility - - st,version: IP version integrated in SOC. - - - dai-name: DAI name that describes the IP. - - - st,mode: IP working mode depending on associated codec. - "HDMI" connected to HDMI codec and support IEC HDMI formats (player only). - "SPDIF" connected to SPDIF codec and support SPDIF formats (player only). - "PCM" PCM standard mode for I2S or TDM bus. - "TDM" TDM mode for TDM bus. - Required properties ("st,sti-uni-player" compatibility only): - clocks: CPU_DAI IP clock source, listed in the same order than the CPU_DAI properties. - - st,uniperiph-id: internal SOC IP instance ID. - Optional properties: - pinctrl-0: defined for CPU_DAI@1 and CPU_DAI@4 to describe I2S PIOs for external codecs connection. - pinctrl-names: should contain only one value - "default". + - st,tdm-mode: to declare to set TDM mode for unireader and uniplayer IPs. + Only compartible with IPs in charge of the external I2S/TDM bus. + Should be declared depending on associated codec. + Example: - sti_uni_player1: sti-uni-player@1 { - compatible = "st,sti-uni-player"; + sti_uni_player1: sti-uni-player@0x8D81000 { + compatible = "st,stih407-uni-player-hdmi"; status = "okay"; #sound-dai-cells = <0>; st,syscfg = <&syscfg_core>; @@ -66,15 +60,12 @@ Example: reg = <0x8D81000 0x158>; interrupts = <GIC_SPI 85 IRQ_TYPE_NONE>; dmas = <&fdma0 3 0 1>; - st,dai-name = "Uni Player #1 (I2S)"; dma-names = "tx"; - st,uniperiph-id = <1>; - st,version = <5>; - st,mode = "TDM"; + st,tdm-mode = <1>; }; - sti_uni_player2: sti-uni-player@2 { - compatible = "st,sti-uni-player"; + sti_uni_player2: sti-uni-player@0x8D82000 { + compatible = "st,stih407-uni-player-pcm-out"; status = "okay"; #sound-dai-cells = <0>; st,syscfg = <&syscfg_core>; @@ -82,15 +73,11 @@ Example: reg = <0x8D82000 0x158>; interrupts = <GIC_SPI 86 IRQ_TYPE_NONE>; dmas = <&fdma0 4 0 1>; - dai-name = "Uni Player #2 (DAC)"; dma-names = "tx"; - st,uniperiph-id = <2>; - st,version = <5>; - st,mode = "PCM"; }; - sti_uni_player3: sti-uni-player@3 { - compatible = "st,sti-uni-player"; + sti_uni_player3: sti-uni-player@0x8D85000 { + compatible = "st,stih407-uni-player-spdif"; status = "okay"; #sound-dai-cells = <0>; st,syscfg = <&syscfg_core>; @@ -99,14 +86,10 @@ Example: interrupts = <GIC_SPI 89 IRQ_TYPE_NONE>; dmas = <&fdma0 7 0 1>; dma-names = "tx"; - dai-name = "Uni Player #3 (SPDIF)"; - st,uniperiph-id = <3>; - st,version = <5>; - st,mode = "SPDIF"; }; - sti_uni_reader1: sti-uni-reader@1 { - compatible = "st,sti-uni-reader"; + sti_uni_reader1: sti-uni-reader@0x8D84000 { + compatible = "st,stih407-uni-reader-hdmi"; status = "disabled"; #sound-dai-cells = <0>; st,syscfg = <&syscfg_core>; @@ -114,9 +97,6 @@ Example: interrupts = <GIC_SPI 88 IRQ_TYPE_NONE>; dmas = <&fdma0 6 0 1>; dma-names = "rx"; - dai-name = "Uni Reader #1 (HDMI RX)"; - st,version = <3>; - st,mode = "PCM"; }; 2) sti-sas-codec: internal audio codec IPs driver diff --git a/Documentation/devicetree/bindings/sound/sunxi,sun4i-spdif.txt b/Documentation/devicetree/bindings/sound/sunxi,sun4i-spdif.txt index 13503aa505a9..0230c4d20506 100644 --- a/Documentation/devicetree/bindings/sound/sunxi,sun4i-spdif.txt +++ b/Documentation/devicetree/bindings/sound/sunxi,sun4i-spdif.txt @@ -9,6 +9,7 @@ Required properties: - compatible : should be one of the following: - "allwinner,sun4i-a10-spdif": for the Allwinner A10 SoC + - "allwinner,sun6i-a31-spdif": for the Allwinner A31 SoC - reg : Offset and length of the register set for the device. @@ -25,6 +26,8 @@ Required properties: "apb" clock for the spdif bus. "spdif" clock for spdif controller. + - resets : reset specifier for the ahb reset (A31 and newer only) + Example: spdif: spdif@01c21000 { diff --git a/Documentation/devicetree/bindings/sound/tlv320aic31xx.txt b/Documentation/devicetree/bindings/sound/tlv320aic31xx.txt index eff12be5e789..9340d2ddcc54 100644 --- a/Documentation/devicetree/bindings/sound/tlv320aic31xx.txt +++ b/Documentation/devicetree/bindings/sound/tlv320aic31xx.txt @@ -11,6 +11,7 @@ Required properties: "ti,tlv320aic3110" - TLV320AIC3110 (stereo speaker amp, no MiniDSP) "ti,tlv320aic3120" - TLV320AIC3120 (mono speaker amp, MiniDSP) "ti,tlv320aic3111" - TLV320AIC3111 (stereo speaker amp, MiniDSP) + "ti,tlv320dac3100" - TLV320DAC3100 (no ADC, mono speaker amp, no MiniDSP) - reg - <int> - I2C slave address - HPVDD-supply, SPRVDD-supply, SPLVDD-supply, AVDD-supply, IOVDD-supply, @@ -37,9 +38,11 @@ CODEC output pins: * MICBIAS CODEC input pins: - * MIC1LP - * MIC1RP - * MIC1LM + * MIC1LP, devices with ADC + * MIC1RP, devices with ADC + * MIC1LM, devices with ADC + * AIN1, devices without ADC + * AIN2, devices without ADC The pins can be used in referring sound node's audio-routing property. diff --git a/Documentation/devicetree/bindings/spi/brcm,spi-bcm-qspi.txt b/Documentation/devicetree/bindings/spi/brcm,spi-bcm-qspi.txt new file mode 100644 index 000000000000..ad7ac80a3841 --- /dev/null +++ b/Documentation/devicetree/bindings/spi/brcm,spi-bcm-qspi.txt @@ -0,0 +1,233 @@ +Broadcom SPI controller + +The Broadcom SPI controller is a SPI master found on various SOCs, including +BRCMSTB (BCM7XXX), Cygnus, NSP and NS2. The Broadcom Master SPI hw IP consits +of : + MSPI : SPI master controller can read and write to a SPI slave device + BSPI : Broadcom SPI in combination with the MSPI hw IP provides acceleration + for flash reads and be configured to do single, double, quad lane + io with 3-byte and 4-byte addressing support. + + Supported Broadcom SoCs have one instance of MSPI+BSPI controller IP. + MSPI master can be used wihout BSPI. BRCMSTB SoCs have an additional instance + of a MSPI master without the BSPI to use with non flash slave devices that + use SPI protocol. + +Required properties: + +- #address-cells: + Must be <1>, as required by generic SPI binding. + +- #size-cells: + Must be <0>, also as required by generic SPI binding. + +- compatible: + Must be one of : + "brcm,spi-bcm-qspi", "brcm,spi-brcmstb-qspi" : MSPI+BSPI on BRCMSTB SoCs + "brcm,spi-bcm-qspi", "brcm,spi-brcmstb-mspi" : Second Instance of MSPI + BRCMSTB SoCs + "brcm,spi-bcm-qspi", "brcm,spi-nsp-qspi" : MSPI+BSPI on Cygnus, NSP + "brcm,spi-bcm-qspi", "brcm,spi-ns2-qspi" : NS2 SoCs + +- reg: + Define the bases and ranges of the associated I/O address spaces. + The required range is MSPI controller registers. + +- reg-names: + First name does not matter, but must be reserved for the MSPI controller + register range as mentioned in 'reg' above, and will typically contain + - "bspi_regs": BSPI register range, not required with compatible + "spi-brcmstb-mspi" + - "mspi_regs": MSPI register range is required for compatible strings + - "intr_regs", "intr_status_reg" : Interrupt and status register for + NSP, NS2, Cygnus SoC + +- interrupts + The interrupts used by the MSPI and/or BSPI controller. + +- interrupt-names: + Names of interrupts associated with MSPI + - "mspi_halted" : + - "mspi_done": Indicates that the requested SPI operation is complete. + - "spi_lr_fullness_reached" : Linear read BSPI pipe full + - "spi_lr_session_aborted" : Linear read BSPI pipe aborted + - "spi_lr_impatient" : Linear read BSPI requested when pipe empty + - "spi_lr_session_done" : Linear read BSPI session done + +- clocks: + A phandle to the reference clock for this block. + +Optional properties: + + +- native-endian + Defined when using BE SoC and device uses BE register read/write + +Recommended optional m25p80 properties: +- spi-rx-bus-width: Definition as per + Documentation/devicetree/bindings/spi/spi-bus.txt + +Examples: + +BRCMSTB SoC Example: + + SPI Master (MSPI+BSPI) for SPI-NOR access: + + spi@f03e3400 { + #address-cells = <0x1>; + #size-cells = <0x0>; + compatible = "brcm,spi-brcmstb-qspi", "brcm,spi-brcmstb-qspi"; + reg = <0xf03e0920 0x4 0xf03e3400 0x188 0xf03e3200 0x50>; + reg-names = "cs_reg", "mspi", "bspi"; + interrupts = <0x6 0x5 0x4 0x3 0x2 0x1 0x0>; + interrupt-parent = <0x1c>; + interrupt-names = "mspi_halted", + "mspi_done", + "spi_lr_overread", + "spi_lr_session_done", + "spi_lr_impatient", + "spi_lr_session_aborted", + "spi_lr_fullness_reached"; + + clocks = <&hif_spi>; + clock-names = "sw_spi"; + + m25p80@0 { + #size-cells = <0x2>; + #address-cells = <0x2>; + compatible = "m25p80"; + reg = <0x0>; + spi-max-frequency = <0x2625a00>; + spi-cpol; + spi-cpha; + m25p,fast-read; + + flash0.bolt@0 { + reg = <0x0 0x0 0x0 0x100000>; + }; + + flash0.macadr@100000 { + reg = <0x0 0x100000 0x0 0x10000>; + }; + + flash0.nvram@110000 { + reg = <0x0 0x110000 0x0 0x10000>; + }; + + flash0.kernel@120000 { + reg = <0x0 0x120000 0x0 0x400000>; + }; + + flash0.devtree@520000 { + reg = <0x0 0x520000 0x0 0x10000>; + }; + + flash0.splash@530000 { + reg = <0x0 0x530000 0x0 0x80000>; + }; + + flash0@0 { + reg = <0x0 0x0 0x0 0x4000000>; + }; + }; + }; + + + MSPI master for any SPI device : + + spi@f0416000 { + #address-cells = <1>; + #size-cells = <0>; + clocks = <&upg_fixed>; + compatible = "brcm,spi-brcmstb-qspi", "brcm,spi-brcmstb-mspi"; + reg = <0xf0416000 0x180>; + reg-names = "mspi"; + interrupts = <0x14>; + interrupt-parent = <&irq0_aon_intc>; + interrupt-names = "mspi_done"; + }; + +iProc SoC Example: + + qspi: spi@18027200 { + compatible = "brcm,spi-bcm-qspi", "brcm,spi-nsp-qspi"; + reg = <0x18027200 0x184>, + <0x18027000 0x124>, + <0x1811c408 0x004>, + <0x180273a0 0x01c>; + reg-names = "mspi_regs", "bspi_regs", "intr_regs", "intr_status_reg"; + interrupts = <GIC_SPI 72 IRQ_TYPE_LEVEL_HIGH>, + <GIC_SPI 73 IRQ_TYPE_LEVEL_HIGH>, + <GIC_SPI 74 IRQ_TYPE_LEVEL_HIGH>, + <GIC_SPI 75 IRQ_TYPE_LEVEL_HIGH>, + <GIC_SPI 76 IRQ_TYPE_LEVEL_HIGH>, + <GIC_SPI 77 IRQ_TYPE_LEVEL_HIGH>, + <GIC_SPI 78 IRQ_TYPE_LEVEL_HIGH>; + interrupt-names = + "spi_lr_fullness_reached", + "spi_lr_session_aborted", + "spi_lr_impatient", + "spi_lr_session_done", + "mspi_done", + "mspi_halted"; + clocks = <&iprocmed>; + clock-names = "iprocmed"; + num-cs = <2>; + #address-cells = <1>; + #size-cells = <0>; + }; + + + NS2 SoC Example: + + qspi: spi@66470200 { + compatible = "brcm,spi-bcm-qspi", "brcm,spi-ns2-qspi"; + reg = <0x66470200 0x184>, + <0x66470000 0x124>, + <0x67017408 0x004>, + <0x664703a0 0x01c>; + reg-names = "mspi", "bspi", "intr_regs", + "intr_status_reg"; + interrupts = <GIC_SPI 419 IRQ_TYPE_LEVEL_HIGH>; + interrupt-names = "spi_l1_intr"; + clocks = <&iprocmed>; + clock-names = "iprocmed"; + num-cs = <2>; + #address-cells = <1>; + #size-cells = <0>; + }; + + + m25p80 node for NSP, NS2 + + &qspi { + flash: m25p80@0 { + #address-cells = <1>; + #size-cells = <1>; + compatible = "m25p80"; + reg = <0x0>; + spi-max-frequency = <12500000>; + m25p,fast-read; + spi-cpol; + spi-cpha; + + partition@0 { + label = "boot"; + reg = <0x00000000 0x000a0000>; + }; + + partition@a0000 { + label = "env"; + reg = <0x000a0000 0x00060000>; + }; + + partition@100000 { + label = "system"; + reg = <0x00100000 0x00600000>; + }; + + partition@700000 { + label = "rootfs"; + reg = <0x00700000 0x01900000>; + }; + }; diff --git a/Documentation/devicetree/bindings/spi/jcore,spi.txt b/Documentation/devicetree/bindings/spi/jcore,spi.txt new file mode 100644 index 000000000000..93936d16e139 --- /dev/null +++ b/Documentation/devicetree/bindings/spi/jcore,spi.txt @@ -0,0 +1,34 @@ +J-Core SPI master + +Required properties: + +- compatible: Must be "jcore,spi2". + +- reg: Memory region for registers. + +- #address-cells: Must be 1. + +- #size-cells: Must be 0. + +Optional properties: + +- clocks: If a phandle named "ref_clk" is present, SPI clock speed + programming is relative to the frequency of the indicated clock. + Necessary only if the input clock rate is something other than a + fixed 50 MHz. + +- clock-names: Clock names, one for each phandle in clocks. + +See spi-bus.txt for additional properties not specific to this device. + +Example: + +spi@40 { + compatible = "jcore,spi2"; + #address-cells = <1>; + #size-cells = <0>; + reg = <0x40 0x8>; + spi-max-frequency = <25000000>; + clocks = <&bus_clk>; + clock-names = "ref_clk"; +} diff --git a/Documentation/devicetree/bindings/spi/spi-bus.txt b/Documentation/devicetree/bindings/spi/spi-bus.txt index 17822860cb98..4b1d6e74c744 100644 --- a/Documentation/devicetree/bindings/spi/spi-bus.txt +++ b/Documentation/devicetree/bindings/spi/spi-bus.txt @@ -31,7 +31,7 @@ with max(cs-gpios > hw cs). So if for example the controller has 2 CS lines, and the cs-gpios property looks like this: -cs-gpios = <&gpio1 0 0> <0> <&gpio1 1 0> <&gpio1 2 0>; +cs-gpios = <&gpio1 0 0>, <0>, <&gpio1 1 0>, <&gpio1 2 0>; Then it should be configured so that num_chipselect = 4 with the following mapping: diff --git a/Documentation/devicetree/bindings/spi/spi-meson.txt b/Documentation/devicetree/bindings/spi/spi-meson.txt index bb52a86f3365..dc6d0313324a 100644 --- a/Documentation/devicetree/bindings/spi/spi-meson.txt +++ b/Documentation/devicetree/bindings/spi/spi-meson.txt @@ -7,7 +7,7 @@ NOR memories, without DMA support and a 64-byte unified transmit / receive buffer. Required properties: - - compatible: should be "amlogic,meson6-spifc" + - compatible: should be "amlogic,meson6-spifc" or "amlogic,meson-gxbb-spifc" - reg: physical base address and length of the controller registers - clocks: phandle of the input clock for the baud rate generator - #address-cells: should be 1 diff --git a/Documentation/devicetree/bindings/timer/moxa,moxart-timer.txt b/Documentation/devicetree/bindings/timer/moxa,moxart-timer.txt index da2d510cae47..e207c11630af 100644 --- a/Documentation/devicetree/bindings/timer/moxa,moxart-timer.txt +++ b/Documentation/devicetree/bindings/timer/moxa,moxart-timer.txt @@ -2,7 +2,9 @@ MOXA ART timer Required properties: -- compatible : Must be "moxa,moxart-timer" +- compatible : Must be one of: + - "moxa,moxart-timer" + - "aspeed,ast2400-timer" - reg : Should contain registers location and length - interrupts : Should contain the timer interrupt number - clocks : Should contain phandle for the clock that drives the counter diff --git a/Documentation/devicetree/bindings/timer/oxsemi,rps-timer.txt b/Documentation/devicetree/bindings/timer/oxsemi,rps-timer.txt index 3ca89cd1caef..d191612539e8 100644 --- a/Documentation/devicetree/bindings/timer/oxsemi,rps-timer.txt +++ b/Documentation/devicetree/bindings/timer/oxsemi,rps-timer.txt @@ -2,7 +2,7 @@ Oxford Semiconductor OXNAS SoCs Family RPS Timer ================================================ Required properties: -- compatible: Should be "oxsemi,ox810se-rps-timer" +- compatible: Should be "oxsemi,ox810se-rps-timer" or "oxsemi,ox820-rps-timer" - reg : Specifies base physical address and size of the registers. - interrupts : The interrupts of the two timers - clocks : The phandle of the timer clock source diff --git a/Documentation/devicetree/bindings/usb/ci-hdrc-usb2.txt b/Documentation/devicetree/bindings/usb/ci-hdrc-usb2.txt index 341dc67f3472..0e03344e2e8b 100644 --- a/Documentation/devicetree/bindings/usb/ci-hdrc-usb2.txt +++ b/Documentation/devicetree/bindings/usb/ci-hdrc-usb2.txt @@ -81,6 +81,8 @@ i.mx specific properties - fsl,usbmisc: phandler of non-core register device, with one argument that indicate usb controller index - disable-over-current: disable over current detect +- over-current-active-high: over current signal polarity is high active, + typically over current signal polarity is low active. - external-vbus-divider: enables off-chip resistor divider for Vbus Example: diff --git a/Documentation/devicetree/bindings/usb/dwc2.txt b/Documentation/devicetree/bindings/usb/dwc2.txt index 20a68bf2b4e7..7d16ebfaa5a1 100644 --- a/Documentation/devicetree/bindings/usb/dwc2.txt +++ b/Documentation/devicetree/bindings/usb/dwc2.txt @@ -26,7 +26,10 @@ Refer to phy/phy-bindings.txt for generic phy consumer properties - g-use-dma: enable dma usage in gadget driver. - g-rx-fifo-size: size of rx fifo size in gadget mode. - g-np-tx-fifo-size: size of non-periodic tx fifo size in gadget mode. -- g-tx-fifo-size: size of periodic tx fifo per endpoint (except ep0) in gadget mode. + +Deprecated properties: +- g-tx-fifo-size: size of periodic tx fifo per endpoint (except ep0) + in gadget mode. Example: diff --git a/Documentation/devicetree/bindings/usb/dwc3-cavium.txt b/Documentation/devicetree/bindings/usb/dwc3-cavium.txt new file mode 100644 index 000000000000..710b782ccf65 --- /dev/null +++ b/Documentation/devicetree/bindings/usb/dwc3-cavium.txt @@ -0,0 +1,28 @@ +Cavium SuperSpeed DWC3 USB SoC controller + +Required properties: +- compatible: Should contain "cavium,octeon-7130-usb-uctl" + +Required child node: +A child node must exist to represent the core DWC3 IP block. The name of +the node is not important. The content of the node is defined in dwc3.txt. + +Example device node: + + uctl@1180069000000 { + compatible = "cavium,octeon-7130-usb-uctl"; + reg = <0x00011800 0x69000000 0x00000000 0x00000100>; + ranges; + #address-cells = <0x00000002>; + #size-cells = <0x00000002>; + refclk-frequency = <0x05f5e100>; + refclk-type-ss = "dlmc_ref_clk0"; + refclk-type-hs = "dlmc_ref_clk0"; + power = <0x00000002 0x00000002 0x00000001>; + xhci@1690000000000 { + compatible = "cavium,octeon-7130-xhci", "synopsys,dwc3"; + reg = <0x00016900 0x00000000 0x00000010 0x00000000>; + interrupt-parent = <0x00000010>; + interrupts = <0x00000009 0x00000004>; + }; + }; diff --git a/Documentation/devicetree/bindings/usb/dwc3.txt b/Documentation/devicetree/bindings/usb/dwc3.txt index 7d7ce089b003..e3e6983288e3 100644 --- a/Documentation/devicetree/bindings/usb/dwc3.txt +++ b/Documentation/devicetree/bindings/usb/dwc3.txt @@ -13,7 +13,8 @@ Optional properties: in the array is expected to be a handle to the USB2/HS PHY and the second element is expected to be a handle to the USB3/SS PHY - phys: from the *Generic PHY* bindings - - phy-names: from the *Generic PHY* bindings + - phy-names: from the *Generic PHY* bindings; supported names are "usb2-phy" + or "usb3-phy". - snps,usb3_lpm_capable: determines if platform is USB3 LPM capable - snps,disable_scramble_quirk: true when SW should disable data scrambling. Only really useful for FPGA builds. @@ -39,6 +40,11 @@ Optional properties: disabling the suspend signal to the PHY. - snps,dis_rxdet_inp3_quirk: when set core will disable receiver detection in PHY P3 power state. + - snps,dis-u2-freeclk-exists-quirk: when set, clear the u2_freeclk_exists + in GUSB2PHYCFG, specify that USB2 PHY doesn't provide + a free-running PHY clock. + - snps,dis-del-phy-power-chg-quirk: when set core will change PHY power + from P0 to P1/P2/P3 without delay. - snps,is-utmi-l1-suspend: true when DWC3 asserts output signal utmi_l1_suspend_n, false when asserts utmi_sleep_n - snps,hird-threshold: HIRD threshold diff --git a/Documentation/devicetree/bindings/usb/generic.txt b/Documentation/devicetree/bindings/usb/generic.txt index bba825711873..bfadeb1c3bab 100644 --- a/Documentation/devicetree/bindings/usb/generic.txt +++ b/Documentation/devicetree/bindings/usb/generic.txt @@ -11,6 +11,11 @@ Optional properties: "peripheral" and "otg". In case this attribute isn't passed via DT, USB DRD controllers should default to OTG. + - phy_type: tells USB controllers that we want to configure the core to support + a UTMI+ PHY with an 8- or 16-bit interface if UTMI+ is + selected. Valid arguments are "utmi" and "utmi_wide". + In case this isn't passed via DT, USB controllers should + default to HW capability. - otg-rev: tells usb driver the release number of the OTG and EH supplement with which the device and its descriptors are compliant, in binary-coded decimal (i.e. 2.0 is 0200H). This @@ -34,6 +39,7 @@ dwc3@4a030000 { usb-phy = <&usb2_phy>, <&usb3,phy>; maximum-speed = "super-speed"; dr_mode = "otg"; + phy_type = "utmi_wide"; otg-rev = <0x0200>; adp-disable; }; diff --git a/Documentation/devicetree/bindings/usb/renesas_usbhs.txt b/Documentation/devicetree/bindings/usb/renesas_usbhs.txt index b6040563e51a..9e18e000339e 100644 --- a/Documentation/devicetree/bindings/usb/renesas_usbhs.txt +++ b/Documentation/devicetree/bindings/usb/renesas_usbhs.txt @@ -9,6 +9,7 @@ Required properties: - "renesas,usbhs-r8a7793" for r8a7793 (R-Car M2-N) compatible device - "renesas,usbhs-r8a7794" for r8a7794 (R-Car E2) compatible device - "renesas,usbhs-r8a7795" for r8a7795 (R-Car H3) compatible device + - "renesas,usbhs-r8a7796" for r8a7796 (R-Car M3-W) compatible device - "renesas,rcar-gen2-usbhs" for R-Car Gen2 compatible device - "renesas,rcar-gen3-usbhs" for R-Car Gen3 compatible device diff --git a/Documentation/devicetree/bindings/usb/rockchip,dwc3.txt b/Documentation/devicetree/bindings/usb/rockchip,dwc3.txt new file mode 100644 index 000000000000..0536a938e3ab --- /dev/null +++ b/Documentation/devicetree/bindings/usb/rockchip,dwc3.txt @@ -0,0 +1,59 @@ +Rockchip SuperSpeed DWC3 USB SoC controller + +Required properties: +- compatible: should contain "rockchip,rk3399-dwc3" for rk3399 SoC +- clocks: A list of phandle + clock-specifier pairs for the + clocks listed in clock-names +- clock-names: Should contain the following: + "ref_clk" Controller reference clk, have to be 24 MHz + "suspend_clk" Controller suspend clk, have to be 24 MHz or 32 KHz + "bus_clk" Master/Core clock, have to be >= 62.5 MHz for SS + operation and >= 30MHz for HS operation + "grf_clk" Controller grf clk + +Required child node: +A child node must exist to represent the core DWC3 IP block. The name of +the node is not important. The content of the node is defined in dwc3.txt. + +Phy documentation is provided in the following places: +Documentation/devicetree/bindings/phy/rockchip,dwc3-usb-phy.txt + +Example device nodes: + + usbdrd3_0: usb@fe800000 { + compatible = "rockchip,rk3399-dwc3"; + clocks = <&cru SCLK_USB3OTG0_REF>, <&cru SCLK_USB3OTG0_SUSPEND>, + <&cru ACLK_USB3OTG0>, <&cru ACLK_USB3_GRF>; + clock-names = "ref_clk", "suspend_clk", + "bus_clk", "grf_clk"; + #address-cells = <2>; + #size-cells = <2>; + ranges; + status = "disabled"; + usbdrd_dwc3_0: dwc3@fe800000 { + compatible = "snps,dwc3"; + reg = <0x0 0xfe800000 0x0 0x100000>; + interrupts = <GIC_SPI 105 IRQ_TYPE_LEVEL_HIGH>; + dr_mode = "otg"; + status = "disabled"; + }; + }; + + usbdrd3_1: usb@fe900000 { + compatible = "rockchip,rk3399-dwc3"; + clocks = <&cru SCLK_USB3OTG1_REF>, <&cru SCLK_USB3OTG1_SUSPEND>, + <&cru ACLK_USB3OTG1>, <&cru ACLK_USB3_GRF>; + clock-names = "ref_clk", "suspend_clk", + "bus_clk", "grf_clk"; + #address-cells = <2>; + #size-cells = <2>; + ranges; + status = "disabled"; + usbdrd_dwc3_1: dwc3@fe900000 { + compatible = "snps,dwc3"; + reg = <0x0 0xfe900000 0x0 0x100000>; + interrupts = <GIC_SPI 110 IRQ_TYPE_LEVEL_HIGH>; + dr_mode = "otg"; + status = "disabled"; + }; + }; diff --git a/Documentation/devicetree/bindings/usb/usb4604.txt b/Documentation/devicetree/bindings/usb/usb4604.txt new file mode 100644 index 000000000000..82506d17712c --- /dev/null +++ b/Documentation/devicetree/bindings/usb/usb4604.txt @@ -0,0 +1,19 @@ +SMSC USB4604 High-Speed Hub Controller + +Required properties: +- compatible: Should be "smsc,usb4604" + +Optional properties: +- reg: Specifies the i2c slave address, it is required and should be 0x2d + if I2C is used. +- reset-gpios: Should specify GPIO for reset. +- initial-mode: Should specify initial mode. + (1 for HUB mode, 2 for STANDBY mode) + +Examples: + usb-hub@2d { + compatible = "smsc,usb4604"; + reg = <0x2d>; + reset-gpios = <&gpx3 5 1>; + initial-mode = <1>; + }; diff --git a/Documentation/devicetree/bindings/usb/usbmisc-imx.txt b/Documentation/devicetree/bindings/usb/usbmisc-imx.txt index 3539d4e7d23e..f1e27faf528e 100644 --- a/Documentation/devicetree/bindings/usb/usbmisc-imx.txt +++ b/Documentation/devicetree/bindings/usb/usbmisc-imx.txt @@ -6,6 +6,7 @@ Required properties: "fsl,imx6q-usbmisc" for imx6q "fsl,vf610-usbmisc" for Vybrid vf610 "fsl,imx6sx-usbmisc" for imx6sx + "fsl,imx7d-usbmisc" for imx7d - reg: Should contain registers location and length Examples: diff --git a/Documentation/devicetree/bindings/vendor-prefixes.txt b/Documentation/devicetree/bindings/vendor-prefixes.txt index 1992aa97d45a..77e985f21707 100644 --- a/Documentation/devicetree/bindings/vendor-prefixes.txt +++ b/Documentation/devicetree/bindings/vendor-prefixes.txt @@ -3,8 +3,8 @@ Device tree binding vendor prefix registry. Keep list in alphabetical order. This isn't an exhaustive list, but you should add new prefixes to it before using them to avoid name-space collisions. -abilis Abilis Systems abcn Abracon Corporation +abilis Abilis Systems active-semi Active-Semi International Inc ad Avionic Design GmbH adapteva Adapteva, Inc. @@ -36,6 +36,7 @@ aspeed ASPEED Technology Inc. atlas Atlas Scientific LLC atmel Atmel Corporation auo AU Optronics Corporation +auvidea Auvidea GmbH avago Avago Technologies avic Shanghai AVIC Optoelectronics Co., Ltd. axis Axis Communications AB @@ -75,6 +76,7 @@ digilent Diglent, Inc. dlg Dialog Semiconductor dlink D-Link Corporation dmo Data Modul AG +domintech Domintech Co., Ltd. dptechnics DPTechnics dragino Dragino Technology Co., Limited ea Embedded Artists AB @@ -85,6 +87,7 @@ elan Elan Microelectronic Corp. embest Shenzhen Embest Technology Co., Ltd. emmicro EM Microelectronic energymicro Silicon Laboratories (formerly Energy Micro AS) +engicam Engicam S.r.l. epcos EPCOS AG epfl Ecole Polytechnique Fรฉdรฉrale de Lausanne epson Seiko Epson Corp. @@ -101,8 +104,8 @@ focaltech FocalTech Systems Co.,Ltd fsl Freescale Semiconductor ge General Electric Company geekbuying GeekBuying -GEFanuc GE Fanuc Intelligent Platforms Embedded Systems, Inc. gef GE Fanuc Intelligent Platforms Embedded Systems, Inc. +GEFanuc GE Fanuc Intelligent Platforms Embedded Systems, Inc. geniatech Geniatech, Inc. giantplus Giantplus Technology Co., Ltd. globalscale Globalscale Technologies, Inc. @@ -126,7 +129,6 @@ i2se I2SE GmbH ibm International Business Machines (IBM) idt Integrated Device Technologies, Inc. ifi Ingenieurburo Fur Ic-Technologie (I/F/I) -iom Iomega Corporation img Imagination Technologies Ltd. infineon Infineon Technologies inforce Inforce Computing @@ -135,11 +137,14 @@ innolux Innolux Corporation intel Intel Corporation intercontrol Inter Control Group invensense InvenSense Inc. +iom Iomega Corporation isee ISEE 2007 S.L. isil Intersil issi Integrated Silicon Solutions Inc. +jdi Japan Display Inc. jedec JEDEC Solid State Technology Association karo Ka-Ro electronics GmbH +keithkoep Keith & Koep GmbH keymile Keymile GmbH kinetic Kinetic Technologies kosagi Sutajio Ko-Usagi PTE Ltd. @@ -149,8 +154,8 @@ lantiq Lantiq Semiconductor lenovo Lenovo Group Ltd. lg LG Corporation linux Linux-specific binding -lsi LSI Corp. (LSI Logic) lltc Linear Technology Corporation +lsi LSI Corp. (LSI Logic) marvell Marvell Technology Group Ltd. maxim Maxim Integrated Products meas Measurement Specialties @@ -190,20 +195,20 @@ onnn ON Semiconductor Corp. ontat On Tat Industrial Company opencores OpenCores.org option Option NV +ORCL Oracle Corporation ortustech Ortus Technology Co., Ltd. ovti OmniVision Technologies -ORCL Oracle Corporation oxsemi Oxford Semiconductor, Ltd. panasonic Panasonic Corporation parade Parade Technologies Inc. pericom Pericom Technology Inc. phytec PHYTEC Messtechnik GmbH picochip Picochip Ltd +pixcir PIXCIR MICROELECTRONICS Co., Ltd plathome Plat'Home Co., Ltd. plda PLDA -pixcir PIXCIR MICROELECTRONICS Co., Ltd -pulsedlight PulsedLight, Inc powervr PowerVR (deprecated, use img) +pulsedlight PulsedLight, Inc qca Qualcomm Atheros, Inc. qcom Qualcomm Technologies, Inc qemu QEMU, a generic and open source machine emulator and virtualizer @@ -231,12 +236,13 @@ sgx SGX Sensortech sharp Sharp Corporation si-en Si-En Technology Ltd. sigma Sigma Designs, Inc. +sii Seiko Instruments, Inc. sil Silicon Image silabs Silicon Laboratories +silead Silead Inc. +silergy Silergy Corp. siliconmitus Silicon Mitus, Inc. simtek -sii Seiko Instruments, Inc. -silergy Silergy Corp. sirf SiRF Technology, Inc. sis Silicon Integrated Systems Corp. sitronix Sitronix Technology Corporation @@ -254,9 +260,12 @@ starry Starry Electronic Technology (ShenZhen) Co., LTD startek Startek ste ST-Ericsson stericsson ST-Ericsson +summit Summit microelectronics +sunchip Shenzhen Sunchip Technology Co., Ltd +SUNW Sun Microsystems, Inc +swir Sierra Wireless syna Synaptics Inc. synology Synology, Inc. -SUNW Sun Microsystems, Inc tbs TBS Technologies tcg Trusted Computing Group tcl Toby Churchill Ltd. @@ -265,17 +274,18 @@ technologic Technologic Systems thine THine Electronics, Inc. ti Texas Instruments tlm Trusted Logic Mobility +topeet Topeet toradex Toradex AG toshiba Toshiba Corporation toumaz Toumaz -tplink TP-LINK Technologies Co., Ltd. tpk TPK U.S.A. LLC +tplink TP-LINK Technologies Co., Ltd. tronfy Tronfy tronsmart Tronsmart truly Truly Semiconductors Limited tyan Tyan Computer Corporation -upisemi uPI Semiconductor Corp. uniwest United Western Technologies Corp (UniWest) +upisemi uPI Semiconductor Corp. urt United Radiant Technology Corporation usi Universal Scientific Industrial Co., Ltd. v3 V3 Semiconductor @@ -293,7 +303,7 @@ x-powers X-Powers xes Extreme Engineering Solutions (X-ES) xillybus Xillybus Ltd. xlnx Xilinx -zyxel ZyXEL Communications Corp. zarlink Zarlink Semiconductor zii Zodiac Inflight Innovations zte ZTE Corp. +zyxel ZyXEL Communications Corp. diff --git a/Documentation/devicetree/changesets.txt b/Documentation/devicetree/changesets.txt index 935ba5acc34e..cb488eeb6353 100644 --- a/Documentation/devicetree/changesets.txt +++ b/Documentation/devicetree/changesets.txt @@ -21,20 +21,11 @@ a set of changes. No changes to the active tree are made at this point. All the change operations are recorded in the of_changeset 'entries' list. -3. mutex_lock(of_mutex) - starts a changeset; The global of_mutex -ensures there can only be one editor at a time. - -4. of_changeset_apply() - Apply the changes to the tree. Either the +3. of_changeset_apply() - Apply the changes to the tree. Either the entire changeset will get applied, or if there is an error the tree will -be restored to the previous state - -5. mutex_unlock(of_mutex) - All operations complete, release the mutex +be restored to the previous state. The core ensures proper serialization +through locking. An unlocked version __of_changeset_apply is available, +if needed. If a successfully applied changeset needs to be removed, it can be done -with the following sequence. - -1. mutex_lock(of_mutex) - -2. of_changeset_revert() - -3. mutex_unlock(of_mutex) +with of_changeset_revert(). diff --git a/Documentation/dmaengine/provider.txt b/Documentation/dmaengine/provider.txt index 91ce82d5f0c4..c4fd47540b31 100644 --- a/Documentation/dmaengine/provider.txt +++ b/Documentation/dmaengine/provider.txt @@ -282,6 +282,17 @@ supported. that is supposed to push the current transaction descriptor to a pending queue, waiting for issue_pending to be called. + - In this structure the function pointer callback_result can be + initialized in order for the submitter to be notified that a + transaction has completed. In the earlier code the function pointer + callback has been used. However it does not provide any status to the + transaction and will be deprecated. The result structure defined as + dmaengine_result that is passed in to callback_result has two fields: + + result: This provides the transfer result defined by + dmaengine_tx_result. Either success or some error + condition. + + residue: Provides the residue bytes of the transfer for those that + support residue. * device_issue_pending - Takes the first transaction descriptor in the pending queue, diff --git a/Documentation/docutils.conf b/Documentation/docutils.conf new file mode 100644 index 000000000000..2830772264c8 --- /dev/null +++ b/Documentation/docutils.conf @@ -0,0 +1,7 @@ +# -*- coding: utf-8 mode: conf-colon -*- +# +# docutils configuration file +# http://docutils.sourceforge.net/docs/user/config.html + +[general] +halt_level: severe
\ No newline at end of file diff --git a/Documentation/driver-api/basics.rst b/Documentation/driver-api/basics.rst new file mode 100644 index 000000000000..935b9b8d456c --- /dev/null +++ b/Documentation/driver-api/basics.rst @@ -0,0 +1,120 @@ +Driver Basics +============= + +Driver Entry and Exit points +---------------------------- + +.. kernel-doc:: include/linux/init.h + :internal: + +Atomic and pointer manipulation +------------------------------- + +.. kernel-doc:: arch/x86/include/asm/atomic.h + :internal: + +Delaying, scheduling, and timer routines +---------------------------------------- + +.. kernel-doc:: include/linux/sched.h + :internal: + +.. kernel-doc:: kernel/sched/core.c + :export: + +.. kernel-doc:: kernel/sched/cpupri.c + :internal: + +.. kernel-doc:: kernel/sched/fair.c + :internal: + +.. kernel-doc:: include/linux/completion.h + :internal: + +.. kernel-doc:: kernel/time/timer.c + :export: + +Wait queues and Wake events +--------------------------- + +.. kernel-doc:: include/linux/wait.h + :internal: + +.. kernel-doc:: kernel/sched/wait.c + :export: + +High-resolution timers +---------------------- + +.. kernel-doc:: include/linux/ktime.h + :internal: + +.. kernel-doc:: include/linux/hrtimer.h + :internal: + +.. kernel-doc:: kernel/time/hrtimer.c + :export: + +Workqueues and Kevents +---------------------- + +.. kernel-doc:: include/linux/workqueue.h + :internal: + +.. kernel-doc:: kernel/workqueue.c + :export: + +Internal Functions +------------------ + +.. kernel-doc:: kernel/exit.c + :internal: + +.. kernel-doc:: kernel/signal.c + :internal: + +.. kernel-doc:: include/linux/kthread.h + :internal: + +.. kernel-doc:: kernel/kthread.c + :export: + +Kernel objects manipulation +--------------------------- + +.. kernel-doc:: lib/kobject.c + :export: + +Kernel utility functions +------------------------ + +.. kernel-doc:: include/linux/kernel.h + :internal: + +.. kernel-doc:: kernel/printk/printk.c + :export: + +.. kernel-doc:: kernel/panic.c + :export: + +.. kernel-doc:: kernel/sys.c + :export: + +.. kernel-doc:: kernel/rcu/srcu.c + :export: + +.. kernel-doc:: kernel/rcu/tree.c + :export: + +.. kernel-doc:: kernel/rcu/tree_plugin.h + :export: + +.. kernel-doc:: kernel/rcu/update.c + :export: + +Device Resource Management +-------------------------- + +.. kernel-doc:: drivers/base/devres.c + :export: + diff --git a/Documentation/driver-api/frame-buffer.rst b/Documentation/driver-api/frame-buffer.rst new file mode 100644 index 000000000000..9dd3060f027d --- /dev/null +++ b/Documentation/driver-api/frame-buffer.rst @@ -0,0 +1,62 @@ +Frame Buffer Library +==================== + +The frame buffer drivers depend heavily on four data structures. These +structures are declared in include/linux/fb.h. They are fb_info, +fb_var_screeninfo, fb_fix_screeninfo and fb_monospecs. The last +three can be made available to and from userland. + +fb_info defines the current state of a particular video card. Inside +fb_info, there exists a fb_ops structure which is a collection of +needed functions to make fbdev and fbcon work. fb_info is only visible +to the kernel. + +fb_var_screeninfo is used to describe the features of a video card +that are user defined. With fb_var_screeninfo, things such as depth +and the resolution may be defined. + +The next structure is fb_fix_screeninfo. This defines the properties +of a card that are created when a mode is set and can't be changed +otherwise. A good example of this is the start of the frame buffer +memory. This "locks" the address of the frame buffer memory, so that it +cannot be changed or moved. + +The last structure is fb_monospecs. In the old API, there was little +importance for fb_monospecs. This allowed for forbidden things such as +setting a mode of 800x600 on a fix frequency monitor. With the new API, +fb_monospecs prevents such things, and if used correctly, can prevent a +monitor from being cooked. fb_monospecs will not be useful until +kernels 2.5.x. + +Frame Buffer Memory +------------------- + +.. kernel-doc:: drivers/video/fbdev/core/fbmem.c + :export: + +Frame Buffer Colormap +--------------------- + +.. kernel-doc:: drivers/video/fbdev/core/fbcmap.c + :export: + +Frame Buffer Video Mode Database +-------------------------------- + +.. kernel-doc:: drivers/video/fbdev/core/modedb.c + :internal: + +.. kernel-doc:: drivers/video/fbdev/core/modedb.c + :export: + +Frame Buffer Macintosh Video Mode Database +------------------------------------------ + +.. kernel-doc:: drivers/video/fbdev/macmodes.c + :export: + +Frame Buffer Fonts +------------------ + +Refer to the file lib/fonts/fonts.c for more information. + diff --git a/Documentation/driver-api/hsi.rst b/Documentation/driver-api/hsi.rst new file mode 100644 index 000000000000..f9cec02b72a1 --- /dev/null +++ b/Documentation/driver-api/hsi.rst @@ -0,0 +1,88 @@ +High Speed Synchronous Serial Interface (HSI) +============================================= + +Introduction +--------------- + +High Speed Syncronous Interface (HSI) is a fullduplex, low latency protocol, +that is optimized for die-level interconnect between an Application Processor +and a Baseband chipset. It has been specified by the MIPI alliance in 2003 and +implemented by multiple vendors since then. + +The HSI interface supports full duplex communication over multiple channels +(typically 8) and is capable of reaching speeds up to 200 Mbit/s. + +The serial protocol uses two signals, DATA and FLAG as combined data and clock +signals and an additional READY signal for flow control. An additional WAKE +signal can be used to wakeup the chips from standby modes. The signals are +commonly prefixed by AC for signals going from the application die to the +cellular die and CA for signals going the other way around. + +:: + + +------------+ +---------------+ + | Cellular | | Application | + | Die | | Die | + | | - - - - - - CAWAKE - - - - - - >| | + | T|------------ CADATA ------------>|R | + | X|------------ CAFLAG ------------>|X | + | |<----------- ACREADY ------------| | + | | | | + | | | | + | |< - - - - - ACWAKE - - - - - - -| | + | R|<----------- ACDATA -------------|T | + | X|<----------- ACFLAG -------------|X | + | |------------ CAREADY ----------->| | + | | | | + | | | | + +------------+ +---------------+ + +HSI Subsystem in Linux +------------------------- + +In the Linux kernel the hsi subsystem is supposed to be used for HSI devices. +The hsi subsystem contains drivers for hsi controllers including support for +multi-port controllers and provides a generic API for using the HSI ports. + +It also contains HSI client drivers, which make use of the generic API to +implement a protocol used on the HSI interface. These client drivers can +use an arbitrary number of channels. + +hsi-char Device +------------------ + +Each port automatically registers a generic client driver called hsi_char, +which provides a charecter device for userspace representing the HSI port. +It can be used to communicate via HSI from userspace. Userspace may +configure the hsi_char device using the following ioctl commands: + +HSC_RESET + flush the HSI port + +HSC_SET_PM + enable or disable the client. + +HSC_SEND_BREAK + send break + +HSC_SET_RX + set RX configuration + +HSC_GET_RX + get RX configuration + +HSC_SET_TX + set TX configuration + +HSC_GET_TX + get TX configuration + +The kernel HSI API +------------------ + +.. kernel-doc:: include/linux/hsi/hsi.h + :internal: + +.. kernel-doc:: drivers/hsi/hsi_core.c + :export: + diff --git a/Documentation/driver-api/i2c.rst b/Documentation/driver-api/i2c.rst new file mode 100644 index 000000000000..f3939f7852bd --- /dev/null +++ b/Documentation/driver-api/i2c.rst @@ -0,0 +1,46 @@ +I\ :sup:`2`\ C and SMBus Subsystem +================================== + +I\ :sup:`2`\ C (or without fancy typography, "I2C") is an acronym for +the "Inter-IC" bus, a simple bus protocol which is widely used where low +data rate communications suffice. Since it's also a licensed trademark, +some vendors use another name (such as "Two-Wire Interface", TWI) for +the same bus. I2C only needs two signals (SCL for clock, SDA for data), +conserving board real estate and minimizing signal quality issues. Most +I2C devices use seven bit addresses, and bus speeds of up to 400 kHz; +there's a high speed extension (3.4 MHz) that's not yet found wide use. +I2C is a multi-master bus; open drain signaling is used to arbitrate +between masters, as well as to handshake and to synchronize clocks from +slower clients. + +The Linux I2C programming interfaces support only the master side of bus +interactions, not the slave side. The programming interface is +structured around two kinds of driver, and two kinds of device. An I2C +"Adapter Driver" abstracts the controller hardware; it binds to a +physical device (perhaps a PCI device or platform_device) and exposes a +:c:type:`struct i2c_adapter <i2c_adapter>` representing each +I2C bus segment it manages. On each I2C bus segment will be I2C devices +represented by a :c:type:`struct i2c_client <i2c_client>`. +Those devices will be bound to a :c:type:`struct i2c_driver +<i2c_driver>`, which should follow the standard Linux driver +model. (At this writing, a legacy model is more widely used.) There are +functions to perform various I2C protocol operations; at this writing +all such functions are usable only from task context. + +The System Management Bus (SMBus) is a sibling protocol. Most SMBus +systems are also I2C conformant. The electrical constraints are tighter +for SMBus, and it standardizes particular protocol messages and idioms. +Controllers that support I2C can also support most SMBus operations, but +SMBus controllers don't support all the protocol options that an I2C +controller will. There are functions to perform various SMBus protocol +operations, either using I2C primitives or by issuing SMBus commands to +i2c_adapter devices which don't support those I2C operations. + +.. kernel-doc:: include/linux/i2c.h + :internal: + +.. kernel-doc:: drivers/i2c/i2c-boardinfo.c + :functions: i2c_register_board_info + +.. kernel-doc:: drivers/i2c/i2c-core.c + :export: diff --git a/Documentation/driver-api/index.rst b/Documentation/driver-api/index.rst new file mode 100644 index 000000000000..8e259c5d0322 --- /dev/null +++ b/Documentation/driver-api/index.rst @@ -0,0 +1,26 @@ +======================================== +The Linux driver implementer's API guide +======================================== + +The kernel offers a wide variety of interfaces to support the development +of device drivers. This document is an only somewhat organized collection +of some of those interfaces โ it will hopefully get better over time! The +available subsections can be seen below. + +.. class:: toc-title + + Table of contents + +.. toctree:: + :maxdepth: 2 + + basics + infrastructure + message-based + sound + frame-buffer + input + spi + i2c + hsi + miscellaneous diff --git a/Documentation/driver-api/infrastructure.rst b/Documentation/driver-api/infrastructure.rst new file mode 100644 index 000000000000..5d50d6733db3 --- /dev/null +++ b/Documentation/driver-api/infrastructure.rst @@ -0,0 +1,169 @@ +Device drivers infrastructure +============================= + +The Basic Device Driver-Model Structures +---------------------------------------- + +.. kernel-doc:: include/linux/device.h + :internal: + +Device Drivers Base +------------------- + +.. kernel-doc:: drivers/base/init.c + :internal: + +.. kernel-doc:: drivers/base/driver.c + :export: + +.. kernel-doc:: drivers/base/core.c + :export: + +.. kernel-doc:: drivers/base/syscore.c + :export: + +.. kernel-doc:: drivers/base/class.c + :export: + +.. kernel-doc:: drivers/base/node.c + :internal: + +.. kernel-doc:: drivers/base/firmware_class.c + :export: + +.. kernel-doc:: drivers/base/transport_class.c + :export: + +.. kernel-doc:: drivers/base/dd.c + :export: + +.. kernel-doc:: include/linux/platform_device.h + :internal: + +.. kernel-doc:: drivers/base/platform.c + :export: + +.. kernel-doc:: drivers/base/bus.c + :export: + +Buffer Sharing and Synchronization +---------------------------------- + +The dma-buf subsystem provides the framework for sharing buffers for +hardware (DMA) access across multiple device drivers and subsystems, and +for synchronizing asynchronous hardware access. + +This is used, for example, by drm "prime" multi-GPU support, but is of +course not limited to GPU use cases. + +The three main components of this are: (1) dma-buf, representing a +sg_table and exposed to userspace as a file descriptor to allow passing +between devices, (2) fence, which provides a mechanism to signal when +one device as finished access, and (3) reservation, which manages the +shared or exclusive fence(s) associated with the buffer. + +dma-buf +~~~~~~~ + +.. kernel-doc:: drivers/dma-buf/dma-buf.c + :export: + +.. kernel-doc:: include/linux/dma-buf.h + :internal: + +reservation +~~~~~~~~~~~ + +.. kernel-doc:: drivers/dma-buf/reservation.c + :doc: Reservation Object Overview + +.. kernel-doc:: drivers/dma-buf/reservation.c + :export: + +.. kernel-doc:: include/linux/reservation.h + :internal: + +fence +~~~~~ + +.. kernel-doc:: drivers/dma-buf/fence.c + :export: + +.. kernel-doc:: include/linux/fence.h + :internal: + +.. kernel-doc:: drivers/dma-buf/seqno-fence.c + :export: + +.. kernel-doc:: include/linux/seqno-fence.h + :internal: + +.. kernel-doc:: drivers/dma-buf/fence-array.c + :export: + +.. kernel-doc:: include/linux/fence-array.h + :internal: + +.. kernel-doc:: drivers/dma-buf/reservation.c + :export: + +.. kernel-doc:: include/linux/reservation.h + :internal: + +.. kernel-doc:: drivers/dma-buf/sync_file.c + :export: + +.. kernel-doc:: include/linux/sync_file.h + :internal: + +Device Drivers DMA Management +----------------------------- + +.. kernel-doc:: drivers/base/dma-coherent.c + :export: + +.. kernel-doc:: drivers/base/dma-mapping.c + :export: + +Device Drivers Power Management +------------------------------- + +.. kernel-doc:: drivers/base/power/main.c + :export: + +Device Drivers ACPI Support +--------------------------- + +.. kernel-doc:: drivers/acpi/scan.c + :export: + +.. kernel-doc:: drivers/acpi/scan.c + :internal: + +Device drivers PnP support +-------------------------- + +.. kernel-doc:: drivers/pnp/core.c + :internal: + +.. kernel-doc:: drivers/pnp/card.c + :export: + +.. kernel-doc:: drivers/pnp/driver.c + :internal: + +.. kernel-doc:: drivers/pnp/manager.c + :export: + +.. kernel-doc:: drivers/pnp/support.c + :export: + +Userspace IO devices +-------------------- + +.. kernel-doc:: drivers/uio/uio.c + :export: + +.. kernel-doc:: include/linux/uio_driver.h + :internal: + diff --git a/Documentation/driver-api/input.rst b/Documentation/driver-api/input.rst new file mode 100644 index 000000000000..d05bf58fa83e --- /dev/null +++ b/Documentation/driver-api/input.rst @@ -0,0 +1,51 @@ +Input Subsystem +=============== + +Input core +---------- + +.. kernel-doc:: include/linux/input.h + :internal: + +.. kernel-doc:: drivers/input/input.c + :export: + +.. kernel-doc:: drivers/input/ff-core.c + :export: + +.. kernel-doc:: drivers/input/ff-memless.c + :export: + +Multitouch Library +------------------ + +.. kernel-doc:: include/linux/input/mt.h + :internal: + +.. kernel-doc:: drivers/input/input-mt.c + :export: + +Polled input devices +-------------------- + +.. kernel-doc:: include/linux/input-polldev.h + :internal: + +.. kernel-doc:: drivers/input/input-polldev.c + :export: + +Matrix keyboards/keypads +------------------------ + +.. kernel-doc:: include/linux/input/matrix_keypad.h + :internal: + +Sparse keymap support +--------------------- + +.. kernel-doc:: include/linux/input/sparse-keymap.h + :internal: + +.. kernel-doc:: drivers/input/sparse-keymap.c + :export: + diff --git a/Documentation/driver-api/message-based.rst b/Documentation/driver-api/message-based.rst new file mode 100644 index 000000000000..18ff94ef6d8e --- /dev/null +++ b/Documentation/driver-api/message-based.rst @@ -0,0 +1,12 @@ +Message-based devices +===================== + +Fusion message devices +---------------------- + +.. kernel-doc:: drivers/message/fusion/mptbase.c + :export: + +.. kernel-doc:: drivers/message/fusion/mptscsih.c + :export: + diff --git a/Documentation/driver-api/miscellaneous.rst b/Documentation/driver-api/miscellaneous.rst new file mode 100644 index 000000000000..8da7d115bafc --- /dev/null +++ b/Documentation/driver-api/miscellaneous.rst @@ -0,0 +1,50 @@ +Parallel Port Devices +===================== + +.. kernel-doc:: include/linux/parport.h + :internal: + +.. kernel-doc:: drivers/parport/ieee1284.c + :export: + +.. kernel-doc:: drivers/parport/share.c + :export: + +.. kernel-doc:: drivers/parport/daisy.c + :internal: + +16x50 UART Driver +================= + +.. kernel-doc:: drivers/tty/serial/serial_core.c + :export: + +.. kernel-doc:: drivers/tty/serial/8250/8250_core.c + :export: + +Pulse-Width Modulation (PWM) +============================ + +Pulse-width modulation is a modulation technique primarily used to +control power supplied to electrical devices. + +The PWM framework provides an abstraction for providers and consumers of +PWM signals. A controller that provides one or more PWM signals is +registered as :c:type:`struct pwm_chip <pwm_chip>`. Providers +are expected to embed this structure in a driver-specific structure. +This structure contains fields that describe a particular chip. + +A chip exposes one or more PWM signal sources, each of which exposed as +a :c:type:`struct pwm_device <pwm_device>`. Operations can be +performed on PWM devices to control the period, duty cycle, polarity and +active state of the signal. + +Note that PWM devices are exclusive resources: they can always only be +used by one consumer at a time. + +.. kernel-doc:: include/linux/pwm.h + :internal: + +.. kernel-doc:: drivers/pwm/core.c + :export: + diff --git a/Documentation/driver-api/sound.rst b/Documentation/driver-api/sound.rst new file mode 100644 index 000000000000..afef6eabc073 --- /dev/null +++ b/Documentation/driver-api/sound.rst @@ -0,0 +1,54 @@ +Sound Devices +============= + +.. kernel-doc:: include/sound/core.h + :internal: + +.. kernel-doc:: sound/sound_core.c + :export: + +.. kernel-doc:: include/sound/pcm.h + :internal: + +.. kernel-doc:: sound/core/pcm.c + :export: + +.. kernel-doc:: sound/core/device.c + :export: + +.. kernel-doc:: sound/core/info.c + :export: + +.. kernel-doc:: sound/core/rawmidi.c + :export: + +.. kernel-doc:: sound/core/sound.c + :export: + +.. kernel-doc:: sound/core/memory.c + :export: + +.. kernel-doc:: sound/core/pcm_memory.c + :export: + +.. kernel-doc:: sound/core/init.c + :export: + +.. kernel-doc:: sound/core/isadma.c + :export: + +.. kernel-doc:: sound/core/control.c + :export: + +.. kernel-doc:: sound/core/pcm_lib.c + :export: + +.. kernel-doc:: sound/core/hwdep.c + :export: + +.. kernel-doc:: sound/core/pcm_native.c + :export: + +.. kernel-doc:: sound/core/memalloc.c + :export: + diff --git a/Documentation/driver-api/spi.rst b/Documentation/driver-api/spi.rst new file mode 100644 index 000000000000..f64cb666498a --- /dev/null +++ b/Documentation/driver-api/spi.rst @@ -0,0 +1,53 @@ +Serial Peripheral Interface (SPI) +================================= + +SPI is the "Serial Peripheral Interface", widely used with embedded +systems because it is a simple and efficient interface: basically a +multiplexed shift register. Its three signal wires hold a clock (SCK, +often in the range of 1-20 MHz), a "Master Out, Slave In" (MOSI) data +line, and a "Master In, Slave Out" (MISO) data line. SPI is a full +duplex protocol; for each bit shifted out the MOSI line (one per clock) +another is shifted in on the MISO line. Those bits are assembled into +words of various sizes on the way to and from system memory. An +additional chipselect line is usually active-low (nCS); four signals are +normally used for each peripheral, plus sometimes an interrupt. + +The SPI bus facilities listed here provide a generalized interface to +declare SPI busses and devices, manage them according to the standard +Linux driver model, and perform input/output operations. At this time, +only "master" side interfaces are supported, where Linux talks to SPI +peripherals and does not implement such a peripheral itself. (Interfaces +to support implementing SPI slaves would necessarily look different.) + +The programming interface is structured around two kinds of driver, and +two kinds of device. A "Controller Driver" abstracts the controller +hardware, which may be as simple as a set of GPIO pins or as complex as +a pair of FIFOs connected to dual DMA engines on the other side of the +SPI shift register (maximizing throughput). Such drivers bridge between +whatever bus they sit on (often the platform bus) and SPI, and expose +the SPI side of their device as a :c:type:`struct spi_master +<spi_master>`. SPI devices are children of that master, +represented as a :c:type:`struct spi_device <spi_device>` and +manufactured from :c:type:`struct spi_board_info +<spi_board_info>` descriptors which are usually provided by +board-specific initialization code. A :c:type:`struct spi_driver +<spi_driver>` is called a "Protocol Driver", and is bound to a +spi_device using normal driver model calls. + +The I/O model is a set of queued messages. Protocol drivers submit one +or more :c:type:`struct spi_message <spi_message>` objects, +which are processed and completed asynchronously. (There are synchronous +wrappers, however.) Messages are built from one or more +:c:type:`struct spi_transfer <spi_transfer>` objects, each of +which wraps a full duplex SPI transfer. A variety of protocol tweaking +options are needed, because different chips adopt very different +policies for how they use the bits transferred with SPI. + +.. kernel-doc:: include/linux/spi/spi.h + :internal: + +.. kernel-doc:: drivers/spi/spi.c + :functions: spi_register_board_info + +.. kernel-doc:: drivers/spi/spi.c + :export: diff --git a/Documentation/driver-model/device.txt b/Documentation/driver-model/device.txt index 1e70220d20f4..2403eb856187 100644 --- a/Documentation/driver-model/device.txt +++ b/Documentation/driver-model/device.txt @@ -50,7 +50,7 @@ Attributes of devices can be exported by a device driver through sysfs. Please see Documentation/filesystems/sysfs.txt for more information on how sysfs works. -As explained in Documentation/kobject.txt, device attributes must be be +As explained in Documentation/kobject.txt, device attributes must be created before the KOBJ_ADD uevent is generated. The only way to realize that is by defining an attribute group. diff --git a/Documentation/driver-model/devres.txt b/Documentation/driver-model/devres.txt index b0d775d28e97..75bc5b8add2f 100644 --- a/Documentation/driver-model/devres.txt +++ b/Documentation/driver-model/devres.txt @@ -266,8 +266,12 @@ IIO devm_iio_device_unregister() devm_iio_kfifo_allocate() devm_iio_kfifo_free() + devm_iio_triggered_buffer_setup() + devm_iio_triggered_buffer_cleanup() devm_iio_trigger_alloc() devm_iio_trigger_free() + devm_iio_trigger_register() + devm_iio_trigger_unregister() devm_iio_channel_get() devm_iio_channel_release() devm_iio_channel_get_all() diff --git a/Documentation/email-clients.txt b/Documentation/email-clients.txt index 2d485dea8cec..ac892b30815e 100644 --- a/Documentation/email-clients.txt +++ b/Documentation/email-clients.txt @@ -1,23 +1,27 @@ +.. _email_clients: + Email clients info for Linux -====================================================================== +============================ Git ----------------------------------------------------------------------- -These days most developers use `git send-email` instead of regular +--- + +These days most developers use ``git send-email`` instead of regular email clients. The man page for this is quite good. On the receiving -end, maintainers use `git am` to apply the patches. +end, maintainers use ``git am`` to apply the patches. -If you are new to git then send your first patch to yourself. Save it -as raw text including all the headers. Run `git am raw_email.txt` and -then review the changelog with `git log`. When that works then send +If you are new to ``git`` then send your first patch to yourself. Save it +as raw text including all the headers. Run ``git am raw_email.txt`` and +then review the changelog with ``git log``. When that works then send the patch to the appropriate mailing list(s). General Preferences ----------------------------------------------------------------------- +------------------- + Patches for the Linux kernel are submitted via email, preferably as inline text in the body of the email. Some maintainers accept attachments, but then the attachments should have content-type -"text/plain". However, attachments are generally frowned upon because +``text/plain``. However, attachments are generally frowned upon because it makes quoting portions of the patch more difficult in the patch review process. @@ -25,7 +29,7 @@ Email clients that are used for Linux kernel patches should send the patch text untouched. For example, they should not modify or delete tabs or spaces, even at the beginning or end of lines. -Don't send patches with "format=flowed". This can cause unexpected +Don't send patches with ``format=flowed``. This can cause unexpected and unwanted line breaks. Don't let your email client do automatic word wrapping for you. @@ -54,57 +58,63 @@ mailing lists. Some email client (MUA) hints ----------------------------------------------------------------------- +----------------------------- + Here are some specific MUA configuration hints for editing and sending patches for the Linux kernel. These are not meant to be complete software package configuration summaries. + Legend: -TUI = text-based user interface -GUI = graphical user interface -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +- TUI = text-based user interface +- GUI = graphical user interface + Alpine (TUI) +************ Config options: -In the "Sending Preferences" section: -- "Do Not Send Flowed Text" must be enabled -- "Strip Whitespace Before Sending" must be disabled +In the :menuselection:`Sending Preferences` section: + +- :menuselection:`Do Not Send Flowed Text` must be ``enabled`` +- :menuselection:`Strip Whitespace Before Sending` must be ``disabled`` When composing the message, the cursor should be placed where the patch -should appear, and then pressing CTRL-R let you specify the patch file +should appear, and then pressing :kbd:`CTRL-R` let you specify the patch file to insert into the message. -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Claws Mail (GUI) +**************** Works. Some people use this successfully for patches. -To insert a patch use Message->Insert File (CTRL+i) or an external editor. +To insert a patch use :menuselection:`Message-->Insert` File (:kbd:`CTRL-I`) +or an external editor. If the inserted patch has to be edited in the Claws composition window -"Auto wrapping" in Configuration->Preferences->Compose->Wrapping should be +"Auto wrapping" in +:menuselection:`Configuration-->Preferences-->Compose-->Wrapping` should be disabled. -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Evolution (GUI) +*************** Some people use this successfully for patches. When composing mail select: Preformat - from Format->Paragraph Style->Preformatted (Ctrl-7) + from :menuselection:`Format-->Paragraph Style-->Preformatted` (:kbd:`CTRL-7`) or the toolbar Then use: - Insert->Text File... (Alt-n x) +:menuselection:`Insert-->Text File...` (:kbd:`ALT-N x`) to insert the patch. -You can also "diff -Nru old.c new.c | xclip", select Preformat, then -paste with the middle button. +You can also ``diff -Nru old.c new.c | xclip``, select +:menuselection:`Preformat`, then paste with the middle button. -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Kmail (GUI) +*********** Some people use Kmail successfully for patches. @@ -120,11 +130,12 @@ word-wrapped and you can uncheck "word wrap" without losing the existing wrapping. At the bottom of your email, put the commonly-used patch delimiter before -inserting your patch: three hyphens (---). +inserting your patch: three hyphens (``---``). -Then from the "Message" menu item, select insert file and choose your patch. +Then from the :menuselection:`Message` menu item, select insert file and +choose your patch. As an added bonus you can customise the message creation toolbar menu -and put the "insert file" icon there. +and put the :menuselection:`insert file` icon there. Make the composer window wide enough so that no lines wrap. As of KMail 1.13.5 (KDE 4.5.4), KMail will apply word wrapping when sending @@ -139,86 +150,96 @@ as inlined text will make them tricky to extract from their 7-bit encoding. If you absolutely must send patches as attachments instead of inlining them as text, right click on the attachment and select properties, and -highlight "Suggest automatic display" to make the attachment inlined to -make it more viewable. +highlight :menuselection:`Suggest automatic display` to make the attachment +inlined to make it more viewable. When saving patches that are sent as inlined text, select the email that contains the patch from the message list pane, right click and select -"save as". You can use the whole email unmodified as a patch if it was -properly composed. There is no option currently to save the email when you -are actually viewing it in its own window -- there has been a request filed -at kmail's bugzilla and hopefully this will be addressed. Emails are saved -as read-write for user only so you will have to chmod them to make them +:menuselection:`save as`. You can use the whole email unmodified as a patch +if it was properly composed. There is no option currently to save the email +when you are actually viewing it in its own window -- there has been a request +filed at kmail's bugzilla and hopefully this will be addressed. Emails are +saved as read-write for user only so you will have to chmod them to make them group and world readable if you copy them elsewhere. -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Lotus Notes (GUI) +***************** Run away from it. -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Mutt (TUI) +********** -Plenty of Linux developers use mutt, so it must work pretty well. +Plenty of Linux developers use ``mutt``, so it must work pretty well. Mutt doesn't come with an editor, so whatever editor you use should be used in a way that there are no automatic linebreaks. Most editors have -an "insert file" option that inserts the contents of a file unaltered. +an :menuselection:`insert file` option that inserts the contents of a file +unaltered. + +To use ``vim`` with mutt:: -To use 'vim' with mutt: set editor="vi" - If using xclip, type the command +If using xclip, type the command:: + :set paste - before middle button or shift-insert or use + +before middle button or shift-insert or use:: + :r filename if you want to include the patch inline. -(a)ttach works fine without "set paste". +(a)ttach works fine without ``set paste``. + +You can also generate patches with ``git format-patch`` and then use Mutt +to send them:: -You can also generate patches with 'git format-patch' and then use Mutt -to send them: $ mutt -H 0001-some-bug-fix.patch Config options: + It should work with default settings. -However, it's a good idea to set the "send_charset" to: +However, it's a good idea to set the ``send_charset`` to:: + set send_charset="us-ascii:utf-8" Mutt is highly customizable. Here is a minimum configuration to start -using Mutt to send patches through Gmail: - -# .muttrc -# ================ IMAP ==================== -set imap_user = 'yourusername@gmail.com' -set imap_pass = 'yourpassword' -set spoolfile = imaps://imap.gmail.com/INBOX -set folder = imaps://imap.gmail.com/ -set record="imaps://imap.gmail.com/[Gmail]/Sent Mail" -set postponed="imaps://imap.gmail.com/[Gmail]/Drafts" -set mbox="imaps://imap.gmail.com/[Gmail]/All Mail" - -# ================ SMTP ==================== -set smtp_url = "smtp://username@smtp.gmail.com:587/" -set smtp_pass = $imap_pass -set ssl_force_tls = yes # Require encrypted connection - -# ================ Composition ==================== -set editor = `echo \$EDITOR` -set edit_headers = yes # See the headers when editing -set charset = UTF-8 # value of $LANG; also fallback for send_charset -# Sender, email address, and sign-off line must match -unset use_domain # because joe@localhost is just embarrassing -set realname = "YOUR NAME" -set from = "username@gmail.com" -set use_from = yes +using Mutt to send patches through Gmail:: + + # .muttrc + # ================ IMAP ==================== + set imap_user = 'yourusername@gmail.com' + set imap_pass = 'yourpassword' + set spoolfile = imaps://imap.gmail.com/INBOX + set folder = imaps://imap.gmail.com/ + set record="imaps://imap.gmail.com/[Gmail]/Sent Mail" + set postponed="imaps://imap.gmail.com/[Gmail]/Drafts" + set mbox="imaps://imap.gmail.com/[Gmail]/All Mail" + + # ================ SMTP ==================== + set smtp_url = "smtp://username@smtp.gmail.com:587/" + set smtp_pass = $imap_pass + set ssl_force_tls = yes # Require encrypted connection + + # ================ Composition ==================== + set editor = `echo \$EDITOR` + set edit_headers = yes # See the headers when editing + set charset = UTF-8 # value of $LANG; also fallback for send_charset + # Sender, email address, and sign-off line must match + unset use_domain # because joe@localhost is just embarrassing + set realname = "YOUR NAME" + set from = "username@gmail.com" + set use_from = yes The Mutt docs have lots more information: + http://dev.mutt.org/trac/wiki/UseCases/Gmail + http://dev.mutt.org/doc/manual.html -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Pine (TUI) +********** Pine has had some whitespace truncation issues in the past, but these should all be fixed now. @@ -226,12 +247,13 @@ should all be fixed now. Use alpine (pine's successor) if you can. Config options: -- quell-flowed-text is needed for recent versions -- the "no-strip-whitespace-before-send" option is needed + +- ``quell-flowed-text`` is needed for recent versions +- the ``no-strip-whitespace-before-send`` option is needed -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Sylpheed (GUI) +************** - Works well for inlining text (or using attachments). - Allows use of an external editor. @@ -241,50 +263,50 @@ Sylpheed (GUI) - Adding addresses to address book doesn't understand the display name properly. -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Thunderbird (GUI) +***************** Thunderbird is an Outlook clone that likes to mangle text, but there are ways to coerce it into behaving. - Allow use of an external editor: The easiest thing to do with Thunderbird and patches is to use an - "external editor" extension and then just use your favorite $EDITOR + "external editor" extension and then just use your favorite ``$EDITOR`` for reading/merging patches into the body text. To do this, download and install the extension, then add a button for it using - View->Toolbars->Customize... and finally just click on it when in the - Compose dialog. + :menuselection:`View-->Toolbars-->Customize...` and finally just click on it + when in the :menuselection:`Compose` dialog. Please note that "external editor" requires that your editor must not fork, or in other words, the editor must not return before closing. You may have to pass additional flags or change the settings of your editor. Most notably if you are using gvim then you must pass the -f - option to gvim by putting "/usr/bin/gvim -f" (if the binary is in - /usr/bin) to the text editor field in "external editor" settings. If you - are using some other editor then please read its manual to find out how - to do this. + option to gvim by putting ``/usr/bin/gvim -f`` (if the binary is in + ``/usr/bin``) to the text editor field in :menuselection:`external editor` + settings. If you are using some other editor then please read its manual + to find out how to do this. To beat some sense out of the internal editor, do this: -- Edit your Thunderbird config settings so that it won't use format=flowed. - Go to "edit->preferences->advanced->config editor" to bring up the - thunderbird's registry editor. +- Edit your Thunderbird config settings so that it won't use ``format=flowed``. + Go to :menuselection:`edit-->preferences-->advanced-->config editor` to bring up + the thunderbird's registry editor. -- Set "mailnews.send_plaintext_flowed" to "false" +- Set ``mailnews.send_plaintext_flowed`` to ``false`` -- Set "mailnews.wraplength" from "72" to "0" +- Set ``mailnews.wraplength`` from ``72`` to ``0`` -- "View" > "Message Body As" > "Plain Text" +- :menuselection:`View-->Message Body As-->Plain Text` -- "View" > "Character Encoding" > "Unicode (UTF-8)" +- :menuselection:`View-->Character Encoding-->Unicode (UTF-8)` -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ TkRat (GUI) +*********** Works. Use "Insert file..." or external editor. -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Gmail (Web GUI) +*************** Does not work for sending patches. @@ -295,5 +317,3 @@ although tab2space problem can be solved with external editor. Another problem is that Gmail will base64-encode any message that has a non-ASCII character. That includes things like European names. - - ### diff --git a/Documentation/filesystems/dax.txt b/Documentation/filesystems/dax.txt index 0c16a22521a8..23d18b8a49d5 100644 --- a/Documentation/filesystems/dax.txt +++ b/Documentation/filesystems/dax.txt @@ -123,9 +123,12 @@ The DAX code does not work correctly on architectures which have virtually mapped caches such as ARM, MIPS and SPARC. Calling get_user_pages() on a range of user memory that has been mmaped -from a DAX file will fail as there are no 'struct page' to describe -those pages. This problem is being worked on. That means that O_DIRECT -reads/writes to those memory ranges from a non-DAX file will fail (note -that O_DIRECT reads/writes _of a DAX file_ do work, it is the memory -that is being accessed that is key here). Other things that will not -work include RDMA, sendfile() and splice(). +from a DAX file will fail when there are no 'struct page' to describe +those pages. This problem has been addressed in some device drivers +by adding optional struct page support for pages under the control of +the driver (see CONFIG_NVDIMM_PFN in drivers/nvdimm for an example of +how to do this). In the non struct page cases O_DIRECT reads/writes to +those memory ranges from a non-DAX file will fail (note that O_DIRECT +reads/writes _of a DAX file_ do work, it is the memory that is being +accessed that is key here). Other things that will not work in the +non struct page case include RDMA, sendfile() and splice(). diff --git a/Documentation/filesystems/f2fs.txt b/Documentation/filesystems/f2fs.txt index ecd808088362..753dd4f96afe 100644 --- a/Documentation/filesystems/f2fs.txt +++ b/Documentation/filesystems/f2fs.txt @@ -131,6 +131,7 @@ inline_dentry Enable the inline dir feature: data in new created directory entries can be written into inode block. The space of inode block which is used to store inline dentries is limited to ~3.4k. +noinline_dentry Diable the inline dentry feature. flush_merge Merge concurrent cache_flush commands as much as possible to eliminate redundant command issues. If the underlying device handles the cache_flush command relatively slowly, diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index 68080ad6a75e..fcc1ac094282 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt @@ -145,7 +145,7 @@ Table 1-1: Process specific entries in /proc symbol the task is blocked in - or "0" if not blocked. pagemap Page table stack Report full stack trace, enable via CONFIG_STACKTRACE - smaps a extension based on maps, showing the memory consumption of + smaps an extension based on maps, showing the memory consumption of each mapping and flags associated with it numa_maps an extension based on maps, showing the memory locality and binding policy as well as mem usage (in pages) of each mapping. diff --git a/Documentation/filesystems/xfs.txt b/Documentation/filesystems/xfs.txt index 8146e9fd5ffc..c2d44e6e117b 100644 --- a/Documentation/filesystems/xfs.txt +++ b/Documentation/filesystems/xfs.txt @@ -348,3 +348,126 @@ Removed Sysctls ---- ------- fs.xfs.xfsbufd_centisec v4.0 fs.xfs.age_buffer_centisecs v4.0 + + +Error handling +============== + +XFS can act differently according to the type of error found during its +operation. The implementation introduces the following concepts to the error +handler: + + -failure speed: + Defines how fast XFS should propagate an error upwards when a specific + error is found during the filesystem operation. It can propagate + immediately, after a defined number of retries, after a set time period, + or simply retry forever. + + -error classes: + Specifies the subsystem the error configuration will apply to, such as + metadata IO or memory allocation. Different subsystems will have + different error handlers for which behaviour can be configured. + + -error handlers: + Defines the behavior for a specific error. + +The filesystem behavior during an error can be set via sysfs files. Each +error handler works independently - the first condition met by an error handler +for a specific class will cause the error to be propagated rather than reset and +retried. + +The action taken by the filesystem when the error is propagated is context +dependent - it may cause a shut down in the case of an unrecoverable error, +it may be reported back to userspace, or it may even be ignored because +there's nothing useful we can with the error or anyone we can report it to (e.g. +during unmount). + +The configuration files are organized into the following hierarchy for each +mounted filesystem: + + /sys/fs/xfs/<dev>/error/<class>/<error>/ + +Where: + <dev> + The short device name of the mounted filesystem. This is the same device + name that shows up in XFS kernel error messages as "XFS(<dev>): ..." + + <class> + The subsystem the error configuration belongs to. As of 4.9, the defined + classes are: + + - "metadata": applies metadata buffer write IO + + <error> + The individual error handler configurations. + + +Each filesystem has "global" error configuration options defined in their top +level directory: + + /sys/fs/xfs/<dev>/error/ + + fail_at_unmount (Min: 0 Default: 1 Max: 1) + Defines the filesystem error behavior at unmount time. + + If set to a value of 1, XFS will override all other error configurations + during unmount and replace them with "immediate fail" characteristics. + i.e. no retries, no retry timeout. This will always allow unmount to + succeed when there are persistent errors present. + + If set to 0, the configured retry behaviour will continue until all + retries and/or timeouts have been exhausted. This will delay unmount + completion when there are persistent errors, and it may prevent the + filesystem from ever unmounting fully in the case of "retry forever" + handler configurations. + + Note: there is no guarantee that fail_at_unmount can be set whilst an + unmount is in progress. It is possible that the sysfs entries are + removed by the unmounting filesystem before a "retry forever" error + handler configuration causes unmount to hang, and hence the filesystem + must be configured appropriately before unmount begins to prevent + unmount hangs. + +Each filesystem has specific error class handlers that define the error +propagation behaviour for specific errors. There is also a "default" error +handler defined, which defines the behaviour for all errors that don't have +specific handlers defined. Where multiple retry constraints are configuredi for +a single error, the first retry configuration that expires will cause the error +to be propagated. The handler configurations are found in the directory: + + /sys/fs/xfs/<dev>/error/<class>/<error>/ + + max_retries (Min: -1 Default: Varies Max: INTMAX) + Defines the allowed number of retries of a specific error before + the filesystem will propagate the error. The retry count for a given + error context (e.g. a specific metadata buffer) is reset every time + there is a successful completion of the operation. + + Setting the value to "-1" will cause XFS to retry forever for this + specific error. + + Setting the value to "0" will cause XFS to fail immediately when the + specific error is reported. + + Setting the value to "N" (where 0 < N < Max) will make XFS retry the + operation "N" times before propagating the error. + + retry_timeout_seconds (Min: -1 Default: Varies Max: 1 day) + Define the amount of time (in seconds) that the filesystem is + allowed to retry its operations when the specific error is + found. + + Setting the value to "-1" will allow XFS to retry forever for this + specific error. + + Setting the value to "0" will cause XFS to fail immediately when the + specific error is reported. + + Setting the value to "N" (where 0 < N < Max) will allow XFS to retry the + operation for up to "N" seconds before propagating the error. + +Note: The default behaviour for a specific error handler is dependent on both +the class and error context. For example, the default values for +"metadata/ENODEV" are "0" rather than "-1" so that this error handler defaults +to "fail immediately" behaviour. This is done because ENODEV is a fatal, +unrecoverable error no matter how many times the metadata IO is retried. diff --git a/Documentation/gcov.txt b/Documentation/gcov.txt deleted file mode 100644 index 7b727783db7e..000000000000 --- a/Documentation/gcov.txt +++ /dev/null @@ -1,257 +0,0 @@ -Using gcov with the Linux kernel -================================ - -1. Introduction -2. Preparation -3. Customization -4. Files -5. Modules -6. Separated build and test machines -7. Troubleshooting -Appendix A: sample script: gather_on_build.sh -Appendix B: sample script: gather_on_test.sh - - -1. Introduction -=============== - -gcov profiling kernel support enables the use of GCC's coverage testing -tool gcov [1] with the Linux kernel. Coverage data of a running kernel -is exported in gcov-compatible format via the "gcov" debugfs directory. -To get coverage data for a specific file, change to the kernel build -directory and use gcov with the -o option as follows (requires root): - -# cd /tmp/linux-out -# gcov -o /sys/kernel/debug/gcov/tmp/linux-out/kernel spinlock.c - -This will create source code files annotated with execution counts -in the current directory. In addition, graphical gcov front-ends such -as lcov [2] can be used to automate the process of collecting data -for the entire kernel and provide coverage overviews in HTML format. - -Possible uses: - -* debugging (has this line been reached at all?) -* test improvement (how do I change my test to cover these lines?) -* minimizing kernel configurations (do I need this option if the - associated code is never run?) - --- - -[1] http://gcc.gnu.org/onlinedocs/gcc/Gcov.html -[2] http://ltp.sourceforge.net/coverage/lcov.php - - -2. Preparation -============== - -Configure the kernel with: - - CONFIG_DEBUG_FS=y - CONFIG_GCOV_KERNEL=y - -select the gcc's gcov format, default is autodetect based on gcc version: - - CONFIG_GCOV_FORMAT_AUTODETECT=y - -and to get coverage data for the entire kernel: - - CONFIG_GCOV_PROFILE_ALL=y - -Note that kernels compiled with profiling flags will be significantly -larger and run slower. Also CONFIG_GCOV_PROFILE_ALL may not be supported -on all architectures. - -Profiling data will only become accessible once debugfs has been -mounted: - - mount -t debugfs none /sys/kernel/debug - - -3. Customization -================ - -To enable profiling for specific files or directories, add a line -similar to the following to the respective kernel Makefile: - - For a single file (e.g. main.o): - GCOV_PROFILE_main.o := y - - For all files in one directory: - GCOV_PROFILE := y - -To exclude files from being profiled even when CONFIG_GCOV_PROFILE_ALL -is specified, use: - - GCOV_PROFILE_main.o := n - and: - GCOV_PROFILE := n - -Only files which are linked to the main kernel image or are compiled as -kernel modules are supported by this mechanism. - - -4. Files -======== - -The gcov kernel support creates the following files in debugfs: - - /sys/kernel/debug/gcov - Parent directory for all gcov-related files. - - /sys/kernel/debug/gcov/reset - Global reset file: resets all coverage data to zero when - written to. - - /sys/kernel/debug/gcov/path/to/compile/dir/file.gcda - The actual gcov data file as understood by the gcov - tool. Resets file coverage data to zero when written to. - - /sys/kernel/debug/gcov/path/to/compile/dir/file.gcno - Symbolic link to a static data file required by the gcov - tool. This file is generated by gcc when compiling with - option -ftest-coverage. - - -5. Modules -========== - -Kernel modules may contain cleanup code which is only run during -module unload time. The gcov mechanism provides a means to collect -coverage data for such code by keeping a copy of the data associated -with the unloaded module. This data remains available through debugfs. -Once the module is loaded again, the associated coverage counters are -initialized with the data from its previous instantiation. - -This behavior can be deactivated by specifying the gcov_persist kernel -parameter: - - gcov_persist=0 - -At run-time, a user can also choose to discard data for an unloaded -module by writing to its data file or the global reset file. - - -6. Separated build and test machines -==================================== - -The gcov kernel profiling infrastructure is designed to work out-of-the -box for setups where kernels are built and run on the same machine. In -cases where the kernel runs on a separate machine, special preparations -must be made, depending on where the gcov tool is used: - -a) gcov is run on the TEST machine - -The gcov tool version on the test machine must be compatible with the -gcc version used for kernel build. Also the following files need to be -copied from build to test machine: - -from the source tree: - - all C source files + headers - -from the build tree: - - all C source files + headers - - all .gcda and .gcno files - - all links to directories - -It is important to note that these files need to be placed into the -exact same file system location on the test machine as on the build -machine. If any of the path components is symbolic link, the actual -directory needs to be used instead (due to make's CURDIR handling). - -b) gcov is run on the BUILD machine - -The following files need to be copied after each test case from test -to build machine: - -from the gcov directory in sysfs: - - all .gcda files - - all links to .gcno files - -These files can be copied to any location on the build machine. gcov -must then be called with the -o option pointing to that directory. - -Example directory setup on the build machine: - - /tmp/linux: kernel source tree - /tmp/out: kernel build directory as specified by make O= - /tmp/coverage: location of the files copied from the test machine - - [user@build] cd /tmp/out - [user@build] gcov -o /tmp/coverage/tmp/out/init main.c - - -7. Troubleshooting -================== - -Problem: Compilation aborts during linker step. -Cause: Profiling flags are specified for source files which are not - linked to the main kernel or which are linked by a custom - linker procedure. -Solution: Exclude affected source files from profiling by specifying - GCOV_PROFILE := n or GCOV_PROFILE_basename.o := n in the - corresponding Makefile. - -Problem: Files copied from sysfs appear empty or incomplete. -Cause: Due to the way seq_file works, some tools such as cp or tar - may not correctly copy files from sysfs. -Solution: Use 'cat' to read .gcda files and 'cp -d' to copy links. - Alternatively use the mechanism shown in Appendix B. - - -Appendix A: gather_on_build.sh -============================== - -Sample script to gather coverage meta files on the build machine -(see 6a): -#!/bin/bash - -KSRC=$1 -KOBJ=$2 -DEST=$3 - -if [ -z "$KSRC" ] || [ -z "$KOBJ" ] || [ -z "$DEST" ]; then - echo "Usage: $0 <ksrc directory> <kobj directory> <output.tar.gz>" >&2 - exit 1 -fi - -KSRC=$(cd $KSRC; printf "all:\n\t@echo \${CURDIR}\n" | make -f -) -KOBJ=$(cd $KOBJ; printf "all:\n\t@echo \${CURDIR}\n" | make -f -) - -find $KSRC $KOBJ \( -name '*.gcno' -o -name '*.[ch]' -o -type l \) -a \ - -perm /u+r,g+r | tar cfz $DEST -P -T - - -if [ $? -eq 0 ] ; then - echo "$DEST successfully created, copy to test system and unpack with:" - echo " tar xfz $DEST -P" -else - echo "Could not create file $DEST" -fi - - -Appendix B: gather_on_test.sh -============================= - -Sample script to gather coverage data files on the test machine -(see 6b): - -#!/bin/bash -e - -DEST=$1 -GCDA=/sys/kernel/debug/gcov - -if [ -z "$DEST" ] ; then - echo "Usage: $0 <output.tar.gz>" >&2 - exit 1 -fi - -TEMPDIR=$(mktemp -d) -echo Collecting data.. -find $GCDA -type d -exec mkdir -p $TEMPDIR/\{\} \; -find $GCDA -name '*.gcda' -exec sh -c 'cat < $0 > '$TEMPDIR'/$0' {} \; -find $GCDA -name '*.gcno' -exec sh -c 'cp -d $0 '$TEMPDIR'/$0' {} \; -tar czf $DEST -C $TEMPDIR sys -rm -rf $TEMPDIR - -echo "$DEST successfully created, copy to build system and unpack with:" -echo " tar xfz $DEST" diff --git a/Documentation/gpio/board.txt b/Documentation/gpio/board.txt index 86d3fa95fd12..40884c4fe40c 100644 --- a/Documentation/gpio/board.txt +++ b/Documentation/gpio/board.txt @@ -8,9 +8,9 @@ gpio-legacy.txt (actually, there is no real mapping possible with the old interface; you just fetch an integer from somewhere and request the corresponding GPIO. -Platforms that make use of GPIOs must select ARCH_REQUIRE_GPIOLIB (if GPIO usage -is mandatory) or ARCH_WANT_OPTIONAL_GPIOLIB (if GPIO support can be omitted) in -their Kconfig. Then, how GPIOs are mapped depends on what the platform uses to +All platforms can enable the GPIO library, but if the platform strictly +requires GPIO functionality to be present, it needs to select GPIOLIB from its +Kconfig. Then, how GPIOs are mapped depends on what the platform uses to describe its hardware layout. Currently, mappings can be defined through device tree, ACPI, and platform data. diff --git a/Documentation/gpio/driver.txt b/Documentation/gpio/driver.txt index 6cb35a78eff4..368d5a294d89 100644 --- a/Documentation/gpio/driver.txt +++ b/Documentation/gpio/driver.txt @@ -262,6 +262,12 @@ symbol: to the container using container_of(). (See Documentation/driver-model/design-patterns.txt) + If there is a need to exclude certain GPIOs from the IRQ domain, one can + set .irq_need_valid_mask of the gpiochip before gpiochip_add_data() is + called. This allocates .irq_valid_mask with as many bits set as there are + GPIOs in the chip. Drivers can exclude GPIOs by clearing bits from this + mask. The mask must be filled in before gpiochip_irqchip_add() is called. + * gpiochip_set_chained_irqchip(): sets up a chained irq handler for a gpio_chip from a parent IRQ and passes the struct gpio_chip* as handler data. (Notice handler data, since the irqchip data is likely used by the diff --git a/Documentation/gpio/gpio-legacy.txt b/Documentation/gpio/gpio-legacy.txt index 79ab5648d69b..b34fd94f7089 100644 --- a/Documentation/gpio/gpio-legacy.txt +++ b/Documentation/gpio/gpio-legacy.txt @@ -72,8 +72,8 @@ in this document, but drivers acting as clients to the GPIO interface must not care how it's implemented.) That said, if the convention is supported on their platform, drivers should -use it when possible. Platforms must select ARCH_REQUIRE_GPIOLIB or -ARCH_WANT_OPTIONAL_GPIOLIB in their Kconfig. Drivers that can't work without +use it when possible. Platforms must select GPIOLIB if GPIO functionality +is strictly required. Drivers that can't work without standard GPIO calls should have Kconfig entries which depend on GPIOLIB. The GPIO calls are available, either as "real code" or as optimized-away stubs, when drivers use the include file: @@ -553,22 +553,14 @@ either NULL or the label associated with that GPIO when it was requested. Platform Support ---------------- -To support this framework, a platform's Kconfig will "select" either -ARCH_REQUIRE_GPIOLIB or ARCH_WANT_OPTIONAL_GPIOLIB -and arrange that its <asm/gpio.h> includes <asm-generic/gpio.h> and defines -three functions: gpio_get_value(), gpio_set_value(), and gpio_cansleep(). +To force-enable this framework, a platform's Kconfig will "select" GPIOLIB, +else it is up to the user to configure support for GPIO. It may also provide a custom value for ARCH_NR_GPIOS, so that it better reflects the number of GPIOs in actual use on that platform, without wasting static table space. (It should count both built-in/SoC GPIOs and also ones on GPIO expanders. -ARCH_REQUIRE_GPIOLIB means that the gpiolib code will always get compiled -into the kernel on that architecture. - -ARCH_WANT_OPTIONAL_GPIOLIB means the gpiolib code defaults to off and the user -can enable it and build it into the kernel optionally. - If neither of these options are selected, the platform does not support GPIOs through GPIO-lib and the code cannot be enabled by the user. diff --git a/Documentation/gpu/conf.py b/Documentation/gpu/conf.py new file mode 100644 index 000000000000..6314d1708230 --- /dev/null +++ b/Documentation/gpu/conf.py @@ -0,0 +1,5 @@ +# -*- coding: utf-8; mode: python -*- + +project = "Linux GPU Driver Developer's Guide" + +tags.add("subproject") diff --git a/Documentation/gpu/index.rst b/Documentation/gpu/index.rst index fcac0fa72056..5ff3d2b236af 100644 --- a/Documentation/gpu/index.rst +++ b/Documentation/gpu/index.rst @@ -12,3 +12,10 @@ Linux GPU Driver Developer's Guide drm-uapi i915 vga-switcheroo + +.. only:: subproject + + Indices + ======= + + * :ref:`genindex` diff --git a/Documentation/hsi.txt b/Documentation/hsi.txt deleted file mode 100644 index 6ac6cd51852a..000000000000 --- a/Documentation/hsi.txt +++ /dev/null @@ -1,75 +0,0 @@ -HSI - High-speed Synchronous Serial Interface - -1. Introduction -~~~~~~~~~~~~~~~ - -High Speed Syncronous Interface (HSI) is a fullduplex, low latency protocol, -that is optimized for die-level interconnect between an Application Processor -and a Baseband chipset. It has been specified by the MIPI alliance in 2003 and -implemented by multiple vendors since then. - -The HSI interface supports full duplex communication over multiple channels -(typically 8) and is capable of reaching speeds up to 200 Mbit/s. - -The serial protocol uses two signals, DATA and FLAG as combined data and clock -signals and an additional READY signal for flow control. An additional WAKE -signal can be used to wakeup the chips from standby modes. The signals are -commonly prefixed by AC for signals going from the application die to the -cellular die and CA for signals going the other way around. - -+------------+ +---------------+ -| Cellular | | Application | -| Die | | Die | -| | - - - - - - CAWAKE - - - - - - >| | -| T|------------ CADATA ------------>|R | -| X|------------ CAFLAG ------------>|X | -| |<----------- ACREADY ------------| | -| | | | -| | | | -| |< - - - - - ACWAKE - - - - - - -| | -| R|<----------- ACDATA -------------|T | -| X|<----------- ACFLAG -------------|X | -| |------------ CAREADY ----------->| | -| | | | -| | | | -+------------+ +---------------+ - -2. HSI Subsystem in Linux -~~~~~~~~~~~~~~~~~~~~~~~~~ - -In the Linux kernel the hsi subsystem is supposed to be used for HSI devices. -The hsi subsystem contains drivers for hsi controllers including support for -multi-port controllers and provides a generic API for using the HSI ports. - -It also contains HSI client drivers, which make use of the generic API to -implement a protocol used on the HSI interface. These client drivers can -use an arbitrary number of channels. - -3. hsi-char Device -~~~~~~~~~~~~~~~~~~ - -Each port automatically registers a generic client driver called hsi_char, -which provides a charecter device for userspace representing the HSI port. -It can be used to communicate via HSI from userspace. Userspace may -configure the hsi_char device using the following ioctl commands: - -* HSC_RESET: - - flush the HSI port - -* HSC_SET_PM - - enable or disable the client. - -* HSC_SEND_BREAK - - send break - -* HSC_SET_RX - - set RX configuration - -* HSC_GET_RX - - get RX configuration - -* HSC_SET_TX - - set TX configuration - -* HSC_GET_TX - - get TX configuration diff --git a/Documentation/hwmon/adt7470 b/Documentation/hwmon/adt7470 index 8ce4aa0a0f55..fe68e18a0c8d 100644 --- a/Documentation/hwmon/adt7470 +++ b/Documentation/hwmon/adt7470 @@ -65,6 +65,23 @@ from 0 (off) to 255 (full speed). Fan speed will be set to maximum when the temperature sensor associated with the PWM control exceeds pwm#_auto_point2_temp. +The driver also allows control of the PWM frequency: + +* pwm1_freq + +The PWM frequency is rounded to the nearest one of: + +* 11.0 Hz +* 14.7 Hz +* 22.1 Hz +* 29.4 Hz +* 35.3 Hz +* 44.1 Hz +* 58.8 Hz +* 88.2 Hz +* 1.4 kHz +* 22.5 kHz + Notes ----- diff --git a/Documentation/hwmon/hwmon-kernel-api.txt b/Documentation/hwmon/hwmon-kernel-api.txt index 2ecdbfc85ecf..ef9d74947f5c 100644 --- a/Documentation/hwmon/hwmon-kernel-api.txt +++ b/Documentation/hwmon/hwmon-kernel-api.txt @@ -34,6 +34,19 @@ devm_hwmon_device_register_with_groups(struct device *dev, const char *name, void *drvdata, const struct attribute_group **groups); +struct device * +hwmon_device_register_with_info(struct device *dev, + const char *name, void *drvdata, + const struct hwmon_chip_info *info, + const struct attribute_group **groups); + +struct device * +devm_hwmon_device_register_with_info(struct device *dev, + const char *name, + void *drvdata, + const struct hwmon_chip_info *info, + const struct attribute_group **groups); + void hwmon_device_unregister(struct device *dev); void devm_hwmon_device_unregister(struct device *dev); @@ -60,15 +73,229 @@ devm_hwmon_device_register_with_groups is similar to hwmon_device_register_with_groups. However, it is device managed, meaning the hwmon device does not have to be removed explicitly by the removal function. +hwmon_device_register_with_info is the most comprehensive and preferred means +to register a hardware monitoring device. It creates the standard sysfs +attributes in the hardware monitoring core, letting the driver focus on reading +from and writing to the chip instead of having to bother with sysfs attributes. +Its parameters are described in more detail below. + +devm_hwmon_device_register_with_info is similar to +hwmon_device_register_with_info. However, it is device managed, meaning the +hwmon device does not have to be removed explicitly by the removal function. + hwmon_device_unregister deregisters a registered hardware monitoring device. The parameter of this function is the pointer to the registered hardware monitoring device structure. This function must be called from the driver remove function if the hardware monitoring device was registered with -hwmon_device_register or with hwmon_device_register_with_groups. +hwmon_device_register, hwmon_device_register_with_groups, or +hwmon_device_register_with_info. devm_hwmon_device_unregister does not normally have to be called. It is only needed for error handling, and only needed if the driver probe fails after -the call to devm_hwmon_device_register_with_groups. +the call to devm_hwmon_device_register_with_groups and if the automatic +(device managed) removal would be too late. + +Using devm_hwmon_device_register_with_info() +-------------------------------------------- + +hwmon_device_register_with_info() registers a hardware monitoring device. +The parameters to this function are + +struct device *dev Pointer to parent device +const char *name Device name +void *drvdata Driver private data +const struct hwmon_chip_info *info + Pointer to chip description. +const struct attribute_group **groups + Null-terminated list of additional sysfs attribute + groups. + +This function returns a pointer to the created hardware monitoring device +on success and a negative error code for failure. + +The hwmon_chip_info structure looks as follows. + +struct hwmon_chip_info { + const struct hwmon_ops *ops; + const struct hwmon_channel_info **info; +}; + +It contains the following fields: + +* ops: Pointer to device operations. +* info: NULL-terminated list of device channel descriptors. + +The list of hwmon operations is defined as: + +struct hwmon_ops { + umode_t (*is_visible)(const void *, enum hwmon_sensor_types type, + u32 attr, int); + int (*read)(struct device *, enum hwmon_sensor_types type, + u32 attr, int, long *); + int (*write)(struct device *, enum hwmon_sensor_types type, + u32 attr, int, long); +}; + +It defines the following operations. + +* is_visible: Pointer to a function to return the file mode for each supported + attribute. This function is mandatory. + +* read: Pointer to a function for reading a value from the chip. This function + is optional, but must be provided if any readable attributes exist. + +* write: Pointer to a function for writing a value to the chip. This function is + optional, but must be provided if any writeable attributes exist. + +Each sensor channel is described with struct hwmon_channel_info, which is +defined as follows. + +struct hwmon_channel_info { + enum hwmon_sensor_types type; + u32 *config; +}; + +It contains following fields: + +* type: The hardware monitoring sensor type. + Supported sensor types are + * hwmon_chip A virtual sensor type, used to describe attributes + which apply to the entire chip. + * hwmon_temp Temperature sensor + * hwmon_in Voltage sensor + * hwmon_curr Current sensor + * hwmon_power Power sensor + * hwmon_energy Energy sensor + * hwmon_humidity Humidity sensor + * hwmon_fan Fan speed sensor + * hwmon_pwm PWM control + +* config: Pointer to a 0-terminated list of configuration values for each + sensor of the given type. Each value is a combination of bit values + describing the attributes supposed by a single sensor. + +As an example, here is the complete description file for a LM75 compatible +sensor chip. The chip has a single temperature sensor. The driver wants to +register with the thermal subsystem (HWMON_C_REGISTER_TZ), and it supports +the update_interval attribute (HWMON_C_UPDATE_INTERVAL). The chip supports +reading the temperature (HWMON_T_INPUT), it has a maximum temperature +register (HWMON_T_MAX) as well as a maximum temperature hysteresis register +(HWMON_T_MAX_HYST). + +static const u32 lm75_chip_config[] = { + HWMON_C_REGISTER_TZ | HWMON_C_UPDATE_INTERVAL, + 0 +}; + +static const struct hwmon_channel_info lm75_chip = { + .type = hwmon_chip, + .config = lm75_chip_config, +}; + +static const u32 lm75_temp_config[] = { + HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_MAX_HYST, + 0 +}; + +static const struct hwmon_channel_info lm75_temp = { + .type = hwmon_temp, + .config = lm75_temp_config, +}; + +static const struct hwmon_channel_info *lm75_info[] = { + &lm75_chip, + &lm75_temp, + NULL +}; + +static const struct hwmon_ops lm75_hwmon_ops = { + .is_visible = lm75_is_visible, + .read = lm75_read, + .write = lm75_write, +}; + +static const struct hwmon_chip_info lm75_chip_info = { + .ops = &lm75_hwmon_ops, + .info = lm75_info, +}; + +A complete list of bit values indicating individual attribute support +is defined in include/linux/hwmon.h. Definition prefixes are as follows. + +HWMON_C_xxxx Chip attributes, for use with hwmon_chip. +HWMON_T_xxxx Temperature attributes, for use with hwmon_temp. +HWMON_I_xxxx Voltage attributes, for use with hwmon_in. +HWMON_C_xxxx Current attributes, for use with hwmon_curr. + Notice the prefix overlap with chip attributes. +HWMON_P_xxxx Power attributes, for use with hwmon_power. +HWMON_E_xxxx Energy attributes, for use with hwmon_energy. +HWMON_H_xxxx Humidity attributes, for use with hwmon_humidity. +HWMON_F_xxxx Fan speed attributes, for use with hwmon_fan. +HWMON_PWM_xxxx PWM control attributes, for use with hwmon_pwm. + +Driver callback functions +------------------------- + +Each driver provides is_visible, read, and write functions. Parameters +and return values for those functions are as follows. + +umode_t is_visible_func(const void *data, enum hwmon_sensor_types type, + u32 attr, int channel) + +Parameters: + data: Pointer to device private data structure. + type: The sensor type. + attr: Attribute identifier associated with a specific attribute. + For example, the attribute value for HWMON_T_INPUT would be + hwmon_temp_input. For complete mappings of bit fields to + attribute values please see include/linux/hwmon.h. + channel:The sensor channel number. + +Return value: + The file mode for this attribute. Typically, this will be 0 (the + attribute will not be created), S_IRUGO, or 'S_IRUGO | S_IWUSR'. + +int read_func(struct device *dev, enum hwmon_sensor_types type, + u32 attr, int channel, long *val) + +Parameters: + dev: Pointer to the hardware monitoring device. + type: The sensor type. + attr: Attribute identifier associated with a specific attribute. + For example, the attribute value for HWMON_T_INPUT would be + hwmon_temp_input. For complete mappings please see + include/linux/hwmon.h. + channel:The sensor channel number. + val: Pointer to attribute value. + +Return value: + 0 on success, a negative error number otherwise. + +int write_func(struct device *dev, enum hwmon_sensor_types type, + u32 attr, int channel, long val) + +Parameters: + dev: Pointer to the hardware monitoring device. + type: The sensor type. + attr: Attribute identifier associated with a specific attribute. + For example, the attribute value for HWMON_T_INPUT would be + hwmon_temp_input. For complete mappings please see + include/linux/hwmon.h. + channel:The sensor channel number. + val: The value to write to the chip. + +Return value: + 0 on success, a negative error number otherwise. + + +Driver-provided sysfs attributes +-------------------------------- + +If the hardware monitoring device is registered with +hwmon_device_register_with_info or devm_hwmon_device_register_with_info, +it is most likely not necessary to provide sysfs attributes. Only non-standard +sysfs attributes need to be provided when one of those registration functions +is used. The header file linux/hwmon-sysfs.h provides a number of useful macros to declare and use hardware monitoring sysfs attributes. diff --git a/Documentation/hwmon/max6650 b/Documentation/hwmon/max6650 index 58d9644a2bde..dff1d296a48b 100644 --- a/Documentation/hwmon/max6650 +++ b/Documentation/hwmon/max6650 @@ -34,6 +34,7 @@ fan3_input ro " fan4_input ro " fan1_target rw desired fan speed in RPM (closed loop mode only) pwm1_enable rw regulator mode, 0=full on, 1=open loop, 2=closed loop + 3=off pwm1 rw relative speed (0-255), 255=max. speed. Used in open loop mode only. fan1_div rw sets the speed range the inputs can handle. Legal diff --git a/Documentation/hwmon/ucd9000 b/Documentation/hwmon/ucd9000 index 805e33edb978..262e713e60ff 100644 --- a/Documentation/hwmon/ucd9000 +++ b/Documentation/hwmon/ucd9000 @@ -2,12 +2,13 @@ Kernel driver ucd9000 ===================== Supported chips: - * TI UCD90120, UCD90124, UCD9090, and UCD90910 - Prefixes: 'ucd90120', 'ucd90124', 'ucd9090', 'ucd90910' + * TI UCD90120, UCD90124, UCD90160, UCD9090, and UCD90910 + Prefixes: 'ucd90120', 'ucd90124', 'ucd90160', 'ucd9090', 'ucd90910' Addresses scanned: - Datasheets: http://focus.ti.com/lit/ds/symlink/ucd90120.pdf http://focus.ti.com/lit/ds/symlink/ucd90124.pdf + http://focus.ti.com/lit/ds/symlink/ucd90160.pdf http://focus.ti.com/lit/ds/symlink/ucd9090.pdf http://focus.ti.com/lit/ds/symlink/ucd90910.pdf @@ -32,6 +33,13 @@ interrupts, cascading, or other system functions. Twelve of these pins offer PWM functionality. Using these pins, the UCD90124 offers support for fan control, margining, and general-purpose PWM functions. +The UCD90160 is a 16-rail PMBus/I2C addressable power-supply sequencer and +monitor. The device integrates a 12-bit ADC for monitoring up to 16 power-supply +voltage inputs. Twenty-six GPIO pins can be used for power supply enables, +power-on reset signals, external interrupts, cascading, or other system +functions. Twelve of these pins offer PWM functionality. Using these pins, the +UCD90160 offers support for margining, and general-purpose PWM functions. + The UCD9090 is a 10-rail PMBus/I2C addressable power-supply sequencer and monitor. The device integrates a 12-bit ADC for monitoring up to 10 power-supply voltage inputs. Twenty-three GPIO pins can be used for power supply enables, diff --git a/Documentation/hwmon/xgene-hwmon b/Documentation/hwmon/xgene-hwmon new file mode 100644 index 000000000000..6ec50ed7cc8f --- /dev/null +++ b/Documentation/hwmon/xgene-hwmon @@ -0,0 +1,30 @@ +Kernel driver xgene-hwmon +======================== + +Supported chips: + * APM X-Gene SoC + +Description +----------- + +This driver adds hardware temperature and power reading support for +APM X-Gene SoC using the mailbox communication interface. +For device tree, it is the standard DT mailbox. +For ACPI, it is the PCC mailbox. + +The following sensors are supported + + * Temperature + - SoC on-die temperature in milli-degree C + - Alarm when high/over temperature occurs + * Power + - CPU power in uW + - IO power in uW + +sysfs-Interface +--------------- + +temp0_input - SoC on-die temperature (milli-degree C) +temp0_critical_alarm - An 1 would indicates on-die temperature exceeded threshold +power0_input - CPU power in (uW) +power1_input - IO power in (uW) diff --git a/Documentation/i2c/slave-interface b/Documentation/i2c/slave-interface index 80807adb8ded..7e2a228f21bc 100644 --- a/Documentation/i2c/slave-interface +++ b/Documentation/i2c/slave-interface @@ -145,6 +145,11 @@ If you want to add slave support to the bus driver: * Catch the slave interrupts and send appropriate i2c_slave_events to the backend. +Note that most hardware supports being master _and_ slave on the same bus. So, +if you extend a bus driver, please make sure that the driver supports that as +well. In almost all cases, slave support does not need to disable the master +functionality. + Check the i2c-rcar driver as an example. diff --git a/Documentation/iio/iio_configfs.txt b/Documentation/iio/iio_configfs.txt index f0add35cd52e..4e5f101837a8 100644 --- a/Documentation/iio/iio_configfs.txt +++ b/Documentation/iio/iio_configfs.txt @@ -82,8 +82,8 @@ users to create hrtimer triggers under /config/iio/triggers/hrtimer. e.g: -$ mkdir /config/triggers/hrtimer/instance1 -$ rmdir /config/triggers/hrtimer/instance1 +$ mkdir /config/iio/triggers/hrtimer/instance1 +$ rmdir /config/iio/triggers/hrtimer/instance1 Each trigger can have one or more attributes specific to the trigger type. diff --git a/Documentation/index.rst b/Documentation/index.rst index e0fc72963e87..d9ccb94fca95 100644 --- a/Documentation/index.rst +++ b/Documentation/index.rst @@ -6,22 +6,19 @@ Welcome to The Linux Kernel's documentation! ============================================ -Nothing for you to see here *yet*. Please move along. - Contents: .. toctree:: :maxdepth: 2 kernel-documentation - media/media_uapi - media/media_kapi - media/dvb-drivers/index - media/v4l-drivers/index + development-process/index + dev-tools/tools + driver-api/index + media/index gpu/index Indices and tables ================== * :ref:`genindex` -* :ref:`search` diff --git a/Documentation/infiniband/sysfs.txt b/Documentation/infiniband/sysfs.txt index 45bcafe6ff8a..77570d16b170 100644 --- a/Documentation/infiniband/sysfs.txt +++ b/Documentation/infiniband/sysfs.txt @@ -89,6 +89,36 @@ HFI1 nctxts - number of allowed contexts (PSM2) chip_reset - diagnostic (root only) boardversion - board version + + sdma<N>/ - one directory per sdma engine (0 - 15) + sdma<N>/cpu_list - read-write, list of cpus for user-process to sdma + engine assignment. + sdma<N>/vl - read-only, vl the sdma engine maps to. + + The new interface will give the user control on the affinity settings + for the hfi1 device. + As an example, to set an sdma engine irq affinity and thread affinity + of a user processes to use the sdma engine, which is "near" in terms + of NUMA configuration, or physical cpu location, the user will do: + + echo "3" > /proc/irq/<N>/smp_affinity_list + echo "4-7" > /sys/devices/.../sdma3/cpu_list + cat /sys/devices/.../sdma3/vl + 0 + echo "8" > /proc/irq/<M>/smp_affinity_list + echo "9-12" > /sys/devices/.../sdma4/cpu_list + cat /sys/devices/.../sdma4/vl + 1 + + to make sure that when a process runs on cpus 4,5,6, or 7, + and uses vl=0, then sdma engine 3 is selected by the driver, + and also the interrupt of the sdma engine 3 is steered to cpu 3. + Similarly, when a process runs on cpus 9,10,11, or 12 and sets vl=1, + then engine 4 will be selected and the irq of the sdma engine 4 is + steered to cpu 8. + This assumes that in the above N is the irq number of "sdma3", + and M is irq number of "sdma4" in the /proc/interrupts file. + ports/1/ CCMgtA/ cc_settings_bin - CCA tables used by PSM2 diff --git a/Documentation/ioctl/botching-up-ioctls.txt b/Documentation/ioctl/botching-up-ioctls.txt index cc30b14791cb..36138c632f7a 100644 --- a/Documentation/ioctl/botching-up-ioctls.txt +++ b/Documentation/ioctl/botching-up-ioctls.txt @@ -34,15 +34,18 @@ will need to add a a 32-bit compat layer: 64-bit platforms do. So we always need padding to the natural size to get this right. - * Pad the entire struct to a multiple of 64-bits - the structure size will - otherwise differ on 32-bit versus 64-bit. Having a different structure size - hurts when passing arrays of structures to the kernel, or if the kernel - checks the structure size, which e.g. the drm core does. + * Pad the entire struct to a multiple of 64-bits if the structure contains + 64-bit types - the structure size will otherwise differ on 32-bit versus + 64-bit. Having a different structure size hurts when passing arrays of + structures to the kernel, or if the kernel checks the structure size, which + e.g. the drm core does. * Pointers are __u64, cast from/to a uintprt_t on the userspace side and from/to a void __user * in the kernel. Try really hard not to delay this conversion or worse, fiddle the raw __u64 through your code since that - diminishes the checking tools like sparse can provide. + diminishes the checking tools like sparse can provide. The macro + u64_to_user_ptr can be used in the kernel to avoid warnings about integers + and pointres of different sizes. Basics diff --git a/Documentation/kasan.txt b/Documentation/kasan.txt deleted file mode 100644 index 7dd95b35cd7c..000000000000 --- a/Documentation/kasan.txt +++ /dev/null @@ -1,171 +0,0 @@ -KernelAddressSanitizer (KASAN) -============================== - -0. Overview -=========== - -KernelAddressSANitizer (KASAN) is a dynamic memory error detector. It provides -a fast and comprehensive solution for finding use-after-free and out-of-bounds -bugs. - -KASAN uses compile-time instrumentation for checking every memory access, -therefore you will need a GCC version 4.9.2 or later. GCC 5.0 or later is -required for detection of out-of-bounds accesses to stack or global variables. - -Currently KASAN is supported only for x86_64 architecture. - -1. Usage -======== - -To enable KASAN configure kernel with: - - CONFIG_KASAN = y - -and choose between CONFIG_KASAN_OUTLINE and CONFIG_KASAN_INLINE. Outline and -inline are compiler instrumentation types. The former produces smaller binary -the latter is 1.1 - 2 times faster. Inline instrumentation requires a GCC -version 5.0 or later. - -KASAN works with both SLUB and SLAB memory allocators. -For better bug detection and nicer reporting, enable CONFIG_STACKTRACE. - -To disable instrumentation for specific files or directories, add a line -similar to the following to the respective kernel Makefile: - - For a single file (e.g. main.o): - KASAN_SANITIZE_main.o := n - - For all files in one directory: - KASAN_SANITIZE := n - -1.1 Error reports -================= - -A typical out of bounds access report looks like this: - -================================================================== -BUG: AddressSanitizer: out of bounds access in kmalloc_oob_right+0x65/0x75 [test_kasan] at addr ffff8800693bc5d3 -Write of size 1 by task modprobe/1689 -============================================================================= -BUG kmalloc-128 (Not tainted): kasan error ------------------------------------------------------------------------------ - -Disabling lock debugging due to kernel taint -INFO: Allocated in kmalloc_oob_right+0x3d/0x75 [test_kasan] age=0 cpu=0 pid=1689 - __slab_alloc+0x4b4/0x4f0 - kmem_cache_alloc_trace+0x10b/0x190 - kmalloc_oob_right+0x3d/0x75 [test_kasan] - init_module+0x9/0x47 [test_kasan] - do_one_initcall+0x99/0x200 - load_module+0x2cb3/0x3b20 - SyS_finit_module+0x76/0x80 - system_call_fastpath+0x12/0x17 -INFO: Slab 0xffffea0001a4ef00 objects=17 used=7 fp=0xffff8800693bd728 flags=0x100000000004080 -INFO: Object 0xffff8800693bc558 @offset=1368 fp=0xffff8800693bc720 - -Bytes b4 ffff8800693bc548: 00 00 00 00 00 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a ........ZZZZZZZZ -Object ffff8800693bc558: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk -Object ffff8800693bc568: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk -Object ffff8800693bc578: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk -Object ffff8800693bc588: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk -Object ffff8800693bc598: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk -Object ffff8800693bc5a8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk -Object ffff8800693bc5b8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk -Object ffff8800693bc5c8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 kkkkkkkkkkkkkkk. -Redzone ffff8800693bc5d8: cc cc cc cc cc cc cc cc ........ -Padding ffff8800693bc718: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ -CPU: 0 PID: 1689 Comm: modprobe Tainted: G B 3.18.0-rc1-mm1+ #98 -Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014 - ffff8800693bc000 0000000000000000 ffff8800693bc558 ffff88006923bb78 - ffffffff81cc68ae 00000000000000f3 ffff88006d407600 ffff88006923bba8 - ffffffff811fd848 ffff88006d407600 ffffea0001a4ef00 ffff8800693bc558 -Call Trace: - [<ffffffff81cc68ae>] dump_stack+0x46/0x58 - [<ffffffff811fd848>] print_trailer+0xf8/0x160 - [<ffffffffa00026a7>] ? kmem_cache_oob+0xc3/0xc3 [test_kasan] - [<ffffffff811ff0f5>] object_err+0x35/0x40 - [<ffffffffa0002065>] ? kmalloc_oob_right+0x65/0x75 [test_kasan] - [<ffffffff8120b9fa>] kasan_report_error+0x38a/0x3f0 - [<ffffffff8120a79f>] ? kasan_poison_shadow+0x2f/0x40 - [<ffffffff8120b344>] ? kasan_unpoison_shadow+0x14/0x40 - [<ffffffff8120a79f>] ? kasan_poison_shadow+0x2f/0x40 - [<ffffffffa00026a7>] ? kmem_cache_oob+0xc3/0xc3 [test_kasan] - [<ffffffff8120a995>] __asan_store1+0x75/0xb0 - [<ffffffffa0002601>] ? kmem_cache_oob+0x1d/0xc3 [test_kasan] - [<ffffffffa0002065>] ? kmalloc_oob_right+0x65/0x75 [test_kasan] - [<ffffffffa0002065>] kmalloc_oob_right+0x65/0x75 [test_kasan] - [<ffffffffa00026b0>] init_module+0x9/0x47 [test_kasan] - [<ffffffff810002d9>] do_one_initcall+0x99/0x200 - [<ffffffff811e4e5c>] ? __vunmap+0xec/0x160 - [<ffffffff81114f63>] load_module+0x2cb3/0x3b20 - [<ffffffff8110fd70>] ? m_show+0x240/0x240 - [<ffffffff81115f06>] SyS_finit_module+0x76/0x80 - [<ffffffff81cd3129>] system_call_fastpath+0x12/0x17 -Memory state around the buggy address: - ffff8800693bc300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc - ffff8800693bc380: fc fc 00 00 00 00 00 00 00 00 00 00 00 00 00 fc - ffff8800693bc400: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc - ffff8800693bc480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc - ffff8800693bc500: fc fc fc fc fc fc fc fc fc fc fc 00 00 00 00 00 ->ffff8800693bc580: 00 00 00 00 00 00 00 00 00 00 03 fc fc fc fc fc - ^ - ffff8800693bc600: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc - ffff8800693bc680: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc - ffff8800693bc700: fc fc fc fc fb fb fb fb fb fb fb fb fb fb fb fb - ffff8800693bc780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb - ffff8800693bc800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb -================================================================== - -The header of the report discribe what kind of bug happened and what kind of -access caused it. It's followed by the description of the accessed slub object -(see 'SLUB Debug output' section in Documentation/vm/slub.txt for details) and -the description of the accessed memory page. - -In the last section the report shows memory state around the accessed address. -Reading this part requires some understanding of how KASAN works. - -The state of each 8 aligned bytes of memory is encoded in one shadow byte. -Those 8 bytes can be accessible, partially accessible, freed or be a redzone. -We use the following encoding for each shadow byte: 0 means that all 8 bytes -of the corresponding memory region are accessible; number N (1 <= N <= 7) means -that the first N bytes are accessible, and other (8 - N) bytes are not; -any negative value indicates that the entire 8-byte word is inaccessible. -We use different negative values to distinguish between different kinds of -inaccessible memory like redzones or freed memory (see mm/kasan/kasan.h). - -In the report above the arrows point to the shadow byte 03, which means that -the accessed address is partially accessible. - - -2. Implementation details -========================= - -From a high level, our approach to memory error detection is similar to that -of kmemcheck: use shadow memory to record whether each byte of memory is safe -to access, and use compile-time instrumentation to check shadow memory on each -memory access. - -AddressSanitizer dedicates 1/8 of kernel memory to its shadow memory -(e.g. 16TB to cover 128TB on x86_64) and uses direct mapping with a scale and -offset to translate a memory address to its corresponding shadow address. - -Here is the function which translates an address to its corresponding shadow -address: - -static inline void *kasan_mem_to_shadow(const void *addr) -{ - return ((unsigned long)addr >> KASAN_SHADOW_SCALE_SHIFT) - + KASAN_SHADOW_OFFSET; -} - -where KASAN_SHADOW_SCALE_SHIFT = 3. - -Compile-time instrumentation used for checking memory accesses. Compiler inserts -function calls (__asan_load*(addr), __asan_store*(addr)) before each memory -access of size 1, 2, 4, 8 or 16. These functions check whether memory access is -valid or not by checking corresponding shadow memory. - -GCC 5.0 has possibility to perform inline instrumentation. Instead of making -function calls GCC directly inserts the code to check the shadow memory. -This option significantly enlarges kernel but it gives x1.1-x2 performance -boost over outline instrumented kernel. diff --git a/Documentation/kbuild/kconfig-language.txt b/Documentation/kbuild/kconfig-language.txt index db101857b2c9..069fcb3eef6e 100644 --- a/Documentation/kbuild/kconfig-language.txt +++ b/Documentation/kbuild/kconfig-language.txt @@ -274,7 +274,44 @@ menuconfig: This is similar to the simple config entry above, but it also gives a hint to front ends, that all suboptions should be displayed as a -separate list of options. +separate list of options. To make sure all the suboptions will really +show up under the menuconfig entry and not outside of it, every item +from the <config options> list must depend on the menuconfig symbol. +In practice, this is achieved by using one of the next two constructs: + +(1): +menuconfig M +if M + config C1 + config C2 +endif + +(2): +menuconfig M +config C1 + depends on M +config C2 + depends on M + +In the following examples (3) and (4), C1 and C2 still have the M +dependency, but will not appear under menuconfig M anymore, because +of C0, which doesn't depend on M: + +(3): +menuconfig M + config C0 +if M + config C1 + config C2 +endif + +(4): +menuconfig M +config C0 +config C1 + depends on M +config C2 + depends on M choices: diff --git a/Documentation/kdump/kdump.txt b/Documentation/kdump/kdump.txt index 88ff63d5fde3..b0eb27b956d9 100644 --- a/Documentation/kdump/kdump.txt +++ b/Documentation/kdump/kdump.txt @@ -393,6 +393,15 @@ Notes on loading the dump-capture kernel: * We generally don' have to bring up a SMP kernel just to capture the dump. Hence generally it is useful either to build a UP dump-capture kernel or specify maxcpus=1 option while loading dump-capture kernel. + Note, though maxcpus always works, you had better replace it with + nr_cpus to save memory if supported by the current ARCH, such as x86. + +* You should enable multi-cpu support in dump-capture kernel if you intend + to use multi-thread programs with it, such as parallel dump feature of + makedumpfile. Otherwise, the multi-thread program may have a great + performance degradation. To enable multi-cpu support, you should bring up an + SMP dump-capture kernel and specify maxcpus/nr_cpus, disable_cpu_apicid=[X] + options while loading it. * For s390x there are two kdump modes: If a ELF header is specified with the elfcorehdr= kernel parameter, it is used by the kdump kernel as it diff --git a/Documentation/kernel-docs.txt b/Documentation/kernel-docs.txt index 1dafc52167b0..05a7857a4a83 100644 --- a/Documentation/kernel-docs.txt +++ b/Documentation/kernel-docs.txt @@ -1,731 +1,652 @@ +.. _kernel_docs: - Index of Documentation for People Interested in Writing and/or - - Understanding the Linux Kernel. +Index of Documentation for People Interested in Writing and/or Understanding the Linux Kernel +============================================================================================= Juan-Mariano de Goyeneche <jmseyas@dit.upm.es> -/* - * The latest version of this document may be found at: - * http://www.dit.upm.es/~jmseyas/linux/kernel/hackers-docs.html - */ - - The need for a document like this one became apparent in the - linux-kernel mailing list as the same questions, asking for pointers - to information, appeared again and again. - - Fortunately, as more and more people get to GNU/Linux, more and more - get interested in the Kernel. But reading the sources is not always - enough. It is easy to understand the code, but miss the concepts, the - philosophy and design decisions behind this code. - - Unfortunately, not many documents are available for beginners to - start. And, even if they exist, there was no "well-known" place which - kept track of them. These lines try to cover this lack. All documents - available on line known by the author are listed, while some reference - books are also mentioned. - - PLEASE, if you know any paper not listed here or write a new document, - send me an e-mail, and I'll include a reference to it here. Any - corrections, ideas or comments are also welcomed. - - The papers that follow are listed in no particular order. All are - cataloged with the following fields: the document's "Title", the - "Author"/s, the "URL" where they can be found, some "Keywords" helpful - when searching for specific topics, and a brief "Description" of the - Document. - - Enjoy! - - ON-LINE DOCS: - - * Title: "Linux Device Drivers, Third Edition" - Author: Jonathan Corbet, Alessandro Rubini, Greg Kroah-Hartman - URL: http://lwn.net/Kernel/LDD3/ - Description: A 600-page book covering the (2.6.10) driver - programming API and kernel hacking in general. Available under the - Creative Commons Attribution-ShareAlike 2.0 license. - - * Title: "The Linux Kernel" - Author: David A. Rusling. - URL: http://www.tldp.org/LDP/tlk/tlk.html - Keywords: everything!, book. - Description: On line, 200 pages book describing most aspects of - the Linux Kernel. Probably, the first reference for beginners. - Lots of illustrations explaining data structures use and - relationships in the purest Richard W. Stevens' style. Contents: - "1.-Hardware Basics, 2.-Software Basics, 3.-Memory Management, - 4.-Processes, 5.-Interprocess Communication Mechanisms, 6.-PCI, - 7.-Interrupts and Interrupt Handling, 8.-Device Drivers, 9.-The - File system, 10.-Networks, 11.-Kernel Mechanisms, 12.-Modules, - 13.-The Linux Kernel Sources, A.-Linux Data Structures, B.-The - Alpha AXP Processor, C.-Useful Web and FTP Sites, D.-The GNU - General Public License, Glossary". In short: a must have. - - * Title: "Linux Device Drivers, 2nd Edition" - Author: Alessandro Rubini and Jonathan Corbet. - URL: http://www.xml.com/ldd/chapter/book/index.html - Keywords: device drivers, modules, debugging, memory, hardware, - interrupt handling, char drivers, block drivers, kmod, mmap, DMA, - buses. - Description: O'Reilly's popular book, now also on-line under the - GNU Free Documentation License. - Notes: You can also buy it in paper-form from O'Reilly. See below - under BOOKS (Not on-line). - - * Title: "Conceptual Architecture of the Linux Kernel" - Author: Ivan T. Bowman. - URL: http://plg.uwaterloo.ca/ - Keywords: conceptual software architecture, extracted design, - reverse engineering, system structure. - Description: Conceptual software architecture of the Linux kernel, - automatically extracted from the source code. Very detailed. Good - figures. Gives good overall kernel understanding. - - * Title: "Concrete Architecture of the Linux Kernel" - Author: Ivan T. Bowman, Saheem Siddiqi, and Meyer C. Tanuan. - URL: http://plg.uwaterloo.ca/ - Keywords: concrete architecture, extracted design, reverse - engineering, system structure, dependencies. - Description: Concrete architecture of the Linux kernel, - automatically extracted from the source code. Very detailed. Good - figures. Gives good overall kernel understanding. This papers - focus on lower details than its predecessor (files, variables...). - - * Title: "Linux as a Case Study: Its Extracted Software - Architecture" - Author: Ivan T. Bowman, Richard C. Holt and Neil V. Brewster. - URL: http://plg.uwaterloo.ca/ - Keywords: software architecture, architecture recovery, - redocumentation. - Description: Paper appeared at ICSE'99, Los Angeles, May 16-22, - 1999. A mixture of the previous two documents from the same - author. - - * Title: "Overview of the Virtual File System" - Author: Richard Gooch. - URL: http://www.mjmwired.net/kernel/Documentation/filesystems/vfs.txt - Keywords: VFS, File System, mounting filesystems, opening files, - dentries, dcache. - Description: Brief introduction to the Linux Virtual File System. - What is it, how it works, operations taken when opening a file or - mounting a file system and description of important data - structures explaining the purpose of each of their entries. - - * Title: "The Linux RAID-1, 4, 5 Code" - Author: Ingo Molnar, Gadi Oxman and Miguel de Icaza. - URL: http://www.linuxjournal.com/article.php?sid=2391 - Keywords: RAID, MD driver. - Description: Linux Journal Kernel Korner article. Here is its - abstract: "A description of the implementation of the RAID-1, - RAID-4 and RAID-5 personalities of the MD device driver in the - Linux kernel, providing users with high performance and reliable, - secondary-storage capability using software". - - * Title: "Dynamic Kernels: Modularized Device Drivers" - Author: Alessandro Rubini. - URL: http://www.linuxjournal.com/article.php?sid=1219 - Keywords: device driver, module, loading/unloading modules, - allocating resources. - Description: Linux Journal Kernel Korner article. Here is its - abstract: "This is the first of a series of four articles - co-authored by Alessandro Rubini and Georg Zezchwitz which present - a practical approach to writing Linux device drivers as kernel - loadable modules. This installment presents an introduction to the - topic, preparing the reader to understand next month's - installment". - - * Title: "Dynamic Kernels: Discovery" - Author: Alessandro Rubini. - URL: http://www.linuxjournal.com/article.php?sid=1220 - Keywords: character driver, init_module, clean_up module, - autodetection, mayor number, minor number, file operations, - open(), close(). - Description: Linux Journal Kernel Korner article. Here is its - abstract: "This article, the second of four, introduces part of - the actual code to create custom module implementing a character - device driver. It describes the code for module initialization and - cleanup, as well as the open() and close() system calls". - - * Title: "The Devil's in the Details" - Author: Georg v. Zezschwitz and Alessandro Rubini. - URL: http://www.linuxjournal.com/article.php?sid=1221 - Keywords: read(), write(), select(), ioctl(), blocking/non - blocking mode, interrupt handler. - Description: Linux Journal Kernel Korner article. Here is its - abstract: "This article, the third of four on writing character - device drivers, introduces concepts of reading, writing, and using - ioctl-calls". - - * Title: "Dissecting Interrupts and Browsing DMA" - Author: Alessandro Rubini and Georg v. Zezschwitz. - URL: http://www.linuxjournal.com/article.php?sid=1222 - Keywords: interrupts, irqs, DMA, bottom halves, task queues. - Description: Linux Journal Kernel Korner article. Here is its - abstract: "This is the fourth in a series of articles about - writing character device drivers as loadable kernel modules. This - month, we further investigate the field of interrupt handling. - Though it is conceptually simple, practical limitations and - constraints make this an ``interesting'' part of device driver - writing, and several different facilities have been provided for - different situations. We also investigate the complex topic of - DMA". - - * Title: "Device Drivers Concluded" - Author: Georg v. Zezschwitz. - URL: http://www.linuxjournal.com/article.php?sid=1287 - Keywords: address spaces, pages, pagination, page management, - demand loading, swapping, memory protection, memory mapping, mmap, - virtual memory areas (VMAs), vremap, PCI. - Description: Finally, the above turned out into a five articles - series. This latest one's introduction reads: "This is the last of - five articles about character device drivers. In this final - section, Georg deals with memory mapping devices, beginning with - an overall description of the Linux memory management concepts". - - * Title: "Network Buffers And Memory Management" - Author: Alan Cox. - URL: http://www.linuxjournal.com/article.php?sid=1312 - Keywords: sk_buffs, network devices, protocol/link layer - variables, network devices flags, transmit, receive, - configuration, multicast. - Description: Linux Journal Kernel Korner. Here is the abstract: - "Writing a network device driver for Linux is fundamentally - simple---most of the complexity (other than talking to the - hardware) involves managing network packets in memory". - - * Title: "Linux Kernel Hackers' Guide" - Author: Michael K. Johnson. - URL: http://www.tldp.org/LDP/khg/HyperNews/get/khg.html - Keywords: device drivers, files, VFS, kernel interface, character vs - block devices, hardware interrupts, scsi, DMA, access to user memory, - memory allocation, timers. - Description: A guide designed to help you get up to speed on the - concepts that are not intuitevly obvious, and to document the internal - structures of Linux. - - * Title: "The Venus kernel interface" - Author: Peter J. Braam. - URL: - http://www.coda.cs.cmu.edu/doc/html/kernel-venus-protocol.html - Keywords: coda, filesystem, venus, cache manager. - Description: "This document describes the communication between - Venus and kernel level file system code needed for the operation - of the Coda filesystem. This version document is meant to describe - the current interface (version 1.0) as well as improvements we - envisage". - - * Title: "Programming PCI-Devices under Linux" - Author: Claus Schroeter. - URL: - ftp://ftp.llp.fu-berlin.de/pub/linux/LINUX-LAB/whitepapers/pcip.ps.gz - Keywords: PCI, device, busmastering. - Description: 6 pages tutorial on PCI programming under Linux. - Gives the basic concepts on the architecture of the PCI subsystem, - as long as basic functions and macros to read/write the devices - and perform busmastering. - - * Title: "Writing Character Device Driver for Linux" - Author: R. Baruch and C. Schroeter. - URL: - ftp://ftp.llp.fu-berlin.de/pub/linux/LINUX-LAB/whitepapers/drivers.ps.gz - Keywords: character device drivers, I/O, signals, DMA, accessing - ports in user space, kernel environment. - Description: 68 pages paper on writing character drivers. A little - bit old (1.993, 1.994) although still useful. - - * Title: "Design and Implementation of the Second Extended - Filesystem" - Author: Rรฉmy Card, Theodore Ts'o, Stephen Tweedie. - URL: http://web.mit.edu/tytso/www/linux/ext2intro.html - Keywords: ext2, linux fs history, inode, directory, link, devices, - VFS, physical structure, performance, benchmarks, ext2fs library, - ext2fs tools, e2fsck. - Description: Paper written by three of the top ext2 hackers. - Covers Linux filesystems history, ext2 motivation, ext2 features, - design, physical structure on disk, performance, benchmarks, - e2fsck's passes description... A must read! - Notes: This paper was first published in the Proceedings of the - First Dutch International Symposium on Linux, ISBN 90-367-0385-9. - - * Title: "Analysis of the Ext2fs structure" - Author: Louis-Dominique Dubeau. - URL: http://teaching.csse.uwa.edu.au/units/CITS2002/fs-ext2/ - Keywords: ext2, filesystem, ext2fs. - Description: Description of ext2's blocks, directories, inodes, - bitmaps, invariants... - - * Title: "Journaling the Linux ext2fs Filesystem" - Author: Stephen C. Tweedie. - URL: - ftp://ftp.uk.linux.org/pub/linux/sct/fs/jfs/journal-design.ps.gz - Keywords: ext3, journaling. - Description: Excellent 8-pages paper explaining the journaling - capabilities added to ext2 by the author, showing different - problems faced and the alternatives chosen. - - * Title: "Kernel API changes from 2.0 to 2.2" - Author: Richard Gooch. - URL: http://www.safe-mbox.com/~rgooch/linux/docs/porting-to-2.2.html - Keywords: 2.2, changes. - Description: Kernel functions/structures/variables which changed - from 2.0.x to 2.2.x. - - * Title: "Kernel API changes from 2.2 to 2.4" - Author: Richard Gooch. - URL: http://www.safe-mbox.com/~rgooch/linux/docs/porting-to-2.4.html - Keywords: 2.4, changes. - Description: Kernel functions/structures/variables which changed - from 2.2.x to 2.4.x. - - * Title: "Linux Kernel Module Programming Guide" - Author: Ori Pomerantz. - URL: http://tldp.org/LDP/lkmpg/2.6/html/index.html - Keywords: modules, GPL book, /proc, ioctls, system calls, - interrupt handlers . - Description: Very nice 92 pages GPL book on the topic of modules - programming. Lots of examples. - - * Title: "I/O Event Handling Under Linux" - Author: Richard Gooch. - Keywords: IO, I/O, select(2), poll(2), FDs, aio_read(2), readiness - event queues. - Description: From the Introduction: "I/O Event handling is about - how your Operating System allows you to manage a large number of - open files (file descriptors in UNIX/POSIX, or FDs) in your - application. You want the OS to notify you when FDs become active - (have data ready to be read or are ready for writing). Ideally you - want a mechanism that is scalable. This means a large number of - inactive FDs cost very little in memory and CPU time to manage". - - * Title: "The Kernel Hacking HOWTO" - Author: Various Talented People, and Rusty. - Location: in kernel tree, Documentation/DocBook/kernel-hacking.tmpl - (must be built as "make {htmldocs | psdocs | pdfdocs}) - Keywords: HOWTO, kernel contexts, deadlock, locking, modules, - symbols, return conventions. - Description: From the Introduction: "Please understand that I - never wanted to write this document, being grossly underqualified, - but I always wanted to read it, and this was the only way. I - simply explain some best practices, and give reading entry-points - into the kernel sources. I avoid implementation details: that's - what the code is for, and I ignore whole tracts of useful - routines. This document assumes familiarity with C, and an - understanding of what the kernel is, and how it is used. It was - originally written for the 2.3 kernels, but nearly all of it - applies to 2.2 too; 2.0 is slightly different". - - * Title: "Writing an ALSA Driver" - Author: Takashi Iwai <tiwai@suse.de> - URL: http://www.alsa-project.org/~iwai/writing-an-alsa-driver/index.html - Keywords: ALSA, sound, soundcard, driver, lowlevel, hardware. - Description: Advanced Linux Sound Architecture for developers, - both at kernel and user-level sides. ALSA is the Linux kernel - sound architecture in the 2.6 kernel version. - - * Title: "Programming Guide for Linux USB Device Drivers" - Author: Detlef Fliegl. - URL: http://usb.in.tum.de/usbdoc/ - Keywords: USB, universal serial bus. - Description: A must-read. From the Preface: "This document should - give detailed information about the current state of the USB - subsystem and its API for USB device drivers. The first section - will deal with the basics of USB devices. You will learn about - different types of devices and their properties. Going into detail - you will see how USB devices communicate on the bus. The second - section gives an overview of the Linux USB subsystem [2] and the - device driver framework. Then the API and its data structures will - be explained step by step. The last section of this document - contains a reference of all API calls and their return codes". - Notes: Beware: the main page states: "This document may not be - published, printed or used in excerpts without explicit permission - of the author". Fortunately, it may still be read... - - * Title: "Linux Kernel Mailing List Glossary" - Author: various - URL: http://kernelnewbies.org/glossary/ - Keywords: glossary, terms, linux-kernel. - Description: From the introduction: "This glossary is intended as - a brief description of some of the acronyms and terms you may hear - during discussion of the Linux kernel". - - * Title: "Linux Kernel Locking HOWTO" - Author: Various Talented People, and Rusty. - Location: in kernel tree, Documentation/DocBook/kernel-locking.tmpl - (must be built as "make {htmldocs | psdocs | pdfdocs}) - Keywords: locks, locking, spinlock, semaphore, atomic, race - condition, bottom halves, tasklets, softirqs. - Description: The title says it all: document describing the - locking system in the Linux Kernel either in uniprocessor or SMP - systems. - Notes: "It was originally written for the later (>2.3.47) 2.3 - kernels, but most of it applies to 2.2 too; 2.0 is slightly - different". Freely redistributable under the conditions of the GNU - General Public License. - - * Title: "Global spinlock list and usage" - Author: Rick Lindsley. - URL: http://lse.sourceforge.net/lockhier/global-spin-lock - Keywords: spinlock. - Description: This is an attempt to document both the existence and - usage of the spinlocks in the Linux 2.4.5 kernel. Comprehensive - list of spinlocks showing when they are used, which functions - access them, how each lock is acquired, under what conditions it - is held, whether interrupts can occur or not while it is held... - - * Title: "Porting Linux 2.0 Drivers To Linux 2.2: Changes and New - Features " - Author: Alan Cox. - URL: http://www.linux-mag.com/1999-05/gear_01.html - Keywords: ports, porting. - Description: Article from Linux Magazine on porting from 2.0 to - 2.2 kernels. - - * Title: "Porting Device Drivers To Linux 2.2: part II" - Author: Alan Cox. - URL: http://www.linux-mag.com/id/238 - Keywords: ports, porting. - Description: Second part on porting from 2.0 to 2.2 kernels. - - * Title: "How To Make Sure Your Driver Will Work On The Power - Macintosh" - Author: Paul Mackerras. - URL: http://www.linux-mag.com/id/261 - Keywords: Mac, Power Macintosh, porting, drivers, compatibility. - Description: The title says it all. - - * Title: "An Introduction to SCSI Drivers" - Author: Alan Cox. - URL: http://www.linux-mag.com/id/284 - Keywords: SCSI, device, driver. - Description: The title says it all. - - * Title: "Advanced SCSI Drivers And Other Tales" - Author: Alan Cox. - URL: http://www.linux-mag.com/id/307 - Keywords: SCSI, device, driver, advanced. - Description: The title says it all. - - * Title: "Writing Linux Mouse Drivers" - Author: Alan Cox. - URL: http://www.linux-mag.com/id/330 - Keywords: mouse, driver, gpm. - Description: The title says it all. - - * Title: "More on Mouse Drivers" - Author: Alan Cox. - URL: http://www.linux-mag.com/id/356 - Keywords: mouse, driver, gpm, races, asynchronous I/O. - Description: The title still says it all. - - * Title: "Writing Video4linux Radio Driver" - Author: Alan Cox. - URL: http://www.linux-mag.com/id/381 - Keywords: video4linux, driver, radio, radio devices. - Description: The title says it all. - - * Title: "Video4linux Drivers, Part 1: Video-Capture Device" - Author: Alan Cox. - URL: http://www.linux-mag.com/id/406 - Keywords: video4linux, driver, video capture, capture devices, - camera driver. - Description: The title says it all. - - * Title: "Video4linux Drivers, Part 2: Video-capture Devices" - Author: Alan Cox. - URL: http://www.linux-mag.com/id/429 - Keywords: video4linux, driver, video capture, capture devices, - camera driver, control, query capabilities, capability, facility. - Description: The title says it all. - - * Title: "PCI Management in Linux 2.2" - Author: Alan Cox. - URL: http://www.linux-mag.com/id/452 - Keywords: PCI, bus, bus-mastering. - Description: The title says it all. - - * Title: "Linux 2.4 Kernel Internals" - Author: Tigran Aivazian and Christoph Hellwig. - URL: http://www.moses.uklinux.net/patches/lki.html - Keywords: Linux, kernel, booting, SMB boot, VFS, page cache. - Description: A little book used for a short training course. - Covers building the kernel image, booting (including SMP bootup), - process management, VFS and more. - - * Title: "Linux IP Networking. A Guide to the Implementation and - Modification of the Linux Protocol Stack." - Author: Glenn Herrin. - URL: http://www.cs.unh.edu/cnrg/gherrin - Keywords: network, networking, protocol, IP, UDP, TCP, connection, - socket, receiving, transmitting, forwarding, routing, packets, - modules, /proc, sk_buff, FIB, tags. - Description: Excellent paper devoted to the Linux IP Networking, - explaining anything from the kernel's to the user space - configuration tools' code. Very good to get a general overview of - the kernel networking implementation and understand all steps - packets follow from the time they are received at the network - device till they are delivered to applications. The studied kernel - code is from 2.2.14 version. Provides code for a working packet - dropper example. - - * Title: "Get those boards talking under Linux." - Author: Alex Ivchenko. - URL: http://www.edn.com/article/CA46968.html - Keywords: data-acquisition boards, drivers, modules, interrupts, - memory allocation. - Description: Article written for people wishing to make their data - acquisition boards work on their GNU/Linux machines. Gives a basic - overview on writing drivers, from the naming of functions to - interrupt handling. - Notes: Two-parts article. Part II is at - URL: http://www.edn.com/article/CA46998.html - - * Title: "Linux PCMCIA Programmer's Guide" - Author: David Hinds. - URL: http://pcmcia-cs.sourceforge.net/ftp/doc/PCMCIA-PROG.html - Keywords: PCMCIA. - Description: "This document describes how to write kernel device - drivers for the Linux PCMCIA Card Services interface. It also - describes how to write user-mode utilities for communicating with - Card Services. - - * Title: "The Linux Kernel NFSD Implementation" - Author: Neil Brown. - URL: - http://www.cse.unsw.edu.au/~neilb/oss/linux-commentary/nfsd.html - Keywords: knfsd, nfsd, NFS, RPC, lockd, mountd, statd. - Description: The title says it all. - Notes: Covers knfsd's version 1.4.7 (patch against 2.2.7 kernel). - - * Title: "A Linux vm README" - Author: Kanoj Sarcar. - URL: http://kos.enix.org/pub/linux-vmm.html - Keywords: virtual memory, mm, pgd, vma, page, page flags, page - cache, swap cache, kswapd. - Description: Telegraphic, short descriptions and definitions - relating the Linux virtual memory implementation. - - * Title: "(nearly) Complete Linux Loadable Kernel Modules. The - definitive guide for hackers, virus coders and system - administrators." - Author: pragmatic/THC. - URL: http://packetstormsecurity.org/docs/hack/LKM_HACKING.html - Keywords: syscalls, intercept, hide, abuse, symbol table. - Description: Interesting paper on how to abuse the Linux kernel in - order to intercept and modify syscalls, make - files/directories/processes invisible, become root, hijack ttys, - write kernel modules based virus... and solutions for admins to - avoid all those abuses. - Notes: For 2.0.x kernels. Gives guidances to port it to 2.2.x - kernels. - - BOOKS: (Not on-line) - - * Title: "Linux Device Drivers" - Author: Alessandro Rubini. - Publisher: O'Reilly & Associates. - Date: 1998. - Pages: 439. - ISBN: 1-56592-292-1 - - * Title: "Linux Device Drivers, 2nd Edition" - Author: Alessandro Rubini and Jonathan Corbet. - Publisher: O'Reilly & Associates. - Date: 2001. - Pages: 586. - ISBN: 0-59600-008-1 - Notes: Further information in - http://www.oreilly.com/catalog/linuxdrive2/ - - * Title: "Linux Device Drivers, 3rd Edition" - Authors: Jonathan Corbet, Alessandro Rubini, and Greg Kroah-Hartman - Publisher: O'Reilly & Associates. - Date: 2005. - Pages: 636. - ISBN: 0-596-00590-3 - Notes: Further information in - http://www.oreilly.com/catalog/linuxdrive3/ - PDF format, URL: http://lwn.net/Kernel/LDD3/ - - * Title: "Linux Kernel Internals" - Author: Michael Beck. - Publisher: Addison-Wesley. - Date: 1997. - ISBN: 0-201-33143-8 (second edition) - - * Title: "The Design of the UNIX Operating System" - Author: Maurice J. Bach. - Publisher: Prentice Hall. - Date: 1986. - Pages: 471. - ISBN: 0-13-201757-1 - - * Title: "The Design and Implementation of the 4.3 BSD UNIX - Operating System" - Author: Samuel J. Leffler, Marshall Kirk McKusick, Michael J. - Karels, John S. Quarterman. - Publisher: Addison-Wesley. - Date: 1989 (reprinted with corrections on October, 1990). - ISBN: 0-201-06196-1 - - * Title: "The Design and Implementation of the 4.4 BSD UNIX - Operating System" - Author: Marshall Kirk McKusick, Keith Bostic, Michael J. Karels, - John S. Quarterman. - Publisher: Addison-Wesley. - Date: 1996. - ISBN: 0-201-54979-4 - - * Title: "Programmation Linux 2.0 API systeme et fonctionnement du - noyau" - Author: Remy Card, Eric Dumas, Franck Mevel. - Publisher: Eyrolles. - Date: 1997. - Pages: 520. - ISBN: 2-212-08932-5 - Notes: French. - - * Title: "Unix internals -- the new frontiers" - Author: Uresh Vahalia. - Publisher: Prentice Hall. - Date: 1996. - Pages: 600. - ISBN: 0-13-101908-2 - - * Title: "Programming for the real world - POSIX.4" - Author: Bill O. Gallmeister. - Publisher: O'Reilly & Associates, Inc.. - Date: 1995. - Pages: ???. - ISBN: I-56592-074-0 - Notes: Though not being directly about Linux, Linux aims to be - POSIX. Good reference. - - * Title: "UNIX Systems for Modern Architectures: Symmetric - Multiprocessing and Caching for Kernel Programmers" - Author: Curt Schimmel. - Publisher: Addison Wesley. - Date: June, 1994. - Pages: 432. - ISBN: 0-201-63338-8 - - * Title: "Linux Kernel Development, 3rd Edition" - Author: Robert Love - Publisher: Addison-Wesley. - Date: July, 2010 - Pages: 440 - ISBN: 978-0672329463 - - MISCELLANEOUS: - - * Name: linux/Documentation - Author: Many. - URL: Just look inside your kernel sources. - Keywords: anything, DocBook. - Description: Documentation that comes with the kernel sources, - inside the Documentation directory. Some pages from this document - (including this document itself) have been moved there, and might - be more up to date than the web version. - - * Name: "Linux Kernel Source Reference" - Author: Thomas Graichen. - URL: http://marc.info/?l=linux-kernel&m=96446640102205&w=4 - Keywords: CVS, web, cvsweb, browsing source code. - Description: Web interface to a CVS server with the kernel - sources. "Here you can have a look at any file of the Linux kernel - sources of any version starting from 1.0 up to the (daily updated) - current version available. Also you can check the differences - between two versions of a file". - - * Name: "Cross-Referencing Linux" - URL: http://lxr.free-electrons.com/ - Keywords: Browsing source code. - Description: Another web-based Linux kernel source code browser. - Lots of cross references to variables and functions. You can see - where they are defined and where they are used. - - * Name: "Linux Weekly News" - URL: http://lwn.net - Keywords: latest kernel news. - Description: The title says it all. There's a fixed kernel section - summarizing developers' work, bug fixes, new features and versions - produced during the week. Published every Thursday. - - * Name: "Kernel Traffic" - URL: http://kt.earth.li/kernel-traffic/index.html - Keywords: linux-kernel mailing list, weekly kernel news. - Description: Weekly newsletter covering the most relevant - discussions of the linux-kernel mailing list. - - * Name: "CuTTiNG.eDGe.LiNuX" - URL: http://edge.kernelnotes.org - Keywords: changelist. - Description: Site which provides the changelist for every kernel - release. What's new, what's better, what's changed. Myrdraal reads - the patches and describes them. Pointers to the patches are there, - too. - - * Name: "New linux-kernel Mailing List FAQ" - URL: http://www.tux.org/lkml/ - Keywords: linux-kernel mailing list FAQ. - Description: linux-kernel is a mailing list for developers to - communicate. This FAQ builds on the previous linux-kernel mailing - list FAQ maintained by Frohwalt Egerer, who no longer maintains - it. Read it to see how to join the mailing list. Dozens of - interesting questions regarding the list, Linux, developers (who - is ...?), terms (what is...?) are answered here too. Just read it. - - * Name: "Linux Virtual File System" - Author: Peter J. Braam. - URL: http://www.coda.cs.cmu.edu/doc/talks/linuxvfs/ - Keywords: slides, VFS, inode, superblock, dentry, dcache. - Description: Set of slides, presumably from a presentation on the - Linux VFS layer. Covers version 2.1.x, with dentries and the - dcache. - - * Name: "Gary's Encyclopedia - The Linux Kernel" - Author: Gary (I suppose...). - URL: http://slencyclopedia.berlios.de/index.html - Keywords: linux, community, everything! - Description: Gary's Encyclopedia exists to allow the rapid finding - of documentation and other information of interest to GNU/Linux - users. It has about 4000 links to external pages in 150 major - categories. This link is for kernel-specific links, documents, - sites... This list is now hosted by developer.Berlios.de, - but seems not to have been updated since sometime in 1999. - - * Name: "The home page of Linux-MM" - Author: The Linux-MM team. - URL: http://linux-mm.org/ - Keywords: memory management, Linux-MM, mm patches, TODO, docs, - mailing list. - Description: Site devoted to Linux Memory Management development. - Memory related patches, HOWTOs, links, mm developers... Don't miss - it if you are interested in memory management development! - - * Name: "Kernel Newbies IRC Channel and Website" - URL: http://www.kernelnewbies.org - Keywords: IRC, newbies, channel, asking doubts. - Description: #kernelnewbies on irc.oftc.net. - #kernelnewbies is an IRC network dedicated to the 'newbie' - kernel hacker. The audience mostly consists of people who are - learning about the kernel, working on kernel projects or - professional kernel hackers that want to help less seasoned kernel - people. - #kernelnewbies is on the OFTC IRC Network. - Try irc.oftc.net as your server and then /join #kernelnewbies. - The kernelnewbies website also hosts articles, documents, FAQs... - - * Name: "linux-kernel mailing list archives and search engines" - URL: http://vger.kernel.org/vger-lists.html - URL: http://www.uwsg.indiana.edu/hypermail/linux/kernel/index.html - URL: http://marc.theaimsgroup.com/?l=linux-kernel - URL: http://groups.google.com/group/mlist.linux.kernel - URL: http://www.cs.helsinki.fi/linux/linux-kernel/ - URL: http://www.lib.uaa.alaska.edu/linux-kernel/ - Keywords: linux-kernel, archives, search. - Description: Some of the linux-kernel mailing list archivers. If - you have a better/another one, please let me know. - _________________________________________________________________ - - Document last updated on Sat 2005-NOV-19 +The need for a document like this one became apparent in the +linux-kernel mailing list as the same questions, asking for pointers +to information, appeared again and again. + +Fortunately, as more and more people get to GNU/Linux, more and more +get interested in the Kernel. But reading the sources is not always +enough. It is easy to understand the code, but miss the concepts, the +philosophy and design decisions behind this code. + +Unfortunately, not many documents are available for beginners to +start. And, even if they exist, there was no "well-known" place which +kept track of them. These lines try to cover this lack. All documents +available on line known by the author are listed, while some reference +books are also mentioned. + +PLEASE, if you know any paper not listed here or write a new document, +send me an e-mail, and I'll include a reference to it here. Any +corrections, ideas or comments are also welcomed. + +The papers that follow are listed in no particular order. All are +cataloged with the following fields: the document's "Title", the +"Author"/s, the "URL" where they can be found, some "Keywords" helpful +when searching for specific topics, and a brief "Description" of the +Document. + +Enjoy! + +.. note:: + + The documents on each section of this document are ordered by its + published date, from the newest to the oldest. + +Docs at the Linux Kernel tree +----------------------------- + +The DocBook books should be built with ``make {htmldocs | psdocs | pdfdocs}``. +The Sphinx books should be built with ``make {htmldocs | pdfdocs | epubdocs}``. + + * Name: **linux/Documentation** + + :Author: Many. + :Location: Documentation/ + :Keywords: text files, Sphinx, DocBook. + :Description: Documentation that comes with the kernel sources, + inside the Documentation directory. Some pages from this document + (including this document itself) have been moved there, and might + be more up to date than the web version. + + * Title: **The Kernel Hacking HOWTO** + + :Author: Various Talented People, and Rusty. + :Location: Documentation/DocBook/kernel-hacking.tmpl + :Keywords: HOWTO, kernel contexts, deadlock, locking, modules, + symbols, return conventions. + :Description: From the Introduction: "Please understand that I + never wanted to write this document, being grossly underqualified, + but I always wanted to read it, and this was the only way. I + simply explain some best practices, and give reading entry-points + into the kernel sources. I avoid implementation details: that's + what the code is for, and I ignore whole tracts of useful + routines. This document assumes familiarity with C, and an + understanding of what the kernel is, and how it is used. It was + originally written for the 2.3 kernels, but nearly all of it + applies to 2.2 too; 2.0 is slightly different". + + * Title: **Linux Kernel Locking HOWTO** + + :Author: Various Talented People, and Rusty. + :Location: Documentation/DocBook/kernel-locking.tmpl + :Keywords: locks, locking, spinlock, semaphore, atomic, race + condition, bottom halves, tasklets, softirqs. + :Description: The title says it all: document describing the + locking system in the Linux Kernel either in uniprocessor or SMP + systems. + :Notes: "It was originally written for the later (>2.3.47) 2.3 + kernels, but most of it applies to 2.2 too; 2.0 is slightly + different". Freely redistributable under the conditions of the GNU + General Public License. + +On-line docs +------------ + + * Title: **Linux Kernel Mailing List Glossary** + + :Author: various + :URL: http://kernelnewbies.org/glossary/ + :Date: rolling version + :Keywords: glossary, terms, linux-kernel. + :Description: From the introduction: "This glossary is intended as + a brief description of some of the acronyms and terms you may hear + during discussion of the Linux kernel". + + * Title: **Tracing the Way of Data in a TCP Connection through the Linux Kernel** + + :Author: Richard Sailer + :URL: https://archive.org/details/linux_kernel_data_flow_short_paper + :Date: 2016 + :Keywords: Linux Kernel Networking, TCP, tracing, ftrace + :Description: A seminar paper explaining ftrace and how to use it for + understanding linux kernel internals, + illustrated at tracing the way of a TCP packet through the kernel. + :Abstract: *This short paper outlines the usage of ftrace a tracing framework + as a tool to understand a running Linux system. + Having obtained a trace-log a kernel hacker can read and understand + source code more determined and with context. + In a detailed example this approach is demonstrated in tracing + and the way of data in a TCP Connection through the kernel. + Finally this trace-log is used as base for more a exact conceptual + exploration and description of the Linux TCP/IP implementation.* + + * Title: **On submitting kernel Patches** + + :Author: Andi Kleen + :URL: http://halobates.de/on-submitting-kernel-patches.pdf + :Date: 2008 + :Keywords: patches, review process, types of submissions, basic rules, case studies + :Description: This paper gives several experience values on what types of patches + there are and how likley they get merged. + :Abstract: + [...]. This paper examines some common problems for + submitting larger changes and some strategies to avoid problems. + + * Title: **Overview of the Virtual File System** + + :Author: Richard Gooch. + :URL: http://www.mjmwired.net/kernel/Documentation/filesystems/vfs.txt + :Date: 2007 + :Keywords: VFS, File System, mounting filesystems, opening files, + dentries, dcache. + :Description: Brief introduction to the Linux Virtual File System. + What is it, how it works, operations taken when opening a file or + mounting a file system and description of important data + structures explaining the purpose of each of their entries. + + * Title: **Linux Device Drivers, Third Edition** + + :Author: Jonathan Corbet, Alessandro Rubini, Greg Kroah-Hartman + :URL: http://lwn.net/Kernel/LDD3/ + :Date: 2005 + :Description: A 600-page book covering the (2.6.10) driver + programming API and kernel hacking in general. Available under the + Creative Commons Attribution-ShareAlike 2.0 license. + :note: You can also :ref:`purchase a copy from O'Reilly or elsewhere <ldd3_published>`. + + * Title: **Writing an ALSA Driver** + + :Author: Takashi Iwai <tiwai@suse.de> + :URL: http://www.alsa-project.org/~iwai/writing-an-alsa-driver/index.html + :Date: 2005 + :Keywords: ALSA, sound, soundcard, driver, lowlevel, hardware. + :Description: Advanced Linux Sound Architecture for developers, + both at kernel and user-level sides. ALSA is the Linux kernel + sound architecture in the 2.6 kernel version. + + * Title: **Linux PCMCIA Programmer's Guide** + + :Author: David Hinds. + :URL: http://pcmcia-cs.sourceforge.net/ftp/doc/PCMCIA-PROG.html + :Date: 2003 + :Keywords: PCMCIA. + :Description: "This document describes how to write kernel device + drivers for the Linux PCMCIA Card Services interface. It also + describes how to write user-mode utilities for communicating with + Card Services. + + * Title: **Linux Kernel Module Programming Guide** + + :Author: Ori Pomerantz. + :URL: http://tldp.org/LDP/lkmpg/2.6/html/index.html + :Date: 2001 + :Keywords: modules, GPL book, /proc, ioctls, system calls, + interrupt handlers . + :Description: Very nice 92 pages GPL book on the topic of modules + programming. Lots of examples. + + * Title: **Global spinlock list and usage** + + :Author: Rick Lindsley. + :URL: http://lse.sourceforge.net/lockhier/global-spin-lock + :Date: 2001 + :Keywords: spinlock. + :Description: This is an attempt to document both the existence and + usage of the spinlocks in the Linux 2.4.5 kernel. Comprehensive + list of spinlocks showing when they are used, which functions + access them, how each lock is acquired, under what conditions it + is held, whether interrupts can occur or not while it is held... + + * Title: **A Linux vm README** + + :Author: Kanoj Sarcar. + :URL: http://kos.enix.org/pub/linux-vmm.html + :Date: 2001 + :Keywords: virtual memory, mm, pgd, vma, page, page flags, page + cache, swap cache, kswapd. + :Description: Telegraphic, short descriptions and definitions + relating the Linux virtual memory implementation. + + * Title: **Video4linux Drivers, Part 1: Video-Capture Device** + + :Author: Alan Cox. + :URL: http://www.linux-mag.com/id/406 + :Date: 2000 + :Keywords: video4linux, driver, video capture, capture devices, + camera driver. + :Description: The title says it all. + + * Title: **Video4linux Drivers, Part 2: Video-capture Devices** + + :Author: Alan Cox. + :URL: http://www.linux-mag.com/id/429 + :Date: 2000 + :Keywords: video4linux, driver, video capture, capture devices, + camera driver, control, query capabilities, capability, facility. + :Description: The title says it all. + + * Title: **Linux IP Networking. A Guide to the Implementation and Modification of the Linux Protocol Stack.** + + :Author: Glenn Herrin. + :URL: http://www.cs.unh.edu/cnrg/gherrin + :Date: 2000 + :Keywords: network, networking, protocol, IP, UDP, TCP, connection, + socket, receiving, transmitting, forwarding, routing, packets, + modules, /proc, sk_buff, FIB, tags. + :Description: Excellent paper devoted to the Linux IP Networking, + explaining anything from the kernel's to the user space + configuration tools' code. Very good to get a general overview of + the kernel networking implementation and understand all steps + packets follow from the time they are received at the network + device till they are delivered to applications. The studied kernel + code is from 2.2.14 version. Provides code for a working packet + dropper example. + + * Title: **How To Make Sure Your Driver Will Work On The Power Macintosh** + + :Author: Paul Mackerras. + :URL: http://www.linux-mag.com/id/261 + :Date: 1999 + :Keywords: Mac, Power Macintosh, porting, drivers, compatibility. + :Description: The title says it all. + + * Title: **An Introduction to SCSI Drivers** + + :Author: Alan Cox. + :URL: http://www.linux-mag.com/id/284 + :Date: 1999 + :Keywords: SCSI, device, driver. + :Description: The title says it all. + + * Title: **Advanced SCSI Drivers And Other Tales** + + :Author: Alan Cox. + :URL: http://www.linux-mag.com/id/307 + :Date: 1999 + :Keywords: SCSI, device, driver, advanced. + :Description: The title says it all. + + * Title: **Writing Linux Mouse Drivers** + + :Author: Alan Cox. + :URL: http://www.linux-mag.com/id/330 + :Date: 1999 + :Keywords: mouse, driver, gpm. + :Description: The title says it all. + + * Title: **More on Mouse Drivers** + + :Author: Alan Cox. + :URL: http://www.linux-mag.com/id/356 + :Date: 1999 + :Keywords: mouse, driver, gpm, races, asynchronous I/O. + :Description: The title still says it all. + + * Title: **Writing Video4linux Radio Driver** + + :Author: Alan Cox. + :URL: http://www.linux-mag.com/id/381 + :Date: 1999 + :Keywords: video4linux, driver, radio, radio devices. + :Description: The title says it all. + + * Title: **I/O Event Handling Under Linux** + + :Author: Richard Gooch. + :URL: http://web.mit.edu/~yandros/doc/io-events.html + :Date: 1999 + :Keywords: IO, I/O, select(2), poll(2), FDs, aio_read(2), readiness + event queues. + :Description: From the Introduction: "I/O Event handling is about + how your Operating System allows you to manage a large number of + open files (file descriptors in UNIX/POSIX, or FDs) in your + application. You want the OS to notify you when FDs become active + (have data ready to be read or are ready for writing). Ideally you + want a mechanism that is scalable. This means a large number of + inactive FDs cost very little in memory and CPU time to manage". + + * Title: **(nearly) Complete Linux Loadable Kernel Modules. The definitive guide for hackers, virus coders and system administrators.** + + :Author: pragmatic/THC. + :URL: http://packetstormsecurity.org/docs/hack/LKM_HACKING.html + :Date: 1999 + :Keywords: syscalls, intercept, hide, abuse, symbol table. + :Description: Interesting paper on how to abuse the Linux kernel in + order to intercept and modify syscalls, make + files/directories/processes invisible, become root, hijack ttys, + write kernel modules based virus... and solutions for admins to + avoid all those abuses. + :Notes: For 2.0.x kernels. Gives guidances to port it to 2.2.x + kernels. + + * Name: **Linux Virtual File System** + + :Author: Peter J. Braam. + :URL: http://www.coda.cs.cmu.edu/doc/talks/linuxvfs/ + :Date: 1998 + :Keywords: slides, VFS, inode, superblock, dentry, dcache. + :Description: Set of slides, presumably from a presentation on the + Linux VFS layer. Covers version 2.1.x, with dentries and the + dcache. + + * Title: **The Venus kernel interface** + + :Author: Peter J. Braam. + :URL: http://www.coda.cs.cmu.edu/doc/html/kernel-venus-protocol.html + :Date: 1998 + :Keywords: coda, filesystem, venus, cache manager. + :Description: "This document describes the communication between + Venus and kernel level file system code needed for the operation + of the Coda filesystem. This version document is meant to describe + the current interface (version 1.0) as well as improvements we + envisage". + + * Title: **Design and Implementation of the Second Extended Filesystem** + + :Author: Rรฉmy Card, Theodore Ts'o, Stephen Tweedie. + :URL: http://web.mit.edu/tytso/www/linux/ext2intro.html + :Date: 1998 + :Keywords: ext2, linux fs history, inode, directory, link, devices, + VFS, physical structure, performance, benchmarks, ext2fs library, + ext2fs tools, e2fsck. + :Description: Paper written by three of the top ext2 hackers. + Covers Linux filesystems history, ext2 motivation, ext2 features, + design, physical structure on disk, performance, benchmarks, + e2fsck's passes description... A must read! + :Notes: This paper was first published in the Proceedings of the + First Dutch International Symposium on Linux, ISBN 90-367-0385-9. + + * Title: **The Linux RAID-1, 4, 5 Code** + + :Author: Ingo Molnar, Gadi Oxman and Miguel de Icaza. + :URL: http://www.linuxjournal.com/article.php?sid=2391 + :Date: 1997 + :Keywords: RAID, MD driver. + :Description: Linux Journal Kernel Korner article. Here is its + :Abstract: *A description of the implementation of the RAID-1, + RAID-4 and RAID-5 personalities of the MD device driver in the + Linux kernel, providing users with high performance and reliable, + secondary-storage capability using software*. + + * Title: **Linux Kernel Hackers' Guide** + + :Author: Michael K. Johnson. + :URL: http://www.tldp.org/LDP/khg/HyperNews/get/khg.html + :Date: 1997 + :Keywords: device drivers, files, VFS, kernel interface, character vs + block devices, hardware interrupts, scsi, DMA, access to user memory, + memory allocation, timers. + :Description: A guide designed to help you get up to speed on the + concepts that are not intuitevly obvious, and to document the internal + structures of Linux. + + * Title: **Dynamic Kernels: Modularized Device Drivers** + + :Author: Alessandro Rubini. + :URL: http://www.linuxjournal.com/article.php?sid=1219 + :Date: 1996 + :Keywords: device driver, module, loading/unloading modules, + allocating resources. + :Description: Linux Journal Kernel Korner article. Here is its + :Abstract: *This is the first of a series of four articles + co-authored by Alessandro Rubini and Georg Zezchwitz which present + a practical approach to writing Linux device drivers as kernel + loadable modules. This installment presents an introduction to the + topic, preparing the reader to understand next month's + installment*. + + * Title: **Dynamic Kernels: Discovery** + + :Author: Alessandro Rubini. + :URL: http://www.linuxjournal.com/article.php?sid=1220 + :Date: 1996 + :Keywords: character driver, init_module, clean_up module, + autodetection, mayor number, minor number, file operations, + open(), close(). + :Description: Linux Journal Kernel Korner article. Here is its + :Abstract: *This article, the second of four, introduces part of + the actual code to create custom module implementing a character + device driver. It describes the code for module initialization and + cleanup, as well as the open() and close() system calls*. + + * Title: **The Devil's in the Details** + + :Author: Georg v. Zezschwitz and Alessandro Rubini. + :URL: http://www.linuxjournal.com/article.php?sid=1221 + :Date: 1996 + :Keywords: read(), write(), select(), ioctl(), blocking/non + blocking mode, interrupt handler. + :Description: Linux Journal Kernel Korner article. Here is its + :Abstract: *This article, the third of four on writing character + device drivers, introduces concepts of reading, writing, and using + ioctl-calls*. + + * Title: **Dissecting Interrupts and Browsing DMA** + + :Author: Alessandro Rubini and Georg v. Zezschwitz. + :URL: http://www.linuxjournal.com/article.php?sid=1222 + :Date: 1996 + :Keywords: interrupts, irqs, DMA, bottom halves, task queues. + :Description: Linux Journal Kernel Korner article. Here is its + :Abstract: *This is the fourth in a series of articles about + writing character device drivers as loadable kernel modules. This + month, we further investigate the field of interrupt handling. + Though it is conceptually simple, practical limitations and + constraints make this an ''interesting'' part of device driver + writing, and several different facilities have been provided for + different situations. We also investigate the complex topic of + DMA*. + + * Title: **Device Drivers Concluded** + + :Author: Georg v. Zezschwitz. + :URL: http://www.linuxjournal.com/article.php?sid=1287 + :Date: 1996 + :Keywords: address spaces, pages, pagination, page management, + demand loading, swapping, memory protection, memory mapping, mmap, + virtual memory areas (VMAs), vremap, PCI. + :Description: Finally, the above turned out into a five articles + series. This latest one's introduction reads: "This is the last of + five articles about character device drivers. In this final + section, Georg deals with memory mapping devices, beginning with + an overall description of the Linux memory management concepts". + + * Title: **Network Buffers And Memory Management** + + :Author: Alan Cox. + :URL: http://www.linuxjournal.com/article.php?sid=1312 + :Date: 1996 + :Keywords: sk_buffs, network devices, protocol/link layer + variables, network devices flags, transmit, receive, + configuration, multicast. + :Description: Linux Journal Kernel Korner. + :Abstract: *Writing a network device driver for Linux is fundamentally + simple---most of the complexity (other than talking to the + hardware) involves managing network packets in memory*. + + * Title: **Analysis of the Ext2fs structure** + + :Author: Louis-Dominique Dubeau. + :URL: http://teaching.csse.uwa.edu.au/units/CITS2002/fs-ext2/ + :Date: 1994 + :Keywords: ext2, filesystem, ext2fs. + :Description: Description of ext2's blocks, directories, inodes, + bitmaps, invariants... + +Published books +--------------- + + * Title: **Linux Treiber entwickeln** + + :Author: Jรผrgen Quade, Eva-Katharina Kunst + :Publisher: dpunkt.verlag + :Date: Oct 2015 (4th edition) + :Pages: 688 + :ISBN: 978-3-86490-288-8 + :Note: German. The third edition from 2011 is + much cheaper and still quite up-to-date. + + * Title: **Linux Kernel Networking: Implementation and Theory** + + :Author: Rami Rosen + :Publisher: Apress + :Date: December 22, 2013 + :Pages: 648 + :ISBN: 978-1430261964 + + * Title: **Embedded Linux Primer: A practical Real-World Approach, 2nd Edition** + + :Author: Christopher Hallinan + :Publisher: Pearson + :Date: November, 2010 + :Pages: 656 + :ISBN: 978-0137017836 + + * Title: **Linux Kernel Development, 3rd Edition** + + :Author: Robert Love + :Publisher: Addison-Wesley + :Date: July, 2010 + :Pages: 440 + :ISBN: 978-0672329463 + + * Title: **Essential Linux Device Drivers** + + :Author: Sreekrishnan Venkateswaran + :Published: Prentice Hall + :Date: April, 2008 + :Pages: 744 + :ISBN: 978-0132396554 + +.. _ldd3_published: + + * Title: **Linux Device Drivers, 3rd Edition** + + :Authors: Jonathan Corbet, Alessandro Rubini, and Greg Kroah-Hartman + :Publisher: O'Reilly & Associates + :Date: 2005 + :Pages: 636 + :ISBN: 0-596-00590-3 + :Notes: Further information in + http://www.oreilly.com/catalog/linuxdrive3/ + PDF format, URL: http://lwn.net/Kernel/LDD3/ + + * Title: **Linux Kernel Internals** + + :Author: Michael Beck + :Publisher: Addison-Wesley + :Date: 1997 + :ISBN: 0-201-33143-8 (second edition) + + * Title: **Programmation Linux 2.0 API systeme et fonctionnement du noyau** + + :Author: Remy Card, Eric Dumas, Franck Mevel + :Publisher: Eyrolles + :Date: 1997 + :Pages: 520 + :ISBN: 2-212-08932-5 + :Notes: French + + * Title: **The Design and Implementation of the 4.4 BSD UNIX Operating System** + + :Author: Marshall Kirk McKusick, Keith Bostic, Michael J. Karels, + John S. Quarterman + :Publisher: Addison-Wesley + :Date: 1996 + :ISBN: 0-201-54979-4 + + * Title: **Unix internals -- the new frontiers** + + :Author: Uresh Vahalia + :Publisher: Prentice Hall + :Date: 1996 + :Pages: 600 + :ISBN: 0-13-101908-2 + + * Title: **Programming for the real world - POSIX.4** + + :Author: Bill O. Gallmeister + :Publisher: O'Reilly & Associates, Inc + :Date: 1995 + :Pages: 552 + :ISBN: I-56592-074-0 + :Notes: Though not being directly about Linux, Linux aims to be + POSIX. Good reference. + + * Title: **UNIX Systems for Modern Architectures: Symmetric Multiprocessing and Caching for Kernel Programmers** + + :Author: Curt Schimmel + :Publisher: Addison Wesley + :Date: June, 1994 + :Pages: 432 + :ISBN: 0-201-63338-8 + + * Title: **The Design and Implementation of the 4.3 BSD UNIX Operating System** + + :Author: Samuel J. Leffler, Marshall Kirk McKusick, Michael J + Karels, John S. Quarterman + :Publisher: Addison-Wesley + :Date: 1989 (reprinted with corrections on October, 1990) + :ISBN: 0-201-06196-1 + + * Title: **The Design of the UNIX Operating System** + + :Author: Maurice J. Bach + :Publisher: Prentice Hall + :Date: 1986 + :Pages: 471 + :ISBN: 0-13-201757-1 + +Miscellaneous +------------- + + * Name: **Cross-Referencing Linux** + + :URL: http://lxr.free-electrons.com/ + :Keywords: Browsing source code. + :Description: Another web-based Linux kernel source code browser. + Lots of cross references to variables and functions. You can see + where they are defined and where they are used. + + * Name: **Linux Weekly News** + + :URL: http://lwn.net + :Keywords: latest kernel news. + :Description: The title says it all. There's a fixed kernel section + summarizing developers' work, bug fixes, new features and versions + produced during the week. Published every Thursday. + + * Name: **The home page of Linux-MM** + + :Author: The Linux-MM team. + :URL: http://linux-mm.org/ + :Keywords: memory management, Linux-MM, mm patches, TODO, docs, + mailing list. + :Description: Site devoted to Linux Memory Management development. + Memory related patches, HOWTOs, links, mm developers... Don't miss + it if you are interested in memory management development! + + * Name: **Kernel Newbies IRC Channel and Website** + + :URL: http://www.kernelnewbies.org + :Keywords: IRC, newbies, channel, asking doubts. + :Description: #kernelnewbies on irc.oftc.net. + #kernelnewbies is an IRC network dedicated to the 'newbie' + kernel hacker. The audience mostly consists of people who are + learning about the kernel, working on kernel projects or + professional kernel hackers that want to help less seasoned kernel + people. + #kernelnewbies is on the OFTC IRC Network. + Try irc.oftc.net as your server and then /join #kernelnewbies. + The kernelnewbies website also hosts articles, documents, FAQs... + + * Name: **linux-kernel mailing list archives and search engines** + + :URL: http://vger.kernel.org/vger-lists.html + :URL: http://www.uwsg.indiana.edu/hypermail/linux/kernel/index.html + :URL: http://groups.google.com/group/mlist.linux.kernel + :Keywords: linux-kernel, archives, search. + :Description: Some of the linux-kernel mailing list archivers. If + you have a better/another one, please let me know. + +------- + +Document last updated on Tue 2016-Sep-20 + +This document is based on: + http://www.dit.upm.es/~jmseyas/linux/kernel/hackers-docs.html diff --git a/Documentation/kernel-documentation.rst b/Documentation/kernel-documentation.rst index 391decc66a18..10cc7ddb6235 100644 --- a/Documentation/kernel-documentation.rst +++ b/Documentation/kernel-documentation.rst @@ -107,6 +107,35 @@ Here are some specific guidelines for the kernel documentation: the order as encountered."), having the higher levels the same overall makes it easier to follow the documents. + +the C domain +------------ + +The `Sphinx C Domain`_ (name c) is suited for documentation of C API. E.g. a +function prototype: + +.. code-block:: rst + + .. c:function:: int ioctl( int fd, int request ) + +The C domain of the kernel-doc has some additional features. E.g. you can +*rename* the reference name of a function with a common name like ``open`` or +``ioctl``: + +.. code-block:: rst + + .. c:function:: int ioctl( int fd, int request ) + :name: VIDIOC_LOG_STATUS + +The func-name (e.g. ioctl) remains in the output but the ref-name changed from +``ioctl`` to ``VIDIOC_LOG_STATUS``. The index entry for this function is also +changed to ``VIDIOC_LOG_STATUS`` and the function can now referenced by: + +.. code-block:: rst + + :c:func:`VIDIOC_LOG_STATUS` + + list tables ----------- @@ -265,6 +294,8 @@ The kernel-doc extension is included in the kernel source tree, at ``scripts/kernel-doc`` script to extract the documentation comments from the source. +.. _kernel_doc: + Writing kernel-doc comments =========================== diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index a4f4d693e2c1..ec8d81417dc8 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -460,6 +460,15 @@ bytes respectively. Such letter suffixes can also be entirely omitted. driver will print ACPI tables for AMD IOMMU during IOMMU initialization. + amd_iommu_intr= [HW,X86-64] + Specifies one of the following AMD IOMMU interrupt + remapping modes: + legacy - Use legacy interrupt remapping mode. + vapic - Use virtual APIC mode, which allows IOMMU + to inject interrupts directly into guest. + This mode requires kvm-amd.avic=1. + (Default when IOMMU HW support is present.) + amijoy.map= [HW,JOY] Amiga joystick support Map of devices attached to JOY0DAT and JOY1DAT Format: <a>,<b> @@ -698,6 +707,15 @@ bytes respectively. Such letter suffixes can also be entirely omitted. loops can be debugged more effectively on production systems. + clocksource.arm_arch_timer.fsl-a008585= + [ARM64] + Format: <bool> + Enable/disable the workaround of Freescale/NXP + erratum A-008585. This can be useful for KVM + guests, if the guest device tree doesn't show the + erratum. If unspecified, the workaround is + enabled based on the device tree. + clearcpuid=BITNUM [X86] Disable CPUID feature X for the kernel. See arch/x86/include/asm/cpufeatures.h for the valid bit @@ -1045,11 +1063,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted. determined by the stdout-path property in device tree's chosen node. - cdns,<addr> - Start an early, polled-mode console on a cadence serial - port at the specified address. The cadence serial port - must already be setup and configured. Options are not - yet supported. + cdns,<addr>[,options] + Start an early, polled-mode console on a Cadence + (xuartps) serial port at the specified address. Only + supported option is baud rate. If baud rate is not + specified, the serial port must already be setup and + configured. uart[8250],io,<addr>[,options] uart[8250],mmio,<addr>[,options] @@ -1364,6 +1383,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted. Format: <unsigned int> such that (rxsize & ~0x1fffc0) == 0. Default: 1024 + gpio-mockup.gpio_mockup_ranges + [HW] Sets the ranges of gpiochip of for this device. + Format: <start1>,<end1>,<start2>,<end2>... + hardlockup_all_cpu_backtrace= [KNL] Should the hard-lockup detector generate backtraces on all cpus. @@ -1688,7 +1711,7 @@ bytes respectively. Such letter suffixes can also be entirely omitted. intel_idle.max_cstate= [KNL,HW,ACPI,X86] 0 disables intel_idle and fall back on acpi_idle. - 1 to 6 specify maximum depth of C-state. + 1 to 9 specify maximum depth of C-state. intel_pstate= [X86] disable @@ -2161,10 +2184,13 @@ bytes respectively. Such letter suffixes can also be entirely omitted. than or equal to this physical address is ignored. maxcpus= [SMP] Maximum number of processors that an SMP kernel - should make use of. maxcpus=n : n >= 0 limits the - kernel to using 'n' processors. n=0 is a special case, - it is equivalent to "nosmp", which also disables - the IO APIC. + will bring up during bootup. maxcpus=n : n >= 0 limits + the kernel to bring up 'n' processors. Surely after + bootup you can bring up the other plugged cpu by executing + "echo 1 > /sys/devices/system/cpu/cpuX/online". So maxcpus + only takes effect during system bootup. + While n=0 is a special case, it is equivalent to "nosmp", + which also disables the IO APIC. max_loop= [LOOP] The number of loop block devices that get (loop.max_loop) unconditionally pre-created at init time. The default @@ -2571,8 +2597,6 @@ bytes respectively. Such letter suffixes can also be entirely omitted. nodelayacct [KNL] Disable per-task delay accounting - nodisconnect [HW,SCSI,M68K] Disables SCSI disconnects. - nodsp [SH] Disable hardware DSP at boot time. noefi Disable EFI runtime services support. @@ -2773,9 +2797,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted. nr_cpus= [SMP] Maximum number of processors that an SMP kernel could support. nr_cpus=n : n >= 1 limits the kernel to - supporting 'n' processors. Later in runtime you can not - use hotplug cpu feature to put more cpu back to online. - just like you compile the kernel NR_CPUS=n + support 'n' processors. It could be larger than the + number of already plugged CPU during bootup, later in + runtime you can physically add extra cpu until it reaches + n. So during boot up some boot time memory for per-cpu + variables need be pre-allocated for later physical cpu + hot plugging. nr_uarts= [SERIAL] maximum number of UARTs to be registered. @@ -4238,6 +4265,8 @@ bytes respectively. Such letter suffixes can also be entirely omitted. u = IGNORE_UAS (don't bind to the uas driver); w = NO_WP_DETECT (don't test whether the medium is write-protected). + y = ALWAYS_SYNC (issue a SYNCHRONIZE_CACHE + even if the device claims no cache) Example: quirks=0419:aaf5:rl,0421:0433:rc user_debug= [KNL,ARM] diff --git a/Documentation/kmemcheck.txt b/Documentation/kmemcheck.txt deleted file mode 100644 index 80aae85d8da6..000000000000 --- a/Documentation/kmemcheck.txt +++ /dev/null @@ -1,754 +0,0 @@ -GETTING STARTED WITH KMEMCHECK -============================== - -Vegard Nossum <vegardno@ifi.uio.no> - - -Contents -======== -0. Introduction -1. Downloading -2. Configuring and compiling -3. How to use -3.1. Booting -3.2. Run-time enable/disable -3.3. Debugging -3.4. Annotating false positives -4. Reporting errors -5. Technical description - - -0. Introduction -=============== - -kmemcheck is a debugging feature for the Linux Kernel. More specifically, it -is a dynamic checker that detects and warns about some uses of uninitialized -memory. - -Userspace programmers might be familiar with Valgrind's memcheck. The main -difference between memcheck and kmemcheck is that memcheck works for userspace -programs only, and kmemcheck works for the kernel only. The implementations -are of course vastly different. Because of this, kmemcheck is not as accurate -as memcheck, but it turns out to be good enough in practice to discover real -programmer errors that the compiler is not able to find through static -analysis. - -Enabling kmemcheck on a kernel will probably slow it down to the extent that -the machine will not be usable for normal workloads such as e.g. an -interactive desktop. kmemcheck will also cause the kernel to use about twice -as much memory as normal. For this reason, kmemcheck is strictly a debugging -feature. - - -1. Downloading -============== - -As of version 2.6.31-rc1, kmemcheck is included in the mainline kernel. - - -2. Configuring and compiling -============================ - -kmemcheck only works for the x86 (both 32- and 64-bit) platform. A number of -configuration variables must have specific settings in order for the kmemcheck -menu to even appear in "menuconfig". These are: - - o CONFIG_CC_OPTIMIZE_FOR_SIZE=n - - This option is located under "General setup" / "Optimize for size". - - Without this, gcc will use certain optimizations that usually lead to - false positive warnings from kmemcheck. An example of this is a 16-bit - field in a struct, where gcc may load 32 bits, then discard the upper - 16 bits. kmemcheck sees only the 32-bit load, and may trigger a - warning for the upper 16 bits (if they're uninitialized). - - o CONFIG_SLAB=y or CONFIG_SLUB=y - - This option is located under "General setup" / "Choose SLAB - allocator". - - o CONFIG_FUNCTION_TRACER=n - - This option is located under "Kernel hacking" / "Tracers" / "Kernel - Function Tracer" - - When function tracing is compiled in, gcc emits a call to another - function at the beginning of every function. This means that when the - page fault handler is called, the ftrace framework will be called - before kmemcheck has had a chance to handle the fault. If ftrace then - modifies memory that was tracked by kmemcheck, the result is an - endless recursive page fault. - - o CONFIG_DEBUG_PAGEALLOC=n - - This option is located under "Kernel hacking" / "Memory Debugging" - / "Debug page memory allocations". - -In addition, I highly recommend turning on CONFIG_DEBUG_INFO=y. This is also -located under "Kernel hacking". With this, you will be able to get line number -information from the kmemcheck warnings, which is extremely valuable in -debugging a problem. This option is not mandatory, however, because it slows -down the compilation process and produces a much bigger kernel image. - -Now the kmemcheck menu should be visible (under "Kernel hacking" / "Memory -Debugging" / "kmemcheck: trap use of uninitialized memory"). Here follows -a description of the kmemcheck configuration variables: - - o CONFIG_KMEMCHECK - - This must be enabled in order to use kmemcheck at all... - - o CONFIG_KMEMCHECK_[DISABLED | ENABLED | ONESHOT]_BY_DEFAULT - - This option controls the status of kmemcheck at boot-time. "Enabled" - will enable kmemcheck right from the start, "disabled" will boot the - kernel as normal (but with the kmemcheck code compiled in, so it can - be enabled at run-time after the kernel has booted), and "one-shot" is - a special mode which will turn kmemcheck off automatically after - detecting the first use of uninitialized memory. - - If you are using kmemcheck to actively debug a problem, then you - probably want to choose "enabled" here. - - The one-shot mode is mostly useful in automated test setups because it - can prevent floods of warnings and increase the chances of the machine - surviving in case something is really wrong. In other cases, the one- - shot mode could actually be counter-productive because it would turn - itself off at the very first error -- in the case of a false positive - too -- and this would come in the way of debugging the specific - problem you were interested in. - - If you would like to use your kernel as normal, but with a chance to - enable kmemcheck in case of some problem, it might be a good idea to - choose "disabled" here. When kmemcheck is disabled, most of the run- - time overhead is not incurred, and the kernel will be almost as fast - as normal. - - o CONFIG_KMEMCHECK_QUEUE_SIZE - - Select the maximum number of error reports to store in an internal - (fixed-size) buffer. Since errors can occur virtually anywhere and in - any context, we need a temporary storage area which is guaranteed not - to generate any other page faults when accessed. The queue will be - emptied as soon as a tasklet may be scheduled. If the queue is full, - new error reports will be lost. - - The default value of 64 is probably fine. If some code produces more - than 64 errors within an irqs-off section, then the code is likely to - produce many, many more, too, and these additional reports seldom give - any more information (the first report is usually the most valuable - anyway). - - This number might have to be adjusted if you are not using serial - console or similar to capture the kernel log. If you are using the - "dmesg" command to save the log, then getting a lot of kmemcheck - warnings might overflow the kernel log itself, and the earlier reports - will get lost in that way instead. Try setting this to 10 or so on - such a setup. - - o CONFIG_KMEMCHECK_SHADOW_COPY_SHIFT - - Select the number of shadow bytes to save along with each entry of the - error-report queue. These bytes indicate what parts of an allocation - are initialized, uninitialized, etc. and will be displayed when an - error is detected to help the debugging of a particular problem. - - The number entered here is actually the logarithm of the number of - bytes that will be saved. So if you pick for example 5 here, kmemcheck - will save 2^5 = 32 bytes. - - The default value should be fine for debugging most problems. It also - fits nicely within 80 columns. - - o CONFIG_KMEMCHECK_PARTIAL_OK - - This option (when enabled) works around certain GCC optimizations that - produce 32-bit reads from 16-bit variables where the upper 16 bits are - thrown away afterwards. - - The default value (enabled) is recommended. This may of course hide - some real errors, but disabling it would probably produce a lot of - false positives. - - o CONFIG_KMEMCHECK_BITOPS_OK - - This option silences warnings that would be generated for bit-field - accesses where not all the bits are initialized at the same time. This - may also hide some real bugs. - - This option is probably obsolete, or it should be replaced with - the kmemcheck-/bitfield-annotations for the code in question. The - default value is therefore fine. - -Now compile the kernel as usual. - - -3. How to use -============= - -3.1. Booting -============ - -First some information about the command-line options. There is only one -option specific to kmemcheck, and this is called "kmemcheck". It can be used -to override the default mode as chosen by the CONFIG_KMEMCHECK_*_BY_DEFAULT -option. Its possible settings are: - - o kmemcheck=0 (disabled) - o kmemcheck=1 (enabled) - o kmemcheck=2 (one-shot mode) - -If SLUB debugging has been enabled in the kernel, it may take precedence over -kmemcheck in such a way that the slab caches which are under SLUB debugging -will not be tracked by kmemcheck. In order to ensure that this doesn't happen -(even though it shouldn't by default), use SLUB's boot option "slub_debug", -like this: slub_debug=- - -In fact, this option may also be used for fine-grained control over SLUB vs. -kmemcheck. For example, if the command line includes "kmemcheck=1 -slub_debug=,dentry", then SLUB debugging will be used only for the "dentry" -slab cache, and with kmemcheck tracking all the other caches. This is advanced -usage, however, and is not generally recommended. - - -3.2. Run-time enable/disable -============================ - -When the kernel has booted, it is possible to enable or disable kmemcheck at -run-time. WARNING: This feature is still experimental and may cause false -positive warnings to appear. Therefore, try not to use this. If you find that -it doesn't work properly (e.g. you see an unreasonable amount of warnings), I -will be happy to take bug reports. - -Use the file /proc/sys/kernel/kmemcheck for this purpose, e.g.: - - $ echo 0 > /proc/sys/kernel/kmemcheck # disables kmemcheck - -The numbers are the same as for the kmemcheck= command-line option. - - -3.3. Debugging -============== - -A typical report will look something like this: - -WARNING: kmemcheck: Caught 32-bit read from uninitialized memory (ffff88003e4a2024) -80000000000000000000000000000000000000000088ffff0000000000000000 - i i i i u u u u i i i i i i i i u u u u u u u u u u u u u u u u - ^ - -Pid: 1856, comm: ntpdate Not tainted 2.6.29-rc5 #264 945P-A -RIP: 0010:[<ffffffff8104ede8>] [<ffffffff8104ede8>] __dequeue_signal+0xc8/0x190 -RSP: 0018:ffff88003cdf7d98 EFLAGS: 00210002 -RAX: 0000000000000030 RBX: ffff88003d4ea968 RCX: 0000000000000009 -RDX: ffff88003e5d6018 RSI: ffff88003e5d6024 RDI: ffff88003cdf7e84 -RBP: ffff88003cdf7db8 R08: ffff88003e5d6000 R09: 0000000000000000 -R10: 0000000000000080 R11: 0000000000000000 R12: 000000000000000e -R13: ffff88003cdf7e78 R14: ffff88003d530710 R15: ffff88003d5a98c8 -FS: 0000000000000000(0000) GS:ffff880001982000(0063) knlGS:00000 -CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033 -CR2: ffff88003f806ea0 CR3: 000000003c036000 CR4: 00000000000006a0 -DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 -DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400 - [<ffffffff8104f04e>] dequeue_signal+0x8e/0x170 - [<ffffffff81050bd8>] get_signal_to_deliver+0x98/0x390 - [<ffffffff8100b87d>] do_notify_resume+0xad/0x7d0 - [<ffffffff8100c7b5>] int_signal+0x12/0x17 - [<ffffffffffffffff>] 0xffffffffffffffff - -The single most valuable information in this report is the RIP (or EIP on 32- -bit) value. This will help us pinpoint exactly which instruction that caused -the warning. - -If your kernel was compiled with CONFIG_DEBUG_INFO=y, then all we have to do -is give this address to the addr2line program, like this: - - $ addr2line -e vmlinux -i ffffffff8104ede8 - arch/x86/include/asm/string_64.h:12 - include/asm-generic/siginfo.h:287 - kernel/signal.c:380 - kernel/signal.c:410 - -The "-e vmlinux" tells addr2line which file to look in. IMPORTANT: This must -be the vmlinux of the kernel that produced the warning in the first place! If -not, the line number information will almost certainly be wrong. - -The "-i" tells addr2line to also print the line numbers of inlined functions. -In this case, the flag was very important, because otherwise, it would only -have printed the first line, which is just a call to memcpy(), which could be -called from a thousand places in the kernel, and is therefore not very useful. -These inlined functions would not show up in the stack trace above, simply -because the kernel doesn't load the extra debugging information. This -technique can of course be used with ordinary kernel oopses as well. - -In this case, it's the caller of memcpy() that is interesting, and it can be -found in include/asm-generic/siginfo.h, line 287: - -281 static inline void copy_siginfo(struct siginfo *to, struct siginfo *from) -282 { -283 if (from->si_code < 0) -284 memcpy(to, from, sizeof(*to)); -285 else -286 /* _sigchld is currently the largest know union member */ -287 memcpy(to, from, __ARCH_SI_PREAMBLE_SIZE + sizeof(from->_sifields._sigchld)); -288 } - -Since this was a read (kmemcheck usually warns about reads only, though it can -warn about writes to unallocated or freed memory as well), it was probably the -"from" argument which contained some uninitialized bytes. Following the chain -of calls, we move upwards to see where "from" was allocated or initialized, -kernel/signal.c, line 380: - -359 static void collect_signal(int sig, struct sigpending *list, siginfo_t *info) -360 { -... -367 list_for_each_entry(q, &list->list, list) { -368 if (q->info.si_signo == sig) { -369 if (first) -370 goto still_pending; -371 first = q; -... -377 if (first) { -378 still_pending: -379 list_del_init(&first->list); -380 copy_siginfo(info, &first->info); -381 __sigqueue_free(first); -... -392 } -393 } - -Here, it is &first->info that is being passed on to copy_siginfo(). The -variable "first" was found on a list -- passed in as the second argument to -collect_signal(). We continue our journey through the stack, to figure out -where the item on "list" was allocated or initialized. We move to line 410: - -395 static int __dequeue_signal(struct sigpending *pending, sigset_t *mask, -396 siginfo_t *info) -397 { -... -410 collect_signal(sig, pending, info); -... -414 } - -Now we need to follow the "pending" pointer, since that is being passed on to -collect_signal() as "list". At this point, we've run out of lines from the -"addr2line" output. Not to worry, we just paste the next addresses from the -kmemcheck stack dump, i.e.: - - [<ffffffff8104f04e>] dequeue_signal+0x8e/0x170 - [<ffffffff81050bd8>] get_signal_to_deliver+0x98/0x390 - [<ffffffff8100b87d>] do_notify_resume+0xad/0x7d0 - [<ffffffff8100c7b5>] int_signal+0x12/0x17 - - $ addr2line -e vmlinux -i ffffffff8104f04e ffffffff81050bd8 \ - ffffffff8100b87d ffffffff8100c7b5 - kernel/signal.c:446 - kernel/signal.c:1806 - arch/x86/kernel/signal.c:805 - arch/x86/kernel/signal.c:871 - arch/x86/kernel/entry_64.S:694 - -Remember that since these addresses were found on the stack and not as the -RIP value, they actually point to the _next_ instruction (they are return -addresses). This becomes obvious when we look at the code for line 446: - -422 int dequeue_signal(struct task_struct *tsk, sigset_t *mask, siginfo_t *info) -423 { -... -431 signr = __dequeue_signal(&tsk->signal->shared_pending, -432 mask, info); -433 /* -434 * itimer signal ? -435 * -436 * itimers are process shared and we restart periodic -437 * itimers in the signal delivery path to prevent DoS -438 * attacks in the high resolution timer case. This is -439 * compliant with the old way of self restarting -440 * itimers, as the SIGALRM is a legacy signal and only -441 * queued once. Changing the restart behaviour to -442 * restart the timer in the signal dequeue path is -443 * reducing the timer noise on heavy loaded !highres -444 * systems too. -445 */ -446 if (unlikely(signr == SIGALRM)) { -... -489 } - -So instead of looking at 446, we should be looking at 431, which is the line -that executes just before 446. Here we see that what we are looking for is -&tsk->signal->shared_pending. - -Our next task is now to figure out which function that puts items on this -"shared_pending" list. A crude, but efficient tool, is git grep: - - $ git grep -n 'shared_pending' kernel/ - ... - kernel/signal.c:828: pending = group ? &t->signal->shared_pending : &t->pending; - kernel/signal.c:1339: pending = group ? &t->signal->shared_pending : &t->pending; - ... - -There were more results, but none of them were related to list operations, -and these were the only assignments. We inspect the line numbers more closely -and find that this is indeed where items are being added to the list: - -816 static int send_signal(int sig, struct siginfo *info, struct task_struct *t, -817 int group) -818 { -... -828 pending = group ? &t->signal->shared_pending : &t->pending; -... -851 q = __sigqueue_alloc(t, GFP_ATOMIC, (sig < SIGRTMIN && -852 (is_si_special(info) || -853 info->si_code >= 0))); -854 if (q) { -855 list_add_tail(&q->list, &pending->list); -... -890 } - -and: - -1309 int send_sigqueue(struct sigqueue *q, struct task_struct *t, int group) -1310 { -.... -1339 pending = group ? &t->signal->shared_pending : &t->pending; -1340 list_add_tail(&q->list, &pending->list); -.... -1347 } - -In the first case, the list element we are looking for, "q", is being returned -from the function __sigqueue_alloc(), which looks like an allocation function. -Let's take a look at it: - -187 static struct sigqueue *__sigqueue_alloc(struct task_struct *t, gfp_t flags, -188 int override_rlimit) -189 { -190 struct sigqueue *q = NULL; -191 struct user_struct *user; -192 -193 /* -194 * We won't get problems with the target's UID changing under us -195 * because changing it requires RCU be used, and if t != current, the -196 * caller must be holding the RCU readlock (by way of a spinlock) and -197 * we use RCU protection here -198 */ -199 user = get_uid(__task_cred(t)->user); -200 atomic_inc(&user->sigpending); -201 if (override_rlimit || -202 atomic_read(&user->sigpending) <= -203 t->signal->rlim[RLIMIT_SIGPENDING].rlim_cur) -204 q = kmem_cache_alloc(sigqueue_cachep, flags); -205 if (unlikely(q == NULL)) { -206 atomic_dec(&user->sigpending); -207 free_uid(user); -208 } else { -209 INIT_LIST_HEAD(&q->list); -210 q->flags = 0; -211 q->user = user; -212 } -213 -214 return q; -215 } - -We see that this function initializes q->list, q->flags, and q->user. It seems -that now is the time to look at the definition of "struct sigqueue", e.g.: - -14 struct sigqueue { -15 struct list_head list; -16 int flags; -17 siginfo_t info; -18 struct user_struct *user; -19 }; - -And, you might remember, it was a memcpy() on &first->info that caused the -warning, so this makes perfect sense. It also seems reasonable to assume that -it is the caller of __sigqueue_alloc() that has the responsibility of filling -out (initializing) this member. - -But just which fields of the struct were uninitialized? Let's look at -kmemcheck's report again: - -WARNING: kmemcheck: Caught 32-bit read from uninitialized memory (ffff88003e4a2024) -80000000000000000000000000000000000000000088ffff0000000000000000 - i i i i u u u u i i i i i i i i u u u u u u u u u u u u u u u u - ^ - -These first two lines are the memory dump of the memory object itself, and the -shadow bytemap, respectively. The memory object itself is in this case -&first->info. Just beware that the start of this dump is NOT the start of the -object itself! The position of the caret (^) corresponds with the address of -the read (ffff88003e4a2024). - -The shadow bytemap dump legend is as follows: - - i - initialized - u - uninitialized - a - unallocated (memory has been allocated by the slab layer, but has not - yet been handed off to anybody) - f - freed (memory has been allocated by the slab layer, but has been freed - by the previous owner) - -In order to figure out where (relative to the start of the object) the -uninitialized memory was located, we have to look at the disassembly. For -that, we'll need the RIP address again: - -RIP: 0010:[<ffffffff8104ede8>] [<ffffffff8104ede8>] __dequeue_signal+0xc8/0x190 - - $ objdump -d --no-show-raw-insn vmlinux | grep -C 8 ffffffff8104ede8: - ffffffff8104edc8: mov %r8,0x8(%r8) - ffffffff8104edcc: test %r10d,%r10d - ffffffff8104edcf: js ffffffff8104ee88 <__dequeue_signal+0x168> - ffffffff8104edd5: mov %rax,%rdx - ffffffff8104edd8: mov $0xc,%ecx - ffffffff8104eddd: mov %r13,%rdi - ffffffff8104ede0: mov $0x30,%eax - ffffffff8104ede5: mov %rdx,%rsi - ffffffff8104ede8: rep movsl %ds:(%rsi),%es:(%rdi) - ffffffff8104edea: test $0x2,%al - ffffffff8104edec: je ffffffff8104edf0 <__dequeue_signal+0xd0> - ffffffff8104edee: movsw %ds:(%rsi),%es:(%rdi) - ffffffff8104edf0: test $0x1,%al - ffffffff8104edf2: je ffffffff8104edf5 <__dequeue_signal+0xd5> - ffffffff8104edf4: movsb %ds:(%rsi),%es:(%rdi) - ffffffff8104edf5: mov %r8,%rdi - ffffffff8104edf8: callq ffffffff8104de60 <__sigqueue_free> - -As expected, it's the "rep movsl" instruction from the memcpy() that causes -the warning. We know about REP MOVSL that it uses the register RCX to count -the number of remaining iterations. By taking a look at the register dump -again (from the kmemcheck report), we can figure out how many bytes were left -to copy: - -RAX: 0000000000000030 RBX: ffff88003d4ea968 RCX: 0000000000000009 - -By looking at the disassembly, we also see that %ecx is being loaded with the -value $0xc just before (ffffffff8104edd8), so we are very lucky. Keep in mind -that this is the number of iterations, not bytes. And since this is a "long" -operation, we need to multiply by 4 to get the number of bytes. So this means -that the uninitialized value was encountered at 4 * (0xc - 0x9) = 12 bytes -from the start of the object. - -We can now try to figure out which field of the "struct siginfo" that was not -initialized. This is the beginning of the struct: - -40 typedef struct siginfo { -41 int si_signo; -42 int si_errno; -43 int si_code; -44 -45 union { -.. -92 } _sifields; -93 } siginfo_t; - -On 64-bit, the int is 4 bytes long, so it must the union member that has -not been initialized. We can verify this using gdb: - - $ gdb vmlinux - ... - (gdb) p &((struct siginfo *) 0)->_sifields - $1 = (union {...} *) 0x10 - -Actually, it seems that the union member is located at offset 0x10 -- which -means that gcc has inserted 4 bytes of padding between the members si_code -and _sifields. We can now get a fuller picture of the memory dump: - - _----------------------------=> si_code - / _--------------------=> (padding) - | / _------------=> _sifields(._kill._pid) - | | / _----=> _sifields(._kill._uid) - | | | / --------|-------|-------|-------| -80000000000000000000000000000000000000000088ffff0000000000000000 - i i i i u u u u i i i i i i i i u u u u u u u u u u u u u u u u - -This allows us to realize another important fact: si_code contains the value -0x80. Remember that x86 is little endian, so the first 4 bytes "80000000" are -really the number 0x00000080. With a bit of research, we find that this is -actually the constant SI_KERNEL defined in include/asm-generic/siginfo.h: - -144 #define SI_KERNEL 0x80 /* sent by the kernel from somewhere */ - -This macro is used in exactly one place in the x86 kernel: In send_signal() -in kernel/signal.c: - -816 static int send_signal(int sig, struct siginfo *info, struct task_struct *t, -817 int group) -818 { -... -828 pending = group ? &t->signal->shared_pending : &t->pending; -... -851 q = __sigqueue_alloc(t, GFP_ATOMIC, (sig < SIGRTMIN && -852 (is_si_special(info) || -853 info->si_code >= 0))); -854 if (q) { -855 list_add_tail(&q->list, &pending->list); -856 switch ((unsigned long) info) { -... -865 case (unsigned long) SEND_SIG_PRIV: -866 q->info.si_signo = sig; -867 q->info.si_errno = 0; -868 q->info.si_code = SI_KERNEL; -869 q->info.si_pid = 0; -870 q->info.si_uid = 0; -871 break; -... -890 } - -Not only does this match with the .si_code member, it also matches the place -we found earlier when looking for where siginfo_t objects are enqueued on the -"shared_pending" list. - -So to sum up: It seems that it is the padding introduced by the compiler -between two struct fields that is uninitialized, and this gets reported when -we do a memcpy() on the struct. This means that we have identified a false -positive warning. - -Normally, kmemcheck will not report uninitialized accesses in memcpy() calls -when both the source and destination addresses are tracked. (Instead, we copy -the shadow bytemap as well). In this case, the destination address clearly -was not tracked. We can dig a little deeper into the stack trace from above: - - arch/x86/kernel/signal.c:805 - arch/x86/kernel/signal.c:871 - arch/x86/kernel/entry_64.S:694 - -And we clearly see that the destination siginfo object is located on the -stack: - -782 static void do_signal(struct pt_regs *regs) -783 { -784 struct k_sigaction ka; -785 siginfo_t info; -... -804 signr = get_signal_to_deliver(&info, &ka, regs, NULL); -... -854 } - -And this &info is what eventually gets passed to copy_siginfo() as the -destination argument. - -Now, even though we didn't find an actual error here, the example is still a -good one, because it shows how one would go about to find out what the report -was all about. - - -3.4. Annotating false positives -=============================== - -There are a few different ways to make annotations in the source code that -will keep kmemcheck from checking and reporting certain allocations. Here -they are: - - o __GFP_NOTRACK_FALSE_POSITIVE - - This flag can be passed to kmalloc() or kmem_cache_alloc() (therefore - also to other functions that end up calling one of these) to indicate - that the allocation should not be tracked because it would lead to - a false positive report. This is a "big hammer" way of silencing - kmemcheck; after all, even if the false positive pertains to - particular field in a struct, for example, we will now lose the - ability to find (real) errors in other parts of the same struct. - - Example: - - /* No warnings will ever trigger on accessing any part of x */ - x = kmalloc(sizeof *x, GFP_KERNEL | __GFP_NOTRACK_FALSE_POSITIVE); - - o kmemcheck_bitfield_begin(name)/kmemcheck_bitfield_end(name) and - kmemcheck_annotate_bitfield(ptr, name) - - The first two of these three macros can be used inside struct - definitions to signal, respectively, the beginning and end of a - bitfield. Additionally, this will assign the bitfield a name, which - is given as an argument to the macros. - - Having used these markers, one can later use - kmemcheck_annotate_bitfield() at the point of allocation, to indicate - which parts of the allocation is part of a bitfield. - - Example: - - struct foo { - int x; - - kmemcheck_bitfield_begin(flags); - int flag_a:1; - int flag_b:1; - kmemcheck_bitfield_end(flags); - - int y; - }; - - struct foo *x = kmalloc(sizeof *x); - - /* No warnings will trigger on accessing the bitfield of x */ - kmemcheck_annotate_bitfield(x, flags); - - Note that kmemcheck_annotate_bitfield() can be used even before the - return value of kmalloc() is checked -- in other words, passing NULL - as the first argument is legal (and will do nothing). - - -4. Reporting errors -=================== - -As we have seen, kmemcheck will produce false positive reports. Therefore, it -is not very wise to blindly post kmemcheck warnings to mailing lists and -maintainers. Instead, I encourage maintainers and developers to find errors -in their own code. If you get a warning, you can try to work around it, try -to figure out if it's a real error or not, or simply ignore it. Most -developers know their own code and will quickly and efficiently determine the -root cause of a kmemcheck report. This is therefore also the most efficient -way to work with kmemcheck. - -That said, we (the kmemcheck maintainers) will always be on the lookout for -false positives that we can annotate and silence. So whatever you find, -please drop us a note privately! Kernel configs and steps to reproduce (if -available) are of course a great help too. - -Happy hacking! - - -5. Technical description -======================== - -kmemcheck works by marking memory pages non-present. This means that whenever -somebody attempts to access the page, a page fault is generated. The page -fault handler notices that the page was in fact only hidden, and so it calls -on the kmemcheck code to make further investigations. - -When the investigations are completed, kmemcheck "shows" the page by marking -it present (as it would be under normal circumstances). This way, the -interrupted code can continue as usual. - -But after the instruction has been executed, we should hide the page again, so -that we can catch the next access too! Now kmemcheck makes use of a debugging -feature of the processor, namely single-stepping. When the processor has -finished the one instruction that generated the memory access, a debug -exception is raised. From here, we simply hide the page again and continue -execution, this time with the single-stepping feature turned off. - -kmemcheck requires some assistance from the memory allocator in order to work. -The memory allocator needs to - - 1. Tell kmemcheck about newly allocated pages and pages that are about to - be freed. This allows kmemcheck to set up and tear down the shadow memory - for the pages in question. The shadow memory stores the status of each - byte in the allocation proper, e.g. whether it is initialized or - uninitialized. - - 2. Tell kmemcheck which parts of memory should be marked uninitialized. - There are actually a few more states, such as "not yet allocated" and - "recently freed". - -If a slab cache is set up using the SLAB_NOTRACK flag, it will never return -memory that can take page faults because of kmemcheck. - -If a slab cache is NOT set up using the SLAB_NOTRACK flag, callers can still -request memory with the __GFP_NOTRACK or __GFP_NOTRACK_FALSE_POSITIVE flags. -This does not prevent the page faults from occurring, however, but marks the -object in question as being initialized so that no warnings will ever be -produced for this object. - -Currently, the SLAB and SLUB allocators are supported by kmemcheck. diff --git a/Documentation/ko_KR/memory-barriers.txt b/Documentation/ko_KR/memory-barriers.txt new file mode 100644 index 000000000000..34d3d380893d --- /dev/null +++ b/Documentation/ko_KR/memory-barriers.txt @@ -0,0 +1,3135 @@ +NOTE: +This is a version of Documentation/memory-barriers.txt translated into Korean. +This document is maintained by SeongJae Park <sj38.park@gmail.com>. +If you find any difference between this document and the original file or +a problem with the translation, please contact the maintainer of this file. + +Please also note that the purpose of this file is to be easier to +read for non English (read: Korean) speakers and is not intended as +a fork. So if you have any comments or updates for this file please +update the original English file first. The English version is +definitive, and readers should look there if they have any doubt. + +=================================== +์ด ๋ฌธ์๋ +Documentation/memory-barriers.txt +์ ํ๊ธ ๋ฒ์ญ์
๋๋ค. + +์ญ์๏ผ ๋ฐ์ฑ์ฌ <sj38.park@gmail.com> +=================================== + + + ========================= + ๋ฆฌ๋
์ค ์ปค๋ ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด + ========================= + +์ ์: David Howells <dhowells@redhat.com> + Paul E. McKenney <paulmck@linux.vnet.ibm.com> + Will Deacon <will.deacon@arm.com> + Peter Zijlstra <peterz@infradead.org> + +======== +๋ฉด์ฑ
์กฐํญ +======== + +์ด ๋ฌธ์๋ ๋ช
์ธ์๊ฐ ์๋๋๋ค; ์ด ๋ฌธ์๋ ์๋ฒฝํ์ง ์์๋ฐ, ๊ฐ๊ฒฐ์ฑ์ ์ํด ์๋๋ +๋ถ๋ถ๋ ์๊ณ , ์๋ํ์ง ์์์ง๋ง ์ฌ๋์ ์ํด ์ฐ์๋ค๋ณด๋ ๋ถ์์ ํ ๋ถ๋ถ๋ ์์ต๋๋ค. +์ด ๋ฌธ์๋ ๋ฆฌ๋
์ค์์ ์ ๊ณตํ๋ ๋ค์ํ ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๋ค์ ์ฌ์ฉํ๊ธฐ ์ํ +์๋ด์์
๋๋ค๋ง, ๋ญ๊ฐ ์ด์ํ๋ค ์ถ์ผ๋ฉด (๊ทธ๋ฐ๊ฒ ๋ง์ ๊ฒ๋๋ค) ์ง๋ฌธ์ ๋ถํ๋๋ฆฝ๋๋ค. + +๋ค์ ๋งํ์ง๋ง, ์ด ๋ฌธ์๋ ๋ฆฌ๋
์ค๊ฐ ํ๋์จ์ด์ ๊ธฐ๋ํ๋ ์ฌํญ์ ๋ํ ๋ช
์ธ์๊ฐ +์๋๋๋ค. + +์ด ๋ฌธ์์ ๋ชฉ์ ์ ๋๊ฐ์ง์
๋๋ค: + + (1) ์ด๋ค ํน์ ๋ฐฐ๋ฆฌ์ด์ ๋ํด ๊ธฐ๋ํ ์ ์๋ ์ต์ํ์ ๊ธฐ๋ฅ์ ๋ช
์ธํ๊ธฐ ์ํด์, + ๊ทธ๋ฆฌ๊ณ + + (2) ์ฌ์ฉ ๊ฐ๋ฅํ ๋ฐฐ๋ฆฌ์ด๋ค์ ๋ํด ์ด๋ป๊ฒ ์ฌ์ฉํด์ผ ํ๋์ง์ ๋ํ ์๋ด๋ฅผ ์ ๊ณตํ๊ธฐ + ์ํด์. + +์ด๋ค ์ํคํ
์ณ๋ ํน์ ํ ๋ฐฐ๋ฆฌ์ด๋ค์ ๋ํด์๋ ์ฌ๊ธฐ์ ์ด์ผ๊ธฐํ๋ ์ต์ํ์ +์๊ตฌ์ฌํญ๋ค๋ณด๋ค ๋ง์ ๊ธฐ๋ฅ์ ์ ๊ณตํ ์๋ ์์ต๋๋ค๋ง, ์ฌ๊ธฐ์ ์ด์ผ๊ธฐํ๋ +์๊ตฌ์ฌํญ๋ค์ ์ถฉ์กฑํ์ง ์๋ ์ํคํ
์ณ๊ฐ ์๋ค๋ฉด ๊ทธ ์ํคํ
์ณ๊ฐ ์๋ชป๋ ๊ฒ์ด๋ ์ ์ +์์๋์๊ธฐ ๋ฐ๋๋๋ค. + +๋ํ, ํน์ ์ํคํ
์ณ์์ ์ผ๋ถ ๋ฐฐ๋ฆฌ์ด๋ ํด๋น ์ํคํ
์ณ์ ํน์ํ ๋์ ๋ฐฉ์์ผ๋ก ์ธํด +ํด๋น ๋ฐฐ๋ฆฌ์ด์ ๋ช
์์ ์ฌ์ฉ์ด ๋ถํ์ํด์ no-op ์ด ๋ ์๋ ์์์ ์์๋์๊ธฐ +๋ฐ๋๋๋ค. + +์ญ์: ๋ณธ ๋ฒ์ญ ์ญ์ ์๋ฒฝํ์ง ์์๋ฐ, ์ด ์ญ์ ๋ถ๋ถ์ ์ผ๋ก๋ ์๋๋ ๊ฒ์ด๊ธฐ๋ +ํฉ๋๋ค. ์ฌํ ๊ธฐ์ ๋ฌธ์๋ค์ด ๊ทธ๋ ๋ฏ ์๋ฒฝํ ์ดํด๋ฅผ ์ํด์๋ ๋ฒ์ญ๋ฌธ๊ณผ ์๋ฌธ์ ํจ๊ป +์ฝ์ผ์๋ ๋ฒ์ญ๋ฌธ์ ํ๋์ ๊ฐ์ด๋๋ก ํ์ฉํ์๊ธธ ์ถ์ฒ๋๋ฆฌ๋ฉฐ, ๋ฐ๊ฒฌ๋๋ ์ค์ญ ๋ฑ์ +๋ํด์๋ ์ธ์ ๋ ์๊ฒฌ์ ๋ถํ๋๋ฆฝ๋๋ค. ๊ณผํ ๋ฒ์ญ์ผ๋ก ์ธํ ์คํด๋ฅผ ์ต์ํํ๊ธฐ ์ํด +์ ๋งคํ ๋ถ๋ถ์ด ์์ ๊ฒฝ์ฐ์๋ ์ด์ํจ์ด ์๋๋ผ๋ ์๋์ ์ฉ์ด๋ฅผ ์ฐจ์ฉํฉ๋๋ค. + + +===== +๋ชฉ์ฐจ: +===== + + (*) ์ถ์ ๋ฉ๋ชจ๋ฆฌ ์ก์ธ์ค ๋ชจ๋ธ. + + - ๋๋ฐ์ด์ค ์คํผ๋ ์ด์
. + - ๋ณด์ฅ์ฌํญ. + + (*) ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๋ ๋ฌด์์ธ๊ฐ? + + - ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด์ ์ข
๋ฅ. + - ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด์ ๋ํด ๊ฐ์ ํด์ ์๋ ๊ฒ. + - ๋ฐ์ดํฐ ์์กด์ฑ ๋ฐฐ๋ฆฌ์ด. + - ์ปจํธ๋กค ์์กด์ฑ. + - SMP ๋ฐฐ๋ฆฌ์ด ์ง๋ง์ถ๊ธฐ. + - ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด ์ํ์ค์ ์. + - ์ฝ๊ธฐ ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด vs ๋ก๋ ์์ธก. + - ์ดํ์ฑ + + (*) ๋ช
์์ ์ปค๋ ๋ฐฐ๋ฆฌ์ด. + + - ์ปดํ์ผ๋ฌ ๋ฐฐ๋ฆฌ์ด. + - CPU ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด. + - MMIO ์ฐ๊ธฐ ๋ฐฐ๋ฆฌ์ด. + + (*) ์๋ฌต์ ์ปค๋ ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด. + + - ๋ฝ Acquisition ํจ์. + - ์ธํฐ๋ฝํธ ๋นํ์ฑํ ํจ์. + - ์ฌ๋ฆฝ๊ณผ ์จ์ดํฌ์
ํจ์. + - ๊ทธ์ธ์ ํจ์๋ค. + + (*) CPU ๊ฐ ACQUIRING ๋ฐฐ๋ฆฌ์ด์ ํจ๊ณผ. + + - Acquire vs ๋ฉ๋ชจ๋ฆฌ ์ก์ธ์ค. + - Acquire vs I/O ์ก์ธ์ค. + + (*) ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๊ฐ ํ์ํ ๊ณณ + + - ํ๋ก์ธ์๊ฐ ์ํธ ์์ฉ. + - ์ดํ ๋ฏน ์คํผ๋ ์ด์
. + - ๋๋ฐ์ด์ค ์ก์ธ์ค. + - ์ธํฐ๋ฝํธ. + + (*) ์ปค๋ I/O ๋ฐฐ๋ฆฌ์ด์ ํจ๊ณผ. + + (*) ๊ฐ์ ๋๋ ๊ฐ์ฅ ์ํ๋ ์คํ ์์ ๋ชจ๋ธ. + + (*) CPU ์บ์์ ์ํฅ. + + - ์บ์ ์ผ๊ด์ฑ. + - ์บ์ ์ผ๊ด์ฑ vs DMA. + - ์บ์ ์ผ๊ด์ฑ vs MMIO. + + (*) CPU ๋ค์ด ์ ์ง๋ฅด๋ ์ผ๋ค. + + - ๊ทธ๋ฆฌ๊ณ , Alpha ๊ฐ ์๋ค. + - ๊ฐ์ ๋จธ์ ๊ฒ์คํธ. + + (*) ์ฌ์ฉ ์. + + - ์ํ์ ๋ฒํผ. + + (*) ์ฐธ๊ณ ๋ฌธํ. + + +======================= +์ถ์ ๋ฉ๋ชจ๋ฆฌ ์ก์ธ์ค ๋ชจ๋ธ +======================= + +๋ค์๊ณผ ๊ฐ์ด ์ถ์ํ๋ ์์คํ
๋ชจ๋ธ์ ์๊ฐํด ๋ด
์๋ค: + + : : + : : + : : + +-------+ : +--------+ : +-------+ + | | : | | : | | + | | : | | : | | + | CPU 1 |<----->| Memory |<----->| CPU 2 | + | | : | | : | | + | | : | | : | | + +-------+ : +--------+ : +-------+ + ^ : ^ : ^ + | : | : | + | : | : | + | : v : | + | : +--------+ : | + | : | | : | + | : | | : | + +---------->| Device |<----------+ + : | | : + : | | : + : +--------+ : + : : + +ํ๋ก๊ทธ๋จ์ ์ฌ๋ฌ ๋ฉ๋ชจ๋ฆฌ ์ก์ธ์ค ์คํผ๋ ์ด์
์ ๋ฐ์์ํค๊ณ , ๊ฐ๊ฐ์ CPU ๋ ๊ทธ๋ฐ +ํ๋ก๊ทธ๋จ๋ค์ ์คํํฉ๋๋ค. ์ถ์ํ๋ CPU ๋ชจ๋ธ์์ ๋ฉ๋ชจ๋ฆฌ ์คํผ๋ ์ด์
๋ค์ ์์๋ +๋งค์ฐ ์ํ๋์ด ์๊ณ , CPU ๋ ํ๋ก๊ทธ๋จ์ด ์ธ๊ณผ๊ด๊ณ๋ฅผ ์ด๊ธฐ์ง ์๋ ์ํ๋ก ๊ด๋ฆฌ๋๋ค๊ณ +๋ณด์ผ ์๋ง ์๋ค๋ฉด ๋ฉ๋ชจ๋ฆฌ ์คํผ๋ ์ด์
์ ์์ ์ด ์ํ๋ ์ด๋ค ์์๋๋ก๋ ์ฌ๋ฐฐ์นํด +๋์์ํฌ ์ ์์ต๋๋ค. ๋น์ทํ๊ฒ, ์ปดํ์ผ๋ฌ ๋ํ ํ๋ก๊ทธ๋จ์ ์ ์์ ๋์์ ํด์น์ง +์๋ ํ๋ ๋ด์์๋ ์ด๋ค ์์๋ก๋ ์์ ์ด ์ํ๋ ๋๋ก ์ธ์คํธ๋ญ์
์ ์ฌ๋ฐฐ์น ํ ์ +์์ต๋๋ค. + +๋ฐ๋ผ์ ์์ ๋ค์ด์ด๊ทธ๋จ์์ ํ CPU๊ฐ ๋์์ํค๋ ๋ฉ๋ชจ๋ฆฌ ์คํผ๋ ์ด์
์ด ๋ง๋ค์ด๋ด๋ +๋ณํ๋ ํด๋น ์คํผ๋ ์ด์
์ด CPU ์ ์์คํ
์ ๋ค๋ฅธ ๋ถ๋ถ๋ค ์ฌ์ด์ ์ธํฐํ์ด์ค(์ ์ )๋ฅผ +์ง๋๊ฐ๋ฉด์ ์์คํ
์ ๋๋จธ์ง ๋ถ๋ถ๋ค์ ์ธ์ง๋ฉ๋๋ค. + + +์๋ฅผ ๋ค์ด, ๋ค์์ ์ผ๋ จ์ ์ด๋ฒคํธ๋ค์ ์๊ฐํด ๋ด
์๋ค: + + CPU 1 CPU 2 + =============== =============== + { A == 1; B == 2 } + A = 3; x = B; + B = 4; y = A; + +๋ค์ด์ด๊ทธ๋จ์ ๊ฐ์ด๋ฐ์ ์์นํ ๋ฉ๋ชจ๋ฆฌ ์์คํ
์ ๋ณด์ฌ์ง๊ฒ ๋๋ ์ก์ธ์ค๋ค์ ๋ค์์ ์ด +24๊ฐ์ ์กฐํฉ์ผ๋ก ์ฌ๊ตฌ์ฑ๋ ์ ์์ต๋๋ค: + + STORE A=3, STORE B=4, y=LOAD A->3, x=LOAD B->4 + STORE A=3, STORE B=4, x=LOAD B->4, y=LOAD A->3 + STORE A=3, y=LOAD A->3, STORE B=4, x=LOAD B->4 + STORE A=3, y=LOAD A->3, x=LOAD B->2, STORE B=4 + STORE A=3, x=LOAD B->2, STORE B=4, y=LOAD A->3 + STORE A=3, x=LOAD B->2, y=LOAD A->3, STORE B=4 + STORE B=4, STORE A=3, y=LOAD A->3, x=LOAD B->4 + STORE B=4, ... + ... + +๋ฐ๋ผ์ ๋ค์์ ๋ค๊ฐ์ง ์กฐํฉ์ ๊ฐ๋ค์ด ๋์ฌ ์ ์์ต๋๋ค: + + x == 2, y == 1 + x == 2, y == 3 + x == 4, y == 1 + x == 4, y == 3 + + +ํ๋ฐ ๋ ๋์๊ฐ์, ํ CPU ๊ฐ ๋ฉ๋ชจ๋ฆฌ ์์คํ
์ ๋ฐ์ํ ์คํ ์ด ์คํผ๋ ์ด์
๋ค์ ๊ฒฐ๊ณผ๋ +๋ค๋ฅธ CPU ์์์ ๋ก๋ ์คํผ๋ ์ด์
์ ํตํด ์ธ์ง๋๋๋ฐ, ์ด ๋ ์คํ ์ด๊ฐ ๋ฐ์๋ ์์์ +๋ค๋ฅธ ์์๋ก ์ธ์ง๋ ์๋ ์์ต๋๋ค. + + +์๋ก, ์๋์ ์ผ๋ จ์ ์ด๋ฒคํธ๋ค์ ์๊ฐํด ๋ด
์๋ค: + + CPU 1 CPU 2 + =============== =============== + { A == 1, B == 2, C == 3, P == &A, Q == &C } + B = 4; Q = P; + P = &B D = *Q; + +D ๋ก ์ฝํ์ง๋ ๊ฐ์ CPU 2 ์์ P ๋ก๋ถํฐ ์ฝํ์ง ์ฃผ์๊ฐ์ ์์กด์ ์ด๊ธฐ ๋๋ฌธ์ ์ฌ๊ธฐ์ +๋ถ๋ช
ํ ๋ฐ์ดํฐ ์์กด์ฑ์ด ์์ต๋๋ค. ํ์ง๋ง ์ด ์ด๋ฒคํธ๋ค์ ์คํ ๊ฒฐ๊ณผ๋ก๋ ์๋์ +๊ฒฐ๊ณผ๋ค์ด ๋ชจ๋ ๋ํ๋ ์ ์์ต๋๋ค: + + (Q == &A) and (D == 1) + (Q == &B) and (D == 2) + (Q == &B) and (D == 4) + +CPU 2 ๋ *Q ์ ๋ก๋๋ฅผ ์์ฒญํ๊ธฐ ์ ์ P ๋ฅผ Q ์ ๋ฃ๊ธฐ ๋๋ฌธ์ D ์ C ๋ฅผ ์ง์ด๋ฃ๋ +์ผ์ ์์์ ์์๋์ธ์. + + +๋๋ฐ์ด์ค ์คํผ๋ ์ด์
+------------------- + +์ผ๋ถ ๋๋ฐ์ด์ค๋ ์์ ์ ์ปจํธ๋กค ์ธํฐํ์ด์ค๋ฅผ ๋ฉ๋ชจ๋ฆฌ์ ํน์ ์์ญ์ผ๋ก ๋งคํํด์ +์ ๊ณตํ๋๋ฐ(Memory mapped I/O), ํด๋น ์ปจํธ๋กค ๋ ์ง์คํฐ์ ์ ๊ทผํ๋ ์์๋ ๋งค์ฐ +์ค์ํฉ๋๋ค. ์๋ฅผ ๋ค์ด, ์ด๋๋ ์ค ํฌํธ ๋ ์ง์คํฐ (A) ์ ๋ฐ์ดํฐ ํฌํธ ๋ ์ง์คํฐ (D) +๋ฅผ ํตํด ์ ๊ทผ๋๋ ๋ด๋ถ ๋ ์ง์คํฐ ์งํฉ์ ๊ฐ๋ ์ด๋๋ท ์นด๋๋ฅผ ์๊ฐํด ๋ด
์๋ค. ๋ด๋ถ์ +5๋ฒ ๋ ์ง์คํฐ๋ฅผ ์ฝ๊ธฐ ์ํด ๋ค์์ ์ฝ๋๊ฐ ์ฌ์ฉ๋ ์ ์์ต๋๋ค: + + *A = 5; + x = *D; + +ํ์ง๋ง, ์ด๊ฑด ๋ค์์ ๋ ์กฐํฉ ์ค ํ๋๋ก ๋ง๋ค์ด์ง ์ ์์ต๋๋ค: + + STORE *A = 5, x = LOAD *D + x = LOAD *D, STORE *A = 5 + +๋๋ฒ์งธ ์กฐํฉ์ ๋ฐ์ดํฐ๋ฅผ ์ฝ์ด์จ _ํ์_ ์ฃผ์๋ฅผ ์ค์ ํ๋ฏ๋ก, ์ค๋์์ ์ผ์ผํฌ ๊ฒ๋๋ค. + + +๋ณด์ฅ์ฌํญ +-------- + +CPU ์๊ฒ ๊ธฐ๋ํ ์ ์๋ ์ต์ํ์ ๋ณด์ฅ์ฌํญ ๋ช๊ฐ์ง๊ฐ ์์ต๋๋ค: + + (*) ์ด๋ค CPU ๋ , ์์กด์ฑ์ด ์กด์ฌํ๋ ๋ฉ๋ชจ๋ฆฌ ์ก์ธ์ค๋ค์ ํด๋น CPU ์์ ์๊ฒ + ์์ด์๋ ์์๋๋ก ๋ฉ๋ชจ๋ฆฌ ์์คํ
์ ์ํ ์์ฒญ๋ฉ๋๋ค. ์ฆ, ๋ค์์ ๋ํด์: + + Q = READ_ONCE(P); smp_read_barrier_depends(); D = READ_ONCE(*Q); + + CPU ๋ ๋ค์๊ณผ ๊ฐ์ ๋ฉ๋ชจ๋ฆฌ ์คํผ๋ ์ด์
์ํ์ค๋ฅผ ์ํ ์์ฒญํฉ๋๋ค: + + Q = LOAD P, D = LOAD *Q + + ๊ทธ๋ฆฌ๊ณ ๊ทธ ์ํ์ค ๋ด์์์ ์์๋ ํญ์ ์ง์ผ์ง๋๋ค. ๋๋ถ๋ถ์ ์์คํ
์์ + smp_read_barrier_depends() ๋ ์๋ฌด์ผ๋ ์ํ์ง๋ง DEC Alpha ์์๋ + ๋ช
์์ ์ผ๋ก ์ฌ์ฉ๋์ด์ผ ํฉ๋๋ค. ๋ณดํต์ ๊ฒฝ์ฐ์๋ smp_read_barrier_depends() + ๋ฅผ ์ง์ ์ฌ์ฉํ๋ ๋์ rcu_dereference() ๊ฐ์ ๊ฒ๋ค์ ์ฌ์ฉํด์ผ ํจ์ + ์์๋์ธ์. + + (*) ํน์ CPU ๋ด์์ ๊ฒน์น๋ ์์ญ์ ๋ฉ๋ชจ๋ฆฌ์ ํํด์ง๋ ๋ก๋์ ์คํ ์ด ๋ค์ ํด๋น + CPU ์์์๋ ์์๊ฐ ๋ฐ๋์ง ์์ ๊ฒ์ผ๋ก ๋ณด์ฌ์ง๋๋ค. ์ฆ, ๋ค์์ ๋ํด์: + + a = READ_ONCE(*X); WRITE_ONCE(*X, b); + + CPU ๋ ๋ค์์ ๋ฉ๋ชจ๋ฆฌ ์คํผ๋ ์ด์
์ํ์ค๋ง์ ๋ฉ๋ชจ๋ฆฌ์ ์์ฒญํ ๊ฒ๋๋ค: + + a = LOAD *X, STORE *X = b + + ๊ทธ๋ฆฌ๊ณ ๋ค์์ ๋ํด์๋: + + WRITE_ONCE(*X, c); d = READ_ONCE(*X); + + CPU ๋ ๋ค์์ ์ํ ์์ฒญ๋ง์ ๋ง๋ค์ด ๋
๋๋ค: + + STORE *X = c, d = LOAD *X + + (๋ก๋ ์คํผ๋ ์ด์
๊ณผ ์คํ ์ด ์คํผ๋ ์ด์
์ด ๊ฒน์น๋ ๋ฉ๋ชจ๋ฆฌ ์์ญ์ ๋ํด + ์ํ๋๋ค๋ฉด ํด๋น ์คํผ๋ ์ด์
๋ค์ ๊ฒน์น๋ค๊ณ ํํ๋ฉ๋๋ค). + +๊ทธ๋ฆฌ๊ณ _๋ฐ๋์_ ๋๋ _์ ๋๋ก_ ๊ฐ์ ํ๊ฑฐ๋ ๊ฐ์ ํ์ง ๋ง์์ผ ํ๋ ๊ฒ๋ค์ด ์์ต๋๋ค: + + (*) ์ปดํ์ผ๋ฌ๊ฐ READ_ONCE() ๋ WRITE_ONCE() ๋ก ๋ณดํธ๋์ง ์์ ๋ฉ๋ชจ๋ฆฌ ์ก์ธ์ค๋ฅผ + ๋น์ ์ด ์ํ๋ ๋๋ก ํ ๊ฒ์ด๋ผ๋ ๊ฐ์ ์ _์ ๋๋ก_ ํด์ ์๋ฉ๋๋ค. ๊ทธ๊ฒ๋ค์ด + ์๋ค๋ฉด, ์ปดํ์ผ๋ฌ๋ ์ปดํ์ผ๋ฌ ๋ฐฐ๋ฆฌ์ด ์น์
์์ ๋ค๋ฃจ๊ฒ ๋ , ๋ชจ๋ "์ฐฝ์์ ์ธ" + ๋ณ๊ฒฝ๋ค์ ๋ง๋ค์ด๋ผ ๊ถํ์ ๊ฐ๊ฒ ๋ฉ๋๋ค. + + (*) ๊ฐ๋ณ์ ์ธ ๋ก๋์ ์คํ ์ด๋ค์ด ์ฃผ์ด์ง ์์๋๋ก ์์ฒญ๋ ๊ฒ์ด๋ผ๋ ๊ฐ์ ์ _์ ๋๋ก_ + ํ์ง ๋ง์์ผ ํฉ๋๋ค. ์ด ๋ง์ ๊ณง: + + X = *A; Y = *B; *D = Z; + + ๋ ๋ค์์ ๊ฒ๋ค ์ค ์ด๋ ๊ฒ์ผ๋ก๋ ๋ง๋ค์ด์ง ์ ์๋ค๋ ์๋ฏธ์
๋๋ค: + + X = LOAD *A, Y = LOAD *B, STORE *D = Z + X = LOAD *A, STORE *D = Z, Y = LOAD *B + Y = LOAD *B, X = LOAD *A, STORE *D = Z + Y = LOAD *B, STORE *D = Z, X = LOAD *A + STORE *D = Z, X = LOAD *A, Y = LOAD *B + STORE *D = Z, Y = LOAD *B, X = LOAD *A + + (*) ๊ฒน์น๋ ๋ฉ๋ชจ๋ฆฌ ์ก์ธ์ค๋ค์ ํฉ์ณ์ง๊ฑฐ๋ ๋ฒ๋ ค์ง ์ ์์์ _๋ฐ๋์_ ๊ฐ์ ํด์ผ + ํฉ๋๋ค. ๋ค์์ ์ฝ๋๋: + + X = *A; Y = *(A + 4); + + ๋ค์์ ๊ฒ๋ค ์ค ๋ญ๋ ๋ ์ ์์ต๋๋ค: + + X = LOAD *A; Y = LOAD *(A + 4); + Y = LOAD *(A + 4); X = LOAD *A; + {X, Y} = LOAD {*A, *(A + 4) }; + + ๊ทธ๋ฆฌ๊ณ : + + *A = X; *(A + 4) = Y; + + ๋ ๋ค์ ์ค ๋ญ๋ ๋ ์ ์์ต๋๋ค: + + STORE *A = X; STORE *(A + 4) = Y; + STORE *(A + 4) = Y; STORE *A = X; + STORE {*A, *(A + 4) } = {X, Y}; + +๊ทธ๋ฆฌ๊ณ ๋ณด์ฅ์ฌํญ์ ๋ฐ๋๋๋ ๊ฒ๋ค(anti-guarantees)์ด ์์ต๋๋ค: + + (*) ์ด ๋ณด์ฅ์ฌํญ๋ค์ bitfield ์๋ ์ ์ฉ๋์ง ์๋๋ฐ, ์ปดํ์ผ๋ฌ๋ค์ bitfield ๋ฅผ + ์์ ํ๋ ์ฝ๋๋ฅผ ์์ฑํ ๋ ์์์ฑ ์๋(non-atomic) ์ฝ๊ณ -์์ ํ๊ณ -์ฐ๋ + ์ธ์คํธ๋ญ์
๋ค์ ์กฐํฉ์ ๋ง๋๋ ๊ฒฝ์ฐ๊ฐ ๋ง๊ธฐ ๋๋ฌธ์
๋๋ค. ๋ณ๋ ฌ ์๊ณ ๋ฆฌ์ฆ์ + ๋๊ธฐํ์ bitfield ๋ฅผ ์ฌ์ฉํ๋ ค ํ์ง ๋ง์ญ์์ค. + + (*) bitfield ๋ค์ด ์ฌ๋ฌ ๋ฝ์ผ๋ก ๋ณดํธ๋๋ ๊ฒฝ์ฐ๋ผ ํ๋๋ผ๋, ํ๋์ bitfield ์ + ๋ชจ๋ ํ๋๋ค์ ํ๋์ ๋ฝ์ผ๋ก ๋ณดํธ๋์ด์ผ ํฉ๋๋ค. ๋ง์ฝ ํ bitfield ์ ๋ + ํ๋๊ฐ ์๋ก ๋ค๋ฅธ ๋ฝ์ผ๋ก ๋ณดํธ๋๋ค๋ฉด, ์ปดํ์ผ๋ฌ์ ์์์ฑ ์๋ + ์ฝ๊ณ -์์ ํ๊ณ -์ฐ๋ ์ธ์คํธ๋ญ์
์กฐํฉ์ ํ ํ๋์์ ์
๋ฐ์ดํธ๊ฐ ๊ทผ์ฒ์ + ํ๋์๋ ์ํฅ์ ๋ผ์น๊ฒ ํ ์ ์์ต๋๋ค. + + (*) ์ด ๋ณด์ฅ์ฌํญ๋ค์ ์ ์ ํ๊ฒ ์ ๋ ฌ๋๊ณ ํฌ๊ธฐ๊ฐ ์กํ ์ค์นผ๋ผ ๋ณ์๋ค์ ๋ํด์๋ง + ์ ์ฉ๋ฉ๋๋ค. "์ ์ ํ๊ฒ ํฌ๊ธฐ๊ฐ ์กํ" ์ด๋ผํจ์ ํ์ฌ๋ก์จ๋ "char", "short", + "int" ๊ทธ๋ฆฌ๊ณ "long" ๊ณผ ๊ฐ์ ํฌ๊ธฐ์ ๋ณ์๋ค์ ์๋ฏธํฉ๋๋ค. "์ ์ ํ๊ฒ ์ ๋ ฌ๋" + ์ ์์ฐ์ค๋ฐ ์ ๋ ฌ์ ์๋ฏธํ๋๋ฐ, ๋ฐ๋ผ์ "char" ์ ๋ํด์๋ ์๋ฌด ์ ์ฝ์ด ์๊ณ , + "short" ์ ๋ํด์๋ 2๋ฐ์ดํธ ์ ๋ ฌ์, "int" ์๋ 4๋ฐ์ดํธ ์ ๋ ฌ์, ๊ทธ๋ฆฌ๊ณ + "long" ์ ๋ํด์๋ 32-bit ์์คํ
์ธ์ง 64-bit ์์คํ
์ธ์ง์ ๋ฐ๋ผ 4๋ฐ์ดํธ ๋๋ + 8๋ฐ์ดํธ ์ ๋ ฌ์ ์๋ฏธํฉ๋๋ค. ์ด ๋ณด์ฅ์ฌํญ๋ค์ C11 ํ์ค์์ ์๊ฐ๋์์ผ๋ฏ๋ก, + C11 ์ ์ ์ค๋๋ ์ปดํ์ผ๋ฌ(์๋ฅผ ๋ค์ด, gcc 4.6) ๋ฅผ ์ฌ์ฉํ ๋์ ์ฃผ์ํ์๊ธฐ + ๋ฐ๋๋๋ค. ํ์ค์ ์ด ๋ณด์ฅ์ฌํญ๋ค์ "memory location" ์ ์ ์ํ๋ 3.14 + ์น์
์ ๋ค์๊ณผ ๊ฐ์ด ์ค๋ช
๋์ด ์์ต๋๋ค: + (์ญ์: ์ธ์ฉ๋ฌธ์ด๋ฏ๋ก ๋ฒ์ญํ์ง ์์ต๋๋ค) + + memory location + either an object of scalar type, or a maximal sequence + of adjacent bit-fields all having nonzero width + + NOTE 1: Two threads of execution can update and access + separate memory locations without interfering with + each other. + + NOTE 2: A bit-field and an adjacent non-bit-field member + are in separate memory locations. The same applies + to two bit-fields, if one is declared inside a nested + structure declaration and the other is not, or if the two + are separated by a zero-length bit-field declaration, + or if they are separated by a non-bit-field member + declaration. It is not safe to concurrently update two + bit-fields in the same structure if all members declared + between them are also bit-fields, no matter what the + sizes of those intervening bit-fields happen to be. + + +========================= +๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๋ ๋ฌด์์ธ๊ฐ? +========================= + +์์์ ๋ดค๋ฏ์ด, ์ํธ๊ฐ ์์กด์ฑ์ด ์๋ ๋ฉ๋ชจ๋ฆฌ ์คํผ๋ ์ด์
๋ค์ ์ค์ ๋ก๋ ๋ฌด์์์ +์์๋ก ์ํ๋ ์ ์์ผ๋ฉฐ, ์ด๋ CPU ์ CPU ๊ฐ์ ์ํธ์์ฉ์ด๋ I/O ์ ๋ฌธ์ ๊ฐ ๋ ์ +์์ต๋๋ค. ๋ฐ๋ผ์ ์ปดํ์ผ๋ฌ์ CPU ๊ฐ ์์๋ฅผ ๋ฐ๊พธ๋๋ฐ ์ ์ฝ์ ๊ฑธ ์ ์๋๋ก ๊ฐ์
ํ +์ ์๋ ์ด๋ค ๋ฐฉ๋ฒ์ด ํ์ํฉ๋๋ค. + +๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๋ ๊ทธ๋ฐ ๊ฐ์
์๋จ์
๋๋ค. ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๋ ๋ฐฐ๋ฆฌ์ด๋ฅผ ์ฌ์ด์ ๋ ์๊ณผ +๋ค ์์ธก์ ๋ฉ๋ชจ๋ฆฌ ์คํผ๋ ์ด์
๋ค ๊ฐ์ ๋ถ๋ถ์ ์์๊ฐ ์กด์ฌํ๋๋ก ํ๋ ํจ๊ณผ๋ฅผ ์ค๋๋ค. + +์์คํ
์ CPU ๋ค๊ณผ ์ฌ๋ฌ ๋๋ฐ์ด์ค๋ค์ ์ฑ๋ฅ์ ์ฌ๋ฆฌ๊ธฐ ์ํด ๋ช
๋ น์ด ์ฌ๋ฐฐ์น, ์คํ +์ ์, ๋ฉ๋ชจ๋ฆฌ ์คํผ๋ ์ด์
๋ค์ ์กฐํฉ, ์์ธก์ ๋ก๋(speculative load), ๋ธ๋์น +์์ธก(speculative branch prediction), ๋ค์ํ ์ข
๋ฅ์ ์บ์ฑ(caching) ๋ฑ์ ๋ค์ํ +ํธ๋ฆญ์ ์ฌ์ฉํ ์ ์๊ธฐ ๋๋ฌธ์ ์ด๋ฐ ๊ฐ์ ๋ ฅ์ ์ค์ํฉ๋๋ค. ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๋ค์ ์ด๋ฐ +ํธ๋ฆญ๋ค์ ๋ฌดํจ๋ก ํ๊ฑฐ๋ ์ต์ ํ๋ ๋ชฉ์ ์ผ๋ก ์ฌ์ฉ๋์ด์ ธ์ ์ฝ๋๊ฐ ์ฌ๋ฌ CPU ์ +๋๋ฐ์ด์ค๋ค ๊ฐ์ ์ํธ์์ฉ์ ์ ์์ ์ผ๋ก ์ ์ดํ ์ ์๊ฒ ํด์ค๋๋ค. + + +๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด์ ์ข
๋ฅ +-------------------- + +๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๋ ๋ค๊ฐ์ ๊ธฐ๋ณธ ํ์
์ผ๋ก ๋ถ๋ฅ๋ฉ๋๋ค: + + (1) ์ฐ๊ธฐ (๋๋ ์คํ ์ด) ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด. + + ์ฐ๊ธฐ ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๋ ์์คํ
์ ๋ค๋ฅธ ์ปดํฌ๋ํธ๋ค์ ํด๋น ๋ฐฐ๋ฆฌ์ด๋ณด๋ค ์์ + ๋ช
์๋ ๋ชจ๋ STORE ์คํผ๋ ์ด์
๋ค์ด ํด๋น ๋ฐฐ๋ฆฌ์ด ๋ค์ ๋ช
์๋ ๋ชจ๋ STORE + ์คํผ๋ ์ด์
๋ค๋ณด๋ค ๋จผ์ ์ํ๋ ๊ฒ์ผ๋ก ๋ณด์ผ ๊ฒ์ ๋ณด์ฅํฉ๋๋ค. + + ์ฐ๊ธฐ ๋ฐฐ๋ฆฌ์ด๋ ์คํ ์ด ์คํผ๋ ์ด์
๋ค์ ๋ํ ๋ถ๋ถ์ ์์ ์ธ์ฐ๊ธฐ์
๋๋ค; ๋ก๋ + ์คํผ๋ ์ด์
๋ค์ ๋ํด์๋ ์ด๋ค ์ํฅ๋ ๋ผ์น์ง ์์ต๋๋ค. + + CPU ๋ ์๊ฐ์ ํ๋ฆ์ ๋ฐ๋ผ ๋ฉ๋ชจ๋ฆฌ ์์คํ
์ ์ผ๋ จ์ ์คํ ์ด ์คํผ๋ ์ด์
๋ค์ + ํ๋์ฉ ์์ฒญํด ์ง์ด๋ฃ์ต๋๋ค. ์ฐ๊ธฐ ๋ฐฐ๋ฆฌ์ด ์์ ๋ชจ๋ ์คํ ์ด ์คํผ๋ ์ด์
๋ค์ + ์ฐ๊ธฐ ๋ฐฐ๋ฆฌ์ด ๋ค์ ๋ชจ๋ ์คํ ์ด ์คํผ๋ ์ด์
๋ค๋ณด๋ค _์์_ ์ํ๋ ๊ฒ๋๋ค. + + [!] ์ฐ๊ธฐ ๋ฐฐ๋ฆฌ์ด๋ค์ ์ฝ๊ธฐ ๋๋ ๋ฐ์ดํฐ ์์กด์ฑ ๋ฐฐ๋ฆฌ์ด์ ํจ๊ป ์ง์ ๋ง์ถฐ + ์ฌ์ฉ๋์ด์ผ๋ง ํจ์ ์์๋์ธ์; "SMP ๋ฐฐ๋ฆฌ์ด ์ง๋ง์ถ๊ธฐ" ์๋ธ์น์
์ ์ฐธ๊ณ ํ์ธ์. + + + (2) ๋ฐ์ดํฐ ์์กด์ฑ ๋ฐฐ๋ฆฌ์ด. + + ๋ฐ์ดํฐ ์์กด์ฑ ๋ฐฐ๋ฆฌ์ด๋ ์ฝ๊ธฐ ๋ฐฐ๋ฆฌ์ด์ ๋ณด๋ค ์ํ๋ ํํ์
๋๋ค. ๋๊ฐ์ ๋ก๋ + ์คํผ๋ ์ด์
์ด ์๊ณ ๋๋ฒ์งธ ๊ฒ์ด ์ฒซ๋ฒ์งธ ๊ฒ์ ๊ฒฐ๊ณผ์ ์์กดํ๊ณ ์์ ๋(์: + ๋๋ฒ์งธ ๋ก๋๊ฐ ์ฐธ์กฐํ ์ฃผ์๋ฅผ ์ฒซ๋ฒ์งธ ๋ก๋๊ฐ ์ฝ๋ ๊ฒฝ์ฐ), ๋๋ฒ์งธ ๋ก๋๊ฐ ์ฝ์ด์ฌ + ๋ฐ์ดํฐ๋ ์ฒซ๋ฒ์งธ ๋ก๋์ ์ํด ๊ทธ ์ฃผ์๊ฐ ์ป์ด์ง๊ธฐ ์ ์ ์
๋ฐ์ดํธ ๋์ด ์์์ + ๋ณด์ฅํ๊ธฐ ์ํด์ ๋ฐ์ดํฐ ์์กด์ฑ ๋ฐฐ๋ฆฌ์ด๊ฐ ํ์ํ ์ ์์ต๋๋ค. + + ๋ฐ์ดํฐ ์์กด์ฑ ๋ฐฐ๋ฆฌ์ด๋ ์ํธ ์์กด์ ์ธ ๋ก๋ ์คํผ๋ ์ด์
๋ค ์ฌ์ด์ ๋ถ๋ถ์ ์์ + ์ธ์ฐ๊ธฐ์
๋๋ค; ์คํ ์ด ์คํผ๋ ์ด์
๋ค์ด๋ ๋
๋ฆฝ์ ์ธ ๋ก๋๋ค, ๋๋ ์ค๋ณต๋๋ + ๋ก๋๋ค์ ๋ํด์๋ ์ด๋ค ์ํฅ๋ ๋ผ์น์ง ์์ต๋๋ค. + + (1) ์์ ์ธ๊ธํ๋ฏ์ด, ์์คํ
์ CPU ๋ค์ ๋ฉ๋ชจ๋ฆฌ ์์คํ
์ ์ผ๋ จ์ ์คํ ์ด + ์คํผ๋ ์ด์
๋ค์ ๋์ ธ ๋ฃ๊ณ ์์ผ๋ฉฐ, ๊ฑฐ๊ธฐ์ ๊ด์ฌ์ด ์๋ ๋ค๋ฅธ CPU ๋ ๊ทธ + ์คํผ๋ ์ด์
๋ค์ ๋ฉ๋ชจ๋ฆฌ ์์คํ
์ด ์คํํ ๊ฒฐ๊ณผ๋ฅผ ์ธ์งํ ์ ์์ต๋๋ค. ์ด์ฒ๋ผ + ๋ค๋ฅธ CPU ์ ์คํ ์ด ์คํผ๋ ์ด์
์ ๊ฒฐ๊ณผ์ ๊ด์ฌ์ ๋๊ณ ์๋ CPU ๊ฐ ์ํ ์์ฒญํ + ๋ฐ์ดํฐ ์์กด์ฑ ๋ฐฐ๋ฆฌ์ด๋, ๋ฐฐ๋ฆฌ์ด ์์ ์ด๋ค ๋ก๋ ์คํผ๋ ์ด์
์ด ๋ค๋ฅธ CPU ์์ + ๋์ ธ ๋ฃ์ ์คํ ์ด ์คํผ๋ ์ด์
๊ณผ ๊ฐ์ ์์ญ์ ํฅํ๋ค๋ฉด, ๊ทธ๋ฐ ์คํ ์ด + ์คํผ๋ ์ด์
๋ค์ด ๋ง๋ค์ด๋ด๋ ๊ฒฐ๊ณผ๊ฐ ๋ฐ์ดํฐ ์์กด์ฑ ๋ฐฐ๋ฆฌ์ด ๋ค์ ๋ก๋ + ์คํผ๋ ์ด์
๋ค์๊ฒ๋ ๋ณด์ผ ๊ฒ์ ๋ณด์ฅํฉ๋๋ค. + + ์ด ์์ ์ธ์ฐ๊ธฐ ์ ์ฝ์ ๋ํ ๊ทธ๋ฆผ์ ๋ณด๊ธฐ ์ํด์ "๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด ์ํ์ค์ ์" + ์๋ธ์น์
์ ์ฐธ๊ณ ํ์๊ธฐ ๋ฐ๋๋๋ค. + + [!] ์ฒซ๋ฒ์งธ ๋ก๋๋ ๋ฐ๋์ _๋ฐ์ดํฐ_ ์์กด์ฑ์ ๊ฐ์ ธ์ผ์ง ์ปจํธ๋กค ์์กด์ฑ์ ๊ฐ์ ธ์ผ + ํ๋๊ฒ ์๋์ ์์๋์ญ์์ค. ๋ง์ฝ ๋๋ฒ์งธ ๋ก๋๋ฅผ ์ํ ์ฃผ์๊ฐ ์ฒซ๋ฒ์งธ ๋ก๋์ + ์์กด์ ์ด์ง๋ง ๊ทธ ์์กด์ฑ์ ์กฐ๊ฑด์ ์ด์ง ๊ทธ ์ฃผ์ ์์ฒด๋ฅผ ๊ฐ์ ธ์ค๋๊ฒ ์๋๋ผ๋ฉด, + ๊ทธ๊ฒ์ _์ปจํธ๋กค_ ์์กด์ฑ์ด๊ณ , ์ด ๊ฒฝ์ฐ์๋ ์ฝ๊ธฐ ๋ฐฐ๋ฆฌ์ด๋ ๊ทธ๋ณด๋ค ๊ฐ๋ ฅํ + ๋ฌด์ธ๊ฐ๊ฐ ํ์ํฉ๋๋ค. ๋ ์์ธํ ๋ด์ฉ์ ์ํด์๋ "์ปจํธ๋กค ์์กด์ฑ" ์๋ธ์น์
์ + ์ฐธ๊ณ ํ์๊ธฐ ๋ฐ๋๋๋ค. + + [!] ๋ฐ์ดํฐ ์์กด์ฑ ๋ฐฐ๋ฆฌ์ด๋ ๋ณดํต ์ฐ๊ธฐ ๋ฐฐ๋ฆฌ์ด๋ค๊ณผ ํจ๊ป ์ง์ ๋ง์ถฐ ์ฌ์ฉ๋์ด์ผ + ํฉ๋๋ค; "SMP ๋ฐฐ๋ฆฌ์ด ์ง๋ง์ถ๊ธฐ" ์๋ธ์น์
์ ์ฐธ๊ณ ํ์ธ์. + + + (3) ์ฝ๊ธฐ (๋๋ ๋ก๋) ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด. + + ์ฝ๊ธฐ ๋ฐฐ๋ฆฌ์ด๋ ๋ฐ์ดํฐ ์์กด์ฑ ๋ฐฐ๋ฆฌ์ด ๊ธฐ๋ฅ์ ๋ณด์ฅ์ฌํญ์ ๋ํด์ ๋ฐฐ๋ฆฌ์ด๋ณด๋ค + ์์ ๋ช
์๋ ๋ชจ๋ LOAD ์คํผ๋ ์ด์
๋ค์ด ๋ฐฐ๋ฆฌ์ด ๋ค์ ๋ช
์๋๋ ๋ชจ๋ LOAD + ์คํผ๋ ์ด์
๋ค๋ณด๋ค ๋จผ์ ํํด์ง ๊ฒ์ผ๋ก ์์คํ
์ ๋ค๋ฅธ ์ปดํฌ๋ํธ๋ค์ ๋ณด์ฌ์ง ๊ฒ์ + ๋ณด์ฅํฉ๋๋ค. + + ์ฝ๊ธฐ ๋ฐฐ๋ฆฌ์ด๋ ๋ก๋ ์คํผ๋ ์ด์
์ ํํด์ง๋ ๋ถ๋ถ์ ์์ ์ธ์ฐ๊ธฐ์
๋๋ค; ์คํ ์ด + ์คํผ๋ ์ด์
์ ๋ํด์๋ ์ด๋ค ์ํฅ๋ ๋ผ์น์ง ์์ต๋๋ค. + + ์ฝ๊ธฐ ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๋ ๋ฐ์ดํฐ ์์กด์ฑ ๋ฐฐ๋ฆฌ์ด๋ฅผ ๋ด์ฅํ๋ฏ๋ก ๋ฐ์ดํฐ ์์กด์ฑ + ๋ฐฐ๋ฆฌ์ด๋ฅผ ๋์ ํ ์ ์์ต๋๋ค. + + [!] ์ฝ๊ธฐ ๋ฐฐ๋ฆฌ์ด๋ ์ผ๋ฐ์ ์ผ๋ก ์ฐ๊ธฐ ๋ฐฐ๋ฆฌ์ด๋ค๊ณผ ํจ๊ป ์ง์ ๋ง์ถฐ ์ฌ์ฉ๋์ด์ผ + ํฉ๋๋ค; "SMP ๋ฐฐ๋ฆฌ์ด ์ง๋ง์ถ๊ธฐ" ์๋ธ์น์
์ ์ฐธ๊ณ ํ์ธ์. + + + (4) ๋ฒ์ฉ ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด. + + ๋ฒ์ฉ(general) ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๋ ๋ฐฐ๋ฆฌ์ด๋ณด๋ค ์์ ๋ช
์๋ ๋ชจ๋ LOAD ์ STORE + ์คํผ๋ ์ด์
๋ค์ด ๋ฐฐ๋ฆฌ์ด ๋ค์ ๋ช
์๋ ๋ชจ๋ LOAD ์ STORE ์คํผ๋ ์ด์
๋ค๋ณด๋ค + ๋จผ์ ์ํ๋ ๊ฒ์ผ๋ก ์์คํ
์ ๋๋จธ์ง ์ปดํฌ๋ํธ๋ค์ ๋ณด์ด๊ฒ ๋จ์ ๋ณด์ฅํฉ๋๋ค. + + ๋ฒ์ฉ ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๋ ๋ก๋์ ์คํ ์ด ๋ชจ๋์ ๋ํ ๋ถ๋ถ์ ์์ ์ธ์ฐ๊ธฐ์
๋๋ค. + + ๋ฒ์ฉ ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๋ ์ฝ๊ธฐ ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด, ์ฐ๊ธฐ ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด ๋ชจ๋๋ฅผ + ๋ด์ฅํ๋ฏ๋ก, ๋ ๋ฐฐ๋ฆฌ์ด๋ฅผ ๋ชจ๋ ๋์ ํ ์ ์์ต๋๋ค. + + +๊ทธ๋ฆฌ๊ณ ๋๊ฐ์ ๋ช
์์ ์ด์ง ์์ ํ์
์ด ์์ต๋๋ค: + + (5) ACQUIRE ์คํผ๋ ์ด์
. + + ์ด ํ์
์ ์คํผ๋ ์ด์
์ ๋จ๋ฐฉํฅ์ ํฌ๊ณผ์ฑ ๋ฐฐ๋ฆฌ์ด์ฒ๋ผ ๋์ํฉ๋๋ค. ACQUIRE + ์คํผ๋ ์ด์
๋ค์ ๋ชจ๋ ๋ฉ๋ชจ๋ฆฌ ์คํผ๋ ์ด์
๋ค์ด ACQUIRE ์คํผ๋ ์ด์
ํ์ + ์ผ์ด๋ ๊ฒ์ผ๋ก ์์คํ
์ ๋๋จธ์ง ์ปดํฌ๋ํธ๋ค์ ๋ณด์ด๊ฒ ๋ ๊ฒ์ด ๋ณด์ฅ๋ฉ๋๋ค. + LOCK ์คํผ๋ ์ด์
๊ณผ smp_load_acquire(), smp_cond_acquire() ์คํผ๋ ์ด์
๋ + ACQUIRE ์คํผ๋ ์ด์
์ ํฌํจ๋ฉ๋๋ค. smp_cond_acquire() ์คํผ๋ ์ด์
์ ์ปจํธ๋กค + ์์กด์ฑ๊ณผ smp_rmb() ๋ฅผ ์ฌ์ฉํด์ ACQUIRE ์ ์๋ฏธ์ ์๊ตฌ์ฌํญ(semantic)์ + ์ถฉ์กฑ์ํต๋๋ค. + + ACQUIRE ์คํผ๋ ์ด์
์์ ๋ฉ๋ชจ๋ฆฌ ์คํผ๋ ์ด์
๋ค์ ACQUIRE ์คํผ๋ ์ด์
์๋ฃ ํ์ + ์ํ๋ ๊ฒ์ฒ๋ผ ๋ณด์ผ ์ ์์ต๋๋ค. + + ACQUIRE ์คํผ๋ ์ด์
์ ๊ฑฐ์ ํญ์ RELEASE ์คํผ๋ ์ด์
๊ณผ ์ง์ ์ง์ด ์ฌ์ฉ๋์ด์ผ + ํฉ๋๋ค. + + + (6) RELEASE ์คํผ๋ ์ด์
. + + ์ด ํ์
์ ์คํผ๋ ์ด์
๋ค๋ ๋จ๋ฐฉํฅ ํฌ๊ณผ์ฑ ๋ฐฐ๋ฆฌ์ด์ฒ๋ผ ๋์ํฉ๋๋ค. RELEASE + ์คํผ๋ ์ด์
์์ ๋ชจ๋ ๋ฉ๋ชจ๋ฆฌ ์คํผ๋ ์ด์
๋ค์ RELEASE ์คํผ๋ ์ด์
์ ์ ์๋ฃ๋ + ๊ฒ์ผ๋ก ์์คํ
์ ๋ค๋ฅธ ์ปดํฌ๋ํธ๋ค์ ๋ณด์ฌ์ง ๊ฒ์ด ๋ณด์ฅ๋ฉ๋๋ค. UNLOCK ๋ฅ์ + ์คํผ๋ ์ด์
๋ค๊ณผ smp_store_release() ์คํผ๋ ์ด์
๋ RELEASE ์คํผ๋ ์ด์
์ + ์ผ์ข
์
๋๋ค. + + RELEASE ์คํผ๋ ์ด์
๋ค์ ๋ฉ๋ชจ๋ฆฌ ์คํผ๋ ์ด์
๋ค์ RELEASE ์คํผ๋ ์ด์
์ด + ์๋ฃ๋๊ธฐ ์ ์ ํํด์ง ๊ฒ์ฒ๋ผ ๋ณด์ผ ์ ์์ต๋๋ค. + + ACQUIRE ์ RELEASE ์คํผ๋ ์ด์
์ ์ฌ์ฉ์ ์ผ๋ฐ์ ์ผ๋ก ๋ค๋ฅธ ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด์ + ํ์์ฑ์ ์์ฑ๋๋ค (ํ์ง๋ง "MMIO ์ฐ๊ธฐ ๋ฐฐ๋ฆฌ์ด" ์๋ธ์น์
์์ ์ค๋ช
๋๋ ์์ธ๋ฅผ + ์์๋์ธ์). ๋ํ, RELEASE+ACQUIRE ์กฐํฉ์ ๋ฒ์ฉ ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด์ฒ๋ผ ๋์ํ + ๊ฒ์ ๋ณด์ฅํ์ง -์์ต๋๋ค-. ํ์ง๋ง, ์ด๋ค ๋ณ์์ ๋ํ RELEASE ์คํผ๋ ์ด์
์ + ์์๋ ๋ฉ๋ชจ๋ฆฌ ์ก์ธ์ค๋ค์ ์ํ ๊ฒฐ๊ณผ๋ ์ด RELEASE ์คํผ๋ ์ด์
์ ๋ค์ด์ด ๊ฐ์ + ๋ณ์์ ๋ํด ์ํ๋ ACQUIRE ์คํผ๋ ์ด์
์ ๋ค๋ฐ๋ฅด๋ ๋ฉ๋ชจ๋ฆฌ ์ก์ธ์ค์๋ ๋ณด์ฌ์ง + ๊ฒ์ด ๋ณด์ฅ๋ฉ๋๋ค. ๋ค๋ฅด๊ฒ ๋งํ์๋ฉด, ์ฃผ์ด์ง ๋ณ์์ ํฌ๋ฆฌํฐ์ปฌ ์น์
์์๋, ํด๋น + ๋ณ์์ ๋ํ ์์ ํฌ๋ฆฌํฐ์ปฌ ์น์
์์์ ๋ชจ๋ ์ก์ธ์ค๋ค์ด ์๋ฃ๋์์ ๊ฒ์ + ๋ณด์ฅํฉ๋๋ค. + + ์ฆ, ACQUIRE ๋ ์ต์ํ์ "์ทจ๋" ๋์์ฒ๋ผ, ๊ทธ๋ฆฌ๊ณ RELEASE ๋ ์ต์ํ์ "๊ณต๊ฐ" + ์ฒ๋ผ ๋์ํ๋ค๋ ์๋ฏธ์
๋๋ค. + +atomic_ops.txt ์์ ์ค๋ช
๋๋ ์ดํ ๋ฏน ์คํผ๋ ์ด์
๋ค ์ค์๋ ์์ ํ ์์์กํ ๊ฒ๋ค๊ณผ +(๋ฐฐ๋ฆฌ์ด๋ฅผ ์ฌ์ฉํ์ง ์๋) ์ํ๋ ์์์ ๊ฒ๋ค ์ธ์ ACQUIRE ์ RELEASE ๋ถ๋ฅ์ +๊ฒ๋ค๋ ์กด์ฌํฉ๋๋ค. ๋ก๋์ ์คํ ์ด๋ฅผ ๋ชจ๋ ์ํํ๋ ์กฐํฉ๋ ์ดํ ๋ฏน ์คํผ๋ ์ด์
์์, +ACQUIRE ๋ ํด๋น ์คํผ๋ ์ด์
์ ๋ก๋ ๋ถ๋ถ์๋ง ์ ์ฉ๋๊ณ RELEASE ๋ ํด๋น +์คํผ๋ ์ด์
์ ์คํ ์ด ๋ถ๋ถ์๋ง ์ ์ฉ๋ฉ๋๋ค. + +๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๋ค์ ๋ CPU ๊ฐ, ๋๋ CPU ์ ๋๋ฐ์ด์ค ๊ฐ์ ์ํธ์์ฉ์ ๊ฐ๋ฅ์ฑ์ด ์์ +๋์๋ง ํ์ํฉ๋๋ค. ๋ง์ฝ ์ด๋ค ์ฝ๋์ ๊ทธ๋ฐ ์ํธ์์ฉ์ด ์์ ๊ฒ์ด ๋ณด์ฅ๋๋ค๋ฉด, ํด๋น +์ฝ๋์์๋ ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๋ฅผ ์ฌ์ฉํ ํ์๊ฐ ์์ต๋๋ค. + + +์ด๊ฒ๋ค์ _์ต์ํ์_ ๋ณด์ฅ์ฌํญ๋ค์์ ์์๋์ธ์. ๋ค๋ฅธ ์ํคํ
์ณ์์๋ ๋ ๊ฐ๋ ฅํ +๋ณด์ฅ์ฌํญ์ ์ ๊ณตํ ์๋ ์์ต๋๋ค๋ง, ๊ทธ๋ฐ ๋ณด์ฅ์ฌํญ์ ์ํคํ
์ณ ์ข
์์ ์ฝ๋ ์ด์ธ์ +๋ถ๋ถ์์๋ ์ ๋ขฐ๋์ง _์์_ ๊ฒ๋๋ค. + + +๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด์ ๋ํด ๊ฐ์ ํด์ ์๋ ๊ฒ +------------------------------------- + +๋ฆฌ๋
์ค ์ปค๋ ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๋ค์ด ๋ณด์ฅํ์ง ์๋ ๊ฒ๋ค์ด ์์ต๋๋ค: + + (*) ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด ์์์ ๋ช
์๋ ์ด๋ค ๋ฉ๋ชจ๋ฆฌ ์ก์ธ์ค๋ ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด ๋ช
๋ น์ ์ํ + ์๋ฃ ์์ ๊น์ง _์๋ฃ_ ๋ ๊ฒ์ด๋ ๋ณด์ฅ์ ์์ต๋๋ค; ๋ฐฐ๋ฆฌ์ด๊ฐ ํ๋ ์ผ์ CPU ์ + ์ก์ธ์ค ํ์ ํน์ ํ์
์ ์ก์ธ์ค๋ค์ ๋์ ์ ์๋ ์ ์ ๊ธ๋ ๊ฒ์ผ๋ก ์๊ฐ๋ ์ + ์์ต๋๋ค. + + (*) ํ CPU ์์ ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๋ฅผ ์ํํ๋๊ฒ ์์คํ
์ ๋ค๋ฅธ CPU ๋ ํ๋์จ์ด์ + ์ด๋ค ์ง์ ์ ์ธ ์ํฅ์ ๋ผ์น๋ค๋ ๋ณด์ฅ์ ์กด์ฌํ์ง ์์ต๋๋ค. ๋ฐฐ๋ฆฌ์ด ์ํ์ด + ๋ง๋๋ ๊ฐ์ ์ ์ํฅ์ ๋๋ฒ์งธ CPU ๊ฐ ์ฒซ๋ฒ์งธ CPU ์ ์ก์ธ์ค๋ค์ ๊ฒฐ๊ณผ๋ฅผ + ๋ฐ๋ผ๋ณด๋ ์์๊ฐ ๋ฉ๋๋ค๋ง, ๋ค์ ํญ๋ชฉ์ ๋ณด์ธ์: + + (*) ์ฒซ๋ฒ์งธ CPU ๊ฐ ๋๋ฒ์งธ CPU ์ ๋ฉ๋ชจ๋ฆฌ ์ก์ธ์ค๋ค์ ๊ฒฐ๊ณผ๋ฅผ ๋ฐ๋ผ๋ณผ ๋, _์ค๋ น_ + ๋๋ฒ์งธ CPU ๊ฐ ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๋ฅผ ์ฌ์ฉํ๋ค ํด๋, ์ฒซ๋ฒ์งธ CPU _๋ํ_ ๊ทธ์ ๋ง๋ + ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๋ฅผ ์ฌ์ฉํ์ง ์๋๋ค๋ฉด ("SMP ๋ฐฐ๋ฆฌ์ด ์ง๋ง์ถ๊ธฐ" ์๋ธ์น์
์ + ์ฐธ๊ณ ํ์ธ์) ๊ทธ ๊ฒฐ๊ณผ๊ฐ ์ฌ๋ฐ๋ฅธ ์์๋ก ๋ณด์ฌ์ง๋ค๋ ๋ณด์ฅ์ ์์ต๋๋ค. + + (*) CPU ๋ฐ๊นฅ์ ํ๋์จ์ด[*] ๊ฐ ๋ฉ๋ชจ๋ฆฌ ์ก์ธ์ค๋ค์ ์์๋ฅผ ๋ฐ๊พธ์ง ์๋๋ค๋ ๋ณด์ฅ์ + ์กด์ฌํ์ง ์์ต๋๋ค. CPU ์บ์ ์ผ๊ด์ฑ ๋ฉ์ปค๋์ฆ์ ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด์ ๊ฐ์ ์ + ์ํฅ์ CPU ์ฌ์ด์ ์ ํํ๊ธด ํ์ง๋ง, ์์๋๋ก ์ ํํ์ง๋ ์์ ์ ์์ต๋๋ค. + + [*] ๋ฒ์ค ๋ง์คํฐ๋ง DMA ์ ์ผ๊ด์ฑ์ ๋ํด์๋ ๋ค์์ ์ฐธ๊ณ ํ์๊ธฐ ๋ฐ๋๋๋ค: + + Documentation/PCI/pci.txt + Documentation/DMA-API-HOWTO.txt + Documentation/DMA-API.txt + + +๋ฐ์ดํฐ ์์กด์ฑ ๋ฐฐ๋ฆฌ์ด +-------------------- + +๋ฐ์ดํฐ ์์กด์ฑ ๋ฐฐ๋ฆฌ์ด์ ์ฌ์ฉ์ ์์ด ์ง์ผ์ผ ํ๋ ์ฌํญ๋ค์ ์ฝ๊ฐ ๋ฏธ๋ฌํ๊ณ , ๋ฐ์ดํฐ +์์กด์ฑ ๋ฐฐ๋ฆฌ์ด๊ฐ ์ฌ์ฉ๋์ด์ผ ํ๋ ์ํฉ๋ ํญ์ ๋ช
๋ฐฑํ์ง๋ ์์ต๋๋ค. ์ค๋ช
์ ์ํด +๋ค์์ ์ด๋ฒคํธ ์ํ์ค๋ฅผ ์๊ฐํด ๋ด
์๋ค: + + CPU 1 CPU 2 + =============== =============== + { A == 1, B == 2, C == 3, P == &A, Q == &C } + B = 4; + <์ฐ๊ธฐ ๋ฐฐ๋ฆฌ์ด> + WRITE_ONCE(P, &B) + Q = READ_ONCE(P); + D = *Q; + +์ฌ๊ธฐ์ ๋ถ๋ช
ํ ๋ฐ์ดํฐ ์์กด์ฑ์ด ์กด์ฌํ๋ฏ๋ก, ์ด ์ํ์ค๊ฐ ๋๋ฌ์ ๋ Q ๋ &A ๋๋ &B +์ผ ๊ฒ์ด๊ณ , ๋ฐ๋ผ์: + + (Q == &A) ๋ (D == 1) ๋ฅผ, + (Q == &B) ๋ (D == 4) ๋ฅผ ์๋ฏธํฉ๋๋ค. + +ํ์ง๋ง! CPU 2 ๋ B ์ ์
๋ฐ์ดํธ๋ฅผ ์ธ์ํ๊ธฐ ์ ์ P ์ ์
๋ฐ์ดํธ๋ฅผ ์ธ์ํ ์ ์๊ณ , +๋ฐ๋ผ์ ๋ค์์ ๊ฒฐ๊ณผ๊ฐ ๊ฐ๋ฅํฉ๋๋ค: + + (Q == &B) and (D == 2) ???? + +์ด๋ฐ ๊ฒฐ๊ณผ๋ ์ผ๊ด์ฑ์ด๋ ์ธ๊ณผ ๊ด๊ณ ์ ์ง๊ฐ ์คํจํ ๊ฒ์ฒ๋ผ ๋ณด์ผ ์๋ ์๊ฒ ์ง๋ง, +๊ทธ๋ ์ง ์์ต๋๋ค, ๊ทธ๋ฆฌ๊ณ ์ด ํ์์ (DEC Alpha ์ ๊ฐ์) ์ฌ๋ฌ CPU ์์ ์ค์ ๋ก +๋ฐ๊ฒฌ๋ ์ ์์ต๋๋ค. + +์ด ๋ฌธ์ ์ํฉ์ ์ ๋๋ก ํด๊ฒฐํ๊ธฐ ์ํด, ๋ฐ์ดํฐ ์์กด์ฑ ๋ฐฐ๋ฆฌ์ด๋ ๊ทธ๋ณด๋ค ๊ฐํ๋ +๋ฌด์ธ๊ฐ๊ฐ ์ฃผ์๋ฅผ ์ฝ์ด์ฌ ๋์ ๋ฐ์ดํฐ๋ฅผ ์ฝ์ด์ฌ ๋ ์ฌ์ด์ ์ถ๊ฐ๋์ด์ผ๋ง ํฉ๋๋ค: + + CPU 1 CPU 2 + =============== =============== + { A == 1, B == 2, C == 3, P == &A, Q == &C } + B = 4; + <์ฐ๊ธฐ ๋ฐฐ๋ฆฌ์ด> + WRITE_ONCE(P, &B); + Q = READ_ONCE(P); + <๋ฐ์ดํฐ ์์กด์ฑ ๋ฐฐ๋ฆฌ์ด> + D = *Q; + +์ด ๋ณ๊ฒฝ์ ์์ ์ฒ์ ๋๊ฐ์ง ๊ฒฐ๊ณผ ์ค ํ๋๋ง์ด ๋ฐ์ํ ์ ์๊ณ , ์ธ๋ฒ์งธ์ ๊ฒฐ๊ณผ๋ +๋ฐ์ํ ์ ์๋๋ก ํฉ๋๋ค. + +๋ฐ์ดํฐ ์์กด์ฑ ๋ฐฐ๋ฆฌ์ด๋ ์์กด์ ์ฐ๊ธฐ์ ๋ํด์๋ ์์๋ฅผ ์ก์์ค๋๋ค: + + CPU 1 CPU 2 + =============== =============== + { A == 1, B == 2, C = 3, P == &A, Q == &C } + B = 4; + <์ฐ๊ธฐ ๋ฐฐ๋ฆฌ์ด> + WRITE_ONCE(P, &B); + Q = READ_ONCE(P); + <๋ฐ์ดํฐ ์์กด์ฑ ๋ฐฐ๋ฆฌ์ด> + *Q = 5; + +์ด ๋ฐ์ดํฐ ์์กด์ฑ ๋ฐฐ๋ฆฌ์ด๋ Q ๋ก์ ์ฝ๊ธฐ๊ฐ *Q ๋ก์ ์คํ ์ด์ ์์๋ฅผ ๋ง์ถ๊ฒ +ํด์ค๋๋ค. ์ด๋ ๋ค์๊ณผ ๊ฐ์ ๊ฒฐ๊ณผ๋ฅผ ๋ง์ต๋๋ค: + + (Q == &B) && (B == 4) + +์ด๋ฐ ํจํด์ ๋๋ฌผ๊ฒ ์ฌ์ฉ๋์ด์ผ ํจ์ ์์ ๋์๊ธฐ ๋ฐ๋๋๋ค. ๋ฌด์๋ณด๋ค๋, ์์กด์ฑ +์์ ๊ท์น์ ์๋๋ ์ฐ๊ธฐ ์์
์ -์๋ฐฉ- ํด์ ๊ทธ๋ก ์ธํด ๋ฐ์ํ๋ ๋น์ผ ์บ์ ๋ฏธ์ค๋ +์์ ๋ ค๋ ๊ฒ์
๋๋ค. ์ด ํจํด์ ๋๋ฌผ๊ฒ ๋ฐ์ํ๋ ์๋ฌ ์กฐ๊ฑด ๊ฐ์๊ฒ๋ค์ ๊ธฐ๋กํ๋๋ฐ +์ฌ์ฉ๋ ์ ์๊ณ , ์ด๋ ๊ฒ ๋ฐฐ๋ฆฌ์ด๋ฅผ ์ฌ์ฉํด ์์๋ฅผ ์งํค๊ฒ ํจ์ผ๋ก์จ ๊ทธ๋ฐ ๊ธฐ๋ก์ด +์ฌ๋ผ์ง๋ ๊ฒ์ ๋ง์ต๋๋ค. + + +[!] ์๋นํ ๋น์ง๊ด์ ์ธ ์ด ์ํฉ์ ๋ถ๋ฆฌ๋ ์บ์๋ฅผ ๊ฐ์ง ๊ธฐ๊ณ, ์๋ฅผ ๋ค์ด ํ ์บ์ +๋ฑ
ํฌ๊ฐ ์ง์๋ฒ ์บ์ ๋ผ์ธ์ ์ฒ๋ฆฌํ๊ณ ๋ค๋ฅธ ๋ฑ
ํฌ๋ ํ์๋ฒ ์บ์ ๋ผ์ธ์ ์ฒ๋ฆฌํ๋ ๊ธฐ๊ณ +๋ฑ์์ ๊ฐ์ฅ ์ ๋ฐ์ํฉ๋๋ค. ํฌ์ธํฐ P ๋ ํ์ ๋ฒํธ์ ์บ์ ๋ผ์ธ์ ์๊ณ , ๋ณ์ B ๋ +์ง์ ๋ฒํธ ์บ์ ๋ผ์ธ์ ์๋ค๊ณ ์๊ฐํด ๋ด
์๋ค. ๊ทธ๋ฐ ์ํ์์ ์ฝ๊ธฐ ์์
์ ํ๋ CPU +์ ์ง์๋ฒ ๋ฑ
ํฌ๋ ํ ์ผ์ด ์์ฌ ๋งค์ฐ ๋ฐ์์ง๋ง ํ์๋ฒ ๋ฑ
ํฌ๋ ํ ์ผ์ด ์์ด ์๋ฌด +์ผ๋ ํ์ง ์๊ณ ์์๋ค๋ฉด, ํฌ์ธํฐ P ๋ ์ ๊ฐ (&B) ์, ๊ทธ๋ฆฌ๊ณ ๋ณ์ B ๋ ์๋ ๊ฐ +(2) ์ ๊ฐ์ง๊ณ ์๋ ์ํ๊ฐ ๋ณด์ฌ์ง ์๋ ์์ต๋๋ค. + + +๋ฐ์ดํฐ ์์กด์ฑ ๋ฐฐ๋ฆฌ์ด๋ ๋งค์ฐ ์ค์ํ๋ฐ, ์๋ฅผ ๋ค์ด RCU ์์คํ
์์ ๊ทธ๋ ์ต๋๋ค. +include/linux/rcupdate.h ์ rcu_assign_pointer() ์ rcu_dereference() ๋ฅผ +์ฐธ๊ณ ํ์ธ์. ์ฌ๊ธฐ์ ๋ฐ์ดํฐ ์์กด์ฑ ๋ฐฐ๋ฆฌ์ด๋ RCU ๋ก ๊ด๋ฆฌ๋๋ ํฌ์ธํฐ์ ํ๊ฒ์ ํ์ฌ +ํ๊ฒ์์ ์์ ๋ ์๋ก์ด ํ๊ฒ์ผ๋ก ๋ฐ๊พธ๋ ์์
์์ ์๋ก ์์ ๋ ํ๊ฒ์ด ์ด๊ธฐํ๊ฐ +์๋ฃ๋์ง ์์ ์ฑ๋ก ๋ณด์ฌ์ง๋ ์ผ์ด ์ผ์ด๋์ง ์๊ฒ ํด์ค๋๋ค. + +๋ ๋ง์ ์๋ฅผ ์ํด์ "์บ์ ์ผ๊ด์ฑ" ์๋ธ์น์
์ ์ฐธ๊ณ ํ์ธ์. + + +์ปจํธ๋กค ์์กด์ฑ +------------- + +๋ก๋-๋ก๋ ์ปจํธ๋กค ์์กด์ฑ์ ๋ฐ์ดํฐ ์์กด์ฑ ๋ฐฐ๋ฆฌ์ด๋ง์ผ๋ก๋ ์ ํํ ๋์ํ ์๊ฐ +์์ด์ ์ฝ๊ธฐ ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๋ฅผ ํ์๋ก ํฉ๋๋ค. ์๋์ ์ฝ๋๋ฅผ ๋ด
์๋ค: + + q = READ_ONCE(a); + if (q) { + <๋ฐ์ดํฐ ์์กด์ฑ ๋ฐฐ๋ฆฌ์ด> /* BUG: No data dependency!!! */ + p = READ_ONCE(b); + } + +์ด ์ฝ๋๋ ์ํ๋ ๋๋ก์ ํจ๊ณผ๋ฅผ ๋ด์ง ๋ชปํ ์ ์๋๋ฐ, ์ด ์ฝ๋์๋ ๋ฐ์ดํฐ ์์กด์ฑ์ด +์๋๋ผ ์ปจํธ๋กค ์์กด์ฑ์ด ์กด์ฌํ๊ธฐ ๋๋ฌธ์ผ๋ก, ์ด๋ฐ ์ํฉ์์ CPU ๋ ์คํ ์๋๋ฅผ ๋ +๋น ๋ฅด๊ฒ ํ๊ธฐ ์ํด ๋ถ๊ธฐ ์กฐ๊ฑด์ ๊ฒฐ๊ณผ๋ฅผ ์์ธกํ๊ณ ์ฝ๋๋ฅผ ์ฌ๋ฐฐ์น ํ ์ ์์ด์ ๋ค๋ฅธ +CPU ๋ b ๋ก๋ถํฐ์ ๋ก๋ ์คํผ๋ ์ด์
์ด a ๋ก๋ถํฐ์ ๋ก๋ ์คํผ๋ ์ด์
๋ณด๋ค ๋จผ์ ๋ฐ์ํ +๊ฑธ๋ก ์ธ์ํ ์ ์์ต๋๋ค. ์ฌ๊ธฐ์ ์ ๋ง๋ก ํ์ํ๋ ๊ฑด ๋ค์๊ณผ ๊ฐ์ต๋๋ค: + + q = READ_ONCE(a); + if (q) { + <์ฝ๊ธฐ ๋ฐฐ๋ฆฌ์ด> + p = READ_ONCE(b); + } + +ํ์ง๋ง, ์คํ ์ด ์คํผ๋ ์ด์
์ ์์ธก์ ์ผ๋ก ์ํ๋์ง ์์ต๋๋ค. ์ฆ, ๋ค์ ์์์์ +๊ฐ์ด ๋ก๋-์คํ ์ด ์ปจํธ๋กค ์์กด์ฑ์ด ์กด์ฌํ๋ ๊ฒฝ์ฐ์๋ ์์๊ฐ -์ง์ผ์ง๋ค-๋ +์๋ฏธ์
๋๋ค. + + q = READ_ONCE(a); + if (q) { + WRITE_ONCE(b, p); + } + +์ปจํธ๋กค ์์กด์ฑ์ ๋ณดํต ๋ค๋ฅธ ํ์
์ ๋ฐฐ๋ฆฌ์ด๋ค๊ณผ ์ง์ ๋ง์ถฐ ์ฌ์ฉ๋ฉ๋๋ค. ๊ทธ๋ ๋ค๊ณค +ํ๋, READ_ONCE() ๋ ๋ฐ๋์ ์ฌ์ฉํด์ผ ํจ์ ๋ถ๋ ๋ช
์ฌํ์ธ์! READ_ONCE() ๊ฐ +์๋ค๋ฉด, ์ปดํ์ผ๋ฌ๊ฐ 'a' ๋ก๋ถํฐ์ ๋ก๋๋ฅผ 'a' ๋ก๋ถํฐ์ ๋๋ค๋ฅธ ๋ก๋์, 'b' ๋ก์ +์คํ ์ด๋ฅผ 'b' ๋ก์ ๋๋ค๋ฅธ ์คํ ์ด์ ์กฐํฉํด ๋ฒ๋ ค ๋งค์ฐ ๋น์ง๊ด์ ์ธ ๊ฒฐ๊ณผ๋ฅผ ์ด๋ํ ์ +์์ต๋๋ค. + +์ด๊ฑธ๋ก ๋์ด ์๋๊ฒ, ์ปดํ์ผ๋ฌ๊ฐ ๋ณ์ 'a' ์ ๊ฐ์ด ํญ์ 0์ด ์๋๋ผ๊ณ ์ฆ๋ช
ํ ์ +์๋ค๋ฉด, ์์ ์์์ "if" ๋ฌธ์ ์์ ์ ๋ค์๊ณผ ๊ฐ์ด ์ต์ ํ ํ ์๋ ์์ต๋๋ค: + + q = a; + b = p; /* BUG: Compiler and CPU can both reorder!!! */ + +๊ทธ๋ฌ๋ READ_ONCE() ๋ฅผ ๋ฐ๋์ ์ฌ์ฉํ์ธ์. + +๋ค์๊ณผ ๊ฐ์ด "if" ๋ฌธ์ ์๊ฐ๋ ๋ธ๋์น์ ๋ชจ๋ ์กด์ฌํ๋ ๋์ผํ ์คํ ์ด์ ๋ํด ์์๋ฅผ +๊ฐ์ ํ๊ณ ์ถ์ ๊ฒฝ์ฐ๊ฐ ์์ ์ ์์ต๋๋ค: + + q = READ_ONCE(a); + if (q) { + barrier(); + WRITE_ONCE(b, p); + do_something(); + } else { + barrier(); + WRITE_ONCE(b, p); + do_something_else(); + } + +์ํ๊น๊ฒ๋, ํ์ฌ์ ์ปดํ์ผ๋ฌ๋ค์ ๋์ ์ต์ ํ ๋ ๋ฒจ์์๋ ์ด๊ฑธ ๋ค์๊ณผ ๊ฐ์ด +๋ฐ๊ฟ๋ฒ๋ฆฝ๋๋ค: + + q = READ_ONCE(a); + barrier(); + WRITE_ONCE(b, p); /* BUG: No ordering vs. load from a!!! */ + if (q) { + /* WRITE_ONCE(b, p); -- moved up, BUG!!! */ + do_something(); + } else { + /* WRITE_ONCE(b, p); -- moved up, BUG!!! */ + do_something_else(); + } + +์ด์ 'a' ์์์ ๋ก๋์ 'b' ๋ก์ ์คํ ์ด ์ฌ์ด์๋ ์กฐ๊ฑด์ ๊ด๊ณ๊ฐ ์๊ธฐ ๋๋ฌธ์ CPU +๋ ์ด๋ค์ ์์๋ฅผ ๋ฐ๊ฟ ์ ์๊ฒ ๋ฉ๋๋ค: ์ด๋ฐ ๊ฒฝ์ฐ์ ์กฐ๊ฑด์ ๊ด๊ณ๋ ๋ฐ๋์ +ํ์ํ๋ฐ, ๋ชจ๋ ์ปดํ์ผ๋ฌ ์ต์ ํ๊ฐ ์ด๋ฃจ์ด์ง๊ณ ๋ ํ์ ์ด์
๋ธ๋ฆฌ ์ฝ๋์์๋ +๋ง์ฐฌ๊ฐ์ง์
๋๋ค. ๋ฐ๋ผ์, ์ด ์์์ ์์๋ฅผ ์งํค๊ธฐ ์ํด์๋ smp_store_release() +์ ๊ฐ์ ๋ช
์์ ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๊ฐ ํ์ํฉ๋๋ค: + + q = READ_ONCE(a); + if (q) { + smp_store_release(&b, p); + do_something(); + } else { + smp_store_release(&b, p); + do_something_else(); + } + +๋ฐ๋ฉด์ ๋ช
์์ ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๊ฐ ์๋ค๋ฉด, ์ด๋ฐ ๊ฒฝ์ฐ์ ์์๋ ์คํ ์ด ์คํผ๋ ์ด์
๋ค์ด +์๋ก ๋ค๋ฅผ ๋์๋ง ๋ณด์ฅ๋๋๋ฐ, ์๋ฅผ ๋ค๋ฉด ๋ค์๊ณผ ๊ฐ์ ๊ฒฝ์ฐ์
๋๋ค: + + q = READ_ONCE(a); + if (q) { + WRITE_ONCE(b, p); + do_something(); + } else { + WRITE_ONCE(b, r); + do_something_else(); + } + +์ฒ์์ READ_ONCE() ๋ ์ปดํ์ผ๋ฌ๊ฐ 'a' ์ ๊ฐ์ ์ฆ๋ช
ํด๋ด๋ ๊ฒ์ ๋ง๊ธฐ ์ํด ์ฌ์ ํ +ํ์ํฉ๋๋ค. + +๋ํ, ๋ก์ปฌ ๋ณ์ 'q' ๋ฅผ ๊ฐ์ง๊ณ ํ๋ ์ผ์ ๋ํด ์ฃผ์ํด์ผ ํ๋๋ฐ, ๊ทธ๋ฌ์ง ์์ผ๋ฉด +์ปดํ์ผ๋ฌ๋ ๊ทธ ๊ฐ์ ์ถ์ธกํ๊ณ ๋๋ค์ ํ์ํ ์กฐ๊ฑด๊ด๊ณ๋ฅผ ์์ ๋ฒ๋ฆด ์ ์์ต๋๋ค. +์๋ฅผ ๋ค๋ฉด: + + q = READ_ONCE(a); + if (q % MAX) { + WRITE_ONCE(b, p); + do_something(); + } else { + WRITE_ONCE(b, r); + do_something_else(); + } + +๋ง์ฝ MAX ๊ฐ 1 ๋ก ์ ์๋ ์์๋ผ๋ฉด, ์ปดํ์ผ๋ฌ๋ (q % MAX) ๋ 0์ด๋ ๊ฒ์ ์์์ฑ๊ณ , +์์ ์ฝ๋๋ฅผ ์๋์ ๊ฐ์ด ๋ฐ๊ฟ๋ฒ๋ฆด ์ ์์ต๋๋ค: + + q = READ_ONCE(a); + WRITE_ONCE(b, p); + do_something_else(); + +์ด๋ ๊ฒ ๋๋ฉด, CPU ๋ ๋ณ์ 'a' ๋ก๋ถํฐ์ ๋ก๋์ ๋ณ์ 'b' ๋ก์ ์คํ ์ด ์ฌ์ด์ ์์๋ฅผ +์ง์ผ์ค ํ์๊ฐ ์์ด์ง๋๋ค. barrier() ๋ฅผ ์ถ๊ฐํด ํด๊ฒฐํด ๋ณด๊ณ ์ถ๊ฒ ์ง๋ง, ๊ทธ๊ฑด +๋์์ด ์๋ฉ๋๋ค. ์กฐ๊ฑด ๊ด๊ณ๋ ์ฌ๋ผ์ก๊ณ , barrier() ๋ ์ด๋ฅผ ๋๋๋ฆฌ์ง ๋ชปํฉ๋๋ค. +๋ฐ๋ผ์, ์ด ์์๋ฅผ ์ง์ผ์ผ ํ๋ค๋ฉด, MAX ๊ฐ 1 ๋ณด๋ค ํฌ๋ค๋ ๊ฒ์, ๋ค์๊ณผ ๊ฐ์ ๋ฐฉ๋ฒ์ +์ฌ์ฉํด ๋ถ๋ช
ํ ํด์ผ ํฉ๋๋ค: + + q = READ_ONCE(a); + BUILD_BUG_ON(MAX <= 1); /* Order load from a with store to b. */ + if (q % MAX) { + WRITE_ONCE(b, p); + do_something(); + } else { + WRITE_ONCE(b, r); + do_something_else(); + } + +'b' ๋ก์ ์คํ ์ด๋ค์ ์ฌ์ ํ ์๋ก ๋ค๋ฆ์ ์์๋์ธ์. ๋ง์ฝ ๊ทธ๊ฒ๋ค์ด ๋์ผํ๋ฉด, +์์์ ์ด์ผ๊ธฐํ๋ฏ, ์ปดํ์ผ๋ฌ๊ฐ ๊ทธ ์คํ ์ด ์คํผ๋ ์ด์
๋ค์ 'if' ๋ฌธ ๋ฐ๊นฅ์ผ๋ก +๋์ง์ด๋ผ ์ ์์ต๋๋ค. + +๋ํ ์ด์ง ์กฐ๊ฑด๋ฌธ ํ๊ฐ์ ๋๋ฌด ์์กดํ์ง ์๋๋ก ์กฐ์ฌํด์ผ ํฉ๋๋ค. ๋ค์์ ์๋ฅผ +๋ด
์๋ค: + + q = READ_ONCE(a); + if (q || 1 > 0) + WRITE_ONCE(b, 1); + +์ฒซ๋ฒ์งธ ์กฐ๊ฑด๋ง์ผ๋ก๋ ๋ธ๋์น ์กฐ๊ฑด ์ ์ฒด๋ฅผ ๊ฑฐ์ง์ผ๋ก ๋ง๋ค ์ ์๊ณ ๋๋ฒ์งธ ์กฐ๊ฑด์ ํญ์ +์ฐธ์ด๊ธฐ ๋๋ฌธ์, ์ปดํ์ผ๋ฌ๋ ์ด ์๋ฅผ ๋ค์๊ณผ ๊ฐ์ด ๋ฐ๊ฟ์ ์ปจํธ๋กค ์์กด์ฑ์ ์์ ๋ฒ๋ฆด +์ ์์ต๋๋ค: + + q = READ_ONCE(a); + WRITE_ONCE(b, 1); + +์ด ์๋ ์ปดํ์ผ๋ฌ๊ฐ ์ฝ๋๋ฅผ ์ถ์ธก์ผ๋ก ์์ ํ ์ ์๋๋ก ๋ถ๋ช
ํ ํด์ผ ํ๋ค๋ ์ ์ +๊ฐ์กฐํฉ๋๋ค. ์กฐ๊ธ ๋ ์ผ๋ฐ์ ์ผ๋ก ๋งํด์, READ_ONCE() ๋ ์ปดํ์ผ๋ฌ์๊ฒ ์ฃผ์ด์ง ๋ก๋ +์คํผ๋ ์ด์
์ ์ํ ์ฝ๋๋ฅผ ์ ๋ง๋ก ๋ง๋ค๋๋ก ํ์ง๋ง, ์ปดํ์ผ๋ฌ๊ฐ ๊ทธ๋ ๊ฒ ๋ง๋ค์ด์ง +์ฝ๋์ ์ํ ๊ฒฐ๊ณผ๋ฅผ ์ฌ์ฉํ๋๋ก ๊ฐ์ ํ์ง๋ ์์ต๋๋ค. + +๋ง์ง๋ง์ผ๋ก, ์ปจํธ๋กค ์์กด์ฑ์ ์ดํ์ฑ (transitivity) ์ ์ ๊ณตํ์ง -์์ต๋๋ค-. ์ด๊ฑด +x ์ y ๊ฐ ๋ ๋ค 0 ์ด๋ผ๋ ์ด๊ธฐ๊ฐ์ ๊ฐ์ก๋ค๋ ๊ฐ์ ํ์ ๋๊ฐ์ ์์ ๋ก +๋ณด์ด๊ฒ ์ต๋๋ค: + + CPU 0 CPU 1 + ======================= ======================= + r1 = READ_ONCE(x); r2 = READ_ONCE(y); + if (r1 > 0) if (r2 > 0) + WRITE_ONCE(y, 1); WRITE_ONCE(x, 1); + + assert(!(r1 == 1 && r2 == 1)); + +์ด ๋ CPU ์์ ์์ assert() ์ ์กฐ๊ฑด์ ํญ์ ์ฐธ์ผ ๊ฒ์
๋๋ค. ๊ทธ๋ฆฌ๊ณ , ๋ง์ฝ ์ปจํธ๋กค +์์กด์ฑ์ด ์ดํ์ฑ์ (์ค์ ๋ก๋ ๊ทธ๋ฌ์ง ์์ง๋ง) ๋ณด์ฅํ๋ค๋ฉด, ๋ค์์ CPU ๊ฐ ์ถ๊ฐ๋์ด๋ +์๋์ assert() ์กฐ๊ฑด์ ์ฐธ์ด ๋ ๊ฒ์
๋๋ค: + + CPU 2 + ===================== + WRITE_ONCE(x, 2); + + assert(!(r1 == 2 && r2 == 1 && x == 2)); /* FAILS!!! */ + +ํ์ง๋ง ์ปจํธ๋กค ์์กด์ฑ์ ์ดํ์ฑ์ ์ ๊ณตํ์ง -์๊ธฐ- ๋๋ฌธ์, ์ธ๊ฐ์ CPU ์์ ๊ฐ ์คํ +์๋ฃ๋ ํ์ ์์ assert() ์ ์กฐ๊ฑด์ ๊ฑฐ์ง์ผ๋ก ํ๊ฐ๋ ์ ์์ต๋๋ค. ์ธ๊ฐ์ CPU +์์ ๊ฐ ์์๋ฅผ ์งํค๊ธธ ์ํ๋ค๋ฉด, CPU 0 ์ CPU 1 ์ฝ๋์ ๋ก๋์ ์คํ ์ด ์ฌ์ด, "if" +๋ฌธ ๋ฐ๋ก ๋ค์์ smp_mb()๋ฅผ ๋ฃ์ด์ผ ํฉ๋๋ค. ๋ ๋์๊ฐ์, ์ต์ด์ ๋ CPU ์์ ๋ +๋งค์ฐ ์ํํ๋ฏ๋ก ์ฌ์ฉ๋์ง ์์์ผ ํฉ๋๋ค. + +์ด ๋๊ฐ์ ์์ ๋ ๋ค์ ๋
ผ๋ฌธ: +http://www.cl.cam.ac.uk/users/pes20/ppc-supplemental/test6.pdf ์ +์ด ์ฌ์ดํธ: https://www.cl.cam.ac.uk/~pes20/ppcmem/index.html ์ ๋์จ LB ์ WWC +๋ฆฌํธ๋จธ์ค ํ
์คํธ์
๋๋ค. + +์์ฝํ์๋ฉด: + + (*) ์ปจํธ๋กค ์์กด์ฑ์ ์์ ๋ก๋๋ค์ ๋ค์ ์คํ ์ด๋ค์ ๋ํด ์์๋ฅผ ๋ง์ถฐ์ค๋๋ค. + ํ์ง๋ง, ๊ทธ ์ธ์ ์ด๋ค ์์๋ ๋ณด์ฅํ์ง -์์ต๋๋ค-: ์์ ๋ก๋์ ๋ค์ ๋ก๋๋ค + ์ฌ์ด์๋, ์์ ์คํ ์ด์ ๋ค์ ์คํ ์ด๋ค ์ฌ์ด์๋์. ์ด๋ฐ ๋ค๋ฅธ ํํ์ + ์์๊ฐ ํ์ํ๋ค๋ฉด smp_rmb() ๋ smp_wmb()๋ฅผ, ๋๋, ์์ ์คํ ์ด๋ค๊ณผ ๋ค์ + ๋ก๋๋ค ์ฌ์ด์ ์์๋ฅผ ์ํด์๋ smp_mb() ๋ฅผ ์ฌ์ฉํ์ธ์. + + (*) "if" ๋ฌธ์ ์๊ฐ๋ ๋ธ๋์น๊ฐ ๊ฐ์ ๋ณ์์์ ๋์ผํ ์คํ ์ด๋ก ์์ํ๋ค๋ฉด, ๊ทธ + ์คํ ์ด๋ค์ ๊ฐ ์คํ ์ด ์์ smp_mb() ๋ฅผ ๋ฃ๊ฑฐ๋ smp_store_release() ๋ฅผ + ์ฌ์ฉํด์ ์คํ ์ด๋ฅผ ํ๋ ์์ผ๋ก ์์๋ฅผ ๋ง์ถฐ์ค์ผ ํฉ๋๋ค. ์ด ๋ฌธ์ ๋ฅผ ํด๊ฒฐํ๊ธฐ + ์ํด "if" ๋ฌธ์ ์๊ฐ๋ ๋ธ๋์น์ ์์ ์ง์ ์ barrier() ๋ฅผ ๋ฃ๋ ๊ฒ๋ง์ผ๋ก๋ + ์ถฉ๋ถํ ํด๊ฒฐ์ด ๋์ง ์๋๋ฐ, ์ด๋ ์์ ์์์ ๋ณธ๊ฒ๊ณผ ๊ฐ์ด, ์ปดํ์ผ๋ฌ์ + ์ต์ ํ๋ barrier() ๊ฐ ์๋ฏธํ๋ ๋ฐ๋ฅผ ์งํค๋ฉด์๋ ์ปจํธ๋กค ์์กด์ฑ์ ์์์ํฌ + ์ ์๊ธฐ ๋๋ฌธ์ด๋ผ๋ ์ ์ ๋ถ๋ ์์๋์๊ธฐ ๋ฐ๋๋๋ค. + + (*) ์ปจํธ๋กค ์์กด์ฑ์ ์์ ๋ก๋์ ๋ค์ ์คํ ์ด ์ฌ์ด์ ์ต์ ํ๋์, ์คํ + ์์ ์์์ ์กฐ๊ฑด๊ด๊ณ๋ฅผ ํ์๋ก ํ๋ฉฐ, ์ด ์กฐ๊ฑด๊ด๊ณ๋ ์์ ๋ก๋์ ๊ด๊ณ๋์ด์ผ + ํฉ๋๋ค. ๋ง์ฝ ์ปดํ์ผ๋ฌ๊ฐ ์กฐ๊ฑด ๊ด๊ณ๋ฅผ ์ต์ ํ๋ก ์์จ์ ์๋ค๋ฉด, ์์๋ + ์ต์ ํ๋ก ์์ ๋ฒ๋ ธ์ ๊ฒ๋๋ค. READ_ONCE() ์ WRITE_ONCE() ์ ์ฃผ์ ๊น์ + ์ฌ์ฉ์ ์ฃผ์ด์ง ์กฐ๊ฑด ๊ด๊ณ๋ฅผ ์ ์งํ๋๋ฐ ๋์์ด ๋ ์ ์์ต๋๋ค. + + (*) ์ปจํธ๋กค ์์กด์ฑ์ ์ํด์ ์ปดํ์ผ๋ฌ๊ฐ ์กฐ๊ฑด๊ด๊ณ๋ฅผ ์์ ๋ฒ๋ฆฌ๋ ๊ฒ์ ๋ง์์ผ + ํฉ๋๋ค. ์ฃผ์ ๊น์ READ_ONCE() ๋ atomic{,64}_read() ์ ์ฌ์ฉ์ด ์ปจํธ๋กค + ์์กด์ฑ์ด ์ฌ๋ผ์ง์ง ์๊ฒ ํ๋๋ฐ ๋์์ ์ค ์ ์์ต๋๋ค. ๋ ๋ง์ ์ ๋ณด๋ฅผ + ์ํด์ "์ปดํ์ผ๋ฌ ๋ฐฐ๋ฆฌ์ด" ์น์
์ ์ฐธ๊ณ ํ์๊ธฐ ๋ฐ๋๋๋ค. + + (*) ์ปจํธ๋กค ์์กด์ฑ์ ๋ณดํต ๋ค๋ฅธ ํ์
์ ๋ฐฐ๋ฆฌ์ด๋ค๊ณผ ์ง์ ๋ง์ถฐ ์ฌ์ฉ๋ฉ๋๋ค. + + (*) ์ปจํธ๋กค ์์กด์ฑ์ ์ดํ์ฑ์ ์ ๊ณตํ์ง -์์ต๋๋ค-. ์ดํ์ฑ์ด ํ์ํ๋ค๋ฉด, + smp_mb() ๋ฅผ ์ฌ์ฉํ์ธ์. + + +SMP ๋ฐฐ๋ฆฌ์ด ์ง๋ง์ถ๊ธฐ +-------------------- + +CPU ๊ฐ ์ํธ์์ฉ์ ๋ค๋ฃฐ ๋์ ์ผ๋ถ ํ์
์ ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๋ ํญ์ ์ง์ ๋ง์ถฐ +์ฌ์ฉ๋์ด์ผ ํฉ๋๋ค. ์ ์ ํ๊ฒ ์ง์ ๋ง์ถ์ง ์์ ์ฝ๋๋ ์ฌ์ค์ ์๋ฌ์ ๊ฐ๊น์ต๋๋ค. + +๋ฒ์ฉ ๋ฐฐ๋ฆฌ์ด๋ค์ ๋ฒ์ฉ ๋ฐฐ๋ฆฌ์ด๋ผ๋ฆฌ๋ ์ง์ ๋ง์ถ์ง๋ง ์ดํ์ฑ์ด ์๋ ๋๋ถ๋ถ์ ๋ค๋ฅธ +ํ์
์ ๋ฐฐ๋ฆฌ์ด๋ค๊ณผ๋ ์ง์ ๋ง์ถฅ๋๋ค. ACQUIRE ๋ฐฐ๋ฆฌ์ด๋ RELEASE ๋ฐฐ๋ฆฌ์ด์ ์ง์ +๋ง์ถฅ๋๋ค๋ง, ๋ ๋ค ๋ฒ์ฉ ๋ฐฐ๋ฆฌ์ด๋ฅผ ํฌํจํด ๋ค๋ฅธ ๋ฐฐ๋ฆฌ์ด๋ค๊ณผ๋ ์ง์ ๋ง์ถ ์ ์์ต๋๋ค. +์ฐ๊ธฐ ๋ฐฐ๋ฆฌ์ด๋ ๋ฐ์ดํฐ ์์กด์ฑ ๋ฐฐ๋ฆฌ์ด๋ ์ปจํธ๋กค ์์กด์ฑ, ACQUIRE ๋ฐฐ๋ฆฌ์ด, RELEASE +๋ฐฐ๋ฆฌ์ด, ์ฝ๊ธฐ ๋ฐฐ๋ฆฌ์ด, ๋๋ ๋ฒ์ฉ ๋ฐฐ๋ฆฌ์ด์ ์ง์ ๋ง์ถฅ๋๋ค. ๋น์ทํ๊ฒ ์ฝ๊ธฐ ๋ฐฐ๋ฆฌ์ด๋ +์ปจํธ๋กค ์์กด์ฑ, ๋๋ ๋ฐ์ดํฐ ์์กด์ฑ ๋ฐฐ๋ฆฌ์ด๋ ์ฐ๊ธฐ ๋ฐฐ๋ฆฌ์ด๋ ACQUIRE ๋ฐฐ๋ฆฌ์ด, +RELEASE ๋ฐฐ๋ฆฌ์ด, ๋๋ ๋ฒ์ฉ ๋ฐฐ๋ฆฌ์ด์ ์ง์ ๋ง์ถ๋๋ฐ, ๋ค์๊ณผ ๊ฐ์ต๋๋ค: + + CPU 1 CPU 2 + =============== =============== + WRITE_ONCE(a, 1); + <์ฐ๊ธฐ ๋ฐฐ๋ฆฌ์ด> + WRITE_ONCE(b, 2); x = READ_ONCE(b); + <์ฝ๊ธฐ ๋ฐฐ๋ฆฌ์ด> + y = READ_ONCE(a); + +๋๋: + + CPU 1 CPU 2 + =============== =============================== + a = 1; + <์ฐ๊ธฐ ๋ฐฐ๋ฆฌ์ด> + WRITE_ONCE(b, &a); x = READ_ONCE(b); + <๋ฐ์ดํฐ ์์กด์ฑ ๋ฐฐ๋ฆฌ์ด> + y = *x; + +๋๋: + + CPU 1 CPU 2 + =============== =============================== + r1 = READ_ONCE(y); + <๋ฒ์ฉ ๋ฐฐ๋ฆฌ์ด> + WRITE_ONCE(y, 1); if (r2 = READ_ONCE(x)) { + <๋ฌต์์ ์ปจํธ๋กค ์์กด์ฑ> + WRITE_ONCE(y, 1); + } + + assert(r1 == 0 || r2 == 0); + +๊ธฐ๋ณธ์ ์ผ๋ก, ์ฌ๊ธฐ์์ ์ฝ๊ธฐ ๋ฐฐ๋ฆฌ์ด๋ "๋ ์ํ๋" ํ์
์ผ ์ ์์ด๋ ํญ์ ์กด์ฌํด์ผ +ํฉ๋๋ค. + +[!] ์ฐ๊ธฐ ๋ฐฐ๋ฆฌ์ด ์์ ์คํ ์ด ์คํผ๋ ์ด์
์ ์ผ๋ฐ์ ์ผ๋ก ์ฝ๊ธฐ ๋ฐฐ๋ฆฌ์ด๋ ๋ฐ์ดํฐ +์์กด์ฑ ๋ฐฐ๋ฆฌ์ด ๋ค์ ๋ก๋ ์คํผ๋ ์ด์
๊ณผ ๋งค์น๋ ๊ฒ์ด๊ณ , ๋ฐ๋๋ ๋ง์ฐฌ๊ฐ์ง์
๋๋ค: + + CPU 1 CPU 2 + =================== =================== + WRITE_ONCE(a, 1); }---- --->{ v = READ_ONCE(c); + WRITE_ONCE(b, 2); } \ / { w = READ_ONCE(d); + <์ฐ๊ธฐ ๋ฐฐ๋ฆฌ์ด> \ <์ฝ๊ธฐ ๋ฐฐ๋ฆฌ์ด> + WRITE_ONCE(c, 3); } / \ { x = READ_ONCE(a); + WRITE_ONCE(d, 4); }---- --->{ y = READ_ONCE(b); + + +๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด ์ํ์ค์ ์ +------------------------- + +์ฒซ์งธ, ์ฐ๊ธฐ ๋ฐฐ๋ฆฌ์ด๋ ์คํ ์ด ์คํผ๋ ์ด์
๋ค์ ๋ถ๋ถ์ ์์ ์ธ์ฐ๊ธฐ๋ก ๋์ํฉ๋๋ค. +์๋์ ์ด๋ฒคํธ ์ํ์ค๋ฅผ ๋ณด์ธ์: + + CPU 1 + ======================= + STORE A = 1 + STORE B = 2 + STORE C = 3 + <์ฐ๊ธฐ ๋ฐฐ๋ฆฌ์ด> + STORE D = 4 + STORE E = 5 + +์ด ์ด๋ฒคํธ ์ํ์ค๋ ๋ฉ๋ชจ๋ฆฌ ์ผ๊ด์ฑ ์์คํ
์ ์์๋ผ๋ฆฌ์ ์์๊ฐ ์กด์ฌํ์ง ์๋ ์งํฉ +{ STORE A, STORE B, STORE C } ๊ฐ ์ญ์ ์์๋ผ๋ฆฌ์ ์์๊ฐ ์กด์ฌํ์ง ์๋ ์งํฉ +{ STORE D, STORE E } ๋ณด๋ค ๋จผ์ ์ผ์ด๋ ๊ฒ์ผ๋ก ์์คํ
์ ๋๋จธ์ง ์์๋ค์ ๋ณด์ด๋๋ก +์ ๋ฌ๋ฉ๋๋ค: + + +-------+ : : + | | +------+ + | |------>| C=3 | } /\ + | | : +------+ }----- \ -----> ์์คํ
์ ๋๋จธ์ง ์์์ + | | : | A=1 | } \/ ๋ณด์ฌ์ง ์ ์๋ ์ด๋ฒคํธ๋ค + | | : +------+ } + | CPU 1 | : | B=2 | } + | | +------+ } + | | wwwwwwwwwwwwwwww } <--- ์ฌ๊ธฐ์ ์ฐ๊ธฐ ๋ฐฐ๋ฆฌ์ด๋ ๋ฐฐ๋ฆฌ์ด ์์ + | | +------+ } ๋ชจ๋ ์คํ ์ด๊ฐ ๋ฐฐ๋ฆฌ์ด ๋ค์ ์คํ ์ด + | | : | E=5 | } ์ ์ ๋ฉ๋ชจ๋ฆฌ ์์คํ
์ ์ ๋ฌ๋๋๋ก + | | : +------+ } ํฉ๋๋ค + | |------>| D=4 | } + | | +------+ + +-------+ : : + | + | CPU 1 ์ ์ํด ๋ฉ๋ชจ๋ฆฌ ์์คํ
์ ์ ๋ฌ๋๋ + | ์ผ๋ จ์ ์คํ ์ด ์คํผ๋ ์ด์
๋ค + V + + +๋์งธ, ๋ฐ์ดํฐ ์์กด์ฑ ๋ฐฐ๋ฆฌ์ด๋ ๋ฐ์ดํฐ ์์กด์ ๋ก๋ ์คํผ๋ ์ด์
๋ค์ ๋ถ๋ถ์ ์์ +์ธ์ฐ๊ธฐ๋ก ๋์ํฉ๋๋ค. ๋ค์ ์ผ๋ จ์ ์ด๋ฒคํธ๋ค์ ๋ณด์ธ์: + + CPU 1 CPU 2 + ======================= ======================= + { B = 7; X = 9; Y = 8; C = &Y } + STORE A = 1 + STORE B = 2 + <์ฐ๊ธฐ ๋ฐฐ๋ฆฌ์ด> + STORE C = &B LOAD X + STORE D = 4 LOAD C (gets &B) + LOAD *C (reads B) + +์ฌ๊ธฐ์ ๋ณ๋ค๋ฅธ ๊ฐ์
์ด ์๋ค๋ฉด, CPU 1 ์ ์ฐ๊ธฐ ๋ฐฐ๋ฆฌ์ด์๋ ๋ถ๊ตฌํ๊ณ CPU 2 ๋ CPU 1 +์ ์ด๋ฒคํธ๋ค์ ์์ ํ ๋ฌด์์์ ์์๋ก ์ธ์งํ๊ฒ ๋ฉ๋๋ค: + + +-------+ : : : : + | | +------+ +-------+ | CPU 2 ์ ์ธ์ง๋๋ + | |------>| B=2 |----- --->| Y->8 | | ์
๋ฐ์ดํธ ์ด๋ฒคํธ + | | : +------+ \ +-------+ | ์ํ์ค + | CPU 1 | : | A=1 | \ --->| C->&Y | V + | | +------+ | +-------+ + | | wwwwwwwwwwwwwwww | : : + | | +------+ | : : + | | : | C=&B |--- | : : +-------+ + | | : +------+ \ | +-------+ | | + | |------>| D=4 | ----------->| C->&B |------>| | + | | +------+ | +-------+ | | + +-------+ : : | : : | | + | : : | | + | : : | CPU 2 | + | +-------+ | | + ๋ถ๋ช
ํ ์๋ชป๋ ---> | | B->7 |------>| | + B ์ ๊ฐ ์ธ์ง (!) | +-------+ | | + | : : | | + | +-------+ | | + X ์ ๋ก๋๊ฐ B ์ ---> \ | X->9 |------>| | + ์ผ๊ด์ฑ ์ ์ง๋ฅผ \ +-------+ | | + ์ง์ฐ์ํด ----->| B->2 | +-------+ + +-------+ + : : + + +์์ ์์์, CPU 2 ๋ (B ์ ๊ฐ์ด ๋ ) *C ์ ๊ฐ ์ฝ๊ธฐ๊ฐ C ์ LOAD ๋ค์ ์ด์ด์ง์๋ +B ๊ฐ 7 ์ด๋ผ๋ ๊ฒฐ๊ณผ๋ฅผ ์ป์ต๋๋ค. + +ํ์ง๋ง, ๋ง์ฝ ๋ฐ์ดํฐ ์์กด์ฑ ๋ฐฐ๋ฆฌ์ด๊ฐ C ์ ๋ก๋์ *C (์ฆ, B) ์ ๋ก๋ ์ฌ์ด์ +์์๋ค๋ฉด: + + CPU 1 CPU 2 + ======================= ======================= + { B = 7; X = 9; Y = 8; C = &Y } + STORE A = 1 + STORE B = 2 + <์ฐ๊ธฐ ๋ฐฐ๋ฆฌ์ด> + STORE C = &B LOAD X + STORE D = 4 LOAD C (gets &B) + <๋ฐ์ดํฐ ์์กด์ฑ ๋ฐฐ๋ฆฌ์ด> + LOAD *C (reads B) + +๋ค์๊ณผ ๊ฐ์ด ๋ฉ๋๋ค: + + +-------+ : : : : + | | +------+ +-------+ + | |------>| B=2 |----- --->| Y->8 | + | | : +------+ \ +-------+ + | CPU 1 | : | A=1 | \ --->| C->&Y | + | | +------+ | +-------+ + | | wwwwwwwwwwwwwwww | : : + | | +------+ | : : + | | : | C=&B |--- | : : +-------+ + | | : +------+ \ | +-------+ | | + | |------>| D=4 | ----------->| C->&B |------>| | + | | +------+ | +-------+ | | + +-------+ : : | : : | | + | : : | | + | : : | CPU 2 | + | +-------+ | | + | | X->9 |------>| | + | +-------+ | | + C ๋ก์ ์คํ ์ด ์์ ---> \ ddddddddddddddddd | | + ๋ชจ๋ ์ด๋ฒคํธ ๊ฒฐ๊ณผ๊ฐ \ +-------+ | | + ๋ค์ ๋ก๋์๊ฒ ----->| B->2 |------>| | + ๋ณด์ด๊ฒ ๊ฐ์ ํ๋ค +-------+ | | + : : +-------+ + + +์
์งธ, ์ฝ๊ธฐ ๋ฐฐ๋ฆฌ์ด๋ ๋ก๋ ์คํผ๋ ์ด์
๋ค์์ ๋ถ๋ถ์ ์์ ์ธ์ฐ๊ธฐ๋ก ๋์ํฉ๋๋ค. +์๋์ ์ผ๋ จ์ ์ด๋ฒคํธ๋ฅผ ๋ด
์๋ค: + + CPU 1 CPU 2 + ======================= ======================= + { A = 0, B = 9 } + STORE A=1 + <์ฐ๊ธฐ ๋ฐฐ๋ฆฌ์ด> + STORE B=2 + LOAD B + LOAD A + +CPU 1 ์ ์ฐ๊ธฐ ๋ฐฐ๋ฆฌ์ด๋ฅผ ์ณค์ง๋ง, ๋ณ๋ค๋ฅธ ๊ฐ์
์ด ์๋ค๋ฉด CPU 2 ๋ CPU 1 ์์ ํํด์ง +์ด๋ฒคํธ์ ๊ฒฐ๊ณผ๋ฅผ ๋ฌด์์์ ์์๋ก ์ธ์งํ๊ฒ ๋ฉ๋๋ค. + + +-------+ : : : : + | | +------+ +-------+ + | |------>| A=1 |------ --->| A->0 | + | | +------+ \ +-------+ + | CPU 1 | wwwwwwwwwwwwwwww \ --->| B->9 | + | | +------+ | +-------+ + | |------>| B=2 |--- | : : + | | +------+ \ | : : +-------+ + +-------+ : : \ | +-------+ | | + ---------->| B->2 |------>| | + | +-------+ | CPU 2 | + | | A->0 |------>| | + | +-------+ | | + | : : +-------+ + \ : : + \ +-------+ + ---->| A->1 | + +-------+ + : : + + +ํ์ง๋ง, ๋ง์ฝ ์ฝ๊ธฐ ๋ฐฐ๋ฆฌ์ด๊ฐ B ์ ๋ก๋์ A ์ ๋ก๋ ์ฌ์ด์ ์กด์ฌํ๋ค๋ฉด: + + CPU 1 CPU 2 + ======================= ======================= + { A = 0, B = 9 } + STORE A=1 + <์ฐ๊ธฐ ๋ฐฐ๋ฆฌ์ด> + STORE B=2 + LOAD B + <์ฝ๊ธฐ ๋ฐฐ๋ฆฌ์ด> + LOAD A + +CPU 1 ์ ์ํด ๋ง๋ค์ด์ง ๋ถ๋ถ์ ์์๊ฐ CPU 2 ์๋ ๊ทธ๋๋ก ์ธ์ง๋ฉ๋๋ค: + + +-------+ : : : : + | | +------+ +-------+ + | |------>| A=1 |------ --->| A->0 | + | | +------+ \ +-------+ + | CPU 1 | wwwwwwwwwwwwwwww \ --->| B->9 | + | | +------+ | +-------+ + | |------>| B=2 |--- | : : + | | +------+ \ | : : +-------+ + +-------+ : : \ | +-------+ | | + ---------->| B->2 |------>| | + | +-------+ | CPU 2 | + | : : | | + | : : | | + ์ฌ๊ธฐ์ ์ฝ๊ธฐ ๋ฐฐ๋ฆฌ์ด๋ ----> \ rrrrrrrrrrrrrrrrr | | + B ๋ก์ ์คํ ์ด ์ ์ \ +-------+ | | + ๋ชจ๋ ๊ฒฐ๊ณผ๋ฅผ CPU 2 ์ ---->| A->1 |------>| | + ๋ณด์ด๋๋ก ํ๋ค +-------+ | | + : : +-------+ + + +๋ ์๋ฒฝํ ์ค๋ช
์ ์ํด, A ์ ๋ก๋๊ฐ ์ฝ๊ธฐ ๋ฐฐ๋ฆฌ์ด ์๊ณผ ๋ค์ ์์ผ๋ฉด ์ด๋ป๊ฒ ๋ ์ง +์๊ฐํด ๋ด
์๋ค: + + CPU 1 CPU 2 + ======================= ======================= + { A = 0, B = 9 } + STORE A=1 + <์ฐ๊ธฐ ๋ฐฐ๋ฆฌ์ด> + STORE B=2 + LOAD B + LOAD A [first load of A] + <์ฝ๊ธฐ ๋ฐฐ๋ฆฌ์ด> + LOAD A [second load of A] + +A ์ ๋ก๋ ๋๊ฐ๊ฐ ๋ชจ๋ B ์ ๋ก๋ ๋ค์ ์์ง๋ง, ์๋ก ๋ค๋ฅธ ๊ฐ์ ์ป์ด์ฌ ์ +์์ต๋๋ค: + + +-------+ : : : : + | | +------+ +-------+ + | |------>| A=1 |------ --->| A->0 | + | | +------+ \ +-------+ + | CPU 1 | wwwwwwwwwwwwwwww \ --->| B->9 | + | | +------+ | +-------+ + | |------>| B=2 |--- | : : + | | +------+ \ | : : +-------+ + +-------+ : : \ | +-------+ | | + ---------->| B->2 |------>| | + | +-------+ | CPU 2 | + | : : | | + | : : | | + | +-------+ | | + | | A->0 |------>| 1st | + | +-------+ | | + ์ฌ๊ธฐ์ ์ฝ๊ธฐ ๋ฐฐ๋ฆฌ์ด๋ ----> \ rrrrrrrrrrrrrrrrr | | + B ๋ก์ ์คํ ์ด ์ ์ \ +-------+ | | + ๋ชจ๋ ๊ฒฐ๊ณผ๋ฅผ CPU 2 ์ ---->| A->1 |------>| 2nd | + ๋ณด์ด๋๋ก ํ๋ค +-------+ | | + : : +-------+ + + +ํ์ง๋ง CPU 1 ์์์ A ์
๋ฐ์ดํธ๋ ์ฝ๊ธฐ ๋ฐฐ๋ฆฌ์ด๊ฐ ์๋ฃ๋๊ธฐ ์ ์๋ ๋ณด์ผ ์๋ +์๊ธด ํฉ๋๋ค: + + +-------+ : : : : + | | +------+ +-------+ + | |------>| A=1 |------ --->| A->0 | + | | +------+ \ +-------+ + | CPU 1 | wwwwwwwwwwwwwwww \ --->| B->9 | + | | +------+ | +-------+ + | |------>| B=2 |--- | : : + | | +------+ \ | : : +-------+ + +-------+ : : \ | +-------+ | | + ---------->| B->2 |------>| | + | +-------+ | CPU 2 | + | : : | | + \ : : | | + \ +-------+ | | + ---->| A->1 |------>| 1st | + +-------+ | | + rrrrrrrrrrrrrrrrr | | + +-------+ | | + | A->1 |------>| 2nd | + +-------+ | | + : : +-------+ + + +์ฌ๊ธฐ์ ๋ณด์ฅ๋๋ ๊ฑด, ๋ง์ฝ B ์ ๋ก๋๊ฐ B == 2 ๋ผ๋ ๊ฒฐ๊ณผ๋ฅผ ๋ดค๋ค๋ฉด, A ์์ ๋๋ฒ์งธ +๋ก๋๋ ํญ์ A == 1 ์ ๋ณด๊ฒ ๋ ๊ฒ์ด๋ผ๋ ๊ฒ๋๋ค. A ์์ ์ฒซ๋ฒ์งธ ๋ก๋์๋ ๊ทธ๋ฐ +๋ณด์ฅ์ด ์์ต๋๋ค; A == 0 ์ด๊ฑฐ๋ A == 1 ์ด๊ฑฐ๋ ๋ ์ค ํ๋์ ๊ฒฐ๊ณผ๋ฅผ ๋ณด๊ฒ ๋ ๊ฒ๋๋ค. + + +์ฝ๊ธฐ ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด VS ๋ก๋ ์์ธก +------------------------------- + +๋ง์ CPU๋ค์ด ๋ก๋๋ฅผ ์์ธก์ ์ผ๋ก (speculatively) ํฉ๋๋ค: ์ด๋ค ๋ฐ์ดํฐ๋ฅผ ๋ฉ๋ชจ๋ฆฌ์์ +๋ก๋ํด์ผ ํ๊ฒ ๋ ์ง ์์ธก์ ํ๋ค๋ฉด, ํด๋น ๋ฐ์ดํฐ๋ฅผ ๋ก๋ํ๋ ์ธ์คํธ๋ญ์
์ ์ค์ ๋ก๋ +์์ง ๋ง๋์ง ์์๋๋ผ๋ ๋ค๋ฅธ ๋ก๋ ์์
์ด ์์ด ๋ฒ์ค (bus) ๊ฐ ์๋ฌด ์ผ๋ ํ๊ณ ์์ง +์๋ค๋ฉด, ๊ทธ ๋ฐ์ดํฐ๋ฅผ ๋ก๋ํฉ๋๋ค. ์ดํ์ ์ค์ ๋ก๋ ์ธ์คํธ๋ญ์
์ด ์คํ๋๋ฉด CPU ๊ฐ +์ด๋ฏธ ๊ทธ ๊ฐ์ ๊ฐ์ง๊ณ ์๊ธฐ ๋๋ฌธ์ ๊ทธ ๋ก๋ ์ธ์คํธ๋ญ์
์ ์ฆ์ ์๋ฃ๋ฉ๋๋ค. + +ํด๋น CPU ๋ ์ค์ ๋ก๋ ๊ทธ ๊ฐ์ด ํ์์น ์์๋ค๋ ์ฌ์ค์ด ๋์ค์ ๋๋ฌ๋ ์๋ ์๋๋ฐ - +ํด๋น ๋ก๋ ์ธ์คํธ๋ญ์
์ด ๋ธ๋์น๋ก ์ฐํ๋๊ฑฐ๋ ํ์ ์ ์๊ฒ ์ฃ - , ๊ทธ๋ ๊ฒ ๋๋ฉด ์์ +์ฝ์ด๋ ๊ฐ์ ๋ฒ๋ฆฌ๊ฑฐ๋ ๋์ค์ ์ฌ์ฉ์ ์ํด ์บ์์ ๋ฃ์ด๋ ์ ์์ต๋๋ค. + +๋ค์์ ์๊ฐํด ๋ด
์๋ค: + + CPU 1 CPU 2 + ======================= ======================= + LOAD B + DIVIDE } ๋๋๊ธฐ ๋ช
๋ น์ ์ผ๋ฐ์ ์ผ๋ก + DIVIDE } ๊ธด ์๊ฐ์ ํ์๋ก ํฉ๋๋ค + LOAD A + +๋ ์ด๋ ๊ฒ ๋ ์ ์์ต๋๋ค: + + : : +-------+ + +-------+ | | + --->| B->2 |------>| | + +-------+ | CPU 2 | + : :DIVIDE | | + +-------+ | | + ๋๋๊ธฐ ํ๋๋ผ ๋ฐ์ ---> --->| A->0 |~~~~ | | + CPU ๋ A ์ LOAD ๋ฅผ +-------+ ~ | | + ์์ธกํด์ ์ํํ๋ค : : ~ | | + : :DIVIDE | | + : : ~ | | + ๋๋๊ธฐ๊ฐ ๋๋๋ฉด ---> ---> : : ~-->| | + CPU ๋ ํด๋น LOAD ๋ฅผ : : | | + ์ฆ๊ฐ ์๋ฃํ๋ค : : +-------+ + + +์ฝ๊ธฐ ๋ฐฐ๋ฆฌ์ด๋ ๋ฐ์ดํฐ ์์กด์ฑ ๋ฐฐ๋ฆฌ์ด๋ฅผ ๋๋ฒ์งธ ๋ก๋ ์ง์ ์ ๋๋๋ค๋ฉด: + + CPU 1 CPU 2 + ======================= ======================= + LOAD B + DIVIDE + DIVIDE + <์ฝ๊ธฐ ๋ฐฐ๋ฆฌ์ด> + LOAD A + +์์ธก์ผ๋ก ์ป์ด์ง ๊ฐ์ ์ฌ์ฉ๋ ๋ฐฐ๋ฆฌ์ด์ ํ์
์ ๋ฐ๋ผ์ ํด๋น ๊ฐ์ด ์ณ์์ง ๊ฒํ ๋๊ฒ +๋ฉ๋๋ค. ๋ง์ฝ ํด๋น ๋ฉ๋ชจ๋ฆฌ ์์ญ์ ๋ณํ๊ฐ ์์๋ค๋ฉด, ์์ธก์ผ๋ก ์ป์ด๋์๋ ๊ฐ์ด +์ฌ์ฉ๋ฉ๋๋ค: + + : : +-------+ + +-------+ | | + --->| B->2 |------>| | + +-------+ | CPU 2 | + : :DIVIDE | | + +-------+ | | + ๋๋๊ธฐ ํ๋๋ผ ๋ฐ์ ---> --->| A->0 |~~~~ | | + CPU ๋ A ์ LOAD ๋ฅผ +-------+ ~ | | + ์์ธกํ๋ค : : ~ | | + : :DIVIDE | | + : : ~ | | + : : ~ | | + rrrrrrrrrrrrrrrr~ | | + : : ~ | | + : : ~-->| | + : : | | + : : +-------+ + + +ํ์ง๋ง ๋ค๋ฅธ CPU ์์ ์
๋ฐ์ดํธ๋ ๋ฌดํจํ๊ฐ ์์๋ค๋ฉด, ๊ทธ ์์ธก์ ๋ฌดํจํ๋๊ณ ๊ทธ ๊ฐ์ +๋ค์ ์ฝํ์ง๋๋ค: + + : : +-------+ + +-------+ | | + --->| B->2 |------>| | + +-------+ | CPU 2 | + : :DIVIDE | | + +-------+ | | + ๋๋๊ธฐ ํ๋๋ผ ๋ฐ์ ---> --->| A->0 |~~~~ | | + CPU ๋ A ์ LOAD ๋ฅผ +-------+ ~ | | + ์์ธกํ๋ค : : ~ | | + : :DIVIDE | | + : : ~ | | + : : ~ | | + rrrrrrrrrrrrrrrrr | | + +-------+ | | + ์์ธก์ฑ ๋์์ ๋ฌดํจํ ๋๊ณ ---> --->| A->1 |------>| | + ์
๋ฐ์ดํธ๋ ๊ฐ์ด ๋ค์ ์ฝํ์ง๋ค +-------+ | | + : : +-------+ + + +์ดํ์ฑ +------ + +์ดํ์ฑ(transitivity)์ ์ค์ ์ ์ปดํจํฐ ์์คํ
์์ ํญ์ ์ ๊ณต๋์ง๋ ์๋, ์์ +๋ง์ถ๊ธฐ์ ๋ํ ์๋นํ ์ง๊ด์ ์ธ ๊ฐ๋
์
๋๋ค. ๋ค์์ ์๊ฐ ์ดํ์ฑ์ ๋ณด์ฌ์ค๋๋ค: + + CPU 1 CPU 2 CPU 3 + ======================= ======================= ======================= + { X = 0, Y = 0 } + STORE X=1 LOAD X STORE Y=1 + <๋ฒ์ฉ ๋ฐฐ๋ฆฌ์ด> <๋ฒ์ฉ ๋ฐฐ๋ฆฌ์ด> + LOAD Y LOAD X + +CPU 2 ์ X ๋ก๋๊ฐ 1์ ๋ฆฌํดํ๊ณ Y ๋ก๋๊ฐ 0์ ๋ฆฌํดํ๋ค๊ณ ํด๋ด
์๋ค. ์ด๋ CPU 2 ์ +X ๋ก๋๊ฐ CPU 1 ์ X ์คํ ์ด ๋ค์ ์ด๋ฃจ์ด์ก๊ณ CPU 2 ์ Y ๋ก๋๋ CPU 3 ์ Y ์คํ ์ด +์ ์ ์ด๋ฃจ์ด์ก์์ ์๋ฏธํฉ๋๋ค. ๊ทธ๋ผ "CPU 3 ์ X ๋ก๋๋ 0์ ๋ฆฌํดํ ์ ์๋์?" + +CPU 2 ์ X ๋ก๋๋ CPU 1 ์ ์คํ ์ด ํ์ ์ด๋ฃจ์ด์ก์ผ๋, CPU 3 ์ X ๋ก๋๋ 1์ +๋ฆฌํดํ๋๊ฒ ์์ฐ์ค๋ฝ์ต๋๋ค. ์ด๋ฐ ์๊ฐ์ด ์ดํ์ฑ์ ํ ์์
๋๋ค: CPU A ์์ ์คํ๋ +๋ก๋๊ฐ CPU B ์์์ ๊ฐ์ ๋ณ์์ ๋ํ ๋ก๋๋ฅผ ๋ค๋ฐ๋ฅธ๋ค๋ฉด, CPU A ์ ๋ก๋๋ CPU B +์ ๋ก๋๊ฐ ๋ด๋์ ๊ฐ๊ณผ ๊ฐ๊ฑฐ๋ ๊ทธ ํ์ ๊ฐ์ ๋ด๋์์ผ ํฉ๋๋ค. + +๋ฆฌ๋
์ค ์ปค๋์์ ๋ฒ์ฉ ๋ฐฐ๋ฆฌ์ด์ ์ฌ์ฉ์ ์ดํ์ฑ์ ๋ณด์ฅํฉ๋๋ค. ๋ฐ๋ผ์, ์์ ์์์ +CPU 2 ์ X ๋ก๋๊ฐ 1์, Y ๋ก๋๋ 0์ ๋ฆฌํดํ๋ค๋ฉด, CPU 3 ์ X ๋ก๋๋ ๋ฐ๋์ 1์ +๋ฆฌํดํฉ๋๋ค. + +ํ์ง๋ง, ์ฝ๊ธฐ๋ ์ฐ๊ธฐ ๋ฐฐ๋ฆฌ์ด์ ๋ํด์๋ ์ดํ์ฑ์ด ๋ณด์ฅ๋์ง -์์ต๋๋ค-. ์๋ฅผ ๋ค์ด, +์์ ์์์ CPU 2 ์ ๋ฒ์ฉ ๋ฐฐ๋ฆฌ์ด๊ฐ ์๋์ฒ๋ผ ์ฝ๊ธฐ ๋ฐฐ๋ฆฌ์ด๋ก ๋ฐ๋ ๊ฒฝ์ฐ๋ฅผ ์๊ฐํด +๋ด
์๋ค: + + CPU 1 CPU 2 CPU 3 + ======================= ======================= ======================= + { X = 0, Y = 0 } + STORE X=1 LOAD X STORE Y=1 + <์ฝ๊ธฐ ๋ฐฐ๋ฆฌ์ด> <๋ฒ์ฉ ๋ฐฐ๋ฆฌ์ด> + LOAD Y LOAD X + +์ด ์ฝ๋๋ ์ดํ์ฑ์ ๊ฐ์ง ์์ต๋๋ค: ์ด ์์์๋, CPU 2 ์ X ๋ก๋๊ฐ 1์ +๋ฆฌํดํ๊ณ , Y ๋ก๋๋ 0์ ๋ฆฌํดํ์ง๋ง CPU 3 ์ X ๋ก๋๊ฐ 0์ ๋ฆฌํดํ๋ ๊ฒ๋ ์์ ํ +ํฉ๋ฒ์ ์
๋๋ค. + +CPU 2 ์ ์ฝ๊ธฐ ๋ฐฐ๋ฆฌ์ด๊ฐ ์์ ์ ์ฝ๊ธฐ๋ ์์๋ฅผ ๋ง์ถฐ์ค๋, CPU 1 ์ ์คํ ์ด์์ +์์๋ฅผ ๋ง์ถฐ์ค๋ค๊ณ ๋ ๋ณด์ฅํ ์ ์๋ค๋๊ฒ ํต์ฌ์
๋๋ค. ๋ฐ๋ผ์, CPU 1 ๊ณผ CPU 2 ๊ฐ +๋ฒํผ๋ ์บ์๋ฅผ ๊ณต์ ํ๋ ์์คํ
์์ ์ด ์์ ์ฝ๋๊ฐ ์คํ๋๋ค๋ฉด, CPU 2 ๋ CPU 1 ์ด +์ด ๊ฐ์ ์ข ๋นจ๋ฆฌ ์ ๊ทผํ ์ ์์ ๊ฒ์
๋๋ค. ๋ฐ๋ผ์ CPU 1 ๊ณผ CPU 2 ์ ์ ๊ทผ์ผ๋ก +์กฐํฉ๋ ์์๋ฅผ ๋ชจ๋ CPU ๊ฐ ๋์ํ ์ ์๋๋ก ํ๊ธฐ ์ํด ๋ฒ์ฉ ๋ฐฐ๋ฆฌ์ด๊ฐ ํ์ํฉ๋๋ค. + +๋ฒ์ฉ ๋ฐฐ๋ฆฌ์ด๋ "๊ธ๋ก๋ฒ ์ดํ์ฑ"์ ์ ๊ณตํด์, ๋ชจ๋ CPU ๋ค์ด ์คํผ๋ ์ด์
๋ค์ ์์์ +๋์ํ๊ฒ ํ ๊ฒ์
๋๋ค. ๋ฐ๋ฉด, release-acquire ์กฐํฉ์ "๋ก์ปฌ ์ดํ์ฑ" ๋ง์ +์ ๊ณตํด์, ํด๋น ์กฐํฉ์ด ์ฌ์ฉ๋ CPU ๋ค๋ง์ด ํด๋น ์ก์ธ์ค๋ค์ ์กฐํฉ๋ ์์์ ๋์ํจ์ด +๋ณด์ฅ๋ฉ๋๋ค. ์๋ฅผ ๋ค์ด, ์กด๊ฒฝ์ค๋ฐ Herman Hollerith ์ C ์ฝ๋๋ก ๋ณด๋ฉด: + + int u, v, x, y, z; + + void cpu0(void) + { + r0 = smp_load_acquire(&x); + WRITE_ONCE(u, 1); + smp_store_release(&y, 1); + } + + void cpu1(void) + { + r1 = smp_load_acquire(&y); + r4 = READ_ONCE(v); + r5 = READ_ONCE(u); + smp_store_release(&z, 1); + } + + void cpu2(void) + { + r2 = smp_load_acquire(&z); + smp_store_release(&x, 1); + } + + void cpu3(void) + { + WRITE_ONCE(v, 1); + smp_mb(); + r3 = READ_ONCE(u); + } + +cpu0(), cpu1(), ๊ทธ๋ฆฌ๊ณ cpu2() ๋ smp_store_release()/smp_load_acquire() ์์ +์ฐ๊ฒฐ์ ํตํ ๋ก์ปฌ ์ดํ์ฑ์ ๋์ฐธํ๊ณ ์์ผ๋ฏ๋ก, ๋ค์๊ณผ ๊ฐ์ ๊ฒฐ๊ณผ๋ ๋์ค์ง ์์ +๊ฒ๋๋ค: + + r0 == 1 && r1 == 1 && r2 == 1 + +๋ ๋์๊ฐ์, cpu0() ์ cpu1() ์ฌ์ด์ release-acquire ๊ด๊ณ๋ก ์ธํด, cpu1() ์ +cpu0() ์ ์ฐ๊ธฐ๋ฅผ ๋ด์ผ๋ง ํ๋ฏ๋ก, ๋ค์๊ณผ ๊ฐ์ ๊ฒฐ๊ณผ๋ ์์ ๊ฒ๋๋ค: + + r1 == 1 && r5 == 0 + +ํ์ง๋ง, release-acquire ํ๋์ฑ์ ๋์ฐธํ CPU ๋ค์๋ง ์ ์ฉ๋๋ฏ๋ก cpu3() ์๋ +์ ์ฉ๋์ง ์์ต๋๋ค. ๋ฐ๋ผ์, ๋ค์๊ณผ ๊ฐ์ ๊ฒฐ๊ณผ๊ฐ ๊ฐ๋ฅํฉ๋๋ค: + + r0 == 0 && r1 == 1 && r2 == 1 && r3 == 0 && r4 == 0 + +๋น์ทํ๊ฒ, ๋ค์๊ณผ ๊ฐ์ ๊ฒฐ๊ณผ๋ ๊ฐ๋ฅํฉ๋๋ค: + + r0 == 0 && r1 == 1 && r2 == 1 && r3 == 0 && r4 == 0 && r5 == 1 + +cpu0(), cpu1(), ๊ทธ๋ฆฌ๊ณ cpu2() ๋ ๊ทธ๋ค์ ์ฝ๊ธฐ์ ์ฐ๊ธฐ๋ฅผ ์์๋๋ก ๋ณด๊ฒ ๋์ง๋ง, +release-acquire ์ฒด์ธ์ ๊ด์ฌ๋์ง ์์ CPU ๋ค์ ๊ทธ ์์์ ์ด๊ฒฌ์ ๊ฐ์ง ์ +์์ต๋๋ค. ์ด๋ฐ ์ด๊ฒฌ์ smp_load_acquire() ์ smp_store_release() ์ ๊ตฌํ์ +์ฌ์ฉ๋๋ ์ํ๋ ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด ์ธ์คํธ๋ญ์
๋ค์ ํญ์ ๋ฐฐ๋ฆฌ์ด ์์ ์คํ ์ด๋ค์ ๋ค์ +๋ก๋๋ค์ ์์ธ์ธ ํ์๋ ์๋ค๋ ์ฌ์ค์์ ๊ธฐ์ธํฉ๋๋ค. ์ด ๋ง์ cpu3() ๋ cpu0() ์ +u ๋ก์ ์คํ ์ด๋ฅผ cpu1() ์ v ๋ก๋ถํฐ์ ๋ก๋ ๋ค์ ์ผ์ด๋ ๊ฒ์ผ๋ก ๋ณผ ์ ์๋ค๋ +๋ป์
๋๋ค, cpu0() ์ cpu1() ์ ์ด ๋ ์คํผ๋ ์ด์
์ด ์๋๋ ์์๋๋ก ์ผ์ด๋ฌ์์ +๋ชจ๋ ๋์ํ๋๋ฐ๋ ๋ง์
๋๋ค. + +ํ์ง๋ง, smp_load_acquire() ๋ ๋ง์ ์ด ์๋์ ๋ช
์ฌํ์๊ธฐ ๋ฐ๋๋๋ค. ๊ตฌ์ฒด์ ์ผ๋ก, +์ด ํจ์๋ ๋จ์ํ ์์ ๊ท์น์ ์งํค๋ฉฐ ์ธ์๋ก๋ถํฐ์ ์ฝ๊ธฐ๋ฅผ ์ํํฉ๋๋ค. ์ด๊ฒ์ +์ด๋ค ํน์ ํ ๊ฐ์ด ์ฝํ ๊ฒ์ธ์ง๋ ๋ณด์ฅํ์ง -์์ต๋๋ค-. ๋ฐ๋ผ์, ๋ค์๊ณผ ๊ฐ์ ๊ฒฐ๊ณผ๋ +๊ฐ๋ฅํฉ๋๋ค: + + r0 == 0 && r1 == 0 && r2 == 0 && r5 == 0 + +์ด๋ฐ ๊ฒฐ๊ณผ๋ ์ด๋ค ๊ฒ๋ ์ฌ๋ฐฐ์น ๋์ง ์๋, ์์ฐจ์ ์ผ๊ด์ฑ์ ๊ฐ์ง ๊ฐ์์ +์์คํ
์์๋ ์ผ์ด๋ ์ ์์์ ๊ธฐ์ตํด ๋์๊ธฐ ๋ฐ๋๋๋ค. + +๋ค์ ๋งํ์ง๋ง, ๋น์ ์ ์ฝ๋๊ฐ ๊ธ๋ก๋ฒ ์ดํ์ฑ์ ํ์๋ก ํ๋ค๋ฉด, ๋ฒ์ฉ ๋ฐฐ๋ฆฌ์ด๋ฅผ +์ฌ์ฉํ์ญ์์ค. + + +================== +๋ช
์์ ์ปค๋ ๋ฐฐ๋ฆฌ์ด +================== + +๋ฆฌ๋
์ค ์ปค๋์ ์๋ก ๋ค๋ฅธ ๋จ๊ณ์์ ๋์ํ๋ ๋ค์ํ ๋ฐฐ๋ฆฌ์ด๋ค์ ๊ฐ์ง๊ณ ์์ต๋๋ค: + + (*) ์ปดํ์ผ๋ฌ ๋ฐฐ๋ฆฌ์ด. + + (*) CPU ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด. + + (*) MMIO ์ฐ๊ธฐ ๋ฐฐ๋ฆฌ์ด. + + +์ปดํ์ผ๋ฌ ๋ฐฐ๋ฆฌ์ด +--------------- + +๋ฆฌ๋
์ค ์ปค๋์ ์ปดํ์ผ๋ฌ๊ฐ ๋ฉ๋ชจ๋ฆฌ ์ก์ธ์ค๋ฅผ ์ฌ๋ฐฐ์น ํ๋ ๊ฒ์ ๋ง์์ฃผ๋ ๋ช
์์ ์ธ +์ปดํ์ผ๋ฌ ๋ฐฐ๋ฆฌ์ด๋ฅผ ๊ฐ์ง๊ณ ์์ต๋๋ค: + + barrier(); + +์ด๊ฑด ๋ฒ์ฉ ๋ฐฐ๋ฆฌ์ด์
๋๋ค -- barrier() ์ ์ฝ๊ธฐ-์ฝ๊ธฐ ๋ ์ฐ๊ธฐ-์ฐ๊ธฐ ๋ณ์ข
์ ์์ต๋๋ค. +ํ์ง๋ง, READ_ONCE() ์ WRITE_ONCE() ๋ ํน์ ์ก์ธ์ค๋ค์ ๋ํด์๋ง ๋์ํ๋ +barrier() ์ ์ํ๋ ํํ๋ก ๋ณผ ์ ์์ต๋๋ค. + +barrier() ํจ์๋ ๋ค์๊ณผ ๊ฐ์ ํจ๊ณผ๋ฅผ ๊ฐ์ต๋๋ค: + + (*) ์ปดํ์ผ๋ฌ๊ฐ barrier() ๋ค์ ์ก์ธ์ค๋ค์ด barrier() ์์ ์ก์ธ์ค๋ณด๋ค ์์ผ๋ก + ์ฌ๋ฐฐ์น๋์ง ๋ชปํ๊ฒ ํฉ๋๋ค. ์๋ฅผ ๋ค์ด, ์ธํฐ๋ฝํธ ํธ๋ค๋ฌ ์ฝ๋์ ์ธํฐ๋ฝํธ ๋นํ + ์ฝ๋ ์ฌ์ด์ ํต์ ์ ์ ์คํ ํ๊ธฐ ์ํด ์ฌ์ฉ๋ ์ ์์ต๋๋ค. + + (*) ๋ฃจํ์์, ์ปดํ์ผ๋ฌ๊ฐ ๋ฃจํ ์กฐ๊ฑด์ ์ฌ์ฉ๋ ๋ณ์๋ฅผ ๋งค ์ดํฐ๋ ์ด์
๋ง๋ค + ๋ฉ๋ชจ๋ฆฌ์์ ๋ก๋ํ์ง ์์๋ ๋๋๋ก ์ต์ ํ ํ๋๊ฑธ ๋ฐฉ์งํฉ๋๋ค. + +READ_ONCE() ์ WRITE_ONCE() ํจ์๋ ์ฑ๊ธ ์ฐ๋ ๋ ์ฝ๋์์๋ ๋ฌธ์ ์์ง๋ง ๋์์ฑ์ด +์๋ ์ฝ๋์์๋ ๋ฌธ์ ๊ฐ ๋ ์ ์๋ ๋ชจ๋ ์ต์ ํ๋ฅผ ๋ง์ต๋๋ค. ์ด๋ฐ ๋ฅ์ ์ต์ ํ์ +๋ํ ์๋ฅผ ๋ช๊ฐ์ง ๋ค์ด๋ณด๋ฉด ๋ค์๊ณผ ๊ฐ์ต๋๋ค: + + (*) ์ปดํ์ผ๋ฌ๋ ๊ฐ์ ๋ณ์์ ๋ํ ๋ก๋์ ์คํ ์ด๋ฅผ ์ฌ๋ฐฐ์น ํ ์ ์๊ณ , ์ด๋ค + ๊ฒฝ์ฐ์๋ CPU๊ฐ ๊ฐ์ ๋ณ์๋ก๋ถํฐ์ ๋ก๋๋ค์ ์ฌ๋ฐฐ์นํ ์๋ ์์ต๋๋ค. ์ด๋ + ๋ค์์ ์ฝ๋๊ฐ: + + a[0] = x; + a[1] = x; + + x ์ ์์ ๊ฐ์ด a[1] ์, ์ ๊ฐ์ด a[0] ์ ์๊ฒ ํ ์ ์๋ค๋ ๋ป์
๋๋ค. + ์ปดํ์ผ๋ฌ์ CPU๊ฐ ์ด๋ฐ ์ผ์ ๋ชปํ๊ฒ ํ๋ ค๋ฉด ๋ค์๊ณผ ๊ฐ์ด ํด์ผ ํฉ๋๋ค: + + a[0] = READ_ONCE(x); + a[1] = READ_ONCE(x); + + ์ฆ, READ_ONCE() ์ WRITE_ONCE() ๋ ์ฌ๋ฌ CPU ์์ ํ๋์ ๋ณ์์ ๊ฐํด์ง๋ + ์ก์ธ์ค๋ค์ ์บ์ ์ผ๊ด์ฑ์ ์ ๊ณตํฉ๋๋ค. + + (*) ์ปดํ์ผ๋ฌ๋ ๊ฐ์ ๋ณ์์ ๋ํ ์ฐ์์ ์ธ ๋ก๋๋ค์ ๋ณํฉํ ์ ์์ต๋๋ค. ๊ทธ๋ฐ + ๋ณํฉ ์์
์ผ๋ก ์ปดํ์ผ๋ฌ๋ ๋ค์์ ์ฝ๋๋ฅผ: + + while (tmp = a) + do_something_with(tmp); + + ๋ค์๊ณผ ๊ฐ์ด, ์ฑ๊ธ ์ฐ๋ ๋ ์ฝ๋์์๋ ๋ง์ด ๋์ง๋ง ๊ฐ๋ฐ์์ ์๋์ ์ ํ ๋ง์ง + ์๋ ๋ฐฉํฅ์ผ๋ก "์ต์ ํ" ํ ์ ์์ต๋๋ค: + + if (tmp = a) + for (;;) + do_something_with(tmp); + + ์ปดํ์ผ๋ฌ๊ฐ ์ด๋ฐ ์ง์ ํ์ง ๋ชปํ๊ฒ ํ๋ ค๋ฉด READ_ONCE() ๋ฅผ ์ฌ์ฉํ์ธ์: + + while (tmp = READ_ONCE(a)) + do_something_with(tmp); + + (*) ์์ปจ๋ ๋ ์ง์คํฐ ์ฌ์ฉ๋์ด ๋ง์ ์ปดํ์ผ๋ฌ๊ฐ ๋ชจ๋ ๋ฐ์ดํฐ๋ฅผ ๋ ์ง์คํฐ์ ๋ด์ ์ + ์๋ ๊ฒฝ์ฐ, ์ปดํ์ผ๋ฌ๋ ๋ณ์๋ฅผ ๋ค์ ๋ก๋ํ ์ ์์ต๋๋ค. ๋ฐ๋ผ์ ์ปดํ์ผ๋ฌ๋ + ์์ ์์์ ๋ณ์ 'tmp' ์ฌ์ฉ์ ์ต์ ํ๋ก ์์ ๋ฒ๋ฆด ์ ์์ต๋๋ค: + + while (tmp = a) + do_something_with(tmp); + + ์ด ์ฝ๋๋ ๋ค์๊ณผ ๊ฐ์ด ์ฑ๊ธ ์ฐ๋ ๋์์๋ ์๋ฒฝํ์ง๋ง ๋์์ฑ์ด ์กด์ฌํ๋ + ๊ฒฝ์ฐ์ ์น๋ช
์ ์ธ ์ฝ๋๋ก ๋ฐ๋ ์ ์์ต๋๋ค: + + while (a) + do_something_with(a); + + ์๋ฅผ ๋ค์ด, ์ต์ ํ๋ ์ด ์ฝ๋๋ ๋ณ์ a ๊ฐ ๋ค๋ฅธ CPU ์ ์ํด "while" ๋ฌธ๊ณผ + do_something_with() ํธ์ถ ์ฌ์ด์ ๋ฐ๋์ด do_something_with() ์ 0์ ๋๊ธธ + ์๋ ์์ต๋๋ค. + + ์ด๋ฒ์๋, ์ปดํ์ผ๋ฌ๊ฐ ๊ทธ๋ฐ ์ง์ ํ๋๊ฑธ ๋ง๊ธฐ ์ํด READ_ONCE() ๋ฅผ ์ฌ์ฉํ์ธ์: + + while (tmp = READ_ONCE(a)) + do_something_with(tmp); + + ๋ ์ง์คํฐ๊ฐ ๋ถ์กฑํ ์ํฉ์ ๊ฒช๋ ๊ฒฝ์ฐ, ์ปดํ์ผ๋ฌ๋ tmp ๋ฅผ ์คํ์ ์ ์ฅํด๋ ์๋ + ์์ต๋๋ค. ์ปดํ์ผ๋ฌ๊ฐ ๋ณ์๋ฅผ ๋ค์ ์ฝ์ด๋ค์ด๋๊ฑด ์ด๋ ๊ฒ ์ ์ฅํด๋๊ณ ํ์ ๋ค์ + ์ฝ์ด๋ค์ด๋๋ฐ ๋๋ ์ค๋ฒํค๋ ๋๋ฌธ์
๋๋ค. ๊ทธ๋ ๊ฒ ํ๋๊ฒ ์ฑ๊ธ ์ฐ๋ ๋ + ์ฝ๋์์๋ ์์ ํ๋ฏ๋ก, ์์ ํ์ง ์์ ๊ฒฝ์ฐ์๋ ์ปดํ์ผ๋ฌ์๊ฒ ์ง์ ์๋ ค์ค์ผ + ํฉ๋๋ค. + + (*) ์ปดํ์ผ๋ฌ๋ ๊ทธ ๊ฐ์ด ๋ฌด์์ผ์ง ์๊ณ ์๋ค๋ฉด ๋ก๋๋ฅผ ์์ ์ํ ์๋ ์์ต๋๋ค. + ์๋ฅผ ๋ค์ด, ๋ค์์ ์ฝ๋๋ ๋ณ์ 'a' ์ ๊ฐ์ด ํญ์ 0์์ ์ฆ๋ช
ํ ์ ์๋ค๋ฉด: + + while (tmp = a) + do_something_with(tmp); + + ์ด๋ ๊ฒ ์ต์ ํ ๋์ด๋ฒ๋ฆด ์ ์์ต๋๋ค: + + do { } while (0); + + ์ด ๋ณํ์ ์ฑ๊ธ ์ฐ๋ ๋ ์ฝ๋์์๋ ๋์์ด ๋๋๋ฐ ๋ก๋์ ๋ธ๋์น๋ฅผ ์ ๊ฑฐํ๊ธฐ + ๋๋ฌธ์
๋๋ค. ๋ฌธ์ ๋ ์ปดํ์ผ๋ฌ๊ฐ 'a' ์ ๊ฐ์ ์
๋ฐ์ดํธ ํ๋๊ฑด ํ์ฌ์ CPU ํ๋ + ๋ฟ์ด๋ผ๋ ๊ฐ์ ์์์ ์ฆ๋ช
์ ํ๋ค๋๋ฐ ์์ต๋๋ค. ๋ง์ฝ ๋ณ์ 'a' ๊ฐ ๊ณต์ ๋์ด + ์๋ค๋ฉด, ์ปดํ์ผ๋ฌ์ ์ฆ๋ช
์ ํ๋ฆฐ ๊ฒ์ด ๋ ๊ฒ๋๋ค. ์ปดํ์ผ๋ฌ๋ ๊ทธ ์์ ์ด + ์๊ฐํ๋ ๊ฒ๋งํผ ๋ง์ ๊ฒ์ ์๊ณ ์์ง ๋ชปํจ์ ์ปดํ์ผ๋ฌ์๊ฒ ์๋ฆฌ๊ธฐ ์ํด + READ_ONCE() ๋ฅผ ์ฌ์ฉํ์ธ์: + + while (tmp = READ_ONCE(a)) + do_something_with(tmp); + + ํ์ง๋ง ์ปดํ์ผ๋ฌ๋ READ_ONCE() ๋ค์ ๋์ค๋ ๊ฐ์ ๋ํด์๋ ๋๊ธธ์ ๋๊ณ ์์์ + ๊ธฐ์ตํ์ธ์. ์๋ฅผ ๋ค์ด, ๋ค์์ ์ฝ๋์์ MAX ๋ ์ ์ฒ๋ฆฌ๊ธฐ ๋งคํฌ๋ก๋ก, 1์ ๊ฐ์ + ๊ฐ๋๋ค๊ณ ํด๋ด
์๋ค: + + while ((tmp = READ_ONCE(a)) % MAX) + do_something_with(tmp); + + ์ด๋ ๊ฒ ๋๋ฉด ์ปดํ์ผ๋ฌ๋ MAX ๋ฅผ ๊ฐ์ง๊ณ ์ํ๋๋ "%" ์คํผ๋ ์ดํฐ์ ๊ฒฐ๊ณผ๊ฐ ํญ์ + 0์ด๋ผ๋ ๊ฒ์ ์๊ฒ ๋๊ณ , ์ปดํ์ผ๋ฌ๊ฐ ์ฝ๋๋ฅผ ์ค์ง์ ์ผ๋ก๋ ์กด์ฌํ์ง ์๋ + ๊ฒ์ฒ๋ผ ์ต์ ํ ํ๋ ๊ฒ์ด ํ์ฉ๋์ด ๋ฒ๋ฆฝ๋๋ค. ('a' ๋ณ์์ ๋ก๋๋ ์ฌ์ ํ + ํํด์ง ๊ฒ๋๋ค.) + + (*) ๋น์ทํ๊ฒ, ์ปดํ์ผ๋ฌ๋ ๋ณ์๊ฐ ์ ์ฅํ๋ ค ํ๋ ๊ฐ์ ์ด๋ฏธ ๊ฐ์ง๊ณ ์๋ค๋ ๊ฒ์ + ์๋ฉด ์คํ ์ด ์์ฒด๋ฅผ ์ ๊ฑฐํ ์ ์์ต๋๋ค. ์ด๋ฒ์๋, ์ปดํ์ผ๋ฌ๋ ํ์ฌ์ CPU + ๋ง์ด ๊ทธ ๋ณ์์ ๊ฐ์ ์ฐ๋ ์ค๋ก์ง ํ๋์ ์กด์ฌ๋ผ๊ณ ์๊ฐํ์ฌ ๊ณต์ ๋ ๋ณ์์ + ๋ํด์๋ ์๋ชป๋ ์ผ์ ํ๊ฒ ๋ฉ๋๋ค. ์๋ฅผ ๋ค์ด, ๋ค์๊ณผ ๊ฐ์ ๊ฒฝ์ฐ๊ฐ ์์ ์ + ์์ต๋๋ค: + + a = 0; + ... ๋ณ์ a ์ ์คํ ์ด๋ฅผ ํ์ง ์๋ ์ฝ๋ ... + a = 0; + + ์ปดํ์ผ๋ฌ๋ ๋ณ์ 'a' ์ ๊ฐ์ ์ด๋ฏธ 0์ด๋ผ๋ ๊ฒ์ ์๊ณ , ๋ฐ๋ผ์ ๋๋ฒ์งธ ์คํ ์ด๋ฅผ + ์ญ์ ํ ๊ฒ๋๋ค. ๋ง์ฝ ๋ค๋ฅธ CPU ๊ฐ ๊ทธ ์ฌ์ด ๋ณ์ 'a' ์ ๋ค๋ฅธ ๊ฐ์ ์ผ๋ค๋ฉด + ํฉ๋นํ ๊ฒฐ๊ณผ๊ฐ ๋์ฌ ๊ฒ๋๋ค. + + ์ปดํ์ผ๋ฌ๊ฐ ๊ทธ๋ฐ ์๋ชป๋ ์ถ์ธก์ ํ์ง ์๋๋ก WRITE_ONCE() ๋ฅผ ์ฌ์ฉํ์ธ์: + + WRITE_ONCE(a, 0); + ... ๋ณ์ a ์ ์คํ ์ด๋ฅผ ํ์ง ์๋ ์ฝ๋ ... + WRITE_ONCE(a, 0); + + (*) ์ปดํ์ผ๋ฌ๋ ํ์ง ๋ง๋ผ๊ณ ํ์ง ์์ผ๋ฉด ๋ฉ๋ชจ๋ฆฌ ์ก์ธ์ค๋ค์ ์ฌ๋ฐฐ์น ํ ์ + ์์ต๋๋ค. ์๋ฅผ ๋ค์ด, ๋ค์์ ํ๋ก์ธ์ค ๋ ๋ฒจ ์ฝ๋์ ์ธํฐ๋ฝํธ ํธ๋ค๋ฌ ์ฌ์ด์ + ์ํธ์์ฉ์ ์๊ฐํด ๋ด
์๋ค: + + void process_level(void) + { + msg = get_message(); + flag = true; + } + + void interrupt_handler(void) + { + if (flag) + process_message(msg); + } + + ์ด ์ฝ๋์๋ ์ปดํ์ผ๋ฌ๊ฐ process_level() ์ ๋ค์๊ณผ ๊ฐ์ด ๋ณํํ๋ ๊ฒ์ ๋ง์ + ์๋จ์ด ์๊ณ , ์ด๋ฐ ๋ณํ์ ์ฑ๊ธ์ฐ๋ ๋์์๋ผ๋ฉด ์ค์ ๋ก ํ๋ฅญํ ์ ํ์ผ ์ + ์์ต๋๋ค: + + void process_level(void) + { + flag = true; + msg = get_message(); + } + + ์ด ๋๊ฐ์ ๋ฌธ์ฅ ์ฌ์ด์ ์ธํฐ๋ฝํธ๊ฐ ๋ฐ์ํ๋ค๋ฉด, interrupt_handler() ๋ ์๋ฏธ๋ฅผ + ์ ์ ์๋ ๋ฉ์ธ์ง๋ฅผ ๋ฐ์ ์๋ ์์ต๋๋ค. ์ด๊ฑธ ๋ง๊ธฐ ์ํด ๋ค์๊ณผ ๊ฐ์ด + WRITE_ONCE() ๋ฅผ ์ฌ์ฉํ์ธ์: + + void process_level(void) + { + WRITE_ONCE(msg, get_message()); + WRITE_ONCE(flag, true); + } + + void interrupt_handler(void) + { + if (READ_ONCE(flag)) + process_message(READ_ONCE(msg)); + } + + interrupt_handler() ์์์๋ ์ค์ฒฉ๋ ์ธํฐ๋ฝํธ๋ NMI ์ ๊ฐ์ด ์ธํฐ๋ฝํธ ํธ๋ค๋ฌ + ์ญ์ 'flag' ์ 'msg' ์ ์ ๊ทผํ๋ ๋๋ค๋ฅธ ๋ฌด์ธ๊ฐ์ ์ธํฐ๋ฝํธ ๋ ์ ์๋ค๋ฉด + READ_ONCE() ์ WRITE_ONCE() ๋ฅผ ์ฌ์ฉํด์ผ ํจ์ ๊ธฐ์ตํด ๋์ธ์. ๋ง์ฝ ๊ทธ๋ฐ + ๊ฐ๋ฅ์ฑ์ด ์๋ค๋ฉด, interrupt_handler() ์์์๋ ๋ฌธ์ํ ๋ชฉ์ ์ด ์๋๋ผ๋ฉด + READ_ONCE() ์ WRITE_ONCE() ๋ ํ์์น ์์ต๋๋ค. (๊ทผ๋์ ๋ฆฌ๋
์ค ์ปค๋์์ + ์ค์ฒฉ๋ ์ธํฐ๋ฝํธ๋ ๋ณดํต ์ ์ผ์ด๋์ง ์์๋ ๊ธฐ์ตํด ๋์ธ์, ์ค์ ๋ก, ์ด๋ค + ์ธํฐ๋ฝํธ ํธ๋ค๋ฌ๊ฐ ์ธํฐ๋ฝํธ๊ฐ ํ์ฑํ๋ ์ฑ๋ก ๋ฆฌํดํ๋ฉด WARN_ONCE() ๊ฐ + ์คํ๋ฉ๋๋ค.) + + ์ปดํ์ผ๋ฌ๋ READ_ONCE() ์ WRITE_ONCE() ๋ค์ READ_ONCE() ๋ WRITE_ONCE(), + barrier(), ๋๋ ๋น์ทํ ๊ฒ๋ค์ ๋ด๊ณ ์์ง ์์ ์ฝ๋๋ฅผ ์์ง์ผ ์ ์์ ๊ฒ์ผ๋ก + ๊ฐ์ ๋์ด์ผ ํฉ๋๋ค. + + ์ด ํจ๊ณผ๋ barrier() ๋ฅผ ํตํด์๋ ๋ง๋ค ์ ์์ง๋ง, READ_ONCE() ์ + WRITE_ONCE() ๊ฐ ์ข ๋ ์๋ชฉ ๋์ ์ ํ์
๋๋ค: READ_ONCE() ์ WRITE_ONCE()๋ + ์ปดํ์ผ๋ฌ์ ์ฃผ์ด์ง ๋ฉ๋ชจ๋ฆฌ ์์ญ์ ๋ํด์๋ง ์ต์ ํ ๊ฐ๋ฅ์ฑ์ ํฌ๊ธฐํ๋๋ก + ํ์ง๋ง, barrier() ๋ ์ปดํ์ผ๋ฌ๊ฐ ์ง๊ธ๊น์ง ๊ธฐ๊ณ์ ๋ ์ง์คํฐ์ ์บ์ํด ๋์ + ๋ชจ๋ ๋ฉ๋ชจ๋ฆฌ ์์ญ์ ๊ฐ์ ๋ฒ๋ ค์ผ ํ๊ฒ ํ๊ธฐ ๋๋ฌธ์
๋๋ค. ๋ฌผ๋ก , ์ปดํ์ผ๋ฌ๋ + READ_ONCE() ์ WRITE_ONCE() ๊ฐ ์ผ์ด๋ ์์๋ ์ง์ผ์ค๋๋ค, CPU ๋ ๋น์ฐํ + ๊ทธ ์์๋ฅผ ์งํฌ ์๋ฌด๊ฐ ์์ง๋ง์. + + (*) ์ปดํ์ผ๋ฌ๋ ๋ค์์ ์์์์ ๊ฐ์ด ๋ณ์์์ ์คํ ์ด๋ฅผ ๋ ์กฐํด๋ผ ์๋ ์์ต๋๋ค: + + if (a) + b = a; + else + b = 42; + + ์ปดํ์ผ๋ฌ๋ ์๋์ ๊ฐ์ ์ต์ ํ๋ก ๋ธ๋์น๋ฅผ ์ค์ผ ๊ฒ๋๋ค: + + b = 42; + if (a) + b = a; + + ์ฑ๊ธ ์ฐ๋ ๋ ์ฝ๋์์ ์ด ์ต์ ํ๋ ์์ ํ ๋ฟ ์๋๋ผ ๋ธ๋์น ๊ฐฏ์๋ฅผ + ์ค์ฌ์ค๋๋ค. ํ์ง๋ง ์ํ๊น๊ฒ๋, ๋์์ฑ์ด ์๋ ์ฝ๋์์๋ ์ด ์ต์ ํ๋ ๋ค๋ฅธ + CPU ๊ฐ 'b' ๋ฅผ ๋ก๋ํ ๋, -- 'a' ๊ฐ 0์ด ์๋๋ฐ๋ -- ๊ฐ์ง์ธ ๊ฐ, 42๋ฅผ ๋ณด๊ฒ + ๋๋ ๊ฒฝ์ฐ๋ฅผ ๊ฐ๋ฅํ๊ฒ ํฉ๋๋ค. ์ด๊ฑธ ๋ฐฉ์งํ๊ธฐ ์ํด WRITE_ONCE() ๋ฅผ + ์ฌ์ฉํ์ธ์: + + if (a) + WRITE_ONCE(b, a); + else + WRITE_ONCE(b, 42); + + ์ปดํ์ผ๋ฌ๋ ๋ก๋๋ฅผ ๋ง๋ค์ด๋ผ ์๋ ์์ต๋๋ค. ์ผ๋ฐ์ ์ผ๋ก๋ ๋ฌธ์ ๋ฅผ ์ผ์ผํค์ง + ์์ง๋ง, ์บ์ ๋ผ์ธ ๋ฐ์ด์ฑ์ ์ผ์ผ์ผ ์ฑ๋ฅ๊ณผ ํ์ฅ์ฑ์ ๋จ์ด๋จ๋ฆด ์ ์์ต๋๋ค. + ๋ ์กฐ๋ ๋ก๋๋ฅผ ๋ง๊ธฐ ์ํด์ READ_ONCE() ๋ฅผ ์ฌ์ฉํ์ธ์. + + (*) ์ ๋ ฌ๋ ๋ฉ๋ชจ๋ฆฌ ์ฃผ์์ ์์นํ, ํ๋ฒ์ ๋ฉ๋ชจ๋ฆฌ ์ฐธ์กฐ ์ธ์คํธ๋ญ์
์ผ๋ก ์ก์ธ์ค + ๊ฐ๋ฅํ ํฌ๊ธฐ์ ๋ฐ์ดํฐ๋ ํ๋์ ํฐ ์ก์ธ์ค๊ฐ ์ฌ๋ฌ๊ฐ์ ์์ ์ก์ธ์ค๋ค๋ก + ๋์ฒด๋๋ "๋ก๋ ํฐ์ด๋ง(load tearing)" ๊ณผ "์คํ ์ด ํฐ์ด๋ง(store tearing)" ์ + ๋ฐฉ์งํฉ๋๋ค. ์๋ฅผ ๋ค์ด, ์ฃผ์ด์ง ์ํคํ
์ณ๊ฐ 7-bit imeediate field ๋ฅผ ๊ฐ๋ + 16-bit ์คํ ์ด ์ธ์คํธ๋ญ์
์ ์ ๊ณตํ๋ค๋ฉด, ์ปดํ์ผ๋ฌ๋ ๋ค์์ 32-bit ์คํ ์ด๋ฅผ + ๊ตฌํํ๋๋ฐ์ ๋๊ฐ์ 16-bit store-immediate ๋ช
๋ น์ ์ฌ์ฉํ๋ ค ํ ๊ฒ๋๋ค: + + p = 0x00010002; + + ์คํ ์ด ํ ์์๋ฅผ ๋ง๋ค๊ณ ๊ทธ ๊ฐ์ ์คํ ์ด ํ๊ธฐ ์ํด ๋๊ฐ๊ฐ ๋๋ ์ธ์คํธ๋ญ์
์ + ์ฌ์ฉํ๊ฒ ๋๋, ์ด๋ฐ ์ข
๋ฅ์ ์ต์ ํ๋ฅผ GCC ๋ ์ค์ ๋ก ํจ์ ๋ถ๋ ์์ ๋์ญ์์ค. + ์ด ์ต์ ํ๋ ์ฑ๊ธ ์ฐ๋ ๋ ์ฝ๋์์๋ ์ฑ๊ณต์ ์ธ ์ต์ ํ ์
๋๋ค. ์ค์ ๋ก, ๊ทผ๋์ + ๋ฐ์ํ (๊ทธ๋ฆฌ๊ณ ๊ณ ์ณ์ง) ๋ฒ๊ทธ๋ GCC ๊ฐ volatile ์คํ ์ด์ ๋น์ ์์ ์ผ๋ก ์ด + ์ต์ ํ๋ฅผ ์ฌ์ฉํ๊ฒ ํ์ต๋๋ค. ๊ทธ๋ฐ ๋ฒ๊ทธ๊ฐ ์๋ค๋ฉด, ๋ค์์ ์์์ + WRITE_ONCE() ์ ์ฌ์ฉ์ ์คํ ์ด ํฐ์ด๋ง์ ๋ฐฉ์งํฉ๋๋ค: + + WRITE_ONCE(p, 0x00010002); + + Packed ๊ตฌ์กฐ์ฒด์ ์ฌ์ฉ ์ญ์ ๋ค์์ ์์ฒ๋ผ ๋ก๋ / ์คํ ์ด ํฐ์ด๋ง์ ์ ๋ฐํ ์ + ์์ต๋๋ค: + + struct __attribute__((__packed__)) foo { + short a; + int b; + short c; + }; + struct foo foo1, foo2; + ... + + foo2.a = foo1.a; + foo2.b = foo1.b; + foo2.c = foo1.c; + + READ_ONCE() ๋ WRITE_ONCE() ๋ ์๊ณ volatile ๋งํน๋ ์๊ธฐ ๋๋ฌธ์, + ์ปดํ์ผ๋ฌ๋ ์ด ์ธ๊ฐ์ ๋์
๋ฌธ์ ๋๊ฐ์ 32-bit ๋ก๋์ ๋๊ฐ์ 32-bit ์คํ ์ด๋ก + ๋ณํํ ์ ์์ต๋๋ค. ์ด๋ 'foo1.b' ์ ๊ฐ์ ๋ก๋ ํฐ์ด๋ง๊ณผ 'foo2.b' ์ + ์คํ ์ด ํฐ์ด๋ง์ ์ด๋ํ ๊ฒ๋๋ค. ์ด ์์์๋ READ_ONCE() ์ WRITE_ONCE() + ๊ฐ ํฐ์ด๋ง์ ๋ง์ ์ ์์ต๋๋ค: + + foo2.a = foo1.a; + WRITE_ONCE(foo2.b, READ_ONCE(foo1.b)); + foo2.c = foo1.c; + +๊ทธ๋ ์ง๋ง, volatile ๋ก ๋งํฌ๋ ๋ณ์์ ๋ํด์๋ READ_ONCE() ์ WRITE_ONCE() ๊ฐ +ํ์์น ์์ต๋๋ค. ์๋ฅผ ๋ค์ด, 'jiffies' ๋ volatile ๋ก ๋งํฌ๋์ด ์๊ธฐ ๋๋ฌธ์, +READ_ONCE(jiffies) ๋ผ๊ณ ํ ํ์๊ฐ ์์ต๋๋ค. READ_ONCE() ์ WRITE_ONCE() ๊ฐ +์ค์ volatile ์บ์คํ
์ผ๋ก ๊ตฌํ๋์ด ์์ด์ ์ธ์๊ฐ ์ด๋ฏธ volatile ๋ก ๋งํฌ๋์ด +์๋ค๋ฉด ๋๋ค๋ฅธ ํจ๊ณผ๋ฅผ ๋ด์ง๋ ์๊ธฐ ๋๋ฌธ์
๋๋ค. + +์ด ์ปดํ์ผ๋ฌ ๋ฐฐ๋ฆฌ์ด๋ค์ CPU ์๋ ์ง์ ์ ํจ๊ณผ๋ฅผ ์ ํ ๋ง๋ค์ง ์๊ธฐ ๋๋ฌธ์, ๊ฒฐ๊ตญ์ +์ฌ๋ฐฐ์น๊ฐ ์ผ์ด๋ ์๋ ์์์ ๋ถ๋ ๊ธฐ์ตํด ๋์ญ์์ค. + + +CPU ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด +----------------- + +๋ฆฌ๋
์ค ์ปค๋์ ๋ค์์ ์ฌ๋๊ฐ ๊ธฐ๋ณธ CPU ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๋ฅผ ๊ฐ์ง๊ณ ์์ต๋๋ค: + + TYPE MANDATORY SMP CONDITIONAL + =============== ======================= =========================== + ๋ฒ์ฉ mb() smp_mb() + ์ฐ๊ธฐ wmb() smp_wmb() + ์ฝ๊ธฐ rmb() smp_rmb() + ๋ฐ์ดํฐ ์์กด์ฑ read_barrier_depends() smp_read_barrier_depends() + + +๋ฐ์ดํฐ ์์กด์ฑ ๋ฐฐ๋ฆฌ์ด๋ฅผ ์ ์ธํ ๋ชจ๋ ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๋ ์ปดํ์ผ๋ฌ ๋ฐฐ๋ฆฌ์ด๋ฅผ +ํฌํจํฉ๋๋ค. ๋ฐ์ดํฐ ์์กด์ฑ์ ์ปดํ์ผ๋ฌ์์ ์ถ๊ฐ์ ์ธ ์์ ๋ณด์ฅ์ ํฌํจํ์ง +์์ต๋๋ค. + +๋ฐฉ๋ฐฑ: ๋ฐ์ดํฐ ์์กด์ฑ์ด ์๋ ๊ฒฝ์ฐ, ์ปดํ์ผ๋ฌ๋ ํด๋น ๋ก๋๋ฅผ ์ฌ๋ฐ๋ฅธ ์์๋ก ์ผ์ผํฌ +๊ฒ์ผ๋ก (์: `a[b]` ๋ a[b] ๋ฅผ ๋ก๋ ํ๊ธฐ ์ ์ b ์ ๊ฐ์ ๋จผ์ ๋ก๋ํ๋ค) +๊ธฐ๋๋์ง๋ง, C ์ธ์ด ์ฌ์์๋ ์ปดํ์ผ๋ฌ๊ฐ b ์ ๊ฐ์ ์ถ์ธก (์: 1 ๊ณผ ๊ฐ์) ํด์ +b ๋ก๋ ์ ์ a ๋ก๋๋ฅผ ํ๋ ์ฝ๋ (์: tmp = a[1]; if (b != 1) tmp = a[b]; ) ๋ฅผ +๋ง๋ค์ง ์์์ผ ํ๋ค๋ ๋ด์ฉ ๊ฐ์ ๊ฑด ์์ต๋๋ค. ๋ํ ์ปดํ์ผ๋ฌ๋ a[b] ๋ฅผ ๋ก๋ํ +ํ์ b ๋ฅผ ๋๋ค์ ๋ก๋ํ ์๋ ์์ด์, a[b] ๋ณด๋ค ์ต์ ๋ฒ์ ์ b ๊ฐ์ ๊ฐ์ง ์๋ +์์ต๋๋ค. ์ด๋ฐ ๋ฌธ์ ๋ค์ ํด๊ฒฐ์ฑ
์ ๋ํ ์๊ฒฌ ์ผ์น๋ ์์ง ์์ต๋๋ค๋ง, ์ผ๋จ +READ_ONCE() ๋งคํฌ๋ก๋ถํฐ ๋ณด๊ธฐ ์์ํ๋๊ฒ ์ข์ ์์์ด ๋ ๊ฒ๋๋ค. + +SMP ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๋ค์ ์ ๋ํ๋ก์ธ์๋ก ์ปดํ์ผ๋ ์์คํ
์์๋ ์ปดํ์ผ๋ฌ ๋ฐฐ๋ฆฌ์ด๋ก +๋ฐ๋๋๋ฐ, ํ๋์ CPU ๋ ์ค์ค๋ก ์ผ๊ด์ฑ์ ์ ์งํ๊ณ , ๊ฒน์น๋ ์ก์ธ์ค๋ค ์ญ์ ์ฌ๋ฐ๋ฅธ +์์๋ก ํํด์ง ๊ฒ์ผ๋ก ์๊ฐ๋๊ธฐ ๋๋ฌธ์
๋๋ค. ํ์ง๋ง, ์๋์ "Virtual Machine +Guests" ์๋ธ์น์
์ ์ฐธ๊ณ ํ์ญ์์ค. + +[!] SMP ์์คํ
์์ ๊ณต์ ๋ฉ๋ชจ๋ฆฌ๋ก์ ์ ๊ทผ๋ค์ ์์ ์ธ์์ผ ํ ๋, SMP ๋ฉ๋ชจ๋ฆฌ +๋ฐฐ๋ฆฌ์ด๋ _๋ฐ๋์_ ์ฌ์ฉ๋์ด์ผ ํจ์ ๊ธฐ์ตํ์ธ์, ๊ทธ๋์ ๋ฝ์ ์ฌ์ฉํ๋ ๊ฒ์ผ๋ก๋ +์ถฉ๋ถํ๊ธด ํ์ง๋ง ๋ง์ด์ฃ . + +Mandatory ๋ฐฐ๋ฆฌ์ด๋ค์ SMP ์์คํ
์์๋ UP ์์คํ
์์๋ SMP ํจ๊ณผ๋ง ํต์ ํ๊ธฐ์๋ +๋ถํ์ํ ์ค๋ฒํค๋๋ฅผ ๊ฐ๊ธฐ ๋๋ฌธ์ SMP ํจ๊ณผ๋ง ํต์ ํ๋ฉด ๋๋ ๊ณณ์๋ ์ฌ์ฉ๋์ง ์์์ผ +ํฉ๋๋ค. ํ์ง๋ง, ๋์จํ ์์ ๊ท์น์ ๋ฉ๋ชจ๋ฆฌ I/O ์๋์ฐ๋ฅผ ํตํ MMIO ์ ํจ๊ณผ๋ฅผ +ํต์ ํ ๋์๋ mandatory ๋ฐฐ๋ฆฌ์ด๋ค์ด ์ฌ์ฉ๋ ์ ์์ต๋๋ค. ์ด ๋ฐฐ๋ฆฌ์ด๋ค์ +์ปดํ์ผ๋ฌ์ CPU ๋ชจ๋ ์ฌ๋ฐฐ์น๋ฅผ ๋ชปํ๋๋ก ํจ์ผ๋ก์จ ๋ฉ๋ชจ๋ฆฌ ์คํผ๋ ์ด์
๋ค์ด ๋๋ฐ์ด์ค์ +๋ณด์ฌ์ง๋ ์์์๋ ์ํฅ์ ์ฃผ๊ธฐ ๋๋ฌธ์, SMP ๊ฐ ์๋ ์์คํ
์ด๋ผ ํ ์ง๋ผ๋ ํ์ํ ์ +์์ต๋๋ค. + + +์ผ๋ถ ๊ณ ๊ธ ๋ฐฐ๋ฆฌ์ด ํจ์๋ค๋ ์์ต๋๋ค: + + (*) smp_store_mb(var, value) + + ์ด ํจ์๋ ํน์ ๋ณ์์ ํน์ ๊ฐ์ ๋์
ํ๊ณ ๋ฒ์ฉ ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๋ฅผ ์นฉ๋๋ค. + UP ์ปดํ์ผ์์๋ ์ปดํ์ผ๋ฌ ๋ฐฐ๋ฆฌ์ด๋ณด๋ค ๋ํ ๊ฒ์ ์น๋ค๊ณ ๋ ๋ณด์ฅ๋์ง ์์ต๋๋ค. + + + (*) smp_mb__before_atomic(); + (*) smp_mb__after_atomic(); + + ์ด๊ฒ๋ค์ ๊ฐ์ ๋ฆฌํดํ์ง ์๋ (๋ํ๊ธฐ, ๋นผ๊ธฐ, ์ฆ๊ฐ, ๊ฐ์์ ๊ฐ์) ์ดํ ๋ฏน + ํจ์๋ค์ ์ํ, ํนํ ๊ทธ๊ฒ๋ค์ด ๋ ํผ๋ฐ์ค ์นด์ดํ
์ ์ฌ์ฉ๋ ๋๋ฅผ ์ํ + ํจ์๋ค์
๋๋ค. ์ด ํจ์๋ค์ ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๋ฅผ ๋ดํฌํ๊ณ ์์ง๋ ์์ต๋๋ค. + + ์ด๊ฒ๋ค์ ๊ฐ์ ๋ฆฌํดํ์ง ์์ผ๋ฉฐ ์ดํ ๋ฏนํ (set_bit ๊ณผ clear_bit ๊ฐ์) ๋นํธ + ์ฐ์ฐ์๋ ์ฌ์ฉ๋ ์ ์์ต๋๋ค. + + ํ ์๋ก, ๊ฐ์ฒด ํ๋๋ฅผ ๋ฌดํจํ ๊ฒ์ผ๋ก ํ์ํ๊ณ ๊ทธ ๊ฐ์ฒด์ ๋ ํผ๋ฐ์ค ์นด์ดํธ๋ฅผ + ๊ฐ์์ํค๋ ๋ค์ ์ฝ๋๋ฅผ ๋ณด์ธ์: + + obj->dead = 1; + smp_mb__before_atomic(); + atomic_dec(&obj->ref_count); + + ์ด ์ฝ๋๋ ๊ฐ์ฒด์ ์
๋ฐ์ดํธ๋ death ๋งํฌ๊ฐ ๋ ํผ๋ฐ์ค ์นด์ดํฐ ๊ฐ์ ๋์ + *์ ์* ๋ณด์ผ ๊ฒ์ ๋ณด์ฅํฉ๋๋ค. + + ๋ ๋ง์ ์ ๋ณด๋ฅผ ์ํด์ Documentation/atomic_ops.txt ๋ฌธ์๋ฅผ ์ฐธ๊ณ ํ์ธ์. + ์ด๋์ ์ด๊ฒ๋ค์ ์ฌ์ฉํด์ผ ํ ์ง ๊ถ๊ธํ๋ค๋ฉด "์ดํ ๋ฏน ์คํผ๋ ์ด์
" ์๋ธ์น์
์ + ์ฐธ๊ณ ํ์ธ์. + + + (*) lockless_dereference(); + + ์ด ํจ์๋ smp_read_barrier_depends() ๋ฐ์ดํฐ ์์กด์ฑ ๋ฐฐ๋ฆฌ์ด๋ฅผ ์ฌ์ฉํ๋ + ํฌ์ธํฐ ์ฝ์ด์ค๊ธฐ ๋ํผ(wrapper) ํจ์๋ก ์๊ฐ๋ ์ ์์ต๋๋ค. + + ๊ฐ์ฒด์ ๋ผ์ดํํ์์ด RCU ์ธ์ ๋ฉ์ปค๋์ฆ์ผ๋ก ๊ด๋ฆฌ๋๋ค๋ ์ ์ ์ ์ธํ๋ฉด + rcu_dereference() ์๋ ์ ์ฌํ๋ฐ, ์๋ฅผ ๋ค๋ฉด ๊ฐ์ฒด๊ฐ ์์คํ
์ด ๊บผ์ง ๋์๋ง + ์ ๊ฑฐ๋๋ ๊ฒฝ์ฐ ๋ฑ์
๋๋ค. ๋ํ, lockless_dereference() ์ RCU ์ ํจ๊ป + ์ฌ์ฉ๋ ์๋, RCU ์์ด ์ฌ์ฉ๋ ์๋ ์๋ ์ผ๋ถ ๋ฐ์ดํฐ ๊ตฌ์กฐ์ ์ฌ์ฉ๋๊ณ + ์์ต๋๋ค. + + + (*) dma_wmb(); + (*) dma_rmb(); + + ์ด๊ฒ๋ค์ CPU ์ DMA ๊ฐ๋ฅํ ๋๋ฐ์ด์ค์์ ๋ชจ๋ ์ก์ธ์ค ๊ฐ๋ฅํ ๊ณต์ ๋ฉ๋ชจ๋ฆฌ์ + ์ฝ๊ธฐ, ์ฐ๊ธฐ ์์
๋ค์ ์์๋ฅผ ๋ณด์ฅํ๊ธฐ ์ํด consistent memory ์์ ์ฌ์ฉํ๊ธฐ + ์ํ ๊ฒ๋ค์
๋๋ค. + + ์๋ฅผ ๋ค์ด, ๋๋ฐ์ด์ค์ ๋ฉ๋ชจ๋ฆฌ๋ฅผ ๊ณต์ ํ๋ฉฐ, ๋์คํฌ๋ฆฝํฐ ์ํ ๊ฐ์ ์ฌ์ฉํด + ๋์คํฌ๋ฆฝํฐ๊ฐ ๋๋ฐ์ด์ค์ ์ํด ์๋์ง ์๋๋ฉด CPU ์ ์ํด ์๋์ง ํ์ํ๊ณ , + ๊ณต์ง์ฉ ์ด์ธ์ข
(doorbell) ์ ์ฌ์ฉํด ์
๋ฐ์ดํธ๋ ๋์คํฌ๋ฆฝํฐ๊ฐ ๋๋ฐ์ด์ค์ ์ฌ์ฉ + ๊ฐ๋ฅํด์ก์์ ๊ณต์งํ๋ ๋๋ฐ์ด์ค ๋๋ผ์ด๋ฒ๋ฅผ ์๊ฐํด ๋ด
์๋ค: + + if (desc->status != DEVICE_OWN) { + /* ๋์คํฌ๋ฆฝํฐ๋ฅผ ์์ ํ๊ธฐ ์ ์๋ ๋ฐ์ดํฐ๋ฅผ ์ฝ์ง ์์ */ + dma_rmb(); + + /* ๋ฐ์ดํฐ๋ฅผ ์ฝ๊ณ ์ */ + read_data = desc->data; + desc->data = write_data; + + /* ์ํ ์
๋ฐ์ดํธ ์ ์์ ์ฌํญ์ ๋ฐ์ */ + dma_wmb(); + + /* ์์ ๊ถ์ ์์ */ + desc->status = DEVICE_OWN; + + /* MMIO ๋ฅผ ํตํด ๋๋ฐ์ด์ค์ ๊ณต์ง๋ฅผ ํ๊ธฐ ์ ์ ๋ฉ๋ชจ๋ฆฌ๋ฅผ ๋๊ธฐํ */ + wmb(); + + /* ์
๋ฐ์ดํธ๋ ๋์คํฌ๋ฆฝํฐ์ ๋๋ฐ์ด์ค์ ๊ณต์ง */ + writel(DESC_NOTIFY, doorbell); + } + + dma_rmb() ๋ ๋์คํฌ๋ฆฝํฐ๋ก๋ถํฐ ๋ฐ์ดํฐ๋ฅผ ์ฝ์ด์ค๊ธฐ ์ ์ ๋๋ฐ์ด์ค๊ฐ ์์ ๊ถ์ + ๋ด๋์์์ ๋ณด์ฅํ๊ฒ ํ๊ณ , dma_wmb() ๋ ๋๋ฐ์ด์ค๊ฐ ์์ ์ด ์์ ๊ถ์ ๋ค์ + ๊ฐ์ก์์ ๋ณด๊ธฐ ์ ์ ๋์คํฌ๋ฆฝํฐ์ ๋ฐ์ดํฐ๊ฐ ์ฐ์์์ ๋ณด์ฅํฉ๋๋ค. wmb() ๋ + ์บ์ ์ผ๊ด์ฑ์ด ์๋ (cache incoherent) MMIO ์์ญ์ ์ฐ๊ธฐ๋ฅผ ์๋ํ๊ธฐ ์ ์ + ์บ์ ์ผ๊ด์ฑ์ด ์๋ ๋ฉ๋ชจ๋ฆฌ (cache coherent memory) ์ฐ๊ธฐ๊ฐ ์๋ฃ๋์์์ + ๋ณด์ฅํด์ฃผ๊ธฐ ์ํด ํ์ํฉ๋๋ค. + + consistent memory ์ ๋ํ ์์ธํ ๋ด์ฉ์ ์ํด์ Documentation/DMA-API.txt + ๋ฌธ์๋ฅผ ์ฐธ๊ณ ํ์ธ์. + + +MMIO ์ฐ๊ธฐ ๋ฐฐ๋ฆฌ์ด +---------------- + +๋ฆฌ๋
์ค ์ปค๋์ ๋ํ memory-mapped I/O ์ฐ๊ธฐ๋ฅผ ์ํ ํน๋ณํ ๋ฐฐ๋ฆฌ์ด๋ ๊ฐ์ง๊ณ +์์ต๋๋ค: + + mmiowb(); + +์ด๊ฒ์ mandatory ์ฐ๊ธฐ ๋ฐฐ๋ฆฌ์ด์ ๋ณ์ข
์ผ๋ก, ์ํ๋ ์์ ๊ท์น์ I/O ์์ญ์์ผ๋ก์ +์ฐ๊ธฐ๊ฐ ๋ถ๋ถ์ ์ผ๋ก ์์๋ฅผ ๋ง์ถ๋๋ก ํด์ค๋๋ค. ์ด ํจ์๋ CPU->ํ๋์จ์ด ์ฌ์ด๋ฅผ +๋์ด์ ์ค์ ํ๋์จ์ด์๊น์ง ์ผ๋ถ ์์ค์ ์ํฅ์ ๋ผ์นฉ๋๋ค. + +๋ ๋ง์ ์ ๋ณด๋ฅผ ์ํด์ "Acquire vs I/O ์ก์ธ์ค" ์๋ธ์น์
์ ์ฐธ๊ณ ํ์ธ์. + + +========================= +์๋ฌต์ ์ปค๋ ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด +========================= + +๋ฆฌ๋
์ค ์ปค๋์ ์ผ๋ถ ํจ์๋ค์ ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๋ฅผ ๋ด์ฅํ๊ณ ์๋๋ฐ, ๋ฝ(lock)๊ณผ +์ค์ผ์ฅด๋ง ๊ด๋ จ ํจ์๋ค์ด ๋๋ถ๋ถ์
๋๋ค. + +์ฌ๊ธฐ์ _์ต์ํ์_ ๋ณด์ฅ์ ์ค๋ช
ํฉ๋๋ค; ํน์ ์ํคํ
์ณ์์๋ ์ด ์ค๋ช
๋ณด๋ค ๋ ๋ง์ +๋ณด์ฅ์ ์ ๊ณตํ ์๋ ์์ต๋๋ค๋ง ํด๋น ์ํคํ
์ณ์ ์ข
์์ ์ธ ์ฝ๋ ์ธ์ ๋ถ๋ถ์์๋ +๊ทธ๋ฐ ๋ณด์ฅ์ ๊ธฐ๋ํด์ ์๋ ๊ฒ๋๋ค. + + +๋ฝ ACQUISITION ํจ์ +------------------- + +๋ฆฌ๋
์ค ์ปค๋์ ๋ค์ํ ๋ฝ ๊ตฌ์ฑ์ฒด๋ฅผ ๊ฐ์ง๊ณ ์์ต๋๋ค: + + (*) ์คํ ๋ฝ + (*) R/W ์คํ ๋ฝ + (*) ๋ฎคํ
์ค + (*) ์ธ๋งํฌ์ด + (*) R/W ์ธ๋งํฌ์ด + +๊ฐ ๊ตฌ์ฑ์ฒด๋ง๋ค ๋ชจ๋ ๊ฒฝ์ฐ์ "ACQUIRE" ์คํผ๋ ์ด์
๊ณผ "RELEASE" ์คํผ๋ ์ด์
์ ๋ณ์ข
์ด +์กด์ฌํฉ๋๋ค. ์ด ์คํผ๋ ์ด์
๋ค์ ๋ชจ๋ ์ ์ ํ ๋ฐฐ๋ฆฌ์ด๋ฅผ ๋ดํฌํ๊ณ ์์ต๋๋ค: + + (1) ACQUIRE ์คํผ๋ ์ด์
์ ์ํฅ: + + ACQUIRE ๋ค์์ ์์ฒญ๋ ๋ฉ๋ชจ๋ฆฌ ์คํผ๋ ์ด์
์ ACQUIRE ์คํผ๋ ์ด์
์ด ์๋ฃ๋ + ๋ค์ ์๋ฃ๋ฉ๋๋ค. + + ACQUIRE ์์์ ์์ฒญ๋ ๋ฉ๋ชจ๋ฆฌ ์คํผ๋ ์ด์
์ ACQUIRE ์คํผ๋ ์ด์
์ด ์๋ฃ๋ ํ์ + ์๋ฃ๋ ์ ์์ต๋๋ค. smp_mb__before_spinlock() ๋ค์ ACQUIRE ๊ฐ ์คํ๋๋ + ์ฝ๋ ๋ธ๋ก์ ๋ธ๋ก ์์ ์คํ ์ด๋ฅผ ๋ธ๋ก ๋ค์ ๋ก๋์ ์คํ ์ด์ ๋ํด ์์ + ๋ง์ถฅ๋๋ค. ์ด๊ฑด smp_mb() ๋ณด๋ค ์ํ๋ ๊ฒ์์ ๊ธฐ์ตํ์ธ์! ๋ง์ ์ํคํ
์ณ์์ + smp_mb__before_spinlock() ์ ์ฌ์ค ์๋ฌด์ผ๋ ํ์ง ์์ต๋๋ค. + + (2) RELEASE ์คํผ๋ ์ด์
์ ์ํฅ: + + RELEASE ์์์ ์์ฒญ๋ ๋ฉ๋ชจ๋ฆฌ ์คํผ๋ ์ด์
์ RELEASE ์คํผ๋ ์ด์
์ด ์๋ฃ๋๊ธฐ + ์ ์ ์๋ฃ๋ฉ๋๋ค. + + RELEASE ๋ค์์ ์์ฒญ๋ ๋ฉ๋ชจ๋ฆฌ ์คํผ๋ ์ด์
์ RELEASE ์คํผ๋ ์ด์
์๋ฃ ์ ์ + ์๋ฃ๋ ์ ์์ต๋๋ค. + + (3) ACQUIRE vs ACQUIRE ์ํฅ: + + ์ด๋ค ACQUIRE ์คํผ๋ ์ด์
๋ณด๋ค ์์์ ์์ฒญ๋ ๋ชจ๋ ACQUIRE ์คํผ๋ ์ด์
์ ๊ทธ + ACQUIRE ์คํผ๋ ์ด์
์ ์ ์๋ฃ๋ฉ๋๋ค. + + (4) ACQUIRE vs RELEASE implication: + + ์ด๋ค RELEASE ์คํผ๋ ์ด์
๋ณด๋ค ์์ ์์ฒญ๋ ACQUIRE ์คํผ๋ ์ด์
์ ๊ทธ RELEASE + ์คํผ๋ ์ด์
๋ณด๋ค ๋จผ์ ์๋ฃ๋ฉ๋๋ค. + + (5) ์คํจํ ์กฐ๊ฑด์ ACQUIRE ์ํฅ: + + ACQUIRE ์คํผ๋ ์ด์
์ ์ผ๋ถ ๋ฝ(lock) ๋ณ์ข
์ ๋ฝ์ด ๊ณง๋ฐ๋ก ํ๋ํ๊ธฐ์๋ + ๋ถ๊ฐ๋ฅํ ์ํ์ด๊ฑฐ๋ ๋ฝ์ด ํ๋ ๊ฐ๋ฅํด์ง๋๋ก ๊ธฐ๋ค๋ฆฌ๋ ๋์ค ์๊ทธ๋์ ๋ฐ๊ฑฐ๋ + ํด์ ์คํจํ ์ ์์ต๋๋ค. ์คํจํ ๋ฝ์ ์ด๋ค ๋ฐฐ๋ฆฌ์ด๋ ๋ดํฌํ์ง ์์ต๋๋ค. + +[!] ์ฐธ๊ณ : ๋ฝ ACQUIRE ์ RELEASE ๊ฐ ๋จ๋ฐฉํฅ ๋ฐฐ๋ฆฌ์ด์ฌ์ ๋ํ๋๋ ํ์ ์ค ํ๋๋ +ํฌ๋ฆฌํฐ์ปฌ ์น์
๋ฐ๊นฅ์ ์ธ์คํธ๋ญ์
์ ์ํฅ์ด ํฌ๋ฆฌํฐ์ปฌ ์น์
๋ด๋ถ๋ก๋ ๋ค์ด์ฌ ์ +์๋ค๋ ๊ฒ์
๋๋ค. + +RELEASE ํ์ ์์ฒญ๋๋ ACQUIRE ๋ ์ ์ฒด ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๋ผ ์ฌ๊ฒจ์ง๋ฉด ์๋๋๋ฐ, +ACQUIRE ์์ ์ก์ธ์ค๊ฐ ACQUIRE ํ์ ์ํ๋ ์ ์๊ณ , RELEASE ํ์ ์ก์ธ์ค๊ฐ +RELEASE ์ ์ ์ํ๋ ์๋ ์์ผ๋ฉฐ, ๊ทธ ๋๊ฐ์ ์ก์ธ์ค๊ฐ ์๋ก๋ฅผ ์ง๋์น ์๋ ์๊ธฐ +๋๋ฌธ์
๋๋ค: + + *A = a; + ACQUIRE M + RELEASE M + *B = b; + +๋ ๋ค์๊ณผ ๊ฐ์ด ๋ ์๋ ์์ต๋๋ค: + + ACQUIRE M, STORE *B, STORE *A, RELEASE M + +ACQUIRE ์ RELEASE ๊ฐ ๋ฝ ํ๋๊ณผ ํด์ ๋ผ๋ฉด, ๊ทธ๋ฆฌ๊ณ ๋ฝ์ ACQUIRE ์ RELEASE ๊ฐ +๊ฐ์ ๋ฝ ๋ณ์์ ๋ํ ๊ฒ์ด๋ผ๋ฉด, ํด๋น ๋ฝ์ ์ฅ๊ณ ์์ง ์์ ๋ค๋ฅธ CPU ์ ์์ผ์๋ +์ด์ ๊ฐ์ ์ฌ๋ฐฐ์น๊ฐ ์ผ์ด๋๋ ๊ฒ์ผ๋ก ๋ณด์ผ ์ ์์ต๋๋ค. ์์ฝํ์๋ฉด, ACQUIRE ์ +์ด์ด RELEASE ์คํผ๋ ์ด์
์ ์์ฐจ์ ์ผ๋ก ์คํํ๋ ํ์๊ฐ ์ ์ฒด ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๋ก +์๊ฐ๋์ด์ -์๋ฉ๋๋ค-. + +๋น์ทํ๊ฒ, ์์ ๋ฐ๋ ์ผ์ด์ค์ธ RELEASE ์ ACQUIRE ๋๊ฐ ์คํผ๋ ์ด์
์ ์์ฐจ์ ์คํ +์ญ์ ์ ์ฒด ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๋ฅผ ๋ดํฌํ์ง ์์ต๋๋ค. ๋ฐ๋ผ์, RELEASE, ACQUIRE ๋ก +๊ท์ ๋๋ ํฌ๋ฆฌํฐ์ปฌ ์น์
์ CPU ์ํ์ RELEASE ์ ACQUIRE ๋ฅผ ๊ฐ๋ก์ง๋ฅผ ์ ์์ผ๋ฏ๋ก, +๋ค์๊ณผ ๊ฐ์ ์ฝ๋๋: + + *A = a; + RELEASE M + ACQUIRE N + *B = b; + +๋ค์๊ณผ ๊ฐ์ด ์ํ๋ ์ ์์ต๋๋ค: + + ACQUIRE N, STORE *B, STORE *A, RELEASE M + +์ด๋ฐ ์ฌ๋ฐฐ์น๋ ๋ฐ๋๋ฝ์ ์ผ์ผํฌ ์๋ ์์ ๊ฒ์ฒ๋ผ ๋ณด์ผ ์ ์์ต๋๋ค. ํ์ง๋ง, ๊ทธ๋ฐ +๋ฐ๋๋ฝ์ ์กฐ์ง์ด ์๋ค๋ฉด RELEASE ๋ ๋จ์ํ ์๋ฃ๋ ๊ฒ์ด๋ฏ๋ก ๋ฐ๋๋ฝ์ ์กด์ฌํ ์ +์์ต๋๋ค. + + ์ด๊ฒ ์ด๋ป๊ฒ ์ฌ๋ฐ๋ฅธ ๋์์ ํ ์ ์์๊น์? + + ์ฐ๋ฆฌ๊ฐ ์ด์ผ๊ธฐ ํ๊ณ ์๋๊ฑด ์ฌ๋ฐฐ์น๋ฅผ ํ๋ CPU ์ ๋ํ ์ด์ผ๊ธฐ์ด์ง, + ์ปดํ์ผ๋ฌ์ ๋ํ ๊ฒ์ด ์๋๋ ์ ์ด ํต์ฌ์
๋๋ค. ์ปดํ์ผ๋ฌ (๋๋, ๊ฐ๋ฐ์) + ๊ฐ ์คํผ๋ ์ด์
๋ค์ ์ด๋ ๊ฒ ์ฌ๋ฐฐ์นํ๋ฉด, ๋ฐ๋๋ฝ์ด ์ผ์ด๋ ์ -์์ต-๋๋ค. + + ํ์ง๋ง CPU ๊ฐ ์คํผ๋ ์ด์
๋ค์ ์ฌ๋ฐฐ์น ํ๋ค๋๊ฑธ ์๊ฐํด ๋ณด์ธ์. ์ด ์์์, + ์ด์
๋ธ๋ฆฌ ์ฝ๋ ์์ผ๋ก๋ ์ธ๋ฝ์ด ๋ฝ์ ์์๊ฒ ๋์ด ์์ต๋๋ค. CPU ๊ฐ ์ด๋ฅผ + ์ฌ๋ฐฐ์นํด์ ๋ค์ ๋ฝ ์คํผ๋ ์ด์
์ ๋จผ์ ์คํํ๊ฒ ๋ฉ๋๋ค. ๋ง์ฝ ๋ฐ๋๋ฝ์ด + ์กด์ฌํ๋ค๋ฉด, ์ด ๋ฝ ์คํผ๋ ์ด์
์ ๊ทธ์ ์คํ์ ํ๋ฉฐ ๊ณ์ํด์ ๋ฝ์ + ์๋ํฉ๋๋ค (๋๋, ํ์ฐธ ํ์๊ฒ ์ง๋ง, ์ ๋ญ๋๋ค). CPU ๋ ์ธ์ ๊ฐ๋ + (์ด์
๋ธ๋ฆฌ ์ฝ๋์์๋ ๋ฝ์ ์์๋) ์ธ๋ฝ ์คํผ๋ ์ด์
์ ์คํํ๋๋ฐ, ์ด ์ธ๋ฝ + ์คํผ๋ ์ด์
์ด ์ ์ฌ์ ๋ฐ๋๋ฝ์ ํด๊ฒฐํ๊ณ , ๋ฝ ์คํผ๋ ์ด์
๋ ๋ค์ด์ด ์ฑ๊ณตํ๊ฒ + ๋ฉ๋๋ค. + + ํ์ง๋ง ๋ง์ฝ ๋ฝ์ด ์ ์ ์๋ ํ์
์ด์๋ค๋ฉด์? ๊ทธ๋ฐ ๊ฒฝ์ฐ์ ์ฝ๋๋ + ์ค์ผ์ฅด๋ฌ๋ก ๋ค์ด๊ฐ๋ ค ํ ๊ฑฐ๊ณ , ์ฌ๊ธฐ์ ๊ฒฐ๊ตญ์ ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๋ฅผ ๋ง๋๊ฒ + ๋๋๋ฐ, ์ด ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๋ ์์ ์ธ๋ฝ ์คํผ๋ ์ด์
์ด ์๋ฃ๋๋๋ก ๋ง๋ค๊ณ , + ๋ฐ๋๋ฝ์ ์ด๋ฒ์๋ ํด๊ฒฐ๋ฉ๋๋ค. ์ ์ ์๋ ํ์์ ์ธ๋ฝ ์ฌ์ด์ ๊ฒฝ์ฃผ ์ํฉ + (race) ๋ ์์ ์ ์๊ฒ ์ต๋๋ค๋ง, ๋ฝ ๊ด๋ จ ๊ธฐ๋ฅ๋ค์ ๊ทธ๋ฐ ๊ฒฝ์ฃผ ์ํฉ์ ๋ชจ๋ + ๊ฒฝ์ฐ์ ์ ๋๋ก ํด๊ฒฐํ ์ ์์ด์ผ ํฉ๋๋ค. + +๋ฝ๊ณผ ์ธ๋งํฌ์ด๋ UP ์ปดํ์ผ๋ ์์คํ
์์์ ์์์ ๋ํด ๋ณด์ฅ์ ํ์ง ์๊ธฐ ๋๋ฌธ์, +๊ทธ๋ฐ ์ํฉ์์ ์ธํฐ๋ฝํธ ๋นํ์ฑํ ์คํผ๋ ์ด์
๊ณผ ํจ๊ป๊ฐ ์๋๋ผ๋ฉด ์ด๋ค ์ผ์๋ - ํนํ +I/O ์ก์ธ์ค์ ๊ด๋ จํด์๋ - ์ ๋๋ก ์ฌ์ฉ๋ ์ ์์ ๊ฒ๋๋ค. + +"CPU ๊ฐ ACQUIRING ๋ฐฐ๋ฆฌ์ด ํจ๊ณผ" ์น์
๋ ์ฐธ๊ณ ํ์๊ธฐ ๋ฐ๋๋๋ค. + + +์๋ฅผ ๋ค์ด, ๋ค์๊ณผ ๊ฐ์ ์ฝ๋๋ฅผ ์๊ฐํด ๋ด
์๋ค: + + *A = a; + *B = b; + ACQUIRE + *C = c; + *D = d; + RELEASE + *E = e; + *F = f; + +์ฌ๊ธฐ์ ๋ค์์ ์ด๋ฒคํธ ์ํ์ค๊ฐ ์๊ธธ ์ ์์ต๋๋ค: + + ACQUIRE, {*F,*A}, *E, {*C,*D}, *B, RELEASE + + [+] {*F,*A} ๋ ์กฐํฉ๋ ์ก์ธ์ค๋ฅผ ์๋ฏธํฉ๋๋ค. + +ํ์ง๋ง ๋ค์๊ณผ ๊ฐ์ ๊ฑด ๋ถ๊ฐ๋ฅํ์ฃ : + + {*F,*A}, *B, ACQUIRE, *C, *D, RELEASE, *E + *A, *B, *C, ACQUIRE, *D, RELEASE, *E, *F + *A, *B, ACQUIRE, *C, RELEASE, *D, *E, *F + *B, ACQUIRE, *C, *D, RELEASE, {*F,*A}, *E + + + +์ธํฐ๋ฝํธ ๋นํ์ฑํ ํจ์ +---------------------- + +์ธํฐ๋ฝํธ๋ฅผ ๋นํ์ฑํ ํ๋ ํจ์ (ACQUIRE ์ ๋์ผ) ์ ์ธํฐ๋ฝํธ๋ฅผ ํ์ฑํ ํ๋ ํจ์ +(RELEASE ์ ๋์ผ) ๋ ์ปดํ์ผ๋ฌ ๋ฐฐ๋ฆฌ์ด์ฒ๋ผ๋ง ๋์ํฉ๋๋ค. ๋ฐ๋ผ์, ๋ณ๋์ ๋ฉ๋ชจ๋ฆฌ +๋ฐฐ๋ฆฌ์ด๋ I/O ๋ฐฐ๋ฆฌ์ด๊ฐ ํ์ํ ์ํฉ์ด๋ผ๋ฉด ๊ทธ ๋ฐฐ๋ฆฌ์ด๋ค์ ์ธํฐ๋ฝํธ ๋นํ์ฑํ ํจ์ +์ธ์ ๋ฐฉ๋ฒ์ผ๋ก ์ ๊ณต๋์ด์ผ๋ง ํฉ๋๋ค. + + +์ฌ๋ฆฝ๊ณผ ์จ์ดํฌ์
ํจ์ +-------------------- + +๊ธ๋ก๋ฒ ๋ฐ์ดํฐ์ ํ์๋ ์ด๋ฒคํธ์ ์ํด ํ๋ก์ธ์ค๋ฅผ ์ ์ ๋น ํธ๋ฆฌ๋ ๊ฒ๊ณผ ๊นจ์ฐ๋ ๊ฒ์ +ํด๋น ์ด๋ฒคํธ๋ฅผ ๊ธฐ๋ค๋ฆฌ๋ ํ์คํฌ์ ํ์คํฌ ์ํ์ ๊ทธ ์ด๋ฒคํธ๋ฅผ ์๋ฆฌ๊ธฐ ์ํด ์ฌ์ฉ๋๋ +๊ธ๋ก๋ฒ ๋ฐ์ดํฐ, ๋ ๋ฐ์ดํฐ๊ฐ์ ์ํธ์์ฉ์ผ๋ก ๋ณผ ์ ์์ต๋๋ค. ์ด๊ฒ์ด ์ณ์ ์์๋๋ก +์ผ์ด๋จ์ ๋ถ๋ช
ํ ํ๊ธฐ ์ํด, ํ๋ก์ธ์ค๋ฅผ ์ ์ ๋ค๊ฒ ํ๋ ๊ธฐ๋ฅ๊ณผ ๊นจ์ฐ๋ ๊ธฐ๋ฅ์ +๋ช๊ฐ์ง ๋ฐฐ๋ฆฌ์ด๋ฅผ ๋ดํฌํฉ๋๋ค. + +๋จผ์ , ์ ์ ์ฌ์ฐ๋ ์ชฝ์ ์ผ๋ฐ์ ์ผ๋ก ๋ค์๊ณผ ๊ฐ์ ์ด๋ฒคํธ ์ํ์ค๋ฅผ ๋ฐ๋ฆ
๋๋ค: + + for (;;) { + set_current_state(TASK_UNINTERRUPTIBLE); + if (event_indicated) + break; + schedule(); + } + +set_current_state() ์ ์ํด, ํ์คํฌ ์ํ๊ฐ ๋ฐ๋ ํ ๋ฒ์ฉ ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๊ฐ +์๋์ผ๋ก ์ฝ์
๋ฉ๋๋ค: + + CPU 1 + =============================== + set_current_state(); + smp_store_mb(); + STORE current->state + <๋ฒ์ฉ ๋ฐฐ๋ฆฌ์ด> + LOAD event_indicated + +set_current_state() ๋ ๋ค์์ ๊ฒ๋ค๋ก ๊ฐ์ธ์ง ์๋ ์์ต๋๋ค: + + prepare_to_wait(); + prepare_to_wait_exclusive(); + +์ด๊ฒ๋ค ์ญ์ ์ํ๋ฅผ ์ค์ ํ ํ ๋ฒ์ฉ ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๋ฅผ ์ฝ์
ํฉ๋๋ค. +์์ ์ ์ฒด ์ํ์ค๋ ๋ค์๊ณผ ๊ฐ์ ํจ์๋ค๋ก ํ๋ฒ์ ์ํ ๊ฐ๋ฅํ๋ฐ, ์ด๊ฒ๋ค์ ๋ชจ๋ +์ฌ๋ฐ๋ฅธ ์ฅ์์ ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๋ฅผ ์ฝ์
ํฉ๋๋ค: + + wait_event(); + wait_event_interruptible(); + wait_event_interruptible_exclusive(); + wait_event_interruptible_timeout(); + wait_event_killable(); + wait_event_timeout(); + wait_on_bit(); + wait_on_bit_lock(); + + +๋๋ฒ์งธ๋ก, ๊นจ์ฐ๊ธฐ๋ฅผ ์ํํ๋ ์ฝ๋๋ ์ผ๋ฐ์ ์ผ๋ก ๋ค์๊ณผ ๊ฐ์ ๊ฒ๋๋ค: + + event_indicated = 1; + wake_up(&event_wait_queue); + +๋๋: + + event_indicated = 1; + wake_up_process(event_daemon); + +wake_up() ๋ฅ์ ์ํด ์ฐ๊ธฐ ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๊ฐ ๋ดํฌ๋ฉ๋๋ค. ๋ง์ฝ ๊ทธ๊ฒ๋ค์ด ๋ญ๊ฐ๋ฅผ +๊นจ์ด๋ค๋ฉด์. ์ด ๋ฐฐ๋ฆฌ์ด๋ ํ์คํฌ ์ํ๊ฐ ์ง์์ง๊ธฐ ์ ์ ์ํ๋๋ฏ๋ก, ์ด๋ฒคํธ๋ฅผ +์๋ฆฌ๊ธฐ ์ํ STORE ์ ํ์คํฌ ์ํ๋ฅผ TASK_RUNNING ์ผ๋ก ์ค์ ํ๋ STORE ์ฌ์ด์ +์์นํ๊ฒ ๋ฉ๋๋ค. + + CPU 1 CPU 2 + =============================== =============================== + set_current_state(); STORE event_indicated + smp_store_mb(); wake_up(); + STORE current->state <์ฐ๊ธฐ ๋ฐฐ๋ฆฌ์ด> + <๋ฒ์ฉ ๋ฐฐ๋ฆฌ์ด> STORE current->state + LOAD event_indicated + +ํ๋ฒ๋ ๋งํฉ๋๋ค๋ง, ์ด ์ฐ๊ธฐ ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๋ ์ด ์ฝ๋๊ฐ ์ ๋ง๋ก ๋ญ๊ฐ๋ฅผ ๊นจ์ธ ๋์๋ง +์คํ๋ฉ๋๋ค. ์ด๊ฑธ ์ค๋ช
ํ๊ธฐ ์ํด, X ์ Y ๋ ๋ชจ๋ 0 ์ผ๋ก ์ด๊ธฐํ ๋์ด ์๋ค๋ ๊ฐ์ +ํ์ ์๋์ ์ด๋ฒคํธ ์ํ์ค๋ฅผ ์๊ฐํด ๋ด
์๋ค: + + CPU 1 CPU 2 + =============================== =============================== + X = 1; STORE event_indicated + smp_mb(); wake_up(); + Y = 1; wait_event(wq, Y == 1); + wake_up(); load from Y sees 1, no memory barrier + load from X might see 0 + +์ ์์ ์์์ ๊ฒฝ์ฐ์ ๋ฌ๋ฆฌ ๊นจ์ฐ๊ธฐ๊ฐ ์ ๋ง๋ก ํํด์ก๋ค๋ฉด, CPU 2 ์ X ๋ก๋๋ 1 ์ +๋ณธ๋ค๊ณ ๋ณด์ฅ๋ ์ ์์ ๊ฒ๋๋ค. + +์ฌ์ฉ ๊ฐ๋ฅํ ๊นจ์ฐ๊ธฐ๋ฅ ํจ์๋ค๋ก ๋ค์๊ณผ ๊ฐ์ ๊ฒ๋ค์ด ์์ต๋๋ค: + + complete(); + wake_up(); + wake_up_all(); + wake_up_bit(); + wake_up_interruptible(); + wake_up_interruptible_all(); + wake_up_interruptible_nr(); + wake_up_interruptible_poll(); + wake_up_interruptible_sync(); + wake_up_interruptible_sync_poll(); + wake_up_locked(); + wake_up_locked_poll(); + wake_up_nr(); + wake_up_poll(); + wake_up_process(); + + +[!] ์ ์ฌ์ฐ๋ ์ฝ๋์ ๊นจ์ฐ๋ ์ฝ๋์ ๋ดํฌ๋๋ ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๋ค์ ๊นจ์ฐ๊ธฐ ์ ์ +์ด๋ฃจ์ด์ง ์คํ ์ด๋ฅผ ์ ์ฌ์ฐ๋ ์ฝ๋๊ฐ set_current_state() ๋ฅผ ํธ์ถํ ํ์ ํํ๋ +๋ก๋์ ๋ํด ์์๋ฅผ ๋ง์ถ์ง _์๋๋ค๋_ ์ ์ ๊ธฐ์ตํ์ธ์. ์๋ฅผ ๋ค์ด, ์ ์ฌ์ฐ๋ +์ฝ๋๊ฐ ๋ค์๊ณผ ๊ฐ๊ณ : + + set_current_state(TASK_INTERRUPTIBLE); + if (event_indicated) + break; + __set_current_state(TASK_RUNNING); + do_something(my_data); + +๊นจ์ฐ๋ ์ฝ๋๋ ๋ค์๊ณผ ๊ฐ๋ค๋ฉด: + + my_data = value; + event_indicated = 1; + wake_up(&event_wait_queue); + +event_indecated ์์ ๋ณ๊ฒฝ์ด ์ ์ฌ์ฐ๋ ์ฝ๋์๊ฒ my_data ์์ ๋ณ๊ฒฝ ํ์ ์ด๋ฃจ์ด์ง +๊ฒ์ผ๋ก ์ธ์ง๋ ๊ฒ์ด๋ผ๋ ๋ณด์ฅ์ด ์์ต๋๋ค. ์ด๋ฐ ๊ฒฝ์ฐ์๋ ์์ชฝ ์ฝ๋ ๋ชจ๋ ๊ฐ๊ฐ์ +๋ฐ์ดํฐ ์ก์ธ์ค ์ฌ์ด์ ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๋ฅผ ์ง์ ์ณ์ผ ํฉ๋๋ค. ๋ฐ๋ผ์ ์์ ์ฌ์ฐ๋ +์ฝ๋๋ ๋ค์๊ณผ ๊ฐ์ด: + + set_current_state(TASK_INTERRUPTIBLE); + if (event_indicated) { + smp_rmb(); + do_something(my_data); + } + +๊ทธ๋ฆฌ๊ณ ๊นจ์ฐ๋ ์ฝ๋๋ ๋ค์๊ณผ ๊ฐ์ด ๋์ด์ผ ํฉ๋๋ค: + + my_data = value; + smp_wmb(); + event_indicated = 1; + wake_up(&event_wait_queue); + + +๊ทธ์ธ์ ํจ์๋ค +------------- + +๊ทธ์ธ์ ๋ฐฐ๋ฆฌ์ด๋ฅผ ๋ดํฌํ๋ ํจ์๋ค์ ๋ค์๊ณผ ๊ฐ์ต๋๋ค: + + (*) schedule() ๊ณผ ๊ทธ ์ ์ฌํ ๊ฒ๋ค์ด ์์ ํ ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๋ฅผ ๋ดํฌํฉ๋๋ค. + + +============================== +CPU ๊ฐ ACQUIRING ๋ฐฐ๋ฆฌ์ด์ ํจ๊ณผ +============================== + +SMP ์์คํ
์์์ ๋ฝ ๊ธฐ๋ฅ๋ค์ ๋์ฑ ๊ฐ๋ ฅํ ํํ์ ๋ฐฐ๋ฆฌ์ด๋ฅผ ์ ๊ณตํฉ๋๋ค: ์ด +๋ฐฐ๋ฆฌ์ด๋ ๋์ผํ ๋ฝ์ ์ฌ์ฉํ๋ ๋ค๋ฅธ CPU ๋ค์ ๋ฉ๋ชจ๋ฆฌ ์ก์ธ์ค ์์์๋ ์ํฅ์ +๋ผ์นฉ๋๋ค. + + +ACQUIRE VS ๋ฉ๋ชจ๋ฆฌ ์ก์ธ์ค +------------------------ + +๋ค์์ ์๋ฅผ ์๊ฐํด ๋ด
์๋ค: ์์คํ
์ ๋๊ฐ์ ์คํ๋ฝ (M) ๊ณผ (Q), ๊ทธ๋ฆฌ๊ณ ์ธ๊ฐ์ CPU +๋ฅผ ๊ฐ์ง๊ณ ์์ต๋๋ค; ์ฌ๊ธฐ์ ๋ค์์ ์ด๋ฒคํธ ์ํ์ค๊ฐ ๋ฐ์ํฉ๋๋ค: + + CPU 1 CPU 2 + =============================== =============================== + WRITE_ONCE(*A, a); WRITE_ONCE(*E, e); + ACQUIRE M ACQUIRE Q + WRITE_ONCE(*B, b); WRITE_ONCE(*F, f); + WRITE_ONCE(*C, c); WRITE_ONCE(*G, g); + RELEASE M RELEASE Q + WRITE_ONCE(*D, d); WRITE_ONCE(*H, h); + +*A ๋ก์ ์ก์ธ์ค๋ถํฐ *H ๋ก์ ์ก์ธ์ค๊น์ง๊ฐ ์ด๋ค ์์๋ก CPU 3 ์๊ฒ ๋ณด์ฌ์ง์ง์ +๋ํด์๋ ๊ฐ CPU ์์์ ๋ฝ ์ฌ์ฉ์ ์ํด ๋ดํฌ๋์ด ์๋ ์ ์ฝ์ ์ ์ธํ๊ณ ๋ ์ด๋ค +๋ณด์ฅ๋ ์กด์ฌํ์ง ์์ต๋๋ค. ์๋ฅผ ๋ค์ด, CPU 3 ์๊ฒ ๋ค์๊ณผ ๊ฐ์ ์์๋ก ๋ณด์ฌ์ง๋ +๊ฒ์ด ๊ฐ๋ฅํฉ๋๋ค: + + *E, ACQUIRE M, ACQUIRE Q, *G, *C, *F, *A, *B, RELEASE Q, *D, *H, RELEASE M + +ํ์ง๋ง ๋ค์๊ณผ ๊ฐ์ด ๋ณด์ด์ง๋ ์์ ๊ฒ๋๋ค: + + *B, *C or *D preceding ACQUIRE M + *A, *B or *C following RELEASE M + *F, *G or *H preceding ACQUIRE Q + *E, *F or *G following RELEASE Q + + + +ACQUIRE VS I/O ์ก์ธ์ค +---------------------- + +ํน์ ํ (ํนํ NUMA ๊ฐ ๊ด๋ จ๋) ํ๊ฒฝ ํ์์ ๋๊ฐ์ CPU ์์ ๋์ผํ ์คํ๋ฝ์ผ๋ก +๋ณดํธ๋๋ ๋๊ฐ์ ํฌ๋ฆฌํฐ์ปฌ ์น์
์์ I/O ์ก์ธ์ค๋ PCI ๋ธ๋ฆฟ์ง์ ๊ฒน์ณ์ง I/O +์ก์ธ์ค๋ก ๋ณด์ผ ์ ์๋๋ฐ, PCI ๋ธ๋ฆฟ์ง๋ ์บ์ ์ผ๊ด์ฑ ํ๋กํ ์ฝ๊ณผ ํฉ์ ๋ง์ถฐ์ผ ํ +์๋ฌด๊ฐ ์์ผ๋ฏ๋ก, ํ์ํ ์ฝ๊ธฐ ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๊ฐ ์์ฒญ๋์ง ์๊ธฐ ๋๋ฌธ์
๋๋ค. + +์๋ฅผ ๋ค์ด์: + + CPU 1 CPU 2 + =============================== =============================== + spin_lock(Q) + writel(0, ADDR) + writel(1, DATA); + spin_unlock(Q); + spin_lock(Q); + writel(4, ADDR); + writel(5, DATA); + spin_unlock(Q); + +๋ PCI ๋ธ๋ฆฟ์ง์ ๋ค์๊ณผ ๊ฐ์ด ๋ณด์ผ ์ ์์ต๋๋ค: + + STORE *ADDR = 0, STORE *ADDR = 4, STORE *DATA = 1, STORE *DATA = 5 + +์ด๋ ๊ฒ ๋๋ฉด ํ๋์จ์ด์ ์ค๋์์ ์ผ์ผํฌ ์ ์์ต๋๋ค. + + +์ด๋ฐ ๊ฒฝ์ฐ์ ์ก์๋ ์คํ๋ฝ์ ๋ด๋ ค๋๊ธฐ ์ ์ mmiowb() ๋ฅผ ์ํํด์ผ ํ๋๋ฐ, ์๋ฅผ +๋ค๋ฉด ๋ค์๊ณผ ๊ฐ์ต๋๋ค: + + CPU 1 CPU 2 + =============================== =============================== + spin_lock(Q) + writel(0, ADDR) + writel(1, DATA); + mmiowb(); + spin_unlock(Q); + spin_lock(Q); + writel(4, ADDR); + writel(5, DATA); + mmiowb(); + spin_unlock(Q); + +์ด ์ฝ๋๋ CPU 1 ์์ ์์ฒญ๋ ๋๊ฐ์ ์คํ ์ด๊ฐ PCI ๋ธ๋ฆฟ์ง์ CPU 2 ์์ ์์ฒญ๋ +์คํ ์ด๋ค๋ณด๋ค ๋จผ์ ๋ณด์ฌ์ง์ ๋ณด์ฅํฉ๋๋ค. + + +๋ํ, ๊ฐ์ ๋๋ฐ์ด์ค์์ ์คํ ์ด๋ฅผ ์ด์ด ๋ก๋๊ฐ ์ํ๋๋ฉด ์ด ๋ก๋๋ ๋ก๋๊ฐ ์ํ๋๊ธฐ +์ ์ ์คํ ์ด๊ฐ ์๋ฃ๋๊ธฐ๋ฅผ ๊ฐ์ ํ๋ฏ๋ก mmiowb() ์ ํ์๊ฐ ์์ด์ง๋๋ค: + + CPU 1 CPU 2 + =============================== =============================== + spin_lock(Q) + writel(0, ADDR) + a = readl(DATA); + spin_unlock(Q); + spin_lock(Q); + writel(4, ADDR); + b = readl(DATA); + spin_unlock(Q); + + +๋ ๋ง์ ์ ๋ณด๋ฅผ ์ํด์ Documenataion/DocBook/deviceiobook.tmpl ์ ์ฐธ๊ณ ํ์ธ์. + + +========================= +๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๊ฐ ํ์ํ ๊ณณ +========================= + +์ค๋ น SMP ์ปค๋์ ์ฌ์ฉํ๋๋ผ๋ ์ฑ๊ธ ์ฐ๋ ๋๋ก ๋์ํ๋ ์ฝ๋๋ ์ฌ๋ฐ๋ฅด๊ฒ ๋์ํ๋ +๊ฒ์ผ๋ก ๋ณด์ฌ์ง ๊ฒ์ด๊ธฐ ๋๋ฌธ์, ํ๋ฒํ ์์คํ
์ด์์ค์ ๋ฉ๋ชจ๋ฆฌ ์คํผ๋ ์ด์
์ฌ๋ฐฐ์น๋ +์ผ๋ฐ์ ์ผ๋ก ๋ฌธ์ ๊ฐ ๋์ง ์์ต๋๋ค. ํ์ง๋ง, ์ฌ๋ฐฐ์น๊ฐ ๋ฌธ์ ๊ฐ _๋ ์ ์๋_ ๋ค๊ฐ์ง +ํ๊ฒฝ์ด ์์ต๋๋ค: + + (*) ํ๋ก์ธ์๊ฐ ์ํธ ์์ฉ. + + (*) ์ดํ ๋ฏน ์คํผ๋ ์ด์
. + + (*) ๋๋ฐ์ด์ค ์ก์ธ์ค. + + (*) ์ธํฐ๋ฝํธ. + + +ํ๋ก์ธ์๊ฐ ์ํธ ์์ฉ +-------------------- + +๋๊ฐ ์ด์์ ํ๋ก์ธ์๋ฅผ ๊ฐ์ง ์์คํ
์ด ์๋ค๋ฉด, ์์คํ
์ ๋๊ฐ ์ด์์ CPU ๋ ๋์์ +๊ฐ์ ๋ฐ์ดํฐ์ ๋ํ ์์
์ ํ ์ ์์ต๋๋ค. ์ด๋ ๋๊ธฐํ ๋ฌธ์ ๋ฅผ ์ผ์ผํฌ ์ ์๊ณ , +์ด ๋ฌธ์ ๋ฅผ ํด๊ฒฐํ๋ ์ผ๋ฐ์ ๋ฐฉ๋ฒ์ ๋ฝ์ ์ฌ์ฉํ๋ ๊ฒ์
๋๋ค. ํ์ง๋ง, ๋ฝ์ ์๋นํ +๋น์ฉ์ด ๋น์ธ์ ๊ฐ๋ฅํ๋ฉด ๋ฝ์ ์ฌ์ฉํ์ง ์๊ณ ์ผ์ ์ฒ๋ฆฌํ๋ ๊ฒ์ด ๋ซ์ต๋๋ค. ์ด๋ฐ +๊ฒฝ์ฐ, ๋ CPU ๋ชจ๋์ ์ํฅ์ ๋ผ์น๋ ์คํผ๋ ์ด์
๋ค์ ์ค๋์์ ๋ง๊ธฐ ์ํด ์ ์คํ๊ฒ +์์๊ฐ ๋ง์ถฐ์ ธ์ผ ํฉ๋๋ค. + +์๋ฅผ ๋ค์ด, R/W ์ธ๋งํฌ์ด์ ๋๋ฆฐ ์ํ๊ฒฝ๋ก (slow path) ๋ฅผ ์๊ฐํด ๋ด
์๋ค. +์ธ๋งํฌ์ด๋ฅผ ์ํด ๋๊ธฐ๋ฅผ ํ๋ ํ๋์ ํ๋ก์ธ์ค๊ฐ ์์ ์ ์คํ ์ค ์ผ๋ถ๋ฅผ ์ด +์ธ๋งํฌ์ด์ ๋๊ธฐ ํ๋ก์ธ์ค ๋ฆฌ์คํธ์ ๋งํฌํ ์ฑ๋ก ์์ต๋๋ค: + + struct rw_semaphore { + ... + spinlock_t lock; + struct list_head waiters; + }; + + struct rwsem_waiter { + struct list_head list; + struct task_struct *task; + }; + +ํน์ ๋๊ธฐ ์ํ ํ๋ก์ธ์ค๋ฅผ ๊นจ์ฐ๊ธฐ ์ํด, up_read() ๋ up_write() ํจ์๋ ๋ค์๊ณผ +๊ฐ์ ์ผ์ ํฉ๋๋ค: + + (1) ๋ค์ ๋๊ธฐ ์ํ ํ๋ก์ธ์ค ๋ ์ฝ๋๋ ์ด๋์๋์ง ์๊ธฐ ์ํด ์ด ๋๊ธฐ ์ํ + ํ๋ก์ธ์ค ๋ ์ฝ๋์ next ํฌ์ธํฐ๋ฅผ ์ฝ์ต๋๋ค; + + (2) ์ด ๋๊ธฐ ์ํ ํ๋ก์ธ์ค์ task ๊ตฌ์กฐ์ฒด๋ก์ ํฌ์ธํฐ๋ฅผ ์ฝ์ต๋๋ค; + + (3) ์ด ๋๊ธฐ ์ํ ํ๋ก์ธ์ค๊ฐ ์ธ๋งํฌ์ด๋ฅผ ํ๋ํ์์ ์๋ฆฌ๊ธฐ ์ํด task + ํฌ์ธํฐ๋ฅผ ์ด๊ธฐํ ํฉ๋๋ค; + + (4) ํด๋น ํ์คํฌ์ ๋ํด wake_up_process() ๋ฅผ ํธ์ถํฉ๋๋ค; ๊ทธ๋ฆฌ๊ณ + + (5) ํด๋น ๋๊ธฐ ์ํ ํ๋ก์ธ์ค์ task ๊ตฌ์กฐ์ฒด๋ฅผ ์ก๊ณ ์๋ ๋ ํผ๋ฐ์ค๋ฅผ ํด์ ํฉ๋๋ค. + +๋ฌ๋ฆฌ ๋งํ์๋ฉด, ๋ค์ ์ด๋ฒคํธ ์ํ์ค๋ฅผ ์ํํด์ผ ํฉ๋๋ค: + + LOAD waiter->list.next; + LOAD waiter->task; + STORE waiter->task; + CALL wakeup + RELEASE task + +๊ทธ๋ฆฌ๊ณ ์ด ์ด๋ฒคํธ๋ค์ด ๋ค๋ฅธ ์์๋ก ์ํ๋๋ค๋ฉด, ์ค๋์์ด ์ผ์ด๋ ์ ์์ต๋๋ค. + +ํ๋ฒ ์ธ๋งํฌ์ด์ ๋๊ธฐ์ค์ ๋ค์ด๊ฐ๊ณ ์ธ๋งํฌ์ด ๋ฝ์ ๋์๋ค๋ฉด, ํด๋น ๋๊ธฐ ํ๋ก์ธ์ค๋ +๋ฝ์ ๋ค์๋ ์ก์ง ์์ต๋๋ค; ๋์ ์์ ์ task ํฌ์ธํฐ๊ฐ ์ด๊ธฐํ ๋๊ธธ ๊ธฐ๋ค๋ฆฝ๋๋ค. +๊ทธ ๋ ์ฝ๋๋ ๋๊ธฐ ํ๋ก์ธ์ค์ ์คํ์ ์๊ธฐ ๋๋ฌธ์, ๋ฆฌ์คํธ์ next ํฌ์ธํฐ๊ฐ ์ฝํ์ง๊ธฐ +_์ ์_ task ํฌ์ธํฐ๊ฐ ์ง์์ง๋ค๋ฉด, ๋ค๋ฅธ CPU ๋ ํด๋น ๋๊ธฐ ํ๋ก์ธ์ค๋ฅผ ์์ํด ๋ฒ๋ฆฌ๊ณ +up*() ํจ์๊ฐ next ํฌ์ธํฐ๋ฅผ ์ฝ๊ธฐ ์ ์ ๋๊ธฐ ํ๋ก์ธ์ค์ ์คํ์ ๋ง๊ตฌ ๊ฑด๋๋ฆด ์ +์์ต๋๋ค. + +๊ทธ๋ ๊ฒ ๋๋ฉด ์์ ์ด๋ฒคํธ ์ํ์ค์ ์ด๋ค ์ผ์ด ์ผ์ด๋๋์ง ์๊ฐํด ๋ณด์ฃ : + + CPU 1 CPU 2 + =============================== =============================== + down_xxx() + Queue waiter + Sleep + up_yyy() + LOAD waiter->task; + STORE waiter->task; + Woken up by other event + <preempt> + Resume processing + down_xxx() returns + call foo() + foo() clobbers *waiter + </preempt> + LOAD waiter->list.next; + --- OOPS --- + +์ด ๋ฌธ์ ๋ ์ธ๋งํฌ์ด ๋ฝ์ ์ฌ์ฉ์ผ๋ก ํด๊ฒฐ๋ ์๋ ์๊ฒ ์ง๋ง, ๊ทธ๋ ๊ฒ ๋๋ฉด ๊นจ์ด๋ ํ์ +down_xxx() ํจ์๊ฐ ๋ถํ์ํ๊ฒ ์คํ๋ฝ์ ๋๋ค์ ์ป์ด์ผ๋ง ํฉ๋๋ค. + +์ด ๋ฌธ์ ๋ฅผ ํด๊ฒฐํ๋ ๋ฐฉ๋ฒ์ ๋ฒ์ฉ SMP ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๋ฅผ ์ถ๊ฐํ๋ ๊ฒ๋๋ค: + + LOAD waiter->list.next; + LOAD waiter->task; + smp_mb(); + STORE waiter->task; + CALL wakeup + RELEASE task + +์ด ๊ฒฝ์ฐ์, ๋ฐฐ๋ฆฌ์ด๋ ์์คํ
์ ๋๋จธ์ง CPU ๋ค์๊ฒ ๋ชจ๋ ๋ฐฐ๋ฆฌ์ด ์์ ๋ฉ๋ชจ๋ฆฌ ์ก์ธ์ค๊ฐ +๋ฐฐ๋ฆฌ์ด ๋ค์ ๋ฉ๋ชจ๋ฆฌ ์ก์ธ์ค๋ณด๋ค ์์ ์ผ์ด๋ ๊ฒ์ผ๋ก ๋ณด์ด๊ฒ ๋ง๋ญ๋๋ค. ๋ฐฐ๋ฆฌ์ด ์์ +๋ฉ๋ชจ๋ฆฌ ์ก์ธ์ค๋ค์ด ๋ฐฐ๋ฆฌ์ด ๋ช
๋ น ์์ฒด๊ฐ ์๋ฃ๋๋ ์์ ๊น์ง ์๋ฃ๋๋ค๊ณ ๋ ๋ณด์ฅํ์ง +_์์ต๋๋ค_. + +(์ด๊ฒ ๋ฌธ์ ๊ฐ ๋์ง ์์) ๋จ์ผ ํ๋ก์ธ์ ์์คํ
์์ smp_mb() ๋ ์ค์ ๋ก๋ ๊ทธ์ +์ปดํ์ผ๋ฌ๊ฐ CPU ์์์์ ์์๋ฅผ ๋ฐ๊พธ๊ฑฐ๋ ํ์ง ์๊ณ ์ฃผ์ด์ง ์์๋๋ก ๋ช
๋ น์ +๋ด๋ฆฌ๋๋ก ํ๋ ์ปดํ์ผ๋ฌ ๋ฐฐ๋ฆฌ์ด์ผ ๋ฟ์
๋๋ค. ์ค์ง ํ๋์ CPU ๋ง ์์ผ๋, CPU ์ +์์กด์ฑ ์์ ๋ก์ง์ด ๊ทธ ์ธ์ ๋ชจ๋ ๊ฒ์ ์์์ ์ฒ๋ฆฌํ ๊ฒ๋๋ค. + + +์ดํ ๋ฏน ์คํผ๋ ์ด์
+----------------- + +์ดํ ๋ฏน ์คํผ๋ ์ด์
์ ๊ธฐ์ ์ ์ผ๋ก ํ๋ก์ธ์๊ฐ ์ํธ์์ฉ์ผ๋ก ๋ถ๋ฅ๋๋ฉฐ ๊ทธ ์ค ์ผ๋ถ๋ +์ ์ฒด ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๋ฅผ ๋ดํฌํ๊ณ ๋ ์ผ๋ถ๋ ๋ดํฌํ์ง ์์ง๋ง, ์ปค๋์์ ์๋นํ +์์กด์ ์ผ๋ก ์ฌ์ฉํ๋ ๊ธฐ๋ฅ ์ค ํ๋์
๋๋ค. + +๋ฉ๋ชจ๋ฆฌ์ ์ด๋ค ์ํ๋ฅผ ์์ ํ๊ณ ํด๋น ์ํ์ ๋ํ (์์ ์ ๋๋ ์ต์ ์) ์ ๋ณด๋ฅผ +๋ฆฌํดํ๋ ์ดํ ๋ฏน ์คํผ๋ ์ด์
์ ๋ชจ๋ SMP-์กฐ๊ฑด์ ๋ฒ์ฉ ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด(smp_mb())๋ฅผ +์ค์ ์คํผ๋ ์ด์
์ ์๊ณผ ๋ค์ ๋ดํฌํฉ๋๋ค. ์ด๋ฐ ์คํผ๋ ์ด์
์ ๋ค์์ ๊ฒ๋ค์ +ํฌํจํฉ๋๋ค: + + xchg(); + atomic_xchg(); atomic_long_xchg(); + atomic_inc_return(); atomic_long_inc_return(); + atomic_dec_return(); atomic_long_dec_return(); + atomic_add_return(); atomic_long_add_return(); + atomic_sub_return(); atomic_long_sub_return(); + atomic_inc_and_test(); atomic_long_inc_and_test(); + atomic_dec_and_test(); atomic_long_dec_and_test(); + atomic_sub_and_test(); atomic_long_sub_and_test(); + atomic_add_negative(); atomic_long_add_negative(); + test_and_set_bit(); + test_and_clear_bit(); + test_and_change_bit(); + + /* exchange ์กฐ๊ฑด์ด ์ฑ๊ณตํ ๋ */ + cmpxchg(); + atomic_cmpxchg(); atomic_long_cmpxchg(); + atomic_add_unless(); atomic_long_add_unless(); + +์ด๊ฒ๋ค์ ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด ํจ๊ณผ๊ฐ ํ์ํ ACQUIRE ๋ถ๋ฅ์ RELEASE ๋ถ๋ฅ ์คํผ๋ ์ด์
๋ค์ +๊ตฌํํ ๋, ๊ทธ๋ฆฌ๊ณ ๊ฐ์ฒด ํด์ ๋ฅผ ์ํด ๋ ํผ๋ฐ์ค ์นด์ดํฐ๋ฅผ ์กฐ์ ํ ๋, ์๋ฌต์ ๋ฉ๋ชจ๋ฆฌ +๋ฐฐ๋ฆฌ์ด ํจ๊ณผ๊ฐ ํ์ํ ๊ณณ ๋ฑ์ ์ฌ์ฉ๋ฉ๋๋ค. + + +๋ค์์ ์คํผ๋ ์ด์
๋ค์ ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๋ฅผ ๋ดํฌํ์ง _์๊ธฐ_ ๋๋ฌธ์ ๋ฌธ์ ๊ฐ ๋ ์ +์์ง๋ง, RELEASE ๋ถ๋ฅ์ ์คํผ๋ ์ด์
๋ค๊ณผ ๊ฐ์ ๊ฒ๋ค์ ๊ตฌํํ ๋ ์ฌ์ฉ๋ ์๋ +์์ต๋๋ค: + + atomic_set(); + set_bit(); + clear_bit(); + change_bit(); + +์ด๊ฒ๋ค์ ์ฌ์ฉํ ๋์๋ ํ์ํ๋ค๋ฉด ์ ์ ํ (์๋ฅผ ๋ค๋ฉด smp_mb__before_atomic() +๊ฐ์) ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๊ฐ ๋ช
์์ ์ผ๋ก ํจ๊ป ์ฌ์ฉ๋์ด์ผ ํฉ๋๋ค. + + +์๋์ ๊ฒ๋ค๋ ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๋ฅผ ๋ดํฌํ์ง _์๊ธฐ_ ๋๋ฌธ์, ์ผ๋ถ ํ๊ฒฝ์์๋ (์๋ฅผ +๋ค๋ฉด smp_mb__before_atomic() ๊ณผ ๊ฐ์) ๋ช
์์ ์ธ ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด ์ฌ์ฉ์ด ํ์ํฉ๋๋ค. + + atomic_add(); + atomic_sub(); + atomic_inc(); + atomic_dec(); + +์ด๊ฒ๋ค์ด ํต๊ณ ์์ฑ์ ์ํด ์ฌ์ฉ๋๋ค๋ฉด, ๊ทธ๋ฆฌ๊ณ ํต๊ณ ๋ฐ์ดํฐ ์ฌ์ด์ ๊ด๊ณ๊ฐ ์กด์ฌํ์ง +์๋๋ค๋ฉด ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๋ ํ์์น ์์ ๊ฒ๋๋ค. + +๊ฐ์ฒด์ ์๋ช
์ ๊ด๋ฆฌํ๊ธฐ ์ํด ๋ ํผ๋ฐ์ค ์นด์ดํ
๋ชฉ์ ์ผ๋ก ์ฌ์ฉ๋๋ค๋ฉด, ๋ ํผ๋ฐ์ค +์นด์ดํฐ๋ ๋ฝ์ผ๋ก ๋ณดํธ๋๋ ์น์
์์๋ง ์กฐ์ ๋๊ฑฐ๋ ํธ์ถํ๋ ์ชฝ์ด ์ด๋ฏธ ์ถฉ๋ถํ +๋ ํผ๋ฐ์ค๋ฅผ ์ก๊ณ ์์ ๊ฒ์ด๊ธฐ ๋๋ฌธ์ ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๋ ์๋ง ํ์ ์์ ๊ฒ๋๋ค. + +๋ง์ฝ ์ด๋ค ๋ฝ์ ๊ตฌ์ฑํ๊ธฐ ์ํด ์ฌ์ฉ๋๋ค๋ฉด, ๋ฝ ๊ด๋ จ ๋์์ ์ผ๋ฐ์ ์ผ๋ก ์์
์ ํน์ +์์๋๋ก ์งํํด์ผ ํ๋ฏ๋ก ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๊ฐ ํ์ํ ์ ์์ต๋๋ค. + +๊ธฐ๋ณธ์ ์ผ๋ก, ๊ฐ ์ฌ์ฉ์ฒ์์๋ ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๊ฐ ํ์ํ์ง ์๋์ง ์ถฉ๋ถํ ๊ณ ๋ คํด์ผ +ํฉ๋๋ค. + +์๋์ ์คํผ๋ ์ด์
๋ค์ ํน๋ณํ ๋ฝ ๊ด๋ จ ๋์๋ค์
๋๋ค: + + test_and_set_bit_lock(); + clear_bit_unlock(); + __clear_bit_unlock(); + +์ด๊ฒ๋ค์ ACQUIRE ๋ฅ์ RELEASE ๋ฅ์ ์คํผ๋ ์ด์
๋ค์ ๊ตฌํํฉ๋๋ค. ๋ฝ ๊ด๋ จ ๋๊ตฌ๋ฅผ +๊ตฌํํ ๋์๋ ์ด๊ฒ๋ค์ ์ข ๋ ์ ํธํ๋ ํธ์ด ๋์๋ฐ, ์ด๊ฒ๋ค์ ๊ตฌํ์ ๋ง์ +์ํคํ
์ณ์์ ์ต์ ํ ๋ ์ ์๊ธฐ ๋๋ฌธ์
๋๋ค. + +[!] ์ด๋ฐ ์ํฉ์ ์ฌ์ฉํ ์ ์๋ ํน์ํ ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด ๋๊ตฌ๋ค์ด ์์ต๋๋ค๋ง, ์ผ๋ถ +CPU ์์๋ ์ฌ์ฉ๋๋ ์ดํ ๋ฏน ์ธ์คํธ๋ญ์
์์ฒด์ ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๊ฐ ๋ดํฌ๋์ด ์์ด์ +์ดํ ๋ฏน ์คํผ๋ ์ด์
๊ณผ ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๋ฅผ ํจ๊ป ์ฌ์ฉํ๋ ๊ฒ ๋ถํ์ํ ์ผ์ด ๋ ์ +์๋๋ฐ, ๊ทธ๋ฐ ๊ฒฝ์ฐ์ ์ด ํน์ ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด ๋๊ตฌ๋ค์ no-op ์ด ๋์ด ์ค์ง์ ์ผ๋ก +์๋ฌด์ผ๋ ํ์ง ์์ต๋๋ค. + +๋ ๋ง์ ๋ด์ฉ์ ์ํด์ Documentation/atomic_ops.txt ๋ฅผ ์ฐธ๊ณ ํ์ธ์. + + +๋๋ฐ์ด์ค ์ก์ธ์ค +--------------- + +๋ง์ ๋๋ฐ์ด์ค๊ฐ ๋ฉ๋ชจ๋ฆฌ ๋งคํ ๊ธฐ๋ฒ์ผ๋ก ์ ์ด๋ ์ ์๋๋ฐ, ๊ทธ๋ ๊ฒ ์ ์ด๋๋ +๋๋ฐ์ด์ค๋ CPU ์๋ ๋จ์ง ํน์ ๋ฉ๋ชจ๋ฆฌ ์์ญ์ ์งํฉ์ฒ๋ผ ๋ณด์ด๊ฒ ๋ฉ๋๋ค. ๋๋ผ์ด๋ฒ๋ +๊ทธ๋ฐ ๋๋ฐ์ด์ค๋ฅผ ์ ์ดํ๊ธฐ ์ํด ์ ํํ ์ฌ๋ฐ๋ฅธ ์์๋ก ์ฌ๋ฐ๋ฅธ ๋ฉ๋ชจ๋ฆฌ ์ก์ธ์ค๋ฅผ +๋ง๋ค์ด์ผ ํฉ๋๋ค. + +ํ์ง๋ง, ์ก์ธ์ค๋ค์ ์ฌ๋ฐฐ์น ํ๊ฑฐ๋ ์กฐํฉํ๊ฑฐ๋ ๋ณํฉํ๋๊ฒ ๋ ํจ์จ์ ์ด๋ผ ํ๋จํ๋ +์๋ฆฌํ CPU ๋ ์ปดํ์ผ๋ฌ๋ค์ ์ฌ์ฉํ๋ฉด ๋๋ผ์ด๋ฒ ์ฝ๋์ ์กฐ์ฌ์ค๋ฝ๊ฒ ์์ ๋ง์ถฐ์ง +์ก์ธ์ค๋ค์ด ๋๋ฐ์ด์ค์๋ ์์ฒญ๋ ์์๋๋ก ๋์ฐฉํ์ง ๋ชปํ๊ฒ ํ ์ ์๋ - ๋๋ฐ์ด์ค๊ฐ +์ค๋์์ ํ๊ฒ ํ - ์ ์ฌ์ ๋ฌธ์ ๊ฐ ์๊ธธ ์ ์์ต๋๋ค. + +๋ฆฌ๋
์ค ์ปค๋ ๋ด๋ถ์์, I/O ๋ ์ด๋ป๊ฒ ์ก์ธ์ค๋ค์ ์ ์ ํ ์์ฐจ์ ์ด๊ฒ ๋ง๋ค ์ ์๋์ง +์๊ณ ์๋, - inb() ๋ writel() ๊ณผ ๊ฐ์ - ์ ์ ํ ์ก์ธ์ค ๋ฃจํด์ ํตํด ์ด๋ฃจ์ด์ ธ์ผ๋ง +ํฉ๋๋ค. ์ด๊ฒ๋ค์ ๋๋ถ๋ถ์ ๊ฒฝ์ฐ์๋ ๋ช
์์ ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด ์ ํจ๊ป ์ฌ์ฉ๋ ํ์๊ฐ +์์ต๋๋ค๋ง, ๋ค์์ ๋๊ฐ์ง ์ํฉ์์๋ ๋ช
์์ ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๊ฐ ํ์ํ ์ ์์ต๋๋ค: + + (1) ์ผ๋ถ ์์คํ
์์ I/O ์คํ ์ด๋ ๋ชจ๋ CPU ์ ์ผ๊ด๋๊ฒ ์์ ๋ง์ถฐ์ง์ง ์๋๋ฐ, + ๋ฐ๋ผ์ _๋ชจ๋ _ ์ผ๋ฐ์ ์ธ ๋๋ผ์ด๋ฒ๋ค์ ๋ฝ์ด ์ฌ์ฉ๋์ด์ผ๋ง ํ๊ณ ์ด ํฌ๋ฆฌํฐ์ปฌ + ์น์
์ ๋น ์ ธ๋์ค๊ธฐ ์ ์ mmiowb() ๊ฐ ๊ผญ ํธ์ถ๋์ด์ผ ํฉ๋๋ค. + + (2) ๋ง์ฝ ์ก์ธ์ค ํจ์๋ค์ด ์ํ๋ ๋ฉ๋ชจ๋ฆฌ ์ก์ธ์ค ์์ฑ์ ๊ฐ๋ I/O ๋ฉ๋ชจ๋ฆฌ ์๋์ฐ๋ฅผ + ์ฌ์ฉํ๋ค๋ฉด, ์์๋ฅผ ๊ฐ์ ํ๊ธฐ ์ํด์ _mandatory_ ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๊ฐ ํ์ํฉ๋๋ค. + +๋ ๋ง์ ์ ๋ณด๋ฅผ ์ํด์ Documentation/DocBook/deviceiobook.tmpl ์ ์ฐธ๊ณ ํ์ญ์์ค. + + +์ธํฐ๋ฝํธ +-------- + +๋๋ผ์ด๋ฒ๋ ์์ ์ ์ธํฐ๋ฝํธ ์๋น์ค ๋ฃจํด์ ์ํด ์ธํฐ๋ฝํธ ๋นํ ์ ์๊ธฐ ๋๋ฌธ์ +๋๋ผ์ด๋ฒ์ ์ด ๋ ๋ถ๋ถ์ ์๋ก์ ๋๋ฐ์ด์ค ์ ์ด ๋๋ ์ก์ธ์ค ๋ถ๋ถ๊ณผ ์ํธ ๊ฐ์ญํ ์ +์์ต๋๋ค. + +์ค์ค๋ก์๊ฒ ์ธํฐ๋ฝํธ ๋นํ๋ ๊ฑธ ๋ถ๊ฐ๋ฅํ๊ฒ ํ๊ณ , ๋๋ผ์ด๋ฒ์ ํฌ๋ฆฌํฐ์ปฌํ +์คํผ๋ ์ด์
๋ค์ ๋ชจ๋ ์ธํฐ๋ฝํธ๊ฐ ๋ถ๊ฐ๋ฅํ๊ฒ ๋ ์์ญ์ ์ง์ด๋ฃ๊ฑฐ๋ ํ๋ ๋ฐฉ๋ฒ (๋ฝ์ +ํ ํํ) ์ผ๋ก ์ด๋ฐ ์ํธ ๊ฐ์ญ์ - ์ต์ํ ๋ถ๋ถ์ ์ผ๋ก๋ผ๋ - ์ค์ผ ์ ์์ต๋๋ค. +๋๋ผ์ด๋ฒ์ ์ธํฐ๋ฝํธ ๋ฃจํด์ด ์คํ ์ค์ธ ๋์, ํด๋น ๋๋ผ์ด๋ฒ์ ์ฝ์ด๋ ๊ฐ์ CPU ์์ +์ํ๋์ง ์์ ๊ฒ์ด๋ฉฐ, ํ์ฌ์ ์ธํฐ๋ฝํธ๊ฐ ์ฒ๋ฆฌ๋๋ ์ค์๋ ๋๋ค์ ์ธํฐ๋ฝํธ๊ฐ +์ผ์ด๋์ง ๋ชปํ๋๋ก ๋์ด ์์ผ๋ ์ธํฐ๋ฝํธ ํธ๋ค๋ฌ๋ ๊ทธ์ ๋ํด์๋ ๋ฝ์ ์ก์ง ์์๋ +๋ฉ๋๋ค. + +ํ์ง๋ง, ์ด๋๋ ์ค ๋ ์ง์คํฐ์ ๋ฐ์ดํฐ ๋ ์ง์คํฐ๋ฅผ ๊ฐ๋ ์ด๋๋ท ์นด๋๋ฅผ ๋ค๋ฃจ๋ +๋๋ผ์ด๋ฒ๋ฅผ ์๊ฐํด ๋ด
์๋ค. ๋ง์ฝ ์ด ๋๋ผ์ด๋ฒ์ ์ฝ์ด๊ฐ ์ธํฐ๋ฝํธ๋ฅผ ๋นํ์ฑํ์ํจ +์ฑ๋ก ์ด๋๋ท ์นด๋์ ๋ํํ๊ณ ๋๋ผ์ด๋ฒ์ ์ธํฐ๋ฝํธ ํธ๋ค๋ฌ๊ฐ ํธ์ถ๋์๋ค๋ฉด: + + LOCAL IRQ DISABLE + writew(ADDR, 3); + writew(DATA, y); + LOCAL IRQ ENABLE + <interrupt> + writew(ADDR, 4); + q = readw(DATA); + </interrupt> + +๋ง์ฝ ์์ ๊ท์น์ด ์ถฉ๋ถํ ์ํ๋์ด ์๋ค๋ฉด ๋ฐ์ดํฐ ๋ ์ง์คํฐ์์ ์คํ ์ด๋ ์ด๋๋ ์ค +๋ ์ง์คํฐ์ ๋๋ฒ์งธ๋ก ํํด์ง๋ ์คํ ์ด ๋ค์ ์ผ์ด๋ ์๋ ์์ต๋๋ค: + + STORE *ADDR = 3, STORE *ADDR = 4, STORE *DATA = y, q = LOAD *DATA + + +๋ง์ฝ ์์ ๊ท์น์ด ์ถฉ๋ถํ ์ํ๋์ด ์๊ณ ๋ฌต์์ ์ผ๋ก๋ ๋ช
์์ ์ผ๋ก๋ ๋ฐฐ๋ฆฌ์ด๊ฐ +์ฌ์ฉ๋์ง ์์๋ค๋ฉด ์ธํฐ๋ฝํธ ๋นํ์ฑํ ์น์
์์ ์ผ์ด๋ ์ก์ธ์ค๊ฐ ๋ฐ๊นฅ์ผ๋ก ์์ด์ +์ธํฐ๋ฝํธ ๋ด์์ ์ผ์ด๋ ์ก์ธ์ค์ ์์ผ ์ ์๋ค๊ณ - ๊ทธ๋ฆฌ๊ณ ๊ทธ ๋ฐ๋๋ - ๊ฐ์ ํด์ผ๋ง +ํฉ๋๋ค. + +๊ทธ๋ฐ ์์ญ ์์์ ์ผ์ด๋๋ I/O ์ก์ธ์ค๋ค์ ์๊ฒฉํ ์์ ๊ท์น์ I/O ๋ ์ง์คํฐ์ +๋ฌต์์ I/O ๋ฐฐ๋ฆฌ์ด๋ฅผ ํ์ฑํ๋ ๋๊ธฐ์ (synchronous) ๋ก๋ ์คํผ๋ ์ด์
์ ํฌํจํ๊ธฐ +๋๋ฌธ์ ์ผ๋ฐ์ ์ผ๋ก๋ ์ด๋ฐ๊ฒ ๋ฌธ์ ๊ฐ ๋์ง ์์ต๋๋ค. ๋ง์ฝ ์ด๊ฑธ๋ก๋ ์ถฉ๋ถ์น ์๋ค๋ฉด +mmiowb() ๊ฐ ๋ช
์์ ์ผ๋ก ์ฌ์ฉ๋ ํ์๊ฐ ์์ต๋๋ค. + + +ํ๋์ ์ธํฐ๋ฝํธ ๋ฃจํด๊ณผ ๋ณ๋์ CPU ์์ ์ํ์ค์ด๋ฉฐ ์๋ก ํต์ ์ ํ๋ ๋ ๋ฃจํด +์ฌ์ด์๋ ๋น์ทํ ์ํฉ์ด ์ผ์ด๋ ์ ์์ต๋๋ค. ๋ง์ฝ ๊ทธ๋ฐ ๊ฒฝ์ฐ๊ฐ ๋ฐ์ํ ๊ฐ๋ฅ์ฑ์ด +์๋ค๋ฉด, ์์๋ฅผ ๋ณด์ฅํ๊ธฐ ์ํด ์ธํฐ๋ฝํธ ๋นํ์ฑํ ๋ฝ์ด ์ฌ์ฉ๋์ด์ ธ์ผ๋ง ํฉ๋๋ค. + + +====================== +์ปค๋ I/O ๋ฐฐ๋ฆฌ์ด์ ํจ๊ณผ +====================== + +I/O ๋ฉ๋ชจ๋ฆฌ์ ์ก์ธ์คํ ๋, ๋๋ผ์ด๋ฒ๋ ์ ์ ํ ์ก์ธ์ค ํจ์๋ฅผ ์ฌ์ฉํด์ผ ํฉ๋๋ค: + + (*) inX(), outX(): + + ์ด๊ฒ๋ค์ ๋ฉ๋ชจ๋ฆฌ ๊ณต๊ฐ๋ณด๋ค๋ I/O ๊ณต๊ฐ์ ์ด์ผ๊ธฐ๋ฅผ ํ๋ ค๋ ์๋๋ก + ๋ง๋ค์ด์ก์ต๋๋ค๋ง, ๊ทธ๊ฑด ๊ธฐ๋ณธ์ ์ผ๋ก CPU ๋ง๋ค ๋ค๋ฅธ ์ปจ์
์
๋๋ค. i386 ๊ณผ + x86_64 ํ๋ก์ธ์๋ค์ ํน๋ณํ I/O ๊ณต๊ฐ ์ก์ธ์ค ์ฌ์ดํด๊ณผ ๋ช
๋ น์ด๋ฅผ ์ค์ ๋ก ๊ฐ์ง๊ณ + ์์ง๋ง, ๋ค๋ฅธ ๋ง์ CPU ๋ค์๋ ๊ทธ๋ฐ ์ปจ์
์ด ์กด์ฌํ์ง ์์ต๋๋ค. + + ๋ค๋ฅธ ๊ฒ๋ค ์ค์์๋ PCI ๋ฒ์ค๊ฐ I/O ๊ณต๊ฐ ์ปจ์
์ ์ ์ํ๋๋ฐ, ์ด๋ - i386 ๊ณผ + x86_64 ๊ฐ์ CPU ์์ - CPU ์ I/O ๊ณต๊ฐ ์ปจ์
์ผ๋ก ์ฝ๊ฒ ๋งค์น๋ฉ๋๋ค. ํ์ง๋ง, + ๋์ฒดํ I/O ๊ณต๊ฐ์ด ์๋ CPU ์์๋ CPU ์ ๋ฉ๋ชจ๋ฆฌ ๋งต์ ๊ฐ์ I/O ๊ณต๊ฐ์ผ๋ก + ๋งคํ๋ ์๋ ์์ต๋๋ค. + + ์ด ๊ณต๊ฐ์ผ๋ก์ ์ก์ธ์ค๋ (i386 ๋ฑ์์๋) ์์ ํ๊ฒ ๋๊ธฐํ ๋ฉ๋๋ค๋ง, ์ค๊ฐ์ + (PCI ํธ์คํธ ๋ธ๋ฆฌ์ง์ ๊ฐ์) ๋ธ๋ฆฌ์ง๋ค์ ์ด๋ฅผ ์์ ํ ๋ณด์ฅํ์ง ์์์๋ + ์์ต๋๋ค. + + ์ด๊ฒ๋ค์ ์ํธ๊ฐ์ ์์๋ ์์ ํ๊ฒ ๋ณด์ฅ๋ฉ๋๋ค. + + ๋ค๋ฅธ ํ์
์ ๋ฉ๋ชจ๋ฆฌ ์คํผ๋ ์ด์
, I/O ์คํผ๋ ์ด์
์ ๋ํ ์์๋ ์์ ํ๊ฒ + ๋ณด์ฅ๋์ง๋ ์์ต๋๋ค. + + (*) readX(), writeX(): + + ์ด๊ฒ๋ค์ด ์ํ ์์ฒญ๋๋ CPU ์์ ์๋ก์๊ฒ ์์ ํ ์์๊ฐ ๋ง์ถฐ์ง๊ณ ๋
๋ฆฝ์ ์ผ๋ก + ์ํ๋๋์ง์ ๋ํ ๋ณด์ฅ ์ฌ๋ถ๋ ์ด๋ค์ด ์ก์ธ์ค ํ๋ ๋ฉ๋ชจ๋ฆฌ ์๋์ฐ์ ์ ์๋ + ํน์ฑ์ ์ํด ๊ฒฐ์ ๋ฉ๋๋ค. ์๋ฅผ ๋ค์ด, ์ต์ ์ i386 ์ํคํ
์ณ ๋จธ์ ์์๋ MTRR + ๋ ์ง์คํฐ๋ก ์ด ํน์ฑ์ด ์กฐ์ ๋ฉ๋๋ค. + + ์ผ๋ฐ์ ์ผ๋ก๋, ํ๋ฆฌํ์น (prefetch) ๊ฐ๋ฅํ ๋๋ฐ์ด์ค๋ฅผ ์ก์ธ์ค ํ๋๊ฒ + ์๋๋ผ๋ฉด, ์ด๊ฒ๋ค์ ์์ ํ ์์๊ฐ ๋ง์ถฐ์ง๊ณ ๊ฒฐํฉ๋์ง ์๊ฒ ๋ณด์ฅ๋ ๊ฒ๋๋ค. + + ํ์ง๋ง, (PCI ๋ธ๋ฆฌ์ง์ ๊ฐ์) ์ค๊ฐ์ ํ๋์จ์ด๋ ์์ ์ด ์ํ๋ค๋ฉด ์งํ์ + ์ฐ๊ธฐ์ํฌ ์ ์์ต๋๋ค; ์คํ ์ด ๋ช
๋ น์ ์ค์ ๋ก ํ๋์จ์ด๋ก ๋ด๋ ค๋ณด๋ด๊ธฐ(flush) + ์ํด์๋ ๊ฐ์ ์์น๋ก๋ถํฐ ๋ก๋๋ฅผ ํ๋ ๋ฐฉ๋ฒ์ด ์์ต๋๋ค๋ง[*], PCI ์ ๊ฒฝ์ฐ๋ + ๊ฐ์ ๋๋ฐ์ด์ค๋ ํ๊ฒฝ ๊ตฌ์ฑ ์์ญ์์์ ๋ก๋๋ง์ผ๋ก๋ ์ถฉ๋ถํ ๊ฒ๋๋ค. + + [*] ์ฃผ์! ์ฐ์ฌ์ง ๊ฒ๊ณผ ๊ฐ์ ์์น๋ก๋ถํฐ์ ๋ก๋๋ฅผ ์๋ํ๋ ๊ฒ์ ์ค๋์์ + ์ผ์ผํฌ ์๋ ์์ต๋๋ค - ์๋ก 16650 Rx/Tx ์๋ฆฌ์ผ ๋ ์ง์คํฐ๋ฅผ ์๊ฐํด + ๋ณด์ธ์. + + ํ๋ฆฌํ์น ๊ฐ๋ฅํ I/O ๋ฉ๋ชจ๋ฆฌ๊ฐ ์ฌ์ฉ๋๋ฉด, ์คํ ์ด ๋ช
๋ น๋ค์ด ์์๋ฅผ ์งํค๋๋ก + ํ๊ธฐ ์ํด mmiowb() ๋ฐฐ๋ฆฌ์ด๊ฐ ํ์ํ ์ ์์ต๋๋ค. + + PCI ํธ๋์ญ์
์ฌ์ด์ ์ํธ์์ฉ์ ๋ํด ๋ ๋ง์ ์ ๋ณด๋ฅผ ์ํด์ PCI ๋ช
์ธ์๋ฅผ + ์ฐธ๊ณ ํ์๊ธฐ ๋ฐ๋๋๋ค. + + (*) readX_relaxed(), writeX_relaxed() + + ์ด๊ฒ๋ค์ readX() ์ writeX() ๋ ๋น์ทํ์ง๋ง, ๋ ์ํ๋ ๋ฉ๋ชจ๋ฆฌ ์์ ๋ณด์ฅ์ + ์ ๊ณตํฉ๋๋ค. ๊ตฌ์ฒด์ ์ผ๋ก, ์ด๊ฒ๋ค์ ์ผ๋ฐ์ ๋ฉ๋ชจ๋ฆฌ ์ก์ธ์ค (์: DMA ๋ฒํผ) ์๋ + LOCK ์ด๋ UNLOCK ์คํผ๋ ์ด์
๋ค์๋ ์์๋ฅผ ๋ณด์ฅํ์ง ์์ต๋๋ค. LOCK ์ด๋ + UNLOCK ์คํผ๋ ์ด์
๋ค์ ๋ง์ถฐ์ง๋ ์์๊ฐ ํ์ํ๋ค๋ฉด, mmiowb() ๋ฐฐ๋ฆฌ์ด๊ฐ ์ฌ์ฉ๋ + ์ ์์ต๋๋ค. ๊ฐ์ ์ฃผ๋ณ ์ฅ์น์์ ์ํ๋ ์ก์ธ์ค๋ผ๋ฆฌ๋ ์์๊ฐ ์ง์ผ์ง์ ์์ + ๋์๊ธฐ ๋ฐ๋๋๋ค. + + (*) ioreadX(), iowriteX() + + ์ด๊ฒ๋ค์ inX()/outX() ๋ readX()/writeX() ์ฒ๋ผ ์ค์ ๋ก ์ํํ๋ ์ก์ธ์ค์ + ์ข
๋ฅ์ ๋ฐ๋ผ ์ ์ ํ๊ฒ ์ํ๋ ๊ฒ์
๋๋ค. + + +=================================== +๊ฐ์ ๋๋ ๊ฐ์ฅ ์ํ๋ ์คํ ์์ ๋ชจ๋ธ +=================================== + +์ปจ์
์ ์ผ๋ก CPU ๋ ์ฃผ์ด์ง ํ๋ก๊ทธ๋จ์ ๋ํด ํ๋ก๊ทธ๋จ ๊ทธ ์์ฒด์๋ ์ธ๊ณผ์ฑ (program +causality) ์ ์งํค๋ ๊ฒ์ฒ๋ผ ๋ณด์ด๊ฒ ํ์ง๋ง ์ผ๋ฐ์ ์ผ๋ก๋ ์์๋ฅผ ๊ฑฐ์ ์ง์ผ์ฃผ์ง +์๋๋ค๊ณ ๊ฐ์ ๋์ด์ผ๋ง ํฉ๋๋ค. (i386 ์ด๋ x86_64 ๊ฐ์) ์ผ๋ถ CPU ๋ค์ ์ฝ๋ +์ฌ๋ฐฐ์น์ (powerpc ๋ frv ์ ๊ฐ์) ๋ค๋ฅธ ๊ฒ๋ค์ ๋นํด ๊ฐํ ์ ์ฝ์ ๊ฐ์ง๋ง, ์ํคํ
์ณ +์ข
์์ ์ฝ๋ ์ด์ธ์ ์ฝ๋์์๋ ์์์ ๋ํ ์ ์ฝ์ด ๊ฐ์ฅ ์ํ๋ ๊ฒฝ์ฐ (DEC Alpha) +๋ฅผ ๊ฐ์ ํด์ผ ํฉ๋๋ค. + +์ด ๋ง์, CPU ์๊ฒ ์ฃผ์ด์ง๋ ์ธ์คํธ๋ญ์
์คํธ๋ฆผ ๋ด์ ํ ์ธ์คํธ๋ญ์
์ด ์์ +์ธ์คํธ๋ญ์
์ ์ข
์์ ์ด๋ผ๋ฉด ์์ ์ธ์คํธ๋ญ์
์ ๋ค์ ์ข
์์ ์ธ์คํธ๋ญ์
์ด ์คํ๋๊ธฐ +์ ์ ์๋ฃ[*]๋ ์ ์์ด์ผ ํ๋ค๋ ์ ์ฝ (๋ฌ๋ฆฌ ๋งํด์, ์ธ๊ณผ์ฑ์ด ์ง์ผ์ง๋ ๊ฒ์ผ๋ก +๋ณด์ด๊ฒ ํจ) ์ธ์๋ ์์ ์ด ์ํ๋ ์์๋๋ก - ์ฌ์ง์ด ๋ณ๋ ฌ์ ์ผ๋ก๋ - ๊ทธ ์คํธ๋ฆผ์ +์คํํ ์ ์์์ ์๋ฏธํฉ๋๋ค + + [*] ์ผ๋ถ ์ธ์คํธ๋ญ์
์ ํ๋ ์ด์์ ์ํฅ - ์กฐ๊ฑด ์ฝ๋๋ฅผ ๋ฐ๊พผ๋ค๋์ง, ๋ ์ง์คํฐ๋ + ๋ฉ๋ชจ๋ฆฌ๋ฅผ ๋ฐ๊พผ๋ค๋์ง - ์ ๋ง๋ค์ด๋ด๋ฉฐ, ๋ค๋ฅธ ์ธ์คํธ๋ญ์
์ ๋ค๋ฅธ ํจ๊ณผ์ + ์ข
์์ ์ผ ์ ์์ต๋๋ค. + +CPU ๋ ์ต์ข
์ ์ผ๋ก ์๋ฌด ํจ๊ณผ๋ ๋ง๋ค์ง ์๋ ์ธ์คํธ๋ญ์
์ํ์ค๋ ์์ ๋ฒ๋ฆด ์๋ +์์ต๋๋ค. ์๋ฅผ ๋ค์ด, ๋ง์ฝ ๋๊ฐ์ ์ฐ์๋๋ ์ธ์คํธ๋ญ์
์ด ๋ ๋ค ๊ฐ์ ๋ ์ง์คํฐ์ +์ง์ ์ ์ธ ๊ฐ (immediate value) ์ ์ง์ด๋ฃ๋๋ค๋ฉด, ์ฒซ๋ฒ์งธ ์ธ์คํธ๋ญ์
์ ๋ฒ๋ ค์ง ์๋ +์์ต๋๋ค. + + +๋น์ทํ๊ฒ, ์ปดํ์ผ๋ฌ ์ญ์ ํ๋ก๊ทธ๋จ์ ์ธ๊ณผ์ฑ๋ง ์ง์ผ์ค๋ค๋ฉด ์ธ์คํธ๋ญ์
์คํธ๋ฆผ์ +์์ ์ด ๋ณด๊ธฐ์ ์ฌ๋ฐ๋ฅด๋ค ์๊ฐ๋๋๋๋ก ์ฌ๋ฐฐ์น ํ ์ ์์ต๋๋ค. + + +=============== +CPU ์บ์์ ์ํฅ +=============== + +์บ์๋ ๋ฉ๋ชจ๋ฆฌ ์คํผ๋ ์ด์
๋ค์ด ์์คํ
์ ์ฒด์ ์ด๋ป๊ฒ ์ธ์ง๋๋์ง๋ CPU ์ ๋ฉ๋ชจ๋ฆฌ +์ฌ์ด์ ์กด์ฌํ๋ ์บ์๋ค, ๊ทธ๋ฆฌ๊ณ ์์คํ
์ํ์ ์ผ๊ด์ฑ์ ๊ด๋ฆฌํ๋ ๋ฉ๋ชจ๋ฆฌ ์ผ๊ด์ฑ +์์คํ
์ ์๋น ๋ถ๋ถ ์ํฅ์ ๋ฐ์ต๋๋ค. + +ํ CPU ๊ฐ ์์คํ
์ ๋ค๋ฅธ ๋ถ๋ถ๋ค๊ณผ ์บ์๋ฅผ ํตํด ์ํธ์์ฉํ๋ค๋ฉด, ๋ฉ๋ชจ๋ฆฌ ์์คํ
์ +CPU ์ ์บ์๋ค์ ํฌํจํด์ผ ํ๋ฉฐ, CPU ์ CPU ์์ ์ ์บ์ ์ฌ์ด์์์ ๋์์ ์ํ +๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๋ฅผ ๊ฐ์ ธ์ผ ํฉ๋๋ค. (๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๋ ๋
ผ๋ฆฌ์ ์ผ๋ก๋ ๋ค์ ๊ทธ๋ฆผ์ +์ ์ ์์ ๋์ํฉ๋๋ค): + + <--- CPU ---> : <----------- Memory -----------> + : + +--------+ +--------+ : +--------+ +-----------+ + | | | | : | | | | +--------+ + | CPU | | Memory | : | CPU | | | | | + | Core |--->| Access |----->| Cache |<-->| | | | + | | | Queue | : | | | |--->| Memory | + | | | | : | | | | | | + +--------+ +--------+ : +--------+ | | | | + : | Cache | +--------+ + : | Coherency | + : | Mechanism | +--------+ + +--------+ +--------+ : +--------+ | | | | + | | | | : | | | | | | + | CPU | | Memory | : | CPU | | |--->| Device | + | Core |--->| Access |----->| Cache |<-->| | | | + | | | Queue | : | | | | | | + | | | | : | | | | +--------+ + +--------+ +--------+ : +--------+ +-----------+ + : + : + +ํน์ ๋ก๋๋ ์คํ ์ด๋ ํด๋น ์คํผ๋ ์ด์
์ ์์ฒญํ CPU ์ ์บ์ ๋ด์์ ๋์์ ์๋ฃํ +์๋ ์๊ธฐ ๋๋ฌธ์ ํด๋น CPU ์ ๋ฐ๊นฅ์๋ ๋ณด์ด์ง ์์ ์ ์์ง๋ง, ๋ค๋ฅธ CPU ๊ฐ ๊ด์ฌ์ +๊ฐ๋๋ค๋ฉด ์บ์ ์ผ๊ด์ฑ ๋ฉ์ปค๋์ฆ์ด ํด๋น ์บ์๋ผ์ธ์ ํด๋น CPU ์๊ฒ ์ ๋ฌํ๊ณ , ํด๋น +๋ฉ๋ชจ๋ฆฌ ์์ญ์ ๋ํ ์คํผ๋ ์ด์
์ด ๋ฐ์ํ ๋๋ง๋ค ๊ทธ ์ํฅ์ ์ ํ์ํค๊ธฐ ๋๋ฌธ์, ํด๋น +์คํผ๋ ์ด์
์ ๋ฉ๋ชจ๋ฆฌ์ ์ค์ ๋ก ์ก์ธ์ค๋ฅผ ํ๊ฒ์ฒ๋ผ ๋ํ๋ ๊ฒ์
๋๋ค. + +CPU ์ฝ์ด๋ ํ๋ก๊ทธ๋จ์ ์ธ๊ณผ์ฑ์ด ์ ์ง๋๋ค๊ณ ๋ง ์ฌ๊ฒจ์ง๋ค๋ฉด ์ธ์คํธ๋ญ์
๋ค์ ์ด๋ค +์์๋ก๋ ์ฌ๋ฐฐ์นํด์ ์ํํ ์ ์์ต๋๋ค. ์ผ๋ถ ์ธ์คํธ๋ญ์
๋ค์ ๋ก๋๋ ์คํ ์ด +์คํผ๋ ์ด์
์ ๋ง๋๋๋ฐ ์ด ์คํผ๋ ์ด์
๋ค์ ์ดํ ์ํ๋ ๋ฉ๋ชจ๋ฆฌ ์ก์ธ์ค ํ์ ๋ค์ด๊ฐ๊ฒ +๋ฉ๋๋ค. ์ฝ์ด๋ ์ด ์คํผ๋ ์ด์
๋ค์ ํด๋น ํ์ ์ด๋ค ์์๋ก๋ ์ํ๋๋๋ก ๋ฃ์ ์ +์๊ณ , ๋ค๋ฅธ ์ธ์คํธ๋ญ์
์ ์๋ฃ๋ฅผ ๊ธฐ๋ค๋ฆฌ๋๋ก ๊ฐ์ ๋๊ธฐ ์ ๊น์ง๋ ์ํ์ ๊ณ์ํฉ๋๋ค. + +๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๊ฐ ํ๋ ์ผ์ CPU ์ชฝ์์ ๋ฉ๋ชจ๋ฆฌ ์ชฝ์ผ๋ก ๋์ด๊ฐ๋ ์ก์ธ์ค๋ค์ ์์, +๊ทธ๋ฆฌ๊ณ ๊ทธ ์ก์ธ์ค์ ๊ฒฐ๊ณผ๊ฐ ์์คํ
์ ๋ค๋ฅธ ๊ด์ฐฐ์๋ค์๊ฒ ์ธ์ง๋๋ ์์๋ฅผ ์ ์ดํ๋ +๊ฒ์
๋๋ค. + +[!] CPU ๋ค์ ํญ์ ๊ทธ๋ค ์์ ์ ๋ก๋์ ์คํ ์ด๋ ํ๋ก๊ทธ๋จ ์์๋๋ก ์ผ์ด๋ ๊ฒ์ผ๋ก +๋ณด๊ธฐ ๋๋ฌธ์, ์ฃผ์ด์ง CPU ๋ด์์๋ ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๋ฅผ ์ฌ์ฉํ ํ์๊ฐ _์์ต๋๋ค_. + +[!] MMIO ๋ ๋ค๋ฅธ ๋๋ฐ์ด์ค ์ก์ธ์ค๋ค์ ์บ์ ์์คํ
์ ์ฐํํ ์๋ ์์ต๋๋ค. ์ฐํ +์ฌ๋ถ๋ ๋๋ฐ์ด์ค๊ฐ ์ก์ธ์ค ๋๋ ๋ฉ๋ชจ๋ฆฌ ์๋์ฐ์ ํน์ฑ์ ์ํด ๊ฒฐ์ ๋ ์๋ ์๊ณ , CPU +๊ฐ ๊ฐ์ง๊ณ ์์ ์ ์๋ ํน์ํ ๋๋ฐ์ด์ค ํต์ ์ธ์คํธ๋ญ์
์ ์ฌ์ฉ์ ์ํด์ ๊ฒฐ์ ๋ +์๋ ์์ต๋๋ค. + + +์บ์ ์ผ๊ด์ฑ +----------- + +ํ์ง๋ง ์ถ์ ์์์ ์ด์ผ๊ธฐํ ๊ฒ์ฒ๋ผ ๋จ์ํ์ง ์์ต๋๋ค: ์บ์๋ค์ ์ผ๊ด์ ์ผ ๊ฒ์ผ๋ก +๊ธฐ๋๋์ง๋ง, ๊ทธ ์ผ๊ด์ฑ์ด ์์์๋ ์ ์ฉ๋ ๊ฑฐ๋ผ๋ ๋ณด์ฅ์ ์์ต๋๋ค. ํ CPU ์์ +๋ง๋ค์ด์ง ๋ณ๊ฒฝ ์ฌํญ์ ์ต์ข
์ ์ผ๋ก๋ ์์คํ
์ ๋ชจ๋ CPU ์๊ฒ ๋ณด์ฌ์ง๊ฒ ๋์ง๋ง, ๋ค๋ฅธ +CPU ๋ค์๊ฒ๋ ๊ฐ์ ์์๋ก ๋ณด์ด๊ฒ ๋ ๊ฑฐ๋ผ๋ ๋ณด์ฅ์ ์๋ค๋ ๋ป์
๋๋ค. + + +๋๊ฐ์ CPU (1 & 2) ๊ฐ ๋ฌ๋ ค ์๊ณ , ๊ฐ CPU ์ ๋๊ฐ์ ๋ฐ์ดํฐ ์บ์(CPU 1 ์ A/B ๋ฅผ, +CPU 2 ๋ C/D ๋ฅผ ๊ฐ์ต๋๋ค)๊ฐ ๋ณ๋ ฌ๋ก ์ฐ๊ฒฐ๋์ด ์๋ ์์คํ
์ ๋ค๋ฃฌ๋ค๊ณ ์๊ฐํด +๋ด
์๋ค: + + : + : +--------+ + : +---------+ | | + +--------+ : +--->| Cache A |<------->| | + | | : | +---------+ | | + | CPU 1 |<---+ | | + | | : | +---------+ | | + +--------+ : +--->| Cache B |<------->| | + : +---------+ | | + : | Memory | + : +---------+ | System | + +--------+ : +--->| Cache C |<------->| | + | | : | +---------+ | | + | CPU 2 |<---+ | | + | | : | +---------+ | | + +--------+ : +--->| Cache D |<------->| | + : +---------+ | | + : +--------+ + : + +์ด ์์คํ
์ด ๋ค์๊ณผ ๊ฐ์ ํน์ฑ์ ๊ฐ๋๋ค ์๊ฐํด ๋ด
์๋ค: + + (*) ํ์๋ฒ ์บ์๋ผ์ธ์ ์บ์ A, ์บ์ C ๋๋ ๋ฉ๋ชจ๋ฆฌ์ ์์นํ ์ ์์; + + (*) ์ง์๋ฒ ์บ์๋ผ์ธ์ ์บ์ B, ์บ์ D ๋๋ ๋ฉ๋ชจ๋ฆฌ์ ์์นํ ์ ์์; + + (*) CPU ์ฝ์ด๊ฐ ํ๊ฐ์ ์บ์์ ์ ๊ทผํ๋ ๋์, ๋ค๋ฅธ ์บ์๋ - ๋ํฐ ์บ์๋ผ์ธ์ + ๋ฉ๋ชจ๋ฆฌ์ ๋ด๋ฆฌ๊ฑฐ๋ ์ถ์ธก์ฑ ๋ก๋๋ฅผ ํ๊ฑฐ๋ ํ๊ธฐ ์ํด - ์์คํ
์ ๋ค๋ฅธ ๋ถ๋ถ์ + ์ก์ธ์ค ํ๊ธฐ ์ํด ๋ฒ์ค๋ฅผ ์ฌ์ฉํ ์ ์์; + + (*) ๊ฐ ์บ์๋ ์์คํ
์ ๋๋จธ์ง ๋ถ๋ถ๋ค๊ณผ ์ผ๊ด์ฑ์ ๋ง์ถ๊ธฐ ์ํด ํด๋น ์บ์์ + ์ ์ฉ๋์ด์ผ ํ ์คํผ๋ ์ด์
๋ค์ ํ๋ฅผ ๊ฐ์ง; + + (*) ์ด ์ผ๊ด์ฑ ํ๋ ์บ์์ ์ด๋ฏธ ์กด์ฌํ๋ ๋ผ์ธ์ ๊ฐํด์ง๋ ํ๋ฒํ ๋ก๋์ ์ํด์๋ + ๋น์์ง์ง ์๋๋ฐ, ํ์ ์คํผ๋ ์ด์
๋ค์ด ์ด ๋ก๋์ ๊ฒฐ๊ณผ์ ์ํฅ์ ๋ผ์น ์ ์๋ค + ํ ์ง๋ผ๋ ๊ทธ๋ฌํจ. + +์ด์ , ์ฒซ๋ฒ์งธ CPU ์์ ๋๊ฐ์ ์ฐ๊ธฐ ์คํผ๋ ์ด์
์ ๋ง๋๋๋ฐ, ํด๋น CPU ์ ์บ์์ +์์ฒญ๋ ์์๋ก ์คํผ๋ ์ด์
์ด ๋๋ฌ๋จ์ ๋ณด์ฅํ๊ธฐ ์ํด ๋ ์คํผ๋ ์ด์
์ฌ์ด์ ์ฐ๊ธฐ +๋ฐฐ๋ฆฌ์ด๋ฅผ ์ฌ์ฉํ๋ ์ํฉ์ ์์ํด ๋ด
์๋ค: + + CPU 1 CPU 2 COMMENT + =============== =============== ======================================= + u == 0, v == 1 and p == &u, q == &u + v = 2; + smp_wmb(); v ์ ๋ณ๊ฒฝ์ด p ์ ๋ณ๊ฒฝ ์ ์ ๋ณด์ผ ๊ฒ์ + ๋ถ๋ช
ํ ํจ + <A:modify v=2> v ๋ ์ด์ ์บ์ A ์ ๋
์ ์ ์ผ๋ก ์กด์ฌํจ + p = &v; + <B:modify p=&v> p ๋ ์ด์ ์บ์ B ์ ๋
์ ์ ์ผ๋ก ์กด์ฌํจ + +์ฌ๊ธฐ์์ ์ฐ๊ธฐ ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๋ CPU 1 ์ ์บ์๊ฐ ์ฌ๋ฐ๋ฅธ ์์๋ก ์
๋ฐ์ดํธ ๋ ๊ฒ์ผ๋ก +์์คํ
์ ๋ค๋ฅธ CPU ๋ค์ด ์ธ์งํ๊ฒ ๋ง๋ญ๋๋ค. ํ์ง๋ง, ์ด์ ๋๋ฒ์งธ CPU ๊ฐ ๊ทธ ๊ฐ๋ค์ +์ฝ์ผ๋ ค ํ๋ ์ํฉ์ ์๊ฐํด ๋ด
์๋ค: + + CPU 1 CPU 2 COMMENT + =============== =============== ======================================= + ... + q = p; + x = *q; + +์์ ๋๊ฐ์ ์ฝ๊ธฐ ์คํผ๋ ์ด์
์ ์์๋ ์์๋ก ์ผ์ด๋์ง ๋ชปํ ์ ์๋๋ฐ, ๋๋ฒ์งธ CPU +์ ํ ์บ์์ ๋ค๋ฅธ ์บ์ ์ด๋ฒคํธ๊ฐ ๋ฐ์ํด v ๋ฅผ ๋ด๊ณ ์๋ ์บ์๋ผ์ธ์ ํด๋น ์บ์์์ +์
๋ฐ์ดํธ๊ฐ ์ง์ฐ๋๋ ์ฌ์ด, p ๋ฅผ ๋ด๊ณ ์๋ ์บ์๋ผ์ธ์ ๋๋ฒ์งธ CPU ์ ๋ค๋ฅธ ์บ์์ +์
๋ฐ์ดํธ ๋์ด๋ฒ๋ ธ์ ์ ์๊ธฐ ๋๋ฌธ์
๋๋ค. + + CPU 1 CPU 2 COMMENT + =============== =============== ======================================= + u == 0, v == 1 and p == &u, q == &u + v = 2; + smp_wmb(); + <A:modify v=2> <C:busy> + <C:queue v=2> + p = &v; q = p; + <D:request p> + <B:modify p=&v> <D:commit p=&v> + <D:read p> + x = *q; + <C:read *q> ์บ์์ ์
๋ฐ์ดํธ ๋๊ธฐ ์ ์ v ๋ฅผ ์ฝ์ + <C:unbusy> + <C:commit v=2> + +๊ธฐ๋ณธ์ ์ผ๋ก, ๋๊ฐ์ ์บ์๋ผ์ธ ๋ชจ๋ CPU 2 ์ ์ต์ข
์ ์ผ๋ก๋ ์
๋ฐ์ดํธ ๋ ๊ฒ์ด์ง๋ง, +๋ณ๋์ ๊ฐ์
์์ด๋, ์
๋ฐ์ดํธ์ ์์๊ฐ CPU 1 ์์ ๋ง๋ค์ด์ง ์์์ ๋์ผํ +๊ฒ์ด๋ผ๋ ๋ณด์ฅ์ด ์์ต๋๋ค. + + +์ฌ๊ธฐ์ ๊ฐ์
ํ๊ธฐ ์ํด์ , ๋ฐ์ดํฐ ์์กด์ฑ ๋ฐฐ๋ฆฌ์ด๋ ์ฝ๊ธฐ ๋ฐฐ๋ฆฌ์ด๋ฅผ ๋ก๋ ์คํผ๋ ์ด์
๋ค +์ฌ์ด์ ๋ฃ์ด์ผ ํฉ๋๋ค. ์ด๋ ๊ฒ ํจ์ผ๋ก์จ ์บ์๊ฐ ๋ค์ ์์ฒญ์ ์ฒ๋ฆฌํ๊ธฐ ์ ์ ์ผ๊ด์ฑ +ํ๋ฅผ ์ฒ๋ฆฌํ๋๋ก ๊ฐ์ ํ๊ฒ ๋ฉ๋๋ค. + + CPU 1 CPU 2 COMMENT + =============== =============== ======================================= + u == 0, v == 1 and p == &u, q == &u + v = 2; + smp_wmb(); + <A:modify v=2> <C:busy> + <C:queue v=2> + p = &v; q = p; + <D:request p> + <B:modify p=&v> <D:commit p=&v> + <D:read p> + smp_read_barrier_depends() + <C:unbusy> + <C:commit v=2> + x = *q; + <C:read *q> ์บ์์ ์
๋ฐ์ดํธ ๋ v ๋ฅผ ์ฝ์ + + +์ด๋ฐ ๋ถ๋ฅ์ ๋ฌธ์ ๋ DEC Alpha ๊ณ์ด ํ๋ก์ธ์๋ค์์ ๋ฐ๊ฒฌ๋ ์ ์๋๋ฐ, ์ด๋ค์ +๋ฐ์ดํฐ ๋ฒ์ค๋ฅผ ์ข ๋ ์ ์ฌ์ฉํด ์ฑ๋ฅ์ ๊ฐ์ ํ ์ ์๋, ๋ถํ ๋ ์บ์๋ฅผ ๊ฐ์ง๊ณ ์๊ธฐ +๋๋ฌธ์
๋๋ค. ๋๋ถ๋ถ์ CPU ๋ ํ๋์ ์ฝ๊ธฐ ์คํผ๋ ์ด์
์ ๋ฉ๋ชจ๋ฆฌ ์ก์ธ์ค๊ฐ ๋ค๋ฅธ ์ฝ๊ธฐ +์คํผ๋ ์ด์
์ ์์กด์ ์ด๋ผ๋ฉด ๋ฐ์ดํฐ ์์กด์ฑ ๋ฐฐ๋ฆฌ์ด๋ฅผ ๋ดํฌ์ํต๋๋ค๋ง, ๋ชจ๋๊ฐ ๊ทธ๋ฐ๊ฑด +์๋๊ธฐ ๋๋ฌธ์ ์ด์ ์ ์์กดํด์ ์๋ฉ๋๋ค. + +๋ค๋ฅธ CPU ๋ค๋ ๋ถํ ๋ ์บ์๋ฅผ ๊ฐ์ง๊ณ ์์ ์ ์์ง๋ง, ๊ทธ๋ฐ CPU ๋ค์ ํ๋ฒํ ๋ฉ๋ชจ๋ฆฌ +์ก์ธ์ค๋ฅผ ์ํด์๋ ์ด ๋ถํ ๋ ์บ์๋ค ์ฌ์ด์ ์กฐ์ ์ ํด์ผ๋ง ํฉ๋๋ค. Alpha ๋ ๊ฐ์ฅ +์ฝํ ๋ฉ๋ชจ๋ฆฌ ์์ ์๋งจํฑ (semantic) ์ ์ ํํจ์ผ๋ก์จ ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๊ฐ ๋ช
์์ ์ผ๋ก +์ฌ์ฉ๋์ง ์์์ ๋์๋ ๊ทธ๋ฐ ์กฐ์ ์ด ํ์ํ์ง ์๊ฒ ํ์ต๋๋ค. + + +์บ์ ์ผ๊ด์ฑ VS DMA +------------------ + +๋ชจ๋ ์์คํ
์ด DMA ๋ฅผ ํ๋ ๋๋ฐ์ด์ค์ ๋ํด์๊น์ง ์บ์ ์ผ๊ด์ฑ์ ์ ์งํ์ง๋ +์์ต๋๋ค. ๊ทธ๋ฐ ๊ฒฝ์ฐ, DMA ๋ฅผ ์๋ํ๋ ๋๋ฐ์ด์ค๋ RAM ์ผ๋ก๋ถํฐ ์๋ชป๋ ๋ฐ์ดํฐ๋ฅผ +์ฝ์ ์ ์๋๋ฐ, ๋ํฐ ์บ์ ๋ผ์ธ์ด CPU ์ ์บ์์ ๋จธ๋ฌด๋ฅด๊ณ ์๊ณ , ๋ฐ๋ ๊ฐ์ด ์์ง +RAM ์ ์จ์ง์ง ์์์ ์ ์๊ธฐ ๋๋ฌธ์
๋๋ค. ์ด ๋ฌธ์ ๋ฅผ ํด๊ฒฐํ๊ธฐ ์ํด์ , ์ปค๋์ +์ ์ ํ ๋ถ๋ถ์์ ๊ฐ CPU ์บ์์ ๋ฌธ์ ๋๋ ๋นํธ๋ค์ ํ๋ฌ์ (flush) ์์ผ์ผ๋ง ํฉ๋๋ค +(๊ทธ๋ฆฌ๊ณ ๊ทธ๊ฒ๋ค์ ๋ฌดํจํ - invalidation - ์ํฌ ์๋ ์๊ฒ ์ฃ ). + +๋ํ, ๋๋ฐ์ด์ค์ ์ํด RAM ์ DMA ๋ก ์ฐ์ฌ์ง ๊ฐ์ ๋๋ฐ์ด์ค๊ฐ ์ฐ๊ธฐ๋ฅผ ์๋ฃํ ํ์ +CPU ์ ์บ์์์ RAM ์ผ๋ก ์ฐ์ฌ์ง๋ ๋ํฐ ์บ์ ๋ผ์ธ์ ์ํด ๋ฎ์ด์จ์ง ์๋ ์๊ณ , CPU +์ ์บ์์ ์กด์ฌํ๋ ์บ์ ๋ผ์ธ์ด ํด๋น ์บ์์์ ์ญ์ ๋๊ณ ๋ค์ ๊ฐ์ ์ฝ์ด๋ค์ด๊ธฐ +์ ๊น์ง๋ RAM ์ด ์
๋ฐ์ดํธ ๋์๋ค๋ ์ฌ์ค ์์ฒด๊ฐ ์จ๊ฒจ์ ธ ๋ฒ๋ฆด ์๋ ์์ต๋๋ค. ์ด +๋ฌธ์ ๋ฅผ ํด๊ฒฐํ๊ธฐ ์ํด์ , ์ปค๋์ ์ ์ ํ ๋ถ๋ถ์์ ๊ฐ CPU ์ ์บ์ ์์ ๋ฌธ์ ๊ฐ ๋๋ +๋นํธ๋ค์ ๋ฌดํจํ ์์ผ์ผ ํฉ๋๋ค. + +์บ์ ๊ด๋ฆฌ์ ๋ํ ๋ ๋ง์ ์ ๋ณด๋ฅผ ์ํด์ Documentation/cachetlb.txt ๋ฅผ +์ฐธ๊ณ ํ์ธ์. + + +์บ์ ์ผ๊ด์ฑ VS MMIO +------------------- + +Memory mapped I/O ๋ ์ผ๋ฐ์ ์ผ๋ก CPU ์ ๋ฉ๋ชจ๋ฆฌ ๊ณต๊ฐ ๋ด์ ํ ์๋์ฐ์ ํน์ ๋ถ๋ถ +๋ด์ ๋ฉ๋ชจ๋ฆฌ ์ง์ญ์ ์ด๋ฃจ์ด์ง๋๋ฐ, ์ด ์๋์ฐ๋ ์ผ๋ฐ์ ์ธ, RAM ์ผ๋ก ํฅํ๋ +์๋์ฐ์๋ ๋ค๋ฅธ ํน์ฑ์ ๊ฐ์ต๋๋ค. + +๊ทธ๋ฐ ํน์ฑ ๊ฐ์ด๋ฐ ํ๋๋, ์ผ๋ฐ์ ์ผ๋ก ๊ทธ๋ฐ ์ก์ธ์ค๋ ์บ์๋ฅผ ์์ ํ ์ฐํํ๊ณ +๋๋ฐ์ด์ค ๋ฒ์ค๋ก ๊ณง๋ฐ๋ก ํฅํ๋ค๋ ๊ฒ์
๋๋ค. ์ด ๋ง์ MMIO ์ก์ธ์ค๋ ๋จผ์ +์์๋์ด์ ์บ์์์ ์๋ฃ๋ ๋ฉ๋ชจ๋ฆฌ ์ก์ธ์ค๋ฅผ ์ถ์ํ ์ ์๋ค๋ ๋ป์
๋๋ค. ์ด๋ฐ +๊ฒฝ์ฐ์ ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๋ง์ผ๋ก๋ ์ถฉ๋ถ์น ์๊ณ , ๋ง์ฝ ์บ์๋ ๋ฉ๋ชจ๋ฆฌ ์ฐ๊ธฐ ์คํผ๋ ์ด์
๊ณผ +MMIO ์ก์ธ์ค๊ฐ ์ด๋ค ๋ฐฉ์์ผ๋ก๋ ์์กด์ ์ด๋ผ๋ฉด ํด๋น ์บ์๋ ๋ ์คํผ๋ ์ด์
์ฌ์ด์ +๋น์์ ธ(flush)์ผ๋ง ํฉ๋๋ค. + + +====================== +CPU ๋ค์ด ์ ์ง๋ฅด๋ ์ผ๋ค +====================== + +ํ๋ก๊ทธ๋๋จธ๋ CPU ๊ฐ ๋ฉ๋ชจ๋ฆฌ ์คํผ๋ ์ด์
๋ค์ ์ ํํ ์์ฒญํ๋๋ก ์ํํด ์ค ๊ฒ์ด๋ผ๊ณ +์๊ฐํ๋๋ฐ, ์๋ฅผ ๋ค์ด ๋ค์๊ณผ ๊ฐ์ ์ฝ๋๋ฅผ CPU ์๊ฒ ๋๊ธด๋ค๋ฉด: + + a = READ_ONCE(*A); + WRITE_ONCE(*B, b); + c = READ_ONCE(*C); + d = READ_ONCE(*D); + WRITE_ONCE(*E, e); + +CPU ๋ ๋ค์ ์ธ์คํธ๋ญ์
์ ์ฒ๋ฆฌํ๊ธฐ ์ ์ ํ์ฌ์ ์ธ์คํธ๋ญ์
์ ์ํ ๋ฉ๋ชจ๋ฆฌ +์คํผ๋ ์ด์
์ ์๋ฃํ ๊ฒ์ด๋ผ ์๊ฐํ๊ณ , ๋ฐ๋ผ์ ์์คํ
์ธ๋ถ์์ ๊ด์ฐฐํ๊ธฐ์๋ ์ ํด์ง +์์๋๋ก ์คํผ๋ ์ด์
์ด ์ํ๋ ๊ฒ์ผ๋ก ์์ํฉ๋๋ค: + + LOAD *A, STORE *B, LOAD *C, LOAD *D, STORE *E. + + +๋น์ฐํ์ง๋ง, ์ค์ ๋ก๋ ํจ์ฌ ์๋ง์
๋๋ค. ๋ง์ CPU ์ ์ปดํ์ผ๋ฌ์์ ์์ ๊ฐ์ ์ +์ฑ๋ฆฝํ์ง ๋ชปํ๋๋ฐ ๊ทธ ์ด์ ๋ ๋ค์๊ณผ ๊ฐ์ต๋๋ค: + + (*) ๋ก๋ ์คํผ๋ ์ด์
๋ค์ ์คํ์ ๊ณ์ ํด๋๊ฐ๊ธฐ ์ํด ๊ณง๋ฐ๋ก ์๋ฃ๋ ํ์๊ฐ ์๋ + ๊ฒฝ์ฐ๊ฐ ๋ง์ ๋ฐ๋ฉด, ์คํ ์ด ์คํผ๋ ์ด์
๋ค์ ์ข
์ข
๋ณ๋ค๋ฅธ ๋ฌธ์ ์์ด ์ ์๋ ์ + ์์ต๋๋ค; + + (*) ๋ก๋ ์คํผ๋ ์ด์
๋ค์ ์์ธก์ ์ผ๋ก ์ํ๋ ์ ์์ผ๋ฉฐ, ํ์์๋ ๋ก๋์๋ค๊ณ + ์ฆ๋ช
๋ ์์ธก์ ๋ก๋์ ๊ฒฐ๊ณผ๋ ๋ฒ๋ ค์ง๋๋ค; + + (*) ๋ก๋ ์คํผ๋ ์ด์
๋ค์ ์์ธก์ ์ผ๋ก ์ํ๋ ์ ์์ผ๋ฏ๋ก, ์์๋ ์ด๋ฒคํธ์ + ์ํ์ค์ ๋ค๋ฅธ ์๊ฐ์ ๋ก๋๊ฐ ์ด๋ค์ง ์ ์์ต๋๋ค; + + (*) ๋ฉ๋ชจ๋ฆฌ ์ก์ธ์ค ์์๋ CPU ๋ฒ์ค์ ์บ์๋ฅผ ์ข ๋ ์ ์ฌ์ฉํ ์ ์๋๋ก ์ฌ๋ฐฐ์น + ๋ ์ ์์ต๋๋ค; + + (*) ๋ก๋์ ์คํ ์ด๋ ์ธ์ ํ ์์น์์ ์ก์ธ์ค๋ค์ ์ผ๊ด์ ์ผ๋ก ์ฒ๋ฆฌํ ์ ์๋ + ๋ฉ๋ชจ๋ฆฌ๋ I/O ํ๋์จ์ด (๋ฉ๋ชจ๋ฆฌ์ PCI ๋๋ฐ์ด์ค ๋ ๋ค ์ด๊ฒ ๊ฐ๋ฅํ ์ + ์์ต๋๋ค) ์ ๋ํด ์์ฒญ๋๋ ๊ฒฝ์ฐ, ๊ฐ๋ณ ์คํผ๋ ์ด์
์ ์ํ ํธ๋์ญ์
์ค์ + ๋น์ฉ์ ์๋ผ๊ธฐ ์ํด ์กฐํฉ๋์ด ์คํ๋ ์ ์์ต๋๋ค; ๊ทธ๋ฆฌ๊ณ + + (*) ํด๋น CPU ์ ๋ฐ์ดํฐ ์บ์๊ฐ ์์์ ์ํฅ์ ๋ผ์น ์๋ ์๊ณ , ์บ์ ์ผ๊ด์ฑ + ๋ฉ์ปค๋์ฆ์ด - ์คํ ์ด๊ฐ ์ค์ ๋ก ์บ์์ ๋๋ฌํ๋ค๋ฉด - ์ด ๋ฌธ์ ๋ฅผ ์ํ์ํฌ ์๋ + ์์ง๋ง ์ด ์ผ๊ด์ฑ ๊ด๋ฆฌ๊ฐ ๋ค๋ฅธ CPU ๋ค์๋ ๊ฐ์ ์์๋ก ์ ๋ฌ๋๋ค๋ ๋ณด์ฅ์ + ์์ต๋๋ค. + +๋ฐ๋ผ์, ์์ ์ฝ๋์ ๋ํด ๋ค๋ฅธ CPU ๊ฐ ๋ณด๋ ๊ฒฐ๊ณผ๋ ๋ค์๊ณผ ๊ฐ์ ์ ์์ต๋๋ค: + + LOAD *A, ..., LOAD {*C,*D}, STORE *E, STORE *B + + ("LOAD {*C,*D}" ๋ ์กฐํฉ๋ ๋ก๋์
๋๋ค) + + +ํ์ง๋ง, CPU ๋ ์ค์ค๋ก๋ ์ผ๊ด์ ์ผ ๊ฒ์ ๋ณด์ฅํฉ๋๋ค: CPU _์์ _ ์ ์ก์ธ์ค๋ค์ +์์ ์๊ฒ๋ ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๊ฐ ์์์๋ ๋ถ๊ตฌํ๊ณ ์ ํํ ์์ ์ธ์์ง ๊ฒ์ผ๋ก ๋ณด์ฌ์ง +๊ฒ์
๋๋ค. ์๋ฅผ ๋ค์ด ๋ค์์ ์ฝ๋๊ฐ ์ฃผ์ด์ก๋ค๋ฉด: + + U = READ_ONCE(*A); + WRITE_ONCE(*A, V); + WRITE_ONCE(*A, W); + X = READ_ONCE(*A); + WRITE_ONCE(*A, Y); + Z = READ_ONCE(*A); + +๊ทธ๋ฆฌ๊ณ ์ธ๋ถ์ ์ํฅ์ ์ํ ๊ฐ์ญ์ด ์๋ค๊ณ ๊ฐ์ ํ๋ฉด, ์ต์ข
๊ฒฐ๊ณผ๋ ๋ค์๊ณผ ๊ฐ์ด +๋ํ๋ ๊ฒ์ด๋ผ๊ณ ์์๋ ์ ์์ต๋๋ค: + + U == *A ์ ์ต์ด ๊ฐ + X == W + Z == Y + *A == Y + +์์ ์ฝ๋๋ CPU ๊ฐ ๋ค์์ ๋ฉ๋ชจ๋ฆฌ ์ก์ธ์ค ์ํ์ค๋ฅผ ๋ง๋ค๋๋ก ํ ๊ฒ๋๋ค: + + U=LOAD *A, STORE *A=V, STORE *A=W, X=LOAD *A, STORE *A=Y, Z=LOAD *A + +ํ์ง๋ง, ๋ณ๋ค๋ฅธ ๊ฐ์
์ด ์๊ณ ํ๋ก๊ทธ๋จ์ ์์ผ์ ์ด ์ธ์์ด ์ฌ์ ํ ์ผ๊ด์ ์ด๋ผ๊ณ +๋ณด์ธ๋ค๋ ๋ณด์ฅ๋ง ์ง์ผ์ง๋ค๋ฉด ์ด ์ํ์ค๋ ์ด๋ค ์กฐํฉ์ผ๋ก๋ ์ฌ๊ตฌ์ฑ๋ ์ ์์ผ๋ฉฐ, ๊ฐ +์ก์ธ์ค๋ค์ ํฉ์ณ์ง๊ฑฐ๋ ๋ฒ๋ ค์ง ์ ์์ต๋๋ค. ์ผ๋ถ ์ํคํ
์ณ์์ CPU ๋ ๊ฐ์ ์์น์ +๋ํ ์ฐ์์ ์ธ ๋ก๋ ์คํผ๋ ์ด์
๋ค์ ์ฌ๋ฐฐ์น ํ ์ ์๊ธฐ ๋๋ฌธ์ ์์ ์์์์ +READ_ONCE() ์ WRITE_ONCE() ๋ ๋ฐ๋์ ์กด์ฌํด์ผ ํจ์ ์์๋์ธ์. ๊ทธ๋ฐ ์ข
๋ฅ์ +์ํคํ
์ณ์์ READ_ONCE() ์ WRITE_ONCE() ๋ ์ด ๋ฌธ์ ๋ฅผ ๋ง๊ธฐ ์ํด ํ์ํ ์ผ์ +๋ญ๊ฐ ๋๋ ์ง ํ๊ฒ ๋๋๋ฐ, ์๋ฅผ ๋ค์ด Itanium ์์๋ READ_ONCE() ์ WRITE_ONCE() +๊ฐ ์ฌ์ฉํ๋ volatile ์บ์คํ
์ GCC ๊ฐ ๊ทธ๋ฐ ์ฌ๋ฐฐ์น๋ฅผ ๋ฐฉ์งํ๋ ํน์ ์ธ์คํธ๋ญ์
์ธ +ld.acq ์ stl.rel ์ธ์คํธ๋ญ์
์ ๊ฐ๊ฐ ๋ง๋ค์ด ๋ด๋๋ก ํฉ๋๋ค. + +์ปดํ์ผ๋ฌ ์ญ์ ์ด ์ํ์ค์ ์ก์ธ์ค๋ค์ CPU ๊ฐ ๋ณด๊ธฐ๋ ์ ์ ํฉ์น๊ฑฐ๋ ๋ฒ๋ฆฌ๊ฑฐ๋ ๋ค๋ก +๋ฏธ๋ค๋ฒ๋ฆด ์ ์์ต๋๋ค. + +์๋ฅผ ๋ค์ด: + + *A = V; + *A = W; + +๋ ๋ค์๊ณผ ๊ฐ์ด ๋ณํ๋ ์ ์์ต๋๋ค: + + *A = W; + +๋ฐ๋ผ์, ์ฐ๊ธฐ ๋ฐฐ๋ฆฌ์ด๋ WRITE_ONCE() ๊ฐ ์๋ค๋ฉด *A ๋ก์ V ๊ฐ์ ์ ์ฅ์ ํจ๊ณผ๋ +์ฌ๋ผ์ง๋ค๊ณ ๊ฐ์ ๋ ์ ์์ต๋๋ค. ๋น์ทํ๊ฒ: + + *A = Y; + Z = *A; + +๋, ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๋ READ_ONCE() ์ WRITE_ONCE() ์์ด๋ ๋ค์๊ณผ ๊ฐ์ด ๋ณํ๋ ์ +์์ต๋๋ค: + + *A = Y; + Z = Y; + +๊ทธ๋ฆฌ๊ณ ์ด LOAD ์คํผ๋ ์ด์
์ CPU ๋ฐ๊นฅ์๋ ์์ ๋ณด์ด์ง ์์ต๋๋ค. + + +๊ทธ๋ฆฌ๊ณ , ALPHA ๊ฐ ์๋ค +--------------------- + +DEC Alpha CPU ๋ ๊ฐ์ฅ ์ํ๋ ๋ฉ๋ชจ๋ฆฌ ์์์ CPU ์ค ํ๋์
๋๋ค. ๋ฟ๋ง ์๋๋ผ, +Alpha CPU ์ ์ผ๋ถ ๋ฒ์ ์ ๋ถํ ๋ ๋ฐ์ดํฐ ์บ์๋ฅผ ๊ฐ์ง๊ณ ์์ด์, ์๋ฏธ์ ์ผ๋ก +๊ด๊ณ๋์ด ์๋ ๋๊ฐ์ ์บ์ ๋ผ์ธ์ด ์๋ก ๋ค๋ฅธ ์๊ฐ์ ์
๋ฐ์ดํธ ๋๋๊ฒ ๊ฐ๋ฅํฉ๋๋ค. +์ด๊ฒ ๋ฐ์ดํฐ ์์กด์ฑ ๋ฐฐ๋ฆฌ์ด๊ฐ ์ ๋ง ํ์ํด์ง๋ ๋ถ๋ถ์ธ๋ฐ, ๋ฐ์ดํฐ ์์กด์ฑ ๋ฐฐ๋ฆฌ์ด๋ +๋ฉ๋ชจ๋ฆฌ ์ผ๊ด์ฑ ์์คํ
๊ณผ ํจ๊ป ๋๊ฐ์ ์บ์๋ฅผ ๋๊ธฐํ ์์ผ์, ํฌ์ธํฐ ๋ณ๊ฒฝ๊ณผ ์๋ก์ด +๋ฐ์ดํฐ์ ๋ฐ๊ฒฌ์ ์ฌ๋ฐ๋ฅธ ์์๋ก ์ผ์ด๋๊ฒ ํ๊ธฐ ๋๋ฌธ์
๋๋ค. + +๋ฆฌ๋
์ค ์ปค๋์ ๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด ๋ชจ๋ธ์ Alpha ์ ๊ธฐ์ดํด์ ์ ์๋์์ต๋๋ค. + +์์ "์บ์ ์ผ๊ด์ฑ" ์๋ธ์น์
์ ์ฐธ๊ณ ํ์ธ์. + + +๊ฐ์ ๋จธ์ ๊ฒ์คํธ +---------------- + +๊ฐ์ ๋จธ์ ์์ ๋์ํ๋ ๊ฒ์คํธ๋ค์ ๊ฒ์คํธ ์์ฒด๋ SMP ์ง์ ์์ด ์ปดํ์ผ ๋์๋ค +ํด๋ SMP ์ํฅ์ ๋ฐ์ ์ ์์ต๋๋ค. ์ด๊ฑด UP ์ปค๋์ ์ฌ์ฉํ๋ฉด์ SMP ํธ์คํธ์ +๊ฒฐ๋ถ๋์ด ๋ฐ์ํ๋ ๋ถ์์ฉ์
๋๋ค. ์ด ๊ฒฝ์ฐ์๋ mandatory ๋ฐฐ๋ฆฌ์ด๋ฅผ ์ฌ์ฉํด์ ๋ฌธ์ ๋ฅผ +ํด๊ฒฐํ ์ ์๊ฒ ์ง๋ง ๊ทธ๋ฐ ํด๊ฒฐ์ ๋๋ถ๋ถ์ ๊ฒฝ์ฐ ์ต์ ์ ํด๊ฒฐ์ฑ
์ด ์๋๋๋ค. + +์ด ๋ฌธ์ ๋ฅผ ์๋ฒฝํ๊ฒ ํด๊ฒฐํ๊ธฐ ์ํด, ๋ก์ฐ ๋ ๋ฒจ์ virt_mb() ๋ฑ์ ๋งคํฌ๋ก๋ฅผ ์ฌ์ฉํ ์ +์์ต๋๋ค. ์ด๊ฒ๋ค์ SMP ๊ฐ ํ์ฑํ ๋์ด ์๋ค๋ฉด smp_mb() ๋ฑ๊ณผ ๋์ผํ ํจ๊ณผ๋ฅผ +๊ฐ์ต๋๋ค๋ง, SMP ์ SMP ์๋ ์์คํ
๋ชจ๋์ ๋ํด ๋์ผํ ์ฝ๋๋ฅผ ๋ง๋ค์ด๋
๋๋ค. +์๋ฅผ ๋ค์ด, ๊ฐ์ ๋จธ์ ๊ฒ์คํธ๋ค์ (SMP ์ผ ์ ์๋) ํธ์คํธ์ ๋๊ธฐํ๋ฅผ ํ ๋์๋ +smp_mb() ๊ฐ ์๋๋ผ virt_mb() ๋ฅผ ์ฌ์ฉํด์ผ ํฉ๋๋ค. + +์ด๊ฒ๋ค์ smp_mb() ๋ฅ์ ๊ฒ๋ค๊ณผ ๋ชจ๋ ๋ถ๋ถ์์ ๋์ผํ๋ฉฐ, ํนํ, MMIO ์ ์ํฅ์ +๋ํด์๋ ๊ฐ์ฌํ์ง ์์ต๋๋ค: MMIO ์ ์ํฅ์ ์ ์ดํ๋ ค๋ฉด, mandatory ๋ฐฐ๋ฆฌ์ด๋ฅผ +์ฌ์ฉํ์๊ธฐ ๋ฐ๋๋๋ค. + + +======= +์ฌ์ฉ ์ +======= + +์ํ์ ๋ฒํผ +----------- + +๋ฉ๋ชจ๋ฆฌ ๋ฐฐ๋ฆฌ์ด๋ ์ํ์ ๋ฒํผ๋ฅผ ์์ฑ์(producer)์ ์๋น์(consumer) ์ฌ์ด์ +๋๊ธฐํ์ ๋ฝ์ ์ฌ์ฉํ์ง ์๊ณ ๊ตฌํํ๋๋ฐ์ ์ฌ์ฉ๋ ์ ์์ต๋๋ค. ๋ ์์ธํ ๋ด์ฉ์ +์ํด์ ๋ค์์ ์ฐธ๊ณ ํ์ธ์: + + Documentation/circular-buffers.txt + + +========= +์ฐธ๊ณ ๋ฌธํ +========= + +Alpha AXP Architecture Reference Manual, Second Edition (Sites & Witek, +Digital Press) + Chapter 5.2: Physical Address Space Characteristics + Chapter 5.4: Caches and Write Buffers + Chapter 5.5: Data Sharing + Chapter 5.6: Read/Write Ordering + +AMD64 Architecture Programmer's Manual Volume 2: System Programming + Chapter 7.1: Memory-Access Ordering + Chapter 7.4: Buffering and Combining Memory Writes + +IA-32 Intel Architecture Software Developer's Manual, Volume 3: +System Programming Guide + Chapter 7.1: Locked Atomic Operations + Chapter 7.2: Memory Ordering + Chapter 7.4: Serializing Instructions + +The SPARC Architecture Manual, Version 9 + Chapter 8: Memory Models + Appendix D: Formal Specification of the Memory Models + Appendix J: Programming with the Memory Models + +UltraSPARC Programmer Reference Manual + Chapter 5: Memory Accesses and Cacheability + Chapter 15: Sparc-V9 Memory Models + +UltraSPARC III Cu User's Manual + Chapter 9: Memory Models + +UltraSPARC IIIi Processor User's Manual + Chapter 8: Memory Models + +UltraSPARC Architecture 2005 + Chapter 9: Memory + Appendix D: Formal Specifications of the Memory Models + +UltraSPARC T1 Supplement to the UltraSPARC Architecture 2005 + Chapter 8: Memory Models + Appendix F: Caches and Cache Coherency + +Solaris Internals, Core Kernel Architecture, p63-68: + Chapter 3.3: Hardware Considerations for Locks and + Synchronization + +Unix Systems for Modern Architectures, Symmetric Multiprocessing and Caching +for Kernel Programmers: + Chapter 13: Other Memory Models + +Intel Itanium Architecture Software Developer's Manual: Volume 1: + Section 2.6: Speculation + Section 4.4: Memory Access diff --git a/Documentation/kprobes.txt b/Documentation/kprobes.txt index 1f9b3e2b98ae..1f6d45abfe42 100644 --- a/Documentation/kprobes.txt +++ b/Documentation/kprobes.txt @@ -103,6 +103,16 @@ Note that the probed function's args may be passed on the stack or in registers. The jprobe will work in either case, so long as the handler's prototype matches that of the probed function. +Note that in some architectures (e.g.: arm64 and sparc64) the stack +copy is not done, as the actual location of stacked parameters may be +outside of a reasonable MAX_STACK_SIZE value and because that location +cannot be determined by the jprobes code. In this case the jprobes +user must be careful to make certain the calling signature of the +function does not cause parameters to be passed on the stack (e.g.: +more than eight function arguments, an argument of more than sixteen +bytes, or more than 64 bytes of argument data, depending on +architecture). + 1.3 Return Probes 1.3.1 How Does a Return Probe Work? diff --git a/Documentation/leds/leds-mlxcpld.txt b/Documentation/leds/leds-mlxcpld.txt new file mode 100644 index 000000000000..a0e8fd457117 --- /dev/null +++ b/Documentation/leds/leds-mlxcpld.txt @@ -0,0 +1,110 @@ +Kernel driver for Mellanox systems LEDs +======================================= + +Provide system LED support for the nex Mellanox systems: +"msx6710", "msx6720", "msb7700", "msn2700", "msx1410", +"msn2410", "msb7800", "msn2740", "msn2100". + +Description +----------- +Driver provides the following LEDs for the systems "msx6710", "msx6720", +"msb7700", "msn2700", "msx1410", "msn2410", "msb7800", "msn2740": + mlxcpld:fan1:green + mlxcpld:fan1:red + mlxcpld:fan2:green + mlxcpld:fan2:red + mlxcpld:fan3:green + mlxcpld:fan3:red + mlxcpld:fan4:green + mlxcpld:fan4:red + mlxcpld:psu:green + mlxcpld:psu:red + mlxcpld:status:green + mlxcpld:status:red + + "status" + CPLD reg offset: 0x20 + Bits [3:0] + + "psu" + CPLD reg offset: 0x20 + Bits [7:4] + + "fan1" + CPLD reg offset: 0x21 + Bits [3:0] + + "fan2" + CPLD reg offset: 0x21 + Bits [7:4] + + "fan3" + CPLD reg offset: 0x22 + Bits [3:0] + + "fan4" + CPLD reg offset: 0x22 + Bits [7:4] + + Color mask for all the above LEDs: + [bit3,bit2,bit1,bit0] or + [bit7,bit6,bit5,bit4]: + [0,0,0,0] = LED OFF + [0,1,0,1] = Red static ON + [1,1,0,1] = Green static ON + [0,1,1,0] = Red blink 3Hz + [1,1,1,0] = Green blink 3Hz + [0,1,1,1] = Red blink 6Hz + [1,1,1,1] = Green blink 6Hz + +Driver provides the following LEDs for the system "msn2100": + mlxcpld:fan:green + mlxcpld:fan:red + mlxcpld:psu1:green + mlxcpld:psu1:red + mlxcpld:psu2:green + mlxcpld:psu2:red + mlxcpld:status:green + mlxcpld:status:red + mlxcpld:uid:blue + + "status" + CPLD reg offset: 0x20 + Bits [3:0] + + "fan" + CPLD reg offset: 0x21 + Bits [3:0] + + "psu1" + CPLD reg offset: 0x23 + Bits [3:0] + + "psu2" + CPLD reg offset: 0x23 + Bits [7:4] + + "uid" + CPLD reg offset: 0x24 + Bits [3:0] + + Color mask for all the above LEDs, excepted uid: + [bit3,bit2,bit1,bit0] or + [bit7,bit6,bit5,bit4]: + [0,0,0,0] = LED OFF + [0,1,0,1] = Red static ON + [1,1,0,1] = Green static ON + [0,1,1,0] = Red blink 3Hz + [1,1,1,0] = Green blink 3Hz + [0,1,1,1] = Red blink 6Hz + [1,1,1,1] = Green blink 6Hz + + Color mask for uid LED: + [bit3,bit2,bit1,bit0]: + [0,0,0,0] = LED OFF + [1,1,0,1] = Blue static ON + [1,1,1,0] = Blue blink 3Hz + [1,1,1,1] = Blue blink 6Hz + +Driver supports HW blinking at 3Hz and 6Hz frequency (50% duty cycle). +For 3Hz duty cylce is about 167 msec, for 6Hz is about 83 msec. diff --git a/Documentation/leds/ledtrig-oneshot.txt b/Documentation/leds/ledtrig-oneshot.txt index 07cd1fa41a3a..fe57474a12e2 100644 --- a/Documentation/leds/ledtrig-oneshot.txt +++ b/Documentation/leds/ledtrig-oneshot.txt @@ -21,24 +21,8 @@ below: echo oneshot > trigger -This adds the following sysfs attributes to the LED: - - delay_on - specifies for how many milliseconds the LED has to stay at - LED_FULL brightness after it has been armed. - Default to 100 ms. - - delay_off - specifies for how many milliseconds the LED has to stay at - LED_OFF brightness after it has been armed. - Default to 100 ms. - - invert - reverse the blink logic. If set to 0 (default) blink on for delay_on - ms, then blink off for delay_off ms, leaving the LED normally off. If - set to 1, blink off for delay_off ms, then blink on for delay_on ms, - leaving the LED normally on. - Setting this value also immediately change the LED state. - - shot - write any non-empty string to signal an events, this starts a blink - sequence if not already running. +This adds sysfs attributes to the LED that are documented in: +Documentation/ABI/testing/sysfs-class-led-trigger-oneshot Example use-case: network devices, initialization: diff --git a/Documentation/leds/ledtrig-usbport.txt b/Documentation/leds/ledtrig-usbport.txt new file mode 100644 index 000000000000..69f54bfb4789 --- /dev/null +++ b/Documentation/leds/ledtrig-usbport.txt @@ -0,0 +1,41 @@ +USB port LED trigger +==================== + +This LED trigger can be used for signalling to the user a presence of USB device +in a given port. It simply turns on LED when device appears and turns it off +when it disappears. + +It requires selecting USB ports that should be observed. All available ones are +listed as separated entries in a "ports" subdirectory. Selecting is handled by +echoing "1" to a chosen port. + +Please note that this trigger allows selecting multiple USB ports for a single +LED. This can be useful in two cases: + +1) Device with single USB LED and few physical ports + +In such a case LED will be turned on as long as there is at least one connected +USB device. + +2) Device with a physical port handled by few controllers + +Some devices may have one controller per PHY standard. E.g. USB 3.0 physical +port may be handled by ohci-platform, ehci-platform and xhci-hcd. If there is +only one LED user will most likely want to assign ports from all 3 hubs. + + +This trigger can be activated from user space on led class devices as shown +below: + + echo usbport > trigger + +This adds sysfs attributes to the LED that are documented in: +Documentation/ABI/testing/sysfs-class-led-trigger-usbport + +Example use-case: + + echo usbport > trigger + echo 1 > ports/usb1-port1 + echo 1 > ports/usb2-port1 + cat ports/usb1-port1 + echo 0 > ports/usb1-port1 diff --git a/Documentation/locking/lglock.txt b/Documentation/locking/lglock.txt deleted file mode 100644 index a6971e34fabe..000000000000 --- a/Documentation/locking/lglock.txt +++ /dev/null @@ -1,166 +0,0 @@ -lglock - local/global locks for mostly local access patterns ------------------------------------------------------------- - -Origin: Nick Piggin's VFS scalability series introduced during - 2.6.35++ [1] [2] -Location: kernel/locking/lglock.c - include/linux/lglock.h -Users: currently only the VFS and stop_machine related code - -Design Goal: ------------- - -Improve scalability of globally used large data sets that are -distributed over all CPUs as per_cpu elements. - -To manage global data structures that are partitioned over all CPUs -as per_cpu elements but can be mostly handled by CPU local actions -lglock will be used where the majority of accesses are cpu local -reading and occasional cpu local writing with very infrequent -global write access. - - -* deal with things locally whenever possible - - very fast access to the local per_cpu data - - reasonably fast access to specific per_cpu data on a different - CPU -* while making global action possible when needed - - by expensive access to all CPUs locks - effectively - resulting in a globally visible critical section. - -Design: -------- - -Basically it is an array of per_cpu spinlocks with the -lg_local_lock/unlock accessing the local CPUs lock object and the -lg_local_lock_cpu/unlock_cpu accessing a remote CPUs lock object -the lg_local_lock has to disable preemption as migration protection so -that the reference to the local CPUs lock does not go out of scope. -Due to the lg_local_lock/unlock only touching cpu-local resources it -is fast. Taking the local lock on a different CPU will be more -expensive but still relatively cheap. - -One can relax the migration constraints by acquiring the current -CPUs lock with lg_local_lock_cpu, remember the cpu, and release that -lock at the end of the critical section even if migrated. This should -give most of the performance benefits without inhibiting migration -though needs careful considerations for nesting of lglocks and -consideration of deadlocks with lg_global_lock. - -The lg_global_lock/unlock locks all underlying spinlocks of all -possible CPUs (including those off-line). The preemption disable/enable -are needed in the non-RT kernels to prevent deadlocks like: - - on cpu 1 - - task A task B - lg_global_lock - got cpu 0 lock - <<<< preempt <<<< - lg_local_lock_cpu for cpu 0 - spin on cpu 0 lock - -On -RT this deadlock scenario is resolved by the arch_spin_locks in the -lglocks being replaced by rt_mutexes which resolve the above deadlock -by boosting the lock-holder. - - -Implementation: ---------------- - -The initial lglock implementation from Nick Piggin used some complex -macros to generate the lglock/brlock in lglock.h - they were later -turned into a set of functions by Andi Kleen [7]. The change to functions -was motivated by the presence of multiple lock users and also by them -being easier to maintain than the generating macros. This change to -functions is also the basis to eliminated the restriction of not -being initializeable in kernel modules (the remaining problem is that -locks are not explicitly initialized - see lockdep-design.txt) - -Declaration and initialization: -------------------------------- - - #include <linux/lglock.h> - - DEFINE_LGLOCK(name) - or: - DEFINE_STATIC_LGLOCK(name); - - lg_lock_init(&name, "lockdep_name_string"); - - on UP this is mapped to DEFINE_SPINLOCK(name) in both cases, note - also that as of 3.18-rc6 all declaration in use are of the _STATIC_ - variant (and it seems that the non-static was never in use). - lg_lock_init is initializing the lockdep map only. - -Usage: ------- - -From the locking semantics it is a spinlock. It could be called a -locality aware spinlock. lg_local_* behaves like a per_cpu -spinlock and lg_global_* like a global spinlock. -No surprises in the API. - - lg_local_lock(*lglock); - access to protected per_cpu object on this CPU - lg_local_unlock(*lglock); - - lg_local_lock_cpu(*lglock, cpu); - access to protected per_cpu object on other CPU cpu - lg_local_unlock_cpu(*lglock, cpu); - - lg_global_lock(*lglock); - access all protected per_cpu objects on all CPUs - lg_global_unlock(*lglock); - - There are no _trylock variants of the lglocks. - -Note that the lg_global_lock/unlock has to iterate over all possible -CPUs rather than the actually present CPUs or a CPU could go off-line -with a held lock [4] and that makes it very expensive. A discussion on -these issues can be found at [5] - -Constraints: ------------- - - * currently the declaration of lglocks in kernel modules is not - possible, though this should be doable with little change. - * lglocks are not recursive. - * suitable for code that can do most operations on the CPU local - data and will very rarely need the global lock - * lg_global_lock/unlock is *very* expensive and does not scale - * on UP systems all lg_* primitives are simply spinlocks - * in PREEMPT_RT the spinlock becomes an rt-mutex and can sleep but - does not change the tasks state while sleeping [6]. - * in PREEMPT_RT the preempt_disable/enable in lg_local_lock/unlock - is downgraded to a migrate_disable/enable, the other - preempt_disable/enable are downgraded to barriers [6]. - The deadlock noted for non-RT above is resolved due to rt_mutexes - boosting the lock-holder in this case which arch_spin_locks do - not do. - -lglocks were designed for very specific problems in the VFS and probably -only are the right answer in these corner cases. Any new user that looks -at lglocks probably wants to look at the seqlock and RCU alternatives as -her first choice. There are also efforts to resolve the RCU issues that -currently prevent using RCU in place of view remaining lglocks. - -Note on brlock history: ------------------------ - -The 'Big Reader' read-write spinlocks were originally introduced by -Ingo Molnar in 2000 (2.4/2.5 kernel series) and removed in 2003. They -later were introduced by the VFS scalability patch set in 2.6 series -again as the "big reader lock" brlock [2] variant of lglock which has -been replaced by seqlock primitives or by RCU based primitives in the -3.13 kernel series as was suggested in [3] in 2003. The brlock was -entirely removed in the 3.13 kernel series. - -Link: 1 http://lkml.org/lkml/2010/8/2/81 -Link: 2 http://lwn.net/Articles/401738/ -Link: 3 http://lkml.org/lkml/2003/3/9/205 -Link: 4 https://lkml.org/lkml/2011/8/24/185 -Link: 5 http://lkml.org/lkml/2011/12/18/189 -Link: 6 https://www.kernel.org/pub/linux/kernel/projects/rt/ - patch series - lglocks-rt.patch.patch -Link: 7 http://lkml.org/lkml/2012/3/5/26 diff --git a/Documentation/media/Makefile b/Documentation/media/Makefile index 39e2d766dbe3..a7fb35291f6c 100644 --- a/Documentation/media/Makefile +++ b/Documentation/media/Makefile @@ -10,7 +10,8 @@ FILES = audio.h.rst ca.h.rst dmx.h.rst frontend.h.rst net.h.rst video.h.rst \ TARGETS := $(addprefix $(BUILDDIR)/, $(FILES)) -htmldocs: $(BUILDDIR) ${TARGETS} +.PHONY: all +all: $(BUILDDIR) ${TARGETS} $(BUILDDIR): $(Q)mkdir -p $@ diff --git a/Documentation/media/conf.py b/Documentation/media/conf.py new file mode 100644 index 000000000000..bef927bc4659 --- /dev/null +++ b/Documentation/media/conf.py @@ -0,0 +1,10 @@ +# -*- coding: utf-8; mode: python -*- + +project = 'Linux Media Subsystem Documentation' + +tags.add("subproject") + +latex_documents = [ + ('index', 'media.tex', 'Linux Media Subsystem Documentation', + 'The kernel development community', 'manual'), +] diff --git a/Documentation/media/conf_nitpick.py b/Documentation/media/conf_nitpick.py new file mode 100644 index 000000000000..11beac2e68fb --- /dev/null +++ b/Documentation/media/conf_nitpick.py @@ -0,0 +1,93 @@ +# -*- coding: utf-8; mode: python -*- + +project = 'Linux Media Subsystem Documentation' + +# It is possible to run Sphinx in nickpick mode with: +nitpicky = True + +# within nit-picking build, do not refer to any intersphinx object +intersphinx_mapping = {} + +# In nickpick mode, it will complain about lots of missing references that +# +# 1) are just typedefs like: bool, __u32, etc; +# 2) It will complain for things like: enum, NULL; +# 3) It will complain for symbols that should be on different +# books (but currently aren't ported to ReST) +# +# The list below has a list of such symbols to be ignored in nitpick mode +# +nitpick_ignore = [ + ("c:func", "clock_gettime"), + ("c:func", "close"), + ("c:func", "container_of"), + ("c:func", "determine_valid_ioctls"), + ("c:func", "ERR_PTR"), + ("c:func", "ioctl"), + ("c:func", "IS_ERR"), + ("c:func", "mmap"), + ("c:func", "open"), + ("c:func", "pci_name"), + ("c:func", "poll"), + ("c:func", "PTR_ERR"), + ("c:func", "read"), + ("c:func", "release"), + ("c:func", "set"), + ("c:func", "struct fd_set"), + ("c:func", "struct pollfd"), + ("c:func", "usb_make_path"), + ("c:func", "write"), + ("c:type", "atomic_t"), + ("c:type", "bool"), + ("c:type", "buf_queue"), + ("c:type", "device"), + ("c:type", "device_driver"), + ("c:type", "device_node"), + ("c:type", "enum"), + ("c:type", "file"), + ("c:type", "i2c_adapter"), + ("c:type", "i2c_board_info"), + ("c:type", "i2c_client"), + ("c:type", "ktime_t"), + ("c:type", "led_classdev_flash"), + ("c:type", "list_head"), + ("c:type", "lock_class_key"), + ("c:type", "module"), + ("c:type", "mutex"), + ("c:type", "pci_dev"), + ("c:type", "pdvbdev"), + ("c:type", "poll_table_struct"), + ("c:type", "s32"), + ("c:type", "s64"), + ("c:type", "sd"), + ("c:type", "spi_board_info"), + ("c:type", "spi_device"), + ("c:type", "spi_master"), + ("c:type", "struct fb_fix_screeninfo"), + ("c:type", "struct pollfd"), + ("c:type", "struct timeval"), + ("c:type", "struct video_capability"), + ("c:type", "u16"), + ("c:type", "u32"), + ("c:type", "u64"), + ("c:type", "u8"), + ("c:type", "union"), + ("c:type", "usb_device"), + + ("cpp:type", "boolean"), + ("cpp:type", "fd"), + ("cpp:type", "fd_set"), + ("cpp:type", "int16_t"), + ("cpp:type", "NULL"), + ("cpp:type", "off_t"), + ("cpp:type", "pollfd"), + ("cpp:type", "size_t"), + ("cpp:type", "ssize_t"), + ("cpp:type", "timeval"), + ("cpp:type", "__u16"), + ("cpp:type", "__u32"), + ("cpp:type", "__u64"), + ("cpp:type", "uint16_t"), + ("cpp:type", "uint32_t"), + ("cpp:type", "video_system_t"), +] diff --git a/Documentation/media/index.rst b/Documentation/media/index.rst new file mode 100644 index 000000000000..7f8f0af620ce --- /dev/null +++ b/Documentation/media/index.rst @@ -0,0 +1,19 @@ +Linux Media Subsystem Documentation +=================================== + +Contents: + +.. toctree:: + :maxdepth: 2 + + media_uapi + media_kapi + dvb-drivers/index + v4l-drivers/index + +.. only:: subproject + + Indices + ======= + + * :ref:`genindex` diff --git a/Documentation/media/uapi/cec/cec-func-open.rst b/Documentation/media/uapi/cec/cec-func-open.rst index 38fd7e0cfccd..7c0f981a6e07 100644 --- a/Documentation/media/uapi/cec/cec-func-open.rst +++ b/Documentation/media/uapi/cec/cec-func-open.rst @@ -32,7 +32,7 @@ Arguments Open flags. Access mode must be ``O_RDWR``. When the ``O_NONBLOCK`` flag is given, the - :ref:`CEC_RECEIVE <CEC_RECEIVE>` and :ref:`CEC_DQEVENT <CEC_DQEVENT>` ioctls + :ref:`CEC_RECEIVE <CEC_RECEIVE>` and :c:func:`CEC_DQEVENT` ioctls will return the ``EAGAIN`` error code when no message or event is available, and ioctls :ref:`CEC_TRANSMIT <CEC_TRANSMIT>`, :ref:`CEC_ADAP_S_PHYS_ADDR <CEC_ADAP_S_PHYS_ADDR>` and diff --git a/Documentation/media/uapi/cec/cec-ioc-adap-g-log-addrs.rst b/Documentation/media/uapi/cec/cec-ioc-adap-g-log-addrs.rst index 04ee90099676..201d4839931c 100644 --- a/Documentation/media/uapi/cec/cec-ioc-adap-g-log-addrs.rst +++ b/Documentation/media/uapi/cec/cec-ioc-adap-g-log-addrs.rst @@ -144,7 +144,7 @@ logical address types are already defined will return with error ``EBUSY``. - ``flags`` - - Flags. No flags are defined yet, so set this to 0. + - Flags. See :ref:`cec-log-addrs-flags` for a list of available flags. - .. row 7 @@ -201,6 +201,25 @@ logical address types are already defined will return with error ``EBUSY``. give the CEC framework more information about the device type, even though the framework won't use it directly in the CEC message. +.. _cec-log-addrs-flags: + +.. flat-table:: Flags for struct cec_log_addrs + :header-rows: 0 + :stub-columns: 0 + :widths: 3 1 4 + + + - .. _`CEC-LOG-ADDRS-FL-ALLOW-UNREG-FALLBACK`: + + - ``CEC_LOG_ADDRS_FL_ALLOW_UNREG_FALLBACK`` + + - 1 + + - By default if no logical address of the requested type can be claimed, then + it will go back to the unconfigured state. If this flag is set, then it will + fallback to the Unregistered logical address. Note that if the Unregistered + logical address was explicitly requested, then this flag has no effect. + .. _cec-versions: .. flat-table:: CEC Versions diff --git a/Documentation/media/uapi/cec/cec-ioc-dqevent.rst b/Documentation/media/uapi/cec/cec-ioc-dqevent.rst index 7a6d6d00ce19..f8caa28a96d2 100644 --- a/Documentation/media/uapi/cec/cec-ioc-dqevent.rst +++ b/Documentation/media/uapi/cec/cec-ioc-dqevent.rst @@ -15,7 +15,8 @@ CEC_DQEVENT - Dequeue a CEC event Synopsis ======== -.. cpp:function:: int ioctl( int fd, int request, struct cec_event *argp ) +.. c:function:: int ioctl( int fd, int request, struct cec_event *argp ) + :name: CEC_DQEVENT Arguments ========= @@ -36,7 +37,7 @@ Description and is currently only available as a staging kernel module. CEC devices can send asynchronous events. These can be retrieved by -calling :ref:`ioctl CEC_DQEVENT <CEC_DQEVENT>`. If the file descriptor is in +calling :c:func:`CEC_DQEVENT`. If the file descriptor is in non-blocking mode and no event is pending, then it will return -1 and set errno to the ``EAGAIN`` error code. @@ -64,7 +65,8 @@ it is guaranteed that the state did change in between the two events. - ``phys_addr`` - - The current physical address. + - The current physical address. This is ``CEC_PHYS_ADDR_INVALID`` if no + valid physical address is set. - .. row 2 @@ -72,7 +74,10 @@ it is guaranteed that the state did change in between the two events. - ``log_addr_mask`` - - The current set of claimed logical addresses. + - The current set of claimed logical addresses. This is 0 if no logical + addresses are claimed or if ``phys_addr`` is ``CEC_PHYS_ADDR_INVALID``. + If bit 15 is set (``1 << CEC_LOG_ADDR_UNREGISTERED``) then this device + has the unregistered logical address. In that case all other bits are 0. diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt index a4d0a99de04d..ba818ecce6f9 100644 --- a/Documentation/memory-barriers.txt +++ b/Documentation/memory-barriers.txt @@ -609,7 +609,7 @@ A data-dependency barrier must also order against dependent writes: The data-dependency barrier must order the read into Q with the store into *Q. This prohibits this outcome: - (Q == B) && (B == 4) + (Q == &B) && (B == 4) Please note that this pattern should be rare. After all, the whole point of dependency ordering is to -prevent- writes to the data structure, along @@ -1928,6 +1928,7 @@ There are some more advanced barrier functions: See Documentation/DMA-API.txt for more information on consistent memory. + MMIO WRITE BARRIER ------------------ @@ -2075,7 +2076,7 @@ systems, and so cannot be counted on in such a situation to actually achieve anything at all - especially with respect to I/O accesses - unless combined with interrupt disabling operations. -See also the section on "Inter-CPU locking barrier effects". +See also the section on "Inter-CPU acquiring barrier effects". As an example, consider the following: diff --git a/Documentation/networking/00-INDEX b/Documentation/networking/00-INDEX index 415154a487d0..a7697783ac4c 100644 --- a/Documentation/networking/00-INDEX +++ b/Documentation/networking/00-INDEX @@ -74,6 +74,8 @@ dns_resolver.txt - The DNS resolver module allows kernel servies to make DNS queries. driver.txt - Softnet driver issues. +ena.txt + - info on Amazon's Elastic Network Adapter (ENA) e100.txt - info on Intel's EtherExpress PRO/100 line of 10/100 boards e1000.txt diff --git a/Documentation/networking/batman-adv.txt b/Documentation/networking/batman-adv.txt index 1b5e7a7f2185..8a8d3d96f6c6 100644 --- a/Documentation/networking/batman-adv.txt +++ b/Documentation/networking/batman-adv.txt @@ -43,10 +43,15 @@ new interfaces to verify the compatibility. There is no need to reload the module if you plug your USB wifi adapter into your ma- chine after batman advanced was initially loaded. -To activate a given interface simply write "bat0" into its -"mesh_iface" file inside the batman_adv subfolder: +The batman-adv soft-interface can be created using the iproute2 +tool "ip" -# echo bat0 > /sys/class/net/eth0/batman_adv/mesh_iface +# ip link add name bat0 type batadv + +To activate a given interface simply attach it to the "bat0" +interface + +# ip link set dev eth0 master bat0 Repeat this step for all interfaces you wish to add. Now batman starts using/broadcasting on this/these interface(s). @@ -56,10 +61,10 @@ By reading the "iface_status" file you can check its status: # cat /sys/class/net/eth0/batman_adv/iface_status # active -To deactivate an interface you have to write "none" into its -"mesh_iface" file: +To deactivate an interface you have to detach it from the +"bat0" interface: -# echo none > /sys/class/net/eth0/batman_adv/mesh_iface +# ip link set dev eth0 nomaster All mesh wide settings can be found in batman's own interface diff --git a/Documentation/networking/dsa/dsa.txt b/Documentation/networking/dsa/dsa.txt index f20c884c048a..6d6c07cf1a9a 100644 --- a/Documentation/networking/dsa/dsa.txt +++ b/Documentation/networking/dsa/dsa.txt @@ -227,9 +227,9 @@ to address individual switches in the tree. dsa_switch: structure describing a switch device in the tree, referencing a dsa_switch_tree as a backpointer, slave network devices, master network device, -and a reference to the backing dsa_switch_driver +and a reference to the backing dsa_switch_ops -dsa_switch_driver: structure referencing function pointers, see below for a full +dsa_switch_ops: structure referencing function pointers, see below for a full description. Design limitations @@ -357,10 +357,10 @@ regular HWMON devices in /sys/class/hwmon/. Driver development ================== -DSA switch drivers need to implement a dsa_switch_driver structure which will +DSA switch drivers need to implement a dsa_switch_ops structure which will contain the various members described below. -register_switch_driver() registers this dsa_switch_driver in its internal list +register_switch_driver() registers this dsa_switch_ops in its internal list of drivers to probe for. unregister_switch_driver() does the exact opposite. Unless requested differently by setting the priv_size member accordingly, DSA @@ -379,7 +379,7 @@ Switch configuration buses, return a non-NULL string - setup: setup function for the switch, this function is responsible for setting - up the dsa_switch_driver private structure with all it needs: register maps, + up the dsa_switch_ops private structure with all it needs: register maps, interrupts, mutexes, locks etc.. This function is also expected to properly configure the switch to separate all network interfaces from each other, that is, they should be isolated by the switch hardware itself, typically by creating @@ -584,6 +584,29 @@ of DSA, would be the its port-based VLAN, used by the associated bridge device. function that the driver has to call for each MAC address known to be behind the given port. A switchdev object is used to carry the VID and FDB info. +- port_mdb_prepare: bridge layer function invoked when the bridge prepares the + installation of a multicast database entry. If the operation is not supported, + this function should return -EOPNOTSUPP to inform the bridge code to fallback + to a software implementation. No hardware setup must be done in this function. + See port_fdb_add for this and details. + +- port_mdb_add: bridge layer function invoked when the bridge wants to install + a multicast database entry, the switch hardware should be programmed with the + specified address in the specified VLAN ID in the forwarding database + associated with this VLAN ID. + +Note: VLAN ID 0 corresponds to the port private database, which, in the context +of DSA, would be the its port-based VLAN, used by the associated bridge device. + +- port_mdb_del: bridge layer function invoked when the bridge wants to remove a + multicast database entry, the switch hardware should be programmed to delete + the specified MAC address from the specified VLAN ID if it was mapped into + this port forwarding database. + +- port_mdb_dump: bridge layer function invoked with a switchdev callback + function that the driver has to call for each MAC address known to be behind + the given port. A switchdev object is used to carry the VID and MDB info. + TODO ==== diff --git a/Documentation/networking/ena.txt b/Documentation/networking/ena.txt new file mode 100644 index 000000000000..2b4b6f57e549 --- /dev/null +++ b/Documentation/networking/ena.txt @@ -0,0 +1,305 @@ +Linux kernel driver for Elastic Network Adapter (ENA) family: +============================================================= + +Overview: +========= +ENA is a networking interface designed to make good use of modern CPU +features and system architectures. + +The ENA device exposes a lightweight management interface with a +minimal set of memory mapped registers and extendable command set +through an Admin Queue. + +The driver supports a range of ENA devices, is link-speed independent +(i.e., the same driver is used for 10GbE, 25GbE, 40GbE, etc.), and has +a negotiated and extendable feature set. + +Some ENA devices support SR-IOV. This driver is used for both the +SR-IOV Physical Function (PF) and Virtual Function (VF) devices. + +ENA devices enable high speed and low overhead network traffic +processing by providing multiple Tx/Rx queue pairs (the maximum number +is advertised by the device via the Admin Queue), a dedicated MSI-X +interrupt vector per Tx/Rx queue pair, adaptive interrupt moderation, +and CPU cacheline optimized data placement. + +The ENA driver supports industry standard TCP/IP offload features such +as checksum offload and TCP transmit segmentation offload (TSO). +Receive-side scaling (RSS) is supported for multi-core scaling. + +The ENA driver and its corresponding devices implement health +monitoring mechanisms such as watchdog, enabling the device and driver +to recover in a manner transparent to the application, as well as +debug logs. + +Some of the ENA devices support a working mode called Low-latency +Queue (LLQ), which saves several more microseconds. + +Supported PCI vendor ID/device IDs: +=================================== +1d0f:0ec2 - ENA PF +1d0f:1ec2 - ENA PF with LLQ support +1d0f:ec20 - ENA VF +1d0f:ec21 - ENA VF with LLQ support + +ENA Source Code Directory Structure: +==================================== +ena_com.[ch] - Management communication layer. This layer is + responsible for the handling all the management + (admin) communication between the device and the + driver. +ena_eth_com.[ch] - Tx/Rx data path. +ena_admin_defs.h - Definition of ENA management interface. +ena_eth_io_defs.h - Definition of ENA data path interface. +ena_common_defs.h - Common definitions for ena_com layer. +ena_regs_defs.h - Definition of ENA PCI memory-mapped (MMIO) registers. +ena_netdev.[ch] - Main Linux kernel driver. +ena_syfsfs.[ch] - Sysfs files. +ena_ethtool.c - ethtool callbacks. +ena_pci_id_tbl.h - Supported device IDs. + +Management Interface: +===================== +ENA management interface is exposed by means of: +- PCIe Configuration Space +- Device Registers +- Admin Queue (AQ) and Admin Completion Queue (ACQ) +- Asynchronous Event Notification Queue (AENQ) + +ENA device MMIO Registers are accessed only during driver +initialization and are not involved in further normal device +operation. + +AQ is used for submitting management commands, and the +results/responses are reported asynchronously through ACQ. + +ENA introduces a very small set of management commands with room for +vendor-specific extensions. Most of the management operations are +framed in a generic Get/Set feature command. + +The following admin queue commands are supported: +- Create I/O submission queue +- Create I/O completion queue +- Destroy I/O submission queue +- Destroy I/O completion queue +- Get feature +- Set feature +- Configure AENQ +- Get statistics + +Refer to ena_admin_defs.h for the list of supported Get/Set Feature +properties. + +The Asynchronous Event Notification Queue (AENQ) is a uni-directional +queue used by the ENA device to send to the driver events that cannot +be reported using ACQ. AENQ events are subdivided into groups. Each +group may have multiple syndromes, as shown below + +The events are: + Group Syndrome + Link state change - X - + Fatal error - X - + Notification Suspend traffic + Notification Resume traffic + Keep-Alive - X - + +ACQ and AENQ share the same MSI-X vector. + +Keep-Alive is a special mechanism that allows monitoring of the +device's health. The driver maintains a watchdog (WD) handler which, +if fired, logs the current state and statistics then resets and +restarts the ENA device and driver. A Keep-Alive event is delivered by +the device every second. The driver re-arms the WD upon reception of a +Keep-Alive event. A missed Keep-Alive event causes the WD handler to +fire. + +Data Path Interface: +==================== +I/O operations are based on Tx and Rx Submission Queues (Tx SQ and Rx +SQ correspondingly). Each SQ has a completion queue (CQ) associated +with it. + +The SQs and CQs are implemented as descriptor rings in contiguous +physical memory. + +The ENA driver supports two Queue Operation modes for Tx SQs: +- Regular mode + * In this mode the Tx SQs reside in the host's memory. The ENA + device fetches the ENA Tx descriptors and packet data from host + memory. +- Low Latency Queue (LLQ) mode or "push-mode". + * In this mode the driver pushes the transmit descriptors and the + first 128 bytes of the packet directly to the ENA device memory + space. The rest of the packet payload is fetched by the + device. For this operation mode, the driver uses a dedicated PCI + device memory BAR, which is mapped with write-combine capability. + +The Rx SQs support only the regular mode. + +Note: Not all ENA devices support LLQ, and this feature is negotiated + with the device upon initialization. If the ENA device does not + support LLQ mode, the driver falls back to the regular mode. + +The driver supports multi-queue for both Tx and Rx. This has various +benefits: +- Reduced CPU/thread/process contention on a given Ethernet interface. +- Cache miss rate on completion is reduced, particularly for data + cache lines that hold the sk_buff structures. +- Increased process-level parallelism when handling received packets. +- Increased data cache hit rate, by steering kernel processing of + packets to the CPU, where the application thread consuming the + packet is running. +- In hardware interrupt re-direction. + +Interrupt Modes: +================ +The driver assigns a single MSI-X vector per queue pair (for both Tx +and Rx directions). The driver assigns an additional dedicated MSI-X vector +for management (for ACQ and AENQ). + +Management interrupt registration is performed when the Linux kernel +probes the adapter, and it is de-registered when the adapter is +removed. I/O queue interrupt registration is performed when the Linux +interface of the adapter is opened, and it is de-registered when the +interface is closed. + +The management interrupt is named: + ena-mgmnt@pci:<PCI domain:bus:slot.function> +and for each queue pair, an interrupt is named: + <interface name>-Tx-Rx-<queue index> + +The ENA device operates in auto-mask and auto-clear interrupt +modes. That is, once MSI-X is delivered to the host, its Cause bit is +automatically cleared and the interrupt is masked. The interrupt is +unmasked by the driver after NAPI processing is complete. + +Interrupt Moderation: +===================== +ENA driver and device can operate in conventional or adaptive interrupt +moderation mode. + +In conventional mode the driver instructs device to postpone interrupt +posting according to static interrupt delay value. The interrupt delay +value can be configured through ethtool(8). The following ethtool +parameters are supported by the driver: tx-usecs, rx-usecs + +In adaptive interrupt moderation mode the interrupt delay value is +updated by the driver dynamically and adjusted every NAPI cycle +according to the traffic nature. + +By default ENA driver applies adaptive coalescing on Rx traffic and +conventional coalescing on Tx traffic. + +Adaptive coalescing can be switched on/off through ethtool(8) +adaptive_rx on|off parameter. + +The driver chooses interrupt delay value according to the number of +bytes and packets received between interrupt unmasking and interrupt +posting. The driver uses interrupt delay table that subdivides the +range of received bytes/packets into 5 levels and assigns interrupt +delay value to each level. + +The user can enable/disable adaptive moderation, modify the interrupt +delay table and restore its default values through sysfs. + +The rx_copybreak is initialized by default to ENA_DEFAULT_RX_COPYBREAK +and can be configured by the ETHTOOL_STUNABLE command of the +SIOCETHTOOL ioctl. + +SKB: +The driver-allocated SKB for frames received from Rx handling using +NAPI context. The allocation method depends on the size of the packet. +If the frame length is larger than rx_copybreak, napi_get_frags() +is used, otherwise netdev_alloc_skb_ip_align() is used, the buffer +content is copied (by CPU) to the SKB, and the buffer is recycled. + +Statistics: +=========== +The user can obtain ENA device and driver statistics using ethtool. +The driver can collect regular or extended statistics (including +per-queue stats) from the device. + +In addition the driver logs the stats to syslog upon device reset. + +MTU: +==== +The driver supports an arbitrarily large MTU with a maximum that is +negotiated with the device. The driver configures MTU using the +SetFeature command (ENA_ADMIN_MTU property). The user can change MTU +via ip(8) and similar legacy tools. + +Stateless Offloads: +=================== +The ENA driver supports: +- TSO over IPv4/IPv6 +- TSO with ECN +- IPv4 header checksum offload +- TCP/UDP over IPv4/IPv6 checksum offloads + +RSS: +==== +- The ENA device supports RSS that allows flexible Rx traffic + steering. +- Toeplitz and CRC32 hash functions are supported. +- Different combinations of L2/L3/L4 fields can be configured as + inputs for hash functions. +- The driver configures RSS settings using the AQ SetFeature command + (ENA_ADMIN_RSS_HASH_FUNCTION, ENA_ADMIN_RSS_HASH_INPUT and + ENA_ADMIN_RSS_REDIRECTION_TABLE_CONFIG properties). +- If the NETIF_F_RXHASH flag is set, the 32-bit result of the hash + function delivered in the Rx CQ descriptor is set in the received + SKB. +- The user can provide a hash key, hash function, and configure the + indirection table through ethtool(8). + +DATA PATH: +========== +Tx: +--- +end_start_xmit() is called by the stack. This function does the following: +- Maps data buffers (skb->data and frags). +- Populates ena_buf for the push buffer (if the driver and device are + in push mode.) +- Prepares ENA bufs for the remaining frags. +- Allocates a new request ID from the empty req_id ring. The request + ID is the index of the packet in the Tx info. This is used for + out-of-order TX completions. +- Adds the packet to the proper place in the Tx ring. +- Calls ena_com_prepare_tx(), an ENA communication layer that converts + the ena_bufs to ENA descriptors (and adds meta ENA descriptors as + needed.) + * This function also copies the ENA descriptors and the push buffer + to the Device memory space (if in push mode.) +- Writes doorbell to the ENA device. +- When the ENA device finishes sending the packet, a completion + interrupt is raised. +- The interrupt handler schedules NAPI. +- The ena_clean_tx_irq() function is called. This function handles the + completion descriptors generated by the ENA, with a single + completion descriptor per completed packet. + * req_id is retrieved from the completion descriptor. The tx_info of + the packet is retrieved via the req_id. The data buffers are + unmapped and req_id is returned to the empty req_id ring. + * The function stops when the completion descriptors are completed or + the budget is reached. + +Rx: +--- +- When a packet is received from the ENA device. +- The interrupt handler schedules NAPI. +- The ena_clean_rx_irq() function is called. This function calls + ena_rx_pkt(), an ENA communication layer function, which returns the + number of descriptors used for a new unhandled packet, and zero if + no new packet is found. +- Then it calls the ena_clean_rx_irq() function. +- ena_eth_rx_skb() checks packet length: + * If the packet is small (len < rx_copybreak), the driver allocates + a SKB for the new packet, and copies the packet payload into the + SKB data buffer. + - In this way the original data buffer is not passed to the stack + and is reused for future Rx packets. + * Otherwise the function unmaps the Rx buffer, then allocates the + new SKB structure and hooks the Rx buffer to the SKB frags. +- The new SKB is updated with the necessary information (protocol, + checksum hw verify result, etc.), and then passed to the network + stack, using the NAPI interface function napi_gro_receive(). diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index 9ae929395b24..3db8c67d2c8d 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -575,32 +575,33 @@ tcp_syncookies - BOOLEAN unconditionally generation of syncookies. tcp_fastopen - INTEGER - Enable TCP Fast Open feature (draft-ietf-tcpm-fastopen) to send data - in the opening SYN packet. To use this feature, the client application - must use sendmsg() or sendto() with MSG_FASTOPEN flag rather than - connect() to perform a TCP handshake automatically. + Enable TCP Fast Open (RFC7413) to send and accept data in the opening + SYN packet. - The values (bitmap) are - 1: Enables sending data in the opening SYN on the client w/ MSG_FASTOPEN. - 2: Enables TCP Fast Open on the server side, i.e., allowing data in - a SYN packet to be accepted and passed to the application before - 3-way hand shake finishes. - 4: Send data in the opening SYN regardless of cookie availability and - without a cookie option. - 0x100: Accept SYN data w/o validating the cookie. - 0x200: Accept data-in-SYN w/o any cookie option present. - 0x400/0x800: Enable Fast Open on all listeners regardless of the - TCP_FASTOPEN socket option. The two different flags designate two - different ways of setting max_qlen without the TCP_FASTOPEN socket - option. + The client support is enabled by flag 0x1 (on by default). The client + then must use sendmsg() or sendto() with the MSG_FASTOPEN flag, + rather than connect() to send data in SYN. - Default: 1 + The server support is enabled by flag 0x2 (off by default). Then + either enable for all listeners with another flag (0x400) or + enable individual listeners via TCP_FASTOPEN socket option with + the option value being the length of the syn-data backlog. - Note that the client & server side Fast Open flags (1 and 2 - respectively) must be also enabled before the rest of flags can take - effect. + The values (bitmap) are + 0x1: (client) enables sending data in the opening SYN on the client. + 0x2: (server) enables the server support, i.e., allowing data in + a SYN packet to be accepted and passed to the + application before 3-way handshake finishes. + 0x4: (client) send data in the opening SYN regardless of cookie + availability and without a cookie option. + 0x200: (server) accept data-in-SYN w/o any cookie option present. + 0x400: (server) enable all listeners to support Fast Open by + default without explicit TCP_FASTOPEN socket option. + + Default: 0x1 - See include/net/tcp.h and the code for more details. + Note that that additional client or server features are only + effective if the basic support (0x1 and 0x2) are enabled respectively. tcp_syn_retries - INTEGER Number of times initial SYNs for an active TCP connection attempt diff --git a/Documentation/networking/ipvlan.txt b/Documentation/networking/ipvlan.txt index 14422f8fcdc4..24196cef7c91 100644 --- a/Documentation/networking/ipvlan.txt +++ b/Documentation/networking/ipvlan.txt @@ -22,7 +22,7 @@ The driver can be built into the kernel (CONFIG_IPVLAN=y) or as a module There are no module parameters for this driver and it can be configured using IProute2/ip utility. - ip link add link <master-dev> <slave-dev> type ipvlan mode { l2 | L3 } + ip link add link <master-dev> <slave-dev> type ipvlan mode { l2 | l3 | l3s } e.g. ip link add link ipvl0 eth0 type ipvlan mode l2 @@ -48,6 +48,11 @@ master device for the L2 processing and routing from that instance will be used before packets are queued on the outbound device. In this mode the slaves will not receive nor can send multicast / broadcast traffic. +4.3 L3S mode: + This is very similar to the L3 mode except that iptables (conn-tracking) +works in this mode and hence it is L3-symmetric (L3s). This will have slightly less +performance but that shouldn't matter since you are choosing this mode over plain-L3 +mode to make conn-tracking work. 5. What to choose (macvlan vs. ipvlan)? These two devices are very similar in many regards and the specific use diff --git a/Documentation/networking/rxrpc.txt b/Documentation/networking/rxrpc.txt index 70c926ae212d..1b63bbc6b94f 100644 --- a/Documentation/networking/rxrpc.txt +++ b/Documentation/networking/rxrpc.txt @@ -725,7 +725,8 @@ The kernel interface functions are as follows: (*) End a client call. - void rxrpc_kernel_end_call(struct rxrpc_call *call); + void rxrpc_kernel_end_call(struct socket *sock, + struct rxrpc_call *call); This is used to end a previously begun call. The user_call_ID is expunged from AF_RXRPC's knowledge and will not be seen again in association with @@ -733,7 +734,9 @@ The kernel interface functions are as follows: (*) Send data through a call. - int rxrpc_kernel_send_data(struct rxrpc_call *call, struct msghdr *msg, + int rxrpc_kernel_send_data(struct socket *sock, + struct rxrpc_call *call, + struct msghdr *msg, size_t len); This is used to supply either the request part of a client call or the @@ -745,9 +748,42 @@ The kernel interface functions are as follows: The msg must not specify a destination address, control data or any flags other than MSG_MORE. len is the total amount of data to transmit. + (*) Receive data from a call. + + int rxrpc_kernel_recv_data(struct socket *sock, + struct rxrpc_call *call, + void *buf, + size_t size, + size_t *_offset, + bool want_more, + u32 *_abort) + + This is used to receive data from either the reply part of a client call + or the request part of a service call. buf and size specify how much + data is desired and where to store it. *_offset is added on to buf and + subtracted from size internally; the amount copied into the buffer is + added to *_offset before returning. + + want_more should be true if further data will be required after this is + satisfied and false if this is the last item of the receive phase. + + There are three normal returns: 0 if the buffer was filled and want_more + was true; 1 if the buffer was filled, the last DATA packet has been + emptied and want_more was false; and -EAGAIN if the function needs to be + called again. + + If the last DATA packet is processed but the buffer contains less than + the amount requested, EBADMSG is returned. If want_more wasn't set, but + more data was available, EMSGSIZE is returned. + + If a remote ABORT is detected, the abort code received will be stored in + *_abort and ECONNABORTED will be returned. + (*) Abort a call. - void rxrpc_kernel_abort_call(struct rxrpc_call *call, u32 abort_code); + void rxrpc_kernel_abort_call(struct socket *sock, + struct rxrpc_call *call, + u32 abort_code); This is used to abort a call if it's still in an abortable state. The abort code specified will be placed in the ABORT message sent. @@ -820,47 +856,6 @@ The kernel interface functions are as follows: Other errors may be returned if the call had been aborted (-ECONNABORTED) or had timed out (-ETIME). - (*) Record the delivery of a data message. - - void rxrpc_kernel_data_consumed(struct rxrpc_call *call, - struct sk_buff *skb); - - This is used to record a data message as having been consumed and to - update the ACK state for the call. The message must still be passed to - rxrpc_kernel_free_skb() for disposal by the caller. - - (*) Free a message. - - void rxrpc_kernel_free_skb(struct sk_buff *skb); - - This is used to free a non-DATA socket buffer intercepted from an AF_RXRPC - socket. - - (*) Determine if a data message is the last one on a call. - - bool rxrpc_kernel_is_data_last(struct sk_buff *skb); - - This is used to determine if a socket buffer holds the last data message - to be received for a call (true will be returned if it does, false - if not). - - The data message will be part of the reply on a client call and the - request on an incoming call. In the latter case there will be more - messages, but in the former case there will not. - - (*) Get the abort code from an abort message. - - u32 rxrpc_kernel_get_abort_code(struct sk_buff *skb); - - This is used to extract the abort code from a remote abort message. - - (*) Get the error number from a local or network error message. - - int rxrpc_kernel_get_error_number(struct sk_buff *skb); - - This is used to extract the error number from a message indicating either - a local error occurred or a network error occurred. - (*) Allocate a null key for doing anonymous security. struct key *rxrpc_get_null_key(const char *keyname); @@ -868,6 +863,13 @@ The kernel interface functions are as follows: This is used to allocate a null RxRPC key that can be used to indicate anonymous security for a particular domain. + (*) Get the peer address of a call. + + void rxrpc_kernel_get_peer(struct socket *sock, struct rxrpc_call *call, + struct sockaddr_rxrpc *_srx); + + This is used to find the remote peer address of a call. + ======================= CONFIGURABLE PARAMETERS diff --git a/Documentation/networking/strparser.txt b/Documentation/networking/strparser.txt new file mode 100644 index 000000000000..a0bf573dfa61 --- /dev/null +++ b/Documentation/networking/strparser.txt @@ -0,0 +1,136 @@ +Stream Parser +------------- + +The stream parser (strparser) is a utility that parses messages of an +application layer protocol running over a TCP connection. The stream +parser works in conjunction with an upper layer in the kernel to provide +kernel support for application layer messages. For instance, Kernel +Connection Multiplexor (KCM) uses the Stream Parser to parse messages +using a BPF program. + +Interface +--------- + +The API includes a context structure, a set of callbacks, utility +functions, and a data_ready function. The callbacks include +a parse_msg function that is called to perform parsing (e.g. +BPF parsing in case of KCM), and a rcv_msg function that is called +when a full message has been completed. + +A stream parser can be instantiated for a TCP connection. This is done +by: + +strp_init(struct strparser *strp, struct sock *csk, + struct strp_callbacks *cb) + +strp is a struct of type strparser that is allocated by the upper layer. +csk is the TCP socket associated with the stream parser. Callbacks are +called by the stream parser. + +Callbacks +--------- + +There are four callbacks: + +int (*parse_msg)(struct strparser *strp, struct sk_buff *skb); + + parse_msg is called to determine the length of the next message + in the stream. The upper layer must implement this function. It + should parse the sk_buff as containing the headers for the + next application layer messages in the stream. + + The skb->cb in the input skb is a struct strp_rx_msg. Only + the offset field is relevant in parse_msg and gives the offset + where the message starts in the skb. + + The return values of this function are: + + >0 : indicates length of successfully parsed message + 0 : indicates more data must be received to parse the message + -ESTRPIPE : current message should not be processed by the + kernel, return control of the socket to userspace which + can proceed to read the messages itself + other < 0 : Error is parsing, give control back to userspace + assuming that synchronization is lost and the stream + is unrecoverable (application expected to close TCP socket) + + In the case that an error is returned (return value is less than + zero) the stream parser will set the error on TCP socket and wake + it up. If parse_msg returned -ESTRPIPE and the stream parser had + previously read some bytes for the current message, then the error + set on the attached socket is ENODATA since the stream is + unrecoverable in that case. + +void (*rcv_msg)(struct strparser *strp, struct sk_buff *skb); + + rcv_msg is called when a full message has been received and + is queued. The callee must consume the sk_buff; it can + call strp_pause to prevent any further messages from being + received in rcv_msg (see strp_pause below). This callback + must be set. + + The skb->cb in the input skb is a struct strp_rx_msg. This + struct contains two fields: offset and full_len. Offset is + where the message starts in the skb, and full_len is the + the length of the message. skb->len - offset may be greater + then full_len since strparser does not trim the skb. + +int (*read_sock_done)(struct strparser *strp, int err); + + read_sock_done is called when the stream parser is done reading + the TCP socket. The stream parser may read multiple messages + in a loop and this function allows cleanup to occur when existing + the loop. If the callback is not set (NULL in strp_init) a + default function is used. + +void (*abort_parser)(struct strparser *strp, int err); + + This function is called when stream parser encounters an error + in parsing. The default function stops the stream parser for the + TCP socket and sets the error in the socket. The default function + can be changed by setting the callback to non-NULL in strp_init. + +Functions +--------- + +The upper layer calls strp_tcp_data_ready when data is ready on the lower +socket for strparser to process. This should be called from a data_ready +callback that is set on the socket. + +strp_stop is called to completely stop stream parser operations. This +is called internally when the stream parser encounters an error, and +it is called from the upper layer when unattaching a TCP socket. + +strp_done is called to unattach the stream parser from the TCP socket. +This must be called after the stream processor has be stopped. + +strp_check_rcv is called to check for new messages on the socket. This +is normally called at initialization of the a stream parser instance +of after strp_unpause. + +Statistics +---------- + +Various counters are kept for each stream parser for a TCP socket. +These are in the strp_stats structure. strp_aggr_stats is a convenience +structure for accumulating statistics for multiple stream parser +instances. save_strp_stats and aggregate_strp_stats are helper functions +to save and aggregate statistics. + +Message assembly limits +----------------------- + +The stream parser provide mechanisms to limit the resources consumed by +message assembly. + +A timer is set when assembly starts for a new message. The message +timeout is taken from rcvtime for the associated TCP socket. If the +timer fires before assembly completes the stream parser is aborted +and the ETIMEDOUT error is set on the TCP socket. + +Message length is limited to the receive buffer size of the associated +TCP socket. If the length returned by parse_msg is greater than +the socket buffer size then the stream parser is aborted with +EMSGSIZE error set on the TCP socket. Note that this makes the +maximum size of receive skbuffs for a socket with a stream parser +to be 2*sk_rcvbuf of the TCP socket. diff --git a/Documentation/networking/switchdev.txt b/Documentation/networking/switchdev.txt index 31c39115834d..2bbac05ab9e2 100644 --- a/Documentation/networking/switchdev.txt +++ b/Documentation/networking/switchdev.txt @@ -283,15 +283,10 @@ be sent to the port netdev for processing by the bridge driver. The bridge should not reflood the packet to the same ports the device flooded, otherwise there will be duplicate packets on the wire. -To avoid duplicate packets, the device/driver should mark a packet as already -forwarded using skb->offload_fwd_mark. The same mark is set on the device -ports in the domain using dev->offload_fwd_mark. If the skb->offload_fwd_mark -is non-zero and matches the forwarding egress port's dev->skb_mark, the kernel -will drop the skb right before transmit on the egress port, with the -understanding that the device already forwarded the packet on same egress port. -The driver can use switchdev_port_fwd_mark_set() to set a globally unique mark -for port's dev->offload_fwd_mark, based on the port's parent ID (switch ID) and -a group ifindex. +To avoid duplicate packets, the switch driver should mark a packet as already +forwarded by setting the skb->offload_fwd_mark bit. The bridge driver will mark +the skb using the ingress bridge port's mark and prevent it from being forwarded +through any bridge port with the same mark. It is possible for the switch device to not handle flooding and push the packets up to the bridge driver for flooding. This is not ideal as the number @@ -319,30 +314,29 @@ the kernel, with the device doing the FIB lookup and forwarding. The device does a longest prefix match (LPM) on FIB entries matching route prefix and forwards the packet to the matching FIB entry's nexthop(s) egress ports. -To program the device, the driver implements support for -SWITCHDEV_OBJ_IPV[4|6]_FIB object using switchdev_port_obj_xxx ops. -switchdev_port_obj_add is used for both adding a new FIB entry to the device, -or modifying an existing entry on the device. +To program the device, the driver has to register a FIB notifier handler +using register_fib_notifier. The following events are available: +FIB_EVENT_ENTRY_ADD: used for both adding a new FIB entry to the device, + or modifying an existing entry on the device. +FIB_EVENT_ENTRY_DEL: used for removing a FIB entry +FIB_EVENT_RULE_ADD, FIB_EVENT_RULE_DEL: used to propagate FIB rule changes -XXX: Currently, only SWITCHDEV_OBJ_ID_IPV4_FIB objects are supported. +FIB_EVENT_ENTRY_ADD and FIB_EVENT_ENTRY_DEL events pass: -SWITCHDEV_OBJ_ID_IPV4_FIB object passes: - - struct switchdev_obj_ipv4_fib { /* IPV4_FIB */ + struct fib_entry_notifier_info { + struct fib_notifier_info info; /* must be first */ u32 dst; int dst_len; struct fib_info *fi; u8 tos; u8 type; - u32 nlflags; u32 tb_id; - } ipv4_fib; + u32 nlflags; + }; to add/modify/delete IPv4 dst/dest_len prefix on table tb_id. The *fi structure holds details on the route and route's nexthops. *dev is one of the -port netdevs mentioned in the routes next hop list. If the output port netdevs -referenced in the route's nexthop list don't all have the same switch ID, the -driver is not called to add/modify/delete the FIB entry. +port netdevs mentioned in the route's next hop list. Routes offloaded to the device are labeled with "offload" in the ip route listing: @@ -360,6 +354,8 @@ listing: 12.0.0.4 via 11.0.0.9 dev sw1p2 proto zebra metric 20 offload 192.168.0.0/24 dev eth0 proto kernel scope link src 192.168.0.15 +The "offload" flag is set in case at least one device offloads the FIB entry. + XXX: add/mod/del IPv6 FIB API Nexthop Resolution diff --git a/Documentation/remoteproc.txt b/Documentation/remoteproc.txt index ef0219fa4bb4..f07597482351 100644 --- a/Documentation/remoteproc.txt +++ b/Documentation/remoteproc.txt @@ -101,9 +101,9 @@ int dummy_rproc_example(struct rproc *my_rproc) On success, the new rproc is returned, and on failure, NULL. Note: _never_ directly deallocate @rproc, even if it was not registered - yet. Instead, when you need to unroll rproc_alloc(), use rproc_put(). + yet. Instead, when you need to unroll rproc_alloc(), use rproc_free(). - void rproc_put(struct rproc *rproc) + void rproc_free(struct rproc *rproc) - Free an rproc handle that was allocated by rproc_alloc. This function essentially unrolls rproc_alloc(), by decrementing the rproc's refcount. It doesn't directly free rproc; that would happen @@ -131,7 +131,7 @@ int dummy_rproc_example(struct rproc *my_rproc) has completed successfully. After rproc_del() returns, @rproc is still valid, and its - last refcount should be decremented by calling rproc_put(). + last refcount should be decremented by calling rproc_free(). Returns 0 on success and -EINVAL if @rproc isn't valid. diff --git a/Documentation/scheduler/sched-deadline.txt b/Documentation/scheduler/sched-deadline.txt index 53a2fe1ae8b8..8e37b0ba2c9d 100644 --- a/Documentation/scheduler/sched-deadline.txt +++ b/Documentation/scheduler/sched-deadline.txt @@ -16,6 +16,7 @@ CONTENTS 4.1 System-wide settings 4.2 Task interface 4.3 Default behavior + 4.4 Behavior of sched_yield() 5. Tasks CPU affinity 5.1 SCHED_DEADLINE and cpusets HOWTO 6. Future plans @@ -426,6 +427,23 @@ CONTENTS Finally, notice that in order not to jeopardize the admission control a -deadline task cannot fork. + +4.4 Behavior of sched_yield() +----------------------------- + + When a SCHED_DEADLINE task calls sched_yield(), it gives up its + remaining runtime and is immediately throttled, until the next + period, when its runtime will be replenished (a special flag + dl_yielded is set and used to handle correctly throttling and runtime + replenishment after a call to sched_yield()). + + This behavior of sched_yield() allows the task to wake-up exactly at + the beginning of the next period. Also, this may be useful in the + future with bandwidth reclaiming mechanisms, where sched_yield() will + make the leftoever runtime available for reclamation by other + SCHED_DEADLINE tasks. + + 5. Tasks CPU affinity ===================== diff --git a/Documentation/scsi/scsi-parameters.txt b/Documentation/scsi/scsi-parameters.txt index 1241ac11edb1..d5ae6ced6be3 100644 --- a/Documentation/scsi/scsi-parameters.txt +++ b/Documentation/scsi/scsi-parameters.txt @@ -79,8 +79,6 @@ parameters may be changed at runtime by the command ncr53c8xx= [HW,SCSI] - nodisconnect [HW,SCSI,M68K] Disables SCSI disconnects. - osst= [HW,SCSI] SCSI Tape Driver Format: <buffer_size>,<write_threshold> See also Documentation/scsi/st.txt. diff --git a/Documentation/serial/serial-rs485.txt b/Documentation/serial/serial-rs485.txt index 2253b8b45a74..389fcd4759e9 100644 --- a/Documentation/serial/serial-rs485.txt +++ b/Documentation/serial/serial-rs485.txt @@ -45,9 +45,8 @@ #include <linux/serial.h> - /* RS485 ioctls: */ - #define TIOCGRS485 0x542E - #define TIOCSRS485 0x542F + /* Include definition for RS485 ioctls: TIOCGRS485 and TIOCSRS485 */ + #include <sys/ioctl.h> /* Open your specific device (e.g., /dev/mydevice): */ int fd = open ("/dev/mydevice", O_RDWR); diff --git a/Documentation/sphinx-static/theme_overrides.css b/Documentation/sphinx-static/theme_overrides.css index e88461c4c1e6..d5764a4de5a2 100644 --- a/Documentation/sphinx-static/theme_overrides.css +++ b/Documentation/sphinx-static/theme_overrides.css @@ -42,6 +42,20 @@ caption a.headerlink { opacity: 0; } caption a.headerlink:hover { opacity: 1; } + /* Menu selection and keystrokes */ + + span.menuselection { + color: blue; + font-family: "Courier New", Courier, monospace + } + + code.kbd, code.kbd span { + color: white; + background-color: darkblue; + font-weight: bold; + font-family: "Courier New", Courier, monospace + } + /* inline literal: drop the borderbox, padding and red color */ code, .rst-content tt, .rst-content code { @@ -55,5 +69,4 @@ .rst-content tt.literal,.rst-content tt.literal,.rst-content code.literal { color: inherit; } - } diff --git a/Documentation/sphinx/cdomain.py b/Documentation/sphinx/cdomain.py new file mode 100644 index 000000000000..df0419c62096 --- /dev/null +++ b/Documentation/sphinx/cdomain.py @@ -0,0 +1,165 @@ +# -*- coding: utf-8; mode: python -*- +# pylint: disable=W0141,C0113,C0103,C0325 +u""" + cdomain + ~~~~~~~ + + Replacement for the sphinx c-domain. + + :copyright: Copyright (C) 2016 Markus Heiser + :license: GPL Version 2, June 1991 see Linux/COPYING for details. + + List of customizations: + + * Moved the *duplicate C object description* warnings for function + declarations in the nitpicky mode. See Sphinx documentation for + the config values for ``nitpick`` and ``nitpick_ignore``. + + * Add option 'name' to the "c:function:" directive. With option 'name' the + ref-name of a function can be modified. E.g.:: + + .. c:function:: int ioctl( int fd, int request ) + :name: VIDIOC_LOG_STATUS + + The func-name (e.g. ioctl) remains in the output but the ref-name changed + from 'ioctl' to 'VIDIOC_LOG_STATUS'. The function is referenced by:: + + * :c:func:`VIDIOC_LOG_STATUS` or + * :any:`VIDIOC_LOG_STATUS` (``:any:`` needs sphinx 1.3) + + * Handle signatures of function-like macros well. Don't try to deduce + arguments types of function-like macros. + +""" + +from docutils import nodes +from docutils.parsers.rst import directives + +import sphinx +from sphinx import addnodes +from sphinx.domains.c import c_funcptr_sig_re, c_sig_re +from sphinx.domains.c import CObject as Base_CObject +from sphinx.domains.c import CDomain as Base_CDomain + +__version__ = '1.0' + +# Get Sphinx version +major, minor, patch = map(int, sphinx.__version__.split(".")) + +def setup(app): + + app.override_domain(CDomain) + + return dict( + version = __version__, + parallel_read_safe = True, + parallel_write_safe = True + ) + +class CObject(Base_CObject): + + """ + Description of a C language object. + """ + option_spec = { + "name" : directives.unchanged + } + + def handle_func_like_macro(self, sig, signode): + u"""Handles signatures of function-like macros. + + If the objtype is 'function' and the the signature ``sig`` is a + function-like macro, the name of the macro is returned. Otherwise + ``False`` is returned. """ + + if not self.objtype == 'function': + return False + + m = c_funcptr_sig_re.match(sig) + if m is None: + m = c_sig_re.match(sig) + if m is None: + raise ValueError('no match') + + rettype, fullname, arglist, _const = m.groups() + arglist = arglist.strip() + if rettype or not arglist: + return False + + arglist = arglist.replace('`', '').replace('\\ ', '') # remove markup + arglist = [a.strip() for a in arglist.split(",")] + + # has the first argument a type? + if len(arglist[0].split(" ")) > 1: + return False + + # This is a function-like macro, it's arguments are typeless! + signode += addnodes.desc_name(fullname, fullname) + paramlist = addnodes.desc_parameterlist() + signode += paramlist + + for argname in arglist: + param = addnodes.desc_parameter('', '', noemph=True) + # separate by non-breaking space in the output + param += nodes.emphasis(argname, argname) + paramlist += param + + return fullname + + def handle_signature(self, sig, signode): + """Transform a C signature into RST nodes.""" + + fullname = self.handle_func_like_macro(sig, signode) + if not fullname: + fullname = super(CObject, self).handle_signature(sig, signode) + + if "name" in self.options: + if self.objtype == 'function': + fullname = self.options["name"] + else: + # FIXME: handle :name: value of other declaration types? + pass + return fullname + + def add_target_and_index(self, name, sig, signode): + # for C API items we add a prefix since names are usually not qualified + # by a module name and so easily clash with e.g. section titles + targetname = 'c.' + name + if targetname not in self.state.document.ids: + signode['names'].append(targetname) + signode['ids'].append(targetname) + signode['first'] = (not self.names) + self.state.document.note_explicit_target(signode) + inv = self.env.domaindata['c']['objects'] + if (name in inv and self.env.config.nitpicky): + if self.objtype == 'function': + if ('c:func', name) not in self.env.config.nitpick_ignore: + self.state_machine.reporter.warning( + 'duplicate C object description of %s, ' % name + + 'other instance in ' + self.env.doc2path(inv[name][0]), + line=self.lineno) + inv[name] = (self.env.docname, self.objtype) + + indextext = self.get_index_text(name) + if indextext: + if major == 1 and minor < 4: + # indexnode's tuple changed in 1.4 + # https://github.com/sphinx-doc/sphinx/commit/e6a5a3a92e938fcd75866b4227db9e0524d58f7c + self.indexnode['entries'].append( + ('single', indextext, targetname, '')) + else: + self.indexnode['entries'].append( + ('single', indextext, targetname, '', None)) + +class CDomain(Base_CDomain): + + """C language domain.""" + name = 'c' + label = 'C' + directives = { + 'function': CObject, + 'member': CObject, + 'macro': CObject, + 'type': CObject, + 'var': CObject, + } diff --git a/Documentation/sphinx/kernel-doc.py b/Documentation/sphinx/kernel-doc.py index f6920c0af6ee..d15e07f36881 100644 --- a/Documentation/sphinx/kernel-doc.py +++ b/Documentation/sphinx/kernel-doc.py @@ -39,6 +39,8 @@ from docutils.parsers.rst import directives from sphinx.util.compat import Directive from sphinx.ext.autodoc import AutodocReporter +__version__ = '1.0' + class KernelDocDirective(Directive): """Extract kernel-doc comments from the specified file""" required_argument = 1 @@ -139,3 +141,9 @@ def setup(app): app.add_config_value('kerneldoc_verbosity', 1, 'env') app.add_directive('kernel-doc', KernelDocDirective) + + return dict( + version = __version__, + parallel_read_safe = True, + parallel_write_safe = True + ) diff --git a/Documentation/sphinx/kernel_include.py b/Documentation/sphinx/kernel_include.py index db5738238733..f523aa68a36b 100755 --- a/Documentation/sphinx/kernel_include.py +++ b/Documentation/sphinx/kernel_include.py @@ -39,11 +39,18 @@ from docutils.parsers.rst import directives from docutils.parsers.rst.directives.body import CodeBlock, NumberLines from docutils.parsers.rst.directives.misc import Include +__version__ = '1.0' + # ============================================================================== def setup(app): # ============================================================================== app.add_directive("kernel-include", KernelInclude) + return dict( + version = __version__, + parallel_read_safe = True, + parallel_write_safe = True + ) # ============================================================================== class KernelInclude(Include): diff --git a/Documentation/sphinx/load_config.py b/Documentation/sphinx/load_config.py new file mode 100644 index 000000000000..301a21aa4f63 --- /dev/null +++ b/Documentation/sphinx/load_config.py @@ -0,0 +1,32 @@ +# -*- coding: utf-8; mode: python -*- +# pylint: disable=R0903, C0330, R0914, R0912, E0401 + +import os +import sys +from sphinx.util.pycompat import execfile_ + +# ------------------------------------------------------------------------------ +def loadConfig(namespace): +# ------------------------------------------------------------------------------ + + u"""Load an additional configuration file into *namespace*. + + The name of the configuration file is taken from the environment + ``SPHINX_CONF``. The external configuration file extends (or overwrites) the + configuration values from the origin ``conf.py``. With this you are able to + maintain *build themes*. """ + + config_file = os.environ.get("SPHINX_CONF", None) + if (config_file is not None + and os.path.normpath(namespace["__file__"]) != os.path.normpath(config_file) ): + config_file = os.path.abspath(config_file) + + if os.path.isfile(config_file): + sys.stdout.write("load additional sphinx-config: %s\n" % config_file) + config = namespace.copy() + config['__file__'] = config_file + execfile_(config_file, config) + del config['__file__'] + namespace.update(config) + else: + sys.stderr.write("WARNING: additional sphinx-config not found: %s\n" % config_file) diff --git a/Documentation/sphinx/parse-headers.pl b/Documentation/sphinx/parse-headers.pl index 34bd9e2630b0..74089b0da798 100755 --- a/Documentation/sphinx/parse-headers.pl +++ b/Documentation/sphinx/parse-headers.pl @@ -220,7 +220,7 @@ $data =~ s/\n\s+\n/\n\n/g; # # Add escape codes for special characters # -$data =~ s,([\_\`\*\<\>\&\\\\:\/\|]),\\$1,g; +$data =~ s,([\_\`\*\<\>\&\\\\:\/\|\%\$\#\{\}\~\^]),\\$1,g; $data =~ s,DEPRECATED,**DEPRECATED**,g; diff --git a/Documentation/sphinx/rstFlatTable.py b/Documentation/sphinx/rstFlatTable.py index 26db852e3c74..55f275793028 100644..100755 --- a/Documentation/sphinx/rstFlatTable.py +++ b/Documentation/sphinx/rstFlatTable.py @@ -73,6 +73,12 @@ def setup(app): roles.register_local_role('cspan', c_span) roles.register_local_role('rspan', r_span) + return dict( + version = __version__, + parallel_read_safe = True, + parallel_write_safe = True + ) + # ============================================================================== def c_span(name, rawtext, text, lineno, inliner, options=None, content=None): # ============================================================================== diff --git a/Documentation/stable_api_nonsense.txt b/Documentation/stable_api_nonsense.txt index db3be892afb2..24f5aeecee91 100644 --- a/Documentation/stable_api_nonsense.txt +++ b/Documentation/stable_api_nonsense.txt @@ -1,17 +1,26 @@ +.. _stable_api_nonsense: + The Linux Kernel Driver Interface +================================== + (all of your questions answered and then some) Greg Kroah-Hartman <greg@kroah.com> -This is being written to try to explain why Linux does not have a binary -kernel interface, nor does it have a stable kernel interface. Please -realize that this article describes the _in kernel_ interfaces, not the -kernel to userspace interfaces. The kernel to userspace interface is -the one that application programs use, the syscall interface. That -interface is _very_ stable over time, and will not break. I have old -programs that were built on a pre 0.9something kernel that still work -just fine on the latest 2.6 kernel release. That interface is the one -that users and application programmers can count on being stable. +This is being written to try to explain why Linux **does not have a binary +kernel interface, nor does it have a stable kernel interface**. + +.. note:: + + Please realize that this article describes the **in kernel** interfaces, not + the kernel to userspace interfaces. + + The kernel to userspace interface is the one that application programs use, + the syscall interface. That interface is **very** stable over time, and + will not break. I have old programs that were built on a pre 0.9something + kernel that still work just fine on the latest 2.6 kernel release. + That interface is the one that users and application programmers can count + on being stable. Executive Summary @@ -33,7 +42,7 @@ to worry about the in-kernel interfaces changing. For the majority of the world, they neither see this interface, nor do they care about it at all. -First off, I'm not going to address _any_ legal issues about closed +First off, I'm not going to address **any** legal issues about closed source, hidden source, binary blobs, source wrappers, or any other term that describes kernel drivers that do not have their source code released under the GPL. Please consult a lawyer if you have any legal @@ -51,19 +60,23 @@ Binary Kernel Interface Assuming that we had a stable kernel source interface for the kernel, a binary interface would naturally happen too, right? Wrong. Please consider the following facts about the Linux kernel: + - Depending on the version of the C compiler you use, different kernel data structures will contain different alignment of structures, and possibly include different functions in different ways (putting functions inline or not.) The individual function organization isn't that important, but the different data structure padding is very important. + - Depending on what kernel build options you select, a wide range of different things can be assumed by the kernel: + - different structures can contain different fields - Some functions may not be implemented at all, (i.e. some locks compile away to nothing for non-SMP builds.) - Memory within the kernel can be aligned in different ways, depending on the build options. + - Linux runs on a wide range of different processor architectures. There is no way that binary drivers from one architecture will run on another architecture properly. @@ -105,6 +118,7 @@ As a specific examples of this, the in-kernel USB interfaces have undergone at least three different reworks over the lifetime of this subsystem. These reworks were done to address a number of different issues: + - A change from a synchronous model of data streams to an asynchronous one. This reduced the complexity of a number of drivers and increased the throughput of all USB drivers such that we are now @@ -166,6 +180,7 @@ very little effort on your part. The very good side effects of having your driver in the main kernel tree are: + - The quality of the driver will rise as the maintenance costs (to the original developer) will decrease. - Other developers will add features to your driver. @@ -175,7 +190,7 @@ are: changes require it. - The driver automatically gets shipped in all Linux distributions without having to ask the distros to add it. - + As Linux supports a larger number of different devices "out of the box" than any other operating system, and it supports these devices on more different processor architectures than any other operating system, this diff --git a/Documentation/stable_kernel_rules.txt b/Documentation/stable_kernel_rules.txt index ffd4575ec9f2..4d82e31b7958 100644 --- a/Documentation/stable_kernel_rules.txt +++ b/Documentation/stable_kernel_rules.txt @@ -1,4 +1,7 @@ -Everything you ever wanted to know about Linux -stable releases. +.. _stable_kernel_rules: + +Everything you ever wanted to know about Linux -stable releases +=============================================================== Rules on what kind of patches are accepted, and which ones are not, into the "-stable" tree: @@ -23,68 +26,94 @@ Rules on what kind of patches are accepted, and which ones are not, into the race can be exploited is also provided. - It cannot contain any "trivial" fixes in it (spelling changes, whitespace cleanups, etc). - - It must follow the Documentation/SubmittingPatches rules. + - It must follow the + :ref:`Documentation/SubmittingPatches <submittingpatches>` + rules. - It or an equivalent fix must already exist in Linus' tree (upstream). -Procedure for submitting patches to the -stable tree: +Procedure for submitting patches to the -stable tree +---------------------------------------------------- - If the patch covers files in net/ or drivers/net please follow netdev stable submission guidelines as described in Documentation/networking/netdev-FAQ.txt - Security patches should not be handled (solely) by the -stable review - process but should follow the procedures in Documentation/SecurityBugs. + process but should follow the procedures in + :ref:`Documentation/SecurityBugs <securitybugs>`. + +For all other submissions, choose one of the following procedures +----------------------------------------------------------------- + +.. _option_1: -For all other submissions, choose one of the following procedures: +Option 1 +******** - --- Option 1 --- +To have the patch automatically included in the stable tree, add the tag + +.. code-block:: none - To have the patch automatically included in the stable tree, add the tag Cc: stable@vger.kernel.org - in the sign-off area. Once the patch is merged it will be applied to - the stable tree without anything else needing to be done by the author - or subsystem maintainer. - --- Option 2 --- +in the sign-off area. Once the patch is merged it will be applied to +the stable tree without anything else needing to be done by the author +or subsystem maintainer. + +.. _option_2: - After the patch has been merged to Linus' tree, send an email to - stable@vger.kernel.org containing the subject of the patch, the commit ID, - why you think it should be applied, and what kernel version you wish it to - be applied to. +Option 2 +******** - --- Option 3 --- +After the patch has been merged to Linus' tree, send an email to +stable@vger.kernel.org containing the subject of the patch, the commit ID, +why you think it should be applied, and what kernel version you wish it to +be applied to. - Send the patch, after verifying that it follows the above rules, to - stable@vger.kernel.org. You must note the upstream commit ID in the - changelog of your submission, as well as the kernel version you wish - it to be applied to. +.. _option_3: -Option 1 is *strongly* preferred, is the easiest and most common. Options 2 and -3 are more useful if the patch isn't deemed worthy at the time it is applied to -a public git tree (for instance, because it deserves more regression testing -first). Option 3 is especially useful if the patch needs some special handling -to apply to an older kernel (e.g., if API's have changed in the meantime). +Option 3 +******** -Note that for Option 3, if the patch deviates from the original upstream patch -(for example because it had to be backported) this must be very clearly -documented and justified in the patch description. +Send the patch, after verifying that it follows the above rules, to +stable@vger.kernel.org. You must note the upstream commit ID in the +changelog of your submission, as well as the kernel version you wish +it to be applied to. + +:ref:`option_1` is **strongly** preferred, is the easiest and most common. +:ref:`option_2` and :ref:`option_3` are more useful if the patch isn't deemed +worthy at the time it is applied to a public git tree (for instance, because +it deserves more regression testing first). :ref:`option_3` is especially +useful if the patch needs some special handling to apply to an older kernel +(e.g., if API's have changed in the meantime). + +Note that for :ref:`option_3`, if the patch deviates from the original +upstream patch (for example because it had to be backported) this must be very +clearly documented and justified in the patch description. The upstream commit ID must be specified with a separate line above the commit text, like this: +.. code-block:: none + commit <sha1> upstream. Additionally, some patches submitted via Option 1 may have additional patch prerequisites which can be cherry-picked. This can be specified in the following format in the sign-off area: +.. code-block:: none + Cc: <stable@vger.kernel.org> # 3.3.x: a1f84a3: sched: Check for idle Cc: <stable@vger.kernel.org> # 3.3.x: 1b9508f: sched: Rate-limit newidle Cc: <stable@vger.kernel.org> # 3.3.x: fd21073: sched: Fix affinity logic Cc: <stable@vger.kernel.org> # 3.3.x - Signed-off-by: Ingo Molnar <mingo@elte.hu> + Signed-off-by: Ingo Molnar <mingo@elte.hu> + +The tag sequence has the meaning of: + +.. code-block:: none - The tag sequence has the meaning of: git cherry-pick a1f84a3 git cherry-pick 1b9508f git cherry-pick fd21073 @@ -93,12 +122,17 @@ format in the sign-off area: Also, some patches may have kernel version prerequisites. This can be specified in the following format in the sign-off area: +.. code-block:: none + Cc: <stable@vger.kernel.org> # 3.3.x- - The tag has the meaning of: +The tag has the meaning of: + +.. code-block:: none + git cherry-pick <this commit> - For each "-stable" tree starting with the specified version. +For each "-stable" tree starting with the specified version. Following the submission: @@ -109,7 +143,8 @@ Following the submission: other developers and by the relevant subsystem maintainer. -Review cycle: +Review cycle +------------ - When the -stable maintainers decide for a review cycle, the patches will be sent to the review committee, and the maintainer of the affected area of @@ -125,17 +160,22 @@ Review cycle: security kernel team, and not go through the normal review cycle. Contact the kernel security team for more details on this procedure. -Trees: +Trees +----- - The queues of patches, for both completed versions and in progress versions can be found at: + http://git.kernel.org/?p=linux/kernel/git/stable/stable-queue.git + - The finalized and tagged releases of all stable kernels can be found in separate branches per version at: + http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git -Review committee: +Review committee +---------------- - This is made up of a number of kernel developers who have volunteered for this task, and a few that haven't. diff --git a/Documentation/static-keys.txt b/Documentation/static-keys.txt index 477927becacb..ea8d7b4e53f0 100644 --- a/Documentation/static-keys.txt +++ b/Documentation/static-keys.txt @@ -15,6 +15,8 @@ The updated API replacements are: DEFINE_STATIC_KEY_TRUE(key); DEFINE_STATIC_KEY_FALSE(key); +DEFINE_STATIC_KEY_ARRAY_TRUE(keys, count); +DEFINE_STATIC_KEY_ARRAY_FALSE(keys, count); static_branch_likely() static_branch_unlikely() @@ -140,6 +142,13 @@ static_branch_inc(), will change the branch back to true. Likewise, if the key is initialized false, a 'static_branch_inc()', will change the branch to true. And then a 'static_branch_dec()', will again make the branch false. +Where an array of keys is required, it can be defined as: + + DEFINE_STATIC_KEY_ARRAY_TRUE(keys, count); + +or: + + DEFINE_STATIC_KEY_ARRAY_FALSE(keys, count); 4) Architecture level code patching interface, 'jump labels' diff --git a/Documentation/sysctl/README b/Documentation/sysctl/README index 8c3306e01d52..91f54ffa0077 100644 --- a/Documentation/sysctl/README +++ b/Documentation/sysctl/README @@ -69,6 +69,7 @@ proc/ <empty> sunrpc/ SUN Remote Procedure Call (NFS) vm/ memory management tuning buffer and cache management +user/ Per user per user namespace limits These are the subdirs I have on my system. There might be more or other subdirs in another setup. If you see another dir, I'd diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt index 302b5ed616a6..35e17f748ca7 100644 --- a/Documentation/sysctl/fs.txt +++ b/Documentation/sysctl/fs.txt @@ -265,6 +265,13 @@ aio-nr can grow to. ============================================================== +mount-max: + +This denotes the maximum number of mounts that may exist +in a mount namespace. + +============================================================== + 2. /proc/sys/fs/binfmt_misc ---------------------------------------------------------- diff --git a/Documentation/sysctl/user.txt b/Documentation/sysctl/user.txt new file mode 100644 index 000000000000..1291c498f78f --- /dev/null +++ b/Documentation/sysctl/user.txt @@ -0,0 +1,66 @@ +Documentation for /proc/sys/user/* kernel version 4.9.0 + (c) 2016 Eric Biederman <ebiederm@xmission.com> + +============================================================== + +This file contains the documetation for the sysctl files in +/proc/sys/user. + +The files in this directory can be used to override the default +limits on the number of namespaces and other objects that have +per user per user namespace limits. + +The primary purpose of these limits is to stop programs that +malfunction and attempt to create a ridiculous number of objects, +before the malfunction becomes a system wide problem. It is the +intention that the defaults of these limits are set high enough that +no program in normal operation should run into these limits. + +The creation of per user per user namespace objects are charged to +the user in the user namespace who created the object and +verified to be below the per user limit in that user namespace. + +The creation of objects is also charged to all of the users +who created user namespaces the creation of the object happens +in (user namespaces can be nested) and verified to be below the per user +limits in the user namespaces of those users. + +This recursive counting of created objects ensures that creating a +user namespace does not allow a user to escape their current limits. + +Currently, these files are in /proc/sys/user: + +- max_cgroup_namespaces + + The maximum number of cgroup namespaces that any user in the current + user namespace may create. + +- max_ipc_namespaces + + The maximum number of ipc namespaces that any user in the current + user namespace may create. + +- max_mnt_namespaces + + The maximum number of mount namespaces that any user in the current + user namespace may create. + +- max_net_namespaces + + The maximum number of network namespaces that any user in the + current user namespace may create. + +- max_pid_namespaces + + The maximum number of pid namespaces that any user in the current + user namespace may create. + +- max_user_namespaces + + The maximum number of user namespaces that any user in the current + user namespace may create. + +- max_uts_namespaces + + The maximum number of user namespaces that any user in the current + user namespace may create. diff --git a/Documentation/trace/ftrace-design.txt b/Documentation/trace/ftrace-design.txt index dd5f916b351d..a273dd0bbaaa 100644 --- a/Documentation/trace/ftrace-design.txt +++ b/Documentation/trace/ftrace-design.txt @@ -203,6 +203,17 @@ along to ftrace_push_return_trace() instead of a stub value of 0. Similarly, when you call ftrace_return_to_handler(), pass it the frame pointer. +HAVE_FUNCTION_GRAPH_RET_ADDR_PTR +-------------------------------- + +An arch may pass in a pointer to the return address on the stack. This +prevents potential stack unwinding issues where the unwinder gets out of +sync with ret_stack and the wrong addresses are reported by +ftrace_graph_ret_addr(). + +Adding support for it is easy: just define the macro in asm/ftrace.h and +pass the return address pointer as the 'retp' argument to +ftrace_push_return_trace(). HAVE_FTRACE_NMI_ENTER --------------------- diff --git a/Documentation/trace/ftrace.txt b/Documentation/trace/ftrace.txt index a6b3705e62a6..185c39fea2a0 100644 --- a/Documentation/trace/ftrace.txt +++ b/Documentation/trace/ftrace.txt @@ -858,11 +858,11 @@ x494] <- /root/a.out[+0x4a8] <- /lib/libc-2.7.so[+0x1e1a6] When enabled, it will account time the task has been scheduled out as part of the function call. - graph-time - When running function graph tracer, to include the - time to call nested functions. When this is not set, - the time reported for the function will only include - the time the function itself executed for, not the time - for functions that it called. + graph-time - When running function profiler with function graph tracer, + to include the time to call nested functions. When this is + not set, the time reported for the function will only + include the time the function itself executed for, not the + time for functions that it called. record-cmd - When any event or tracer is enabled, a hook is enabled in the sched_switch trace point to fill comm cache diff --git a/Documentation/trace/hwlat_detector.txt b/Documentation/trace/hwlat_detector.txt new file mode 100644 index 000000000000..3207717a0d1a --- /dev/null +++ b/Documentation/trace/hwlat_detector.txt @@ -0,0 +1,79 @@ +Introduction: +------------- + +The tracer hwlat_detector is a special purpose tracer that is used to +detect large system latencies induced by the behavior of certain underlying +hardware or firmware, independent of Linux itself. The code was developed +originally to detect SMIs (System Management Interrupts) on x86 systems, +however there is nothing x86 specific about this patchset. It was +originally written for use by the "RT" patch since the Real Time +kernel is highly latency sensitive. + +SMIs are not serviced by the Linux kernel, which means that it does not +even know that they are occuring. SMIs are instead set up by BIOS code +and are serviced by BIOS code, usually for "critical" events such as +management of thermal sensors and fans. Sometimes though, SMIs are used for +other tasks and those tasks can spend an inordinate amount of time in the +handler (sometimes measured in milliseconds). Obviously this is a problem if +you are trying to keep event service latencies down in the microsecond range. + +The hardware latency detector works by hogging one of the cpus for configurable +amounts of time (with interrupts disabled), polling the CPU Time Stamp Counter +for some period, then looking for gaps in the TSC data. Any gap indicates a +time when the polling was interrupted and since the interrupts are disabled, +the only thing that could do that would be an SMI or other hardware hiccup +(or an NMI, but those can be tracked). + +Note that the hwlat detector should *NEVER* be used in a production environment. +It is intended to be run manually to determine if the hardware platform has a +problem with long system firmware service routines. + +Usage: +------ + +Write the ASCII text "hwlat" into the current_tracer file of the tracing system +(mounted at /sys/kernel/tracing or /sys/kernel/tracing). It is possible to +redefine the threshold in microseconds (us) above which latency spikes will +be taken into account. + +Example: + + # echo hwlat > /sys/kernel/tracing/current_tracer + # echo 100 > /sys/kernel/tracing/tracing_thresh + +The /sys/kernel/tracing/hwlat_detector interface contains the following files: + +width - time period to sample with CPUs held (usecs) + must be less than the total window size (enforced) +window - total period of sampling, width being inside (usecs) + +By default the width is set to 500,000 and window to 1,000,000, meaning that +for every 1,000,000 usecs (1s) the hwlat detector will spin for 500,000 usecs +(0.5s). If tracing_thresh contains zero when hwlat tracer is enabled, it will +change to a default of 10 usecs. If any latencies that exceed the threshold is +observed then the data will be written to the tracing ring buffer. + +The minimum sleep time between periods is 1 millisecond. Even if width +is less than 1 millisecond apart from window, to allow the system to not +be totally starved. + +If tracing_thresh was zero when hwlat detector was started, it will be set +back to zero if another tracer is loaded. Note, the last value in +tracing_thresh that hwlat detector had will be saved and this value will +be restored in tracing_thresh if it is still zero when hwlat detector is +started again. + +The following tracing directory files are used by the hwlat_detector: + +in /sys/kernel/tracing: + + tracing_threshold - minimum latency value to be considered (usecs) + tracing_max_latency - maximum hardware latency actually observed (usecs) + tracing_cpumask - the CPUs to move the hwlat thread across + hwlat_detector/width - specified amount of time to spin within window (usecs) + hwlat_detector/window - amount of time between (width) runs (usecs) + +The hwlat detector's kernel thread will migrate across each CPU specified in +tracing_cpumask between each window. To limit the migration, either modify +tracing_cpumask, or modify the hwlat kernel thread (named [hwlatd]) CPU +affinity directly, and the migration will stop. diff --git a/Documentation/trace/kprobetrace.txt b/Documentation/trace/kprobetrace.txt index ea52ec1f8484..e4991fb1eedc 100644 --- a/Documentation/trace/kprobetrace.txt +++ b/Documentation/trace/kprobetrace.txt @@ -44,8 +44,8 @@ Synopsis of kprobe_events +|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address.(**) NAME=FETCHARG : Set NAME as the argument name of FETCHARG. FETCHARG:TYPE : Set TYPE as the type of FETCHARG. Currently, basic types - (u8/u16/u32/u64/s8/s16/s32/s64), "string" and bitfield - are supported. + (u8/u16/u32/u64/s8/s16/s32/s64), hexadecimal types + (x8/x16/x32/x64), "string" and bitfield are supported. (*) only for return probe. (**) this is useful for fetching a field of data structures. @@ -54,7 +54,10 @@ Types ----- Several types are supported for fetch-args. Kprobe tracer will access memory by given type. Prefix 's' and 'u' means those types are signed and unsigned -respectively. Traced arguments are shown in decimal (signed) or hex (unsigned). +respectively. 'x' prefix implies it is unsigned. Traced arguments are shown +in decimal ('s' and 'u') or hexadecimal ('x'). Without type casting, 'x32' +or 'x64' is used depends on the architecture (e.g. x86-32 uses x32, and +x86-64 uses x64). String type is a special type, which fetches a "null-terminated" string from kernel space. This means it will fail and store NULL if the string container has been paged out. diff --git a/Documentation/trace/uprobetracer.txt b/Documentation/trace/uprobetracer.txt index 72d1cd4f7bf3..94b6b4581763 100644 --- a/Documentation/trace/uprobetracer.txt +++ b/Documentation/trace/uprobetracer.txt @@ -40,8 +40,8 @@ Synopsis of uprobe_tracer +|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address.(**) NAME=FETCHARG : Set NAME as the argument name of FETCHARG. FETCHARG:TYPE : Set TYPE as the type of FETCHARG. Currently, basic types - (u8/u16/u32/u64/s8/s16/s32/s64), "string" and bitfield - are supported. + (u8/u16/u32/u64/s8/s16/s32/s64), hexadecimal types + (x8/x16/x32/x64), "string" and bitfield are supported. (*) only for return probe. (**) this is useful for fetching a field of data structures. @@ -50,7 +50,10 @@ Types ----- Several types are supported for fetch-args. Uprobe tracer will access memory by given type. Prefix 's' and 'u' means those types are signed and unsigned -respectively. Traced arguments are shown in decimal (signed) or hex (unsigned). +respectively. 'x' prefix implies it is unsigned. Traced arguments are shown +in decimal ('s' and 'u') or hexadecimal ('x'). Without type casting, 'x32' +or 'x64' is used depends on the architecture (e.g. x86-32 uses x32, and +x86-64 uses x64). String type is a special type, which fetches a "null-terminated" string from user space. Bitfield is another special type, which takes 3 parameters, bit-width, bit- diff --git a/Documentation/virtual/kvm/devices/arm-vgic-its.txt b/Documentation/virtual/kvm/devices/arm-vgic-its.txt new file mode 100644 index 000000000000..6081a5b7fc1e --- /dev/null +++ b/Documentation/virtual/kvm/devices/arm-vgic-its.txt @@ -0,0 +1,38 @@ +ARM Virtual Interrupt Translation Service (ITS) +=============================================== + +Device types supported: + KVM_DEV_TYPE_ARM_VGIC_ITS ARM Interrupt Translation Service Controller + +The ITS allows MSI(-X) interrupts to be injected into guests. This extension is +optional. Creating a virtual ITS controller also requires a host GICv3 (see +arm-vgic-v3.txt), but does not depend on having physical ITS controllers. + +There can be multiple ITS controllers per guest, each of them has to have +a separate, non-overlapping MMIO region. + + +Groups: + KVM_DEV_ARM_VGIC_GRP_ADDR + Attributes: + KVM_VGIC_ITS_ADDR_TYPE (rw, 64-bit) + Base address in the guest physical address space of the GICv3 ITS + control register frame. + This address needs to be 64K aligned and the region covers 128K. + Errors: + -E2BIG: Address outside of addressable IPA range + -EINVAL: Incorrectly aligned address + -EEXIST: Address already configured + -EFAULT: Invalid user pointer for attr->addr. + -ENODEV: Incorrect attribute or the ITS is not supported. + + + KVM_DEV_ARM_VGIC_GRP_CTRL + Attributes: + KVM_DEV_ARM_VGIC_CTRL_INIT + request the initialization of the ITS, no additional parameter in + kvm_device_attr.addr. + Errors: + -ENXIO: ITS not properly configured as required prior to setting + this attribute + -ENOMEM: Memory shortage when allocating ITS internal data diff --git a/Documentation/virtual/kvm/devices/arm-vgic-v3.txt b/Documentation/virtual/kvm/devices/arm-vgic-v3.txt new file mode 100644 index 000000000000..9348b3caccd7 --- /dev/null +++ b/Documentation/virtual/kvm/devices/arm-vgic-v3.txt @@ -0,0 +1,206 @@ +ARM Virtual Generic Interrupt Controller v3 and later (VGICv3) +============================================================== + + +Device types supported: + KVM_DEV_TYPE_ARM_VGIC_V3 ARM Generic Interrupt Controller v3.0 + +Only one VGIC instance may be instantiated through this API. The created VGIC +will act as the VM interrupt controller, requiring emulated user-space devices +to inject interrupts to the VGIC instead of directly to CPUs. It is not +possible to create both a GICv3 and GICv2 on the same VM. + +Creating a guest GICv3 device requires a host GICv3 as well. + + +Groups: + KVM_DEV_ARM_VGIC_GRP_ADDR + Attributes: + KVM_VGIC_V3_ADDR_TYPE_DIST (rw, 64-bit) + Base address in the guest physical address space of the GICv3 distributor + register mappings. Only valid for KVM_DEV_TYPE_ARM_VGIC_V3. + This address needs to be 64K aligned and the region covers 64 KByte. + + KVM_VGIC_V3_ADDR_TYPE_REDIST (rw, 64-bit) + Base address in the guest physical address space of the GICv3 + redistributor register mappings. There are two 64K pages for each + VCPU and all of the redistributor pages are contiguous. + Only valid for KVM_DEV_TYPE_ARM_VGIC_V3. + This address needs to be 64K aligned. + Errors: + -E2BIG: Address outside of addressable IPA range + -EINVAL: Incorrectly aligned address + -EEXIST: Address already configured + -ENXIO: The group or attribute is unknown/unsupported for this device + or hardware support is missing. + -EFAULT: Invalid user pointer for attr->addr. + + + + KVM_DEV_ARM_VGIC_GRP_DIST_REGS + KVM_DEV_ARM_VGIC_GRP_REDIST_REGS + Attributes: + The attr field of kvm_device_attr encodes two values: + bits: | 63 .... 32 | 31 .... 0 | + values: | mpidr | offset | + + All distributor regs are (rw, 32-bit) and kvm_device_attr.addr points to a + __u32 value. 64-bit registers must be accessed by separately accessing the + lower and higher word. + + Writes to read-only registers are ignored by the kernel. + + KVM_DEV_ARM_VGIC_GRP_DIST_REGS accesses the main distributor registers. + KVM_DEV_ARM_VGIC_GRP_REDIST_REGS accesses the redistributor of the CPU + specified by the mpidr. + + The offset is relative to the "[Re]Distributor base address" as defined + in the GICv3/4 specs. Getting or setting such a register has the same + effect as reading or writing the register on real hardware, except for the + following registers: GICD_STATUSR, GICR_STATUSR, GICD_ISPENDR, + GICR_ISPENDR0, GICD_ICPENDR, and GICR_ICPENDR0. These registers behave + differently when accessed via this interface compared to their + architecturally defined behavior to allow software a full view of the + VGIC's internal state. + + The mpidr field is used to specify which + redistributor is accessed. The mpidr is ignored for the distributor. + + The mpidr encoding is based on the affinity information in the + architecture defined MPIDR, and the field is encoded as follows: + | 63 .... 56 | 55 .... 48 | 47 .... 40 | 39 .... 32 | + | Aff3 | Aff2 | Aff1 | Aff0 | + + Note that distributor fields are not banked, but return the same value + regardless of the mpidr used to access the register. + + The GICD_STATUSR and GICR_STATUSR registers are architecturally defined such + that a write of a clear bit has no effect, whereas a write with a set bit + clears that value. To allow userspace to freely set the values of these two + registers, setting the attributes with the register offsets for these two + registers simply sets the non-reserved bits to the value written. + + + Accesses (reads and writes) to the GICD_ISPENDR register region and + GICR_ISPENDR0 registers get/set the value of the latched pending state for + the interrupts. + + This is identical to the value returned by a guest read from ISPENDR for an + edge triggered interrupt, but may differ for level triggered interrupts. + For edge triggered interrupts, once an interrupt becomes pending (whether + because of an edge detected on the input line or because of a guest write + to ISPENDR) this state is "latched", and only cleared when either the + interrupt is activated or when the guest writes to ICPENDR. A level + triggered interrupt may be pending either because the level input is held + high by a device, or because of a guest write to the ISPENDR register. Only + ISPENDR writes are latched; if the device lowers the line level then the + interrupt is no longer pending unless the guest also wrote to ISPENDR, and + conversely writes to ICPENDR or activations of the interrupt do not clear + the pending status if the line level is still being held high. (These + rules are documented in the GICv3 specification descriptions of the ICPENDR + and ISPENDR registers.) For a level triggered interrupt the value accessed + here is that of the latch which is set by ISPENDR and cleared by ICPENDR or + interrupt activation, whereas the value returned by a guest read from + ISPENDR is the logical OR of the latch value and the input line level. + + Raw access to the latch state is provided to userspace so that it can save + and restore the entire GIC internal state (which is defined by the + combination of the current input line level and the latch state, and cannot + be deduced from purely the line level and the value of the ISPENDR + registers). + + Accesses to GICD_ICPENDR register region and GICR_ICPENDR0 registers have + RAZ/WI semantics, meaning that reads always return 0 and writes are always + ignored. + + Errors: + -ENXIO: Getting or setting this register is not yet supported + -EBUSY: One or more VCPUs are running + + + KVM_DEV_ARM_VGIC_CPU_SYSREGS + Attributes: + The attr field of kvm_device_attr encodes two values: + bits: | 63 .... 32 | 31 .... 16 | 15 .... 0 | + values: | mpidr | RES | instr | + + The mpidr field encodes the CPU ID based on the affinity information in the + architecture defined MPIDR, and the field is encoded as follows: + | 63 .... 56 | 55 .... 48 | 47 .... 40 | 39 .... 32 | + | Aff3 | Aff2 | Aff1 | Aff0 | + + The instr field encodes the system register to access based on the fields + defined in the A64 instruction set encoding for system register access + (RES means the bits are reserved for future use and should be zero): + + | 15 ... 14 | 13 ... 11 | 10 ... 7 | 6 ... 3 | 2 ... 0 | + | Op 0 | Op1 | CRn | CRm | Op2 | + + All system regs accessed through this API are (rw, 64-bit) and + kvm_device_attr.addr points to a __u64 value. + + KVM_DEV_ARM_VGIC_CPU_SYSREGS accesses the CPU interface registers for the + CPU specified by the mpidr field. + + Errors: + -ENXIO: Getting or setting this register is not yet supported + -EBUSY: VCPU is running + -EINVAL: Invalid mpidr supplied + + + KVM_DEV_ARM_VGIC_GRP_NR_IRQS + Attributes: + A value describing the number of interrupts (SGI, PPI and SPI) for + this GIC instance, ranging from 64 to 1024, in increments of 32. + + kvm_device_attr.addr points to a __u32 value. + + Errors: + -EINVAL: Value set is out of the expected range + -EBUSY: Value has already be set. + + + KVM_DEV_ARM_VGIC_GRP_CTRL + Attributes: + KVM_DEV_ARM_VGIC_CTRL_INIT + request the initialization of the VGIC, no additional parameter in + kvm_device_attr.addr. + Errors: + -ENXIO: VGIC not properly configured as required prior to calling + this attribute + -ENODEV: no online VCPU + -ENOMEM: memory shortage when allocating vgic internal data + + + KVM_DEV_ARM_VGIC_GRP_LEVEL_INFO + Attributes: + The attr field of kvm_device_attr encodes the following values: + bits: | 63 .... 32 | 31 .... 10 | 9 .... 0 | + values: | mpidr | info | vINTID | + + The vINTID specifies which set of IRQs is reported on. + + The info field specifies which information userspace wants to get or set + using this interface. Currently we support the following info values: + + VGIC_LEVEL_INFO_LINE_LEVEL: + Get/Set the input level of the IRQ line for a set of 32 contiguously + numbered interrupts. + vINTID must be a multiple of 32. + + kvm_device_attr.addr points to a __u32 value which will contain a + bitmap where a set bit means the interrupt level is asserted. + + Bit[n] indicates the status for interrupt vINTID + n. + + SGIs and any interrupt with a higher ID than the number of interrupts + supported, will be RAZ/WI. LPIs are always edge-triggered and are + therefore not supported by this interface. + + PPIs are reported per VCPU as specified in the mpidr field, and SPIs are + reported with the same value regardless of the mpidr specified. + + The mpidr field encodes the CPU ID based on the affinity information in the + architecture defined MPIDR, and the field is encoded as follows: + | 63 .... 56 | 55 .... 48 | 47 .... 40 | 39 .... 32 | + | Aff3 | Aff2 | Aff1 | Aff0 | diff --git a/Documentation/virtual/kvm/devices/arm-vgic.txt b/Documentation/virtual/kvm/devices/arm-vgic.txt index 89182f80cc7f..76e61c883347 100644 --- a/Documentation/virtual/kvm/devices/arm-vgic.txt +++ b/Documentation/virtual/kvm/devices/arm-vgic.txt @@ -1,24 +1,19 @@ -ARM Virtual Generic Interrupt Controller (VGIC) -=============================================== +ARM Virtual Generic Interrupt Controller v2 (VGIC) +================================================== Device types supported: KVM_DEV_TYPE_ARM_VGIC_V2 ARM Generic Interrupt Controller v2.0 - KVM_DEV_TYPE_ARM_VGIC_V3 ARM Generic Interrupt Controller v3.0 - KVM_DEV_TYPE_ARM_VGIC_ITS ARM Interrupt Translation Service Controller -Only one VGIC instance of the V2/V3 types above may be instantiated through -either this API or the legacy KVM_CREATE_IRQCHIP api. The created VGIC will -act as the VM interrupt controller, requiring emulated user-space devices to -inject interrupts to the VGIC instead of directly to CPUs. +Only one VGIC instance may be instantiated through either this API or the +legacy KVM_CREATE_IRQCHIP API. The created VGIC will act as the VM interrupt +controller, requiring emulated user-space devices to inject interrupts to the +VGIC instead of directly to CPUs. -Creating a guest GICv3 device requires a host GICv3 as well. -GICv3 implementations with hardware compatibility support allow a guest GICv2 -as well. +GICv3 implementations with hardware compatibility support allow creating a +guest GICv2 through this interface. For information on creating a guest GICv3 +device and guest ITS devices, see arm-vgic-v3.txt. It is not possible to +create both a GICv3 and GICv2 device on the same VM. -Creating a virtual ITS controller requires a host GICv3 (but does not depend -on having physical ITS controllers). -There can be multiple ITS controllers per guest, each of them has to have -a separate, non-overlapping MMIO region. Groups: KVM_DEV_ARM_VGIC_GRP_ADDR @@ -32,26 +27,13 @@ Groups: Base address in the guest physical address space of the GIC virtual cpu interface register mappings. Only valid for KVM_DEV_TYPE_ARM_VGIC_V2. This address needs to be 4K aligned and the region covers 4 KByte. - - KVM_VGIC_V3_ADDR_TYPE_DIST (rw, 64-bit) - Base address in the guest physical address space of the GICv3 distributor - register mappings. Only valid for KVM_DEV_TYPE_ARM_VGIC_V3. - This address needs to be 64K aligned and the region covers 64 KByte. - - KVM_VGIC_V3_ADDR_TYPE_REDIST (rw, 64-bit) - Base address in the guest physical address space of the GICv3 - redistributor register mappings. There are two 64K pages for each - VCPU and all of the redistributor pages are contiguous. - Only valid for KVM_DEV_TYPE_ARM_VGIC_V3. - This address needs to be 64K aligned. - - KVM_VGIC_V3_ADDR_TYPE_ITS (rw, 64-bit) - Base address in the guest physical address space of the GICv3 ITS - control register frame. The ITS allows MSI(-X) interrupts to be - injected into guests. This extension is optional. If the kernel - does not support the ITS, the call returns -ENODEV. - Only valid for KVM_DEV_TYPE_ARM_VGIC_ITS. - This address needs to be 64K aligned and the region covers 128K. + Errors: + -E2BIG: Address outside of addressable IPA range + -EINVAL: Incorrectly aligned address + -EEXIST: Address already configured + -ENXIO: The group or attribute is unknown/unsupported for this device + or hardware support is missing. + -EFAULT: Invalid user pointer for attr->addr. KVM_DEV_ARM_VGIC_GRP_DIST_REGS Attributes: diff --git a/Documentation/virtual/kvm/devices/vcpu.txt b/Documentation/virtual/kvm/devices/vcpu.txt index c04165868faf..02f50686c418 100644 --- a/Documentation/virtual/kvm/devices/vcpu.txt +++ b/Documentation/virtual/kvm/devices/vcpu.txt @@ -30,4 +30,6 @@ Returns: -ENODEV: PMUv3 not supported attribute -EBUSY: PMUv3 already initialized -Request the initialization of the PMUv3. +Request the initialization of the PMUv3. This must be done after creating the +in-kernel irqchip. Creating a PMU with a userspace irqchip is currently not +supported. diff --git a/Documentation/vme_api.txt b/Documentation/vme_api.txt index ca5b82797f6c..90006550f485 100644 --- a/Documentation/vme_api.txt +++ b/Documentation/vme_api.txt @@ -8,13 +8,14 @@ As with other subsystems within the Linux kernel, VME device drivers register with the VME subsystem, typically called from the devices init routine. This is achieved via a call to the following function: - int vme_register_driver (struct vme_driver *driver); + int vme_register_driver (struct vme_driver *driver, unsigned int ndevs); If driver registration is successful this function returns zero, if an error occurred a negative error code will be returned. A pointer to a structure of type 'vme_driver' must be provided to the -registration function. The structure is as follows: +registration function. Along with ndevs, which is the number of devices your +driver is able to support. The structure is as follows: struct vme_driver { struct list_head node; @@ -32,8 +33,8 @@ At the minimum, the '.name', '.match' and '.probe' elements of this structure should be correctly set. The '.name' element is a pointer to a string holding the device driver's name. -The '.match' function allows controlling the number of devices that need to -be registered. The match function should return 1 if a device should be +The '.match' function allows control over which VME devices should be registered +with the driver. The match function should return 1 if a device should be probed and 0 otherwise. This example match function (from vme_user.c) limits the number of devices probed to one: @@ -385,13 +386,13 @@ location monitor location. Each location monitor can monitor a number of adjacent locations: int vme_lm_attach(struct vme_resource *res, int num, - void (*callback)(int)); + void (*callback)(void *)); int vme_lm_detach(struct vme_resource *res, int num); The callback function is declared as follows. - void callback(int num); + void callback(void *data); Slot Detection diff --git a/Documentation/x86/x86_64/mm.txt b/Documentation/x86/x86_64/mm.txt index 8c7dd5957ae1..5724092db811 100644 --- a/Documentation/x86/x86_64/mm.txt +++ b/Documentation/x86/x86_64/mm.txt @@ -12,13 +12,13 @@ ffffc90000000000 - ffffe8ffffffffff (=45 bits) vmalloc/ioremap space ffffe90000000000 - ffffe9ffffffffff (=40 bits) hole ffffea0000000000 - ffffeaffffffffff (=40 bits) virtual memory map (1TB) ... unused hole ... -ffffec0000000000 - fffffc0000000000 (=44 bits) kasan shadow memory (16TB) +ffffec0000000000 - fffffbffffffffff (=44 bits) kasan shadow memory (16TB) ... unused hole ... ffffff0000000000 - ffffff7fffffffff (=39 bits) %esp fixup stacks ... unused hole ... -ffffffef00000000 - ffffffff00000000 (=64 GB) EFI region mapping space +ffffffef00000000 - fffffffeffffffff (=64 GB) EFI region mapping space ... unused hole ... -ffffffff80000000 - ffffffffa0000000 (=512 MB) kernel text mapping, from phys 0 +ffffffff80000000 - ffffffff9fffffff (=512 MB) kernel text mapping, from phys 0 ffffffffa0000000 - ffffffffff5fffff (=1526 MB) module mapping space ffffffffff600000 - ffffffffffdfffff (=8 MB) vsyscalls ffffffffffe00000 - ffffffffffffffff (=2 MB) unused hole diff --git a/Documentation/xtensa/mmu.txt b/Documentation/xtensa/mmu.txt index 0312fe66475c..222a2c6748e6 100644 --- a/Documentation/xtensa/mmu.txt +++ b/Documentation/xtensa/mmu.txt @@ -3,15 +3,8 @@ MMUv3 initialization sequence. The code in the initialize_mmu macro sets up MMUv3 memory mapping identically to MMUv2 fixed memory mapping. Depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX symbol this code is -located in one of the following address ranges: - - 0xF0000000..0xFFFFFFFF (will keep same address in MMU v2 layout; - typically ROM) - 0x00000000..0x07FFFFFF (system RAM; this code is actually linked - at 0xD0000000..0xD7FFFFFF [cached] - or 0xD8000000..0xDFFFFFFF [uncached]; - in any case, initially runs elsewhere - than linked, so have to be careful) +located in addresses it was linked for (symbol undefined), or not +(symbol defined), so it needs to be position-independent. The code has the following assumptions: This code fragment is run only on an MMU v3. @@ -28,24 +21,26 @@ TLB setup proceeds along the following steps. PA = physical address (two upper nibbles of it); pc = physical range that contains this code; -After step 2, we jump to virtual address in 0x40000000..0x5fffffff -that corresponds to next instruction to execute in this code. -After step 4, we jump to intended (linked) address of this code. - - Step 0 Step1 Step 2 Step3 Step 4 Step5 - ============ ===== ============ ===== ============ ===== - VA PA PA VA PA PA VA PA PA - ------ -- -- ------ -- -- ------ -- -- - E0..FF -> E0 -> E0 E0..FF -> E0 F0..FF -> F0 -> F0 - C0..DF -> C0 -> C0 C0..DF -> C0 E0..EF -> F0 -> F0 - A0..BF -> A0 -> A0 A0..BF -> A0 D8..DF -> 00 -> 00 - 80..9F -> 80 -> 80 80..9F -> 80 D0..D7 -> 00 -> 00 - 60..7F -> 60 -> 60 60..7F -> 60 - 40..5F -> 40 40..5F -> pc -> pc 40..5F -> pc - 20..3F -> 20 -> 20 20..3F -> 20 - 00..1F -> 00 -> 00 00..1F -> 00 - -The default location of IO peripherals is above 0xf0000000. This may change +After step 2, we jump to virtual address in the range 0x40000000..0x5fffffff +or 0x00000000..0x1fffffff, depending on whether the kernel was loaded below +0x40000000 or above. That address corresponds to next instruction to execute +in this code. After step 4, we jump to intended (linked) address of this code. +The scheme below assumes that the kernel is loaded below 0x40000000. + + Step0 Step1 Step2 Step3 Step4 Step5 + ===== ===== ===== ===== ===== ===== + VA PA PA PA PA VA PA PA + ------ -- -- -- -- ------ -- -- + E0..FF -> E0 -> E0 -> E0 F0..FF -> F0 -> F0 + C0..DF -> C0 -> C0 -> C0 E0..EF -> F0 -> F0 + A0..BF -> A0 -> A0 -> A0 D8..DF -> 00 -> 00 + 80..9F -> 80 -> 80 -> 80 D0..D7 -> 00 -> 00 + 60..7F -> 60 -> 60 -> 60 + 40..5F -> 40 -> pc -> pc 40..5F -> pc + 20..3F -> 20 -> 20 -> 20 + 00..1F -> 00 -> 00 -> 00 + +The default location of IO peripherals is above 0xf0000000. This may be changed using a "ranges" property in a device tree simple-bus node. See ePAPR 1.1, ยง6.5 for details on the syntax and semantic of simple-bus nodes. The following limitations apply: @@ -62,3 +57,127 @@ limitations apply: 6. The IO area covers the entire 256MB segment of parent-bus-address; the "ranges" triplet length field is ignored + + +MMUv3 address space layouts. +============================ + +Default MMUv2-compatible layout. + + Symbol VADDR Size ++------------------+ +| Userspace | 0x00000000 TASK_SIZE ++------------------+ 0x40000000 ++------------------+ +| Page table | 0x80000000 ++------------------+ 0x80400000 ++------------------+ +| KMAP area | PKMAP_BASE PTRS_PER_PTE * +| | DCACHE_N_COLORS * +| | PAGE_SIZE +| | (4MB * DCACHE_N_COLORS) ++------------------+ +| Atomic KMAP area | FIXADDR_START KM_TYPE_NR * +| | NR_CPUS * +| | DCACHE_N_COLORS * +| | PAGE_SIZE ++------------------+ FIXADDR_TOP 0xbffff000 ++------------------+ +| VMALLOC area | VMALLOC_START 0xc0000000 128MB - 64KB ++------------------+ VMALLOC_END +| Cache aliasing | TLBTEMP_BASE_1 0xc7ff0000 DCACHE_WAY_SIZE +| remap area 1 | ++------------------+ +| Cache aliasing | TLBTEMP_BASE_2 DCACHE_WAY_SIZE +| remap area 2 | ++------------------+ ++------------------+ +| Cached KSEG | XCHAL_KSEG_CACHED_VADDR 0xd0000000 128MB ++------------------+ +| Uncached KSEG | XCHAL_KSEG_BYPASS_VADDR 0xd8000000 128MB ++------------------+ +| Cached KIO | XCHAL_KIO_CACHED_VADDR 0xe0000000 256MB ++------------------+ +| Uncached KIO | XCHAL_KIO_BYPASS_VADDR 0xf0000000 256MB ++------------------+ + + +256MB cached + 256MB uncached layout. + + Symbol VADDR Size ++------------------+ +| Userspace | 0x00000000 TASK_SIZE ++------------------+ 0x40000000 ++------------------+ +| Page table | 0x80000000 ++------------------+ 0x80400000 ++------------------+ +| KMAP area | PKMAP_BASE PTRS_PER_PTE * +| | DCACHE_N_COLORS * +| | PAGE_SIZE +| | (4MB * DCACHE_N_COLORS) ++------------------+ +| Atomic KMAP area | FIXADDR_START KM_TYPE_NR * +| | NR_CPUS * +| | DCACHE_N_COLORS * +| | PAGE_SIZE ++------------------+ FIXADDR_TOP 0x9ffff000 ++------------------+ +| VMALLOC area | VMALLOC_START 0xa0000000 128MB - 64KB ++------------------+ VMALLOC_END +| Cache aliasing | TLBTEMP_BASE_1 0xa7ff0000 DCACHE_WAY_SIZE +| remap area 1 | ++------------------+ +| Cache aliasing | TLBTEMP_BASE_2 DCACHE_WAY_SIZE +| remap area 2 | ++------------------+ ++------------------+ +| Cached KSEG | XCHAL_KSEG_CACHED_VADDR 0xb0000000 256MB ++------------------+ +| Uncached KSEG | XCHAL_KSEG_BYPASS_VADDR 0xc0000000 256MB ++------------------+ ++------------------+ +| Cached KIO | XCHAL_KIO_CACHED_VADDR 0xe0000000 256MB ++------------------+ +| Uncached KIO | XCHAL_KIO_BYPASS_VADDR 0xf0000000 256MB ++------------------+ + + +512MB cached + 512MB uncached layout. + + Symbol VADDR Size ++------------------+ +| Userspace | 0x00000000 TASK_SIZE ++------------------+ 0x40000000 ++------------------+ +| Page table | 0x80000000 ++------------------+ 0x80400000 ++------------------+ +| KMAP area | PKMAP_BASE PTRS_PER_PTE * +| | DCACHE_N_COLORS * +| | PAGE_SIZE +| | (4MB * DCACHE_N_COLORS) ++------------------+ +| Atomic KMAP area | FIXADDR_START KM_TYPE_NR * +| | NR_CPUS * +| | DCACHE_N_COLORS * +| | PAGE_SIZE ++------------------+ FIXADDR_TOP 0x8ffff000 ++------------------+ +| VMALLOC area | VMALLOC_START 0x90000000 128MB - 64KB ++------------------+ VMALLOC_END +| Cache aliasing | TLBTEMP_BASE_1 0x97ff0000 DCACHE_WAY_SIZE +| remap area 1 | ++------------------+ +| Cache aliasing | TLBTEMP_BASE_2 DCACHE_WAY_SIZE +| remap area 2 | ++------------------+ ++------------------+ +| Cached KSEG | XCHAL_KSEG_CACHED_VADDR 0xa0000000 512MB ++------------------+ +| Uncached KSEG | XCHAL_KSEG_BYPASS_VADDR 0xc0000000 512MB ++------------------+ +| Cached KIO | XCHAL_KIO_CACHED_VADDR 0xe0000000 256MB ++------------------+ +| Uncached KIO | XCHAL_KIO_BYPASS_VADDR 0xf0000000 256MB ++------------------+ |