diff options
Diffstat (limited to 'Documentation')
101 files changed, 3298 insertions, 503 deletions
diff --git a/Documentation/ABI/README b/Documentation/ABI/README index 10069828568b..1fafc4b0753b 100644 --- a/Documentation/ABI/README +++ b/Documentation/ABI/README @@ -72,3 +72,16 @@ kernel tree without going through the obsolete state first. It's up to the developer to place their interfaces in the category they wish for it to start out in. + + +Notable bits of non-ABI, which should not under any circumstances be considered +stable: + +- Kconfig. Userspace should not rely on the presence or absence of any + particular Kconfig symbol, in /proc/config.gz, in the copy of .config + commonly installed to /boot, or in any invocation of the kernel build + process. + +- Kernel-internal symbols. Do not rely on the presence, absence, location, or + type of any kernel symbol, either in System.map files or the kernel binary + itself. See Documentation/stable_api_nonsense.txt. diff --git a/Documentation/ABI/stable/sysfs-driver-ib_srp b/Documentation/ABI/stable/sysfs-driver-ib_srp index 5c53d28f775c..b9688de8455b 100644 --- a/Documentation/ABI/stable/sysfs-driver-ib_srp +++ b/Documentation/ABI/stable/sysfs-driver-ib_srp @@ -61,6 +61,12 @@ Description: Interface for making ib_srp connect to a new target. interrupt is handled by a different CPU then the comp_vector parameter can be used to spread the SRP completion workload over multiple CPU's. + * tl_retry_count, a number in the range 2..7 specifying the + IB RC retry count. + * queue_size, the maximum number of commands that the + initiator is allowed to queue per SCSI host. The default + value for this parameter is 62. The lowest supported value + is 2. What: /sys/class/infiniband_srp/srp-<hca>-<port_number>/ibdev Date: January 2, 2006 @@ -153,6 +159,13 @@ Contact: linux-rdma@vger.kernel.org Description: InfiniBand service ID used for establishing communication with the SRP target. +What: /sys/class/scsi_host/host<n>/sgid +Date: February 1, 2014 +KernelVersion: 3.13 +Contact: linux-rdma@vger.kernel.org +Description: InfiniBand GID of the source port used for communication with + the SRP target. + What: /sys/class/scsi_host/host<n>/zero_req_lim Date: September 20, 2006 KernelVersion: 2.6.18 diff --git a/Documentation/ABI/stable/sysfs-transport-srp b/Documentation/ABI/stable/sysfs-transport-srp index b36fb0dc13c8..ec7af69fea0a 100644 --- a/Documentation/ABI/stable/sysfs-transport-srp +++ b/Documentation/ABI/stable/sysfs-transport-srp @@ -5,6 +5,24 @@ Contact: linux-scsi@vger.kernel.org, linux-rdma@vger.kernel.org Description: Instructs an SRP initiator to disconnect from a target and to remove all LUNs imported from that target. +What: /sys/class/srp_remote_ports/port-<h>:<n>/dev_loss_tmo +Date: February 1, 2014 +KernelVersion: 3.13 +Contact: linux-scsi@vger.kernel.org, linux-rdma@vger.kernel.org +Description: Number of seconds the SCSI layer will wait after a transport + layer error has been observed before removing a target port. + Zero means immediate removal. Setting this attribute to "off" + will disable the dev_loss timer. + +What: /sys/class/srp_remote_ports/port-<h>:<n>/fast_io_fail_tmo +Date: February 1, 2014 +KernelVersion: 3.13 +Contact: linux-scsi@vger.kernel.org, linux-rdma@vger.kernel.org +Description: Number of seconds the SCSI layer will wait after a transport + layer error has been observed before failing I/O. Zero means + failing I/O immediately. Setting this attribute to "off" will + disable the fast_io_fail timer. + What: /sys/class/srp_remote_ports/port-<h>:<n>/port_id Date: June 27, 2007 KernelVersion: 2.6.24 @@ -12,8 +30,29 @@ Contact: linux-scsi@vger.kernel.org Description: 16-byte local SRP port identifier in hexadecimal format. An example: 4c:49:4e:55:58:20:56:49:4f:00:00:00:00:00:00:00. +What: /sys/class/srp_remote_ports/port-<h>:<n>/reconnect_delay +Date: February 1, 2014 +KernelVersion: 3.13 +Contact: linux-scsi@vger.kernel.org, linux-rdma@vger.kernel.org +Description: Number of seconds the SCSI layer will wait after a reconnect + attempt failed before retrying. Setting this attribute to + "off" will disable time-based reconnecting. + What: /sys/class/srp_remote_ports/port-<h>:<n>/roles Date: June 27, 2007 KernelVersion: 2.6.24 Contact: linux-scsi@vger.kernel.org Description: Role of the remote port. Either "SRP Initiator" or "SRP Target". + +What: /sys/class/srp_remote_ports/port-<h>:<n>/state +Date: February 1, 2014 +KernelVersion: 3.13 +Contact: linux-scsi@vger.kernel.org, linux-rdma@vger.kernel.org +Description: State of the transport layer used for communication with the + remote port. "running" if the transport layer is operational; + "blocked" if a transport layer error has been encountered but + the fast_io_fail_tmo timer has not yet fired; "fail-fast" + after the fast_io_fail_tmo timer has fired and before the + "dev_loss_tmo" timer has fired; "lost" after the + "dev_loss_tmo" timer has fired and before the port is finally + removed. diff --git a/Documentation/ABI/testing/sysfs-class-mtd b/Documentation/ABI/testing/sysfs-class-mtd index bfd119ace6ad..1399bb2da3eb 100644 --- a/Documentation/ABI/testing/sysfs-class-mtd +++ b/Documentation/ABI/testing/sysfs-class-mtd @@ -104,7 +104,7 @@ Description: One of the following ASCII strings, representing the device type: - absent, ram, rom, nor, nand, dataflash, ubi, unknown + absent, ram, rom, nor, nand, mlc-nand, dataflash, ubi, unknown What: /sys/class/mtd/mtdX/writesize Date: April 2009 diff --git a/Documentation/ABI/testing/sysfs-class-net-batman-adv b/Documentation/ABI/testing/sysfs-class-net-batman-adv index bdc00707c751..7f34a95bb963 100644 --- a/Documentation/ABI/testing/sysfs-class-net-batman-adv +++ b/Documentation/ABI/testing/sysfs-class-net-batman-adv @@ -1,13 +1,13 @@ What: /sys/class/net/<iface>/batman-adv/iface_status Date: May 2010 -Contact: Marek Lindner <lindner_marek@yahoo.de> +Contact: Marek Lindner <mareklindner@neomailbox.ch> Description: Indicates the status of <iface> as it is seen by batman. What: /sys/class/net/<iface>/batman-adv/mesh_iface Date: May 2010 -Contact: Marek Lindner <lindner_marek@yahoo.de> +Contact: Marek Lindner <mareklindner@neomailbox.ch> Description: The /sys/class/net/<iface>/batman-adv/mesh_iface file displays the batman mesh interface this <iface> diff --git a/Documentation/ABI/testing/sysfs-class-net-mesh b/Documentation/ABI/testing/sysfs-class-net-mesh index bdcd8b4e38f2..0baa657b18c4 100644 --- a/Documentation/ABI/testing/sysfs-class-net-mesh +++ b/Documentation/ABI/testing/sysfs-class-net-mesh @@ -1,22 +1,23 @@ What: /sys/class/net/<mesh_iface>/mesh/aggregated_ogms Date: May 2010 -Contact: Marek Lindner <lindner_marek@yahoo.de> +Contact: Marek Lindner <mareklindner@neomailbox.ch> Description: Indicates whether the batman protocol messages of the mesh <mesh_iface> shall be aggregated or not. -What: /sys/class/net/<mesh_iface>/mesh/ap_isolation +What: /sys/class/net/<mesh_iface>/mesh/<vlan_subdir>/ap_isolation Date: May 2011 -Contact: Antonio Quartulli <ordex@autistici.org> +Contact: Antonio Quartulli <antonio@meshcoding.com> Description: Indicates whether the data traffic going from a wireless client to another wireless client will be - silently dropped. + silently dropped. <vlan_subdir> is empty when referring + to the untagged lan. What: /sys/class/net/<mesh_iface>/mesh/bonding Date: June 2010 -Contact: Simon Wunderlich <siwu@hrz.tu-chemnitz.de> +Contact: Simon Wunderlich <sw@simonwunderlich.de> Description: Indicates whether the data traffic going through the mesh will be sent using multiple interfaces at the @@ -24,7 +25,7 @@ Description: What: /sys/class/net/<mesh_iface>/mesh/bridge_loop_avoidance Date: November 2011 -Contact: Simon Wunderlich <siwu@hrz.tu-chemnitz.de> +Contact: Simon Wunderlich <sw@simonwunderlich.de> Description: Indicates whether the bridge loop avoidance feature is enabled. This feature detects and avoids loops @@ -41,21 +42,21 @@ Description: What: /sys/class/net/<mesh_iface>/mesh/gw_bandwidth Date: October 2010 -Contact: Marek Lindner <lindner_marek@yahoo.de> +Contact: Marek Lindner <mareklindner@neomailbox.ch> Description: Defines the bandwidth which is propagated by this node if gw_mode was set to 'server'. What: /sys/class/net/<mesh_iface>/mesh/gw_mode Date: October 2010 -Contact: Marek Lindner <lindner_marek@yahoo.de> +Contact: Marek Lindner <mareklindner@neomailbox.ch> Description: Defines the state of the gateway features. Can be either 'off', 'client' or 'server'. What: /sys/class/net/<mesh_iface>/mesh/gw_sel_class Date: October 2010 -Contact: Marek Lindner <lindner_marek@yahoo.de> +Contact: Marek Lindner <mareklindner@neomailbox.ch> Description: Defines the selection criteria this node will use to choose a gateway if gw_mode was set to 'client'. @@ -77,25 +78,14 @@ Description: What: /sys/class/net/<mesh_iface>/mesh/orig_interval Date: May 2010 -Contact: Marek Lindner <lindner_marek@yahoo.de> +Contact: Marek Lindner <mareklindner@neomailbox.ch> Description: Defines the interval in milliseconds in which batman sends its protocol messages. What: /sys/class/net/<mesh_iface>/mesh/routing_algo Date: Dec 2011 -Contact: Marek Lindner <lindner_marek@yahoo.de> +Contact: Marek Lindner <mareklindner@neomailbox.ch> Description: Defines the routing procotol this mesh instance uses to find the optimal paths through the mesh. - -What: /sys/class/net/<mesh_iface>/mesh/vis_mode -Date: May 2010 -Contact: Marek Lindner <lindner_marek@yahoo.de> -Description: - Each batman node only maintains information about its - own local neighborhood, therefore generating graphs - showing the topology of the entire mesh is not easily - feasible without having a central instance to collect - the local topologies from all nodes. This file allows - to activate the collecting (server) mode. diff --git a/Documentation/ABI/testing/sysfs-class-powercap b/Documentation/ABI/testing/sysfs-class-powercap new file mode 100644 index 000000000000..db3b3ff70d84 --- /dev/null +++ b/Documentation/ABI/testing/sysfs-class-powercap @@ -0,0 +1,152 @@ +What: /sys/class/powercap/ +Date: September 2013 +KernelVersion: 3.13 +Contact: linux-pm@vger.kernel.org +Description: + The powercap/ class sub directory belongs to the power cap + subsystem. Refer to + Documentation/power/powercap/powercap.txt for details. + +What: /sys/class/powercap/<control type> +Date: September 2013 +KernelVersion: 3.13 +Contact: linux-pm@vger.kernel.org +Description: + A <control type> is a unique name under /sys/class/powercap. + Here <control type> determines how the power is going to be + controlled. A <control type> can contain multiple power zones. + +What: /sys/class/powercap/<control type>/enabled +Date: September 2013 +KernelVersion: 3.13 +Contact: linux-pm@vger.kernel.org +Description: + This allows to enable/disable power capping for a "control type". + This status affects every power zone using this "control_type. + +What: /sys/class/powercap/<control type>/<power zone> +Date: September 2013 +KernelVersion: 3.13 +Contact: linux-pm@vger.kernel.org +Description: + A power zone is a single or a collection of devices, which can + be independently monitored and controlled. A power zone sysfs + entry is qualified with the name of the <control type>. + E.g. intel-rapl:0:1:1. + +What: /sys/class/powercap/<control type>/<power zone>/<child power zone> +Date: September 2013 +KernelVersion: 3.13 +Contact: linux-pm@vger.kernel.org +Description: + Power zones may be organized in a hierarchy in which child + power zones provide monitoring and control for a subset of + devices under the parent. For example, if there is a parent + power zone for a whole CPU package, each CPU core in it can + be a child power zone. + +What: /sys/class/powercap/.../<power zone>/name +Date: September 2013 +KernelVersion: 3.13 +Contact: linux-pm@vger.kernel.org +Description: + Specifies the name of this power zone. + +What: /sys/class/powercap/.../<power zone>/energy_uj +Date: September 2013 +KernelVersion: 3.13 +Contact: linux-pm@vger.kernel.org +Description: + Current energy counter in micro-joules. Write "0" to reset. + If the counter can not be reset, then this attribute is + read-only. + +What: /sys/class/powercap/.../<power zone>/max_energy_range_uj +Date: September 2013 +KernelVersion: 3.13 +Contact: linux-pm@vger.kernel.org +Description: + Range of the above energy counter in micro-joules. + + +What: /sys/class/powercap/.../<power zone>/power_uw +Date: September 2013 +KernelVersion: 3.13 +Contact: linux-pm@vger.kernel.org +Description: + Current power in micro-watts. + +What: /sys/class/powercap/.../<power zone>/max_power_range_uw +Date: September 2013 +KernelVersion: 3.13 +Contact: linux-pm@vger.kernel.org +Description: + Range of the above power value in micro-watts. + +What: /sys/class/powercap/.../<power zone>/constraint_X_name +Date: September 2013 +KernelVersion: 3.13 +Contact: linux-pm@vger.kernel.org +Description: + Each power zone can define one or more constraints. Each + constraint can have an optional name. Here "X" can have values + from 0 to max integer. + +What: /sys/class/powercap/.../<power zone>/constraint_X_power_limit_uw +Date: September 2013 +KernelVersion: 3.13 +Contact: linux-pm@vger.kernel.org +Description: + Power limit in micro-watts should be applicable for + the time window specified by "constraint_X_time_window_us". + Here "X" can have values from 0 to max integer. + +What: /sys/class/powercap/.../<power zone>/constraint_X_time_window_us +Date: September 2013 +KernelVersion: 3.13 +Contact: linux-pm@vger.kernel.org +Description: + Time window in micro seconds. This is used along with + constraint_X_power_limit_uw to define a power constraint. + Here "X" can have values from 0 to max integer. + + +What: /sys/class/powercap/<control type>/.../constraint_X_max_power_uw +Date: September 2013 +KernelVersion: 3.13 +Contact: linux-pm@vger.kernel.org +Description: + Maximum allowed power in micro watts for this constraint. + Here "X" can have values from 0 to max integer. + +What: /sys/class/powercap/<control type>/.../constraint_X_min_power_uw +Date: September 2013 +KernelVersion: 3.13 +Contact: linux-pm@vger.kernel.org +Description: + Minimum allowed power in micro watts for this constraint. + Here "X" can have values from 0 to max integer. + +What: /sys/class/powercap/.../<power zone>/constraint_X_max_time_window_us +Date: September 2013 +KernelVersion: 3.13 +Contact: linux-pm@vger.kernel.org +Description: + Maximum allowed time window in micro seconds for this + constraint. Here "X" can have values from 0 to max integer. + +What: /sys/class/powercap/.../<power zone>/constraint_X_min_time_window_us +Date: September 2013 +KernelVersion: 3.13 +Contact: linux-pm@vger.kernel.org +Description: + Minimum allowed time window in micro seconds for this + constraint. Here "X" can have values from 0 to max integer. + +What: /sys/class/powercap/.../<power zone>/enabled +Date: September 2013 +KernelVersion: 3.13 +Contact: linux-pm@vger.kernel.org +Description + This allows to enable/disable power capping at power zone level. + This applies to current power zone and its children. diff --git a/Documentation/ABI/testing/sysfs-driver-hid-roccat-ryos b/Documentation/ABI/testing/sysfs-driver-hid-roccat-ryos new file mode 100644 index 000000000000..1d6a8cf9dc0a --- /dev/null +++ b/Documentation/ABI/testing/sysfs-driver-hid-roccat-ryos @@ -0,0 +1,178 @@ +What: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/<hid-bus>:<vendor-id>:<product-id>.<num>/ryos/roccatryos<minor>/control +Date: October 2013 +Contact: Stefan Achatz <erazor_de@users.sourceforge.net> +Description: When written, this file lets one select which data from which + profile will be read next. The data has to be 3 bytes long. + This file is writeonly. +Users: http://roccat.sourceforge.net + +What: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/<hid-bus>:<vendor-id>:<product-id>.<num>/ryos/roccatryos<minor>/profile +Date: October 2013 +Contact: Stefan Achatz <erazor_de@users.sourceforge.net> +Description: The mouse can store 5 profiles which can be switched by the + press of a button. profile holds index of actual profile. + This value is persistent, so its value determines the profile + that's active when the device is powered on next time. + When written, the device activates the set profile immediately. + The data has to be 3 bytes long. + The device will reject invalid data. +Users: http://roccat.sourceforge.net + +What: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/<hid-bus>:<vendor-id>:<product-id>.<num>/ryos/roccatryos<minor>/keys_primary +Date: October 2013 +Contact: Stefan Achatz <erazor_de@users.sourceforge.net> +Description: When written, this file lets one set the default of all keys for + a specific profile. Profile index is included in written data. + The data has to be 125 bytes long. + Before reading this file, control has to be written to select + which profile to read. +Users: http://roccat.sourceforge.net + +What: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/<hid-bus>:<vendor-id>:<product-id>.<num>/ryos/roccatryos<minor>/keys_function +Date: October 2013 +Contact: Stefan Achatz <erazor_de@users.sourceforge.net> +Description: When written, this file lets one set the function of the + function keys for a specific profile. Profile index is included + in written data. The data has to be 95 bytes long. + Before reading this file, control has to be written to select + which profile to read. +Users: http://roccat.sourceforge.net + +What: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/<hid-bus>:<vendor-id>:<product-id>.<num>/ryos/roccatryos<minor>/keys_macro +Date: October 2013 +Contact: Stefan Achatz <erazor_de@users.sourceforge.net> +Description: When written, this file lets one set the function of the macro + keys for a specific profile. Profile index is included in + written data. The data has to be 35 bytes long. + Before reading this file, control has to be written to select + which profile to read. +Users: http://roccat.sourceforge.net + +What: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/<hid-bus>:<vendor-id>:<product-id>.<num>/ryos/roccatryos<minor>/keys_thumbster +Date: October 2013 +Contact: Stefan Achatz <erazor_de@users.sourceforge.net> +Description: When written, this file lets one set the function of the + thumbster keys for a specific profile. Profile index is included + in written data. The data has to be 23 bytes long. + Before reading this file, control has to be written to select + which profile to read. +Users: http://roccat.sourceforge.net + +What: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/<hid-bus>:<vendor-id>:<product-id>.<num>/ryos/roccatryos<minor>/keys_extra +Date: October 2013 +Contact: Stefan Achatz <erazor_de@users.sourceforge.net> +Description: When written, this file lets one set the function of the + capslock and function keys for a specific profile. Profile index + is included in written data. The data has to be 8 bytes long. + Before reading this file, control has to be written to select + which profile to read. +Users: http://roccat.sourceforge.net + +What: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/<hid-bus>:<vendor-id>:<product-id>.<num>/ryos/roccatryos<minor>/keys_easyzone +Date: October 2013 +Contact: Stefan Achatz <erazor_de@users.sourceforge.net> +Description: When written, this file lets one set the function of the + easyzone keys for a specific profile. Profile index is included + in written data. The data has to be 294 bytes long. + Before reading this file, control has to be written to select + which profile to read. +Users: http://roccat.sourceforge.net + +What: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/<hid-bus>:<vendor-id>:<product-id>.<num>/ryos/roccatryos<minor>/key_mask +Date: October 2013 +Contact: Stefan Achatz <erazor_de@users.sourceforge.net> +Description: When written, this file lets one deactivate certain keys like + windows and application keys, to prevent accidental presses. + Profile index for which this settings occur is included in + written data. The data has to be 6 bytes long. + Before reading this file, control has to be written to select + which profile to read. +Users: http://roccat.sourceforge.net + +What: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/<hid-bus>:<vendor-id>:<product-id>.<num>/ryos/roccatryos<minor>/light +Date: October 2013 +Contact: Stefan Achatz <erazor_de@users.sourceforge.net> +Description: When written, this file lets one set the backlight intensity for + a specific profile. Profile index is included in written data. + This attribute is only valid for the glow and pro variant. + The data has to be 16 bytes long. + Before reading this file, control has to be written to select + which profile to read. +Users: http://roccat.sourceforge.net + +What: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/<hid-bus>:<vendor-id>:<product-id>.<num>/ryos/roccatryos<minor>/macro +Date: October 2013 +Contact: Stefan Achatz <erazor_de@users.sourceforge.net> +Description: When written, this file lets one store macros with max 480 + keystrokes for a specific button for a specific profile. + Button and profile indexes are included in written data. + The data has to be 2002 bytes long. + Before reading this file, control has to be written to select + which profile and key to read. +Users: http://roccat.sourceforge.net + +What: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/<hid-bus>:<vendor-id>:<product-id>.<num>/ryos/roccatryos<minor>/info +Date: October 2013 +Contact: Stefan Achatz <erazor_de@users.sourceforge.net> +Description: When read, this file returns general data like firmware version. + The data is 8 bytes long. + This file is readonly. +Users: http://roccat.sourceforge.net + +What: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/<hid-bus>:<vendor-id>:<product-id>.<num>/ryos/roccatryos<minor>/reset +Date: October 2013 +Contact: Stefan Achatz <erazor_de@users.sourceforge.net> +Description: When written, this file lets one reset the device. + The data has to be 3 bytes long. + This file is writeonly. +Users: http://roccat.sourceforge.net + +What: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/<hid-bus>:<vendor-id>:<product-id>.<num>/ryos/roccatryos<minor>/talk +Date: October 2013 +Contact: Stefan Achatz <erazor_de@users.sourceforge.net> +Description: When written, this file lets one trigger easyshift functionality + from the host. + The data has to be 16 bytes long. + This file is writeonly. +Users: http://roccat.sourceforge.net + +What: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/<hid-bus>:<vendor-id>:<product-id>.<num>/ryos/roccatryos<minor>/light_control +Date: October 2013 +Contact: Stefan Achatz <erazor_de@users.sourceforge.net> +Description: When written, this file lets one switch between stored and custom + light settings. + This attribute is only valid for the pro variant. + The data has to be 8 bytes long. + This file is writeonly. +Users: http://roccat.sourceforge.net + +What: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/<hid-bus>:<vendor-id>:<product-id>.<num>/ryos/roccatryos<minor>/stored_lights +Date: October 2013 +Contact: Stefan Achatz <erazor_de@users.sourceforge.net> +Description: When written, this file lets one set per-key lighting for different + layers. + This attribute is only valid for the pro variant. + The data has to be 1382 bytes long. + Before reading this file, control has to be written to select + which profile to read. +Users: http://roccat.sourceforge.net + +What: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/<hid-bus>:<vendor-id>:<product-id>.<num>/ryos/roccatryos<minor>/custom_lights +Date: October 2013 +Contact: Stefan Achatz <erazor_de@users.sourceforge.net> +Description: When written, this file lets one set the actual per-key lighting. + This attribute is only valid for the pro variant. + The data has to be 20 bytes long. + This file is writeonly. +Users: http://roccat.sourceforge.net + +What: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/<hid-bus>:<vendor-id>:<product-id>.<num>/ryos/roccatryos<minor>/light_macro +Date: October 2013 +Contact: Stefan Achatz <erazor_de@users.sourceforge.net> +Description: When written, this file lets one set a light macro that is looped + whenever the device gets in dimness mode. + This attribute is only valid for the pro variant. + The data has to be 2002 bytes long. + Before reading this file, control has to be written to select + which profile to read. +Users: http://roccat.sourceforge.net diff --git a/Documentation/ABI/testing/sysfs-driver-hid-wiimote b/Documentation/ABI/testing/sysfs-driver-hid-wiimote index ed5dd567d397..39dfa5cb1cc5 100644 --- a/Documentation/ABI/testing/sysfs-driver-hid-wiimote +++ b/Documentation/ABI/testing/sysfs-driver-hid-wiimote @@ -57,3 +57,21 @@ Description: This attribute is only provided if the device was detected as a Calibration data is already applied by the kernel to all input values but may be used by user-space to perform other transformations. + +What: /sys/bus/hid/drivers/wiimote/<dev>/pro_calib +Date: October 2013 +KernelVersion: 3.13 +Contact: David Herrmann <dh.herrmann@gmail.com> +Description: This attribute is only provided if the device was detected as a + pro-controller. It provides a single line with 4 calibration + values for all 4 analog sticks. Format is: "x1:y1 x2:y2". Data + is prefixed with a +/-. Each value is a signed 16bit number. + Data is encoded as decimal numbers and specifies the offsets of + the analog sticks of the pro-controller. + Calibration data is already applied by the kernel to all input + values but may be used by user-space to perform other + transformations. + Calibration data is detected by the kernel during device setup. + You can write "scan\n" into this file to re-trigger calibration. + You can also write data directly in the form "x1:y1 x2:y2" to + set the calibration values manually. diff --git a/Documentation/DMA-API-HOWTO.txt b/Documentation/DMA-API-HOWTO.txt index 14129f149a75..5e983031cc11 100644 --- a/Documentation/DMA-API-HOWTO.txt +++ b/Documentation/DMA-API-HOWTO.txt @@ -101,14 +101,23 @@ style to do this even if your device holds the default setting, because this shows that you did think about these issues wrt. your device. -The query is performed via a call to dma_set_mask(): +The query is performed via a call to dma_set_mask_and_coherent(): - int dma_set_mask(struct device *dev, u64 mask); + int dma_set_mask_and_coherent(struct device *dev, u64 mask); -The query for consistent allocations is performed via a call to -dma_set_coherent_mask(): +which will query the mask for both streaming and coherent APIs together. +If you have some special requirements, then the following two separate +queries can be used instead: - int dma_set_coherent_mask(struct device *dev, u64 mask); + The query for streaming mappings is performed via a call to + dma_set_mask(): + + int dma_set_mask(struct device *dev, u64 mask); + + The query for consistent allocations is performed via a call + to dma_set_coherent_mask(): + + int dma_set_coherent_mask(struct device *dev, u64 mask); Here, dev is a pointer to the device struct of your device, and mask is a bit mask describing which bits of an address your device @@ -137,7 +146,7 @@ exactly why. The standard 32-bit addressing device would do something like this: - if (dma_set_mask(dev, DMA_BIT_MASK(32))) { + if (dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32))) { printk(KERN_WARNING "mydev: No suitable DMA available.\n"); goto ignore_this_device; @@ -171,22 +180,20 @@ the case would look like this: int using_dac, consistent_using_dac; - if (!dma_set_mask(dev, DMA_BIT_MASK(64))) { + if (!dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64))) { using_dac = 1; consistent_using_dac = 1; - dma_set_coherent_mask(dev, DMA_BIT_MASK(64)); - } else if (!dma_set_mask(dev, DMA_BIT_MASK(32))) { + } else if (!dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32))) { using_dac = 0; consistent_using_dac = 0; - dma_set_coherent_mask(dev, DMA_BIT_MASK(32)); } else { printk(KERN_WARNING "mydev: No suitable DMA available.\n"); goto ignore_this_device; } -dma_set_coherent_mask() will always be able to set the same or a -smaller mask as dma_set_mask(). However for the rare case that a +The coherent coherent mask will always be able to set the same or a +smaller mask as the streaming mask. However for the rare case that a device driver only uses consistent allocations, one would have to check the return value from dma_set_coherent_mask(). @@ -199,9 +206,9 @@ address you might do something like: goto ignore_this_device; } -When dma_set_mask() is successful, and returns zero, the kernel saves -away this mask you have provided. The kernel will use this -information later when you make DMA mappings. +When dma_set_mask() or dma_set_mask_and_coherent() is successful, and +returns zero, the kernel saves away this mask you have provided. The +kernel will use this information later when you make DMA mappings. There is a case which we are aware of at this time, which is worth mentioning in this documentation. If your device supports multiple diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt index 78a6c569d204..e865279cec58 100644 --- a/Documentation/DMA-API.txt +++ b/Documentation/DMA-API.txt @@ -142,6 +142,14 @@ internal API for use by the platform than an external API for use by driver writers. int +dma_set_mask_and_coherent(struct device *dev, u64 mask) + +Checks to see if the mask is possible and updates the device +streaming and coherent DMA mask parameters if it is. + +Returns: 0 if successful and a negative error if not. + +int dma_set_mask(struct device *dev, u64 mask) Checks to see if the mask is possible and updates the device diff --git a/Documentation/DMA-attributes.txt b/Documentation/DMA-attributes.txt index e59480db9ee0..cc2450d80310 100644 --- a/Documentation/DMA-attributes.txt +++ b/Documentation/DMA-attributes.txt @@ -13,7 +13,7 @@ all pending DMA writes to complete, and thus provides a mechanism to strictly order DMA from a device across all intervening busses and bridges. This barrier is not specific to a particular type of interconnect, it applies to the system as a whole, and so its -implementation must account for the idiosyncracies of the system all +implementation must account for the idiosyncrasies of the system all the way from the DMA device to memory. As an example of a situation where DMA_ATTR_WRITE_BARRIER would be @@ -60,7 +60,7 @@ such mapping is non-trivial task and consumes very limited resources Buffers allocated with this attribute can be only passed to user space by calling dma_mmap_attrs(). By using this API, you are guaranteeing that you won't dereference the pointer returned by dma_alloc_attr(). You -can threat it as a cookie that must be passed to dma_mmap_attrs() and +can treat it as a cookie that must be passed to dma_mmap_attrs() and dma_free_attrs(). Make sure that both of these also get this attribute set on each call. @@ -82,7 +82,7 @@ to 'device' domain, what synchronizes CPU caches for the given region (usually it means that the cache has been flushed or invalidated depending on the dma direction). However, next calls to dma_map_{single,page,sg}() for other devices will perform exactly the -same sychronization operation on the CPU cache. CPU cache sychronization +same synchronization operation on the CPU cache. CPU cache synchronization might be a time consuming operation, especially if the buffers are large, so it is highly recommended to avoid it if possible. DMA_ATTR_SKIP_CPU_SYNC allows platform code to skip synchronization of diff --git a/Documentation/DocBook/80211.tmpl b/Documentation/DocBook/80211.tmpl index f403ec3c5c9a..46ad6faee9ab 100644 --- a/Documentation/DocBook/80211.tmpl +++ b/Documentation/DocBook/80211.tmpl @@ -152,8 +152,8 @@ !Finclude/net/cfg80211.h cfg80211_scan_request !Finclude/net/cfg80211.h cfg80211_scan_done !Finclude/net/cfg80211.h cfg80211_bss -!Finclude/net/cfg80211.h cfg80211_inform_bss_frame -!Finclude/net/cfg80211.h cfg80211_inform_bss +!Finclude/net/cfg80211.h cfg80211_inform_bss_width_frame +!Finclude/net/cfg80211.h cfg80211_inform_bss_width !Finclude/net/cfg80211.h cfg80211_unlink_bss !Finclude/net/cfg80211.h cfg80211_find_ie !Finclude/net/cfg80211.h ieee80211_bss_get_ie diff --git a/Documentation/DocBook/kernel-locking.tmpl b/Documentation/DocBook/kernel-locking.tmpl index 09e884e5b9f5..19f2a5a5a5b4 100644 --- a/Documentation/DocBook/kernel-locking.tmpl +++ b/Documentation/DocBook/kernel-locking.tmpl @@ -1958,7 +1958,7 @@ machines due to caching. <chapter id="apiref-mutex"> <title>Mutex API reference</title> !Iinclude/linux/mutex.h -!Ekernel/mutex.c +!Ekernel/locking/mutex.c </chapter> <chapter id="apiref-futex"> diff --git a/Documentation/DocBook/mtdnand.tmpl b/Documentation/DocBook/mtdnand.tmpl index a248f42a121e..cd11926e07c7 100644 --- a/Documentation/DocBook/mtdnand.tmpl +++ b/Documentation/DocBook/mtdnand.tmpl @@ -1222,8 +1222,6 @@ in this page</entry> #define NAND_BBT_VERSION 0x00000100 /* Create a bbt if none axists */ #define NAND_BBT_CREATE 0x00000200 -/* Search good / bad pattern through all pages of a block */ -#define NAND_BBT_SCANALLPAGES 0x00000400 /* Write bbt if neccecary */ #define NAND_BBT_WRITE 0x00001000 /* Read and write back block contents when writing bbt */ diff --git a/Documentation/PCI/pci.txt b/Documentation/PCI/pci.txt index bccf602a87f5..6f458564d625 100644 --- a/Documentation/PCI/pci.txt +++ b/Documentation/PCI/pci.txt @@ -525,8 +525,9 @@ corresponding register block for you. 6. Other interesting functions ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -pci_find_slot() Find pci_dev corresponding to given bus and - slot numbers. +pci_get_domain_bus_and_slot() Find pci_dev corresponding to given domain, + bus and slot and number. If the device is + found, its reference count is increased. pci_set_power_state() Set PCI Power Management state (0=D0 ... 3=D3) pci_find_capability() Find specified capability in device's capability list. @@ -582,7 +583,8 @@ having sane locking. pci_find_device() Superseded by pci_get_device() pci_find_subsys() Superseded by pci_get_subsys() -pci_find_slot() Superseded by pci_get_slot() +pci_find_slot() Superseded by pci_get_domain_bus_and_slot() +pci_get_slot() Superseded by pci_get_domain_bus_and_slot() The alternative is the traditional PCI device driver that walks PCI diff --git a/Documentation/assoc_array.txt b/Documentation/assoc_array.txt new file mode 100644 index 000000000000..f4faec0f66e4 --- /dev/null +++ b/Documentation/assoc_array.txt @@ -0,0 +1,574 @@ + ======================================== + GENERIC ASSOCIATIVE ARRAY IMPLEMENTATION + ======================================== + +Contents: + + - Overview. + + - The public API. + - Edit script. + - Operations table. + - Manipulation functions. + - Access functions. + - Index key form. + + - Internal workings. + - Basic internal tree layout. + - Shortcuts. + - Splitting and collapsing nodes. + - Non-recursive iteration. + - Simultaneous alteration and iteration. + + +======== +OVERVIEW +======== + +This associative array implementation is an object container with the following +properties: + + (1) Objects are opaque pointers. The implementation does not care where they + point (if anywhere) or what they point to (if anything). + + [!] NOTE: Pointers to objects _must_ be zero in the least significant bit. + + (2) Objects do not need to contain linkage blocks for use by the array. This + permits an object to be located in multiple arrays simultaneously. + Rather, the array is made up of metadata blocks that point to objects. + + (3) Objects require index keys to locate them within the array. + + (4) Index keys must be unique. Inserting an object with the same key as one + already in the array will replace the old object. + + (5) Index keys can be of any length and can be of different lengths. + + (6) Index keys should encode the length early on, before any variation due to + length is seen. + + (7) Index keys can include a hash to scatter objects throughout the array. + + (8) The array can iterated over. The objects will not necessarily come out in + key order. + + (9) The array can be iterated over whilst it is being modified, provided the + RCU readlock is being held by the iterator. Note, however, under these + circumstances, some objects may be seen more than once. If this is a + problem, the iterator should lock against modification. Objects will not + be missed, however, unless deleted. + +(10) Objects in the array can be looked up by means of their index key. + +(11) Objects can be looked up whilst the array is being modified, provided the + RCU readlock is being held by the thread doing the look up. + +The implementation uses a tree of 16-pointer nodes internally that are indexed +on each level by nibbles from the index key in the same manner as in a radix +tree. To improve memory efficiency, shortcuts can be emplaced to skip over +what would otherwise be a series of single-occupancy nodes. Further, nodes +pack leaf object pointers into spare space in the node rather than making an +extra branch until as such time an object needs to be added to a full node. + + +============== +THE PUBLIC API +============== + +The public API can be found in <linux/assoc_array.h>. The associative array is +rooted on the following structure: + + struct assoc_array { + ... + }; + +The code is selected by enabling CONFIG_ASSOCIATIVE_ARRAY. + + +EDIT SCRIPT +----------- + +The insertion and deletion functions produce an 'edit script' that can later be +applied to effect the changes without risking ENOMEM. This retains the +preallocated metadata blocks that will be installed in the internal tree and +keeps track of the metadata blocks that will be removed from the tree when the +script is applied. + +This is also used to keep track of dead blocks and dead objects after the +script has been applied so that they can be freed later. The freeing is done +after an RCU grace period has passed - thus allowing access functions to +proceed under the RCU read lock. + +The script appears as outside of the API as a pointer of the type: + + struct assoc_array_edit; + +There are two functions for dealing with the script: + + (1) Apply an edit script. + + void assoc_array_apply_edit(struct assoc_array_edit *edit); + + This will perform the edit functions, interpolating various write barriers + to permit accesses under the RCU read lock to continue. The edit script + will then be passed to call_rcu() to free it and any dead stuff it points + to. + + (2) Cancel an edit script. + + void assoc_array_cancel_edit(struct assoc_array_edit *edit); + + This frees the edit script and all preallocated memory immediately. If + this was for insertion, the new object is _not_ released by this function, + but must rather be released by the caller. + +These functions are guaranteed not to fail. + + +OPERATIONS TABLE +---------------- + +Various functions take a table of operations: + + struct assoc_array_ops { + ... + }; + +This points to a number of methods, all of which need to be provided: + + (1) Get a chunk of index key from caller data: + + unsigned long (*get_key_chunk)(const void *index_key, int level); + + This should return a chunk of caller-supplied index key starting at the + *bit* position given by the level argument. The level argument will be a + multiple of ASSOC_ARRAY_KEY_CHUNK_SIZE and the function should return + ASSOC_ARRAY_KEY_CHUNK_SIZE bits. No error is possible. + + + (2) Get a chunk of an object's index key. + + unsigned long (*get_object_key_chunk)(const void *object, int level); + + As the previous function, but gets its data from an object in the array + rather than from a caller-supplied index key. + + + (3) See if this is the object we're looking for. + + bool (*compare_object)(const void *object, const void *index_key); + + Compare the object against an index key and return true if it matches and + false if it doesn't. + + + (4) Diff the index keys of two objects. + + int (*diff_objects)(const void *a, const void *b); + + Return the bit position at which the index keys of two objects differ or + -1 if they are the same. + + + (5) Free an object. + + void (*free_object)(void *object); + + Free the specified object. Note that this may be called an RCU grace + period after assoc_array_apply_edit() was called, so synchronize_rcu() may + be necessary on module unloading. + + +MANIPULATION FUNCTIONS +---------------------- + +There are a number of functions for manipulating an associative array: + + (1) Initialise an associative array. + + void assoc_array_init(struct assoc_array *array); + + This initialises the base structure for an associative array. It can't + fail. + + + (2) Insert/replace an object in an associative array. + + struct assoc_array_edit * + assoc_array_insert(struct assoc_array *array, + const struct assoc_array_ops *ops, + const void *index_key, + void *object); + + This inserts the given object into the array. Note that the least + significant bit of the pointer must be zero as it's used to type-mark + pointers internally. + + If an object already exists for that key then it will be replaced with the + new object and the old one will be freed automatically. + + The index_key argument should hold index key information and is + passed to the methods in the ops table when they are called. + + This function makes no alteration to the array itself, but rather returns + an edit script that must be applied. -ENOMEM is returned in the case of + an out-of-memory error. + + The caller should lock exclusively against other modifiers of the array. + + + (3) Delete an object from an associative array. + + struct assoc_array_edit * + assoc_array_delete(struct assoc_array *array, + const struct assoc_array_ops *ops, + const void *index_key); + + This deletes an object that matches the specified data from the array. + + The index_key argument should hold index key information and is + passed to the methods in the ops table when they are called. + + This function makes no alteration to the array itself, but rather returns + an edit script that must be applied. -ENOMEM is returned in the case of + an out-of-memory error. NULL will be returned if the specified object is + not found within the array. + + The caller should lock exclusively against other modifiers of the array. + + + (4) Delete all objects from an associative array. + + struct assoc_array_edit * + assoc_array_clear(struct assoc_array *array, + const struct assoc_array_ops *ops); + + This deletes all the objects from an associative array and leaves it + completely empty. + + This function makes no alteration to the array itself, but rather returns + an edit script that must be applied. -ENOMEM is returned in the case of + an out-of-memory error. + + The caller should lock exclusively against other modifiers of the array. + + + (5) Destroy an associative array, deleting all objects. + + void assoc_array_destroy(struct assoc_array *array, + const struct assoc_array_ops *ops); + + This destroys the contents of the associative array and leaves it + completely empty. It is not permitted for another thread to be traversing + the array under the RCU read lock at the same time as this function is + destroying it as no RCU deferral is performed on memory release - + something that would require memory to be allocated. + + The caller should lock exclusively against other modifiers and accessors + of the array. + + + (6) Garbage collect an associative array. + + int assoc_array_gc(struct assoc_array *array, + const struct assoc_array_ops *ops, + bool (*iterator)(void *object, void *iterator_data), + void *iterator_data); + + This iterates over the objects in an associative array and passes each one + to iterator(). If iterator() returns true, the object is kept. If it + returns false, the object will be freed. If the iterator() function + returns true, it must perform any appropriate refcount incrementing on the + object before returning. + + The internal tree will be packed down if possible as part of the iteration + to reduce the number of nodes in it. + + The iterator_data is passed directly to iterator() and is otherwise + ignored by the function. + + The function will return 0 if successful and -ENOMEM if there wasn't + enough memory. + + It is possible for other threads to iterate over or search the array under + the RCU read lock whilst this function is in progress. The caller should + lock exclusively against other modifiers of the array. + + +ACCESS FUNCTIONS +---------------- + +There are two functions for accessing an associative array: + + (1) Iterate over all the objects in an associative array. + + int assoc_array_iterate(const struct assoc_array *array, + int (*iterator)(const void *object, + void *iterator_data), + void *iterator_data); + + This passes each object in the array to the iterator callback function. + iterator_data is private data for that function. + + This may be used on an array at the same time as the array is being + modified, provided the RCU read lock is held. Under such circumstances, + it is possible for the iteration function to see some objects twice. If + this is a problem, then modification should be locked against. The + iteration algorithm should not, however, miss any objects. + + The function will return 0 if no objects were in the array or else it will + return the result of the last iterator function called. Iteration stops + immediately if any call to the iteration function results in a non-zero + return. + + + (2) Find an object in an associative array. + + void *assoc_array_find(const struct assoc_array *array, + const struct assoc_array_ops *ops, + const void *index_key); + + This walks through the array's internal tree directly to the object + specified by the index key.. + + This may be used on an array at the same time as the array is being + modified, provided the RCU read lock is held. + + The function will return the object if found (and set *_type to the object + type) or will return NULL if the object was not found. + + +INDEX KEY FORM +-------------- + +The index key can be of any form, but since the algorithms aren't told how long +the key is, it is strongly recommended that the index key includes its length +very early on before any variation due to the length would have an effect on +comparisons. + +This will cause leaves with different length keys to scatter away from each +other - and those with the same length keys to cluster together. + +It is also recommended that the index key begin with a hash of the rest of the +key to maximise scattering throughout keyspace. + +The better the scattering, the wider and lower the internal tree will be. + +Poor scattering isn't too much of a problem as there are shortcuts and nodes +can contain mixtures of leaves and metadata pointers. + +The index key is read in chunks of machine word. Each chunk is subdivided into +one nibble (4 bits) per level, so on a 32-bit CPU this is good for 8 levels and +on a 64-bit CPU, 16 levels. Unless the scattering is really poor, it is +unlikely that more than one word of any particular index key will have to be +used. + + +================= +INTERNAL WORKINGS +================= + +The associative array data structure has an internal tree. This tree is +constructed of two types of metadata blocks: nodes and shortcuts. + +A node is an array of slots. Each slot can contain one of four things: + + (*) A NULL pointer, indicating that the slot is empty. + + (*) A pointer to an object (a leaf). + + (*) A pointer to a node at the next level. + + (*) A pointer to a shortcut. + + +BASIC INTERNAL TREE LAYOUT +-------------------------- + +Ignoring shortcuts for the moment, the nodes form a multilevel tree. The index +key space is strictly subdivided by the nodes in the tree and nodes occur on +fixed levels. For example: + + Level: 0 1 2 3 + =============== =============== =============== =============== + NODE D + NODE B NODE C +------>+---+ + +------>+---+ +------>+---+ | | 0 | + NODE A | | 0 | | | 0 | | +---+ + +---+ | +---+ | +---+ | : : + | 0 | | : : | : : | +---+ + +---+ | +---+ | +---+ | | f | + | 1 |---+ | 3 |---+ | 7 |---+ +---+ + +---+ +---+ +---+ + : : : : | 8 |---+ + +---+ +---+ +---+ | NODE E + | e |---+ | f | : : +------>+---+ + +---+ | +---+ +---+ | 0 | + | f | | | f | +---+ + +---+ | +---+ : : + | NODE F +---+ + +------>+---+ | f | + | 0 | NODE G +---+ + +---+ +------>+---+ + : : | | 0 | + +---+ | +---+ + | 6 |---+ : : + +---+ +---+ + : : | f | + +---+ +---+ + | f | + +---+ + +In the above example, there are 7 nodes (A-G), each with 16 slots (0-f). +Assuming no other meta data nodes in the tree, the key space is divided thusly: + + KEY PREFIX NODE + ========== ==== + 137* D + 138* E + 13[0-69-f]* C + 1[0-24-f]* B + e6* G + e[0-57-f]* F + [02-df]* A + +So, for instance, keys with the following example index keys will be found in +the appropriate nodes: + + INDEX KEY PREFIX NODE + =============== ======= ==== + 13694892892489 13 C + 13795289025897 137 D + 13889dde88793 138 E + 138bbb89003093 138 E + 1394879524789 12 C + 1458952489 1 B + 9431809de993ba - A + b4542910809cd - A + e5284310def98 e F + e68428974237 e6 G + e7fffcbd443 e F + f3842239082 - A + +To save memory, if a node can hold all the leaves in its portion of keyspace, +then the node will have all those leaves in it and will not have any metadata +pointers - even if some of those leaves would like to be in the same slot. + +A node can contain a heterogeneous mix of leaves and metadata pointers. +Metadata pointers must be in the slots that match their subdivisions of key +space. The leaves can be in any slot not occupied by a metadata pointer. It +is guaranteed that none of the leaves in a node will match a slot occupied by a +metadata pointer. If the metadata pointer is there, any leaf whose key matches +the metadata key prefix must be in the subtree that the metadata pointer points +to. + +In the above example list of index keys, node A will contain: + + SLOT CONTENT INDEX KEY (PREFIX) + ==== =============== ================== + 1 PTR TO NODE B 1* + any LEAF 9431809de993ba + any LEAF b4542910809cd + e PTR TO NODE F e* + any LEAF f3842239082 + +and node B: + + 3 PTR TO NODE C 13* + any LEAF 1458952489 + + +SHORTCUTS +--------- + +Shortcuts are metadata records that jump over a piece of keyspace. A shortcut +is a replacement for a series of single-occupancy nodes ascending through the +levels. Shortcuts exist to save memory and to speed up traversal. + +It is possible for the root of the tree to be a shortcut - say, for example, +the tree contains at least 17 nodes all with key prefix '1111'. The insertion +algorithm will insert a shortcut to skip over the '1111' keyspace in a single +bound and get to the fourth level where these actually become different. + + +SPLITTING AND COLLAPSING NODES +------------------------------ + +Each node has a maximum capacity of 16 leaves and metadata pointers. If the +insertion algorithm finds that it is trying to insert a 17th object into a +node, that node will be split such that at least two leaves that have a common +key segment at that level end up in a separate node rooted on that slot for +that common key segment. + +If the leaves in a full node and the leaf that is being inserted are +sufficiently similar, then a shortcut will be inserted into the tree. + +When the number of objects in the subtree rooted at a node falls to 16 or +fewer, then the subtree will be collapsed down to a single node - and this will +ripple towards the root if possible. + + +NON-RECURSIVE ITERATION +----------------------- + +Each node and shortcut contains a back pointer to its parent and the number of +slot in that parent that points to it. None-recursive iteration uses these to +proceed rootwards through the tree, going to the parent node, slot N + 1 to +make sure progress is made without the need for a stack. + +The backpointers, however, make simultaneous alteration and iteration tricky. + + +SIMULTANEOUS ALTERATION AND ITERATION +------------------------------------- + +There are a number of cases to consider: + + (1) Simple insert/replace. This involves simply replacing a NULL or old + matching leaf pointer with the pointer to the new leaf after a barrier. + The metadata blocks don't change otherwise. An old leaf won't be freed + until after the RCU grace period. + + (2) Simple delete. This involves just clearing an old matching leaf. The + metadata blocks don't change otherwise. The old leaf won't be freed until + after the RCU grace period. + + (3) Insertion replacing part of a subtree that we haven't yet entered. This + may involve replacement of part of that subtree - but that won't affect + the iteration as we won't have reached the pointer to it yet and the + ancestry blocks are not replaced (the layout of those does not change). + + (4) Insertion replacing nodes that we're actively processing. This isn't a + problem as we've passed the anchoring pointer and won't switch onto the + new layout until we follow the back pointers - at which point we've + already examined the leaves in the replaced node (we iterate over all the + leaves in a node before following any of its metadata pointers). + + We might, however, re-see some leaves that have been split out into a new + branch that's in a slot further along than we were at. + + (5) Insertion replacing nodes that we're processing a dependent branch of. + This won't affect us until we follow the back pointers. Similar to (4). + + (6) Deletion collapsing a branch under us. This doesn't affect us because the + back pointers will get us back to the parent of the new node before we + could see the new node. The entire collapsed subtree is thrown away + unchanged - and will still be rooted on the same slot, so we shouldn't + process it a second time as we'll go back to slot + 1. + +Note: + + (*) Under some circumstances, we need to simultaneously change the parent + pointer and the parent slot pointer on a node (say, for example, we + inserted another node before it and moved it up a level). We cannot do + this without locking against a read - so we have to replace that node too. + + However, when we're changing a shortcut into a node this isn't a problem + as shortcuts only have one slot and so the parent slot number isn't used + when traversing backwards over one. This means that it's okay to change + the slot number first - provided suitable barriers are used to make sure + the parent slot number is read after the back pointer. + +Obsolete blocks and leaves are freed up after an RCU grace period has passed, +so as long as anyone doing walking or iteration holds the RCU read lock, the +old superstructure should not go away on them. diff --git a/Documentation/backlight/lp855x-driver.txt b/Documentation/backlight/lp855x-driver.txt index 1c732f0c6758..01bce243d3d7 100644 --- a/Documentation/backlight/lp855x-driver.txt +++ b/Documentation/backlight/lp855x-driver.txt @@ -4,7 +4,8 @@ Kernel driver lp855x Backlight driver for LP855x ICs Supported chips: - Texas Instruments LP8550, LP8551, LP8552, LP8553, LP8556 and LP8557 + Texas Instruments LP8550, LP8551, LP8552, LP8553, LP8555, LP8556 and + LP8557 Author: Milo(Woogyom) Kim <milo.kim@ti.com> @@ -24,7 +25,7 @@ Value : pwm based or register based 2) chip_id The lp855x chip id. -Value : lp8550/lp8551/lp8552/lp8553/lp8556/lp8557 +Value : lp8550/lp8551/lp8552/lp8553/lp8555/lp8556/lp8557 Platform data for lp855x ------------------------ diff --git a/Documentation/blockdev/floppy.txt b/Documentation/blockdev/floppy.txt index 470fe4b5e379..e2240f5ab64d 100644 --- a/Documentation/blockdev/floppy.txt +++ b/Documentation/blockdev/floppy.txt @@ -39,15 +39,15 @@ Module configuration options ============================ If you use the floppy driver as a module, use the following syntax: -modprobe floppy <options> +modprobe floppy floppy="<options>" Example: - modprobe floppy omnibook messages + modprobe floppy floppy="omnibook messages" If you need certain options enabled every time you load the floppy driver, you can put: - options floppy omnibook messages + options floppy floppy="omnibook messages" in a configuration file in /etc/modprobe.d/. diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt index 8af4ad121828..e2bc132608fd 100644 --- a/Documentation/cgroups/memory.txt +++ b/Documentation/cgroups/memory.txt @@ -573,15 +573,19 @@ an memcg since the pages are allowed to be allocated from any physical node. One of the use cases is evaluating application performance by combining this information with the application's CPU allocation. -We export "total", "file", "anon" and "unevictable" pages per-node for -each memcg. The ouput format of memory.numa_stat is: +Each memcg's numa_stat file includes "total", "file", "anon" and "unevictable" +per-node page counts including "hierarchical_<counter>" which sums up all +hierarchical children's values in addition to the memcg's own value. + +The ouput format of memory.numa_stat is: total=<total pages> N0=<node 0 pages> N1=<node 1 pages> ... file=<total file pages> N0=<node 0 pages> N1=<node 1 pages> ... anon=<total anon pages> N0=<node 0 pages> N1=<node 1 pages> ... unevictable=<total anon pages> N0=<node 0 pages> N1=<node 1 pages> ... +hierarchical_<counter>=<counter pages> N0=<node 0 pages> N1=<node 1 pages> ... -And we have total = file + anon + unevictable. +The "total" count is sum of file + anon + unevictable. 6. Hierarchy support diff --git a/Documentation/cpu-freq/cpu-drivers.txt b/Documentation/cpu-freq/cpu-drivers.txt index 40282e617913..8b1a4451422e 100644 --- a/Documentation/cpu-freq/cpu-drivers.txt +++ b/Documentation/cpu-freq/cpu-drivers.txt @@ -23,8 +23,8 @@ Contents: 1.1 Initialization 1.2 Per-CPU Initialization 1.3 verify -1.4 target or setpolicy? -1.5 target +1.4 target/target_index or setpolicy? +1.5 target/target_index 1.6 setpolicy 2. Frequency Table Helpers @@ -56,7 +56,8 @@ cpufreq_driver.init - A pointer to the per-CPU initialization cpufreq_driver.verify - A pointer to a "verification" function. cpufreq_driver.setpolicy _or_ -cpufreq_driver.target - See below on the differences. +cpufreq_driver.target/ +target_index - See below on the differences. And optionally @@ -66,7 +67,7 @@ cpufreq_driver.resume - A pointer to a per-CPU resume function which is called with interrupts disabled and _before_ the pre-suspend frequency and/or policy is restored by a call to - ->target or ->setpolicy. + ->target/target_index or ->setpolicy. cpufreq_driver.attr - A pointer to a NULL-terminated list of "struct freq_attr" which allow to @@ -103,8 +104,8 @@ policy->governor must contain the "default policy" for this CPU. A few moments later, cpufreq_driver.verify and either cpufreq_driver.setpolicy or - cpufreq_driver.target is called with - these values. + cpufreq_driver.target/target_index is called + with these values. For setting some of these values (cpuinfo.min[max]_freq, policy->min[max]), the frequency table helpers might be helpful. See the section 2 for more information @@ -133,20 +134,28 @@ range) is within policy->min and policy->max. If necessary, increase policy->max first, and only if this is no solution, decrease policy->min. -1.4 target or setpolicy? +1.4 target/target_index or setpolicy? ---------------------------- Most cpufreq drivers or even most cpu frequency scaling algorithms only allow the CPU to be set to one frequency. For these, you use the -->target call. +->target/target_index call. Some cpufreq-capable processors switch the frequency between certain limits on their own. These shall use the ->setpolicy call -1.4. target +1.4. target/target_index ------------- +The target_index call has two arguments: struct cpufreq_policy *policy, +and unsigned int index (into the exposed frequency table). + +The CPUfreq driver must set the new frequency when called here. The +actual frequency must be determined by freq_table[index].frequency. + +Deprecated: +---------- The target call has three arguments: struct cpufreq_policy *policy, unsigned int target_frequency, unsigned int relation. diff --git a/Documentation/cpu-freq/governors.txt b/Documentation/cpu-freq/governors.txt index 219970ba54b7..77ec21574fb1 100644 --- a/Documentation/cpu-freq/governors.txt +++ b/Documentation/cpu-freq/governors.txt @@ -40,7 +40,7 @@ Most cpufreq drivers (in fact, all except one, longrun) or even most cpu frequency scaling algorithms only offer the CPU to be set to one frequency. In order to offer dynamic frequency scaling, the cpufreq core must be able to tell these drivers of a "target frequency". So -these specific drivers will be transformed to offer a "->target" +these specific drivers will be transformed to offer a "->target/target_index" call instead of the existing "->setpolicy" call. For "longrun", all stays the same, though. @@ -71,7 +71,7 @@ CPU can be set to switch independently | CPU can only be set / the limits of policy->{min,max} / \ / \ - Using the ->setpolicy call, Using the ->target call, + Using the ->setpolicy call, Using the ->target/target_index call, the limits and the the frequency closest "policy" is set. to target_freq is set. It is assured that it diff --git a/Documentation/cpu-hotplug.txt b/Documentation/cpu-hotplug.txt index 786dc82f98ce..8cb9938cc47e 100644 --- a/Documentation/cpu-hotplug.txt +++ b/Documentation/cpu-hotplug.txt @@ -5,7 +5,7 @@ Rusty Russell <rusty@rustcorp.com.au> Srivatsa Vaddagiri <vatsa@in.ibm.com> i386: - Zwane Mwaikambo <zwane@arm.linux.org.uk> + Zwane Mwaikambo <zwanem@gmail.com> ppc64: Nathan Lynch <nathanl@austin.ibm.com> Joel Schopp <jschopp@austin.ibm.com> diff --git a/Documentation/cpuidle/governor.txt b/Documentation/cpuidle/governor.txt index 12c6bd50c9f6..d9020f5e847b 100644 --- a/Documentation/cpuidle/governor.txt +++ b/Documentation/cpuidle/governor.txt @@ -25,5 +25,4 @@ kernel configuration and platform will be selected by cpuidle. Interfaces: extern int cpuidle_register_governor(struct cpuidle_governor *gov); -extern void cpuidle_unregister_governor(struct cpuidle_governor *gov); struct cpuidle_governor diff --git a/Documentation/device-mapper/cache-policies.txt b/Documentation/device-mapper/cache-policies.txt index d7c440b444cc..df52a849957f 100644 --- a/Documentation/device-mapper/cache-policies.txt +++ b/Documentation/device-mapper/cache-policies.txt @@ -30,8 +30,10 @@ multiqueue This policy is the default. -The multiqueue policy has two sets of 16 queues: one set for entries -waiting for the cache and another one for those in the cache. +The multiqueue policy has three sets of 16 queues: one set for entries +waiting for the cache and another two for those in the cache (a set for +clean entries and a set for dirty entries). + Cache entries in the queues are aged based on logical time. Entry into the cache is based on variable thresholds and queue selection is based on hit count on entry. The policy aims to take different cache miss diff --git a/Documentation/device-mapper/cache.txt b/Documentation/device-mapper/cache.txt index 33d45ee0b737..274752f8bdf9 100644 --- a/Documentation/device-mapper/cache.txt +++ b/Documentation/device-mapper/cache.txt @@ -68,10 +68,11 @@ So large block sizes are bad because they waste cache space. And small block sizes are bad because they increase the amount of metadata (both in core and on disk). -Writeback/writethrough ----------------------- +Cache operating modes +--------------------- -The cache has two modes, writeback and writethrough. +The cache has three operating modes: writeback, writethrough and +passthrough. If writeback, the default, is selected then a write to a block that is cached will go only to the cache and the block will be marked dirty in @@ -81,8 +82,31 @@ If writethrough is selected then a write to a cached block will not complete until it has hit both the origin and cache devices. Clean blocks should remain clean. +If passthrough is selected, useful when the cache contents are not known +to be coherent with the origin device, then all reads are served from +the origin device (all reads miss the cache) and all writes are +forwarded to the origin device; additionally, write hits cause cache +block invalidates. To enable passthrough mode the cache must be clean. +Passthrough mode allows a cache device to be activated without having to +worry about coherency. Coherency that exists is maintained, although +the cache will gradually cool as writes take place. If the coherency of +the cache can later be verified, or established through use of the +"invalidate_cblocks" message, the cache device can be transitioned to +writethrough or writeback mode while still warm. Otherwise, the cache +contents can be discarded prior to transitioning to the desired +operating mode. + A simple cleaner policy is provided, which will clean (write back) all -dirty blocks in a cache. Useful for decommissioning a cache. +dirty blocks in a cache. Useful for decommissioning a cache or when +shrinking a cache. Shrinking the cache's fast device requires all cache +blocks, in the area of the cache being removed, to be clean. If the +area being removed from the cache still contains dirty blocks the resize +will fail. Care must be taken to never reduce the volume used for the +cache's fast device until the cache is clean. This is of particular +importance if writeback mode is used. Writethrough and passthrough +modes already maintain a clean cache. Future support to partially clean +the cache, above a specified threshold, will allow for keeping the cache +warm and in writeback mode during resize. Migration throttling -------------------- @@ -161,7 +185,7 @@ Constructor block size : cache unit size in sectors #feature args : number of feature arguments passed - feature args : writethrough. (The default is writeback.) + feature args : writethrough or passthrough (The default is writeback.) policy : the replacement policy to use #policy args : an even number of arguments corresponding to @@ -177,6 +201,13 @@ Optional feature arguments are: back cache block contents later for performance reasons, so they may differ from the corresponding origin blocks. + passthrough : a degraded mode useful for various cache coherency + situations (e.g., rolling back snapshots of + underlying storage). Reads and writes always go to + the origin. If a write goes to a cached origin + block, then the cache block is invalidated. + To enable passthrough mode the cache must be clean. + A policy called 'default' is always registered. This is an alias for the policy we currently think is giving best all round performance. @@ -231,12 +262,26 @@ The message format is: E.g. dmsetup message my_cache 0 sequential_threshold 1024 + +Invalidation is removing an entry from the cache without writing it +back. Cache blocks can be invalidated via the invalidate_cblocks +message, which takes an arbitrary number of cblock ranges. Each cblock +must be expressed as a decimal value, in the future a variant message +that takes cblock ranges expressed in hexidecimal may be needed to +better support efficient invalidation of larger caches. The cache must +be in passthrough mode when invalidate_cblocks is used. + + invalidate_cblocks [<cblock>|<cblock begin>-<cblock end>]* + +E.g. + dmsetup message my_cache 0 invalidate_cblocks 2345 3456-4567 5678-6789 + Examples ======== The test suite can be found here: -https://github.com/jthornber/thinp-test-suite +https://github.com/jthornber/device-mapper-test-suite dmsetup create my_cache --table '0 41943040 cache /dev/mapper/metadata \ /dev/mapper/ssd /dev/mapper/origin 512 1 writeback default 0' diff --git a/Documentation/device-mapper/dm-crypt.txt b/Documentation/device-mapper/dm-crypt.txt index 2c656ae43ba7..c81839b52c4d 100644 --- a/Documentation/device-mapper/dm-crypt.txt +++ b/Documentation/device-mapper/dm-crypt.txt @@ -4,12 +4,15 @@ dm-crypt Device-Mapper's "crypt" target provides transparent encryption of block devices using the kernel crypto API. +For a more detailed description of supported parameters see: +http://code.google.com/p/cryptsetup/wiki/DMCrypt + Parameters: <cipher> <key> <iv_offset> <device path> \ <offset> [<#opt_params> <opt_params>] <cipher> Encryption cipher and an optional IV generation mode. - (In format cipher[:keycount]-chainmode-ivopts:ivmode). + (In format cipher[:keycount]-chainmode-ivmode[:ivopts]). Examples: des aes-cbc-essiv:sha256 @@ -19,7 +22,11 @@ Parameters: <cipher> <key> <iv_offset> <device path> \ <key> Key used for encryption. It is encoded as a hexadecimal number. - You can only use key sizes that are valid for the selected cipher. + You can only use key sizes that are valid for the selected cipher + in combination with the selected iv mode. + Note that for some iv modes the key string can contain additional + keys (for example IV seed) so the key contains more parts concatenated + into a single string. <keycount> Multi-key compatibility mode. You can define <keycount> keys and diff --git a/Documentation/devices.txt b/Documentation/devices.txt index 23721d3be3e6..80b72419ffd8 100644 --- a/Documentation/devices.txt +++ b/Documentation/devices.txt @@ -414,6 +414,7 @@ Your cooperation is appreciated. 200 = /dev/net/tun TAP/TUN network device 201 = /dev/button/gulpb Transmeta GULP-B buttons 202 = /dev/emd/ctl Enhanced Metadisk RAID (EMD) control + 203 = /dev/cuse Cuse (character device in user-space) 204 = /dev/video/em8300 EM8300 DVD decoder control 205 = /dev/video/em8300_mv EM8300 DVD decoder video 206 = /dev/video/em8300_ma EM8300 DVD decoder audio diff --git a/Documentation/devicetree/bindings/arc/pmu.txt b/Documentation/devicetree/bindings/arc/pmu.txt new file mode 100644 index 000000000000..49d517340de3 --- /dev/null +++ b/Documentation/devicetree/bindings/arc/pmu.txt @@ -0,0 +1,24 @@ +* ARC Performance Monitor Unit + +The ARC 700 can be configured with a pipeline performance monitor for counting +CPU and cache events like cache misses and hits. + +Note that: + * ARC 700 refers to a family of ARC processor cores; + - There is only one type of PMU available for the whole family; + - The PMU may support different sets of events; supported events are probed + at boot time, as required by the reference manual. + + * The ARC 700 PMU does not support interrupts; although HW events may be + counted, the HW events themselves cannot serve as a trigger for a sample. + +Required properties: + +- compatible : should contain + "snps,arc700-pmu" + +Example: + +pmu { + compatible = "snps,arc700-pmu"; +}; diff --git a/Documentation/devicetree/bindings/arm/calxeda/mem-ctrlr.txt b/Documentation/devicetree/bindings/arm/calxeda/mem-ctrlr.txt index f770ac0893d4..049675944b78 100644 --- a/Documentation/devicetree/bindings/arm/calxeda/mem-ctrlr.txt +++ b/Documentation/devicetree/bindings/arm/calxeda/mem-ctrlr.txt @@ -1,7 +1,9 @@ Calxeda DDR memory controller Properties: -- compatible : Should be "calxeda,hb-ddr-ctrl" +- compatible : Should be: + - "calxeda,hb-ddr-ctrl" for ECX-1000 + - "calxeda,ecx-2000-ddr-ctrl" for ECX-2000 - reg : Address and size for DDR controller registers. - interrupts : Interrupt for DDR controller. diff --git a/Documentation/devicetree/bindings/dma/atmel-dma.txt b/Documentation/devicetree/bindings/dma/atmel-dma.txt index e1f343c7a34b..f69bcf5a6343 100644 --- a/Documentation/devicetree/bindings/dma/atmel-dma.txt +++ b/Documentation/devicetree/bindings/dma/atmel-dma.txt @@ -28,7 +28,7 @@ The three cells in order are: dependent: - bit 7-0: peripheral identifier for the hardware handshaking interface. The identifier can be different for tx and rx. - - bit 11-8: FIFO configuration. 0 for half FIFO, 1 for ALAP, 1 for ASAP. + - bit 11-8: FIFO configuration. 0 for half FIFO, 1 for ALAP, 2 for ASAP. Example: diff --git a/Documentation/devicetree/bindings/hwmon/lm90.txt b/Documentation/devicetree/bindings/hwmon/lm90.txt new file mode 100644 index 000000000000..e8632486b9ef --- /dev/null +++ b/Documentation/devicetree/bindings/hwmon/lm90.txt @@ -0,0 +1,44 @@ +* LM90 series thermometer. + +Required node properties: +- compatible: manufacturer and chip name, one of + "adi,adm1032" + "adi,adt7461" + "adi,adt7461a" + "gmt,g781" + "national,lm90" + "national,lm86" + "national,lm89" + "national,lm99" + "dallas,max6646" + "dallas,max6647" + "dallas,max6649" + "dallas,max6657" + "dallas,max6658" + "dallas,max6659" + "dallas,max6680" + "dallas,max6681" + "dallas,max6695" + "dallas,max6696" + "onnn,nct1008" + "winbond,w83l771" + "nxp,sa56004" + +- reg: I2C bus address of the device + +- vcc-supply: vcc regulator for the supply voltage. + +Optional properties: +- interrupts: Contains a single interrupt specifier which describes the + LM90 "-ALERT" pin output. + See interrupt-controller/interrupts.txt for the format. + +Example LM90 node: + +temp-sensor { + compatible = "onnn,nct1008"; + reg = <0x4c>; + vcc-supply = <&palmas_ldo6_reg>; + interrupt-parent = <&gpio>; + interrupts = <TEGRA_GPIO(O, 4) IRQ_TYPE_LEVEL_LOW>; +} diff --git a/Documentation/devicetree/bindings/i2c/i2c-bcm-kona.txt b/Documentation/devicetree/bindings/i2c/i2c-bcm-kona.txt new file mode 100644 index 000000000000..1b87b741fa8e --- /dev/null +++ b/Documentation/devicetree/bindings/i2c/i2c-bcm-kona.txt @@ -0,0 +1,35 @@ +Broadcom Kona Family I2C +========================= + +This I2C controller is used in the following Broadcom SoCs: + + BCM11130 + BCM11140 + BCM11351 + BCM28145 + BCM28155 + +Required Properties +------------------- +- compatible: "brcm,bcm11351-i2c", "brcm,kona-i2c" +- reg: Physical base address and length of controller registers +- interrupts: The interrupt number used by the controller +- clocks: clock specifier for the kona i2c external clock +- clock-frequency: The I2C bus frequency in Hz +- #address-cells: Should be <1> +- #size-cells: Should be <0> + +Refer to clocks/clock-bindings.txt for generic clock consumer +properties. + +Example: + +i2c@3e016000 { + compatible = "brcm,bcm11351-i2c","brcm,kona-i2c"; + reg = <0x3e016000 0x80>; + interrupts = <GIC_SPI 103 IRQ_TYPE_LEVEL_HIGH>; + clocks = <&bsc1_clk>; + clock-frequency = <400000>; + #address-cells = <1>; + #size-cells = <0>; +}; diff --git a/Documentation/devicetree/bindings/i2c/i2c-exynos5.txt b/Documentation/devicetree/bindings/i2c/i2c-exynos5.txt new file mode 100644 index 000000000000..056732cfdcee --- /dev/null +++ b/Documentation/devicetree/bindings/i2c/i2c-exynos5.txt @@ -0,0 +1,44 @@ +* Samsung's High Speed I2C controller + +The Samsung's High Speed I2C controller is used to interface with I2C devices +at various speeds ranging from 100khz to 3.4Mhz. + +Required properties: + - compatible: value should be. + -> "samsung,exynos5-hsi2c", for i2c compatible with exynos5 hsi2c. + - reg: physical base address of the controller and length of memory mapped + region. + - interrupts: interrupt number to the cpu. + - #address-cells: always 1 (for i2c addresses) + - #size-cells: always 0 + + - Pinctrl: + - pinctrl-0: Pin control group to be used for this controller. + - pinctrl-names: Should contain only one value - "default". + +Optional properties: + - clock-frequency: Desired operating frequency in Hz of the bus. + -> If not specified, the bus operates in fast-speed mode at + at 100khz. + -> If specified, the bus operates in high-speed mode only if the + clock-frequency is >= 1Mhz. + +Example: + +hsi2c@12ca0000 { + compatible = "samsung,exynos5-hsi2c"; + reg = <0x12ca0000 0x100>; + interrupts = <56>; + clock-frequency = <100000>; + + pinctrl-0 = <&i2c4_bus>; + pinctrl-names = "default"; + + #address-cells = <1>; + #size-cells = <0>; + + s2mps11_pmic@66 { + compatible = "samsung,s2mps11-pmic"; + reg = <0x66>; + }; +}; diff --git a/Documentation/devicetree/bindings/i2c/i2c-rcar.txt b/Documentation/devicetree/bindings/i2c/i2c-rcar.txt new file mode 100644 index 000000000000..897cfcd5ce92 --- /dev/null +++ b/Documentation/devicetree/bindings/i2c/i2c-rcar.txt @@ -0,0 +1,23 @@ +I2C for R-Car platforms + +Required properties: +- compatible: Must be one of + "renesas,i2c-rcar" + "renesas,i2c-r8a7778" + "renesas,i2c-r8a7779" + "renesas,i2c-r8a7790" +- reg: physical base address of the controller and length of memory mapped + region. +- interrupts: interrupt specifier. + +Optional properties: +- clock-frequency: desired I2C bus clock frequency in Hz. The absence of this + propoerty indicates the default frequency 100 kHz. + +Examples : + +i2c0: i2c@e6500000 { + compatible = "renesas,i2c-rcar-h2"; + reg = <0 0xe6500000 0 0x428>; + interrupts = <0 174 0x4>; +}; diff --git a/Documentation/devicetree/bindings/i2c/i2c-st.txt b/Documentation/devicetree/bindings/i2c/i2c-st.txt new file mode 100644 index 000000000000..437e0db3823c --- /dev/null +++ b/Documentation/devicetree/bindings/i2c/i2c-st.txt @@ -0,0 +1,41 @@ +ST SSC binding, for I2C mode operation + +Required properties : +- compatible : Must be "st,comms-ssc-i2c" or "st,comms-ssc4-i2c" +- reg : Offset and length of the register set for the device +- interrupts : the interrupt specifier +- clock-names: Must contain "ssc". +- clocks: Must contain an entry for each name in clock-names. See the common + clock bindings. +- A pinctrl state named "default" must be defined to set pins in mode of + operation for I2C transfer. + +Optional properties : +- clock-frequency : Desired I2C bus clock frequency in Hz. If not specified, + the default 100 kHz frequency will be used. As only Normal and Fast modes + are supported, possible values are 100000 and 400000. +- st,i2c-min-scl-pulse-width-us : The minimum valid SCL pulse width that is + allowed through the deglitch circuit. In units of us. +- st,i2c-min-sda-pulse-width-us : The minimum valid SDA pulse width that is + allowed through the deglitch circuit. In units of us. +- A pinctrl state named "idle" could be defined to set pins in idle state + when I2C instance is not performing a transfer. +- A pinctrl state named "sleep" could be defined to set pins in sleep state + when driver enters in suspend. + + + +Example : + +i2c0: i2c@fed40000 { + compatible = "st,comms-ssc4-i2c"; + reg = <0xfed40000 0x110>; + interrupts = <GIC_SPI 187 IRQ_TYPE_LEVEL_HIGH>; + clocks = <&CLK_S_ICN_REG_0>; + clock-names = "ssc"; + clock-frequency = <400000>; + pinctrl-names = "default"; + pinctrl-0 = <&pinctrl_i2c0_default>; + st,i2c-min-scl-pulse-width-us = <0>; + st,i2c-min-sda-pulse-width-us = <5>; +}; diff --git a/Documentation/devicetree/bindings/i2c/trivial-devices.txt b/Documentation/devicetree/bindings/i2c/trivial-devices.txt index ad6a73852f08..b1cb3415e6f1 100644 --- a/Documentation/devicetree/bindings/i2c/trivial-devices.txt +++ b/Documentation/devicetree/bindings/i2c/trivial-devices.txt @@ -15,6 +15,7 @@ adi,adt7461 +/-1C TDM Extended Temp Range I.C adt7461 +/-1C TDM Extended Temp Range I.C at,24c08 i2c serial eeprom (24cxx) atmel,24c02 i2c serial eeprom (24cxx) +atmel,at97sc3204t i2c trusted platform module (TPM) catalyst,24c32 i2c serial eeprom dallas,ds1307 64 x 8, Serial, I2C Real-Time Clock dallas,ds1338 I2C RTC with 56-Byte NV RAM @@ -35,6 +36,7 @@ fsl,mc13892 MC13892: Power Management Integrated Circuit (PMIC) for i.MX35/51 fsl,mma8450 MMA8450Q: Xtrinsic Low-power, 3-axis Xtrinsic Accelerometer fsl,mpr121 MPR121: Proximity Capacitive Touch Sensor Controller fsl,sgtl5000 SGTL5000: Ultra Low-Power Audio Codec +gmt,g751 G751: Digital Temperature Sensor and Thermal Watchdog with Two-Wire Interface infineon,slb9635tt Infineon SLB9635 (Soft-) I2C TPM (old protocol, max 100khz) infineon,slb9645tt Infineon SLB9645 I2C TPM (new protocol, max 400khz) maxim,ds1050 5 Bit Programmable, Pulse-Width Modulator @@ -44,6 +46,7 @@ mc,rv3029c2 Real Time Clock Module with I2C-Bus national,lm75 I2C TEMP SENSOR national,lm80 Serial Interface ACPI-Compatible Microprocessor System Hardware Monitor national,lm92 ±0.33°C Accurate, 12-Bit + Sign Temperature Sensor and Thermal Window Comparator with Two-Wire Interface +nuvoton,npct501 i2c trusted platform module (TPM) nxp,pca9556 Octal SMBus and I2C registered interface nxp,pca9557 8-bit I2C-bus and SMBus I/O port with reset nxp,pcf8563 Real-time clock/calendar @@ -61,3 +64,4 @@ taos,tsl2550 Ambient Light Sensor with SMBUS/Two Wire Serial Interface ti,tsc2003 I2C Touch-Screen Controller ti,tmp102 Low Power Digital Temperature Sensor with SMBUS/Two Wire Serial Interface ti,tmp275 Digital Temperature Sensor +winbond,wpct301 i2c trusted platform module (TPM) diff --git a/Documentation/devicetree/bindings/input/touchscreen/ti-tsc-adc.txt b/Documentation/devicetree/bindings/input/touchscreen/ti-tsc-adc.txt index 491c97b78384..878549ba814d 100644 --- a/Documentation/devicetree/bindings/input/touchscreen/ti-tsc-adc.txt +++ b/Documentation/devicetree/bindings/input/touchscreen/ti-tsc-adc.txt @@ -6,7 +6,7 @@ Required properties: ti,wires: Wires refer to application modes i.e. 4/5/8 wire touchscreen support on the platform. ti,x-plate-resistance: X plate resistance - ti,coordiante-readouts: The sequencer supports a total of 16 + ti,coordinate-readouts: The sequencer supports a total of 16 programmable steps each step is used to read a single coordinate. A single readout is enough but multiple reads can diff --git a/Documentation/devicetree/bindings/media/st-rc.txt b/Documentation/devicetree/bindings/media/st-rc.txt new file mode 100644 index 000000000000..05c432d08bca --- /dev/null +++ b/Documentation/devicetree/bindings/media/st-rc.txt @@ -0,0 +1,29 @@ +Device-Tree bindings for ST IRB IP + +Required properties: + - compatible: Should contain "st,comms-irb". + - reg: Base physical address of the controller and length of memory + mapped region. + - interrupts: interrupt-specifier for the sole interrupt generated by + the device. The interrupt specifier format depends on the interrupt + controller parent. + - rx-mode: can be "infrared" or "uhf". This property specifies the L1 + protocol used for receiving remote control signals. rx-mode should + be present iff the rx pins are wired up. + - tx-mode: should be "infrared". This property specifies the L1 + protocol used for transmitting remote control signals. tx-mode should + be present iff the tx pins are wired up. + +Optional properties: + - pinctrl-names, pinctrl-0: the pincontrol settings to configure muxing + properly for IRB pins. + - clocks : phandle with clock-specifier pair for IRB. + +Example node: + + rc: rc@fe518000 { + compatible = "st,comms-irb"; + reg = <0xfe518000 0x234>; + interrupts = <0 203 0>; + rx-mode = "infrared"; + }; diff --git a/Documentation/devicetree/bindings/mfd/as3722.txt b/Documentation/devicetree/bindings/mfd/as3722.txt new file mode 100644 index 000000000000..fc2191ecfd6b --- /dev/null +++ b/Documentation/devicetree/bindings/mfd/as3722.txt @@ -0,0 +1,194 @@ +* ams AS3722 Power management IC. + +Required properties: +------------------- +- compatible: Must be "ams,as3722". +- reg: I2C device address. +- interrupt-controller: AS3722 has internal interrupt controller which takes the + interrupt request from internal sub-blocks like RTC, regulators, GPIOs as well + as external input. +- #interrupt-cells: Should be set to 2 for IRQ number and flags. + The first cell is the IRQ number. IRQ numbers for different interrupt source + of AS3722 are defined at dt-bindings/mfd/as3722.h + The second cell is the flags, encoded as the trigger masks from binding document + interrupts.txt, using dt-bindings/irq. + +Optional submodule and their properties: +======================================= + +Pinmux and GPIO: +=============== +Device has 8 GPIO pins which can be configured as GPIO as well as the special IO +functions. + +Please refer to pinctrl-bindings.txt in this directory for details of the +common pinctrl bindings used by client devices, including the meaning of the +phrase "pin configuration node". + +Following are properties which is needed if GPIO and pinmux functionality +is required: + Required properties: + ------------------- + - gpio-controller: Marks the device node as a GPIO controller. + - #gpio-cells: Number of GPIO cells. Refer to binding document + gpio/gpio.txt + + Optional properties: + -------------------- + Following properties are require if pin control setting is required + at boot. + - pinctrl-names: A pinctrl state named "default" be defined, using the + bindings in pinctrl/pinctrl-binding.txt. + - pinctrl[0...n]: Properties to contain the phandle that refer to + different nodes of pin control settings. These nodes represents + the pin control setting of state 0 to state n. Each of these + nodes contains different subnodes to represents some desired + configuration for a list of pins. This configuration can + include the mux function to select on those pin(s), and + various pin configuration parameters, such as pull-up, + open drain. + + Each subnode have following properties: + Required properties: + - pins: List of pins. Valid values of pins properties are: + gpio0, gpio1, gpio2, gpio3, gpio4, gpio5, + gpio6, gpio7 + + Optional properties: + function, bias-disable, bias-pull-up, bias-pull-down, + bias-high-impedance, drive-open-drain. + + Valid values for function properties are: + gpio, interrupt-out, gpio-in-interrupt, + vsup-vbat-low-undebounce-out, + vsup-vbat-low-debounce-out, + voltage-in-standby, oc-pg-sd0, oc-pg-sd6, + powergood-out, pwm-in, pwm-out, clk32k-out, + watchdog-in, soft-reset-in + +Regulators: +=========== +Device has multiple DCDC and LDOs. The node "regulators" is require if regulator +functionality is needed. + +Following are properties of regulator subnode. + + Optional properties: + ------------------- + The input supply of regulators are the optional properties on the + regulator node. The input supply of these regulators are provided + through following properties: + vsup-sd2-supply: Input supply for SD2. + vsup-sd3-supply: Input supply for SD3. + vsup-sd4-supply: Input supply for SD4. + vsup-sd5-supply: Input supply for SD5. + vin-ldo0-supply: Input supply for LDO0. + vin-ldo1-6-supply: Input supply for LDO1 and LDO6. + vin-ldo2-5-7-supply: Input supply for LDO2, LDO5 and LDO7. + vin-ldo3-4-supply: Input supply for LDO3 and LDO4. + vin-ldo9-10-supply: Input supply for LDO9 and LDO10. + vin-ldo11-supply: Input supply for LDO11. + + Optional sub nodes for regulators: + --------------------------------- + The subnodes name is the name of regulator and it must be one of: + sd[0-6], ldo[0-7], ldo[9-11] + + Each sub-node should contain the constraints and initialization + information for that regulator. See regulator.txt for a description + of standard properties for these sub-nodes. + Additional optional custom properties are listed below. + ams,ext-control: External control of the rail. The option of + this properties will tell which external input is + controlling this rail. Valid values are 0, 1, 2 ad 3. + 0: There is no external control of this rail. + 1: Rail is controlled by ENABLE1 input pin. + 2: Rail is controlled by ENABLE2 input pin. + 3: Rail is controlled by ENABLE3 input pin. + Missing this property on DT will be assume as no + external control. The external control pin macros + are defined @dt-bindings/mfd/as3722.h + + ams,enable-tracking: Enable tracking with SD1, only supported + by LDO3. + +Example: +-------- +#include <dt-bindings/mfd/as3722.h> +... +ams3722 { + compatible = "ams,as3722"; + reg = <0x48>; + + interrupt-parent = <&intc>; + interrupt-controller; + #interrupt-cells = <2>; + + gpio-controller; + #gpio-cells = <2>; + + pinctrl-names = "default"; + pinctrl-0 = <&as3722_default>; + + as3722_default: pinmux { + gpio0 { + pins = "gpio0"; + function = "gpio"; + bias-pull-down; + }; + + gpio1_2_4_7 { + pins = "gpio1", "gpio2", "gpio4", "gpio7"; + function = "gpio"; + bias-pull-up; + }; + + gpio5 { + pins = "gpio5"; + function = "clk32k_out"; + }; + } + + regulators { + vsup-sd2-supply = <...>; + ... + + sd0 { + regulator-name = "vdd_cpu"; + regulator-min-microvolt = <700000>; + regulator-max-microvolt = <1400000>; + regulator-always-on; + ams,ext-control = <2>; + }; + + sd1 { + regulator-name = "vdd_core"; + regulator-min-microvolt = <700000>; + regulator-max-microvolt = <1400000>; + regulator-always-on; + ams,ext-control = <1>; + }; + + sd2 { + regulator-name = "vddio_ddr"; + regulator-min-microvolt = <1350000>; + regulator-max-microvolt = <1350000>; + regulator-always-on; + }; + + sd4 { + regulator-name = "avdd-hdmi-pex"; + regulator-min-microvolt = <1050000>; + regulator-max-microvolt = <1050000>; + regulator-always-on; + }; + + sd5 { + regulator-name = "vdd-1v8"; + regulator-min-microvolt = <1800000>; + regulator-max-microvolt = <1800000>; + regulator-always-on; + }; + .... + }; +}; diff --git a/Documentation/devicetree/bindings/mfd/s2mps11.txt b/Documentation/devicetree/bindings/mfd/s2mps11.txt index c9332c626021..78a840d7510d 100644 --- a/Documentation/devicetree/bindings/mfd/s2mps11.txt +++ b/Documentation/devicetree/bindings/mfd/s2mps11.txt @@ -1,10 +1,10 @@ * Samsung S2MPS11 Voltage and Current Regulator -The Samsung S2MP211 is a multi-function device which includes voltage and +The Samsung S2MPS11 is a multi-function device which includes voltage and current regulators, RTC, charger controller and other sub-blocks. It is -interfaced to the host controller using a I2C interface. Each sub-block is -addressed by the host system using different I2C slave address. +interfaced to the host controller using an I2C interface. Each sub-block is +addressed by the host system using different I2C slave addresses. Required properties: - compatible: Should be "samsung,s2mps11-pmic". @@ -43,7 +43,8 @@ sub-node should be of the format as listed below. BUCK[2/3/4/6] supports disabling ramp delay on hardware, so explictly regulator-ramp-delay = <0> can be used for them to disable ramp delay. - In absence of regulator-ramp-delay property, default ramp delay will be used. + In the absence of the regulator-ramp-delay property, the default ramp + delay will be used. NOTE: Some BUCKs share the ramp rate setting i.e. same ramp value will be set for a particular group of BUCKs. So provide same regulator-ramp-delay<value>. @@ -58,10 +59,10 @@ supports. Note: The 'n' in LDOn and BUCKn represents the LDO or BUCK number as per the datasheet of s2mps11. - LDOn - - valid values for n are 1 to 28 + - valid values for n are 1 to 38 - Example: LDO0, LD01, LDO28 - BUCKn - - valid values for n are 1 to 9. + - valid values for n are 1 to 10. - Example: BUCK1, BUCK2, BUCK9 Example: diff --git a/Documentation/devicetree/bindings/mmc/fsl-imx-esdhc.txt b/Documentation/devicetree/bindings/mmc/fsl-imx-esdhc.txt index 1dd622546d06..9046ba06c47a 100644 --- a/Documentation/devicetree/bindings/mmc/fsl-imx-esdhc.txt +++ b/Documentation/devicetree/bindings/mmc/fsl-imx-esdhc.txt @@ -12,6 +12,11 @@ Required properties: Optional properties: - fsl,cd-controller : Indicate to use controller internal card detection - fsl,wp-controller : Indicate to use controller internal write protection +- fsl,delay-line : Specify the number of delay cells for override mode. + This is used to set the clock delay for DLL(Delay Line) on override mode + to select a proper data sampling window in case the clock quality is not good + due to signal path is too long on the board. Please refer to eSDHC/uSDHC + chapter, DLL (Delay Line) section in RM for details. Examples: diff --git a/Documentation/devicetree/bindings/mmc/synopsys-dw-mshc.txt b/Documentation/devicetree/bindings/mmc/synopsys-dw-mshc.txt index 066a78b034ca..8f3f13315358 100644 --- a/Documentation/devicetree/bindings/mmc/synopsys-dw-mshc.txt +++ b/Documentation/devicetree/bindings/mmc/synopsys-dw-mshc.txt @@ -52,6 +52,9 @@ Optional properties: is specified and the ciu clock is specified then we'll try to set the ciu clock to this at probe time. +* clock-freq-min-max: Minimum and Maximum clock frequency for card output + clock(cclk_out). If it's not specified, max is 200MHZ and min is 400KHz by default. + * num-slots: specifies the number of slots supported by the controller. The number of physical slots actually used could be equal or less than the value specified by num-slots. If this property is not specified, the value @@ -66,6 +69,10 @@ Optional properties: * supports-highspeed: Enables support for high speed cards (up to 50MHz) +* caps2-mmc-hs200-1_8v: Supports mmc HS200 SDR 1.8V mode + +* caps2-mmc-hs200-1_2v: Supports mmc HS200 SDR 1.2V mode + * broken-cd: as documented in mmc core bindings. * vmmc-supply: The phandle to the regulator to use for vmmc. If this is @@ -93,8 +100,10 @@ board specific portions as listed below. dwmmc0@12200000 { clock-frequency = <400000000>; + clock-freq-min-max = <400000 200000000>; num-slots = <1>; supports-highspeed; + caps2-mmc-hs200-1_8v; broken-cd; fifo-depth = <0x80>; card-detect-delay = <200>; diff --git a/Documentation/devicetree/bindings/mtd/gpmc-nand.txt b/Documentation/devicetree/bindings/mtd/gpmc-nand.txt index df338cb5059c..5e1f31b5ff70 100644 --- a/Documentation/devicetree/bindings/mtd/gpmc-nand.txt +++ b/Documentation/devicetree/bindings/mtd/gpmc-nand.txt @@ -22,10 +22,10 @@ Optional properties: width of 8 is assumed. - ti,nand-ecc-opt: A string setting the ECC layout to use. One of: - - "sw" Software method (default) - "hw" Hardware method - "hw-romcode" gpmc hamming mode method & romcode layout + "sw" <deprecated> use "ham1" instead + "hw" <deprecated> use "ham1" instead + "hw-romcode" <deprecated> use "ham1" instead + "ham1" 1-bit Hamming ecc code "bch4" 4-bit BCH ecc code "bch8" 8-bit BCH ecc code @@ -36,8 +36,12 @@ Optional properties: "prefetch-dma" Prefetch enabled sDMA mode "prefetch-irq" Prefetch enabled irq mode - - elm_id: Specifies elm device node. This is required to support BCH - error correction using ELM module. + - elm_id: <deprecated> use "ti,elm-id" instead + - ti,elm-id: Specifies phandle of the ELM devicetree node. + ELM is an on-chip hardware engine on TI SoC which is used for + locating ECC errors for BCHx algorithms. SoC devices which have + ELM hardware engines should specify this device node in .dtsi + Using ELM for ECC error correction frees some CPU cycles. For inline partiton table parsing (optional): diff --git a/Documentation/devicetree/bindings/net/cpsw-phy-sel.txt b/Documentation/devicetree/bindings/net/cpsw-phy-sel.txt new file mode 100644 index 000000000000..7ff57a119f81 --- /dev/null +++ b/Documentation/devicetree/bindings/net/cpsw-phy-sel.txt @@ -0,0 +1,28 @@ +TI CPSW Phy mode Selection Device Tree Bindings +----------------------------------------------- + +Required properties: +- compatible : Should be "ti,am3352-cpsw-phy-sel" +- reg : physical base address and size of the cpsw + registers map +- reg-names : names of the register map given in "reg" node + +Optional properties: +-rmii-clock-ext : If present, the driver will configure the RMII + interface to external clock usage + +Examples: + + phy_sel: cpsw-phy-sel@44e10650 { + compatible = "ti,am3352-cpsw-phy-sel"; + reg= <0x44e10650 0x4>; + reg-names = "gmii-sel"; + }; + +(or) + phy_sel: cpsw-phy-sel@44e10650 { + compatible = "ti,am3352-cpsw-phy-sel"; + reg= <0x44e10650 0x4>; + reg-names = "gmii-sel"; + rmii-clock-ext; + }; diff --git a/Documentation/devicetree/bindings/pci/designware-pcie.txt b/Documentation/devicetree/bindings/pci/designware-pcie.txt index e216af356847..d5d26d443693 100644 --- a/Documentation/devicetree/bindings/pci/designware-pcie.txt +++ b/Documentation/devicetree/bindings/pci/designware-pcie.txt @@ -3,7 +3,7 @@ Required properties: - compatible: should contain "snps,dw-pcie" to identify the core, plus an identifier for the specific instance, such - as "samsung,exynos5440-pcie". + as "samsung,exynos5440-pcie" or "fsl,imx6q-pcie". - reg: base addresses and lengths of the pcie controller, the phy controller, additional register for the phy controller. - interrupts: interrupt values for level interrupt, @@ -21,6 +21,11 @@ Required properties: - num-lanes: number of lanes to use - reset-gpio: gpio pin number of power good signal +Optional properties for fsl,imx6q-pcie +- power-on-gpio: gpio pin number of power-enable signal +- wake-up-gpio: gpio pin number of incoming wakeup signal +- disable-gpio: gpio pin number of outgoing rfkill/endpoint disable signal + Example: SoC specific DT Entry: diff --git a/Documentation/devicetree/bindings/power/twl-charger.txt b/Documentation/devicetree/bindings/power/twl-charger.txt new file mode 100644 index 000000000000..d5c706216df5 --- /dev/null +++ b/Documentation/devicetree/bindings/power/twl-charger.txt @@ -0,0 +1,20 @@ +TWL BCI (Battery Charger Interface) + +Required properties: +- compatible: + - "ti,twl4030-bci" +- interrupts: two interrupt lines from the TWL SIH (secondary + interrupt handler) - interrupts 9 and 2. + +Optional properties: +- ti,bb-uvolt: microvolts for charging the backup battery. +- ti,bb-uamp: microamps for charging the backup battery. + +Examples: + +bci { + compatible = "ti,twl4030-bci"; + interrupts = <9>, <2>; + ti,bb-uvolt = <3200000>; + ti,bb-uamp = <150>; +}; diff --git a/Documentation/devicetree/bindings/power_supply/ti,bq24735.txt b/Documentation/devicetree/bindings/power_supply/ti,bq24735.txt new file mode 100644 index 000000000000..4f6a550184d0 --- /dev/null +++ b/Documentation/devicetree/bindings/power_supply/ti,bq24735.txt @@ -0,0 +1,32 @@ +TI BQ24735 Charge Controller +~~~~~~~~~~ + +Required properties : + - compatible : "ti,bq24735" + +Optional properties : + - interrupts : Specify the interrupt to be used to trigger when the AC + adapter is either plugged in or removed. + - ti,ac-detect-gpios : This GPIO is optionally used to read the AC adapter + presence. This is a Host GPIO that is configured as an input and + connected to the bq24735. + - ti,charge-current : Used to control and set the charging current. This value + must be between 128mA and 8.128A with a 64mA step resolution. The POR value + is 0x0000h. This number is in mA (e.g. 8192), see spec for more information + about the ChargeCurrent (0x14h) register. + - ti,charge-voltage : Used to control and set the charging voltage. This value + must be between 1.024V and 19.2V with a 16mV step resolution. The POR value + is 0x0000h. This number is in mV (e.g. 19200), see spec for more information + about the ChargeVoltage (0x15h) register. + - ti,input-current : Used to control and set the charger input current. This + value must be between 128mA and 8.064A with a 128mA step resolution. The + POR value is 0x1000h. This number is in mA (e.g. 8064), see the spec for + more information about the InputCurrent (0x3fh) register. + +Example: + + bq24735@9 { + compatible = "ti,bq24735"; + reg = <0x9>; + ti,ac-detect-gpios = <&gpio 72 0x1>; + } diff --git a/Documentation/devicetree/bindings/powerpc/fsl/dma.txt b/Documentation/devicetree/bindings/powerpc/fsl/dma.txt index 2a4b4bce6110..7fc1b010fa75 100644 --- a/Documentation/devicetree/bindings/powerpc/fsl/dma.txt +++ b/Documentation/devicetree/bindings/powerpc/fsl/dma.txt @@ -1,33 +1,30 @@ -* Freescale 83xx DMA Controller +* Freescale DMA Controllers -Freescale PowerPC 83xx have on chip general purpose DMA controllers. +** Freescale Elo DMA Controller + This is a little-endian 4-channel DMA controller, used in Freescale mpc83xx + series chips such as mpc8315, mpc8349, mpc8379 etc. Required properties: -- compatible : compatible list, contains 2 entries, first is - "fsl,CHIP-dma", where CHIP is the processor - (mpc8349, mpc8360, etc.) and the second is - "fsl,elo-dma" -- reg : <registers mapping for DMA general status reg> -- ranges : Should be defined as specified in 1) to describe the - DMA controller channels. +- compatible : must include "fsl,elo-dma" +- reg : DMA General Status Register, i.e. DGSR which contains + status for all the 4 DMA channels +- ranges : describes the mapping between the address space of the + DMA channels and the address space of the DMA controller - cell-index : controller index. 0 for controller @ 0x8100 -- interrupts : <interrupt mapping for DMA IRQ> +- interrupts : interrupt specifier for DMA IRQ - interrupt-parent : optional, if needed for interrupt mapping - - DMA channel nodes: - - compatible : compatible list, contains 2 entries, first is - "fsl,CHIP-dma-channel", where CHIP is the processor - (mpc8349, mpc8350, etc.) and the second is - "fsl,elo-dma-channel". However, see note below. - - reg : <registers mapping for channel> - - cell-index : dma channel index starts at 0. + - compatible : must include "fsl,elo-dma-channel" + However, see note below. + - reg : DMA channel specific registers + - cell-index : DMA channel index starts at 0. Optional properties: - - interrupts : <interrupt mapping for DMA channel IRQ> - (on 83xx this is expected to be identical to - the interrupts property of the parent node) + - interrupts : interrupt specifier for DMA channel IRQ + (on 83xx this is expected to be identical to + the interrupts property of the parent node) - interrupt-parent : optional, if needed for interrupt mapping Example: @@ -70,30 +67,27 @@ Example: }; }; -* Freescale 85xx/86xx DMA Controller - -Freescale PowerPC 85xx/86xx have on chip general purpose DMA controllers. +** Freescale EloPlus DMA Controller + This is a 4-channel DMA controller with extended addresses and chaining, + mainly used in Freescale mpc85xx/86xx, Pxxx and BSC series chips, such as + mpc8540, mpc8641 p4080, bsc9131 etc. Required properties: -- compatible : compatible list, contains 2 entries, first is - "fsl,CHIP-dma", where CHIP is the processor - (mpc8540, mpc8540, etc.) and the second is - "fsl,eloplus-dma" -- reg : <registers mapping for DMA general status reg> +- compatible : must include "fsl,eloplus-dma" +- reg : DMA General Status Register, i.e. DGSR which contains + status for all the 4 DMA channels - cell-index : controller index. 0 for controller @ 0x21000, 1 for controller @ 0xc000 -- ranges : Should be defined as specified in 1) to describe the - DMA controller channels. +- ranges : describes the mapping between the address space of the + DMA channels and the address space of the DMA controller - DMA channel nodes: - - compatible : compatible list, contains 2 entries, first is - "fsl,CHIP-dma-channel", where CHIP is the processor - (mpc8540, mpc8560, etc.) and the second is - "fsl,eloplus-dma-channel". However, see note below. - - cell-index : dma channel index starts at 0. - - reg : <registers mapping for channel> - - interrupts : <interrupt mapping for DMA channel IRQ> + - compatible : must include "fsl,eloplus-dma-channel" + However, see note below. + - cell-index : DMA channel index starts at 0. + - reg : DMA channel specific registers + - interrupts : interrupt specifier for DMA channel IRQ - interrupt-parent : optional, if needed for interrupt mapping Example: @@ -134,6 +128,76 @@ Example: }; }; +** Freescale Elo3 DMA Controller + DMA controller which has same function as EloPlus except that Elo3 has 8 + channels while EloPlus has only 4, it is used in Freescale Txxx and Bxxx + series chips, such as t1040, t4240, b4860. + +Required properties: + +- compatible : must include "fsl,elo3-dma" +- reg : contains two entries for DMA General Status Registers, + i.e. DGSR0 which includes status for channel 1~4, and + DGSR1 for channel 5~8 +- ranges : describes the mapping between the address space of the + DMA channels and the address space of the DMA controller + +- DMA channel nodes: + - compatible : must include "fsl,eloplus-dma-channel" + - reg : DMA channel specific registers + - interrupts : interrupt specifier for DMA channel IRQ + - interrupt-parent : optional, if needed for interrupt mapping + +Example: +dma@100300 { + #address-cells = <1>; + #size-cells = <1>; + compatible = "fsl,elo3-dma"; + reg = <0x100300 0x4>, + <0x100600 0x4>; + ranges = <0x0 0x100100 0x500>; + dma-channel@0 { + compatible = "fsl,eloplus-dma-channel"; + reg = <0x0 0x80>; + interrupts = <28 2 0 0>; + }; + dma-channel@80 { + compatible = "fsl,eloplus-dma-channel"; + reg = <0x80 0x80>; + interrupts = <29 2 0 0>; + }; + dma-channel@100 { + compatible = "fsl,eloplus-dma-channel"; + reg = <0x100 0x80>; + interrupts = <30 2 0 0>; + }; + dma-channel@180 { + compatible = "fsl,eloplus-dma-channel"; + reg = <0x180 0x80>; + interrupts = <31 2 0 0>; + }; + dma-channel@300 { + compatible = "fsl,eloplus-dma-channel"; + reg = <0x300 0x80>; + interrupts = <76 2 0 0>; + }; + dma-channel@380 { + compatible = "fsl,eloplus-dma-channel"; + reg = <0x380 0x80>; + interrupts = <77 2 0 0>; + }; + dma-channel@400 { + compatible = "fsl,eloplus-dma-channel"; + reg = <0x400 0x80>; + interrupts = <78 2 0 0>; + }; + dma-channel@480 { + compatible = "fsl,eloplus-dma-channel"; + reg = <0x480 0x80>; + interrupts = <79 2 0 0>; + }; +}; + Note on DMA channel compatible properties: The compatible property must say "fsl,elo-dma-channel" or "fsl,eloplus-dma-channel" to be used by the Elo DMA driver (fsldma). Any DMA channel used by fsldma cannot be used by another diff --git a/Documentation/devicetree/bindings/pwm/pwm-samsung.txt b/Documentation/devicetree/bindings/pwm/pwm-samsung.txt index d61fccd40bad..5538de9c2007 100644 --- a/Documentation/devicetree/bindings/pwm/pwm-samsung.txt +++ b/Documentation/devicetree/bindings/pwm/pwm-samsung.txt @@ -15,7 +15,7 @@ Required properties: samsung,s5pc100-pwm - for 32-bit timers present on S5PC100, S5PV210, Exynos4210 rev0 SoCs samsung,exynos4210-pwm - for 32-bit timers present on Exynos4210, - Exynos4x12 and Exynos5250 SoCs + Exynos4x12, Exynos5250 and Exynos5420 SoCs - reg: base address and size of register area - interrupts: list of timer interrupts (one interrupt per timer, starting at timer 0) diff --git a/Documentation/devicetree/bindings/video/atmel,lcdc.txt b/Documentation/devicetree/bindings/video/atmel,lcdc.txt new file mode 100644 index 000000000000..1ec175eddca8 --- /dev/null +++ b/Documentation/devicetree/bindings/video/atmel,lcdc.txt @@ -0,0 +1,75 @@ +Atmel LCDC Framebuffer +----------------------------------------------------- + +Required properties: +- compatible : + "atmel,at91sam9261-lcdc" , + "atmel,at91sam9263-lcdc" , + "atmel,at91sam9g10-lcdc" , + "atmel,at91sam9g45-lcdc" , + "atmel,at91sam9g45es-lcdc" , + "atmel,at91sam9rl-lcdc" , + "atmel,at32ap-lcdc" +- reg : Should contain 1 register ranges(address and length) +- interrupts : framebuffer controller interrupt +- display: a phandle pointing to the display node + +Required nodes: +- display: a display node is required to initialize the lcd panel + This should be in the board dts. +- default-mode: a videomode within the display with timing parameters + as specified below. + +Example: + + fb0: fb@0x00500000 { + compatible = "atmel,at91sam9g45-lcdc"; + reg = <0x00500000 0x1000>; + interrupts = <23 3 0>; + pinctrl-names = "default"; + pinctrl-0 = <&pinctrl_fb>; + display = <&display0>; + status = "okay"; + #address-cells = <1>; + #size-cells = <1>; + + }; + +Atmel LCDC Display +----------------------------------------------------- +Required properties (as per of_videomode_helper): + + - atmel,dmacon: dma controler configuration + - atmel,lcdcon2: lcd controler configuration + - atmel,guard-time: lcd guard time (Delay in frame periods) + - bits-per-pixel: lcd panel bit-depth. + +Optional properties (as per of_videomode_helper): + - atmel,lcdcon-backlight: enable backlight + - atmel,lcd-wiring-mode: lcd wiring mode "RGB" or "BRG" + - atmel,power-control-gpio: gpio to power on or off the LCD (as many as needed) + +Example: + display0: display { + bits-per-pixel = <32>; + atmel,lcdcon-backlight; + atmel,dmacon = <0x1>; + atmel,lcdcon2 = <0x80008002>; + atmel,guard-time = <9>; + atmel,lcd-wiring-mode = <1>; + + display-timings { + native-mode = <&timing0>; + timing0: timing0 { + clock-frequency = <9000000>; + hactive = <480>; + vactive = <272>; + hback-porch = <1>; + hfront-porch = <1>; + vback-porch = <40>; + vfront-porch = <1>; + hsync-len = <45>; + vsync-len = <1>; + }; + }; + }; diff --git a/Documentation/devicetree/bindings/video/backlight/lp855x.txt b/Documentation/devicetree/bindings/video/backlight/lp855x.txt index 1482103d288f..96e83a56048e 100644 --- a/Documentation/devicetree/bindings/video/backlight/lp855x.txt +++ b/Documentation/devicetree/bindings/video/backlight/lp855x.txt @@ -2,7 +2,7 @@ lp855x bindings Required properties: - compatible: "ti,lp8550", "ti,lp8551", "ti,lp8552", "ti,lp8553", - "ti,lp8556", "ti,lp8557" + "ti,lp8555", "ti,lp8556", "ti,lp8557" - reg: I2C slave address (u8) - dev-ctrl: Value of DEVICE CONTROL register (u8). It depends on the device. @@ -15,6 +15,33 @@ Optional properties: Example: + /* LP8555 */ + backlight@2c { + compatible = "ti,lp8555"; + reg = <0x2c>; + + dev-ctrl = /bits/ 8 <0x00>; + pwm-period = <10000>; + + /* 4V OV, 4 output LED0 string enabled */ + rom_14h { + rom-addr = /bits/ 8 <0x14>; + rom-val = /bits/ 8 <0xcf>; + }; + + /* Heavy smoothing, 24ms ramp time step */ + rom_15h { + rom-addr = /bits/ 8 <0x15>; + rom-val = /bits/ 8 <0xc7>; + }; + + /* 4 output LED1 string enabled */ + rom_19h { + rom-addr = /bits/ 8 <0x19>; + rom-val = /bits/ 8 <0x0f>; + }; + }; + /* LP8556 */ backlight@2c { compatible = "ti,lp8556"; diff --git a/Documentation/devicetree/bindings/video/backlight/pwm-backlight.txt b/Documentation/devicetree/bindings/video/backlight/pwm-backlight.txt index 1e4fc727f3b1..764db86d441a 100644 --- a/Documentation/devicetree/bindings/video/backlight/pwm-backlight.txt +++ b/Documentation/devicetree/bindings/video/backlight/pwm-backlight.txt @@ -10,12 +10,16 @@ Required properties: last value in the array represents a 100% duty cycle (brightest). - default-brightness-level: the default brightness level (index into the array defined by the "brightness-levels" property) + - power-supply: regulator for supply voltage Optional properties: - pwm-names: a list of names for the PWM devices specified in the "pwms" property (see PWM binding[0]) + - enable-gpios: contains a single GPIO specifier for the GPIO which enables + and disables the backlight (see GPIO binding[1]) [0]: Documentation/devicetree/bindings/pwm/pwm.txt +[1]: Documentation/devicetree/bindings/gpio/gpio.txt Example: @@ -25,4 +29,7 @@ Example: brightness-levels = <0 4 8 16 32 64 128 255>; default-brightness-level = <6>; + + power-supply = <&vdd_bl_reg>; + enable-gpios = <&gpio 58 0>; }; diff --git a/Documentation/devicetree/bindings/watchdog/dw_wdt.txt b/Documentation/devicetree/bindings/watchdog/dw_wdt.txt new file mode 100644 index 000000000000..08e16f684f2d --- /dev/null +++ b/Documentation/devicetree/bindings/watchdog/dw_wdt.txt @@ -0,0 +1,21 @@ +Synopsys Designware Watchdog Timer + +Required Properties: + +- compatible : Should contain "snps,dw-wdt" +- reg : Base address and size of the watchdog timer registers. +- clocks : phandle + clock-specifier for the clock that drives the + watchdog timer. + +Optional Properties: + +- interrupts : The interrupt used for the watchdog timeout warning. + +Example: + + watchdog0: wd@ffd02000 { + compatible = "snps,dw-wdt"; + reg = <0xffd02000 0x1000>; + interrupts = <0 171 4>; + clocks = <&per_base_clk>; + }; diff --git a/Documentation/devicetree/bindings/gpio/men-a021-wdt.txt b/Documentation/devicetree/bindings/watchdog/men-a021-wdt.txt index 370dee3226d9..370dee3226d9 100644 --- a/Documentation/devicetree/bindings/gpio/men-a021-wdt.txt +++ b/Documentation/devicetree/bindings/watchdog/men-a021-wdt.txt diff --git a/Documentation/devicetree/bindings/watchdog/moxa,moxart-watchdog.txt b/Documentation/devicetree/bindings/watchdog/moxa,moxart-watchdog.txt new file mode 100644 index 000000000000..1169857d1d12 --- /dev/null +++ b/Documentation/devicetree/bindings/watchdog/moxa,moxart-watchdog.txt @@ -0,0 +1,15 @@ +MOXA ART Watchdog timer + +Required properties: + +- compatible : Must be "moxa,moxart-watchdog" +- reg : Should contain registers location and length +- clocks : Should contain phandle for the clock that drives the counter + +Example: + + watchdog: watchdog@98500000 { + compatible = "moxa,moxart-watchdog"; + reg = <0x98500000 0x10>; + clocks = <&coreclk>; + }; diff --git a/Documentation/devicetree/bindings/watchdog/rt2880-wdt.txt b/Documentation/devicetree/bindings/watchdog/rt2880-wdt.txt new file mode 100644 index 000000000000..d7bab3db9d1f --- /dev/null +++ b/Documentation/devicetree/bindings/watchdog/rt2880-wdt.txt @@ -0,0 +1,19 @@ +Ralink Watchdog Timers + +Required properties: +- compatible: must be "ralink,rt2880-wdt" +- reg: physical base address of the controller and length of the register range + +Optional properties: +- interrupt-parent: phandle to the INTC device node +- interrupts: Specify the INTC interrupt number + +Example: + + watchdog@120 { + compatible = "ralink,rt2880-wdt"; + reg = <0x120 0x10>; + + interrupt-parent = <&intc>; + interrupts = <1>; + }; diff --git a/Documentation/devicetree/bindings/watchdog/sirfsoc_wdt.txt b/Documentation/devicetree/bindings/watchdog/sirfsoc_wdt.txt new file mode 100644 index 000000000000..9cbc76c89b2b --- /dev/null +++ b/Documentation/devicetree/bindings/watchdog/sirfsoc_wdt.txt @@ -0,0 +1,14 @@ +SiRFSoC Timer and Watchdog Timer(WDT) Controller + +Required properties: +- compatible: "sirf,prima2-tick" +- reg: Address range of tick timer/WDT register set +- interrupts: interrupt number to the cpu + +Example: + +timer@b0020000 { + compatible = "sirf,prima2-tick"; + reg = <0xb0020000 0x1000>; + interrupts = <0>; +}; diff --git a/Documentation/dmatest.txt b/Documentation/dmatest.txt index a2b5663eae26..dd77a81bdb80 100644 --- a/Documentation/dmatest.txt +++ b/Documentation/dmatest.txt @@ -15,39 +15,48 @@ be built as module or inside kernel. Let's consider those cases. Part 2 - When dmatest is built as a module... -After mounting debugfs and loading the module, the /sys/kernel/debug/dmatest -folder with nodes will be created. There are two important files located. First -is the 'run' node that controls run and stop phases of the test, and the second -one, 'results', is used to get the test case results. - -Note that in this case test will not run on load automatically. - Example of usage: + % modprobe dmatest channel=dma0chan0 timeout=2000 iterations=1 run=1 + +...or: + % modprobe dmatest % echo dma0chan0 > /sys/module/dmatest/parameters/channel % echo 2000 > /sys/module/dmatest/parameters/timeout % echo 1 > /sys/module/dmatest/parameters/iterations - % echo 1 > /sys/kernel/debug/dmatest/run + % echo 1 > /sys/module/dmatest/parameters/run + +...or on the kernel command line: + + dmatest.channel=dma0chan0 dmatest.timeout=2000 dmatest.iterations=1 dmatest.run=1 Hint: available channel list could be extracted by running the following command: % ls -1 /sys/class/dma/ -After a while you will start to get messages about current status or error like -in the original code. +Once started a message like "dmatest: Started 1 threads using dma0chan0" is +emitted. After that only test failure messages are reported until the test +stops. Note that running a new test will not stop any in progress test. -The following command should return actual state of the test. - % cat /sys/kernel/debug/dmatest/run - -To wait for test done the user may perform a busy loop that checks the state. - - % while [ $(cat /sys/kernel/debug/dmatest/run) = "Y" ] - > do - > echo -n "." - > sleep 1 - > done - > echo +The following command returns the state of the test. + % cat /sys/module/dmatest/parameters/run + +To wait for test completion userpace can poll 'run' until it is false, or use +the wait parameter. Specifying 'wait=1' when loading the module causes module +initialization to pause until a test run has completed, while reading +/sys/module/dmatest/parameters/wait waits for any running test to complete +before returning. For example, the following scripts wait for 42 tests +to complete before exiting. Note that if 'iterations' is set to 'infinite' then +waiting is disabled. + +Example: + % modprobe dmatest run=1 iterations=42 wait=1 + % modprobe -r dmatest +...or: + % modprobe dmatest run=1 iterations=42 + % cat /sys/module/dmatest/parameters/wait + % modprobe -r dmatest Part 3 - When built-in in the kernel... @@ -62,21 +71,22 @@ case. You always could check them at run-time by running Part 4 - Gathering the test results -The module provides a storage for the test results in the memory. The gathered -data could be used after test is done. +Test results are printed to the kernel log buffer with the format: -The special file 'results' in the debugfs represents gathered data of the in -progress test. The messages collected are printed to the kernel log as well. +"dmatest: result <channel>: <test id>: '<error msg>' with src_off=<val> dst_off=<val> len=<val> (<err code>)" Example of output: - % cat /sys/kernel/debug/dmatest/results - dma0chan0-copy0: #1: No errors with src_off=0x7bf dst_off=0x8ad len=0x3fea (0) + % dmesg | tail -n 1 + dmatest: result dma0chan0-copy0: #1: No errors with src_off=0x7bf dst_off=0x8ad len=0x3fea (0) The message format is unified across the different types of errors. A number in the parens represents additional information, e.g. error code, error counter, -or status. +or status. A test thread also emits a summary line at completion listing the +number of tests executed, number that failed, and a result code. -Comparison between buffers is stored to the dedicated structure. +Example: + % dmesg | tail -n 1 + dmatest: dma0chan0-copy0: summary 1 test, 0 failures 1000 iops 100000 KB/s (0) -Note that the verify result is now accessible only via file 'results' in the -debugfs. +The details of a data miscompare error are also emitted, but do not follow the +above format. diff --git a/Documentation/filesystems/btrfs.txt b/Documentation/filesystems/btrfs.txt index 9dae59407437..5dd282dda55c 100644 --- a/Documentation/filesystems/btrfs.txt +++ b/Documentation/filesystems/btrfs.txt @@ -70,6 +70,12 @@ Unless otherwise specified, all options default to off. See comments at the top of fs/btrfs/check-integrity.c for more info. + commit=<seconds> + Set the interval of periodic commit, 30 seconds by default. Higher + values defer data being synced to permanent storage with obvious + consequences when the system crashes. The upper bound is not forced, + but a warning is printed if it's more than 300 seconds (5 minutes). + compress compress=<type> compress-force @@ -154,7 +160,11 @@ Unless otherwise specified, all options default to off. Currently this scans a list of several previous tree roots and tries to use the first readable. - skip_balance + rescan_uuid_tree + Force check and rebuild procedure of the UUID tree. This should not + normally be needed. + + skip_balance Skip automatic resume of interrupted balance operation after mount. May be resumed with "btrfs balance resume." @@ -234,24 +244,14 @@ available from the git repository at the following location: These include the following tools: -mkfs.btrfs: create a filesystem - -btrfsctl: control program to create snapshots and subvolumes: +* mkfs.btrfs: create a filesystem - mount /dev/sda2 /mnt - btrfsctl -s new_subvol_name /mnt - btrfsctl -s snapshot_of_default /mnt/default - btrfsctl -s snapshot_of_new_subvol /mnt/new_subvol_name - btrfsctl -s snapshot_of_a_snapshot /mnt/snapshot_of_new_subvol - ls /mnt - default snapshot_of_a_snapshot snapshot_of_new_subvol - new_subvol_name snapshot_of_default +* btrfs: a single tool to manage the filesystems, refer to the manpage for more details - Snapshots and subvolumes cannot be deleted right now, but you can - rm -rf all the files and directories inside them. +* 'btrfsck' or 'btrfs check': do a consistency check of the filesystem -btrfsck: do a limited check of the FS extent trees. +Other tools for specific tasks: -btrfs-debug-tree: print all of the FS metadata in text form. Example: +* btrfs-convert: in-place conversion from ext2/3/4 filesystems - btrfs-debug-tree /dev/sda2 >& big_output_file +* btrfs-image: dump filesystem metadata for debugging diff --git a/Documentation/filesystems/directory-locking b/Documentation/filesystems/directory-locking index ff7b611abf33..09bbf9a54f80 100644 --- a/Documentation/filesystems/directory-locking +++ b/Documentation/filesystems/directory-locking @@ -2,6 +2,10 @@ kinds of locks - per-inode (->i_mutex) and per-filesystem (->s_vfs_rename_mutex). + When taking the i_mutex on multiple non-directory objects, we +always acquire the locks in order by increasing address. We'll call +that "inode pointer" order in the following. + For our purposes all operations fall in 5 classes: 1) read access. Locking rules: caller locks directory we are accessing. @@ -12,8 +16,9 @@ kinds of locks - per-inode (->i_mutex) and per-filesystem locks victim and calls the method. 4) rename() that is _not_ cross-directory. Locking rules: caller locks -the parent, finds source and target, if target already exists - locks it -and then calls the method. +the parent and finds source and target. If target already exists, lock +it. If source is a non-directory, lock it. If that means we need to +lock both, lock them in inode pointer order. 5) link creation. Locking rules: * lock parent @@ -30,7 +35,9 @@ rules: fail with -ENOTEMPTY * if new parent is equal to or is a descendent of source fail with -ELOOP - * if target exists - lock it. + * If target exists, lock it. If source is a non-directory, lock + it. In case that means we need to lock both source and target, + do so in inode pointer order. * call the method. @@ -56,9 +63,11 @@ objects - A < B iff A is an ancestor of B. renames will be blocked on filesystem lock and we don't start changing the order until we had acquired all locks). -(3) any operation holds at most one lock on non-directory object and - that lock is acquired after all other locks. (Proof: see descriptions - of operations). +(3) locks on non-directory objects are acquired only after locks on + directory objects, and are acquired in inode pointer order. + (Proof: all operations but renames take lock on at most one + non-directory object, except renames, which take locks on source and + target in inode pointer order in the case they are not directories.) Now consider the minimal deadlock. Each process is blocked on attempt to acquire some lock and already holds at least one lock. Let's @@ -66,9 +75,13 @@ consider the set of contended locks. First of all, filesystem lock is not contended, since any process blocked on it is not holding any locks. Thus all processes are blocked on ->i_mutex. - Non-directory objects are not contended due to (3). Thus link -creation can't be a part of deadlock - it can't be blocked on source -and it means that it doesn't hold any locks. + By (3), any process holding a non-directory lock can only be +waiting on another non-directory lock with a larger address. Therefore +the process holding the "largest" such lock can always make progress, and +non-directory objects are not included in the set of contended locks. + + Thus link creation can't be a part of deadlock - it can't be +blocked on source and it means that it doesn't hold any locks. Any contended object is either held by cross-directory rename or has a child that is also contended. Indeed, suppose that it is held by diff --git a/Documentation/filesystems/f2fs.txt b/Documentation/filesystems/f2fs.txt index 3cd27bed6349..a3fe811bbdbc 100644 --- a/Documentation/filesystems/f2fs.txt +++ b/Documentation/filesystems/f2fs.txt @@ -119,6 +119,7 @@ active_logs=%u Support configuring the number of active logs. In the Default number is 6. disable_ext_identify Disable the extension list configured by mkfs, so f2fs does not aware of cold files such as media files. +inline_xattr Enable the inline xattrs feature. ================================================================================ DEBUGFS ENTRIES @@ -164,6 +165,12 @@ Files in /sys/fs/f2fs/<devname> gc_idle = 1 will select the Cost Benefit approach & setting gc_idle = 2 will select the greedy aproach. + reclaim_segments This parameter controls the number of prefree + segments to be reclaimed. If the number of prefree + segments is larger than this number, f2fs tries to + conduct checkpoint to reclaim the prefree segments + to free segments. By default, 100 segments, 200MB. + ================================================================================ USAGE ================================================================================ diff --git a/Documentation/filesystems/porting b/Documentation/filesystems/porting index f0890581f7f6..fe2b7ae6f962 100644 --- a/Documentation/filesystems/porting +++ b/Documentation/filesystems/porting @@ -455,3 +455,11 @@ in your dentry operations instead. vfs_follow_link has been removed. Filesystems must use nd_set_link from ->follow_link for normal symlinks, or nd_jump_link for magic /proc/<pid> style links. +-- +[mandatory] + iget5_locked()/ilookup5()/ilookup5_nowait() test() callback used to be + called with both ->i_lock and inode_hash_lock held; the former is *not* + taken anymore, so verify that your callbacks do not rely on it (none + of the in-tree instances did). inode_hash_lock is still held, + of course, so they are still serialized wrt removal from inode hash, + as well as wrt set() callback of iget5_locked(). diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index 823c95faebd2..22d89aa37218 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt @@ -460,6 +460,7 @@ manner. The codes are the following: nl - non-linear mapping ar - architecture specific flag dd - do not include area into core dump + sd - soft-dirty flag mm - mixed map area hg - huge page advise flag nh - no-huge page advise flag diff --git a/Documentation/filesystems/vfat.txt b/Documentation/filesystems/vfat.txt index aa1f459fa6cf..4a93e98b290a 100644 --- a/Documentation/filesystems/vfat.txt +++ b/Documentation/filesystems/vfat.txt @@ -307,7 +307,7 @@ the following: <proceeding files...> <slot #3, id = 0x43, characters = "h is long"> - <slot #2, id = 0x02, characters = "xtension which"> + <slot #2, id = 0x02, characters = "xtension whic"> <slot #1, id = 0x01, characters = "My Big File.E"> <directory entry, name = "MYBIGFIL.EXT"> diff --git a/Documentation/gcov.txt b/Documentation/gcov.txt index e7ca6478cd93..7b727783db7e 100644 --- a/Documentation/gcov.txt +++ b/Documentation/gcov.txt @@ -50,6 +50,10 @@ Configure the kernel with: CONFIG_DEBUG_FS=y CONFIG_GCOV_KERNEL=y +select the gcc's gcov format, default is autodetect based on gcc version: + + CONFIG_GCOV_FORMAT_AUTODETECT=y + and to get coverage data for the entire kernel: CONFIG_GCOV_PROFILE_ALL=y diff --git a/Documentation/hwmon/lm90 b/Documentation/hwmon/lm90 index b466974e142f..ab81013cc390 100644 --- a/Documentation/hwmon/lm90 +++ b/Documentation/hwmon/lm90 @@ -122,6 +122,12 @@ Supported chips: Prefix: 'g781' Addresses scanned: I2C 0x4c, 0x4d Datasheet: Not publicly available from GMT + * Texas Instruments TMP451 + Prefix: 'tmp451' + Addresses scanned: I2C 0x4c + Datasheet: Publicly available at TI website + http://www.ti.com/litv/pdf/sbos686 + Author: Jean Delvare <khali@linux-fr.org> diff --git a/Documentation/i2c/busses/i2c-i801 b/Documentation/i2c/busses/i2c-i801 index d29dea0f3232..7b0dcdb57173 100644 --- a/Documentation/i2c/busses/i2c-i801 +++ b/Documentation/i2c/busses/i2c-i801 @@ -25,6 +25,7 @@ Supported adapters: * Intel Avoton (SOC) * Intel Wellsburg (PCH) * Intel Coleto Creek (PCH) + * Intel Wildcat Point-LP (PCH) Datasheets: Publicly available at the Intel website On Intel Patsburg and later chipsets, both the normal host SMBus controller diff --git a/Documentation/input/gamepad.txt b/Documentation/input/gamepad.txt index 8002c894c6b0..31bb6a4029ef 100644 --- a/Documentation/input/gamepad.txt +++ b/Documentation/input/gamepad.txt @@ -122,12 +122,14 @@ D-Pad: BTN_DPAD_* Analog buttons are reported as: ABS_HAT0X and ABS_HAT0Y + (for ABS values negative is left/up, positive is right/down) Analog-Sticks: The left analog-stick is reported as ABS_X, ABS_Y. The right analog stick is reported as ABS_RX, ABS_RY. Zero, one or two sticks may be present. If analog-sticks provide digital buttons, they are mapped accordingly as BTN_THUMBL (first/left) and BTN_THUMBR (second/right). + (for ABS values negative is left/up, positive is right/down) Triggers: Trigger buttons can be available as digital or analog buttons or both. User- @@ -138,6 +140,7 @@ Triggers: ABS_HAT2X (right/ZR) and BTN_TL2 or ABS_HAT2Y (left/ZL). If only one trigger-button combination is present (upper+lower), they are reported as "right" triggers (BTN_TR/ABS_HAT1X). + (ABS trigger values start at 0, pressure is reported as positive values) Menu-Pad: Menu buttons are always digital and are mapped according to their location diff --git a/Documentation/kbuild/kconfig.txt b/Documentation/kbuild/kconfig.txt index 8ef6dbb6a462..bbc99c0c1094 100644 --- a/Documentation/kbuild/kconfig.txt +++ b/Documentation/kbuild/kconfig.txt @@ -20,16 +20,9 @@ symbols have been introduced. To see a list of new config symbols when using "make oldconfig", use cp user/some/old.config .config - yes "" | make oldconfig >conf.new + make listnewconfig -and the config program will list as (NEW) any new symbols that have -unknown values. Of course, the .config file is also updated with -new (default) values, so you can use: - - grep "(NEW)" conf.new - -to see the new config symbols or you can use diffconfig to see the -differences between the previous and new .config files: +and the config program will list any new symbols, one per line. scripts/diffconfig .config.old .config | less diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index fd3ecedc084d..50680a59a2ff 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -1070,6 +1070,9 @@ bytes respectively. Such letter suffixes can also be entirely omitted. VIA, nVidia) verbose: show contents of HPET registers during setup + hpet_mmap= [X86, HPET_MMAP] Allow userspace to mmap HPET + registers. Default set by CONFIG_HPET_MMAP_DEFAULT. + hugepages= [HW,X86-32,IA-64] HugeTLB pages to allocate at boot. hugepagesz= [HW,IA-64,PPC,X86-64] The size of the HugeTLB pages. On x86-64 and powerpc, this option can be specified @@ -1187,15 +1190,24 @@ bytes respectively. Such letter suffixes can also be entirely omitted. owned by uid=0. ima_hash= [IMA] - Format: { "sha1" | "md5" } + Format: { md5 | sha1 | rmd160 | sha256 | sha384 + | sha512 | ... } default: "sha1" + The list of supported hash algorithms is defined + in crypto/hash_info.h. + ima_tcb [IMA] Load a policy which meets the needs of the Trusted Computing Base. This means IMA will measure all programs exec'd, files mmap'd for exec, and all files opened for read by uid=0. + ima_template= [IMA] + Select one of defined IMA measurements template formats. + Formats: { "ima" | "ima-ng" } + Default: "ima-ng" + init= [KNL] Format: <full_path> Run specified binary instead of /sbin/init as init @@ -1775,6 +1787,9 @@ bytes respectively. Such letter suffixes can also be entirely omitted. that the amount of memory usable for all allocations is not too small. + movable_node [KNL,X86] Boot-time switch to enable the effects + of CONFIG_MOVABLE_NODE=y. See mm/Kconfig for details. + MTD_Partition= [MTD] Format: <name>,<region-number>,<size>,<offset> diff --git a/Documentation/lockstat.txt b/Documentation/lockstat.txt index dd2f7b26ca30..72d010689751 100644 --- a/Documentation/lockstat.txt +++ b/Documentation/lockstat.txt @@ -46,16 +46,14 @@ With these hooks we provide the following statistics: contentions - number of lock acquisitions that had to wait wait time min - shortest (non-0) time we ever had to wait for a lock max - longest time we ever had to wait for a lock - total - total time we spend waiting on this lock + total - total time we spend waiting on this lock + avg - average time spent waiting on this lock acq-bounces - number of lock acquisitions that involved x-cpu data acquisitions - number of times we took the lock hold time min - shortest (non-0) time we ever held the lock - max - longest time we ever held the lock - total - total time this lock was held - -From these number various other statistics can be derived, such as: - - hold time average = hold time total / acquisitions + max - longest time we ever held the lock + total - total time this lock was held + avg - average time this lock was held These numbers are gathered per lock class, per read/write state (when applicable). @@ -84,37 +82,38 @@ Look at the current lock statistics: # less /proc/lock_stat -01 lock_stat version 0.3 -02 ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- -03 class name con-bounces contentions waittime-min waittime-max waittime-total acq-bounces acquisitions holdtime-min holdtime-max holdtime-total -04 ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- +01 lock_stat version 0.4 +02----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- +03 class name con-bounces contentions waittime-min waittime-max waittime-total waittime-avg acq-bounces acquisitions holdtime-min holdtime-max holdtime-total holdtime-avg +04----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 05 -06 &mm->mmap_sem-W: 233 538 18446744073708 22924.27 607243.51 1342 45806 1.71 8595.89 1180582.34 -07 &mm->mmap_sem-R: 205 587 18446744073708 28403.36 731975.00 1940 412426 0.58 187825.45 6307502.88 -08 --------------- -09 &mm->mmap_sem 487 [<ffffffff8053491f>] do_page_fault+0x466/0x928 -10 &mm->mmap_sem 179 [<ffffffff802a6200>] sys_mprotect+0xcd/0x21d -11 &mm->mmap_sem 279 [<ffffffff80210a57>] sys_mmap+0x75/0xce -12 &mm->mmap_sem 76 [<ffffffff802a490b>] sys_munmap+0x32/0x59 -13 --------------- -14 &mm->mmap_sem 270 [<ffffffff80210a57>] sys_mmap+0x75/0xce -15 &mm->mmap_sem 431 [<ffffffff8053491f>] do_page_fault+0x466/0x928 -16 &mm->mmap_sem 138 [<ffffffff802a490b>] sys_munmap+0x32/0x59 -17 &mm->mmap_sem 145 [<ffffffff802a6200>] sys_mprotect+0xcd/0x21d +06 &mm->mmap_sem-W: 46 84 0.26 939.10 16371.53 194.90 47291 2922365 0.16 2220301.69 17464026916.32 5975.99 +07 &mm->mmap_sem-R: 37 100 1.31 299502.61 325629.52 3256.30 212344 34316685 0.10 7744.91 95016910.20 2.77 +08 --------------- +09 &mm->mmap_sem 1 [<ffffffff811502a7>] khugepaged_scan_mm_slot+0x57/0x280 +19 &mm->mmap_sem 96 [<ffffffff815351c4>] __do_page_fault+0x1d4/0x510 +11 &mm->mmap_sem 34 [<ffffffff81113d77>] vm_mmap_pgoff+0x87/0xd0 +12 &mm->mmap_sem 17 [<ffffffff81127e71>] vm_munmap+0x41/0x80 +13 --------------- +14 &mm->mmap_sem 1 [<ffffffff81046fda>] dup_mmap+0x2a/0x3f0 +15 &mm->mmap_sem 60 [<ffffffff81129e29>] SyS_mprotect+0xe9/0x250 +16 &mm->mmap_sem 41 [<ffffffff815351c4>] __do_page_fault+0x1d4/0x510 +17 &mm->mmap_sem 68 [<ffffffff81113d77>] vm_mmap_pgoff+0x87/0xd0 18 -19 ............................................................................................................................................................................................... +19............................................................................................................................................................................................................................. 20 -21 dcache_lock: 621 623 0.52 118.26 1053.02 6745 91930 0.29 316.29 118423.41 -22 ----------- -23 dcache_lock 179 [<ffffffff80378274>] _atomic_dec_and_lock+0x34/0x54 -24 dcache_lock 113 [<ffffffff802cc17b>] d_alloc+0x19a/0x1eb -25 dcache_lock 99 [<ffffffff802ca0dc>] d_rehash+0x1b/0x44 -26 dcache_lock 104 [<ffffffff802cbca0>] d_instantiate+0x36/0x8a -27 ----------- -28 dcache_lock 192 [<ffffffff80378274>] _atomic_dec_and_lock+0x34/0x54 -29 dcache_lock 98 [<ffffffff802ca0dc>] d_rehash+0x1b/0x44 -30 dcache_lock 72 [<ffffffff802cc17b>] d_alloc+0x19a/0x1eb -31 dcache_lock 112 [<ffffffff802cbca0>] d_instantiate+0x36/0x8a +21 unix_table_lock: 110 112 0.21 49.24 163.91 1.46 21094 66312 0.12 624.42 31589.81 0.48 +22 --------------- +23 unix_table_lock 45 [<ffffffff8150ad8e>] unix_create1+0x16e/0x1b0 +24 unix_table_lock 47 [<ffffffff8150b111>] unix_release_sock+0x31/0x250 +25 unix_table_lock 15 [<ffffffff8150ca37>] unix_find_other+0x117/0x230 +26 unix_table_lock 5 [<ffffffff8150a09f>] unix_autobind+0x11f/0x1b0 +27 --------------- +28 unix_table_lock 39 [<ffffffff8150b111>] unix_release_sock+0x31/0x250 +29 unix_table_lock 49 [<ffffffff8150ad8e>] unix_create1+0x16e/0x1b0 +30 unix_table_lock 20 [<ffffffff8150ca37>] unix_find_other+0x117/0x230 +31 unix_table_lock 4 [<ffffffff8150a09f>] unix_autobind+0x11f/0x1b0 + This excerpt shows the first two lock class statistics. Line 01 shows the output version - each time the format changes this will be updated. Line 02-04 @@ -131,30 +130,30 @@ The integer part of the time values is in us. Dealing with nested locks, subclasses may appear: -32............................................................................................................................................................................................... +32........................................................................................................................................................................................................................... 33 -34 &rq->lock: 13128 13128 0.43 190.53 103881.26 97454 3453404 0.00 401.11 13224683.11 +34 &rq->lock: 13128 13128 0.43 190.53 103881.26 7.91 97454 3453404 0.00 401.11 13224683.11 3.82 35 --------- -36 &rq->lock 645 [<ffffffff8103bfc4>] task_rq_lock+0x43/0x75 -37 &rq->lock 297 [<ffffffff8104ba65>] try_to_wake_up+0x127/0x25a -38 &rq->lock 360 [<ffffffff8103c4c5>] select_task_rq_fair+0x1f0/0x74a -39 &rq->lock 428 [<ffffffff81045f98>] scheduler_tick+0x46/0x1fb +36 &rq->lock 645 [<ffffffff8103bfc4>] task_rq_lock+0x43/0x75 +37 &rq->lock 297 [<ffffffff8104ba65>] try_to_wake_up+0x127/0x25a +38 &rq->lock 360 [<ffffffff8103c4c5>] select_task_rq_fair+0x1f0/0x74a +39 &rq->lock 428 [<ffffffff81045f98>] scheduler_tick+0x46/0x1fb 40 --------- -41 &rq->lock 77 [<ffffffff8103bfc4>] task_rq_lock+0x43/0x75 -42 &rq->lock 174 [<ffffffff8104ba65>] try_to_wake_up+0x127/0x25a -43 &rq->lock 4715 [<ffffffff8103ed4b>] double_rq_lock+0x42/0x54 -44 &rq->lock 893 [<ffffffff81340524>] schedule+0x157/0x7b8 +41 &rq->lock 77 [<ffffffff8103bfc4>] task_rq_lock+0x43/0x75 +42 &rq->lock 174 [<ffffffff8104ba65>] try_to_wake_up+0x127/0x25a +43 &rq->lock 4715 [<ffffffff8103ed4b>] double_rq_lock+0x42/0x54 +44 &rq->lock 893 [<ffffffff81340524>] schedule+0x157/0x7b8 45 -46............................................................................................................................................................................................... +46........................................................................................................................................................................................................................... 47 -48 &rq->lock/1: 11526 11488 0.33 388.73 136294.31 21461 38404 0.00 37.93 109388.53 +48 &rq->lock/1: 1526 11488 0.33 388.73 136294.31 11.86 21461 38404 0.00 37.93 109388.53 2.84 49 ----------- -50 &rq->lock/1 11526 [<ffffffff8103ed58>] double_rq_lock+0x4f/0x54 +50 &rq->lock/1 11526 [<ffffffff8103ed58>] double_rq_lock+0x4f/0x54 51 ----------- -52 &rq->lock/1 5645 [<ffffffff8103ed4b>] double_rq_lock+0x42/0x54 -53 &rq->lock/1 1224 [<ffffffff81340524>] schedule+0x157/0x7b8 -54 &rq->lock/1 4336 [<ffffffff8103ed58>] double_rq_lock+0x4f/0x54 -55 &rq->lock/1 181 [<ffffffff8104ba65>] try_to_wake_up+0x127/0x25a +52 &rq->lock/1 5645 [<ffffffff8103ed4b>] double_rq_lock+0x42/0x54 +53 &rq->lock/1 1224 [<ffffffff81340524>] schedule+0x157/0x7b8 +54 &rq->lock/1 4336 [<ffffffff8103ed58>] double_rq_lock+0x4f/0x54 +55 &rq->lock/1 181 [<ffffffff8104ba65>] try_to_wake_up+0x127/0x25a Line 48 shows statistics for the second subclass (/1) of &rq->lock class (subclass starts from 0), since in this case, as line 50 suggests, @@ -163,16 +162,16 @@ double_rq_lock actually acquires a nested lock of two spinlocks. View the top contending locks: # grep : /proc/lock_stat | head - &inode->i_data.tree_lock-W: 15 21657 0.18 1093295.30 11547131054.85 58 10415 0.16 87.51 6387.60 - &inode->i_data.tree_lock-R: 0 0 0.00 0.00 0.00 23302 231198 0.25 8.45 98023.38 - dcache_lock: 1037 1161 0.38 45.32 774.51 6611 243371 0.15 306.48 77387.24 - &inode->i_mutex: 161 286 18446744073709 62882.54 1244614.55 3653 20598 18446744073709 62318.60 1693822.74 - &zone->lru_lock: 94 94 0.53 7.33 92.10 4366 32690 0.29 59.81 16350.06 - &inode->i_data.i_mmap_mutex: 79 79 0.40 3.77 53.03 11779 87755 0.28 116.93 29898.44 - &q->__queue_lock: 48 50 0.52 31.62 86.31 774 13131 0.17 113.08 12277.52 - &rq->rq_lock_key: 43 47 0.74 68.50 170.63 3706 33929 0.22 107.99 17460.62 - &rq->rq_lock_key#2: 39 46 0.75 6.68 49.03 2979 32292 0.17 125.17 17137.63 - tasklist_lock-W: 15 15 1.45 10.87 32.70 1201 7390 0.58 62.55 13648.47 + clockevents_lock: 2926159 2947636 0.15 46882.81 1784540466.34 605.41 3381345 3879161 0.00 2260.97 53178395.68 13.71 + tick_broadcast_lock: 346460 346717 0.18 2257.43 39364622.71 113.54 3642919 4242696 0.00 2263.79 49173646.60 11.59 + &mapping->i_mmap_mutex: 203896 203899 3.36 645530.05 31767507988.39 155800.21 3361776 8893984 0.17 2254.15 14110121.02 1.59 + &rq->lock: 135014 136909 0.18 606.09 842160.68 6.15 1540728 10436146 0.00 728.72 17606683.41 1.69 + &(&zone->lru_lock)->rlock: 93000 94934 0.16 59.18 188253.78 1.98 1199912 3809894 0.15 391.40 3559518.81 0.93 + tasklist_lock-W: 40667 41130 0.23 1189.42 428980.51 10.43 270278 510106 0.16 653.51 3939674.91 7.72 + tasklist_lock-R: 21298 21305 0.20 1310.05 215511.12 10.12 186204 241258 0.14 1162.33 1179779.23 4.89 + rcu_node_1: 47656 49022 0.16 635.41 193616.41 3.95 844888 1865423 0.00 764.26 1656226.96 0.89 + &(&dentry->d_lockref.lock)->rlock: 39791 40179 0.15 1302.08 88851.96 2.21 2790851 12527025 0.10 1910.75 3379714.27 0.27 + rcu_node_0: 29203 30064 0.16 786.55 1555573.00 51.74 88963 244254 0.00 398.87 428872.51 1.76 Clear the statistics: diff --git a/Documentation/mutex-design.txt b/Documentation/mutex-design.txt index 38c10fd7f411..1dfe62c3641d 100644 --- a/Documentation/mutex-design.txt +++ b/Documentation/mutex-design.txt @@ -116,11 +116,11 @@ using mutexes at the moment, please let me know if you find any. ] Implementation of mutexes ------------------------- -'struct mutex' is the new mutex type, defined in include/linux/mutex.h -and implemented in kernel/mutex.c. It is a counter-based mutex with a -spinlock and a wait-list. The counter has 3 states: 1 for "unlocked", -0 for "locked" and negative numbers (usually -1) for "locked, potential -waiters queued". +'struct mutex' is the new mutex type, defined in include/linux/mutex.h and +implemented in kernel/locking/mutex.c. It is a counter-based mutex with a +spinlock and a wait-list. The counter has 3 states: 1 for "unlocked", 0 for +"locked" and negative numbers (usually -1) for "locked, potential waiters +queued". the APIs of 'struct mutex' have been streamlined: diff --git a/Documentation/networking/batman-adv.txt b/Documentation/networking/batman-adv.txt index c1d82047a4b1..89490beb3c0b 100644 --- a/Documentation/networking/batman-adv.txt +++ b/Documentation/networking/batman-adv.txt @@ -69,8 +69,7 @@ folder: # aggregated_ogms gw_bandwidth log_level # ap_isolation gw_mode orig_interval # bonding gw_sel_class routing_algo -# bridge_loop_avoidance hop_penalty vis_mode -# fragmentation +# bridge_loop_avoidance hop_penalty fragmentation There is a special folder for debugging information: @@ -78,7 +77,7 @@ There is a special folder for debugging information: # ls /sys/kernel/debug/batman_adv/bat0/ # bla_backbone_table log transtable_global # bla_claim_table originators transtable_local -# gateways socket vis_data +# gateways socket Some of the files contain all sort of status information regard- ing the mesh network. For example, you can view the table of @@ -127,51 +126,6 @@ ously assigned to interfaces now used by batman advanced, e.g. # ifconfig eth0 0.0.0.0 -VISUALIZATION -------------- - -If you want topology visualization, at least one mesh node must -be configured as VIS-server: - -# echo "server" > /sys/class/net/bat0/mesh/vis_mode - -Each node is either configured as "server" or as "client" (de- -fault: "client"). Clients send their topology data to the server -next to them, and server synchronize with other servers. If there -is no server configured (default) within the mesh, no topology -information will be transmitted. With these "synchronizing -servers", there can be 1 or more vis servers sharing the same (or -at least very similar) data. - -When configured as server, you can get a topology snapshot of -your mesh: - -# cat /sys/kernel/debug/batman_adv/bat0/vis_data - -This raw output is intended to be easily parsable and convertable -with other tools. Have a look at the batctl README if you want a -vis output in dot or json format for instance and how those out- -puts could then be visualised in an image. - -The raw format consists of comma separated values per entry where -each entry is giving information about a certain source inter- -face. Each entry can/has to have the following values: --> "mac" - mac address of an originator's source interface - (each line begins with it) --> "TQ mac value" - src mac's link quality towards mac address - of a neighbor originator's interface which - is being used for routing --> "TT mac" - TT announced by source mac --> "PRIMARY" - this is a primary interface --> "SEC mac" - secondary mac address of source - (requires preceding PRIMARY) - -The TQ value has a range from 4 to 255 with 255 being the best. -The TT entries are showing which hosts are connected to the mesh -via bat0 or being bridged into the mesh network. The PRIMARY/SEC -values are only applied on primary interfaces - - LOGGING/DEBUGGING ----------------- @@ -245,5 +199,5 @@ Mailing-list: b.a.t.m.a.n@open-mesh.org (optional subscription You can also contact the Authors: -Marek Lindner <lindner_marek@yahoo.de> -Simon Wunderlich <siwu@hrz.tu-chemnitz.de> +Marek Lindner <mareklindner@neomailbox.ch> +Simon Wunderlich <sw@simonwunderlich.de> diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt index 9b28e714831a..2cdb8b66caa9 100644 --- a/Documentation/networking/bonding.txt +++ b/Documentation/networking/bonding.txt @@ -639,6 +639,15 @@ num_unsol_na are generated by the ipv4 and ipv6 code and the numbers of repetitions cannot be set independently. +packets_per_slave + + Specify the number of packets to transmit through a slave before + moving to the next one. When set to 0 then a slave is chosen at + random. + + The valid range is 0 - 65535; the default value is 1. This option + has effect only in balance-rr mode. + primary A string (eth0, eth2, etc) specifying which slave is the @@ -743,21 +752,16 @@ xmit_hash_policy protocol information to generate the hash. Uses XOR of hardware MAC addresses and IP addresses to - generate the hash. The IPv4 formula is - - (((source IP XOR dest IP) AND 0xffff) XOR - ( source MAC XOR destination MAC )) - modulo slave count - - The IPv6 formula is + generate the hash. The formula is - hash = (source ip quad 2 XOR dest IP quad 2) XOR - (source ip quad 3 XOR dest IP quad 3) XOR - (source ip quad 4 XOR dest IP quad 4) + hash = source MAC XOR destination MAC + hash = hash XOR source IP XOR destination IP + hash = hash XOR (hash RSHIFT 16) + hash = hash XOR (hash RSHIFT 8) + And then hash is reduced modulo slave count. - (((hash >> 24) XOR (hash >> 16) XOR (hash >> 8) XOR hash) - XOR (source MAC XOR destination MAC)) - modulo slave count + If the protocol is IPv6 then the source and destination + addresses are first hashed using ipv6_addr_hash. This algorithm will place all traffic to a particular network peer on the same slave. For non-IP traffic, @@ -779,21 +783,16 @@ xmit_hash_policy slaves, although a single connection will not span multiple slaves. - The formula for unfragmented IPv4 TCP and UDP packets is + The formula for unfragmented TCP and UDP packets is - ((source port XOR dest port) XOR - ((source IP XOR dest IP) AND 0xffff) - modulo slave count + hash = source port, destination port (as in the header) + hash = hash XOR source IP XOR destination IP + hash = hash XOR (hash RSHIFT 16) + hash = hash XOR (hash RSHIFT 8) + And then hash is reduced modulo slave count. - The formula for unfragmented IPv6 TCP and UDP packets is - - hash = (source port XOR dest port) XOR - ((source ip quad 2 XOR dest IP quad 2) XOR - (source ip quad 3 XOR dest IP quad 3) XOR - (source ip quad 4 XOR dest IP quad 4)) - - ((hash >> 24) XOR (hash >> 16) XOR (hash >> 8) XOR hash) - modulo slave count + If the protocol is IPv6 then the source and destination + addresses are first hashed using ipv6_addr_hash. For fragmented TCP or UDP packets and all other IPv4 and IPv6 protocol traffic, the source and destination port @@ -801,10 +800,6 @@ xmit_hash_policy formula is the same as for the layer2 transmit hash policy. - The IPv4 policy is intended to mimic the behavior of - certain switches, notably Cisco switches with PFC2 as - well as some Foundry and IBM products. - This algorithm is not fully 802.3ad compliant. A single TCP or UDP conversation containing both fragmented and unfragmented packets will see packets @@ -815,6 +810,26 @@ xmit_hash_policy conversations. Other implementations of 802.3ad may or may not tolerate this noncompliance. + encap2+3 + + This policy uses the same formula as layer2+3 but it + relies on skb_flow_dissect to obtain the header fields + which might result in the use of inner headers if an + encapsulation protocol is used. For example this will + improve the performance for tunnel users because the + packets will be distributed according to the encapsulated + flows. + + encap3+4 + + This policy uses the same formula as layer3+4 but it + relies on skb_flow_dissect to obtain the header fields + which might result in the use of inner headers if an + encapsulation protocol is used. For example this will + improve the performance for tunnel users because the + packets will be distributed according to the encapsulated + flows. + The default value is layer2. This option was added in bonding version 2.6.3. In earlier versions of bonding, this parameter does not exist, and the layer2 policy is the only policy. The diff --git a/Documentation/networking/can.txt b/Documentation/networking/can.txt index 820f55344edc..4c072414eadb 100644 --- a/Documentation/networking/can.txt +++ b/Documentation/networking/can.txt @@ -25,6 +25,12 @@ This file contains 4.1.5 RAW socket option CAN_RAW_FD_FRAMES 4.1.6 RAW socket returned message flags 4.2 Broadcast Manager protocol sockets (SOCK_DGRAM) + 4.2.1 Broadcast Manager operations + 4.2.2 Broadcast Manager message flags + 4.2.3 Broadcast Manager transmission timers + 4.2.4 Broadcast Manager message sequence transmission + 4.2.5 Broadcast Manager receive filter timers + 4.2.6 Broadcast Manager multiplex message receive filter 4.3 connected transport protocols (SOCK_SEQPACKET) 4.4 unconnected transport protocols (SOCK_DGRAM) @@ -593,6 +599,217 @@ solution for a couple of reasons: In order to receive such messages, CAN_RAW_RECV_OWN_MSGS must be set. 4.2 Broadcast Manager protocol sockets (SOCK_DGRAM) + + The Broadcast Manager protocol provides a command based configuration + interface to filter and send (e.g. cyclic) CAN messages in kernel space. + + Receive filters can be used to down sample frequent messages; detect events + such as message contents changes, packet length changes, and do time-out + monitoring of received messages. + + Periodic transmission tasks of CAN frames or a sequence of CAN frames can be + created and modified at runtime; both the message content and the two + possible transmit intervals can be altered. + + A BCM socket is not intended for sending individual CAN frames using the + struct can_frame as known from the CAN_RAW socket. Instead a special BCM + configuration message is defined. The basic BCM configuration message used + to communicate with the broadcast manager and the available operations are + defined in the linux/can/bcm.h include. The BCM message consists of a + message header with a command ('opcode') followed by zero or more CAN frames. + The broadcast manager sends responses to user space in the same form: + + struct bcm_msg_head { + __u32 opcode; /* command */ + __u32 flags; /* special flags */ + __u32 count; /* run 'count' times with ival1 */ + struct timeval ival1, ival2; /* count and subsequent interval */ + canid_t can_id; /* unique can_id for task */ + __u32 nframes; /* number of can_frames following */ + struct can_frame frames[0]; + }; + + The aligned payload 'frames' uses the same basic CAN frame structure defined + at the beginning of section 4 and in the include/linux/can.h include. All + messages to the broadcast manager from user space have this structure. + + Note a CAN_BCM socket must be connected instead of bound after socket + creation (example without error checking): + + int s; + struct sockaddr_can addr; + struct ifreq ifr; + + s = socket(PF_CAN, SOCK_DGRAM, CAN_BCM); + + strcpy(ifr.ifr_name, "can0"); + ioctl(s, SIOCGIFINDEX, &ifr); + + addr.can_family = AF_CAN; + addr.can_ifindex = ifr.ifr_ifindex; + + connect(s, (struct sockaddr *)&addr, sizeof(addr)) + + (..) + + The broadcast manager socket is able to handle any number of in flight + transmissions or receive filters concurrently. The different RX/TX jobs are + distinguished by the unique can_id in each BCM message. However additional + CAN_BCM sockets are recommended to communicate on multiple CAN interfaces. + When the broadcast manager socket is bound to 'any' CAN interface (=> the + interface index is set to zero) the configured receive filters apply to any + CAN interface unless the sendto() syscall is used to overrule the 'any' CAN + interface index. When using recvfrom() instead of read() to retrieve BCM + socket messages the originating CAN interface is provided in can_ifindex. + + 4.2.1 Broadcast Manager operations + + The opcode defines the operation for the broadcast manager to carry out, + or details the broadcast managers response to several events, including + user requests. + + Transmit Operations (user space to broadcast manager): + + TX_SETUP: Create (cyclic) transmission task. + + TX_DELETE: Remove (cyclic) transmission task, requires only can_id. + + TX_READ: Read properties of (cyclic) transmission task for can_id. + + TX_SEND: Send one CAN frame. + + Transmit Responses (broadcast manager to user space): + + TX_STATUS: Reply to TX_READ request (transmission task configuration). + + TX_EXPIRED: Notification when counter finishes sending at initial interval + 'ival1'. Requires the TX_COUNTEVT flag to be set at TX_SETUP. + + Receive Operations (user space to broadcast manager): + + RX_SETUP: Create RX content filter subscription. + + RX_DELETE: Remove RX content filter subscription, requires only can_id. + + RX_READ: Read properties of RX content filter subscription for can_id. + + Receive Responses (broadcast manager to user space): + + RX_STATUS: Reply to RX_READ request (filter task configuration). + + RX_TIMEOUT: Cyclic message is detected to be absent (timer ival1 expired). + + RX_CHANGED: BCM message with updated CAN frame (detected content change). + Sent on first message received or on receipt of revised CAN messages. + + 4.2.2 Broadcast Manager message flags + + When sending a message to the broadcast manager the 'flags' element may + contain the following flag definitions which influence the behaviour: + + SETTIMER: Set the values of ival1, ival2 and count + + STARTTIMER: Start the timer with the actual values of ival1, ival2 + and count. Starting the timer leads simultaneously to emit a CAN frame. + + TX_COUNTEVT: Create the message TX_EXPIRED when count expires + + TX_ANNOUNCE: A change of data by the process is emitted immediately. + + TX_CP_CAN_ID: Copies the can_id from the message header to each + subsequent frame in frames. This is intended as usage simplification. For + TX tasks the unique can_id from the message header may differ from the + can_id(s) stored for transmission in the subsequent struct can_frame(s). + + RX_FILTER_ID: Filter by can_id alone, no frames required (nframes=0). + + RX_CHECK_DLC: A change of the DLC leads to an RX_CHANGED. + + RX_NO_AUTOTIMER: Prevent automatically starting the timeout monitor. + + RX_ANNOUNCE_RESUME: If passed at RX_SETUP and a receive timeout occured, a + RX_CHANGED message will be generated when the (cyclic) receive restarts. + + TX_RESET_MULTI_IDX: Reset the index for the multiple frame transmission. + + RX_RTR_FRAME: Send reply for RTR-request (placed in op->frames[0]). + + 4.2.3 Broadcast Manager transmission timers + + Periodic transmission configurations may use up to two interval timers. + In this case the BCM sends a number of messages ('count') at an interval + 'ival1', then continuing to send at another given interval 'ival2'. When + only one timer is needed 'count' is set to zero and only 'ival2' is used. + When SET_TIMER and START_TIMER flag were set the timers are activated. + The timer values can be altered at runtime when only SET_TIMER is set. + + 4.2.4 Broadcast Manager message sequence transmission + + Up to 256 CAN frames can be transmitted in a sequence in the case of a cyclic + TX task configuration. The number of CAN frames is provided in the 'nframes' + element of the BCM message head. The defined number of CAN frames are added + as array to the TX_SETUP BCM configuration message. + + /* create a struct to set up a sequence of four CAN frames */ + struct { + struct bcm_msg_head msg_head; + struct can_frame frame[4]; + } mytxmsg; + + (..) + mytxmsg.nframes = 4; + (..) + + write(s, &mytxmsg, sizeof(mytxmsg)); + + With every transmission the index in the array of CAN frames is increased + and set to zero at index overflow. + + 4.2.5 Broadcast Manager receive filter timers + + The timer values ival1 or ival2 may be set to non-zero values at RX_SETUP. + When the SET_TIMER flag is set the timers are enabled: + + ival1: Send RX_TIMEOUT when a received message is not received again within + the given time. When START_TIMER is set at RX_SETUP the timeout detection + is activated directly - even without a former CAN frame reception. + + ival2: Throttle the received message rate down to the value of ival2. This + is useful to reduce messages for the application when the signal inside the + CAN frame is stateless as state changes within the ival2 periode may get + lost. + + 4.2.6 Broadcast Manager multiplex message receive filter + + To filter for content changes in multiplex message sequences an array of more + than one CAN frames can be passed in a RX_SETUP configuration message. The + data bytes of the first CAN frame contain the mask of relevant bits that + have to match in the subsequent CAN frames with the received CAN frame. + If one of the subsequent CAN frames is matching the bits in that frame data + mark the relevant content to be compared with the previous received content. + Up to 257 CAN frames (multiplex filter bit mask CAN frame plus 256 CAN + filters) can be added as array to the TX_SETUP BCM configuration message. + + /* usually used to clear CAN frame data[] - beware of endian problems! */ + #define U64_DATA(p) (*(unsigned long long*)(p)->data) + + struct { + struct bcm_msg_head msg_head; + struct can_frame frame[5]; + } msg; + + msg.msg_head.opcode = RX_SETUP; + msg.msg_head.can_id = 0x42; + msg.msg_head.flags = 0; + msg.msg_head.nframes = 5; + U64_DATA(&msg.frame[0]) = 0xFF00000000000000ULL; /* MUX mask */ + U64_DATA(&msg.frame[1]) = 0x01000000000000FFULL; /* data mask (MUX 0x01) */ + U64_DATA(&msg.frame[2]) = 0x0200FFFF000000FFULL; /* data mask (MUX 0x02) */ + U64_DATA(&msg.frame[3]) = 0x330000FFFFFF0003ULL; /* data mask (MUX 0x33) */ + U64_DATA(&msg.frame[4]) = 0x4F07FC0FF0000000ULL; /* data mask (MUX 0x4F) */ + + write(s, &msg, sizeof(msg)); + 4.3 connected transport protocols (SOCK_SEQPACKET) 4.4 unconnected transport protocols (SOCK_DGRAM) diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index a46d78583ae1..3c12d9a7ed00 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -267,17 +267,6 @@ tcp_max_orphans - INTEGER more aggressively. Let me to remind again: each orphan eats up to ~64K of unswappable memory. -tcp_max_ssthresh - INTEGER - Limited Slow-Start for TCP with large congestion windows (cwnd) defined in - RFC3742. Limited slow-start is a mechanism to limit growth of the cwnd - on the region where cwnd is larger than tcp_max_ssthresh. TCP increases cwnd - by at most tcp_max_ssthresh segments, and by at least tcp_max_ssthresh/2 - segments per RTT when the cwnd is above tcp_max_ssthresh. - If TCP connection increased cwnd to thousands (or tens of thousands) segments, - and thousands of packets were being dropped during slow-start, you can set - tcp_max_ssthresh to improve performance for new TCP connection. - Default: 0 (off) - tcp_max_syn_backlog - INTEGER Maximal number of remembered connection requests, which have not received an acknowledgment from connecting client. @@ -451,7 +440,7 @@ tcp_fastopen - INTEGER connect() to perform a TCP handshake automatically. The values (bitmap) are - 1: Enables sending data in the opening SYN on the client. + 1: Enables sending data in the opening SYN on the client w/ MSG_FASTOPEN. 2: Enables TCP Fast Open on the server side, i.e., allowing data in a SYN packet to be accepted and passed to the application before 3-way hand shake finishes. @@ -464,7 +453,7 @@ tcp_fastopen - INTEGER different ways of setting max_qlen without the TCP_FASTOPEN socket option. - Default: 0 + Default: 1 Note that the client & server side Fast Open flags (1 and 2 respectively) must be also enabled before the rest of flags can take @@ -588,9 +577,6 @@ tcp_limit_output_bytes - INTEGER typical pfifo_fast qdiscs. tcp_limit_output_bytes limits the number of bytes on qdisc or device to reduce artificial RTT/cwnd and reduce bufferbloat. - Note: For GSO/TSO enabled flows, we try to have at least two - packets in flight. Reducing tcp_limit_output_bytes might also - reduce the size of individual GSO packet (64KB being the max) Default: 131072 tcp_challenge_ack_limit - INTEGER diff --git a/Documentation/networking/netdevices.txt b/Documentation/networking/netdevices.txt index c7ecc7080494..0b1cf6b2a592 100644 --- a/Documentation/networking/netdevices.txt +++ b/Documentation/networking/netdevices.txt @@ -10,12 +10,12 @@ network devices. struct net_device allocation rules ================================== Network device structures need to persist even after module is unloaded and -must be allocated with kmalloc. If device has registered successfully, -it will be freed on last use by free_netdev. This is required to handle the -pathologic case cleanly (example: rmmod mydriver </sys/class/net/myeth/mtu ) +must be allocated with alloc_netdev_mqs() and friends. +If device has registered successfully, it will be freed on last use +by free_netdev(). This is required to handle the pathologic case cleanly +(example: rmmod mydriver </sys/class/net/myeth/mtu ) -There are routines in net_init.c to handle the common cases of -alloc_etherdev, alloc_netdev. These reserve extra space for driver +alloc_netdev_mqs()/alloc_netdev() reserve extra space for driver private data which gets freed when the network device is freed. If separately allocated data is attached to the network device (netdev_priv(dev)) then it is up to the module exit handler to free that. diff --git a/Documentation/power/opp.txt b/Documentation/power/opp.txt index 425c51d56aef..b8a907dc0169 100644 --- a/Documentation/power/opp.txt +++ b/Documentation/power/opp.txt @@ -42,7 +42,7 @@ We can represent these as three OPPs as the following {Hz, uV} tuples: OPP library provides a set of helper functions to organize and query the OPP information. The library is located in drivers/base/power/opp.c and the header -is located in include/linux/opp.h. OPP library can be enabled by enabling +is located in include/linux/pm_opp.h. OPP library can be enabled by enabling CONFIG_PM_OPP from power management menuconfig menu. OPP library depends on CONFIG_PM as certain SoCs such as Texas Instrument's OMAP framework allows to optionally boot at a certain OPP without needing cpufreq. @@ -71,14 +71,14 @@ operations until that OPP could be re-enabled if possible. OPP library facilitates this concept in it's implementation. The following operational functions operate only on available opps: -opp_find_freq_{ceil, floor}, opp_get_voltage, opp_get_freq, opp_get_opp_count -and opp_init_cpufreq_table +opp_find_freq_{ceil, floor}, dev_pm_opp_get_voltage, dev_pm_opp_get_freq, dev_pm_opp_get_opp_count +and dev_pm_opp_init_cpufreq_table -opp_find_freq_exact is meant to be used to find the opp pointer which can then -be used for opp_enable/disable functions to make an opp available as required. +dev_pm_opp_find_freq_exact is meant to be used to find the opp pointer which can then +be used for dev_pm_opp_enable/disable functions to make an opp available as required. WARNING: Users of OPP library should refresh their availability count using -get_opp_count if opp_enable/disable functions are invoked for a device, the +get_opp_count if dev_pm_opp_enable/disable functions are invoked for a device, the exact mechanism to trigger these or the notification mechanism to other dependent subsystems such as cpufreq are left to the discretion of the SoC specific framework which uses the OPP library. Similar care needs to be taken @@ -96,24 +96,24 @@ using RCU read locks. The opp_find_freq_{exact,ceil,floor}, opp_get_{voltage, freq, opp_count} fall into this category. opp_{add,enable,disable} are updaters which use mutex and implement it's own -RCU locking mechanisms. opp_init_cpufreq_table acts as an updater and uses +RCU locking mechanisms. dev_pm_opp_init_cpufreq_table acts as an updater and uses mutex to implment RCU updater strategy. These functions should *NOT* be called under RCU locks and other contexts that prevent blocking functions in RCU or mutex operations from working. 2. Initial OPP List Registration ================================ -The SoC implementation calls opp_add function iteratively to add OPPs per +The SoC implementation calls dev_pm_opp_add function iteratively to add OPPs per device. It is expected that the SoC framework will register the OPP entries optimally- typical numbers range to be less than 5. The list generated by registering the OPPs is maintained by OPP library throughout the device operation. The SoC framework can subsequently control the availability of the -OPPs dynamically using the opp_enable / disable functions. +OPPs dynamically using the dev_pm_opp_enable / disable functions. -opp_add - Add a new OPP for a specific domain represented by the device pointer. +dev_pm_opp_add - Add a new OPP for a specific domain represented by the device pointer. The OPP is defined using the frequency and voltage. Once added, the OPP is assumed to be available and control of it's availability can be done - with the opp_enable/disable functions. OPP library internally stores + with the dev_pm_opp_enable/disable functions. OPP library internally stores and manages this information in the opp struct. This function may be used by SoC framework to define a optimal list as per the demands of SoC usage environment. @@ -124,7 +124,7 @@ opp_add - Add a new OPP for a specific domain represented by the device pointer. soc_pm_init() { /* Do things */ - r = opp_add(mpu_dev, 1000000, 900000); + r = dev_pm_opp_add(mpu_dev, 1000000, 900000); if (!r) { pr_err("%s: unable to register mpu opp(%d)\n", r); goto no_cpufreq; @@ -143,44 +143,44 @@ functions return the matching pointer representing the opp if a match is found, else returns error. These errors are expected to be handled by standard error checks such as IS_ERR() and appropriate actions taken by the caller. -opp_find_freq_exact - Search for an OPP based on an *exact* frequency and +dev_pm_opp_find_freq_exact - Search for an OPP based on an *exact* frequency and availability. This function is especially useful to enable an OPP which is not available by default. Example: In a case when SoC framework detects a situation where a higher frequency could be made available, it can use this function to - find the OPP prior to call the opp_enable to actually make it available. + find the OPP prior to call the dev_pm_opp_enable to actually make it available. rcu_read_lock(); - opp = opp_find_freq_exact(dev, 1000000000, false); + opp = dev_pm_opp_find_freq_exact(dev, 1000000000, false); rcu_read_unlock(); /* dont operate on the pointer.. just do a sanity check.. */ if (IS_ERR(opp)) { pr_err("frequency not disabled!\n"); /* trigger appropriate actions.. */ } else { - opp_enable(dev,1000000000); + dev_pm_opp_enable(dev,1000000000); } NOTE: This is the only search function that operates on OPPs which are not available. -opp_find_freq_floor - Search for an available OPP which is *at most* the +dev_pm_opp_find_freq_floor - Search for an available OPP which is *at most* the provided frequency. This function is useful while searching for a lesser match OR operating on OPP information in the order of decreasing frequency. Example: To find the highest opp for a device: freq = ULONG_MAX; rcu_read_lock(); - opp_find_freq_floor(dev, &freq); + dev_pm_opp_find_freq_floor(dev, &freq); rcu_read_unlock(); -opp_find_freq_ceil - Search for an available OPP which is *at least* the +dev_pm_opp_find_freq_ceil - Search for an available OPP which is *at least* the provided frequency. This function is useful while searching for a higher match OR operating on OPP information in the order of increasing frequency. Example 1: To find the lowest opp for a device: freq = 0; rcu_read_lock(); - opp_find_freq_ceil(dev, &freq); + dev_pm_opp_find_freq_ceil(dev, &freq); rcu_read_unlock(); Example 2: A simplified implementation of a SoC cpufreq_driver->target: soc_cpufreq_target(..) @@ -188,7 +188,7 @@ opp_find_freq_ceil - Search for an available OPP which is *at least* the /* Do stuff like policy checks etc. */ /* Find the best frequency match for the req */ rcu_read_lock(); - opp = opp_find_freq_ceil(dev, &freq); + opp = dev_pm_opp_find_freq_ceil(dev, &freq); rcu_read_unlock(); if (!IS_ERR(opp)) soc_switch_to_freq_voltage(freq); @@ -208,34 +208,34 @@ as thermal considerations (e.g. don't use OPPx until the temperature drops). WARNING: Do not use these functions in interrupt context. -opp_enable - Make a OPP available for operation. +dev_pm_opp_enable - Make a OPP available for operation. Example: Lets say that 1GHz OPP is to be made available only if the SoC temperature is lower than a certain threshold. The SoC framework implementation might choose to do something as follows: if (cur_temp < temp_low_thresh) { /* Enable 1GHz if it was disabled */ rcu_read_lock(); - opp = opp_find_freq_exact(dev, 1000000000, false); + opp = dev_pm_opp_find_freq_exact(dev, 1000000000, false); rcu_read_unlock(); /* just error check */ if (!IS_ERR(opp)) - ret = opp_enable(dev, 1000000000); + ret = dev_pm_opp_enable(dev, 1000000000); else goto try_something_else; } -opp_disable - Make an OPP to be not available for operation +dev_pm_opp_disable - Make an OPP to be not available for operation Example: Lets say that 1GHz OPP is to be disabled if the temperature exceeds a threshold value. The SoC framework implementation might choose to do something as follows: if (cur_temp > temp_high_thresh) { /* Disable 1GHz if it was enabled */ rcu_read_lock(); - opp = opp_find_freq_exact(dev, 1000000000, true); + opp = dev_pm_opp_find_freq_exact(dev, 1000000000, true); rcu_read_unlock(); /* just error check */ if (!IS_ERR(opp)) - ret = opp_disable(dev, 1000000000); + ret = dev_pm_opp_disable(dev, 1000000000); else goto try_something_else; } @@ -247,7 +247,7 @@ information from the OPP structure is necessary. Once an OPP pointer is retrieved using the search functions, the following functions can be used by SoC framework to retrieve the information represented inside the OPP layer. -opp_get_voltage - Retrieve the voltage represented by the opp pointer. +dev_pm_opp_get_voltage - Retrieve the voltage represented by the opp pointer. Example: At a cpufreq transition to a different frequency, SoC framework requires to set the voltage represented by the OPP using the regulator framework to the Power Management chip providing the @@ -256,15 +256,15 @@ opp_get_voltage - Retrieve the voltage represented by the opp pointer. { /* do things */ rcu_read_lock(); - opp = opp_find_freq_ceil(dev, &freq); - v = opp_get_voltage(opp); + opp = dev_pm_opp_find_freq_ceil(dev, &freq); + v = dev_pm_opp_get_voltage(opp); rcu_read_unlock(); if (v) regulator_set_voltage(.., v); /* do other things */ } -opp_get_freq - Retrieve the freq represented by the opp pointer. +dev_pm_opp_get_freq - Retrieve the freq represented by the opp pointer. Example: Lets say the SoC framework uses a couple of helper functions we could pass opp pointers instead of doing additional parameters to handle quiet a bit of data parameters. @@ -273,8 +273,8 @@ opp_get_freq - Retrieve the freq represented by the opp pointer. /* do things.. */ max_freq = ULONG_MAX; rcu_read_lock(); - max_opp = opp_find_freq_floor(dev,&max_freq); - requested_opp = opp_find_freq_ceil(dev,&freq); + max_opp = dev_pm_opp_find_freq_floor(dev,&max_freq); + requested_opp = dev_pm_opp_find_freq_ceil(dev,&freq); if (!IS_ERR(max_opp) && !IS_ERR(requested_opp)) r = soc_test_validity(max_opp, requested_opp); rcu_read_unlock(); @@ -282,25 +282,25 @@ opp_get_freq - Retrieve the freq represented by the opp pointer. } soc_test_validity(..) { - if(opp_get_voltage(max_opp) < opp_get_voltage(requested_opp)) + if(dev_pm_opp_get_voltage(max_opp) < dev_pm_opp_get_voltage(requested_opp)) return -EINVAL; - if(opp_get_freq(max_opp) < opp_get_freq(requested_opp)) + if(dev_pm_opp_get_freq(max_opp) < dev_pm_opp_get_freq(requested_opp)) return -EINVAL; /* do things.. */ } -opp_get_opp_count - Retrieve the number of available opps for a device +dev_pm_opp_get_opp_count - Retrieve the number of available opps for a device Example: Lets say a co-processor in the SoC needs to know the available frequencies in a table, the main processor can notify as following: soc_notify_coproc_available_frequencies() { /* Do things */ rcu_read_lock(); - num_available = opp_get_opp_count(dev); + num_available = dev_pm_opp_get_opp_count(dev); speeds = kzalloc(sizeof(u32) * num_available, GFP_KERNEL); /* populate the table in increasing order */ freq = 0; - while (!IS_ERR(opp = opp_find_freq_ceil(dev, &freq))) { + while (!IS_ERR(opp = dev_pm_opp_find_freq_ceil(dev, &freq))) { speeds[i] = freq; freq++; i++; @@ -313,7 +313,7 @@ opp_get_opp_count - Retrieve the number of available opps for a device 6. Cpufreq Table Generation =========================== -opp_init_cpufreq_table - cpufreq framework typically is initialized with +dev_pm_opp_init_cpufreq_table - cpufreq framework typically is initialized with cpufreq_frequency_table_cpuinfo which is provided with the list of frequencies that are available for operation. This function provides a ready to use conversion routine to translate the OPP layer's internal @@ -326,7 +326,7 @@ opp_init_cpufreq_table - cpufreq framework typically is initialized with soc_pm_init() { /* Do things */ - r = opp_init_cpufreq_table(dev, &freq_table); + r = dev_pm_opp_init_cpufreq_table(dev, &freq_table); if (!r) cpufreq_frequency_table_cpuinfo(policy, freq_table); /* Do other things */ @@ -336,7 +336,7 @@ opp_init_cpufreq_table - cpufreq framework typically is initialized with addition to CONFIG_PM as power management feature is required to dynamically scale voltage and frequency in a system. -opp_free_cpufreq_table - Free up the table allocated by opp_init_cpufreq_table +dev_pm_opp_free_cpufreq_table - Free up the table allocated by dev_pm_opp_init_cpufreq_table 7. Data Structures ================== @@ -358,16 +358,16 @@ accessed by various functions as described above. However, the structures representing the actual OPPs and domains are internal to the OPP library itself to allow for suitable abstraction reusable across systems. -struct opp - The internal data structure of OPP library which is used to +struct dev_pm_opp - The internal data structure of OPP library which is used to represent an OPP. In addition to the freq, voltage, availability information, it also contains internal book keeping information required for the OPP library to operate on. Pointer to this structure is provided back to the users such as SoC framework to be used as a identifier for OPP in the interactions with OPP layer. - WARNING: The struct opp pointer should not be parsed or modified by the - users. The defaults of for an instance is populated by opp_add, but the - availability of the OPP can be modified by opp_enable/disable functions. + WARNING: The struct dev_pm_opp pointer should not be parsed or modified by the + users. The defaults of for an instance is populated by dev_pm_opp_add, but the + availability of the OPP can be modified by dev_pm_opp_enable/disable functions. struct device - This is used to identify a domain to the OPP layer. The nature of the device and it's implementation is left to the user of @@ -377,19 +377,19 @@ Overall, in a simplistic view, the data structure operations is represented as following: Initialization / modification: - +-----+ /- opp_enable -opp_add --> | opp | <------- - | +-----+ \- opp_disable + +-----+ /- dev_pm_opp_enable +dev_pm_opp_add --> | opp | <------- + | +-----+ \- dev_pm_opp_disable \-------> domain_info(device) Search functions: - /-- opp_find_freq_ceil ---\ +-----+ -domain_info<---- opp_find_freq_exact -----> | opp | - \-- opp_find_freq_floor ---/ +-----+ + /-- dev_pm_opp_find_freq_ceil ---\ +-----+ +domain_info<---- dev_pm_opp_find_freq_exact -----> | opp | + \-- dev_pm_opp_find_freq_floor ---/ +-----+ Retrieval functions: -+-----+ /- opp_get_voltage ++-----+ /- dev_pm_opp_get_voltage | opp | <--- -+-----+ \- opp_get_freq ++-----+ \- dev_pm_opp_get_freq -domain_info <- opp_get_opp_count +domain_info <- dev_pm_opp_get_opp_count diff --git a/Documentation/power/power_supply_class.txt b/Documentation/power/power_supply_class.txt index 3f10b39b0346..89a8816990ff 100644 --- a/Documentation/power/power_supply_class.txt +++ b/Documentation/power/power_supply_class.txt @@ -135,11 +135,11 @@ CAPACITY_LEVEL - capacity level. This corresponds to POWER_SUPPLY_CAPACITY_LEVEL_*. TEMP - temperature of the power supply. -TEMP_ALERT_MIN - minimum battery temperature alert value in milli centigrade. -TEMP_ALERT_MAX - maximum battery temperature alert value in milli centigrade. +TEMP_ALERT_MIN - minimum battery temperature alert. +TEMP_ALERT_MAX - maximum battery temperature alert. TEMP_AMBIENT - ambient temperature. -TEMP_AMBIENT_ALERT_MIN - minimum ambient temperature alert value in milli centigrade. -TEMP_AMBIENT_ALERT_MAX - maximum ambient temperature alert value in milli centigrade. +TEMP_AMBIENT_ALERT_MIN - minimum ambient temperature alert. +TEMP_AMBIENT_ALERT_MAX - maximum ambient temperature alert. TIME_TO_EMPTY - seconds left for battery to be considered empty (i.e. while battery powers a load) diff --git a/Documentation/power/powercap/powercap.txt b/Documentation/power/powercap/powercap.txt new file mode 100644 index 000000000000..1e6ef164e07a --- /dev/null +++ b/Documentation/power/powercap/powercap.txt @@ -0,0 +1,236 @@ +Power Capping Framework +================================== + +The power capping framework provides a consistent interface between the kernel +and the user space that allows power capping drivers to expose the settings to +user space in a uniform way. + +Terminology +========================= +The framework exposes power capping devices to user space via sysfs in the +form of a tree of objects. The objects at the root level of the tree represent +'control types', which correspond to different methods of power capping. For +example, the intel-rapl control type represents the Intel "Running Average +Power Limit" (RAPL) technology, whereas the 'idle-injection' control type +corresponds to the use of idle injection for controlling power. + +Power zones represent different parts of the system, which can be controlled and +monitored using the power capping method determined by the control type the +given zone belongs to. They each contain attributes for monitoring power, as +well as controls represented in the form of power constraints. If the parts of +the system represented by different power zones are hierarchical (that is, one +bigger part consists of multiple smaller parts that each have their own power +controls), those power zones may also be organized in a hierarchy with one +parent power zone containing multiple subzones and so on to reflect the power +control topology of the system. In that case, it is possible to apply power +capping to a set of devices together using the parent power zone and if more +fine grained control is required, it can be applied through the subzones. + + +Example sysfs interface tree: + +/sys/devices/virtual/powercap +??? intel-rapl + ??? intel-rapl:0 + ? ??? constraint_0_name + ? ??? constraint_0_power_limit_uw + ? ??? constraint_0_time_window_us + ? ??? constraint_1_name + ? ??? constraint_1_power_limit_uw + ? ??? constraint_1_time_window_us + ? ??? device -> ../../intel-rapl + ? ??? energy_uj + ? ??? intel-rapl:0:0 + ? ? ??? constraint_0_name + ? ? ??? constraint_0_power_limit_uw + ? ? ??? constraint_0_time_window_us + ? ? ??? constraint_1_name + ? ? ??? constraint_1_power_limit_uw + ? ? ??? constraint_1_time_window_us + ? ? ??? device -> ../../intel-rapl:0 + ? ? ??? energy_uj + ? ? ??? max_energy_range_uj + ? ? ??? name + ? ? ??? enabled + ? ? ??? power + ? ? ? ??? async + ? ? ? [] + ? ? ??? subsystem -> ../../../../../../class/power_cap + ? ? ??? uevent + ? ??? intel-rapl:0:1 + ? ? ??? constraint_0_name + ? ? ??? constraint_0_power_limit_uw + ? ? ??? constraint_0_time_window_us + ? ? ??? constraint_1_name + ? ? ??? constraint_1_power_limit_uw + ? ? ??? constraint_1_time_window_us + ? ? ??? device -> ../../intel-rapl:0 + ? ? ??? energy_uj + ? ? ??? max_energy_range_uj + ? ? ??? name + ? ? ??? enabled + ? ? ??? power + ? ? ? ??? async + ? ? ? [] + ? ? ??? subsystem -> ../../../../../../class/power_cap + ? ? ??? uevent + ? ??? max_energy_range_uj + ? ??? max_power_range_uw + ? ??? name + ? ??? enabled + ? ??? power + ? ? ??? async + ? ? [] + ? ??? subsystem -> ../../../../../class/power_cap + ? ??? enabled + ? ??? uevent + ??? intel-rapl:1 + ? ??? constraint_0_name + ? ??? constraint_0_power_limit_uw + ? ??? constraint_0_time_window_us + ? ??? constraint_1_name + ? ??? constraint_1_power_limit_uw + ? ??? constraint_1_time_window_us + ? ??? device -> ../../intel-rapl + ? ??? energy_uj + ? ??? intel-rapl:1:0 + ? ? ??? constraint_0_name + ? ? ??? constraint_0_power_limit_uw + ? ? ??? constraint_0_time_window_us + ? ? ??? constraint_1_name + ? ? ??? constraint_1_power_limit_uw + ? ? ??? constraint_1_time_window_us + ? ? ??? device -> ../../intel-rapl:1 + ? ? ??? energy_uj + ? ? ??? max_energy_range_uj + ? ? ??? name + ? ? ??? enabled + ? ? ??? power + ? ? ? ??? async + ? ? ? [] + ? ? ??? subsystem -> ../../../../../../class/power_cap + ? ? ??? uevent + ? ??? intel-rapl:1:1 + ? ? ??? constraint_0_name + ? ? ??? constraint_0_power_limit_uw + ? ? ??? constraint_0_time_window_us + ? ? ??? constraint_1_name + ? ? ??? constraint_1_power_limit_uw + ? ? ??? constraint_1_time_window_us + ? ? ??? device -> ../../intel-rapl:1 + ? ? ??? energy_uj + ? ? ??? max_energy_range_uj + ? ? ??? name + ? ? ??? enabled + ? ? ??? power + ? ? ? ??? async + ? ? ? [] + ? ? ??? subsystem -> ../../../../../../class/power_cap + ? ? ??? uevent + ? ??? max_energy_range_uj + ? ??? max_power_range_uw + ? ??? name + ? ??? enabled + ? ??? power + ? ? ??? async + ? ? [] + ? ??? subsystem -> ../../../../../class/power_cap + ? ??? uevent + ??? power + ? ??? async + ? [] + ??? subsystem -> ../../../../class/power_cap + ??? enabled + ??? uevent + +The above example illustrates a case in which the Intel RAPL technology, +available in Intel® IA-64 and IA-32 Processor Architectures, is used. There is one +control type called intel-rapl which contains two power zones, intel-rapl:0 and +intel-rapl:1, representing CPU packages. Each of these power zones contains +two subzones, intel-rapl:j:0 and intel-rapl:j:1 (j = 0, 1), representing the +"core" and the "uncore" parts of the given CPU package, respectively. All of +the zones and subzones contain energy monitoring attributes (energy_uj, +max_energy_range_uj) and constraint attributes (constraint_*) allowing controls +to be applied (the constraints in the 'package' power zones apply to the whole +CPU packages and the subzone constraints only apply to the respective parts of +the given package individually). Since Intel RAPL doesn't provide instantaneous +power value, there is no power_uw attribute. + +In addition to that, each power zone contains a name attribute, allowing the +part of the system represented by that zone to be identified. +For example: + +cat /sys/class/power_cap/intel-rapl/intel-rapl:0/name +package-0 + +The Intel RAPL technology allows two constraints, short term and long term, +with two different time windows to be applied to each power zone. Thus for +each zone there are 2 attributes representing the constraint names, 2 power +limits and 2 attributes representing the sizes of the time windows. Such that, +constraint_j_* attributes correspond to the jth constraint (j = 0,1). + +For example: + constraint_0_name + constraint_0_power_limit_uw + constraint_0_time_window_us + constraint_1_name + constraint_1_power_limit_uw + constraint_1_time_window_us + +Power Zone Attributes +================================= +Monitoring attributes +---------------------- + +energy_uj (rw): Current energy counter in micro joules. Write "0" to reset. +If the counter can not be reset, then this attribute is read only. + +max_energy_range_uj (ro): Range of the above energy counter in micro-joules. + +power_uw (ro): Current power in micro watts. + +max_power_range_uw (ro): Range of the above power value in micro-watts. + +name (ro): Name of this power zone. + +It is possible that some domains have both power ranges and energy counter ranges; +however, only one is mandatory. + +Constraints +---------------- +constraint_X_power_limit_uw (rw): Power limit in micro watts, which should be +applicable for the time window specified by "constraint_X_time_window_us". + +constraint_X_time_window_us (rw): Time window in micro seconds. + +constraint_X_name (ro): An optional name of the constraint + +constraint_X_max_power_uw(ro): Maximum allowed power in micro watts. + +constraint_X_min_power_uw(ro): Minimum allowed power in micro watts. + +constraint_X_max_time_window_us(ro): Maximum allowed time window in micro seconds. + +constraint_X_min_time_window_us(ro): Minimum allowed time window in micro seconds. + +Except power_limit_uw and time_window_us other fields are optional. + +Common zone and control type attributes +---------------------------------------- +enabled (rw): Enable/Disable controls at zone level or for all zones using +a control type. + +Power Cap Client Driver Interface +================================== +The API summary: + +Call powercap_register_control_type() to register control type object. +Call powercap_register_zone() to register a power zone (under a given +control type), either as a top-level power zone or as a subzone of another +power zone registered earlier. +The number of constraints in a power zone and the corresponding callbacks have +to be defined prior to calling powercap_register_zone() to register that zone. + +To Free a power zone call powercap_unregister_zone(). +To free a control type object call powercap_unregister_control_type(). +Detailed API can be generated using kernel-doc on include/linux/powercap.h. diff --git a/Documentation/power/runtime_pm.txt b/Documentation/power/runtime_pm.txt index 71d8fe4e75d3..b6ce00b2be9a 100644 --- a/Documentation/power/runtime_pm.txt +++ b/Documentation/power/runtime_pm.txt @@ -145,11 +145,13 @@ The action performed by the idle callback is totally dependent on the subsystem if the device can be suspended (i.e. if all of the conditions necessary for suspending the device are satisfied) and to queue up a suspend request for the device in that case. If there is no idle callback, or if the callback returns -0, then the PM core will attempt to carry out a runtime suspend of the device; -in essence, it will call pm_runtime_suspend() directly. To prevent this (for -example, if the callback routine has started a delayed suspend), the routine -should return a non-zero value. Negative error return codes are ignored by the -PM core. +0, then the PM core will attempt to carry out a runtime suspend of the device, +also respecting devices configured for autosuspend. In essence this means a +call to pm_runtime_autosuspend() (do note that drivers needs to update the +device last busy mark, pm_runtime_mark_last_busy(), to control the delay under +this circumstance). To prevent this (for example, if the callback routine has +started a delayed suspend), the routine must return a non-zero value. Negative +error return codes are ignored by the PM core. The helper functions provided by the PM core, described in Section 4, guarantee that the following constraints are met with respect to runtime PM callbacks for @@ -308,7 +310,7 @@ drivers/base/power/runtime.c and include/linux/pm_runtime.h: - execute the subsystem-level idle callback for the device; returns an error code on failure, where -EINPROGRESS means that ->runtime_idle() is already being executed; if there is no callback or the callback returns 0 - then run pm_runtime_suspend(dev) and return its result + then run pm_runtime_autosuspend(dev) and return its result int pm_runtime_suspend(struct device *dev); - execute the subsystem-level suspend callback for the device; returns 0 on @@ -545,13 +547,11 @@ helper functions described in Section 4. In that case, pm_runtime_resume() should be used. Of course, for this purpose the device's runtime PM has to be enabled earlier by calling pm_runtime_enable(). -If the device bus type's or driver's ->probe() callback runs -pm_runtime_suspend() or pm_runtime_idle() or their asynchronous counterparts, -they will fail returning -EAGAIN, because the device's usage counter is -incremented by the driver core before executing ->probe(). Still, it may be -desirable to suspend the device as soon as ->probe() has finished, so the driver -core uses pm_runtime_put_sync() to invoke the subsystem-level idle callback for -the device at that time. +It may be desirable to suspend the device once ->probe() has finished. +Therefore the driver core uses the asyncronous pm_request_idle() to submit a +request to execute the subsystem-level idle callback for the device at that +time. A driver that makes use of the runtime autosuspend feature, may want to +update the last busy mark before returning from ->probe(). Moreover, the driver core prevents runtime PM callbacks from racing with the bus notifier callback in __device_release_driver(), which is necessary, because the @@ -654,7 +654,7 @@ out the following operations: __pm_runtime_disable() with 'false' as the second argument for every device right before executing the subsystem-level .suspend_late() callback for it. - * During system resume it calls pm_runtime_enable() and pm_runtime_put_sync() + * During system resume it calls pm_runtime_enable() and pm_runtime_put() for every device right after executing the subsystem-level .resume_early() callback and right after executing the subsystem-level .resume() callback for it, respectively. diff --git a/Documentation/ptp/testptp.c b/Documentation/ptp/testptp.c index f59ded066108..a74d0a84d329 100644 --- a/Documentation/ptp/testptp.c +++ b/Documentation/ptp/testptp.c @@ -100,6 +100,11 @@ static long ppb_to_scaled_ppm(int ppb) return (long) (ppb * 65.536); } +static int64_t pctns(struct ptp_clock_time *t) +{ + return t->sec * 1000000000LL + t->nsec; +} + static void usage(char *progname) { fprintf(stderr, @@ -112,6 +117,8 @@ static void usage(char *progname) " -f val adjust the ptp clock frequency by 'val' ppb\n" " -g get the ptp clock time\n" " -h prints this message\n" + " -k val measure the time offset between system and phc clock\n" + " for 'val' times (Maximum 25)\n" " -p val enable output with a period of 'val' nanoseconds\n" " -P val enable or disable (val=1|0) the system clock PPS\n" " -s set the ptp clock time from the system time\n" @@ -133,8 +140,12 @@ int main(int argc, char *argv[]) struct itimerspec timeout; struct sigevent sigevent; + struct ptp_clock_time *pct; + struct ptp_sys_offset *sysoff; + + char *progname; - int c, cnt, fd; + int i, c, cnt, fd; char *device = DEVICE; clockid_t clkid; @@ -144,14 +155,19 @@ int main(int argc, char *argv[]) int extts = 0; int gettime = 0; int oneshot = 0; + int pct_offset = 0; + int n_samples = 0; int periodic = 0; int perout = -1; int pps = -1; int settime = 0; + int64_t t1, t2, tp; + int64_t interval, offset; + progname = strrchr(argv[0], '/'); progname = progname ? 1+progname : argv[0]; - while (EOF != (c = getopt(argc, argv, "a:A:cd:e:f:ghp:P:sSt:v"))) { + while (EOF != (c = getopt(argc, argv, "a:A:cd:e:f:ghk:p:P:sSt:v"))) { switch (c) { case 'a': oneshot = atoi(optarg); @@ -174,6 +190,10 @@ int main(int argc, char *argv[]) case 'g': gettime = 1; break; + case 'k': + pct_offset = 1; + n_samples = atoi(optarg); + break; case 'p': perout = atoi(optarg); break; @@ -376,6 +396,47 @@ int main(int argc, char *argv[]) } } + if (pct_offset) { + if (n_samples <= 0 || n_samples > 25) { + puts("n_samples should be between 1 and 25"); + usage(progname); + return -1; + } + + sysoff = calloc(1, sizeof(*sysoff)); + if (!sysoff) { + perror("calloc"); + return -1; + } + sysoff->n_samples = n_samples; + + if (ioctl(fd, PTP_SYS_OFFSET, sysoff)) + perror("PTP_SYS_OFFSET"); + else + puts("system and phc clock time offset request okay"); + + pct = &sysoff->ts[0]; + for (i = 0; i < sysoff->n_samples; i++) { + t1 = pctns(pct+2*i); + tp = pctns(pct+2*i+1); + t2 = pctns(pct+2*i+2); + interval = t2 - t1; + offset = (t2 + t1) / 2 - tp; + + printf("system time: %ld.%ld\n", + (pct+2*i)->sec, (pct+2*i)->nsec); + printf("phc time: %ld.%ld\n", + (pct+2*i+1)->sec, (pct+2*i+1)->nsec); + printf("system time: %ld.%ld\n", + (pct+2*i+2)->sec, (pct+2*i+2)->nsec); + printf("system/phc clock time offset is %ld ns\n" + "system clock time delay is %ld ns\n", + offset, interval); + } + + free(sysoff); + } + close(fd); return 0; } diff --git a/Documentation/pwm.txt b/Documentation/pwm.txt index 1039b68fe9c6..93cb97974986 100644 --- a/Documentation/pwm.txt +++ b/Documentation/pwm.txt @@ -39,7 +39,7 @@ New users should use the pwm_get() function and pass to it the consumer device or a consumer name. pwm_put() is used to free the PWM device. Managed variants of these functions, devm_pwm_get() and devm_pwm_put(), also exist. -After being requested a PWM has to be configured using: +After being requested, a PWM has to be configured using: int pwm_config(struct pwm_device *pwm, int duty_ns, int period_ns); @@ -94,7 +94,7 @@ for new drivers to use the generic PWM framework. A new PWM controller/chip can be added using pwmchip_add() and removed again with pwmchip_remove(). pwmchip_add() takes a filled in struct pwm_chip as argument which provides a description of the PWM chip, the -number of PWM devices provider by the chip and the chip-specific +number of PWM devices provided by the chip and the chip-specific implementation of the supported PWM operations to the framework. Locking diff --git a/Documentation/security/00-INDEX b/Documentation/security/00-INDEX index 414235c1fcfc..45c82fd3e9d3 100644 --- a/Documentation/security/00-INDEX +++ b/Documentation/security/00-INDEX @@ -22,3 +22,5 @@ keys.txt - description of the kernel key retention service. tomoyo.txt - documentation on the TOMOYO Linux Security Module. +IMA-templates.txt + - documentation on the template management mechanism for IMA. diff --git a/Documentation/security/IMA-templates.txt b/Documentation/security/IMA-templates.txt new file mode 100644 index 000000000000..a777e5f1df5b --- /dev/null +++ b/Documentation/security/IMA-templates.txt @@ -0,0 +1,87 @@ + IMA Template Management Mechanism + + +==== INTRODUCTION ==== + +The original 'ima' template is fixed length, containing the filedata hash +and pathname. The filedata hash is limited to 20 bytes (md5/sha1). +The pathname is a null terminated string, limited to 255 characters. +To overcome these limitations and to add additional file metadata, it is +necessary to extend the current version of IMA by defining additional +templates. For example, information that could be possibly reported are +the inode UID/GID or the LSM labels either of the inode and of the process +that is accessing it. + +However, the main problem to introduce this feature is that, each time +a new template is defined, the functions that generate and display +the measurements list would include the code for handling a new format +and, thus, would significantly grow over the time. + +The proposed solution solves this problem by separating the template +management from the remaining IMA code. The core of this solution is the +definition of two new data structures: a template descriptor, to determine +which information should be included in the measurement list; a template +field, to generate and display data of a given type. + +Managing templates with these structures is very simple. To support +a new data type, developers define the field identifier and implement +two functions, init() and show(), respectively to generate and display +measurement entries. Defining a new template descriptor requires +specifying the template format, a string of field identifiers separated +by the '|' character. While in the current implementation it is possible +to define new template descriptors only by adding their definition in the +template specific code (ima_template.c), in a future version it will be +possible to register a new template on a running kernel by supplying to IMA +the desired format string. In this version, IMA initializes at boot time +all defined template descriptors by translating the format into an array +of template fields structures taken from the set of the supported ones. + +After the initialization step, IMA will call ima_alloc_init_template() +(new function defined within the patches for the new template management +mechanism) to generate a new measurement entry by using the template +descriptor chosen through the kernel configuration or through the newly +introduced 'ima_template=' kernel command line parameter. It is during this +phase that the advantages of the new architecture are clearly shown: +the latter function will not contain specific code to handle a given template +but, instead, it simply calls the init() method of the template fields +associated to the chosen template descriptor and store the result (pointer +to allocated data and data length) in the measurement entry structure. + +The same mechanism is employed to display measurements entries. +The functions ima[_ascii]_measurements_show() retrieve, for each entry, +the template descriptor used to produce that entry and call the show() +method for each item of the array of template fields structures. + + + +==== SUPPORTED TEMPLATE FIELDS AND DESCRIPTORS ==== + +In the following, there is the list of supported template fields +('<identifier>': description), that can be used to define new template +descriptors by adding their identifier to the format string +(support for more data types will be added later): + + - 'd': the digest of the event (i.e. the digest of a measured file), + calculated with the SHA1 or MD5 hash algorithm; + - 'n': the name of the event (i.e. the file name), with size up to 255 bytes; + - 'd-ng': the digest of the event, calculated with an arbitrary hash + algorithm (field format: [<hash algo>:]digest, where the digest + prefix is shown only if the hash algorithm is not SHA1 or MD5); + - 'n-ng': the name of the event, without size limitations. + + +Below, there is the list of defined template descriptors: + - "ima": its format is 'd|n'; + - "ima-ng" (default): its format is 'd-ng|n-ng'. + + + +==== USE ==== + +To specify the template descriptor to be used to generate measurement entries, +currently the following methods are supported: + + - select a template descriptor among those supported in the kernel + configuration ('ima-ng' is the default choice); + - specify a template descriptor name from the kernel command line through + the 'ima_template=' parameter. diff --git a/Documentation/security/keys.txt b/Documentation/security/keys.txt index 7b4145d00452..a4c33f1a7c6d 100644 --- a/Documentation/security/keys.txt +++ b/Documentation/security/keys.txt @@ -865,15 +865,14 @@ encountered: calling processes has a searchable link to the key from one of its keyrings. There are three functions for dealing with these: - key_ref_t make_key_ref(const struct key *key, - unsigned long possession); + key_ref_t make_key_ref(const struct key *key, bool possession); struct key *key_ref_to_ptr(const key_ref_t key_ref); - unsigned long is_key_possessed(const key_ref_t key_ref); + bool is_key_possessed(const key_ref_t key_ref); The first function constructs a key reference from a key pointer and - possession information (which must be 0 or 1 and not any other value). + possession information (which must be true or false). The second function retrieves the key pointer from a reference and the third retrieves the possession flag. @@ -961,14 +960,17 @@ payload contents" for more information. the argument will not be parsed. -(*) Extra references can be made to a key by calling the following function: +(*) Extra references can be made to a key by calling one of the following + functions: + struct key *__key_get(struct key *key); struct key *key_get(struct key *key); - These need to be disposed of by calling key_put() when they've been - finished with. The key pointer passed in will be returned. If the pointer - is NULL or CONFIG_KEYS is not set then the key will not be dereferenced and - no increment will take place. + Keys so references will need to be disposed of by calling key_put() when + they've been finished with. The key pointer passed in will be returned. + + In the case of key_get(), if the pointer is NULL or CONFIG_KEYS is not set + then the key will not be dereferenced and no increment will take place. (*) A key's serial number can be obtained by calling: diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt index 4273b2d71a27..26b7ee491df8 100644 --- a/Documentation/sysctl/kernel.txt +++ b/Documentation/sysctl/kernel.txt @@ -290,13 +290,24 @@ Default value is "/sbin/hotplug". kptr_restrict: This toggle indicates whether restrictions are placed on -exposing kernel addresses via /proc and other interfaces. When -kptr_restrict is set to (0), there are no restrictions. When -kptr_restrict is set to (1), the default, kernel pointers -printed using the %pK format specifier will be replaced with 0's -unless the user has CAP_SYSLOG. When kptr_restrict is set to -(2), kernel pointers printed using %pK will be replaced with 0's -regardless of privileges. +exposing kernel addresses via /proc and other interfaces. + +When kptr_restrict is set to (0), the default, there are no restrictions. + +When kptr_restrict is set to (1), kernel pointers printed using the %pK +format specifier will be replaced with 0's unless the user has CAP_SYSLOG +and effective user and group ids are equal to the real ids. This is +because %pK checks are done at read() time rather than open() time, so +if permissions are elevated between the open() and the read() (e.g via +a setuid binary) then %pK will not leak kernel pointers to unprivileged +users. Note, this is a temporary solution only. The correct long-term +solution is to do the permission checks at open() time. Consider removing +world read permissions from files that use %pK, and using dmesg_restrict +to protect against uses of %pK in dmesg(8) if leaking kernel pointer +values to unprivileged users is a concern. + +When kptr_restrict is set to (2), kernel pointers printed using +%pK will be replaced with 0's regardless of privileges. ============================================================== diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt index 79a797eb3e87..1fbd4eb7b64a 100644 --- a/Documentation/sysctl/vm.txt +++ b/Documentation/sysctl/vm.txt @@ -119,8 +119,11 @@ other appears as 0 when read. dirty_background_ratio -Contains, as a percentage of total system memory, the number of pages at which -the background kernel flusher threads will start writing out dirty data. +Contains, as a percentage of total available memory that contains free pages +and reclaimable pages, the number of pages at which the background kernel +flusher threads will start writing out dirty data. + +The total avaiable memory is not equal to total system memory. ============================================================== @@ -151,9 +154,11 @@ interval will be written out next time a flusher thread wakes up. dirty_ratio -Contains, as a percentage of total system memory, the number of pages at which -a process which is generating disk writes will itself start writing out dirty -data. +Contains, as a percentage of total available memory that contains free pages +and reclaimable pages, the number of pages at which a process which is +generating disk writes will itself start writing out dirty data. + +The total avaiable memory is not equal to total system memory. ============================================================== diff --git a/Documentation/target/tcm_mod_builder.py b/Documentation/target/tcm_mod_builder.py index 54d29c1320ed..230ce71f4d75 100755 --- a/Documentation/target/tcm_mod_builder.py +++ b/Documentation/target/tcm_mod_builder.py @@ -440,15 +440,15 @@ def tcm_mod_build_configfs(proto_ident, fabric_mod_dir_var, fabric_mod_name): buf += " /*\n" buf += " * Setup default attribute lists for various fabric->tf_cit_tmpl\n" buf += " */\n" - buf += " TF_CIT_TMPL(fabric)->tfc_wwn_cit.ct_attrs = " + fabric_mod_name + "_wwn_attrs;\n" - buf += " TF_CIT_TMPL(fabric)->tfc_tpg_base_cit.ct_attrs = NULL;\n" - buf += " TF_CIT_TMPL(fabric)->tfc_tpg_attrib_cit.ct_attrs = NULL;\n" - buf += " TF_CIT_TMPL(fabric)->tfc_tpg_param_cit.ct_attrs = NULL;\n" - buf += " TF_CIT_TMPL(fabric)->tfc_tpg_np_base_cit.ct_attrs = NULL;\n" - buf += " TF_CIT_TMPL(fabric)->tfc_tpg_nacl_base_cit.ct_attrs = NULL;\n" - buf += " TF_CIT_TMPL(fabric)->tfc_tpg_nacl_attrib_cit.ct_attrs = NULL;\n" - buf += " TF_CIT_TMPL(fabric)->tfc_tpg_nacl_auth_cit.ct_attrs = NULL;\n" - buf += " TF_CIT_TMPL(fabric)->tfc_tpg_nacl_param_cit.ct_attrs = NULL;\n" + buf += " fabric->tf_cit_tmpl.tfc_wwn_cit.ct_attrs = " + fabric_mod_name + "_wwn_attrs;\n" + buf += " fabric->tf_cit_tmpl.tfc_tpg_base_cit.ct_attrs = NULL;\n" + buf += " fabric->tf_cit_tmpl.tfc_tpg_attrib_cit.ct_attrs = NULL;\n" + buf += " fabric->tf_cit_tmpl.tfc_tpg_param_cit.ct_attrs = NULL;\n" + buf += " fabric->tf_cit_tmpl.tfc_tpg_np_base_cit.ct_attrs = NULL;\n" + buf += " fabric->tf_cit_tmpl.tfc_tpg_nacl_base_cit.ct_attrs = NULL;\n" + buf += " fabric->tf_cit_tmpl.tfc_tpg_nacl_attrib_cit.ct_attrs = NULL;\n" + buf += " fabric->tf_cit_tmpl.tfc_tpg_nacl_auth_cit.ct_attrs = NULL;\n" + buf += " fabric->tf_cit_tmpl.tfc_tpg_nacl_param_cit.ct_attrs = NULL;\n" buf += " /*\n" buf += " * Register the fabric for use within TCM\n" buf += " */\n" diff --git a/Documentation/timers/00-INDEX b/Documentation/timers/00-INDEX index a9248da5cdbc..ef2ccbf77fa2 100644 --- a/Documentation/timers/00-INDEX +++ b/Documentation/timers/00-INDEX @@ -8,5 +8,9 @@ hpet_example.c - sample hpet timer test program hrtimers.txt - subsystem for high-resolution kernel timers +NO_HZ.txt + - Summary of the different methods for the scheduler clock-interrupts management. +timers-howto.txt + - how to insert delays in the kernel the right (tm) way. timer_stats.txt - timer usage statistics diff --git a/Documentation/trace/tracepoints.txt b/Documentation/trace/tracepoints.txt index ac4170dd0f24..6b018b53177a 100644 --- a/Documentation/trace/tracepoints.txt +++ b/Documentation/trace/tracepoints.txt @@ -114,3 +114,8 @@ core kernel image or in modules. If the tracepoint has to be used in kernel modules, an EXPORT_TRACEPOINT_SYMBOL_GPL() or EXPORT_TRACEPOINT_SYMBOL() can be used to export the defined tracepoints. + +Note: The convenience macro TRACE_EVENT provides an alternative way to + define tracepoints. Check http://lwn.net/Articles/379903, + http://lwn.net/Articles/381064 and http://lwn.net/Articles/383362 + for a series of articles with more details. diff --git a/Documentation/usb/gadget_configfs.txt b/Documentation/usb/gadget_configfs.txt index 8ec2a67c39b7..4cf53e406613 100644 --- a/Documentation/usb/gadget_configfs.txt +++ b/Documentation/usb/gadget_configfs.txt @@ -26,7 +26,7 @@ Linux provides a number of functions for gadgets to use. Creating a gadget means deciding what configurations there will be and which functions each configuration will provide. -Configfs (please see Documentation/filesystems/configfs/*) lends itslef nicely +Configfs (please see Documentation/filesystems/configfs/*) lends itself nicely for the purpose of telling the kernel about the above mentioned decision. This document is about how to do it. @@ -99,7 +99,7 @@ directories must be created: $ mkdir configs/<name>.<number> where <name> can be any string which is legal in a filesystem and the -<numebr> is the configuration's number, e.g.: +<number> is the configuration's number, e.g.: $ mkdir configs/c.1 @@ -327,7 +327,7 @@ from the buffer to the cs), but it is up to the implementer of the two functions to decide what they actually do. typedef struct configured_structure cs; -typedef struc specific_attribute sa; +typedef struct specific_attribute sa; sa +----------------------------------+ diff --git a/Documentation/virtual/kvm/00-INDEX b/Documentation/virtual/kvm/00-INDEX new file mode 100644 index 000000000000..641ec9220179 --- /dev/null +++ b/Documentation/virtual/kvm/00-INDEX @@ -0,0 +1,24 @@ +00-INDEX + - this file. +api.txt + - KVM userspace API. +cpuid.txt + - KVM-specific cpuid leaves (x86). +devices/ + - KVM_CAP_DEVICE_CTRL userspace API. +hypercalls.txt + - KVM hypercalls. +locking.txt + - notes on KVM locks. +mmu.txt + - the x86 kvm shadow mmu. +msr.txt + - KVM-specific MSRs (x86). +nested-vmx.txt + - notes on nested virtualization for Intel x86 processors. +ppc-pv.txt + - the paravirtualization interface on PowerPC. +review-checklist.txt + - review checklist for KVM patches. +timekeeping.txt + - timekeeping virtualization for x86-based architectures. diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 858aecf21db2..a30035dd4c26 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -1122,9 +1122,9 @@ struct kvm_cpuid2 { struct kvm_cpuid_entry2 entries[0]; }; -#define KVM_CPUID_FLAG_SIGNIFCANT_INDEX 1 -#define KVM_CPUID_FLAG_STATEFUL_FUNC 2 -#define KVM_CPUID_FLAG_STATE_READ_NEXT 4 +#define KVM_CPUID_FLAG_SIGNIFCANT_INDEX BIT(0) +#define KVM_CPUID_FLAG_STATEFUL_FUNC BIT(1) +#define KVM_CPUID_FLAG_STATE_READ_NEXT BIT(2) struct kvm_cpuid_entry2 { __u32 function; @@ -1810,6 +1810,50 @@ registers, find a list below: PPC | KVM_REG_PPC_TLB3PS | 32 PPC | KVM_REG_PPC_EPTCFG | 32 PPC | KVM_REG_PPC_ICP_STATE | 64 + PPC | KVM_REG_PPC_TB_OFFSET | 64 + PPC | KVM_REG_PPC_SPMC1 | 32 + PPC | KVM_REG_PPC_SPMC2 | 32 + PPC | KVM_REG_PPC_IAMR | 64 + PPC | KVM_REG_PPC_TFHAR | 64 + PPC | KVM_REG_PPC_TFIAR | 64 + PPC | KVM_REG_PPC_TEXASR | 64 + PPC | KVM_REG_PPC_FSCR | 64 + PPC | KVM_REG_PPC_PSPB | 32 + PPC | KVM_REG_PPC_EBBHR | 64 + PPC | KVM_REG_PPC_EBBRR | 64 + PPC | KVM_REG_PPC_BESCR | 64 + PPC | KVM_REG_PPC_TAR | 64 + PPC | KVM_REG_PPC_DPDES | 64 + PPC | KVM_REG_PPC_DAWR | 64 + PPC | KVM_REG_PPC_DAWRX | 64 + PPC | KVM_REG_PPC_CIABR | 64 + PPC | KVM_REG_PPC_IC | 64 + PPC | KVM_REG_PPC_VTB | 64 + PPC | KVM_REG_PPC_CSIGR | 64 + PPC | KVM_REG_PPC_TACR | 64 + PPC | KVM_REG_PPC_TCSCR | 64 + PPC | KVM_REG_PPC_PID | 64 + PPC | KVM_REG_PPC_ACOP | 64 + PPC | KVM_REG_PPC_VRSAVE | 32 + PPC | KVM_REG_PPC_LPCR | 64 + PPC | KVM_REG_PPC_PPR | 64 + PPC | KVM_REG_PPC_ARCH_COMPAT 32 + PPC | KVM_REG_PPC_TM_GPR0 | 64 + ... + PPC | KVM_REG_PPC_TM_GPR31 | 64 + PPC | KVM_REG_PPC_TM_VSR0 | 128 + ... + PPC | KVM_REG_PPC_TM_VSR63 | 128 + PPC | KVM_REG_PPC_TM_CR | 64 + PPC | KVM_REG_PPC_TM_LR | 64 + PPC | KVM_REG_PPC_TM_CTR | 64 + PPC | KVM_REG_PPC_TM_FPSCR | 64 + PPC | KVM_REG_PPC_TM_AMR | 64 + PPC | KVM_REG_PPC_TM_PPR | 64 + PPC | KVM_REG_PPC_TM_VRSAVE | 64 + PPC | KVM_REG_PPC_TM_VSCR | 32 + PPC | KVM_REG_PPC_TM_DSCR | 64 + PPC | KVM_REG_PPC_TM_TAR | 64 ARM registers are mapped using the lower 32 bits. The upper 16 of that is the register group type, or coprocessor number: @@ -2304,7 +2348,31 @@ Possible features: Depends on KVM_CAP_ARM_EL1_32BIT (arm64 only). -4.83 KVM_GET_REG_LIST +4.83 KVM_ARM_PREFERRED_TARGET + +Capability: basic +Architectures: arm, arm64 +Type: vm ioctl +Parameters: struct struct kvm_vcpu_init (out) +Returns: 0 on success; -1 on error +Errors: + ENODEV: no preferred target available for the host + +This queries KVM for preferred CPU target type which can be emulated +by KVM on underlying host. + +The ioctl returns struct kvm_vcpu_init instance containing information +about preferred CPU target type and recommended features for it. The +kvm_vcpu_init->features bitmap returned will have feature bits set if +the preferred target recommends setting these features, but this is +not mandatory. + +The information returned by this ioctl can be used to prepare an instance +of struct kvm_vcpu_init for KVM_ARM_VCPU_INIT ioctl which will result in +in VCPU matching underlying host. + + +4.84 KVM_GET_REG_LIST Capability: basic Architectures: arm, arm64 @@ -2323,8 +2391,7 @@ struct kvm_reg_list { This ioctl returns the guest registers that are supported for the KVM_GET_ONE_REG/KVM_SET_ONE_REG calls. - -4.84 KVM_ARM_SET_DEVICE_ADDR +4.85 KVM_ARM_SET_DEVICE_ADDR Capability: KVM_CAP_ARM_SET_DEVICE_ADDR Architectures: arm, arm64 @@ -2362,7 +2429,7 @@ must be called after calling KVM_CREATE_IRQCHIP, but before calling KVM_RUN on any of the VCPUs. Calling this ioctl twice for any of the base addresses will return -EEXIST. -4.85 KVM_PPC_RTAS_DEFINE_TOKEN +4.86 KVM_PPC_RTAS_DEFINE_TOKEN Capability: KVM_CAP_PPC_RTAS Architectures: ppc @@ -2661,6 +2728,77 @@ and usually define the validity of a groups of registers. (e.g. one bit }; +4.81 KVM_GET_EMULATED_CPUID + +Capability: KVM_CAP_EXT_EMUL_CPUID +Architectures: x86 +Type: system ioctl +Parameters: struct kvm_cpuid2 (in/out) +Returns: 0 on success, -1 on error + +struct kvm_cpuid2 { + __u32 nent; + __u32 flags; + struct kvm_cpuid_entry2 entries[0]; +}; + +The member 'flags' is used for passing flags from userspace. + +#define KVM_CPUID_FLAG_SIGNIFCANT_INDEX BIT(0) +#define KVM_CPUID_FLAG_STATEFUL_FUNC BIT(1) +#define KVM_CPUID_FLAG_STATE_READ_NEXT BIT(2) + +struct kvm_cpuid_entry2 { + __u32 function; + __u32 index; + __u32 flags; + __u32 eax; + __u32 ebx; + __u32 ecx; + __u32 edx; + __u32 padding[3]; +}; + +This ioctl returns x86 cpuid features which are emulated by +kvm.Userspace can use the information returned by this ioctl to query +which features are emulated by kvm instead of being present natively. + +Userspace invokes KVM_GET_EMULATED_CPUID by passing a kvm_cpuid2 +structure with the 'nent' field indicating the number of entries in +the variable-size array 'entries'. If the number of entries is too low +to describe the cpu capabilities, an error (E2BIG) is returned. If the +number is too high, the 'nent' field is adjusted and an error (ENOMEM) +is returned. If the number is just right, the 'nent' field is adjusted +to the number of valid entries in the 'entries' array, which is then +filled. + +The entries returned are the set CPUID bits of the respective features +which kvm emulates, as returned by the CPUID instruction, with unknown +or unsupported feature bits cleared. + +Features like x2apic, for example, may not be present in the host cpu +but are exposed by kvm in KVM_GET_SUPPORTED_CPUID because they can be +emulated efficiently and thus not included here. + +The fields in each entry are defined as follows: + + function: the eax value used to obtain the entry + index: the ecx value used to obtain the entry (for entries that are + affected by ecx) + flags: an OR of zero or more of the following: + KVM_CPUID_FLAG_SIGNIFCANT_INDEX: + if the index field is valid + KVM_CPUID_FLAG_STATEFUL_FUNC: + if cpuid for this function returns different values for successive + invocations; there will be several entries with the same function, + all with this flag set + KVM_CPUID_FLAG_STATE_READ_NEXT: + for KVM_CPUID_FLAG_STATEFUL_FUNC entries, set if this entry is + the first entry to be read by a cpu + eax, ebx, ecx, edx: the values returned by the cpuid instruction for + this function/index combination + + 6. Capabilities that can be enabled ----------------------------------- diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt index 22ff659bc0fb..3c65feb83010 100644 --- a/Documentation/virtual/kvm/cpuid.txt +++ b/Documentation/virtual/kvm/cpuid.txt @@ -43,6 +43,13 @@ KVM_FEATURE_CLOCKSOURCE2 || 3 || kvmclock available at msrs KVM_FEATURE_ASYNC_PF || 4 || async pf can be enabled by || || writing to msr 0x4b564d02 ------------------------------------------------------------------------------ +KVM_FEATURE_STEAL_TIME || 5 || steal time can be enabled by + || || writing to msr 0x4b564d03. +------------------------------------------------------------------------------ +KVM_FEATURE_PV_EOI || 6 || paravirtualized end of interrupt + || || handler can be enabled by writing + || || to msr 0x4b564d04. +------------------------------------------------------------------------------ KVM_FEATURE_PV_UNHALT || 7 || guest checks this feature bit || || before enabling paravirtualized || || spinlock support. diff --git a/Documentation/virtual/kvm/devices/vfio.txt b/Documentation/virtual/kvm/devices/vfio.txt new file mode 100644 index 000000000000..ef51740c67ca --- /dev/null +++ b/Documentation/virtual/kvm/devices/vfio.txt @@ -0,0 +1,22 @@ +VFIO virtual device +=================== + +Device types supported: + KVM_DEV_TYPE_VFIO + +Only one VFIO instance may be created per VM. The created device +tracks VFIO groups in use by the VM and features of those groups +important to the correctness and acceleration of the VM. As groups +are enabled and disabled for use by the VM, KVM should be updated +about their presence. When registered with KVM, a reference to the +VFIO-group is held by KVM. + +Groups: + KVM_DEV_VFIO_GROUP + +KVM_DEV_VFIO_GROUP attributes: + KVM_DEV_VFIO_GROUP_ADD: Add a VFIO group to VFIO-KVM device tracking + KVM_DEV_VFIO_GROUP_DEL: Remove a VFIO group from VFIO-KVM device tracking + +For each, kvm_device_attr.addr points to an int32_t file descriptor +for the VFIO group. diff --git a/Documentation/virtual/kvm/locking.txt b/Documentation/virtual/kvm/locking.txt index 41b7ac9884b5..f8869410d40c 100644 --- a/Documentation/virtual/kvm/locking.txt +++ b/Documentation/virtual/kvm/locking.txt @@ -132,10 +132,14 @@ See the comments in spte_has_volatile_bits() and mmu_spte_update(). ------------ Name: kvm_lock -Type: raw_spinlock +Type: spinlock_t Arch: any Protects: - vm_list - - hardware virtualization enable/disable + +Name: kvm_count_lock +Type: raw_spinlock_t +Arch: any +Protects: - hardware virtualization enable/disable Comment: 'raw' because hardware enabling/disabling must be atomic /wrt migration. @@ -151,3 +155,14 @@ Type: spinlock_t Arch: any Protects: -shadow page/shadow tlb entry Comment: it is a spinlock since it is used in mmu notifier. + +Name: kvm->srcu +Type: srcu lock +Arch: any +Protects: - kvm->memslots + - kvm->buses +Comment: The srcu read lock must be held while accessing memslots (e.g. + when using gfn_to_* functions) and while accessing in-kernel + MMIO/PIO address->device structure mapping (kvm->buses). + The srcu index can be stored in kvm_vcpu->srcu_idx per vcpu + if it is needed by multiple functions. diff --git a/Documentation/vm/00-INDEX b/Documentation/vm/00-INDEX index 5481c8ba3412..a39d06680e1c 100644 --- a/Documentation/vm/00-INDEX +++ b/Documentation/vm/00-INDEX @@ -4,10 +4,12 @@ active_mm.txt - An explanation from Linus about tsk->active_mm vs tsk->mm. balance - various information on memory balancing. -hugepage-mmap.c - - Example app using huge page memory with the mmap system call. -hugepage-shm.c - - Example app using huge page memory with Sys V shared memory system calls. +cleancache.txt + - Intro to cleancache and page-granularity victim cache. +frontswap.txt + - Outline frontswap, part of the transcendent memory frontend. +highmem.txt + - Outline of highmem and common issues. hugetlbpage.txt - a brief summary of hugetlbpage support in the Linux kernel. hwpoison.txt @@ -16,21 +18,23 @@ ksm.txt - how to use the Kernel Samepage Merging feature. locking - info on how locking and synchronization is done in the Linux vm code. -map_hugetlb.c - - an example program that uses the MAP_HUGETLB mmap flag. numa - information about NUMA specific code in the Linux vm. numa_memory_policy.txt - documentation of concepts and APIs of the 2.6 memory policy support. overcommit-accounting - description of the Linux kernels overcommit handling modes. -page-types.c - - Tool for querying page flags page_migration - description of page migration in NUMA systems. pagemap.txt - pagemap, from the userspace perspective slub.txt - a short users guide for SLUB. +soft-dirty.txt + - short explanation for soft-dirty PTEs +transhuge.txt + - Transparent Hugepage Support, alternative way of using hugepages. unevictable-lru.txt - Unevictable LRU infrastructure +zswap.txt + - Intro to compressed cache for swap pages diff --git a/Documentation/vm/split_page_table_lock b/Documentation/vm/split_page_table_lock new file mode 100644 index 000000000000..6dea4fd5c961 --- /dev/null +++ b/Documentation/vm/split_page_table_lock @@ -0,0 +1,94 @@ +Split page table lock +===================== + +Originally, mm->page_table_lock spinlock protected all page tables of the +mm_struct. But this approach leads to poor page fault scalability of +multi-threaded applications due high contention on the lock. To improve +scalability, split page table lock was introduced. + +With split page table lock we have separate per-table lock to serialize +access to the table. At the moment we use split lock for PTE and PMD +tables. Access to higher level tables protected by mm->page_table_lock. + +There are helpers to lock/unlock a table and other accessor functions: + - pte_offset_map_lock() + maps pte and takes PTE table lock, returns pointer to the taken + lock; + - pte_unmap_unlock() + unlocks and unmaps PTE table; + - pte_alloc_map_lock() + allocates PTE table if needed and take the lock, returns pointer + to taken lock or NULL if allocation failed; + - pte_lockptr() + returns pointer to PTE table lock; + - pmd_lock() + takes PMD table lock, returns pointer to taken lock; + - pmd_lockptr() + returns pointer to PMD table lock; + +Split page table lock for PTE tables is enabled compile-time if +CONFIG_SPLIT_PTLOCK_CPUS (usually 4) is less or equal to NR_CPUS. +If split lock is disabled, all tables guaded by mm->page_table_lock. + +Split page table lock for PMD tables is enabled, if it's enabled for PTE +tables and the architecture supports it (see below). + +Hugetlb and split page table lock +--------------------------------- + +Hugetlb can support several page sizes. We use split lock only for PMD +level, but not for PUD. + +Hugetlb-specific helpers: + - huge_pte_lock() + takes pmd split lock for PMD_SIZE page, mm->page_table_lock + otherwise; + - huge_pte_lockptr() + returns pointer to table lock; + +Support of split page table lock by an architecture +--------------------------------------------------- + +There's no need in special enabling of PTE split page table lock: +everything required is done by pgtable_page_ctor() and pgtable_page_dtor(), +which must be called on PTE table allocation / freeing. + +Make sure the architecture doesn't use slab allocator for page table +allocation: slab uses page->slab_cache and page->first_page for its pages. +These fields share storage with page->ptl. + +PMD split lock only makes sense if you have more than two page table +levels. + +PMD split lock enabling requires pgtable_pmd_page_ctor() call on PMD table +allocation and pgtable_pmd_page_dtor() on freeing. + +Allocation usually happens in pmd_alloc_one(), freeing in pmd_free() and +pmd_free_tlb(), but make sure you cover all PMD table allocation / freeing +paths: i.e X86_PAE preallocate few PMDs on pgd_alloc(). + +With everything in place you can set CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK. + +NOTE: pgtable_page_ctor() and pgtable_pmd_page_ctor() can fail -- it must +be handled properly. + +page->ptl +--------- + +page->ptl is used to access split page table lock, where 'page' is struct +page of page containing the table. It shares storage with page->private +(and few other fields in union). + +To avoid increasing size of struct page and have best performance, we use a +trick: + - if spinlock_t fits into long, we use page->ptr as spinlock, so we + can avoid indirect access and save a cache line. + - if size of spinlock_t is bigger then size of long, we use page->ptl as + pointer to spinlock_t and allocate it dynamically. This allows to use + split lock with enabled DEBUG_SPINLOCK or DEBUG_LOCK_ALLOC, but costs + one more cache line for indirect access; + +The spinlock_t allocated in pgtable_page_ctor() for PTE table and in +pgtable_pmd_page_ctor() for PMD table. + +Please, never access page->ptl directly -- use appropriate helper. diff --git a/Documentation/vm/zswap.txt b/Documentation/vm/zswap.txt index 7e492d8aaeaf..00c3d31e7971 100644 --- a/Documentation/vm/zswap.txt +++ b/Documentation/vm/zswap.txt @@ -8,7 +8,7 @@ significant performance improvement if reads from the compressed cache are faster than reads from a swap device. NOTE: Zswap is a new feature as of v3.11 and interacts heavily with memory -reclaim. This interaction has not be fully explored on the large set of +reclaim. This interaction has not been fully explored on the large set of potential configurations and workloads that exist. For this reason, zswap is a work in progress and should be considered experimental. @@ -23,7 +23,7 @@ Some potential benefits: drastically reducing life-shortening writes. Zswap evicts pages from compressed cache on an LRU basis to the backing swap -device when the compressed pool reaches it size limit. This requirement had +device when the compressed pool reaches its size limit. This requirement had been identified in prior community discussions. To enabled zswap, the "enabled" attribute must be set to 1 at boot time. e.g. @@ -37,7 +37,7 @@ the backing swap device in the case that the compressed pool is full. Zswap makes use of zbud for the managing the compressed memory pool. Each allocation in zbud is not directly accessible by address. Rather, a handle is -return by the allocation routine and that handle must be mapped before being +returned by the allocation routine and that handle must be mapped before being accessed. The compressed memory pool grows on demand and shrinks as compressed pages are freed. The pool is not preallocated. @@ -56,7 +56,7 @@ in the swap_map goes to 0) the swap code calls the zswap invalidate function, via frontswap, to free the compressed entry. Zswap seeks to be simple in its policies. Sysfs attributes allow for one user -controlled policies: +controlled policy: * max_pool_percent - The maximum percentage of memory that the compressed pool can occupy. |