diff options
Diffstat (limited to 'Documentation')
39 files changed, 1105 insertions, 58 deletions
diff --git a/Documentation/DocBook/debugobjects.tmpl b/Documentation/DocBook/debugobjects.tmpl index 08ff908aa7a2..24979f691e3e 100644 --- a/Documentation/DocBook/debugobjects.tmpl +++ b/Documentation/DocBook/debugobjects.tmpl @@ -96,6 +96,7 @@ <listitem><para>debug_object_deactivate</para></listitem> <listitem><para>debug_object_destroy</para></listitem> <listitem><para>debug_object_free</para></listitem> + <listitem><para>debug_object_assert_init</para></listitem> </itemizedlist> Each of these functions takes the address of the real object and a pointer to the object type specific debug description @@ -273,6 +274,26 @@ debug checks. </para> </sect1> + + <sect1 id="debug_object_assert_init"> + <title>debug_object_assert_init</title> + <para> + This function is called to assert that an object has been + initialized. + </para> + <para> + When the real object is not tracked by debugobjects, it calls + fixup_assert_init of the object type description structure + provided by the caller, with the hardcoded object state + ODEBUG_NOT_AVAILABLE. The fixup function can correct the problem + by calling debug_object_init and other specific initializing + functions. + </para> + <para> + When the real object is already tracked by debugobjects it is + ignored. + </para> + </sect1> </chapter> <chapter id="fixupfunctions"> <title>Fixup functions</title> @@ -381,6 +402,35 @@ statistics. </para> </sect1> + <sect1 id="fixup_assert_init"> + <title>fixup_assert_init</title> + <para> + This function is called from the debug code whenever a problem + in debug_object_assert_init is detected. + </para> + <para> + Called from debug_object_assert_init() with a hardcoded state + ODEBUG_STATE_NOTAVAILABLE when the object is not found in the + debug bucket. + </para> + <para> + The function returns 1 when the fixup was successful, + otherwise 0. The return value is used to update the + statistics. + </para> + <para> + Note, this function should make sure debug_object_init() is + called before returning. + </para> + <para> + The handling of statically initialized objects is a special + case. The fixup function should check if this is a legitimate + case of a statically initialized object or not. In this case only + debug_object_init() should be called to make the object known to + the tracker. Then the function should return 0 because this is not + a real fixup. + </para> + </sect1> </chapter> <chapter id="bugs"> <title>Known Bugs And Assumptions</title> diff --git a/Documentation/HOWTO b/Documentation/HOWTO index 81bc1a9ab9d8..f7ade3b3b40d 100644 --- a/Documentation/HOWTO +++ b/Documentation/HOWTO @@ -275,8 +275,8 @@ versions. If no 2.6.x.y kernel is available, then the highest numbered 2.6.x kernel is the current stable kernel. -2.6.x.y are maintained by the "stable" team <stable@kernel.org>, and are -released as needs dictate. The normal release period is approximately +2.6.x.y are maintained by the "stable" team <stable@vger.kernel.org>, and +are released as needs dictate. The normal release period is approximately two weeks, but it can be longer if there are no pressing problems. A security-related problem, instead, can cause a release to happen almost instantly. diff --git a/Documentation/RCU/checklist.txt b/Documentation/RCU/checklist.txt index 0c134f8afc6f..bff2d8be1e18 100644 --- a/Documentation/RCU/checklist.txt +++ b/Documentation/RCU/checklist.txt @@ -328,6 +328,12 @@ over a rather long period of time, but improvements are always welcome! RCU rather than SRCU, because RCU is almost always faster and easier to use than is SRCU. + If you need to enter your read-side critical section in a + hardirq or exception handler, and then exit that same read-side + critical section in the task that was interrupted, then you need + to srcu_read_lock_raw() and srcu_read_unlock_raw(), which avoid + the lockdep checking that would otherwise this practice illegal. + Also unlike other forms of RCU, explicit initialization and cleanup is required via init_srcu_struct() and cleanup_srcu_struct(). These are passed a "struct srcu_struct" diff --git a/Documentation/RCU/rcu.txt b/Documentation/RCU/rcu.txt index 31852705b586..bf778332a28f 100644 --- a/Documentation/RCU/rcu.txt +++ b/Documentation/RCU/rcu.txt @@ -38,11 +38,11 @@ o How can the updater tell when a grace period has completed Preemptible variants of RCU (CONFIG_TREE_PREEMPT_RCU) get the same effect, but require that the readers manipulate CPU-local - counters. These counters allow limited types of blocking - within RCU read-side critical sections. SRCU also uses - CPU-local counters, and permits general blocking within - RCU read-side critical sections. These two variants of - RCU detect grace periods by sampling these counters. + counters. These counters allow limited types of blocking within + RCU read-side critical sections. SRCU also uses CPU-local + counters, and permits general blocking within RCU read-side + critical sections. These variants of RCU detect grace periods + by sampling these counters. o If I am running on a uniprocessor kernel, which can only do one thing at a time, why should I wait for a grace period? diff --git a/Documentation/RCU/stallwarn.txt b/Documentation/RCU/stallwarn.txt index 4e959208f736..083d88cbc089 100644 --- a/Documentation/RCU/stallwarn.txt +++ b/Documentation/RCU/stallwarn.txt @@ -101,6 +101,11 @@ o A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that CONFIG_TREE_PREEMPT_RCU case, you might see stall-warning messages. +o A hardware or software issue shuts off the scheduler-clock + interrupt on a CPU that is not in dyntick-idle mode. This + problem really has happened, and seems to be most likely to + result in RCU CPU stall warnings for CONFIG_NO_HZ=n kernels. + o A bug in the RCU implementation. o A hardware failure. This is quite unlikely, but has occurred @@ -109,12 +114,11 @@ o A hardware failure. This is quite unlikely, but has occurred This resulted in a series of RCU CPU stall warnings, eventually leading the realization that the CPU had failed. -The RCU, RCU-sched, and RCU-bh implementations have CPU stall -warning. SRCU does not have its own CPU stall warnings, but its -calls to synchronize_sched() will result in RCU-sched detecting -RCU-sched-related CPU stalls. Please note that RCU only detects -CPU stalls when there is a grace period in progress. No grace period, -no CPU stall warnings. +The RCU, RCU-sched, and RCU-bh implementations have CPU stall warning. +SRCU does not have its own CPU stall warnings, but its calls to +synchronize_sched() will result in RCU-sched detecting RCU-sched-related +CPU stalls. Please note that RCU only detects CPU stalls when there is +a grace period in progress. No grace period, no CPU stall warnings. To diagnose the cause of the stall, inspect the stack traces. The offending function will usually be near the top of the stack. diff --git a/Documentation/RCU/torture.txt b/Documentation/RCU/torture.txt index 783d6c134d3f..d67068d0d2b9 100644 --- a/Documentation/RCU/torture.txt +++ b/Documentation/RCU/torture.txt @@ -61,11 +61,24 @@ nreaders This is the number of RCU reading threads supported. To properly exercise RCU implementations with preemptible read-side critical sections. +onoff_interval + The number of seconds between each attempt to execute a + randomly selected CPU-hotplug operation. Defaults to + zero, which disables CPU hotplugging. In HOTPLUG_CPU=n + kernels, rcutorture will silently refuse to do any + CPU-hotplug operations regardless of what value is + specified for onoff_interval. + shuffle_interval The number of seconds to keep the test threads affinitied to a particular subset of the CPUs, defaults to 3 seconds. Used in conjunction with test_no_idle_hz. +shutdown_secs The number of seconds to run the test before terminating + the test and powering off the system. The default is + zero, which disables test termination and system shutdown. + This capability is useful for automated testing. + stat_interval The number of seconds between output of torture statistics (via printk()). Regardless of the interval, statistics are printed when the module is unloaded. diff --git a/Documentation/RCU/trace.txt b/Documentation/RCU/trace.txt index aaf65f6c6cd7..49587abfc2f7 100644 --- a/Documentation/RCU/trace.txt +++ b/Documentation/RCU/trace.txt @@ -105,14 +105,10 @@ o "dt" is the current value of the dyntick counter that is incremented or one greater than the interrupt-nesting depth otherwise. The number after the second "/" is the NMI nesting depth. - This field is displayed only for CONFIG_NO_HZ kernels. - o "df" is the number of times that some other CPU has forced a quiescent state on behalf of this CPU due to this CPU being in dynticks-idle state. - This field is displayed only for CONFIG_NO_HZ kernels. - o "of" is the number of times that some other CPU has forced a quiescent state on behalf of this CPU due to this CPU being offline. In a perfect world, this might never happen, but it diff --git a/Documentation/RCU/whatisRCU.txt b/Documentation/RCU/whatisRCU.txt index 6ef692667e2f..6bbe8dcdc3da 100644 --- a/Documentation/RCU/whatisRCU.txt +++ b/Documentation/RCU/whatisRCU.txt @@ -4,6 +4,7 @@ to start learning about RCU: 1. What is RCU, Fundamentally? http://lwn.net/Articles/262464/ 2. What is RCU? Part 2: Usage http://lwn.net/Articles/263130/ 3. RCU part 3: the RCU API http://lwn.net/Articles/264090/ +4. The RCU API, 2010 Edition http://lwn.net/Articles/418853/ What is RCU? @@ -834,6 +835,8 @@ SRCU: Critical sections Grace period Barrier srcu_read_lock synchronize_srcu N/A srcu_read_unlock synchronize_srcu_expedited + srcu_read_lock_raw + srcu_read_unlock_raw srcu_dereference SRCU: Initialization/cleanup @@ -855,27 +858,33 @@ list can be helpful: a. Will readers need to block? If so, you need SRCU. -b. What about the -rt patchset? If readers would need to block +b. Is it necessary to start a read-side critical section in a + hardirq handler or exception handler, and then to complete + this read-side critical section in the task that was + interrupted? If so, you need SRCU's srcu_read_lock_raw() and + srcu_read_unlock_raw() primitives. + +c. What about the -rt patchset? If readers would need to block in an non-rt kernel, you need SRCU. If readers would block in a -rt kernel, but not in a non-rt kernel, SRCU is not necessary. -c. Do you need to treat NMI handlers, hardirq handlers, +d. Do you need to treat NMI handlers, hardirq handlers, and code segments with preemption disabled (whether via preempt_disable(), local_irq_save(), local_bh_disable(), or some other mechanism) as if they were explicit RCU readers? If so, you need RCU-sched. -d. Do you need RCU grace periods to complete even in the face +e. Do you need RCU grace periods to complete even in the face of softirq monopolization of one or more of the CPUs? For example, is your code subject to network-based denial-of-service attacks? If so, you need RCU-bh. -e. Is your workload too update-intensive for normal use of +f. Is your workload too update-intensive for normal use of RCU, but inappropriate for other synchronization mechanisms? If so, consider SLAB_DESTROY_BY_RCU. But please be careful! -f. Otherwise, use RCU. +g. Otherwise, use RCU. Of course, this all assumes that you have determined that RCU is in fact the right tool for your job. diff --git a/Documentation/arm/memory.txt b/Documentation/arm/memory.txt index 771d48d3b335..208a2d465b92 100644 --- a/Documentation/arm/memory.txt +++ b/Documentation/arm/memory.txt @@ -51,15 +51,14 @@ ffc00000 ffefffff DMA memory mapping region. Memory returned ff000000 ffbfffff Reserved for future expansion of DMA mapping region. -VMALLOC_END feffffff Free for platform use, recommended. - VMALLOC_END must be aligned to a 2MB - boundary. - VMALLOC_START VMALLOC_END-1 vmalloc() / ioremap() space. Memory returned by vmalloc/ioremap will be dynamically placed in this region. - VMALLOC_START may be based upon the value - of the high_memory variable. + Machine specific static mappings are also + located here through iotable_init(). + VMALLOC_START is based upon the value + of the high_memory variable, and VMALLOC_END + is equal to 0xff000000. PAGE_OFFSET high_memory-1 Kernel direct-mapped RAM region. This maps the platforms RAM, and typically diff --git a/Documentation/atomic_ops.txt b/Documentation/atomic_ops.txt index 3bd585b44927..27f2b21a9d5c 100644 --- a/Documentation/atomic_ops.txt +++ b/Documentation/atomic_ops.txt @@ -84,6 +84,93 @@ compiler optimizes the section accessing atomic_t variables. *** YOU HAVE BEEN WARNED! *** +Properly aligned pointers, longs, ints, and chars (and unsigned +equivalents) may be atomically loaded from and stored to in the same +sense as described for atomic_read() and atomic_set(). The ACCESS_ONCE() +macro should be used to prevent the compiler from using optimizations +that might otherwise optimize accesses out of existence on the one hand, +or that might create unsolicited accesses on the other. + +For example consider the following code: + + while (a > 0) + do_something(); + +If the compiler can prove that do_something() does not store to the +variable a, then the compiler is within its rights transforming this to +the following: + + tmp = a; + if (a > 0) + for (;;) + do_something(); + +If you don't want the compiler to do this (and you probably don't), then +you should use something like the following: + + while (ACCESS_ONCE(a) < 0) + do_something(); + +Alternatively, you could place a barrier() call in the loop. + +For another example, consider the following code: + + tmp_a = a; + do_something_with(tmp_a); + do_something_else_with(tmp_a); + +If the compiler can prove that do_something_with() does not store to the +variable a, then the compiler is within its rights to manufacture an +additional load as follows: + + tmp_a = a; + do_something_with(tmp_a); + tmp_a = a; + do_something_else_with(tmp_a); + +This could fatally confuse your code if it expected the same value +to be passed to do_something_with() and do_something_else_with(). + +The compiler would be likely to manufacture this additional load if +do_something_with() was an inline function that made very heavy use +of registers: reloading from variable a could save a flush to the +stack and later reload. To prevent the compiler from attacking your +code in this manner, write the following: + + tmp_a = ACCESS_ONCE(a); + do_something_with(tmp_a); + do_something_else_with(tmp_a); + +For a final example, consider the following code, assuming that the +variable a is set at boot time before the second CPU is brought online +and never changed later, so that memory barriers are not needed: + + if (a) + b = 9; + else + b = 42; + +The compiler is within its rights to manufacture an additional store +by transforming the above code into the following: + + b = 42; + if (a) + b = 9; + +This could come as a fatal surprise to other code running concurrently +that expected b to never have the value 42 if a was zero. To prevent +the compiler from doing this, write something like: + + if (a) + ACCESS_ONCE(b) = 9; + else + ACCESS_ONCE(b) = 42; + +Don't even -think- about doing this without proper use of memory barriers, +locks, or atomic operations if variable a can change at runtime! + +*** WARNING: ACCESS_ONCE() DOES NOT IMPLY A BARRIER! *** + Now, we move onto the atomic operation interfaces typically implemented with the help of assembly code. diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt index cc0ebc5241b3..4d8774f6f48a 100644 --- a/Documentation/cgroups/memory.txt +++ b/Documentation/cgroups/memory.txt @@ -44,8 +44,8 @@ Features: - oom-killer disable knob and oom-notifier - Root cgroup has no limit controls. - Kernel memory and Hugepages are not under control yet. We just manage - pages on LRU. To add more controls, we have to take care of performance. + Kernel memory support is work in progress, and the current version provides + basically functionality. (See Section 2.7) Brief summary of control files. @@ -72,6 +72,9 @@ Brief summary of control files. memory.oom_control # set/show oom controls. memory.numa_stat # show the number of memory usage per numa node + memory.kmem.tcp.limit_in_bytes # set/show hard limit for tcp buf memory + memory.kmem.tcp.usage_in_bytes # show current tcp buf memory allocation + 1. History The memory controller has a long history. A request for comments for the memory @@ -255,6 +258,27 @@ When oom event notifier is registered, event will be delivered. per-zone-per-cgroup LRU (cgroup's private LRU) is just guarded by zone->lru_lock, it has no lock of its own. +2.7 Kernel Memory Extension (CONFIG_CGROUP_MEM_RES_CTLR_KMEM) + +With the Kernel memory extension, the Memory Controller is able to limit +the amount of kernel memory used by the system. Kernel memory is fundamentally +different than user memory, since it can't be swapped out, which makes it +possible to DoS the system by consuming too much of this precious resource. + +Kernel memory limits are not imposed for the root cgroup. Usage for the root +cgroup may or may not be accounted. + +Currently no soft limit is implemented for kernel memory. It is future work +to trigger slab reclaim when those limits are reached. + +2.7.1 Current Kernel Memory resources accounted + +* sockets memory pressure: some sockets protocols have memory pressure +thresholds. The Memory Controller allows them to be controlled individually +per cgroup, instead of globally. + +* tcp memory pressure: sockets memory pressure for the tcp protocol. + 3. User Interface 0. Configuration diff --git a/Documentation/cgroups/net_prio.txt b/Documentation/cgroups/net_prio.txt new file mode 100644 index 000000000000..01b322635591 --- /dev/null +++ b/Documentation/cgroups/net_prio.txt @@ -0,0 +1,53 @@ +Network priority cgroup +------------------------- + +The Network priority cgroup provides an interface to allow an administrator to +dynamically set the priority of network traffic generated by various +applications + +Nominally, an application would set the priority of its traffic via the +SO_PRIORITY socket option. This however, is not always possible because: + +1) The application may not have been coded to set this value +2) The priority of application traffic is often a site-specific administrative + decision rather than an application defined one. + +This cgroup allows an administrator to assign a process to a group which defines +the priority of egress traffic on a given interface. Network priority groups can +be created by first mounting the cgroup filesystem. + +# mount -t cgroup -onet_prio none /sys/fs/cgroup/net_prio + +With the above step, the initial group acting as the parent accounting group +becomes visible at '/sys/fs/cgroup/net_prio'. This group includes all tasks in +the system. '/sys/fs/cgroup/net_prio/tasks' lists the tasks in this cgroup. + +Each net_prio cgroup contains two files that are subsystem specific + +net_prio.prioidx +This file is read-only, and is simply informative. It contains a unique integer +value that the kernel uses as an internal representation of this cgroup. + +net_prio.ifpriomap +This file contains a map of the priorities assigned to traffic originating from +processes in this group and egressing the system on various interfaces. It +contains a list of tuples in the form <ifname priority>. Contents of this file +can be modified by echoing a string into the file using the same tuple format. +for example: + +echo "eth0 5" > /sys/fs/cgroups/net_prio/iscsi/net_prio.ifpriomap + +This command would force any traffic originating from processes belonging to the +iscsi net_prio cgroup and egressing on interface eth0 to have the priority of +said traffic set to the value 5. The parent accounting group also has a +writeable 'net_prio.ifpriomap' file that can be used to set a system default +priority. + +Priorities are set immediately prior to queueing a frame to the device +queueing discipline (qdisc) so priorities will be assigned prior to the hardware +queue selection being made. + +One usage for the net_prio cgroup is with mqprio qdisc allowing application +traffic to be steered to hardware/driver based traffic classes. These mappings +can then be managed by administrators or other networking protocols such as +DCBX. diff --git a/Documentation/development-process/5.Posting b/Documentation/development-process/5.Posting index 903a2546f138..8a48c9b62864 100644 --- a/Documentation/development-process/5.Posting +++ b/Documentation/development-process/5.Posting @@ -271,10 +271,10 @@ copies should go to: the linux-kernel list. - If you are fixing a bug, think about whether the fix should go into the - next stable update. If so, stable@kernel.org should get a copy of the - patch. Also add a "Cc: stable@kernel.org" to the tags within the patch - itself; that will cause the stable team to get a notification when your - fix goes into the mainline. + next stable update. If so, stable@vger.kernel.org should get a copy of + the patch. Also add a "Cc: stable@vger.kernel.org" to the tags within + the patch itself; that will cause the stable team to get a notification + when your fix goes into the mainline. When selecting recipients for a patch, it is good to have an idea of who you think will eventually accept the patch and get it merged. While it diff --git a/Documentation/devicetree/bindings/arm/gic.txt b/Documentation/devicetree/bindings/arm/gic.txt index 52916b4aa1fe..9b4b82a721b6 100644 --- a/Documentation/devicetree/bindings/arm/gic.txt +++ b/Documentation/devicetree/bindings/arm/gic.txt @@ -42,6 +42,10 @@ Optional - interrupts : Interrupt source of the parent interrupt controller. Only present on secondary GICs. +- cpu-offset : per-cpu offset within the distributor and cpu interface + regions, used when the GIC doesn't have banked registers. The offset is + cpu-offset * cpu-nr. + Example: intc: interrupt-controller@fff11000 { diff --git a/Documentation/devicetree/bindings/arm/vic.txt b/Documentation/devicetree/bindings/arm/vic.txt new file mode 100644 index 000000000000..266716b23437 --- /dev/null +++ b/Documentation/devicetree/bindings/arm/vic.txt @@ -0,0 +1,29 @@ +* ARM Vectored Interrupt Controller + +One or more Vectored Interrupt Controllers (VIC's) can be connected in an ARM +system for interrupt routing. For multiple controllers they can either be +nested or have the outputs wire-OR'd together. + +Required properties: + +- compatible : should be one of + "arm,pl190-vic" + "arm,pl192-vic" +- interrupt-controller : Identifies the node as an interrupt controller +- #interrupt-cells : The number of cells to define the interrupts. Must be 1 as + the VIC has no configuration options for interrupt sources. The cell is a u32 + and defines the interrupt number. +- reg : The register bank for the VIC. + +Optional properties: + +- interrupts : Interrupt source for parent controllers if the VIC is nested. + +Example: + + vic0: interrupt-controller@60000 { + compatible = "arm,pl192-vic"; + interrupt-controller; + #interrupt-cells = <1>; + reg = <0x60000 0x1000>; + }; diff --git a/Documentation/devicetree/bindings/i2c/i2c-designware.txt b/Documentation/devicetree/bindings/i2c/i2c-designware.txt new file mode 100644 index 000000000000..e42a2ee233e6 --- /dev/null +++ b/Documentation/devicetree/bindings/i2c/i2c-designware.txt @@ -0,0 +1,22 @@ +* Synopsys DesignWare I2C + +Required properties : + + - compatible : should be "snps,designware-i2c" + - reg : Offset and length of the register set for the device + - interrupts : <IRQ> where IRQ is the interrupt number. + +Recommended properties : + + - clock-frequency : desired I2C bus clock frequency in Hz. + +Example : + + i2c@f0000 { + #address-cells = <1>; + #size-cells = <0>; + compatible = "snps,designware-i2c"; + reg = <0xf0000 0x1000>; + interrupts = <11>; + clock-frequency = <400000>; + }; diff --git a/Documentation/devicetree/bindings/i2c/trivial-devices.txt b/Documentation/devicetree/bindings/i2c/trivial-devices.txt new file mode 100644 index 000000000000..1a85f986961b --- /dev/null +++ b/Documentation/devicetree/bindings/i2c/trivial-devices.txt @@ -0,0 +1,58 @@ +This is a list of trivial i2c devices that have simple device tree +bindings, consisting only of a compatible field, an address and +possibly an interrupt line. + +If a device needs more specific bindings, such as properties to +describe some aspect of it, there needs to be a specific binding +document for it just like any other devices. + + +Compatible Vendor / Chip +========== ============= +ad,ad7414 SMBus/I2C Digital Temperature Sensor in 6-Pin SOT with SMBus Alert and Over Temperature Pin +ad,adm9240 ADM9240: Complete System Hardware Monitor for uProcessor-Based Systems +adi,adt7461 +/-1C TDM Extended Temp Range I.C +adt7461 +/-1C TDM Extended Temp Range I.C +at,24c08 i2c serial eeprom (24cxx) +atmel,24c02 i2c serial eeprom (24cxx) +catalyst,24c32 i2c serial eeprom +dallas,ds1307 64 x 8, Serial, I2C Real-Time Clock +dallas,ds1338 I2C RTC with 56-Byte NV RAM +dallas,ds1339 I2C Serial Real-Time Clock +dallas,ds1340 I2C RTC with Trickle Charger +dallas,ds1374 I2C, 32-Bit Binary Counter Watchdog RTC with Trickle Charger and Reset Input/Output +dallas,ds1631 High-Precision Digital Thermometer +dallas,ds1682 Total-Elapsed-Time Recorder with Alarm +dallas,ds1775 Tiny Digital Thermometer and Thermostat +dallas,ds3232 Extremely Accurate I²C RTC with Integrated Crystal and SRAM +dallas,ds4510 CPU Supervisor with Nonvolatile Memory and Programmable I/O +dallas,ds75 Digital Thermometer and Thermostat +dialog,da9053 DA9053: flexible system level PMIC with multicore support +epson,rx8025 High-Stability. I2C-Bus INTERFACE REAL TIME CLOCK MODULE +epson,rx8581 I2C-BUS INTERFACE REAL TIME CLOCK MODULE +fsl,mag3110 MAG3110: Xtrinsic High Accuracy, 3D Magnetometer +fsl,mc13892 MC13892: Power Management Integrated Circuit (PMIC) for i.MX35/51 +fsl,mma8450 MMA8450Q: Xtrinsic Low-power, 3-axis Xtrinsic Accelerometer +fsl,mpr121 MPR121: Proximity Capacitive Touch Sensor Controller +fsl,sgtl5000 SGTL5000: Ultra Low-Power Audio Codec +maxim,ds1050 5 Bit Programmable, Pulse-Width Modulator +maxim,max1237 Low-Power, 4-/12-Channel, 2-Wire Serial, 12-Bit ADCs +maxim,max6625 9-Bit/12-Bit Temperature Sensors with I²C-Compatible Serial Interface +mc,rv3029c2 Real Time Clock Module with I2C-Bus +national,lm75 I2C TEMP SENSOR +national,lm80 Serial Interface ACPI-Compatible Microprocessor System Hardware Monitor +national,lm92 ±0.33°C Accurate, 12-Bit + Sign Temperature Sensor and Thermal Window Comparator with Two-Wire Interface +nxp,pca9556 Octal SMBus and I2C registered interface +nxp,pca9557 8-bit I2C-bus and SMBus I/O port with reset +nxp,pcf8563 Real-time clock/calendar +ovti,ov5642 OV5642: Color CMOS QSXGA (5-megapixel) Image Sensor with OmniBSI and Embedded TrueFocus +pericom,pt7c4338 Real-time Clock Module +plx,pex8648 48-Lane, 12-Port PCI Express Gen 2 (5.0 GT/s) Switch +ramtron,24c64 i2c serial eeprom (24cxx) +ricoh,rs5c372a I2C bus SERIAL INTERFACE REAL-TIME CLOCK IC +samsung,24ad0xd1 S524AD0XF1 (128K/256K-bit Serial EEPROM for Low Power) +st-micro,24c256 i2c serial eeprom (24cxx) +stm,m41t00 Serial Access TIMEKEEPER +stm,m41t62 Serial real-time clock (RTC) with alarm +stm,m41t80 M41T80 - SERIAL ACCESS RTC WITH ALARMS +ti,tsc2003 I2C Touch-Screen Controller diff --git a/Documentation/devicetree/bindings/net/calxeda-xgmac.txt b/Documentation/devicetree/bindings/net/calxeda-xgmac.txt new file mode 100644 index 000000000000..411727a3f82d --- /dev/null +++ b/Documentation/devicetree/bindings/net/calxeda-xgmac.txt @@ -0,0 +1,15 @@ +* Calxeda Highbank 10Gb XGMAC Ethernet + +Required properties: +- compatible : Should be "calxeda,hb-xgmac" +- reg : Address and length of the register set for the device +- interrupts : Should contain 3 xgmac interrupts. The 1st is main interrupt. + The 2nd is pwr mgt interrupt. The 3rd is low power state interrupt. + +Example: + +ethernet@fff50000 { + compatible = "calxeda,hb-xgmac"; + reg = <0xfff50000 0x1000>; + interrupts = <0 77 4 0 78 4 0 79 4>; +}; diff --git a/Documentation/devicetree/bindings/net/can/cc770.txt b/Documentation/devicetree/bindings/net/can/cc770.txt new file mode 100644 index 000000000000..77027bf6460a --- /dev/null +++ b/Documentation/devicetree/bindings/net/can/cc770.txt @@ -0,0 +1,53 @@ +Memory mapped Bosch CC770 and Intel AN82527 CAN controller + +Note: The CC770 is a CAN controller from Bosch, which is 100% +compatible with the old AN82527 from Intel, but with "bugs" being fixed. + +Required properties: + +- compatible : should be "bosch,cc770" for the CC770 and "intc,82527" + for the AN82527. + +- reg : should specify the chip select, address offset and size required + to map the registers of the controller. The size is usually 0x80. + +- interrupts : property with a value describing the interrupt source + (number and sensitivity) required for the controller. + +Optional properties: + +- bosch,external-clock-frequency : frequency of the external oscillator + clock in Hz. Note that the internal clock frequency used by the + controller is half of that value. If not specified, a default + value of 16000000 (16 MHz) is used. + +- bosch,clock-out-frequency : slock frequency in Hz on the CLKOUT pin. + If not specified or if the specified value is 0, the CLKOUT pin + will be disabled. + +- bosch,slew-rate : slew rate of the CLKOUT signal. If not specified, + a resonable value will be calculated. + +- bosch,disconnect-rx0-input : see data sheet. + +- bosch,disconnect-rx1-input : see data sheet. + +- bosch,disconnect-tx1-output : see data sheet. + +- bosch,polarity-dominant : see data sheet. + +- bosch,divide-memory-clock : see data sheet. + +- bosch,iso-low-speed-mux : see data sheet. + +For further information, please have a look to the CC770 or AN82527. + +Examples: + +can@3,100 { + compatible = "bosch,cc770"; + reg = <3 0x100 0x80>; + interrupts = <2 0>; + interrupt-parent = <&mpic>; + bosch,external-clock-frequency = <16000000>; +}; diff --git a/Documentation/devicetree/bindings/powerpc/fsl/srio-rmu.txt b/Documentation/devicetree/bindings/powerpc/fsl/srio-rmu.txt new file mode 100644 index 000000000000..b9a8a2bcfae7 --- /dev/null +++ b/Documentation/devicetree/bindings/powerpc/fsl/srio-rmu.txt @@ -0,0 +1,163 @@ +Message unit node: + +For SRIO controllers that implement the message unit as part of the controller +this node is required. For devices with RMAN this node should NOT exist. The +node is composed of three types of sub-nodes ("fsl-srio-msg-unit", +"fsl-srio-dbell-unit" and "fsl-srio-port-write-unit"). + +See srio.txt for more details about generic SRIO controller details. + + - compatible + Usage: required + Value type: <string> + Definition: Must include "fsl,srio-rmu-vX.Y", "fsl,srio-rmu". + + The version X.Y should match the general SRIO controller's IP Block + revision register's Major(X) and Minor (Y) value. + + - reg + Usage: required + Value type: <prop-encoded-array> + Definition: A standard property. Specifies the physical address and + length of the SRIO configuration registers for message units + and doorbell units. + + - fsl,liodn + Usage: optional-but-recommended (for devices with PAMU) + Value type: <prop-encoded-array> + Definition: The logical I/O device number for the PAMU (IOMMU) to be + correctly configured for SRIO accesses. The property should + not exist on devices that do not support PAMU. + + The LIODN value is associated with all RMU transactions + (msg-unit, doorbell, port-write). + +Sub-Nodes for RMU: The RMU node is composed of multiple sub-nodes that +correspond to the actual sub-controllers in the RMU. The manual for a given +SoC will detail which and how many of these sub-controllers are implemented. + +Message Unit: + + - compatible + Usage: required + Value type: <string> + Definition: Must include "fsl,srio-msg-unit-vX.Y", "fsl,srio-msg-unit". + + The version X.Y should match the general SRIO controller's IP Block + revision register's Major(X) and Minor (Y) value. + + - reg + Usage: required + Value type: <prop-encoded-array> + Definition: A standard property. Specifies the physical address and + length of the SRIO configuration registers for message units + and doorbell units. + + - interrupts + Usage: required + Value type: <prop_encoded-array> + Definition: Specifies the interrupts generated by this device. The + value of the interrupts property consists of one interrupt + specifier. The format of the specifier is defined by the + binding document describing the node's interrupt parent. + + A pair of IRQs are specified in this property. The first + element is associated with the transmit (TX) interrupt and the + second element is associated with the receive (RX) interrupt. + +Doorbell Unit: + + - compatible + Usage: required + Value type: <string> + Definition: Must include: + "fsl,srio-dbell-unit-vX.Y", "fsl,srio-dbell-unit" + + The version X.Y should match the general SRIO controller's IP Block + revision register's Major(X) and Minor (Y) value. + + - reg + Usage: required + Value type: <prop-encoded-array> + Definition: A standard property. Specifies the physical address and + length of the SRIO configuration registers for message units + and doorbell units. + + - interrupts + Usage: required + Value type: <prop_encoded-array> + Definition: Specifies the interrupts generated by this device. The + value of the interrupts property consists of one interrupt + specifier. The format of the specifier is defined by the + binding document describing the node's interrupt parent. + + A pair of IRQs are specified in this property. The first + element is associated with the transmit (TX) interrupt and the + second element is associated with the receive (RX) interrupt. + +Port-Write Unit: + + - compatible + Usage: required + Value type: <string> + Definition: Must include: + "fsl,srio-port-write-unit-vX.Y", "fsl,srio-port-write-unit" + + The version X.Y should match the general SRIO controller's IP Block + revision register's Major(X) and Minor (Y) value. + + - reg + Usage: required + Value type: <prop-encoded-array> + Definition: A standard property. Specifies the physical address and + length of the SRIO configuration registers for message units + and doorbell units. + + - interrupts + Usage: required + Value type: <prop_encoded-array> + Definition: Specifies the interrupts generated by this device. The + value of the interrupts property consists of one interrupt + specifier. The format of the specifier is defined by the + binding document describing the node's interrupt parent. + + A single IRQ that handles port-write conditions is + specified by this property. (Typically shared with error). + + Note: All other standard properties (see the ePAPR) are allowed + but are optional. + +Example: + rmu: rmu@d3000 { + compatible = "fsl,srio-rmu"; + reg = <0xd3000 0x400>; + ranges = <0x0 0xd3000 0x400>; + fsl,liodn = <0xc8>; + + message-unit@0 { + compatible = "fsl,srio-msg-unit"; + reg = <0x0 0x100>; + interrupts = < + 60 2 0 0 /* msg1_tx_irq */ + 61 2 0 0>;/* msg1_rx_irq */ + }; + message-unit@100 { + compatible = "fsl,srio-msg-unit"; + reg = <0x100 0x100>; + interrupts = < + 62 2 0 0 /* msg2_tx_irq */ + 63 2 0 0>;/* msg2_rx_irq */ + }; + doorbell-unit@400 { + compatible = "fsl,srio-dbell-unit"; + reg = <0x400 0x80>; + interrupts = < + 56 2 0 0 /* bell_outb_irq */ + 57 2 0 0>;/* bell_inb_irq */ + }; + port-write-unit@4e0 { + compatible = "fsl,srio-port-write-unit"; + reg = <0x4e0 0x20>; + interrupts = <16 2 1 11>; + }; + }; diff --git a/Documentation/devicetree/bindings/powerpc/fsl/srio.txt b/Documentation/devicetree/bindings/powerpc/fsl/srio.txt new file mode 100644 index 000000000000..b039bcbee134 --- /dev/null +++ b/Documentation/devicetree/bindings/powerpc/fsl/srio.txt @@ -0,0 +1,103 @@ +* Freescale Serial RapidIO (SRIO) Controller + +RapidIO port node: +Properties: + - compatible + Usage: required + Value type: <string> + Definition: Must include "fsl,srio" for IP blocks with IP Block + Revision Register (SRIO IPBRR1) Major ID equal to 0x01c0. + + Optionally, a compatiable string of "fsl,srio-vX.Y" where X is Major + version in IP Block Revision Register and Y is Minor version. If this + compatiable is provided it should be ordered before "fsl,srio". + + - reg + Usage: required + Value type: <prop-encoded-array> + Definition: A standard property. Specifies the physical address and + length of the SRIO configuration registers. The size should + be set to 0x11000. + + - interrupts + Usage: required + Value type: <prop_encoded-array> + Definition: Specifies the interrupts generated by this device. The + value of the interrupts property consists of one interrupt + specifier. The format of the specifier is defined by the + binding document describing the node's interrupt parent. + + A single IRQ that handles error conditions is specified by this + property. (Typically shared with port-write). + + - fsl,srio-rmu-handle: + Usage: required if rmu node is defined + Value type: <phandle> + Definition: A single <phandle> value that points to the RMU. + (See srio-rmu.txt for more details on RMU node binding) + +Port Child Nodes: There should a port child node for each port that exists in +the controller. The ports are numbered starting at one (1) and should have +the following properties: + + - cell-index + Usage: required + Value type: <u32> + Definition: A standard property. Matches the port id. + + - ranges + Usage: required if local access windows preset + Value type: <prop-encoded-array> + Definition: A standard property. Utilized to describe the memory mapped + IO space utilized by the controller. This corresponds to the + setting of the local access windows that are targeted to this + SRIO port. + + - fsl,liodn + Usage: optional-but-recommended (for devices with PAMU) + Value type: <prop-encoded-array> + Definition: The logical I/O device number for the PAMU (IOMMU) to be + correctly configured for SRIO accesses. The property should + not exist on devices that do not support PAMU. + + For HW (ie, the P4080) that only supports a LIODN for both + memory and maintenance transactions then a single LIODN is + represented in the property for both transactions. + + For HW (ie, the P304x/P5020, etc) that supports an LIODN for + memory transactions and a unique LIODN for maintenance + transactions then a pair of LIODNs are represented in the + property. Within the pair, the first element represents the + LIODN associated with memory transactions and the second element + represents the LIODN associated with maintenance transactions + for the port. + +Note: All other standard properties (see ePAPR) are allowed but are optional. + +Example: + + rapidio: rapidio@ffe0c0000 { + #address-cells = <2>; + #size-cells = <2>; + reg = <0xf 0xfe0c0000 0 0x11000>; + compatible = "fsl,srio"; + interrupts = <16 2 1 11>; /* err_irq */ + fsl,srio-rmu-handle = <&rmu>; + ranges; + + port1 { + cell-index = <1>; + #address-cells = <2>; + #size-cells = <2>; + fsl,liodn = <34>; + ranges = <0 0 0xc 0x20000000 0 0x10000000>; + }; + + port2 { + cell-index = <2>; + #address-cells = <2>; + #size-cells = <2>; + fsl,liodn = <48>; + ranges = <0 0 0xc 0x30000000 0 0x10000000>; + }; + }; diff --git a/Documentation/devicetree/bindings/vendor-prefixes.txt b/Documentation/devicetree/bindings/vendor-prefixes.txt index 874921e97802..18626965159e 100644 --- a/Documentation/devicetree/bindings/vendor-prefixes.txt +++ b/Documentation/devicetree/bindings/vendor-prefixes.txt @@ -8,7 +8,9 @@ amcc Applied Micro Circuits Corporation (APM, formally AMCC) apm Applied Micro Circuits Corporation (APM) arm ARM Ltd. atmel Atmel Corporation +cavium Cavium, Inc. chrp Common Hardware Reference Platform +cortina Cortina Systems, Inc. dallas Maxim Integrated Products (formerly Dallas Semiconductor) denx Denx Software Engineering epson Seiko Epson Corp. @@ -36,6 +38,7 @@ schindler Schindler sil Silicon Image simtek sirf SiRF Technology, Inc. +st STMicroelectronics stericsson ST-Ericsson ti Texas Instruments xlnx Xilinx diff --git a/Documentation/driver-model/devres.txt b/Documentation/driver-model/devres.txt index d79aead9418b..10c64c8a13d4 100644 --- a/Documentation/driver-model/devres.txt +++ b/Documentation/driver-model/devres.txt @@ -262,6 +262,7 @@ IOMAP devm_ioremap() devm_ioremap_nocache() devm_iounmap() + devm_request_and_ioremap() : checks resource, requests region, ioremaps pcim_iomap() pcim_iounmap() pcim_iomap_table() : array of mapped addresses indexed by BAR diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index 3d849122b5b1..33f7327d0451 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt @@ -263,8 +263,7 @@ Who: Ravikiran Thirumalai <kiran@scalex86.org> What: Code that is now under CONFIG_WIRELESS_EXT_SYSFS (in net/core/net-sysfs.c) -When: After the only user (hal) has seen a release with the patches - for enough time, probably some time in 2010. +When: 3.5 Why: Over 1K .text/.data size reduction, data is available in other ways (ioctls) Who: Johannes Berg <johannes@sipsolutions.net> diff --git a/Documentation/filesystems/debugfs.txt b/Documentation/filesystems/debugfs.txt index 9281a95d689f..6872c91bce35 100644 --- a/Documentation/filesystems/debugfs.txt +++ b/Documentation/filesystems/debugfs.txt @@ -97,7 +97,8 @@ A read on the resulting file will yield either Y (for non-zero values) or N, followed by a newline. If written to, it will accept either upper- or lower-case values, or 1 or 0. Any other input will be silently ignored. -Finally, a block of arbitrary binary data can be exported with: +Another option is exporting a block of arbitrary binary data, with +this structure and function: struct debugfs_blob_wrapper { void *data; @@ -115,6 +116,35 @@ can be used to export binary information, but there does not appear to be any code which does so in the mainline. Note that all files created with debugfs_create_blob() are read-only. +If you want to dump a block of registers (something that happens quite +often during development, even if little such code reaches mainline. +Debugfs offers two functions: one to make a registers-only file, and +another to insert a register block in the middle of another sequential +file. + + struct debugfs_reg32 { + char *name; + unsigned long offset; + }; + + struct debugfs_regset32 { + struct debugfs_reg32 *regs; + int nregs; + void __iomem *base; + }; + + struct dentry *debugfs_create_regset32(const char *name, mode_t mode, + struct dentry *parent, + struct debugfs_regset32 *regset); + + int debugfs_print_regs32(struct seq_file *s, struct debugfs_reg32 *regs, + int nregs, void __iomem *base, char *prefix); + +The "base" argument may be 0, but you may want to build the reg32 array +using __stringify, and a number of register names (macros) are actually +byte offsets over a base for the register block. + + There are a couple of other directory-oriented helper functions: struct dentry *debugfs_rename(struct dentry *old_dir, diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 81c287fad79d..e229769606f2 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -1885,6 +1885,11 @@ bytes respectively. Such letter suffixes can also be entirely omitted. arch_perfmon: [X86] Force use of architectural perfmon on Intel CPUs instead of the CPU specific event set. + timer: [X86] Force use of architectural NMI + timer mode (see also oprofile.timer + for generic hr timer mode) + [s390] Force legacy basic mode sampling + (report cpu_type "timer") oops=panic Always panic on oopses. Default is to just kill the process, but there is a small probability of @@ -2750,11 +2755,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted. functions are at fixed addresses, they make nice targets for exploits that can control RIP. - emulate Vsyscalls turn into traps and are emulated - reasonably safely. + emulate [default] Vsyscalls turn into traps and are + emulated reasonably safely. - native [default] Vsyscalls are native syscall - instructions. + native Vsyscalls are native syscall instructions. This is a little bit faster than trapping and makes a few dynamic recompilers work better than they would in emulation mode. diff --git a/Documentation/lockdep-design.txt b/Documentation/lockdep-design.txt index abf768c681e2..5dbc99c04f6e 100644 --- a/Documentation/lockdep-design.txt +++ b/Documentation/lockdep-design.txt @@ -221,3 +221,66 @@ when the chain is validated for the first time, is then put into a hash table, which hash-table can be checked in a lockfree manner. If the locking chain occurs again later on, the hash table tells us that we dont have to validate the chain again. + +Troubleshooting: +---------------- + +The validator tracks a maximum of MAX_LOCKDEP_KEYS number of lock classes. +Exceeding this number will trigger the following lockdep warning: + + (DEBUG_LOCKS_WARN_ON(id >= MAX_LOCKDEP_KEYS)) + +By default, MAX_LOCKDEP_KEYS is currently set to 8191, and typical +desktop systems have less than 1,000 lock classes, so this warning +normally results from lock-class leakage or failure to properly +initialize locks. These two problems are illustrated below: + +1. Repeated module loading and unloading while running the validator + will result in lock-class leakage. The issue here is that each + load of the module will create a new set of lock classes for + that module's locks, but module unloading does not remove old + classes (see below discussion of reuse of lock classes for why). + Therefore, if that module is loaded and unloaded repeatedly, + the number of lock classes will eventually reach the maximum. + +2. Using structures such as arrays that have large numbers of + locks that are not explicitly initialized. For example, + a hash table with 8192 buckets where each bucket has its own + spinlock_t will consume 8192 lock classes -unless- each spinlock + is explicitly initialized at runtime, for example, using the + run-time spin_lock_init() as opposed to compile-time initializers + such as __SPIN_LOCK_UNLOCKED(). Failure to properly initialize + the per-bucket spinlocks would guarantee lock-class overflow. + In contrast, a loop that called spin_lock_init() on each lock + would place all 8192 locks into a single lock class. + + The moral of this story is that you should always explicitly + initialize your locks. + +One might argue that the validator should be modified to allow +lock classes to be reused. However, if you are tempted to make this +argument, first review the code and think through the changes that would +be required, keeping in mind that the lock classes to be removed are +likely to be linked into the lock-dependency graph. This turns out to +be harder to do than to say. + +Of course, if you do run out of lock classes, the next thing to do is +to find the offending lock classes. First, the following command gives +you the number of lock classes currently in use along with the maximum: + + grep "lock-classes" /proc/lockdep_stats + +This command produces the following output on a modest system: + + lock-classes: 748 [max: 8191] + +If the number allocated (748 above) increases continually over time, +then there is likely a leak. The following command can be used to +identify the leaking lock classes: + + grep "BD" /proc/lockdep + +Run the command and save the output, then compare against the output from +a later run of this command to identify the leakers. This same output +can also help you find situations where runtime lock initialization has +been omitted. diff --git a/Documentation/networking/00-INDEX b/Documentation/networking/00-INDEX index bbce1215434a..9ad9ddeb384c 100644 --- a/Documentation/networking/00-INDEX +++ b/Documentation/networking/00-INDEX @@ -144,6 +144,8 @@ nfc.txt - The Linux Near Field Communication (NFS) subsystem. olympic.txt - IBM PCI Pit/Pit-Phy/Olympic Token Ring driver info. +openvswitch.txt + - Open vSwitch developer documentation. operstates.txt - Overview of network interface operational states. packet_mmap.txt diff --git a/Documentation/networking/batman-adv.txt b/Documentation/networking/batman-adv.txt index c86d03f18a5b..221ad0cdf11f 100644 --- a/Documentation/networking/batman-adv.txt +++ b/Documentation/networking/batman-adv.txt @@ -200,15 +200,16 @@ abled during run time. Following log_levels are defined: 0 - All debug output disabled 1 - Enable messages related to routing / flooding / broadcasting -2 - Enable route or tt entry added / changed / deleted -3 - Enable all messages +2 - Enable messages related to route added / changed / deleted +4 - Enable messages related to translation table operations +7 - Enable all messages The debug output can be changed at runtime using the file /sys/class/net/bat0/mesh/log_level. e.g. # echo 2 > /sys/class/net/bat0/mesh/log_level -will enable debug messages for when routes or TTs change. +will enable debug messages for when routes change. BATCTL diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt index 91df678fb7f8..080ad26690ae 100644 --- a/Documentation/networking/bonding.txt +++ b/Documentation/networking/bonding.txt @@ -196,6 +196,23 @@ or, for backwards compatibility, the option value. E.g., The parameters are as follows: +active_slave + + Specifies the new active slave for modes that support it + (active-backup, balance-alb and balance-tlb). Possible values + are the name of any currently enslaved interface, or an empty + string. If a name is given, the slave and its link must be up in order + to be selected as the new active slave. If an empty string is + specified, the current active slave is cleared, and a new active + slave is selected automatically. + + Note that this is only available through the sysfs interface. No module + parameter by this name exists. + + The normal value of this option is the name of the currently + active slave, or the empty string if there is no active slave or + the current mode does not use an active slave. + ad_select Specifies the 802.3ad aggregation selection logic to use. The diff --git a/Documentation/networking/ieee802154.txt b/Documentation/networking/ieee802154.txt index f41ea2405220..1dc1c24a7547 100644 --- a/Documentation/networking/ieee802154.txt +++ b/Documentation/networking/ieee802154.txt @@ -78,3 +78,30 @@ in software. This is currently WIP. See header include/net/mac802154.h and several drivers in drivers/ieee802154/. +6LoWPAN Linux implementation +============================ + +The IEEE 802.15.4 standard specifies an MTU of 128 bytes, yielding about 80 +octets of actual MAC payload once security is turned on, on a wireless link +with a link throughput of 250 kbps or less. The 6LoWPAN adaptation format +[RFC4944] was specified to carry IPv6 datagrams over such constrained links, +taking into account limited bandwidth, memory, or energy resources that are +expected in applications such as wireless Sensor Networks. [RFC4944] defines +a Mesh Addressing header to support sub-IP forwarding, a Fragmentation header +to support the IPv6 minimum MTU requirement [RFC2460], and stateless header +compression for IPv6 datagrams (LOWPAN_HC1 and LOWPAN_HC2) to reduce the +relatively large IPv6 and UDP headers down to (in the best case) several bytes. + +In Semptember 2011 the standard update was published - [RFC6282]. +It deprecates HC1 and HC2 compression and defines IPHC encoding format which is +used in this Linux implementation. + +All the code related to 6lowpan you may find in files: net/ieee802154/6lowpan.* + +To setup 6lowpan interface you need (busybox release > 1.17.0): +1. Add IEEE802.15.4 interface and initialize PANid; +2. Add 6lowpan interface by command like: + # ip link add link wpan0 name lowpan0 type lowpan +3. Set MAC (if needs): + # ip link set lowpan0 address de:ad:be:ef:ca:fe:ba:be +4. Bring up 'lowpan0' interface diff --git a/Documentation/networking/ifenslave.c b/Documentation/networking/ifenslave.c index 65968fbf1e49..ac5debb2f16c 100644 --- a/Documentation/networking/ifenslave.c +++ b/Documentation/networking/ifenslave.c @@ -539,12 +539,14 @@ static int if_getconfig(char *ifname) metric = 0; } else metric = ifr.ifr_metric; + printf("The result of SIOCGIFMETRIC is %d\n", metric); strcpy(ifr.ifr_name, ifname); if (ioctl(skfd, SIOCGIFMTU, &ifr) < 0) mtu = 0; else mtu = ifr.ifr_mtu; + printf("The result of SIOCGIFMTU is %d\n", mtu); strcpy(ifr.ifr_name, ifname); if (ioctl(skfd, SIOCGIFDSTADDR, &ifr) < 0) { diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index 589f2da5d545..ad3e80e17b4f 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -31,6 +31,16 @@ neigh/default/gc_thresh3 - INTEGER when using large numbers of interfaces and when communicating with large numbers of directly-connected peers. +neigh/default/unres_qlen_bytes - INTEGER + The maximum number of bytes which may be used by packets + queued for each unresolved address by other network layers. + (added in linux 3.3) + +neigh/default/unres_qlen - INTEGER + The maximum number of packets which may be queued for each + unresolved address by other network layers. + (deprecated in linux 3.3) : use unres_qlen_bytes instead. + mtu_expires - INTEGER Time, in seconds, that cached PMTU information is kept. @@ -165,6 +175,9 @@ tcp_congestion_control - STRING connections. The algorithm "reno" is always available, but additional choices may be available based on kernel configuration. Default is set as part of kernel configuration. + For passive connections, the listener congestion control choice + is inherited. + [see setsockopt(listenfd, SOL_TCP, TCP_CONGESTION, "name" ...) ] tcp_cookie_size - INTEGER Default size of TCP Cookie Transactions (TCPCT) option, that may be diff --git a/Documentation/networking/openvswitch.txt b/Documentation/networking/openvswitch.txt new file mode 100644 index 000000000000..b8a048b8df3a --- /dev/null +++ b/Documentation/networking/openvswitch.txt @@ -0,0 +1,195 @@ +Open vSwitch datapath developer documentation +============================================= + +The Open vSwitch kernel module allows flexible userspace control over +flow-level packet processing on selected network devices. It can be +used to implement a plain Ethernet switch, network device bonding, +VLAN processing, network access control, flow-based network control, +and so on. + +The kernel module implements multiple "datapaths" (analogous to +bridges), each of which can have multiple "vports" (analogous to ports +within a bridge). Each datapath also has associated with it a "flow +table" that userspace populates with "flows" that map from keys based +on packet headers and metadata to sets of actions. The most common +action forwards the packet to another vport; other actions are also +implemented. + +When a packet arrives on a vport, the kernel module processes it by +extracting its flow key and looking it up in the flow table. If there +is a matching flow, it executes the associated actions. If there is +no match, it queues the packet to userspace for processing (as part of +its processing, userspace will likely set up a flow to handle further +packets of the same type entirely in-kernel). + + +Flow key compatibility +---------------------- + +Network protocols evolve over time. New protocols become important +and existing protocols lose their prominence. For the Open vSwitch +kernel module to remain relevant, it must be possible for newer +versions to parse additional protocols as part of the flow key. It +might even be desirable, someday, to drop support for parsing +protocols that have become obsolete. Therefore, the Netlink interface +to Open vSwitch is designed to allow carefully written userspace +applications to work with any version of the flow key, past or future. + +To support this forward and backward compatibility, whenever the +kernel module passes a packet to userspace, it also passes along the +flow key that it parsed from the packet. Userspace then extracts its +own notion of a flow key from the packet and compares it against the +kernel-provided version: + + - If userspace's notion of the flow key for the packet matches the + kernel's, then nothing special is necessary. + + - If the kernel's flow key includes more fields than the userspace + version of the flow key, for example if the kernel decoded IPv6 + headers but userspace stopped at the Ethernet type (because it + does not understand IPv6), then again nothing special is + necessary. Userspace can still set up a flow in the usual way, + as long as it uses the kernel-provided flow key to do it. + + - If the userspace flow key includes more fields than the + kernel's, for example if userspace decoded an IPv6 header but + the kernel stopped at the Ethernet type, then userspace can + forward the packet manually, without setting up a flow in the + kernel. This case is bad for performance because every packet + that the kernel considers part of the flow must go to userspace, + but the forwarding behavior is correct. (If userspace can + determine that the values of the extra fields would not affect + forwarding behavior, then it could set up a flow anyway.) + +How flow keys evolve over time is important to making this work, so +the following sections go into detail. + + +Flow key format +--------------- + +A flow key is passed over a Netlink socket as a sequence of Netlink +attributes. Some attributes represent packet metadata, defined as any +information about a packet that cannot be extracted from the packet +itself, e.g. the vport on which the packet was received. Most +attributes, however, are extracted from headers within the packet, +e.g. source and destination addresses from Ethernet, IP, or TCP +headers. + +The <linux/openvswitch.h> header file defines the exact format of the +flow key attributes. For informal explanatory purposes here, we write +them as comma-separated strings, with parentheses indicating arguments +and nesting. For example, the following could represent a flow key +corresponding to a TCP packet that arrived on vport 1: + + in_port(1), eth(src=e0:91:f5:21:d0:b2, dst=00:02:e3:0f:80:a4), + eth_type(0x0800), ipv4(src=172.16.0.20, dst=172.18.0.52, proto=17, tos=0, + frag=no), tcp(src=49163, dst=80) + +Often we ellipsize arguments not important to the discussion, e.g.: + + in_port(1), eth(...), eth_type(0x0800), ipv4(...), tcp(...) + + +Basic rule for evolving flow keys +--------------------------------- + +Some care is needed to really maintain forward and backward +compatibility for applications that follow the rules listed under +"Flow key compatibility" above. + +The basic rule is obvious: + + ------------------------------------------------------------------ + New network protocol support must only supplement existing flow + key attributes. It must not change the meaning of already defined + flow key attributes. + ------------------------------------------------------------------ + +This rule does have less-obvious consequences so it is worth working +through a few examples. Suppose, for example, that the kernel module +did not already implement VLAN parsing. Instead, it just interpreted +the 802.1Q TPID (0x8100) as the Ethertype then stopped parsing the +packet. The flow key for any packet with an 802.1Q header would look +essentially like this, ignoring metadata: + + eth(...), eth_type(0x8100) + +Naively, to add VLAN support, it makes sense to add a new "vlan" flow +key attribute to contain the VLAN tag, then continue to decode the +encapsulated headers beyond the VLAN tag using the existing field +definitions. With this change, an TCP packet in VLAN 10 would have a +flow key much like this: + + eth(...), vlan(vid=10, pcp=0), eth_type(0x0800), ip(proto=6, ...), tcp(...) + +But this change would negatively affect a userspace application that +has not been updated to understand the new "vlan" flow key attribute. +The application could, following the flow compatibility rules above, +ignore the "vlan" attribute that it does not understand and therefore +assume that the flow contained IP packets. This is a bad assumption +(the flow only contains IP packets if one parses and skips over the +802.1Q header) and it could cause the application's behavior to change +across kernel versions even though it follows the compatibility rules. + +The solution is to use a set of nested attributes. This is, for +example, why 802.1Q support uses nested attributes. A TCP packet in +VLAN 10 is actually expressed as: + + eth(...), eth_type(0x8100), vlan(vid=10, pcp=0), encap(eth_type(0x0800), + ip(proto=6, ...), tcp(...))) + +Notice how the "eth_type", "ip", and "tcp" flow key attributes are +nested inside the "encap" attribute. Thus, an application that does +not understand the "vlan" key will not see either of those attributes +and therefore will not misinterpret them. (Also, the outer eth_type +is still 0x8100, not changed to 0x0800.) + +Handling malformed packets +-------------------------- + +Don't drop packets in the kernel for malformed protocol headers, bad +checksums, etc. This would prevent userspace from implementing a +simple Ethernet switch that forwards every packet. + +Instead, in such a case, include an attribute with "empty" content. +It doesn't matter if the empty content could be valid protocol values, +as long as those values are rarely seen in practice, because userspace +can always forward all packets with those values to userspace and +handle them individually. + +For example, consider a packet that contains an IP header that +indicates protocol 6 for TCP, but which is truncated just after the IP +header, so that the TCP header is missing. The flow key for this +packet would include a tcp attribute with all-zero src and dst, like +this: + + eth(...), eth_type(0x0800), ip(proto=6, ...), tcp(src=0, dst=0) + +As another example, consider a packet with an Ethernet type of 0x8100, +indicating that a VLAN TCI should follow, but which is truncated just +after the Ethernet type. The flow key for this packet would include +an all-zero-bits vlan and an empty encap attribute, like this: + + eth(...), eth_type(0x8100), vlan(0), encap() + +Unlike a TCP packet with source and destination ports 0, an +all-zero-bits VLAN TCI is not that rare, so the CFI bit (aka +VLAN_TAG_PRESENT inside the kernel) is ordinarily set in a vlan +attribute expressly to allow this situation to be distinguished. +Thus, the flow key in this second example unambiguously indicates a +missing or malformed VLAN TCI. + +Other rules +----------- + +The other rules for flow keys are much less subtle: + + - Duplicate attributes are not allowed at a given nesting level. + + - Ordering of attributes is not significant. + + - When the kernel sends a given flow key to userspace, it always + composes it the same way. This allows userspace to hash and + compare entire flow keys that it may not be able to fully + interpret. diff --git a/Documentation/networking/packet_mmap.txt b/Documentation/networking/packet_mmap.txt index 4acea6603720..1c08a4b0981f 100644 --- a/Documentation/networking/packet_mmap.txt +++ b/Documentation/networking/packet_mmap.txt @@ -155,7 +155,7 @@ As capture, each frame contains two parts: /* fill sockaddr_ll struct to prepare binding */ my_addr.sll_family = AF_PACKET; - my_addr.sll_protocol = ETH_P_ALL; + my_addr.sll_protocol = htons(ETH_P_ALL); my_addr.sll_ifindex = s_ifr.ifr_ifindex; /* bind socket to eth0 */ diff --git a/Documentation/networking/scaling.txt b/Documentation/networking/scaling.txt index a177de21d28e..579994afbe06 100644 --- a/Documentation/networking/scaling.txt +++ b/Documentation/networking/scaling.txt @@ -208,7 +208,7 @@ The counter in rps_dev_flow_table values records the length of the current CPU's backlog when a packet in this flow was last enqueued. Each backlog queue has a head counter that is incremented on dequeue. A tail counter is computed as head counter + queue length. In other words, the counter -in rps_dev_flow_table[i] records the last element in flow i that has +in rps_dev_flow[i] records the last element in flow i that has been enqueued onto the currently designated CPU for flow i (of course, entry i is actually selected by hash and multiple flows may hash to the same entry i). @@ -224,7 +224,7 @@ following is true: - The current CPU's queue head counter >= the recorded tail counter value in rps_dev_flow[i] -- The current CPU is unset (equal to NR_CPUS) +- The current CPU is unset (equal to RPS_NO_CPU) - The current CPU is offline After this check, the packet is sent to the (possibly updated) current @@ -235,7 +235,7 @@ CPU. ==== RFS Configuration -RFS is only available if the kconfig symbol CONFIG_RFS is enabled (on +RFS is only available if the kconfig symbol CONFIG_RPS is enabled (on by default for SMP). The functionality remains disabled until explicitly configured. The number of entries in the global flow table is set through: @@ -258,7 +258,7 @@ For a single queue device, the rps_flow_cnt value for the single queue would normally be configured to the same value as rps_sock_flow_entries. For a multi-queue device, the rps_flow_cnt for each queue might be configured as rps_sock_flow_entries / N, where N is the number of -queues. So for instance, if rps_flow_entries is set to 32768 and there +queues. So for instance, if rps_sock_flow_entries is set to 32768 and there are 16 configured receive queues, rps_flow_cnt for each queue might be configured as 2048. diff --git a/Documentation/networking/stmmac.txt b/Documentation/networking/stmmac.txt index 8d67980fabe8..d0aeeadd264b 100644 --- a/Documentation/networking/stmmac.txt +++ b/Documentation/networking/stmmac.txt @@ -4,14 +4,16 @@ Copyright (C) 2007-2010 STMicroelectronics Ltd Author: Giuseppe Cavallaro <peppe.cavallaro@st.com> This is the driver for the MAC 10/100/1000 on-chip Ethernet controllers -(Synopsys IP blocks); it has been fully tested on STLinux platforms. +(Synopsys IP blocks). Currently this network device driver is for all STM embedded MAC/GMAC -(i.e. 7xxx/5xxx SoCs) and it's known working on other platforms i.e. ARM SPEAr. +(i.e. 7xxx/5xxx SoCs), SPEAr (arm), Loongson1B (mips) and XLINX XC2V3000 +FF1152AMT0221 D1215994A VIRTEX FPGA board. -DWC Ether MAC 10/100/1000 Universal version 3.41a and DWC Ether MAC 10/100 -Universal version 4.0 have been used for developing the first code -implementation. +DWC Ether MAC 10/100/1000 Universal version 3.60a (and older) and DWC Ether MAC 10/100 +Universal version 4.0 have been used for developing this driver. + +This driver supports both the platform bus and PCI. Please, for more information also visit: www.stlinux.com @@ -277,5 +279,5 @@ In fact, these can generate an huge amount of debug messages. 6) TODO: o XGMAC is not supported. - o Review the timer optimisation code to use an embedded device that will be - available in new chip generations. + o Add the EEE - Energy Efficient Ethernet + o Add the PTP - precision time protocol diff --git a/Documentation/networking/team.txt b/Documentation/networking/team.txt new file mode 100644 index 000000000000..5a013686b9ea --- /dev/null +++ b/Documentation/networking/team.txt @@ -0,0 +1,2 @@ +Team devices are driven from userspace via libteam library which is here: + https://github.com/jpirko/libteam diff --git a/Documentation/trace/events.txt b/Documentation/trace/events.txt index b510564aac7e..bb24c2a0e870 100644 --- a/Documentation/trace/events.txt +++ b/Documentation/trace/events.txt @@ -191,8 +191,6 @@ And for string fields they are: Currently, only exact string matches are supported. -Currently, the maximum number of predicates in a filter is 16. - 5.2 Setting filters ------------------- |