author | Paul Mackerras <paulus@samba.org> | 2008-01-31 03:25:51 +0300 |
---|---|---|
committer | Paul Mackerras <paulus@samba.org> | 2008-01-31 03:25:51 +0300 |
commit | bd45ac0c5daae35e7c71138172e63df5cf644cf6 (patch) | |
tree | 5eb5a599bf6a9d7a8a34e802db932aa9e9555de4 /Documentation | |
parent | 4eece4ccf997c0e6d8fdad3d842e37b16b8d705f (diff) | |
parent | 5bdeae46be6dfe9efa44a548bd622af325f4bdb4 (diff) | |
Merge branch 'linux-2.6'
Diffstat (limited to 'Documentation')
70 files changed, 5460 insertions, 665 deletions
diff --git a/Documentation/DocBook/Makefile b/Documentation/DocBook/Makefile index 4953bc258729..6a0ad4715e9f 100644 --- a/Documentation/DocBook/Makefile +++ b/Documentation/DocBook/Makefile @@ -11,7 +11,7 @@ DOCBOOKS := wanbook.xml z8530book.xml mcabook.xml videobook.xml \ procfs-guide.xml writing_usb_driver.xml \ kernel-api.xml filesystems.xml lsm.xml usb.xml \ gadget.xml libata.xml mtdnand.xml librs.xml rapidio.xml \ - genericirq.xml s390-drivers.xml uio-howto.xml + genericirq.xml s390-drivers.xml uio-howto.xml scsi.xml ### # The build process is as follows (targets): diff --git a/Documentation/DocBook/kernel-api.tmpl b/Documentation/DocBook/kernel-api.tmpl index aa38cc5692a0..77436d735013 100644 --- a/Documentation/DocBook/kernel-api.tmpl +++ b/Documentation/DocBook/kernel-api.tmpl @@ -419,7 +419,13 @@ X!Edrivers/pnp/system.c <chapter id="blkdev"> <title>Block Devices</title> -!Eblock/ll_rw_blk.c +!Eblock/blk-core.c +!Eblock/blk-map.c +!Iblock/blk-sysfs.c +!Eblock/blk-settings.c +!Eblock/blk-exec.c +!Eblock/blk-barrier.c +!Eblock/blk-tag.c </chapter> <chapter id="chrdev"> diff --git a/Documentation/DocBook/s390-drivers.tmpl b/Documentation/DocBook/s390-drivers.tmpl index 254e769282a4..3d2f31b99dd9 100644 --- a/Documentation/DocBook/s390-drivers.tmpl +++ b/Documentation/DocBook/s390-drivers.tmpl @@ -116,6 +116,7 @@ !Iinclude/asm-s390/ccwdev.h !Edrivers/s390/cio/device.c !Edrivers/s390/cio/device_ops.c +!Edrivers/s390/cio/airq.c </sect1> <sect1 id="cmf"> <title>The channel-measurement facility</title> diff --git a/Documentation/DocBook/scsi.tmpl b/Documentation/DocBook/scsi.tmpl new file mode 100644 index 000000000000..f299ab182bbe --- /dev/null +++ b/Documentation/DocBook/scsi.tmpl @@ -0,0 +1,409 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN" + "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []> + +<book id="scsimid"> + <bookinfo> + <title>SCSI Interfaces Guide</title> + + <authorgroup> + 
<author> + <firstname>James</firstname> + <surname>Bottomley</surname> + <affiliation> + <address> + <email>James.Bottomley@steeleye.com</email> + </address> + </affiliation> + </author> + + <author> + <firstname>Rob</firstname> + <surname>Landley</surname> + <affiliation> + <address> + <email>rob@landley.net</email> + </address> + </affiliation> + </author> + + </authorgroup> + + <copyright> + <year>2007</year> + <holder>Linux Foundation</holder> + </copyright> + + <legalnotice> + <para> + This documentation is free software; you can redistribute + it and/or modify it under the terms of the GNU General Public + License version 2. + </para> + + <para> + This program is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied + warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. + For more details see the file COPYING in the source + distribution of Linux. + </para> + </legalnotice> + </bookinfo> + + <toc></toc> + + <chapter id="intro"> + <title>Introduction</title> + <sect1 id="protocol_vs_bus"> + <title>Protocol vs bus</title> + <para> + Once upon a time, the Small Computer Systems Interface defined both + a parallel I/O bus and a data protocol to connect a wide variety of + peripherals (disk drives, tape drives, modems, printers, scanners, + optical drives, test equipment, and medical devices) to a host + computer. + </para> + <para> + Although the old parallel (fast/wide/ultra) SCSI bus has largely + fallen out of use, the SCSI command set is more widely used than ever + to communicate with devices over a number of different busses. + </para> + <para> + The <ulink url='http://www.t10.org/scsi-3.htm'>SCSI protocol</ulink> + is a big-endian peer-to-peer packet based protocol. SCSI commands + are 6, 10, 12, or 16 bytes long, often followed by an associated data + payload. 
+ </para> + <para> + SCSI commands can be transported over just about any kind of bus, and + are the default protocol for storage devices attached to USB, SATA, + SAS, Fibre Channel, FireWire, and ATAPI devices. SCSI packets are + also commonly exchanged over Infiniband, + <ulink url='http://i2o.shadowconnect.com/faq.php'>I20</ulink>, TCP/IP + (<ulink url='http://en.wikipedia.org/wiki/ISCSI'>iSCSI</ulink>), even + <ulink url='http://cyberelk.net/tim/parport/parscsi.html'>Parallel + ports</ulink>. + </para> + </sect1> + <sect1 id="subsystem_design"> + <title>Design of the Linux SCSI subsystem</title> + <para> + The SCSI subsystem uses a three layer design, with upper, mid, and low + layers. Every operation involving the SCSI subsystem (such as reading + a sector from a disk) uses one driver at each of the 3 levels: one + upper layer driver, one lower layer driver, and the SCSI midlayer. + </para> + <para> + The SCSI upper layer provides the interface between userspace and the + kernel, in the form of block and char device nodes for I/O and + ioctl(). The SCSI lower layer contains drivers for specific hardware + devices. + </para> + <para> + In between is the SCSI mid-layer, analogous to a network routing + layer such as the IPv4 stack. The SCSI mid-layer routes a packet + based data protocol between the upper layer's /dev nodes and the + corresponding devices in the lower layer. It manages command queues, + provides error handling and power management functions, and responds + to ioctl() requests. + </para> + </sect1> + </chapter> + + <chapter id="upper_layer"> + <title>SCSI upper layer</title> + <para> + The upper layer supports the user-kernel interface by providing + device nodes. 
+ </para> + <sect1 id="sd"> + <title>sd (SCSI Disk)</title> + <para>sd (sd_mod.o)</para> +<!-- !Idrivers/scsi/sd.c --> + </sect1> + <sect1 id="sr"> + <title>sr (SCSI CD-ROM)</title> + <para>sr (sr_mod.o)</para> + </sect1> + <sect1 id="st"> + <title>st (SCSI Tape)</title> + <para>st (st.o)</para> + </sect1> + <sect1 id="sg"> + <title>sg (SCSI Generic)</title> + <para>sg (sg.o)</para> + </sect1> + <sect1 id="ch"> + <title>ch (SCSI Media Changer)</title> + <para>ch (ch.c)</para> + </sect1> + </chapter> + + <chapter id="mid_layer"> + <title>SCSI mid layer</title> + + <sect1 id="midlayer_implementation"> + <title>SCSI midlayer implementation</title> + <sect2 id="scsi_device.h"> + <title>include/scsi/scsi_device.h</title> + <para> + </para> +!Iinclude/scsi/scsi_device.h + </sect2> + + <sect2 id="scsi.c"> + <title>drivers/scsi/scsi.c</title> + <para>Main file for the SCSI midlayer.</para> +!Edrivers/scsi/scsi.c + </sect2> + <sect2 id="scsicam.c"> + <title>drivers/scsi/scsicam.c</title> + <para> + <ulink url='http://www.t10.org/ftp/t10/drafts/cam/cam-r12b.pdf'>SCSI + Common Access Method</ulink> support functions, for use with + HDIO_GETGEO, etc. + </para> +!Edrivers/scsi/scsicam.c + </sect2> + <sect2 id="scsi_error.c"> + <title>drivers/scsi/scsi_error.c</title> + <para>Common SCSI error/timeout handling routines.</para> +!Edrivers/scsi/scsi_error.c + </sect2> + <sect2 id="scsi_devinfo.c"> + <title>drivers/scsi/scsi_devinfo.c</title> + <para> + Manage scsi_dev_info_list, which tracks blacklisted and whitelisted + devices. + </para> +!Idrivers/scsi/scsi_devinfo.c + </sect2> + <sect2 id="scsi_ioctl.c"> + <title>drivers/scsi/scsi_ioctl.c</title> + <para> + Handle ioctl() calls for SCSI devices. + </para> +!Edrivers/scsi/scsi_ioctl.c + </sect2> + <sect2 id="scsi_lib.c"> + <title>drivers/scsi/scsi_lib.c</title> + <para> + SCSI queuing library. 
+ </para> +!Edrivers/scsi/scsi_lib.c + </sect2> + <sect2 id="scsi_lib_dma.c"> + <title>drivers/scsi/scsi_lib_dma.c</title> + <para> + SCSI library functions depending on DMA + (map and unmap scatter-gather lists). + </para> +!Edrivers/scsi/scsi_lib_dma.c + </sect2> + <sect2 id="scsi_module.c"> + <title>drivers/scsi/scsi_module.c</title> + <para> + The file drivers/scsi/scsi_module.c contains legacy support for + old-style host templates. It should never be used by any new driver. + </para> + </sect2> + <sect2 id="scsi_proc.c"> + <title>drivers/scsi/scsi_proc.c</title> + <para> + The functions in this file provide an interface between + the PROC file system and the SCSI device drivers + It is mainly used for debugging, statistics and to pass + information directly to the lowlevel driver. + + I.E. plumbing to manage /proc/scsi/* + </para> +!Idrivers/scsi/scsi_proc.c + </sect2> + <sect2 id="scsi_netlink.c"> + <title>drivers/scsi/scsi_netlink.c</title> + <para> + Infrastructure to provide async events from transports to userspace + via netlink, using a single NETLINK_SCSITRANSPORT protocol for all + transports. + + See <ulink url='http://marc.info/?l=linux-scsi&m=115507374832500&w=2'>the + original patch submission</ulink> for more details. + </para> +!Idrivers/scsi/scsi_netlink.c + </sect2> + <sect2 id="scsi_scan.c"> + <title>drivers/scsi/scsi_scan.c</title> + <para> + Scan a host to determine which (if any) devices are attached. + + The general scanning/probing algorithm is as follows, exceptions are + made to it depending on device specific flags, compilation options, + and global variable (boot or module load time) settings. + + A specific LUN is scanned via an INQUIRY command; if the LUN has a + device attached, a scsi_device is allocated and setup for it. + + For every id of every channel on the given host, start by scanning + LUN 0. Skip hosts that don't respond at all to a scan of LUN 0. 
+ Otherwise, if LUN 0 has a device attached, allocate and setup a + scsi_device for it. If target is SCSI-3 or up, issue a REPORT LUN, + and scan all of the LUNs returned by the REPORT LUN; else, + sequentially scan LUNs up until some maximum is reached, or a LUN is + seen that cannot have a device attached to it. + </para> +!Idrivers/scsi/scsi_scan.c + </sect2> + <sect2 id="scsi_sysctl.c"> + <title>drivers/scsi/scsi_sysctl.c</title> + <para> + Set up the sysctl entry: "/dev/scsi/logging_level" + (DEV_SCSI_LOGGING_LEVEL) which sets/returns scsi_logging_level. + </para> + </sect2> + <sect2 id="scsi_sysfs.c"> + <title>drivers/scsi/scsi_sysfs.c</title> + <para> + SCSI sysfs interface routines. + </para> +!Edrivers/scsi/scsi_sysfs.c + </sect2> + <sect2 id="hosts.c"> + <title>drivers/scsi/hosts.c</title> + <para> + mid to lowlevel SCSI driver interface + </para> +!Edrivers/scsi/hosts.c + </sect2> + <sect2 id="constants.c"> + <title>drivers/scsi/constants.c</title> + <para> + mid to lowlevel SCSI driver interface + </para> +!Edrivers/scsi/constants.c + </sect2> + </sect1> + + <sect1 id="Transport_classes"> + <title>Transport classes</title> + <para> + Transport classes are service libraries for drivers in the SCSI + lower layer, which expose transport attributes in sysfs. + </para> + <sect2 id="Fibre_Channel_transport"> + <title>Fibre Channel transport</title> + <para> + The file drivers/scsi/scsi_transport_fc.c defines transport attributes + for Fibre Channel. + </para> +!Edrivers/scsi/scsi_transport_fc.c + </sect2> + <sect2 id="iSCSI_transport"> + <title>iSCSI transport class</title> + <para> + The file drivers/scsi/scsi_transport_iscsi.c defines transport + attributes for the iSCSI class, which sends SCSI packets over TCP/IP + connections. 
+ </para> +!Edrivers/scsi/scsi_transport_iscsi.c + </sect2> + <sect2 id="SAS_transport"> + <title>Serial Attached SCSI (SAS) transport class</title> + <para> + The file drivers/scsi/scsi_transport_sas.c defines transport + attributes for Serial Attached SCSI, a variant of SATA aimed at + large high-end systems. + </para> + <para> + The SAS transport class contains common code to deal with SAS HBAs, + an aproximated representation of SAS topologies in the driver model, + and various sysfs attributes to expose these topologies and managment + interfaces to userspace. + </para> + <para> + In addition to the basic SCSI core objects this transport class + introduces two additional intermediate objects: The SAS PHY + as represented by struct sas_phy defines an "outgoing" PHY on + a SAS HBA or Expander, and the SAS remote PHY represented by + struct sas_rphy defines an "incoming" PHY on a SAS Expander or + end device. Note that this is purely a software concept, the + underlying hardware for a PHY and a remote PHY is the exactly + the same. + </para> + <para> + There is no concept of a SAS port in this code, users can see + what PHYs form a wide port based on the port_identifier attribute, + which is the same for all PHYs in a port. + </para> +!Edrivers/scsi/scsi_transport_sas.c + </sect2> + <sect2 id="SATA_transport"> + <title>SATA transport class</title> + <para> + The SATA transport is handled by libata, which has its own book of + documentation in this directory. + </para> + </sect2> + <sect2 id="SPI_transport"> + <title>Parallel SCSI (SPI) transport class</title> + <para> + The file drivers/scsi/scsi_transport_spi.c defines transport + attributes for traditional (fast/wide/ultra) SCSI busses. + </para> +!Edrivers/scsi/scsi_transport_spi.c + </sect2> + <sect2 id="SRP_transport"> + <title>SCSI RDMA (SRP) transport class</title> + <para> + The file drivers/scsi/scsi_transport_srp.c defines transport + attributes for SCSI over Remote Direct Memory Access. 
+ </para> +!Edrivers/scsi/scsi_transport_srp.c + </sect2> + </sect1> + + </chapter> + + <chapter id="lower_layer"> + <title>SCSI lower layer</title> + <sect1 id="hba_drivers"> + <title>Host Bus Adapter transport types</title> + <para> + Many modern device controllers use the SCSI command set as a protocol to + communicate with their devices through many different types of physical + connections. + </para> + <para> + In SCSI language a bus capable of carrying SCSI commands is + called a "transport", and a controller connecting to such a bus is + called a "host bus adapter" (HBA). + </para> + <sect2 id="scsi_debug.c"> + <title>Debug transport</title> + <para> + The file drivers/scsi/scsi_debug.c simulates a host adapter with a + variable number of disks (or disk like devices) attached, sharing a + common amount of RAM. Does a lot of checking to make sure that we are + not getting blocks mixed up, and panics the kernel if anything out of + the ordinary is seen. + </para> + <para> + To be more realistic, the simulated devices have the transport + attributes of SAS disks. + </para> + <para> + For documentation see + <ulink url='http://www.torque.net/sg/sdebug26.html'>http://www.torque.net/sg/sdebug26.html</ulink> + </para> +<!-- !Edrivers/scsi/scsi_debug.c --> + </sect2> + <sect2 id="todo"> + <title>todo</title> + <para>Parallel (fast/wide/ultra) SCSI, USB, SATA, + SAS, Fibre Channel, FireWire, ATAPI devices, Infiniband, + I20, iSCSI, Parallel ports, netlink... + </para> + </sect2> + </sect1> + </chapter> +</book> diff --git a/Documentation/DocBook/videobook.tmpl b/Documentation/DocBook/videobook.tmpl index b629da33951d..b3d93ee27693 100644 --- a/Documentation/DocBook/videobook.tmpl +++ b/Documentation/DocBook/videobook.tmpl @@ -96,7 +96,6 @@ static struct video_device my_radio { "My radio", VID_TYPE_TUNER, - VID_HARDWARE_MYRADIO, radio_open. 
radio_close, NULL, /* no read */ @@ -119,13 +118,6 @@ static struct video_device my_radio way to change channel so it is tuneable. </para> <para> - The VID_HARDWARE_ types are unique to each device. Numbers are assigned by - <email>alan@redhat.com</email> when device drivers are going to be released. Until then you - can pull a suitably large number out of your hat and use it. 10000 should be - safe for a very long time even allowing for the huge number of vendors - making new and different radio cards at the moment. - </para> - <para> We declare an open and close routine, but we do not need read or write, which are used to read and write video data to or from the card itself. As we have no read or write there is no poll function. @@ -844,7 +836,6 @@ static struct video_device my_camera "My Camera", VID_TYPE_OVERLAY|VID_TYPE_SCALES|\ VID_TYPE_CAPTURE|VID_TYPE_CHROMAKEY, - VID_HARDWARE_MYCAMERA, camera_open. camera_close, camera_read, /* no read */ diff --git a/Documentation/RCU/RTFP.txt b/Documentation/RCU/RTFP.txt index 6221464d1a7e..39ad8f56783a 100644 --- a/Documentation/RCU/RTFP.txt +++ b/Documentation/RCU/RTFP.txt @@ -9,8 +9,8 @@ The first thing resembling RCU was published in 1980, when Kung and Lehman [Kung80] recommended use of a garbage collector to defer destruction of nodes in a parallel binary search tree in order to simplify its implementation. This works well in environments that have garbage -collectors, but current production garbage collectors incur significant -read-side overhead. +collectors, but most production garbage collectors incur significant +overhead. In 1982, Manber and Ladner [Manber82,Manber84] recommended deferring destruction until all threads running at that time have terminated, again @@ -99,16 +99,25 @@ locking, reduces contention, reduces memory latency for readers, and parallelizes pipeline stalls and memory latency for writers. However, these techniques still impose significant read-side overhead in the form of memory barriers. 
Researchers at Sun worked along similar lines -in the same timeframe [HerlihyLM02,HerlihyLMS03]. These techniques -can be thought of as inside-out reference counts, where the count is -represented by the number of hazard pointers referencing a given data -structure (rather than the more conventional counter field within the -data structure itself). +in the same timeframe [HerlihyLM02]. These techniques can be thought +of as inside-out reference counts, where the count is represented by the +number of hazard pointers referencing a given data structure (rather than +the more conventional counter field within the data structure itself). + +By the same token, RCU can be thought of as a "bulk reference count", +where some form of reference counter covers all reference by a given CPU +or thread during a set timeframe. This timeframe is related to, but +not necessarily exactly the same as, an RCU grace period. In classic +RCU, the reference counter is the per-CPU bit in the "bitmask" field, +and each such bit covers all references that might have been made by +the corresponding CPU during the prior grace period. Of course, RCU +can be thought of in other terms as well. In 2003, the K42 group described how RCU could be used to create -hot-pluggable implementations of operating-system functions. Later that -year saw a paper describing an RCU implementation of System V IPC -[Arcangeli03], and an introduction to RCU in Linux Journal [McKenney03a]. +hot-pluggable implementations of operating-system functions [Appavoo03a]. +Later that year saw a paper describing an RCU implementation of System +V IPC [Arcangeli03], and an introduction to RCU in Linux Journal +[McKenney03a]. 
2004 has seen a Linux-Journal article on use of RCU in dcache [McKenney04a], a performance comparison of locking to RCU on several @@ -117,10 +126,19 @@ number of operating-system kernels [PaulEdwardMcKenneyPhD], a paper describing how to make RCU safe for soft-realtime applications [Sarma04c], and a paper describing SELinux performance with RCU [JamesMorris04b]. -2005 has seen further adaptation of RCU to realtime use, permitting +2005 brought further adaptation of RCU to realtime use, permitting preemption of RCU realtime critical sections [PaulMcKenney05a, PaulMcKenney05b]. +2006 saw the first best-paper award for an RCU paper [ThomasEHart2006a], +as well as further work on efficient implementations of preemptible +RCU [PaulEMcKenney2006b], but priority-boosting of RCU read-side critical +sections proved elusive. An RCU implementation permitting general +blocking in read-side critical sections appeared [PaulEMcKenney2006c], +Robert Olsson described an RCU-protected trie-hash combination +[RobertOlsson2006a]. + + Bibtex Entries @article{Kung80 @@ -203,6 +221,41 @@ Bibtex Entries ,Address="New Orleans, LA" } +@conference{Pu95a, +Author = "Calton Pu and Tito Autrey and Andrew Black and Charles Consel and +Crispin Cowan and Jon Inouye and Lakshmi Kethana and Jonathan Walpole and +Ke Zhang", +Title = "Optimistic Incremental Specialization: Streamlining a Commercial +Operating System", +Booktitle = "15\textsuperscript{th} ACM Symposium on +Operating Systems Principles (SOSP'95)", +address = "Copper Mountain, CO", +month="December", +year="1995", +pages="314-321", +annotation=" + Uses a replugger, but with a flag to signal when people are + using the resource at hand. Only one reader at a time. 
+" +} + +@conference{Cowan96a, +Author = "Crispin Cowan and Tito Autrey and Charles Krasic and +Calton Pu and Jonathan Walpole", +Title = "Fast Concurrent Dynamic Linking for an Adaptive Operating System", +Booktitle = "International Conference on Configurable Distributed Systems +(ICCDS'96)", +address = "Annapolis, MD", +month="May", +year="1996", +pages="108", +isbn="0-8186-7395-8", +annotation=" + Uses a replugger, but with a counter to signal when people are + using the resource at hand. Allows multiple readers. +" +} + @techreport{Slingwine95 ,author="John D. Slingwine and Paul E. McKenney" ,title="Apparatus and Method for Achieving Reduced Overhead Mutual @@ -312,6 +365,49 @@ Andrea Arcangeli and Andi Kleen and Orran Krieger and Rusty Russell" [Viewed June 23, 2004]" } +@conference{Michael02a +,author="Maged M. Michael" +,title="Safe Memory Reclamation for Dynamic Lock-Free Objects Using Atomic +Reads and Writes" +,Year="2002" +,Month="August" +,booktitle="{Proceedings of the 21\textsuperscript{st} Annual ACM +Symposium on Principles of Distributed Computing}" +,pages="21-30" +,annotation=" + Each thread keeps an array of pointers to items that it is + currently referencing. Sort of an inside-out garbage collection + mechanism, but one that requires the accessing code to explicitly + state its needs. Also requires read-side memory barriers on + most architectures. +" +} + +@conference{Michael02b +,author="Maged M. Michael" +,title="High Performance Dynamic Lock-Free Hash Tables and List-Based Sets" +,Year="2002" +,Month="August" +,booktitle="{Proceedings of the 14\textsuperscript{th} Annual ACM +Symposium on Parallel +Algorithms and Architecture}" +,pages="73-82" +,annotation=" + Like the title says... 
+" +} + +@InProceedings{HerlihyLM02 +,author={Maurice Herlihy and Victor Luchangco and Mark Moir} +,title="The Repeat Offender Problem: A Mechanism for Supporting Dynamic-Sized, +Lock-Free Data Structures" +,booktitle={Proceedings of 16\textsuperscript{th} International +Symposium on Distributed Computing} +,year=2002 +,month="October" +,pages="339-353" +} + @article{Appavoo03a ,author="J. Appavoo and K. Hui and C. A. N. Soules and R. W. Wisniewski and D. M. {Da Silva} and O. Krieger and M. A. Auslander and D. J. Edelsohn and @@ -447,3 +543,95 @@ Oregon Health and Sciences University" Realtime turns into making RCU yet more realtime friendly. " } + +@conference{ThomasEHart2006a +,Author="Thomas E. Hart and Paul E. McKenney and Angela Demke Brown" +,Title="Making Lockless Synchronization Fast: Performance Implications +of Memory Reclamation" +,Booktitle="20\textsuperscript{th} {IEEE} International Parallel and +Distributed Processing Symposium" +,month="April" +,year="2006" +,day="25-29" +,address="Rhodes, Greece" +,annotation=" + Compares QSBR (AKA "classic RCU"), HPBR, EBR, and lock-free + reference counting. +" +} + +@Conference{PaulEMcKenney2006b +,Author="Paul E. McKenney and Dipankar Sarma and Ingo Molnar and +Suparna Bhattacharya" +,Title="Extending RCU for Realtime and Embedded Workloads" +,Booktitle="{Ottawa Linux Symposium}" +,Month="July" +,Year="2006" +,pages="v2 123-138" +,note="Available: +\url{http://www.linuxsymposium.org/2006/view_abstract.php?content_key=184} +\url{http://www.rdrop.com/users/paulmck/RCU/OLSrtRCU.2006.08.11a.pdf} +[Viewed January 1, 2007]" +,annotation=" + Described how to improve the -rt implementation of realtime RCU. +" +} + +@unpublished{PaulEMcKenney2006c +,Author="Paul E. 
McKenney" +,Title="Sleepable {RCU}" +,month="October" +,day="9" +,year="2006" +,note="Available: +\url{http://lwn.net/Articles/202847/} +Revised: +\url{http://www.rdrop.com/users/paulmck/RCU/srcu.2007.01.14a.pdf} +[Viewed August 21, 2006]" +,annotation=" + LWN article introducing SRCU. +" +} + +@unpublished{RobertOlsson2006a +,Author="Robert Olsson and Stefan Nilsson" +,Title="{TRASH}: A dynamic {LC}-trie and hash data structure" +,month="August" +,day="18" +,year="2006" +,note="Available: +\url{http://www.nada.kth.se/~snilsson/public/papers/trash/trash.pdf} +[Viewed February 24, 2007]" +,annotation=" + RCU-protected dynamic trie-hash combination. +" +} + +@unpublished{ThomasEHart2007a +,Author="Thomas E. Hart and Paul E. McKenney and Angela Demke Brown and Jonathan Walpole" +,Title="Performance of memory reclamation for lockless synchronization" +,journal="J. Parallel Distrib. Comput." +,year="2007" +,note="To appear in J. Parallel Distrib. Comput. + \url{doi=10.1016/j.jpdc.2007.04.010}" +,annotation={ + Compares QSBR (AKA "classic RCU"), HPBR, EBR, and lock-free + reference counting. Journal version of ThomasEHart2006a. +} +} + +@unpublished{PaulEMcKenney2007QRCUspin +,Author="Paul E. McKenney" +,Title="Using Promela and Spin to verify parallel algorithms" +,month="August" +,day="1" +,year="2007" +,note="Available: +\url{http://lwn.net/Articles/243851/} +[Viewed September 8, 2007]" +,annotation=" + LWN article describing Promela and spin, and also using Oleg + Nesterov's QRCU as an example (with Paul McKenney's fastpath). +" +} + diff --git a/Documentation/RCU/rcu.txt b/Documentation/RCU/rcu.txt index f84407cba816..95821a29ae41 100644 --- a/Documentation/RCU/rcu.txt +++ b/Documentation/RCU/rcu.txt @@ -36,6 +36,14 @@ o How can the updater tell when a grace period has completed executed in user mode, or executed in the idle loop, we can safely free up that item. 
+ Preemptible variants of RCU (CONFIG_PREEMPT_RCU) get the + same effect, but require that the readers manipulate CPU-local + counters. These counters allow limited types of blocking + within RCU read-side critical sections. SRCU also uses + CPU-local counters, and permits general blocking within + RCU read-side critical sections. These two variants of + RCU detect grace periods by sampling these counters. + o If I am running on a uniprocessor kernel, which can only do one thing at a time, why should I wait for a grace period? @@ -46,7 +54,10 @@ o How can I see where RCU is currently used in the Linux kernel? Search for "rcu_read_lock", "rcu_read_unlock", "call_rcu", "rcu_read_lock_bh", "rcu_read_unlock_bh", "call_rcu_bh", "srcu_read_lock", "srcu_read_unlock", "synchronize_rcu", - "synchronize_net", and "synchronize_srcu". + "synchronize_net", "synchronize_srcu", and the other RCU + primitives. Or grab one of the cscope databases from: + + http://www.rdrop.com/users/paulmck/RCU/linuxusage/rculocktab.html o What guidelines should I follow when writing code that uses RCU? @@ -67,7 +78,11 @@ o I hear that RCU is patented? What is with that? o I hear that RCU needs work in order to support realtime kernels? - Yes, work in progress. + This work is largely completed. Realtime-friendly RCU can be + enabled via the CONFIG_PREEMPT_RCU kernel configuration parameter. + However, work is in progress for enabling priority boosting of + preempted RCU read-side critical sections.This is needed if you + have CPU-bound realtime threads. o Where can I find more information on RCU? diff --git a/Documentation/RCU/torture.txt b/Documentation/RCU/torture.txt index 25a3c3f7d378..2967a65269d8 100644 --- a/Documentation/RCU/torture.txt +++ b/Documentation/RCU/torture.txt @@ -46,12 +46,13 @@ stat_interval The number of seconds between output of torture shuffle_interval The number of seconds to keep the test threads affinitied - to a particular subset of the CPUs. 
Used in conjunction - with test_no_idle_hz. + to a particular subset of the CPUs, defaults to 5 seconds. + Used in conjunction with test_no_idle_hz. test_no_idle_hz Whether or not to test the ability of RCU to operate in a kernel that disables the scheduling-clock interrupt to idle CPUs. Boolean parameter, "1" to test, "0" otherwise. + Defaults to omitting this test. torture_type The type of RCU to test: "rcu" for the rcu_read_lock() API, "rcu_sync" for rcu_read_lock() with synchronous reclamation, @@ -82,8 +83,6 @@ be evident. ;-) The entries are as follows: -o "ggp": The number of counter flips (or batches) since boot. - o "rtc": The hexadecimal address of the structure currently visible to readers. @@ -117,8 +116,8 @@ o "Reader Pipe": Histogram of "ages" of structures seen by readers. o "Reader Batch": Another histogram of "ages" of structures seen by readers, but in terms of counter flips (or batches) rather than in terms of grace periods. The legal number of non-zero - entries is again two. The reason for this separate view is - that it is easier to get the third entry to show up in the + entries is again two. The reason for this separate view is that + it is sometimes easier to get the third entry to show up in the "Reader Batch" list than in the "Reader Pipe" list. o "Free-Block Circulation": Shows the number of torture structures diff --git a/Documentation/cpu-freq/user-guide.txt b/Documentation/cpu-freq/user-guide.txt index 555c8cf3650a..af3b925ece08 100644 --- a/Documentation/cpu-freq/user-guide.txt +++ b/Documentation/cpu-freq/user-guide.txt @@ -45,6 +45,7 @@ The following ARM processors are supported by cpufreq: ARM Integrator ARM-SA1100 ARM-SA1110 +Intel PXA 1.2 x86 diff --git a/Documentation/cpu-hotplug.txt b/Documentation/cpu-hotplug.txt index a741f658a3c9..ba0aacde94fb 100644 --- a/Documentation/cpu-hotplug.txt +++ b/Documentation/cpu-hotplug.txt @@ -50,7 +50,7 @@ additional_cpus=n (*) Use this to limit hotpluggable cpus. 
This option sets cpu_possible_map = cpu_present_map + additional_cpus (*) Option valid only for following architectures -- x86_64, ia64, s390 +- x86_64, ia64 ia64 and x86_64 use the number of disabled local apics in ACPI tables MADT to determine the number of potentially hot-pluggable cpus. The implementation @@ -109,12 +109,13 @@ Never use anything other than cpumask_t to represent bitmap of CPUs. for_each_cpu_mask(x,mask) - Iterate over some random collection of cpu mask. #include <linux/cpu.h> - lock_cpu_hotplug() and unlock_cpu_hotplug(): + get_online_cpus() and put_online_cpus(): -The above calls are used to inhibit cpu hotplug operations. While holding the -cpucontrol mutex, cpu_online_map will not change. If you merely need to avoid -cpus going away, you could also use preempt_disable() and preempt_enable() -for those sections. Just remember the critical section cannot call any +The above calls are used to inhibit cpu hotplug operations. While the +cpu_hotplug.refcount is non zero, the cpu_online_map will not change. +If you merely need to avoid cpus going away, you could also use +preempt_disable() and preempt_enable() for those sections. +Just remember the critical section cannot call any function that can sleep or schedule this process away. The preempt_disable() will work as long as stop_machine_run() is used to take a cpu down. diff --git a/Documentation/crypto/api-intro.txt b/Documentation/crypto/api-intro.txt index a2ac6d294793..8b49302712a8 100644 --- a/Documentation/crypto/api-intro.txt +++ b/Documentation/crypto/api-intro.txt @@ -33,9 +33,16 @@ The idea is to make the user interface and algorithm registration API very simple, while hiding the core logic from both. Many good ideas from existing APIs such as Cryptoapi and Nettle have been adapted for this. -The API currently supports three types of transforms: Ciphers, Digests and -Compressors. The compression algorithms especially seem to be performing -very well so far. 
+The API currently supports five main types of transforms: AEAD (Authenticated +Encryption with Associated Data), Block Ciphers, Ciphers, Compressors and +Hashes. + +Please note that Block Ciphers is somewhat of a misnomer. It is in fact +meant to support all ciphers including stream ciphers. The difference +between Block Ciphers and Ciphers is that the latter operates on exactly +one block while the former can operate on an arbitrary amount of data, +subject to block size requirements (i.e., non-stream ciphers can only +process multiples of blocks). Support for hardware crypto devices via an asynchronous interface is under development. @@ -69,29 +76,12 @@ Here's an example of how to use the API: Many real examples are available in the regression test module (tcrypt.c). -CONFIGURATION NOTES - -As Triple DES is part of the DES module, for those using modular builds, -add the following line to /etc/modprobe.conf: - - alias des3_ede des - -The Null algorithms reside in the crypto_null module, so these lines -should also be added: - - alias cipher_null crypto_null - alias digest_null crypto_null - alias compress_null crypto_null - -The SHA384 algorithm shares code within the SHA512 module, so you'll -also need: - alias sha384 sha512 - - DEVELOPER NOTES Transforms may only be allocated in user context, and cryptographic -methods may only be called from softirq and user contexts. +methods may only be called from softirq and user contexts. For +transforms with a setkey method it too should only be called from +user context. When using the API for ciphers, performance will be optimal if each scatterlist contains data which is a multiple of the cipher's block @@ -130,8 +120,9 @@ might already be working on. BUGS Send bug reports to: -Herbert Xu <herbert@gondor.apana.org.au> -Cc: David S. Miller <davem@redhat.com> +linux-crypto@vger.kernel.org +Cc: Herbert Xu <herbert@gondor.apana.org.au>, + David S. 
Miller <davem@redhat.com> FURTHER INFORMATION diff --git a/Documentation/debugging-via-ohci1394.txt b/Documentation/debugging-via-ohci1394.txt new file mode 100644 index 000000000000..de4804e8b396 --- /dev/null +++ b/Documentation/debugging-via-ohci1394.txt @@ -0,0 +1,179 @@ + + Using physical DMA provided by OHCI-1394 FireWire controllers for debugging + --------------------------------------------------------------------------- + +Introduction +------------ + +Basically all FireWire controllers which are in use today are compliant +to the OHCI-1394 specification which defines the controller to be a PCI +bus master which uses DMA to offload data transfers from the CPU and has +a "Physical Response Unit" which executes specific requests by employing +PCI-Bus master DMA after applying filters defined by the OHCI-1394 driver. + +Once properly configured, remote machines can send these requests to +ask the OHCI-1394 controller to perform read and write requests on +physical system memory and, for read requests, send the result of +the physical memory read back to the requester. + +With that, it is possible to debug issues by reading interesting memory +locations such as buffers like the printk buffer or the process table. + +Retrieving a full system memory dump is also possible over the FireWire, +using data transfer rates in the order of 10MB/s or more. + +Memory access is currently limited to the low 4G of physical address +space which can be a problem on IA64 machines where memory is located +mostly above that limit, but it is rarely a problem on more common +hardware such as hardware based on x86, x86-64 and PowerPC. 
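As a rough worked example of the two figures quoted above (the low 4G limit and a transfer rate in the order of 10MB/s), the sketch below checks whether a physical address range is reachable and estimates how long a dump would take. The function names and the use of decimal megabytes are assumptions made for illustration only; real throughput depends on the hardware.

```python
# Illustrative arithmetic only; names and constants are assumptions,
# not part of any FireWire tool or kernel interface.
LOW_4G_LIMIT = 1 << 32             # physical DMA reaches only the low 4G
ASSUMED_RATE = 10 * 1000 * 1000    # "in the order of 10MB/s", decimal MB assumed


def reachable(phys_addr, length=1):
    """Return True if [phys_addr, phys_addr + length) lies entirely below 4G."""
    return phys_addr >= 0 and length > 0 and phys_addr + length <= LOW_4G_LIMIT


def dump_seconds(num_bytes, rate=ASSUMED_RATE):
    """Estimate the transfer time in seconds for num_bytes at the given rate."""
    return num_bytes / rate


if __name__ == "__main__":
    print(reachable(0x00100000, 4096))
    print(reachable(1 << 32))
    print(round(dump_seconds(LOW_4G_LIMIT) / 60, 1), "minutes for the whole low 4G")
```

At the assumed rate, dumping the entire reachable window is a matter of minutes, which is why the limit mostly matters on machines whose memory sits above 4G.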
+ +Together with an early initialization of the OHCI-1394 controller for debugging, +this facility proved most useful for examining long debug logs in the printk +buffer to debug early boot problems in areas like ACPI where the system +fails to boot and other means for debugging (serial port) are either not +available (notebooks) or too slow for extensive debug information (like ACPI). + +Drivers +------- + +The OHCI-1394 drivers in drivers/firewire and drivers/ieee1394 initialize +the OHCI-1394 controllers to a working state and can be used to enable +physical DMA. By default you only have to load the driver, and physical +DMA access will be granted to all remote nodes, but it can be turned off +when using the ohci1394 driver. + +Because these drivers depend on the PCI enumeration to be completed, an +initialization routine which can run pretty early (long before console_init(), +which makes the printk buffer appear on the console, can be called) was written. + +To activate it, enable CONFIG_PROVIDE_OHCI1394_DMA_INIT (Kernel hacking menu: +Provide code for enabling DMA over FireWire early on boot) and pass the +parameter "ohci1394_dma=early" to the recompiled kernel on boot. + +Tools +----- + +firescope - Originally developed by Benjamin Herrenschmidt. Andi Kleen ported +it from PowerPC to x86 and x86_64 and added functionality; firescope can now +be used to view the printk buffer of a remote machine, even with live update.
+ +Bernhard Kaindl enhanced firescope to support accessing 64-bit machines +from 32-bit firescope and vice versa: +- ftp://ftp.suse.de/private/bk/firewire/tools/firescope-0.2.2.tar.bz2 + +and he implemented fast system dump (alpha version - read README.txt): +- ftp://ftp.suse.de/private/bk/firewire/tools/firedump-0.1.tar.bz2 + +There is also a gdb proxy for firewire which allows using gdb to access +data which can be referenced from symbols found by gdb in vmlinux: +- ftp://ftp.suse.de/private/bk/firewire/tools/fireproxy-0.33.tar.bz2 + +The latest version of this gdb proxy (fireproxy-0.34) can communicate (not +yet stable) with kgdb over a memory-based communication module (kgdbom). + +Getting Started +--------------- + +The OHCI-1394 specification requires that the OHCI-1394 controller must +disable all physical DMA on each bus reset. + +This means that if you want to debug an issue in a system state where +interrupts are disabled and where no polling of the OHCI-1394 controller +for bus resets takes place, you have to establish any FireWire cable +connections and fully initialize all FireWire hardware __before__ the +system enters such a state. + +Step-by-step instructions for using firescope with early OHCI initialization: + +1) Verify that your hardware is supported: + + Load the ohci1394 or the fw-ohci module and check your kernel logs. + You should see a line similar to + + ohci1394: fw-host0: OHCI-1394 1.1 (PCI): IRQ=[18] MMIO=[fe9ff800-fe9fffff] + ... Max Packet=[2048] IR/IT contexts=[4/8] + + when loading the driver. If you have no supported controller, many PCI, + CardBus and even some Express cards which are fully compliant with the + OHCI-1394 specification are available. If a card requires no driver for + Windows operating systems, it most likely is compliant. Only specialized + shops have cards which are not compliant; they are based on TI PCILynx + chips and require drivers for Windows operating systems.
+ +2) Establish a working FireWire cable connection: + + Any FireWire cable, as long as it provides an electrically and mechanically + stable connection and has matching connectors (there are small 4-pin and + large 6-pin FireWire ports) will do. + + If a driver is running on both machines you should see a line like + + ieee1394: Node added: ID:BUS[0-01:1023] GUID[0090270001b84bba] + + on both machines in the kernel log when the cable is plugged in + and connects the two machines. + +3) Test physical DMA using firescope: + + On the debug host, + - load the raw1394 module, + - make sure that /dev/raw1394 is accessible, + then start firescope: + + $ firescope + Port 0 (ohci1394) opened, 2 nodes detected + + FireScope + --------- + Target : <unspecified> + Gen : 1 + [Ctrl-T] choose target + [Ctrl-H] this menu + [Ctrl-Q] quit + + ------> Press Ctrl-T now; the output should be similar to: + + 2 nodes available, local node is: 0 + 0: ffc0, uuid: 00000000 00000000 [LOCAL] + 1: ffc1, uuid: 00279000 ba4bb801 + + Besides the [LOCAL] node, it must show another node without an error message. + +4) Prepare for debugging with early OHCI-1394 initialization: + + 4.1) Kernel compilation and installation on debug target + + Compile the kernel to be debugged with CONFIG_PROVIDE_OHCI1394_DMA_INIT + (Kernel hacking: Provide code for enabling DMA over FireWire early on boot) + enabled and install it on the machine to be debugged (debug target). + + 4.2) Transfer the System.map of the debugged kernel to the debug host + + Copy the System.map of the kernel to be debugged to the debug host (the host + which is connected to the debugged machine over the FireWire cable). + +5) Retrieving the printk buffer contents: + + With the FireWire cable connected and the OHCI-1394 driver on the debugging + host loaded, reboot the debugged machine, booting the kernel which has + CONFIG_PROVIDE_OHCI1394_DMA_INIT enabled, with the option ohci1394_dma=early.
+ + Then, on the debugging host, run firescope, for example by using -A: + + firescope -A System.map-of-debug-target-kernel + + Note: -A automatically attaches to the first non-local node. It only works + reliably if only two machines are connected using FireWire. + + After having attached to the debug target, press Ctrl-D to view the + complete printk buffer or Ctrl-U to enter auto update mode and get an + updated live view of recent kernel messages logged on the debug target. + + Call "firescope -h" to get more information on firescope's options. + +Notes +----- +Documentation and specifications: ftp://ftp.suse.de/private/bk/firewire/docs + +FireWire is a trademark of Apple Inc. - for more information please refer to: +http://en.wikipedia.org/wiki/FireWire diff --git a/Documentation/dontdiff b/Documentation/dontdiff index f2d658a6a942..c09a96b99354 100644 --- a/Documentation/dontdiff +++ b/Documentation/dontdiff @@ -46,8 +46,6 @@ .mailmap .mm 53c700_d.h -53c7xx_d.h -53c7xx_u.h 53c8xx_d.h* BitKeeper COPYING diff --git a/Documentation/dvb/bt8xx.txt b/Documentation/dvb/bt8xx.txt index ecb47adda063..b7b1d1b1da46 100644 --- a/Documentation/dvb/bt8xx.txt +++ b/Documentation/dvb/bt8xx.txt @@ -78,6 +78,18 @@ Example: For a full list of card ID's please see Documentation/video4linux/CARDLIST.bttv. In case of further problems please subscribe and send questions to the mailing list: linux-dvb@linuxtv.org. +2c) Probing the cards with broken PCI subsystem ID +-------------------------------------------------- +There are some TwinHan cards whose EEPROM has become corrupted for some +reason. These cards do not have a correct PCI subsystem ID.
But we can force +probing the cards with broken PCI subsystem ID + + $ echo 109e 0878 $subvendor $subdevice > \ + /sys/bus/pci/drivers/bt878/new_id + +109e: PCI_VENDOR_ID_BROOKTREE +0878: PCI_DEVICE_ID_BROOKTREE_878 + Authors: Richard Walker, Jamie Honan, Michael Hunold, diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index 20c4c8bac9d7..181bff005167 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt @@ -191,15 +191,6 @@ Who: Kay Sievers <kay.sievers@suse.de> --------------------------- -What: i2c_adapter.list -When: July 2007 -Why: Superfluous, this list duplicates the one maintained by the driver - core. -Who: Jean Delvare <khali@linux-fr.org>, - David Brownell <dbrownell@users.sourceforge.net> - ---------------------------- - What: ACPI procfs interface When: July 2008 Why: ACPI sysfs conversion should be finished by January 2008. @@ -225,14 +216,6 @@ Who: Len Brown <len.brown@intel.com> --------------------------- -What: i2c-ixp2000, i2c-ixp4xx and scx200_i2c drivers -When: September 2007 -Why: Obsolete. The new i2c-gpio driver replaces all hardware-specific - I2C-over-GPIO drivers. -Who: Jean Delvare <khali@linux-fr.org> - ---------------------------- - What: 'time' kernel boot parameter When: January 2008 Why: replaced by 'printk.time=<value>' so that printk timestamps can be @@ -266,22 +249,6 @@ Who: Tejun Heo <htejun@gmail.com> --------------------------- -What: Legacy RTC drivers (under drivers/i2c/chips) -When: November 2007 -Why: Obsolete. We have a RTC subsystem with better drivers. -Who: Jean Delvare <khali@linux-fr.org> - ---------------------------- - -What: iptables SAME target -When: 1.1. 2008 -Files: net/ipv4/netfilter/ipt_SAME.c, include/linux/netfilter_ipv4/ipt_SAME.h -Why: Obsolete for multiple years now, NAT core provides the same behaviour. - Unfixable broken wrt. 32/64 bit cleanness. 
-Who: Patrick McHardy <kaber@trash.net> - ---------------------------- - What: The arch/ppc and include/asm-ppc directories When: Jun 2008 Why: The arch/powerpc tree is the merged architecture for ppc32 and ppc64 @@ -295,16 +262,6 @@ Who: linuxppc-dev@ozlabs.org --------------------------- -What: mthca driver's MSI support -When: January 2008 -Files: drivers/infiniband/hw/mthca/*.[ch] -Why: All mthca hardware also supports MSI-X, which provides - strictly more functionality than MSI. So there is no point in - having both MSI-X and MSI support in the driver. -Who: Roland Dreier <rolandd@cisco.com> - ---------------------------- - What: sk98lin network driver When: Feburary 2008 Why: In kernel tree version of driver is unmaintained. Sk98lin driver @@ -323,13 +280,77 @@ Who: Thomas Gleixner <tglx@linutronix.de> --------------------------- -What: shaper network driver -When: January 2008 -Files: drivers/net/shaper.c, include/linux/if_shaper.h -Why: This driver has been marked obsolete for many years. - It was only designed to work on lower speed links and has design - flaws that lead to machine crashes. The qdisc infrastructure in - 2.4 or later kernels, provides richer features and is more robust. -Who: Stephen Hemminger <shemminger@linux-foundation.org> +--------------------------- + +What: i2c-i810, i2c-prosavage and i2c-savage4 +When: May 2008 +Why: These drivers are superseded by i810fb, intelfb and savagefb. +Who: Jean Delvare <khali@linux-fr.org> --------------------------- + +What: bcm43xx wireless network driver +When: 2.6.26 +Files: drivers/net/wireless/bcm43xx +Why: This driver's functionality has been replaced by the + mac80211-based b43 and b43legacy drivers. +Who: John W. Linville <linville@tuxdriver.com> + +--------------------------- + +What: ieee80211 softmac wireless networking component +When: 2.6.26 (or after removal of bcm43xx and port of zd1211rw to mac80211) +Files: net/ieee80211/softmac +Why: No in-kernel drivers will depend on it any longer. 
+Who: John W. Linville <linville@tuxdriver.com> + +--------------------------- + +What: rc80211-simple rate control algorithm for mac80211 +When: 2.6.26 +Files: net/mac80211/rc80211-simple.c +Why: This algorithm was provided for reference but always exhibited bad + responsiveness and performance and has some serious flaws. It has been + replaced by rc80211-pid. +Who: Stefano Brivio <stefano.brivio@polimi.it> + +--------------------------- + +What (Why): + - include/linux/netfilter_ipv4/ipt_TOS.h ipt_tos.h header files + (superseded by xt_TOS/xt_tos target & match) + + - "forwarding" header files like ipt_mac.h in + include/linux/netfilter_ipv4/ and include/linux/netfilter_ipv6/ + + - xt_CONNMARK match revision 0 + (superseded by xt_CONNMARK match revision 1) + + - xt_MARK target revisions 0 and 1 + (superseded by xt_MARK match revision 2) + + - xt_connmark match revision 0 + (superseded by xt_connmark match revision 1) + + - xt_conntrack match revision 0 + (superseded by xt_conntrack match revision 1) + + - xt_iprange match revision 0, + include/linux/netfilter_ipv4/ipt_iprange.h + (superseded by xt_iprange match revision 1) + + - xt_mark match revision 0 + (superseded by xt_mark match revision 1) + +When: January 2009 or Linux 2.7.0, whichever comes first +Why: Superseded by newer revisions or modules +Who: Jan Engelhardt <jengelh@computergmbh.de> + +--------------------------- + +What: b43 support for firmware revision < 410 +When: July 2008 +Why: The support code for the old firmware hurts code readability/maintainability + and slightly hurts runtime performance. Bugfixes for the old firmware + are not provided by Broadcom anymore. +Who: Michael Buesch <mb@bu3sch.de> diff --git a/Documentation/filesystems/ext4.txt b/Documentation/filesystems/ext4.txt index 6a4adcae9f9a..560f88dc7090 100644 --- a/Documentation/filesystems/ext4.txt +++ b/Documentation/filesystems/ext4.txt @@ -86,9 +86,21 @@ Alex is working on a new set of patches right now. 
When mounting an ext4 filesystem, the following option are accepted: (*) == default -extents ext4 will use extents to address file data. The +extents (*) ext4 will use extents to address file data. The file system will no longer be mountable by ext3. +noextents ext4 will not use extents for newly created files + +journal_checksum Enable checksumming of the journal transactions. + This will allow the recovery code in e2fsck and the + kernel to detect corruption in the kernel. It is a + compatible change and will be ignored by older kernels. + +journal_async_commit Commit block can be written to disk without waiting + for descriptor blocks. If enabled older kernels cannot + mount the device. This will enable 'journal_checksum' + internally. + journal=update Update the ext4 file system's journal to the current format. @@ -196,6 +208,12 @@ nobh (a) cache disk block mapping information "nobh" option tries to avoid associating buffer heads (supported only for "writeback" mode). +mballoc (*) Use the multiple block allocator for block allocation +nomballoc disabled multiple block allocator for block allocation. +stripe=n Number of filesystem blocks that mballoc will try + to use for allocation size and alignment. For RAID5/6 + systems this should be the number of data + disks * RAID chunk size in file system blocks. Data Mode --------- diff --git a/Documentation/filesystems/ocfs2.txt b/Documentation/filesystems/ocfs2.txt index ed55238023a9..c318a8bbb1ef 100644 --- a/Documentation/filesystems/ocfs2.txt +++ b/Documentation/filesystems/ocfs2.txt @@ -35,7 +35,6 @@ Features which OCFS2 does not support yet: - Directory change notification (F_NOTIFY) - Distributed Caching (F_SETLEASE/F_GETLEASE/break_lease) - POSIX ACLs - - readpages / writepages (not user visible) Mount options ============= @@ -62,3 +61,18 @@ data=writeback Data ordering is not preserved, data may be written preferred_slot=0(*) During mount, try to use this filesystem slot first. 
If it is in use by another node, the first empty one found will be chosen. Invalid values will be ignored. +commit=nrsec (*) Ocfs2 can be told to sync all its data and metadata + every 'nrsec' seconds. The default value is 5 seconds. + This means that if you lose your power, you will lose + as much as the latest 5 seconds of work (your + filesystem will not be damaged though, thanks to the + journaling). This default value (or any low value) + will hurt performance, but it's good for data-safety. + Setting it to 0 will have the same effect as leaving + it at the default (5 seconds). + Setting it to very large values will improve + performance. +localalloc=8(*) Allows custom localalloc size in MB. If the value is too + large, the fs will silently revert it to the default. + Localalloc is not enabled for local mounts. +localflocks This disables cluster aware flock. diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index dec99455321f..4413a2d4646f 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt @@ -857,6 +857,45 @@ CPUs. The "procs_blocked" line gives the number of processes currently blocked, waiting for I/O to complete. +1.9 Ext4 file system parameters +------------------------------ +Ext4 file systems have one directory per partition under /proc/fs/ext4/ +# ls /proc/fs/ext4/hdc/ +group_prealloc max_to_scan mb_groups mb_history min_to_scan order2_req +stats stream_req + +mb_groups: +This file gives the details of the multiblock allocator's buddy cache of +free blocks. + +mb_history: +Multiblock allocation history. + +stats: +This file indicates whether the multiblock allocator should start collecting +statistics. The statistics are shown during unmount. + +group_prealloc: +The multiblock allocator normalizes the block allocation request to +group_prealloc filesystem blocks if we don't have a stripe value set. +The stripe value can be specified at mount time or during mke2fs.
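The stripe value is expressed in filesystem blocks. As a hedged sketch of the arithmetic that the ext4.txt hunk above gives for the stripe= mount option (number of data disks times RAID chunk size in filesystem blocks), the helper below does the unit conversion. The function name, the 4096-byte default block size and the sample array geometry are assumptions for illustration.

```python
def ext4_stripe_blocks(data_disks, chunk_bytes, fs_block_bytes=4096):
    """Return a candidate stripe= value in filesystem blocks.

    data_disks     -- number of disks holding data (parity disks excluded)
    chunk_bytes    -- RAID chunk size in bytes
    fs_block_bytes -- filesystem block size in bytes (4096 is an assumed default)
    """
    if data_disks <= 0:
        raise ValueError("need at least one data disk")
    if chunk_bytes <= 0 or chunk_bytes % fs_block_bytes:
        raise ValueError("chunk size must be a whole number of filesystem blocks")
    return data_disks * (chunk_bytes // fs_block_bytes)


if __name__ == "__main__":
    # Hypothetical 4-disk RAID5 (3 data disks) with 64 KiB chunks.
    print("stripe=%d" % ext4_stripe_blocks(3, 64 * 1024))
```

The result would then be passed as the stripe=n mount option; whether a given value actually helps depends on the array and workload.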
+ +max_to_scan: +How long the multiblock allocator can look for a best extent (in found extents). + +min_to_scan: +How long the multiblock allocator must look for a best extent. + +order2_req: +The multiblock allocator uses a 2^N search using buddies only for requests +greater than or equal to order2_req. The request size is specified in file +system blocks. A value of 2 means that this search is used only for requests +greater than or equal to 4 blocks. + +stream_req: +Files smaller than stream_req are served by the stream allocator, whose +purpose is to pack requests as close to each other as possible to +produce smooth I/O traffic. A value of 16 indicates that files smaller than 16 +filesystem blocks will use group based preallocation. ------------------------------------------------------------------------------ Summary diff --git a/Documentation/i2c/busses/i2c-i801 b/Documentation/i2c/busses/i2c-i801 index fde4420e3f75..3bd958360159 100644 --- a/Documentation/i2c/busses/i2c-i801 +++ b/Documentation/i2c/busses/i2c-i801 @@ -17,9 +17,8 @@ Supported adapters: Datasheets: Publicly available at the Intel website Authors: - Frodo Looijaard <frodol@dds.nl>, - Philip Edelbrock <phil@netroedge.com>, Mark Studebaker <mdsxyz123@yahoo.com> + Jean Delvare <khali@linux-fr.org> Module Parameters @@ -62,7 +61,7 @@ Not supported. I2C Block Read Support ---------------------- -Not supported at the moment. +I2C block read is supported on the 82801EB (ICH5) and later chips. SMBus 2.0 Support diff --git a/Documentation/i2c/busses/i2c-viapro b/Documentation/i2c/busses/i2c-viapro index 06b4be3ef6d8..1405fb69984c 100644 --- a/Documentation/i2c/busses/i2c-viapro +++ b/Documentation/i2c/busses/i2c-viapro @@ -10,7 +10,7 @@ Supported adapters: * VIA Technologies, Inc. VT8231, VT8233, VT8233A Datasheet: available on request from VIA - * VIA Technologies, Inc.
VT8235, VT8237R, VT8237A, VT8237S, VT8251 Datasheet: available on request and under NDA from VIA * VIA Technologies, Inc. CX700 @@ -46,6 +46,7 @@ Your lspci -n listing must show one of these : device 1106:3177 (VT8235) device 1106:3227 (VT8237R) device 1106:3337 (VT8237A) + device 1106:3372 (VT8237S) device 1106:3287 (VT8251) device 1106:8324 (CX700) diff --git a/Documentation/i2c/chips/pcf8575 b/Documentation/i2c/chips/pcf8575 new file mode 100644 index 000000000000..25f5698a61cf --- /dev/null +++ b/Documentation/i2c/chips/pcf8575 @@ -0,0 +1,72 @@ +About the PCF8575 chip and the pcf8575 kernel driver +==================================================== + +The PCF8575 chip is produced by the following manufacturers: + + * Philips NXP + http://www.nxp.com/#/pip/cb=[type=product,path=50807/41735/41850,final=PCF8575_3]|pip=[pip=PCF8575_3][0] + + * Texas Instruments + http://focus.ti.com/docs/prod/folders/print/pcf8575.html + + +Some vendors sell small PCB's with the PCF8575 mounted on it. You can connect +such a board to a Linux host via e.g. an USB to I2C interface. Examples of +PCB boards with a PCF8575: + + * SFE Breakout Board for PCF8575 I2C Expander by RobotShop + http://www.robotshop.ca/home/products/robot-parts/electronics/adapters-converters/sfe-pcf8575-i2c-expander-board.html + + * Breakout Board for PCF8575 I2C Expander by Spark Fun Electronics + http://www.sparkfun.com/commerce/product_info.php?products_id=8130 + + +Description +----------- +The PCF8575 chip is a 16-bit I/O expander for the I2C bus. Up to eight of +these chips can be connected to the same I2C bus. You can find this +chip on some custom designed hardware, but you won't find it on PC +motherboards. + +The PCF8575 chip consists of a 16-bit quasi-bidirectional port and an I2C-bus +interface. Each of the sixteen I/O's can be independently used as an input or +an output. To set up an I/O pin as an input, you have to write a 1 to the +corresponding output. 
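The quasi-bidirectional rule above (write a 1 to use a pin as an input) can be sketched as a small helper that builds the 16-bit value to write. This is only an illustration: the function name is invented, the mapping of bit n to pin n is an assumption, and treating every unlisted pin as an input is a choice made by this sketch, not something the driver documentation states.

```python
NUM_PINS = 16
ALL_INPUTS = (1 << NUM_PINS) - 1   # every bit set: all pins usable as inputs


def pcf8575_word(output_levels):
    """Build the 16-bit value to write to the chip.

    output_levels maps a pin number (0..15, assumed bit n == pin n) to the
    level to drive, 0 or 1. Pins not listed keep their bit at 1 so they can
    be used as inputs. A pin driven to 1 also ends up with its bit set,
    which is inherent to a quasi-bidirectional port.
    """
    word = ALL_INPUTS
    for pin, level in output_levels.items():
        if not 0 <= pin < NUM_PINS:
            raise ValueError("pin out of range: %r" % (pin,))
        if level not in (0, 1):
            raise ValueError("level must be 0 or 1")
        if level == 0:
            word &= ~(1 << pin)    # clear the bit to drive the pin low
    return word


if __name__ == "__main__":
    # Drive pins 0 and 3 low, pin 15 high, leave the rest as inputs.
    print(hex(pcf8575_word({0: 0, 3: 0, 15: 1})))
```

How the resulting value is handed to the chip (raw I2C write or a driver interface) is outside the scope of this sketch.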
+ +For more information please see the datasheet. + + +Detection +--------- + +There is no method known to detect whether a chip on a given I2C address is +a PCF8575 or whether it is any other I2C device. So there are two alternatives +to let the driver find the installed PCF8575 devices: +- Load this driver after any other I2C driver for I2C devices with addresses + in the range 0x20 .. 0x27. +- Pass the I2C bus and address of the installed PCF8575 devices explicitly to + the driver at load time via the probe=... or force=... parameters. + +/sys interface +-------------- + +For each address on which a PCF8575 chip was found or forced the following +files will be created under /sys: +* /sys/bus/i2c/devices/<bus>-<address>/read +* /sys/bus/i2c/devices/<bus>-<address>/write +where bus is the I2C bus number (0, 1, ...) and address is the four-digit +hexadecimal representation of the 7-bit I2C address of the PCF8575 +(0020 .. 0027). + +The read file is read-only. Reading it will trigger an I2C read and will hence +report the current input state for the pins configured as inputs, and the +current output value for the pins configured as outputs. + +The write file is read-write. Writing a value to it will configure all pins +as output for which the corresponding bit is zero. Reading the write file will +return the value last written, or -EAGAIN if no value has yet been written to +the write file. + +On module initialization the configuration of the chip is not changed -- the +chip is left in the state it was already configured in through either power-up +or through previous I2C write actions. diff --git a/Documentation/i2c/i2c-stub b/Documentation/i2c/i2c-stub index 89e69ad3436c..0d8be1c20c16 100644 --- a/Documentation/i2c/i2c-stub +++ b/Documentation/i2c/i2c-stub @@ -25,6 +25,9 @@ The typical use-case is like this: 3. load the target sensors chip driver module 4. 
observe its behavior in the kernel log +There's a script named i2c-stub-from-dump in the i2c-tools package which +can load register values automatically from a chip dump. + PARAMETERS: int chip_addr[10]: @@ -32,9 +35,6 @@ int chip_addr[10]: CAVEATS: -There are independent arrays for byte/data and word/data commands. Depending -on if/how a target driver mixes them, you'll need to be careful. - If your target driver polls some byte or word waiting for it to change, the stub could lock it up. Use i2cset to unlock it. diff --git a/Documentation/i2c/writing-clients b/Documentation/i2c/writing-clients index 2c170032bf37..bfb0a5520817 100644 --- a/Documentation/i2c/writing-clients +++ b/Documentation/i2c/writing-clients @@ -267,9 +267,9 @@ insmod parameter of the form force_<kind>. Fortunately, as a module writer, you just have to define the `normal_i2c' parameter. The complete declaration could look like this: - /* Scan 0x37, and 0x48 to 0x4f */ - static unsigned short normal_i2c[] = { 0x37, 0x48, 0x49, 0x4a, 0x4b, 0x4c, - 0x4d, 0x4e, 0x4f, I2C_CLIENT_END }; + /* Scan 0x4c to 0x4f */ + static const unsigned short normal_i2c[] = { 0x4c, 0x4d, 0x4e, 0x4f, + I2C_CLIENT_END }; /* Magic definition of all other variables and things */ I2C_CLIENT_INSMOD; diff --git a/Documentation/ide.txt b/Documentation/ide.txt index 1d50f23a5cab..94e2e3b9e77f 100644 --- a/Documentation/ide.txt +++ b/Documentation/ide.txt @@ -30,7 +30,7 @@ *** *** The CMD640 is also used on some Vesa Local Bus (VLB) cards, and is *NOT* *** automatically detected by Linux. For safe, reliable operation with such -*** interfaces, one *MUST* use the "ide0=cmd640_vlb" kernel option. +*** interfaces, one *MUST* use the "cmd640.probe_vlb" kernel option. *** *** Use of the "serialize" option is no longer necessary. 
@@ -244,10 +244,6 @@ Summary of ide driver parameters for kernel command line "hdx=nodma" : disallow DMA - "hdx=swapdata" : when the drive is a disk, byte swap all data - - "hdx=bswap" : same as above.......... - "hdx=scsi" : the return of the ide-scsi flag, this is useful for allowing ide-floppy, ide-tape, and ide-cdrom|writers to use ide-scsi emulation on a device specific option. @@ -292,9 +288,6 @@ The following are valid ONLY on ide0, which usually corresponds to the first ATA interface found on the particular host, and the defaults for the base,ctl ports must not be altered. - "ide0=cmd640_vlb" : *REQUIRED* for VLB cards with the CMD640 chip - (not for PCI -- automatically detected) - "ide=doubler" : probe/support IDE doublers on Amiga There may be more options than shown -- use the source, Luke! @@ -310,6 +303,10 @@ i.e. to enable probing for ALI M14xx chipsets (ali14xx host driver) use: * "probe" module parameter when ali14xx driver is compiled as module ("modprobe ali14xx probe") +Also, for the legacy CMD640 host driver (cmd640) you need to use the "probe_vlb" +kernel parameter to enable probing for the VLB version of the chipset (PCI ones +are detected automatically). + ================================================================================ IDE ATAPI streaming tape driver diff --git a/Documentation/ioctl-number.txt b/Documentation/ioctl-number.txt index 5c7fbf9d96b4..c18363bd8d11 100644 --- a/Documentation/ioctl-number.txt +++ b/Documentation/ioctl-number.txt @@ -138,6 +138,7 @@ Code Seq# Include File Comments 'm' 00-1F net/irda/irmod.h conflict! 'n' 00-7F linux/ncp_fs.h 'n' E0-FF video/matrox.h matroxfb +'o' 00-1F fs/ocfs2/ocfs2_fs.h OCFS2 'p' 00-0F linux/phantom.h conflict! (OpenHaptics needs this) 'p' 00-3F linux/mc146818rtc.h conflict!
'p' 40-7F linux/nvram.h diff --git a/Documentation/kbuild/kconfig-language.txt b/Documentation/kbuild/kconfig-language.txt index 616043a6da99..649cb8799890 100644 --- a/Documentation/kbuild/kconfig-language.txt +++ b/Documentation/kbuild/kconfig-language.txt @@ -24,7 +24,7 @@ visible if its parent entry is also visible. Menu entries ------------ -Most entries define a config option, all other entries help to organize +Most entries define a config option; all other entries help to organize them. A single configuration option is defined like this: config MODVERSIONS @@ -50,7 +50,7 @@ applicable everywhere (see syntax). - type definition: "bool"/"tristate"/"string"/"hex"/"int" Every config option must have a type. There are only two basic types: - tristate and string, the other types are based on these two. The type + tristate and string; the other types are based on these two. The type definition optionally accepts an input prompt, so these two examples are equivalent: @@ -108,7 +108,7 @@ applicable everywhere (see syntax). equal to 'y' without visiting the dependencies. So abusing select you are able to select a symbol FOO even if FOO depends on BAR that is not set. In general use select only for - non-visible symbols (no promts anywhere) and for symbols with + non-visible symbols (no prompts anywhere) and for symbols with no dependencies. That will limit the usefulness but on the other hand avoid the illegal configurations all over. kconfig should one day warn about such things. @@ -127,6 +127,27 @@ applicable everywhere (see syntax). used to help visually separate configuration logic from help within the file as an aid to developers. +- misc options: "option" <symbol>[=<value>] + Various less common options can be defined via this option syntax, + which can modify the behaviour of the menu entry and its config + symbol. 
These options are currently possible: + + - "defconfig_list" + This declares a list of default entries which can be used when + looking for the default configuration (which is used when the main + .config doesn't exist yet). + + - "modules" + This declares the symbol to be used as the MODULES symbol, which + enables the third modular state for all config symbols. + + - "env"=<value> + This imports the environment variable into Kconfig. It behaves like + a default, except that the value comes from the environment; this + also means that the behaviour when mixing it with normal defaults is + undefined at this point. The symbol is currently not exported back + to the build environment (if this is desired, it can be done via + another symbol). Menu dependencies ----------------- @@ -162,9 +183,9 @@ An expression can have a value of 'n', 'm' or 'y' (or 0, 1, 2 respectively for calculations). A menu entry becomes visible when its expression evaluates to 'm' or 'y'. -There are two types of symbols: constant and nonconstant symbols. -Nonconstant symbols are the most common ones and are defined with the -'config' statement. Nonconstant symbols consist entirely of alphanumeric +There are two types of symbols: constant and non-constant symbols. +Non-constant symbols are the most common ones and are defined with the +'config' statement. Non-constant symbols consist entirely of alphanumeric characters or underscores. Constant symbols are only part of expressions. Constant symbols are always surrounded by single or double quotes. Within the quote, any @@ -301,3 +322,81 @@ mainmenu: This sets the config program's title bar if the config program chooses to use it. + + +Kconfig hints +------------- +This is a collection of Kconfig tips, most of which aren't obvious at +first glance and most of which have become idioms in several Kconfig +files.
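The numeric reading of expression values given earlier ('n', 'm' and 'y' as 0, 1 and 2) is easier to follow with a small model. The sketch below assumes the evaluation rules that kconfig-language.txt lists for expressions ('&&' as minimum, '||' as maximum, '!' as 2 minus the value, '=' yielding 'y' or 'n'); the function names are invented for illustration and this is not the kconfig implementation.

```python
# Tristate values as used "for calculations": n=0, m=1, y=2.
N, M, Y = 0, 1, 2
NAMES = {N: "n", M: "m", Y: "y"}


def t_not(a):
    """'!' <expr>: 2 minus the value, so !m stays m."""
    return 2 - a


def t_and(a, b):
    """<expr> '&&' <expr>: the minimum of both values."""
    return min(a, b)


def t_or(a, b):
    """<expr> '||' <expr>: the maximum of both values."""
    return max(a, b)


def t_eq(a, b):
    """<symbol> '=' <symbol>: 'y' if the values match, otherwise 'n'."""
    return Y if a == b else N


def visible(value):
    """A menu entry is visible when its expression evaluates to 'm' or 'y'."""
    return value in (M, Y)


if __name__ == "__main__":
    # A dependency on "BAR && m" with BAR=y caps the result at 'm'.
    print(NAMES[t_and(Y, M)], visible(t_and(Y, M)))
```

Reading '&&' as a minimum explains why adding a dependency that evaluates to 'm' caps an entry at module-or-disabled.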
+ +Adding common features and make the usage configurable +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +It is a common idiom to implement a feature/functionality that is +relevant for some architectures but not all. +The recommended way to do so is to use a config variable named HAVE_* +that is defined in a common Kconfig file and selected by the relevant +architectures. +An example is the generic IOMAP functionality. + +We would in lib/Kconfig see: + +# Generic IOMAP is used to ... +config HAVE_GENERIC_IOMAP + +config GENERIC_IOMAP + depends on HAVE_GENERIC_IOMAP && FOO + +And in lib/Makefile we would see: +obj-$(CONFIG_GENERIC_IOMAP) += iomap.o + +For each architecture using the generic IOMAP functionality we would see: + +config X86 + select ... + select HAVE_GENERIC_IOMAP + select ... + +Note: we use the existing config option and avoid creating a new +config variable to select HAVE_GENERIC_IOMAP. + +Note: the internal config variable HAVE_GENERIC_IOMAP is +introduced to overcome the limitation of select which will force a +config option to 'y' no matter the dependencies. +The dependencies are moved to the symbol GENERIC_IOMAP and we avoid the +situation where select forces a symbol equal to 'y'. + +Build as module only +~~~~~~~~~~~~~~~~~~~~ +To restrict a component build to module-only, qualify its config symbol +with "depends on m". E.g.: + +config FOO + depends on BAR && m + +limits FOO to module (=m) or disabled (=n). + + +Build limited by a third config symbol which may be =y or =m +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +A common idiom that we see (and sometimes have problems with) is this: + +When option C in B (module or subsystem) uses interfaces from A (module +or subsystem), and both A and B are tristate (could be =y or =m if they +were independent of each other, but they aren't), then we need to limit +C such that it cannot be built statically if A is built as a loadable +module.
(C already depends on B, so there is no dependency issue to +take care of here.) + +If A is linked statically into the kernel image, C can be built +statically or as loadable module(s). However, if A is built as loadable +module(s), then C must be restricted to loadable module(s) also. This +can be expressed in kconfig language as: + +config C + depends on A = y || A = B + +or for real examples, use this command in a kernel tree: + +$ find . -name Kconfig\* | xargs grep -ns "depends on.*=.*||.*=" | grep -v orig + diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index db122df5e77d..92c40d174355 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -34,6 +34,7 @@ parameter is applicable: ALSA ALSA sound support is enabled. APIC APIC support is enabled. APM Advanced Power Management support is enabled. + AVR32 AVR32 architecture is enabled. AX25 Appropriate AX.25 support is enabled. BLACKFIN Blackfin architecture is enabled. DRM Direct Rendering Management support is enabled. @@ -369,7 +370,8 @@ and is between 256 and 4096 characters. It is defined in the file configured. Potentially dangerous and should only be used if you are entirely sure of the consequences. - chandev= [HW,NET] Generic channel device initialisation + ccw_timeout_log [S390] + See Documentation/s390/CommonIO for details. checkreqprot [SELINUX] Set initial checkreqprot flag value. Format: { "0" | "1" } @@ -381,6 +383,12 @@ and is between 256 and 4096 characters. It is defined in the file Value can be changed at runtime via /selinux/checkreqprot. + cio_ignore= [S390] + See Documentation/s390/CommonIO for details. + + cio_msg= [S390] + See Documentation/s390/CommonIO for details. + clock= [BUGS=X86-32, HW] gettimeofday clocksource override. [Deprecated] Forces specified clocksource (if available) to be used @@ -408,8 +416,21 @@ and is between 256 and 4096 characters. 
It is defined in the file [SPARC64] tick [X86-64] hpet,tsc - code_bytes [IA32] How many bytes of object code to print in an - oops report. + clearcpuid=BITNUM [X86] + Disable CPUID feature X for the kernel. See + include/asm-x86/cpufeature.h for the valid bit numbers. + Note the Linux specific bits are not necessarily + stable over kernel options, but the vendor specific + ones should be. + Also note that user programs calling CPUID directly + or using the feature without checking anything + will still see it. This just prevents it from + being used by the kernel or shown in /proc/cpuinfo. + Also note the kernel might malfunction if you disable + some critical bits. + + code_bytes [IA32/X86_64] How many bytes of object code to print + in an oops report. Range: 0 - 8192 Default: 64 @@ -562,6 +583,12 @@ and is between 256 and 4096 characters. It is defined in the file See drivers/char/README.epca and Documentation/digiepca.txt. + disable_mtrr_trim [X86, Intel and AMD only] + By default the kernel will trim any uncacheable + memory out of your available memory pool based on + MTRR settings. This parameter disables that behavior, + possibly causing your machine to run very slowly. + dmasound= [HW,OSS] Sound subsystem buffers dscc4.setup= [NET] @@ -652,6 +679,10 @@ and is between 256 and 4096 characters. It is defined in the file gamma= [HW,DRM] + gart_fix_e820= [X86_64] disable the fix e820 for K8 GART + Format: off | on + default: on + gdth= [HW,SCSI] See header of drivers/scsi/gdth.c. @@ -787,6 +818,16 @@ and is between 256 and 4096 characters. It is defined in the file for translation below 32 bit and if not available then look in the higher range. 
+ io_delay= [X86-32,X86-64] I/O delay method + 0x80 + Standard port 0x80 based delay + 0xed + Alternate port 0xed based delay (needed on some systems) + udelay + Simple two microseconds delay + none + No delay + io7= [HW] IO7 for Marvel based alpha systems See comment before marvel_specify_io7 in arch/alpha/kernel/core_marvel.c. @@ -1052,6 +1093,11 @@ and is between 256 and 4096 characters. It is defined in the file Multi-Function General Purpose Timers on AMD Geode platforms. + mfgptfix [X86-32] Fix MFGPT timers on AMD Geode platforms when + the BIOS has incorrectly applied a workaround. TinyBIOS + version 0.98 is known to be affected, 0.99 fixes the + problem by letting the user disable the workaround. + mga= [HW,DRM] mousedev.tap_time= @@ -1124,6 +1170,10 @@ and is between 256 and 4096 characters. It is defined in the file of returning the full 64-bit number. The default is to return 64-bit inode numbers. + nmi_debug= [KNL,AVR32] Specify one or more actions to take + when a NMI is triggered. + Format: [state][,regs][,debounce][,die] + nmi_watchdog= [KNL,BUGS=X86-32] Debugging features for SMP kernels no387 [BUGS=X86-32] Tells the kernel to use the 387 maths @@ -1148,6 +1198,8 @@ and is between 256 and 4096 characters. It is defined in the file nodisconnect [HW,SCSI,M68K] Disables SCSI disconnects. + noefi [X86-32,X86-64] Disable EFI runtime services support. + noexec [IA-64] noexec [X86-32,X86-64] @@ -1158,6 +1210,8 @@ and is between 256 and 4096 characters. It is defined in the file register save and restore. The kernel will only save legacy floating-point registers on task switch. + noclflush [BUGS=X86] Don't use the CLFLUSH instruction + nohlt [BUGS=ARM] no-hlt [BUGS=X86-32] Tells the kernel that the hlt @@ -1594,7 +1648,13 @@ and is between 256 and 4096 characters. 
It is defined in the file Format: <vendor>:<model>:<flags> (flags are integer value) - scsi_logging= [SCSI] + scsi_logging_level= [SCSI] a bit mask of logging levels + See drivers/scsi/scsi_logging.h for bits. Also + settable via sysctl at dev.scsi.logging_level + (/proc/sys/dev/scsi/logging_level). + There is also a nice 'scsi_logging_level' script in the + S390-tools package, available for download at + http://www-128.ibm.com/developerworks/linux/linux390/s390-tools-1.5.4.html scsi_mod.scan= [SCSI] sync (default) scans SCSI busses as they are discovered. async scans them in kernel threads, @@ -1961,6 +2021,11 @@ and is between 256 and 4096 characters. It is defined in the file vdso=1: enable VDSO (default) vdso=0: disable VDSO mapping + vdso32= [X86-32,X86-64] + vdso32=2: enable compat VDSO (default with COMPAT_VDSO) + vdso32=1: enable 32-bit VDSO (default) + vdso32=0: disable 32-bit VDSO mapping + vector= [IA-64,SMP] vector=percpu: enable percpu vector domain diff --git a/Documentation/kobject.txt b/Documentation/kobject.txt index ca86a885ad8f..bf3256e04027 100644 --- a/Documentation/kobject.txt +++ b/Documentation/kobject.txt @@ -1,289 +1,386 @@ -The kobject Infrastructure +Everything you never wanted to know about kobjects, ksets, and ktypes -Patrick Mochel <mochel@osdl.org> +Greg Kroah-Hartman <gregkh@suse.de> -Updated: 3 June 2003 +Based on an original article by Jon Corbet for lwn.net written October 1, +2003 and located at http://lwn.net/Articles/51437/ +Last updated December 19, 2007 -Copyright (c) 2003 Patrick Mochel -Copyright (c) 2003 Open Source Development Labs +Part of the difficulty in understanding the driver model - and the kobject +abstraction upon which it is built - is that there is no obvious starting +place. Dealing with kobjects requires understanding a few different types, +all of which make reference to each other. 
In an attempt to make things +easier, we'll take a multi-pass approach, starting with vague terms and +adding detail as we go. To that end, here are some quick definitions of +some terms we will be working with. -0. Introduction + - A kobject is an object of type struct kobject. Kobjects have a name + and a reference count. A kobject also has a parent pointer (allowing + objects to be arranged into hierarchies), a specific type, and, + usually, a representation in the sysfs virtual filesystem. -The kobject infrastructure performs basic object management that larger -data structures and subsystems can leverage, rather than reimplement -similar functionality. This functionality primarily concerns: + Kobjects are generally not interesting on their own; instead, they are + usually embedded within some other structure which contains the stuff + the code is really interested in. -- Object reference counting. -- Maintaining lists (sets) of objects. -- Object set locking. -- Userspace representation. + No structure should EVER have more than one kobject embedded within it. + If it does, the reference counting for the object is sure to be messed + up and incorrect, and your code will be buggy. So do not do this. -The infrastructure consists of a number of object types to support -this functionality. Their programming interfaces are described below -in detail, and briefly here: + - A ktype is the type of object that embeds a kobject. Every structure + that embeds a kobject needs a corresponding ktype. The ktype controls + what happens to the kobject when it is created and destroyed. -- kobjects a simple object. -- kset a set of objects of a certain type. -- ktype a set of helpers for objects of a common type. + - A kset is a group of kobjects. These kobjects can be of the same ktype + or belong to different ktypes. The kset is the basic container type for + collections of kobjects. 
Ksets contain their own kobjects, but you can + safely ignore that implementation detail as the kset core code handles + this kobject automatically. + When you see a sysfs directory full of other directories, generally each + of those directories corresponds to a kobject in the same kset. -The kobject infrastructure maintains a close relationship with the -sysfs filesystem. Each kobject that is registered with the kobject -core receives a directory in sysfs. Attributes about the kobject can -then be exported. Please see Documentation/filesystems/sysfs.txt for -more information. +We'll look at how to create and manipulate all of these types. A bottom-up +approach will be taken, so we'll go back to kobjects. -The kobject infrastructure provides a flexible programming interface, -and allows kobjects and ksets to be used without being registered -(i.e. with no sysfs representation). This is also described later. +Embedding kobjects -1. kobjects +It is rare for kernel code to create a standalone kobject, with one major +exception explained below. Instead, kobjects are used to control access to +a larger, domain-specific object. To this end, kobjects will be found +embedded in other structures. If you are used to thinking of things in +object-oriented terms, kobjects can be seen as a top-level, abstract class +from which other classes are derived. A kobject implements a set of +capabilities which are not particularly useful by themselves, but which are +nice to have in other objects. The C language does not allow for the +direct expression of inheritance, so other techniques - such as structure +embedding - must be used. 
-1.1 Description +So, for example, the UIO code has a structure that defines the memory +region associated with a uio device: +struct uio_mem { + struct kobject kobj; + unsigned long addr; + unsigned long size; + int memtype; + void __iomem *internal_addr; +}; -struct kobject is a simple data type that provides a foundation for -more complex object types. It provides a set of basic fields that -almost all complex data types share. kobjects are intended to be -embedded in larger data structures and replace fields they duplicate. +If you have a struct uio_mem structure, finding its embedded kobject is +just a matter of using the kobj member. Code that works with kobjects will +often have the opposite problem, however: given a struct kobject pointer, +what is the pointer to the containing structure? You must avoid tricks +(such as assuming that the kobject is at the beginning of the structure) +and, instead, use the container_of() macro, found in <linux/kernel.h>: -1.2 Definition + container_of(pointer, type, member) -struct kobject { - const char * k_name; - struct kref kref; - struct list_head entry; - struct kobject * parent; - struct kset * kset; - struct kobj_type * ktype; - struct sysfs_dirent * sd; - wait_queue_head_t poll; -}; +where pointer is the pointer to the embedded kobject, type is the type of +the containing structure, and member is the name of the structure field to +which pointer points. The return value from container_of() is a pointer to +the given type. 
So, for example, a pointer "kp" to a struct kobject +embedded within a struct uio_mem could be converted to a pointer to the +containing uio_mem structure with: -void kobject_init(struct kobject *); -int kobject_add(struct kobject *); -int kobject_register(struct kobject *); + struct uio_mem *u_mem = container_of(kp, struct uio_mem, kobj); -void kobject_del(struct kobject *); -void kobject_unregister(struct kobject *); +Programmers often define a simple macro for "back-casting" kobject pointers +to the containing type. -struct kobject * kobject_get(struct kobject *); -void kobject_put(struct kobject *); +Initialization of kobjects -1.3 kobject Programming Interface +Code which creates a kobject must, of course, initialize that object. Some +of the internal fields are setup with a (mandatory) call to kobject_init(): -kobjects may be dynamically added and removed from the kobject core -using kobject_register() and kobject_unregister(). Registration -includes inserting the kobject in the list of its dominant kset and -creating a directory for it in sysfs. + void kobject_init(struct kobject *kobj, struct kobj_type *ktype); -Alternatively, one may use a kobject without adding it to its kset's list -or exporting it via sysfs, by simply calling kobject_init(). An -initialized kobject may later be added to the object hierarchy by -calling kobject_add(). An initialized kobject may be used for -reference counting. +The ktype is required for a kobject to be created properly, as every kobject +must have an associated kobj_type. After calling kobject_init(), to +register the kobject with sysfs, the function kobject_add() must be called: -Note: calling kobject_init() then kobject_add() is functionally -equivalent to calling kobject_register(). + int kobject_add(struct kobject *kobj, struct kobject *parent, const char *fmt, ...); -When a kobject is unregistered, it is removed from its kset's list, -removed from the sysfs filesystem, and its reference count is decremented. 
-List and sysfs removal happen in kobject_del(), and may be called -manually. kobject_put() decrements the reference count, and may also -be called manually. +This sets up the parent of the kobject and the name for the kobject +properly. If the kobject is to be associated with a specific kset, +kobj->kset must be assigned before calling kobject_add(). If a kset is +associated with a kobject, then the parent for the kobject can be set to +NULL in the call to kobject_add() and then the kobject's parent will be the +kset itself. -A kobject's reference count may be incremented with kobject_get(), -which returns a valid reference to a kobject; and decremented with -kobject_put(). An object's reference count may only be incremented if -it is already positive. +As the name of the kobject is set when it is added to the kernel, the name +of the kobject should never be manipulated directly. If you must change +the name of the kobject, call kobject_rename(): -When a kobject's reference count reaches 0, the method struct -kobj_type::release() (which the kobject's kset points to) is called. -This allows any memory allocated for the object to be freed. + int kobject_rename(struct kobject *kobj, const char *new_name); +There is a function called kobject_set_name() but that is legacy cruft and +is being removed. If your code needs to call this function, it is +incorrect and needs to be fixed. -NOTE!!! +To properly access the name of the kobject, use the function +kobject_name(): -It is _imperative_ that you supply a destructor for dynamically -allocated kobjects to free them if you are using kobject reference -counts. The reference count controls the lifetime of the object. -If it goes to 0, then it is assumed that the object will -be freed and cannot be used. + const char *kobject_name(const struct kobject * kobj); -More importantly, you must free the object there, and not immediately -after an unregister call. If someone else is referencing the object -(e.g. 
through a sysfs file), they will obtain a reference to the -object, assume it's valid and operate on it. If the object is -unregistered and freed in the meantime, the operation will then -reference freed memory and go boom. +There is a helper function to both initialize and add the kobject to the +kernel at the same time, called, surprisingly enough, kobject_init_and_add(): -This can be prevented, in the simplest case, by defining a release -method and freeing the object from there only. Note that this will not -secure reference count/object management models that use a dual -reference count or do other wacky things with the reference count -(like the networking layer). + int kobject_init_and_add(struct kobject *kobj, struct kobj_type *ktype, + struct kobject *parent, const char *fmt, ...); +The arguments are the same as the individual kobject_init() and +kobject_add() functions described above. -1.4 sysfs -Each kobject receives a directory in sysfs. This directory is created -under the kobject's parent directory. +Uevents -If a kobject does not have a parent when it is registered, its parent -becomes its dominant kset. +After a kobject has been registered with the kobject core, you need to +announce to the world that it has been created. This can be done with a +call to kobject_uevent(): -If a kobject does not have a parent nor a dominant kset, its directory -is created at the top-level of the sysfs partition. + int kobject_uevent(struct kobject *kobj, enum kobject_action action); +Use the KOBJ_ADD action for when the kobject is first added to the kernel. +This should be done only after any attributes or children of the kobject +have been initialized properly, as userspace will instantly start to look +for them when this call happens. +When the kobject is removed from the kernel (details on how to do that are +below), the uevent for KOBJ_REMOVE will be automatically created by the +kobject core, so the caller does not have to worry about doing that by +hand. -2.
ksets -2.1 Description +Reference counts -A kset is a set of kobjects that are embedded in the same type. +One of the key functions of a kobject is to serve as a reference counter +for the object in which it is embedded. As long as references to the object +exist, the object (and the code which supports it) must continue to exist. +The low-level functions for manipulating a kobject's reference counts are: + struct kobject *kobject_get(struct kobject *kobj); + void kobject_put(struct kobject *kobj); -struct kset { - struct kobj_type * ktype; - struct list_head list; - struct kobject kobj; - struct kset_uevent_ops * uevent_ops; -}; +A successful call to kobject_get() will increment the kobject's reference +counter and return the pointer to the kobject. +When a reference is released, the call to kobject_put() will decrement the +reference count and, possibly, free the object. Note that kobject_init() +sets the reference count to one, so the code which sets up the kobject will +need to do a kobject_put() eventually to release that reference. -void kset_init(struct kset * k); -int kset_add(struct kset * k); -int kset_register(struct kset * k); -void kset_unregister(struct kset * k); +Because kobjects are dynamic, they must not be declared statically or on +the stack, but instead, always allocated dynamically. Future versions of +the kernel will contain a run-time check for kobjects that are created +statically and will warn the developer of this improper usage. -struct kset * kset_get(struct kset * k); -void kset_put(struct kset * k); +If all that you want to use a kobject for is to provide a reference counter +for your structure, please use the struct kref instead; a kobject would be +overkill. For more information on how to use struct kref, please see the +file Documentation/kref.txt in the Linux kernel source tree. 
-struct kobject * kset_find_obj(struct kset *, char *); +Creating "simple" kobjects -The type that the kobjects are embedded in is described by the ktype -pointer. +Sometimes all that a developer wants is a way to create a simple directory +in the sysfs hierarchy, and not have to mess with the whole complication of +ksets, show and store functions, and other details. This is the one +exception where a single kobject should be created. To create such an +entry, use the function: -A kset contains a kobject itself, meaning that it may be registered in -the kobject hierarchy and exported via sysfs. More importantly, the -kset may be embedded in a larger data type, and may be part of another -kset (of that object type). + struct kobject *kobject_create_and_add(char *name, struct kobject *parent); -For example, a block device is an object (struct gendisk) that is -contained in a set of block devices. It may also contain a set of -partitions (struct hd_struct) that have been found on the device. The -following code snippet illustrates how to express this properly. +This function will create a kobject and place it in sysfs in the location +underneath the specified parent kobject. To create simple attributes +associated with this kobject, use: - struct gendisk * disk; - ... - disk->kset.kobj.kset = &block_kset; - disk->kset.ktype = &partition_ktype; - kset_register(&disk->kset); + int sysfs_create_file(struct kobject *kobj, struct attribute *attr); +or + int sysfs_create_group(struct kobject *kobj, struct attribute_group *grp); -- The kset that the disk's embedded object belongs to is the - block_kset, and is pointed to by disk->kset.kobj.kset. +Both types of attributes used here, with a kobject that has been created +with the kobject_create_and_add(), can be of type kobj_attribute, so no +special custom attribute is needed to be created. -- The type of objects on the disk's _subordinate_ list are partitions, - and is set in disk->kset.ktype. 
+See the example module, samples/kobject/kobject-example.c for an +implementation of a simple kobject and attributes. -- The kset is then registered, which handles initializing and adding - the embedded kobject to the hierarchy. -2.2 kset Programming Interface +ktypes and release methods -All kset functions, except kset_find_obj(), eventually forward the -calls to their embedded kobjects after performing kset-specific -operations. ksets offer a similar programming model to kobjects: they -may be used after they are initialized, without registering them in -the hierarchy. +One important thing still missing from the discussion is what happens to a +kobject when its reference count reaches zero. The code which created the +kobject generally does not know when that will happen; if it did, there +would be little point in using a kobject in the first place. Even +predictable object lifecycles become more complicated when sysfs is brought +in as other portions of the kernel can get a reference on any kobject that +is registered in the system. -kset_find_obj() may be used to locate a kobject with a particular -name. The kobject, if found, is returned. +The end result is that a structure protected by a kobject cannot be freed +before its reference count goes to zero. The reference count is not under +the direct control of the code which created the kobject. So that code must +be notified asynchronously whenever the last reference to one of its +kobjects goes away. -There are also some helper functions which names point to the formerly -existing "struct subsystem", whose functions have been taken over by -ksets. +Once you have registered your kobject via kobject_add(), you must never use +kfree() to free it directly. The only safe way is to use kobject_put(). It +is good practice to always use kobject_put() after kobject_init() to avoid +errors creeping in. +This notification is done through a kobject's release() method.
Usually +such a method has a form like: -decl_subsys(name,type,uevent_ops) + void my_object_release(struct kobject *kobj) + { + struct my_object *mine = container_of(kobj, struct my_object, kobj); -Declares a kset named '<name>_subsys' of type <type> with -uevent_ops <uevent_ops>. For example, + /* Perform any additional cleanup on this object, then... */ + kfree(mine); + } -decl_subsys(devices, &ktype_device, &device_uevent_ops); +One important point cannot be overstated: every kobject must have a +release() method, and the kobject must persist (in a consistent state) +until that method is called. If these constraints are not met, the code is +flawed. Note that the kernel will warn you if you forget to provide a +release() method. Do not try to get rid of this warning by providing an +"empty" release function; you will be mocked mercilessly by the kobject +maintainer if you attempt this. -is equivalent to doing: +Note, the name of the kobject is available in the release function, but it +must NOT be changed within this callback. Otherwise there will be a memory +leak in the kobject core, which makes people unhappy. -struct kset devices_subsys = { - .ktype = &ktype_devices, - .uevent_ops = &device_uevent_ops, -}; -kobject_set_name(&devices_subsys, name); +Interestingly, the release() method is not stored in the kobject itself; +instead, it is associated with the ktype. So let us introduce struct +kobj_type: + + struct kobj_type { + void (*release)(struct kobject *); + struct sysfs_ops *sysfs_ops; + struct attribute **default_attrs; + }; -The objects that are registered with a subsystem that use the -subsystem's default list must have their kset ptr set properly. These -objects may have embedded kobjects or ksets. The -following helper makes setting the kset easier: +This structure is used to describe a particular type of kobject (or, more +correctly, of containing object). 
Every kobject needs to have an associated +kobj_type structure; a pointer to that structure must be specified when you +call kobject_init() or kobject_init_and_add(). +The release field in struct kobj_type is, of course, a pointer to the +release() method for this type of kobject. The other two fields (sysfs_ops +and default_attrs) control how objects of this type are represented in +sysfs; they are beyond the scope of this document. -kobj_set_kset_s(obj,subsys) +The default_attrs pointer is a list of default attributes that will be +automatically created for any kobject that is registered with this ktype. -- Assumes that obj->kobj exists, and is a struct kobject. -- Sets the kset of that kobject to the kset <subsys>. -int subsystem_register(struct kset *s); -void subsystem_unregister(struct kset *s); +ksets -These are just wrappers around the respective kset_* functions. +A kset is merely a collection of kobjects that want to be associated with +each other. There is no restriction that they be of the same ktype, but be +very careful if they are not. -2.3 sysfs +A kset serves these functions: -ksets are represented in sysfs when their embedded kobjects are -registered. They follow the same rules of parenting, with one -exception. If a kset does not have a parent, nor is its embedded -kobject part of another kset, the kset's parent becomes its dominant -subsystem. + - It serves as a bag containing a group of objects. A kset can be used by + the kernel to track "all block devices" or "all PCI device drivers." -If the kset does not have a parent, its directory is created at the -sysfs root. This should only happen when the kset registered is -embedded in a subsystem itself. + - A kset is also a subdirectory in sysfs, where the associated kobjects + with the kset can show up. Every kset contains a kobject which can be + set up to be the parent of other kobjects; the top-level directories of + the sysfs hierarchy are constructed in this way. 
+ - Ksets can support the "hotplugging" of kobjects and influence how + uevent events are reported to user space. -3. struct ktype +In object-oriented terms, "kset" is the top-level container class; ksets +contain their own kobject, but that kobject is managed by the kset code and +should not be manipulated by any other user. -3.1. Description +A kset keeps its children in a standard kernel linked list. Kobjects point +back to their containing kset via their kset field. In almost all cases, +the kobjects belonging to a kset have that kset (or, strictly, its embedded +kobject) in their parent. -struct kobj_type { - void (*release)(struct kobject *); - struct sysfs_ops * sysfs_ops; - struct attribute ** default_attrs; +As a kset contains a kobject within it, it should always be dynamically +created and never declared statically or on the stack. To create a new +kset use: + struct kset *kset_create_and_add(const char *name, + struct kset_uevent_ops *u, + struct kobject *parent); + +When you are finished with the kset, call: + void kset_unregister(struct kset *kset); +to destroy it. + +An example of using a kset can be seen in the +samples/kobject/kset-example.c file in the kernel tree. + +If a kset wishes to control the uevent operations of the kobjects +associated with it, it can use the struct kset_uevent_ops to handle it: + +struct kset_uevent_ops { + int (*filter)(struct kset *kset, struct kobject *kobj); + const char *(*name)(struct kset *kset, struct kobject *kobj); + int (*uevent)(struct kset *kset, struct kobject *kobj, + struct kobj_uevent_env *env); }; -Object types require specific functions for converting between the -generic object and the more complex type. struct kobj_type provides -the object-specific fields, which include: +The filter function allows a kset to prevent a uevent from being emitted to +userspace for a specific kobject. If the function returns 0, the uevent +will not be emitted.
+ +The name function will be called to override the default name of the kset +that the uevent sends to userspace. By default, the name will be the same +as the kset itself, but this function, if present, can override that name. + +The uevent function will be called when the uevent is about to be sent to +userspace to allow more environment variables to be added to the uevent. + +One might ask how, exactly, a kobject is added to a kset, given that no +functions which perform that function have been presented. The answer is +that this task is handled by kobject_add(). When a kobject is passed to +kobject_add(), its kset member should point to the kset to which the +kobject will belong. kobject_add() will handle the rest. + +If the kobject belonging to a kset has no parent kobject set, it will be +added to the kset's directory. Not all members of a kset necessarily +live in the kset directory. If an explicit parent kobject is assigned +before the kobject is added, the kobject is registered with the kset, but +added below the parent kobject. + + +Kobject removal -- release: Called when the kobject's reference count reaches 0. This - should convert the object to the more complex type and free it. +After a kobject has been registered with the kobject core successfully, it +must be cleaned up when the code is finished with it. To do that, call +kobject_put(). By doing this, the kobject core will automatically clean up +all of the memory allocated by this kobject. If a KOBJ_ADD uevent has been +sent for the object, a corresponding KOBJ_REMOVE uevent will be sent, and +any other sysfs housekeeping will be handled for the caller properly. -- sysfs_ops: Provides conversion functions for sysfs access. Please - see the sysfs documentation for more information. +If you need to do a two-stage delete of the kobject (say you are not +allowed to sleep when you need to destroy the object), then call +kobject_del() which will unregister the kobject from sysfs.
This makes the +kobject "invisible", but it is not cleaned up, and the reference count of +the object is still the same. At a later time call kobject_put() to finish +the cleanup of the memory associated with the kobject. -- default_attrs: Default attributes to be exported via sysfs when the - object is registered.Note that the last attribute has to be - initialized to NULL ! You can find a complete implementation - in block/genhd.c +kobject_del() can be used to drop the reference to the parent object, if +circular references are constructed. It is valid in some cases, that a +parent objects references a child. Circular references _must_ be broken +with an explicit call to kobject_del(), so that a release functions will be +called, and the objects in the former circle release each other. -Instances of struct kobj_type are not registered; only referenced by -the kset. A kobj_type may be referenced by an arbitrary number of -ksets, as there may be disparate sets of identical objects. +Example code to copy from +For a more complete example of using ksets and kobjects properly, see the +sample/kobject/kset-example.c code. diff --git a/Documentation/kprobes.txt b/Documentation/kprobes.txt index cb12ae175aa2..53a63890aea4 100644 --- a/Documentation/kprobes.txt +++ b/Documentation/kprobes.txt @@ -141,6 +141,7 @@ architectures: - ppc64 - ia64 (Does not support probes on instruction slot1.) - sparc64 (Return probes not yet implemented.) +- arm 3. Configuring Kprobes diff --git a/Documentation/lguest/lguest.c b/Documentation/lguest/lguest.c index 9b0e322118b5..6c8a2386cd50 100644 --- a/Documentation/lguest/lguest.c +++ b/Documentation/lguest/lguest.c @@ -79,6 +79,9 @@ static void *guest_base; /* The maximum guest physical address allowed, and maximum possible. */ static unsigned long guest_limit, guest_max; +/* a per-cpu variable indicating whose vcpu is currently running */ +static unsigned int __thread cpu_id; + /* This is our list of devices. 
*/ struct device_list { @@ -153,6 +156,9 @@ struct virtqueue void (*handle_output)(int fd, struct virtqueue *me); }; +/* Remember the arguments to the program so we can "reboot" */ +static char **main_args; + /* Since guest is UP and we don't run at the same time, we don't need barriers. * But I include them in the code in case others copy it. */ #define wmb() @@ -554,7 +560,7 @@ static void wake_parent(int pipefd, int lguest_fd) else FD_CLR(-fd - 1, &devices.infds); } else /* Send LHREQ_BREAK command. */ - write(lguest_fd, args, sizeof(args)); + pwrite(lguest_fd, args, sizeof(args), cpu_id); } } @@ -1489,7 +1495,9 @@ static void setup_block_file(const char *filename) /* Create stack for thread and run it */ stack = malloc(32768); - if (clone(io_thread, stack + 32768, CLONE_VM, dev) == -1) + /* SIGCHLD - We dont "wait" for our cloned thread, so prevent it from + * becoming a zombie. */ + if (clone(io_thread, stack + 32768, CLONE_VM | SIGCHLD, dev) == -1) err(1, "Creating clone"); /* We don't need to keep the I/O thread's end of the pipes open. */ @@ -1499,7 +1507,21 @@ static void setup_block_file(const char *filename) verbose("device %u: virtblock %llu sectors\n", devices.device_num, cap); } -/* That's the end of device setup. */ +/* That's the end of device setup. :*/ + +/* Reboot */ +static void __attribute__((noreturn)) restart_guest(void) +{ + unsigned int i; + + /* Closing pipes causes the waker thread and io_threads to die, and + * closing /dev/lguest cleans up the Guest. Since we don't track all + * open fds, we simply close everything beyond stderr. */ + for (i = 3; i < FD_SETSIZE; i++) + close(i); + execv(main_args[0], main_args); + err(1, "Could not exec %s", main_args[0]); +} /*L:220 Finally we reach the core of the Launcher, which runs the Guest, serves * its input and output, and finally, lays it to rest. 
*/ @@ -1511,7 +1533,8 @@ static void __attribute__((noreturn)) run_guest(int lguest_fd) int readval; /* We read from the /dev/lguest device to run the Guest. */ - readval = read(lguest_fd, &notify_addr, sizeof(notify_addr)); + readval = pread(lguest_fd, &notify_addr, + sizeof(notify_addr), cpu_id); /* One unsigned long means the Guest did HCALL_NOTIFY */ if (readval == sizeof(notify_addr)) { @@ -1521,16 +1544,23 @@ static void __attribute__((noreturn)) run_guest(int lguest_fd) /* ENOENT means the Guest died. Reading tells us why. */ } else if (errno == ENOENT) { char reason[1024] = { 0 }; - read(lguest_fd, reason, sizeof(reason)-1); + pread(lguest_fd, reason, sizeof(reason)-1, cpu_id); errx(1, "%s", reason); + /* ERESTART means that we need to reboot the guest */ + } else if (errno == ERESTART) { + restart_guest(); /* EAGAIN means the Waker wanted us to look at some input. * Anything else means a bug or incompatible change. */ } else if (errno != EAGAIN) err(1, "Running guest failed"); + /* Only service input on thread for CPU 0. */ + if (cpu_id != 0) + continue; + /* Service input, then unset the BREAK to release the Waker. */ handle_input(lguest_fd); - if (write(lguest_fd, args, sizeof(args)) < 0) + if (pwrite(lguest_fd, args, sizeof(args), cpu_id) < 0) err(1, "Resetting break"); } } @@ -1571,6 +1601,12 @@ int main(int argc, char *argv[]) /* If they specify an initrd file to load. */ const char *initrd_name = NULL; + /* Save the args: we "reboot" by execing ourselves again. */ + main_args = argv; + /* We don't "wait" for the children, so prevent them from becoming + * zombies. */ + signal(SIGCHLD, SIG_IGN); + /* First we initialize the device list.
Since console and network * device receive input from a file descriptor, we keep an fdset * (infds) and the maximum fd number (max_infd) with the head of the @@ -1582,6 +1618,7 @@ int main(int argc, char *argv[]) devices.lastdev = &devices.dev; devices.next_irq = 1; + cpu_id = 0; /* We need to know how much memory so we can set up the device * descriptor and memory pages for the devices as we parse the command * line. So we quickly look through the arguments to find the amount diff --git a/Documentation/m68k/kernel-options.txt b/Documentation/m68k/kernel-options.txt index 248589e8bcf5..c93bed66e25d 100644 --- a/Documentation/m68k/kernel-options.txt +++ b/Documentation/m68k/kernel-options.txt @@ -867,66 +867,6 @@ controller and should be autodetected by the driver. An example is the 24 bit region which is specified by a mask of 0x00fffffe. -5.5) 53c7xx= ------------- - -Syntax: 53c7xx=<sub-options...> - -These options affect the A4000T, A4091, WarpEngine, Blizzard 603e+, -and GForce 040/060 SCSI controllers on the Amiga, as well as the -builtin MVME 16x SCSI controller. - -The <sub-options> is a comma-separated list of the sub-options listed -below. - -5.5.1) nosync -------------- - -Syntax: nosync:0 - - Disables sync negotiation for all devices. Any value after the - colon is acceptable (and has the same effect). - -5.5.2) noasync --------------- - -[OBSOLETE, REMOVED] - -5.5.3) nodisconnect -------------------- - -Syntax: nodisconnect:0 - - Disables SCSI disconnects. Any value after the colon is acceptable - (and has the same effect). - -5.5.4) validids ---------------- - -Syntax: validids:0xNN - - Specify which SCSI ids the driver should pay attention to. This is - a bitmask (i.e. to only pay attention to ID#4, you'd use 0x10). - Default is 0x7f (devices 0-6). - -5.5.5) opthi -5.5.6) optlo ------------- - -Syntax: opthi:M,optlo:N - - Specify options for "hostdata->options". 
The acceptable definitions - are listed in drivers/scsi/53c7xx.h; the 32 high bits should be in - opthi and the 32 low bits in optlo. They must be specified in the - order opthi=M,optlo=N. - -5.5.7) next ------------ - - No argument. Used to separate blocks of keywords when there's more - than one 53c7xx host adapter in the system. - - /* Local Variables: */ /* mode: text */ /* End: */ diff --git a/Documentation/mips/00-INDEX b/Documentation/mips/00-INDEX index 3f13bf8043d2..8ae9cffc2262 100644 --- a/Documentation/mips/00-INDEX +++ b/Documentation/mips/00-INDEX @@ -2,5 +2,3 @@ - this file. AU1xxx_IDE.README - README for MIPS AU1XXX IDE driver. -GT64120.README - - README for dir with info on MIPS boards using GT-64120 or GT-64120A. diff --git a/Documentation/mips/GT64120.README b/Documentation/mips/GT64120.README deleted file mode 100644 index 2d0eec91dc59..000000000000 --- a/Documentation/mips/GT64120.README +++ /dev/null @@ -1,65 +0,0 @@ -README for arch/mips/gt64120 directory and subdirectories - -Jun Sun, jsun@mvista.com or jsun@junsun.net -01/27, 2001 - -MOTIVATION ----------- - -Many MIPS boards share the same system controller (or CPU companian chip), -such as GT-64120. It is highly desirable to let these boards share -the same controller code instead of duplicating them. - -This directory is meant to hold all MIPS boards that use GT-64120 or GT-64120A. - - -HOW TO ADD A BOARD ------------------- - -. Create a subdirectory include/asm/gt64120/<board>. - -. Create a file called gt64120_dep.h under that directory. - -. Modify include/asm/gt64120/gt64120.h file to include the new gt64120_dep.h - based on config options. The board-dep section is at the end of - include/asm/gt64120/gt64120.h file. There you can find all required - definitions include/asm/gt64120/<board>/gt64120_dep.h file must supply. - -. Create a subdirectory arch/mips/gt64120/<board> directory to hold - board specific routines. - -. 
The GT-64120 common code is supplied under arch/mips/gt64120/common directory. - It includes: - 1) arch/mips/gt64120/pci.c - - common PCI routine, include the top-level pcibios_init() - 2) arch/mips/gt64120/irq.c - - common IRQ routine, include the top-level do_IRQ() - [This part really belongs to arch/mips/kernel. jsun] - 3) arch/mips/gt64120/gt_irq.c - - common IRQ routines for GT-64120 chip. Currently it only handles - the timer interrupt. - -. Board-specific routines are supplied under arch/mips/gt64120/<board> dir. - 1) arch/mips/gt64120/<board>/pci.c - it provides bus fixup routine - 2) arch/mips/gt64120/<board>/irq.c - it provides enable/disable irqs - and board irq setup routine (irq_setup) - 3) arch/mips/gt64120/<board>/int-handler.S - - The first-level interrupt dispatching routine. - 4) a bunch of other "normal" stuff (setup, prom, dbg_io, reset, etc) - -. Follow other "normal" procedure to modify configuration files, etc. - - -TO-DO LIST ----------- - -. Expand arch/mips/gt64120/gt_irq.c to handle all GT-64120 interrupts. - We probably need to introduce GT_IRQ_BASE in board-dep header file, - which is used the starting irq_nr for all GT irqs. - - A function, gt64120_handle_irq(), will be added so that the first-level - irq dispatcher will call this function if it detects an interrupt - from GT-64120. - -. More support for GT-64120 PCI features (2nd PCI bus, perhaps) - diff --git a/Documentation/networking/00-INDEX b/Documentation/networking/00-INDEX index 563e442f2d42..02e56d447a8f 100644 --- a/Documentation/networking/00-INDEX +++ b/Documentation/networking/00-INDEX @@ -24,6 +24,8 @@ baycom.txt - info on the driver for Baycom style amateur radio modems bridge.txt - where to get user space programs for ethernet bridging with Linux. +can.txt + - documentation on CAN protocol family. 
cops.txt - info on the COPS LocalTalk Linux driver cs89x0.txt @@ -82,8 +84,6 @@ policy-routing.txt - IP policy-based routing ray_cs.txt - Raylink Wireless LAN card driver info. -shaper.txt - - info on the module that can shape/limit transmitted traffic. sk98lin.txt - Marvell Yukon Chipset / SysKonnect SK-98xx compliant Gigabit Ethernet Adapter family driver info diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt index 6cc30e0d5795..a0cda062bc33 100644 --- a/Documentation/networking/bonding.txt +++ b/Documentation/networking/bonding.txt @@ -1,7 +1,7 @@ Linux Ethernet Bonding Driver HOWTO - Latest update: 24 April 2006 + Latest update: 12 November 2007 Initial release : Thomas Davis <tadavis at lbl.gov> Corrections, HA extensions : 2000/10/03-15 : @@ -166,12 +166,17 @@ to use ifenslave. 2. Bonding Driver Options ========================= - Options for the bonding driver are supplied as parameters to -the bonding module at load time. They may be given as command line -arguments to the insmod or modprobe command, but are usually specified -in either the /etc/modules.conf or /etc/modprobe.conf configuration -file, or in a distro-specific configuration file (some of which are -detailed in the next section). + Options for the bonding driver are supplied as parameters to the +bonding module at load time, or are specified via sysfs. + + Module options may be given as command line arguments to the +insmod or modprobe command, but are usually specified in either the +/etc/modules.conf or /etc/modprobe.conf configuration file, or in a +distro-specific configuration file (some of which are detailed in the next +section). + + Details on bonding support for sysfs is provided in the +"Configuring Bonding Manually via Sysfs" section, below. The available bonding driver parameters are listed below. If a parameter is not specified the default value is used. 
When initially @@ -812,11 +817,13 @@ the system /etc/modules.conf or /etc/modprobe.conf configuration file. 3.2 Configuration with Initscripts Support ------------------------------------------ - This section applies to distros using a version of initscripts -with bonding support, for example, Red Hat Linux 9 or Red Hat -Enterprise Linux version 3 or 4. On these systems, the network -initialization scripts have some knowledge of bonding, and can be -configured to control bonding devices. + This section applies to distros using a recent version of +initscripts with bonding support, for example, Red Hat Enterprise Linux +version 3 or later, Fedora, etc. On these systems, the network +initialization scripts have knowledge of bonding, and can be configured to +control bonding devices. Note that older versions of the initscripts +package have lower levels of support for bonding; this will be noted where +applicable. These distros will not automatically load the network adapter driver unless the ethX device is configured with an IP address. @@ -864,11 +871,31 @@ USERCTL=no Be sure to change the networking specific lines (IPADDR, NETMASK, NETWORK and BROADCAST) to match your network configuration. - Finally, it is necessary to edit /etc/modules.conf (or -/etc/modprobe.conf, depending upon your distro) to load the bonding -module with your desired options when the bond0 interface is brought -up. The following lines in /etc/modules.conf (or modprobe.conf) will -load the bonding module, and select its options: + For later versions of initscripts, such as that found with Fedora +7 and Red Hat Enterprise Linux version 5 (or later), it is possible, and, +indeed, preferable, to specify the bonding options in the ifcfg-bond0 +file, e.g. a line of the format: + +BONDING_OPTS="mode=active-backup arp_interval=60 arp_ip_target=+192.168.1.254" + + will configure the bond with the specified options. 
The options +specified in BONDING_OPTS are identical to the bonding module parameters +except for the arp_ip_target field. Each target should be included as a +separate option and should be preceded by a '+' to indicate it should be +added to the list of queried targets, e.g., + + arp_ip_target=+192.168.1.1 arp_ip_target=+192.168.1.2 + + is the proper syntax to specify multiple targets. When specifying +options via BONDING_OPTS, it is not necessary to edit /etc/modules.conf or +/etc/modprobe.conf. + + For older versions of initscripts that do not support +BONDING_OPTS, it is necessary to edit /etc/modules.conf (or +/etc/modprobe.conf, depending upon your distro) to load the bonding module +with your desired options when the bond0 interface is brought up. The +following lines in /etc/modules.conf (or modprobe.conf) will load the +bonding module, and select its options: alias bond0 bonding options bond0 mode=balance-alb miimon=100 @@ -883,9 +910,10 @@ up and running. 3.2.1 Using DHCP with Initscripts --------------------------------- - Recent versions of initscripts (the version supplied with -Fedora Core 3 and Red Hat Enterprise Linux 4 is reported to work) do -have support for assigning IP information to bonding devices via DHCP. + Recent versions of initscripts (the versions supplied with Fedora +Core 3 and Red Hat Enterprise Linux 4, or later versions, are reported to +work) have support for assigning IP information to bonding devices via +DHCP. To configure bonding for DHCP, configure it as described above, except replace the line "BOOTPROTO=none" with "BOOTPROTO=dhcp" @@ -895,18 +923,14 @@ is case sensitive. 3.2.2 Configuring Multiple Bonds with Initscripts ------------------------------------------------- - At this writing, the initscripts package does not directly -support loading the bonding driver multiple times, so the process for -doing so is the same as described in the "Configuring Multiple Bonds -Manually" section, below. 
- - NOTE: It has been observed that some Red Hat supplied kernels -are apparently unable to rename modules at load time (the "-o bond1" -part). Attempts to pass that option to modprobe will produce an -"Operation not permitted" error. This has been reported on some -Fedora Core kernels, and has been seen on RHEL 4 as well. On kernels -exhibiting this problem, it will be impossible to configure multiple -bonds with differing parameters. + Initscripts packages that are included with Fedora 7 and Red Hat +Enterprise Linux 5 support multiple bonding interfaces by simply +specifying the appropriate BONDING_OPTS= in ifcfg-bondX where X is the +number of the bond. This support requires sysfs support in the kernel, +and a bonding driver of version 3.0.0 or later. Other configurations may +not support this method for specifying multiple bonding interfaces; for +those instances, see the "Configuring Multiple Bonds Manually" section, +below. 3.3 Configuring Bonding Manually with Ifenslave ----------------------------------------------- @@ -977,15 +1001,58 @@ initialization scripts lack support for configuring multiple bonds. options, you may wish to use the "max_bonds" module parameter, documented above. - To create multiple bonding devices with differing options, it -is necessary to use bonding parameters exported by sysfs, documented -in the section below. + To create multiple bonding devices with differing options, it is +preferrable to use bonding parameters exported by sysfs, documented in the +section below. + + For versions of bonding without sysfs support, the only means to +provide multiple instances of bonding with differing options is to load +the bonding driver multiple times. Note that current versions of the +sysconfig network initialization scripts handle this automatically; if +your distro uses these scripts, no special action is needed. See the +section Configuring Bonding Devices, above, if you're not sure about your +network initialization scripts. 
+ + To load multiple instances of the module, it is necessary to +specify a different name for each instance (the module loading system +requires that every loaded module, even multiple instances of the same +module, have a unique name). This is accomplished by supplying multiple +sets of bonding options in /etc/modprobe.conf, for example: + +alias bond0 bonding +options bond0 -o bond0 mode=balance-rr miimon=100 + +alias bond1 bonding +options bond1 -o bond1 mode=balance-alb miimon=50 + + will load the bonding module two times. The first instance is +named "bond0" and creates the bond0 device in balance-rr mode with an +miimon of 100. The second instance is named "bond1" and creates the +bond1 device in balance-alb mode with an miimon of 50. + + In some circumstances (typically with older distributions), +the above does not work, and the second bonding instance never sees +its options. In that case, the second options line can be substituted +as follows: + +install bond1 /sbin/modprobe --ignore-install bonding -o bond1 \ + mode=balance-alb miimon=50 + This may be repeated any number of times, specifying a new and +unique name in place of bond1 for each subsequent instance. + + It has been observed that some Red Hat supplied kernels are unable +to rename modules at load time (the "-o bond1" part). Attempts to pass +that option to modprobe will produce an "Operation not permitted" error. +This has been reported on some Fedora Core kernels, and has been seen on +RHEL 4 as well. On kernels exhibiting this problem, it will be impossible +to configure multiple bonds with differing parameters (as they are older +kernels, and also lack sysfs support). 3.4 Configuring Bonding Manually via Sysfs ------------------------------------------ - Starting with version 3.0, Channel Bonding may be configured + Starting with version 3.0.0, Channel Bonding may be configured via the sysfs interface. 
This interface allows dynamic configuration of all bonds in the system without unloading the module. It also allows for adding and removing bonds at runtime. Ifenslave is no @@ -1030,9 +1097,6 @@ To enslave interface eth0 to bond bond0: To free slave eth0 from bond bond0: # echo -eth0 > /sys/class/net/bond0/bonding/slaves - NOTE: The bond must be up before slaves can be added. All -slaves are freed when the interface is brought down. - When an interface is enslaved to a bond, symlinks between the two are created in the sysfs filesystem. In this case, you would get /sys/class/net/bond0/slave_eth0 pointing to /sys/class/net/eth0, and @@ -1622,6 +1686,15 @@ one for each switch in the network). This will insure that, regardless of which switch is active, the ARP monitor has a suitable target to query. + Note, also, that of late many switches now support a functionality +generally referred to as "trunk failover." This is a feature of the +switch that causes the link state of a particular switch port to be set +down (or up) when the state of another switch port goes down (or up). +It's purpose is to propogate link failures from logically "exterior" ports +to the logically "interior" ports that bonding is able to monitor via +miimon. Availability and configuration for trunk failover varies by +switch, but this can be a viable alternative to the ARP monitor when using +suitable switches. 12. Configuring Bonding for Maximum Throughput ============================================== @@ -1709,7 +1782,7 @@ balance-rr: This mode is the only mode that will permit a single interfaces. It is therefore the only mode that will allow a single TCP/IP stream to utilize more than one interface's worth of throughput. This comes at a cost, however: the - striping often results in peer systems receiving packets out + striping generally results in peer systems receiving packets out of order, causing TCP/IP's congestion control system to kick in, often by retransmitting segments. 
@@ -1721,22 +1794,20 @@ balance-rr: This mode is the only mode that will permit a single interface's worth of throughput, even after adjusting tcp_reordering. - Note that this out of order delivery occurs when both the - sending and receiving systems are utilizing a multiple - interface bond. Consider a configuration in which a - balance-rr bond feeds into a single higher capacity network - channel (e.g., multiple 100Mb/sec ethernets feeding a single - gigabit ethernet via an etherchannel capable switch). In this - configuration, traffic sent from the multiple 100Mb devices to - a destination connected to the gigabit device will not see - packets out of order. However, traffic sent from the gigabit - device to the multiple 100Mb devices may or may not see - traffic out of order, depending upon the balance policy of the - switch. Many switches do not support any modes that stripe - traffic (instead choosing a port based upon IP or MAC level - addresses); for those devices, traffic flowing from the - gigabit device to the many 100Mb devices will only utilize one - interface. + Note that the fraction of packets that will be delivered out of + order is highly variable, and is unlikely to be zero. The level + of reordering depends upon a variety of factors, including the + networking interfaces, the switch, and the topology of the + configuration. Speaking in general terms, higher speed network + cards produce more reordering (due to factors such as packet + coalescing), and a "many to many" topology will reorder at a + higher rate than a "many slow to one fast" configuration. + + Many switches do not support any modes that stripe traffic + (instead choosing a port based upon IP or MAC level addresses); + for those devices, traffic for a particular connection flowing + through the switch to a balance-rr bond will not utilize greater + than one interface's worth of bandwidth. 
If you are utilizing protocols other than TCP/IP, UDP for example, and your application can tolerate out of order @@ -1936,6 +2007,10 @@ Failover may be delayed via the downdelay bonding module option. 13.2 Duplicated Incoming Packets -------------------------------- + NOTE: Starting with version 3.0.2, the bonding driver has logic to +suppress duplicate packets, which should largely eliminate this problem. +The following description is kept for reference. + It is not uncommon to observe a short burst of duplicated traffic when the bonding device is first used, or after it has been idle for some period of time. This is most easily observed by issuing @@ -2096,6 +2171,9 @@ The new driver was designed to be SMP safe from the start. EtherExpress PRO/100 and a 3com 3c905b, for example). For most modes, devices need not be of the same speed. + Starting with version 3.2.1, bonding also supports Infiniband +slaves in active-backup mode. + 3. How many bonding devices can I have? There is no limit. @@ -2154,11 +2232,15 @@ switches currently available support 802.3ad. 8. Where does a bonding device get its MAC address from? - If not explicitly configured (with ifconfig or ip link), the -MAC address of the bonding device is taken from its first slave -device. This MAC address is then passed to all following slaves and -remains persistent (even if the first slave is removed) until the -bonding device is brought down or reconfigured. + When using slave devices that have fixed MAC addresses, or when +the fail_over_mac option is enabled, the bonding device's MAC address is +the MAC address of the active slave. + + For other configurations, if not explicitly configured (with +ifconfig or ip link), the MAC address of the bonding device is taken from +its first slave device. This MAC address is then passed to all following +slaves and remains persistent (even if the first slave is removed) until +the bonding device is brought down or reconfigured. 
If you wish to change the MAC address, you can set it with ifconfig or ip link: diff --git a/Documentation/networking/can.txt b/Documentation/networking/can.txt new file mode 100644 index 000000000000..f1b2de170929 --- /dev/null +++ b/Documentation/networking/can.txt @@ -0,0 +1,629 @@ +============================================================================ + +can.txt + +Readme file for the Controller Area Network Protocol Family (aka Socket CAN) + +This file contains + + 1 Overview / What is Socket CAN + + 2 Motivation / Why using the socket API + + 3 Socket CAN concept + 3.1 receive lists + 3.2 local loopback of sent frames + 3.3 network security issues (capabilities) + 3.4 network problem notifications + + 4 How to use Socket CAN + 4.1 RAW protocol sockets with can_filters (SOCK_RAW) + 4.1.1 RAW socket option CAN_RAW_FILTER + 4.1.2 RAW socket option CAN_RAW_ERR_FILTER + 4.1.3 RAW socket option CAN_RAW_LOOPBACK + 4.1.4 RAW socket option CAN_RAW_RECV_OWN_MSGS + 4.2 Broadcast Manager protocol sockets (SOCK_DGRAM) + 4.3 connected transport protocols (SOCK_SEQPACKET) + 4.4 unconnected transport protocols (SOCK_DGRAM) + + 5 Socket CAN core module + 5.1 can.ko module params + 5.2 procfs content + 5.3 writing own CAN protocol modules + + 6 CAN network drivers + 6.1 general settings + 6.2 local loopback of sent frames + 6.3 CAN controller hardware filters + 6.4 currently supported CAN hardware + 6.5 todo + + 7 Credits + +============================================================================ + +1. Overview / What is Socket CAN +-------------------------------- + +The socketcan package is an implementation of CAN protocols +(Controller Area Network) for Linux. CAN is a networking technology +which has widespread use in automation, embedded devices, and +automotive fields. 
While there have been other CAN implementations +for Linux based on character devices, Socket CAN uses the Berkeley +socket API, the Linux network stack and implements the CAN device +drivers as network interfaces. The CAN socket API has been designed +as similar as possible to the TCP/IP protocols to allow programmers, +familiar with network programming, to easily learn how to use CAN +sockets. + +2. Motivation / Why using the socket API +---------------------------------------- + +There have been CAN implementations for Linux before Socket CAN so the +question arises, why we have started another project. Most existing +implementations come as a device driver for some CAN hardware, they +are based on character devices and provide comparatively little +functionality. Usually, there is only a hardware-specific device +driver which provides a character device interface to send and +receive raw CAN frames, directly to/from the controller hardware. +Queueing of frames and higher-level transport protocols like ISO-TP +have to be implemented in user space applications. Also, most +character-device implementations support only one single process to +open the device at a time, similar to a serial interface. Exchanging +the CAN controller requires employment of another device driver and +often the need for adaption of large parts of the application to the +new driver's API. + +Socket CAN was designed to overcome all of these limitations. A new +protocol family has been implemented which provides a socket interface +to user space applications and which builds upon the Linux network +layer, so to use all of the provided queueing functionality. A device +driver for CAN controller hardware registers itself with the Linux +network layer as a network device, so that CAN frames from the +controller can be passed up to the network layer and on to the CAN +protocol family module and also vice-versa. 
Also, the protocol family +module provides an API for transport protocol modules to register, so +that any number of transport protocols can be loaded or unloaded +dynamically. In fact, the can core module alone does not provide any +protocol and cannot be used without loading at least one additional +protocol module. Multiple sockets can be opened at the same time, +on different or the same protocol module and they can listen/send +frames on different or the same CAN IDs. Several sockets listening on +the same interface for frames with the same CAN ID are all passed the +same received matching CAN frames. An application wishing to +communicate using a specific transport protocol, e.g. ISO-TP, just +selects that protocol when opening the socket, and then can read and +write application data byte streams, without having to deal with +CAN-IDs, frames, etc. + +Similar functionality visible from user-space could be provided by a +character device, too, but this would lead to a technically inelegant +solution for a couple of reasons: + +* Intricate usage. Instead of passing a protocol argument to + socket(2) and using bind(2) to select a CAN interface and CAN ID, an + application would have to do all these operations using ioctl(2)s. + +* Code duplication. A character device cannot make use of the Linux + network queueing code, so all that code would have to be duplicated + for CAN networking. + +* Abstraction. In most existing character-device implementations, the + hardware-specific device driver for a CAN controller directly + provides the character device for the application to work with. + This is at least very unusual in Unix systems for both, char and + block devices. For example you don't have a character device for a + certain UART of a serial interface, a certain sound chip in your + computer, a SCSI or IDE controller providing access to your hard + disk or tape streamer device. 
Instead, you have abstraction layers + which provide a unified character or block device interface to the + application on the one hand, and an interface for hardware-specific + device drivers on the other hand. These abstractions are provided + by subsystems like the tty layer, the audio subsystem or the SCSI + and IDE subsystems for the devices mentioned above. + + The easiest way to implement a CAN device driver is as a character + device without such a (complete) abstraction layer, as is done by most + existing drivers. The right way, however, would be to add such a + layer with all the functionality like registering for certain CAN + IDs, supporting several open file descriptors and (de)multiplexing + CAN frames between them, (sophisticated) queueing of CAN frames, and + providing an API for device drivers to register with. However, then + it would be no more difficult, or maybe even easier, to use the + networking framework provided by the Linux kernel, and this is what + Socket CAN does. + + The use of the networking framework of the Linux kernel is just the + natural and most appropriate way to implement CAN for Linux. + +3. Socket CAN concept +--------------------- + + As described in chapter 2 it is the main goal of Socket CAN to + provide a socket interface to user space applications which builds + upon the Linux network layer. In contrast to the commonly known + TCP/IP and ethernet networking, the CAN bus is a broadcast-only(!) + medium that has no MAC-layer addressing like ethernet. The CAN-identifier + (can_id) is used for arbitration on the CAN-bus. Therefore the CAN-IDs + have to be chosen uniquely on the bus. When designing a CAN-ECU + network the CAN-IDs are mapped to be sent by a specific ECU. + For this reason a CAN-ID can be treated best as a kind of source address. 
+ + 3.1 receive lists + + The network transparent access of multiple applications leads to the + problem that different applications may be interested in the same + CAN-IDs from the same CAN network interface. The Socket CAN core + module - which implements the protocol family CAN - provides several + highly efficient receive lists for this reason. If e.g. a user space + application opens a CAN RAW socket, the raw protocol module itself + requests the (range of) CAN-IDs from the Socket CAN core that are + requested by the user. The subscription and unsubscription of + CAN-IDs can be done for specific CAN interfaces or for all(!) known + CAN interfaces with the can_rx_(un)register() functions provided to + CAN protocol modules by the SocketCAN core (see chapter 5). + To optimize the CPU usage at runtime the receive lists are split up + into several specific lists per device that match the requested + filter complexity for a given use-case. + + 3.2 local loopback of sent frames + + As known from other networking concepts the data exchanging + applications may run on the same or different nodes without any + change (except for the according addressing information): + + ___ ___ ___ _______ ___ + | _ | | _ | | _ | | _ _ | | _ | + ||A|| ||B|| ||C|| ||A| |B|| ||C|| + |___| |___| |___| |_______| |___| + | | | | | + -----------------(1)- CAN bus -(2)--------------- + + To ensure that application A receives the same information in the + example (2) as it would receive in example (1) there is a need for + some kind of local loopback of the sent CAN frames on the appropriate + node. + + The Linux network devices (by default) just can handle the + transmission and reception of media dependent frames. Due to the + arbitration on the CAN bus the transmission of a low prio CAN-ID + may be delayed by the reception of a high prio CAN frame. To + reflect the correct* traffic on the node the loopback of the sent + data has to be performed right after a successful transmission. 
If + the CAN network interface is not capable of performing the loopback for + some reason the SocketCAN core can do this task as a fallback solution. + See chapter 6.2 for details (recommended). + + The loopback functionality is enabled by default to reflect standard + networking behaviour for CAN applications. Due to some requests from + the RT-SocketCAN group the loopback optionally may be disabled for each + separate socket. See sockopts from the CAN RAW sockets in chapter 4.1. + + * = you really want to have this when you're running analyser tools + like 'candump' or 'cansniffer' on the (same) node. + + 3.3 network security issues (capabilities) + + The Controller Area Network is a local field bus transmitting only + broadcast messages without any routing and security concepts. + In the majority of cases the user application has to deal with + raw CAN frames. Therefore it might be reasonable NOT to restrict + the CAN access only to the user root, as known from other networks. + Since the currently implemented CAN_RAW and CAN_BCM sockets can only + send and receive frames to/from CAN interfaces it does not affect + security of other networks to allow all users to access the CAN. + To enable non-root users to access CAN_RAW and CAN_BCM protocol + sockets the Kconfig options CAN_RAW_USER and/or CAN_BCM_USER may be + selected at kernel compile time. + + 3.4 network problem notifications + + The use of the CAN bus may lead to several problems on the physical + and media access control layer. Detecting and logging of these lower + layer problems is a vital requirement for CAN users to identify + hardware issues on the physical transceiver layer as well as + arbitration problems and error frames caused by the different + ECUs. The occurrence of detected errors is important for diagnosis + and has to be logged together with the exact timestamp. 
For this + reason the CAN interface driver can generate so called Error Frames + that can optionally be passed to the user application in the same + way as other CAN frames. Whenever an error on the physical layer + or the MAC layer is detected (e.g. by the CAN controller) the driver + creates an appropriate error frame. Error frames can be requested by + the user application using the common CAN filter mechanisms. Inside + this filter definition the (interested) type of errors may be + selected. The reception of error frames is disabled by default. + +4. How to use Socket CAN +------------------------ + + Like TCP/IP, you first need to open a socket for communicating over a + CAN network. Since Socket CAN implements a new protocol family, you + need to pass PF_CAN as the first argument to the socket(2) system + call. Currently, there are two CAN protocols to choose from, the raw + socket protocol and the broadcast manager (BCM). So to open a socket, + you would write + + s = socket(PF_CAN, SOCK_RAW, CAN_RAW); + + and + + s = socket(PF_CAN, SOCK_DGRAM, CAN_BCM); + + respectively. After the successful creation of the socket, you would + normally use the bind(2) system call to bind the socket to a CAN + interface (which is different from TCP/IP due to different addressing + - see chapter 3). After binding (CAN_RAW) or connecting (CAN_BCM) + the socket, you can read(2) and write(2) from/to the socket or use + send(2), sendto(2), sendmsg(2) and the recv* counterpart operations + on the socket as usual. There are also CAN specific socket options + described below. + + The basic CAN frame structure and the sockaddr structure are defined + in include/linux/can.h: + + struct can_frame { + canid_t can_id; /* 32 bit CAN_ID + EFF/RTR/ERR flags */ + __u8 can_dlc; /* data length code: 0 .. 
8 */ + __u8 data[8] __attribute__((aligned(8))); + }; + + The alignment of the (linear) payload data[] to a 64bit boundary + allows the user to define their own structs and unions to easily access the + CAN payload. There is no given byteorder on the CAN bus by + default. A read(2) system call on a CAN_RAW socket transfers a + struct can_frame to the user space. + + The sockaddr_can structure has an interface index like the + PF_PACKET socket, that also binds to a specific interface: + + struct sockaddr_can { + sa_family_t can_family; + int can_ifindex; + union { + struct { canid_t rx_id, tx_id; } tp16; + struct { canid_t rx_id, tx_id; } tp20; + struct { canid_t rx_id, tx_id; } mcnet; + struct { canid_t rx_id, tx_id; } isotp; + } can_addr; + }; + + To determine the interface index an appropriate ioctl() has to + be used (example for CAN_RAW sockets without error checking): + + int s; + struct sockaddr_can addr; + struct ifreq ifr; + + s = socket(PF_CAN, SOCK_RAW, CAN_RAW); + + strcpy(ifr.ifr_name, "can0" ); + ioctl(s, SIOCGIFINDEX, &ifr); + + addr.can_family = AF_CAN; + addr.can_ifindex = ifr.ifr_ifindex; + + bind(s, (struct sockaddr *)&addr, sizeof(addr)); + + (..) + + To bind a socket to all(!) CAN interfaces the interface index must + be 0 (zero). In this case the socket receives CAN frames from every + enabled CAN interface. To determine the originating CAN interface + the system call recvfrom(2) may be used instead of read(2). To send + on a socket that is bound to 'any' interface sendto(2) is needed to + specify the outgoing interface. + + Reading CAN frames from a bound CAN_RAW socket (see above) consists + of reading a struct can_frame: + + struct can_frame frame; + + nbytes = read(s, &frame, sizeof(struct can_frame)); + + if (nbytes < 0) { + perror("can raw socket read"); + return 1; + } + + /* paranoid check ... 
*/ + if (nbytes < sizeof(struct can_frame)) { + fprintf(stderr, "read: incomplete CAN frame\n"); + return 1; + } + + /* do something with the received CAN frame */ + + Writing CAN frames can be done similarly, with the write(2) system call: + + nbytes = write(s, &frame, sizeof(struct can_frame)); + + When the socket is bound to 'any' existing CAN interface + (addr.can_ifindex = 0) it is recommended to use recvfrom(2) if the + information about the originating CAN interface is needed: + + struct sockaddr_can addr; + struct ifreq ifr; + socklen_t len = sizeof(addr); + struct can_frame frame; + + nbytes = recvfrom(s, &frame, sizeof(struct can_frame), + 0, (struct sockaddr*)&addr, &len); + + /* get interface name of the received CAN frame */ + ifr.ifr_ifindex = addr.can_ifindex; + ioctl(s, SIOCGIFNAME, &ifr); + printf("Received a CAN frame from interface %s", ifr.ifr_name); + + To write CAN frames on sockets bound to 'any' CAN interface the + outgoing interface has to be specified explicitly: + + strcpy(ifr.ifr_name, "can0"); + ioctl(s, SIOCGIFINDEX, &ifr); + addr.can_ifindex = ifr.ifr_ifindex; + addr.can_family = AF_CAN; + + nbytes = sendto(s, &frame, sizeof(struct can_frame), + 0, (struct sockaddr*)&addr, sizeof(addr)); + + 4.1 RAW protocol sockets with can_filters (SOCK_RAW) + + Using CAN_RAW sockets is largely comparable to the commonly + known access to CAN character devices. To meet the new possibilities + provided by the multi user SocketCAN approach, some reasonable + defaults are set at RAW socket binding time: + + - The filters are set to exactly one filter receiving everything + - The socket only receives valid data frames (=> no error frames) + - The loopback of sent CAN frames is enabled (see chapter 3.2) + - The socket does not receive its own sent frames (in loopback mode) + + These default settings may be changed before or after binding the socket. 
+ To use the referenced definitions of the socket options for CAN_RAW + sockets, include <linux/can/raw.h>. + + 4.1.1 RAW socket option CAN_RAW_FILTER + + The reception of CAN frames using CAN_RAW sockets can be controlled + by defining 0 .. n filters with the CAN_RAW_FILTER socket option. + + The CAN filter structure is defined in include/linux/can.h: + + struct can_filter { + canid_t can_id; + canid_t can_mask; + }; + + A filter matches, when + + <received_can_id> & mask == can_id & mask + + which is analogous to known CAN controllers hardware filter semantics. + The filter can be inverted in this semantic, when the CAN_INV_FILTER + bit is set in can_id element of the can_filter structure. In + contrast to CAN controller hardware filters the user may set 0 .. n + receive filters for each open socket separately: + + struct can_filter rfilter[2]; + + rfilter[0].can_id = 0x123; + rfilter[0].can_mask = CAN_SFF_MASK; + rfilter[1].can_id = 0x200; + rfilter[1].can_mask = 0x700; + + setsockopt(s, SOL_CAN_RAW, CAN_RAW_FILTER, &rfilter, sizeof(rfilter)); + + To disable the reception of CAN frames on the selected CAN_RAW socket: + + setsockopt(s, SOL_CAN_RAW, CAN_RAW_FILTER, NULL, 0); + + To set the filters to zero filters is quite obsolete as not read + data causes the raw socket to discard the received CAN frames. But + having this 'send only' use-case we may remove the receive list in the + Kernel to save a little (really a very little!) CPU usage. + + 4.1.2 RAW socket option CAN_RAW_ERR_FILTER + + As described in chapter 3.4 the CAN interface driver can generate so + called Error Frames that can optionally be passed to the user + application in the same way as other CAN frames. The possible + errors are divided into different error classes that may be filtered + using the appropriate error mask. To register for every possible + error condition CAN_ERR_MASK can be used as value for the error mask. + The values for the error mask are defined in linux/can/error.h . 
+ + can_err_mask_t err_mask = ( CAN_ERR_TX_TIMEOUT | CAN_ERR_BUSOFF ); + + setsockopt(s, SOL_CAN_RAW, CAN_RAW_ERR_FILTER, + &err_mask, sizeof(err_mask)); + + 4.1.3 RAW socket option CAN_RAW_LOOPBACK + + To meet multi user needs the local loopback is enabled by default + (see chapter 3.2 for details). But in some embedded use-cases + (e.g. when only one application uses the CAN bus) this loopback + functionality can be disabled (separately for each socket): + + int loopback = 0; /* 0 = disabled, 1 = enabled (default) */ + + setsockopt(s, SOL_CAN_RAW, CAN_RAW_LOOPBACK, &loopback, sizeof(loopback)); + + 4.1.4 RAW socket option CAN_RAW_RECV_OWN_MSGS + + When the local loopback is enabled, all the sent CAN frames are + looped back to the open CAN sockets that registered for the CAN + frames' CAN-ID on this given interface to meet the multi user + needs. The reception of the CAN frames on the same socket that was + sending the CAN frame is assumed to be unwanted and therefore + disabled by default. This default behaviour may be changed on + demand: + + int recv_own_msgs = 1; /* 0 = disabled (default), 1 = enabled */ + + setsockopt(s, SOL_CAN_RAW, CAN_RAW_RECV_OWN_MSGS, + &recv_own_msgs, sizeof(recv_own_msgs)); + + 4.2 Broadcast Manager protocol sockets (SOCK_DGRAM) + 4.3 connected transport protocols (SOCK_SEQPACKET) + 4.4 unconnected transport protocols (SOCK_DGRAM) + + +5. Socket CAN core module +------------------------- + + The Socket CAN core module implements the protocol family + PF_CAN. CAN protocol modules are loaded by the core module at + runtime. The core module provides an interface for CAN protocol + modules to subscribe to the CAN IDs they need (see chapter 3.1). + + 5.1 can.ko module params + + - stats_timer: To calculate the Socket CAN core statistics + (e.g. current/maximum frames per second) this 1 second timer is + invoked at can.ko module start time by default. This timer can be + disabled by using stats_timer=0 on the module command line. 
+ + - debug: (removed since SocketCAN SVN r546) + + 5.2 procfs content + + As described in chapter 3.1 the Socket CAN core uses several filter + lists to deliver received CAN frames to CAN protocol modules. These + receive lists, their filters and the count of filter matches can be + checked in the appropriate receive list. All entries contain the + device and a protocol module identifier: + + foo@bar:~$ cat /proc/net/can/rcvlist_all + + receive list 'rx_all': + (vcan3: no entry) + (vcan2: no entry) + (vcan1: no entry) + device can_id can_mask function userdata matches ident + vcan0 000 00000000 f88e6370 f6c6f400 0 raw + (any: no entry) + + In this example an application requests any CAN traffic from vcan0. + + rcvlist_all - list for unfiltered entries (no filter operations) + rcvlist_eff - list for single extended frame (EFF) entries + rcvlist_err - list for error frames masks + rcvlist_fil - list for mask/value filters + rcvlist_inv - list for mask/value filters (inverse semantic) + rcvlist_sff - list for single standard frame (SFF) entries + + Additional procfs files in /proc/net/can + + stats - Socket CAN core statistics (rx/tx frames, match ratios, ...) + reset_stats - manual statistic reset + version - prints the Socket CAN core version and the ABI version + + 5.3 writing own CAN protocol modules + + To implement a new protocol in the protocol family PF_CAN a new + protocol has to be defined in include/linux/can.h . + The prototypes and definitions to use the Socket CAN core can be + accessed by including include/linux/can/core.h . 
+ In addition to functions that register the CAN protocol and the + CAN device notifier chain there are functions to subscribe CAN + frames received by CAN interfaces and to send CAN frames: + + can_rx_register - subscribe CAN frames from a specific interface + can_rx_unregister - unsubscribe CAN frames from a specific interface + can_send - transmit a CAN frame (optional with local loopback) + + For details see the kerneldoc documentation in net/can/af_can.c or + the source code of net/can/raw.c or net/can/bcm.c . + +6. CAN network drivers +---------------------- + + Writing a CAN network device driver is much easier than writing a + CAN character device driver. Similar to other known network device + drivers you mainly have to deal with: + + - TX: Put the CAN frame from the socket buffer to the CAN controller. + - RX: Put the CAN frame from the CAN controller to the socket buffer. + + See e.g. at Documentation/networking/netdevices.txt . The differences + for writing CAN network device driver are described below: + + 6.1 general settings + + dev->type = ARPHRD_CAN; /* the netdevice hardware type */ + dev->flags = IFF_NOARP; /* CAN has no arp */ + + dev->mtu = sizeof(struct can_frame); + + The struct can_frame is the payload of each socket buffer in the + protocol family PF_CAN. + + 6.2 local loopback of sent frames + + As described in chapter 3.2 the CAN network device driver should + support a local loopback functionality similar to the local echo + e.g. of tty devices. In this case the driver flag IFF_ECHO has to be + set to prevent the PF_CAN core from locally echoing sent frames + (aka loopback) as fallback solution: + + dev->flags = (IFF_NOARP | IFF_ECHO); + + 6.3 CAN controller hardware filters + + To reduce the interrupt load on deep embedded systems some CAN + controllers support the filtering of CAN IDs or ranges of CAN IDs. 
+ These hardware filter capabilities vary from controller to + controller and have to be identified as not feasible in a multi-user + networking approach. The use of the very controller specific + hardware filters could make sense in a very dedicated use-case, as a + filter on driver level would affect all users in the multi-user + system. The high efficient filter sets inside the PF_CAN core allow + to set different multiple filters for each socket separately. + Therefore the use of hardware filters goes to the category 'handmade + tuning on deep embedded systems'. The author is running a MPC603e + @133MHz with four SJA1000 CAN controllers from 2002 under heavy bus + load without any problems ... + + 6.4 currently supported CAN hardware (September 2007) + + On the project website http://developer.berlios.de/projects/socketcan + there are different drivers available: + + vcan: Virtual CAN interface driver (if no real hardware is available) + sja1000: Philips SJA1000 CAN controller (recommended) + i82527: Intel i82527 CAN controller + mscan: Motorola/Freescale CAN controller (e.g. inside SOC MPC5200) + ccan: CCAN controller core (e.g. inside SOC h7202) + slcan: For a bunch of CAN adaptors that are attached via a + serial line ASCII protocol (for serial / USB adaptors) + + Additionally the different CAN adaptors (ISA/PCI/PCMCIA/USB/Parport) + from PEAK Systemtechnik support the CAN netdevice driver model + since Linux driver v6.0: http://www.peak-system.com/linux/index.htm + + Please check the Mailing Lists on the berlios OSS project website. + + 6.5 todo (September 2007) + + The configuration interface for CAN network drivers is still an open + issue that has not been finalized in the socketcan project. Also the + idea of having a library module (candev.ko) that holds functions + that are needed by all CAN netdevices is not ready to ship. + Your contribution is welcome. + +7. 
Credits +---------- + + Oliver Hartkopp (PF_CAN core, filters, drivers, bcm) + Urs Thuermann (PF_CAN core, kernel integration, socket interfaces, raw, vcan) + Jan Kizka (RT-SocketCAN core, Socket-API reconciliation) + Wolfgang Grandegger (RT-SocketCAN core & drivers, Raw Socket-API reviews) + Robert Schwebel (design reviews, PTXdist integration) + Marc Kleine-Budde (design reviews, Kernel 2.6 cleanups, drivers) + Benedikt Spranger (reviews) + Thomas Gleixner (LKML reviews, coding style, posting hints) + Andrey Volkov (kernel subtree structure, ioctls, mscan driver) + Matthias Brukner (first SJA1000 CAN netdevice implementation Q2/2003) + Klaus Hitschler (PEAK driver integration) + Uwe Koppe (CAN netdevices with PF_PACKET approach) + Michael Schulze (driver layer loopback requirement, RT CAN drivers review) diff --git a/Documentation/networking/dccp.txt b/Documentation/networking/dccp.txt index afb66f9a8aff..39131a3c78f8 100644 --- a/Documentation/networking/dccp.txt +++ b/Documentation/networking/dccp.txt @@ -14,24 +14,35 @@ Introduction ============ Datagram Congestion Control Protocol (DCCP) is an unreliable, connection -based protocol designed to solve issues present in UDP and TCP particularly -for real time and multimedia traffic. +oriented protocol designed to solve issues present in UDP and TCP, particularly +for real-time and multimedia (streaming) traffic. +It is divided into a base protocol (RFC 4340) and pluggable congestion control +modules called CCIDs. Like pluggable TCP congestion control, at least one CCID +needs to be enabled in order for the protocol to function properly. In the Linux +implementation, this is the TCP-like CCID2 (RFC 4341). Additional CCIDs, such as +the TCP-friendly CCID3 (RFC 4342), are optional. +For a brief introduction to CCIDs and suggestions for choosing a CCID to match +given applications, see section 10 of RFC 4340. It has a base protocol and pluggable congestion control IDs (CCIDs). 
-It is at proposed standard RFC status and the homepage for DCCP as a protocol -is at: - http://www.read.cs.ucla.edu/dccp/ +DCCP is a Proposed Standard (RFC 2026), and the homepage for DCCP as a protocol +is at http://www.ietf.org/html.charters/dccp-charter.html Missing features ================ -The DCCP implementation does not currently have all the features that are in -the RFC. +The Linux DCCP implementation does not currently support all the features that are +specified in RFCs 4340...42. The known bugs are at: http://linux-net.osdl.org/index.php/TODO#DCCP +For more up-to-date versions of the DCCP implementation, please consider using +the experimental DCCP test tree; instructions for checking this out are on: +http://linux-net.osdl.org/index.php/DCCP_Testing#Experimental_DCCP_source_tree + + Socket options ============== @@ -46,6 +57,12 @@ can be set before calling bind(). DCCP_SOCKOPT_GET_CUR_MPS is read-only and retrieves the current maximum packet size (application payload size) in bytes, see RFC 4340, section 14. +DCCP_SOCKOPT_SERVER_TIMEWAIT enables the server (listening socket) to hold +timewait state when closing the connection (RFC 4340, 8.3). The usual case is +that the closing server sends a CloseReq, whereupon the client holds timewait +state. When this boolean socket option is on, the server sends a Close instead +and will enter TIMEWAIT. This option must be set after accept() returns. + DCCP_SOCKOPT_SEND_CSCOV and DCCP_SOCKOPT_RECV_CSCOV are used for setting the partial checksum coverage (RFC 4340, sec. 9.2). The default is that checksums always cover the entire packet and that only fully covered application data is @@ -72,6 +89,8 @@ DCCP_SOCKOPT_CCID_TX_INFO Returns a `struct tfrc_tx_info' in optval; the buffer for optval and optlen must be set to at least sizeof(struct tfrc_tx_info). +On unidirectional connections it is useful to close the unused half-connection +via shutdown (SHUT_WR or SHUT_RD): this will reduce per-packet processing costs. 
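The shutdown advice for unidirectional connections can be sketched as follows. This is only an illustration under stated assumptions: the helper close_unused_half() and the enum flow_dir are hypothetical names, not part of the DCCP API, and the file descriptor is assumed to be an already connected socket. Only the standard shutdown(2) call with SHUT_RD or SHUT_WR is used.

```c
#include <stdio.h>
#include <sys/socket.h>

/* Hypothetical: the direction in which the application uses the connection. */
enum flow_dir { FLOW_SEND_ONLY, FLOW_RECV_ONLY };

/*
 * Hypothetical helper: close the unused half of a connected socket.
 * A pure sender never reads, so its receive side is shut down (SHUT_RD);
 * a pure receiver never writes, so its send side is shut down (SHUT_WR).
 * Returns 0 on success, -1 on error (errno is set by shutdown(2)).
 */
int close_unused_half(int fd, enum flow_dir dir)
{
	int how = (dir == FLOW_SEND_ONLY) ? SHUT_RD : SHUT_WR;

	if (shutdown(fd, how) < 0) {
		perror("shutdown");
		return -1;
	}
	return 0;
}
```

A streaming server that only sends media would call close_unused_half(fd, FLOW_SEND_ONLY) once the connection is established; the receiving client would pass FLOW_RECV_ONLY.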
Sysctl variables ================ @@ -123,6 +142,12 @@ sync_ratelimit = 125 ms sequence-invalid packets on the same socket (RFC 4340, 7.5.4). The unit of this parameter is milliseconds; a value of 0 disables rate-limiting. +IOCTLS +====== +FIONREAD + Works as in udp(7): returns in the `int' argument pointer the size of + the next pending datagram in bytes, or 0 when no datagram is pending. + Notes ===== diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index 6f7872ba1def..17a6e46fbd43 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -446,6 +446,33 @@ tcp_dma_copybreak - INTEGER and CONFIG_NET_DMA is enabled. Default: 4096 +UDP variables: + +udp_mem - vector of 3 INTEGERs: min, pressure, max + Number of pages allowed for queueing by all UDP sockets. + + min: Below this number of pages UDP is not bothered about its + memory appetite. When amount of memory allocated by UDP exceeds + this number, UDP starts to moderate memory usage. + + pressure: This value was introduced to follow format of tcp_mem. + + max: Number of pages allowed for queueing by all UDP sockets. + + Default is calculated at boot time from amount of available memory. + +udp_rmem_min - INTEGER + Minimal size of receive buffer used by UDP sockets in moderation. + Each UDP socket is able to use the size for receiving data, even if + total pages of UDP sockets exceed udp_mem pressure. The unit is byte. + Default: 4096 + +udp_wmem_min - INTEGER + Minimal size of send buffer used by UDP sockets in moderation. + Each UDP socket is able to use the size for sending data, even if + total pages of UDP sockets exceed udp_mem pressure. The unit is byte. 
+ Default: 4096 + CIPSOv4 Variables: cipso_cache_enable - BOOLEAN diff --git a/Documentation/networking/shaper.txt b/Documentation/networking/shaper.txt deleted file mode 100644 index 6c4ebb66a906..000000000000 --- a/Documentation/networking/shaper.txt +++ /dev/null @@ -1,48 +0,0 @@ -Traffic Shaper For Linux - -This is the current BETA release of the traffic shaper for Linux. It works -within the following limits: - -o Minimum shaping speed is currently about 9600 baud (it can only -shape down to 1 byte per clock tick) - -o Maximum is about 256K, it will go above this but get a bit blocky. - -o If you ifconfig the master device that a shaper is attached to down -then your machine will follow. - -o The shaper must be a module. - - -Setup: - - A shaper device is configured using the shapeconfig program. -Typically you will do something like this - -shapecfg attach shaper0 eth1 -shapecfg speed shaper0 64000 -ifconfig shaper0 myhost netmask 255.255.255.240 broadcast 1.2.3.4.255 up -route add -net some.network netmask a.b.c.d dev shaper0 - -The shaper should have the same IP address as the device it is attached to -for normal use. - -Gotchas: - - The shaper shapes transmitted traffic. It's rather impossible to -shape received traffic except at the end (or a router) transmitting it. - - Gated/routed/rwhod/mrouted all see the shaper as an additional device -and will treat it as such unless patched. Note that for mrouted you can run -mrouted tunnels via a traffic shaper to control bandwidth usage. - - The shaper is device/route based. This makes it very easy to use -with any setup BUT less flexible. You may need to use iproute2 to set up -multiple route tables to get the flexibility. - - There is no "borrowing" or "sharing" scheme. This is a simple -traffic limiter. We implement Van Jacobson and Sally Floyd's CBQ -architecture into Linux 2.2. This is the preferred solution. Shaper is -for simple or back compatible setups. 
- -Alan diff --git a/Documentation/networking/udplite.txt b/Documentation/networking/udplite.txt index b6409cab075c..3870f280280b 100644 --- a/Documentation/networking/udplite.txt +++ b/Documentation/networking/udplite.txt @@ -236,7 +236,7 @@ This displays UDP-Lite statistics variables, whose meaning is as follows. - InDatagrams: Total number of received datagrams. + InDatagrams: The total number of datagrams delivered to users. NoPorts: Number of packets received to an unknown port. These cases are counted separately (not as InErrors). diff --git a/Documentation/networking/xfrm_proc.txt b/Documentation/networking/xfrm_proc.txt new file mode 100644 index 000000000000..53c1a58b02f1 --- /dev/null +++ b/Documentation/networking/xfrm_proc.txt @@ -0,0 +1,70 @@ +XFRM proc - /proc/net/xfrm_* files +================================== +Masahide NAKAMURA <nakam@linux-ipv6.org> + + +Transformation Statistics +------------------------- +The xfrm_proc files show, for developers, statistics on the reasons +packets are dropped by transformation. +The counters are derived from the current transformation source code +and are defined like a Linux private MIB. + +Inbound statistics +~~~~~~~~~~~~~~~~~~ +XfrmInError: + All errors not matched by other counters +XfrmInBufferError: + No buffer is left +XfrmInHdrError: + Header error +XfrmInNoStates: + No state is found + i.e. Either inbound SPI, address, or IPsec protocol at SA is wrong +XfrmInStateProtoError: + Transformation protocol specific error + e.g. SA key is wrong +XfrmInStateModeError: + Transformation mode specific error +XfrmInSeqOutOfWindow: + Sequence out of window +XfrmInStateExpired: + State is expired +XfrmInStateMismatch: + State has a mismatched option + e.g. UDP encapsulation type is mismatched +XfrmInStateInvalid: + State is invalid +XfrmInTmplMismatch: + No matching template for states + e.g. Inbound SAs are correct but SP rule is wrong +XfrmInNoPols: + No policy is found for states + e.g. 
Inbound SAs are correct but no SP is found +XfrmInPolBlock: + Policy discards +XfrmInPolError: + Policy error + +Outbound errors +~~~~~~~~~~~~~~~ +XfrmOutError: + All errors not matched by other counters +XfrmOutBundleGenError: + Bundle generation error +XfrmOutBundleCheckError: + Bundle check error +XfrmOutNoStates: + No state is found +XfrmOutStateProtoError: + Transformation protocol specific error +XfrmOutStateModeError: + Transformation mode specific error +XfrmOutStateExpired: + State is expired +XfrmOutPolBlock: + Policy discards +XfrmOutPolDead: + Policy is dead +XfrmOutPolError: + Policy error diff --git a/Documentation/pnp.txt b/Documentation/pnp.txt index 481faf515d53..a327db67782a 100644 --- a/Documentation/pnp.txt +++ b/Documentation/pnp.txt @@ -17,9 +17,9 @@ The User Interface ------------------ The Linux Plug and Play user interface provides a means to activate PnP devices for legacy and user level drivers that do not support Linux Plug and Play. The -user interface is integrated into driverfs. +user interface is integrated into sysfs. -In addition to the standard driverfs file the following are created in each +In addition to the standard sysfs file the following are created in each device's directory: id - displays a list of support EISA IDs options - displays possible resource configurations diff --git a/Documentation/s390/CommonIO b/Documentation/s390/CommonIO index 86320aa3fb0b..8fbc0a852870 100644 --- a/Documentation/s390/CommonIO +++ b/Documentation/s390/CommonIO @@ -4,6 +4,11 @@ S/390 common I/O-Layer - command line parameters, procfs and debugfs entries Command line parameters ----------------------- +* ccw_timeout_log + + Enable logging of debug information in case of ccw device timeouts. 
+ + * cio_msg = yes | no Determines whether information on found devices and sensed device diff --git a/Documentation/s390/cds.txt b/Documentation/s390/cds.txt index 3081927cc2d6..c4b7b2bd369a 100644 --- a/Documentation/s390/cds.txt +++ b/Documentation/s390/cds.txt @@ -133,7 +133,7 @@ During its startup the Linux/390 system checks for peripheral devices. Each of those devices is uniquely defined by a so called subchannel by the ESA/390 channel subsystem. While the subchannel numbers are system generated, each subchannel also takes a user defined attribute, the so called device number. -Both subchannel number and device number cannot exceed 65535. During driverfs +Both subchannel number and device number cannot exceed 65535. During sysfs initialisation, the information about control unit type and device types that imply specific I/O commands (channel command words - CCWs) in order to operate the device are gathered. Device drivers can retrieve this set of hardware diff --git a/Documentation/scsi/00-INDEX b/Documentation/scsi/00-INDEX index aa1f7e927834..c2e18e109858 100644 --- a/Documentation/scsi/00-INDEX +++ b/Documentation/scsi/00-INDEX @@ -64,8 +64,6 @@ lpfc.txt - LPFC driver release notes megaraid.txt - Common Management Module, shared code handling ioctls for LSI drivers -ncr53c7xx.txt - - info on driver for NCR53c7xx based adapters ncr53c8xx.txt - info on driver for NCR53c8xx based adapters osst.txt diff --git a/Documentation/scsi/ChangeLog.megaraid_sas b/Documentation/scsi/ChangeLog.megaraid_sas index 5eb927544990..91c81db0ba71 100644 --- a/Documentation/scsi/ChangeLog.megaraid_sas +++ b/Documentation/scsi/ChangeLog.megaraid_sas @@ -1,3 +1,162 @@ +1 Release Date : Thur. Nov. 07 16:30:43 PST 2007 - + (emaild-id:megaraidlinux@lsi.com) + Sumant Patro + Bo Yang + +2 Current Version : 00.00.03.16 +3 Older Version : 00.00.03.15 + +1. Increased MFI_POLL_TIMEOUT_SECS to 60 seconds from 10. FW may take + a max of 60 seconds to respond to the INIT cmd. 
+ +1 Release Date : Fri. Sep. 07 16:30:43 PST 2007 - + (emaild-id:megaraidlinux@lsi.com) + Sumant Patro + Bo Yang + +2 Current Version : 00.00.03.15 +3 Older Version : 00.00.03.14 + +1. Added module parameter "poll_mode_io" to support for "polling" + (reduced interrupt operation). In this mode, IO completion + interrupts are delayed. At the end of initiating IOs, the + driver schedules for cmd completion if there are pending cmds + to be completed. A timer-based interrupt has also been added + to prevent IO completion processing from being delayed + indefinitely in the case that no new IOs are initiated. + +1 Release Date : Fri. Sep. 07 16:30:43 PST 2007 - + (emaild-id:megaraidlinux@lsi.com) + Sumant Patro + Bo Yang + +2 Current Version : 00.00.03.14 +3 Older Version : 00.00.03.13 + +1. Setting the max_sectors_per_req based on max SGL supported by the + FW. Prior versions calculated this value from controller info + (max_sectors_1, max_sectors_2). For certain controllers/FW, + this was resulting in a value greater than max SGL supported + by the FW. Issue was first reported by users running LUKS+XFS + with megaraid_sas. Thanks to RB for providing the logs and + duplication steps that helped to get to the root cause of the + issue. 2. Increased MFI_POLL_TIMEOUT_SECS to 60 seconds from + 10. FW may take a max of 60 seconds to respond to the INIT + cmd. + +1 Release Date : Fri. June. 15 16:30:43 PST 2007 - + (emaild-id:megaraidlinux@lsi.com) + Sumant Patro + Bo Yang + +2 Current Version : 00.00.03.13 +3 Older Version : 00.00.03.12 + +1. Added the megasas_reset_timer routine to intercept cmd timeout and throttle io. + +On Fri, 2007-03-16 at 16:44 -0600, James Bottomley wrote: +It looks like megaraid_sas at least needs this to throttle its commands +> as they begin to time out. The code keeps the existing transport +> template use of eh_timed_out (and allows the transport to override the +> host if they both have this callback). +> +> James + +1 Release Date : Sat May. 
12 16:30:43 PST 2007 - + (emaild-id:megaraidlinux@lsi.com) + Sumant Patro + Bo Yang + +2 Current Version : 00.00.03.12 +3 Older Version : 00.00.03.11 + +1. When MegaSAS driver receives reset call from OS, driver waits in reset +routine for max 3 minutes for all pending command completion. Now driver will +call completion routine every 5 seconds from the reset routine instead of +waiting for depending on cmd completion from isr path. + +1 Release Date : Mon Apr. 30 10:25:52 PST 2007 - + (emaild-id:megaraidlinux@lsi.com) + Sumant Patro + Bo Yang + +2 Current Version : 00.00.03.11 +3 Older Version : 00.00.03.09 + + 1. Memory Manager for IOCTL removed for 2.6 kernels. + pci_alloc_consistent replaced by dma_alloc_coherent. With this + change there is no need of memory manager in the driver code + + On Wed, 2007-02-07 at 13:30 -0800, Andrew Morton wrote: + > I suspect all this horror is due to stupidity in the DMA API. + > + > pci_alloc_consistent() just goes and assumes GFP_ATOMIC, whereas + > the caller (megasas_mgmt_fw_ioctl) would have been perfectly happy + > to use GFP_KERNEL. + > + > I bet this fixes it + + It does, but the DMA API was expanded to cope with this exact case, so + use dma_alloc_coherent() directly in the megaraid code instead. The dev + is just &pci_dev->dev. + + James <James.Bottomley@SteelEye.com> + + 3. SYNCHRONIZE_CACHE is not supported by FW and thus blocked by driver. + 4. Hibernation support added + 5. Performing diskdump while running IO in RHEL 4 was failing. Fixed. + +1 Release Date : Fri Feb. 09 14:36:28 PST 2007 - + (emaild-id:megaraidlinux@lsi.com) + Sumant Patro + Bo Yang + +2 Current Version : 00.00.03.09 +3 Older Version : 00.00.03.08 + +i. Under heavy IO mid-layer prints "DRIVER_TIMEOUT" errors + + The driver now waits for 10 seconds to elapse instead of 5 (as in + previous release) to resume IO. + +1 Release Date : Mon Feb. 
05 11:35:24 PST 2007 - + (emaild-id:megaraidlinux@lsi.com) + Sumant Patro + Bo Yang +2 Current Version : 00.00.03.08 +3 Older Version : 00.00.03.07 + +i. Under heavy IO mid-layer prints "DRIVER_TIMEOUT" errors + + Fix: The driver is now throttling IO. + Checks added in megasas_queue_command to know if FW is able to + process commands within timeout period. If number of retries + is 2 or greater, the driver stops sending cmd to FW temporarily. IO is + resumed if pending cmd count reduces to 16 or 5 seconds have elapsed + from the time cmds were last sent to FW. + +ii. FW enables WCE bit in Mode Sense cmd for drives that are configured + as WriteBack. The OS may send "SYNCHRONIZE_CACHE" cmd when Logical + Disks are exposed with WCE=1. User is advised to enable Write Back + mode only when the controller has battery backup. At this time + Synchronize cache is not supported by the FW. Driver will short-cycle + the cmd and return success without sending down to FW. + +1 Release Date : Sun Jan. 14 11:21:32 PDT 2007 - + Sumant Patro <Sumant.Patro@lsil.com>/Bo Yang +2 Current Version : 00.00.03.07 +3 Older Version : 00.00.03.06 + +i. bios_param entry added in scsi_host_template that returns disk geometry + information. + +1 Release Date : Fri Oct 20 11:21:32 PDT 2006 - Sumant Patro <Sumant.Patro@lsil.com>/Bo Yang +2 Current Version : 00.00.03.06 +3 Older Version : 00.00.03.05 + +1. Added new memory management module to support the IOCTL memory allocation. For IOCTL we try to allocate from the memory pool created during driver initialization. If mem pool is empty then we allocate at run time. +2. Added check in megasas_queue_command and dpc/isr routine to see if we have already declared adapter dead + (hw_crit_error=1).
If hw_crit_error==1, now we do not accept any processing of pending cmds/accept any cmd from OS 1 Release Date : Mon Oct 02 11:21:32 PDT 2006 - Sumant Patro <Sumant.Patro@lsil.com> 2 Current Version : 00.00.03.05 diff --git a/Documentation/scsi/aacraid.txt b/Documentation/scsi/aacraid.txt index a8257840695a..d16011a8618e 100644 --- a/Documentation/scsi/aacraid.txt +++ b/Documentation/scsi/aacraid.txt @@ -56,6 +56,10 @@ Supported Cards/Chipsets 9005:0285:9005:02d1 Adaptec 5405 (Voodoo40) 9005:0285:15d9:02d2 SMC AOC-USAS-S8i-LP 9005:0285:15d9:02d3 SMC AOC-USAS-S8iR-LP + 9005:0285:9005:02d4 Adaptec 2045 (Voodoo04 Lite) + 9005:0285:9005:02d5 Adaptec 2405 (Voodoo40 Lite) + 9005:0285:9005:02d6 Adaptec 2445 (Voodoo44 Lite) + 9005:0285:9005:02d7 Adaptec 2805 (Voodoo80 Lite) 1011:0046:9005:0364 Adaptec 5400S (Mustang) 9005:0287:9005:0800 Adaptec Themisto (Jupiter) 9005:0200:9005:0200 Adaptec Themisto (Jupiter) diff --git a/Documentation/scsi/hptiop.txt b/Documentation/scsi/hptiop.txt index d28a31247d4c..a6eb4add1be6 100644 --- a/Documentation/scsi/hptiop.txt +++ b/Documentation/scsi/hptiop.txt @@ -1,9 +1,9 @@ -HIGHPOINT ROCKETRAID 3xxx RAID DRIVER (hptiop) +HIGHPOINT ROCKETRAID 3xxx/4xxx ADAPTER DRIVER (hptiop) Controller Register Map ------------------------- -The controller IOP is accessed via PCI BAR0. +For Intel IOP based adapters, the controller IOP is accessed via PCI BAR0: BAR0 offset Register 0x10 Inbound Message Register 0 @@ -18,6 +18,24 @@ The controller IOP is accessed via PCI BAR0.
0x40 Inbound Queue Port 0x44 Outbound Queue Port +For Marvell IOP based adapters, the IOP is accessed via PCI BAR0 and BAR1: + + BAR0 offset Register + 0x20400 Inbound Doorbell Register + 0x20404 Inbound Interrupt Mask Register + 0x20408 Outbound Doorbell Register + 0x2040C Outbound Interrupt Mask Register + + BAR1 offset Register + 0x0 Inbound Queue Head Pointer + 0x4 Inbound Queue Tail Pointer + 0x8 Outbound Queue Head Pointer + 0xC Outbound Queue Tail Pointer + 0x10 Inbound Message Register + 0x14 Outbound Message Register + 0x40-0x1040 Inbound Queue + 0x1040-0x2040 Outbound Queue + I/O Request Workflow ---------------------- @@ -73,15 +91,9 @@ The driver exposes following sysfs attributes: driver-version R driver version string firmware-version R firmware version string -The driver registers char device "hptiop" to communicate with HighPoint RAID -management software. Its ioctl routine acts as a general binary interface -between the IOP firmware and HighPoint RAID management software. New management -functions can be implemented in application/firmware without modification -in driver code. - ----------------------------------------------------------------------------- -Copyright (C) 2006 HighPoint Technologies, Inc. All Rights Reserved. +Copyright (C) 2006-2007 HighPoint Technologies, Inc. All Rights Reserved. This file is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of diff --git a/Documentation/scsi/ncr53c7xx.txt b/Documentation/scsi/ncr53c7xx.txt deleted file mode 100644 index 91e9552d63e5..000000000000 --- a/Documentation/scsi/ncr53c7xx.txt +++ /dev/null @@ -1,40 +0,0 @@ -README for WarpEngine/A4000T/A4091 SCSI kernels. - -Use the following options to disable options in the SCSI driver. - -Using amiboot for example..... - -To disable Synchronous Negotiation.... - - amiboot -k kernel 53c7xx=nosync:0 - -To disable Disconnection.... 
- - amiboot -k kernel 53c7xx=nodisconnect:0 - -To disable certain SCSI devices... - - amiboot -k kernel 53c7xx=validids:0x3F - - this allows only device ID's 0,1,2,3,4 and 5 for linux to handle. - (this is a bitmasked field - i.e. each bit represents a SCSI ID) - -These commands work on a per controller basis and use the option 'next' to -move to the next controller in the system. - -e.g. - amiboot -k kernel 53c7xx=nodisconnect:0,next,nosync:0 - - this uses No Disconnection on the first controller and Asynchronous - SCSI on the second controller. - -Known Issues: - -Two devices are known not to function with the default settings of using -synchronous SCSI. These are the Archive Viper 150 Tape Drive and the -SyQuest SQ555 removeable hard drive. When using these devices on a controller -use the 'nosync:0' option. - -Please try these options and post any problems/successes to me. - -Alan Hourihane <alanh@fairlite.demon.co.uk> diff --git a/Documentation/video4linux/CARDLIST.cx23885 b/Documentation/video4linux/CARDLIST.cx23885 index 00cb646a4bde..0924e6e142c4 100644 --- a/Documentation/video4linux/CARDLIST.cx23885 +++ b/Documentation/video4linux/CARDLIST.cx23885 @@ -1,5 +1,7 @@ 0 -> UNKNOWN/GENERIC [0070:3400] 1 -> Hauppauge WinTV-HVR1800lp [0070:7600] - 2 -> Hauppauge WinTV-HVR1800 [0070:7800,0070:7801] + 2 -> Hauppauge WinTV-HVR1800 [0070:7800,0070:7801,0070:7809] 3 -> Hauppauge WinTV-HVR1250 [0070:7911] 4 -> DViCO FusionHDTV5 Express [18ac:d500] + 5 -> Hauppauge WinTV-HVR1500Q [0070:7790,0070:7797] + 6 -> Hauppauge WinTV-HVR1500 [0070:7710,0070:7717] diff --git a/Documentation/video4linux/CARDLIST.cx88 b/Documentation/video4linux/CARDLIST.cx88 index 82ac8250e978..bc5593bd9704 100644 --- a/Documentation/video4linux/CARDLIST.cx88 +++ b/Documentation/video4linux/CARDLIST.cx88 @@ -56,3 +56,4 @@ 55 -> Shenzhen Tungsten Ages Tech TE-DTV-250 / Swann OEM [c180:c980] 56 -> Hauppauge WinTV-HVR1300 DVB-T/Hybrid MPEG Encoder [0070:9600,0070:9601,0070:9602] 57 -> ADS Tech Instant 
Video PCI [1421:0390] + 58 -> Pinnacle PCTV HD 800i [11bd:0051] diff --git a/Documentation/video4linux/CARDLIST.em28xx b/Documentation/video4linux/CARDLIST.em28xx index 37f0e3cedf43..6a8469f2bcae 100644 --- a/Documentation/video4linux/CARDLIST.em28xx +++ b/Documentation/video4linux/CARDLIST.em28xx @@ -1,14 +1,17 @@ 0 -> Unknown EM2800 video grabber (em2800) [eb1a:2800] - 1 -> Unknown EM2820/2840 video grabber (em2820/em2840) + 1 -> Unknown EM2750/28xx video grabber (em2820/em2840) [eb1a:2750,eb1a:2820,eb1a:2821,eb1a:2860,eb1a:2861,eb1a:2870,eb1a:2881,eb1a:2883] 2 -> Terratec Cinergy 250 USB (em2820/em2840) [0ccd:0036] 3 -> Pinnacle PCTV USB 2 (em2820/em2840) [2304:0208] - 4 -> Hauppauge WinTV USB 2 (em2820/em2840) [2040:4200] - 5 -> MSI VOX USB 2.0 (em2820/em2840) [eb1a:2820] + 4 -> Hauppauge WinTV USB 2 (em2820/em2840) [2040:4200,2040:4201] + 5 -> MSI VOX USB 2.0 (em2820/em2840) 6 -> Terratec Cinergy 200 USB (em2800) 7 -> Leadtek Winfast USB II (em2800) 8 -> Kworld USB2800 (em2800) - 9 -> Pinnacle Dazzle DVC 90 (em2820/em2840) [2304:0207] - 10 -> Hauppauge WinTV HVR 900 (em2880) - 11 -> Terratec Hybrid XS (em2880) + 9 -> Pinnacle Dazzle DVC 90/DVC 100 (em2820/em2840) [2304:0207,2304:021a] + 10 -> Hauppauge WinTV HVR 900 (em2880) [2040:6500] + 11 -> Terratec Hybrid XS (em2880) [0ccd:0042] 12 -> Kworld PVR TV 2800 RF (em2820/em2840) - 13 -> Terratec Prodigy XS (em2880) + 13 -> Terratec Prodigy XS (em2880) [0ccd:0047] + 14 -> Pixelview Prolink PlayTV USB 2.0 (em2820/em2840) + 15 -> V-Gear PocketTV (em2800) + 16 -> Hauppauge WinTV HVR 950 (em2880) [2040:6513] diff --git a/Documentation/video4linux/CARDLIST.ivtv b/Documentation/video4linux/CARDLIST.ivtv index ddd76a0eb100..a019e27e42b3 100644 --- a/Documentation/video4linux/CARDLIST.ivtv +++ b/Documentation/video4linux/CARDLIST.ivtv @@ -16,3 +16,9 @@ 16 -> GOTVIEW PCI DVD2 Deluxe [ffac:0600] 17 -> Yuan MPC622 [ff01:d998] 18 -> Digital Cowboy DCT-MTVP1 [1461:bfff] +19 -> Yuan PG600V2/GotView PCI DVD Lite 
[ffab:0600,ffad:0600] +20 -> Club3D ZAP-TV1x01 [ffab:0600] +21 -> AverTV MCE 116 Plus [1461:c439] +22 -> ASUS Falcon2 [1043:4b66,1043:462e,1043:4b2e] +23 -> AverMedia PVR-150 Plus [1461:c035] +24 -> AverMedia EZMaker PCI Deluxe [1461:c03f] diff --git a/Documentation/video4linux/CARDLIST.saa7134 b/Documentation/video4linux/CARDLIST.saa7134 index a14545300e4c..5d3b6b4d2515 100644 --- a/Documentation/video4linux/CARDLIST.saa7134 +++ b/Documentation/video4linux/CARDLIST.saa7134 @@ -80,7 +80,7 @@ 79 -> Sedna/MuchTV PC TV Cardbus TV/Radio (ITO25 Rev:2B) 80 -> ASUS Digimatrix TV [1043:0210] 81 -> Philips Tiger reference design [1131:2018] - 82 -> MSI TV@Anywhere plus [1462:6231] + 82 -> MSI TV@Anywhere plus [1462:6231,1462:8624] 83 -> Terratec Cinergy 250 PCI TV [153b:1160] 84 -> LifeView FlyDVB Trio [5168:0319] 85 -> AverTV DVB-T 777 [1461:2c05,1461:2c05] @@ -102,7 +102,7 @@ 101 -> Pinnacle PCTV 310i [11bd:002f] 102 -> Avermedia AVerTV Studio 507 [1461:9715] 103 -> Compro Videomate DVB-T200A -104 -> Hauppauge WinTV-HVR1110 DVB-T/Hybrid [0070:6701] +104 -> Hauppauge WinTV-HVR1110 DVB-T/Hybrid [0070:6700,0070:6701,0070:6702,0070:6703,0070:6704,0070:6705] 105 -> Terratec Cinergy HT PCMCIA [153b:1172] 106 -> Encore ENLTV [1131:2342,1131:2341,3016:2344] 107 -> Encore ENLTV-FM [1131:230f] @@ -116,3 +116,16 @@ 115 -> Sabrent PCMCIA TV-PCB05 [0919:2003] 116 -> 10MOONS TM300 TV Card [1131:2304] 117 -> Avermedia Super 007 [1461:f01d] +118 -> Beholder BeholdTV 401 [0000:4016] +119 -> Beholder BeholdTV 403 [0000:4036] +120 -> Beholder BeholdTV 403 FM [0000:4037] +121 -> Beholder BeholdTV 405 [0000:4050] +122 -> Beholder BeholdTV 405 FM [0000:4051] +123 -> Beholder BeholdTV 407 [0000:4070] +124 -> Beholder BeholdTV 407 FM [0000:4071] +125 -> Beholder BeholdTV 409 [0000:4090] +126 -> Beholder BeholdTV 505 FM/RDS [0000:5051,0000:505B,5ace:5050] +127 -> Beholder BeholdTV 507 FM/RDS / BeholdTV 509 FM [0000:5071,0000:507B,5ace:5070,5ace:5090] +128 -> Beholder BeholdTV Columbus TVFM 
[0000:5201] +129 -> Beholder BeholdTV 607 / BeholdTV 609 [5ace:6070,5ace:6071,5ace:6072,5ace:6073,5ace:6090,5ace:6091,5ace:6092,5ace:6093] +130 -> Beholder BeholdTV M6 / BeholdTV M6 Extra [5ace:6190,5ace:6193] diff --git a/Documentation/video4linux/CARDLIST.tuner b/Documentation/video4linux/CARDLIST.tuner index a88c02d23805..0e2394695bb8 100644 --- a/Documentation/video4linux/CARDLIST.tuner +++ b/Documentation/video4linux/CARDLIST.tuner @@ -52,7 +52,7 @@ tuner=50 - TCL 2002N tuner=51 - Philips PAL/SECAM_D (FM 1256 I-H3) tuner=52 - Thomson DTT 7610 (ATSC/NTSC) tuner=53 - Philips FQ1286 -tuner=54 - tda8290+75 +tuner=54 - Philips/NXP TDA 8290/8295 + 8275/8275A/18271 tuner=55 - TCL 2002MB tuner=56 - Philips PAL/SECAM multi (FQ1216AME MK4) tuner=57 - Philips FQ1236A MK4 @@ -69,7 +69,8 @@ tuner=67 - Philips TD1316 Hybrid Tuner tuner=68 - Philips TUV1236D ATSC/NTSC dual in tuner=69 - Tena TNF 5335 and similar models tuner=70 - Samsung TCPN 2121P30A -tuner=71 - Xceive xc3028 +tuner=71 - Xceive xc2028/xc3028 tuner tuner=72 - Thomson FE6600 tuner=73 - Samsung TCPG 6121P30A tuner=75 - Philips TEA5761 FM Radio +tuner=76 - Xceive 5000 tuner diff --git a/Documentation/video4linux/CARDLIST.usbvision b/Documentation/video4linux/CARDLIST.usbvision index 3d6850ef0245..0b72d3fee17e 100644 --- a/Documentation/video4linux/CARDLIST.usbvision +++ b/Documentation/video4linux/CARDLIST.usbvision @@ -62,3 +62,4 @@ 61 -> Pinnacle Studio Linx Video input cable (PAL) [2304:0301] 62 -> Pinnacle PCTV Bungee USB (PAL) FM [2304:0419] 63 -> Hauppauge WinTv-USB [2400:4200] + 64 -> Pinnacle Studio PCTV USB (NTSC) FM V3 [2304:0113] diff --git a/Documentation/video4linux/extract_xc3028.pl b/Documentation/video4linux/extract_xc3028.pl new file mode 100644 index 000000000000..cced8ac5c543 --- /dev/null +++ b/Documentation/video4linux/extract_xc3028.pl @@ -0,0 +1,926 @@ +#!/usr/bin/perl + +# Copyright (c) Mauro Carvalho Chehab <mchehab@infradead.org> +# Released under GPLv2 +# +# In order to use, you need 
to: +# 1) Download the windows driver with something like: +# wget http://www.steventoth.net/linux/xc5000/HVR-12x0-14x0-17x0_1_25_25271_WHQL.zip +# 2) Extract the file hcw85bda.sys from the zip into the current dir: +# unzip -j HVR-12x0-14x0-17x0_1_25_25271_WHQL.zip Driver85/hcw85bda.sys +# 3) run the script: +# ./extract_xc3028.pl +# 4) copy the generated file: +# cp xc3028-v27.fw /lib/firmware + +#use strict; +use IO::Handle; + +my $debug=0; + +sub verify ($$) +{ + my ($filename, $hash) = @_; + my ($testhash); + + if (system("which md5sum > /dev/null 2>&1")) { + die "This firmware requires the md5sum command - see http://www.gnu.org/software/coreutils/\n"; + } + + open(CMD, "md5sum ".$filename."|"); + $testhash = <CMD>; + $testhash =~ /([a-zA-Z0-9]*)/; + $testhash = $1; + close CMD; + die "Hash of extracted file does not match (found $testhash, expected $hash!\n" if ($testhash ne $hash); +} + +sub get_hunk ($$) +{ + my ($offset, $length) = @_; + my ($chunklength, $buf, $rcount, $out); + + sysseek(INFILE, $offset, SEEK_SET); + while ($length > 0) { + # Calc chunk size + $chunklength = 2048; + $chunklength = $length if ($chunklength > $length); + + $rcount = sysread(INFILE, $buf, $chunklength); + die "Ran out of data\n" if ($rcount != $chunklength); + $out .= $buf; + $length -= $rcount; + } + return $out; +} + +sub write_le16($) +{ + my $val = shift; + my $msb = ($val >> 8) &0xff; + my $lsb = $val & 0xff; + + syswrite(OUTFILE, chr($lsb).chr($msb)); +} + +sub write_le32($) +{ + my $val = shift; + my $l3 = ($val >> 24) & 0xff; + my $l2 = ($val >> 16) & 0xff; + my $l1 = ($val >> 8) & 0xff; + my $l0 = $val & 0xff; + + syswrite(OUTFILE, chr($l0).chr($l1).chr($l2).chr($l3)); +} + +sub write_le64($$) +{ + my $msb_val = shift; + my $lsb_val = shift; + my $l7 = ($msb_val >> 24) & 0xff; + my $l6 = ($msb_val >> 16) & 0xff; + my $l5 = ($msb_val >> 8) & 0xff; + my $l4 = $msb_val & 0xff; + + my $l3 = ($lsb_val >> 24) & 0xff; + my $l2 = ($lsb_val >> 16) & 0xff; + my $l1 = 
($lsb_val >> 8) & 0xff; + my $l0 = $lsb_val & 0xff; + + syswrite(OUTFILE, + chr($l0).chr($l1).chr($l2).chr($l3). + chr($l4).chr($l5).chr($l6).chr($l7)); +} + +sub write_hunk($$) +{ + my ($offset, $length) = @_; + my $out = get_hunk($offset, $length); + + printf "(len %d) ",$length if ($debug); + + for (my $i=0;$i<$length;$i++) { + printf "%02x ",ord(substr($out,$i,1)) if ($debug); + } + printf "\n" if ($debug); + + syswrite(OUTFILE, $out); +} + +sub write_hunk_fix_endian($$) +{ + my ($offset, $length) = @_; + my $out = get_hunk($offset, $length); + + printf "(len_fix %d) ",$length if ($debug); + + for (my $i=0;$i<$length;$i++) { + printf "%02x ",ord(substr($out,$i,1)) if ($debug); + } + printf "\n" if ($debug); + + my $i=0; + while ($i<$length) { + my $size = ord(substr($out,$i,1))*256+ord(substr($out,$i+1,1)); + syswrite(OUTFILE, substr($out,$i+1,1)); + syswrite(OUTFILE, substr($out,$i,1)); + $i+=2; + if ($size>0 && $size <0x8000) { + for (my $j=0;$j<$size;$j++) { + syswrite(OUTFILE, substr($out,$j+$i,1)); + } + $i+=$size; + } + } +} + +sub main_firmware($$$$) +{ + my $out; + my $j=0; + my $outfile = shift; + my $name = shift; + my $version = shift; + my $nr_desc = shift; + + for ($j = length($name); $j <32; $j++) { + $name = $name.chr(0); +} + + open OUTFILE, ">$outfile"; + syswrite(OUTFILE, $name); + write_le16($version); + write_le16($nr_desc); + + # + # Firmware 0, type: BASE FW F8MHZ (0x00000003), id: (0000000000000000), size: 8718 + # + + write_le32(0x00000003); # Type + write_le64(0x00000000, 0x00000000); # ID + write_le32(8718); # Size + write_hunk_fix_endian(813432, 8718); + + # + # Firmware 1, type: BASE FW F8MHZ MTS (0x00000007), id: (0000000000000000), size: 8712 + # + + write_le32(0x00000007); # Type + write_le64(0x00000000, 0x00000000); # ID + write_le32(8712); # Size + write_hunk_fix_endian(822152, 8712); + + # + # Firmware 2, type: BASE FW FM (0x00000401), id: (0000000000000000), size: 8562 + # + + write_le32(0x00000401); # Type + 
write_le64(0x00000000, 0x00000000); # ID + write_le32(8562); # Size + write_hunk_fix_endian(830872, 8562); + + # + # Firmware 3, type: BASE FW FM INPUT1 (0x00000c01), id: (0000000000000000), size: 8576 + # + + write_le32(0x00000c01); # Type + write_le64(0x00000000, 0x00000000); # ID + write_le32(8576); # Size + write_hunk_fix_endian(839440, 8576); + + # + # Firmware 4, type: BASE FW (0x00000001), id: (0000000000000000), size: 8706 + # + + write_le32(0x00000001); # Type + write_le64(0x00000000, 0x00000000); # ID + write_le32(8706); # Size + write_hunk_fix_endian(848024, 8706); + + # + # Firmware 5, type: BASE FW MTS (0x00000005), id: (0000000000000000), size: 8682 + # + + write_le32(0x00000005); # Type + write_le64(0x00000000, 0x00000000); # ID + write_le32(8682); # Size + write_hunk_fix_endian(856736, 8682); + + # + # Firmware 6, type: STD FW (0x00000000), id: PAL/BG A2/A (0000000100000007), size: 161 + # + + write_le32(0x00000000); # Type + write_le64(0x00000001, 0x00000007); # ID + write_le32(161); # Size + write_hunk_fix_endian(865424, 161); + + # + # Firmware 7, type: STD FW MTS (0x00000004), id: PAL/BG A2/A (0000000100000007), size: 169 + # + + write_le32(0x00000004); # Type + write_le64(0x00000001, 0x00000007); # ID + write_le32(169); # Size + write_hunk_fix_endian(865592, 169); + + # + # Firmware 8, type: STD FW (0x00000000), id: PAL/BG A2/B (0000000200000007), size: 161 + # + + write_le32(0x00000000); # Type + write_le64(0x00000002, 0x00000007); # ID + write_le32(161); # Size + write_hunk_fix_endian(865424, 161); + + # + # Firmware 9, type: STD FW MTS (0x00000004), id: PAL/BG A2/B (0000000200000007), size: 169 + # + + write_le32(0x00000004); # Type + write_le64(0x00000002, 0x00000007); # ID + write_le32(169); # Size + write_hunk_fix_endian(865592, 169); + + # + # Firmware 10, type: STD FW (0x00000000), id: PAL/BG NICAM/A (0000000400000007), size: 161 + # + + write_le32(0x00000000); # Type + write_le64(0x00000004, 0x00000007); # ID + write_le32(161); # Size 
+ write_hunk_fix_endian(866112, 161); + + # + # Firmware 11, type: STD FW MTS (0x00000004), id: PAL/BG NICAM/A (0000000400000007), size: 169 + # + + write_le32(0x00000004); # Type + write_le64(0x00000004, 0x00000007); # ID + write_le32(169); # Size + write_hunk_fix_endian(866280, 169); + + # + # Firmware 12, type: STD FW (0x00000000), id: PAL/BG NICAM/B (0000000800000007), size: 161 + # + + write_le32(0x00000000); # Type + write_le64(0x00000008, 0x00000007); # ID + write_le32(161); # Size + write_hunk_fix_endian(866112, 161); + + # + # Firmware 13, type: STD FW MTS (0x00000004), id: PAL/BG NICAM/B (0000000800000007), size: 169 + # + + write_le32(0x00000004); # Type + write_le64(0x00000008, 0x00000007); # ID + write_le32(169); # Size + write_hunk_fix_endian(866280, 169); + + # + # Firmware 14, type: STD FW (0x00000000), id: PAL/DK A2 (00000003000000e0), size: 161 + # + + write_le32(0x00000000); # Type + write_le64(0x00000003, 0x000000e0); # ID + write_le32(161); # Size + write_hunk_fix_endian(866800, 161); + + # + # Firmware 15, type: STD FW MTS (0x00000004), id: PAL/DK A2 (00000003000000e0), size: 169 + # + + write_le32(0x00000004); # Type + write_le64(0x00000003, 0x000000e0); # ID + write_le32(169); # Size + write_hunk_fix_endian(866968, 169); + + # + # Firmware 16, type: STD FW (0x00000000), id: PAL/DK NICAM (0000000c000000e0), size: 161 + # + + write_le32(0x00000000); # Type + write_le64(0x0000000c, 0x000000e0); # ID + write_le32(161); # Size + write_hunk_fix_endian(867144, 161); + + # + # Firmware 17, type: STD FW MTS (0x00000004), id: PAL/DK NICAM (0000000c000000e0), size: 169 + # + + write_le32(0x00000004); # Type + write_le64(0x0000000c, 0x000000e0); # ID + write_le32(169); # Size + write_hunk_fix_endian(867312, 169); + + # + # Firmware 18, type: STD FW (0x00000000), id: SECAM/K1 (0000000000200000), size: 161 + # + + write_le32(0x00000000); # Type + write_le64(0x00000000, 0x00200000); # ID + write_le32(161); # Size + write_hunk_fix_endian(867488, 161); + + # 
+ # Firmware 19, type: STD FW MTS (0x00000004), id: SECAM/K1 (0000000000200000), size: 169 + # + + write_le32(0x00000004); # Type + write_le64(0x00000000, 0x00200000); # ID + write_le32(169); # Size + write_hunk_fix_endian(867656, 169); + + # + # Firmware 20, type: STD FW (0x00000000), id: SECAM/K3 (0000000004000000), size: 161 + # + + write_le32(0x00000000); # Type + write_le64(0x00000000, 0x04000000); # ID + write_le32(161); # Size + write_hunk_fix_endian(867832, 161); + + # + # Firmware 21, type: STD FW MTS (0x00000004), id: SECAM/K3 (0000000004000000), size: 169 + # + + write_le32(0x00000004); # Type + write_le64(0x00000000, 0x04000000); # ID + write_le32(169); # Size + write_hunk_fix_endian(868000, 169); + + # + # Firmware 22, type: STD FW D2633 DTV6 ATSC (0x00010030), id: (0000000000000000), size: 149 + # + + write_le32(0x00010030); # Type + write_le64(0x00000000, 0x00000000); # ID + write_le32(149); # Size + write_hunk_fix_endian(868176, 149); + + # + # Firmware 23, type: STD FW D2620 DTV6 QAM (0x00000068), id: (0000000000000000), size: 149 + # + + write_le32(0x00000068); # Type + write_le64(0x00000000, 0x00000000); # ID + write_le32(149); # Size + write_hunk_fix_endian(868336, 149); + + # + # Firmware 24, type: STD FW D2633 DTV6 QAM (0x00000070), id: (0000000000000000), size: 149 + # + + write_le32(0x00000070); # Type + write_le64(0x00000000, 0x00000000); # ID + write_le32(149); # Size + write_hunk_fix_endian(868488, 149); + + # + # Firmware 25, type: STD FW D2620 DTV7 (0x00000088), id: (0000000000000000), size: 149 + # + + write_le32(0x00000088); # Type + write_le64(0x00000000, 0x00000000); # ID + write_le32(149); # Size + write_hunk_fix_endian(868648, 149); + + # + # Firmware 26, type: STD FW D2633 DTV7 (0x00000090), id: (0000000000000000), size: 149 + # + + write_le32(0x00000090); # Type + write_le64(0x00000000, 0x00000000); # ID + write_le32(149); # Size + write_hunk_fix_endian(868800, 149); + + # + # Firmware 27, type: STD FW D2620 DTV78 (0x00000108), 
id: (0000000000000000), size: 149 + # + + write_le32(0x00000108); # Type + write_le64(0x00000000, 0x00000000); # ID + write_le32(149); # Size + write_hunk_fix_endian(868960, 149); + + # + # Firmware 28, type: STD FW D2633 DTV78 (0x00000110), id: (0000000000000000), size: 149 + # + + write_le32(0x00000110); # Type + write_le64(0x00000000, 0x00000000); # ID + write_le32(149); # Size + write_hunk_fix_endian(869112, 149); + + # + # Firmware 29, type: STD FW D2620 DTV8 (0x00000208), id: (0000000000000000), size: 149 + # + + write_le32(0x00000208); # Type + write_le64(0x00000000, 0x00000000); # ID + write_le32(149); # Size + write_hunk_fix_endian(868648, 149); + + # + # Firmware 30, type: STD FW D2633 DTV8 (0x00000210), id: (0000000000000000), size: 149 + # + + write_le32(0x00000210); # Type + write_le64(0x00000000, 0x00000000); # ID + write_le32(149); # Size + write_hunk_fix_endian(868800, 149); + + # + # Firmware 31, type: STD FW FM (0x00000400), id: (0000000000000000), size: 135 + # + + write_le32(0x00000400); # Type + write_le64(0x00000000, 0x00000000); # ID + write_le32(135); # Size + write_hunk_fix_endian(869584, 135); + + # + # Firmware 32, type: STD FW (0x00000000), id: PAL/I (0000000000000010), size: 161 + # + + write_le32(0x00000000); # Type + write_le64(0x00000000, 0x00000010); # ID + write_le32(161); # Size + write_hunk_fix_endian(869728, 161); + + # + # Firmware 33, type: STD FW MTS (0x00000004), id: PAL/I (0000000000000010), size: 169 + # + + write_le32(0x00000004); # Type + write_le64(0x00000000, 0x00000010); # ID + write_le32(169); # Size + write_hunk_fix_endian(869896, 169); + + # + # Firmware 34, type: STD FW (0x00000000), id: SECAM/L AM (0000001000400000), size: 169 + # + + write_le32(0x00000000); # Type + write_le64(0x00000010, 0x00400000); # ID + write_le32(169); # Size + write_hunk_fix_endian(870072, 169); + + # + # Firmware 35, type: STD FW (0x00000000), id: SECAM/L NICAM (0000000c00400000), size: 161 + # + + write_le32(0x00000000); # Type + 
write_le64(0x0000000c, 0x00400000); # ID + write_le32(161); # Size + write_hunk_fix_endian(870248, 161); + + # + # Firmware 36, type: STD FW (0x00000000), id: SECAM/Lc (0000000000800000), size: 161 + # + + write_le32(0x00000000); # Type + write_le64(0x00000000, 0x00800000); # ID + write_le32(161); # Size + write_hunk_fix_endian(870416, 161); + + # + # Firmware 37, type: STD FW (0x00000000), id: NTSC/M Kr (0000000000008000), size: 161 + # + + write_le32(0x00000000); # Type + write_le64(0x00000000, 0x00008000); # ID + write_le32(161); # Size + write_hunk_fix_endian(870584, 161); + + # + # Firmware 38, type: STD FW LCD (0x00001000), id: NTSC/M Kr (0000000000008000), size: 161 + # + + write_le32(0x00001000); # Type + write_le64(0x00000000, 0x00008000); # ID + write_le32(161); # Size + write_hunk_fix_endian(870752, 161); + + # + # Firmware 39, type: STD FW LCD NOGD (0x00003000), id: NTSC/M Kr (0000000000008000), size: 161 + # + + write_le32(0x00003000); # Type + write_le64(0x00000000, 0x00008000); # ID + write_le32(161); # Size + write_hunk_fix_endian(870920, 161); + + # + # Firmware 40, type: STD FW MTS (0x00000004), id: NTSC/M Kr (0000000000008000), size: 169 + # + + write_le32(0x00000004); # Type + write_le64(0x00000000, 0x00008000); # ID + write_le32(169); # Size + write_hunk_fix_endian(871088, 169); + + # + # Firmware 41, type: STD FW (0x00000000), id: NTSC PAL/M PAL/N (000000000000b700), size: 161 + # + + write_le32(0x00000000); # Type + write_le64(0x00000000, 0x0000b700); # ID + write_le32(161); # Size + write_hunk_fix_endian(871264, 161); + + # + # Firmware 42, type: STD FW LCD (0x00001000), id: NTSC PAL/M PAL/N (000000000000b700), size: 161 + # + + write_le32(0x00001000); # Type + write_le64(0x00000000, 0x0000b700); # ID + write_le32(161); # Size + write_hunk_fix_endian(871432, 161); + + # + # Firmware 43, type: STD FW LCD NOGD (0x00003000), id: NTSC PAL/M PAL/N (000000000000b700), size: 161 + # + + write_le32(0x00003000); # Type + write_le64(0x00000000, 
0x0000b700); # ID + write_le32(161); # Size + write_hunk_fix_endian(871600, 161); + + # + # Firmware 44, type: STD FW (0x00000000), id: NTSC/M Jp (0000000000002000), size: 161 + # + + write_le32(0x00000000); # Type + write_le64(0x00000000, 0x00002000); # ID + write_le32(161); # Size + write_hunk_fix_endian(871264, 161); + + # + # Firmware 45, type: STD FW MTS (0x00000004), id: NTSC PAL/M PAL/N (000000000000b700), size: 169 + # + + write_le32(0x00000004); # Type + write_le64(0x00000000, 0x0000b700); # ID + write_le32(169); # Size + write_hunk_fix_endian(871936, 169); + + # + # Firmware 46, type: STD FW MTS LCD (0x00001004), id: NTSC PAL/M PAL/N (000000000000b700), size: 169 + # + + write_le32(0x00001004); # Type + write_le64(0x00000000, 0x0000b700); # ID + write_le32(169); # Size + write_hunk_fix_endian(872112, 169); + + # + # Firmware 47, type: STD FW MTS LCD NOGD (0x00003004), id: NTSC PAL/M PAL/N (000000000000b700), size: 169 + # + + write_le32(0x00003004); # Type + write_le64(0x00000000, 0x0000b700); # ID + write_le32(169); # Size + write_hunk_fix_endian(872288, 169); + + # + # Firmware 48, type: SCODE FW HAS IF (0x60000000), IF = 3.28 MHz id: (0000000000000000), size: 192 + # + + write_le32(0x60000000); # Type + write_le64(0x00000000, 0x00000000); # ID + write_le16(3280); # IF + write_le32(192); # Size + write_hunk(811896, 192); + + # + # Firmware 49, type: SCODE FW HAS IF (0x60000000), IF = 3.30 MHz id: (0000000000000000), size: 192 + # + + write_le32(0x60000000); # Type + write_le64(0x00000000, 0x00000000); # ID + write_le16(3300); # IF + write_le32(192); # Size + write_hunk(813048, 192); + + # + # Firmware 50, type: SCODE FW HAS IF (0x60000000), IF = 3.44 MHz id: (0000000000000000), size: 192 + # + + write_le32(0x60000000); # Type + write_le64(0x00000000, 0x00000000); # ID + write_le16(3440); # IF + write_le32(192); # Size + write_hunk(812280, 192); + + # + # Firmware 51, type: SCODE FW HAS IF (0x60000000), IF = 3.46 MHz id: (0000000000000000), size: 192 + # 
+
+	write_le32(0x60000000); # Type
+	write_le64(0x00000000, 0x00000000); # ID
+	write_le16(3460); # IF
+	write_le32(192); # Size
+	write_hunk(812472, 192);
+
+	#
+	# Firmware 52, type: SCODE FW DTV6 ATSC OREN36 HAS IF (0x60210020), IF = 3.80 MHz id: (0000000000000000), size: 192
+	#
+
+	write_le32(0x60210020); # Type
+	write_le64(0x00000000, 0x00000000); # ID
+	write_le16(3800); # IF
+	write_le32(192); # Size
+	write_hunk(809784, 192);
+
+	#
+	# Firmware 53, type: SCODE FW HAS IF (0x60000000), IF = 4.00 MHz id: (0000000000000000), size: 192
+	#
+
+	write_le32(0x60000000); # Type
+	write_le64(0x00000000, 0x00000000); # ID
+	write_le16(4000); # IF
+	write_le32(192); # Size
+	write_hunk(812088, 192);
+
+	#
+	# Firmware 54, type: SCODE FW DTV6 ATSC TOYOTA388 HAS IF (0x60410020), IF = 4.08 MHz id: (0000000000000000), size: 192
+	#
+
+	write_le32(0x60410020); # Type
+	write_le64(0x00000000, 0x00000000); # ID
+	write_le16(4080); # IF
+	write_le32(192); # Size
+	write_hunk(809976, 192);
+
+	#
+	# Firmware 55, type: SCODE FW HAS IF (0x60000000), IF = 4.20 MHz id: (0000000000000000), size: 192
+	#
+
+	write_le32(0x60000000); # Type
+	write_le64(0x00000000, 0x00000000); # ID
+	write_le16(4200); # IF
+	write_le32(192); # Size
+	write_hunk(811704, 192);
+
+	#
+	# Firmware 56, type: SCODE FW MONO HAS IF (0x60008000), IF = 4.32 MHz id: NTSC/M Kr (0000000000008000), size: 192
+	#
+
+	write_le32(0x60008000); # Type
+	write_le64(0x00000000, 0x00008000); # ID
+	write_le16(4320); # IF
+	write_le32(192); # Size
+	write_hunk(808056, 192);
+
+	#
+	# Firmware 57, type: SCODE FW HAS IF (0x60000000), IF = 4.45 MHz id: (0000000000000000), size: 192
+	#
+
+	write_le32(0x60000000); # Type
+	write_le64(0x00000000, 0x00000000); # ID
+	write_le16(4450); # IF
+	write_le32(192); # Size
+	write_hunk(812664, 192);
+
+	#
+	# Firmware 58, type: SCODE FW HAS IF (0x60000000), IF = 4.50 MHz id: NTSC/M Jp (0000000000002000), size: 192
+	#
+
+	write_le32(0x60000000); # Type
+	write_le64(0x00000000,
0x00002000); # ID
+	write_le16(4500); # IF
+	write_le32(192); # Size
+	write_hunk(807672, 192);
+
+	#
+	# Firmware 59, type: SCODE FW LCD NOGD IF HAS IF (0x60023000), IF = 4.60 MHz id: NTSC/M Kr (0000000000008000), size: 192
+	#
+
+	write_le32(0x60023000); # Type
+	write_le64(0x00000000, 0x00008000); # ID
+	write_le16(4600); # IF
+	write_le32(192); # Size
+	write_hunk(807864, 192);
+
+	#
+	# Firmware 60, type: SCODE FW DTV78 ZARLINK456 HAS IF (0x62000100), IF = 4.76 MHz id: (0000000000000000), size: 192
+	#
+
+	write_le32(0x62000100); # Type
+	write_le64(0x00000000, 0x00000000); # ID
+	write_le16(4760); # IF
+	write_le32(192); # Size
+	write_hunk(807288, 192);
+
+	#
+	# Firmware 61, type: SCODE FW HAS IF (0x60000000), IF = 4.94 MHz id: (0000000000000000), size: 192
+	#
+
+	write_le32(0x60000000); # Type
+	write_le64(0x00000000, 0x00000000); # ID
+	write_le16(4940); # IF
+	write_le32(192); # Size
+	write_hunk(811512, 192);
+
+	#
+	# Firmware 62, type: SCODE FW DTV7 ZARLINK456 HAS IF (0x62000080), IF = 5.26 MHz id: (0000000000000000), size: 192
+	#
+
+	write_le32(0x62000080); # Type
+	write_le64(0x00000000, 0x00000000); # ID
+	write_le16(5260); # IF
+	write_le32(192); # Size
+	write_hunk(810552, 192);
+
+	#
+	# Firmware 63, type: SCODE FW MONO HAS IF (0x60008000), IF = 5.32 MHz id: PAL/BG NICAM/B (0000000800000007), size: 192
+	#
+
+	write_le32(0x60008000); # Type
+	write_le64(0x00000008, 0x00000007); # ID
+	write_le16(5320); # IF
+	write_le32(192); # Size
+	write_hunk(810744, 192);
+
+	#
+	# Firmware 64, type: SCODE FW DTV8 CHINA HAS IF (0x64000200), IF = 5.40 MHz id: (0000000000000000), size: 192
+	#
+
+	write_le32(0x64000200); # Type
+	write_le64(0x00000000, 0x00000000); # ID
+	write_le16(5400); # IF
+	write_le32(192); # Size
+	write_hunk(807096, 192);
+
+	#
+	# Firmware 65, type: SCODE FW DTV6 ATSC OREN538 HAS IF (0x60110020), IF = 5.58 MHz id: (0000000000000000), size: 192
+	#
+
+	write_le32(0x60110020); # Type
+	write_le64(0x00000000, 0x00000000); # ID
+
write_le16(5580); # IF
+	write_le32(192); # Size
+	write_hunk(809592, 192);
+
+	#
+	# Firmware 66, type: SCODE FW HAS IF (0x60000000), IF = 5.64 MHz id: PAL/BG A2/B (0000000200000007), size: 192
+	#
+
+	write_le32(0x60000000); # Type
+	write_le64(0x00000002, 0x00000007); # ID
+	write_le16(5640); # IF
+	write_le32(192); # Size
+	write_hunk(808440, 192);
+
+	#
+	# Firmware 67, type: SCODE FW HAS IF (0x60000000), IF = 5.74 MHz id: PAL/BG NICAM/B (0000000800000007), size: 192
+	#
+
+	write_le32(0x60000000); # Type
+	write_le64(0x00000008, 0x00000007); # ID
+	write_le16(5740); # IF
+	write_le32(192); # Size
+	write_hunk(808632, 192);
+
+	#
+	# Firmware 68, type: SCODE FW DTV7 DIBCOM52 HAS IF (0x61000080), IF = 5.90 MHz id: (0000000000000000), size: 192
+	#
+
+	write_le32(0x61000080); # Type
+	write_le64(0x00000000, 0x00000000); # ID
+	write_le16(5900); # IF
+	write_le32(192); # Size
+	write_hunk(810360, 192);
+
+	#
+	# Firmware 69, type: SCODE FW MONO HAS IF (0x60008000), IF = 6.00 MHz id: PAL/I (0000000000000010), size: 192
+	#
+
+	write_le32(0x60008000); # Type
+	write_le64(0x00000000, 0x00000010); # ID
+	write_le16(6000); # IF
+	write_le32(192); # Size
+	write_hunk(808824, 192);
+
+	#
+	# Firmware 70, type: SCODE FW DTV6 QAM F6MHZ HAS IF (0x68000060), IF = 6.20 MHz id: (0000000000000000), size: 192
+	#
+
+	write_le32(0x68000060); # Type
+	write_le64(0x00000000, 0x00000000); # ID
+	write_le16(6200); # IF
+	write_le32(192); # Size
+	write_hunk(809400, 192);
+
+	#
+	# Firmware 71, type: SCODE FW HAS IF (0x60000000), IF = 6.24 MHz id: PAL/I (0000000000000010), size: 192
+	#
+
+	write_le32(0x60000000); # Type
+	write_le64(0x00000000, 0x00000010); # ID
+	write_le16(6240); # IF
+	write_le32(192); # Size
+	write_hunk(808248, 192);
+
+	#
+	# Firmware 72, type: SCODE FW MONO HAS IF (0x60008000), IF = 6.32 MHz id: SECAM/K1 (0000000000200000), size: 192
+	#
+
+	write_le32(0x60008000); # Type
+	write_le64(0x00000000, 0x00200000); # ID
+	write_le16(6320); # IF
+	write_le32(192); #
Size
+	write_hunk(811320, 192);
+
+	#
+	# Firmware 73, type: SCODE FW HAS IF (0x60000000), IF = 6.34 MHz id: SECAM/K1 (0000000000200000), size: 192
+	#
+
+	write_le32(0x60000000); # Type
+	write_le64(0x00000000, 0x00200000); # ID
+	write_le16(6340); # IF
+	write_le32(192); # Size
+	write_hunk(809208, 192);
+
+	#
+	# Firmware 74, type: SCODE FW MONO HAS IF (0x60008000), IF = 6.50 MHz id: SECAM/K3 (0000000004000000), size: 192
+	#
+
+	write_le32(0x60008000); # Type
+	write_le64(0x00000000, 0x04000000); # ID
+	write_le16(6500); # IF
+	write_le32(192); # Size
+	write_hunk(811128, 192);
+
+	#
+	# Firmware 75, type: SCODE FW DTV6 ATSC ATI638 HAS IF (0x60090020), IF = 6.58 MHz id: (0000000000000000), size: 192
+	#
+
+	write_le32(0x60090020); # Type
+	write_le64(0x00000000, 0x00000000); # ID
+	write_le16(6580); # IF
+	write_le32(192); # Size
+	write_hunk(807480, 192);
+
+	#
+	# Firmware 76, type: SCODE FW HAS IF (0x60000000), IF = 6.60 MHz id: PAL/DK A2 (00000003000000e0), size: 192
+	#
+
+	write_le32(0x60000000); # Type
+	write_le64(0x00000003, 0x000000e0); # ID
+	write_le16(6600); # IF
+	write_le32(192); # Size
+	write_hunk(809016, 192);
+
+	#
+	# Firmware 77, type: SCODE FW MONO HAS IF (0x60008000), IF = 6.68 MHz id: PAL/DK A2 (00000003000000e0), size: 192
+	#
+
+	write_le32(0x60008000); # Type
+	write_le64(0x00000003, 0x000000e0); # ID
+	write_le16(6680); # IF
+	write_le32(192); # Size
+	write_hunk(810936, 192);
+
+	#
+	# Firmware 78, type: SCODE FW DTV6 ATSC TOYOTA794 HAS IF (0x60810020), IF = 8.14 MHz id: (0000000000000000), size: 192
+	#
+
+	write_le32(0x60810020); # Type
+	write_le64(0x00000000, 0x00000000); # ID
+	write_le16(8140); # IF
+	write_le32(192); # Size
+	write_hunk(810168, 192);
+
+	#
+	# Firmware 79, type: SCODE FW HAS IF (0x60000000), IF = 8.20 MHz id: (0000000000000000), size: 192
+	#
+
+	write_le32(0x60000000); # Type
+	write_le64(0x00000000, 0x00000000); # ID
+	write_le16(8200); # IF
+	write_le32(192); # Size
+	write_hunk(812856, 192);
+}
+
+sub
extract_firmware {
+	my $sourcefile = "hcw85bda.sys";
+	my $hash = "0e44dbf63bb0169d57446aec21881ff2";
+	my $outfile = "xc3028-v27.fw";
+	my $name = "xc2028 firmware";
+	my $version = 519;
+	my $nr_desc = 80;
+	my $out;
+
+	verify($sourcefile, $hash);
+
+	open INFILE, "<$sourcefile";
+	main_firmware($outfile, $name, $version, $nr_desc);
+	close INFILE;
+}
+
+extract_firmware;
+printf "Firmwares generated.\n";
diff --git a/Documentation/video4linux/sn9c102.txt b/Documentation/video4linux/sn9c102.txt
index 1ffad19ce891..b26f5195af51 100644
--- a/Documentation/video4linux/sn9c102.txt
+++ b/Documentation/video4linux/sn9c102.txt
@@ -568,6 +568,7 @@ the fingerprint is: '88E8 F32F 7244 68BA 3958 5D40 99DA 5D2A FCE6 35A4'.
 Many thanks to following persons for their contribute (listed in alphabetical
 order):
 
+- David Anderson for the donation of a webcam;
 - Luca Capello for the donation of a webcam;
 - Philippe Coval for having helped testing the PAS202BCA image sensor;
 - Joao Rodrigo Fuzaro, Joao Limirio, Claudio Filho and Caio Begotti for the
diff --git a/Documentation/vm/slabinfo.c b/Documentation/vm/slabinfo.c
index 7047696c47a1..488c1f31b992 100644
--- a/Documentation/vm/slabinfo.c
+++ b/Documentation/vm/slabinfo.c
@@ -1021,7 +1021,7 @@ void read_slab_dir(void)
 	char *t;
 	int count;
 
-	if (chdir("/sys/slab"))
+	if (chdir("/sys/kernel/slab"))
 		fatal("SYSFS support for SLUB not active\n");
 
 	dir = opendir(".");
diff --git a/Documentation/vm/slub.txt b/Documentation/vm/slub.txt
index d17f324db9f5..dcf8bcf846d6 100644
--- a/Documentation/vm/slub.txt
+++ b/Documentation/vm/slub.txt
@@ -63,7 +63,7 @@ In case you forgot to enable debugging on the kernel command line:
 It is possible to enable debugging manually when the kernel is up. Look at the
 contents of:
 
-/sys/slab/<slab name>/
+/sys/kernel/slab/<slab name>/
 
 Look at the writable files. Writing 1 to them will enable the
 corresponding debug option.
All options can be set on a slab that does
diff --git a/Documentation/x86_64/boot-options.txt b/Documentation/x86_64/boot-options.txt
index 945311840a10..34abae4e9442 100644
--- a/Documentation/x86_64/boot-options.txt
+++ b/Documentation/x86_64/boot-options.txt
@@ -110,12 +110,18 @@ Idle loop
 
 Rebooting
 
-   reboot=b[ios] | t[riple] | k[bd] [, [w]arm | [c]old]
+   reboot=b[ios] | t[riple] | k[bd] | a[cpi] | e[fi] [, [w]arm | [c]old]
    bios   Use the CPU reboot vector for warm reset
    warm   Don't set the cold reboot flag
    cold   Set the cold reboot flag
    triple Force a triple fault (init)
    kbd    Use the keyboard controller. cold reset (default)
+   acpi   Use the ACPI RESET_REG in the FADT. If ACPI is not configured or the
+          ACPI reset does not work, the reboot path attempts the reset using
+          the keyboard controller.
+   efi    Use efi reset_system runtime service. If EFI is not configured or the
+          EFI reset does not work, the reboot path attempts the reset using
+          the keyboard controller.
 
    Using warm reset will be much faster especially on big memory
    systems because the BIOS will not go through the memory check.
diff --git a/Documentation/x86_64/uefi.txt b/Documentation/x86_64/uefi.txt
index 91a98edfb588..7d77120a5184 100644
--- a/Documentation/x86_64/uefi.txt
+++ b/Documentation/x86_64/uefi.txt
@@ -19,6 +19,10 @@ Mechanics:
 - Build the kernel with the following configuration.
 	CONFIG_FB_EFI=y
 	CONFIG_FRAMEBUFFER_CONSOLE=y
+  If EFI runtime services are expected, the following configuration should
+  be selected.
+	CONFIG_EFI=y
+	CONFIG_EFI_VARS=y or m # optional
 - Create a VFAT partition on the disk
 - Copy the following to the VFAT partition:
 	elilo bootloader with x86_64 support, elilo configuration file,
@@ -27,3 +31,8 @@ Mechanics:
 	can be found in the elilo sourceforge project.
 - Boot to EFI shell and invoke elilo choosing the kernel image built
   in first step.
+- If some or all EFI runtime services don't work, you can try following + kernel command line parameters to turn off some or all EFI runtime + services. + noefi turn off all EFI runtime services + reboot_type=k turn off EFI reboot runtime service diff --git a/Documentation/zh_CN/CodingStyle b/Documentation/zh_CN/CodingStyle new file mode 100644 index 000000000000..ecd9307a641f --- /dev/null +++ b/Documentation/zh_CN/CodingStyle @@ -0,0 +1,701 @@ +Chinese translated version of Documentation/CodingStyle + +If you have any comment or update to the content, please post to LKML directly. +However, if you have problem communicating in English you can also ask the +Chinese maintainer for help. Contact the Chinese maintainer, if this +translation is outdated or there is problem with translation. + +Chinese maintainer: Zhang Le <r0bertz@gentoo.org> +--------------------------------------------------------------------- +Documentation/CodingStyle的中文翻译 + +如果想评论或更新本文的内容,请直接发信到LKML。如果你使用英文交流有困难的话,也可 +以向中文版维护者求助。如果本翻译更新不及时或者翻译存在问题,请联系中文版维护者。 + +中文版维护者: 张乐 Zhang Le <r0bertz@gentoo.org> +中文版翻译者: 张乐 Zhang Le <r0bertz@gentoo.org> +中文版校译者: 王聪 Wang Cong <xiyou.wangcong@gmail.com> + wheelz <kernel.zeng@gmail.com> + 管旭东 Xudong Guan <xudong.guan@gmail.com> + Li Zefan <lizf@cn.fujitsu.com> + Wang Chen <wangchen@cn.fujitsu.com> +以下为正文 +--------------------------------------------------------------------- + + Linux内核代码风格 + +这是一个简短的文档,描述了linux内核的首选代码风格。代码风格是因人而异的,而且我 +不愿意把我的观点强加给任何人,不过这里所讲述的是我必须要维护的代码所遵守的风格, +并且我也希望绝大多数其他代码也能遵守这个风格。请在写代码时至少考虑一下本文所述的 +风格。 + +首先,我建议你打印一份GNU代码规范,然后不要读它。烧了它,这是一个具有重大象征性 +意义的动作。 + +不管怎样,现在我们开始: + + + 第一章:缩进 + +制表符是8个字符,所以缩进也是8个字符。有些异端运动试图将缩进变为4(乃至2)个字符 +深,这几乎相当于尝试将圆周率的值定义为3。 + +理由:缩进的全部意义就在于清楚的定义一个控制块起止于何处。尤其是当你盯着你的屏幕 +连续看了20小时之后,你将会发现大一点的缩进会使你更容易分辨缩进。 + +现在,有些人会抱怨8个字符的缩进会使代码向右边移动的太远,在80个字符的终端屏幕上 +就很难读这样的代码。这个问题的答案是,如果你需要3级以上的缩进,不管用何种方式你 +的代码已经有问题了,应该修正你的程序。 + +简而言之,8个字符的缩进可以让代码更容易阅读,还有一个好处是当你的函数嵌套太深的 +时候可以给你警告。留心这个警告。 + 
+在switch语句中消除多级缩进的首选的方式是让“switch”和从属于它的“case”标签对齐于同 +一列,而不要“两次缩进”“case”标签。比如: + + switch (suffix) { + case 'G': + case 'g': + mem <<= 30; + break; + case 'M': + case 'm': + mem <<= 20; + break; + case 'K': + case 'k': + mem <<= 10; + /* fall through */ + default: + break; + } + + +不要把多个语句放在一行里,除非你有什么东西要隐藏: + + if (condition) do_this; + do_something_everytime; + +也不要在一行里放多个赋值语句。内核代码风格超级简单。就是避免可能导致别人误读的表 +达式。 + +除了注释、文档和Kconfig之外,不要使用空格来缩进,前面的例子是例外,是有意为之。 + +选用一个好的编辑器,不要在行尾留空格。 + + + 第二章:把长的行和字符串打散 + +代码风格的意义就在于使用平常使用的工具来维持代码的可读性和可维护性。 + +每一行的长度的限制是80列,我们强烈建议您遵守这个惯例。 + +长于80列的语句要打散成有意义的片段。每个片段要明显短于原来的语句,而且放置的位置 +也明显的靠右。同样的规则也适用于有很长参数列表的函数头。长字符串也要打散成较短的 +字符串。唯一的例外是超过80列可以大幅度提高可读性并且不会隐藏信息的情况。 + +void fun(int a, int b, int c) +{ + if (condition) + printk(KERN_WARNING "Warning this is a long printk with " + "3 parameters a: %u b: %u " + "c: %u \n", a, b, c); + else + next_statement; +} + + 第三章:大括号和空格的放置 + +C语言风格中另外一个常见问题是大括号的放置。和缩进大小不同,选择或弃用某种放置策 +略并没有多少技术上的原因,不过首选的方式,就像Kernighan和Ritchie展示给我们的,是 +把起始大括号放在行尾,而把结束大括号放在行首,所以: + + if (x is true) { + we do y + } + +这适用于所有的非函数语句块(if、switch、for、while、do)。比如: + + switch (action) { + case KOBJ_ADD: + return "add"; + case KOBJ_REMOVE: + return "remove"; + case KOBJ_CHANGE: + return "change"; + default: + return NULL; + } + +不过,有一个例外,那就是函数:函数的起始大括号放置于下一行的开头,所以: + + int function(int x) + { + body of function + } + +全世界的异端可能会抱怨这个不一致性是……呃……不一致的,不过所有思维健全的人都知道( +a)K&R是_正确的_,并且(b)K&R是正确的。此外,不管怎样函数都是特殊的(在C语言中 +,函数是不能嵌套的)。 + +注意结束大括号独自占据一行,除非它后面跟着同一个语句的剩余部分,也就是do语句中的 +“while”或者if语句中的“else”,像这样: + + do { + body of do-loop + } while (condition); + +和 + + if (x == y) { + .. + } else if (x > y) { + ... + } else { + .... 
+ } + +理由:K&R。 + +也请注意这种大括号的放置方式也能使空(或者差不多空的)行的数量最小化,同时不失可 +读性。因此,由于你的屏幕上的新行是不可再生资源(想想25行的终端屏幕),你将会有更 +多的空行来放置注释。 + +当只有一个单独的语句的时候,不用加不必要的大括号。 + +if (condition) + action(); + +这点不适用于本身为某个条件语句的一个分支的单独语句。这时需要在两个分支里都使用大 +括号。 + +if (condition) { + do_this(); + do_that(); +} else { + otherwise(); +} + + 3.1:空格 + +Linux内核的空格使用方式(主要)取决于它是用于函数还是关键字。(大多数)关键字后 +要加一个空格。值得注意的例外是sizeof、typeof、alignof和__attribute__,这些关键字 +某些程度上看起来更像函数(它们在Linux里也常常伴随小括号而使用,尽管在C语言里这样 +的小括号不是必需的,就像“struct fileinfo info”声明过后的“sizeof info”)。 + +所以在这些关键字之后放一个空格: + if, switch, case, for, do, while +但是不要在sizeof、typeof、alignof或者__attribute__这些关键字之后放空格。例如, + s = sizeof(struct file); + +不要在小括号里的表达式两侧加空格。这是一个反例: + + s = sizeof( struct file ); + +当声明指针类型或者返回指针类型的函数时,“*”的首选使用方式是使之靠近变量名或者函 +数名,而不是靠近类型名。例子: + + char *linux_banner; + unsigned long long memparse(char *ptr, char **retptr); + char *match_strdup(substring_t *s); + +在大多数二元和三元操作符两侧使用一个空格,例如下面所有这些操作符: + + = + - < > * / % | & ^ <= >= == != ? : + +但是一元操作符后不要加空格: + & * + - ~ ! 
sizeof typeof alignof __attribute__ defined + +后缀自加和自减一元操作符前不加空格: + ++ -- + +前缀自加和自减一元操作符后不加空格: + ++ -- + +“.”和“->”结构体成员操作符前后不加空格。 + +不要在行尾留空白。有些可以自动缩进的编辑器会在新行的行首加入适量的空白,然后你 +就可以直接在那一行输入代码。不过假如你最后没有在那一行输入代码,有些编辑器就不 +会移除已经加入的空白,就像你故意留下一个只有空白的行。包含行尾空白的行就这样产 +生了。 + +当git发现补丁包含了行尾空白的时候会警告你,并且可以应你的要求去掉行尾空白;不过 +如果你是正在打一系列补丁,这样做会导致后面的补丁失败,因为你改变了补丁的上下文。 + + + 第四章:命名 + +C是一个简朴的语言,你的命名也应该这样。和Modula-2和Pascal程序员不同,C程序员不使 +用类似ThisVariableIsATemporaryCounter这样华丽的名字。C程序员会称那个变量为“tmp” +,这样写起来会更容易,而且至少不会令其难于理解。 + +不过,虽然混用大小写的名字是不提倡使用的,但是全局变量还是需要一个具描述性的名字 +。称一个全局函数为“foo”是一个难以饶恕的错误。 + +全局变量(只有当你真正需要它们的时候再用它)需要有一个具描述性的名字,就像全局函 +数。如果你有一个可以计算活动用户数量的函数,你应该叫它“count_active_users()”或者 +类似的名字,你不应该叫它“cntuser()”。 + +在函数名中包含函数类型(所谓的匈牙利命名法)是脑子出了问题——编译器知道那些类型而 +且能够检查那些类型,这样做只能把程序员弄糊涂了。难怪微软总是制造出有问题的程序。 + +本地变量名应该简短,而且能够表达相关的含义。如果你有一些随机的整数型的循环计数器 +,它应该被称为“i”。叫它“loop_counter”并无益处,如果它没有被误解的可能的话。类似 +的,“tmp”可以用来称呼任意类型的临时变量。 + +如果你怕混淆了你的本地变量名,你就遇到另一个问题了,叫做函数增长荷尔蒙失衡综合症 +。请看第六章(函数)。 + + + 第五章:Typedef + +不要使用类似“vps_t”之类的东西。 + +对结构体和指针使用typedef是一个错误。当你在代码里看到: + + vps_t a; + +这代表什么意思呢? 
+ +相反,如果是这样 + + struct virtual_container *a; + +你就知道“a”是什么了。 + +很多人认为typedef“能提高可读性”。实际不是这样的。它们只在下列情况下有用: + + (a) 完全不透明的对象(这种情况下要主动使用typedef来隐藏这个对象实际上是什么)。 + + 例如:“pte_t”等不透明对象,你只能用合适的访问函数来访问它们。 + + 注意!不透明性和“访问函数”本身是不好的。我们使用pte_t等类型的原因在于真的是 + 完全没有任何共用的可访问信息。 + + (b) 清楚的整数类型,如此,这层抽象就可以帮助消除到底是“int”还是“long”的混淆。 + + u8/u16/u32是完全没有问题的typedef,不过它们更符合类别(d)而不是这里。 + + 再次注意!要这样做,必须事出有因。如果某个变量是“unsigned long“,那么没有必要 + + typedef unsigned long myflags_t; + + 不过如果有一个明确的原因,比如它在某种情况下可能会是一个“unsigned int”而在 + 其他情况下可能为“unsigned long”,那么就不要犹豫,请务必使用typedef。 + + (c) 当你使用sparse按字面的创建一个新类型来做类型检查的时候。 + + (d) 和标准C99类型相同的类型,在某些例外的情况下。 + + 虽然让眼睛和脑筋来适应新的标准类型比如“uint32_t”不需要花很多时间,可是有些 + 人仍然拒绝使用它们。 + + 因此,Linux特有的等同于标准类型的“u8/u16/u32/u64”类型和它们的有符号类型是被 + 允许的——尽管在你自己的新代码中,它们不是强制要求要使用的。 + + 当编辑已经使用了某个类型集的已有代码时,你应该遵循那些代码中已经做出的选择。 + + (e) 可以在用户空间安全使用的类型。 + + 在某些用户空间可见的结构体里,我们不能要求C99类型而且不能用上面提到的“u32” + 类型。因此,我们在与用户空间共享的所有结构体中使用__u32和类似的类型。 + +可能还有其他的情况,不过基本的规则是永远不要使用typedef,除非你可以明确的应用上 +述某个规则中的一个。 + +总的来说,如果一个指针或者一个结构体里的元素可以合理的被直接访问到,那么它们就不 +应该是一个typedef。 + + + 第六章:函数 + +函数应该简短而漂亮,并且只完成一件事情。函数应该可以一屏或者两屏显示完(我们都知 +道ISO/ANSI屏幕大小是80x24),只做一件事情,而且把它做好。 + +一个函数的最大长度是和该函数的复杂度和缩进级数成反比的。所以,如果你有一个理论上 +很简单的只有一个很长(但是简单)的case语句的函数,而且你需要在每个case里做很多很 +小的事情,这样的函数尽管很长,但也是可以的。 + +不过,如果你有一个复杂的函数,而且你怀疑一个天分不是很高的高中一年级学生可能甚至 +搞不清楚这个函数的目的,你应该严格的遵守前面提到的长度限制。使用辅助函数,并为之 +取个具描述性的名字(如果你觉得它们的性能很重要的话,可以让编译器内联它们,这样的 +效果往往会比你写一个复杂函数的效果要好。) + +函数的另外一个衡量标准是本地变量的数量。此数量不应超过5-10个,否则你的函数就有 +问题了。重新考虑一下你的函数,把它分拆成更小的函数。人的大脑一般可以轻松的同时跟 +踪7个不同的事物,如果再增多的话,就会糊涂了。即便你聪颖过人,你也可能会记不清你2 +个星期前做过的事情。 + +在源文件里,使用空行隔开不同的函数。如果该函数需要被导出,它的EXPORT*宏应该紧贴 +在它的结束大括号之下。比如: + +int system_is_up(void) +{ + return system_state == SYSTEM_RUNNING; +} +EXPORT_SYMBOL(system_is_up); + +在函数原型中,包含函数名和它们的数据类型。虽然C语言里没有这样的要求,在Linux里这 +是提倡的做法,因为这样可以很简单的给读者提供更多的有价值的信息。 + + + 第七章:集中的函数退出途径 + +虽然被某些人声称已经过时,但是goto语句的等价物还是经常被编译器所使用,具体形式是 +无条件跳转指令。 + +当一个函数从多个位置退出并且需要做一些通用的清理工作的时候,goto的好处就显现出来 +了。 + +理由是: + +- 无条件语句容易理解和跟踪 +- 嵌套程度减小 +- 可以避免由于修改时忘记更新某个单独的退出点而导致的错误 +- 减轻了编译器的工作,无需删除冗余代码;) 
+ +int fun(int a) +{ + int result = 0; + char *buffer = kmalloc(SIZE); + + if (buffer == NULL) + return -ENOMEM; + + if (condition1) { + while (loop1) { + ... + } + result = 1; + goto out; + } + ... +out: + kfree(buffer); + return result; +} + + 第八章:注释 + +注释是好的,不过有过度注释的危险。永远不要在注释里解释你的代码是如何运作的:更好 +的做法是让别人一看你的代码就可以明白,解释写的很差的代码是浪费时间。 + +一般的,你想要你的注释告诉别人你的代码做了什么,而不是怎么做的。也请你不要把注释 +放在一个函数体内部:如果函数复杂到你需要独立的注释其中的一部分,你很可能需要回到 +第六章看一看。你可以做一些小注释来注明或警告某些很聪明(或者槽糕)的做法,但不要 +加太多。你应该做的,是把注释放在函数的头部,告诉人们它做了什么,也可以加上它做这 +些事情的原因。 + +当注释内核API函数时,请使用kernel-doc格式。请看 +Documentation/kernel-doc-nano-HOWTO.txt和scripts/kernel-doc以获得详细信息。 + +Linux的注释风格是C89“/* ... */”风格。不要使用C99风格“// ...”注释。 + +长(多行)的首选注释风格是: + + /* + * This is the preferred style for multi-line + * comments in the Linux kernel source code. + * Please use it consistently. + * + * Description: A column of asterisks on the left side, + * with beginning and ending almost-blank lines. + */ + +注释数据也是很重要的,不管是基本类型还是衍生类型。为了方便实现这一点,每一行应只 +声明一个数据(不要使用逗号来一次声明多个数据)。这样你就有空间来为每个数据写一段 +小注释来解释它们的用途了。 + + + 第九章:你已经把事情弄糟了 + +这没什么,我们都是这样。可能你的使用了很长时间Unix的朋友已经告诉你“GNU emacs”能 +自动帮你格式化C源代码,而且你也注意到了,确实是这样,不过它所使用的默认值和我们 +想要的相去甚远(实际上,甚至比随机打的还要差——无数个猴子在GNU emacs里打字永远不 +会创造出一个好程序)(译注:请参考Infinite Monkey Theorem) + +所以你要么放弃GNU emacs,要么改变它让它使用更合理的设定。要采用后一个方案,你可 +以把下面这段粘贴到你的.emacs文件里。 + +(defun linux-c-mode () + "C mode with adjusted defaults for use with the Linux kernel." + (interactive) + (c-mode) + (c-set-style "K&R") + (setq tab-width 8) + (setq indent-tabs-mode t) + (setq c-basic-offset 8)) + +这样就定义了M-x linux-c-mode命令。当你hack一个模块的时候,如果你把字符串 +-*- linux-c -*-放在头两行的某个位置,这个模式将会被自动调用。如果你希望在你修改 +/usr/src/linux里的文件时魔术般自动打开linux-c-mode的话,你也可能需要添加 + +(setq auto-mode-alist (cons '("/usr/src/linux.*/.*\\.[ch]$" . 
linux-c-mode) + auto-mode-alist)) + +到你的.emacs文件里。 + +不过就算你尝试让emacs正确的格式化代码失败了,也并不意味着你失去了一切:还可以用“ +indent”。 + +不过,GNU indent也有和GNU emacs一样有问题的设定,所以你需要给它一些命令选项。不 +过,这还不算太糟糕,因为就算是GNU indent的作者也认同K&R的权威性(GNU的人并不是坏 +人,他们只是在这个问题上被严重的误导了),所以你只要给indent指定选项“-kr -i8” +(代表“K&R,8个字符缩进”),或者使用“scripts/Lindent”,这样就可以以最时髦的方式 +缩进源代码。 + +“indent”有很多选项,特别是重新格式化注释的时候,你可能需要看一下它的手册页。不过 +记住:“indent”不能修正坏的编程习惯。 + + + 第十章:Kconfig配置文件 + +对于遍布源码树的所有Kconfig*配置文件来说,它们缩进方式与C代码相比有所不同。紧挨 +在“config”定义下面的行缩进一个制表符,帮助信息则再多缩进2个空格。比如: + +config AUDIT + bool "Auditing support" + depends on NET + help + Enable auditing infrastructure that can be used with another + kernel subsystem, such as SELinux (which requires this for + logging of avc messages output). Does not do system-call + auditing without CONFIG_AUDITSYSCALL. + +仍然被认为不够稳定的功能应该被定义为依赖于“EXPERIMENTAL”: + +config SLUB + depends on EXPERIMENTAL && !ARCH_USES_SLAB_PAGE_STRUCT + bool "SLUB (Unqueued Allocator)" + ... + +而那些危险的功能(比如某些文件系统的写支持)应该在它们的提示字符串里显著的声明这 +一点: + +config ADFS_FS_RW + bool "ADFS write support (DANGEROUS)" + depends on ADFS_FS + ... 
+ +要查看配置文件的完整文档,请看Documentation/kbuild/kconfig-language.txt。 + + + 第十一章:数据结构 + +如果一个数据结构,在创建和销毁它的单线执行环境之外可见,那么它必须要有一个引用计 +数器。内核里没有垃圾收集(并且内核之外的垃圾收集慢且效率低下),这意味着你绝对需 +要记录你对这种数据结构的使用情况。 + +引用计数意味着你能够避免上锁,并且允许多个用户并行访问这个数据结构——而不需要担心 +这个数据结构仅仅因为暂时不被使用就消失了,那些用户可能不过是沉睡了一阵或者做了一 +些其他事情而已。 + +注意上锁不能取代引用计数。上锁是为了保持数据结构的一致性,而引用计数是一个内存管 +理技巧。通常二者都需要,不要把两个搞混了。 + +很多数据结构实际上有2级引用计数,它们通常有不同“类”的用户。子类计数器统计子类用 +户的数量,每当子类计数器减至零时,全局计数器减一。 + +这种“多级引用计数”的例子可以在内存管理(“struct mm_struct”:mm_users和mm_count) +和文件系统(“struct super_block”:s_count和s_active)中找到。 + +记住:如果另一个执行线索可以找到你的数据结构,但是这个数据结构没有引用计数器,这 +里几乎肯定是一个bug。 + + + 第十二章:宏,枚举和RTL + +用于定义常量的宏的名字及枚举里的标签需要大写。 + +#define CONSTANT 0x12345 + +在定义几个相关的常量时,最好用枚举。 + +宏的名字请用大写字母,不过形如函数的宏的名字可以用小写字母。 + +一般的,如果能写成内联函数就不要写成像函数的宏。 + +含有多个语句的宏应该被包含在一个do-while代码块里: + +#define macrofun(a, b, c) \ + do { \ + if (a == 5) \ + do_this(b, c); \ + } while (0) + +使用宏的时候应避免的事情: + +1) 影响控制流程的宏: + +#define FOO(x) \ + do { \ + if (blah(x) < 0) \ + return -EBUGGERED; \ + } while(0) + +非常不好。它看起来像一个函数,不过却能导致“调用”它的函数退出;不要打乱读者大脑里 +的语法分析器。 + +2) 依赖于一个固定名字的本地变量的宏: + +#define FOO(val) bar(index, val) + +可能看起来像是个不错的东西,不过它非常容易把读代码的人搞糊涂,而且容易导致看起来 +不相关的改动带来错误。 + +3) 作为左值的带参数的宏: FOO(x) = y;如果有人把FOO变成一个内联函数的话,这种用 +法就会出错了。 + +4) 忘记了优先级:使用表达式定义常量的宏必须将表达式置于一对小括号之内。带参数的 +宏也要注意此类问题。 + +#define CONSTANT 0x4000 +#define CONSTEXP (CONSTANT | 3) + +cpp手册对宏的讲解很详细。Gcc internals手册也详细讲解了RTL(译注:register +transfer language),内核里的汇编语言经常用到它。 + + + 第十三章:打印内核消息 + +内核开发者应该是受过良好教育的。请一定注意内核信息的拼写,以给人以好的印象。不要 +用不规范的单词比如“dont”,而要用“do not”或者“don't”。保证这些信息简单、明了、无 +歧义。 + +内核信息不必以句号(译注:英文句号,即点)结束。 + +在小括号里打印数字(%d)没有任何价值,应该避免这样做。 + +<linux/device.h>里有一些驱动模型诊断宏,你应该使用它们,以确保信息对应于正确的 +设备和驱动,并且被标记了正确的消息级别。这些宏有:dev_err(), dev_warn(), +dev_info()等等。对于那些不和某个特定设备相关连的信息,<linux/kernel.h>定义了 +pr_debug()和pr_info()。 + +写出好的调试信息可以是一个很大的挑战;当你写出来之后,这些信息在远程除错的时候 +就会成为极大的帮助。当DEBUG符号没有被定义的时候,这些信息不应该被编译进内核里 +(也就是说,默认地,它们不应该被包含在内)。如果你使用dev_dbg()或者pr_debug(), +就能自动达到这个效果。很多子系统拥有Kconfig选项来启用-DDEBUG。还有一个相关的惯例 
+是使用VERBOSE_DEBUG来添加dev_vdbg()消息到那些已经由DEBUG启用的消息之上。 + + + 第十四章:分配内存 + +内核提供了下面的一般用途的内存分配函数:kmalloc(),kzalloc(),kcalloc()和 +vmalloc()。请参考API文档以获取有关它们的详细信息。 + +传递结构体大小的首选形式是这样的: + + p = kmalloc(sizeof(*p), ...); + +另外一种传递方式中,sizeof的操作数是结构体的名字,这样会降低可读性,并且可能会引 +入bug。有可能指针变量类型被改变时,而对应的传递给内存分配函数的sizeof的结果不变。 + +强制转换一个void指针返回值是多余的。C语言本身保证了从void指针到其他任何指针类型 +的转换是没有问题的。 + + + 第十五章:内联弊病 + +有一个常见的误解是内联函数是gcc提供的可以让代码运行更快的一个选项。虽然使用内联 +函数有时候是恰当的(比如作为一种替代宏的方式,请看第十二章),不过很多情况下不是 +这样。inline关键字的过度使用会使内核变大,从而使整个系统运行速度变慢。因为大内核 +会占用更多的指令高速缓存(译注:一级缓存通常是指令缓存和数据缓存分开的)而且会导 +致pagecache的可用内存减少。想象一下,一次pagecache未命中就会导致一次磁盘寻址,将 +耗时5毫秒。5毫秒的时间内CPU能执行很多很多指令。 + +一个基本的原则是如果一个函数有3行以上,就不要把它变成内联函数。这个原则的一个例 +外是,如果你知道某个参数是一个编译时常量,而且因为这个常量你确定编译器在编译时能 +优化掉你的函数的大部分代码,那仍然可以给它加上inline关键字。kmalloc()内联函数就 +是一个很好的例子。 + +人们经常主张给static的而且只用了一次的函数加上inline,如此不会有任何损失,因为没 +有什么好权衡的。虽然从技术上说这是正确的,但是实际上这种情况下即使不加inline gcc +也可以自动使其内联。而且其他用户可能会要求移除inline,由此而来的争论会抵消inline +自身的潜在价值,得不偿失。 + + + 第十六章:函数返回值及命名 + +函数可以返回很多种不同类型的值,最常见的一种是表明函数执行成功或者失败的值。这样 +的一个值可以表示为一个错误代码整数(-Exxx=失败,0=成功)或者一个“成功”布尔值( +0=失败,非0=成功)。 + +混合使用这两种表达方式是难于发现的bug的来源。如果C语言本身严格区分整形和布尔型变 +量,那么编译器就能够帮我们发现这些错误……不过C语言不区分。为了避免产生这种bug,请 +遵循下面的惯例: + + 如果函数的名字是一个动作或者强制性的命令,那么这个函数应该返回错误代码整 + 数。如果是一个判断,那么函数应该返回一个“成功”布尔值。 + +比如,“add work”是一个命令,所以add_work()函数在成功时返回0,在失败时返回-EBUSY。 +类似的,因为“PCI device present”是一个判断,所以pci_dev_present()函数在成功找到 +一个匹配的设备时应该返回1,如果找不到时应该返回0。 + +所有导出(译注:EXPORT)的函数都必须遵守这个惯例,所有的公共函数也都应该如此。私 +有(static)函数不需要如此,但是我们也推荐这样做。 + +返回值是实际计算结果而不是计算是否成功的标志的函数不受此惯例的限制。一般的,他们 +通过返回一些正常值范围之外的结果来表示出错。典型的例子是返回指针的函数,他们使用 +NULL或者ERR_PTR机制来报告错误。 + + + 第十七章:不要重新发明内核宏 + +头文件include/linux/kernel.h包含了一些宏,你应该使用它们,而不要自己写一些它们的 +变种。比如,如果你需要计算一个数组的长度,使用这个宏 + + #define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0])) + +类似的,如果你要计算某结构体成员的大小,使用 + + #define FIELD_SIZEOF(t, f) (sizeof(((t*)0)->f)) + +还有可以做严格的类型检查的min()和max()宏,如果你需要可以使用它们。你可以自己看看 +那个头文件里还定义了什么你可以拿来用的东西,如果有定义的话,你就不应在你的代码里 +自己重新定义。 + + + 第十八章:编辑器模式行和其他需要罗嗦的事情 + +有一些编辑器可以解释嵌入在源文件里的由一些特殊标记标明的配置信息。比如,emacs +能够解释被标记成这样的行: + +-*- 
mode: c -*- + +或者这样的: + +/* +Local Variables: +compile-command: "gcc -DMAGIC_DEBUG_FLAG foo.c" +End: +*/ + +Vim能够解释这样的标记: + +/* vim:set sw=8 noet */ + +不要在源代码中包含任何这样的内容。每个人都有他自己的编辑器配置,你的源文件不应 +该覆盖别人的配置。这包括有关缩进和模式配置的标记。人们可以使用他们自己定制的模 +式,或者使用其他可以产生正确的缩进的巧妙方法。 + + + + 附录 I:参考 + +The C Programming Language, 第二版, 作者Brian W. Kernighan和Denni +M. Ritchie. Prentice Hall, Inc., 1988. ISBN 0-13-110362-8 (软皮), +0-13-110370-9 (硬皮). URL: http://cm.bell-labs.com/cm/cs/cbook/ + +The Practice of Programming 作者Brian W. Kernighan和Rob Pike. Addison-Wesley, +Inc., 1999. ISBN 0-201-61586-X. URL: http://cm.bell-labs.com/cm/cs/tpop/ + +cpp,gcc,gcc internals和indent的GNU手册——和K&R及本文相符合的部分,全部可以在 +http://www.gnu.org/manual/找到 + +WG14是C语言的国际标准化工作组,URL: http://www.open-std.org/JTC1/SC22/WG14/ + +Kernel CodingStyle,作者greg@kroah.com发表于OLS 2002: +http://www.kroah.com/linux/talks/ols_2002_kernel_codingstyle_talk/html/ + +-- +最后更新于2007年7月13日。 diff --git a/Documentation/zh_CN/HOWTO b/Documentation/zh_CN/HOWTO index 48fc67bfbe3d..3d80e8af36ec 100644 --- a/Documentation/zh_CN/HOWTO +++ b/Documentation/zh_CN/HOWTO @@ -1,10 +1,10 @@ Chinese translated version of Documentation/HOWTO If you have any comment or update to the content, please contact the -original document maintainer directly. However, if you have problem +original document maintainer directly. However, if you have a problem communicating in English you can also ask the Chinese maintainer for -help. Contact the Chinese maintainer, if this translation is outdated -or there is problem with translation. +help. Contact the Chinese maintainer if this translation is outdated +or if there is a problem with the translation. 
Maintainer: Greg Kroah-Hartman <greg@kroah.com> Chinese maintainer: Li Yang <leoli@freescale.com> @@ -85,7 +85,7 @@ Linux内核源代码都是在GPL(通用公共许可证)的保护下发布的 Linux内核代码中包含有大量的文档。这些文档对于学习如何与内核社区互动有着 不可估量的价值。当一个新的功能被加入内核,最好把解释如何使用这个功能的文 档也放进内核。当内核的改动导致面向用户空间的接口发生变化时,最好将相关信 -息或手册页(manpages)的补丁发到mtk-manpages@gmx.net,以向手册页(manpages) +息或手册页(manpages)的补丁发到mtk.manpages@gmail.com,以向手册页(manpages) 的维护者解释这些变化。 以下是内核代码中需要阅读的文档: @@ -218,6 +218,8 @@ kernel.org网站的pub/linux/kernel/v2.6/目录下找到它。它的开发遵循 时,一个新的-rc版本就会被发布。计划是每周都发布新的-rc版本。 - 这个过程一直持续下去直到内核被认为达到足够稳定的状态,持续时间大概是 6个星期。 + - 以下地址跟踪了在每个-rc发布中发现的退步列表: + http://kernelnewbies.org/known_regressions 关于内核发布,值得一提的是Andrew Morton在linux-kernel邮件列表中如是说: “没有人知道新内核何时会被发布,因为发布是根据已知bug的情况来决定 diff --git a/Documentation/zh_CN/SubmittingDrivers b/Documentation/zh_CN/SubmittingDrivers new file mode 100644 index 000000000000..5f4815c63ec7 --- /dev/null +++ b/Documentation/zh_CN/SubmittingDrivers @@ -0,0 +1,168 @@ +Chinese translated version of Documentation/SubmittingDrivers + +If you have any comment or update to the content, please contact the +original document maintainer directly. However, if you have a problem +communicating in English you can also ask the Chinese maintainer for +help. Contact the Chinese maintainer if this translation is outdated +or if there is a problem with the translation. 
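+
+Chinese maintainer: Li Yang <leo@zh-kernel.org>
+---------------------------------------------------------------------
+Chinese translation of Documentation/SubmittingDrivers
+
+If you want to comment on or update this document, please contact the
+maintainer of the original document directly. If you have difficulty
+communicating in English, you can also ask the Chinese maintainer for
+help. If this translation is out of date or has problems, please
+contact the Chinese maintainer.
+
+Chinese maintainer: 李阳 Li Yang <leo@zh-kernel.org>
+Chinese translator: 李阳 Li Yang <leo@zh-kernel.org>
+Chinese reviewers:  陈琦 Maggie Chen <chenqi@beyondsoft.com>
+                    王聪 Wang Cong <xiyou.wangcong@gmail.com>
+                    张巍 Zhang Wei <Wei.Zhang@freescale.com>
+
+The text follows
+---------------------------------------------------------------------
+
+Submitting Drivers For The Linux Kernel
+---------------------------------------
+
+This document explains how to submit device drivers to the various
+kernel trees. Note that if you are interested in video card drivers
+you should probably talk to the XFree86 project
+(http://www.xfree86.org/) and/or the X.org project (http://x.org).
+
+Also read the Documentation/SubmittingPatches document.
+
+
+Allocating Device Numbers
+-------------------------
+
+Major and minor numbers for block and character devices are allocated
+by the Linux assigned name and number authority, LANANA (currently
+Torben Mathiasen). Requests are made at http://www.lanana.org/. Device
+drivers that are not going to be submitted to the mainstream kernel
+also need their numbers allocated here. See Documentation/devices.txt
+for details.
+
+If you do not use assigned numbers, then when your driver is submitted
+it will be given a newly assigned number, even if that number differs
+from the one you previously shipped to customers.
+
+Who To Submit Drivers To
+------------------------
+
+Linux 2.0:
+	No new drivers are accepted for this kernel tree.
+
+Linux 2.2:
+	No new drivers are accepted for this kernel tree.
+
+Linux 2.4:
+	If the code area has a general maintainer listed in the kernel
+	MAINTAINERS file, submit the driver to that person. If the
+	maintainer does not respond or you cannot find the appropriate
+	maintainer, please contact Willy Tarreau <w@1wt.eu>.
+
+Linux 2.6:
+	The same rules apply as for 2.4, except that you should also
+	follow linux-kernel to track API changes. The top-level contact
+	point for Linux 2.6 driver submissions is Andrew Morton
+	<akpm@osdl.org>.
+
+What Criteria Determine Acceptance
+----------------------------------
+
+Licensing:	The code must be released to Linux under the GNU
+		General Public License (GPL), but we do not require
+		that the GPL be the only license. You may wish to
+		release under multiple licenses if you want the driver
+		to be usable by other open source communities (such as
+		BSD). See include/linux/module.h for the license
+		combinations that are accepted.
+
+Copyright:	The copyright owner must agree to use of the GPL.
+		It is best if the submitter and the copyright owner
+		are the same person or entity. If not, the name of the
+		person or entity authorizing use of the GPL must be
+		listed, in case verification is needed.
+
+Interfaces:	Your driver is more likely to be accepted if it uses
+		existing interfaces and behaves like other drivers of
+		the same class, rather than inventing gratuitous new
+		interfaces. If you need a common driver interface for
+		Linux and NT, implement it in user space.
+
+Code:		Please use the Linux coding style described in
+		Documentation/CodingStyle. If some parts of your code
+		(for example, code shared with a Windows driver
+		package) need another format, and you want to maintain
+		only one copy, please keep them clearly separated and
+		comment on the reason.
+
+Portability:	Pointers are not always 32 bits, not all computers are
+		little endian, not everyone has a floating point unit,
+		and you should not casually embed x86 assembly in your
+		driver. Drivers that run only on x86 are generally not
+		popular. You may only have x86 hardware, which makes
+		it hard to test the driver on other platforms, but it
+		is easy to make sure the code can be ported without
+		trouble.
+
+Clarity:	It helps if anyone can see how to fix the driver,
+		because then you receive patches instead of bug
+		reports. If you submit a driver that tries to hide how
+		the hardware works, it will go in the bit bucket.
+
+PM support:	Since Linux is used on many portable and desktop
+		systems, your driver is likely to be used on such a
+		system. It should therefore support basic power
+		management by implementing, where needed, the .suspend
+		and .resume methods used for system-wide suspend and
+		resume. You should verify that your driver handles
+		suspend and resume correctly; if you really cannot
+		confirm that, please at least define the .suspend
+		method to return the -ENOSYS ("Function not
+		implemented") error. You should also try to make sure
+		your driver uses as little power as possible when it
+		is idle. For driver testing instructions see
+		Documentation/power/drivers-testing.txt, and for a
+		relatively complete overview of driver power
+		management issues see Documentation/power/devices.txt.
+
+Control:	If the author of a driver is still actively
+		maintaining it, then patches are normally forwarded to
+		the author unless they are obviously correct and need
+		no checking. If you want to be the contact and update
+		point for the driver, it is a good idea to state this
+		in the code comments and to add an entry for the
+		driver to the MAINTAINERS file.
+
+What Criteria Do Not Determine Acceptance
+-----------------------------------------
+
+Vendor:		Having the hardware vendor maintain the driver is
+		often a good thing. However, if the tree already
+		contains a stable working driver from someone else, do
+		not expect "we are the vendor" to be a reason for the
+		kernel to switch to yours. Ideally the vendor works
+		with the existing driver author to build a single
+		perfect driver.
+
+Author:		Whether a large Linux company or you personally wrote
+		the driver has no bearing on whether it is accepted.
+		Nobody has special access to the kernel tree. Once you
+		know the kernel community well enough you will find
+		this to be true.
+
+
+Resources
+---------
+
+Linux kernel master tree:
+	ftp.??.kernel.org:/pub/linux/kernel/...
+	?? == your country code, such as "cn", "us", "uk", "fr", etc.
+
+Linux kernel mailing list:
+	linux-kernel@vger.kernel.org
+	[mail majordomo@vger.kernel.org to subscribe]
+
+Linux Device Drivers, Third Edition (covers 2.6.10):
+	http://lwn.net/Kernel/LDD3/  (free version)
+
+LWN.net:
+	Weekly summary of kernel development activity - http://lwn.net/
+	2.6 API changes:
+		http://lwn.net/Articles/2.6-kernel-api/
+	Porting drivers from prior kernels to 2.6:
+		http://lwn.net/Articles/driver-porting/
+
+KernelTrap:
+	Latest Linux kernel news and developer interviews
+	http://kerneltrap.org/
+
+KernelNewbies:
+	Documentation and assistance for new kernel programmers
+	http://kernelnewbies.org/
+
+Linux USB project:
+	http://www.linux-usb.org/
+
+How to NOT write kernel driver (by Arjan van de Ven):
+	http://www.fenrus.org/how-to-not-write-a-device-driver-paper.pdf
+
+Kernel Janitor:
+	http://janitor.kernelnewbies.org/
diff --git a/Documentation/zh_CN/SubmittingPatches b/Documentation/zh_CN/SubmittingPatches
new file mode 100644
index 000000000000..985c92e20b73
--- /dev/null
+++ b/Documentation/zh_CN/SubmittingPatches
@@ -0,0 +1,416 @@
+Chinese translated version of Documentation/SubmittingPatches
+
+If you have any comment or update to the content, please contact the
+original document maintainer directly. However, if you have a problem
+communicating in English you can also ask the Chinese maintainer for
+help. Contact the Chinese maintainer if this translation is outdated
+or if there is a problem with the translation.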
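+
+Chinese maintainer: TripleX Chung <triplex@zh-kernel.org>
+---------------------------------------------------------------------
+Chinese translation of Documentation/SubmittingPatches
+
+If you want to comment on or update this document, please contact the
+maintainer of the original document directly. If you have difficulty
+communicating in English, you can also ask the Chinese maintainer for
+help. If this translation is out of date or has problems, please
+contact the Chinese maintainer.
+
+Chinese maintainer: 钟宇 TripleX Chung <triplex@zh-kernel.org>
+Chinese translator: 钟宇 TripleX Chung <triplex@zh-kernel.org>
+Chinese reviewers:  李阳 Li Yang <leo@zh-kernel.org>
+                    王聪 Wang Cong <xiyou.wangcong@gmail.com>
+
+The text follows
+---------------------------------------------------------------------
+
+	How to Get Your Change Into the Linux Kernel
+		or
+	Care And Operation Of Your Linus Torvalds
+----------------------------------
+
+For a person or company who wishes to submit a change to the Linux
+kernel, the process can be daunting if you are not familiar with "the
+system". This text collects suggestions which can greatly increase the
+chances of your change being accepted.
+Read Documentation/SubmitChecklist for a list of items to check before
+submitting code. If you are submitting a driver, also read
+Documentation/SubmittingDrivers.
+
+
+--------------------------------------------
+SECTION 1 - CREATING AND SENDING YOUR CHANGE
+--------------------------------------------
+
+1) "diff -up"
+-----------
+
+Use "diff -up" or "diff -uprN" to create patches.
+
+All changes to the Linux kernel occur in the form of patches, as
+generated by diff(1). When creating your patch, make sure to create it
+in "unified diff" format, as supplied by the '-u' argument to diff(1).
+Also, please use the '-p' argument, which shows which C function each
+change is in; that makes the resulting patch much easier to read.
+Patches should be based in the root kernel source directory, not in
+any lower subdirectory.
+To create a patch for a single file, it is often sufficient to do:
+
+	SRCTREE= linux-2.6
+	MYFILE=  drivers/net/mydriver.c
+
+	cd $SRCTREE
+	cp $MYFILE $MYFILE.orig
+	vi $MYFILE	# make your change
+	cd ..
+	diff -up $SRCTREE/$MYFILE{.orig,} > /tmp/patch
+
+To create a patch for multiple files, you should unpack an unmodified
+kernel source tree and generate a diff against your own source tree.
+For example:
+
+	MYSRC= /devel/linux-2.6
+
+	tar xvfz linux-2.6.12.tar.gz
+	mv linux-2.6.12 linux-2.6.12-vanilla
+	diff -uprN -X linux-2.6.12-vanilla/Documentation/dontdiff \
+		linux-2.6.12-vanilla $MYSRC > /tmp/patch
+
+"dontdiff" is a list of files which are generated by the kernel during
+the build process and which are skipped in any diff(1)-generated
+patch. The "dontdiff" file is included in the kernel tree in 2.6.12
+and later. For earlier kernel versions, you can get it from
+<http://www.xenotime.net/linux/doc/dontdiff>.
+Make sure your patch does not include any extra files which do not
+belong in this patch submission. Remember to review your patch after
+generating it with diff(1), to ensure accuracy.
+If your changes are scattered, you should look into splitting them
+into individual patches which modify things in logical stages. This
+makes review easier for other kernel developers, which is very
+important if you want your patch accepted. The following scripts can
+help with this:
+Quilt:
+http://savannah.nongnu.org/projects/quilt
+
+Andrew Morton's patch scripts:
+http://www.zip.com.au/~akpm/linux/patches/
+Instead of these scripts, quilt is the recommended patch management
+tool (see the link above).
+
+2) Describe your changes.
+Describe the technical detail of the change(s) your patch includes.
+
+Be as specific as possible. The WORST descriptions possible include
+things like "update driver X", "bug fix for driver X", or "this patch
+includes updates for subsystem X. Please apply."
+
+If your description starts to get long, that is a sign that you
+probably need to split up your patch. See #3, next.
+
+3) Separate your changes.
+
+Separate your changes so that each logical change goes into a single
+patch file.
+
+For example, if your changes include both bug fixes and performance
+enhancements, separate those changes into two or more patches. If your
+changes include an API update and driver changes that adapt to the new
+API, separate those into two patches.
+
+On the other hand, if you have spread a single change over several
+patch files, merge them into a single patch. Thus a single logical
+change is contained within a single patch.
+
+If one patch depends on another patch in order for a change to be
+complete, that is OK. Simply note "this patch depends on patch X" in
+your patch description.
+
+If you cannot condense your patch set into a smaller set of patches,
+then only post about 15 at a time and wait for review and integration.
+
+4) Select e-mail destination.
+
+Look through the MAINTAINERS file and the source code, and determine
+whether the kernel subsystem your change applies to has an assigned
+maintainer. If so, e-mail that person.
+
+If no maintainer is listed, or the maintainer does not respond, send
+your patch to the primary Linux kernel developer's mailing list,
+linux-kernel@vger.kernel.org. Most kernel developers monitor this
+e-mail list and can comment on your changes.
+
+Do not send more than 15 patches at once to the vger mailing lists!!!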
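+
+Linus Torvalds is the final arbiter of all changes accepted into the
+Linux kernel. His e-mail address is <torvalds@linux-foundation.org>.
+He gets a lot of e-mail, so typically you should do your best to avoid
+sending him e-mail.
+
+Patches which are bug fixes, are "obvious" changes, or similarly
+require little discussion can be sent or CC'd to Linus. Patches which
+require discussion or do not have a clear advantage should usually be
+sent first to linux-kernel. Only after the patch has been discussed
+should it be submitted to Linus.
+
+5) Select your CC (e-mail carbon copy) list.
+
+Unless you have a reason NOT to do so, CC linux-kernel@vger.kernel.org.
+
+Other kernel developers besides Linus need to be aware of your change,
+so that they may comment on it and offer code review and suggestions.
+linux-kernel is the primary Linux kernel developer mailing list. Other
+mailing lists are available for specific subsystems, such as USB,
+framebuffer devices, the VFS, the SCSI subsystem, etc. See the
+MAINTAINERS file for a mailing list that relates to your change.
+
+Majordomo lists of VGER.KERNEL.ORG at:
+	<http://vger.kernel.org/vger-lists.html>
+
+If changes affect userland-kernel interfaces, please send the
+MAN-PAGES maintainer (as listed in the MAINTAINERS file) a man-pages
+patch, or at least a notification of the change, so that some
+information makes its way into the manual pages.
+
+Even if the maintainer did not respond in step #4, make sure to ALWAYS
+copy the maintainer when you change their code.
+
+For small patches you may want to CC the Trivial Patch Monkey,
+trivial@kernel.org, managed by Adrian Bunk, which collects "trivial"
+patches. Patches of the following kinds are considered trivial:
+ Spelling fixes in documentation
+ Spelling fixes which could break grep(1)
+ Warning fixes (cluttering with useless warnings is bad)
+ Compilation fixes (only if the code is actually correct and merely
+  fails to build)
+ Runtime fixes (only if they actually fix things)
+ Removing use of deprecated functions/macros (eg. check_region)
+ Contact detail and documentation fixes
+ Non-portable code replaced by portable code (even in arch-specific
+  code, since people copy, as long as it's trivial)
+ Any fix by the author/maintainer of the file (ie. patch monkey in
+  re-transmission mode)
+
+URL: <http://www.kernel.org/pub/linux/kernel/people/bunk/trivial/>
+
+(Translator's note on "trivial patches": this part of the original is
+rather terse, so a translator's note is added as an exception. The
+word "trivial" normally means "petty, unimportant", but here the
+meaning is slightly different. For example, fixes for obvious NULL
+pointer problems are runtime fixes and count as trivial patches. A
+NULL pointer fix is important, but such fixes are usually small and
+easy to verify, so they are classed as trivial too. A more precise
+characterization of a trivial patch is "simple, localized & easy to
+verify". The purpose of the trivial@kernel.org list is to give
+submitters of such patches a central place to send them, lowering the
+barrier to submission.)
+
+6) No MIME, no links, no compression, no attachments. Just plain text.
+
+Linus and other kernel developers need to be able to read and comment
+on the changes you are submitting. It is important for a kernel
+developer to be able to "quote" your changes, using standard e-mail
+tools, so that they may comment on specific portions of your code.
+
+For this reason, all patches should be submitted "inline" in e-mail.
+WARNING: if you cut-and-paste your patch, be wary of your editor's
+word-wrap corrupting it.
+
+Do not attach the patch as a MIME attachment, compressed or not. Many
+popular e-mail applications will not always transmit a MIME attachment
+as plain text, making it impossible to comment on your code. A MIME
+attachment also takes Linus a bit more time to process, decreasing the
+likelihood of your change being accepted.
+
+WARNING: Some mailers like Mozilla send your messages with
+---- message header ----
+Content-Type: text/plain; charset=us-ascii; format=flowed
+---- message header ----
+The problem is that "format=flowed" makes some of the mailers on the
+receiving side replace TABs with spaces and make similar changes.
+Thus a patch that looked fine when you sent it arrives corrupted.
+
+To fix this, make your mozilla defaults/pref/mailnews.js file look
+like:
+pref("mailnews.send_plaintext_flowed", false); // RFC 2646
+pref("mailnews.display.disable_format_flowed_support", true);
+
+7) E-mail size.
+
+When sending patches to Linus, always follow step #6.
+
+Large changes are not appropriate for mailing lists, or for some
+maintainers. If your patch, uncompressed, exceeds 40 kB in size, it is
+preferred that you store your patch on an Internet-accessible server
+and provide a URL pointing to your patch instead.
+
+8) Name your kernel version.
+
+It is important to note, either in the subject line or in the patch
+description, the kernel version to which the patch applies.
+
+If the patch does not apply cleanly to the latest kernel version,
+Linus will not apply it.
+
+9) Don't get discouraged. Re-submit.
+
+After you have submitted your change, be patient and wait. If Linus
+likes your change and applies it, it will appear in the next version
+of the kernel that he releases.
+
+However, if your change does not appear in the next version of the
+kernel, there could be any number of reasons. It is YOUR job to narrow
+down those reasons, correct what was wrong, and submit your updated
+change.
+
+It is quite common for Linus to "drop" your patch without comment.
+That is the nature of the system. If he drops your patch, it could be
+due to:
+* Your patch did not apply cleanly to the latest kernel version.
+* Your patch was not sufficiently discussed on linux-kernel.
+* A style issue (see section 2).
+* An e-mail formatting issue (re-read this section).
+* A technical problem with your change.
+* He gets tons of e-mail, and yours got lost in the shuffle.
+* You are being annoying.
+
+When in doubt, solicit comments on the linux-kernel mailing list.
+
+10) Include PATCH in the subject
+
+Due to high e-mail traffic to Linus and to linux-kernel, it is a
+common convention to prefix your subject line with [PATCH]. This lets
+Linus and other kernel developers easily distinguish patches from
+other e-mail discussions.
+
+11) Sign your work
+
+To improve tracking of who did what, especially with patches that
+pass through several layers of maintainers, we recommend a "sign-off"
+procedure on patches that are being e-mailed around.
+
+The sign-off is a simple line at the end of the explanation for the
+patch, which certifies that you wrote it or otherwise have the right
+to pass it on as an open-source patch. The rules are simple: if you
+can certify the below:
+	Developer's Certificate of Origin 1.1
+	By making a contribution to this project, I certify that:
+	(a) The contribution was created in whole or in part by me and
+	    I have the right to submit it under the open source
+	    license indicated in the file; or
+	(b) The contribution is based upon previous work that, to the
+	    best of my knowledge, is covered under an appropriate open
+	    source license and I have the right under that license to
+	    submit that work with modifications, whether created in
+	    whole or in part by me, under the same open source license
+	    (unless I am permitted to submit under a different
+	    license), as indicated in the file; or
+	(c) The contribution was provided directly to me by some other
+	    person who certified (a), (b) or (c) and I have not
+	    modified it.
+	(d) I understand and agree that this project and the
+	    contribution are public and that a record of the
+	    contribution (including all personal information I submit
+	    with it, including my sign-off) is maintained indefinitely
+	    and may be redistributed consistent with this project or
+	    the open source license(s) involved.
+	then you just add a line saying
+	Signed-off-by: Random J Developer <random@developer.example.org>
+
+using your real name (sorry, no pseudonyms or anonymous
+contributions.)
+
+Some people also put extra tags at the end. They will just be ignored
+for now, but you can do this to mark internal company procedures or
+just to point out some special detail about the sign-off.
+
+12) The canonical patch format
+
+The canonical patch subject line is:
+	Subject: [PATCH 001/123] subsystem: summary phrase
+
+The canonical patch message body contains the following:
+
+  - A "from" line specifying the patch author.
+
+  - An empty line.
+
+  - The body of the explanation, which will be copied to the permanent
+    changelog to describe this patch.
+
+  - A marker line containing simply "---".
+
+  - Any additional comments not suitable for the changelog.
+
+  - The actual patch (diff output).
+
+The Subject line format makes it very easy to sort the e-mails
+alphabetically by subject line (pretty much any e-mail reader supports
+that): because the sequence number is zero-padded, the numerical and
+alphabetic sort orders are the same.
+
+The "subsystem" in the e-mail's Subject identifies which kernel
+subsystem is being patched.
+
+The "summary phrase" in the e-mail's Subject concisely describes the
+patch which that e-mail contains. The "summary phrase" should not be a
+filename. Do not use the same "summary phrase" for every patch in a
+patch series (where a "patch series" is a sequence of multiple,
+related patches).
+
+Bear in mind that the "summary phrase" of your e-mail becomes a
+globally-unique identifier for that patch. It propagates all the way
+into the git changelog. The "summary phrase" may later be used in
+developer discussions which refer to the patch. People will want to
+google for the "summary phrase" to find discussion regarding that
+patch.
+
+A couple of example Subjects:
+
+	Subject: [patch 2/5] ext2: improve scalability of bitmap searching
+	Subject: [PATCHv2 001/207] x86: fix eflags tracking
+
+The "from" line must be the very first line in the message body, and
+has the form:
+	From: Original Author <author@example.com>
+
+The "from" line specifies who will be credited as the author of the
+patch in the permanent changelog. If the "from" line is missing, then
+the "From:" line from the e-mail header will be used to determine the
+patch author in the changelog.
+
+The explanation body will be committed to the permanent source
+changelog, so it should make sense to a competent reader who has long
+since forgotten the details of the discussion around this patch.
+
+The "---" marker line serves the essential purpose of telling patch
+handling tools where the changelog message ends.
+
+One good use for the additional comments after the "---" marker is a
+diffstat, to show which files have changed and the number of inserted
+and deleted lines per file. A diffstat is especially useful on bigger
+patches. Other comments relevant only to the moment or the maintainer,
+not suitable for the permanent changelog, should also go here.
+Use diffstat options "-p 1 -w 70" so that filenames are listed from
+the top of the kernel source tree and do not use too much horizontal
+space (easily fitting in 80 columns, maybe with some indentation).
+
+See more details on the proper patch format in the references below.
+
+-----------------------------------
+SECTION 2 - HINTS, TIPS, AND TRICKS
+-----------------------------------
+
+This section lists many of the common "rules" associated with code
+submitted to the kernel. There are always exceptions... but you must
+have a really good reason for doing so. You could probably call this
+section Linus' Computer Science 101.
+
+1) Read Documentation/CodingStyle
+
+Nuff said. If your code deviates too much from this, it is likely to
+be rejected without further review, and without comment.
+
+2) #ifdefs are ugly
+Code cluttered with ifdefs is difficult to read and maintain. Don't do
+it. Instead, put your ifdefs in a header, and conditionally define
+'static inline' functions, or macros, which are used in the code. Let
+the compiler optimize away the "no-op" case.
+
+Simple example, of poor code:
+
+	dev = alloc_etherdev (sizeof(struct funky_private));
+	if (!dev)
+		return -ENODEV;
+	#ifdef CONFIG_NET_FUNKINESS
+	init_funky_net(dev);
+	#endif
+
+Cleaned-up example:
+
+(in header)
+	#ifndef CONFIG_NET_FUNKINESS
+	static inline void init_funky_net (struct net_device *d) {}
+	#endif
+
+(in the code itself)
+	dev = alloc_etherdev (sizeof(struct funky_private));
+	if (!dev)
+		return -ENODEV;
+	init_funky_net(dev);
+
+3) 'static inline' is better than a macro
+
+Static inline functions are greatly preferred over macros. They
+provide type safety, have no length limitations, no formatting
+limitations, and under gcc they are as cheap as macros.
+
+Macros should only be used where a static inline is clearly suboptimal
+[there are a few, isolated cases of this in fast paths], or where it
+is impossible to use a static inline function [such as string-izing].
+'static inline' is preferred over 'static __inline__', 'extern inline',
+and 'extern __inline__'.
+
+4) Don't over-design.
+
+Don't try to anticipate nebulous future cases which may or may not be
+useful: "Make it as simple as you can, and no simpler."
+
+----------------------
+SECTION 3 - REFERENCES
+----------------------
+
+Andrew Morton, "The perfect patch" (tpp).
+  <http://www.zip.com.au/~akpm/linux/patches/stuff/tpp.txt>
+
+Jeff Garzik, "Linux kernel patch submission format".
+  <http://linux.yyz.us/patch-format.html>
+
+Greg Kroah-Hartman, "How to piss off a kernel subsystem maintainer".
+  <http://www.kroah.com/log/2005/03/31/>
+  <http://www.kroah.com/log/2005/07/08/>
+  <http://www.kroah.com/log/2005/10/19/>
+  <http://www.kroah.com/log/2006/01/11/>
+
+NO!!!! No more huge patch bombs to linux-kernel@vger.kernel.org people!
+  <http://marc.theaimsgroup.com/?l=linux-kernel&m=112112749912944&w=2>
+
+Kernel Documentation/CodingStyle:
+  <http://sosdg.org/~coywolf/lxr/source/Documentation/CodingStyle>
+
+Linus Torvalds's mail on the canonical patch format:
+  <http://lkml.org/lkml/2005/4/7/183>
+--
diff --git a/Documentation/zh_CN/oops-tracing.txt b/Documentation/zh_CN/oops-tracing.txt
new file mode 100644
index 000000000000..9312608ffb8d
--- /dev/null
+++ b/Documentation/zh_CN/oops-tracing.txt
@@ -0,0 +1,212 @@
+Chinese translated version of Documentation/oops-tracing.txt
+
+If you have any comment or update to the content, please contact the
+original document maintainer directly. However, if you have a problem
+communicating in English you can also ask the Chinese maintainer for
+help. Contact the Chinese maintainer if this translation is outdated
+or if there is a problem with the translation.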
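+
+Chinese maintainer: Dave Young <hidave.darkstar@gmail.com>
+---------------------------------------------------------------------
+Chinese translation of Documentation/oops-tracing.txt
+
+If you want to comment on or update this document, please contact the
+maintainer of the original document directly. If you have difficulty
+communicating in English, you can also ask the Chinese maintainer for
+help. If this translation is out of date or has problems, please
+contact the Chinese maintainer.
+
+Chinese maintainer: 杨瑞 Dave Young <hidave.darkstar@gmail.com>
+Chinese translator: 杨瑞 Dave Young <hidave.darkstar@gmail.com>
+Chinese reviewers:  李阳 Li Yang <leo@zh-kernel.org>
+                    王聪 Wang Cong <xiyou.wangcong@gmail.com>
+
+The text follows
+---------------------------------------------------------------------
+
+NOTE: ksymoops is useless on 2.6. Please use the Oops in its original
+format (from dmesg, etc). Ignore any references in this or other docs
+to "decoding the Oops" or "running it through ksymoops". If you post
+an Oops from 2.6 that has been run through ksymoops, people will just
+tell you to repost it.
+
+Quick Summary
+-------------
+
+Find the Oops and send it to the maintainer of the kernel area that
+seems to be involved. Don't worry too much about getting the wrong
+person. If you are unsure, send it to the person responsible for the
+code relevant to what you were doing. If the problem is reproducible,
+try to describe how to reproduce it. That is worth even more than the
+oops.
+
+If you have no idea whom to send the report to, send it to
+linux-kernel@vger.kernel.org. Thanks for your help in making Linux as
+stable as possible.
+
+Where is the Oops?
+----------------------
+
+Normally the Oops text is read from the kernel buffers by klogd and
+handed to syslogd, which writes it to a syslog file, typically
+/var/log/messages (depends on /etc/syslog.conf). Sometimes klogd dies,
+in which case you can run dmesg > file to read the data from the
+kernel buffers and save it. Or you can cat /proc/kmsg > file, but you
+have to break in to stop the transfer, because kmsg is a "never ending
+file". If the machine has crashed so badly that you cannot enter
+commands or the disk is not available, then you have three options :-
+
+(1) Hand copy the text from the screen and type it in after the
+machine has restarted. Messy, but it is the only option if you have
+not planned for a crash. Alternatively, you can take a picture of the
+screen with a digital camera - not nice, but better than nothing. If
+the messages scroll off the top of the console, you may find that
+booting with a higher resolution (eg, vga=791) will allow you to read
+more of the text. (Caveat: this needs vesafb, so it won't help for
+'early' oopses)
+
+(2) Boot with a serial console (see
+Documentation/serial-console.txt), run a null modem to a second
+machine and capture the output there using your favourite
+communication program. Minicom works well.
+
+(3) Use Kdump (see Documentation/kdump/kdump.txt) and extract the
+kernel ring buffer from old memory using the dmesg gdb macro defined
+in Documentation/kdump/gdbmacros.txt.
+
+Full Information
+----------------
+
+NOTE: the message from Linus below applies to 2.4 kernels. I have
+preserved it for historical reasons, and because some of the
+information in it still applies. In particular, please ignore any
+references to ksymoops.
+
+From: Linus Torvalds <torvalds@osdl.org>
+
+How to track down an Oops.. [originally a mail to linux-kernel]
+
+The main trick is having 5 years of experience with those pesky oops
+messages ;-)
+
+Actually, there are things you can do that make this easier. I have
+two separate approaches:
+
+	gdb /usr/src/linux/vmlinux
+	gdb> disassemble <offending_function>
+
+That's the easy way to find the problem, at least if the bug-report is
+well made (like this one was - run through ksymoops to get the
+function the oops happened in and the offset within that function).
+
+Oh, it helps if the report happens on a kernel that is compiled with
+the same compiler and a similar configuration.
+
+The other thing to do is disassemble the "Code:" part of the bug
+report: ksymoops will do this too with the correct tools, but if you
+don't have the tools you can just write a silly program:
+
+	char str[] = "\xXX\xXX\xXX...";
+	main(){}
+
+and compile it with gcc -g and then do "disassemble str" (where the
+"XX" stuff are the values reported by the Oops - you can just
+cut-and-paste and replace the spaces with "\x" - that's what I do, as
+I'm too lazy to write a program to automate this all).
+
+Alternatively, you can use the shell script in scripts/decodecode.
+Its usage is:
+decodecode < oops.txt
+
+The hex bytes that follow "Code:" may (on some architectures) include
+a series of bytes that precede the current instruction as well as the
+bytes at and following the current instruction.
+
+Code: f9 0f 8d f9 00 00 00 8d 42 0c e8 dd 26 11 c7 a1 60 ea 2b f9 8b 50 08 a1
+64 ea 2b f9 8d 34 82 8b 1e 85 db 74 6d 8b 15 60 ea 2b f9 <8b> 43 04 39 42 54
+7e 04 40 89 42 54 8b 43 04 3b 05 00 f6 52 c0
+
+Finally, if you want to see where the code comes from, you can do:
+
+	cd /usr/src/linux
+	make fs/buffer.s	# or whatever file the bug happened in
+
+and then you get a better idea of what happens than with the gdb
+disassembly.
+
+Now, the trick is just to combine all the data you have: the C
+sources (and general knowledge of what they _should_ do), the assembly
+listing and the code disassembly (and additionally the register dump
+you get from the "oops" message - that is useful for seeing what the
+corrupted pointers were, and when you have the assembler listing you
+can also match the other registers to whatever C expressions they
+were used for).
+
+Essentially, you just look at what doesn't match (in this case it was
+the "Code" disassembly that didn't match what the compiler generated).
+Then you need to find out why they don't match. Often it's simple -
+you see that the code uses a NULL pointer, and then you look at the
+code and wonder how the NULL pointer got there, and whether it is a
+valid thing to check against..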
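
The cut-and-paste step for the "Code:" bytes described in the mail above can also be done by a small program. The following is only a minimal sketch, not part of any kernel tool: `parse_oops_code` is a hypothetical helper name, and it assumes that the byte wrapped in `<..>` marks the current instruction, as in the `<8b>` of the example dump.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Parse the hex bytes that follow "Code:" in an oops report.
 * Hypothetical helper for illustration only.
 *
 * Bytes are stored in out[] (at most max of them). The index of the
 * byte wrapped in <..> is stored in *fault_idx, or -1 if the dump
 * contains no marker. Returns the number of bytes parsed.
 */
static size_t parse_oops_code(const char *s, unsigned char *out, size_t max,
			      int *fault_idx)
{
	size_t n = 0;

	*fault_idx = -1;
	while (*s && n < max) {
		char *end;
		unsigned long v;

		/* Skip separators and the closing half of the marker. */
		if (*s == ' ' || *s == '\t' || *s == '\n' || *s == '>') {
			s++;
			continue;
		}
		/* The next byte parsed is the marked one. */
		if (*s == '<') {
			*fault_idx = (int)n;
			s++;
			continue;
		}
		v = strtoul(s, &end, 16);
		if (end == s || v > 0xff)
			break;	/* not a hex byte: stop parsing */
		out[n++] = (unsigned char)v;
		s = end;
	}
	return n;
}
```

The resulting byte array can then be written out in whatever form the disassembly step needs, for example as the `"\xXX"` string of the silly program, while `fault_idx` records where the marked byte sits in the dump.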
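+
+Now, if somebody gets the idea that this is time-consuming and
+requires some small amount of concentration, you're right. Which is
+why I will mostly just ignore any panic reports that don't have the
+symbol table info looked up: it simply gets too hard to look it up (I
+have some programs to search for specific patterns in the kernel code
+segment, and sometimes I have been able to look up those kinds of
+panics too, but it takes pretty good knowledge of the kernel just to
+be able to pick out the right sequences)
+
+_Sometimes_ it happens that I just see the disassembled code sequence
+from the panic, and I know immediately where it's coming from. That's
+when I get worried that I've been doing this for too long ;-)
+
+		Linus
+
+
+---------------------------------------------------------------------------
+Notes on Oops tracing:
+
+In order to help Linus and the other kernel developers, substantial
+support for processing protection faults has been incorporated into
+klogd. In order to have full support for address resolution, at least
+version 1.3-pl3 of the sysklogd package should be used.
+
+When a protection fault occurs the klogd daemon automatically
+translates important addresses in the kernel log messages to their
+symbolic equivalents.
+
+klogd performs two types of address resolution. The first is static
+translation and the second is dynamic translation. Static translation
+uses the System.map file in the same manner as ksymoops. In order to
+do static translation the klogd daemon must be able to find a system
+map file at initialization time. See the klogd man page for
+information on how klogd searches for map files.
+
+Dynamic address translation is important when kernel loadable modules
+are being used. Since memory for kernel modules is allocated from the
+kernel's dynamic memory pools, there are no fixed locations for either
+the start of the module or for functions and symbols in the module.
+
+The kernel supports system calls which allow a program to determine
+which modules are loaded and their location in memory. Using these
+system calls the klogd daemon builds a symbol table which can be used
+to debug a protection fault which occurs in a loadable kernel module.
+
+At the very minimum klogd will provide the name of the module which
+generated the protection fault. Additional symbolic information may be
+available if the developer of the loadable module chose to export
+symbol information from the module.
+
+Since the kernel module environment can be dynamic, there must be a
+mechanism for notifying the klogd daemon when the module environment
+changes. There are command line options which allow klogd to signal
+the currently executing daemon that symbol information should be
+refreshed. See the klogd manual page for more information.
+
+A patch is included with the sysklogd distribution which modifies the
+modules-2.0.0 package to automatically signal klogd whenever a module
+is loaded or unloaded. Applying this patch provides essentially
+seamless support for debugging protection faults which occur in
+kernel loadable modules.
+
+The following is an example of a protection fault in a loadable module
+processed by klogd:
+---------------------------------------------------------------------------
+Aug 29 09:51:01 blizard kernel: Unable to handle kernel paging request at virtual address f15e97cc
+Aug 29 09:51:01 blizard kernel: current->tss.cr3 = 0062d000, %cr3 = 0062d000
+Aug 29 09:51:01 blizard kernel: *pde = 00000000
+Aug 29 09:51:01 blizard kernel: Oops: 0002
+Aug 29 09:51:01 blizard kernel: CPU: 0
+Aug 29 09:51:01 blizard kernel: EIP: 0010:[oops:_oops+16/3868]
+Aug 29 09:51:01 blizard kernel: EFLAGS: 00010212
+Aug 29 09:51:01 blizard kernel: eax: 315e97cc ebx: 003a6f80 ecx: 001be77b edx: 00237c0c
+Aug 29 09:51:01 blizard kernel: esi: 00000000 edi: bffffdb3 ebp: 00589f90 esp: 00589f8c
+Aug 29 09:51:01 blizard kernel: ds: 0018 es: 0018 fs: 002b gs: 002b ss: 0018
+Aug 29 09:51:01 blizard kernel: Process oops_test (pid: 3374, process nr: 21, stackpage=00589000)
+Aug 29 09:51:01 blizard kernel: Stack: 315e97cc 00589f98 0100b0b4 bffffed4 0012e38e 00240c64 003a6f80 00000001
+Aug 29 09:51:01 blizard kernel: 00000000 00237810 bfffff00 0010a7fa 00000003 00000001 00000000 bfffff00
+Aug 29 09:51:01 blizard kernel: bffffdb3 bffffed4 ffffffda 0000002b 0007002b 0000002b 0000002b 00000036
+Aug 29 09:51:01 blizard kernel: Call Trace: [oops:_oops_ioctl+48/80] [_sys_ioctl+254/272] [_system_call+82/128]
+Aug 29 09:51:01 blizard kernel: Code: c7 00 05 00 00 00 eb 08 90 90 90 90 90 90 90 90 89 ec 5d c3
+---------------------------------------------------------------------------
+
+Dr. G.W. Wettstein Oncology Research Div. Computing Facility
+Roger Maris Cancer Center INTERNET: greg@wind.rmcc.com
+820 4th St. N.
+Fargo, ND 58122
+Phone: 701-234-7556
+
+
+---------------------------------------------------------------------------
+Tainted kernels:
+
+Some oops reports contain the string 'Tainted: ' after the program
+counter. This indicates that the kernel has been tainted by some
+mechanism. The string is followed by a series of position-sensitive
+characters, each representing a particular tainted value.
+
+  1: 'G' if all modules loaded have a GPL or compatible license, 'P'
+     if any proprietary module has been loaded. Modules without a
+     MODULE_LICENSE or with a MODULE_LICENSE that is not recognised by
+     insmod as GPL compatible are assumed to be proprietary.
+
+  2: 'F' if any module was force loaded by "insmod -f", ' ' if all
+     modules were loaded normally.
+
+  3: 'S' if the oops occurred on an SMP kernel running on hardware
+     that has not been certified as safe to run multiprocessor.
+     Currently this occurs only on various Athlons that are not SMP
+     capable.
+
+  4: 'R' if a module was force unloaded by "rmmod -f", ' ' if all
+     modules were unloaded normally.
+
+  5: 'M' if any processor has reported a Machine Check Exception,
+     ' ' if no Machine Check Exceptions have occurred.
+
+  6: 'B' if a page-release function has found a bad page reference or
+     some unexpected page flags.
+
+  7: 'U' if a user or user application specifically requested that the
+     Tainted flag be set, ' ' otherwise.
+
+  8: 'D' if the kernel has died recently, i.e. there was an OOPS or
+     BUG.
+
+The primary reason for the 'Tainted: ' string is to tell kernel
+debuggers whether this is a clean kernel or whether anything unusual
+has occurred. Tainting is permanent: even if an offending module is
+unloaded, the tainted value remains to indicate that the kernel is no
+longer trustworthy.
diff --git a/Documentation/zh_CN/sparse.txt b/Documentation/zh_CN/sparse.txt
new file mode 100644
index 000000000000..75992a603ae3
--- /dev/null
+++ b/Documentation/zh_CN/sparse.txt
@@ -0,0 +1,100 @@
+Chinese translated version of Documentation/sparse.txt
+
+If you have any comment or update to the content, please contact the
+original document maintainer directly. However, if you have a problem
+communicating in English you can also ask the Chinese maintainer for
+help.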
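+Contact the Chinese maintainer if this translation is outdated
+or if there is a problem with the translation.
+
+Chinese maintainer: Li Yang <leo@zh-kernel.org>
+---------------------------------------------------------------------
+Chinese translation of Documentation/sparse.txt
+
+If you want to comment on or update this document, please contact the
+maintainer of the original document directly. If you have difficulty
+communicating in English, you can also ask the Chinese maintainer for
+help. If this translation is out of date or has problems, please
+contact the Chinese maintainer.
+
+Chinese maintainer: 李阳 Li Yang <leo@zh-kernel.org>
+Chinese translator: 李阳 Li Yang <leo@zh-kernel.org>
+
+
+The text follows
+---------------------------------------------------------------------
+
+Copyright 2004 Linus Torvalds
+Copyright 2004 Pavel Machek <pavel@suse.cz>
+Copyright 2006 Bob Copeland <me@bobcopeland.com>
+
+Using sparse for typechecking
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+"__bitwise" is a type attribute, so you have to use it like this:
+
+	typedef int __bitwise pm_request_t;
+
+	enum pm_request {
+		PM_SUSPEND = (__force pm_request_t) 1,
+		PM_RESUME = (__force pm_request_t) 2
+	};
+
+which makes PM_SUSPEND and PM_RESUME "bitwise" integers (the "__force"
+is there because sparse complains about casts that change a bitwise
+type, but in this case we really do want to force the conversion). And
+because the enum values all have the same type, "enum pm_request" now
+uses that type as its underlying type too.
+
+And with gcc, all the __bitwise/__force stuff goes away, and it all
+ends up looking just like plain integers to gcc.
+
+Quite frankly, you don't need the enum there. The above all really
+just boils down to one special "int __bitwise" type.
+
+So the simpler way is to just do:
+
+	typedef int __bitwise pm_request_t;
+
+	#define PM_SUSPEND ((__force pm_request_t) 1)
+	#define PM_RESUME ((__force pm_request_t) 2)
+
+and you now have all the infrastructure needed for strict
+typechecking.
+
+One small note: the constant integer "0" is special. You can use a
+constant zero as a bitwise integer without sparse ever complaining.
+This is because "bitwise" (as the name implies) was designed for
+making sure that bitwise types don't get mixed up (little-endian vs
+big-endian vs cpu-endian vs whatever), and there the constant "0"
+really is special.
+
+Getting sparse
+~~~~~~~~~~~~~~
+
+You can get the latest released version from the Sparse homepage:
+
+	http://www.kernel.org/pub/linux/kernel/people/josh/sparse/
+
+Alternatively, you can use git to clone the latest development version
+of sparse:
+
+	git://git.kernel.org/pub/scm/linux/kernel/git/josh/sparse.git
+
+DaveJ has hourly generated tarballs of the git tree available at:
+
+	http://www.codemonkey.org.uk/projects/git-snapshots/sparse/
+
+Once you have the source, just run, as a regular user:
+
+	make
+	make install
+
+and it will be installed in your ~/bin directory.
+
+Using sparse
+~~~~~~~~~~~~
+
+Do a kernel make with "make C=1" to run sparse on all the C files
+that get recompiled, or use "make C=2" to run sparse on the files
+whether they need to be recompiled or not. The latter is a fast way to
+check the whole tree if you have already built it.
+
+The optional make variable CHECKFLAGS can be used to pass arguments
+to sparse. The build system passes -Wbitwise to sparse automatically.
+To perform endianness checks, you may define __CHECK_ENDIAN__:
+
+	make C=2 CHECKFLAGS="-D__CHECK_ENDIAN__"
+
+These checks are disabled by default, as they generate a host of
+warnings.
diff --git a/Documentation/zh_CN/stable_kernel_rules.txt b/Documentation/zh_CN/stable_kernel_rules.txt
new file mode 100644
index 000000000000..b5b9b0ab02fd
--- /dev/null
+++ b/Documentation/zh_CN/stable_kernel_rules.txt
@@ -0,0 +1,66 @@
+Chinese translated version of Documentation/stable_kernel_rules.txt
+
+If you have any comment or update to the content, please contact the
+original document maintainer directly. However, if you have a problem
+communicating in English you can also ask the Chinese maintainer for
+help. Contact the Chinese maintainer if this translation is outdated
+or if there is a problem with the translation.
+
+Chinese maintainer: TripleX Chung <triplex@zh-kernel.org>
+---------------------------------------------------------------------
+Chinese translation of Documentation/stable_kernel_rules.txt
+
+If you want to comment on or update this document, please contact the
+maintainer of the original document directly. If you have difficulty
+communicating in English, you can also ask the Chinese maintainer for
+help. If this translation is out of date or has problems, please
+contact the Chinese maintainer.
+
+
+Chinese maintainer: 钟宇 TripleX Chung <triplex@zh-kernel.org>
+Chinese translator: 钟宇 TripleX Chung <triplex@zh-kernel.org>
+Chinese reviewers:  李阳 Li Yang <leo@zh-kernel.org>
+                    Kangkai Yin <e12051@motorola.com>
+
+The text follows
+---------------------------------------------------------------------
+
+Everything you ever wanted to know about Linux 2.6 -stable releases.
+
+Rules on what kind of patches are accepted, and which ones are not,
+into the "-stable" tree:
+
+ - It must be obviously correct and tested.
+ - It cannot be bigger than 100 lines, with context.
+ - It must fix only one thing.
+ - It must fix a real bug that bothers people (not a "This could be a
+   problem..." type thing).
+ - It must fix a problem that causes a build error (but not for things
+   marked CONFIG_BROKEN), an oops, a hang, data corruption, a real
+   security issue, or some "oh, that's not good" issue. In short,
+   something critical.
+ - No "theoretical race condition" issues, unless an explanation of
+   how the race can be exploited is also provided.
+ - It cannot contain any "trivial" fixes (spelling changes, whitespace
+   cleanups, etc).
+ - It must be accepted by the relevant subsystem maintainer.
+ - It must follow the Documentation/SubmittingPatches rules.
+
+Procedure for submitting patches to the -stable tree:
+
+ - Send the patch, after verifying that it follows the above rules, to
+   stable@kernel.org.
+ - The sender will receive an ACK when the patch has been accepted
+   into the queue, or a NAK if the patch is rejected. This response
+   might take a few days, according to the developers' schedules.
+ - If accepted, the patch will be added to the -stable queue, for
+   review by other developers.
+ - Security patches should not be sent to this alias, but instead to
+   security@kernel.org.
+
+Review cycle:
+
+ - When the -stable maintainers decide to start a review cycle, the
+   patches will be sent to the review committee and to the maintainer
+   of the affected area of the patch (unless the submitter is the
+   maintainer of the area), and CC'd to the linux-kernel mailing list.
+ - The review committee has 48 hours in which to ACK or NAK the patch.
+ - If the patch is rejected by a member of the committee, or
+   linux-kernel members object to the patch, bringing up issues that
+   the maintainers and committee members did not realize, the patch
+   will be dropped from the queue.
+ - At the end of the review cycle, the ACKed patches will be added to
+   the latest -stable release, and a new -stable release will happen.
+ - Security patches will be accepted into the -stable tree directly
+   from the kernel security team, and will not go through the normal
+   review cycle. Contact the kernel security team for more details on
+   this procedure.
+
+Review committee:
+ - This is made up of a number of kernel developers who have
+   volunteered for this task, and a few that haven't.
diff --git a/Documentation/zh_CN/volatile-considered-harmful.txt b/Documentation/zh_CN/volatile-considered-harmful.txt
new file mode 100644
index 000000000000..ba8149d2233a
--- /dev/null
+++ b/Documentation/zh_CN/volatile-considered-harmful.txt
@@ -0,0 +1,113 @@
+Chinese translated version of Documentation/volatile-considered-harmful.txt
+
+If you have any comment or update to the content, please contact the
+original document maintainer directly. However, if you have a problem
+communicating in English you can also ask the Chinese maintainer for
+help. Contact the Chinese maintainer if this translation is outdated
+or if there is a problem with the translation.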
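+
+Maintainer: Jonathan Corbet <corbet@lwn.net>
+Chinese maintainer: Bryan Wu <bryan.wu@analog.com>
+---------------------------------------------------------------------
+Chinese translation of Documentation/volatile-considered-harmful.txt
+
+If you want to comment on or update this document, please contact the
+maintainer of the original document directly. If you have difficulty
+communicating in English, you can also ask the Chinese maintainer for
+help. If this translation is out of date or has problems, please
+contact the Chinese maintainer.
+
+English maintainer: Jonathan Corbet <corbet@lwn.net>
+Chinese maintainer: 伍鹏 Bryan Wu <bryan.wu@analog.com>
+Chinese translator: 伍鹏 Bryan Wu <bryan.wu@analog.com>
+Chinese reviewers:  张汉辉 Eugene Teo <eugeneteo@kernel.sg>
+                    杨瑞 Dave Young <hidave.darkstar@gmail.com>
+The text follows
+---------------------------------------------------------------------
+
+Why the "volatile" type class should not be used
+------------------------------------------------
+
+C programmers have often taken volatile to mean that a variable could
+be changed outside of the current thread of execution; as a result,
+they are sometimes tempted to use it in kernel code when shared data
+structures are being used. In other words, they have been known to
+treat volatile types as a sort of easy atomic variable, which they are
+not. The use of volatile in kernel code is almost never correct; this
+document describes why.
+
+The key point to understand about volatile is that its purpose is to
+suppress optimization, which is almost never what one really wants to
+do. In the kernel, one must protect shared data structures against
+unwanted concurrent access, which is a very different task. The
+measures that protect against unwanted concurrency also avoid almost
+all optimization-related problems, and do so more efficiently.
+
+Like volatile, the kernel primitives which make concurrent access to
+data safe (spinlocks, mutexes, memory barriers, etc.) are designed to
+prevent unwanted optimization. If they are being used properly, there
+is no need to use volatile as well. If volatile is still necessary,
+there is almost certainly a bug in the code somewhere. In properly
+written kernel code, volatile can only serve to slow things down.
+
+Consider a typical block of kernel code:
+
+	spin_lock(&the_lock);
+	do_something_on(&shared_data);
+	do_something_else_with(&shared_data);
+	spin_unlock(&the_lock);
+
+If all the code follows the locking rules, the value of shared_data
+cannot change unexpectedly while the_lock is held. Any other code
+which might want to access that data will be waiting on the lock. The
+spinlock primitives act as memory barriers - they are explicitly
+written to do so - meaning that data accesses will not be optimized
+across them. So the compiler might think it knows what will be in
+shared_data, but the spin_lock() call, since it acts as a memory
+barrier, will force it to forget anything it knows. There will be no
+optimization problems with accesses to that data.
+
+If shared_data were declared volatile, the locking would still be
+necessary. But the compiler would also be prevented from optimizing
+access to shared_data within the critical section, when we know that
+nobody else can be working with it. While the lock is held,
+shared_data is not volatile. When dealing with shared data, proper
+locking makes volatile unnecessary - and potentially harmful.
+
+The volatile storage class was originally meant for memory-mapped I/O
+registers. Within the kernel, register accesses, too, should be
+protected by locks, but one also does not want the compiler
+"optimizing" register accesses within a critical section. Within the
+kernel, I/O memory accesses are done through accessor functions;
+accessing I/O memory directly through pointers is frowned upon and
+does not work on all architectures. Those accessors are written to
+prevent unwanted optimization, so, once again, volatile is
+unnecessary.
+
+Another situation where one might be tempted to use volatile is when
+the processor is busy-waiting on the value of a variable. The right
+way to perform a busy wait is:
+
+	while (my_variable != what_i_want)
+		cpu_relax();
+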
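+The cpu_relax() call can lower CPU power consumption or yield to a
+hyperthreaded twin processor; it also happens to serve as a memory
+barrier, so, once again, volatile is unnecessary. Of course,
+busy-waiting is generally an anti-social act to begin with.
+
+There are still a few rare situations where volatile makes sense in
+the kernel:
+
+  - The above-mentioned accessor functions might use volatile on
+    architectures where direct I/O memory access does work.
+    Essentially, each accessor call becomes a little critical section
+    on its own and ensures that the access happens as expected by the
+    programmer.
+
+  - Inline assembly code which changes memory, but which has no other
+    visible side effects, risks being deleted by GCC. Adding the
+    volatile keyword to asm statements will prevent this removal.
+
+  - The jiffies variable is special in that it can have a different
+    value every time it is referenced, but it can be read without any
+    special locking. So jiffies can be volatile, but the addition of
+    other variables of this type is frowned upon. Jiffies is
+    considered to be a "stupid legacy" issue (Linus's words) in this
+    regard; fixing it would be more trouble than it is worth.
+
+  - Pointers to data structures in coherent memory which might be
+    modified by I/O devices can, sometimes, legitimately be volatile.
+    A ring buffer used by a network adapter, where that adapter
+    changes pointers to indicate which descriptors have been
+    processed, is an example of this type of situation.
+
+For most code, none of the above justifications for volatile apply.
+As a result, the use of volatile is likely to be seen as a bug and
+will bring additional scrutiny to the code. Developers who are tempted
+to use volatile should take a step back and think about what they are
+truly trying to accomplish.
+
+Patches to remove volatile variables are generally welcome - as long
+as they come with a justification which shows that the concurrency
+issues have been properly thought through.
+
+NOTES
+-----
+
+[1] http://lwn.net/Articles/233481/
+[2] http://lwn.net/Articles/233482/
+
+CREDITS
+-------
+
+Original impetus and research by Randy Dunlap
+Written by Jonathan Corbet
+Improvements via comments from Satyam Sharma, Johannes Stezenbach,
+Jesper Juhl, Heikki Orsila, H. Peter Anvin, Philipp Hahn, and Stefan
+Richter.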
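
The point made in the volatile document about accessor functions can be illustrated outside the kernel. What follows is a minimal userspace sketch under stated assumptions: `reg_read32`, `reg_write32`, `struct fake_dev` and `fake_dev_start` are hypothetical names invented for this example, and plain memory stands in for a device register block. It is not the kernel's accessor implementation; it only shows the volatile qualifier being confined to the accessor, so that the data itself is never declared volatile.

```c
#include <stdint.h>

/*
 * Userspace stand-ins for I/O accessor functions (hypothetical names).
 * The volatile cast lives inside the accessor, so callers never
 * declare their data volatile.
 */
static inline uint32_t reg_read32(const void *addr)
{
	return *(const volatile uint32_t *)addr;
}

static inline void reg_write32(void *addr, uint32_t val)
{
	*(volatile uint32_t *)addr = val;
}

/* A fake "device": ordinary memory standing in for registers. */
struct fake_dev {
	uint32_t ctrl;
	uint32_t status;
};

#define FAKE_CTRL_START 0x1u

/* Set the start bit and report whether it reads back as set. */
static int fake_dev_start(struct fake_dev *dev)
{
	reg_write32(&dev->ctrl, reg_read32(&dev->ctrl) | FAKE_CTRL_START);
	return (reg_read32(&dev->ctrl) & FAKE_CTRL_START) != 0;
}
```

In real driver code the accesses would additionally be serialized by a lock, as the document describes; the sketch leaves locking out to stay self-contained.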