Diffstat (limited to 'Documentation')
36 files changed, 814 insertions, 535 deletions
diff --git a/Documentation/BUG-HUNTING b/Documentation/BUG-HUNTING index 65b97e1dbf70..35f5bd243336 100644 --- a/Documentation/BUG-HUNTING +++ b/Documentation/BUG-HUNTING @@ -191,6 +191,30 @@ e.g. crash dump output as shown by Dave Miller. > mov 0x8(%ebp), %ebx ! %ebx = skb->sk > mov 0x13c(%ebx), %eax ! %eax = inet_sk(sk)->opt +In addition, you can use GDB to figure out the exact file and line +number of the OOPS from the vmlinux file. If you have +CONFIG_DEBUG_INFO enabled, you can simply copy the EIP value from the +OOPS: + + EIP: 0060:[<c021e50e>] Not tainted VLI + +And use GDB to translate that to human-readable form: + + gdb vmlinux + (gdb) l *0xc021e50e + +If you don't have CONFIG_DEBUG_INFO enabled, you use the function +offset from the OOPS: + + EIP is at vt_ioctl+0xda8/0x1482 + +And recompile the kernel with CONFIG_DEBUG_INFO enabled: + + make vmlinux + gdb vmlinux + (gdb) p vt_ioctl + (gdb) l *(0x<address of vt_ioctl> + 0xda8) + Another very useful option of the Kernel Hacking section in menuconfig is Debug memory allocations. This will help you see whether data has been initialised and not set before use etc. To see the values that get assigned diff --git a/Documentation/CodingStyle b/Documentation/CodingStyle index afc286775891..b49b92edb396 100644 --- a/Documentation/CodingStyle +++ b/Documentation/CodingStyle @@ -495,29 +495,40 @@ re-formatting you may want to take a look at the man page. But remember: "indent" is not a fix for bad programming. - Chapter 10: Configuration-files + Chapter 10: Kconfig configuration files -For configuration options (arch/xxx/Kconfig, and all the Kconfig files), -somewhat different indentation is used. +For all of the Kconfig* configuration files throughout the source tree, +the indentation is somewhat different. Lines under a "config" definition +are indented with one tab, while help text is indented an additional two +spaces. Example: -Help text is indented with 2 spaces. 
- -if CONFIG_EXPERIMENTAL - tristate CONFIG_BOOM - default n - help - Apply nitroglycerine inside the keyboard (DANGEROUS) - bool CONFIG_CHEER - depends on CONFIG_BOOM - default y +config AUDIT + bool "Auditing support" + depends on NET help - Output nice messages when you explode -endif + Enable auditing infrastructure that can be used with another + kernel subsystem, such as SELinux (which requires this for + logging of avc messages output). Does not do system-call + auditing without CONFIG_AUDITSYSCALL. + +Features that might still be considered unstable should be defined as +dependent on "EXPERIMENTAL": + +config SLUB + depends on EXPERIMENTAL && !ARCH_USES_SLAB_PAGE_STRUCT + bool "SLUB (Unqueued Allocator)" + ... + +while seriously dangerous features (such as write support for certain +filesystems) should advertise this prominently in their prompt string: + +config ADFS_FS_RW + bool "ADFS write support (DANGEROUS)" + depends on ADFS_FS + ... -Generally, CONFIG_EXPERIMENTAL should surround all options not considered -stable. All options that are known to trash data (experimental write- -support for file-systems, for instance) should be denoted (DANGEROUS), other -experimental options should be denoted (EXPERIMENTAL). +For full documentation on the configuration files, see the file +Documentation/kbuild/kconfig-language.txt. 
Chapter 11: Data structures diff --git a/Documentation/DocBook/gadget.tmpl b/Documentation/DocBook/gadget.tmpl index e7fc96433408..6996d977bf8f 100644 --- a/Documentation/DocBook/gadget.tmpl +++ b/Documentation/DocBook/gadget.tmpl @@ -52,7 +52,7 @@ <toc></toc> -<chapter><title>Introduction</title> +<chapter id="intro"><title>Introduction</title> <para>This document presents a Linux-USB "Gadget" kernel mode diff --git a/Documentation/DocBook/usb.tmpl b/Documentation/DocBook/usb.tmpl index a2ebd651b05a..af293606fbe3 100644 --- a/Documentation/DocBook/usb.tmpl +++ b/Documentation/DocBook/usb.tmpl @@ -185,7 +185,7 @@ </chapter> -<chapter><title>USB-Standard Types</title> +<chapter id="types"><title>USB-Standard Types</title> <para>In <filename><linux/usb/ch9.h></filename> you will find the USB data types defined in chapter 9 of the USB specification. @@ -197,7 +197,7 @@ </chapter> -<chapter><title>Host-Side Data Types and Macros</title> +<chapter id="hostside"><title>Host-Side Data Types and Macros</title> <para>The host side API exposes several layers to drivers, some of which are more necessary than others. @@ -211,7 +211,7 @@ </chapter> - <chapter><title>USB Core APIs</title> + <chapter id="usbcore"><title>USB Core APIs</title> <para>There are two basic I/O models in the USB API. The most elemental one is asynchronous: drivers submit requests @@ -248,7 +248,7 @@ !Edrivers/usb/core/hub.c </chapter> - <chapter><title>Host Controller APIs</title> + <chapter id="hcd"><title>Host Controller APIs</title> <para>These APIs are only for use by host controller drivers, most of which implement standard register interfaces such as @@ -285,7 +285,7 @@ !Idrivers/usb/core/buffer.c </chapter> - <chapter> + <chapter id="usbfs"> <title>The USB Filesystem (usbfs)</title> <para>This chapter presents the Linux <emphasis>usbfs</emphasis>. @@ -317,7 +317,7 @@ not it has a kernel driver. 
</para> - <sect1> + <sect1 id="usbfs-files"> <title>What files are in "usbfs"?</title> <para>Conventionally mounted at @@ -356,7 +356,7 @@ </sect1> - <sect1> + <sect1 id="usbfs-fstab"> <title>Mounting and Access Control</title> <para>There are a number of mount options for usbfs, which will @@ -439,7 +439,7 @@ </sect1> - <sect1> + <sect1 id="usbfs-devices"> <title>/proc/bus/usb/devices</title> <para>This file is handy for status viewing tools in user @@ -473,7 +473,7 @@ for (;;) { </para> </sect1> - <sect1> + <sect1 id="usbfs-bbbddd"> <title>/proc/bus/usb/BBB/DDD</title> <para>Use these files in one of these basic ways: @@ -510,7 +510,7 @@ for (;;) { </sect1> - <sect1> + <sect1 id="usbfs-lifecycle"> <title>Life Cycle of User Mode Drivers</title> <para>Such a driver first needs to find a device file @@ -565,7 +565,7 @@ for (;;) { </sect1> - <sect1><title>The ioctl() Requests</title> + <sect1 id="usbfs-ioctl"><title>The ioctl() Requests</title> <para>To use these ioctls, you need to include the following headers in your userspace program: @@ -604,7 +604,7 @@ for (;;) { </para> - <sect2> + <sect2 id="usbfs-mgmt"> <title>Management/Status Requests</title> <para>A number of usbfs requests don't deal very directly @@ -736,7 +736,7 @@ usbdev_ioctl (int fd, int ifno, unsigned request, void *param) </sect2> - <sect2> + <sect2 id="usbfs-sync"> <title>Synchronous I/O Support</title> <para>Synchronous requests involve the kernel blocking @@ -865,7 +865,7 @@ usbdev_ioctl (int fd, int ifno, unsigned request, void *param) </variablelist> </sect2> - <sect2> + <sect2 id="usbfs-async"> <title>Asynchronous I/O Support</title> <para>As mentioned above, there are situations where it may be diff --git a/Documentation/HOWTO b/Documentation/HOWTO index 48123dba5e6a..ced9207bedcf 100644 --- a/Documentation/HOWTO +++ b/Documentation/HOWTO @@ -396,26 +396,6 @@ bugme-janitor mailing list (every change in the bugzilla is mailed here) -Managing bug reports --------------------- - -One of the 
best ways to put into practice your hacking skills is by fixing -bugs reported by other people. Not only you will help to make the kernel -more stable, you'll learn to fix real world problems and you will improve -your skills, and other developers will be aware of your presence. Fixing -bugs is one of the best ways to get merits among other developers, because -not many people like wasting time fixing other people's bugs. - -To work in the already reported bug reports, go to http://bugzilla.kernel.org. -If you want to be advised of the future bug reports, you can subscribe to the -bugme-new mailing list (only new bug reports are mailed here) or to the -bugme-janitor mailing list (every change in the bugzilla is mailed here) - - http://lists.osdl.org/mailman/listinfo/bugme-new - http://lists.osdl.org/mailman/listinfo/bugme-janitors - - - Mailing lists ------------- diff --git a/Documentation/SubmitChecklist b/Documentation/SubmitChecklist index 3af3e65cf43b..6ebffb57e3db 100644 --- a/Documentation/SubmitChecklist +++ b/Documentation/SubmitChecklist @@ -84,3 +84,9 @@ kernel patches. 24: Avoid whitespace damage such as indenting with spaces or whitespace at the end of lines. You can test this by feeding the patch to "git apply --check --whitespace=error-all" + +25: Check your patch for general style as detailed in + Documentation/CodingStyle. Check for trivial violations with the + patch style checker prior to submission (scripts/checkpatch.pl). + You should be able to justify all violations that remain in + your patch. diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches index a417b25fb1aa..0958e97d4bf4 100644 --- a/Documentation/SubmittingPatches +++ b/Documentation/SubmittingPatches @@ -118,7 +118,20 @@ then only post say 15 or so at a time and wait for review and integration. -4) Select e-mail destination. +4) Style check your changes. 
+ +Check your patch for basic style violations, details of which can be +found in Documentation/CodingStyle. Failure to do so simply wastes +the reviewers time and will get your patch rejected, probabally +without even being read. + +At a minimum you should check your patches with the patch style +checker prior to submission (scripts/patchcheck.pl). You should +be able to justify all violations that remain in your patch. + + + +5) Select e-mail destination. Look through the MAINTAINERS file and the source code, and determine if your change applies to a specific subsystem of the kernel, with @@ -146,7 +159,7 @@ discussed should the patch then be submitted to Linus. -5) Select your CC (e-mail carbon copy) list. +6) Select your CC (e-mail carbon copy) list. Unless you have a reason NOT to do so, CC linux-kernel@vger.kernel.org. @@ -187,8 +200,7 @@ URL: <http://www.kernel.org/pub/linux/kernel/people/bunk/trivial/> - -6) No MIME, no links, no compression, no attachments. Just plain text. +7) No MIME, no links, no compression, no attachments. Just plain text. Linus and other kernel developers need to be able to read and comment on the changes you are submitting. It is important for a kernel @@ -223,9 +235,9 @@ pref("mailnews.display.disable_format_flowed_support", true); -7) E-mail size. +8) E-mail size. -When sending patches to Linus, always follow step #6. +When sending patches to Linus, always follow step #7. Large changes are not appropriate for mailing lists, and some maintainers. If your patch, uncompressed, exceeds 40 kB in size, @@ -234,7 +246,7 @@ server, and provide instead a URL (link) pointing to your patch. -8) Name your kernel version. +9) Name your kernel version. It is important to note, either in the subject line or in the patch description, the kernel version to which this patch applies. @@ -244,7 +256,7 @@ Linus will not apply it. -9) Don't get discouraged. Re-submit. +10) Don't get discouraged. Re-submit. 
After you have submitted your change, be patient and wait. If Linus likes your change and applies it, it will appear in the next version @@ -270,7 +282,7 @@ When in doubt, solicit comments on linux-kernel mailing list. -10) Include PATCH in the subject +11) Include PATCH in the subject Due to high e-mail traffic to Linus, and to linux-kernel, it is common convention to prefix your subject line with [PATCH]. This lets Linus @@ -279,7 +291,7 @@ e-mail discussions. -11) Sign your work +12) Sign your work To improve tracking of who did what, especially with patches that can percolate to their final resting place in the kernel through several @@ -328,7 +340,32 @@ now, but you can do this to mark internal company procedures or just point out some special detail about the sign-off. -12) The canonical patch format +13) When to use Acked-by: + +The Signed-off-by: tag indicates that the signer was involved in the +development of the patch, or that he/she was in the patch's delivery path. + +If a person was not directly involved in the preparation or handling of a +patch but wishes to signify and record their approval of it then they can +arrange to have an Acked-by: line added to the patch's changelog. + +Acked-by: is often used by the maintainer of the affected code when that +maintainer neither contributed to nor forwarded the patch. + +Acked-by: is not as formal as Signed-off-by:. It is a record that the acker +has at least reviewed the patch and has indicated acceptance. Hence patch +mergers will sometimes manually convert an acker's "yep, looks good to me" +into an Acked-by:. + +Acked-by: does not necessarily indicate acknowledgement of the entire patch. +For example, if a patch affects multiple subsystems and has an Acked-by: from +one subsystem maintainer then this usually indicates acknowledgement of just +the part which affects that maintainer's code. Judgement should be used here. 
+ When in doubt people should refer to the original discussion in the mailing +list archives. + + +14) The canonical patch format The canonical patch subject line is: @@ -427,6 +464,10 @@ section Linus Computer Science 101. Nuff said. If your code deviates too much from this, it is likely to be rejected without further review, and without comment. +Check your patches with the patch style checker prior to submission +(scripts/checkpatch.pl). You should be able to justify all +violations that remain in your patch. + 2) #ifdefs are ugly diff --git a/Documentation/atomic_ops.txt b/Documentation/atomic_ops.txt index 2a63d5662a93..05851e9982ed 100644 --- a/Documentation/atomic_ops.txt +++ b/Documentation/atomic_ops.txt @@ -149,7 +149,7 @@ defined which accomplish this: void smp_mb__before_atomic_dec(void); void smp_mb__after_atomic_dec(void); void smp_mb__before_atomic_inc(void); - void smp_mb__after_atomic_dec(void); + void smp_mb__after_atomic_inc(void); For example, smp_mb__before_atomic_dec() can be used like so: diff --git a/Documentation/block/capability.txt b/Documentation/block/capability.txt new file mode 100644 index 000000000000..2f1729424ef4 --- /dev/null +++ b/Documentation/block/capability.txt @@ -0,0 +1,15 @@ +Generic Block Device Capability +=============================================================================== +This file documents the sysfs file block/<disk>/capability + +capability is a hex word indicating which capabilities a specific disk +supports. For more information on bits not listed here, see +include/linux/genhd.h + +Capability Value +------------------------------------------------------------------------------- +GENHD_FL_MEDIA_CHANGE_NOTIFY 4 + When this bit is set, the disk supports Asynchronous Notification + of media change events. These events will be broadcast to user + space via kernel uevent. 
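The capability word described above can be tested from user space with a small helper. The following is a minimal sketch, not part of the patch: it assumes the text read from block/<disk>/capability is a single hexadecimal word, and it hard-codes the value 4 from the table instead of including the kernel header. The names `MEDIA_CHANGE_NOTIFY_BIT` and `disk_has_media_change_notify` are hypothetical.

```c
#include <stdlib.h>

/* Value copied from the table above; a local, hypothetical name,
 * not the kernel's own macro. */
#define MEDIA_CHANGE_NOTIFY_BIT 0x4UL

/*
 * Parse the hex word read from block/<disk>/capability and report
 * whether the media-change notification bit is set.
 * Returns 1 if set, 0 if clear, -1 if the text holds no hex number.
 */
static int disk_has_media_change_notify(const char *capability_text)
{
    char *end;
    unsigned long caps = strtoul(capability_text, &end, 16);

    if (end == capability_text)
        return -1;  /* no hex digits were found */
    return (caps & MEDIA_CHANGE_NOTIFY_BIT) ? 1 : 0;
}
```

A caller would read the sysfs file into a buffer and pass the buffer to the helper; a trailing newline is harmless because strtoul() stops at the first non-hex character.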
+ diff --git a/Documentation/dontdiff b/Documentation/dontdiff index 64e9f6c4826b..595a5ea4c690 100644 --- a/Documentation/dontdiff +++ b/Documentation/dontdiff @@ -10,10 +10,12 @@ *.grp *.gz *.html +*.i *.jpeg *.ko *.log *.lst +*.moc *.mod.c *.o *.orig @@ -25,6 +27,9 @@ *.s *.sgml *.so +*.symtypes +*.tab.c +*.tab.h *.tex *.ver *.xml @@ -32,9 +37,13 @@ *_vga16.c *cscope* *~ +*.9 +*.9.gz .* .cscope 53c700_d.h +53c7xx_d.h +53c7xx_u.h 53c8xx_d.h* BitKeeper COPYING @@ -70,9 +79,11 @@ bzImage* classlist.h* comp*.log compile.h* +conf config config-* config_data.h* +config_data.gz* conmakehash consolemap_deftbl.c* crc32table.h* @@ -81,18 +92,23 @@ defkeymap.c* devlist.h* docproc dummy_sym.c* +elf2ecoff elfconfig.h* filelist fixdep fore200e_mkfirm fore200e_pca_fw.c* +gconf gen-devlist gen-kdb_cmds.c* gen_crc32table gen_init_cpio genksyms gentbl +*_gray256.c ikconfig.h* +initramfs_data.cpio +initramfs_data.cpio.gz initramfs_list kallsyms kconfig @@ -100,19 +116,30 @@ kconfig.tk keywords.c* ksym.c* ksym.h* +kxgettext +lkc_defs.h lex.c* +lex.*.c +lk201-map.c logo_*.c logo_*_clut224.c logo_*_mono.c lxdialog mach-types mach-types.h +machtypes.h make_times_h map maui_boot.h +mconf +miboot* mk_elfconfig +mkboot +mkbugboot mkdep +mkprep mktables +mktree modpost modversions.h* offset.h @@ -120,18 +147,28 @@ offsets.h oui.c* parse.c* parse.h* +patches* +pca200e.bin +pca200e_ecd.bin2 +piggy.gz +piggyback pnmtologo ppc_defs.h* promcon_tbl.c* pss_boot.h +qconf raid6altivec*.c raid6int*.c raid6tables.c +relocs +series setup sim710_d.h* +sImage sm_tbl* split-include tags +tftpboot.img times.h* tkparse trix_boot.h @@ -139,8 +176,11 @@ utsrelease.h* version.h* vmlinux vmlinux-* +vmlinux.aout vmlinux.lds vsyscall.lds wanxlfw.inc uImage -zImage +unifdef +zImage* +zconf.hash.c diff --git a/Documentation/driver-model/platform.txt b/Documentation/driver-model/platform.txt index 19c4a6e13676..2a97320ee17f 100644 --- a/Documentation/driver-model/platform.txt +++ 
b/Documentation/driver-model/platform.txt @@ -96,6 +96,46 @@ System setup also associates those clocks with the device, so that that calls to clk_get(&pdev->dev, clock_name) return them as needed. +Legacy Drivers: Device Probing +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Some drivers are not fully converted to the driver model, because they take +on a non-driver role: the driver registers its platform device, rather than +leaving that for system infrastructure. Such drivers can't be hotplugged +or coldplugged, since those mechanisms require device creation to be in a +different system component than the driver. + +The only "good" reason for this is to handle older system designs which, like +original IBM PCs, rely on error-prone "probe-the-hardware" models for hardware +configuration. Newer systems have largely abandoned that model, in favor of +bus-level support for dynamic configuration (PCI, USB), or device tables +provided by the boot firmware (e.g. PNPACPI on x86). There are too many +conflicting options about what might be where, and even educated guesses by +an operating system will be wrong often enough to make trouble. + +This style of driver is discouraged. If you're updating such a driver, +please try to move the device enumeration to a more appropriate location, +outside the driver. This will usually be cleanup, since such drivers +tend to already have "normal" modes, such as ones using device nodes that +were created by PNP or by platform device setup. + +None the less, there are some APIs to support such legacy drivers. Avoid +using these calls except with such hotplug-deficient drivers. + + struct platform_device *platform_device_alloc( + char *name, unsigned id); + +You can use platform_device_alloc() to dynamically allocate a device, which +you will then initialize with resources and platform_device_register(). 
+A better solution is usually: + + struct platform_device *platform_device_register_simple( + char *name, unsigned id, + struct resource *res, unsigned nres); + +You can use platform_device_register_simple() as a one-step call to allocate +and register a device. + + Device Naming and Driver Binding ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The platform_device.dev.bus_id is the canonical name for the devices. diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index 5c8695a3d139..7d3f205b0ba5 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt @@ -62,7 +62,7 @@ Who: Dan Dennedy <dan@dennedy.org>, Stefan Richter <stefanr@s5r6.in-berlin.de> What: old NCR53C9x driver When: October 2007 Why: Replaced by the much better esp_scsi driver. Actual low-level - driver can ported over almost trivially. + driver can be ported over almost trivially. Who: David Miller <davem@davemloft.net> Christoph Hellwig <hch@lst.de> @@ -70,6 +70,7 @@ Who: David Miller <davem@davemloft.net> What: Video4Linux API 1 ioctls and video_decoder.h from Video devices. When: December 2006 +Files: include/linux/video_decoder.h Why: V4L1 AP1 was replaced by V4L2 API. during migration from 2.4 to 2.6 series. The old API have lots of drawbacks and don't provide enough means to work with all video and audio standards. The newer API is @@ -103,6 +104,7 @@ Who: Dominik Brodowski <linux@brodo.de> What: remove EXPORT_SYMBOL(kernel_thread) When: August 2006 Files: arch/*/kernel/*_ksyms.c +Funcs: kernel_thread Why: kernel_thread is a low-level implementation detail. 
Drivers should use the <linux/kthread.h> API instead which shields them from implementation details and provides a higherlevel interface that diff --git a/Documentation/filesystems/directory-locking b/Documentation/filesystems/directory-locking index d7099a9266fb..ff7b611abf33 100644 --- a/Documentation/filesystems/directory-locking +++ b/Documentation/filesystems/directory-locking @@ -1,5 +1,6 @@ Locking scheme used for directory operations is based on two -kinds of locks - per-inode (->i_sem) and per-filesystem (->s_vfs_rename_sem). +kinds of locks - per-inode (->i_mutex) and per-filesystem +(->s_vfs_rename_mutex). For our purposes all operations fall in 5 classes: @@ -63,7 +64,7 @@ objects - A < B iff A is an ancestor of B. attempt to acquire some lock and already holds at least one lock. Let's consider the set of contended locks. First of all, filesystem lock is not contended, since any process blocked on it is not holding any locks. -Thus all processes are blocked on ->i_sem. +Thus all processes are blocked on ->i_mutex. Non-directory objects are not contended due to (3). Thus link creation can't be a part of deadlock - it can't be blocked on source diff --git a/Documentation/filesystems/porting b/Documentation/filesystems/porting index 5531694059ab..dac45c92d872 100644 --- a/Documentation/filesystems/porting +++ b/Documentation/filesystems/porting @@ -107,7 +107,7 @@ free to drop it... --- [informational] -->link() callers hold ->i_sem on the object we are linking to. Some of your +->link() callers hold ->i_mutex on the object we are linking to. Some of your problems might be over... --- @@ -130,9 +130,9 @@ went in - and hadn't been documented ;-/). Just remove it from fs_flags --- [mandatory] -->setattr() is called without BKL now. Caller _always_ holds ->i_sem, so -watch for ->i_sem-grabbing code that might be used by your ->setattr(). -Callers of notify_change() need ->i_sem now. +->setattr() is called without BKL now. 
Caller _always_ holds ->i_mutex, so +watch for ->i_mutex-grabbing code that might be used by your ->setattr(). +Callers of notify_change() need ->i_mutex now. --- [recommended] diff --git a/Documentation/filesystems/tmpfs.txt b/Documentation/filesystems/tmpfs.txt index 6dd050878a20..145e44086358 100644 --- a/Documentation/filesystems/tmpfs.txt +++ b/Documentation/filesystems/tmpfs.txt @@ -94,10 +94,10 @@ largest node numbers in the range. For example, mpol=bind:0-3,5,7,9-15 Note that trying to mount a tmpfs with an mpol option will fail if the running kernel does not support NUMA; and will fail if its nodelist -specifies a node >= MAX_NUMNODES. If your system relies on that tmpfs -being mounted, but from time to time runs a kernel built without NUMA -capability (perhaps a safe recovery kernel), or configured to support -fewer nodes, then it is advisable to omit the mpol option from automatic +specifies a node which is not online. If your system relies on that +tmpfs being mounted, but from time to time runs a kernel built without +NUMA capability (perhaps a safe recovery kernel), or with fewer nodes +online, then it is advisable to omit the mpol option from automatic mount options. It can be added later, when the tmpfs is already mounted on MountPoint, by 'mount -o remount,mpol=Policy:NodeList MountPoint'. @@ -121,4 +121,4 @@ RAM/SWAP in 10240 inodes and it is only accessible by root. 
Author: Christoph Rohland <cr@sap.com>, 1.12.01 Updated: - Hugh Dickins <hugh@veritas.com>, 19 February 2006 + Hugh Dickins <hugh@veritas.com>, 4 June 2007 diff --git a/Documentation/firmware_class/README b/Documentation/firmware_class/README index e9cc8bb26f7d..c3480aa66ba8 100644 --- a/Documentation/firmware_class/README +++ b/Documentation/firmware_class/README @@ -1,7 +1,7 @@ request_firmware() hotplug interface: ------------------------------------ - Copyright (C) 2003 Manuel Estrada Sainz <ranty@debian.org> + Copyright (C) 2003 Manuel Estrada Sainz Why: --- diff --git a/Documentation/firmware_class/firmware_sample_driver.c b/Documentation/firmware_class/firmware_sample_driver.c index 87feccdb5c9f..6865cbe075ec 100644 --- a/Documentation/firmware_class/firmware_sample_driver.c +++ b/Documentation/firmware_class/firmware_sample_driver.c @@ -1,7 +1,7 @@ /* * firmware_sample_driver.c - * - * Copyright (c) 2003 Manuel Estrada Sainz <ranty@debian.org> + * Copyright (c) 2003 Manuel Estrada Sainz * * Sample code on how to use request_firmware() from drivers. * diff --git a/Documentation/firmware_class/firmware_sample_firmware_class.c b/Documentation/firmware_class/firmware_sample_firmware_class.c index 9e1b0e4051cd..4994f1f28f8c 100644 --- a/Documentation/firmware_class/firmware_sample_firmware_class.c +++ b/Documentation/firmware_class/firmware_sample_firmware_class.c @@ -1,7 +1,7 @@ /* * firmware_sample_firmware_class.c - * - * Copyright (c) 2003 Manuel Estrada Sainz <ranty@debian.org> + * Copyright (c) 2003 Manuel Estrada Sainz * * NOTE: This is just a probe of concept, if you think that your driver would * be well served by this mechanism please contact me first. 
@@ -19,7 +19,7 @@ #include <linux/firmware.h> -MODULE_AUTHOR("Manuel Estrada Sainz <ranty@debian.org>"); +MODULE_AUTHOR("Manuel Estrada Sainz"); MODULE_DESCRIPTION("Hackish sample for using firmware class directly"); MODULE_LICENSE("GPL"); diff --git a/Documentation/hrtimer/timer_stats.txt b/Documentation/hrtimer/timer_stats.txt index 27f782e3593f..22b0814d0ad0 100644 --- a/Documentation/hrtimer/timer_stats.txt +++ b/Documentation/hrtimer/timer_stats.txt @@ -2,9 +2,10 @@ timer_stats - timer usage statistics ------------------------------------ timer_stats is a debugging facility to make the timer (ab)usage in a Linux -system visible to kernel and userspace developers. It is not intended for -production usage as it adds significant overhead to the (hr)timer code and the -(hr)timer data structures. +system visible to kernel and userspace developers. If enabled in the config +but not used it has almost zero runtime overhead, and a relatively small +data structure overhead. Even if collection is enabled runtime all the +locking is per-CPU and lookup is hashed. timer_stats should be used by kernel and userspace developers to verify that their code does not make unduly use of timers. This helps to avoid unnecessary diff --git a/Documentation/i386/boot.txt b/Documentation/i386/boot.txt index 66fa67fec2a7..35985b34d5a6 100644 --- a/Documentation/i386/boot.txt +++ b/Documentation/i386/boot.txt @@ -2,7 +2,7 @@ ---------------------------- H. Peter Anvin <hpa@zytor.com> - Last update 2007-05-16 + Last update 2007-05-23 On the i386 platform, the Linux kernel uses a rather complicated boot convention. This has evolved partially due to historical aspects, as @@ -202,6 +202,8 @@ All general purpose boot loaders should write the fields marked nonstandard address should fill in the fields marked (reloc); other boot loaders can ignore those fields. +The byte order of all fields is littleendian (this is x86, after all.) 
+ Field name: setup_secs Type: read Offset/size: 0x1f1/1 @@ -280,14 +282,16 @@ Type: read Offset/size: 0x206/2 Protocol: 2.00+ - Contains the boot protocol version, e.g. 0x0204 for version 2.04. + Contains the boot protocol version, in (major << 8)+minor format, + e.g. 0x0204 for version 2.04, and 0x0a11 for a hypothetical version + 10.17. Field name: readmode_swtch Type: modify (optional) Offset/size: 0x208/4 Protocol: 2.00+ - Boot loader hook (see separate chapter.) + Boot loader hook (see ADVANCED BOOT LOADER HOOKS below.) Field name: start_sys Type: read @@ -304,10 +308,17 @@ Protocol: 2.00+ If set to a nonzero value, contains a pointer to a NUL-terminated human-readable kernel version number string, less 0x200. This can be used to display the kernel version to the user. This value - should be less than (0x200*setup_sects). For example, if this value - is set to 0x1c00, the kernel version number string can be found at - offset 0x1e00 in the kernel file. This is a valid value if and only - if the "setup_sects" field contains the value 14 or higher. + should be less than (0x200*setup_sects). + + For example, if this value is set to 0x1c00, the kernel version + number string can be found at offset 0x1e00 in the kernel file. + This is a valid value if and only if the "setup_sects" field + contains the value 15 or higher, as: + + 0x1c00 < 15*0x200 (= 0x1e00) but + 0x1c00 >= 14*0x200 (= 0x1c00) + + 0x1c00 >> 9 = 14, so the minimum value for setup_secs is 15. Field name: type_of_loader Type: write (obligatory) @@ -377,7 +388,7 @@ Protocol: 2.00+ This field can be modified for two purposes: - 1. as a boot loader hook (see separate chapter.) + 1. as a boot loader hook (see ADVANCED BOOT LOADER HOOKS below.) 2. if a bootloader which does not install a hook loads a relocatable kernel at a nonstandard address it will have to modify @@ -715,7 +726,7 @@ switched off, especially if the loaded kernel has the floppy driver as a demand-loaded module! 
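The two pieces of arithmetic in the header field descriptions above, the (major << 8)+minor version encoding and the lower bound on setup_sects for a given version-string pointer, can be checked with a few lines of C. This is a minimal sketch under stated assumptions: the helper names are hypothetical, and the values are passed in as plain integers instead of being read from a real kernel image.

```c
/* Split a boot protocol version stored as (major << 8) + minor. */
static unsigned boot_proto_major(unsigned version)
{
    return (version >> 8) & 0xff;
}

static unsigned boot_proto_minor(unsigned version)
{
    return version & 0xff;
}

/*
 * Smallest setup_sects for which a version-string pointer value is
 * valid, i.e. the smallest n with value < 0x200 * n.
 * Shifting right by 9 divides by 0x200 (512), rounding down.
 */
static unsigned min_setup_sects(unsigned pointer_value)
{
    return (pointer_value >> 9) + 1;
}
```

With the document's own figures, 0x0204 splits into major 2 and minor 4, 0x0a11 into major 10 and minor 17, and a pointer value of 0x1c00 gives a minimum setup_sects of 15.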
-**** ADVANCED BOOT TIME HOOKS +**** ADVANCED BOOT LOADER HOOKS If the boot loader runs in a particularly hostile environment (such as LOADLIN, which runs under DOS) it may be impossible to follow the @@ -740,4 +751,5 @@ IMPORTANT: All the hooks are required to preserve %esp, %ebp, %esi and set them up to BOOT_DS (0x18) yourself. After completing your hook, you should jump to the address - that was in this field before your boot loader overwrote it. + that was in this field before your boot loader overwrote it + (relocated, if appropriate.) diff --git a/Documentation/ia64/aliasing-test.c b/Documentation/ia64/aliasing-test.c index 3153167b41c3..d485256ee1ce 100644 --- a/Documentation/ia64/aliasing-test.c +++ b/Documentation/ia64/aliasing-test.c @@ -197,7 +197,7 @@ skip: return rc; } -main() +int main() { int rc; diff --git a/Documentation/initrd.txt b/Documentation/initrd.txt index 15f1b35deb34..d3dc505104da 100644 --- a/Documentation/initrd.txt +++ b/Documentation/initrd.txt @@ -27,16 +27,20 @@ When using initrd, the system typically boots as follows: 1) the boot loader loads the kernel and the initial RAM disk 2) the kernel converts initrd into a "normal" RAM disk and frees the memory used by initrd - 3) initrd is mounted read-write as root - 4) /linuxrc is executed (this can be any valid executable, including + 3) if the root device is not /dev/ram0, the old (deprecated) + change_root procedure is followed. see the "Obsolete root change + mechanism" section below. + 4) root device is mounted. if it is /dev/ram0, the initrd image is + then mounted as root + 5) /sbin/init is executed (this can be any valid executable, including shell scripts; it is run with uid 0 and can do basically everything - init can do) - 5) linuxrc mounts the "real" root file system - 6) linuxrc places the root file system at the root directory using the + init can do). 
+ 6) init mounts the "real" root file system + 7) init places the root file system at the root directory using the pivot_root system call - 7) the usual boot sequence (e.g. invocation of /sbin/init) is performed - on the root file system - 8) the initrd file system is removed + 8) init execs the /sbin/init on the new root filesystem, performing + the usual boot sequence + 9) the initrd file system is removed Note that changing the root directory does not involve unmounting it. It is therefore possible to leave processes running on initrd during that @@ -70,7 +74,7 @@ initrd adds the following new options: root=/dev/ram0 initrd is mounted as root, and the normal boot procedure is followed, - with the RAM disk still mounted as root. + with the RAM disk mounted as root. Compressed cpio images ---------------------- @@ -137,11 +141,11 @@ We'll describe the loopback device method: # mkdir /mnt/dev # mknod /mnt/dev/console c 5 1 5) copy all the files that are needed to properly use the initrd - environment. Don't forget the most important file, /linuxrc - Note that /linuxrc's permissions must include "x" (execute). + environment. Don't forget the most important file, /sbin/init + Note that /sbin/init's permissions must include "x" (execute). 6) correct operation the initrd environment can frequently be tested even without rebooting with the command - # chroot /mnt /linuxrc + # chroot /mnt /sbin/init This is of course limited to initrds that do not interfere with the general system state (e.g. by reconfiguring network interfaces, overwriting mounted devices, trying to start already running demons, @@ -154,7 +158,7 @@ We'll describe the loopback device method: # gzip -9 initrd For experimenting with initrd, you may want to take a rescue floppy and -only add a symbolic link from /linuxrc to /bin/sh. Alternatively, you +only add a symbolic link from /sbin/init to /bin/sh. Alternatively, you can try the experimental newlib environment [2] to create a small initrd. 
@@ -163,15 +167,14 @@ boot loaders support initrd. Since the boot process is still compatible with an older mechanism, the following boot command line parameters have to be given: - root=/dev/ram0 init=/linuxrc rw + root=/dev/ram0 rw (rw is only necessary if writing to the initrd file system.) With LOADLIN, you simply execute LOADLIN <kernel> initrd=<disk_image> -e.g. LOADLIN C:\LINUX\BZIMAGE initrd=C:\LINUX\INITRD.GZ root=/dev/ram0 - init=/linuxrc rw +e.g. LOADLIN C:\LINUX\BZIMAGE initrd=C:\LINUX\INITRD.GZ root=/dev/ram0 rw With LILO, you add the option INITRD=<path> to either the global section or to the section of the respective kernel in /etc/lilo.conf, and pass @@ -179,7 +182,7 @@ the options using APPEND, e.g. image = /bzImage initrd = /boot/initrd.gz - append = "root=/dev/ram0 init=/linuxrc rw" + append = "root=/dev/ram0 rw" and run /sbin/lilo @@ -191,7 +194,7 @@ Now you can boot and enjoy using initrd. Changing the root device ------------------------ -When finished with its duties, linuxrc typically changes the root device +When finished with its duties, init typically changes the root device and proceeds with starting the Linux system on the "real" root device. The procedure involves the following steps: @@ -217,7 +220,7 @@ must exist before calling pivot_root. Example: # mkdir initrd # pivot_root . initrd -Now, the linuxrc process may still access the old root via its +Now, the init process may still access the old root via its executable, shared libraries, standard input/output/error, and its current root directory. All these references are dropped by the following command: @@ -249,10 +252,6 @@ disk can be freed: It is also possible to use initrd with an NFS-mounted root, see the pivot_root(8) man page for details. -Note: if linuxrc or any program exec'ed from it terminates for some -reason, the old change_root mechanism is invoked (see section "Obsolete -root change mechanism"). 
- Usage scenarios --------------- @@ -264,15 +263,15 @@ as follows: 1) system boots from floppy or other media with a minimal kernel (e.g. support for RAM disks, initrd, a.out, and the Ext2 FS) and loads initrd - 2) /linuxrc determines what is needed to (1) mount the "real" root FS + 2) /sbin/init determines what is needed to (1) mount the "real" root FS (i.e. device type, device drivers, file system) and (2) the distribution media (e.g. CD-ROM, network, tape, ...). This can be done by asking the user, by auto-probing, or by using a hybrid approach. - 3) /linuxrc loads the necessary kernel modules - 4) /linuxrc creates and populates the root file system (this doesn't + 3) /sbin/init loads the necessary kernel modules + 4) /sbin/init creates and populates the root file system (this doesn't have to be a very usable system yet) - 5) /linuxrc invokes pivot_root to change the root file system and + 5) /sbin/init invokes pivot_root to change the root file system and execs - via chroot - a program that continues the installation 6) the boot loader is installed 7) the boot loader is configured to load an initrd with the set of @@ -291,7 +290,7 @@ different hardware configurations in a single administrative domain. In such cases, it is desirable to generate only a small set of kernels (ideally only one) and to keep the system-specific part of configuration information as small as possible. In this case, a common initrd could be -generated with all the necessary modules. Then, only /linuxrc or a file +generated with all the necessary modules. Then, only /sbin/init or a file read by it would have to be different. A third scenario are more convenient recovery disks, because information @@ -337,6 +336,25 @@ This old, deprecated mechanism is commonly called "change_root", while the new, supported mechanism is called "pivot_root". 
+Mixed change_root and pivot_root mechanism +------------------------------------------ + +In case you did not want to use root=/dev/ram0 to trigger the pivot_root mechanism, +you may create both /linuxrc and /sbin/init in your initrd image. + +/linuxrc would contain only the following: + +#! /bin/sh +mount -n -t proc proc /proc +echo 0x0100 >/proc/sys/kernel/real-root-dev +umount -n /proc + +Once linuxrc exited, the kernel would mount your initrd as root again, +this time executing /sbin/init. Again, it would be the duty of this init +to build the right environment (maybe using the root= device passed on +the cmdline) before the final execution of the real /sbin/init. + + Resources --------- diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 09220a1e22d9..5d0283cd3a81 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -170,7 +170,10 @@ and is between 256 and 4096 characters. It is defined in the file acpi_os_name= [HW,ACPI] Tell ACPI BIOS the name of the OS Format: To spoof as Windows 98: ="Microsoft Windows" - acpi_osi= [HW,ACPI] empty param disables _OSI + acpi_osi= [HW,ACPI] Modify list of supported OS interface strings + acpi_osi="string1" # add string1 -- only one string + acpi_osi="!string2" # remove built-in string2 + acpi_osi= # disable all strings acpi_serialize [HW,ACPI] force serialization of AML methods @@ -396,6 +399,26 @@ and is between 256 and 4096 characters. It is defined in the file clocksource is not available, it defaults to PIT. Format: { pit | tsc | cyclone | pmtmr } + clocksource= [GENERIC_TIME] Override the default clocksource + Format: <string> + Override the default clocksource and use the clocksource + with the name specified.
+ Some clocksource names to choose from, depending on + the platform: + [all] jiffies (this is the base, fallback clocksource) + [ACPI] acpi_pm + [ARM] imx_timer1,OSTS,netx_timer,mpu_timer2, + pxa_timer,timer3,32k_counter,timer0_1 + [AVR32] avr32 + [IA-32] pit,hpet,tsc,vmi-timer; + scx200_hrt on Geode; cyclone on IBM x440 + [MIPS] MIPS + [PARISC] cr16 + [S390] tod + [SH] SuperH + [SPARC64] tick + [X86-64] hpet,tsc + code_bytes [IA32] How many bytes of object code to print in an oops report. Range: 0 - 8192 @@ -1112,9 +1135,9 @@ and is between 256 and 4096 characters. It is defined in the file when set. Format: <int> - noaliencache [MM, NUMA] Disables the allcoation of alien caches in - the slab allocator. Saves per-node memory, but will - impact performance on real NUMA hardware. + noaliencache [MM, NUMA, SLAB] Disables the allocation of alien + caches in the slab allocator. Saves per-node memory, + but will impact performance. noalign [KNL,ARM] @@ -1593,6 +1616,37 @@ and is between 256 and 4096 characters. It is defined in the file slram= [HW,MTD] + slub_debug [MM, SLUB] + Enabling slub_debug allows one to determine the culprit + if slab objects become corrupted. Enabling slub_debug + creates guard zones around objects and poisons objects + when not in use. Also tracks the last alloc / free. + For more information see Documentation/vm/slub.txt. + + slub_max_order= [MM, SLUB] + Determines the maximum allowed order for slabs. Setting + this too high may cause fragmentation. + For more information see Documentation/vm/slub.txt. + + slub_min_objects= [MM, SLUB] + The minimum objects per slab. SLUB will increase the + slab order up to slub_max_order to generate a + sufficiently big slab to satisfy the number of objects. + The higher the number of objects the smaller the overhead + of tracking slabs. + For more information see Documentation/vm/slub.txt. + + slub_min_order= [MM, SLUB] + Determines the minimum page order for slabs.
Must be + lower than slub_max_order + For more information see Documentation/vm/slub.txt. + + slub_nomerge [MM, SLUB] + Disable merging of slabs of similar size. May be + necessary if there is some reason to distinguish + allocs to different slabs. + For more information see Documentation/vm/slub.txt. + smart2= [HW] Format: <io1>[,<io2>[,...,<io8>]] @@ -1807,10 +1861,6 @@ and is between 256 and 4096 characters. It is defined in the file time Show timing data prefixed to each printk message line - clocksource= [GENERIC_TIME] Override the default clocksource - Override the default clocksource and use the clocksource - with the name specified. - tipar.timeout= [HW,PPT] Set communications timeout in tenths of a second (default 15). diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt index 58408dd023c7..650657c54733 100644 --- a/Documentation/memory-barriers.txt +++ b/Documentation/memory-barriers.txt @@ -24,7 +24,7 @@ Contents: (*) Explicit kernel barriers. - Compiler barrier. - - The CPU memory barriers. + - CPU memory barriers. - MMIO write barrier. (*) Implicit kernel memory barriers. @@ -265,7 +265,7 @@ Memory barriers are such interventions. They impose a perceived partial ordering over the memory operations on either side of the barrier. Such enforcement is important because the CPUs and other devices in a system -can use a variety of tricks to improve performance - including reordering, +can use a variety of tricks to improve performance, including reordering, deferral and combination of memory operations; speculative loads; speculative branch prediction and various types of caching. Memory barriers are used to override or suppress these tricks, allowing the code to sanely control the @@ -457,7 +457,7 @@ sequence, Q must be either &A or &B, and that: (Q == &A) implies (D == 1) (Q == &B) implies (D == 4) -But! CPU 2's perception of P may be updated _before_ its perception of B, thus +But! 
CPU 2's perception of P may be updated _before_ its perception of B, thus leading to the following situation: (Q == &B) and (D == 2) ???? @@ -573,7 +573,7 @@ Basically, the read barrier always has to be there, even though it can be of the "weaker" type. [!] Note that the stores before the write barrier would normally be expected to -match the loads after the read barrier or data dependency barrier, and vice +match the loads after the read barrier or the data dependency barrier, and vice versa: CPU 1 CPU 2 @@ -588,7 +588,7 @@ versa: EXAMPLES OF MEMORY BARRIER SEQUENCES ------------------------------------ -Firstly, write barriers act as a partial orderings on store operations. +Firstly, write barriers act as partial orderings on store operations. Consider the following sequence of events: CPU 1 @@ -608,15 +608,15 @@ STORE B, STORE C } all occurring before the unordered set of { STORE D, STORE E +-------+ : : | | +------+ | |------>| C=3 | } /\ - | | : +------+ }----- \ -----> Events perceptible - | | : | A=1 | } \/ to rest of system + | | : +------+ }----- \ -----> Events perceptible to + | | : | A=1 | } \/ the rest of the system | | : +------+ } | CPU 1 | : | B=2 | } | | +------+ } | | wwwwwwwwwwwwwwww } <--- At this point the write barrier | | +------+ } requires all stores prior to the | | : | E=5 | } barrier to be committed before - | | : +------+ } further stores may be take place. + | | : +------+ } further stores may take place | |------>| D=4 | } | | +------+ +-------+ : : @@ -626,7 +626,7 @@ STORE B, STORE C } all occurring before the unordered set of { STORE D, STORE E V -Secondly, data dependency barriers act as a partial orderings on data-dependent +Secondly, data dependency barriers act as partial orderings on data-dependent loads. 
Consider the following sequence of events: CPU 1 CPU 2 @@ -975,7 +975,7 @@ compiler from moving the memory accesses either side of it to the other side: barrier(); -This a general barrier - lesser varieties of compiler barrier do not exist. +This is a general barrier - lesser varieties of compiler barrier do not exist. The compiler barrier has no direct effect on the CPU, which may then reorder things however it wishes. @@ -997,7 +997,7 @@ The Linux kernel has eight basic CPU memory barriers: All CPU memory barriers unconditionally imply compiler barriers. SMP memory barriers are reduced to compiler barriers on uniprocessor compiled -systems because it is assumed that a CPU will be appear to be self-consistent, +systems because it is assumed that a CPU will appear to be self-consistent, and will order overlapping accesses correctly with respect to itself. [!] Note that SMP memory barriers _must_ be used to control the ordering of @@ -1146,9 +1146,9 @@ for each construct. These operations all imply certain barriers: Therefore, from (1), (2) and (4) an UNLOCK followed by an unconditional LOCK is equivalent to a full barrier, but a LOCK followed by an UNLOCK is not. -[!] Note: one of the consequence of LOCKs and UNLOCKs being only one-way - barriers is that the effects instructions outside of a critical section may - seep into the inside of the critical section. +[!] Note: one of the consequences of LOCKs and UNLOCKs being only one-way + barriers is that the effects of instructions outside of a critical section + may seep into the inside of the critical section. 
A LOCK followed by an UNLOCK may not be assumed to be full memory barrier because it is possible for an access preceding the LOCK to happen after the @@ -1239,7 +1239,7 @@ three CPUs; then should the following sequence of events occur: UNLOCK M UNLOCK Q *D = d; *H = h; -Then there is no guarantee as to what order CPU #3 will see the accesses to *A +Then there is no guarantee as to what order CPU 3 will see the accesses to *A through *H occur in, other than the constraints imposed by the separate locks on the separate CPUs. It might, for example, see: @@ -1269,12 +1269,12 @@ However, if the following occurs: UNLOCK M [2] *H = h; -CPU #3 might see: +CPU 3 might see: *E, LOCK M [1], *C, *B, *A, UNLOCK M [1], LOCK M [2], *H, *F, *G, UNLOCK M [2], *D -But assuming CPU #1 gets the lock first, it won't see any of: +But assuming CPU 1 gets the lock first, CPU 3 won't see any of: *B, *C, *D, *F, *G or *H preceding LOCK M [1] *A, *B or *C following UNLOCK M [1] @@ -1327,12 +1327,12 @@ spinlock, for example: mmiowb(); spin_unlock(Q); -this will ensure that the two stores issued on CPU #1 appear at the PCI bridge -before either of the stores issued on CPU #2. +this will ensure that the two stores issued on CPU 1 appear at the PCI bridge +before either of the stores issued on CPU 2. -Furthermore, following a store by a load to the same device obviates the need -for an mmiowb(), because the load forces the store to complete before the load +Furthermore, following a store by a load from the same device obviates the need +for the mmiowb(), because the load forces the store to complete before the load is performed: CPU 1 CPU 2 @@ -1363,7 +1363,7 @@ circumstances in which reordering definitely _could_ be a problem: (*) Atomic operations. - (*) Accessing devices (I/O). + (*) Accessing devices. (*) Interrupts. 
@@ -1399,7 +1399,7 @@ To wake up a particular waiter, the up_read() or up_write() functions have to: (1) read the next pointer from this waiter's record to know as to where the next waiter record is; - (4) read the pointer to the waiter's task structure; + (2) read the pointer to the waiter's task structure; (3) clear the task pointer to tell the waiter it has been given the semaphore; @@ -1407,7 +1407,7 @@ To wake up a particular waiter, the up_read() or up_write() functions have to: (5) release the reference held on the waiter's task struct. -In otherwords, it has to perform this sequence of events: +In other words, it has to perform this sequence of events: LOAD waiter->list.next; LOAD waiter->task; @@ -1502,7 +1502,7 @@ operations and adjusting reference counters towards object destruction, and as such the implicit memory barrier effects are necessary. -The following operation are potential problems as they do _not_ imply memory +The following operations are potential problems as they do _not_ imply memory barriers, but might be used for implementing such things as UNLOCK-class operations: @@ -1517,7 +1517,7 @@ With these the appropriate explicit memory barrier should be used if necessary The following also do _not_ imply memory barriers, and so may require explicit memory barriers under some circumstances (smp_mb__before_atomic_dec() for -instance)): +instance): atomic_add(); atomic_sub(); @@ -1641,8 +1641,8 @@ functions: indeed have special I/O space access cycles and instructions, but many CPUs don't have such a concept. - The PCI bus, amongst others, defines an I/O space concept - which on such - CPUs as i386 and x86_64 cpus readily maps to the CPU's concept of I/O + The PCI bus, amongst others, defines an I/O space concept which - on such + CPUs as i386 and x86_64 - readily maps to the CPU's concept of I/O space. 
However, it may also be mapped as a virtual I/O space in the CPU's memory map, particularly on those CPUs that don't support alternate I/O spaces. @@ -1664,7 +1664,7 @@ functions: i386 architecture machines, for example, this is controlled by way of the MTRR registers. - Ordinarily, these will be guaranteed to be fully ordered and uncombined,, + Ordinarily, these will be guaranteed to be fully ordered and uncombined, provided they're not accessing a prefetchable device. However, intermediary hardware (such as a PCI bridge) may indulge in @@ -1689,7 +1689,7 @@ functions: (*) ioreadX(), iowriteX() - These will perform as appropriate for the type of access they're actually + These will perform appropriately for the type of access they're actually doing, be it inX()/outX() or readX()/writeX(). @@ -1705,7 +1705,7 @@ of arch-specific code. This means that it must be considered that the CPU will execute its instruction stream in any order it feels like - or even in parallel - provided that if an -instruction in the stream depends on the an earlier instruction, then that +instruction in the stream depends on an earlier instruction, then that earlier instruction must be sufficiently complete[*] before the later instruction may proceed; in other words: provided that the appearance of causality is maintained. @@ -1795,8 +1795,8 @@ eventually become visible on all CPUs, there's no guarantee that they will become apparent in the same order on those other CPUs. 
-Consider dealing with a system that has pair of CPUs (1 & 2), each of which has -a pair of parallel data caches (CPU 1 has A/B, and CPU 2 has C/D): +Consider dealing with a system that has a pair of CPUs (1 & 2), each of which +has a pair of parallel data caches (CPU 1 has A/B, and CPU 2 has C/D): : : +--------+ @@ -1835,7 +1835,7 @@ Imagine the system has the following properties: (*) the coherency queue is not flushed by normal loads to lines already present in the cache, even though the contents of the queue may - potentially effect those loads. + potentially affect those loads. Imagine, then, that two writes are made on the first CPU, with a write barrier between them to guarantee that they will appear to reach that CPU's caches in @@ -1845,7 +1845,7 @@ the requisite order: =============== =============== ======================================= u == 0, v == 1 and p == &u, q == &u v = 2; - smp_wmb(); Make sure change to v visible before + smp_wmb(); Make sure change to v is visible before change to p <A:modify v=2> v is now in cache A exclusively p = &v; @@ -1853,7 +1853,7 @@ the requisite order: The write memory barrier forces the other CPUs in the system to perceive that the local CPU's caches have apparently been updated in the correct order. 
But -now imagine that the second CPU that wants to read those values: +now imagine that the second CPU wants to read those values: CPU 1 CPU 2 COMMENT =============== =============== ======================================= @@ -1861,7 +1861,7 @@ now imagine that the second CPU that wants to read those values: q = p; x = *q; -The above pair of reads may then fail to happen in expected order, as the +The above pair of reads may then fail to happen in the expected order, as the cacheline holding p may get updated in one of the second CPU's caches whilst the update to the cacheline holding v is delayed in the other of the second CPU's caches by some other cache event: @@ -1916,7 +1916,7 @@ access depends on a read, not all do, so it may not be relied on. Other CPUs may also have split caches, but must coordinate between the various cachelets for normal memory accesses. The semantics of the Alpha removes the -need for coordination in absence of memory barriers. +need for coordination in the absence of memory barriers. CACHE COHERENCY VS DMA @@ -1931,10 +1931,10 @@ invalidate them as well). In addition, the data DMA'd to RAM by a device may be overwritten by dirty cache lines being written back to RAM from a CPU's cache after the device has -installed its own data, or cache lines simply present in a CPUs cache may -simply obscure the fact that RAM has been updated, until at such time as the -cacheline is discarded from the CPU's cache and reloaded. To deal with this, -the appropriate part of the kernel must invalidate the overlapping bits of the +installed its own data, or cache lines present in the CPU's cache may simply +obscure the fact that RAM has been updated, until at such time as the cacheline +is discarded from the CPU's cache and reloaded. To deal with this, the +appropriate part of the kernel must invalidate the overlapping bits of the cache on each CPU. See Documentation/cachetlb.txt for more information on cache management. 
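Reviewer sketch (not part of the patch): the split-cache example in the memory-barriers.txt hunks above has CPU 1 doing "v = 2; smp_wmb(); p = &v" while CPU 2 does "q = p; x = *q". A rough userspace analogue, assuming C11 <stdatomic.h> rather than the kernel's own barrier primitives, pairs a release store with an acquire load. The names publish() and consume() are hypothetical, and acquire ordering is stronger than the data dependency barrier the text describes; it is used here only because it is a portable way to get at least that ordering.

```c
#include <stdatomic.h>

/* Initial state from the example: u == 0, v == 1, p == &u. */
static int u = 0;
static int v = 1;
static _Atomic(int *) p = &u;

/*
 * Writer side (CPU 1 in the text): update v, then publish &v through p.
 * The release store stands in for "smp_wmb(); p = &v;". This is a
 * userspace assumption, not the kernel API. Meant to be called once;
 * a second call would race with readers already holding &v.
 */
void publish(int value)
{
	v = value;
	atomic_store_explicit(&p, &v, memory_order_release);
}

/*
 * Reader side (CPU 2 in the text): load p, then dereference it.
 * The acquire load orders the dereference after the pointer load,
 * covering the role of the data dependency barrier.
 */
int consume(void)
{
	int *q = atomic_load_explicit(&p, memory_order_acquire);

	return *q;
}
```

When the two functions run on different threads, consume() should return either the value reached through &u or the newly published value, never the stale v == 1 that the text warns about on split-cache hardware.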
@@ -1944,7 +1944,7 @@ CACHE COHERENCY VS MMIO ----------------------- Memory mapped I/O usually takes place through memory locations that are part of -a window in the CPU's memory space that have different properties assigned than +a window in the CPU's memory space that has different properties assigned than the usual RAM directed window. Amongst these properties is usually the fact that such accesses bypass the @@ -1960,7 +1960,7 @@ THE THINGS CPUS GET UP TO ========================= A programmer might take it for granted that the CPU will perform memory -operations in exactly the order specified, so that if a CPU is, for example, +operations in exactly the order specified, so that if the CPU is, for example, given the following piece of code to execute: a = *A; @@ -1969,7 +1969,7 @@ given the following piece of code to execute: d = *D; *E = e; -They would then expect that the CPU will complete the memory operation for each +they would then expect that the CPU will complete the memory operation for each instruction before moving on to the next one, leading to a definite sequence of operations as seen by external observers in the system: @@ -1986,8 +1986,8 @@ assumption doesn't hold because: (*) loads may be done speculatively, and the result discarded should it prove to have been unnecessary; - (*) loads may be done speculatively, leading to the result having being - fetched at the wrong time in the expected sequence of events; + (*) loads may be done speculatively, leading to the result having been fetched + at the wrong time in the expected sequence of events; (*) the order of the memory accesses may be rearranged to promote better use of the CPU buses and caches; @@ -2069,12 +2069,12 @@ AND THEN THERE'S THE ALPHA The DEC Alpha CPU is one of the most relaxed CPUs there is. Not only that, some versions of the Alpha CPU have a split data cache, permitting them to have -two semantically related cache lines updating at separate times. 
This is where +two semantically-related cache lines updated at separate times. This is where the data dependency barrier really becomes necessary as this synchronises both caches with the memory coherence system, thus making it seem like pointer changes vs new data occur in the right order. -The Alpha defines the Linux's kernel's memory barrier model. +The Alpha defines the Linux kernel's memory barrier model. See the subsection on "Cache Coherency" above. diff --git a/Documentation/networking/00-INDEX b/Documentation/networking/00-INDEX index e06b6e3c1db5..153d84d281e6 100644 --- a/Documentation/networking/00-INDEX +++ b/Documentation/networking/00-INDEX @@ -32,6 +32,8 @@ cops.txt - info on the COPS LocalTalk Linux driver cs89x0.txt - the Crystal LAN (CS8900/20-based) Ethernet ISA adapter driver +cxacru.txt + - Conexant AccessRunner USB ADSL Modem de4x5.txt - the Digital EtherWORKS DE4?? and DE5?? PCI Ethernet driver decnet.txt diff --git a/Documentation/networking/cxacru.txt b/Documentation/networking/cxacru.txt new file mode 100644 index 000000000000..b074681a963e --- /dev/null +++ b/Documentation/networking/cxacru.txt @@ -0,0 +1,84 @@ +Firmware is required for this device: http://accessrunner.sourceforge.net/ + +While it is capable of managing/maintaining the ADSL connection without the +module loaded, the device will sometimes stop responding after unloading the +driver and it is necessary to unplug/remove power to the device to fix this. + +Detected devices will appear as ATM devices named "cxacru". In /sys/class/atm/ +these are directories named cxacruN where N is the device number. A symlink +named device points to the USB interface device's directory which contains +several sysfs attribute files for retrieving device statistics: + +* adsl_controller_version + +* adsl_headend +* adsl_headend_environment + Information about the remote headend. 
+ +* downstream_attenuation (dB) +* downstream_bits_per_frame +* downstream_rate (kbps) +* downstream_snr_margin (dB) + Downstream stats. + +* upstream_attenuation (dB) +* upstream_bits_per_frame +* upstream_rate (kbps) +* upstream_snr_margin (dB) +* transmitter_power (dBm/Hz) + Upstream stats. + +* downstream_crc_errors +* downstream_fec_errors +* downstream_hec_errors +* upstream_crc_errors +* upstream_fec_errors +* upstream_hec_errors + Error counts. + +* line_startable + Indicates that ADSL support on the device + is/can be enabled, see adsl_start. + +* line_status + "initialising" + "down" + "attempting to activate" + "training" + "channel analysis" + "exchange" + "waiting" + "up" + + Changes between "down" and "attempting to activate" + if there is no signal. + +* link_status + "not connected" + "connected" + "lost" + +* mac_address + +* modulation + "ANSI T1.413" + "ITU-T G.992.1 (G.DMT)" + "ITU-T G.992.2 (G.LITE)" + +* startup_attempts + Count of total attempts to initialise ADSL. 
+ +To enable/disable ADSL, the following can be written to the adsl_state file: + "start" + "stop" + "restart" (stops, waits 1.5s, then starts) + "poll" (used to resume status polling if it was disabled due to failure) + +Changes in adsl/line state are reported via kernel log messages: + [4942145.150704] ATM dev 0: ADSL state: running + [4942243.663766] ATM dev 0: ADSL line: down + [4942249.665075] ATM dev 0: ADSL line: attempting to activate + [4942253.654954] ATM dev 0: ADSL line: training + [4942255.666387] ATM dev 0: ADSL line: channel analysis + [4942259.656262] ATM dev 0: ADSL line: exchange + [2635357.696901] ATM dev 0: ADSL line: up (8128 kb/s down | 832 kb/s up) diff --git a/Documentation/networking/xfrm_sysctl.txt b/Documentation/networking/xfrm_sysctl.txt new file mode 100644 index 000000000000..5bbd16792fe1 --- /dev/null +++ b/Documentation/networking/xfrm_sysctl.txt @@ -0,0 +1,4 @@ +/proc/sys/net/core/xfrm_* Variables: + +xfrm_acq_expires - INTEGER + default 30 - hard timeout in seconds for acquire requests diff --git a/Documentation/powerpc/booting-without-of.txt b/Documentation/powerpc/booting-without-of.txt index b49ce169a63a..d42d98107d49 100644 --- a/Documentation/powerpc/booting-without-of.txt +++ b/Documentation/powerpc/booting-without-of.txt @@ -1,7 +1,6 @@ Booting the Linux/ppc kernel without Open Firmware -------------------------------------------------- - (c) 2005 Benjamin Herrenschmidt <benh at kernel.crashing.org>, IBM Corp. (c) 2005 Becky Bruce <becky.bruce at freescale.com>, @@ -9,6 +8,62 @@ (c) 2006 MontaVista Software, Inc.
Flash chip node definition +Table of Contents +================= + + I - Introduction + 1) Entry point for arch/powerpc + 2) Board support + + II - The DT block format + 1) Header + 2) Device tree generalities + 3) Device tree "structure" block + 4) Device tree "strings" block + + III - Required content of the device tree + 1) Note about cells and address representation + 2) Note about "compatible" properties + 3) Note about "name" properties + 4) Note about node and property names and character set + 5) Required nodes and properties + a) The root node + b) The /cpus node + c) The /cpus/* nodes + d) the /memory node(s) + e) The /chosen node + f) the /soc<SOCname> node + + IV - "dtc", the device tree compiler + + V - Recommendations for a bootloader + + VI - System-on-a-chip devices and nodes + 1) Defining child nodes of an SOC + 2) Representing devices without a current OF specification + a) MDIO IO device + c) PHY nodes + b) Gianfar-compatible ethernet nodes + d) Interrupt controllers + e) I2C + f) Freescale SOC USB controllers + g) Freescale SOC SEC Security Engines + h) Board Control and Status (BCSR) + i) Freescale QUICC Engine module (QE) + j) Flash chip nodes + + VII - Specifying interrupt information for devices + 1) interrupts property + 2) interrupt-parent property + 3) OpenPIC Interrupt Controllers + 4) ISA Interrupt Controllers + + Appendix A - Sample SOC node for MPC8540 + + +Revision Information +==================== + May 18, 2005: Rev 0.1 - Initial draft, no chapter III yet. May 19, 2005: Rev 0.2 - Add chapter III and bits & pieces here or @@ -1687,7 +1742,7 @@ platforms are moved over to use the flattened-device-tree model. }; }; - g) Flash chip nodes + j) Flash chip nodes Flash chips (Memory Technology Devices) are often used for solid state file systems on embedded devices.
diff --git a/Documentation/sound/alsa/ALSA-Configuration.txt b/Documentation/sound/alsa/ALSA-Configuration.txt
index 57b878cc393c..355ff0a2bb7c 100644
--- a/Documentation/sound/alsa/ALSA-Configuration.txt
+++ b/Documentation/sound/alsa/ALSA-Configuration.txt
@@ -917,6 +917,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
           ref           Reference board, base config
           m2-2          Some Gateway MX series laptops
           m6            Some Gateway NX series laptops
+          pa6           Gateway NX860 series

         STAC9227/9228/9229/927x
           ref           Reference board
diff --git a/Documentation/spi/spi-summary b/Documentation/spi/spi-summary
index 795fbb48ffa7..76ea6c837be5 100644
--- a/Documentation/spi/spi-summary
+++ b/Documentation/spi/spi-summary
@@ -1,26 +1,30 @@
 Overview of Linux kernel SPI support
 ====================================

-02-Dec-2005
+21-May-2007

 What is SPI?
 ------------
 The "Serial Peripheral Interface" (SPI) is a synchronous four wire serial
 link used to connect microcontrollers to sensors, memory, and peripherals.
+It's a simple "de facto" standard, not complicated enough to acquire a
+standardization body. SPI uses a master/slave configuration.

 The three signal wires hold a clock (SCK, often on the order of 10 MHz),
 and parallel data lines with "Master Out, Slave In" (MOSI) or "Master In,
 Slave Out" (MISO) signals. (Other names are also used.) There are four
 clocking modes through which data is exchanged; mode-0 and mode-3 are most
 commonly used. Each clock cycle shifts data out and data in; the clock
-doesn't cycle except when there is data to shift.
+doesn't cycle except when there is a data bit to shift. Not all data bits
+are used though; not every protocol uses those full duplex capabilities.

-SPI masters may use a "chip select" line to activate a given SPI slave
+SPI masters use a fourth "chip select" line to activate a given SPI slave
 device, so those three signal wires may be connected to several chips
-in parallel. All SPI slaves support chipselects. Some devices have
+in parallel. All SPI slaves support chipselects; they are usually active
+low signals, labeled nCSx for slave 'x' (e.g. nCS0). Some devices have
 other signals, often including an interrupt to the master.

-Unlike serial busses like USB or SMBUS, even low level protocols for
+Unlike serial busses like USB or SMBus, even low level protocols for
 SPI slave functions are usually not interoperable between vendors
 (except for commodities like SPI memory chips).

@@ -33,6 +37,11 @@ SPI slave functions are usually not interoperable between vendors
   - Some devices may use eight bit words. Others may different word
     lengths, such as streams of 12-bit or 20-bit digital samples.

+  - Words are usually sent with their most significant bit (MSB) first,
+    but sometimes the least significant bit (LSB) goes first instead.
+
+  - Sometimes SPI is used to daisy-chain devices, like shift registers.
+
 In the same way, SPI slaves will only rarely support any kind of automatic
 discovery/enumeration protocol. The tree of slave devices accessible from
 a given SPI master will normally be set up manually, with configuration
@@ -44,6 +53,14 @@ half-duplex SPI, for request/response protocols), SSP ("Synchronous
 Serial Protocol"), PSP ("Programmable Serial Protocol"), and other
 related protocols.

+Some chips eliminate a signal line by combining MOSI and MISO, and
+limiting themselves to half-duplex at the hardware level. In fact
+some SPI chips have this signal mode as a strapping option. These
+can be accessed using the same programming interface as SPI, but of
+course they won't handle full duplex transfers. You may find such
+chips described as using "three wire" signaling: SCK, data, nCSx.
+(That data line is sometimes called MOMI or SISO.)
+
 Microcontrollers often support both master and slave sides of the SPI
 protocol. This document (and Linux) currently only supports the master
 side of SPI interactions.
@@ -74,6 +91,32 @@ interfaces with SPI modes. Given SPI support, they could use MMC or SD
 cards without needing a special purpose MMC/SD/SDIO controller.


+I'm confused. What are these four SPI "clock modes"?
+-----------------------------------------------------
+It's easy to be confused here, and the vendor documentation you'll
+find isn't necessarily helpful. The four modes combine two mode bits:
+
+ - CPOL indicates the initial clock polarity. CPOL=0 means the
+   clock starts low, so the first (leading) edge is rising, and
+   the second (trailing) edge is falling. CPOL=1 means the clock
+   starts high, so the first (leading) edge is falling.
+
+ - CPHA indicates the clock phase used to sample data; CPHA=0 says
+   sample on the leading edge, CPHA=1 means the trailing edge.
+
+   Since the signal needs to stablize before it's sampled, CPHA=0
+   implies that its data is written half a clock before the first
+   clock edge. The chipselect may have made it become available.
+
+Chip specs won't always say "uses SPI mode X" in as many words,
+but their timing diagrams will make the CPOL and CPHA modes clear.
+
+In the SPI mode number, CPOL is the high order bit and CPHA is the
+low order bit. So when a chip's timing diagram shows the clock
+starting low (CPOL=0) and data stabilized for sampling during the
+trailing clock edge (CPHA=1), that's SPI mode 1.
+
+
 How do these driver programming interfaces work?
 ------------------------------------------------
 The <linux/spi/spi.h> header file includes kerneldoc, as does the
diff --git a/Documentation/thinkpad-acpi.txt b/Documentation/thinkpad-acpi.txt
index 2d4803359a04..9e6b94face4b 100644
--- a/Documentation/thinkpad-acpi.txt
+++ b/Documentation/thinkpad-acpi.txt
@@ -138,7 +138,7 @@ Hot keys
 --------

 procfs: /proc/acpi/ibm/hotkey
-sysfs device attribute: hotkey/*
+sysfs device attribute: hotkey_*

 Without this driver, only the Fn-F4 key (sleep button) generates an
 ACPI event. With the driver loaded, the hotkey feature enabled and the
@@ -196,10 +196,7 @@ The following commands can be written to the /proc/acpi/ibm/hotkey file:

 sysfs notes:

-        The hot keys attributes are in a hotkey/ subdirectory off the
-        thinkpad device.
-
-        bios_enabled:
+        hotkey_bios_enabled:
                 Returns the status of the hot keys feature when
                 thinkpad-acpi was loaded. Upon module unload, the hot
                 key feature status will be restored to this value.
@@ -207,19 +204,19 @@ sysfs notes:
                 0: hot keys were disabled
                 1: hot keys were enabled

-        bios_mask:
+        hotkey_bios_mask:
                 Returns the hot keys mask when thinkpad-acpi was loaded. Upon
                 module unload, the hot keys mask will be restored to this value.

-        enable:
+        hotkey_enable:
                 Enables/disables the hot keys feature, and reports
                 current status of the hot keys feature.

                 0: disables the hot keys feature / feature disabled
                 1: enables the hot keys feature / feature enabled

-        mask:
+        hotkey_mask:
                 bit mask to enable ACPI event generation for each hot
                 key (see above). Returns the current status of the hot
                 keys mask, and allows one to modify it.
@@ -229,7 +226,7 @@ Bluetooth
 ---------

 procfs: /proc/acpi/ibm/bluetooth
-sysfs device attribute: bluetooth/enable
+sysfs device attribute: bluetooth_enable

 This feature shows the presence and current state of a ThinkPad
 Bluetooth device in the internal ThinkPad CDC slot.
@@ -244,7 +241,7 @@ If Bluetooth is installed, the following commands can be used:
 Sysfs notes:

         If the Bluetooth CDC card is installed, it can be enabled /
-        disabled through the "bluetooth/enable" thinkpad-acpi device
+        disabled through the "bluetooth_enable" thinkpad-acpi device
         attribute, and its current status can also be queried.

         enable:
@@ -252,7 +249,7 @@ Sysfs notes:
                 1: enables Bluetooth / Bluetooth is enabled.

         Note: this interface will be probably be superseeded by the
-        generic rfkill class.
+        generic rfkill class, so it is NOT to be considered stable yet.
 Video output control -- /proc/acpi/ibm/video
 --------------------------------------------
@@ -898,7 +895,7 @@ EXPERIMENTAL: WAN
 -----------------

 procfs: /proc/acpi/ibm/wan
-sysfs device attribute: wwan/enable
+sysfs device attribute: wwan_enable

 This feature is marked EXPERIMENTAL because the implementation
 directly accesses hardware registers and may not work as expected. USE
@@ -921,7 +918,7 @@ If the W-WAN card is installed, the following commands can be used:
 Sysfs notes:

         If the W-WAN card is installed, it can be enabled /
-        disabled through the "wwan/enable" thinkpad-acpi device
+        disabled through the "wwan_enable" thinkpad-acpi device
         attribute, and its current status can also be queried.

         enable:
@@ -929,7 +926,7 @@ Sysfs notes:
                 1: enables WWAN card / WWAN card is enabled.

         Note: this interface will be probably be superseeded by the
-        generic rfkill class.
+        generic rfkill class, so it is NOT to be considered stable yet.

 Multiple Commands, Module Parameters
 ------------------------------------
diff --git a/Documentation/vm/slub.txt b/Documentation/vm/slub.txt
index 727c8d81aeaf..1523320abd87 100644
--- a/Documentation/vm/slub.txt
+++ b/Documentation/vm/slub.txt
@@ -1,13 +1,9 @@
 Short users guide for SLUB
 --------------------------

-First of all slub should transparently replace SLAB. If you enable
-SLUB then everything should work the same (Note the word "should".
-There is likely not much value in that word at this point).
-
 The basic philosophy of SLUB is very different from SLAB. SLAB
 requires rebuilding the kernel to activate debug options for all
-SLABS. SLUB always includes full debugging but its off by default.
+slab caches. SLUB always includes full debugging but it is off by default.
 SLUB can enable debugging only for selected slabs in order to avoid
 an impact on overall system performance which may make a bug more
 difficult to find.
@@ -76,13 +72,28 @@ of objects.
 Careful with tracing: It may spew out lots of information and never stop if
 used on the wrong slab.

-SLAB Merging
+Slab merging
 ------------

-If no debugging is specified then SLUB may merge similar slabs together
+If no debug options are specified then SLUB may merge similar slabs together
 in order to reduce overhead and increase cache hotness of objects.
 slabinfo -a displays which slabs were merged together.

+Slab validation
+---------------
+
+SLUB can validate all object if the kernel was booted with slub_debug. In
+order to do so you must have the slabinfo tool. Then you can do
+
+slabinfo -v
+
+which will test all objects. Output will be generated to the syslog.
+
+This also works in a more limited way if boot was without slab debug.
+In that case slabinfo -v simply tests all reachable objects. Usually
+these are in the cpu slabs and the partial slabs. Full slabs are not
+tracked by SLUB in a non debug situation.
+
 Getting more performance
 ------------------------

@@ -91,9 +102,9 @@ list_lock once in a while to deal with partial slabs. That overhead is
 governed by the order of the allocation for each slab. The allocations
 can be influenced by kernel parameters:

-slub_min_objects=x              (default 8)
+slub_min_objects=x              (default 4)
 slub_min_order=x                (default 0)
-slub_max_order=x                (default 4)
+slub_max_order=x                (default 1)

 slub_min_objects allows to specify how many objects must at least fit
 into one slab in order for the allocation order to be acceptable.
@@ -109,5 +120,107 @@ longer be checked. This is useful to avoid SLUB trying to generate
 super large order pages to fit slub_min_objects of a slab cache with
 large object sizes into one high order page.
-
-Christoph Lameter, <clameter@sgi.com>, April 10, 2007
+SLUB Debug output
+-----------------
+
+Here is a sample of slub debug output:
+
+*** SLUB kmalloc-8: Redzone Active@0xc90f6d20 slab 0xc528c530 offset=3360 flags=0x400000c3 inuse=61 freelist=0xc90f6d58
+  Bytes b4 0xc90f6d10:  00 00 00 00 00 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a ........ZZZZZZZZ
+    Object 0xc90f6d20:  31 30 31 39 2e 30 30 35                         1019.005
+   Redzone 0xc90f6d28:  00 cc cc cc                                     .
+FreePointer 0xc90f6d2c -> 0xc90f6d58
+Last alloc: get_modalias+0x61/0xf5 jiffies_ago=53 cpu=1 pid=554
+Filler 0xc90f6d50:  5a 5a 5a 5a 5a 5a 5a 5a                         ZZZZZZZZ
+  [<c010523d>] dump_trace+0x63/0x1eb
+  [<c01053df>] show_trace_log_lvl+0x1a/0x2f
+  [<c010601d>] show_trace+0x12/0x14
+  [<c0106035>] dump_stack+0x16/0x18
+  [<c017e0fa>] object_err+0x143/0x14b
+  [<c017e2cc>] check_object+0x66/0x234
+  [<c017eb43>] __slab_free+0x239/0x384
+  [<c017f446>] kfree+0xa6/0xc6
+  [<c02e2335>] get_modalias+0xb9/0xf5
+  [<c02e23b7>] dmi_dev_uevent+0x27/0x3c
+  [<c027866a>] dev_uevent+0x1ad/0x1da
+  [<c0205024>] kobject_uevent_env+0x20a/0x45b
+  [<c020527f>] kobject_uevent+0xa/0xf
+  [<c02779f1>] store_uevent+0x4f/0x58
+  [<c027758e>] dev_attr_store+0x29/0x2f
+  [<c01bec4f>] sysfs_write_file+0x16e/0x19c
+  [<c0183ba7>] vfs_write+0xd1/0x15a
+  [<c01841d7>] sys_write+0x3d/0x72
+  [<c0104112>] sysenter_past_esp+0x5f/0x99
+  [<b7f7b410>] 0xb7f7b410
+  =======================
+@@@ SLUB kmalloc-8: Restoring redzone (0xcc) from 0xc90f6d28-0xc90f6d2b
+
+
+
+If SLUB encounters a corrupted object then it will perform the following
+actions:
+
+1. Isolation and report of the issue
+
+This will be a message in the system log starting with
+
+*** SLUB <slab cache affected>: <What went wrong>@<object address>
+offset=<offset of object into slab> flags=<slabflags>
+inuse=<objects in use in this slab> freelist=<first free object in slab>
+
+2. Report on how the problem was dealt with in order to ensure the continued
+operation of the system.
+
+These are messages in the system log beginning with
+
+@@@ SLUB <slab cache affected>: <corrective action taken>
+
+
+In the above sample SLUB found that the Redzone of an active object has
+been overwritten. Here a string of 8 characters was written into a slab that
+has the length of 8 characters. However, a 8 character string needs a
+terminating 0. That zero has overwritten the first byte of the Redzone field.
+After reporting the details of the issue encountered the @@@ SLUB message
+tell us that SLUB has restored the redzone to its proper value and then
+system operations continue.
+
+Various types of lines can follow the @@@ SLUB line:
+
+Bytes b4 <address> : <bytes>
+        Show a few bytes before the object where the problem was detected.
+        Can be useful if the corruption does not stop with the start of the
+        object.
+
+Object <address> : <bytes>
+        The bytes of the object. If the object is inactive then the bytes
+        typically contain poisoning values. Any non-poison value shows a
+        corruption by a write after free.
+
+Redzone <address> : <bytes>
+        The redzone following the object. The redzone is used to detect
+        writes after the object. All bytes should always have the same
+        value. If there is any deviation then it is due to a write after
+        the object boundary.
+
+Freepointer
+        The pointer to the next free object in the slab. May become
+        corrupted if overwriting continues after the red zone.
+
+Last alloc:
+Last free:
+        Shows the address from which the object was allocated/freed last.
+        We note the pid, the time and the CPU that did so. This is usually
+        the most useful information to figure out where things went wrong.
+        Here get_modalias() did an kmalloc(8) instead of a kmalloc(9).
+
+Filler <address> : <bytes>
+        Unused data to fill up the space in order to get the next object
+        properly aligned. In the debug case we make sure that there are
+        at least 4 bytes of filler. This allow for the detection of writes
+        before the object.
+
+Following the filler will be a stackdump. That stackdump describes the
+location where the error was detected. The cause of the corruption is more
+likely to be found by looking at the information about the last alloc / free.
+
+Christoph Lameter, <clameter@sgi.com>, May 23, 2007
diff --git a/Documentation/watchdog/pcwd-watchdog.txt b/Documentation/watchdog/pcwd-watchdog.txt
index d9ee6336c1d4..4f68052395c0 100644
--- a/Documentation/watchdog/pcwd-watchdog.txt
+++ b/Documentation/watchdog/pcwd-watchdog.txt
@@ -1,3 +1,5 @@
+Last reviewed: 10/05/2007
+
                      Berkshire Products PC Watchdog Card
                    Support for ISA Cards  Revision A and C
            Documentation and Driver by Ken Hollis <kenji@bitgate.com>
@@ -14,8 +16,8 @@
  The Watchdog Driver will automatically find your watchdog card, and will
  attach a running driver for use with that card. After the watchdog
- drivers have initialized, you can then talk to the card using the PC
- Watchdog program, available from http://ftp.bitgate.com/pcwd/.
+ drivers have initialized, you can then talk to the card using a PC
+ Watchdog program.

  I suggest putting a "watchdog -d" before the beginning of an fsck, and
  a "watchdog -e -t 1" immediately after the end of an fsck. (Remember
@@ -62,5 +64,3 @@
  -- Ken Hollis (kenji@bitgate.com)

-(This documentation may be out of date. Check
- http://ftp.bitgate.com/pcwd/ for the absolute latest additions.)
diff --git a/Documentation/watchdog/watchdog-api.txt b/Documentation/watchdog/watchdog-api.txt
index 8d16f6f3c4ec..bb7cb1d31ec7 100644
--- a/Documentation/watchdog/watchdog-api.txt
+++ b/Documentation/watchdog/watchdog-api.txt
@@ -1,3 +1,6 @@
+Last reviewed: 10/05/2007
+
+
 The Linux Watchdog driver API.

 Copyright 2002 Christer Weingel <wingel@nano-system.com>
@@ -22,7 +25,7 @@ the system. If userspace fails (RAM error, kernel bug, whatever), the
 notifications cease to occur, and the hardware watchdog will reset the
 system (causing a reboot) after the timeout occurs.
-The Linux watchdog API is a rather AD hoc construction and different
+The Linux watchdog API is a rather ad-hoc construction and different
 drivers implement different, and sometimes incompatible, parts of it.
 This file is an attempt to document the existing usage and allow
 future driver writers to use it as a reference.
@@ -46,14 +49,16 @@ some of the drivers support the configuration option "Disable watchdog
 shutdown on close", CONFIG_WATCHDOG_NOWAYOUT. If it is set to Y when
 compiling the kernel, there is no way of disabling the watchdog once
 it has been started. So, if the watchdog daemon crashes, the system
-will reboot after the timeout has passed.
+will reboot after the timeout has passed. Watchdog devices also usually
+support the nowayout module parameter so that this option can be controlled
+at runtime.

-Some other drivers will not disable the watchdog, unless a specific
-magic character 'V' has been sent /dev/watchdog just before closing
-the file. If the userspace daemon closes the file without sending
-this special character, the driver will assume that the daemon (and
-userspace in general) died, and will stop pinging the watchdog without
-disabling it first. This will then cause a reboot.
+Drivers will not disable the watchdog, unless a specific magic character 'V'
+has been sent /dev/watchdog just before closing the file. If the userspace
+daemon closes the file without sending this special character, the driver
+will assume that the daemon (and userspace in general) died, and will stop
+pinging the watchdog without disabling it first. This will then cause a
+reboot if the watchdog is not re-opened in sufficient time.

 The ioctl API:

@@ -227,218 +232,3 @@ The following options are available:

 [FIXME -- better explanations]

-Implementations in the current drivers in the kernel tree:
-
-Here I have tried to summarize what the different drivers support and
-where they do strange things compared to the other drivers.
-
-acquirewdt.c -- Acquire Single Board Computer
-
-        This driver has a hardcoded timeout of 1 minute
-
-        Supports CONFIG_WATCHDOG_NOWAYOUT
-
-        GETSUPPORT returns KEEPALIVEPING. GETSTATUS will return 1 if
-        the device is open, 0 if not. [FIXME -- isn't this rather
-        silly? To be able to use the ioctl, the device must be open
-        and so GETSTATUS will always return 1].
-
-advantechwdt.c -- Advantech Single Board Computer
-
-        Timeout that defaults to 60 seconds, supports SETTIMEOUT.
-
-        Supports CONFIG_WATCHDOG_NOWAYOUT
-
-        GETSUPPORT returns WDIOF_KEEPALIVEPING and WDIOF_SETTIMEOUT.
-        The GETSTATUS call returns if the device is open or not.
-        [FIXME -- silliness again?]
-
-booke_wdt.c -- PowerPC BookE Watchdog Timer
-
-        Timeout default varies according to frequency, supports
-        SETTIMEOUT
-
-        Watchdog cannot be turned off, CONFIG_WATCHDOG_NOWAYOUT
-        does not make sense
-
-        GETSUPPORT returns the watchdog_info struct, and
-        GETSTATUS returns the supported options. GETBOOTSTATUS
-        returns a 1 if the last reset was caused by the
-        watchdog and a 0 otherwise. This watchdog cannot be
-        disabled once it has been started. The wdt_period kernel
-        parameter selects which bit of the time base changing
-        from 0->1 will trigger the watchdog exception. Changing
-        the timeout from the ioctl calls will change the
-        wdt_period as defined above. Finally if you would like to
-        replace the default Watchdog Handler you can implement the
-        WatchdogHandler() function in your own code.
-
-eurotechwdt.c -- Eurotech CPU-1220/1410
-
-        The timeout can be set using the SETTIMEOUT ioctl and defaults
-        to 60 seconds.
-
-        Also has a module parameter "ev", event type which controls
-        what should happen on a timeout, the string "int" or anything
-        else that causes a reboot. [FIXME -- better description]
-
-        Supports CONFIG_WATCHDOG_NOWAYOUT
-
-        GETSUPPORT returns CARDRESET and WDIOF_SETTIMEOUT but
-        GETSTATUS is not supported and GETBOOTSTATUS just returns 0.
-
-i810-tco.c -- Intel 810 chipset
-
-        Also has support for a lot of other i8x0 stuff, but the
-        watchdog is one of the things.
-
-        The timeout is set using the module parameter "i810_margin",
-        which is in steps of 0.6 seconds where 2<i810_margin<64. The
-        driver supports the SETTIMEOUT ioctl.
-
-        Supports CONFIG_WATCHDOG_NOWAYOUT.
-
-        GETSUPPORT returns WDIOF_SETTIMEOUT. The GETSTATUS call
-        returns some kind of timer value which ist not compatible with
-        the other drivers. GETBOOT status returns some kind of
-        hardware specific boot status. [FIXME -- describe this]
-
-ib700wdt.c -- IB700 Single Board Computer
-
-        Default timeout of 30 seconds and the timeout is settable
-        using the SETTIMEOUT ioctl. Note that only a few timeout
-        values are supported.
-
-        Supports CONFIG_WATCHDOG_NOWAYOUT
-
-        GETSUPPORT returns WDIOF_KEEPALIVEPING and WDIOF_SETTIMEOUT.
-        The GETSTATUS call returns if the device is open or not.
-        [FIXME -- silliness again?]
-
-machzwd.c -- MachZ ZF-Logic
-
-        Hardcoded timeout of 10 seconds
-
-        Has a module parameter "action" that controls what happens
-        when the timeout runs out which can be 0 = RESET (default),
-        1 = SMI, 2 = NMI, 3 = SCI.
-
-        Supports CONFIG_WATCHDOG_NOWAYOUT and the magic character
-        'V' close handling.
-
-        GETSUPPORT returns WDIOF_KEEPALIVEPING, and the GETSTATUS call
-        returns if the device is open or not. [FIXME -- silliness
-        again?]
-
-mixcomwd.c -- MixCom Watchdog
-
-        [FIXME -- I'm unable to tell what the timeout is]
-
-        Supports CONFIG_WATCHDOG_NOWAYOUT
-
-        GETSUPPORT returns WDIOF_KEEPALIVEPING, GETSTATUS returns if
-        the device is opened or not [FIXME -- I'm not really sure how
-        this works, there seems to be some magic connected to
-        CONFIG_WATCHDOG_NOWAYOUT]
-
-pcwd.c -- Berkshire PC Watchdog
-
-        Hardcoded timeout of 1.5 seconds
-
-        Supports CONFIG_WATCHDOG_NOWAYOUT
-
-        GETSUPPORT returns WDIOF_OVERHEAT|WDIOF_CARDRESET and both
-        GETSTATUS and GETBOOTSTATUS return something useful.
-
-        The SETOPTIONS call can be used to enable and disable the card
-        and to ask the driver to call panic if the system overheats.
-
-sbc60xxwdt.c -- 60xx Single Board Computer
-
-        Hardcoded timeout of 10 seconds
-
-        Does not support CONFIG_WATCHDOG_NOWAYOUT, but has the magic
-        character 'V' close handling.
-
-        No bits set in GETSUPPORT
-
-scx200.c -- National SCx200 CPUs
-
-        Not in the kernel yet.
-
-        The timeout is set using a module parameter "margin" which
-        defaults to 60 seconds. The timeout can also be set using
-        SETTIMEOUT and read using GETTIMEOUT.
-
-        Supports a module parameter "nowayout" that is initialized
-        with the value of CONFIG_WATCHDOG_NOWAYOUT. Also supports the
-        magic character 'V' handling.
-
-shwdt.c -- SuperH 3/4 processors
-
-        [FIXME -- I'm unable to tell what the timeout is]
-
-        Supports CONFIG_WATCHDOG_NOWAYOUT
-
-        GETSUPPORT returns WDIOF_KEEPALIVEPING, and the GETSTATUS call
-        returns if the device is open or not. [FIXME -- silliness
-        again?]
-
-softdog.c -- Software watchdog
-
-        The timeout is set with the module parameter "soft_margin"
-        which defaults to 60 seconds, the timeout is also settable
-        using the SETTIMEOUT ioctl.
-
-        Supports CONFIG_WATCHDOG_NOWAYOUT
-
-        WDIOF_SETTIMEOUT bit set in GETSUPPORT
-
-w83877f_wdt.c -- W83877F Computer
-
-        Hardcoded timeout of 30 seconds
-
-        Does not support CONFIG_WATCHDOG_NOWAYOUT, but has the magic
-        character 'V' close handling.
-
-        No bits set in GETSUPPORT
-
-w83627hf_wdt.c -- w83627hf watchdog
-
-        Timeout that defaults to 60 seconds, supports SETTIMEOUT.
-
-        Supports CONFIG_WATCHDOG_NOWAYOUT
-
-        GETSUPPORT returns WDIOF_KEEPALIVEPING and WDIOF_SETTIMEOUT.
-        The GETSTATUS call returns if the device is open or not.
-
-wdt.c -- ICS WDT500/501 ISA and
-wdt_pci.c -- ICS WDT500/501 PCI
-
-        Default timeout of 60 seconds. The timeout is also settable
-        using the SETTIMEOUT ioctl.
-
-        Supports CONFIG_WATCHDOG_NOWAYOUT
-
-        GETSUPPORT returns with bits set depending on the actual
-        card. The WDT501 supports a lot of external monitoring, the
-        WDT500 much less.
-
-wdt285.c -- Footbridge watchdog
-
-        The timeout is set with the module parameter "soft_margin"
-        which defaults to 60 seconds. The timeout is also settable
-        using the SETTIMEOUT ioctl.
-
-        Does not support CONFIG_WATCHDOG_NOWAYOUT
-
-        WDIOF_SETTIMEOUT bit set in GETSUPPORT
-
-wdt977.c -- Netwinder W83977AF chip
-
-        Hardcoded timeout of 3 minutes
-
-        Supports CONFIG_WATCHDOG_NOWAYOUT
-
-        Does not support any ioctls at all.
-
diff --git a/Documentation/watchdog/watchdog.txt b/Documentation/watchdog/watchdog.txt
deleted file mode 100644
index 4b1ff69cc19a..000000000000
--- a/Documentation/watchdog/watchdog.txt
+++ /dev/null
@@ -1,94 +0,0 @@
-        Watchdog Timer Interfaces For The Linux Operating System
-
-                Alan Cox <alan@lxorguk.ukuu.org.uk>
-
-            Custom Linux Driver And Program Development
-
-
-The following watchdog drivers are currently implemented:
-
-        ICS WDT501-P
-        ICS WDT501-P (no fan tachometer)
-        ICS WDT500-P
-        Software Only
-        SA1100 Internal Watchdog
-        Berkshire Products PC Watchdog Revision A & C (by Ken Hollis)
-
-
-All six interfaces provide /dev/watchdog, which when open must be written
-to within a timeout or the machine will reboot. Each write delays the reboot
-time another timeout. In the case of the software watchdog the ability to
-reboot will depend on the state of the machines and interrupts. The hardware
-boards physically pull the machine down off their own onboard timers and
-will reboot from almost anything.
-
-A second temperature monitoring interface is available on the WDT501P cards
-and some Berkshire cards. This provides /dev/temperature. This is the machine
-internal temperature in degrees Fahrenheit. Each read returns a single byte
-giving the temperature.
-
-The third interface logs kernel messages on additional alert events.
-
-Both software and hardware watchdog drivers are available in the standard
-kernel. If you are using the software watchdog, you probably also want
-to use "panic=60" as a boot argument as well.
-
-The wdt card cannot be safely probed for. Instead you need to pass
-wdt=ioaddr,irq as a boot parameter - eg "wdt=0x240,11".
-
-The SA1100 watchdog module can be configured with the "sa1100_margin"
-commandline argument which specifies timeout value in seconds.
-
-The i810 TCO watchdog modules can be configured with the "i810_margin"
-commandline argument which specifies the counter initial value. The counter
-is decremented every 0.6 seconds and default to 50 (30 seconds). Values can
-range between 3 and 63.
-
-The i810 TCO watchdog driver also implements the WDIOC_GETSTATUS and
-WDIOC_GETBOOTSTATUS ioctl()s. WDIOC_GETSTATUS returns the actual counter value
-and WDIOC_GETBOOTSTATUS returns the value of TCO2 Status Register (see Intel's
-documentation for the 82801AA and 82801AB datasheet).
-
-Features
---------
-                    WDT501P  WDT500P  Software  Berkshire  i810 TCO  SA1100WD
-Reboot Timer           X        X        X         X          X         X
-External Reboot        X        X        o         o          o         X
-I/O Port Monitor       o        o        o         X          o         o
-Temperature            X        o        o         X          o         o
-Fan Speed              X        o        o         o          o         o
-Power Under            X        o        o         o          o         o
-Power Over             X        o        o         o          o         o
-Overheat               X        o        o         o          o         o
-
-The external event interfaces on the WDT boards are not currently supported.
-Minor numbers are however allocated for it.
-
-
-Example Watchdog Driver: see Documentation/watchdog/src/watchdog-simple.c
-
-
-Contact Information
-
-People keep asking about the WDT watchdog timer hardware: The phone contacts
-for Industrial Computer Source are:
-
-Industrial Computer Source
-http://www.indcompsrc.com
-ICS Advent, San Diego
-6260 Sequence Dr.
-San Diego, CA 92121-4371
-Phone (858) 677-0877
-FAX: (858) 677-0895
->
-ICS Advent Europe, UK
-Oving Road
-Chichester,
-West Sussex,
-PO19 4ET, UK
-Phone: 00.44.1243.533900
-
-
-and please mention Linux when enquiring.
-
-For full information about the PCWD cards see the pcwd-watchdog.txt document.
diff --git a/Documentation/watchdog/wdt.txt b/Documentation/watchdog/wdt.txt
new file mode 100644
index 000000000000..03fd756d976d
--- /dev/null
+++ b/Documentation/watchdog/wdt.txt
@@ -0,0 +1,43 @@
+Last Reviewed: 10/05/2007
+
+        WDT Watchdog Timer Interfaces For The Linux Operating System
+                Alan Cox <alan@lxorguk.ukuu.org.uk>
+
+        ICS WDT501-P
+        ICS WDT501-P (no fan tachometer)
+        ICS WDT500-P
+
+All the interfaces provide /dev/watchdog, which when open must be written
+to within a timeout or the machine will reboot. Each write delays the reboot
+time another timeout. In the case of the software watchdog the ability to
+reboot will depend on the state of the machines and interrupts. The hardware
+boards physically pull the machine down off their own onboard timers and
+will reboot from almost anything.
+
+A second temperature monitoring interface is available on the WDT501P cards
+This provides /dev/temperature. This is the machine internal temperature in
+degrees Fahrenheit. Each read returns a single byte giving the temperature.
+
+The third interface logs kernel messages on additional alert events.
+
+The wdt card cannot be safely probed for. Instead you need to pass
+wdt=ioaddr,irq as a boot parameter - eg "wdt=0x240,11".
+
+Features
+--------
+                    WDT501P  WDT500P
+Reboot Timer           X        X
+External Reboot        X        X
+I/O Port Monitor       o        o
+Temperature            X        o
+Fan Speed              X        o
+Power Under            X        o
+Power Over             X        o
+Overheat               X        o
+
+The external event interfaces on the WDT boards are not currently supported.
+Minor numbers are however allocated for it.
+
+
+Example Watchdog Driver: see Documentation/watchdog/src/watchdog-simple.c
+
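
Illustrative note, not part of the patch: the /dev/watchdog behaviour described in the watchdog hunks above (each write postpones the reboot; a 'V' written just before close asks the driver to disable the watchdog) can be sketched from userspace roughly as follows. This is a minimal sketch only. The helper names ping and magic_close are hypothetical, and the demo runs against a pipe rather than a real device, so no watchdog is armed.

```python
import os


def ping(fd: int) -> None:
    """Write a single byte; per the text above, each write delays the reboot."""
    os.write(fd, b"\0")


def magic_close(fd: int) -> None:
    """Send the magic character 'V' just before closing the file."""
    os.write(fd, b"V")
    os.close(fd)


if __name__ == "__main__":
    # Demonstration on a pipe (an assumption made for safety): nothing here
    # opens /dev/watchdog, so running this cannot reboot the machine.
    read_end, write_end = os.pipe()
    ping(write_end)
    magic_close(write_end)
    print(os.read(read_end, 16))
    os.close(read_end)
```

Against real hardware the descriptor would come from opening /dev/watchdog for writing instead of from os.pipe(). As the patch text notes, with CONFIG_WATCHDOG_NOWAYOUT set the watchdog cannot be disabled once started, so the 'V' would then have no effect.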