summaryrefslogtreecommitdiff
path: root/Documentation
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/ABI/testing/sysfs-kernel-vmcoreinfo14
-rw-r--r--Documentation/blockdev/ramdisk.txt21
-rw-r--r--Documentation/cpu-hotplug.txt2
-rw-r--r--Documentation/devicetree/bindings/rtc/haoyu,hym8563.txt27
-rw-r--r--Documentation/devicetree/bindings/rtc/maxim,ds1742.txt12
-rw-r--r--Documentation/devicetree/bindings/vendor-prefixes.txt1
-rw-r--r--Documentation/dynamic-debug-howto.txt9
-rw-r--r--Documentation/filesystems/00-INDEX58
-rw-r--r--Documentation/filesystems/nilfs2.txt56
-rw-r--r--Documentation/filesystems/sysfs.txt6
-rw-r--r--Documentation/kernel-parameters.txt11
-rw-r--r--Documentation/printk-formats.txt11
-rw-r--r--Documentation/sysctl/kernel.txt15
-rw-r--r--Documentation/trace/postprocess/trace-vmscan-postprocess.pl18
-rw-r--r--Documentation/vm/locking130
15 files changed, 225 insertions, 166 deletions
diff --git a/Documentation/ABI/testing/sysfs-kernel-vmcoreinfo b/Documentation/ABI/testing/sysfs-kernel-vmcoreinfo
new file mode 100644
index 000000000000..7bd81168e063
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-kernel-vmcoreinfo
@@ -0,0 +1,14 @@
+What: /sys/kernel/vmcoreinfo
+Date: October 2007
+KernelVersion: 2.6.24
+Contact: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp>
+ Kexec Mailing List <kexec@lists.infradead.org>
+ Vivek Goyal <vgoyal@redhat.com>
+Description
+ Shows physical address and size of vmcoreinfo ELF note.
+ First value contains physical address of note in hex and
+ second value contains the size of note in hex. This ELF
+ note info is parsed by second kernel and exported to user
+ space as part of ELF note in /proc/vmcore file. This note
+ contains various information like struct size, symbol
+ values, page size etc.
diff --git a/Documentation/blockdev/ramdisk.txt b/Documentation/blockdev/ramdisk.txt
index fa72e97dd669..fe2ef978d85a 100644
--- a/Documentation/blockdev/ramdisk.txt
+++ b/Documentation/blockdev/ramdisk.txt
@@ -36,21 +36,30 @@ allowing one to squeeze more programs onto an average installation or
rescue floppy disk.
-2) Kernel Command Line Parameters
+2) Parameters
---------------------------------
+2a) Kernel Command Line Parameters
+
ramdisk_size=N
==============
This parameter tells the RAM disk driver to set up RAM disks of N k size. The
-default is 4096 (4 MB) (8192 (8 MB) on S390).
+default is 4096 (4 MB).
+
+2b) Module parameters
- ramdisk_blocksize=N
- ===================
+ rd_nr
+ =====
+ /dev/ramX devices created.
-This parameter tells the RAM disk driver how many bytes to use per block. The
-default is 1024 (BLOCK_SIZE).
+ max_part
+ ========
+ Maximum partition number.
+ rd_size
+ =======
+ See ramdisk_size.
3) Using "rdev -r"
------------------
diff --git a/Documentation/cpu-hotplug.txt b/Documentation/cpu-hotplug.txt
index 8cb9938cc47e..be675d2d15a7 100644
--- a/Documentation/cpu-hotplug.txt
+++ b/Documentation/cpu-hotplug.txt
@@ -285,7 +285,7 @@ A: This is what you would need in your kernel code to receive notifications.
return NOTIFY_OK;
}
- static struct notifier_block foobar_cpu_notifer =
+ static struct notifier_block foobar_cpu_notifier =
{
.notifier_call = foobar_cpu_callback,
};
diff --git a/Documentation/devicetree/bindings/rtc/haoyu,hym8563.txt b/Documentation/devicetree/bindings/rtc/haoyu,hym8563.txt
new file mode 100644
index 000000000000..31406fd4a43e
--- /dev/null
+++ b/Documentation/devicetree/bindings/rtc/haoyu,hym8563.txt
@@ -0,0 +1,27 @@
+Haoyu Microelectronics HYM8563 Real Time Clock
+
+The HYM8563 provides basic rtc and alarm functionality
+as well as a clock output of up to 32kHz.
+
+Required properties:
+- compatible: should be: "haoyu,hym8563"
+- reg: i2c address
+- interrupts: rtc alarm/event interrupt
+- #clock-cells: the value should be 0
+
+Example:
+
+hym8563: hym8563@51 {
+ compatible = "haoyu,hym8563";
+ reg = <0x51>;
+
+ interrupts = <13 IRQ_TYPE_EDGE_FALLING>;
+
+ #clock-cells = <0>;
+};
+
+device {
+...
+ clocks = <&hym8563>;
+...
+};
diff --git a/Documentation/devicetree/bindings/rtc/maxim,ds1742.txt b/Documentation/devicetree/bindings/rtc/maxim,ds1742.txt
new file mode 100644
index 000000000000..d0f937c355b5
--- /dev/null
+++ b/Documentation/devicetree/bindings/rtc/maxim,ds1742.txt
@@ -0,0 +1,12 @@
+* Maxim (Dallas) DS1742/DS1743 Real Time Clock
+
+Required properties:
+- compatible: Should contain "maxim,ds1742".
+- reg: Physical base address of the RTC and length of memory
+ mapped region.
+
+Example:
+ rtc: rtc@10000000 {
+ compatible = "maxim,ds1742";
+ reg = <0x10000000 0x800>;
+ };
diff --git a/Documentation/devicetree/bindings/vendor-prefixes.txt b/Documentation/devicetree/bindings/vendor-prefixes.txt
index ff415d183352..520596da7953 100644
--- a/Documentation/devicetree/bindings/vendor-prefixes.txt
+++ b/Documentation/devicetree/bindings/vendor-prefixes.txt
@@ -34,6 +34,7 @@ fsl Freescale Semiconductor
GEFanuc GE Fanuc Intelligent Platforms Embedded Systems, Inc.
gef GE Fanuc Intelligent Platforms Embedded Systems, Inc.
gmt Global Mixed-mode Technology, Inc.
+haoyu Haoyu Microelectronic Co. Ltd.
hisilicon Hisilicon Limited.
hp Hewlett Packard
ibm International Business Machines (IBM)
diff --git a/Documentation/dynamic-debug-howto.txt b/Documentation/dynamic-debug-howto.txt
index 1bbdcfcf1f13..46325eb2ea76 100644
--- a/Documentation/dynamic-debug-howto.txt
+++ b/Documentation/dynamic-debug-howto.txt
@@ -108,6 +108,12 @@ If your query set is big, you can batch them too:
~# cat query-batch-file > <debugfs>/dynamic_debug/control
+A another way is to use wildcard. The match rule support '*' (matches
+zero or more characters) and '?' (matches exactly one character).For
+example, you can match all usb drivers:
+
+ ~# echo "file drivers/usb/* +p" > <debugfs>/dynamic_debug/control
+
At the syntactical level, a command comprises a sequence of match
specifications, followed by a flags change specification.
@@ -315,6 +321,9 @@ nullarbor:~ # echo -n 'func svc_process -p' >
nullarbor:~ # echo -n 'format "nfsd: READ" +p' >
<debugfs>/dynamic_debug/control
+// enable messages in files of which the pathes include string "usb"
+nullarbor:~ # echo -n '*usb* +p' > <debugfs>/dynamic_debug/control
+
// enable all messages
nullarbor:~ # echo -n '+p' > <debugfs>/dynamic_debug/control
diff --git a/Documentation/filesystems/00-INDEX b/Documentation/filesystems/00-INDEX
index 8042050eb265..632211cbdd56 100644
--- a/Documentation/filesystems/00-INDEX
+++ b/Documentation/filesystems/00-INDEX
@@ -10,24 +10,32 @@ afs.txt
- info and examples for the distributed AFS (Andrew File System) fs.
affs.txt
- info and mount options for the Amiga Fast File System.
+autofs4-mount-control.txt
+ - info on device control operations for autofs4 module.
automount-support.txt
- information about filesystem automount support.
befs.txt
- information about the BeOS filesystem for Linux.
bfs.txt
- info for the SCO UnixWare Boot Filesystem (BFS).
+btrfs.txt
+ - info for the BTRFS filesystem.
+caching/
+ - directory containing filesystem cache documentation.
ceph.txt
- - info for the Ceph Distributed File System
-cifs.txt
- - description of the CIFS filesystem.
+ - info for the Ceph Distributed File System.
+cifs/
+ - directory containing CIFS filesystem documentation and example code.
coda.txt
- description of the CODA filesystem.
configfs/
- directory containing configfs documentation and example code.
cramfs.txt
- info on the cram filesystem for small storage (ROMs etc).
-dentry-locking.txt
- - info on the RCU-based dcache locking model.
+debugfs.txt
+ - info on the debugfs filesystem.
+devpts.txt
+ - info on the devpts filesystem.
directory-locking
- info about the locking scheme used for directory operations.
dlmfs.txt
@@ -35,7 +43,7 @@ dlmfs.txt
dnotify.txt
- info about directory notification in Linux.
dnotify_test.c
- - example program for dnotify
+ - example program for dnotify.
ecryptfs.txt
- docs on eCryptfs: stacked cryptographic filesystem for Linux.
efivarfs.txt
@@ -48,12 +56,18 @@ ext3.txt
- info, mount options and specifications for the Ext3 filesystem.
ext4.txt
- info, mount options and specifications for the Ext4 filesystem.
-files.txt
- - info on file management in the Linux kernel.
f2fs.txt
- info and mount options for the F2FS filesystem.
+fiemap.txt
+ - info on fiemap ioctl.
+files.txt
+ - info on file management in the Linux kernel.
fuse.txt
- info on the Filesystem in User SpacE including mount options.
+gfs2-glocks.txt
+ - info on the Global File System 2 - Glock internal locking rules.
+gfs2-uevents.txt
+ - info on the Global File System 2 - uevents.
gfs2.txt
- info on the Global File System 2.
hfs.txt
@@ -84,40 +98,58 @@ ntfs.txt
- info and mount options for the NTFS filesystem (Windows NT).
ocfs2.txt
- info and mount options for the OCFS2 clustered filesystem.
+omfs.txt
+ - info on the Optimized MPEG FileSystem.
+path-lookup.txt
+ - info on path walking and name lookup locking.
+pohmelfs/
+ - directory containing pohmelfs filesystem documentation.
porting
- various information on filesystem porting.
proc.txt
- info on Linux's /proc filesystem.
+qnx6.txt
+ - info on the QNX6 filesystem.
+quota.txt
+ - info on Quota subsystem.
ramfs-rootfs-initramfs.txt
- info on the 'in memory' filesystems ramfs, rootfs and initramfs.
-reiser4.txt
- - info on the Reiser4 filesystem based on dancing tree algorithms.
relay.txt
- info on relay, for efficient streaming from kernel to user space.
romfs.txt
- description of the ROMFS filesystem.
seq_file.txt
- - how to use the seq_file API
+ - how to use the seq_file API.
sharedsubtree.txt
- a description of shared subtrees for namespaces.
spufs.txt
- info and mount options for the SPU filesystem used on Cell.
+squashfs.txt
+ - info on the squashfs filesystem.
sysfs-pci.txt
- info on accessing PCI device resources through sysfs.
+sysfs-tagging.txt
+ - info on sysfs tagging to avoid duplicates.
sysfs.txt
- info on sysfs, a ram-based filesystem for exporting kernel objects.
sysv-fs.txt
- info on the SystemV/V7/Xenix/Coherent filesystem.
tmpfs.txt
- info on tmpfs, a filesystem that holds all files in virtual memory.
+ubifs.txt
+ - info on the Unsorted Block Images FileSystem.
udf.txt
- info and mount options for the UDF filesystem.
ufs.txt
- info on the ufs filesystem.
vfat.txt
- - info on using the VFAT filesystem used in Windows NT and Windows 95
+ - info on using the VFAT filesystem used in Windows NT and Windows 95.
vfs.txt
- - overview of the Virtual File System
+ - overview of the Virtual File System.
+xfs-delayed-logging-design.txt
+ - info on the XFS Delayed Logging Design.
+xfs-self-describing-metadata.txt
+ - info on XFS Self Describing Metadata.
xfs.txt
- info and mount options for the XFS filesystem.
xip.txt
diff --git a/Documentation/filesystems/nilfs2.txt b/Documentation/filesystems/nilfs2.txt
index 873a2ab2e9f8..06887d46ccf2 100644
--- a/Documentation/filesystems/nilfs2.txt
+++ b/Documentation/filesystems/nilfs2.txt
@@ -81,6 +81,62 @@ nodiscard(*) The discard/TRIM commands are sent to the underlying
block device when blocks are freed. This is useful
for SSD devices and sparse/thinly-provisioned LUNs.
+Ioctls
+======
+
+There is some NILFS2 specific functionality which can be accessed by applications
+through the system call interfaces. The list of all NILFS2 specific ioctls are
+shown in the table below.
+
+Table of NILFS2 specific ioctls
+..............................................................................
+ Ioctl Description
+ NILFS_IOCTL_CHANGE_CPMODE Change mode of given checkpoint between
+ checkpoint and snapshot state. This ioctl is
+ used in chcp and mkcp utilities.
+
+ NILFS_IOCTL_DELETE_CHECKPOINT Remove checkpoint from NILFS2 file system.
+ This ioctl is used in rmcp utility.
+
+ NILFS_IOCTL_GET_CPINFO Return info about requested checkpoints. This
+ ioctl is used in lscp utility and by
+ nilfs_cleanerd daemon.
+
+ NILFS_IOCTL_GET_CPSTAT Return checkpoints statistics. This ioctl is
+ used by lscp, rmcp utilities and by
+ nilfs_cleanerd daemon.
+
+ NILFS_IOCTL_GET_SUINFO Return segment usage info about requested
+ segments. This ioctl is used in lssu,
+ nilfs_resize utilities and by nilfs_cleanerd
+ daemon.
+
+ NILFS_IOCTL_GET_SUSTAT Return segment usage statistics. This ioctl
+ is used in lssu, nilfs_resize utilities and
+ by nilfs_cleanerd daemon.
+
+ NILFS_IOCTL_GET_VINFO Return information on virtual block addresses.
+ This ioctl is used by nilfs_cleanerd daemon.
+
+ NILFS_IOCTL_GET_BDESCS Return information about descriptors of disk
+ block numbers. This ioctl is used by
+ nilfs_cleanerd daemon.
+
+ NILFS_IOCTL_CLEAN_SEGMENTS Do garbage collection operation in the
+ environment of requested parameters from
+ userspace. This ioctl is used by
+ nilfs_cleanerd daemon.
+
+ NILFS_IOCTL_SYNC Make a checkpoint. This ioctl is used in
+ mkcp utility.
+
+ NILFS_IOCTL_RESIZE Resize NILFS2 volume. This ioctl is used
+ by nilfs_resize utility.
+
+ NILFS_IOCTL_SET_ALLOC_RANGE Define lower limit of segments in bytes and
+ upper limit of segments in bytes. This ioctl
+ is used by nilfs_resize utility.
+
NILFS2 usage
============
diff --git a/Documentation/filesystems/sysfs.txt b/Documentation/filesystems/sysfs.txt
index a6619b7064b9..b35a64b82f9e 100644
--- a/Documentation/filesystems/sysfs.txt
+++ b/Documentation/filesystems/sysfs.txt
@@ -108,12 +108,12 @@ static DEVICE_ATTR(foo, S_IWUSR | S_IRUGO, show_foo, store_foo);
is equivalent to doing:
static struct device_attribute dev_attr_foo = {
- .attr = {
+ .attr = {
.name = "foo",
.mode = S_IWUSR | S_IRUGO,
- .show = show_foo,
- .store = store_foo,
},
+ .show = show_foo,
+ .store = store_foo,
};
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index d4762d7ebd14..44738564b2ee 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1059,7 +1059,9 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
debugfs files are removed at module unload time.
gpt [EFI] Forces disk with valid GPT signature but
- invalid Protective MBR to be treated as GPT.
+ invalid Protective MBR to be treated as GPT. If the
+ primary GPT is corrupted, it enables the backup/alternate
+ GPT to be used instead.
grcan.enable0= [HW] Configuration of physical interface 0. Determines
the "Enable 0" bit of the configuration register.
@@ -1461,6 +1463,13 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
Valid arguments: on, off
Default: on
+ kmemcheck= [X86] Boot-time kmemcheck enable/disable/one-shot mode
+ Valid arguments: 0, 1, 2
+ kmemcheck=0 (disabled)
+ kmemcheck=1 (enabled)
+ kmemcheck=2 (one-shot mode)
+ Default: 2 (one-shot mode)
+
kstack=N [X86] Print N words from the kernel stack
in oops dumps.
diff --git a/Documentation/printk-formats.txt b/Documentation/printk-formats.txt
index 445ad743ec81..6f4eb322ffaf 100644
--- a/Documentation/printk-formats.txt
+++ b/Documentation/printk-formats.txt
@@ -55,14 +55,21 @@ Struct Resources:
For printing struct resources. The 'R' and 'r' specifiers result in a
printed resource with ('R') or without ('r') a decoded flags member.
-Physical addresses:
+Physical addresses types phys_addr_t:
- %pa 0x01234567 or 0x0123456789abcdef
+ %pa[p] 0x01234567 or 0x0123456789abcdef
For printing a phys_addr_t type (and its derivatives, such as
resource_size_t) which can vary based on build options, regardless of
the width of the CPU data path. Passed by reference.
+DMA addresses types dma_addr_t:
+
+ %pad 0x01234567 or 0x0123456789abcdef
+
+ For printing a dma_addr_t type which can vary based on build options,
+ regardless of the width of the CPU data path. Passed by reference.
+
Raw buffer as a hex string:
%*ph 00 01 02 ... 3f
%*phC 00:01:02: ... :3f
diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index 6d486404200e..ee9a2f983b99 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -33,6 +33,7 @@ show up in /proc/sys/kernel:
- domainname
- hostname
- hotplug
+- kexec_load_disabled
- kptr_restrict
- kstack_depth_to_print [ X86 only ]
- l2cr [ PPC only ]
@@ -287,6 +288,18 @@ Default value is "/sbin/hotplug".
==============================================================
+kexec_load_disabled:
+
+A toggle indicating if the kexec_load syscall has been disabled. This
+value defaults to 0 (false: kexec_load enabled), but can be set to 1
+(true: kexec_load disabled). Once true, kexec can no longer be used, and
+the toggle cannot be set back to false. This allows a kexec image to be
+loaded before disabling the syscall, allowing a system to set up (and
+later use) an image without it being altered. Generally used together
+with the "modules_disabled" sysctl.
+
+==============================================================
+
kptr_restrict:
This toggle indicates whether restrictions are placed on
@@ -331,7 +344,7 @@ A toggle value indicating if modules are allowed to be loaded
in an otherwise modular kernel. This toggle defaults to off
(0), but can be set true (1). Once true, modules can be
neither loaded nor unloaded, and the toggle cannot be set back
-to false.
+to false. Generally used with the "kexec_load_disabled" toggle.
==============================================================
diff --git a/Documentation/trace/postprocess/trace-vmscan-postprocess.pl b/Documentation/trace/postprocess/trace-vmscan-postprocess.pl
index 4a37c4759cd2..00e425faa2fd 100644
--- a/Documentation/trace/postprocess/trace-vmscan-postprocess.pl
+++ b/Documentation/trace/postprocess/trace-vmscan-postprocess.pl
@@ -123,7 +123,7 @@ my $regex_writepage;
# Static regex used. Specified like this for readability and for use with /o
# (process_pid) (cpus ) ( time ) (tpoint ) (details)
-my $regex_traceevent = '\s*([a-zA-Z0-9-]*)\s*(\[[0-9]*\])\s*([0-9.]*):\s*([a-zA-Z_]*):\s*(.*)';
+my $regex_traceevent = '\s*([a-zA-Z0-9-]*)\s*(\[[0-9]*\])(\s*[dX.][Nnp.][Hhs.][0-9a-fA-F.]*|)\s*([0-9.]*):\s*([a-zA-Z_]*):\s*(.*)';
my $regex_statname = '[-0-9]*\s\((.*)\).*';
my $regex_statppid = '[-0-9]*\s\(.*\)\s[A-Za-z]\s([0-9]*).*';
@@ -270,8 +270,8 @@ EVENT_PROCESS:
while ($traceevent = <STDIN>) {
if ($traceevent =~ /$regex_traceevent/o) {
$process_pid = $1;
- $timestamp = $3;
- $tracepoint = $4;
+ $timestamp = $4;
+ $tracepoint = $5;
$process_pid =~ /(.*)-([0-9]*)$/;
my $process = $1;
@@ -299,7 +299,7 @@ EVENT_PROCESS:
$perprocesspid{$process_pid}->{MM_VMSCAN_DIRECT_RECLAIM_BEGIN}++;
$perprocesspid{$process_pid}->{STATE_DIRECT_BEGIN} = $timestamp;
- $details = $5;
+ $details = $6;
if ($details !~ /$regex_direct_begin/o) {
print "WARNING: Failed to parse mm_vmscan_direct_reclaim_begin as expected\n";
print " $details\n";
@@ -322,7 +322,7 @@ EVENT_PROCESS:
$perprocesspid{$process_pid}->{HIGH_DIRECT_RECLAIM_LATENCY}[$index] = "$order-$latency";
}
} elsif ($tracepoint eq "mm_vmscan_kswapd_wake") {
- $details = $5;
+ $details = $6;
if ($details !~ /$regex_kswapd_wake/o) {
print "WARNING: Failed to parse mm_vmscan_kswapd_wake as expected\n";
print " $details\n";
@@ -356,7 +356,7 @@ EVENT_PROCESS:
} elsif ($tracepoint eq "mm_vmscan_wakeup_kswapd") {
$perprocesspid{$process_pid}->{MM_VMSCAN_WAKEUP_KSWAPD}++;
- $details = $5;
+ $details = $6;
if ($details !~ /$regex_wakeup_kswapd/o) {
print "WARNING: Failed to parse mm_vmscan_wakeup_kswapd as expected\n";
print " $details\n";
@@ -366,7 +366,7 @@ EVENT_PROCESS:
my $order = $3;
$perprocesspid{$process_pid}->{MM_VMSCAN_WAKEUP_KSWAPD_PERORDER}[$order]++;
} elsif ($tracepoint eq "mm_vmscan_lru_isolate") {
- $details = $5;
+ $details = $6;
if ($details !~ /$regex_lru_isolate/o) {
print "WARNING: Failed to parse mm_vmscan_lru_isolate as expected\n";
print " $details\n";
@@ -387,7 +387,7 @@ EVENT_PROCESS:
}
$perprocesspid{$process_pid}->{HIGH_NR_CONTIG_DIRTY} += $nr_contig_dirty;
} elsif ($tracepoint eq "mm_vmscan_lru_shrink_inactive") {
- $details = $5;
+ $details = $6;
if ($details !~ /$regex_lru_shrink_inactive/o) {
print "WARNING: Failed to parse mm_vmscan_lru_shrink_inactive as expected\n";
print " $details\n";
@@ -397,7 +397,7 @@ EVENT_PROCESS:
my $nr_reclaimed = $4;
$perprocesspid{$process_pid}->{HIGH_NR_RECLAIMED} += $nr_reclaimed;
} elsif ($tracepoint eq "mm_vmscan_writepage") {
- $details = $5;
+ $details = $6;
if ($details !~ /$regex_writepage/o) {
print "WARNING: Failed to parse mm_vmscan_writepage as expected\n";
print " $details\n";
diff --git a/Documentation/vm/locking b/Documentation/vm/locking
deleted file mode 100644
index f61228bd6395..000000000000
--- a/Documentation/vm/locking
+++ /dev/null
@@ -1,130 +0,0 @@
-Started Oct 1999 by Kanoj Sarcar <kanojsarcar@yahoo.com>
-
-The intent of this file is to have an uptodate, running commentary
-from different people about how locking and synchronization is done
-in the Linux vm code.
-
-page_table_lock & mmap_sem
---------------------------------------
-
-Page stealers pick processes out of the process pool and scan for
-the best process to steal pages from. To guarantee the existence
-of the victim mm, a mm_count inc and a mmdrop are done in swap_out().
-Page stealers hold kernel_lock to protect against a bunch of races.
-The vma list of the victim mm is also scanned by the stealer,
-and the page_table_lock is used to preserve list sanity against the
-process adding/deleting to the list. This also guarantees existence
-of the vma. Vma existence is not guaranteed once try_to_swap_out()
-drops the page_table_lock. To guarantee the existence of the underlying
-file structure, a get_file is done before the swapout() method is
-invoked. The page passed into swapout() is guaranteed not to be reused
-for a different purpose because the page reference count due to being
-present in the user's pte is not released till after swapout() returns.
-
-Any code that modifies the vmlist, or the vm_start/vm_end/
-vm_flags:VM_LOCKED/vm_next of any vma *in the list* must prevent
-kswapd from looking at the chain.
-
-The rules are:
-1. To scan the vmlist (look but don't touch) you must hold the
- mmap_sem with read bias, i.e. down_read(&mm->mmap_sem)
-2. To modify the vmlist you need to hold the mmap_sem with
- read&write bias, i.e. down_write(&mm->mmap_sem) *AND*
- you need to take the page_table_lock.
-3. The swapper takes _just_ the page_table_lock, this is done
- because the mmap_sem can be an extremely long lived lock
- and the swapper just cannot sleep on that.
-4. The exception to this rule is expand_stack, which just
- takes the read lock and the page_table_lock, this is ok
- because it doesn't really modify fields anybody relies on.
-5. You must be able to guarantee that while holding page_table_lock
- or page_table_lock of mm A, you will not try to get either lock
- for mm B.
-
-The caveats are:
-1. find_vma() makes use of, and updates, the mmap_cache pointer hint.
-The update of mmap_cache is racy (page stealer can race with other code
-that invokes find_vma with mmap_sem held), but that is okay, since it
-is a hint. This can be fixed, if desired, by having find_vma grab the
-page_table_lock.
-
-
-Code that add/delete elements from the vmlist chain are
-1. callers of insert_vm_struct
-2. callers of merge_segments
-3. callers of avl_remove
-
-Code that changes vm_start/vm_end/vm_flags:VM_LOCKED of vma's on
-the list:
-1. expand_stack
-2. mprotect
-3. mlock
-4. mremap
-
-It is advisable that changes to vm_start/vm_end be protected, although
-in some cases it is not really needed. Eg, vm_start is modified by
-expand_stack(), it is hard to come up with a destructive scenario without
-having the vmlist protection in this case.
-
-The page_table_lock nests with the inode i_mmap_mutex and the kmem cache
-c_spinlock spinlocks. This is okay, since the kmem code asks for pages after
-dropping c_spinlock. The page_table_lock also nests with pagecache_lock and
-pagemap_lru_lock spinlocks, and no code asks for memory with these locks
-held.
-
-The page_table_lock is grabbed while holding the kernel_lock spinning monitor.
-
-The page_table_lock is a spin lock.
-
-Note: PTL can also be used to guarantee that no new clones using the
-mm start up ... this is a loose form of stability on mm_users. For
-example, it is used in copy_mm to protect against a racing tlb_gather_mmu
-single address space optimization, so that the zap_page_range (from
-truncate) does not lose sending ipi's to cloned threads that might
-be spawned underneath it and go to user mode to drag in pte's into tlbs.
-
-swap_lock
---------------
-The swap devices are chained in priority order from the "swap_list" header.
-The "swap_list" is used for the round-robin swaphandle allocation strategy.
-The #free swaphandles is maintained in "nr_swap_pages". These two together
-are protected by the swap_lock.
-
-The swap_lock also protects all the device reference counts on the
-corresponding swaphandles, maintained in the "swap_map" array, and the
-"highest_bit" and "lowest_bit" fields.
-
-The swap_lock is a spinlock, and is never acquired from intr level.
-
-To prevent races between swap space deletion or async readahead swapins
-deciding whether a swap handle is being used, ie worthy of being read in
-from disk, and an unmap -> swap_free making the handle unused, the swap
-delete and readahead code grabs a temp reference on the swaphandle to
-prevent warning messages from swap_duplicate <- read_swap_cache_async.
-
-Swap cache locking
-------------------
-Pages are added into the swap cache with kernel_lock held, to make sure
-that multiple pages are not being added (and hence lost) by associating
-all of them with the same swaphandle.
-
-Pages are guaranteed not to be removed from the scache if the page is
-"shared": ie, other processes hold reference on the page or the associated
-swap handle. The only code that does not follow this rule is shrink_mmap,
-which deletes pages from the swap cache if no process has a reference on
-the page (multiple processes might have references on the corresponding
-swap handle though). lookup_swap_cache() races with shrink_mmap, when
-establishing a reference on a scache page, so, it must check whether the
-page it located is still in the swapcache, or shrink_mmap deleted it.
-(This race is due to the fact that shrink_mmap looks at the page ref
-count with pagecache_lock, but then drops pagecache_lock before deleting
-the page from the scache).
-
-do_wp_page and do_swap_page have MP races in them while trying to figure
-out whether a page is "shared", by looking at the page_count + swap_count.
-To preserve the sum of the counts, the page lock _must_ be acquired before
-calling is_page_shared (else processes might switch their swap_count refs
-to the page count refs, after the page count ref has been snapshotted).
-
-Swap device deletion code currently breaks all the scache assumptions,
-since it grabs neither mmap_sem nor page_table_lock.