diff options
Diffstat (limited to 'Documentation/filesystems')
-rw-r--r-- | Documentation/filesystems/Locking | 3 | ||||
-rw-r--r-- | Documentation/filesystems/nfs-rdma.txt | 256 | ||||
-rw-r--r-- | Documentation/filesystems/proc.txt | 59 | ||||
-rw-r--r-- | Documentation/filesystems/seq_file.txt | 19 | ||||
-rw-r--r-- | Documentation/filesystems/tmpfs.txt | 12 | ||||
-rw-r--r-- | Documentation/filesystems/vfat.txt | 15 |
6 files changed, 353 insertions, 11 deletions
diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking index 42d4b30b1045..c2992bc54f2f 100644 --- a/Documentation/filesystems/Locking +++ b/Documentation/filesystems/Locking @@ -511,7 +511,6 @@ prototypes: void (*open)(struct vm_area_struct*); void (*close)(struct vm_area_struct*); int (*fault)(struct vm_area_struct*, struct vm_fault *); - struct page *(*nopage)(struct vm_area_struct*, unsigned long, int *); int (*page_mkwrite)(struct vm_area_struct *, struct page *); locking rules: @@ -519,7 +518,6 @@ locking rules: open: no yes close: no yes fault: no yes -nopage: no yes page_mkwrite: no yes no ->page_mkwrite() is called when a previously read-only page is @@ -537,4 +535,3 @@ NULL. ipc/shm.c::shm_delete() - may need BKL. ->read() and ->write() in many drivers are (probably) missing BKL. -drivers/sgi/char/graphics.c::sgi_graphics_nopage() - may need BKL. diff --git a/Documentation/filesystems/nfs-rdma.txt b/Documentation/filesystems/nfs-rdma.txt new file mode 100644 index 000000000000..d0ec45ae4e7d --- /dev/null +++ b/Documentation/filesystems/nfs-rdma.txt @@ -0,0 +1,256 @@ +################################################################################ +# # +# NFS/RDMA README # +# # +################################################################################ + + Author: NetApp and Open Grid Computing + Date: April 15, 2008 + +Table of Contents +~~~~~~~~~~~~~~~~~ + - Overview + - Getting Help + - Installation + - Check RDMA and NFS Setup + - NFS/RDMA Setup + +Overview +~~~~~~~~ + + This document describes how to install and setup the Linux NFS/RDMA client + and server software. + + The NFS/RDMA client was first included in Linux 2.6.24. The NFS/RDMA server + was first included in the following release, Linux 2.6.25. + + In our testing, we have obtained excellent performance results (full 10Gbit + wire bandwidth at minimal client CPU) under many workloads. The code passes + the full Connectathon test suite and operates over both Infiniband and iWARP + RDMA adapters. + +Getting Help +~~~~~~~~~~~~ + + If you get stuck, you can ask questions on the + + nfs-rdma-devel@lists.sourceforge.net + + mailing list. + +Installation +~~~~~~~~~~~~ + + These instructions are a step by step guide to building a machine for + use with NFS/RDMA. + + - Install an RDMA device + + Any device supported by the drivers in drivers/infiniband/hw is acceptable. + + Testing has been performed using several Mellanox-based IB cards, the + Ammasso AMS1100 iWARP adapter, and the Chelsio cxgb3 iWARP adapter. + + - Install a Linux distribution and tools + + The first kernel release to contain both the NFS/RDMA client and server was + Linux 2.6.25 Therefore, a distribution compatible with this and subsequent + Linux kernel release should be installed. + + The procedures described in this document have been tested with + distributions from Red Hat's Fedora Project (http://fedora.redhat.com/). + + - Install nfs-utils-1.1.1 or greater on the client + + An NFS/RDMA mount point can only be obtained by using the mount.nfs + command in nfs-utils-1.1.1 or greater. To see which version of mount.nfs + you are using, type: + + > /sbin/mount.nfs -V + + If the version is less than 1.1.1 or the command does not exist, + then you will need to install the latest version of nfs-utils. + + Download the latest package from: + + http://www.kernel.org/pub/linux/utils/nfs + + Uncompress the package and follow the installation instructions. + + If you will not be using GSS and NFSv4, the installation process + can be simplified by disabling these features when running configure: + + > ./configure --disable-gss --disable-nfsv4 + + For more information on this see the package's README and INSTALL files. + + After building the nfs-utils package, there will be a mount.nfs binary in + the utils/mount directory. This binary can be used to initiate NFS v2, v3, + or v4 mounts. To initiate a v4 mount, the binary must be called mount.nfs4. + The standard technique is to create a symlink called mount.nfs4 to mount.nfs. + + NOTE: mount.nfs and therefore nfs-utils-1.1.1 or greater is only needed + on the NFS client machine. You do not need this specific version of + nfs-utils on the server. Furthermore, only the mount.nfs command from + nfs-utils-1.1.1 is needed on the client. + + - Install a Linux kernel with NFS/RDMA + + The NFS/RDMA client and server are both included in the mainline Linux + kernel version 2.6.25 and later. This and other versions of the 2.6 Linux + kernel can be found at: + + ftp://ftp.kernel.org/pub/linux/kernel/v2.6/ + + Download the sources and place them in an appropriate location. + + - Configure the RDMA stack + + Make sure your kernel configuration has RDMA support enabled. Under + Device Drivers -> InfiniBand support, update the kernel configuration + to enable InfiniBand support [NOTE: the option name is misleading. Enabling + InfiniBand support is required for all RDMA devices (IB, iWARP, etc.)]. + + Enable the appropriate IB HCA support (mlx4, mthca, ehca, ipath, etc.) or + iWARP adapter support (amso, cxgb3, etc.). + + If you are using InfiniBand, be sure to enable IP-over-InfiniBand support. + + - Configure the NFS client and server + + Your kernel configuration must also have NFS file system support and/or + NFS server support enabled. These and other NFS related configuration + options can be found under File Systems -> Network File Systems. + + - Build, install, reboot + + The NFS/RDMA code will be enabled automatically if NFS and RDMA + are turned on. The NFS/RDMA client and server are configured via the hidden + SUNRPC_XPRT_RDMA config option that depends on SUNRPC and INFINIBAND. The + value of SUNRPC_XPRT_RDMA will be: + + - N if either SUNRPC or INFINIBAND are N, in this case the NFS/RDMA client + and server will not be built + - M if both SUNRPC and INFINIBAND are on (M or Y) and at least one is M, + in this case the NFS/RDMA client and server will be built as modules + - Y if both SUNRPC and INFINIBAND are Y, in this case the NFS/RDMA client + and server will be built into the kernel + + Therefore, if you have followed the steps above and turned no NFS and RDMA, + the NFS/RDMA client and server will be built. + + Build a new kernel, install it, boot it. + +Check RDMA and NFS Setup +~~~~~~~~~~~~~~~~~~~~~~~~ + + Before configuring the NFS/RDMA software, it is a good idea to test + your new kernel to ensure that the kernel is working correctly. + In particular, it is a good idea to verify that the RDMA stack + is functioning as expected and standard NFS over TCP/IP and/or UDP/IP + is working properly. + + - Check RDMA Setup + + If you built the RDMA components as modules, load them at + this time. For example, if you are using a Mellanox Tavor/Sinai/Arbel + card: + + > modprobe ib_mthca + > modprobe ib_ipoib + + If you are using InfiniBand, make sure there is a Subnet Manager (SM) + running on the network. If your IB switch has an embedded SM, you can + use it. Otherwise, you will need to run an SM, such as OpenSM, on one + of your end nodes. + + If an SM is running on your network, you should see the following: + + > cat /sys/class/infiniband/driverX/ports/1/state + 4: ACTIVE + + where driverX is mthca0, ipath5, ehca3, etc. + + To further test the InfiniBand software stack, use IPoIB (this + assumes you have two IB hosts named host1 and host2): + + host1> ifconfig ib0 a.b.c.x + host2> ifconfig ib0 a.b.c.y + host1> ping a.b.c.y + host2> ping a.b.c.x + + For other device types, follow the appropriate procedures. + + - Check NFS Setup + + For the NFS components enabled above (client and/or server), + test their functionality over standard Ethernet using TCP/IP or UDP/IP. + +NFS/RDMA Setup +~~~~~~~~~~~~~~ + + We recommend that you use two machines, one to act as the client and + one to act as the server. + + One time configuration: + + - On the server system, configure the /etc/exports file and + start the NFS/RDMA server. + + Exports entries with the following formats have been tested: + + /vol0 192.168.0.47(fsid=0,rw,async,insecure,no_root_squash) + /vol0 192.168.0.0/255.255.255.0(fsid=0,rw,async,insecure,no_root_squash) + + The IP address(es) is(are) the client's IPoIB address for an InfiniBand HCA or the + cleint's iWARP address(es) for an RNIC. + + NOTE: The "insecure" option must be used because the NFS/RDMA client does not + use a reserved port. + + Each time a machine boots: + + - Load and configure the RDMA drivers + + For InfiniBand using a Mellanox adapter: + + > modprobe ib_mthca + > modprobe ib_ipoib + > ifconfig ib0 a.b.c.d + + NOTE: use unique addresses for the client and server + + - Start the NFS server + + If the NFS/RDMA server was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in kernel config), + load the RDMA transport module: + + > modprobe svcrdma + + Regardless of how the server was built (module or built-in), start the server: + + > /etc/init.d/nfs start + + or + + > service nfs start + + Instruct the server to listen on the RDMA transport: + + > echo rdma 2050 > /proc/fs/nfsd/portlist + + - On the client system + + If the NFS/RDMA client was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in kernel config), + load the RDMA client module: + + > modprobe xprtrdma.ko + + Regardless of how the client was built (module or built-in), issue the mount.nfs command: + + > /path/to/your/mount.nfs <IPoIB-server-name-or-address>:/<export> /mnt -i -o rdma,port=2050 + + To verify that the mount is using RDMA, run "cat /proc/mounts" and check the + "proto" field for the given mount. + + Congratulations! You're using NFS/RDMA! diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index 518ebe609e2b..dbc3c6a3650f 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt @@ -43,6 +43,7 @@ Table of Contents 2.13 /proc/<pid>/oom_score - Display current oom-killer score 2.14 /proc/<pid>/io - Display the IO accounting fields 2.15 /proc/<pid>/coredump_filter - Core dump filtering settings + 2.16 /proc/<pid>/mountinfo - Information about mounts ------------------------------------------------------------------------------ Preface @@ -462,11 +463,17 @@ SwapTotal: 0 kB SwapFree: 0 kB Dirty: 968 kB Writeback: 0 kB +AnonPages: 861800 kB Mapped: 280372 kB -Slab: 684068 kB +Slab: 284364 kB +SReclaimable: 159856 kB +SUnreclaim: 124508 kB +PageTables: 24448 kB +NFS_Unstable: 0 kB +Bounce: 0 kB +WritebackTmp: 0 kB CommitLimit: 7669796 kB Committed_AS: 100056 kB -PageTables: 24448 kB VmallocTotal: 112216 kB VmallocUsed: 428 kB VmallocChunk: 111088 kB @@ -502,8 +509,17 @@ VmallocChunk: 111088 kB on the disk Dirty: Memory which is waiting to get written back to the disk Writeback: Memory which is actively being written back to the disk + AnonPages: Non-file backed pages mapped into userspace page tables Mapped: files which have been mmaped, such as libraries Slab: in-kernel data structures cache +SReclaimable: Part of Slab, that might be reclaimed, such as caches + SUnreclaim: Part of Slab, that cannot be reclaimed on memory pressure + PageTables: amount of memory dedicated to the lowest level of page + tables. +NFS_Unstable: NFS pages sent to the server, but not yet committed to stable + storage + Bounce: Memory used for block device "bounce buffers" +WritebackTmp: Memory used by FUSE for temporary writeback buffers CommitLimit: Based on the overcommit ratio ('vm.overcommit_ratio'), this is the total amount of memory currently available to be allocated on the system. This limit is only adhered to @@ -530,8 +546,6 @@ Committed_AS: The amount of memory presently allocated on the system. above) will not be permitted. This is useful if one needs to guarantee that processes will not fail due to lack of memory once that memory has been successfully allocated. - PageTables: amount of memory dedicated to the lowest level of page - tables. VmallocTotal: total size of vmalloc memory area VmallocUsed: amount of vmalloc area which is used VmallocChunk: largest contigious block of vmalloc area which is free @@ -2348,4 +2362,41 @@ For example: $ echo 0x7 > /proc/self/coredump_filter $ ./some_program +2.16 /proc/<pid>/mountinfo - Information about mounts +-------------------------------------------------------- + +This file contains lines of the form: + +36 35 98:0 /mnt1 /mnt2 rw,noatime master:1 - ext3 /dev/root rw,errors=continue +(1)(2)(3) (4) (5) (6) (7) (8) (9) (10) (11) + +(1) mount ID: unique identifier of the mount (may be reused after umount) +(2) parent ID: ID of parent (or of self for the top of the mount tree) +(3) major:minor: value of st_dev for files on filesystem +(4) root: root of the mount within the filesystem +(5) mount point: mount point relative to the process's root +(6) mount options: per mount options +(7) optional fields: zero or more fields of the form "tag[:value]" +(8) separator: marks the end of the optional fields +(9) filesystem type: name of filesystem of the form "type[.subtype]" +(10) mount source: filesystem specific information or "none" +(11) super options: per super block options + +Parsers should ignore all unrecognised optional fields. Currently the +possible optional fields are: + +shared:X mount is shared in peer group X +master:X mount is slave to peer group X +propagate_from:X mount is slave and receives propagation from peer group X (*) +unbindable mount is unbindable + +(*) X is the closest dominant peer group under the process's root. If +X is the immediate master of the mount, or if there's no dominant peer +group under the same root, then only the "master:X" field is present +and not the "propagate_from:X" field. + +For more information on mount propagation see: + + Documentation/filesystems/sharedsubtree.txt + ------------------------------------------------------------------------------ diff --git a/Documentation/filesystems/seq_file.txt b/Documentation/filesystems/seq_file.txt index 7fb8e6dc62bf..b843743aa0b5 100644 --- a/Documentation/filesystems/seq_file.txt +++ b/Documentation/filesystems/seq_file.txt @@ -122,8 +122,7 @@ stop() is the place to free it. } Finally, the show() function should format the object currently pointed to -by the iterator for output. It should return zero, or an error code if -something goes wrong. The example module's show() function is: +by the iterator for output. The example module's show() function is: static int ct_seq_show(struct seq_file *s, void *v) { @@ -132,6 +131,12 @@ something goes wrong. The example module's show() function is: return 0; } +If all is well, the show() function should return zero. A negative error +code in the usual manner indicates that something went wrong; it will be +passed back to user space. This function can also return SEQ_SKIP, which +causes the current item to be skipped; if the show() function has already +generated output before returning SEQ_SKIP, that output will be dropped. + We will look at seq_printf() in a moment. But first, the definition of the seq_file iterator is finished by creating a seq_operations structure with the four functions we have just defined: @@ -182,12 +187,18 @@ The first two output a single character and a string, just like one would expect. seq_escape() is like seq_puts(), except that any character in s which is in the string esc will be represented in octal form in the output. -There is also a function for printing filenames: +There is also a pair of functions for printing filenames: int seq_path(struct seq_file *m, struct path *path, char *esc); + int seq_path_root(struct seq_file *m, struct path *path, + struct path *root, char *esc) Here, path indicates the file of interest, and esc is a set of characters -which should be escaped in the output. +which should be escaped in the output. A call to seq_path() will output +the path relative to the current process's filesystem root. If a different +root is desired, it can be used with seq_path_root(). Note that, if it +turns out that path cannot be reached from root, the value of root will be +changed in seq_file_root() to a root which *does* work. Making it all work diff --git a/Documentation/filesystems/tmpfs.txt b/Documentation/filesystems/tmpfs.txt index 145e44086358..222437efd75a 100644 --- a/Documentation/filesystems/tmpfs.txt +++ b/Documentation/filesystems/tmpfs.txt @@ -92,6 +92,18 @@ NodeList format is a comma-separated list of decimal numbers and ranges, a range being two hyphen-separated decimal numbers, the smallest and largest node numbers in the range. For example, mpol=bind:0-3,5,7,9-15 +NUMA memory allocation policies have optional flags that can be used in +conjunction with their modes. These optional flags can be specified +when tmpfs is mounted by appending them to the mode before the NodeList. +See Documentation/vm/numa_memory_policy.txt for a list of all available +memory allocation policy mode flags. + + =static is equivalent to MPOL_F_STATIC_NODES + =relative is equivalent to MPOL_F_RELATIVE_NODES + +For example, mpol=bind=static:NodeList, is the equivalent of an +allocation policy of MPOL_BIND | MPOL_F_STATIC_NODES. + Note that trying to mount a tmpfs with an mpol option will fail if the running kernel does not support NUMA; and will fail if its nodelist specifies a node which is not online. If your system relies on that diff --git a/Documentation/filesystems/vfat.txt b/Documentation/filesystems/vfat.txt index fcc123ffa252..2d5e1e582e13 100644 --- a/Documentation/filesystems/vfat.txt +++ b/Documentation/filesystems/vfat.txt @@ -17,6 +17,21 @@ dmask=### -- The permission mask for the directory. fmask=### -- The permission mask for files. The default is the umask of current process. +allow_utime=### -- This option controls the permission check of mtime/atime. + + 20 - If current process is in group of file's group ID, + you can change timestamp. + 2 - Other users can change timestamp. + + The default is set from `dmask' option. (If the directory is + writable, utime(2) is also allowed. I.e. ~dmask & 022) + + Normally utime(2) checks current process is owner of + the file, or it has CAP_FOWNER capability. But FAT + filesystem doesn't have uid/gid on disk, so normal + check is too unflexible. With this option you can + relax it. + codepage=### -- Sets the codepage number for converting to shortname characters on FAT filesystem. By default, FAT_DEFAULT_CODEPAGE setting is used. |