diff options
Diffstat (limited to 'Documentation/admin-guide/nfs')
-rw-r--r-- | Documentation/admin-guide/nfs/fault_injection.rst | 70 | ||||
-rw-r--r-- | Documentation/admin-guide/nfs/index.rst | 15 | ||||
-rw-r--r-- | Documentation/admin-guide/nfs/nfs-client.rst | 141 | ||||
-rw-r--r-- | Documentation/admin-guide/nfs/nfs-idmapper.rst | 78 | ||||
-rw-r--r-- | Documentation/admin-guide/nfs/nfs-rdma.rst | 292 | ||||
-rw-r--r-- | Documentation/admin-guide/nfs/nfsd-admin-interfaces.rst | 40 | ||||
-rw-r--r-- | Documentation/admin-guide/nfs/nfsroot.rst | 364 | ||||
-rw-r--r-- | Documentation/admin-guide/nfs/pnfs-block-server.rst | 42 | ||||
-rw-r--r-- | Documentation/admin-guide/nfs/pnfs-scsi-server.rst | 24 |
9 files changed, 1066 insertions, 0 deletions
diff --git a/Documentation/admin-guide/nfs/fault_injection.rst b/Documentation/admin-guide/nfs/fault_injection.rst new file mode 100644 index 000000000000..eb029c0c15ce --- /dev/null +++ b/Documentation/admin-guide/nfs/fault_injection.rst @@ -0,0 +1,70 @@ +=================== +NFS Fault Injection +=================== + +Fault injection is a method for forcing errors that may not normally occur, or +may be difficult to reproduce. Forcing these errors in a controlled environment +can help the developer find and fix bugs before their code is shipped in a +production system. Injecting an error on the Linux NFS server will allow us to +observe how the client reacts and if it manages to recover its state correctly. + +NFSD_FAULT_INJECTION must be selected when configuring the kernel to use this +feature. + + +Using Fault Injection +===================== +On the client, mount the fault injection server through NFS v4.0+ and do some +work over NFS (open files, take locks, ...). + +On the server, mount the debugfs filesystem to <debug_dir> and ls +<debug_dir>/nfsd. This will show a list of files that will be used for +injecting faults on the NFS server. As root, write a number n to the file +corresponding to the action you want the server to take. The server will then +process the first n items it finds. So if you want to forget 5 locks, echo '5' +to <debug_dir>/nfsd/forget_locks. A value of 0 will tell the server to forget +all corresponding items. A log message will be created containing the number +of items forgotten (check dmesg). + +Go back to work on the client and check if the client recovered from the error +correctly. + + +Available Faults +================ +forget_clients: + The NFS server keeps a list of clients that have placed a mount call. If + this list is cleared, the server will have no knowledge of who the client + is, forcing the client to reauthenticate with the server. + +forget_openowners: + The NFS server keeps a list of what files are currently opened and who + they were opened by. Clearing this list will force the client to reopen + its files. + +forget_locks: + The NFS server keeps a list of what files are currently locked in the VFS. + Clearing this list will force the client to reclaim its locks (files are + unlocked through the VFS as they are cleared from this list). + +forget_delegations: + A delegation is used to assure the client that a file, or part of a file, + has not changed since the delegation was awarded. Clearing this list will + force the client to reacquire its delegation before accessing the file + again. + +recall_delegations: + Delegations can be recalled by the server when another client attempts to + access a file. This test will notify the client that its delegation has + been revoked, forcing the client to reacquire the delegation before using + the file again. + + +tools/nfs/inject_faults.sh script +================================= +This script has been created to ease the fault injection process. This script +will detect the mounted debugfs directory and write to the files located there +based on the arguments passed by the user. For example, running +`inject_faults.sh forget_locks 1` as root will instruct the server to forget +one lock. Running `inject_faults forget_locks` will instruct the server to +forgetall locks. diff --git a/Documentation/admin-guide/nfs/index.rst b/Documentation/admin-guide/nfs/index.rst new file mode 100644 index 000000000000..6b5a3c90fac5 --- /dev/null +++ b/Documentation/admin-guide/nfs/index.rst @@ -0,0 +1,15 @@ +============= +NFS +============= + +.. toctree:: + :maxdepth: 1 + + nfs-client + nfsroot + nfs-rdma + nfsd-admin-interfaces + nfs-idmapper + pnfs-block-server + pnfs-scsi-server + fault_injection diff --git a/Documentation/admin-guide/nfs/nfs-client.rst b/Documentation/admin-guide/nfs/nfs-client.rst new file mode 100644 index 000000000000..c4b777c7584b --- /dev/null +++ b/Documentation/admin-guide/nfs/nfs-client.rst @@ -0,0 +1,141 @@ +========== +NFS Client +========== + +The NFS client +============== + +The NFS version 2 protocol was first documented in RFC1094 (March 1989). +Since then two more major releases of NFS have been published, with NFSv3 +being documented in RFC1813 (June 1995), and NFSv4 in RFC3530 (April +2003). + +The Linux NFS client currently supports all the above published versions, +and work is in progress on adding support for minor version 1 of the NFSv4 +protocol. + +The purpose of this document is to provide information on some of the +special features of the NFS client that can be configured by system +administrators. + + +The nfs4_unique_id parameter +============================ + +NFSv4 requires clients to identify themselves to servers with a unique +string. File open and lock state shared between one client and one server +is associated with this identity. To support robust NFSv4 state recovery +and transparent state migration, this identity string must not change +across client reboots. + +Without any other intervention, the Linux client uses a string that contains +the local system's node name. System administrators, however, often do not +take care to ensure that node names are fully qualified and do not change +over the lifetime of a client system. Node names can have other +administrative requirements that require particular behavior that does not +work well as part of an nfs_client_id4 string. + +The nfs.nfs4_unique_id boot parameter specifies a unique string that can be +used instead of a system's node name when an NFS client identifies itself to +a server. Thus, if the system's node name is not unique, or it changes, its +nfs.nfs4_unique_id stays the same, preventing collision with other clients +or loss of state during NFS reboot recovery or transparent state migration. + +The nfs.nfs4_unique_id string is typically a UUID, though it can contain +anything that is believed to be unique across all NFS clients. An +nfs4_unique_id string should be chosen when a client system is installed, +just as a system's root file system gets a fresh UUID in its label at +install time. + +The string should remain fixed for the lifetime of the client. It can be +changed safely if care is taken that the client shuts down cleanly and all +outstanding NFSv4 state has expired, to prevent loss of NFSv4 state. + +This string can be stored in an NFS client's grub.conf, or it can be provided +via a net boot facility such as PXE. It may also be specified as an nfs.ko +module parameter. Specifying a uniquifier string is not support for NFS +clients running in containers. + + +The DNS resolver +================ + +NFSv4 allows for one server to refer the NFS client to data that has been +migrated onto another server by means of the special "fs_locations" +attribute. See `RFC3530 Section 6: Filesystem Migration and Replication`_ and +`Implementation Guide for Referrals in NFSv4`_. + +.. _RFC3530 Section 6\: Filesystem Migration and Replication: http://tools.ietf.org/html/rfc3530#section-6 +.. _Implementation Guide for Referrals in NFSv4: http://tools.ietf.org/html/draft-ietf-nfsv4-referrals-00 + +The fs_locations information can take the form of either an ip address and +a path, or a DNS hostname and a path. The latter requires the NFS client to +do a DNS lookup in order to mount the new volume, and hence the need for an +upcall to allow userland to provide this service. + +Assuming that the user has the 'rpc_pipefs' filesystem mounted in the usual +/var/lib/nfs/rpc_pipefs, the upcall consists of the following steps: + + (1) The process checks the dns_resolve cache to see if it contains a + valid entry. If so, it returns that entry and exits. + + (2) If no valid entry exists, the helper script '/sbin/nfs_cache_getent' + (may be changed using the 'nfs.cache_getent' kernel boot parameter) + is run, with two arguments: + - the cache name, "dns_resolve" + - the hostname to resolve + + (3) After looking up the corresponding ip address, the helper script + writes the result into the rpc_pipefs pseudo-file + '/var/lib/nfs/rpc_pipefs/cache/dns_resolve/channel' + in the following (text) format: + + "<ip address> <hostname> <ttl>\n" + + Where <ip address> is in the usual IPv4 (123.456.78.90) or IPv6 + (ffee:ddcc:bbaa:9988:7766:5544:3322:1100, ffee::1100, ...) format. + <hostname> is identical to the second argument of the helper + script, and <ttl> is the 'time to live' of this cache entry (in + units of seconds). + + .. note:: + If <ip address> is invalid, say the string "0", then a negative + entry is created, which will cause the kernel to treat the hostname + as having no valid DNS translation. + + + + +A basic sample /sbin/nfs_cache_getent +===================================== +.. code-block:: sh + + #!/bin/bash + # + ttl=600 + # + cut=/usr/bin/cut + getent=/usr/bin/getent + rpc_pipefs=/var/lib/nfs/rpc_pipefs + # + die() + { + echo "Usage: $0 cache_name entry_name" + exit 1 + } + + [ $# -lt 2 ] && die + cachename="$1" + cache_path=${rpc_pipefs}/cache/${cachename}/channel + + case "${cachename}" in + dns_resolve) + name="$2" + result="$(${getent} hosts ${name} | ${cut} -f1 -d\ )" + [ -z "${result}" ] && result="0" + ;; + *) + die + ;; + esac + echo "${result} ${name} ${ttl}" >${cache_path} diff --git a/Documentation/admin-guide/nfs/nfs-idmapper.rst b/Documentation/admin-guide/nfs/nfs-idmapper.rst new file mode 100644 index 000000000000..58b8e63412d5 --- /dev/null +++ b/Documentation/admin-guide/nfs/nfs-idmapper.rst @@ -0,0 +1,78 @@ +============= +NFS ID Mapper +============= + +Id mapper is used by NFS to translate user and group ids into names, and to +translate user and group names into ids. Part of this translation involves +performing an upcall to userspace to request the information. There are two +ways NFS could obtain this information: placing a call to /sbin/request-key +or by placing a call to the rpc.idmap daemon. + +NFS will attempt to call /sbin/request-key first. If this succeeds, the +result will be cached using the generic request-key cache. This call should +only fail if /etc/request-key.conf is not configured for the id_resolver key +type, see the "Configuring" section below if you wish to use the request-key +method. + +If the call to /sbin/request-key fails (if /etc/request-key.conf is not +configured with the id_resolver key type), then the idmapper will ask the +legacy rpc.idmap daemon for the id mapping. This result will be stored +in a custom NFS idmap cache. + + +Configuring +=========== + +The file /etc/request-key.conf will need to be modified so /sbin/request-key can +direct the upcall. The following line should be added: + +``#OP TYPE DESCRIPTION CALLOUT INFO PROGRAM ARG1 ARG2 ARG3 ...`` +``#====== ======= =============== =============== ===============================`` +``create id_resolver * * /usr/sbin/nfs.idmap %k %d 600`` + + +This will direct all id_resolver requests to the program /usr/sbin/nfs.idmap. +The last parameter, 600, defines how many seconds into the future the key will +expire. This parameter is optional for /usr/sbin/nfs.idmap. When the timeout +is not specified, nfs.idmap will default to 600 seconds. + +id mapper uses for key descriptions:: + + uid: Find the UID for the given user + gid: Find the GID for the given group + user: Find the user name for the given UID + group: Find the group name for the given GID + +You can handle any of these individually, rather than using the generic upcall +program. If you would like to use your own program for a uid lookup then you +would edit your request-key.conf so it look similar to this: + +``#OP TYPE DESCRIPTION CALLOUT INFO PROGRAM ARG1 ARG2 ARG3 ...`` +``#====== ======= =============== =============== ===============================`` +``create id_resolver uid:* * /some/other/program %k %d 600`` +``create id_resolver * * /usr/sbin/nfs.idmap %k %d 600`` + + +Notice that the new line was added above the line for the generic program. +request-key will find the first matching line and corresponding program. In +this case, /some/other/program will handle all uid lookups and +/usr/sbin/nfs.idmap will handle gid, user, and group lookups. + +See Documentation/security/keys/request-key.rst for more information +about the request-key function. + + +nfs.idmap +========= + +nfs.idmap is designed to be called by request-key, and should not be run "by +hand". This program takes two arguments, a serialized key and a key +description. The serialized key is first converted into a key_serial_t, and +then passed as an argument to keyctl_instantiate (both are part of keyutils.h). + +The actual lookups are performed by functions found in nfsidmap.h. nfs.idmap +determines the correct function to call by looking at the first part of the +description string. For example, a uid lookup description will appear as +"uid:user@domain". + +nfs.idmap will return 0 if the key was instantiated, and non-zero otherwise. diff --git a/Documentation/admin-guide/nfs/nfs-rdma.rst b/Documentation/admin-guide/nfs/nfs-rdma.rst new file mode 100644 index 000000000000..ef0f3678b1fb --- /dev/null +++ b/Documentation/admin-guide/nfs/nfs-rdma.rst @@ -0,0 +1,292 @@ +=================== +Setting up NFS/RDMA +=================== + +:Author: + NetApp and Open Grid Computing (May 29, 2008) + +.. warning:: + This document is probably obsolete. + +Overview +======== + +This document describes how to install and setup the Linux NFS/RDMA client +and server software. + +The NFS/RDMA client was first included in Linux 2.6.24. The NFS/RDMA server +was first included in the following release, Linux 2.6.25. + +In our testing, we have obtained excellent performance results (full 10Gbit +wire bandwidth at minimal client CPU) under many workloads. The code passes +the full Connectathon test suite and operates over both Infiniband and iWARP +RDMA adapters. + +Getting Help +============ + +If you get stuck, you can ask questions on the +nfs-rdma-devel@lists.sourceforge.net mailing list. + +Installation +============ + +These instructions are a step by step guide to building a machine for +use with NFS/RDMA. + +- Install an RDMA device + + Any device supported by the drivers in drivers/infiniband/hw is acceptable. + + Testing has been performed using several Mellanox-based IB cards, the + Ammasso AMS1100 iWARP adapter, and the Chelsio cxgb3 iWARP adapter. + +- Install a Linux distribution and tools + + The first kernel release to contain both the NFS/RDMA client and server was + Linux 2.6.25 Therefore, a distribution compatible with this and subsequent + Linux kernel release should be installed. + + The procedures described in this document have been tested with + distributions from Red Hat's Fedora Project (http://fedora.redhat.com/). + +- Install nfs-utils-1.1.2 or greater on the client + + An NFS/RDMA mount point can be obtained by using the mount.nfs command in + nfs-utils-1.1.2 or greater (nfs-utils-1.1.1 was the first nfs-utils + version with support for NFS/RDMA mounts, but for various reasons we + recommend using nfs-utils-1.1.2 or greater). To see which version of + mount.nfs you are using, type: + + .. code-block:: sh + + $ /sbin/mount.nfs -V + + If the version is less than 1.1.2 or the command does not exist, + you should install the latest version of nfs-utils. + + Download the latest package from: http://www.kernel.org/pub/linux/utils/nfs + + Uncompress the package and follow the installation instructions. + + If you will not need the idmapper and gssd executables (you do not need + these to create an NFS/RDMA enabled mount command), the installation + process can be simplified by disabling these features when running + configure: + + .. code-block:: sh + + $ ./configure --disable-gss --disable-nfsv4 + + To build nfs-utils you will need the tcp_wrappers package installed. For + more information on this see the package's README and INSTALL files. + + After building the nfs-utils package, there will be a mount.nfs binary in + the utils/mount directory. This binary can be used to initiate NFS v2, v3, + or v4 mounts. To initiate a v4 mount, the binary must be called + mount.nfs4. The standard technique is to create a symlink called + mount.nfs4 to mount.nfs. + + This mount.nfs binary should be installed at /sbin/mount.nfs as follows: + + .. code-block:: sh + + $ sudo cp utils/mount/mount.nfs /sbin/mount.nfs + + In this location, mount.nfs will be invoked automatically for NFS mounts + by the system mount command. + + .. note:: + mount.nfs and therefore nfs-utils-1.1.2 or greater is only needed + on the NFS client machine. You do not need this specific version of + nfs-utils on the server. Furthermore, only the mount.nfs command from + nfs-utils-1.1.2 is needed on the client. + +- Install a Linux kernel with NFS/RDMA + + The NFS/RDMA client and server are both included in the mainline Linux + kernel version 2.6.25 and later. This and other versions of the Linux + kernel can be found at: https://www.kernel.org/pub/linux/kernel/ + + Download the sources and place them in an appropriate location. + +- Configure the RDMA stack + + Make sure your kernel configuration has RDMA support enabled. Under + Device Drivers -> InfiniBand support, update the kernel configuration + to enable InfiniBand support [NOTE: the option name is misleading. Enabling + InfiniBand support is required for all RDMA devices (IB, iWARP, etc.)]. + + Enable the appropriate IB HCA support (mlx4, mthca, ehca, ipath, etc.) or + iWARP adapter support (amso, cxgb3, etc.). + + If you are using InfiniBand, be sure to enable IP-over-InfiniBand support. + +- Configure the NFS client and server + + Your kernel configuration must also have NFS file system support and/or + NFS server support enabled. These and other NFS related configuration + options can be found under File Systems -> Network File Systems. + +- Build, install, reboot + + The NFS/RDMA code will be enabled automatically if NFS and RDMA + are turned on. The NFS/RDMA client and server are configured via the hidden + SUNRPC_XPRT_RDMA config option that depends on SUNRPC and INFINIBAND. The + value of SUNRPC_XPRT_RDMA will be: + + #. N if either SUNRPC or INFINIBAND are N, in this case the NFS/RDMA client + and server will not be built + + #. M if both SUNRPC and INFINIBAND are on (M or Y) and at least one is M, + in this case the NFS/RDMA client and server will be built as modules + + #. Y if both SUNRPC and INFINIBAND are Y, in this case the NFS/RDMA client + and server will be built into the kernel + + Therefore, if you have followed the steps above and turned no NFS and RDMA, + the NFS/RDMA client and server will be built. + + Build a new kernel, install it, boot it. + +Check RDMA and NFS Setup +======================== + +Before configuring the NFS/RDMA software, it is a good idea to test +your new kernel to ensure that the kernel is working correctly. +In particular, it is a good idea to verify that the RDMA stack +is functioning as expected and standard NFS over TCP/IP and/or UDP/IP +is working properly. + +- Check RDMA Setup + + If you built the RDMA components as modules, load them at + this time. For example, if you are using a Mellanox Tavor/Sinai/Arbel + card: + + .. code-block:: sh + + $ modprobe ib_mthca + $ modprobe ib_ipoib + + If you are using InfiniBand, make sure there is a Subnet Manager (SM) + running on the network. If your IB switch has an embedded SM, you can + use it. Otherwise, you will need to run an SM, such as OpenSM, on one + of your end nodes. + + If an SM is running on your network, you should see the following: + + .. code-block:: sh + + $ cat /sys/class/infiniband/driverX/ports/1/state + 4: ACTIVE + + where driverX is mthca0, ipath5, ehca3, etc. + + To further test the InfiniBand software stack, use IPoIB (this + assumes you have two IB hosts named host1 and host2): + + .. code-block:: sh + + host1$ ip link set dev ib0 up + host1$ ip address add dev ib0 a.b.c.x + host2$ ip link set dev ib0 up + host2$ ip address add dev ib0 a.b.c.y + host1$ ping a.b.c.y + host2$ ping a.b.c.x + + For other device types, follow the appropriate procedures. + +- Check NFS Setup + + For the NFS components enabled above (client and/or server), + test their functionality over standard Ethernet using TCP/IP or UDP/IP. + +NFS/RDMA Setup +============== + +We recommend that you use two machines, one to act as the client and +one to act as the server. + +One time configuration: +----------------------- + +- On the server system, configure the /etc/exports file and start the NFS/RDMA server. + + Exports entries with the following formats have been tested:: + + /vol0 192.168.0.47(fsid=0,rw,async,insecure,no_root_squash) + /vol0 192.168.0.0/255.255.255.0(fsid=0,rw,async,insecure,no_root_squash) + + The IP address(es) is(are) the client's IPoIB address for an InfiniBand + HCA or the client's iWARP address(es) for an RNIC. + + .. note:: + The "insecure" option must be used because the NFS/RDMA client does + not use a reserved port. + +Each time a machine boots: +-------------------------- + +- Load and configure the RDMA drivers + + For InfiniBand using a Mellanox adapter: + + .. code-block:: sh + + $ modprobe ib_mthca + $ modprobe ib_ipoib + $ ip li set dev ib0 up + $ ip addr add dev ib0 a.b.c.d + + .. note:: + Please use unique addresses for the client and server! + +- Start the NFS server + + If the NFS/RDMA server was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in + kernel config), load the RDMA transport module: + + .. code-block:: sh + + $ modprobe svcrdma + + Regardless of how the server was built (module or built-in), start the + server: + + .. code-block:: sh + + $ /etc/init.d/nfs start + + or + + .. code-block:: sh + + $ service nfs start + + Instruct the server to listen on the RDMA transport: + + .. code-block:: sh + + $ echo rdma 20049 > /proc/fs/nfsd/portlist + +- On the client system + + If the NFS/RDMA client was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in + kernel config), load the RDMA client module: + + .. code-block:: sh + + $ modprobe xprtrdma.ko + + Regardless of how the client was built (module or built-in), use this + command to mount the NFS/RDMA server: + + .. code-block:: sh + + $ mount -o rdma,port=20049 <IPoIB-server-name-or-address>:/<export> /mnt + + To verify that the mount is using RDMA, run "cat /proc/mounts" and check + the "proto" field for the given mount. + + Congratulations! You're using NFS/RDMA! diff --git a/Documentation/admin-guide/nfs/nfsd-admin-interfaces.rst b/Documentation/admin-guide/nfs/nfsd-admin-interfaces.rst new file mode 100644 index 000000000000..c05926f79054 --- /dev/null +++ b/Documentation/admin-guide/nfs/nfsd-admin-interfaces.rst @@ -0,0 +1,40 @@ +================================== +Administrative interfaces for nfsd +================================== + +Note that normally these interfaces are used only by the utilities in +nfs-utils. + +nfsd is controlled mainly by pseudofiles under the "nfsd" filesystem, +which is normally mounted at /proc/fs/nfsd/. + +The server is always started by the first write of a nonzero value to +nfsd/threads. + +Before doing that, NFSD can be told which sockets to listen on by +writing to nfsd/portlist; that write may be: + + - an ascii-encoded file descriptor, which should refer to a + bound (and listening, for tcp) socket, or + - "transportname port", where transportname is currently either + "udp", "tcp", or "rdma". + +If nfsd is started without doing any of these, then it will create one +udp and one tcp listener at port 2049 (see nfsd_init_socks). + +On startup, nfsd and lockd grace periods start. nfsd is shut down by a write of +0 to nfsd/threads. All locks and state are thrown away at that point. + +Between startup and shutdown, the number of threads may be adjusted up +or down by additional writes to nfsd/threads or by writes to +nfsd/pool_threads. + +For more detail about files under nfsd/ and what they control, see +fs/nfsd/nfsctl.c; most of them have detailed comments. + +Implementation notes +==================== + +Note that the rpc server requires the caller to serialize addition and +removal of listening sockets, and startup and shutdown of the server. +For nfsd this is done using nfsd_mutex. diff --git a/Documentation/admin-guide/nfs/nfsroot.rst b/Documentation/admin-guide/nfs/nfsroot.rst new file mode 100644 index 000000000000..82a4fda057f9 --- /dev/null +++ b/Documentation/admin-guide/nfs/nfsroot.rst @@ -0,0 +1,364 @@ +=============================================== +Mounting the root filesystem via NFS (nfsroot) +=============================================== + +:Authors: + Written 1996 by Gero Kuhlmann <gero@gkminix.han.de> + + Updated 1997 by Martin Mares <mj@atrey.karlin.mff.cuni.cz> + + Updated 2006 by Nico Schottelius <nico-kernel-nfsroot@schottelius.org> + + Updated 2006 by Horms <horms@verge.net.au> + + Updated 2018 by Chris Novakovic <chris@chrisn.me.uk> + + + +In order to use a diskless system, such as an X-terminal or printer server for +example, it is necessary for the root filesystem to be present on a non-disk +device. This may be an initramfs (see +Documentation/filesystems/ramfs-rootfs-initramfs.txt), a ramdisk (see +Documentation/admin-guide/initrd.rst) or a filesystem mounted via NFS. The +following text describes on how to use NFS for the root filesystem. For the rest +of this text 'client' means the diskless system, and 'server' means the NFS +server. + + + + +Enabling nfsroot capabilities +============================= + +In order to use nfsroot, NFS client support needs to be selected as +built-in during configuration. Once this has been selected, the nfsroot +option will become available, which should also be selected. + +In the networking options, kernel level autoconfiguration can be selected, +along with the types of autoconfiguration to support. Selecting all of +DHCP, BOOTP and RARP is safe. + + + + +Kernel command line +=================== + +When the kernel has been loaded by a boot loader (see below) it needs to be +told what root fs device to use. And in the case of nfsroot, where to find +both the server and the name of the directory on the server to mount as root. +This can be established using the following kernel command line parameters: + + +root=/dev/nfs + This is necessary to enable the pseudo-NFS-device. Note that it's not a + real device but just a synonym to tell the kernel to use NFS instead of + a real device. + + +nfsroot=[<server-ip>:]<root-dir>[,<nfs-options>] + If the `nfsroot' parameter is NOT given on the command line, + the default ``"/tftpboot/%s"`` will be used. + + <server-ip> Specifies the IP address of the NFS server. + The default address is determined by the ip parameter + (see below). This parameter allows the use of different + servers for IP autoconfiguration and NFS. + + <root-dir> Name of the directory on the server to mount as root. + If there is a "%s" token in the string, it will be + replaced by the ASCII-representation of the client's + IP address. + + <nfs-options> Standard NFS options. All options are separated by commas. + The following defaults are used:: + + port = as given by server portmap daemon + rsize = 4096 + wsize = 4096 + timeo = 7 + retrans = 3 + acregmin = 3 + acregmax = 60 + acdirmin = 30 + acdirmax = 60 + flags = hard, nointr, noposix, cto, ac + + +ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf>:<dns0-ip>:<dns1-ip>:<ntp0-ip> + This parameter tells the kernel how to configure IP addresses of devices + and also how to set up the IP routing table. It was originally called + nfsaddrs, but now the boot-time IP configuration works independently of + NFS, so it was renamed to ip and the old name remained as an alias for + compatibility reasons. + + If this parameter is missing from the kernel command line, all fields are + assumed to be empty, and the defaults mentioned below apply. In general + this means that the kernel tries to configure everything using + autoconfiguration. + + The <autoconf> parameter can appear alone as the value to the ip + parameter (without all the ':' characters before). If the value is + "ip=off" or "ip=none", no autoconfiguration will take place, otherwise + autoconfiguration will take place. The most common way to use this + is "ip=dhcp". + + <client-ip> IP address of the client. + Default: Determined using autoconfiguration. + + <server-ip> IP address of the NFS server. + If RARP is used to determine + the client address and this parameter is NOT empty only + replies from the specified server are accepted. + + Only required for NFS root. That is autoconfiguration + will not be triggered if it is missing and NFS root is not + in operation. + + Value is exported to /proc/net/pnp with the prefix "bootserver " + (see below). + + Default: Determined using autoconfiguration. + The address of the autoconfiguration server is used. + + <gw-ip> IP address of a gateway if the server is on a different subnet. + Default: Determined using autoconfiguration. + + <netmask> Netmask for local network interface. + If unspecified the netmask is derived from the client IP address + assuming classful addressing. + + Default: Determined using autoconfiguration. + + <hostname> Name of the client. + If a '.' character is present, anything + before the first '.' is used as the client's hostname, and anything + after it is used as its NIS domain name. May be supplied by + autoconfiguration, but its absence will not trigger autoconfiguration. + If specified and DHCP is used, the user-provided hostname (and NIS + domain name, if present) will be carried in the DHCP request; this + may cause a DNS record to be created or updated for the client. + + Default: Client IP address is used in ASCII notation. + + <device> Name of network device to use. + Default: If the host only has one device, it is used. + Otherwise the device is determined using + autoconfiguration. This is done by sending + autoconfiguration requests out of all devices, + and using the device that received the first reply. + + <autoconf> Method to use for autoconfiguration. + In the case of options + which specify multiple autoconfiguration protocols, + requests are sent using all protocols, and the first one + to reply is used. + + Only autoconfiguration protocols that have been compiled + into the kernel will be used, regardless of the value of + this option:: + + off or none: don't use autoconfiguration + (do static IP assignment instead) + on or any: use any protocol available in the kernel + (default) + dhcp: use DHCP + bootp: use BOOTP + rarp: use RARP + both: use both BOOTP and RARP but not DHCP + (old option kept for backwards compatibility) + + if dhcp is used, the client identifier can be used by following + format "ip=dhcp,client-id-type,client-id-value" + + Default: any + + <dns0-ip> IP address of primary nameserver. + Value is exported to /proc/net/pnp with the prefix "nameserver " + (see below). + + Default: None if not using autoconfiguration; determined + automatically if using autoconfiguration. + + <dns1-ip> IP address of secondary nameserver. + See <dns0-ip>. + + <ntp0-ip> IP address of a Network Time Protocol (NTP) server. + Value is exported to /proc/net/ipconfig/ntp_servers, but is + otherwise unused (see below). + + Default: None if not using autoconfiguration; determined + automatically if using autoconfiguration. + + After configuration (whether manual or automatic) is complete, two files + are created in the following format; lines are omitted if their respective + value is empty following configuration: + + - /proc/net/pnp: + + #PROTO: <DHCP|BOOTP|RARP|MANUAL> (depending on configuration method) + domain <dns-domain> (if autoconfigured, the DNS domain) + nameserver <dns0-ip> (primary name server IP) + nameserver <dns1-ip> (secondary name server IP) + nameserver <dns2-ip> (tertiary name server IP) + bootserver <server-ip> (NFS server IP) + + - /proc/net/ipconfig/ntp_servers: + + <ntp0-ip> (NTP server IP) + <ntp1-ip> (NTP server IP) + <ntp2-ip> (NTP server IP) + + <dns-domain> and <dns2-ip> (in /proc/net/pnp) and <ntp1-ip> and <ntp2-ip> + (in /proc/net/ipconfig/ntp_servers) are requested during autoconfiguration; + they cannot be specified as part of the "ip=" kernel command line parameter. + + Because the "domain" and "nameserver" options are recognised by DNS + resolvers, /etc/resolv.conf is often linked to /proc/net/pnp on systems + that use an NFS root filesystem. + + Note that the kernel will not synchronise the system time with any NTP + servers it discovers; this is the responsibility of a user space process + (e.g. an initrd/initramfs script that passes the IP addresses listed in + /proc/net/ipconfig/ntp_servers to an NTP client before mounting the real + root filesystem if it is on NFS). + + +nfsrootdebug + This parameter enables debugging messages to appear in the kernel + log at boot time so that administrators can verify that the correct + NFS mount options, server address, and root path are passed to the + NFS client. + + +rdinit=<executable file> + To specify which file contains the program that starts system + initialization, administrators can use this command line parameter. + The default value of this parameter is "/init". If the specified + file exists and the kernel can execute it, root filesystem related + kernel command line parameters, including 'nfsroot=', are ignored. + + A description of the process of mounting the root file system can be + found in Documentation/driver-api/early-userspace/early_userspace_support.rst + + +Boot Loader +=========== + +To get the kernel into memory different approaches can be used. +They depend on various facilities being available: + + +- Booting from a floppy using syslinux + + When building kernels, an easy way to create a boot floppy that uses + syslinux is to use the zdisk or bzdisk make targets which use zimage + and bzimage images respectively. Both targets accept the + FDARGS parameter which can be used to set the kernel command line. + + e.g:: + + make bzdisk FDARGS="root=/dev/nfs" + + Note that the user running this command will need to have + access to the floppy drive device, /dev/fd0 + + For more information on syslinux, including how to create bootdisks + for prebuilt kernels, see http://syslinux.zytor.com/ + + .. note:: + Previously it was possible to write a kernel directly to + a floppy using dd, configure the boot device using rdev, and + boot using the resulting floppy. Linux no longer supports this + method of booting. + +- Booting from a cdrom using isolinux + + When building kernels, an easy way to create a bootable cdrom that + uses isolinux is to use the isoimage target which uses a bzimage + image. Like zdisk and bzdisk, this target accepts the FDARGS + parameter which can be used to set the kernel command line. + + e.g:: + + make isoimage FDARGS="root=/dev/nfs" + + The resulting iso image will be arch/<ARCH>/boot/image.iso + This can be written to a cdrom using a variety of tools including + cdrecord. + + e.g:: + + cdrecord dev=ATAPI:1,0,0 arch/x86/boot/image.iso + + For more information on isolinux, including how to create bootdisks + for prebuilt kernels, see http://syslinux.zytor.com/ + +- Using LILO + + When using LILO all the necessary command line parameters may be + specified using the 'append=' directive in the LILO configuration + file. + + However, to use the 'root=' directive you also need to create + a dummy root device, which may be removed after LILO is run. + + e.g:: + + mknod /dev/boot255 c 0 255 + + For information on configuring LILO, please refer to its documentation. + +- Using GRUB + + When using GRUB, kernel parameter are simply appended after the kernel + specification: kernel <kernel> <parameters> + +- Using loadlin + + loadlin may be used to boot Linux from a DOS command prompt without + requiring a local hard disk to mount as root. This has not been + thoroughly tested by the authors of this document, but in general + it should be possible configure the kernel command line similarly + to the configuration of LILO. + + Please refer to the loadlin documentation for further information. + +- Using a boot ROM + + This is probably the most elegant way of booting a diskless client. + With a boot ROM the kernel is loaded using the TFTP protocol. The + authors of this document are not aware of any no commercial boot + ROMs that support booting Linux over the network. However, there + are two free implementations of a boot ROM, netboot-nfs and + etherboot, both of which are available on sunsite.unc.edu, and both + of which contain everything you need to boot a diskless Linux client. + +- Using pxelinux + + Pxelinux may be used to boot linux using the PXE boot loader + which is present on many modern network cards. + + When using pxelinux, the kernel image is specified using + "kernel <relative-path-below /tftpboot>". The nfsroot parameters + are passed to the kernel by adding them to the "append" line. + It is common to use serial console in conjunction with pxeliunx, + see Documentation/admin-guide/serial-console.rst for more information. + + For more information on isolinux, including how to create bootdisks + for prebuilt kernels, see http://syslinux.zytor.com/ + + + + +Credits +======= + + The nfsroot code in the kernel and the RARP support have been written + by Gero Kuhlmann <gero@gkminix.han.de>. + + The rest of the IP layer autoconfiguration code has been written + by Martin Mares <mj@atrey.karlin.mff.cuni.cz>. + + In order to write the initial version of nfsroot I would like to thank + Jens-Uwe Mager <jum@anubis.han.de> for his help. diff --git a/Documentation/admin-guide/nfs/pnfs-block-server.rst b/Documentation/admin-guide/nfs/pnfs-block-server.rst new file mode 100644 index 000000000000..b00a2e705cc4 --- /dev/null +++ b/Documentation/admin-guide/nfs/pnfs-block-server.rst @@ -0,0 +1,42 @@ +=================================== +pNFS block layout server user guide +=================================== + +The Linux NFS server now supports the pNFS block layout extension. In this +case the NFS server acts as Metadata Server (MDS) for pNFS, which in addition +to handling all the metadata access to the NFS export also hands out layouts +to the clients to directly access the underlying block devices that are +shared with the client. + +To use pNFS block layouts with with the Linux NFS server the exported file +system needs to support the pNFS block layouts (currently just XFS), and the +file system must sit on shared storage (typically iSCSI) that is accessible +to the clients in addition to the MDS. As of now the file system needs to +sit directly on the exported volume, striping or concatenation of +volumes on the MDS and clients is not supported yet. + +On the server, pNFS block volume support is automatically if the file system +support it. On the client make sure the kernel has the CONFIG_PNFS_BLOCK +option enabled, the blkmapd daemon from nfs-utils is running, and the +file system is mounted using the NFSv4.1 protocol version (mount -o vers=4.1). + +If the nfsd server needs to fence a non-responding client it calls +/sbin/nfsd-recall-failed with the first argument set to the IP address of +the client, and the second argument set to the device node without the /dev +prefix for the file system to be fenced. Below is an example file that shows +how to translate the device into a serial number from SCSI EVPD 0x80:: + + cat > /sbin/nfsd-recall-failed << EOF + +.. code-block:: sh + + #!/bin/sh + + CLIENT="$1" + DEV="/dev/$2" + EVPD=`sg_inq --page=0x80 ${DEV} | \ + grep "Unit serial number:" | \ + awk -F ': ' '{print $2}'` + + echo "fencing client ${CLIENT} serial ${EVPD}" >> /var/log/pnfsd-fence.log + EOF diff --git a/Documentation/admin-guide/nfs/pnfs-scsi-server.rst b/Documentation/admin-guide/nfs/pnfs-scsi-server.rst new file mode 100644 index 000000000000..d2f6ee558071 --- /dev/null +++ b/Documentation/admin-guide/nfs/pnfs-scsi-server.rst @@ -0,0 +1,24 @@ + +================================== +pNFS SCSI layout server user guide +================================== + +This document describes support for pNFS SCSI layouts in the Linux NFS server. +With pNFS SCSI layouts, the NFS server acts as Metadata Server (MDS) for pNFS, +which in addition to handling all the metadata access to the NFS export, +also hands out layouts to the clients so that they can directly access the +underlying SCSI LUNs that are shared with the client. + +To use pNFS SCSI layouts with with the Linux NFS server, the exported file +system needs to support the pNFS SCSI layouts (currently just XFS), and the +file system must sit on a SCSI LUN that is accessible to the clients in +addition to the MDS. As of now the file system needs to sit directly on the +exported LUN, striping or concatenation of LUNs on the MDS and clients +is not supported yet. + +On a server built with CONFIG_NFSD_SCSI, the pNFS SCSI volume support is +automatically enabled if the file system is exported using the "pnfs" +option and the underlying SCSI device support persistent reservations. +On the client make sure the kernel has the CONFIG_PNFS_BLOCK option +enabled, and the file system is mounted using the NFSv4.1 protocol +version (mount -o vers=4.1). |