Diffstat (limited to 'Documentation/filesystems')
-rw-r--r-- | Documentation/filesystems/cifs/TODO | 26
-rw-r--r-- | Documentation/filesystems/coda.txt | 11
-rw-r--r-- | Documentation/filesystems/conf.py | 10
-rw-r--r-- | Documentation/filesystems/dax.txt | 2
-rw-r--r-- | Documentation/filesystems/debugfs.txt | 2
-rw-r--r-- | Documentation/filesystems/f2fs.txt | 133
-rw-r--r-- | Documentation/filesystems/nfs/nfsroot.txt | 2
-rw-r--r-- | Documentation/filesystems/porting | 15
-rw-r--r-- | Documentation/filesystems/proc.txt | 47
-rw-r--r-- | Documentation/filesystems/ramfs-rootfs-initramfs.txt | 4
-rw-r--r-- | Documentation/filesystems/sysfs.txt | 2
-rw-r--r-- | Documentation/filesystems/tmpfs.txt | 2
-rw-r--r-- | Documentation/filesystems/xfs-self-describing-metadata.txt | 8
-rw-r--r-- | Documentation/filesystems/xfs.txt | 470
14 files changed, 204 insertions, 530 deletions
diff --git a/Documentation/filesystems/cifs/TODO b/Documentation/filesystems/cifs/TODO index 9267f3fb131f..edbbccda1942 100644 --- a/Documentation/filesystems/cifs/TODO +++ b/Documentation/filesystems/cifs/TODO @@ -13,7 +13,8 @@ a) SMB3 (and SMB3.1.1) missing optional features: - T10 copy offload ie "ODX" (copy chunk, and "Duplicate Extents" ioctl currently the only two server side copy mechanisms supported) -b) improved sparse file support +b) improved sparse file support (fiemap and SEEK_HOLE are implemented +but additional features would be supportable by the protocol). c) Directory entry caching relies on a 1 second timer, rather than using Directory Leases, currently only the root file handle is cached longer @@ -21,9 +22,13 @@ using Directory Leases, currently only the root file handle is cached longer d) quota support (needs minor kernel change since quota calls to make it to network filesystems or deviceless filesystems) -e) Additional use cases where we use "compoounding" (e.g. open/query/close -and open/setinfo/close) to reduce the number of roundtrips, and also -open to reduce redundant opens (using deferred close and reference counts more). +e) Additional use cases can be optimized to use "compounding" +(e.g. open/query/close and open/setinfo/close) to reduce the number +of roundtrips to the server and improve performance. Various cases +(stat, statfs, create, unlink, mkdir) already have been improved by +using compounding but more can be done. In addition we could significantly +reduce redundant opens by using deferred close (with handle caching leases) +and better using reference counters on file handles. f) Finish inotify support so kde and gnome file list windows will autorefresh (partially complete by Asser). Needs minor kernel @@ -43,18 +48,17 @@ mount or a per server basis to client UIDs or nobody if no mapping exists. 
Also better integration with winbind for resolving SID owners k) Add tools to take advantage of more smb3 specific ioctls and features -(passthrough ioctl/fsctl for sending various SMB3 fsctls to the server -is in progress, and a passthrough query_info call is already implemented -in cifs.ko to allow smb3 info levels queries to be sent from userspace) +(passthrough ioctl/fsctl is now implemented in cifs.ko to allow sending +various SMB3 fsctls and query info and set info calls directly from user space) +Add tools to make setting various non-POSIX metadata attributes easier +from tools (e.g. extending what was done in smb-info tool). l) encrypted file support m) improved stats gathering tools (perhaps integration with nfsometer?) to extend and make easier to use what is currently in /proc/fs/cifs/Stats -n) allow setting more NTFS/SMB3 file attributes remotely (currently limited to compressed -file attribute via chflags) and improve user space tools for managing and -viewing them. +n) Add support for claims based ACLs ("DAC") o) mount helper GUI (to simplify the various configuration options on mount) @@ -82,6 +86,8 @@ so far). w) Add support for additional strong encryption types, and additional spnego authentication mechanisms (see MS-SMB2) +x) Finish support for SMB3.1.1 compression + KNOWN BUGS ==================================== See http://bugzilla.samba.org - search on product "CifsVFS" for diff --git a/Documentation/filesystems/coda.txt b/Documentation/filesystems/coda.txt index 61311356025d..545262c167c3 100644 --- a/Documentation/filesystems/coda.txt +++ b/Documentation/filesystems/coda.txt @@ -481,7 +481,10 @@ kernel support. - + struct coda_timespec { + int64_t tv_sec; /* seconds */ + long tv_nsec; /* nanoseconds */ + }; struct coda_vattr { enum coda_vtype va_type; /* vnode type (for create) */ @@ -493,9 +496,9 @@ kernel support. 
long va_fileid; /* file id */ u_quad_t va_size; /* file size in bytes */ long va_blocksize; /* blocksize preferred for i/o */ - struct timespec va_atime; /* time of last access */ - struct timespec va_mtime; /* time of last modification */ - struct timespec va_ctime; /* time file changed */ + struct coda_timespec va_atime; /* time of last access */ + struct coda_timespec va_mtime; /* time of last modification */ + struct coda_timespec va_ctime; /* time file changed */ u_long va_gen; /* generation number of file */ u_long va_flags; /* flags defined for file */ dev_t va_rdev; /* device special file represents */ diff --git a/Documentation/filesystems/conf.py b/Documentation/filesystems/conf.py deleted file mode 100644 index ea44172af5c4..000000000000 --- a/Documentation/filesystems/conf.py +++ /dev/null @@ -1,10 +0,0 @@ -# -*- coding: utf-8; mode: python -*- - -project = "Linux Filesystems API" - -tags.add("subproject") - -latex_documents = [ - ('index', 'filesystems.tex', project, - 'The kernel development community', 'manual'), -] diff --git a/Documentation/filesystems/dax.txt b/Documentation/filesystems/dax.txt index 6d2c0d340dea..679729442fd2 100644 --- a/Documentation/filesystems/dax.txt +++ b/Documentation/filesystems/dax.txt @@ -76,7 +76,7 @@ exposure of uninitialized data through mmap. These filesystems may be used for inspiration: - ext2: see Documentation/filesystems/ext2.txt - ext4: see Documentation/filesystems/ext4/ -- xfs: see Documentation/filesystems/xfs.txt +- xfs: see Documentation/admin-guide/xfs.rst Handling Media Errors diff --git a/Documentation/filesystems/debugfs.txt b/Documentation/filesystems/debugfs.txt index 4a0a9c3f4af6..9e27c843d00e 100644 --- a/Documentation/filesystems/debugfs.txt +++ b/Documentation/filesystems/debugfs.txt @@ -169,7 +169,7 @@ byte offsets over a base for the register block. 
If you want to dump an u32 array in debugfs, you can create file with: - struct dentry *debugfs_create_u32_array(const char *name, umode_t mode, + void debugfs_create_u32_array(const char *name, umode_t mode, struct dentry *parent, u32 *array, u32 elements); diff --git a/Documentation/filesystems/f2fs.txt b/Documentation/filesystems/f2fs.txt index f7b5e4ff0de3..496fa28b2492 100644 --- a/Documentation/filesystems/f2fs.txt +++ b/Documentation/filesystems/f2fs.txt @@ -214,11 +214,22 @@ fsync_mode=%s Control the policy of fsync. Currently supports "posix", non-atomic files likewise "nobarrier" mount option. test_dummy_encryption Enable dummy encryption, which provides a fake fscrypt context. The fake fscrypt context is used by xfstests. -checkpoint=%s Set to "disable" to turn off checkpointing. Set to "enable" +checkpoint=%s[:%u[%]] Set to "disable" to turn off checkpointing. Set to "enable" to reenable checkpointing. Is enabled by default. While disabled, any unmounting or unexpected shutdowns will cause the filesystem contents to appear as they did when the filesystem was mounted with that option. + While mounting with checkpoint=disabled, the filesystem must + run garbage collection to ensure that all available space can + be used. If this takes too much time, the mount may return + EAGAIN. You may optionally add a value to indicate how much + of the disk you would be willing to temporarily give up to + avoid additional garbage collection. This can be given as a + number of blocks, or as a percent. For instance, mounting + with checkpoint=disable:100% would always succeed, but it may + hide up to all remaining free space. The actual space that + would be unusable can be viewed at /sys/fs/f2fs/<disk>/unusable + This space is reclaimed once checkpoint=enable. 
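Reviewer note (not part of the patch): the new option form checkpoint=%s[:%u[%]] documented in the f2fs.txt hunk above can be illustrated with a small parser. This is only a sketch of the documented syntax. The helper name parse_checkpoint is hypothetical, and rejecting a value after "enable" is an assumption drawn from the text, which describes the optional value only for checkpoint=disable.

```python
import re

# Mirrors the documented form checkpoint=%s[:%u[%]].
# parse_checkpoint is a hypothetical helper, not part of f2fs or mount(8).
_CHECKPOINT = re.compile(r"^checkpoint=(enable|disable)(?::(\d+)(%)?)?$")


def parse_checkpoint(option):
    """Split a checkpoint mount option into (mode, amount, unit).

    unit is "percent" when the value ends in '%', "blocks" when it is a
    bare number, and None when no value was given.
    """
    match = _CHECKPOINT.match(option)
    if match is None:
        raise ValueError("not a checkpoint option: %r" % option)
    mode, amount, percent = match.groups()
    if amount is None:
        return mode, None, None
    # Assumption: the hunk only describes a value for checkpoint=disable.
    if mode != "disable":
        raise ValueError("a value is only described for checkpoint=disable")
    return mode, int(amount), "percent" if percent else "blocks"


full = parse_checkpoint("checkpoint=disable:100%")
capped = parse_checkpoint("checkpoint=disable:4096")
plain = parse_checkpoint("checkpoint=enable")
```

Here "checkpoint=disable:100%" is the example from the hunk that, per the text, would always succeed at mount time at the cost of possibly hiding all remaining free space.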
================================================================================ DEBUGFS ENTRIES @@ -246,11 +257,14 @@ Files in /sys/fs/f2fs/<devname> .............................................................................. File Content - gc_max_sleep_time This tuning parameter controls the maximum sleep + gc_urgent_sleep_time This parameter controls sleep time for gc_urgent. + 500 ms is set by default. See above gc_urgent. + + gc_min_sleep_time This tuning parameter controls the minimum sleep time for the garbage collection thread. Time is in milliseconds. - gc_min_sleep_time This tuning parameter controls the minimum sleep + gc_max_sleep_time This tuning parameter controls the maximum sleep time for the garbage collection thread. Time is in milliseconds. @@ -270,9 +284,6 @@ Files in /sys/fs/f2fs/<devname> to 1, background thread starts to do GC by given gc_urgent_sleep_time interval. - gc_urgent_sleep_time This parameter controls sleep time for gc_urgent. - 500 ms is set by default. See above gc_urgent. - reclaim_segments This parameter controls the number of prefree segments to be reclaimed. If the number of prefree segments is larger than the number of segments @@ -287,7 +298,16 @@ Files in /sys/fs/f2fs/<devname> checkpoint is triggered, and issued during the checkpoint. By default, it is disabled with 0. - trim_sections This parameter controls the number of sections + discard_granularity This parameter controls the granularity of discard + command size. It will issue discard commands iif + the size is larger than given granularity. Its + unit size is 4KB, and 4 (=16KB) is set by default. + The maximum value is 128 (=512KB). + + reserved_blocks This parameter indicates the number of blocks that + f2fs reserves internally for root. + + batched_trim_sections This parameter controls the number of sections to be trimmed out in batch mode when FITRIM conducts. 32 sections is set by default. 
@@ -309,11 +329,35 @@ Files in /sys/fs/f2fs/<devname> the number is less than this value, it triggers in-place-updates. + min_seq_blocks This parameter controls the threshold to serialize + write IOs issued by multiple threads in parallel. + + min_hot_blocks This parameter controls the threshold to allocate + a hot data log for pending data blocks to write. + + min_ssr_sections This parameter adds the threshold when deciding + SSR block allocation. If this is large, SSR mode + will be enabled early. + + ram_thresh This parameter controls the memory footprint used + by free nids and cached nat entries. By default, + 10 is set, which indicates 10 MB / 1 GB RAM. + + ra_nid_pages When building free nids, F2FS reads NAT blocks + ahead for speed up. Default is 0. + + dirty_nats_ratio Given dirty ratio of cached nat entries, F2FS + determines flushing them in background. + max_victim_search This parameter controls the number of trials to find a victim segment when conducting SSR and cleaning operations. The default value is 4096 which covers 8GB block address range. + migration_granularity For large-sized sections, F2FS can stop GC given + this granularity instead of reclaiming entire + section. + dir_level This parameter controls the directory level to support large directory. If a directory has a number of files, it can reduce the file lookup @@ -321,9 +365,53 @@ Files in /sys/fs/f2fs/<devname> Otherwise, it needs to decrease this value to reduce the space overhead. The default value is 0. - ram_thresh This parameter controls the memory footprint used - by free nids and cached nat entries. By default, - 10 is set, which indicates 10 MB / 1 GB RAM. + cp_interval F2FS tries to do checkpoint periodically, 60 secs + by default. + + idle_interval F2FS detects system is idle, if there's no F2FS + operations during given interval, 5 secs by + default. + + discard_idle_interval F2FS detects the discard thread is idle, given + time interval. Default is 5 secs. 
+ + gc_idle_interval F2FS detects the GC thread is idle, given time + interval. Default is 5 secs. + + umount_discard_timeout When unmounting the disk, F2FS waits for finishing + queued discard commands which can take huge time. + This gives time out for it, 5 secs by default. + + iostat_enable This controls to enable/disable iostat in F2FS. + + readdir_ra This enables/disabled readahead of inode blocks + in readdir, and default is enabled. + + gc_pin_file_thresh This indicates how many GC can be failed for the + pinned file. If it exceeds this, F2FS doesn't + guarantee its pinning state. 2048 trials is set + by default. + + extension_list This enables to change extension_list for hot/cold + files in runtime. + + inject_rate This controls injection rate of arbitrary faults. + + inject_type This controls injection type of arbitrary faults. + + dirty_segments This shows # of dirty segments. + + lifetime_write_kbytes This shows # of data written to the disk. + + features This shows current features enabled on F2FS. + + current_reserved_blocks This shows # of blocks currently reserved. + + unusable If checkpoint=disable, this shows the number of + blocks that are unusable. + If checkpoint=enable it shows the number of blocks + that would be unusable if checkpoint=disable were + to be set. ================================================================================ USAGE @@ -716,3 +804,28 @@ WRITE_LIFE_NOT_SET WARM_DATA WRITE_LIFE_NOT_SET WRITE_LIFE_NONE " WRITE_LIFE_NONE WRITE_LIFE_MEDIUM " WRITE_LIFE_MEDIUM WRITE_LIFE_LONG " WRITE_LIFE_LONG + +Fallocate(2) Policy +------------------- + +The default policy follows the below posix rule. + +Allocating disk space + The default operation (i.e., mode is zero) of fallocate() allocates + the disk space within the range specified by offset and len. The + file size (as reported by stat(2)) will be changed if offset+len is + greater than the file size. 
Any subregion within the range specified + by offset and len that did not contain data before the call will be + initialized to zero. This default behavior closely resembles the + behavior of the posix_fallocate(3) library function, and is intended + as a method of optimally implementing that function. + +However, once F2FS receives ioctl(fd, F2FS_IOC_SET_PIN_FILE) in prior to +fallocate(fd, DEFAULT_MODE), it allocates on-disk blocks addressess having +zero or random data, which is useful to the below scenario where: + 1. create(fd) + 2. ioctl(fd, F2FS_IOC_SET_PIN_FILE) + 3. fallocate(fd, 0, 0, size) + 4. address = fibmap(fd, offset) + 5. open(blkdev) + 6. write(blkdev, address) diff --git a/Documentation/filesystems/nfs/nfsroot.txt b/Documentation/filesystems/nfs/nfsroot.txt index d2963123eb1c..ae4332464560 100644 --- a/Documentation/filesystems/nfs/nfsroot.txt +++ b/Documentation/filesystems/nfs/nfsroot.txt @@ -239,7 +239,7 @@ rdinit=<executable file> A description of the process of mounting the root file system can be found in: - Documentation/early-userspace/README + Documentation/driver-api/early-userspace/early_userspace_support.rst diff --git a/Documentation/filesystems/porting b/Documentation/filesystems/porting index 2813a19389fe..6b7a41cfcaed 100644 --- a/Documentation/filesystems/porting +++ b/Documentation/filesystems/porting @@ -428,8 +428,19 @@ release it yourself. -- [mandatory] d_alloc_root() is gone, along with a lot of bugs caused by code -misusing it. Replacement: d_make_root(inode). The difference is, -d_make_root() drops the reference to inode if dentry allocation fails. +misusing it. Replacement: d_make_root(inode). On success d_make_root(inode) +allocates and returns a new dentry instantiated with the passed in inode. +On failure NULL is returned and the passed in inode is dropped so the reference +to inode is consumed in all cases and failure handling need not do any cleanup +for the inode. 
If d_make_root(inode) is passed a NULL inode it returns NULL +and also requires no further error handling. Typical usage is: + + inode = foofs_new_inode(....); + s->s_root = d_make_root(inode); + if (!s->s_root) + /* Nothing needed for the inode cleanup */ + return -ENOMEM; + ... -- [mandatory] diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index a226061fa109..99ca040e3f90 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt @@ -154,9 +154,11 @@ Table 1-1: Process specific entries in /proc symbol the task is blocked in - or "0" if not blocked. pagemap Page table stack Report full stack trace, enable via CONFIG_STACKTRACE - smaps an extension based on maps, showing the memory consumption of + smaps An extension based on maps, showing the memory consumption of each mapping and flags associated with it - numa_maps an extension based on maps, showing the memory locality and + smaps_rollup Accumulated smaps stats for all mappings of the process. This + can be derived from smaps, but is faster and more convenient + numa_maps An extension based on maps, showing the memory locality and binding policy as well as mem usage (in pages) of each mapping. .............................................................................. @@ -366,7 +368,7 @@ Table 1-4: Contents of the stat files (as of 2.6.30-rc7) exit_code the thread's exit_code in the form reported by the waitpid system call .............................................................................. -The /proc/PID/maps file containing the currently mapped memory regions and +The /proc/PID/maps file contains the currently mapped memory regions and their access permissions. The format is: @@ -417,11 +419,14 @@ is not associated with a file: or if empty, the mapping is anonymous. The /proc/PID/smaps is an extension based on maps, showing the memory -consumption for each of the process's mappings. 
For each of mappings there -is a series of lines such as the following: +consumption for each of the process's mappings. For each mapping (aka Virtual +Memory Area, or VMA) there is a series of lines such as the following: 08048000-080bc000 r-xp 00000000 03:02 13130 /bin/bash + Size: 1084 kB +KernelPageSize: 4 kB +MMUPageSize: 4 kB Rss: 892 kB Pss: 374 kB Shared_Clean: 892 kB @@ -443,11 +448,14 @@ Locked: 0 kB THPeligible: 0 VmFlags: rd ex mr mw me dw -the first of these lines shows the same information as is displayed for the -mapping in /proc/PID/maps. The remaining lines show the size of the mapping -(size), the amount of the mapping that is currently resident in RAM (RSS), the -process' proportional share of this mapping (PSS), the number of clean and -dirty private pages in the mapping. +The first of these lines shows the same information as is displayed for the +mapping in /proc/PID/maps. Following lines show the size of the mapping +(size); the size of each page allocated when backing a VMA (KernelPageSize), +which is usually the same as the size in the page table entries; the page size +used by the MMU when backing a VMA (in most cases, the same as KernelPageSize); +the amount of the mapping that is currently resident in RAM (RSS); the +process' proportional share of this mapping (PSS); and the number of clean and +dirty shared and private pages in the mapping. The "proportional set size" (PSS) of a process is the count of pages it has in memory, where each page is divided by the number of processes sharing it. @@ -478,8 +486,8 @@ replaced by copy-on-write) part of the underlying shmem object out on swap. "SwapPss" shows proportional swap share of this mapping. Unlike "Swap", this does not take into account swapped out page of underlying shmem objects. "Locked" indicates whether the mapping is locked in memory or not. -"THPeligible" indicates whether the mapping is eligible for THP pages - 1 if -true, 0 otherwise. 
+"THPeligible" indicates whether the mapping is eligible for allocating THP +pages - 1 if true, 0 otherwise. It just shows the current status. "VmFlags" field deserves a separate description. This member represents the kernel flags associated with the particular virtual memory area in two letter encoded @@ -532,6 +540,19 @@ guarantees: 2) If there is something at a given vaddr during the entirety of the life of the smaps/maps walk, there will be some output for it. +The /proc/PID/smaps_rollup file includes the same fields as /proc/PID/smaps, +but their values are the sums of the corresponding values for all mappings of +the process. Additionally, it contains these fields: + +Pss_Anon +Pss_File +Pss_Shmem + +They represent the proportional shares of anonymous, file, and shmem pages, as +described for smaps above. These fields are omitted in smaps since each +mapping identifies the type (anon, file, or shmem) of all pages it contains. +Thus all information in smaps_rollup can be derived from smaps, but at a +significantly higher cost. The /proc/PID/clear_refs is used to reset the PG_Referenced and ACCESSED/YOUNG bits on both physical and virtual pages associated with a process, and the @@ -1479,7 +1500,7 @@ review the kernel documentation in the directory /usr/src/linux/Documentation. This chapter is heavily based on the documentation included in the pre 2.2 kernels, and became part of it in version 2.2.1 of the Linux kernel. -Please see: Documentation/sysctl/ directory for descriptions of these +Please see: Documentation/admin-guide/sysctl/ directory for descriptions of these entries. 
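Reviewer note (not part of the patch): the proc.txt hunk above says that the values in /proc/PID/smaps_rollup are the sums of the corresponding smaps values over all mappings. A minimal sketch of that derivation follows. The helper name rollup_smaps and the two-mapping sample are assumptions made for illustration, and the code sums every kB-valued field it sees, which may be more fields than the kernel actually prints in smaps_rollup.

```python
import re
from collections import defaultdict

# Matches per-mapping fields such as "Rss:   892 kB".
# Mapping header lines and non-kB lines such as VmFlags do not match.
_FIELD = re.compile(r"^(\w+):\s+(\d+) kB$")


def rollup_smaps(text):
    """Sum every kB-valued field across all mappings in smaps-formatted text."""
    totals = defaultdict(int)
    for line in text.splitlines():
        match = _FIELD.match(line.strip())
        if match:
            totals[match.group(1)] += int(match.group(2))
    return dict(totals)


# Hypothetical sample: the first mapping reuses values from the hunk,
# the second mapping is invented for the illustration.
SAMPLE = """\
08048000-080bc000 r-xp 00000000 03:02 13130      /bin/bash
Rss:                 892 kB
Pss:                 374 kB
VmFlags: rd ex mr mw me dw
080bc000-080c1000 rw-p 00074000 03:02 13130      /bin/bash
Rss:                  20 kB
Pss:                  20 kB
VmFlags: rd wr mr mw me dw
"""

totals = rollup_smaps(SAMPLE)
```

On a live system the same function could be fed the contents of /proc/self/smaps, although, as the hunk notes, reading smaps_rollup directly is the cheaper way to get the totals.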
------------------------------------------------------------------------------ diff --git a/Documentation/filesystems/ramfs-rootfs-initramfs.txt b/Documentation/filesystems/ramfs-rootfs-initramfs.txt index 79637d227e85..97d42ccaa92d 100644 --- a/Documentation/filesystems/ramfs-rootfs-initramfs.txt +++ b/Documentation/filesystems/ramfs-rootfs-initramfs.txt @@ -105,7 +105,7 @@ All this differs from the old initrd in several ways: - The old initrd file was a gzipped filesystem image (in some file format, such as ext2, that needed a driver built into the kernel), while the new initramfs archive is a gzipped cpio archive (like tar only simpler, - see cpio(1) and Documentation/early-userspace/buffer-format.txt). The + see cpio(1) and Documentation/driver-api/early-userspace/buffer-format.rst). The kernel's cpio extraction code is not only extremely small, it's also __init text and data that can be discarded during the boot process. @@ -159,7 +159,7 @@ One advantage of the configuration file is that root access is not required to set permissions or create device nodes in the new archive. (Note that those two example "file" entries expect to find files named "init.sh" and "busybox" in a directory called "initramfs", under the linux-2.6.* directory. See -Documentation/early-userspace/README for more details.) +Documentation/driver-api/early-userspace/early_userspace_support.rst for more details.) The kernel does not depend on external cpio tools. If you specify a directory instead of a configuration file, the kernel's build infrastructure diff --git a/Documentation/filesystems/sysfs.txt b/Documentation/filesystems/sysfs.txt index 5b5311f9358d..ddf15b1b0d5a 100644 --- a/Documentation/filesystems/sysfs.txt +++ b/Documentation/filesystems/sysfs.txt @@ -319,7 +319,7 @@ quick way to lookup the sysfs interface for a device from the result of a stat(2) operation. More information can driver-model specific features can be found in -Documentation/driver-model/. 
+Documentation/driver-api/driver-model/. TODO: Finish this section. diff --git a/Documentation/filesystems/tmpfs.txt b/Documentation/filesystems/tmpfs.txt index cad797a8a39e..5ecbc03e6b2f 100644 --- a/Documentation/filesystems/tmpfs.txt +++ b/Documentation/filesystems/tmpfs.txt @@ -98,7 +98,7 @@ A memory policy with a valid NodeList will be saved, as specified, for use at file creation time. When a task allocates a file in the file system, the mount option memory policy will be applied with a NodeList, if any, modified by the calling task's cpuset constraints -[See Documentation/cgroup-v1/cpusets.rst] and any optional flags, listed +[See Documentation/admin-guide/cgroup-v1/cpusets.rst] and any optional flags, listed below. If the resulting NodeLists is the empty set, the effective memory policy for the file will revert to "default" policy. diff --git a/Documentation/filesystems/xfs-self-describing-metadata.txt b/Documentation/filesystems/xfs-self-describing-metadata.txt index 68604e67a495..8db0121d0980 100644 --- a/Documentation/filesystems/xfs-self-describing-metadata.txt +++ b/Documentation/filesystems/xfs-self-describing-metadata.txt @@ -222,7 +222,7 @@ static void xfs_foo_read_verify( struct xfs_buf *bp) { - struct xfs_mount *mp = bp->b_target->bt_mount; + struct xfs_mount *mp = bp->b_mount; if ((xfs_sb_version_hascrc(&mp->m_sb) && !xfs_verify_cksum(bp->b_addr, BBTOB(bp->b_length), @@ -245,7 +245,7 @@ static bool xfs_foo_verify( struct xfs_buf *bp) { - struct xfs_mount *mp = bp->b_target->bt_mount; + struct xfs_mount *mp = bp->b_mount; struct xfs_ondisk_hdr *hdr = bp->b_addr; if (hdr->magic != cpu_to_be32(XFS_FOO_MAGIC)) @@ -272,7 +272,7 @@ static bool xfs_foo_verify( struct xfs_buf *bp) { - struct xfs_mount *mp = bp->b_target->bt_mount; + struct xfs_mount *mp = bp->b_mount; struct xfs_ondisk_hdr *hdr = bp->b_addr; if (hdr->magic == cpu_to_be32(XFS_FOO_CRC_MAGIC)) { @@ -297,7 +297,7 @@ static void xfs_foo_write_verify( struct xfs_buf *bp) { - struct xfs_mount 
*mp = bp->b_target->bt_mount; + struct xfs_mount *mp = bp->b_mount; struct xfs_buf_log_item *bip = bp->b_fspriv; if (!xfs_foo_verify(bp)) { diff --git a/Documentation/filesystems/xfs.txt b/Documentation/filesystems/xfs.txt deleted file mode 100644 index a5cbb5e0e3db..000000000000 --- a/Documentation/filesystems/xfs.txt +++ /dev/null @@ -1,470 +0,0 @@ - -The SGI XFS Filesystem -====================== - -XFS is a high performance journaling filesystem which originated -on the SGI IRIX platform. It is completely multi-threaded, can -support large files and large filesystems, extended attributes, -variable block sizes, is extent based, and makes extensive use of -Btrees (directories, extents, free space) to aid both performance -and scalability. - -Refer to the documentation at https://xfs.wiki.kernel.org/ -for further details. This implementation is on-disk compatible -with the IRIX version of XFS. - - -Mount Options -============= - -When mounting an XFS filesystem, the following options are accepted. -For boolean mount options, the names with the (*) suffix is the -default behaviour. - - allocsize=size - Sets the buffered I/O end-of-file preallocation size when - doing delayed allocation writeout (default size is 64KiB). - Valid values for this option are page size (typically 4KiB) - through to 1GiB, inclusive, in power-of-2 increments. - - The default behaviour is for dynamic end-of-file - preallocation size, which uses a set of heuristics to - optimise the preallocation size based on the current - allocation patterns within the file and the access patterns - to the file. Specifying a fixed allocsize value turns off - the dynamic behaviour. - - attr2 - noattr2 - The options enable/disable an "opportunistic" improvement to - be made in the way inline extended attributes are stored - on-disk. 
When the new form is used for the first time when - attr2 is selected (either when setting or removing extended - attributes) the on-disk superblock feature bit field will be - updated to reflect this format being in use. - - The default behaviour is determined by the on-disk feature - bit indicating that attr2 behaviour is active. If either - mount option it set, then that becomes the new default used - by the filesystem. - - CRC enabled filesystems always use the attr2 format, and so - will reject the noattr2 mount option if it is set. - - discard - nodiscard (*) - Enable/disable the issuing of commands to let the block - device reclaim space freed by the filesystem. This is - useful for SSD devices, thinly provisioned LUNs and virtual - machine images, but may have a performance impact. - - Note: It is currently recommended that you use the fstrim - application to discard unused blocks rather than the discard - mount option because the performance impact of this option - is quite severe. - - grpid/bsdgroups - nogrpid/sysvgroups (*) - These options define what group ID a newly created file - gets. When grpid is set, it takes the group ID of the - directory in which it is created; otherwise it takes the - fsgid of the current process, unless the directory has the - setgid bit set, in which case it takes the gid from the - parent directory, and also gets the setgid bit set if it is - a directory itself. - - filestreams - Make the data allocator use the filestreams allocation mode - across the entire filesystem rather than just on directories - configured to use it. - - ikeep - noikeep (*) - When ikeep is specified, XFS does not delete empty inode - clusters and keeps them around on disk. When noikeep is - specified, empty inode clusters are returned to the free - space pool. 
- - inode32 - inode64 (*) - When inode32 is specified, it indicates that XFS limits - inode creation to locations which will not result in inode - numbers with more than 32 bits of significance. - - When inode64 is specified, it indicates that XFS is allowed - to create inodes at any location in the filesystem, - including those which will result in inode numbers occupying - more than 32 bits of significance. - - inode32 is provided for backwards compatibility with older - systems and applications, since 64 bits inode numbers might - cause problems for some applications that cannot handle - large inode numbers. If applications are in use which do - not handle inode numbers bigger than 32 bits, the inode32 - option should be specified. - - - largeio - nolargeio (*) - If "nolargeio" is specified, the optimal I/O reported in - st_blksize by stat(2) will be as small as possible to allow - user applications to avoid inefficient read/modify/write - I/O. This is typically the page size of the machine, as - this is the granularity of the page cache. - - If "largeio" specified, a filesystem that was created with a - "swidth" specified will return the "swidth" value (in bytes) - in st_blksize. If the filesystem does not have a "swidth" - specified but does specify an "allocsize" then "allocsize" - (in bytes) will be returned instead. Otherwise the behaviour - is the same as if "nolargeio" was specified. - - logbufs=value - Set the number of in-memory log buffers. Valid numbers - range from 2-8 inclusive. - - The default value is 8 buffers. - - If the memory cost of 8 log buffers is too high on small - systems, then it may be reduced at some cost to performance - on metadata intensive workloads. The logbsize option below - controls the size of each buffer and so is also relevant to - this case. - - logbsize=value - Set the size of each in-memory log buffer. The size may be - specified in bytes, or in kilobytes with a "k" suffix. 
- Valid sizes for version 1 and version 2 logs are 16384 (16k) - and 32768 (32k). Valid sizes for version 2 logs also - include 65536 (64k), 131072 (128k) and 262144 (256k). The - logbsize must be an integer multiple of the log - stripe unit configured at mkfs time. - - The default value for for version 1 logs is 32768, while the - default value for version 2 logs is MAX(32768, log_sunit). - - logdev=device and rtdev=device - Use an external log (metadata journal) and/or real-time device. - An XFS filesystem has up to three parts: a data section, a log - section, and a real-time section. The real-time section is - optional, and the log section can be separate from the data - section or contained within it. - - noalign - Data allocations will not be aligned at stripe unit - boundaries. This is only relevant to filesystems created - with non-zero data alignment parameters (sunit, swidth) by - mkfs. - - norecovery - The filesystem will be mounted without running log recovery. - If the filesystem was not cleanly unmounted, it is likely to - be inconsistent when mounted in "norecovery" mode. - Some files or directories may not be accessible because of this. - Filesystems mounted "norecovery" must be mounted read-only or - the mount will fail. - - nouuid - Don't check for double mounted file systems using the file - system uuid. This is useful to mount LVM snapshot volumes, - and often used in combination with "norecovery" for mounting - read-only snapshots. - - noquota - Forcibly turns off all quota accounting and enforcement - within the filesystem. - - uquota/usrquota/uqnoenforce/quota - User disk quota accounting enabled, and limits (optionally) - enforced. Refer to xfs_quota(8) for further details. - - gquota/grpquota/gqnoenforce - Group disk quota accounting enabled and limits (optionally) - enforced. Refer to xfs_quota(8) for further details. - - pquota/prjquota/pqnoenforce - Project disk quota accounting enabled and limits (optionally) - enforced. 
Refer to xfs_quota(8) for further details.
-
-  sunit=value and swidth=value
-       Used to specify the stripe unit and width for a RAID device
-       or a stripe volume. "value" must be specified in 512-byte
-       block units. These options are only relevant to filesystems
-       that were created with non-zero data alignment parameters.
-
-       The sunit and swidth parameters specified must be compatible
-       with the existing filesystem alignment characteristics. In
-       general, that means the only valid changes to sunit are
-       increasing it by a power-of-2 multiple. Valid swidth values
-       are any integer multiple of a valid sunit value.
-
-       Typically the only time these mount options are necessary is
-       after an underlying RAID device has had its geometry
-       modified, such as adding a new disk to a RAID5 lun and
-       reshaping it.
-
-  swalloc
-       Data allocations will be rounded up to stripe width boundaries
-       when the current end of file is being extended and the file
-       size is larger than the stripe width size.
-
-  wsync
-       When specified, all filesystem namespace operations are
-       executed synchronously. This ensures that when the namespace
-       operation (create, unlink, etc) completes, the change to the
-       namespace is on stable storage. This is useful in HA setups
-       where failover must not result in clients seeing
-       inconsistent namespace presentation during or after a
-       failover event.
-
-
-Deprecated Mount Options
-========================
-
-  Name                          Removal Schedule
-  ----                          ----------------
-
-
-Removed Mount Options
-=====================
-
-  Name                          Removed
-  ----                          -------
-  delaylog/nodelaylog           v4.0
-  ihashsize                     v4.0
-  irixsgid                      v4.0
-  osyncisdsync/osyncisosync     v4.0
-  barrier                       v4.19
-  nobarrier                     v4.19
-
-
-sysctls
-=======
-
-The following sysctls are available for the XFS filesystem:
-
-  fs.xfs.stats_clear            (Min: 0  Default: 0  Max: 1)
-       Setting this to "1" clears accumulated XFS statistics
-       in /proc/fs/xfs/stat. It then immediately resets to "0".
-
-  fs.xfs.xfssyncd_centisecs     (Min: 100  Default: 3000  Max: 720000)
-       The interval at which the filesystem flushes metadata
-       out to disk and runs internal cache cleanup routines.
-
-  fs.xfs.filestream_centisecs   (Min: 1  Default: 3000  Max: 360000)
-       The interval at which the filesystem ages filestreams cache
-       references and returns timed-out AGs back to the free stream
-       pool.
-
-  fs.xfs.speculative_prealloc_lifetime
-               (Units: seconds  Min: 1  Default: 300  Max: 86400)
-       The interval at which the background scanning for inodes
-       with unused speculative preallocation runs. The scan
-       removes unused preallocation from clean inodes and releases
-       the unused space back to the free pool.
-
-  fs.xfs.error_level            (Min: 0  Default: 3  Max: 11)
-       A volume knob for error reporting when internal errors occur.
-       This will generate detailed messages & backtraces for filesystem
-       shutdowns, for example. Current threshold values are:
-
-               XFS_ERRLEVEL_OFF:       0
-               XFS_ERRLEVEL_LOW:       1
-               XFS_ERRLEVEL_HIGH:      5
-
-  fs.xfs.panic_mask             (Min: 0  Default: 0  Max: 256)
-       Causes certain error conditions to call BUG(). Value is a bitmask;
-       OR together the tags which represent errors which should cause panics:
-
-               XFS_NO_PTAG                     0
-               XFS_PTAG_IFLUSH                 0x00000001
-               XFS_PTAG_LOGRES                 0x00000002
-               XFS_PTAG_AILDELETE              0x00000004
-               XFS_PTAG_ERROR_REPORT           0x00000008
-               XFS_PTAG_SHUTDOWN_CORRUPT       0x00000010
-               XFS_PTAG_SHUTDOWN_IOERROR       0x00000020
-               XFS_PTAG_SHUTDOWN_LOGERROR      0x00000040
-               XFS_PTAG_FSBLOCK_ZERO           0x00000080
-               XFS_PTAG_VERIFIER_ERROR         0x00000100
-
-       This option is intended for debugging only.
-
-  fs.xfs.irix_symlink_mode      (Min: 0  Default: 0  Max: 1)
-       Controls whether symlinks are created with mode 0777 (default)
-       or whether their mode is affected by the umask (irix mode).
-
-  fs.xfs.irix_sgid_inherit      (Min: 0  Default: 0  Max: 1)
-       Controls files created in SGID directories.
-       If the group ID of the new file does not match the effective group
-       ID or one of the supplementary group IDs of the parent dir, the
-       ISGID bit is cleared if the irix_sgid_inherit compatibility sysctl
-       is set.
-
-  fs.xfs.inherit_sync           (Min: 0  Default: 1  Max: 1)
-       Setting this to "1" will cause the "sync" flag set
-       by the xfs_io(8) chattr command on a directory to be
-       inherited by files in that directory.
-
-  fs.xfs.inherit_nodump         (Min: 0  Default: 1  Max: 1)
-       Setting this to "1" will cause the "nodump" flag set
-       by the xfs_io(8) chattr command on a directory to be
-       inherited by files in that directory.
-
-  fs.xfs.inherit_noatime        (Min: 0  Default: 1  Max: 1)
-       Setting this to "1" will cause the "noatime" flag set
-       by the xfs_io(8) chattr command on a directory to be
-       inherited by files in that directory.
-
-  fs.xfs.inherit_nosymlinks     (Min: 0  Default: 1  Max: 1)
-       Setting this to "1" will cause the "nosymlinks" flag set
-       by the xfs_io(8) chattr command on a directory to be
-       inherited by files in that directory.
-
-  fs.xfs.inherit_nodefrag       (Min: 0  Default: 1  Max: 1)
-       Setting this to "1" will cause the "nodefrag" flag set
-       by the xfs_io(8) chattr command on a directory to be
-       inherited by files in that directory.
-
-  fs.xfs.rotorstep              (Min: 1  Default: 1  Max: 256)
-       In "inode32" allocation mode, this option determines how many
-       files the allocator attempts to allocate in the same allocation
-       group before moving to the next allocation group. The intent
-       is to control the rate at which the allocator moves between
-       allocation groups when allocating extents for new files.
-
-Deprecated Sysctls
-==================
-
-None at present.
-
-
-Removed Sysctls
-===============
-
-  Name                          Removed
-  ----                          -------
-  fs.xfs.xfsbufd_centisec       v4.0
-  fs.xfs.age_buffer_centisecs   v4.0
-
-
-Error handling
-==============
-
-XFS can act differently according to the type of error found during its
-operation.
The implementation introduces the following concepts to the error
-handler:
-
- -failure speed:
-       Defines how fast XFS should propagate an error upwards when a specific
-       error is found during the filesystem operation. It can propagate
-       immediately, after a defined number of retries, after a set time period,
-       or simply retry forever.
-
- -error classes:
-       Specifies the subsystem the error configuration will apply to, such as
-       metadata IO or memory allocation. Different subsystems will have
-       different error handlers for which behaviour can be configured.
-
- -error handlers:
-       Defines the behavior for a specific error.
-
-The filesystem behavior during an error can be set via sysfs files. Each
-error handler works independently - the first condition met by an error handler
-for a specific class will cause the error to be propagated rather than reset and
-retried.
-
-The action taken by the filesystem when the error is propagated is context
-dependent - it may cause a shut down in the case of an unrecoverable error,
-it may be reported back to userspace, or it may even be ignored because
-there's nothing useful we can do with the error or anyone we can report it to (e.g.
-during unmount).
-
-The configuration files are organized into the following hierarchy for each
-mounted filesystem:
-
-  /sys/fs/xfs/<dev>/error/<class>/<error>/
-
-Where:
-  <dev>
-       The short device name of the mounted filesystem. This is the same device
-       name that shows up in XFS kernel error messages as "XFS(<dev>): ..."
-
-  <class>
-       The subsystem the error configuration belongs to. As of 4.9, the defined
-       classes are:
-
-               - "metadata": applies to metadata buffer write IO
-
-  <error>
-       The individual error handler configurations.
-
-
-Each filesystem has "global" error configuration options defined in their top
-level directory:
-
-  /sys/fs/xfs/<dev>/error/
-
-  fail_at_unmount               (Min: 0  Default: 1  Max: 1)
-       Defines the filesystem error behavior at unmount time.
-
-       If set to a value of 1, XFS will override all other error configurations
-       during unmount and replace them with "immediate fail" characteristics.
-       i.e. no retries, no retry timeout. This will always allow unmount to
-       succeed when there are persistent errors present.
-
-       If set to 0, the configured retry behaviour will continue until all
-       retries and/or timeouts have been exhausted. This will delay unmount
-       completion when there are persistent errors, and it may prevent the
-       filesystem from ever unmounting fully in the case of "retry forever"
-       handler configurations.
-
-       Note: there is no guarantee that fail_at_unmount can be set while an
-       unmount is in progress. It is possible that the sysfs entries are
-       removed by the unmounting filesystem before a "retry forever" error
-       handler configuration causes unmount to hang, and hence the filesystem
-       must be configured appropriately before unmount begins to prevent
-       unmount hangs.
-
-Each filesystem has specific error class handlers that define the error
-propagation behaviour for specific errors. There is also a "default" error
-handler defined, which defines the behaviour for all errors that don't have
-specific handlers defined. Where multiple retry constraints are configured for
-a single error, the first retry configuration that expires will cause the error
-to be propagated. The handler configurations are found in the directory:
-
-  /sys/fs/xfs/<dev>/error/<class>/<error>/
-
-  max_retries                   (Min: -1  Default: Varies  Max: INTMAX)
-       Defines the allowed number of retries of a specific error before
-       the filesystem will propagate the error. The retry count for a given
-       error context (e.g. a specific metadata buffer) is reset every time
-       there is a successful completion of the operation.
-
-       Setting the value to "-1" will cause XFS to retry forever for this
-       specific error.
-
-       Setting the value to "0" will cause XFS to fail immediately when the
-       specific error is reported.
-
-       Setting the value to "N" (where 0 < N < Max) will make XFS retry the
-       operation "N" times before propagating the error.
-
-  retry_timeout_seconds         (Min: -1  Default: Varies  Max: 1 day)
-       Define the amount of time (in seconds) that the filesystem is
-       allowed to retry its operations when the specific error is
-       found.
-
-       Setting the value to "-1" will allow XFS to retry forever for this
-       specific error.
-
-       Setting the value to "0" will cause XFS to fail immediately when the
-       specific error is reported.
-
-       Setting the value to "N" (where 0 < N < Max) will allow XFS to retry the
-       operation for up to "N" seconds before propagating the error.
-
-Note: The default behaviour for a specific error handler is dependent on both
-the class and error context. For example, the default values for
-"metadata/ENODEV" are "0" rather than "-1" so that this error handler defaults
-to "fail immediately" behaviour. This is done because ENODEV is a fatal,
-unrecoverable error no matter how many times the metadata IO is retried.
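
The interaction between max_retries and retry_timeout_seconds in the removed
text (whichever constraint expires first causes the error to be propagated) can
be sketched as a small Python model. This is an illustration of the documented
semantics only, not the kernel implementation: the function name, its
parameters, and the exact boundary comparisons (">=") are assumptions made for
the example.

```python
RETRY_FOREVER = -1  # the "-1" value the text describes for both tunables


def should_propagate(max_retries, retry_timeout_seconds,
                     retries_done, elapsed_seconds):
    """Hypothetical model: True if the error is propagated, False if retried.

    retries_done and elapsed_seconds describe one error context, for
    example a single metadata buffer since its last successful write.
    """
    # The retry-count constraint expires once the allowed retries are used up.
    count_expired = (max_retries != RETRY_FOREVER
                     and retries_done >= max_retries)
    # The time constraint expires once the allowed retry window has passed.
    time_expired = (retry_timeout_seconds != RETRY_FOREVER
                    and elapsed_seconds >= retry_timeout_seconds)
    # The first constraint to expire wins.
    return count_expired or time_expired


if __name__ == "__main__":
    # Both tunables set to -1: retry forever.
    print(should_propagate(-1, -1, retries_done=1000, elapsed_seconds=86400))
    # max_retries=0, as in the metadata/ENODEV default: fail immediately.
    print(should_propagate(0, -1, retries_done=0, elapsed_seconds=0))
    # Unlimited retries, but the 60 second window has expired.
    print(should_propagate(-1, 60, retries_done=5, elapsed_seconds=60))
```

In this model a value of "0" in either tunable expires on the first check, which
matches the "fail immediately" wording above, and "-1" in one tunable does not
prevent the other from expiring.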