Diffstat (limited to 'Documentation/filesystems')
-rw-r--r-- Documentation/filesystems/erofs.rst 19
-rw-r--r-- Documentation/filesystems/ext4/globals.rst 1
-rw-r--r-- Documentation/filesystems/ext4/inodes.rst 10
-rw-r--r-- Documentation/filesystems/ext4/orphan.rst 52
-rw-r--r-- Documentation/filesystems/ext4/special_inodes.rst 17
-rw-r--r-- Documentation/filesystems/ext4/super.rst 15
-rw-r--r-- Documentation/filesystems/f2fs.rst 17
-rw-r--r-- Documentation/filesystems/index.rst 1
-rw-r--r-- Documentation/filesystems/locking.rst 2
-rw-r--r-- Documentation/filesystems/ntfs3.rst 106
-rw-r--r-- Documentation/filesystems/overlayfs.rst 3
-rw-r--r-- Documentation/filesystems/vfs.rst 2
12 files changed, 233 insertions, 12 deletions
diff --git a/Documentation/filesystems/erofs.rst b/Documentation/filesystems/erofs.rst
index 832839fcf4c3..b97579b7d8fb 100644
--- a/Documentation/filesystems/erofs.rst
+++ b/Documentation/filesystems/erofs.rst
@@ -84,6 +84,9 @@ cache_strategy=%s Select a strategy for cached decompression from now on:
It still does in-place I/O decompression
for the rest compressed physical clusters.
========== =============================================
+dax={always,never} Use direct access (no page cache). See
+ Documentation/filesystems/dax.rst.
+dax A legacy option which is an alias for ``dax=always``.
=================== =========================================================
On-disk details
@@ -153,13 +156,14 @@ may not. All metadatas can be now observed in two different spaces (views):
Xattrs, extents, data inline are followed by the corresponding inode with
proper alignment, and they could be optional for different data mappings.
- _currently_ total 4 valid data mappings are supported:
+ _currently_ total 5 data layouts are supported:
== ====================================================================
0 flat file data without data inline (no extent);
1 fixed-sized output data compression (with non-compacted indexes);
2 flat file data with tail packing data inline (no extent);
- 3 fixed-sized output data compression (with compacted indexes, v5.3+).
+ 3 fixed-sized output data compression (with compacted indexes, v5.3+);
+ 4 chunk-based file (v5.15+).
== ====================================================================
The size of the optional xattrs is indicated by i_xattr_count in inode
@@ -210,6 +214,17 @@ Note that apart from the offset of the first filename, nameoff0 also indicates
the total number of directory entries in this block since it is no need to
introduce another on-disk field at all.
+Chunk-based file
+----------------
+To support chunk-based data deduplication, a new inode data layout has been
+available since Linux v5.15: files are split into equal-sized data chunks, and
+the ``extents`` area of the inode metadata describes where each chunk's data
+lives, either as a simple array of 4-byte block addresses or as an array of
+8-byte chunk indexes (see struct erofs_inode_chunk_index in erofs_fs.h for
+details).
+
+Note that chunk-based files are currently always uncompressed.
+
Data compression
----------------
EROFS implements LZ4 fixed-sized output compression which generates fixed-sized
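The chunk lookup described in the erofs.rst hunk above can be modelled in a few lines of C. This is a hedged user-space sketch, not the kernel's implementation: the enum and function names are hypothetical, the chunk size is passed in as a power of two, holes and multi-device setups are ignored, and the 8-byte index entry is assumed to keep its 32-bit start block address in its last four bytes (check struct erofs_inode_chunk_index in erofs_fs.h before relying on that).

```c
#include <assert.h>
#include <stdint.h>

/* The two on-disk forms of the chunk table (hypothetical names). */
enum chunk_table_form {
	CHUNK_BLOCK_ARRAY,	/* one 4-byte block address per chunk */
	CHUNK_INDEX_ARRAY,	/* one 8-byte chunk index per chunk */
};

/* Load a little-endian 32-bit value from a byte buffer. */
static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/*
 * Map a byte offset inside a chunk-based file to a byte offset on disk.
 *   table:     raw bytes of the inode's chunk table ("extents" area)
 *   blkbits:   log2 of the block size
 *   chunkbits: log2 of the chunk size in bytes
 */
static uint64_t chunk_map(const uint8_t *table, enum chunk_table_form form,
			  unsigned int blkbits, unsigned int chunkbits,
			  uint64_t file_off)
{
	uint64_t chunk_nr, in_chunk;
	uint32_t blkaddr;

	assert(chunkbits >= blkbits);	/* a chunk is at least one block */
	chunk_nr = file_off >> chunkbits;
	in_chunk = file_off & ((UINT64_C(1) << chunkbits) - 1);

	if (form == CHUNK_BLOCK_ARRAY)
		blkaddr = get_le32(table + chunk_nr * 4);
	else	/* assumed: block address in bytes 4..7 of each 8-byte index */
		blkaddr = get_le32(table + chunk_nr * 8 + 4);

	/* chunks start on block boundaries; add the offset inside the chunk */
	return ((uint64_t)blkaddr << blkbits) + in_chunk;
}
```

For example, with 4 KiB blocks (blkbits = 12) and 8 KiB chunks (chunkbits = 13), byte 8197 of the file falls in chunk 1 at offset 5, so the result is that chunk's block address shifted left by 12, plus 5.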
diff --git a/Documentation/filesystems/ext4/globals.rst b/Documentation/filesystems/ext4/globals.rst
index 368bf7662b96..b17418974fd3 100644
--- a/Documentation/filesystems/ext4/globals.rst
+++ b/Documentation/filesystems/ext4/globals.rst
@@ -11,3 +11,4 @@ have static metadata at fixed locations.
.. include:: bitmaps.rst
.. include:: mmp.rst
.. include:: journal.rst
+.. include:: orphan.rst
diff --git a/Documentation/filesystems/ext4/inodes.rst b/Documentation/filesystems/ext4/inodes.rst
index a65baffb4ebf..6c5ce666e63f 100644
--- a/Documentation/filesystems/ext4/inodes.rst
+++ b/Documentation/filesystems/ext4/inodes.rst
@@ -498,11 +498,11 @@ structure -- inode change time (ctime), access time (atime), data
modification time (mtime), and deletion time (dtime). The four fields
are 32-bit signed integers that represent seconds since the Unix epoch
(1970-01-01 00:00:00 GMT), which means that the fields will overflow in
-January 2038. For inodes that are not linked from any directory but are
-still open (orphan inodes), the dtime field is overloaded for use with
-the orphan list. The superblock field ``s_last_orphan`` points to the
-first inode in the orphan list; dtime is then the number of the next
-orphaned inode, or zero if there are no more orphans.
+January 2038. If the filesystem does not have the orphan_file feature, inodes
+that are not linked from any directory but are still open (orphan inodes) have
+the dtime field overloaded for use with the orphan list. The superblock field
+``s_last_orphan`` points to the first inode in the orphan list; dtime is then
+the number of the next orphaned inode, or zero if there are no more orphans.
If the inode structure size ``sb->s_inode_size`` is larger than 128
bytes and the ``i_inode_extra`` field is large enough to encompass the
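The dtime-based orphan list described in the inodes.rst hunk above is an ordinary singly linked list threaded through the inode table. The following sketch walks it over a toy in-memory table; walk_orphan_chain() and the dtime[] array indexed by inode number are hypothetical stand-ins for reading real on-disk inodes, and the max bound doubles as protection against a corrupted, cyclic chain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Toy model: dtime[ino] is the i_dtime field of inode number ino. For an
 * orphan inode that field holds the number of the next orphan (0 = end).
 * Starting from s_last_orphan, store up to max inode numbers in out[] and
 * return how many were found. The max bound also stops a corrupted, cyclic
 * chain from looping forever.
 */
static size_t walk_orphan_chain(const uint32_t *dtime, size_t ninodes,
				uint32_t s_last_orphan, uint32_t *out,
				size_t max)
{
	size_t n = 0;
	uint32_t ino = s_last_orphan;

	while (ino != 0 && n < max) {
		assert(ino < ninodes);	/* chain must stay inside the table */
		out[n++] = ino;
		ino = dtime[ino];	/* i_dtime reused as "next orphan" */
	}
	return n;
}
```

Because every inode carries only the link to the next orphan, adding or removing an entry has to update this one filesystem-wide chain, which is the scalability problem the orphan file addresses.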
diff --git a/Documentation/filesystems/ext4/orphan.rst b/Documentation/filesystems/ext4/orphan.rst
new file mode 100644
index 000000000000..bb19ecd1b626
--- /dev/null
+++ b/Documentation/filesystems/ext4/orphan.rst
@@ -0,0 +1,52 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Orphan file
+-----------
+
+In Unix there can be inodes that are unlinked from the directory hierarchy but
+are still alive because they are open. In case of a crash the filesystem has
+to clean up these inodes, as otherwise they (and the blocks referenced from
+them) would leak. Similarly, when we truncate or extend a file, we may not be
+able to perform the operation in a single journalling transaction. In such a
+case we track the inode as an orphan so that, in case of a crash, extra blocks
+allocated to the file get truncated.
+
+Traditionally ext4 tracks orphan inodes in the form of a singly linked list where
+the superblock contains the inode number of the last orphan inode (s\_last\_orphan
+field) and each inode then contains the inode number of the previously orphaned
+inode (we overload the i\_dtime inode field for this). However, this
+filesystem-wide singly linked list is a scalability bottleneck for workloads
+that create many orphan inodes. When the orphan file feature
+(COMPAT\_ORPHAN\_FILE) is enabled, the filesystem has a special inode
+(referenced from the superblock through s\_orphan\_file\_inum) with several
+blocks. Each of these blocks has the following structure:
+
+.. list-table::
+ :widths: 8 8 24 40
+ :header-rows: 1
+
+ * - Offset
+ - Type
+ - Name
+ - Description
+ * - 0x0
+ - Array of \_\_le32 entries
+ - Orphan inode entries
+ - Each \_\_le32 entry is either empty (0) or contains the inode number of
+ an orphan inode.
+ * - blocksize - 8
+ - \_\_le32
+ - ob\_magic
+ - Magic value stored in orphan block tail (0x0b10ca04)
+ * - blocksize - 4
+ - \_\_le32
+ - ob\_checksum
+ - Checksum of the orphan block.
+
+When a filesystem with the orphan file feature is mounted read-write, we set the
+RO\_COMPAT\_ORPHAN\_PRESENT feature in the superblock to indicate there may
+be valid orphan entries. If we see this feature when mounting the filesystem,
+we read the whole orphan file and process all orphan inodes found there as
+usual. When cleanly unmounting the filesystem, we remove the
+RO\_COMPAT\_ORPHAN\_PRESENT feature to avoid unnecessary scanning of the orphan
+file and to make the filesystem fully compatible with older kernels.
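As a hedged illustration of the block layout in the orphan.rst table above, this sketch scans one orphan-file block: it checks ob_magic at blocksize - 8 and collects the non-zero __le32 entries that precede the 8-byte tail. The function name is hypothetical, and ob_checksum is deliberately not verified because the table does not say how it is computed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ORPHAN_BLOCK_MAGIC 0x0b10ca04u	/* ob_magic value from the table */

/* Load a little-endian 32-bit value from a byte buffer. */
static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/*
 * Scan one orphan-file block of blocksize bytes. Copies up to max non-empty
 * entries into out[] and returns how many were copied, or -1 if ob_magic
 * does not match. ob_checksum (the last 4 bytes) is NOT verified here.
 */
static int scan_orphan_block(const uint8_t *blk, size_t blocksize,
			     uint32_t *out, size_t max)
{
	size_t nentries, i;
	int found = 0;

	assert(blocksize >= 8 && blocksize % 4 == 0);
	if (get_le32(blk + blocksize - 8) != ORPHAN_BLOCK_MAGIC)
		return -1;

	/* every 4-byte slot before the 8-byte tail is an inode entry */
	nentries = (blocksize - 8) / 4;
	for (i = 0; i < nentries; i++) {
		uint32_t ino = get_le32(blk + i * 4);

		if (ino != 0 && (size_t)found < max)	/* 0 = empty slot */
			out[found++] = ino;
	}
	return found;
}
```

With a 4096-byte block this gives (4096 - 8) / 4 = 1022 entry slots per block, each of which can be claimed or cleared independently instead of relinking a global list.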
diff --git a/Documentation/filesystems/ext4/special_inodes.rst b/Documentation/filesystems/ext4/special_inodes.rst
index 9061aabba827..94f304e3a0a7 100644
--- a/Documentation/filesystems/ext4/special_inodes.rst
+++ b/Documentation/filesystems/ext4/special_inodes.rst
@@ -36,3 +36,20 @@ ext4 reserves some inode for special features, as follows:
* - 11
- Traditional first non-reserved inode. Usually this is the lost+found directory. See s\_first\_ino in the superblock.
+Note that there are also some inodes allocated from non-reserved inode numbers
+for other filesystem features which are not referenced from the standard directory
+hierarchy. These are generally referenced from the superblock. They are:
+
+.. list-table::
+ :widths: 20 50
+ :header-rows: 1
+
+ * - Superblock field
+ - Description
+
+ * - s\_lpf\_ino
+ - Inode number of the lost+found directory.
+ * - s\_prj\_quota\_inum
+ - Inode number of the quota file tracking project quotas.
+ * - s\_orphan\_file\_inum
+ - Inode number of the file tracking orphan inodes.
diff --git a/Documentation/filesystems/ext4/super.rst b/Documentation/filesystems/ext4/super.rst
index 2eb1ab20498d..f6a548e957bb 100644
--- a/Documentation/filesystems/ext4/super.rst
+++ b/Documentation/filesystems/ext4/super.rst
@@ -479,7 +479,11 @@ The ext4 superblock is laid out as follows in
- Filename charset encoding flags.
* - 0x280
- \_\_le32
- - s\_reserved[95]
+ - s\_orphan\_file\_inum
+ - Orphan file inode number.
+ * - 0x284
+ - \_\_le32
+ - s\_reserved[94]
- Padding to the end of the block.
* - 0x3FC
- \_\_le32
@@ -603,6 +607,11 @@ following:
the journal, JBD2 incompat feature
(JBD2\_FEATURE\_INCOMPAT\_FAST\_COMMIT) gets
set (COMPAT\_FAST\_COMMIT).
+ * - 0x1000
+ - Orphan file allocated (COMPAT\_ORPHAN\_FILE). This is the special file
+ for more efficient tracking of unlinked but still open inodes. When the
+ file may contain entries, we additionally set the corresponding rocompat
+ feature (RO\_COMPAT\_ORPHAN\_PRESENT).
.. _super_incompat:
@@ -713,6 +722,10 @@ the following:
- Filesystem tracks project quotas. (RO\_COMPAT\_PROJECT)
* - 0x8000
- Verity inodes may be present on the filesystem. (RO\_COMPAT\_VERITY)
+ * - 0x10000
+ - Indicates that the orphan file may have valid orphan entries and thus
+ they need to be cleaned up when mounting the filesystem
+ (RO\_COMPAT\_ORPHAN\_PRESENT).
.. _super_def_hash:
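The superblock changes above can be decoded from a raw copy of the 1024-byte ext4 superblock (which lives at byte offset 1024 of the device). This sketch assumes the standard little-endian field offsets s_feature_compat at 0x5C, s_feature_ro_compat at 0x64 and s_last_orphan at 0xE8, alongside the new s_orphan_file_inum at 0x280; the struct and function names are hypothetical, and the sketch only reports the fields without deciding how a kernel acts on them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Byte offsets within the superblock (assumed standard ext4 layout). */
#define SB_FEATURE_COMPAT	0x5C
#define SB_FEATURE_RO_COMPAT	0x64
#define SB_LAST_ORPHAN		0xE8
#define SB_ORPHAN_FILE_INUM	0x280

#define COMPAT_ORPHAN_FILE		0x1000
#define RO_COMPAT_ORPHAN_PRESENT	0x10000

/* Load a little-endian 32-bit value from a byte buffer. */
static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Hypothetical summary of the orphan-related superblock state. */
struct orphan_info {
	bool has_orphan_file;		/* COMPAT_ORPHAN_FILE is set */
	bool orphans_present;		/* RO_COMPAT_ORPHAN_PRESENT is set */
	uint32_t orphan_file_ino;	/* s_orphan_file_inum, or 0 */
	uint32_t last_orphan;		/* s_last_orphan, head of the old list */
};

static struct orphan_info decode_orphan_info(const uint8_t *sb)
{
	struct orphan_info oi;
	uint32_t compat = get_le32(sb + SB_FEATURE_COMPAT);
	uint32_t ro_compat = get_le32(sb + SB_FEATURE_RO_COMPAT);

	oi.has_orphan_file = (compat & COMPAT_ORPHAN_FILE) != 0;
	oi.orphans_present = (ro_compat & RO_COMPAT_ORPHAN_PRESENT) != 0;
	/* the inode number is only meaningful when the feature is enabled */
	oi.orphan_file_ino = oi.has_orphan_file ?
			     get_le32(sb + SB_ORPHAN_FILE_INUM) : 0;
	oi.last_orphan = get_le32(sb + SB_LAST_ORPHAN);
	return oi;
}
```

A filesystem that was cleanly unmounted would report has_orphan_file set but orphans_present clear, so the orphan file does not need to be scanned.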
diff --git a/Documentation/filesystems/f2fs.rst b/Documentation/filesystems/f2fs.rst
index ff9e7cc97c65..09de6ebbbdfa 100644
--- a/Documentation/filesystems/f2fs.rst
+++ b/Documentation/filesystems/f2fs.rst
@@ -185,6 +185,7 @@ fault_type=%d Support configuring fault injection type, should be
FAULT_KVMALLOC 0x000000002
FAULT_PAGE_ALLOC 0x000000004
FAULT_PAGE_GET 0x000000008
+ FAULT_ALLOC_BIO 0x000000010 (obsolete)
FAULT_ALLOC_NID 0x000000020
FAULT_ORPHAN 0x000000040
FAULT_BLOCK 0x000000080
@@ -195,6 +196,7 @@ fault_type=%d Support configuring fault injection type, should be
FAULT_CHECKPOINT 0x000001000
FAULT_DISCARD 0x000002000
FAULT_WRITE_IO 0x000004000
+ FAULT_SLAB_ALLOC 0x000008000
=================== ===========
mode=%s Control block allocation mode which supports "adaptive"
and "lfs". In "lfs" mode, there should be no random
@@ -312,6 +314,14 @@ inlinecrypt When possible, encrypt/decrypt the contents of encrypted
Documentation/block/inline-encryption.rst.
atgc Enable age-threshold garbage collection, it provides high
effectiveness and efficiency on background GC.
+discard_unit=%s Control the discard unit. The argument can be "block", "segment"
+ or "section"; the offset and size of each issued discard command
+ will be aligned to the unit. By default, "discard_unit=block"
+ is set, so that small discard functionality is enabled.
+ For blkzoned devices, "discard_unit=section" is set by
+ default; this helps large SMR or ZNS devices reduce memory
+ cost by dropping the fs metadata that supports small
+ discards.
======================== ============================================================
Debugfs Entries
@@ -857,8 +867,11 @@ Compression implementation
directly in order to guarantee potential data updates later to the space.
Instead, the main goal is to reduce data writes to flash disk as much as
possible, resulting in extending disk life time as well as relaxing IO
- congestion. Alternatively, we've added ioctl interface to reclaim compressed
- space and show it to user after putting the immutable bit.
+ congestion. Alternatively, we've added the ioctl(F2FS_IOC_RELEASE_COMPRESS_BLOCKS)
+ interface to reclaim compressed space and show it to the user after setting the
+ immutable bit. After the release, the immutable bit prevents writing to or
+ mmapping the file until compressed space is reserved again via
+ ioctl(F2FS_IOC_RESERVE_COMPRESS_BLOCKS) or the file size is truncated to zero.
Compress metadata layout::
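The release/reserve cycle described in the f2fs compression hunk above could be driven from user space roughly as follows. The ioctl numbers are an assumption mirroring the f2fs definitions (magic 0xf5, numbers 18 and 19, with a __u64 argument that receives the block count); verify them against your kernel headers. The wrapper name is hypothetical, and whether a read-only descriptor is accepted is also an assumption that may depend on the kernel version.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Assumed to mirror the f2fs ioctl definitions; verify against headers. */
#define F2FS_IOCTL_MAGIC 0xf5
#define F2FS_IOC_RELEASE_COMPRESS_BLOCKS _IOR(F2FS_IOCTL_MAGIC, 18, uint64_t)
#define F2FS_IOC_RESERVE_COMPRESS_BLOCKS _IOR(F2FS_IOCTL_MAGIC, 19, uint64_t)

/*
 * Hypothetical wrapper: issue cmd (one of the two ioctls above) on path.
 * Returns 0 and stores the kernel-reported block count in *blocks on
 * success, or -1 with errno set on failure.
 */
static int f2fs_compress_blocks_ioctl(const char *path, unsigned long cmd,
				      uint64_t *blocks)
{
	int fd, ret, saved_errno;

	fd = open(path, O_RDONLY);	/* assumption: read-only fd suffices */
	if (fd < 0)
		return -1;

	ret = ioctl(fd, cmd, blocks);
	saved_errno = errno;		/* keep ioctl's errno across close() */
	close(fd);
	errno = saved_errno;
	return ret < 0 ? -1 : 0;
}
```

A typical sequence would be to call it with F2FS_IOC_RELEASE_COMPRESS_BLOCKS on a compressed file, and later with F2FS_IOC_RESERVE_COMPRESS_BLOCKS before the file needs to be written again. On a file outside f2fs the call simply fails, typically with ENOTTY.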
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index 1a2dd4d35717..c0ad233963ae 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -101,6 +101,7 @@ Documentation for filesystem implementations.
nilfs2
nfs/index
ntfs
+ ntfs3
ocfs2
ocfs2-online-filecheck
omfs
diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
index 2a75dd5da7b5..d36fe79167b3 100644
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -70,7 +70,7 @@ prototypes::
const char *(*get_link) (struct dentry *, struct inode *, struct delayed_call *);
void (*truncate) (struct inode *);
int (*permission) (struct inode *, int, unsigned int);
- int (*get_acl)(struct inode *, int);
+ struct posix_acl * (*get_acl)(struct inode *, int, bool);
int (*setattr) (struct dentry *, struct iattr *);
int (*getattr) (const struct path *, struct kstat *, u32, unsigned int);
ssize_t (*listxattr) (struct dentry *, char *, size_t);
diff --git a/Documentation/filesystems/ntfs3.rst b/Documentation/filesystems/ntfs3.rst
new file mode 100644
index 000000000000..ffe9ea0c1499
--- /dev/null
+++ b/Documentation/filesystems/ntfs3.rst
@@ -0,0 +1,106 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=====
+NTFS3
+=====
+
+
+Summary and Features
+====================
+
+NTFS3 is a fully functional NTFS read-write driver. The driver works with
+NTFS versions up to 3.1 and supports normal, compressed and sparse files
+as well as journal replaying. The file system type to use on mount is 'ntfs3'.
+
+- This driver implements NTFS read/write support for normal, sparse and
+ compressed files.
+- Supports native journal replaying.
+- Supports extended attributes.
+ Predefined extended attributes:
+ - 'system.ntfs_security' gets/sets the security
+ descriptor (SECURITY_DESCRIPTOR_RELATIVE).
+ - 'system.ntfs_attrib' gets/sets NTFS file/dir attributes.
+ Note: applied to empty files, this allows switching the type between
+ sparse (0x200), compressed (0x800) and normal.
+- Supports NFS export of mounted NTFS volumes.
+
+Mount Options
+=============
+
+The list below describes the mount options supported by the NTFS3 driver in
+addition to the generic ones.
+
+===============================================================================
+
+nls=name This option informs the driver how to interpret path
+ strings and translate them to Unicode and back. If
+ this option is not set, the default codepage will be
+ used (CONFIG_NLS_DEFAULT).
+ Example:
+ 'nls=utf8'
+
+uid=
+gid=
+umask= Controls the default permissions for files/directories created
+ after the NTFS volume is mounted.
+
+fmask=
+dmask= Instead of specifying umask which applies both to
+ files and directories, fmask applies only to files and
+ dmask only to directories.
+
+nohidden Files with the Windows-specific HIDDEN (FILE_ATTRIBUTE_HIDDEN)
+ attribute will not be shown under Linux.
+
+sys_immutable Files with the Windows-specific SYSTEM
+ (FILE_ATTRIBUTE_SYSTEM) attribute will be marked as system
+ immutable files.
+
+discard Enable support for the TRIM command for improved performance
+ on delete operations, which is recommended for use with
+ solid-state drives (SSDs).
+
+force Forces the driver to mount partitions even if the 'dirty' flag
+ (volume dirty) is set. Not recommended for use.
+
+sparse Create new files as "sparse".
+
+showmeta Use this parameter to show all meta-files (System Files) on
+ a mounted NTFS partition.
+ By default, all meta-files are hidden.
+
+prealloc Preallocate extra space for files when the file size grows
+ on writes. This decreases fragmentation in case of
+ parallel write operations to different files.
+
+no_acs_rules "No access rules" mount option sets access rights for
+ files/folders to 777 and owner/group to root. This mount
+ option overrides all other permissions:
+ - permission changes for files/folders will be reported
+ as successful, but they will remain 777;
+ - owner/group changes will be reported as successful, but
+ they will stay as root.
+
+acl Support POSIX ACLs (Access Control Lists). Effective if
+ supported by the kernel. Not to be confused with NTFS ACLs.
+ The option specified as acl enables support for POSIX ACLs.
+
+noatime Files and directories will not have their last access time
+ attribute updated if a partition is mounted with this parameter.
+ This option can speed up file system operation.
+
+===============================================================================
+
+ToDo list
+=========
+
+- Full journaling support (currently journal replaying is supported) over JBD.
+
+
+References
+==========
+https://www.paragon-software.com/home/ntfs-linux-professional/
+ - Commercial version of the NTFS driver for Linux.
+
+almaz.alexandrovich@paragon-software.com
+ - Direct e-mail address for feedback and requests on the NTFS3 implementation.
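The mount options listed in ntfs3.rst are combined into one comma-separated string, which is what mount -o and the data argument of mount(2) expect. The helper below is a hypothetical sketch that models only a handful of the options; masks are printed in octal, and an over-long result is silently truncated.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical settings holder covering a few of the options above. */
struct ntfs3_opts {
	const char *nls;		/* e.g. "utf8"; NULL = CONFIG_NLS_DEFAULT */
	int uid, gid;			/* -1 leaves the option out */
	unsigned int fmask, dmask;	/* printed in octal */
	int discard, prealloc, acl;	/* boolean flags */
};

/* Append one option to buf, inserting a comma separator when needed. */
static void add_opt(char *buf, size_t len, const char *fmt, ...)
{
	size_t used = strlen(buf);
	va_list ap;

	if (used + 1 >= len)
		return;			/* no room left; result is truncated */
	if (used > 0) {
		buf[used++] = ',';
		buf[used] = '\0';
	}
	va_start(ap, fmt);
	vsnprintf(buf + used, len - used, fmt, ap);
	va_end(ap);
}

/* Build the option string for mount -o or mount(2)'s data argument. */
static void ntfs3_opts_str(const struct ntfs3_opts *o, char *buf, size_t len)
{
	if (len == 0)
		return;
	buf[0] = '\0';
	if (o->nls)
		add_opt(buf, len, "nls=%s", o->nls);
	if (o->uid >= 0)
		add_opt(buf, len, "uid=%d", o->uid);
	if (o->gid >= 0)
		add_opt(buf, len, "gid=%d", o->gid);
	add_opt(buf, len, "fmask=%04o", o->fmask);
	add_opt(buf, len, "dmask=%04o", o->dmask);
	if (o->discard)
		add_opt(buf, len, "discard");
	if (o->prealloc)
		add_opt(buf, len, "prealloc");
	if (o->acl)
		add_opt(buf, len, "acl");
}
```

The resulting string could then be passed as the last argument of mount("/dev/sdb1", "/mnt/win", "ntfs3", 0, buf) from <sys/mount.h>, where the device and mount point are placeholders.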
diff --git a/Documentation/filesystems/overlayfs.rst b/Documentation/filesystems/overlayfs.rst
index 455ca86eb4fc..7da6c30ed596 100644
--- a/Documentation/filesystems/overlayfs.rst
+++ b/Documentation/filesystems/overlayfs.rst
@@ -427,6 +427,9 @@ b) If a file residing on a lower layer is opened for read-only and then
memory mapped with MAP_SHARED, then subsequent changes to the file are not
reflected in the memory mapping.
+c) If a file residing on a lower layer is being executed, then opening that
+file for write or truncating the file will not be denied with ETXTBSY.
+
The following options allow overlayfs to act more like a standards
compliant filesystem:
diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index 14c31eced416..bf5c48066fac 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -432,7 +432,7 @@ As of kernel 2.6.22, the following members are defined:
const char *(*get_link) (struct dentry *, struct inode *,
struct delayed_call *);
int (*permission) (struct user_namespace *, struct inode *, int);
- int (*get_acl)(struct inode *, int);
+ struct posix_acl * (*get_acl)(struct inode *, int, bool);
int (*setattr) (struct user_namespace *, struct dentry *, struct iattr *);
int (*getattr) (struct user_namespace *, const struct path *, struct kstat *, u32, unsigned int);
ssize_t (*listxattr) (struct dentry *, char *, size_t);
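Both locking.rst and vfs.rst now document ->get_acl() returning a struct posix_acl * and taking a third bool argument, which indicates an RCU-walk (non-blocking) call. The sketch below shows the usual pattern, as we understand the convention, for a filesystem that may need to sleep: refuse RCU mode with -ECHILD so that the lookup is retried in ref-walk mode. Everything above demo_get_acl() is a user-space stand-in for kernel headers, and demo_get_acl() and its per-inode ACL fields are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* --- user-space stand-ins for kernel definitions (not the real headers) --- */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

#define ACL_TYPE_ACCESS  0x8000
#define ACL_TYPE_DEFAULT 0x4000

struct posix_acl { int a_count; };
struct inode {
	struct posix_acl *access_acl;	/* hypothetical per-inode storage */
	struct posix_acl *default_acl;
};
/* --- end of stand-ins --- */

/*
 * Sketch of a ->get_acl() implementation with the new signature. A
 * filesystem that might sleep (e.g. to read an xattr from disk) cannot
 * serve an RCU-walk call, so it returns -ECHILD when rcu is true.
 */
static struct posix_acl *demo_get_acl(struct inode *inode, int type, bool rcu)
{
	if (rcu)
		return ERR_PTR(-ECHILD);

	switch (type) {
	case ACL_TYPE_ACCESS:
		return inode->access_acl;	/* NULL means "no ACL" */
	case ACL_TYPE_DEFAULT:
		return inode->default_acl;
	default:
		return ERR_PTR(-EINVAL);
	}
}
```

Returning an error pointer rather than an int lets a single return value carry either the ACL, NULL for "no ACL", or an errno such as -ECHILD.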