Diffstat (limited to 'Documentation/filesystems')
-rw-r--r-- | Documentation/filesystems/ext2.rst | 2
-rw-r--r-- | Documentation/filesystems/ext4/blockmap.rst | 2
-rw-r--r-- | Documentation/filesystems/f2fs.rst | 13
-rw-r--r-- | Documentation/filesystems/fscrypt.rst | 22
-rw-r--r-- | Documentation/filesystems/fsverity.rst | 53
-rw-r--r-- | Documentation/filesystems/locking.rst | 9
-rw-r--r-- | Documentation/filesystems/netfs_library.rst | 8
-rw-r--r-- | Documentation/filesystems/overlayfs.rst | 2
-rw-r--r-- | Documentation/filesystems/porting.rst | 8
-rw-r--r-- | Documentation/filesystems/vfs.rst | 65
10 files changed, 111 insertions, 73 deletions
diff --git a/Documentation/filesystems/ext2.rst b/Documentation/filesystems/ext2.rst
index 154101cf0e4f..92aae683e16a 100644
--- a/Documentation/filesystems/ext2.rst
+++ b/Documentation/filesystems/ext2.rst
@@ -59,8 +59,6 @@ acl Enable POSIX Access Control Lists support
 (requires CONFIG_EXT2_FS_POSIX_ACL).
 noacl Don't support POSIX ACLs.

-nobh Do not attach buffer_heads to file pagecache.
-
 quota, usrquota Enable user disk quota support
 (requires CONFIG_QUOTA).
diff --git a/Documentation/filesystems/ext4/blockmap.rst b/Documentation/filesystems/ext4/blockmap.rst
index 2bd990402a5c..cc596541ce79 100644
--- a/Documentation/filesystems/ext4/blockmap.rst
+++ b/Documentation/filesystems/ext4/blockmap.rst
@@ -1,7 +1,7 @@
 .. SPDX-License-Identifier: GPL-2.0

 +---------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
-| i.i_block Offset | Where It Points |
+| i.i_block Offset | Where It Points |
 +=====================+==============================================================================================================================================================================================================================+
 | 0 to 11 | Direct map to file blocks 0 to 11. |
 +---------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
diff --git a/Documentation/filesystems/f2fs.rst b/Documentation/filesystems/f2fs.rst
index ad8dc8c040a2..98dc24f5c6f0 100644
--- a/Documentation/filesystems/f2fs.rst
+++ b/Documentation/filesystems/f2fs.rst
@@ -818,10 +818,11 @@ Compression implementation
 Instead, the main goal is to reduce data writes to flash disk as much as
 possible, resulting in extending disk life time as well as relaxing IO
 congestion. Alternatively, we've added ioctl(F2FS_IOC_RELEASE_COMPRESS_BLOCKS)
- interface to reclaim compressed space and show it to user after putting the
- immutable bit. Immutable bit, after release, it doesn't allow writing/mmaping
- on the file, until reserving compressed space via
- ioctl(F2FS_IOC_RESERVE_COMPRESS_BLOCKS) or truncating filesize to zero.
+ interface to reclaim compressed space and show it to user after setting a
+ special flag to the inode. Once the compressed space is released, the flag
+ will block writing data to the file until either the compressed space is
+ reserved via ioctl(F2FS_IOC_RESERVE_COMPRESS_BLOCKS) or the file size is
+ truncated to zero.

 Compress metadata layout::

@@ -830,12 +831,12 @@ Compress metadata layout::
 | cluster 1 | cluster 2 | ......... | cluster N |
 +-----------------------------------------------+
 . . . .
- . . . .
+ . . . .
 . Compressed Cluster . . Normal Cluster .
 +----------+---------+---------+---------+ +---------+---------+---------+---------+
 |compr flag| block 1 | block 2 | block 3 | | block 1 | block 2 | block 3 | block 4 |
 +----------+---------+---------+---------+ +---------+---------+---------+---------+
- . .
+ . .
 . .
 . .
 +-------------+-------------+----------+----------------------------+
diff --git a/Documentation/filesystems/fscrypt.rst b/Documentation/filesystems/fscrypt.rst
index 2e9aaa295125..5ba5817c17c2 100644
--- a/Documentation/filesystems/fscrypt.rst
+++ b/Documentation/filesystems/fscrypt.rst
@@ -337,6 +337,7 @@ Currently, the following pairs of encryption modes are supported:
 - AES-256-XTS for contents and AES-256-CTS-CBC for filenames
 - AES-128-CBC for contents and AES-128-CTS-CBC for filenames
 - Adiantum for both contents and filenames
+- AES-256-XTS for contents and AES-256-HCTR2 for filenames (v2 policies only)

 If unsure, you should use the (AES-256-XTS, AES-256-CTS-CBC) pair.

@@ -357,6 +358,17 @@ To use Adiantum, CONFIG_CRYPTO_ADIANTUM must be enabled. Also, fast
 implementations of ChaCha and NHPoly1305 should be enabled, e.g.
 CONFIG_CRYPTO_CHACHA20_NEON and CONFIG_CRYPTO_NHPOLY1305_NEON for ARM.

+AES-256-HCTR2 is another true wide-block encryption mode that is intended for
+use on CPUs with dedicated crypto instructions. AES-256-HCTR2 has the property
+that a bitflip in the plaintext changes the entire ciphertext. This property
+makes it desirable for filename encryption since initialization vectors are
+reused within a directory. For more details on AES-256-HCTR2, see the paper
+"Length-preserving encryption with HCTR2"
+(https://eprint.iacr.org/2021/1441.pdf). To use AES-256-HCTR2,
+CONFIG_CRYPTO_HCTR2 must be enabled. Also, fast implementations of XCTR and
+POLYVAL should be enabled, e.g. CRYPTO_POLYVAL_ARM64_CE and
+CRYPTO_AES_ARM64_CE_BLK for ARM64.
+
 New encryption modes can be added relatively easily, without changes
 to individual filesystems. However, authenticated encryption (AE)
 modes are not currently supported because of the difficulty of dealing
@@ -404,11 +416,11 @@ alternatively has the file's nonce (for `DIRECT_KEY policies`_) or
 inode number (for `IV_INO_LBLK_64 policies`_) included in the IVs.
 Thus, IV reuse is limited to within a single directory.

-With CTS-CBC, the IV reuse means that when the plaintext filenames
-share a common prefix at least as long as the cipher block size (16
-bytes for AES), the corresponding encrypted filenames will also share
-a common prefix. This is undesirable. Adiantum does not have this
-weakness, as it is a wide-block encryption mode.
+With CTS-CBC, the IV reuse means that when the plaintext filenames share a
+common prefix at least as long as the cipher block size (16 bytes for AES), the
+corresponding encrypted filenames will also share a common prefix. This is
+undesirable. Adiantum and HCTR2 do not have this weakness, as they are
+wide-block encryption modes.

 All supported filenames encryption modes accept any plaintext length
 >= 16 bytes; cipher block alignment is not required. However,
diff --git a/Documentation/filesystems/fsverity.rst b/Documentation/filesystems/fsverity.rst
index 756f2c215ba1..cb8e7573882a 100644
--- a/Documentation/filesystems/fsverity.rst
+++ b/Documentation/filesystems/fsverity.rst
@@ -11,9 +11,9 @@ Introduction

 fs-verity (``fs/verity/``) is a support layer that filesystems can
 hook into to support transparent integrity and authenticity protection
-of read-only files. Currently, it is supported by the ext4 and f2fs
-filesystems. Like fscrypt, not too much filesystem-specific code is
-needed to support fs-verity.
+of read-only files. Currently, it is supported by the ext4, f2fs, and
+btrfs filesystems. Like fscrypt, not too much filesystem-specific
+code is needed to support fs-verity.

 fs-verity is similar to `dm-verity
 <https://www.kernel.org/doc/Documentation/device-mapper/verity.txt>`_
@@ -473,9 +473,9 @@ files being swapped around.
 Filesystem support
 ==================

-fs-verity is currently supported by the ext4 and f2fs filesystems.
-The CONFIG_FS_VERITY kconfig option must be enabled to use fs-verity
-on either filesystem.
+fs-verity is supported by several filesystems, described below. The
+CONFIG_FS_VERITY kconfig option must be enabled to use fs-verity on
+any of these filesystems.

 ``include/linux/fsverity.h`` declares the interface between the
 ``fs/verity/`` support layer and filesystems. Briefly, filesystems
@@ -544,6 +544,13 @@ Currently, f2fs verity only supports a Merkle tree block size of 4096.
 Also, f2fs doesn't support enabling verity on files that currently
 have atomic or volatile writes pending.

+btrfs
+-----
+
+btrfs supports fs-verity since Linux v5.15. Verity-enabled inodes are
+marked with a RO_COMPAT inode flag, and the verity metadata is stored
+in separate btree items.
+
 Implementation details
 ======================

@@ -622,14 +629,14 @@ workqueue, and then the workqueue work does the decryption or
 verification. Finally, pages where no decryption or verity error
 occurred are marked Uptodate, and the pages are unlocked.

-Files on ext4 and f2fs may contain holes. Normally, ``->readahead()``
-simply zeroes holes and sets the corresponding pages Uptodate; no bios
-are issued. To prevent this case from bypassing fs-verity, these
-filesystems use fsverity_verify_page() to verify hole pages.
+On many filesystems, files can contain holes. Normally,
+``->readahead()`` simply zeroes holes and sets the corresponding pages
+Uptodate; no bios are issued. To prevent this case from bypassing
+fs-verity, these filesystems use fsverity_verify_page() to verify hole
+pages.

-ext4 and f2fs disable direct I/O on verity files, since otherwise
-direct I/O would bypass fs-verity. (They also do the same for
-encrypted files.)
+Filesystems also disable direct I/O on verity files, since otherwise
+direct I/O would bypass fs-verity.

 Userspace utility
 =================
@@ -648,7 +655,7 @@ Tests
 To test fs-verity, use xfstests. For example, using
 `kvm-xfstests <https://github.com/tytso/xfstests-bld/blob/master/Documentation/kvm-quickstart.md>`_::

- kvm-xfstests -c ext4,f2fs -g verity
+ kvm-xfstests -c ext4,f2fs,btrfs -g verity

 FAQ
 ===
@@ -771,15 +778,15 @@ weren't already directly answered in other parts of this document.
 e.g. magically trigger construction of a Merkle tree.

 :Q: Does fs-verity support remote filesystems?
-:A: Only ext4 and f2fs support is implemented currently, but in
- principle any filesystem that can store per-file verity metadata
- can support fs-verity, regardless of whether it's local or remote.
- Some filesystems may have fewer options of where to store the
- verity metadata; one possibility is to store it past the end of
- the file and "hide" it from userspace by manipulating i_size. The
- data verification functions provided by ``fs/verity/`` also assume
- that the filesystem uses the Linux pagecache, but both local and
- remote filesystems normally do so.
+:A: So far all filesystems that have implemented fs-verity support are
+ local filesystems, but in principle any filesystem that can store
+ per-file verity metadata can support fs-verity, regardless of
+ whether it's local or remote. Some filesystems may have fewer
+ options of where to store the verity metadata; one possibility is
+ to store it past the end of the file and "hide" it from userspace
+ by manipulating i_size. The data verification functions provided
+ by ``fs/verity/`` also assume that the filesystem uses the Linux
+ pagecache, but both local and remote filesystems normally do so.

 :Q: Why is anything filesystem-specific at all? Shouldn't fs-verity
 be implemented entirely at the VFS level?
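The fscrypt.rst hunk above adds (AES-256-XTS, AES-256-HCTR2) as a mode pair that is only valid with v2 policies. The sketch below shows how such a policy could be filled in. It assumes userspace headers that provide ``struct fscrypt_policy_v2`` in ``<linux/fscrypt.h>``; the ``FSCRYPT_MODE_AES_256_HCTR2`` fallback value is an assumption taken from the uapi header. The helper ``make_hctr2_policy()`` is hypothetical. The key identifier would come from adding a master key with FS_IOC_ADD_ENCRYPTION_KEY.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <linux/fscrypt.h>

#ifndef FSCRYPT_MODE_AES_256_HCTR2
#define FSCRYPT_MODE_AES_256_HCTR2 10 /* assumed value from uapi fscrypt.h */
#endif

/* Hypothetical helper: AES-256-XTS for contents, AES-256-HCTR2 for
 * filenames. HCTR2 is only accepted with a v2 policy. */
static void make_hctr2_policy(struct fscrypt_policy_v2 *p,
                              const uint8_t key_id[FSCRYPT_KEY_IDENTIFIER_SIZE])
{
    assert(p != NULL && key_id != NULL);
    memset(p, 0, sizeof(*p));                 /* reserved bytes must be zero */
    p->version = FSCRYPT_POLICY_V2;
    p->contents_encryption_mode = FSCRYPT_MODE_AES_256_XTS;
    p->filenames_encryption_mode = FSCRYPT_MODE_AES_256_HCTR2;
    p->flags = FSCRYPT_POLICY_FLAGS_PAD_32;   /* pad filenames to 32 bytes */
    memcpy(p->master_key_identifier, key_id, FSCRYPT_KEY_IDENTIFIER_SIZE);
}
```

The filled-in policy would then be applied to an empty directory with ``ioctl(dirfd, FS_IOC_SET_ENCRYPTION_POLICY, &policy)``. Per the hunk, the kernel also needs CONFIG_CRYPTO_HCTR2.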
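With btrfs added, the fsverity.rst hunks above route every supported filesystem through the same ``fs/verity/`` layer. As a result, userspace enables verity identically on ext4, f2fs and btrfs. The sketch below uses ``FS_IOC_ENABLE_VERITY`` from ``<linux/fsverity.h>``. The helper names are hypothetical. The 4096-byte block size is an assumption, chosen because the hunk's context notes that f2fs only supports that Merkle tree block size.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/fsverity.h>

/* Hypothetical helper: SHA-256 Merkle tree, 4096-byte blocks,
 * no salt and no builtin signature. */
static void make_enable_arg(struct fsverity_enable_arg *arg)
{
    assert(arg != NULL);
    memset(arg, 0, sizeof(*arg));   /* salt/sig sizes and reserved fields = 0 */
    arg->version = 1;
    arg->hash_algorithm = FS_VERITY_HASH_ALG_SHA256;
    arg->block_size = 4096;
}

/* Hypothetical wrapper: enable fs-verity on an open file.
 * Returns 0 or a negative errno. */
static int enable_verity(int fd)
{
    struct fsverity_enable_arg arg;

    make_enable_arg(&arg);
    return ioctl(fd, FS_IOC_ENABLE_VERITY, &arg) < 0 ? -errno : 0;
}
```

The fs-verity documentation requires the descriptor to be opened O_RDONLY and the file not to be open for writing by any process when this ioctl runs.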
diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
index c0fe711f14d3..4bb2627026ec 100644
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -252,9 +252,8 @@ prototypes::
 bool (*release_folio)(struct folio *, gfp_t);
 void (*free_folio)(struct folio *);
 int (*direct_IO)(struct kiocb *, struct iov_iter *iter);
- bool (*isolate_page) (struct page *, isolate_mode_t);
- int (*migratepage)(struct address_space *, struct page *, struct page *);
- void (*putback_page) (struct page *);
+ int (*migrate_folio)(struct address_space *, struct folio *dst,
+ struct folio *src, enum migrate_mode);
 int (*launder_folio)(struct folio *);
 bool (*is_partially_uptodate)(struct folio *, size_t from, size_t count);
 int (*error_remove_page)(struct address_space *, struct page *);
@@ -280,9 +279,7 @@ invalidate_folio: yes exclusive
 release_folio: yes
 free_folio: yes
 direct_IO:
-isolate_page: yes
-migratepage: yes (both)
-putback_page: yes
+migrate_folio: yes (both)
 launder_folio: yes
 is_partially_uptodate: yes
 error_remove_page: yes
diff --git a/Documentation/filesystems/netfs_library.rst b/Documentation/filesystems/netfs_library.rst
index 4d19b19bcc08..73a4176144b3 100644
--- a/Documentation/filesystems/netfs_library.rst
+++ b/Documentation/filesystems/netfs_library.rst
@@ -301,7 +301,7 @@ through which it can issue requests and negotiate::
 void (*issue_read)(struct netfs_io_subrequest *subreq);
 bool (*is_still_valid)(struct netfs_io_request *rreq);
 int (*check_write_begin)(struct file *file, loff_t pos, unsigned len,
- struct folio *folio, void **_fsdata);
+ struct folio **foliop, void **_fsdata);
 void (*done)(struct netfs_io_request *rreq);
 };

@@ -381,8 +381,10 @@ The operations are as follows:
 allocated/grabbed the folio to be modified to allow the filesystem to
 flush conflicting state before allowing it to be modified.

- It should return 0 if everything is now fine, -EAGAIN if the folio should be
- regrabbed and any other error code to abort the operation.
+ It may unlock and discard the folio it was given and set the caller's folio
+ pointer to NULL. It should return 0 if everything is now fine (``*foliop``
+ left set) or the op should be retried (``*foliop`` cleared) and any other
+ error code to abort the operation.

 * ``done``

diff --git a/Documentation/filesystems/overlayfs.rst b/Documentation/filesystems/overlayfs.rst
index 7da6c30ed596..4c76fda07645 100644
--- a/Documentation/filesystems/overlayfs.rst
+++ b/Documentation/filesystems/overlayfs.rst
@@ -607,7 +607,7 @@ can be removed.
 User xattr
 ----------

-The the "-o userxattr" mount option forces overlayfs to use the
+The "-o userxattr" mount option forces overlayfs to use the
 "user.overlay." xattr namespace instead of "trusted.overlay.". This is
 useful for unprivileged mounting of overlayfs.

diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst
index 2e0e4f0e0c6f..aee9aaf9f3df 100644
--- a/Documentation/filesystems/porting.rst
+++ b/Documentation/filesystems/porting.rst
@@ -914,3 +914,11 @@ Calling conventions for file_open_root() changed; now it takes struct path *
 instead of passing mount and dentry separately. For callers that used to
 pass <mnt, mnt->mnt_root> pair (i.e. the root of given mount), a new helper
 is provided - file_open_root_mnt(). In-tree users adjusted.
+
+---
+
+**mandatory**
+
+no_llseek is gone; don't set .llseek to that - just leave it NULL instead.
+Checks for "does that file have llseek(2), or should it fail with ESPIPE"
+should be done by looking at FMODE_LSEEK in file->f_mode.
diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index 08069ecd49a6..6cd6953e175b 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -737,12 +737,8 @@ cache in your filesystem. The following members are defined:
 bool (*release_folio)(struct folio *, gfp_t);
 void (*free_folio)(struct folio *);
 ssize_t (*direct_IO)(struct kiocb *, struct iov_iter *iter);
- /* isolate a page for migration */
- bool (*isolate_page) (struct page *, isolate_mode_t);
- /* migrate the contents of a page to the specified target */
- int (*migratepage) (struct page *, struct page *);
- /* put migration-failed page back to right list */
- void (*putback_page) (struct page *);
+ int (*migrate_folio)(struct mapping *, struct folio *dst,
+ struct folio *src, enum migrate_mode);
 int (*launder_folio) (struct folio *);

 bool (*is_partially_uptodate) (struct folio *, size_t from,
@@ -774,13 +770,38 @@ cache in your filesystem. The following members are defined:
 See the file "Locking" for more details.

 ``read_folio``
- called by the VM to read a folio from backing store. The folio
- will be locked when read_folio is called, and should be unlocked
- and marked uptodate once the read completes. If ->read_folio
- discovers that it cannot perform the I/O at this time, it can
- unlock the folio and return AOP_TRUNCATED_PAGE. In this case,
- the folio will be looked up again, relocked and if that all succeeds,
- ->read_folio will be called again.
+ Called by the page cache to read a folio from the backing store.
+ The 'file' argument supplies authentication information to network
+ filesystems, and is generally not used by block based filesystems.
+ It may be NULL if the caller does not have an open file (eg if
+ the kernel is performing a read for itself rather than on behalf
+ of a userspace process with an open file).
+
+ If the mapping does not support large folios, the folio will
+ contain a single page. The folio will be locked when read_folio
+ is called. If the read completes successfully, the folio should
+ be marked uptodate. The filesystem should unlock the folio
+ once the read has completed, whether it was successful or not.
+ The filesystem does not need to modify the refcount on the folio;
+ the page cache holds a reference count and that will not be
+ released until the folio is unlocked.
+
+ Filesystems may implement ->read_folio() synchronously.
+ In normal operation, folios are read through the ->readahead()
+ method. Only if this fails, or if the caller needs to wait for
+ the read to complete will the page cache call ->read_folio().
+ Filesystems should not attempt to perform their own readahead
+ in the ->read_folio() operation.
+
+ If the filesystem cannot perform the read at this time, it can
+ unlock the folio, do whatever action it needs to ensure that the
+ read will succeed in the future and return AOP_TRUNCATED_PAGE.
+ In this case, the caller should look up the folio, lock it,
+ and call ->read_folio again.
+
+ Callers may invoke the ->read_folio() method directly, but using
+ read_mapping_folio() will take care of locking, waiting for the
+ read to complete and handle cases such as AOP_TRUNCATED_PAGE.

 ``writepages``
 called by the VM to write out pages associated with the
@@ -905,20 +926,12 @@ cache in your filesystem. The following members are defined:
 data directly between the storage and the application's address
 space.

-``isolate_page``
- Called by the VM when isolating a movable non-lru page. If page
- is successfully isolated, VM marks the page as PG_isolated via
- __SetPageIsolated.
-
-``migrate_page``
+``migrate_folio``
 This is used to compact the physical memory usage. If the VM
- wants to relocate a page (maybe off a memory card that is
- signalling imminent failure) it will pass a new page and an old
- page to this function. migrate_page should transfer any private
- data across and update any references that it has to the page.
-
-``putback_page``
- Called by the VM when isolated page's migration fails.
+ wants to relocate a folio (maybe from a memory device that is
+ signalling imminent failure) it will pass a new folio and an old
+ folio to this function. migrate_folio should transfer any private
+ data across and update any references that it has to the folio.

 ``launder_folio``
 Called before freeing a folio - it writes back the dirty folio.
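The netfs_library.rst hunk above changes ``check_write_begin`` to take ``struct folio **foliop``. This lets the filesystem drop the folio it was given and request a retry. The caller-side handling of that contract can be modelled in plain userspace C. Everything in the sketch (``struct model_folio``, ``model_write_begin()``, the callbacks) is a hypothetical stand-in, not the netfs API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a page-cache folio. */
struct model_folio {
    int locked;
    int id;
};

typedef struct model_folio *(*model_grab_fn)(void);
typedef int (*model_check_fn)(struct model_folio **foliop);

/* Caller side of the contract:
 *   ret != 0            -> abort, releasing any folio still held
 *   ret == 0, *foliop   -> proceed with that folio
 *   ret == 0, NULL      -> callback discarded the folio: grab again */
static int model_write_begin(model_grab_fn grab, model_check_fn check,
                             struct model_folio **out)
{
    for (;;) {
        struct model_folio *folio = grab();
        int ret = check(&folio);

        if (ret != 0) {
            if (folio)
                folio->locked = 0;   /* stands in for unlock + put */
            return ret;
        }
        if (folio) {
            *out = folio;
            return 0;
        }
        /* folio was unlocked and dropped by the callback: retry */
    }
}

static struct model_folio pool[8];
static int grabbed;

/* Hands out a fresh locked folio on every call. */
static struct model_folio *grab_locked(void)
{
    struct model_folio *f = &pool[grabbed % 8];

    f->locked = 1;
    f->id = grabbed++;
    return f;
}

/* Discards the first folio (as if conflicting state had to be flushed
 * first), then accepts the next one. */
static int flush_then_accept(struct model_folio **foliop)
{
    static int flushed;

    if (!flushed) {
        flushed = 1;
        (*foliop)->locked = 0;
        *foliop = NULL;
    }
    return 0;
}

/* Always asks the caller to abort. */
static int always_eio(struct model_folio **foliop)
{
    (void)foliop;
    return -EIO;
}
```

In this model the first call grabs folio 0, has it discarded, and succeeds with folio 1. A failing callback leaves the caller responsible for releasing the folio.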
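The vfs.rst hunk above spells out the ``->read_folio()`` contract. The method is called with the folio locked, and the filesystem unlocks it whether or not the read succeeded. Returning AOP_TRUNCATED_PAGE asks the caller to look the folio up, lock it and call again. The userspace model below sketches a caller that honours that contract. It assumes a synchronous ``->read_folio()``. Every name in it is hypothetical and not part of the page cache API, including ``MODEL_TRUNCATED``, which stands in for AOP_TRUNCATED_PAGE.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for AOP_TRUNCATED_PAGE (a positive return code). */
#define MODEL_TRUNCATED 1

struct rf_folio {
    int locked;
    int uptodate;
};

typedef struct rf_folio *(*rf_lookup_fn)(void);
typedef int (*rf_read_fn)(struct rf_folio *folio);

/* Simplified caller in the spirit of read_mapping_folio(): look up,
 * lock, read, and repeat the whole sequence on a retry request. */
static int model_read_folio(rf_lookup_fn lookup, rf_read_fn read_folio,
                            struct rf_folio **out)
{
    for (;;) {
        struct rf_folio *folio = lookup();
        int ret;

        if (folio->uptodate) {          /* already cached: no I/O */
            *out = folio;
            return 0;
        }
        folio->locked = 1;              /* called with the folio locked */
        ret = read_folio(folio);
        assert(!folio->locked);         /* synchronous model: fs unlocked it */
        if (ret == MODEL_TRUNCATED)
            continue;                   /* look it up, lock it, call again */
        if (ret < 0)
            return ret;
        if (!folio->uptodate)
            return -EIO;                /* read completed but failed */
        *out = folio;
        return 0;
    }
}

static struct rf_folio cached;
static int attempts;

static struct rf_folio *lookup_cached(void)
{
    return &cached;
}

/* Declines the first attempt, then "reads" successfully. */
static int read_second_time(struct rf_folio *folio)
{
    if (attempts++ == 0) {
        folio->locked = 0;              /* unlock before asking for a retry */
        return MODEL_TRUNCATED;
    }
    folio->uptodate = 1;
    folio->locked = 0;                  /* unlock whether or not it succeeded */
    return 0;
}
```

The first lookup triggers two ``read_second_time()`` calls. A second lookup finds the folio uptodate and issues no further reads.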