author | Linus Torvalds <torvalds@linux-foundation.org> | 2019-09-21 23:37:39 +0300 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2019-09-21 23:37:39 +0300 |
commit | 70cb0d02b58128db07fc39b5e87a2873e2c16bde (patch) | |
tree | 43c0a4eb00f192ceb306b9c52503b2d54bc59660 /Documentation/filesystems/ext4 | |
parent | 104c0d6bc43e10ba84931c45b67e2c76c9c67f68 (diff) | |
parent | 040823b5372b445d1d9483811e85a24d71314d33 (diff) | |
download | linux-70cb0d02b58128db07fc39b5e87a2873e2c16bde.tar.xz |
Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:
"Added new ext4 debugging ioctls to allow userspace to get information
about the state of the extent status cache.

Dropped workaround for pre-1970 dates which were encoded incorrectly
in pre-4.4 kernels. Since both the kernel correctly generates, and
e2fsck detects and fixes this issue for the past four years, it's time
to drop the workaround. (Also, it's not like files with dates in the
distant past were all that common in the first place.)

A lot of miscellaneous bug fixes and cleanups, including some ext4
Documentation fixes. Also included are two minor bug fixes in
fs/unicode"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (21 commits)
unicode: make array 'token' static const, makes object smaller
unicode: Move static keyword to the front of declarations
ext4: add missing bigalloc documentation.
ext4: fix kernel oops caused by spurious casefold flag
ext4: fix integer overflow when calculating commit interval
ext4: use percpu_counters for extent_status cache hits/misses
ext4: fix potential use after free after remounting with noblock_validity
jbd2: add missing tracepoint for reserved handle
ext4: fix punch hole for inline_data file systems
ext4: rework reserved cluster accounting when invalidating pages
ext4: documentation fixes
ext4: treat buffers with write errors as containing valid data
ext4: fix warning inside ext4_convert_unwritten_extents_endio
ext4: set error return correctly when ext4_htree_store_dirent fails
ext4: drop legacy pre-1970 encoding workaround
ext4: add new ioctl EXT4_IOC_GET_ES_CACHE
ext4: add a new ioctl EXT4_IOC_GETSTATE
ext4: add a new ioctl EXT4_IOC_CLEAR_ES_CACHE
jbd2: flush_descriptor(): Do not decrease buffer head's ref count
ext4: remove unnecessary error check
...
Diffstat (limited to 'Documentation/filesystems/ext4')
-rw-r--r-- | Documentation/filesystems/ext4/bigalloc.rst | 32 |
-rw-r--r-- | Documentation/filesystems/ext4/blockgroup.rst | 10 |
-rw-r--r-- | Documentation/filesystems/ext4/blocks.rst | 4 |
-rw-r--r-- | Documentation/filesystems/ext4/directory.rst | 2 |
-rw-r--r-- | Documentation/filesystems/ext4/group_descr.rst | 9 |
-rw-r--r-- | Documentation/filesystems/ext4/inodes.rst | 4 |
-rw-r--r-- | Documentation/filesystems/ext4/super.rst | 20 |
7 files changed, 53 insertions, 28 deletions
diff --git a/Documentation/filesystems/ext4/bigalloc.rst b/Documentation/filesystems/ext4/bigalloc.rst
index c6d88557553c..72075aa608e4 100644
--- a/Documentation/filesystems/ext4/bigalloc.rst
+++ b/Documentation/filesystems/ext4/bigalloc.rst
@@ -9,14 +9,26 @@ ext4 code is not prepared to handle the case where the block size
 exceeds the page size. However, for a filesystem of mostly huge files,
 it is desirable to be able to allocate disk blocks in units of multiple
 blocks to reduce both fragmentation and metadata overhead. The
-`bigalloc <Bigalloc>`__ feature provides exactly this ability. The
-administrator can set a block cluster size at mkfs time (which is stored
-in the s\_log\_cluster\_size field in the superblock); from then on, the
-block bitmaps track clusters, not individual blocks. This means that
-block groups can be several gigabytes in size (instead of just 128MiB);
-however, the minimum allocation unit becomes a cluster, not a block,
-even for directories. TaoBao had a patchset to extend the “use units of
-clusters instead of blocks” to the extent tree, though it is not clear
-where those patches went-- they eventually morphed into “extent tree v2”
-but that code has not landed as of May 2015.
+bigalloc feature provides exactly this ability.
+
+The bigalloc feature (EXT4_FEATURE_RO_COMPAT_BIGALLOC) changes ext4 to
+use clustered allocation, so that each bit in the ext4 block allocation
+bitmap addresses a power of two number of blocks. For example, if the
+file system is mainly going to be storing large files in the 4-32
+megabyte range, it might make sense to set a cluster size of 1 megabyte.
+This means that each bit in the block allocation bitmap now addresses
+256 4k blocks. This shrinks the total size of the block allocation
+bitmaps for a 2T file system from 64 megabytes to 256 kilobytes. It also
+means that a block group addresses 32 gigabytes instead of 128 megabytes,
+also shrinking the amount of file system overhead for metadata.
+
+The administrator can set a block cluster size at mkfs time (which is
+stored in the s\_log\_cluster\_size field in the superblock); from then
+on, the block bitmaps track clusters, not individual blocks. This means
+that block groups can be several gigabytes in size (instead of just
+128MiB); however, the minimum allocation unit becomes a cluster, not a
+block, even for directories. TaoBao had a patchset to extend the “use
+units of clusters instead of blocks” to the extent tree, though it is
+not clear where those patches went-- they eventually morphed into
+“extent tree v2” but that code has not landed as of May 2015.
 
diff --git a/Documentation/filesystems/ext4/blockgroup.rst b/Documentation/filesystems/ext4/blockgroup.rst
index baf888e4c06a..3da156633339 100644
--- a/Documentation/filesystems/ext4/blockgroup.rst
+++ b/Documentation/filesystems/ext4/blockgroup.rst
@@ -71,11 +71,11 @@ if the flex\_bg size is 4, then group 0 will contain (in order) the
 superblock, group descriptors, data block bitmaps for groups 0-3, inode
 bitmaps for groups 0-3, inode tables for groups 0-3, and the remaining
 space in group 0 is for file data. The effect of this is to group the
-block metadata close together for faster loading, and to enable large
-files to be continuous on disk. Backup copies of the superblock and
-group descriptors are always at the beginning of block groups, even if
-flex\_bg is enabled. The number of block groups that make up a flex\_bg
-is given by 2 ^ ``sb.s_log_groups_per_flex``.
+block group metadata close together for faster loading, and to enable
+large files to be continuous on disk. Backup copies of the superblock
+and group descriptors are always at the beginning of block groups, even
+if flex\_bg is enabled. The number of block groups that make up a
+flex\_bg is given by 2 ^ ``sb.s_log_groups_per_flex``.
 
 Meta Block Groups
 -----------------
diff --git a/Documentation/filesystems/ext4/blocks.rst b/Documentation/filesystems/ext4/blocks.rst
index 73d4dc0f7bda..bd722ecd92d6 100644
--- a/Documentation/filesystems/ext4/blocks.rst
+++ b/Documentation/filesystems/ext4/blocks.rst
@@ -10,7 +10,9 @@ block groups. Block size is specified at mkfs time and typically is
 4KiB. You may experience mounting problems if block size is greater than
 page size (i.e. 64KiB blocks on a i386 which only has 4KiB memory
 pages). By default a filesystem can contain 2^32 blocks; if the '64bit'
-feature is enabled, then a filesystem can have 2^64 blocks.
+feature is enabled, then a filesystem can have 2^64 blocks. The location
+of structures is stored in terms of the block number the structure lives
+in and not the absolute offset on disk.
 
 For 32-bit filesystems, limits are as follows:
 
diff --git a/Documentation/filesystems/ext4/directory.rst b/Documentation/filesystems/ext4/directory.rst
index 614034e24669..073940cc64ed 100644
--- a/Documentation/filesystems/ext4/directory.rst
+++ b/Documentation/filesystems/ext4/directory.rst
@@ -59,7 +59,7 @@ is at most 263 bytes long, though on disk you'll need to reference
      - File name.
 
 Since file names cannot be longer than 255 bytes, the new directory
-entry format shortens the rec\_len field and uses the space for a file
+entry format shortens the name\_len field and uses the space for a file
 type flag, probably to avoid having to load every inode during directory
 tree traversal. This format is ``ext4_dir_entry_2``, which is at most
 263 bytes long, though on disk you'll need to reference
diff --git a/Documentation/filesystems/ext4/group_descr.rst b/Documentation/filesystems/ext4/group_descr.rst
index 0f783ed88592..7ba6114e7f5c 100644
--- a/Documentation/filesystems/ext4/group_descr.rst
+++ b/Documentation/filesystems/ext4/group_descr.rst
@@ -99,9 +99,12 @@ The block group descriptor is laid out in ``struct ext4_group_desc``.
    * - 0x1E
      - \_\_le16
      - bg\_checksum
-     - Group descriptor checksum; crc16(sb\_uuid+group+desc) if the
-       RO\_COMPAT\_GDT\_CSUM feature is set, or crc32c(sb\_uuid+group\_desc) &
-       0xFFFF if the RO\_COMPAT\_METADATA\_CSUM feature is set.
+     - Group descriptor checksum; crc16(sb\_uuid+group\_num+bg\_desc) if the
+       RO\_COMPAT\_GDT\_CSUM feature is set, or
+       crc32c(sb\_uuid+group\_num+bg\_desc) & 0xFFFF if the
+       RO\_COMPAT\_METADATA\_CSUM feature is set. The bg\_checksum
+       field in bg\_desc is skipped when calculating crc16 checksum,
+       and set to zero if crc32c checksum is used.
    * -
      -
      -
diff --git a/Documentation/filesystems/ext4/inodes.rst b/Documentation/filesystems/ext4/inodes.rst
index e851e6ca31fa..a65baffb4ebf 100644
--- a/Documentation/filesystems/ext4/inodes.rst
+++ b/Documentation/filesystems/ext4/inodes.rst
@@ -472,8 +472,8 @@ inode, which allows struct ext4\_inode to grow for a new kernel without
 having to upgrade all of the on-disk inodes. Access to fields beyond
 EXT2\_GOOD\_OLD\_INODE\_SIZE should be verified to be within
 ``i_extra_isize``. By default, ext4 inode records are 256 bytes, and (as
-of October 2013) the inode structure is 156 bytes
-(``i_extra_isize = 28``). The extra space between the end of the inode
+of August 2019) the inode structure is 160 bytes
+(``i_extra_isize = 32``). The extra space between the end of the inode
 structure and the end of the inode record can be used to store extended
 attributes. Each inode record can be as large as the filesystem block
 size, though this is not terribly efficient.
diff --git a/Documentation/filesystems/ext4/super.rst b/Documentation/filesystems/ext4/super.rst
index 6eae92054827..93e55d7c1d40 100644
--- a/Documentation/filesystems/ext4/super.rst
+++ b/Documentation/filesystems/ext4/super.rst
@@ -58,7 +58,7 @@ The ext4 superblock is laid out as follows in
    * - 0x1C
      - \_\_le32
      - s\_log\_cluster\_size
-     - Cluster size is (2 ^ s\_log\_cluster\_size) blocks if bigalloc is
+     - Cluster size is 2 ^ (10 + s\_log\_cluster\_size) blocks if bigalloc is
        enabled. Otherwise s\_log\_cluster\_size must equal s\_log\_block\_size.
    * - 0x20
      - \_\_le32
@@ -447,7 +447,7 @@ The ext4 superblock is laid out as follows in
      - Upper 8 bits of the s_wtime field.
    * - 0x275
      - \_\_u8
-     - s\_wtime_hi
+     - s\_mtime_hi
      - Upper 8 bits of the s_mtime field.
    * - 0x276
      - \_\_u8
@@ -466,12 +466,20 @@ The ext4 superblock is laid out as follows in
      - s\_last_error_time_hi
      - Upper 8 bits of the s_last_error_time_hi field.
    * - 0x27A
-     - \_\_u8[2]
-     - s\_pad
+     - \_\_u8
+     - s\_pad[2]
      - Zero padding.
    * - 0x27C
+     - \_\_le16
+     - s\_encoding
+     - Filename charset encoding.
+   * - 0x27E
+     - \_\_le16
+     - s\_encoding_flags
+     - Filename charset encoding flags.
+   * - 0x280
      - \_\_le32
-     - s\_reserved[96]
+     - s\_reserved[95]
      - Padding to the end of the block.
    * - 0x3FC
      - \_\_le32
@@ -617,7 +625,7 @@ following:
    * - 0x80
      - Enable a filesystem size of 2^64 blocks (INCOMPAT\_64BIT).
    * - 0x100
-     - Multiple mount protection. Not implemented (INCOMPAT\_MMP).
+     - Multiple mount protection (INCOMPAT\_MMP).
    * - 0x200
      - Flexible block groups. See the earlier discussion of this feature
        (INCOMPAT\_FLEX\_BG).
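
Reviewer note: the figures quoted in the new bigalloc.rst text (256 blocks per cluster, 64 megabytes versus 256 kilobytes of bitmap, 128 megabytes versus 32 gigabytes per block group) can be sanity-checked with a few lines of arithmetic. The sketch below is not part of the patch. It assumes binary units (2T = 2 TiB, 1 megabyte = 1 MiB) and that a block group's allocation bitmap occupies exactly one block; the helper names are made up for illustration.

```python
# Sanity check of the bigalloc numbers quoted in bigalloc.rst.
# Assumptions (not stated in the patch): binary units, and one
# bitmap block per block group. Helper names are hypothetical.

KIB, MIB, GIB, TIB = 1 << 10, 1 << 20, 1 << 30, 1 << 40

BLOCK = 4 * KIB      # typical ext4 block size
CLUSTER = 1 * MIB    # example cluster size from the documentation
FS_SIZE = 2 * TIB    # the "2T file system" in the example


def bitmap_bytes(fs_bytes, unit_bytes):
    """Total bitmap size when each bit tracks one allocation unit."""
    units = fs_bytes // unit_bytes
    return units // 8


def group_span_bytes(block_bytes, unit_bytes):
    """Bytes covered by one block group whose bitmap fills one block."""
    bits_per_bitmap_block = block_bytes * 8
    return bits_per_bitmap_block * unit_bytes


blocks_per_cluster = CLUSTER // BLOCK

print("blocks per cluster:", blocks_per_cluster)
print("bitmap, per-block  (MiB):", bitmap_bytes(FS_SIZE, BLOCK) // MIB)
print("bitmap, per-cluster (KiB):", bitmap_bytes(FS_SIZE, CLUSTER) // KIB)
print("group span, per-block  (MiB):", group_span_bytes(BLOCK, BLOCK) // MIB)
print("group span, per-cluster (GiB):", group_span_bytes(BLOCK, CLUSTER) // GIB)
```

Under those assumptions the results agree with the documentation text: each bitmap bit covers 256 times as much space, so the bitmaps shrink and the block groups grow by the same factor.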