diff options
Diffstat (limited to 'Documentation/filesystems')
-rw-r--r-- | Documentation/filesystems/configfs.rst | 48 | ||||
-rw-r--r-- | Documentation/filesystems/debugfs.rst | 8 | ||||
-rw-r--r-- | Documentation/filesystems/erofs.rst | 38 | ||||
-rw-r--r-- | Documentation/filesystems/f2fs.rst | 13 | ||||
-rw-r--r-- | Documentation/filesystems/fscrypt.rst | 7 | ||||
-rw-r--r-- | Documentation/filesystems/locking.rst | 10 | ||||
-rw-r--r-- | Documentation/filesystems/mount_api.rst | 12 | ||||
-rw-r--r-- | Documentation/filesystems/porting.rst | 4 | ||||
-rw-r--r-- | Documentation/filesystems/proc.rst | 28 | ||||
-rw-r--r-- | Documentation/filesystems/sysfs.rst | 41 | ||||
-rw-r--r-- | Documentation/filesystems/vfs.rst | 5 |
11 files changed, 108 insertions, 106 deletions
diff --git a/Documentation/filesystems/configfs.rst b/Documentation/filesystems/configfs.rst index 1d3d6f4a82a9..8c9342ed6d25 100644 --- a/Documentation/filesystems/configfs.rst +++ b/Documentation/filesystems/configfs.rst @@ -289,7 +289,6 @@ config_item_type:: const char *name); struct config_group *(*make_group)(struct config_group *group, const char *name); - int (*commit_item)(struct config_item *item); void (*disconnect_notify)(struct config_group *group, struct config_item *item); void (*drop_item)(struct config_group *group, @@ -486,50 +485,3 @@ up. Here, the heartbeat code calls configfs_depend_item(). If it succeeds, then heartbeat knows the region is safe to give to ocfs2. If it fails, it was being torn down anyway, and heartbeat can gracefully pass up an error. - -Committable Items -================= - -Note: - Committable items are currently unimplemented. - -Some config_items cannot have a valid initial state. That is, no -default values can be specified for the item's attributes such that the -item can do its work. Userspace must configure one or more attributes, -after which the subsystem can start whatever entity this item -represents. - -Consider the FakeNBD device from above. Without a target address *and* -a target device, the subsystem has no idea what block device to import. -The simple example assumes that the subsystem merely waits until all the -appropriate attributes are configured, and then connects. This will, -indeed, work, but now every attribute store must check if the attributes -are initialized. Every attribute store must fire off the connection if -that condition is met. - -Far better would be an explicit action notifying the subsystem that the -config_item is ready to go. More importantly, an explicit action allows -the subsystem to provide feedback as to whether the attributes are -initialized in a way that makes sense. configfs provides this as -committable items. - -configfs still uses only normal filesystem operations. An item is -committed via rename(2). The item is moved from a directory where it -can be modified to a directory where it cannot. - -Any group that provides the ct_group_ops->commit_item() method has -committable items. When this group appears in configfs, mkdir(2) will -not work directly in the group. Instead, the group will have two -subdirectories: "live" and "pending". The "live" directory does not -support mkdir(2) or rmdir(2) either. It only allows rename(2). The -"pending" directory does allow mkdir(2) and rmdir(2). An item is -created in the "pending" directory. Its attributes can be modified at -will. Userspace commits the item by renaming it into the "live" -directory. At this point, the subsystem receives the ->commit_item() -callback. If all required attributes are filled to satisfaction, the -method returns zero and the item is moved to the "live" directory. - -As rmdir(2) does not work in the "live" directory, an item must be -shutdown, or "uncommitted". Again, this is done via rename(2), this -time from the "live" directory back to the "pending" one. The subsystem -is notified by the ct_group_ops->uncommit_object() method. diff --git a/Documentation/filesystems/debugfs.rst b/Documentation/filesystems/debugfs.rst index 71b1fee56d2a..dc35da8b8792 100644 --- a/Documentation/filesystems/debugfs.rst +++ b/Documentation/filesystems/debugfs.rst @@ -155,8 +155,8 @@ any code which does so in the mainline. Note that all files created with debugfs_create_blob() are read-only. If you want to dump a block of registers (something that happens quite -often during development, even if little such code reaches mainline. -Debugfs offers two functions: one to make a registers-only file, and +often during development, even if little such code reaches mainline), +debugfs offers two functions: one to make a registers-only file, and another to insert a register block in the middle of another sequential file:: @@ -183,7 +183,7 @@ The "base" argument may be 0, but you may want to build the reg32 array using __stringify, and a number of register names (macros) are actually byte offsets over a base for the register block. -If you want to dump an u32 array in debugfs, you can create file with:: +If you want to dump a u32 array in debugfs, you can create a file with:: struct debugfs_u32_array { u32 *array; @@ -197,7 +197,7 @@ If you want to dump an u32 array in debugfs, you can create file with:: The "array" argument wraps a pointer to the array's data and the number of its elements. Note: Once array is created its size can not be changed. -There is a helper function to create device related seq_file:: +There is a helper function to create a device-related seq_file:: void debugfs_create_devm_seqfile(struct device *dev, const char *name, diff --git a/Documentation/filesystems/erofs.rst b/Documentation/filesystems/erofs.rst index 05e03d54af1a..067fd1670b1f 100644 --- a/Documentation/filesystems/erofs.rst +++ b/Documentation/filesystems/erofs.rst @@ -30,12 +30,18 @@ It is implemented to be a better choice for the following scenarios: especially for those embedded devices with limited memory and high-density hosts with numerous containers. -Here is the main features of EROFS: +Here are the main features of EROFS: - Little endian on-disk design; - - 4KiB block size and 32-bit block addresses, therefore 16TiB address space - at most for now; + - Block-based distribution and file-based distribution over fscache are + supported; + + - Support multiple devices to refer to external blobs, which can be used + for container images; + + - 4KiB block size and 32-bit block addresses for each device, therefore + 16TiB address space at most for now; - Two inode layouts for different requirements: @@ -50,28 +56,31 @@ Here is the main features of EROFS: Metadata reserved 8 bytes 18 bytes ===================== ============ ====================================== - - Metadata and data could be mixed as an option; - - - Support extended attributes (xattrs) as an option; + - Support extended attributes as an option; - - Support tailpacking data and xattr inline compared to byte-addressed - unaligned metadata or smaller block size alternatives; - - - Support POSIX.1e ACLs by using xattrs; + - Support POSIX.1e ACLs by using extended attributes; - Support transparent data compression as an option: LZ4 and MicroLZMA algorithms can be used on a per-file basis; In addition, inplace decompression is also supported to avoid bounce compressed buffers and page cache thrashing. + - Support chunk-based data deduplication and rolling-hash compressed data + deduplication; + + - Support tailpacking inline compared to byte-addressed unaligned metadata + or smaller block size alternatives; + + - Support merging tail-end data into a special inode as fragments. + + - Support large folios for uncompressed files. + - Support direct I/O on uncompressed files to avoid double caching for loop devices; - Support FSDAX on uncompressed images for secure containers and ramdisks in order to get rid of unnecessary page cache. - - Support multiple devices for multi blob container images; - - Support file-based on-demand loading with the Fscache infrastructure. The following git tree provides the file system user-space tools under @@ -259,7 +268,7 @@ By the way, chunk-based files are all uncompressed for now. Data compression ---------------- -EROFS implements LZ4 fixed-sized output compression which generates fixed-sized +EROFS implements fixed-sized output compression which generates fixed-sized compressed data blocks from variable-sized input in contrast to other existing fixed-sized input solutions. Relatively higher compression ratios can be gotten by using fixed-sized output compression since nowadays popular data compression @@ -314,3 +323,6 @@ to understand its delta0 is constantly 1, as illustrated below:: If another HEAD follows a HEAD lcluster, there is no room to record CBLKCNT, but it's easy to know the size of such pcluster is 1 lcluster as well. + +Since Linux v6.1, each pcluster can be used for multiple variable-sized extents, +therefore it can be used for compressed data deduplication. diff --git a/Documentation/filesystems/f2fs.rst b/Documentation/filesystems/f2fs.rst index 17df9a02ccff..220f3e0d3f55 100644 --- a/Documentation/filesystems/f2fs.rst +++ b/Documentation/filesystems/f2fs.rst @@ -25,10 +25,14 @@ a consistency checking tool (fsck.f2fs), and a debugging tool (dump.f2fs). - git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs-tools.git -For reporting bugs and sending patches, please use the following mailing list: +For sending patches, please use the following mailing list: - linux-f2fs-devel@lists.sourceforge.net +For reporting bugs, please use the following f2fs bug tracker link: + +- https://bugzilla.kernel.org/enter_bug.cgi?product=File%20System&component=f2fs + Background and Design issues ============================ @@ -154,6 +158,8 @@ nobarrier This option can be used if underlying storage guarantees If this option is set, no cache_flush commands are issued but f2fs still guarantees the write ordering of all the data writes. +barrier If this option is set, cache_flush commands are allowed to be + issued. fastboot This option is used when a system wants to reduce mount time as much as possible, even though normal performance can be sacrificed. @@ -199,6 +205,7 @@ fault_type=%d Support configuring fault injection type, should be FAULT_SLAB_ALLOC 0x000008000 FAULT_DQUOT_INIT 0x000010000 FAULT_LOCK_OP 0x000020000 + FAULT_BLKADDR 0x000040000 =================== =========== mode=%s Control block allocation mode which supports "adaptive" and "lfs". In "lfs" mode, there should be no random @@ -340,6 +347,10 @@ memory=%s Control memory mode. This supports "normal" and "low" modes. Because of the nature of low memory devices, in this mode, f2fs will try to save memory sometimes by sacrificing performance. "normal" mode is the default mode and same as before. +age_extent_cache Enable an age extent cache based on rb-tree. It records + data block update frequency of the extent per inode, in + order to provide better temperature hints for data block + allocation. ======================== ============================================================ Debugfs Entries diff --git a/Documentation/filesystems/fscrypt.rst b/Documentation/filesystems/fscrypt.rst index 5ba5817c17c2..ef183387da20 100644 --- a/Documentation/filesystems/fscrypt.rst +++ b/Documentation/filesystems/fscrypt.rst @@ -338,6 +338,7 @@ Currently, the following pairs of encryption modes are supported: - AES-128-CBC for contents and AES-128-CTS-CBC for filenames - Adiantum for both contents and filenames - AES-256-XTS for contents and AES-256-HCTR2 for filenames (v2 policies only) +- SM4-XTS for contents and SM4-CTS-CBC for filenames (v2 policies only) If unsure, you should use the (AES-256-XTS, AES-256-CTS-CBC) pair. @@ -369,6 +370,12 @@ CONFIG_CRYPTO_HCTR2 must be enabled. Also, fast implementations of XCTR and POLYVAL should be enabled, e.g. CRYPTO_POLYVAL_ARM64_CE and CRYPTO_AES_ARM64_CE_BLK for ARM64. +SM4 is a Chinese block cipher that is an alternative to AES. It has +not seen as much security review as AES, and it only has a 128-bit key +size. It may be useful in cases where its use is mandated. +Otherwise, it should not be used. For SM4 support to be available, it +also needs to be enabled in the kernel crypto API. + New encryption modes can be added relatively easily, without changes to individual filesystems. However, authenticated encryption (AE) modes are not currently supported because of the difficulty of dealing diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst index 8f737e76935c..36fa2a83d714 100644 --- a/Documentation/filesystems/locking.rst +++ b/Documentation/filesystems/locking.rst @@ -70,7 +70,7 @@ prototypes:: const char *(*get_link) (struct dentry *, struct inode *, struct delayed_call *); void (*truncate) (struct inode *); int (*permission) (struct inode *, int, unsigned int); - struct posix_acl * (*get_acl)(struct inode *, int, bool); + struct posix_acl * (*get_inode_acl)(struct inode *, int, bool); int (*setattr) (struct dentry *, struct iattr *); int (*getattr) (const struct path *, struct kstat *, u32, unsigned int); ssize_t (*listxattr) (struct dentry *, char *, size_t); @@ -84,13 +84,14 @@ prototypes:: int (*fileattr_set)(struct user_namespace *mnt_userns, struct dentry *dentry, struct fileattr *fa); int (*fileattr_get)(struct dentry *dentry, struct fileattr *fa); + struct posix_acl * (*get_acl)(struct user_namespace *, struct dentry *, int); locking rules: all may block -============= ============================================= +============== ============================================= ops i_rwsem(inode) -============= ============================================= +============== ============================================= lookup: shared create: exclusive link: exclusive (both) @@ -104,6 +105,7 @@ readlink: no get_link: no setattr: exclusive permission: no (may not block if called in rcu-walk mode) +get_inode_acl: no get_acl: no getattr: no listxattr: no @@ -113,7 +115,7 @@ atomic_open: shared (exclusive if O_CREAT is set in open flags) tmpfile: no fileattr_get: no or exclusive fileattr_set: exclusive -============= ============================================= +============== ============================================= Additionally, ->rmdir(), ->unlink() and ->rename() have ->i_rwsem diff --git a/Documentation/filesystems/mount_api.rst b/Documentation/filesystems/mount_api.rst index eb358a00be27..63204d2094fd 100644 --- a/Documentation/filesystems/mount_api.rst +++ b/Documentation/filesystems/mount_api.rst @@ -562,17 +562,6 @@ or looking up of superblocks. The following helpers all wrap sget_fc(): - * :: - - int vfs_get_super(struct fs_context *fc, - enum vfs_get_super_keying keying, - int (*fill_super)(struct super_block *sb, - struct fs_context *fc)) - - This creates/looks up a deviceless superblock. The keying indicates how - many superblocks of this type may exist and in what manner they may be - shared: - (1) vfs_get_single_super Only one such superblock may exist in the system. Any further @@ -814,6 +803,7 @@ process the parameters it is given. int fs_lookup_param(struct fs_context *fc, struct fs_parameter *value, bool want_bdev, + unsigned int flags, struct path *_path); This takes a parameter that carries a string or filename type and attempts diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst index df0dc37e6f58..d2d684ae7798 100644 --- a/Documentation/filesystems/porting.rst +++ b/Documentation/filesystems/porting.rst @@ -462,8 +462,8 @@ ERR_PTR(...). argument; instead of passing IPERM_FLAG_RCU we add MAY_NOT_BLOCK into mask. generic_permission() has also lost the check_acl argument; ACL checking -has been taken to VFS and filesystems need to provide a non-NULL ->i_op->get_acl -to read an ACL from disk. +has been taken to VFS and filesystems need to provide a non-NULL +->i_op->get_inode_acl to read an ACL from disk. --- diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst index 898c99eae8e4..e224b6d5b642 100644 --- a/Documentation/filesystems/proc.rst +++ b/Documentation/filesystems/proc.rst @@ -47,6 +47,7 @@ fixes/update part 1.1 Stefani Seibold <stefani@seibold.net> June 9 2009 3.10 /proc/<pid>/timerslack_ns - Task timerslack value 3.11 /proc/<pid>/patch_state - Livepatch patch operation state 3.12 /proc/<pid>/arch_status - Task architecture specific information + 3.13 /proc/<pid>/fd - List of symlinks to open files 4 Configuring procfs 4.1 Mount options @@ -245,7 +246,8 @@ It's slow but very precise. Ngid NUMA group ID (0 if none) Pid process id PPid process id of the parent process - TracerPid PID of process tracing this process (0 if not) + TracerPid PID of process tracing this process (0 if not, or + the tracer is outside of the current pid namespace) Uid Real, effective, saved set, and file system UIDs Gid Real, effective, saved set, and file system GIDs FDSize number of file descriptor slots currently allocated @@ -426,14 +428,16 @@ with the memory region, as the case would be with BSS (uninitialized data). The "pathname" shows the name associated file for this mapping. If the mapping is not associated with a file: - ============= ==================================== + =================== =========================================== [heap] the heap of the program [stack] the stack of the main process [vdso] the "virtual dynamic shared object", the kernel system call handler - [anon:<name>] an anonymous mapping that has been + [anon:<name>] a private anonymous mapping that has been named by userspace - ============= ==================================== + [anon_shmem:<name>] an anonymous shared memory mapping that has + been named by userspace + =================== =========================================== or if empty, the mapping is anonymous. @@ -2149,6 +2153,22 @@ AVX512_elapsed_ms the task is unlikely an AVX512 user, but depends on the workload and the scheduling scenario, it also could be a false negative mentioned above. +3.13 /proc/<pid>/fd - List of symlinks to open files +------------------------------------------------------- +This directory contains symbolic links which represent open files +the process is maintaining. Example output:: + + lr-x------ 1 root root 64 Sep 20 17:53 0 -> /dev/null + l-wx------ 1 root root 64 Sep 20 17:53 1 -> /dev/null + lrwx------ 1 root root 64 Sep 20 17:53 10 -> 'socket:[12539]' + lrwx------ 1 root root 64 Sep 20 17:53 11 -> 'socket:[12540]' + lrwx------ 1 root root 64 Sep 20 17:53 12 -> 'socket:[12542]' + +The number of open files for the process is stored in 'size' member +of stat() output for /proc/<pid>/fd for fast access. +------------------------------------------------------- + + Chapter 4: Configuring procfs ============================= diff --git a/Documentation/filesystems/sysfs.rst b/Documentation/filesystems/sysfs.rst index 8bba676b1365..f8187d466b97 100644 --- a/Documentation/filesystems/sysfs.rst +++ b/Documentation/filesystems/sysfs.rst @@ -12,10 +12,10 @@ Mike Murphy <mamurph@cs.clemson.edu> :Original: 10 January 2003 -What it is: -~~~~~~~~~~~ +What it is +~~~~~~~~~~ -sysfs is a ram-based filesystem initially based on ramfs. It provides +sysfs is a RAM-based filesystem initially based on ramfs. It provides a means to export kernel data structures, their attributes, and the linkages between them to userspace. @@ -43,7 +43,7 @@ userspace. Top-level directories in sysfs represent the common ancestors of object hierarchies; i.e. the subsystems the objects belong to. -Sysfs internally stores a pointer to the kobject that implements a +sysfs internally stores a pointer to the kobject that implements a directory in the kernfs_node object associated with the directory. In the past this kobject pointer has been used by sysfs to do reference counting directly on the kobject whenever the file is opened or closed. @@ -55,7 +55,7 @@ Attributes ~~~~~~~~~~ Attributes can be exported for kobjects in the form of regular files in -the filesystem. Sysfs forwards file I/O operations to methods defined +the filesystem. sysfs forwards file I/O operations to methods defined for the attributes, providing a means to read and write kernel attributes. @@ -72,8 +72,8 @@ you publicly humiliated and your code rewritten without notice. An attribute definition is simply:: struct attribute { - char * name; - struct module *owner; + char *name; + struct module *owner; umode_t mode; }; @@ -138,7 +138,7 @@ __ATTR_WO(name): assumes a name_store only and is restricted to mode 0200 that is root write access only. __ATTR_RO_MODE(name, mode): - fore more restrictive RO access currently + for more restrictive RO access; currently only use case is the EFI System Resource Table (see drivers/firmware/efi/esrt.c) __ATTR_RW(name): @@ -207,7 +207,7 @@ IOW, they should take only an object, an attribute, and a buffer as parameters. sysfs allocates a buffer of size (PAGE_SIZE) and passes it to the -method. Sysfs will call the method exactly once for each read or +method. sysfs will call the method exactly once for each read or write. This forces the following behavior on the method implementations: @@ -221,7 +221,7 @@ implementations: be called again, rearmed, to fill the buffer. - On write(2), sysfs expects the entire buffer to be passed during the - first write. Sysfs then passes the entire buffer to the store() method. + first write. sysfs then passes the entire buffer to the store() method. A terminating null is added after the data on stores. This makes functions like sysfs_streq() safe to use. @@ -237,7 +237,7 @@ Other notes: - Writing causes the show() method to be rearmed regardless of current file position. -- The buffer will always be PAGE_SIZE bytes in length. On i386, this +- The buffer will always be PAGE_SIZE bytes in length. On x86, this is 4096. - show() methods should return the number of bytes printed into the @@ -253,7 +253,7 @@ Other notes: through, be sure to return an error. - The object passed to the methods will be pinned in memory via sysfs - referencing counting its embedded object. However, the physical + reference counting its embedded object. However, the physical entity (e.g. device) the object represents may not be present. Be sure to have a way to check this, if necessary. @@ -295,8 +295,12 @@ The top level sysfs directory looks like:: dev/ devices/ firmware/ - net/ fs/ + hypervisor/ + kernel/ + module/ + net/ + power/ devices/ contains a filesystem representation of the device tree. It maps directly to the internal kernel device tree, which is a hierarchy of @@ -317,15 +321,18 @@ span multiple bus types). fs/ contains a directory for some filesystems. Currently each filesystem wanting to export attributes must create its own hierarchy -below fs/ (see ./fuse.txt for an example). +below fs/ (see ./fuse.rst for an example). + +module/ contains parameter values and state information for all +loaded system modules, for both builtin and loadable modules. -dev/ contains two directories char/ and block/. Inside these two +dev/ contains two directories: char/ and block/. Inside these two directories there are symlinks named <major>:<minor>. These symlinks point to the sysfs directory for the given device. /sys/dev provides a quick way to lookup the sysfs interface for a device from the result of a stat(2) operation. -More information can driver-model specific features can be found in +More information on driver-model specific features can be found in Documentation/driver-api/driver-model/. @@ -335,7 +342,7 @@ TODO: Finish this section. Current Interfaces ~~~~~~~~~~~~~~~~~~ -The following interface layers currently exist in sysfs: +The following interface layers currently exist in sysfs. devices (include/linux/device.h) diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst index 2b55f71e2ae1..2c15e7053113 100644 --- a/Documentation/filesystems/vfs.rst +++ b/Documentation/filesystems/vfs.rst @@ -435,7 +435,7 @@ As of kernel 2.6.22, the following members are defined: const char *(*get_link) (struct dentry *, struct inode *, struct delayed_call *); int (*permission) (struct user_namespace *, struct inode *, int); - struct posix_acl * (*get_acl)(struct inode *, int, bool); + struct posix_acl * (*get_inode_acl)(struct inode *, int, bool); int (*setattr) (struct user_namespace *, struct dentry *, struct iattr *); int (*getattr) (struct user_namespace *, const struct path *, struct kstat *, u32, unsigned int); ssize_t (*listxattr) (struct dentry *, char *, size_t); @@ -443,7 +443,8 @@ As of kernel 2.6.22, the following members are defined: int (*atomic_open)(struct inode *, struct dentry *, struct file *, unsigned open_flag, umode_t create_mode); int (*tmpfile) (struct user_namespace *, struct inode *, struct file *, umode_t); - int (*set_acl)(struct user_namespace *, struct inode *, struct posix_acl *, int); + struct posix_acl * (*get_acl)(struct user_namespace *, struct dentry *, int); + int (*set_acl)(struct user_namespace *, struct dentry *, struct posix_acl *, int); int (*fileattr_set)(struct user_namespace *mnt_userns, struct dentry *dentry, struct fileattr *fa); int (*fileattr_get)(struct dentry *dentry, struct fileattr *fa); |