Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o: "Add as a feature case-insensitive directories (the casefold feature) using Unicode 12.1. Also, the usual largish number of cleanups and bug fixes" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (25 commits) ext4: export /sys/fs/ext4/feature/casefold if Unicode support is present ext4: fix ext4_show_options for file systems w/o journal unicode: refactor the rule for regenerating utf8data.h docs: ext4.rst: document case-insensitive directories ext4: Support case-insensitive file name lookups ext4: include charset encoding information in the superblock MAINTAINERS: add Unicode subsystem entry unicode: update unicode database unicode version 12.1.0 unicode: introduce test module for normalized utf8 implementation unicode: implement higher level API for string handling unicode: reduce the size of utf8data[] unicode: introduce code for UTF-8 normalization unicode: introduce UTF-8 character database ext4: actually request zeroing of inode table after grow ext4: cond_resched in work-heavy group loops ext4: fix use-after-free race with debug_want_extra_isize ext4: avoid drop reference to iloc.bh twice ext4: ignore e_value_offs for xattrs with value-in-ea-inode ext4: protect journal inode's blocks using block_validity ext4: use BUG() instead of BUG_ON(1) ...
author: Linus Torvalds <torvalds@linux-foundation.org> 2019-05-08 07:12:44 +0300
committer: Linus Torvalds <torvalds@linux-foundation.org> 2019-05-08 07:12:44 +0300
commit: 5abe37954e9a315c35c9490f78d55f307c3c636b (patch)
tree: fa2034b03b270c48ac516a8fe308654443b4a7e2 /Documentation/admin-guide
parent: e5fef2a9732580c5bd30c0097f5e9091a3d58ce5 (diff)
parent: db90f41916cf04c020062f8d8b0385942248283e (diff)
download: linux-5abe37954e9a315c35c9490f78d55f307c3c636b.tar.xz
1 files changed, 38 insertions, 0 deletions
diff --git a/Documentation/admin-guide/ext4.rst b/Documentation/admin-guide/ext4.rst
index e506d3dae510..059ddcbe769d 100644
--- a/Documentation/admin-guide/ext4.rst
+++ b/Documentation/admin-guide/ext4.rst
@@ -91,10 +91,48 @@ Currently Available
 * large block (up to pagesize) support
 * efficient new ordered mode in JBD2 and ext4 (avoid using buffer head to force
   the ordering)
+* Case-insensitive file name lookups
 
 [1] Filesystems with a block size of 1k may see a limit imposed by the
 directory hash tree having a maximum depth of two.
 
+case-insensitive file name lookups
+======================================================
+
+The case-insensitive file name lookup feature is supported on a
+per-directory basis, allowing the user to mix case-insensitive and
+case-sensitive directories in the same filesystem.  It is enabled by
+flipping the +F inode attribute of an empty directory.  The
+case-insensitive string match operation is only defined when we know how
+text in encoded in a byte sequence.  For that reason, in order to enable
+case-insensitive directories, the filesystem must have the
+casefold feature, which stores the filesystem-wide encoding
+model used.  By default, the charset adopted is the latest version of
+Unicode (12.1.0, by the time of this writing), encoded in the UTF-8
+form.  The comparison algorithm is implemented by normalizing the
+strings to the Canonical decomposition form, as defined by Unicode,
+followed by a byte per byte comparison.
+
+The case-awareness is name-preserving on the disk, meaning that the file
+name provided by userspace is a byte-per-byte match to what is actually
+written in the disk.  The Unicode normalization format used by the
+kernel is thus an internal representation, and not exposed to the
+userspace nor to the disk, with the important exception of disk hashes,
+used on large case-insensitive directories with DX feature.  On DX
+directories, the hash must be calculated using the casefolded version of
+the filename, meaning that the normalization format used actually has an
+impact on where the directory entry is stored.
+
+When we change from viewing filenames as opaque byte sequences to seeing
+them as encoded strings we need to address what happens when a program
+tries to create a file with an invalid name.  The Unicode subsystem
+within the kernel leaves the decision of what to do in this case to the
+filesystem, which select its preferred behavior by enabling/disabling
+the strict mode.  When Ext4 encounters one of those strings and the
+filesystem did not require strict mode, it falls back to considering the
+entire string as an opaque byte sequence, which still allows the user to
+operate on that file, but the case-insensitive lookups won't work.
+
 Options
 =======
author	Linus Torvalds <torvalds@linux-foundation.org>	2019-05-08 07:12:44 +0300
committer	Linus Torvalds <torvalds@linux-foundation.org>	2019-05-08 07:12:44 +0300
commit	5abe37954e9a315c35c9490f78d55f307c3c636b (patch)
tree	fa2034b03b270c48ac516a8fe308654443b4a7e2 /Documentation/admin-guide
parent	e5fef2a9732580c5bd30c0097f5e9091a3d58ce5 (diff)
parent	db90f41916cf04c020062f8d8b0385942248283e (diff)
download	linux-5abe37954e9a315c35c9490f78d55f307c3c636b.tar.xz