summaryrefslogtreecommitdiff
path: root/Documentation
diff options
context:
space:
mode:
authorLinus Torvalds <torvalds@linux-foundation.org>2026-04-16 01:11:05 +0300
committerLinus Torvalds <torvalds@linux-foundation.org>2026-04-16 01:11:05 +0300
commita5f998094fa344cdd1342164948abb4d7c6101ce (patch)
tree804e1470a56e1b449c9fec40a415cc461738ee2b /Documentation
parentf1d26d72f01556c787b1291729aa7a2ce37656a8 (diff)
parent8c0ee19db81f0fa1ff25fd75b22b17c0cc2acde3 (diff)
downloadlinux-a5f998094fa344cdd1342164948abb4d7c6101ce.tar.xz
Merge tag 'for-7.1/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Benjamin Marzinski: "There are fixes for some corner case crashes in dm-cache and dm-mirror, new setup functionality for dm-vdo, and miscellaneous minor fixes and cleanups, especially to dm-verity. dm-vdo: - Make dm-vdo able to format the device itself, like other dm targets, instead of needing a userspace formating program - Add some sanity checks and code cleanup dm-cache: - Fix crashes and hangs when operating in passthrough mode (which have been around, unnoticed, since 4.12), as well as a late arriving fix for an error path bug in the passthrough fix - Fix a corner case memory leak dm-verity: - Another set of minor bugfixes and code cleanups to the forward error correction code dm-mirror - Fix minor initialization bug - Fix overflow crash on a large devices with small region sizes dm-crypt - Reimplement elephant diffuser using AES library and minor cleanups dm-core: - Claude found a buffer overflow in /dev/mapper/contrl ioctl handling - make dm_mod.wait_for correctly wait for partitions - minor code fixes and cleanups" * tag 'for-7.1/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (62 commits) dm cache: fix missing return in invalidate_committed's error path dm: fix a buffer overflow in ioctl processing dm-crypt: Make crypt_iv_operations::post return void dm vdo: Fix spelling mistake "postive" -> "positive" dm: provide helper to set stacked limits dm-integrity: always set the io hints dm-integrity: fix mismatched queue limits dm-bufio: use kzalloc_flex dm vdo: save the formatted metadata to disk dm vdo: add formatting logic and initialization dm vdo: add synchronous metadata I/O submission helper dm vdo: add geometry block structure dm vdo: add geometry block encoding dm vdo: add upfront validation for logical size dm vdo: add formatting parameters to table line dm vdo: add super block initialization to encodings.c dm vdo: add geometry block initialization to encodings.c dm-crypt: Make crypt_iv_operations::wipe return void dm-crypt: Reimplement elephant diffuser using AES library dm-verity-fec: warn even when there were no errors ...
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/admin-guide/device-mapper/verity.rst122
1 files changed, 102 insertions, 20 deletions
diff --git a/Documentation/admin-guide/device-mapper/verity.rst b/Documentation/admin-guide/device-mapper/verity.rst
index 3ecab1cff9c6..eb9475d7e196 100644
--- a/Documentation/admin-guide/device-mapper/verity.rst
+++ b/Documentation/admin-guide/device-mapper/verity.rst
@@ -102,29 +102,42 @@ ignore_zero_blocks
that are not guaranteed to contain zeroes.
use_fec_from_device <fec_dev>
- Use forward error correction (FEC) to recover from corruption if hash
- verification fails. Use encoding data from the specified device. This
- may be the same device where data and hash blocks reside, in which case
- fec_start must be outside data and hash areas.
+ Use forward error correction (FEC) parity data from the specified device to
+ try to automatically recover from corruption and I/O errors.
- If the encoding data covers additional metadata, it must be accessible
- on the hash device after the hash blocks.
+ If this option is given, then <fec_roots> and <fec_blocks> must also be
+ given. <hash_block_size> must also be equal to <data_block_size>.
- Note: block sizes for data and hash devices must match. Also, if the
- verity <dev> is encrypted the <fec_dev> should be too.
+ <fec_dev> can be the same as <dev>, in which case <fec_start> must be
+ outside the data area. It can also be the same as <hash_dev>, in which case
+ <fec_start> must be outside the hash and optional additional metadata areas.
+
+ If the data <dev> is encrypted, the <fec_dev> should be too.
+
+ For more information, see `Forward error correction`_.
fec_roots <num>
- Number of generator roots. This equals to the number of parity bytes in
- the encoding data. For example, in RS(M, N) encoding, the number of roots
- is M-N.
+ The number of parity bytes in each 255-byte Reed-Solomon codeword. The
+ Reed-Solomon code used will be an RS(255, k) code where k = 255 - fec_roots.
+
+ The supported values are 2 through 24 inclusive. Higher values provide
+ stronger error correction. However, the minimum value of 2 already provides
+ strong error correction due to the use of interleaving, so 2 is the
+ recommended value for most users. fec_roots=2 corresponds to an
+ RS(255, 253) code, which has a space overhead of about 0.8%.
fec_blocks <num>
- The number of encoding data blocks on the FEC device. The block size for
- the FEC device is <data_block_size>.
+ The total number of <data_block_size> blocks that are error-checked using
+ FEC. This must be at least the sum of <num_data_blocks> and the number of
+ blocks needed by the hash tree. It can include additional metadata blocks,
+ which are assumed to be accessible on <hash_dev> following the hash blocks.
+
+ Note that this is *not* the number of parity blocks. The number of parity
+ blocks is inferred from <fec_blocks>, <fec_roots>, and <data_block_size>.
fec_start <offset>
- This is the offset, in <data_block_size> blocks, from the start of the
- FEC device to the beginning of the encoding data.
+ This is the offset, in <data_block_size> blocks, from the start of <fec_dev>
+ to the beginning of the parity data.
check_at_most_once
Verify data blocks only the first time they are read from the data device,
@@ -180,11 +193,6 @@ per-block basis. This allows for a lightweight hash computation on first read
into the page cache. Block hashes are stored linearly, aligned to the nearest
block size.
-If forward error correction (FEC) support is enabled any recovery of
-corrupted data will be verified using the cryptographic hash of the
-corresponding data. This is why combining error correction with
-integrity checking is essential.
-
Hash Tree
---------
@@ -212,6 +220,80 @@ The tree looks something like:
/ ... \ / . . . \ / \
blk_0 ... blk_127 blk_16256 blk_16383 blk_32640 . . . blk_32767
+Forward error correction
+------------------------
+
+dm-verity's optional forward error correction (FEC) support adds strong error
+correction capabilities to dm-verity. It allows systems that would be rendered
+inoperable by errors to continue operating, albeit with reduced performance.
+
+FEC uses Reed-Solomon (RS) codes that are interleaved across the entire
+device(s), allowing long bursts of corrupt or unreadable blocks to be recovered.
+
+dm-verity validates any FEC-corrected block against the wanted hash before using
+it. Therefore, FEC doesn't affect the security properties of dm-verity.
+
+The integration of FEC with dm-verity provides significant benefits over a
+separate error correction layer:
+
+- dm-verity invokes FEC only when a block's hash doesn't match the wanted hash
+ or the block cannot be read at all. As a result, FEC doesn't add overhead to
+ the common case where no error occurs.
+
+- dm-verity hashes are also used to identify erasure locations for RS decoding.
+ This allows correcting twice as many errors.
+
+FEC uses an RS(255, k) code where k = 255 - fec_roots. fec_roots is usually 2.
+This means that each k (usually 253) message bytes have fec_roots (usually 2)
+bytes of parity data added to get a 255-byte codeword. (Many external sources
+call RS codewords "blocks". Since dm-verity already uses the term "block" to
+mean something else, we'll use the clearer term "RS codeword".)
+
+FEC checks fec_blocks blocks of message data in total, consisting of:
+
+1. The data blocks from the data device
+2. The hash blocks from the hash device
+3. Optional additional metadata that follows the hash blocks on the hash device
+
+dm-verity assumes that the FEC parity data was computed as if the following
+procedure were followed:
+
+1. Concatenate the message data from the above sources.
+2. Zero-pad to the next multiple of k blocks. Let msg be the resulting byte
+ array, and msglen its length in bytes.
+3. For 0 <= i < msglen / k (for each RS codeword):
+ a. Select msg[i + j * msglen / k] for 0 <= j < k.
+ Consider these to be the 'k' message bytes of an RS codeword.
+ b. Compute the corresponding 'fec_roots' parity bytes of the RS codeword,
+ and concatenate them to the FEC parity data.
+
+Step 3a interleaves the RS codewords across the entire device using an
+interleaving degree of data_block_size * ceil(fec_blocks / k). This is the
+maximal interleaving, such that the message data consists of a region containing
+byte 0 of all the RS codewords, then a region containing byte 1 of all the RS
+codewords, and so on up to the region for byte 'k - 1'. Note that the number of
+codewords is set to a multiple of data_block_size; thus, the regions are
+block-aligned, and there is an implicit zero padding of up to 'k - 1' blocks.
+
+This interleaving allows long bursts of errors to be corrected. It provides
+much stronger error correction than storage devices typically provide, while
+keeping the space overhead low.
+
+The cost is slow decoding: correcting a single block usually requires reading
+254 extra blocks spread evenly across the device(s). However, that is
+acceptable because dm-verity uses FEC only when there is actually an error.
+
+The list below contains additional details about the RS codes used by
+dm-verity's FEC. Userspace programs that generate the parity data need to use
+these parameters for the parity data to match exactly:
+
+- Field used is GF(256)
+- Bytes are mapped to/from GF(256) elements in the natural way, where bits 0
+ through 7 (low-order to high-order) map to the coefficients of x^0 through x^7
+- Field generator polynomial is x^8 + x^4 + x^3 + x^2 + 1
+- The codes used are systematic, BCH-view codes
+- Primitive element alpha is 'x'
+- First consecutive root of code generator polynomial is 'x^0'
On-disk format
==============