Diffstat (limited to 'Documentation/filesystems/iomap')
-rw-r--r-- | Documentation/filesystems/iomap/design.rst | 25
-rw-r--r-- | Documentation/filesystems/iomap/operations.rst | 45
2 files changed, 61 insertions, 9 deletions
diff --git a/Documentation/filesystems/iomap/design.rst b/Documentation/filesystems/iomap/design.rst index b0d0188a095e..f2df9b6df988 100644 --- a/Documentation/filesystems/iomap/design.rst +++ b/Documentation/filesystems/iomap/design.rst @@ -243,8 +243,24 @@ The fields are as follows: regular file data. This is only useful for FIEMAP. - * **IOMAP_F_PRIVATE**: Starting with this value, the upper bits can - be set by the filesystem for its own purposes. + * **IOMAP_F_BOUNDARY**: This indicates I/O and its completion must not be + merged with any other I/O or completion. Filesystems must use this when + submitting I/O to devices that cannot handle I/O crossing certain LBAs + (e.g. ZNS devices). This flag applies only to buffered I/O writeback; all + other functions ignore it. + + * **IOMAP_F_PRIVATE**: This flag is reserved for filesystem private use. + + * **IOMAP_F_ANON_WRITE**: Indicates that (write) I/O does not have a target + block assigned to it yet and the file system will do that in the bio + submission handler, splitting the I/O as needed. + + * **IOMAP_F_ATOMIC_BIO**: This indicates write I/O must be submitted with the + ``REQ_ATOMIC`` flag set in the bio. Filesystems need to set this flag to + inform iomap that the write I/O operation requires torn-write protection + based on HW-offload mechanism. They must also ensure that mapping updates + upon the completion of the I/O must be performed in a single metadata + update. These flags can be set by iomap itself during file operations. The filesystem should supply an ``->iomap_end`` function if it needs @@ -352,6 +368,11 @@ operations: ``IOMAP_NOWAIT`` is often set on behalf of ``IOCB_NOWAIT`` or ``RWF_NOWAIT``. + * ``IOMAP_DONTCACHE`` is set when the caller wishes to perform a + buffered file I/O and would like the kernel to drop the pagecache + after the I/O completes, if it isn't already being used by another + thread. 
+ If it is necessary to read existing file contents from a `different <https://lore.kernel.org/all/20191008071527.29304-9-hch@lst.de/>`_ device or address range on a device, the filesystem should return that diff --git a/Documentation/filesystems/iomap/operations.rst b/Documentation/filesystems/iomap/operations.rst index b93115ab8748..3b628e370d88 100644 --- a/Documentation/filesystems/iomap/operations.rst +++ b/Documentation/filesystems/iomap/operations.rst @@ -104,7 +104,7 @@ iomap calls these functions: For the pagecache, races can happen if writeback doesn't take ``i_rwsem`` or ``invalidate_lock`` and updates mapping information. - Races can also happen if the filesytem allows concurrent writes. + Races can also happen if the filesystem allows concurrent writes. For such files, the mapping *must* be revalidated after the folio lock has been taken so that iomap can manage the folio correctly. @@ -131,6 +131,8 @@ These ``struct kiocb`` flags are significant for buffered I/O with iomap: * ``IOCB_NOWAIT``: Turns on ``IOMAP_NOWAIT``. + * ``IOCB_DONTCACHE``: Turns on ``IOMAP_DONTCACHE``. + Internal per-Folio State ------------------------ @@ -283,7 +285,7 @@ The ``ops`` structure must be specified and is as follows: struct iomap_writeback_ops { int (*map_blocks)(struct iomap_writepage_ctx *wpc, struct inode *inode, loff_t offset, unsigned len); - int (*prepare_ioend)(struct iomap_ioend *ioend, int status); + int (*submit_ioend)(struct iomap_writepage_ctx *wpc, int status); void (*discard_folio)(struct folio *folio, loff_t pos); }; @@ -306,13 +308,12 @@ The fields are as follows: purpose. This function must be supplied by the filesystem. - - ``prepare_ioend``: Enables filesystems to transform the writeback - ioend or perform any other preparatory work before the writeback I/O - is submitted. + - ``submit_ioend``: Allows the file systems to hook into writeback bio + submission. 
This might include pre-write space accounting updates, or installing a custom ``->bi_end_io`` function for internal purposes, such as deferring the ioend completion to a workqueue to run metadata update - transactions from process context. + transactions from process context before submitting the bio. This function is optional. - ``discard_folio``: iomap calls this function after ``->map_blocks`` @@ -341,7 +342,7 @@ This can happen in interrupt or process context, depending on the storage device. Filesystems that need to update internal bookkeeping (e.g. unwritten -extent conversions) should provide a ``->prepare_ioend`` function to +extent conversions) should provide a ``->submit_ioend`` function to set ``struct iomap_end::bio::bi_end_io`` to its own function. This function should call ``iomap_finish_ioends`` after finishing its own work (e.g. unwritten extent conversion). @@ -513,6 +514,36 @@ IOMAP_WRITE`` with any combination of the following enhancements: if the mapping is unwritten and the filesystem cannot handle zeroing the unaligned regions without exposing stale contents. + * ``IOMAP_ATOMIC``: This write is being issued with torn-write + protection. + Torn-write protection may be provided based on HW-offload or by a + software mechanism provided by the filesystem. + + For HW-offload based support, only a single bio can be created for the + write, and the write must not be split into multiple I/O requests, i.e. + flag REQ_ATOMIC must be set. + The file range to write must be aligned to satisfy the requirements + of both the filesystem and the underlying block device's atomic + commit capabilities. + If filesystem metadata updates are required (e.g. unwritten extent + conversion or copy-on-write), all updates for the entire file range + must be committed atomically as well. + Untorn-writes may be longer than a single file block. In all cases, + the mapping start disk block must have at least the same alignment as + the write offset. 
+ The filesystems must set IOMAP_F_ATOMIC_BIO to inform iomap core of an + untorn-write based on HW-offload. + + For untorn-writes based on a software mechanism provided by the + filesystem, all the disk block alignment and single bio restrictions + which apply for HW-offload based untorn-writes do not apply. + The mechanism would typically be used as a fallback for when + HW-offload based untorn-writes may not be issued, e.g. the range of the + write covers multiple extents, meaning that it is not possible to issue + a single bio. + All filesystem metadata updates for the entire file range must be + committed atomically as well. + Callers commonly hold ``i_rwsem`` in shared or exclusive mode before calling this function.
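The choice between HW-offload and software-based untorn writes described in the ``IOMAP_ATOMIC`` text can be sketched in plain userspace C. This is a minimal illustration, not kernel code: ``struct demo_mapping``, ``demo_can_use_hw_atomic()`` and the ``unit_min``/``unit_max`` parameters are hypothetical names introduced here. The power-of-two length rule is an assumption, and "at least the same alignment as the write offset" is simplified to "file offset and disk address are both multiples of the write length". A real filesystem would apply its own limits and those reported by the block device.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one file-to-disk mapping. */
struct demo_mapping {
	uint64_t offset;	/* file offset where the mapping starts, in bytes */
	uint64_t length;	/* length of the mapping, in bytes */
	uint64_t addr;		/* disk address of the mapping start, in bytes */
};

static bool demo_is_power_of_2(uint64_t v)
{
	return v != 0 && (v & (v - 1)) == 0;
}

/*
 * Return true if a write of @len bytes at file offset @pos could be issued
 * as a single HW-offload untorn write under the simplified rules above.
 * A false result means the caller would fall back to a software mechanism.
 */
static bool demo_can_use_hw_atomic(const struct demo_mapping *m,
				   uint64_t pos, uint64_t len,
				   uint64_t unit_min, uint64_t unit_max)
{
	uint64_t delta;

	/* Assumed device limits: power-of-two size within [unit_min, unit_max]. */
	if (!demo_is_power_of_2(len) || len < unit_min || len > unit_max)
		return false;

	/* The file range must be naturally aligned to the write length. */
	if (pos % len != 0)
		return false;

	/* The write must fit inside one mapping so that one bio can cover it. */
	if (pos < m->offset)
		return false;
	delta = pos - m->offset;
	if (delta > m->length || len > m->length - delta)
		return false;

	/* The disk address must be aligned like the file offset. */
	return (m->addr + delta) % len == 0;
}
```

With a 1 MiB mapping at disk address 65536, a 16 KiB write at file offset 16384 passes every check. The same write at offset 4096, or against a mapping whose disk address is only 4 KiB aligned, is rejected and would take the software path.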