diff options
| author | Linus Torvalds <torvalds@linux-foundation.org> | 2025-07-28 23:36:49 +0300 |
|---|---|---|
| committer | Linus Torvalds <torvalds@linux-foundation.org> | 2025-07-28 23:36:49 +0300 |
| commit | 278c7d9b5e0ca73a75e5151c22fb05c91cb4495f (patch) | |
| tree | c12dcfa73d4db1aa71996ae450fd40357dcca0cd /include/uapi/linux | |
| parent | 0c4ec4a339b435381bc998f74862bd7a23d33f79 (diff) | |
| parent | 4f984fe7b4d9aea332c7ff59827a4e168f0e4e1b (diff) | |
| download | linux-278c7d9b5e0ca73a75e5151c22fb05c91cb4495f.tar.xz | |
Merge tag 'vfs-6.17-rc1.fallocate' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull fallocate updates from Christian Brauner:
"fallocate() currently supports creating preallocated files
efficiently. However, on most filesystems fallocate() will preallocate
blocks in an unwriten state even if FALLOC_FL_ZERO_RANGE is specified.
The extent state must later be converted to a written state when the
user writes data into this range, which can trigger numerous metadata
changes and journal I/O. This may leads to significant write
amplification and performance degradation in synchronous write mode.
At the moment, the only method to avoid this is to create an empty
file and write zero data into it (for example, using 'dd' with a large
block size). However, this method is slow and consumes a considerable
amount of disk bandwidth.
Now that more and more flash-based storage devices are available it is
possible to efficiently write zeros to SSDs using the unmap write
zeroes command if the devices do not write physical zeroes to the
media.
For example, if SCSI SSDs support the UMMAP bit or NVMe SSDs support
the DEAC bit[1], the write zeroes command does not write actual data
to the device, instead, NVMe converts the zeroed range to a
deallocated state, which works fast and consumes almost no disk write
bandwidth.
This series implements the BLK_FEAT_WRITE_ZEROES_UNMAP feature and
BLK_FLAG_WRITE_ZEROES_UNMAP_DISABLED flag for SCSI, NVMe and
device-mapper drivers, and add the FALLOC_FL_WRITE_ZEROES and
STATX_ATTR_WRITE_ZEROES_UNMAP support for ext4 and raw bdev devices.
fallocate() is subsequently extended with the FALLOC_FL_WRITE_ZEROES
flag. FALLOC_FL_WRITE_ZEROES zeroes a specified file range in such a
way that subsequent writes to that range do not require further
changes to the file mapping metadata. This flag is beneficial for
subsequent pure overwriting within this range, as it can save on block
allocation and, consequently, significant metadata changes"
* tag 'vfs-6.17-rc1.fallocate' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
ext4: add FALLOC_FL_WRITE_ZEROES support
block: add FALLOC_FL_WRITE_ZEROES support
block: factor out common part in blkdev_fallocate()
fs: introduce FALLOC_FL_WRITE_ZEROES to fallocate
dm: clear unmap write zeroes limits when disabling write zeroes
scsi: sd: set max_hw_wzeroes_unmap_sectors if device supports SD_ZERO_*_UNMAP
nvmet: set WZDS and DRB if device enables unmap write zeroes operation
nvme: set max_hw_wzeroes_unmap_sectors if device supports DEAC bit
block: introduce max_{hw|user}_wzeroes_unmap_sectors to queue limits
Diffstat (limited to 'include/uapi/linux')
| -rw-r--r-- | include/uapi/linux/falloc.h | 17 |
1 files changed, 17 insertions, 0 deletions
diff --git a/include/uapi/linux/falloc.h b/include/uapi/linux/falloc.h index 5810371ed72b..1f9ca757d02d 100644 --- a/include/uapi/linux/falloc.h +++ b/include/uapi/linux/falloc.h @@ -78,4 +78,21 @@ */ #define FALLOC_FL_UNSHARE_RANGE 0x40 +/* + * FALLOC_FL_WRITE_ZEROES zeroes a specified file range in such a way that + * subsequent writes to that range do not require further changes to the file + * mapping metadata. This flag is beneficial for subsequent pure overwriting + * within this range, as it can save on block allocation and, consequently, + * significant metadata changes. Therefore, filesystems that always require + * out-of-place writes should not support this flag. + * + * Different filesystems may implement different limitations on the + * granularity of the zeroing operation. Most will preferably be accelerated + * by submitting write zeroes command if the backing storage supports, which + * may not physically write zeros to the media. + * + * This flag cannot be specified in conjunction with the FALLOC_FL_KEEP_SIZE. + */ +#define FALLOC_FL_WRITE_ZEROES 0x80 + #endif /* _UAPI_FALLOC_H_ */ |
