diff options
author | Amir Goldstein <amir73il@gmail.com> | 2019-05-14 03:22:30 +0300 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2019-05-14 19:47:50 +0300 |
commit | c553ea4fdf2701d64b9e9cca4497a8a2512bb025 (patch) | |
tree | 545b8034a6e27d2fa63eac9704c41165cc4bdf1d | |
parent | 5326905798dee047bc6216da63ecf2c93c15968e (diff) | |
download | linux-c553ea4fdf2701d64b9e9cca4497a8a2512bb025.tar.xz |
fs/sync.c: sync_file_range(2) may use WB_SYNC_ALL writeback
23d0127096cb ("fs/sync.c: make sync_file_range(2) use WB_SYNC_NONE
writeback") claims that sync_file_range(2) syscall was "created for
userspace to be able to issue background writeout and so waiting for
in-flight IO is undesirable there" and changes the writeback (back) to
WB_SYNC_NONE.
This claim is only partially true. It is true for users that use the flag
SYNC_FILE_RANGE_WRITE by itself, as does PostgreSQL, the user that was the
reason for changing to WB_SYNC_NONE writeback.
However, that claim is not true for users that use that flag combination
SYNC_FILE_RANGE_{WAIT_BEFORE|WRITE|_WAIT_AFTER}. Those users explicitly
requested to wait for in-flight IO as well as to writeback of dirty pages.
Re-brand that flag combination as SYNC_FILE_RANGE_WRITE_AND_WAIT and use
WB_SYNC_ALL writeback to perform the full range sync request.
Link: http://lkml.kernel.org/r/20190409114922.30095-1-amir73il@gmail.com
Link: http://lkml.kernel.org/r/20190419072938.31320-1-amir73il@gmail.com
Fixes: 23d0127096cb ("fs/sync.c: make sync_file_range(2) use WB_SYNC_NONE")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Acked-by: Jan Kara <jack@suse.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-rw-r--r-- | fs/sync.c | 21 | ||||
-rw-r--r-- | include/uapi/linux/fs.h | 3 |
2 files changed, 18 insertions, 6 deletions
diff --git a/fs/sync.c b/fs/sync.c index 01e82170545a..4d1ff010bc5a 100644 --- a/fs/sync.c +++ b/fs/sync.c @@ -292,8 +292,14 @@ int sync_file_range(struct file *file, loff_t offset, loff_t nbytes, } if (flags & SYNC_FILE_RANGE_WRITE) { + int sync_mode = WB_SYNC_NONE; + + if ((flags & SYNC_FILE_RANGE_WRITE_AND_WAIT) == + SYNC_FILE_RANGE_WRITE_AND_WAIT) + sync_mode = WB_SYNC_ALL; + ret = __filemap_fdatawrite_range(mapping, offset, endbyte, - WB_SYNC_NONE); + sync_mode); if (ret < 0) goto out; } @@ -306,9 +312,9 @@ out: } /* - * sys_sync_file_range() permits finely controlled syncing over a segment of + * ksys_sync_file_range() permits finely controlled syncing over a segment of * a file in the range offset .. (offset+nbytes-1) inclusive. If nbytes is - * zero then sys_sync_file_range() will operate from offset out to EOF. + * zero then ksys_sync_file_range() will operate from offset out to EOF. * * The flag bits are: * @@ -325,7 +331,7 @@ out: * Useful combinations of the flag bits are: * * SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE: ensures that all pages - * in the range which were dirty on entry to sys_sync_file_range() are placed + * in the range which were dirty on entry to ksys_sync_file_range() are placed * under writeout. This is a start-write-for-data-integrity operation. * * SYNC_FILE_RANGE_WRITE: start writeout of all dirty pages in the range which @@ -337,10 +343,13 @@ out: * earlier SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE operation to wait * for that operation to complete and to return the result. * - * SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE|SYNC_FILE_RANGE_WAIT_AFTER: + * SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE|SYNC_FILE_RANGE_WAIT_AFTER + * (a.k.a. SYNC_FILE_RANGE_WRITE_AND_WAIT): * a traditional sync() operation. This is a write-for-data-integrity operation * which will ensure that all pages in the range which were dirty on entry to - * sys_sync_file_range() are committed to disk. + * ksys_sync_file_range() are written to disk. It should be noted that disk + * caches are not flushed by this call, so there are no guarantees here that the + * data will be available on disk after a crash. * * * SYNC_FILE_RANGE_WAIT_BEFORE and SYNC_FILE_RANGE_WAIT_AFTER will detect any diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h index 121e82ce296b..59c71fa8c553 100644 --- a/include/uapi/linux/fs.h +++ b/include/uapi/linux/fs.h @@ -320,6 +320,9 @@ struct fscrypt_key { #define SYNC_FILE_RANGE_WAIT_BEFORE 1 #define SYNC_FILE_RANGE_WRITE 2 #define SYNC_FILE_RANGE_WAIT_AFTER 4 +#define SYNC_FILE_RANGE_WRITE_AND_WAIT (SYNC_FILE_RANGE_WRITE | \ + SYNC_FILE_RANGE_WAIT_BEFORE | \ + SYNC_FILE_RANGE_WAIT_AFTER) /* * Flags for preadv2/pwritev2: |