diff options
author | Lars Ellenberg <lars.ellenberg@linbit.com> | 2018-12-20 19:23:42 +0300 |
---|---|---|
committer | Jens Axboe <axboe@kernel.dk> | 2018-12-20 19:51:31 +0300 |
commit | f31e583aa2c20892aca3add26957dee6ab80a534 (patch) | |
tree | cd57652fc773adee2e259c783a6abe30da88acfd /drivers/block/drbd/drbd_protocol.h | |
parent | 9848b6ddd8c92305252f94592c5e278574e7a6ac (diff) | |
download | linux-f31e583aa2c20892aca3add26957dee6ab80a534.tar.xz |
drbd: introduce P_ZEROES (REQ_OP_WRITE_ZEROES on the "wire")
And also re-enable partial-zero-out + discard aligned.
With the introduction of REQ_OP_WRITE_ZEROES,
we started to use that for both WRITE_ZEROES and DISCARDS,
hoping that WRITE_ZEROES would "do what we want",
UNMAP if possible, zero-out the rest.
The example scenario is some LVM "thin" backend.
While an un-allocated block on dm-thin reads as zeroes, on a dm-thin
with "skip_block_zeroing=true", after a partial block write allocated
that block, that same block may well map "undefined old garbage" from
the backends on LBAs that have not yet been written to.
If we cannot distinguish between zero-out and discard on the receiving
side, to avoid "undefined old garbage" to pop up randomly at later times
on supposedly zero-initialized blocks, we'd need to map all discards to
zero-out on the receiving side. But that would potentially do a full
alloc on thinly provisioned backends, even when the expectation was to
unmap/trim/discard/de-allocate.
We need to distinguish on the protocol level, whether we need to guarantee
zeroes (and thus use zero-out, potentially doing the mentioned full-alloc),
or if we want to put the emphasis on discard, and only do a "best effort
zeroing" (by "discarding" blocks aligned to discard-granularity, and zeroing
only potential unaligned head and tail clippings to at least *try* to
avoid "false positives" in an online-verify later), hoping that someone
set skip_block_zeroing=false.
For some discussion regarding this on dm-devel, see also
https://www.mail-archive.com/dm-devel%40redhat.com/msg07965.html
https://www.redhat.com/archives/dm-devel/2018-January/msg00271.html
For backward compatibility, P_TRIM means zero-out, unless the
DRBD_FF_WZEROES feature flag is agreed upon during handshake.
To have upper layers even try to submit WRITE ZEROES requests,
we need to announce "efficient zeroout" independently.
We need to fixup max_write_zeroes_sectors after blk_queue_stack_limits():
if we can handle "zeroes" efficiently on the protocol,
we want to do that, even if our backend does not announce
max_write_zeroes_sectors itself.
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Diffstat (limited to 'drivers/block/drbd/drbd_protocol.h')
-rw-r--r-- | drivers/block/drbd/drbd_protocol.h | 47 |
1 files changed, 47 insertions, 0 deletions
diff --git a/drivers/block/drbd/drbd_protocol.h b/drivers/block/drbd/drbd_protocol.h index 48dabbb21e11..e6fc5ad72501 100644 --- a/drivers/block/drbd/drbd_protocol.h +++ b/drivers/block/drbd/drbd_protocol.h @@ -70,6 +70,11 @@ enum drbd_packet { * we may fall back to an opencoded loop instead. */ P_WSAME = 0x34, + /* 0x35 already claimed in DRBD 9 */ + P_ZEROES = 0x36, /* data sock: zero-out, WRITE_ZEROES */ + + /* 0x40 .. 0x48 already claimed in DRBD 9 */ + P_MAY_IGNORE = 0x100, /* Flag to test if (cmd > P_MAY_IGNORE) ... */ P_MAX_OPT_CMD = 0x101, @@ -130,6 +135,12 @@ struct p_header100 { #define DP_SEND_RECEIVE_ACK 128 /* This is a proto B write request */ #define DP_SEND_WRITE_ACK 256 /* This is a proto C write request */ #define DP_WSAME 512 /* equiv. REQ_WRITE_SAME */ +#define DP_ZEROES 1024 /* equiv. REQ_OP_WRITE_ZEROES */ + +/* possible combinations: + * REQ_OP_WRITE_ZEROES: DP_DISCARD | DP_ZEROES + * REQ_OP_WRITE_ZEROES + REQ_NOUNMAP: DP_ZEROES + */ struct p_data { u64 sector; /* 64 bits sector number */ @@ -197,6 +208,42 @@ struct p_block_req { */ #define DRBD_FF_WSAME 4 +/* supports REQ_OP_WRITE_ZEROES on the "wire" protocol. + * + * We used to map that to "discard" on the sending side, and if we cannot + * guarantee that discard zeroes data, the receiving side would map discard + * back to zero-out. + * + * With the introduction of REQ_OP_WRITE_ZEROES, + * we started to use that for both WRITE_ZEROES and DISCARDS, + * hoping that WRITE_ZEROES would "do what we want", + * UNMAP if possible, zero-out the rest. + * + * The example scenario is some LVM "thin" backend. + * + * While an un-allocated block on dm-thin reads as zeroes, on a dm-thin + * with "skip_block_zeroing=true", after a partial block write allocated + * that block, that same block may well map "undefined old garbage" from + * the backends on LBAs that have not yet been written to. + * + * If we cannot distinguish between zero-out and discard on the receiving + * side, to avoid "undefined old garbage" to pop up randomly at later times + * on supposedly zero-initialized blocks, we'd need to map all discards to + * zero-out on the receiving side. But that would potentially do a full + * alloc on thinly provisioned backends, even when the expectation was to + * unmap/trim/discard/de-allocate. + * + * We need to distinguish on the protocol level, whether we need to guarantee + * zeroes (and thus use zero-out, potentially doing the mentioned full-alloc), + * or if we want to put the emphasis on discard, and only do a "best effort + * zeroing" (by "discarding" blocks aligned to discard-granularity, and zeroing + * only potential unaligned head and tail clippings), to at least *try* to + * avoid "false positives" in an online-verify later, hoping that someone + * set skip_block_zeroing=false. + */ +#define DRBD_FF_WZEROES 8 + + struct p_connection_features { u32 protocol_min; u32 feature_flags; |