From ed04d98169f1c33ebc79f510c855eed83924d97f Mon Sep 17 00:00:00 2001 From: Milan Broz Date: Mon, 28 Oct 2013 23:21:04 +0100 Subject: dm crypt: add TCW IV mode for old CBC TCRYPT containers dm-crypt can already activate TCRYPT (TrueCrypt compatible) containers in LRW or XTS block encryption mode. TCRYPT containers prior to version 4.1 use CBC mode with some additional tweaks; this patch adds support for these containers. This new mode is implemented using a special IV generator named TCW (TrueCrypt IV with whitening). TCW IV only supports containers that are encrypted with one cipher (tested with AES, Twofish, Serpent, CAST5 and TripleDES). While this mode is legacy and is known to be vulnerable to some watermarking attacks (e.g. revealing the existence of a hidden disk), it can still be useful for activating old containers without 3rd party software or for independent forensic analysis of such containers. (Both the userspace and kernel code are independent implementations based on the format documentation and completely avoid use of the original source code.) The TCW IV generator uses two additional keys: Kw (whitening seed, size is always 16 bytes - TCW_WHITENING_SIZE) and Kiv (IV seed, size is always the IV size of the selected cipher). These keys are concatenated at the end of the main encryption key provided in the mapping table. While whitening is completely independent of the IV, it is implemented inside the IV generator for simplicity. The whitening value is always 16 bytes long and is calculated per sector from the provided Kw as an initial seed, xored with the sector number and mixed with the CRC32 algorithm. The resulting value is xored with the ciphertext sector content. The IV is calculated from the provided Kiv as an initial IV seed, xored with the sector number. The detailed calculation can be found in the TrueCrypt documentation for versions < 4.1 and will also be described on the dm-crypt wiki, see: http://code.google.com/p/cryptsetup/wiki/DMCrypt Experimental support for activating these containers is already present in the git devel branch of cryptsetup. Signed-off-by: Milan Broz Signed-off-by: Mike Snitzer --- Documentation/device-mapper/dm-crypt.txt | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/device-mapper/dm-crypt.txt b/Documentation/device-mapper/dm-crypt.txt index 2c656ae43ba7..c81839b52c4d 100644 --- a/Documentation/device-mapper/dm-crypt.txt +++ b/Documentation/device-mapper/dm-crypt.txt @@ -4,12 +4,15 @@ dm-crypt Device-Mapper's "crypt" target provides transparent encryption of block devices using the kernel crypto API. +For a more detailed description of supported parameters see: +http://code.google.com/p/cryptsetup/wiki/DMCrypt + Parameters: <cipher> <key> <iv_offset> <device path> <offset> \ [<#opt_params> <opt_params>] <cipher> Encryption cipher and an optional IV generation mode. - (In format cipher[:keycount]-chainmode-ivopts:ivmode). + (In format cipher[:keycount]-chainmode-ivmode[:ivopts]). Examples: des aes-cbc-essiv:sha256 @@ -19,7 +22,11 @@ Parameters: <cipher> <key> <iv_offset> <device path> <offset> \ <key> Key used for encryption. It is encoded as a hexadecimal number. - You can only use key sizes that are valid for the selected cipher. + You can only use key sizes that are valid for the selected cipher + in combination with the selected iv mode. + Note that for some iv modes the key string can contain additional + keys (for example IV seed) so the key contains more parts concatenated + into a single string.
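To make the per-sector calculation described in the TCW commit message above concrete, here is a rough user-space sketch of the whitening and IV derivation. It is an illustration only, not the dm-crypt implementation: the exact byte offsets and CRC32 mixing steps in the kernel differ, the crc32() helper is assumed to be provided externally, the little-endian view of the sector number is an assumption, and all names are hypothetical.

    #include <stdint.h>
    #include <string.h>

    #define TCW_WHITENING_SIZE 16

    /* Assumed external CRC32 routine (any zlib-style implementation). */
    uint32_t crc32(uint32_t seed, const void *buf, unsigned int len);

    /*
     * Whitening: start from the Kw seed (the last TCW_WHITENING_SIZE
     * bytes of the mapping-table key), xor in the sector number, mix
     * with CRC32, then xor the result over the sector's ciphertext.
     */
    static void tcw_whiten_sector(const uint8_t kw[TCW_WHITENING_SIZE],
                                  uint64_t sector, uint8_t *data, size_t len)
    {
            uint8_t buf[TCW_WHITENING_SIZE];
            size_t i;

            memcpy(buf, kw, sizeof(buf));
            for (i = 0; i < sizeof(buf); i++)       /* xor with sector number */
                    buf[i] ^= ((const uint8_t *)&sector)[i % 8];
            for (i = 0; i < 4; i++) {               /* CRC32 mix, one 32-bit word at a time */
                    uint32_t crc = crc32(0, &buf[i * 4], 4);
                    memcpy(&buf[i * 4], &crc, 4);
            }
            for (i = 0; i < len; i++)               /* whiten the ciphertext */
                    data[i] ^= buf[i % TCW_WHITENING_SIZE];
    }

    /* IV: the Kiv seed xored with the sector number. */
    static void tcw_iv(const uint8_t *kiv, size_t iv_size,
                       uint64_t sector, uint8_t *iv)
    {
            size_t i;

            memcpy(iv, kiv, iv_size);
            for (i = 0; i < iv_size && i < 8; i++)
                    iv[i] ^= ((const uint8_t *)&sector)[i];
    }

The key string passed in the mapping table is then the main cipher key with Kiv and Kw appended, which is why, as the documentation above notes, the key can contain more parts concatenated into a single string.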
<keycount> Multi-key compatibility mode. You can define <keycount> keys and -- cgit v1.2.3 From 01911c19bea63b1a958b9d9024504c2e9079f155 Mon Sep 17 00:00:00 2001 From: Joe Thornber Date: Thu, 24 Oct 2013 14:10:28 -0400 Subject: dm cache policy mq: implement writeback_work() and mq_{set,clear}_dirty() There are now two multiqueues for in-cache blocks: a clean one and a dirty one. writeback_work comes from the dirty one. Demotions come from the clean one. There are two benefits: - Performance improvement, since demoting a clean block is a no-op. - The cache cleans itself when I/O load is light. Signed-off-by: Joe Thornber Signed-off-by: Heinz Mauelshagen Signed-off-by: Mike Snitzer --- Documentation/device-mapper/cache-policies.txt | 6 +- drivers/md/dm-cache-policy-mq.c | 147 +++++++++++++++++++++---- 2 files changed, 132 insertions(+), 21 deletions(-) (limited to 'Documentation') diff --git a/Documentation/device-mapper/cache-policies.txt b/Documentation/device-mapper/cache-policies.txt index d7c440b444cc..df52a849957f 100644 --- a/Documentation/device-mapper/cache-policies.txt +++ b/Documentation/device-mapper/cache-policies.txt @@ -30,8 +30,10 @@ multiqueue This policy is the default. -The multiqueue policy has two sets of 16 queues: one set for entries -waiting for the cache and another one for those in the cache. +The multiqueue policy has three sets of 16 queues: one set for entries +waiting for the cache and another two for those in the cache (a set for +clean entries and a set for dirty entries). + Cache entries in the queues are aged based on logical time. Entry into the cache is based on variable thresholds and queue selection is based on hit count on entry. The policy aims to take different cache miss diff --git a/drivers/md/dm-cache-policy-mq.c b/drivers/md/dm-cache-policy-mq.c index a9a25de5b011..6710e038c730 100644 --- a/drivers/md/dm-cache-policy-mq.c +++ b/drivers/md/dm-cache-policy-mq.c @@ -224,6 +224,7 @@ struct entry { * FIXME: pack these better */ bool in_cache:1; + bool dirty:1; unsigned hit_count; unsigned generation; unsigned tick; @@ -238,13 +239,15 @@ struct mq_policy { struct io_tracker tracker; /* - * We maintain two queues of entries. The cache proper contains - * the currently active mappings. Whereas the pre_cache tracks - * blocks that are being hit frequently and potential candidates - * for promotion to the cache. + * We maintain three queues of entries. The cache proper, + * consisting of a clean and dirty queue, contains the currently + * active mappings. Whereas the pre_cache tracks blocks that + * are being hit frequently and potential candidates for promotion + * to the cache. */ struct queue pre_cache; - struct queue cache; + struct queue cache_clean; + struct queue cache_dirty; /* * Keeps track of time, incremented by the core. We use this to @@ -324,7 +327,8 @@ static void free_entries(struct mq_policy *mq) struct entry *e, *tmp; concat_queue(&mq->free, &mq->pre_cache); - concat_queue(&mq->free, &mq->cache); + concat_queue(&mq->free, &mq->cache_clean); + concat_queue(&mq->free, &mq->cache_dirty); list_for_each_entry_safe(e, tmp, &mq->free, list) kmem_cache_free(mq_entry_cache, e); @@ -508,7 +512,8 @@ static void push(struct mq_policy *mq, struct entry *e) if (e->in_cache) { alloc_cblock(mq, e->cblock); - queue_push(&mq->cache, queue_level(e), &e->list); + queue_push(e->dirty ?
&mq->cache_dirty : &mq->cache_clean, + queue_level(e), &e->list); } else queue_push(&mq->pre_cache, queue_level(e), &e->list); } @@ -558,7 +563,8 @@ static bool updated_this_tick(struct mq_policy *mq, struct entry *e) * of the entries. * * At the moment the threshold is taken by averaging the hit counts of some - * of the entries in the cache (the first 20 entries of the first level). + * of the entries in the cache (the first 20 entries across all levels in + * ascending order, giving preference to the clean entries at each level). * * We can be much cleverer than this though. For example, each promotion * could bump up the threshold helping to prevent churn. Much more to do @@ -580,7 +586,16 @@ static void check_generation(struct mq_policy *mq) mq->generation++; for (level = 0; level < NR_QUEUE_LEVELS && count < MAX_TO_AVERAGE; level++) { - head = mq->cache.qs + level; + head = mq->cache_clean.qs + level; + list_for_each_entry(e, head, list) { + nr++; + total += e->hit_count; + + if (++count >= MAX_TO_AVERAGE) + break; + } + + head = mq->cache_dirty.qs + level; list_for_each_entry(e, head, list) { nr++; total += e->hit_count; @@ -633,19 +648,28 @@ static void requeue_and_update_tick(struct mq_policy *mq, struct entry *e) * - set the hit count to a hard coded value other than 1, eg, is it better * if it goes in at level 2? */ -static dm_cblock_t demote_cblock(struct mq_policy *mq, dm_oblock_t *oblock) +static int demote_cblock(struct mq_policy *mq, dm_oblock_t *oblock, dm_cblock_t *cblock) { - dm_cblock_t result; - struct entry *demoted = pop(mq, &mq->cache); + struct entry *demoted = pop(mq, &mq->cache_clean); - BUG_ON(!demoted); - result = demoted->cblock; + if (!demoted) + /* + * We could get a block from mq->cache_dirty, but that + * would add extra latency to the triggering bio as it + * waits for the writeback. Better to not promote this + * time and hope there's a clean block next time this block + * is hit. + */ + return -ENOSPC; + + *cblock = demoted->cblock; *oblock = demoted->oblock; demoted->in_cache = false; + demoted->dirty = false; demoted->hit_count = 1; push(mq, demoted); - return result; + return 0; } /* @@ -705,11 +729,16 @@ static int cache_entry_found(struct mq_policy *mq, static int pre_cache_to_cache(struct mq_policy *mq, struct entry *e, struct policy_result *result) { + int r; dm_cblock_t cblock; if (find_free_cblock(mq, &cblock) == -ENOSPC) { result->op = POLICY_REPLACE; - cblock = demote_cblock(mq, &result->old_oblock); + r = demote_cblock(mq, &result->old_oblock, &cblock); + if (r) { + result->op = POLICY_MISS; + return 0; + } } else result->op = POLICY_NEW; @@ -717,6 +746,7 @@ static int pre_cache_to_cache(struct mq_policy *mq, struct entry *e, del(mq, e); e->in_cache = true; + e->dirty = false; push(mq, e); return 0; @@ -760,6 +790,7 @@ static void insert_in_pre_cache(struct mq_policy *mq, } e->in_cache = false; + e->dirty = false; e->oblock = oblock; e->hit_count = 1; e->generation = mq->generation; @@ -787,6 +818,7 @@ static void insert_in_cache(struct mq_policy *mq, dm_oblock_t oblock, e->oblock = oblock; e->cblock = cblock; e->in_cache = true; + e->dirty = false; e->hit_count = 1; e->generation = mq->generation; push(mq, e); @@ -917,6 +949,40 @@ static int mq_lookup(struct dm_cache_policy *p, dm_oblock_t oblock, dm_cblock_t return r; } +/* + * FIXME: __mq_set_clear_dirty can block due to mutex. + * Ideally a policy should not block in functions called + * from the map() function. Explore using RCU. 
+ */ +static void __mq_set_clear_dirty(struct dm_cache_policy *p, dm_oblock_t oblock, bool set) +{ + struct mq_policy *mq = to_mq_policy(p); + struct entry *e; + + mutex_lock(&mq->lock); + e = hash_lookup(mq, oblock); + if (!e) + DMWARN("__mq_set_clear_dirty called for a block that isn't in the cache"); + else { + BUG_ON(!e->in_cache); + + del(mq, e); + e->dirty = set; + push(mq, e); + } + mutex_unlock(&mq->lock); +} + +static void mq_set_dirty(struct dm_cache_policy *p, dm_oblock_t oblock) +{ + __mq_set_clear_dirty(p, oblock, true); +} + +static void mq_clear_dirty(struct dm_cache_policy *p, dm_oblock_t oblock) +{ + __mq_set_clear_dirty(p, oblock, false); +} + static int mq_load_mapping(struct dm_cache_policy *p, dm_oblock_t oblock, dm_cblock_t cblock, uint32_t hint, bool hint_valid) @@ -931,6 +997,7 @@ static int mq_load_mapping(struct dm_cache_policy *p, e->cblock = cblock; e->oblock = oblock; e->in_cache = true; + e->dirty = false; /* this gets corrected in a minute */ e->hit_count = hint_valid ? hint : 1; e->generation = mq->generation; push(mq, e); @@ -949,7 +1016,14 @@ static int mq_walk_mappings(struct dm_cache_policy *p, policy_walk_fn fn, mutex_lock(&mq->lock); for (level = 0; level < NR_QUEUE_LEVELS; level++) - list_for_each_entry(e, &mq->cache.qs[level], list) { + list_for_each_entry(e, &mq->cache_clean.qs[level], list) { + r = fn(context, e->cblock, e->oblock, e->hit_count); + if (r) + goto out; + } + + for (level = 0; level < NR_QUEUE_LEVELS; level++) + list_for_each_entry(e, &mq->cache_dirty.qs[level], list) { r = fn(context, e->cblock, e->oblock, e->hit_count); if (r) goto out; @@ -974,11 +1048,41 @@ static void mq_remove_mapping(struct dm_cache_policy *p, dm_oblock_t oblock) del(mq, e); e->in_cache = false; + e->dirty = false; push(mq, e); mutex_unlock(&mq->lock); } +static int __mq_writeback_work(struct mq_policy *mq, dm_oblock_t *oblock, + dm_cblock_t *cblock) +{ + struct entry *e = pop(mq, &mq->cache_dirty); + + if (!e) + return -ENODATA; + + *oblock = e->oblock; + *cblock = e->cblock; + e->dirty = false; + push(mq, e); + + return 0; +} + +static int mq_writeback_work(struct dm_cache_policy *p, dm_oblock_t *oblock, + dm_cblock_t *cblock) +{ + int r; + struct mq_policy *mq = to_mq_policy(p); + + mutex_lock(&mq->lock); + r = __mq_writeback_work(mq, oblock, cblock); + mutex_unlock(&mq->lock); + + return r; +} + static void force_mapping(struct mq_policy *mq, dm_oblock_t current_oblock, dm_oblock_t new_oblock) { @@ -988,6 +1092,7 @@ static void force_mapping(struct mq_policy *mq, del(mq, e); e->oblock = new_oblock; + e->dirty = true; push(mq, e); } @@ -1063,10 +1168,12 @@ static void init_policy_functions(struct mq_policy *mq) mq->policy.destroy = mq_destroy; mq->policy.map = mq_map; mq->policy.lookup = mq_lookup; + mq->policy.set_dirty = mq_set_dirty; + mq->policy.clear_dirty = mq_clear_dirty; mq->policy.load_mapping = mq_load_mapping; mq->policy.walk_mappings = mq_walk_mappings; mq->policy.remove_mapping = mq_remove_mapping; - mq->policy.writeback_work = NULL; + mq->policy.writeback_work = mq_writeback_work; mq->policy.force_mapping = mq_force_mapping; mq->policy.residency = mq_residency; mq->policy.tick = mq_tick; @@ -1099,7 +1206,9 @@ static struct dm_cache_policy *mq_create(dm_cblock_t cache_size, mq->find_free_last_word = 0; queue_init(&mq->pre_cache); - queue_init(&mq->cache); + queue_init(&mq->cache_clean); + queue_init(&mq->cache_dirty); + mq->generation_period = max((unsigned) from_cblock(cache_size), 1024U); mq->nr_entries = 2 * from_cblock(cache_size); -- cgit 
v1.2.3 From 2ee57d587357f0d752af6c2e3e46434a74b1bee3 Mon Sep 17 00:00:00 2001 From: Joe Thornber Date: Thu, 24 Oct 2013 14:10:29 -0400 Subject: dm cache: add passthrough mode "Passthrough" is a dm-cache operating mode (like writethrough or writeback) which is intended to be used when the cache contents are not known to be coherent with the origin device. It behaves as follows: * All reads are served from the origin device (all reads miss the cache) * All writes are forwarded to the origin device; additionally, write hits cause cache block invalidates This mode decouples cache coherency checks from cache device creation, largely to avoid having to perform coherency checks while booting. Boot scripts can create cache devices in passthrough mode and put them into service (mount cached filesystems, for example) without having to worry about coherency. Coherency that exists is maintained, although the cache will gradually cool as writes take place. Later, applications can perform coherency checks, the nature of which will depend on the type of the underlying storage. If coherency can be verified, the cache device can be transitioned to writethrough or writeback mode while still warm; otherwise, the cache contents can be discarded prior to transitioning to the desired operating mode. Signed-off-by: Joe Thornber Signed-off-by: Heinz Mauelshagen Signed-off-by: Morgan Mears Signed-off-by: Mike Snitzer --- Documentation/device-mapper/cache.txt | 19 +++- drivers/md/dm-cache-metadata.c | 5 + drivers/md/dm-cache-metadata.h | 5 + drivers/md/dm-cache-target.c | 209 ++++++++++++++++++++++++++++------ 4 files changed, 200 insertions(+), 38 deletions(-) (limited to 'Documentation') diff --git a/Documentation/device-mapper/cache.txt b/Documentation/device-mapper/cache.txt index 33d45ee0b737..ff6639f72536 100644 --- a/Documentation/device-mapper/cache.txt +++ b/Documentation/device-mapper/cache.txt @@ -68,10 +68,11 @@ So large block sizes are bad because they waste cache space. And small block sizes are bad because they increase the amount of metadata (both in core and on disk). -Writeback/writethrough ---------------------- +Cache operating modes +--------------------- -The cache has two modes, writeback and writethrough. +The cache has three operating modes: writeback, writethrough and +passthrough. If writeback, the default, is selected then a write to a block that is cached will go only to the cache and the block will be marked dirty in @@ -81,6 +82,18 @@ If writethrough is selected then a write to a cached block will not complete until it has hit both the origin and cache devices. Clean blocks should remain clean. +If passthrough is selected, useful when the cache contents are not known +to be coherent with the origin device, then all reads are served from +the origin device (all reads miss the cache) and all writes are +forwarded to the origin device; additionally, write hits cause cache +block invalidates. Passthrough mode allows a cache device to be +activated without having to worry about coherency. Coherency that +exists is maintained, although the cache will gradually cool as writes +take place. If the coherency of the cache can later be verified, or +established, the cache device can be transitioned to writethrough or +writeback mode while still warm. Otherwise, the cache contents can be +discarded prior to transitioning to the desired operating mode. + A simple cleaner policy is provided, which will clean (write back) all dirty blocks in a cache. Useful for decommissioning a cache.
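As a usage sketch of the passthrough workflow described above (device names and the table line are hypothetical, modeled on the example table later in this log), a boot script might activate the cache in passthrough mode and, once coherency has been verified, reload the table in writeback mode while the cache is still warm:

    dmsetup create my_cache --table '0 41943040 cache /dev/mapper/metadata \
        /dev/mapper/ssd /dev/mapper/origin 512 1 passthrough default 0'

    # ... application-specific coherency checks run here ...

    dmsetup reload my_cache --table '0 41943040 cache /dev/mapper/metadata \
        /dev/mapper/ssd /dev/mapper/origin 512 1 writeback default 0'
    dmsetup suspend my_cache
    dmsetup resume my_cache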
diff --git a/drivers/md/dm-cache-metadata.c b/drivers/md/dm-cache-metadata.c index 062b83ed3e84..8601425436cd 100644 --- a/drivers/md/dm-cache-metadata.c +++ b/drivers/md/dm-cache-metadata.c @@ -1249,3 +1249,8 @@ int dm_cache_save_hint(struct dm_cache_metadata *cmd, dm_cblock_t cblock, return r; } + +int dm_cache_metadata_all_clean(struct dm_cache_metadata *cmd, bool *result) +{ + return blocks_are_unmapped_or_clean(cmd, 0, cmd->cache_blocks, result); +} diff --git a/drivers/md/dm-cache-metadata.h b/drivers/md/dm-cache-metadata.h index f45cef21f3d0..cd906f14f98d 100644 --- a/drivers/md/dm-cache-metadata.h +++ b/drivers/md/dm-cache-metadata.h @@ -137,6 +137,11 @@ int dm_cache_begin_hints(struct dm_cache_metadata *cmd, struct dm_cache_policy * int dm_cache_save_hint(struct dm_cache_metadata *cmd, dm_cblock_t cblock, uint32_t hint); +/* + * Query method. Are all the blocks in the cache clean? + */ +int dm_cache_metadata_all_clean(struct dm_cache_metadata *cmd, bool *result); + /*----------------------------------------------------------------*/ #endif /* DM_CACHE_METADATA_H */ diff --git a/drivers/md/dm-cache-target.c b/drivers/md/dm-cache-target.c index 183dfc9db297..8c0217753cc5 100644 --- a/drivers/md/dm-cache-target.c +++ b/drivers/md/dm-cache-target.c @@ -104,14 +104,37 @@ static void dm_unhook_bio(struct dm_hook_info *h, struct bio *bio) /* * FIXME: the cache is read/write for the time being. */ -enum cache_mode { +enum cache_metadata_mode { CM_WRITE, /* metadata may be changed */ CM_READ_ONLY, /* metadata may not be changed */ }; +enum cache_io_mode { + /* + * Data is written to cached blocks only. These blocks are marked + * dirty. If you lose the cache device you will lose data. + * Potential performance increase for both reads and writes. + */ + CM_IO_WRITEBACK, + + /* + * Data is written to both cache and origin. Blocks are never + * dirty. Potential performance benefit for reads only. + */ + CM_IO_WRITETHROUGH, + + /* + * A degraded mode useful for various cache coherency situations + * (eg, rolling back snapshots). Reads and writes always go to the + * origin. If a write goes to a cached oblock, then the cache + * block is invalidated. + */ + CM_IO_PASSTHROUGH +}; + struct cache_features { - enum cache_mode mode; - bool write_through:1; + enum cache_metadata_mode mode; + enum cache_io_mode io_mode; }; struct cache_stats { @@ -565,9 +588,24 @@ static void save_stats(struct cache *cache) #define PB_DATA_SIZE_WB (offsetof(struct per_bio_data, cache)) #define PB_DATA_SIZE_WT (sizeof(struct per_bio_data)) +static bool writethrough_mode(struct cache_features *f) +{ + return f->io_mode == CM_IO_WRITETHROUGH; +} + +static bool writeback_mode(struct cache_features *f) +{ + return f->io_mode == CM_IO_WRITEBACK; +} + +static bool passthrough_mode(struct cache_features *f) +{ + return f->io_mode == CM_IO_PASSTHROUGH; +} + static size_t get_per_bio_data_size(struct cache *cache) { - return cache->features.write_through ? PB_DATA_SIZE_WT : PB_DATA_SIZE_WB; + return writethrough_mode(&cache->features) ? PB_DATA_SIZE_WT : PB_DATA_SIZE_WB; } static struct per_bio_data *get_per_bio_data(struct bio *bio, size_t data_size) @@ -1135,6 +1173,32 @@ static void demote_then_promote(struct cache *cache, struct prealloc *structs, quiesce_migration(mg); } +/* + * Invalidate a cache entry. No writeback occurs; any changes in the cache + * block are thrown away.
+ */ +static void invalidate(struct cache *cache, struct prealloc *structs, + dm_oblock_t oblock, dm_cblock_t cblock, + struct dm_bio_prison_cell *cell) +{ + struct dm_cache_migration *mg = prealloc_get_migration(structs); + + mg->err = false; + mg->writeback = false; + mg->demote = true; + mg->promote = false; + mg->requeue_holder = true; + mg->cache = cache; + mg->old_oblock = oblock; + mg->cblock = cblock; + mg->old_ocell = cell; + mg->new_ocell = NULL; + mg->start_jiffies = jiffies; + + inc_nr_migrations(cache); + quiesce_migration(mg); +} + /*---------------------------------------------------------------- * bio processing *--------------------------------------------------------------*/ @@ -1197,13 +1261,6 @@ static bool spare_migration_bandwidth(struct cache *cache) return current_volume < cache->migration_threshold; } -static bool is_writethrough_io(struct cache *cache, struct bio *bio, - dm_cblock_t cblock) -{ - return bio_data_dir(bio) == WRITE && - cache->features.write_through && !is_dirty(cache, cblock); -} - static void inc_hit_counter(struct cache *cache, struct bio *bio) { atomic_inc(bio_data_dir(bio) == READ ? @@ -1216,6 +1273,15 @@ static void inc_miss_counter(struct cache *cache, struct bio *bio) &cache->stats.read_miss : &cache->stats.write_miss); } +static void issue_cache_bio(struct cache *cache, struct bio *bio, + struct per_bio_data *pb, + dm_oblock_t oblock, dm_cblock_t cblock) +{ + pb->all_io_entry = dm_deferred_entry_inc(cache->all_io_ds); + remap_to_cache_dirty(cache, bio, oblock, cblock); + issue(cache, bio); +} + static void process_bio(struct cache *cache, struct prealloc *structs, struct bio *bio) { @@ -1227,7 +1293,8 @@ static void process_bio(struct cache *cache, struct prealloc *structs, size_t pb_data_size = get_per_bio_data_size(cache); struct per_bio_data *pb = get_per_bio_data(bio, pb_data_size); bool discarded_block = is_discarded_oblock(cache, block); - bool can_migrate = discarded_block || spare_migration_bandwidth(cache); + bool passthrough = passthrough_mode(&cache->features); + bool can_migrate = !passthrough && (discarded_block || spare_migration_bandwidth(cache)); /* * Check to see if that block is currently migrating. @@ -1248,15 +1315,39 @@ static void process_bio(struct cache *cache, struct prealloc *structs, switch (lookup_result.op) { case POLICY_HIT: - inc_hit_counter(cache, bio); - pb->all_io_entry = dm_deferred_entry_inc(cache->all_io_ds); + if (passthrough) { + inc_miss_counter(cache, bio); - if (is_writethrough_io(cache, bio, lookup_result.cblock)) - remap_to_origin_then_cache(cache, bio, block, lookup_result.cblock); - else - remap_to_cache_dirty(cache, bio, block, lookup_result.cblock); + /* + * Passthrough always maps to the origin, + * invalidating any cache blocks that are written + * to. 
+ */ + + if (bio_data_dir(bio) == WRITE) { + atomic_inc(&cache->stats.demotion); + invalidate(cache, structs, block, lookup_result.cblock, new_ocell); + release_cell = false; + + } else { + /* FIXME: factor out issue_origin() */ + pb->all_io_entry = dm_deferred_entry_inc(cache->all_io_ds); + remap_to_origin_clear_discard(cache, bio, block); + issue(cache, bio); + } + } else { + inc_hit_counter(cache, bio); + + if (bio_data_dir(bio) == WRITE && + writethrough_mode(&cache->features) && + !is_dirty(cache, lookup_result.cblock)) { + pb->all_io_entry = dm_deferred_entry_inc(cache->all_io_ds); + remap_to_origin_then_cache(cache, bio, block, lookup_result.cblock); + issue(cache, bio); + } else + issue_cache_bio(cache, bio, pb, block, lookup_result.cblock); + } - issue(cache, bio); break; case POLICY_MISS: @@ -1807,7 +1898,7 @@ static int parse_block_size(struct cache_args *ca, struct dm_arg_set *as, static void init_features(struct cache_features *cf) { cf->mode = CM_WRITE; - cf->write_through = false; + cf->io_mode = CM_IO_WRITEBACK; } static int parse_features(struct cache_args *ca, struct dm_arg_set *as, @@ -1832,10 +1923,13 @@ static int parse_features(struct cache_args *ca, struct dm_arg_set *as, arg = dm_shift_arg(as); if (!strcasecmp(arg, "writeback")) - cf->write_through = false; + cf->io_mode = CM_IO_WRITEBACK; else if (!strcasecmp(arg, "writethrough")) - cf->write_through = true; + cf->io_mode = CM_IO_WRITETHROUGH; + + else if (!strcasecmp(arg, "passthrough")) + cf->io_mode = CM_IO_PASSTHROUGH; else { *error = "Unrecognised cache feature requested"; @@ -2088,6 +2182,22 @@ static int cache_create(struct cache_args *ca, struct cache **result) } cache->cmd = cmd; + if (passthrough_mode(&cache->features)) { + bool all_clean; + + r = dm_cache_metadata_all_clean(cache->cmd, &all_clean); + if (r) { + *error = "dm_cache_metadata_all_clean() failed"; + goto bad; + } + + if (!all_clean) { + *error = "Cannot enter passthrough mode unless all blocks are clean"; + r = -EINVAL; + goto bad; + } + } + spin_lock_init(&cache->lock); bio_list_init(&cache->deferred_bios); bio_list_init(&cache->deferred_flush_bios); @@ -2303,17 +2413,37 @@ static int cache_map(struct dm_target *ti, struct bio *bio) return DM_MAPIO_SUBMITTED; } + r = DM_MAPIO_REMAPPED; switch (lookup_result.op) { case POLICY_HIT: - inc_hit_counter(cache, bio); - pb->all_io_entry = dm_deferred_entry_inc(cache->all_io_ds); + if (passthrough_mode(&cache->features)) { + if (bio_data_dir(bio) == WRITE) { + /* + * We need to invalidate this block, so + * defer for the worker thread. 
+ */ + cell_defer(cache, cell, true); + r = DM_MAPIO_SUBMITTED; + + } else { + pb->all_io_entry = dm_deferred_entry_inc(cache->all_io_ds); + inc_miss_counter(cache, bio); + remap_to_origin_clear_discard(cache, bio, block); + + cell_defer(cache, cell, false); + } - if (is_writethrough_io(cache, bio, lookup_result.cblock)) - remap_to_origin_then_cache(cache, bio, block, lookup_result.cblock); - else - remap_to_cache_dirty(cache, bio, block, lookup_result.cblock); + } else { + inc_hit_counter(cache, bio); + + if (bio_data_dir(bio) == WRITE && writethrough_mode(&cache->features) && + !is_dirty(cache, lookup_result.cblock)) + remap_to_origin_then_cache(cache, bio, block, lookup_result.cblock); + else + remap_to_cache_dirty(cache, bio, block, lookup_result.cblock); - cell_defer(cache, cell, false); + cell_defer(cache, cell, false); + } break; case POLICY_MISS: @@ -2338,10 +2468,10 @@ static int cache_map(struct dm_target *ti, struct bio *bio) DMERR_LIMIT("%s: erroring bio: unknown policy op: %u", __func__, (unsigned) lookup_result.op); bio_io_error(bio); - return DM_MAPIO_SUBMITTED; + r = DM_MAPIO_SUBMITTED; } - return DM_MAPIO_REMAPPED; + return r; } static int cache_end_io(struct dm_target *ti, struct bio *bio, int error) @@ -2659,10 +2789,19 @@ static void cache_status(struct dm_target *ti, status_type_t type, (unsigned long long) from_cblock(residency), cache->nr_dirty); - if (cache->features.write_through) + if (writethrough_mode(&cache->features)) DMEMIT("1 writethrough "); - else - DMEMIT("0 "); + + else if (passthrough_mode(&cache->features)) + DMEMIT("1 passthrough "); + + else if (writeback_mode(&cache->features)) + DMEMIT("1 writeback "); + + else { + DMERR("internal error: unknown io mode: %d", (int) cache->features.io_mode); + goto err; + } DMEMIT("2 migration_threshold %llu ", (unsigned long long) cache->migration_threshold); if (sz < maxlen) { @@ -2771,7 +2910,7 @@ static void cache_io_hints(struct dm_target *ti, struct queue_limits *limits) static struct target_type cache_target = { .name = "cache", - .version = {1, 1, 1}, + .version = {1, 2, 0}, .module = THIS_MODULE, .ctr = cache_ctr, .dtr = cache_dtr, -- cgit v1.2.3 From 65790ff919e2e07ccb4457415c11075b245d643b Mon Sep 17 00:00:00 2001 From: Joe Thornber Date: Fri, 8 Nov 2013 16:39:50 +0000 Subject: dm cache: add cache block invalidation support Cache block invalidation is removing an entry from the cache without writing it back. Cache blocks can be invalidated via the 'invalidate_cblocks' message, which takes an arbitrary number of cblock ranges: invalidate_cblocks [<cblock>|<cblock begin>-<cblock end>]* E.g. dmsetup message my_cache 0 invalidate_cblocks 2345 3456-4567 5678-6789 Signed-off-by: Joe Thornber Signed-off-by: Mike Snitzer --- Documentation/device-mapper/cache.txt | 12 +- drivers/md/dm-cache-target.c | 225 +++++++++++++++++++++++++++++++++- 2 files changed, 233 insertions(+), 4 deletions(-) (limited to 'Documentation') diff --git a/Documentation/device-mapper/cache.txt b/Documentation/device-mapper/cache.txt index ff6639f72536..fc9d2dfb9415 100644 --- a/Documentation/device-mapper/cache.txt +++ b/Documentation/device-mapper/cache.txt @@ -244,12 +244,22 @@ The message format is: E.g. dmsetup message my_cache 0 sequential_threshold 1024 + +Invalidation is removing an entry from the cache without writing it +back. Cache blocks can be invalidated via the invalidate_cblocks +message, which takes an arbitrary number of cblock ranges. + + invalidate_cblocks [<cblock>|<cblock begin>-<cblock end>]* + +E.g.
+ dmsetup message my_cache 0 invalidate_cblocks 2345 3456-4567 5678-6789 + Examples ======== The test suite can be found here: -https://github.com/jthornber/thinp-test-suite +https://github.com/jthornber/device-mapper-test-suite dmsetup create my_cache --table '0 41943040 cache /dev/mapper/metadata \ /dev/mapper/ssd /dev/mapper/origin 512 1 writeback default 0' diff --git a/drivers/md/dm-cache-target.c b/drivers/md/dm-cache-target.c index 8c0217753cc5..41e664b474f1 100644 --- a/drivers/md/dm-cache-target.c +++ b/drivers/md/dm-cache-target.c @@ -150,6 +150,25 @@ struct cache_stats { atomic_t discard_count; }; +/* + * Defines a range of cblocks, begin to (end - 1) are in the range. end is + * the one-past-the-end value. + */ +struct cblock_range { + dm_cblock_t begin; + dm_cblock_t end; +}; + +struct invalidation_request { + struct list_head list; + struct cblock_range *cblocks; + + atomic_t complete; + int err; + + wait_queue_head_t result_wait; +}; + struct cache { struct dm_target *ti; struct dm_target_callbacks callbacks; @@ -241,6 +260,7 @@ struct cache { bool need_tick_bio:1; bool sized:1; + bool invalidate:1; bool commit_requested:1; bool loaded_mappings:1; bool loaded_discards:1; @@ -251,6 +271,12 @@ struct cache { struct cache_features features; struct cache_stats stats; + + /* + * Invalidation fields. + */ + spinlock_t invalidation_lock; + struct list_head invalidation_requests; }; struct per_bio_data { @@ -283,6 +309,7 @@ struct dm_cache_migration { bool demote:1; bool promote:1; bool requeue_holder:1; + bool invalidate:1; struct dm_bio_prison_cell *old_ocell; struct dm_bio_prison_cell *new_ocell; @@ -904,8 +931,11 @@ static void migration_success_post_commit(struct dm_cache_migration *mg) list_add_tail(&mg->list, &cache->quiesced_migrations); spin_unlock_irqrestore(&cache->lock, flags); - } else + } else { + if (mg->invalidate) + policy_remove_mapping(cache->policy, mg->old_oblock); cleanup_migration(mg); + } } else { if (mg->requeue_holder) @@ -1115,6 +1145,7 @@ static void promote(struct cache *cache, struct prealloc *structs, mg->demote = false; mg->promote = true; mg->requeue_holder = true; + mg->invalidate = false; mg->cache = cache; mg->new_oblock = oblock; mg->cblock = cblock; @@ -1137,6 +1168,7 @@ static void writeback(struct cache *cache, struct prealloc *structs, mg->demote = false; mg->promote = false; mg->requeue_holder = true; + mg->invalidate = false; mg->cache = cache; mg->old_oblock = oblock; mg->cblock = cblock; @@ -1161,6 +1193,7 @@ static void demote_then_promote(struct cache *cache, struct prealloc *structs, mg->demote = true; mg->promote = true; mg->requeue_holder = true; + mg->invalidate = false; mg->cache = cache; mg->old_oblock = old_oblock; mg->new_oblock = new_oblock; @@ -1188,6 +1221,7 @@ static void invalidate(struct cache *cache, struct prealloc *structs, mg->demote = true; mg->promote = false; mg->requeue_holder = true; + mg->invalidate = true; mg->cache = cache; mg->old_oblock = oblock; mg->cblock = cblock; @@ -1524,6 +1558,58 @@ static void writeback_some_dirty_blocks(struct cache *cache) prealloc_free_structs(cache, &structs); } +/*---------------------------------------------------------------- + * Invalidations. + * Dropping something from the cache *without* writing back. 
+ *--------------------------------------------------------------*/ + +static void process_invalidation_request(struct cache *cache, struct invalidation_request *req) +{ + int r = 0; + uint64_t begin = from_cblock(req->cblocks->begin); + uint64_t end = from_cblock(req->cblocks->end); + + while (begin != end) { + r = policy_remove_cblock(cache->policy, to_cblock(begin)); + if (!r) { + r = dm_cache_remove_mapping(cache->cmd, to_cblock(begin)); + if (r) + break; + + } else if (r == -ENODATA) { + /* harmless, already unmapped */ + r = 0; + + } else { + DMERR("policy_remove_cblock failed"); + break; + } + + begin++; + } + + cache->commit_requested = true; + + req->err = r; + atomic_set(&req->complete, 1); + + wake_up(&req->result_wait); +} + +static void process_invalidation_requests(struct cache *cache) +{ + struct list_head list; + struct invalidation_request *req, *tmp; + + INIT_LIST_HEAD(&list); + spin_lock(&cache->invalidation_lock); + list_splice_init(&cache->invalidation_requests, &list); + spin_unlock(&cache->invalidation_lock); + + list_for_each_entry_safe (req, tmp, &list, list) + process_invalidation_request(cache, req); +} /*---------------------------------------------------------------- * Main worker loop *--------------------------------------------------------------*/ @@ -1593,7 +1679,8 @@ static int more_work(struct cache *cache) !bio_list_empty(&cache->deferred_writethrough_bios) || !list_empty(&cache->quiesced_migrations) || !list_empty(&cache->completed_migrations) || - !list_empty(&cache->need_commit_migrations); + !list_empty(&cache->need_commit_migrations) || + cache->invalidate; } static void do_worker(struct work_struct *ws) @@ -1605,6 +1692,7 @@ static void do_worker(struct work_struct *ws) writeback_some_dirty_blocks(cache); process_deferred_writethrough_bios(cache); process_deferred_bios(cache); + process_invalidation_requests(cache); } process_migrations(cache, &cache->quiesced_migrations, issue_copy); @@ -2271,6 +2359,7 @@ static int cache_create(struct cache_args *ca, struct cache **result) cache->need_tick_bio = true; cache->sized = false; + cache->invalidate = false; cache->commit_requested = false; cache->loaded_mappings = false; cache->loaded_discards = false; @@ -2284,6 +2373,9 @@ static int cache_create(struct cache_args *ca, struct cache **result) atomic_set(&cache->stats.commit_count, 0); atomic_set(&cache->stats.discard_count, 0); + spin_lock_init(&cache->invalidation_lock); + INIT_LIST_HEAD(&cache->invalidation_requests); + *result = cache; return 0; @@ -2833,7 +2925,128 @@ err: } /* - * Supports <key> <value>. + * A cache block range can take two forms: + * + * i) A single cblock, eg. '3456' + * ii) A begin and end cblock with a dash between, eg. 123-234 + */ +static int parse_cblock_range(struct cache *cache, const char *str, + struct cblock_range *result) +{ + char dummy; + uint64_t b, e; + int r; + + /* + * Try and parse form (ii) first. + */ + r = sscanf(str, "%llu-%llu%c", &b, &e, &dummy); + if (r < 0) + return r; + + if (r == 2) { + result->begin = to_cblock(b); + result->end = to_cblock(e); + return 0; + } + + /* + * That didn't work, try form (i).
+ */ + r = sscanf(str, "%llu%c", &b, &dummy); + if (r < 0) + return r; + + if (r == 1) { + result->begin = to_cblock(b); + result->end = to_cblock(from_cblock(result->begin) + 1u); + return 0; + } + + DMERR("invalid cblock range '%s'", str); + return -EINVAL; +} + +static int validate_cblock_range(struct cache *cache, struct cblock_range *range) +{ + uint64_t b = from_cblock(range->begin); + uint64_t e = from_cblock(range->end); + uint64_t n = from_cblock(cache->cache_size); + + if (b >= n) { + DMERR("begin cblock out of range: %llu >= %llu", b, n); + return -EINVAL; + } + + if (e > n) { + DMERR("end cblock out of range: %llu > %llu", e, n); + return -EINVAL; + } + + if (b >= e) { + DMERR("invalid cblock range: %llu >= %llu", b, e); + return -EINVAL; + } + + return 0; +} + +static int request_invalidation(struct cache *cache, struct cblock_range *range) +{ + struct invalidation_request req; + + INIT_LIST_HEAD(&req.list); + req.cblocks = range; + atomic_set(&req.complete, 0); + req.err = 0; + init_waitqueue_head(&req.result_wait); + + spin_lock(&cache->invalidation_lock); + list_add(&req.list, &cache->invalidation_requests); + spin_unlock(&cache->invalidation_lock); + wake_worker(cache); + + wait_event(req.result_wait, atomic_read(&req.complete)); + return req.err; +} + +static int process_invalidate_cblocks_message(struct cache *cache, unsigned count, + const char **cblock_ranges) +{ + int r = 0; + unsigned i; + struct cblock_range range; + + if (!passthrough_mode(&cache->features)) { + DMERR("cache has to be in passthrough mode for invalidation"); + return -EPERM; + } + + for (i = 0; i < count; i++) { + r = parse_cblock_range(cache, cblock_ranges[i], &range); + if (r) + break; + + r = validate_cblock_range(cache, &range); + if (r) + break; + + /* + * Pass begin and end origin blocks to the worker and wake it. + */ + r = request_invalidation(cache, &range); + if (r) + break; + } + + return r; +} + +/* + * Supports + * "<key> <value>" + * and + * "invalidate_cblocks [(<begin>)|(<begin>-<end>)]* * * The key migration_threshold is supported by the cache target core. */ @@ -2841,6 +3054,12 @@ static int cache_message(struct dm_target *ti, unsigned argc, char **argv) { struct cache *cache = ti->private; + if (!argc) + return -EINVAL; + + if (!strcmp(argv[0], "invalidate_cblocks")) + return process_invalidate_cblocks_message(cache, argc - 1, (const char **) argv + 1); + if (argc != 2) return -EINVAL; -- cgit v1.2.3 From 7b6b2bc98c0303b7f043ad5b35906f833e56308d Mon Sep 17 00:00:00 2001 From: Mike Snitzer Date: Tue, 12 Nov 2013 12:17:43 -0500 Subject: dm cache: resolve small nits and improve Documentation Document passthrough mode, cache shrinking, and cache invalidation. Also, use strcasecmp() and hlist_unhashed().
Reported-by: Alasdair G Kergon Signed-off-by: Mike Snitzer --- Documentation/device-mapper/cache.txt | 42 ++++++++++++++++++++++++++--------- drivers/md/dm-cache-policy-mq.c | 2 +- drivers/md/dm-cache-target.c | 2 +- 3 files changed, 34 insertions(+), 12 deletions(-) (limited to 'Documentation') diff --git a/Documentation/device-mapper/cache.txt b/Documentation/device-mapper/cache.txt index fc9d2dfb9415..274752f8bdf9 100644 --- a/Documentation/device-mapper/cache.txt +++ b/Documentation/device-mapper/cache.txt @@ -86,16 +86,27 @@ If passthrough is selected, useful when the cache contents are not known to be coherent with the origin device, then all reads are served from the origin device (all reads miss the cache) and all writes are forwarded to the origin device; additionally, write hits cause cache -block invalidates. Passthrough mode allows a cache device to be -activated without having to worry about coherency. Coherency that -exists is maintained, although the cache will gradually cool as writes -take place. If the coherency of -the cache can later be verified, or -established, the cache device can be transitioned to writethrough or -writeback mode while still warm. Otherwise, the cache contents can be -discarded prior to transitioning to the desired -operating mode. +block invalidates. To enable passthrough mode the cache must be clean. +Passthrough mode allows a cache device to be activated without having to +worry about coherency. Coherency that exists is maintained, although +the cache will gradually cool as writes take place. If the coherency of +the cache can later be verified, or established through use of the +"invalidate_cblocks" message, the cache device can be transitioned to +writethrough or writeback mode while still warm. Otherwise, the cache +contents can be discarded prior to transitioning to the desired +operating mode. A simple cleaner policy is provided, which will clean (write back) all -dirty blocks in a cache. Useful for decommissioning a cache. +dirty blocks in a cache. Useful for decommissioning a cache or when +shrinking a cache. Shrinking the cache's fast device requires all cache +blocks, in the area of the cache being removed, to be clean. If the +area being removed from the cache still contains dirty blocks the resize +will fail. Care must be taken to never reduce the volume used for the +cache's fast device until the cache is clean. This is of particular +importance if writeback mode is used. Writethrough and passthrough +modes already maintain a clean cache. Future support to partially clean +the cache, above a specified threshold, will allow for keeping the cache +warm and in writeback mode during resize. Migration throttling -------------------- @@ -174,7 +185,7 @@ Constructor block size : cache unit size in sectors #feature args : number of feature arguments passed - feature args : writethrough. (The default is writeback.) + feature args : writethrough or passthrough (The default is writeback.) policy : the replacement policy to use #policy args : an even number of arguments corresponding to @@ -190,6 +201,13 @@ Optional feature arguments are: back cache block contents later for performance reasons, so they may differ from the corresponding origin blocks. + passthrough : a degraded mode useful for various cache coherency + situations (e.g., rolling back snapshots of + underlying storage). Reads and writes always go to + the origin. If a write goes to a cached origin + block, then the cache block is invalidated.
+ To enable passthrough mode the cache must be clean. + A policy called 'default' is always registered. This is an alias for the policy we currently think is giving best all round performance. @@ -247,7 +265,11 @@ E.g. Invalidation is removing an entry from the cache without writing it back. Cache blocks can be invalidated via the invalidate_cblocks -message, which takes an arbitrary number of cblock ranges. +message, which takes an arbitrary number of cblock ranges. Each cblock +must be expressed as a decimal value; in the future a variant message +that takes cblock ranges expressed in hexadecimal may be needed to +better support efficient invalidation of larger caches. The cache must +be in passthrough mode when invalidate_cblocks is used. invalidate_cblocks [<cblock>|<cblock begin>-<cblock end>]* diff --git a/drivers/md/dm-cache-policy-mq.c b/drivers/md/dm-cache-policy-mq.c index 7209fab8b8ed..416b7b752a6e 100644 --- a/drivers/md/dm-cache-policy-mq.c +++ b/drivers/md/dm-cache-policy-mq.c @@ -310,7 +310,7 @@ static void free_entry(struct entry_pool *ep, struct entry *e) static struct entry *epool_find(struct entry_pool *ep, dm_cblock_t cblock) { struct entry *e = ep->entries + from_cblock(cblock); - return e->hlist.pprev ? e : NULL; + return !hlist_unhashed(&e->hlist) ? e : NULL; } static bool epool_empty(struct entry_pool *ep) diff --git a/drivers/md/dm-cache-target.c b/drivers/md/dm-cache-target.c index 41e664b474f1..9efcf1059b99 100644 --- a/drivers/md/dm-cache-target.c +++ b/drivers/md/dm-cache-target.c @@ -3057,7 +3057,7 @@ static int cache_message(struct dm_target *ti, unsigned argc, char **argv) if (!argc) return -EINVAL; - if (!strcmp(argv[0], "invalidate_cblocks")) + if (!strcasecmp(argv[0], "invalidate_cblocks")) return process_invalidate_cblocks_message(cache, argc - 1, (const char **) argv + 1); if (argc != 2) -- cgit v1.2.3
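To close the series, here is a small stand-alone C sketch of the cblock range grammar accepted by the invalidate_cblocks message, mirroring the parse_cblock_range() code quoted above (the user-space types and the main() driver are hypothetical). Note that, as the quoted code implements it, end is one-past-the-end, so a range such as 3456-4567 invalidates cblocks 3456 through 4566.

    #include <stdio.h>

    /* Mirrors struct cblock_range above: begin .. (end - 1) are in the range. */
    struct cblock_range {
            unsigned long long begin;
            unsigned long long end;
    };

    /* Accepts "123-234" (form ii) or a single cblock "3456" (form i). */
    static int parse_cblock_range(const char *str, struct cblock_range *r)
    {
            unsigned long long b, e;
            char dummy;

            if (sscanf(str, "%llu-%llu%c", &b, &e, &dummy) == 2) {
                    r->begin = b;
                    r->end = e;
                    return 0;
            }
            if (sscanf(str, "%llu%c", &b, &dummy) == 1) {
                    r->begin = b;
                    r->end = b + 1;
                    return 0;
            }
            return -1;      /* trailing garbage or no match */
    }

    int main(void)
    {
            struct cblock_range r;

            if (!parse_cblock_range("3456-4567", &r))
                    printf("invalidate cblocks %llu..%llu\n", r.begin, r.end - 1);
            return 0;
    }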