diff options
author | Ming Lei <ming.lei@redhat.com> | 2022-06-23 16:20:05 +0300 |
---|---|---|
committer | Mike Snitzer <snitzer@kernel.org> | 2022-06-23 21:33:13 +0300 |
commit | 61b6e2e5321da281ab3c0c04e1962b3d000f6248 (patch) | |
tree | 0e3d4eb1c7d51ca0b95232075b68b2ddb4b13760 | |
parent | 78ccef91234ba331c04d71f3ecb1377451d21056 (diff) | |
download | linux-61b6e2e5321da281ab3c0c04e1962b3d000f6248.tar.xz |
dm: fix BLK_STS_DM_REQUEUE handling when dm_io represents split bio
Commit 7dd76d1feec7 ("dm: improve bio splitting and associated IO
accounting") removed using cloned bio when dm io splitting is needed.
Using bio_trim()+bio_inc_remaining() rather than bio_split()+bio_chain()
causes multiple dm_io instances to share the same original bio, and it
works fine if IOs are completed successfully.
But a regression was caused for the case when BLK_STS_DM_REQUEUE is
returned from any one of DM's cloned bios (whose dm_io share the same
orig_bio). In this BLK_STS_DM_REQUEUE case only the mapped subset of
the original bio for the current exact dm_io needs to be re-submitted.
However, since the original bio is shared among all dm_io instances,
the ->orig_bio actually only represents the last dm_io instance, so
requeue can't work as expected. Also when more than one dm_io is
requeued, the same original bio is requeued from all dm_io's
completion handler, then race is caused.
Fix this issue by still allocating one clone bio for completing io
only, then io accounting can rely on ->orig_bio being unmodified. This
is needed because the dm_io's sector_offset and sectors members are
recorded relative to an unmodified ->orig_bio.
In the future, we can go back to using bio_trim()+bio_inc_remaining()
for dm's io splitting but then delay needing a bio clone only when
handling BLK_STS_DM_REQUEUE, but that approach is a bit complicated
(so it needs a development cycle):
1) bio clone needs to be done in task context
2) a block interface for unwinding bio is required
Fixes: 7dd76d1feec7 ("dm: improve bio splitting and associated IO accounting")
Reported-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
-rw-r--r-- | drivers/md/dm-core.h | 1 | ||||
-rw-r--r-- | drivers/md/dm.c | 11 |
2 files changed, 8 insertions, 4 deletions
diff --git a/drivers/md/dm-core.h b/drivers/md/dm-core.h index 54c0473a51dd..c954ff91870e 100644 --- a/drivers/md/dm-core.h +++ b/drivers/md/dm-core.h @@ -272,6 +272,7 @@ struct dm_io { atomic_t io_count; struct mapped_device *md; + struct bio *split_bio; /* The three fields represent mapped part of original bio */ struct bio *orig_bio; unsigned int sector_offset; /* offset to end of orig_bio */ diff --git a/drivers/md/dm.c b/drivers/md/dm.c index 9ede55278eec..2b75f1ef7386 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -594,6 +594,7 @@ static struct dm_io *alloc_io(struct mapped_device *md, struct bio *bio) atomic_set(&io->io_count, 2); this_cpu_inc(*md->pending_io); io->orig_bio = bio; + io->split_bio = NULL; io->md = md; spin_lock_init(&io->lock); io->start_time = jiffies; @@ -887,7 +888,7 @@ static void dm_io_complete(struct dm_io *io) { blk_status_t io_error; struct mapped_device *md = io->md; - struct bio *bio = io->orig_bio; + struct bio *bio = io->split_bio ? io->split_bio : io->orig_bio; if (io->status == BLK_STS_DM_REQUEUE) { unsigned long flags; @@ -1693,9 +1694,11 @@ static void dm_split_and_process_bio(struct mapped_device *md, * Remainder must be passed to submit_bio_noacct() so it gets handled * *after* bios already submitted have been completely processed. */ - bio_trim(bio, io->sectors, ci.sector_count); - trace_block_split(bio, bio->bi_iter.bi_sector); - bio_inc_remaining(bio); + WARN_ON_ONCE(!dm_io_flagged(io, DM_IO_WAS_SPLIT)); + io->split_bio = bio_split(bio, io->sectors, GFP_NOIO, + &md->queue->bio_split); + bio_chain(io->split_bio, bio); + trace_block_split(io->split_bio, bio->bi_iter.bi_sector); submit_bio_noacct(bio); out: /* |