Age | Commit message (Collapse) | Author | Files | Lines |
|
We can only pass negative error codes to bch2_err_str(); if it's a
positive integer it's not an error and we trip an assert.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
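A minimal userspace sketch of the rule described above, assuming a hypothetical example_err_str() that stands in for bch2_err_str() (the real helper has its own table of error names; strerror() is only used here for illustration). The caller formats the return value only when it is negative:

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical stand-in for bch2_err_str(): only negative codes are errors. */
static const char *example_err_str(int err)
{
	assert(err < 0);	/* a positive value is not an error */
	return strerror(-err);
}

/* Hypothetical caller: only format the return value when it is an error. */
static const char *example_describe(int ret)
{
	return ret < 0 ? example_err_str(ret) : "success";
}
```

With this shape, a positive return (for example a byte count) never reaches the assert.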
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Add a tracepoint for any time we return an error and unwind.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The new guard(), scoped_guard() allow for more natural code.
Some of the uses with creative flow control have been left.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
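The kernel's guard() and scoped_guard() are built on the compiler's cleanup attribute. The following is a userspace sketch of the same idea, not the kernel macros themselves: struct toy_lock, TOY_GUARD() and bump_if_below() are hypothetical names made up for illustration, and the code assumes GCC or Clang.

```c
#include <assert.h>

/* Toy lock, purely illustrative; the kernel helpers wrap real lock types. */
struct toy_lock { int held; };

static struct toy_lock *toy_acquire(struct toy_lock *l)
{
	assert(!l->held);
	l->held = 1;
	return l;
}

/* Cleanup handler: receives a pointer to the guard variable at scope exit. */
static void toy_release(struct toy_lock **l)
{
	(*l)->held = 0;
}

/* Hypothetical analogue of guard(): lock now, unlock when the scope ends. */
#define TOY_GUARD(l) \
	struct toy_lock *toy_guard_ __attribute__((cleanup(toy_release), unused)) = \
		toy_acquire(l)

static struct toy_lock counter_lock;
static int counter;

static int bump_if_below(int limit)
{
	TOY_GUARD(&counter_lock);

	if (counter >= limit)
		return -1;	/* early return, the lock is still dropped */
	return ++counter;
}
```

The early return needs no explicit unlock or goto label, which is the "more natural code" the conversion is after.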
|
|
Instead of emitting a message immediately when we get an error in the
read path, and then another at the end if we successfully retry - emit
one single log message before returning from bch2_rbio_retry().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
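As a rough sketch of the "one message at the end" pattern, assuming a hypothetical example_read_retry() that is handed the outcome of each attempt (0 for success, negative for error) and a caller-supplied message buffer; the message wording is made up and is not the text bcachefs prints:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical retry loop: count failures silently, report once at the end. */
static int example_read_retry(const int *results, int nr, char *msg, size_t len)
{
	int ret = 0, failures = 0;

	assert(nr > 0);
	for (int i = 0; i < nr; i++) {
		ret = results[i];
		if (!ret)
			break;
		failures++;	/* remember the failure, log nothing yet */
	}

	if (ret)
		snprintf(msg, len, "read error %d after %d attempt(s)", ret, failures);
	else if (failures)
		snprintf(msg, len, "read error, retry succeeded after %d failure(s)", failures);
	else
		msg[0] = '\0';	/* clean read: nothing to report */
	return ret;
}
```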
|
|
Add async objs list for
- promote_op
- bch_read_bio
- btree_read_bio
- btree_write_bio
This gets us introspection on in-flight async ops, and because under the
hood it uses fast_lists (percpu slot buffer on top of a radix tree),
it'll be fast enough to enable in production.
This will be very helpful for debugging "something got stuck" issues,
which have been cropping up from time to time (in the CI, especially
with folio writeback).
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Pretty printer for struct bch_read_bio.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Convert device IO refs to enumerated_refs, for easier debugging of
refcount issues.
Simple conversion: enumerate all users and convert to the new helpers.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Drop the single-purpose write ref code in bcachefs.h, and convert to
enumerated refs.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Defer memory allocations only needed in RW mode until we actually go RW.
This is part of improved support for RO images.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Hygiene, and fix build in userspace.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Now, if an extent is poisoned we can move it even if there was a
checksum error. We'll have to give it a new checksum, but the poison bit
means that userspace will still see the appropriate error when they try
to read it.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Copygc needs to be able to move extents that have bitrotted. We don't
want to delete them - in the future we'll have an API for "read me the
data even if there's checksum errors", and in general we don't want to
delete anything unless the user asks us to.
That will require writing it with a new checksum, which means we can't
forget that there was a checksum error so we return the correct error to
userspace.
Rebalance also wants to skip bad extents; we can now use the poison flag
for that.
This is currently disabled by default, as we want read fua support so
that we can distinguish between transient and permanent errors from the
device. It may be enabled with the module parameter:
poison_extents_on_checksum_error
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
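A toy model of the poison flag, to illustrate why rewriting with a new checksum does not hide the original error. Everything here is hypothetical: struct example_extent, the toy checksum, the use of -EIO, and in particular the behaviour when the option is off (the move is simply refused) are assumptions for illustration, not bcachefs's on-disk format or error codes.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct example_extent {
	unsigned char	data[4];
	unsigned	csum;
	int		poisoned;
};

/* Toy checksum, for illustration only. */
static unsigned example_csum(const unsigned char *d, size_t n)
{
	unsigned sum = 0;

	for (size_t i = 0; i < n; i++)
		sum = sum * 31 + d[i];
	return sum;
}

/* Reads fail if the extent is poisoned, even when the checksum matches. */
static int example_read(const struct example_extent *e)
{
	if (e->poisoned || example_csum(e->data, sizeof(e->data)) != e->csum)
		return -EIO;
	return 0;
}

/* Move an extent; on checksum error, keep the data but mark it poisoned. */
static int example_move(struct example_extent *dst,
			const struct example_extent *src,
			int poison_on_csum_error)
{
	int bad = example_csum(src->data, sizeof(src->data)) != src->csum;

	if (bad && !src->poisoned && !poison_on_csum_error)
		return -EIO;	/* assumed behaviour with the option off */

	*dst = *src;
	dst->poisoned = src->poisoned || bad;
	dst->csum = example_csum(dst->data, sizeof(dst->data));
	return 0;
}
```

After a successful move of a bad extent, the new checksum is valid but example_read() still returns an error because the poison flag travels with the data.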
|
|
If the extent we're reading from changes, due to being overwritten or
moved (possibly partially) - we need to reset bch_io_failures so that we
don't accidentally mark a new extent as poisoned prematurely.
This means we have to separately track (in the retry path) the extent we
previously read from.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
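The idea can be sketched as follows, with hypothetical types: struct example_extent_id stands in for whatever identifies the extent, and struct example_io_failures is a much reduced stand-in for bch_io_failures. The retry path remembers the previous extent and clears the failure counts when the current one differs:

```c
#include <assert.h>

/* Hypothetical identity of an extent; the real key carries more fields. */
struct example_extent_id {
	unsigned long long inode, offset, version;
};

/* Reduced stand-in for bch_io_failures, plus the previously read extent. */
struct example_io_failures {
	unsigned			nr_csum_errors;
	int				have_prev;
	struct example_extent_id	prev;
};

/* Call before each (re)read: drop stale failures if the extent changed. */
static void example_track_extent(struct example_io_failures *f,
				 const struct example_extent_id *cur)
{
	if (f->have_prev &&
	    (f->prev.inode   != cur->inode ||
	     f->prev.offset  != cur->offset ||
	     f->prev.version != cur->version))
		f->nr_csum_errors = 0;	/* failures belonged to the old extent */

	f->prev = *cur;
	f->have_prev = 1;
}
```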
|
|
extent poisoning is partly so that we don't keep spewing the dmesg log
when we've got unreadable data - we don't want to print these.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We may end up in the data read retry path when reading cached data and
racing with invalidation, or on checksum error when we were reading into
a userspace buffer that might have been modified while the read was in
flight.
These aren't real errors, so we shouldn't print the 'retry success'
message.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Just use the ChaCha20 and Poly1305 libraries instead of the clunky
crypto API. This is much simpler. It is also slightly faster, since
the libraries provide more direct access to the same
architecture-optimized ChaCha20 and Poly1305 code.
I've tested that existing encrypted bcachefs filesystems can continue
to be accessed with this patch applied.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This was planned to be done ages ago, now finally completed; there are
places where we have quite a few btree_trans objects on the stack, so
this reduces stack usage somewhat.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We now have separate per device io_refs for read and write access.
This fixes a device removal bug where the discard workers were still
running while we're removing alloc info for that device.
It's also a bit of hardening; we no longer allow writes to devices that
are read-only.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
There's something going on with the data move path; log the original key
being moved for debugging.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
__bch2_read, before calling __bch2_read_extent(), sets bvec_iter.bi_size
to "the size we can read from the current extent" with a swap, and
restores it to "the size for the total read" after the read_extent call
with another swap.
But we neglected to do the restore before the "if (ret) goto err;" -
which is a problem if we're retrying those errors.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
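A reduced sketch of the swap/restore pattern and the ordering that matters. struct example_iter, example_swap(), example_read_one() and the always-failing example_failing_read() are hypothetical stand-ins for bvec_iter, swap(), the loop body of __bch2_read and __bch2_read_extent():

```c
#include <assert.h>
#include <errno.h>

struct example_iter { unsigned bi_size; };

#define example_swap(a, b) \
	do { unsigned tmp_ = (a); (a) = (b); (b) = tmp_; } while (0)

/* Stub extent read that always fails, to exercise the error path. */
static int example_failing_read(struct example_iter *iter)
{
	(void) iter;
	return -EIO;
}

static int example_read_one(struct example_iter *iter, unsigned extent_bytes,
			    int (*read_extent)(struct example_iter *))
{
	unsigned bytes = extent_bytes < iter->bi_size ? extent_bytes : iter->bi_size;
	int ret;

	assert(extent_bytes > 0);

	example_swap(iter->bi_size, bytes);	/* iter covers just this extent */
	ret = read_extent(iter);
	example_swap(iter->bi_size, bytes);	/* restore BEFORE checking ret */

	if (ret)
		return ret;	/* a retry now sees the full read size */
	return 0;
}
```

If the second swap came after the error check, a failed read would leave bi_size truncated to the extent size, and a retry would read too little.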
|
|
Fix build in userspace, and good hygeine.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Next patch will be checking if the extent we're reading from matches the
IO failure we saw before marking the failure.
For this to work, __bch2_read() needs to take the same transaction
context that bch2_rbio_retry() uses to do that check.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Read flags are codepath dependent and change as they're passed around,
while the fields in rbio._state are mostly fixed properties of that
particular object.
Losing track of BCH_READ_data_update would be bad, and previously it was
not obvious if it was always correctly set in the rbio, so this is a
safety cleanup.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
It's possible for checksum errors to be transient - e.g. flakey
controller or cable, thus we need additional retries (besides retrying
from different replicas) before we can definitely return an error.
This is particularly important for the next patch, which will allow the
data move path to move extents with checksum errors - we don't want to
accidentally introduce bitrot due to a transient error!
- bch2_bkey_pick_read_device() is substantially reworked, and
bch2_dev_io_failures is expanded to record more information about the
type of failure (i.e. number of checksum errors).
It now returns an error code that describes more precisely the reason
for the failure - checksum error, io error, or offline device, instead
of the previous generic "insufficient devices". This is important for
the next patches that add poisoning, as we only want to poison extents
when we've got real checksum errors (or perhaps IO errors?) - not
because a device was offline.
- Add a new option and superblock field for the number of checksum
retries.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
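A simplified sketch of picking a device with per-device failure tracking and a more precise error code. The enum values, struct example_dev_failures, and the priority order among the three failure reasons are assumptions for illustration; they are not the real bch2_bkey_pick_read_device() logic or bcachefs error codes.

```c
#include <assert.h>

enum example_read_err {
	EXAMPLE_ERR_CSUM	= -1,
	EXAMPLE_ERR_IO		= -2,
	EXAMPLE_ERR_OFFLINE	= -3,
};

/* Reduced, hypothetical stand-in for per-device failure tracking. */
struct example_dev_failures {
	int		online;
	int		failed_io;
	unsigned	nr_csum_errors;
};

/*
 * Returns the index of a device worth reading from, or a negative code
 * saying why none is usable. A device with checksum errors may still be
 * retried until it has exceeded csum_retries.
 */
static int example_pick_device(const struct example_dev_failures *devs,
			       unsigned nr, unsigned csum_retries)
{
	int saw_csum = 0, saw_io = 0;

	assert(nr > 0);
	for (unsigned i = 0; i < nr; i++) {
		if (!devs[i].online)
			continue;
		if (devs[i].failed_io)
			saw_io = 1;
		else if (devs[i].nr_csum_errors > csum_retries)
			saw_csum = 1;
		else
			return (int) i;
	}

	/* Assumed priority: checksum error, then IO error, then offline. */
	if (saw_csum)
		return EXAMPLE_ERR_CSUM;
	if (saw_io)
		return EXAMPLE_ERR_IO;
	return EXAMPLE_ERR_OFFLINE;
}
```

A caller that wants to poison an extent would then act only on the checksum-error code, not on the offline one.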
|
|
Users have been asking for this, and now that errors are returned to the
top level read retry path - we can.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Next patch will be adding an additional retry loop for checksum errors,
so that we can rule out transient errors before marking an extent as
poisoned.
Prerequisite to this is returning errors to bch2_rbio_retry(); this will
also let us add a "successful retry" message.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Now that the read path uses proper error codes, we can get rid of the
weird rbio->hole signalling to the move path that the read didn't
happen.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
When we do a read to a buffer that's mapped into userspace, it's
possible to get a spurious checksum error if userspace was modifying the
buffer at the same time.
When we retry those, they have to be bounced before we know definitively
whether we're reading corrupt data.
But the retry path propagates read flags differently, so needs special
handling.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Kill the READ_ERR/READ_RETRY/READ_RETRY_AVOID enums, and add standard
error codes that describe precisely which error occurred.
This is going to be used for the data move path, to move but poison
extents with checksum errors.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
dm-flakey is busted, and this is simpler anyways - this lets us test the
checksum error retry paths.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We need to start accounting successes for every IO, not just failures,
so introduce a unified hook for io completion accounting and convert
io_read.c.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
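In outline, a single completion hook that counts both outcomes could look like the sketch below. The names (example_account_io_completion(), struct example_dev_io_stats) and the plain counters are hypothetical; the real hook and its counters are not shown in this log.

```c
#include <assert.h>

enum example_rw { EXAMPLE_READ, EXAMPLE_WRITE, EXAMPLE_RW_NR };

/* Hypothetical per-device counters for completed IOs. */
struct example_dev_io_stats {
	unsigned long long ok[EXAMPLE_RW_NR];
	unsigned long long failed[EXAMPLE_RW_NR];
};

/* One hook for every completion, so successes are counted as well. */
static void example_account_io_completion(struct example_dev_io_stats *s,
					  enum example_rw rw, int err)
{
	assert(rw < EXAMPLE_RW_NR);

	if (err)
		s->failed[rw]++;
	else
		s->ok[rw]++;
}
```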
|
|
We were using our device pointer after we'd released our ref to it.
Unlikely to be a race that's practical to hit, since actually removing a
member device is a whole process besides just taking it offline, but -
needs to be fixed.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
restarts
we're starting to use error messages with paths in fsck_errors(), where
we do not want nested transaction restart handling, so let's prepare for
that.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Reorganize counters a bit, grouping related counters together.
New counters:
- io_read_inline
- io_read_hole
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Add a new data op to walk all data and metadata in a filesystem,
checking if it can be read successfully, and on error repairing from
another copy if possible.
- New helper: bch2_dev_idx_is_online(), so that we can bail out and
report to userspace when we're unable to scrub because the device is
offline
- data_update_opts, which controls the data move path, now understands
scrub: data is only read, not written. The read path is responsible
for rewriting on read error, as with other reads.
- scrub_pred skips data extents that don't have checksums
- bch_ioctl_data has a new scrub member, which has a data_types field
for data types to check - i.e. all data types, or only metadata.
- Add new entries to bch_move_stats so that we can report numbers for
corrected and uncorrected errors
- Add a new enum to bch_ioctl_data_event for explicitly reporting
completion and return code (i.e. device offline)
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
To be used for scrub, where we want the read to come from a specific
device.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Rework the read path so that BCH_READ_NODECODE reads now also self-heal
after a read error and a successful retry - prerequisite for scrub.
- __bch2_read_endio() now handles a read that's both BCH_READ_NODECODE
and a bounce.
Normally, we don't want a BCH_READ_NODECODE read to ever allocate a
split bch_read_bio: we want to maintain the relationship between the
bch_read_bio and the data_update it's embedded in.
But correcting read errors requires allocating a split/bounce rbio
that's embedded in a promote_op. We do still have a 1-1 relationship,
i.e. we only allocate a single split/bounce if it's a
BCH_READ_NODECODE, so things hopefully don't get too crazy.
- __bch2_read_extent() now is allowed to allocate the promote_op for
rewriting after a failed read, even if it's BCH_READ_NODECODE.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
we don't want to block completion of the read - starting a promote calls
into the write path, which will block.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
If a drive is failing and we're moving data off of it, we can't
necessarily depend on capacity/disk reservation calculations to avoid
deadlocking/blocking on the allocator.
And, we don't want to queue up infinite self healing moves anyways.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Promotes, like most other internal moves, should only go to the
specified target and not fall back to allocating from the full
filesystem.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Now that data_update embeds bch_read_bio, BCH_READ_NODECODE means that
the read is embedded in a data_update - and we can check in the retry
path if the extent has changed and bail out.
This likely fixes some subtle bugs with read errors and data moves.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Move more initialization to rbio_init(), to assist in further cleanups.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The uppercase/lowercase style is nice for making the namespace explicit.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Will be adding a bch2_read_bio_to_text().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|