summaryrefslogtreecommitdiff
path: root/fs/bcachefs/io_read.c
AgeCommit message (Collapse)AuthorFilesLines
2025-06-17bcachefs: Fix bch2_read_bio_to_text()Kent Overstreet1-1/+6
We can only pass negative error codes to bch2_err_str(); if it's a positive integer it's not an error and we trip an assert. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-12bcachefs: Read error message now prints if self healingKent Overstreet1-2/+9
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-02bcachefs: bch_err_throw()Kent Overstreet1-16/+16
Add a tracepoint for any time we return an error and unwind. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-01bcachefs: Replace rcu_read_lock() with guardsKent Overstreet1-2/+1
The new guard(), scoped_guard() allow for more natural code. Some of the uses with creative flow control have been left. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-22bcachefs: Emit a single log message on data read errorKent Overstreet1-62/+21
Instead of emitting a message immediately when we get an error in the read path, and then another at the end if we successfully retry - emit one single log message before returning from bch2_rbio_retry(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-22bcachefs: Make various async objs visible in debugfsKent Overstreet1-13/+26
Add async objs list for - promote_op - bch_read_bio - btree_read_bio - btree_write_bio This gets us introspection on in-flight async ops, and because under the hood it uses fast_lists (percpu slot buffer on top of a radix tree), it'll be fast enough to enable in production. This will be very helpful for debugging "something got stuck" issues, which have been cropping up from time to time (in the CI, especially with folio writeback). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-22bcachefs: bch2_read_bio_to_textKent Overstreet1-0/+35
Pretty printer for struct bch_read_bio. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-22bcachefs: bch_dev.io_ref -> enumerated_refKent Overstreet1-4/+6
Convert device IO refs to enumerated_refs, for easier debugging of refcount issues. Simple conversion: enumerate all users and convert to the new helpers. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-22bcachefs: bch_fs.writes -> enumerated_refsKent Overstreet1-3/+4
Drop the single-purpose write ref code in bcachefs.h, and convert to enumarated refs. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-22bcachefs: RO mounts now use less memoryKent Overstreet1-0/+8
Defer memory allocations only needed in RW mode until we actually go RW. This is part of improved support for RO images. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-22bcachefs: add missing includeKent Overstreet1-0/+1
Hygeine, and fix build in userspace. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-22bcachefs: Data move can read from poisoned extentsKent Overstreet1-0/+4
Now, if an extent is poisoned we can move it even if there was a checksum error. We'll have to give it a new checksum, but the poison bit means that userspace will still see the appropriate error when they try to read it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-22bcachefs: Poison extents that can't be read due to checksum errorsKent Overstreet1-4/+67
Copygc needs to be able to move extents that have bitrotted. We don't want to delete them - in the future we'll have an API for "read me the data even if there's checksum errors", and in general we don't want to delete anything unless the user asks us to. That will require writing it with a new checksum, which means we can't forget that there was a checksum error so we return the correct error to userspace. Rebalance also wants to skip bad extents; we can now use the poison flag for that. This is currently disabled by default, as we want read fua support so that we can distinguish between transient and permanent errors from the device. It may be enabled with the module parameter: poison_extents_on_checksum_error Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-22bcachefs: Be precise about bch_io_failuresKent Overstreet1-3/+48
If the extent we're reading from changes, due to be being overwritten or moved (possibly partially) - we need to reset bch_io_failures so that we don't accidentally mark a new extent as poisoned prematurely. This means we have to separately track (in the retry path) the extent we previously read from. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-15bcachefs: Silence extent_poisoned error messagesKent Overstreet1-8/+10
extent poisoning is partly so that we don't keep spewing the dmesg log when we've got unreadable data - we don't want to print these. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-13bcachefs: Don't print data read retry success on non-errorsKent Overstreet1-1/+5
We may end up in the data read retry path when reading cached data and racing with invalidation, or on checksum error when we were reading into a userspace buffer that might have been modified while the read was in flight. These aren't real errors, so we shouldn't print the 'retry success' message. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-07bcachefs: use library APIs for ChaCha20 and Poly1305Eric Biggers1-1/+2
Just use the ChaCha20 and Poly1305 libraries instead of the clunky crypto API. This is much simpler. It is also slightly faster, since the libraries provide more direct access to the same architecture-optimized ChaCha20 and Poly1305 code. I've tested that existing encrypted bcachefs filesystems can be continue to be accessed with this patch applied. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-02bcachefs: Kill btree_iter.transKent Overstreet1-4/+4
This was planned to be done ages ago, now finally completed; there are places where we have quite a few btree_trans objects on the stack, so this reduces stack usage somewhat. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-02bcachefs: Split up bch_dev.io_refKent Overstreet1-3/+3
We now have separate per device io_refs for read and write access. This fixes a device removal bug where the discard workers were still running while we're removing alloc info for that device. It's also a bit of hardening; we no longer allow writes to devices that are read-only. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-31bcachefs: Log original key being moved in data updatesKent Overstreet1-0/+1
There's something going on with the data move path; log the original key being moved for debugging. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-25bcachefs: Fix silent short reads in data read retry pathKent Overstreet1-1/+2
__bch2_read, before calling __bch2_read_extent(), sets bvec_iter.bi_size to "the size we can read from the current extent" with a swap, and restores it to "the size for the total read" after the read_extent call with another swap. But we neglected to do the restore before the "if (ret) goto err;" - which is a problem if we're retrying those errors. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-24bcachefs: Add missing random.h includesKent Overstreet1-0/+1
Fix build in userspace, and good hygeine. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-24bcachefs: __bch2_read() now takes a btree_transKent Overstreet1-16/+14
Next patch will be checking if the extent we're reading from matches the IO failure we saw before marking the failure. For this to work, __bch2_read() needs to take the same transaction context that bch2_rbio_retry() uses to do that check. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-24bcachefs: BCH_READ_data_update -> bch_read_bio.data_updateKent Overstreet1-25/+32
Read flags are codepath dependent and change as they're passed around, while the fields in rbio._state are mostly fixed properties of that particular object. Losing track of BCH_READ_data_update would be bad, and previously it was not obvious if it was always correctly set in the rbio, so this is a safety cleanup. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-16bcachefs: Checksum errors get additional retriesKent Overstreet1-4/+6
It's possible for checksum errors to be transient - e.g. flakey controller or cable, thus we need additional retries (besides retrying from different replicas) before we can definitely return an error. This is particularly important for the next patch, which will allow the data move path to move extents with checksum errors - we don't want to accidentally introduce bitrot due to a transient error! - bch2_bkey_pick_read_device() is substantially reworked, and bch2_dev_io_failures is expanded to record more information about the type of failure (i.e. number of checksum errors). It now returns an error code that describes more precisely the reason for the failure - checksum error, io error, or offline device, instead of the previous generic "insufficient devices". This is important for the next patches that add poisoning, as we only want to poison extents when we've got real checksum errors (or perhaps IO errors?) - not because a device was offline. - Add a new option and superblock field for the number of checksum retries. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-16bcachefs: Print message on successful read retryKent Overstreet1-0/+16
Users have been asking for this, and now that errors are returned to the top level read retry path - we can. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-16bcachefs: Return errors to top level bch2_rbio_retry()Kent Overstreet1-15/+26
Next patch will be adding an additional retry loop for checksum errors, so that we can rule out transient errors before marking an extent as poisoned. Prerequisite to this is returning errors to bch2_rbio_retry(); this will also let us add a "successful retry" message. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-16bcachefs: BCH_ERR_data_read_buffer_too_smallKent Overstreet1-5/+4
Now that the read path uses proper error codes, we can get rid of the weird rbio->hole signalling to the move path that the read didn't happen. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-16bcachefs: Read error message now indicates if it was for an internal moveKent Overstreet1-1/+8
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-16bcachefs: Fix BCH_ERR_data_read_csum_err_maybe_userspace in retry pathKent Overstreet1-0/+3
When we do a read to a buffer that's mapped into userspace, it's possible to get a spurious checksum error if userspace was modified the buffer at the same time. When we retry those, they have to be bounced before we know definitively whether we're reading corrupt data. But the retry path propagates read flags differently, so needs special handling. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-16bcachefs: Convert read path to standard error codesKent Overstreet1-42/+51
Kill the READ_ERR/READ_RETRY/READ_RETRY_AVOID enums, and add standard error codes that describe precisely which error occured. This is going to be used for the data move path, to move but poison extents with checksum errors. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-16bcachefs: Debug params for data corruption injectionKent Overstreet1-0/+8
dm-flakey is busted, and this is simpler anyways - this lets us test the checksum error retry ptahs Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-15bcachefs: bch2_account_io_completion()Kent Overstreet1-22/+23
We need to start accounting successes for every IO, not just failures, so introduce a unified hook for io completion accounting and convert io_read.c. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-15bcachefs: Fix read path io_ref handlingKent Overstreet1-3/+6
We were using our device pointer after we'd released our ref to it. Unlikely to be a race that's practical to hit, since actually removing a member device is a whole process besides just taking it offline, but - needs to be fixed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-15bcachefs: bch2_inum_offset_err_msg_trans() no longer handles transaction ↵Kent Overstreet1-4/+7
restarts we're starting to use error messages with paths in fsck_errors(), where we do not want nested transaction restart handling, so let's prepare for that. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-15bcachefs: Read/move path counter workKent Overstreet1-6/+14
Reorganize counters a bit, grouping related counters together. New counters: - io_read_inline - io_read_hole Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-15bcachefs: ScrubKent Overstreet1-1/+3
Add a new data op to walk all data and metadata in a filesystem, checking if it can be read successfully, and on error repairing from another copy if possible. - New helper: bch2_dev_idx_is_online(), so that we can bail out and report to userspace when we're unable to scrub because the device is offline - data_update_opts, which controls the data move path, now understands scrub: data is only read, not written. The read path is responsible for rewriting on read error, as with other reads. - scrub_pred skips data extents that don't have checksums - bch_ioctl_data has a new scrub member, which has a data_types field for data types to check - i.e. all data types, or only metadata. - Add new entries to bch_move_stats so that we can report numbers for corrected and uncorrected errors - Add a new enum to bch_ioctl_data_event for explicitly reporting completion and return code (i.e. device offline) Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-15bcachefs: bch2_bkey_pick_read_device() can now specify a deviceKent Overstreet1-4/+4
To be used for scrub, where we want the read to come from a specific device. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-15bcachefs: Internal reads can now correct errorsKent Overstreet1-52/+56
Rework the read path so that BCH_READ_NODECODE reads now also self-heal after a read error and a successful retry - prerequisite for scrub. - __bch2_read_endio() now handles a read that's both BCH_READ_NODECODE and a bounce. Normally, we don't want a BCH_READ_NODECODE read to ever allocate a split bch_read_bio: we want to maintain the relationship between the bch_read_bio and the data_update it's embedded in. But correcting read errors requires allocating a split/bounce rbio that's embedded in a promote_op. We do still have a 1-1 relationship, i.e. we only allocate a single split/bounce if it's a BCH_READ_NODECODE, so things hopefully don't get too crazy. - __bch2_read_extent() now is allowed to allocate the promote_op for rewriting after a failed read, even if it's BCH_READ_NODECODE. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-15bcachefs: Don't self-heal if a data update is already rewritingKent Overstreet1-20/+48
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-15bcachefs: Don't start promotes from bch2_rbio_free()Kent Overstreet1-1/+10
we don't want to block completion of the read - starting a promote calls into the write path, which will block. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-15bcachefs: Self healing writes are BCH_WRITE_alloc_nowaitKent Overstreet1-2/+2
If a drive is failing and we're moving data off of it, we can't necessairly depend on capacity/disk reservation calculations to avoid deadlocking/blocking on the allocator. And, we don't want to queue up infinite self healing moves anyways. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-15bcachefs: Promotes should use BCH_WRITE_only_specified_devsKent Overstreet1-0/+1
Promotes, like most other internal moves, should only go to the specified target and not fall back to allocating from the full filesystem. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-15bcachefs: Be stricter in bch2_read_retry_nodecode()Kent Overstreet1-40/+24
Now that data_update embeds bch_read_bio, BCH_READ_NODECODE means that the read is embedded in a a data_update - and we can check in the retry path if the extent has changed and bail out. This likely fixes some subtle bugs with read errors and data moves. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-15bcachefs: cleanup redundant code around data_update_op initializationKent Overstreet1-59/+33
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-15bcachefs: promote_op uses embedded bch_read_bioKent Overstreet1-59/+45
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-15bcachefs: rbio_init() cleanupKent Overstreet1-10/+8
Move more initialization to rbio_init(), to assist in further cleanups. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-15bcachefs: rbio_init_fragment()Kent Overstreet1-11/+7
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-15bcachefs: Rename BCH_WRITE flags fer consistency with other x-macros enumsKent Overstreet1-1/+1
The uppercase/lowercase style is nice for making the namespace explicit. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-15bcachefs: x-macroize BCH_READ flagsKent Overstreet1-41/+41
Will be adding a bch2_read_bio_to_text(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>