summaryrefslogtreecommitdiff
path: root/fs/bcachefs
AgeCommit message (Collapse)AuthorFilesLines
2024-12-29bcachefs: kill __bch2_extent_ptr_to_bp()Kent Overstreet2-19/+6
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-29bcachefs: bch2_extent_ptr_to_bp() no longer depends on deviceKent Overstreet3-36/+14
bch_backpointer no longer contains the bucket_offset field, it's just a direct LBA mapping (with low bits to account for compressed extent splitting), so we don't need to refer to the device to construct it anymore. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-29bcachefs: bcachefs_metadata_version_disk_accounting_big_endianKent Overstreet5-15/+51
Fix sort order for disk accounting keys, in order to fix a regression on mount times. The typetag is now the most significant byte of the key, meaning disk accounting keys of the same type now sort together. This lets us skip over disk accounting keys that aren't mirrored in memory when reading accounting at startup, instead of having them interleaved with other counter types. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-29bcachefs: bcachefs_metadata_version_backpointer_bucket_genKent Overstreet4-27/+23
New on disk format version: backpointers new include the generation number of the bucket they refer to, and the obsolete bucket_offset field (no longer needed because we no longer store backpointers in alloc keys) is gone. This is an expensive forced upgrade - hopefully the last; we have to run the extents_to_backpointers recovery pass to regenerate backpointers. It's a forced incompatible upgrade because the alternative would've been permamently making backpointers bigger, and as one of the biggest btrees (along with the extents btree) that's not an ideal option. It's worth it though, because this allows us to make the check_extents_to_backpointers pass drastically cheaper: an upcoming patch changes it to sum up backpointers in a bucket and check the sum against the sector counts for that bucket, only looking for missing backpointers if they don't match (and then only for specific buckets). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-29bcachefs: bch2_btree_path_peek_slot() doesn't return errorsKent Overstreet1-7/+8
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-29bcachefs: trace_key_cache_fillKent Overstreet2-5/+32
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-29bcachefs: Log message in journal for snapshot deletionKent Overstreet1-0/+16
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-29bcachefs: bch2_trans_log_msg()Kent Overstreet2-3/+11
Export a helper for logging to the journal when we're already in a transaction context. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-29bcachefs: Kill snapshot_t->equivKent Overstreet3-111/+13
Now entirely dead code. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-23block: Delete bio_set_prio()John Garry1-3/+3
Since commit 43b62ce3ff0a ("block: move bio io prio to a new field"), macro bio_set_prio() does nothing but set bio->bi_ioprio. All other places just set bio->bi_ioprio directly, so replace bio_set_prio() remaining callsites with setting bio->bi_ioprio directly and delete that macro. Signed-off-by: John Garry <john.g.garry@oracle.com> Acked-by: Jack Wang <jinpu.wang@ionos.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20241202111957.2311683-3-john.g.garry@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-21bcachefs: Snapshot deletion no longer uses snapshot_t->equivKent Overstreet1-135/+133
Switch to generating a private list of interior nodes to delete, instead of using the equivalence class in the global data structure. This eliminates possible races with snapshot creation, and is much cleaner - it'll let us delete a lot of janky code for calculating and maintaining the equivalence classes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Kill equiv_seen arg to delete_dead_snapshots_process_key()Kent Overstreet1-34/+17
When deleting dead snapshots, we move keys from redundant interior snapshot nodes to child nodes - unless there's already a key, in which case the ancestor key is deleted. Previously, we tracked via equiv_seen whether the child snapshot had a key, but this was tricky w.r.t. transaction restarts, and not transactionally safe w.r.t. updates in the child snapshot. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Don't run overwrite triggers before insertKent Overstreet1-44/+37
This breaks when the trigger is inserting updates for the same btree, as the inode trigger now does. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: alloc_data_type_set() happens in alloc triggerKent Overstreet2-6/+6
Originally, we ran insert triggers before overwrite so that if an extent was being moved (by fallocate insert/collapse range), the bucket sector count wouldn't hit 0 partway through, and so we don't trigger state changes caused by that too soon. But this is better solved by just moving the data type change to the alloc trigger itself, where it's already called. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Fix key cache + BTREE_ITER_all_snapshotsKent Overstreet1-2/+8
Normally, whitouts (KEY_TYPE_whitout) are filtered from btree lookups, since they exist only to represent deletions of keys in ancestor snapshots - except, they should not be filtered in BTREE_ITER_all_snapshots mode, so that e.g. snapshot deletion can clean them up. This means that that the key cache has to store whiteouts, and key cache fills cannot filter them. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Fix btree_trans_peek_key_cache() BTREE_ITER_all_snapshotsKent Overstreet1-0/+4
In BTREE_ITER_all_snapshots mode, we're required to only return keys where the snapshot field matches the iterator position - BTREE_ITER_filter_snapshots requires pulling keys into the key cache from ancestor snapshots, so we have to check for that. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: tidy btree_trans_peek_journal()Kent Overstreet1-17/+12
Change to match bch2_btree_trans_peek_updates() calling convention. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: tidy up __bch2_btree_iter_peek()Kent Overstreet1-8/+6
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: check_indirect_extents can run onlineKent Overstreet1-1/+1
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Refactor c->opts.reconstruct_allocKent Overstreet1-16/+12
Now handled in one place. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Add empty statement between label and declaration in ↵Nathan Chancellor1-1/+1
check_inode_hash_info_matches_root() Clang 18 and newer warns (or errors with CONFIG_WERROR=y): fs/bcachefs/str_hash.c:164:2: error: label followed by a declaration is a C23 extension [-Werror,-Wc23-extensions] 164 | struct bch_inode_unpacked inode; | ^ In Clang 17 and prior, this is an unconditional hard error: fs/bcachefs/str_hash.c:164:2: error: expected expression 164 | struct bch_inode_unpacked inode; | ^ fs/bcachefs/str_hash.c:165:30: error: use of undeclared identifier 'inode' 165 | ret = bch2_inode_unpack(k, &inode); | ^ fs/bcachefs/str_hash.c:169:55: error: use of undeclared identifier 'inode' 169 | struct bch_hash_info hash2 = bch2_hash_info_init(c, &inode); | ^ fs/bcachefs/str_hash.c:171:40: error: use of undeclared identifier 'inode' 171 | ret = repair_inode_hash_info(trans, &inode); | ^ Add an empty statement between the label and the declaration to fix the warning/error without disturbing the code too much. Fixes: 2519d3b0d656 ("bcachefs: bch2_str_hash_check_key() now checks inode hash info") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202412092339.QB7hffGC-lkp@intel.com/ Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: trace_write_buffer_maybe_flushKent Overstreet3-1/+27
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: bch2_snapshot_exists()Kent Overstreet4-4/+21
bch2_snapshot_equiv() is going away; convert users that just wanted to know if the snapshot exists to something better Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: bch2_check_key_has_snapshot() prints btree idKent Overstreet1-1/+4
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: bch2_str_hash_check_key() now checks inode hash infoKent Overstreet3-33/+125
Versions of the same inode in different snapshots must have the same hash info; this is critical for lookups to work correctly. We're going to be running the str_hash checks online, at readdir or xattr list time, so we now need str_hash_check_key() to check for inode hash seed mismatches, since it won't be run right after check_inodes(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Don't BUG_ON() inode unpack errorKent Overstreet2-13/+28
Bkey validation checks that inodes are well-formed and unpack successfully, so an unpack error should always indicate memory corruption or some other kind of hardware bug - but these are still errors we can recover from. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Use proper errcodes for inode unpack errorsKent Overstreet3-8/+12
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: kill sysfs internal/accountingKent Overstreet3-32/+0
Since we added per-inode counters there's now far too many counters to show in one shot - if we want this in the future, it'll have to be in debugfs. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Kill unnecessary mark_lock usageKent Overstreet8-55/+20
We can't hold mark_lock while calling fsck_err() - that's a deadlock, mark_lock is meant to be a leaf node lock. It's also unnecessary for gc_bucket() and bucket_gen(); rcu suffices since the bucket_gens array describes its size, and we can't race with device removal or resize during gc/fsck since that takes state lock. Reported-by: syzbot+38641fcbda1aaffefdd4@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Don't start rewriting btree nodes until after journal replayKent Overstreet1-1/+2
This fixes a deadlock during journal replay when btree node read errors kick off a ton of rewrites: we don't want them competing with journal replay. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Fix reuse of bucket before journal flush on multiple empty -> ↵Kent Overstreet2-40/+42
nonempty transition For each bucket we track when the bucket became nonempty and when it became empty again: if we can ensure that there will be no journal flushes in the range [nonempty, empty) (possibly because they occured at the same journal sequence number), then it's safe to reuse the bucket without waiting for a journal commit. This is a major performance optimization for erasure coding, where writes are initially replicated, but the extra replicas are quickly dropped: if those buckets are reused and overwritten without issuing a cache flush to the underlying device, then they only cost bus bandwidth. But there's a tricky corner case when there's multiple empty -> nonempty -> empty transitions in quick succession, i.e. when data is getting overwritten immediately as it's being written. If this happens and the previous empty transition hasn't been flushed, we need to continue tracking the previous nonempty transition - not start a new one. Fixing this means we now need to track both the nonempty and empty transitions in bch_alloc_v4. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: bch2_journal_noflush_seq() now takes [start, end)Kent Overstreet3-7/+10
Harder to screw up if we're explicit about the range, and more correct as journal reservations can be outstanding on multiple journal entries simultaneously. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Set bucket needs discard, inc gen on empty -> nonempty transitionKent Overstreet1-1/+4
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Don't add unknown accounting types to eytzinger treeKent Overstreet2-1/+23
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Plumb bkey_validate_context to journal_entry_validateKent Overstreet7-84/+97
This lets us print the exact location in the journal if it was found in the journal, or correctly print if it was found in the superblock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Use a heap for handling overwrites in btree node scanKent Overstreet2-48/+86
Fix an O(n^2) issue when we find many overlapping (overwritten) btree nodes - especially when one node overwrites many smaller nodes. This was discovered to be an issue with the bcachefs merge_torture_flakey test - if we had a large btree that was then emptied, the number of difficult overwrites can be unbounded. Cc: Kuan-Wei Chiu <visitorckw@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Minor bucket alloc optimizationKent Overstreet1-22/+33
Check open buckets and buckets waiting for journal commit before doing other expensive lookups. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Mark more errors autofixKent Overstreet1-2/+2
tested repairing from a bug uncovered by the merge_torture_flakey test Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: fix bch2_btree_node_header_to_text() format stringKent Overstreet1-1/+1
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Journal space calculations should skip durability=0 devicesKent Overstreet1-1/+2
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: factor out str_hash.cKent Overstreet5-207/+232
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: kill flags param to bch2_subvolume_get()Kent Overstreet7-24/+17
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Don't call bch2_btree_interior_update_will_free_node() until after ↵Kent Overstreet1-7/+7
update succeeds Originally, btree splits always succeeded once we got to the point of recursing to the btree_insert_node() call. But that changed when we switched to not taking intent locks all the way up to the root, and that introduced a bug, because bch2_btree_interior_update_will_free_node() cancels paending writes and reparents a node that's going to be made visible on disk by another btree update to the current btree update. This was discovered in recent backpointers work, because bch2_btree_interior_update_will_free_node() also clears the will_make_reachable flag, causing backpointer target lookup to spuriously thing it had found a dangling backpointer (when the backpointer just hadn't been created yet by btree_update_nodes_written()). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Make sure __bch2_run_explicit_recovery_pass() signals to rewindKent Overstreet2-26/+27
We should always signal to rewind if the requested pass hasn't been run, even if called multiple times. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Call bch2_btree_lost_data() on btree read errorKent Overstreet1-7/+6
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Journal write path refactoring, debug improvementsKent Overstreet2-31/+45
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: dev_alloc_list.devs -> dev_alloc_list.dataKent Overstreet3-49/+34
This lets us use darray macros on dev_alloc_list (and it will become a darray eventually, when we increase the maximum number of devices). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Fix failure to allocate journal write on discard retryKent Overstreet1-0/+9
When allocating a journal write fails, then retries after doing discards, we were failing to count already allocated replicas. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: BCH_ERR_insufficient_journal_devicesKent Overstreet2-3/+3
kill another standard error code use Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Silence "unable to allocate journal write" if we're already ROKent Overstreet1-2/+3
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>