summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2023-10-23bcachefs: BCH_COMPAT_bformat_overflow_done no longer requiredKent Overstreet6-30/+49
Awhile back, we changed bkey_format generation to ensure that the packed representation could never represent fields larger than the unpacked representation. This was to ensure that bkey_packed_successor() always gave a sensible result, but in the current code bkey_packed_successor() is only used in a debug assertion - not for anything important. This kills the requirement that we've gotten rid of those weird bkey formats, and instead changes the assertion to check if we're dealing with an old weird bkey format. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: kill EBUG_ON() redefinition in bkey.cKent Overstreet1-9/+8
our debug mode assertions in bkey.c haven't been getting run, whoops Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Add logging to bch2_inode_peek() & relatedKent Overstreet3-4/+11
Add error messages when we fail to lookup an inode, and also add a few missing bch2_err_class() calls. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Fix lock thrashing in __bchfs_fallocate()Kent Overstreet1-25/+56
We've observed significant lock thrashing on fstests generic/083 in fallocate, due to dropping and retaking btree locks when checking the pagecache for data. This adds a nonblocking mode to bch2_clamp_data_hole(), where we only use folio_trylock(), and can thus be used safely while btree locks are held - thus we only have to drop btree locks as a fallback, on actual lock contention. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Fix for bch2_copygc() spuriously returning -EEXISTKent Overstreet1-1/+3
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Convert btree_err_type to normal error codesKent Overstreet2-72/+53
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Fix btree_err() macroKent Overstreet1-1/+4
Error code wasn't being propagated correctly, change it to match fsck_err() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Ensure topology repair runsKent Overstreet4-1/+5
This fixes should_restart_for_topology_repair() - previously it was returning false if the btree io path had already seleceted topology repair to run, even if it hadn't run yet. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Log a message when running an explicit recovery passKent Overstreet5-26/+35
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Print out required recovery passes on version upgradeKent Overstreet1-25/+37
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Fix shift by 64 in set_inc_field()Kent Overstreet1-27/+24
UBSAN was complaining about a shift by 64 in set_inc_field(). This only happened when the value being shifted was 0, so in theory should be harmless - a shift by 64 (or register width) should logically give a result of 0, but CPUs will in practice leave the input unchanged when the number of bits to shift by wraps - and since our input here is 0, the output is still what we want. But, it's still undefined behaviour and we need our UBSAN output to be clean, so it needs to be fixed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: bkey_format helper improvementsKent Overstreet4-22/+43
- add a to_text() method for bkey_format - convert bch2_bkey_format_validate() to modern error message style, where we pass a printbuf for the error string instead of returning a static string Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: bcachefs_metadata_version_deleted_inodesKent Overstreet5-3/+125
Add a new bitset btree for inodes pending deletion; this means we no longer have to scan the full inodes btree after an unclean shutdown. Specifically, this adds: - a trigger to update the deleted_inodes btree based on changes to the inodes btree - a new recovery pass - and check_inodes is now only a fsck pass. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Fix folio leak in folio_hole_offset()Kent Overstreet1-0/+1
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Fix overlapping extent repairKent Overstreet2-38/+131
A number of smallish fixes for overlapping extent repair, and (part of) a new unit test. This fixes all the issues turned up by bhzhu203, in his filesystem image from running mongodb + snapshots. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: In debug mode, run fsck again after fixing errorsKent Overstreet1-0/+23
We want to ensure that fsck actually fixed all the errors it found - the second fsck run should be clean. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: recovery_types.hKent Overstreet2-42/+48
Move some code out of bcachefs.h, which is too much of an everything header. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Handle weird opt string from sys_fsconfig()Kent Overstreet1-0/+7
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Assorted fixes for clangKent Overstreet14-128/+86
clang had a few more warnings about enum conversion, and also didn't like the opts.c initializer. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Move fsck_inode_rm() to inode.cKent Overstreet3-64/+66
Prep work for the new deleted inodes btree Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Consolidate btree id propertiesKent Overstreet5-124/+104
This refactoring centralizes defining per-btree properties. bch2_key_types_allowed was also about to overflow a u32, so expand that to a u64. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: bch2_trans_update_extent_overwrite()Kent Overstreet2-103/+114
Factor out a new helper, to be used when fsck has to repair overlapping extents. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Fix minor memory leak on invalid bkeyKent Overstreet1-5/+7
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Move some declarations to the correct headerKent Overstreet2-9/+9
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Fix btree iter leak in __bch2_insert_snapshot_whiteouts()Kent Overstreet1-1/+2
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Fix a null ptr deref in check_xattr()Kent Overstreet1-2/+2
We were attempting to initialize inode hash info when no inodes were found. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: bch2_btree_bit_mod()Kent Overstreet4-28/+37
New helper for bitset btrees. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: move inode triggers to inode.cKent Overstreet4-71/+75
bit of reorg Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: fsck: delete dead codeKent Overstreet1-41/+0
Delete the old, now reimplemented overlapping extent check/repair. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Make topology repair a normal recovery passKent Overstreet7-46/+33
This adds bch2_run_explicit_recovery_pass(), for rewinding recovery and explicitly running a specific recovery pass - this is a more general replacement for how we were running topology repair before. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: bch2_run_explicit_recovery_pass()Kent Overstreet5-12/+22
This introduces bch2_run_explicit_recovery_pass() and uses it for when fsck detects that we need to re-run dead snaphots cleanup, and makes dead snapshot cleanup more like a normal recovery pass. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Print version, options earlier in startup pathKent Overstreet1-2/+3
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: use prejournaled key updates for write buffer flushesBrian Foster1-2/+28
The write buffer mechanism journals keys twice in certain situations. A key is always journaled on write buffer insertion, and is potentially journaled again if a write buffer flush falls into either of the slow btree insert paths. This has shown to cause journal recovery ordering problems in the event of an untimely crash. For example, consider if a key is inserted into index 0 of a write buffer, the active write buffer switches to index 1, the key is deleted in index 1, and then index 0 is flushed. If the original key is rejournaled in the btree update from the index 0 flush, the (now deleted) key is journaled in a seq buffer ahead of the latest version of key (which was journaled when the key was deleted in index 1). If the fs crashes while this is still observable in the log, recovery sees the key from the btree update after the delete key from the write buffer insert, which is the incorrect order. This problem is occasionally reproduced by generic/388 and generally manifests as one or more backpointer entry inconsistencies. To avoid this problem, never rejournal write buffered key updates to the associated btree. Instead, use prejournaled key updates to pass the journal seq of the write buffer insert down to the btree insert, which updates the btree leaf pin to reflect the seq of the key. Note that tracking the seq is required instead of just using NOJOURNAL here because otherwise we lose protection of the write buffer pin when the buffer is flushed, which means the key can fall off the tail of the on-disk journal before the btree leaf is flushed and lead to similar recovery inconsistencies. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: support btree updates of prejournaled keysBrian Foster4-3/+36
Introduce support for prejournaled key updates. This allows a transaction to commit an update for a key that already exists (and is pinned) in the journal. This is required for btree write buffer updates as the current scheme of journaling both on write buffer insertion and write buffer (slow path) flush is unsafe in certain crash recovery scenarios. Create a small trans update wrapper to pass along the seq where the key resides into the btree_insert_entry. From there, trans commit passes the seq into the btree insert path where it is used to manage the journal pin for the associated btree leaf. Note that this patch only introduces the underlying mechanism and otherwise includes no functional changes. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: fold bch2_trans_update_by_path_trace() into callersBrian Foster1-19/+8
There is only one other caller so eliminate some boilerplate. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: remove unnecessary btree_insert_key_leaf() wrapperBrian Foster1-7/+1
This is in preparation to support prejournaled keys. We want the ability to optionally pass a seq stored in the btree update rather than the seq of the committing transaction. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: remove duplicate code between backpointer update pathsBrian Foster2-21/+5
Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23MAINTAINERS: add Brian Foster as a reviewer for bcachefsBrian Foster1-0/+1
Brian has been playing with bcachefs for several months now and has offerred to commit time to patch review. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Suppresss various error messages in no_data_io modeKent Overstreet3-6/+10
We commonly use no_data_io mode when debugging filesystem metadata dumps, where data checksum/compression errors are expected and unimportant - this patch suppresses these. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Fix lookup_inode_for_snapshot()Kent Overstreet1-1/+5
This fixes a use-after-free. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: need_snapshot_cleanup shouldn't be a fsck errorKent Overstreet1-12/+16
We currently don't track whether snapshot cleanup still needs to finish (aside from running a full fsck), so it shouldn't be a fsck error yet - fsck -n after fsck has succesfully completed shouldn't error. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Improve key_visible_in_snapshot()Kent Overstreet1-8/+16
Delete a redundant bch2_snapshot_is_ancestor() check, and convert some assertions to debug assertions. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Refactor overlapping extent checksKent Overstreet1-66/+87
Make the overlapping extent check/repair code more self contained. This is prep work for hopefully reducing key_visible_in_snapshot() usage here as well, and also includes a nice performance optimization to not check ref_visible2() unless the extents potentially overlap. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: check_extent(): don't use key_visible_in_snapshot()Kent Overstreet1-35/+12
This changes the main part of check_extents(), that checks the extent against the corresponding inode, to not use key_visible_in_snapshot(). key_visible_in_snapshot() has to iterate over the list of ancestor overwrites repeatedly calling bch2_snapshot_is_ancestor(), so this is a significant performance improvement. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: check_extent() refactoringKent Overstreet1-49/+50
More prep work for reducing key_visible_in_snapshot() usage - this rearranges how KEY_TYPE_whitout keys are handled, so that they can be marked off in inode_warker->inode->seen_this_pos. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: fsck: walk_inode() now takes is_whiteoutKent Overstreet1-12/+13
We only want to synthesize an inode for the current snapshot ID for non whiteouts - this refactoring lets us call walk_inode() earlier and clean up some control flow. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Simplify check_extent()Kent Overstreet1-20/+10
Minor refactoring/dead code deletion, prep work for reworking check_extent() to avoid key_visible_in_snapshot(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: overlapping_extents_found()Kent Overstreet2-37/+84
This improves the repair path for overlapping extents - we now verify that we find in the btree the overlapping extents that the algorithm detected, and fail the fsck run with a more useful error if it doesn't match. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: fsck: inode_walker: last_pos, seen_this_posKent Overstreet1-16/+23
Prep work for changing check_extent() to avoid key_visible_in_snapshot() - this adds the state to track whether an inode has seen an extent at this pos. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: check_extents(): make sure to check i_sectors for last inodeKent Overstreet1-1/+2
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>