summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2024-04-22bcachefs: If we run merges at a lower watermark, they must be nonblockingKent Overstreet1-1/+5
Fix another deadlock related to the merge path; previously, we switched to always running merges at a lower watermark (because they are noncritical); but when we run at a lower watermark we also need to run nonblocking or we've introduced a new deadlock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Reported-and-tested-by: s@m-h.ug
2024-04-21bcachefs: Fix inode early destruction pathKent Overstreet1-3/+6
discard_new_inode() is the wrong interface to use when we need to free an inode that was never inserted into the inode hash table; we can bypass the whole iput() -> evict() path and replace it with __destroy_inode(); kmem_cache_free() - this fixes a WARN_ON() about I_NEW. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-21bcachefs: Fix deadlock in journal write pathKent Overstreet1-18/+42
bch2_journal_write() was incorrectly waiting on earlier journal writes synchronously; this usually worked because most of the time we'd be running in the context of a thread that did a journal_buf_put(), but sometimes we'd be running out of the same workqueue that completes those prior journal writes. Additionally, this makes sure to punt to a workqueue before submitting preflushes - we really don't want to be calling submit_bio() in the main transaction commit path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-21bcachefs: Tweak btree key cache shrinker so it actually freesKent Overstreet1-15/+4
Freeing key cache items is a multi stage process; we need to wait for an SRCU grace period to elapse, and we handle this ourselves - partially to avoid callback overhead, but primarily so that when allocating we can first allocate from the freed items waiting for an SRCU grace period. Previously, the shrinker was counting the items on the 'waiting for SRCU grace period' lists as items being scanned, but this meant that too many items waiting for an SRCU grace period could prevent it from doing any work at all. After this, we're seeing that items skipped due to the accessed bit are the main cause of the shrinker not making any progress, and we actually want the key cache shrinker to run quite aggressively because reclaimed items will still generally be found (more compactly) in the btree node cache - so we also tweak the shrinker to not count those against nr_to_scan. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-20bcachefs: bkey_cached.btree_trans_barrier_seq needs to be a ulongKent Overstreet1-1/+1
this stores the SRCU sequence number, which we use to check if an SRCU barrier has elapsed; this is a partial fix for the key cache shrinker not actually freeing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-20bcachefs: Fix missing call to bch2_fs_allocator_background_exit()Kent Overstreet1-0/+1
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-20bcachefs: Check for journal entries overruning end of sb clean sectionKent Overstreet2-1/+10
Fix a missing bounds check in superblock validation. Note that we don't yet have repair code for this case - repair code for individual items is generally low priority, since the whole superblock is checksummed, validated prior to write, and we have backups. Reported-by: lei lu <llfamsec@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-18bcachefs: Fix bio alloc in check_extent_checksum()Kent Overstreet1-1/+1
if the buffer is virtually mapped it won't be a single bvec Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-18bcachefs: fix leak in bch2_gc_write_reflink_keyKent Overstreet1-1/+2
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-18bcachefs: KEY_TYPE_error is allowed for reflinkKent Overstreet1-1/+2
KEY_TYPE_error is left behind when we have to delete all pointers in an extent in fsck; it allows errors to be correctly returned by reads later. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-18bcachefs: Fix bch2_dev_btree_bitmap_marked_sectors() shiftKent Overstreet2-5/+5
Fixes: 27c15ed297cb bcachefs: bch_member.btree_allocated_bitmap Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-17bcachefs: make sure to release last journal pin in replayKent Overstreet1-1/+4
This fixes a deadlock when journal replay has many keys to insert that were from fsck, not the journal. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-17bcachefs: node scan: ignore multiple nodes with same seq if interiorKent Overstreet1-0/+2
Interior nodes are not really needed, when we have to scan - but if this pops up for leaf nodes we'll need a real heuristic. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-17bcachefs: Fix format specifier in validate_bset_keys()Nathan Chancellor1-1/+1
When building for 32-bit platforms, for which size_t is 'unsigned int', there is a warning from a format string in validate_bset_keys(): fs/bcachefs/btree_io.c: In function 'validate_bset_keys': fs/bcachefs/btree_io.c:891:34: error: format '%lu' expects argument of type 'long unsigned int', but argument 12 has type 'unsigned int' [-Werror=format=] 891 | "bad k->u64s %u (min %u max %lu)", k->u64s, | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ fs/bcachefs/btree_io.c:603:32: note: in definition of macro 'btree_err' 603 | msg, ##__VA_ARGS__); \ | ^~~ fs/bcachefs/btree_io.c:887:21: note: in expansion of macro 'btree_err_on' 887 | if (btree_err_on(!bkeyp_u64s_valid(&b->format, k), | ^~~~~~~~~~~~ fs/bcachefs/btree_io.c:891:64: note: format string is defined here 891 | "bad k->u64s %u (min %u max %lu)", k->u64s, | ~~^ | | | long unsigned int | %u cc1: all warnings being treated as errors BKEY_U64s is size_t so the entire expression is promoted to size_t. Use the '%zu' specifier so that there is no warning regardless of the width of size_t. Fixes: 031ad9e7dbd1 ("bcachefs: Check for packed bkeys that are too big") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202404130747.wH6Dd23p-lkp@intel.com/ Closes: https://lore.kernel.org/oe-kbuild-all/202404131536.HdAMBOVc-lkp@intel.com/ Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-17bcachefs: Fix null ptr deref in twf from BCH_IOCTL_FSCK_OFFLINEKent Overstreet3-3/+19
We need to initialize the stdio redirects before they're used. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-15bcachefs: set_btree_iter_dontneed also clears should_be_lockedKent Overstreet1-2/+7
This is part of a larger series cleaning up the semantics of should_be_locked and adding assertions around it; if we don't need an iterator/path anymore, it clearly doesn't need to be locked. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-15bcachefs: fix error path of __bch2_read_super()Chao Yu1-2/+5
In __bch2_read_super(), if kstrdup() fails, it needs to release memory in sb->holder, fix to call bch2_free_super() in the error path. Signed-off-by: Chao Yu <chao@kernel.org> Reviewed-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-15bcachefs: Check for backpointer bucket_offset >= bucket sizeKent Overstreet3-10/+9
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-15bcachefs: bch_member.btree_allocated_bitmapKent Overstreet9-6/+131
This adds a small (64 bit) per-device bitmap that tracks ranges that have btree nodes, for accelerating btree node scan if it is ever needed. - New helpers, bch2_dev_btree_bitmap_marked() and bch2_dev_bitmap_mark(), for checking and updating the bitmap - Interior btree update path updates the bitmaps when required - The check_allocations pass has a new fsck_err check, btree_bitmap_not_marked - New on disk format version, mi_btree_mitmap, which indicates the new bitmap is present - Upgrade table lists the required recovery pass and expected fsck error - Btree node scan uses the bitmap to skip ranges if we're on the new version Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-15bcachefs: sysfs internal/trigger_journal_flushKent Overstreet1-1/+10
Add a sysfs knob for immediately flushing the entire journal. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-15bcachefs: Fix bch2_btree_node_fill() for !pathKent Overstreet1-26/+18
We shouldn't be doing the unlock/relock dance when we're not using a path - this fixes an assertion pop when called from btree node scan. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-15bcachefs: add safety checks in bch2_btree_node_fill()Kent Overstreet1-1/+24
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-15bcachefs: Interior known are required to have known key typesKent Overstreet1-1/+2
For forwards compatibilyt, we allow bkeys of unknown type in leaf nodes; we can simply ignore metadata we don't understand. Pointers to btree nodes must always be of known types, howwever. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-15bcachefs: add missing bounds check in __bch2_bkey_val_invalid()Kent Overstreet1-1/+4
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-14bcachefs: Fix btree node merging on write buffer btreesKent Overstreet1-2/+12
The btree write buffer flush fastpath that avoids the main transaction commit path had the unfortunate side effect of not doing btree node merging. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-14bcachefs: Disable merges from interior update pathKent Overstreet1-0/+10
There's been a bug in the btree write buffer where it wasn't triggering btree node merges - and leaving behind a bunch of nearly empty btree nodes. Then during journal replay, when updates to the backpointers btree aren't using the btree write buffer (because we require synchronization with journal replay), we end up doing those merges all at once. Then if it's the interior update path running them, we deadlock because those run with the highest watermark. There's no real need for the interior update path to be doing btree node merges; other code paths can handle that at lower watermarks. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-14bcachefs: Run merges at BCH_WATERMARK_btreeKent Overstreet1-0/+6
This fixes a deadlock where the interior update path during journal replay ends up doing a ton of merges on the backpointers btree, and deadlocking. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-14bcachefs: Fix missing write refs in fs fio pathsKent Overstreet3-14/+23
bch2_journal_flush_seq requires us to have a write ref Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-14bcachefs: Fix deadlock in journal replayKent Overstreet1-3/+4
btree_key_can_insert_cached() should be checking the watermark - BCH_TRANS_COMMIT_journal_replay really means nonblocking mode when watermark < reclaim, it was being used incorrectly. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-14bcachefs: Go rw if running any explicit recovery passesKent Overstreet1-1/+1
This fixes a bug where we fail to start when upgrading/downgrading because we forgot we needed to go rw. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-14bcachefs: Standardize helpers for printing enum strs with bounds checksKent Overstreet10-56/+69
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-14bcachefs: don't queue btree nodes for rewrites during scanKent Overstreet1-1/+3
many nodes found during scan will be old nodes, overwritten by newer nodes Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-14bcachefs: fix race in bch2_btree_node_evict()Kent Overstreet1-1/+3
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-14bcachefs: fix unsafety in bch2_stripe_to_text()Kent Overstreet2-21/+27
.to_text() functions need to work on key values that didn't pass .valid Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-14bcachefs: fix unsafety in bch2_extent_ptr_to_text()Kent Overstreet1-1/+3
Need to check if we have a valid bucket before checking if ptr is stale Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-14bcachefs: btree node scan: handle encrypted nodesKent Overstreet1-0/+10
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-14bcachefs: Check for packed bkeys that are too bigKent Overstreet2-7/+14
add missing validation; fixes assertion pop in bkey unpack Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-14bcachefs: Fix UAFs of btree_insert_entry arrayKent Overstreet1-13/+14
The btree paths array is now dynamically resizable - and as well the btree_insert_entries array, as it needs to be the same size. The merge path (and interior update path) allocates new btree paths, thus can trigger a resize; thus we need to not retain direct pointers after invoking merge; similarly when running btree node triggers. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-12bcachefs: Don't use bch2_btree_node_lock_write_nofail() in btree split pathKent Overstreet1-15/+26
It turns out - btree splits happen with the rest of the transaction still locked, to avoid unnecessary restarts, which means using nofail doesn't work here - we can deadlock. Fortunately, we now have the ability to return errors here. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-11bcachefs: Fix __bch2_btree_and_journal_iter_init_node_iter()Kent Overstreet1-5/+7
We weren't respecting trans->journal_replay_not_finished - we shouldn't be searching the journal keys unless we have a ref on them. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-11bcachefs: Kill read lock dropping in bch2_btree_node_lock_write_nofail()Kent Overstreet1-27/+1
dropping read locks in bch2_btree_node_lock_write_nofail() dates from before we had the cycle detector; we can now tell the cycle detector directly when taking a lock may not fail because we can't handle transaction restarts. This is needed for adding should_be_locked asserts. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-11bcachefs: Fix a race in btree_update_nodes_written()Kent Overstreet1-3/+7
One btree update might have terminated in a node update, and then while it is in flight another btree update might free that original node. This race has to be handled in btree_update_nodes_written() - we were missing a READ_ONCE(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-10bcachefs: btree_node_scan: Respect member.data_allowedKent Overstreet1-0/+3
If a device wasn't used for btree nodes, no need to scan for them. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-09bcachefs: Don't scan for btree nodes when we can reconstructKent Overstreet4-18/+29
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-09bcachefs: Fix check_topology() when using node scanKent Overstreet1-1/+1
shoot down journal keys _before_ populating journal keys with pointers to scanned nodes Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-09bcachefs: fix eytzinger0_find_gt()Kent Overstreet1-6/+20
- fix return types: promoting from unsigned to ssize_t does not do what we want here, and was pointless since the rest of the eytzinger code is u32 - nr, not size Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-08bcachefs: fix bch2_get_acl() transaction restart handlingKent Overstreet1-16/+14
bch2_acl_from_disk() uses allocate_dropping_locks, and can thus return a transaction restart - this wasn't handled. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-07bcachefs: fix the count of nr_freed_pcpu after changing bc->freed_nonpcpu listHongbo Li1-0/+2
When allocating bkey_cached from bc->freed_pcpu list, it missed decreasing the count of nr_freed_pcpu which would cause the mismatch between the value of nr_freed_pcpu and the list items. This problem also exists in moving new bkey_cached to bc->freed_pcpu list. If these happened, the bug info may appear in bch2_fs_btree_key_cache_exit by the follow code: BUG_ON(list_count_nodes(&bc->freed_pcpu) != bc->nr_freed_pcpu); BUG_ON(list_count_nodes(&bc->freed_nonpcpu) != bc->nr_freed_nonpcpu); Fixes: c65c13f0eac6 ("bcachefs: Run btree key cache shrinker less aggressively") Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-07bcachefs: Fix gap buffer bug in bch2_journal_key_insert_take()Kent Overstreet1-10/+45
Multiple bug fixes for journal iters: - When the journal keys gap buffer is resized, we have to adjust the iterators for moving the gap to the end - We don't want to rewind iterators to point to the key we just inserted if it's not for the correct btree/level Also, add some new assertions. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-07bcachefs: Rename struct field swap to prevent macro naming collisionThorsten Blum1-4/+4
The struct field swap can collide with the swap() macro defined in linux/minmax.h. Rename the struct field to prevent such collisions. Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>