summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2023-10-23bcachefs: Add iter_flags arg to bch2_btree_delete_range()Kent Overstreet6-13/+17
Will be used by the new snapshot tests, to pass in BTREE_ITER_ALL_SNAPSHOTS. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Add an error message for copygc spinningKent Overstreet1-0/+5
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Fix keylist size in btree_updateKent Overstreet1-2/+2
This fixes a buffer overrun, fortunately caught by a BUG_ON(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Improve error messages in device add pathKent Overstreet1-17/+27
This converts the error messages in the device add to a better style, and adds some missing ones. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: bch2_hprint(): don't print decimal if conversion was exactKent Overstreet1-1/+1
There's places where we parse these numbers, and our parsing doesn't cope with decimals currently - this is a hack to get the device_add path working again where for the device blocksize there doesn't ever need to be a decimal. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Optimize bucket reuseKent Overstreet1-36/+28
If the btree updates pointing to a bucket were never flushed by the journal before the bucket became empty again, we can reuse the bucket without a journal flush. This tweaks the tracking of journal sequence numbers in alloc keys to implement this optimization: now, we only update the journal sequence number in alloc keys on transitions to and from empty. When a bucket becomes empty, we check if we can tell the journal not to flush entries starting from when the bucket was used. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Always check for bucket reuse after readKent Overstreet1-3/+2
Since dirty extents can be moved or overwritten, it's not just cached data that we need the ptr_stale() check in bc2h_read_endio for - this fixes data checksum errors seen in the tiering ktest tests. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: bch2_journal_noflush_seq()Kent Overstreet3-3/+43
Add bch2_journal_noflush_seq(), for telling the journal that entries before a given sequence number should not be flushes - to be used by an upcoming allocator optimization. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Add a tracepoint for the btree cache shrinkerKent Overstreet2-2/+37
This is to help with diagnosing why the btree node can doesn't seem to be shrinking - we've had issues in the past with granularity/batch size, since btree nodes are so big. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Run scan_old_btree_nodes after version upgradeKent Overstreet1-21/+20
In the recovery path, we scan for old btree nodes if we don't have certain compat bits set. If we do this, we should be doing it after we upgraded to the newest on disk format. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Update sysfs compression_stats for snapshotsKent Overstreet1-28/+57
- BTREE_ITER_ALL_SNAPSHOTS flag is required here - change it to also walk the reflink btree - change it to accumulate stats for all pointers in an extent - change it to account for incompressible data Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Kill bch2_ec_mem_alloc()Kent Overstreet4-50/+5
bch2_ec_mem_alloc() was only used by GC, and there's no real need to preallocate the stripes radix tree since we can cope fine with memory allocation failure when we use the radix tree. This deletes a fair bit of code, and it's also needed for the upcoming patch because bch2_btree_iter_peek_prev() won't be working before journal replay completes (and using it was incorrect previously, as well). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Fix allocator + journal interactionKent Overstreet2-2/+2
The allocator needs to wait until the last update touching a bucket has been commited before writing to it again. However, the code was checking against the last dirty journal sequence number, not the last flushed journal sequence number. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: New in-memory array for bucket gensKent Overstreet6-5/+72
The main in-memory bucket array is going away, but we'll still need to keep bucket generations in memory, at least for now - ptr_stale() needs to be an efficient operation. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Separate out gc_bucket()Kent Overstreet3-51/+57
Since the main in memory bucket array is going away, we don't want to be calling bucket() or __bucket() when what we want is the GC in-memory bucket. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Put open_buckets in a hashtableKent Overstreet4-2/+58
This is so that the copygc code doesn't have to refer to bucket_mark.owned_by_allocator - assisting in getting rid of the in memory bucket array. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Refactor open_bucket codeKent Overstreet8-67/+83
Prep work for adding a hash table of open buckets - instead of embedding a bch_extent_ptr, we need to refer to the bucket directly so that we're not calling sector_to_bucket() in the hash table lookup code, which has an expensive divide. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: bch2_alloc_sectors_append_ptrs() now takes cached flagKent Overstreet4-14/+12
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Delete some obsolete journal_seq_blacklist codeKent Overstreet5-101/+11
Since metadata version bcachefs_metadata_version_btree_ptr_sectors_written, we haven't needed the journal seq blacklist mechanism for ignoring blacklisted btree node writes - we now only need it for ignoring journal entries that were written after the newest flush journal entry, and then we only need to keep those blacklist entries around until journal replay is finished. That means we can delete the code for scanning btree nodes to GC journal_seq_blacklist entries. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Journal initialization fixesKent Overstreet1-0/+10
This fixes a rare bug when mounting & unmounting RO - flushing a clean filesystem that never went RO should be a no op. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Use BTREE_ITER_NOPRESERVE in bch2_btree_iter_verify_ret()Kent Overstreet1-0/+1
This fixes a transaction path overflow. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Fix bch2_journal_meta()Kent Overstreet3-6/+6
This patch ensures that the journal entry written gets written as flush entry, which is important for the shutdown path - the last entry written needs to be a flush entry. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: bch2_journal_key_insert() no longer transfers ownershipKent Overstreet4-33/+34
bch2_journal_key_insert() used to assume that the key passed to it was allocated with kmalloc(), and on success took ownership. This patch deletes that behaviour, making it more similar to bch2_trans_update()/bch2_trans_commit(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Kill ptr_bucket_mark()Kent Overstreet1-13/+7
Only used in one place, we can just delete it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Don't start allocator threads too earlyKent Overstreet3-2/+11
If the allocator threads start before journal replay has finished replaying alloc keys, journal replay might overwrite the allocator's btree updates. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: bch2_bucket_alloc_new_fs() no longer depends on bucket marksKent Overstreet6-106/+19
Now that bch2_bucket_alloc_new_fs() isn't looking at bucket marks to decide what buckets are eligible to allocate, we can clean up the filesystem initialization and device add paths. Previously, we had to use ancient code to mark superblock/journal buckets in the in memory bucket marks as we allocated them, and then zero that out and re-do that marking using the newer transational bucket mark paths. Now, we can simply delete the in-memory bucket marking. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Rewrite bch2_bucket_alloc_new_fs()Kent Overstreet6-14/+46
This changes bch2_bucket_alloc_new_fs() to a simple bump allocator that doesn't need to use the in memory bucket array, part of a larger patch series to entirely get rid of the in memory bucket array, except for gc/fsck. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Kill non-lru cache replacement policiesKent Overstreet7-129/+2
Prep work for persistent LRUs and getting rid of the in memory bucket array. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Fix a null ptr deref in bch2_inode_delete_keys()Kent Overstreet1-1/+5
Similarly to bch2_btree_delete_range_trans(), bch2_inode_delete_keys() may sometimes split compressed extents, and needs to pass in a disk reservation. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Turn encoded_extent_max into a regular optionKent Overstreet7-22/+25
It'll now be handled at format time and in sysfs like other options - it still can only be set at format time, though. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Option improvementsKent Overstreet17-122/+205
This adds flags for options that must be a power of two (block size and btree node size), and options that are stored in the superblock as a power of two (encoded extent max). Also: options are now stored in memory in the same units they're displayed in (bytes): we now convert when getting and setting from the superblock. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Fix debugfs -bfloat-failedKent Overstreet1-1/+3
It wasn't updated for snapshots - it's iterating across keys in all snapshots, so needs to be specifying BTREE_ITER_ALL_SNAPSHOTS. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: BTREE_ITER_NOPRESERVEKent Overstreet5-16/+16
This adds a flag to not mark the initial btree_path as preserve, for paths that we expect to be cheap to reconstitute if necessary - this solves a btree_path overflow caused by need_whiteout_for_snapshot(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Fix some shutdown path bugsKent Overstreet3-9/+16
This fixes some bugs when we hit an error very early in the filesystem startup path, before most things have been initialized. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Optimize memory accesses in bch2_btree_node_get()Kent Overstreet1-9/+10
This puts a load behind some branches before where it's used, so that it can execute in parallel with other loads. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Make sure bch2_bucket_alloc_new_fs() obeys buckets_nouseKent Overstreet1-0/+1
This fixes the filesystem migrate tool. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Kill some obsolete sysfs codeKent Overstreet1-27/+7
fs internal/alloc_debug doesn't show anything bcachefs fs usage shows. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Don't call bch2_bkey_transform() unnecessarilyKent Overstreet1-2/+5
If the packed format isn't changing, there's no need to call bch2_bkey_transform(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Kill bch2_sort_repack_merge()Kent Overstreet3-73/+4
The main function of bch2_sort_repack_merge() was to call .key_normalize on every key, which drops stale (cached) pointers - it hasn't actually merged extents in quite some time. But bch2_gc_gens() now works on individual keys - we used to gc old gens by rewriting entire btree nodes. With that gone, there's no need for internal btree code to be calling .key_normalize anymore. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Split out CONFIG_BCACHEFS_DEBUG_TRANSACTIONSKent Overstreet3-19/+23
This puts the btree_transactions sysfs/debugfs file behind a separate config option - it's highly useful, but not cheap enough to enable permenantly. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Fix an assertion in bch2_truncate()Kent Overstreet1-1/+2
We recently added an assertion that when we truncate a file to 0, i_blocks should also go to 0 - but that's not necessarily true if we're doing an emergency shutdown, lots of invariants no longer hold true in that case. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Fix debug build in userspaceKent Overstreet3-12/+3
This fixes some compiler warnings that only trigger in userspace - dead code, a maybe uninitialed variable, a maybe null ptr passed to printk. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Add missing bch2_trans_iter_exit() callKent Overstreet1-0/+2
This fixes a bug where the filesystem goes read only when reading from debugfs. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Improve alloc_mem_to_key()Kent Overstreet3-40/+25
This moves some common code into alloc_mem_to_key(), which translates from the in-memory format for a bucket to the btree key format. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: bch2_alloc_write()Kent Overstreet4-65/+49
This adds a new helper that much like the one we have for inode updates, that allocates the packed alloc key, packs it and calls bch2_trans_update. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Add more time_statsKent Overstreet10-14/+52
This adds more latency/event measurements and breaks some apart into more events. Journal writes are broken apart into flush writes and noflush writes, btree compactions are broken out from btree splits, btree mergers are added, as well as btree_interior_updates - foreground and total. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Print out OPT_SECTORS options in bytesKent Overstreet1-1/+1
This matches the conversion the parsing code does. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Fix null ptr deref in fsck_inode_rm()Kent Overstreet1-1/+4
bch2_btree_delete_range() can split compressed extents, thus needs to pass in a disk reservation when we're operating on extents btrees. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Don't erasure code cached ptrsKent Overstreet3-12/+23
It doesn't make much sense to be erasure coding cached pointers, we should be erasure coding one of the dirty pointers in an extent. This patch makes sure we're passing BCH_WRITE_CACHED when we expect the new pointer to be a cached pointer, and tweaks the write path to not allocate from a stripe when BCH_WRITE_CACHED is set - and fixes an assertion we were hitting in the ec path where when adding the stripe to an extent and deleting the other pointers the pointer to the stripe didn't exist (because dropping all dirty pointers from an extent turns it into a KEY_TYPE_error key). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Split out struct gc_stripe from struct stripeKent Overstreet7-215/+176
We have two radix trees of stripes - one that mirrors some information from the stripes btree in normal operation, and another that GC uses to recalculate block usage counts. The normal one is now only used for finding partially empty stripes in order to reuse them - the normal stripes radix tree and the GC stripes radix tree are used significantly differently, so this patch splits them into separate types. In an upcoming patch we'll be replacing c->stripes with a btree that indexes stripes by the order we want to reuse them. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>