path: root/fs/bcachefs
Age  Commit message  Author  Files  Lines
2025-03-15  bcachefs: BCH_SB_FEATURES_ALL includes BCH_FEATURE_incompat_version_field  Kent Overstreet  2 files  -3/+3
These features are set on format and on incompat upgrade. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
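For reference, BCH_SB_FEATURES_ALL is a bitmask OR-ing together feature-bit numbers, and this change adds the new bit to that mask. A minimal sketch of the pattern (the other feature names shown are illustrative, not the full list):

    /* sketch only: the set of bits shown here is illustrative */
    #define BCH_SB_FEATURES_ALL                             \
            ((1ULL << BCH_FEATURE_new_siphash)|             \
             (1ULL << BCH_FEATURE_btree_ptr_v2)|            \
             (1ULL << BCH_FEATURE_incompat_version_field))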
2025-03-15  bcachefs: sysfs internal/trigger_btree_updates  Kent Overstreet  1 file  -0/+5
Add a debug knob to manually trigger the btree updates worker. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-15  bcachefs: bcachefs_metadata_version_casefolding  Joshua Ashton  14 files  -42/+267
This patch implements support for case-insensitive file name lookups in bcachefs. The implementation uses the same UTF-8 lowering and normalization that ext4 and f2fs use. More information is provided in Documentation/bcachefs/casefolding.rst. Compatibility notes: This uses the new versioning scheme for incompatible features, where an incompatible feature is tied to a version number: the superblock says "we may use incompat features up to x" and "incompat features up to x are in use", disallowing mounting by previous versions. Additionally, an old-style incompat feature bit is used, so that kernels without utf8 casefolding support know whether casefolding specifically is in use and whether they're allowed to mount. Signed-off-by: Joshua Ashton <joshua@froggi.es> Cc: André Almeida <andrealmeid@igalia.com> Cc: Gabriel Krisman Bertazi <krisman@suse.de> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
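A minimal sketch of the version gating described above, with hypothetical struct and function names (the real fields are accessors over struct bch_sb):

    /* hypothetical sketch: two superblock fields gate incompat features */
    struct sb_versions {
            unsigned version_incompat;         /* "incompat features up to x are in use" */
            unsigned version_incompat_allowed; /* "we may use incompat features up to x" */
    };

    /* older kernels refuse to mount if newer incompat features are in use */
    static bool can_mount(const struct sb_versions *sb, unsigned supported_version)
    {
            return sb->version_incompat <= supported_version;
    }

    /* using a feature bumps version_incompat, but only if allowed */
    static bool request_incompat_feature(struct sb_versions *sb, unsigned feature_version)
    {
            if (feature_version > sb->version_incompat_allowed)
                    return false;
            if (feature_version > sb->version_incompat)
                    sb->version_incompat = feature_version;
            return true;
    }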
2025-03-15  bcachefs: Split out dirent alloc and name initialization  Joshua Ashton  1 file  -12/+34
Split out the code that allocates the dirent and initializes the name, making it easier to implement casefolding in a future commit. Cc: André Almeida <andrealmeid@igalia.com> Cc: Gabriel Krisman Bertazi <krisman@suse.de> Signed-off-by: Joshua Ashton <joshua@froggi.es> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-15  bcachefs: Kill dirent_occupied_size() in create path  Kent Overstreet  3 files  -12/+15
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-15  bcachefs: Kill dirent_occupied_size() in rename path  Kent Overstreet  3 files  -6/+14
Cc: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-15  bcachefs: bcachefs_metadata_version_stripe_lru  Kent Overstreet  7 files  -5/+99
Add a persistent LRU for stripes, ordered by "number of empty blocks", i.e. order in which we wish to reuse them. This will replace the in-memory stripes heap, so we can kill off reading stripes into memory at startup. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
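As a sketch of the ordering, an LRU btree entry can encode its sort priority in the key itself, so a plain forward scan of the LRU btree visits stripes in reuse order; the field layout below is hypothetical:

    /* hypothetical key encoding: high bits order by empty-block count,
     * low bits break ties by stripe index, so iterating the LRU btree
     * in key order yields stripes in the order we wish to reuse them */
    static inline u64 stripe_lru_key(u64 nr_empty_blocks, u64 stripe_idx)
    {
            return (nr_empty_blocks << 32) | (stripe_idx & 0xffffffffULL);
    }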
2025-03-15  bcachefs: bcachefs_metadata_version_stripe_backpointers  Kent Overstreet  5 files  -4/+36
Stripes now have backpointers. This is needed for proper scrub - stripe checksums need to be verified, separately from extents within the stripe, since a block may not be full of live extents but is still needed for reconstruction. This will also be needed for (efficient) evacuate/repair paths. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-15  bcachefs: Advance bch_alloc.oldest_gen if no stale pointers  Kent Overstreet  1 file  -0/+3
Now that we've got cached backpointers and aren't leaving around stale pointers on bucket invalidation, we no longer need the periodic (rare) gc_gens - which recalculates each bucket's oldest gen to avoid wraparound. We can't delete that code because we've got to support existing filesystems that will still have stale pointers, but this gets rid of another scalability limit. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
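Bucket generations are 8-bit and wrap, so "older"/"newer" comparisons use signed modular arithmetic; a sketch in the style of bcachefs's gen_cmp()/gen_after() helpers:

    /* sketch: valid as long as two live gens are never more than 127
     * apart, which is what gc_gens (and now oldest_gen advancement)
     * exists to guarantee */
    static inline int gen_cmp(u8 a, u8 b)
    {
            return (s8) (a - b);    /* > 0 iff a is newer than b, mod 256 */
    }

For example, with a = 5 and b = 250, (s8)(5 - 250) == 11 > 0, so gen 5 is correctly treated as newer despite the wraparound.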
2025-03-15  bcachefs: Invalidate cached data by backpointers  Kent Overstreet  1 file  -23/+79
If we don't leave stale pointers around, we won't have to deal with bucket gen wraparound. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-15  bcachefs: bcachefs_metadata_version_cached_backpointers  Kent Overstreet  4 files  -14/+16
Cached pointers now have backpointers. This means that we'll be able to kill cached pointers in the bucket_invalidate path, when invalidating/reusing buckets containing cached data, instead of leaving them around to be cleaned up by gc_gens garbage collection - which requires a full metadata scan. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-15  bcachefs: rework bch2_trans_commit_run_triggers()  Kent Overstreet  2 files  -57/+34
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-15  bcachefs: Better trigger ordering  Kent Overstreet  2 files  -0/+14
Transactional triggers need to run in a defined ordering, which is not quite the same as btree ID integer comparison. Previously this was handled in a hacky way in bch2_trans_commit_run_triggers(), since it was only the alloc btree that needed special handling, but upcoming stripe btree changes are going to require more ordering changes - so, define that ordering. Next patch will change the transaction commit path to use it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
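A hypothetical sketch of such an ordering function (the real patch's mapping may differ; the point is that alloc, and soon stripes, must run after everything else):

    /* hypothetical: map btree IDs to an explicit trigger ordering
     * instead of special-casing alloc in the commit path */
    static inline unsigned btree_trigger_order(enum btree_id btree)
    {
            switch (btree) {
            case BTREE_ID_alloc:
                    return 255;     /* alloc triggers always run last */
            case BTREE_ID_stripes:
                    return 254;     /* stripes run just before alloc */
            default:
                    return btree;   /* everything else in btree ID order */
            }
    }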
2025-03-15  bcachefs: bch2_trigger_stripe_ptr() no longer uses ec_stripes_heap_lock  Kent Overstreet  5 files  -33/+46
Introduce per-entry locks, like with struct bucket - the stripes heap is going away. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
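A sketch of the per-entry locking pattern, assuming a bit spinlock embedded in each entry the way struct bucket does it (names here are hypothetical):

    #include <linux/bit_spinlock.h>

    /* hypothetical per-entry stripe state with its own lock bit */
    struct stripe_entry_sketch {
            unsigned long   lock;   /* bit 0 doubles as a bit spinlock */
            unsigned        blocks_nonempty;
    };

    static inline void stripe_entry_lock(struct stripe_entry_sketch *s)
    {
            bit_spin_lock(0, &s->lock);
    }

    static inline void stripe_entry_unlock(struct stripe_entry_sketch *s)
    {
            bit_spin_unlock(0, &s->lock);
    }

This trades one global lock for fine-grained locks that cost no extra memory, at the price of only protecting one entry at a time.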
2025-03-15  bcachefs: Rework bch2_check_lru_key()  Kent Overstreet  1 file  -30/+47
It's now easier to add new LRU types. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-15  bcachefs: decouple bch2_lru_check_set() from alloc btree  Kent Overstreet  3 files  -7/+10
Pass in the backpointer explicitly, instead of assuming 'referring_k' is an alloc key and calculating it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-15  bcachefs: s/BCH_LRU_FRAGMENTATION_START/BCH_LRU_BUCKET_FRAGMENTATION/  Kent Overstreet  4 files  -6/+6
FRAGMENTATION_START was incorrect; there's currently only one fragmentation LRU (at the end of the reserved bits for LRU type), and we're getting ready to add a stripe fragmentation LRU - so give it a better name. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-15  bcachefs: bch2_lru_change() checks for no-op  Kent Overstreet  3 files  -23/+26
Minor cleanup; no reason for the caller to have to do this. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-15  bcachefs: minor journal errcode cleanup  Kent Overstreet  2 files  -5/+4
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-15  bcachefs: bch2_write_op_error() now prints info about data update  Kent Overstreet  5 files  -35/+80
A user has been seeing the "error verifying existing checksum while rewriting existing data (memory corruption?)" error. This generally indicates a hardware issue (and that may be the case here), but it might also indicate a bug, in which case we need more information to look for patterns. Reported-by: Roland Vet <vet.roland@protonmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-15  bcachefs: metadata_target is not an inode option  Kent Overstreet  1 file  -1/+1
This option only applies filesystem wide. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-15  bcachefs: eytzinger1_{next,prev} cleanup  Andreas Gruenbacher  2 files  -26/+4
The eytzinger code was previously relying on the following wrap-around properties and their "eytzinger0" equivalents: eytzinger1_prev(0, size) == eytzinger1_last(size) eytzinger1_next(0, size) == eytzinger1_first(size) However, these properties are no longer relied upon and no longer necessary, so remove the corresponding asserts and forbid the use of eytzinger1_prev(0, size) and eytzinger1_next(0, size). This allows us to further simplify the code in eytzinger1_next() and eytzinger1_prev(): where the left shifting happens, eytzinger1_next() is trying to move i to the lowest child on the left, which is equivalent to doubling i until the next doubling would cause it to be greater than size. This is implemented by shifting i to the left so that the most significant bits align and then shifting i to the right by one if the result is greater than size. Likewise, eytzinger1_prev() is trying to move to the lowest child on the right; the same applies here. The 1-offset in (size - 1) in eytzinger1_next() isn't needed at all, but the equivalent offset in eytzinger1_prev() is surprisingly needed to preserve the 'eytzinger1_prev(0, size) == eytzinger1_last(size)' property. However, since we no longer support that property, we can get rid of these offsets as well. This saves one addition in each function and makes the code less confusing. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
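Putting the description above together, the simplified eytzinger1_next() looks roughly like this (a sketch; __fls() and ffz() are the kernel's find-last-set / find-first-zero bit helpers):

    static inline unsigned eytzinger1_next(unsigned i, unsigned size)
    {
            if (2 * i + 1 <= size) {        /* right child exists */
                    i = 2 * i + 1;          /* go right... */
                    /* ...then as far left as possible: align i's top
                     * bit with size's, stepping back if we overshoot */
                    i <<= __fls(size) - __fls(i);
                    i >>= i > size;
            } else {
                    /* no right child: pop trailing 1-bits (right-child
                     * edges) plus one more level to reach the successor */
                    i >>= ffz(i) + 1;
            }
            return i;                       /* 0 means past the end */
    }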
2025-03-15  bcachefs: convert eytzinger sort to be 1-based (2)  Andreas Gruenbacher  1 file  -21/+21
In this second step, transform the eytzinger indexes i, j, and k in eytzinger1_sort_r() from 0-based to 1-based. This step looks a bit messy, but the resulting code is slightly better. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-15  bcachefs: convert eytzinger sort to be 1-based (1)  Andreas Gruenbacher  1 file  -19/+29
In this first step, convert the eytzinger sort functions to use 1-based primitives. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-15  bcachefs: convert eytzinger0_find to be 1-based  Andreas Gruenbacher  1 file  -7/+7
Several of the algorithms on eytzinger trees are implemented in terms of the eytzinger0 primitives. However, those algorithms can just as easily be expressed in terms of the eytzinger1 primitives, and that leads to better and easier to understand code. Start by converting eytzinger0_find(). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
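One trick that makes this clean is biasing the base pointer by one element, so that 1-based tree indices address the 0-based array directly; a sketch over a plain int array:

    /* sketch: binary search over an int array in 0-based eytzinger
     * layout, written with 1-based tree arithmetic */
    static int eytzinger0_find_sketch(const int *base, unsigned nr, int search)
    {
            const int *base1 = base - 1;    /* base1[i] == base[i - 1] */
            unsigned i = 1;                 /* 1-based root */

            while (i <= nr && base1[i] != search)
                    i = 2 * i + (base1[i] < search); /* left: 2i, right: 2i+1 */

            return i - 1;   /* 0-based result; >= nr means not found */
    }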
2025-03-15  bcachefs: Add eytzinger0_find self test  Andreas Gruenbacher  1 file  -0/+29
Function eytzinger0_find() isn't currently covered, so add a self test. We can rely on eytzinger0_find_le() here because it is being tested independently. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-15  bcachefs: add eytzinger0_find_ge self test  Andreas Gruenbacher  1 file  -0/+38
Add an eytzinger0_find_ge() self test similar to eytzinger0_find_gt(). Note that this test requires eytzinger0_find_ge() to return the first matching element in the array in case of duplicates. To prevent bisection errors, we only add this test after strengthening the original implementation (see the previous commit). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-15  bcachefs: implement eytzinger0_find_ge directly  Andreas Gruenbacher  1 file  -5/+7
Implement eytzinger0_find_ge() directly instead of implementing it in terms of eytzinger0_find_le() and adjusting the result. This turns eytzinger0_find_ge() into a minimum search, so when there are duplicate elements, the result of eytzinger0_find_ge() will now always point at the first matching element. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
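Expressed as a sketch over a plain int array, the direct implementation records the best candidate while descending; with duplicates it ends on the leftmost match:

    /* sketch: smallest element >= search; base1 points one element
     * *before* the array so it can be indexed 1-based */
    static unsigned eytzinger1_find_ge_sketch(const int *base1, unsigned nr,
                                              int search)
    {
            unsigned i = 1, candidate = 0;

            while (i <= nr) {
                    if (base1[i] >= search) {
                            candidate = i;  /* good enough; look left for smaller */
                            i = 2 * i;
                    } else {
                            i = 2 * i + 1;  /* too small; go right */
                    }
            }
            return candidate;       /* 0: nothing >= search */
    }

Because equal elements send the descent left, ties resolve to the first matching element, which is exactly what the new self test relies on.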
2025-03-15  bcachefs: implement eytzinger0_find_gt directly  Andreas Gruenbacher  1 file  -10/+7
Instead of implementing eytzinger0_find_gt() in terms of eytzinger0_find_le() and adjusting the result, implement it directly. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-15  bcachefs: add eytzinger0_find_gt self test  Andreas Gruenbacher  1 file  -0/+38
Add an eytzinger0_find_gt() self test similar to eytzinger0_find_le(). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-15  bcachefs: simplify eytzinger0_find_le  Andreas Gruenbacher  1 file  -20/+6
Replace the over-complicated implementation of eytzinger0_find_le() by an equivalent, simpler version. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-15  bcachefs: convert eytzinger0_find_le to be 1-based  Andreas Gruenbacher  1 file  -8/+8
eytzinger0_find_le() is also easy to convert to 1-based eytzinger (but see the next commit). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-15  bcachefs: improve eytzinger0_find_le self test  Andreas Gruenbacher  1 file  -11/+30
Rename eytzinger0_find_test_val() to eytzinger0_find_test_le() and add a new eytzinger0_find_test_val() wrapper that calls it. We have already established that the array is sorted in eytzinger order, so we can use the eytzinger iterator functions and check the boundary conditions to verify the result of eytzinger0_find_le(). Only scan the entire array if we get an incorrect result. When we need to scan, use eytzinger0_for_each_prev() so that we'll stop at the highest matching element in the array in case there are duplicates; going through the array linearly wouldn't give us that. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-15  bcachefs: add eytzinger0_for_each_prev  Andreas Gruenbacher  2 files  -0/+14
Add an eytzinger0_for_each_prev() macro for iterating through an eytzinger array in reverse. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
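A sketch of the macro, mirroring the existing eytzinger0_for_each() and assuming eytzinger0_prev() returns -1 once it walks past the first in-order element:

    #define eytzinger0_for_each_prev(_i, _size)             \
            for (unsigned (_i) = eytzinger0_last((_size));  \
                 (_i) != -1;                                \
                 (_i) = eytzinger0_prev((_i), (_size)))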
2025-03-15  bcachefs: eytzinger0_find_test improvement  Andreas Gruenbacher  1 file  -3/+6
In eytzinger0_find_test(), remember the smallest element seen so far instead of comparing adjacent array elements. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-15  bcachefs: eytzinger[01]_test improvement  Andreas Gruenbacher  1 file  -0/+2
In eytzinger[01]_test(), make sure that eytzinger[01]_for_each() iterates over all array elements. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-15  bcachefs: eytzinger self tests: fix cmp_u16 typo  Andreas Gruenbacher  1 file  -1/+1
Fix an obvious typo in cmp_u16(). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-15  bcachefs: eytzinger self tests: missing newline termination  Andreas Gruenbacher  1 file  -7/+7
pr_info() format strings need to be newline terminated. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-15  bcachefs: eytzinger self tests: loop cleanups  Andreas Gruenbacher  1 file  -11/+11
The iterator variable of eytzinger0_for_each() loops has been changed to be locally scoped at some point, so remove variables defined outside the loop that are now unused. In addition and for clarity, use a different variable inside those loops where an outside variable would be shadowed. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-15  bcachefs: EYTZINGER_DEBUG fix  Andreas Gruenbacher  1 file  -0/+1
When EYTZINGER_DEBUG is defined, <linux/bug.h> needs to be included. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-15  bcachefs: bch2_blacklist_entries_gc cleanup  Andreas Gruenbacher  1 file  -4/+3
Use an eytzinger0_for_each() loop here. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-15  bcachefs: bch2_bkey_ptr_data_type() now correctly returns cached for cached ptrs  Kent Overstreet  2 files  -2/+7
Necessary for adding backpointers for cached pointers. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-15  bcachefs: Add time_stat for btree writes  Kent Overstreet  3 files  -5/+13
We have other metadata IO types covered; this was missing. Note: this includes the time until completion, i.e. including the parent pointer update. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
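Usage follows the existing time_stats pattern: sample a start time when the write is issued and fold the delta in on completion. A sketch, where the BCH_TIME_btree_node_write counter name is an assumption:

    /* sketch: time a btree node write from submission through
     * completion (including the parent pointer update) */
    static void btree_node_write_sketch(struct bch_fs *c)
    {
            u64 start_time = local_clock();

            /* ... issue the write, wait for completion + parent update ... */

            /* BCH_TIME_btree_node_write is an assumed counter name */
            bch2_time_stats_update(&c->times[BCH_TIME_btree_node_write],
                                   start_time);
    }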
2025-03-15  bcachefs: Add comment explaining why asserts in invalidate_one_bucket() are impossible  Kent Overstreet  1 file  -0/+7
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-15  bcachefs: Ignore backpointers to stripes in ec_stripe_update_extents()  Kent Overstreet  1 file  -1/+5
Prep work for stripe backpointers: this path previously would get very confused when asked to process (remove redundant replicas from) stripes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-15  bcachefs: Increase JOURNAL_BUF_NR  Kent Overstreet  4 files  -15/+61
Increase journal pipelining. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-15  bcachefs: Free journal bufs when not in use  Kent Overstreet  3 files  -18/+87
Since we're increasing the number of 'struct journal_bufs', we don't want them all permanently holding onto buffers for the journal data - that'd be 16 * 2MB = 32MB, or potentially more. Add a single-element mempool (open coded, since buffer size varies); this also means we won't be hitting the memory allocator every time we open and close a journal entry/buffer. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
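A hypothetical sketch of such an open-coded single-element pool for variably-sized buffers: keep at most one free buffer cached, fall back to the allocator otherwise.

    struct onebuf_pool {
            spinlock_t      lock;
            void            *buf;           /* at most one cached buffer */
            size_t          buf_size;
    };

    static void *onebuf_alloc(struct onebuf_pool *p, size_t size)
    {
            void *ret = NULL;

            spin_lock(&p->lock);
            if (p->buf && p->buf_size >= size) {
                    ret = p->buf;           /* reuse the cached buffer */
                    p->buf = NULL;
            }
            spin_unlock(&p->lock);

            return ret ?: kvmalloc(size, GFP_KERNEL);
    }

    static void onebuf_free(struct onebuf_pool *p, void *buf, size_t size)
    {
            spin_lock(&p->lock);
            if (!p->buf) {
                    p->buf = buf;           /* stash for reuse */
                    p->buf_size = size;
                    buf = NULL;
            }
            spin_unlock(&p->lock);

            kvfree(buf);                    /* slot occupied: really free */
    }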
2025-03-15  bcachefs: Don't touch journal_buf->data->seq in journal_res_get  Kent Overstreet  1 file  -1/+4
This is a small optimization, reducing the number of cachelines we touch in the fast path - and it's also necessary for the next patch that increases JOURNAL_BUF_NR. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-15  bcachefs: Kill journal_res.idx  Kent Overstreet  3 files  -7/+7
More dead code. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-15  bcachefs: Kill journal_res_state.unwritten_idx  Kent Overstreet  3 files  -18/+5
Dead code. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>