summaryrefslogtreecommitdiff
path: root/fs/bcachefs/buckets.c
AgeCommit message (Collapse)AuthorFilesLines
2023-10-23bcachefs: Perf improvements for bch_alloc_read()Kent Overstreet1-2/+2
On large filesystems reading in the alloc info takes a significant amount of time. But we don't need to be calling into the fully general bch2_mark_key() path, just open code what we need in bch2_alloc_read_fn(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Fix off-by-one error in ptr gen checkKent Overstreet1-1/+1
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Delete unused argumentsKent Overstreet1-1/+1
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Make copygc thread globalKent Overstreet1-16/+3
Per device copygc threads don't move data to different devices and they make fragmentation works - they don't make much sense anymore. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Use x-macros for data typesKent Overstreet1-20/+20
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Improve stripe triggers/heap codeKent Overstreet1-51/+52
Soon we'll be able to modify existing stripes - replacing empty blocks with new blocks and new p/q blocks. This patch updates the trigger code to handle pointers changing in an existing stripe; also, it significantly improves how the stripes heap works, which means we can get rid of the stripe creation/deletion lock. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Rework triggers interfaceKent Overstreet1-130/+167
The trigger for stripe keys is shortly going to need both the old and the new key passed to the trigger - this patch does that rework. For now, this just changes the in memory triggers, and this doesn't change how extent triggers work. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Kill BTREE_TRIGGER_NOOVERWRITESKent Overstreet1-7/+1
This is prep work for reworking the triggers machinery - we have triggers that need to know both the old and the new key. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Fix bch2_extent_can_insert() not being calledKent Overstreet1-21/+27
It's supposed to check whether we're splitting a compressed extent and if so get a bigger disk reservation - hence this fixes a "disk usage increased by x without a reservaiton" bug. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Track sectors of erasure coded dataKent Overstreet1-10/+18
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Fix a deadlock in the RO pathKent Overstreet1-1/+4
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: delete a slightly faulty assertionKent Overstreet1-2/+0
state lock isn't held at startup Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Use cached iterators for alloc btreeKent Overstreet1-41/+44
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Btree key cacheKent Overstreet1-0/+7
This introduces a new kind of btree iterator, cached iterators, which point to keys cached in a hash table. The cache also acts as a write cache - in the update path, we journal the update but defer updating the btree until the cached entry is flushed by journal reclaim. Cache coherency is for now up to the users to handle, which isn't ideal but should be good enough for now. These new iterators will be used for updating inodes and alloc info (the alloc and stripes btrees). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Turn c->state_lock into an rwsemKent Overstreet1-4/+3
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Always increment bucket gen on bucket reuseKent Overstreet1-11/+19
Not doing so confuses copygc Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Improve assorted error messagesKent Overstreet1-127/+116
This also consolidates the various checks in bch2_mark_pointer() and bch2_trans_mark_pointer(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Don't require alloc btree to be updated before buckets are usedKent Overstreet1-14/+33
This is to break a circular dependency in the shutdown path. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Interior btree updates are now fully transactionalKent Overstreet1-1/+1
We now update the alloc info (bucket sector counts) atomically with journalling the update to the interior btree nodes, and we also set new btree roots atomically with the journalled part of the btree update. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Better error messages on bucket sector count overflowsKent Overstreet1-16/+23
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Don't use peek_filter() unnecessarilyKent Overstreet1-6/+3
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Move extent overwrite handling out of core btree codeKent Overstreet1-6/+7
Ever since the btree code was first written, handling of overwriting existing extents - including partially overwriting and splittin existing extents - was handled as part of the core btree insert path. The modern transaction and iterator infrastructure didn't exist then, so that was the only way for it to be done. This patch moves that outside of the core btree code to a pass that runs at transaction commit time. This is a significant simplification to the btree code and overall reduction in code size, but more importantly it gets us much closer to the core btree code being completely independent of extents and is important prep work for snapshots. This introduces a new feature bit; the old and new extent update models are incompatible when the filesystem needs journal replay. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: More btree iter invariantsKent Overstreet1-3/+5
Ensure that iter->pos always lies between the start and end of iter->k (the last key returned). Also, bch2_btree_iter_set_pos() now invalidates the key that peek() or next() returned. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Fix error message on bucket sector count overflowKent Overstreet1-6/+4
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: btree_ptr_v2Kent Overstreet1-0/+2
Add a new btree ptr type which contains the sequence number (random 64 bit cookie, actually) for that btree node - this lets us verify that when we read in a btree node it really is the btree node we wanted. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Sort & deduplicate updates in bch2_trans_update()Kent Overstreet1-41/+17
Previously, when doing multiple update in the same transaction commit that overwrote each other, we relied on doing the updates in the same order as the bch2_trans_update() calls in order to get the correct result. But that wasn't correct for triggers; bch2_trans_mark_update() when marking overwrites would do the wrong thing because it hadn't seen the update that was being overwritten. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Split out btree_trigger_flagsKent Overstreet1-35/+42
The trigger flags really belong with individual btree_insert_entries, not the transaction commit flags - this splits out those flags and unifies them with the BCH_BUCKET_MARK flags. Todo - split out btree_trigger.c from buckets.c Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Make btree_insert_entry more private to update pathKent Overstreet1-6/+6
This should be private to btree_update_leaf.c, and we might end up removing it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Don't BUG_ON() sector count overflowKent Overstreet1-12/+14
Return an error instead (still work in progress...) Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Inline more of bch2_trans_commit hot pathKent Overstreet1-15/+33
The main optimization here is that if we let bch2_replicas_delta_list_apply() fail, we can completely skip calling bch2_bkey_replicas_marked_locked(). And assorted other small optimizations. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Trust btree alloc info at runtimeKent Overstreet1-1/+1
This lets us avoid a cache miss in the write path. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Make replicas_delta_list smallerKent Overstreet1-7/+11
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Fix erasure coding disk space accountingKent Overstreet1-48/+84
Disk space accounting for erasure coding + compression was completely broken - we need to calculate the parity sectors delta the same way we calculate disk_sectors, by calculating the old and new usage and subtracting to get the difference. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Limit pointers to being in only one stripeKent Overstreet1-17/+11
This make the disk accounting code saner, and it's not clear why we'd ever want the same data to be in multiple stripes simultaneously. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Fix bch2_mark_extent()Kent Overstreet1-1/+2
If an extent only contained cached or erasure coded pointers, there won't be any devices in the normal dirty replicas list or an entry to update. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Rework btree iterator lifetimesKent Overstreet1-4/+2
The btree_trans struct needs to memoize/cache btree iterators, so that on transaction restart we don't have to completely redo btree lookups, and so that we can do them all at once in the correct order when the transaction had to restart to avoid a deadlock. This switches the btree iterator lookups to work based on iterator position, instead of trying to match them up based on the stack trace. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Kill deferred btree updatesKent Overstreet1-4/+4
Will be replaced by cached btree iterators Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Improved bch2_fcollapse()Kent Overstreet1-0/+1
Move extents instead of copying them - this way, we can iterate over only live extents, not the entire keyspace. Also, this means we can mostly skip running triggers. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Do updates in order they were queued up inKent Overstreet1-16/+7
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Kill BTREE_INSERT_NOMARK_INSERTKent Overstreet1-5/+4
Was dead code Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Fix BTREE_INSERT_NOMARK_OVERWRITESKent Overstreet1-0/+3
bch2_mark_update() was correct, but bch2_trans_mark_update() wasn't respecting BTREE_INSERT_NOMARK_OVERWRITES - key marking/triggers really need to be cleaned up. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Improve pointer marking checks and error messagesKent Overstreet1-21/+44
Importantly, we don't want to use bch2_fs_inconsistent_on() for errors that fsck can repair, becuase that will just put us in RO mode and prevent fsck from actually fixing stuff. Probably want to get rid of it in the future. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Fix error message on bucket overflowKent Overstreet1-8/+10
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Fixes for replicas trackingKent Overstreet1-14/+9
The continue statement in bch2_trans_mark_extent() was wrong - by bailing out early, we'd be constructing the wrong replicas list to update. Also, the assertion in update_replicas() was wrong - due to rounding with compressed extents, it is possible for sectors to be 0 sometimes. Also, change extent_to_replicas() in replicas.c to match the replicas list we construct in buckets.c. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Refactor bch2_alloc_write()Kent Overstreet1-17/+0
Major simplification - gets rid of the need for marking buckets as dirty, instead we write buckets if the in memory mark is different from what's in the btree. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Trust in memory bucket markKent Overstreet1-12/+31
This fixes a bug in the journal replay -> extent_replay_key -> split_compressed path, when we do an update that changes alloc info but the alloc info in the btree isn't up to date yet. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Fix faulty assertionKent Overstreet1-2/+6
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: ReflinkKent Overstreet1-7/+93
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Refactor bch2_extent_trim_atomic() for reflinkKent Overstreet1-6/+0
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Rework calling convention for marking overwritesKent Overstreet1-61/+67
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>