summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2023-10-23bcachefs: Use x-macros for data typesKent Overstreet21-108/+110
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Fix short buffered writesKent Overstreet1-10/+11
In the buffered write path, we have to check for short writes that write to the full page, where the page wasn't UpToDate; when this happens, the page is partly garbage, so we have to zero it out and revert that part of the write. This check was wrong - we reverted total from copied, but didn't revert the iov_iter, probably also leading to corrupted writes. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Allow existing stripes to be updated with new data bucketsKent Overstreet2-5/+98
This solves internal fragmentation within stripes. We already have copygc, which evacuates buckets that are partially or mostly empty, but it's up to the ec code that manages stripes to deal with stripes that have empty buckets in them. This patch changes the path for creating new stripes to check if there's existing stripes with empty buckets - and if so, update them with new data buckets instead of creating new stripes. TODO: improve the disk space accounting so that we can only use this (more expensive path) when we have too much fragmentation in existing stripes. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Refactor stripe creationKent Overstreet4-139/+180
Prep work for the patch to update existing stripes with new data blocks. This moves allocating new stripes into ec.c, and also sets up the data structures so that we can handly only allocating some of the blocks in a stripe. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Move stripe creation to workqueueKent Overstreet6-61/+82
This is mainly to solve a lock ordering issue, and also simplifies the code a bit. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Improve stripe triggers/heap codeKent Overstreet8-98/+146
Soon we'll be able to modify existing stripes - replacing empty blocks with new blocks and new p/q blocks. This patch updates the trigger code to handle pointers changing in an existing stripe; also, it significantly improves how the stripes heap works, which means we can get rid of the stripe creation/deletion lock. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Rework triggers interfaceKent Overstreet2-132/+169
The trigger for stripe keys is shortly going to need both the old and the new key passed to the trigger - this patch does that rework. For now, this just changes the in memory triggers, and this doesn't change how extent triggers work. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Kill BTREE_TRIGGER_NOOVERWRITESKent Overstreet4-15/+11
This is prep work for reworking the triggers machinery - we have triggers that need to know both the old and the new key. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Mark btree nodes as needing rewrite when not all replicas are RWKent Overstreet4-13/+19
This fixes a bug where recovery fails when one of the devices is read only. Also - consolidate the "must rewrite this node to insert it" behind a new btree node flag. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Use blk_status_to_str()Kent Overstreet5-8/+15
Improved error messages are always a good thing Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Don't cap ios in dio write path at 2 MBKent Overstreet1-10/+0
It appears this was erronious, a different bug was responsible Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Refactor dio write code to reinit bch_write_opKent Overstreet3-47/+35
This fixes a bug where the BCH_WRITE_SKIP_CLOSURE_PUT was set incorrectly, causing the completion to be delivered multiple times. oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Fix bch2_extent_can_insert() not being calledKent Overstreet3-35/+49
It's supposed to check whether we're splitting a compressed extent and if so get a bigger disk reservation - hence this fixes a "disk usage increased by x without a reservaiton" bug. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Fix a null ptr deref in bch2_btree_iter_traverse_one()Kent Overstreet1-1/+1
We use sentinal values that aren't NULL to indicate there's a btree node at a higher level; occasionally, this may result in btree_iter_up_until_good_node() stopping at one of those sentinal values. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Track sectors of erasure coded dataKent Overstreet5-14/+33
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Use btree reserve when appropriateKent Overstreet1-3/+3
Whenever we're doing an update that has pointers, that generally means we need to do the update in order to release open bucket references - so we should be using the btree open bucket reserve. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Add a kthread_should_stop() check to allocator threadKent Overstreet1-0/+2
Turns out it's possible during shutdown for the allocator to get stuck spinning on bch2_invalidate_buckets() without hitting any of the other checks. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Change bch2_dump_bset() to also print key valuesKent Overstreet4-27/+26
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Fix a deadlock in the RO pathKent Overstreet2-3/+11
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Fix incorrect gfp checkKent Overstreet1-1/+1
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Fix lock ordering with new btree cache codeKent Overstreet4-22/+110
The code that checks lock ordering was recently changed to go off of the pos of the btree node, rather than the iterator, but the btree cache code didn't update to handle iterators that point to cached bkeys. Oops Also, update various debug code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: delete a slightly faulty assertionKent Overstreet1-2/+0
state lock isn't held at startup Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Increase size of btree node reserveKent Overstreet2-4/+7
Also tweak the allocator to be more aggressive about keeping it full. The recent changes to make updates to interior nodes transactional (and thus generate updates to the alloc btree) all put more stress on the btree node reserves. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Give bkey_cached_key same attributes as bposKent Overstreet1-1/+1
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Use cached iterators for alloc btreeKent Overstreet9-115/+184
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Btree key cacheKent Overstreet14-45/+787
This introduces a new kind of btree iterator, cached iterators, which point to keys cached in a hash table. The cache also acts as a write cache - in the update path, we journal the update but defer updating the btree until the cached entry is flushed by journal reclaim. Cache coherency is for now up to the users to handle, which isn't ideal but should be good enough for now. These new iterators will be used for updating inodes and alloc info (the alloc and stripes btrees). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Implement a new gc that only recalcs oldest genKent Overstreet4-0/+92
Full mark and sweep gc doesn't (yet?) work with the new btree key cache code, but it also blocks updates to interior btree nodes for the duration and isn't really necessary in practice; we aren't currently attempting to repair errors in allocation info at runtime. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Turn c->state_lock into an rwsemKent Overstreet7-57/+50
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Add an internal option for reading entire journalKent Overstreet4-22/+44
To be used the debug tool that dumps the contents of the journal. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Don't deadlock when btree node reuse changes lock orderingKent Overstreet4-16/+62
Btree node lock ordering is based on the logical key. However, 'struct btree' may be reused for a different btree node under memory pressure. This patch uses the new six lock callback to check if a btree node is no longer the node we wanted to lock before blocking. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Fix a deadlockKent Overstreet4-32/+67
__bch2_btree_node_lock() was incorrectly using iter->pos as a proxy for btree node lock ordering, this caused an off by one error that was triggered by bch2_btree_node_get_sibling() getting the previous node. This refactors the code to compare against btree node keys directly. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Refactor btree insert pathKent Overstreet1-52/+38
This splits out the journalling code from the btree update code; prep work for the btree key cache. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Always give out journal pre-res if we already have oneKent Overstreet3-15/+30
This is better than skipping the journal pre-reservation if we already have one - we should still acount for the journal reservation we're going to have to get. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: More open bucketsKent Overstreet3-11/+17
We need a larger open bucket reserve now that the btree interior update path holds onto open bucket references; filesystems with many high through devices may need more open buckets now. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Don't allocate memory under the btree cache lockKent Overstreet1-29/+58
The btree cache lock is needed for reclaiming from the btree node cache, and memory allocation can potentially spin and sleep (for 100 ms at a time), so.. don't do that. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Fix a linked list bugKent Overstreet1-1/+1
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Make open bucket reserves more conservativeKent Overstreet1-2/+2
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: btree_update_nodes_written() requires alloc reserveKent Overstreet1-3/+6
Also, in the btree_update_start() path, if we already have a journal pre-reservation we don't want to take another - that's a deadlock. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Check gfp_flags correctly in bch2_btree_cache_scan()Kent Overstreet1-1/+1
bch2_btree_node_mem_alloc() uses memalloc_nofs_save()/GFP_NOFS, but GFP_NOFS does include __GFP_IO - oops. We used to use GFP_NOIO, but as we're a filesystem now GFP_NOFS makes more sense now and is looser. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Call bch2_btree_iter_traverse() if necessary in commit pathKent Overstreet1-2/+2
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: bch2_trans_downgrade()Kent Overstreet3-24/+22
bch2_btree_iter_downgrade() was looping over all iterators in a transaction; bch2_trans_downgrade() should be doing that. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Improve warning for copygc failing to move dataKent Overstreet3-3/+20
This will help narrow down which code is at fault when this happens. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Always increment bucket gen on bucket reuseKent Overstreet2-21/+47
Not doing so confuses copygc Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Kill old allocator startup codeKent Overstreet5-257/+1
It's not needed anymore since we can now write to buckets before updating the alloc btree. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Improve assorted error messagesKent Overstreet5-136/+127
This also consolidates the various checks in bch2_mark_pointer() and bch2_trans_mark_pointer(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Fix a deadlock in bch2_btree_node_get_sibling()Kent Overstreet4-11/+29
There was a bad interaction with bch2_btree_iter_set_pos_same_leaf(), which can leave a btree node locked that is just outside iter->pos, breaking the lock ordering checks in __bch2_btree_node_lock(). Ideally we should get rid of this corner case, but for now fix it locally with verbose comments. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Add debug code to print btree transactionsKent Overstreet8-5/+92
Intented to help debug deadlocks, since we can't use lockdep to check btree node lock ordering. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Set filesystem features earlier in fs init pathKent Overstreet1-5/+9
Before we were setting features after allocating btree nodes, which meant we were using the old btree pointer format. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Add an option to disable reflink supportKent Overstreet4-0/+13
Reflink might be buggy, so we're adding an option so users can help bisect what's going on. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Fixes for going ROKent Overstreet5-33/+60
Now that interior btree updates are fully transactional, we don't need to write out alloc info in a loop. However, interior btree updates do put more things in the journal, so we still need a loop in the RO sequence. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>