summaryrefslogtreecommitdiff
path: root/fs/bcachefs/btree_io.c
AgeCommit message (Collapse)AuthorFilesLines
2023-10-23bcachefs: btree_err() now uses bch2_print_string_as_lines()Kent Overstreet1-8/+9
We've seen long error messages get truncated here, so convert to the new bch2_print_string_as_lines(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: New locking functionsKent Overstreet1-1/+7
In the future, with the new deadlock cycle detector, we won't be using bare six_lock_* anymore: lock wait entries will all be embedded in btree_trans, and we will need a btree_trans context whenever locking a btree node. This patch plumbs a btree_trans to the few places that need it, and adds two new locking functions - btree_node_lock_nopath, which may fail returning a transaction restart, and - btree_node_lock_nopath_nofail, to be used in places where we know we cannot deadlock (i.e. because we're holding no other locks). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Add persistent counters for all tracepointsKent Overstreet1-2/+2
Also, do some reorganizing/renaming, convert atomic counters in bch_fs to persistent counters, and add a few missing counters. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Always rebuild aux search trees when node boundaries changeKent Overstreet1-2/+5
Topology repair may change btree node min/max keys: when it does so, we need to always rebuild eytzinger search trees because nodes directly depend on those values. This fixes a bug found by the 'kill_btree_node' test, where we'd pop an assertion in bch2_bset_search_linear(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: remove dead whiteout_u64s argument.Olexa Bilaniuk1-11/+5
Signed-off-by: Olexa Bilaniuk <obilaniu@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Convert fsck errors to errcode.hKent Overstreet1-4/+4
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Fix btree node read retriesKent Overstreet1-0/+2
b->written wasn't being reset to 0 in the btree node read retry path, causing decrypting & validation of previously read bsets to not be re-run - ouch. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Printbuf reworkKent Overstreet1-20/+20
This converts bcachefs to the modern printbuf interface/implementation, synced with the version to be submitted upstream. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Fix btree node read error pathKent Overstreet1-9/+16
We were forgetting to clear the read_in_flight flag - oops. This also fixes it to not call bch2_fatal_error() before topology repair has had a chance to do its thing. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Print message on btree node read retry successKent Overstreet1-1/+6
Right now, we print an error message on btree node read error, and we print that we're retrying, but we don't explicitly say if the retry succeeded - this makes things a little clearer. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Improve invalid bkey error messageKent Overstreet1-6/+6
Bkeys have gotten a lot bigger since this code was written and now are often formatted across multiple lines - while the reason a bkey is invalid will still be short and fit on a single line. This patch prints the error bfore the bkey, making it a bit more readable. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Shutdown path improvementsKent Overstreet1-5/+9
We're seeing occasional firings of the assertion in the key cache shutdown code that nr_dirty == 0, which means we must sometimes be doing transaction commits after we've gone read only. Cleanups & changes: - BCH_FS_ALLOC_CLEAN renamed to BCH_FS_CLEAN_SHUTDOWN - new helper bch2_btree_interior_updates_flush(), which returns true if it had to wait - bch2_btree_flush_writes() now also returns true if there were btree writes in flight - __bch2_fs_read_only now checks if btree writes were in flight in the shutdown loop: btree write completion does a transaction update, to update the pointer in the parent node - assert that !BCH_FS_CLEAN_SHUTDOWN in __bch2_trans_commit Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Don't trigger extra assertions in journal replayKent Overstreet1-2/+2
We now pass a rw argument to .key_invalid methods so they can trigger assertions for updates but not on existing keys. We shouldn't trigger these extra assertions in journal replay - this patch changes the transaction commit path accordingly. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Add rw to .key_invalid()Kent Overstreet1-6/+7
This adds a new parameter to .key_invalid() methods for whether the key is being read or written; the idea being that methods can do more aggressive checks when a key is newly created and being written, when we wouldn't want to delete the key because of those checks. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Convert .key_invalid methods to printbufsKent Overstreet1-28/+46
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Freespace, need_discard btreesKent Overstreet1-1/+1
This adds two new btrees for the upcoming allocator rewrite: an extents btree of free buckets, and a btree for buckets awaiting discards. We also add a new trigger for alloc keys to keep the new btrees up to date, and a compatibility path to initialize them on existing filesystems. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Add printf format attribute to bch2_pr_buf()Kent Overstreet1-1/+1
This tells the compiler to check printf format strings, and catches a few bugs. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: x-macro metadata version enumKent Overstreet1-1/+1
Now we've got strings for metadata versions - this changes bch2_sb_to_text() and our mount log message to use it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Add a missing wakeupKent Overstreet1-1/+2
This fixes a rare bug with bch2_btree_flush_all_writes() getting stuck. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Fix usage of six lock's percpu modeKent Overstreet1-1/+1
Six locks have a percpu mode, which we use for interior btree nodes, as well as btree key cache keys for the subvolumes btree. We've been switching locks back and forth between percpu and non percpu mode as needed, but it turns out this is racy - when we're reusing an existing node, other threads could be attempting to lock it while we're switching it between modes. This patch fixes this by never switching 'struct btree' between the two modes, and instead segragating them between two different freed lists. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Fix race leading to btree node write getting stuckKent Overstreet1-3/+7
Checking btree_node_may_write() isn't atomic with the other btree flags, dirty and need_write in particular. There was a rare race where we'd unblock a node from writing while __btree_node_flush() was setting need_write, and no thread would notice that the node was now both able to write and needed to be written. Fix this by adding btree node flags for will_make_reachable and write_blocked that can be checked in the cmpxchg loop in __bch2_btree_node_write. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Improve btree_node_write_if_need()Kent Overstreet1-8/+14
btree_node_write_if_need() kicks off a btree node write only if need_write is set; this makes the locking easier to reason about by moving the check into the cmpxchg loop in __bch2_btree_node_write(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Fix locking in btree_node_write_done()Kent Overstreet1-18/+7
There was a rare recursive locking bug, in __bch2_btree_node_write() nowrite path -> btree_node_write_done(), in the path that kicks off another write. This splits out an inner __btree_node_write_done() that expects to be run with the btree node lock held. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Start moving debug info from sysfs to debugfsKent Overstreet1-27/+0
In sysfs, files can only output at most PAGE_SIZE. This is a problem for debug info that needs to list an arbitrary number of times, and because of this limit some of our debug info has been terser and harder to read than we'd like. This patch moves info about journal pins and cached btree nodes to debugfs, and greatly expands and improves the output we return. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Kill BCH_FS_HOLD_BTREE_WRITESKent Overstreet1-3/+0
This was just dead code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Heap allocate printbufsKent Overstreet1-48/+55
This patch changes printbufs dynamically allocate and reallocate a buffer as needed. Stack usage has become a bit of a problem, and a major cause of that has been static size string buffers on the stack. The most involved part of this refactoring is that printbufs must now be exited with printbuf_exit(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Improve some btree node read error messagesKent Overstreet1-2/+3
On btree node read error, it's helpful to see what we were trying to read - was it all zeroes? Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Check for errors from crypto_skcipher_encrypt()Kent Overstreet1-3/+13
Apparently it actually is possible for crypto_skcipher_encrypt() to return an error - not sure why that would be - but we need to replace our assertion with actual error handling. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Log & error message improvementsKent Overstreet1-4/+8
- Add a shim uuid_unparse_lower() in the kernel, since %pU doesn't work in userspace - We don't need to print the bcachefs: or the filesystem name prefix in userspace - Improve a few error messages Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Option improvementsKent Overstreet1-9/+9
This adds flags for options that must be a power of two (block size and btree node size), and options that are stored in the superblock as a power of two (encoded extent max). Also: options are now stored in memory in the same units they're displayed in (bytes): we now convert when getting and setting from the superblock. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Kill bch2_sort_repack_merge()Kent Overstreet1-10/+4
The main function of bch2_sort_repack_merge() was to call .key_normalize on every key, which drops stale (cached) pointers - it hasn't actually merged extents in quite some time. But bch2_gc_gens() now works on individual keys - we used to gc old gens by rewriting entire btree nodes. With that gone, there's no need for internal btree code to be calling .key_normalize anymore. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Fix debug build in userspaceKent Overstreet1-1/+2
This fixes some compiler warnings that only trigger in userspace - dead code, a maybe uninitialed variable, a maybe null ptr passed to printk. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Fix some compiler warningsKent Overstreet1-1/+1
gcc couldn't always deduce that written wasn't used uninitialized Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Clean up/rename bch2_trans_node_* fnsKent Overstreet1-6/+3
These utility functions are for managing btree node state within a btree_trans - rename them for consistency, and drop some unneeded arguments. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Reduce iter->trans usageKent Overstreet1-1/+1
Disfavoured, and should go away. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: BSET_OFFSET()Kent Overstreet1-6/+13
Add a field to struct bset for the sector offset within the btree node where it was written. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Update btree ptrs after every writeKent Overstreet1-128/+94
This closes a significant hole (and last known hole) in our ability to verify metadata. Previously, since btree nodes are log structured, we couldn't detect lost btree writes that weren't the first write to a given node. Additionally, this seems to have lead to some significant metadata corruption on multi device filesystems with metadata replication: since a write may have made it to one device and not another, if we read that btree node back from the replica that did have that write and started appending after that point, the other replica would have a gap in the bset entries and reading from that replica wouldn't find the rest of the bsets. But, since updates to interior btree nodes are now journalled, we can close this hole by updating pointers to btree nodes after every write with the currently written number of sectors, without negatively affecting performance. This means we will always detect lost or corrupt metadata - it also means that our btree is now a curious hybrid of COW and non COW btrees, with all the benefits of both (excluding complexity). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Kick off btree node writes from write completionsKent Overstreet1-13/+48
This is a performance improvement by removing the need to wait for the in flight btree write to complete before kicking one off, which is going to be needed to avoid a performance regression with the upcoming patch to update btree ptrs after every btree write. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Really don't hold btree locks while btree IOs are in flightKent Overstreet1-5/+46
This is something we've attempted to stick to for quite some time, as it helps guarantee filesystem latency - but there's a few remaining paths that this patch fixes. This is also necessary for an upcoming patch to update btree pointers after every btree write - since the btree write completion path will now be doing btree operations. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Regularize argument passing of btree_transKent Overstreet1-3/+5
btree_trans should always be passed when we have one - iter->trans is disfavoured. This mainly updates old code in btree_update_interior.c, some of which predates btree_trans. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Fix btree_node_read_all_replicas() error handlingKent Overstreet1-19/+20
We weren't checking bch2_btree_node_read_done() for errors, oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Fix a deadlockKent Overstreet1-0/+4
Waiting on a btree node write with btree locks held can deadlock, if the write errors: the write error path has to do do a btree update to drop the pointer to the replica that errored. The interior update path has to wait on in flight btree writes before freeing nodes on disk. Previously, this was done in bch2_btree_interior_update_will_free_node(), and could deadlock; now, we just stash a pointer to the node and do it in btree_update_nodes_written(), just prior to the transactional part of the update. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Split out btree_error_wqKent Overstreet1-1/+1
We can't use btree_update_wq becuase btree updates may be waiting on btree writes to complete. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Fix an issue with inconsistent btree writes after unclean shutdownKent Overstreet1-1/+18
After unclean shutdown, btree writes may have completed on one device and not others - and this inconsistency could lead us to writing new bsets with a gap in our btree node in one of our replicas. Fortunately, this is only an issue with bsets that are newer than the most recent journal flush, and we already have a mechanism for detecting and blacklisting those. We just need to make sure to start new btree writes after the most recent _non_ blacklisted bset. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Add a workqueue for btree io completionsKent Overstreet1-6/+7
Also, clean up workqueue usage - we shouldn't be using system workqueues, pretty much everything we do needs to be on our own WQ_MEM_RECLAIM workqueues. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Add a debug mode that always reads from every btree replicaKent Overstreet1-7/+266
There's a new module parameter, verify_all_btree_replicas, that enables reading from every btree replica when reading in btree nodes and comparing them against each other. We've been seeing some strange btree corruption - this will hopefully aid in tracking it down and catching it more often. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Fix oob write in __bch2_btree_node_writeDan Robertson1-0/+3
Fix a possible out of bounds write in __bch2_btree_node_write when the data buffer padding is cleared up to the block size. The out of bounds write is possible if the data buffers size is not a multiple of the block size. Signed-off-by: Dan Robertson <dan@dlrobertson.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: New and improved topology repair codeKent Overstreet1-1/+57
This splits out btree topology repair into a separate pass, and makes some improvements: - When we have to pick which of two overlapping nodes to drop keys from, we use the btree node header sequence number to preserve the newer node - the gc code has been changed so that it doesn't bail out if we're continuing/ignoring on fsck error - this way the dump tool can skip running the repair pass but still walk all reachable metadata - add a new superblock flag indicating when a filesystem is known to have btree topology issues, and the topology repair pass should be run - changing the start/end of a node might mean keys in that node have to be deleted: this patch handles that better by splitting it out into a separate function and running it explicitly in the topology repair code, previously those keys were only being dropped when the btree node was read in. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Rewrite btree nodes with errorsKent Overstreet1-0/+7
This patch adds self healing functionality for btree nodes - if we notice a problem when reading a btree node, we just rewrite it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Punt btree writes to workqueue to submitKent Overstreet1-8/+12
We don't want to be submitting IO with btree locks held, and btree writes usually aren't latency sensitive. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>