path: root/fs
Age | Commit message | Author | Files | Lines
2025-05-22bcachefs: bch2_run_explicit_recovery_pass_printbuf()Kent Overstreet4-19/+57
We prefer helpers that emit log messages to printbufs rather than printing them directly; that way, we can ensure that different log messages from the same event are grouped together and formatted appropriately in the dmesg log. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
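The idea of emitting into a buffer rather than printing directly can be sketched in plain C. This is a minimal illustration, not the bcachefs printbuf API: `struct msgbuf`, `msgbuf_printf()` and `recovery_pass_to_text()` are hypothetical names, and the pass name used below is just an example string.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a printbuf: fixed size, truncates instead of overflowing. */
struct msgbuf {
	char	buf[256];
	size_t	pos;
};

/* Append formatted text to the buffer. */
static void msgbuf_printf(struct msgbuf *out, const char *fmt, ...)
{
	size_t space = sizeof(out->buf) - out->pos;	/* always >= 1 */
	va_list args;
	int n;

	va_start(args, fmt);
	n = vsnprintf(out->buf + out->pos, space, fmt, args);
	va_end(args);

	if (n < 0)
		return;
	/* vsnprintf() reports the untruncated length; clamp it to what fit. */
	out->pos += (size_t) n < space ? (size_t) n : space - 1;
}

/* A helper that describes an event into the caller's buffer instead of logging it. */
static void recovery_pass_to_text(struct msgbuf *out, const char *pass)
{
	msgbuf_printf(out, "  scheduling recovery pass %s\n", pass);
}

/* The caller collects every line for one event, then emits them in a single call. */
static void report_event(int err, const char *pass)
{
	struct msgbuf m = { .pos = 0 };

	msgbuf_printf(&m, "error %d detected\n", err);
	recovery_pass_to_text(&m, pass);
	fputs(m.buf, stderr);
}
```

Because the helper only appends, the caller decides when the combined message is printed, which is what keeps related lines together in the log.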
2025-05-22bcachefs: Incompatible features may now be enabled at runtimeKent Overstreet5-3/+38
version_upgrade is now a runtime option. In the future we'll want to add compatible upgrades at runtime, and call the full check_version_upgrade() when the option changes, but we don't have compatible optional upgrades just yet. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-22bcachefs: Clean up option pre/post hooks, small fixesKent Overstreet5-39/+86
The helpers are now:
- bch2_opt_hook_pre_set()
- bch2_opts_hooks_pre_set()
- bch2_opt_hook_post_set()
Fix a bug where the filesystem discard option would incorrectly be changed when setting the device option, and don't trigger rebalance scans unnecessarily (when options aren't changing). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-22bcachefs: Use drop_locks_do() in bch2_inode_hash_find()Kent Overstreet1-3/+2
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-22bcachefs: Single device modeKent Overstreet8-11/+44
Single device filesystems are now identified by the block device name, not the UUID - and single device filesystems with the same UUID can be mounted simultaneously, without any special options. This allocates a new bit in the superblock, BCH_SB_MULTI_DEVICE, which indicates whether a filesystem has ever been multi device. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-22bcachefs: Initialize c->name earlier on single dev filesystemsKent Overstreet1-15/+18
On single device filesystems, c->name contains the block device name, not the UUID. Initialize this earlier, so that single device mode can use it for initializing sysfs/debugfs. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-22bcachefs: Simplify logicAlan Huang1-6/+2
Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-22bcachefs: Remove spurious +1/-1 operationAlan Huang1-2/+2
Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-22bcachefs: Kill bch2_trans_unlock_noassertAlan Huang3-9/+1
Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-22bcachefs: Clean up duplicated code in bch2_journal_halt()Kent Overstreet2-11/+8
It's now a wrapper around bch2_journal_halt_locked(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-22bcachefs: bch2_dev_allocator_set_rw()Kent Overstreet2-7/+12
Add a helper that lets us change bch_member.data_allowed at runtime. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-22bcachefs: bch2_dev_journal_alloc() now respects data_allowedKent Overstreet1-0/+3
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-22bcachefs: Improve bch2_btree_cache_to_text()Kent Overstreet2-3/+9
Make the output slightly clearer, and include a counter for "nodes we couldn't free because we would have gone under our reserve". Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-22bcachefs: __btree_node_reclaim_checks()Kent Overstreet1-66/+69
Factor out a helper so we're not duplicating checks after locking the btree node. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-22bcachefs: kill BTREE_CACHE_NOT_FREED_INCREMENT()Kent Overstreet1-26/+20
Small cleanup, just always increment the counters. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-22bcachefs: Improve opts.degradedKent Overstreet5-33/+76
Kill 'opts.very_degraded', and make 'opts.degraded' a persistent option, stored in the superblock. It's now an enum, with available choices ask/yes/very/no. "ask" mode will be handled by the mount helper, for prompting the user (on a machine used interactively) for whether to do a degraded mount. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
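A minimal sketch of how a four-way option like this might be parsed from its string form. Only the choice names ask/yes/very/no come from the commit message; the enum, its numeric values, and `degraded_parse()` are hypothetical and do not reflect the actual bcachefs definitions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical enum; the ordering here is an assumption. */
enum degraded_mode {
	DEGRADED_ASK,
	DEGRADED_YES,
	DEGRADED_VERY,
	DEGRADED_NO,
};

static const char * const degraded_names[] = {
	[DEGRADED_ASK]	= "ask",
	[DEGRADED_YES]	= "yes",
	[DEGRADED_VERY]	= "very",
	[DEGRADED_NO]	= "no",
};

/* Returns the matching enum value, or -1 if the string is not a valid choice. */
static int degraded_parse(const char *s)
{
	for (size_t i = 0; i < sizeof(degraded_names) / sizeof(degraded_names[0]); i++)
		if (!strcmp(s, degraded_names[i]))
			return (int) i;
	return -1;
}
```

A single enum replaces the pair of booleans, so a state such as "very degraded but not degraded" can no longer be expressed.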
2025-05-22bcachefs: export bch2_chacha20Kent Overstreet2-2/+4
Needed for userspace. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-22bcachefs: indent error messages of invalid compressionIntegral2-2/+5
This patch uses printbuf_indent_add_nextline() to set a consistent indentation level for the error messages about invalid compression. In my previous patch [1], the newline was added by using '\n' in the argument of prt_str(). This patch replaces prt_str() with prt_printf() so that the indentation level works correctly. [1] Link: https://lore.kernel.org/20250406152659.205997-2-integral@archlinuxcn.org Signed-off-by: Integral <integral@archlinuxcn.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-22bcachefs: split error messages of invalid compression into two linesIntegral1-2/+2
When an invalid compression type or level is passed as an argument to `--compression`, two error messages are squashed into one line:
> bcachefs format --compression=lzo bcachefs-comp.img
invalid option: invalid compression typecompression: parse error
> bcachefs format --compression=lz4:16 bcachefs-comp.img
invalid option: invalid compression levelcompression: parse error
To resolve this issue, add a newline character at the end of the first error message to separate them into two lines. Signed-off-by: Integral <integral@archlinuxcn.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-22bcachefs: early return for negative values when parsing BCH_OPT_UINTIntegral2-3/+10
Currently, when a negative integer is passed as an argument, the error message is "too big" because the value is cast to an unsigned integer:
> bcachefs format --block_size=-1 bcachefs.img
invalid option: block_size: too big (max 65536)
When a negative value is detected in the argument, return early before calling bch2_opt_validate(). A new error code `BCH_ERR_option_negative` is added. Signed-off-by: Integral <integral@archlinuxcn.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
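The behaviour described here can be reproduced in userspace C: `strtoull()` accepts a leading minus sign and wraps the result, so a range check alone reports "too big". The sketch below rejects the sign before converting. `parse_uint_opt()` and the `OPT_ERR_*` codes are hypothetical and are not the kernel's option parser or error codes.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical error codes for this sketch only. */
enum {
	OPT_ERR_NEGATIVE = 1,
	OPT_ERR_INVALID,
	OPT_ERR_TOO_BIG,
};

/* Parse an unsigned option value; returns 0 on success or a negative error code. */
static int parse_uint_opt(const char *val, uint64_t max, uint64_t *res)
{
	unsigned long long v;
	char *end;

	while (isspace((unsigned char) *val))
		val++;

	/* Early return: without this, "-1" would wrap and fail the max check instead. */
	if (*val == '-')
		return -OPT_ERR_NEGATIVE;

	errno = 0;
	v = strtoull(val, &end, 10);
	if (end == val || *end || errno)
		return -OPT_ERR_INVALID;

	if (v > max)
		return -OPT_ERR_TOO_BIG;

	*res = v;
	return 0;
}
```

Checking the sign first gives the user a message that names the real problem, while the upper-bound check stays unchanged for genuinely large values.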
2025-05-22bcachefs: move_data_phys: stats are not requiredKent Overstreet1-2/+4
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-22bcachefs: RO mounts now use less memoryKent Overstreet4-24/+44
Defer memory allocations only needed in RW mode until we actually go RW. This is part of improved support for RO images. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
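As a rough sketch of the deferral pattern, assuming a filesystem object with some write-only buffer: the allocation happens on the transition to read-write rather than at mount time. `struct fs_sketch`, `fs_go_rw()` and `fs_exit()` are hypothetical names, not bcachefs functions.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical filesystem state for illustration. */
struct fs_sketch {
	bool	rw;
	void	*write_buffers;		/* only needed once we can write */
	size_t	write_buffers_size;
};

/* Allocate the write-only resources lazily, the first time we go read-write. */
static int fs_go_rw(struct fs_sketch *fs)
{
	if (fs->rw)
		return 0;

	if (!fs->write_buffers) {
		fs->write_buffers = calloc(1, fs->write_buffers_size);
		if (!fs->write_buffers)
			return -ENOMEM;
	}

	fs->rw = true;
	return 0;
}

/* Teardown works whether or not the filesystem ever went read-write. */
static void fs_exit(struct fs_sketch *fs)
{
	free(fs->write_buffers);
	fs->write_buffers = NULL;
	fs->rw = false;
}
```

A mount that stays read-only never calls `fs_go_rw()`, so it never pays for the buffer.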
2025-05-22bcachefs: Move various init code to _init_early()Kent Overstreet11-21/+23
_init_early() is for initialization that cannot fail, and often must happen for teardown partway through initialization to work. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-22bcachefs: alphabetize init function callsKent Overstreet1-25/+25
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-22bcachefs: simplify journal pin initializationKent Overstreet1-12/+5
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-22bcachefs: btree_io_complete_wq -> btree_write_complete_wqKent Overstreet3-5/+5
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-22bcachefs: bch2_kvmalloc() mem alloc profilingKent Overstreet1-3/+4
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-22bcachefs: add missing includeKent Overstreet1-0/+1
Hygiene, and fix build in userspace. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-22bcachefs: bch2_snapshot_table_make_room()Kent Overstreet1-3/+12
Add a better helper for check_snapshot_exists(). create_snapids() can't be changed to use this, unfortunately, because the transaction that creates a new snapshot will also be inserting other keys (e.g. root inode) that reference that snapshot ID, and they expect the snapshot table to already be updated. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-22bcachefs: darray: provide typedefs for primitive typesKent Overstreet3-5/+11
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-22bcachefs: reduce new_stripe_alloc_buckets() stack usageKent Overstreet3-19/+25
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-22bcachefs: alloc_request no longer on stackKent Overstreet1-41/+43
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-22bcachefs: alloc_request.ptrs2Kent Overstreet2-6/+9
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-22bcachefs: alloc_request.caKent Overstreet2-26/+30
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-22bcachefs: alloc_request.countersKent Overstreet3-69/+64
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-22bcachefs: alloc_request.usageKent Overstreet2-25/+21
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-22bcachefs: alloc_request: deallocate_extra_replicas()Kent Overstreet1-8/+6
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-22bcachefs: new_stripe_alloc_buckets() takes alloc_requestKent Overstreet1-29/+44
More stack usage improvements: instead of creating a new alloc_request (currently on the stack), save/restore just the fields we need to reuse. This is a bit tricky, because we're doing a normal alloc_foreground.c allocation, which calls into ec.c to get a stripe, which then does more normal allocations - some of the fields get reused, and used differently. So we have to save and restore them - but the stack usage improvements will be well worth it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-22bcachefs: bch2_ec_stripe_head_get() takes alloc_requestKent Overstreet4-19/+18
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-22bcachefs: bch2_bucket_alloc_trans() takes alloc_requestKent Overstreet1-18/+19
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-22bcachefs: alloc_request.data_typeKent Overstreet3-15/+11
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-22bcachefs: struct alloc_requestKent Overstreet5-240/+179
Add a struct for common state for satisfying an on disk allocation, instead of passing the same long list of items to every function. This will help with stack usage, performance, and perhaps enable some code cleanups. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-22bcachefs: trace bch2_trans_kmalloc()Kent Overstreet7-18/+115
We're occasionally seeing the WARN_ON() for bump allocator usage exceeding BTREE_TRANS_MEM_MAX; add some tracing so we can see what's going on. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-22bcachefs: replace memcpy with memcpy_and_pad for jset_entry_log->d buffRoxana Nicolescu1-4/+4
This was achieved before by zeroing out the source buffer and then copying the bytes into the destination buffer. This can also be done with memcpy_and_pad(), which pads only the remainder of the destination buffer when it is bigger than the source buffer. This is already used in the same way in journal_transaction_name(). Moreover, zeroing the source buffer was done twice, first in __bch2_fs_log_msg() and then in bch2_trans_log_msg(). And this method may also require allocating some extra memory for the source buffer. In conclusion, using memcpy_and_pad() is better even though the result is the same, because it brings uniformity with what's already used in journal_transaction_name(), and it avoids code duplication and allocating extra memory. Signed-off-by: Roxana Nicolescu <nicolescu.roxana@protonmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
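For readers unfamiliar with the helper, its copy-then-pad behaviour can be re-created in userspace as below. This is an illustrative re-implementation under the name `memcpy_and_pad_sketch()`, not the kernel source. It takes the same arguments as the kernel's memcpy_and_pad(): destination, destination length, source, byte count, and pad byte.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy up to dest_len bytes from src, then fill whatever is left of dest
 * with the pad byte. The source buffer is never modified or pre-zeroed.
 */
static void memcpy_and_pad_sketch(void *dest, size_t dest_len,
				  const void *src, size_t count, int pad)
{
	if (dest_len > count) {
		memcpy(dest, src, count);
		memset((char *) dest + count, pad, dest_len - count);
	} else {
		/* Source is at least as large as dest: copy only what fits. */
		memcpy(dest, src, dest_len);
	}
}
```

The padding is applied to the destination only, which is why the double zeroing of the source buffer described above becomes unnecessary.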
2025-05-22bcachefs: replace strncpy() with memcpy_and_pad in journal_transaction_nameRoxana Nicolescu1-1/+3
strncpy() is now deprecated. The destination buffer is not required to be NUL-terminated, but we also want to zero out the rest of the buffer, as is already done in other places. Link: https://github.com/KSPP/linux/issues/90 Signed-off-by: Roxana Nicolescu <nicolescu.roxana@protonmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-22bcachefs: Rebalance now skips poisoned extentsKent Overstreet1-0/+6
Let's not move poisoned extents unnecessarily, since we can't guard against introducing more bitrot. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-22bcachefs: Data move can read from poisoned extentsKent Overstreet3-11/+25
Now, if an extent is poisoned we can move it even if there was a checksum error. We'll have to give it a new checksum, but the poison bit means that userspace will still see the appropriate error when they try to read it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-22bcachefs: Poison extents that can't be read due to checksum errorsKent Overstreet3-4/+73
Copygc needs to be able to move extents that have bitrotted. We don't want to delete them - in the future we'll have an API for "read me the data even if there are checksum errors", and in general we don't want to delete anything unless the user asks us to. That will require writing it with a new checksum, which means we can't forget that there was a checksum error so we return the correct error to userspace. Rebalance also wants to skip bad extents; we can now use the poison flag for that. This is currently disabled by default, as we want read FUA support so that we can distinguish between transient and permanent errors from the device. It may be enabled with the module parameter: poison_extents_on_checksum_error Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-22bcachefs: Be precise about bch_io_failuresKent Overstreet3-5/+52
If the extent we're reading from changes, due to being overwritten or moved (possibly partially) - we need to reset bch_io_failures so that we don't accidentally mark a new extent as poisoned prematurely. This means we have to separately track (in the retry path) the extent we previously read from. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
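The bookkeeping described above can be sketched as follows: remember which extent the previous attempt read from, and clear the accumulated failures whenever a retry lands on a different one. All types and names here (`extent_id`, `io_failures`, `retry_state`, `retry_note_extent()`) are hypothetical simplifications, not the bcachefs structures.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical identity of an extent; a changed version stands for an overwrite or move. */
struct extent_id {
	uint64_t inode;
	uint64_t offset;
	uint64_t version;
};

/* Hypothetical failure record accumulated across retries. */
struct io_failures {
	unsigned nr_checksum_errors;
};

struct retry_state {
	bool			have_prev;
	struct extent_id	prev;
	struct io_failures	failed;
};

static bool extent_id_eq(struct extent_id a, struct extent_id b)
{
	return a.inode == b.inode &&
	       a.offset == b.offset &&
	       a.version == b.version;
}

/* Call at the start of each read attempt with the extent about to be read. */
static void retry_note_extent(struct retry_state *s, struct extent_id cur)
{
	/* Failures recorded against the old extent say nothing about the new one. */
	if (s->have_prev && !extent_id_eq(s->prev, cur))
		memset(&s->failed, 0, sizeof(s->failed));

	s->prev = cur;
	s->have_prev = true;
}
```

Without the reset, errors counted against the old extent could push a freshly written extent over the threshold for being marked poisoned.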
2025-05-22bcachefs: bch2_subvolume_wait_for_pagecache_and_delete() cleanupKent Overstreet1-4/+2
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>