Age | Commit message (Collapse) | Author | Files | Lines |
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Add a DEFINE_CLASS() for printbufs.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
New helpers to avoid open coded loops.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
If we don't finish journal replay we need to keep journal keys around
until the filesystem shuts down - otherwise e.g. -o norecovery, various
tools (dump, list) break, and eventually we'll be doing journal replay
in the background.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We had a bug report where the errors from btree_node_check_topology()
don't seem to be getting printed; log_fsck_err() does some fancy
ratelimiting-type stuff that we don't want here.
Instead, just use bch2_count_fsck_err(); this is simpler, and modelled
after how we're currently handling bucket ref update errors in
buckets.c.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
More self healing code: readdir will now notice if there are dirents
hashed incorrectly, and it'll repair them if errors=fix_safe.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We don't track snapshot overwrites outside of fsck, so for this to be
called at runtime outside of fsck we need to create it on demand, when
we have repair to do.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Now uses bch2_get_snapshot_overwrites(), and much shorter.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
New helper for getting a list of snapshot IDs that have overwritten a
given key.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Recover from "journal and btree in same bucket".
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
If snapshot deletion incorrectly missing some keys and leaves keys for
deleted snapshots, that causes a bit of a problem for data move - we
can't move an extent for a nonexistent snapshot, because the extent
might have to be fragmented, and maintaining correct visibility in child
snapshots doesn't work if it doesn't have a snapshot.
Previously we'd just skip these keys, but it turns out that causes
copygc to spin.
So we need runtime self healing, i.e. calling check_key_has_snapshot()
from the data move path.
Snapshot deletion v2 included sentinal values for deleted snapshot
nodes, so this is quite safe.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
data_update_init() does need to do btree operations, delay doing the
unlock-before-io.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We'll be adding subtypes of these errors, and new error code tracing.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
These don't access global memory or defer pointer arguments - this
enables CSE optimizations.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We were accidentally including the contents from the previous
fsck_err().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Make the superblock error counters available in sysfs; the only other
way they can be seen is 'show-super', but we don't write the superblock
every time the error count gets incremented.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This is straightforward enough: check_fix_ptrs() currently only runs
before we go RW, so updating the btree root pointer in c->btree_roots
suffices - it'll be written out in the first journal write we do.
For that, do_bch2_trans_commit_to_journal_replay() now handles
JSET_ENTRY_btree_root entries.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We have a bug report that looks like we might be leaking open buckets -
let's check if they got left attached to the cached btree node.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
More stack usage work.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
with typical config options, variables in different inline functions
aren't sharing stack space - and these are slowpaths.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Specialize the .to_text() for alloc_v4, to avoid the temporary on the
stack for conversion from old versions.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
- Separate out a slowpath for bkey_nocow_lock()
- Don't call bch2_bkey_ptrs_c() or loop over pointers more than
necessary
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
More stack usage work.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Factor out an error path for a small stack usage improvement.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Allocate some (smaller) temporary storage in btree_trans for this -
btree_path_down() is in our max-stack call stack.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Fix an assertion pop in the tiering_misaligned test: rounding down to
bucket size at the end of the journal space calculations leaves
cur_entry_sectors == 0, which is incorrect with !cur_entry_err.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
It's uncomon to have multiple devices with journalling only on a subset,
but can be specified with the 'data_allowed' option. We need to know if
we're doing data/metadata writes to multiple devices, as that requires
issuing flushes before the journal writes.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Fix an infinite loop when bkey_i->k.u64s is 0.
This only happens in userspace, where 'bcachefs list_journal' can print
the entire contents of the journal, and non-dirty entries aren't
validated.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
- Don't print a checksum error when we first read a journal entry: we
print a checksum error later if we'll be using the journal entry.
- Continuing with the theme of of improving error messages and grouping
errors into a single log message per error, print a single 'checksum
error' message per journal entry, and use bch2_journal_ptr_to_text()
to print out where on the device it was.
- Factor out checksum error messages and checking for missing journal
entries into helpers, bch2_journal_read() has gotten obnoxiously big.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer cleanups from Thomas Gleixner:
"Another set of timer API cleanups:
- Convert init_timer*(), try_to_del_timer_sync() and
destroy_timer_on_stack() over to the canonical timer_*()
namespace convention.
There is another large conversion pending, which has not been included
because it would have caused a gazillion of merge conflicts in next.
The conversion scripts will be run towards the end of the merge window
and a pull request sent once all conflict dependencies have been
merged"
* tag 'timers-cleanups-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
treewide, timers: Rename destroy_timer_on_stack() as timer_destroy_on_stack()
treewide, timers: Rename try_to_del_timer_sync() as timer_delete_sync_try()
timers: Rename init_timers() as timers_init()
timers: Rename NEXT_TIMER_MAX_DELTA as TIMER_NEXT_MAX_DELTA
timers: Rename __init_timer_on_stack() as __timer_init_on_stack()
timers: Rename __init_timer() as __timer_init()
timers: Rename init_timer_on_stack_key() as timer_init_key_on_stack()
timers: Rename init_timer_key() as timer_init_key()
|
|
Fix a small regression from the "run recovery passes" rewrite, which
enabled async recovery passes.
This fixes getting stuck in a loop in recovery.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Other repair code seems to be doing commits themselves, but
check_key_has_snapshot() does not.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Fix a missing wakeup in
'bcachefs set-file-option' -> xattr option update -> inode_write
this was missing because the wakeup needs to happen after transaction
commit. Also, add a 'kick' counter, to make sure we don't miss a wakeup
that occured right after we finished checking the rebalance_work btree.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Add a version of bch2_kthread_io_clock_wait() that only schedules once -
behaving more like schedule_timeout().
This will be used for fixing rebalance wakeups.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Also, don't error out in bucket_ref_update_err(): we don't want to
return -BCH_ERR_cannot_rewind_recovery if it's not an insert, if it's an
overwrite we continue.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
"API:
- Fix memcpy_sglist to handle partially overlapping SG lists
- Use memcpy_sglist to replace null skcipher
- Rename CRYPTO_TESTS to CRYPTO_BENCHMARK
- Flip CRYPTO_MANAGER_DISABLE_TEST into CRYPTO_SELFTESTS
- Hide CRYPTO_MANAGER
- Add delayed freeing of driver crypto_alg structures
Compression:
- Allocate large buffers on first use instead of initialisation in scomp
- Drop destination linearisation buffer in scomp
- Move scomp stream allocation into acomp
- Add acomp scatter-gather walker
- Remove request chaining
- Add optional async request allocation
Hashing:
- Remove request chaining
- Add optional async request allocation
- Move partial block handling into API
- Add ahash support to hmac
- Fix shash documentation to disallow usage in hard IRQs
Algorithms:
- Remove unnecessary SIMD fallback code on x86 and arm/arm64
- Drop avx10_256 xts(aes)/ctr(aes) on x86
- Improve avx-512 optimisations for xts(aes)
- Move chacha arch implementations into lib/crypto
- Move poly1305 into lib/crypto and drop unused Crypto API algorithm
- Disable powerpc/poly1305 as it has no SIMD fallback
- Move sha256 arch implementations into lib/crypto
- Convert deflate to acomp
- Set block size correctly in cbcmac
Drivers:
- Do not use sg_dma_len before mapping in sun8i-ss
- Fix warm-reboot failure by making shutdown do more work in qat
- Add locking in zynqmp-sha
- Remove cavium/zip
- Add support for PCI device 0x17D8 to ccp
- Add qat_6xxx support in qat
- Add support for RK3576 in rockchip-rng
- Add support for i.MX8QM in caam
Others:
- Fix irq_fpu_usable/kernel_fpu_begin inconsistency during CPU bring-up
- Add new SEV/SNP platform shutdown API in ccp"
* tag 'v6.16-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (382 commits)
x86/fpu: Fix irq_fpu_usable() to return false during CPU onlining
crypto: qat - add missing header inclusion
crypto: api - Redo lookup on EEXIST
Revert "crypto: testmgr - Add hash export format testing"
crypto: marvell/cesa - Do not chain submitted requests
crypto: powerpc/poly1305 - add depends on BROKEN for now
Revert "crypto: powerpc/poly1305 - Add SIMD fallback"
crypto: ccp - Add missing tee info reg for teev2
crypto: ccp - Add missing bootloader info reg for pspv5
crypto: sun8i-ce - move fallback ahash_request to the end of the struct
crypto: octeontx2 - Use dynamic allocated memory region for lmtst
crypto: octeontx2 - Initialize cptlfs device info once
crypto: xts - Only add ecb if it is not already there
crypto: lrw - Only add ecb if it is not already there
crypto: testmgr - Add hash export format testing
crypto: testmgr - Use ahash for generic tfm
crypto: hmac - Add ahash support
crypto: testmgr - Ignore EEXIST on shash allocation
crypto: algapi - Add driver template support to crypto_inst_setname
crypto: shash - Set reqsize in shash_alg
...
|
|
Repair code will do updates on older snapshot versions, so needs the
correct annotation.
Reported-by: syzbot+42581416dba62b364750@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
If we're doing a reflink copy of existing reflinked data, we may only
set REFLINK_P_MAY_UPDATE_OPTIONS if it was set on the reflink pointer
we're copying from.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Large folios aren't supported without TRANSPARENT_HUGEPAGE
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We can't unlock a should_be_locked path unless we're in a transaction
restart.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Different versions differ on the size of the blacklist range; it is
theoretically possible that we could end up with blacklisted journal
sequence numbers newer than the newest seq we find in the journal, and
pick a new start seq that's blacklisted.
Explicitly check for this in bch2_fs_journal_start().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|