kernel/linux.git/fs/btrfs/zstd.c, branch v7.1

btrfs: prevent direct reclaim during compressed readahead

2026-04-07T16:56:08+00:00

Under memory pressure, direct reclaim can kick in during compressed readahead. This puts the associated task into D-state. Then shrink_lruvec() disables interrupts when acquiring the LRU lock. Under heavy pressure, we've observed reclaim can run long enough that the CPU becomes prone to CSD lock stalls since it cannot service incoming IPIs. Although the CSD lock stalls are the worst case scenario, we have found many more subtle occurrences of this latency on the order of seconds, over a minute in some cases. Prevent direct reclaim during compressed readahead. This is achieved by using different GFP flags at key points when the bio is marked for readahead. There are two functions that allocate during compressed readahead: btrfs_alloc_compr_folio() and add_ra_bio_pages(). Both currently use GFP_NOFS which includes __GFP_DIRECT_RECLAIM. For the internal API call btrfs_alloc_compr_folio(), the signature changes to accept an additional gfp_t parameter. At the readahead call site, it gets flags similar to GFP_NOFS but stripped of __GFP_DIRECT_RECLAIM. __GFP_NOWARN is added since these allocations are allowed to fail. Demand reads still use full GFP_NOFS and will enter reclaim if needed. All other existing call sites of btrfs_alloc_compr_folio() now explicitly pass GFP_NOFS to retain their current behavior. add_ra_bio_pages() gains a bool parameter which allows callers to specify if they want to allow direct reclaim or not. In either case, the __GFP_NOWARN flag was added unconditionally since the allocations are speculative. There has been some previous work done on calling add_ra_bio_pages() [0]. This patch is complementary: where that patch reduces call frequency, this patch reduces the latency associated with those calls. [0] https://lore.kernel.org/linux-btrfs/656838ec1232314a2657716e59f4f15a8eadba64.1751492111.git.boris@bur.io/ Reviewed-by: Mark Harmstone Reviewed-by: Qu Wenruo Signed-off-by: JP Kobryn (Meta) Reviewed-by: David Sterba Signed-off-by: David Sterba

btrfs: zstd: don't cache sectorsize in a local variable

2026-04-07T16:56:08+00:00

The sectorsize is used once or at most twice in the callbacks, no need to cache it on stack. Minor effect on zstd_compress_folios() where it saves 8 bytes of stack. Signed-off-by: David Sterba

btrfs: reduce the size of compressed_bio

2026-04-07T16:55:59+00:00

The member compressed_bio::compressed_len can be replaced by the bio size, as we always submit the full compressed data without any partial read/write. Furthermore we already have enough ASSERT()s making sure the bio size matches the ordered extent or the extent map. This saves 8 bytes from compressed_bio: Before: struct compressed_bio { u64 start; /* 0 8 */ unsigned int len; /* 8 4 */ unsigned int compressed_len; /* 12 4 */ u8 compress_type; /* 16 1 */ bool writeback; /* 17 1 */ /* XXX 6 bytes hole, try to pack */ struct btrfs_bio * orig_bbio; /* 24 8 */ struct btrfs_bio bbio __attribute__((__aligned__(8))); /* 32 304 */ /* XXX last struct has 1 bit hole */ /* size: 336, cachelines: 6, members: 7 */ /* sum members: 330, holes: 1, sum holes: 6 */ /* member types with bit holes: 1, total: 1 */ /* forced alignments: 1 */ /* last cacheline: 16 bytes */ } __attribute__((__aligned__(8))); After: struct compressed_bio { u64 start; /* 0 8 */ unsigned int len; /* 8 4 */ u8 compress_type; /* 12 1 */ bool writeback; /* 13 1 */ /* XXX 2 bytes hole, try to pack */ struct btrfs_bio * orig_bbio; /* 16 8 */ struct btrfs_bio bbio __attribute__((__aligned__(8))); /* 24 304 */ /* XXX last struct has 1 bit hole */ /* size: 328, cachelines: 6, members: 6 */ /* sum members: 326, holes: 1, sum holes: 2 */ /* member types with bit holes: 1, total: 1 */ /* forced alignments: 1 */ /* last cacheline: 8 bytes */ } __attribute__((__aligned__(8))); Signed-off-by: Qu Wenruo Reviewed-by: David Sterba Signed-off-by: David Sterba

Merge tag 'for-7.0-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

2026-03-21T15:42:17+00:00

Pull btrfs fixes from David Sterba: "Another batch of fixes for problems that have been identified by tools analyzing code or by fuzzing. Most of them are short, two patches fix the same thing in many places so the diffs are bigger. - handle potential NULL pointer errors after attempting to read extent and checksum trees - prevent ENOSPC when creating many qgroups by ioctls in the same transaction - encoded write ioctl fixes (with 64K page and 4K block size): - fix unexpected bio length - do not let compressed bios and pages interfere with page cache - compression fixes on setups with 64K page and 4K block size: fix folio length assertions (zstd and lzo) - remap tree fixes: - make sure to hold block group reference while moving it - handle early exit when moving block group to unused list - handle deleted subvolumes with inconsistent state of deletion progress" * tag 'for-7.0-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: reject root items with drop_progress and zero drop_level btrfs: check block group before marking it unused in balance_remap_chunks() btrfs: hold block group reference during entire move_existing_remap() btrfs: fix an incorrect ASSERT() condition inside lzo_decompress_bio() btrfs: fix an incorrect ASSERT() condition inside zstd_decompress_bio() btrfs: do not touch page cache for encoded writes btrfs: fix a bug that makes encoded write bio larger than expected btrfs: reserve enough transaction items for qgroup ioctls btrfs: check for NULL root after calls to btrfs_csum_root() btrfs: check for NULL root after calls to btrfs_extent_root()

btrfs: fix an incorrect ASSERT() condition inside zstd_decompress_bio()

2026-03-17T10:43:08+00:00

[BUG] When running btrfs/284 with 64K page size and 4K fs block size, it crashes with the following ASSERT() triggered: assertion failed: folio_size(fi.folio) == blocksize :: 0, in fs/btrfs/zstd.c:603 ------------[ cut here ]------------ kernel BUG at fs/btrfs/zstd.c:603! Internal error: Oops - BUG: 00000000f2000800 [#1] SMP CPU: 2 UID: 0 PID: 1183 Comm: kworker/u35:4 Not tainted 6.19.0-rc8-custom+ #185 PREEMPT(voluntary) Hardware name: QEMU KVM Virtual Machine, BIOS unknown 2/2/2022 Workqueue: btrfs-endio simple_end_io_work [btrfs] pc : zstd_decompress_bio+0x4f0/0x508 [btrfs] lr : zstd_decompress_bio+0x4f0/0x508 [btrfs] Call trace: zstd_decompress_bio+0x4f0/0x508 [btrfs] (P) end_bbio_compressed_read+0x260/0x2c0 [btrfs] btrfs_bio_end_io+0xc4/0x258 [btrfs] btrfs_check_read_bio+0x424/0x7e0 [btrfs] simple_end_io_work+0x40/0xa8 [btrfs] process_one_work+0x168/0x3f0 worker_thread+0x25c/0x398 kthread+0x154/0x250 ret_from_fork+0x10/0x20 ---[ end trace 0000000000000000 ]--- [CAUSE] Commit 1914b94231e9 ("btrfs: zstd: use folio_iter to handle zstd_decompress_bio()") added the ASSERT() to make sure the folio size matches the fs block size. But the check is completely wrong, the original intention is to make sure for bs > ps cases, we always got a large folio that covers a full fs block. However for bs < ps cases, a folio can never be smaller than page size, and the ASSERT() gets triggered immediately. [FIX] Check the folio size against @min_folio_size instead, which will never be smaller than PAGE_SIZE, and still cover bs > ps cases. Fixes: 1914b94231e9 ("btrfs: zstd: use folio_iter to handle zstd_decompress_bio()") Reviewed-by: Filipe Manana Signed-off-by: Qu Wenruo Reviewed-by: David Sterba Signed-off-by: David Sterba

Convert 'alloc_obj' family to use the new default GFP_KERNEL argument

2026-02-22T01:09:51+00:00

This was done entirely with mindless brute force, using git grep -l '\

treewide: Replace kmalloc with kmalloc_obj for non-scalar types

2026-02-21T09:02:28+00:00

This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...) (where TYPE may also be *VAR) The resulting allocations no longer return "void *", instead returning "TYPE *". Signed-off-by: Kees Cook

btrfs: remove the old btrfs_compress_folios() infrastructure

2026-02-03T06:59:07+00:00

Since it's been replaced by btrfs_compress_bio(), remove all involved functions. Reviewed-by: Boris Burkov Signed-off-by: Qu Wenruo Reviewed-by: David Sterba Signed-off-by: David Sterba

btrfs: zstd: introduce zstd_compress_bio() helper

2026-02-03T06:59:06+00:00

The new helper has the following enhancements against the existing zstd_compress_folios() - Much smaller parameter list No more shared IN/OUT members, no need to pre-allocate a compressed_folios[] array. Just a workspace and compressed_bio pointer, everything we need can be extracted from that @cb pointer. - Ready-to-be-submitted compressed bio Although the caller still needs to do some common works like rounding up and zeroing the tailing part of the last fs block. Overall the workflow is the same as zstd_compress_folios(), but with some minor changes: - @start/@len is now constant For the current input file offset, use @start + @tot_in instead. The original change of @start and @len makes it pretty hard to know what value we're really comparing to. - No more @cur_len It's only utilized when switching input buffer. Directly use btrfs_calc_input_length() instead. Reviewed-by: Boris Burkov Signed-off-by: Qu Wenruo Reviewed-by: David Sterba Signed-off-by: David Sterba

btrfs: zstd: use folio_iter to handle zstd_decompress_bio()

2026-02-03T06:56:18+00:00

Currently zstd_decompress_bio() is using compressed_bio->compressed_folios[] array to grab each compressed folio. However cb->compressed_folios[] is just a pointer to each folio of the compressed bio, meaning we can just replace the compressed_folios[] array by just grabbing the folio inside the compressed bio. Signed-off-by: Qu Wenruo Reviewed-by: David Sterba Signed-off-by: David Sterba