kernel/linux.git/fs/btrfs/zlib.c, branch v7.1

btrfs: prevent direct reclaim during compressed readahead

2026-04-07T16:56:08+00:00

Under memory pressure, direct reclaim can kick in during compressed readahead. This puts the associated task into D-state. Then shrink_lruvec() disables interrupts when acquiring the LRU lock. Under heavy pressure, we've observed reclaim can run long enough that the CPU becomes prone to CSD lock stalls since it cannot service incoming IPIs. Although the CSD lock stalls are the worst case scenario, we have found many more subtle occurrences of this latency on the order of seconds, over a minute in some cases. Prevent direct reclaim during compressed readahead. This is achieved by using different GFP flags at key points when the bio is marked for readahead. There are two functions that allocate during compressed readahead: btrfs_alloc_compr_folio() and add_ra_bio_pages(). Both currently use GFP_NOFS which includes __GFP_DIRECT_RECLAIM. For the internal API call btrfs_alloc_compr_folio(), the signature changes to accept an additional gfp_t parameter. At the readahead call site, it gets flags similar to GFP_NOFS but stripped of __GFP_DIRECT_RECLAIM. __GFP_NOWARN is added since these allocations are allowed to fail. Demand reads still use full GFP_NOFS and will enter reclaim if needed. All other existing call sites of btrfs_alloc_compr_folio() now explicitly pass GFP_NOFS to retain their current behavior. add_ra_bio_pages() gains a bool parameter which allows callers to specify if they want to allow direct reclaim or not. In either case, the __GFP_NOWARN flag was added unconditionally since the allocations are speculative. There has been some previous work done on calling add_ra_bio_pages() [0]. This patch is complementary: where that patch reduces call frequency, this patch reduces the latency associated with those calls. [0] https://lore.kernel.org/linux-btrfs/656838ec1232314a2657716e59f4f15a8eadba64.1751492111.git.boris@bur.io/ Reviewed-by: Mark Harmstone Reviewed-by: Qu Wenruo Signed-off-by: JP Kobryn (Meta) Reviewed-by: David Sterba Signed-off-by: David Sterba

btrfs: zlib: don't cache sectorsize in a local variable

2026-04-07T16:56:08+00:00

The sectorsize is used once or at most twice in the callbacks, no need to cache it on stack. Signed-off-by: David Sterba

btrfs: zlib: drop redundant folio address variable

2026-04-07T16:56:07+00:00

We're caching the current output folio address but it's not really necessary as we store it in the variable and then pass it to the stream context. We can read the folio address directly. Signed-off-by: David Sterba

btrfs: reduce the size of compressed_bio

2026-04-07T16:55:59+00:00

The member compressed_bio::compressed_len can be replaced by the bio size, as we always submit the full compressed data without any partial read/write. Furthermore we already have enough ASSERT()s making sure the bio size matches the ordered extent or the extent map. This saves 8 bytes from compressed_bio: Before: struct compressed_bio { u64 start; /* 0 8 */ unsigned int len; /* 8 4 */ unsigned int compressed_len; /* 12 4 */ u8 compress_type; /* 16 1 */ bool writeback; /* 17 1 */ /* XXX 6 bytes hole, try to pack */ struct btrfs_bio * orig_bbio; /* 24 8 */ struct btrfs_bio bbio __attribute__((__aligned__(8))); /* 32 304 */ /* XXX last struct has 1 bit hole */ /* size: 336, cachelines: 6, members: 7 */ /* sum members: 330, holes: 1, sum holes: 6 */ /* member types with bit holes: 1, total: 1 */ /* forced alignments: 1 */ /* last cacheline: 16 bytes */ } __attribute__((__aligned__(8))); After: struct compressed_bio { u64 start; /* 0 8 */ unsigned int len; /* 8 4 */ u8 compress_type; /* 12 1 */ bool writeback; /* 13 1 */ /* XXX 2 bytes hole, try to pack */ struct btrfs_bio * orig_bbio; /* 16 8 */ struct btrfs_bio bbio __attribute__((__aligned__(8))); /* 24 304 */ /* XXX last struct has 1 bit hole */ /* size: 328, cachelines: 6, members: 6 */ /* sum members: 326, holes: 1, sum holes: 2 */ /* member types with bit holes: 1, total: 1 */ /* forced alignments: 1 */ /* last cacheline: 8 bytes */ } __attribute__((__aligned__(8))); Signed-off-by: Qu Wenruo Reviewed-by: David Sterba Signed-off-by: David Sterba

Merge tag 'for-7.0-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

2026-03-28T22:23:03+00:00

Pull btrfs fixes from David Sterba: "A few more fixes. There's one that stands out in size as it fixes an edge case in fsync. - fix issue on fsync where file with zero size appears as a non-zero after log replay - in zlib compression, handle a crash when data alignment causes folio reference issues - fix possible crash with enabled tracepoints on a overlayfs mount - handle device stats update error - on zoned filesystems, fix kobject leak on sub-block groups - fix super block offset in an error message in validation" * tag 'for-7.0-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix lost error when running device stats on multiple devices fs btrfs: tracepoints: get correct superblock from dentry in event btrfs_sync_file() btrfs: zlib: handle page aligned compressed size correctly btrfs: fix leak of kobject name for sub-group space_info btrfs: fix zero size inode with non-zero size after log replay btrfs: fix super block offset in error message in btrfs_validate_super()

btrfs: zlib: handle page aligned compressed size correctly

2026-03-23T19:11:07+00:00

[BUG] Since commit 3d74a7556fba ("btrfs: zlib: introduce zlib_compress_bio() helper"), there are some reports about different crashes in zlib compression path. One of the symptoms is list corruption like the following: list_del corruption. next->prev should be fffffbb340204a08, but was ffff8d6517cb7de0. (next=fffffbb3402d62c8) ------------[ cut here ]------------ kernel BUG at lib/list_debug.c:65! Oops: invalid opcode: 0000 [#1] SMP NOPTI CPU: 1 UID: 0 PID: 21436 Comm: kworker/u16:7 Not tainted 7.0.0-rc2-jcg+ #1 PREEMPT Hardware name: LENOVO 10VGS02P00/3130, BIOS M1XKT57A 02/10/2022 Workqueue: btrfs-delalloc btrfs_work_helper [btrfs] RIP: 0010:__list_del_entry_valid_or_report+0xec/0xf0 Call Trace: btrfs_alloc_compr_folio+0xae/0xc0 [btrfs] zlib_compress_bio+0x39d/0x6a0 [btrfs] btrfs_compress_bio+0x2e3/0x3d0 [btrfs] compress_file_range+0x2b0/0x660 [btrfs] btrfs_work_helper+0xdb/0x3e0 [btrfs] process_one_work+0x192/0x3d0 worker_thread+0x19a/0x310 kthread+0xdf/0x120 ret_from_fork+0x22e/0x310 ret_from_fork_asm+0x1a/0x30 ---[ end trace 0000000000000000 ]--- Other symptoms include VM_BUG_ON() during folio_put() but it's rarer. David Sterba firstly reported this during his CI runs but unfortunately I'm unable to hit it. Meanwhile zstd/lzo doesn't seem to have the same problem. [CAUSE] During zlib_compress_bio() every time the output buffer is full, we queue the full folio into the compressed bio, and allocate a new folio as the output folio. After the input has finished, we loop through zlib_deflate() with Z_FINISH to flush all output. And when that is done, we still need to check if the last folio has any content, and if so we still need to queue that part into the compressed bio. The problem is in the final folio handling, if the final folio is full (for x86_64 the folio size is 4K), the length to queue is calculated by u32 cur_len = offset_in_folio(out_folio, workspace->strm.total_out); But since total_out is 4K aligned, the resulted @cur_len will be 0, then we hit the bio_add_folio(), which has a quirk that if bio_add_folio() got an length 0, it will still queue the folio into the bio, but return false. In that case we go to out: tag, which calls btrfs_free_compr_folio() to release @out_folio, which may put the out folio into the btrfs global pool list. On the other hand, that @out_folio is already added to the compressed bio, and will later be released again by cleanup_compressed_bio(), which results double release. And if this time we still need to put the folio into the btrfs global pool list, it will result a list corruption because it's already in the list. [FIX] Instead of offset_inside_folio(), directly use the difference between strm.total_out and bi_size. So that if the last folio is completely full, we can still properly queue the full folio other than queueing zero byte. Fixes: 3d74a7556fba ("btrfs: zlib: introduce zlib_compress_bio() helper") Reported-by: David Sterba Reported-by: Jean-Christophe Guillain Reported-by: syzbot+3c4d8371d65230f852a2@syzkaller.appspotmail.com Link: https://bugzilla.kernel.org/show_bug.cgi?id=221176 Signed-off-by: Qu Wenruo Signed-off-by: David Sterba

Convert 'alloc_obj' family to use the new default GFP_KERNEL argument

2026-02-22T01:09:51+00:00

This was done entirely with mindless brute force, using git grep -l '\

treewide: Replace kmalloc with kmalloc_obj for non-scalar types

2026-02-21T09:02:28+00:00

This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...) (where TYPE may also be *VAR) The resulting allocations no longer return "void *", instead returning "TYPE *". Signed-off-by: Kees Cook

btrfs: remove the old btrfs_compress_folios() infrastructure

2026-02-03T06:59:07+00:00

Since it's been replaced by btrfs_compress_bio(), remove all involved functions. Reviewed-by: Boris Burkov Signed-off-by: Qu Wenruo Reviewed-by: David Sterba Signed-off-by: David Sterba

btrfs: zlib: introduce zlib_compress_bio() helper

2026-02-03T06:59:06+00:00

The new helper has the following enhancements against the existing zlib_compress_folios() - Much smaller parameter list No more shared IN/OUT members, no need to pre-allocate a compressed_folios[] array. Just a workspace and compressed_bio pointer, everything we need can be extracted from that @cb pointer. - Ready-to-be-submitted compressed bio Although the caller still needs to do some common works like rounding up and zeroing the tailing part of the last fs block. Reviewed-by: Boris Burkov Signed-off-by: Qu Wenruo Reviewed-by: David Sterba Signed-off-by: David Sterba