kernel/linux.git/fs/btrfs, branch v6.1.176

btrfs: fix missing last_unlink_trans update when removing a directory

2026-06-19T11:37:24+00:00

[ Upstream commit 999757231c49376cd1a37308d2c8c4c9932571e1 ] When removing a directory we are not updating its last_unlink_trans field, which can result in incorrect fsync behaviour in case some one fsyncs the directory after it was removed because it's holding a file descriptor on it. Example scenario: mkdir /mnt/dir1 mkdir /mnt/dir1/dir2 mkdir /mnt/dir3 sync -f /mnt # Do some change to the directory and fsync it. chmod 700 /mnt/dir1 xfs_io -c fsync /mnt/dir1 # Move dir2 out of dir1 so that dir1 becomes empty. mv /mnt/dir1/dir2 /mnt/dir3/ open fd on /mnt/dir1 call rmdir(2) on path "/mnt/dir1" fsync fd When attempting to mount the filesystem, the log replay will fail with an -EIO error and dmesg/syslog has the following: [445771.626482] BTRFS info (device dm-0): first mount of filesystem 0368bbea-6c5e-44b5-b409-09abe496e650 [445771.626486] BTRFS info (device dm-0): using crc32c checksum algorithm [445771.627912] BTRFS info (device dm-0): start tree-log replay [445771.628335] page: refcount:2 mapcount:0 mapping:0000000061443ddc index:0x1d00 pfn:0x7072a5 [445771.629453] memcg:ffff89f400351b00 [445771.629892] aops:btree_aops [btrfs] ino:1 [445771.630737] flags: 0x17fffc00000402a(uptodate|lru|private|writeback|node=0|zone=2|lastcpupid=0x1ffff) [445771.632359] raw: 017fffc00000402a fffff47284d950c8 fffff472907b7c08 ffff89f458e412b8 [445771.633713] raw: 0000000000001d00 ffff89f6c51d1a90 00000002ffffffff ffff89f400351b00 [445771.635029] page dumped because: eb page dump [445771.635825] BTRFS critical (device dm-0): corrupt leaf: root=5 block=30408704 slot=10 ino=258, invalid nlink: has 2 expect no more than 1 for dir [445771.638088] BTRFS info (device dm-0): leaf 30408704 gen 10 total ptrs 17 free space 14878 owner 5 [445771.638091] BTRFS info (device dm-0): refs 4 lock_owner 0 current 3581087 [445771.638094] item 0 key (256 INODE_ITEM 0) itemoff 16123 itemsize 160 [445771.638097] inode generation 3 transid 9 size 16 nbytes 16384 [445771.638098] block group 0 mode 40755 links 1 uid 0 gid 0 [445771.638100] rdev 0 sequence 2 flags 0x0 [445771.638102] atime 1775744884.0 [445771.660056] ctime 1775744885.645502983 [445771.660058] mtime 1775744885.645502983 [445771.660060] otime 1775744884.0 [445771.660062] item 1 key (256 INODE_REF 256) itemoff 16111 itemsize 12 [445771.660064] index 0 name_len 2 [445771.660066] item 2 key (256 DIR_ITEM 1843588421) itemoff 16077 itemsize 34 [445771.660068] location key (259 1 0) type 2 [445771.660070] transid 9 data_len 0 name_len 4 [445771.660075] item 3 key (256 DIR_ITEM 2363071922) itemoff 16043 itemsize 34 [445771.660076] location key (257 1 0) type 2 [445771.660077] transid 9 data_len 0 name_len 4 [445771.660078] item 4 key (256 DIR_INDEX 2) itemoff 16009 itemsize 34 [445771.660079] location key (257 1 0) type 2 [445771.660080] transid 9 data_len 0 name_len 4 [445771.660081] item 5 key (256 DIR_INDEX 3) itemoff 15975 itemsize 34 [445771.660082] location key (259 1 0) type 2 [445771.660083] transid 9 data_len 0 name_len 4 [445771.660084] item 6 key (257 INODE_ITEM 0) itemoff 15815 itemsize 160 [445771.660086] inode generation 9 transid 9 size 8 nbytes 0 [445771.660087] block group 0 mode 40777 links 1 uid 0 gid 0 [445771.660088] rdev 0 sequence 2 flags 0x0 [445771.660089] atime 1775744885.641174097 [445771.660090] ctime 1775744885.645502983 [445771.660091] mtime 1775744885.645502983 [445771.660105] otime 1775744885.641174097 [445771.660106] item 7 key (257 INODE_REF 256) itemoff 15801 itemsize 14 [445771.660107] index 2 name_len 4 [445771.660108] item 8 key (257 DIR_ITEM 2676584006) itemoff 15767 itemsize 34 [445771.660109] location key (258 1 0) type 2 [445771.660110] transid 9 data_len 0 name_len 4 [445771.660111] item 9 key (257 DIR_INDEX 2) itemoff 15733 itemsize 34 [445771.660112] location key (258 1 0) type 2 [445771.660113] transid 9 data_len 0 name_len 4 [445771.660114] item 10 key (258 INODE_ITEM 0) itemoff 15573 itemsize 160 [445771.660115] inode generation 9 transid 10 size 0 nbytes 0 [445771.660116] block group 0 mode 40755 links 2 uid 0 gid 0 [445771.660117] rdev 0 sequence 0 flags 0x0 [445771.660118] atime 1775744885.645502983 [445771.660119] ctime 1775744885.645502983 [445771.660120] mtime 1775744885.645502983 [445771.660121] otime 1775744885.645502983 [445771.660122] item 11 key (258 INODE_REF 257) itemoff 15559 itemsize 14 [445771.660123] index 2 name_len 4 [445771.660124] item 12 key (258 INODE_REF 259) itemoff 15545 itemsize 14 [445771.660125] index 2 name_len 4 [445771.660126] item 13 key (259 INODE_ITEM 0) itemoff 15385 itemsize 160 [445771.660127] inode generation 9 transid 10 size 8 nbytes 0 [445771.660128] block group 0 mode 40755 links 1 uid 0 gid 0 [445771.660129] rdev 0 sequence 1 flags 0x0 [445771.660130] atime 1775744885.645502983 [445771.660130] ctime 1775744885.645502983 [445771.660131] mtime 1775744885.645502983 [445771.660132] otime 1775744885.645502983 [445771.660133] item 14 key (259 INODE_REF 256) itemoff 15371 itemsize 14 [445771.660134] index 3 name_len 4 [445771.660135] item 15 key (259 DIR_ITEM 2676584006) itemoff 15337 itemsize 34 [445771.660136] location key (258 1 0) type 2 [445771.660137] transid 10 data_len 0 name_len 4 [445771.660138] item 16 key (259 DIR_INDEX 2) itemoff 15303 itemsize 34 [445771.660139] location key (258 1 0) type 2 [445771.660140] transid 10 data_len 0 name_len 4 [445771.660144] BTRFS error (device dm-0): block=30408704 write time tree block corruption detected [445771.661650] ------------[ cut here ]------------ [445771.662358] WARNING: fs/btrfs/disk-io.c:326 at btree_csum_one_bio+0x217/0x230 [btrfs], CPU#8: mount/3581087 [445771.663588] Modules linked in: btrfs f2fs xfs (...) [445771.671229] CPU: 8 UID: 0 PID: 3581087 Comm: mount Tainted: G W 7.0.0-rc6-btrfs-next-230+ #2 PREEMPT(full) [445771.672575] Tainted: [W]=WARN [445771.672987] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014 [445771.674460] RIP: 0010:btree_csum_one_bio+0x217/0x230 [btrfs] [445771.675222] Code: 89 44 24 (...) [445771.677364] RSP: 0018:ffffd23882247660 EFLAGS: 00010246 [445771.678029] RAX: 0000000000000000 RBX: ffff89f6c51d1a90 RCX: 0000000000000000 [445771.678975] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff89f406020000 [445771.679983] RBP: ffff89f821204000 R08: 0000000000000000 R09: 00000000ffefffff [445771.680905] R10: ffffd23882247448 R11: 0000000000000003 R12: ffffd23882247668 [445771.681978] R13: ffff89f458e40fc0 R14: ffff89f737f4f500 R15: ffff89f737f4f500 [445771.682912] FS: 00007f0447a98840(0000) GS:ffff89fb9771d000(0000) knlGS:0000000000000000 [445771.684393] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [445771.685230] CR2: 00007f0447bf1330 CR3: 000000017cb02002 CR4: 0000000000370ef0 [445771.686273] Call Trace: [445771.686646] [445771.686969] btrfs_submit_bbio+0x83f/0x860 [btrfs] [445771.687750] ? write_one_eb+0x28f/0x340 [btrfs] [445771.688428] btree_writepages+0x2e3/0x550 [btrfs] [445771.689180] ? kmem_cache_alloc_noprof+0x12a/0x490 [445771.689963] ? alloc_extent_state+0x19/0x120 [btrfs] [445771.690801] ? kmem_cache_free+0x135/0x380 [445771.691328] ? preempt_count_add+0x69/0xa0 [445771.691831] ? set_extent_bit+0x252/0x8e0 [btrfs] [445771.692468] ? xas_load+0x9/0xc0 [445771.692873] ? xas_find+0x14d/0x1a0 [445771.693304] do_writepages+0xc6/0x160 [445771.693756] filemap_writeback+0xb8/0xe0 [445771.694274] btrfs_write_marked_extents+0x61/0x170 [btrfs] [445771.694999] btrfs_write_and_wait_transaction+0x4e/0xc0 [btrfs] [445771.695818] btrfs_commit_transaction+0x5c8/0xd10 [btrfs] [445771.696530] ? kmem_cache_free+0x135/0x380 [445771.697120] ? release_extent_buffer+0x34/0x160 [btrfs] [445771.697786] btrfs_recover_log_trees+0x7be/0x7e0 [btrfs] [445771.698525] ? __pfx_replay_one_buffer+0x10/0x10 [btrfs] [445771.699206] open_ctree+0x11e5/0x1810 [btrfs] [445771.699776] btrfs_get_tree.cold+0xb/0x162 [btrfs] [445771.700463] ? fscontext_read+0x165/0x180 [445771.701146] ? rw_verify_area+0x50/0x180 [445771.701866] vfs_get_tree+0x25/0xd0 [445771.702491] vfs_cmd_create+0x59/0xe0 [445771.703125] __do_sys_fsconfig+0x303/0x610 [445771.703603] do_syscall_64+0xe9/0xf20 [445771.703974] entry_SYSCALL_64_after_hwframe+0x76/0x7e [445771.704700] RIP: 0033:0x7f0447cbd4aa [445771.705108] Code: 73 01 c3 (...) [445771.707263] RSP: 002b:00007ffc4e528318 EFLAGS: 00000246 ORIG_RAX: 00000000000001af [445771.708107] RAX: ffffffffffffffda RBX: 00005561585d8c20 RCX: 00007f0447cbd4aa [445771.708931] RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000000000003 [445771.709744] RBP: 00005561585d9120 R08: 0000000000000000 R09: 0000000000000000 [445771.710674] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 [445771.711477] R13: 00007f0447e4f580 R14: 00007f0447e5126c R15: 00007f0447e36a23 [445771.712277] [445771.712541] ---[ end trace 0000000000000000 ]--- [445771.713382] BTRFS error (device dm-0): error while writing out transaction: -5 [445771.714679] BTRFS warning (device dm-0): Skipping commit of aborted transaction. [445771.715562] BTRFS error (device dm-0 state A): Transaction aborted (error -5) [445771.716459] BTRFS: error (device dm-0 state A) in cleanup_transaction:2068: errno=-5 IO failure [445771.717936] BTRFS error (device dm-0 state EA): failed to recover log trees with error: -5 [445771.719681] BTRFS error (device dm-0 state EA): open_ctree failed: -5 The problem is that such a fsync should have result in a fallback to a transaction commit, but that did not happen because through the btrfs_rmdir() we never update the directory's last_unlink_trans field. Any inode that had a link removed must have its last_unlink_trans updated to the ID of transaction used for the operation, otherwise fsync and log replay will not work correctly. btrfs_rmdir() calls btrfs_unlink_inode() and through that call chain we never call btrfs_record_unlink_dir() in order to update last_unlink_trans. However btrfs_unlink(), which is used for unlinking regular files, calls btrfs_record_unlink_dir() and then calls btrfs_unlink_inode(). So fix this by moving the call to btrfs_record_unlink_dir() from btrfs_unlink() to btrfs_unlink_inode(). A test case for fstests will follow soon. Reported-by: Slava0135 Link: https://lore.kernel.org/linux-btrfs/CAAJYhww5ov62Hm+n+tmhcL-e_4cBobg+OWogKjOJxVUXivC=MQ@mail.gmail.com/ CC: stable@vger.kernel.org Signed-off-by: Filipe Manana Signed-off-by: David Sterba [ wrapped dir and inode arguments with BTRFS_I() since 6.1 btrfs_rmdir() uses struct inode * instead of struct btrfs_inode * ] Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman

btrfs: fix double free in create_space_info_sub_group() error path

2026-06-19T11:37:24+00:00

[ Upstream commit a7449edf96143f192606ec8647e3167e1ecbd728 ] When kobject_init_and_add() fails, the call chain is: create_space_info_sub_group() -> btrfs_sysfs_add_space_info_type() -> kobject_init_and_add() -> failure -> kobject_put(&sub_group->kobj) -> space_info_release() -> kfree(sub_group) Then control returns to create_space_info_sub_group(), where: btrfs_sysfs_add_space_info_type() returns error -> kfree(sub_group) Thus, sub_group is freed twice. Keep parent->sub_group[index] = NULL for the failure path, but after btrfs_sysfs_add_space_info_type() has called kobject_put(), let the kobject release callback handle the cleanup. Fixes: f92ee31e031c ("btrfs: introduce btrfs_space_info sub-group") CC: stable@vger.kernel.org # 6.18+ Reviewed-by: Qu Wenruo Signed-off-by: Guangshuo Li Signed-off-by: David Sterba Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman

btrfs: remove fs_info argument from btrfs_sysfs_add_space_info_type()

2026-06-19T11:37:24+00:00

[ Upstream commit 771af6ff72e0ed0eb8bf97e5ae4fa5094e0c5d1d ] We don't need it since we can grab fs_info from the given space_info. So remove the fs_info argument. Reviewed-by: Johannes Thumshirn Signed-off-by: Filipe Manana Reviewed-by: David Sterba Signed-off-by: David Sterba Stable-dep-of: a7449edf9614 ("btrfs: fix double free in create_space_info_sub_group() error path") Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman

btrfs: fix btrfs_ioctl_space_info() slot_count TOCTOU which can lead to info-leak

2026-06-19T11:37:24+00:00

[ Upstream commit 973e57c726c1f8e77259d1c8e519519f1e9aea77 ] btrfs_ioctl_space_info() has a TOCTOU race between two passes over the block group RAID type lists. The first pass counts entries to determine the allocation size, then the second pass fills the buffer. The groups_sem rwlock is released between passes, allowing concurrent block group removal to reduce the entry count. When the second pass fills fewer entries than the first pass counted, copy_to_user() copies the full alloc_size bytes including trailing uninitialized kmalloc bytes to userspace. Fix by copying only total_spaces entries (the actually-filled count from the second pass) instead of alloc_size bytes, and switch to kzalloc so any future copy size mismatch cannot leak heap data. Fixes: 7fde62bffb57 ("Btrfs: buffer results in the space_info ioctl") CC: stable@vger.kernel.org # 3.0 Signed-off-by: Yochai Eisenrich Reviewed-by: David Sterba Signed-off-by: David Sterba [ adapted upstream's `return -EFAULT;` to stable's `ret = -EFAULT;` fall-through to existing `out:` cleanup label ] Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman

btrfs: fix double-decrement of bytes_may_use in submit_one_async_extent()

2026-06-01T15:39:26+00:00

[ Upstream commit 82323b1a7088b7a5c3e528a5d634bff447fa286f ] submit_one_async_extent() calls btrfs_reserve_extent(), which decrements bytes_may_use. If the call btrfs_create_io_em() fails, we jump to out_free_reserve, which calls extent_clear_unlock_delalloc(). Because we're specifying EXTENT_DO_ACCOUNTING, i.e. EXTENT_CLEAR_META_RESV | EXTENT_CLEAR_DATA_RESV, this decreases bytes_may_use again. This can lead to problems later on, as an initial write can fail only for the writeback to silently ENOSPC. Fix this by replacing EXTENT_DO_ACCOUNTING with EXTENT_CLEAR_META_RESV. This parallels a4fe134fc1d8eb ("btrfs: fix a double release on reserved extents in cow_one_range()"), which is the same fix in cow_one_range(). Fixes: 151a41bc46df ("Btrfs: fix what bits we clear when erroring out from delalloc") Reviewed-by: Qu Wenruo Signed-off-by: Mark Harmstone Signed-off-by: David Sterba Signed-off-by: Sasha Levin

btrfs: fix double free in create_space_info() error path

2026-06-01T15:38:58+00:00

commit 3f487be81292702a59ea9dbc4088b3360a50e837 upstream. When kobject_init_and_add() fails, the call chain is: create_space_info() -> btrfs_sysfs_add_space_info_type() -> kobject_init_and_add() -> failure -> kobject_put(&space_info->kobj) -> space_info_release() -> kfree(space_info) Then control returns to create_space_info(): btrfs_sysfs_add_space_info_type() returns error -> goto out_free -> kfree(space_info) This causes a double free. Keep the direct kfree(space_info) for the earlier failure path, but after btrfs_sysfs_add_space_info_type() has called kobject_put(), let the kobject release callback handle the cleanup. Fixes: a11224a016d6d ("btrfs: fix memory leaks in create_space_info() error paths") CC: stable@vger.kernel.org # 6.19+ Reviewed-by: Qu Wenruo Signed-off-by: Guangshuo Li Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman

btrfs: do not free data reservation in fallback from inline due to -ENOSPC

2026-04-11T12:16:32+00:00

[ Upstream commit f8da41de0bff9eb1d774a7253da0c9f637c4470a ] If we fail to create an inline extent due to -ENOSPC, we will attempt to go through the normal COW path, reserve an extent, create an ordered extent, etc. However we were always freeing the reserved qgroup data, which is wrong since we will use data. Fix this by freeing the reserved qgroup data in __cow_file_range_inline() only if we are not doing the fallback (ret is <= 0). Reviewed-by: Qu Wenruo Signed-off-by: Filipe Manana Reviewed-by: David Sterba Signed-off-by: David Sterba Signed-off-by: Sasha Levin

btrfs: fix the qgroup data free range for inline data extents

2026-04-11T12:16:32+00:00

[ Upstream commit 0bb067ca64e35536f1f5d9ef6aaafc40f4833623 ] Inside function __cow_file_range_inline() since the inlined data no longer take any data space, we need to free up the reserved space. However the code is still using the old page size == sector size assumption, and will not handle subpage case well. Thankfully it is not going to cause any problems because we have two extra safe nets: - Inline data extents creation is disabled for sector size < page size cases for now But it won't stay that for long. - btrfs_qgroup_free_data() will only clear ranges which have been already reserved So even if we pass a range larger than what we need, it should still be fine, especially there is only reserved space for a single block at file offset 0 of an inline data extent. But just for the sake of consistency, fix the call site to use sectorsize instead of page size. Reviewed-by: Filipe Manana Signed-off-by: Qu Wenruo Reviewed-by: David Sterba Signed-off-by: David Sterba Stable-dep-of: f8da41de0bff ("btrfs: do not free data reservation in fallback from inline due to -ENOSPC") Signed-off-by: Sasha Levin

btrfs: reject root items with drop_progress and zero drop_level

2026-04-11T12:16:19+00:00

[ Upstream commit b17b79ff896305fd74980a5f72afec370ee88ca4 ] [BUG] When recovering relocation at mount time, merge_reloc_root() and btrfs_drop_snapshot() both use BUG_ON(level == 0) to guard against an impossible state: a non-zero drop_progress combined with a zero drop_level in a root_item, which can be triggered: ------------[ cut here ]------------ kernel BUG at fs/btrfs/relocation.c:1545! Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI CPU: 1 UID: 0 PID: 283 ... Tainted: 6.18.0+ #16 PREEMPT(voluntary) Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE Hardware name: QEMU Ubuntu 24.04 PC v2, BIOS 1.16.3-debian-1.16.3-2 RIP: 0010:merge_reloc_root+0x1266/0x1650 fs/btrfs/relocation.c:1545 Code: ffff0000 00004589 d7e9acfa ffffe8a1 79bafebe 02000000 Call Trace: merge_reloc_roots+0x295/0x890 fs/btrfs/relocation.c:1861 btrfs_recover_relocation+0xd6e/0x11d0 fs/btrfs/relocation.c:4195 btrfs_start_pre_rw_mount+0xa4d/0x1810 fs/btrfs/disk-io.c:3130 open_ctree+0x5824/0x5fe0 fs/btrfs/disk-io.c:3640 btrfs_fill_super fs/btrfs/super.c:987 [inline] btrfs_get_tree_super fs/btrfs/super.c:1951 [inline] btrfs_get_tree_subvol fs/btrfs/super.c:2094 [inline] btrfs_get_tree+0x111c/0x2190 fs/btrfs/super.c:2128 vfs_get_tree+0x9a/0x370 fs/super.c:1758 fc_mount fs/namespace.c:1199 [inline] do_new_mount_fc fs/namespace.c:3642 [inline] do_new_mount fs/namespace.c:3718 [inline] path_mount+0x5b8/0x1ea0 fs/namespace.c:4028 do_mount fs/namespace.c:4041 [inline] __do_sys_mount fs/namespace.c:4229 [inline] __se_sys_mount fs/namespace.c:4206 [inline] __x64_sys_mount+0x282/0x320 fs/namespace.c:4206 ... RIP: 0033:0x7f969c9a8fde Code: 0f1f4000 48c7c2b0 fffffff7 d8648902 b8ffffff ffc3660f ---[ end trace 0000000000000000 ]--- The bug is reproducible on 7.0.0-rc2-next-20260310 with our dynamic metadata fuzzing tool that corrupts btrfs metadata at runtime. [CAUSE] A non-zero drop_progress.objectid means an interrupted btrfs_drop_snapshot() left a resume point on disk, and in that case drop_level must be greater than 0 because the checkpoint is only saved at internal node levels. Although this invariant is enforced when the kernel writes the root item, it is not validated when the root item is read back from disk. That allows on-disk corruption to provide an invalid state with drop_progress.objectid != 0 and drop_level == 0. When relocation recovery later processes such a root item, merge_reloc_root() reads drop_level and hits BUG_ON(level == 0). The same invalid metadata can also trigger the corresponding BUG_ON() in btrfs_drop_snapshot(). [FIX] Fix this by validating the root_item invariant in tree-checker when reading root items from disk: if drop_progress.objectid is non-zero, drop_level must also be non-zero. Reject such malformed metadata with -EUCLEAN before it reaches merge_reloc_root() or btrfs_drop_snapshot() and triggers the BUG_ON. After the fix, the same corruption is correctly rejected by tree-checker and the BUG_ON is no longer triggered. Reviewed-by: Qu Wenruo Signed-off-by: ZhengYuan Huang Signed-off-by: David Sterba Signed-off-by: Sasha Levin

btrfs: don't take device_list_mutex when querying zone info

2026-04-11T12:16:18+00:00

[ Upstream commit 77603ab10429fe713a03345553ca8dbbfb1d91c6 ] Shin'ichiro reported sporadic hangs when running generic/013 in our CI system. When enabling lockdep, there is a lockdep splat when calling btrfs_get_dev_zone_info_all_devices() in the mount path that can be triggered by i.e. generic/013: ====================================================== WARNING: possible circular locking dependency detected 7.0.0-rc1+ #355 Not tainted ------------------------------------------------------ mount/1043 is trying to acquire lock: ffff8881020b5470 (&vblk->vdev_mutex){+.+.}-{4:4}, at: virtblk_report_zones+0xda/0x430 but task is already holding lock: ffff888102a738e0 (&fs_devs->device_list_mutex){+.+.}-{4:4}, at: btrfs_get_dev_zone_info_all_devices+0x45/0x90 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #4 (&fs_devs->device_list_mutex){+.+.}-{4:4}: __mutex_lock+0xa3/0x1360 btrfs_create_pending_block_groups+0x1f4/0x9d0 __btrfs_end_transaction+0x3e/0x2e0 btrfs_zoned_reserve_data_reloc_bg+0x2f8/0x390 open_ctree+0x1934/0x23db btrfs_get_tree.cold+0x105/0x26c vfs_get_tree+0x28/0xb0 __do_sys_fsconfig+0x324/0x680 do_syscall_64+0x92/0x4f0 entry_SYSCALL_64_after_hwframe+0x76/0x7e -> #3 (btrfs_trans_num_extwriters){++++}-{0:0}: join_transaction+0xc2/0x5c0 start_transaction+0x17c/0xbc0 btrfs_zoned_reserve_data_reloc_bg+0x2b4/0x390 open_ctree+0x1934/0x23db btrfs_get_tree.cold+0x105/0x26c vfs_get_tree+0x28/0xb0 __do_sys_fsconfig+0x324/0x680 do_syscall_64+0x92/0x4f0 entry_SYSCALL_64_after_hwframe+0x76/0x7e -> #2 (btrfs_trans_num_writers){++++}-{0:0}: lock_release+0x163/0x4b0 __btrfs_end_transaction+0x1c7/0x2e0 btrfs_dirty_inode+0x6f/0xd0 touch_atime+0xe5/0x2c0 btrfs_file_mmap_prepare+0x65/0x90 __mmap_region+0x4b9/0xf00 mmap_region+0xf7/0x120 do_mmap+0x43d/0x610 vm_mmap_pgoff+0xd6/0x190 ksys_mmap_pgoff+0x7e/0xc0 do_syscall_64+0x92/0x4f0 entry_SYSCALL_64_after_hwframe+0x76/0x7e -> #1 (&mm->mmap_lock){++++}-{4:4}: __might_fault+0x68/0xa0 _copy_to_user+0x22/0x70 blkdev_copy_zone_to_user+0x22/0x40 virtblk_report_zones+0x282/0x430 blkdev_report_zones_ioctl+0xfd/0x130 blkdev_ioctl+0x20f/0x2c0 __x64_sys_ioctl+0x86/0xd0 do_syscall_64+0x92/0x4f0 entry_SYSCALL_64_after_hwframe+0x76/0x7e -> #0 (&vblk->vdev_mutex){+.+.}-{4:4}: __lock_acquire+0x1522/0x2680 lock_acquire+0xd5/0x2f0 __mutex_lock+0xa3/0x1360 virtblk_report_zones+0xda/0x430 blkdev_report_zones_cached+0x162/0x190 btrfs_get_dev_zones+0xdc/0x2e0 btrfs_get_dev_zone_info+0x219/0xe80 btrfs_get_dev_zone_info_all_devices+0x62/0x90 open_ctree+0x1200/0x23db btrfs_get_tree.cold+0x105/0x26c vfs_get_tree+0x28/0xb0 __do_sys_fsconfig+0x324/0x680 do_syscall_64+0x92/0x4f0 entry_SYSCALL_64_after_hwframe+0x76/0x7e other info that might help us debug this: Chain exists of: &vblk->vdev_mutex --> btrfs_trans_num_extwriters --> &fs_devs->device_list_mutex Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&fs_devs->device_list_mutex); lock(btrfs_trans_num_extwriters); lock(&fs_devs->device_list_mutex); lock(&vblk->vdev_mutex); *** DEADLOCK *** 3 locks held by mount/1043: #0: ffff88811063e878 (&fc->uapi_mutex){+.+.}-{4:4}, at: __do_sys_fsconfig+0x2ae/0x680 #1: ffff88810cb9f0e8 (&type->s_umount_key#31/1){+.+.}-{4:4}, at: alloc_super+0xc0/0x3e0 #2: ffff888102a738e0 (&fs_devs->device_list_mutex){+.+.}-{4:4}, at: btrfs_get_dev_zone_info_all_devices+0x45/0x90 stack backtrace: CPU: 2 UID: 0 PID: 1043 Comm: mount Not tainted 7.0.0-rc1+ #355 PREEMPT(full) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-9.fc43 06/10/2025 Call Trace: dump_stack_lvl+0x5b/0x80 print_circular_bug.cold+0x18d/0x1d8 check_noncircular+0x10d/0x130 __lock_acquire+0x1522/0x2680 ? vmap_small_pages_range_noflush+0x3ef/0x820 lock_acquire+0xd5/0x2f0 ? virtblk_report_zones+0xda/0x430 ? lock_is_held_type+0xcd/0x130 __mutex_lock+0xa3/0x1360 ? virtblk_report_zones+0xda/0x430 ? virtblk_report_zones+0xda/0x430 ? __pfx_copy_zone_info_cb+0x10/0x10 ? virtblk_report_zones+0xda/0x430 virtblk_report_zones+0xda/0x430 ? __pfx_copy_zone_info_cb+0x10/0x10 blkdev_report_zones_cached+0x162/0x190 ? __pfx_copy_zone_info_cb+0x10/0x10 btrfs_get_dev_zones+0xdc/0x2e0 btrfs_get_dev_zone_info+0x219/0xe80 btrfs_get_dev_zone_info_all_devices+0x62/0x90 open_ctree+0x1200/0x23db btrfs_get_tree.cold+0x105/0x26c ? rcu_is_watching+0x18/0x50 vfs_get_tree+0x28/0xb0 __do_sys_fsconfig+0x324/0x680 do_syscall_64+0x92/0x4f0 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f615e27a40e RSP: 002b:00007fff11b18fb8 EFLAGS: 00000246 ORIG_RAX: 00000000000001af RAX: ffffffffffffffda RBX: 000055572e92ab10 RCX: 00007f615e27a40e RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000000000003 RBP: 00007fff11b19100 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 000055572e92bc40 R14: 00007f615e3faa60 R15: 000055572e92bd08 Don't hold the device_list_mutex while calling into btrfs_get_dev_zone_info() in btrfs_get_dev_zone_info_all_devices() to mitigate the issue. This is safe, as no other thread can touch the device list at the moment of execution. Reported-by: Shin'ichiro Kawasaki Reviewed-by: Damien Le Moal Signed-off-by: Johannes Thumshirn Reviewed-by: David Sterba Signed-off-by: David Sterba Signed-off-by: Sasha Levin