summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2024-08-11f2fs: fix to avoid use SSR allocate when do defragmentZhiguo Niu1-1/+2
[ Upstream commit 21327a042dd94bc73181d7300e688699cb1f467e ] SSR allocate mode will be used when doing file defragment if ATGC is working at the same time, that is because set_page_private_gcing may make CURSEG_ALL_DATA_ATGC segment type got in f2fs_allocate_data_block when defragment page is writeback, which may cause file fragmentation is worse. A file with 2 fragmentations is changed as following after defragment: ----------------file info------------------- sensorsdata : -------------------------------------------- dev [254:48] ino [0x 3029 : 12329] mode [0x 81b0 : 33200] nlink [0x 1 : 1] uid [0x 27e6 : 10214] gid [0x 27e6 : 10214] size [0x 242000 : 2367488] blksize [0x 1000 : 4096] blocks [0x 1210 : 4624] -------------------------------------------- file_pos start_blk end_blk blks 0 11361121 11361207 87 356352 11361215 11361216 2 364544 11361218 11361218 1 368640 11361220 11361221 2 376832 11361224 11361225 2 385024 11361227 11361238 12 434176 11361240 11361252 13 487424 11361254 11361254 1 491520 11361271 11361279 9 528384 3681794 3681795 2 536576 3681797 3681797 1 540672 3681799 3681799 1 544768 3681803 3681803 1 548864 3681805 3681805 1 552960 3681807 3681807 1 557056 3681809 3681809 1 Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Stable-dep-of: 8cb1f4080dd9 ("f2fs: assign CURSEG_ALL_DATA_ATGC if blkaddr is valid") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-11ext4: check the extent status again before inserting delalloc blockZhang Yi1-0/+21
[ Upstream commit 0ea6560abb3bac1ffcfa4bf6b2c4d344fdc27b3c ] ext4_da_map_blocks looks up for any extent entry in the extent status tree (w/o i_data_sem) and then the looks up for any ondisk extent mapping (with i_data_sem in read mode). If it finds a hole in the extent status tree or if it couldn't find any entry at all, it then takes the i_data_sem in write mode to add a da entry into the extent status tree. This can actually race with page mkwrite & fallocate path. Note that this is ok between 1. ext4 buffered-write path v/s ext4_page_mkwrite(), because of the folio lock 2. ext4 buffered write path v/s ext4 fallocate because of the inode lock. But this can race between ext4_page_mkwrite() & ext4 fallocate path ext4_page_mkwrite() ext4_fallocate() block_page_mkwrite() ext4_da_map_blocks() //find hole in extent status tree ext4_alloc_file_blocks() ext4_map_blocks() //allocate block and unwritten extent ext4_insert_delayed_block() ext4_da_reserve_space() //reserve one more block ext4_es_insert_delayed_block() //drop unwritten extent and add delayed extent by mistake Then, the delalloc extent is wrong until writeback and the extra reserved block can't be released any more and it triggers below warning: EXT4-fs (pmem2): Inode 13 (00000000bbbd4d23): i_reserved_data_blocks(1) not cleared! Fix the problem by looking up extent status tree again while the i_data_sem is held in write mode. If it still can't find any entry, then we insert a new da entry into the extent status tree. Cc: stable@vger.kernel.org Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240517124005.347221-3-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-11ext4: factor out a common helper to query extent mapZhang Yi1-25/+32
[ Upstream commit 8e4e5cdf2fdeb99445a468b6b6436ad79b9ecb30 ] Factor out a new common helper ext4_map_query_blocks() from the ext4_da_map_blocks(), it query and return the extent map status on the inode's extent path, no logic changes. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Link: https://patch.msgid.link/20240517124005.347221-2-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Stable-dep-of: 0ea6560abb3b ("ext4: check the extent status again before inserting delalloc block") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-11ext4: convert to exclusive lock while inserting delalloc extentsZhang Yi1-14/+11
[ Upstream commit acf795dc161f3cf481db20f05db4250714e375e5 ] ext4_da_map_blocks() only hold i_data_sem in shared mode and i_rwsem when inserting delalloc extents, it could be raced by another querying path of ext4_map_blocks() without i_rwsem, .e.g buffered read path. Suppose we buffered read a file containing just a hole, and without any cached extents tree, then it is raced by another delayed buffered write to the same area or the near area belongs to the same hole, and the new delalloc extent could be overwritten to a hole extent. pread() pwrite() filemap_read_folio() ext4_mpage_readpages() ext4_map_blocks() down_read(i_data_sem) ext4_ext_determine_hole() //find hole ext4_ext_put_gap_in_cache() ext4_es_find_extent_range() //no delalloc extent ext4_da_map_blocks() down_read(i_data_sem) ext4_insert_delayed_block() //insert delalloc extent ext4_es_insert_extent() //overwrite delalloc extent to hole This race could lead to inconsistent delalloc extents tree and incorrect reserved space counter. Fix this by converting to hold i_data_sem in exclusive mode when adding a new delalloc extent in ext4_da_map_blocks(). Cc: stable@vger.kernel.org Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240127015825.1608160-3-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Stable-dep-of: 0ea6560abb3b ("ext4: check the extent status again before inserting delalloc block") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-11ext4: refactor ext4_da_map_blocks()Zhang Yi1-22/+17
[ Upstream commit 3fcc2b887a1ba4c1f45319cd8c54daa263ecbc36 ] Refactor and cleanup ext4_da_map_blocks(), reduce some unnecessary parameters and branches, no logic changes. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240127015825.1608160-2-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Stable-dep-of: 0ea6560abb3b ("ext4: check the extent status again before inserting delalloc block") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-11ext4: make ext4_es_insert_extent() return voidBaokun Li4-28/+18
[ Upstream commit 6c120399cde6b1b5cf65ce403765c579fb3d3e50 ] Now ext4_es_insert_extent() never return error, so make it return void. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230424033846.4732-12-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Stable-dep-of: 0ea6560abb3b ("ext4: check the extent status again before inserting delalloc block") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-11sysctl: always initialize i_uid/i_gidThomas Weißschuh1-4/+2
[ Upstream commit 98ca62ba9e2be5863c7d069f84f7166b45a5b2f4 ] Always initialize i_uid/i_gid inside the sysfs core so set_ownership() can safely skip setting them. Commit 5ec27ec735ba ("fs/proc/proc_sysctl.c: fix the default values of i_uid/i_gid on /proc/sys inodes.") added defaults for i_uid/i_gid when set_ownership() was not implemented. It also missed adjusting net_ctl_set_ownership() to use the same default values in case the computation of a better value failed. Fixes: 5ec27ec735ba ("fs/proc/proc_sysctl.c: fix the default values of i_uid/i_gid on /proc/sys inodes.") Cc: stable@vger.kernel.org Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-11sysctl: treewide: drop unused argument ctl_table_root::set_ownership(table)Thomas Weißschuh1-1/+1
[ Upstream commit 520713a93d550406dae14d49cdb8778d70cecdfd ] Remove the 'table' argument from set_ownership as it is never used. This change is a step towards putting "struct ctl_table" into .rodata and eventually having sysctl core only use "const struct ctl_table". The patch was created with the following coccinelle script: @@ identifier func, head, table, uid, gid; @@ void func( struct ctl_table_header *head, - struct ctl_table *table, kuid_t *uid, kgid_t *gid) { ... } No additional occurrences of 'set_ownership' were found after doing a tree-wide search. Reviewed-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Joel Granados <j.granados@samsung.com> Stable-dep-of: 98ca62ba9e2b ("sysctl: always initialize i_uid/i_gid") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03fs: don't allow non-init s_user_ns for filesystems without FS_USERNS_MOUNTSeth Forshee (DigitalOcean)1-0/+11
[ Upstream commit e1c5ae59c0f22f7fe5c07fb5513a29e4aad868c9 ] Christian noticed that it is possible for a privileged user to mount most filesystems with a non-initial user namespace in sb->s_user_ns. When fsopen() is called in a non-init namespace the caller's namespace is recorded in fs_context->user_ns. If the returned file descriptor is then passed to a process priviliged in init_user_ns, that process can call fsconfig(fd_fs, FSCONFIG_CMD_CREATE), creating a new superblock with sb->s_user_ns set to the namespace of the process which called fsopen(). This is problematic. We cannot assume that any filesystem which does not set FS_USERNS_MOUNT has been written with a non-initial s_user_ns in mind, increasing the risk for bugs and security issues. Prevent this by returning EPERM from sget_fc() when FS_USERNS_MOUNT is not set for the filesystem and a non-initial user namespace will be used. sget() does not need to be updated as it always uses the user namespace of the current context, or the initial user namespace if SB_SUBMOUNT is set. Fixes: cb50b348c71f ("convenience helpers: vfs_get_super() and sget_fc()") Reported-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org> Link: https://lore.kernel.org/r/20240724-s_user_ns-fix-v1-1-895d07c94701@kernel.org Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03ceph: fix incorrect kmalloc size of pagevec mempoolethanwu1-1/+2
[ Upstream commit 03230edb0bd831662a7c08b6fef66b2a9a817774 ] The kmalloc size of pagevec mempool is incorrectly calculated. It misses the size of page pointer and only accounts the number for the array. Fixes: a0102bda5bc0 ("ceph: move sb->wb_pagevec_pool to be a global mempool") Signed-off-by: ethanwu <ethanwu@synology.com> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03f2fs: fix to update user block counts in block_operations()Chao Yu1-5/+5
[ Upstream commit f06c0f82e38bbda7264d6ef3c90045ad2810e0f3 ] Commit 59c9081bc86e ("f2fs: allow write page cache when writting cp") allows write() to write data to page cache during checkpoint, so block count fields like .total_valid_block_count, .alloc_valid_block_count and .rf_node_block_count may encounter race condition as below: CP Thread A - write_checkpoint - block_operations - f2fs_down_write(&sbi->node_change) - __prepare_cp_block : ckpt->valid_block_count = .total_valid_block_count - f2fs_up_write(&sbi->node_change) - write - f2fs_preallocate_blocks - f2fs_map_blocks(,F2FS_GET_BLOCK_PRE_AIO) - f2fs_map_lock - f2fs_down_read(&sbi->node_change) - f2fs_reserve_new_blocks - inc_valid_block_count : percpu_counter_add(&sbi->alloc_valid_block_count, count) : sbi->total_valid_block_count += count - f2fs_up_read(&sbi->node_change) - do_checkpoint : sbi->last_valid_block_count = sbi->total_valid_block_count : percpu_counter_set(&sbi->alloc_valid_block_count, 0) : percpu_counter_set(&sbi->rf_node_block_count, 0) - fsync - need_do_checkpoint - f2fs_space_for_roll_forward : alloc_valid_block_count was reset to zero, so, it may missed last data during checkpoint Let's change to update .total_valid_block_count, .alloc_valid_block_count and .rf_node_block_count in block_operations(), then their access can be protected by .node_change and .cp_rwsem lock, so that it can avoid above race condition. Fixes: 59c9081bc86e ("f2fs: allow write page cache when writting cp") Cc: Yunlei He <heyunlei@oppo.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03f2fs: fix start segno of large sectionSheng Yong1-1/+2
[ Upstream commit 8c409989678e92e4a737e7cd2bb04f3efb81071a ] get_ckpt_valid_blocks() checks valid ckpt blocks in current section. It counts all vblocks from the first to the last segment in the large section. However, START_SEGNO() is used to get the first segno in an SIT block. This patch fixes that to get the correct start segno. Fixes: 61461fc921b7 ("f2fs: fix to avoid touching checkpointed data in get_victim()") Signed-off-by: Sheng Yong <shengyong@oppo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03jfs: Fix array-index-out-of-bounds in diFreeJeongjun Park1-1/+4
[ Upstream commit f73f969b2eb39ad8056f6c7f3a295fa2f85e313a ] Reported-by: syzbot+241c815bda521982cb49@syzkaller.appspotmail.com Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Jeongjun Park <aha310510@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03nilfs2: handle inconsistent state in nilfs_btnode_create_block()Ryusuke Konishi2-7/+22
commit 4811f7af6090e8f5a398fbdd766f903ef6c0d787 upstream. Syzbot reported that a buffer state inconsistency was detected in nilfs_btnode_create_block(), triggering a kernel bug. It is not appropriate to treat this inconsistency as a bug; it can occur if the argument block address (the buffer index of the newly created block) is a virtual block number and has been reallocated due to corruption of the bitmap used to manage its allocation state. So, modify nilfs_btnode_create_block() and its callers to treat it as a possible filesystem error, rather than triggering a kernel bug. Link: https://lkml.kernel.org/r/20240725052007.4562-1-konishi.ryusuke@gmail.com Fixes: a60be987d45d ("nilfs2: B-tree node cache") Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+89cc4f2324ed37988b60@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=89cc4f2324ed37988b60 Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-08-03f2fs: fix return value of f2fs_convert_inline_inode()Chao Yu1-2/+4
commit a8eb3de28e7a365690c61161e7a07a4fc7c60bbf upstream. If device is readonly, make f2fs_convert_inline_inode() return EROFS instead of zero, otherwise it may trigger panic during writeback of inline inode's dirty page as below: f2fs_write_single_data_page+0xbb6/0x1e90 fs/f2fs/data.c:2888 f2fs_write_cache_pages fs/f2fs/data.c:3187 [inline] __f2fs_write_data_pages fs/f2fs/data.c:3342 [inline] f2fs_write_data_pages+0x1efe/0x3a90 fs/f2fs/data.c:3369 do_writepages+0x359/0x870 mm/page-writeback.c:2634 filemap_fdatawrite_wbc+0x125/0x180 mm/filemap.c:397 __filemap_fdatawrite_range mm/filemap.c:430 [inline] file_write_and_wait_range+0x1aa/0x290 mm/filemap.c:788 f2fs_do_sync_file+0x68a/0x1ae0 fs/f2fs/file.c:276 generic_write_sync include/linux/fs.h:2806 [inline] f2fs_file_write_iter+0x7bd/0x24e0 fs/f2fs/file.c:4977 call_write_iter include/linux/fs.h:2114 [inline] new_sync_write fs/read_write.c:497 [inline] vfs_write+0xa72/0xc90 fs/read_write.c:590 ksys_write+0x1a0/0x2c0 fs/read_write.c:643 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Cc: stable@vger.kernel.org Reported-by: syzbot+848062ba19c8782ca5c8@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-f2fs-devel/000000000000d103ce06174d7ec3@google.com Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-08-03f2fs: fix to don't dirty inode for readonly filesystemChao Yu1-0/+3
commit 192b8fb8d1c8ca3c87366ebbef599fa80bb626b8 upstream. syzbot reports f2fs bug as below: kernel BUG at fs/f2fs/inode.c:933! RIP: 0010:f2fs_evict_inode+0x1576/0x1590 fs/f2fs/inode.c:933 Call Trace: evict+0x2a4/0x620 fs/inode.c:664 dispose_list fs/inode.c:697 [inline] evict_inodes+0x5f8/0x690 fs/inode.c:747 generic_shutdown_super+0x9d/0x2c0 fs/super.c:675 kill_block_super+0x44/0x90 fs/super.c:1667 kill_f2fs_super+0x303/0x3b0 fs/f2fs/super.c:4894 deactivate_locked_super+0xc1/0x130 fs/super.c:484 cleanup_mnt+0x426/0x4c0 fs/namespace.c:1256 task_work_run+0x24a/0x300 kernel/task_work.c:180 ptrace_notify+0x2cd/0x380 kernel/signal.c:2399 ptrace_report_syscall include/linux/ptrace.h:411 [inline] ptrace_report_syscall_exit include/linux/ptrace.h:473 [inline] syscall_exit_work kernel/entry/common.c:251 [inline] syscall_exit_to_user_mode_prepare kernel/entry/common.c:278 [inline] __syscall_exit_to_user_mode_work kernel/entry/common.c:283 [inline] syscall_exit_to_user_mode+0x15c/0x280 kernel/entry/common.c:296 do_syscall_64+0x50/0x110 arch/x86/entry/common.c:88 entry_SYSCALL_64_after_hwframe+0x63/0x6b The root cause is: - do_sys_open - f2fs_lookup - __f2fs_find_entry - f2fs_i_depth_write - f2fs_mark_inode_dirty_sync - f2fs_dirty_inode - set_inode_flag(inode, FI_DIRTY_INODE) - umount - kill_f2fs_super - kill_block_super - generic_shutdown_super - sync_filesystem : sb is readonly, skip sync_filesystem() - evict_inodes - iput - f2fs_evict_inode - f2fs_bug_on(sbi, is_inode_flag_set(inode, FI_DIRTY_INODE)) : trigger kernel panic When we try to repair i_current_depth in readonly filesystem, let's skip dirty inode to avoid panic in later f2fs_evict_inode(). Cc: stable@vger.kernel.org Reported-by: syzbot+31e4659a3fe953aec2f4@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-f2fs-devel/000000000000e890bc0609a55cff@google.com Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-08-03f2fs: fix to force buffered IO on inline_data inodeChao Yu1-0/+2
commit 5c8764f8679e659c5cb295af7d32279002d13735 upstream. It will return all zero data when DIO reading from inline_data inode, it is because f2fs_iomap_begin() assign iomap->type w/ IOMAP_HOLE incorrectly for this case. We can let iomap framework handle inline data via assigning iomap->type and iomap->inline_data correctly, however, it will be a little bit complicated when handling race case in between direct IO and buffered IO. So, let's force to use buffered IO to fix this issue. Cc: stable@vger.kernel.org Reported-by: Barry Song <v-songbaohua@oppo.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-08-03fs/ntfs3: Update log->page_{mask,bits} if log->page_size changedHuacai Chen1-0/+3
commit 2fef55d8f78383c8e6d6d4c014b9597375132696 upstream. If an NTFS file system is mounted to another system with different PAGE_SIZE from the original system, log->page_size will change in log_replay(), but log->page_{mask,bits} don't change correspondingly. This will cause a panic because "u32 bytes = log->page_size - page_off" will get a negative value in the later read_log_page(). Cc: stable@vger.kernel.org Fixes: b46acd6a6a627d876898e ("fs/ntfs3: Add NTFS journal") Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-08-03jbd2: make jbd2_journal_get_max_txn_bufs() internalJan Kara2-1/+6
commit 4aa99c71e42ad60178c1154ec24e3df9c684fb67 upstream. There's no reason to have jbd2_journal_get_max_txn_bufs() public function. Currently all users are internal and can use journal->j_max_transaction_buffers instead. This saves some unnecessary recomputations of the limit as a bonus which becomes important as this function gets more complex in the following patch. CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20240624170127.3253-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-08-03ext4: make sure the first directory block is not a holeBaokun Li1-11/+6
commit f9ca51596bbfd0f9c386dd1c613c394c78d9e5e6 upstream. The syzbot constructs a directory that has no dirblock but is non-inline, i.e. the first directory block is a hole. And no errors are reported when creating files in this directory in the following flow. ext4_mknod ... ext4_add_entry // Read block 0 ext4_read_dirblock(dir, block, DIRENT) bh = ext4_bread(NULL, inode, block, 0) if (!bh && (type == INDEX || type == DIRENT_HTREE)) // The first directory block is a hole // But type == DIRENT, so no error is reported. After that, we get a directory block without '.' and '..' but with a valid dentry. This may cause some code that relies on dot or dotdot (such as make_indexed_dir()) to crash. Therefore when ext4_read_dirblock() finds that the first directory block is a hole report that the filesystem is corrupted and return an error to avoid loading corrupted data from disk causing something bad. Reported-by: syzbot+ae688d469e36fb5138d0@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=ae688d469e36fb5138d0 Fixes: 4e19d6b65fb4 ("ext4: allow directory holes") Cc: stable@kernel.org Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240702132349.2600605-3-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-08-03ext4: check dot and dotdot of dx_root before making dir indexedBaokun Li1-5/+51
commit 50ea741def587a64e08879ce6c6a30131f7111e7 upstream. Syzbot reports a issue as follows: ============================================ BUG: unable to handle page fault for address: ffffed11022e24fe PGD 23ffee067 P4D 23ffee067 PUD 0 Oops: Oops: 0000 [#1] PREEMPT SMP KASAN PTI CPU: 0 PID: 5079 Comm: syz-executor306 Not tainted 6.10.0-rc5-g55027e689933 #0 Call Trace: <TASK> make_indexed_dir+0xdaf/0x13c0 fs/ext4/namei.c:2341 ext4_add_entry+0x222a/0x25d0 fs/ext4/namei.c:2451 ext4_rename fs/ext4/namei.c:3936 [inline] ext4_rename2+0x26e5/0x4370 fs/ext4/namei.c:4214 [...] ============================================ The immediate cause of this problem is that there is only one valid dentry for the block to be split during do_split, so split==0 results in out of bounds accesses to the map triggering the issue. do_split unsigned split dx_make_map count = 1 split = count/2 = 0; continued = hash2 == map[split - 1].hash; ---> map[4294967295] The maximum length of a filename is 255 and the minimum block size is 1024, so it is always guaranteed that the number of entries is greater than or equal to 2 when do_split() is called. But syzbot's crafted image has no dot and dotdot in dir, and the dentry distribution in dirblock is as follows: bus dentry1 hole dentry2 free |xx--|xx-------------|...............|xx-------------|...............| 0 12 (8+248)=256 268 256 524 (8+256)=264 788 236 1024 So when renaming dentry1 increases its name_len length by 1, neither hole nor free is sufficient to hold the new dentry, and make_indexed_dir() is called. In make_indexed_dir() it is assumed that the first two entries of the dirblock must be dot and dotdot, so bus and dentry1 are left in dx_root because they are treated as dot and dotdot, and only dentry2 is moved to the new leaf block. That's why count is equal to 1. Therefore add the ext4_check_dx_root() helper function to add more sanity checks to dot and dotdot before starting the conversion to avoid the above issue. Reported-by: syzbot+ae688d469e36fb5138d0@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=ae688d469e36fb5138d0 Fixes: ac27a0ec112a ("[PATCH] ext4: initial copy of files from ext3") Cc: stable@kernel.org Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240702132349.2600605-2-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-08-03udf: Avoid using corrupted block bitmap bufferJan Kara2-3/+15
commit a90d4471146de21745980cba51ce88e7926bcc4f upstream. When the filesystem block bitmap is corrupted, we detect the corruption while loading the bitmap and fail the allocation with error. However the next allocation from the same bitmap will notice the bitmap buffer is already loaded and tries to allocate from the bitmap with mixed results (depending on the exact nature of the bitmap corruption). Fix the problem by using BH_verified bit to indicate whether the bitmap is valid or not. Reported-by: syzbot+5f682cd029581f9edfd1@syzkaller.appspotmail.com CC: stable@vger.kernel.org Link: https://patch.msgid.link/20240617154201.29512-2-jack@suse.cz Fixes: 1e0d4adf17e7 ("udf: Check consistency of Space Bitmap Descriptor") Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-08-03cifs: mount with "unix" mount option for SMB1 incorrectly handledSteve French1-0/+7
commit 0e314e452687ce0ec5874e42cdb993a34325d3d2 upstream. Although by default we negotiate CIFS Unix Extensions for SMB1 mounts to Samba (and they work if the user does not specify "unix" or "posix" or "linux" on mount), and we do properly handle when a user turns them off with "nounix" mount parm. But with the changes to the mount API we broke cases where the user explicitly specifies the "unix" option (or equivalently "linux" or "posix") on mount with vers=1.0 to Samba or other servers which support the CIFS Unix Extensions. "mount error(95): Operation not supported" and logged: "CIFS: VFS: Check vers= mount option. SMB3.11 disabled but required for POSIX extensions" even though CIFS Unix Extensions are supported for vers=1.0 This patch fixes the case where the user specifies both "unix" (or equivalently "posix" or "linux") and "vers=1.0" on mount to a server which supports the CIFS Unix Extensions. Cc: stable@vger.kernel.org Reviewed-by: David Howells <dhowell@redhat.com> Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-08-03cifs: fix reconnect with SMB1 UNIX ExtensionsSteve French1-1/+16
commit a214384ce26b6111ea8c8d58fa82a1ca63996c38 upstream. When mounting with the SMB1 Unix Extensions (e.g. mounts to Samba with vers=1.0), reconnects no longer reset the Unix Extensions (SetFSInfo SET_FILE_UNIX_BASIC) after tcon so most operations (e.g. stat, ls, open, statfs) will fail continuously with: "Operation not supported" if the connection ever resets (e.g. due to brief network disconnect) Cc: stable@vger.kernel.org Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-08-03cifs: fix potential null pointer use in destroy_workqueue in init_cifs error ↵Steve French1-4/+4
path commit 193cc89ea0ca1da311877d2b4bb5e9f03bcc82a2 upstream. Dan Carpenter reported a Smack static checker warning: fs/smb/client/cifsfs.c:1981 init_cifs() error: we previously assumed 'serverclose_wq' could be null (see line 1895) The patch which introduced the serverclose workqueue used the wrong oredering in error paths in init_cifs() for freeing it on errors. Fixes: 173217bd7336 ("smb3: retrying on failed server close") Cc: stable@vger.kernel.org Cc: Ritvik Budhiraja <rbudhiraja@microsoft.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: David Howells <dhowell@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-08-03ext2: Verify bitmap and itable block numbers before using themJan Kara1-2/+9
commit 322a6aff03937aa1ece33b4e46c298eafaf9ac41 upstream. Verify bitmap block numbers and inode table blocks are sane before using them for checking bits in the block bitmap. CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-08-03hfs: fix to initialize fields of hfs_inode_info after hfs_alloc_inode()Chao Yu1-0/+3
commit 26a2ed107929a855155429b11e1293b83e6b2a8b upstream. Syzbot reports uninitialized value access issue as below: loop0: detected capacity change from 0 to 64 ===================================================== BUG: KMSAN: uninit-value in hfs_revalidate_dentry+0x307/0x3f0 fs/hfs/sysdep.c:30 hfs_revalidate_dentry+0x307/0x3f0 fs/hfs/sysdep.c:30 d_revalidate fs/namei.c:862 [inline] lookup_fast+0x89e/0x8e0 fs/namei.c:1649 walk_component fs/namei.c:2001 [inline] link_path_walk+0x817/0x1480 fs/namei.c:2332 path_lookupat+0xd9/0x6f0 fs/namei.c:2485 filename_lookup+0x22e/0x740 fs/namei.c:2515 user_path_at_empty+0x8b/0x390 fs/namei.c:2924 user_path_at include/linux/namei.h:57 [inline] do_mount fs/namespace.c:3689 [inline] __do_sys_mount fs/namespace.c:3898 [inline] __se_sys_mount+0x66b/0x810 fs/namespace.c:3875 __x64_sys_mount+0xe4/0x140 fs/namespace.c:3875 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcf/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x63/0x6b BUG: KMSAN: uninit-value in hfs_ext_read_extent fs/hfs/extent.c:196 [inline] BUG: KMSAN: uninit-value in hfs_get_block+0x92d/0x1620 fs/hfs/extent.c:366 hfs_ext_read_extent fs/hfs/extent.c:196 [inline] hfs_get_block+0x92d/0x1620 fs/hfs/extent.c:366 block_read_full_folio+0x4ff/0x11b0 fs/buffer.c:2271 hfs_read_folio+0x55/0x60 fs/hfs/inode.c:39 filemap_read_folio+0x148/0x4f0 mm/filemap.c:2426 do_read_cache_folio+0x7c8/0xd90 mm/filemap.c:3553 do_read_cache_page mm/filemap.c:3595 [inline] read_cache_page+0xfb/0x2f0 mm/filemap.c:3604 read_mapping_page include/linux/pagemap.h:755 [inline] hfs_btree_open+0x928/0x1ae0 fs/hfs/btree.c:78 hfs_mdb_get+0x260c/0x3000 fs/hfs/mdb.c:204 hfs_fill_super+0x1fb1/0x2790 fs/hfs/super.c:406 mount_bdev+0x628/0x920 fs/super.c:1359 hfs_mount+0xcd/0xe0 fs/hfs/super.c:456 legacy_get_tree+0x167/0x2e0 fs/fs_context.c:610 vfs_get_tree+0xdc/0x5d0 fs/super.c:1489 do_new_mount+0x7a9/0x16f0 fs/namespace.c:3145 path_mount+0xf98/0x26a0 fs/namespace.c:3475 do_mount fs/namespace.c:3488 [inline] __do_sys_mount fs/namespace.c:3697 [inline] __se_sys_mount+0x919/0x9e0 fs/namespace.c:3674 __ia32_sys_mount+0x15b/0x1b0 fs/namespace.c:3674 do_syscall_32_irqs_on arch/x86/entry/common.c:112 [inline] __do_fast_syscall_32+0xa2/0x100 arch/x86/entry/common.c:178 do_fast_syscall_32+0x37/0x80 arch/x86/entry/common.c:203 do_SYSENTER_32+0x1f/0x30 arch/x86/entry/common.c:246 entry_SYSENTER_compat_after_hwframe+0x70/0x82 Uninit was created at: __alloc_pages+0x9a6/0xe00 mm/page_alloc.c:4590 __alloc_pages_node include/linux/gfp.h:238 [inline] alloc_pages_node include/linux/gfp.h:261 [inline] alloc_slab_page mm/slub.c:2190 [inline] allocate_slab mm/slub.c:2354 [inline] new_slab+0x2d7/0x1400 mm/slub.c:2407 ___slab_alloc+0x16b5/0x3970 mm/slub.c:3540 __slab_alloc mm/slub.c:3625 [inline] __slab_alloc_node mm/slub.c:3678 [inline] slab_alloc_node mm/slub.c:3850 [inline] kmem_cache_alloc_lru+0x64d/0xb30 mm/slub.c:3879 alloc_inode_sb include/linux/fs.h:3018 [inline] hfs_alloc_inode+0x5a/0xc0 fs/hfs/super.c:165 alloc_inode+0x83/0x440 fs/inode.c:260 new_inode_pseudo fs/inode.c:1005 [inline] new_inode+0x38/0x4f0 fs/inode.c:1031 hfs_new_inode+0x61/0x1010 fs/hfs/inode.c:186 hfs_mkdir+0x54/0x250 fs/hfs/dir.c:228 vfs_mkdir+0x49a/0x700 fs/namei.c:4126 do_mkdirat+0x529/0x810 fs/namei.c:4149 __do_sys_mkdirat fs/namei.c:4164 [inline] __se_sys_mkdirat fs/namei.c:4162 [inline] __x64_sys_mkdirat+0xc8/0x120 fs/namei.c:4162 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcf/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x63/0x6b It missed to initialize .tz_secondswest, .cached_start and .cached_blocks fields in struct hfs_inode_info after hfs_alloc_inode(), fix it. Cc: stable@vger.kernel.org Reported-by: syzbot+3ae6be33a50b5aae4dab@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-fsdevel/0000000000005ad04005ee48897f@google.com Signed-off-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20240616013841.2217-1-chao@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-08-03fuse: verify {g,u}id mount options correctlyEric Sandeen1-4/+20
commit 525bd65aa759ec320af1dc06e114ed69733e9e23 upstream. As was done in 0200679fc795 ("tmpfs: verify {g,u}id mount options correctly") we need to validate that the requested uid and/or gid is representable in the filesystem's idmapping. Cribbing from the above commit log, The contract for {g,u}id mount options and {g,u}id values in general set from userspace has always been that they are translated according to the caller's idmapping. In so far, fuse has been doing the correct thing. But since fuse is mountable in unprivileged contexts it is also necessary to verify that the resulting {k,g}uid is representable in the namespace of the superblock. Fixes: c30da2e981a7 ("fuse: convert to use the new mount API") Cc: stable@vger.kernel.org # 5.4+ Signed-off-by: Eric Sandeen <sandeen@redhat.com> Link: https://lore.kernel.org/r/8f07d45d-c806-484d-a2e3-7a2199df1cd2@redhat.com Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-08-03fs/ntfs3: Keep runs for $MFT::$ATTR_DATA and $MFT::$ATTR_BITMAPKonstantin Komarov1-1/+2
[ Upstream commit eb95678ee930d67d79fc83f0a700245ae7230455 ] We skip the run_truncate_head call also for $MFT::$ATTR_BITMAP. Otherwise wnd_map()/run_lookup_entry will not find the disk position for the bitmap parts. Fixes: 0e5b044cbf3a ("fs/ntfs3: Refactoring attr_set_size to restore after errors") Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03fs/ntfs3: Missed error returnKonstantin Komarov1-1/+1
[ Upstream commit 2cbbd96820255fff4f0ad1533197370c9ccc570b ] Fixes: 3f3b442b5ad2 ("fs/ntfs3: Add bitmap") Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03nilfs2: avoid undefined behavior in nilfs_cnt32_ge macroRyusuke Konishi1-1/+1
[ Upstream commit 0f3819e8c483771a59cf9d3190cd68a7a990083c ] According to the C standard 3.4.3p3, the result of signed integer overflow is undefined. The macro nilfs_cnt32_ge(), which compares two sequence numbers, uses signed integer subtraction that can overflow, and therefore the result of the calculation may differ from what is expected due to undefined behavior in different environments. Similar to an earlier change to the jiffies-related comparison macros in commit 5a581b367b5d ("jiffies: Avoid undefined behavior from signed overflow"), avoid this potential issue by changing the definition of the macro to perform the subtraction as unsigned integers, then cast the result to a signed integer for comparison. Link: https://lkml.kernel.org/r/20130727225828.GA11864@linux.vnet.ibm.com Link: https://lkml.kernel.org/r/20240702183512.6390-1-konishi.ryusuke@gmail.com Fixes: 9ff05123e3bf ("nilfs2: segment constructor") Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03fs/proc/task_mmu: indicate PM_FILE for PMD-mapped file THPDavid Hildenbrand1-0/+2
[ Upstream commit 3f9f022e975d930709848a86a1c79775b0585202 ] Patch series "fs/proc: move page_mapcount() to fs/proc/internal.h". With all other page_mapcount() users in the tree gone, move page_mapcount() to fs/proc/internal.h, rename it and extend the documentation to prevent future (ab)use. ... of course, I find some issues while working on that code that I sort first ;) We'll now only end up calling page_mapcount() [now folio_precise_page_mapcount()] on pages mapped via present page table entries. Except for /proc/kpagecount, that still does questionable things, but we'll leave that legacy interface as is for now. Did a quick sanity check. Likely we would want some better selfestest for /proc/$/pagemap + smaps. I'll see if I can find some time to write some more. This patch (of 6): Looks like we never taught pagemap_pmd_range() about the existence of PMD-mapped file THPs. Seems to date back to the times when we first added support for non-anon THPs in the form of shmem THP. Link: https://lkml.kernel.org/r/20240607122357.115423-1-david@redhat.com Link: https://lkml.kernel.org/r/20240607122357.115423-2-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Fixes: 800d8c63b2e9 ("shmem: add huge pages support") Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Lance Yang <ioworker0@gmail.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: David Hildenbrand <david@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03fs/ntfs3: Fix field-spanning write in INDEX_HDRKonstantin Komarov2-6/+7
[ Upstream commit 2f3e176fee66ac86ae387787bf06457b101d9f7a ] Fields flags and res[3] replaced with one 4 byte flags. Fixes: 4534a70b7056 ("fs/ntfs3: Add headers and misc files") Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03fs/ntfs3: Replace inode_trylock with inode_lockKonstantin Komarov1-4/+1
[ Upstream commit 69505fe98f198ee813898cbcaf6770949636430b ] The issue was detected due to xfstest 465 failing. Fixes: 4342306f0f0d ("fs/ntfs3: Add file operations and implementation") Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03fs/ntfs3: Add missing .dirty_folio in address_space_operationsKonstantin Komarov1-0/+1
[ Upstream commit 0f9579d9e0331b6255132ac06bdf2c0a01cceb90 ] After switching from pages to folio [1], it became evident that the initialization of .dirty_folio for page cache operations was missed for compressed files. [1] https://lore.kernel.org/ntfs3/20240422193203.3534108-1-willy@infradead.org Fixes: 82cae269cfa95 ("fs/ntfs3: Add initialization of super block") Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03fs/ntfs3: Fix getting file typeKonstantin Komarov1-1/+2
[ Upstream commit 24c5100aceedcd47af89aaa404d4c96cd2837523 ] An additional condition causes the mft record to be read from disk and get the file type dt_type. Fixes: 22457c047ed97 ("fs/ntfs3: Modified fix directory element type detection") Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03fs/ntfs3: Missed NI_FLAG_UPDATE_PARENT settingKonstantin Komarov1-0/+1
[ Upstream commit 1c308ace1fd6de93bd0b7e1a5e8963ab27e2c016 ] Fixes: be71b5cba2e64 ("fs/ntfs3: Add attrib operations") Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03fs/ntfs3: Fix transform resident to nonresident for compressed filesKonstantin Komarov1-5/+8
[ Upstream commit 25610ff98d4a34e6a85cbe4fd8671be6b0829f8f ] Сorrected calculation of required space len (in clusters) for attribute data storage in case of compression. Fixes: be71b5cba2e64 ("fs/ntfs3: Add attrib operations") Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03fs/ntfs3: Merge synonym COMPRESSION_UNIT and NTFS_LZNT_CUNITKonstantin Komarov5-7/+4
[ Upstream commit 487f8d482a7e51a640b8f955a398f906a4f83951 ] COMPRESSION_UNIT and NTFS_LZNT_CUNIT mean the same thing (1u<<NTFS_LZNT_CUNIT) determines the size for compression (in clusters). COMPRESS_MAX_CLUSTER is not used in the code. Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Stable-dep-of: 25610ff98d4a ("fs/ntfs3: Fix transform resident to nonresident for compressed files") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03fs/ntfs3: Use ALIGN kernel macroKonstantin Komarov3-2/+3
[ Upstream commit 97a6815e50619377704e6566fb2b77c1aa4e2647 ] This way code will be more readable. Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Stable-dep-of: 25610ff98d4a ("fs/ntfs3: Fix transform resident to nonresident for compressed files") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03ext4: avoid writing unitialized memory to disk in EA inodesJan Kara1-0/+6
[ Upstream commit 65121eff3e4c8c90f8126debf3c369228691c591 ] If the extended attribute size is not a multiple of block size, the last block in the EA inode will have uninitialized tail which will get written to disk. We will never expose the data to userspace but still this is not a good practice so just zero out the tail of the block as it isn't going to cause a noticeable performance overhead. Fixes: e50e5129f384 ("ext4: xattr-in-inode support") Reported-by: syzbot+9c1fe13fcb51574b249b@syzkaller.appspotmail.com Reported-by: Hugh Dickins <hughd@google.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240613150234.25176-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03ext4: don't track ranges in fast_commit if inode has inlined dataLuis Henriques (SUSE)1-0/+6
[ Upstream commit 7882b0187bbeb647967a7b5998ce4ad26ef68a9a ] When fast-commit needs to track ranges, it has to handle inodes that have inlined data in a different way because ext4_fc_write_inode_data(), in the actual commit path, will attempt to map the required blocks for the range. However, inodes that have inlined data will have it's data stored in inode->i_block and, eventually, in the extended attribute space. Unfortunately, because fast commit doesn't currently support extended attributes, the solution is to mark this commit as ineligible. Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1039883 Signed-off-by: Luis Henriques (SUSE) <luis.henriques@linux.dev> Tested-by: Ben Hutchings <benh@debian.org> Fixes: 9725958bb75c ("ext4: fast commit may miss tracking unwritten range during ftruncate") Link: https://patch.msgid.link/20240618144312.17786-1-luis.henriques@linux.dev Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03NFSv4.1 another fix for EXCHGID4_FLAG_USE_PNFS_DS for DS serverOlga Kornievskaia2-5/+3
[ Upstream commit 4840c00003a2275668a13b82c9f5b1aed80183aa ] Previously in order to mark the communication with the DS server, we tried to use NFS_CS_DS in cl_flags. However, this flag would only be saved for the DS server and in case where DS equals MDS, the client would not find a matching nfs_client in nfs_match_client that represents the MDS (but is also a DS). Instead, don't rely on the NFS_CS_DS but instead use NFS_CS_PNFS. Fixes: 379e4adfddd6 ("NFSv4.1: fixup use EXCHGID4_FLAG_USE_PNFS_DS for DS server") Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03ext4: fix infinite loop when replaying fast_commitLuis Henriques (SUSE)1-0/+2
[ Upstream commit 907c3fe532253a6ef4eb9c4d67efb71fab58c706 ] When doing fast_commit replay an infinite loop may occur due to an uninitialized extent_status struct. ext4_ext_determine_insert_hole() does not detect the replay and calls ext4_es_find_extent_range(), which will return immediately without initializing the 'es' variable. Because 'es' contains garbage, an integer overflow may happen causing an infinite loop in this function, easily reproducible using fstest generic/039. This commit fixes this issue by unconditionally initializing the structure in function ext4_es_find_extent_range(). Thanks to Zhang Yi, for figuring out the real problem! Fixes: 8016e29f4362 ("ext4: fast commit recovery path") Signed-off-by: Luis Henriques (SUSE) <luis.henriques@linux.dev> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20240515082857.32730-1-luis.henriques@linux.dev Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03kernfs: Convert kernfs_path_from_node_locked() from strlcpy() to strscpy()Kees Cook1-17/+17
[ Upstream commit ff6d413b0b59466e5acf2e42f294b1842ae130a1 ] One of the last remaining users of strlcpy() in the kernel is kernfs_path_from_node_locked(), which passes back the problematic "length we _would_ have copied" return value to indicate truncation. Convert the chain of all callers to use the negative return value (some of which already doing this explicitly). All callers were already also checking for negative return values, so the risk to missed checks looks very low. In this analysis, it was found that cgroup1_release_agent() actually didn't handle the "too large" condition, so this is technically also a bug fix. :) Here's the chain of callers, and resolution identifying each one as now handling the correct return value: kernfs_path_from_node_locked() kernfs_path_from_node() pr_cont_kernfs_path() returns void kernfs_path() sysfs_warn_dup() return value ignored cgroup_path() blkg_path() bfq_bic_update_cgroup() return value ignored TRACE_IOCG_PATH() return value ignored TRACE_CGROUP_PATH() return value ignored perf_event_cgroup() return value ignored task_group_path() return value ignored damon_sysfs_memcg_path_eq() return value ignored get_mm_memcg_path() return value ignored lru_gen_seq_show() return value ignored cgroup_path_from_kernfs_id() return value ignored cgroup_show_path() already converted "too large" error to negative value cgroup_path_ns_locked() cgroup_path_ns() bpf_iter_cgroup_show_fdinfo() return value ignored cgroup1_release_agent() wasn't checking "too large" error proc_cgroup_show() already converted "too large" to negative value Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Tejun Heo <tj@kernel.org> Cc: Zefan Li <lizefan.x@bytedance.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Waiman Long <longman@redhat.com> Cc: <cgroups@vger.kernel.org> Co-developed-by: Azeem Shaikh <azeemshaikh38@gmail.com> Signed-off-by: Azeem Shaikh <azeemshaikh38@gmail.com> Link: https://lore.kernel.org/r/20231116192127.1558276-3-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20231212211741.164376-3-keescook@chromium.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Stable-dep-of: 1be59c97c83c ("cgroup/cpuset: Prevent UAF in proc_cpuset_show()") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03kernfs: fix all kernel-doc warnings and multiple typosRandy Dunlap6-48/+74
[ Upstream commit 24b3e3dd9c9c742a4dd18e71b6963f9e7ab72911 ] Fix kernel-doc warnings. Many of these are about a function's return value, so use the kernel-doc Return: format to fix those Use % prefix on numeric constant values. dir.c: fix typos/spellos file.c fix typo: s/taret/target/ Fix all of these kernel-doc warnings: dir.c:305: warning: missing initial short description on line: * kernfs_name_hash dir.c:137: warning: No description found for return value of 'kernfs_path_from_node_locked' dir.c:196: warning: No description found for return value of 'kernfs_name' dir.c:224: warning: No description found for return value of 'kernfs_path_from_node' dir.c:292: warning: No description found for return value of 'kernfs_get_parent' dir.c:312: warning: No description found for return value of 'kernfs_name_hash' dir.c:404: warning: No description found for return value of 'kernfs_unlink_sibling' dir.c:588: warning: No description found for return value of 'kernfs_node_from_dentry' dir.c:806: warning: No description found for return value of 'kernfs_find_ns' dir.c:879: warning: No description found for return value of 'kernfs_find_and_get_ns' dir.c:904: warning: No description found for return value of 'kernfs_walk_and_get_ns' dir.c:927: warning: No description found for return value of 'kernfs_create_root' dir.c:996: warning: No description found for return value of 'kernfs_root_to_node' dir.c:1016: warning: No description found for return value of 'kernfs_create_dir_ns' dir.c:1048: warning: No description found for return value of 'kernfs_create_empty_dir' dir.c:1306: warning: No description found for return value of 'kernfs_next_descendant_post' dir.c:1568: warning: No description found for return value of 'kernfs_remove_self' dir.c:1630: warning: No description found for return value of 'kernfs_remove_by_name_ns' dir.c:1667: warning: No description found for return value of 'kernfs_rename_ns' file.c:66: warning: No description found for return value of 'of_on' file.c:88: warning: No description found for return value of 'kernfs_deref_open_node_locked' file.c:1036: warning: No description found for return value of '__kernfs_create_file' inode.c:100: warning: No description found for return value of 'kernfs_setattr' mount.c:160: warning: No description found for return value of 'kernfs_root_from_sb' mount.c:198: warning: No description found for return value of 'kernfs_node_dentry' mount.c:302: warning: No description found for return value of 'kernfs_super_ns' mount.c:318: warning: No description found for return value of 'kernfs_get_tree' symlink.c:28: warning: No description found for return value of 'kernfs_create_link' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Tejun Heo <tj@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20221112031456.22980-1-rdunlap@infradead.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Stable-dep-of: 1be59c97c83c ("cgroup/cpuset: Prevent UAF in proc_cpuset_show()") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03hfsplus: fix to avoid false alarm of circular lockingChao Yu3-16/+29
[ Upstream commit be4edd1642ee205ed7bbf66edc0453b1be1fb8d7 ] Syzbot report potential ABBA deadlock as below: loop0: detected capacity change from 0 to 1024 ====================================================== WARNING: possible circular locking dependency detected 6.9.0-syzkaller-10323-g8f6a15f095a6 #0 Not tainted ------------------------------------------------------ syz-executor171/5344 is trying to acquire lock: ffff88807cb980b0 (&tree->tree_lock){+.+.}-{3:3}, at: hfsplus_file_truncate+0x811/0xb50 fs/hfsplus/extents.c:595 but task is already holding lock: ffff88807a930108 (&HFSPLUS_I(inode)->extents_lock){+.+.}-{3:3}, at: hfsplus_file_truncate+0x2da/0xb50 fs/hfsplus/extents.c:576 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&HFSPLUS_I(inode)->extents_lock){+.+.}-{3:3}: lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754 __mutex_lock_common kernel/locking/mutex.c:608 [inline] __mutex_lock+0x136/0xd70 kernel/locking/mutex.c:752 hfsplus_file_extend+0x21b/0x1b70 fs/hfsplus/extents.c:457 hfsplus_bmap_reserve+0x105/0x4e0 fs/hfsplus/btree.c:358 hfsplus_rename_cat+0x1d0/0x1050 fs/hfsplus/catalog.c:456 hfsplus_rename+0x12e/0x1c0 fs/hfsplus/dir.c:552 vfs_rename+0xbdb/0xf00 fs/namei.c:4887 do_renameat2+0xd94/0x13f0 fs/namei.c:5044 __do_sys_rename fs/namei.c:5091 [inline] __se_sys_rename fs/namei.c:5089 [inline] __x64_sys_rename+0x86/0xa0 fs/namei.c:5089 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f -> #0 (&tree->tree_lock){+.+.}-{3:3}: check_prev_add kernel/locking/lockdep.c:3134 [inline] check_prevs_add kernel/locking/lockdep.c:3253 [inline] validate_chain+0x18cb/0x58e0 kernel/locking/lockdep.c:3869 __lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137 lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754 __mutex_lock_common kernel/locking/mutex.c:608 [inline] __mutex_lock+0x136/0xd70 kernel/locking/mutex.c:752 hfsplus_file_truncate+0x811/0xb50 fs/hfsplus/extents.c:595 hfsplus_setattr+0x1ce/0x280 fs/hfsplus/inode.c:265 notify_change+0xb9d/0xe70 fs/attr.c:497 do_truncate+0x220/0x310 fs/open.c:65 handle_truncate fs/namei.c:3308 [inline] do_open fs/namei.c:3654 [inline] path_openat+0x2a3d/0x3280 fs/namei.c:3807 do_filp_open+0x235/0x490 fs/namei.c:3834 do_sys_openat2+0x13e/0x1d0 fs/open.c:1406 do_sys_open fs/open.c:1421 [inline] __do_sys_creat fs/open.c:1497 [inline] __se_sys_creat fs/open.c:1491 [inline] __x64_sys_creat+0x123/0x170 fs/open.c:1491 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&HFSPLUS_I(inode)->extents_lock); lock(&tree->tree_lock); lock(&HFSPLUS_I(inode)->extents_lock); lock(&tree->tree_lock); This is a false alarm as tree_lock mutex are different, one is from sbi->cat_tree, and another is from sbi->ext_tree: Thread A Thread B - hfsplus_rename - hfsplus_rename_cat - hfs_find_init - mutext_lock(cat_tree->tree_lock) - hfsplus_setattr - hfsplus_file_truncate - mutex_lock(hip->extents_lock) - hfs_find_init - mutext_lock(ext_tree->tree_lock) - hfs_bmap_reserve - hfsplus_file_extend - mutex_lock(hip->extents_lock) So, let's call mutex_lock_nested for tree_lock mutex lock, and pass correct lock class for it. Fixes: 31651c607151 ("hfsplus: avoid deadlock on file truncation") Reported-by: syzbot+6030b3b1b9bf70e538c4@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-fsdevel/000000000000e37a4005ef129563@google.com Cc: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com> Signed-off-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20240607142304.455441-1-chao@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-07-27btrfs: do not BUG_ON on failure to get dir index for new snapshotFilipe Manana1-1/+4
commit df9f278239046719c91aeb59ec0afb1a99ee8b2b upstream. During the transaction commit path, at create_pending_snapshot(), there is no need to BUG_ON() in case we fail to get a dir index for the snapshot in the parent directory. This should fail very rarely because the parent inode should be loaded in memory already, with the respective delayed inode created and the parent inode's index_cnt field already initialized. However if it fails, it may be -ENOMEM like the comment at create_pending_snapshot() says or any error returned by btrfs_search_slot() through btrfs_set_inode_index_count(), which can be pretty much anything such as -EIO or -EUCLEAN for example. So the comment is not correct when it says it can only be -ENOMEM. However doing a BUG_ON() here is overkill, since we can instead abort the transaction and return the error. Note that any error returned by create_pending_snapshot() will eventually result in a transaction abort at cleanup_transaction(), called from btrfs_commit_transaction(), but we can explicitly abort the transaction at this point instead so that we get a stack trace to tell us that the call to btrfs_set_inode_index() failed. So just abort the transaction and return in case btrfs_set_inode_index() returned an error at create_pending_snapshot(). Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sergio González Collado <sergio.collado@gmail.com> Reported-by: syzbot+c56033c8c15c08286062@syzkaller.appspotmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-27filelock: Fix fcntl/close race recovery compat pathJann Horn1-5/+4
commit f8138f2ad2f745b9a1c696a05b749eabe44337ea upstream. When I wrote commit 3cad1bc01041 ("filelock: Remove locks reliably when fcntl/close race is detected"), I missed that there are two copies of the code I was patching: The normal version, and the version for 64-bit offsets on 32-bit kernels. Thanks to Greg KH for stumbling over this while doing the stable backport... Apply exactly the same fix to the compat path for 32-bit kernels. Fixes: c293621bbf67 ("[PATCH] stale POSIX lock handling") Cc: stable@kernel.org Link: https://bugs.chromium.org/p/project-zero/issues/detail?id=2563 Signed-off-by: Jann Horn <jannh@google.com> Link: https://lore.kernel.org/r/20240723-fs-lock-recover-compatfix-v1-1-148096719529@google.com Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-27fs/ntfs3: Validate ff offsetlei lu1-1/+5
commit 50c47879650b4c97836a0086632b3a2e300b0f06 upstream. This adds sanity checks for ff offset. There is a check on rt->first_free at first, but walking through by ff without any check. If the second ff is a large offset. We may encounter an out-of-bound read. Signed-off-by: lei lu <llfamsec@gmail.com> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>