| Age | Commit message (Collapse) | Author | Files | Lines |
|
commit 91b76f1059b60f453b51877f29f0e35693737383 upstream.
In a previous commit, a bug was introduced where compact SSA summaries
failed to utilize the entire block space in non-4KB block size
configurations, leading to inefficient space management.
This patch fixes the calculation logic to ensure that compact SSA
summaries can fully occupy the block regardless of the block size.
Reported-by: Chris Mason <clm@meta.com>
Fixes: e48e16f3e37f ("f2fs: support non-4KB block size without packed_ssa feature")
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jiri Slaby <jirislaby@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 50ac3ecd8e05b6bcc350c71a4307d40c030ec7e4 ]
-----------[ cut here ]------------
kernel BUG at fs/f2fs/data.c:358!
Call Trace:
<IRQ>
blk_update_request+0x5eb/0xe70 block/blk-mq.c:987
blk_mq_end_request+0x3e/0x70 block/blk-mq.c:1149
blk_complete_reqs block/blk-mq.c:1224 [inline]
blk_done_softirq+0x107/0x160 block/blk-mq.c:1229
handle_softirqs+0x283/0x870 kernel/softirq.c:579
__do_softirq kernel/softirq.c:613 [inline]
invoke_softirq kernel/softirq.c:453 [inline]
__irq_exit_rcu+0xca/0x1f0 kernel/softirq.c:680
irq_exit_rcu+0x9/0x30 kernel/softirq.c:696
instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1050 [inline]
sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1050
</IRQ>
In f2fs_write_end_io(), it detects there is inconsistency in between
node page index (nid) and footer.nid of node page.
If footer of node page is corrupted in fuzzed image, then we load corrupted
node page w/ async method, e.g. f2fs_ra_node_pages() or f2fs_ra_node_page(),
in where we won't do sanity check on node footer, once node page becomes
dirty, we will encounter this bug after node page writeback.
Cc: stable@kernel.org
Reported-by: syzbot+803dd716c4310d16ff3a@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=803dd716c4310d16ff3a
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
[ Context ]
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 0a736109c9d29de0c26567e42cb99b27861aa8ba ]
Add node footer sanity check during node folio's writeback, if sanity
check fails, let's shutdown filesystem to avoid looping to redirty
and writeback in .writepages.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 3996b70209f145bfcf2afc7d05dd92c27b233b48 ]
This reverts commit 196c81fdd438f7ac429d5639090a9816abb9760a.
Original patch may cause below deadlock, revert it.
write remount
- write_begin
- lock_page --- lock A
- prepare_write_begin
- f2fs_map_lock
- f2fs_enable_checkpoint
- down_write(cp_enable_rwsem) --- lock B
- sync_inode_sb
- writepages
- lock_page --- lock A
- down_read(cp_enable_rwsem) --- lock A
Cc: stable@kernel.org
Fixes: 196c81fdd438 ("f2fs: block cache/dio write during f2fs_enable_checkpoint()")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
[ drop tracing bits ]
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit d860974a7e38d35e9e2c4dc8a9f4223b38b6ad99 upstream.
When overwriting already allocated blocks, f2fs_iomap_begin() calls
f2fs_overwrite_io() to check block mappings. However,
f2fs_overwrite_io() iterates through all mapped blocks in the range,
which can be inefficient for fragmented files with large I/O requests.
This patch optimizes f2fs_overwrite_io() by adding a 'check_first'
parameter and introducing __f2fs_overwrite_io() helper. When called from
f2fs_iomap_begin(), we only check the first mapping to determine if the
range is already allocated, which is sufficient for setting
map.m_may_create.
This optimization significantly reduces the number of f2fs_map_blocks()
calls in f2fs_overwrite_io() when called from f2fs_iomap_begin(),
especially for fragmented files with large I/O requests.
Cc: stable@kernel.org
Fixes: 351bc761338d ("f2fs: optimize f2fs DIO overwrites")
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Reviewed-by: Sunmin Jeong <s_min.jeong@samsung.com>
Signed-off-by: Yeongjin Gil <youngjin.gil@samsung.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 5c145c03188bc9ba1c29e0bc4d527a5978fc47f9 upstream.
Xiaolong Guo reported a f2fs bug in bugzilla [1]
[1] https://bugzilla.kernel.org/show_bug.cgi?id=220951
Quoted:
"When using stress-ng's swap stress test on F2FS filesystem with kernel 6.6+,
the system experiences data corruption leading to either:
1 dm-verity corruption errors and device reboot
2 F2FS node corruption errors and boot hangs
The issue occurs specifically when:
1 Using F2FS filesystem (ext4 is unaffected)
2 Swapfile size is less than F2FS section size (2MB)
3 Swapfile has fragmented physical layout (multiple non-contiguous extents)
4 Kernel version is 6.6+ (6.1 is unaffected)
The root cause is in check_swap_activate() function in fs/f2fs/data.c. When the
first extent of a small swapfile (< 2MB) is not aligned to section boundaries,
the function incorrectly treats it as the last extent, failing to map
subsequent extents. This results in incorrect swap_extent creation where only
the first extent is mapped, causing subsequent swap writes to overwrite wrong
physical locations (other files' data).
Steps to Reproduce
1 Setup a device with F2FS-formatted userdata partition
2 Compile stress-ng from https://github.com/ColinIanKing/stress-ng
3 Run swap stress test: (Android devices)
adb shell "cd /data/stressng; ./stress-ng-64 --metrics-brief --timeout 60
--swap 0"
Log:
1 Ftrace shows in kernel 6.6, only first extent is mapped during second
f2fs_map_blocks call in check_swap_activate():
stress-ng-swap-8990: f2fs_map_blocks: ino=11002, file offset=0, start
blkaddr=0x43143, len=0x1
(Only 4KB mapped, not the full swapfile)
2 in kernel 6.1, both extents are correctly mapped:
stress-ng-swap-5966: f2fs_map_blocks: ino=28011, file offset=0, start
blkaddr=0x13cd4, len=0x1
stress-ng-swap-5966: f2fs_map_blocks: ino=28011, file offset=1, start
blkaddr=0x60c84b, len=0xff
The problematic code is in check_swap_activate():
if ((pblock - SM_I(sbi)->main_blkaddr) % blks_per_sec ||
nr_pblocks % blks_per_sec ||
!f2fs_valid_pinned_area(sbi, pblock)) {
bool last_extent = false;
not_aligned++;
nr_pblocks = roundup(nr_pblocks, blks_per_sec);
if (cur_lblock + nr_pblocks > sis->max)
nr_pblocks -= blks_per_sec;
/* this extent is last one */
if (!nr_pblocks) {
nr_pblocks = last_lblock - cur_lblock;
last_extent = true;
}
ret = f2fs_migrate_blocks(inode, cur_lblock, nr_pblocks);
if (ret) {
if (ret == -ENOENT)
ret = -EINVAL;
goto out;
}
if (!last_extent)
goto retry;
}
When the first extent is unaligned and roundup(nr_pblocks, blks_per_sec)
exceeds sis->max, we subtract blks_per_sec resulting in nr_pblocks = 0. The
code then incorrectly assumes this is the last extent, sets nr_pblocks =
last_lblock - cur_lblock (entire swapfile), and performs migration. After
migration, it doesn't retry mapping, so subsequent extents are never processed.
"
In order to fix this issue, we need to lookup block mapping info after
we migrate all blocks in the tail of swapfile.
Cc: stable@kernel.org
Fixes: 9703d69d9d15 ("f2fs: support file pinning for zoned devices")
Cc: Daeho Jeong <daehojeong@google.com>
Reported-and-tested-by: Xiaolong Guo <guoxiaolong2008@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220951
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit e48e16f3e37fac76e2f0c14c58df2b0398a323b0 upstream.
Currently, F2FS requires the packed_ssa feature to be enabled when
utilizing non-4KB block sizes (e.g., 16KB). This restriction limits
the flexibility of filesystem formatting options.
This patch allows F2FS to support non-4KB block sizes even when the
packed_ssa feature is disabled. It adjusts the SSA calculation logic to
correctly handle summary entries in larger blocks without the packed
layout.
Cc: stable@kernel.org
Fixes: 7ee8bc3942f2 ("f2fs: revert summary entry count from 2048 to 512 in 16kb block support")
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit ce2739e482bce8d2c014d76c4531c877f382aa54 upstream.
As syzbot reported an use-after-free issue in f2fs_write_end_io().
It is caused by below race condition:
loop device umount
- worker_thread
- loop_process_work
- do_req_filebacked
- lo_rw_aio
- lo_rw_aio_complete
- blk_mq_end_request
- blk_update_request
- f2fs_write_end_io
- dec_page_count
- folio_end_writeback
- kill_f2fs_super
- kill_block_super
- f2fs_put_super
: free(sbi)
: get_pages(, F2FS_WB_CP_DATA)
accessed sbi which is freed
In kill_f2fs_super(), we will drop all page caches of f2fs inodes before
call free(sbi), it guarantee that all folios should end its writeback, so
it should be safe to access sbi before last folio_end_writeback().
Let's relocate ckpt thread wakeup flow before folio_end_writeback() to
resolve this issue.
Cc: stable@kernel.org
Fixes: e234088758fc ("f2fs: avoid wait if IO end up when do_checkpoint for better performance")
Reported-by: syzbot+b4444e3c972a7a124187@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=b4444e3c972a7a124187
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 98ea0039dbfdd00e5cc1b9a8afa40434476c0955 upstream.
Some f2fs sysfs attributes suffer from out-of-bounds memory access and
incorrect handling of integer values whose size is not 4 bytes.
For example:
vm:~# echo 65537 > /sys/fs/f2fs/vde/carve_out
vm:~# cat /sys/fs/f2fs/vde/carve_out
65537
vm:~# echo 4294967297 > /sys/fs/f2fs/vde/atgc_age_threshold
vm:~# cat /sys/fs/f2fs/vde/atgc_age_threshold
1
carve_out maps to {struct f2fs_sb_info}->carve_out, which is a 8-bit
integer. However, the sysfs interface allows setting it to a value
larger than 255, resulting in an out-of-range update.
atgc_age_threshold maps to {struct atgc_management}->age_threshold,
which is a 64-bit integer, but its sysfs interface cannot correctly set
values larger than UINT_MAX.
The root causes are:
1. __sbi_store() treats all default values as unsigned int, which
prevents updating integers larger than 4 bytes and causes out-of-bounds
writes for integers smaller than 4 bytes.
2. f2fs_sbi_show() also assumes all default values are unsigned int,
leading to out-of-bounds reads and incorrect access to integers larger
than 4 bytes.
This patch introduces {struct f2fs_attr}->size to record the actual size
of the integer associated with each sysfs attribute. With this
information, sysfs read and write operations can correctly access and
update values according to their real data size, avoiding memory
corruption and truncation.
Fixes: b59d0bae6ca3 ("f2fs: add sysfs support for controlling the gc_thread")
Cc: stable@kernel.org
Signed-off-by: Jinbao Liu <liujinbao1@xiaomi.com>
Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
atomic commit and checkpoint writes
commit 7633a7387eb4d0259d6bea945e1d3469cd135bbc upstream.
During SPO tests, when mounting F2FS, an -EINVAL error was returned from
f2fs_recover_inode_page. The issue occurred under the following scenario
Thread A Thread B
f2fs_ioc_commit_atomic_write
- f2fs_do_sync_file // atomic = true
- f2fs_fsync_node_pages
: last_folio = inode folio
: schedule before folio_lock(last_folio) f2fs_write_checkpoint
- block_operations// writeback last_folio
- schedule before f2fs_flush_nat_entries
: set_fsync_mark(last_folio, 1)
: set_dentry_mark(last_folio, 1)
: folio_mark_dirty(last_folio)
- __write_node_folio(last_folio)
: f2fs_down_read(&sbi->node_write)//block
- f2fs_flush_nat_entries
: {struct nat_entry}->flag |= BIT(IS_CHECKPOINTED)
- unblock_operations
: f2fs_up_write(&sbi->node_write)
f2fs_write_checkpoint//return
: f2fs_do_write_node_page()
f2fs_ioc_commit_atomic_write//return
SPO
Thread A calls f2fs_need_dentry_mark(sbi, ino), and the last_folio has
already been written once. However, the {struct nat_entry}->flag did not
have the IS_CHECKPOINTED set, causing set_dentry_mark(last_folio, 1) and
write last_folio again after Thread B finishes f2fs_write_checkpoint.
After SPO and reboot, it was detected that {struct node_info}->blk_addr
was not NULL_ADDR because Thread B successfully write the checkpoint.
This issue only occurs in atomic write scenarios. For regular file
fsync operations, the folio must be dirty. If
block_operations->f2fs_sync_node_pages successfully submit the folio
write, this path will not be executed. Otherwise, the
f2fs_write_checkpoint will need to wait for the folio write submission
to complete, as sbi->nr_pages[F2FS_DIRTY_NODES] > 0. Therefore, the
situation where f2fs_need_dentry_mark checks that the {struct
nat_entry}->flag /wo the IS_CHECKPOINTED flag, but the folio write has
already been submitted, will not occur.
Therefore, for atomic file fsync, sbi->node_write should be acquired
through __write_node_folio to ensure that the IS_CHECKPOINTED flag
correctly indicates that the checkpoint write has been completed.
Fixes: 608514deba38 ("f2fs: set fsync mark only for the last dnode")
Cc: stable@kernel.org
Signed-off-by: Sheng Yong <shengyong1@xiaomi.com>
Signed-off-by: Jinbao Liu <liujinbao1@xiaomi.com>
Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 0eda086de85e140f53c6123a4c00662f4e614ee4 upstream.
Sysfs entry name is gc_pin_file_thresh instead of gc_pin_file_threshold,
fix it.
Cc: stable@kernel.org
Fixes: c521a6ab4ad7 ("f2fs: fix to limit gc_pin_file_threshold")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 761dac9073cd67d4705a94cd1af674945a117f4c upstream.
It missed the stat count in f2fs_gc_range.
Cc: stable@kernel.org
Fixes: 9bf1dcbdfdc8 ("f2fs: fix to account gc stats correctly")
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 1c253e11225bc5167217897885b85093e17c2217 upstream.
We cannot dereference bs->cur when trying to determine if bs->cur
aliases bs->sc->sa.{bno,rmap}_cur after the latter has been freed.
Fix this by sampling before type before any freeing could happen.
The correct temporal ordering was broken when we removed xfs_btnum_t.
Cc: r772577952@gmail.com
Cc: <stable@vger.kernel.org> # v6.9
Fixes: ec793e690f801d ("xfs: remove xfs_btnum_t")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Jiaming Zhang <r772577952@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 1caf50ce4af096d0280d59a31abdd85703cd995c upstream.
[ 9.269940][ T3222] Call trace:
[ 9.269948][ T3222] ext4_file_read_iter+0xac/0x108
[ 9.269979][ T3222] vfs_iocb_iter_read+0xac/0x198
[ 9.269993][ T3222] erofs_fileio_rq_submit+0x12c/0x180
[ 9.270008][ T3222] erofs_fileio_submit_bio+0x14/0x24
[ 9.270030][ T3222] z_erofs_runqueue+0x834/0x8ac
[ 9.270054][ T3222] z_erofs_read_folio+0x120/0x220
[ 9.270083][ T3222] filemap_read_folio+0x60/0x120
[ 9.270102][ T3222] filemap_fault+0xcac/0x1060
[ 9.270119][ T3222] do_pte_missing+0x2d8/0x1554
[ 9.270131][ T3222] handle_mm_fault+0x5ec/0x70c
[ 9.270142][ T3222] do_page_fault+0x178/0x88c
[ 9.270167][ T3222] do_translation_fault+0x38/0x54
[ 9.270183][ T3222] do_mem_abort+0x54/0xac
[ 9.270208][ T3222] el0_da+0x44/0x7c
[ 9.270227][ T3222] el0t_64_sync_handler+0x5c/0xf4
[ 9.270253][ T3222] el0t_64_sync+0x1bc/0x1c0
EROFS may encounter above panic when enabling file-backed mount w/
directio mount option, the root cause is it may suffer UAF in below
race condition:
- z_erofs_read_folio wq s_dio_done_wq
- z_erofs_runqueue
- erofs_fileio_submit_bio
- erofs_fileio_rq_submit
- vfs_iocb_iter_read
- ext4_file_read_iter
- ext4_dio_read_iter
- iomap_dio_rw
: bio was submitted and return -EIOCBQUEUED
- dio_aio_complete_work
- dio_complete
- dio->iocb->ki_complete (erofs_fileio_ki_complete())
- kfree(rq)
: it frees iocb, iocb.ki_filp can be UAF in file_accessed().
- file_accessed
: access NULL file point
Introduce a reference count in struct erofs_fileio_rq, and initialize it
as two, both erofs_fileio_ki_complete() and erofs_fileio_rq_submit() will
decrease reference count, the last one decreasing the reference count
to zero will free rq.
Cc: stable@kernel.org
Fixes: fb176750266a ("erofs: add file-backed mount support")
Fixes: 6422cde1b0d5 ("erofs: use buffered I/O for file-backed mounts by default")
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 05ce49a902be15dc93854cbfc20161205a9ee446 upstream.
When hfs was converted to the new mount api a bug was introduced by
changing the allocation pattern of sb->s_fs_info. If setup_bdev_super()
fails after a new superblock has been allocated by sget_fc(), but before
hfs_fill_super() takes ownership of the filesystem-specific s_fs_info
data it was leaked.
Fix this by freeing sb->s_fs_info in hfs_kill_super().
Cc: stable@vger.kernel.org
Fixes: ffcd06b6d13b ("hfs: convert hfs to use the new mount api")
Reported-by: syzbot+ad45f827c88778ff7df6@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=ad45f827c88778ff7df6
Tested-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Mehdi Ben Hadj Khelifa <mehdi.benhadjkhelifa@gmail.com>
Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
Link: https://lore.kernel.org/r/20251201222843.82310-2-mehdi.benhadjkhelifa@gmail.com
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit ed527ef0c264e4bed6c7b2a158ddf516b17f5f66 upstream.
When a user executes the FITRIM command, an underflow can occur when
calculating nblocks if end_block is too small. Since nblocks is of
type sector_t, which is u64, a negative nblocks value will become a
very large positive integer. This ultimately leads to the block layer
function __blkdev_issue_discard() taking an excessively long time to
process the bio chain, and the ns_segctor_sem lock remains held for a
long period. This prevents other tasks from acquiring the ns_segctor_sem
lock, resulting in the hang reported by syzbot in [1].
If the ending block is too small, typically if it is smaller than 4KiB
range, depending on the usage of the segment 0, it may be possible to
attempt a discard request beyond the device size causing the hang.
Exiting successfully and assign the discarded size (0 in this case)
to range->len.
Although the start and len values in the user input range are too small,
a conservative strategy is adopted here to safely ignore them, which is
equivalent to a no-op; it will not perform any trimming and will not
throw an error.
[1]
task:segctord state:D stack:28968 pid:6093 tgid:6093 ppid:2 task_flags:0x200040 flags:0x00080000
Call Trace:
rwbase_write_lock+0x3dd/0x750 kernel/locking/rwbase_rt.c:272
nilfs_transaction_lock+0x253/0x4c0 fs/nilfs2/segment.c:357
nilfs_segctor_thread_construct fs/nilfs2/segment.c:2569 [inline]
nilfs_segctor_thread+0x6ec/0xe00 fs/nilfs2/segment.c:2684
[ryusuke: corrected part of the commit message about the consequences]
Fixes: 82e11e857be3 ("nilfs2: add nilfs_sufile_trim_fs to trim clean segs")
Reported-by: syzbot+7eedce5eb281acd832f0@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=7eedce5eb281acd832f0
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit cf74fcdc43b322b6188a0750b5ee79e38be6d078 upstream.
With smbdirect_send_batch processing we likely have requests without
IB_SEND_SIGNALED, which will be destroyed in the final request
that has IB_SEND_SIGNALED set.
If the connection is broken all requests are signaled
even without explicit IB_SEND_SIGNALED.
Cc: <stable@vger.kernel.org> # 6.18.x
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 5b1c6149657af840a02885135c700ab42e6aa322 upstream.
The server has similar logic and it makes sure that
request->wr is used instead of a stack struct ib_send_wr send_wr.
This makes sure send_done can see request->wr.send_flags
as the next commit will check for IB_SEND_SIGNALED
Cc: <stable@vger.kernel.org> # 6.18.x
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 93ac432274e1361b4f6cd69e7c5d9b3ac21e13f5 upstream.
When we are about to use the last send credit that was
granted to us by the peer, we need to wait until
we are ourself able to grant at least one credit
to the peer. Otherwise it might not be possible
for the peer to grant more credits.
The following sections in MS-SMBD are related to this:
3.1.5.1 Sending Upper Layer Messages
...
If Connection.SendCredits is 1 and the CreditsGranted field of the
message is 0, stop processing.
...
3.1.5.9 Managing Credits Prior to Sending
...
If Connection.ReceiveCredits is zero, or if Connection.SendCredits is
one and the Connection.SendQueue is not empty, the sender MUST allocate
and post at least one receive of size Connection.MaxReceiveSize and MUST
increment Connection.ReceiveCredits by the number allocated and posted.
If no receives are posted, the processing MUST return a value of zero to
indicate to the caller that no Send message can be currently performed.
...
This is a similar logic as we have in the server.
Cc: <stable@vger.kernel.org> # 6.18.x
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 21538121efe6c8c5b51c742fa02cbe820bc48714 upstream.
It turns out that our code will corrupt the stream of
reassabled data transfer messages when we trigger an
immendiate (empty) send.
In order to fix this we'll have a single 'batch' credit per
connection. And code getting that credit is free to use
as much messages until remaining_length reaches 0, then
the batch credit it given back and the next logical send can
happen.
Cc: <stable@vger.kernel.org> # 6.18.x
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 2c1ac39ce9cd4112f406775c626eef7f3eb4c481 upstream.
This will allow us to use similar logic as we have in
the server soon, so that we can share common code later.
Cc: <stable@vger.kernel.org> # 6.18.x
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit dc77da0373529d43175984b390106be2d8f03609 upstream.
This is basically a copy of smb_direct_{alloc,free}_sendmsg()
in the server, with just using ib_dma_unmap_page() in all
cases, which is the same as ib_dma_unmap_single().
We'll use this logic in common code in future.
(I basically backported it from my branch that
as already has everything in common).
Cc: <stable@vger.kernel.org> # 6.18.x
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit bf30515caec590316e0d08208e4252eed4c160df upstream.
This is like smb_direct_post_send() in the server
and will simplify porting the smbdirect_send_batch
and credit related logic from the server.
Cc: <stable@vger.kernel.org> # 6.18.x
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit bb848d205f7ac0141af52a5acb6dd116d9b71177 upstream.
This simplifies the logic and prepares the use of
smbdirect_send_batch in order to make sure
all messages in a multi fragment send are grouped
together.
We'll add the smbdirect_send_batch processin
in a later patch.
Cc: <stable@vger.kernel.org> # 6.18.x
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
smbd_post_send_iter()
commit 8bfe3fd33f36b987c8200b112646732b5f5cd8b3 upstream.
If we reach this the connection is already broken as
smbd_post_send() already called
smbd_disconnect_rdma_connection().
This will also simplify further changes.
Cc: <stable@vger.kernel.org> # 6.18.x
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 6858531e5e8d68828eec349989cefce3f45a487f upstream.
We either reach this code path before we call
new_credits = manage_credits_prior_sending(sc),
which means new_credits is still 0
or the connection is already broken as
smbd_post_send() already called
smbd_disconnect_rdma_connection().
This will also simplify further changes.
Cc: <stable@vger.kernel.org> # 6.18.x
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit bf1656e12a9db2add716c7fb57b56967f69599fa upstream.
We don't need a stack variable in addition.
Cc: <stable@vger.kernel.org> # 6.18.x
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit defb3c05fee94b296eebe05aaea16d2664b00252 upstream.
In captures I saw that Windows was granting 191 credits in a batch
when its peer posted a lot of messages. We are asking for a
credit target of 255 and 191 is 252*3/4.
So we also use that logic in order to fill the
recv buffers available to the peer.
Fixes: 02548c477a90 ("smb: client: queue post_recv_credits_work also if the peer raises the credit target")
Cc: <stable@vger.kernel.org> # 6.18.x
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 9911b1ed187a770a43950bf51f340ad4b7beecba upstream.
The logic off managing recv credits by counting posted recv_io and
granted credits is racy.
That's because the peer might already consumed a credit,
but between receiving the incoming recv at the hardware
and processing the completion in the 'recv_done' functions
we likely have a window where we grant credits, which
don't really exist.
So we better have a decicated counter for the
available credits, which will be incremented
when we posted new recv buffers and drained when
we grant the credits to the peer.
Fixes: 5fb9b459b368 ("smb: client: count the number of posted recv_io messages in order to calculated credits")
Cc: <stable@vger.kernel.org> # 6.18.x
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 9da82dc73cb03e85d716a2609364572367a5ff47 upstream.
With smbdirect_send_batch processing we likely have requests without
IB_SEND_SIGNALED, which will be destroyed in the final request
that has IB_SEND_SIGNALED set.
If the connection is broken all requests are signaled
even without explicit IB_SEND_SIGNALED.
Cc: <stable@vger.kernel.org> # 6.18.x
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 8cf2bbac6281434065f5f3aeab19c9c08ff755a2 upstream.
When we are about to use the last send credit that was
granted to us by the peer, we need to wait until
we are ourself able to grant at least one credit
to the peer. Otherwise it might not be possible
for the peer to grant more credits.
The following sections in MS-SMBD are related to this:
3.1.5.1 Sending Upper Layer Messages
...
If Connection.SendCredits is 1 and the CreditsGranted field of the
message is 0, stop processing.
...
3.1.5.9 Managing Credits Prior to Sending
...
If Connection.ReceiveCredits is zero, or if Connection.SendCredits is
one and the Connection.SendQueue is not empty, the sender MUST allocate
and post at least one receive of size Connection.MaxReceiveSize and MUST
increment Connection.ReceiveCredits by the number allocated and posted.
If no receives are posted, the processing MUST return a value of zero to
indicate to the caller that no Send message can be currently performed.
...
This problem was found by running this on Windows 2025
against ksmbd with required smb signing:
'frametest.exe -r 4k -t 20 -n 2000' after
'frametest.exe -w 4k -t 20 -n 2000'.
Link: https://lore.kernel.org/linux-cifs/b58fa352-2386-4145-b42e-9b4b1d484e17@samba.org/
Cc: <stable@vger.kernel.org> # 6.18.x
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 34abd408c8ba24d7c97bd02ba874d8c714f49db1 upstream.
It turns out that our code will corrupt the stream of
reassabled data transfer messages when we trigger an
immendiate (empty) send.
In order to fix this we'll have a single 'batch' credit per
connection. And code getting that credit is free to use
as much messages until remaining_length reaches 0, then
the batch credit it given back and the next logical send can
happen.
Cc: <stable@vger.kernel.org> # 6.18.x
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 8106978d400cc88a99fb94927afe8fec7391ca3e upstream.
In captures I saw that Windows was granting 191 credits in a batch
when its peer posted a lot of messages. We are asking for a
credit target of 255 and 191 is 252*3/4.
So we also use that logic in order to fill the
recv buffers available to the peer.
Fixes: a7eef6144c97 ("smb: server: queue post_recv_credits_work in put_recvmsg() and avoid count_avail_recvmsg")
Cc: <stable@vger.kernel.org> # 6.18.x
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 26ad87a2cfb8c1384620d1693a166ed87303046e upstream.
The logic off managing recv credits by counting posted recv_io and
granted credits is racy.
That's because the peer might already consumed a credit,
but between receiving the incoming recv at the hardware
and processing the completion in the 'recv_done' functions
we likely have a window where we grant credits, which
don't really exist.
So we better have a decicated counter for the
available credits, which will be incremented
when we posted new recv buffers and drained when
we grant the credits to the peer.
This fixes regression Namjae reported with
the 6.18 release.
Fixes: 89b021a72663 ("smb: server: manage recv credits by counting posted recv_io and granted credits")
Cc: <stable@vger.kernel.org> # 6.18.x
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 8e94268b21c8235d430ce1aa6dc0b15952744b9b upstream.
It turns out that our code will corrupt the stream of
reassabled data transfer messages when we trigger an
immendiate (empty) send.
In order to fix this we'll have a single 'batch' credit per
connection. And code getting that credit is free to use
as much messages until remaining_length reaches 0, then
the batch credit it given back and the next logical send can
happen.
Cc: <stable@vger.kernel.org> # 6.18.x
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 6e3c5052f9686192e178806e017b7377155f4bab upstream.
The logic off managing recv credits by counting posted recv_io and
granted credits is racy.
That's because the peer might already consumed a credit,
but between receiving the incoming recv at the hardware
and processing the completion in the 'recv_done' functions
we likely have a window where we grant credits, which
don't really exist.
So we better have a decicated counter for the
available credits, which will be incremented
when we posted new recv buffers and drained when
we grant the credits to the peer.
Fixes: 5fb9b459b368 ("smb: client: count the number of posted recv_io messages in order to calculated credits")
Fixes: 89b021a72663 ("smb: server: manage recv credits by counting posted recv_io and granted credits")
Cc: <stable@vger.kernel.org> # 6.18.x
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 77ffbcac4e569566d0092d5f22627dfc0896b553 upstream.
On kthread_run() failure in ksmbd_tcp_new_connection(), the transport is
freed via free_transport(), which does not decrement active_num_conn,
leaking this counter.
Replace free_transport() with ksmbd_tcp_disconnect().
Fixes: 0d0d4680db22e ("ksmbd: add max connections parameter")
Cc: stable@vger.kernel.org
Signed-off-by: Henrique Carvalho <henrique.carvalho@suse.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 4f3a06cc57976cafa8c6f716646be6c79a99e485 upstream.
ksmbd_chann_list xarray lacks synchronization, allowing use-after-free in
multi-channel sessions (between lookup_chann_list() and ksmbd_chann_del).
Adds rw_semaphore chann_lock to struct ksmbd_session and protects
all xa_load/xa_store/xa_erase accesses.
Cc: stable@vger.kernel.org
Reported-by: Igor Stepansky <igor.stepansky@orca.security>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 010eb01ce23b34b50531448b0da391c7f05a72af upstream.
The problem occurs when a signed request fails smb2 signature verification
check. In __process_request(), if check_sign_req() returns an error,
set_smb2_rsp_status(work, STATUS_ACCESS_DENIED) is called.
set_smb2_rsp_status() set work->next_smb2_rcv_hdr_off as zero. By resetting
next_smb2_rcv_hdr_off to zero, the pointer to the next command in the chain
is lost. Consequently, is_chained_smb2_message() continues to point to
the same request header instead of advancing. If the header's NextCommand
field is non-zero, the function returns true, causing __handle_ksmbd_work()
to repeatedly process the same failed request in an infinite loop.
This results in the kernel log being flooded with "bad smb2 signature"
messages and high CPU usage.
This patch fixes the issue by changing the return value from
SERVER_HANDLER_CONTINUE to SERVER_HANDLER_ABORT. This ensures that
the processing loop terminates immediately rather than attempting to
continue from an invalidated offset.
Reported-by: tianshuo han <hantianshuo233@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit ec306600d5ba7148c9dbf8f5a8f1f5c1a044a241 upstream.
is_open, has_lease and on_list are stored in the same bitfield byte in
struct cached_fid but are updated in different code paths that may run
concurrently. Bitfield assignments generate byte read–modify–write
operations (e.g. `orb $mask, addr` on x86_64), so updating one flag can
restore stale values of the others.
A possible interleaving is:
CPU1: load old byte (has_lease=1, on_list=1)
CPU2: clear both flags (store 0)
CPU1: RMW store (old | IS_OPEN) -> reintroduces cleared bits
To avoid this class of races, convert these flags to separate bool
fields.
Cc: stable@vger.kernel.org
Fixes: ebe98f1447bbc ("cifs: enable caching of directories for which a lease is held")
Signed-off-by: Henrique Carvalho <henrique.carvalho@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull hotfixes from Andrew Morton:
"A couple of late-breaking MM fixes. One against a new-in-this-cycle
patch and the other addresses a locking issue which has been there for
over a year"
* tag 'mm-hotfixes-stable-2026-02-06-12-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm/memory-failure: reject unsupported non-folio compound page
procfs: avoid fetching build ID while holding VMA lock
|
|
Pull ceph fixes from Ilya Dryomov:
"One RBD and two CephFS fixes which address potential oopses.
The RBD thing is more of a rare edge case that pops up in our CI,
while the two CephFS scenarios are regressions that were reported by
users and can be triggered trivially in normal operation. All marked
for stable"
* tag 'ceph-for-6.19-rc9' of https://github.com/ceph/ceph-client:
ceph: fix NULL pointer dereference in ceph_mds_auth_match()
ceph: fix oops due to invalid pointer for kfree() in parse_longname()
rbd: check for EOD after exclusive lock is ensured to be held
|
|
Fix PROCMAP_QUERY to fetch optional build ID only after dropping mmap_lock
or per-VMA lock, whichever was used to lock VMA under question, to avoid
deadlock reported by syzbot:
-> #1 (&mm->mmap_lock){++++}-{4:4}:
__might_fault+0xed/0x170
_copy_to_iter+0x118/0x1720
copy_page_to_iter+0x12d/0x1e0
filemap_read+0x720/0x10a0
blkdev_read_iter+0x2b5/0x4e0
vfs_read+0x7f4/0xae0
ksys_read+0x12a/0x250
do_syscall_64+0xcb/0xf80
entry_SYSCALL_64_after_hwframe+0x77/0x7f
-> #0 (&sb->s_type->i_mutex_key#8){++++}-{4:4}:
__lock_acquire+0x1509/0x26d0
lock_acquire+0x185/0x340
down_read+0x98/0x490
blkdev_read_iter+0x2a7/0x4e0
__kernel_read+0x39a/0xa90
freader_fetch+0x1d5/0xa80
__build_id_parse.isra.0+0xea/0x6a0
do_procmap_query+0xd75/0x1050
procfs_procmap_ioctl+0x7a/0xb0
__x64_sys_ioctl+0x18e/0x210
do_syscall_64+0xcb/0xf80
entry_SYSCALL_64_after_hwframe+0x77/0x7f
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
rlock(&mm->mmap_lock);
lock(&sb->s_type->i_mutex_key#8);
lock(&mm->mmap_lock);
rlock(&sb->s_type->i_mutex_key#8);
*** DEADLOCK ***
This seems to be exacerbated (as we haven't seen these syzbot reports
before that) by the recent:
777a8560fd29 ("lib/buildid: use __kernel_read() for sleepable context")
To make this safe, we need to grab file refcount while VMA is still locked, but
other than that everything is pretty straightforward. Internal build_id_parse()
API assumes VMA is passed, but it only needs the underlying file reference, so
just add another variant build_id_parse_file() that expects file passed
directly.
[akpm@linux-foundation.org: fix up kerneldoc]
Link: https://lkml.kernel.org/r/20260129215340.3742283-1-andrii@kernel.org
Fixes: ed5d583a88a9 ("fs/procfs: implement efficient VMA querying API for /proc/<pid>/maps")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reported-by: <syzbot+4e70c8e0a2017b432f7a@syzkaller.appspotmail.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Tested-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Song Liu <song@kernel.org>
Cc: Stanislav Fomichev <sdf@fomichev.me>
Cc: Yonghong Song <yonghong.song@linux.dev>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The CephFS kernel client has regression starting from 6.18-rc1.
We have issue in ceph_mds_auth_match() if fs_name == NULL:
const char fs_name = mdsc->fsc->mount_options->mds_namespace;
...
if (auth->match.fs_name && strcmp(auth->match.fs_name, fs_name)) {
/ fsname mismatch, try next one */
return 0;
}
Patrick Donnelly suggested that: In summary, we should definitely start
decoding `fs_name` from the MDSMap and do strict authorizations checks
against it. Note that the `-o mds_namespace=foo` should only be used for
selecting the file system to mount and nothing else. It's possible
no mds_namespace is specified but the kernel will mount the only
file system that exists which may have name "foo".
This patch reworks ceph_mdsmap_decode() and namespace_equals() with
the goal of supporting the suggested concept. Now struct ceph_mdsmap
contains m_fs_name field that receives copy of extracted FS name
by ceph_extract_encoded_string(). For the case of "old" CephFS file
systems, it is used "cephfs" name.
[ idryomov: replace redundant %*pE with %s in ceph_mdsmap_decode(),
get rid of a series of strlen() calls in ceph_namespace_match(),
drop changes to namespace_equals() body to avoid treating empty
mds_namespace as equal, drop changes to ceph_mdsc_handle_fsmap()
as namespace_equals() isn't an equivalent substitution there ]
Cc: stable@vger.kernel.org
Fixes: 22c73d52a6d0 ("ceph: fix multifs mds auth caps issue")
Link: https://tracker.ceph.com/issues/73886
Signed-off-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
Tested-by: Patrick Donnelly <pdonnell@ibm.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Pull smb client fixes from Steve French:
"Two small client memory leak fixes"
* tag 'v6.19rc8-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb/client: fix memory leak in SendReceive()
smb/client: fix memory leak in smb2_open_file()
|
|
This fixes a kernel oops when reading ceph snapshot directories (.snap),
for example by simply running `ls /mnt/my_ceph/.snap`.
The variable str is guarded by __free(kfree), but advanced by one for
skipping the initial '_' in snapshot names. Thus, kfree() is called
with an invalid pointer. This patch removes the need for advancing the
pointer so kfree() is called with correct memory pointer.
Steps to reproduce:
1. Create snapshots on a cephfs volume (I've 63 snaps in my testcase)
2. Add cephfs mount to fstab
$ echo "samba-fileserver@.files=/volumes/datapool/stuff/3461082b-ecc9-4e82-8549-3fd2590d3fb6 /mnt/test/stuff ceph acl,noatime,_netdev 0 0" >> /etc/fstab
3. Reboot the system
$ systemctl reboot
4. Check if it's really mounted
$ mount | grep stuff
5. List snapshots (expected 63 snapshots on my system)
$ ls /mnt/test/stuff/.snap
Now ls hangs forever and the kernel log shows the oops.
Cc: stable@vger.kernel.org
Fixes: 101841c38346 ("[ceph] parse_longname(): strrchr() expects NUL-terminated string")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220807
Suggested-by: Helge Deller <deller@gmx.de>
Signed-off-by: Daniel Vogelbacher <daniel@chaospixel.com>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fix from David Sterba:
"A regression fix for a memory leak when raid56 is used"
* tag 'for-6.19-rc8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: raid56: fix memory leak of btrfs_raid_bio::stripe_uptodate_bitmap
|
|
We allocate the bitmap but we never free it in free_raid_bio_pointers().
Fix this by adding a bitmap_free() call against the stripe_uptodate_bitmap
of a raid bio.
Fixes: 1810350b04ef ("btrfs: raid56: move sector_ptr::uptodate into a dedicated bitmap")
Reported-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/linux-btrfs/20260126045315.GA31641@lst.de/
Reviewed-by: Qu Wenruo <wqu@suse.com>
Tested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Reproducer:
1. server: supports SMB1, directories are exported read-only
2. client: mount -t cifs -o vers=1.0 //${server_ip}/export /mnt
3. client: dd if=/dev/zero of=/mnt/file bs=512 count=1000 oflag=direct
4. client: umount /mnt
5. client: sleep 1
6. client: modprobe -r cifs
The error message is as follows:
=============================================================================
BUG cifs_small_rq (Not tainted): Objects remaining on __kmem_cache_shutdown()
-----------------------------------------------------------------------------
Object 0x00000000d34491e6 @offset=896
Object 0x00000000bde9fab3 @offset=4480
Object 0x00000000104a1f70 @offset=6272
Object 0x0000000092a51bb5 @offset=7616
Object 0x000000006714a7db @offset=13440
...
WARNING: mm/slub.c:1251 at __kmem_cache_shutdown+0x379/0x3f0, CPU#7: modprobe/712
...
Call Trace:
<TASK>
kmem_cache_destroy+0x69/0x160
cifs_destroy_request_bufs+0x39/0x40 [cifs]
cleanup_module+0x43/0xfc0 [cifs]
__se_sys_delete_module+0x1d5/0x300
__x64_sys_delete_module+0x1a/0x30
x64_sys_call+0x2299/0x2ff0
do_syscall_64+0x6e/0x270
entry_SYSCALL_64_after_hwframe+0x76/0x7e
...
kmem_cache_destroy cifs_small_rq: Slab cache still has objects when called from cifs_destroy_request_bufs+0x39/0x40 [cifs]
WARNING: mm/slab_common.c:532 at kmem_cache_destroy+0x142/0x160, CPU#7: modprobe/712
Link: https://lore.kernel.org/linux-cifs/9751f02d-d1df-4265-a7d6-b19761b21834@linux.dev/T/#mf14808c144448b715f711ce5f0477a071f08eaf6
Fixes: 6be09580df5c ("cifs: Make smb1's SendReceive() wrap cifs_send_recv()")
Reported-by: Paulo Alcantara <pc@manguebit.org>
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Reproducer:
1. server: directories are exported read-only
2. client: mount -t cifs //${server_ip}/export /mnt
3. client: dd if=/dev/zero of=/mnt/file bs=512 count=1000 oflag=direct
4. client: umount /mnt
5. client: sleep 1
6. client: modprobe -r cifs
The error message is as follows:
=============================================================================
BUG cifs_small_rq (Not tainted): Objects remaining on __kmem_cache_shutdown()
-----------------------------------------------------------------------------
Object 0x00000000d47521be @offset=14336
...
WARNING: mm/slub.c:1251 at __kmem_cache_shutdown+0x34e/0x440, CPU#0: modprobe/1577
...
Call Trace:
<TASK>
kmem_cache_destroy+0x94/0x190
cifs_destroy_request_bufs+0x3e/0x50 [cifs]
cleanup_module+0x4e/0x540 [cifs]
__se_sys_delete_module+0x278/0x400
__x64_sys_delete_module+0x5f/0x70
x64_sys_call+0x2299/0x2ff0
do_syscall_64+0x89/0x350
entry_SYSCALL_64_after_hwframe+0x76/0x7e
...
kmem_cache_destroy cifs_small_rq: Slab cache still has objects when called from cifs_destroy_request_bufs+0x3e/0x50 [cifs]
WARNING: mm/slab_common.c:532 at kmem_cache_destroy+0x16b/0x190, CPU#0: modprobe/1577
Link: https://lore.kernel.org/linux-cifs/9751f02d-d1df-4265-a7d6-b19761b21834@linux.dev/T/#mf14808c144448b715f711ce5f0477a071f08eaf6
Fixes: e255612b5ed9 ("cifs: Add fallback for SMB2 CREATE without FILE_READ_ATTRIBUTES")
Reported-by: Paulo Alcantara <pc@manguebit.org>
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Reviewed-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|