Age | Commit message (Collapse) | Author | Files | Lines |
|
commit 267c159f9c7bcb7009dae16889b880c5ed8759a8 upstream.
We should return when io->bio is null before doing anything. Otherwise, panic.
BUG: kernel NULL pointer dereference, address: 0000000000000010
RIP: 0010:__submit_merged_write_cond+0x164/0x240 [f2fs]
Call Trace:
<TASK>
f2fs_submit_merged_write+0x1d/0x30 [f2fs]
commit_checkpoint+0x110/0x1e0 [f2fs]
f2fs_write_checkpoint+0x9f7/0xf00 [f2fs]
? __pfx_issue_checkpoint_thread+0x10/0x10 [f2fs]
__checkpoint_and_complete_reqs+0x84/0x190 [f2fs]
? preempt_count_add+0x82/0xc0
? __pfx_issue_checkpoint_thread+0x10/0x10 [f2fs]
issue_checkpoint_thread+0x4c/0xf0 [f2fs]
? __pfx_autoremove_wake_function+0x10/0x10
kthread+0xff/0x130
? __pfx_kthread+0x10/0x10
ret_from_fork+0x2c/0x50
</TASK>
Cc: stable@vger.kernel.org # v5.18+
Fixes: 64bf0eef0171 ("f2fs: pass the bio operation to bio_alloc_bioset")
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 844545c51a5b2a524b22a2fe9d0b353b827d24b4 upstream.
When writing a page from an encrypted file that is using
filesystem-layer encryption (not inline encryption), f2fs encrypts the
pagecache page into a bounce page, then writes the bounce page.
It also passes the bounce page to wbc_account_cgroup_owner(). That's
incorrect, because the bounce page is a newly allocated temporary page
that doesn't have the memory cgroup of the original pagecache page.
This makes wbc_account_cgroup_owner() not account the I/O to the owner
of the pagecache page as it should.
Fix this by always passing the pagecache page to
wbc_account_cgroup_owner().
Fixes: 578c647879f7 ("f2fs: implement cgroup writeback support")
Cc: stable@vger.kernel.org
Reported-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 3aa51c61cb4a4dcb40df51ac61171e9ac5a35321 upstream.
If the storage gives a corrupted node block due to short power failure and
reset, f2fs stops the entire operations by setting the checkpoint failure flag.
Let's give more chances to live by re-issuing IOs for a while in such critical
path.
Cc: stable@vger.kernel.org
Suggested-by: Randall Huang <huangrandall@google.com>
Suggested-by: Chao Yu <chao@kernel.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 9a5571cff4ffcfc24847df9fd545cc5799ac0ee5 upstream.
When converting an inline directory to a regular one, f2fs is leaking
uninitialized memory to disk because it doesn't initialize the entire
directory block. Fix this by zero-initializing the block.
This bug was introduced by commit 4ec17d688d74 ("f2fs: avoid unneeded
initializing when converting inline dentry"), which didn't consider the
security implications of leaking uninitialized memory to disk.
This was found by running xfstest generic/435 on a KMSAN-enabled kernel.
Fixes: 4ec17d688d74 ("f2fs: avoid unneeded initializing when converting inline dentry")
Cc: <stable@vger.kernel.org> # v4.3+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 00908b3388255fc1d3782b744d07f327712f401f upstream.
This patch moves to send a ack back for receiving a FIN message only
when we are in valid states. In other cases and there might be a sender
waiting for a ack we just let it timeout at the senders time and
hopefully all other cleanups will remove the FIN message on their
sending queue. As an example we should never send out an ACK being in
LAST_ACK state or we cannot assume a working socket communication when
we are in CLOSED state.
Cc: stable@vger.kernel.org
Fixes: 489d8e559c65 ("fs: dlm: add reliable connection if reconnect")
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit a58496361802070996f9bd76e941d109c4a85ebd upstream.
This patch moves the send fin handling, which should appear in a specific
state change, into the state change handling while the per node
state_lock is held. I experienced issues with other messages because
we changed the state and a fin message was sent out in a different state.
Cc: stable@vger.kernel.org
Fixes: 489d8e559c65 ("fs: dlm: add reliable connection if reconnect")
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 15c63db8e86a72e0d5cfb9bf0cd1870e39a3e5fe upstream.
Similar to the stop tx flag, the rx flag should warn about a dlm message
being received at DLM_FIN state change, when we are assuming no other
dlm application messages. If we receive a FIN message and we are in the
state DLM_FIN_WAIT2 we call midcomms_node_reset() which puts the
midcomms node into DLM_CLOSED state. Afterwards we should not set the
DLM_NODE_FLAG_STOP_RX flag any more. This patch changes the setting
DLM_NODE_FLAG_STOP_RX in those state changes when we receive a FIN
message and we assume there will be no other dlm application messages
received until we hit DLM_CLOSED state.
Cc: stable@vger.kernel.org
Fixes: 489d8e559c65 ("fs: dlm: add reliable connection if reconnect")
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 164272113b685927126c938b4a9cbd2075eb15ee upstream.
This patch sets the stop tx flag before we commit the dlm message.
This flag will report about unexpected transmissions after we
send the DLM_FIN message out, which should be the last message sent.
When we commit the dlm fin message, it could be that we already
got an ack back and the CLOSED state change already happened.
We should not set this flag when we are in CLOSED state. To avoid this
race we simply set the tx flag before the state change can be in
progress by moving it before dlm_midcomms_commit_mhandle().
Cc: stable@vger.kernel.org
Fixes: 489d8e559c65 ("fs: dlm: add reliable connection if reconnect")
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 7354fa4ef697191effedc2ae9a8293427708bbf5 upstream.
If we release a midcomms node structure, there should be nothing left
inside the dlm midcomms send queue. However, sometimes this is not true
because I believe some DLM_FIN message was not acked... if we run
into a shutdown timeout, then we should be sure there is no pending send
dlm message inside this queue when releasing midcomms node structure.
Cc: stable@vger.kernel.org
Fixes: 489d8e559c65 ("fs: dlm: add reliable connection if reconnect")
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 724b6bab0d75f1dc01fdfbf7fe8d4217a5cb90ba upstream.
While working on processing dlm message in softirq context I experienced
the following KASAN use-after-free warning:
[ 151.760477] ==================================================================
[ 151.761803] BUG: KASAN: use-after-free in dlm_midcomms_commit_mhandle+0x19d/0x4b0
[ 151.763414] Read of size 4 at addr ffff88811a980c60 by task lock_torture/1347
[ 151.765284] CPU: 7 PID: 1347 Comm: lock_torture Not tainted 6.1.0-rc4+ #2828
[ 151.766778] Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.16.0-3.module+el8.7.0+16134+e5908aa2 04/01/2014
[ 151.768726] Call Trace:
[ 151.769277] <TASK>
[ 151.769748] dump_stack_lvl+0x5b/0x86
[ 151.770556] print_report+0x180/0x4c8
[ 151.771378] ? kasan_complete_mode_report_info+0x7c/0x1e0
[ 151.772241] ? dlm_midcomms_commit_mhandle+0x19d/0x4b0
[ 151.773069] kasan_report+0x93/0x1a0
[ 151.773668] ? dlm_midcomms_commit_mhandle+0x19d/0x4b0
[ 151.774514] __asan_load4+0x7e/0xa0
[ 151.775089] dlm_midcomms_commit_mhandle+0x19d/0x4b0
[ 151.775890] ? create_message.isra.29.constprop.64+0x57/0xc0
[ 151.776770] send_common+0x19f/0x1b0
[ 151.777342] ? remove_from_waiters+0x60/0x60
[ 151.778017] ? lock_downgrade+0x410/0x410
[ 151.778648] ? __this_cpu_preempt_check+0x13/0x20
[ 151.779421] ? rcu_lockdep_current_cpu_online+0x88/0xc0
[ 151.780292] _convert_lock+0x46/0x150
[ 151.780893] convert_lock+0x7b/0xc0
[ 151.781459] dlm_lock+0x3ac/0x580
[ 151.781993] ? 0xffffffffc0540000
[ 151.782522] ? torture_stop+0x120/0x120 [dlm_locktorture]
[ 151.783379] ? dlm_scan_rsbs+0xa70/0xa70
[ 151.784003] ? preempt_count_sub+0xd6/0x130
[ 151.784661] ? is_module_address+0x47/0x70
[ 151.785309] ? torture_stop+0x120/0x120 [dlm_locktorture]
[ 151.786166] ? 0xffffffffc0540000
[ 151.786693] ? lockdep_init_map_type+0xc3/0x360
[ 151.787414] ? 0xffffffffc0540000
[ 151.787947] torture_dlm_lock_sync.isra.3+0xe9/0x150 [dlm_locktorture]
[ 151.789004] ? torture_stop+0x120/0x120 [dlm_locktorture]
[ 151.789858] ? 0xffffffffc0540000
[ 151.790392] ? lock_torture_cleanup+0x20/0x20 [dlm_locktorture]
[ 151.791347] ? delay_tsc+0x94/0xc0
[ 151.791898] torture_ex_iter+0xc3/0xea [dlm_locktorture]
[ 151.792735] ? torture_start+0x30/0x30 [dlm_locktorture]
[ 151.793606] lock_torture+0x177/0x270 [dlm_locktorture]
[ 151.794448] ? torture_dlm_lock_sync.isra.3+0x150/0x150 [dlm_locktorture]
[ 151.795539] ? lock_torture_stats+0x80/0x80 [dlm_locktorture]
[ 151.796476] ? do_raw_spin_lock+0x11e/0x1e0
[ 151.797152] ? mark_held_locks+0x34/0xb0
[ 151.797784] ? _raw_spin_unlock_irqrestore+0x30/0x70
[ 151.798581] ? __kthread_parkme+0x79/0x110
[ 151.799246] ? trace_preempt_on+0x2a/0xf0
[ 151.799902] ? __kthread_parkme+0x79/0x110
[ 151.800579] ? preempt_count_sub+0xd6/0x130
[ 151.801271] ? __kasan_check_read+0x11/0x20
[ 151.801963] ? __kthread_parkme+0xec/0x110
[ 151.802630] ? lock_torture_stats+0x80/0x80 [dlm_locktorture]
[ 151.803569] kthread+0x192/0x1d0
[ 151.804104] ? kthread_complete_and_exit+0x30/0x30
[ 151.804881] ret_from_fork+0x1f/0x30
[ 151.805480] </TASK>
[ 151.806111] Allocated by task 1347:
[ 151.806681] kasan_save_stack+0x26/0x50
[ 151.807308] kasan_set_track+0x25/0x30
[ 151.807920] kasan_save_alloc_info+0x1e/0x30
[ 151.808609] __kasan_slab_alloc+0x63/0x80
[ 151.809263] kmem_cache_alloc+0x1ad/0x830
[ 151.809916] dlm_allocate_mhandle+0x17/0x20
[ 151.810590] dlm_midcomms_get_mhandle+0x96/0x260
[ 151.811344] _create_message+0x95/0x180
[ 151.811994] create_message.isra.29.constprop.64+0x57/0xc0
[ 151.812880] send_common+0x129/0x1b0
[ 151.813467] _convert_lock+0x46/0x150
[ 151.814074] convert_lock+0x7b/0xc0
[ 151.814648] dlm_lock+0x3ac/0x580
[ 151.815199] torture_dlm_lock_sync.isra.3+0xe9/0x150 [dlm_locktorture]
[ 151.816258] torture_ex_iter+0xc3/0xea [dlm_locktorture]
[ 151.817129] lock_torture+0x177/0x270 [dlm_locktorture]
[ 151.817986] kthread+0x192/0x1d0
[ 151.818518] ret_from_fork+0x1f/0x30
[ 151.819369] Freed by task 1336:
[ 151.819890] kasan_save_stack+0x26/0x50
[ 151.820514] kasan_set_track+0x25/0x30
[ 151.821128] kasan_save_free_info+0x2e/0x50
[ 151.821812] __kasan_slab_free+0x107/0x1a0
[ 151.822483] kmem_cache_free+0x204/0x5e0
[ 151.823152] dlm_free_mhandle+0x18/0x20
[ 151.823781] dlm_mhandle_release+0x2e/0x40
[ 151.824454] rcu_core+0x583/0x1330
[ 151.825047] rcu_core_si+0xe/0x20
[ 151.825594] __do_softirq+0xf4/0x5c2
[ 151.826450] Last potentially related work creation:
[ 151.827238] kasan_save_stack+0x26/0x50
[ 151.827870] __kasan_record_aux_stack+0xa2/0xc0
[ 151.828609] kasan_record_aux_stack_noalloc+0xb/0x20
[ 151.829415] call_rcu+0x4c/0x760
[ 151.829954] dlm_mhandle_delete+0x97/0xb0
[ 151.830718] dlm_process_incoming_buffer+0x2fc/0xb30
[ 151.831524] process_dlm_messages+0x16e/0x470
[ 151.832245] process_one_work+0x505/0xa10
[ 151.832905] worker_thread+0x67/0x650
[ 151.833507] kthread+0x192/0x1d0
[ 151.834046] ret_from_fork+0x1f/0x30
[ 151.834900] The buggy address belongs to the object at ffff88811a980c30
which belongs to the cache dlm_mhandle of size 88
[ 151.836894] The buggy address is located 48 bytes inside of
88-byte region [ffff88811a980c30, ffff88811a980c88)
[ 151.839007] The buggy address belongs to the physical page:
[ 151.839904] page:0000000076cf5d62 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x11a980
[ 151.841378] flags: 0x8000000000000200(slab|zone=2)
[ 151.842141] raw: 8000000000000200 0000000000000000 dead000000000122 ffff8881089b43c0
[ 151.843401] raw: 0000000000000000 0000000000220022 00000001ffffffff 0000000000000000
[ 151.844640] page dumped because: kasan: bad access detected
[ 151.845822] Memory state around the buggy address:
[ 151.846602] ffff88811a980b00: fb fb fb fb fc fc fc fc fa fb fb fb fb fb fb fb
[ 151.847761] ffff88811a980b80: fb fb fb fc fc fc fc fa fb fb fb fb fb fb fb fb
[ 151.848921] >ffff88811a980c00: fb fb fc fc fc fc fa fb fb fb fb fb fb fb fb fb
[ 151.850076] ^
[ 151.851085] ffff88811a980c80: fb fc fc fc fc fa fb fb fb fb fb fb fb fb fb fb
[ 151.852269] ffff88811a980d00: fc fc fc fc fa fb fb fb fb fb fb fb fb fb fb fc
[ 151.853428] ==================================================================
[ 151.855618] Disabling lock debugging due to kernel taint
It is accessing a mhandle in dlm_midcomms_commit_mhandle() and the mhandle
was freed by a call_rcu() call in dlm_process_incoming_buffer(),
dlm_mhandle_delete(). It looks like it was freed because an ack of
this message was received. There is a short race between committing the
dlm message to be transmitted and getting an ack back. If the ack is
faster than returning from dlm_midcomms_commit_msg_3_2(), then we run
into a use-after free because we still need to reference the mhandle when
calling srcu_read_unlock().
To avoid that, we don't allow that mhandle to be freed between
dlm_midcomms_commit_msg_3_2() and srcu_read_unlock() by using rcu read
lock. We can do that because mhandle is protected by rcu handling.
Cc: stable@vger.kernel.org
Fixes: 489d8e559c65 ("fs: dlm: add reliable connection if reconnect")
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit aad633dc0cf90093998b1ae0ba9f19b5f1dab644 upstream.
The scand kthread can send dlm messages out, especially dlm remove
messages to free memory for unused rsb on other nodes. To send out dlm
messages, midcomms must be initialized. This patch moves the midcomms
start before scand is started.
Cc: stable@vger.kernel.org
Fixes: e7fd41792fc0 ("[DLM] The core of the DLM for GFS2/CLVM")
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 39c1ce8eafc0ff64fb9e28536ccc7df6a8e2999d upstream.
inode->i_blocks is not real number of blocks, but 512 byte ones.
Fixes: 98d917047e8b ("exfat: add file operations")
Cc: stable@vger.kernel.org # v5.7+
Reported-by: Wang Yugui <wangyugui@e16-tech.com>
Tested-by: Wang Yugui <wangyugui@e16-tech.com>
Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Reviewed-by: Andy Wu <Andy.Wu@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit bdaadfd343e3cba49ad0b009ff4b148dad0fa404 upstream.
When a file or a directory is deleted, the hint for the cluster of
its parent directory in its in-memory inode is set as DIR_DELETED.
Therefore, DIR_DELETED must be one of invalid cluster numbers. According
to the exFAT specification, a volume can have at most 2^32-11 clusters.
However, DIR_DELETED is wrongly defined as 0xFFFF0321, which could be
a valid cluster number. To fix it, let's redefine DIR_DELETED as
0xFFFFFFF7, the bad cluster number.
Fixes: 1acf1a564b60 ("exfat: add in-memory and on-disk structures and headers")
Cc: stable@vger.kernel.org # v5.7+
Reported-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 6cb5d1a16a51d080fbc1649a5144cbc5ca7d6f88 upstream.
If the position is not aligned with the dentry size, the return
value of readdir() will be NULL and errno is 0, which means the
end of the directory stream is reached.
If the position is aligned with dentry size, but there is no file
or directory at the position, exfat_readdir() will continue to
get dentry from the next dentry. So the dentry gotten by readdir()
may not be at the position.
After this commit, if the position is not aligned with the dentry
size, round the position up to the dentry size and continue to get
the dentry.
Fixes: ca06197382bd ("exfat: add directory operations")
Cc: stable@vger.kernel.org # v5.7+
Reported-by: Wang Yugui <wangyugui@e16-tech.com>
Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Reviewed-by: Andy Wu <Andy.Wu@sony.com>
Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 706fdcac002316893434d753be8cfb549fe1d40d upstream.
Since seekdir() does not check whether the position is valid, the
position may exceed the size of the directory. We found that for
a directory with discontinuous clusters, if the position exceeds
the size of the directory and the excess size is greater than or
equal to the cluster size, exfat_readdir() will return -EIO,
causing a file system error and making the file system unavailable.
Reproduce this bug by:
seekdir(dir, dir_size + cluster_size);
dirent = readdir(dir);
The following log will be printed if mount with 'errors=remount-ro'.
[11166.712896] exFAT-fs (sdb1): error, invalid access to FAT (entry 0xffffffff)
[11166.712905] exFAT-fs (sdb1): Filesystem has been set read-only
Fixes: 1e5654de0f51 ("exfat: handle wrong stream entry size in exfat_readdir()")
Cc: stable@vger.kernel.org # v5.7+
Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Reviewed-by: Andy Wu <Andy.Wu@sony.com>
Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 07db5e247ab5858439b14dd7cc1fe538b9efcf32 upstream.
The current hfsplus_put_super first calls hfs_btree_close on
sbi->ext_tree, then invokes iput on sbi->hidden_dir, resulting in an
use-after-free issue in hfsplus_release_folio.
As shown in hfsplus_fill_super, the error handling code also calls iput
before hfs_btree_close.
To fix this error, we move all iput calls before hfsplus_btree_close.
Note that this patch is tested on Syzbot.
Link: https://lkml.kernel.org/r/20230226124948.3175736-1-mudongliangabcd@gmail.com
Reported-by: syzbot+57e3e98f7e3b80f64d56@syzkaller.appspotmail.com
Tested-by: Dongliang Mu <mudongliangabcd@gmail.com>
Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit a9dc087fd3c484fd1ed18c5efb290efaaf44ce03 upstream.
Syzbot found a kernel BUG in hfs_bnode_put():
kernel BUG at fs/hfs/bnode.c:466!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 3634 Comm: kworker/u4:5 Not tainted 6.1.0-rc7-syzkaller-00190-g97ee9d1c1696 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
Workqueue: writeback wb_workfn (flush-7:0)
RIP: 0010:hfs_bnode_put+0x46f/0x480 fs/hfs/bnode.c:466
Code: 8a 80 ff e9 73 fe ff ff 89 d9 80 e1 07 80 c1 03 38 c1 0f 8c a0 fe ff ff 48 89 df e8 db 8a 80 ff e9 93 fe ff ff e8 a1 68 2c ff <0f> 0b e8 9a 68 2c ff 0f 0b 0f 1f 84 00 00 00 00 00 55 41 57 41 56
RSP: 0018:ffffc90003b4f258 EFLAGS: 00010293
RAX: ffffffff825e318f RBX: 0000000000000000 RCX: ffff8880739dd7c0
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffffc90003b4f430 R08: ffffffff825e2d9b R09: ffffed10045157d1
R10: ffffed10045157d1 R11: 1ffff110045157d0 R12: ffff8880228abe80
R13: ffff88807016c000 R14: dffffc0000000000 R15: ffff8880228abe00
FS: 0000000000000000(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fa6ebe88718 CR3: 000000001e93d000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
hfs_write_inode+0x1bc/0xb40
write_inode fs/fs-writeback.c:1440 [inline]
__writeback_single_inode+0x4d6/0x670 fs/fs-writeback.c:1652
writeback_sb_inodes+0xb3b/0x18f0 fs/fs-writeback.c:1878
__writeback_inodes_wb+0x125/0x420 fs/fs-writeback.c:1949
wb_writeback+0x440/0x7b0 fs/fs-writeback.c:2054
wb_check_start_all fs/fs-writeback.c:2176 [inline]
wb_do_writeback fs/fs-writeback.c:2202 [inline]
wb_workfn+0x827/0xef0 fs/fs-writeback.c:2235
process_one_work+0x877/0xdb0 kernel/workqueue.c:2289
worker_thread+0xb14/0x1330 kernel/workqueue.c:2436
kthread+0x266/0x300 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>
The BUG_ON() is triggered at here:
/* Dispose of resources used by a node */
void hfs_bnode_put(struct hfs_bnode *node)
{
if (node) {
<skipped>
BUG_ON(!atomic_read(&node->refcnt)); <- we have issue here!!!!
<skipped>
}
}
By tracing the refcnt, I found the node is created by hfs_bmap_alloc()
with refcnt 1. Then the node is used by hfs_btree_write(). There is a
missing of hfs_bnode_get() after find the node. The issue happened in
following path:
<alloc>
hfs_bmap_alloc
hfs_bnode_find
__hfs_bnode_create <- allocate a new node with refcnt 1.
hfs_bnode_put <- decrease the refcnt
<write>
hfs_btree_write
hfs_bnode_find
__hfs_bnode_create
hfs_bnode_findhash <- find the node without refcnt increased.
hfs_bnode_put <- trigger the BUG_ON() since refcnt is 0.
Link: https://lkml.kernel.org/r/20221212021627.3766829-1-liushixin2@huawei.com
Reported-by: syzbot+5b04b49a7ec7226c7426@syzkaller.appspotmail.com
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Cc: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Cc: Viacheslav Dubeyko <slava@dubeyko.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit d3ca9f7aeba793d74361d88a8800b2f205c9236b upstream.
argv needs to be free when setup_async_work fails or when the current
process is woken up.
Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3")
Cc: stable@vger.kernel.org
Signed-off-by: Hangyu Hua <hbh25y@gmail.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
length
commit fb533473d1595fe79ecb528fda1de33552b07178 upstream.
ksmbd allowed the actual frame length to be smaller than the rfc1002
length. If allowed, it is possible to allocates a large amount of memory
that can be limited by credit management and can eventually cause memory
exhaustion problem. This patch do not allow it except SMB2 Negotiate
request which will be validated when message handling proceeds.
Also, Allow a message that padded to 8byte boundary.
Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3")
Cc: stable@vger.kernel.org
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 8f8c43b125882ac14372f8dca0c8e50a59e78d79 upstream.
When turning debug mode on, The following error message from
ksmbd_smb2_check_message() is coming.
ksmbd: cli req padded more than expected. Length 112 not 88 for cmd:10 mid:14
data area length calculation for smb2 lock request in smb2_get_data_area_len() is
incorrect.
Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3")
Cc: stable@vger.kernel.org
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b7625f461da6734a21c38ba6e7558538a116a2e3 upstream.
[BUG]
Since the introduction of per-fs feature sysfs interface
(/sys/fs/btrfs/<UUID>/features/), the content of that directory is never
updated.
Thus for the following case, that directory will not show the new
features like RAID56:
# mkfs.btrfs -f $dev1 $dev2 $dev3
# mount $dev1 $mnt
# btrfs balance start -f -mconvert=raid5 $mnt
# ls /sys/fs/btrfs/$uuid/features/
extended_iref free_space_tree no_holes skinny_metadata
While after unmount and mount, we got the correct features:
# umount $mnt
# mount $dev1 $mnt
# ls /sys/fs/btrfs/$uuid/features/
extended_iref free_space_tree no_holes raid56 skinny_metadata
[CAUSE]
Because we never really try to update the content of per-fs features/
directory.
We had an attempt to update the features directory dynamically in commit
14e46e04958d ("btrfs: synchronize incompat feature bits with sysfs
files"), but unfortunately it get reverted in commit e410e34fad91
("Revert "btrfs: synchronize incompat feature bits with sysfs files"").
The problem in the original patch is, in the context of
btrfs_create_chunk(), we can not afford to update the sysfs group.
The exported but never utilized function, btrfs_sysfs_feature_update()
is the leftover of such attempt. As even if we go sysfs_update_group(),
new files will need extra memory allocation, and we have no way to
specify the sysfs update to go GFP_NOFS.
[FIX]
This patch will address the old problem by doing asynchronous sysfs
update in the cleaner thread.
This involves the following changes:
- Make __btrfs_(set|clear)_fs_(incompat|compat_ro) helpers to set
BTRFS_FS_FEATURE_CHANGED flag when needed
- Update btrfs_sysfs_feature_update() to use sysfs_update_group()
And drop unnecessary arguments.
- Call btrfs_sysfs_feature_update() in cleaner_kthread
If we have the BTRFS_FS_FEATURE_CHANGED flag set.
- Wake up cleaner_kthread in btrfs_commit_transaction if we have
BTRFS_FS_FEATURE_CHANGED flag
By this, all the previously dangerous call sites like
btrfs_create_chunk() need no new changes, as above helpers would
have already set the BTRFS_FS_FEATURE_CHANGED flag.
The real work happens at cleaner_kthread, thus we pay the cost of
delaying the update to sysfs directory, but the delayed time should be
small enough that end user can not distinguish though it might get
delayed if the cleaner thread is busy with removing subvolumes or
defrag.
CC: stable@vger.kernel.org # 4.14+
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 2b5463fcbdfb24e898916bcae2b1359042d26963 upstream.
Async discard does not acquire the block group reference count while it
holds a reference on the discard list. This is generally OK, as the
paths which destroy block groups tend to try to synchronize on
cancelling async discard work. However, relying on cancelling work
requires careful analysis to be sure it is safe from races with
unpinning scheduling more work.
While I am unable to find a race with unpinning in the current code for
either the unused bgs or relocation paths, I believe we have one in an
older version of auto relocation in a Meta internal build. This suggests
that this is in fact an error prone model, and could be fragile to
future changes to these bg deletion paths.
To make this ownership more clear, add a refcount for async discard. If
work is queued for a block group, its refcount should be incremented,
and when work is completed or canceled, it should be decremented.
CC: stable@vger.kernel.org # 5.15+
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 8e843bf38f7be0766642a91523cfa65f2b021a8a upstream.
If we did not get a lease we can still return a single use cfid to the caller.
The cfid will not have has_lease set and will thus not be shared with any
other concurrent users and will be freed immediately when the caller
drops the handle.
This avoids extra roundtrips for servers that do not support directory leases
where they would first fail to get a cfid with a lease and then fallback
to try a normal SMB2_open()
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Cc: stable@vger.kernel.org
Reviewed-by: Bharath SM <bharathsm@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 66d45ca1350a3bb8d5f4db8879ccad3ed492337a upstream.
Some servers may return that we got a lease in rsp->OplockLevel
but then in the lease context contradict this and say we got no lease
at all. Thus we need to check the context if we have a lease.
Additionally, If we do not get a lease we need to make sure we close
the handle before we return an error to the caller.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Cc: stable@vger.kernel.org
Reviewed-by: Bharath SM <bharathsm@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 3891f6c7655a39065e44980f51ba46bb32be3133 upstream.
The aim of using encryption on a connection is to keep
the data confidential, so we must not use plaintext rdma offload
for that data!
It seems that current windows servers and ksmbd would allow
this, but that's no reason to expose the users data in plaintext!
And servers hopefully reject this in future.
Note modern windows servers support signed or encrypted offload,
see MS-SMB2 2.2.3.1.6 SMB2_RDMA_TRANSFORM_CAPABILITIES, but we don't
support that yet.
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: David Howells <dhowells@redhat.com>
Cc: linux-cifs@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit a6559cc1d35d3eeafb0296aca347b2f745a28a74 upstream.
We should have the logic to decide if we want rdma offload
in a single spot in order to advance it in future.
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: David Howells <dhowells@redhat.com>
Cc: linux-cifs@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit d643a8a446fc46c06837d08a056f69da2ff16025 upstream.
This will simplify the following changes and makes it easy to get
in passed in from the caller in future.
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: David Howells <dhowells@redhat.com>
Cc: linux-cifs@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit d99e86ebde2d7b3a04190f8d14de5bf6814bf10f upstream.
The client was sending rfc1002 session request packet with a wrong
length field set, therefore failing to mount shares against old SMB
servers over port 139.
Fix this by calculating the correct length as specified in rfc1002.
Fixes: d7173623bf0b ("cifs: use ALIGN() and round_up() macros")
Cc: stable@vger.kernel.org
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit de036dcaca65cf94bf7ff09c571c077f02bc92b4 upstream.
Use a struct assignment with implicit member initialization
Signed-off-by: Volker Lendecke <vl@samba.org>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit d447e794a37288ec7a080aa1b044a8d9deebbab7 upstream.
oparms was not fully initialized
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b9ee2e307c6b06384b6f9e393a9b8e048e8fc277 upstream.
Do not map STATUS_OBJECT_NAME_INVALID to -EREMOTE under non-DFS
shares, or 'nodfs' mounts or CONFIG_CIFS_DFS_UPCALL=n builds.
Otherwise, in the slow path, get a referral to figure out whether it
is an actual DFS link.
This could be simply reproduced under a non-DFS share by running the
following
$ mount.cifs //srv/share /mnt -o ...
$ cat /mnt/$(printf '\U110000')
cat: '/mnt/'$'\364\220\200\200': Object is remote
Fixes: c877ce47e137 ("cifs: reduce roundtrips on create/qinfo requests")
CC: stable@vger.kernel.org # 6.2
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 3c0070f54b3128de498c2dd9934a21f0dd867111 ]
Make sure to get an up-to-date TCP_Server_Info::nr_targets value prior
to waiting the server to be reconnected in smb2_reconnect(). It is
set in cifs_tcp_ses_needs_reconnect() and protected by
TCP_Server_Info::srv_lock.
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 826b67e6376c2a788e3a62c4860dcd79500a27d5 ]
We had a bug report that xfstest generic/355 was failing on NFSv4.0.
This test sets various combinations of setuid/setgid modes and tests
whether DIO writes will cause them to be stripped.
What I found was that the server did properly strip those bits, but
the client didn't notice because it held a delegation that was not
recalled. The recall didn't occur because the client itself was the
one generating the activity and we avoid recalls in that case.
Clearing setuid bits is an "implicit" activity. The client didn't
specifically request that we do that, so we need the server to issue a
CB_RECALL, or avoid the situation entirely by not issuing a delegation.
The easiest fix here is to simply not give out a delegation if the file
is being opened for write, and the mode has the setuid and/or setgid bit
set. Note that there is a potential race between the mode and lease
being set, so we test for this condition both before and after setting
the lease.
This patch fixes generic/355, generic/683 and generic/684 for me. (Note
that 355 fails only on v4.0, and 683 and 684 require NFSv4.2 to run and
fail).
Reported-by: Boyang Xue <bxue@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 1f0001d43d0c0ac2a19a34a914f6595ad97cbc1d ]
At first, I thought this might be a source of nfsd_file overputs, but
the current callers seem to avoid an extra put when nfsd4_verify_copy
returns an error.
Still, it's "bad form" to leave the pointers filled out when we don't
have a reference to them anymore, and that might lead to bugs later.
Zero them out as a defensive coding measure.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit b66f723bb552ad59c2acb5d45ea45c890f84498b ]
In gfs2_make_fs_rw(), make sure to call gfs2_consist() to report an
inconsistency and mark the filesystem as withdrawn when
gfs2_find_jhead() fails.
At the end of gfs2_make_fs_rw(), when we discover that the filesystem
has been withdrawn, make sure we report an error. This also replaces
the gfs2_withdrawn() check after gfs2_find_jhead().
Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: syzbot+f51cb4b9afbd87ec06f2@syzkaller.appspotmail.com
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 48df133578c70185a95a49390d42df1996ddba2a ]
GCC does not like having a partially allocated object, since it cannot
reason about it for bounds checking when it is passed to other code.
Instead, fully allocate sig_inputArgs. (Alternatively, sig_inputArgs
should be defined as a struct coda_in_hdr, if it is actually not using
any other part of the union.) Seen under GCC 13:
../fs/coda/upcall.c: In function 'coda_upcall':
../fs/coda/upcall.c:801:22: warning: array subscript 'union inputArgs[0]' is partly outside array bounds of 'unsigned char[20]' [-Warray-bounds=]
801 | sig_inputArgs->ih.opcode = CODA_SIGNAL;
| ^~
Cc: Jan Harkes <jaharkes@cs.cmu.edu>
Cc: coda@cs.cmu.edu
Cc: codalist@coda.cs.cmu.edu
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230127223921.never.882-kees@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 28232909ba43561887508a6ef46d7f33a648f375 ]
[BUG]
When debugging a scrub related metadata error, it turns out that our
metadata error reporting is not ideal.
The only 3 error messages are:
- BTRFS error (device dm-2): bdev /dev/mapper/test-scratch1 errs: wr 0, rd 0, flush 0, corrupt 0, gen 1
Showing we have metadata generation mismatch errors.
- BTRFS error (device dm-2): unable to fixup (regular) error at logical 7110656 on dev /dev/mapper/test-scratch1
Showing which tree blocks are corrupted.
- BTRFS warning (device dm-2): checksum/header error at logical 24772608 on dev /dev/mapper/test-scratch2, physical 3801088: metadata node (level 1) in tree 5
Showing which physical range the corrupted metadata is at.
We have to combine the above 3 to know we have a corrupted metadata with
generation mismatch.
And this is already the better case, if we have other problems, like
fsid mismatch, we can not even know the cause.
[CAUSE]
The problem is caused by the fact that, scrub_checksum_tree_block()
never outputs any error message.
It just return two bits for scrub: sblock->header_error, and
sblock->generation_error.
And later we report error in scrub_print_warning(), but unfortunately we
only have two bits, there is not really much thing we can done to print
any detailed errors.
[FIX]
This patch will do the following to enhance the error reporting of
metadata scrub:
- Add extra warning (ratelimited) for every error we hit
This can help us to distinguish the different types of errors.
Some errors can help us to know what's going wrong immediately,
like bytenr mismatch.
- Re-order the checks
Currently we check bytenr first, then immediately generation.
This can lead to false generation mismatch reports, while the fsid
mismatches.
Here is the new output for the bug I'm debugging (we forgot to
writeback tree blocks for commit roots):
BTRFS warning (device dm-2): tree block 24117248 mirror 1 has bad fsid, has b77cd862-f150-4c71-90ec-7baf0544d83f want 17df6abf-23cd-445f-b350-5b3e40bfd2fc
BTRFS warning (device dm-2): tree block 24117248 mirror 0 has bad fsid, has b77cd862-f150-4c71-90ec-7baf0544d83f want 17df6abf-23cd-445f-b350-5b3e40bfd2fc
Now we can immediately know it's some tree blocks didn't even get written
back, other than the original confusing generation mismatch.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 47d586913f2abec4d240bae33417f537fda987ec ]
Currently, filp_close() and generic_shutdown_super() use printk() to log
messages when bugs are detected. This is problematic because infrastructure
like syzkaller has no idea that this message indicates a bug.
In addition, some people explicitly want their kernels to BUG() when kernel
data corruption has been detected (CONFIG_BUG_ON_DATA_CORRUPTION).
And finally, when generic_shutdown_super() detects remaining inodes on a
system without CONFIG_BUG_ON_DATA_CORRUPTION, it would be nice if later
accesses to a busy inode would at least crash somewhat cleanly rather than
walking through freed memory.
To address all three, use CHECK_DATA_CORRUPTION() when kernel bugs are
detected.
Signed-off-by: Jann Horn <jannh@google.com>
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 3d2d7e61553dbcc8ba45201d8ae4f383742c8202 ]
Similarly to other filesystems define EFSCORRUPTED error code for
reporting internal filesystem corruption.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit f1aa2eb5ea05ccd1fd92d235346e60e90a1ed949 ]
Currently proc_dobool expects a (bool *) in table->data, but sizeof(int)
in table->maxsize, because it uses do_proc_dointvec() directly.
This is unsafe for at least two reasons:
1. A sysctl table definition may use { .data = &variable, .maxsize =
sizeof(variable) }, not realizing that this makes the sysctl unusable
(see the Fixes: tag) and that they need to use the completely
counterintuitive sizeof(int) instead.
2. proc_dobool() will currently try to parse an array of values if given
.maxsize >= 2*sizeof(int), but will try to write values of type bool
by offsets of sizeof(int), so it will not work correctly with neither
an (int *) nor a (bool *). There is no .maxsize validation to prevent
this.
Fix this by:
1. Constraining proc_dobool() to allow only one value and .maxsize ==
sizeof(bool).
2. Wrapping the original struct ctl_table in a temporary one with .data
pointing to a local int variable and .maxsize set to sizeof(int) and
passing this one to proc_dointvec(), converting the value to/from
bool as needed (using proc_dou8vec_minmax() as an example).
3. Extending sysctl_check_table() to enforce proc_dobool() expectations.
4. Fixing the proc_dobool() docstring (it was just copy-pasted from
proc_douintvec, apparently...).
5. Converting all existing proc_dobool() users to set .maxsize to
sizeof(bool) instead of sizeof(int).
Fixes: 83efeeeb3d04 ("tty: Allow TIOCSTI to be disabled")
Fixes: a2071573d634 ("sysctl: introduce new proc handler proc_dobool")
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit cbb60951ce18c9b6e91d2eb97deb41d8ff616622 ]
The ->writepage() and ->writepages() operations are supposed to write
entire pages. However, on filesystems with a block size smaller than
PAGE_SIZE, __gfs2_jdata_writepage() only adds the first block to the
current transaction instead of adding the entire page. Fix that.
Fixes: 18ec7d5c3f43 ("[GFS2] Make journaled data files identical to normal files on disk")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit df57109bd50b9ed6911f3c2aa914189fe4c1fe2c ]
In smb2_reconnect_server, we allocate a dummy tcon for
calling reconnect for just the session. This should be
allocated using tconInfoAlloc, and not kmalloc.
Fixes: 3663c9045f51 ("cifs: check reconnects for channels of active tcons too")
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 3e161c2791f8e661eed24a2c624087084d910215 ]
If the MR allocate failed, the MR recovery work not initialized
and list not cleared. Then will be warning and UAF when release
the MR:
WARNING: CPU: 4 PID: 824 at kernel/workqueue.c:3066 __flush_work.isra.0+0xf7/0x110
CPU: 4 PID: 824 Comm: mount.cifs Not tainted 6.1.0-rc5+ #82
RIP: 0010:__flush_work.isra.0+0xf7/0x110
Call Trace:
<TASK>
__cancel_work_timer+0x2ba/0x2e0
smbd_destroy+0x4e1/0x990
_smbd_get_connection+0x1cbd/0x2110
smbd_get_connection+0x21/0x40
cifs_get_tcp_session+0x8ef/0xda0
mount_get_conns+0x60/0x750
cifs_mount+0x103/0xd00
cifs_smb3_do_mount+0x1dd/0xcb0
smb3_get_tree+0x1d5/0x300
vfs_get_tree+0x41/0xf0
path_mount+0x9b3/0xdd0
__x64_sys_mount+0x190/0x1d0
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
BUG: KASAN: use-after-free in smbd_destroy+0x4fc/0x990
Read of size 8 at addr ffff88810b156a08 by task mount.cifs/824
CPU: 4 PID: 824 Comm: mount.cifs Tainted: G W 6.1.0-rc5+ #82
Call Trace:
dump_stack_lvl+0x34/0x44
print_report+0x171/0x472
kasan_report+0xad/0x130
smbd_destroy+0x4fc/0x990
_smbd_get_connection+0x1cbd/0x2110
smbd_get_connection+0x21/0x40
cifs_get_tcp_session+0x8ef/0xda0
mount_get_conns+0x60/0x750
cifs_mount+0x103/0xd00
cifs_smb3_do_mount+0x1dd/0xcb0
smb3_get_tree+0x1d5/0x300
vfs_get_tree+0x41/0xf0
path_mount+0x9b3/0xdd0
__x64_sys_mount+0x190/0x1d0
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
Allocated by task 824:
kasan_save_stack+0x1e/0x40
kasan_set_track+0x21/0x30
__kasan_kmalloc+0x7a/0x90
_smbd_get_connection+0x1b6f/0x2110
smbd_get_connection+0x21/0x40
cifs_get_tcp_session+0x8ef/0xda0
mount_get_conns+0x60/0x750
cifs_mount+0x103/0xd00
cifs_smb3_do_mount+0x1dd/0xcb0
smb3_get_tree+0x1d5/0x300
vfs_get_tree+0x41/0xf0
path_mount+0x9b3/0xdd0
__x64_sys_mount+0x190/0x1d0
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
Freed by task 824:
kasan_save_stack+0x1e/0x40
kasan_set_track+0x21/0x30
kasan_save_free_info+0x2a/0x40
____kasan_slab_free+0x143/0x1b0
__kmem_cache_free+0xc8/0x330
_smbd_get_connection+0x1c6a/0x2110
smbd_get_connection+0x21/0x40
cifs_get_tcp_session+0x8ef/0xda0
mount_get_conns+0x60/0x750
cifs_mount+0x103/0xd00
cifs_smb3_do_mount+0x1dd/0xcb0
smb3_get_tree+0x1d5/0x300
vfs_get_tree+0x41/0xf0
path_mount+0x9b3/0xdd0
__x64_sys_mount+0x190/0x1d0
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
Let's initialize the MR recovery work before MR allocate to prevent
the warning, remove the MRs from the list to prevent the UAF.
Fixes: c7398583340a ("CIFS: SMBD: Implement RDMA memory registration")
Acked-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: Tom Talpey <tom@talpey.com>
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit e9d3401d95d62a9531082cd2453ed42f2740e3fd ]
If the MR allocate failed, the smb direct connection info is NULL,
then smbd_destroy() will directly return, then the connection info
will be leaked.
Let's set the smb direct connection info to the server before call
smbd_destroy().
Fixes: c7398583340a ("CIFS: SMBD: Implement RDMA memory registration")
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Acked-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: David Howells <dhowells@redhat.com>
Reviewed-by: Tom Talpey <tom@talpey.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 90d2175572470ba7f55da8447c72ddd4942923c4 ]
Currently, we're only memcpy'ing the first __be32. Ensure we copy into
both words.
Fixes: 91d2e9b56cf5 ("NFSD: Clean up the nfsd_net::nfssvc_boot field")
Reported-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 4c475eee02375ade6e864f1db16976ba0d96a0a2 ]
Most of the time, NFSv4 clients issue a COMMIT before the final CLOSE of
an open stateid, so with NFSv4, the fsync in the nfsd_file_free path is
usually a no-op and doesn't block.
We have a customer running knfsd over very slow storage (XFS over Ceph
RBD). They were using the "async" export option because performance was
more important than data integrity for this application. That export
option turns NFSv4 COMMIT calls into no-ops. Due to the fsync in this
codepath however, their final CLOSE calls would still stall (since a
CLOSE effectively became a COMMIT).
I think this fsync is not strictly necessary. We only use that result to
reset the write verifier. Instead of fsync'ing all of the data when we
free an nfsd_file, we can just check for writeback errors when one is
acquired and when it is freed.
If the client never comes back, then it'll never see the error anyway
and there is no point in resetting it. If an error occurs after the
nfsd_file is removed from the cache but before the inode is evicted,
then it will reset the write verifier on the next nfsd_file_acquire,
(since there will be an unseen error).
The only exception here is if something else opens and fsyncs the file
during that window. Given that local applications work with this
limitation today, I don't see that as an issue.
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2166658
Fixes: ac3a2585f018 ("nfsd: rework refcounting in filecache")
Reported-and-tested-by: Pierguido Lambri <plambri@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit dcd779dc46540e174a6ac8d52fbed23593407317 ]
The nested if statements here make no sense, as you can never reach
"else" branch in the nested statement. Fix the error handling for
when there is a courtesy client that holds a conflicting deny mode.
Fixes: 3d6942715180 ("NFSD: add support for share reservation conflict to courteous server")
Reported-by: 張智諺 <cc85nod@gmail.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 81e722978ad21072470b73d8f6a50ad62c7d5b7d ]
When nfsd4_copy fails to allocate memory for async_copy->cp_src, or
nfs4_init_copy_state fails, it calls cleanup_async_copy to do the
cleanup for the async_copy which causes page fault since async_copy
is not yet initialized.
This patche rearranges the order of initializing the fields in
async_copy and adds checks in cleanup_async_copy to skip un-initialized
fields.
Fixes: ce0887ac96d3 ("NFSD add nfs4 inter ssc to nfsd4_copy")
Fixes: 87689df69491 ("NFSD: Shrink size of struct nfsd4_copy")
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 6ba434cb1a8d403ea9aad1b667c3ea3ad8b3191f ]
There are two different flavors of the nfsd4_copy struct. One is
embedded in the compound and is used directly in synchronous copies. The
other is dynamically allocated, refcounted and tracked in the client
struture. For the embedded one, the cleanup just involves releasing any
nfsd_files held on its behalf. For the async one, the cleanup is a bit
more involved, and we need to dequeue it from lists, unhash it, etc.
There is at least one potential refcount leak in this code now. If the
kthread_create call fails, then both the src and dst nfsd_files in the
original nfsd4_copy object are leaked.
The cleanup in this codepath is also sort of weird. In the async copy
case, we'll have up to four nfsd_file references (src and dst for both
flavors of copy structure). They are both put at the end of
nfsd4_do_async_copy, even though the ones held on behalf of the embedded
one outlive that structure.
Change it so that we always clean up the nfsd_file refs held by the
embedded copy structure before nfsd4_copy returns. Rework
cleanup_async_copy to handle both inter and intra copies. Eliminate
nfsd4_cleanup_intra_ssc since it now becomes a no-op.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Stable-dep-of: 81e722978ad2 ("NFSD: fix problems with cleanup on errors in nfsd4_copy")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit fb610c4dbc996415d57d7090957ecddd4fd64fb6 ]
Its possible for __break_lease to find the layout's lease before we've
added the layout to the owner's ls_layouts list. In that case, setting
ls_recalled = true without actually recalling the layout will cause the
server to never send a recall callback.
Move the check for ls_layouts before setting ls_recalled.
Fixes: c5c707f96fc9 ("nfsd: implement pNFS layout recalls")
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|