summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2021-05-19nfsd: ensure new clients break delegationsJ. Bruce Fields1-5/+19
[ Upstream commit 217fd6f625af591e2866bebb8cda778cf85bea2e ] If nfsd already has an open file that it plans to use for IO from another, it may not need to do another vfs open, but it still may need to break any delegations in case the existing opens are for another client. Symptoms are that we may incorrectly fail to break a delegation on a write open from a different client, when the delegation-holding client already has a write open. Fixes: 28df3d1539de ("nfsd: clients don't need to break their own delegations") Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19NFSv4.x: Don't return NFS4ERR_NOMATCHING_LAYOUT if we're unmountingTrond Myklebust1-8/+9
[ Upstream commit 8926cc8302819be9e67f70409ed001ecb2c924a9 ] If the NFS super block is being unmounted, then we currently may end up telling the server that we've forgotten the layout while it is actually still in use by the client. In that case, just assume that the client will soon return the layout anyway, and so return NFS4ERR_DELAY in response to the layout recall. Fixes: 58ac3e59235f ("NFSv4/pnfs: Clean up nfs_layout_find_inode()") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19NFSv4.2 fix handling of sr_eof in SEEK's replyOlga Kornievskaia1-1/+4
[ Upstream commit 73f5c88f521a630ea1628beb9c2d48a2e777a419 ] Currently the client ignores the value of the sr_eof of the SEEK operation. According to the spec, if the server didn't find the requested extent and reached the end of the file, the server would return sr_eof=true. In case the request for DATA and no data was found (ie in the middle of the hole), then the lseek expects that ENXIO would be returned. Fixes: 1c6dcbe5ceff8 ("NFS: Implement SEEK") Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19pNFS/flexfiles: fix incorrect size check in decode_nfs_fh()Nikola Livic1-1/+1
[ Upstream commit ed34695e15aba74f45247f1ee2cf7e09d449f925 ] We (adam zabrocki, alexander matrosov, alexander tereshkin, maksym bazalii) observed the check: if (fh->size > sizeof(struct nfs_fh)) should not use the size of the nfs_fh struct which includes an extra two bytes from the size field. struct nfs_fh { unsigned short size; unsigned char data[NFS_MAXFHSIZE]; } but should determine the size from data[NFS_MAXFHSIZE] so the memcpy will not write 2 bytes beyond destination. The proposed fix is to compare against the NFS_MAXFHSIZE directly, as is done elsewhere in fs code base. Fixes: d67ae825a59d ("pnfs/flexfiles: Add the FlexFile Layout Driver") Signed-off-by: Nikola Livic <nlivic@gmail.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19NFS: Deal correctly with attribute generation counter overflowTrond Myklebust1-4/+4
[ Upstream commit 9fdbfad1777cb4638f489eeb62d85432010c0031 ] We need to use unsigned long subtraction and then convert to signed in order to deal correcly with C overflow rules. Fixes: f5062003465c ("NFS: Set an attribute barrier on all updates") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19NFSv4.2: Always flush out writes in nfs42_proc_fallocate()Trond Myklebust1-7/+9
[ Upstream commit 99f23783224355e7022ceea9b8d9f62c0fd01bd8 ] Whether we're allocating or delallocating space, we should flush out the pending writes in order to avoid races with attribute updates. Fixes: 1e564d3dbd68 ("NFSv4.2: Fix a race in nfs42_proc_deallocate()") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19NFS: Fix attribute bitmask in _nfs42_proc_fallocate()Trond Myklebust1-2/+8
[ Upstream commit e99812e1382f0bfb6149393262bc70645c9f537a ] We can't use nfs4_fattr_bitmap as a bitmask, because it hasn't been filtered to represent the attributes supported by the server. Instead, let's revert to using server->cache_consistency_bitmask after adding in the missing SPACE_USED attribute. Fixes: 913eca1aea87 ("NFS: Fallocate should use the nfs4_fattr_bitmap") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19NFS: nfs4_bitmask_adjust() must not change the server global bitmasksTrond Myklebust1-22/+34
[ Upstream commit 332d1a0373be32a3a3c152756bca45ff4f4e11b5 ] As currently set, the calls to nfs4_bitmask_adjust() will end up overwriting the contents of the nfs_server cache_consistency_bitmask field. The intention here should be to modify a private copy of that mask in the close/delegreturn/write arguments. Fixes: 76bd5c016ef4 ("NFSv4: make cache consistency bitmask dynamic") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19f2fs: fix to avoid accessing invalid fio in f2fs_allocate_data_block()Chao Yu1-3/+3
[ Upstream commit 25ae837e61dee712b4b1df36602ebfe724b2a0b6 ] Callers may pass fio parameter with NULL value to f2fs_allocate_data_block(), so we should make sure accessing fio's field after fio's validation check. Fixes: f608c38c59c6 ("f2fs: clean up parameter of f2fs_allocate_data_block()") Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19f2fs: Fix a hungtask problem in atomic writeYi Zhuang1-13/+17
[ Upstream commit be1ee45d51384161681ecf21085a42d316ae25f7 ] In the cache writing process, if it is an atomic file, increase the page count of F2FS_WB_CP_DATA, otherwise increase the page count of F2FS_WB_DATA. When you step into the hook branch due to insufficient memory in f2fs_write_begin, f2fs_drop_inmem_pages_all will be called to traverse all atomic inodes and clear the FI_ATOMIC_FILE mark of all atomic files. In f2fs_drop_inmem_pages,first acquire the inmem_lock , revoke all the inmem_pages, and then clear the FI_ATOMIC_FILE mark. Before this mark is cleared, other threads may hold inmem_lock to add inmem_pages to the inode that has just been emptied inmem_pages, and increase the page count of F2FS_WB_CP_DATA. When the IO returns, it is found that the FI_ATOMIC_FILE flag is cleared by f2fs_drop_inmem_pages_all, and f2fs_is_atomic_file returns false,which causes the page count of F2FS_WB_DATA to be decremented. The page count of F2FS_WB_CP_DATA cannot be cleared. Finally, hungtask is triggered in f2fs_wait_on_all_pages because get_pages will never return zero. process A: process B: f2fs_drop_inmem_pages_all ->f2fs_drop_inmem_pages of inode#1 ->mutex_lock(&fi->inmem_lock) ->__revoke_inmem_pages of inode#1 f2fs_ioc_commit_atomic_write ->mutex_unlock(&fi->inmem_lock) ->f2fs_commit_inmem_pages of inode#1 ->mutex_lock(&fi->inmem_lock) ->__f2fs_commit_inmem_pages ->f2fs_do_write_data_page ->f2fs_outplace_write_data ->do_write_page ->f2fs_submit_page_write ->inc_page_count(sbi, F2FS_WB_CP_DATA ) ->mutex_unlock(&fi->inmem_lock) ->spin_lock(&sbi->inode_lock[ATOMIC_FILE]); ->clear_inode_flag(inode, FI_ATOMIC_FILE) ->spin_unlock(&sbi->inode_lock[ATOMIC_FILE]) f2fs_write_end_io ->dec_page_count(sbi, F2FS_WB_DATA ); We can fix the problem by putting the action of clearing the FI_ATOMIC_FILE mark into the inmem_lock lock. This operation can ensure that no one will submit the inmem pages before the FI_ATOMIC_FILE mark is cleared, so that there will be no atomic writes waiting for writeback. Fixes: 57864ae5ce3a ("f2fs: limit # of inmemory pages") Signed-off-by: Yi Zhuang <zhuangyi1@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19f2fs: fix to cover __allocate_new_section() with curseg_lockChao Yu1-0/+4
[ Upstream commit 823d13e12b6cbaef2f6e5d63c648643e7bc094dd ] In order to avoid race with f2fs_do_replace_block(). Fixes: f5a53edcf01e ("f2fs: support aligned pinned file") Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19f2fs: fix to avoid touching checkpointed data in get_victim()Chao Yu4-24/+55
[ Upstream commit 61461fc921b756ae16e64243f72af2bfc2e620db ] In CP disabling mode, there are two issues when using LFS or SSR | AT_SSR mode to select victim: 1. LFS is set to find source section during GC, the victim should have no checkpointed data, since after GC, section could not be set free for reuse. Previously, we only check valid chpt blocks in current segment rather than section, fix it. 2. SSR | AT_SSR are set to find target segment for writes which can be fully filled by checkpointed and newly written blocks, we should never select such segment, otherwise it can cause panic or data corruption during allocation, potential case is described as below: a) target segment has 'n' (n < 512) ckpt valid blocks b) GC migrates 'n' valid blocks to other segment (segment is still in dirty list) c) GC migrates '512 - n' blocks to target segment (segment has 'n' cp_vblocks and '512 - n' vblocks) d) If GC selects target segment via {AT,}SSR allocator, however there is no free space in targe segment. Fixes: 4354994f097d ("f2fs: checkpoint disabling") Fixes: 093749e296e2 ("f2fs: support age threshold based garbage collection") Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19f2fs: fix to update last i_size if fallocate partially succeedsChao Yu1-11/+11
[ Upstream commit 88f2cfc5fa90326edb569b4a81bb38ed4dcd3108 ] In the case of expanding pinned file, map.m_lblk and map.m_len will update in each round of section allocation, so in error path, last i_size will be calculated with wrong m_lblk and m_len, fix it. Fixes: f5a53edcf01e ("f2fs: support aligned pinned file") Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19f2fs: fix to align to section for fallocate() on pinned fileChao Yu3-19/+36
[ Upstream commit e1175f02291141bbd924fc578299305fcde35855 ] Now, fallocate() on a pinned file only allocates blocks which aligns to segment rather than section, so GC may try to migrate pinned file's block, and after several times of failure, pinned file's block could be migrated to other place, however user won't be aware of such condition, and then old obsolete block address may be readed/written incorrectly. To avoid such condition, let's try to allocate pinned file's blocks with section alignment. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19f2fs: fix a redundant call to f2fs_balance_fs if an error occursColin Ian King1-1/+2
[ Upstream commit 28e18ee636ba28532dbe425540af06245a0bbecb ] The uninitialized variable dn.node_changed does not get set when a call to f2fs_get_node_page fails. This uninitialized value gets used in the call to f2fs_balance_fs() that may or not may not balances dirty node and dentry pages depending on the uninitialized state of the variable. Fix this by only calling f2fs_balance_fs if err is not set. Thanks to Jaegeuk Kim for suggesting an appropriate fix. Addresses-Coverity: ("Uninitialized scalar variable") Fixes: 2a3407607028 ("f2fs: call f2fs_balance_fs only when node was changed") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19f2fs: fix panic during f2fs_resize_fs()Chao Yu1-0/+13
[ Upstream commit 3ab0598e6d860ef49d029943ba80f627c15c15d6 ] f2fs_resize_fs() hangs in below callstack with testcase: - mkfs 16GB image & mount image - dd 8GB fileA - dd 8GB fileB - sync - rm fileA - sync - resize filesystem to 8GB kernel BUG at segment.c:2484! Call Trace: allocate_segment_by_default+0x92/0xf0 [f2fs] f2fs_allocate_data_block+0x44b/0x7e0 [f2fs] do_write_page+0x5a/0x110 [f2fs] f2fs_outplace_write_data+0x55/0x100 [f2fs] f2fs_do_write_data_page+0x392/0x850 [f2fs] move_data_page+0x233/0x320 [f2fs] do_garbage_collect+0x14d9/0x1660 [f2fs] free_segment_range+0x1f7/0x310 [f2fs] f2fs_resize_fs+0x118/0x330 [f2fs] __f2fs_ioctl+0x487/0x3680 [f2fs] __x64_sys_ioctl+0x8e/0xd0 do_syscall_64+0x33/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xa9 The root cause is we forgot to check that whether we have enough space in resized filesystem to store all valid blocks in before-resizing filesystem, then allocator will run out-of-space during block migration in free_segment_range(). Fixes: b4b10061ef98 ("f2fs: refactor resize_fs to avoid meta updates in progress") Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19f2fs: fix to allow migrating fully valid segmentChao Yu5-16/+20
[ Upstream commit 7dede88659df38f96128ab3922c50dde2d29c574 ] F2FS_IOC_FLUSH_DEVICE/F2FS_IOC_RESIZE_FS needs to migrate all blocks of target segment to other place, no matter the segment has partially or fully valid blocks. However, after commit 803e74be04b3 ("f2fs: stop GC when the victim becomes fully valid"), we may skip migration due to target segment is fully valid, result in failing the ioctl interface, fix this. Fixes: 803e74be04b3 ("f2fs: stop GC when the victim becomes fully valid") Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19f2fs: fix compat F2FS_IOC_{MOVE,GARBAGE_COLLECT}_RANGEChao Yu1-33/+104
[ Upstream commit 34178b1bc4b5c936eab3adb4835578093095a571 ] Eric reported a ioctl bug in below link: https://lore.kernel.org/linux-f2fs-devel/20201103032234.GB2875@sol.localdomain/ That said, on some 32-bit architectures, u64 has only 32-bit alignment, notably i386 and x86_32, so that size of struct f2fs_gc_range compiled in x86_32 is 20 bytes, however the size in x86_64 is 24 bytes, binary compiled in x86_32 can not call F2FS_IOC_GARBAGE_COLLECT_RANGE successfully due to mismatched value of ioctl command in between binary and f2fs module, similarly, F2FS_IOC_MOVE_RANGE will fail too. In this patch we introduce two ioctls for compatibility of above special 32-bit binary: - F2FS_IOC32_GARBAGE_COLLECT_RANGE - F2FS_IOC32_MOVE_RANGE Reported-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19f2fs: move ioctl interface definitions to separated fileChao Yu2-79/+1
[ Upstream commit fa4320cefb8537a70cc28c55d311a1f569697cd3 ] Like other filesystem does, we introduce a new file f2fs.h in path of include/uapi/linux/, and move f2fs-specified ioctl interface definitions to that file, after then, in order to use those definitions, userspace developer only need to include the new header file rather than copy & paste definitions from fs/f2fs/f2fs.h. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19cuse: prevent cloneMiklos Szeredi1-0/+2
[ Upstream commit 8217673d07256b22881127bf50dce874d0e51653 ] For cloned connections cuse_channel_release() will be called more than once, resulting in use after free. Prevent device cloning for CUSE, which does not make sense at this point, and highly unlikely to be used in real life. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19virtiofs: fix usernsMiklos Szeredi1-2/+1
[ Upstream commit 0a7419c68a45d2d066b996be5087aa2d07ce80eb ] get_user_ns() is done twice (once in virtio_fs_get_tree() and once in fuse_conn_init()), resulting in a reference leak. Also looks better to use fsc->user_ns (which *should* be the current_user_ns() at this point). Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19fuse: invalidate attrs when page writeback completesVivek Goyal1-0/+9
[ Upstream commit 3466958beb31a8e9d3a1441a34228ed088b84f3e ] In fuse when a direct/write-through write happens we invalidate attrs because that might have updated mtime/ctime on server and cached mtime/ctime will be stale. What about page writeback path. Looks like we don't invalidate attrs there. To be consistent, invalidate attrs in writeback path as well. Only exception is when writeback_cache is enabled. In that case we strust local mtime/ctime and there is no need to invalidate attrs. Recently users started experiencing failure of xfstests generic/080, geneirc/215 and generic/614 on virtiofs. This happened only newer "stat" utility and not older one. This patch fixes the issue. So what's the root cause of the issue. Here is detailed explanation. generic/080 test does mmap write to a file, closes the file and then checks if mtime has been updated or not. When file is closed, it leads to flushing of dirty pages (and that should update mtime/ctime on server). But we did not explicitly invalidate attrs after writeback finished. Still generic/080 passed so far and reason being that we invalidated atime in fuse_readpages_end(). This is called in fuse_readahead() path and always seems to trigger before mmaped write. So after mmaped write when lstat() is called, it sees that atleast one of the fields being asked for is invalid (atime) and that results in generating GETATTR to server and mtime/ctime also get updated and test passes. But newer /usr/bin/stat seems to have moved to using statx() syscall now (instead of using lstat()). And statx() allows it to query only ctime or mtime (and not rest of the basic stat fields). That means when querying for mtime, fuse_update_get_attr() sees that mtime is not invalid (only atime is invalid). So it does not generate a new GETATTR and fill stat with cached mtime/ctime. And that means updated mtime is not seen by xfstest and tests start failing. Invalidating attrs after writeback completion should solve this problem in a generic manner. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19fs: dlm: flush swork on shutdownAlexander Aring1-4/+1
[ Upstream commit eec054b5a7cfe6d1f1598a323b05771ee99857b5 ] This patch fixes the flushing of send work before shutdown. The function cancel_work_sync() is not the right workqueue functionality to use here as it would cancel the work if the work queues itself. In cases of EAGAIN in send() for dlm message we need to be sure that everything is send out before. The function flush_work() will ensure that every send work is be done inclusive in EAGAIN cases. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19fs: dlm: check on minimum msglen sizeAlexander Aring1-3/+4
[ Upstream commit 710176e8363f269c6ecd73d203973b31ace119d3 ] This patch adds an additional check for minimum dlm header size which is an invalid dlm message and signals a broken stream. A msglen field cannot be less than the dlm header size because the field is inclusive header lengths. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19fs: dlm: add errno handling to check callbackAlexander Aring1-7/+16
[ Upstream commit 8aa9540b49e0833feba75dbf4f45babadd0ed215 ] This allows to return individual errno values for the config attribute check callback instead of returning invalid argument only. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19fs: dlm: fix debugfs dumpAlexander Aring1-0/+1
[ Upstream commit 92c48950b43f4a767388cf87709d8687151a641f ] This patch fixes the following message which randomly pops up during glocktop call: seq_file: buggy .next function table_seq_next did not update position index The issue is that seq_read_iter() in fs/seq_file.c also needs an increment of the index in an non next record case as well which this patch fixes otherwise seq_read_iter() will print out the above message. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-14afs: Fix speculative status fetchesDavid Howells6-2/+23
[ Upstream commit 22650f148126571be1098d34160eb4931fc77241 ] The generic/464 xfstest causes kAFS to emit occasional warnings of the form: kAFS: vnode modified {100055:8a} 30->31 YFS.StoreData64 (c=6015) This indicates that the data version received back from the server did not match the expected value (the DV should be incremented monotonically for each individual modification op committed to a vnode). What is happening is that a lookup call is doing a bulk status fetch speculatively on a bunch of vnodes in a directory besides getting the status of the vnode it's actually interested in. This is racing with a StoreData operation (though it could also occur with, say, a MakeDir op). On the client, a modification operation locks the vnode, but the bulk status fetch only locks the parent directory, so no ordering is imposed there (thereby avoiding an avenue to deadlock). On the server, the StoreData op handler doesn't lock the vnode until it's received all the request data, and downgrades the lock after committing the data until it has finished sending change notifications to other clients - which allows the status fetch to occur before it has finished. This means that: - a status fetch can access the target vnode either side of the exclusive section of the modification - the status fetch could start before the modification, yet finish after, and vice-versa. - the status fetch and the modification RPCs can complete in either order. - the status fetch can return either the before or the after DV from the modification. - the status fetch might regress the locally cached DV. Some of these are handled by the previous fix[1], but that's not sufficient because it checks the DV it received against the DV it cached at the start of the op, but the DV might've been updated in the meantime by a locally generated modification op. Fix this by the following means: (1) Keep track of when we're performing a modification operation on a vnode. This is done by marking vnode parameters with a 'modification' note that causes the AFS_VNODE_MODIFYING flag to be set on the vnode for the duration. (2) Alter the speculation race detection to ignore speculative status fetches if either the vnode is marked as being modified or the data version number is not what we expected. Note that whilst the "vnode modified" warning does get recovered from as it causes the client to refetch the status at the next opportunity, it will also invalidate the pagecache, so changes might get lost. Fixes: a9e5c87ca744 ("afs: Fix speculative status fetch going out of order wrt to modifications") Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> Tested-and-reviewed-by: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: https://lore.kernel.org/r/160605082531.252452.14708077925602709042.stgit@warthog.procyon.org.uk/ [1] Link: https://lore.kernel.org/linux-fsdevel/161961335926.39335.2552653972195467566.stgit@warthog.procyon.org.uk/ # v1 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-14ovl: invalidate readdir cache on changes to dir with originAmir Goldstein3-37/+36
[ Upstream commit 65cd913ec9d9d71529665924c81015b7ab7d9381 ] The test in ovl_dentry_version_inc() was out-dated and did not include the case where readdir cache is used on a non-merge dir that has origin xattr, indicating that it may contain leftover whiteouts. To make the code more robust, use the same helper ovl_dir_is_real() to determine if readdir cache should be used and if readdir cache should be invalidated. Fixes: b79e05aaa166 ("ovl: no direct iteration for dir with origin xattr") Link: https://lore.kernel.org/linux-unionfs/CAOQ4uxht70nODhNHNwGFMSqDyOKLXOKrY0H6g849os4BQ7cokA@mail.gmail.com/ Cc: Chris Murphy <lists@colorremedies.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-14xfs: fix return of uninitialized value in variable errorColin Ian King1-0/+1
[ Upstream commit 3b6dd9a9aeeada19d0c820ff68e979243a888bb6 ] A previous commit removed a call to xfs_attr3_leaf_read that assigned an error return code to variable error. We now have a few early error return paths to label 'out' that return error if error is set; however error now is uninitialized so potentially garbage is being returned. Fix this by setting error to zero to restore the original behaviour where error was zero at the label 'restart'. Addresses-Coverity: ("Uninitialized scalar variable") Fixes: 07120f1abdff ("xfs: Add xfs_has_attr and subroutines") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-14io_uring: fix overflows checks in provide buffersPavel Begunkov1-2/+8
[ Upstream commit 38134ada0ceea3e848fe993263c0ff6207fd46e7 ] Colin reported before possible overflow and sign extension problems in io_provide_buffers_prep(). As Linus pointed out previous attempt did nothing useful, see d81269fecb8ce ("io_uring: fix provide_buffers sign extension"). Do that with help of check_<op>_overflow helpers. And fix struct io_provide_buf::len type, as it doesn't make much sense to keep it signed. Reported-by: Colin Ian King <colin.king@canonical.com> Fixes: efe68c1ca8f49 ("io_uring: validate the full range of provided buffers for access") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/46538827e70fce5f6cdb50897cff4cacc490f380.1618488258.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-14seccomp: Fix CONFIG tests for Seccomp_filtersKenta.Tada@sony.com1-0/+2
[ Upstream commit 64bdc0244054f7d4bb621c8b4455e292f4e421bc ] Strictly speaking, seccomp filters are only used when CONFIG_SECCOMP_FILTER. This patch fixes the condition to enable "Seccomp_filters" in /proc/$pid/status. Signed-off-by: Kenta Tada <Kenta.Tada@sony.com> Fixes: c818c03b661c ("seccomp: Report number of loaded filters in /proc/$pid/status") Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/OSBPR01MB26772D245E2CF4F26B76A989F5669@OSBPR01MB2677.jpnprd01.prod.outlook.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-14afs: Fix updating of i_mode due to 3rd party changeDavid Howells1-3/+3
[ Upstream commit 6e1eb04a87f954eb06a89ee6034c166351dfff6e ] Fix afs_apply_status() to mask off the irrelevant bits from status->mode when OR'ing them into i_mode. This can happen when a 3rd party chmod occurs. Also fix afs_inode_init_from_status() to mask off the mode bits when initialising i_mode. Fixes: 260a980317da ("[AFS]: Add "directory write" support.") Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-14NFSv4.2: fix copy stateid copying for the async copyOlga Kornievskaia1-2/+2
[ Upstream commit e739b12042b6b079a397a3c234f96c09d1de0b40 ] This patch fixes Dan Carpenter's report that the static checker found a problem where memcpy() was copying into too small of a buffer. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: e0639dc5805a ("NFSD introduce async copy feature") Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-14NFSD: Fix sparse warning in nfs4proc.cChuck Lever1-6/+2
[ Upstream commit eb162e1772f85231dabc789fb4bfea63d2d9df79 ] linux/fs/nfsd/nfs4proc.c:1542:24: warning: incorrect type in assignment (different base types) linux/fs/nfsd/nfs4proc.c:1542:24: expected restricted __be32 [assigned] [usertype] status linux/fs/nfsd/nfs4proc.c:1542:24: got int Clean-up: The dup_copy_fields() function returns only zero, so make it return void for now, and get rid of the return code check. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-14ovl: fix missing revert_creds() on error pathDan Carpenter1-1/+2
commit 7b279bbfd2b230c7a210ff8f405799c7e46bbf48 upstream. Smatch complains about missing that the ovl_override_creds() doesn't have a matching revert_creds() if the dentry is disconnected. Fix this by moving the ovl_override_creds() until after the disconnected check. Fixes: aa3ff3c152ff ("ovl: copy up of disconnected dentries") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-14io_uring: truncate lengths larger than MAX_RW_COUNT on provide buffersThadeu Lima de Souza Cascardo1-2/+2
commit d1f82808877bb10d3deee7cf3374a4eb3fb582db upstream. Read and write operations are capped to MAX_RW_COUNT. Some read ops rely on that limit, and that is not guaranteed by the IORING_OP_PROVIDE_BUFFERS. Truncate those lengths when doing io_add_buffers, so buffer addresses still use the uncapped length. Also, take the chance and change struct io_buffer len member to __u32, so it matches struct io_provide_buffer len member. This fixes CVE-2021-3491, also reported as ZDI-CAN-13546. Fixes: ddf0322db79c ("io_uring: add IORING_OP_PROVIDE_BUFFERS") Reported-by: Billy Jheng Bing-Jhong (@st424204) Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-11ext4: Fix occasional generic/418 failureJan Kara1-4/+21
commit 5899593f51e63dde2f07c67358bd65a641585abb upstream. Eric has noticed that after pagecache read rework, generic/418 is occasionally failing for ext4 when blocksize < pagesize. In fact, the pagecache rework just made hard to hit race in ext4 more likely. The problem is that since ext4 conversion of direct IO writes to iomap framework (commit 378f32bab371), we update inode size after direct IO write only after invalidating page cache. Thus if buffered read sneaks at unfortunate moment like: CPU1 - write at offset 1k CPU2 - read from offset 0 iomap_dio_rw(..., IOMAP_DIO_FORCE_WAIT); ext4_readpage(); ext4_handle_inode_extension() the read will zero out tail of the page as it still sees smaller inode size and thus page cache becomes inconsistent with on-disk contents with all the consequences. Fix the problem by moving inode size update into end_io handler which gets called before the page cache is invalidated. Reported-and-tested-by: Eric Whitney <enwlinux@gmail.com> Fixes: 378f32bab371 ("ext4: introduce direct I/O write using iomap infrastructure") CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Dave Chinner <dchinner@redhat.com> Link: https://lore.kernel.org/r/20210415155417.4734-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-11ext4: allow the dax flag to be set and cleared on inline directoriesTheodore Ts'o2-1/+8
commit 4811d9929cdae4238baf5b2522247bd2f9fa7b50 upstream. This is needed to allow generic/607 to pass for file systems with the inline data_feature enabled, and it allows the use of file systems where the directories use inline_data, while the files are accessed via DAX. Cc: stable@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-11ext4: fix error return code in ext4_fc_perform_commit()Xu Yihang1-1/+3
commit e1262cd2e68a0870fb9fc95eb202d22e8f0074b7 upstream. In case of if not ext4_fc_add_tlv branch, an error return code is missing. Cc: stable@kernel.org Fixes: aa75f4d3daae ("ext4: main fast-commit commit path") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Xu Yihang <xuyihang@huawei.com> Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20210408070033.123047-1-xuyihang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-11ext4: fix ext4_error_err save negative errno into superblockYe Bin1-1/+1
commit 6810fad956df9e5467e8e8a5ac66fda0836c71fa upstream. Fix As write_mmp_block() so that it returns -EIO instead of 1, so that the correct error gets saved into the superblock. Cc: stable@kernel.org Fixes: 54d3adbc29f0 ("ext4: save all error info in save_error_info() and drop ext4_set_errno()") Reported-by: Liu Zhi Qiang <liuzhiqiang26@huawei.com> Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/20210406025331.148343-1-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-11ext4: fix error code in ext4_commit_superFengnan Chang1-2/+4
commit f88f1466e2a2e5ca17dfada436d3efa1b03a3972 upstream. We should set the error code when ext4_commit_super check argument failed. Found in code review. Fixes: c4be0c1dc4cdc ("filesystem freeze: add error handling of write_super_lockfs/unlockfs"). Cc: stable@kernel.org Signed-off-by: Fengnan Chang <changfengnan@vivo.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/20210402101631.561-1-changfengnan@vivo.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-11ext4: do not set SB_ACTIVE in ext4_orphan_cleanup()Zhang Yi1-3/+0
commit 72ffb49a7b623c92a37657eda7cc46a06d3e8398 upstream. When CONFIG_QUOTA is enabled, if we failed to mount the filesystem due to some error happens behind ext4_orphan_cleanup(), it will end up triggering a after free issue of super_block. The problem is that ext4_orphan_cleanup() will set SB_ACTIVE flag if CONFIG_QUOTA is enabled, after we cleanup the truncated inodes, the last iput() will put them into the lru list, and these inodes' pages may probably dirty and will be write back by the writeback thread, so it could be raced by freeing super_block in the error path of mount_bdev(). After check the setting of SB_ACTIVE flag in ext4_orphan_cleanup(), it was used to ensure updating the quota file properly, but evict inode and trash data immediately in the last iput does not affect the quotafile, so setting the SB_ACTIVE flag seems not required[1]. Fix this issue by just remove the SB_ACTIVE setting. [1] https://lore.kernel.org/linux-ext4/99cce8ca-e4a0-7301-840f-2ace67c551f3@huawei.com/T/#m04990cfbc4f44592421736b504afcc346b2a7c00 Cc: stable@kernel.org Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Tested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210331033138.918975-1-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-11ext4: fix check to prevent false positive report of incorrect used inodesZhang Yi1-16/+32
commit a149d2a5cabbf6507a7832a1c4fd2593c55fd450 upstream. Commit <50122847007> ("ext4: fix check to prevent initializing reserved inodes") check the block group zero and prevent initializing reserved inodes. But in some special cases, the reserved inode may not all belong to the group zero, it may exist into the second group if we format filesystem below. mkfs.ext4 -b 4096 -g 8192 -N 1024 -I 4096 /dev/sda So, it will end up triggering a false positive report of a corrupted file system. This patch fix it by avoid check reserved inodes if no free inode blocks will be zeroed. Cc: stable@kernel.org Fixes: 50122847007 ("ext4: fix check to prevent initializing reserved inodes") Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Suggested-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210331121516.2243099-1-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-11ext4: annotate data race in jbd2_journal_dirty_metadata()Jan Kara1-4/+4
commit 83fe6b18b8d04c6c849379005e1679bac9752466 upstream. Assertion checks in jbd2_journal_dirty_metadata() are known to be racy but we don't want to be grabbing locks just for them. We thus recheck them under b_state_lock only if it looks like they would fail. Annotate the checks with data_race(). Cc: stable@kernel.org Reported-by: Hao Sun <sunhao.th@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210406161804.20150-2-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-11ext4: annotate data race in start_this_handle()Jan Kara1-1/+6
commit 3b1833e92baba135923af4a07e73fe6e54be5a2f upstream. Access to journal->j_running_transaction is not protected by appropriate lock and thus is racy. We are well aware of that and the code handles the race properly. Just add a comment and data_race() annotation. Cc: stable@kernel.org Reported-by: syzbot+30774a6acf6a2cf6d535@syzkaller.appspotmail.com Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210406161804.20150-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-11smb3: do not attempt multichannel to server which does not support itSteve French1-0/+6
commit 9c2dc11df50d1c8537075ff6b98472198e24438e upstream. We were ignoring CAP_MULTI_CHANNEL in the server response - if the server doesn't support multichannel we should not be attempting it. See MS-SMB2 section 3.2.5.2 Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Reviewed-By: Tom Talpey <tom@talpey.com> Cc: <stable@vger.kernel.org> # v5.8+ Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-11smb3: when mounting with multichannel include it in requested capabilitiesSteve French1-0/+5
commit 679971e7213174efb56abc8fab1299d0a88db0e8 upstream. In the SMB3/SMB3.1.1 negotiate protocol request, we are supposed to advertise CAP_MULTICHANNEL capability when establishing multiple channels has been requested by the user doing the mount. See MS-SMB2 sections 2.2.3 and 3.2.5.2 Without setting it there is some risk that multichannel could fail if the server interpreted the field strictly. Reviewed-By: Tom Talpey <tom@talpey.com> Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Cc: <stable@vger.kernel.org> # v5.8+ Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-11jffs2: check the validity of dstlen in jffs2_zlib_compress()Yang Yang1-0/+3
commit 90ada91f4610c5ef11bc52576516d96c496fc3f1 upstream. KASAN reports a BUG when download file in jffs2 filesystem.It is because when dstlen == 1, cpage_out will write array out of bounds. Actually, data will not be compressed in jffs2_zlib_compress() if data's length less than 4. [ 393.799778] BUG: KASAN: slab-out-of-bounds in jffs2_rtime_compress+0x214/0x2f0 at addr ffff800062e3b281 [ 393.809166] Write of size 1 by task tftp/2918 [ 393.813526] CPU: 3 PID: 2918 Comm: tftp Tainted: G B 4.9.115-rt93-EMBSYS-CGEL-6.1.R6-dirty #1 [ 393.823173] Hardware name: LS1043A RDB Board (DT) [ 393.827870] Call trace: [ 393.830322] [<ffff20000808c700>] dump_backtrace+0x0/0x2f0 [ 393.835721] [<ffff20000808ca04>] show_stack+0x14/0x20 [ 393.840774] [<ffff2000086ef700>] dump_stack+0x90/0xb0 [ 393.845829] [<ffff20000827b19c>] kasan_object_err+0x24/0x80 [ 393.851402] [<ffff20000827b404>] kasan_report_error+0x1b4/0x4d8 [ 393.857323] [<ffff20000827bae8>] kasan_report+0x38/0x40 [ 393.862548] [<ffff200008279d44>] __asan_store1+0x4c/0x58 [ 393.867859] [<ffff2000084ce2ec>] jffs2_rtime_compress+0x214/0x2f0 [ 393.873955] [<ffff2000084bb3b0>] jffs2_selected_compress+0x178/0x2a0 [ 393.880308] [<ffff2000084bb530>] jffs2_compress+0x58/0x478 [ 393.885796] [<ffff2000084c5b34>] jffs2_write_inode_range+0x13c/0x450 [ 393.892150] [<ffff2000084be0b8>] jffs2_write_end+0x2a8/0x4a0 [ 393.897811] [<ffff2000081f3008>] generic_perform_write+0x1c0/0x280 [ 393.903990] [<ffff2000081f5074>] __generic_file_write_iter+0x1c4/0x228 [ 393.910517] [<ffff2000081f5210>] generic_file_write_iter+0x138/0x288 [ 393.916870] [<ffff20000829ec1c>] __vfs_write+0x1b4/0x238 [ 393.922181] [<ffff20000829ff00>] vfs_write+0xd0/0x238 [ 393.927232] [<ffff2000082a1ba8>] SyS_write+0xa0/0x110 [ 393.932283] [<ffff20000808429c>] __sys_trace_return+0x0/0x4 [ 393.937851] Object at ffff800062e3b280, in cache kmalloc-64 size: 64 [ 393.944197] Allocated: [ 393.946552] PID = 2918 [ 393.948913] save_stack_trace_tsk+0x0/0x220 [ 393.953096] save_stack_trace+0x18/0x20 [ 393.956932] kasan_kmalloc+0xd8/0x188 [ 393.960594] __kmalloc+0x144/0x238 [ 393.963994] jffs2_selected_compress+0x48/0x2a0 [ 393.968524] jffs2_compress+0x58/0x478 [ 393.972273] jffs2_write_inode_range+0x13c/0x450 [ 393.976889] jffs2_write_end+0x2a8/0x4a0 [ 393.980810] generic_perform_write+0x1c0/0x280 [ 393.985251] __generic_file_write_iter+0x1c4/0x228 [ 393.990040] generic_file_write_iter+0x138/0x288 [ 393.994655] __vfs_write+0x1b4/0x238 [ 393.998228] vfs_write+0xd0/0x238 [ 394.001543] SyS_write+0xa0/0x110 [ 394.004856] __sys_trace_return+0x0/0x4 [ 394.008684] Freed: [ 394.010691] PID = 2918 [ 394.013051] save_stack_trace_tsk+0x0/0x220 [ 394.017233] save_stack_trace+0x18/0x20 [ 394.021069] kasan_slab_free+0x88/0x188 [ 394.024902] kfree+0x6c/0x1d8 [ 394.027868] jffs2_sum_write_sumnode+0x2c4/0x880 [ 394.032486] jffs2_do_reserve_space+0x198/0x598 [ 394.037016] jffs2_reserve_space+0x3f8/0x4d8 [ 394.041286] jffs2_write_inode_range+0xf0/0x450 [ 394.045816] jffs2_write_end+0x2a8/0x4a0 [ 394.049737] generic_perform_write+0x1c0/0x280 [ 394.054179] __generic_file_write_iter+0x1c4/0x228 [ 394.058968] generic_file_write_iter+0x138/0x288 [ 394.063583] __vfs_write+0x1b4/0x238 [ 394.067157] vfs_write+0xd0/0x238 [ 394.070470] SyS_write+0xa0/0x110 [ 394.073783] __sys_trace_return+0x0/0x4 [ 394.077612] Memory state around the buggy address: [ 394.082404] ffff800062e3b180: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc [ 394.089623] ffff800062e3b200: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc [ 394.096842] >ffff800062e3b280: 01 fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 394.104056] ^ [ 394.107283] ffff800062e3b300: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc [ 394.114502] ffff800062e3b380: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc [ 394.121718] ================================================================== Signed-off-by: Yang Yang <yang.yang29@zte.com.cn> Cc: Joel Stanley <joel@jms.id.au> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-11exfat: fix erroneous discard when clear cluster bitHyeongseok Kim1-10/+1
commit 77edfc6e51055b61cae2f54c8e6c3bb7c762e4fe upstream. If mounted with discard option, exFAT issues discard command when clear cluster bit to remove file. But the input parameter of cluster-to-sector calculation is abnormally added by reserved cluster size which is 2, leading to discard unrelated sectors included in target+2 cluster. With fixing this, remove the wrong comments in set/clear/find bitmap functions. Fixes: 1e49a94cf707 ("exfat: add bitmap operations") Cc: stable@vger.kernel.org # v5.7+ Signed-off-by: Hyeongseok Kim <hyeongseok@gmail.com> Acked-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-11fuse: fix write deadlockVivek Goyal2-12/+30
commit 4f06dd92b5d0a6f8eec6a34b8d6ef3e1f4ac1e10 upstream. There are two modes for write(2) and friends in fuse: a) write through (update page cache, send sync WRITE request to userspace) b) buffered write (update page cache, async writeout later) The write through method kept all the page cache pages locked that were used for the request. Keeping more than one page locked is deadlock prone and Qian Cai demonstrated this with trinity fuzzing. The reason for keeping the pages locked is that concurrent mapped reads shouldn't try to pull possibly stale data into the page cache. For full page writes, the easy way to fix this is to make the cached page be the authoritative source by marking the page PG_uptodate immediately. After this the page can be safely unlocked, since mapped/cached reads will take the written data from the cache. Concurrent mapped writes will now cause data in the original WRITE request to be updated; this however doesn't cause any data inconsistency and this scenario should be exceedingly rare anyway. If the WRITE request returns with an error in the above case, currently the page is not marked uptodate; this means that a concurrent read will always read consistent data. After this patch the page is uptodate between writing to the cache and receiving the error: there's window where a cached read will read the wrong data. While theoretically this could be a regression, it is unlikely to be one in practice, since this is normal for buffered writes. In case of a partial page write to an already uptodate page the locking is also unnecessary, with the above caveats. Partial write of a not uptodate page still needs to be handled. One way would be to read the complete page before doing the write. This is not possible, since it might break filesystems that don't expect any READ requests when the file was opened O_WRONLY. The other solution is to serialize the synchronous write with reads from the partial pages. The easiest way to do this is to keep the partial pages locked. The problem is that a write() may involve two such pages (one head and one tail). This patch fixes it by only locking the partial tail page. If there's a partial head page as well, then split that off as a separate WRITE request. Reported-by: Qian Cai <cai@lca.pw> Link: https://lore.kernel.org/linux-fsdevel/4794a3fa3742a5e84fb0f934944204b55730829b.camel@lca.pw/ Fixes: ea9b9907b82a ("fuse: implement perform_write") Cc: <stable@vger.kernel.org> # v2.6.26 Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>