summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2019-05-16f2fs: Fix use of number of devicesDamien Le Moal5-15/+32
commit 0916878da355650d7e77104a7ac0fa1784eca852 upstream. For a single device mount using a zoned block device, the zone information for the device is stored in the sbi->devs single entry array and sbi->s_ndevs is set to 1. This differs from a single device mount using a regular block device which does not allocate sbi->devs and sets sbi->s_ndevs to 0. However, sbi->s_devs == 0 condition is used throughout the code to differentiate a single device mount from a multi-device mount where sbi->s_ndevs is always larger than 1. This results in problems with single zoned block device volumes as these are treated as multi-device mounts but do not have the start_blk and end_blk information set. One of the problem observed is skipping of zone discard issuing resulting in write commands being issued to full zones or unaligned to a zone write pointer. Fix this problem by simply treating the cases sbi->s_ndevs == 0 (single regular block device mount) and sbi->s_ndevs == 1 (single zoned block device mount) in the same manner. This is done by introducing the helper function f2fs_is_multi_device() and using this helper in place of direct tests of sbi->s_ndevs value, improving code readability. Fixes: 7bb3a371d199 ("f2fs: Fix zoned block device support") Cc: <stable@vger.kernel.org> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-16ceph: handle the case where a dentry has been renamed on outstanding reqJeff Layton1-1/+15
[ Upstream commit 4b8222870032715f9d995f3eb7c7acd8379a275d ] It's possible for us to issue a lookup to revalidate a dentry concurrently with a rename. If done in the right order, then we could end up processing dentry info in the reply that no longer reflects the state of the dentry. If req->r_dentry->d_name differs from the one in the trace, then just ignore the trace in the reply. We only need to do this however if the parent's i_rwsem is not held. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-05-16afs: Fix in-progess ops to ignore server-level callback invalidationDavid Howells3-6/+2
[ Upstream commit eeba1e9cf31d064284dd1fa7bd6cfe01395bd03d ] The in-kernel afs filesystem client counts the number of server-level callback invalidation events (CB.InitCallBackState* RPC operations) that it receives from the server. This is stored in cb_s_break in various structures, including afs_server and afs_vnode. If an inode is examined by afs_validate(), say, the afs_server copy is compared, along with other break counters, to those in afs_vnode, and if one or more of the counters do not match, it is considered that the server's callback promise is broken. At points where this happens, AFS_VNODE_CB_PROMISED is cleared to indicate that the status must be refetched from the server. afs_validate() issues an FS.FetchStatus operation to get updated metadata - and based on the updated data_version may invalidate the pagecache too. However, the break counters are also used to determine whether to note a new callback in the vnode (which would set the AFS_VNODE_CB_PROMISED flag) and whether to cache the permit data included in the YFSFetchStatus record by the server. The problem comes when the server sends us a CB.InitCallBackState op. The first such instance doesn't cause cb_s_break to be incremented, but rather causes AFS_SERVER_FL_NEW to be cleared - but thereafter, say some hours after last use and all the volumes have been automatically unmounted and the server has forgotten about the client[*], this *will* likely cause an increment. [*] There are other circumstances too, such as the server restarting or needing to make space in its callback table. Note that the server won't send us a CB.InitCallBackState op until we talk to it again. So what happens is: (1) A mount for a new volume is attempted, a inode is created for the root vnode and vnode->cb_s_break and AFS_VNODE_CB_PROMISED aren't set immediately, as we don't have a nominated server to talk to yet - and we may iterate through a few to find one. (2) Before the operation happens, afs_fetch_status(), say, notes in the cursor (fc.cb_break) the break counter sum from the vnode, volume and server counters, but the server->cb_s_break is currently 0. (3) We send FS.FetchStatus to the server. The server sends us back CB.InitCallBackState. We increment server->cb_s_break. (4) Our FS.FetchStatus completes. The reply includes a callback record. (5) xdr_decode_AFSCallBack()/xdr_decode_YFSCallBack() check to see whether the callback promise was broken by checking the break counter sum from step (2) against the current sum. This fails because of step (3), so we don't set the callback record and, importantly, don't set AFS_VNODE_CB_PROMISED on the vnode. This does not preclude the syscall from progressing, and we don't loop here rechecking the status, but rather assume it's good enough for one round only and will need to be rechecked next time. (6) afs_validate() it triggered on the vnode, probably called from d_revalidate() checking the parent directory. (7) afs_validate() notes that AFS_VNODE_CB_PROMISED isn't set, so doesn't update vnode->cb_s_break and assumes the vnode to be invalid. (8) afs_validate() needs to calls afs_fetch_status(). Go back to step (2) and repeat, every time the vnode is validated. This primarily affects volume root dir vnodes. Everything subsequent to those inherit an already incremented cb_s_break upon mounting. The issue is that we assume that the callback record and the cached permit information in a reply from the server can't be trusted after getting a server break - but this is wrong since the server makes sure things are done in the right order, holding up our ops if necessary[*]. [*] There is an extremely unlikely scenario where a reply from before the CB.InitCallBackState could get its delivery deferred till after - at which point we think we have a promise when we don't. This, however, requires unlucky mass packet loss to one call. AFS_SERVER_FL_NEW tries to paper over the cracks for the initial mount from a server we've never contacted before, but this should be unnecessary. It's also further insulated from the problem on an initial mount by querying the server first with FS.GetCapabilities, which triggers the CB.InitCallBackState. Fix this by (1) Remove AFS_SERVER_FL_NEW. (2) In afs_calc_vnode_cb_break(), don't include cb_s_break in the calculation. (3) In afs_cb_is_broken(), don't include cb_s_break in the check. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-05-16afs: Unlock pages for __pagevec_release()Marc Dionne1-0/+1
[ Upstream commit 21bd68f196ca91fc0f3d9bd1b32f6e530e8c1c88 ] __pagevec_release() complains loudly if any page in the vector is still locked. The pages need to be locked for generic_error_remove_page(), but that function doesn't actually unlock them. Unlock the pages afterwards. Signed-off-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Jonathan Billings <jsbillin@umich.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-05-16kernfs: fix barrier usage in __kernfs_new_node()Andrea Parri1-3/+2
commit 998267900cee901c5d1dfa029a6304d00acbc29f upstream. smp_mb__before_atomic() can not be applied to atomic_set(). Remove the barrier and rely on RELEASE synchronization. Fixes: ba16b2846a8c6 ("kernfs: add an API to get kernfs node from inode number") Cc: stable@vger.kernel.org Signed-off-by: Andrea Parri <andrea.parri@amarulasolutions.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-10NFSv4.1 fix incorrect return value in copy_file_rangeOlga Kornievskaia2-4/+3
[ Upstream commit 0769663b4f580566ef6cdf366f3073dbe8022c39 ] According to the NFSv4.2 spec if the input and output file is the same file, operation should fail with EINVAL. However, linux copy_file_range() system call has no such restrictions. Therefore, in such case let's return EOPNOTSUPP and allow VFS to fallback to doing do_splice_direct(). Also when copy_file_range is called on an NFSv4.0 or 4.1 mount (ie., a server that doesn't support COPY functionality), we also need to return EOPNOTSUPP and fallback to a regular copy. Fixes xfstest generic/075, generic/091, generic/112, generic/263 for all NFSv4.x versions. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-05-08fs: stream_open - opener for stream-like files so that read and write can ↵Kirill Smelkov2-2/+21
run simultaneously without deadlock [ Upstream commit 10dce8af34226d90fa56746a934f8da5dcdba3df ] Commit 9c225f2655e3 ("vfs: atomic f_pos accesses as per POSIX") added locking for file.f_pos access and in particular made concurrent read and write not possible - now both those functions take f_pos lock for the whole run, and so if e.g. a read is blocked waiting for data, write will deadlock waiting for that read to complete. This caused regression for stream-like files where previously read and write could run simultaneously, but after that patch could not do so anymore. See e.g. commit 581d21a2d02a ("xenbus: fix deadlock on writes to /proc/xen/xenbus") which fixes such regression for particular case of /proc/xen/xenbus. The patch that added f_pos lock in 2014 did so to guarantee POSIX thread safety for read/write/lseek and added the locking to file descriptors of all regular files. In 2014 that thread-safety problem was not new as it was already discussed earlier in 2006. However even though 2006'th version of Linus's patch was adding f_pos locking "only for files that are marked seekable with FMODE_LSEEK (thus avoiding the stream-like objects like pipes and sockets)", the 2014 version - the one that actually made it into the tree as 9c225f2655e3 - is doing so irregardless of whether a file is seekable or not. See https://lore.kernel.org/lkml/53022DB1.4070805@gmail.com/ https://lwn.net/Articles/180387 https://lwn.net/Articles/180396 for historic context. The reason that it did so is, probably, that there are many files that are marked non-seekable, but e.g. their read implementation actually depends on knowing current position to correctly handle the read. Some examples: kernel/power/user.c snapshot_read fs/debugfs/file.c u32_array_read fs/fuse/control.c fuse_conn_waiting_read + ... drivers/hwmon/asus_atk0110.c atk_debugfs_ggrp_read arch/s390/hypfs/inode.c hypfs_read_iter ... Despite that, many nonseekable_open users implement read and write with pure stream semantics - they don't depend on passed ppos at all. And for those cases where read could wait for something inside, it creates a situation similar to xenbus - the write could be never made to go until read is done, and read is waiting for some, potentially external, event, for potentially unbounded time -> deadlock. Besides xenbus, there are 14 such places in the kernel that I've found with semantic patch (see below): drivers/xen/evtchn.c:667:8-24: ERROR: evtchn_fops: .read() can deadlock .write() drivers/isdn/capi/capi.c:963:8-24: ERROR: capi_fops: .read() can deadlock .write() drivers/input/evdev.c:527:1-17: ERROR: evdev_fops: .read() can deadlock .write() drivers/char/pcmcia/cm4000_cs.c:1685:7-23: ERROR: cm4000_fops: .read() can deadlock .write() net/rfkill/core.c:1146:8-24: ERROR: rfkill_fops: .read() can deadlock .write() drivers/s390/char/fs3270.c:488:1-17: ERROR: fs3270_fops: .read() can deadlock .write() drivers/usb/misc/ldusb.c:310:1-17: ERROR: ld_usb_fops: .read() can deadlock .write() drivers/hid/uhid.c:635:1-17: ERROR: uhid_fops: .read() can deadlock .write() net/batman-adv/icmp_socket.c:80:1-17: ERROR: batadv_fops: .read() can deadlock .write() drivers/media/rc/lirc_dev.c:198:1-17: ERROR: lirc_fops: .read() can deadlock .write() drivers/leds/uleds.c:77:1-17: ERROR: uleds_fops: .read() can deadlock .write() drivers/input/misc/uinput.c:400:1-17: ERROR: uinput_fops: .read() can deadlock .write() drivers/infiniband/core/user_mad.c:985:7-23: ERROR: umad_fops: .read() can deadlock .write() drivers/gnss/core.c:45:1-17: ERROR: gnss_fops: .read() can deadlock .write() In addition to the cases above another regression caused by f_pos locking is that now FUSE filesystems that implement open with FOPEN_NONSEEKABLE flag, can no longer implement bidirectional stream-like files - for the same reason as above e.g. read can deadlock write locking on file.f_pos in the kernel. FUSE's FOPEN_NONSEEKABLE was added in 2008 in a7c1b990f715 ("fuse: implement nonseekable open") to support OSSPD. OSSPD implements /dev/dsp in userspace with FOPEN_NONSEEKABLE flag, with corresponding read and write routines not depending on current position at all, and with both read and write being potentially blocking operations: See https://github.com/libfuse/osspd https://lwn.net/Articles/308445 https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1406 https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1438-L1477 https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1479-L1510 Corresponding libfuse example/test also describes FOPEN_NONSEEKABLE as "somewhat pipe-like files ..." with read handler not using offset. However that test implements only read without write and cannot exercise the deadlock scenario: https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c#L124-L131 https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c#L146-L163 https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c#L209-L216 I've actually hit the read vs write deadlock for real while implementing my FUSE filesystem where there is /head/watch file, for which open creates separate bidirectional socket-like stream in between filesystem and its user with both read and write being later performed simultaneously. And there it is semantically not easy to split the stream into two separate read-only and write-only channels: https://lab.nexedi.com/kirr/wendelin.core/blob/f13aa600/wcfs/wcfs.go#L88-169 Let's fix this regression. The plan is: 1. We can't change nonseekable_open to include &~FMODE_ATOMIC_POS - doing so would break many in-kernel nonseekable_open users which actually use ppos in read/write handlers. 2. Add stream_open() to kernel to open stream-like non-seekable file descriptors. Read and write on such file descriptors would never use nor change ppos. And with that property on stream-like files read and write will be running without taking f_pos lock - i.e. read and write could be running simultaneously. 3. With semantic patch search and convert to stream_open all in-kernel nonseekable_open users for which read and write actually do not depend on ppos and where there is no other methods in file_operations which assume @offset access. 4. Add FOPEN_STREAM to fs/fuse/ and open in-kernel file-descriptors via steam_open if that bit is present in filesystem open reply. It was tempting to change fs/fuse/ open handler to use stream_open instead of nonseekable_open on just FOPEN_NONSEEKABLE flags, but grepping through Debian codesearch shows users of FOPEN_NONSEEKABLE, and in particular GVFS which actually uses offset in its read and write handlers https://codesearch.debian.net/search?q=-%3Enonseekable+%3D https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1080 https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1247-1346 https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1399-1481 so if we would do such a change it will break a real user. 5. Add stream_open and FOPEN_STREAM handling to stable kernels starting from v3.14+ (the kernel where 9c225f2655 first appeared). This will allow to patch OSSPD and other FUSE filesystems that provide stream-like files to return FOPEN_STREAM | FOPEN_NONSEEKABLE in their open handler and this way avoid the deadlock on all kernel versions. This should work because fs/fuse/ ignores unknown open flags returned from a filesystem and so passing FOPEN_STREAM to a kernel that is not aware of this flag cannot hurt. In turn the kernel that is not aware of FOPEN_STREAM will be < v3.14 where just FOPEN_NONSEEKABLE is sufficient to implement streams without read vs write deadlock. This patch adds stream_open, converts /proc/xen/xenbus to it and adds semantic patch to automatically locate in-kernel places that are either required to be converted due to read vs write deadlock, or that are just safe to be converted because read and write do not use ppos and there are no other funky methods in file_operations. Regarding semantic patch I've verified each generated change manually - that it is correct to convert - and each other nonseekable_open instance left - that it is either not correct to convert there, or that it is not converted due to current stream_open.cocci limitations. The script also does not convert files that should be valid to convert, but that currently have .llseek = noop_llseek or generic_file_llseek for unknown reason despite file being opened with nonseekable_open (e.g. drivers/input/mousedev.c) Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Yongzhi Pan <panyongzhi@gmail.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Juergen Gross <jgross@suse.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Tejun Heo <tj@kernel.org> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Julia Lawall <Julia.Lawall@lip6.fr> Cc: Nikolaus Rath <Nikolaus@rath.org> Cc: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Kirill Smelkov <kirr@nexedi.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin (Microsoft) <sashal@kernel.org>
2019-05-08hugetlbfs: fix memory leak for resv_mapMike Kravetz1-6/+14
[ Upstream commit 58b6e5e8f1addd44583d61b0a03c0f5519527e35 ] When mknod is used to create a block special file in hugetlbfs, it will allocate an inode and kmalloc a 'struct resv_map' via resv_map_alloc(). inode->i_mapping->private_data will point the newly allocated resv_map. However, when the device special file is opened bd_acquire() will set inode->i_mapping to bd_inode->i_mapping. Thus the pointer to the allocated resv_map is lost and the structure is leaked. Programs to reproduce: mount -t hugetlbfs nodev hugetlbfs mknod hugetlbfs/dev b 0 0 exec 30<> hugetlbfs/dev umount hugetlbfs/ resv_map structures are only needed for inodes which can have associated page allocations. To fix the leak, only allocate resv_map for those inodes which could possibly be associated with page allocations. Link: http://lkml.kernel.org/r/20190401213101.16476-1-mike.kravetz@oracle.com Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Reported-by: Yufen Yu <yuyufen@huawei.com> Suggested-by: Yufen Yu <yuyufen@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin (Microsoft) <sashal@kernel.org>
2019-05-08debugfs: fix use-after-free on symlink traversalAl Viro1-4/+9
[ Upstream commit 93b919da64c15b90953f96a536e5e61df896ca57 ] symlink body shouldn't be freed without an RCU delay. Switch debugfs to ->destroy_inode() and use of call_rcu(); free both the inode and symlink body in the callback. Similar to solution for bpf, only here it's even more obvious that ->evict_inode() can be dropped. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Sasha Levin (Microsoft) <sashal@kernel.org>
2019-05-08jffs2: fix use-after-free on symlink traversalAl Viro2-6/+4
[ Upstream commit 4fdcfab5b5537c21891e22e65996d4d0dd8ab4ca ] free the symlink body after the same RCU delay we have for freeing the struct inode itself, so that traversal during RCU pathwalk wouldn't step into freed memory. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Sasha Levin (Microsoft) <sashal@kernel.org>
2019-05-04afs: Fix StoreData op marshallingDavid Howells2-4/+4
[ Upstream commit 8c7ae38d1ce12a0eaeba655df8562552b3596c7f ] The marshalling of AFS.StoreData, AFS.StoreData64 and YFS.StoreData64 calls generated by ->setattr() ops for the purpose of expanding a file is incorrect due to older documentation incorrectly describing the way the RPC 'FileLength' parameter is meant to work. The older documentation says that this is the length the file is meant to end up at the end of the operation; however, it was never implemented this way in any of the servers, but rather the file is truncated down to this before the write operation is effected, and never expanded to it (and, indeed, it was renamed to 'TruncPos' in 2014). Fix this by setting the position parameter to the new file length and doing a zero-lengh write there. The bug causes Xwayland to SIGBUS due to unexpected non-expansion of a file it then mmaps. This can be tested by giving the following test program a filename in an AFS directory: #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <fcntl.h> #include <sys/mman.h> int main(int argc, char *argv[]) { char *p; int fd; if (argc != 2) { fprintf(stderr, "Format: test-trunc-mmap <file>\n"); exit(2); } fd = open(argv[1], O_RDWR | O_CREAT | O_TRUNC); if (fd < 0) { perror(argv[1]); exit(1); } if (ftruncate(fd, 0x140008) == -1) { perror("ftruncate"); exit(1); } p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); if (p == MAP_FAILED) { perror("mmap"); exit(1); } p[0] = 'a'; if (munmap(p, 4096) < 0) { perror("munmap"); exit(1); } if (close(fd) < 0) { perror("close"); exit(1); } exit(0); } Fixes: 31143d5d515e ("AFS: implement basic file write support") Reported-by: Jonathan Billings <jsbillin@umich.edu> Tested-by: Jonathan Billings <jsbillin@umich.edu> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin (Microsoft) <sashal@kernel.org>
2019-05-04ceph: fix use-after-free on symlink traversalAl Viro1-1/+1
[ Upstream commit daf5cc27eed99afdea8d96e71b89ba41f5406ef6 ] free the symlink body after the same RCU delay we have for freeing the struct inode itself, so that traversal during RCU pathwalk wouldn't step into freed memory. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Sasha Levin (Microsoft) <sashal@kernel.org>
2019-05-04NFS: Fix a typo in nfs_init_timeout_values()Trond Myklebust1-1/+1
[ Upstream commit 5a698243930c441afccec04e4d5dc8febfd2b775 ] Specifying a retrans=0 mount parameter to a NFS/TCP mount, is inadvertently causing the NFS client to rewrite any specified timeout parameter to the default of 60 seconds. Fixes: a956beda19a6 ("NFS: Allow the mount option retrans=0") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin (Microsoft) <sashal@kernel.org>
2019-05-04Btrfs: fix file corruption after snapshotting due to mix of buffered/DIO writesFilipe Manana1-6/+43
[ Upstream commit 609e804d771f59dc5d45a93e5ee0053c74bbe2bf ] When we are mixing buffered writes with direct IO writes against the same file and snapshotting is happening concurrently, we can end up with a corrupt file content in the snapshot. Example: 1) Inode/file is empty. 2) Snapshotting starts. 2) Buffered write at offset 0 length 256Kb. This updates the i_size of the inode to 256Kb, disk_i_size remains zero. This happens after the task doing the snapshot flushes all existing delalloc. 3) DIO write at offset 256Kb length 768Kb. Once the ordered extent completes it sets the inode's disk_i_size to 1Mb (256Kb + 768Kb) and updates the inode item in the fs tree with a size of 1Mb (which is the value of disk_i_size). 4) The dealloc for the range [0, 256Kb[ did not start yet. 5) The transaction used in the DIO ordered extent completion, which updated the inode item, is committed by the snapshotting task. 6) Snapshot creation completes. 7) Dealloc for the range [0, 256Kb[ is flushed. After that when reading the file from the snapshot we always get zeroes for the range [0, 256Kb[, the file has a size of 1Mb and the data written by the direct IO write is found. From an application's point of view this is a corruption, since in the source subvolume it could never read a version of the file that included the data from the direct IO write without the data from the buffered write included as well. In the snapshot's tree, file extent items are missing for the range [0, 256Kb[. The issue, obviously, does not happen when using the -o flushoncommit mount option. Fix this by flushing delalloc for all the roots that are about to be snapshotted when committing a transaction. This guarantees total ordering when updating the disk_i_size of an inode since the flush for dealloc is done when a transaction is in the TRANS_STATE_COMMIT_START state and wait is done once no more external writers exist. This is similar to what we do when using the flushoncommit mount option, but we do it only if the transaction has snapshots to create and only for the roots of the subvolumes to be snapshotted. The bulk of the dealloc is flushed in the snapshot creation ioctl, so the flush work we do inside the transaction is minimized. This issue, involving buffered and direct IO writes with snapshotting, is often triggered by fstest btrfs/078, and got reported by fsck when not using the NO_HOLES features, for example: $ cat results/btrfs/078.full (...) _check_btrfs_filesystem: filesystem on /dev/sdc is inconsistent *** fsck.btrfs output *** [1/7] checking root items [2/7] checking extents [3/7] checking free space cache [4/7] checking fs roots root 258 inode 264 errors 100, file extent discount Found file extent holes: start: 524288, len: 65536 ERROR: errors found in fs roots Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin (Microsoft) <sashal@kernel.org>
2019-05-04fs: prevent page refcount overflow in pipe_buf_getMatthew Wilcox3-10/+18
commit 15fab63e1e57be9fdb5eec1bbc5916e9825e9acb upstream. Change pipe_buf_get() to return a bool indicating whether it succeeded in raising the refcount of the page (if the thing in the pipe is a page). This removes another mechanism for overflowing the page refcount. All callers converted to handle a failure. Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Matthew Wilcox <willy@infradead.org> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-02Fix aio_poll() racesAl Viro1-50/+40
commit af5c72b1fc7a00aa484e90b0c4e0eeb582545634 upstream. aio_poll() has to cope with several unpleasant problems: * requests that might stay around indefinitely need to be made visible for io_cancel(2); that must not be done to a request already completed, though. * in cases when ->poll() has placed us on a waitqueue, wakeup might have happened (and request completed) before ->poll() returns. * worse, in some early wakeup cases request might end up re-added into the queue later - we can't treat "woken up and currently not in the queue" as "it's not going to stick around indefinitely" * ... moreover, ->poll() might have decided not to put it on any queues to start with, and that needs to be distinguished from the previous case * ->poll() might have tried to put us on more than one queue. Only the first will succeed for aio poll, so we might end up missing wakeups. OTOH, we might very well notice that only after the wakeup hits and request gets completed (all before ->poll() gets around to the second poll_wait()). In that case it's too late to decide that we have an error. req->woken was an attempt to deal with that. Unfortunately, it was broken. What we need to keep track of is not that wakeup has happened - the thing might come back after that. It's that async reference is already gone and won't come back, so we can't (and needn't) put the request on the list of cancellables. The easiest case is "request hadn't been put on any waitqueues"; we can tell by seeing NULL apt.head, and in that case there won't be anything async. We should either complete the request ourselves (if vfs_poll() reports anything of interest) or return an error. In all other cases we get exclusion with wakeups by grabbing the queue lock. If request is currently on queue and we have something interesting from vfs_poll(), we can steal it and complete the request ourselves. If it's on queue and vfs_poll() has not reported anything interesting, we either put it on the cancellable list, or, if we know that it hadn't been put on all queues ->poll() wanted it on, we steal it and return an error. If it's _not_ on queue, it's either been already dealt with (in which case we do nothing), or there's aio_poll_complete_work() about to be executed. In that case we either put it on the cancellable list, or, if we know it hadn't been put on all queues ->poll() wanted it on, simulate what cancel would've done. It's a lot more convoluted than I'd like it to be. Single-consumer APIs suck, and unfortunately aio is not an exception... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-02aio: store event at final iocb_put()Al Viro1-16/+17
commit 2bb874c0d873d13bd9b9b9c6d7b7c4edab18c8b4 upstream. Instead of having aio_complete() set ->ki_res.{res,res2}, do that explicitly in its callers, drop the reference (as aio_complete() used to do) and delay the rest until the final iocb_put(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-02aio: keep io_event in aio_kiocbAl Viro1-18/+13
commit a9339b7855094ba11a97e8822ae038135e879e79 upstream. We want to separate forming the resulting io_event from putting it into the ring buffer. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-02aio: fold lookup_kiocb() into its sole callerAl Viro1-22/+7
commit 833f4154ed560232120bc475935ee1d6a20e159f upstream. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-02pin iocb through aio.Linus Torvalds1-16/+21
commit b53119f13a04879c3bf502828d99d13726639ead upstream. aio_poll() is not the only case that needs file pinned; worse, while aio_read()/aio_write() can live without pinning iocb itself, the proof is rather brittle and can easily break on later changes. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-02NFS: Forbid setting AF_INET6 to "struct sockaddr_in"->sin_family.Tetsuo Handa1-1/+2
commit 7c2bd9a39845bfb6d72ddb55ce737650271f6f96 upstream. syzbot is reporting uninitialized value at rpc_sockaddr2uaddr() [1]. This is because syzbot is setting AF_INET6 to "struct sockaddr_in"->sin_family (which is embedded into user-visible "struct nfs_mount_data" structure) despite nfs23_validate_mount_data() cannot pass sizeof(struct sockaddr_in6) bytes of AF_INET6 address to rpc_sockaddr2uaddr(). Since "struct nfs_mount_data" structure is user-visible, we can't change "struct nfs_mount_data" to use "struct sockaddr_storage". Therefore, assuming that everybody is using AF_INET family when passing address via "struct nfs_mount_data"->addr, reject if its sin_family is not AF_INET. [1] https://syzkaller.appspot.com/bug?id=599993614e7cbbf66bc2656a919ab2a95fb5d75c Reported-by: syzbot <syzbot+047a11c361b872896a4f@syzkaller.appspotmail.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-02ext4: fix some error pointer dereferencesDan Carpenter1-0/+3
commit 7159a986b4202343f6cca3bb8079ecace5816fd6 upstream. We can't pass error pointers to brelse(). Fixes: fb265c9cb49e ("ext4: add ext4_sb_bread() to disambiguate ENOMEM cases") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-02nfsd: wake blocked file lock waiters before sending callbackJeff Layton1-0/+9
commit f456458e4d25a8962d0946891617c76cc3ff5fb9 upstream. When a blocked NFS lock is "awoken" we send a callback to the server and then wake any hosts waiting on it. If a client attempts to get a lock and then drops off the net, we could end up waiting for a long time until we end up waking locks blocked on that request. So, wake any other waiting lock requests before sending the callback. Do this by calling locks_delete_block in a new "prepare" phase for CB_NOTIFY_LOCK callbacks. URL: https://bugzilla.kernel.org/show_bug.cgi?id=203363 Fixes: 16306a61d3b7 ("fs/locks: always delete_block after waiting.") Reported-by: Slawomir Pryczek <slawek1211@gmail.com> Cc: Neil Brown <neilb@suse.com> Cc: stable@vger.kernel.org Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-02nfsd: wake waiters blocked on file_lock before deleting itJeff Layton1-2/+1
commit 6aaafc43a4ecc5bc8a3f6a2811d5eddc996a97f3 upstream. After a blocked nfsd file_lock request is deleted, knfsd will send a callback to the client and then free the request. Commit 16306a61d3b7 ("fs/locks: always delete_block after waiting.") changed it such that locks_delete_block is always called on a request after it is awoken, but that patch missed fixing up blocked nfsd request handling. Call locks_delete_block on the block to wake up any locks still blocked on the nfsd lock request before freeing it. Some of its callers already do this however, so just remove those calls. URL: https://bugzilla.kernel.org/show_bug.cgi?id=203363 Fixes: 16306a61d3b7 ("fs/locks: always delete_block after waiting.") Reported-by: Slawomir Pryczek <slawek1211@gmail.com> Cc: Neil Brown <neilb@suse.com> Cc: stable@vger.kernel.org Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-02nfsd: Don't release the callback slot unless it was actually heldTrond Myklebust2-1/+8
commit e6abc8caa6deb14be2a206253f7e1c5e37e9515b upstream. If there are multiple callbacks queued, waiting for the callback slot when the callback gets shut down, then they all currently end up acting as if they hold the slot, and call nfsd4_cb_sequence_done() resulting in interesting side-effects. In addition, the 'retry_nowait' path in nfsd4_cb_sequence_done() causes a loop back to nfsd4_cb_prepare() without first freeing the slot, which causes a deadlock when nfsd41_cb_get_slot() gets called a second time. This patch therefore adds a boolean to track whether or not the callback did pick up the slot, so that it can do the right thing in these 2 cases. Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-02ceph: fix ci->i_head_snapc leakYan, Zheng2-1/+15
commit 37659182bff1eeaaeadcfc8f853c6d2b6dbc3f47 upstream. We missed two places that i_wrbuffer_ref_head, i_wr_ref, i_dirty_caps and i_flushing_caps may change. When they are all zeros, we should free i_head_snapc. Cc: stable@vger.kernel.org Link: https://tracker.ceph.com/issues/38224 Reported-and-tested-by: Luis Henriques <lhenriques@suse.com> Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-02ceph: ensure d_name stability in ceph_dentry_hash()Jeff Layton1-1/+5
commit 76a495d666e5043ffc315695f8241f5e94a98849 upstream. Take the d_lock here to ensure that d_name doesn't change. Cc: stable@vger.kernel.org Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-02ceph: only use d_name directly when parent is lockedJeff Layton1-11/+50
commit 1bcb344086f3ecf8d6705f6d708441baa823beb3 upstream. Ben reported tripping the BUG_ON in create_request_message during some performance testing. Analysis of the vmcore showed that the length of the r_dentry->d_name string changed after we allocated the buffer, but before we encoded it. build_dentry_path returns pointers to d_name in the common case of non-snapped dentries, but this optimization isn't safe unless the parent directory is locked. When it isn't, have the code make a copy of the d_name while holding the d_lock. Cc: stable@vger.kernel.org Reported-by: Ben England <bengland@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-02tracing: Fix buffer_ref pipe opsJann Horn1-2/+2
commit b987222654f84f7b4ca95b3a55eca784cb30235b upstream. This fixes multiple issues in buffer_pipe_buf_ops: - The ->steal() handler must not return zero unless the pipe buffer has the only reference to the page. But generic_pipe_buf_steal() assumes that every reference to the pipe is tracked by the page's refcount, which isn't true for these buffers - buffer_pipe_buf_get(), which duplicates a buffer, doesn't touch the page's refcount. Fix it by using generic_pipe_buf_nosteal(), which refuses every attempted theft. It should be easy to actually support ->steal, but the only current users of pipe_buf_steal() are the virtio console and FUSE, and they also only use it as an optimization. So it's probably not worth the effort. - The ->get() and ->release() handlers can be invoked concurrently on pipe buffers backed by the same struct buffer_ref. Make them safe against concurrency by using refcount_t. - The pointers stored in ->private were only zeroed out when the last reference to the buffer_ref was dropped. As far as I know, this shouldn't be necessary anyway, but if we do it, let's always do it. Link: http://lkml.kernel.org/r/20190404215925.253531-1-jannh@google.com Cc: Ingo Molnar <mingo@redhat.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@vger.kernel.org Fixes: 73a757e63114d ("ring-buffer: Return reader page back into existing ring buffer") Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-02cifs: do not attempt cifs operation on smb2+ rename errorFrank Sorenson1-0/+4
commit 652727bbe1b17993636346716ae5867627793647 upstream. A path-based rename returning EBUSY will incorrectly try opening the file with a cifs (NT Create AndX) operation on an smb2+ mount, which causes the server to force a session close. If the mount is smb2+, skip the fallback. Signed-off-by: Frank Sorenson <sorenson@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> CC: Stable <stable@vger.kernel.org> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-02cifs: fix page reference leak with readv/writevJérôme Glisse2-15/+23
commit 13f5938d8264b5501368523c4513ff26608a33e8 upstream. CIFS can leak pages reference gotten through GUP (get_user_pages*() through iov_iter_get_pages()). This happen if cifs_send_async_read() or cifs_write_from_iter() calls fail from within __cifs_readv() and __cifs_writev() respectively. This patch move page unreference to cifs_aio_ctx_release() which will happens on all code paths this is all simpler to follow for correctness. Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Cc: Steve French <sfrench@samba.org> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Stable <stable@vger.kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-02cifs: fix memory leak in SMB2_readRonnie Sahlberg1-0/+1
[ Upstream commit 05fd5c2c61732152a6bddc318aae62d7e436629b ] Commit 088aaf17aa79300cab14dbee2569c58cfafd7d6e introduced a leak where if SMB2_read() returned an error we would return without freeing the request buffer. Cc: Stable <stable@vger.kernel.org> Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-05-02fs/proc/proc_sysctl.c: Fix a NULL pointer dereferenceYueHaibing1-2/+4
[ Upstream commit 89189557b47b35683a27c80ee78aef18248eefb4 ] Syzkaller report this: sysctl could not get directory: /net//bridge -12 kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN PTI CPU: 1 PID: 7027 Comm: syz-executor.0 Tainted: G C 5.1.0-rc3+ #8 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 RIP: 0010:__write_once_size include/linux/compiler.h:220 [inline] RIP: 0010:__rb_change_child include/linux/rbtree_augmented.h:144 [inline] RIP: 0010:__rb_erase_augmented include/linux/rbtree_augmented.h:186 [inline] RIP: 0010:rb_erase+0x5f4/0x19f0 lib/rbtree.c:459 Code: 00 0f 85 60 13 00 00 48 89 1a 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 89 f2 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 0f 85 75 0c 00 00 4d 85 ed 4c 89 2e 74 ce 4c 89 ea 48 RSP: 0018:ffff8881bb507778 EFLAGS: 00010206 RAX: dffffc0000000000 RBX: ffff8881f224b5b8 RCX: ffffffff818f3f6a RDX: 000000000000000a RSI: 0000000000000050 RDI: ffff8881f224b568 RBP: 0000000000000000 R08: ffffed10376a0ef4 R09: ffffed10376a0ef4 R10: 0000000000000001 R11: ffffed10376a0ef4 R12: ffff8881f224b558 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 FS: 00007f3e7ce13700(0000) GS:ffff8881f7300000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fd60fbe9398 CR3: 00000001cb55c001 CR4: 00000000007606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: erase_entry fs/proc/proc_sysctl.c:178 [inline] erase_header+0xe3/0x160 fs/proc/proc_sysctl.c:207 start_unregistering fs/proc/proc_sysctl.c:331 [inline] drop_sysctl_table+0x558/0x880 fs/proc/proc_sysctl.c:1631 get_subdir fs/proc/proc_sysctl.c:1022 [inline] __register_sysctl_table+0xd65/0x1090 fs/proc/proc_sysctl.c:1335 br_netfilter_init+0x68/0x1000 [br_netfilter] do_one_initcall+0xbc/0x47d init/main.c:901 do_init_module+0x1b5/0x547 kernel/module.c:3456 load_module+0x6405/0x8c10 kernel/module.c:3804 __do_sys_finit_module+0x162/0x190 kernel/module.c:3898 do_syscall_64+0x9f/0x450 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe Modules linked in: br_netfilter(+) backlight comedi(C) hid_sensor_hub max3100 ti_ads8688 udc_core fddi snd_mona leds_gpio rc_streamzap mtd pata_netcell nf_log_common rc_winfast udp_tunnel snd_usbmidi_lib snd_usb_toneport snd_usb_line6 snd_rawmidi snd_seq_device snd_hwdep videobuf2_v4l2 videobuf2_common videodev media videobuf2_vmalloc videobuf2_memops rc_gadmei_rm008z 8250_of smm665 hid_tmff hid_saitek hwmon_vid rc_ati_tv_wonder_hd_600 rc_core pata_pdc202xx_old dn_rtmsg as3722 ad714x_i2c ad714x snd_soc_cs4265 hid_kensington panel_ilitek_ili9322 drm drm_panel_orientation_quirks ipack cdc_phonet usbcore phonet hid_jabra hid extcon_arizona can_dev industrialio_triggered_buffer kfifo_buf industrialio adm1031 i2c_mux_ltc4306 i2c_mux ipmi_msghandler mlxsw_core snd_soc_cs35l34 snd_soc_core snd_pcm_dmaengine snd_pcm snd_timer ac97_bus snd_compress snd soundcore gpio_da9055 uio ecdh_generic mdio_thunder of_mdio fixed_phy libphy mdio_cavium iptable_security iptable_raw iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bpfilter ip6_vti ip_vti ip_gre ipip sit tunnel4 ip_tunnel hsr veth netdevsim vxcan batman_adv cfg80211 rfkill chnl_net caif nlmon dummy team bonding vcan bridge stp llc ip6_gre gre ip6_tunnel tunnel6 tun joydev mousedev ppdev tpm kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel ide_pci_generic piix aes_x86_64 crypto_simd cryptd ide_core glue_helper input_leds psmouse intel_agp intel_gtt serio_raw ata_generic i2c_piix4 agpgart pata_acpi parport_pc parport floppy rtc_cmos sch_fq_codel ip_tables x_tables sha1_ssse3 sha1_generic ipv6 [last unloaded: br_netfilter] Dumping ftrace buffer: (ftrace buffer empty) ---[ end trace 68741688d5fbfe85 ]--- commit 23da9588037e ("fs/proc/proc_sysctl.c: fix NULL pointer dereference in put_links") forgot to handle start_unregistering() case, while header->parent is NULL, it calls erase_header() and as seen in the above syzkaller call trace, accessing &header->parent->root will trigger a NULL pointer dereference. As that commit explained, there is also no need to call start_unregistering() if header->parent is NULL. Link: http://lkml.kernel.org/r/20190409153622.28112-1-yuehaibing@huawei.com Fixes: 23da9588037e ("fs/proc/proc_sysctl.c: fix NULL pointer dereference in put_links") Fixes: 0e47c99d7fe25 ("sysctl: Replace root_list with links between sysctl_table_sets") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reported-by: Hulk Robot <hulkci@huawei.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-27coredump: fix race condition between mmget_not_zero()/get_task_mm() and core ↵Andrea Arcangeli2-0/+27
dumping commit 04f5866e41fb70690e28397487d8bd8eea7d712a upstream. The core dumping code has always run without holding the mmap_sem for writing, despite that is the only way to ensure that the entire vma layout will not change from under it. Only using some signal serialization on the processes belonging to the mm is not nearly enough. This was pointed out earlier. For example in Hugh's post from Jul 2017: https://lkml.kernel.org/r/alpine.LSU.2.11.1707191716030.2055@eggly.anvils "Not strictly relevant here, but a related note: I was very surprised to discover, only quite recently, how handle_mm_fault() may be called without down_read(mmap_sem) - when core dumping. That seems a misguided optimization to me, which would also be nice to correct" In particular because the growsdown and growsup can move the vm_start/vm_end the various loops the core dump does around the vma will not be consistent if page faults can happen concurrently. Pretty much all users calling mmget_not_zero()/get_task_mm() and then taking the mmap_sem had the potential to introduce unexpected side effects in the core dumping code. Adding mmap_sem for writing around the ->core_dump invocation is a viable long term fix, but it requires removing all copy user and page faults and to replace them with get_dump_page() for all binary formats which is not suitable as a short term fix. For the time being this solution manually covers the places that can confuse the core dump either by altering the vma layout or the vma flags while it runs. Once ->core_dump runs under mmap_sem for writing the function mmget_still_valid() can be dropped. Allowing mmap_sem protected sections to run in parallel with the coredump provides some minor parallelism advantage to the swapoff code (which seems to be safe enough by never mangling any vma field and can keep doing swapins in parallel to the core dumping) and to some other corner case. In order to facilitate the backporting I added "Fixes: 86039bd3b4e6" however the side effect of this same race condition in /proc/pid/mem should be reproducible since before 2.6.12-rc2 so I couldn't add any other "Fixes:" because there's no hash beyond the git genesis commit. Because find_extend_vma() is the only location outside of the process context that could modify the "mm" structures under mmap_sem for reading, by adding the mmget_still_valid() check to it, all other cases that take the mmap_sem for reading don't need the new check after mmget_not_zero()/get_task_mm(). The expand_stack() in page fault context also doesn't need the new check, because all tasks under core dumping are frozen. Link: http://lkml.kernel.org/r/20190325224949.11068-1-aarcange@redhat.com Fixes: 86039bd3b4e6 ("userfaultfd: add new syscall to provide memory externalization") Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reported-by: Jann Horn <jannh@google.com> Suggested-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Peter Xu <peterx@redhat.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Jann Horn <jannh@google.com> Acked-by: Jason Gunthorpe <jgg@mellanox.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-27fs/dax: Deposit pagetable even when installing zero pageAneesh Kumar K.V1-0/+15
commit 11cf9d863dcb583345723b0ed72173348761e9c0 upstream. Architectures like ppc64 use the deposited page table to store hardware page table slot information. Make sure we deposit a page table when using zero page at the pmd level for hash. Without this we hit Unable to handle kernel paging request for data at address 0x00000000 Faulting instruction address: 0xc000000000082a74 Oops: Kernel access of bad area, sig: 11 [#1] .... NIP [c000000000082a74] __hash_page_thp+0x224/0x5b0 LR [c0000000000829a4] __hash_page_thp+0x154/0x5b0 Call Trace: hash_page_mm+0x43c/0x740 do_hash_page+0x2c/0x3c copy_from_iter_flushcache+0xa4/0x4a0 pmem_copy_from_iter+0x2c/0x50 [nd_pmem] dax_copy_from_iter+0x40/0x70 dax_iomap_actor+0x134/0x360 iomap_apply+0xfc/0x1b0 dax_iomap_rw+0xac/0x130 ext4_file_write_iter+0x254/0x460 [ext4] __vfs_write+0x120/0x1e0 vfs_write+0xd8/0x220 SyS_write+0x6c/0x110 system_call+0x3c/0x130 Fixes: b5beae5e224f ("powerpc/pseries: Add driver for PAPR SCM regions") Cc: <stable@vger.kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-27cifs: fix handle leak in smb2_query_symlink()Ronnie Sahlberg1-0/+2
commit e6d0fb7b34f264f72c33053558a360a6a734905e upstream. If we enter smb2_query_symlink() for something that is not a symlink and where the SMB2_open() would succeed we would never end up closing this handle and would thus leak a handle on the server. Fix this by immediately calling SMB2_close() on successfull open. Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> CC: Stable <stable@vger.kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-27cifs: Fix use-after-free in SMB2_readZhangXiaoxu1-2/+2
commit 088aaf17aa79300cab14dbee2569c58cfafd7d6e upstream. There is a KASAN use-after-free: BUG: KASAN: use-after-free in SMB2_read+0x1136/0x1190 Read of size 8 at addr ffff8880b4e45e50 by task ln/1009 Should not release the 'req' because it will use in the trace. Fixes: eccb4422cf97 ("smb3: Add ftrace tracepoints for improved SMB3 debugging") Signed-off-by: ZhangXiaoxu <zhangxiaoxu5@huawei.com> Signed-off-by: Steve French <stfrench@microsoft.com> CC: Stable <stable@vger.kernel.org> 4.18+ Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-27cifs: Fix use-after-free in SMB2_writeZhangXiaoxu1-1/+1
commit 6a3eb3360667170988f8a6477f6686242061488a upstream. There is a KASAN use-after-free: BUG: KASAN: use-after-free in SMB2_write+0x1342/0x1580 Read of size 8 at addr ffff8880b6a8e450 by task ln/4196 Should not release the 'req' because it will use in the trace. Fixes: eccb4422cf97 ("smb3: Add ftrace tracepoints for improved SMB3 debugging") Signed-off-by: ZhangXiaoxu <zhangxiaoxu5@huawei.com> Signed-off-by: Steve French <stfrench@microsoft.com> CC: Stable <stable@vger.kernel.org> 4.18+ Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-27cifs: Fix lease buffer length errorZhangXiaoxu1-1/+4
commit b57a55e2200ede754e4dc9cce4ba9402544b9365 upstream. There is a KASAN slab-out-of-bounds: BUG: KASAN: slab-out-of-bounds in _copy_from_iter_full+0x783/0xaa0 Read of size 80 at addr ffff88810c35e180 by task mount.cifs/539 CPU: 1 PID: 539 Comm: mount.cifs Not tainted 4.19 #10 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014 Call Trace: dump_stack+0xdd/0x12a print_address_description+0xa7/0x540 kasan_report+0x1ff/0x550 check_memory_region+0x2f1/0x310 memcpy+0x2f/0x80 _copy_from_iter_full+0x783/0xaa0 tcp_sendmsg_locked+0x1840/0x4140 tcp_sendmsg+0x37/0x60 inet_sendmsg+0x18c/0x490 sock_sendmsg+0xae/0x130 smb_send_kvec+0x29c/0x520 __smb_send_rqst+0x3ef/0xc60 smb_send_rqst+0x25a/0x2e0 compound_send_recv+0x9e8/0x2af0 cifs_send_recv+0x24/0x30 SMB2_open+0x35e/0x1620 open_shroot+0x27b/0x490 smb2_open_op_close+0x4e1/0x590 smb2_query_path_info+0x2ac/0x650 cifs_get_inode_info+0x1058/0x28f0 cifs_root_iget+0x3bb/0xf80 cifs_smb3_do_mount+0xe00/0x14c0 cifs_do_mount+0x15/0x20 mount_fs+0x5e/0x290 vfs_kern_mount+0x88/0x460 do_mount+0x398/0x31e0 ksys_mount+0xc6/0x150 __x64_sys_mount+0xea/0x190 do_syscall_64+0x122/0x590 entry_SYSCALL_64_after_hwframe+0x44/0xa9 It can be reproduced by the following step: 1. samba configured with: server max protocol = SMB2_10 2. mount -o vers=default When parse the mount version parameter, the 'ops' and 'vals' was setted to smb30, if negotiate result is smb21, just update the 'ops' to smb21, but the 'vals' is still smb30. When add lease context, the iov_base is allocated with smb21 ops, but the iov_len is initiallited with the smb30. Because the iov_len is longer than iov_base, when send the message, copy array out of bounds. we need to keep the 'ops' and 'vals' consistent. Fixes: 9764c02fcbad ("SMB3: Add support for multidialect negotiate (SMB2.1 and later)") Fixes: d5c7076b772a ("smb3: add smb3.1.1 to default dialect list") Signed-off-by: ZhangXiaoxu <zhangxiaoxu5@huawei.com> Signed-off-by: Steve French <stfrench@microsoft.com> CC: Stable <stable@vger.kernel.org> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-27CIFS: keep FileInfo handle live during oplock breakAurelien Aptel4-10/+53
commit b98749cac4a695f084a5ff076f4510b23e353ecd upstream. In the oplock break handler, writing pending changes from pages puts the FileInfo handle. If the refcount reaches zero it closes the handle and waits for any oplock break handler to return, thus causing a deadlock. To prevent this situation: * We add a wait flag to cifsFileInfo_put() to decide whether we should wait for running/pending oplock break handlers * We keep an additionnal reference of the SMB FileInfo handle so that for the rest of the handler putting the handle won't close it. - The ref is bumped everytime we queue the handler via the cifs_queue_oplock_break() helper. - The ref is decremented at the end of the handler This bug was triggered by xfstest 464. Also important fix to address the various reports of oops in smb2_push_mandatory_locks Signed-off-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> CC: Stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-20f2fs: fix to add refcount once page is tagged PG_privateChao Yu6-23/+36
[ Upstream commit 240a59156d9bcfabceddb66be449e7b32fb5dc4a ] As Gao Xiang reported in bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202749 f2fs may skip pageout() due to incorrect page reference count. The problem here is that MM defined the rule [1] very clearly that once page was set with PG_private flag, we should increment the refcount in that page, also main flows like pageout(), migrate_page() will assume there is one additional page reference count if page_has_private() returns true. But currently, f2fs won't add/del refcount when changing PG_private flag. Anyway, f2fs should follow MM's rule to make MM's related flows running as expected. [1] https://lore.kernel.org/lkml/2b19b3c4-2bc4-15fa-15cc-27a13e5c7af1@aol.com/ Reported-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-20f2fs: fix to use kvfree instead of kzfreeChao Yu1-5/+5
[ Upstream commit 2a6a7e722e7a78d774ce02b847c5b183a3ff2672 ] As Jiqun Li reported in bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202747 System can panic due to using wrong allocate/free function pair in xattr interface: - use kvmalloc to allocate memory - use kzfree to free memory Let's fix to use kvfree instead of kzfree, BTW, we are safe to get rid of kzfree, since there is no such confidential data stored as xattr, we don't need to zero it before free memory. Fixes: 5222595d093e ("f2fs: use kvmalloc, if kmalloc is failed") Reported-by: Jiqun Li <jiqun.li@unisoc.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-20f2fs: fix to dirty inode for i_mode recoveryChao Yu1-4/+1
[ Upstream commit ca597bddedd94906cd761d8be6a3ad21292725de ] As Seulbae Kim reported in bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202637 We didn't recover permission field correctly after sudden power-cut, the reason is in setattr we didn't add inode into global dirty list once i_mode is changed, so latter checkpoint triggered by fsync will not flush last i_mode into disk, result in this problem, fix it. Reported-by: Seulbae Kim <seulbae@gatech.edu> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-20cifs: return -ENODATA when deleting an xattr that does not existRonnie Sahlberg1-0/+10
[ Upstream commit 2109464184919f81efd593b4008291448c522815 ] BUGZILLA: https://bugzilla.kernel.org/show_bug.cgi?id=202007 When deleting an xattr/EA: SMB2/3 servers will return SUCCESS when clients delete non-existing EAs. This means that we need to first QUERY the server and check if the EA exists or not so that we can return -ENODATA correctly when this happens. Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-20f2fs: sync filesystem after roll-forward recoveryJaegeuk Kim3-14/+39
[ Upstream commit 812a95977fd2f0d1f220c716a98a7f22e22f488d ] Some works after roll-forward recovery can get an error which will release all the data structures. Let's flush them in order to make it clean. One possible corruption came from: [ 90.400500] list_del corruption. prev->next should be ffffffed1f566208, but was (null) [ 90.675349] Call trace: [ 90.677869] __list_del_entry_valid+0x94/0xb4 [ 90.682351] remove_dirty_inode+0xac/0x114 [ 90.686563] __f2fs_write_data_pages+0x6a8/0x6c8 [ 90.691302] f2fs_write_data_pages+0x40/0x4c [ 90.695695] do_writepages+0x80/0xf0 [ 90.699372] __writeback_single_inode+0xdc/0x4ac [ 90.704113] writeback_sb_inodes+0x280/0x440 [ 90.708501] wb_writeback+0x1b8/0x3d0 [ 90.712267] wb_workfn+0x1a8/0x4d4 [ 90.715765] process_one_work+0x1c0/0x3d4 [ 90.719883] worker_thread+0x224/0x344 [ 90.723739] kthread+0x120/0x130 [ 90.727055] ret_from_fork+0x10/0x18 Reported-by: Sahitya Tummala <stummala@codeaurora.org> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-20ext4: prohibit fstrim in norecovery modeDarrick J. Wong1-0/+7
[ Upstream commit 18915b5873f07e5030e6fb108a050fa7c71c59fb ] The ext4 fstrim implementation uses the block bitmaps to find free space that can be discarded. If we haven't replayed the journal, the bitmaps will be stale and we absolutely *cannot* use stale metadata to zap the underlying storage. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-20x86/gart: Exclude GART aperture from kcoreKairui Song1-0/+27
[ Upstream commit ffc8599aa9763f39f6736a79da4d1575e7006f9a ] On machines where the GART aperture is mapped over physical RAM, /proc/kcore contains the GART aperture range. Accessing the GART range via /proc/kcore results in a kernel crash. vmcore used to have the same issue, until it was fixed with commit 2a3e83c6f96c ("x86/gart: Exclude GART aperture from vmcore")', leveraging existing hook infrastructure in vmcore to let /proc/vmcore return zeroes when attempting to read the aperture region, and so it won't read from the actual memory. Apply the same workaround for kcore. First implement the same hook infrastructure for kcore, then reuse the hook functions introduced in the previous vmcore fix. Just with some minor adjustment, rename some functions for more general usage, and simplify the hook infrastructure a bit as there is no module usage yet. Suggested-by: Baoquan He <bhe@redhat.com> Signed-off-by: Kairui Song <kasong@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jiri Bohac <jbohac@suse.cz> Acked-by: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Omar Sandoval <osandov@fb.com> Cc: Dave Young <dyoung@redhat.com> Link: https://lkml.kernel.org/r/20190308030508.13548-1-kasong@redhat.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-20cifs: Fix slab-out-of-bounds when tracing SMB tconPaulo Alcantara (SUSE)1-3/+3
[ Upstream commit 68ddb496800acdb46172b4981dc3753ea9b39c25 ] This patch fixes the following KASAN report: [ 779.044746] BUG: KASAN: slab-out-of-bounds in string+0xab/0x180 [ 779.044750] Read of size 1 at addr ffff88814f327968 by task trace-cmd/2812 [ 779.044756] CPU: 1 PID: 2812 Comm: trace-cmd Not tainted 5.1.0-rc1+ #62 [ 779.044760] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-0-ga698c89-prebuilt.qemu.org 04/01/2014 [ 779.044761] Call Trace: [ 779.044769] dump_stack+0x5b/0x90 [ 779.044775] ? string+0xab/0x180 [ 779.044781] print_address_description+0x6c/0x23c [ 779.044787] ? string+0xab/0x180 [ 779.044792] ? string+0xab/0x180 [ 779.044797] kasan_report.cold.3+0x1a/0x32 [ 779.044803] ? string+0xab/0x180 [ 779.044809] string+0xab/0x180 [ 779.044816] ? widen_string+0x160/0x160 [ 779.044822] ? vsnprintf+0x5bf/0x7f0 [ 779.044829] vsnprintf+0x4e7/0x7f0 [ 779.044836] ? pointer+0x4a0/0x4a0 [ 779.044841] ? seq_buf_vprintf+0x79/0xc0 [ 779.044848] seq_buf_vprintf+0x62/0xc0 [ 779.044855] trace_seq_printf+0x113/0x210 [ 779.044861] ? trace_seq_puts+0x110/0x110 [ 779.044867] ? trace_raw_output_prep+0xd8/0x110 [ 779.044876] trace_raw_output_smb3_tcon_class+0x9f/0xc0 [ 779.044882] print_trace_line+0x377/0x890 [ 779.044888] ? tracing_buffers_read+0x300/0x300 [ 779.044893] ? ring_buffer_read+0x58/0x70 [ 779.044899] s_show+0x6e/0x140 [ 779.044906] seq_read+0x505/0x6a0 [ 779.044913] vfs_read+0xaf/0x1b0 [ 779.044919] ksys_read+0xa1/0x130 [ 779.044925] ? kernel_write+0xa0/0xa0 [ 779.044931] ? __do_page_fault+0x3d5/0x620 [ 779.044938] do_syscall_64+0x63/0x150 [ 779.044944] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 779.044949] RIP: 0033:0x7f62c2c2db31 [ 779.044955] Code: fe ff ff 48 8d 3d 17 9e 09 00 48 83 ec 08 e8 96 02 02 00 66 0f 1f 44 00 00 8b 05 fa fc 2c 00 48 63 ff 85 c0 75 13 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 57 f3 c3 0f 1f 44 00 00 55 53 48 89 d5 48 89 [ 779.044958] RSP: 002b:00007ffd6e116678 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [ 779.044964] RAX: ffffffffffffffda RBX: 0000560a38be9260 RCX: 00007f62c2c2db31 [ 779.044966] RDX: 0000000000002000 RSI: 00007ffd6e116710 RDI: 0000000000000003 [ 779.044966] RDX: 0000000000002000 RSI: 00007ffd6e116710 RDI: 0000000000000003 [ 779.044969] RBP: 00007f62c2ef5420 R08: 0000000000000000 R09: 0000000000000003 [ 779.044972] R10: ffffffffffffffa8 R11: 0000000000000246 R12: 00007ffd6e116710 [ 779.044975] R13: 0000000000002000 R14: 0000000000000d68 R15: 0000000000002000 [ 779.044981] Allocated by task 1257: [ 779.044987] __kasan_kmalloc.constprop.5+0xc1/0xd0 [ 779.044992] kmem_cache_alloc+0xad/0x1a0 [ 779.044997] getname_flags+0x6c/0x2a0 [ 779.045003] user_path_at_empty+0x1d/0x40 [ 779.045008] do_faccessat+0x12a/0x330 [ 779.045012] do_syscall_64+0x63/0x150 [ 779.045017] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 779.045019] Freed by task 1257: [ 779.045023] __kasan_slab_free+0x12e/0x180 [ 779.045029] kmem_cache_free+0x85/0x1b0 [ 779.045034] filename_lookup.part.70+0x176/0x250 [ 779.045039] do_faccessat+0x12a/0x330 [ 779.045043] do_syscall_64+0x63/0x150 [ 779.045048] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 779.045052] The buggy address belongs to the object at ffff88814f326600 which belongs to the cache names_cache of size 4096 [ 779.045057] The buggy address is located 872 bytes to the right of 4096-byte region [ffff88814f326600, ffff88814f327600) [ 779.045058] The buggy address belongs to the page: [ 779.045062] page:ffffea00053cc800 count:1 mapcount:0 mapping:ffff88815b191b40 index:0x0 compound_mapcount: 0 [ 779.045067] flags: 0x200000000010200(slab|head) [ 779.045075] raw: 0200000000010200 dead000000000100 dead000000000200 ffff88815b191b40 [ 779.045081] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000 [ 779.045083] page dumped because: kasan: bad access detected [ 779.045085] Memory state around the buggy address: [ 779.045089] ffff88814f327800: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 779.045093] ffff88814f327880: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 779.045097] >ffff88814f327900: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 779.045099] ^ [ 779.045103] ffff88814f327980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 779.045107] ffff88814f327a00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 779.045109] ================================================================== [ 779.045110] Disabling lock debugging due to kernel taint Correctly assign tree name str for smb3_tcon event. Signed-off-by: Paulo Alcantara (SUSE) <paulo@paulo.ac> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-20fix incorrect error code mapping for OBJECTID_NOT_FOUNDSteve French1-1/+2
[ Upstream commit 85f9987b236cf46e06ffdb5c225cf1f3c0acb789 ] It was mapped to EIO which can be confusing when user space queries for an object GUID for an object for which the server file system doesn't support (or hasn't saved one). As Amir Goldstein suggested this is similar to ENOATTR (equivalently ENODATA in Linux errno definitions) so changing NT STATUS code mapping for OBJECTID_NOT_FOUND to ENODATA. Signed-off-by: Steve French <stfrench@microsoft.com> CC: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-20cifs: fix that return -EINVAL when do dedupe operationXiaoli Feng1-1/+1
[ Upstream commit b073a08016a10f01dfb0d0b6c7fa89da0d544963 ] dedupe_file_range operations is combiled into remap_file_range. But it's always skipped for dedupe operations in function cifs_remap_file_range. Example to test: Before this patch: # dd if=/dev/zero of=cifs/file bs=1M count=1 # xfs_io -c "dedupe cifs/file 4k 64k 4k" cifs/file XFS_IOC_FILE_EXTENT_SAME: Invalid argument After this patch: # dd if=/dev/zero of=cifs/file bs=1M count=1 # xfs_io -c "dedupe cifs/file 4k 64k 4k" cifs/file XFS_IOC_FILE_EXTENT_SAME: Operation not supported Influence for xfstests: generic/091 generic/112 generic/127 generic/263 These tests report this error "do_copy_range:: Invalid argument" instead of "FIDEDUPERANGE: Invalid argument". Because there are still two bugs cause these test failed. https://bugzilla.kernel.org/show_bug.cgi?id=202935 https://bugzilla.kernel.org/show_bug.cgi?id=202785 Signed-off-by: Xiaoli Feng <fengxiaoli0714@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>