summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2018-05-23ext4: set h_journal if there is a failure starting a reserved handleTheodore Ts'o1-0/+1
[ Upstream commit b2569260d55228b617bd82aba6d0db2faeeb4116 ] If ext4 tries to start a reserved handle via jbd2_journal_start_reserved(), and the journal has been aborted, this can result in a NULL pointer dereference. This is because the fields h_journal and h_transaction in the handle structure share the same memory, via a union, so jbd2_journal_start_reserved() will clear h_journal before calling start_this_handle(). If this function fails due to an aborted handle, h_journal will still be NULL, and the call to jbd2_journal_free_reserved() will pass a NULL journal to sub_reserve_credits(). This can be reproduced by running "kvm-xfstests -c dioread_nolock generic/475". Cc: stable@kernel.org # 3.11 Fixes: 8f7d89f36829b ("jbd2: transaction reservation support") Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
2018-05-23jbd2: fix use after free in kjournald2()Sahitya Tummala1-1/+1
[ Upstream commit dbfcef6b0f4012c57bc0b6e0e660d5ed12a5eaed ] Below is the synchronization issue between unmount and kjournald2 contexts, which results into use after free issue in kjournald2(). Fix this issue by using journal->j_state_lock to synchronize the wait_event() done in journal_kill_thread() and the wake_up() done in kjournald2(). TASK 1: umount cmd: |--jbd2_journal_destroy() { |--journal_kill_thread() { write_lock(&journal->j_state_lock); journal->j_flags |= JBD2_UNMOUNT; ... write_unlock(&journal->j_state_lock); wake_up(&journal->j_wait_commit); TASK 2 wakes up here: kjournald2() { ... checks JBD2_UNMOUNT flag and calls goto end-loop; ... end_loop: write_unlock(&journal->j_state_lock); journal->j_task = NULL; --> If this thread gets pre-empted here, then TASK 1 wait_event will exit even before this thread is completely done. wait_event(journal->j_wait_done_commit, journal->j_task == NULL); ... write_lock(&journal->j_state_lock); write_unlock(&journal->j_state_lock); } |--kfree(journal); } } wake_up(&journal->j_wait_done_commit); --> this step now results into use after free issue. } Signed-off-by: Sahitya Tummala <stummala@codeaurora.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
2018-05-23cifs: do not allow creating sockets except with SMB1 posix exensionsSteve French1-4/+5
[ Upstream commit 1d0cffa674cfa7d185a302c8c6850fc50b893bed ] RHBZ: 1453123 Since at least the 3.10 kernel and likely a lot earlier we have not been able to create unix domain sockets in a cifs share when mounted using the SFU mount option (except when mounted with the cifs unix extensions to Samba e.g.) Trying to create a socket, for example using the af_unix command from xfstests will cause : BUG: unable to handle kernel NULL pointer dereference at 00000000 00000040 Since no one uses or depends on being able to create unix domains sockets on a cifs share the easiest fix to stop this vulnerability is to simply not allow creation of any other special files than char or block devices when sfu is used. Added update to Ronnie's patch to handle a tcon link leak, and to address a buf leak noticed by Gustavo and Colin. Acked-by: Gustavo A. R. Silva <gustavo@embeddedor.com> CC: Colin Ian King <colin.king@canonical.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Reported-by: Eryu Guan <eguan@redhat.com> Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
2018-05-23fanotify: fix logic of events on childAmir Goldstein1-19/+15
[ Upstream commit 54a307ba8d3cd00a3902337ffaae28f436eeb1a4 ] When event on child inodes are sent to the parent inode mark and parent inode mark was not marked with FAN_EVENT_ON_CHILD, the event will not be delivered to the listener process. However, if the same process also has a mount mark, the event to the parent inode will be delivered regadless of the mount mark mask. This behavior is incorrect in the case where the mount mark mask does not contain the specific event type. For example, the process adds a mark on a directory with mask FAN_MODIFY (without FAN_EVENT_ON_CHILD) and a mount mark with mask FAN_CLOSE_NOWRITE (without FAN_ONDIR). A modify event on a file inside that directory (and inside that mount) should not create a FAN_MODIFY event, because neither of the marks requested to get that event on the file. Fixes: 1968f5eed54c ("fanotify: use both marks when possible") Cc: stable <stable@vger.kernel.org> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
2018-05-23ext4: bugfix for mmaped pages in mpage_release_unused_pages()wangguang1-0/+2
[ Upstream commit 4e800c0359d9a53e6bf0ab216954971b2515247f ] Pages clear buffers after ext4 delayed block allocation failed, However, it does not clean its pte_dirty flag. if the pages unmap ,in cording to the pte_dirty , unmap_page_range may try to call __set_page_dirty, which may lead to the bugon at mpage_prepare_extent_to_map:head = page_buffers(page);. This patch just call clear_page_dirty_for_io to clean pte_dirty at mpage_release_unused_pages for pages mmaped. Steps to reproduce the bug: (1) mmap a file in ext4 addr = (char *)mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0); memset(addr, 'i', 4096); (2) return EIO at ext4_writepages->mpage_map_and_submit_extent->mpage_map_one_extent which causes this log message to be print: ext4_msg(sb, KERN_CRIT, "Delayed block allocation failed for " "inode %lu at logical offset %llu with" " max blocks %u with error %d", inode->i_ino, (unsigned long long)map->m_lblk, (unsigned)map->m_len, -err); (3)Unmap the addr cause warning at __set_page_dirty:WARN_ON_ONCE(warn && !PageUptodate(page)); (4) wait for a minute,then bugon happen. Cc: stable@vger.kernel.org Signed-off-by: wangguang <wangguang03@zte.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
2018-05-23autofs: mount point create should honour passed in modeIan Kent1-1/+1
[ Upstream commit 1e6306652ba18723015d1b4967fe9de55f042499 ] The autofs file system mkdir inode operation blindly sets the created directory mode to S_IFDIR | 0555, ingoring the passed in mode, which can cause selinux dac_override denials. But the function also checks if the caller is the daemon (as no-one else should be able to do anything here) so there's no point in not honouring the passed in mode, allowing the daemon to set appropriate mode when required. Link: http://lkml.kernel.org/r/152361593601.8051.14014139124905996173.stgit@pluto.themaw.net Signed-off-by: Ian Kent <raven@themaw.net> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
2018-05-23Don't leak MNT_INTERNAL away from internal mountsAl Viro1-1/+2
[ Upstream commit 16a34adb9392b2fe4195267475ab5b472e55292c ] We want it only for the stuff created by SB_KERNMOUNT mounts, *not* for their copies. As it is, creating a deep stack of bindings of /proc/*/ns/* somewhere in a new namespace and exiting yields a stack overflow. Cc: stable@kernel.org Reported-by: Alexander Aring <aring@mojatatu.com> Bisected-by: Kirill Tkhai <ktkhai@virtuozzo.com> Tested-by: Kirill Tkhai <ktkhai@virtuozzo.com> Tested-by: Alexander Aring <aring@mojatatu.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
2018-05-23jffs2_kill_sb(): deal with failed allocationsAl Viro1-1/+1
[ Upstream commit c66b23c2840446a82c389e4cb1a12eb2a71fa2e4 ] jffs2_fill_super() might fail to allocate jffs2_sb_info; jffs2_kill_sb() must survive that. Cc: stable@kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
2018-05-23ext4: don't allow r/w mounts if metadata blocks overlap the superblockTheodore Ts'o1-0/+6
[ Upstream commit 18db4b4e6fc31eda838dd1c1296d67dbcb3dc957 ] If some metadata block, such as an allocation bitmap, overlaps the superblock, it's very likely that if the file system is mounted read/write, the results will not be pretty. So disallow r/w mounts for file systems corrupted in this particular way. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
2018-05-23jbd2: if the journal is aborted then don't allow update of the log tailTheodore Ts'o1-1/+4
[ Upstream commit 85e0c4e89c1b864e763c4e3bb15d0b6d501ad5d9 ] This updates the jbd2 superblock unnecessarily, and on an abort we shouldn't truncate the log. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
2018-05-23fs/reiserfs/journal.c: add missing resierfs_warning() argAndrew Morton1-1/+1
[ Upstream commit 9ad553abe66f8be3f4755e9fa0a6ba137ce76341 ] One use of the reiserfs_warning() macro in journal_init_dev() is missing a parameter, causing the following warning: REISERFS warning (device loop0): journal_init_dev: Cannot open '%s': %i journal_init_dev: This also causes a WARN_ONCE() warning in the vsprintf code, and then a panic if panic_on_warn is set. Please remove unsupported %/ in format string WARNING: CPU: 1 PID: 4480 at lib/vsprintf.c:2138 format_decode+0x77f/0x830 lib/vsprintf.c:2138 Kernel panic - not syncing: panic_on_warn set ... Just add another string argument to the macro invocation. Addresses https://syzkaller.appspot.com/bug?id=0627d4551fdc39bf1ef5d82cd9eef587047f7718 Link: http://lkml.kernel.org/r/d678ebe1-6f54-8090-df4c-b9affad62293@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: <syzbot+6bd77b88c1977c03f584@syzkaller.appspotmail.com> Tested-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Jeff Mahoney <jeffm@suse.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Jan Kara <jack@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
2018-05-23ubifs: Check ubifs_wbuf_sync() return codeRichard Weinberger1-4/+10
[ Upstream commit aac17948a7ce01fb60b9ee6cf902967a47b3ce26 ] If ubifs_wbuf_sync() fails we must not write a master node with the dirty marker cleared. Otherwise it is possible that in case of an IO error while syncing we mark the filesystem as clean and UBIFS refuses to recover upon next mount. Cc: <stable@vger.kernel.org> Fixes: 1e51764a3c2a ("UBIFS: add new flash file system") Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
2018-05-23getname_kernel() needs to make sure that ->name != ->iname in long caseAl Viro1-1/+2
[ Upstream commit 30ce4d1903e1d8a7ccd110860a5eef3c638ed8be ] missed it in "kill struct filename.separate" several years ago. Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
2018-05-23ovl: filter trusted xattr for non-adminMiklos Szeredi1-1/+11
[ Upstream commit a082c6f680da298cf075886ff032f32ccb7c5e1a ] Filesystems filter out extended attributes in the "trusted." domain for unprivlieged callers. Overlay calls underlying filesystem's method with elevated privs, so need to do the filtering in overlayfs too. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
2018-05-23ext4: fix off-by-one on max nr_pages in ext4_find_unwritten_pgoff()Eryu Guan1-1/+1
[ Upstream commit 624327f8794704c5066b11a52f9da6a09dce7f9a ] ext4_find_unwritten_pgoff() is used to search for offset of hole or data in page range [index, end] (both inclusive), and the max number of pages to search should be at least one, if end == index. Otherwise the only page is missed and no hole or data is found, which is not correct. When block size is smaller than page size, this can be demonstrated by preallocating a file with size smaller than page size and writing data to the last block. E.g. run this xfs_io command on a 1k block size ext4 on x86_64 host. # xfs_io -fc "falloc 0 3k" -c "pwrite 2k 1k" \ -c "seek -d 0" /mnt/ext4/testfile wrote 1024/1024 bytes at offset 2048 1 KiB, 1 ops; 0.0000 sec (42.459 MiB/sec and 43478.2609 ops/sec) Whence Result DATA EOF Data at offset 2k was missed, and lseek(2) returned ENXIO. This is unconvered by generic/285 subtest 07 and 08 on ppc64 host, where pagesize is 64k. Because a recent change to generic/285 reduced the preallocated file size to smaller than 64k. Signed-off-by: Eryu Guan <eguan@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
2018-05-23pNFS/flexfiles: missing error code in ff_layout_alloc_lseg()Dan Carpenter1-0/+1
[ Upstream commit 662f9a105b4322b8559d448f86110e6ec24b8738 ] If xdr_inline_decode() fails then we end up returning ERR_PTR(0). The caller treats NULL returns as -ENOMEM so it doesn't really hurt runtime, but obviously we intended to set an error code here. Fixes: d67ae825a59d ("pnfs/flexfiles: Add the FlexFile Layout Driver") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
2018-05-23btrfs: fix incorrect error return ret being passed to mapping_set_errorColin Ian King1-1/+1
[ Upstream commit bff5baf8aa37a97293725a16c03f49872249c07e ] The setting of return code ret should be based on the error code passed into function end_extent_writepage and not on ret. Thanks to Liu Bo for spotting this mistake in the original fix I submitted. Detected by CoverityScan, CID#1414312 ("Logically dead code") Fixes: 5dca6eea91653e ("Btrfs: mark mapping with error flag to report errors to userspace") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
2018-05-23SMB2: Fix share type handlingChristophe JAILLET1-5/+9
[ Upstream commit cd1230070ae1c12fd34cf6a557bfa81bf9311009 ] In fs/cifs/smb2pdu.h, we have: #define SMB2_SHARE_TYPE_DISK 0x01 #define SMB2_SHARE_TYPE_PIPE 0x02 #define SMB2_SHARE_TYPE_PRINT 0x03 Knowing that, with the current code, the SMB2_SHARE_TYPE_PRINT case can never trigger and printer share would be interpreted as disk share. So, test the ShareType value for equality instead. Fixes: faaf946a7d5b ("CIFS: Add tree connect/disconnect capability for SMB2") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <smfrench@gmail.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
2018-05-23CIFS: silence lockdep splat in cifs_relock_file()Rabin Vincent1-1/+1
[ Upstream commit 560d388950ceda5e7c7cdef7f3d9a8ff297bbf9d ] cifs_relock_file() can perform a down_write() on the inode's lock_sem even though it was already performed in cifs_strict_readv(). Lockdep complains about this. AFAICS, there is no problem here, and lockdep just needs to be told that this nesting is OK. ============================================= [ INFO: possible recursive locking detected ] 4.11.0+ #20 Not tainted --------------------------------------------- cat/701 is trying to acquire lock: (&cifsi->lock_sem){++++.+}, at: cifs_reopen_file+0x7a7/0xc00 but task is already holding lock: (&cifsi->lock_sem){++++.+}, at: cifs_strict_readv+0x177/0x310 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&cifsi->lock_sem); lock(&cifsi->lock_sem); *** DEADLOCK *** May be due to missing lock nesting notation 1 lock held by cat/701: #0: (&cifsi->lock_sem){++++.+}, at: cifs_strict_readv+0x177/0x310 stack backtrace: CPU: 0 PID: 701 Comm: cat Not tainted 4.11.0+ #20 Call Trace: dump_stack+0x85/0xc2 __lock_acquire+0x17dd/0x2260 ? trace_hardirqs_on_thunk+0x1a/0x1c ? preempt_schedule_irq+0x6b/0x80 lock_acquire+0xcc/0x260 ? lock_acquire+0xcc/0x260 ? cifs_reopen_file+0x7a7/0xc00 down_read+0x2d/0x70 ? cifs_reopen_file+0x7a7/0xc00 cifs_reopen_file+0x7a7/0xc00 ? printk+0x43/0x4b cifs_readpage_worker+0x327/0x8a0 cifs_readpage+0x8c/0x2a0 generic_file_read_iter+0x692/0xd00 cifs_strict_readv+0x29f/0x310 generic_file_splice_read+0x11c/0x1c0 do_splice_to+0xa5/0xc0 splice_direct_to_actor+0xfa/0x350 ? generic_pipe_buf_nosteal+0x10/0x10 do_splice_direct+0xb5/0xe0 do_sendfile+0x278/0x3a0 SyS_sendfile64+0xc4/0xe0 entry_SYSCALL_64_fastpath+0x1f/0xbe Signed-off-by: Rabin Vincent <rabinv@axis.com> Acked-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <smfrench@gmail.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
2018-05-23lockd: fix lockd shutdown raceJ. Bruce Fields1-2/+4
[ Upstream commit efda760fe95ea15291853c8fa9235c32d319cd98 ] As reported by David Jeffery: "a signal was sent to lockd while lockd was shutting down from a request to stop nfs. The signal causes lockd to call restart_grace() which puts the lockd_net structure on the grace list. If this signal is received at the wrong time, it will occur after lockd_down_net() has called locks_end_grace() but before lockd_down_net() stops the lockd thread. This leads to lockd putting the lockd_net structure back on the grace list, then exiting without anything removing it from the list." So, perform the final locks_end_grace() from the the lockd thread; this ensures it's serialized with respect to restart_grace(). Reported-by: David Jeffery <djeffery@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
2018-05-23NFSv4.1: RECLAIM_COMPLETE must handle NFS4ERR_CONN_NOT_BOUND_TO_SESSIONTrond Myklebust2-4/+13
[ Upstream commit 0048fdd06614a4ea088f9fcad11511956b795698 ] If the server returns NFS4ERR_CONN_NOT_BOUND_TO_SESSION because we are trunking, then RECLAIM_COMPLETE must handle that by calling nfs4_schedule_session_recovery() and then retrying. Reported-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Tested-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
2018-05-23fs: compat: Remove warning from COMPATIBLE_IOCTLMark Charlebois1-1/+1
[ Upstream commit 9280cdd6fe5b8287a726d24cc1d558b96c8491d7 ] cmd in COMPATIBLE_IOCTL is always a u32, so cast it so there isn't a warning about an overflow in XFORM. From: Mark Charlebois <charlebm@gmail.com> Signed-off-by: Mark Charlebois <charlebm@gmail.com> Signed-off-by: Behan Webster <behanw@converseincode.com> Signed-off-by: Matthias Kaehlcke <mka@chromium.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
2018-05-23staging: ncpfs: memory corruption in ncp_read_kernel()Dan Carpenter1-0/+4
[ Upstream commit 4c41aa24baa4ed338241d05494f2c595c885af8f ] If the server is malicious then *bytes_read could be larger than the size of the "target" buffer. It would lead to memory corruption when we do the memcpy(). Reported-by: Dr Silvio Cesare of InfoSect <Silvio Cesare <silvio.cesare@gmail.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
2018-05-23nfsd4: permit layoutget of executable-only filesBenjamin Coddington1-3/+3
[ Upstream commit 66282ec1cf004c09083c29cb5e49019037937bbd ] Clients must be able to read a file in order to execute it, and for pNFS that means the client needs to be able to perform a LAYOUTGET on the file. This behavior for executable-only files was added for OPEN in commit a043226bc140 "nfsd4: permit read opens of executable-only files". This fixes up xfstests generic/126 on block/scsi layouts. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
2018-05-23cifs: small underflow in cnvrtDosUnixTm()Dan Carpenter1-3/+3
[ Upstream commit 564277eceeca01e02b1ef3e141cfb939184601b4 ] January is month 1. There is no zero-th month. If someone passes a zero month then it means we read from one space before the start of the total_days_of_prev_months[] array. We may as well also be strict about days as well. Fixes: 1bd5bbcb6531 ("[CIFS] Legacy time handling for Win9x and OS/2 part 1") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Steve French <smfrench@gmail.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
2018-05-23Btrfs: send, fix file hole not being preserved due to inline extentFilipe Manana1-2/+21
[ Upstream commit e1cbfd7bf6dabdac561c75d08357571f44040a45 ] Normally we don't have inline extents followed by regular extents, but there's currently at least one harmless case where this happens. For example, when the page size is 4Kb and compression is enabled: $ mkfs.btrfs -f /dev/sdb $ mount -o compress /dev/sdb /mnt $ xfs_io -f -c "pwrite -S 0xaa 0 4K" -c "fsync" /mnt/foobar $ xfs_io -c "pwrite -S 0xbb 8K 4K" -c "fsync" /mnt/foobar In this case we get a compressed inline extent, representing 4Kb of data, followed by a hole extent and then a regular data extent. The inline extent was not expanded/converted to a regular extent exactly because it represents 4Kb of data. This does not cause any apparent problem (such as the issue solved by commit e1699d2d7bf6 ("btrfs: add missing memset while reading compressed inline extents")) except trigger an unexpected case in the incremental send code path that makes us issue an operation to write a hole when it's not needed, resulting in more writes at the receiver and wasting space at the receiver. So teach the incremental send code to deal with this particular case. The issue can be currently triggered by running fstests btrfs/137 with compression enabled (MOUNT_OPTIONS="-o compress" ./check btrfs/137). Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
2018-05-23NFS: don't try to cross a mountpount when there isn't one there.NeilBrown1-4/+20
[ Upstream commit 99bbf6ecc694dfe0b026e15359c5aa2a60b97a93 ] consider the sequence of commands: mkdir -p /import/nfs /import/bind /import/etc mount --bind / /import/bind mount --make-private /import/bind mount --bind /import/etc /import/bind/etc exportfs -o rw,no_root_squash,crossmnt,async,no_subtree_check localhost:/ mount -o vers=4 localhost:/ /import/nfs ls -l /import/nfs/etc You would not expect this to report a stale file handle. Yet it does. The manipulations under /import/bind cause the dentry for /etc to get the DCACHE_MOUNTED flag set, even though nothing is mounted on /etc. This causes nfsd to call nfsd_cross_mnt() even though there is no mountpoint. So an upcall to mountd for "/etc" is performed. The 'crossmnt' flag on the export of / causes mountd to report that /etc is exported as it is a descendant of /. It assumes the kernel wouldn't ask about something that wasn't a mountpoint. The filehandle returned identifies the filesystem and the inode number of /etc. When this filehandle is presented to rpc.mountd, via "nfsd.fh", the inode cannot be found associated with any name in /etc/exports, or with any mountpoint listed by getmntent(). So rpc.mountd says the filehandle doesn't exist. Hence ESTALE. This is fixed by teaching nfsd not to trust DCACHE_MOUNTED too much. It is just a hint, not a guarantee. Change nfsd_mountpoint() to return '1' for a certain mountpoint, '2' for a possible mountpoint, and 0 otherwise. Then change nfsd_crossmnt() to check if follow_down() actually found a mountpount and, if not, to avoid performing a lookup if the location is not known to certainly require an export-point. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
2018-05-23NFS: Fix missing pg_cleanup after nfs_pageio_cond_complete()Benjamin Coddington1-2/+4
[ Upstream commit 43b7d964ed30dbca5c83c90cb010985b429ec4f9 ] Commit a7d42ddb3099727f58366fa006f850a219cce6c8 ("nfs: add mirroring support to pgio layer") moved pg_cleanup out of the path when there was non-sequental I/O that needed to be flushed. The result is that for layouts that have more than one layout segment per file, the pg_lseg is not cleared, so we can end up hitting the WARN_ON_ONCE(req_start >= seg_end) in pnfs_generic_pg_test since the pg_lseg will be pointing to that previously-flushed layout segment. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Fixes: a7d42ddb3099 ("nfs: add mirroring support to pgio layer") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
2018-05-23CIFS: Enable encryption during session setup phasePavel Shilovsky2-19/+11
[ Upstream commit cabfb3680f78981d26c078a26e5c748531257ebb ] In order to allow encryption on SMB connection we need to exchange a session key and generate encryption and decryption keys. Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
2018-05-23SMB3: Validate negotiate request must always be signedSteve French1-0/+3
[ Upstream commit 4587eee04e2ac7ac3ac9fa2bc164fb6e548f99cd ] According to MS-SMB2 3.2.55 validate_negotiate request must always be signed. Some Windows can fail the request if you send it unsigned See kernel bugzilla bug 197311 CC: Stable <stable@vger.kernel.org> Acked-by: Ronnie Sahlberg <lsahlber.redhat.com> Signed-off-by: Steve French <smfrench@gmail.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
2018-05-23btrfs: alloc_chunk: fix DUP stripe size handlingHans van Kranenburg1-5/+6
[ Upstream commit 92e222df7b8f05c565009c7383321b593eca488b ] In case of using DUP, we search for enough unallocated disk space on a device to hold two stripes. The devices_info[ndevs-1].max_avail that holds the amount of unallocated space found is directly assigned to stripe_size, while it's actually twice the stripe size. Later on in the code, an unconditional division of stripe_size by dev_stripes corrects the value, but in the meantime there's a check to see if the stripe_size does not exceed max_chunk_size. Since during this check stripe_size is twice the amount as intended, the check will reduce the stripe_size to max_chunk_size if the actual correct to be used stripe_size is more than half the amount of max_chunk_size. The unconditional division later tries to correct stripe_size, but will actually make sure we can't allocate more than half the max_chunk_size. Fix this by moving the division by dev_stripes before the max chunk size check, so it always contains the right value, instead of putting a duct tape division in further on to get it fixed again. Since in all other cases than DUP, dev_stripes is 1, this change only affects DUP. Other attempts in the past were made to fix this: * 37db63a400 "Btrfs: fix max chunk size check in chunk allocator" tried to fix the same problem, but still resulted in part of the code acting on a wrongly doubled stripe_size value. * 86db25785a "Btrfs: fix max chunk size on raid5/6" unintentionally broke this fix again. The real problem was already introduced with the rest of the code in 73c5de0051. The user visible result however will be that the max chunk size for DUP will suddenly double, while it's actually acting according to the limits in the code again like it was 5 years ago. Reported-by: Naohiro Aota <naohiro.aota@wdc.com> Link: https://www.spinics.net/lists/linux-btrfs/msg69752.html Fixes: 73c5de0051 ("btrfs: quasi-round-robin for chunk allocation") Fixes: 86db25785a ("Btrfs: fix max chunk size on raid5/6") Signed-off-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com> Reviewed-by: David Sterba <dsterba@suse.com> [ update comment ] Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
2018-05-23fs/aio: Use RCU accessors for kioctx_table->table[]Tejun Heo1-10/+11
[ Upstream commit d0264c01e7587001a8c4608a5d1818dba9a4c11a ] While converting ioctx index from a list to a table, db446a08c23d ("aio: convert the ioctx list to table lookup v3") missed tagging kioctx_table->table[] as an array of RCU pointers and using the appropriate RCU accessors. This introduces a small window in the lookup path where init and access may race. Mark kioctx_table->table[] with __rcu and use the approriate RCU accessors when using the field. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Jann Horn <jannh@google.com> Fixes: db446a08c23d ("aio: convert the ioctx list to table lookup v3") Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: stable@vger.kernel.org # v3.12+ Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
2018-05-23fs/aio: Add explicit RCU grace period when freeing kioctxTejun Heo1-4/+19
[ Upstream commit a6d7cff472eea87d96899a20fa718d2bab7109f3 ] While fixing refcounting, e34ecee2ae79 ("aio: Fix a trinity splat") incorrectly removed explicit RCU grace period before freeing kioctx. The intention seems to be depending on the internal RCU grace periods of percpu_ref; however, percpu_ref uses a different flavor of RCU, sched-RCU. This can lead to kioctx being freed while RCU read protected dereferences are still in progress. Fix it by updating free_ioctx() to go through call_rcu() explicitly. v2: Comment added to explain double bouncing. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Jann Horn <jannh@google.com> Fixes: e34ecee2ae79 ("aio: Fix a trinity splat") Cc: Kent Overstreet <kent.overstreet@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: stable@vger.kernel.org # v3.13+ Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
2018-05-23lock_parent() needs to recheck if dentry got __dentry_kill'ed under itAl Viro1-3/+8
[ Upstream commit 3b821409632ab778d46e807516b457dfa72736ed ] In case when dentry passed to lock_parent() is protected from freeing only by the fact that it's on a shrink list and trylock of parent fails, we could get hit by __dentry_kill() (and subsequent dentry_kill(parent)) between unlocking dentry and locking presumed parent. We need to recheck that dentry is alive once we lock both it and parent *and* postpone rcu_read_unlock() until after that point. Otherwise we could return a pointer to struct dentry that already is rcu-scheduled for freeing, with ->d_lock held on it; caller's subsequent attempt to unlock it can end up with memory corruption. Cc: stable@vger.kernel.org # 3.12+, counting backports Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
2018-05-23reiserfs: Make cancel_old_flush() reliableJan Kara3-7/+17
[ Upstream commit 71b0576bdb862e964a82c73327cdd1a249c53e67 ] Currently canceling of delayed work that flushes old data using cancel_old_flush() does not prevent work from being requeued. Thus in theory new work can be queued after cancel_old_flush() from reiserfs_freeze() has run. This will become larger problem once flush_old_commits() can requeue the work itself. Fix the problem by recording in sbi->work_queue that flushing work is canceled and should not be requeued. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
2018-05-23NFS: Fix an incorrect type in struct nfs_direct_reqTrond Myklebust1-1/+1
[ Upstream commit d9ee65539d3eabd9ade46cca1780e3309ad0f907 ] The start offset needs to be of type loff_t. Fixed: 5fadeb47dcc5c ("nfs: count DIO good bytes correctly with mirroring") Cc: stable@vger.kernel.org # v4.0+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
2018-03-21btrfs: preserve i_mode if __btrfs_set_acl() failsErnesto A. Fernández1-1/+5
[ Upstream commit d7d824966530acfe32b94d1ed672e6fe1638cd68 ] When changing a file's acl mask, btrfs_set_acl() will first set the group bits of i_mode to the value of the mask, and only then set the actual extended attribute representing the new acl. If the second part fails (due to lack of space, for example) and the file had no acl attribute to begin with, the system will from now on assume that the mask permission bits are actual group permission bits, potentially granting access to the wrong users. Prevent this by restoring the original mode bits if __btrfs_set_acl fails. Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
2018-03-21btrfs: Don't clear SGID when inheriting ACLsJan Kara1-6/+7
[ Upstream commit b7f8a09f8097db776b8d160862540e4fc1f51296 ] When new directory 'DIR1' is created in a directory 'DIR0' with SGID bit set, DIR1 is expected to have SGID bit set (and owning group equal to the owning group of 'DIR0'). However when 'DIR0' also has some default ACLs that 'DIR1' inherits, setting these ACLs will result in SGID bit on 'DIR1' to get cleared if user is not member of the owning group. Fix the problem by moving posix_acl_update_mode() out of __btrfs_set_acl() into btrfs_set_acl(). That way the function will not be called when inheriting ACLs which is what we want as it prevents SGID bit clearing and the mode has been properly set by posix_acl_create() anyway. Fixes: 073931017b49d9458aa351605b43a7e34598caef CC: stable@vger.kernel.org CC: linux-btrfs@vger.kernel.org CC: David Sterba <dsterba@suse.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
2018-03-21xfs: quota: check result of register_shrinker()Aliaksei Karaliou1-16/+29
[ Upstream commit 3a3882ff26fbdbaf5f7e13f6a0bccfbf7121041d ] xfs_qm_init_quotainfo() does not check result of register_shrinker() which was tagged as __must_check recently, reported by sparse. Signed-off-by: Aliaksei Karaliou <akaraliou.dev@gmail.com> [darrick: move xfs_qm_destroy_quotainos nearer xfs_qm_init_quotainos] Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
2018-03-21xfs: quota: fix missed destroy of qi_tree_lockAliaksei Karaliou1-0/+1
[ Upstream commit 2196881566225f3c3428d1a5f847a992944daa5b ] xfs_qm_destroy_quotainfo() does not destroy quotainfo->qi_tree_lock while destroys quotainfo->qi_quotaofflock. Signed-off-by: Aliaksei Karaliou <akaraliou.dev@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
2018-03-21sget(): handle failures of register_shrinker()Al Viro1-1/+5
[ Upstream commit 9ee332d99e4d5a97548943b81c54668450ce641b ] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
2018-03-04ext4 crypto: don't regenerate the per-inode encryption key unnecessarilyTheodore Ts'o4-5/+19
[ Relevant upstream commit: 1b53cf9815bb4744958d41f3795d5d5a1d365e2d ] This fixes the same problem as upstream commit 1b53cf9815bb: "fscrypt: remove broken support for detecting keyring key revocation". Specifically, key revocations racing with readpage operations will cause the kernel to crash and burn with a BUG_ON or a NULL pointer dereference in a block I/O callback stemming from an ext4_readpage() operation. This fix is needed to fix prevent xfstests test runs from crashing while running the generic/421 test. The root cause is different in the 4.1 kernel, however, since the 4.1's encryption handling is so _primitive_ compared to later kernels. The code isn't actually explicitly checking for revoked keys. Instead, the code is neededly regenerating the per-file encryption key on every mmap() or open() or directory operation (in the case of a directory inode). Yelch! If the file is already opened and actively being read, and while an open() is racing with the read operations, after the user's master key has been revoked, there will be the same net effect as the problem fixed by upstream commit 1b53cf9815bb --- the per-file key will be marked as invalid and this will cause a BUG_ON. The AOSP 3.18 and 4.4 android-common kernels have a much more modern version of ext4 encryption have been backported, which incldes a backport of upstream commit 1b53cf9815bb. This is a (at least) dozen-plus commits, and isn't really suitable for the Upstream LTS kernel. So instead, this is the simplest bug which fixes the same high-level issue as the upstream commit, without dragging in all of the other non-bug-fix improvements to ext4 encryption found in newer kernels. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
2018-03-04fscrypto: add authorization check for setting encryption policyEric Biggers1-0/+3
commit 163ae1c6ad6299b19e22b4a35d5ab24a89791a98 upstream. On an ext4 or f2fs filesystem with file encryption supported, a user could set an encryption policy on any empty directory(*) to which they had readonly access. This is obviously problematic, since such a directory might be owned by another user and the new encryption policy would prevent that other user from creating files in their own directory (for example). Fix this by requiring inode_owner_or_capable() permission to set an encryption policy. This means that either the caller must own the file, or the caller must have the capability CAP_FOWNER. (*) Or also on any regular file, for f2fs v4.6 and later and ext4 v4.8-rc1 and later; a separate bug fix is coming for that. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
2018-03-04ext4: require encryption feature for EXT4_IOC_SET_ENCRYPTION_POLICYRichard Weinberger1-0/+3
commit 9a200d075e5d05be1fcad4547a0f8aee4e2f9a04 upstream. ...otherwise an user can enable encryption for certain files even when the filesystem is unable to support it. Such a case would be a filesystem created by mkfs.ext4's default settings, 1KiB block size. Ext4 supports encyption only when block size is equal to PAGE_SIZE. But this constraint is only checked when the encryption feature flag is set. Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
2018-03-04ncpfs: fix unused variable warningMiklos Szeredi1-2/+1
[ Upstream commit 9a232de4999666b2e8ea6775b2b0e3e4feb09b7a ] Without CONFIG_NCPFS_NLS the following warning is seen: fs/ncpfs/dir.c: In function 'ncp_hash_dentry': fs/ncpfs/dir.c:136:23: warning: unused variable 'sb' [-Wunused-variable] struct super_block *sb = dentry->d_sb; Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
2018-03-04reiserfs: avoid a -Wmaybe-uninitialized warningArnd Bergmann2-2/+1
[ Upstream commit ab4949640d6674b617b314ad3c2c00353304bab9 ] The latest gcc-7.0.1 snapshot warns about an unintialized variable use: In file included from fs/reiserfs/lbalance.c:8:0: fs/reiserfs/lbalance.c: In function 'leaf_item_bottle.isra.3': fs/reiserfs/reiserfs.h:1279:13: error: '*((void *)&n_ih+8).v' may be used uninitialized in this function [-Werror=maybe-uninitialized] v2->v = (v2->v & cpu_to_le64(15ULL << 60)) | cpu_to_le64(offset); ~~^~~ fs/reiserfs/reiserfs.h:1279:13: error: '*((void *)&n_ih+8).v' may be used uninitialized in this function [-Werror=maybe-uninitialized] v2->v = (v2->v & cpu_to_le64(15ULL << 60)) | cpu_to_le64(offset); This happens because the offset/type pair that is stored in ih.key.u.k_offset_v2 is actually uninitialized when we call set_le_ih_k_offset() and set_le_ih_k_type(). After we have called both, all data is correct, but the first of the two reads uninitialized data for the type field and writes it back before it gets overwritten. This works around the warning by initializing the k_offset_v2 through the slightly larger memcpy(). [JK: Remove now unused define and make it obvious we initialize the key] Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
2018-03-04btrfs: Fix possible off-by-one in btrfs_search_path_in_treeNikolay Borisov1-1/+1
[ Upstream commit c8bcbfbd239ed60a6562964b58034ac8a25f4c31 ] The name char array passed to btrfs_search_path_in_tree is of size BTRFS_INO_LOOKUP_PATH_MAX (4080). So the actual accessible char indexes are in the range of [0, 4079]. Currently the code uses the define but this represents an off-by-one. Implications: Size of btrfs_ioctl_ino_lookup_args is 4096, so the new byte will be written to extra space, not some padding that could be provided by the allocator. btrfs-progs store the arguments on stack, but kernel does own copy of the ioctl buffer and the off-by-one overwrite does not affect userspace, but the ending 0 might be lost. Kernel ioctl buffer is allocated dynamically so we're overwriting somebody else's memory, and the ioctl is privileged if args.objectid is not 256. Which is in most cases, but resolving a subvolume stored in another directory will trigger that path. Before this patch the buffer was one byte larger, but then the -1 was not added. Fixes: ac8e9819d71f907 ("Btrfs: add search and inode lookup ioctls") Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> [ added implications ] Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
2018-03-04vfs: don't do RCU lookup of empty pathnamesLinus Torvalds1-0/+3
[ Upstream commit c0eb027e5aef70b71e5a38ee3e264dc0b497f343 ] Normal pathname lookup doesn't allow empty pathnames, but using AT_EMPTY_PATH (with name_to_handle_at() or fstatat(), for example) you can trigger an empty pathname lookup. And not only is the RCU lookup in that case entirely unnecessary (because we'll obviously immediately finalize the end result), it is actively wrong. Why? An empth path is a special case that will return the original 'dirfd' dentry - and that dentry may not actually be RCU-free'd, resulting in a potential use-after-free if we were to initialize the path lazily under the RCU read lock and depend on complete_walk() finalizing the dentry. Found by syzkaller and KASAN. Reported-by: Dmitry Vyukov <dvyukov@google.com> Reported-by: Vegard Nossum <vegard.nossum@gmail.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
2018-03-04Btrfs: fix unexpected -EEXIST when creating new inodeLiu Bo1-0/+18
[ Upstream commit 900c9981680067573671ecc5cbfa7c5770be3a40 ] The highest objectid, which is assigned to new inode, is decided at the time of initializing fs roots. However, in cases where log replay gets processed, the btree which fs root owns might be changed, so we have to search it again for the highest objectid, otherwise creating new inode would end up with -EEXIST. cc: <stable@vger.kernel.org> v4.4-rc6+ Fixes: f32e48e92596 ("Btrfs: Initialize btrfs_root->highest_objectid when loading tree root and subvolume roots") Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
2018-03-04Btrfs: fix crash due to not cleaning up tree log block's dirty bitsLiu Bo1-0/+9
[ Upstream commit 1846430c24d66e85cc58286b3319c82cd54debb2 ] In cases that the whole fs flips into readonly status due to failures in critical sections, then log tree's blocks are still dirty, and this leads to a crash during umount time, the crash is about use-after-free, umount -> close_ctree -> stop workers -> iput(btree_inode) -> iput_final -> write_inode_now -> ... -> queue job on stop'd workers cc: <stable@vger.kernel.org> v3.12+ Fixes: 681ae50917df ("Btrfs: cleanup reserved space when freeing tree log on error") Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>