path: root/fs
Age  Commit message  Author  Files  Lines
2016-12-27ext4: Simplify DAX fault pathJan Kara1-38/+10
Now that dax_iomap_fault() calls ->iomap_begin() without entry lock, we can use transaction starting in ext4_iomap_begin() and thus simplify ext4_dax_fault(). It also provides us proper retries in case of ENOSPC. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-12-27dax: Call ->iomap_begin without entry lock during dax faultJan Kara1-55/+66
Currently ->iomap_begin() handler is called with entry lock held. If the filesystem held any locks between ->iomap_begin() and ->iomap_end() (such as ext4 which will want to hold transaction open), this would cause lock inversion with the iomap_apply() from standard IO path which first calls ->iomap_begin() and only then calls ->actor() callback which grabs entry locks for DAX (if it faults when copying from/to user provided buffers). Fix the problem by nesting grabbing of entry lock inside ->iomap_begin() - ->iomap_end() pair. Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
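The lock ordering described above can be sketched in userspace. This is only a model under stated assumptions: `fs_lock` and `entry_lock` are hypothetical pthread mutexes standing in for whatever the filesystem holds between ->iomap_begin() and ->iomap_end() and for the DAX entry lock; none of these names come from the kernel. The point is that both paths take the locks in the same order, so neither can wait on the other while holding the lock the other needs.

```c
#include <pthread.h>

/* Hypothetical stand-ins: fs_lock models what the filesystem holds between
 * ->iomap_begin() and ->iomap_end() (e.g. an open transaction); entry_lock
 * models the DAX radix tree entry lock. */
static pthread_mutex_t fs_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t entry_lock = PTHREAD_MUTEX_INITIALIZER;
static int operations_done;

/* Fault path after the change: the entry lock nests inside begin/end. */
int fault_path(void)
{
	int done;

	pthread_mutex_lock(&fs_lock);      /* models ->iomap_begin() */
	pthread_mutex_lock(&entry_lock);   /* models grabbing the entry lock */
	done = ++operations_done;
	pthread_mutex_unlock(&entry_lock);
	pthread_mutex_unlock(&fs_lock);    /* models ->iomap_end() */
	return done;
}

/* Standard IO path: begin first, then the actor grabs the entry lock. */
int io_path(void)
{
	int done;

	pthread_mutex_lock(&fs_lock);      /* models ->iomap_begin() */
	pthread_mutex_lock(&entry_lock);   /* models the ->actor() callback */
	done = ++operations_done;
	pthread_mutex_unlock(&entry_lock);
	pthread_mutex_unlock(&fs_lock);    /* models ->iomap_end() */
	return done;
}
```

Before the change, the fault path in this model would have taken `entry_lock` first and `fs_lock` second, which is the inversion the commit message describes.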
2016-12-27dax: Finish fault completely when loading holesJan Kara1-9/+18
The only case when we do not finish the page fault completely is when we are loading hole pages into a radix tree. Avoid this special case and finish the fault in that case as well inside the DAX fault handler. This will allow for easier iomap handling. Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-12-27dax: Avoid page invalidation races and unnecessary radix tree traversalsJan Kara1-17/+11
Currently dax_iomap_rw() takes care of invalidating page tables and evicting hole pages from the radix tree when write(2) to the file happens. This invalidation is only necessary when there is some block allocation resulting from write(2). Furthermore, in its current place the invalidation is racy with respect to a page fault instantiating a hole page just after we have invalidated it. So perform the page invalidation inside dax_iomap_actor() where we can do it only when really necessary and after blocks have been allocated so nobody will be instantiating new hole pages anymore. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-12-27mm: Invalidate DAX radix tree entries only if appropriateJan Kara1-10/+61
Currently invalidate_inode_pages2_range() and invalidate_mapping_pages() just delete all exceptional radix tree entries they find. For DAX this is not desirable as we track cache dirtiness in these entries and when they are evicted, we may not flush caches although it is necessary. This can for example manifest when we write to the same block both via mmap and via write(2) (to different offsets) and fsync(2) then does not properly flush CPU caches when modification via write(2) was the last one. Create appropriate DAX functions to handle invalidation of DAX entries for invalidate_inode_pages2_range() and invalidate_mapping_pages() and wire them up into the corresponding mm functions. Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-12-27ext2: Return BH_New buffers for zeroed blocksJan Kara1-2/+1
So far we did not return BH_New buffers from ext2_get_blocks() when we allocated and zeroed-out a block for DAX inode to avoid racy zeroing in DAX code. This zeroing is gone these days so we can remove the workaround. Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-12-25ktime: Get rid of ktime_equal()Thomas Gleixner2-4/+3
No point in going through loops and hoops instead of just comparing the values. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org>
2016-12-25ktime: Cleanup ktime_set() usageThomas Gleixner3-3/+3
ktime_set(S,N) was required for the timespec storage type and is still useful for situations where a Seconds and Nanoseconds part of a time value needs to be converted. For anything where the Seconds argument is 0, this is pointless and can be replaced with a simple assignment. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org>
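A minimal userspace sketch of why the zero-seconds case is pointless. It assumes ktime_t is a signed 64-bit nanosecond count; `ktime_set_model()` is a simplified stand-in written for this illustration (no saturation handling) and is not the kernel's implementation.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Userspace stand-in for the kernel type: scalar nanoseconds. */
typedef int64_t ktime_t;

/* Simplified model of ktime_set(); the real helper also guards against
 * overflow, which this sketch deliberately leaves out. */
ktime_t ktime_set_model(int64_t secs, unsigned long nsecs)
{
	return secs * NSEC_PER_SEC + (int64_t)nsecs;
}

/* Old style: goes through the helper although the seconds part is 0. */
ktime_t timeout_old_style(void)
{
	return ktime_set_model(0, 500000);
}

/* New style: a plain assignment of the nanosecond value. */
ktime_t timeout_new_style(void)
{
	return 500000;
}
```

Both functions yield the same value; the helper only earns its keep when a non-zero seconds part has to be folded in.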
2016-12-25ktime: Get rid of the unionThomas Gleixner4-18/+17
ktime is a union because the initial implementation stored the time in scalar nanoseconds on 64 bit machines and in an endianness optimized timespec variant for 32bit machines. The Y2038 cleanup removed the timespec variant and switched everything to scalar nanoseconds. The union remained, but became completely pointless. Get rid of the union and just keep ktime_t as a simple typedef of type s64. The conversion was done with coccinelle and some manual mopping up. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org>
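To illustrate what the conversion buys, here is a userspace sketch contrasting a single-member union with a plain scalar. The type and function names (`ktime_old`, `ktime_new_t`, `old_equal`, and so on) are hypothetical and chosen for this example; only the idea of a one-field union wrapping a 64-bit nanosecond value is taken from the commit message.

```c
#include <stdint.h>

/* Model of the leftover shape: a union with a single scalar member. */
union ktime_old {
	int64_t tv64;
};

/* Model of the result: a plain signed 64-bit typedef. */
typedef int64_t ktime_new_t;

/* With the union, every comparison has to reach into the member. */
int old_equal(union ktime_old a, union ktime_old b)
{
	return a.tv64 == b.tv64;
}

int old_before(union ktime_old a, union ktime_old b)
{
	return a.tv64 < b.tv64;
}

/* With the scalar, ordinary operators work directly. */
int new_equal(ktime_new_t a, ktime_new_t b)
{
	return a == b;
}

int new_before(ktime_new_t a, ktime_new_t b)
{
	return a < b;
}
```

Once the type is a scalar, wrappers of the `*_equal()` kind add nothing over `==`, which is why a helper such as ktime_equal() can be dropped as well.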
2016-12-24Replace <asm/uaccess.h> with <linux/uaccess.h> globallyLinus Torvalds96-96/+96
This was entirely automated, using the script by Al:

  PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
  sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
        $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window. Requested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-24Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds14-90/+177
Pull cifs fixes from Steve French: "This includes various cifs/smb3 bug fixes, mostly for stable as well. In the next week I expect that Germano will have some reconnection fixes, and also I expect to have the remaining pieces of the snapshot enablement and SMB3 ACLs, but wanted to get this set of bug fixes in" * 'for-next' of git://git.samba.org/sfrench/cifs-2.6: cifs_get_root shouldn't use path with tree name Fix default behaviour for empty domains and add domainauto option cifs: use %16phN for formatting md5 sum cifs: Fix smbencrypt() to stop pointing a scatterlist at the stack CIFS: Fix a possible double locking of mutex during reconnect CIFS: Fix a possible memory corruption during reconnect CIFS: Fix a possible memory corruption in push locks CIFS: Fix missing nls unload in smb2_reconnect() CIFS: Decrease verbosity of ioctl call SMB3: parsing for new snapshot timestamp mount parm
2016-12-23Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds10-133/+162
Pull final vfs updates from Al Viro: "Assorted cleanups and fixes all over the place" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: sg_write()/bsg_write() is not fit to be called under KERNEL_DS ufs: fix function declaration for ufs_truncate_blocks fs: exec: apply CLOEXEC before changing dumpable task flags seq_file: reset iterator to first record for zero offset vfs: fix isize/pos/len checks for reflink & dedupe [iov_iter] fix iterate_all_kinds() on empty iterators move aio compat to fs/aio.c reorganize do_make_slave() clone_private_mount() doesn't need to touch namespace_sem remove a bogus claim about namespace_sem being held by callers of mnt_alloc_id()
2016-12-23Merge tag 'befs-v4.10-rc1' of git://github.com/luisbg/linux-befsLinus Torvalds13-116/+145
Pull befs updates from Luis de Bethencourt: "A series of small fixes and adding NFS export support" * tag 'befs-v4.10-rc1' of git://github.com/luisbg/linux-befs: befs: add NFS export support befs: remove trailing whitespaces befs: remove signatures from comments befs: fix style issues in header files befs: fix style issues in linuxvfs.c befs: fix typos in linuxvfs.c befs: fix style issues in io.c befs: fix style issues in inode.c befs: fix style issues in debug.c
2016-12-23Merge branch 'work.namespace' into for-linusAl Viro2-44/+38
2016-12-23ufs: fix function declaration for ufs_truncate_blocksJeff Layton1-1/+1
sparse says: fs/ufs/inode.c:1195:6: warning: symbol 'ufs_truncate_blocks' was not declared. Should it be static? Note that the forward declaration in the file is already marked static. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-23fs: exec: apply CLOEXEC before changing dumpable task flagsAleksa Sarai1-2/+8
If you have a process that has set itself to be non-dumpable, and it then undergoes exec(2), any CLOEXEC file descriptors it has open are "exposed" during a race window between the dumpable flags of the process being reset for exec(2) and CLOEXEC being applied to the file descriptors. This can be exploited by a process by attempting to access /proc/<pid>/fd/... during this window, without requiring CAP_SYS_PTRACE. The race in question is after set_dumpable has been called (for get_link, though the trace is basically the same for readlink): [vfs] -> proc_pid_link_inode_operations.get_link -> proc_pid_get_link -> proc_fd_access_allowed -> ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS); which will return 0 during the race window, and CLOEXEC file descriptors will still be open during this window because do_close_on_exec has not been called yet. As a result, the ordering of these calls should be reversed to avoid this race window. This is of particular concern to container runtimes, where joining a PID namespace with file descriptors referring to the host filesystem can result in security issues (since PR_SET_DUMPABLE doesn't protect against access of CLOEXEC file descriptors -- file descriptors which may reference filesystem objects the container shouldn't have access to). Cc: dev@opencontainers.org Cc: <stable@vger.kernel.org> # v3.2+ Reported-by: Michael Crosby <crosbymichael@gmail.com> Signed-off-by: Aleksa Sarai <asarai@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
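For context, the userspace setup the message refers to might look like the sketch below: a process marks a descriptor close-on-exec and makes itself non-dumpable before calling exec(2). The helper names `set_cloexec()` and `make_non_dumpable()` are hypothetical; the calls they wrap (fcntl(2) with F_GETFD/F_SETFD and prctl(2) with PR_SET_DUMPABLE) are the standard Linux interfaces. The sketch does not reproduce the race itself, which lives inside the kernel's exec path.

```c
#include <fcntl.h>
#include <sys/prctl.h>

/* Mark fd close-on-exec. Returns 0 on success, -1 on error. */
int set_cloexec(int fd)
{
	int flags = fcntl(fd, F_GETFD);

	if (flags == -1)
		return -1;
	if (fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1)
		return -1;
	return 0;
}

/* Make the calling process non-dumpable. Returns 0 on success. */
int make_non_dumpable(void)
{
	return prctl(PR_SET_DUMPABLE, 0, 0, 0, 0);
}
```

A container runtime would typically call both before exec'ing the container process, expecting that the marked descriptors can never be reached through /proc/<pid>/fd by an unprivileged observer.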
2016-12-23seq_file: reset iterator to first record for zero offsetTomasz Majchrzak1-0/+7
If a kernfs file is empty on a first read, successive read operations using the same file descriptor will return no data, even when data is available. The default kernfs 'seq_next' implementation advances the iterator position even when the next object is not there. Kernfs 'seq_start' for following requests will not return an iterator as the position is already on the second object. This defect prevents monitoring of badblocks sysfs files from MD raid. They are initially empty but if data appears at some stage, userspace is not able to read it. Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-23vfs: fix isize/pos/len checks for reflink & dedupeDarrick J. Wong3-9/+13
Strengthen the checking of pos/len vs. i_size, clarify the return values for the clone prep function, and remove pointless code. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
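The kind of pos/len versus i_size check mentioned above can be sketched as follows. `range_within_file()` is a hypothetical helper written for illustration, not the function the patch touches; it shows the general pattern of rejecting negative offsets and guarding the addition against overflow before comparing with the file size.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if [pos, pos + len) lies within a file of
 * size isize. Offsets are signed 64-bit, as loff_t is on Linux. */
bool range_within_file(int64_t pos, uint64_t len, int64_t isize)
{
	if (pos < 0 || isize < 0)
		return false;
	/* Reject ranges whose end would not fit in a signed 64-bit offset. */
	if (len > (uint64_t)INT64_MAX - (uint64_t)pos)
		return false;
	return (uint64_t)pos + len <= (uint64_t)isize;
}
```

Doing the overflow test before the addition matters: computing pos + len first and checking afterwards would already have wrapped.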
2016-12-23move aio compat to fs/aio.cAl Viro2-77/+95
... and fix the minor buglet in compat io_submit() - native one kills ioctx as cleanup when put_user() fails. Get rid of bogus compat_... in !CONFIG_AIO case, while we are at it - they should simply fail with ENOSYS, same as for native counterparts. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-22befs: add NFS export supportLuis de Bethencourt1-0/+38
Implement mandatory export_operations, so it is possible to export befs via nfs. Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com>
2016-12-22befs: remove trailing whitespacesLuis de Bethencourt7-48/+47
Removing all trailing whitespaces in befs. I was skeptical about tainting the history with this, but whitespace changes can be ignored by using 'git blame -w' and 'git log -w'. Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com>
2016-12-22befs: remove signatures from commentsLuis de Bethencourt3-7/+1
No idea why some comments have signatures. These predate git. Removing them since they add noise and no information. Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com>
2016-12-22befs: fix style issues in header filesLuis de Bethencourt6-17/+12
Fixing checkpatch.pl issues in befs header files: WARNING: Missing a blank line after declarations + befs_inode_addr iaddr; + iaddr.allocation_group = blockno >> BEFS_SB(sb)->ag_shift; WARNING: space prohibited between function name and open parenthesis '(' + return BEFS_SB(sb)->block_size / sizeof (befs_disk_inode_addr); ERROR: "foo * bar" should be "foo *bar" + const char *key, befs_off_t * value); ERROR: Macros with complex values should be enclosed in parentheses +#define PACKED __attribute__ ((__packed__)) Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com>
2016-12-22befs: fix style issues in linuxvfs.cLuis de Bethencourt1-23/+26
Fix the following type of checkpatch.pl issues: WARNING: line over 80 characters +static struct dentry *befs_lookup(struct inode *, struct dentry *, unsigned int); ERROR: code indent should use tabs where possible + if (!bi)$ WARNING: please, no spaces at the start of a line + if (!bi)$ WARNING: labels should not be indented + unacquire_bh: WARNING: space prohibited between function name and open parenthesis '(' + sizeof (struct befs_inode_info), WARNING: braces {} are not necessary for single statement blocks + if (!*out) { + return -ENOMEM; + } WARNING: Block comments use a trailing */ on a separate line + * in special cases */ WARNING: Missing a blank line after declarations + int token; + if (!*p) ERROR: do not use assignment in if condition + if (!(bh = sb_bread(sb, sb_block))) { ERROR: space prohibited after that open parenthesis '(' + if( befs_sb->num_blocks > ~((sector_t)0) ) { ERROR: space prohibited before that close parenthesis ')' + if( befs_sb->num_blocks > ~((sector_t)0) ) { ERROR: space required before the open parenthesis '(' + if( befs_sb->num_blocks > ~((sector_t)0) ) { Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com>
2016-12-22befs: fix typos in linuxvfs.cLuis de Bethencourt1-8/+6
Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com>
2016-12-22befs: fix style issues in io.cLuis de Bethencourt1-2/+2
Fixing the two following checkpatch.pl issues: ERROR: trailing whitespace + * Based on portions of file.c and inode.c $ WARNING: labels should not be indented + error: Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com>
2016-12-22befs: fix style issues in inode.cLuis de Bethencourt1-6/+6
Fixing the following checkpatch.pl errors and warning: ERROR: trailing whitespace + * $ WARNING: Block comments use * on subsequent lines +/* + Validates the correctness of the befs inode ERROR: "foo * bar" should be "foo *bar" +befs_check_inode(struct super_block *sb, befs_inode * raw_inode, Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com>
2016-12-22befs: fix style issues in debug.cLuis de Bethencourt1-6/+8
Fix all checkpatch.pl errors and warnings in debug.c: ERROR: trailing whitespace + * $ WARNING: Missing a blank line after declarations + va_list args; + va_start(args, fmt); ERROR: "foo * bar" should be "foo *bar" +befs_dump_inode(const struct super_block *sb, befs_inode * inode) ERROR: "foo * bar" should be "foo *bar" +befs_dump_super_block(const struct super_block *sb, befs_super_block * sup) ERROR: "foo * bar" should be "foo *bar" +befs_dump_small_data(const struct super_block *sb, befs_small_data * sd) WARNING: line over 80 characters +befs_dump_index_entry(const struct super_block *sb, befs_disk_btree_super * super) ERROR: "foo * bar" should be "foo *bar" +befs_dump_index_entry(const struct super_block *sb, befs_disk_btree_super * super) ERROR: "foo * bar" should be "foo *bar" +befs_dump_index_node(const struct super_block *sb, befs_btree_nodehead * node) Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com>
2016-12-21splice: reinstate SIGPIPE/EPIPE handlingLinus Torvalds1-2/+7
Commit 8924feff66f3 ("splice: lift pipe_lock out of splice_to_pipe()") caused a regression when there were no more readers left on a pipe that was being spliced into: rather than the expected SIGPIPE and -EPIPE return value, the writer would end up waiting forever for space to free up (which obviously was not going to happen with no readers around). Fixes: 8924feff66f3 ("splice: lift pipe_lock out of splice_to_pipe()") Reported-and-tested-by: Andreas Schwab <schwab@linux-m68k.org> Debugged-by: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@kernel.org # v4.9 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
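The expected behaviour for a pipe with no readers can be observed from userspace. The sketch below uses plain write(2) rather than splice(2) to stay portable; the commit restores the same SIGPIPE/-EPIPE outcome for the splice path. The function name is hypothetical, and SIGPIPE is ignored so that the error code can be inspected instead of the process being killed.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Write one byte into a pipe whose read end has been closed.
 * Returns the resulting errno, 0 if the write unexpectedly succeeds,
 * or -1 if the pipe could not be created. */
int write_to_readerless_pipe(void)
{
	int fds[2];
	char byte = 'x';
	ssize_t n;
	int err;

	if (pipe(fds) != 0)
		return -1;

	signal(SIGPIPE, SIG_IGN);   /* report EPIPE instead of dying */
	close(fds[0]);              /* no readers remain */

	n = write(fds[1], &byte, 1);
	err = (n == -1) ? errno : 0;

	close(fds[1]);
	return err;
}
```

The regression described above meant that a splice into such a pipe would block forever rather than fail in this way.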
2016-12-21Merge tag 'nfs-for-4.10-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds11-109/+166
Pull more NFS client updates from Trond Myklebust: "Highlights include: - further attribute cache improvements to make revalidation more fine grained - NFSv4 locking improvements Bugfixes: - nfs4_fl_prepare_ds must be careful about reporting success in files layout - pNFS/flexfiles: Instead of marking a device inactive, remove it from the cache" * tag 'nfs-for-4.10-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFSv4: Retry the DELEGRETURN if the embedded GETATTR is rejected with EACCES NFS: Retry the CLOSE if the embedded GETATTR is rejected with EACCES NFSv4: Place the GETATTR operation before the CLOSE NFSv4: Also ask for attributes when downgrading to a READ-only state NFS: Don't abuse NFS_INO_REVAL_FORCED in nfs_post_op_update_inode_locked() pNFS: Return RW layouts on OPEN_DOWNGRADE NFSv4: Add encode/decode of the layoutreturn op in OPEN_DOWNGRADE NFS: Don't disconnect open-owner on NFS4ERR_BAD_SEQID NFSv4: ensure __nfs4_find_lock_state returns consistent result. NFSv4.1: nfs4_fl_prepare_ds must be careful about reporting success. pNFS/flexfiles: delete deviceid, don't mark inactive NFS: Clean up nfs_attribute_timeout() NFS: Remove unused function nfs_revalidate_inode_rcu() NFS: Fix and clean up the access cache validity checking NFS: Only look at the change attribute cache state in nfs_weak_revalidate() NFS: Clean up cache validity checking NFS: Don't revalidate the file on close if we hold a delegation NFSv4: Don't discard the attributes returned by asynchronous DELEGRETURN NFSv4: Update the attribute cache info in update_changeattr
2016-12-20NFSv4: Retry the DELEGRETURN if the embedded GETATTR is rejected with EACCESTrond Myklebust2-4/+15
If our DELEGRETURN RPC call is rejected with an EACCES error, then we should remove the GETATTR call from the compound RPC and retry. This could potentially happen when there is a conflict between an ACL denying attribute reads and our use of SP4_MACH_CRED. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
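The retry pattern can be modelled in a few lines. Everything here is hypothetical scaffolding for illustration: `struct compound`, `send_fn` and the fake server are not NFS client structures. The sketch only shows the control flow of dropping the optional GETATTR and resending once when the compound is rejected with EACCES.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a compound request with an optional GETATTR. */
struct compound {
	bool want_getattr;
};

/* Hypothetical transport: returns 0 or a negative errno. */
typedef int (*send_fn)(const struct compound *req);

/* Send the compound; on -EACCES, drop the GETATTR and retry once. */
int send_with_getattr_fallback(struct compound *req, send_fn send)
{
	int err = send(req);

	if (err == -EACCES && req->want_getattr) {
		req->want_getattr = false;
		err = send(req);
	}
	return err;
}

/* Fake server that rejects any compound containing a GETATTR,
 * modelling an ACL that denies attribute reads. */
int fake_server_denying_getattr(const struct compound *req)
{
	return req->want_getattr ? -EACCES : 0;
}
```

The retry is bounded: once the GETATTR has been removed, a second -EACCES is returned to the caller unchanged.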
2016-12-20NFS: Retry the CLOSE if the embedded GETATTR is rejected with EACCESTrond Myklebust1-0/+10
If our CLOSE RPC call is rejected with an EACCES error, then we should remove the GETATTR call from the compound RPC and retry. This could potentially happen when there is a conflict between an ACL denying attribute reads and our use of SP4_MACH_CRED. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-20NFSv4: Place the GETATTR operation before the CLOSETrond Myklebust2-12/+12
In order to benefit from the DENY share lock protection, we should put the GETATTR operation before the CLOSE. Otherwise, we might race with a Windows machine that thinks it is now safe to modify the file. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-20NFSv4: Also ask for attributes when downgrading to a READ-only stateTrond Myklebust1-1/+2
If we're downgrading from a READ+WRITE mode to a READ-only mode, then ask for cache consistency attributes so that we avoid the revalidation in nfs_close_context() Fixes: 3947b74d0f9d ("NFSv4: Don't request a GETATTR on open_downgrade.") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-20NFS: Don't abuse NFS_INO_REVAL_FORCED in nfs_post_op_update_inode_locked()Trond Myklebust1-7/+0
The NFS_INO_REVAL_FORCED flag now really only has meaning for the case when we've just been handed a delegation for a file that was already cached, and we're unsure about that cache. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-20pNFS: Return RW layouts on OPEN_DOWNGRADETrond Myklebust1-3/+13
If the client holds no more writeable open state, and does not hold a write delegation, then send a layoutreturn as part of the OPEN_DOWNGRADE. We do this only for writes, since some layout drivers may require you to also hold a read layout if you are doing a R/W workload. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-20NFSv4: Add encode/decode of the layoutreturn op in OPEN_DOWNGRADETrond Myklebust1-0/+10
While we do not need to return the RW layout when downgrading from a read/write open state to read-only, we might want to do so in order to reduce the burden on the metadataserver so that it does not need to check for changed data when responding to GETATTR requests. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-20NFS: Don't disconnect open-owner on NFS4ERR_BAD_SEQIDNeilBrown1-16/+13
When an NFS4ERR_BAD_SEQID is received the open-owner is removed from the ->state_owners rbtree so that it will no longer be used. If any stateids attached to this open-owner are still in use, and if a request using one gets an NFS4ERR_BAD_STATEID reply, this can go bad. The state is marked as needing recovery and the nfs4_state_manager() is scheduled to clean up. nfs4_state_manager() finds states to be recovered by walking the state_owners rbtree. As the open-owner is not in the rbtree, the bad state is not found so nfs4_state_manager() completes having done nothing. The request is then retried, with a predictable result (indefinite retries). If the stateid is for a delegation, this open_owner will be used to open files when the delegation is returned. For that to work, a new open-owner needs to be presented to the server. This patch changes NFS4ERR_BAD_SEQID handling to leave the open-owner in the rbtree but updates the 'create_time' so it looks like a new open-owner. With this the indefinite retries no longer happen. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-20NFSv4: ensure __nfs4_find_lock_state returns consistent result.NeilBrown1-8/+20
If a file has both flock locks and OFD locks, then it is possible that two different nfs4 lock states could apply to file accesses from a single process. It is not possible to know, efficiently, which one is "correct". Presumably the state which represents a lock that covers the region undergoing IO would be the "correct" one to use, but finding that has a non-trivial cost and would provide minuscule value. Currently we just return whichever is first in the list, which could result in inconsistent behaviour if an application ever put itself in this position. As consistent behaviour is preferable (when perfectly correct behaviour is not available), change the search to return a consistent result in this circumstance. Specifically: if there is both a flock and OFD lock state, always return the flock one. Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
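The "consistent result" rule can be sketched as a list search that no longer depends on list order. The types and names below (`owner_kind`, `struct lock_state`, `find_lock_state()`) are hypothetical simplifications and not the NFS client's data structures; the sketch follows the rule as stated in the message: prefer the flock state whenever one exists.

```c
#include <stddef.h>

/* Hypothetical tags for the two kinds of lock state. */
enum owner_kind { OWNER_FLOCK, OWNER_OFD };

struct lock_state {
	enum owner_kind kind;
	int id;
	struct lock_state *next;
};

/* Return the flock state if the list has one; otherwise the first
 * other state; otherwise NULL. The answer does not depend on where
 * in the list the flock state happens to sit. */
struct lock_state *find_lock_state(struct lock_state *head)
{
	struct lock_state *fallback = NULL;

	for (struct lock_state *p = head; p != NULL; p = p->next) {
		if (p->kind == OWNER_FLOCK)
			return p;
		if (fallback == NULL)
			fallback = p;
	}
	return fallback;
}
```

A "first match wins" search would instead return whichever state was inserted first, which is the inconsistency the message describes.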
2016-12-20NFSv4.1: nfs4_fl_prepare_ds must be careful about reporting success.NeilBrown1-1/+2
Various places assume that if nfs4_fl_prepare_ds() returns a non-NULL 'ds', then ds->ds_clp will also be non-NULL. This is not necessarily true in the case when the process received a fatal signal while nfs4_pnfs_ds_connect is waiting in nfs4_wait_ds_connect(). In that case ->ds_clp may not be set, and the devid may not recently have been marked unavailable. So add a test for ds_clp == NULL and return NULL in that case. Fixes: c23266d532b4 ("NFS4.1 Fix data server connection race") Signed-off-by: NeilBrown <neilb@suse.com> Acked-by: Olga Kornievskaia <aglo@umich.edu> Acked-by: Adamson, Andy <William.Adamson@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-20pNFS/flexfiles: delete deviceid, don't mark inactiveWeston Andros Adamson2-3/+5
Instead of marking a device inactive, remove it from the cache entirely. Flexfiles has a way to report errors back to the server, so we don't want to stop devices from being tried again for 120 seconds. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-20NFS: Clean up nfs_attribute_timeout()Trond Myklebust1-7/+7
It can be made static. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-20NFS: Remove unused function nfs_revalidate_inode_rcu()Trond Myklebust1-9/+0
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-20NFS: Fix and clean up the access cache validity checkingTrond Myklebust1-9/+9
The access cache needs to check whether or not the mode bits, ownership, or ACL has changed or the cache has timed out. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-20NFS: Only look at the change attribute cache state in nfs_weak_revalidate()Trond Myklebust1-2/+3
Just like in nfs_check_verifier(), we want to use nfs_mapping_need_revalidate_inode() to check our knowledge of the change attribute is up to date. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-20NFS: Clean up cache validity checkingTrond Myklebust3-22/+34
Consolidate the open-coded checking of NFS_I(inode)->cache_validity into a couple of helper functions. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-20NFS: Don't revalidate the file on close if we hold a delegationTrond Myklebust1-0/+2
If we're holding a delegation, we can skip sending the close-to-open GETATTR until we're returning that delegation. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-20NFSv4: Don't discard the attributes returned by asynchronous DELEGRETURNTrond Myklebust1-4/+1
DELEGRETURN will always carry a reference to the inode except when the latter is being freed, so let's ensure that we always use that inode information to ensure close-to-open cache consistency, even when the DELEGRETURN call is asynchronous. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-20NFSv4: Update the attribute cache info in update_changeattrTrond Myklebust1-1/+8
If we successfully updated the change attribute, we should timestamp the cache. While we do know that the other attributes are not completely up to date, we have the NFS_INO_INVALID_ATTR flag that lets us know that, so it is valid to say that the cache has not timed out. We can also clear NFS_INO_REVAL_PAGECACHE, since our change attribute is now known to be valid. Conversely, if the change attribute did not match, we should make sure to also revalidate the access and ACL caches. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-19Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fsLinus Torvalds8-160/+150
Pull quota, fsnotify and ext2 updates from Jan Kara: "Changes to locking of some quota operations from dedicated quota mutex to s_umount semaphore, a fsnotify fix and a simple ext2 fix" * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: quota: Fix bogus warning in dquot_disable() fsnotify: Fix possible use-after-free in inode iteration on umount ext2: reject inodes with negative size quota: Remove dqonoff_mutex ocfs2: Use s_umount for quota recovery protection quota: Remove dqonoff_mutex from dquot_scan_active() ocfs2: Protect periodic quota syncing with s_umount semaphore quota: Use s_umount protection for quota operations quota: Hold s_umount in exclusive mode when enabling / disabling quotas fs: Provide function to get superblock with exclusive s_umount