summaryrefslogtreecommitdiff
path: root/include/linux/fs.h
AgeCommit message (Collapse)AuthorFilesLines
2018-03-16Merge branch 'for-linus' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: - backport-friendly part of lock_parent() race fix - a fix for an assumption in the heurisic used by path_connected() that is not true on NFS - livelock fixes for d_alloc_parallel() * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs: Teach path_connected to handle nfs filesystems with multiple roots. fs: dcache: Use READ_ONCE when accessing i_dir_seq fs: dcache: Avoid livelock between d_alloc_parallel and __d_add lock_parent() needs to recheck if dentry got __dentry_kill'ed under it
2018-03-16fs: Teach path_connected to handle nfs filesystems with multiple roots.Eric W. Biederman1-0/+1
On nfsv2 and nfsv3 the nfs server can export subsets of the same filesystem and report the same filesystem identifier, so that the nfs client can know they are the same filesystem. The subsets can be from disjoint directory trees. The nfsv2 and nfsv3 filesystems provides no way to find the common root of all directory trees exported form the server with the same filesystem identifier. The practical result is that in struct super s_root for nfs s_root is not necessarily the root of the filesystem. The nfs mount code sets s_root to the root of the first subset of the nfs filesystem that the kernel mounts. This effects the dcache invalidation code in generic_shutdown_super currently called shrunk_dcache_for_umount and that code for years has gone through an additional list of dentries that might be dentry trees that need to be freed to accomodate nfs. When I wrote path_connected I did not realize nfs was so special, and it's hueristic for avoiding calling is_subdir can fail. The practical case where this fails is when there is a move of a directory from the subtree exposed by one nfs mount to the subtree exposed by another nfs mount. This move can happen either locally or remotely. With the remote case requiring that the move directory be cached before the move and that after the move someone walks the path to where the move directory now exists and in so doing causes the already cached directory to be moved in the dcache through the magic of d_splice_alias. If someone whose working directory is in the move directory or a subdirectory and now starts calling .. from the initial mount of nfs (where s_root == mnt_root), then path_connected as a heuristic will not bother with the is_subdir check. As s_root really is not the root of the nfs filesystem this heuristic is wrong, and the path may actually not be connected and path_connected can fail. The is_subdir function might be cheap enough that we can call it unconditionally. Verifying that will take some benchmarking and the result may not be the same on all kernels this fix needs to be backported to. So I am avoiding that for now. Filesystems with snapshots such as nilfs and btrfs do something similar. But as the directory tree of the snapshots are disjoint from one another and from the main directory tree rename won't move things between them and this problem will not occur. Cc: stable@vger.kernel.org Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Fixes: 397d425dc26d ("vfs: Test for and handle paths that are unreachable from their mnt_root") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-02-26dax: fix vma_is_fsdax() helperDan Williams1-1/+1
Gerd reports that ->i_mode may contain other bits besides S_IFCHR. Use S_ISCHR() instead. Otherwise, get_user_pages_longterm() may fail on device-dax instances when those are meant to be explicitly allowed. Fixes: 2bb6d2837083 ("mm: introduce get_user_pages_longterm") Cc: <stable@vger.kernel.org> Reported-by: Gerd Rausch <gerd.rausch@oracle.com> Acked-by: Jane Chu <jane.chu@oracle.com> Reported-by: Haozhong Zhang <haozhong.zhang@intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-01-31Merge tag 'xfs-4.16-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds1-0/+6
Pull xfs updates from Darrick Wong: "This merge cycle, we're again some substantive changes to XFS. Metadata verifiers have been restructured to provide more detail about which part of a metadata structure failed checks, and we've enhanced the new online fsck feature to cross-reference extent allocation information with the other metadata structures. With this pull, the metadata verification part of online fsck is more or less finished, though the feature is still experimental and still disabled by default. We're also preparing to remove the EXPERIMENTAL tag from a couple of features this cycle. This week we're committing a bunch of space accounting fixes for reflink and removing the EXPERIMENTAL tag from reflink; I anticipate that we'll be ready to do the same for the reverse mapping feature next week. (I don't have any pending fixes for rmap; however I wish to remove the tags one at a time.) This giant pile of patches has been run through a full xfstests run over the weekend and through a quick xfstests run against this morning's master, with no major failures reported. Let me know if there's any merge problems -- git merge reported that one of our patches touched the same function as the i_version series, but it resolved things cleanly. Summary: - Log faulting code locations when verifiers fail, for improved diagnosis of corrupt filesystems. - Implement metadata verifiers for local format inode fork data. - Online scrub now cross-references metadata records with other metadata. - Refactor the fs geometry ioctl generation functions. - Harden various metadata verifiers. - Fix various accounting problems. - Fix uncancelled transactions leaking when xattr functions fail. - Prevent the copy-on-write speculative preallocation garbage collector from racing with writeback. - Emit log reservation type information as trace data so that we can compare against xfsprogs. - Fix some erroneous asserts in the online scrub code. - Clean up the transaction reservation calculations. - Fix various minor bugs in online scrub. - Log complaints about mixed dio/buffered writes once per day and less noisily than before. - Refactor buffer log item lists to use list_head. - Break PNFS leases before reflinking blocks. - Reduce lock contention on reflink source files. - Fix some quota accounting problems with reflink. - Fix a serious corruption problem in the direct cow write code where we fed bad iomaps to the vfs iomap consumers. - Various other refactorings. - Remove EXPERIMENTAL tag from reflink!" * tag 'xfs-4.16-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (94 commits) xfs: remove experimental tag for reflinks xfs: don't screw up direct writes when freesp is fragmented xfs: check reflink allocation mappings iomap: warn on zero-length mappings xfs: treat CoW fork operations as delalloc for quota accounting xfs: only grab shared inode locks for source file during reflink xfs: allow xfs_lock_two_inodes to take different EXCL/SHARED modes xfs: reflink should break pnfs leases before sharing blocks xfs: don't clobber inobt/finobt cursors when xref with rmap xfs: skip CoW writes past EOF when writeback races with truncate xfs: preserve i_rdev when recycling a reclaimable inode xfs: refactor accounting updates out of xfs_bmap_btalloc xfs: refactor inode verifier corruption error printing xfs: make tracepoint inode number format consistent xfs: always zero di_flags2 when we free the inode xfs: call xfs_qm_dqattach before performing reflink operations xfs: bmap code cleanup Use list_head infra-structure for buffer's log items list Split buffer's b_fspriv field Get rid of xfs_buf_log_item_t typedef ...
2018-01-31Merge branch 'work.misc' of ↵Linus Torvalds1-2/+3
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc vfs updates from Al Viro: "All kinds of misc stuff, without any unifying topic, from various people. Neil's d_anon patch, several bugfixes, introduction of kvmalloc analogue of kmemdup_user(), extending bitfield.h to deal with fixed-endians, assorted cleanups all over the place..." * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (28 commits) alpha: osf_sys.c: use timespec64 where appropriate alpha: osf_sys.c: fix put_tv32 regression jffs2: Fix use-after-free bug in jffs2_iget()'s error handling path dcache: delete unused d_hash_mask dcache: subtract d_hash_shift from 32 in advance fs/buffer.c: fold init_buffer() into init_page_buffers() fs: fold __inode_permission() into inode_permission() fs: add RWF_APPEND sctp: use vmemdup_user() rather than badly open-coding memdup_user() snd_ctl_elem_init_enum_names(): switch to vmemdup_user() replace_user_tlv(): switch to vmemdup_user() new primitive: vmemdup_user() memdup_user(): switch to GFP_USER eventfd: fold eventfd_ctx_get() into eventfd_ctx_fileget() eventfd: fold eventfd_ctx_read() into eventfd_read() eventfd: convert to use anon_inode_getfd() nfs4file: get rid of pointless include of btrfs.h uvc_v4l2: clean copyin/copyout up vme_user: don't use __copy_..._user() usx2y: don't bother with memdup_user() for 16-byte structure ...
2018-01-31Merge branch 'work.mqueue' of ↵Linus Torvalds1-0/+4
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull mqueue/bpf vfs cleanups from Al Viro: "mqueue and bpf go through rather painful and similar contortions to create objects in their dentry trees. Provide a primitive for doing that without abusing ->mknod(), switch bpf and mqueue to it. Another mqueue-related thing that has ended up in that branch is on-demand creation of internal mount (based upon the work of Giuseppe Scrivano)" * 'work.mqueue' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: mqueue: switch to on-demand creation of internal mount tidy do_mq_open() up a bit mqueue: clean prepare_open() up do_mq_open(): move all work prior to dentry_open() into a helper mqueue: fold mq_attr_ok() into mqueue_get_inode() move dentry_open() calls up into do_mq_open() mqueue: switch to vfs_mkobj(), quit abusing ->d_fsdata bpf_obj_do_pin(): switch to vfs_mkobj(), quit abusing ->mknod() new primitive: vfs_mkobj()
2018-01-31Merge branch 'misc.poll' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull poll annotations from Al Viro: "This introduces a __bitwise type for POLL### bitmap, and propagates the annotations through the tree. Most of that stuff is as simple as 'make ->poll() instances return __poll_t and do the same to local variables used to hold the future return value'. Some of the obvious brainos found in process are fixed (e.g. POLLIN misspelled as POLL_IN). At that point the amount of sparse warnings is low and most of them are for genuine bugs - e.g. ->poll() instance deciding to return -EINVAL instead of a bitmap. I hadn't touched those in this series - it's large enough as it is. Another problem it has caught was eventpoll() ABI mess; select.c and eventpoll.c assumed that corresponding POLL### and EPOLL### were equal. That's true for some, but not all of them - EPOLL### are arch-independent, but POLL### are not. The last commit in this series separates userland POLL### values from the (now arch-independent) kernel-side ones, converting between them in the few places where they are copied to/from userland. AFAICS, this is the least disruptive fix preserving poll(2) ABI and making epoll() work on all architectures. As it is, it's simply broken on sparc - try to give it EPOLLWRNORM and it will trigger only on what would've triggered EPOLLWRBAND on other architectures. EPOLLWRBAND and EPOLLRDHUP, OTOH, are never triggered at all on sparc. With this patch they should work consistently on all architectures" * 'misc.poll' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (37 commits) make kernel-side POLL... arch-independent eventpoll: no need to mask the result of epi_item_poll() again eventpoll: constify struct epoll_event pointers debugging printk in sg_poll() uses %x to print POLL... bitmap annotate poll(2) guts 9p: untangle ->poll() mess ->si_band gets POLL... bitmap stored into a user-visible long field ring_buffer_poll_wait() return value used as return value of ->poll() the rest of drivers/*: annotate ->poll() instances media: annotate ->poll() instances fs: annotate ->poll() instances ipc, kernel, mm: annotate ->poll() instances net: annotate ->poll() instances apparmor: annotate ->poll() instances tomoyo: annotate ->poll() instances sound: annotate ->poll() instances acpi: annotate ->poll() instances crypto: annotate ->poll() instances block: annotate ->poll() instances x86: annotate ->poll() instances ...
2018-01-29xfs: only grab shared inode locks for source file during reflinkDarrick J. Wong1-0/+5
Reflink and dedupe operations remap blocks from a source file into a destination file. The destination file needs exclusive locks on all levels because we're updating its block map, but the source file isn't undergoing any block map changes so we can use a shared lock. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-01-29fs: handle inode->i_version more efficientlyJeff Layton1-1/+1
Since i_version is mostly treated as an opaque value, we can exploit that fact to avoid incrementing it when no one is watching. With that change, we can avoid incrementing the counter on writes, unless someone has queried for it since it was last incremented. If the a/c/mtime don't change, and the i_version hasn't changed, then there's no need to dirty the inode metadata on a write. Convert the i_version counter to an atomic64_t, and use the lowest order bit to hold a flag that will tell whether anyone has queried the value since it was last incremented. When we go to maybe increment it, we fetch the value and check the flag bit. If it's clear then we don't need to do anything if the update isn't being forced. If we do need to update, then we increment the counter by 2, and clear the flag bit, and then use a CAS op to swap it into place. If that works, we return true. If it doesn't then do it again with the value that we fetch from the CAS operation. On the query side, if the flag is already set, then we just shift the value down by 1 bit and return it. Otherwise, we set the flag in our on-stack value and again use cmpxchg to swap it into place if it hasn't changed. If it has, then we use the value from the cmpxchg as the new "old" value and try again. This method allows us to avoid incrementing the counter on writes (and dirtying the metadata) under typical workloads. We only need to increment if it has been queried since it was last changed. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Dave Chinner <dchinner@redhat.com> Tested-by: Krzysztof Kozlowski <krzk@kernel.org>
2018-01-29fs: new API for handling inode->i_versionJeff Layton1-15/+0
Add a documentation blob that explains what the i_version field is, how it is expected to work, and how it is currently implemented by various filesystems. We already have inode_inc_iversion. Add several other functions for manipulating and accessing the i_version counter. For now, the implementation is trivial and basically works the way that all of the open-coded i_version accesses work today. Future patches will convert existing users of i_version to use the new API, and then convert the backend implementation to do things more efficiently. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz>
2018-01-26fs: fold __inode_permission() into inode_permission()Eric Biggers1-1/+0
Since commit 9c630ebefeee ("ovl: simplify permission checking"), overlayfs doesn't call __inode_permission() anymore, which leaves no users other than inode_permission(). So just fold it back into inode_permission(). Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-01-26fs: add RWF_APPENDJürg Billeter1-0/+2
This is the per-I/O equivalent of O_APPEND to support atomic append operations on any open file. If a file is opened with O_APPEND, pwrite() ignores the offset and always appends data to the end of the file. RWF_APPEND enables atomic append and pwrite() with offset on a single file descriptor. Signed-off-by: Jürg Billeter <j@bitron.ch> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-01-08iomap: report collisions between directio and buffered writes to userspaceDarrick J. Wong1-0/+1
If two programs simultaneously try to write to the same part of a file via direct IO and buffered IO, there's a chance that the post-diowrite pagecache invalidation will fail on the dirty page. When this happens, the dio write succeeded, which means that the page cache is no longer coherent with the disk! Programs are not supposed to mix IO types and this is a clear case of data corruption, so store an EIO which will be reflected to userspace during the next fsync. Replace the WARN_ON with a ratelimited pr_crit so that the developers have /some/ kind of breadcrumb to track down the offending program(s) and file(s) involved. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
2018-01-05new primitive: vfs_mkobj()Al Viro1-0/+4
Similar to vfs_create(), but with caller-supplied callback (and argument for it) to be used instead of ->create(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-12-26VFS: don't keep disconnected dentries on d_anonNeilBrown1-1/+1
The original purpose of the per-superblock d_anon list was to keep disconnected dentries in the cache between consecutive requests to the NFS server. Dentries can be disconnected if a client holds a file open and repeatedly performs IO on it, and if the server drops the dentry, whether due to memory pressure, server restart, or "echo 3 > /proc/sys/vm/drop_caches". This purpose was thwarted by commit 75a6f82a0d10 ("freeing unlinked file indefinitely delayed") which caused disconnected dentries to be freed as soon as their refcount reached zero. This means that, when a dentry being used by nfsd gets disconnected, a new one needs to be allocated for every request (unless requests overlap). As the dentry has no name, no parent, and no children, there is little of value to cache. As small memory allocations are typically fast (from per-cpu free lists) this likely has little cost. This means that the original purpose of s_anon is no longer relevant: there is no longer any need to keep disconnected dentries on a list so they appear to be hashed. However, s_anon now has a new use. When you mount an NFS filesystem, the dentry stored in s_root is just a placebo. The "real" root dentry is allocated using d_obtain_root() and so it kept on the s_anon list. I don't know the reason for this, but suspect it related to NFSv4 where a mount of "server:/some/path" require NFS to look up the root filehandle on the server, then walk down "/some" and "/path" to get the filehandle to mount. Whatever the reason, NFS depends on the s_anon list and on shrink_dcache_for_umount() pruning all dentries on this list. So we cannot simply remove s_anon. We could just leave the code unchanged, but apart from that being potentially confusing, the (unfair) bit-spin-lock which protects s_anon can become a bottle neck when lots of disconnected dentries are being created. So this patch renames s_anon to s_roots, and stops storing disconnected dentries on the list. Only dentries obtained with d_obtain_root() are now stored on this list. There are many fewer of these (only NFS and NILFS2 use the call, and only during filesystem mount) so contention on the bit-lock will not be a problem. Possibly an alternate solution should be found for NFS and NILFS2, but that would require understanding their needs first. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-11-30autofs: revert "autofs: fix AT_NO_AUTOMOUNT not being honored"Ian Kent1-1/+2
Commit 42f461482178 ("autofs: fix AT_NO_AUTOMOUNT not being honored") allowed the fstatat(2) system call to properly honor the AT_NO_AUTOMOUNT flag but introduced a semantic change. In order to honor AT_NO_AUTOMOUNT a semantic change was made to the negative dentry case for stat family system calls in follow_automount(). This changed the unconditional triggering of an automount in this case to no longer be done and an error returned instead. This has caused more problems than I expected so reverting the change is needed. In a discussion with Neil Brown it was concluded that the automount(8) daemon can implement this change without kernel modifications. So that will be done instead and the autofs module documentation updated with a description of the problem and what needs to be done by module users for this specific case. Link: http://lkml.kernel.org/r/151174730120.6162.3848002191530283984.stgit@pluto.themaw.net Fixes: 42f4614821 ("autofs: fix AT_NO_AUTOMOUNT not being honored") Signed-off-by: Ian Kent <raven@themaw.net> Cc: Neil Brown <neilb@suse.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: David Howells <dhowells@redhat.com> Cc: Colin Walters <walters@redhat.com> Cc: Ondrej Holy <oholy@redhat.com> Cc: <stable@vger.kernel.org> [4.11+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-30mm: introduce get_user_pages_longtermDan Williams1-0/+14
Patch series "introduce get_user_pages_longterm()", v2. Here is a new get_user_pages api for cases where a driver intends to keep an elevated page count indefinitely. This is distinct from usages like iov_iter_get_pages where the elevated page counts are transient. The iov_iter_get_pages cases immediately turn around and submit the pages to a device driver which will put_page when the i/o operation completes (under kernel control). In the longterm case userspace is responsible for dropping the page reference at some undefined point in the future. This is untenable for filesystem-dax case where the filesystem is in control of the lifetime of the block / page and needs reasonable limits on how long it can wait for pages in a mapping to become idle. Fixing filesystems to actually wait for dax pages to be idle before blocks from a truncate/hole-punch operation are repurposed is saved for a later patch series. Also, allowing longterm registration of dax mappings is a future patch series that introduces a "map with lease" semantic where the kernel can revoke a lease and force userspace to drop its page references. I have also tagged these for -stable to purposely break cases that might assume that longterm memory registrations for filesystem-dax mappings were supported by the kernel. The behavior regression this policy change implies is one of the reasons we maintain the "dax enabled. Warning: EXPERIMENTAL, use at your own risk" notification when mounting a filesystem in dax mode. It is worth noting the device-dax interface does not suffer the same constraints since it does not support file space management operations like hole-punch. This patch (of 4): Until there is a solution to the dma-to-dax vs truncate problem it is not safe to allow long standing memory registrations against filesytem-dax vmas. Device-dax vmas do not have this problem and are explicitly allowed. This is temporary until a "memory registration with layout-lease" mechanism can be implemented for the affected sub-systems (RDMA and V4L2). [akpm@linux-foundation.org: use kcalloc()] Link: http://lkml.kernel.org/r/151068939435.7446.13560129395419350737.stgit@dwillia2-desk3.amr.corp.intel.com Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings") Signed-off-by: Dan Williams <dan.j.williams@intel.com> Suggested-by: Christoph Hellwig <hch@lst.de> Cc: Doug Ledford <dledford@redhat.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Cc: Inki Dae <inki.dae@samsung.com> Cc: Jan Kara <jack@suse.cz> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Joonyoung Shim <jy0922.shim@samsung.com> Cc: Kyungmin Park <kyungmin.park@samsung.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Seung-Woo Kim <sw0312.kim@samsung.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-28->poll() methods should return __poll_tAl Viro1-1/+1
The most common place to find POLL... bitmaps: return values of ->poll() and its subsystem counterparts. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-11-28Rename superblock flags (MS_xyz -> SB_xyz)Linus Torvalds1-1/+1
This is a pure automated search-and-replace of the internal kernel superblock flags. The s_flags are now called SB_*, with the names and the values for the moment mirroring the MS_* flags that they're equivalent to. Note how the MS_xyz flags are the ones passed to the mount system call, while the SB_xyz flags are what we then use in sb->s_flags. The script to do this was: # places to look in; re security/*: it generally should *not* be # touched (that stuff parses mount(2) arguments directly), but # there are two places where we really deal with superblock flags. FILES="drivers/mtd drivers/staging/lustre fs ipc mm \ include/linux/fs.h include/uapi/linux/bfs_fs.h \ security/apparmor/apparmorfs.c security/apparmor/include/lib.h" # the list of MS_... constants SYMS="RDONLY NOSUID NODEV NOEXEC SYNCHRONOUS REMOUNT MANDLOCK \ DIRSYNC NOATIME NODIRATIME BIND MOVE REC VERBOSE SILENT \ POSIXACL UNBINDABLE PRIVATE SLAVE SHARED RELATIME KERNMOUNT \ I_VERSION STRICTATIME LAZYTIME SUBMOUNT NOREMOTELOCK NOSEC BORN \ ACTIVE NOUSER" SED_PROG= for i in $SYMS; do SED_PROG="$SED_PROG -e s/MS_$i/SB_$i/g"; done # we want files that contain at least one of MS_..., # with fs/namespace.c and fs/pnode.c excluded. L=$(for i in $SYMS; do git grep -w -l MS_$i $FILES; done| sort|uniq|grep -v '^fs/namespace.c'|grep -v '^fs/pnode.c') for f in $L; do sed -i $f $SED_PROG; done Requested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-18Merge tag 'nfsd-4.15' of git://linux-nfs.org/~bfields/linuxLinus Torvalds1-2/+2
Pull nfsd updates from Bruce Fields: "Lots of good bugfixes, including: - fix a number of races in the NFSv4+ state code - fix some shutdown crashes in multiple-network-namespace cases - relax our 4.1 session limits; if you've an artificially low limit to the number of 4.1 clients that can mount simultaneously, try upgrading" * tag 'nfsd-4.15' of git://linux-nfs.org/~bfields/linux: (22 commits) SUNRPC: Improve ordering of transport processing nfsd: deal with revoked delegations appropriately svcrdma: Enqueue after setting XPT_CLOSE in completion handlers nfsd: use nfs->ns.inum as net ID rpc: remove some BUG()s svcrdma: Preserve CB send buffer across retransmits nfds: avoid gettimeofday for nfssvc_boot time fs, nfsd: convert nfs4_file.fi_ref from atomic_t to refcount_t fs, nfsd: convert nfs4_cntl_odstate.co_odcount from atomic_t to refcount_t fs, nfsd: convert nfs4_stid.sc_count from atomic_t to refcount_t lockd: double unregister of inetaddr notifiers nfsd4: catch some false session retries nfsd4: fix cached replies to solo SEQUENCE compounds sunrcp: make function _svc_create_xprt static SUNRPC: Fix tracepoint storage issues with svc_recv and svc_rqst_status nfsd: use ARRAY_SIZE nfsd: give out fewer session slots as limit approaches nfsd: increase DRC cache limit nfsd: remove unnecessary nofilehandle checks nfs_common: convert int to bool ...
2017-11-17Merge branch 'work.misc' of ↵Linus Torvalds1-1/+16
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc vfs updates from Al Viro: "Assorted stuff, really no common topic here" * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: vfs: grab the lock instead of blocking in __fd_install during resizing vfs: stop clearing close on exec when closing a fd include/linux/fs.h: fix comment about struct address_space fs: make fiemap work from compat_ioctl coda: fix 'kernel memory exposure attempt' in fsync pstore: remove unneeded unlikely() vfs: remove unneeded unlikely() stubs for mount_bdev() and kill_block_super() in !CONFIG_BLOCK case make vfs_ustat() static do_handle_open() should be static elf_fdpic: fix unused variable warning fold destroy_super() into __put_super() new helper: destroy_unused_super() fix address space warnings in ipc/ acct.h: get rid of detritus
2017-11-17Merge tag 'libnvdimm-for-4.15' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm and dax updates from Dan Williams: "Save for a few late fixes, all of these commits have shipped in -next releases since before the merge window opened, and 0day has given a build success notification. The ext4 touches came from Jan, and the xfs touches have Darrick's reviewed-by. An xfstest for the MAP_SYNC feature has been through a few round of reviews and is on track to be merged. - Introduce MAP_SYNC and MAP_SHARED_VALIDATE, a mechanism to enable 'userspace flush' of persistent memory updates via filesystem-dax mappings. It arranges for any filesystem metadata updates that may be required to satisfy a write fault to also be flushed ("on disk") before the kernel returns to userspace from the fault handler. Effectively every write-fault that dirties metadata completes an fsync() before returning from the fault handler. The new MAP_SHARED_VALIDATE mapping type guarantees that the MAP_SYNC flag is validated as supported by the filesystem's ->mmap() file operation. - Add support for the standard ACPI 6.2 label access methods that replace the NVDIMM_FAMILY_INTEL (vendor specific) label methods. This enables interoperability with environments that only implement the standardized methods. - Add support for the ACPI 6.2 NVDIMM media error injection methods. - Add support for the NVDIMM_FAMILY_INTEL v1.6 DIMM commands for latch last shutdown status, firmware update, SMART error injection, and SMART alarm threshold control. - Cleanup physical address information disclosures to be root-only. - Fix revalidation of the DIMM "locked label area" status to support dynamic unlock of the label area. - Expand unit test infrastructure to mock the ACPI 6.2 Translate SPA (system-physical-address) command and error injection commands. Acknowledgements that came after the commits were pushed to -next: - 957ac8c421ad ("dax: fix PMD faults on zero-length files"): Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> - a39e596baa07 ("xfs: support for synchronous DAX faults") and 7b565c9f965b ("xfs: Implement xfs_filemap_pfn_mkwrite() using __xfs_filemap_fault()") Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>" * tag 'libnvdimm-for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (49 commits) acpi, nfit: add 'Enable Latch System Shutdown Status' command support dax: fix general protection fault in dax_alloc_inode dax: fix PMD faults on zero-length files dax: stop requiring a live device for dax_flush() brd: remove dax support dax: quiet bdev_dax_supported() fs, dax: unify IOMAP_F_DIRTY read vs write handling policy in the dax core tools/testing/nvdimm: unit test clear-error commands acpi, nfit: validate commands against the device type tools/testing/nvdimm: stricter bounds checking for error injection commands xfs: support for synchronous DAX faults xfs: Implement xfs_filemap_pfn_mkwrite() using __xfs_filemap_fault() ext4: Support for synchronous DAX faults ext4: Simplify error handling in ext4_dax_huge_fault() dax: Implement dax_finish_sync_fault() dax, iomap: Add support for synchronous faults mm: Define MAP_SYNC and VM_SYNC flags dax: Allow tuning whether dax_insert_mapping_entry() dirties entry dax: Allow dax_iomap_fault() to return pfn dax: Fix comment describing dax_iomap_fault() ...
2017-11-14Merge tag 'fscrypt-for-linus' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt Pull fscrypt updates from Ted Ts'o: "Lots of cleanups, mostly courtesy by Eric Biggers" * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt: fscrypt: lock mutex before checking for bounce page pool fscrypt: add a documentation file for filesystem-level encryption ext4: switch to fscrypt_prepare_setattr() ext4: switch to fscrypt_prepare_lookup() ext4: switch to fscrypt_prepare_rename() ext4: switch to fscrypt_prepare_link() ext4: switch to fscrypt_file_open() fscrypt: new helper function - fscrypt_prepare_setattr() fscrypt: new helper function - fscrypt_prepare_lookup() fscrypt: new helper function - fscrypt_prepare_rename() fscrypt: new helper function - fscrypt_prepare_link() fscrypt: new helper function - fscrypt_file_open() fscrypt: new helper function - fscrypt_require_key() fscrypt: remove unneeded empty fscrypt_operations structs fscrypt: remove ->is_encrypted() fscrypt: switch from ->is_encrypted() to IS_ENCRYPTED() fs, fscrypt: add an S_ENCRYPTED inode flag fscrypt: clean up include file mess
2017-11-13Merge branch 'next-integrity' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security subsystem integrity updates from James Morris: "There is a mixture of bug fixes, code cleanup, preparatory code for new functionality and new functionality. Commit 26ddabfe96bb ("evm: enable EVM when X509 certificate is loaded") enabled EVM without loading a symmetric key, but was limited to defining the x509 certificate pathname at build. Included in this set of patches is the ability of enabling EVM, without loading the EVM symmetric key, from userspace. New is the ability to prevent the loading of an EVM symmetric key." * 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: ima: Remove redundant conditional operator ima: Fix bool initialization/comparison ima: check signature enforcement against cmdline param instead of CONFIG module: export module signature enforcement status ima: fix hash algorithm initialization EVM: Only complain about a missing HMAC key once EVM: Allow userspace to signal an RSA key has been loaded EVM: Include security.apparmor in EVM measurements ima: call ima_file_free() prior to calling fasync integrity: use kernel_read_file_from_path() to read x509 certs ima: always measure and audit files in policy ima: don't remove the securityfs policy file vfs: fix mounting a filesystem with i_version
2017-11-08integrity: use kernel_read_file_from_path() to read x509 certsChristoph Hellwig1-0/+1
The CONFIG_IMA_LOAD_X509 and CONFIG_EVM_LOAD_X509 options permit loading x509 signed certificates onto the trusted keyrings without verifying the x509 certificate file's signature. This patch replaces the call to the integrity_read_file() specific function with the common kernel_read_file_from_path() function. To avoid verifying the file signature, this patch defines READING_X509_CERTFICATE. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
2017-11-06include/linux/fs.h: fix comment about struct address_spaceMike Rapoport1-1/+1
Before commit 9c5d760b8d22 ("mm: split gfp_mask and mapping flags into separate fields") the private_* fields of struct adrress_space were grouped together and using "ditto" in comments describing the last fields was correct. With introduction of gpf_mask between private_lock and private_list "ditto" references the wrong description. Fix it by using the elaborate description. Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-11-03mm: introduce MAP_SHARED_VALIDATE, a mechanism to safely define new mmap flagsDan Williams1-0/+1
The mmap(2) syscall suffers from the ABI anti-pattern of not validating unknown flags. However, proposals like MAP_SYNC need a mechanism to define new behavior that is known to fail on older kernels without the support. Define a new MAP_SHARED_VALIDATE flag pattern that is guaranteed to fail on all legacy mmap implementations. It is worth noting that the original proposal was for a standalone MAP_VALIDATE flag. However, when that could not be supported by all archs Linus observed: I see why you *think* you want a bitmap. You think you want a bitmap because you want to make MAP_VALIDATE be part of MAP_SYNC etc, so that people can do ret = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_SYNC, fd, 0); and "know" that MAP_SYNC actually takes. And I'm saying that whole wish is bogus. You're fundamentally depending on special semantics, just make it explicit. It's already not portable, so don't try to make it so. Rename that MAP_VALIDATE as MAP_SHARED_VALIDATE, make it have a value of 0x3, and make people do ret = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0); and then the kernel side is easier too (none of that random garbage playing games with looking at the "MAP_VALIDATE bit", but just another case statement in that map type thing. Boom. Done. Similar to ->fallocate() we also want the ability to validate the support for new flags on a per ->mmap() 'struct file_operations' instance basis. Towards that end arrange for flags to be generically validated against a mmap_supported_flags exported by 'struct file_operations'. By default all existing flags are implicitly supported, but new flags require MAP_SHARED_VALIDATE and per-instance-opt-in. Cc: Jan Kara <jack@suse.cz> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Suggested-by: Christoph Hellwig <hch@lst.de> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-11-02License cleanup: add SPDX GPL-2.0 license identifier to files with no licenseGreg Kroah-Hartman1-0/+1
Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-19fs, fscrypt: add an S_ENCRYPTED inode flagEric Biggers1-0/+2
Introduce a flag S_ENCRYPTED which can be set in ->i_flags to indicate that the inode is encrypted using the fscrypt (fs/crypto/) mechanism. Checking this flag will give the same information that inode->i_sb->s_cop->is_encrypted(inode) currently does, but will be more efficient. This will be useful for adding higher-level helper functions for filesystems to use. For example we'll be able to replace this: if (ext4_encrypted_inode(inode)) { ret = fscrypt_get_encryption_info(inode); if (ret) return ret; if (!fscrypt_has_encryption_key(inode)) return -ENOKEY; } with this: ret = fscrypt_require_key(inode); if (ret) return ret; ... since we'll be able to retain the fast path for unencrypted files as a single flag check, using an inline function. This wasn't possible before because we'd have had to frequently call through the ->i_sb->s_cop->is_encrypted function pointer, even when the encryption support was disabled or not being used. Note: we don't define S_ENCRYPTED to 0 if CONFIG_FS_ENCRYPTION is disabled because we want to continue to return an error if an encrypted file is accessed without encryption support, rather than pretending that it is unencrypted. Reviewed-by: Chao Yu <yuchao0@huawei.com> Acked-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-10-15stubs for mount_bdev() and kill_block_super() in !CONFIG_BLOCK caseAl Viro1-0/+16
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-10-15make vfs_ustat() staticAl Viro1-1/+0
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-10-04include/linux/fs.h: fix comment about struct address_spaceMike Rapoport1-1/+1
Before commit 9c5d760b8d22 ("mm: split gfp_mask and mapping flags into separate fields") the private_* fields of struct adrress_space were grouped together and using "ditto" in comments describing the last fields was correct. With introduction of gpf_mask between private_lock and private_list "ditto" references the wrong description. Fix it by using the elaborate description. Link: http://lkml.kernel.org/r/1507009987-8746-1-git-send-email-rppt@linux.vnet.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-26nfs_common: convert int to boolCorentin Labbe1-2/+2
Since __state_in_grace return only true/false, make it return bool instead of int. Same change for the two user of it, locks_in_grace/opens_in_grace Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-09-15vfs: constify path argument to kernel_read_file_from_pathMimi Zohar1-1/+1
This patch constifies the path argument to kernel_read_file_from_path(). Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-15Merge branch 'work.read_write' of ↵Linus Torvalds1-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull nowait read support from Al Viro: "Support IOCB_NOWAIT for buffered reads and block devices" * 'work.read_write' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: block_dev: support RFW_NOWAIT on block device nodes fs: support RWF_NOWAIT for buffered reads fs: support IOCB_NOWAIT in generic_file_buffered_read fs: pass iocb to do_generic_file_read
2017-09-15Merge branch 'work.mount' of ↵Linus Torvalds1-10/+38
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull mount flag updates from Al Viro: "Another chunk of fmount preparations from dhowells; only trivial conflicts for that part. It separates MS_... bits (very grotty mount(2) ABI) from the struct super_block ->s_flags (kernel-internal, only a small subset of MS_... stuff). This does *not* convert the filesystems to new constants; only the infrastructure is done here. The next step in that series is where the conflicts would be; that's the conversion of filesystems. It's purely mechanical and it's better done after the merge, so if you could run something like list=$(for i in MS_RDONLY MS_NOSUID MS_NODEV MS_NOEXEC MS_SYNCHRONOUS MS_MANDLOCK MS_DIRSYNC MS_NOATIME MS_NODIRATIME MS_SILENT MS_POSIXACL MS_KERNMOUNT MS_I_VERSION MS_LAZYTIME; do git grep -l $i fs drivers/staging/lustre drivers/mtd ipc mm include/linux; done|sort|uniq|grep -v '^fs/namespace.c$') sed -i -e 's/\<MS_RDONLY\>/SB_RDONLY/g' \ -e 's/\<MS_NOSUID\>/SB_NOSUID/g' \ -e 's/\<MS_NODEV\>/SB_NODEV/g' \ -e 's/\<MS_NOEXEC\>/SB_NOEXEC/g' \ -e 's/\<MS_SYNCHRONOUS\>/SB_SYNCHRONOUS/g' \ -e 's/\<MS_MANDLOCK\>/SB_MANDLOCK/g' \ -e 's/\<MS_DIRSYNC\>/SB_DIRSYNC/g' \ -e 's/\<MS_NOATIME\>/SB_NOATIME/g' \ -e 's/\<MS_NODIRATIME\>/SB_NODIRATIME/g' \ -e 's/\<MS_SILENT\>/SB_SILENT/g' \ -e 's/\<MS_POSIXACL\>/SB_POSIXACL/g' \ -e 's/\<MS_KERNMOUNT\>/SB_KERNMOUNT/g' \ -e 's/\<MS_I_VERSION\>/SB_I_VERSION/g' \ -e 's/\<MS_LAZYTIME\>/SB_LAZYTIME/g' \ $list and commit it with something along the lines of 'convert filesystems away from use of MS_... constants' as commit message, it would save a quite a bit of headache next cycle" * 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: VFS: Differentiate mount flags (MS_*) from internal superblock flags VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb) vfs: Add sb_rdonly(sb) to query the MS_RDONLY flag on s_flags
2017-09-15Merge branch 'work.set_fs' of ↵Linus Torvalds1-6/+3
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull more set_fs removal from Al Viro: "Christoph's 'use kernel_read and friends rather than open-coding set_fs()' series" * 'work.set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs: unexport vfs_readv and vfs_writev fs: unexport vfs_read and vfs_write fs: unexport __vfs_read/__vfs_write lustre: switch to kernel_write gadget/f_mass_storage: stop messing with the address limit mconsole: switch to kernel_read btrfs: switch write_buf to kernel_write net/9p: switch p9_fd_read to kernel_write mm/nommu: switch do_mmap_private to kernel_read serial2002: switch serial2002_tty_write to kernel_{read/write} fs: make the buf argument to __kernel_write a void pointer fs: fix kernel_write prototype fs: fix kernel_read prototype fs: move kernel_read to fs/read_write.c fs: move kernel_write to fs/read_write.c autofs4: switch autofs4_write to __kernel_write ashmem: switch to ->read_iter
2017-09-13Merge branch 'overlayfs-linus' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs updates from Miklos Szeredi: "This fixes d_ino correctness in readdir, which brings overlayfs on par with normal filesystems regarding inode number semantics, as long as all layers are on the same filesystem. There are also some bug fixes, one in particular (random ioctl's shouldn't be able to modify lower layers) that touches some vfs code, but of course no-op for non-overlay fs" * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: fix false positive ESTALE on lookup ovl: don't allow writing ioctl on lower layer ovl: fix relatime for directories vfs: add flags to d_real() ovl: cleanup d_real for negative ovl: constant d_ino for non-merge dirs ovl: constant d_ino across copy up ovl: fix readdir error value ovl: check snprintf return
2017-09-09autofs: fix AT_NO_AUTOMOUNT not being honoredIan Kent1-2/+1
The fstatat(2) and statx() calls can pass the flag AT_NO_AUTOMOUNT which is meant to clear the LOOKUP_AUTOMOUNT flag and prevent triggering of an automount by the call. But this flag is unconditionally cleared for all stat family system calls except statx(). stat family system calls have always triggered mount requests for the negative dentry case in follow_automount() which is intended but prevents the fstatat(2) and statx() AT_NO_AUTOMOUNT case from being handled. In order to handle the AT_NO_AUTOMOUNT for both system calls the negative dentry case in follow_automount() needs to be changed to return ENOENT when the LOOKUP_AUTOMOUNT flag is clear (and the other required flags are clear). AFAICT this change doesn't have any noticable side effects and may, in some use cases (although I didn't see it in testing) prevent unnecessary callbacks to the automount daemon. It's also possible that a stat family call has been made with a path that is in the process of being mounted by some other process. But stat family calls should return the automount state of the path as it is "now" so it shouldn't wait for mount completion. This is the same semantic as the positive dentry case already handled. Link: http://lkml.kernel.org/r/150216641255.11652.4204561328197919771.stgit@pluto.themaw.net Fixes: deccf497d804a4c5fca ("Make stat/lstat/fstatat pass AT_NO_AUTOMOUNT to vfs_statx()") Signed-off-by: Ian Kent <raven@themaw.net> Cc: David Howells <dhowells@redhat.com> Cc: Colin Walters <walters@redhat.com> Cc: Ondrej Holy <oholy@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-09lib/interval_tree: fast overlap detectionDavidlohr Bueso1-2/+2
Allow interval trees to quickly check for overlaps to avoid unnecesary tree lookups in interval_tree_iter_first(). As of this patch, all interval tree flavors will require using a 'rb_root_cached' such that we can have the leftmost node easily available. While most users will make use of this feature, those with special functions (in addition to the generic insert, delete, search calls) will avoid using the cached option as they can do funky things with insertions -- for example, vma_interval_tree_insert_after(). [jglisse@redhat.com: fix deadlock from typo vm_lock_anon_vma()] Link: http://lkml.kernel.org/r/20170808225719.20723-1-jglisse@redhat.com Link: http://lkml.kernel.org/r/20170719014603.19029-12-dave@stgolabs.net Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Doug Ledford <dledford@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: David Airlie <airlied@linux.ie> Cc: Jason Wang <jasowang@redhat.com> Cc: Christian Benvenuti <benve@cisco.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-08Merge branch 'quota_scaling' of ↵Linus Torvalds1-0/+4
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull quota scaling updates from Jan Kara: "This contains changes to make the quota subsystem more scalable. Reportedly it improves number of files created per second on ext4 filesystem on fast storage by about a factor of 2x" * 'quota_scaling' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: (28 commits) quota: Add lock annotations to struct members quota: Reduce contention on dq_data_lock fs: Provide __inode_get_bytes() quota: Inline dquot_[re]claim_reserved_space() into callsite quota: Inline inode_{incr,decr}_space() into callsites quota: Inline functions into their callsites ext4: Disable dirty list tracking of dquots when journalling quotas quota: Allow disabling tracking of dirty dquots in a list quota: Remove dq_wait_unused from dquot quota: Move locking into clear_dquot_dirty() quota: Do not dirty bad dquots quota: Fix possible corruption of dqi_flags quota: Propagate ->quota_read errors from v2_read_file_info() quota: Fix error codes in v2_read_file_info() quota: Push dqio_sem down to ->read_file_info() quota: Push dqio_sem down to ->write_file_info() quota: Push dqio_sem down to ->get_next_id() quota: Push dqio_sem down to ->release_dqblk() quota: Remove locking for writing to the old quota format quota: Do not acquire dqio_sem for dquot overwrites in v2 format ...
2017-09-07Merge branch 'for-4.14/block' of git://git.kernel.dk/linux-blockLinus Torvalds1-0/+1
Pull block layer updates from Jens Axboe: "This is the first pull request for 4.14, containing most of the code changes. It's a quiet series this round, which I think we needed after the churn of the last few series. This contains: - Fix for a registration race in loop, from Anton Volkov. - Overflow complaint fix from Arnd for DAC960. - Series of drbd changes from the usual suspects. - Conversion of the stec/skd driver to blk-mq. From Bart. - A few BFQ improvements/fixes from Paolo. - CFQ improvement from Ritesh, allowing idling for group idle. - A few fixes found by Dan's smatch, courtesy of Dan. - A warning fixup for a race between changing the IO scheduler and device remova. From David Jeffery. - A few nbd fixes from Josef. - Support for cgroup info in blktrace, from Shaohua. - Also from Shaohua, new features in the null_blk driver to allow it to actually hold data, among other things. - Various corner cases and error handling fixes from Weiping Zhang. - Improvements to the IO stats tracking for blk-mq from me. Can drastically improve performance for fast devices and/or big machines. - Series from Christoph removing bi_bdev as being needed for IO submission, in preparation for nvme multipathing code. - Series from Bart, including various cleanups and fixes for switch fall through case complaints" * 'for-4.14/block' of git://git.kernel.dk/linux-block: (162 commits) kernfs: checking for IS_ERR() instead of NULL drbd: remove BIOSET_NEED_RESCUER flag from drbd_{md_,}io_bio_set drbd: Fix allyesconfig build, fix recent commit drbd: switch from kmalloc() to kmalloc_array() drbd: abort drbd_start_resync if there is no connection drbd: move global variables to drbd namespace and make some static drbd: rename "usermode_helper" to "drbd_usermode_helper" drbd: fix race between handshake and admin disconnect/down drbd: fix potential deadlock when trying to detach during handshake drbd: A single dot should be put into a sequence. drbd: fix rmmod cleanup, remove _all_ debugfs entries drbd: Use setup_timer() instead of init_timer() to simplify the code. drbd: fix potential get_ldev/put_ldev refcount imbalance during attach drbd: new disk-option disable-write-same drbd: Fix resource role for newly created resources in events2 drbd: mark symbols static where possible drbd: Send P_NEG_ACK upon write error in protocol != C drbd: add explicit plugging when submitting batches drbd: change list_for_each_safe to while(list_first_entry_or_null) drbd: introduce drbd_recv_header_maybe_unplug ...
2017-09-07Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-2/+0
Merge updates from Andrew Morton: - various misc bits - DAX updates - OCFS2 - most of MM * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (119 commits) mm,fork: introduce MADV_WIPEONFORK x86,mpx: make mpx depend on x86-64 to free up VMA flag mm: add /proc/pid/smaps_rollup mm: hugetlb: clear target sub-page last when clearing huge page mm: oom: let oom_reap_task and exit_mmap run concurrently swap: choose swap device according to numa node mm: replace TIF_MEMDIE checks by tsk_is_oom_victim mm, oom: do not rely on TIF_MEMDIE for memory reserves access z3fold: use per-cpu unbuddied lists mm, swap: don't use VMA based swap readahead if HDD is used as swap mm, swap: add sysfs interface for VMA based swap readahead mm, swap: VMA based swap readahead mm, swap: fix swap readahead marking mm, swap: add swap readahead hit statistics mm/vmalloc.c: don't reinvent the wheel but use existing llist API mm/vmstat.c: fix wrong comment selftests/memfd: add memfd_create hugetlbfs selftest mm/shmem: add hugetlbfs support to memfd_create() mm, devm_memremap_pages: use multi-order radix for ZONE_DEVICE lookups mm/vmalloc.c: halve the number of comparisons performed in pcpu_get_vm_areas() ...
2017-09-07include/linux/fs.h: remove unneeded forward definition of mm_structJeff Layton1-2/+0
Link: http://lkml.kernel.org/r/20170525102927.6163-1-jlayton@redhat.com Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-07Merge tag 'wberr-v4.14-1' of ↵Linus Torvalds1-4/+16
git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux Pull writeback error handling updates from Jeff Layton: "This pile continues the work from last cycle on better tracking writeback errors. In v4.13 we added some basic errseq_t infrastructure and converted a few filesystems to use it. This set continues refining that infrastructure, adds documentation, and converts most of the other filesystems to use it. The main exception at this point is the NFS client" * tag 'wberr-v4.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux: ecryptfs: convert to file_write_and_wait in ->fsync mm: remove optimizations based on i_size in mapping writeback waits fs: convert a pile of fsync routines to errseq_t based reporting gfs2: convert to errseq_t based writeback error reporting for fsync fs: convert sync_file_range to use errseq_t based error-tracking mm: add file_fdatawait_range and file_write_and_wait fuse: convert to errseq_t based error tracking for fsync mm: consolidate dax / non-dax checks for writeback Documentation: add some docs for errseq_t errseq: rename __errseq_set to errseq_set
2017-09-06Merge tag 'locks-v4.14-1' of ↵Linus Torvalds1-1/+0
git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux Pull file locking updates from Jeff Layton: "This pile just has a few file locking fixes from Ben Coddington. There are a couple of cleanup patches + an attempt to bring sanity to the l_pid value that is reported back to userland on an F_GETLK request. After a few gyrations, he came up with a way for filesystems to communicate to the VFS layer code whether the pid should be translated according to the namespace or presented as-is to userland" * tag 'locks-v4.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux: locks: restore a warn for leaked locks on close fs/locks: Remove fl_nspid and use fs-specific l_pid for remote locks fs/locks: Use allocation rather than the stack in fcntl_getlk()
2017-09-06Merge tag 'xfs-4.14-merge-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds1-0/+1
Pull XFS updates from Darrick Wong: "Here are the changes for xfs for 4.14. Most of these are cleanups and fixes for bad behavior, as we're mostly focusing on improving reliablity this cycle (read: there's potentially a lot of stuff on the horizon for 4.15 so better to spend a few weeks killing other bugs now). Summary: - Write unmount record for a ro mount to avoid unnecessary log replay - Clean up orphaned inodes when mounting fs readonly - Resubmit inode log items when buffer writeback fails to avoid umount hang - Fix log recovery corruption problems when log headers wrap around the end - Avoid infinite loop searching for free inodes when inode counters are wrong - Evict inodes involved with log redo so that we don't leak them later - Fix a potential race between reclaim and inode cluster freeing - Refactor the inode joining code w.r.t. transaction rolling & deferred ops - Fix a bug where the log doesn't properly deal with dirty buffers that are about to become ordered buffers - Fix the extent swap code to deal with making dirty buffers ordered properly - Consolidate page fault handlers - Refactor the incore extent manipulation functions to use the iext abstractions instead of directly modifying with extent data - Disable crashy chattr +/-x until we fix it - Don't allow us to set S_DAX for v2 inodes - Various cleanups - Clarify some documentation - Fix a problem where fsync and a log commit race to send the disk a flush command, resulting in a small window where power fail data loss could occur - Simplify some rmap operations in the fcollapse code - Fix some use-after-free problems in async writeback" * tag 'xfs-4.14-merge-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (44 commits) xfs: use kmem_free to free return value of kmem_zalloc xfs: open code end_buffer_async_write in xfs_finish_page_writeback xfs: don't set v3 xflags for v2 inodes xfs: fix compiler warnings fsmap: fix documentation of FMR_OF_LAST xfs: simplify the rmap code in xfs_bmse_merge xfs: remove unused flags arg from xfs_file_iomap_begin_delay xfs: fix incorrect log_flushed on fsync xfs: disable per-inode DAX flag xfs: replace xfs_qm_get_rtblks with a direct call to xfs_bmap_count_leaves xfs: rewrite xfs_bmap_count_leaves using xfs_iext_get_extent xfs: use xfs_iext_*_extent helpers in xfs_bmap_split_extent_at xfs: use xfs_iext_*_extent helpers in xfs_bmap_shift_extents xfs: move some code around inside xfs_bmap_shift_extents xfs: use xfs_iext_get_extent in xfs_bmap_first_unused xfs: switch xfs_bmap_local_to_extents to use xfs_iext_insert xfs: add a xfs_iext_update_extent helper xfs: consolidate the various page fault handlers iomap: return VM_FAULT_* codes from iomap_page_mkwrite xfs: relog dirty buffers during swapext bmbt owner change ...
2017-09-05Merge tag 'char-misc-4.14-rc1' of ↵Linus Torvalds1-3/+7
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver updates from Greg KH: "Here is the big char/misc driver update for 4.14-rc1. Lots of different stuff in here, it's been an active development cycle for some reason. Highlights are: - updated binder driver, this brings binder up to date with what shipped in the Android O release, plus some more changes that happened since then that are in the Android development trees. - coresight updates and fixes - mux driver file renames to be a bit "nicer" - intel_th driver updates - normal set of hyper-v updates and changes - small fpga subsystem and driver updates - lots of const code changes all over the driver trees - extcon driver updates - fmc driver subsystem upadates - w1 subsystem minor reworks and new features and drivers added - spmi driver updates Plus a smattering of other minor driver updates and fixes. All of these have been in linux-next with no reported issues for a while" * tag 'char-misc-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (244 commits) ANDROID: binder: don't queue async transactions to thread. ANDROID: binder: don't enqueue death notifications to thread todo. ANDROID: binder: Don't BUG_ON(!spin_is_locked()). ANDROID: binder: Add BINDER_GET_NODE_DEBUG_INFO ioctl ANDROID: binder: push new transactions to waiting threads. ANDROID: binder: remove proc waitqueue android: binder: Add page usage in binder stats android: binder: fixup crash introduced by moving buffer hdr drivers: w1: add hwmon temp support for w1_therm drivers: w1: refactor w1_slave_show to make the temp reading functionality separate drivers: w1: add hwmon support structures eeprom: idt_89hpesx: Support both ACPI and OF probing mcb: Fix an error handling path in 'chameleon_parse_cells()' MCB: add support for SC31 to mcb-lpc mux: make device_type const char: virtio: constify attribute_group structures. Documentation/ABI: document the nvmem sysfs files lkdtm: fix spelling mistake: "incremeted" -> "incremented" perf: cs-etm: Fix ETMv4 CONFIGR entry in perf.data file nvmem: include linux/err.h from header ...
2017-09-05fs: unexport vfs_readv and vfs_writevChristoph Hellwig1-2/+0
We've got no modular users left, and any potential modular user is better of with iov_iter based variants. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-09-05fs: unexport __vfs_read/__vfs_writeChristoph Hellwig1-1/+0
No modular users left, and any new ones should use kernel_read/write or iov_iter variants instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>