Age | Commit message (Collapse) | Author | Files | Lines |
|
send_mds_reconnect() may call discard_cap_releases() after all
release messages have been dropped by cleanup_cap_releases()
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
|
|
Remove unsupported symlink operations.
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
|
|
When adjusting caps client wants, MDS does not record caps that are
not allowed. For non-auth MDS, it does not record WR caps. So when
a MDS reply changes a non-auth cap to auth cap, client needs to set
cap's mds_wanted according to the reply.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
|
|
flock and posix lock should use fl->fl_file instead of process ID
as owner identifier. (posix lock uses fl->fl_owner. fl->fl_owner
is usually equal to fl->fl_file, but it also can be a customized
value). The process ID of who holds the lock is just for F_GETLK
fcntl(2).
The fix is rename the 'pid' fields of struct ceph_mds_request_args
and struct ceph_filelock to 'owner', rename 'pid_namespace' fields
to 'pid'. Assign fl->fl_file to the 'owner' field of lock messages.
We also set the most significant bit of the 'owner' field. MDS can
use that bit to distinguish between old and new clients.
The MDS counterpart of this patch modifies the flock code to not
take the 'pid_namespace' into consideration when checking conflict
locks.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
|
|
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
|
|
VFS does not directly pass flock's operation code to filesystem's
flock callback. It translates the operation code to the form how
posix lock's parameters are presented.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
|
|
handle following sequence of events:
- client releases a inode with i_max_size > 0. The release message
is queued. (is not sent to the auth MDS)
- a 'lookup' request reply from non-auth MDS returns the same inode.
- client opens the inode in write mode. The version of inode trace
in 'open' request reply is equal to the cached inode's version.
- client requests new max size. The MDS ignores the request because
it does not affect client's write range
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
|
|
Only auth MDS can issue write caps to clients, so don't consider
write caps registered with non-auth MDS as valid.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
|
|
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
|
|
Use the newly introduced LOOKUPNAME MDS request to connect child
inode to its parent directory.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
|
|
ceph_fh_to_parent() returns dentry that corresponds to the 'ino' field
of struct ceph_nfs_confh. This is wrong, it should return dentry that
corresponds to the 'parent_ino' field.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
|
|
The callback uses LOOKUPPARENT MDS request to find parent.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
|
|
MDS handles LOOKUPHASH and LOOKUPINO MDS requests in the same way.
So __cfh_to_dentry() is redundant.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
|
|
The object store limit needs to be updated after writing,
and this can be done provided the corresponding object has already
been initialized. Current object initialization is done asynchrously,
which introduce a race if a file is opened, then immediately followed
by a writing, the initialization may have not completed, the code will
reach the ASSERT in fscache_submit_exclusive_op() to cause kernel
bug.
Tested-by: Milosz Tanski <milosz@adfin.com>
Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Signed-off-by: Min Chen <minchen@ubuntukylin.com>
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
|
|
Synchronize object->store_limit[_l] with new inode->i_size after file writing.
Tested-by: Milosz Tanski <milosz@adfin.com>
Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Signed-off-by: Min Chen <minchen@ubuntukylin.com>
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
|
|
Add an interface to explicitly synchronize object->store_limit[_l]
with inode->i_size
Tested-by: Milosz Tanski <milosz@adfin.com>
Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Signed-off-by: Min Chen <minchen@ubuntukylin.com>
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
|
|
This is racy--we do not know whather d_parent has changed out from
underneath us because i_mutex is not held on the source inode's directory.
Also, taking this reference is useless.
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>
|
|
Do not assume that r_old_dentry implies that r_old_dentry_dir is also
true. Separate out the ref cleanup and make the debugs dump behave when
it is NULL.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>
|
|
The fsync(dirfd) only covers namespace operations, not inode updates.
We do not need to cover setattr variants or O_TRUNC.
Reported-by: Al Viro <viro@xeniv.linux.org.uk>
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>
|
|
This is just old_dir; no reason to abuse the dcache pointers.
Reported-by: Al Viro <viro.zeniv.linux.org.uk>
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>
|
|
If readdir 'frag' is adjusted, readdir 'offset' should be reset.
Otherwise some dentries may be lost when readdir and fragmenting
directory happen at the some.
Another way to fix this issue is let MDS adjust readdir 'frag'.
The code that handles MDS reply reset the readdir 'offset' if
the readdir reply is different than the requested one.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
|
|
When changing readdir postion, fi->next_offset should be set to 0
if the new postion is not in the first dirfrag.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Alex Elder <elder@linaro.org>
|
|
Comparing offset with inode->i_sb->s_maxbytes doesn't make sense for
directory. For a fragmented directory, offset (frag_t, off) can be
larger than inode->i_sb->s_maxbytes.
At the very beginning of ceph_dir_llseek(), local variable old_offset
is initialized to parameter offset. This doesn't make sense neither.
Old_offset should be ceph_make_fpos(fi->frag, fi->next_offset).
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Alex Elder <elder@linaro.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
"Switch mnt_hash to hlist, turning the races between __lookup_mnt() and
hash modifications into false negatives from __lookup_mnt() (instead
of hangs)"
On the false negatives from __lookup_mnt():
"The *only* thing we care about is not getting stuck in __lookup_mnt().
If it misses an entry because something in front of it just got moved
around, etc, we are fine. We'll notice that mount_lock mismatch and
that'll be it"
* 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
switch mnt_hash to hlist
don't bother with propagate_mnt() unless the target is shared
keep shadowed vfsmounts together
resizable namespace.c hashes
|
|
Use cmpxchg() to atomically set i_flags instead of clearing out the
S_IMMUTABLE, S_APPEND, etc. flags and then setting them from the
EXT4_IMMUTABLE_FL, EXT4_APPEND_FL flags, since this opens up a race
where an immutable file has the immutable flag cleared for a brief
window of time.
Reported-by: John Sullivan <jsrhbz@kanargh.force9.co.uk>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
fixes RCU bug - walking through hlist is safe in face of element moves,
since it's self-terminating. Cyclic lists are not - if we end up jumping
to another hash chain, we'll loop infinitely without ever hitting the
original list head.
[fix for dumb braino folded]
Spotted by: Max Kellermann <mk@cm4all.com>
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
If the dest_mnt is not shared, propagate_mnt() does nothing -
there's no mounts to propagate to and thus no copies to create.
Might as well don't bother calling it in that case.
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
preparation to switching mnt_hash to hlist
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
* switch allocation to alloc_large_system_hash()
* make sizes overridable by boot parameters (mhash_entries=, mphash_entries=)
* switch mountpoint_hashtable from list_head to hlist_head
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Commit c74a3bdd9b52 ("ocfs2: add clustername to cluster connection") is
trying to strlcpy a string which was explicitly passed as NULL in the
very same patch, triggering a NULL ptr deref.
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: strlcpy (lib/string.c:388 lib/string.c:151)
CPU: 19 PID: 19426 Comm: trinity-c19 Tainted: G W 3.14.0-rc7-next-20140325-sasha-00014-g9476368-dirty #274
RIP: strlcpy (lib/string.c:388 lib/string.c:151)
Call Trace:
ocfs2_cluster_connect (fs/ocfs2/stackglue.c:350)
ocfs2_cluster_connect_agnostic (fs/ocfs2/stackglue.c:396)
user_dlm_register (fs/ocfs2/dlmfs/userdlm.c:679)
dlmfs_mkdir (fs/ocfs2/dlmfs/dlmfs.c:503)
vfs_mkdir (fs/namei.c:3467)
SyS_mkdirat (fs/namei.c:3488 fs/namei.c:3472)
tracesys (arch/x86/kernel/entry_64.S:749)
akpm: this patch probably disables the feature. A temporary thing to
avoid triviel oopses.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Goldwyn Rodrigues <rgoldwyn@suse.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Currently we allocated anon_inode_inode in anon_inodefs_mount. This is
somewhat fragile as if that function ever gets called again, it will
overwrite anon_inode_inode pointer. So move the initialization of
anon_inode_inode to anon_inode_init().
Signed-off-by: Jan Kara <jack@suse.cz>
[ Further simplified on suggestion from Dave Jones ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The previous commit removed the register_filesystem() call and the
associated error handling, but left the label for the error path that no
longer exists. Remove that too.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
anon_inodefs filesystem is a kernel internal filesystem userspace
shouldn't mess with. Remove registration of it so userspace cannot
even try to mount it (which would fail anyway because the filesystem is
MS_NOUSER).
This fixes an oops triggered by trinity when it tried mounting
anon_inodefs which overwrote anon_inode_inode pointer while other CPU
has been in anon_inode_getfile() between ihold() and d_instantiate().
Thus effectively creating dentry pointing to an inode without holding a
reference to it.
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Pull nfsd fix frm Bruce Fields:
"J R Okajima sent this early and I was just slow to pass it along,
apologies. Fortunately it's a simple fix"
* 'nfsd-next' of git://linux-nfs.org/~bfields/linux:
nfsd: fix lost nfserrno() call in nfsd_setattr()
|
|
We can get false negative from __lookup_mnt() if an unrelated vfsmount
gets moved. In that case legitimize_mnt() is guaranteed to fail,
and we will fall back to non-RCU walk... unless we end up running
into a hard error on a filesystem object we wouldn't have reached
if not for that false negative. IOW, delaying that check until
the end of pathname resolution is wrong - we should recheck right
after we attempt to cross the mountpoint. We don't need to recheck
unless we see d_mountpoint() being true - in that case even if
we have just raced with mount/umount, we can simply go on as if
we'd come at the moment when the sucker wasn't a mountpoint; if we
run into a hard error as the result, it was a legitimate outcome.
__lookup_mnt() returning NULL is different in that respect, since
it might've happened due to operation on completely unrelated
mountpoint.
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
In all callchains leading to prepend_name(), the value left in *buflen
is eventually discarded unused if prepend_name() has returned a negative.
So we are free to do what prepend() does, and subtract from *buflen
*before* checking for underflow (which turns into checking the sign
of subtraction result, of course).
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Commit bd2a31d522344 ("get rid of fget_light()") introduced the
__fdget_pos() function, which returns the resulting file pointer and
fdput flags combined in an 'unsigned long'. However, it also changed the
behavior to return files with FMODE_PATH set, which shouldn't happen
because read(), write(), lseek(), etc. aren't allowed on such files.
This commit restores the old behavior.
This regression actually had no effect on read() and write() since
FMODE_READ and FMODE_WRITE are not set on file descriptors opened with
O_PATH, but it did cause lseek() on a file descriptor opened with O_PATH
to fail with ESPIPE rather than EBADF.
Signed-off-by: Eric Biggers <ebiggers3@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Commit 9c225f2655e36a4 ("vfs: atomic f_pos accesses as per POSIX") changed
several system calls to use fdget_pos() instead of fdget(), but missed
sys_llseek(). Fix it.
Signed-off-by: Eric Biggers <ebiggers3@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Pull CIFS fixes from Steve French:
"A fix for the problem which Al spotted in cifs_writev and a followup
(noticed when fixing CVE-2014-0069) patch to ensure that cifs never
sends more than the smb frame length over the socket (as we saw with
that cifs_iovec_write problem that Jeff fixed last month)"
* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
cifs: mask off top byte in get_rfc1002_length()
cifs: sanity check length of data to send before sending
CIFS: Fix wrong pos argument of cifs_find_lock_conflict
|
|
Merge misc fixes from Andrew Morton:
"Nine fixes"
* emailed patches from Andrew Morton akpm@linux-foundation.org>:
cris: convert ffs from an object-like macro to a function-like macro
hfsplus: add HFSX subfolder count support
tools/testing/selftests/ipc/msgque.c: handle msgget failure return correctly
MAINTAINERS: blackfin: add git repository
revert "kallsyms: fix absolute addresses for kASLR"
mm/Kconfig: fix URL for zsmalloc benchmark
fs/proc/base.c: fix GPF in /proc/$PID/map_files
mm/compaction: break out of loop on !PageBuddy in isolate_freepages_block
mm: fix GFP_THISNODE callers and clarify
|
|
Adds support for HFSX 'HasFolderCount' flag and a corresponding
'folderCount' field in folder records. (For reference see
HFS_FOLDERCOUNT and kHFSHasFolderCountBit/kHFSHasFolderCountMask in
Apple's source code.)
Ignoring subfolder count leads to fs errors found by Mac:
...
Checking catalog hierarchy.
HasFolderCount flag needs to be set (id = 105)
(It should be 0x10 instead of 0)
Incorrect folder count in a directory (id = 2)
(It should be 7 instead of 6)
...
Steps to reproduce:
Format with "newfs_hfs -s /dev/diskXXX".
Mount in Linux.
Create a new directory in root.
Unmount.
Run "fsck_hfs /dev/diskXXX".
The patch handles directory creation, deletion, and rename.
Signed-off-by: Sergei Antonov <saproj@gmail.com>
Reviewed-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The expected logic of proc_map_files_get_link() is either to return 0
and initialize 'path' or return an error and leave 'path' uninitialized.
By the time dname_to_vma_addr() returns 0 the corresponding vma may have
already be gone. In this case the path is not initialized but the
return value is still 0. This results in 'general protection fault'
inside d_path().
Steps to reproduce:
CONFIG_CHECKPOINT_RESTORE=y
fd = open(...);
while (1) {
mmap(fd, ...);
munmap(fd, ...);
}
ls -la /proc/$PID/map_files
Addresses https://bugzilla.kernel.org/show_bug.cgi?id=68991
Signed-off-by: Artem Fetishev <artem_fetishev@epam.com>
Signed-off-by: Aleksandr Terekhov <aleksandr_terekhov@epam.com>
Reported-by: <wiebittewas@gmail.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro.
Clean up file table accesses (get rid of fget_light() in favor of the
fdget() interface), add proper file position locking.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
get rid of fget_light()
sockfd_lookup_light(): switch to fdget^W^Waway from fget_light
vfs: atomic f_pos accesses as per POSIX
ocfs2 syncs the wrong range...
|
|
instead of returning the flags by reference, we can just have the
low-level primitive return those in lower bits of unsigned long,
with struct file * derived from the rest.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Our write() system call has always been atomic in the sense that you get
the expected thread-safe contiguous write, but we haven't actually
guaranteed that concurrent writes are serialized wrt f_pos accesses, so
threads (or processes) that share a file descriptor and use "write()"
concurrently would quite likely overwrite each others data.
This violates POSIX.1-2008/SUSv4 Section XSI 2.9.7 that says:
"2.9.7 Thread Interactions with Regular File Operations
All of the following functions shall be atomic with respect to each
other in the effects specified in POSIX.1-2008 when they operate on
regular files or symbolic links: [...]"
and one of the effects is the file position update.
This unprotected file position behavior is not new behavior, and nobody
has ever cared. Until now. Yongzhi Pan reported unexpected behavior to
Michael Kerrisk that was due to this.
This resolves the issue with a f_pos-specific lock that is taken by
read/write/lseek on file descriptors that may be shared across threads
or processes.
Reported-by: Yongzhi Pan <panyongzhi@gmail.com>
Reported-by: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Pull NFS client bugfixes from Trond Myklebust:
"Highlights include:
- Fix another nfs4_sequence corruptor in RELEASE_LOCKOWNER
- Fix an Oopsable delegation callback race
- Fix another bad stateid infinite loop
- Fail the data server I/O is the stateid represents a lost lock
- Fix an Oopsable sunrpc trace event"
* tag 'nfs-for-3.14-5' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
SUNRPC: Fix oops when trace sunrpc_task events in nfs client
NFSv4: Fail the truncate() if the lock/open stateid is invalid
NFSv4.1 Fail data server I/O if stateid represents a lost lock
NFSv4: Fix the return value of nfs4_select_rw_stateid
NFSv4: nfs4_stateid_is_current should return 'true' for an invalid stateid
NFS: Fix a delegation callback race
NFSv4: Fix another nfs4_sequence corruptor
|
|
Pull block fixes from Jens Axboe:
"Small collection of fixes for 3.14-rc. It contains:
- Three minor update to blk-mq from Christoph.
- Reduce number of unaligned (< 4kb) in-flight writes on mtip32xx to
two. From Micron.
- Make the blk-mq CPU notify spinlock raw, since it can't be a
sleeper spinlock on RT. From Mike Galbraith.
- Drop now bogus BUG_ON() for bio iteration with blk integrity. From
Nic Bellinger.
- Properly propagate the SYNC flag on requests. From Shaohua"
* 'for-linus' of git://git.kernel.dk/linux-block:
blk-mq: add REQ_SYNC early
rt,blk,mq: Make blk_mq_cpu_notify_lock a raw spinlock
bio-integrity: Drop bio_integrity_verify BUG_ON in post bip->bip_iter world
blk-mq: support partial I/O completions
blk-mq: merge blk_mq_insert_request and blk_mq_run_request
blk-mq: remove blk_mq_alloc_rq
mtip32xx: Reduce the number of unaligned writes to 2
|
|
If the open stateid could not be recovered, or the file locks were lost,
then we should fail the truncate() operation altogether.
Reported-by: Andy Adamson <andros@netapp.com>
Link: http://lkml.kernel.org/r/1393954269-3974-1-git-send-email-andros@netapp.com
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Signed-off-by: Andy Adamson <andros@netapp.com>
Link: http://lkml.kernel.org/r/1393954269-3974-1-git-send-email-andros@netapp.com
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|