summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2013-02-28fat: add extended fileds to struct fat_boot_sectorOleksij Rempel1-4/+4
Later we will need "state" field to check if volume was cleanly unmounted. Signed-off-by: Oleksij Rempel <bug-track@fisher-privat.net> Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-28hfsplus: fix issue with unzeroed unused b-tree nodesVyacheslav Dubeyko1-0/+2
The fsck_hfs (under MacOS X) complains about unzeroed unused b-tree nodes after deletion of folders' tree under Linux. SYMPTOMS: Running Disk Utiltiy's "Verify Disk" on "test" gives the following: Verifying volume “Test” Checking file systemChecking Journaled HFS Plus volume. Checking extents overflow file. Checking catalog file. Unused node is not erased (node = 3111) Checking multi-linked files. Checking catalog hierarchy. Checking extended attributes file. Checking volume bitmap. Checking volume information. The volume Test was found corrupt and needs to be repaired. Error: This disk needs to be repaired. Click Repair Disk. REPRODUCING PATH: 1. Prepare HFS+ (non-case sensitive) partition (for example, 5GB) under MacOS X. 2. Copy linux kernel source tree (for example, 3.7-rc6 version) on this partition under MacOS X. 3. Then switch to Linux and mount this prepared partition. 4. Execute `sudo rm -r` under prepared directory with linux kernel source tree. 5. Unmount and boot back into OS X. 6. Open up Disk Utility and verify partition. REPRODUCIBILITY: 100% FIX: It is added code of node clearing in hfs_bnode_put() method for the case when node has flag HFS_BNODE_DELETED. Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com> Reported-by: Kyle Laracey <kalaracey@gmail.com> Acked-by: Hin-Tak Leung <htl10@users.sourceforge.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-28hfsplus: add support of manipulation by attributes fileVyacheslav Dubeyko13-183/+287
Add support of manipulation by attributes file. Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com> Reported-by: Hin-Tak Leung <htl10@users.sourceforge.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-28hfsplus: rework functionality of getting, setting and deleting of extended ↵Vyacheslav Dubeyko5-0/+999
attributes Rework functionality of getting, setting and deleting of extended attributes. Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com> Reported-by: Hin-Tak Leung <htl10@users.sourceforge.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-28hfsplus: add functionality of manipulating by records in attributes treeVyacheslav Dubeyko1-0/+399
Add functionality of manipulating by records in attributes tree. Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com> Reported-by: Hin-Tak Leung <htl10@users.sourceforge.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-28hfsplus: add on-disk layout declarations related to attributes treeVyacheslav Dubeyko1-2/+66
Add all necessary on-disk layout declarations related to attributes file. Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com> Reported-by: Hin-Tak Leung <htl10@users.sourceforge.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-28ocfs2: ac->ac_allow_chain_relink=0 won't disable group relinkXiaowei.Hu2-5/+4
ocfs2_block_group_alloc_discontig() disables chain relink by setting ac->ac_allow_chain_relink = 0 because it grabs clusters from multiple cluster groups. It doesn't keep the credits for all chain relink,but ocfs2_claim_suballoc_bits overrides this in this call trace: ocfs2_block_group_claim_bits()->ocfs2_claim_clusters()-> __ocfs2_claim_clusters()->ocfs2_claim_suballoc_bits() ocfs2_claim_suballoc_bits set ac->ac_allow_chain_relink = 1; then call ocfs2_search_chain() one time and disable it again, and then we run out of credits. Fix is to allow relink by default and disable it in ocfs2_block_group_alloc_discontig. Without this patch, End-users will run into a crash due to run out of credits, backtrace like this: RIP: 0010:[<ffffffffa0808b14>] [<ffffffffa0808b14>] jbd2_journal_dirty_metadata+0x164/0x170 [jbd2] RSP: 0018:ffff8801b919b5b8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88022139ddc0 RCX: ffff880159f652d0 RDX: ffff880178aa3000 RSI: ffff880159f652d0 RDI: ffff880087f09bf8 RBP: ffff8801b919b5e8 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000001e00 R11: 00000000000150b0 R12: ffff880159f652d0 R13: ffff8801a0cae908 R14: ffff880087f09bf8 R15: ffff88018d177800 FS: 00007fc9b0b6b6e0(0000) GS:ffff88022fd40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 000000000040819c CR3: 0000000184017000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process dd (pid: 9945, threadinfo ffff8801b919a000, task ffff880149a264c0) Call Trace: ocfs2_journal_dirty+0x2f/0x70 [ocfs2] ocfs2_relink_block_group+0x111/0x480 [ocfs2] ocfs2_search_chain+0x455/0x9a0 [ocfs2] ... Signed-off-by: Xiaowei.Hu <xiaowei.hu@oracle.com> Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-28ocfs2: fix ocfs2_init_security_and_acl() to initialize acl correctlyJeff Liu1-1/+1
We need to re-initialize the security for a new reflinked inode with its parent dirs if it isn't specified to be preserved for ocfs2_reflink(). However, the code logic is broken at ocfs2_init_security_and_acl() although ocfs2_init_security_get() succeed. As a result, ocfs2_acl_init() does not involked and therefore the default ACL of parent dir was missing on the new inode. Note this was introduced by 9d8f13ba3 ("security: new security_inode_init_security API adds function callback") To reproduce: set default ACL for the parent dir(ocfs2 in this case): $ setfacl -m default:user:jeff:rwx ../ocfs2/ $ getfacl ../ocfs2/ # file: ../ocfs2/ # owner: jeff # group: jeff user::rwx group::r-x other::r-x default:user::rwx default:user:jeff:rwx default:group::r-x default:mask::rwx default:other::r-x $ touch a $ getfacl a # file: a # owner: jeff # group: jeff user::rw- group::rw- other::r-- Before patching, create reflink file b from a, the user default ACL entry(user:jeff:rwx)was missing: $ ./ocfs2_reflink a b $ getfacl b # file: b # owner: jeff # group: jeff user::rw- group::rw- other::r-- In this case, the end user can also observed an error message at syslog: (ocfs2_reflink,3229,2):ocfs2_init_security_and_acl:7193 ERROR: status = 0 After applying this patch, create reflink file c from a: $ ./ocfs2_reflink a c $ getfacl c # file: c # owner: jeff # group: jeff user::rw- user:jeff:rwx #effective:rw- group::r-x #effective:r-- mask::rw- other::r-- Test program: /* Usage: reflink <source> <dest> */ #include <stdio.h> #include <stdint.h> #include <stdbool.h> #include <string.h> #include <errno.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <sys/ioctl.h> static int reflink_file(char const *src_name, char const *dst_name, bool preserve_attrs) { int fd; #ifndef REFLINK_ATTR_NONE # define REFLINK_ATTR_NONE 0 #endif #ifndef REFLINK_ATTR_PRESERVE # define REFLINK_ATTR_PRESERVE 1 #endif #ifndef OCFS2_IOC_REFLINK struct reflink_arguments { uint64_t old_path; uint64_t new_path; uint64_t preserve; }; # define OCFS2_IOC_REFLINK _IOW ('o', 4, struct reflink_arguments) #endif struct reflink_arguments args = { .old_path = (unsigned long) src_name, .new_path = (unsigned long) dst_name, .preserve = preserve_attrs ? REFLINK_ATTR_PRESERVE : REFLINK_ATTR_NONE, }; fd = open(src_name, O_RDONLY); if (fd < 0) { fprintf(stderr, "Failed to open %s: %s\n", src_name, strerror(errno)); return -1; } if (ioctl(fd, OCFS2_IOC_REFLINK, &args) < 0) { fprintf(stderr, "Failed to reflink %s to %s: %s\n", src_name, dst_name, strerror(errno)); return -1; } } int main(int argc, char *argv[]) { if (argc != 3) { fprintf(stdout, "Usage: %s source dest\n", argv[0]); return 1; } return reflink_file(argv[1], argv[2], 0); } Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Tao Ma <boyu.mt@taobao.com> Cc: Mimi Zohar <zohar@linux.vnet.ibm.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-27Merge branch 'for-linus' of ↵Linus Torvalds199-775/+793
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs pile (part one) from Al Viro: "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent locking violations, etc. The most visible changes here are death of FS_REVAL_DOT (replaced with "has ->d_weak_revalidate()") and a new helper getting from struct file to inode. Some bits of preparation to xattr method interface changes. Misc patches by various people sent this cycle *and* ocfs2 fixes from several cycles ago that should've been upstream right then. PS: the next vfs pile will be xattr stuff." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits) saner proc_get_inode() calling conventions proc: avoid extra pde_put() in proc_fill_super() fs: change return values from -EACCES to -EPERM fs/exec.c: make bprm_mm_init() static ocfs2/dlm: use GFP_ATOMIC inside a spin_lock ocfs2: fix possible use-after-free with AIO ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero target: writev() on single-element vector is pointless export kernel_write(), convert open-coded instances fs: encode_fh: return FILEID_INVALID if invalid fid_type kill f_vfsmnt vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op nfsd: handle vfs_getattr errors in acl protocol switch vfs_getattr() to struct path default SET_PERSONALITY() in linux/elf.h ceph: prepopulate inodes only when request is aborted d_hash_and_lookup(): export, switch open-coded instances 9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate() 9p: split dropping the acls from v9fs_set_create_acl() ...
2013-02-27Merge tag 'ext4_for_linus' of ↵Linus Torvalds31-1622/+2105
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Theodore Ts'o: "The one new feature added in this patch series is the ability to use the "punch hole" functionality for inodes that are not using extent maps. In the bug fix category, we fixed some races in the AIO and fstrim code, and some potential NULL pointer dereferences and memory leaks in error handling code paths. In the optimization category, we fixed a performance regression in the jbd2 layer introduced by commit d9b01934d56a ("jbd: fix fsync() tid wraparound bug", introduced in v3.0) which shows up in the AIM7 benchmark. We also further optimized jbd2 by minimize the amount of time that transaction handles are held active. This patch series also features some additional enhancement of the extent status tree, which is now used to cache extent information in a more efficient/compact form than what we use on-disk." * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (65 commits) ext4: fix free clusters calculation in bigalloc filesystem ext4: no need to remove extent if len is 0 in ext4_es_remove_extent() ext4: fix xattr block allocation/release with bigalloc ext4: reclaim extents from extent status tree ext4: adjust some functions for reclaiming extents from extent status tree ext4: remove single extent cache ext4: lookup block mapping in extent status tree ext4: track all extent status in extent status tree ext4: let ext4_ext_map_blocks return EXT4_MAP_UNWRITTEN flag ext4: rename and improbe ext4_es_find_extent() ext4: add physical block and status member into extent status tree ext4: refine extent status tree ext4: use ERR_PTR() abstraction for ext4_append() ext4: refactor code to read directory blocks into ext4_read_dirblock() ext4: add debugging context for warning in ext4_da_update_reserve_space() ext4: use KERN_WARNING for warning messages jbd2: use module parameters instead of debugfs for jbd_debug ext4: use module parameters instead of debugfs for mballoc_debug ext4: start handle at the last possible moment when creating inodes ext4: fix the number of credits needed for acl ops with inline data ...
2013-02-27Merge branch 'for_linus' of ↵Linus Torvalds16-78/+177
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull ext2, ext3, udf updates from Jan Kara: "Several UDF fixes, a support for UDF extent cache, and couple of ext2 and ext3 cleanups and minor fixes" * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: Ext2: remove the static function release_blocks to optimize the kernel Ext2: mark inode dirty after the function dquot_free_block_nodirty is called Ext2: remove the overhead check about sb in the function ext2_new_blocks udf: Remove unused s_extLength from udf_bitmap udf: Make s_block_bitmap standard array udf: Fix bitmap overflow on large filesystems with small block size udf: add extent cache support in case of file reading udf: Write LVID to disk after opening / closing Ext3: return ENOMEM rather than EIO if sb_getblk fails Ext2: return ENOMEM rather than EIO if sb_getblk fails Ext3: use unlikely to improve the efficiency of the kernel Ext2: use unlikely to improve the efficiency of the kernel Ext3: add necessary check in case IO error happens Ext2: free memory allocated and forget buffer head when io error happens ext3: Fix memory leak when quota options are specified multiple times ext3, ext4, ocfs2: remove unused macro NAMEI_RA_INDEX
2013-02-27Merge tag 'upstream-3.9-rc1' of git://git.infradead.org/linux-ubifsLinus Torvalds5-15/+27
Pull ubifs updates from Artem Bityutskiy: "It's been quite silent and we have only a couple of bug-fixes for the orphans handling code plus one cosmetic change." * tag 'upstream-3.9-rc1' of git://git.infradead.org/linux-ubifs: UBIFS: fix double free of ubifs_orphan objects UBIFS: fix use of freed ubifs_orphan objects UBIFS: rename random32() to prandom_u32()
2013-02-27Merge tag 'f2fs-for-3.9' of ↵Linus Torvalds13-261/+292
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs update from Jaegeuk Kim: "[Major bug fixes] o Store device file information correctly o Fix -EIO handling with respect to power-off-recovery o Allocate blocks with global locks o Fix wrong calculation of the SSR cost [Cleanups] o Get rid of fake on-stack dentries [Enhancement] o Support (un)freeze_fs o Enhance the f2fs_gc flow o Support 32-bit binary execution on 64-bit kernel" * tag 'f2fs-for-3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (29 commits) f2fs: avoid build warning f2fs: add compat_ioctl to provide backward compatability f2fs: fix calculation of max. gc cost in the SSR case f2fs: clarify and enhance the f2fs_gc flow f2fs: optimize the return condition for has_not_enough_free_secs f2fs: make an accessor to get sections for particular block type f2fs: mark gc_thread as NULL when thread creation is failed f2fs: name gc task as per the block device f2fs: remove unnecessary gc option check and balance_fs f2fs: remove repeated F2FS_SET_SB_DIRT call f2fs: when check superblock failed, try to check another superblock f2fs: use F2FS_BLKSIZE to judge bloksize and page_cache_size f2fs: add device name in debugfs f2fs: stop repeated checking if cp is needed f2fs: avoid balanc_fs during evict_inode f2fs: remove the use of page_cache_release f2fs: fix typo mistake for data_version description f2fs: reorganize code for ra_node_page f2fs: avoid redundant call to has_not_enough_free_secs in f2fs_gc f2fs: add un/freeze_fs into super_operations ...
2013-02-26saner proc_get_inode() calling conventionsAl Viro2-21/+10
Make it drop the pde in *all* cases when no new reference to it is put into an inode - both when an inode had already been set up (as we were already doing) and when inode allocation has failed. Makes for simpler logics in callers... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-26proc: avoid extra pde_put() in proc_fill_super()Maxim Patlasov1-6/+15
If proc_get_inode() succeeded, but d_make_root() failed, pde_put() for proc_root will be called twice: the first time due to iput() called from d_make_root() and the second time directly in the end of proc_fill_super(). Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-26fs: change return values from -EACCES to -EPERMZhao Hongjiang6-16/+16
According to SUSv3: [EACCES] Permission denied. An attempt was made to access a file in a way forbidden by its file access permissions. [EPERM] Operation not permitted. An attempt was made to perform an operation limited to processes with appropriate privileges or to the owner of a file or other resource. So -EPERM should be returned if capability checks fails. Strictly speaking this is an API change since the error code user sees is altered. Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com> Acked-by: Jan Kara <jack@suse.cz> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Acked-by: Ian Kent <raven@themaw.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-26fs/exec.c: make bprm_mm_init() staticYuanhan Liu1-1/+1
There is only one user of bprm_mm_init, and it's inside the same file. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-26ocfs2/dlm: use GFP_ATOMIC inside a spin_lockDan Carpenter1-1/+1
My static checker complains that this is called with a spin_lock held in dlm_master_requery_handler() from dlmrecovery.c. Probably the reason we have not received any bug reports about this is that recovery is not a common operation. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Joel Becker <jlbec@evilplan.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-26ocfs2: fix possible use-after-free with AIOJan Kara1-1/+1
Running AIO is pinning inode in memory using file reference. Once AIO is completed using aio_complete(), file reference is put and inode can be freed from memory. So we have to be sure that calling aio_complete() is the last thing we do with the inode. Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Jeff Moyer <jmoyer@redhat.com> Acked-by: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-26ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code pathSunil Mushran1-1/+1
Commit ea022dfb3c2a4680483b00eb2fecc9fc4f6091d1 was missing a var init. Reported-and-Tested-by: Vincent Etienne <vetienne@aprogsys.com> Signed-off-by: Sunil Mushran <sunil.mushran@gmail.com> Signed-off-by: Joel Becker <jlbec@evilplan.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-26get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zeroAl Viro2-3/+0
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-26export kernel_write(), convert open-coded instancesAl Viro2-7/+4
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-26fs: encode_fh: return FILEID_INVALID if invalid fid_typeNamjae Jeon10-19/+19
This patch is a follow up on below patch: [PATCH] exportfs: add FILEID_INVALID to indicate invalid fid_type commit: 216b6cbdcbd86b1db0754d58886b466ae31f5a63 Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Vivek Trivedi <t.vivek@samsung.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Acked-by: Sage Weil <sage@inktank.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-26kill f_vfsmntAl Viro3-4/+4
very few users left... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-26vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry opJeff Layton7-13/+53
The following set of operations on a NFS client and server will cause server# mkdir a client# cd a server# mv a a.bak client# sleep 30 # (or whatever the dir attrcache timeout is) client# stat . stat: cannot stat `.': Stale NFS file handle Obviously, we should not be getting an ESTALE error back there since the inode still exists on the server. The problem is that the lookup code will call d_revalidate on the dentry that "." refers to, because NFS has FS_REVAL_DOT set. nfs_lookup_revalidate will see that the parent directory has changed and will try to reverify the dentry by redoing a LOOKUP. That of course fails, so the lookup code returns ESTALE. The problem here is that d_revalidate is really a bad fit for this case. What we really want to know at this point is whether the inode is still good or not, but we don't really care what name it goes by or whether the dcache is still valid. Add a new d_op->d_weak_revalidate operation and have complete_walk call that instead of d_revalidate. The intent there is to allow for a "weaker" d_revalidate that just checks to see whether the inode is still good. This is also gives us an opportunity to kill off the FS_REVAL_DOT special casing. [AV: changed method name, added note in porting, fixed confusion re having it possibly called from RCU mode (it won't be)] Cc: NeilBrown <neilb@suse.de> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-26nfsd: handle vfs_getattr errors in acl protocolJ. Bruce Fields4-9/+24
We're currently ignoring errors from vfs_getattr. The correct thing to do is to do the stat in the main service procedure not in the response encoding. Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-26switch vfs_getattr() to struct pathAl Viro9-30/+34
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-26ceph: prepopulate inodes only when request is abortedSage Weil1-2/+38
If r_aborted is true, we do not hold the dir i_mutex, and cannot touch the dcache. However, we still need to update the inodes with the state returned by the MDS. Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Sage Weil <sage@inktank.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-26d_hash_and_lookup(): export, switch open-coded instancesAl Viro4-24/+18
* calling conventions change - ERR_PTR() is returned on ->d_hash() errors; NULL is just for dcache miss now. * exported, open-coded instances in ncpfs and cifs converted. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-269p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()Al Viro3-35/+33
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-269p: split dropping the acls from v9fs_set_create_acl()Al Viro3-21/+28
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-269p: switch v9fs_acl_chmod() from dentry to inode+fidAl Viro3-16/+13
caller has both, might as well pass them explicitly. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-269p: switch v9fs_set_acl() from dentry to fidAl Viro1-5/+12
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-269p: lift the call of set_cached_acl() into the callers of v9fs_set_acl()Al Viro1-4/+3
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-269p: add fid-based variant of v9fs_xattr_set()Al Viro2-15/+20
... making v9fs_xattr_set() a wrapper for it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-26hugetlb_file_setup(): use d_alloc_pseudo()Al Viro1-4/+15
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-26Merge branch 'for-linus' of ↵Linus Torvalds1-12/+17
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fix from Al Viro: "Fix for 3.8 breakage introduced by "vfs: Allow unprivileged manipulation of the mount namespace" - accessing mnt->mnt_ns is done there without needed locking *and* without any real need. Definite -stable fodder, fortunately not going too far back. This is *not* all - there will be much bigger vfs pull request tomorrow." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: get rid of unprotected dereferencing of mnt->mnt_ns
2013-02-26Merge branch 'for-linus' of ↵Linus Torvalds75-500/+773
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull user namespace and namespace infrastructure changes from Eric W Biederman: "This set of changes starts with a few small enhnacements to the user namespace. reboot support, allowing more arbitrary mappings, and support for mounting devpts, ramfs, tmpfs, and mqueuefs as just the user namespace root. I do my best to document that if you care about limiting your unprivileged users that when you have the user namespace support enabled you will need to enable memory control groups. There is a minor bug fix to prevent overflowing the stack if someone creates way too many user namespaces. The bulk of the changes are a continuation of the kuid/kgid push down work through the filesystems. These changes make using uids and gids typesafe which ensures that these filesystems are safe to use when multiple user namespaces are in use. The filesystems converted for 3.9 are ceph, 9p, afs, ocfs2, gfs2, ncpfs, nfs, nfsd, and cifs. The changes for these filesystems were a little more involved so I split the changes into smaller hopefully obviously correct changes. XFS is the only filesystem that remains. I was hoping I could get that in this release so that user namespace support would be enabled with an allyesconfig or an allmodconfig but it looks like the xfs changes need another couple of days before it they are ready." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (93 commits) cifs: Enable building with user namespaces enabled. cifs: Convert struct cifs_ses to use a kuid_t and a kgid_t cifs: Convert struct cifs_sb_info to use kuids and kgids cifs: Modify struct smb_vol to use kuids and kgids cifs: Convert struct cifsFileInfo to use a kuid cifs: Convert struct cifs_fattr to use kuid and kgids cifs: Convert struct tcon_link to use a kuid. cifs: Modify struct cifs_unix_set_info_args to hold a kuid_t and a kgid_t cifs: Convert from a kuid before printing current_fsuid cifs: Use kuids and kgids SID to uid/gid mapping cifs: Pass GLOBAL_ROOT_UID and GLOBAL_ROOT_GID to keyring_alloc cifs: Use BUILD_BUG_ON to validate uids and gids are the same size cifs: Override unmappable incoming uids and gids nfsd: Enable building with user namespaces enabled. nfsd: Properly compare and initialize kuids and kgids nfsd: Store ex_anon_uid and ex_anon_gid as kuids and kgids nfsd: Modify nfsd4_cb_sec to use kuids and kgids nfsd: Handle kuids and kgids in the nfs4acl to posix_acl conversion nfsd: Convert nfsxdr to use kuids and kgids nfsd: Convert nfs3xdr to use kuids and kgids ...
2013-02-24Merge branch 'for-linus' of ↵Linus Torvalds2-67/+70
git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal Pull signal handling cleanups from Al Viro: "This is the first pile; another one will come a bit later and will contain SYSCALL_DEFINE-related patches. - a bunch of signal-related syscalls (both native and compat) unified. - a bunch of compat syscalls switched to COMPAT_SYSCALL_DEFINE (fixing several potential problems with missing argument validation, while we are at it) - a lot of now-pointless wrappers killed - a couple of architectures (cris and hexagon) forgot to save altstack settings into sigframe, even though they used the (uninitialized) values in sigreturn; fixed. - microblaze fixes for delivery of multiple signals arriving at once - saner set of helpers for signal delivery introduced, several architectures switched to using those." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: (143 commits) x86: convert to ksignal sparc: convert to ksignal arm: switch to struct ksignal * passing alpha: pass k_sigaction and siginfo_t using ksignal pointer burying unused conditionals make do_sigaltstack() static arm64: switch to generic old sigaction() (compat-only) arm64: switch to generic compat rt_sigaction() arm64: switch compat to generic old sigsuspend arm64: switch to generic compat rt_sigqueueinfo() arm64: switch to generic compat rt_sigpending() arm64: switch to generic compat rt_sigprocmask() arm64: switch to generic sigaltstack sparc: switch to generic old sigsuspend sparc: COMPAT_SYSCALL_DEFINE does all sign-extension as well as SYSCALL_DEFINE sparc: kill sign-extending wrappers for native syscalls kill sparc32_open() sparc: switch to use of generic old sigaction sparc: switch sys_compat_rt_sigaction() to COMPAT_SYSCALL_DEFINE mips: switch to generic sys_fork() and sys_clone() ...
2013-02-24Merge branch 'akpm' (more incoming from Andrew)Linus Torvalds6-16/+19
Merge second patch-bomb from Andrew Morton: - A little DM fix - the MM queue * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (154 commits) ksm: allocate roots when needed mm: cleanup "swapcache" in do_swap_page mm,ksm: swapoff might need to copy mm,ksm: FOLL_MIGRATION do migration_entry_wait ksm: shrink 32-bit rmap_item back to 32 bytes ksm: treat unstable nid like in stable tree ksm: add some comments tmpfs: fix mempolicy object leaks tmpfs: fix use-after-free of mempolicy object mm/fadvise.c: drain all pagevecs if POSIX_FADV_DONTNEED fails to discard all pages mm: export mmu notifier invalidates mm: accelerate mm_populate() treatment of THP pages mm: use long type for page counts in mm_populate() and get_user_pages() mm: accurately document nr_free_*_pages functions with code comments HWPOISON: change order of error_states[]'s elements HWPOISON: fix misjudgement of page_action() for errors on mlocked pages memcg: stop warning on memcg_propagate_kmem net: change type of virtio_chan->p9_max_pages vmscan: change type of vm_total_pages to unsigned long fs/nfsd: change type of max_delegations, nfsd_drc_max_mem and nfsd_drc_mem_used ...
2013-02-24fs/nfsd: change type of max_delegations, nfsd_drc_max_mem and nfsd_drc_mem_usedZhang Yanfei3-9/+9
The three variables are calculated from nr_free_buffer_pages so change their types to unsigned long in case of overflow. Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-24fs/buffer.c: change type of max_buffer_heads to unsigned longZhang Yanfei1-2/+2
max_buffer_heads is calculated from nr_free_buffer_pages(), so change its type to unsigned long in case of overflow. Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-24swap: make each swap partition have one address_spaceShaohua Li1-2/+2
When I use several fast SSD to do swap, swapper_space.tree_lock is heavily contended. This makes each swap partition have one address_space to reduce the lock contention. There is an array of address_space for swap. The swap entry type is the index to the array. In my test with 3 SSD, this increases the swapout throughput 20%. [akpm@linux-foundation.org: revert unneeded change to __add_to_swap_cache] Signed-off-by: Shaohua Li <shli@fusionio.com> Cc: Hugh Dickins <hughd@google.com> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-24memory-failure: use num_poisoned_pages instead of mce_bad_pagesXishi Qiu1-1/+1
Since MCE is an x86 concept, and this code is in mm/, it would be better to use the name num_poisoned_pages instead of mce_bad_pages. [akpm@linux-foundation.org: fix mm/sparse.c] Signed-off-by: Xishi Qiu <qiuxishi@huawei.com> Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Suggested-by: Borislav Petkov <bp@alien8.de> Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-24mm: make do_mmap_pgoff return populate as a size in bytes, not as a boolMichel Lespinasse1-3/+2
do_mmap_pgoff() rounds up the desired size to the next PAGE_SIZE multiple, however there was no equivalent code in mm_populate(), which caused issues. This could be fixed by introduced the same rounding in mm_populate(), however I think it's preferable to make do_mmap_pgoff() return populate as a size rather than as a boolean, so we don't have to duplicate the size rounding logic in mm_populate(). Signed-off-by: Michel Lespinasse <walken@google.com> Acked-by: Rik van Riel <riel@redhat.com> Tested-by: Andy Lutomirski <luto@amacapital.net> Cc: Greg Ungerer <gregungerer@westnet.com.au> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-24mm: introduce mm_populate() for populating new vmasMichel Lespinasse1-1/+5
When creating new mappings using the MAP_POPULATE / MAP_LOCKED flags (or with MCL_FUTURE in effect), we want to populate the pages within the newly created vmas. This may take a while as we may have to read pages from disk, so ideally we want to do this outside of the write-locked mmap_sem region. This change introduces mm_populate(), which is used to defer populating such mappings until after the mmap_sem write lock has been released. This is implemented as a generalization of the former do_mlock_pages(), which accomplished the same task but was using during mlock() / mlockall(). Signed-off-by: Michel Lespinasse <walken@google.com> Reported-by: Andy Lutomirski <luto@amacapital.net> Acked-by: Rik van Riel <riel@redhat.com> Tested-by: Andy Lutomirski <luto@amacapital.net> Cc: Greg Ungerer <gregungerer@westnet.com.au> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23hostfs: directory methods have no business in non-directory inode_operationsAl Viro1-8/+0
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-23__d_materialise_unique() is too genericAl Viro1-14/+5
Its first argument is always non-root, while the second one is always root. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-23fs: Fix possible use-after-free with AIOJan Kara1-1/+1
Running AIO is pinning inode in memory using file reference. Once AIO is completed using aio_complete(), file reference is put and inode can be freed from memory. So we have to be sure that calling aio_complete() is the last thing we do with the inode. CC: Christoph Hellwig <hch@infradead.org> CC: Jens Axboe <axboe@kernel.dk> CC: Jeff Moyer <jmoyer@redhat.com> CC: stable@vger.kernel.org Acked-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-23constify d_lookup() argumentsAl Viro1-1/+1
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>