summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2011-03-28eCryptfs: ecryptfs_keyring_auth_tok_for_sig() bug fixRoberto Sassu1-0/+1
The pointer '(*auth_tok_key)' is set to NULL in case request_key() fails, in order to prevent its use by functions calling ecryptfs_keyring_auth_tok_for_sig(). Signed-off-by: Roberto Sassu <roberto.sassu@polito.it> Cc: <stable@kernel.org> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2011-03-28eCryptfs: Unlock page in write_begin error pathTyler Hicks1-0/+5
Unlock the page in error path of ecryptfs_write_begin(). This may happen, for example, if decryption fails while bring the page up-to-date. Cc: <stable@kernel.org> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2011-03-28ecryptfs: modify write path to encrypt page in writepageThieu Le6-30/+38
Change the write path to encrypt the data only when the page is written to disk in ecryptfs_writepage. Previously, ecryptfs encrypts the page in ecryptfs_write_end which means that if there are multiple write requests to the same page, ecryptfs ends up re-encrypting that page over and over again. This patch minimizes the number of encryptions needed. Signed-off-by: Thieu Le <thieule@chromium.org> [tyhicks: Changed NULL .drop_inode sop pointer to generic_drop_inode] Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2011-03-28eCryptfs: Remove ECRYPTFS_NEW_FILE crypt stat flagTyler Hicks3-27/+14
Now that grow_file() is not called in the ecryptfs_create() path, the ECRYPTFS_NEW_FILE flag is no longer needed. It helped ecryptfs_readpage() know not to decrypt zeroes that were read from the lower file in the grow_file() path. Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2011-03-28eCryptfs: Remove unnecessary grow_file() functionTyler Hicks2-23/+1
When creating a new eCryptfs file, the crypto metadata is written out and then the lower file was being "grown" with 4 kB of encrypted zeroes. I suspect that growing the encrypted file was to prevent an information leak that the unencrypted file was empty. However, the unencrypted file size is stored, in plaintext, in the metadata so growing the file is unnecessary. Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2011-03-28Merge branch 'for-linus-1' of git://git.infradead.org/mtd-2.6Linus Torvalds1-1/+1
* 'for-linus-1' of git://git.infradead.org/mtd-2.6: (49 commits) mtd: mtdswap: fix compilation warning mtdswap: kill strict error handling option mtd: nand: enable software BCH ECC in nand simulator mtd: nand: add software BCH ECC support mtd: fix printf format warnings, mostly lack of %zd for size_t, in mtdswap mtd: sm_rtl: check kmalloc return value mtd: cfi: add support for AMIC flashes (e.g. A29L160AT) lib: add shared BCH ECC library mtd: mxc_nand: fix OOB corruption when page size > 2KiB mtd: DaVinci: Removed header file that is not required mtd: pxa3xx_nand: clean the keep configure code mtd: pxa3xx_nand: mtd scan id process could be defined by driver itself mtd: pxa3xx_nand: unify prepare command mtd: pxa3xx_nand: discard wait_for_event,write_cmd,__readid function mtd: pxa3xx_nand: rework irq logic mtd: pxa3xx_nand: make scan procedure more clear mtd: speedtest: fix integer overflow mtd: mxc_nand: fix read past buffer end mtd: omap3: nand: report corrected ecc errors jffs2: remove a trailing white space in commentaries ...
2011-03-28fs: fix inode.c kernel-doc warningRandy Dunlap1-1/+1
Fix inode.c kernel-doc fatal error: 2 comment sections have the same name: Error(fs/inode.c:1171): duplicate section name 'Note' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-28proc: fix oops on invalid /proc/<pid>/maps accessLinus Torvalds1-1/+2
When m_start returns an error, the seq_file logic will still call m_stop with that error entry, so we'd better make sure that we check it before using it as a vma. Introduced by commit ec6fd8a4355c ("report errors in /proc/*/*map* sanely"), which replaced NULL with various ERR_PTR() cases. (On ia64, you happen to get a unaligned fault instead of a page fault, since the address used is generally some random error code like -EPERM) Reported-by: Anca Emanuel <anca.emanuel@gmail.com> Reported-by: Tony Luck <tony.luck@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Américo Wang <xiyou.wangcong@gmail.com> Cc: Stephen Wilson <wilsons@start.ca> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-27NFS: Don't leak RPC clients in NFSv4 secinfo negotiationTrond Myklebust1-1/+3
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-27NFS: Fix a hang in the writeback pathTrond Myklebust2-12/+5
Now that the inode scalability patches have been merged, it is no longer safe to call igrab() under the inode->i_lock. Now that we no longer call nfs_clear_request() until the nfs_page is being freed, we know that we are always holding a reference to the nfs_open_context, which again holds a reference to the path, and so the inode cannot be freed until the last nfs_page has been removed from the radix tree and freed. We can therefore skip the igrab()/iput() altogether. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-26codafs: fix build break when CONFIG_PROC_SYSCTL=nRakib Mullick1-0/+9
Commit 0bc825d240ab ("codafs: fix compile warning when CONFIG_SYSCTL=n") introduces build breakage, when CONFIG_PROC_SYSCTL=n and CONFIG_CODA_FS=y: fs/built-in.o: In function `init_coda': psdev.c:(.init.text+0xc02): undefined reference to `coda_sysctl_init' psdev.c:(.init.text+0xc7c): undefined reference to `coda_sysctl_clean' fs/built-in.o: In function `exit_coda': psdev.c:(.exit.text+0xa9): undefined reference to `coda_sysctl_clean' make: *** [.tmp_vmlinux1] Error 1 Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com> Reported-by: Ingo Molnar <mingo@elte.hu> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-26Btrfs: mark the bio with an error if we have a failure in dioJosef Bacik1-0/+8
I noticed that dio_end_io calls the appropriate endio function with an error, but the endio functions don't actually do anything with that error, they assume that if there was an error then the bio will not be uptodate. So if we had checksum failures we would never pass back EIO. So if there is an error in our endio functions make sure to clear the uptodate flag on the bio. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2011-03-26Btrfs: don't allocate dip->csums when doing writesJosef Bacik1-2/+5
When doing direct writes we store the checksums in the ordered sum stuff in the ordered extent for writing them when the write completes, so we don't even use the dip->csums array. So if we're writing, don't bother allocating dip->csums since we won't use it anyway. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2011-03-26Btrfs: cleanup how we setup free space clustersJosef Bacik2-185/+182
This patch makes the free space cluster refilling code a little easier to understand, and fixes some things with the bitmap part of it. Currently we either want to refill a cluster with 1) All normal extent entries (those without bitmaps) 2) A bitmap entry with enough space The current code has this ugly jump around logic that will first try and fill up the cluster with extent entries and then if it can't do that it will try and find a bitmap to use. So instead split this out into two functions, one that tries to find only normal entries, and one that tries to find bitmaps. This also fixes a suboptimal thing we would do with bitmaps. If we used a bitmap we would just tell the cluster that we were pointing at a bitmap and it would do the tree search in the block group for that entry every time we tried to make an allocation. Instead of doing that now we just add it to the clusters group. I tested this with my ENOSPC tests and xfstests and it survived. Signed-off-by: Josef Bacik <josef@redhat.com>
2011-03-26xfs: stop using the page cache to back the buffer cacheDave Chinner2-297/+84
Now that the buffer cache has it's own LRU, we do not need to use the page cache to provide persistent caching and reclaim infrastructure. Convert the buffer cache to use alloc_pages() instead of the page cache. This will remove all the overhead of page cache management from setup and teardown of the buffers, as well as needing to mark pages accessed as we find buffers in the buffer cache. By avoiding the page cache, we also remove the need to keep state in the page_private(page) field for persistant storage across buffer free/buffer rebuild and so all that code can be removed. This also fixes the long-standing problem of not having enough bits in the page_private field to track all the state needed for a 512 sector/64k page setup. It also removes the need for page locking during reads as the pages are unique to the buffer and nobody else will be attempting to access them. Finally, it removes the buftarg address space lock as a point of global contention on workloads that allocate and free buffers quickly such as when creating or removing large numbers of inodes in parallel. This remove the 16TB limit on filesystem size on 32 bit machines as the page index (32 bit) is no longer used for lookups of metadata buffers - the buffer cache is now solely indexed by disk address which is stored in a 64 bit field in the buffer. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <aelder@sgi.com>
2011-03-26xfs: register the inode cache shrinker before quotachecksDave Chinner1-10/+24
During mount, we can do a quotacheck that involves a bulkstat pass on all inodes. If there are more inodes in the filesystem than can be held in memory, we require the inode cache shrinker to run to ensure that we don't run out of memory. Unfortunately, the inode cache shrinker is not registered until we get to the end of the superblock setup process, which is after a quotacheck is run if it is needed. Hence we need to register the inode cache shrinker earlier in the mount process so that we don't OOM during mount. This requires that we also initialise the syncd work before we register the shrinker, so we nee dto juggle that around as well. While there, make sure that we have set up the block sizes in the VFS superblock correctly before the quotacheck is run so that any inodes that are cached as a result of the quotacheck have their block size fields set up correctly. Cc: stable@kernel.org Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <aelder@sgi.com>
2011-03-26xfs: xfs_trans_read_buf() should return an error on failureDave Chinner1-1/+2
When inside a transaction and we fail to read a buffer, xfs_trans_read_buf returns a null buffer pointer and no error. xfs_do_da_buf() checks the error return, but not the buffer, and as a result this read failure condition causes a panic when it attempts to dereference the non-existant buffer. Make xfs_trans_read_buf() return the same error for this situation regardless of whether it is in a transaction or not. This means every caller does not need to check both the error return and the buffer before proceeding to use the buffer. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <aelder@sgi.com>
2011-03-26xfs: introduce inode cluster buffer trylocks for xfs_iflushDave Chinner4-8/+32
There is an ABBA deadlock between synchronous inode flushing in xfs_reclaim_inode and xfs_icluster_free. xfs_icluster_free locks the buffer, then takes inode ilocks, whilst synchronous reclaim takes the ilock followed by the buffer lock in xfs_iflush(). To avoid this deadlock, separate the inode cluster buffer locking semantics from the synchronous inode flush semantics, allowing callers to attempt to lock the buffer but still issue synchronous IO if it can get the buffer. This requires xfs_iflush() calls that currently use non-blocking semantics to pass SYNC_TRYLOCK rather than 0 as the flags parameter. This allows xfs_reclaim_inode to avoid the deadlock on the buffer lock and detect the failure so that it can drop the inode ilock and restart the reclaim attempt on the inode. This allows xfs_ifree_cluster to obtain the inode lock, mark the inode stale and release it and hence defuse the deadlock situation. It also has the pleasant side effect of avoiding IO in xfs_reclaim_inode when it tries to next reclaim the inode as it is now marked stale. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>
2011-03-26vmap: flush vmap aliases when mapping failsDave Chinner1-3/+11
On 32 bit systems, vmalloc space is limited and XFS can chew through it quickly as the vmalloc space is lazily freed. This can result in failure to map buffers, even when there is apparently large amounts of vmalloc space available. Hence, if we fail to map a buffer, purge the aliases that have not yet been freed to hopefuly free up enough vmalloc space to allow a retry to succeed. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>
2011-03-26xfs: preallocation transactions do not need to be synchronousDave Chinner4-2/+12
Preallocation and hole punch transactions are currently synchronous and this is causing performance problems in some cases. The transactions don't need to be synchronous as we don't need to guarantee the preallocation is persistent on disk until a fdatasync, fsync, sync operation occurs. If the file is opened O_SYNC or O_DATASYNC, only then should the transaction be issued synchronously. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>
2011-03-25ceph: flush msgr_wq during mds_client shutdownSage Weil1-0/+6
The release method for mds connections uses a backpointer to the mds_client, so we need to flush the workqueue of any pending work (and ceph_connection references) prior to freeing the mds_client. This fixes an oops easily triggered under UML by while true ; do mount ... ; umount ... ; done Also fix an outdated comment: the flush in ceph_destroy_client only flushes OSD connections out. This bug is basically an artifact of the ceph -> ceph+libceph conversion. Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-25Merge branch 'nfs-for-2.6.39' of ↵Linus Torvalds19-252/+1600
git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * 'nfs-for-2.6.39' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (28 commits) Cleanup XDR parsing for LAYOUTGET, GETDEVICEINFO NFSv4.1 convert layoutcommit sync to boolean NFSv4.1 pnfs_layoutcommit_inode fixes NFS: Determine initial mount security NFS: use secinfo when crossing mountpoints NFS: Add secinfo procedure NFS: lookup supports alternate client NFS: convert call_sync() to a function NFSv4.1 remove temp code that prevented ds commits NFSv4.1: layoutcommit NFSv4.1: filelayout driver specific code for COMMIT NFSv4.1: remove GETATTR from ds commits NFSv4.1: add generic layer hooks for pnfs COMMIT NFSv4.1: alloc and free commit_buckets NFSv4.1: shift filelayout_free_lseg NFSv4.1: pull out code from nfs_commit_release NFSv4.1: pull error handling out of nfs_commit_list NFSv4.1: add callback to nfs4_commit_done NFSv4.1: rearrange nfs_commit_rpcsetup NFSv4.1: don't send COMMIT to ds for data sync writes ...
2011-03-25Merge branch 'for_linus' of ↵Linus Torvalds15-348/+450
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (43 commits) ext4: fix a BUG in mb_mark_used during trim. ext4: unused variables cleanup in fs/ext4/extents.c ext4: remove redundant set_buffer_mapped() in ext4_da_get_block_prep() ext4: add more tracepoints and use dev_t in the trace buffer ext4: don't kfree uninitialized s_group_info members ext4: add missing space in printk's in __ext4_grp_locked_error() ext4: add FITRIM to compat_ioctl. ext4: handle errors in ext4_clear_blocks() ext4: unify the ext4_handle_release_buffer() api ext4: handle errors in ext4_rename jbd2: add COW fields to struct jbd2_journal_handle jbd2: add the b_cow_tid field to journal_head struct ext4: Initialize fsync transaction ids in ext4_new_inode() ext4: Use single thread to perform DIO unwritten convertion ext4: optimize ext4_bio_write_page() when no extent conversion is needed ext4: skip orphan cleanup if fs has unknown ROCOMPAT features ext4: use the nblocks arg to ext4_truncate_restart_trans() ext4: fix missing iput of root inode for some mount error paths ext4: make FIEMAP and delayed allocation play well together ext4: suppress verbose debugging information if malloc-debug is off ... Fi up conflicts in fs/ext4/super.c due to workqueue changes
2011-03-25Merge branch 'master' of ↵Artem Bityutskiy502-9389/+13303
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 into for-linus-1 * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6: (9356 commits) [media] rc: update for bitop name changes fs: simplify iget & friends fs: pull inode->i_lock up out of writeback_single_inode fs: rename inode_lock to inode_hash_lock fs: move i_wb_list out from under inode_lock fs: move i_sb_list out from under inode_lock fs: remove inode_lock from iput_final and prune_icache fs: Lock the inode LRU list separately fs: factor inode disposal fs: protect inode->i_state with inode->i_lock lib, arch: add filter argument to show_mem and fix private implementations SLUB: Write to per cpu data when allocating it slub: Fix debugobjects with lockless fastpath autofs4: Do not potentially dereference NULL pointer returned by fget() in autofs_dev_ioctl_setpipefd() autofs4 - remove autofs4_lock autofs4 - fix d_manage() return on rcu-walk autofs4 - fix autofs4_expire_indirect() traversal autofs4 - fix dentry leak in autofs4_expire_direct() autofs4 - reinstate last used update on access vfs - check non-mountpoint dentry might block in __follow_mount_rcu() ... NOTE! This merge commit was created to fix compilation error. The block tree was merged upstream and removed the 'elv_queue_empty()' function which the new 'mtdswap' driver is using. So a simple merge of the mtd tree with upstream does not compile. And the mtd tree has already be published, so re-basing it is not an option. To fix this unfortunate situation, I had to merge upstream into the mtd-2.6.git tree without committing, put the fixup patch on top of this, and then commit this. The result is that we do not have commits which do not compile. In other words, this merge commit "merges" 3 things: the MTD tree, the upstream tree, and the fixup patch.
2011-03-25nfsd: fix auth_domain reference leak on nlm operationsJ. Bruce Fields1-1/+0
This was noticed by users who performed more than 2^32 lock operations and hence made this counter overflow (eventually leading to use-after-free's). Setting rq_client to NULL here means that it won't later get auth_domain_put() when it should be. Appears to have been introduced in 2.5.42 by "[PATCH] kNFSd: Move auth domain lookup into svcauth" which moved most of the rq_client handling to common svcauth code, but left behind this one line. Cc: Neil Brown <neilb@suse.de> Cc: stable@kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-03-25Merge branch 'for-linus' of ↵Linus Torvalds18-516/+586
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: fs: simplify iget & friends fs: pull inode->i_lock up out of writeback_single_inode fs: rename inode_lock to inode_hash_lock fs: move i_wb_list out from under inode_lock fs: move i_sb_list out from under inode_lock fs: remove inode_lock from iput_final and prune_icache fs: Lock the inode LRU list separately fs: factor inode disposal fs: protect inode->i_state with inode->i_lock autofs4: Do not potentially dereference NULL pointer returned by fget() in autofs_dev_ioctl_setpipefd() autofs4 - remove autofs4_lock autofs4 - fix d_manage() return on rcu-walk autofs4 - fix autofs4_expire_indirect() traversal autofs4 - fix dentry leak in autofs4_expire_direct() autofs4 - reinstate last used update on access vfs - check non-mountpoint dentry might block in __follow_mount_rcu()
2011-03-25fs: simplify iget & friendsChristoph Hellwig1-179/+83
Merge get_new_inode/get_new_inode_fast into iget5_locked/iget_locked as those were the only callers. Remove the internal ifind/ifind_fast helpers - ifind_fast only had a single caller, and ifind had two callers wanting it to do different things. Also clean up the comments in this area to focus on information important to a developer trying to use it, instead of overloading them with implementation details. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-25fs: pull inode->i_lock up out of writeback_single_inodeDave Chinner1-7/+11
First thing we do in writeback_single_inode() is take the i_lock and the last thing we do is drop it. A caller already holds the i_lock, so pull the i_lock out of writeback_single_inode() to reduce the round trips on this lock during inode writeback. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-25fs: rename inode_lock to inode_hash_lockDave Chinner5-55/+63
All that remains of the inode_lock is protecting the inode hash list manipulation and traversals. Rename the inode_lock to inode_hash_lock to reflect it's actual function. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-25fs: move i_wb_list out from under inode_lockDave Chinner4-38/+59
Protect the inode writeback list with a new global lock inode_wb_list_lock and use it to protect the list manipulations and traversals. This lock replaces the inode_lock as the inodes on the list can be validity checked while holding the inode->i_lock and hence the inode_lock is no longer needed to protect the list. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-25fs: move i_sb_list out from under inode_lockDave Chinner6-56/+67
Protect the per-sb inode list with a new global lock inode_sb_list_lock and use it to protect the list manipulations and traversals. This lock replaces the inode_lock as the inodes on the list can be validity checked while holding the inode->i_lock and hence the inode_lock is no longer needed to protect the list. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-25fs: remove inode_lock from iput_final and prune_icacheDave Chinner2-15/+4
Now that inode state changes are protected by the inode->i_lock and the inode LRU manipulations by the inode_lru_lock, we can remove the inode_lock from prune_icache and the initial part of iput_final(). instead of using the inode_lock to protect the inode during iput_final, use the inode->i_lock instead. This protects the inode against new references being taken while we change the inode state to I_FREEING, as well as preventing prune_icache from grabbing the inode while we are manipulating it. Hence we no longer need the inode_lock in iput_final prior to setting I_FREEING on the inode. For prune_icache, we no longer need the inode_lock to protect the LRU list, and the inodes themselves are protected against freeing races by the inode->i_lock. Hence we can lift the inode_lock from prune_icache as well. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-25fs: Lock the inode LRU list separatelyDave Chinner1-9/+30
Introduce the inode_lru_lock to protect the inode_lru list. This lock is nested inside the inode->i_lock to allow the inode to be added to the LRU list in iput_final without needing to deal with lock inversions. This keeps iput_final() clean and neat. Further, where marking the inode I_FREEING and removing it from the LRU, move the LRU list manipulation within the inode->i_lock to keep the list manipulation consistent with iput_final. This also means that most of the open coded LRU list removal + unused inode accounting can now use the inode_lru_list_del() wrappers which cleans the code up further. However, this locking change means what the LRU traversal in prune_icache() inverts this lock ordering and needs to use trylock semantics on the inode->i_lock to avoid deadlocking. In these cases, if we fail to lock the inode we move it to the back of the LRU to prevent spinning on it. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-25fs: factor inode disposalDave Chinner1-63/+41
We have a couple of places that dispose of inodes. factor the disposal into evict() to isolate this code and make it simpler to peel away the inode_lock from the code. While doing this, change the logic flow in iput_final() to separate the different cases that need to be handled to make the transitions the inode goes through more obvious. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-25fs: protect inode->i_state with inode->i_lockDave Chinner7-72/+169
Protect inode state transitions and validity checks with the inode->i_lock. This enables us to make inode state transitions independently of the inode_lock and is the first step to peeling away the inode_lock from the code. This requires that __iget() is done atomically with i_state checks during list traversals so that we don't race with another thread marking the inode I_FREEING between the state check and grabbing the reference. Also remove the unlock_new_inode() memory barrier optimisation required to avoid taking the inode_lock when clearing I_NEW. Simplify the code by simply taking the inode->i_lock around the state change and wakeup. Because the wakeup is no longer tricky, remove the wake_up_inode() function and open code the wakeup where necessary. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-25Merge branch 'nfs-for-2.6.39' into nfs-for-nextTrond Myklebust16-190/+1190
2011-03-25Cleanup XDR parsing for LAYOUTGET, GETDEVICEINFOWeston Andros Adamson6-84/+226
changes LAYOUTGET and GETDEVICEINFO XDR parsing to: - not use vmap, which doesn't work on incoherent archs - use xdr_stream parsing for all xdr Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-24NFSv4.1 convert layoutcommit sync to booleanAndy Adamson6-10/+12
Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-24NFSv4.1 pnfs_layoutcommit_inode fixesAndy Adamson1-11/+19
Test NFS_INO_LAYOUTCOMMIT before kzalloc Mark inode dirty to retry LAYOUTCOMMIT on kzalloc failure. Add comments. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-24autofs4: Do not potentially dereference NULL pointer returned by fget() in ↵Jesper Juhl1-0/+4
autofs_dev_ioctl_setpipefd() In fs/autofs4/dev-ioctl.c::autofs_dev_ioctl_setpipefd() we call fget(), which may return NULL, but we do not explicitly test for that NULL return so we may end up dereferencing a NULL pointer - bad. When I originally submitted this patch I had chosen EBUSY as the return value to use if this happens. Ian Kent was kind enough to explain why that would most likely be wrong and why EBADF should most likely be used instead. This version of the patch uses EBADF. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-24autofs4 - remove autofs4_lockIan Kent4-31/+20
The autofs4_lock introduced by the rcu-walk changes has unnecessarily broad scope. The locking is better handled by the per-autofs super block lookup_lock. Signed-off-by: Ian Kent <raven@themaw.net> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-24autofs4 - fix d_manage() return on rcu-walkIan Kent1-0/+2
The daemon never needs to block and, in the rcu-walk case an error return isn't used, so always return zero. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-24autofs4 - fix autofs4_expire_indirect() traversalIan Kent1-1/+51
The vfs-scale changes changed the traversal used in autofs4_expire_indirect() from a list to a depth first tree traversal which isn't right. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-24autofs4 - fix dentry leak in autofs4_expire_direct()Ian Kent1-4/+3
There is a missing dput() when returning from autofs4_expire_direct() when we see that the dentry is already a pending mount. Signed-off-by: Ian Kent <raven@themaw.net> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-24autofs4 - reinstate last used update on accessIan Kent2-34/+14
When direct (and offset) mounts were introduced the the last used timeout could no longer be updated in ->d_revalidate(). This is because covered direct mounts would be followed over without calling the autofs file system. As a result the definition of the busyness check for all entries was changed to be "actually busy" being an open file or working directory within the automount. But now we have a call back in the follow so the last used update on any access can be re-instated. This requires DCACHE_MANAGE_TRANSIT to always be set. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-24vfs - check non-mountpoint dentry might block in __follow_mount_rcu()Ian Kent1-5/+18
When following a mount in rcu-walk mode we must check if the incoming dentry is telling us it may need to block, even if it isn't actually a mountpoint. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-24NFS: Determine initial mount securityBryan Schumaker1-2/+31
When sec=<something> is not presented as a mount option, we should attempt to determine what security flavor the server is using. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-24NFS: use secinfo when crossing mountpointsBryan Schumaker5-11/+142
A submount may use different security than the parent mount does. We should figure out what sec flavor the submount uses at mount time. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-24NFS: Add secinfo procedureBryan Schumaker2-0/+170
This patch adds the nfs4 operation secinfo as a valid nfs rpc operation. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-24NFS: lookup supports alternate clientBryan Schumaker7-48/+59
A later patch will need to perform a lookup using an alternate client with a different security flavor. This patch adds support for doing that on NFS v4. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>