path: root/fs
Age | Commit message | Author | Files | Lines
2009-02-04 | Btrfs: Change btree locking to use explicit blocking points | Chris Mason | 11 | -39/+470
Most of the btrfs metadata operations can be protected by a spinlock, but some operations still need to schedule.
So far, btrfs has been using a mutex along with a trylock loop. Most of the time it is able to avoid going for the full mutex, so the trylock loop is a big performance gain.
This commit is step one for getting rid of the blocking locks entirely. btrfs_tree_lock takes a spinlock, and the code explicitly switches to a blocking lock when it starts an operation that can schedule.
We'll be able to get rid of the blocking locks in smaller pieces over time. Tracing allows us to find the most common cause of blocking, so we can start with the hot spots first.
The basic idea is:
btrfs_tree_lock() returns with the spin lock held.
btrfs_set_lock_blocking() sets the EXTENT_BUFFER_BLOCKING bit in the extent buffer flags, and then drops the spin lock. The buffer is still considered locked by all of the btrfs code.
If btrfs_tree_lock gets the spinlock but finds the blocking bit set, it drops the spin lock and waits on a wait queue for the blocking bit to go away.
Much of the code that needs to set the blocking bit finishes without actually blocking a good percentage of the time. So, an adaptive spin is still used against the blocking bit to avoid very high context switch rates.
btrfs_clear_lock_blocking() clears the blocking bit and returns with the spinlock held again.
btrfs_tree_unlock() can be called on either blocking or spinning locks; it does the right thing based on the blocking bit.
ctree.c has a helper function to set/clear all the locked buffers in a path as blocking.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
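The lock states described in this commit message can be modelled in userspace. The following is a minimal sketch, not the btrfs implementation: the struct and function names are hypothetical, C11 atomics stand in for the kernel spinlock and the EXTENT_BUFFER_BLOCKING bit, sched_yield() stands in for sleeping on the wait queue, and the adaptive spin is left out.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the lock state of one extent buffer. */
struct eb_lock_sketch {
    atomic_bool spin;      /* stand-in for the spinlock */
    atomic_bool blocking;  /* stand-in for the EXTENT_BUFFER_BLOCKING bit */
};

static void eb_lock_init(struct eb_lock_sketch *l)
{
    atomic_init(&l->spin, false);
    atomic_init(&l->blocking, false);
}

static void spin_acquire(struct eb_lock_sketch *l)
{
    while (atomic_exchange(&l->spin, true))
        ;   /* busy-wait until the previous holder releases */
}

/* Returns with the spin lock held and the blocking bit clear. */
static void eb_tree_lock(struct eb_lock_sketch *l)
{
    for (;;) {
        /* The kernel sleeps on a wait queue here; this sketch just yields. */
        while (atomic_load(&l->blocking))
            sched_yield();
        spin_acquire(l);
        if (!atomic_load(&l->blocking))
            return;
        atomic_store(&l->spin, false);  /* a blocking holder exists: back off */
    }
}

/* Holder is about to schedule: set the bit, then drop the spin lock. */
static void eb_set_lock_blocking(struct eb_lock_sketch *l)
{
    assert(atomic_load(&l->spin));  /* caller must hold the spin lock */
    atomic_store(&l->blocking, true);
    atomic_store(&l->spin, false);
}

/* Holder is done blocking: retake the spin lock, then clear the bit. */
static void eb_clear_lock_blocking(struct eb_lock_sketch *l)
{
    if (!atomic_load(&l->blocking))
        return;                     /* already a spinning lock */
    spin_acquire(l);
    atomic_store(&l->blocking, false);
}

/* Works for either state, keyed off the blocking bit. */
static void eb_tree_unlock(struct eb_lock_sketch *l)
{
    if (atomic_load(&l->blocking))
        atomic_store(&l->blocking, false);  /* the kernel would wake waiters */
    else
        atomic_store(&l->spin, false);
}
```

In this model the buffer counts as locked whenever either flag is set, which mirrors the statement that the buffer "is still considered locked" after the spin lock is dropped.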
2009-02-04 | Btrfs: hash_lock is no longer needed | Chris Mason | 3 | -11/+1
Before metadata is written to disk, it is updated to reflect that writeout has begun. Once this update is done, the block must be cow'd before it can be modified again. This update was originally synchronized by using a per-fs spinlock. Today the buffers for the metadata blocks are locked before writeout begins, and everyone that tests the flag has the buffer locked as well. So, the per-fs spinlock (called hash_lock for no good reason) is no longer required. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-02-04 | Btrfs: disable leak debugging checks in extent_io.c | Chris Mason | 1 | -8/+8
extent_io.c has debugging code to report and free leaked extent_state and extent_buffer objects at rmmod time. This helps track down leaks and it saves you from rebooting just to properly remove the kmem_cache object. But, the code runs under a fairly expensive spinlock and the checks to see if it is currently enabled are not entirely consistent. Some use #ifdef and some #if. This changes everything to #if and disables the leak checking. Signed-off-by: Chris Mason <chris.mason@oracle.com>
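The #ifdef versus #if inconsistency matters because the two tests disagree when a flag is defined as 0. A small sketch of the difference follows; the macro name LEAK_DEBUG and the helper functions are assumptions for illustration and are not taken from the commit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag: defined, but set to 0 to mean "disabled". */
#define LEAK_DEBUG 0
static_assert(LEAK_DEBUG == 0, "this sketch assumes the flag is defined as 0");

/* #ifdef only asks whether the name is defined, so this reports "enabled". */
static bool enabled_per_ifdef(void)
{
#ifdef LEAK_DEBUG
    return true;
#else
    return false;
#endif
}

/* #if evaluates the value, so a flag defined as 0 reports "disabled". */
static bool enabled_per_if(void)
{
#if LEAK_DEBUG
    return true;
#else
    return false;
#endif
}
```

Using #if everywhere, as the commit describes, makes a single "define to 0 or 1" switch behave the same at every check.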
2009-02-04 | Btrfs: sort references by byte number during btrfs_inc_ref | Chris Mason | 1 | -6/+79
When a block goes through cow, we update the reference counts of everything that block points to. The internal pointers of the block can be in just about any order, and it is likely to have clusters of things that are close together and clusters of things that are not. To help reduce the seeks that come with updating all of these reference counts, sort them by byte number before actual updates are done. Signed-off-by: Chris Mason <chris.mason@oracle.com>
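The idea of ordering the pending reference updates by disk position can be sketched as below. This is an illustration only: the struct layout and function names are hypothetical, and the C library qsort() is used here as a stand-in for whatever sort routine the kernel code actually calls.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record: one pointer found in the block being cow'd. */
struct ref_sketch {
    uint64_t bytenr;  /* byte number of the referenced block */
    uint32_t slot;    /* position of the pointer inside the parent */
};

/* Order by byte number; compare explicitly to avoid 64-bit overflow. */
static int cmp_bytenr(const void *a, const void *b)
{
    const struct ref_sketch *ra = a;
    const struct ref_sketch *rb = b;

    if (ra->bytenr < rb->bytenr)
        return -1;
    if (ra->bytenr > rb->bytenr)
        return 1;
    return 0;
}

/* Sort first, then walk the array so updates move forward across the disk. */
static void sort_refs_by_bytenr(struct ref_sketch *refs, size_t n)
{
    assert(refs != NULL || n == 0);
    if (n > 1)
        qsort(refs, n, sizeof(*refs), cmp_bytenr);
}
```

The slot field is kept so that, after sorting, each entry can still be traced back to its position in the original block.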
2009-02-04 | Btrfs: async threads should try harder to find work | Chris Mason | 2 | -6/+46
Tracing shows the delay between when an async thread goes to sleep and when more work is added is often very short. This commit adds a little bit of delay and extra checking to the code right before we schedule out. It allows more work to be added to the worker without requiring notifications from other procs. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-02-04 | Btrfs: selinux support | Jim Owens | 3 | -4/+53
Add call to LSM security initialization and save resulting security xattr for new inodes. Add xattr support to symlink inode ops. Set inode->i_op for existing special files. Signed-off-by: jim owens <jowens@hp.com>
2009-02-04 | Btrfs: make btrfs acls selectable | Christian Hesse | 1 | -0/+13
This patch adds a menu entry to kconfig to enable acls for btrfs. This allows you to enable FS_POSIX_ACL at kernel compile time. (updated by Jeff Mahoney to make the changes in fs/btrfs/Kconfig instead) Signed-off-by: Christian Hesse <mail@earthworm.de> Signed-off-by: Jeff Mahoney <jeffm@suse.com>
2009-02-04 | Btrfs: Catch missed bios in the async bio submission thread | Chris Mason | 2 | -3/+18
The async bio submission thread was missing some bios that were added after it had decided there was no work left to do. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-01-28 | Btrfs: fix readdir on 32 bit machines | Chris Mason | 1 | -1/+1
After btrfs_readdir has gone through all the directory items, it sets the directory f_pos to the largest possible int. This way applications that mix readdir with creating new files don't end up in an endless loop finding the new directory items as they go. It was a workaround for a bug in git, but the assumption was that if git could make this looping mistake then it would be a common problem. The largest possible int chosen was INT_LIMIT(typeof(file->f_pos)), and it is possible for that to be a larger number than 32 bit glibc expects to come out of readdir. This patch switches that to INT_LIMIT(off_t), which should keep applications happy on 32 and 64 bit machines. Signed-off-by: Chris Mason <chris.mason@oracle.com>
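The size mismatch behind this fix can be illustrated with fixed-width types. This is a sketch under stated assumptions: int64_t stands in for the type of file->f_pos, int32_t stands in for off_t on a 32 bit machine, and the typedef, macro and function names are hypothetical. The kernel's INT_LIMIT() macro itself is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins: a 64-bit file position and a 32-bit off_t. */
typedef int64_t pos64_sketch_t;
typedef int32_t off32_sketch_t;
static_assert(sizeof(off32_sketch_t) < sizeof(pos64_sketch_t),
              "the sketch needs a narrower offset type");

/* "Largest possible int" end-of-directory markers for each width. */
#define END_MARKER_64 INT64_MAX
#define END_MARKER_32 INT32_MAX

/* True if a position survives being handed to a 32-bit consumer. */
static bool fits_in_off32(pos64_sketch_t pos)
{
    return pos >= INT32_MIN && pos <= INT32_MAX;
}
```

With these stand-ins, fits_in_off32(END_MARKER_64) is false while fits_in_off32(END_MARKER_32) is true, which is the situation the commit describes for 32 bit glibc.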
2009-01-29 | Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable | Chris Mason | 19 | -318/+418
Fix fs/btrfs/super.c conflict around #includes
2009-01-28 | nfsd: only set file_lock.fl_lmops in nfsd4_lockt if a stateowner is found | Jeff Layton | 1 | -1/+0
nfsd4_lockt does a search for a lockstateowner when building the lock struct to test. If one is found, it'll set fl_owner to it. Regardless of whether that happens, it'll also set fl_lmops. Given that this lock is basically a "lightweight" lock that's just used for checking conflicts, setting fl_lmops is probably not appropriate for it. This behavior exposed a bug in DLM's GETLK implementation where it wasn't clearing out the fields in the file_lock before filling in conflicting lock info. While we were able to fix this in DLM, it still seems pointless and dangerous to set the fl_lmops this way when we may have a NULL lockstateowner. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@pig.fieldses.org>
2009-01-28 | nfsd: fix cred leak on every rpc | J. Bruce Fields | 1 | -0/+1
Since override_creds() took its own reference on new, we need to release our own reference. (Note the put_cred on the return value puts the *old* value of current->creds, not the new passed-in value). Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
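The reference accounting in this message can be followed with a toy model. Everything below is hypothetical: the *_sketch names are illustrative stand-ins, a plain int replaces the kernel's reference counter, and the only behaviour modelled is what the message states, namely that overriding takes its own reference on the new creds and returns the old ones.

```c
#include <assert.h>
#include <stddef.h>

struct cred_sketch {
    int usage;  /* toy reference count */
};

/* Stand-in for the task's current credentials pointer. */
static struct cred_sketch *current_cred_sketch;

static struct cred_sketch *get_cred_sketch(struct cred_sketch *c)
{
    c->usage++;
    return c;
}

static void put_cred_sketch(struct cred_sketch *c)
{
    assert(c->usage > 0);
    c->usage--;
}

/* Installs 'new' with its own reference and hands back the old creds. */
static struct cred_sketch *override_creds_sketch(struct cred_sketch *new)
{
    struct cred_sketch *old = current_cred_sketch;

    current_cred_sketch = get_cred_sketch(new);
    return old;
}

/* Per-request setup; 'drop_own_ref' models the one-line fix. */
static void set_user_sketch(struct cred_sketch *new, int drop_own_ref)
{
    struct cred_sketch *old = override_creds_sketch(new);

    if (old)
        put_cred_sketch(old);   /* this puts the *old* creds, not 'new' */
    if (drop_own_ref)
        put_cred_sketch(new);   /* release the caller's own reference */
}
```

If the caller starts with one reference on the new creds, the count ends at 2 without the extra put and at 1 with it, so one reference would be stranded per request in the unfixed version of this model.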
2009-01-28 | nfsd: fix null dereference on error path | J. Bruce Fields | 1 | -0/+2
We're forgetting to check the return value from groups_alloc(). Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-01-26 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm | Linus Torvalds | 1 | -2/+4
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm:
  dlm: initialize file_lock struct in GETLK before copying conflicting lock
  dlm: fix plock notify callback to lockd
2009-01-26 | Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-quota-2.6 | Linus Torvalds | 2 | -263/+124
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-quota-2.6:
  ocfs2: Remove ocfs2_dquot_initialize() and ocfs2_dquot_drop()
  quota: Improve locking
2009-01-26 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6 | Linus Torvalds | 1 | -0/+6
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6:
  klist.c: bit 0 in pointer can't be used as flag
  debugfs: introduce stub for debugfs_create_size_t() when DEBUG_FS=n
  sysfs: fix problems with binary files
  PNP: fix broken pnp lowercasing for acpi module aliases
  driver core: Convert '/' to '!' in dev_set_name()
2009-01-26 | Merge branch 'Kconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/adobriyan/misc | Linus Torvalds | 39 | -1363/+1284
* 'Kconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/adobriyan/misc: (36 commits)
  fs/Kconfig: move 9p out
  fs/Kconfig: move afs out
  fs/Kconfig: move coda out
  fs/Kconfig: move the rest of ncpfs out
  fs/Kconfig: move smbfs out
  fs/Kconfig: move sunrpc out
  fs/Kconfig: move nfsd out
  fs/Kconfig: move nfs out
  fs/Kconfig: move ufs out
  fs/Kconfig: move sysv out
  fs/Kconfig: move romfs out
  fs/Kconfig: move qnx4 out
  fs/Kconfig: move hpfs out
  fs/Kconfig: move omfs out
  fs/Kconfig: move minix out
  fs/Kconfig: move vxfs out
  fs/Kconfig: move squashfs out
  fs/Kconfig: move cramfs out
  fs/Kconfig: move efs out
  fs/Kconfig: move bfs out
  ...
2009-01-26 | inotify: clean up inotify_read and fix locking problems | Vegard Nossum | 1 | -61/+74
If userspace supplies an invalid pointer to a read() of an inotify instance, the inotify device's event list mutex is unlocked twice. This causes an imbalance which effectively leaves the data structure unprotected, and we can trigger oopses by accessing the inotify instance from different tasks concurrently.
The best fix (contributed largely by Linus) is a total rewrite of the function in question:
On Thu, Jan 22, 2009 at 7:05 AM, Linus Torvalds wrote:
> The thing to notice is that:
>
>  - locking is done in just one place, and there is no question about it
>    not having an unlock.
>
>  - that whole double-while(1)-loop thing is gone.
>
>  - use multiple functions to make nesting and error handling sane
>
>  - do error testing after doing the things you always need to do, ie do
>    this:
>
>      mutex_lock(..)
>      ret = function_call();
>      mutex_unlock(..)
>
>      .. test ret here ..
>
>    instead of doing conditional exits with unlocking or freeing.
>
> So if the code is written in this way, it may still be buggy, but at least
> it's not buggy because of subtle "forgot to unlock" or "forgot to free"
> issues.
>
> This _always_ unlocks if it locked, and it always frees if it got a
> non-error kevent.
Cc: John McCutchan <ttb@tentacle.dhs.org>
Cc: Robert Love <rlove@google.com>
Cc: <stable@kernel.org>
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
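The "lock, call, unlock, then test the result" shape quoted above can be sketched as a small self-contained function. This is not the inotify code: the counting lock, fetch_event_sketch() and read_one_sketch() are hypothetical names, and the lock only records its depth so that an unbalanced unlock trips an assertion.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical lock that records its depth so imbalance is detectable. */
struct counting_lock_sketch {
    int depth;
};

static void lock_sketch(struct counting_lock_sketch *l)
{
    assert(l->depth == 0);
    l->depth++;
}

static void unlock_sketch(struct counting_lock_sketch *l)
{
    assert(l->depth == 1);  /* a double unlock would fail here */
    l->depth--;
}

/* Stand-in for the work done under the lock; may fail. */
static int fetch_event_sketch(int have_event)
{
    return have_event ? 0 : -EAGAIN;
}

/* One lock site, one unlock site, and the error test comes afterwards. */
static int read_one_sketch(struct counting_lock_sketch *l, int have_event)
{
    int ret;

    lock_sketch(l);
    ret = fetch_event_sketch(have_event);
    unlock_sketch(l);

    if (ret < 0)
        return ret;     /* no unlocking needed on the error path */
    return 1;           /* one event consumed */
}
```

Because the unlock happens before any branch on ret, both the success path and the error path leave the lock depth at zero.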
2009-01-26 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse | Linus Torvalds | 3 | -18/+30
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: fix poll notify
  fuse: destroy bdi on umount
  fuse: fuse_fill_super error handling cleanup
  fuse: fix missing fput on error
  fuse: fix NULL deref in fuse_file_alloc()
2009-01-26 | fuse: fix poll notify | Miklos Szeredi | 1 | -4/+9
Move fuse_copy_finish() to before calling fuse_notify_poll_wakeup(). This is not a big issue because fuse_notify_poll_wakeup() should be atomic, but it's cleaner this way, and later uses of notification will need to be able to finish the copying before performing some actions. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2009-01-26 | fuse: destroy bdi on umount | Miklos Szeredi | 2 | -2/+3
If a fuse filesystem is unmounted but the device file descriptor remains open and a new mount reuses the old device number, then the mount fails with EEXIST and the following warning is printed in the kernel log:
  WARNING: at fs/sysfs/dir.c:462 sysfs_add_one+0x35/0x3d()
  sysfs: duplicate filename '0:15' can not be created
The cause is that the bdi belonging to the fuse filesystem was destroyed only after the device file was released. Fix this by calling bdi_destroy() from fuse_put_super() instead.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: stable@kernel.org
2009-01-26 | fuse: fuse_fill_super error handling cleanup | Miklos Szeredi | 1 | -18/+19
Clean up error handling for the whole of fuse_fill_super() function. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2009-01-26 | fuse: fix missing fput on error | Miklos Szeredi | 1 | -2/+7
Fix the leaking file reference if allocation or initialization of fuse_conn failed. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: stable@kernel.org
2009-01-26 | fuse: fix NULL deref in fuse_file_alloc() | Dan Carpenter | 1 | -1/+1
ff is set to NULL and then dereferenced on line 65. Compile tested only. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: stable@kernel.org
2009-01-22 | Btrfs: do less aggressive btree readahead | Chris Mason | 1 | -16/+5
Just before reading a leaf, btrfs scans the node for blocks that are close by and reads them too. It tries to build up a large window of IO looking for blocks that are within a max distance from the top and bottom of the IO window. This patch changes things to just look for blocks within 64k of the target block. It will trigger less IO and make for lower latencies on the read side. Signed-off-by: Chris Mason <chris.mason@oracle.com>
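The simplified rule, reading only blocks within 64k of the target, can be sketched as a distance check. The names below are hypothetical, and whether the real comparison is inclusive at exactly 64k is an assumption made here for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed window size: 64k on either side of the target block. */
#define READAHEAD_WINDOW_SKETCH (64 * 1024)

/* True if 'candidate' lies within the window around 'target'. */
static bool within_readahead_window(uint64_t target, uint64_t candidate)
{
    uint64_t dist = candidate > target ? candidate - target
                                       : target - candidate;

    return dist <= READAHEAD_WINDOW_SKETCH;
}

/* Count how many block addresses in a node would be read ahead. */
static size_t count_readahead_candidates(uint64_t target,
                                         const uint64_t *blocks, size_t n)
{
    size_t hits = 0;

    assert(blocks != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (within_readahead_window(target, blocks[i]))
            hits++;
    return hits;
}
```

Unlike the older growing-window approach the message describes, the decision for each block here depends only on the fixed target, so one far-away block cannot widen the window for the rest.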
2009-01-22 | fs/Kconfig: move 9p out | Alexey Dobriyan | 2 | -11/+11
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2009-01-22 | fs/Kconfig: move afs out | Alexey Dobriyan | 2 | -22/+22
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2009-01-22 | fs/Kconfig: move coda out | Alexey Dobriyan | 2 | -22/+22
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2009-01-22 | fs/Kconfig: move the rest of ncpfs out | Alexey Dobriyan | 2 | -22/+21
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2009-01-22 | fs/Kconfig: move smbfs out | Alexey Dobriyan | 2 | -57/+56
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2009-01-22 | fs/Kconfig: move sunrpc out | Alexey Dobriyan | 1 | -79/+1
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2009-01-22 | fs/Kconfig: move nfsd out | Alexey Dobriyan | 2 | -81/+81
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2009-01-22 | fs/Kconfig: move nfs out | Alexey Dobriyan | 2 | -86/+87
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2009-01-22 | fs/Kconfig: move ufs out | Alexey Dobriyan | 2 | -44/+44
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2009-01-22 | fs/Kconfig: move sysv out | Alexey Dobriyan | 2 | -38/+37
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2009-01-22 | fs/Kconfig: move romfs out | Alexey Dobriyan | 2 | -18/+17
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2009-01-22 | fs/Kconfig: move qnx4 out | Alexey Dobriyan | 2 | -26/+26
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2009-01-22 | fs/Kconfig: move hpfs out | Alexey Dobriyan | 2 | -16/+15
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2009-01-22 | fs/Kconfig: move omfs out | Alexey Dobriyan | 2 | -14/+14
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2009-01-22 | fs/Kconfig: move minix out | Alexey Dobriyan | 2 | -18/+18
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2009-01-22 | fs/Kconfig: move vxfs out | Alexey Dobriyan | 2 | -17/+17
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2009-01-22 | fs/Kconfig: move squashfs out | Alexey Dobriyan | 2 | -52/+52
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2009-01-22 | fs/Kconfig: move cramfs out | Alexey Dobriyan | 2 | -20/+20
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2009-01-22 | fs/Kconfig: move efs out | Alexey Dobriyan | 2 | -16/+15
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2009-01-22 | fs/Kconfig: move bfs out | Alexey Dobriyan | 2 | -22/+20
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2009-01-22 | fs/Kconfig: move befs out | Alexey Dobriyan | 2 | -27/+27
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2009-01-22 | fs/Kconfig: move hfs, hfsplus out | Alexey Dobriyan | 3 | -27/+27
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2009-01-22 | fs/Kconfig: move ecryptfs out | Alexey Dobriyan | 2 | -12/+12
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2009-01-22 | fs/Kconfig: move affs out | Alexey Dobriyan | 2 | -22/+22
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2009-01-22 | fs/Kconfig: move adfs out | Alexey Dobriyan | 2 | -27/+28
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>