kernel/linux.git - Linux kernel stable tree (mirror)

Age	Commit message	Author	Files	Lines
2008-10-29	Btrfs: Add zlib compression support	Chris Mason	1	-1/+2
	This is a large change for adding compression on reading and writing, both for inline and regular extents. It does some fairly large surgery to the writeback paths.
	Compression is off by default and enabled by mount -o compress. Even when the -o compress mount option is not used, it is possible to read compressed extents off the disk.
	If compression for a given set of pages fails to make them smaller, the file is flagged to avoid future compression attempts.
	* While finding delalloc extents, the pages are locked before being sent down to the delalloc handler. This allows the delalloc handler to do complex things such as cleaning the pages, marking them writeback and starting IO on their behalf.
	* Inline extents are inserted at delalloc time now. This allows us to compress the data before inserting the inline extent, and it allows us to insert an inline extent that spans multiple pages.
	* All of the in-memory extent representations (extent_map.c, ordered-data.c etc) are changed to record both an in-memory size and an on disk size, as well as a flag for compression.
	From a disk format point of view, the extent pointers in the file are changed to record the on disk size of a given extent and some encoding flags. Space in the disk format is allocated for compression encoding, as well as encryption and a generic 'other' field. Neither the encryption nor the 'other' field is currently used.
	In order to limit the amount of data read for a single random read in the file, the size of a compressed extent is limited to 128k. This is a software only limit; the disk format supports u64 sized compressed extents.
	In order to limit the ram consumed while processing extents, the uncompressed size of a compressed extent is limited to 256k. This is a software only limit and will be subject to tuning later.
	Checksumming is still done on compressed extents, and it is done on the uncompressed version of the data. This way additional encodings can be layered on without having to figure out which encoding to checksum.
	Compression happens at delalloc time, which is basically single threaded because it is usually done by a single pdflush thread. This makes it tricky to spread the compression load across all the cpus on the box. We'll have to look at parallel pdflush walks of dirty inodes at a later time.
	Decompression is hooked into readpages and it does spread across CPUs nicely.
	Signed-off-by: Chris Mason <chris.mason@oracle.com>
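The compress-or-fall-back decision described in this commit can be sketched roughly as follows, in Python rather than kernel C. The names `try_compress`, `read_extent`, `MAX_COMPRESSED` and `MAX_UNCOMPRESSED` are hypothetical; the 128k and 256k limits are the ones quoted in the message, and `zlib.crc32` is only a stand-in for whatever checksum the filesystem really uses.

```python
import zlib

# Software limits quoted in the commit message (constant names are hypothetical).
MAX_COMPRESSED = 128 * 1024
MAX_UNCOMPRESSED = 256 * 1024


def try_compress(data: bytes):
    """Return (payload, is_compressed, checksum) for one extent's worth of data.

    The checksum is taken over the uncompressed bytes, as the message describes.
    zlib.crc32 is a stand-in, not the filesystem's actual checksum.
    """
    if len(data) > MAX_UNCOMPRESSED:
        raise ValueError("caller must split data into chunks of at most 256k")
    checksum = zlib.crc32(data)
    packed = zlib.compress(data)
    # If compression does not make the data smaller (or the result exceeds the
    # compressed-size limit), store it uncompressed. A real implementation
    # would also flag the file to avoid future compression attempts.
    if len(packed) >= len(data) or len(packed) > MAX_COMPRESSED:
        return data, False, checksum
    return packed, True, checksum


def read_extent(payload: bytes, is_compressed: bool, checksum: int) -> bytes:
    """Decompress if needed, then verify the checksum on the uncompressed data."""
    data = zlib.decompress(payload) if is_compressed else payload
    if zlib.crc32(data) != checksum:
        raise IOError("checksum mismatch")
    return data
```

Repetitive data such as `b"a" * 4096` comes back flagged as compressed, while a buffer like `bytes(range(256))`, which has no repetition for deflate to exploit, is stored as-is.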
2008-10-09	Btrfs: Fix makefile for building btrfs static	Sage Weil	1	-2/+2
	This fixes the btrfs makefile for building in the tree and out of the tree both as a module and static. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-29	Btrfs: add and improve comments	Chris Mason	1	-1/+1
	This improves the comments at the top of many functions. It didn't dive into the guts of functions because I was trying to avoid merging problems with the new allocator and back reference work. extent-tree.c and volumes.c were both skipped, and there is definitely more work to do in cleaning and commenting the code. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25	Update Btrfs files for in-kernel usage	Chris Mason	1	-6/+1
	btrfs had magic to put the changeset id into a printk on module load. This removes that from the Makefile and hardcodes the printk to print "Btrfs". Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25	Btrfs: free space accounting redo	Josef Bacik	1	-1/+1
	1) replace the per fs_info extent_io_tree that tracked free space with two rb-trees per block group to track free space areas via offset and size. The reason to do this is that most allocations come with a hint byte where to start, so we can usually find a chunk of free space at that hint byte to satisfy the allocation and get good space packing. If we cannot find free space at or after the given offset we fall back on looking for a chunk of the given size as close to that given offset as possible. When we fall back on the size search we also try to find a slot as close to the size we want as possible, to avoid breaking small chunks off of huge areas if possible.
	2) remove the extent_io_tree that tracked the block group cache from fs_info and replace it with an rb-tree that tracks block group cache via offset. also added a per space_info list that tracks the block group cache for the particular space so we can look up related block groups easily.
	3) cleaned up the allocation code to make it a little easier to read and a little less complicated. Basically there are 3 steps: first look from our provided hint. If we couldn't find space from that given hint, start back at our original search start and look for space from there. If that fails, try to allocate space if we can and start looking again. If not we're screwed and need to start over again.
	4) small fixes. there were some issues in volumes.c where we wouldn't allocate the rest of the disk. fixed cow_file_range to actually pass the alloc_hint, which has helped a good bit in making the fs_mark test I run have semi-normal results as we run out of space. Generally with data allocations we don't track where we last allocated from, so every time we did a data allocation we'd search through every block group that we have looking for free space. While searching a block group with no free space isn't terribly time consuming, it was causing a slight degradation as we got more data block groups. The alloc_hint has fixed this slight degradation and made things semi-normal.
	There is still one nagging problem I'm working on where we will get ENOSPC when there is definitely plenty of space. This only happens with metadata allocations, and only when we are almost full. So you generally hit the 85% mark first, but sometimes you'll hit the BUG before you hit the 85% wall. I'm still tracking it down, but until then this seems to be pretty stable and make a significant performance gain.
	Signed-off-by: Chris Mason <chris.mason@oracle.com>
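The hint-then-size lookup from item 1 can be illustrated with a small sketch. It is a simplification under stated assumptions: two sorted Python lists stand in for the two rb-trees, the class and method names (`FreeSpace`, `add`, `allocate`) are hypothetical, and the size fallback picks the best-fitting area without also weighing its distance from the hint.

```python
import bisect


class FreeSpace:
    """Free areas of one block group, indexed by offset and by size.

    Sorted lists stand in for the two rb-trees described in the commit.
    """

    def __init__(self):
        self.by_offset = []  # sorted (offset, size) pairs
        self.by_size = []    # sorted (size, offset) pairs

    def add(self, offset, size):
        bisect.insort(self.by_offset, (offset, size))
        bisect.insort(self.by_size, (size, offset))

    def _carve(self, offset, area_size, wanted):
        """Remove an area from both indexes and return any leftover tail."""
        self.by_offset.remove((offset, area_size))
        self.by_size.remove((area_size, offset))
        if area_size > wanted:
            self.add(offset + wanted, area_size - wanted)
        return offset

    def allocate(self, hint, size):
        # Step 1: first area at or after the hint that is large enough.
        start = bisect.bisect_left(self.by_offset, (hint, 0))
        for offset, area_size in self.by_offset[start:]:
            if area_size >= size:
                return self._carve(offset, area_size, size)
        # Step 2: fall back to the size index and take the smallest area
        # that fits, to avoid breaking small chunks off of huge areas.
        idx = bisect.bisect_left(self.by_size, (size, 0))
        if idx < len(self.by_size):
            area_size, offset = self.by_size[idx]
            return self._carve(offset, area_size, size)
        return None  # caller would try another block group or allocate a new one
```

With free areas at offsets 0 (100 bytes), 1000 (50 bytes) and 5000 (400 bytes), a 40-byte request hinted at 900 lands at 1000, while a 60-byte request hinted past the end falls back to the 100-byte area rather than splitting the 400-byte one.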
2008-09-25	Btrfs: Add a write ahead tree log to optimize synchronous operations	Chris Mason	1	-2/+1
	File syncs and directory syncs are optimized by copying their items into a special (copy-on-write) log tree. There is one log tree per subvolume and the btrfs super block points to a tree of log tree roots. After a crash, items are copied out of the log tree and back into the subvolume. See tree-log.c for all the details. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25	Btrfs: compile when posix acl's are disabled	Josef Bacik	1	-2/+1
	This patch makes btrfs so it will compile properly when acls are disabled. I tested this and it worked with CONFIG_FS_POSIX_ACL off and on. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25	Switch btrfs_name_hash() to crc32c	David Woodhouse	1	-1/+1
	Date: Tue, 19 Aug 2008 19:21:57 +0100 Using a 64-bit hash as the readdir cookie is just asking for trouble. And gets it, when we try to export the file system by NFS. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
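For reference, a bitwise CRC-32C (Castagnoli) in Python shows the kind of 32-bit value such a hash yields. This is the standard CRC-32C with the usual initial and final inversion; the seed and inversion conventions actually used by `btrfs_name_hash()` are not stated here and may differ, so treat the function below as an illustration only.

```python
def crc32c(data: bytes, crc: int = 0) -> int:
    """Standard CRC-32C over data, using the reflected polynomial 0x82F63B78."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right one bit; XOR in the polynomial if a 1 bit fell off.
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


# Well-known check value for CRC-32C.
assert crc32c(b"123456789") == 0xE3069283
```

The result always fits in 32 bits, which is what makes it usable as a readdir cookie where a 64-bit hash was not.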
2008-09-25	NFS support for btrfs - v3	Balaji Rao	1	-1/+1
	Date: Mon, 21 Jul 2008 02:01:56 +0530 Here's an implementation of NFS support for btrfs. It relies on the fixes which are going in to 2.6.28 for the NFS readdir/lookup deadlock. This uses the btrfs_iget helper introduced previously. [dwmw2: Tidy up a little, switch to d_obtain_alias() w/compat routine, change fh_type, store parent's root object ID where needed, fix some get_parent() and fs_to_dentry() bugs] Signed-off-by: Balaji Rao <balajirrao@gmail.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25	Btrfs: Add a leaf reference cache	Yan Zheng	1	-1/+2
	Much of the IO done while dropping snapshots is done looking up leaves in the filesystem trees to see if they point to any extents and to drop the references on any extents found. This creates a cache so that IO isn't required. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25	Btrfs: Create orphan inode records to prevent lost files after a crash	Josef Bacik	1	-1/+1
	Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25	Btrfs: Add version strings on module load	Chris Mason	1	-1/+5
	Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25	Btrfs: Start btree concurrency work.	Chris Mason	1	-1/+1
	The allocation trees and the chunk trees are serialized via their own dedicated mutexes. This means allocation location is still not very fine grained. The main FS btree is protected by locks on each block in the btree. Locks are taken top / down, and as processing finishes on a given level of the tree, the lock is released after locking the lower level. The end result of a search is now a path where only the lowest level is locked. Releasing or freeing the path drops any locks held. Signed-off-by: Chris Mason <chris.mason@oracle.com>
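The top-down locking described above (lock the lower level, then release the level above it) can be sketched as follows. This is a toy model with hypothetical names (`Node`, `Path`, `search`) and `threading.Lock` standing in for the per-block locks; it is not the kernel's data structure.

```python
import bisect
import threading


class Node:
    """A tree block: interior nodes hold separator keys, leaves hold item keys."""

    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []
        self.lock = threading.Lock()


class Path:
    """Blocks visited by a search, plus whichever of them are still locked."""

    def __init__(self):
        self.nodes = []
        self.locked = []

    def release(self):
        # Releasing the path drops any locks still held.
        for node in self.locked:
            node.lock.release()
        self.locked.clear()


def search(root, key):
    path = Path()
    node = root
    node.lock.acquire()
    path.nodes.append(node)
    path.locked.append(node)
    while node.children:
        child = node.children[bisect.bisect_right(node.keys, key)]
        child.lock.acquire()      # lock the lower level first...
        node.lock.release()       # ...then release the level above it
        path.locked.remove(node)
        path.locked.append(child)
        path.nodes.append(child)
        node = child
    return path                   # only the lowest level is still locked
```

After `search()` returns, the root is unlocked and only the leaf remains locked until the caller invokes `path.release()`.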
2008-09-25	Btrfs: split out ioctl.c	Christoph Hellwig	1	-1/+1
	Split the ioctl handling out of inode.c into a file of its own. Also fix up checkpatch.pl warnings for the moved code. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25	Btrfs: Add async worker threads for pre and post IO checksumming	Chris Mason	1	-1/+1
	Btrfs has been using workqueues to spread the checksumming load across other CPUs in the system. But, workqueues only schedule work on the same CPU that queued the work, giving them a limited benefit for systems with higher CPU counts. This code adds a generic facility to schedule work with pools of kthreads, and changes the bio submission code to queue bios up. The queueing is important to make sure large numbers of procs on the system don't turn streaming workloads into random workloads by sending IO down concurrently. The end result of all of this is much higher performance (and CPU usage) when doing checksumming on large machines. Two worker pools are created, one for writes and one for endio processing. The two could deadlock if we tried to service both from a single pool. Signed-off-by: Chris Mason <chris.mason@oracle.com>
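A user-space analogue of the two-pool arrangement, as a rough sketch: Python threads stand in for kthreads, `zlib.crc32` stands in for the real checksum, and `WorkerPool` and `run_demo` are hypothetical names. Write work is queued to one pool and its completion handling to another, so neither pool ever waits on its own queue.

```python
import queue
import threading
import zlib


class WorkerPool:
    """A fixed set of threads servicing one FIFO queue of (func, arg) items."""

    def __init__(self, name, num_threads):
        self.q = queue.Queue()
        self.threads = [
            threading.Thread(target=self._run, name=f"{name}-{i}", daemon=True)
            for i in range(num_threads)
        ]
        for t in self.threads:
            t.start()

    def _run(self):
        while True:
            item = self.q.get()
            try:
                if item is None:      # sentinel: stop this worker
                    return
                func, arg = item
                func(arg)
            finally:
                self.q.task_done()

    def submit(self, func, arg):
        self.q.put((func, arg))

    def shutdown(self):
        self.q.join()                 # wait for all queued work to finish
        for _ in self.threads:
            self.q.put(None)
        for t in self.threads:
            t.join()


def run_demo(blocks):
    """Checksum blocks on a write pool; record completions on an endio pool."""
    write_pool = WorkerPool("write", 2)
    endio_pool = WorkerPool("endio", 2)
    done = {}
    done_lock = threading.Lock()

    def end_io(item):
        index, csum = item
        with done_lock:
            done[index] = csum

    def checksum_and_submit(item):
        index, data = item
        endio_pool.submit(end_io, (index, zlib.crc32(data)))

    for index, data in enumerate(blocks):
        write_pool.submit(checksum_and_submit, (index, data))
    write_pool.shutdown()   # every write has queued its completion by now
    endio_pool.shutdown()
    return done
```

Calling `run_demo([b"abc", b"def"])` returns a dict mapping each block index to its checksum, computed on the write pool and recorded by the endio pool.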
2008-09-25	btrfs: tiny makefile cleanup	Christoph Hellwig	1	-7/+1
	Use normal kbuild syntax to build acl.o conditionally and remove commented-out lines. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25	Btrfs: Add support for multiple devices per filesystem	Chris Mason	1	-1/+1
	Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25	Btrfs: Split the extent_map code into two parts	Chris Mason	1	-1/+2
	There is now extent_map for mapping offsets in the file to disk and extent_io for state tracking, IO submission and extent_buffers. The new extent_map code shifts from [start,end] pairs to [start,len], and pushes the locking out into the caller. This allows a few performance optimizations and is easier to use. A number of extent_map usage bugs were fixed, mostly with failing to remove extent_map entries when changing the file. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25	Btrfs: Fix compile on kernel without ACLs enabled	Yan	1	-1/+4
	Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25	Btrfs: Add data=ordered support	Chris Mason	1	-1/+1
	This forces file data extents down the disk along with the metadata that references them. The current implementation is fairly simple, and just writes out all of the dirty pages in an inode before the commit. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25	xattr support for btrfs	Josef Bacik	1	-1/+1
	Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25	Breakout BTRFS_SETGET_FUNCS into a separate C file, the inlines were too big.	Chris Mason	1	-1/+1
	Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25	Btrfs: Create extent_buffer interface for large blocksizes	Chris Mason	1	-0/+2
	Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-09-14	Btrfs: Simplify makefile	Jan Engelhardt	1	-3/+4
	Single-colons will do here. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-09-14	Btrfs: add modules_install target	Chris Mason	1	-0/+2
	Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-08-29	Btrfs: Add per-root block accounting and sysfs entries	Josef Bacik	1	-1/+1
	Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-08-28	Btrfs: Extent based page cache code	Chris Mason	1	-1/+2
	This uses an rbtree of extents and tests instead of buffer heads. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-08-08	Btrfs: Add run time btree defrag, and an ioctl to force btree defrag	Chris Mason	1	-1/+1
	This adds two types of btree defrag, a run time form that tries to defrag recently allocated blocks in the btree when they are still in ram, and an ioctl that forces defrag of all btree blocks. File data blocks are not defragged yet, but this can make a huge difference in sequential btree reads. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-07-25	Btrfs: cleaner make clean	Joel Becker	1	-1/+1
	Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-06-12	Btrfs: split up super.c	Chris Mason	1	-2/+2
	Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-03-26	Btrfs: add a radix back bit tree	Chris Mason	1	-1/+2
	Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-03-22	Btrfs: transaction rework	Chris Mason	1	-1/+1
	Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-03-22	Mountable btrfs, with readdir	Chris Mason	1	-2/+3
	Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-03-21	Btrfs: initial move to kernel module land	Chris Mason	1	-35/+15
	Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-03-21	Btrfs: Better block record keeping, real mkfs	Chris Mason	1	-2/+5
	Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-03-20	Btrfs: Add inode map, and the start of file extent items	Chris Mason	1	-1/+2
	Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-03-20	Btrfs: add transaction.h to the Makefile	Chris Mason	1	-1/+2
	Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-03-16	Btrfs: transaction handles everywhere	Chris Mason	1	-2/+2
	Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-03-16	Btrfs: add inode item	Chris Mason	1	-1/+1
	Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-03-15	Btrfs: directory testing code and dir item fixes	Chris Mason	1	-2/+3
	Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-03-15	Btrfs: Use a chunk of the key flags to record the item type.	Chris Mason	1	-1/+1
	Add (untested and simple) directory item code. Fix comp_keys to use the new key ordering. Add btrfs_insert_empty_item. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-03-13	Btrfs: Change the super to point to a tree of trees to enable persistent snapshots	Chris Mason	1	-1/+2
	Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-03-12	Btrfs: get/set for struct header fields	Chris Mason	1	-1/+1
	Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-03-08	Btrfs: Fixup last found extent caching	Chris Mason	1	-1/+1
	Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-03-07	Btrfs: get rid of add recursion	Chris Mason	1	-1/+1
	Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-03-07	Btrfs: Fixup reference counting on cows	Chris Mason	1	-1/+1
	Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-03-01	Btrfs: Fixup the code to merge during path walks	Chris Mason	1	-1/+4
	Add a bulk insert/remove test to random-test. Add the quick-test code back as another regression test. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-03-01	Btrfs: return code checking	Chris Mason	1	-1/+4
	Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-02-28	Btrfs: Add sparse checking to Makefile	Chris Mason	1	-2/+7
	Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-02-26	Btrfs: u64 cleanups	Chris Mason	1	-1/+1
	Signed-off-by: Chris Mason <chris.mason@oracle.com>