summaryrefslogtreecommitdiff
path: root/fs/ext4/super.c
AgeCommit message (Collapse)AuthorFilesLines
2011-03-25Merge branch 'for_linus' of ↵Linus Torvalds1-21/+27
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (43 commits) ext4: fix a BUG in mb_mark_used during trim. ext4: unused variables cleanup in fs/ext4/extents.c ext4: remove redundant set_buffer_mapped() in ext4_da_get_block_prep() ext4: add more tracepoints and use dev_t in the trace buffer ext4: don't kfree uninitialized s_group_info members ext4: add missing space in printk's in __ext4_grp_locked_error() ext4: add FITRIM to compat_ioctl. ext4: handle errors in ext4_clear_blocks() ext4: unify the ext4_handle_release_buffer() api ext4: handle errors in ext4_rename jbd2: add COW fields to struct jbd2_journal_handle jbd2: add the b_cow_tid field to journal_head struct ext4: Initialize fsync transaction ids in ext4_new_inode() ext4: Use single thread to perform DIO unwritten convertion ext4: optimize ext4_bio_write_page() when no extent conversion is needed ext4: skip orphan cleanup if fs has unknown ROCOMPAT features ext4: use the nblocks arg to ext4_truncate_restart_trans() ext4: fix missing iput of root inode for some mount error paths ext4: make FIEMAP and delayed allocation play well together ext4: suppress verbose debugging information if malloc-debug is off ... Fi up conflicts in fs/ext4/super.c due to workqueue changes
2011-03-22ext4: add missing space in printk's in __ext4_grp_locked_error()Robin Dong1-1/+1
When we do performence-testing on ext4 filesystem, we observed a warning like this: EXT4-fs error (device sda7): ext4_mb_generate_buddy:718: group 259825901 blocks in bitmap, 26057 in gd instead, it should be "group 2598, 25901 blocks in bitmap, 26057 in gd" Reviewed-by: Coly Li <bosong.ly@taobao.com> Cc: Tao Ma <boyu.mt@taobao.com> Signed-off-by: Robin Dong <sanbai@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-03-16Merge branch 'for-2.6.39' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds1-1/+6
* 'for-2.6.39' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: fix build failure introduced by s/freezeable/freezable/ workqueue: add system_freezeable_wq rds/ib: use system_wq instead of rds_ib_fmr_wq net/9p: replace p9_poll_task with a work net/9p: use system_wq instead of p9_mux_wq xfs: convert to alloc_workqueue() reiserfs: make commit_wq use the default concurrency level ocfs2: use system_wq instead of ocfs2_quota_wq ext4: convert to alloc_workqueue() scsi/scsi_tgt_lib: scsi_tgtd isn't used in memory reclaim path scsi/be2iscsi,qla2xxx: convert to alloc_workqueue() misc/iwmc3200top: use system_wq instead of dedicated workqueues i2o: use alloc_workqueue() instead of create_workqueue() acpi: kacpi*_wq don't need WQ_MEM_RECLAIM fs/aio: aio_wq isn't used in memory reclaim path input/tps6507x-ts: use system_wq instead of dedicated workqueue cpufreq: use system_wq instead of dedicated workqueues wireless/ipw2x00: use system_wq instead of dedicated workqueues arm/omap: use system_wq in mailbox workqueue: use WQ_MEM_RECLAIM instead of WQ_RESCUER
2011-03-15ext4: Copy fs UUID to superblockAneesh Kumar K.V1-0/+2
File system UUID is made available to application via /proc/<pid>/mountinfo Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-05ext4: Use single thread to perform DIO unwritten convertionMingming Cao1-1/+1
While running ext4 testing on multiple core, we found there are per cpu ext4-dio-unwritten threads processing conversion from unwritten extents to written for IOs completed from async direct IO patch. Per filesystem is enough, we don't need per cpu threads to work on conversion. Signed-off-by: Mingming Cao <cmm@us.ibm.com>
2011-02-28ext4: skip orphan cleanup if fs has unknown ROCOMPAT featuresAmir Goldstein1-0/+8
Orphan cleanup is currently executed even if the file system has some number of unknown ROCOMPAT features, which deletes inodes and frees blocks, which could be very bad for some RO_COMPAT features, especially the SNAPSHOT feature. This patch skips the orphan cleanup if it contains readonly compatible features not known by this ext4 implementation, which would prevent the fs from being mounted (or remounted) readwrite. Signed-off-by: Amir Goldstein <amir73il@users.sf.net> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-02-28ext4: fix missing iput of root inode for some mount error pathsManish Katiyar1-2/+3
This assures that the root inode is not leaked, and that sb->s_root is NULL, which will prevent generic_shutdown_super() from doing extra work, including call sync_filesystem, which ultimately results in ext4_sync_fs() getting called with an uninitialized struct super, which is the cause of the crash noted in Kernel Bugzilla #26752. https://bugzilla.kernel.org/show_bug.cgi?id=26752 Signed-off-by: Manish Katiyar <mkatiyar@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-02-26ext4: enable mblk_io_submit by defaultTheodore Ts'o1-2/+3
Now that we've fixed the file corruption bug in commit d50bdd5aa55, it's time to enable mblk_io_submit by default. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-02-24ext4: enable acls and user_xattr by defaultEric Sandeen1-9/+5
There's no good reason to require the extra step of providing a mount option for acl or user_xattr once the feature is configured on; no other filesystem that I know of requires this. Userspace patches have set these options in default mount options, and this patch makes them default in the kernel. At some point we can start to deprecate the options, perhaps. For now I've removed default mount option checks in show_options() to be explicit about what's set, since it's changing the default, but I'm open to alternatives if desired. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-02-23ext4: mark file-local functions and variables as staticLukas Czerner1-3/+3
Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-02-22ext4: allow inode_readahead_blks=0 (linux-2.6.37)Alexander V. Lukyanov1-2/+2
I cannot disable inode-read-ahead feature of ext4 (on 2.6.37): # echo 0 > /sys/fs/ext4/sda2/inode_readahead_blks bash: echo: write error: Invalid argument On a server with lots of small files and random access this read-ahead makes performance worse, and I'd like to disable it. I work around this problem by using value of 1, but it still reads an extra block. This patch fixes the problem by checking for zero explicitly. Signed-off-by: Alexander V. Lukyanov <lav@netis.ru> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-02-22ext4: Fix sparse warning: Using plain integer as NULL pointerPeter Huewe1-1/+1
This patch fixes the warning "Using plain integer as NULL pointer", generated by sparse, by replacing the offending 0s with NULL. Signed-off-by: Peter Huewe <peterhuewe@gmx.de> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-02-21Merge branch 'master' into for-2.6.39Tejun Heo1-20/+46
2011-02-12ext4: serialize unaligned asynchronous DIOEric Sandeen1-1/+12
ext4 has a data corruption case when doing non-block-aligned asynchronous direct IO into a sparse file, as demonstrated by xfstest 240. The root cause is that while ext4 preallocates space in the hole, mappings of that space still look "new" and dio_zero_block() will zero out the unwritten portions. When more than one AIO thread is going, they both find this "new" block and race to zero out their portion; this is uncoordinated and causes data corruption. Dave Chinner fixed this for xfs by simply serializing all unaligned asynchronous direct IO. I've done the same here. The difference is that we only wait on conversions, not all IO. This is a very big hammer, and I'm not very pleased with stuffing this into ext4_file_write(). But since ext4 is DIO_LOCKING, we need to serialize it at this high level. I tried to move this into ext4_ext_direct_IO, but by then we have the i_mutex already, and we will wait on the work queue to do conversions - which must also take the i_mutex. So that won't work. This was originally exposed by qemu-kvm installing to a raw disk image with a normal sector-63 alignment. I've tested a backport of this patch with qemu, and it does avoid the corruption. It is also quite a lot slower (14 min for package installs, vs. 8 min for well-aligned) but I'll take slow correctness over fast corruption any day. Mingming suggested that we can track outstanding conversions, and wait on those so that non-sparse files won't be affected, and I've implemented that here; unaligned AIO to nonsparse files won't take a perf hit. [tytso@mit.edu: Keep the mutex as a hashed array instead of bloating the ext4 inode] [tytso@mit.edu: Fix up namespace issues so that global variables are protected with an "ext4_" prefix.] Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-02-03ext4: fix up ext4 error handlingTheodore Ts'o1-4/+10
Make sure we the correct cleanup happens if we die while trying to load the ext4 file system. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-02-03ext4: unregister features interface on module unloadLukas Czerner1-2/+10
Ext4 features interface was not properly unregistered which led to problems while unloading/reloading ext4 module. This commit fixes that by adding proper kobject unregistration code into ext4_exit_fs() as well as fail-path of ext4_init_fs() Reported-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
2011-02-03ext4: fix panic on module unload when stopping lazyinit threadEric Sandeen1-13/+14
https://bugzilla.kernel.org/show_bug.cgi?id=27652 If the lazyinit thread is running, the teardown function ext4_destroy_lazyinit_thread() has problems: ext4_clear_request_list(); while (ext4_li_info->li_task) { wake_up(&ext4_li_info->li_wait_daemon); wait_event(ext4_li_info->li_wait_task, ext4_li_info->li_task == NULL); } Clearing the request list will cause the thread to exit and free ext4_li_info, so then we're waiting on something which is getting freed. Fix this up by making the thread respond to kthread_stop, and exit, without the need to wait for that exit in some other homegrown way. Cc: stable@kernel.org Reported-and-Tested-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-02-01ext4: convert to alloc_workqueue()Tejun Heo1-1/+6
Convert create_workqueue() to alloc_workqueue(). This is an identity conversion. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: linux-ext4@vger.kernel.org
2011-01-21Merge branch 'for_linus' of ↵Linus Torvalds1-18/+7
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: quota: Fix deadlock during path resolution
2011-01-13Merge branch 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds1-10/+2
* 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block: (43 commits) block: ensure that completion error gets properly traced blktrace: add missing probe argument to block_bio_complete block cfq: don't use atomic_t for cfq_group block cfq: don't use atomic_t for cfq_queue block: trace event block fix unassigned field block: add internal hd part table references block: fix accounting bug on cross partition merges kref: add kref_test_and_get bio-integrity: mark kintegrityd_wq highpri and CPU intensive block: make kblockd_workqueue smarter Revert "sd: implement sd_check_events()" block: Clean up exit_io_context() source code. Fix compile warnings due to missing removal of a 'ret' variable fs/block: type signature of major_to_index(int) to major_to_index(unsigned) block: convert !IS_ERR(p) && p to !IS_ERR_NOR_NULL(p) cfq-iosched: don't check cfqg in choose_service_tree() fs/splice: Pull buf->ops->confirm() from splice_from_pipe actors cdrom: export cdrom_check_events() sd: implement sd_check_events() sr: implement sr_check_events() ...
2011-01-12quota: Fix deadlock during path resolutionJan Kara1-18/+7
As Al Viro pointed out path resolution during Q_QUOTAON calls to quotactl is prone to deadlocks. We hold s_umount semaphore for reading during the path resolution and resolution itself may need to acquire the semaphore for writing when e. g. autofs mountpoint is passed. Solve the problem by performing the resolution before we get hold of the superblock (and thus s_umount semaphore). The whole thing is complicated by the fact that some filesystems (OCFS2) ignore the path argument. So to distinguish between filesystem which want the path and which do not we introduce new .quota_on_meta callback which does not get the path. OCFS2 then uses this callback instead of old .quota_on. CC: Al Viro <viro@ZenIV.linux.org.uk> CC: Christoph Hellwig <hch@lst.de> CC: Ted Ts'o <tytso@mit.edu> CC: Joel Becker <joel.becker@oracle.com> Signed-off-by: Jan Kara <jack@suse.cz>
2011-01-12Merge branch 'for_linus' of ↵Linus Torvalds1-132/+156
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (44 commits) ext4: fix trimming starting with block 0 with small blocksize ext4: revert buggy trim overflow patch ext4: don't pass entire map to check_eofblocks_fl ext4: fix memory leak in ext4_free_branches ext4: remove ext4_mb_return_to_preallocation() ext4: flush the i_completed_io_list during ext4_truncate ext4: add error checking to calls to ext4_handle_dirty_metadata() ext4: fix trimming of a single group ext4: fix uninitialized variable in ext4_register_li_request ext4: dynamically allocate the jbd2_inode in ext4_inode_info as necessary ext4: drop i_state_flags on architectures with 64-bit longs ext4: reorder ext4_inode_info structure elements to remove unneeded padding ext4: drop ec_type from the ext4_ext_cache structure ext4: use ext4_lblk_t instead of sector_t for logical blocks ext4: replace i_delalloc_reserved_flag with EXT4_STATE_DELALLOC_RESERVED ext4: fix 32bit overflow in ext4_ext_find_goal() ext4: add more error checks to ext4_mkdir() ext4: ext4_ext_migrate should use NULL not 0 ext4: Use ext4_error_file() to print the pathname to the corrupted inode ext4: use IS_ERR() to check for errors in ext4_error_file ...
2011-01-10ext4: fix uninitialized variable in ext4_register_li_requestAndrew Morton1-1/+1
fs/ext4/super.c: In function 'ext4_register_li_request': fs/ext4/super.c:2936: warning: 'ret' may be used uninitialized in this function It looks buggy to me, too. Cc: Lukas Czerner <lczerner@redhat.com> Cc: stable@kernel.org Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-01-10ext4: dynamically allocate the jbd2_inode in ext4_inode_info as necessaryTheodore Ts'o1-9/+7
Replace the jbd2_inode structure (which is 48 bytes) with a pointer and only allocate the jbd2_inode when it is needed --- that is, when the file system has a journal present and the inode has been opened for writing. This allows us to further slim down the ext4_inode_info structure. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-01-10ext4: replace i_delalloc_reserved_flag with EXT4_STATE_DELALLOC_RESERVEDTheodore Ts'o1-1/+0
Remove the short element i_delalloc_reserved_flag from the ext4_inode_info structure and replace it a new bit in i_state_flags. Since we have an ext4_inode_info for every ext4 inode cached in the inode cache, any savings we can produce here is a very good thing from a memory utilization perspective. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-01-10ext4: Use ext4_error_file() to print the pathname to the corrupted inodeTheodore Ts'o1-12/+16
Where the file pointer is available, use ext4_error_file() instead of ext4_error_inode(). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-01-10ext4: use IS_ERR() to check for errors in ext4_error_fileDan Carpenter1-1/+1
d_path() returns an ERR_PTR and it doesn't return NULL. This is in ext4_error_file() and no one actually calls ext4_error_file(). Signed-off-by: Dan Carpenter <error27@gmail.com>
2011-01-07fs: icache RCU free inodesNick Piggin1-1/+8
RCU free the struct inode. This will allow: - Subsequent store-free path walking patch. The inode must be consulted for permissions when walking, so an RCU inode reference is a must. - sb_inode_list_lock to be moved inside i_lock because sb list walkers who want to take i_lock no longer need to take sb_inode_list_lock to walk the list in the first place. This will simplify and optimize locking. - Could remove some nested trylock loops in dcache code - Could potentially simplify things a bit in VM land. Do not need to take the page lock to follow page->mapping. The downsides of this is the performance cost of using RCU. In a simple creat/unlink microbenchmark, performance drops by about 10% due to inability to reuse cache-hot slab objects. As iterations increase and RCU freeing starts kicking over, this increases to about 20%. In cases where inode lifetimes are longer (ie. many inodes may be allocated during the average life span of a single inode), a lot of this cache reuse is not applicable, so the regression caused by this patch is smaller. The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU, however this adds some complexity to list walking and store-free path walking, so I prefer to implement this at a later date, if it is shown to be a win in real situations. I haven't found a regression in any non-micro benchmark so I doubt it will be a problem. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2010-12-20ext4: Use printf extension %pVJoe Perches1-17/+23
Using %pV reduces the number of printk calls and eliminates any possible message interleaving from other printk calls. In function __ext4_grp_locked_error also added KERN_CONT to some printks. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-12-20ext4: Use vzalloc in ext4_fill_flex_info()Joe Perches1-8/+7
Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-12-16ext4: Add second mount options field since the s_mount_opt is full upTheodore Ts'o1-2/+5
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-12-16ext4: Move struct ext4_mount_options from ext4.h to super.cTheodore Ts'o1-0/+15
Move the ext4_mount_options structure definition from ext4.h, since it is only used in super.c. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-12-16ext4: Simplify the usage of clear_opt() and set_opt() macrosTheodore Ts'o1-81/+81
Change clear_opt() and set_opt() to take a superblock pointer instead of a pointer to EXT4_SB(sb)->s_mount_opt. This makes it easier for us to support a second mount option field. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-12-14ext4: Turn off multiple page-io submission by defaultTheodore Ts'o1-2/+12
Jon Nelson has found a test case which causes postgresql to fail with the error: psql:t.sql:4: ERROR: invalid page header in block 38269 of relation base/16384/16581 Under memory pressure, it looks like part of a file can end up getting replaced by zero's. Until we can figure out the cause, we'll roll back the change and use block_write_full_page() instead of ext4_bio_write_page(). The new, more efficient writing function can be used via the mount option mblk_io_submit, so we can test and fix the new page I/O code. To reproduce the problem, install postgres 8.4 or 9.0, and pin enough memory such that the system just at the end of triggering writeback before running the following sql script: begin; create temporary table foo as select x as a, ARRAY[x] as b FROM generate_series(1, 10000000 ) AS x; create index foo_a_idx on foo (a); create index foo_b_idx on foo USING GIN (b); rollback; If the temporary table is created on a hard drive partition which is encrypted using dm_crypt, then under memory pressure, approximately 30-40% of the time, pgsql will issue the above failure. This patch should fix this problem, and the problem will come back if the file system is mounted with the mblk_io_submit mount option. Reported-by: Jon Nelson <jnelson@jamponi.net> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-11-20fs: Do not dispatch FITRIM through separate super_operationLukas Czerner1-1/+0
There was concern that FITRIM ioctl is not common enough to be included in core vfs ioctl, as Christoph Hellwig pointed out there's no real point in dispatching this out to a separate vector instead of just through ->ioctl. So this commit removes ioctl_fstrim() from vfs ioctl and trim_fs from super_operation structure. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-11-19ext4: ext4_fill_super shouldn't return 0 on corruptionDarrick J. Wong1-2/+3
At the start of ext4_fill_super, ret is set to -EINVAL, and any failure path out of that function returns ret. However, the generic_check_addressable clause sets ret = 0 (if it passes), which means that a subsequent failure (e.g. a group checksum error) returns 0 even though the mount should fail. This causes vfs_kern_mount in turn to think that the mount succeeded, leading to an oops. A simple fix is to avoid using ret for the generic_check_addressable check, which was last changed in commit 30ca22c70e3ef0a96ff84de69cd7e8561b416cb2. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-11-18ext4: missing unlock in ext4_clear_request_list()Dan Carpenter1-3/+0
If the the li_request_list was empty then it returned with the lock held. Instead of adding a "goto unlock" I just removed that special case and let it go past the empty list_for_each_safe(). Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-11-13block: clean up blkdev_get() wrappers and their usersTejun Heo1-1/+1
After recent blkdev_get() modifications, open_by_devnum() and open_bdev_exclusive() are simple wrappers around blkdev_get(). Replace them with blkdev_get_by_dev() and blkdev_get_by_path(). blkdev_get_by_dev() is identical to open_by_devnum(). blkdev_get_by_path() is slightly different in that it doesn't automatically add %FMODE_EXCL to @mode. All users are converted. Most conversions are mechanical and don't introduce any behavior difference. There are several exceptions. * btrfs now sets FMODE_EXCL in btrfs_device->mode, so there's no reason to OR it explicitly on blkdev_put(). * gfs2, nilfs2 and the generic mount_bdev() now set FMODE_EXCL in sb->s_mode. * With the above changes, sb->s_mode now always should contain FMODE_EXCL. WARN_ON_ONCE() added to kill_block_super() to detect errors. The new blkdev_get_*() functions are with proper docbook comments. While at it, add function description to blkdev_get() too. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: Neil Brown <neilb@suse.de> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Joern Engel <joern@lazybastard.org> Cc: Chris Mason <chris.mason@oracle.com> Cc: Jan Kara <jack@suse.cz> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp> Cc: reiserfs-devel@vger.kernel.org Cc: xfs-masters@oss.sgi.com Cc: Alexander Viro <viro@zeniv.linux.org.uk>
2010-11-13block: make blkdev_get/put() handle exclusive accessTejun Heo1-10/+2
Over time, block layer has accumulated a set of APIs dealing with bdev open, close, claim and release. * blkdev_get/put() are the primary open and close functions. * bd_claim/release() deal with exclusive open. * open/close_bdev_exclusive() are combination of open and claim and the other way around, respectively. * bd_link/unlink_disk_holder() to create and remove holder/slave symlinks. * open_by_devnum() wraps bdget() + blkdev_get(). The interface is a bit confusing and the decoupling of open and claim makes it impossible to properly guarantee exclusive access as in-kernel open + claim sequence can disturb the existing exclusive open even before the block layer knows the current open if for another exclusive access. Reorganize the interface such that, * blkdev_get() is extended to include exclusive access management. @holder argument is added and, if is @FMODE_EXCL specified, it will gain exclusive access atomically w.r.t. other exclusive accesses. * blkdev_put() is similarly extended. It now takes @mode argument and if @FMODE_EXCL is set, it releases an exclusive access. Also, when the last exclusive claim is released, the holder/slave symlinks are removed automatically. * bd_claim/release() and close_bdev_exclusive() are no longer necessary and either made static or removed. * bd_link_disk_holder() remains the same but bd_unlink_disk_holder() is no longer necessary and removed. * open_bdev_exclusive() becomes a simple wrapper around lookup_bdev() and blkdev_get(). It also has an unexpected extra bdev_read_only() test which probably should be moved into blkdev_get(). * open_by_devnum() is modified to take @holder argument and pass it to blkdev_get(). Most of bdev open/close operations are unified into blkdev_get/put() and most exclusive accesses are tested atomically at the open time (as it should). This cleans up code and removes some, both valid and invalid, but unnecessary all the same, corner cases. open_bdev_exclusive() and open_by_devnum() can use further cleanup - rename to blkdev_get_by_path() and blkdev_get_by_devt() and drop special features. Well, let's leave them for another day. Most conversions are straight-forward. drbd conversion is a bit more involved as there was some reordering, but the logic should stay the same. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Neil Brown <neilb@suse.de> Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Acked-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Philipp Reisner <philipp.reisner@linbit.com> Cc: Peter Osterlund <petero2@telia.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Jan Kara <jack@suse.cz> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <joel.becker@oracle.com> Cc: Alex Elder <aelder@sgi.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: dm-devel@redhat.com Cc: drbd-dev@lists.linbit.com Cc: Leo Chen <leochen@broadcom.com> Cc: Scott Branden <sbranden@broadcom.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Cc: Joern Engel <joern@logfs.org> Cc: reiserfs-devel@vger.kernel.org Cc: Alexander Viro <viro@zeniv.linux.org.uk>
2010-11-08ext4: Add new ext4 inode tracepointsTheodore Ts'o1-0/+10
Add ext4_evict_inode, ext4_drop_inode, ext4_mark_inode_dirty, and ext4_begin_ordered_truncate() Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-11-08ext4: do not try to grab the s_umount semaphore in ext4_quota_offDmitry Monakhov1-5/+3
It's not needed to sync the filesystem, and it fixes a lock_dep complaint. Signed-off-by: Dmitry Monakhov <dmonakhov@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2010-11-08ext4: handle writeback of inodes which are being freedTheodore Ts'o1-0/+2
The following BUG can occur when an inode which is getting freed when it still has dirty pages outstanding, and it gets deleted (in this because it was the target of a rename). In ordered mode, we need to make sure the data pages are written just in case we crash before the rename (or unlink) is committed. If the inode is being freed then when we try to igrab the inode, we end up tripping the BUG_ON at fs/ext4/page-io.c:146. To solve this problem, we need to keep track of the number of io callbacks which are pending, and avoid destroying the inode until they have all been completed. That way we don't have to bump the inode count to keep the inode from being destroyed; an approach which doesn't work because the count could have already been dropped down to zero before the inode writeback has started (at which point we're not allowed to bump the count back up to 1, since it's already started getting freed). Thanks to Dave Chinner for suggesting this approach, which is also used by XFS. kernel BUG at /scratch_space/linux-2.6/fs/ext4/page-io.c:146! Call Trace: [<ffffffff811075b1>] ext4_bio_write_page+0x172/0x307 [<ffffffff811033a7>] mpage_da_submit_io+0x2f9/0x37b [<ffffffff811068d7>] mpage_da_map_and_submit+0x2cc/0x2e2 [<ffffffff811069b3>] mpage_add_bh_to_extent+0xc6/0xd5 [<ffffffff81106c66>] write_cache_pages_da+0x2a4/0x3ac [<ffffffff81107044>] ext4_da_writepages+0x2d6/0x44d [<ffffffff81087910>] do_writepages+0x1c/0x25 [<ffffffff810810a4>] __filemap_fdatawrite_range+0x4b/0x4d [<ffffffff810815f5>] filemap_fdatawrite_range+0xe/0x10 [<ffffffff81122a2e>] jbd2_journal_begin_ordered_truncate+0x7b/0xa2 [<ffffffff8110615d>] ext4_evict_inode+0x57/0x24c [<ffffffff810c14a3>] evict+0x22/0x92 [<ffffffff810c1a3d>] iput+0x212/0x249 [<ffffffff810bdf16>] dentry_iput+0xa1/0xb9 [<ffffffff810bdf6b>] d_kill+0x3d/0x5d [<ffffffff810be613>] dput+0x13a/0x147 [<ffffffff810b990d>] sys_renameat+0x1b5/0x258 [<ffffffff81145f71>] ? _atomic_dec_and_lock+0x2d/0x4c [<ffffffff810b2950>] ? cp_new_stat+0xde/0xea [<ffffffff810b29c1>] ? sys_newlstat+0x2d/0x38 [<ffffffff810b99c6>] sys_rename+0x16/0x18 [<ffffffff81002a2b>] system_call_fastpath+0x16/0x1b Reported-by: Nick Bowler <nbowler@elliptictech.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Tested-by: Nick Bowler <nbowler@elliptictech.com>
2010-11-03ext4: initialize the percpu counters before replaying the journalTheodore Ts'o1-26/+39
We now initialize the percpu counters before replaying the journal, but after the journal, we recalculate the global counters, to deal with the possibility of the per-blockgroup counts getting updated by the journal replay. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-11-02ext4: "ret" may be used uninitialized in ext4_lazyinit_thread()Theodore Ts'o1-8/+6
Newer GCC's reported the following build warning: fs/ext4/super.c: In function 'ext4_lazyinit_thread': fs/ext4/super.c:2702: warning: 'ret' may be used uninitialized in this function Fix it by removing the need for the ret variable in the first place. Signed-off-by: "Lukas Czerner" <lczerner@redhat.com> Reported-by: "Stefan Richter" <stefanr@s5r6.in-berlin.de> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-11-02ext4: fix lazyinit hang after removing requestLukas Czerner1-1/+2
When the request has been removed from the list and no other request has been issued, we will end up with next wakeup scheduled to MAX_JIFFY_OFFSET which is bad. So check for that. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-10-29new helper: mount_bdev()Al Viro1-8/+8
... and switch of the obvious get_sb_bdev() users to ->mount() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-28Merge branch 'next' into upstream-mergeTheodore Ts'o1-21/+510
Conflicts: fs/ext4/inode.c fs/ext4/mballoc.c include/trace/events/ext4.h
2010-10-28ext4: fix unbalanced mutex unlock in error path of ext4_li_request_newNicolas Kaiser1-14/+8
Signed-off-by: Nicolas Kaiser <nikai@nikai.net> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-10-28ext4: make various ext4 functions be staticTheodore Ts'o1-1/+1
These functions have no need to be exported beyond file context. No functions needed to be moved for this commit; just some function declarations changed to be static and removed from header files. (A similar patch was submitted by Eric Sandeen, but I wanted to handle code movement in separate patches to make sure code changes didn't accidentally get dropped.) Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-10-28ext4: rename {exit,init}_ext4_*() to ext4_{exit,init}_*()Theodore Ts'o1-16/+16
This is a cleanup to avoid namespace leaks out of fs/ext4 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>