summaryrefslogtreecommitdiff
path: root/fs/ext4
AgeCommit message (Collapse)AuthorFilesLines
2012-12-10ext4: make ext4_init_dot_dotdot for inline dir usageTao Ma2-44/+75
Currently, the initialization of dot and dotdot are encapsulated in ext4_mkdir and also bond with dir_block. So create a new function named ext4_init_new_dir and the initialization is moved to ext4_init_dot_dotdot. Now it will called either in the normal non-inline case(rec_len of ".." will cover the whole block) or when we converting an inline dir to a block(rec len of ".." will be the real length). The start of the next entry is also returned for inline dir usage. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-12-10ext4: add delalloc support for inline dataTao Ma4-9/+262
For delayed allocation mode, we write to inline data if the file is small enough. And in case of we write to some offset larger than the inline size, the 1st page is dirtied, so that ext4_da_writepages can handle the conversion. When the 1st page is initialized with blocks, the inline part is removed. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-12-10ext4: add journalled write support for inline dataTao Ma3-20/+85
Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-12-10ext4: add normal write support for inline dataTao Ma5-42/+340
For a normal write case (not journalled write, not delayed allocation), we write to the inline if the file is small and convert it to an extent based file when the write is larger than the max inline size. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-12-10ext4: add read support for inline dataTao Ma3-1/+98
Let readpage and readpages handle the case when we want to read an inlined file. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-12-10ext4: add the basic function for inline data supportTao Ma5-3/+534
Implement inline data with xattr. Now we use "system.data" to store xattr, and the xattr will be extended if the i_size is increased while we don't release the space during truncate. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-12-05ext4: export inline xattr functionsTao Ma2-33/+64
The inline data feature will need some inline xattr functions, so export them from fs/ext4/xattr.c so that inline.c can use them. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-12-02ext4: move extra inode read to a new functionTao Ma1-5/+11
Currently, in ext4_iget we do a simple check to see whether there does exist some information starting from the end of i_extra_size. With inline data added, this procedure is more complicated. So move it to a new function named ext4_iget_extra_inode. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-11-30ext4: fix possible use after free with metadata csumTheodore Ts'o1-1/+1
Commit fa77dcfafeaa introduces block bitmap checksum calculation into ext4_new_inode() in the case that block group was uninitialized. However we brelse() the bitmap buffer before we attempt to checksum it so we have no guarantee that the buffer is still there. Fix this by releasing the buffer after the possible checksum computation. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Acked-by: Darrick J. Wong <darrick.wong@oracle.com> Cc: stable@vger.kernel.org
2012-11-30ext4: restructure ext4_ext_direct_IO()Theodore Ts'o1-108/+103
Remove a level of indentation by moving the DIO read and extending write case to the beginning of the file. This results in no actual programmatic changes to the file, but makes it easier to read/understand. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-11-28ext4: rationalize ext4_extents.h inclusionTheodore Ts'o8-30/+37
Previously, ext4_extents.h was being included at the end of ext4.h, which was bad for a number of reasons: (a) it was not being included in the expected place, and (b) it caused the header to be included multiple times. There were #ifdef's to prevent this from causing any problems, but it still was unnecessary. By moving the function declarations that were in ext4_extents.h to ext4.h, which is standard practice for where the function declarations for the rest of ext4.h can be found, we can remove ext4_extents.h from being included in ext4.h at all, and then we can only include ext4_extents.h where it is needed in ext4's source files. It should be possible to move a few more things into ext4.h, and further reduce the number of source files that need to #include ext4_extents.h, but that's a cleanup for another day. Reported-by: Sachin Kamat <sachin.kamat@linaro.org> Reported-by: Wei Yongjun <weiyj.lk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-11-28ext4: fixed potential NULL dereference in ext4_calculate_overhead()Vahram Martirosyan1-1/+0
The memset operation before check can cause a BUG if the memory allocation failed. Since we are using get_zeroed_age, there is no need to use memset anyway. Found by the Spruce system in cooperation with the KEDR Framework. Signed-off-by: Vahram Martirosyan <vmartirosyan@linuxtesting.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-11-28ext4: simple cleanup in fiemap codepathLukas Czerner1-16/+16
This commit is simple cleanup of fiemap codepath which has not been included in previous commit to make the changes clearer. In this commit we rename cbex variable to newex in ext4_fill_fiemap_extents() because callback is no longer present Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-11-28ext4: prevent race while walking extent tree for fiemapLukas Czerner2-74/+76
Currently ext4_ext_walk_space() only takes i_data_sem for read when searching for the extent at given block with ext4_ext_find_extent(). Then it drops the lock and the extent tree can be changed at will. However later on we're searching for the 'next' extent, but the extent tree might already have changed, so the information might not be accurate. In fact we can hit BUG_ON(end <= start) if the extent got inserted into the tree after the one we found and before the block we were searching for. This has been reproduced by running xfstests 225 in loop on s390x architecture, but theoretically we could hit this on any other architecture as well, but probably not as often. Moreover the extent currently in delayed allocation might be allocated after we search the extent tree and before we search extent status tree delayed buffers resulting in those delayed buffers being completely missed, even though completely written and allocated. We fix all those problems in several steps: 1. remove unnecessary callback indirection 2. rename functions ext4_ext_walk_space -> ext4_fill_fiemap_extents ext4_ext_fiemap_cb -> ext4_find_delayed_extent 3. move fiemap_fill_next_extent() into ext4_fill_fiemap_extents() 4. hold the i_data_sem for: ext4_ext_find_extent() ext4_ext_next_allocated_block() ext4_find_delayed_extent() 5. call fiemap_fill_next_extent after releasing the i_data_sem 6. move path reinitialization into the critical section. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-11-19Fix misspellings of "whether" in comments.Adam Buchbinder1-1/+1
"Whether" is misspelled in various comments across the tree; this fixes them. No code changes. Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-11-16ext4: remove calls to ext4_jbd2_file_inode() from delalloc write pathTheodore Ts'o1-19/+1
The calls to ext4_jbd2_file_inode() are needed to guarantee that we do not expose stale data in the data=ordered mode. However, they are not necessary because in all of the cases where we have newly allocated blocks in the delayed allocation write path, we immediately submit the dirty pages for I/O. Hence, we can avoid the overhead of adding the inode to the list of inodes whose data pages will be to be flushed out to disk completely during the next commit operation. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-11-15ext4: init pagevec in ext4_da_block_invalidatepagesEric Sandeen1-0/+1
ext4_da_block_invalidatepages is missing a pagevec_init(), which means that pvec->cold contains random garbage. This affects whether the page goes to the front or back of the LRU when ->cold makes it to free_hot_cold_page() Reviewed-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
2012-11-13ext4: don't verify checksums of dx non-leaf nodes during fallback scanDarrick J. Wong1-0/+17
During a directory entry lookup of a hashed directory, if the hash-based lookup functions fail and we fall back to a linear scan, don't try to verify the dirent checksum on the internal nodes of the hash tree because they don't store a checksum in a hidden dirent like the leaf nodes do. Reported-by: George Spelvin <linux@horizon.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-11-11ext4: do not use ext4_error() when there is no space in dir leaf for csumTheodore Ts'o1-8/+10
If there is no space for a checksum in a directory leaf node, previously we would use EXT4_ERROR_INODE() which would mark the file system as inconsistent. While it would be nice to use e2fsck -D, it certainly isn't required, so just print a warning using ext4_warning(). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
2012-11-09ext4: introduce lseek SEEK_DATA/SEEK_HOLE supportZheng Liu1-2/+332
This patch makes ext4 really support SEEK_DATA/SEEK_HOLE flags. Block-mapped and extent-mapped files are fully implemented together because ext4_map_blocks hides this differences. After applying this patch, it will cause a failure in xfstest #285 when the file is block-mapped due to block-mapped file isn't support fallocate(2). I had tried to use ext4_ext_walk_space() to retrieve the offset for a extent-mapped file. But finally I decide to keep using ext4_map_blocks() to support SEEK_DATA/SEEK_HOLE because ext4_map_blocks() can hide the difference between block-mapped file and extent-mapped file. Moreover, in next step, extent status tree will track all extent status, and we can get all mappings from this tree. So I think that using ext4_map_blocks() is a better choice. CC: Hugh Dickins <hughd@google.com> Signed-off-by: Jie Liu <jeff.liu@oracle.com> Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-11-09ext4: reimplement fiemap using extent status treeZheng Liu1-163/+21
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: Allison Henderson <achender@linux.vnet.ibm.com> Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-11-09ext4: reimplement ext4_find_delay_alloc_range on extent status treeZheng Liu4-157/+20
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: Allison Henderson <achender@linux.vnet.ibm.com> Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-11-09ext4: add some tracepoints in extent status treeZheng Liu1-0/+8
This patch adds some tracepoints in extent status tree. Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-11-09ext4: let ext4 maintain extent status treeZheng Liu4-4/+51
This patch lets ext4 maintain extent status tree. Currently it only tracks delay extent status in extent status tree. When a delay allocation is issued, the related delay extent will be inserted into extent status tree. When a delay extent is written out or invalidated, it will be removed from this tree. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: Allison Henderson <achender@linux.vnet.ibm.com> Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-11-09ext4: initialize extent status treeZheng Liu1-0/+2
Let ext4 initialize extent status tree of an inode. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: Allison Henderson <achender@linux.vnet.ibm.com> Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-11-09ext4: add operations on extent status treeZheng Liu3-1/+513
This patch adds operations on a extent status tree. CC: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: Allison Henderson <achender@linux.vnet.ibm.com> Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-11-09ext4: add data structures for the extent status treeZheng Liu2-0/+31
This patch adds two structures that supports extent status tree, extent_status and ext4_es_tree. Currently extent_status is used to track a delay extent for an inode, which record the start block and the length of the delay extent. ext4_es_tree is used to store all extent_status for an inode in memory. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: Allison Henderson <achender@linux.vnet.ibm.com> Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-11-09ext4: fix error handling in ext4_fill_super()Lukas Czerner1-8/+8
There are some places in ext4_fill_super() where we would not return proper error code if something fails. The confusion is caused probably due to the fact that we have two "kind-of" return variables 'ret'and 'err'. 'ret' is used to return error code from ext4_fill_super() where err is used to store return values from other functions within ext4_fill_super(). However some places were missing the obligatory 'ret = err'. We could put the assignment where it is missing, but we can have better "future proof" solution. Or we could convert the code to use just one, but it would require more rewrites. This commit fixes the problem by returning value from 'err' variable if it is set and 'ret' otherwise in error handling branch of the ext4_fill_super(). The reasoning is that 'ret' value is often set to default "-EINVAL" or explicit value, where 'err' is used to store return value from other functions and should be otherwise zero. https://bugzilla.kernel.org/show_bug.cgi?id=48431 Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-11-09ext4: fix memory leak in ext4_xattr_set_acl()'s error pathEugene Shatokhin1-2/+4
In ext4_xattr_set_acl(), if ext4_journal_start() returns an error, posix_acl_release() will not be called for 'acl' which may result in a memory leak. This patch fixes that. Reviewed-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Eugene Shatokhin <eugene.shatokhin@rosalab.ru> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
2012-11-09ext4: remove code duplication in ext4_get_block_write_nolock()Anatol Pomozov1-39/+24
729f52c6be51013 introduced function ext4_get_block_write_nolock() that is very similar to _ext4_get_block(). Eliminate code duplication by passing different flags to _ext4_get_block() Tested: xfs tests Reviewed-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-11-08ext4: use 'inode' variable that is already dereferencedAnatol Pomozov1-1/+1
Tested: xfs tests Reviewed-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-11-08ext4: fix missing call to trace_ext4_ext_map_blocks_exitZheng Liu1-3/+4
When ext4_ext_handle_uninitialized_extents(), we will directly return from ext4_ext_map_blocks(). The trace point of trace_ext4_ext_map_blocks_exit isn't called, and the user doesn't see any result. This patch tries to fix this problem. Meanwhile in ext4_ext_handle_uninitialized_extents it returns errors or the number of allocated blocks. So 'ret' variable can be removed due to previously modifications. Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
2012-11-08ext4: print map->m_flags in trace_ext4_ext/ind_map_blocks_exitZheng Liu2-4/+2
When we use trace_ext4_ext/ind_map_blocks_exit, print the value of map->m_flags in order that we can understand the extent's current status. Reviewed-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-11-08ext4: print 'flags' in ext4_ext_handle_uninitialized_extentsZheng Liu1-2/+2
In trace_ext4_ext_handle_uninitialized_extents we don't care about the value of map->m_flags because this value is probably 0, and we prefer to get the value of flags because we can know how to handle this extent in this function. Reviewed-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-11-08ext4: warn when discard request fails other than EOPNOTSUPPLukas Czerner1-12/+35
We should warn user then the discard request fails. However we need to exclude -EOPNOTSUPP case since parts of the device might not support it while other parts can. So print the kernel warning when the error != -EOPNOTSUPP is returned from ext4_issue_discard(). We should also handle error cases in batched discard, again excluding EOPNOTSUPP. Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-11-08ext4: notify when discard is not supportedLukas Czerner1-0/+8
Notify user when mounting the file system with -o discard option, but the device does not support discard. Obviously we do not want to fail the mount or disable the options, because the underlying device might change in future even without file system remount. Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-11-08ext4: remove unused assignmentAlan Cox1-1/+1
Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-11-08ext4: get rid of redundant code in ext4_fill_super()Zhao Hongjiang1-3/+0
Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-11-08ext4: remove ext4_handle_release_buffer()Eric Sandeen3-22/+3
ext4_handle_release_buffer() was intended to remove journal write access from a buffer, but it doesn't actually do anything at all other than add a BUFFER_TRACE point, but it's not reliably used for that either. Remove all the associated dead code. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
2012-11-08ext4: fix awful goto in ext4_mb_new_blocks()Eric Sandeen1-4/+7
I think the whole function could be made prettier, but that goto really took the cake for too-clever-by-half. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-11-08ext4: fix overhead calculations in ext4_stats, againEric Sandeen1-1/+1
"overhead" was a write-only variable in this function after commit 952fc18e; we set it to 0 for minixdf, or to sbi->s_overhead if !minixdf, but never read it again after that. We need to use it, not sbi->s_overhead, when subtracting out overhead for f_blocks, or we get the wrong answer for minixdf. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-10-29ext4: fix unjournaled inode bitmap modificationEric Sandeen1-10/+9
commit 119c0d4460b001e44b41dcf73dc6ee794b98bd31 changed ext4_new_inode() such that the inode bitmap was being modified outside a transaction, which could lead to corruption, and was discovered when journal_checksum found a bad checksum in the journal during log replay. Nix ran into this when using the journal_async_commit mount option, which enables journal checksumming. The ensuing journal replay failures due to the bad checksums led to filesystem corruption reported as the now infamous "Apparent serious progressive ext4 data corruption bug" [ Changed by tytso to only call ext4_journal_get_write_access() only when we're fairly certain that we're going to allocate the inode. ] I've tested this by mounting with journal_checksum and running fsstress then dropping power; I've also tested by hacking DM to create snapshots w/o first quiescing, which allows me to test journal replay repeatedly w/o actually power-cycling the box. Without the patch I hit a journal checksum error every time. With this fix it survives many iterations. Reported-by: Nix <nix@esperi.org.uk> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
2012-10-23Merge tag 'ext4_for_linus' of ↵Linus Torvalds9-45/+74
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "Various bug fixes for ext4. The most serious of them fixes a security bug (CVE-2012-4508) which leads to stale data exposure when we have fallocate racing against writes to files undergoing delayed allocation. We also have two fixes for the metadata checksum feature, the most serious of which can cause the superblock to have a invalid checksum after a power failure." * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: Avoid underflow in ext4_trim_fs() ext4: Checksum the block bitmap properly with bigalloc enabled ext4: fix undefined bit shift result in ext4_fill_flex_info ext4: fix metadata checksum calculation for the superblock ext4: race-condition protection for ext4_convert_unwritten_extents_endio ext4: serialize fallocate with ext4_convert_unwritten_extents
2012-10-23ext4: Avoid underflow in ext4_trim_fs()Lukas Czerner1-2/+3
Currently if len argument in ext4_trim_fs() is smaller than one block, the 'end' variable underflow. Avoid that by returning EINVAL if len is smaller than file system block. Also remove useless unlikely(). Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
2012-10-22ext4: Checksum the block bitmap properly with bigalloc enabledTao Ma6-20/+14
In mke2fs, we only checksum the whole bitmap block and it is right. While in the kernel, we use EXT4_BLOCKS_PER_GROUP to indicate the size of the checksumed bitmap which is wrong when we enable bigalloc. The right size should be EXT4_CLUSTERS_PER_GROUP and this patch fixes it. Also as every caller of ext4_block_bitmap_csum_set and ext4_block_bitmap_csum_verify pass in EXT4_BLOCKS_PER_GROUP(sb)/8, we'd better removes this parameter and sets it in the function itself. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Lukas Czerner <lczerner@redhat.com> Cc: stable@vger.kernel.org
2012-10-15ext4: fix undefined bit shift result in ext4_fill_flex_infoLukas Czerner1-1/+1
The result of the bit shift expression in '1 << sbi->s_log_groups_per_flex' can be undefined in the case that s_log_groups_per_flex is 31 because the result of the shift is bigger than INT_MAX. In reality this probably should not cause much problems since we'll end up with INT_MIN which will then be converted into 'unsigned int' type, but nevertheless according to the ISO C99 the result is actually undefined. Fix this by changing the left operand to 'unsigned int' type. Note that the commit d50f2ab6f050311dbf7b8f5501b25f0bf64a439b already tried to fix the undefined behaviour, but this was missed. Thanks to Laszlo Ersek for pointing this out and suggesting the fix. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reported-by: Laszlo Ersek <lersek@redhat.com>
2012-10-10ext4: fix metadata checksum calculation for the superblockTheodore Ts'o3-11/+7
The function ext4_handle_dirty_super() was calculating the superblock on the wrong block data. As a result, when the superblock is modified while it is mounted (most commonly, when inodes are added or removed from the orphan list), the superblock checksum would be wrong. We didn't notice because the superblock *was* being correctly calculated in ext4_commit_super(), and this would get called when the file system was unmounted. So the problem only became obvious if the system crashed while the file system was mounted. Fix this by removing the poorly designed function signature for ext4_superblock_csum_set(); if it only took a single argument, the pointer to a struct superblock, the ambiguity which caused this mistake would have been impossible. Reported-by: George Spelvin <linux@horizon.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
2012-10-10ext4: race-condition protection for ext4_convert_unwritten_extents_endioDmitry Monakhov1-11/+46
We assumed that at the time we call ext4_convert_unwritten_extents_endio() extent in question is fully inside [map.m_lblk, map->m_len] because it was already split during submission. But this may not be true due to a race between writeback vs fallocate. If extent in question is larger than requested we will split it again. Special precautions should being done if zeroout required because [map.m_lblk, map->m_len] already contains valid data. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
2012-10-09mm: kill vma flag VM_CAN_NONLINEARKonstantin Khlebnikov1-1/+1
Move actual pte filling for non-linear file mappings into the new special vma operation: ->remap_pages(). Filesystems must implement this method to get non-linear mapping support, if it uses filemap_fault() then generic_file_remap_pages() can be used. Now device drivers can implement this method and obtain nonlinear vma support. Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Carsten Otte <cotte@de.ibm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> #arch/tile Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Eric Paris <eparis@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Morris <james.l.morris@oracle.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Kentaro Takeda <takedakn@nttdata.co.jp> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Robert Richter <robert.richter@amd.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Venkatesh Pallipadi <venki@google.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-08Merge tag 'ext4_for_linus' of ↵Linus Torvalds15-736/+1260
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "The big new feature added this time is supporting online resizing using the meta_bg feature. This allows us to resize file systems which are greater than 16TB. In addition, the speed of online resizing has been improved in general. We also fix a number of races, some of which could lead to deadlocks, in ext4's Asynchronous I/O and online defrag support, thanks to good work by Dmitry Monakhov. There are also a large number of more minor bug fixes and cleanups from a number of other ext4 contributors, quite of few of which have submitted fixes for the first time." * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (69 commits) ext4: fix ext4_flush_completed_IO wait semantics ext4: fix mtime update in nodelalloc mode ext4: fix ext_remove_space for punch_hole case ext4: punch_hole should wait for DIO writers ext4: serialize truncate with owerwrite DIO workers ext4: endless truncate due to nonlocked dio readers ext4: serialize unlocked dio reads with truncate ext4: serialize dio nonlocked reads with defrag workers ext4: completed_io locking cleanup ext4: fix unwritten counter leakage ext4: give i_aiodio_unwritten a more appropriate name ext4: ext4_inode_info diet ext4: convert to use leXX_add_cpu() ext4: ext4_bread usage audit fs: reserve fallocate flag codepoint ext4: remove redundant offset check in mext_check_arguments() ext4: don't clear orphan list on ro mount with errors jbd2: fix assertion failure in commit code due to lacking transaction credits ext4: release donor reference when EXT4_IOC_MOVE_EXT ioctl fails ext4: enable FITRIM ioctl on bigalloc file system ...