summaryrefslogtreecommitdiff
path: root/fs/ext4/super.c
AgeCommit message (Collapse)AuthorFilesLines
2017-05-13Merge branch 'libnvdimm-fixes' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm fixes from Dan Williams: "Incremental fixes and a small feature addition on top of the main libnvdimm 4.12 pull request: - Geert noticed that tinyconfig was bloated by BLOCK selecting DAX. The size regression is fixed by moving all dax helpers into the dax-core and only specifying "select DAX" for FS_DAX and dax-capable drivers. He also asked for clarification of the NR_DEV_DAX config option which, on closer look, does not need to be a config option at all. Mike also throws in a DEV_DAX_PMEM fixup for good measure. - Ben's attention to detail on -stable patch submissions caught a case where the recent fixes to arch_copy_from_iter_pmem() missed a condition where we strand dirty data in the cache. This is tagged for -stable and will also be included in the rework of the pmem api to a proposed {memcpy,copy_user}_flushcache() interface for 4.13. - Vishal adds a feature that missed the initial pull due to pending review feedback. It allows the kernel to clear media errors when initializing a BTT (atomic sector update driver) instance on a pmem namespace. - Ross noticed that the dax_device + dax_operations conversion broke __dax_zero_page_range(). The nvdimm unit tests fail to check this path, but xfstests immediately trips over it. No excuse for missing this before submitting the 4.12 pull request. These all pass the nvdimm unit tests and an xfstests spot check. The set has received a build success notification from the kbuild robot" * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: filesystem-dax: fix broken __dax_zero_page_range() conversion libnvdimm, btt: ensure that initializing metadata clears poison libnvdimm: add an atomic vs process context flag to rw_bytes x86, pmem: Fix cache flushing for iovec write < 8 bytes device-dax: kill NR_DEV_DAX block, dax: move "select DAX" from BLOCK to FS_DAX device-dax: Tell kbuild DEV_DAX_PMEM depends on DEV_DAX
2017-05-09Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-2/+2
Merge more updates from Andrew Morton: - the rest of MM - various misc things - procfs updates - lib/ updates - checkpatch updates - kdump/kexec updates - add kvmalloc helpers, use them - time helper updates for Y2038 issues. We're almost ready to remove current_fs_time() but that awaits a btrfs merge. - add tracepoints to DAX * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (114 commits) drivers/staging/ccree/ssi_hash.c: fix build with gcc-4.4.4 selftests/vm: add a test for virtual address range mapping dax: add tracepoint to dax_insert_mapping() dax: add tracepoint to dax_writeback_one() dax: add tracepoints to dax_writeback_mapping_range() dax: add tracepoints to dax_load_hole() dax: add tracepoints to dax_pfn_mkwrite() dax: add tracepoints to dax_iomap_pte_fault() mtd: nand: nandsim: convert to memalloc_noreclaim_*() treewide: convert PF_MEMALLOC manipulations to new helpers mm: introduce memalloc_noreclaim_{save,restore} mm: prevent potential recursive reclaim due to clearing PF_MEMALLOC mm/huge_memory.c: deposit a pgtable for DAX PMD faults when required mm/huge_memory.c: use zap_deposited_table() more time: delete CURRENT_TIME_SEC and CURRENT_TIME gfs2: replace CURRENT_TIME with current_time apparmorfs: replace CURRENT_TIME with current_time() lustre: replace CURRENT_TIME macro fs: ubifs: replace CURRENT_TIME_SEC with current_time fs: ufs: use ktime_get_real_ts64() for birthtime ...
2017-05-09mm: introduce kv[mz]alloc helpersMichal Hocko1-2/+2
Patch series "kvmalloc", v5. There are many open coded kmalloc with vmalloc fallback instances in the tree. Most of them are not careful enough or simply do not care about the underlying semantic of the kmalloc/page allocator which means that a) some vmalloc fallbacks are basically unreachable because the kmalloc part will keep retrying until it succeeds b) the page allocator can invoke a really disruptive steps like the OOM killer to move forward which doesn't sound appropriate when we consider that the vmalloc fallback is available. As it can be seen implementing kvmalloc requires quite an intimate knowledge if the page allocator and the memory reclaim internals which strongly suggests that a helper should be implemented in the memory subsystem proper. Most callers, I could find, have been converted to use the helper instead. This is patch 6. There are some more relying on __GFP_REPEAT in the networking stack which I have converted as well and Eric Dumazet was not opposed [2] to convert them as well. [1] http://lkml.kernel.org/r/20170130094940.13546-1-mhocko@kernel.org [2] http://lkml.kernel.org/r/1485273626.16328.301.camel@edumazet-glaptop3.roam.corp.google.com This patch (of 9): Using kmalloc with the vmalloc fallback for larger allocations is a common pattern in the kernel code. Yet we do not have any common helper for that and so users have invented their own helpers. Some of them are really creative when doing so. Let's just add kv[mz]alloc and make sure it is implemented properly. This implementation makes sure to not make a large memory pressure for > PAGE_SZE requests (__GFP_NORETRY) and also to not warn about allocation failures. This also rules out the OOM killer as the vmalloc is a more approapriate fallback than a disruptive user visible action. This patch also changes some existing users and removes helpers which are specific for them. In some cases this is not possible (e.g. ext4_kvmalloc, libcfs_kvzalloc) because those seems to be broken and require GFP_NO{FS,IO} context which is not vmalloc compatible in general (note that the page table allocation is GFP_KERNEL). Those need to be fixed separately. While we are at it, document that __vmalloc{_node} about unsupported gfp mask because there seems to be a lot of confusion out there. kvmalloc_node will warn about GFP_KERNEL incompatible (which are not superset) flags to catch new abusers. Existing ones would have to die slowly. [sfr@canb.auug.org.au: f2fs fixup] Link: http://lkml.kernel.org/r/20170320163735.332e64b7@canb.auug.org.au Link: http://lkml.kernel.org/r/20170306103032.2540-2-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Reviewed-by: Andreas Dilger <adilger@dilger.ca> [ext4 part] Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: John Hubbard <jhubbard@nvidia.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08Merge tag 'ext4_for_linus' of ↵Linus Torvalds1-3/+11
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: - add GETFSMAP support - some performance improvements for very large file systems and for random write workloads into a preallocated file - bug fixes and cleanups. * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: jbd2: cleanup write flags handling from jbd2_write_superblock() ext4: mark superblock writes synchronous for nobarrier mounts ext4: inherit encryption xattr before other xattrs ext4: replace BUG_ON with WARN_ONCE in ext4_end_bio() ext4: avoid unnecessary transaction stalls during writeback ext4: preload block group descriptors ext4: make ext4_shutdown() static ext4: support GETFSMAP ioctls vfs: add common GETFSMAP ioctl definitions ext4: evict inline data when writing to memory map ext4: remove ext4_xattr_check_entry() ext4: rename ext4_xattr_check_names() to ext4_xattr_check_entries() ext4: merge ext4_xattr_list() into ext4_listxattr() ext4: constify static data that is never modified ext4: trim return value and 'dir' argument from ext4_insert_dentry() jbd2: fix dbench4 performance regression for 'nobarrier' mounts jbd2: Fix lockdep splat with generic/270 test mm: retry writepages() on ENOMEM when doing an data integrity writeback
2017-05-08block, dax: move "select DAX" from BLOCK to FS_DAXDan Williams1-0/+1
For configurations that do not enable DAX filesystems or drivers, do not require the DAX core to be built. Given that the 'direct_access' method has been removed from 'block_device_operations', we can also go ahead and remove the block-related dax helper functions from fs/block_dev.c to drivers/dax/super.c. This keeps dax details out of the block layer and lets the DAX core be built as a module in the FS_DAX=n case. Filesystems need to include dax.h to call bdev_dax_supported(). Cc: linux-xfs@vger.kernel.org Cc: Jens Axboe <axboe@kernel.dk> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: "Darrick J. Wong" <darrick.wong@oracle.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Jan Kara <jack@suse.com> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-05-04ext4: mark superblock writes synchronous for nobarrier mountsJan Kara1-1/+1
Commit b685d3d65ac7 "block: treat REQ_FUA and REQ_PREFLUSH as synchronous" removed REQ_SYNC flag from WRITE_FUA implementation. generic_make_request_checks() however strips REQ_FUA flag from a bio when the storage doesn't report volatile write cache and thus write effectively becomes asynchronous which can lead to performance regressions. This affects superblock writes for ext4. Fix the problem by marking superblock writes always as synchronous. Fixes: b685d3d65ac791406e0dfd8779cc9b3707fea5a3 CC: linux-ext4@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-05-03Merge branch 'generic' of ↵Linus Torvalds1-8/+66
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull quota, reiserfs, udf and ext2 updates from Jan Kara: "The branch contains changes to quota code so that it does not modify persistent flags in inode->i_flags (it was the only place in kernel doing that) and handle it inside filesystem's quotaon/off handlers instead. The branch also contains two UDF cleanups, a couple of reiserfs fixes and one fix for ext2 quota locking" * 'generic' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: ext4: Improve comments in ext4_quota_{on|off}() udf: use kmap_atomic for memcpy copying udf: use octal for permissions quota: Remove dquot_quotactl_ops reiserfs: Remove i_attrs_to_sd_attrs() reiserfs: Remove useless setting of i_flags jfs: Remove jfs_get_inode_flags() ext2: Remove ext2_get_inode_flags() ext4: Remove ext4_get_inode_flags() quota: Stop setting IMMUTABLE and NOATIME flags on quota files jfs: Set flags on quota files directly ext2: Set flags on quota files directly reiserfs: Set flags on quota files directly ext4: Set flags on quota files directly reiserfs: Protect dquot_writeback_dquots() by s_umount semaphore reiserfs: Make cancel_old_flush() reliable ext2: Call dquot_writeback_dquots() with s_umount held reiserfs: avoid a -Wmaybe-uninitialized warning
2017-04-30ext4: preload block group descriptorsAndrew Perepechko1-0/+6
With enabled meta_bg option block group descriptors reading IO is not sequential and requires optimization. Signed-off-by: Andrew Perepechko <andrew.perepechko@seagate.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-04-30ext4: support GETFSMAP ioctlsDarrick J. Wong1-0/+1
Support the GETFSMAP ioctls so that we can use the xfs free space management tools to probe ext4 as well. Note that this is a partial implementation -- we only report fixed-location metadata and free space; everything else is reported as "unknown". Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-04-30ext4: constify static data that is never modifiedEric Biggers1-2/+3
Constify static data in ext4 that is never (intentionally) modified so that it is placed in .rodata and benefits from memory protection. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-04-24ext4: Improve comments in ext4_quota_{on|off}()Jan Kara1-2/+10
Improve comments in ext4_quota_{on|off}() to explain that returning success despite ext4_journal_start() failing is deliberate. Signed-off-by: Jan Kara <jack@suse.cz>
2017-04-19ext4: Set flags on quota files directlyJan Kara1-6/+56
Currently immutable and noatime flags on quota files are set by quota code which requires us to copy inode->i_flags to our on disk version of quota flags in GETFLAGS ioctl and ext4_do_update_inode(). Move to setting / clearing these on-disk flags directly to save that copying. Signed-off-by: Jan Kara <jack@suse.cz>
2017-03-15fscrypt: eliminate ->prepare_context() operationEric Biggers1-6/+4
The only use of the ->prepare_context() fscrypt operation was to allow ext4 to evict inline data from the inode before ->set_context(). However, there is no reason why this cannot be done as simply the first step in ->set_context(), and in fact it makes more sense to do it that way because then the policy modes and flags get validated before any real work is done. Therefore, merge ext4_prepare_context() into ext4_set_context(), and remove ->prepare_context(). Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-02-21Merge tag 'ext4_for_linus' of ↵Linus Torvalds1-9/+38
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "For this cycle we add support for the shutdown ioctl, which is primarily used for testing, but which can be useful on production systems when a scratch volume is being destroyed and the data on it doesn't need to be saved. This found (and we fixed) a number of bugs with ext4's recovery to corrupted file system --- the bugs increased the amount of data that could be potentially lost, and in the case of the inline data feature, could cause the kernel to BUG. Also included are a number of other bug fixes, including in ext4's fscrypt, DAX, inline data support" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (26 commits) ext4: rename EXT4_IOC_GOINGDOWN to EXT4_IOC_SHUTDOWN ext4: fix fencepost in s_first_meta_bg validation ext4: don't BUG when truncating encrypted inodes on the orphan list ext4: do not use stripe_width if it is not set ext4: fix stripe-unaligned allocations dax: assert that i_rwsem is held exclusive for writes ext4: fix DAX write locking ext4: add EXT4_IOC_GOINGDOWN ioctl ext4: add shutdown bit and check for it ext4: rename s_resize_flags to s_ext4_flags ext4: return EROFS if device is r/o and journal replay is needed ext4: preserve the needs_recovery flag when the journal is aborted jbd2: don't leak modified metadata buffers on an aborted journal ext4: fix inline data error paths ext4: move halfmd4 into hash.c directly ext4: fix use-after-iput when fscrypt contexts are inconsistent jbd2: fix use after free in kjournald2() ext4: fix data corruption in data=journal mode ext4: trim allocation requests to group size ext4: replace BUG_ON with WARN_ON in mb_find_extent() ...
2017-02-15ext4: fix fencepost in s_first_meta_bg validationTheodore Ts'o1-1/+1
It is OK for s_first_meta_bg to be equal to the number of block group descriptor blocks. (It rarely happens, but it shouldn't cause any problems.) https://bugzilla.kernel.org/show_bug.cgi?id=194567 Fixes: 3a4b77cd47bb837b8557595ec7425f281f2ca1fe Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2017-02-10ext4: do not use stripe_width if it is not setJan Kara1-2/+2
Avoid using stripe_width for sbi->s_stripe value if it is not actually set. It prevents using the stride for sbi->s_stripe. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-02-08fscrypt: constify struct fscrypt_operationsEric Biggers1-2/+2
Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Richard Weinberger <richard@nod.at>
2017-02-06ext4: add EXT4_IOC_GOINGDOWN ioctlTheodore Ts'o1-1/+1
This ioctl is modeled after the xfs's XFS_IOC_GOINGDOWN ioctl. (In fact, it uses the same code points.) Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-02-05ext4: add shutdown bit and check for itTheodore Ts'o1-0/+21
Add a shutdown bit that will cause ext4 processing to fail immediately with EIO. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-02-05ext4: return EROFS if device is r/o and journal replay is neededTheodore Ts'o1-1/+2
If the file system requires journal recovery, and the device is read-ony, return EROFS to the mount system call. This allows xfstests generic/050 to pass. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2017-02-05ext4: preserve the needs_recovery flag when the journal is abortedTheodore Ts'o1-2/+4
If the journal is aborted, the needs_recovery feature flag should not be removed. Otherwise, it's the journal might not get replayed and this could lead to more data getting lost. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2017-01-11ext4: add debug_want_extra_isize mount optionTheodore Ts'o1-2/+7
In order to test the inode extra isize expansion code, it is useful to be able to easily create file systems that have inodes with extra isize values smaller than the current desired value. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-01-08fscrypt: make fscrypt_operations.key_prefix a stringEric Biggers1-12/+1
There was an unnecessary amount of complexity around requesting the filesystem-specific key prefix. It was unclear why; perhaps it was envisioned that different instances of the same filesystem type could use different key prefixes, or that key prefixes could be binary. However, neither of those things were implemented or really make sense at all. So simplify the code by making key_prefix a const char *. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Richard Weinberger <richard@nod.at> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-12-24Replace <asm/uaccess.h> with <linux/uaccess.h> globallyLinus Torvalds1-1/+1
This was entirely automated, using the script by Al: PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>' sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \ $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h) to do the replacement at the end of the merge window. Requested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-18Merge branch 'for-linus' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull more vfs updates from Al Viro: "In this pile: - autofs-namespace series - dedupe stuff - more struct path constification" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (40 commits) ocfs2: implement the VFS clone_range, copy_range, and dedupe_range features ocfs2: charge quota for reflinked blocks ocfs2: fix bad pointer cast ocfs2: always unlock when completing dio writes ocfs2: don't eat io errors during _dio_end_io_write ocfs2: budget for extent tree splits when adding refcount flag ocfs2: prohibit refcounted swapfiles ocfs2: add newlines to some error messages ocfs2: convert inode refcount test to a helper simple_write_end(): don't zero in short copy into uptodate exofs: don't mess with simple_write_{begin,end} 9p: saner ->write_end() on failing copy into non-uptodate page fix gfs2_stuffed_write_end() on short copies fix ceph_write_end() nfs_write_end(): fix handling of short copies vfs: refactor clone/dedupe_file_range common functions fs: try to clone files first in vfs_copy_file_range vfs: misc struct path constification namespace.c: constify struct path passed to a bunch of primitives quota: constify struct path in quota_on ...
2016-12-14Merge tag 'ext4_for_linus' of ↵Linus Torvalds1-49/+103
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "This merge request includes the dax-4.0-iomap-pmd branch which is needed for both ext4 and xfs dax changes to use iomap for DAX. It also includes the fscrypt branch which is needed for ubifs encryption work as well as ext4 encryption and fscrypt cleanups. Lots of cleanups and bug fixes, especially making sure ext4 is robust against maliciously corrupted file systems --- especially maliciously corrupted xattr blocks and a maliciously corrupted superblock. Also fix ext4 support for 64k block sizes so it works well on ppcle. Fixed mbcache so we don't miss some common xattr blocks that can be merged" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (86 commits) dax: Fix sleep in atomic contex in grab_mapping_entry() fscrypt: Rename FS_WRITE_PATH_FL to FS_CTX_HAS_BOUNCE_BUFFER_FL fscrypt: Delay bounce page pool allocation until needed fscrypt: Cleanup page locking requirements for fscrypt_{decrypt,encrypt}_page() fscrypt: Cleanup fscrypt_{decrypt,encrypt}_page() fscrypt: Never allocate fscrypt_ctx on in-place encryption fscrypt: Use correct index in decrypt path. fscrypt: move the policy flags and encryption mode definitions to uapi header fscrypt: move non-public structures and constants to fscrypt_private.h fscrypt: unexport fscrypt_initialize() fscrypt: rename get_crypt_info() to fscrypt_get_crypt_info() fscrypto: move ioctl processing more fully into common code fscrypto: remove unneeded Kconfig dependencies MAINTAINERS: fscrypto: recommend linux-fsdevel for fscrypto patches ext4: do not perform data journaling when data is encrypted ext4: return -ENOMEM instead of success ext4: reject inodes with negative size ext4: remove another test in ext4_alloc_file_blocks() Documentation: fix description of ext4's block_validity mount option ext4: fix checks for data=ordered and journal_async_commit options ...
2016-12-13Merge branch 'for-4.10/block' of git://git.kernel.dk/linux-blockLinus Torvalds1-1/+1
Pull block layer updates from Jens Axboe: "This is the main block pull request this series. Contrary to previous release, I've kept the core and driver changes in the same branch. We always ended up having dependencies between the two for obvious reasons, so makes more sense to keep them together. That said, I'll probably try and keep more topical branches going forward, especially for cycles that end up being as busy as this one. The major parts of this pull request is: - Improved support for O_DIRECT on block devices, with a small private implementation instead of using the pig that is fs/direct-io.c. From Christoph. - Request completion tracking in a scalable fashion. This is utilized by two components in this pull, the new hybrid polling and the writeback queue throttling code. - Improved support for polling with O_DIRECT, adding a hybrid mode that combines pure polling with an initial sleep. From me. - Support for automatic throttling of writeback queues on the block side. This uses feedback from the device completion latencies to scale the queue on the block side up or down. From me. - Support from SMR drives in the block layer and for SD. From Hannes and Shaun. - Multi-connection support for nbd. From Josef. - Cleanup of request and bio flags, so we have a clear split between which are bio (or rq) private, and which ones are shared. From Christoph. - A set of patches from Bart, that improve how we handle queue stopping and starting in blk-mq. - Support for WRITE_ZEROES from Chaitanya. - Lightnvm updates from Javier/Matias. - Supoort for FC for the nvme-over-fabrics code. From James Smart. - A bunch of fixes from a whole slew of people, too many to name here" * 'for-4.10/block' of git://git.kernel.dk/linux-block: (182 commits) blk-stat: fix a few cases of missing batch flushing blk-flush: run the queue when inserting blk-mq flush elevator: make the rqhash helpers exported blk-mq: abstract out blk_mq_dispatch_rq_list() helper blk-mq: add blk_mq_start_stopped_hw_queue() block: improve handling of the magic discard payload blk-wbt: don't throttle discard or write zeroes nbd: use dev_err_ratelimited in io path nbd: reset the setup task for NBD_CLEAR_SOCK nvme-fabrics: Add FC LLDD loopback driver to test FC-NVME nvme-fabrics: Add target support for FC transport nvme-fabrics: Add host support for FC transport nvme-fabrics: Add FC transport LLDD api definitions nvme-fabrics: Add FC transport FC-NVME definitions nvme-fabrics: Add FC transport error codes to nvme.h Add type 0x28 NVME type code to scsi fc headers nvme-fabrics: patch target code in prep for FC transport support nvme-fabrics: set sqe.command_id in core not transports parser: add u64 number parser nvme-rdma: align to generic ib_event logging helper ...
2016-12-11ext4: do not perform data journaling when data is encryptedSergey Karamov1-0/+5
Currently data journalling is incompatible with encryption: enabling both at the same time has never been supported by design, and would result in unpredictable behavior. However, users are not precluded from turning on both features simultaneously. This change programmatically replaces data journaling for encrypted regular files with ordered data journaling mode. Background: Journaling encrypted data has not been supported because it operates on buffer heads of the page in the page cache. Namely, when the commit happens, which could be up to five seconds after caching, the commit thread uses the buffer heads attached to the page to copy the contents of the page to the journal. With encryption, it would have been required to keep the bounce buffer with ciphertext for up to the aforementioned five seconds, since the page cache can only hold plaintext and could not be used for journaling. Alternatively, it would be required to setup the journal to initiate a callback at the commit time to perform deferred encryption - in this case, not only would the data have to be written twice, but it would also have to be encrypted twice. This level of complexity was not justified for a mode that in practice is very rarely used because of the overhead from the data journalling. Solution: If data=journaled has been set as a mount option for a filesystem, or if journaling is enabled on a regular file, do not perform journaling if the file is also encrypted, instead fall back to the data=ordered mode for the file. Rationale: The intent is to allow seamless and proper filesystem operation when journaling and encryption have both been enabled, and have these two conflicting features gracefully resolved by the filesystem. Fixes: 4461471107b7 Signed-off-by: Sergey Karamov <skaramov@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2016-12-06quota: constify struct path in quota_onAl Viro1-2/+2
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-04ext4: fix checks for data=ordered and journal_async_commit optionsJan Kara1-6/+15
Combination of data=ordered mode and journal_async_commit mount option is invalid. However the check in parse_options() fails to detect the case where we simply end up defaulting to data=ordered mode and we detect the problem only on remount which triggers hard to understand failure to remount the filesystem. Fix the checking of mount options to take into account also the default mode by moving the check somewhat later in the mount sequence. Reported-by: Wolfgang Walter <linux@stwm.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-12-01ext4: validate s_first_meta_bg at mount timeEryu Guan1-0/+9
Ralf Spenneberg reported that he hit a kernel crash when mounting a modified ext4 image. And it turns out that kernel crashed when calculating fs overhead (ext4_calculate_overhead()), this is because the image has very large s_first_meta_bg (debug code shows it's 842150400), and ext4 overruns the memory in count_overhead() when setting bitmap buffer, which is PAGE_SIZE. ext4_calculate_overhead(): buf = get_zeroed_page(GFP_NOFS); <=== PAGE_SIZE buffer blks = count_overhead(sb, i, buf); count_overhead(): for (j = ext4_bg_num_gdb(sb, grp); j > 0; j--) { <=== j = 842150400 ext4_set_bit(EXT4_B2C(sbi, s++), buf); <=== buffer overrun count++; } This can be reproduced easily for me by this script: #!/bin/bash rm -f fs.img mkdir -p /mnt/ext4 fallocate -l 16M fs.img mke2fs -t ext4 -O bigalloc,meta_bg,^resize_inode -F fs.img debugfs -w -R "ssv first_meta_bg 842150400" fs.img mount -o loop fs.img /mnt/ext4 Fix it by validating s_first_meta_bg first at mount time, and refusing to mount if its value exceeds the largest possible meta_bg number. Reported-by: Ralf Spenneberg <ralf@os-t.de> Signed-off-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca>
2016-11-26ext4: fix mmp use after free during unmountEric Sandeen1-1/+1
In ext4_put_super, we call brelse on the buffer head containing the ext4 superblock, but then try to use it when we stop the mmp thread, because when the thread shuts down it does: write_mmp_block ext4_mmp_csum_set ext4_has_metadata_csum WARN_ON_ONCE(ext4_has_feature_metadata_csum(sb)...) which reaches into sb->s_fs_info->s_es->s_feature_ro_compat, which lives in the superblock buffer s_sbh which we just released. Fix this by moving the brelse down to a point where we are no longer using it. Reported-by: Wang Shu <shuwang@redhat.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca>
2016-11-21ext4: avoid lockdep warning when inheriting encryption contextEric Biggers1-10/+22
On a lockdep-enabled kernel, xfstests generic/027 fails due to a lockdep warning when run on ext4 mounted with -o test_dummy_encryption: xfs_io/4594 is trying to acquire lock: (jbd2_handle ){++++.+}, at: [<ffffffff813096ef>] jbd2_log_wait_commit+0x5/0x11b but task is already holding lock: (jbd2_handle ){++++.+}, at: [<ffffffff813000de>] start_this_handle+0x354/0x3d8 The abbreviated call stack is: [<ffffffff813096ef>] ? jbd2_log_wait_commit+0x5/0x11b [<ffffffff8130972a>] jbd2_log_wait_commit+0x40/0x11b [<ffffffff813096ef>] ? jbd2_log_wait_commit+0x5/0x11b [<ffffffff8130987b>] ? __jbd2_journal_force_commit+0x76/0xa6 [<ffffffff81309896>] __jbd2_journal_force_commit+0x91/0xa6 [<ffffffff813098b9>] jbd2_journal_force_commit_nested+0xe/0x18 [<ffffffff812a6049>] ext4_should_retry_alloc+0x72/0x79 [<ffffffff812f0c1f>] ext4_xattr_set+0xef/0x11f [<ffffffff812cc35b>] ext4_set_context+0x3a/0x16b [<ffffffff81258123>] fscrypt_inherit_context+0xe3/0x103 [<ffffffff812ab611>] __ext4_new_inode+0x12dc/0x153a [<ffffffff812bd371>] ext4_create+0xb7/0x161 When a file is created in an encrypted directory, ext4_set_context() is called to set an encryption context on the new file. This calls ext4_xattr_set(), which contains a retry loop where the journal is forced to commit if an ENOSPC error is encountered. If the task actually were to wait for the journal to commit in this case, then it would deadlock because a handle remains open from __ext4_new_inode(), so the running transaction can't be committed yet. Fortunately, __jbd2_journal_force_commit() avoids the deadlock by not allowing the running transaction to be committed while the current task has it open. However, the above lockdep warning is still triggered. This was a false positive which was introduced by: 1eaa566d368b: jbd2: track more dependencies on transaction commit Fix the problem by passing the handle through the 'fs_data' argument to ext4_set_context(), then using ext4_xattr_set_handle() instead of ext4_xattr_set(). And in the case where no journal handle is specified and ext4_set_context() has to open one, add an ENOSPC retry loop since in that case it is the outermost transaction. Signed-off-by: Eric Biggers <ebiggers@google.com>
2016-11-21ext4: only set S_DAX if DAX is really supportedJan Kara1-0/+6
Currently we have S_DAX set inode->i_flags for a regular file whenever ext4 is mounted with dax mount option. However in some cases we cannot really do DAX - e.g. when inode is marked to use data journalling, when inode data is being encrypted, or when inode is stored inline. Make sure S_DAX flag is appropriately set/cleared in these cases. Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-11-20ext4: sanity check the block and cluster size at mount timeTheodore Ts'o1-1/+16
If the block size or cluster size is insane, reject the mount. This is important for security reasons (although we shouldn't be just depending on this check). Ref: http://www.securityfocus.com/archive/1/539661 Ref: https://bugzilla.redhat.com/show_bug.cgi?id=1332506 Reported-by: Borislav Petkov <bp@alien8.de> Reported-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2016-11-18ext4: add sanity checking to count_overhead()Theodore Ts'o1-3/+8
The commit "ext4: sanity check the block and cluster size at mount time" should prevent any problems, but in case the superblock is modified while the file system is mounted, add an extra safety check to make sure we won't overrun the allocated buffer. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2016-11-18ext4: use more strict checks for inodes_per_block on mountTheodore Ts'o1-9/+6
Centralize the checks for inodes_per_block and be more strict to make sure the inodes_per_block_group can't end up being zero. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Cc: stable@vger.kernel.org
2016-11-18ext4: fix in-superblock mount options processingTheodore Ts'o1-15/+23
Fix a large number of problems with how we handle mount options in the superblock. For one, if the string in the superblock is long enough that it is not null terminated, we could run off the end of the string and try to interpret superblocks fields as characters. It's unlikely this will cause a security problem, but it could result in an invalid parse. Also, parse_options is destructive to the string, so in some cases if there is a comma-separated string, it would be modified in the superblock. (Fortunately it only happens on file systems with a 1k block size.) Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2016-11-18ext4: sanity check the block and cluster size at mount timeTheodore Ts'o1-1/+16
If the block size or cluster size is insane, reject the mount. This is important for security reasons (although we shouldn't be just depending on this check). Ref: http://www.securityfocus.com/archive/1/539661 Ref: https://bugzilla.redhat.com/show_bug.cgi?id=1332506 Reported-by: Borislav Petkov <bp@alien8.de> Reported-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2016-11-15ext4: use current_time() for inode timestampsDeepa Dinamani1-1/+1
CURRENT_TIME_SEC and CURRENT_TIME are not y2038 safe. current_time() will be transitioned to be y2038 safe along with vfs. current_time() returns timestamps according to the granularities set in the super_block. The granularity check in ext4_current_time() to call current_time() or CURRENT_TIME_SEC is not required. Use current_time() directly to obtain timestamps unconditionally, and remove ext4_current_time(). Quota files are assumed to be on the same filesystem. Hence, use current_time() for these files as well. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Arnd Bergmann <arnd@arndb.de>
2016-11-14ext4: don't lock buffer in ext4_commit_super if holding spinlockTheodore Ts'o1-2/+3
If there is an error reported in mballoc via ext4_grp_locked_error(), the code is holding a spinlock, so ext4_commit_super() must not try to lock the buffer head, or else it will trigger a BUG: BUG: sleeping function called from invalid context at ./include/linux/buffer_head.h:358 in_atomic(): 1, irqs_disabled(): 0, pid: 993, name: mount CPU: 0 PID: 993 Comm: mount Not tainted 4.9.0-rc1-clouder1 #62 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014 ffff880006423548 ffffffff81318c89 ffffffff819ecdd0 0000000000000166 ffff880006423558 ffffffff810810b0 ffff880006423580 ffffffff81081153 ffff880006e5a1a0 ffff88000690e400 0000000000000000 ffff8800064235c0 Call Trace: [<ffffffff81318c89>] dump_stack+0x67/0x9e [<ffffffff810810b0>] ___might_sleep+0xf0/0x140 [<ffffffff81081153>] __might_sleep+0x53/0xb0 [<ffffffff8126c1dc>] ext4_commit_super+0x19c/0x290 [<ffffffff8126e61a>] __ext4_grp_locked_error+0x14a/0x230 [<ffffffff81081153>] ? __might_sleep+0x53/0xb0 [<ffffffff812822be>] ext4_mb_generate_buddy+0x1de/0x320 Since ext4_grp_locked_error() calls ext4_commit_super with sync == 0 (and it is the only caller which does so), avoid locking and unlocking the buffer in this case. This can result in races with ext4_commit_super() if there are other problems (which is what commit 4743f83990614 was trying to address), but a Warning is better than BUG. Fixes: 4743f83990614 Cc: stable@vger.kernel.org # 4.9 Reported-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2016-11-14ext4: allow ext4_truncate() to return an errorTheodore Ts'o1-2/+4
This allows us to properly propagate errors back up to ext4_truncate()'s callers. This also means we no longer have to silently ignore some errors (e.g., when trying to add the inode to the orphan inode list). Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2016-11-01block,fs: use REQ_* flags directlyChristoph Hellwig1-1/+1
Remove the WRITE_* and READ_SYNC wrappers, and just use the flags directly. Where applicable this also drops usage of the bio_set_op_attrs wrapper. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-10-13ext4: super.c: Update logging style using KERN_CONTJoe Perches1-10/+11
Recent commit require line continuing printks to use PR_CONT. Update super.c to use KERN_CONT and use vsprintf extension %pV to avoid a printk/vprintk/printk("\n") sequence as well. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2016-09-30ext4: use journal inode to determine journal overheadEric Whitney1-2/+17
When a file system contains an internal journal that has not been loaded, use the journal inode's i_size field to determine its contribution to the file system's overhead. (The journal's j_maxlen field is normally used to determine its size, but it's unavailable when the journal has not been loaded.) Signed-off-by: Eric Whitney <enwlinux@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-09-30ext4: create function to read journal inodeEric Whitney1-9/+23
Factor out the code used in ext4_get_journal() to read a valid journal inode from storage, enabling its reuse in other functions. Signed-off-by: Eric Whitney <enwlinux@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-09-06ext4: improve ext4lazyinit scalabilityDmitry Monakhov1-10/+31
ext4lazyinit is a global thread. This thread performs itable initalization under li_list_mtx mutex. It basically does the following: ext4_lazyinit_thread ->mutex_lock(&eli->li_list_mtx); ->ext4_run_li_request(elr) ->ext4_init_inode_table-> Do a lot of IO if the list is large And when new mount/umount arrive they have to block on ->li_list_mtx because lazy_thread holds it during full walk procedure. ext4_fill_super ->ext4_register_li_request ->mutex_lock(&ext4_li_info->li_list_mtx); ->list_add(&elr->lr_request, &ext4_li_info >li_request_list); In my case mount takes 40minutes on server with 36 * 4Tb HDD. Common user may face this in case of very slow dev ( /dev/mmcblkXXX) Even more. If one of filesystems was frozen lazyinit_thread will simply block on sb_start_write() so other mount/umount will be stuck forever. This patch changes logic like follows: - grab ->s_umount read sem before processing new li_request. After that it is safe to drop li_list_mtx because all callers of li_remove_request are holding ->s_umount for write. - li_thread skips frozen SB's Locking order: Mh KOrder is asserted by umount path like follows: s_umount ->li_list_mtx so the only way to to grab ->s_mount inside li_thread is via down_read_trylock xfstests:ext4/023 #PSBM-49658 Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-09-06ext4: enable quota enforcement based on mount optionsJan Kara1-10/+24
When quota information is stored in quota files, we enable only quota accounting on mount and enforcement is enabled only in response to Q_QUOTAON quotactl. To make ext4 behavior consistent with XFS, we add a possibility to enable quota enforcement on mount by specifying corresponding quota mount option (usrquota, grpquota, prjquota). Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-08-29Merge tag 'ext4_for_linus_stable' of ↵Linus Torvalds1-1/+17
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "Fix bugs that could cause kernel deadlocks or file system corruption while moving xattrs to expand the extended inode. Also add some sanity checks to the block group descriptors to make sure we don't end up overwriting the superblock" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: avoid deadlock when expanding inode size ext4: properly align shifted xattrs when expanding inodes ext4: fix xattr shifting when expanding inodes part 2 ext4: fix xattr shifting when expanding inodes ext4: validate that metadata blocks do not overlap superblock ext4: reserve xattr index for the Hurd
2016-08-01ext4: validate that metadata blocks do not overlap superblockTheodore Ts'o1-1/+17
A number of fuzzing failures seem to be caused by allocation bitmaps or other metadata blocks being pointed at the superblock. This can cause kernel BUG or WARNings once the superblock is overwritten, so validate the group descriptor blocks to make sure this doesn't happen. Cc: stable@vger.kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>