Age | Commit message (Collapse) | Author | Files | Lines |
|
commit 3382290ed2d5e275429cef510ab21889d3ccd164 upstream.
[ Note, this is a Git cherry-pick of the following commit:
506458efaf15 ("locking/barriers: Convert users of lockless_dereference() to READ_ONCE()")
... for easier x86 PTI code testing and back-porting. ]
READ_ONCE() now has an implicit smp_read_barrier_depends() call, so it
can be used instead of lockless_dereference() without any change in
semantics.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1508840570-22169-4-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit abdc0eb06964fe1d2fea6dd1391b734d0590365d ]
When session starts beyond offset 2^31 the arithmetics in
udf_check_vsd() would overflow. Make sure the computation is done in
large enough type.
Reported-by: Cezary Sliwa <sliwa@ifpan.edu.pl>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit c79dde629d2027ca80329c62854a7635e623d527 ]
After rmmod 8250.ko
tty_kref_put starts kwork (release_one_tty) to release proc interface
oops when accessing driver->driver_name in proc_tty_unregister_driver
Use jprobe, found driver->driver_name point to 8250.ko
static static struct uart_driver serial8250_reg
.driver_name= serial,
Use name in proc_dir_entry instead of driver->driver_name to fix oops
test on linux 4.1.12:
BUG: unable to handle kernel paging request at ffffffffa01979de
IP: [<ffffffff81310f40>] strchr+0x0/0x30
PGD 1a0d067 PUD 1a0e063 PMD 851c1f067 PTE 0
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: ... ... [last unloaded: 8250]
CPU: 7 PID: 116 Comm: kworker/7:1 Tainted: G O 4.1.12 #1
Hardware name: Insyde RiverForest/Type2 - Board Product Name1, BIOS NE5KV904 12/21/2015
Workqueue: events release_one_tty
task: ffff88085b684960 ti: ffff880852884000 task.ti: ffff880852884000
RIP: 0010:[<ffffffff81310f40>] [<ffffffff81310f40>] strchr+0x0/0x30
RSP: 0018:ffff880852887c90 EFLAGS: 00010282
RAX: ffffffff81a5eca0 RBX: ffffffffa01979de RCX: 0000000000000004
RDX: ffff880852887d10 RSI: 000000000000002f RDI: ffffffffa01979de
RBP: ffff880852887cd8 R08: 0000000000000000 R09: ffff88085f5d94d0
R10: 0000000000000195 R11: 0000000000000000 R12: ffffffffa01979de
R13: ffff880852887d00 R14: ffffffffa01979de R15: ffff88085f02e840
FS: 0000000000000000(0000) GS:ffff88085f5c0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffa01979de CR3: 0000000001a0c000 CR4: 00000000001406e0
Stack:
ffffffff812349b1 ffff880852887cb8 ffff880852887d10 ffff88085f5cd6c2
ffff880852800a80 ffffffffa01979de ffff880852800a84 0000000000000010
ffff88085bb28bd8 ffff880852887d38 ffffffff812354f0 ffff880852887d08
Call Trace:
[<ffffffff812349b1>] ? __xlate_proc_name+0x71/0xd0
[<ffffffff812354f0>] remove_proc_entry+0x40/0x180
[<ffffffff815f6811>] ? _raw_spin_lock_irqsave+0x41/0x60
[<ffffffff813be520>] ? destruct_tty_driver+0x60/0xe0
[<ffffffff81237c68>] proc_tty_unregister_driver+0x28/0x40
[<ffffffff813be548>] destruct_tty_driver+0x88/0xe0
[<ffffffff813be5bd>] tty_driver_kref_put+0x1d/0x20
[<ffffffff813becca>] release_one_tty+0x5a/0xd0
[<ffffffff81074159>] process_one_work+0x139/0x420
[<ffffffff810745a1>] worker_thread+0x121/0x450
[<ffffffff81074480>] ? process_scheduled_works+0x40/0x40
[<ffffffff8107a16c>] kthread+0xec/0x110
[<ffffffff81080000>] ? tg_rt_schedulable+0x210/0x220
[<ffffffff8107a080>] ? kthread_freezable_should_stop+0x80/0x80
[<ffffffff815f7292>] ret_from_fork+0x42/0x70
[<ffffffff8107a080>] ? kthread_freezable_should_stop+0x80/0x80
Signed-off-by: nixiaoming <nixiaoming@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 5e422f5e4fd71d18bc6b851eeb3864477b3d842e ]
There was one spot in xfs_bmap_add_extent_unwritten_real that didn't use the
passed in new extent state but always converted to normal, leading to wrong
behavior when converting from normal to unwritten.
Only found by code inspection, it seems like this code path to move partial
extent from written to unwritten while merging it with the next extent is
rarely exercised.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit ed438b476b611c67089760037139f93ea8ed41d5 ]
For an XFS_IGET_INCORE iget operation, if the inode isn't in the cache,
return ENODATA so that we don't confuse it with the pre-existing ENOENT
cases (inode is in cache, but freed).
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 9f2a4505800607e537e9dd9dea4f55c4b0c30c7a ]
It is possible for mkfs to format very small filesystems with too
small of an internal log with respect to the various minimum size
and block count requirements. If this occurs when the log happens to
be smaller than the scan window used for cycle verification and the
scan wraps the end of the log, the start_blk calculation in
xlog_find_head() underflows and leads to an attempt to scan an
invalid range of log blocks. This results in log recovery failure
and a failed mount.
Since there may be filesystems out in the wild with this kind of
geometry, we cannot simply refuse to mount. Instead, cap the scan
window for cycle verification to the size of the physical log. This
ensures that the cycle verification proceeds as expected when the
scan wraps the end of the log.
Reported-by: Zorro Lang <zlang@redhat.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 9ca2e97fa3c3216200afe35a3b111ec51cc796d2 ]
If 'btrfs_alloc_path()' fails, we must free the resources already
allocated, as done in the other error handling paths in this function.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Qu Wenruo <quwenruo.btrfs@gmx.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 3993b112dac968612b0b213ed59cb30f50b0015b ]
There are checks on fs_info in __btrfs_panic to avoid dereferencing a
null fs_info, however, there is a call to btrfs_crit that may also
dereference a null fs_info. Fix this by adding a check to see if fs_info
is null and only print the s_id if fs_info is non-null.
Detected by CoverityScan CID#401973 ("Dereference after null check")
Fixes: efe120a067c8 ("Btrfs: convert printk to btrfs_ and fix BTRFS prefix")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 0af2c4bf5a012a40a2f9230458087d7f068339d0 ]
When new device is being added to seed FS, seed FS is marked writable,
but when we fail to bring in the new device, we missed to undo the
writable part. This patch fixes it.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 9417ebc8a676487c6ec8825f92fb28f7dbeb5f4b ]
btrfs_udpate_root can fail and it aborts the transaction, the correct
way to handle an aborted transaction is to explicitly end with
btrfs_end_transaction. Even now the code is correct since
btrfs_commit_transaction would handle an aborted transaction but this is
more of an implementation detail. So let's be explicit in handling
failure in btrfs_update_root.
Furthermore btrfs_commit_transaction can also fail and by ignoring it's
return value we could have left the in-memory copy of the root item in
an inconsistent state. So capture the error value which allows us to
correctly revert the RO/RW flags in case of commit failure.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 102ed2c5ff932439bbbe74c7bd63e6d5baa9f732 ]
When one of the device is missing, bbio_error() takes care of setting
the error status. And if its only IO that is pending in that stripe, it
fails to check the status of the other IO at %bbio_error before setting
the error %bi_status for the %orig_bio. Fix this by checking if
%bbio->error has exceeded the %bbio->max_errors.
Reproducer as below fdatasync error is seen intermittently.
mount -o degraded /dev/sdc /btrfs
dd status=none if=/dev/zero of=$(mktemp /btrfs/XXX) bs=4096 count=1 conv=fdatasync
dd: fdatasync failed for ‘/btrfs/LSe’: Input/output error
The reason for the intermittences of the problem is because
the following conditions have to be met, which depends on timing:
In btrfs_map_bio()
- the RAID1 the missing device has to be at %dev_nr = 1
In bbio_error()
. before bbio_error() is called the bio of the not-missing
device at %dev_nr = 0 must be completed so that the below
condition is true
if (atomic_dec_and_test(&bbio->stripes_pending)) {
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit cc555b09d8c3817aeebda43a14ab67049a5653f7 ]
This patch fixes a deadlock caused when the jdata flag is set for
inodes that are already on the ordered write list. Since it is
on the ordered write list, log_flush calls gfs2_ordered_write which
calls filemap_fdatawrite. But since the inode had the jdata flag
set, that calls gfs2_jdata_writepages, which tries to start a new
transaction. A new transaction cannot be started because it tries
to acquire the log_flush rwsem which is already locked by the log
flush operation.
The bottom line is: We cannot switch an inode from ordered to jdata
until we eliminate any ordered data pages (via log flush) or any
log_flush operation afterward will create the circular dependency
above. So we need to flush the log before setting the diskflags to
switch the file mode, then we need to remove the inode from the
ordered writes list.
Before this patch, the log flush was done for jdata->ordered, but
that's wrong. If we're going from jdata to ordered, we don't need
to call gfs2_log_flush because the call to filemap_fdatawrite will
do it for us:
filemap_fdatawrite() -> __filemap_fdatawrite_range()
__filemap_fdatawrite_range() -> do_writepages()
do_writepages() -> gfs2_jdata_writepages()
gfs2_jdata_writepages() -> gfs2_log_flush()
This patch modifies function do_gfs2_set_flags so that if a file
has its jdata flag set, and it's already on the ordered write list,
the log will be flushed and it will be removed from the list
before setting the flag.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Acked-by: Abhijith Das <adas@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 350976ae21873b0d36584ea005076356431b8f79 ]
On truncate down, if new size is not block size aligned, we zero the
rest of block to avoid exposing stale data to user, and
iomap_truncate_page() skips zeroing if the range is already in
unwritten state or a hole. Then we writeback from on-disk i_size to
the new size if this range hasn't been written to disk yet, and
truncate page cache beyond new EOF and set in-core i_size.
The problem is that we could write data between di_size and newsize
before removing the page cache beyond newsize, as the extents may
still be in unwritten state right after a buffer write. As such, the
page of data that newsize lies in has not been zeroed by page cache
invalidation before it is written, and xfs_do_writepage() hasn't
triggered it's "zero data beyond EOF" case because we haven't
updated in-core i_size yet. Then a subsequent mmap read could see
non-zeros past EOF.
I occasionally see this in fsx runs in fstests generic/112, a
simplified fsx operation sequence is like (assuming 4k block size
xfs):
fallocate 0x0 0x1000 0x0 keep_size
write 0x0 0x1000 0x0
truncate 0x0 0x800 0x1000
punch_hole 0x0 0x800 0x800
mapread 0x0 0x800 0x800
where fallocate allocates unwritten extent but doesn't update
i_size, buffer write populates the page cache and extent is still
unwritten, truncate skips zeroing page past new EOF and writes the
page to disk, punch_hole invalidates the page cache, at last mapread
reads the block back and sees non-zero beyond EOF.
Fix it by moving truncate_setsize() to before writeback so the page
cache invalidation zeros the partial page at the new EOF. This also
triggers "zero data beyond EOF" in xfs_do_writepage() at writeback
time, because newsize has been set and page straddles the newsize.
Also fixed the wrong 'end' param of filemap_write_and_wait_range()
call while we're at it, the 'end' is inclusive and should be
'newsize - 1'.
Suggested-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eryu Guan <eguan@redhat.com>
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 9d5afec6b8bd46d6ed821aa1579634437f58ef1f upstream.
On a ppc64 machine, when mounting a fuzzed ext2 image (generated by
fsfuzzer) the following call trace is seen,
VFS: brelse: Trying to free free buffer
WARNING: CPU: 1 PID: 6913 at /root/repos/linux/fs/buffer.c:1165 .__brelse.part.6+0x24/0x40
.__brelse.part.6+0x20/0x40 (unreliable)
.ext4_find_entry+0x384/0x4f0
.ext4_lookup+0x84/0x250
.lookup_slow+0xdc/0x230
.walk_component+0x268/0x400
.path_lookupat+0xec/0x2d0
.filename_lookup+0x9c/0x1d0
.vfs_statx+0x98/0x140
.SyS_newfstatat+0x48/0x80
system_call+0x58/0x6c
This happens because the directory that ext4_find_entry() looks up has
inode->i_size that is less than the block size of the filesystem. This
causes 'nblocks' to have a value of zero. ext4_bread_batch() ends up not
reading any of the directory file's blocks. This renders the entries in
bh_use[] array to continue to have garbage data. buffer_uptodate() on
bh_use[0] can then return a zero value upon which brelse() function is
invoked.
This commit fixes the bug by returning -ENOENT when the directory file
has no associated blocks.
Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 996fc4477a0ea28226b30d175f053fb6f9a4fa36 upstream.
It's possible for ext4_get_acl() to return an ERR_PTR. So we need to
add a check for this case in __ext4_new_inode(). Otherwise on an
error we can end up oops the kernel.
This was getting triggered by xfstests generic/388, which is a test
which exercises the shutdown code path.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit c894aa97577e47d3066b27b32499ecf899bfa8b0 upstream.
Currently, fallocate(2) with KEEP_SIZE followed by a fdatasync(2)
then crash, we'll see wrong allocated block number (stat -c %b), the
blocks allocated beyond EOF are all lost. fstests generic/468
exposes this bug.
Commit 67a7d5f561f4 ("ext4: fix fdatasync(2) after extent
manipulation operations") fixed all the other extent manipulation
operation paths such as hole punch, zero range, collapse range etc.,
but forgot the fallocate case.
So similarly, fix it by recording the correct journal tid in ext4
inode in fallocate(2) path, so that ext4_sync_file() will wait for
the right tid to be committed on fdatasync(2).
This addresses the test failure in xfstests test generic/468.
Signed-off-by: Eryu Guan <eguan@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit fc82228a5e3860502dbf3bfa4a9570cb7093cf7f upstream.
407cd7fb83c0 (ext4: change fast symlink test to not rely on i_blocks)
broke ~10 years old ext3 file systems created by 2.6.17. Any ELF
executable fails because the /lib/ld-linux.so.2 fast symlink
cannot be read anymore.
The patch assumed fast symlinks were created in a specific way,
but that's not true on these really old file systems.
The new behavior is apparently needed only with the large EA inode
feature.
Revert to the old behavior if the large EA inode feature is not set.
This makes my old VM boot again.
Fixes: 407cd7fb83c0 (ext4: change fast symlink test to not rely on i_blocks)
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 779f4e1c6c7c661db40dfebd6dd6bda7b5f88aa3 upstream.
This reverts commit 04e35f4495dd560db30c25efca4eecae8ec8c375.
SELinux runs with secureexec for all non-"noatsecure" domain transitions,
which means lots of processes end up hitting the stack hard-limit change
that was introduced in order to fix a race with prlimit(). That race fix
will need to be redesigned.
Reported-by: Laura Abbott <labbott@redhat.com>
Reported-by: Tomáš Trnka <trnka@scm.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit dc4fd9ab01ab379ae5af522b3efd4187a7c30a31 upstream.
If there were no commit requests, then nfs_commit_inode() should not
wait on the commit or mark the inode dirty, otherwise the following
BUG_ON can be triggered:
[ 1917.130762] kernel BUG at fs/inode.c:578!
[ 1917.130766] Oops: Exception in kernel mode, sig: 5 [#1]
[ 1917.130768] SMP NR_CPUS=2048 NUMA pSeries
[ 1917.130772] Modules linked in: iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi blocklayoutdriver rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc sg nx_crypto pseries_rng ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_common ibmvscsi scsi_transport_srp ibmveth scsi_tgt dm_mirror dm_region_hash dm_log dm_mod
[ 1917.130805] CPU: 2 PID: 14923 Comm: umount.nfs4 Tainted: G ------------ T 3.10.0-768.el7.ppc64 #1
[ 1917.130810] task: c0000005ecd88040 ti: c00000004cea0000 task.ti: c00000004cea0000
[ 1917.130813] NIP: c000000000354178 LR: c000000000354160 CTR: c00000000012db80
[ 1917.130816] REGS: c00000004cea3720 TRAP: 0700 Tainted: G ------------ T (3.10.0-768.el7.ppc64)
[ 1917.130820] MSR: 8000000100029032 <SF,EE,ME,IR,DR,RI> CR: 22002822 XER: 20000000
[ 1917.130828] CFAR: c00000000011f594 SOFTE: 1
GPR00: c000000000354160 c00000004cea39a0 c0000000014c4700 c0000000018cc750
GPR04: 000000000000c750 80c0000000000000 0600000000000000 04eeb76bea749a03
GPR08: 0000000000000034 c0000000018cc758 0000000000000001 d000000005e619e8
GPR12: c00000000012db80 c000000007b31200 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR24: 0000000000000000 c000000000dfc3ec 0000000000000000 c0000005eefc02c0
GPR28: d0000000079dbd50 c0000005b94a02c0 c0000005b94a0250 c0000005b94a01c8
[ 1917.130867] NIP [c000000000354178] .evict+0x1c8/0x350
[ 1917.130871] LR [c000000000354160] .evict+0x1b0/0x350
[ 1917.130873] Call Trace:
[ 1917.130876] [c00000004cea39a0] [c000000000354160] .evict+0x1b0/0x350 (unreliable)
[ 1917.130880] [c00000004cea3a30] [c0000000003558cc] .evict_inodes+0x13c/0x270
[ 1917.130884] [c00000004cea3af0] [c000000000327d20] .kill_anon_super+0x70/0x1e0
[ 1917.130896] [c00000004cea3b80] [d000000005e43e30] .nfs_kill_super+0x20/0x60 [nfs]
[ 1917.130900] [c00000004cea3c00] [c000000000328a20] .deactivate_locked_super+0xa0/0x1b0
[ 1917.130903] [c00000004cea3c80] [c00000000035ba54] .cleanup_mnt+0xd4/0x180
[ 1917.130907] [c00000004cea3d10] [c000000000119034] .task_work_run+0x114/0x150
[ 1917.130912] [c00000004cea3db0] [c00000000001ba6c] .do_notify_resume+0xcc/0x100
[ 1917.130916] [c00000004cea3e30] [c00000000000a7b0] .ret_from_except_lite+0x5c/0x60
[ 1917.130919] Instruction dump:
[ 1917.130921] 7fc3f378 486734b5 60000000 387f00a0 38800003 4bdcb365 60000000 e95f00a0
[ 1917.130927] 694a0060 7d4a0074 794ad182 694a0001 <0b0a0000> 892d02a4 2f890000 40de0134
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 040d786032bf59002d374b86d75b04d97624005c upstream.
Negative child dentry holds reference on inode's alias, it makes
d_prune_aliases() do nothing.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b02a16e6413a2f782e542ef60bad9ff6bf212f8a upstream.
This fixes a regression with readdir of impure dir in overlayfs
that is shared to VM via 9p fs.
Reported-by: Miguel Bernal Marin <miguel.bernal.marin@linux.intel.com>
Fixes: 4edb83bb1041 ("ovl: constant d_ino for non-merge dirs")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Tested-by: Miguel Bernal Marin <miguel.bernal.marin@linux.intel.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 08d8f8a5b094b66b29936e8751b4a818b8db1207 upstream.
Right now we seem to be passing index as "lowerdentry" and origin.dentry
as "upperdentry". IIUC, we should pass these parameters in reversed order
and this looks like a bug.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Amir Goldstein <amir73il@gmail.com>
Fixes: caf70cb2ba5d ("ovl: cleanup orphan index entries")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit bdcf0a423ea1c40bbb40e7ee483b50fc8aa3d758 upstream.
In testing, we found that nfsd threads may call set_groups in parallel
for the same entry cached in auth.unix.gid, racing in the call of
groups_sort, corrupting the groups for that entry and leading to
permission denials for the client.
This patch:
- Make groups_sort globally visible.
- Move the call to groups_sort to the modifiers of group_info
- Remove the call to groups_sort from set_groups
Link: http://lkml.kernel.org/r/20171211151420.18655-1-thiago.becker@gmail.com
Signed-off-by: Thiago Rafael Becker <thiago.becker@gmail.com>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: NeilBrown <neilb@suse.com>
Acked-by: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 302ec300ef8a545a7fc7f667e5fd743b091c2eeb upstream.
Commit ecc0c469f277 ("autofs: don't fail mount for transient error") was
meant to replace an 'if' with a 'switch', but instead added the 'switch'
leaving the case in place.
Link: http://lkml.kernel.org/r/87zi6wstmw.fsf@notabene.neil.brown.name
Fixes: ecc0c469f277 ("autofs: don't fail mount for transient error")
Reported-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: NeilBrown <neilb@suse.com>
Cc: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit a821df3f1af72aa6a0d573eea94a7dd2613e9f4e upstream.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 4d2dc2cc766c3b51929658cacbc6e34fc8e242fb upstream.
Currently, we're capping the values too low in the F_GETLK64 case. The
fields in that structure are 64-bit values, so we shouldn't need to do
any sort of fixup there.
Make sure we check that assumption at build time in the future however
by ensuring that the sizes we're copying will fit.
With this, we no longer need COMPAT_LOFF_T_MAX either, so remove it.
Fixes: 94073ad77fff2 (fs/locks: don't mess with the address limit in compat_fcntl64)
Reported-by: Vitaly Lipatov <lav@etersoft.ru>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit f4b3526d83c40dd8bf5948b9d7a1b2c340f0dcc8 ]
The handler for the CB.ProbeUuid operation in the cache manager is
implemented, but isn't listed in the switch-statement of operation
selection, so won't be used. Fix this by adding it.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 1199db603511d7463d9d3840f96f61967affc766 ]
Fix the total-length calculation in afs_make_call() when the operation
being dispatched has data from a series of pages attached.
Despite the patched code looking like that it should reduce mathematically
to the current code, it doesn't because the 32-bit unsigned arithmetic
being used to calculate the page-offset-difference doesn't correctly extend
to a 64-bit value when the result is effectively negative.
Without this, some FS.StoreData operations that span multiple pages fail,
reporting too little or too much data.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 28cfafb73853f0494b06649716687a3ea07681d5 ]
We need to clear FI_NO_PREALLOC flag in error path of f2fs_file_write_iter,
otherwise we will lose the chance to preallocate blocks in latter write()
at one time.
Fixes: dc91de78e5e1 ("f2fs: do not preallocate blocks which has wrong buffer")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 9280a601e6080c9ff658468c1c775ff6514099a6 ]
Currently we just return err here, but we need to put the fd reference
first.
Fixes: 94073ad77fff (fs/locks: don't mess with the address limit in compat_fcntl64)
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 962cc1ad6caddb5abbb9f0a43e5abe7131a71f18 ]
In commit f2e9ad21 ("xfs: check for race with xfs_reclaim_inode"), we
skip an inode if we're racing with freeing the inode via
xfs_reclaim_inode, but we forgot to release the rcu read lock when
dumping the inode, with the result that we exit to userspace with a lock
held. Don't do that; generic/320 with a 1k block size fails this
very occasionally.
================================================
WARNING: lock held when returning to user space!
4.14.0-rc6-djwong #4 Tainted: G W
------------------------------------------------
rm/30466 is leaving the kernel with locks still held!
1 lock held by rm/30466:
#0: (rcu_read_lock){....}, at: [<ffffffffa01364d3>] xfs_ifree_cluster.isra.17+0x2c3/0x6f0 [xfs]
------------[ cut here ]------------
WARNING: CPU: 1 PID: 30466 at kernel/rcu/tree_plugin.h:329 rcu_note_context_switch+0x71/0x700
Modules linked in: deadline_iosched dm_snapshot dm_bufio ext4 mbcache jbd2 dm_flakey xfs libcrc32c dax_pmem device_dax nd_pmem sch_fq_codel af_packet [last unloaded: scsi_debug]
CPU: 1 PID: 30466 Comm: rm Tainted: G W 4.14.0-rc6-djwong #4
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1djwong0 04/01/2014
task: ffff880037680000 task.stack: ffffc90001064000
RIP: 0010:rcu_note_context_switch+0x71/0x700
RSP: 0000:ffffc90001067e50 EFLAGS: 00010002
RAX: 0000000000000001 RBX: ffff880037680000 RCX: ffff88003e73d200
RDX: 0000000000000002 RSI: ffffffff819e53e9 RDI: ffffffff819f4375
RBP: 0000000000000000 R08: 0000000000000000 R09: ffff880062c900d0
R10: 0000000000000000 R11: 0000000000000000 R12: ffff880037680000
R13: 0000000000000000 R14: ffffc90001067eb8 R15: ffff880037680690
FS: 00007fa3b8ce8700(0000) GS:ffff88003ec00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f69bf77c000 CR3: 000000002450a000 CR4: 00000000000006e0
Call Trace:
__schedule+0xb8/0xb10
schedule+0x40/0x90
exit_to_usermode_loop+0x6b/0xa0
prepare_exit_to_usermode+0x7a/0x90
retint_user+0x8/0x20
RIP: 0033:0x7fa3b87fda87
RSP: 002b:00007ffe41206568 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff02
RAX: 0000000000000000 RBX: 00000000010e88c0 RCX: 00007fa3b87fda87
RDX: 0000000000000000 RSI: 00000000010e89c8 RDI: 0000000000000005
RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000000000
R10: 000000000000015e R11: 0000000000000246 R12: 00000000010c8060
R13: 00007ffe41206690 R14: 0000000000000000 R15: 0000000000000000
---[ end trace e88f83bf0cfbd07d ]---
Fixes: f2e9ad212def50bcf4c098c6288779dd97fff0f0
Cc: Omar Sandoval <osandov@fb.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit d803224c84be067754db7fa58a93f36f61566493 ]
On successful rename, the "old_dentry" is retained and is attached to
the "new_dir", so we need to call nfs_set_verifier() accordingly.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 98159d977f71c3b3dee898d1c34e56f520b094e7 ]
Patch series "A few round_pipe_size() and pipe-max-size fixups", v3.
While backporting Michael's "pipe: fix limit handling" patchset to a
distro-kernel, Mikulas noticed that current upstream pipe limit handling
contains a few problems:
1 - procfs signed wrap: echo'ing a large number into
/proc/sys/fs/pipe-max-size and then cat'ing it back out shows a
negative value.
2 - round_pipe_size() nr_pages overflow on 32bit: this would
subsequently try roundup_pow_of_two(0), which is undefined.
3 - visible non-rounded pipe-max-size value: there is no mutual
exclusion or protection between the time pipe_max_size is assigned
a raw value from proc_dointvec_minmax() and when it is rounded.
4 - unsigned long -> unsigned int conversion makes for potential odd
return errors from do_proc_douintvec_minmax_conv() and
do_proc_dopipe_max_size_conv().
This version underwent the same testing as v1:
https://marc.info/?l=linux-kernel&m=150643571406022&w=2
This patch (of 4):
pipe_max_size is defined as an unsigned int:
unsigned int pipe_max_size = 1048576;
but its procfs/sysctl representation is an integer:
static struct ctl_table fs_table[] = {
...
{
.procname = "pipe-max-size",
.data = &pipe_max_size,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = &pipe_proc_fn,
.extra1 = &pipe_min_size,
},
...
that is signed:
int pipe_proc_fn(struct ctl_table *table, int write, void __user *buf,
size_t *lenp, loff_t *ppos)
{
...
ret = proc_dointvec_minmax(table, write, buf, lenp, ppos)
This leads to signed results via procfs for large values of pipe_max_size:
% echo 2147483647 >/proc/sys/fs/pipe-max-size
% cat /proc/sys/fs/pipe-max-size
-2147483648
Use unsigned operations on this variable to avoid such negative values.
Link: http://lkml.kernel.org/r/1507658689-11669-2-git-send-email-joe.lawrence@redhat.com
Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 692826b2738101549f032a761a9191636e83be4e upstream.
Since commit fb235dc06fa (btrfs: qgroup: Move half of the qgroup
accounting time out of commit trans) the assumption that
btrfs_add_delayed_{data,tree}_ref can only return 0 or -ENOMEM has
been false. The qgroup operations call into btrfs_search_slot
and friends and can now return the full spectrum of error codes.
Fortunately, the fix here is easy since update_ref_for_cow failing
is already handled so we just need to bail early with the error
code.
Fixes: fb235dc06fa (btrfs: qgroup: Move half of the qgroup accounting ...)
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: Edmund Nadolski <enadolski@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit e19182c0fff451e3744c1107d98f072e7ca377a0 upstream.
If btrfs_del_root fails in btrfs_drop_snapshot, we'll pick up the
error but then return 0 anyway due to mixing err and ret.
Fixes: 79787eaab4612 ("btrfs: replace many BUG_ONs with proper error handling")
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 3a2b19d1ee5633f76ae8a88da7bc039a5d1732aa upstream.
Commit efda760fe95ea ("lockd: fix lockd shutdown race") is incorrect,
it removes lockd_manager and disarm grace_period_end for init_net only.
If nfsd was started from another net namespace lockd_up_net() calls
set_grace_period() that adds lockd_manager into per-netns list
and queues grace_period_end delayed work.
These action should be reverted in lockd_down_net().
Otherwise it can lead to double list_add on after restart nfsd in netns,
and to use-after-free if non-disarmed delayed work will be executed after netns destroy.
Fixes: efda760fe95e ("lockd: fix lockd shutdown race")
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 64ebe12494fd5d193f014ce38e1fd83cc57883c8 upstream.
From kernel 4.9, my two nfsv4 servers sometimes suffer from
"panic: unable to handle kernel page request"
in posix_unblock_lock() called from nfs4_laundromat().
These panics diseappear if we revert the commit "nfsd: add a LRU list
for blocked locks".
The cause appears to be a typo in nfs4_laundromat(), which is also
present in nfs4_state_shutdown_net().
Fixes: 7919d0a27f1e "nfsd: add a LRU list for blocked locks"
Cc: jlayton@redhat.com
Reveiwed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit d8a1a000555ecd1b824ac1ed6df8fe364dfbbbb0 upstream.
If nfsd4_process_open2() is initialising a new stateid, and yet the
call to nfs4_get_vfs_file() fails for some reason, then we must
declare the stateid closed, and unhash it before dropping the mutex.
Right now, we unhash the stateid after dropping the mutex, and without
changing the stateid type, meaning that another OPEN could theoretically
look it up and attempt to use it.
Reported-by: Andrew W Elble <aweits@rit.edu>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 15ca08d3299682dc49bad73251677b2c5017ef08 upstream.
Open file stateids can linger on the nfs4_file list of stateids even
after they have been closed. In order to avoid reusing such a
stateid, and confusing the client, we need to recheck the
nfs4_stid's type after taking the mutex.
Otherwise, we risk reusing an old stateid that was already closed,
which will confuse clients that expect new stateids to conform to
RFC7530 Sections 9.1.4.2 and 16.2.5 or RFC5661 Sections 8.2.2 and 18.2.4.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 8e138e0d92c6c9d3d481674fb14e3439b495be37 upstream.
We discovered a box that had double allocations, and suspected the space
cache may be to blame. While auditing the write out path I noticed that
if we've already setup the space cache we will just carry on. This
means that any error we hit after cache_save_setup before we go to
actually write the cache out we won't reset the inode generation, so
whatever was already written will be considered correct, except it'll be
stale. Fix this by _always_ resetting the generation on the block group
inode, this way we only ever have valid or invalid cache.
With this patch I was no longer able to reproduce cache corruption with
dm-log-writes and my bpf error injection tool.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 5d38f049cee1e1c4a7ac55aa79d37d01ddcc3860 upstream.
Commit 42f461482178 ("autofs: fix AT_NO_AUTOMOUNT not being honored")
allowed the fstatat(2) system call to properly honor the AT_NO_AUTOMOUNT
flag but introduced a semantic change.
In order to honor AT_NO_AUTOMOUNT a semantic change was made to the
negative dentry case for stat family system calls in follow_automount().
This changed the unconditional triggering of an automount in this case
to no longer be done and an error returned instead.
This has caused more problems than I expected so reverting the change is
needed.
In a discussion with Neil Brown it was concluded that the automount(8)
daemon can implement this change without kernel modifications. So that
will be done instead and the autofs module documentation updated with a
description of the problem and what needs to be done by module users for
this specific case.
Link: http://lkml.kernel.org/r/151174730120.6162.3848002191530283984.stgit@pluto.themaw.net
Fixes: 42f4614821 ("autofs: fix AT_NO_AUTOMOUNT not being honored")
Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Neil Brown <neilb@suse.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: Colin Walters <walters@redhat.com>
Cc: Ondrej Holy <oholy@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 43694d4bf843ddd34519e8e9de983deefeada699 upstream.
While commit 092a53452bb7 ("autofs: take more care to not update
last_used on path walk") helped (partially) resolve a problem where
automounts were not expiring due to aggressive accesses from user space
it has a side effect for very large environments.
This change helps with the expire problem by making the expire more
aggressive but, for very large environments, that means more mount
requests from clients. When there are a lot of clients that can mean
fairly significant server load increases.
It turns out I put the last_used in this position to solve this very
problem and failed to update my own thinking of the autofs expire
policy. So the patch being reverted introduces a regression which
should be fixed.
Link: http://lkml.kernel.org/r/151174729420.6162.1832622523537052460.stgit@pluto.themaw.net
Fixes: 092a53452b ("autofs: take more care to not update last_used on path walk")
Signed-off-by: Ian Kent <raven@themaw.net>
Reviewed-by: NeilBrown <neilb@suse.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Colin Walters <walters@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Ondrej Holy <oholy@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b6e8e12c0aeb5fbf1bf46c84d58cc93aedede385 upstream.
Commit bc98a42c1f7d ("VFS: Convert sb->s_flags & MS_RDONLY to
sb_rdonly(sb)") converted fat_remount():new_rdonly from a bool to an
int.
However fat_remount() depends upon the compiler's conversion of a
non-zero integer into boolean `true'.
Fix it by switching `new_rdonly' back into a bool.
Link: http://lkml.kernel.org/r/87mv3d5x51.fsf@mail.parknet.co.jp
Fixes: bc98a42c1f7d0f8 ("VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb)")
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Joe Perches <joe@perches.com>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 04e35f4495dd560db30c25efca4eecae8ec8c375 upstream.
While the defense-in-depth RLIMIT_STACK limit on setuid processes was
protected against races from other threads calling setrlimit(), I missed
protecting it against races from external processes calling prlimit().
This adds locking around the change and makes sure that rlim_max is set
too.
Link: http://lkml.kernel.org/r/20171127193457.GA11348@beast
Fixes: 64701dee4178e ("exec: Use sane stack rlimit under secureexec")
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Reported-by: Brad Spengler <spender@grsecurity.net>
Acked-by: Serge Hallyn <serge@hallyn.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 957ac8c421ad8b5eef9b17fe98e146d8311a541e upstream.
PMD faults on a zero length file on a file system mounted with -o dax
will not generate SIGBUS as expected.
fd = open(...O_TRUNC);
addr = mmap(NULL, 2*1024*1024, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
*addr = 'a';
<expect SIGBUS>
The problem is this code in dax_iomap_pmd_fault:
max_pgoff = (i_size_read(inode) - 1) >> PAGE_SHIFT;
If the inode size is zero, we end up with a max_pgoff that is way larger
than 0. :) Fix it by using DIV_ROUND_UP, as is done elsewhere in the
kernel.
I tested this with some simple test code that ensured that SIGBUS was
received where expected.
Fixes: 642261ac995e ("dax: add struct iomap based DAX PMD support")
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit dc3033e16c59a2c4e62b31341258a5786cbcee56 upstream.
lockd_up() can call lockd_unregister_notifiers twice:
inside lockd_start_svc() when it calls lockd_svc_exit_thread()
and then in error path of lockd_up()
Patch forces lockd_start_svc() to unregister notifiers in all error cases
and removes extra unregister in error path of lockd_up().
Fixes: cb7d224f82e4 "lockd: unregister notifier blocks if the service ..."
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 8ee031631546cf2f7859cc69593bd60bbdd70b46 upstream.
Commit fd2421f54423 ("fs/9p: When doing inode lookup compare qid details
and inode mode bits.") transformed v9fs_qid_iget() to use iget5_locked()
instead of iget_locked(). However, the test() callback is not checking
fid.path at all, which means that a lookup in the inode cache can now
accidentally locate a completely wrong inode from the same inode hash
bucket if the other fields (qid.type and qid.version) match.
Fixes: fd2421f54423 ("fs/9p: When doing inode lookup compare qid details and inode mode bits.")
Reviewed-by: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Tuomas Tynkkynen <tuomas@tuxera.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit e9072d859df3e0f2c3ba450f0d1739595c2d5d13 upstream.
The current code has the potential for data corruption when changing an
inode's journaling mode, as that can result in a subsequent unsafe change
in S_DAX.
I've captured an instance of this data corruption in the following fstest:
https://patchwork.kernel.org/patch/9948377/
Prevent this data corruption from happening by disallowing changes to the
journaling mode if the '-o dax' mount option was used. This means that for
a given filesystem we could have a mix of inodes using either DAX or
data journaling, but whatever state the inodes are in will be held for the
duration of the mount.
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 559db4c6d784ceedc2a5418ced4d357cb843e221 upstream.
If an inode has inline data it is currently prevented from using DAX by a
check in ext4_set_inode_flags(). When the inode grows inline data via
ext4_create_inline_data() or removes its inline data via
ext4_destroy_inline_data_nolock(), the value of S_DAX can change.
Currently these changes are unsafe because we don't hold off page faults
and I/O, write back dirty radix tree entries and invalidate all mappings.
There are also issues with mm-level races when changing the value of S_DAX,
as well as issues with the VM_MIXEDMAP flag:
https://www.spinics.net/lists/linux-xfs/msg09859.html
The unsafe transition of S_DAX can reliably cause data corruption, as shown
by the following fstest:
https://patchwork.kernel.org/patch/9948381/
Fix this issue by preventing the DAX mount option from being used on
filesystems that were created to support inline data. Inline data is an
option given to mkfs.ext4.
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 51e3ae81ec58e95f10a98ef3dd6d7bce5d8e35a2 upstream.
If there are pending writes subject to delayed allocation, then i_size
will show size after the writes have completed, while i_disksize
contains the value of i_size on the disk (since the writes have not
been persisted to disk).
If fallocate(2) is called with the FALLOC_FL_KEEP_SIZE flag, either
with or without the FALLOC_FL_ZERO_RANGE flag set, and the new size
after the fallocate(2) is between i_size and i_disksize, then after a
crash, if a journal commit has resulted in the changes made by the
fallocate() call to be persisted after a crash, but the delayed
allocation write has not resolved itself, i_size would not be updated,
and this would cause the following e2fsck complaint:
Inode 12, end of extent exceeds allowed value
(logical block 33, physical block 33441, len 7)
This can only take place on a sparse file, where the fallocate(2) call
is allocating blocks in a range which is before a pending delayed
allocation write which is extending i_size. Since this situation is
quite rare, and the window in which the crash must take place is
typically < 30 seconds, in practice this condition will rarely happen.
Nevertheless, it can be triggered in testing, and in particular by
xfstests generic/456.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reported-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|