Age | Commit message (Collapse) | Author | Files | Lines |
|
commit ce85cfbed6fe3dbc01bd1976b23ac3e97878cde6 upstream.
If a READDIR reply comes back without any page data, avoid a NULL pointer
dereference in xdr_copy_to_scratch().
BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
IP: [<ffffffff813a378d>] memcpy+0xd/0x110
...
Call Trace:
? xdr_inline_decode+0x7a/0xb0 [sunrpc]
nfs3_decode_dirent+0x73/0x320 [nfsv3]
nfs_readdir_page_filler+0xd5/0x4e0 [nfs]
? nfs3_rpc_wrapper.constprop.9+0x42/0xc0 [nfsv3]
nfs_readdir_xdr_to_array+0x1fa/0x330 [nfs]
? mem_cgroup_commit_charge+0xac/0x160
? nfs_readdir_xdr_to_array+0x330/0x330 [nfs]
nfs_readdir_filler+0x22/0x90 [nfs]
do_read_cache_page+0x7e/0x1a0
read_cache_page+0x1c/0x20
nfs_readdir+0x18e/0x660 [nfs]
? nfs3_xdr_dec_getattr3res+0x80/0x80 [nfsv3]
iterate_dir+0x97/0x130
SyS_getdents+0x94/0x120
? fillonedir+0xd0/0xd0
system_call_fastpath+0x12/0x17
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: Neil Brown <nfbrown@suse.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 5d05e54af3cdbb13cf19c557ff2184781b91a22c upstream.
Chuck pointed out a problem that crept in with commit 6ffa30d3f734 (nfs:
don't call blocking operations while !TASK_RUNNING). Linux counts tasks
in uninterruptible sleep against the load average, so this caused the
system's load average to be pinned at at least 1 when there was a
NFSv4.1+ mount active.
Not a huge problem, but it's probably worth fixing before we get too
many complaints about it. This patch converts the code back to use
TASK_INTERRUPTIBLE sleep, simply has it flush any signals on each loop
iteration. In practice no one should really be signalling this thread at
all, so I think this is reasonably safe.
With this change, there's also no need to game the hung task watchdog so
we can also convert the schedule_timeout call back to a normal schedule.
Reported-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Tested-by: Chuck Lever <chuck.lever@oracle.com>
Fixes: commit 6ffa30d3f734 (“nfs: don't call blocking . . .”)
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit d9dfd8d741683347ee159d25f5b50c346a0df557 upstream.
In the case where d_add_unique() finds an appropriate alias to use it will
have already incremented the reference count. An additional dget() to swap
the open context's dentry is unnecessary and will leak a reference.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Fixes: 275bb307865a3 ("NFSv4: Move dentry instantiation into the NFSv4-...")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
mount
commit a41cbe86df3afbc82311a1640e20858c0cd7e065 upstream.
A test case is as the description says:
open(foobar, O_WRONLY);
sleep() --> reboot the server
close(foobar)
The bug is because in nfs4state.c in nfs4_reclaim_open_state() a few
line before going to restart, there is
clear_bit(NFS4CLNT_RECLAIM_NOGRACE, &state->flags).
NFS4CLNT_RECLAIM_NOGRACE is a flag for the client states not open
owner states. Value of NFS4CLNT_RECLAIM_NOGRACE is 4 which is the
value of NFS_O_WRONLY_STATE in nfs4_state->flags. So clearing it wipes
out state and when we go to close it, “call_close” doesn’t get set as
state flag is not set and CLOSE doesn’t go on the wire.
Signed-off-by: Olga Kornievskaia <aglo@umich.edu>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 361cad3c89070aeb37560860ea8bfc092d545adc upstream.
We've seen this in a packet capture - I've intermixed what I
think was going on. The fix here is to grab the so_lock sooner.
1964379 -> #1 open (for write) reply seqid=1
1964393 -> #2 open (for read) reply seqid=2
__nfs4_close(), state->n_wronly--
nfs4_state_set_mode_locked(), changes state->state = [R]
state->flags is [RW]
state->state is [R], state->n_wronly == 0, state->n_rdonly == 1
1964398 -> #3 open (for write) call -> because close is already running
1964399 -> downgrade (to read) call seqid=2 (close of #1)
1964402 -> #3 open (for write) reply seqid=3
__update_open_stateid()
nfs_set_open_stateid_locked(), changes state->flags
state->flags is [RW]
state->state is [R], state->n_wronly == 0, state->n_rdonly == 1
new sequence number is exposed now via nfs4_stateid_copy()
next step would be update_open_stateflags(), pending so_lock
1964403 -> downgrade reply seqid=2, fails with OLD_STATEID (close of #1)
nfs4_close_prepare() gets so_lock and recalcs flags -> send close
1964405 -> downgrade (to read) call seqid=3 (close of #1 retry)
__update_open_stateid() gets so_lock
* update_open_stateflags() updates state->n_wronly.
nfs4_state_set_mode_locked() updates state->state
state->flags is [RW]
state->state is [RW], state->n_wronly == 1, state->n_rdonly == 1
* should have suppressed the preceding nfs4_close_prepare() from
sending open_downgrade
1964406 -> write call
1964408 -> downgrade (to read) reply seqid=4 (close of #1 retry)
nfs_clear_open_stateid_locked()
state->flags is [R]
state->state is [RW], state->n_wronly == 1, state->n_rdonly == 1
1964409 -> write reply (fails, openmode)
Signed-off-by: Andrew Elble <aweits@rit.edu>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit c812012f9ca7cf89c9e1a1cd512e6c3b5be04b85 upstream.
If we pass in an empty nfs_fattr struct to nfs_update_inode, it will
(correctly) not update any of the attributes, but it then clears the
NFS_INO_INVALID_ATTR flag, which indicates that the attributes are
up to date. Don't clear the flag if the fattr struct has no valid
attrs to apply.
Reviewed-by: Steve French <steve.french@primarydata.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit c68a027c05709330fe5b2f50c50d5fa02124b5d8 upstream.
If clp->cl_cb_ident is zero, then nfs_cb_idr_remove_locked() skips removing
it when the nfs_client is freed. A decoding or server bug can then find
and try to put that first nfs_client which would lead to a crash.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Fixes: d6870312659d ("nfs4client: convert to idr_alloc()")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit e9ae58aeee8842a50f7e199d602a5ccb2e41a95f upstream.
We should ensure that we always set the pgio_header's error field
if a READ or WRITE RPC call returns an error. The current code depends
on 'hdr->good_bytes' always being initialised to a large value, which
is not always done correctly by callers.
When this happens, applications may end up missing important errors.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit efcbc04e16dfa95fef76309f89710dd1d99a5453 upstream.
It is unusual to combine the open flags O_RDONLY and O_EXCL, but
it appears that libre-office does just that.
[pid 3250] stat("/home/USER/.config", {st_mode=S_IFDIR|0700, st_size=8192, ...}) = 0
[pid 3250] open("/home/USER/.config/libreoffice/4-suse/user/extensions/buildid", O_RDONLY|O_EXCL <unfinished ...>
NFSv4 takes O_EXCL as a sign that a setattr command should be sent,
probably to reset the timestamps.
When it was an O_RDONLY open, the SETATTR command does not
identify any actual attributes to change.
If no delegation was provided to the open, the SETATTR uses the
all-zeros stateid and the request is accepted (at least by the
Linux NFS server - no harm, no foul).
If a read-delegation was provided, this is used in the SETATTR
request, and a Netapp filer will justifiably claim
NFS4ERR_BAD_STATEID, which the Linux client takes as a sign
to retry - indefinitely.
So only treat O_EXCL specially if O_CREAT was also given.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit e8d975e73e5fa05f983fbf2723120edcf68e0b38 upstream.
Problem: When an operation like WRITE receives a BAD_STATEID, even though
recovery code clears the RECLAIM_NOGRACE recovery flag before recovering
the open state, because of clearing delegation state for the associated
inode, nfs_inode_find_state_and_recover() gets called and it makes the
same state with RECLAIM_NOGRACE flag again. As a results, when we restart
looking over the open states, we end up in the infinite loop instead of
breaking out in the next test of state flags.
Solution: unset the RECLAIM_NOGRACE set because of
calling of nfs_inode_find_state_and_recover() after returning from calling
recover_open() function.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit d683cc49daf7c5afca8cd9654aaa1bf63cdf2ad9 upstream.
When encoding the NFSACL SETACL operation, reserve just the estimated
size of the ACL rather than a fixed maximum. This eliminates needless
zero padding on the wire that the server ignores.
Fixes: ee5dc7732bd5 ('NFS: Fix "kernel BUG at fs/nfs/nfs3xdr.c:1338!"')
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit f044636d972246d451e06226cc1675d5da389762 upstream.
Ensure that other operations which raced with our setattr RPC call
cannot revert the file attribute changes that were made on the server.
To do so, we artificially bump the attribute generation counter on
the inode so that all calls to nfs_fattr_init() that precede ours
will be dropped.
The motivation for the patch came from Chuck Lever's reports of readaheads
racing with truncate operations and causing the file size to be reverted.
Reported-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Tested-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 7c0af9ffb7bb4e5355470fa60b3eb711ddf226fa upstream.
put_rpccred() can sleep.
Fixes: 8f649c3762547 ("NFSv4: Fix the locking in nfs_inode_reclaim_delegation()")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit d8ba1f971497c19cf80da1ea5391a46a5f9fbd41 upstream.
If the call to decode_rc_list() fails due to a memory allocation error,
then we need to truncate the array size to ensure that we only call
kfree() on those pointer that were allocated.
Reported-by: David Ramos <daramos@stanford.edu>
Fixes: 4aece6a19cf7f ("nfs41: cb_sequence xdr implementation")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 6ffa30d3f734d4f6b478081dfc09592021028f90 upstream.
Bruce reported seeing this warning pop when mounting using v4.1:
------------[ cut here ]------------
WARNING: CPU: 1 PID: 1121 at kernel/sched/core.c:7300 __might_sleep+0xbd/0xd0()
do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff810ff58f>] prepare_to_wait+0x2f/0x90
Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc fscache ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_pcm snd_timer ppdev joydev snd virtio_console virtio_balloon pcspkr serio_raw parport_pc parport pvpanic floppy soundcore i2c_piix4 virtio_blk virtio_net qxl drm_kms_helper ttm drm virtio_pci virtio_ring ata_generic virtio pata_acpi
CPU: 1 PID: 1121 Comm: nfsv4.1-svc Not tainted 3.19.0-rc4+ #25
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153950- 04/01/2014
0000000000000000 000000004e5e3f73 ffff8800b998fb48 ffffffff8186ac78
0000000000000000 ffff8800b998fba0 ffff8800b998fb88 ffffffff810ac9da
ffff8800b998fb68 ffffffff81c923e7 00000000000004d9 0000000000000000
Call Trace:
[<ffffffff8186ac78>] dump_stack+0x4c/0x65
[<ffffffff810ac9da>] warn_slowpath_common+0x8a/0xc0
[<ffffffff810aca65>] warn_slowpath_fmt+0x55/0x70
[<ffffffff810ff58f>] ? prepare_to_wait+0x2f/0x90
[<ffffffff810ff58f>] ? prepare_to_wait+0x2f/0x90
[<ffffffff810dd2ad>] __might_sleep+0xbd/0xd0
[<ffffffff8124c973>] kmem_cache_alloc_trace+0x243/0x430
[<ffffffff810d941e>] ? groups_alloc+0x3e/0x130
[<ffffffff810d941e>] groups_alloc+0x3e/0x130
[<ffffffffa0301b1e>] svcauth_unix_accept+0x16e/0x290 [sunrpc]
[<ffffffffa0300571>] svc_authenticate+0xe1/0xf0 [sunrpc]
[<ffffffffa02fc564>] svc_process_common+0x244/0x6a0 [sunrpc]
[<ffffffffa02fd044>] bc_svc_process+0x1c4/0x260 [sunrpc]
[<ffffffffa03d5478>] nfs41_callback_svc+0x128/0x1f0 [nfsv4]
[<ffffffff810ff970>] ? wait_woken+0xc0/0xc0
[<ffffffffa03d5350>] ? nfs4_callback_svc+0x60/0x60 [nfsv4]
[<ffffffff810d45bf>] kthread+0x11f/0x140
[<ffffffff810ea815>] ? local_clock+0x15/0x30
[<ffffffff810d44a0>] ? kthread_create_on_node+0x250/0x250
[<ffffffff81874bfc>] ret_from_fork+0x7c/0xb0
[<ffffffff810d44a0>] ? kthread_create_on_node+0x250/0x250
---[ end trace 675220a11e30f4f2 ]---
nfs41_callback_svc does most of its work while in TASK_INTERRUPTIBLE,
which is just wrong. Fix that by finishing the wait immediately if we've
found that the list has something on it.
Also, we don't expect this kthread to accept signals, so we should be
using a TASK_UNINTERRUPTIBLE sleep instead. That however, opens us up
hung task warnings from the watchdog, so have the schedule_timeout
wake up every 60s if there's no callback activity.
Reported-by: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 3175e1dcec40fab1a444c010087f2068b6b04732 upstream.
If we start state recovery on a client that failed to initialise correctly,
then we are very likely to Oops.
Reported-by: "Mkrtchyan, Tigran" <tigran.mkrtchyan@desy.de>
Link: http://lkml.kernel.org/r/130621862.279655.1421851650684.JavaMail.zimbra@desy.de
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit ee8a1a8b160a87dc3a9c81a86796aa4db85ea815 upstream.
We only support swap file calling nfs_direct_IO. However, application
might be able to get to nfs_direct_IO if it toggles O_DIRECT flag
during IO and it can deadlock because we grab inode->i_mutex in
nfs_file_direct_write(). So return 0 for such case. Then the generic
layer will fall back to buffer IO.
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 946e51f2bf37f1656916eb75bd0742ba33983c28 upstream.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 1fc0703af3143914a389bfa081c7acb09502ed5d upstream.
Currently, our trunking code will check for session trunking, but will
fail to detect client id trunking. This is a problem, because it means
that the client will fail to recognise that the two connections represent
shared state, even if they do not permit a shared session.
By removing the check for the server minor id, and only checking the
major id, we will end up doing the right thing in both cases: we close
down the new nfs_client and fall back to using the existing one.
Fixes: 05f4c350ee02e ("NFS: Discover NFSv4 server trunking when mounting")
Cc: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 4bd5a980de87d2b5af417485bde97b8eb3d6cf6a upstream.
nfs4_layoutget_release() drops layout hdr refcnt. Grab the refcnt
early so that it is safe to call .release in case nfs4_alloc_pages
fails.
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Fixes: a47970ff78147 ("NFSv4.1: Hold reference to layout hdr in layoutget")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 0c116cadd94b16b30b1dd90d38b2784d9b39b01a upstream.
This patch removes the assumption made previously, that we only need to
check the delegation stateid when it matches the stateid on a cached
open.
If we believe that we hold a delegation for this file, then we must assume
that its stateid may have been revoked or expired too. If we don't test it
then our state recovery process may end up caching open/lock state in a
situation where it should not.
We therefore rename the function nfs41_clear_delegation_stateid as
nfs41_check_delegation_stateid, and change it to always run through the
delegation stateid test and recovery process as outlined in RFC5661.
http://lkml.kernel.org/r/CAN-5tyHwG=Cn2Q9KsHWadewjpTTy_K26ee+UnSvHvG4192p-Xw@mail.gmail.com
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 869f9dfa4d6d57b79e0afc3af14772c2a023eeb1 upstream.
Any attempt to call nfs_remove_bad_delegation() while a delegation is being
returned is currently a no-op. This means that we can end up looping
forever in nfs_end_delegation_return() if something causes the delegation
to be revoked.
This patch adds a mechanism whereby the state recovery code can communicate
to the delegation return code that the delegation is no longer valid and
that it should not be used when reclaiming state.
It also changes the return value for nfs4_handle_delegation_recall_error()
to ensure that nfs_end_delegation_return() does not reattempt the lock
reclaim before state recovery is done.
http://lkml.kernel.org/r/CAN-5tyHwG=Cn2Q9KsHWadewjpTTy_K26ee+UnSvHvG4192p-Xw@mail.gmail.com
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 16caf5b6101d03335b386e77e9e14136f989be87 upstream.
Variable 'err' needn't be initialized when nfs_getattr() uses it to
check whether it should call generic_fillattr() or not. That can result
in spurious error returns. Initialize 'err' properly.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit f8ebf7a8ca35dde321f0cd385fee6f1950609367 upstream.
If state recovery failed, then we should not attempt to reclaim delegated
state.
http://lkml.kernel.org/r/CAN-5tyHwG=Cn2Q9KsHWadewjpTTy_K26ee+UnSvHvG4192p-Xw@mail.gmail.com
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 4dfd4f7af0afd201706ad186352ca423b0f17d4b upstream.
NFSv4.0 does not have TEST_STATEID/FREE_STATEID functionality, so
unlike NFSv4.1, the recovery procedure when stateids have expired or
have been revoked requires us to just forget the delegation.
http://lkml.kernel.org/r/CAN-5tyHwG=Cn2Q9KsHWadewjpTTy_K26ee+UnSvHvG4192p-Xw@mail.gmail.com
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 8c393f9a721c30a030049a680e1bf896669bb279 upstream.
For pNFS direct writes, layout driver may dynamically allocate ds_cinfo.buckets.
So we need to take care to free them when freeing dreq.
Ideally this needs to be done inside layout driver where ds_cinfo.buckets
are allocated. But buckets are attached to dreq and reused across LD IO iterations.
So I feel it's OK to free them in the generic layer.
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit d1f456b0b9545f1606a54cd17c20775f159bd2ce upstream.
Commit 2f60ea6b8ced ("NFSv4: The NFSv4.0 client must send RENEW calls if it holds a delegation") set the NFS4_RENEW_TIMEOUT flag in nfs4_renew_state, and does
not put an nfs41_proc_async_sequence call, the NFSv4.1 lease renewal heartbeat
call, on the wire to renew the NFSv4.1 state if the flag was not set.
The NFS4_RENEW_TIMEOUT flag is set when "now" is after the last renewal
(cl_last_renewal) plus the lease time divided by 3. This is arbitrary and
sometimes does the following:
In normal operation, the only way a future state renewal call is put on the
wire is via a call to nfs4_schedule_state_renewal, which schedules a
nfs4_renew_state workqueue task. nfs4_renew_state determines if the
NFS4_RENEW_TIMEOUT should be set, and the calls nfs41_proc_async_sequence,
which only gets sent if the NFS4_RENEW_TIMEOUT flag is set.
Then the nfs41_proc_async_sequence rpc_release function schedules
another state remewal via nfs4_schedule_state_renewal.
Without this change we can get into a state where an application stops
accessing the NFSv4.1 share, state renewal calls stop due to the
NFS4_RENEW_TIMEOUT flag _not_ being set. The only way to recover
from this situation is with a clientid re-establishment, once the application
resumes and the server has timed out the lease and so returns
NFS4ERR_BAD_SESSION on the subsequent SEQUENCE operation.
An example application:
open, lock, write a file.
sleep for 6 * lease (could be less)
ulock, close.
In the above example with NFSv4.1 delegations enabled, without this change,
there are no OP_SEQUENCE state renewal calls during the sleep, and the
clientid is recovered due to lease expiration on the close.
This issue does not occur with NFSv4.1 delegations disabled, nor with
NFSv4.0, with or without delegations enabled.
Signed-off-by: Andy Adamson <andros@netapp.com>
Link: http://lkml.kernel.org/r/1411486536-23401-1-git-send-email-andros@netapp.com
Fixes: 2f60ea6b8ced (NFSv4: The NFSv4.0 client must send RENEW calls...)
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit df817ba35736db2d62b07de6f050a4db53492ad8 upstream.
The current open/lock state recovery unfortunately does not handle errors
such as NFS4ERR_CONN_NOT_BOUND_TO_SESSION correctly. Instead of looping,
just proceeds as if the state manager is finished recovering.
This patch ensures that we loop back, handle higher priority errors
and complete the open/lock state recovery.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit a4339b7b686b4acc8b6de2b07d7bacbe3ae44b83 upstream.
If a NFSv4.x server returns NFS4ERR_STALE_CLIENTID in response to a
CREATE_SESSION or SETCLIENTID_CONFIRM in order to tell us that it rebooted
a second time, then the client will currently take this to mean that it must
declare all locks to be stale, and hence ineligible for reboot recovery.
RFC3530 and RFC5661 both suggest that the client should instead rely on the
server to respond to inelegible open share, lock and delegation reclaim
requests with NFS4ERR_NO_GRACE in this situation.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 92a56555bd576c61b27a5cab9f38a33a1e9a1df5 upstream.
If a SIGKILL is sent to a task waiting in __nfs_iocounter_wait,
it will busy-wait or soft lockup in its while loop.
nfs_wait_bit_killable won't sleep, and the loop won't exit on
the error return.
Stop the busy-wait by breaking out of the loop when
nfs_wait_bit_killable returns an error.
Signed-off-by: David Jeffery <djeffery@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: Neil Brown <nfbrown@suse.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit cd9288ffaea4359d5cfe2b8d264911506aed26a4 upstream.
James Drew reports another bug whereby the NFS client is now sending
an OPEN_DOWNGRADE in a situation where it should really have sent a
CLOSE: the client is opening the file for O_RDWR, but then trying to
do a downgrade to O_RDONLY, which is not allowed by the NFSv4 spec.
Reported-by: James Drews <drews@engr.wisc.edu>
Link: http://lkml.kernel.org/r/541AD7E5.8020409@engr.wisc.edu
Fixes: aee7af356e15 (NFSv4: Fix problems with close in the presence...)
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 080af20cc945d110f9912d01cf6b66f94a375b8d upstream.
There is a race between nfs4_state_manager() and
nfs_server_remove_lists() that happens during a nfsv3 mount.
The v3 mount notices there is already a supper block so
nfs_server_remove_lists() called which uses the nfs_client_lock
spin lock to synchronize access to the client list.
At the same time nfs4_state_manager() is running through
the client list looking for work to do, using the same
lock. When nfs4_state_manager() wins the race to the
list, a v3 client pointer is found and not ignored
properly which causes the panic.
Moving some protocol checks before the state checking
avoids the panic.
Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit e7b563bb2a6f4d974208da46200784b9c5b5a47e upstream.
The radix tree hole searching code is only used for page cache, for
example the readahead code trying to get a a picture of the area
surrounding a fault.
It sufficed to rely on the radix tree definition of holes, which is
"empty tree slot". But this is about to change, though, as shadow page
descriptors will be stored in the page cache after the actual pages get
evicted from memory.
Move the functions over to mm/filemap.c and make them native page cache
operations, where they can later be adapted to handle the new definition
of "page cache hole".
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bob Liu <bob.liu@oracle.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Metin Doslu <metin@citusdata.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Ozgun Erdogan <ozgun@citusdata.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin <klamm@yandex-team.ru>
Cc: Ryan Mallon <rmallon@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit aee7af356e151494d5014f57b33460b162f181b5 upstream.
In the presence of delegations, we can no longer assume that the
state->n_rdwr, state->n_rdonly, state->n_wronly reflect the open
stateid share mode, and so we need to calculate the initial value
for calldata->arg.fmode using the state->flags.
Reported-by: James Drews <drews@engr.wisc.edu>
Fixes: 88069f77e1ac5 (NFSv41: Fix a potential state leakage when...)
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 18dd78c427513fb0f89365138be66e6ee8700d1b upstream.
NFS_INO_INVALID_DATA cannot be ignored, even if we have a delegation.
We're still having some problems with data corruption when multiple
clients are appending to a file and those clients are being granted
write delegations on open.
To reproduce:
Client A:
vi /mnt/`hostname -s`
while :; do echo "XXXXXXXXXXXXXXX" >>/mnt/file; sleep $(( $RANDOM % 5 )); done
Client B:
vi /mnt/`hostname -s`
while :; do echo "YYYYYYYYYYYYYYY" >>/mnt/file; sleep $(( $RANDOM % 5 )); done
What's happening is that in nfs_update_inode() we're recognizing that
the file size has changed and we're setting NFS_INO_INVALID_DATA
accordingly, but then we ignore the cache_validity flags in
nfs_write_pageuptodate() because we have a delegation. As a result,
in nfs_updatepage() we're extending the write to cover the full page
even though we've not read in the data to begin with.
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit a914722f333b3359d2f4f12919380a334176bb89 upstream.
Otherwise the kernel oopses when remounting with IPv6 server because
net is dereferenced in dev_get_by_name.
Use net ns of current thread so that dev_get_by_name does not operate on
foreign ns. Changing the address is prohibited anyway so this should not
affect anything.
Signed-off-by: Mateusz Guzik <mguzik@redhat.com>
Cc: linux-nfs@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 43b6535e717d2f656f71d9bd16022136b781c934 upstream.
Fix a bug, whereby nfs_update_inode() was declaring the inode to be
up to date despite not having checked all the attributes.
The bug occurs because the temporary variable in which we cache
the validity information is 'sanitised' before reapplying to
nfsi->cache_validity.
Reported-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 6df200f5d5191bdde4d2e408215383890f956781 upstream.
Return the NULL pointer when the allocation fails.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit e911b8158ee1def8153849b1641b736026b036e0 upstream.
If we interrupt the nfs4_wait_for_completion_rpc_task() call in
nfs4_run_open_task(), then we don't prevent the RPC call from
completing. So freeing up the opendata->f_attr.mdsthreshold
in the error path in _nfs4_do_open() leads to a use-after-free
when the XDR decoder tries to decode the mdsthreshold information
from the server.
Fixes: 82be417aa37c0 (NFSv4.1 cache mdsthreshold values on OPEN)
Tested-by: Steve Dickson <SteveD@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 8f493b9cfcd8941c6b27d6ce8e3b4a78c094b3c1 upstream.
nfs3_proc_setacls is used internally by the NFSv3 create operations
to set the acl after the file has been created. If the operation
fails because the server doesn't support acls, then it must return '0',
not -EOPNOTSUPP.
Reported-by: Russell King <linux@arm.linux.org.uk>
Link: http://lkml.kernel.org/r/20140201010328.GI15937@n2100.arm.linux.org.uk
Cc: Christoph Hellwig <hch@lst.de>
Tested-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit a1800acaf7d1c2bf6d68b9a8f4ab8560cc66555a upstream.
Avoid returning incorrect acl mask attributes when the server doesn't
support ACLs.
Signed-off-by: Malahal Naineni <malahal@us.ibm.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit f9c96fcc501a43dbc292b17fc0ded4b54e63b79d upstream.
Fix a dynamic session slot leak where a slot is preallocated and I/O is
resent through the MDS.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 4db72b40fdbc706f8957e9773ae73b1574b8c694 upstream.
If the setting of NFS_INO_INVALIDATING gets reordered to before the
clearing of NFS_INO_INVALID_DATA, then another task may hit a race
window where both appear to be clear, even though the inode's pages are
still in need of invalidation. Fix this by adding the appropriate memory
barriers.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 17dfeb9113397a6119091a491ef7182649f0c5a9 upstream.
Commit d529ef83c355f97027ff85298a9709fe06216a66 (NFS: fix the handling
of NFS_INO_INVALID_DATA flag in nfs_revalidate_mapping) introduces
a potential race, since it doesn't test the value of nfsi->cache_validity
and set the bitlock in nfsi->flags atomically.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit d529ef83c355f97027ff85298a9709fe06216a66 upstream.
There is a possible race in how the nfs_invalidate_mapping function is
handled. Currently, we go and invalidate the pages in the file and then
clear NFS_INO_INVALID_DATA.
The problem is that it's possible for a stale page to creep into the
mapping after the page was invalidated (i.e., via readahead). If another
writer comes along and sets the flag after that happens but before
invalidate_inode_pages2 returns then we could clear the flag
without the cache having been properly invalidated.
So, we must clear the flag first and then invalidate the pages. Doing
this however, opens another race:
It's possible to have two concurrent read() calls that end up in
nfs_revalidate_mapping at the same time. The first one clears the
NFS_INO_INVALID_DATA flag and then goes to call nfs_invalidate_mapping.
Just before calling that though, the other task races in, checks the
flag and finds it cleared. At that point, it trusts that the mapping is
good and gets the lock on the page, allowing the read() to be satisfied
from the cache even though the data is no longer valid.
These effects are easily manifested by running diotest3 from the LTP
test suite on NFS. That program does a series of DIO writes and buffered
reads. The operations are serialized and page-aligned but the existing
code fails the test since it occasionally allows a read to come out of
the cache incorrectly. While mixing direct and buffered I/O isn't
recommended, I believe it's possible to hit this in other ways that just
use buffered I/O, though that situation is much harder to reproduce.
The problem is that the checking/clearing of that flag and the
invalidation of the mapping really need to be atomic. Fix this by
serializing concurrent invalidations with a bitlock.
At the same time, we also need to allow other places that check
NFS_INO_INVALID_DATA to check whether we might be in the middle of
invalidating the file, so fix up a couple of places that do that
to look for the new NFS_INO_INVALIDATING flag.
Doing this requires us to be careful not to set the bitlock
unnecessarily, so this code only does that if it believes it will
be doing an invalidation.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 471252cd8b34b0609973740b25dcd1ff01dc1889 upstream.
cond_resched_lock(cinfo->lock) is called everywhere else while holding
the cinfo->lock spinlock. Not holding this lock while calling
transfer_commit_list in filelayout_recover_commit_reqs causes the BUG
below.
It's true that we can't hold this lock while calling pnfs_put_lseg,
because that might try to lock the inode lock - which might be the
same lock as cinfo->lock.
To reproduce, mount a 2 DS pynfs server and run an O_DIRECT command
that crosses a stripe boundary and is not page aligned, such as:
dd if=/dev/zero of=/mnt/f bs=17000 count=1 oflag=direct
BUG: sleeping function called from invalid context at linux/fs/nfs/nfs4filelayout.c:1161
in_atomic(): 0, irqs_disabled(): 0, pid: 27, name: kworker/0:1
2 locks held by kworker/0:1/27:
#0: (events){.+.+.+}, at: [<ffffffff810501d7>] process_one_work+0x175/0x3a5
#1: ((&dreq->work)){+.+...}, at: [<ffffffff810501d7>] process_one_work+0x175/0x3a5
CPU: 0 PID: 27 Comm: kworker/0:1 Not tainted 3.13.0-rc3-branch-dros_testing+ #21
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
Workqueue: events nfs_direct_write_schedule_work [nfs]
0000000000000000 ffff88007a39bbb8 ffffffff81491256 ffff88007b87a130 ffff88007a39bbd8 ffffffff8105f103 ffff880079614000 ffff880079617d40 ffff88007a39bc20 ffffffffa011603e ffff880078988b98 0000000000000000
Call Trace:
[<ffffffff81491256>] dump_stack+0x4d/0x66
[<ffffffff8105f103>] __might_sleep+0x100/0x105
[<ffffffffa011603e>] transfer_commit_list+0x94/0xf1 [nfs_layout_nfsv41_files]
[<ffffffffa01160d6>] filelayout_recover_commit_reqs+0x3b/0x68 [nfs_layout_nfsv41_files]
[<ffffffffa00ba53a>] nfs_direct_write_reschedule+0x9f/0x1d6 [nfs]
[<ffffffff810705df>] ? mark_lock+0x1df/0x224
[<ffffffff8106e617>] ? trace_hardirqs_off_caller+0x37/0xa4
[<ffffffff8106e691>] ? trace_hardirqs_off+0xd/0xf
[<ffffffffa00ba8f8>] nfs_direct_write_schedule_work+0x9d/0xb7 [nfs]
[<ffffffff810501d7>] ? process_one_work+0x175/0x3a5
[<ffffffff81050258>] process_one_work+0x1f6/0x3a5
[<ffffffff810501d7>] ? process_one_work+0x175/0x3a5
[<ffffffff8105187e>] worker_thread+0x149/0x1f5
[<ffffffff81051735>] ? rescuer_thread+0x28d/0x28d
[<ffffffff81056d74>] kthread+0xd2/0xda
[<ffffffff81056ca2>] ? __kthread_parkme+0x61/0x61
[<ffffffff8149e66c>] ret_from_fork+0x7c/0xb0
[<ffffffff81056ca2>] ? __kthread_parkme+0x61/0x61
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 1f90ee27461e31a1c18e5d819f6ea6f5c7304b16 upstream.
i_dio_count is used to protect dio access against truncate. We want
to make sure there are no dio reads pending either when doing a
truncate. I suspect on plain NFS things might work even without
this, but once we use a pnfs layout driver that access backing devices
directly things will go bad without the proper synchronization.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 2a009ec98cce440c0992fc9a2353e96cdb0b048b upstream.
We need to have the I/O fully finished before telling the truncate code
that we are done.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 9811cd57f4c6b5b60ec104de68a88303717e3106 upstream.
nfs_file_direct_write only updates the inode size if it succeeded and
returned the number of bytes written. But in the AIO case nfs_direct_wait
turns the return value into -EIOCBQUEUED and we skip the size update.
Instead the aio completion path should updated it, which this patch
does. The implementation is a little hacky because there is no obvious
way to find out we are called for a write in nfs_direct_complete.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit a8c2275493b866961f4429a741251c630c4fc6d7 upstream.
The correct way to check on IPV6_ADDR_SCOPE_LINKLOCAL is to check with
the ipv6_addr_src_scope function.
Currently this can't be work, because ipv6_addr_scope returns a int with
a mask of IPV6_ADDR_SCOPE_MASK (0x00f0U) and IPV6_ADDR_SCOPE_LINKLOCAL
is 0x02. So the condition is always false.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|