summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2017-12-15SUNRPC: Fix a race in the receive code pathTrond Myklebust1-9/+19
We must ensure that the call to rpc_sleep_on() in xprt_transmit() cannot race with the call to xprt_complete_rqst(). Reported-by: Chuck Lever <chuck.lever@oracle.com> Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=317 Fixes: ce7c252a8c74 ("SUNRPC: Add a separate spinlock to protect..") Cc: stable@vger.kernel.org # 4.14+ Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-12-15nfs: don't wait on commit in nfs_commit_inode() if there were no commit requestsScott Mayhew1-0/+2
If there were no commit requests, then nfs_commit_inode() should not wait on the commit or mark the inode dirty, otherwise the following BUG_ON can be triggered: [ 1917.130762] kernel BUG at fs/inode.c:578! [ 1917.130766] Oops: Exception in kernel mode, sig: 5 [#1] [ 1917.130768] SMP NR_CPUS=2048 NUMA pSeries [ 1917.130772] Modules linked in: iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi blocklayoutdriver rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc sg nx_crypto pseries_rng ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_common ibmvscsi scsi_transport_srp ibmveth scsi_tgt dm_mirror dm_region_hash dm_log dm_mod [ 1917.130805] CPU: 2 PID: 14923 Comm: umount.nfs4 Tainted: G ------------ T 3.10.0-768.el7.ppc64 #1 [ 1917.130810] task: c0000005ecd88040 ti: c00000004cea0000 task.ti: c00000004cea0000 [ 1917.130813] NIP: c000000000354178 LR: c000000000354160 CTR: c00000000012db80 [ 1917.130816] REGS: c00000004cea3720 TRAP: 0700 Tainted: G ------------ T (3.10.0-768.el7.ppc64) [ 1917.130820] MSR: 8000000100029032 <SF,EE,ME,IR,DR,RI> CR: 22002822 XER: 20000000 [ 1917.130828] CFAR: c00000000011f594 SOFTE: 1 GPR00: c000000000354160 c00000004cea39a0 c0000000014c4700 c0000000018cc750 GPR04: 000000000000c750 80c0000000000000 0600000000000000 04eeb76bea749a03 GPR08: 0000000000000034 c0000000018cc758 0000000000000001 d000000005e619e8 GPR12: c00000000012db80 c000000007b31200 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR24: 0000000000000000 c000000000dfc3ec 0000000000000000 c0000005eefc02c0 GPR28: d0000000079dbd50 c0000005b94a02c0 c0000005b94a0250 c0000005b94a01c8 [ 1917.130867] NIP [c000000000354178] .evict+0x1c8/0x350 [ 1917.130871] LR [c000000000354160] .evict+0x1b0/0x350 [ 1917.130873] Call Trace: [ 1917.130876] [c00000004cea39a0] [c000000000354160] .evict+0x1b0/0x350 (unreliable) [ 1917.130880] [c00000004cea3a30] [c0000000003558cc] .evict_inodes+0x13c/0x270 [ 1917.130884] [c00000004cea3af0] [c000000000327d20] .kill_anon_super+0x70/0x1e0 [ 1917.130896] [c00000004cea3b80] [d000000005e43e30] .nfs_kill_super+0x20/0x60 [nfs] [ 1917.130900] [c00000004cea3c00] [c000000000328a20] .deactivate_locked_super+0xa0/0x1b0 [ 1917.130903] [c00000004cea3c80] [c00000000035ba54] .cleanup_mnt+0xd4/0x180 [ 1917.130907] [c00000004cea3d10] [c000000000119034] .task_work_run+0x114/0x150 [ 1917.130912] [c00000004cea3db0] [c00000000001ba6c] .do_notify_resume+0xcc/0x100 [ 1917.130916] [c00000004cea3e30] [c00000000000a7b0] .ret_from_except_lite+0x5c/0x60 [ 1917.130919] Instruction dump: [ 1917.130921] 7fc3f378 486734b5 60000000 387f00a0 38800003 4bdcb365 60000000 e95f00a0 [ 1917.130927] 694a0060 7d4a0074 794ad182 694a0001 <0b0a0000> 892d02a4 2f890000 40de0134 Signed-off-by: Scott Mayhew <smayhew@redhat.com> Cc: stable@vger.kernel.org # 4.5+ Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-12-15xprtrdma: Spread reply processing over more CPUsChuck Lever4-6/+5
Commit d8f532d20ee4 ("xprtrdma: Invoke rpcrdma_reply_handler directly from RECV completion") introduced a performance regression for NFS I/O small enough to not need memory registration. In multi- threaded benchmarks that generate primarily small I/O requests, IOPS throughput is reduced by nearly a third. This patch restores the previous level of throughput. Because workqueues are typically BOUND (in particular ib_comp_wq, nfsiod_workqueue, and rpciod_workqueue), NFS/RDMA workloads tend to aggregate on the CPU that is handling Receive completions. The usual approach to addressing this problem is to create a QP and CQ for each CPU, and then schedule transactions on the QP for the CPU where you want the transaction to complete. The transaction then does not require an extra context switch during completion to end up on the same CPU where the transaction was started. This approach doesn't work for the Linux NFS/RDMA client because currently the Linux NFS client does not support multiple connections per client-server pair, and the RDMA core API does not make it straightforward for ULPs to determine which CPU is responsible for handling Receive completions for a CQ. So for the moment, record the CPU number in the rpcrdma_req before the transport sends each RPC Call. Then during Receive completion, queue the RPC completion on that same CPU. Additionally, move all RPC completion processing to the deferred handler so that even RPCs with simple small replies complete on the CPU that sent the corresponding RPC Call. Fixes: d8f532d20ee4 ("xprtrdma: Invoke rpcrdma_reply_handler ...") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-12-15nfs: fix a deadlock in nfs client initializationScott Mayhew2-4/+24
The following deadlock can occur between a process waiting for a client to initialize in while walking the client list during nfsv4 server trunking detection and another process waiting for the nfs_clid_init_mutex so it can initialize that client: Process 1 Process 2 --------- --------- spin_lock(&nn->nfs_client_lock); list_add_tail(&CLIENTA->cl_share_link, &nn->nfs_client_list); spin_unlock(&nn->nfs_client_lock); spin_lock(&nn->nfs_client_lock); list_add_tail(&CLIENTB->cl_share_link, &nn->nfs_client_list); spin_unlock(&nn->nfs_client_lock); mutex_lock(&nfs_clid_init_mutex); nfs41_walk_client_list(clp, result, cred); nfs_wait_client_init_complete(CLIENTA); (waiting for nfs_clid_init_mutex) Make sure nfs_match_client() only evaluates clients that have completed initialization in order to prevent that deadlock. This patch also fixes v4.0 trunking behavior by not marking the client NFS_CS_READY until the clientid has been confirmed. Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-11-30SUNRPC: Handle ENETDOWN errorsTrond Myklebust2-0/+6
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-11-29SUNRPC: Allow connect to return EHOSTUNREACHTrond Myklebust1-0/+1
Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Tested-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-11-29NFSv4: Ensure gcc 4.4.4 can compile initialiser for "invalid_stateid"Trond Myklebust1-2/+2
gcc 4.4.4 is too old to have full C11 anonymous union support, so the current initialiser fails to compile. Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> (compile-)Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-11-18NFS: Revert "NFS: Move the flock open mode check into nfs_flock()"Benjamin Coddington2-16/+16
Commit e12937279c8b "NFS: Move the flock open mode check into nfs_flock()" changed NFSv3 behavior for flock() such that the open mode must match the lock type, however that requirement shouldn't be enforced for flock(). Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Cc: stable@vger.kernel.org # v4.12 Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-11-18NFS: Fix typo in nomigration mount optionJoshua Watt1-1/+1
The option was incorrectly masking off all other options. Signed-off-by: Joshua Watt <JPEWhacker@gmail.com> Cc: stable@vger.kernel.org #3.7 Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-11-18nfs: Fix ugly referral attributesChuck Lever1-10/+8
Before traversing a referral and performing a mount, the mounted-on directory looks strange: dr-xr-xr-x. 2 4294967294 4294967294 0 Dec 31 1969 dir.0 nfs4_get_referral is wiping out any cached attributes with what was returned via GETATTR(fs_locations), but the bit mask for that operation does not request any file attributes. Retrieve owner and timestamp information so that the memcpy in nfs4_get_referral fills in more attributes. Changes since v1: - Don't request attributes that the client unconditionally replaces - Request only MOUNTED_ON_FILEID or FILEID attribute, not both - encode_fs_locations() doesn't use the third bitmask word Fixes: 6b97fd3da1ea ("NFSv4: Follow a referral") Suggested-by: Pradeep Thomas <pradeepthomas@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-11-18NFS: super: mark expected switch fall-throughsGustavo A. R. Silva1-2/+10
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Addresses-Coverity-ID: 703509 Addresses-Coverity-ID: 703510 Addresses-Coverity-ID: 703511 Addresses-Coverity-ID: 703512 Addresses-Coverity-ID: 703513 Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-11-18sunrpc: remove net pointer from messagesVasily Averin2-7/+7
Publishing of net pointer is not safe, use net->ns.inum as net ID [ 171.391947] RPC: created new rpcb local clients (rpcb_local_clnt: ..., rpcb_local_clnt4: ...) for net f00001e7 [ 171.767188] NFSD: starting 90-second grace period (net f00001e7) Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-11-18nfs: remove net pointer from messagesVasily Averin1-7/+7
Publishing of net pointer is not safe, use net->ns.inum instead Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-11-18sunrpc: exit_net cleanup check addedVasily Averin1-0/+3
Be sure that all_clients list initialized in net_init hook was return to initial state. Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-11-18nfs client: exit_net cleanup check addedVasily Averin1-0/+4
Be sure that nfs_client_list and nfs_volume_list lists initialized in net_init hook were return to initial state in net_exit hook. Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-11-18nfs/write: Use common error handling code in nfs_lock_and_join_requests()Markus Elfring1-8/+9
Add a jump target so that a bit of exception handling can be better reused at the end of this function. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-11-18NFSv4: Replace closed stateids with the "invalid special stateid"Trond Myklebust3-1/+20
When decoding a CLOSE, replace the stateid returned by the server with the "invalid special stateid" described in RFC5661, Section 8.2.3. In nfs_set_open_stateid_locked, ignore stateids from closed state. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-11-18NFSv4: nfs_set_open_stateid must not trigger state recovery for closed stateTrond Myklebust1-1/+2
In nfs_set_open_stateid_locked, we must ignore stateids from closed state. Reported-by: Andrew W Elble <aweits@rit.edu> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-11-18NFSv4: Check the open stateid when searching for expired stateTrond Myklebust1-0/+5
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-11-18NFSv4: Clean up nfs4_delegreturn_doneTrond Myklebust1-18/+14
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-11-18NFSv4: cleanup nfs4_close_doneTrond Myklebust1-19/+16
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-11-18NFSv4: Retry NFS4ERR_OLD_STATEID errors in layoutreturnTrond Myklebust1-3/+12
If our layoutreturn returns an NFS4ERR_OLD_STATEID, then try to update the stateid and retry. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-11-18pNFS: Retry NFS4ERR_OLD_STATEID errors in layoutreturn-on-closeTrond Myklebust3-2/+40
If our layoutreturn on close operation returns an NFS4ERR_OLD_STATEID, then try to update the stateid and retry. We know that there should be no further LAYOUTGET requests being launched. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-11-18NFSv4: Don't try to CLOSE if the stateid 'other' field has changedTrond Myklebust3-12/+13
If the stateid is no longer recognised on the server, either due to a restart, or due to a competing CLOSE call, then we do not have to retry. Any open contexts that triggered a reopen of the file, will also act as triggers for any CLOSE for the updated stateids. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-11-18NFSv4: Retry CLOSE and DELEGRETURN on NFS4ERR_OLD_STATEID.Trond Myklebust5-2/+65
If we're racing with an OPEN, then retry the operation instead of declaring it a success. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> [Andrew W Elble: Fix a typo in nfs4_refresh_open_stateid] Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-11-18NFS: Fix a typo in nfs_rename()Trond Myklebust1-1/+1
On successful rename, the "old_dentry" is retained and is attached to the "new_dir", so we need to call nfs_set_verifier() accordingly. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-11-18NFSv4: Fix open create exclusive when the server rebootsTrond Myklebust1-15/+26
If the server that does not implement NFSv4.1 persistent session semantics reboots while we are performing an exclusive create, then the return value of NFS4ERR_DELAY when we replay the open during the grace period causes us to lose the verifier. When the grace period expires, and we present a new verifier, the server will then correctly reply NFS4ERR_EXIST. This commit ensures that we always present the same verifier when replaying the OPEN. Reported-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-11-18NFSv4: Add a tracepoint to document open stateid updatesTrond Myklebust2-0/+5
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-11-18NFSv4: Fix OPEN / CLOSE raceTrond Myklebust3-35/+123
Ben Coddington has noted the following race between OPEN and CLOSE on a single client. Process 1 Process 2 Server ========= ========= ====== 1) OPEN file 2) OPEN file 3) Process OPEN (1) seqid=1 4) Process OPEN (2) seqid=2 5) Reply OPEN (2) 6) Receive reply (2) 7) new stateid, seqid=2 8) CLOSE file, using stateid w/ seqid=2 9) Reply OPEN (1) 10( Process CLOSE (8) 11) Reply CLOSE (8) 12) Forget stateid file closed 13) Receive reply (7) 14) Forget stateid file closed. 15) Receive reply (1). 16) New stateid seqid=1 is really the same stateid that was closed. IOW: the reply to the first OPEN is delayed. Since "Process 2" does not wait before closing the file, and it does not cache the closed stateid, then when the delayed reply is finally received, it is treated as setting up a new stateid by the client. The fix is to ensure that the client processes the OPEN and CLOSE calls in the same order in which the server processed them. This commit ensures that we examine the seqid of the stateid returned by OPEN. If it is a new stateid, we assume the seqid must be equal to the value 1, and that each state transition increments the seqid value by 1 (See RFC7530, Section 9.1.4.2, and RFC5661, Section 8.2.2). If the tracker sees that an OPEN returns with a seqid that is greater than the cached seqid + 1, then it bumps a flag to ensure that the caller waits for the RPCs carrying the missing seqids to complete. Note that there can still be pathologies where the server crashes before it can even send us the missing seqids. Since the OPEN call is still holding a slot when it waits here, that could cause the recovery to stall forever. To avoid that, we time out after a 5 second wait. Reported-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-11-18sunrpc: Add rpc_request static trace pointChuck Lever2-2/+31
Display information about the RPC procedure being requested in the trace log. This sometimes critical information cannot always be derived from other RPC trace entries. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-11-18sunrpc: Fix rpc_task_begin trace pointChuck Lever1-2/+1
The rpc_task_begin trace point always display a task ID of zero. Move the trace point call site so that it picks up the new task ID. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-11-18SUNRPC: Fix parsing failure in trace points with XIDsChuck Lever1-24/+25
mount.nf-11159 8.... 905.248380: xprt_transmit: [FAILED TO PARSE] xid=351291440 status=0 addr=192.168.2.5 port=20049 mount.nf-11159 8.... 905.248381: rpc_task_sleep: task:6210@1 flags=0e80 state=0005 status=0 timeout=60000 queue=xprt_pending kworker/-1591 1.... 905.248419: xprt_lookup_rqst: [FAILED TO PARSE] xid=351291440 status=0 addr=192.168.2.5 port=20049 kworker/-1591 1.... 905.248423: xprt_complete_rqst: [FAILED TO PARSE] xid=351291440 status=24 addr=192.168.2.5 port=20049 Byte swapping is not available during trace-cmd report. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-11-18net: sunrpc: mark expected switch fall-throughsGustavo A. R. Silva3-0/+16
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-11-18NFS: Fix bool initialization/comparisonThomas Meyer4-8/+8
Bool initializations should use true and false. Bool tests don't need comparisons. Signed-off-by: Thomas Meyer <thomas@m3y3r.de> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-11-18NFS: Avoid RCU usage in tracepointsAnna Schumaker1-18/+6
There isn't an obvious way to acquire and release the RCU lock during a tracepoint, so we can't use the rpc_peeraddr2str() function here. Instead, rely on the client's cl_hostname, which should have similar enough information without needing an rcu_dereference(). Reported-by: Dave Jones <davej@codemonkey.org.uk> Cc: stable@vger.kernel.org # v3.12 Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-11-18xprtrdma: Update copyright noticesChuck Lever5-0/+5
Credit work contributed by Oracle engineers since 2014. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-11-18xprtrdma: Remove include for linux/prefetch.hChuck Lever1-1/+0
Clean up. This include should have been removed by commit 23826c7aeac7 ("xprtrdma: Serialize credit accounting again"). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-11-18rpcrdma: Remove C structure definitions of XDR data itemsChuck Lever3-68/+3
Clean up: C-structure style XDR encoding and decoding logic has been replaced over the past several merge windows on both the client and server. These data structures are no longer used. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-11-18xprtrdma: Put Send CQ in IB_POLL_WORKQUEUE modeChuck Lever1-1/+1
Lift the Send and LocalInv completion handlers out of soft IRQ mode to make room for other work. Also, move the Send CQ to a different CPU than the CPU where the Receive CQ is running, for improved scalability. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-11-17fs, nfs: convert nfs_client.cl_count from atomic_t to refcount_tElena Reshetova7-32/+33
atomic_t variables are currently used to implement reference counters with the following properties: - counter is initialized to 1 using atomic_set() - a resource is freed upon counter reaching zero - once counter reaches zero, its further increments aren't allowed - counter schema uses basic atomic operations (set, inc, inc_not_zero, dec_and_test, etc.) Such atomic variables should be converted to a newly provided refcount_t type and API that prevents accidental counter overflows and underflows. This is important since overflows and underflows can lead to use-after-free situation and be exploitable. The variable nfs_client.cl_count is used as pure reference counter. Convert it to refcount_t and fix up the operations. Suggested-by: Kees Cook <keescook@chromium.org> Reviewed-by: David Windsor <dwindsor@gmail.com> Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-11-17fs, nfs: convert nfs_lock_context.count from atomic_t to refcount_tElena Reshetova2-7/+8
atomic_t variables are currently used to implement reference counters with the following properties: - counter is initialized to 1 using atomic_set() - a resource is freed upon counter reaching zero - once counter reaches zero, its further increments aren't allowed - counter schema uses basic atomic operations (set, inc, inc_not_zero, dec_and_test, etc.) Such atomic variables should be converted to a newly provided refcount_t type and API that prevents accidental counter overflows and underflows. This is important since overflows and underflows can lead to use-after-free situation and be exploitable. The variable nfs_lock_context.count is used as pure reference counter. Convert it to refcount_t and fix up the operations. Suggested-by: Kees Cook <keescook@chromium.org> Reviewed-by: David Windsor <dwindsor@gmail.com> Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-11-17fs, nfs: convert nfs4_lock_state.ls_count from atomic_t to refcount_tElena Reshetova3-8/+8
atomic_t variables are currently used to implement reference counters with the following properties: - counter is initialized to 1 using atomic_set() - a resource is freed upon counter reaching zero - once counter reaches zero, its further increments aren't allowed - counter schema uses basic atomic operations (set, inc, inc_not_zero, dec_and_test, etc.) Such atomic variables should be converted to a newly provided refcount_t type and API that prevents accidental counter overflows and underflows. This is important since overflows and underflows can lead to use-after-free situation and be exploitable. The variable nfs4_lock_state.ls_count is used as pure reference counter. Convert it to refcount_t and fix up the operations. Suggested-by: Kees Cook <keescook@chromium.org> Reviewed-by: David Windsor <dwindsor@gmail.com> Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-11-17fs, nfs: convert nfs_cache_defer_req.count from atomic_t to refcount_tElena Reshetova2-4/+4
atomic_t variables are currently used to implement reference counters with the following properties: - counter is initialized to 1 using atomic_set() - a resource is freed upon counter reaching zero - once counter reaches zero, its further increments aren't allowed - counter schema uses basic atomic operations (set, inc, inc_not_zero, dec_and_test, etc.) Such atomic variables should be converted to a newly provided refcount_t type and API that prevents accidental counter overflows and underflows. This is important since overflows and underflows can lead to use-after-free situation and be exploitable. The variable nfs_cache_defer_req.count is used as pure reference counter. Convert it to refcount_t and fix up the operations. Suggested-by: Kees Cook <keescook@chromium.org> Reviewed-by: David Windsor <dwindsor@gmail.com> Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-11-17fs, nfs: convert nfs4_ff_layout_mirror.ref from atomic_t to refcount_tElena Reshetova2-5/+6
atomic_t variables are currently used to implement reference counters with the following properties: - counter is initialized to 1 using atomic_set() - a resource is freed upon counter reaching zero - once counter reaches zero, its further increments aren't allowed - counter schema uses basic atomic operations (set, inc, inc_not_zero, dec_and_test, etc.) Such atomic variables should be converted to a newly provided refcount_t type and API that prevents accidental counter overflows and underflows. This is important since overflows and underflows can lead to use-after-free situation and be exploitable. The variable nfs4_ff_layout_mirror.ref is used as pure reference counter. Convert it to refcount_t and fix up the operations. Suggested-by: Kees Cook <keescook@chromium.org> Reviewed-by: David Windsor <dwindsor@gmail.com> Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-11-17fs, nfs: convert pnfs_layout_hdr.plh_refcount from atomic_t to refcount_tElena Reshetova2-7/+7
atomic_t variables are currently used to implement reference counters with the following properties: - counter is initialized to 1 using atomic_set() - a resource is freed upon counter reaching zero - once counter reaches zero, its further increments aren't allowed - counter schema uses basic atomic operations (set, inc, inc_not_zero, dec_and_test, etc.) Such atomic variables should be converted to a newly provided refcount_t type and API that prevents accidental counter overflows and underflows. This is important since overflows and underflows can lead to use-after-free situation and be exploitable. The variable pnfs_layout_hdr.plh_refcount is used as pure reference counter. Convert it to refcount_t and fix up the operations. Suggested-by: Kees Cook <keescook@chromium.org> Reviewed-by: David Windsor <dwindsor@gmail.com> Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-11-17fs, nfs: convert pnfs_layout_segment.pls_refcount from atomic_t to refcount_tElena Reshetova2-8/+8
refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-11-17fs, nfs: convert nfs4_pnfs_ds.ds_count from atomic_t to refcount_tElena Reshetova2-6/+7
atomic_t variables are currently used to implement reference counters with the following properties: - counter is initialized to 1 using atomic_set() - a resource is freed upon counter reaching zero - once counter reaches zero, its further increments aren't allowed - counter schema uses basic atomic operations (set, inc, inc_not_zero, dec_and_test, etc.) Such atomic variables should be converted to a newly provided refcount_t type and API that prevents accidental counter overflows and underflows. This is important since overflows and underflows can lead to use-after-free situation and be exploitable. The variable nfs4_pnfs_ds.ds_count is used as pure reference counter. Convert it to refcount_t and fix up the operations. Suggested-by: Kees Cook <keescook@chromium.org> Reviewed-by: David Windsor <dwindsor@gmail.com> Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-11-17NFSv4.1: Fix up replays of interrupted requestsTrond Myklebust2-47/+103
If the previous request on a slot was interrupted before it was processed by the server, then our slot sequence number may be out of whack, and so we try the next operation using the old sequence number. The problem with this, is that not all servers check to see that the client is replaying the same operations as previously when they decide to go to the replay cache, and so instead of the expected error of NFS4ERR_SEQ_FALSE_RETRY, we get a replay of the old reply, which could (if the operations match up) be mistaken by the client for a new reply. To fix this, we attempt to send a COMPOUND containing only the SEQUENCE op in order to resync our slot sequence number. Cc: Olga Kornievskaia <olga.kornievskaia@gmail.com> [olga.kornievskaia@gmail.com: fix an Oops] Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-11-17xprtrdma: Remove atomic send completion countingChuck Lever3-33/+0
The sendctx circular queue now guarantees that xprtrdma cannot overflow the Send Queue, so remove the remaining bits of the original Send WQE counting mechanism. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-11-17xprtrdma: RPC completion should wait for Send completionChuck Lever4-4/+34
When an RPC Call includes a file data payload, that payload can come from pages in the page cache, or a user buffer (for direct I/O). If the payload can fit inline, xprtrdma includes it in the Send using a scatter-gather technique. xprtrdma mustn't allow the RPC consumer to re-use the memory where that payload resides before the Send completes. Otherwise, the new contents of that memory would be exposed by an HCA retransmit of the Send operation. So, block RPC completion on Send completion, but only in the case where a separate file data payload is part of the Send. This prevents the reuse of that memory while it is still part of a Send operation without an undue cost to other cases. Waiting is avoided in the common case because typically the Send will have completed long before the RPC Reply arrives. These days, an RPC timeout will trigger a disconnect, which tears down the QP. The disconnect flushes all waiting Sends. This bounds the amount of time the reply handler has to wait for a Send completion. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>