summaryrefslogtreecommitdiff
path: root/fs/nfsd/nfs4xdr.c
AgeCommit message (Collapse)AuthorFilesLines
2014-08-17nfsd4: reserve adequate space for LOCK opJ. Bruce Fields1-0/+8
As of 8c7424cff6 "nfsd4: don't try to encode conflicting owner if low on space", we permit the server to process a LOCK operation even if there might not be space to return the conflicting lockowner, because we've made returning the conflicting lockowner optional. However, the rpc server still wants to know the most we might possibly return, so we need to take into account the possible conflicting lockowner in the svc_reserve_space() call here. Symptoms were log messages like "RPC request reserved 88 but used 108". Fixes: 8c7424cff6 "nfsd4: don't try to encode conflicting owner if low on space" Reported-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-17nfsd4: remove obsolete commentJ. Bruce Fields1-7/+0
We do what Neil suggests now. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-10Merge branch 'for-3.17' of git://linux-nfs.org/~bfields/linuxLinus Torvalds1-78/+50
Pull nfsd updates from Bruce Fields: "This includes a major rewrite of the NFSv4 state code, which has always depended on a single mutex. As an example, open creates are no longer serialized, fixing a performance regression on NFSv3->NFSv4 upgrades. Thanks to Jeff, Trond, and Benny, and to Christoph for review. Also some RDMA fixes from Chuck Lever and Steve Wise, and miscellaneous fixes from Kinglong Mee and others" * 'for-3.17' of git://linux-nfs.org/~bfields/linux: (167 commits) svcrdma: remove rdma_create_qp() failure recovery logic nfsd: add some comments to the nfsd4 object definitions nfsd: remove the client_mutex and the nfs4_lock/unlock_state wrappers nfsd: remove nfs4_lock_state: nfs4_state_shutdown_net nfsd: remove nfs4_lock_state: nfs4_laundromat nfsd: Remove nfs4_lock_state(): reclaim_complete() nfsd: Remove nfs4_lock_state(): setclientid, setclientid_confirm, renew nfsd: Remove nfs4_lock_state(): exchange_id, create/destroy_session() nfsd: Remove nfs4_lock_state(): nfsd4_open and nfsd4_open_confirm nfsd: Remove nfs4_lock_state(): nfsd4_delegreturn() nfsd: Remove nfs4_lock_state(): nfsd4_open_downgrade + nfsd4_close nfsd: Remove nfs4_lock_state(): nfsd4_lock/locku/lockt() nfsd: Remove nfs4_lock_state(): nfsd4_release_lockowner nfsd: Remove nfs4_lock_state(): nfsd4_test_stateid/nfsd4_free_stateid nfsd: Remove nfs4_lock_state(): nfs4_preprocess_stateid_op() nfsd: remove old fault injection infrastructure nfsd: add more granular locking to *_delegations fault injectors nfsd: add more granular locking to forget_openowners fault injector nfsd: add more granular locking to forget_locks fault injector nfsd: add a list_head arg to nfsd_foreach_client_lock ...
2014-07-31nfsd: Add a mutex to protect the NFSv4.0 open owner replay cacheJeff Layton1-2/+0
We don't want to rely on the client_mutex for protection in the case of NFSv4 open owners. Instead, we add a mutex that will only be taken for NFSv4.0 state mutating operations, and that will be released once the entire compound is done. Also, ensure that nfsd4_cstate_assign_replay/nfsd4_cstate_clear_replay take a reference to the stateowner when they are using it for NFSv4.0 open and lock replay caching. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-23NFSD: Fix crash encoding lock reply on 32-bitKinglong Mee1-1/+3
Commit 8c7424cff6 "nfsd4: don't try to encode conflicting owner if low on space" forgot to free conf->data in nfsd4_encode_lockt and before sign conf->data to NULL in nfsd4_encode_lock_denied, causing a leak. Worse, kfree() can be called on an uninitialized pointer in the case of a succesful lock (or one that fails for a reason other than a conflict). (Note that lock->lk_denied.ld_owner.data appears it should be zero here, until you notice that it's one arm of a union the other arm of which is written to in the succesful case by the memcpy(&lock->lk_resp_stateid, &lock_stp->st_stid.sc_stateid, sizeof(stateid_t)); in nfsd4_lock(). In the 32-bit case this overwrites ld_owner.data.) Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Fixes: 8c7424cff6 ""nfsd4: don't try to encode conflicting owner if low on space" Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-18nfsd4: zero op arguments beyond the 8th compound opJ. Bruce Fields1-1/+1
The first 8 ops of the compound are zeroed since they're a part of the argument that's zeroed by the memset(rqstp->rq_argp, 0, procp->pc_argsize); in svc_process_common(). But we handle larger compounds by allocating the memory on the fly in nfsd4_decode_compound(). Other than code recently fixed by 01529e3f8179 "NFSD: Fix memory leak in encoding denied lock", I don't know of any examples of code depending on this initialization. But it definitely seems possible, and I'd rather be safe. Compounds this long are unusual so I'm much more worried about failure in this poorly tested cases than about an insignificant performance hit. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-11NFSD: Fix bad checking of space for padding in splice readKinglong Mee1-5/+2
Note that the caller has already reserved space for count and eof, so xdr->p has already moved past them, only the padding remains. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Fixes dc97618ddd (nfsd4: separate splice and readv cases) Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-10NFSD: Fix memory leak in encoding denied lockKinglong Mee1-1/+3
Commit 8c7424cff6 (nfsd4: don't try to encode conflicting owner if low on space) forgot free conf->data in nfsd4_encode_lockt and before sign conf->data to NULL in nfsd4_encode_lock_denied. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-09nfsd: Cleanup nfs4svc_encode_compoundresTrond Myklebust1-14/+1
Move the slot return, put session etc into a helper in fs/nfsd/nfs4state.c instead of open coding in nfs4svc_encode_compoundres. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-09NFSD: Avoid warning message when compile at i686 archKinglong Mee1-1/+1
fs/nfsd/nfs4xdr.c: In function 'nfsd4_encode_readv': >> fs/nfsd/nfs4xdr.c:3137:148: warning: comparison of distinct pointer types lacks a cast [enabled by default] thislen = min(len, ((void *)xdr->end - (void *)xdr->p)); Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-09nfsd4: replace defer_free by svcxdr_tmpallocJ. Bruce Fields1-29/+17
Avoid an extra allocation for the tmpbuf struct itself, and stop ignoring some allocation failures. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-09nfsd4: remove nfs4_acl_newJ. Bruce Fields1-1/+1
This is a not-that-useful kmalloc wrapper. And I'd like one of the callers to actually use something other than kmalloc. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-09nfsd4: define svcxdr_dupstr to share some common codeJ. Bruce Fields1-13/+23
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-09nfsd4: remove unused defer_free argumentJ. Bruce Fields1-12/+9
28e05dd8457c "knfsd: nfsd4: represent nfsv4 acl with array instead of linked list" removed the last user that wanted a custom free function. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-09nfsd4: rename cr_linkname->cr_dataJ. Bruce Fields1-8/+7
The name of a link is currently stored in cr_name and cr_namelen, and the content in cr_linkname and cr_linklen. That's confusing. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-09nfsd: fix rare symlink decoding bugJ. Bruce Fields1-1/+12
An NFS operation that creates a new symlink includes the symlink data, which is xdr-encoded as a length followed by the data plus 0 to 3 bytes of zero-padding as required to reach a 4-byte boundary. The vfs, on the other hand, wants null-terminated data. The simple way to handle this would be by copying the data into a newly allocated buffer with space for the final null. The current nfsd_symlink code tries to be more clever by skipping that step in the (likely) case where the byte following the string is already 0. But that assumes that the byte following the string is ours to look at. In fact, it might be the first byte of a page that we can't read, or of some object that another task might modify. Worse, the NFSv4 code tries to fix the problem by actually writing to that byte. In the NFSv2/v3 cases this actually appears to be safe: - nfs3svc_decode_symlinkargs explicitly null-terminates the data (after first checking its length and copying it to a new page). - NFSv2 limits symlinks to 1k. The buffer holding the rpc request is always at least a page, and the link data (and previous fields) have maximum lengths that prevent the request from reaching the end of a page. In the NFSv4 case the CREATE op is potentially just one part of a long compound so can end up on the end of a page if you're unlucky. The minimal fix here is to copy and null-terminate in the NFSv4 case. The nfsd_symlink() interface here seems too fragile, though. It should really either do the copy itself every time or just require a null-terminated string. Reported-by: Jeff Layton <jlayton@primarydata.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-07nfsd: Fix bad reserving space for encoding rdattr_errorKinglong Mee1-1/+1
Introduced by commit 561f0ed498 (nfsd4: allow large readdirs). Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-03nfs: fix nfs4d readlink truncated packetAvi Kivity1-1/+1
XDR requires 4-byte alignment; nfs4d READLINK reply writes out the padding, but truncates the packet to the padding-less size. Fix by taking the padding into consideration when truncating the packet. Symptoms: # ll /mnt/ ls: cannot read symbolic link /mnt/test: Input/output error total 4 -rw-r--r--. 1 root root 0 Jun 14 01:21 123456 lrwxrwxrwx. 1 root root 6 Jul 2 03:33 test drwxr-xr-x. 1 root root 0 Jul 2 23:50 tmp drwxr-xr-x. 1 root root 60 Jul 2 23:44 tree Signed-off-by: Avi Kivity <avi@cloudius-systems.com> Fixes: 476a7b1f4b2c (nfsd4: don't treat readlink like a zero-copy operation) Reviewed-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-06-28nfsd: fix rare symlink decoding bugJ. Bruce Fields1-1/+12
An NFS operation that creates a new symlink includes the symlink data, which is xdr-encoded as a length followed by the data plus 0 to 3 bytes of zero-padding as required to reach a 4-byte boundary. The vfs, on the other hand, wants null-terminated data. The simple way to handle this would be by copying the data into a newly allocated buffer with space for the final null. The current nfsd_symlink code tries to be more clever by skipping that step in the (likely) case where the byte following the string is already 0. But that assumes that the byte following the string is ours to look at. In fact, it might be the first byte of a page that we can't read, or of some object that another task might modify. Worse, the NFSv4 code tries to fix the problem by actually writing to that byte. In the NFSv2/v3 cases this actually appears to be safe: - nfs3svc_decode_symlinkargs explicitly null-terminates the data (after first checking its length and copying it to a new page). - NFSv2 limits symlinks to 1k. The buffer holding the rpc request is always at least a page, and the link data (and previous fields) have maximum lengths that prevent the request from reaching the end of a page. In the NFSv4 case the CREATE op is potentially just one part of a long compound so can end up on the end of a page if you're unlucky. The minimal fix here is to copy and null-terminate in the NFSv4 case. The nfsd_symlink() interface here seems too fragile, though. It should really either do the copy itself every time or just require a null-terminated string. Reported-by: Jeff Layton <jlayton@primarydata.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-06-23NFSD: Using min/max/min_t/max_t for calculateKinglong Mee1-7/+3
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-06-18NFSD: fix bug for readdir of pseudofsKinglong Mee1-0/+1
Commit 561f0ed498ca (nfsd4: allow large readdirs) introduces a bug about readdir the root of pseudofs. Call xdr_truncate_encode() revert encoded name when skipping. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-06-07nfsd4: kill READ64J. Bruce Fields1-17/+16
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-06-07nfsd4: kill READ32J. Bruce Fields1-128/+127
While we're here, let's kill off a couple of the read-side macros. Leaving the more complicated ones alone for now. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-31nfsd: make nfsd4_encode_fattr staticJeff Layton1-1/+1
sparse says: CHECK fs/nfsd/nfs4xdr.c fs/nfsd/nfs4xdr.c:2043:1: warning: symbol 'nfsd4_encode_fattr' was not declared. Should it be static? Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-31nfsd: getattr for FATTR4_WORD0_FILES_AVAIL needs the statfs bufferChristoph Hellwig1-2/+2
Note nobody's ever noticed because the typical client probably never requests FILES_AVAIL without also requesting something else on the list. Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-31NFSD: Adds macro EX_UUID_LEN for exports uuid's lengthKinglong Mee1-1/+2
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-31nfsd4: better reservation of head space for krb5J. Bruce Fields1-2/+3
RPC_MAX_AUTH_SIZE is scattered around several places. Better to set it once in the auth code, where this kind of estimate should be made. And while we're at it we can leave it zero when we're not using krb5i or krb5p. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-31nfsd4: kill write32, write64J. Bruce Fields1-30/+21
And switch a couple other functions from the encode(&p,...) convention to the p = encode(p,...) convention mostly used elsewhere. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-31nfsd4: kill WRITEMEMJ. Bruce Fields1-34/+28
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-31nfsd4: kill WRITE64J. Bruce Fields1-28/+24
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-31nfsd4: kill WRITE32J. Bruce Fields1-152/+154
These macros just obscure what's going on. Adopt the convention of the client-side code. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-31nfsd4: really fix nfs4err_resource in 4.1 caseJ. Bruce Fields1-0/+8
encode_getattr, for example, can return nfserr_resource to indicate it ran out of buffer space. That's not a legal error in the 4.1 case. And in the 4.1 case, if we ran out of buffer space, we should have exceeded a session limit too. (Note in 1bc49d83c37cfaf46be357757e592711e67f9809 "nfsd4: fix nfs4err_resource in 4.1 case" we originally tried fixing this error return before fixing the problem that we could error out while we still had lots of available space. The result was to trade one illegal error for another in those cases. We decided that was helpful, so reverted the change in fc208d026be0c7d60db9118583fc62f6ca97743d, and are only reinstating it now that we've elimited almost all of those cases.) Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-31nfsd4: allow exotic read compoundsJ. Bruce Fields1-36/+25
I'm not sure why a client would want to stuff multiple reads in a single compound rpc, but it's legal for them to do it, and we should really support it. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-31nfsd4: more read encoding cleanupJ. Bruce Fields1-12/+10
More cleanup, no change in functionality. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-31nfsd4: read encoding cleanupJ. Bruce Fields1-13/+14
Trivial cleanup, no change in functionality. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-31nfsd4: separate splice and readv casesJ. Bruce Fields1-42/+131
The splice and readv cases are actually quite different--for example the former case ignores the array of vectors we build up for the latter. It is probably clearer to separate the two cases entirely. There's some code duplication between the split out encoders, but this is only temporary and will be fixed by a later patch. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-31nfsd4: turn off zero-copy-read in exotic casesJ. Bruce Fields1-4/+17
We currently allow only one read per compound, with operations before and after whose responses will require no more than about a page to encode. While we don't expect clients to violate those limits any time soon, this limitation isn't really condoned by the spec, so to future proof the server we should lift the limitation. At the same time we'd like to continue to support zero-copy reads. Supporting multiple zero-copy-reads per compound would require a new data structure to replace struct xdr_buf, which can represent only one set of included pages. So for now we plan to modify encode_read() to support either zero-copy or non-zero-copy reads, and use some heuristics at the start of the compound processing to decide whether a zero-copy read will work. This will allow us to support more exotic compounds without introducing a performance regression in the normal case. Later patches handle those "exotic compounds", this one just makes sure zero-copy is turned off in those cases. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-31nfsd4: don't treat readlink like a zero-copy operationJ. Bruce Fields1-30/+12
There's no advantage to this zero-copy-style readlink encoding, and it unnecessarily limits the kinds of compounds we can handle. (In practice I can't see why a client would want e.g. multiple readlink calls in a comound, but it's probably a spec violation for us not to handle it.) Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-31nfsd4: enforce rd_dircountJ. Bruce Fields1-1/+4
As long as we're here, let's enforce the protocol's limit on the number of directory entries to return in a readdir. I don't think anyone's ever noticed our lack of enforcement, but maybe there's more of a chance they will now that we allow larger readdirs. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-31nfsd4: allow large readdirsJ. Bruce Fields1-62/+75
Currently we limit readdir results to a single page. This can result in a performance regression compared to NFSv3 when reading large directories. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-31nfsd4: adjust buflen to session channel limitJ. Bruce Fields1-16/+8
We can simplify session limit enforcement by restricting the xdr buflen to the session size. Also fix a preexisting bug: we should really have been taking into account the auth-required space when comparing against session limits, which are limits on the size of the entire rpc reply, including any krb5 overhead. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-31nfsd4: fix buflen calculation after read encodingJ. Bruce Fields1-1/+7
We don't necessarily want to assume that the buflen is the same as the number of bytes available in the pages. We may have some reason to set it to something less (for example, later patches will use a smaller buflen to enforce session limits). So, calculate the buflen relative to the previous buflen instead of recalculating it from scratch. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-31nfsd4: nfsd4_check_resp_size should check against whole bufferJ. Bruce Fields1-2/+1
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-31nfsd4: minor encode_read cleanupJ. Bruce Fields1-4/+6
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-31nfsd4: more precise nfsd4_max_replyJ. Bruce Fields1-31/+4
It will turn out to be useful to have a more accurate estimate of reply size; so, piggyback on the existing op reply-size estimators. Also move nfsd4_max_reply to nfs4proc.c to get easier access to struct nfsd4_operation and friends. (Thanks to Christoph Hellwig for pointing out that simplification.) Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-31nfsd4: don't try to encode conflicting owner if low on spaceJ. Bruce Fields1-3/+13
I ran into this corner case in testing: in theory clients can provide state owners up to 1024 bytes long. In the sessions case there might be a risk of this pushing us over the DRC slot size. The conflicting owner isn't really that important, so let's humor a client that provides a small maxresponsize_cached by allowing ourselves to return without the conflicting owner instead of outright failing the operation. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-31nfsd4: convert 4.1 replay encodingJ. Bruce Fields1-1/+1
Limits on maxresp_sz mean that we only ever need to replay rpc's that are contained entirely in the head. The one exception is very small zero-copy reads. That's an odd corner case as clients wouldn't normally ask those to be cached. in any case, this seems a little more robust. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-31nfsd4: allow encoding across page boundariesJ. Bruce Fields1-17/+43
After this we can handle for example getattr of very large ACLs. Read, readdir, readlink are still special cases with their own limits. Also we can't handle a new operation starting close to the end of a page. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-31nfsd4: size-checking cleanupJ. Bruce Fields1-14/+15
Better variable name, some comments, etc. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-31nfsd4: remove redundant encode buffer size checkingJ. Bruce Fields1-9/+13
Now that all op encoders can handle running out of space, we no longer need to check the remaining size for every operation; only nonidempotent operations need that check, and that can be done by nfsd4_check_resp_size. Signed-off-by: J. Bruce Fields <bfields@redhat.com>