Age | Commit message (Collapse) | Author | Files | Lines |
|
There is nothing happening in the start of nfsd() that requires
protection by the mutex, so don't take it until shutting down the thread
- which does still require protection - but only for nfsd_put().
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Using sv_lock means we don't need to hold the service mutex over these
updates.
In particular, svc_exit_thread() no longer requires synchronisation, so
threads can exit asynchronously.
Note that we could use an atomic_t, but as there are many more read
sites than writes, that would add unnecessary noise to the code.
Some reads are already racy, and there is no need for them to not be.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
This allows us to move the updates for th_cnt out of the mutex.
This is a step towards reducing mutex coverage in nfsd().
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
The use of sv_nrthreads as a general refcount results in clumsy code, as
is seen by various comments needed to explain the situation.
This patch introduces a 'struct kref' and uses that for reference
counting, leaving sv_nrthreads to be a pure count of threads. The kref
is managed particularly in svc_get() and svc_put(), and also nfsd_put();
svc_destroy() now takes a pointer to the embedded kref, rather than to
the serv.
nfsd allows the svc_serv to exist with ->sv_nrhtreads being zero. This
happens when a transport is created before the first thread is started.
To support this, a 'keep_active' flag is introduced which holds a ref on
the svc_serv. This is set when any listening socket is successfully
added (unless there are running threads), and cleared when the number of
threads is set. So when the last thread exits, the nfs_serv will be
destroyed.
The use of 'keep_active' replaces previous code which checked if there
were any permanent sockets.
We no longer clear ->rq_server when nfsd() exits. This was done
to prevent svc_exit_thread() from calling svc_destroy().
Instead we take an extra reference to the svc_serv to prevent
svc_destroy() from being called.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
svc_destroy() is poorly named - it doesn't necessarily destroy the svc,
it might just reduce the ref count.
nfsd_destroy() is poorly named for the same reason.
This patch:
- removes the refcount functionality from svc_destroy(), moving it to
a new svc_put(). Almost all previous callers of svc_destroy() now
call svc_put().
- renames nfsd_destroy() to nfsd_put() and improves the code, using
the new svc_destroy() rather than svc_put()
- removes a few comments that explain the important for balanced
get/put calls. This should be obvious.
The only non-trivial part of this is that svc_destroy() would call
svc_sock_update() on a non-final decrement. It can no longer do that,
and svc_put() isn't really a good place of it. This call is now made
from svc_exit_thread() which seems like a good place. This makes the
call *before* sv_nrthreads is decremented rather than after. This
is not particularly important as the call just sets a flag which
causes sv_nrthreads set be checked later. A subsequent patch will
improve the ordering.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
If write_ports_add() fails, we shouldn't destroy the serv, unless we had
only just created it. So if there are any permanent sockets already
attached, leave the serv in place.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
/home/cel/src/linux/linux/fs/nfsd/nfs4proc.c:1539:24: warning: incorrect type in assignment (different base types)
/home/cel/src/linux/linux/fs/nfsd/nfs4proc.c:1539:24: expected restricted __be32 [usertype] status
/home/cel/src/linux/linux/fs/nfsd/nfs4proc.c:1539:24: got int
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Update module_put_and_exit to call kthread_exit instead of do_exit.
Change the name to reflect this change in functionality. All of the
users of module_put_and_exit are causing the current kthread to exit
so this change makes it clear what is happening. There is no
functional change.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
|
A delegation break could arrive as soon as we've called vfs_setlease. A
delegation break runs a callback which immediately (in
nfsd4_cb_recall_prepare) adds the delegation to del_recall_lru. If we
then exit nfs4_set_delegation without hashing the delegation, it will be
freed as soon as the callback is done with it, without ever being
removed from del_recall_lru.
Symptoms show up later as use-after-free or list corruption warnings,
usually in the laundromat thread.
I suspect aba2072f4523 "nfsd: grant read delegations to clients holding
writes" made this bug easier to hit, but I looked as far back as v3.0
and it looks to me it already had the same problem. So I'm not sure
where the bug was introduced; it may have been there from the beginning.
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Commit bd5ae9288d64 ("nfsd: register pernet ops last, unregister first")
has re-opened rpc_pipefs_event() race against nfsd_net_id registration
(register_pernet_subsys()) which has been fixed by commit bb7ffbf29e76
("nfsd: fix nsfd startup race triggering BUG_ON").
Restore the order of register_pernet_subsys() vs register_cld_notifier().
Add WARN_ON() to prevent a future regression.
Crash info:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000012
CPU: 8 PID: 345 Comm: mount Not tainted 5.4.144-... #1
pc : rpc_pipefs_event+0x54/0x120 [nfsd]
lr : rpc_pipefs_event+0x48/0x120 [nfsd]
Call trace:
rpc_pipefs_event+0x54/0x120 [nfsd]
blocking_notifier_call_chain
rpc_fill_super
get_tree_keyed
rpc_fs_get_tree
vfs_get_tree
do_mount
ksys_mount
__arm64_sys_mount
el0_svc_handler
el0_svc
Fixes: bd5ae9288d64 ("nfsd: register pernet ops last, unregister first")
Cc: stable@vger.kernel.org
Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Multiple places open-code the same check to determine whether a given
mount is idmapped. Introduce a simple helper function that can be used
instead. This allows us to get rid of the fragile open-coding. We will
later change the check that is used to determine whether a given mount
is idmapped. Introducing a helper allows us to do this in a single
place instead of doing it for multiple places.
Link: https://lore.kernel.org/r/20211123114227.3124056-2-brauner@kernel.org (v1)
Link: https://lore.kernel.org/r/20211130121032.3753852-2-brauner@kernel.org (v2)
Link: https://lore.kernel.org/r/20211203111707.3901969-2-brauner@kernel.org
Cc: Seth Forshee <sforshee@digitalocean.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
CC: linux-fsdevel@vger.kernel.org
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Seth Forshee <sforshee@digitalocean.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
|
|
Pull nfsd bugfix from Bruce Fields:
"This is just one bugfix for a buffer overflow in knfsd's xdr decoding"
* tag 'nfsd-5.16-1' of git://linux-nfs.org/~bfields/linux:
NFSD: Fix exposure in nfsd4_decode_bitmap()
|
|
rtm@csail.mit.edu reports:
> nfsd4_decode_bitmap4() will write beyond bmval[bmlen-1] if the RPC
> directs it to do so. This can cause nfsd4_decode_state_protect4_a()
> to write client-supplied data beyond the end of
> nfsd4_exchange_id.spo_must_allow[] when called by
> nfsd4_decode_exchange_id().
Rewrite the loops so nfsd4_decode_bitmap() cannot iterate beyond
@bmlen.
Reported by: rtm@csail.mit.edu
Fixes: d1c263a031e8 ("NFSD: Replace READ* macros in nfsd4_decode_fattr()")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Pull nfsd updates from Bruce Fields:
"A slow cycle for nfsd: mainly cleanup, including Neil's patch dropping
support for a filehandle format deprecated 20 years ago, and further
xdr-related cleanup from Chuck"
* tag 'nfsd-5.16' of git://linux-nfs.org/~bfields/linux: (26 commits)
nfsd4: remove obselete comment
nfsd: document server-to-server-copy parameters
NFSD:fix boolreturn.cocci warning
nfsd: update create verifier comment
SUNRPC: Change return value type of .pc_encode
SUNRPC: Replace the "__be32 *p" parameter to .pc_encode
NFSD: Save location of NFSv4 COMPOUND status
SUNRPC: Change return value type of .pc_decode
SUNRPC: Replace the "__be32 *p" parameter to .pc_decode
SUNRPC: De-duplicate .pc_release() call sites
SUNRPC: Simplify the SVC dispatch code path
SUNRPC: Capture value of xdr_buf::page_base
SUNRPC: Add trace event when alloc_pages_bulk() makes no progress
svcrdma: Split svcrmda_wc_{read,write} tracepoints
svcrdma: Split the svcrdma_wc_send() tracepoint
svcrdma: Split the svcrdma_wc_receive() tracepoint
NFSD: Have legacy NFSD WRITE decoders use xdr_stream_subsegment()
SUNRPC: xdr_stream_subsegment() must handle non-zero page_bases
NFSD: Initialize pointer ni with NULL and not plain integer 0
NFSD: simplify struct nfsfh
...
|
|
Pull NFS client updates from Trond Myklebust:
"Highlights include:
Features:
- NFSv4.1 can always retrieve and cache the ACCESS mode on OPEN
- Optimisations for READDIR and the 'ls -l' style workload
- Further replacements of dprintk() with tracepoints and other
tracing improvements
- Ensure we re-probe NFSv4 server capabilities when the user does a
"mount -o remount"
Bugfixes:
- Fix an Oops in pnfs_mark_request_commit()
- Fix up deadlocks in the commit code
- Fix regressions in NFSv2/v3 attribute revalidation due to the
change_attr_type optimisations
- Fix some dentry verifier races
- Fix some missing dentry verifier settings
- Fix a performance regression in nfs_set_open_stateid_locked()
- SUNRPC was sending multiple SYN calls when re-establishing a TCP
connection.
- Fix multiple NFSv4 issues due to missing sanity checking of server
return values
- Fix a potential Oops when FREE_STATEID races with an unmount
Cleanups:
- Clean up the labelled NFS code
- Remove unused header <linux/pnfs_osd_xdr.h>"
* tag 'nfs-for-5.16-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (84 commits)
NFSv4: Sanity check the parameters in nfs41_update_target_slotid()
NFS: Remove the nfs4_label argument from decode_getattr_*() functions
NFS: Remove the nfs4_label argument from nfs_setsecurity
NFS: Remove the nfs4_label argument from nfs_fhget()
NFS: Remove the nfs4_label argument from nfs_add_or_obtain()
NFS: Remove the nfs4_label argument from nfs_instantiate()
NFS: Remove the nfs4_label from the nfs_setattrres
NFS: Remove the nfs4_label from the nfs4_getattr_res
NFS: Remove the f_label from the nfs4_opendata and nfs_openres
NFS: Remove the nfs4_label from the nfs4_lookupp_res struct
NFS: Remove the label from the nfs4_lookup_res struct
NFS: Remove the nfs4_label from the nfs4_link_res struct
NFS: Remove the nfs4_label from the nfs4_create_res struct
NFS: Remove the nfs4_label from the nfs_entry struct
NFS: Create a new nfs_alloc_fattr_with_label() function
NFS: Always initialise fattr->label in nfs_fattr_alloc()
NFSv4.2: alloc_file_pseudo() takes an open flag, not an f_mode
NFS: Don't allocate nfs_fattr on the stack in __nfs42_ssc_open()
NFSv4: Remove unnecessary 'minor version' check
NFSv4: Fix potential Oops in decode_op_map()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull fsnotify updates from Jan Kara:
"Support for reporting filesystem errors through fanotify so that
system health monitoring daemons can watch for these and act instead
of scraping system logs"
* tag 'fsnotify_for_v5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: (34 commits)
samples: remove duplicate include in fs-monitor.c
samples: Fix warning in fsnotify sample
docs: Fix formatting of literal sections in fanotify docs
samples: Make fs-monitor depend on libc and headers
docs: Document the FAN_FS_ERROR event
samples: Add fs error monitoring example
ext4: Send notifications on error
fanotify: Allow users to request FAN_FS_ERROR events
fanotify: Emit generic error info for error event
fanotify: Report fid info for file related file system errors
fanotify: WARN_ON against too large file handles
fanotify: Add helpers to decide whether to report FID/DFID
fanotify: Wrap object_fh inline space in a creator macro
fanotify: Support merging of error events
fanotify: Support enqueueing of error events
fanotify: Pre-allocate pool of error events
fanotify: Reserve UAPI bits for FAN_FS_ERROR
fsnotify: Support FS_ERROR event type
fanotify: Require fid_mode for any non-fd event
fanotify: Encode empty file handle when no inode is provided
...
|
|
Refactor: surface useful show_ macros so they can be shared between
the client and server trace code.
Additional clean up:
- Housekeeping: ensure the correct #include files are pulled in
and add proper TRACE_DEFINE_ENUM where they are missing
- Use a consistent naming scheme for the helpers
- Store values to be displayed symbolically as unsigned long, as
that is the type that the __print_yada() functions take
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Mandatory locking has been removed. And the rest of this comment is
redundant with the code.
Reported-by: Jeff layton <jlayton@kernel.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
FAN_FS_ERROR allows events without inodes - i.e. for file system-wide
errors. Even though fsnotify_handle_inode_event is not currently used
by fanotify, this patch protects other backends from cases where neither
inode or dir are provided. Also document the constraints of the
interface (inode and dir cannot be both NULL).
Link: https://lore.kernel.org/r/20211025192746.66445-12-krisman@collabora.com
Suggested-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jan Kara <jack@suse.cz>
|
|
Call the ->get_unique_id method to query the SCSI identifiers. This can
use the cached VPD page in the sd driver instead of sending a command
on every LAYOUTGET. It will also allow to support NVMe based volumes
if the draft for that ever takes off.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: J. Bruce Fields <bfields@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20211021060607.264371-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
./fs/nfsd/nfssvc.c: 1072: 8-9: :WARNING return of 0/1 in function
'nfssvc_decode_voidarg' with return type bool
Return statements in functions returning bool should use true/false
instead of 1/0.
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Changcheng Deng <deng.changcheng@zte.com.cn>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
I don't know if that Solaris behavior matters any more or if it's still
possible to look up that bug ID any more. The XFS behavior's definitely
still relevant, though; any but the most recent XFS filesystems will
lose the top bits.
Reported-by: Frank S. Filz <ffilzlnx@mindspring.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Returning an undecorated integer is an age-old trope, but it's
not clear (even to previous experts in this code) that the only
valid return values are 1 and 0. These functions do not return
a negative errno, rpc_stat value, or a positive length.
Document there are only two valid return values by having
.pc_encode return only true or false.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
The passed-in value of the "__be32 *p" parameter is now unused in
every server-side XDR encoder, and can be removed.
Note also that there is a line in each encoder that sets up a local
pointer to a struct xdr_stream. Passing that pointer from the
dispatcher instead saves one line per encoder function.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Refactor: Currently nfs4svc_encode_compoundres() relies on the NFS
dispatcher to pass in the buffer location of the COMPOUND status.
Instead, save that buffer location in struct nfsd4_compoundres.
The compound tag follows immediately after.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Returning an undecorated integer is an age-old trope, but it's
not clear (even to previous experts in this code) that the only
valid return values are 1 and 0. These functions do not return
a negative errno, rpc_stat value, or a positive length.
Document there are only two valid return values by having
.pc_decode return only true or false.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
The passed-in value of the "__be32 *p" parameter is now unused in
every server-side XDR decoder, and can be removed.
Note also that there is a line in each decoder that sets up a local
pointer to a struct xdr_stream. Passing that pointer from the
dispatcher instead saves one line per decoder function.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
"Bug fixes for NFSD error handling paths"
* tag 'nfsd-5.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
NFSD: Keep existing listeners on portlist error
SUNRPC: fix sign error causing rpcsec_gss drops
nfsd: Fix a warning for nfsd_file_close_inode
nfsd4: Handle the NFSv4 READDIR 'dircount' hint being zero
nfsd: fix error handling of register_pernet_subsys() in init_nfsd()
|
|
If nfsd has existing listening sockets without any processes, then an error
returned from svc_create_xprt() for an additional transport will remove
those existing listeners. We're seeing this in practice when userspace
attempts to create rpcrdma transports without having the rpcrdma modules
present before creating nfsd kernel processes. Fix this by checking for
existing sockets before calling nfsd_destroy().
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Refactor.
Now that the NFSv2 and NFSv3 XDR decoders have been converted to
use xdr_streams, the WRITE decoder functions can use
xdr_stream_subsegment() to extract the WRITE payload into its own
xdr_buf, just as the NFSv4 WRITE XDR decoder currently does.
That makes it possible to pass the first kvec, pages array + length,
page_base, and total payload length via a single function parameter.
The payload's page_base is not yet assigned or used, but will be in
subsequent patches.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Pointer ni is being initialized with plain integer zero. Fix
this by initializing with NULL.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Most of the fields in 'struct knfsd_fh' are 2 levels deep (a union and a
struct) and are accessed using macros like:
#define fh_FOO fh_base.fh_new.fb_FOO
This patch makes the union and struct anonymous, so that "fh_FOO" can be
a name directly within 'struct knfsd_fh' and the #defines aren't needed.
The file handle as a whole is sometimes accessed as "fh_base" or
"fh_base.fh_pad", neither of which are particularly helpful names.
As the struct holding the filehandle is now anonymous, we
cannot use the name of that, so we union it with 'fh_raw' and use that
where the raw filehandle is needed. fh_raw also ensure the structure is
large enough for the largest possible filehandle.
fh_raw is a 'char' array, removing any need to cast it for memcpy etc.
SVCFH_fmt() is simplified using the "%ph" printk format. This
changes the appearance of filehandles in dprintk() debugging, making
them a little more precise.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Filehandles not in the "new" or "version 1" format have not been handed
out for new mounts since Linux 2.4 which was released 20 years ago.
I think it is safe to say that no such file handles are still in use,
and that we can drop support for them.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
A small part of the declaration concerning filehandle format are
currently in the "uapi" include directory:
include/uapi/linux/nfsd/nfsfh.h
There is a lot more to the filehandle format, including "enum fid_type"
and "enum nfsd_fsid" which are not exported via "uapi".
This small part of the filehandle definition is of minimal use outside
of the kernel, and I can find no evidence that an other code is using
it. Certainly nfs-utils and wireshark (The most likely candidates) do not
use these declarations.
So move it out of "uapi" by copying the content from
include/uapi/linux/nfsd/nfsfh.h
into
fs/nfsd/nfsfh.h
A few unnecessary "#include" directives are not copied, and neither is
the #define of fh_auth, which is annotated as being for userspace only.
The copyright claims in the uapi file are identical to those in the nfsd
file, so there is no need to copy those.
The "__u32" style integer types are only needed in "uapi". In
kernel-only code we can use the more familiar "u32" style.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
RFC3530 notes that the 'dircount' field may be zero, in which case the
recommendation is to ignore it, and only enforce the 'maxcount' field.
In RFC5661, this recommendation to ignore a zero valued field becomes a
requirement.
Fixes: aee377644146 ("nfsd4: fix rd_dircount enforcement")
Cc: <stable@vger.kernel.org>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
init_nfsd() should not unregister pernet subsys if the register fails
but should instead unwind from the last successful operation which is
register_filesystem().
Unregistering a failed register_pernet_subsys() call can result in
a kernel GPF as revealed by programmatically injecting an error in
register_pernet_subsys().
Verified the fix handled failure gracefully with no lingering nfsd
entry in /proc/filesystems. This change was introduced by the commit
bd5ae9288d64 ("nfsd: register pernet ops last, unregister first"),
the original error handling logic was correct.
Fixes: bd5ae9288d64 ("nfsd: register pernet ops last, unregister first")
Cc: stable@vger.kernel.org
Signed-off-by: Patrick Ho <Patrick.Ho@netapp.com>
Acked-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
"Critical bug fixes:
- Fix crash in NLM TEST procedure
- NFSv4.1+ backchannel not restored after PATH_DOWN"
* tag 'nfsd-5.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: back channel stuck in SEQ4_STATUS_CB_PATH_DOWN
NLM: Fix svcxdr_encode_owner()
|
|
DRC bucket pruning is done by nfsd_cache_lookup(), which is part of
every NFSv2 and NFSv3 dispatch (ie, it's done while the client is
waiting).
I added a trace_printk() in prune_bucket() to see just how long
it takes to prune. Here are two ends of the spectrum:
prune_bucket: Scanned 1 and freed 0 in 90 ns, 62 entries remaining
prune_bucket: Scanned 2 and freed 1 in 716 ns, 63 entries remaining
...
prune_bucket: Scanned 75 and freed 74 in 34149 ns, 1 entries remaining
Pruning latency is noticeable on fast transports with fast storage.
By noticeable, I mean that the latency measured here in the worst
case is the same order of magnitude as the round trip time for
cached server operations.
We could do something like moving expired entries to an expired list
and then free them later instead of freeing them right in
prune_bucket(). But simply limiting the number of entries that can
be pruned by a lookup is simple and retains more entries in the
cache, making the DRC somewhat more effective.
Comparison with a 70/30 fio 8KB 12 thread direct I/O test:
Before:
write: IOPS=61.6k, BW=481MiB/s (505MB/s)(14.1GiB/30001msec); 0 zone resets
WRITE:
1848726 ops (30%)
avg bytes sent per op: 8340 avg bytes received per op: 136
backlog wait: 0.635158 RTT: 0.128525 total execute time: 0.827242 (milliseconds)
After:
write: IOPS=63.0k, BW=492MiB/s (516MB/s)(14.4GiB/30001msec); 0 zone resets
WRITE:
1891144 ops (30%)
avg bytes sent per op: 8340 avg bytes received per op: 136
backlog wait: 0.616114 RTT: 0.126842 total execute time: 0.805348 (milliseconds)
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
When the back channel enters SEQ4_STATUS_CB_PATH_DOWN state, the client
recovers by sending BIND_CONN_TO_SESSION but the server fails to recover
the back channel and leaves it as NFSD4_CB_DOWN.
Fix by enhancing nfsd4_bind_conn_to_session to probe the back channel
by calling nfsd4_probe_callback.
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Pull SCSI updates from James Bottomley:
"This series consists of the usual driver updates (ufs, qla2xxx,
target, smartpqi, lpfc, mpt3sas).
The core change causing the most churn was replacing the command
request field request with a macro, allowing us to offset map to it
and remove the redundant field; the same was also done for the tag
field.
The most impactful change is the final removal of scsi_ioctl, which
has been deprecated for over a decade"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (293 commits)
scsi: ufs: Fix ufshcd_request_sense_async() for Samsung KLUFG8RHDA-B2D1
scsi: ufs: ufs-exynos: Fix static checker warning
scsi: mpt3sas: Use the proper SCSI midlayer interfaces for PI
scsi: lpfc: Use the proper SCSI midlayer interfaces for PI
scsi: lpfc: Copyright updates for 14.0.0.1 patches
scsi: lpfc: Update lpfc version to 14.0.0.1
scsi: lpfc: Add bsg support for retrieving adapter cmf data
scsi: lpfc: Add cmf_info sysfs entry
scsi: lpfc: Add debugfs support for cm framework buffers
scsi: lpfc: Add support for maintaining the cm statistics buffer
scsi: lpfc: Add rx monitoring statistics
scsi: lpfc: Add support for the CM framework
scsi: lpfc: Add cmfsync WQE support
scsi: lpfc: Add support for cm enablement buffer
scsi: lpfc: Add cm statistics buffer support
scsi: lpfc: Add EDC ELS support
scsi: lpfc: Expand FPIN and RDF receive logging
scsi: lpfc: Add MIB feature enablement support
scsi: lpfc: Add SET_HOST_DATA mbox cmd to pass date/time info to firmware
scsi: fc: Add EDC ELS definition
...
|
|
Pull nfsd updates from Chuck Lever:
"New features:
- Support for server-side disconnect injection via debugfs
- Protocol definitions for new RPC_AUTH_TLS authentication flavor
Performance improvements:
- Reduce page allocator traffic in the NFSD splice read actor
- Reduce CPU utilization in svcrdma's Send completion handler
Notable bug fixes:
- Stabilize lockd operation when re-exporting NFS mounts
- Fix the use of %.*s in NFSD tracepoints
- Fix /proc/sys/fs/nfs/nsm_use_hostnames"
* tag 'nfsd-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (31 commits)
nfsd: fix crash on LOCKT on reexported NFSv3
nfs: don't allow reexport reclaims
lockd: don't attempt blocking locks on nfs reexports
nfs: don't atempt blocking locks on nfs reexports
Keep read and write fds with each nlm_file
lockd: update nlm_lookup_file reexport comment
nlm: minor refactoring
nlm: minor nlm_lookup_file argument change
lockd: lockd server-side shouldn't set fl_ops
SUNRPC: Add documentation for the fail_sunrpc/ directory
SUNRPC: Server-side disconnect injection
SUNRPC: Move client-side disconnect injection
SUNRPC: Add a /sys/kernel/debug/fail_sunrpc/ directory
svcrdma: xpt_bc_xprt is already clear in __svc_rdma_free()
nfsd4: Fix forced-expiry locking
rpc: fix gss_svc_init cleanup on failure
SUNRPC: Add RPC_AUTH_TLS protocol numbers
lockd: change the proc_handler for nsm_use_hostnames
sysctl: introduce new proc handler proc_dobool
SUNRPC: Fix a NULL pointer deref in trace_svc_stats_latency()
...
|
|
Unlike other filesystems, NFSv3 tries to use fl_file in the GETLK case.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
In the reexport case, nfsd is currently passing along locks with the
reclaim bit set. The client sends a new lock request, which is granted
if there's currently no conflict--even if it's possible a conflicting
lock could have been briefly held in the interim.
We don't currently have any way to safely grant reclaim, so for now
let's just deny them all.
I'm doing this by passing the reclaim bit to nfs and letting it fail the
call, with the idea that eventually the client might be able to do
something more forgiving here.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
NFS implements blocking locks by blocking inside its lock method. In
the reexport case, this blocks the nfs server thread, which could lead
to deadlocks since an nfs server thread might be required to unlock the
conflicting lock. It also causes a crash, since the nfs server thread
assumes it can free the lock when its lm_notify lock callback is called.
Ideal would be to make the nfs lock method return without blocking in
this case, but for now it works just not to attempt blocking locks. The
difference is just that the original client will have to poll (as it
does in the v4.0 case) instead of getting a callback when the lock's
available.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
We shouldn't really be using a read-only file descriptor to take a write
lock.
Most filesystems will put up with it. But NFS, for example, won't.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
We added CONFIG_MANDATORY_FILE_LOCKING in 2015, and soon after turned it
off in Fedora and RHEL8. Several other distros have followed suit.
I've heard of one problem in all that time: Someone migrated from an
older distro that supported "-o mand" to one that didn't, and the host
had a fstab entry with "mand" in it which broke on reboot. They didn't
actually _use_ mandatory locking so they just removed the mount option
and moved on.
This patch rips out mandatory locking support wholesale from the kernel,
along with the Kconfig option and the Documentation file. It also
changes the mount code to ignore the "mand" mount option instead of
erroring out, and to throw a big, ugly warning.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
|
|
This should use the network-namespace-wide client_lock, not the
per-client cl_lock.
You shouldn't see any bugs unless you're actually using the
forced-expiry interface introduced by 89c905beccbb.
Fixes: 89c905beccbb "nfsd: allow forced expiration of NFSv4 clients"
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Including one's name in copyright claims is appropriate. Including it
in random comments is just vanity. After 2 decades, it is time for
these to be gone.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Clean up.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|