summaryrefslogtreecommitdiff
path: root/fs/nfs
AgeCommit message (Collapse)AuthorFilesLines
2015-09-18nfs: fix high load average due to callback thread sleepingJeff Layton1-3/+3
commit 5d05e54af3cdbb13cf19c557ff2184781b91a22c upstream. Chuck pointed out a problem that crept in with commit 6ffa30d3f734 (nfs: don't call blocking operations while !TASK_RUNNING). Linux counts tasks in uninterruptible sleep against the load average, so this caused the system's load average to be pinned at at least 1 when there was a NFSv4.1+ mount active. Not a huge problem, but it's probably worth fixing before we get too many complaints about it. This patch converts the code back to use TASK_INTERRUPTIBLE sleep, simply has it flush any signals on each loop iteration. In practice no one should really be signalling this thread at all, so I think this is reasonably safe. With this change, there's also no need to game the hung task watchdog so we can also convert the schedule_timeout call back to a normal schedule. Reported-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Tested-by: Chuck Lever <chuck.lever@oracle.com> Fixes: commit 6ffa30d3f734 (“nfs: don't call blocking . . .”) Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> [lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li <lizefan@huawei.com>
2015-09-18nfs: don't call blocking operations while !TASK_RUNNINGJeff Layton1-3/+5
commit 6ffa30d3f734d4f6b478081dfc09592021028f90 upstream. Bruce reported seeing this warning pop when mounting using v4.1: ------------[ cut here ]------------ WARNING: CPU: 1 PID: 1121 at kernel/sched/core.c:7300 __might_sleep+0xbd/0xd0() do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff810ff58f>] prepare_to_wait+0x2f/0x90 Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc fscache ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_pcm snd_timer ppdev joydev snd virtio_console virtio_balloon pcspkr serio_raw parport_pc parport pvpanic floppy soundcore i2c_piix4 virtio_blk virtio_net qxl drm_kms_helper ttm drm virtio_pci virtio_ring ata_generic virtio pata_acpi CPU: 1 PID: 1121 Comm: nfsv4.1-svc Not tainted 3.19.0-rc4+ #25 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153950- 04/01/2014 0000000000000000 000000004e5e3f73 ffff8800b998fb48 ffffffff8186ac78 0000000000000000 ffff8800b998fba0 ffff8800b998fb88 ffffffff810ac9da ffff8800b998fb68 ffffffff81c923e7 00000000000004d9 0000000000000000 Call Trace: [<ffffffff8186ac78>] dump_stack+0x4c/0x65 [<ffffffff810ac9da>] warn_slowpath_common+0x8a/0xc0 [<ffffffff810aca65>] warn_slowpath_fmt+0x55/0x70 [<ffffffff810ff58f>] ? prepare_to_wait+0x2f/0x90 [<ffffffff810ff58f>] ? prepare_to_wait+0x2f/0x90 [<ffffffff810dd2ad>] __might_sleep+0xbd/0xd0 [<ffffffff8124c973>] kmem_cache_alloc_trace+0x243/0x430 [<ffffffff810d941e>] ? groups_alloc+0x3e/0x130 [<ffffffff810d941e>] groups_alloc+0x3e/0x130 [<ffffffffa0301b1e>] svcauth_unix_accept+0x16e/0x290 [sunrpc] [<ffffffffa0300571>] svc_authenticate+0xe1/0xf0 [sunrpc] [<ffffffffa02fc564>] svc_process_common+0x244/0x6a0 [sunrpc] [<ffffffffa02fd044>] bc_svc_process+0x1c4/0x260 [sunrpc] [<ffffffffa03d5478>] nfs41_callback_svc+0x128/0x1f0 [nfsv4] [<ffffffff810ff970>] ? wait_woken+0xc0/0xc0 [<ffffffffa03d5350>] ? nfs4_callback_svc+0x60/0x60 [nfsv4] [<ffffffff810d45bf>] kthread+0x11f/0x140 [<ffffffff810ea815>] ? local_clock+0x15/0x30 [<ffffffff810d44a0>] ? kthread_create_on_node+0x250/0x250 [<ffffffff81874bfc>] ret_from_fork+0x7c/0xb0 [<ffffffff810d44a0>] ? kthread_create_on_node+0x250/0x250 ---[ end trace 675220a11e30f4f2 ]--- nfs41_callback_svc does most of its work while in TASK_INTERRUPTIBLE, which is just wrong. Fix that by finishing the wait immediately if we've found that the list has something on it. Also, we don't expect this kthread to accept signals, so we should be using a TASK_UNINTERRUPTIBLE sleep instead. That however, opens us up hung task warnings from the watchdog, so have the schedule_timeout wake up every 60s if there's no callback activity. Reported-by: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> [lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li <lizefan@huawei.com>
2015-06-19NFSv4: Don't call put_rpccred() under the rcu_read_lock()Trond Myklebust1-1/+1
commit 7c0af9ffb7bb4e5355470fa60b3eb711ddf226fa upstream. put_rpccred() can sleep. Fixes: 8f649c3762547 ("NFSv4: Fix the locking in nfs_inode_reclaim_delegation()") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> [lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li <lizefan@huawei.com>
2015-06-19NFSv4.1: Fix a kfree() of uninitialised pointers in decode_cb_sequence_argsTrond Myklebust1-1/+3
commit d8ba1f971497c19cf80da1ea5391a46a5f9fbd41 upstream. If the call to decode_rc_list() fails due to a memory allocation error, then we need to truncate the array size to ensure that we only call kfree() on those pointer that were allocated. Reported-by: David Ramos <daramos@stanford.edu> Fixes: 4aece6a19cf7f ("nfs41: cb_sequence xdr implementation") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Zefan Li <lizefan@huawei.com>
2015-04-14move d_rcu from overlapping d_child to overlapping d_aliasAl Viro1-1/+1
commit 946e51f2bf37f1656916eb75bd0742ba33983c28 upstream. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> [bwh: Backported to 3.2: - Apply name changes in all the different places we use d_alias and d_child - Move the WARN_ON() in __d_free() to d_free() as we don't have dentry_free()] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> [lizf: Backported to 3.4: - adjust context - need one more name change in debugfs]
2015-02-02NFSv4: Ensure that we remove NFSv4.0 delegations when state has expiredTrond Myklebust1-1/+23
commit 4dfd4f7af0afd201706ad186352ca423b0f17d4b upstream. NFSv4.0 does not have TEST_STATEID/FREE_STATEID functionality, so unlike NFSv4.1, the recovery procedure when stateids have expired or have been revoked requires us to just forget the delegation. http://lkml.kernel.org/r/CAN-5tyHwG=Cn2Q9KsHWadewjpTTy_K26ee+UnSvHvG4192p-Xw@mail.gmail.com Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Zefan Li <lizefan@huawei.com>
2015-02-02nfs: Fix use of uninitialized variable in nfs_getattr()Jan Kara1-1/+1
commit 16caf5b6101d03335b386e77e9e14136f989be87 upstream. Variable 'err' needn't be initialized when nfs_getattr() uses it to check whether it should call generic_fillattr() or not. That can result in spurious error returns. Initialize 'err' properly. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Zefan Li <lizefan@huawei.com>
2015-02-02NFSv4.1: Fix an NFSv4.1 state renewal regressionAndy Adamson2-3/+11
commit d1f456b0b9545f1606a54cd17c20775f159bd2ce upstream. Commit 2f60ea6b8ced ("NFSv4: The NFSv4.0 client must send RENEW calls if it holds a delegation") set the NFS4_RENEW_TIMEOUT flag in nfs4_renew_state, and does not put an nfs41_proc_async_sequence call, the NFSv4.1 lease renewal heartbeat call, on the wire to renew the NFSv4.1 state if the flag was not set. The NFS4_RENEW_TIMEOUT flag is set when "now" is after the last renewal (cl_last_renewal) plus the lease time divided by 3. This is arbitrary and sometimes does the following: In normal operation, the only way a future state renewal call is put on the wire is via a call to nfs4_schedule_state_renewal, which schedules a nfs4_renew_state workqueue task. nfs4_renew_state determines if the NFS4_RENEW_TIMEOUT should be set, and the calls nfs41_proc_async_sequence, which only gets sent if the NFS4_RENEW_TIMEOUT flag is set. Then the nfs41_proc_async_sequence rpc_release function schedules another state remewal via nfs4_schedule_state_renewal. Without this change we can get into a state where an application stops accessing the NFSv4.1 share, state renewal calls stop due to the NFS4_RENEW_TIMEOUT flag _not_ being set. The only way to recover from this situation is with a clientid re-establishment, once the application resumes and the server has timed out the lease and so returns NFS4ERR_BAD_SESSION on the subsequent SEQUENCE operation. An example application: open, lock, write a file. sleep for 6 * lease (could be less) ulock, close. In the above example with NFSv4.1 delegations enabled, without this change, there are no OP_SEQUENCE state renewal calls during the sleep, and the clientid is recovered due to lease expiration on the close. This issue does not occur with NFSv4.1 delegations disabled, nor with NFSv4.0, with or without delegations enabled. Signed-off-by: Andy Adamson <andros@netapp.com> Link: http://lkml.kernel.org/r/1411486536-23401-1-git-send-email-andros@netapp.com Fixes: 2f60ea6b8ced (NFSv4: The NFSv4.0 client must send RENEW calls...) Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> [lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li <lizefan@huawei.com>
2015-02-02NFSv4: fix open/lock state recovery error handlingTrond Myklebust1-10/+6
commit df817ba35736db2d62b07de6f050a4db53492ad8 upstream. The current open/lock state recovery unfortunately does not handle errors such as NFS4ERR_CONN_NOT_BOUND_TO_SESSION correctly. Instead of looping, just proceeds as if the state manager is finished recovering. This patch ensures that we loop back, handle higher priority errors and complete the open/lock state recovery. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Zefan Li <lizefan@huawei.com>
2014-12-01NFSv4: Fix another bug in the close/open_downgrade codeTrond Myklebust1-15/+15
commit cd9288ffaea4359d5cfe2b8d264911506aed26a4 upstream. James Drew reports another bug whereby the NFS client is now sending an OPEN_DOWNGRADE in a situation where it should really have sent a CLOSE: the client is opening the file for O_RDWR, but then trying to do a downgrade to O_RDONLY, which is not allowed by the NFSv4 spec. Reported-by: James Drews <drews@engr.wisc.edu> Link: http://lkml.kernel.org/r/541AD7E5.8020409@engr.wisc.edu Fixes: aee7af356e15 (NFSv4: Fix problems with close in the presence...) Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> [lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li <lizefan@huawei.com>
2014-09-25NFSv4: Fix problems with close in the presence of a delegationTrond Myklebust1-5/+12
commit aee7af356e151494d5014f57b33460b162f181b5 upstream. In the presence of delegations, we can no longer assume that the state->n_rdwr, state->n_rdonly, state->n_wronly reflect the open stateid share mode, and so we need to calculate the initial value for calldata->arg.fmode using the state->flags. Reported-by: James Drews <drews@engr.wisc.edu> Fixes: 88069f77e1ac5 (NFSv41: Fix a potential state leakage when...) Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> [lizf: Backport to 3.4: adjust context] Signed-off-by: Zefan Li <lizefan@huawei.com>
2014-06-08NFSv4.1: Handle NFS4ERR_DELAY when resetting the NFSv4.1 sessionTrond Myklebust1-2/+12
commit c489ee290bdbbace6bb63ebe6ebd4dd605819495 upstream. NFS4ERR_DELAY is a legal reply when we call DESTROY_SESSION. It usually means that the server is busy handling an unfinished RPC request. Just sleep for a second and then retry. We also need to be able to handle the NFS4ERR_BACK_CHAN_BUSY return value. If the NFS server has outstanding callbacks, we just want to similarly sleep & retry. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Rui Xiang <rui.xiang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-08NFSv4.1: Don't decode skipped layoutgetsWeston Andros Adamson1-1/+2
commit 085b7a45c63d3da5be155faab9249a5cab224561 upstream. layoutget's prepare hook can call rpc_exit with status = NFS4_OK (0). Because of this, nfs4_proc_layoutget can't depend on a 0 status to mean that the RPC was successfully sent, received and parsed. To fix this, use the result's len member to see if parsing took place. This fixes the following OOPS -- calling xdr_init_decode() with a buffer length 0 doesn't set the stream's 'p' member and ends up using uninitialized memory in filelayout_decode_layout. BUG: unable to handle kernel paging request at 0000000000008050 IP: [<ffffffff81282e78>] memcpy+0x18/0x120 PGD 0 Oops: 0000 [#1] SMP last sysfs file: /sys/devices/pci0000:00/0000:00:11.0/0000:02:01.0/irq CPU 1 Modules linked in: nfs_layout_nfsv41_files nfs lockd fscache auth_rpcgss nfs_acl autofs4 sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 dm_mirror dm_region_hash dm_log dm_mod ppdev parport_pc parport snd_ens1371 snd_rawmidi snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc e1000 microcode vmware_balloon i2c_piix4 i2c_core sg shpchp ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic ata_piix mptspi mptscsih mptbase scsi_transport_spi [last unloaded: speedstep_lib] Pid: 1665, comm: flush-0:22 Not tainted 2.6.32-356-test-2 #2 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform RIP: 0010:[<ffffffff81282e78>] [<ffffffff81282e78>] memcpy+0x18/0x120 RSP: 0018:ffff88003dfab588 EFLAGS: 00010206 RAX: ffff88003dc42000 RBX: ffff88003dfab610 RCX: 0000000000000009 RDX: 000000003f807ff0 RSI: 0000000000008050 RDI: ffff88003dc42000 RBP: ffff88003dfab5b0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000080 R12: 0000000000000024 R13: ffff88003dc42000 R14: ffff88003f808030 R15: ffff88003dfab6a0 FS: 0000000000000000(0000) GS:ffff880003420000(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 0000000000008050 CR3: 000000003bc92000 CR4: 00000000001407e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process flush-0:22 (pid: 1665, threadinfo ffff88003dfaa000, task ffff880037f77540) Stack: ffffffffa0398ac1 ffff8800397c5940 ffff88003dfab610 ffff88003dfab6a0 <d> ffff88003dfab5d0 ffff88003dfab680 ffffffffa01c150b ffffea0000d82e70 <d> 000000508116713b 0000000000000000 0000000000000000 0000000000000000 Call Trace: [<ffffffffa0398ac1>] ? xdr_inline_decode+0xb1/0x120 [sunrpc] [<ffffffffa01c150b>] filelayout_decode_layout+0xeb/0x350 [nfs_layout_nfsv41_files] [<ffffffffa01c17fc>] filelayout_alloc_lseg+0x8c/0x3c0 [nfs_layout_nfsv41_files] [<ffffffff8150e6ce>] ? __wait_on_bit+0x7e/0x90 Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Rui Xiang <rui.xiang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-08NFSv4.1: Fix a race in pNFS layoutcommitTrond Myklebust2-15/+18
commit a073dbff359f4741013ae4b8395f5364c5e00b48 upstream. We need to clear the NFS_LSEG_LAYOUTCOMMIT bits atomically with the NFS_INO_LAYOUTCOMMIT bit, otherwise we may end up with situations where the two are out of sync. The first half of the problem is to ensure that pnfs_layoutcommit_inode clears the NFS_LSEG_LAYOUTCOMMIT bit through pnfs_list_write_lseg. We still need to keep the reference to those segments until the RPC call is finished, so in order to make it clear _where_ those references come from, we add a helper pnfs_list_write_lseg_done() that cleans up after pnfs_list_write_lseg. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Benny Halevy <bhalevy@tonian.com> [bwh: Backported to 3.2: s/pnfs_put_lseg/put_lseg/] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Rui Xiang <rui.xiang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-08NFS: nfs_getaclargs.acl_len is a size_tChuck Lever1-1/+2
commit 56d08fef2369d5ca9ad2e1fc697f5379fd8af751 upstream. Squelch compiler warnings: fs/nfs/nfs4proc.c: In function ‘__nfs4_get_acl_uncached’: fs/nfs/nfs4proc.c:3811:14: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] fs/nfs/nfs4proc.c:3818:15: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] Introduced by commit bf118a34 "NFSv4: include bitmap in nfsv4 get acl data", Dec 7, 2011. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Rui Xiang <rui.xiang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-08NFSv4.1: integer overflow in decode_cb_sequence_args()Dan Carpenter1-3/+3
commit 0439f31c35d1da0b28988b308ea455e38e6a350d upstream. This seems like it could overflow on 32 bits. Use kmalloc_array() which has overflow protection built in. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Rui Xiang <rui.xiang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-08NFSv4 wait on recovery for async session errorsAndy Adamson1-2/+1
commit 4a82fd7c4e78a1b7a224f9ae8bb7e1fd95f670e0 upstream. When the state manager is processing the NFS4CLNT_DELEGRETURN flag, session draining is off, but DELEGRETURN can still get a session error. The async handler calls nfs4_schedule_session_recovery returns -EAGAIN, and the DELEGRETURN done then restarts the RPC task in the prepare state. With the state manager still processing the NFS4CLNT_DELEGRETURN flag with session draining off, these DELEGRETURNs will cycle with errors filling up the session slots. This prevents OPEN reclaims (from nfs_delegation_claim_opens) required by the NFS4CLNT_DELEGRETURN state manager processing from completing, hanging the state manager in the __rpc_wait_for_completion_task in nfs4_run_open_task as seen in this kernel thread dump: kernel: 4.12.32.53-ma D 0000000000000000 0 3393 2 0x00000000 kernel: ffff88013995fb60 0000000000000046 ffff880138cc5400 ffff88013a9df140 kernel: ffff8800000265c0 ffffffff8116eef0 ffff88013fc10080 0000000300000001 kernel: ffff88013a4ad058 ffff88013995ffd8 000000000000fbc8 ffff88013a4ad058 kernel: Call Trace: kernel: [<ffffffff8116eef0>] ? cache_alloc_refill+0x1c0/0x240 kernel: [<ffffffffa0358110>] ? rpc_wait_bit_killable+0x0/0xa0 [sunrpc] kernel: [<ffffffffa0358152>] rpc_wait_bit_killable+0x42/0xa0 [sunrpc] kernel: [<ffffffff8152914f>] __wait_on_bit+0x5f/0x90 kernel: [<ffffffffa0358110>] ? rpc_wait_bit_killable+0x0/0xa0 [sunrpc] kernel: [<ffffffff815291f8>] out_of_line_wait_on_bit+0x78/0x90 kernel: [<ffffffff8109b520>] ? wake_bit_function+0x0/0x50 kernel: [<ffffffffa035810d>] __rpc_wait_for_completion_task+0x2d/0x30 [sunrpc] kernel: [<ffffffffa040d44c>] nfs4_run_open_task+0x11c/0x160 [nfs] kernel: [<ffffffffa04114e7>] nfs4_open_recover_helper+0x87/0x120 [nfs] kernel: [<ffffffffa0411646>] nfs4_open_recover+0xc6/0x150 [nfs] kernel: [<ffffffffa040cc6f>] ? nfs4_open_recoverdata_alloc+0x2f/0x60 [nfs] kernel: [<ffffffffa0414e1a>] nfs4_open_delegation_recall+0x6a/0xa0 [nfs] kernel: [<ffffffffa0424020>] nfs_end_delegation_return+0x120/0x2e0 [nfs] kernel: [<ffffffff8109580f>] ? queue_work+0x1f/0x30 kernel: [<ffffffffa0424347>] nfs_client_return_marked_delegations+0xd7/0x110 [nfs] kernel: [<ffffffffa04225d8>] nfs4_run_state_manager+0x548/0x620 [nfs] kernel: [<ffffffffa0422090>] ? nfs4_run_state_manager+0x0/0x620 [nfs] kernel: [<ffffffff8109b0f6>] kthread+0x96/0xa0 kernel: [<ffffffff8100c20a>] child_rip+0xa/0x20 kernel: [<ffffffff8109b060>] ? kthread+0x0/0xa0 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20 The state manager can not therefore process the DELEGRETURN session errors. Change the async handler to wait for recovery on session errors. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> [bwh: Backported to 3.2: - Adjust context - There's no restart_call label] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Rui Xiang <rui.xiang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-24NFS: Fix a delegation callback raceTrond Myklebust1-4/+7
commit 755a48a7a4eb05b9c8424e3017d947b2961a60e0 upstream. The clean-up in commit 36281caa839f ended up removing a NULL pointer check that is needed in order to prevent an Oops in nfs_async_inode_return_delegation(). Reported-by: "Yan, Zheng" <zheng.z.yan@intel.com> Link: http://lkml.kernel.org/r/5313E9F6.2020405@intel.com Fixes: 36281caa839f (NFSv4: Further clean-ups of delegation stateid validation) Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-20nfs: tear down caches in nfs_init_writepagecache when allocation failsJeff Layton1-2/+8
commit 3dd4765fce04c0b4af1e0bc4c0b10f906f95fabc upstream. ...and ensure that we tear down the nfs_commit_data cache too when unloading the module. Cc: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> [bwh: Backported to 3.2: drop the nfs_cdata_cachep cleanup; it doesn't exist] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Li Zefan <lizefan@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-13nfs4.1: properly handle ENOTSUP in SECINFO_NO_NAMEWeston Andros Adamson1-2/+2
commit 78b19bae0813bd6f921ca58490196abd101297bd upstream. Don't check for -NFS4ERR_NOTSUPP, it's already been mapped to -ENOTSUPP by nfs4_stat_to_errno. This allows the client to mount v4.1 servers that don't support SECINFO_NO_NAME by falling back to the "guess and check" method of nfs4_find_root_sec. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-13NFSv4: OPEN must handle the NFS4ERR_IO return code correctlyTrond Myklebust1-16/+31
commit c7848f69ec4a8c03732cde5c949bd2aa711a9f4b upstream. decode_op_hdr() cannot distinguish between an XDR decoding error and the perfectly valid errorcode NFS4ERR_IO. This is normally not a problem, but for the particular case of OPEN, we need to be able to increment the NFSv4 open sequence id when the server returns a valid response. Reported-by: J Bruce Fields <bfields@fieldses.org> Link: http://lkml.kernel.org/r/20131204210356.GA19452@fieldses.org Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-12-20nfs: fix do_div() warning by instead using sector_div()Helge Deller1-1/+1
commit 3873d064b8538686bbbd4b858dc8a07db1f7f43a upstream. When compiling a 32bit kernel with CONFIG_LBDAF=n the compiler complains like shown below. Fix this warning by instead using sector_div() which is provided by the kernel.h header file. fs/nfs/blocklayout/extents.c: In function ‘normalize’: include/asm-generic/div64.h:43:28: warning: comparison of distinct pointer types lacks a cast [enabled by default] fs/nfs/blocklayout/extents.c:47:13: note: in expansion of macro ‘do_div’ nfs/blocklayout/extents.c:47:2: warning: right shift count >= width of type [enabled by default] fs/nfs/blocklayout/extents.c:47:2: warning: passing argument 1 of ‘__div64_32’ from incompatible pointer type [enabled by default] include/asm-generic/div64.h:35:17: note: expected ‘uint64_t *’ but argument is of type ‘sector_t *’ extern uint32_t __div64_32(uint64_t *dividend, uint32_t divisor); Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-12-12NFSv4: Update list of irrecoverable errors on DELEGRETURNTrond Myklebust1-2/+8
commit c97cf606e43b85a6cf158b810375dd77312024db upstream. If the DELEGRETURN errors out with something like NFS4ERR_BAD_STATEID then there is no recovery possible. Just quit without returning an error. Also, note that the client must not assume that the NFSv4 lease has been renewed when it sees an error on DELEGRETURN. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-29NFSv4: Fix a use-after-free situation in _nfs4_proc_getlk()Trond Myklebust1-0/+1
commit a6f951ddbdfb7bd87d31a44f61abe202ed6ce57f upstream. In nfs4_proc_getlk(), when some error causes a retry of the call to _nfs4_proc_getlk(), we can end up with Oopses of the form BUG: unable to handle kernel NULL pointer dereference at 0000000000000134 IP: [<ffffffff8165270e>] _raw_spin_lock+0xe/0x30 <snip> Call Trace: [<ffffffff812f287d>] _atomic_dec_and_lock+0x4d/0x70 [<ffffffffa053c4f2>] nfs4_put_lock_state+0x32/0xb0 [nfsv4] [<ffffffffa053c585>] nfs4_fl_release_lock+0x15/0x20 [nfsv4] [<ffffffffa0522c06>] _nfs4_proc_getlk.isra.40+0x146/0x170 [nfsv4] [<ffffffffa052ad99>] nfs4_proc_lock+0x399/0x5a0 [nfsv4] The problem is that we don't clear the request->fl_ops after the first try and so when we retry, nfs4_set_lock_state() exits early without setting the lock stateid. Regression introduced by commit 70cc6487a4e08b8698c0e2ec935fb48d10490162 (locks: make ->lock release private data before returning in GETLK case) Reported-by: Weston Andros Adamson <dros@netapp.com> Reported-by: Jorge Mora <mora@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-29SUNRPC handle EKEYEXPIRED in call_refreshresultAndy Adamson5-86/+3
commit eb96d5c97b0825d542e9c4ba5e0a22b519355166 upstream. Currently, when an RPCSEC_GSS context has expired or is non-existent and the users (Kerberos) credentials have also expired or are non-existent, the client receives the -EKEYEXPIRED error and tries to refresh the context forever. If an application is performing I/O, or other work against the share, the application hangs, and the user is not prompted to refresh/establish their credentials. This can result in a denial of service for other users. Users are expected to manage their Kerberos credential lifetimes to mitigate this issue. Move the -EKEYEXPIRED handling into the RPC layer. Try tk_cred_retry number of times to refresh the gss_context, and then return -EACCES to the application. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> [bwh: Backported to 3.2: - Adjust context - Drop change to nfs4_handle_reclaim_lease_error()] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-29nfs: don't allow nfs_find_actor to match inodes of the wrong typeJeff Layton1-0/+2
commit f6488c9ba51d65410e2dbc4345413c0d9120971e upstream. Benny Halevy reported the following oops when testing RHEL6: <7>nfs_update_inode: inode 892950 mode changed, 0040755 to 0100644 <1>BUG: unable to handle kernel NULL pointer dereference at (null) <1>IP: [<ffffffffa02a52c5>] nfs_closedir+0x15/0x30 [nfs] <4>PGD 81448a067 PUD 831632067 PMD 0 <4>Oops: 0000 [#1] SMP <4>last sysfs file: /sys/kernel/mm/redhat_transparent_hugepage/enabled <4>CPU 6 <4>Modules linked in: fuse bonding 8021q garp ebtable_nat ebtables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi softdog bridge stp llc xt_physdev ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_multiport iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 dm_round_robin dm_multipath objlayoutdriver2(U) nfs(U) lockd fscache auth_rpcgss nfs_acl sunrpc vhost_net macvtap macvlan tun kvm_intel kvm be2net igb dca ptp pps_core microcode serio_raw sg iTCO_wdt iTCO_vendor_support i7core_edac edac_core shpchp ext4 mbcache jbd2 sd_mod crc_t10dif ahci dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan] <4> <4>Pid: 6332, comm: dd Not tainted 2.6.32-358.el6.x86_64 #1 HP ProLiant DL170e G6 /ProLiant DL170e G6 <4>RIP: 0010:[<ffffffffa02a52c5>] [<ffffffffa02a52c5>] nfs_closedir+0x15/0x30 [nfs] <4>RSP: 0018:ffff88081458bb98 EFLAGS: 00010292 <4>RAX: ffffffffa02a52b0 RBX: 0000000000000000 RCX: 0000000000000003 <4>RDX: ffffffffa02e45a0 RSI: ffff88081440b300 RDI: ffff88082d5f5760 <4>RBP: ffff88081458bba8 R08: 0000000000000000 R09: 0000000000000000 <4>R10: 0000000000000772 R11: 0000000000400004 R12: 0000000040000008 <4>R13: ffff88082d5f5760 R14: ffff88082d6e8800 R15: ffff88082f12d780 <4>FS: 00007f728f37e700(0000) GS:ffff8800456c0000(0000) knlGS:0000000000000000 <4>CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b <4>CR2: 0000000000000000 CR3: 0000000831279000 CR4: 00000000000007e0 <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 <4>Process dd (pid: 6332, threadinfo ffff88081458a000, task ffff88082fa0e040) <4>Stack: <4> 0000000040000008 ffff88081440b300 ffff88081458bbf8 ffffffff81182745 <4><d> ffff88082d5f5760 ffff88082d6e8800 ffff88081458bbf8 ffffffffffffffea <4><d> ffff88082f12d780 ffff88082d6e8800 ffffffffa02a50a0 ffff88082d5f5760 <4>Call Trace: <4> [<ffffffff81182745>] __fput+0xf5/0x210 <4> [<ffffffffa02a50a0>] ? do_open+0x0/0x20 [nfs] <4> [<ffffffff81182885>] fput+0x25/0x30 <4> [<ffffffff8117e23e>] __dentry_open+0x27e/0x360 <4> [<ffffffff811c397a>] ? inotify_d_instantiate+0x2a/0x60 <4> [<ffffffff8117e4b9>] lookup_instantiate_filp+0x69/0x90 <4> [<ffffffffa02a6679>] nfs_intent_set_file+0x59/0x90 [nfs] <4> [<ffffffffa02a686b>] nfs_atomic_lookup+0x1bb/0x310 [nfs] <4> [<ffffffff8118e0c2>] __lookup_hash+0x102/0x160 <4> [<ffffffff81225052>] ? selinux_inode_permission+0x72/0xb0 <4> [<ffffffff8118e76a>] lookup_hash+0x3a/0x50 <4> [<ffffffff81192a4b>] do_filp_open+0x2eb/0xdd0 <4> [<ffffffff8104757c>] ? __do_page_fault+0x1ec/0x480 <4> [<ffffffff8119f562>] ? alloc_fd+0x92/0x160 <4> [<ffffffff8117de79>] do_sys_open+0x69/0x140 <4> [<ffffffff811811f6>] ? sys_lseek+0x66/0x80 <4> [<ffffffff8117df90>] sys_open+0x20/0x30 <4> [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b <4>Code: 65 48 8b 04 25 c8 cb 00 00 83 a8 44 e0 ff ff 01 5b 41 5c c9 c3 90 55 48 89 e5 53 48 83 ec 08 0f 1f 44 00 00 48 8b 9e a0 00 00 00 <48> 8b 3b e8 13 0c f7 ff 48 89 df e8 ab 3d ec e0 48 83 c4 08 31 <1>RIP [<ffffffffa02a52c5>] nfs_closedir+0x15/0x30 [nfs] <4> RSP <ffff88081458bb98> <4>CR2: 0000000000000000 I think this is ultimately due to a bug on the server. The client had previously found a directory dentry. It then later tried to do an atomic open on a new (regular file) dentry. The attributes it got back had the same filehandle as the previously found directory inode. It then tried to put the filp because it failed the aops tests for O_DIRECT opens, and oopsed here because the ctx was still NULL. Obviously the root cause here is a server issue, but we can take steps to mitigate this on the client. When nfs_fhget is called, we always know what type of inode it is. In the event that there's a broken or malicious server on the other end of the wire, the client can end up crashing because the wrong ops are set on it. Have nfs_find_actor check that the inode type is correct after checking the fileid. The fileid check should rarely ever match, so it should only rarely ever get to this check. In the case where we have a broken server, we may see two different inodes with the same i_ino, but the client should be able to cope with them without crashing. This should fix the oops reported here: https://bugzilla.redhat.com/show_bug.cgi?id=913660 Reported-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Rui Xiang <rui.xiang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-06-07NFSv4: Fix a thinko in nfs4_try_open_cachedTrond Myklebust1-1/+1
commit f448badd34700ae728a32ba024249626d49c10e1 upstream. We need to pass the full open mode flags to nfs_may_open() when doing a delegated open. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-05-08NFSv4: Handle NFS4ERR_DELAY and NFS4ERR_GRACE in nfs4_open_delegation_recallTrond Myklebust1-0/+6
commit 8b6cc4d6f841d31f72fe7478453759166d366274 upstream. A server shouldn't normally return NFS4ERR_GRACE if the client holds a delegation, since no conflicting lock reclaims can be granted, however the spec does not require the server to grant the open in this instance Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-04-05pnfs-block: removing DM device maybe cause oops when call dev_removefanchaoting1-2/+2
commit 4376c94618c26225e69e17b7c91169c45a90b292 upstream. when pnfs block using device mapper,if umounting later,it maybe cause oops. we apply "1 + sizeof(bl_umount_request)" memory for msg->data, the memory maybe overflow when we do "memcpy(&dataptr [sizeof(bl_msg)], &bl_umount_request, sizeof(bl_umount_request))", because the size of bl_msg is more than 1 byte. Signed-off-by: fanchaoting<fanchaoting@cn.fujitsu.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-03-14NFS: Don't allow NFS silly-renamed files to be deleted, no signalTrond Myklebust1-7/+13
commit 5a7a613a47a715711b3f2d3322a0eac21d459166 upstream. Commit 73ca100 broke the code that prevents the client from deleting a silly renamed dentry. This affected "delete on last close" semantics as after that commit, nothing prevented removal of silly-renamed files. As a result, a process holding a file open could easily get an ESTALE on the file in a directory where some other process issued 'rm -rf some_dir_containing_the_file' twice. Before the commit, any attempt at unlinking silly renamed files would fail inside may_delete() with -EBUSY because of the DCACHE_NFSFS_RENAMED flag. The following testcase demonstrates the problem: tail -f /nfsmnt/dir/file & rm -rf /nfsmnt/dir rm -rf /nfsmnt/dir # second removal does not fail, 'tail' process receives ESTALE The problem with the above commit is that it unhashes the old and new dentries from the lookup path, even in the normal case when a signal is not encountered and it would have been safe to call d_move. Unfortunately the old dentry has the special DCACHE_NFSFS_RENAMED flag set on it. Unhashing has the side-effect that future lookups call d_alloc(), allocating a new dentry without the special flag for any silly-renamed files. As a result, subsequent calls to unlink silly renamed files do not fail but allow the removal to go through. This will result in ESTALE errors for any other process doing operations on the file. To fix this, go back to using d_move on success. For the signal case, it's unclear what we may safely do beyond d_drop. Reported-by: Dave Wysochanski <dwysocha@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-02-28umount oops when remove blocklayoutdriver firstfanchaoting2-0/+2
commit 5a12cca697aca5dfba42a7d4c3356acc0445a2b0 upstream. now pnfs client uses block layout, maybe we can remove blocklayoutdriver first. if we umount later, it can cause oops in unset_pnfs_layoutdriver. because nfss->pnfs_curr_ld->clear_layoutdriver is invalid. reproduce it: modprobe blocklayoutdriver mount -t nfs4 -o minorversion=1 pnfsip:/ /mnt/ rmmod blocklayoutdriver umount /mnt then you can see following CPU 0 Pid: 17023, comm: umount.nfs4 Tainted: GF O 3.7.0-rc6-pnfs #1 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform RIP: 0010:[<ffffffffa04cfe6d>] [<ffffffffa04cfe6d>] unset_pnfs_layoutdriver+0x1d/0x70 [nfsv4] RSP: 0018:ffff8800022d9e48 EFLAGS: 00010286 RAX: ffffffffa04a1b00 RBX: ffff88000b013800 RCX: 0000000000000001 RDX: ffffffff81ae8ee0 RSI: ffff880001ee94b8 RDI: ffff88000b013800 RBP: ffff8800022d9e58 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff880001ee9400 R13: ffff8800105978c0 R14: 00007fff25846c08 R15: 0000000001bba550 FS: 00007f45ae7f0700(0000) GS:ffff880012c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: ffffffffa04a1b38 CR3: 0000000002c0c000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process umount.nfs4 (pid: 17023, threadinfo ffff8800022d8000, task ffff880006e48aa0) Stack: ffff8800105978c0 ffff88000b013800 ffff8800022d9e78 ffffffffa04cd0ce ffff8800022d9e78 ffff88000b013800 ffff8800022d9ea8 ffffffffa04755a7 ffff8800022d9ea8 ffff880002f96400 ffff88000b013800 ffff880002f96400 Call Trace: [<ffffffffa04cd0ce>] nfs4_destroy_server+0x1e/0x30 [nfsv4] [<ffffffffa04755a7>] nfs_free_server+0xb7/0x150 [nfs] [<ffffffffa047d4d5>] nfs_kill_super+0x35/0x40 [nfs] [<ffffffff81178d35>] deactivate_locked_super+0x45/0x70 [<ffffffff8117986a>] deactivate_super+0x4a/0x70 [<ffffffff81193ee2>] mntput_no_expire+0xd2/0x130 [<ffffffff81194d62>] sys_umount+0x72/0xe0 [<ffffffff8154af59>] system_call_fastpath+0x16/0x1b Code: 06 e1 b8 ea ff ff ff eb 9e 0f 1f 44 00 00 55 48 89 e5 53 48 83 ec 08 66 66 66 66 90 48 8b 87 80 03 00 00 48 89 fb 48 85 c0 74 29 <48> 8b 40 38 48 85 c0 74 02 ff d0 48 8b 03 3e ff 48 04 0f 94 c2 RIP [<ffffffffa04cfe6d>] unset_pnfs_layoutdriver+0x1d/0x70 [nfsv4] RSP <ffff8800022d9e48> CR2: ffffffffa04a1b38 ---[ end trace 29f75aaedda058bf ]--- Signed-off-by: fanchaoting<fanchaoting@cn.fujitsu.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-02-04NFS: Don't silently fail setattr() requests on mountpointsTrond Myklebust1-0/+20
commit ab225417825963b6dc66be7ea80f94ac1378dfdf upstream. Ensure that any setattr and getattr requests for junctions and/or mountpoints are sent to the server. Ever since commit 0ec26fd0698 (vfs: automount should ignore LOOKUP_FOLLOW), we have silently dropped any setattr requests to a server-side mountpoint. For referrals, we have silently dropped both getattr and setattr requests. This patch restores the original behaviour for setattr on mountpoints, and tries to do the same for referrals, provided that we have a filehandle... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-11nfs: fix null checking in nfs_get_option_str()Xi Wang1-1/+1
commit e25fbe380c4e3c09afa98bcdcd9d3921443adab8 upstream. The following null pointer check is broken. *option = match_strdup(args); return !option; The pointer `option' must be non-null, and thus `!option' is always false. Use `!*option' instead. The bug was introduced in commit c5cb09b6f8 ("Cleanup: Factor out some cut-and-paste code."). Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-11NFS: Fix calls to drop_nlink()Trond Myklebust1-5/+6
commit 1f018458b30b0d5c535c94e577aa0acbb92e1395 upstream. It is almost always wrong for NFS to call drop_nlink() after removing a file. What we really want is to mark the inode's attributes for revalidation, and we want to ensure that the VFS drops it if we're reasonably sure that this is the final unlink(). Do the former using the usual cache validity flags, and the latter by testing if inode->i_nlink == 1, and clearing it in that case. This also fixes the following warning reported by Neil Brown and Jeff Layton (among others). [634155.004438] WARNING: at /home/abuild/rpmbuild/BUILD/kernel-desktop-3.5.0/lin [634155.004442] Hardware name: Latitude E6510 [634155.004577] crc_itu_t crc32c_intel snd_hwdep snd_pcm snd_timer snd soundcor [634155.004609] Pid: 13402, comm: bash Tainted: G W 3.5.0-36-desktop # [634155.004611] Call Trace: [634155.004630] [<ffffffff8100444a>] dump_trace+0xaa/0x2b0 [634155.004641] [<ffffffff815a23dc>] dump_stack+0x69/0x6f [634155.004653] [<ffffffff81041a0b>] warn_slowpath_common+0x7b/0xc0 [634155.004662] [<ffffffff811832e4>] drop_nlink+0x34/0x40 [634155.004687] [<ffffffffa05bb6c3>] nfs_dentry_iput+0x33/0x70 [nfs] [634155.004714] [<ffffffff8118049e>] dput+0x12e/0x230 [634155.004726] [<ffffffff8116b230>] __fput+0x170/0x230 [634155.004735] [<ffffffff81167c0f>] filp_close+0x5f/0x90 [634155.004743] [<ffffffff81167cd7>] sys_close+0x97/0x100 [634155.004754] [<ffffffff815c3b39>] system_call_fastpath+0x16/0x1b [634155.004767] [<00007f2a73a0d110>] 0x7f2a73a0d10f Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-11NFS: avoid NULL dereference in nfs_destroy_serverNeilBrown1-2/+1
commit f259613a1e4b44a0cf85a5dafd931be96ee7c9e5 upstream. In rare circumstances, nfs_clone_server() of a v2 or v3 server can get an error between setting server->destory (to nfs_destroy_server), and calling nfs_start_lockd (which will set server->nlm_host). If this happens, nfs_clone_server will call nfs_free_server which will call nfs_destroy_server and thence nlmclnt_done(NULL). This causes the NULL to be dereferenced. So add a guard to only call nlmclnt_done() if ->nlm_host is not NULL. The other guards there are irrelevant as nlm_host can only be non-NULL if one of these flags are set - so remove those tests. (Thanks to Trond for this suggestion). This is suitable for any stable kernel since 2.6.25. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-11NFS: Add sequence_priviliged_ops for nfs4_proc_sequence()Bryan Schumaker1-4/+17
commit 6bdb5f213c4344324f600dde885f25768fbd14db upstream. If I mount an NFS v4.1 server to a single client multiple times and then run xfstests over each mountpoint I usually get the client into a state where recovery deadlocks. The server informs the client of a cb_path_down sequence error, the client then does a bind_connection_to_session and checks the status of the lease. I found that bind_connection_to_session sets the NFS4_SESSION_DRAINING flag on the client, but this flag is never unset before nfs4_check_lease() reaches nfs4_proc_sequence(). This causes the client to deadlock, halting all NFS activity to the server. nfs4_proc_sequence() is only called by the state manager, so I can change it to run in privileged mode to bypass the NFS4_SESSION_DRAINING check and avoid the deadlock. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-12-10pnfsblock: fix partial page buffer wirtePeng Tao2-12/+165
commit fe6e1e8d9fad86873eb74a26e80a8f91f9e870b5 upstream. If applications use flock to protect its write range, generic NFS will not do read-modify-write cycle at page cache level. Therefore LD should know how to handle non-sector aligned writes. Otherwise there will be data corruption. Signed-off-by: Peng Tao <tao.peng@emc.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26NFS: Wait for session recovery to finish before returningBryan Schumaker1-2/+1
commit 399f11c3d872bd748e1575574de265a6304c7c43 upstream. Currently, we will schedule session recovery and then return to the caller of nfs4_handle_exception. This works for most cases, but causes a hang on the following test case: Client Server ------ ------ Open file over NFS v4.1 Write to file Expire client Try to lock file The server will return NFS4ERR_BADSESSION, prompting the client to schedule recovery. However, the client will continue placing lock attempts and the open recovery never seems to be scheduled. The simplest solution is to wait for session recovery to run before retrying the lock. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-18NFS: Fix Oopses in nfs_lookup_revalidate and nfs4_lookup_revalidateTrond Myklebust1-2/+2
[Fixed upstream as part of 0b728e1911c, but that's a much larger patch, this is only the nfs portion backported as needed.] Fix the following Oops in 3.5.1: BUG: unable to handle kernel NULL pointer dereference at 0000000000000038 IP: [<ffffffffa03789cd>] nfs_lookup_revalidate+0x2d/0x480 [nfs] PGD 337c63067 PUD 0 Oops: 0000 [#1] SMP CPU 5 Modules linked in: nfs fscache nfsd lockd nfs_acl auth_rpcgss sunrpc af_packet binfmt_misc cpufreq_conservative cpufreq_userspace cpufreq_powersave dm_mod acpi_cpufreq mperf coretemp gpio_ich kvm_intel joydev kvm ioatdma hid_generic igb lpc_ich i7core_edac edac_core ptp serio_raw dca pcspkr i2c_i801 mfd_core sg pps_core usbhid crc32c_intel microcode button autofs4 uhci_hcd ttm drm_kms_helper drm i2c_algo_bit sysimgblt sysfillrect syscopyarea ehci_hcd usbcore usb_common scsi_dh_rdac scsi_dh_emc scsi_dh_hp_sw scsi_dh_alua scsi_dh edd fan ata_piix thermal processor thermal_sys Pid: 30431, comm: java Not tainted 3.5.1-2-default #1 Supermicro X8DTT/X8DTT RIP: 0010:[<ffffffffa03789cd>] [<ffffffffa03789cd>] nfs_lookup_revalidate+0x2d/0x480 [nfs] RSP: 0018:ffff8801b418bd38 EFLAGS: 00010292 RAX: 00000000fffffff6 RBX: ffff88032016d800 RCX: 0000000000000020 RDX: ffffffff00000000 RSI: 0000000000000000 RDI: ffff8801824a7b00 RBP: ffff8801b418bdf8 R08: 7fffff0034323030 R09: fffffffff04c03ed R10: ffff8801824a7b00 R11: 0000000000000002 R12: ffff8801824a7b00 R13: ffff8801824a7b00 R14: 0000000000000000 R15: ffff8803201725d0 FS: 00002b53a46cb700(0000) GS:ffff88033fc20000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000038 CR3: 000000020a426000 CR4: 00000000000007e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process java (pid: 30431, threadinfo ffff8801b418a000, task ffff8801b5d20600) Stack: ffff8801b418be44 ffff88032016d800 ffff8801b418bdf8 0000000000000000 ffff8801824a7b00 ffff8801b418bdd7 ffff8803201725d0 ffffffff8116a9c0 ffff8801b5c38dc0 0000000000000007 ffff88032016d800 0000000000000000 Call Trace: [<ffffffff8116a9c0>] lookup_dcache+0x80/0xe0 [<ffffffff8116aa43>] __lookup_hash+0x23/0x90 [<ffffffff8116b4a5>] lookup_one_len+0xc5/0x100 [<ffffffffa03869a3>] nfs_sillyrename+0xe3/0x210 [nfs] [<ffffffff8116cadf>] vfs_unlink.part.25+0x7f/0xe0 [<ffffffff8116f22c>] do_unlinkat+0x1ac/0x1d0 [<ffffffff815717b9>] system_call_fastpath+0x16/0x1b [<00002b5348b5f527>] 0x2b5348b5f526 Code: ec 38 b8 f6 ff ff ff 4c 89 64 24 18 4c 89 74 24 28 49 89 fc 48 89 5c 24 08 48 89 6c 24 10 49 89 f6 4c 89 6c 24 20 4c 89 7c 24 30 <f6> 46 38 40 0f 85 d1 00 00 00 e8 c4 c4 df e0 48 8b 58 30 49 89 RIP [<ffffffffa03789cd>] nfs_lookup_revalidate+0x2d/0x480 [nfs] RSP <ffff8801b418bd38> CR2: 0000000000000038 ---[ end trace 845113ed191985dd ]--- This Oops affects 3.5 kernels and older, and is due to lookup_one_len() calling down to the dentry revalidation code with a NULL pointer to struct nameidata. It is fixed upstream by commit 0b728e1911c (stop passing nameidata * to ->d_revalidate()) Reported-by: Richard Ems <richard.ems@cape-horn-eng.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-18NFS: fix bug in legacy DNS resolver.NeilBrown1-2/+3
commit 8d96b10639fb402357b75b055b1e82a65ff95050 upstream. The DNS resolver's use of the sunrpc cache involves a 'ttl' number (relative) rather that a timeout (absolute). This confused me when I wrote commit c5b29f885afe890f953f7f23424045cdad31d3e4 "sunrpc: use seconds since boot in expiry cache" and I managed to break it. The effect is that any TTL is interpreted as 0, and nothing useful gets into the cache. This patch removes the use of get_expiry() - which really expects an expiry time - and uses get_uint() instead, treating the int correctly as a ttl. This fixes a regression that has been present since 2.6.37, causing certain NFS accesses in certain environments to incorrectly fail. Reported-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-18NFSv4.1: We must release the sequence id when we fail to get a session slotTrond Myklebust1-13/+23
commit 2240a9e2d013d8269ea425b73e1d7a54c7bc141f upstream. If we do not release the sequence id in cases where we fail to get a session slot, then we can deadlock if we hit a recovery scenario. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-18NFSv4: nfs4_locku_done must release the sequence idTrond Myklebust1-0/+1
commit 2b1bc308f492589f7d49012ed24561534ea2be8c upstream. If the state recovery machinery is triggered by the call to nfs4_async_handle_error() then we can deadlock. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-18nfs: Show original device name verbatim in /proc/*/mount{s,info}Ben Hutchings4-9/+20
commit 97a54868262da1629a3e65121e65b8e8c4419d9f upstream. Since commit c7f404b ('vfs: new superblock methods to override /proc/*/mount{s,info}'), nfs_path() is used to generate the mounted device name reported back to userland. nfs_path() always generates a trailing slash when the given dentry is the root of an NFS mount, but userland may expect the original device name to be returned verbatim (as it used to be). Make this canonicalisation optional and change the callers accordingly. [jrnieder@gmail.com: use flag instead of bool argument] Reported-and-tested-by: Chris Hiestand <chiestand@salk.edu> Reference: http://bugs.debian.org/669314 Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-18nfsv3: Make v3 mounts fail with ETIMEDOUTs instead EIO on mountd timeoutsScott Mayhew1-1/+1
commit acce94e68a0f346115fd41cdc298197d2d5a59ad upstream. In very busy v3 environment, rpc.mountd can respond to the NULL procedure but not the MNT procedure in a timely manner causing the MNT procedure to time out. The problem is the mount system call returns EIO which causes the mount to fail, instead of ETIMEDOUT, which would cause the mount to be retried. This patch sets the RPC_TASK_SOFT|RPC_TASK_TIMEOUT flags to the rpc_call_sync() call in nfs_mount() which causes ETIMEDOUT to be returned on timed out connections. Signed-off-by: Steve Dickson <steved@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-02NFS: return error from decode_getfh in decode openWeston Andros Adamson1-1/+2
commit 01913b49cf1dc6409a07dd2a4cc6af2e77f3c410 upstream. If decode_getfh failed, nfs4_xdr_dec_open would return 0 since the last decode_* call must have succeeded. Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-02NFS: Fix a problem with the legacy binary mount codeTrond Myklebust1-0/+2
commit 872ece86ea5c367aa92f44689c2d01a1c767aeb3 upstream. Apparently, am-utils is still using the legacy binary mountdata interface, and is having trouble parsing /proc/mounts due to the 'port=' field being incorrectly set. The following patch should fix up the regression. Reported-by: Marius Tolzmann <tolzmann@molgen.mpg.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-02NFS: Fix the initialisation of the readdir 'cookieverf' arrayTrond Myklebust3-4/+4
commit c3f52af3e03013db5237e339c817beaae5ec9e3a upstream. When the NFS_COOKIEVERF helper macro was converted into a static inline function in commit 99fadcd764 (nfs: convert NFS_*(inode) helpers to static inline), we broke the initialisation of the readdir cookies, since that depended on doing a memset with an argument of 'sizeof(NFS_COOKIEVERF(inode))' which therefore changed from sizeof(be32 cookieverf[2]) to sizeof(be32 *). At this point, NFS_COOKIEVERF seems to be more of an obfuscation than a helper, so the best thing would be to just get rid of it. Also see: https://bugzilla.kernel.org/show_bug.cgi?id=46881 Reported-by: Andi Kleen <andi@firstfloor.org> Reported-by: David Binderman <dcb314@hotmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-09-14NFS: Alias the nfs module to nfs4bjschuma@gmail.com1-0/+2
commit 425e776d93a7a5070b77d4f458a5bab0f924652c upstream. This allows distros to remove the line from their modprobe configuration. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-09-14NFS: return -ENOKEY when the upcall fails to map the nameBryan Schumaker1-4/+2
commit 12dfd080556124088ed61a292184947711b46cbe upstream. This allows the normal error-paths to handle the error, rather than making a special call to complete_request_key() just for this instance. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Tested-by: William Dauchy <wdauchy@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-09-14NFS: Clear key construction data if the idmap upcall failsBryan Schumaker1-14/+42
commit c5066945b7ea346a11424dbeb7830b7d7d00c206 upstream. idmap_pipe_downcall already clears this field if the upcall succeeds, but if it fails (rpc.idmapd isn't running) the field will still be set on the next call triggering a BUG_ON(). This patch tries to handle all possible ways that the upcall could fail and clear the idmap key data for each one. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Tested-by: William Dauchy <wdauchy@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>