summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorNeilBrown <neilb@suse.com>2015-07-27 03:55:35 +0300
committerTrond Myklebust <trond.myklebust@primarydata.com>2015-07-27 18:16:56 +0300
commit743c69e7c089ba1bea1b207c5829dd079a4e98f9 (patch)
tree2df8387019360a21ba6baef04998c2dfc4f7a1d6
parentbdcc2cd14e4e86f1fb19c78e191a1b8da3d6dea0 (diff)
downloadlinux-743c69e7c089ba1bea1b207c5829dd079a4e98f9.tar.xz
sunrpc: translate -EAGAIN to -ENOBUFS when socket is writable.
The networking layer does not reliably report the distinction between a non-block write failing because: 1/ the queue is too full already and 2/ a memory allocation attempt failed. The distinction is important because in the first case it is appropriate to retry as soon as the socket reports that it is writable, and in the second case a small delay is required as the socket will most likely report as writable but kmalloc could still fail. sk_stream_wait_memory() exhibits this distinction nicely, setting 'vm_wait' if a small wait is needed. However in the non-blocking case it always returns -EAGAIN no matter the cause of the failure. This -EAGAIN call get all the way to sunrpc. The sunrpc layer expects EAGAIN to indicate the first cause, and ENOBUFS to indicate the second. Various documentation suggests that this is not unreasonable, but does not guarantee the desired error codes. The result of getting -EAGAIN when -ENOBUFS is expected is that the send is tried again in a tight loop and soft lockups are reported. so: add tests after calls to xs_sendpages() to translate -EAGAIN into -ENOBUFS if the socket is writable. This cannot happen inside xs_sendpages() as the test for "is socket writable" is different between TCP and UDP. With this change, the tight loop retrying xs_sendpages() becomes a loop which only retries every 250ms, and so will not trigger a soft-lockup warning. It is possible that the write did fail because the queue was too full and by the time xs_sendpages() completed, the queue was writable again. In this case an extra 250ms delay is inserted that isn't really needed. This circumstance suggests a degree of congestion so a delay is not necessarily a bad thing, and it can only cause a single 250ms delay, not a series of them. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-rw-r--r--net/sunrpc/xprtsock.c9
1 files changed, 9 insertions, 0 deletions
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 44c1927b68c7..4f48b1a19e9f 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -527,6 +527,10 @@ static int xs_local_send_request(struct rpc_task *task)
true, &sent);
dprintk("RPC: %s(%u) = %d\n",
__func__, xdr->len - req->rq_bytes_sent, status);
+
+ if (status == -EAGAIN && sock_writeable(transport->inet))
+ status = -ENOBUFS;
+
if (likely(sent > 0) || status == 0) {
req->rq_bytes_sent += sent;
req->rq_xmit_bytes_sent += sent;
@@ -590,6 +594,9 @@ static int xs_udp_send_request(struct rpc_task *task)
if (status == -EPERM)
goto process_status;
+ if (status == -EAGAIN && sock_writeable(transport->inet))
+ status = -ENOBUFS;
+
if (sent > 0 || status == 0) {
req->rq_xmit_bytes_sent += sent;
if (sent >= req->rq_slen)
@@ -687,6 +694,8 @@ static int xs_tcp_send_request(struct rpc_task *task)
status = -EAGAIN;
break;
}
+ if (status == -EAGAIN && sk_stream_is_writeable(transport->inet))
+ status = -ENOBUFS;
switch (status) {
case -ENOTSOCK: