author: Chuck Lever <chuck.lever@oracle.com>, 2026-06-04 20:06:36 +0300
committer: Anna Schumaker <anna.schumaker@hammerspace.com>, 2026-06-10 22:47:06 +0300
commit: 234c0ff695ef3ffb656931000e6b823d0c2f30fd
tree: 131385fe36f3f6ad5b464530be6d984360a16757
parent: 0f13fc7c7d2e0427517e63c739277a4cd338b0c5
xprtrdma: Resize reply buffers before reposting receives
Commit 0e13dd9ea8be ("xprtrdma: Remove temp allocation of rpcrdma_rep objects") made rpcrdma_rep objects survive disconnects. That is normally fine, but it also means their receive regbufs keep the size they had when they were first allocated.

Each rep's receive buffer is sized to ep->re_inline_recv when the rep is created. rpcrdma_ep_create() resets that threshold to the rdma_max_inline_read ceiling for every new endpoint, and the connect handshake then shrinks it to the peer's advertised inline send size. A rep allocated under a smaller negotiated threshold keeps that size: on disconnect, rpcrdma_xprt_disconnect() drains and DMA-unmaps the surviving reps but does not free or resize them.

The threshold can come back larger on the next connection. The first peer may supply no RPC-over-RDMA CM private data, defaulting its send size to 1024, while the reconnect target is an ordinary server offering 4096; or, with rdma_max_inline_read raised above its default, the reconnect target may advertise a larger svcrdma_max_req_size than the first.

rpcrdma_post_recvs() then reposts a surviving rep whose SGE length is still the old, smaller value, and a larger inline Reply hits a receive length error and forces another disconnect. The undersized rep returns to the free list when its failed Receive flushes, so the following reconnect reposts the same rep and fails the same way. The transport flaps without making forward progress for as long as the peer keeps advertising the larger inline size.

This is local/admin-triggerable rather than remote-triggerable: a local administrator must create and maintain the NFS/RDMA mount, while the server or reconnect target has to advertise a larger inline send size and return a reply that uses it.

Fix this by checking each rep before it is reposted. If the receive regbuf is smaller than the current endpoint's inline receive size, reallocate it on the current RDMA device's NUMA node and reinitialize the rep's xdr_buf before DMA-mapping and posting the Receive WR.

Fixes: 0e13dd9ea8be ("xprtrdma: Remove temp allocation of rpcrdma_rep objects")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <anna.schumaker@hammerspace.com>
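
The failure mode and the fix described above can be sketched with a small userspace toy model. This is not the kernel patch and not kernel code: the `Rep` and `Transport` classes, the `reconnect_attempts()` helper, and the ceiling value `MAX_INLINE_READ = 4096` are all hypothetical names and assumptions made for illustration. Only the 1024 and 4096 inline sizes come from the commit message. The model tracks nothing but buffer lengths, so it ignores DMA mapping, NUMA placement, and xdr_buf reinitialization.

```python
from dataclasses import dataclass, field

NO_PRIVATE_DATA_INLINE = 1024  # send size assumed when the peer sends no CM private data
SERVER_INLINE = 4096           # inline size offered by an ordinary server
MAX_INLINE_READ = 4096         # assumption: stand-in for the rdma_max_inline_read ceiling


@dataclass
class Rep:
    """Toy stand-in for an rpcrdma_rep: only the receive buffer length."""
    buf_len: int


@dataclass
class Transport:
    """Toy transport whose reps survive disconnects on a free list."""
    resize_before_post: bool = False
    inline_recv: int = 0
    free_reps: list = field(default_factory=list)

    def connect(self, peer_inline_send: int) -> None:
        # A new endpoint starts at the ceiling; the handshake shrinks it
        # to the inline send size the peer advertised.
        self.inline_recv = min(MAX_INLINE_READ, peer_inline_send)

    def post_recv(self) -> Rep:
        # Reuse a surviving rep if one exists, otherwise allocate one
        # sized to the current inline receive threshold.
        if self.free_reps:
            rep = self.free_reps.pop()
        else:
            rep = Rep(buf_len=self.inline_recv)
        # The fix in miniature: grow an undersized buffer before posting.
        if self.resize_before_post and rep.buf_len < self.inline_recv:
            rep.buf_len = self.inline_recv
        return rep

    def receive(self, rep: Rep, reply_len: int) -> bool:
        # A reply longer than the posted buffer is a receive length error.
        ok = reply_len <= rep.buf_len
        # Either way the rep goes back to the free list unchanged.
        self.free_reps.append(rep)
        return ok


def reconnect_attempts(resize_before_post: bool, max_attempts: int = 5):
    """Return the reconnect attempt that first succeeds, or None if all fail."""
    xprt = Transport(resize_before_post=resize_before_post)

    # First connection: the peer supplies no private data, so reps are small.
    xprt.connect(NO_PRIVATE_DATA_INLINE)
    rep = xprt.post_recv()
    xprt.receive(rep, reply_len=NO_PRIVATE_DATA_INLINE)

    # Every later connection targets a server advertising the larger size.
    for attempt in range(1, max_attempts + 1):
        xprt.connect(SERVER_INLINE)
        rep = xprt.post_recv()
        if xprt.receive(rep, reply_len=SERVER_INLINE):
            return attempt
    return None


if __name__ == "__main__":
    print("without resize:", reconnect_attempts(resize_before_post=False))
    print("with resize:   ", reconnect_attempts(resize_before_post=True))
```

In this model, the run without the resize check never succeeds because the same undersized rep is reposted on every reconnect, which mirrors the flapping described above. With the check enabled, the first reconnect succeeds.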
Diffstat (limited to 'scripts')
0 files changed, 0 insertions, 0 deletions