<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/include/linux/sunrpc/svc_rdma.h, branch linux-7.1.y</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=linux-7.1.y</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=linux-7.1.y'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2026-03-30T01:25:09+00:00</updated>
<entry>
<title>svcrdma: Add Write chunk WRs to the RPC's Send WR chain</title>
<updated>2026-03-30T01:25:09+00:00</updated>
<author>
<name>Chuck Lever</name>
<email>chuck.lever@oracle.com</email>
</author>
<published>2026-02-27T14:03:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d16f060f3ee297424c0aba047b1d49208adb9318'/>
<id>urn:sha1:d16f060f3ee297424c0aba047b1d49208adb9318</id>
<content type='text'>
Previously, Write chunk RDMA Writes were posted via a separate
ib_post_send() call with their own completion handler. Each Write
chunk incurred a doorbell and generated a completion event.

Link Write chunk WRs onto the RPC Reply's Send WR chain so that a
single ib_post_send() call posts both the RDMA Writes and the Send
WR. A single completion event signals that all operations have
finished. This reduces both doorbell rate and completion rate, as
well as eliminating the latency of a round-trip between the Write
chunk completion and the subsequent Send WR posting.

The lifecycle of Write chunk resources changes: previously, the
svc_rdma_write_done() completion handler released Write chunk
resources when RDMA Writes completed. With WR chaining, resources
remain live until the Send completion. A new sc_write_info_list
tracks Write chunk metadata attached to each Send context, and
svc_rdma_write_chunk_release() frees these resources when the
Send context is released.

The svc_rdma_write_done() handler now handles only error cases.
On success it returns immediately since the Send completion handles
resource release. On failure (WR flush), it closes the connection
to signal to the client that the RPC Reply is incomplete.

Signed-off-by: Chuck Lever &lt;chuck.lever@oracle.com&gt;
</content>
</entry>
<entry>
<title>svcrdma: Add fair queuing for Send Queue access</title>
<updated>2026-03-30T01:25:09+00:00</updated>
<author>
<name>Chuck Lever</name>
<email>chuck.lever@oracle.com</email>
</author>
<published>2026-02-27T14:03:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ccc89b9d1ed233349cfe8d87b842e7351b74d8de'/>
<id>urn:sha1:ccc89b9d1ed233349cfe8d87b842e7351b74d8de</id>
<content type='text'>
When the Send Queue fills, multiple threads may wait for SQ slots.
The previous implementation had no ordering guarantee, allowing
starvation when one thread repeatedly acquires slots while others
wait indefinitely.

Introduce a ticket-based fair queuing system. Each waiter takes a
ticket number and is served in FIFO order. This ensures forward
progress for all waiters when SQ capacity is constrained.

The implementation has two phases:
1. Fast path: attempt to reserve SQ slots without waiting
2. Slow path: take a ticket, wait for turn, then wait for slots

The ticket system adds two atomic counters to the transport:
- sc_sq_ticket_head: next ticket to issue
- sc_sq_ticket_tail: ticket currently being served

A dedicated wait queue (sc_sq_ticket_wait) handles ticket
ordering, separate from sc_send_wait which handles SQ capacity.
This separation ensures that send completions (the high-frequency
wake source) wake only the current ticket holder rather than all
queued waiters. Ticket handoff wakes only the ticket wait queue,
and each ticket holder that exits via connection close propagates
the wake to the next waiter in line.

When a waiter successfully reserves slots, it advances the tail
counter and wakes the next waiter. This creates an orderly handoff
that prevents starvation while maintaining good throughput on the
fast path when contention is low.

Signed-off-by: Chuck Lever &lt;chuck.lever@oracle.com&gt;
</content>
</entry>
<entry>
<title>svcrdma: Increase the server's default RPC/RDMA credit grant</title>
<updated>2025-11-16T23:20:11+00:00</updated>
<author>
<name>Chuck Lever</name>
<email>chuck.lever@oracle.com</email>
</author>
<published>2025-10-02T14:00:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=566a414558aec1ab263ab8709fa783dfa2e34325'/>
<id>urn:sha1:566a414558aec1ab263ab8709fa783dfa2e34325</id>
<content type='text'>
The range of commits from commit e3274026e2ec ("SUNRPC: move all of
xprt handling into svc_xprt_handle()") to commit 15d39883ee7d
("SUNRPC: change the back-channel queue to lwq") enabled NFSD
performance to scale better as the number of nfsd threads is
increased. These commits were merged in v6.7.

Now that the nfsd thread count can scale to more threads, permit
individual clients to make more use of those threads. Increase the
RPC/RDMA per-connection credit grant from 64 to 128 -- same as the
Linux NFS client.

Simple single client fio-based benchmarking so far shows only
improvement, no regression.

Signed-off-by: Chuck Lever &lt;chuck.lever@oracle.com&gt;
</content>
</entry>
<entry>
<title>svcrdma: Adjust the number of entries in svc_rdma_send_ctxt::sc_pages</title>
<updated>2025-05-15T20:16:26+00:00</updated>
<author>
<name>Chuck Lever</name>
<email>chuck.lever@oracle.com</email>
</author>
<published>2025-04-28T19:36:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=56ab43f50d708e155f013ef60cc1b4fe39bed42e'/>
<id>urn:sha1:56ab43f50d708e155f013ef60cc1b4fe39bed42e</id>
<content type='text'>
Allow allocation of more entries in the sc_pages[] array when the
maximum size of an RPC message is increased.

Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Reviewed-by: NeilBrown &lt;neil@brown.name&gt;
Signed-off-by: Chuck Lever &lt;chuck.lever@oracle.com&gt;
</content>
</entry>
<entry>
<title>svcrdma: Adjust the number of entries in svc_rdma_recv_ctxt::rc_pages</title>
<updated>2025-05-15T20:16:26+00:00</updated>
<author>
<name>Chuck Lever</name>
<email>chuck.lever@oracle.com</email>
</author>
<published>2025-04-28T19:36:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=81381d1a90dea186ce4585d9bd3cdc40b50e9c48'/>
<id>urn:sha1:81381d1a90dea186ce4585d9bd3cdc40b50e9c48</id>
<content type='text'>
Allow allocation of more entries in the rc_pages[] array when the
maximum size of an RPC message is increased.

Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Reviewed-by: NeilBrown &lt;neil@brown.name&gt;
Signed-off-by: Chuck Lever &lt;chuck.lever@oracle.com&gt;
</content>
</entry>
<entry>
<title>svcrdma: Handle device removal outside of the CM event handler</title>
<updated>2024-09-20T23:31:03+00:00</updated>
<author>
<name>Chuck Lever</name>
<email>chuck.lever@oracle.com</email>
</author>
<published>2024-07-29T20:52:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c4de97f7c45434985e5dbf2d6ccc9eca676e37fe'/>
<id>urn:sha1:c4de97f7c45434985e5dbf2d6ccc9eca676e37fe</id>
<content type='text'>
Synchronously wait for all disconnects to complete to ensure the
transports have divested all hardware resources before the
underlying RDMA device can safely be removed.

Reviewed-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
Signed-off-by: Chuck Lever &lt;chuck.lever@oracle.com&gt;
</content>
</entry>
<entry>
<title>Revert "svcrdma: Add Write chunk WRs to the RPC's Send WR chain"</title>
<updated>2024-04-20T15:20:41+00:00</updated>
<author>
<name>Chuck Lever</name>
<email>chuck.lever@oracle.com</email>
</author>
<published>2024-04-19T15:51:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=32cf5a4eda464d76d553ee3f1b06c4d33d796c52'/>
<id>urn:sha1:32cf5a4eda464d76d553ee3f1b06c4d33d796c52</id>
<content type='text'>
Performance regression reported with NFS/RDMA using Omnipath,
bisected to commit e084ee673c77 ("svcrdma: Add Write chunk WRs to
the RPC's Send WR chain").

Tracing on the server reports:

  nfsd-7771  [060]  1758.891809: svcrdma_sq_post_err:
	cq.id=205 cid=226 sc_sq_avail=13643/851 status=-12

sq_post_err reports ENOMEM, and the rdma-&gt;sc_sq_avail (13643) is
larger than rdma-&gt;sc_sq_depth (851). The number of available Send
Queue entries is always supposed to be smaller than the Send Queue
depth. That seems like a Send Queue accounting bug in svcrdma.

As it's getting to be late in the 6.9-rc cycle, revert this commit.
It can be revisited in a subsequent kernel release.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=218743
Fixes: e084ee673c77 ("svcrdma: Add Write chunk WRs to the RPC's Send WR chain")
Signed-off-by: Chuck Lever &lt;chuck.lever@oracle.com&gt;
</content>
</entry>
<entry>
<title>svcrdma: Add Write chunk WRs to the RPC's Send WR chain</title>
<updated>2024-03-01T14:12:30+00:00</updated>
<author>
<name>Chuck Lever</name>
<email>chuck.lever@oracle.com</email>
</author>
<published>2024-02-04T23:17:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e084ee673c77cade06ab4c2e36b5624c82608b8c'/>
<id>urn:sha1:e084ee673c77cade06ab4c2e36b5624c82608b8c</id>
<content type='text'>
Chain RDMA Writes that convey Write chunks onto the local Send
chain. This means all WRs for an RPC Reply are now posted with a
single ib_post_send() call, and there is a single Send completion
when all of these are done. That reduces both the per-transport
doorbell rate and completion rate.

Signed-off-by: Chuck Lever &lt;chuck.lever@oracle.com&gt;
</content>
</entry>
<entry>
<title>svcrdma: Post WRs for Write chunks in svc_rdma_sendto()</title>
<updated>2024-03-01T14:12:29+00:00</updated>
<author>
<name>Chuck Lever</name>
<email>chuck.lever@oracle.com</email>
</author>
<published>2024-02-04T23:17:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d2727cefff0204c3074a10e3ad2e1b5c9dfb986c'/>
<id>urn:sha1:d2727cefff0204c3074a10e3ad2e1b5c9dfb986c</id>
<content type='text'>
Refactor to eventually enable svcrdma to post the Write WRs for each
RPC response using the same ib_post_send() as the Send WR (ie, as a
single WR chain).

svc_rdma_result_payload (originally svc_rdma_read_payload) was added
so that the upper layer XDR encoder could identify a range of bytes
to be possibly conveyed by RDMA (if a Write chunk was provided by
the client).

The purpose of commit f6ad77590a5d ("svcrdma: Post RDMA Writes while
XDR encoding replies") was to post as much of the result payload
outside of svc_rdma_sendto() as possible because svc_rdma_sendto()
used to be called with the xpt_mutex held.

However, since commit ca4faf543a33 ("SUNRPC: Move xpt_mutex into
socket xpo_sendto methods"), the xpt_mutex is no longer held when
calling svc_rdma_sendto(). Thus, that benefit is no longer an issue.

Signed-off-by: Chuck Lever &lt;chuck.lever@oracle.com&gt;
</content>
</entry>
<entry>
<title>svcrdma: Post the Reply chunk and Send WR together</title>
<updated>2024-03-01T14:12:29+00:00</updated>
<author>
<name>Chuck Lever</name>
<email>chuck.lever@oracle.com</email>
</author>
<published>2024-02-04T23:17:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=10e6fc1054d900a205e5233f186ebce9c50e1d1d'/>
<id>urn:sha1:10e6fc1054d900a205e5233f186ebce9c50e1d1d</id>
<content type='text'>
Reduce the doorbell and Send completion rates when sending RPC/RDMA
replies that have Reply chunks. NFS READDIR procedures typically
return their result in a Reply chunk, for example.

Instead of calling ib_post_send() to post the Write WRs for the
Reply chunk, and then calling it again to post the Send WR that
conveys the transport header, chain the Write WRs to the Send WR
and call ib_post_send() only once.

Thanks to the Send Queue completion ordering rules, when the Send
WR completes, that guarantees that Write WRs posted before it have
also completed successfully. Thus all Write WRs for the Reply chunk
can remain unsignaled. Instead of handling a Write completion and
then a Send completion, only the Send completion is seen, and it
handles clean up for both the Writes and the Send.

Signed-off-by: Chuck Lever &lt;chuck.lever@oracle.com&gt;
</content>
</entry>
</feed>
