kernel/linux.git/net/rds/send.c, branch v7.0-rc7

net/rds: rds_sendmsg should not discard payload_len

2026-02-17T11:03:57+00:00

Commit 3db6e0d172c9 ("rds: use RCU to synchronize work-enqueue with connection teardown") modifies rds_sendmsg to avoid enqueueing work while a tear down is in progress. However, it also changed the return value of rds_sendmsg to that of rds_send_xmit instead of the payload_len. This means the user may incorrectly receive errno values when it should have simply received a payload of 0 while the peer attempts a reconnections. So this patch corrects the teardown handling code to only use the out error path in that case, thus restoring the original payload_len return value. Fixes: 3db6e0d172c9 ("rds: use RCU to synchronize work-enqueue with connection teardown") Reviewed-by: Simon Horman Signed-off-by: Allison Henderson Link: https://patch.msgid.link/20260213035409.1963391-1-achender@kernel.org Signed-off-by: Paolo Abeni

net/rds: Trigger rds_send_ping() more than once

2026-02-05T04:46:39+00:00

Even though a peer may have already received a non-zero value for "RDS_EXTHDR_NPATHS" from a node in the past, the current peer may not. Therefore it is important to initiate another rds_send_ping() after a re-connect to any peer: It is unknown at that time if we're still talking to the same instance of RDS kernel modules on the other side. Otherwise, the peer may just operate on a single lane ("c_npaths == 0"), not knowing that more lanes are available. However, if "c_with_sport_idx" is supported, we also need to check that the connection we accepted on lane#0 meets the proper source port modulo requirement, as we fan out: Since the exchange of "RDS_EXTHDR_NPATHS" and "RDS_EXTHDR_SPORT_IDX" is asynchronous, initially we have no choice but to accept an incoming connection (via "accept") in the first slot ("cp_index == 0") for backwards compatibility. But that very connection may have come from a different lane with "cp_index != 0", since the peer thought that we already understood and handled "c_with_sport_idx" properly, as indicated by a previous exchange before a module was reloaded. In short: If a module gets reloaded, we recover from that, but do *not* allow a downgrade to support fewer lanes. Downgrades would require us to merge messages from separate lanes, which is rather tricky with the current RDS design. Each lane has its own sequence number space and all messages would need to be re-sequenced as we merge, all while handling "RDS_FLAG_RETRANSMITTED" and "cp_retrans" properly. Signed-off-by: Gerd Rausch Signed-off-by: Allison Henderson Link: https://patch.msgid.link/20260203055723.1085751-9-achender@kernel.org Signed-off-by: Jakub Kicinski

net/rds: Use the first lane until RDS_EXTHDR_NPATHS arrives

2026-02-05T04:46:39+00:00

Instead of just blocking the sender until "c_npaths" is known (it gets updated upon the receipt of a MPRDS PONG message), simply use the first lane (cp_index#0). But just using the first lane isn't enough. As soon as we enqueue messages on a different lane, we'd run the risk of out-of-order delivery of RDS messages. Earlier messages enqueued on "cp_index == 0" could be delivered later than more recent messages enqueued on "cp_index > 0", mostly because of possible head of line blocking issues causing the first lane to be slower. To avoid that, we simply take a snapshot of "cp_next_tx_seq" at the time we're about to fan-out to more lanes. Then we delay the transmission of messages enqueued on other lanes with "cp_index > 0" until cp_index#0 caught up with the delivery of new messages (from "cp_send_queue") as well as in-flight messages (from "cp_retrans") that haven't been acknowledged yet by the receiver. We also add a new counter "mprds_catchup_tx0_retries" to keep track of how many times "rds_send_xmit" had to suspend activities, because it was waiting for the first lane to catch up. Signed-off-by: Gerd Rausch Signed-off-by: Allison Henderson Link: https://patch.msgid.link/20260203055723.1085751-8-achender@kernel.org Signed-off-by: Jakub Kicinski

net/rds: Encode cp_index in TCP source port

2026-02-05T04:46:38+00:00

Upon "sendmsg", RDS/TCP selects a backend connection based on a hash calculated from the source-port ("RDS_MPATH_HASH"). However, "rds_tcp_accept_one" accepts connections in the order they arrive, which is non-deterministic. Therefore the mapping of the sender's "cp->cp_index" to that of the receiver changes if the backend connections are dropped and reconnected. However, connection state that's preserved across reconnects (e.g. "cp_next_rx_seq") relies on that sender<->receiver mapping to never change. So we make sure that client and server of the TCP connection have the exact same "cp->cp_index" across reconnects by encoding "cp->cp_index" in the lower three bits of the client's TCP source port. A new extension "RDS_EXTHDR_SPORT_IDX" is introduced, that allows the server to tell the difference between clients that do the "cp->cp_index" encoding, and legacy clients that pick source ports randomly. Signed-off-by: Gerd Rausch Signed-off-by: Allison Henderson Link: https://patch.msgid.link/20260203055723.1085751-3-achender@kernel.org Signed-off-by: Jakub Kicinski

net/rds: new extension header: rdma bytes

2026-02-05T04:46:38+00:00

Introduce a new extension header type RDSV3_EXTHDR_RDMA_BYTES for an RDMA initiator to exchange rdma byte counts to its target. Currently, RDMA operations cannot precisely account how many bytes a peer just transferred via RDMA, which limits per-connection statistics and future policy (e.g., monitoring or rate/cgroup accounting of RDMA traffic). In this patch we expand rds_message_add_extension to accept multiple extensions, and add new flag to RDS header: RDS_FLAG_EXTHDR_EXTENSION, along with a new extension to RDS header: rds_ext_header_rdma_bytes. Signed-off-by: Shamir Rabinovitch Signed-off-by: Guangyu Sun Signed-off-by: Allison Henderson Link: https://patch.msgid.link/20260203055723.1085751-2-achender@kernel.org Signed-off-by: Jakub Kicinski

net/rds: Add per cp work queue

2026-01-13T11:27:03+00:00

This patch adds a per connection workqueue which can be initialized and used independently of the globally shared rds_wq. This patch is the first in a series that aims to address tcp ack timeouts during the tcp socket shutdown sequence. This initial refactoring lays the ground work needed to alleviate queue congestion during heavy reads and writes. The independently managed queues will allow shutdowns and reconnects respond more quickly before the peer(s) timeout waiting for the proper acks. Signed-off-by: Allison Henderson Link: https://patch.msgid.link/20260109224843.128076-2-achender@kernel.org Signed-off-by: Paolo Abeni

rds: Fix endianness annotations for RDS extension headers

2025-08-22T23:44:39+00:00

Per the RDS 3.1 spec [1], RDS extension headers EXTHDR_NPATHS and EXTHDR_GEN_NUM are be16 and be32 values respectively, exchanged during normal operations over-the-wire (RDS Ping/Pong). This contrasts their declarations as host endian unsigned ints. Fix the annotations across occurrences. Flagged by Sparse. [1] https://oss.oracle.com/projects/rds/dist/documentation/rds-3.1-spec.html Signed-off-by: Ujwal Kundur Reviewed-by: Allison Henderson Link: https://patch.msgid.link/20250820175550.498-5-ujwal.kundur@gmail.com Signed-off-by: Jakub Kicinski

rds: Correct spelling

2025-06-21T14:35:39+00:00

Correct spelling as flagged by codespell. With these changes in place codespell only flags false positives in net/rds. Signed-off-by: Simon Horman Reviewed-by: Allison Henderson Link: https://patch.msgid.link/20250619-rds-minor-v1-2-86d2ee3a98b9@kernel.org Signed-off-by: Jakub Kicinski

rds: introduce acquire/release ordering in acquire/release_in_xmit()

2024-03-19T11:15:35+00:00

acquire/release_in_xmit() work as bit lock in rds_send_xmit(), so they are expected to ensure acquire/release memory ordering semantics. However, test_and_set_bit/clear_bit() don't imply such semantics, on top of this, following smp_mb__after_atomic() does not guarantee release ordering (memory barrier actually should be placed before clear_bit()). Instead, we use clear_bit_unlock/test_and_set_bit_lock() here. Fixes: 0f4b1c7e89e6 ("rds: fix rds_send_xmit() serialization") Fixes: 1f9ecd7eacfd ("RDS: Pass rds_conn_path to rds_send_xmit()") Signed-off-by: Yewon Choi Reviewed-by: Michal Kubiak Link: https://lore.kernel.org/r/ZfQUxnNTO9AJmzwc@libra05 Signed-off-by: Paolo Abeni

net/rds: fix WARNING in rds_conn_connect_if_down

2024-03-06T11:58:42+00:00

If connection isn't established yet, get_mr() will fail, trigger connection after get_mr(). Fixes: 584a8279a44a ("RDS: RDMA: return appropriate error on rdma map failures") Reported-and-tested-by: syzbot+d4faee732755bba9838e@syzkaller.appspotmail.com Signed-off-by: Edward Adam Davis Signed-off-by: David S. Miller