<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/net/rds/recv.c, branch v7.1</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v7.1</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v7.1'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2026-02-05T04:46:39+00:00</updated>
<entry>
<title>net/rds: Trigger rds_send_ping() more than once</title>
<updated>2026-02-05T04:46:39+00:00</updated>
<author>
<name>Gerd Rausch</name>
<email>gerd.rausch@oracle.com</email>
</author>
<published>2026-02-03T05:57:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9d27a0fb122f19b6d01d02f4b4f429ca28811ace'/>
<id>urn:sha1:9d27a0fb122f19b6d01d02f4b4f429ca28811ace</id>
<content type='text'>
Even though a peer may have already received a
non-zero value for "RDS_EXTHDR_NPATHS" from a node in the past,
the current peer may not.

Therefore it is important to initiate another rds_send_ping()
after a re-connect to any peer:
It is unknown at that time if we're still talking to the same
instance of RDS kernel modules on the other side.

Otherwise, the peer may just operate on a single lane
("c_npaths == 0"), not knowing that more lanes are available.

However, if "c_with_sport_idx" is supported,
we also need to check that the connection we accepted on lane#0
meets the proper source port modulo requirement, as we fan out:

Since the exchange of "RDS_EXTHDR_NPATHS" and "RDS_EXTHDR_SPORT_IDX"
is asynchronous, initially we have no choice but to accept an incoming
connection (via "accept") in the first slot ("cp_index == 0")
for backwards compatibility.

But that very connection may have come from a different lane
with "cp_index != 0", since the peer thought that we already understood
and handled "c_with_sport_idx" properly, as indicated by a previous
exchange before a module was reloaded.

In short:
If a module gets reloaded, we recover from that, but do *not*
allow a downgrade to support fewer lanes.

Downgrades would require us to merge messages from separate lanes,
which is rather tricky with the current RDS design.
Each lane has its own sequence number space and all messages
would need to be re-sequenced as we merge, all while
handling "RDS_FLAG_RETRANSMITTED" and "cp_retrans" properly.

Signed-off-by: Gerd Rausch &lt;gerd.rausch@oracle.com&gt;
Signed-off-by: Allison Henderson &lt;allison.henderson@oracle.com&gt;
Link: https://patch.msgid.link/20260203055723.1085751-9-achender@kernel.org
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net/rds: Use the first lane until RDS_EXTHDR_NPATHS arrives</title>
<updated>2026-02-05T04:46:39+00:00</updated>
<author>
<name>Gerd Rausch</name>
<email>gerd.rausch@oracle.com</email>
</author>
<published>2026-02-03T05:57:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a1f53d5fb6b2e4c81b438ea41f5f16482d084e08'/>
<id>urn:sha1:a1f53d5fb6b2e4c81b438ea41f5f16482d084e08</id>
<content type='text'>
Instead of just blocking the sender until "c_npaths" is known
(it gets updated upon the receipt of a MPRDS PONG message),
simply use the first lane (cp_index#0).

But just using the first lane isn't enough.

As soon as we enqueue messages on a different lane, we'd run the risk
of out-of-order delivery of RDS messages.

Earlier messages enqueued on "cp_index == 0" could be delivered later
than more recent messages enqueued on "cp_index &gt; 0", mostly because of
possible head of line blocking issues causing the first lane to be
slower.

To avoid that, we simply take a snapshot of "cp_next_tx_seq" at the
time we're about to fan-out to more lanes.

Then we delay the transmission of messages enqueued on other lanes
with "cp_index &gt; 0" until cp_index#0 caught up with the delivery of
new messages (from "cp_send_queue") as well as in-flight
messages (from "cp_retrans") that haven't been acknowledged yet
by the receiver.

We also add a new counter "mprds_catchup_tx0_retries" to keep track
of how many times "rds_send_xmit" had to suspend activities,
because it was waiting for the first lane to catch up.

Signed-off-by: Gerd Rausch &lt;gerd.rausch@oracle.com&gt;
Signed-off-by: Allison Henderson &lt;allison.henderson@oracle.com&gt;
Link: https://patch.msgid.link/20260203055723.1085751-8-achender@kernel.org
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net/rds: Encode cp_index in TCP source port</title>
<updated>2026-02-05T04:46:38+00:00</updated>
<author>
<name>Gerd Rausch</name>
<email>gerd.rausch@oracle.com</email>
</author>
<published>2026-02-03T05:57:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a20a6992558fa7c19a03c76bea4a793ccaef8505'/>
<id>urn:sha1:a20a6992558fa7c19a03c76bea4a793ccaef8505</id>
<content type='text'>
Upon "sendmsg", RDS/TCP selects a backend connection based
on a hash calculated from the source-port ("RDS_MPATH_HASH").

However, "rds_tcp_accept_one" accepts connections
in the order they arrive, which is non-deterministic.

Therefore the mapping of the sender's "cp-&gt;cp_index"
to that of the receiver changes if the backend
connections are dropped and reconnected.

However, connection state that's preserved across reconnects
(e.g. "cp_next_rx_seq") relies on that sender&lt;-&gt;receiver
mapping to never change.

So we make sure that client and server of the TCP connection
have the exact same "cp-&gt;cp_index" across reconnects by
encoding "cp-&gt;cp_index" in the lower three bits of the
client's TCP source port.

A new extension "RDS_EXTHDR_SPORT_IDX" is introduced,
that allows the server to tell the difference between
clients that do the "cp-&gt;cp_index" encoding, and
legacy clients that pick source ports randomly.

Signed-off-by: Gerd Rausch &lt;gerd.rausch@oracle.com&gt;
Signed-off-by: Allison Henderson &lt;allison.henderson@oracle.com&gt;
Link: https://patch.msgid.link/20260203055723.1085751-3-achender@kernel.org
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net/rds: rds_tcp_accept_one ought to not discard messages</title>
<updated>2026-01-23T19:51:31+00:00</updated>
<author>
<name>Gerd Rausch</name>
<email>gerd.rausch@oracle.com</email>
</author>
<published>2026-01-22T05:52:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=db69e9b838c39f4fb17d0547aeb71d55a7f28061'/>
<id>urn:sha1:db69e9b838c39f4fb17d0547aeb71d55a7f28061</id>
<content type='text'>
RDS/TCP differs from RDS/RDMA in that message acknowledgment
is done based on TCP sequence numbers:
As soon as the last byte of a message has been acknowledged
by the TCP stack of a peer, "rds_tcp_write_space()" goes on
to discard prior messages from the send queue.

Which is fine, for as long as the receiver never throws any messages away.

Unfortunately, that is *not* the case since the introduction of MPRDS:
commit 1a0e100fb2c96 "RDS: TCP: Enable multipath RDS for TCP"

A new function "rds_tcp_accept_one_path" was introduced,
which is entitled to return "NULL", if no connection path is currently
available.

Unfortunately, this happens after the "-&gt;accept()" call, and the new socket
often already contains messages, since the peer already transitioned
to "RDS_CONN_UP" on behalf of "TCP_ESTABLISHED".

That's also the case after this [1]:
commit 1a0e100fb2c96 "RDS: TCP: Force every connection to be initiated by
numerically smaller IP address"

which tried to address the situation of pending data by only transitioning
connections from a smaller IP address to "RDS_CONN_UP".

But even in those cases, and in particular if the "RDS_EXTHDR_NPATHS"
handshake has not occurred yet, and therefore we're working with
"c_npaths &lt;= 1", "c_conn[0]" may be in a state distinct from
"RDS_CONN_DOWN", and therefore all messages on the just accepted socket
will be tossed away.

This fix changes "rds_tcp_accept_one":

* If connected from a peer with a larger IP address, the new socket
  will continue to get closed right away.
  With commit [1] above, there should not be any messages
  in the socket receive buffer, since the peer never transitioned
  to "RDS_CONN_UP".
  Therefore it should be okay to not make any efforts to dispatch
  the socket receive buffer.

* If connected from a peer with a smaller IP address,
  we call "rds_tcp_accept_one_path" to find a free slot/"path".
  If found, business goes on as usual.
  If none was found, we save/stash the newly accepted socket
  into "rds_tcp_accepted_sock", in order to not lose any
  messages that may have arrived already.
  We then return from "rds_tcp_accept_one" with "-ENOBUFS".
  Later on, when a slot/"path" does become available again
  (e.g. state transitioned to "RDS_CONN_DOWN",
   or HS extension header was received with "c_npaths &gt; 1")
  we call "rds_tcp_conn_slots_available" that simply re-issues
  a "rds_tcp_accept_one_path" worker-callback and picks
  up the new socket from "rds_tcp_accepted_sock", and thereby
  continuing where it left with "-ENOBUFS" last time.
  Since a new slot has become available, those messages
  won't be lost, since processing proceeds as if that slot
  had been available the first time around.

Signed-off-by: Gerd Rausch &lt;gerd.rausch@oracle.com&gt;
Signed-off-by: Jack Vogel &lt;jack.vogel@oracle.com&gt;
Signed-off-by: Allison Henderson &lt;allison.henderson@oracle.com&gt;
Link: https://patch.msgid.link/20260122055213.83608-3-achender@kernel.org
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>rds: Fix endianness annotations for RDS extension headers</title>
<updated>2025-08-22T23:44:39+00:00</updated>
<author>
<name>Ujwal Kundur</name>
<email>ujwal.kundur@gmail.com</email>
</author>
<published>2025-08-20T17:55:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=bcb28bee987a1e161eaa5cc4cf2fb0e21306d4a7'/>
<id>urn:sha1:bcb28bee987a1e161eaa5cc4cf2fb0e21306d4a7</id>
<content type='text'>
Per the RDS 3.1 spec [1], RDS extension headers EXTHDR_NPATHS and
EXTHDR_GEN_NUM are be16 and be32 values respectively, exchanged during
normal operations over-the-wire (RDS Ping/Pong). This contrasts their
declarations as host endian unsigned ints.

Fix the annotations across occurrences. Flagged by Sparse.

[1] https://oss.oracle.com/projects/rds/dist/documentation/rds-3.1-spec.html

Signed-off-by: Ujwal Kundur &lt;ujwal.kundur@gmail.com&gt;
Reviewed-by: Allison Henderson &lt;allison.henderson@oracle.com&gt;
Link: https://patch.msgid.link/20250820175550.498-5-ujwal.kundur@gmail.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net:rds: Fix possible deadlock in rds_message_put</title>
<updated>2024-02-13T09:25:30+00:00</updated>
<author>
<name>Allison Henderson</name>
<email>allison.henderson@oracle.com</email>
</author>
<published>2024-02-09T02:28:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f1acf1ac84d2ae97b7889b87223c1064df850069'/>
<id>urn:sha1:f1acf1ac84d2ae97b7889b87223c1064df850069</id>
<content type='text'>
Functions rds_still_queued and rds_clear_recv_queue lock a given socket
in order to safely iterate over the incoming rds messages. However
calling rds_inc_put while under this lock creates a potential deadlock.
rds_inc_put may eventually call rds_message_purge, which will lock
m_rs_lock. This is the incorrect locking order since m_rs_lock is
meant to be locked before the socket. To fix this, we move the message
item to a local list or variable that wont need rs_recv_lock protection.
Then we can safely call rds_inc_put on any item stored locally after
rs_recv_lock is released.

Fixes: bdbe6fbc6a2f ("RDS: recv.c")
Reported-by: syzbot+f9db6ff27b9bfdcfeca0@syzkaller.appspotmail.com
Reported-by: syzbot+dcd73ff9291e6d34b3ab@syzkaller.appspotmail.com
Signed-off-by: Allison Henderson &lt;allison.henderson@oracle.com&gt;
Link: https://lore.kernel.org/r/20240209022854.200292-1-allison.henderson@oracle.com
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
</content>
</entry>
<entry>
<title>net: add missing includes of linux/sched/clock.h</title>
<updated>2023-01-27T11:19:46+00:00</updated>
<author>
<name>Jakub Kicinski</name>
<email>kuba@kernel.org</email>
</author>
<published>2023-01-26T07:14:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2870c4d6a5e479659cb84992717189634c1e71e0'/>
<id>urn:sha1:2870c4d6a5e479659cb84992717189634c1e71e0</id>
<content type='text'>
Number of files depend on linux/sched/clock.h getting included
by linux/skbuff.h which soon will no longer be the case.

Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: rds: fix memory leak in rds_recvmsg</title>
<updated>2021-06-08T23:32:17+00:00</updated>
<author>
<name>Pavel Skripkin</name>
<email>paskripkin@gmail.com</email>
</author>
<published>2021-06-08T08:06:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=49bfcbfd989a8f1f23e705759a6bb099de2cff9f'/>
<id>urn:sha1:49bfcbfd989a8f1f23e705759a6bb099de2cff9f</id>
<content type='text'>
Syzbot reported memory leak in rds. The problem
was in unputted refcount in case of error.

int rds_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
		int msg_flags)
{
...

	if (!rds_next_incoming(rs, &amp;inc)) {
		...
	}

After this "if" inc refcount incremented and

	if (rds_cmsg_recv(inc, msg, rs)) {
		ret = -EFAULT;
		goto out;
	}
...
out:
	return ret;
}

in case of rds_cmsg_recv() fail the refcount won't be
decremented. And it's easy to see from ftrace log, that
rds_inc_addref() don't have rds_inc_put() pair in
rds_recvmsg() after rds_cmsg_recv()

 1)               |  rds_recvmsg() {
 1)   3.721 us    |    rds_inc_addref();
 1)   3.853 us    |    rds_message_inc_copy_to_user();
 1) + 10.395 us   |    rds_cmsg_recv();
 1) + 34.260 us   |  }

Fixes: bdbe6fbc6a2f ("RDS: recv.c")
Reported-and-tested-by: syzbot+5134cdf021c4ed5aaa5f@syzkaller.appspotmail.com
Signed-off-by: Pavel Skripkin &lt;paskripkin@gmail.com&gt;
Reviewed-by: Håkon Bugge &lt;haakon.bugge@oracle.com&gt;
Acked-by: Santosh Shilimkar &lt;santosh.shilimkar@oracle.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net/rds: Drop duplicate sin and sin6 assignments</title>
<updated>2021-03-10T20:45:15+00:00</updated>
<author>
<name>Yejune Deng</name>
<email>yejune.deng@gmail.com</email>
</author>
<published>2021-03-10T03:23:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3e6f20e09a455d95e61c60d03a9994848b0f853c'/>
<id>urn:sha1:3e6f20e09a455d95e61c60d03a9994848b0f853c</id>
<content type='text'>
There is no need to assign the msg-&gt;msg_name to sin or sin6,
because there is DECLARE_SOCKADDR statement.

Signed-off-by: Yejune Deng &lt;yejune.deng@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>rds: Prevent kernel-infoleak in rds_notify_queue_get()</title>
<updated>2020-07-31T23:52:48+00:00</updated>
<author>
<name>Peilin Ye</name>
<email>yepeilin.cs@gmail.com</email>
</author>
<published>2020-07-30T19:20:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=bbc8a99e952226c585ac17477a85ef1194501762'/>
<id>urn:sha1:bbc8a99e952226c585ac17477a85ef1194501762</id>
<content type='text'>
rds_notify_queue_get() is potentially copying uninitialized kernel stack
memory to userspace since the compiler may leave a 4-byte hole at the end
of `cmsg`.

In 2016 we tried to fix this issue by doing `= { 0 };` on `cmsg`, which
unfortunately does not always initialize that 4-byte hole. Fix it by using
memset() instead.

Cc: stable@vger.kernel.org
Fixes: f037590fff30 ("rds: fix a leak of kernel memory")
Fixes: bdbe6fbc6a2f ("RDS: recv.c")
Suggested-by: Dan Carpenter &lt;dan.carpenter@oracle.com&gt;
Signed-off-by: Peilin Ye &lt;yepeilin.cs@gmail.com&gt;
Acked-by: Santosh Shilimkar &lt;santosh.shilimkar@oracle.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
