From d2000361e4ddf5047d660a902a3b0ed7520be1e5 Mon Sep 17 00:00:00 2001 From: Paolo Abeni Date: Tue, 7 Apr 2026 10:45:17 +0200 Subject: mptcp: better mptcp-level RTT estimator The current MPTCP-level RTT estimator has several issues. On high speed links, the MPTCP-level receive buffer auto-tuning happens with a frequency well above the TCP-level's one. That in turn can cause excessive/unneeded receive buffer increase. On such links, the initial rtt_us value is considerably higher than the actual delay, and the current mptcp_rcv_space_adjust() updates msk->rcvq_space.rtt_us with a period equal to the such field previous value. If the initial rtt_us is 40ms, its first update will happen after 40ms, even if the subflows see actual RTT orders of magnitude lower. Additionally: - setting the msk RTT to the maximum among all the subflows RTTs makes DRS constantly overshooting the rcvbuf size when a subflow has considerable higher latency than the other(s). - during unidirectional bulk transfers with multiple active subflows, the TCP-level RTT estimator occasionally sees considerably higher value than the real link delay, i.e. when the packet scheduler reacts to an incoming ACK on given subflow pushing data on a different subflow. - currently inactive but still open subflows (i.e. switched to backup mode) are always considered when computing the msk-level RTT. Address the all the issues above with a more accurate RTT estimation strategy: the MPTCP-level RTT is set to the minimum of all the subflows actually feeding data into the MPTCP receive buffer, using a small sliding window. While at it, also use EWMA to compute the msk-level scaling_ratio, to that MPTCP can avoid traversing the subflow list is mptcp_rcv_space_adjust(). Use some care to avoid updating msk and ssk level fields too often. Fixes: a6b118febbab ("mptcp: add receive buffer auto-tuning") Signed-off-by: Paolo Abeni Reviewed-by: Mat Martineau Signed-off-by: Matthieu Baerts (NGI0) Link: https://patch.msgid.link/20260407-net-next-mptcp-reduce-rbuf-v2-1-0d1d135bf6f6@kernel.org Signed-off-by: Jakub Kicinski --- include/trace/events/mptcp.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/trace') diff --git a/include/trace/events/mptcp.h b/include/trace/events/mptcp.h index 269d949b2025..04521acba483 100644 --- a/include/trace/events/mptcp.h +++ b/include/trace/events/mptcp.h @@ -219,7 +219,7 @@ TRACE_EVENT(mptcp_rcvbuf_grow, __be32 *p32; __entry->time = time; - __entry->rtt_us = msk->rcvq_space.rtt_us >> 3; + __entry->rtt_us = mptcp_rtt_us_est(msk) >> 3; __entry->copied = msk->rcvq_space.copied; __entry->inq = mptcp_inq_hint(sk); __entry->space = msk->rcvq_space.space; -- cgit v1.2.3