diff options
author | Priyaranjan Jha <priyarjha@google.com> | 2017-11-04 02:38:48 +0300 |
---|---|---|
committer | David S. Miller <davem@davemloft.net> | 2017-11-05 17:15:42 +0300 |
commit | 1f2556916d974cfb62b6af51660186b5f58bd869 (patch) | |
tree | fb63694954f33401a8641c794e118f845ca61057 /net/ipv4/tcp_input.c | |
parent | 6c49b5e26004eef86e7a47093a53be290554351c (diff) | |
download | linux-1f2556916d974cfb62b6af51660186b5f58bd869.tar.xz |
tcp: higher throughput under reordering with adaptive RACK reordering wnd
Currently TCP RACK loss detection does not work well if packets are
being reordered beyond its static reordering window (min_rtt/4).Under
such reordering it may falsely trigger loss recoveries and reduce TCP
throughput significantly.
This patch improves that by increasing and reducing the reordering
window based on DSACK, which is now supported in major TCP implementations.
It makes RACK's reo_wnd adaptive based on DSACK and no. of recoveries.
- If DSACK is received, increment reo_wnd by min_rtt/4 (upper bounded
by srtt), since there is possibility that spurious retransmission was
due to reordering delay longer than reo_wnd.
- Persist the current reo_wnd value for TCP_RACK_RECOVERY_THRESH (16)
no. of successful recoveries (accounts for full DSACK-based loss
recovery undo). After that, reset it to default (min_rtt/4).
- At max, reo_wnd is incremented only once per rtt. So that the new
DSACK on which we are reacting, is due to the spurious retx (approx)
after the reo_wnd has been updated last time.
- reo_wnd is tracked in terms of steps (of min_rtt/4), rather than
absolute value to account for change in rtt.
In our internal testing, we observed significant increase in throughput,
in scenarios where reordering exceeds min_rtt/4 (previous static value).
Signed-off-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Diffstat (limited to 'net/ipv4/tcp_input.c')
-rw-r--r-- | net/ipv4/tcp_input.c | 7 |
1 files changed, 7 insertions, 0 deletions
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 8393b405ea98..0ada8bfc2ebd 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -856,6 +856,7 @@ void tcp_disable_fack(struct tcp_sock *tp) static void tcp_dsack_seen(struct tcp_sock *tp) { tp->rx_opt.sack_ok |= TCP_DSACK_SEEN; + tp->rack.dsack_seen = 1; } static void tcp_update_reordering(struct sock *sk, const int metric, @@ -2408,6 +2409,8 @@ static bool tcp_try_undo_recovery(struct sock *sk) mib_idx = LINUX_MIB_TCPFULLUNDO; NET_INC_STATS(sock_net(sk), mib_idx); + } else if (tp->rack.reo_wnd_persist) { + tp->rack.reo_wnd_persist--; } if (tp->snd_una == tp->high_seq && tcp_is_reno(tp)) { /* Hold old state until something *above* high_seq @@ -2427,6 +2430,8 @@ static bool tcp_try_undo_dsack(struct sock *sk) struct tcp_sock *tp = tcp_sk(sk); if (tp->undo_marker && !tp->undo_retrans) { + tp->rack.reo_wnd_persist = min(TCP_RACK_RECOVERY_THRESH, + tp->rack.reo_wnd_persist + 1); DBGUNDO(sk, "D-SACK"); tcp_undo_cwnd_reduction(sk, false); NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPDSACKUNDO); @@ -3644,6 +3649,8 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag) flag |= tcp_clean_rtx_queue(sk, prior_fackets, prior_snd_una, &acked, &sack_state); + tcp_rack_update_reo_wnd(sk, &rs); + if (tp->tlp_high_seq) tcp_process_tlp_ack(sk, ack, flag); /* If needed, reset TLP/RTO timer; RACK may later override this. */ |