<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/net/ipv4/tcp_timer.c, branch v6.6.131</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.6.131</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.6.131'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2024-10-17T13:24:23+00:00</updated>
<entry>
<title>tcp: new TCP_INFO stats for RTO events</title>
<updated>2024-10-17T13:24:23+00:00</updated>
<author>
<name>Aananth V</name>
<email>aananthv@google.com</email>
</author>
<published>2023-09-14T14:36:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=718c49f840ef4e451bf44a8a62aae89ebdd5a687'/>
<id>urn:sha1:718c49f840ef4e451bf44a8a62aae89ebdd5a687</id>
<content type='text'>
[ Upstream commit 3868ab0f192581eff978501a05f3dc2e01541d77 ]

The 2023 SIGCOMM paper "Improving Network Availability with Protective
ReRoute" has indicated Linux TCP's RTO-triggered txhash rehashing can
effectively reduce application disruption during outages. To better
measure the efficacy of this feature, this patch adds three more
detailed stats during RTO recovery and exports via TCP_INFO.
Applications and monitoring systems can leverage this data to measure
the network path diversity and end-to-end repair latency during network
outages to improve their network infrastructure.

The following counters are added to tcp_sock in order to track RTO
events over the lifetime of a TCP socket.

1. u16 total_rto - Counts the total number of RTO timeouts.
2. u16 total_rto_recoveries - Counts the total number of RTO recoveries.
3. u32 total_rto_time - Counts the total time spent (ms) in RTO
                        recoveries. (time spent in CA_Loss and
                        CA_Recovery states)

To compute total_rto_time, we add a new u32 rto_stamp field to
tcp_sock. rto_stamp records the start timestamp (ms) of the last RTO
recovery (CA_Loss).

Corresponding fields are also added to the tcp_info struct.

Signed-off-by: Aananth V &lt;aananthv@google.com&gt;
Signed-off-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Signed-off-by: Yuchung Cheng &lt;ycheng@google.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Stable-dep-of: 27c80efcc204 ("tcp: fix TFO SYN_RECV to not zero retrans_stamp with retransmits out")
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>tcp: fix race in tcp_write_err()</title>
<updated>2024-08-03T06:53:37+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2024-05-28T12:52:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=bc4f9c2ccd68afec3a8395d08fd329af2022c7e8'/>
<id>urn:sha1:bc4f9c2ccd68afec3a8395d08fd329af2022c7e8</id>
<content type='text'>
[ Upstream commit 853c3bd7b7917670224c9fe5245bd045cac411dd ]

I noticed flakes in a packetdrill test, expecting an epoll_wait()
to return EPOLLERR | EPOLLHUP on a failed connect() attempt,
after multiple SYN retransmits. It sometimes return EPOLLERR only.

The issue is that tcp_write_err():
 1) writes an error in sk-&gt;sk_err,
 2) calls sk_error_report(),
 3) then calls tcp_done().

tcp_done() is writing SHUTDOWN_MASK into sk-&gt;sk_shutdown,
among other things.

Problem is that the awaken user thread (from 2) sk_error_report())
might call tcp_poll() before tcp_done() has written sk-&gt;sk_shutdown.

tcp_poll() only sees a non zero sk-&gt;sk_err and returns EPOLLERR.

This patch fixes the issue by making sure to call sk_error_report()
after tcp_done().

tcp_write_err() also lacks an smp_wmb().

We can reuse tcp_done_with_error() to factor out the details,
as Neal suggested.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Acked-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Link: https://lore.kernel.org/r/20240528125253.1966136-3-edumazet@google.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>tcp: avoid too many retransmit packets</title>
<updated>2024-07-18T11:21:17+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2024-07-10T00:14:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=dfcdd7f89e401d2c6616be90c76c2fac3fa98fde'/>
<id>urn:sha1:dfcdd7f89e401d2c6616be90c76c2fac3fa98fde</id>
<content type='text'>
commit 97a9063518f198ec0adb2ecb89789de342bb8283 upstream.

If a TCP socket is using TCP_USER_TIMEOUT, and the other peer
retracted its window to zero, tcp_retransmit_timer() can
retransmit a packet every two jiffies (2 ms for HZ=1000),
for about 4 minutes after TCP_USER_TIMEOUT has 'expired'.

The fix is to make sure tcp_rtx_probe0_timed_out() takes
icsk-&gt;icsk_user_timeout into account.

Before blamed commit, the socket would not timeout after
icsk-&gt;icsk_user_timeout, but would use standard exponential
backoff for the retransmits.

Also worth noting that before commit e89688e3e978 ("net: tcp:
fix unexcepted socket die when snd_wnd is 0"), the issue
would last 2 minutes instead of 4.

Fixes: b701a99e431d ("tcp: Add tcp_clamp_rto_to_user_timeout() helper to improve accuracy")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Neal Cardwell &lt;ncardwell@google.com&gt;
Reviewed-by: Jason Xing &lt;kerneljasonxing@gmail.com&gt;
Reviewed-by: Jon Maxwell &lt;jmaxwell37@gmail.com&gt;
Reviewed-by: Kuniyuki Iwashima &lt;kuniyu@amazon.com&gt;
Link: https://patch.msgid.link/20240710001402.2758273-1-edumazet@google.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>tcp: use signed arithmetic in tcp_rtx_probe0_timed_out()</title>
<updated>2024-07-18T11:21:17+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2024-06-07T12:56:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b75f281bddebdcf363884f0d53c562366e9ead99'/>
<id>urn:sha1:b75f281bddebdcf363884f0d53c562366e9ead99</id>
<content type='text'>
commit 36534d3c54537bf098224a32dc31397793d4594d upstream.

Due to timer wheel implementation, a timer will usually fire
after its schedule.

For instance, for HZ=1000, a timeout between 512ms and 4s
has a granularity of 64ms.
For this range of values, the extra delay could be up to 63ms.

For TCP, this means that tp-&gt;rcv_tstamp may be after
inet_csk(sk)-&gt;icsk_timeout whenever the timer interrupt
finally triggers, if one packet came during the extra delay.

We need to make sure tcp_rtx_probe0_timed_out() handles this case.

Fixes: e89688e3e978 ("net: tcp: fix unexcepted socket die when snd_wnd is 0")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Menglong Dong &lt;imagedong@tencent.com&gt;
Acked-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Reviewed-by: Jason Xing &lt;kerneljasonxing@gmail.com&gt;
Link: https://lore.kernel.org/r/20240607125652.1472540-1-edumazet@google.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>tcp: fix incorrect undo caused by DSACK of TLP retransmit</title>
<updated>2024-07-18T11:21:12+00:00</updated>
<author>
<name>Neal Cardwell</name>
<email>ncardwell@google.com</email>
</author>
<published>2024-07-03T17:12:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=124886cf20599024eb33608a2c3608b7abf3839b'/>
<id>urn:sha1:124886cf20599024eb33608a2c3608b7abf3839b</id>
<content type='text'>
[ Upstream commit 0ec986ed7bab6801faed1440e8839dcc710331ff ]

Loss recovery undo_retrans bookkeeping had a long-standing bug where a
DSACK from a spurious TLP retransmit packet could cause an erroneous
undo of a fast recovery or RTO recovery that repaired a single
really-lost packet (in a sequence range outside that of the TLP
retransmit). Basically, because the loss recovery state machine didn't
account for the fact that it sent a TLP retransmit, the DSACK for the
TLP retransmit could erroneously be implicitly be interpreted as
corresponding to the normal fast recovery or RTO recovery retransmit
that plugged a real hole, thus resulting in an improper undo.

For example, consider the following buggy scenario where there is a
real packet loss but the congestion control response is improperly
undone because of this bug:

+ send packets P1, P2, P3, P4
+ P1 is really lost
+ send TLP retransmit of P4
+ receive SACK for original P2, P3, P4
+ enter fast recovery, fast-retransmit P1, increment undo_retrans to 1
+ receive DSACK for TLP P4, decrement undo_retrans to 0, undo (bug!)
+ receive cumulative ACK for P1-P4 (fast retransmit plugged real hole)

The fix: when we initialize undo machinery in tcp_init_undo(), if
there is a TLP retransmit in flight, then increment tp-&gt;undo_retrans
so that we make sure that we receive a DSACK corresponding to the TLP
retransmit, as well as DSACKs for all later normal retransmits, before
triggering a loss recovery undo. Note that we also have to move the
line that clears tp-&gt;tlp_high_seq for RTO recovery, so that upon RTO
we remember the tp-&gt;tlp_high_seq value until tcp_init_undo() and clear
it only afterward.

Also note that the bug dates back to the original 2013 TLP
implementation, commit 6ba8a3b19e76 ("tcp: Tail loss probe (TLP)").

However, this patch will only compile and work correctly with kernels
that have tp-&gt;tlp_retrans, which was added only in v5.8 in 2020 in
commit 76be93fc0702 ("tcp: allow at most one TLP probe per flight").
So we associate this fix with that later commit.

Fixes: 76be93fc0702 ("tcp: allow at most one TLP probe per flight")
Signed-off-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Yuchung Cheng &lt;ycheng@google.com&gt;
Cc: Kevin Yang &lt;yyd@google.com&gt;
Link: https://patch.msgid.link/20240703171246.1739561-1-ncardwell.sw@gmail.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net</title>
<updated>2023-08-18T19:44:56+00:00</updated>
<author>
<name>Jakub Kicinski</name>
<email>kuba@kernel.org</email>
</author>
<published>2023-08-18T19:44:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7ff57803d24e0cb326251489d70f60462e3e6c04'/>
<id>urn:sha1:7ff57803d24e0cb326251489d70f60462e3e6c04</id>
<content type='text'>
Cross-merge networking fixes after downstream PR.

Conflicts:

drivers/net/ethernet/sfc/tc.c
  fa165e194997 ("sfc: don't unregister flow_indr if it was never registered")
  3bf969e88ada ("sfc: add MAE table machinery for conntrack table")
https://lore.kernel.org/all/20230818112159.7430e9b4@canb.auug.org.au/

No adjacent changes.

Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: fix the RTO timer retransmitting skb every 1ms if linear option is enabled</title>
<updated>2023-08-15T19:24:04+00:00</updated>
<author>
<name>Jason Xing</name>
<email>kernelxing@tencent.com</email>
</author>
<published>2023-08-11T02:37:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e4dd0d3a2f64b8bd8029ec70f52bdbebd0644408'/>
<id>urn:sha1:e4dd0d3a2f64b8bd8029ec70f52bdbebd0644408</id>
<content type='text'>
In the real workload, I encountered an issue which could cause the RTO
timer to retransmit the skb per 1ms with linear option enabled. The amount
of lost-retransmitted skbs can go up to 1000+ instantly.

The root cause is that if the icsk_rto happens to be zero in the 6th round
(which is the TCP_THIN_LINEAR_RETRIES value), then it will always be zero
due to the changed calculation method in tcp_retransmit_timer() as follows:

icsk-&gt;icsk_rto = min(icsk-&gt;icsk_rto &lt;&lt; 1, TCP_RTO_MAX);

Above line could be converted to
icsk-&gt;icsk_rto = min(0 &lt;&lt; 1, TCP_RTO_MAX) = 0

Therefore, the timer expires so quickly without any doubt.

I read through the RFC 6298 and found that the RTO value can be rounded
up to a certain value, in Linux, say TCP_RTO_MIN as default, which is
regarded as the lower bound in this patch as suggested by Eric.

Fixes: 36e31b0af587 ("net: TCP thin linear timeouts")
Suggested-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: Jason Xing &lt;kernelxing@tencent.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: tcp: refactor the dbg message in tcp_retransmit_timer()</title>
<updated>2023-08-13T11:21:38+00:00</updated>
<author>
<name>Menglong Dong</name>
<email>imagedong@tencent.com</email>
</author>
<published>2023-08-11T02:55:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=031c44b7527aec2f22ddaae4bcd8b085ff810ec4'/>
<id>urn:sha1:031c44b7527aec2f22ddaae4bcd8b085ff810ec4</id>
<content type='text'>
The debug message in tcp_retransmit_timer() is slightly wrong, because
they could be printed even if we did not receive a new ACK packet from
the remote peer.

Change it to probing zero-window, as it is a expected case now. The
description may be not correct.

Adding the duration since the last ACK we received, and the duration of
the retransmission, which are useful for debugging.

And the message now like this:

Probing zero-window on 127.0.0.1:9999/46946, seq=3737778959:3737791503, recv 209ms ago, lasting 209ms
Probing zero-window on 127.0.0.1:9999/46946, seq=3737778959:3737791503, recv 404ms ago, lasting 408ms
Probing zero-window on 127.0.0.1:9999/46946, seq=3737778959:3737791503, recv 812ms ago, lasting 1224ms

Signed-off-by: Menglong Dong &lt;imagedong@tencent.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: tcp: fix unexcepted socket die when snd_wnd is 0</title>
<updated>2023-08-13T11:21:37+00:00</updated>
<author>
<name>Menglong Dong</name>
<email>imagedong@tencent.com</email>
</author>
<published>2023-08-11T02:55:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e89688e3e97868451a5d05b38a9d2633d6785cd4'/>
<id>urn:sha1:e89688e3e97868451a5d05b38a9d2633d6785cd4</id>
<content type='text'>
In tcp_retransmit_timer(), a window shrunk connection will be regarded
as timeout if 'tcp_jiffies32 - tp-&gt;rcv_tstamp &gt; TCP_RTO_MAX'. This is not
right all the time.

The retransmits will become zero-window probes in tcp_retransmit_timer()
if the 'snd_wnd==0'. Therefore, the icsk-&gt;icsk_rto will come up to
TCP_RTO_MAX sooner or later.

However, the timer can be delayed and be triggered after 122877ms, not
TCP_RTO_MAX, as I tested.

Therefore, 'tcp_jiffies32 - tp-&gt;rcv_tstamp &gt; TCP_RTO_MAX' is always true
once the RTO come up to TCP_RTO_MAX, and the socket will die.

Fix this by replacing the 'tcp_jiffies32' with '(u32)icsk-&gt;icsk_timeout',
which is exact the timestamp of the timeout.

However, "tp-&gt;rcv_tstamp" can restart from idle, then tp-&gt;rcv_tstamp
could already be a long time (minutes or hours) in the past even on the
first RTO. So we double check the timeout with the duration of the
retransmission.

Meanwhile, making "2 * TCP_RTO_MAX" as the timeout to avoid the socket
dying too soon.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Link: https://lore.kernel.org/netdev/CADxym3YyMiO+zMD4zj03YPM3FBi-1LHi6gSD2XT8pyAMM096pg@mail.gmail.com/
Signed-off-by: Menglong Dong &lt;imagedong@tencent.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: set TCP_LINGER2 locklessly</title>
<updated>2023-08-06T07:24:55+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2023-08-04T14:46:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a81722ddd7e4d76c9bbff078d29416e18c6d7f71'/>
<id>urn:sha1:a81722ddd7e4d76c9bbff078d29416e18c6d7f71</id>
<content type='text'>
tp-&gt;linger2 can be set locklessly as long as readers
use READ_ONCE().

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Acked-by: Soheil Hassas Yeganeh &lt;soheil@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
