<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/net/ipv4/tcp_input.c, branch v6.6.134</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.6.134</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.6.134'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2026-03-25T10:05:43+00:00</updated>
<entry>
<title>net: tcp: accept old ack during closing</title>
<updated>2026-03-25T10:05:43+00:00</updated>
<author>
<name>Menglong Dong</name>
<email>menglong8.dong@gmail.com</email>
</author>
<published>2024-01-26T04:05:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7f4d6a5d3429d3547b904c90efb14c297318d596'/>
<id>urn:sha1:7f4d6a5d3429d3547b904c90efb14c297318d596</id>
<content type='text'>
commit 795a7dfbc3d95e4c7c09569f319f026f8c7f5a9c upstream.

For now, the packet with an old ack is not accepted if we are in
FIN_WAIT1 state, which can cause retransmission. Taking the following
case as an example:

    Client                               Server
      |                                    |
  FIN_WAIT1(Send FIN, seq=10)          FIN_WAIT1(Send FIN, seq=20, ack=10)
      |                                    |
      |                                Send ACK(seq=21, ack=11)
   Recv ACK(seq=21, ack=11)
      |
   Recv FIN(seq=20, ack=10)

In the case above, simultaneous close is happening, and the FIN and ACK
packet that send from the server is out of order. Then, the FIN will be
dropped by the client, as it has an old ack. Then, the server has to
retransmit the FIN, which can cause delay if the server has set the
SO_LINGER on the socket.

Old ack is accepted in the ESTABLISHED and TIME_WAIT state, and I think
it should be better to keep the same logic.

In this commit, we accept old ack in FIN_WAIT1/FIN_WAIT2/CLOSING/LAST_ACK
states. Maybe we should limit it to FIN_WAIT1 for now?

Signed-off-by: Menglong Dong &lt;menglong8.dong@gmail.com&gt;
Reviewed-by: Simon Horman &lt;horms@kernel.org&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Link: https://lore.kernel.org/r/20240126040519.1846345-1-menglong8.dong@gmail.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>tcp: defer regular ACK while processing socket backlog</title>
<updated>2026-03-04T12:20:22+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2023-09-11T17:05:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b4d5e97679bc786687879710dc2de280c70ca889'/>
<id>urn:sha1:b4d5e97679bc786687879710dc2de280c70ca889</id>
<content type='text'>
[ Upstream commit 133c4c0d37175f510a10fa9bed51e223936073fc ]

This idea came after a particular workload requested
the quickack attribute set on routes, and a performance
drop was noticed for large bulk transfers.

For high throughput flows, it is best to use one cpu
running the user thread issuing socket system calls,
and a separate cpu to process incoming packets from BH context.
(With TSO/GRO, bottleneck is usually the 'user' cpu)

Problem is the user thread can spend a lot of time while holding
the socket lock, forcing BH handler to queue most of incoming
packets in the socket backlog.

Whenever the user thread releases the socket lock, it must first
process all accumulated packets in the backlog, potentially
adding latency spikes. Due to flood mitigation, having too many
packets in the backlog increases chance of unexpected drops.

Backlog processing unfortunately shifts a fair amount of cpu cycles
from the BH cpu to the 'user' cpu, thus reducing max throughput.

This patch takes advantage of the backlog processing,
and the fact that ACK are mostly cumulative.

The idea is to detect we are in the backlog processing
and defer all eligible ACK into a single one,
sent from tcp_release_cb().

This saves cpu cycles on both sides, and network resources.

Performance of a single TCP flow on a 200Gbit NIC:

- Throughput is increased by 20% (100Gbit -&gt; 120Gbit).
- Number of generated ACK per second shrinks from 240,000 to 40,000.
- Number of backlog drops per second shrinks from 230 to 0.

Benchmark context:
 - Regular netperf TCP_STREAM (no zerocopy)
 - Intel(R) Xeon(R) Platinum 8481C (Saphire Rapids)
 - MAX_SKB_FRAGS = 17 (~60KB per GRO packet)

This feature is guarded by a new sysctl, and enabled by default:
 /proc/sys/net/ipv4/tcp_backlog_ack_defer

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Acked-by: Yuchung Cheng &lt;ycheng@google.com&gt;
Acked-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Acked-by: Soheil Hassas Yeganeh &lt;soheil@google.com&gt;
Acked-by: Dave Taht &lt;dave.taht@gmail.com&gt;
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
Stable-dep-of: 87b08913a9ae ("inet: move icmp_global_{credit,stamp} to a separate cache line")
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>tcp: Don't call reqsk_fastopen_remove() in tcp_conn_request().</title>
<updated>2025-10-19T14:30:43+00:00</updated>
<author>
<name>Kuniyuki Iwashima</name>
<email>kuniyu@google.com</email>
</author>
<published>2025-10-01T23:37:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=422c1c173c39bbbae1e0eaaf8aefe40b2596233b'/>
<id>urn:sha1:422c1c173c39bbbae1e0eaaf8aefe40b2596233b</id>
<content type='text'>
[ Upstream commit 2e7cbbbe3d61c63606994b7ff73c72537afe2e1c ]

syzbot reported the splat below in tcp_conn_request(). [0]

If a listener is close()d while a TFO socket is being processed in
tcp_conn_request(), inet_csk_reqsk_queue_add() does not set reqsk-&gt;sk
and calls inet_child_forget(), which calls tcp_disconnect() for the
TFO socket.

After the cited commit, tcp_disconnect() calls reqsk_fastopen_remove(),
where reqsk_put() is called due to !reqsk-&gt;sk.

Then, reqsk_fastopen_remove() in tcp_conn_request() decrements the
last req-&gt;rsk_refcnt and frees reqsk, and __reqsk_free() at the
drop_and_free label causes the refcount underflow for the listener
and double-free of the reqsk.

Let's remove reqsk_fastopen_remove() in tcp_conn_request().

Note that other callers make sure tp-&gt;fastopen_rsk is not NULL.

[0]:
refcount_t: underflow; use-after-free.
WARNING: CPU: 12 PID: 5563 at lib/refcount.c:28 refcount_warn_saturate (lib/refcount.c:28)
Modules linked in:
CPU: 12 UID: 0 PID: 5563 Comm: syz-executor Not tainted syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/12/2025
RIP: 0010:refcount_warn_saturate (lib/refcount.c:28)
Code: ab e8 8e b4 98 ff 0f 0b c3 cc cc cc cc cc 80 3d a4 e4 d6 01 00 75 9c c6 05 9b e4 d6 01 01 48 c7 c7 e8 df fb ab e8 6a b4 98 ff &lt;0f&gt; 0b e9 03 5b 76 00 cc 80 3d 7d e4 d6 01 00 0f 85 74 ff ff ff c6
RSP: 0018:ffffa79fc0304a98 EFLAGS: 00010246
RAX: d83af4db1c6b3900 RBX: ffff9f65c7a69020 RCX: d83af4db1c6b3900
RDX: 0000000000000000 RSI: 00000000ffff7fff RDI: ffffffffac78a280
RBP: 000000009d781b60 R08: 0000000000007fff R09: ffffffffac6ca280
R10: 0000000000017ffd R11: 0000000000000004 R12: ffff9f65c7b4f100
R13: ffff9f65c7d23c00 R14: ffff9f65c7d26000 R15: ffff9f65c7a64ef8
FS:  00007f9f962176c0(0000) GS:ffff9f65fcf00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000200000000180 CR3: 000000000dbbe006 CR4: 0000000000372ef0
Call Trace:
 &lt;IRQ&gt;
 tcp_conn_request (./include/linux/refcount.h:400 ./include/linux/refcount.h:432 ./include/linux/refcount.h:450 ./include/net/sock.h:1965 ./include/net/request_sock.h:131 net/ipv4/tcp_input.c:7301)
 tcp_rcv_state_process (net/ipv4/tcp_input.c:6708)
 tcp_v6_do_rcv (net/ipv6/tcp_ipv6.c:1670)
 tcp_v6_rcv (net/ipv6/tcp_ipv6.c:1906)
 ip6_protocol_deliver_rcu (net/ipv6/ip6_input.c:438)
 ip6_input (net/ipv6/ip6_input.c:500)
 ipv6_rcv (net/ipv6/ip6_input.c:311)
 __netif_receive_skb (net/core/dev.c:6104)
 process_backlog (net/core/dev.c:6456)
 __napi_poll (net/core/dev.c:7506)
 net_rx_action (net/core/dev.c:7569 net/core/dev.c:7696)
 handle_softirqs (kernel/softirq.c:579)
 do_softirq (kernel/softirq.c:480)
 &lt;/IRQ&gt;

Fixes: 45c8a6cc2bcd ("tcp: Clear tcp_sk(sk)-&gt;fastopen_rsk in tcp_disconnect().")
Reported-by: syzkaller &lt;syzkaller@googlegroups.com&gt;
Signed-off-by: Kuniyuki Iwashima &lt;kuniyu@google.com&gt;
Link: https://patch.msgid.link/20251001233755.1340927-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>tcp: call tcp_measure_rcv_mss() for ooo packets</title>
<updated>2025-08-15T10:08:47+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2025-07-11T11:40:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=20a1e6536424441f4c244fec2182721e13802e3c'/>
<id>urn:sha1:20a1e6536424441f4c244fec2182721e13802e3c</id>
<content type='text'>
[ Upstream commit 38d7e444336567bae1c7b21fc18b7ceaaa5643a0 ]

tcp_measure_rcv_mss() is used to update icsk-&gt;icsk_ack.rcv_mss
(tcpi_rcv_mss in tcp_info) and tp-&gt;scaling_ratio.

Calling it from tcp_data_queue_ofo() makes sure these
fields are updated, and permits a better tuning
of sk-&gt;sk_rcvbuf, in the case a new flow receives many ooo
packets.

Fixes: dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reviewed-by: Kuniyuki Iwashima &lt;kuniyu@google.com&gt;
Link: https://patch.msgid.link/20250711114006.480026-5-edumazet@google.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>tcp: fix tcp_ofo_queue() to avoid including too much DUP SACK range</title>
<updated>2025-08-15T10:08:46+00:00</updated>
<author>
<name>xin.guo</name>
<email>guoxin0309@gmail.com</email>
</author>
<published>2025-06-26T12:34:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c45a33903e2484a11234a7ab47593615ed4a1be7'/>
<id>urn:sha1:c45a33903e2484a11234a7ab47593615ed4a1be7</id>
<content type='text'>
[ Upstream commit a041f70e573e185d5d5fdbba53f0db2fbe7257ad ]

If the new coming segment covers more than one skbs in the ofo queue,
and which seq is equal to rcv_nxt, then the sequence range
that is duplicated will be sent as DUP SACK, the detail as below,
in step6, the {501,2001} range is clearly including too much
DUP SACK range, in violation of RFC 2883 rules.

1. client &gt; server: Flags [.], seq 501:1001, ack 1325288529, win 20000, length 500
2. server &gt; client: Flags [.], ack 1, [nop,nop,sack 1 {501:1001}], length 0
3. client &gt; server: Flags [.], seq 1501:2001, ack 1325288529, win 20000, length 500
4. server &gt; client: Flags [.], ack 1, [nop,nop,sack 2 {1501:2001} {501:1001}], length 0
5. client &gt; server: Flags [.], seq 1:2001, ack 1325288529, win 20000, length 2000
6. server &gt; client: Flags [.], ack 2001, [nop,nop,sack 1 {501:2001}], length 0

After this fix, the final ACK is as below:

6. server &gt; client: Flags [.], ack 2001, options [nop,nop,sack 1 {501:1001}], length 0

[edumazet] added a new packetdrill test in the following patch.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: xin.guo &lt;guoxin0309@gmail.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Link: https://patch.msgid.link/20250626123420.1933835-2-edumazet@google.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>tcp: fix tcp_packet_delayed() for tcp_is_non_sack_preventing_reopen() behavior</title>
<updated>2025-06-27T10:09:02+00:00</updated>
<author>
<name>Neal Cardwell</name>
<email>ncardwell@google.com</email>
</author>
<published>2025-06-13T19:30:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9d0ddfb574a2255ad84c036b15d84e1e6c06ba0e'/>
<id>urn:sha1:9d0ddfb574a2255ad84c036b15d84e1e6c06ba0e</id>
<content type='text'>
[ Upstream commit d0fa59897e049e84432600e86df82aab3dce7aa5 ]

After the following commit from 2024:

commit e37ab7373696 ("tcp: fix to allow timestamp undo if no retransmits were sent")

...there was buggy behavior where TCP connections without SACK support
could easily see erroneous undo events at the end of fast recovery or
RTO recovery episodes. The erroneous undo events could cause those
connections to suffer repeated loss recovery episodes and high
retransmit rates.

The problem was an interaction between the non-SACK behavior on these
connections and the undo logic. The problem is that, for non-SACK
connections at the end of a loss recovery episode, if snd_una ==
high_seq, then tcp_is_non_sack_preventing_reopen() holds steady in
CA_Recovery or CA_Loss, but clears tp-&gt;retrans_stamp to 0. Then upon
the next ACK the "tcp: fix to allow timestamp undo if no retransmits
were sent" logic saw the tp-&gt;retrans_stamp at 0 and erroneously
concluded that no data was retransmitted, and erroneously performed an
undo of the cwnd reduction, restoring cwnd immediately to the value it
had before loss recovery.  This caused an immediate burst of traffic
and build-up of queues and likely another immediate loss recovery
episode.

This commit fixes tcp_packet_delayed() to ignore zero retrans_stamp
values for non-SACK connections when snd_una is at or above high_seq,
because tcp_is_non_sack_preventing_reopen() clears retrans_stamp in
this case, so it's not a valid signal that we can undo.

Note that the commit named in the Fixes footer restored long-present
behavior from roughly 2005-2019, so apparently this bug was present
for a while during that era, and this was simply not caught.

Fixes: e37ab7373696 ("tcp: fix to allow timestamp undo if no retransmits were sent")
Reported-by: Eric Wheeler &lt;netdev@lists.ewheeler.net&gt;
Closes: https://lore.kernel.org/netdev/64ea9333-e7f9-0df-b0f2-8d566143acab@ewheeler.net/
Signed-off-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Co-developed-by: Yuchung Cheng &lt;ycheng@google.com&gt;
Signed-off-by: Yuchung Cheng &lt;ycheng@google.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>tcp: fix initial tp-&gt;rcvq_space.space value for passive TS enabled flows</title>
<updated>2025-06-27T10:08:54+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2025-05-13T19:39:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c1025a54289bf9de1a587e2a258fe33c507965e9'/>
<id>urn:sha1:c1025a54289bf9de1a587e2a258fe33c507965e9</id>
<content type='text'>
[ Upstream commit cd171461b90a2d2cf230943df60d580174633718 ]

tcp_rcv_state_process() must tweak tp-&gt;advmss for TS enabled flows
before the call to tcp_init_transfer() / tcp_init_buffer_space().

Otherwise tp-&gt;rcvq_space.space is off by 120 bytes
(TCP_INIT_CWND * TCPOLEN_TSTAMP_ALIGNED).

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reviewed-by: Wei Wang &lt;weiwan@google.com&gt;
Link: https://patch.msgid.link/20250513193919.1089692-7-edumazet@google.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>tcp: always seek for minimal rtt in tcp_rcv_rtt_update()</title>
<updated>2025-06-27T10:08:54+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2025-05-13T19:39:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=156b6bac76450f61b1a409eaaa4c2ac3a3369eea'/>
<id>urn:sha1:156b6bac76450f61b1a409eaaa4c2ac3a3369eea</id>
<content type='text'>
[ Upstream commit b879dcb1aeeca278eacaac0b1e2425b1c7599f9f ]

tcp_rcv_rtt_update() goal is to maintain an estimation of the RTT
in tp-&gt;rcv_rtt_est.rtt_us, used by tcp_rcv_space_adjust()

When TCP TS are enabled, tcp_rcv_rtt_update() is using
EWMA to smooth the samples.

Change this to immediately latch the incoming value if it
is lower than tp-&gt;rcv_rtt_est.rtt_us, so that tcp_rcv_space_adjust()
does not overshoot tp-&gt;rcvq_space.space and sk-&gt;sk_rcvbuf.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Link: https://patch.msgid.link/20250513193919.1089692-8-edumazet@google.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>tcp: reorganize tcp_in_ack_event() and tcp_count_delivered()</title>
<updated>2025-06-04T12:41:58+00:00</updated>
<author>
<name>Ilpo Järvinen</name>
<email>ij@kernel.org</email>
</author>
<published>2025-03-05T22:38:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a7b7bc510e9eaa5d5833b4f4012a02c8c28e321b'/>
<id>urn:sha1:a7b7bc510e9eaa5d5833b4f4012a02c8c28e321b</id>
<content type='text'>
[ Upstream commit 149dfb31615e22271d2525f078c95ea49bc4db24 ]

- Move tcp_count_delivered() earlier and split tcp_count_delivered_ce()
  out of it
- Move tcp_in_ack_event() later
- While at it, remove the inline from tcp_in_ack_event() and let
  the compiler to decide

Accurate ECN's heuristics does not know if there is going
to be ACE field based CE counter increase or not until after
rtx queue has been processed. Only then the number of ACKed
bytes/pkts is available. As CE or not affects presence of
FLAG_ECE, that information for tcp_in_ack_event is not yet
available in the old location of the call to tcp_in_ack_event().

Signed-off-by: Ilpo Järvinen &lt;ij@kernel.org&gt;
Signed-off-by: Chia-Yu Chang &lt;chia-yu.chang@nokia-bell-labs.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>tcp: drop secpath at the same time as we currently drop dst</title>
<updated>2025-02-27T12:10:49+00:00</updated>
<author>
<name>Sabrina Dubroca</name>
<email>sd@queasysnail.net</email>
</author>
<published>2025-02-17T10:23:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f1d5e6a5e468308af7759cf5276779d3155c5e98'/>
<id>urn:sha1:f1d5e6a5e468308af7759cf5276779d3155c5e98</id>
<content type='text'>
[ Upstream commit 9b6412e6979f6f9e0632075f8f008937b5cd4efd ]

Xiumei reported hitting the WARN in xfrm6_tunnel_net_exit while
running tests that boil down to:
 - create a pair of netns
 - run a basic TCP test over ipcomp6
 - delete the pair of netns

The xfrm_state found on spi_byaddr was not deleted at the time we
delete the netns, because we still have a reference on it. This
lingering reference comes from a secpath (which holds a ref on the
xfrm_state), which is still attached to an skb. This skb is not
leaked, it ends up on sk_receive_queue and then gets defer-free'd by
skb_attempt_defer_free.

The problem happens when we defer freeing an skb (push it on one CPU's
defer_list), and don't flush that list before the netns is deleted. In
that case, we still have a reference on the xfrm_state that we don't
expect at this point.

We already drop the skb's dst in the TCP receive path when it's no
longer needed, so let's also drop the secpath. At this point,
tcp_filter has already called into the LSM hooks that may require the
secpath, so it should not be needed anymore. However, in some of those
places, the MPTCP extension has just been attached to the skb, so we
cannot simply drop all extensions.

Fixes: 68822bdf76f1 ("net: generalize skb freeing deferral to per-cpu lists")
Reported-by: Xiumei Mu &lt;xmu@redhat.com&gt;
Signed-off-by: Sabrina Dubroca &lt;sd@queasysnail.net&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Link: https://patch.msgid.link/5055ba8f8f72bdcb602faa299faca73c280b7735.1739743613.git.sd@queasysnail.net
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
</feed>
