<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/include/net/tcp.h, branch v3.11.1</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v3.11.1</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v3.11.1'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2013-06-25T09:44:05+00:00</updated>
<entry>
<title>tcp: remove invalid __rcu annotation</title>
<updated>2013-06-25T09:44:05+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2013-06-25T08:21:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7ae8639c9d6d5aba7990b4fad3506ff7b4667a45'/>
<id>urn:sha1:7ae8639c9d6d5aba7990b4fad3506ff7b4667a45</id>
<content type='text'>
struct tcp_fastopen_context has a field named tfm, which is a pointer
to a crypto_cipher structure.

It currently has a __rcu annotation, which is not needed at all.

tcp_fastopen_ctx is the pointer fetched by rcu_dereference(), but once
we have a pointer to current tcp_fastopen_context, we do not use/need
rcu_dereference() to access tfm.

This fixes a lot of sparse errors like the following :

net/ipv4/tcp_fastopen.c:21:31: warning: incorrect type in argument 1 (different address spaces)
net/ipv4/tcp_fastopen.c:21:31:    expected struct crypto_cipher *tfm
net/ipv4/tcp_fastopen.c:21:31:    got struct crypto_cipher [noderef] &lt;asn:4&gt;*tfm

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Jerry Chu &lt;hkchu@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: properly send new data in fast recovery in first RTT</title>
<updated>2013-06-13T09:46:29+00:00</updated>
<author>
<name>Yuchung Cheng</name>
<email>ycheng@google.com</email>
</author>
<published>2013-06-11T22:35:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=85f16525a2eb66e6092cbd8dcf42371df8334ed0'/>
<id>urn:sha1:85f16525a2eb66e6092cbd8dcf42371df8334ed0</id>
<content type='text'>
Linux sends new unset data during disorder and recovery state if all
(suspected) lost packets have been retransmitted ( RFC5681, section
3.2 step 1 &amp; 2, RFC3517 section 4, NexSeg() Rule 2).  One requirement
is to keep the receive window about twice the estimated sender's
congestion window (tcp_rcv_space_adjust()), assuming the fast
retransmits repair the losses in the next round trip.

But currently it's not the case on the first round trip in either
normal or Fast Open connection, beucase the initial receive window
is identical to (expected) sender's initial congestion window. The
fix is to double it.

Signed-off-by: Yuchung Cheng &lt;ycheng@google.com&gt;
Acked-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: tcp: move GRO/GSO functions to tcp_offload</title>
<updated>2013-06-07T21:39:05+00:00</updated>
<author>
<name>Daniel Borkmann</name>
<email>dborkman@redhat.com</email>
</author>
<published>2013-06-07T05:11:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=28850dc7c71da9d0c0e39246e9ff6913f41f8d0a'/>
<id>urn:sha1:28850dc7c71da9d0c0e39246e9ff6913f41f8d0a</id>
<content type='text'>
Would be good to make things explicit and move those functions to
a new file called tcp_offload.c, thus make this similar to tcpv6_offload.c.
While moving all related functions into tcp_offload.c, we can also
make some of them static, since they are only used there. Also, add
an explicit registration function.

Suggested-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: Daniel Borkmann &lt;dborkman@redhat.com&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: md5: remove spinlock usage in fast path</title>
<updated>2013-05-20T21:00:42+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2013-05-20T06:52:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=71cea17ed39fdf1c0634f530ddc6a2c2fc601c2b'/>
<id>urn:sha1:71cea17ed39fdf1c0634f530ddc6a2c2fc601c2b</id>
<content type='text'>
TCP md5 code uses per cpu variables but protects access to them with
a shared spinlock, which is a contention point.

[ tcp_md5sig_pool_lock is locked twice per incoming packet ]

Makes things much simpler, by allocating crypto structures once, first
time a socket needs md5 keys, and not deallocating them as they are
really small.

Next step would be to allow crypto allocations being done in a NUMA
aware way.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: remove bad timeout logic in fast recovery</title>
<updated>2013-05-20T06:51:17+00:00</updated>
<author>
<name>Yuchung Cheng</name>
<email>ycheng@google.com</email>
</author>
<published>2013-05-17T13:45:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3e59cb0ddfd2c59991f38e89352ad8a3c71b2374'/>
<id>urn:sha1:3e59cb0ddfd2c59991f38e89352ad8a3c71b2374</id>
<content type='text'>
tcp_timeout_skb() was intended to trigger fast recovery on timeout,
unfortunately in reality it often causes spurious retransmission
storms during fast recovery. The particular sign is a fast retransmit
over the highest sacked sequence (SND.FACK).

Currently the RTO timer re-arming (as in RFC6298) offers a nice cushion
to avoid spurious timeout: when SND.UNA advances the sender re-arms
RTO and extends the timeout by icsk_rto. The sender does not offset
the time elapsed since the packet at SND.UNA was sent.

But if the next (DUP)ACK arrives later than ~RTTVAR and triggers
tcp_fastretrans_alert(), then tcp_timeout_skb() will mark any packet
sent before the icsk_rto interval lost, including one that's above the
highest sacked sequence. Most likely a large part of scorebard will be
marked.

If most packets are not lost then the subsequent DUPACKs with new SACK
blocks will cause the sender to continue to retransmit packets beyond
SND.FACK spuriously. Even if only one packet is lost the sender may
falsely retransmit almost the entire window.

The situation becomes common in the world of bufferbloat: the RTT
continues to grow as the queue builds up but RTTVAR remains small and
close to the minimum 200ms. If a data packet is lost and the DUPACK
triggered by the next data packet is slightly delayed, then a spurious
retransmission storm forms.

As the original comment on tcp_timeout_skb() suggests: the usefulness
of this feature is questionable. It also wastes cycles walking the
sack scoreboard and is actually harmful because of false recovery.

It's time to remove this.

Signed-off-by: Yuchung Cheng &lt;ycheng@google.com&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Acked-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Acked-by: Nandita Dukkipati &lt;nanditad@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: GSO should be TSQ friendly</title>
<updated>2013-04-12T22:17:06+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2013-04-12T11:31:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d6a4a10411764cf1c3a5dad4f06c5ebe5194488b'/>
<id>urn:sha1:d6a4a10411764cf1c3a5dad4f06c5ebe5194488b</id>
<content type='text'>
I noticed that TSQ (TCP Small queues) was less effective when TSO is
turned off, and GSO is on. If BQL is not enabled, TSQ has then no
effect.

It turns out the GSO engine frees the original gso_skb at the time the
fragments are generated and queued to the NIC.

We should instead call the tcp_wfree() destructor for the last fragment,
to keep the flow control as intended in TSQ. This effectively limits
the number of queued packets on qdisc + NIC layers.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Tom Herbert &lt;therbert@google.com&gt;
Cc: Yuchung Cheng &lt;ycheng@google.com&gt;
Cc: Nandita Dukkipati &lt;nanditad@google.com&gt;
Cc: Neal Cardwell &lt;ncardwell@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: Remove dead sysctl_tcp_cookie_size declaration</title>
<updated>2013-04-02T18:26:50+00:00</updated>
<author>
<name>Neal Cardwell</name>
<email>ncardwell@google.com</email>
</author>
<published>2013-04-02T17:13:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8466563e16d5198b6efeb3b51791b95b6aaacb6b'/>
<id>urn:sha1:8466563e16d5198b6efeb3b51791b95b6aaacb6b</id>
<content type='text'>
Remove a declaration left over from the TCPCT-ectomy. This sysctl is
no longer referenced anywhere since 1a2c6181c4 ("tcp: Remove TCPCT").

Signed-off-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: refactor F-RTO</title>
<updated>2013-03-21T15:47:50+00:00</updated>
<author>
<name>Yuchung Cheng</name>
<email>ycheng@google.com</email>
</author>
<published>2013-03-20T13:32:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9b44190dc114c1720b34975b5bfc65aece112ced'/>
<id>urn:sha1:9b44190dc114c1720b34975b5bfc65aece112ced</id>
<content type='text'>
The patch series refactor the F-RTO feature (RFC4138/5682).

This is to simplify the loss recovery processing. Existing F-RTO
was developed during the experimental stage (RFC4138) and has
many experimental features.  It takes a separate code path from
the traditional timeout processing by overloading CA_Disorder
instead of using CA_Loss state. This complicates CA_Disorder state
handling because it's also used for handling dubious ACKs and undos.
While the algorithm in the RFC does not change the congestion control,
the implementation intercepts congestion control in various places
(e.g., frto_cwnd in tcp_ack()).

The new code implements newer F-RTO RFC5682 using CA_Loss processing
path.  F-RTO becomes a small extension in the timeout processing
and interfaces with congestion control and Eifel undo modules.
It lets congestion control (module) determines how many to send
independently.  F-RTO only chooses what to send in order to detect
spurious retranmission. If timeout is found spurious it invokes
existing Eifel undo algorithms like DSACK or TCP timestamp based
detection.

The first patch removes all F-RTO code except the sysctl_tcp_frto is
left for the new implementation.  Since CA_EVENT_FRTO is removed, TCP
westwood now computes ssthresh on regular timeout CA_EVENT_LOSS event.

Signed-off-by: Yuchung Cheng &lt;ycheng@google.com&gt;
Acked-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: Remove TCPCT</title>
<updated>2013-03-17T18:35:13+00:00</updated>
<author>
<name>Christoph Paasch</name>
<email>christoph.paasch@uclouvain.be</email>
</author>
<published>2013-03-17T08:23:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1a2c6181c4a1922021b4d7df373bba612c3e5f04'/>
<id>urn:sha1:1a2c6181c4a1922021b4d7df373bba612c3e5f04</id>
<content type='text'>
TCPCT uses option-number 253, reserved for experimental use and should
not be used in production environments.
Further, TCPCT does not fully implement RFC 6013.

As a nice side-effect, removing TCPCT increases TCP's performance for
very short flows:

Doing an apache-benchmark with -c 100 -n 100000, sending HTTP-requests
for files of 1KB size.

before this patch:
	average (among 7 runs) of 20845.5 Requests/Second
after:
	average (among 7 runs) of 21403.6 Requests/Second

Signed-off-by: Christoph Paasch &lt;christoph.paasch@uclouvain.be&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: Tail loss probe (TLP)</title>
<updated>2013-03-12T12:30:34+00:00</updated>
<author>
<name>Nandita Dukkipati</name>
<email>nanditad@google.com</email>
</author>
<published>2013-03-11T10:00:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6ba8a3b19e764b6a65e4030ab0999be50c291e6c'/>
<id>urn:sha1:6ba8a3b19e764b6a65e4030ab0999be50c291e6c</id>
<content type='text'>
This patch series implement the Tail loss probe (TLP) algorithm described
in http://tools.ietf.org/html/draft-dukkipati-tcpm-tcp-loss-probe-01. The
first patch implements the basic algorithm.

TLP's goal is to reduce tail latency of short transactions. It achieves
this by converting retransmission timeouts (RTOs) occuring due
to tail losses (losses at end of transactions) into fast recovery.
TLP transmits one packet in two round-trips when a connection is in
Open state and isn't receiving any ACKs. The transmitted packet, aka
loss probe, can be either new or a retransmission. When there is tail
loss, the ACK from a loss probe triggers FACK/early-retransmit based
fast recovery, thus avoiding a costly RTO. In the absence of loss,
there is no change in the connection state.

PTO stands for probe timeout. It is a timer event indicating
that an ACK is overdue and triggers a loss probe packet. The PTO value
is set to max(2*SRTT, 10ms) and is adjusted to account for delayed
ACK timer when there is only one oustanding packet.

TLP Algorithm

On transmission of new data in Open state:
  -&gt; packets_out &gt; 1: schedule PTO in max(2*SRTT, 10ms).
  -&gt; packets_out == 1: schedule PTO in max(2*RTT, 1.5*RTT + 200ms)
  -&gt; PTO = min(PTO, RTO)

Conditions for scheduling PTO:
  -&gt; Connection is in Open state.
  -&gt; Connection is either cwnd limited or no new data to send.
  -&gt; Number of probes per tail loss episode is limited to one.
  -&gt; Connection is SACK enabled.

When PTO fires:
  new_segment_exists:
    -&gt; transmit new segment.
    -&gt; packets_out++. cwnd remains same.

  no_new_packet:
    -&gt; retransmit the last segment.
       Its ACK triggers FACK or early retransmit based recovery.

ACK path:
  -&gt; rearm RTO at start of ACK processing.
  -&gt; reschedule PTO if need be.

In addition, the patch includes a small variation to the Early Retransmit
(ER) algorithm, such that ER and TLP together can in principle recover any
N-degree of tail loss through fast recovery. TLP is controlled by the same
sysctl as ER, tcp_early_retrans sysctl.
tcp_early_retrans==0; disables TLP and ER.
		 ==1; enables RFC5827 ER.
		 ==2; delayed ER.
		 ==3; TLP and delayed ER. [DEFAULT]
		 ==4; TLP only.

The TLP patch series have been extensively tested on Google Web servers.
It is most effective for short Web trasactions, where it reduced RTOs by 15%
and improved HTTP response time (average by 6%, 99th percentile by 10%).
The transmitted probes account for &lt;0.5% of the overall transmissions.

Signed-off-by: Nandita Dukkipati &lt;nanditad@google.com&gt;
Acked-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Acked-by: Yuchung Cheng &lt;ycheng@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
