<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/include/net/tcp.h, branch v3.10.47</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v3.10.47</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v3.10.47'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2014-03-24T04:38:10+00:00</updated>
<entry>
<title>net-tcp: fastopen: fix high order allocations</title>
<updated>2014-03-24T04:38:10+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2014-02-20T18:09:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=fe42b170afae5978dc90641f29f2b39aefaa47fa'/>
<id>urn:sha1:fe42b170afae5978dc90641f29f2b39aefaa47fa</id>
<content type='text'>
[ Upstream commit f5ddcbbb40aa0ba7fbfe22355d287603dbeeaaac ]

This patch fixes two bugs in fastopen:

1) The tcp_sendmsg(..., @size) argument was ignored.

   The code relied on the user not fooling the kernel with iovec mismatches.

2) When MTU is about 64KB, tcp_send_syn_data() attempts order-5
allocations, which are likely to fail when memory gets fragmented.
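
In rough terms, the fix clamps the SYN-data payload to both the
caller-supplied size and an order-0 skb allocation. A hedged sketch
(approximating the upstream change, not quoting it; local variable
names are illustrative):

	/* sketch: honor the caller-supplied size and keep the linear
	 * skb allocation at order-0, so it no longer fails merely
	 * because physical memory has become fragmented */
	space = min_t(size_t, space, size);
	space = min_t(size_t, space, SKB_MAX_HEAD(MAX_TCP_HEADER));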

Fixes: 783237e8daf13 ("net-tcp: Fast Open client - sending SYN-data")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Yuchung Cheng &lt;ycheng@google.com&gt;
Acked-by: Yuchung Cheng &lt;ycheng@google.com&gt;
Tested-by: Yuchung Cheng &lt;ycheng@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>tcp: TSO packets automatic sizing</title>
<updated>2013-11-04T12:30:59+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2013-08-27T12:46:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5e25ba5003ee5de0ba2be56bfd54d16d4b1b028d'/>
<id>urn:sha1:5e25ba5003ee5de0ba2be56bfd54d16d4b1b028d</id>
<content type='text'>
[ Upstream commits 6d36824e730f247b602c90e8715a792003e3c5a7,
  02cf4ebd82ff0ac7254b88e466820a290ed8289a, and parts of
  7eec4174ff29cd42f2acfae8112f51c228545d40 ]

After hearing many people complain over the past years that TSO is
bursty or even buggy, we are proud to present automatic sizing of TSO
packets.

One part of the problem is that tcp_tso_should_defer() uses a heuristic
relying on upcoming ACKs instead of a timer, but more generally, having
big TSO packets makes little sense at low rates, as it tends to create
micro bursts on the network, and the general consensus is to reduce the
amount of buffering.

This patch introduces a per-socket field, sk_pacing_rate, that
approximates the current sending rate and allows us to size TSO
packets so that we try to send one packet every ms.

This field could be set by other transports.

The patch has no impact on high speed flows, where having large TSO
packets makes sense to reach line rate.

For other flows, this helps better packet scheduling and ACK clocking.

This patch increases performance of TCP flows in lossy environments.

A new sysctl (tcp_min_tso_segs) is added to specify the minimal
number of segments in a TSO packet (default: 2).

A follow-up patch will provide a new packet scheduler (FQ), using
sk_pacing_rate as an input to perform optional per flow pacing.

This explains why we chose to set sk_pacing_rate to twice the current
rate, allowing 'slow start' ramp up.

sk_pacing_rate = 2 * cwnd * mss / srtt
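
A hedged sketch of that update (field names approximate this era of
the kernel, where srtt is kept in jiffies left-shifted by 3; not the
verbatim patch):

	/* sketch: pacing rate in bytes/sec ~= 2 * cwnd * mss / srtt;
	 * the factor of 2 leaves headroom for slow-start style ramp
	 * up between consecutive updates */
	u64 rate = (u64)tp-&gt;mss_cache * tp-&gt;snd_cwnd * 2 * (HZ &lt;&lt; 3);

	if (likely(tp-&gt;srtt))
		do_div(rate, tp-&gt;srtt);
	sk-&gt;sk_pacing_rate = min_t(u64, rate, ~0U);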

v2: Neal Cardwell reported a suspicious deferral of the last two
segments on an initial write of 10 MSS; I had to change
tcp_tso_should_defer() to take tp-&gt;xmit_size_goal_segs into account.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Neal Cardwell &lt;ncardwell@google.com&gt;
Cc: Yuchung Cheng &lt;ycheng@google.com&gt;
Cc: Van Jacobson &lt;vanj@google.com&gt;
Cc: Tom Herbert &lt;therbert@google.com&gt;
Acked-by: Yuchung Cheng &lt;ycheng@google.com&gt;
Acked-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>tcp: GSO should be TSQ friendly</title>
<updated>2013-04-12T22:17:06+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2013-04-12T11:31:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d6a4a10411764cf1c3a5dad4f06c5ebe5194488b'/>
<id>urn:sha1:d6a4a10411764cf1c3a5dad4f06c5ebe5194488b</id>
<content type='text'>
I noticed that TSQ (TCP Small Queues) was less effective when TSO is
turned off and GSO is on. If BQL is not enabled, TSQ then has no
effect.

It turns out the GSO engine frees the original gso_skb at the time the
fragments are generated and queued to the NIC.

We should instead call the tcp_wfree() destructor for the last fragment,
to keep the flow control as intended in TSQ. This effectively limits
the number of queued packets on qdisc + NIC layers.
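
A hedged sketch of that hand-off (approximating the change, not
quoting the diff): ownership and the destructor move from the
original gso_skb to the last generated segment, so tcp_wfree() runs
at true TX completion:

	/* sketch: skb here is the last segment produced by GSO;
	 * defer the TSQ accounting callback until it is freed */
	if (copy_destructor) {
		swap(gso_skb-&gt;sk, skb-&gt;sk);
		swap(gso_skb-&gt;destructor, skb-&gt;destructor);
		swap(gso_skb-&gt;truesize, skb-&gt;truesize);
	}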

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Tom Herbert &lt;therbert@google.com&gt;
Cc: Yuchung Cheng &lt;ycheng@google.com&gt;
Cc: Nandita Dukkipati &lt;nanditad@google.com&gt;
Cc: Neal Cardwell &lt;ncardwell@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: Remove dead sysctl_tcp_cookie_size declaration</title>
<updated>2013-04-02T18:26:50+00:00</updated>
<author>
<name>Neal Cardwell</name>
<email>ncardwell@google.com</email>
</author>
<published>2013-04-02T17:13:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8466563e16d5198b6efeb3b51791b95b6aaacb6b'/>
<id>urn:sha1:8466563e16d5198b6efeb3b51791b95b6aaacb6b</id>
<content type='text'>
Remove a declaration left over from the TCPCT-ectomy. This sysctl is
no longer referenced anywhere since 1a2c6181c4 ("tcp: Remove TCPCT").

Signed-off-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: refactor F-RTO</title>
<updated>2013-03-21T15:47:50+00:00</updated>
<author>
<name>Yuchung Cheng</name>
<email>ycheng@google.com</email>
</author>
<published>2013-03-20T13:32:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9b44190dc114c1720b34975b5bfc65aece112ced'/>
<id>urn:sha1:9b44190dc114c1720b34975b5bfc65aece112ced</id>
<content type='text'>
This patch series refactors the F-RTO feature (RFC4138/5682).

This is to simplify the loss recovery processing. The existing F-RTO
was developed during the experimental stage (RFC4138) and has many
experimental features.  It takes a separate code path from the
traditional timeout processing by overloading CA_Disorder instead of
using the CA_Loss state. This complicates CA_Disorder state handling
because it's also used for handling dubious ACKs and undos. While the
algorithm in the RFC does not change the congestion control, the
implementation intercepts congestion control in various places (e.g.,
frto_cwnd in tcp_ack()).

The new code implements the newer F-RTO of RFC5682 using the CA_Loss
processing path.  F-RTO becomes a small extension of the timeout
processing and interfaces with the congestion control and Eifel undo
modules.  It lets the congestion control (module) independently
determine how much to send.  F-RTO only chooses what to send in order
to detect spurious retransmission. If the timeout is found to be
spurious, it invokes existing Eifel undo algorithms such as DSACK or
TCP timestamp based detection.
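
A hedged sketch of the detection step inside CA_Loss processing
(names approximate the kernel's; not the verbatim patch):

	/* sketch: in CA_Loss after a timeout */
	if (flag &amp; FLAG_ORIG_SACK_ACKED) {
		/* an ACK/SACK covering data we never retransmitted
		 * proves the original flight arrived: the RTO was
		 * spurious, so hand over to the Eifel undo machinery */
		spurious_rto = true;	/* illustrative, not a kernel field */
	}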

The first patch removes all F-RTO code, except that sysctl_tcp_frto is
left for the new implementation.  Since CA_EVENT_FRTO is removed, TCP
Westwood now computes ssthresh on the regular timeout CA_EVENT_LOSS event.

Signed-off-by: Yuchung Cheng &lt;ycheng@google.com&gt;
Acked-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: Remove TCPCT</title>
<updated>2013-03-17T18:35:13+00:00</updated>
<author>
<name>Christoph Paasch</name>
<email>christoph.paasch@uclouvain.be</email>
</author>
<published>2013-03-17T08:23:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1a2c6181c4a1922021b4d7df373bba612c3e5f04'/>
<id>urn:sha1:1a2c6181c4a1922021b4d7df373bba612c3e5f04</id>
<content type='text'>
TCPCT uses option number 253, which is reserved for experimental use
and should not be used in production environments.
Further, TCPCT does not fully implement RFC 6013.

As a nice side-effect, removing TCPCT increases TCP's performance for
very short flows:

Running apache-benchmark with -c 100 -n 100000, sending HTTP requests
for files of 1KB size:

before this patch:
	average (among 7 runs) of 20845.5 Requests/Second
after:
	average (among 7 runs) of 21403.6 Requests/Second

Signed-off-by: Christoph Paasch &lt;christoph.paasch@uclouvain.be&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: Tail loss probe (TLP)</title>
<updated>2013-03-12T12:30:34+00:00</updated>
<author>
<name>Nandita Dukkipati</name>
<email>nanditad@google.com</email>
</author>
<published>2013-03-11T10:00:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6ba8a3b19e764b6a65e4030ab0999be50c291e6c'/>
<id>urn:sha1:6ba8a3b19e764b6a65e4030ab0999be50c291e6c</id>
<content type='text'>
This patch series implements the Tail loss probe (TLP) algorithm described
in http://tools.ietf.org/html/draft-dukkipati-tcpm-tcp-loss-probe-01. The
first patch implements the basic algorithm.

TLP's goal is to reduce tail latency of short transactions. It achieves
this by converting retransmission timeouts (RTOs) occurring due
to tail losses (losses at end of transactions) into fast recovery.
TLP transmits one packet in two round-trips when a connection is in
Open state and isn't receiving any ACKs. The transmitted packet, aka
loss probe, can be either new or a retransmission. When there is tail
loss, the ACK from a loss probe triggers FACK/early-retransmit based
fast recovery, thus avoiding a costly RTO. In the absence of loss,
there is no change in the connection state.

PTO stands for probe timeout. It is a timer event indicating
that an ACK is overdue, and it triggers a loss probe packet. The PTO
value is set to max(2*SRTT, 10ms) and is adjusted to account for the
delayed ACK timer when there is only one outstanding packet; see the
sketch after the algorithm summary below.

TLP Algorithm

On transmission of new data in Open state:
  -&gt; packets_out &gt; 1: schedule PTO in max(2*SRTT, 10ms).
  -&gt; packets_out == 1: schedule PTO in max(2*RTT, 1.5*RTT + 200ms)
  -&gt; PTO = min(PTO, RTO)

Conditions for scheduling PTO:
  -&gt; Connection is in Open state.
  -&gt; Connection is either cwnd limited or no new data to send.
  -&gt; Number of probes per tail loss episode is limited to one.
  -&gt; Connection is SACK enabled.

When PTO fires:
  new_segment_exists:
    -&gt; transmit new segment.
    -&gt; packets_out++. cwnd remains same.

  no_new_packet:
    -&gt; retransmit the last segment.
       Its ACK triggers FACK or early retransmit based recovery.

ACK path:
  -&gt; rearm RTO at start of ACK processing.
  -&gt; reschedule PTO if need be.
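
A hedged sketch of the PTO computation above (plain C; variable names
such as packets_out, srtt, and rto are illustrative, times in
microseconds):

	u32 pto;

	if (packets_out &gt; 1)
		pto = max_t(u32, 2 * srtt, 10 * USEC_PER_MSEC);
	else	/* lone packet: leave room for the delayed ACK timer */
		pto = max_t(u32, 2 * srtt, (3 * srtt) / 2 + 200 * USEC_PER_MSEC);

	pto = min_t(u32, pto, rto);	/* never schedule the probe past the RTO */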

In addition, the patch includes a small variation to the Early Retransmit
(ER) algorithm, such that ER and TLP together can in principle recover any
N-degree of tail loss through fast recovery. TLP is controlled by the same
sysctl as ER, tcp_early_retrans:
tcp_early_retrans==0; disables TLP and ER.
		 ==1; enables RFC5827 ER.
		 ==2; delayed ER.
		 ==3; TLP and delayed ER. [DEFAULT]
		 ==4; TLP only.

The TLP patch series has been extensively tested on Google Web servers.
It is most effective for short Web transactions, where it reduced RTOs by 15%
and improved HTTP response time (average by 6%, 99th percentile by 10%).
The transmitted probes account for &lt;0.5% of the overall transmissions.

Signed-off-by: Nandita Dukkipati &lt;nanditad@google.com&gt;
Acked-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Acked-by: Yuchung Cheng &lt;ycheng@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: uninline tcp_prequeue()</title>
<updated>2013-03-07T21:22:39+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2013-03-06T12:58:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b2fb4f54ecd47c42413d54b4666b06cf93c05abf'/>
<id>urn:sha1:b2fb4f54ecd47c42413d54b4666b06cf93c05abf</id>
<content type='text'>
tcp_prequeue() became too big to be inlined.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: avoid wakeups for pure ACK</title>
<updated>2013-02-28T20:37:29+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2013-02-27T07:05:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=79ffef1fe213851f44bfccf037170a140e929f85'/>
<id>urn:sha1:79ffef1fe213851f44bfccf037170a140e929f85</id>
<content type='text'>
The purpose of the TCP prequeue mechanism is to let incoming packets
be processed by the thread currently blocked in tcp_recvmsg(), instead
of on behalf of the softirq handler, to better adapt flow control to
the receiving host's capacity to schedule the consumer.

But in typical request/answer workloads, we send request, then
block to receive the answer. And before the actual answer, TCP
stack receives the ACK packets acknowledging the request.

Processing a pure ACK on behalf of the thread blocked in tcp_recvmsg()
is a waste of resources, as the thread has to immediately sleep again
because it got no payload.

This patch avoids the extra context switches and scheduler overhead.
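
A hedged sketch of the check added to tcp_prequeue() (approximate,
not the verbatim diff):

	/* sketch: a segment carrying no payload (pure ACK) arriving
	 * on an empty prequeue is left to softirq processing, so the
	 * thread blocked in tcp_recvmsg() is not woken for nothing */
	if (skb-&gt;len &lt;= tcp_hdrlen(skb) &amp;&amp;
	    skb_queue_len(&amp;tp-&gt;ucopy.prequeue) == 0)
		return false;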

Before patch :

a:~# echo 0 &gt;/proc/sys/net/ipv4/tcp_low_latency
a:~# perf stat ./super_netperf 300 -t TCP_RR -l 10 -H 7.7.7.84 -- -r 8k,8k
231676

 Performance counter stats for './super_netperf 300 -t TCP_RR -l 10 -H 7.7.7.84 -- -r 8k,8k':

     116251.501765 task-clock                #   11.369 CPUs utilized
         5,025,463 context-switches          #    0.043 M/sec
         1,074,511 CPU-migrations            #    0.009 M/sec
           216,923 page-faults               #    0.002 M/sec
   311,636,972,396 cycles                    #    2.681 GHz
   260,507,138,069 stalled-cycles-frontend   #   83.59% frontend cycles idle
   155,590,092,840 stalled-cycles-backend    #   49.93% backend  cycles idle
   100,101,255,411 instructions              #    0.32  insns per cycle
                                             #    2.60  stalled cycles per insn
    16,535,930,999 branches                  #  142.243 M/sec
       646,483,591 branch-misses             #    3.91% of all branches

      10.225482774 seconds time elapsed

After patch :

a:~# echo 0 &gt;/proc/sys/net/ipv4/tcp_low_latency
a:~# perf stat ./super_netperf 300 -t TCP_RR -l 10 -H 7.7.7.84 -- -r 8k,8k
233297

 Performance counter stats for './super_netperf 300 -t TCP_RR -l 10 -H 7.7.7.84 -- -r 8k,8k':

      91084.870855 task-clock                #    8.887 CPUs utilized
         2,485,916 context-switches          #    0.027 M/sec
           815,520 CPU-migrations            #    0.009 M/sec
           216,932 page-faults               #    0.002 M/sec
   245,195,022,629 cycles                    #    2.692 GHz
   202,635,777,041 stalled-cycles-frontend   #   82.64% frontend cycles idle
   124,280,372,407 stalled-cycles-backend    #   50.69% backend  cycles idle
    83,457,289,618 instructions              #    0.34  insns per cycle
                                             #    2.43  stalled cycles per insn
    13,431,472,361 branches                  #  147.461 M/sec
       504,470,665 branch-misses             #    3.76% of all branches

      10.249594448 seconds time elapsed

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Neal Cardwell &lt;ncardwell@google.com&gt;
Cc: Tom Herbert &lt;therbert@google.com&gt;
Cc: Yuchung Cheng &lt;ycheng@google.com&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: remove Appropriate Byte Count support</title>
<updated>2013-02-05T19:51:16+00:00</updated>
<author>
<name>Stephen Hemminger</name>
<email>stephen@networkplumber.org</email>
</author>
<published>2013-02-05T07:25:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ca2eb5679f8ddffff60156af42595df44a315ef0'/>
<id>urn:sha1:ca2eb5679f8ddffff60156af42595df44a315ef0</id>
<content type='text'>
TCP Appropriate Byte Count was added by me, but was later disabled.
There is no point in maintaining it, since it is a potential source
of bugs, and Linux already implements other, better window protection
heuristics.

Signed-off-by: Stephen Hemminger &lt;stephen@networkplumber.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
