<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/include/net/tcp.h, branch v3.12.13</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v3.12.13</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v3.12.13'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2013-09-04T04:27:28+00:00</updated>
<entry>
<title>tcp: Change return value of tcp_rcv_established()</title>
<updated>2013-09-04T04:27:28+00:00</updated>
<author>
<name>Vijay Subramanian</name>
<email>subramanian.vijay@gmail.com</email>
</author>
<published>2013-09-03T19:23:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c995ae2259ee36caf48bbfacf40111998dacd4af'/>
<id>urn:sha1:c995ae2259ee36caf48bbfacf40111998dacd4af</id>
<content type='text'>
tcp_rcv_established() returns only one value namely 0. We change the return
value to void (as suggested by David Miller).

After commit 0c24604b (tcp: implement RFC 5961 4.2), we no longer send RSTs in
response to SYNs. We can remove the check and processing on the return value of
tcp_rcv_established().

We also fix jtcp_rcv_established() in tcp_probe.c to match that of
tcp_rcv_established().

Signed-off-by: Vijay Subramanian &lt;subramanian.vijay@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: TSO packets automatic sizing</title>
<updated>2013-08-29T19:50:06+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2013-08-27T12:46:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=95bd09eb27507691520d39ee1044d6ad831c1168'/>
<id>urn:sha1:95bd09eb27507691520d39ee1044d6ad831c1168</id>
<content type='text'>
After hearing many people over past years complaining against TSO being
bursty or even buggy, we are proud to present automatic sizing of TSO
packets.

One part of the problem is that tcp_tso_should_defer() uses an heuristic
relying on upcoming ACKS instead of a timer, but more generally, having
big TSO packets makes little sense for low rates, as it tends to create
micro bursts on the network, and general consensus is to reduce the
buffering amount.

This patch introduces a per socket sk_pacing_rate, that approximates
the current sending rate, and allows us to size the TSO packets so
that we try to send one packet every ms.

This field could be set by other transports.

Patch has no impact for high speed flows, where having large TSO packets
makes sense to reach line rate.

For other flows, this helps better packet scheduling and ACK clocking.

This patch increases performance of TCP flows in lossy environments.

A new sysctl (tcp_min_tso_segs) is added, to specify the
minimal size of a TSO packet (default being 2).

A follow-up patch will provide a new packet scheduler (FQ), using
sk_pacing_rate as an input to perform optional per flow pacing.

This explains why we chose to set sk_pacing_rate to twice the current
rate, allowing 'slow start' ramp up.

sk_pacing_rate = 2 * cwnd * mss / srtt

v2: Neal Cardwell reported a suspect deferring of last two segments on
initial write of 10 MSS, I had to change tcp_tso_should_defer() to take
into account tp-&gt;xmit_size_goal_segs

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Neal Cardwell &lt;ncardwell@google.com&gt;
Cc: Yuchung Cheng &lt;ycheng@google.com&gt;
Cc: Van Jacobson &lt;vanj@google.com&gt;
Cc: Tom Herbert &lt;therbert@google.com&gt;
Acked-by: Yuchung Cheng &lt;ycheng@google.com&gt;
Acked-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: syncookies: export cookie_v6_init_sequence/cookie_v6_check</title>
<updated>2013-08-27T22:28:04+00:00</updated>
<author>
<name>Patrick McHardy</name>
<email>kaber@trash.net</email>
</author>
<published>2013-08-27T06:50:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=81eb6a1487718a89621a0e0be7fafd0cd7c429a4'/>
<id>urn:sha1:81eb6a1487718a89621a0e0be7fafd0cd7c429a4</id>
<content type='text'>
Extract the local TCP stack independant parts of tcp_v6_init_sequence()
and cookie_v6_check() and export them for use by the upcoming IPv6 SYNPROXY
target.

Signed-off-by: Patrick McHardy &lt;kaber@trash.net&gt;
Acked-by: David S. Miller &lt;davem@davemloft.net&gt;
Tested-by: Martin Topholm &lt;mph@one.com&gt;
Signed-off-by: Jesper Dangaard Brouer &lt;brouer@redhat.com&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>net: syncookies: export cookie_v4_init_sequence/cookie_v4_check</title>
<updated>2013-08-27T22:27:44+00:00</updated>
<author>
<name>Patrick McHardy</name>
<email>kaber@trash.net</email>
</author>
<published>2013-08-27T06:50:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=0198230b7705eb2386e53778d944e307eef0cc71'/>
<id>urn:sha1:0198230b7705eb2386e53778d944e307eef0cc71</id>
<content type='text'>
Extract the local TCP stack independant parts of tcp_v4_init_sequence()
and cookie_v4_check() and export them for use by the upcoming SYNPROXY
target.

Signed-off-by: Patrick McHardy &lt;kaber@trash.net&gt;
Acked-by: David S. Miller &lt;davem@davemloft.net&gt;
Tested-by: Martin Topholm &lt;mph@one.com&gt;
Signed-off-by: Jesper Dangaard Brouer &lt;brouer@redhat.com&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>tcp: add server ip to encrypt cookie in fast open</title>
<updated>2013-08-10T07:35:33+00:00</updated>
<author>
<name>Yuchung Cheng</name>
<email>ycheng@google.com</email>
</author>
<published>2013-08-08T21:06:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=149479d019e06df5a7f4096f95c00cfb1380309c'/>
<id>urn:sha1:149479d019e06df5a7f4096f95c00cfb1380309c</id>
<content type='text'>
Encrypt the cookie with both server and client IPv4 addresses,
such that multi-homed server will grant different cookies
based on both the source and destination IPs. No client change
is needed since cookie is opaque to the client.

Signed-off-by: Yuchung Cheng &lt;ycheng@google.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Acked-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: Remove unused tcpct declarations and comments</title>
<updated>2013-07-31T19:16:45+00:00</updated>
<author>
<name>Dmitry Popov</name>
<email>dp@highloadlab.com</email>
</author>
<published>2013-07-31T09:39:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c0155b2da4cd6583b1b729451249ca346e1c05a2'/>
<id>urn:sha1:c0155b2da4cd6583b1b729451249ca346e1c05a2</id>
<content type='text'>
Remove declaration, 4 defines and confusing comment that are no longer used
since 1a2c6181c4 ("tcp: Remove TCPCT").

Signed-off-by: Dmitry Popov &lt;dp@highloadlab.com&gt;
Acked-by: Christoph Paasch &lt;christoph.paasch@uclouvain.be&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: TCP_NOTSENT_LOWAT socket option</title>
<updated>2013-07-25T00:54:48+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2013-07-23T03:27:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c9bee3b7fdecb0c1d070c7b54113b3bdfb9a3d36'/>
<id>urn:sha1:c9bee3b7fdecb0c1d070c7b54113b3bdfb9a3d36</id>
<content type='text'>
Idea of this patch is to add optional limitation of number of
unsent bytes in TCP sockets, to reduce usage of kernel memory.

TCP receiver might announce a big window, and TCP sender autotuning
might allow a large amount of bytes in write queue, but this has little
performance impact if a large part of this buffering is wasted :

Write queue needs to be large only to deal with large BDP, not
necessarily to cope with scheduling delays (incoming ACKS make room
for the application to queue more bytes)

For most workloads, using a value of 128 KB or less is OK to give
applications enough time to react to POLLOUT events in time
(or being awaken in a blocking sendmsg())

This patch adds two ways to set the limit :

1) Per socket option TCP_NOTSENT_LOWAT

2) A sysctl (/proc/sys/net/ipv4/tcp_notsent_lowat) for sockets
not using TCP_NOTSENT_LOWAT socket option (or setting a zero value)
Default value being UINT_MAX (0xFFFFFFFF), meaning this has no effect.

This changes poll()/select()/epoll() to report POLLOUT
only if number of unsent bytes is below tp-&gt;nosent_lowat

Note this might increase number of sendmsg()/sendfile() calls
when using non blocking sockets,
and increase number of context switches for blocking sockets.

Note this is not related to SO_SNDLOWAT (as SO_SNDLOWAT is
defined as :
 Specify the minimum number of bytes in the buffer until
 the socket layer will pass the data to the protocol)

Tested:

netperf sessions, and watching /proc/net/protocols "memory" column for TCP

With 200 concurrent netperf -t TCP_STREAM sessions, amount of kernel memory
used by TCP buffers shrinks by ~55 % (20567 pages instead of 45458)

lpq83:~# echo -1 &gt;/proc/sys/net/ipv4/tcp_notsent_lowat
lpq83:~# (super_netperf 200 -t TCP_STREAM -H remote -l 90 &amp;); sleep 60 ; grep TCP /proc/net/protocols
TCPv6     1880      2   45458   no     208   yes  ipv6        y  y  y  y  y  y  y  y  y  y  y  y  y  n  y  y  y  y  y
TCP       1696    508   45458   no     208   yes  kernel      y  y  y  y  y  y  y  y  y  y  y  y  y  n  y  y  y  y  y

lpq83:~# echo 131072 &gt;/proc/sys/net/ipv4/tcp_notsent_lowat
lpq83:~# (super_netperf 200 -t TCP_STREAM -H remote -l 90 &amp;); sleep 60 ; grep TCP /proc/net/protocols
TCPv6     1880      2   20567   no     208   yes  ipv6        y  y  y  y  y  y  y  y  y  y  y  y  y  n  y  y  y  y  y
TCP       1696    508   20567   no     208   yes  kernel      y  y  y  y  y  y  y  y  y  y  y  y  y  n  y  y  y  y  y

Using 128KB has no bad effect on the throughput or cpu usage
of a single flow, although there is an increase of context switches.

A bonus is that we hold socket lock for a shorter amount
of time and should improve latencies of ACK processing.

lpq83:~# echo -1 &gt;/proc/sys/net/ipv4/tcp_notsent_lowat
lpq83:~# perf stat -e context-switches ./netperf -H 7.7.7.84 -t omni -l 20 -c -i10,3
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.7.84 () port 0 AF_INET : +/-2.500% @ 99% conf.
Local       Remote      Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service
Send Socket Recv Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand
Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units
Final       Final                                             %     Method %      Method
1651584     6291456     16384  20.00   17447.90   10^6bits/s  3.13  S      -1.00  U      0.353   -1.000  usec/KB

 Performance counter stats for './netperf -H 7.7.7.84 -t omni -l 20 -c -i10,3':

           412,514 context-switches

     200.034645535 seconds time elapsed

lpq83:~# echo 131072 &gt;/proc/sys/net/ipv4/tcp_notsent_lowat
lpq83:~# perf stat -e context-switches ./netperf -H 7.7.7.84 -t omni -l 20 -c -i10,3
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.7.84 () port 0 AF_INET : +/-2.500% @ 99% conf.
Local       Remote      Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service
Send Socket Recv Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand
Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units
Final       Final                                             %     Method %      Method
1593240     6291456     16384  20.00   17321.16   10^6bits/s  3.35  S      -1.00  U      0.381   -1.000  usec/KB

 Performance counter stats for './netperf -H 7.7.7.84 -t omni -l 20 -c -i10,3':

         2,675,818 context-switches

     200.029651391 seconds time elapsed

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Neal Cardwell &lt;ncardwell@google.com&gt;
Cc: Yuchung Cheng &lt;ycheng@google.com&gt;
Acked-By: Yuchung Cheng &lt;ycheng@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: prefer packet timing to TS-ECR for RTT</title>
<updated>2013-07-23T00:53:42+00:00</updated>
<author>
<name>Yuchung Cheng</name>
<email>ycheng@google.com</email>
</author>
<published>2013-07-22T23:20:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5b08e47caf1f2034a3a5b566bbccc8b0be3961ca'/>
<id>urn:sha1:5b08e47caf1f2034a3a5b566bbccc8b0be3961ca</id>
<content type='text'>
Prefer packet timings to TS-ecr for RTT measurements when both
sources are available. That's because broken middle-boxes and remote
peer can return packets with corrupted TS ECR fields. Similarly most
congestion controls that require RTT signals favor timing-based
sources as well. Also check for bad TS ECR values to avoid RTT
blow-ups. It has happened on production Web servers.

Signed-off-by: Yuchung Cheng &lt;ycheng@google.com&gt;
Acked-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: consolidate SYNACK RTT sampling</title>
<updated>2013-07-23T00:53:42+00:00</updated>
<author>
<name>Yuchung Cheng</name>
<email>ycheng@google.com</email>
</author>
<published>2013-07-22T23:20:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=375fe02c91792917aa26d68a87ab110d1937f44e'/>
<id>urn:sha1:375fe02c91792917aa26d68a87ab110d1937f44e</id>
<content type='text'>
The first patch consolidates SYNACK and other RTT measurement to use a
central function tcp_ack_update_rtt(). A (small) bonus is now SYNACK
RTT measurement happens after PAWS check, potentially reducing the
impact of RTO seeding on bad TCP timestamps values.

Signed-off-by: Yuchung Cheng &lt;ycheng@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: remove invalid __rcu annotation</title>
<updated>2013-06-25T09:44:05+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2013-06-25T08:21:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7ae8639c9d6d5aba7990b4fad3506ff7b4667a45'/>
<id>urn:sha1:7ae8639c9d6d5aba7990b4fad3506ff7b4667a45</id>
<content type='text'>
struct tcp_fastopen_context has a field named tfm, which is a pointer
to a crypto_cipher structure.

It currently has a __rcu annotation, which is not needed at all.

tcp_fastopen_ctx is the pointer fetched by rcu_dereference(), but once
we have a pointer to current tcp_fastopen_context, we do not use/need
rcu_dereference() to access tfm.

This fixes a lot of sparse errors like the following :

net/ipv4/tcp_fastopen.c:21:31: warning: incorrect type in argument 1 (different address spaces)
net/ipv4/tcp_fastopen.c:21:31:    expected struct crypto_cipher *tfm
net/ipv4/tcp_fastopen.c:21:31:    got struct crypto_cipher [noderef] &lt;asn:4&gt;*tfm

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Jerry Chu &lt;hkchu@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
