<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/include/net/inet_hashtables.h, branch v4.9.223</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v4.9.223</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v4.9.223'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2020-01-04T12:41:12+00:00</updated>
<entry>
<title>tcp/dccp: fix possible race __inet_lookup_established()</title>
<updated>2020-01-04T12:41:12+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2019-12-14T02:20:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=792365bfcf3c753a792a4150314d24dd6a7721ea'/>
<id>urn:sha1:792365bfcf3c753a792a4150314d24dd6a7721ea</id>
<content type='text'>
commit 8dbd76e79a16b45b2ccb01d2f2e08dbf64e71e40 upstream.

Michal Kubecek and Firo Yang did a very nice analysis of crashes
happening in __inet_lookup_established().

Since a TCP socket can go from TCP_ESTABLISH to TCP_LISTEN
(via a close()/socket()/listen() cycle) without a RCU grace period,
I should not have changed listeners linkage in their hash table.

They must use the nulls protocol (Documentation/RCU/rculist_nulls.txt),
so that a lookup can detect a socket in a hash list was moved in
another one.

Since we added code in commit d296ba60d8e2 ("soreuseport: Resolve
merge conflict for v4/v6 ordering fix"), we have to add
hlist_nulls_add_tail_rcu() helper.

Fixes: 3b24d854cb35 ("tcp/dccp: do not touch listener sk_refcnt under synflood")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reported-by: Michal Kubecek &lt;mkubecek@suse.cz&gt;
Reported-by: Firo Yang &lt;firo.yang@suse.com&gt;
Reviewed-by: Michal Kubecek &lt;mkubecek@suse.cz&gt;
Link: https://lore.kernel.org/netdev/20191120083919.GH27852@unicorn.suse.cz/
Signed-off-by: Jakub Kicinski &lt;jakub.kicinski@netronome.com&gt;
[stable-4.9: we also need to update code in __inet_lookup_listener() and
 inet6_lookup_listener() which has been removed in 5.0-rc1.]
Signed-off-by: Michal Kubecek &lt;mkubecek@suse.cz&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>tcp/dccp: do not touch listener sk_refcnt under synflood</title>
<updated>2016-04-05T02:11:20+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2016-04-01T15:52:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3b24d854cb35383c30642116e5992fd619bdc9bc'/>
<id>urn:sha1:3b24d854cb35383c30642116e5992fd619bdc9bc</id>
<content type='text'>
When a SYNFLOOD targets a non SO_REUSEPORT listener, multiple
cpus contend on sk-&gt;sk_refcnt and sk-&gt;sk_wmem_alloc changes.

By letting listeners use SOCK_RCU_FREE infrastructure,
we can relax TCP_LISTEN lookup rules and avoid touching sk_refcnt

Note that we still use SLAB_DESTROY_BY_RCU rules for other sockets,
only listeners are impacted by this change.

Peak performance under SYNFLOOD is increased by ~33% :

On my test machine, I could process 3.2 Mpps instead of 2.4 Mpps

Most consuming functions are now skb_set_owner_w() and sock_wfree()
contending on sk-&gt;sk_wmem_alloc when cooking SYNACK and freeing them.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp/dccp: remove BH disable/enable in lookup</title>
<updated>2016-04-05T02:11:19+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2016-04-01T15:52:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ee3cf32a4a5e6cf5ccc0f0de9865fda3ebc46436'/>
<id>urn:sha1:ee3cf32a4a5e6cf5ccc0f0de9865fda3ebc46436</id>
<content type='text'>
Since linux 2.6.29, lookups only use rcu locking.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>soreuseport: fast reuseport TCP socket selection</title>
<updated>2016-02-11T08:54:15+00:00</updated>
<author>
<name>Craig Gallek</name>
<email>kraig@google.com</email>
</author>
<published>2016-02-10T16:50:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c125e80b88687b25b321795457309eaaee4bf270'/>
<id>urn:sha1:c125e80b88687b25b321795457309eaaee4bf270</id>
<content type='text'>
This change extends the fast SO_REUSEPORT socket lookup implemented
for UDP to TCP.  Listener sockets with SO_REUSEPORT and the same
receive address are additionally added to an array for faster
random access.  This means that only a single socket from the group
must be found in the listener list before any socket in the group can
be used to receive a packet.  Previously, every socket in the group
needed to be considered before handing off the incoming packet.

This feature also exposes the ability to use a BPF program when
selecting a socket from a reuseport group.

Signed-off-by: Craig Gallek &lt;kraig@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>inet: refactor inet[6]_lookup functions to take skb</title>
<updated>2016-02-11T08:54:14+00:00</updated>
<author>
<name>Craig Gallek</name>
<email>kraig@google.com</email>
</author>
<published>2016-02-10T16:50:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a583636a83ea383fd07517e5a7a2eedbc5d90fb1'/>
<id>urn:sha1:a583636a83ea383fd07517e5a7a2eedbc5d90fb1</id>
<content type='text'>
This is a preliminary step to allow fast socket lookup of SO_REUSEPORT
groups.  Doing so with a BPF filter will require access to the
skb in question.  This change plumbs the skb (and offset to payload
data) through the call stack to the listening socket lookup
implementations where it will be used in a following patch.

Signed-off-by: Craig Gallek &lt;kraig@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>sock: struct proto hash function may error</title>
<updated>2016-02-11T08:54:14+00:00</updated>
<author>
<name>Craig Gallek</name>
<email>kraig@google.com</email>
</author>
<published>2016-02-10T16:50:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=086c653f5862591a9cfe2386f5650d03adacc33a'/>
<id>urn:sha1:086c653f5862591a9cfe2386f5650d03adacc33a</id>
<content type='text'>
In order to support fast reuseport lookups in TCP, the hash function
defined in struct proto must be capable of returning an error code.
This patch changes the function signature of all related hash functions
to return an integer and handles or propagates this return value at
all call sites.

Signed-off-by: Craig Gallek &lt;kraig@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp/dccp: fix hashdance race for passive sessions</title>
<updated>2015-10-23T12:42:21+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2015-10-22T15:20:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5e0724d027f0548511a2165a209572d48fe7a4c8'/>
<id>urn:sha1:5e0724d027f0548511a2165a209572d48fe7a4c8</id>
<content type='text'>
Multiple cpus can process duplicates of incoming ACK messages
matching a SYN_RECV request socket. This is a rare event under
normal operations, but definitely can happen.

Only one must win the race, otherwise corruption would occur.

To fix this without adding new atomic ops, we use logic in
inet_ehash_nolisten() to detect the request was present in the same
ehash bucket where we try to insert the new child.

If request socket was not found, we have to undo the child creation.

This actually removes a spin_lock()/spin_unlock() pair in
reqsk_queue_unlink() for the fast path.

Fixes: e994b2f0fb92 ("tcp: do not lock listener to process SYN packets")
Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp/dccp: install syn_recv requests into ehash table</title>
<updated>2015-10-03T11:32:41+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2015-10-02T18:43:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=079096f103faca2dd87342cca6f23d4b34da8871'/>
<id>urn:sha1:079096f103faca2dd87342cca6f23d4b34da8871</id>
<content type='text'>
In this patch, we insert request sockets into TCP/DCCP
regular ehash table (where ESTABLISHED and TIMEWAIT sockets
are) instead of using the per listener hash table.

ACK packets find SYN_RECV pseudo sockets without having
to find and lock the listener.

In nominal conditions, this halves pressure on listener lock.

Note that this will allow for SO_REUSEPORT refinements,
so that we can select a listener using cpu/numa affinities instead
of the prior 'consistent hash', since only SYN packets will
apply this selection logic.

We will shrink listen_sock in the following patch to ease
code review.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Ying Cai &lt;ycai@google.com&gt;
Cc: Willem de Bruijn &lt;willemb@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>inet: constify __inet_inherit_port() sock argument</title>
<updated>2015-09-29T23:53:08+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2015-09-29T14:42:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1ce31c9e08997ea0fa62be0a7437f868be173f13'/>
<id>urn:sha1:1ce31c9e08997ea0fa62be0a7437f868be173f13</id>
<content type='text'>
socket is not touched, make it const.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>inet: simplify timewait refcounting</title>
<updated>2015-07-09T22:12:20+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2015-07-08T21:28:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=fc01538f9fb75572c969ca9988176ffc2a8741d6'/>
<id>urn:sha1:fc01538f9fb75572c969ca9988176ffc2a8741d6</id>
<content type='text'>
timewait sockets have a complex refcounting logic.
Once we realize it should be similar to established and
syn_recv sockets, we can use sk_nulls_del_node_init_rcu()
and remove inet_twsk_unhash()

In particular, deferred inet_twsk_put() added in commit
13475a30b66cd ("tcp: connect() race with timewait reuse")
looks unecessary : When removing a timewait socket from
ehash or bhash, caller must own a reference on the socket
anyway.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
