<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/include/net/inet_hashtables.h, branch v6.9.12</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.9.12</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.9.12'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2023-12-22T22:15:35+00:00</updated>
<entry>
<title>tcp: Remove dead code and fields for bhash2.</title>
<updated>2023-12-22T22:15:35+00:00</updated>
<author>
<name>Kuniyuki Iwashima</name>
<email>kuniyu@amazon.com</email>
</author>
<published>2023-12-19T00:18:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8191792c18c55ee444588100cf0464bd8c0bf5f3'/>
<id>urn:sha1:8191792c18c55ee444588100cf0464bd8c0bf5f3</id>
<content type='text'>
Now all sockets including TIME_WAIT are linked to bhash2 using
sock_common.skc_bind_node.

We no longer use inet_bind2_bucket.deathrow, sock.sk_bind2_node,
and inet_timewait_sock.tw_bind2_node.

Signed-off-by: Kuniyuki Iwashima &lt;kuniyu@amazon.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: Unlink sk from bhash.</title>
<updated>2023-12-22T22:15:35+00:00</updated>
<author>
<name>Kuniyuki Iwashima</name>
<email>kuniyu@amazon.com</email>
</author>
<published>2023-12-19T00:18:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b2cb9f9ef240b4afaa1838b74fad077f17682f61'/>
<id>urn:sha1:b2cb9f9ef240b4afaa1838b74fad077f17682f61</id>
<content type='text'>
Now we do not use tb-&gt;owners and can unlink sockets from bhash.

sk_bind_node/tw_bind_node are available for bhash2 and will be
used in the following patch.

Signed-off-by: Kuniyuki Iwashima &lt;kuniyu@amazon.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: Link bhash2 to bhash.</title>
<updated>2023-12-22T22:15:34+00:00</updated>
<author>
<name>Kuniyuki Iwashima</name>
<email>kuniyu@amazon.com</email>
</author>
<published>2023-12-19T00:18:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=822fb91fc724786372eed4f4397409c70afc08b8'/>
<id>urn:sha1:822fb91fc724786372eed4f4397409c70afc08b8</id>
<content type='text'>
bhash2 added a new member sk_bind2_node in struct sock to link
sockets to bhash2 in addition to bhash.

bhash is still needed to search conflicting sockets efficiently
from a port for the wildcard address.  However, bhash itself need
not have sockets.

If we link each bhash2 bucket to the corresponding bhash bucket,
we can iterate the same set of the sockets from bhash2 via bhash.

This patch links bhash2 to bhash only, and the actual use will be
in the later patches.  Finally, we will remove sk_bind2_node.

Signed-off-by: Kuniyuki Iwashima &lt;kuniyu@amazon.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: Save address type in inet_bind2_bucket.</title>
<updated>2023-12-22T22:15:34+00:00</updated>
<author>
<name>Kuniyuki Iwashima</name>
<email>kuniyu@amazon.com</email>
</author>
<published>2023-12-19T00:18:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5a22bba13d0162d278d6e31232259ddcce196b8e'/>
<id>urn:sha1:5a22bba13d0162d278d6e31232259ddcce196b8e</id>
<content type='text'>
inet_bind2_bucket_addr_match() and inet_bind2_bucket_match_addr_any()
are called for each bhash2 bucket to check conflicts.  Thus, we call
ipv6_addr_any() and ipv6_addr_v4mapped() over and over during bind().

Let's avoid calling them by saving the address type in inet_bind2_bucket.

Signed-off-by: Kuniyuki Iwashima &lt;kuniyu@amazon.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: Save v4 address as v4-mapped-v6 in inet_bind2_bucket.v6_rcv_saddr.</title>
<updated>2023-12-22T22:15:34+00:00</updated>
<author>
<name>Kuniyuki Iwashima</name>
<email>kuniyu@amazon.com</email>
</author>
<published>2023-12-19T00:18:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=06a8c04f89949576b715f11350f1896b62cecb0a'/>
<id>urn:sha1:06a8c04f89949576b715f11350f1896b62cecb0a</id>
<content type='text'>
In bhash2, IPv4/IPv6 addresses are saved in two union members,
which complicate address checks in inet_bind2_bucket_addr_match()
and inet_bind2_bucket_match_addr_any() considering uninitialised
memory and v4-mapped-v6 conflicts.

Let's simplify that by saving IPv4 address as v4-mapped-v6 address
and defining tb2.rcv_saddr as tb2.v6_rcv_saddr.s6_addr32[3].

Then, we can compare v6 address as is, and after checking v4-mapped-v6,
we can compare v4 address easily.  Also, we can remove tb2-&gt;family.

Note these functions will be further refactored in the next patch.

Signed-off-by: Kuniyuki Iwashima &lt;kuniyu@amazon.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: Fix slab-out-of-bounds in inet[6]_steal_sock</title>
<updated>2023-08-15T20:57:51+00:00</updated>
<author>
<name>Lorenz Bauer</name>
<email>lmb@isovalent.com</email>
</author>
<published>2023-08-15T08:53:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8897562f67b3e61ad736cd5c9f307447d33280e4'/>
<id>urn:sha1:8897562f67b3e61ad736cd5c9f307447d33280e4</id>
<content type='text'>
Kumar reported a KASAN splat in tcp_v6_rcv:

  bash-5.2# ./test_progs -t btf_skc_cls_ingress
  ...
  [   51.810085] BUG: KASAN: slab-out-of-bounds in tcp_v6_rcv+0x2d7d/0x3440
  [   51.810458] Read of size 2 at addr ffff8881053f038c by task test_progs/226

The problem is that inet[6]_steal_sock accesses sk-&gt;sk_protocol without
accounting for request or timewait sockets. To fix this we can't just
check sock_common-&gt;skc_reuseport since that flag is present on timewait
sockets.

Instead, add a fullsock check to avoid the out of bands access of sk_protocol.

Fixes: 9c02bec95954 ("bpf, net: Support SO_REUSEPORT sockets with bpf_sk_assign")
Reported-by: Kumar Kartikeya Dwivedi &lt;memxor@gmail.com&gt;
Signed-off-by: Lorenz Bauer &lt;lmb@isovalent.com&gt;
Link: https://lore.kernel.org/r/20230815-bpf-next-v2-1-95126eaa4c1b@isovalent.com
Signed-off-by: Martin KaFai Lau &lt;martin.lau@kernel.org&gt;
</content>
</entry>
<entry>
<title>tcp/dccp: cache line align inet_hashinfo</title>
<updated>2023-08-04T08:20:27+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2023-08-03T07:53:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6f5ca184cbef6d2b78772a350a3ed8be696b54a2'/>
<id>urn:sha1:6f5ca184cbef6d2b78772a350a3ed8be696b54a2</id>
<content type='text'>
I have seen tcp_hashinfo starting at a non optimal location,
forcing input handlers to pull two cache lines instead of one,
and sharing a cache line that was dirtied more than necessary:

ffffffff83680600 b tcp_orphan_timer
ffffffff83680628 b tcp_orphan_cache
ffffffff8368062c b tcp_enable_tx_delay.__tcp_tx_delay_enabled
ffffffff83680630 B tcp_hashinfo
ffffffff83680680 b tcp_cong_list_lock

After this patch, ehash, ehash_locks, ehash_mask and ehash_locks_mask
are located in a read-only cache line.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>bpf, net: Support SO_REUSEPORT sockets with bpf_sk_assign</title>
<updated>2023-07-25T20:55:55+00:00</updated>
<author>
<name>Lorenz Bauer</name>
<email>lmb@isovalent.com</email>
</author>
<published>2023-07-20T15:30:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9c02bec95954252c3c01bfbb3f7560e0b95ca955'/>
<id>urn:sha1:9c02bec95954252c3c01bfbb3f7560e0b95ca955</id>
<content type='text'>
Currently the bpf_sk_assign helper in tc BPF context refuses SO_REUSEPORT
sockets. This means we can't use the helper to steer traffic to Envoy,
which configures SO_REUSEPORT on its sockets. In turn, we're blocked
from removing TPROXY from our setup.

The reason that bpf_sk_assign refuses such sockets is that the
bpf_sk_lookup helpers don't execute SK_REUSEPORT programs. Instead,
one of the reuseport sockets is selected by hash. This could cause
dispatch to the "wrong" socket:

    sk = bpf_sk_lookup_tcp(...) // select SO_REUSEPORT by hash
    bpf_sk_assign(skb, sk) // SK_REUSEPORT wasn't executed

Fixing this isn't as simple as invoking SK_REUSEPORT from the lookup
helpers unfortunately. In the tc context, L2 headers are at the start
of the skb, while SK_REUSEPORT expects L3 headers instead.

Instead, we execute the SK_REUSEPORT program when the assigned socket
is pulled out of the skb, further up the stack. This creates some
trickiness with regards to refcounting as bpf_sk_assign will put both
refcounted and RCU freed sockets in skb-&gt;sk. reuseport sockets are RCU
freed. We can infer that the sk_assigned socket is RCU freed if the
reuseport lookup succeeds, but convincing yourself of this fact isn't
straight forward. Therefore we defensively check refcounting on the
sk_assign sock even though it's probably not required in practice.

Fixes: 8e368dc72e86 ("bpf: Fix use of sk-&gt;sk_reuseport from sk_assign")
Fixes: cf7fbe660f2d ("bpf: Add socket assign support")
Co-developed-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Cc: Joe Stringer &lt;joe@cilium.io&gt;
Link: https://lore.kernel.org/bpf/CACAyw98+qycmpQzKupquhkxbvWK4OFyDuuLMBNROnfWMZxUWeA@mail.gmail.com/
Reviewed-by: Kuniyuki Iwashima &lt;kuniyu@amazon.com&gt;
Signed-off-by: Lorenz Bauer &lt;lmb@isovalent.com&gt;
Link: https://lore.kernel.org/r/20230720-so-reuseport-v6-7-7021b683cdae@isovalent.com
Signed-off-by: Martin KaFai Lau &lt;martin.lau@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: remove duplicate sk_lookup helpers</title>
<updated>2023-07-25T20:51:44+00:00</updated>
<author>
<name>Lorenz Bauer</name>
<email>lmb@isovalent.com</email>
</author>
<published>2023-07-20T15:30:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6c886db2e78ce1dee163d07240467770a235f33e'/>
<id>urn:sha1:6c886db2e78ce1dee163d07240467770a235f33e</id>
<content type='text'>
Now that inet[6]_lookup_reuseport are parameterised on the ehashfn
we can remove two sk_lookup helpers.

Reviewed-by: Kuniyuki Iwashima &lt;kuniyu@amazon.com&gt;
Signed-off-by: Lorenz Bauer &lt;lmb@isovalent.com&gt;
Link: https://lore.kernel.org/r/20230720-so-reuseport-v6-6-7021b683cdae@isovalent.com
Signed-off-by: Martin KaFai Lau &lt;martin.lau@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: remove duplicate reuseport_lookup functions</title>
<updated>2023-07-25T20:51:44+00:00</updated>
<author>
<name>Lorenz Bauer</name>
<email>lmb@isovalent.com</email>
</author>
<published>2023-07-20T15:30:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=0f495f7617229772403e683033abc473f0f0553c'/>
<id>urn:sha1:0f495f7617229772403e683033abc473f0f0553c</id>
<content type='text'>
There are currently four copies of reuseport_lookup: one each for
(TCP, UDP)x(IPv4, IPv6). This forces us to duplicate all callers of
those functions as well. This is already the case for sk_lookup
helpers (inet,inet6,udp4,udp6)_lookup_run_bpf.

There are two differences between the reuseport_lookup helpers:

1. They call different hash functions depending on protocol
2. UDP reuseport_lookup checks that sk_state != TCP_ESTABLISHED

Move the check for sk_state into the caller and use the INDIRECT_CALL
infrastructure to cut down the helpers to one per IP version.

Reviewed-by: Kuniyuki Iwashima &lt;kuniyu@amazon.com&gt;
Signed-off-by: Lorenz Bauer &lt;lmb@isovalent.com&gt;
Link: https://lore.kernel.org/r/20230720-so-reuseport-v6-4-7021b683cdae@isovalent.com
Signed-off-by: Martin KaFai Lau &lt;martin.lau@kernel.org&gt;
</content>
</entry>
</feed>
