<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/net/core/sock_map.c, branch linux-5.11.y</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=linux-5.11.y</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=linux-5.11.y'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2020-12-03T02:32:47+00:00</updated>
<entry>
<title>bpf: Eliminate rlimit-based memory accounting for sockmap and sockhash maps</title>
<updated>2020-12-03T02:32:47+00:00</updated>
<author>
<name>Roman Gushchin</name>
<email>guro@fb.com</email>
</author>
<published>2020-12-01T21:58:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=0d2c4f9640501ff57ba0be1f5644a02c29a02fa1'/>
<id>urn:sha1:0d2c4f9640501ff57ba0be1f5644a02c29a02fa1</id>
<content type='text'>
Do not use rlimit-based memory accounting for sockmap and sockhash maps.
It has been replaced with the memcg-based memory accounting.

Signed-off-by: Roman Gushchin &lt;guro@fb.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Song Liu &lt;songliubraving@fb.com&gt;
Link: https://lore.kernel.org/bpf/20201201215900.3569844-29-guro@fb.com
</content>
</entry>
<entry>
<title>bpf: Refine memcg-based memory accounting for sockmap and sockhash maps</title>
<updated>2020-12-03T02:32:46+00:00</updated>
<author>
<name>Roman Gushchin</name>
<email>guro@fb.com</email>
</author>
<published>2020-12-01T21:58:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7846dd9f835e248901a9f77a43745f8f1de04741'/>
<id>urn:sha1:7846dd9f835e248901a9f77a43745f8f1de04741</id>
<content type='text'>
Include internal metadata into the memcg-based memory accounting.
Also include the memory allocated on updating an element.

Signed-off-by: Roman Gushchin &lt;guro@fb.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Link: https://lore.kernel.org/bpf/20201201215900.3569844-17-guro@fb.com
</content>
</entry>
<entry>
<title>net, sockmap: Don't call bpf_prog_put() on NULL pointer</title>
<updated>2020-10-15T19:05:23+00:00</updated>
<author>
<name>Alex Dewar</name>
<email>alex.dewar90@gmail.com</email>
</author>
<published>2020-10-12T17:09:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=83c11c17553c0fca217105c17444c4ef5ab2403f'/>
<id>urn:sha1:83c11c17553c0fca217105c17444c4ef5ab2403f</id>
<content type='text'>
If bpf_prog_inc_not_zero() fails for skb_parser, then bpf_prog_put() is
called unconditionally on skb_verdict, even though it may be NULL. Fix
and tidy up error path.

Fixes: 743df8b7749f ("bpf, sockmap: Check skb_verdict and skb_parser programs explicitly")
Addresses-Coverity-ID: 1497799: Null pointer dereferences (FORWARD_NULL)
Signed-off-by: Alex Dewar &lt;alex.dewar90@gmail.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Jakub Sitnicki &lt;jakub@cloudflare.com&gt;
Acked-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Link: https://lore.kernel.org/bpf/20201012170952.60750-1-alex.dewar90@gmail.com
</content>
</entry>
<entry>
<title>bpf, sockmap: Add locking annotations to iterator</title>
<updated>2020-10-15T18:49:56+00:00</updated>
<author>
<name>Lorenz Bauer</name>
<email>lmb@cloudflare.com</email>
</author>
<published>2020-10-12T09:18:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f58423aeab28f861b67933206f322f764f05787d'/>
<id>urn:sha1:f58423aeab28f861b67933206f322f764f05787d</id>
<content type='text'>
The sparse checker currently outputs the following warnings:

    include/linux/rcupdate.h:632:9: sparse: sparse: context imbalance in 'sock_hash_seq_start' - wrong count at exit
    include/linux/rcupdate.h:632:9: sparse: sparse: context imbalance in 'sock_map_seq_start' - wrong count at exit

Add the necessary __acquires and __release annotations to make the
iterator locking schema palatable to sparse. Also add __must_hold
for good measure.

The kernel codebase uses both __acquires(rcu) and __acquires(RCU).
I couldn't find any guidance which one is preferred, so I used
what is easier to type out.

Fixes: 0365351524d7 ("net: Allow iterating sockmap and sockhash")
Reported-by: kernel test robot &lt;lkp@intel.com&gt;
Signed-off-by: Lorenz Bauer &lt;lmb@cloudflare.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Acked-by: Jakub Sitnicki &lt;jakub@cloudflare.com&gt;
Link: https://lore.kernel.org/bpf/20201012091850.67452-1-lmb@cloudflare.com
</content>
</entry>
<entry>
<title>bpf, sockmap: Allow skipping sk_skb parser program</title>
<updated>2020-10-12T01:09:44+00:00</updated>
<author>
<name>John Fastabend</name>
<email>john.fastabend@gmail.com</email>
</author>
<published>2020-10-11T05:09:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ef5659280eb13e8ac31c296f58cfdfa1684ac06b'/>
<id>urn:sha1:ef5659280eb13e8ac31c296f58cfdfa1684ac06b</id>
<content type='text'>
Currently, we often run with a nop parser namely one that just does
this, 'return skb-&gt;len'. This happens when either our verdict program
can handle streaming data or it is only looking at socket data such
as IP addresses and other metadata associated with the flow. The second
case is common for a L3/L4 proxy for instance.

So lets allow loading programs without the parser then we can skip
the stream parser logic and avoid having to add a BPF program that
is effectively a nop.

Signed-off-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Link: https://lore.kernel.org/bpf/160239297866.8495.13345662302749219672.stgit@john-Precision-5820-Tower
</content>
</entry>
<entry>
<title>bpf, sockmap: Check skb_verdict and skb_parser programs explicitly</title>
<updated>2020-10-12T01:09:44+00:00</updated>
<author>
<name>John Fastabend</name>
<email>john.fastabend@gmail.com</email>
</author>
<published>2020-10-11T05:09:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=743df8b7749fb5a289fc0c7ac94ec15533596839'/>
<id>urn:sha1:743df8b7749fb5a289fc0c7ac94ec15533596839</id>
<content type='text'>
We are about to allow skb_verdict to run without skb_parser programs
as a first step change code to check each program type specifically.
This should be a mechanical change without any impact to actual result.

Signed-off-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Link: https://lore.kernel.org/bpf/160239294756.8495.5796595770890272219.stgit@john-Precision-5820-Tower
</content>
</entry>
<entry>
<title>bpf, net: Rework cookie generator as per-cpu one</title>
<updated>2020-09-30T18:50:35+00:00</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2020-09-30T15:18:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=92acdc58ab11af66fcaef485433fde61b5e32fac'/>
<id>urn:sha1:92acdc58ab11af66fcaef485433fde61b5e32fac</id>
<content type='text'>
With its use in BPF, the cookie generator can be called very frequently
in particular when used out of cgroup v2 hooks (e.g. connect / sendmsg)
and attached to the root cgroup, for example, when used in v1/v2 mixed
environments. In particular, when there's a high churn on sockets in the
system there can be many parallel requests to the bpf_get_socket_cookie()
and bpf_get_netns_cookie() helpers which then cause contention on the
atomic counter.

As similarly done in f991bd2e1421 ("fs: introduce a per-cpu last_ino
allocator"), add a small helper library that both can use for the 64 bit
counters. Given this can be called from different contexts, we also need
to deal with potential nested calls even though in practice they are
considered extremely rare. One idea as suggested by Eric Dumazet was
to use a reverse counter for this situation since we don't expect 64 bit
overflows anyways; that way, we can avoid bigger gaps in the 64 bit
counter space compared to just batch-wise increase. Even on machines
with small number of cores (e.g. 4) the cookie generation shrinks from
min/max/med/avg (ns) of 22/50/40/38.9 down to 10/35/14/17.3 when run
in parallel from multiple CPUs.

Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Acked-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Cc: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Link: https://lore.kernel.org/bpf/8a80b8d27d3c49f9a14e1d5213c19d8be87d1dc8.1601477936.git.daniel@iogearbox.net
</content>
</entry>
<entry>
<title>bpf: sockmap: Enable map_update_elem from bpf_iter</title>
<updated>2020-09-28T23:40:46+00:00</updated>
<author>
<name>Lorenz Bauer</name>
<email>lmb@cloudflare.com</email>
</author>
<published>2020-09-28T09:08:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6550f2dddfab02a5b948369eeeaedfbc4ae3cc16'/>
<id>urn:sha1:6550f2dddfab02a5b948369eeeaedfbc4ae3cc16</id>
<content type='text'>
Allow passing a pointer to a BTF struct sock_common* when updating
a sockmap or sockhash. Since BTF pointers can fault and therefore be
NULL at runtime we need to add an additional !sk check to
sock_map_update_elem. Since we may be passed a request or timewait
socket we also need to check sk_fullsock. Doing this allows calling
map_update_elem on sockmap from bpf_iter context, which uses
BTF pointers.

Signed-off-by: Lorenz Bauer &lt;lmb@cloudflare.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Link: https://lore.kernel.org/bpf/20200928090805.23343-2-lmb@cloudflare.com
</content>
</entry>
<entry>
<title>net: Allow iterating sockmap and sockhash</title>
<updated>2020-09-10T19:31:55+00:00</updated>
<author>
<name>Lorenz Bauer</name>
<email>lmb@cloudflare.com</email>
</author>
<published>2020-09-09T16:27:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=0365351524d7560d8ed7a42801a15252c6c56f41'/>
<id>urn:sha1:0365351524d7560d8ed7a42801a15252c6c56f41</id>
<content type='text'>
Add bpf_iter support for sockmap / sockhash, based on the bpf_sk_storage and
hashtable implementation. sockmap and sockhash share the same iteration
context: a pointer to an arbitrary key and a pointer to a socket. Both
pointers may be NULL, and so BPF has to perform a NULL check before accessing
them. Technically it's not possible for sockhash iteration to yield a NULL
socket, but we ignore this to be able to use a single iteration point.

Iteration will visit all keys that remain unmodified during the lifetime of
the iterator. It may or may not visit newly added ones.

Switch from using rcu_dereference_raw to plain rcu_dereference, so we gain
another guard rail if CONFIG_PROVE_RCU is enabled.

Signed-off-by: Lorenz Bauer &lt;lmb@cloudflare.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Yonghong Song &lt;yhs@fb.com&gt;
Link: https://lore.kernel.org/bpf/20200909162712.221874-3-lmb@cloudflare.com
</content>
</entry>
<entry>
<title>net: sockmap: Remove unnecessary sk_fullsock checks</title>
<updated>2020-09-10T19:31:55+00:00</updated>
<author>
<name>Lorenz Bauer</name>
<email>lmb@cloudflare.com</email>
</author>
<published>2020-09-09T16:27:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=654785a1afe1b1428106c322c7ea650e3f8d29e9'/>
<id>urn:sha1:654785a1afe1b1428106c322c7ea650e3f8d29e9</id>
<content type='text'>
The lookup paths for sockmap and sockhash currently include a check
that returns NULL if the socket we just found is not a full socket.
However, this check is not necessary. On insertion we ensure that
we have a full socket (caveat around sock_ops), so request sockets
are not a problem. Time-wait sockets are allocated separate from
the original socket and then fed into the hashdance. They don't
affect the sockets already stored in the sockmap.

Suggested-by: Jakub Sitnicki &lt;jakub@cloudflare.com&gt;
Signed-off-by: Lorenz Bauer &lt;lmb@cloudflare.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Link: https://lore.kernel.org/bpf/20200909162712.221874-2-lmb@cloudflare.com
</content>
</entry>
</feed>
