kernel/linux.git/net/core/sock_map.c, branch linux-5.11.y

bpf: Eliminate rlimit-based memory accounting for sockmap and sockhash maps

2020-12-03T02:32:47+00:00

Do not use rlimit-based memory accounting for sockmap and sockhash maps. It has been replaced with the memcg-based memory accounting. Signed-off-by: Roman Gushchin Signed-off-by: Alexei Starovoitov Acked-by: Song Liu Link: https://lore.kernel.org/bpf/20201201215900.3569844-29-guro@fb.com

bpf: Refine memcg-based memory accounting for sockmap and sockhash maps

2020-12-03T02:32:46+00:00

Include internal metadata into the memcg-based memory accounting. Also include the memory allocated on updating an element. Signed-off-by: Roman Gushchin Signed-off-by: Alexei Starovoitov Link: https://lore.kernel.org/bpf/20201201215900.3569844-17-guro@fb.com

net, sockmap: Don't call bpf_prog_put() on NULL pointer

2020-10-15T19:05:23+00:00

If bpf_prog_inc_not_zero() fails for skb_parser, then bpf_prog_put() is called unconditionally on skb_verdict, even though it may be NULL. Fix and tidy up error path. Fixes: 743df8b7749f ("bpf, sockmap: Check skb_verdict and skb_parser programs explicitly") Addresses-Coverity-ID: 1497799: Null pointer dereferences (FORWARD_NULL) Signed-off-by: Alex Dewar Signed-off-by: Daniel Borkmann Acked-by: Jakub Sitnicki Acked-by: John Fastabend Link: https://lore.kernel.org/bpf/20201012170952.60750-1-alex.dewar90@gmail.com

bpf, sockmap: Add locking annotations to iterator

2020-10-15T18:49:56+00:00

The sparse checker currently outputs the following warnings: include/linux/rcupdate.h:632:9: sparse: sparse: context imbalance in 'sock_hash_seq_start' - wrong count at exit include/linux/rcupdate.h:632:9: sparse: sparse: context imbalance in 'sock_map_seq_start' - wrong count at exit Add the necessary __acquires and __release annotations to make the iterator locking schema palatable to sparse. Also add __must_hold for good measure. The kernel codebase uses both __acquires(rcu) and __acquires(RCU). I couldn't find any guidance which one is preferred, so I used what is easier to type out. Fixes: 0365351524d7 ("net: Allow iterating sockmap and sockhash") Reported-by: kernel test robot Signed-off-by: Lorenz Bauer Signed-off-by: Daniel Borkmann Acked-by: John Fastabend Acked-by: Jakub Sitnicki Link: https://lore.kernel.org/bpf/20201012091850.67452-1-lmb@cloudflare.com

bpf, sockmap: Allow skipping sk_skb parser program

2020-10-12T01:09:44+00:00

Currently, we often run with a nop parser namely one that just does this, 'return skb->len'. This happens when either our verdict program can handle streaming data or it is only looking at socket data such as IP addresses and other metadata associated with the flow. The second case is common for a L3/L4 proxy for instance. So lets allow loading programs without the parser then we can skip the stream parser logic and avoid having to add a BPF program that is effectively a nop. Signed-off-by: John Fastabend Signed-off-by: Alexei Starovoitov Link: https://lore.kernel.org/bpf/160239297866.8495.13345662302749219672.stgit@john-Precision-5820-Tower

bpf, sockmap: Check skb_verdict and skb_parser programs explicitly

2020-10-12T01:09:44+00:00

We are about to allow skb_verdict to run without skb_parser programs as a first step change code to check each program type specifically. This should be a mechanical change without any impact to actual result. Signed-off-by: John Fastabend Signed-off-by: Alexei Starovoitov Link: https://lore.kernel.org/bpf/160239294756.8495.5796595770890272219.stgit@john-Precision-5820-Tower

bpf, net: Rework cookie generator as per-cpu one

2020-09-30T18:50:35+00:00

With its use in BPF, the cookie generator can be called very frequently in particular when used out of cgroup v2 hooks (e.g. connect / sendmsg) and attached to the root cgroup, for example, when used in v1/v2 mixed environments. In particular, when there's a high churn on sockets in the system there can be many parallel requests to the bpf_get_socket_cookie() and bpf_get_netns_cookie() helpers which then cause contention on the atomic counter. As similarly done in f991bd2e1421 ("fs: introduce a per-cpu last_ino allocator"), add a small helper library that both can use for the 64 bit counters. Given this can be called from different contexts, we also need to deal with potential nested calls even though in practice they are considered extremely rare. One idea as suggested by Eric Dumazet was to use a reverse counter for this situation since we don't expect 64 bit overflows anyways; that way, we can avoid bigger gaps in the 64 bit counter space compared to just batch-wise increase. Even on machines with small number of cores (e.g. 4) the cookie generation shrinks from min/max/med/avg (ns) of 22/50/40/38.9 down to 10/35/14/17.3 when run in parallel from multiple CPUs. Signed-off-by: Daniel Borkmann Signed-off-by: Alexei Starovoitov Reviewed-by: Eric Dumazet Acked-by: Martin KaFai Lau Cc: Eric Dumazet Link: https://lore.kernel.org/bpf/8a80b8d27d3c49f9a14e1d5213c19d8be87d1dc8.1601477936.git.daniel@iogearbox.net

bpf: sockmap: Enable map_update_elem from bpf_iter

2020-09-28T23:40:46+00:00

Allow passing a pointer to a BTF struct sock_common* when updating a sockmap or sockhash. Since BTF pointers can fault and therefore be NULL at runtime we need to add an additional !sk check to sock_map_update_elem. Since we may be passed a request or timewait socket we also need to check sk_fullsock. Doing this allows calling map_update_elem on sockmap from bpf_iter context, which uses BTF pointers. Signed-off-by: Lorenz Bauer Signed-off-by: Alexei Starovoitov Link: https://lore.kernel.org/bpf/20200928090805.23343-2-lmb@cloudflare.com

net: Allow iterating sockmap and sockhash

2020-09-10T19:31:55+00:00

Add bpf_iter support for sockmap / sockhash, based on the bpf_sk_storage and hashtable implementation. sockmap and sockhash share the same iteration context: a pointer to an arbitrary key and a pointer to a socket. Both pointers may be NULL, and so BPF has to perform a NULL check before accessing them. Technically it's not possible for sockhash iteration to yield a NULL socket, but we ignore this to be able to use a single iteration point. Iteration will visit all keys that remain unmodified during the lifetime of the iterator. It may or may not visit newly added ones. Switch from using rcu_dereference_raw to plain rcu_dereference, so we gain another guard rail if CONFIG_PROVE_RCU is enabled. Signed-off-by: Lorenz Bauer Signed-off-by: Alexei Starovoitov Acked-by: Yonghong Song Link: https://lore.kernel.org/bpf/20200909162712.221874-3-lmb@cloudflare.com

net: sockmap: Remove unnecessary sk_fullsock checks

2020-09-10T19:31:55+00:00

The lookup paths for sockmap and sockhash currently include a check that returns NULL if the socket we just found is not a full socket. However, this check is not necessary. On insertion we ensure that we have a full socket (caveat around sock_ops), so request sockets are not a problem. Time-wait sockets are allocated separate from the original socket and then fed into the hashdance. They don't affect the sockets already stored in the sockmap. Suggested-by: Jakub Sitnicki Signed-off-by: Lorenz Bauer Signed-off-by: Alexei Starovoitov Link: https://lore.kernel.org/bpf/20200909162712.221874-2-lmb@cloudflare.com