From f318903c0bf42448b4c884732df2bbb0ef7a2284 Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Fri, 27 Mar 2020 16:58:52 +0100 Subject: bpf: Add netns cookie and enable it for bpf cgroup hooks In Cilium we're mainly using BPF cgroup hooks today in order to implement kube-proxy free Kubernetes service translation for ClusterIP, NodePort (*), ExternalIP, and LoadBalancer as well as HostPort mapping [0] for all traffic between Cilium managed nodes. While this works in its current shape and avoids packet-level NAT for inter Cilium managed node traffic, there is one major limitation we're facing today, that is, lack of netns awareness. In Kubernetes, the concept of Pods (which hold one or multiple containers) has been built around network namespaces, so while we can use the global scope of attaching to root BPF cgroup hooks also to our advantage (e.g. for exposing NodePort ports on loopback addresses), we also have the need to differentiate between initial network namespaces and non-initial one. For example, ExternalIP services mandate that non-local service IPs are not to be translated from the host (initial) network namespace as one example. Right now, we have an ugly work-around in place where non-local service IPs for ExternalIP services are not xlated from connect() and friends BPF hooks but instead via less efficient packet-level NAT on the veth tc ingress hook for Pod traffic. On top of determining whether we're in initial or non-initial network namespace we also have a need for a socket-cookie like mechanism for network namespaces scope. Socket cookies have the nice property that they can be combined as part of the key structure e.g. for BPF LRU maps without having to worry that the cookie could be recycled. We are planning to use this for our sessionAffinity implementation for services. Therefore, add a new bpf_get_netns_cookie() helper which would resolve both use cases at once: bpf_get_netns_cookie(NULL) would provide the cookie for the initial network namespace while passing the context instead of NULL would provide the cookie from the application's network namespace. We're using a hole, so no size increase; the assignment happens only once. Therefore this allows for a comparison on initial namespace as well as regular cookie usage as we have today with socket cookies. We could later on enable this helper for other program types as well as we would see need. (*) Both externalTrafficPolicy={Local|Cluster} types [0] https://github.com/cilium/cilium/blob/master/bpf/bpf_sock.c Signed-off-by: Daniel Borkmann Signed-off-by: Alexei Starovoitov Link: https://lore.kernel.org/bpf/c47d2346982693a9cf9da0e12690453aded4c788.1585323121.git.daniel@iogearbox.net --- include/net/net_namespace.h | 10 ++++++++++ 1 file changed, 10 insertions(+) (limited to 'include/net/net_namespace.h') diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h index 854d39ef1ca3..1c6edfdb9a2c 100644 --- a/include/net/net_namespace.h +++ b/include/net/net_namespace.h @@ -168,6 +168,9 @@ struct net { #ifdef CONFIG_XFRM struct netns_xfrm xfrm; #endif + + atomic64_t net_cookie; /* written once */ + #if IS_ENABLED(CONFIG_IP_VS) struct netns_ipvs *ipvs; #endif @@ -273,6 +276,8 @@ static inline int check_net(const struct net *net) void net_drop_ns(void *); +u64 net_gen_cookie(struct net *net); + #else static inline struct net *get_net(struct net *net) @@ -300,6 +305,11 @@ static inline int check_net(const struct net *net) return 1; } +static inline u64 net_gen_cookie(struct net *net) +{ + return 0; +} + #define net_drop_ns NULL #endif -- cgit v1.2.3 From 5a95cbb80ef8d8f2db29ab10777cd4742e6fc8ec Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Sun, 29 Mar 2020 21:59:02 +0200 Subject: bpf, net: Fix build issue when net ns not configured Fix a redefinition of 'net_gen_cookie' error that was overlooked when net ns is not configured. Fixes: f318903c0bf4 ("bpf: Add netns cookie and enable it for bpf cgroup hooks") Reported-by: kbuild test robot Signed-off-by: Daniel Borkmann --- include/net/net_namespace.h | 9 ++------- 1 file changed, 2 insertions(+), 7 deletions(-) (limited to 'include/net/net_namespace.h') diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h index 1c6edfdb9a2c..ab96fb59131c 100644 --- a/include/net/net_namespace.h +++ b/include/net/net_namespace.h @@ -228,6 +228,8 @@ extern struct list_head net_namespace_list; struct net *get_net_ns_by_pid(pid_t pid); struct net *get_net_ns_by_fd(int fd); +u64 net_gen_cookie(struct net *net); + #ifdef CONFIG_SYSCTL void ipx_register_sysctl(void); void ipx_unregister_sysctl(void); @@ -276,8 +278,6 @@ static inline int check_net(const struct net *net) void net_drop_ns(void *); -u64 net_gen_cookie(struct net *net); - #else static inline struct net *get_net(struct net *net) @@ -305,11 +305,6 @@ static inline int check_net(const struct net *net) return 1; } -static inline u64 net_gen_cookie(struct net *net) -{ - return 0; -} - #define net_drop_ns NULL #endif -- cgit v1.2.3