kernel/linux.git/drivers/net/netkit.c, branch v6.13.6

rtnetlink: fix double call of rtnl_link_get_net_ifla()

2024-12-03T10:29:29+00:00

Currently rtnl_link_get_net_ifla() gets called twice when we create peer devices, once in rtnl_add_peer_net() and once in each ->newlink() implementation. This looks safer, however, it leads to a classic Time-of-Check to Time-of-Use (TOCTOU) bug since IFLA_NET_NS_PID is very dynamic. And because of the lack of checking error pointer of the second call, it also leads to a kernel crash as reported by syzbot. Fix this by getting rid of the second call, which already becomes redudant after Kuniyuki's work. We have to propagate the result of the first rtnl_link_get_net_ifla() down to each ->newlink(). Reported-by: syzbot+21ba4d5adff0b6a7cfc6@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=21ba4d5adff0b6a7cfc6 Fixes: 0eb87b02a705 ("veth: Set VETH_INFO_PEER to veth_link_ops.peer_type.") Fixes: 6b84e558e95d ("vxcan: Set VXCAN_INFO_PEER to vxcan_link_ops.peer_type.") Fixes: fefd5d082172 ("netkit: Set IFLA_NETKIT_PEER_INFO to netkit_link_ops.peer_type.") Cc: Kuniyuki Iwashima Signed-off-by: Cong Wang Reviewed-by: Kuniyuki Iwashima Link: https://patch.msgid.link/20241129212519.825567-1-xiyou.wangcong@gmail.com Signed-off-by: Paolo Abeni

netkit: Set IFLA_NETKIT_PEER_INFO to netkit_link_ops.peer_type.

2024-11-12T01:26:52+00:00

For per-netns RTNL, we need to prefetch the peer device's netns. Let's set rtnl_link_ops.peer_type and accordingly remove duplicated validation in ->newlink(). Signed-off-by: Kuniyuki Iwashima Reviewed-by: Eric Dumazet Acked-by: Nikolay Aleksandrov Link: https://patch.msgid.link/20241108004823.29419-9-kuniyu@amazon.com Signed-off-by: Jakub Kicinski

netkit: Simplify netkit mode over to use NLA_POLICY_MAX

2024-10-08T00:12:37+00:00

Jakub suggested to rely on netlink policy validation via NLA_POLICY_MAX() instead of open-coding it. netkit_check_mode() is a candidate which can be simplified through this as well aside from the netkit scrubbing one. Suggested-by: Jakub Kicinski Signed-off-by: Daniel Borkmann Cc: Nikolay Aleksandrov Acked-by: Nikolay Aleksandrov Link: https://lore.kernel.org/r/20241004101335.117711-2-daniel@iogearbox.net Signed-off-by: Martin KaFai Lau

netkit: Add option for scrubbing skb meta data

2024-10-08T00:12:37+00:00

Jordan reported that when running Cilium with netkit in per-endpoint-routes mode, network policy misclassifies traffic. In this direct routing mode of Cilium which is used in case of GKE/EKS/AKS, the Pod's BPF program to enforce policy sits on the netkit primary device's egress side. The issue here is that in case of netkit's netkit_prep_forward(), it will clear meta data such as skb->mark and skb->priority before executing the BPF program. Thus, identity data stored in there from earlier BPF programs (e.g. from tcx ingress on the physical device) gets cleared instead of being made available for the primary's program to process. While for traffic egressing the Pod via the peer device this might be desired, this is different for the primary one where compared to tcx egress on the host veth this information would be available. To address this, add a new parameter for the device orchestration to allow control of skb->mark and skb->priority scrubbing, to make the two accessible from BPF (and eventually leave it up to the program to scrub). By default, the current behavior is retained. For netkit peer this also enables the use case where applications could cooperate/signal intent to the BPF program. Note that struct netkit has a 4 byte hole between policy and bundle which is used here, in other words, struct netkit's first cacheline content used in fast-path does not get moved around. Fixes: 35dfaad7188c ("netkit, bpf: Add bpf programmable net device") Reported-by: Jordan Rife Signed-off-by: Daniel Borkmann Cc: Nikolay Aleksandrov Link: https://github.com/cilium/cilium/issues/34042 Acked-by: Jakub Kicinski Acked-by: Nikolay Aleksandrov Link: https://lore.kernel.org/r/20241004101335.117711-1-daniel@iogearbox.net Signed-off-by: Martin KaFai Lau

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

2024-09-15T16:13:19+00:00

Merge in late fixes to prepare for the 6.12 net-next PR. No conflicts or adjacent changes. Signed-off-by: Jakub Kicinski

netkit: Assign missing bpf_net_context

2024-09-14T02:49:45+00:00

During the introduction of struct bpf_net_context handling for XDP-redirect, the netkit driver has been missed, which also requires it because NETKIT_REDIRECT invokes skb_do_redirect() which is accessing the per-CPU variables. Otherwise we see the following crash: BUG: kernel NULL pointer dereference, address: 0000000000000038 bpf_redirect() netkit_xmit() dev_hard_start_xmit() Set the bpf_net_context before invoking netkit_xmit() program within the netkit driver. Fixes: 401cb7dae813 ("net: Reference bpf_redirect_info via task_struct on PREEMPT_RT.") Signed-off-by: Breno Leitao Acked-by: Daniel Borkmann Reviewed-by: Sebastian Andrzej Siewior Reviewed-by: Toke Høiland-Jørgensen Acked-by: Nikolay Aleksandrov Acked-by: Martin KaFai Lau Link: https://patch.msgid.link/20240912155620.1334587-1-leitao@debian.org Signed-off-by: Jakub Kicinski

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

2024-09-13T03:22:44+00:00

Daniel Borkmann says: ==================== pull-request: bpf-next 2024-09-11 We've added 12 non-merge commits during the last 16 day(s) which contain a total of 20 files changed, 228 insertions(+), 30 deletions(-). There's a minor merge conflict in drivers/net/netkit.c: 00d066a4d4ed ("netdev_features: convert NETIF_F_LLTX to dev->lltx") d96608794889 ("netkit: Disable netpoll support") The main changes are: 1) Enable bpf_dynptr_from_skb for tp_btf such that this can be used to easily parse skbs in BPF programs attached to tracepoints, from Philo Lu. 2) Add a cond_resched() point in BPF's sock_hash_free() as there have been several syzbot soft lockup reports recently, from Eric Dumazet. 3) Fix xsk_buff_can_alloc() to account for queue_empty_descs which got noticed when zero copy ice driver started to use it, from Maciej Fijalkowski. 4) Move the xdp:xdp_cpumap_kthread tracepoint before cpumap pushes skbs up via netif_receive_skb_list() to better measure latencies, from Daniel Xu. 5) Follow-up to disable netpoll support from netkit, from Daniel Borkmann. 6) Improve xsk selftests to not assume a fixed MAX_SKB_FRAGS of 17 but instead gather the actual value via /proc/sys/net/core/max_skb_frags, also from Maciej Fijalkowski. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: sock_map: Add a cond_resched() in sock_hash_free() selftests/bpf: Expand skb dynptr selftests for tp_btf bpf: Allow bpf_dynptr_from_skb() for tp_btf tcp: Use skb__nullable in trace_tcp_send_reset selftests/bpf: Add test for __nullable suffix in tp_btf bpf: Support __nullable argument suffix for tp_btf bpf, cpumap: Move xdp:xdp_cpumap_kthread tracepoint before rcv selftests/xsk: Read current MAX_SKB_FRAGS from sysctl knob xsk: Bump xsk_queue::queue_empty_descs in xp_can_alloc() tcp_bpf: Remove an unused parameter for bpf_tcp_ingress() bpf, sockmap: Correct spelling skmsg.c netkit: Disable netpoll support Signed-off-by: Jakub Kicinski ==================== Link: https://patch.msgid.link/20240911211525.13834-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski

netdev_features: convert NETIF_F_LLTX to dev->lltx

2024-09-03T09:36:43+00:00

NETIF_F_LLTX can't be changed via Ethtool and is not a feature, rather an attribute, very similar to IFF_NO_QUEUE (and hot). Free one netdev_features_t bit and make it a "hot" private flag. Signed-off-by: Alexander Lobakin Signed-off-by: Paolo Abeni

netkit: Disable netpoll support

2024-08-27T18:37:45+00:00

Follow-up to 45160cebd6ac ("net: veth: Disable netpoll support") to also disable netpoll for netkit interfaces. Same conditions apply here as well. Signed-off-by: Daniel Borkmann Cc: Breno Leitao Cc: Nikolay Aleksandrov Acked-by: Nikolay Aleksandrov Reviewed-by: Breno Leitao Link: https://lore.kernel.org/r/eab2d69ba2f4c260aef62e4ff0d803e9f60c2c5d.1724414250.git.daniel@iogearbox.net Signed-off-by: Martin KaFai Lau

netkit: Fix pkt_type override upon netkit pass verdict

2024-05-25T17:48:57+00:00

When running Cilium connectivity test suite with netkit in L2 mode, we found that compared to tcx a few tests were failing which pushed traffic into an L7 proxy sitting in host namespace. The problem in particular is around the invocation of eth_type_trans() in netkit. In case of tcx, this is run before the tcx ingress is triggered inside host namespace and thus if the BPF program uses the bpf_skb_change_type() helper the newly set type is retained. However, in case of netkit, the late eth_type_trans() invocation overrides the earlier decision from the BPF program which eventually leads to the test failure. Instead of eth_type_trans(), split out the relevant parts, meaning, reset of mac header and call to eth_skb_pkt_type() before the BPF program is run in order to have the same behavior as with tcx, and refactor a small helper called eth_skb_pull_mac() which is run in case it's passed up the stack where the mac header must be pulled. With this all connectivity tests pass. Fixes: 35dfaad7188c ("netkit, bpf: Add bpf programmable net device") Signed-off-by: Daniel Borkmann Acked-by: Nikolay Aleksandrov Link: https://lore.kernel.org/r/20240524163619.26001-2-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov