summaryrefslogtreecommitdiff
path: root/net/ipv6
AgeCommit message (Collapse)AuthorFilesLines
2024-03-01Fix write to cloned skb in ipv6_hop_ioam()Justin Iurman1-0/+10
[ Upstream commit f198d933c2e4f8f89e0620fbaf1ea7eac384a0eb ] ioam6_fill_trace_data() writes inside the skb payload without ensuring it's writeable (e.g., not cloned). This function is called both from the input and output path. The output path (ioam6_iptunnel) already does the check. This commit provides a fix for the input path, inside ipv6_hop_ioam(). It also updates ip6_parse_tlv() to refresh the network header pointer ("nh") when returning from ipv6_hop_ioam(). Fixes: 9ee11f0fff20 ("ipv6: ioam: Data plane support for Pre-allocated Trace") Reported-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Justin Iurman <justin.iurman@uliege.be> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-01ipv6: sr: fix possible use-after-free and null-ptr-derefVasiliy Kovalev1-9/+11
[ Upstream commit 5559cea2d5aa3018a5f00dd2aca3427ba09b386b ] The pernet operations structure for the subsystem must be registered before registering the generic netlink family. Fixes: 915d7e5e5930 ("ipv6: sr: add code base for control plane support of SR-IPv6") Signed-off-by: Vasiliy Kovalev <kovalev@altlinux.org> Link: https://lore.kernel.org/r/20240215202717.29815-1-kovalev@altlinux.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-01ipv6: properly combine dev_base_seq and ipv6.dev_addr_genidEric Dumazet1-3/+18
[ Upstream commit e898e4cd1aab271ca414f9ac6e08e4c761f6913c ] net->dev_base_seq and ipv6.dev_addr_genid are monotonically increasing. If we XOR their values, we could miss to detect if both values were changed with the same amount. Fixes: 63998ac24f83 ("ipv6: provide addr and netconf dump consistency info") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-01bpf: Derive source IP addr via bpf_*_fib_lookup()Martynas Pumputis1-0/+1
commit dab4e1f06cabb6834de14264394ccab197007302 upstream. Extend the bpf_fib_lookup() helper by making it to return the source IPv4/IPv6 address if the BPF_FIB_LOOKUP_SRC flag is set. For example, the following snippet can be used to derive the desired source IP address: struct bpf_fib_lookup p = { .ipv4_dst = ip4->daddr }; ret = bpf_skb_fib_lookup(skb, p, sizeof(p), BPF_FIB_LOOKUP_SRC | BPF_FIB_LOOKUP_SKIP_NEIGH); if (ret != BPF_FIB_LKUP_RET_SUCCESS) return TC_ACT_SHOT; /* the p.ipv4_src now contains the source address */ The inability to derive the proper source address may cause malfunctions in BPF-based dataplanes for hosts containing netdevs with more than one routable IP address or for multi-homed hosts. For example, Cilium implements packet masquerading in BPF. If an egressing netdev to which the Cilium's BPF prog is attached has multiple IP addresses, then only one [hardcoded] IP address can be used for masquerading. This breaks connectivity if any other IP address should have been selected instead, for example, when a public and private addresses are attached to the same egress interface. The change was tested with Cilium [1]. Nikolay Aleksandrov helped to figure out the IPv6 addr selection. [1]: https://github.com/cilium/cilium/pull/28283 Signed-off-by: Martynas Pumputis <m@lambda.lt> Link: https://lore.kernel.org/r/20231007081415.33502-2-m@lambda.lt Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-02-05ipv6: Ensure natural alignment of const ipv6 loopback and router addressesHelge Deller1-7/+14
[ Upstream commit 60365049ccbacd101654a66ddcb299abfabd4fc5 ] On a parisc64 kernel I sometimes notice this kernel warning: Kernel unaligned access to 0x40ff8814 at ndisc_send_skb+0xc0/0x4d8 The address 0x40ff8814 points to the in6addr_linklocal_allrouters variable and the warning simply means that some ipv6 function tries to read a 64-bit word directly from the not-64-bit aligned in6addr_linklocal_allrouters variable. Unaligned accesses are non-critical as the architecture or exception handlers usually will fix it up at runtime. Nevertheless it may trigger a performance penality for some architectures. For details read the "unaligned-memory-access" kernel documentation. The patch below ensures that the ipv6 loopback and router addresses will always be naturally aligned. This prevents the unaligned accesses for all architectures. Signed-off-by: Helge Deller <deller@gmx.de> Fixes: 034dfc5df99eb ("ipv6: export in6addr_loopback to modules") Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/ZbNuFM1bFqoH-UoY@p100 Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-05ip6_tunnel: make sure to pull inner header in __ip6_tnl_rcv()Eric Dumazet1-3/+18
[ Upstream commit 8d975c15c0cd744000ca386247432d57b21f9df0 ] syzbot found __ip6_tnl_rcv() could access unitiliazed data [1]. Call pskb_inet_may_pull() to fix this, and initialize ipv6h variable after this call as it can change skb->head. [1] BUG: KMSAN: uninit-value in __INET_ECN_decapsulate include/net/inet_ecn.h:253 [inline] BUG: KMSAN: uninit-value in INET_ECN_decapsulate include/net/inet_ecn.h:275 [inline] BUG: KMSAN: uninit-value in IP6_ECN_decapsulate+0x7df/0x1e50 include/net/inet_ecn.h:321 __INET_ECN_decapsulate include/net/inet_ecn.h:253 [inline] INET_ECN_decapsulate include/net/inet_ecn.h:275 [inline] IP6_ECN_decapsulate+0x7df/0x1e50 include/net/inet_ecn.h:321 ip6ip6_dscp_ecn_decapsulate+0x178/0x1b0 net/ipv6/ip6_tunnel.c:727 __ip6_tnl_rcv+0xd4e/0x1590 net/ipv6/ip6_tunnel.c:845 ip6_tnl_rcv+0xce/0x100 net/ipv6/ip6_tunnel.c:888 gre_rcv+0x143f/0x1870 ip6_protocol_deliver_rcu+0xda6/0x2a60 net/ipv6/ip6_input.c:438 ip6_input_finish net/ipv6/ip6_input.c:483 [inline] NF_HOOK include/linux/netfilter.h:314 [inline] ip6_input+0x15d/0x430 net/ipv6/ip6_input.c:492 ip6_mc_input+0xa7e/0xc80 net/ipv6/ip6_input.c:586 dst_input include/net/dst.h:461 [inline] ip6_rcv_finish+0x5db/0x870 net/ipv6/ip6_input.c:79 NF_HOOK include/linux/netfilter.h:314 [inline] ipv6_rcv+0xda/0x390 net/ipv6/ip6_input.c:310 __netif_receive_skb_one_core net/core/dev.c:5532 [inline] __netif_receive_skb+0x1a6/0x5a0 net/core/dev.c:5646 netif_receive_skb_internal net/core/dev.c:5732 [inline] netif_receive_skb+0x58/0x660 net/core/dev.c:5791 tun_rx_batched+0x3ee/0x980 drivers/net/tun.c:1555 tun_get_user+0x53af/0x66d0 drivers/net/tun.c:2002 tun_chr_write_iter+0x3af/0x5d0 drivers/net/tun.c:2048 call_write_iter include/linux/fs.h:2084 [inline] new_sync_write fs/read_write.c:497 [inline] vfs_write+0x786/0x1200 fs/read_write.c:590 ksys_write+0x20f/0x4c0 fs/read_write.c:643 __do_sys_write fs/read_write.c:655 [inline] __se_sys_write fs/read_write.c:652 [inline] __x64_sys_write+0x93/0xd0 fs/read_write.c:652 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x6d/0x140 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x63/0x6b Uninit was created at: slab_post_alloc_hook+0x129/0xa70 mm/slab.h:768 slab_alloc_node mm/slub.c:3478 [inline] kmem_cache_alloc_node+0x5e9/0xb10 mm/slub.c:3523 kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:560 __alloc_skb+0x318/0x740 net/core/skbuff.c:651 alloc_skb include/linux/skbuff.h:1286 [inline] alloc_skb_with_frags+0xc8/0xbd0 net/core/skbuff.c:6334 sock_alloc_send_pskb+0xa80/0xbf0 net/core/sock.c:2787 tun_alloc_skb drivers/net/tun.c:1531 [inline] tun_get_user+0x1e8a/0x66d0 drivers/net/tun.c:1846 tun_chr_write_iter+0x3af/0x5d0 drivers/net/tun.c:2048 call_write_iter include/linux/fs.h:2084 [inline] new_sync_write fs/read_write.c:497 [inline] vfs_write+0x786/0x1200 fs/read_write.c:590 ksys_write+0x20f/0x4c0 fs/read_write.c:643 __do_sys_write fs/read_write.c:655 [inline] __se_sys_write fs/read_write.c:652 [inline] __x64_sys_write+0x93/0xd0 fs/read_write.c:652 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x6d/0x140 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x63/0x6b CPU: 0 PID: 5034 Comm: syz-executor331 Not tainted 6.7.0-syzkaller-00562-g9f8413c4a66f #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023 Fixes: 0d3c703a9d17 ("ipv6: Cleanup IPv6 tunnel receive path") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240125170557.2663942-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-01bpf: Propagate modified uaddrlen from cgroup sockaddr programsDaan De Meyer4-8/+11
[ Upstream commit fefba7d1ae198dcbf8b3b432de46a4e29f8dbd8c ] As prep for adding unix socket support to the cgroup sockaddr hooks, let's propagate the sockaddr length back to the caller after running a bpf cgroup sockaddr hook program. While not important for AF_INET or AF_INET6, the sockaddr length is important when working with AF_UNIX sockaddrs as the size of the sockaddr cannot be determined just from the address family or the sockaddr's contents. __cgroup_bpf_run_filter_sock_addr() is modified to take the uaddrlen as an input/output argument. After running the program, the modified sockaddr length is stored in the uaddrlen pointer. Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com> Link: https://lore.kernel.org/r/20231011185113.140426-3-daan.j.demeyer@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Stable-dep-of: c5114710c8ce ("xsk: fix usage of multi-buffer BPF helpers for ZC XDP") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-01ipv6: init the accept_queue's spinlocks in inet6_createZhengchao Shao1-0/+3
[ Upstream commit 435e202d645c197dcfd39d7372eb2a56529b6640 ] In commit 198bc90e0e73("tcp: make sure init the accept_queue's spinlocks once"), the spinlocks of accept_queue are initialized only when socket is created in the inet4 scenario. The locks are not initialized when socket is created in the inet6 scenario. The kernel reports the following error: INFO: trying to register non-static key. The code is fine but needs lockdep annotation, or maybe you didn't initialize this object before use? turning off the locking correctness validator. Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Call Trace: <TASK> dump_stack_lvl (lib/dump_stack.c:107) register_lock_class (kernel/locking/lockdep.c:1289) __lock_acquire (kernel/locking/lockdep.c:5015) lock_acquire.part.0 (kernel/locking/lockdep.c:5756) _raw_spin_lock_bh (kernel/locking/spinlock.c:178) inet_csk_listen_stop (net/ipv4/inet_connection_sock.c:1386) tcp_disconnect (net/ipv4/tcp.c:2981) inet_shutdown (net/ipv4/af_inet.c:935) __sys_shutdown (./include/linux/file.h:32 net/socket.c:2438) __x64_sys_shutdown (net/socket.c:2445) do_syscall_64 (arch/x86/entry/common.c:52) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129) RIP: 0033:0x7f52ecd05a3d Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ab a3 0e 00 f7 d8 64 89 01 48 RSP: 002b:00007f52ecf5dde8 EFLAGS: 00000293 ORIG_RAX: 0000000000000030 RAX: ffffffffffffffda RBX: 00007f52ecf5e640 RCX: 00007f52ecd05a3d RDX: 00007f52ecc8b188 RSI: 0000000000000000 RDI: 0000000000000004 RBP: 00007f52ecf5de20 R08: 00007ffdae45c69f R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000293 R12: 00007f52ecf5e640 R13: 0000000000000000 R14: 00007f52ecc8b060 R15: 00007ffdae45c6e0 Fixes: 198bc90e0e73 ("tcp: make sure init the accept_queue's spinlocks once") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240122102001.2851701-1-shaozhengchao@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-26ipv6: mcast: fix data-race in ipv6_mc_down / mld_ifc_workNikita Zhandarovich1-0/+4
[ Upstream commit 2e7ef287f07c74985f1bf2858bedc62bd9ebf155 ] idev->mc_ifc_count can be written over without proper locking. Originally found by syzbot [1], fix this issue by encapsulating calls to mld_ifc_stop_work() (and mld_gq_stop_work() for good measure) with mutex_lock() and mutex_unlock() accordingly as these functions should only be called with mc_lock per their declarations. [1] BUG: KCSAN: data-race in ipv6_mc_down / mld_ifc_work write to 0xffff88813a80c832 of 1 bytes by task 3771 on cpu 0: mld_ifc_stop_work net/ipv6/mcast.c:1080 [inline] ipv6_mc_down+0x10a/0x280 net/ipv6/mcast.c:2725 addrconf_ifdown+0xe32/0xf10 net/ipv6/addrconf.c:3949 addrconf_notify+0x310/0x980 notifier_call_chain kernel/notifier.c:93 [inline] raw_notifier_call_chain+0x6b/0x1c0 kernel/notifier.c:461 __dev_notify_flags+0x205/0x3d0 dev_change_flags+0xab/0xd0 net/core/dev.c:8685 do_setlink+0x9f6/0x2430 net/core/rtnetlink.c:2916 rtnl_group_changelink net/core/rtnetlink.c:3458 [inline] __rtnl_newlink net/core/rtnetlink.c:3717 [inline] rtnl_newlink+0xbb3/0x1670 net/core/rtnetlink.c:3754 rtnetlink_rcv_msg+0x807/0x8c0 net/core/rtnetlink.c:6558 netlink_rcv_skb+0x126/0x220 net/netlink/af_netlink.c:2545 rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:6576 netlink_unicast_kernel net/netlink/af_netlink.c:1342 [inline] netlink_unicast+0x589/0x650 net/netlink/af_netlink.c:1368 netlink_sendmsg+0x66e/0x770 net/netlink/af_netlink.c:1910 ... write to 0xffff88813a80c832 of 1 bytes by task 22 on cpu 1: mld_ifc_work+0x54c/0x7b0 net/ipv6/mcast.c:2653 process_one_work kernel/workqueue.c:2627 [inline] process_scheduled_works+0x5b8/0xa30 kernel/workqueue.c:2700 worker_thread+0x525/0x730 kernel/workqueue.c:2781 ... Fixes: 2d9a93b4902b ("mld: convert from timer to delayed work") Reported-by: syzbot+a9400cabb1d784e49abf@syzkaller.appspotmail.com Link: https://lore.kernel.org/all/000000000000994e09060ebcdffb@google.com/ Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru> Acked-by: Taehee Yoo <ap420073@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://lore.kernel.org/r/20240117172102.12001-1-n.zhandarovich@fintech.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-26netfilter: bridge: replace physindev with physinif in nf_bridge_infoPavel Tikhomirov1-3/+8
[ Upstream commit 9874808878d9eed407e3977fd11fee49de1e1d86 ] An skb can be added to a neigh->arp_queue while waiting for an arp reply. Where original skb's skb->dev can be different to neigh's neigh->dev. For instance in case of bridging dnated skb from one veth to another, the skb would be added to a neigh->arp_queue of the bridge. As skb->dev can be reset back to nf_bridge->physindev and used, and as there is no explicit mechanism that prevents this physindev from been freed under us (for instance neigh_flush_dev doesn't cleanup skbs from different device's neigh queue) we can crash on e.g. this stack: arp_process neigh_update skb = __skb_dequeue(&neigh->arp_queue) neigh_resolve_output(..., skb) ... br_nf_dev_xmit br_nf_pre_routing_finish_bridge_slow skb->dev = nf_bridge->physindev br_handle_frame_finish Let's use plain ifindex instead of net_device link. To peek into the original net_device we will use dev_get_by_index_rcu(). Thus either we get device and are safe to use it or we don't get it and drop skb. Fixes: c4e70a87d975 ("netfilter: bridge: rename br_netfilter.c to br_netfilter_hooks.c") Suggested-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-26netfilter: propagate net to nf_bridge_get_physindevPavel Tikhomirov1-1/+1
[ Upstream commit a54e72197037d2c9bfcd70dddaac8c8ccb5b41ba ] This is a preparation patch for replacing physindev with physinif on nf_bridge_info structure. We will use dev_get_by_index_rcu to resolve device, when needed, and it requires net to be available. Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Stable-dep-of: 9874808878d9 ("netfilter: bridge: replace physindev with physinif in nf_bridge_info") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-26udp: annotate data-races around up->pendingEric Dumazet1-8/+8
[ Upstream commit 482521d8e0c6520429478aa6866cd44128b33d5d ] up->pending can be read without holding the socket lock, as pointed out by syzbot [1] Add READ_ONCE() in lockless contexts, and WRITE_ONCE() on write side. [1] BUG: KCSAN: data-race in udpv6_sendmsg / udpv6_sendmsg write to 0xffff88814e5eadf0 of 4 bytes by task 15547 on cpu 1: udpv6_sendmsg+0x1405/0x1530 net/ipv6/udp.c:1596 inet6_sendmsg+0x63/0x80 net/ipv6/af_inet6.c:657 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg net/socket.c:745 [inline] __sys_sendto+0x257/0x310 net/socket.c:2192 __do_sys_sendto net/socket.c:2204 [inline] __se_sys_sendto net/socket.c:2200 [inline] __x64_sys_sendto+0x78/0x90 net/socket.c:2200 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x44/0x110 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x63/0x6b read to 0xffff88814e5eadf0 of 4 bytes by task 15551 on cpu 0: udpv6_sendmsg+0x22c/0x1530 net/ipv6/udp.c:1373 inet6_sendmsg+0x63/0x80 net/ipv6/af_inet6.c:657 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg net/socket.c:745 [inline] ____sys_sendmsg+0x37c/0x4d0 net/socket.c:2586 ___sys_sendmsg net/socket.c:2640 [inline] __sys_sendmmsg+0x269/0x500 net/socket.c:2726 __do_sys_sendmmsg net/socket.c:2755 [inline] __se_sys_sendmmsg net/socket.c:2752 [inline] __x64_sys_sendmmsg+0x57/0x60 net/socket.c:2752 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x44/0x110 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x63/0x6b value changed: 0x00000000 -> 0x0000000a Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 15551 Comm: syz-executor.1 Tainted: G W 6.7.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot+8d482d0e407f665d9d10@syzkaller.appspotmail.com Link: https://lore.kernel.org/netdev/0000000000009e46c3060ebcdffd@google.com/ Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-26ip6_tunnel: fix NEXTHDR_FRAGMENT handling in ip6_tnl_parse_tlv_enc_lim()Eric Dumazet1-13/+13
[ Upstream commit d375b98e0248980681e5e56b712026174d617198 ] syzbot pointed out [1] that NEXTHDR_FRAGMENT handling is broken. Reading frag_off can only be done if we pulled enough bytes to skb->head. Currently we might access garbage. [1] BUG: KMSAN: uninit-value in ip6_tnl_parse_tlv_enc_lim+0x94f/0xbb0 ip6_tnl_parse_tlv_enc_lim+0x94f/0xbb0 ipxip6_tnl_xmit net/ipv6/ip6_tunnel.c:1326 [inline] ip6_tnl_start_xmit+0xab2/0x1a70 net/ipv6/ip6_tunnel.c:1432 __netdev_start_xmit include/linux/netdevice.h:4940 [inline] netdev_start_xmit include/linux/netdevice.h:4954 [inline] xmit_one net/core/dev.c:3548 [inline] dev_hard_start_xmit+0x247/0xa10 net/core/dev.c:3564 __dev_queue_xmit+0x33b8/0x5130 net/core/dev.c:4349 dev_queue_xmit include/linux/netdevice.h:3134 [inline] neigh_connected_output+0x569/0x660 net/core/neighbour.c:1592 neigh_output include/net/neighbour.h:542 [inline] ip6_finish_output2+0x23a9/0x2b30 net/ipv6/ip6_output.c:137 ip6_finish_output+0x855/0x12b0 net/ipv6/ip6_output.c:222 NF_HOOK_COND include/linux/netfilter.h:303 [inline] ip6_output+0x323/0x610 net/ipv6/ip6_output.c:243 dst_output include/net/dst.h:451 [inline] ip6_local_out+0xe9/0x140 net/ipv6/output_core.c:155 ip6_send_skb net/ipv6/ip6_output.c:1952 [inline] ip6_push_pending_frames+0x1f9/0x560 net/ipv6/ip6_output.c:1972 rawv6_push_pending_frames+0xbe8/0xdf0 net/ipv6/raw.c:582 rawv6_sendmsg+0x2b66/0x2e70 net/ipv6/raw.c:920 inet_sendmsg+0x105/0x190 net/ipv4/af_inet.c:847 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg net/socket.c:745 [inline] ____sys_sendmsg+0x9c2/0xd60 net/socket.c:2584 ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2638 __sys_sendmsg net/socket.c:2667 [inline] __do_sys_sendmsg net/socket.c:2676 [inline] __se_sys_sendmsg net/socket.c:2674 [inline] __x64_sys_sendmsg+0x307/0x490 net/socket.c:2674 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x44/0x110 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x63/0x6b Uninit was created at: slab_post_alloc_hook+0x129/0xa70 mm/slab.h:768 slab_alloc_node mm/slub.c:3478 [inline] __kmem_cache_alloc_node+0x5c9/0x970 mm/slub.c:3517 __do_kmalloc_node mm/slab_common.c:1006 [inline] __kmalloc_node_track_caller+0x118/0x3c0 mm/slab_common.c:1027 kmalloc_reserve+0x249/0x4a0 net/core/skbuff.c:582 pskb_expand_head+0x226/0x1a00 net/core/skbuff.c:2098 __pskb_pull_tail+0x13b/0x2310 net/core/skbuff.c:2655 pskb_may_pull_reason include/linux/skbuff.h:2673 [inline] pskb_may_pull include/linux/skbuff.h:2681 [inline] ip6_tnl_parse_tlv_enc_lim+0x901/0xbb0 net/ipv6/ip6_tunnel.c:408 ipxip6_tnl_xmit net/ipv6/ip6_tunnel.c:1326 [inline] ip6_tnl_start_xmit+0xab2/0x1a70 net/ipv6/ip6_tunnel.c:1432 __netdev_start_xmit include/linux/netdevice.h:4940 [inline] netdev_start_xmit include/linux/netdevice.h:4954 [inline] xmit_one net/core/dev.c:3548 [inline] dev_hard_start_xmit+0x247/0xa10 net/core/dev.c:3564 __dev_queue_xmit+0x33b8/0x5130 net/core/dev.c:4349 dev_queue_xmit include/linux/netdevice.h:3134 [inline] neigh_connected_output+0x569/0x660 net/core/neighbour.c:1592 neigh_output include/net/neighbour.h:542 [inline] ip6_finish_output2+0x23a9/0x2b30 net/ipv6/ip6_output.c:137 ip6_finish_output+0x855/0x12b0 net/ipv6/ip6_output.c:222 NF_HOOK_COND include/linux/netfilter.h:303 [inline] ip6_output+0x323/0x610 net/ipv6/ip6_output.c:243 dst_output include/net/dst.h:451 [inline] ip6_local_out+0xe9/0x140 net/ipv6/output_core.c:155 ip6_send_skb net/ipv6/ip6_output.c:1952 [inline] ip6_push_pending_frames+0x1f9/0x560 net/ipv6/ip6_output.c:1972 rawv6_push_pending_frames+0xbe8/0xdf0 net/ipv6/raw.c:582 rawv6_sendmsg+0x2b66/0x2e70 net/ipv6/raw.c:920 inet_sendmsg+0x105/0x190 net/ipv4/af_inet.c:847 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg net/socket.c:745 [inline] ____sys_sendmsg+0x9c2/0xd60 net/socket.c:2584 ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2638 __sys_sendmsg net/socket.c:2667 [inline] __do_sys_sendmsg net/socket.c:2676 [inline] __se_sys_sendmsg net/socket.c:2674 [inline] __x64_sys_sendmsg+0x307/0x490 net/socket.c:2674 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x44/0x110 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x63/0x6b CPU: 0 PID: 7345 Comm: syz-executor.3 Not tainted 6.7.0-rc8-syzkaller-00024-gac865f00af29 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023 Fixes: fbfa743a9d2a ("ipv6: fix ip6_tnl_parse_tlv_enc_lim()") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-01net/ipv6: Revert remove expired routes with a separated list of routesDavid Ahern2-52/+9
[ Upstream commit dade3f6a1e4e35a5ae916d5e78b3229ec34c78ec ] This reverts commit 3dec89b14d37ee635e772636dad3f09f78f1ab87. The commit has some race conditions given how expires is managed on a fib6_info in relation to gc start, adding the entry to the gc list and setting the timer value leading to UAF. Revert the commit and try again in a later release. Fixes: 3dec89b14d37 ("net/ipv6: Remove expired routes with a separated list of routes") Cc: Kui-Feng Lee <thinker.li@gmail.com> Signed-off-by: David Ahern <dsahern@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20231219030243.25687-1-dsahern@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-12-20net: ipv6: support reporting otherwise unknown prefix flags in RTM_NEWPREFIXMaciej Żenczykowski1-5/+1
[ Upstream commit bd4a816752bab609dd6d65ae021387beb9e2ddbd ] Lorenzo points out that we effectively clear all unknown flags from PIO when copying them to userspace in the netlink RTM_NEWPREFIX notification. We could fix this one at a time as new flags are defined, or in one fell swoop - I choose the latter. We could either define 6 new reserved flags (reserved1..6) and handle them individually (and rename them as new flags are defined), or we could simply copy the entire unmodified byte over - I choose the latter. This unfortunately requires some anonymous union/struct magic, so we add a static assert on the struct size for a little extra safety. Cc: David Ahern <dsahern@kernel.org> Cc: Lorenzo Colitti <lorenzo@google.com> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Maciej Żenczykowski <maze@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-12-13ipv6: fix potential NULL deref in fib6_add()Eric Dumazet1-5/+1
[ Upstream commit 75475bb51e78a3f54ad2f69380f2a1c985e85f2d ] If fib6_find_prefix() returns NULL, we should silently fallback using fib6_null_entry regardless of RT6_DEBUG value. syzbot reported: WARNING: CPU: 0 PID: 5477 at net/ipv6/ip6_fib.c:1516 fib6_add+0x310d/0x3fa0 net/ipv6/ip6_fib.c:1516 Modules linked in: CPU: 0 PID: 5477 Comm: syz-executor.0 Not tainted 6.7.0-rc2-syzkaller-00029-g9b6de136b5f0 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/10/2023 RIP: 0010:fib6_add+0x310d/0x3fa0 net/ipv6/ip6_fib.c:1516 Code: 00 48 8b 54 24 68 e8 42 22 00 00 48 85 c0 74 14 49 89 c6 e8 d5 d3 c2 f7 eb 5d e8 ce d3 c2 f7 e9 ca 00 00 00 e8 c4 d3 c2 f7 90 <0f> 0b 90 48 b8 00 00 00 00 00 fc ff df 48 8b 4c 24 38 80 3c 01 00 RSP: 0018:ffffc90005067740 EFLAGS: 00010293 RAX: ffffffff89cba5bc RBX: ffffc90005067ab0 RCX: ffff88801a2e9dc0 RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000 RBP: ffffc90005067980 R08: ffffffff89cbca85 R09: 1ffff110040d4b85 R10: dffffc0000000000 R11: ffffed10040d4b86 R12: 00000000ffffffff R13: 1ffff110051c3904 R14: ffff8880206a5c00 R15: ffff888028e1c820 FS: 00007f763783c6c0(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f763783bff8 CR3: 000000007f74d000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> __ip6_ins_rt net/ipv6/route.c:1303 [inline] ip6_route_add+0x88/0x120 net/ipv6/route.c:3847 ipv6_route_ioctl+0x525/0x7b0 net/ipv6/route.c:4467 inet6_ioctl+0x21a/0x270 net/ipv6/af_inet6.c:575 sock_do_ioctl+0x152/0x460 net/socket.c:1220 sock_ioctl+0x615/0x8c0 net/socket.c:1339 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:871 [inline] __se_sys_ioctl+0xf8/0x170 fs/ioctl.c:857 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x45/0x110 arch/x86/entry/common.c:82 Fixes: 7bbfe00e0252 ("ipv6: fix general protection fault in fib6_add()") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Wei Wang <weiwan@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20231129160630.3509216-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-20dccp/tcp: Call security_inet_conn_request() after setting IPv6 addresses.Kuniyuki Iwashima1-3/+4
[ Upstream commit 23be1e0e2a83a8543214d2599a31d9a2185a796b ] Initially, commit 4237c75c0a35 ("[MLSXFRM]: Auto-labeling of child sockets") introduced security_inet_conn_request() in some functions where reqsk is allocated. The hook is added just after the allocation, so reqsk's IPv6 remote address was not initialised then. However, SELinux/Smack started to read it in netlbl_req_setattr() after commit e1adea927080 ("calipso: Allow request sockets to be relabelled by the lsm."). Commit 284904aa7946 ("lsm: Relocate the IPv4 security_inet_conn_request() hooks") fixed that kind of issue only in TCPv4 because IPv6 labeling was not supported at that time. Finally, the same issue was introduced again in IPv6. Let's apply the same fix on DCCPv6 and TCPv6. Fixes: e1adea927080 ("calipso: Allow request sockets to be relabelled by the lsm.") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-20ipv6: avoid atomic fragment on GSO packetsYan Zhai1-1/+7
[ Upstream commit 03d6c848bfb406e9ef6d9846d759e97beaeea113 ] When the ipv6 stack output a GSO packet, if its gso_size is larger than dst MTU, then all segments would be fragmented. However, it is possible for a GSO packet to have a trailing segment with smaller actual size than both gso_size as well as the MTU, which leads to an "atomic fragment". Atomic fragments are considered harmful in RFC-8021. An Existing report from APNIC also shows that atomic fragments are more likely to be dropped even it is equivalent to a no-op [1]. Add an extra check in the GSO slow output path. For each segment from the original over-sized packet, if it fits with the path MTU, then avoid generating an atomic fragment. Link: https://www.potaroo.net/presentations/2022-03-01-ipv6-frag.pdf [1] Fixes: b210de4f8c97 ("net: ipv6: Validate GSO SKB before finish IPv6 processing") Reported-by: David Wragg <dwragg@cloudflare.com> Signed-off-by: Yan Zhai <yan@cloudflare.com> Link: https://lore.kernel.org/r/90912e3503a242dca0bc36958b11ed03a2696e5e.1698156966.git.yan@cloudflare.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-20udplite: fix various data-racesEric Dumazet1-4/+5
[ Upstream commit 882af43a0fc37e26d85fb0df0c9edd3bed928de4 ] udp->pcflag, udp->pcslen and udp->pcrlen reads/writes are racy. Move udp->pcflag to udp->udp_flags for atomicity, and add READ_ONCE()/WRITE_ONCE() annotations for pcslen and pcrlen. Fixes: ba4e58eca8aa ("[NET]: Supporting UDP-Lite (RFC 3828) in Linux") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-20udplite: remove UDPLITE_BITEric Dumazet1-1/+0
[ Upstream commit 729549aa350c56a777bb342941ed4d69b6585769 ] This flag is set but never read, we can remove it. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Stable-dep-of: 882af43a0fc3 ("udplite: fix various data-races") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-20udp: annotate data-races around udp->encap_typeEric Dumazet2-4/+5
[ Upstream commit 70a36f571362a8de8b8c02d21ae524fc776287f2 ] syzbot/KCSAN complained about UDP_ENCAP_L2TPINUDP setsockopt() racing. Add READ_ONCE()/WRITE_ONCE() to document races on this lockless field. syzbot report was: BUG: KCSAN: data-race in udp_lib_setsockopt / udp_lib_setsockopt read-write to 0xffff8881083603fa of 1 bytes by task 16557 on cpu 0: udp_lib_setsockopt+0x682/0x6c0 udp_setsockopt+0x73/0xa0 net/ipv4/udp.c:2779 sock_common_setsockopt+0x61/0x70 net/core/sock.c:3697 __sys_setsockopt+0x1c9/0x230 net/socket.c:2263 __do_sys_setsockopt net/socket.c:2274 [inline] __se_sys_setsockopt net/socket.c:2271 [inline] __x64_sys_setsockopt+0x66/0x80 net/socket.c:2271 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd read-write to 0xffff8881083603fa of 1 bytes by task 16554 on cpu 1: udp_lib_setsockopt+0x682/0x6c0 udp_setsockopt+0x73/0xa0 net/ipv4/udp.c:2779 sock_common_setsockopt+0x61/0x70 net/core/sock.c:3697 __sys_setsockopt+0x1c9/0x230 net/socket.c:2263 __do_sys_setsockopt net/socket.c:2274 [inline] __se_sys_setsockopt net/socket.c:2271 [inline] __x64_sys_setsockopt+0x66/0x80 net/socket.c:2271 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd value changed: 0x01 -> 0x05 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 16554 Comm: syz-executor.5 Not tainted 6.5.0-rc7-syzkaller-00004-gf7757129e3de #0 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-20udp: lockless UDP_ENCAP_L2TPINUDP / UDP_GROEric Dumazet1-1/+1
[ Upstream commit ac9a7f4ce5dda1472e8f44096f33066c6ec1a3b4 ] Move udp->encap_enabled to udp->udp_flags. Add udp_test_and_set_bit() helper to allow lockless udp_tunnel_encap_enable() implementation. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Stable-dep-of: 70a36f571362 ("udp: annotate data-races around udp->encap_type") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-20udp: move udp->gro_enabled to udp->udp_flagsEric Dumazet1-1/+1
[ Upstream commit e1dc0615c6b08ef36414f08c011965b8fb56198b ] syzbot reported that udp->gro_enabled can be read locklessly. Use one atomic bit from udp->udp_flags. Fixes: e20cf8d3f1f7 ("udp: implement GRO for plain UDP sockets.") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-20udp: move udp->no_check6_rx to udp->udp_flagsEric Dumazet1-3/+3
[ Upstream commit bcbc1b1de884647aa0318bf74eb7f293d72a1e40 ] syzbot reported that udp->no_check6_rx can be read locklessly. Use one atomic bit from udp->udp_flags. Fixes: 1c19448c9ba6 ("net: Make enabling of zero UDP6 csums more restrictive") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-20udp: move udp->no_check6_tx to udp->udp_flagsEric Dumazet1-2/+2
[ Upstream commit a0002127cd746fcaa182ad3386ef6931c37f3bda ] syzbot reported that udp->no_check6_tx can be read locklessly. Use one atomic bit from udp->udp_flags Fixes: 1c19448c9ba6 ("net: Make enabling of zero UDP6 csums more restrictive") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-20udp: introduce udp->udp_flagsEric Dumazet1-3/+3
[ Upstream commit 81b36803ac139827538ac5ce4028e750a3c53f53 ] According to syzbot, it is time to use proper atomic flags for various UDP flags. Add udp_flags field, and convert udp->corkflag to first bit in it. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Stable-dep-of: a0002127cd74 ("udp: move udp->no_check6_tx to udp->udp_flags") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-25net: ipv6: fix typo in commentsDeming Wang1-1/+1
The word "advertize" should be replaced by "advertise". Signed-off-by: Deming Wang <wangdeming@inspur.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-18Merge tag 'ipsec-2023-10-17' of ↵Jakub Kicinski2-3/+5
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2023-10-17 1) Fix a slab-use-after-free in xfrm_policy_inexact_list_reinsert. From Dong Chenchen. 2) Fix data-races in the xfrm interfaces dev->stats fields. From Eric Dumazet. 3) Fix a data-race in xfrm_gen_index. From Eric Dumazet. 4) Fix an inet6_dev refcount underflow. From Zhang Changzhong. 5) Check the return value of pskb_trim in esp_remove_trailer for esp4 and esp6. From Ma Ke. 6) Fix a data-race in xfrm_lookup_with_ifid. From Eric Dumazet. * tag 'ipsec-2023-10-17' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec: xfrm: fix a data-race in xfrm_lookup_with_ifid() net: ipv4: fix return value check in esp_remove_trailer net: ipv6: fix return value check in esp_remove_trailer xfrm6: fix inet6_dev refcount underflow problem xfrm: fix a data-race in xfrm_gen_index() xfrm: interface: use DEV_STATS_INC() net: xfrm: skip policies marked as dead while reinserting policies ==================== Link: https://lore.kernel.org/r/20231017083723.1364940-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-10net: ipv6: fix return value check in esp_remove_trailerMa Ke1-1/+3
In esp_remove_trailer(), to avoid an unexpected result returned by pskb_trim, we should check the return value of pskb_trim(). Signed-off-by: Ma Ke <make_ruc2021@163.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2023-10-03ipv6: tcp: add a missing nf_reset_ct() in 3WHS handlingIlya Maximets1-3/+7
Commit b0e214d21203 ("netfilter: keep conntrack reference until IPsecv6 policy checks are done") is a direct copy of the old commit b59c270104f0 ("[NETFILTER]: Keep conntrack reference until IPsec policy checks are done") but for IPv6. However, it also copies a bug that this old commit had. That is: when the third packet of 3WHS connection establishment contains payload, it is added into socket receive queue without the XFRM check and the drop of connection tracking context. That leads to nf_conntrack module being impossible to unload as it waits for all the conntrack references to be dropped while the packet release is deferred in per-cpu cache indefinitely, if not consumed by the application. The issue for IPv4 was fixed in commit 6f0012e35160 ("tcp: add a missing nf_reset_ct() in 3WHS handling") by adding a missing XFRM check and correctly dropping the conntrack context. However, the issue was introduced to IPv6 code afterwards. Fixing it the same way for IPv6 now. Fixes: b0e214d21203 ("netfilter: keep conntrack reference until IPsecv6 policy checks are done") Link: https://lore.kernel.org/netdev/d589a999-d4dd-2768-b2d5-89dec64a4a42@ovn.org/ Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Florian Westphal <fw@strlen.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230922210530.2045146-1-i.maximets@ovn.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-20xfrm6: fix inet6_dev refcount underflow problemZhang Changzhong1-2/+2
There are race conditions that may lead to inet6_dev refcount underflow in xfrm6_dst_destroy() and rt6_uncached_list_flush_dev(). One of the refcount underflow bugs is shown below: (cpu 1) | (cpu 2) xfrm6_dst_destroy() | ... | in6_dev_put() | | rt6_uncached_list_flush_dev() ... | ... | in6_dev_put() rt6_uncached_list_del() | ... ... | xfrm6_dst_destroy() calls rt6_uncached_list_del() after in6_dev_put(), so rt6_uncached_list_flush_dev() has a chance to call in6_dev_put() again for the same inet6_dev. Fix it by moving in6_dev_put() after rt6_uncached_list_del() in xfrm6_dst_destroy(). Fixes: 510c321b5571 ("xfrm: reuse uncached_list to track xdsts") Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2023-09-08Merge tag 'net-6.6-rc1' of ↵Linus Torvalds8-9/+10
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking updates from Jakub Kicinski: "Including fixes from netfilter and bpf. Current release - regressions: - eth: stmmac: fix failure to probe without MAC interface specified Current release - new code bugs: - docs: netlink: fix missing classic_netlink doc reference Previous releases - regressions: - deal with integer overflows in kmalloc_reserve() - use sk_forward_alloc_get() in sk_get_meminfo() - bpf_sk_storage: fix the missing uncharge in sk_omem_alloc - fib: avoid warn splat in flow dissector after packet mangling - skb_segment: call zero copy functions before using skbuff frags - eth: sfc: check for zero length in EF10 RX prefix Previous releases - always broken: - af_unix: fix msg_controllen test in scm_pidfd_recv() for MSG_CMSG_COMPAT - xsk: fix xsk_build_skb() dereferencing possible ERR_PTR() - netfilter: - nft_exthdr: fix non-linear header modification - xt_u32, xt_sctp: validate user space input - nftables: exthdr: fix 4-byte stack OOB write - nfnetlink_osf: avoid OOB read - one more fix for the garbage collection work from last release - igmp: limit igmpv3_newpack() packet size to IP_MAX_MTU - bpf, sockmap: fix preempt_rt splat when using raw_spin_lock_t - handshake: fix null-deref in handshake_nl_done_doit() - ip: ignore dst hint for multipath routes to ensure packets are hashed across the nexthops - phy: micrel: - correct bit assignments for cable test errata - disable EEE according to the KSZ9477 errata Misc: - docs/bpf: document compile-once-run-everywhere (CO-RE) relocations - Revert "net: macsec: preserve ingress frame ordering", it appears to have been developed against an older kernel, problem doesn't exist upstream" * tag 'net-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (95 commits) net: enetc: distinguish error from valid pointers in enetc_fixup_clear_rss_rfs() Revert "net: team: do not use dynamic lockdep key" net: hns3: remove GSO partial feature bit net: hns3: fix the port information display when sfp is absent net: hns3: fix invalid mutex between tc qdisc and dcb ets command issue net: hns3: fix debugfs concurrency issue between kfree buffer and read net: hns3: fix byte order conversion issue in hclge_dbg_fd_tcam_read() net: hns3: Support query tx timeout threshold by debugfs net: hns3: fix tx timeout issue net: phy: Provide Module 4 KSZ9477 errata (DS80000754C) netfilter: nf_tables: Unbreak audit log reset netfilter: ipset: add the missing IP_SET_HASH_WITH_NET0 macro for ip_set_hash_netportnet.c netfilter: nft_set_rbtree: skip sync GC for new elements in this transaction netfilter: nf_tables: uapi: Describe NFTA_RULE_CHAIN_ID netfilter: nfnetlink_osf: avoid OOB read netfilter: nftables: exthdr: fix 4-byte stack OOB write selftests/bpf: Check bpf_sk_storage has uncharged sk_omem_alloc bpf: bpf_sk_storage: Fix the missing uncharge in sk_omem_alloc bpf: bpf_sk_storage: Fix invalid wait context lockdep report s390/bpf: Pass through tail call counter in trampolines ...
2023-09-04net: ipv6/addrconf: avoid integer underflow in ipv6_create_tempaddrAlex Henrie1-1/+1
The existing code incorrectly casted a negative value (the result of a subtraction) to an unsigned value without checking. For example, if /proc/sys/net/ipv6/conf/*/temp_prefered_lft was set to 1, the preferred lifetime would jump to 4 billion seconds. On my machine and network the shortest lifetime that avoided underflow was 3 seconds. Fixes: 76506a986dc3 ("IPv6: fix DESYNC_FACTOR") Signed-off-by: Alex Henrie <alexhenrie24@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-01ipv6: ignore dst hint for multipath routesSriram Yagnaraman2-1/+5
Route hints when the nexthop is part of a multipath group causes packets in the same receive batch to be sent to the same nexthop irrespective of the multipath hash of the packet. So, do not extract route hint for packets whose destination is part of a multipath group. A new SKB flag IP6SKB_MULTIPATH is introduced for this purpose, set the flag when route is looked up in fib6_select_path() and use it in ip6_can_use_hint() to check for the existence of the flag. Fixes: 197dbf24e360 ("ipv6: introduce and uses route look hints for list input.") Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-01net: annotate data-races around sk->sk_tsflagsEric Dumazet4-4/+4
sk->sk_tsflags can be read locklessly, add corresponding annotations. Fixes: b9f40e21ef42 ("net-timestamp: move timestamp flags out of sk_flags") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-30net: ipv4, ipv6: fix IPSTATS_MIB_OUTOCTETS increment duplicatedHeng Guo2-3/+0
commit edf391ff1723 ("snmp: add missing counters for RFC 4293") had already added OutOctets for RFC 4293. In commit 2d8dbb04c63e ("snmp: fix OutOctets counter to include forwarded datagrams"), OutOctets was counted again, but not removed from ip_output(). According to RFC 4293 "3.2.3. IP Statistics Tables", ipipIfStatsOutTransmits is not equal to ipIfStatsOutForwDatagrams. So "IPSTATS_MIB_OUTOCTETS must be incremented when incrementing" is not accurate. And IPSTATS_MIB_OUTOCTETS should be counted after fragment. This patch reverts commit 2d8dbb04c63e ("snmp: fix OutOctets counter to include forwarded datagrams") and move IPSTATS_MIB_OUTOCTETS to ip_finish_output2 for ipv4. Reviewed-by: Filip Pudak <filip.pudak@windriver.com> Signed-off-by: Heng Guo <heng.guo@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-30Merge tag 'sysctl-6.6-rc1' of ↵Linus Torvalds7-9/+33
git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux Pull sysctl updates from Luis Chamberlain: "Long ago we set out to remove the kitchen sink on kernel/sysctl.c arrays and placings sysctls to their own sybsystem or file to help avoid merge conflicts. Matthew Wilcox pointed out though that if we're going to do that we might as well also *save* space while at it and try to remove the extra last sysctl entry added at the end of each array, a sentintel, instead of bloating the kernel by adding a new sentinel with each array moved. Doing that was not so trivial, and has required slowing down the moves of kernel/sysctl.c arrays and measuring the impact on size by each new move. The complex part of the effort to help reduce the size of each sysctl is being done by the patient work of el señor Don Joel Granados. A lot of this is truly painful code refactoring and testing and then trying to measure the savings of each move and removing the sentinels. Although Joel already has code which does most of this work, experience with sysctl moves in the past shows is we need to be careful due to the slew of odd build failures that are possible due to the amount of random Kconfig options sysctls use. To that end Joel's work is split by first addressing the major housekeeping needed to remove the sentinels, which is part of this merge request. The rest of the work to actually remove the sentinels will be done later in future kernel releases. The preliminary math is showing this will all help reduce the overall build time size of the kernel and run time memory consumed by the kernel by about ~64 bytes per array where we are able to remove each sentinel in the future. That also means there is no more bloating the kernel with the extra ~64 bytes per array moved as no new sentinels are created" * tag 'sysctl-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: sysctl: Use ctl_table_size as stopping criteria for list macro sysctl: SIZE_MAX->ARRAY_SIZE in register_net_sysctl vrf: Update to register_net_sysctl_sz networking: Update to register_net_sysctl_sz netfilter: Update to register_net_sysctl_sz ax.25: Update to register_net_sysctl_sz sysctl: Add size to register_net_sysctl function sysctl: Add size arg to __register_sysctl_init sysctl: Add size to register_sysctl sysctl: Add a size arg to __register_sysctl_table sysctl: Add size argument to init_header sysctl: Add ctl_table_size to ctl_table_header sysctl: Use ctl_table_header in list_for_each_table_entry sysctl: Prefer ctl_table_header in proc_sysctl
2023-08-26Merge tag 'for-netdev' of ↵Jakub Kicinski1-1/+1
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2023-08-25 We've added 87 non-merge commits during the last 8 day(s) which contain a total of 104 files changed, 3719 insertions(+), 4212 deletions(-). The main changes are: 1) Add multi uprobe BPF links for attaching multiple uprobes and usdt probes, which is significantly faster and saves extra fds, from Jiri Olsa. 2) Add support BPF cpu v4 instructions for arm64 JIT compiler, from Xu Kuohai. 3) Add support BPF cpu v4 instructions for riscv64 JIT compiler, from Pu Lehui. 4) Fix LWT BPF xmit hooks wrt their return values where propagating the result from skb_do_redirect() would trigger a use-after-free, from Yan Zhai. 5) Fix a BPF verifier issue related to bpf_kptr_xchg() with local kptr where the map's value kptr type and locally allocated obj type mismatch, from Yonghong Song. 6) Fix BPF verifier's check_func_arg_reg_off() function wrt graph root/node which bypassed reg->off == 0 enforcement, from Kumar Kartikeya Dwivedi. 7) Lift BPF verifier restriction in networking BPF programs to treat comparison of packet pointers not as a pointer leak, from Yafang Shao. 8) Remove unmaintained XDP BPF samples as they are maintained in xdp-tools repository out of tree, from Toke Høiland-Jørgensen. 9) Batch of fixes for the tracing programs from BPF samples in order to make them more libbpf-aware, from Daniel T. Lee. 10) Fix a libbpf signedness determination bug in the CO-RE relocation handling logic, from Andrii Nakryiko. 11) Extend libbpf to support CO-RE kfunc relocations. Also follow-up fixes for bpf_refcount shared ownership implementation, both from Dave Marchevsky. 12) Add a new bpf_object__unpin() API function to libbpf, from Daniel Xu. 13) Fix a memory leak in libbpf to also free btf_vmlinux when the bpf_object gets closed, from Hao Luo. 14) Small error output improvements to test_bpf module, from Helge Deller. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (87 commits) selftests/bpf: Add tests for rbtree API interaction in sleepable progs bpf: Allow bpf_spin_{lock,unlock} in sleepable progs bpf: Consider non-owning refs to refcounted nodes RCU protected bpf: Reenable bpf_refcount_acquire bpf: Use bpf_mem_free_rcu when bpf_obj_dropping refcounted nodes bpf: Consider non-owning refs trusted bpf: Ensure kptr_struct_meta is non-NULL for collection insert and refcount_acquire selftests/bpf: Enable cpu v4 tests for RV64 riscv, bpf: Support unconditional bswap insn riscv, bpf: Support signed div/mod insns riscv, bpf: Support 32-bit offset jmp insn riscv, bpf: Support sign-extension mov insns riscv, bpf: Support sign-extension load insns riscv, bpf: Fix missing exception handling and redundant zext for LDX_B/H/W samples/bpf: Add note to README about the XDP utilities moved to xdp-tools samples/bpf: Cleanup .gitignore samples/bpf: Remove the xdp_sample_pkts utility samples/bpf: Remove the xdp1 and xdp2 utilities samples/bpf: Remove the xdp_rxq_info utility samples/bpf: Remove the xdp_redirect* utilities ... ==================== Link: https://lore.kernel.org/r/20230825194319.12727-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22net: remove unnecessary input parameter 'how' in ifdown functionZhengchao Shao2-8/+3
When the ifdown function in the dst_ops structure is referenced, the input parameter 'how' is always true. In the current implementation of the ifdown interface, ip6_dst_ifdown does not use the input parameter 'how', xfrm6_dst_ifdown and xfrm4_dst_ifdown functions use the input parameter 'unregister'. But false judgment on 'unregister' in xfrm6_dst_ifdown and xfrm4_dst_ifdown is false, so remove the input parameter 'how' in ifdown function. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230821084104.3812233-1-shaozhengchao@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-08-20ipv6: do not match device when remove source routeHangbin Liu1-5/+2
After deleting an IPv6 address on an interface and cleaning up the related preferred source entries, it is important to ensure that all routes associated with the deleted address are properly cleared. The current implementation of rt6_remove_prefsrc() only checks the preferred source addresses bound to the current device. However, there may be routes that are bound to other devices but still utilize the same preferred source address. To address this issue, it is necessary to also delete entries that are bound to other interfaces but share the same source address with the current device. Failure to delete these entries would leave routes that are bound to the deleted address unclear. Here is an example reproducer (I have omitted unrelated routes): + ip link add dummy1 type dummy + ip link add dummy2 type dummy + ip link set dummy1 up + ip link set dummy2 up + ip addr add 1:2:3:4::5/64 dev dummy1 + ip route add 7:7:7:0::1 dev dummy1 src 1:2:3:4::5 + ip route add 7:7:7:0::2 dev dummy2 src 1:2:3:4::5 + ip -6 route show 1:2:3:4::/64 dev dummy1 proto kernel metric 256 pref medium 7:7:7::1 dev dummy1 src 1:2:3:4::5 metric 1024 pref medium 7:7:7::2 dev dummy2 src 1:2:3:4::5 metric 1024 pref medium + ip addr del 1:2:3:4::5/64 dev dummy1 + ip -6 route show 7:7:7::1 dev dummy1 metric 1024 pref medium 7:7:7::2 dev dummy2 src 1:2:3:4::5 metric 1024 pref medium As Ido reminds, in IPv6, the preferred source address is looked up in the same VRF as the first nexthop device, which is different with IPv4. So, while removing the device checking, we also need to add an ipv6_chk_addr() check to make sure the address does not exist on the other devices of the rt nexthop device's VRF. After fix: + ip addr del 1:2:3:4::5/64 dev dummy1 + ip -6 route show 7:7:7::1 dev dummy1 metric 1024 pref medium 7:7:7::2 dev dummy2 metric 1024 pref medium Reported-by: Thomas Haller <thaller@redhat.com> Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2170513 Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-20net: release reference to inet6_dev pointerPatrick Rohr1-1/+1
addrconf_prefix_rcv returned early without releasing the inet6_dev pointer when the PIO lifetime is less than accept_ra_min_lft. Fixes: 5027d54a9c30 ("net: change accept_ra_min_rtr_lft to affect all RA lifetimes") Cc: Maciej Żenczykowski <maze@google.com> Cc: Lorenzo Colitti <lorenzo@google.com> Cc: David Ahern <dsahern@kernel.org> Cc: Simon Horman <horms@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: Patrick Rohr <prohr@google.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-20net: selectively purge error queue in IP_RECVERR / IPV6_RECVERREric Dumazet1-1/+1
Setting IP_RECVERR and IPV6_RECVERR options to zero currently purges the socket error queue, which was probably not expected for zerocopy and tx_timestamp users. I discovered this issue while preparing commit 6b5f43ea0815 ("inet: move inet->recverr to inet->inet_flags"), I presume this change does not need to be backported to stable kernels. Add skb_errqueue_purge() helper to purge error messages only. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2-3/+3
Cross-merge networking fixes after downstream PR. Conflicts: drivers/net/ethernet/sfc/tc.c fa165e194997 ("sfc: don't unregister flow_indr if it was never registered") 3bf969e88ada ("sfc: add MAE table machinery for conntrack table") https://lore.kernel.org/all/20230818112159.7430e9b4@canb.auug.org.au/ No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-18lwt: Check LWTUNNEL_XMIT_CONTINUE strictlyYan Zhai1-1/+1
LWTUNNEL_XMIT_CONTINUE is implicitly assumed in ip(6)_finish_output2, such that any positive return value from a xmit hook could cause unexpected continue behavior, despite that related skb may have been freed. This could be error-prone for future xmit hook ops. One of the possible errors is to return statuses of dst_output directly. To make the code safer, redefine LWTUNNEL_XMIT_CONTINUE value to distinguish from dst_output statuses and check the continue condition explicitly. Fixes: 3a0af8fd61f9 ("bpf: BPF for lightweight tunnel infrastructure") Suggested-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Yan Zhai <yan@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/96b939b85eda00e8df4f7c080f770970a4c5f698.1692326837.git.yan@cloudflare.com
2023-08-16net/ipv6: Remove expired routes with a separated list of routes.Kui-Feng Lee2-9/+52
FIB6 GC walks trees of fib6_tables to remove expired routes. Walking a tree can be expensive if the number of routes in a table is big, even if most of them are permanent. Checking routes in a separated list of routes having expiration will avoid this potential issue. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-16inet: move inet->bind_address_no_port to inet->inet_flagsEric Dumazet1-1/+1
IP_BIND_ADDRESS_NO_PORT socket option can now be set/read without locking the socket. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-16inet: move inet->is_icsk to inet->inet_flagsEric Dumazet2-3/+3
We move single bit fields to inet->inet_flags to avoid races. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-16inet: move inet->transparent to inet->inet_flagsEric Dumazet1-2/+2
IP_TRANSPARENT socket option can now be set/read without locking the socket. v2: removed unused issk variable in mptcp_setsockopt_sol_ip_set_transparent() v4: rebased after commit 3f326a821b99 ("mptcp: change the mpc check helper to return a sk") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-16inet: move inet->mc_loop to inet->inet_fragsEric Dumazet1-1/+1
IP_MULTICAST_LOOP socket option can now be set/read without locking the socket. v3: fix build bot error reported in ipvs set_mcast_loop() Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-16inet: move inet->hdrincl to inet->inet_flagsEric Dumazet3-14/+9
IP_HDRINCL socket option can now be set/read without locking the socket. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>