summaryrefslogtreecommitdiff
path: root/include/net
AgeCommit message (Collapse)AuthorFilesLines
2023-10-06netfilter: nf_tables: fix memleak when more than 255 elements expiredFlorian Westphal1-1/+1
commit cf5000a7787cbc10341091d37245a42c119d26c5 upstream. When more than 255 elements expired we're supposed to switch to a new gc container structure. This never happens: u8 type will wrap before reaching the boundary and nft_trans_gc_space() always returns true. This means we recycle the initial gc container structure and lose track of the elements that came before. While at it, don't deref 'gc' after we've passed it to call_rcu. Fixes: 5f68718b34a5 ("netfilter: nf_tables: GC transaction API to avoid race with control plane") Reported-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-06netfilter: nft_set_pipapo: call nft_trans_gc_queue_sync() in catchall GCPablo Neira Ayuso1-2/+3
commit 4a9e12ea7e70223555ec010bec9f711089ce96f6 upstream. pipapo needs to enqueue GC transactions for catchall elements through nft_trans_gc_queue_sync(). Add nft_trans_gc_catchall_sync() and nft_trans_gc_catchall_async() to handle GC transaction queueing accordingly. Fixes: 5f68718b34a5 ("netfilter: nf_tables: GC transaction API to avoid race with control plane") Fixes: f6c383b8c31a ("netfilter: nf_tables: adapt set backend to use GC transaction API") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-19tcp: Fix bind() regression for v4-mapped-v6 wildcard address.Kuniyuki Iwashima1-0/+5
[ Upstream commit aa99e5f87bd54db55dd37cb130bd5eb55933027f ] Andrei Vagin reported bind() regression with strace logs. If we bind() a TCPv6 socket to ::FFFF:0.0.0.0 and then bind() a TCPv4 socket to 127.0.0.1, the 2nd bind() should fail but now succeeds. from socket import * s1 = socket(AF_INET6, SOCK_STREAM) s1.bind(('::ffff:0.0.0.0', 0)) s2 = socket(AF_INET, SOCK_STREAM) s2.bind(('127.0.0.1', s1.getsockname()[1])) During the 2nd bind(), if tb->family is AF_INET6 and sk->sk_family is AF_INET in inet_bind2_bucket_match_addr_any(), we still need to check if tb has the v4-mapped-v6 wildcard address. The example above does not work after commit 5456262d2baa ("net: Fix incorrect address comparison when searching for a bind2 bucket"), but the blamed change is not the commit. Before the commit, the leading zeros of ::FFFF:0.0.0.0 were treated as 0.0.0.0, and the sequence above worked by chance. Technically, this case has been broken since bhash2 was introduced. Note that if we bind() two sockets to 127.0.0.1 and then ::FFFF:0.0.0.0, the 2nd bind() fails properly because we fall back to using bhash to detect conflicts for the v4-mapped-v6 address. Fixes: 28044fc1d495 ("net: Add a bhash2 table hashed by port and address") Reported-by: Andrei Vagin <avagin@google.com> Closes: https://lore.kernel.org/netdev/ZPuYBOFC8zsK6r9T@google.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-19ipv6: fix ip6_sock_set_addr_preferences() typoEric Dumazet1-1/+1
[ Upstream commit 8cdd9f1aaedf823006449faa4e540026c692ac43 ] ip6_sock_set_addr_preferences() second argument should be an integer. SUNRPC attempts to set IPV6_PREFER_SRC_PUBLIC were translated to IPV6_PREFER_SRC_TMP Fixes: 18d5ad623275 ("ipv6: add ip6_sock_set_addr_preferences") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230911154213.713941-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-19ip_tunnels: use DEV_STATS_INC()Eric Dumazet1-8/+7
[ Upstream commit 9b271ebaf9a2c5c566a54bc6cd915962e8241130 ] syzbot/KCSAN reported data-races in iptunnel_xmit_stats() [1] This can run from multiple cpus without mutual exclusion. Adopt SMP safe DEV_STATS_INC() to update dev->stats fields. [1] BUG: KCSAN: data-race in iptunnel_xmit / iptunnel_xmit read-write to 0xffff8881353df170 of 8 bytes by task 30263 on cpu 1: iptunnel_xmit_stats include/net/ip_tunnels.h:493 [inline] iptunnel_xmit+0x432/0x4a0 net/ipv4/ip_tunnel_core.c:87 ip_tunnel_xmit+0x1477/0x1750 net/ipv4/ip_tunnel.c:831 __gre_xmit net/ipv4/ip_gre.c:469 [inline] ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:662 __netdev_start_xmit include/linux/netdevice.h:4889 [inline] netdev_start_xmit include/linux/netdevice.h:4903 [inline] xmit_one net/core/dev.c:3544 [inline] dev_hard_start_xmit+0x11b/0x3f0 net/core/dev.c:3560 __dev_queue_xmit+0xeee/0x1de0 net/core/dev.c:4340 dev_queue_xmit include/linux/netdevice.h:3082 [inline] __bpf_tx_skb net/core/filter.c:2129 [inline] __bpf_redirect_no_mac net/core/filter.c:2159 [inline] __bpf_redirect+0x723/0x9c0 net/core/filter.c:2182 ____bpf_clone_redirect net/core/filter.c:2453 [inline] bpf_clone_redirect+0x16c/0x1d0 net/core/filter.c:2425 ___bpf_prog_run+0xd7d/0x41e0 kernel/bpf/core.c:1954 __bpf_prog_run512+0x74/0xa0 kernel/bpf/core.c:2195 bpf_dispatcher_nop_func include/linux/bpf.h:1181 [inline] __bpf_prog_run include/linux/filter.h:609 [inline] bpf_prog_run include/linux/filter.h:616 [inline] bpf_test_run+0x15d/0x3d0 net/bpf/test_run.c:423 bpf_prog_test_run_skb+0x77b/0xa00 net/bpf/test_run.c:1045 bpf_prog_test_run+0x265/0x3d0 kernel/bpf/syscall.c:3996 __sys_bpf+0x3af/0x780 kernel/bpf/syscall.c:5353 __do_sys_bpf kernel/bpf/syscall.c:5439 [inline] __se_sys_bpf kernel/bpf/syscall.c:5437 [inline] __x64_sys_bpf+0x43/0x50 kernel/bpf/syscall.c:5437 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd read-write to 0xffff8881353df170 of 8 bytes by task 30249 on cpu 0: iptunnel_xmit_stats include/net/ip_tunnels.h:493 [inline] iptunnel_xmit+0x432/0x4a0 net/ipv4/ip_tunnel_core.c:87 ip_tunnel_xmit+0x1477/0x1750 net/ipv4/ip_tunnel.c:831 __gre_xmit net/ipv4/ip_gre.c:469 [inline] ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:662 __netdev_start_xmit include/linux/netdevice.h:4889 [inline] netdev_start_xmit include/linux/netdevice.h:4903 [inline] xmit_one net/core/dev.c:3544 [inline] dev_hard_start_xmit+0x11b/0x3f0 net/core/dev.c:3560 __dev_queue_xmit+0xeee/0x1de0 net/core/dev.c:4340 dev_queue_xmit include/linux/netdevice.h:3082 [inline] __bpf_tx_skb net/core/filter.c:2129 [inline] __bpf_redirect_no_mac net/core/filter.c:2159 [inline] __bpf_redirect+0x723/0x9c0 net/core/filter.c:2182 ____bpf_clone_redirect net/core/filter.c:2453 [inline] bpf_clone_redirect+0x16c/0x1d0 net/core/filter.c:2425 ___bpf_prog_run+0xd7d/0x41e0 kernel/bpf/core.c:1954 __bpf_prog_run512+0x74/0xa0 kernel/bpf/core.c:2195 bpf_dispatcher_nop_func include/linux/bpf.h:1181 [inline] __bpf_prog_run include/linux/filter.h:609 [inline] bpf_prog_run include/linux/filter.h:616 [inline] bpf_test_run+0x15d/0x3d0 net/bpf/test_run.c:423 bpf_prog_test_run_skb+0x77b/0xa00 net/bpf/test_run.c:1045 bpf_prog_test_run+0x265/0x3d0 kernel/bpf/syscall.c:3996 __sys_bpf+0x3af/0x780 kernel/bpf/syscall.c:5353 __do_sys_bpf kernel/bpf/syscall.c:5439 [inline] __se_sys_bpf kernel/bpf/syscall.c:5437 [inline] __x64_sys_bpf+0x43/0x50 kernel/bpf/syscall.c:5437 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd value changed: 0x0000000000018830 -> 0x0000000000018831 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 30249 Comm: syz-executor.4 Not tainted 6.5.0-syzkaller-11704-g3f86ed6ec0b3 #0 Fixes: 039f50629b7f ("ip_tunnel: Move stats update to iptunnel_xmit()") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-19af_unix: Fix msg_controllen test in scm_pidfd_recv() for MSG_CMSG_COMPAT.Kuniyuki Iwashima1-5/+9
[ Upstream commit 718e6b51298e0f254baca0d40ab52a00e004e014 ] Heiko Carstens reported that SCM_PIDFD does not work with MSG_CMSG_COMPAT because scm_pidfd_recv() always checks msg_controllen against sizeof(struct cmsghdr). We need to use sizeof(struct compat_cmsghdr) for the compat case. Fixes: 5e2ff6704a27 ("scm: add SO_PASSPIDFD and SCM_PIDFD") Reported-by: Heiko Carstens <hca@linux.ibm.com> Closes: https://lore.kernel.org/netdev/20230901200517.8742-A-hca@linux.ibm.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Tested-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-19ipv4: ignore dst hint for multipath routesSriram Yagnaraman1-0/+1
[ Upstream commit 6ac66cb03ae306c2e288a9be18226310529f5b25 ] Route hints when the nexthop is part of a multipath group causes packets in the same receive batch to be sent to the same nexthop irrespective of the multipath hash of the packet. So, do not extract route hint for packets whose destination is part of a multipath group. A new SKB flag IPSKB_MULTIPATH is introduced for this purpose, set the flag when route is looked up in ip_mkroute_input() and use it in ip_extract_route_hint() to check for the existence of the flag. Fixes: 02b24941619f ("ipv4: use dst hint for ipv4 list receive") Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-19net: annotate data-races around sk->sk_tsflagsEric Dumazet2-8/+11
[ Upstream commit e3390b30a5dfb112e8e802a59c0f68f947b638b2 ] sk->sk_tsflags can be read locklessly, add corresponding annotations. Fixes: b9f40e21ef42 ("net-timestamp: move timestamp flags out of sk_flags") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-19net: annotate data-races around sk->sk_forward_allocEric Dumazet1-3/+9
[ Upstream commit 5e6300e7b3a4ab5b72a82079753868e91fbf9efc ] Every time sk->sk_forward_alloc is read locklessly, add a READ_ONCE(). Add sk_forward_alloc_add() helper to centralize updates, to reduce number of WRITE_ONCE(). Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-19net: fib: avoid warn splat in flow dissectorFlorian Westphal2-2/+8
[ Upstream commit 8aae7625ff3f0bd5484d01f1b8d5af82e44bec2d ] New skbs allocated via nf_send_reset() have skb->dev == NULL. fib*_rules_early_flow_dissect helpers already have a 'struct net' argument but its not passed down to the flow dissector core, which will then WARN as it can't derive a net namespace to use: WARNING: CPU: 0 PID: 0 at net/core/flow_dissector.c:1016 __skb_flow_dissect+0xa91/0x1cd0 [..] ip_route_me_harder+0x143/0x330 nf_send_reset+0x17c/0x2d0 [nf_reject_ipv4] nft_reject_inet_eval+0xa9/0xf2 [nft_reject_inet] nft_do_chain+0x198/0x5d0 [nf_tables] nft_do_chain_inet+0xa4/0x110 [nf_tables] nf_hook_slow+0x41/0xc0 ip_local_deliver+0xce/0x110 .. Cc: Stanislav Fomichev <sdf@google.com> Cc: David Ahern <dsahern@kernel.org> Cc: Ido Schimmel <idosch@nvidia.com> Fixes: 812fa71f0d96 ("netfilter: Dissect flow after packet mangling") Link: https://bugzilla.kernel.org/show_bug.cgi?id=217826 Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20230830110043.30497-1-fw@strlen.de Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-13Bluetooth: HCI: Introduce HCI_QUIRK_BROKEN_LE_CODEDLuiz Augusto von Dentz2-1/+13
[ Upstream commit 253f3399f4c09ce6f4e67350f839be0361b4d5ff ] This introduces HCI_QUIRK_BROKEN_LE_CODED which is used to indicate that LE Coded PHY shall not be used, it is then set for some Intel models that claim to support it but when used causes many problems. Cc: stable@vger.kernel.org # 6.4.y+ Link: https://github.com/bluez/bluez/issues/577 Link: https://github.com/bluez/bluez/issues/582 Link: https://lore.kernel.org/linux-bluetooth/CABBYNZKco-v7wkjHHexxQbgwwSz-S=GZ=dZKbRE1qxT1h4fFbQ@mail.gmail.com/T/# Fixes: 288c90224eec ("Bluetooth: Enable all supported LE PHY by default") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-13Bluetooth: msft: Extended monitor tracking by address filterHilda Wu1-0/+10
[ Upstream commit 9e14606d8f38ea52a38c27692a9c1513c987a5da ] Since limited tracking device per condition, this feature is to support tracking multiple devices concurrently. When a pattern monitor detects the device, this feature issues an address monitor for tracking that device. Let pattern monitor can keep monitor new devices. This feature adds an address filter when receiving a LE monitor device event which monitor handle is for a pattern, and the controller started monitoring the device. And this feature also has cancelled the monitor advertisement from address filters when receiving a LE monitor device event when the controller stopped monitoring the device specified by an address and monitor handle. Below is an example to know the feature adds the address filter. //Add MSFT pattern monitor < HCI Command: Vendor (0x3f|0x00f0) plen 14 #142 [hci0] 55.552420 03 b8 a4 03 ff 01 01 06 09 05 5f 52 45 46 .........._REF > HCI Event: Command Complete (0x0e) plen 6 #143 [hci0] 55.653960 Vendor (0x3f|0x00f0) ncmd 2 Status: Success (0x00) 03 00 //Got event from the pattern monitor > HCI Event: Vendor (0xff) plen 18 #148 [hci0] 58.384953 23 79 54 33 77 88 97 68 02 00 fb c1 29 eb 27 b8 #yT3w..h....).'. 00 01 .. //Add MSFT address monitor (Sample address: B8:27:EB:29:C1:FB) < HCI Command: Vendor (0x3f|0x00f0) plen 13 #149 [hci0] 58.385067 03 b8 a4 03 ff 04 00 fb c1 29 eb 27 b8 .........).'. //Report to userspace about found device (ADV Monitor Device Found) @ MGMT Event: Unknown (0x002f) plen 38 {0x0003} [hci0] 58.680042 01 00 fb c1 29 eb 27 b8 01 ce 00 00 00 00 16 00 ....).'......... 0a 09 4b 45 59 42 44 5f 52 45 46 02 01 06 03 19 ..KEYBD_REF..... c1 03 03 03 12 18 ...... //Got event from address monitor > HCI Event: Vendor (0xff) plen 18 #152 [hci0] 58.672956 23 79 54 33 77 88 97 68 02 00 fb c1 29 eb 27 b8 #yT3w..h....).'. 01 01 Signed-off-by: Alex Lu <alex_lu@realsil.com.cn> Signed-off-by: Hilda Wu <hildawu@realtek.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Stable-dep-of: 253f3399f4c0 ("Bluetooth: HCI: Introduce HCI_QUIRK_BROKEN_LE_CODED") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-13Bluetooth: ISO: Notify user space about failed bis connectionsIulia Tanasescu1-0/+25
[ Upstream commit f777d88278170410b06a1f6633f3b9375a4ddd6b ] Some use cases require the user to be informed if BIG synchronization fails. This commit makes it so that even if the BIG sync established event arrives with error status, a new hconn is added for each BIS, and the iso layer is notified about the failed connections. Unsuccesful bis connections will be marked using the HCI_CONN_BIG_SYNC_FAILED flag. From the iso layer, the POLLERR event is triggered on the newly allocated bis sockets, before adding them to the accept list of the parent socket. From user space, a new fd for each failed bis connection will be obtained by calling accept. The user should check for the POLLERR event on the new socket, to determine if the connection was successful or not. The HCI_CONN_BIG_SYNC flag has been added to mark whether the BIG sync has been successfully established. This flag is checked at bis cleanup, so the HCI LE BIG Terminate Sync command is only issued if needed. The BT_SK_BIG_SYNC flag indicates if BIG create sync has been called for a listening socket, to avoid issuing the command everytime a BIGInfo advertising report is received. Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Stable-dep-of: 94d9ba9f9888 ("Bluetooth: hci_sync: Fix UAF in hci_disconnect_all_sync") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-13Bluetooth: hci_conn: Consolidate code for aborting connectionsLuiz Augusto von Dentz1-1/+1
[ Upstream commit a13f316e90fdb1fb6df6582e845aa9b3270f3581 ] This consolidates code for aborting connections using hci_cmd_sync_queue so it is synchronized with other threads, but because of the fact that some commands may block the cmd_sync_queue while waiting specific events this attempt to cancel those requests by using hci_cmd_sync_cancel. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Stable-dep-of: 94d9ba9f9888 ("Bluetooth: hci_sync: Fix UAF in hci_disconnect_all_sync") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-13mac80211: make ieee80211_tx_info padding explicitArnd Bergmann1-1/+3
[ Upstream commit a7a2ef0c4b3efbd7d6f3fabd87dbbc0b3f2de5af ] While looking at a bug, I got rather confused by the layout of the 'status' field in ieee80211_tx_info. Apparently, the intention is that status_driver_data[] is used for driver specific data, and fills up the size of the union to 40 bytes, just like the other ones. This is indeed what actually happens, but only because of the combination of two mistakes: - "void *status_driver_data[18 / sizeof(void *)];" is intended to be 18 bytes long but is actually two bytes shorter because of rounding-down in the division, to a multiple of the pointer size (4 bytes or 8 bytes). - The other fields combined are intended to be 22 bytes long, but are actually 24 bytes because of padding in front of the unaligned tx_time member, and in front of the pointer array. The two mistakes cancel out. so the size ends up fine, but it seems more helpful to make this explicit, by having a multiple of 8 bytes in the size calculation and explicitly describing the padding. Fixes: ea5907db2a9cc ("mac80211: fix struct ieee80211_tx_info size") Fixes: 02219b3abca59 ("mac80211: add WMM admission control support") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230623152443.2296825-2-arnd@kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-13lwt: Check LWTUNNEL_XMIT_CONTINUE strictlyYan Zhai1-1/+4
[ Upstream commit a171fbec88a2c730b108c7147ac5e7b2f5a02b47 ] LWTUNNEL_XMIT_CONTINUE is implicitly assumed in ip(6)_finish_output2, such that any positive return value from a xmit hook could cause unexpected continue behavior, despite that related skb may have been freed. This could be error-prone for future xmit hook ops. One of the possible errors is to return statuses of dst_output directly. To make the code safer, redefine LWTUNNEL_XMIT_CONTINUE value to distinguish from dst_output statuses and check the continue condition explicitly. Fixes: 3a0af8fd61f9 ("bpf: BPF for lightweight tunnel infrastructure") Suggested-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Yan Zhai <yan@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/96b939b85eda00e8df4f7c080f770970a4c5f698.1692326837.git.yan@cloudflare.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-13Bluetooth: hci_conn: Always allocate unique handlesLuiz Augusto von Dentz1-1/+1
[ Upstream commit 9f78191cc9f1b34c2e2afd7b554a83bf034092dd ] This attempts to always allocate a unique handle for connections so they can be properly aborted by the likes of hci_abort_conn, so this uses the invalid range as a pool of unset handles that way if userspace is trying to create multiple connections at once each will be given a unique handle which will be considered unset. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Stable-dep-of: 66dee21524d9 ("Bluetooth: hci_event: drop only unbound CIS if Set CIG Parameters fails") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-13Bluetooth: ISO: do not emit new LE Create CIS if previous is pendingPauli Virtanen2-2/+4
[ Upstream commit 7f74563e6140e42b4ffae62adbef7a65967a3f98 ] LE Create CIS command shall not be sent before all CIS Established events from its previous invocation have been processed. Currently it is sent via hci_sync but that only waits for the first event, but there can be multiple. Make it wait for all events, and simplify the CIS creation as follows: Add new flag HCI_CONN_CREATE_CIS, which is set if Create CIS has been sent for the connection but it is not yet completed. Make BT_CONNECT state to mean the connection wants Create CIS. On events after which new Create CIS may need to be sent, send it if possible and some connections need it. These events are: hci_connect_cis, iso_connect_cfm, hci_cs_le_create_cis, hci_le_cis_estabilished_evt. The Create CIS status/completion events shall queue new Create CIS only if at least one of the connections transitions away from BT_CONNECT, so that we don't loop if controller is sending bogus events. This fixes sending multiple CIS Create for the same CIS in the "ISO AC 6(i) - Success" BlueZ test case: < HCI Command: LE Create Co.. (0x08|0x0064) plen 9 #129 [hci0] Number of CIS: 2 CIS Handle: 257 ACL Handle: 42 CIS Handle: 258 ACL Handle: 42 > HCI Event: Command Status (0x0f) plen 4 #130 [hci0] LE Create Connected Isochronous Stream (0x08|0x0064) ncmd 1 Status: Success (0x00) > HCI Event: LE Meta Event (0x3e) plen 29 #131 [hci0] LE Connected Isochronous Stream Established (0x19) Status: Success (0x00) Connection Handle: 257 ... < HCI Command: LE Setup Is.. (0x08|0x006e) plen 13 #132 [hci0] ... > HCI Event: Command Complete (0x0e) plen 6 #133 [hci0] LE Setup Isochronous Data Path (0x08|0x006e) ncmd 1 ... < HCI Command: LE Create Co.. (0x08|0x0064) plen 5 #134 [hci0] Number of CIS: 1 CIS Handle: 258 ACL Handle: 42 > HCI Event: Command Status (0x0f) plen 4 #135 [hci0] LE Create Connected Isochronous Stream (0x08|0x0064) ncmd 1 Status: ACL Connection Already Exists (0x0b) > HCI Event: LE Meta Event (0x3e) plen 29 #136 [hci0] LE Connected Isochronous Stream Established (0x19) Status: Success (0x00) Connection Handle: 258 ... Fixes: c09b80be6ffc ("Bluetooth: hci_conn: Fix not waiting for HCI_EVT_LE_CIS_ESTABLISHED") Signed-off-by: Pauli Virtanen <pav@iki.fi> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-13Bluetooth: ISO: Add support for connecting multiple BISesIulia Tanasescu1-0/+30
[ Upstream commit a0bfde167b506423111ddb8cd71930497a40fc54 ] It is required for some configurations to have multiple BISes as part of the same BIG. Similar to the flow implemented for unicast, DEFER_SETUP will also be used to bind multiple BISes for the same BIG, before starting Periodic Advertising and creating the BIG. The user will have to open a new socket for each BIS. By setting the BT_DEFER_SETUP socket option and calling connect, a new connection will be added for the BIG and advertising handle set by the socket QoS parameters. Since all BISes will be bound for the same BIG and advertising handle, the socket QoS options and base parameters should match for all connections. By calling connect on a socket that does not have the BT_DEFER_SETUP option set, periodic advertising will be started and the BIG will be created, with a BIS for each previously bound connection. Since a BIG cannot be reconfigured with additional BISes after creation, no more connections can be bound for the BIG after the start periodic advertising and create BIG commands have been queued. The bis_cleanup function has also been updated, so that the advertising set and the BIG will not be terminated unless there are no more bound or connected BISes. The HCI_CONN_BIG_CREATED connection flag has been added to indicate that the BIG has been successfully created. This flag is checked at bis_cleanup, so that the BIG is only terminated if the HCI_LE_Create_BIG_Complete has been received. This implementation has been tested on hardware, using the "isotest" tool with an additional command line option, to specify the number of BISes to create as part of the desired BIG: tools/isotest -i hci0 -s 00:00:00:00:00:00 -N 2 -G 1 -T 1 The btmon log shows that a BIG containing 2 BISes has been created: < HCI Command: LE Create Broadcast Isochronous Group (0x08|0x0068) plen 31 Handle: 0x01 Advertising Handle: 0x01 Number of BIS: 2 SDU Interval: 10000 us (0x002710) Maximum SDU size: 40 Maximum Latency: 10 ms (0x000a) RTN: 0x02 PHY: LE 2M (0x02) Packing: Sequential (0x00) Framing: Unframed (0x00) Encryption: 0x00 Broadcast Code: 00000000000000000000000000000000 > HCI Event: Command Status (0x0f) plen 4 LE Create Broadcast Isochronous Group (0x08|0x0068) ncmd 1 Status: Success (0x00) > HCI Event: LE Meta Event (0x3e) plen 23 LE Broadcast Isochronous Group Complete (0x1b) Status: Success (0x00) Handle: 0x01 BIG Synchronization Delay: 1974 us (0x0007b6) Transport Latency: 1974 us (0x0007b6) PHY: LE 2M (0x02) NSE: 3 BN: 1 PTO: 1 IRC: 3 Maximum PDU: 40 ISO Interval: 10.00 msec (0x0008) Connection Handle #0: 10 Connection Handle #1: 11 < HCI Command: LE Setup Isochronous Data Path (0x08|0x006e) plen 13 Handle: 10 Data Path Direction: Input (Host to Controller) (0x00) Data Path: HCI (0x00) Coding Format: Transparent (0x03) Company Codec ID: Ericsson Technology Licensing (0) Vendor Codec ID: 0 Controller Delay: 0 us (0x000000) Codec Configuration Length: 0 Codec Configuration: > HCI Event: Command Complete (0x0e) plen 6 LE Setup Isochronous Data Path (0x08|0x006e) ncmd 1 Status: Success (0x00) Handle: 10 < HCI Command: LE Setup Isochronous Data Path (0x08|0x006e) plen 13 Handle: 11 Data Path Direction: Input (Host to Controller) (0x00) Data Path: HCI (0x00) Coding Format: Transparent (0x03) Company Codec ID: Ericsson Technology Licensing (0) Vendor Codec ID: 0 Controller Delay: 0 us (0x000000) Codec Configuration Length: 0 Codec Configuration: > HCI Event: Command Complete (0x0e) plen 6 LE Setup Isochronous Data Path (0x08|0x006e) ncmd 1 Status: Success (0x00) Handle: 11 < ISO Data TX: Handle 10 flags 0x02 dlen 44 < ISO Data TX: Handle 11 flags 0x02 dlen 44 > HCI Event: Number of Completed Packets (0x13) plen 5 Num handles: 1 Handle: 10 Count: 1 > HCI Event: Number of Completed Packets (0x13) plen 5 Num handles: 1 Handle: 11 Count: 1 Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Stable-dep-of: 7f74563e6140 ("Bluetooth: ISO: do not emit new LE Create CIS if previous is pending") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-13tcp: tcp_enter_quickack_mode() should be staticEric Dumazet1-1/+0
[ Upstream commit 03b123debcbc8db987bda17ed8412cc011064c22 ] After commit d2ccd7bc8acd ("tcp: avoid resetting ACK timer in DCTCP"), tcp_enter_quickack_mode() is only used from net/ipv4/tcp_input.c. Fixes: d2ccd7bc8acd ("tcp: avoid resetting ACK timer in DCTCP") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Link: https://lore.kernel.org/r/20230718162049.1444938-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-02ipv6: remove hard coded limitation on ipv6_pinfoEric Dumazet1-0/+1
commit f5f80e32de12fad2813d37270e8364a03e6d3ef0 upstream. IPv6 inet sockets are supposed to have a "struct ipv6_pinfo" field at the end of their definition, so that inet6_sk_generic() can derive from socket size the offset of the "struct ipv6_pinfo". This is very fragile, and prevents adding bigger alignment in sockets, because inet6_sk_generic() does not work if the compiler adds padding after the ipv6_pinfo component. We are currently working on a patch series to reorganize TCP structures for better data locality and found issues similar to the one fixed in commit f5d547676ca0 ("tcp: fix tcp_inet6_sk() for 32bit kernels") Alternative would be to force an alignment on "struct ipv6_pinfo", greater or equal to __alignof__(any ipv6 sock) to ensure there is no padding. This does not look great. v2: fix typo in mptcp_proto_v6_init() (Paolo) Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Chao Wu <wwchao@google.com> Cc: Wei Wang <weiwan@google.com> Cc: Coco Li <lixiaoyan@google.com> Cc: YiFei Zhu <zhuyifei@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-24Merge tag 'nf-23-08-23' of ↵Paolo Abeni1-0/+6
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Florian Westphal says: ==================== netfilter updates for net This PR contains nf_tables updates for your *net* tree. First patch fixes table validation, I broke this in 6.4 when tracking validation state per table, reported by Pablo, fixup from myself. Second patch makes sure objects waiting for memory release have been released, this was broken in 6.1, patch from Pablo Neira Ayuso. Patch three is a fix-for-fix from previous PR: In case a transaction gets aborted, gc sequence counter needs to be incremented so pending gc requests are invalidated, from Pablo. Same for patch 4: gc list needs to use gc list lock, not destroy lock, also from Pablo. Patch 5 fixes a UaF in a set backend, but this should only occur when failslab is enabled for GFP_KERNEL allocations, broken since feature was added in 5.6, from myself. Patch 6 fixes a double-free bug that was also added via previous PR: We must not schedule gc work if the previous batch is still queued. netfilter pull request 2023-08-23 * tag 'nf-23-08-23' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_tables: defer gc run if previous batch is still pending netfilter: nf_tables: fix out of memory error handling netfilter: nf_tables: use correct lock to protect gc_list netfilter: nf_tables: GC transaction race with abort path netfilter: nf_tables: flush pending destroy work before netlink notifier netfilter: nf_tables: validate all pending tables ==================== Link: https://lore.kernel.org/r/20230823152711.15279-1-fw@strlen.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-08-24bonding: fix macvlan over alb bond supportHangbin Liu1-10/+1
The commit 14af9963ba1e ("bonding: Support macvlans on top of tlb/rlb mode bonds") aims to enable the use of macvlans on top of rlb bond mode. However, the current rlb bond mode only handles ARP packets to update remote neighbor entries. This causes an issue when a macvlan is on top of the bond, and remote devices send packets to the macvlan using the bond's MAC address as the destination. After delivering the packets to the macvlan, the macvlan will rejects them as the MAC address is incorrect. Consequently, this commit makes macvlan over bond non-functional. To address this problem, one potential solution is to check for the presence of a macvlan port on the bond device using netif_is_macvlan_port(bond->dev) and return NULL in the rlb_arp_xmit() function. However, this approach doesn't fully resolve the situation when a VLAN exists between the bond and macvlan. So let's just do a partial revert for commit 14af9963ba1e in rlb_arp_xmit(). As the comment said, Don't modify or load balance ARPs that do not originate locally. Fixes: 14af9963ba1e ("bonding: Support macvlans on top of tlb/rlb mode bonds") Reported-by: susan.zheng@veritas.com Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2117816 Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-08-23netfilter: nf_tables: defer gc run if previous batch is still pendingFlorian Westphal1-0/+5
Don't queue more gc work, else we may queue the same elements multiple times. If an element is flagged as dead, this can mean that either the previous gc request was invalidated/discarded by a transaction or that the previous request is still pending in the system work queue. The latter will happen if the gc interval is set to a very low value, e.g. 1ms, and system work queue is backlogged. The sets refcount is 1 if no previous gc requeusts are queued, so add a helper for this and skip gc run if old requests are pending. Add a helper for this and skip the gc run in this case. Fixes: f6c383b8c31a ("netfilter: nf_tables: adapt set backend to use GC transaction API") Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-08-23netfilter: nf_tables: validate all pending tablesFlorian Westphal1-0/+1
We have to validate all tables in the transaction that are in VALIDATE_DO state, the blamed commit below did not move the break statement to its right location so we only validate one table. Moreover, we can't init table->validate to _SKIP when a table object is allocated. If we do, then if a transcaction creates a new table and then fails the transaction, nfnetlink will loop and nft will hang until user cancels the command. Add back the pernet state as a place to stash the last state encountered. This is either _DO (we hit an error during commit validation) or _SKIP (transaction passed all checks). Fixes: 00c320f9b755 ("netfilter: nf_tables: make validation state per table") Reported-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
2023-08-22Merge tag 'wireless-2023-08-22' of ↵Jakub Kicinski1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== Two fixes: - reorder buffer filter checks can cause bad shift/UBSAN warning with newer HW, avoid the check (mac80211) - add Kconfig dependency for iwlwifi for PTP clock usage * tag 'wireless-2023-08-22' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: wifi: mac80211: limit reorder_buf_filtered to avoid UBSAN warning wifi: iwlwifi: mvm: add dependency for PTP clock ==================== Link: https://lore.kernel.org/r/20230822124206.43926-2-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-21wifi: mac80211: limit reorder_buf_filtered to avoid UBSAN warningPing-Ke Shih1-0/+1
The commit 06470f7468c8 ("mac80211: add API to allow filtering frames in BA sessions") added reorder_buf_filtered to mark frames filtered by firmware, and it can only work correctly if hw.max_rx_aggregation_subframes <= 64 since it stores the bitmap in a u64 variable. However, new HE or EHT devices can support BlockAck number up to 256 or 1024, and then using a higher subframe index leads UBSAN warning: UBSAN: shift-out-of-bounds in net/mac80211/rx.c:1129:39 shift exponent 215 is too large for 64-bit type 'long long unsigned int' Call Trace: <IRQ> dump_stack_lvl+0x48/0x70 dump_stack+0x10/0x20 __ubsan_handle_shift_out_of_bounds+0x1ac/0x360 ieee80211_release_reorder_frame.constprop.0.cold+0x64/0x69 [mac80211] ieee80211_sta_reorder_release+0x9c/0x400 [mac80211] ieee80211_prepare_and_rx_handle+0x1234/0x1420 [mac80211] ieee80211_rx_list+0xaef/0xf60 [mac80211] ieee80211_rx_napi+0x53/0xd0 [mac80211] Since only old hardware that supports <=64 BlockAck uses ieee80211_mark_rx_ba_filtered_frames(), limit the use as it is, so add a WARN_ONCE() and comment to note to avoid using this function if hardware capability is not suitable. Signed-off-by: Ping-Ke Shih <pkshih@realtek.com> Link: https://lore.kernel.org/r/20230818014004.16177-1-pkshih@realtek.com [edit commit message] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-08-20ipv4: fix data-races around inet->inet_idEric Dumazet2-3/+14
UDP sendmsg() is lockless, so ip_select_ident_segs() can very well be run from multiple cpus [1] Convert inet->inet_id to an atomic_t, but implement a dedicated path for TCP, avoiding cost of a locked instruction (atomic_add_return()) Note that this patch will cause a trivial merge conflict because we added inet->flags in net-next tree. v2: added missing change in drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_cm.c (David Ahern) [1] BUG: KCSAN: data-race in __ip_make_skb / __ip_make_skb read-write to 0xffff888145af952a of 2 bytes by task 7803 on cpu 1: ip_select_ident_segs include/net/ip.h:542 [inline] ip_select_ident include/net/ip.h:556 [inline] __ip_make_skb+0x844/0xc70 net/ipv4/ip_output.c:1446 ip_make_skb+0x233/0x2c0 net/ipv4/ip_output.c:1560 udp_sendmsg+0x1199/0x1250 net/ipv4/udp.c:1260 inet_sendmsg+0x63/0x80 net/ipv4/af_inet.c:830 sock_sendmsg_nosec net/socket.c:725 [inline] sock_sendmsg net/socket.c:748 [inline] ____sys_sendmsg+0x37c/0x4d0 net/socket.c:2494 ___sys_sendmsg net/socket.c:2548 [inline] __sys_sendmmsg+0x269/0x500 net/socket.c:2634 __do_sys_sendmmsg net/socket.c:2663 [inline] __se_sys_sendmmsg net/socket.c:2660 [inline] __x64_sys_sendmmsg+0x57/0x60 net/socket.c:2660 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd read to 0xffff888145af952a of 2 bytes by task 7804 on cpu 0: ip_select_ident_segs include/net/ip.h:541 [inline] ip_select_ident include/net/ip.h:556 [inline] __ip_make_skb+0x817/0xc70 net/ipv4/ip_output.c:1446 ip_make_skb+0x233/0x2c0 net/ipv4/ip_output.c:1560 udp_sendmsg+0x1199/0x1250 net/ipv4/udp.c:1260 inet_sendmsg+0x63/0x80 net/ipv4/af_inet.c:830 sock_sendmsg_nosec net/socket.c:725 [inline] sock_sendmsg net/socket.c:748 [inline] ____sys_sendmsg+0x37c/0x4d0 net/socket.c:2494 ___sys_sendmsg net/socket.c:2548 [inline] __sys_sendmmsg+0x269/0x500 net/socket.c:2634 __do_sys_sendmmsg net/socket.c:2663 [inline] __se_sys_sendmmsg net/socket.c:2660 [inline] __x64_sys_sendmmsg+0x57/0x60 net/socket.c:2660 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd value changed: 0x184d -> 0x184e Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 7804 Comm: syz-executor.1 Not tainted 6.5.0-rc6-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023 ================================================================== Fixes: 23f57406b82d ("ipv4: avoid using shared IP generator for connected sockets") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-20net: validate veth and vxcan peer ifindexesJakub Kicinski1-2/+2
veth and vxcan need to make sure the ifindexes of the peer are not negative, core does not validate this. Using iproute2 with user-space-level checking removed: Before: # ./ip link add index 10 type veth peer index -1 # ip link show 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000 link/ether 52:54:00:74:b2:03 brd ff:ff:ff:ff:ff:ff 10: veth1@veth0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 8a:90:ff:57:6d:5d brd ff:ff:ff:ff:ff:ff -1: veth0@veth1: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether ae:ed:18:e6:fa:7f brd ff:ff:ff:ff:ff:ff Now: $ ./ip link add index 10 type veth peer index -1 Error: ifindex can't be negative. This problem surfaced in net-next because an explicit WARN() was added, the root cause is older. Fixes: e6f8f1a739b6 ("veth: Allow to create peer link with given ifindex") Fixes: a8f820a380a2 ("can: add Virtual CAN Tunnel driver (vxcan)") Reported-by: syzbot+5ba06978f34abb058571@syzkaller.appspotmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-19sock: annotate data-races around prot->memory_pressureEric Dumazet1-3/+4
*prot->memory_pressure is read/writen locklessly, we need to add proper annotations. A recent commit added a new race, it is time to audit all accesses. Fixes: 2d0c88e84e48 ("sock: Fix misuse of sk_under_memory_pressure()") Fixes: 4d93df0abd50 ("[SCTP]: Rewrite of sctp buffer management code") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Abel Wu <wuyun.abel@bytedance.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Link: https://lore.kernel.org/r/20230818015132.2699348-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-17sock: Fix misuse of sk_under_memory_pressure()Abel Wu1-0/+6
The status of global socket memory pressure is updated when: a) __sk_mem_raise_allocated(): enter: sk_memory_allocated(sk) > sysctl_mem[1] leave: sk_memory_allocated(sk) <= sysctl_mem[0] b) __sk_mem_reduce_allocated(): leave: sk_under_memory_pressure(sk) && sk_memory_allocated(sk) < sysctl_mem[0] So the conditions of leaving global pressure are inconstant, which may lead to the situation that one pressured net-memcg prevents the global pressure from being cleared when there is indeed no global pressure, thus the global constrains are still in effect unexpectedly on the other sockets. This patch fixes this by ignoring the net-memcg's pressure when deciding whether should leave global memory pressure. Fixes: e1aab161e013 ("socket: initial cgroup code.") Signed-off-by: Abel Wu <wuyun.abel@bytedance.com> Acked-by: Shakeel Butt <shakeelb@google.com> Link: https://lore.kernel.org/r/20230816091226.1542-1-wuyun.abel@bytedance.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-16Merge tag 'nf-23-08-16' of ↵David S. Miller1-0/+1
https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Florisn Westphal says: ==================== These are netfilter fixes for the *net* tree. First patch resolves a false-positive lockdep splat: rcu_dereference is used outside of rcu read lock. Let lockdep validate that the transaction mutex is locked. Second patch fixes a kdoc warning added in previous PR. Third patch fixes a memory leak: The catchall element isn't disabled correctly, this allows userspace to deactivate the element again. This results in refcount underflow which in turn prevents memory release. This was always broken since the feature was added in 5.13. Patch 4 fixes an incorrect change in the previous pull request: Adding a duplicate key to a set should work if the duplicate key has expired, restore this behaviour. All from myself. Patch #5 resolves an old historic artifact in sctp conntrack: a 300ms timeout for shutdown_ack. Increase this to 3s. From Xin Long. Patch #6 fixes a sysctl data race in ipvs, two threads can clobber the sysctl value, from Sishuai Gong. This is a day-0 bug that predates git history. Patches 7, 8 and 9, from Pablo Neira Ayuso, are also followups for the previous GC rework in nf_tables: The netlink notifier and the netns exit path must both increment the gc worker seqcount, else worker may encounter stale (free'd) pointers. ================ Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-16Merge tag 'ipsec-2023-08-15' of ↵David S. Miller1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== 1) Fix a slab-out-of-bounds read in xfrm_address_filter. From Lin Ma. 2) Fix the pfkey sadb_x_filter validation. From Lin Ma. 3) Use the correct nla_policy structure for XFRMA_SEC_CTX. From Lin Ma. 4) Fix warnings triggerable by bad packets in the encap functions. From Herbert Xu. 5) Fix some slab-use-after-free in decode_session6. From Zhengchao Shao. 6) Fix a possible NULL piointer dereference in xfrm_update_ae_params. Lin Ma. 7) Add a forgotten nla_policy for XFRMA_MTIMER_THRESH. From Lin Ma. 8) Don't leak offloaded policies. From Leon Romanovsky. 9) Delete also the offloading part of an acquire state. From Leon Romanovsky. Please pull or let me know if there are problems.
2023-08-16netfilter: nf_tables: fix kdoc warnings after gc reworkFlorian Westphal1-0/+1
Jakub Kicinski says: We've got some new kdoc warnings here: net/netfilter/nft_set_pipapo.c:1557: warning: Function parameter or member '_set' not described in 'pipapo_gc' net/netfilter/nft_set_pipapo.c:1557: warning: Excess function parameter 'set' description in 'pipapo_gc' include/net/netfilter/nf_tables.h:577: warning: Function parameter or member 'dead' not described in 'nft_set' Fixes: 5f68718b34a5 ("netfilter: nf_tables: GC transaction API to avoid race with control plane") Fixes: f6c383b8c31a ("netfilter: nf_tables: adapt set backend to use GC transaction API") Reported-by: Jakub Kicinski <kuba@kernel.org> Closes: https://lore.kernel.org/netdev/20230810104638.746e46f1@kernel.org/ Signed-off-by: Florian Westphal <fw@strlen.de>
2023-08-10Merge tag 'nf-23-08-10' of ↵Jakub Kicinski1-75/+45
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The existing attempt to resolve races between control plane and GC work is error prone, as reported by Bien Pham <phamnnb@sea.com>, some places forgot to call nft_set_elem_mark_busy(), leading to double-deactivation of elements. This series contains the following patches: 1) Do not skip expired elements during walk otherwise elements might never decrement the reference counter on data, leading to memleak. 2) Add a GC transaction API to replace the former attempt to deal with races between control plane and GC. GC worker sets on NFT_SET_ELEM_DEAD_BIT on elements and it creates a GC transaction to remove the expired elements, GC transaction could abort in case of interference with control plane and retried later (GC async). Set backends such as rbtree and pipapo also perform GC from control plane (GC sync), in such case, element deactivation and removal is safe because mutex is held then collected elements are released via call_rcu(). 3) Adapt existing set backends to use the GC transaction API. 4) Update rhash set backend to set on _DEAD bit to report deleted elements from datapath for GC. 5) Remove old GC batch API and the NFT_SET_ELEM_BUSY_BIT. * tag 'nf-23-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_tables: remove busy mark and gc batch API netfilter: nft_set_hash: mark set element as dead when deleting from packet path netfilter: nf_tables: adapt set backend to use GC transaction API netfilter: nf_tables: GC transaction API to avoid race with control plane netfilter: nf_tables: don't skip expired elements during walk ==================== Link: https://lore.kernel.org/r/20230810070830.24064-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-10netfilter: nf_tables: remove busy mark and gc batch APIPablo Neira Ayuso1-95/+3
Ditch it, it has been replace it by the GC transaction API and it has no clients anymore. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-08-10netfilter: nf_tables: GC transaction API to avoid race with control planePablo Neira Ayuso1-1/+63
The set types rhashtable and rbtree use a GC worker to reclaim memory. From system work queue, in periodic intervals, a scan of the table is done. The major caveat here is that the nft transaction mutex is not held. This causes a race between control plane and GC when they attempt to delete the same element. We cannot grab the netlink mutex from the work queue, because the control plane has to wait for the GC work queue in case the set is to be removed, so we get following deadlock: cpu 1 cpu2 GC work transaction comes in , lock nft mutex `acquire nft mutex // BLOCKS transaction asks to remove the set set destruction calls cancel_work_sync() cancel_work_sync will now block forever, because it is waiting for the mutex the caller already owns. This patch adds a new API that deals with garbage collection in two steps: 1) Lockless GC of expired elements sets on the NFT_SET_ELEM_DEAD_BIT so they are not visible via lookup. Annotate current GC sequence in the GC transaction. Enqueue GC transaction work as soon as it is full. If ruleset is updated, then GC transaction is aborted and retried later. 2) GC work grabs the mutex. If GC sequence has changed then this GC transaction lost race with control plane, abort it as it contains stale references to objects and let GC try again later. If the ruleset is intact, then this GC transaction deactivates and removes the elements and it uses call_rcu() to destroy elements. Note that no elements are removed from GC lockless path, the _DEAD bit is set and pointers are collected. GC catchall does not remove the elements anymore too. There is a new set->dead flag that is set on to abort the GC transaction to deal with set->ops->destroy() path which removes the remaining elements in the set from commit_release, where no mutex is held. To deal with GC when mutex is held, which allows safe deactivate and removal, add sync GC API which releases the set element object via call_rcu(). This is used by rbtree and pipapo backends which also perform garbage collection from control plane path. Since element removal from sets can happen from control plane and element garbage collection/timeout, it is necessary to keep the set structure alive until all elements have been deactivated and destroyed. We cannot do a cancel_work_sync or flush_work in nft_set_destroy because its called with the transaction mutex held, but the aforementioned async work queue might be blocked on the very mutex that nft_set_destroy() callchain is sitting on. This gives us the choice of ABBA deadlock or UaF. To avoid both, add set->refs refcount_t member. The GC API can then increment the set refcount and release it once the elements have been free'd. Set backends are adapted to use the GC transaction API in a follow up patch entitled: ("netfilter: nf_tables: use gc transaction API in set backends") This is joint work with Florian Westphal. Fixes: cfed7e1b1f8e ("netfilter: nf_tables: add set garbage collection helpers") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-08-10Merge tag 'wireless-2023-08-09' of ↵Jakub Kicinski1-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== Just a few small updates: * fix an integer overflow in nl80211 * fix rtw89 8852AE disconnections * fix a buffer overflow in ath12k * fix AP_VLAN configuration lookups * fix allocation failure handling in brcm80211 * update MAINTAINERS for some drivers * tag 'wireless-2023-08-09' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: wifi: ath12k: Fix buffer overflow when scanning with extraie wifi: nl80211: fix integer overflow in nl80211_parse_mbssid_elems() wifi: cfg80211: fix sband iftype data lookup for AP_VLAN wifi: rtw89: fix 8852AE disconnection caused by RX full flags MAINTAINERS: Remove tree entry for rtl8180 MAINTAINERS: Update entry for rtl8187 wifi: brcm80211: handle params_v1 allocation failure ==================== Link: https://lore.kernel.org/r/20230809124818.167432-2-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-08wifi: cfg80211: fix sband iftype data lookup for AP_VLANFelix Fietkau1-0/+3
AP_VLAN interfaces are virtual, so doesn't really exist as a type for capabilities. When passed in as a type, AP is the one that's really intended. Fixes: c4cbaf7973a7 ("cfg80211: Add support for HE") Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20230622165919.46841-1-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-08-02vxlan: Fix nexthop hash sizeBenjamin Poirier1-2/+2
The nexthop code expects a 31 bit hash, such as what is returned by fib_multipath_hash() and rt6_multipath_hash(). Passing the 32 bit hash returned by skb_get_hash() can lead to problems related to the fact that 'int hash' is a negative number when the MSB is set. In the case of hash threshold nexthop groups, nexthop_select_path_hthr() will disproportionately select the first nexthop group entry. In the case of resilient nexthop groups, nexthop_select_path_res() may do an out of bounds access in nh_buckets[], for example: hash = -912054133 num_nh_buckets = 2 bucket_index = 65535 which leads to the following panic: BUG: unable to handle page fault for address: ffffc900025910c8 PGD 100000067 P4D 100000067 PUD 10026b067 PMD 0 Oops: 0002 [#1] PREEMPT SMP KASAN NOPTI CPU: 4 PID: 856 Comm: kworker/4:3 Not tainted 6.5.0-rc2+ #34 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 Workqueue: ipv6_addrconf addrconf_dad_work RIP: 0010:nexthop_select_path+0x197/0xbf0 Code: c1 e4 05 be 08 00 00 00 4c 8b 35 a4 14 7e 01 4e 8d 6c 25 00 4a 8d 7c 25 08 48 01 dd e8 c2 25 15 ff 49 8d 7d 08 e8 39 13 15 ff <4d> 89 75 08 48 89 ef e8 7d 12 15 ff 48 8b 5d 00 e8 14 55 2f 00 85 RSP: 0018:ffff88810c36f260 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 00000000002000c0 RCX: ffffffffaf02dd77 RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffffc900025910c8 RBP: ffffc900025910c0 R08: 0000000000000001 R09: fffff520004b2219 R10: ffffc900025910cf R11: 31392d2068736168 R12: 00000000002000c0 R13: ffffc900025910c0 R14: 00000000fffef608 R15: ffff88811840e900 FS: 0000000000000000(0000) GS:ffff8881f7000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffc900025910c8 CR3: 0000000129d00000 CR4: 0000000000750ee0 PKRU: 55555554 Call Trace: <TASK> ? __die+0x23/0x70 ? page_fault_oops+0x1ee/0x5c0 ? __pfx_is_prefetch.constprop.0+0x10/0x10 ? __pfx_page_fault_oops+0x10/0x10 ? search_bpf_extables+0xfe/0x1c0 ? fixup_exception+0x3b/0x470 ? exc_page_fault+0xf6/0x110 ? asm_exc_page_fault+0x26/0x30 ? nexthop_select_path+0x197/0xbf0 ? nexthop_select_path+0x197/0xbf0 ? lock_is_held_type+0xe7/0x140 vxlan_xmit+0x5b2/0x2340 ? __lock_acquire+0x92b/0x3370 ? __pfx_vxlan_xmit+0x10/0x10 ? __pfx___lock_acquire+0x10/0x10 ? __pfx_register_lock_class+0x10/0x10 ? skb_network_protocol+0xce/0x2d0 ? dev_hard_start_xmit+0xca/0x350 ? __pfx_vxlan_xmit+0x10/0x10 dev_hard_start_xmit+0xca/0x350 __dev_queue_xmit+0x513/0x1e20 ? __pfx___dev_queue_xmit+0x10/0x10 ? __pfx_lock_release+0x10/0x10 ? mark_held_locks+0x44/0x90 ? skb_push+0x4c/0x80 ? eth_header+0x81/0xe0 ? __pfx_eth_header+0x10/0x10 ? neigh_resolve_output+0x215/0x310 ? ip6_finish_output2+0x2ba/0xc90 ip6_finish_output2+0x2ba/0xc90 ? lock_release+0x236/0x3e0 ? ip6_mtu+0xbb/0x240 ? __pfx_ip6_finish_output2+0x10/0x10 ? find_held_lock+0x83/0xa0 ? lock_is_held_type+0xe7/0x140 ip6_finish_output+0x1ee/0x780 ip6_output+0x138/0x460 ? __pfx_ip6_output+0x10/0x10 ? __pfx___lock_acquire+0x10/0x10 ? __pfx_ip6_finish_output+0x10/0x10 NF_HOOK.constprop.0+0xc0/0x420 ? __pfx_NF_HOOK.constprop.0+0x10/0x10 ? ndisc_send_skb+0x2c0/0x960 ? __pfx_lock_release+0x10/0x10 ? __local_bh_enable_ip+0x93/0x110 ? lock_is_held_type+0xe7/0x140 ndisc_send_skb+0x4be/0x960 ? __pfx_ndisc_send_skb+0x10/0x10 ? mark_held_locks+0x65/0x90 ? find_held_lock+0x83/0xa0 ndisc_send_ns+0xb0/0x110 ? __pfx_ndisc_send_ns+0x10/0x10 addrconf_dad_work+0x631/0x8e0 ? lock_acquire+0x180/0x3f0 ? __pfx_addrconf_dad_work+0x10/0x10 ? mark_held_locks+0x24/0x90 process_one_work+0x582/0x9c0 ? __pfx_process_one_work+0x10/0x10 ? __pfx_do_raw_spin_lock+0x10/0x10 ? mark_held_locks+0x24/0x90 worker_thread+0x93/0x630 ? __kthread_parkme+0xdc/0x100 ? __pfx_worker_thread+0x10/0x10 kthread+0x1a5/0x1e0 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x34/0x60 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1b/0x30 RIP: 0000:0x0 Code: Unable to access opcode bytes at 0xffffffffffffffd6. RSP: 0000:0000000000000000 EFLAGS: 00000000 ORIG_RAX: 0000000000000000 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 </TASK> Modules linked in: CR2: ffffc900025910c8 ---[ end trace 0000000000000000 ]--- RIP: 0010:nexthop_select_path+0x197/0xbf0 Code: c1 e4 05 be 08 00 00 00 4c 8b 35 a4 14 7e 01 4e 8d 6c 25 00 4a 8d 7c 25 08 48 01 dd e8 c2 25 15 ff 49 8d 7d 08 e8 39 13 15 ff <4d> 89 75 08 48 89 ef e8 7d 12 15 ff 48 8b 5d 00 e8 14 55 2f 00 85 RSP: 0018:ffff88810c36f260 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 00000000002000c0 RCX: ffffffffaf02dd77 RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffffc900025910c8 RBP: ffffc900025910c0 R08: 0000000000000001 R09: fffff520004b2219 R10: ffffc900025910cf R11: 31392d2068736168 R12: 00000000002000c0 R13: ffffc900025910c0 R14: 00000000fffef608 R15: ffff88811840e900 FS: 0000000000000000(0000) GS:ffff8881f7000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffffffffd6 CR3: 0000000129d00000 CR4: 0000000000750ee0 PKRU: 55555554 Kernel panic - not syncing: Fatal exception in interrupt Kernel Offset: 0x2ca00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]--- Fix this problem by ensuring the MSB of hash is 0 using a right shift - the same approach used in fib_multipath_hash() and rt6_multipath_hash(). Fixes: 1274e1cc4226 ("vxlan: ecmp support for mac fdb entries") Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-01xfrm: don't skip free of empty state in acquire policyLeon Romanovsky1-0/+1
In destruction flow, the assignment of NULL to xso->dev caused to skip of xfrm_dev_state_free() call, which was called in xfrm_state_put(to_put) routine. Instead of open-coded variant of xfrm_dev_state_delete() and xfrm_dev_state_free(), let's use them directly. Fixes: f8a70afafc17 ("xfrm: add TX datapath support for IPsec packet offload mode") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2023-07-29net: annotate data-races around sk->sk_markEric Dumazet3-6/+7
sk->sk_mark is often read while another thread could change the value. Fixes: 4a19ec5800fc ("[NET]: Introducing socket mark socket option.") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-29net: gro: fix misuse of CB in udp socket lookupRichard Gobert1-0/+43
This patch fixes a misuse of IP{6}CB(skb) in GRO, while calling to `udp6_lib_lookup2` when handling udp tunnels. `udp6_lib_lookup2` fetch the device from CB. The fix changes it to fetch the device from `skb->dev`. l3mdev case requires special attention since it has a master and a slave device. Fixes: a6024562ffd7 ("udp: Add GRO functions to UDP socket") Reported-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Richard Gobert <richardbgobert@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-25tcp: Reduce chance of collisions in inet6_hashfn().Stewart Smith1-6/+2
For both IPv4 and IPv6 incoming TCP connections are tracked in a hash table with a hash over the source & destination addresses and ports. However, the IPv6 hash is insufficient and can lead to a high rate of collisions. The IPv6 hash used an XOR to fit everything into the 96 bits for the fast jenkins hash, meaning it is possible for an external entity to ensure the hash collides, thus falling back to a linear search in the bucket, which is slow. We take the approach of hash the full length of IPv6 address in __ipv6_addr_jhash() so that all users can benefit from a more secure version. While this may look like it adds overhead, the reality of modern CPUs means that this is unmeasurable in real world scenarios. In simulating with llvm-mca, the increase in cycles for the hashing code was ~16 cycles on Skylake (from a base of ~155), and an extra ~9 on Nehalem (base of ~173). In commit dd6d2910c5e0 ("netfilter: conntrack: switch to siphash") netfilter switched from a jenkins hash to a siphash, but even the faster hsiphash is a more significant overhead (~20-30%) in some preliminary testing. So, in this patch, we keep to the more conservative approach to ensure we don't add much overhead per SYN. In testing, this results in a consistently even spread across the connection buckets. In both testing and real-world scenarios, we have not found any measurable performance impact. Fixes: 08dcdbf6a7b9 ("ipv6: use a stronger hash for tcp") Signed-off-by: Stewart Smith <trawets@amazon.com> Signed-off-by: Samuel Mendoza-Jonas <samjonas@amazon.com> Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230721222410.17914-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-24vxlan: calculate correct header length for GPEJiri Benc1-4/+9
VXLAN-GPE does not add an extra inner Ethernet header. Take that into account when calculating header length. This causes problems in skb_tunnel_check_pmtu, where incorrect PMTU is cached. In the collect_md mode (which is the only mode that VXLAN-GPE supports), there's no magic auto-setting of the tunnel interface MTU. It can't be, since the destination and thus the underlying interface may be different for each packet. So, the administrator is responsible for setting the correct tunnel interface MTU. Apparently, the administrators are capable enough to calculate that the maximum MTU for VXLAN-GPE is (their_lower_MTU - 36). They set the tunnel interface MTU to 1464. If you run a TCP stream over such interface, it's then segmented according to the MTU 1464, i.e. producing 1514 bytes frames. Which is okay, this still fits the lower MTU. However, skb_tunnel_check_pmtu (called from vxlan_xmit_one) uses 50 as the header size and thus incorrectly calculates the frame size to be 1528. This leads to ICMP too big message being generated (locally), PMTU of 1450 to be cached and the TCP stream to be resegmented. The fix is to use the correct actual header size, especially for skb_tunnel_check_pmtu calculation. Fixes: e1e5314de08ba ("vxlan: implement GPE") Signed-off-by: Jiri Benc <jbenc@redhat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-20Merge tag 'for-net-2023-07-20' of ↵Jakub Kicinski1-2/+5
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - Fix building with coredump disabled - Fix use-after-free in hci_remove_adv_monitor - Use RCU for hci_conn_params and iterate safely in hci_sync - Fix locking issues on ISO and SCO - Fix bluetooth on Intel Macbook 2014 * tag 'for-net-2023-07-20' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth: Bluetooth: MGMT: Use correct address for memcpy() Bluetooth: btusb: Fix bluetooth on Intel Macbook 2014 Bluetooth: SCO: fix sco_conn related locking and validity issues Bluetooth: hci_conn: return ERR_PTR instead of NULL when there is no link Bluetooth: hci_sync: Avoid use-after-free in dbg for hci_remove_adv_monitor() Bluetooth: coredump: fix building with coredump disabled Bluetooth: ISO: fix iso_conn related locking and validity issues Bluetooth: hci_event: call disconnect callback before deleting conn Bluetooth: use RCU for hci_conn_params and iterate safely in hci_sync ==================== Link: https://lore.kernel.org/r/20230720190201.446469-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-20tcp: annotate data-races around tp->notsent_lowatEric Dumazet1-1/+5
tp->notsent_lowat can be read locklessly from do_tcp_getsockopt() and tcp_poll(). Fixes: c9bee3b7fdec ("tcp: TCP_NOTSENT_LOWAT socket option") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230719212857.3943972-10-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-20tcp: annotate data-races around tp->keepalive_probesEric Dumazet1-2/+7
do_tcp_getsockopt() reads tp->keepalive_probes while another cpu might change its value. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230719212857.3943972-6-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-20tcp: annotate data-races around tp->keepalive_intvlEric Dumazet1-2/+7
do_tcp_getsockopt() reads tp->keepalive_intvl while another cpu might change its value. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230719212857.3943972-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-20tcp: annotate data-races around tp->keepalive_timeEric Dumazet1-2/+5
do_tcp_getsockopt() reads tp->keepalive_time while another cpu might change its value. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230719212857.3943972-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>