summaryrefslogtreecommitdiff
path: root/net/core
AgeCommit message (Collapse)AuthorFilesLines
2024-10-30net: fix crash when config small gso_max_size/gso_ipv4_max_sizeWang Liang1-2/+2
Config a small gso_max_size/gso_ipv4_max_size will lead to an underflow in sk_dst_gso_max_size(), which may trigger a BUG_ON crash, because sk->sk_gso_max_size would be much bigger than device limits. Call Trace: tcp_write_xmit tso_segs = tcp_init_tso_segs(skb, mss_now); tcp_set_skb_tso_segs tcp_skb_pcount_set // skb->len = 524288, mss_now = 8 // u16 tso_segs = 524288/8 = 65535 -> 0 tso_segs = DIV_ROUND_UP(skb->len, mss_now) BUG_ON(!tso_segs) Add check for the minimum value of gso_max_size and gso_ipv4_max_size. Fixes: 46e6b992c250 ("rtnetlink: allow GSO maximums to be set on device creation") Fixes: 9eefedd58ae1 ("net: add gso_ipv4_max_size and gro_ipv4_max_size per device") Signed-off-by: Wang Liang <wangliang74@huawei.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20241023035213.517386-1-wangliang74@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-29rtnetlink: Fix kdoc of rtnl_af_register().Kuniyuki Iwashima1-1/+1
Commit 26eebdc4b005 ("rtnetlink: Return int from rtnl_af_register().") made rtnl_af_register() return int again, and kdoc needs to be fixed up. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241022210320.86111-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-29ipv4: Convert devinet_ioctl to per-netns RTNL.Kuniyuki Iwashima1-3/+3
ioctl(SIOCGIFCONF) calls dev_ifconf() that operates on the current netns. Let's use per-netns RTNL helpers in dev_ifconf() and inet_gifconf(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-29rtnetlink: Define rtnl_net_trylock().Kuniyuki Iwashima1-0/+11
We will need the per-netns version of rtnl_trylock(). rtnl_net_trylock() calls __rtnl_net_lock() only when rtnl_trylock() successfully holds RTNL. When RTNL is removed, we will use mutex_trylock() for per-netns RTNL. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-29sock_map: fix a NULL pointer dereference in sock_map_link_update_prog()Cong Wang1-0/+4
The following race condition could trigger a NULL pointer dereference: sock_map_link_detach(): sock_map_link_update_prog(): mutex_lock(&sockmap_mutex); ... sockmap_link->map = NULL; mutex_unlock(&sockmap_mutex); mutex_lock(&sockmap_mutex); ... sock_map_prog_link_lookup(sockmap_link->map); mutex_unlock(&sockmap_mutex); <continue> Fix it by adding a NULL pointer check. In this specific case, it makes no sense to update a link which is being released. Reported-by: Ruan Bonan <bonan.ruan@u.nus.edu> Fixes: 699c23f02c65 ("bpf: Add bpf_link support for sk_msg and sk_skb progs") Cc: Yonghong Song <yonghong.song@linux.dev> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Link: https://lore.kernel.org/r/20241026185522.338562-1-xiyou.wangcong@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-10-29neighbour: use kvzalloc()/kvfree()Eric Dumazet1-17/+2
mm layer is providing convenient functions, we do not have to work around old limitations. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Gilad Naaman <gnaaman@drivenets.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20241022150059.1345406-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netPaolo Abeni2-3/+13
Cross-merge networking fixes after downstream PR. No conflicts and no adjacent changes. Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfAlexei Starovoitov4-35/+69
Cross-merge bpf fixes after downstream PR. No conflicts. Adjacent changes in: include/linux/bpf.h include/uapi/linux/bpf.h kernel/bpf/btf.c kernel/bpf/helpers.c kernel/bpf/syscall.c kernel/bpf/verifier.c kernel/trace/bpf_trace.c mm/slab_common.c tools/include/uapi/linux/bpf.h tools/testing/selftests/bpf/Makefile Link: https://lore.kernel.org/all/20241024215724.60017-1-daniel@iogearbox.net/ Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-10-24bpf: Add "bool swap_uptrs" arg to bpf_local_storage_update() and ↵Martin KaFai Lau1-3/+3
bpf_selem_alloc() In a later patch, the task local storage will only accept uptr from the syscall update_elem and will not accept uptr from the bpf prog. The reason is the bpf prog does not have a way to provide a valid user space address. bpf_local_storage_update() and bpf_selem_alloc() are used by both bpf prog bpf_task_storage_get(BPF_LOCAL_STORAGE_GET_F_CREATE) and bpf syscall update_elem. "bool swap_uptrs" arg is added to bpf_local_storage_update() and bpf_selem_alloc() to tell if it is called by the bpf prog or by the bpf syscall. When swap_uptrs==true, it is called by the syscall. The arg is named (swap_)uptrs because the later patch will swap the uptrs between the newly allocated selem and the user space provided map_value. It will make error handling easier in case map->ops->map_update_elem() fails and the caller can decide if it needs to unpin the uptr in the user space provided map_value or the bpf_local_storage_update() has already taken the uptr ownership and will take care of unpinning it also. Only swap_uptrs==false is passed now. The logic to handle the true case will be added in a later patch. Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20241023234759.860539-4-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-10-23netpoll: remove ndo_netpoll_setup() second argumentEric Dumazet1-1/+1
npinfo is not used in any of the ndo_netpoll_setup() methods. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241018052108.2610827-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-23net: sysctl: allow dump_cpumask to handle higher numbers of CPUsAntoine Tenart1-9/+23
This fixes the output of rps_default_mask and flow_limit_cpu_bitmap when the CPU count is > 448, as it was truncated. The underlying values are actually stored correctly when writing to these sysctl but displaying them uses a fixed length temporary buffer in dump_cpumask. This buffer can be too small if the CPU count is > 448. Fix this by dynamically allocating the buffer in dump_cpumask, using a guesstimate of what we need. Signed-off-by: Antoine Tenart <atenart@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-23net: sysctl: do not reserve an extra char in dump_cpumask temporary bufferAntoine Tenart1-1/+1
When computing the length we'll be able to use out of the buffers, one char is removed from the temporary one to make room for a newline. It should be removed from the output buffer length too, but in reality this is not needed as the later call to scnprintf makes sure a null char is written at the end of the buffer which we override with the newline. Signed-off-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-23net: sysctl: remove always-true conditionAntoine Tenart1-2/+4
Before adding a new line at the end of the temporary buffer in dump_cpumask, a length check is performed to ensure there is space for it. len = min(sizeof(kbuf) - 1, *lenp); len = scnprintf(kbuf, len, ...); if (len < *lenp) kbuf[len++] = '\n'; Note that the check is currently logically wrong, the written length is compared against the output buffer, not the temporary one. However this has no consequence as this is always true, even if fixed: scnprintf includes a null char at the end of the buffer but the returned length do not include it and there is always space for overriding it with a newline. Remove the condition. Signed-off-by: Antoine Tenart <atenart@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-23net: use sock_valbool_flag() only in __sock_set_timestamps()Yajun Deng1-5/+2
sock_{,re}set_flag() are contained in sock_valbool_flag(), it would be cleaner to just use sock_valbool_flag(). Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Link: https://patch.msgid.link/20241017133435.2552-1-yajun.deng@linux.dev Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-23bpf: Remove MEM_UNINIT from skb/xdp MTU helpersDaniel Borkmann1-27/+15
We can now undo parts of 4b3786a6c539 ("bpf: Zero former ARG_PTR_TO_{LONG,INT} args in case of error") as discussed in [0]. Given the BPF helpers now have MEM_WRITE tag, the MEM_UNINIT can be cleared. The mtu_len is an input as well as output argument, meaning, the BPF program has to set it to something. It cannot be uninitialized. Therefore, allowing uninitialized memory and zeroing it on error would be odd. It was done as an interim step in 4b3786a6c539 as the desired behavior could not have been expressed before the introduction of MEM_WRITE tag. Fixes: 4b3786a6c539 ("bpf: Zero former ARG_PTR_TO_{LONG,INT} args in case of error") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/a86eb76d-f52f-dee4-e5d2-87e45de3e16f@iogearbox.net [0] Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20241021152809.33343-3-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-10-23bpf: Add MEM_WRITE attributeDaniel Borkmann1-2/+2
Add a MEM_WRITE attribute for BPF helper functions which can be used in bpf_func_proto to annotate an argument type in order to let the verifier know that the helper writes into the memory passed as an argument. In the past MEM_UNINIT has been (ab)used for this function, but the latter merely tells the verifier that the passed memory can be uninitialized. There have been bugs with overloading the latter but aside from that there are also cases where the passed memory is read + written which currently cannot be expressed, see also 4b3786a6c539 ("bpf: Zero former ARG_PTR_TO_{LONG,INT} args in case of error"). Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20241021152809.33343-1-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-10-22rtnetlink: Protect struct rtnl_af_ops with SRCU.Kuniyuki Iwashima1-16/+47
Once RTNL is replaced with rtnl_net_lock(), we need a mechanism to guarantee that rtnl_af_ops is alive during inflight RTM_SETLINK even when its module is being unloaded. Let's use SRCU to protect ops. rtnl_af_lookup() now iterates rtnl_af_ops under RCU and returns SRCU-protected ops pointer. The caller must call rtnl_af_put() to release the pointer after the use. Also, rtnl_af_unregister() unlinks the ops first and calls synchronize_srcu() to wait for inflight RTM_SETLINK requests to complete. Note that rtnl_af_ops needs to be protected by its dedicated lock when RTNL is removed. Note also that BUG_ON() in do_setlink() is changed to the normal error handling as a different af_ops might be found after validate_linkmsg(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-22rtnetlink: Return int from rtnl_af_register().Kuniyuki Iwashima1-1/+3
The next patch will add init_srcu_struct() in rtnl_af_register(), then we need to handle its error. Let's add the error handling in advance to make the following patch cleaner. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Matt Johnston <matt@codeconstruct.com.au> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-22rtnetlink: Call rtnl_link_get_net_capable() in do_setlink().Kuniyuki Iwashima1-16/+16
We will push RTNL down to rtnl_setlink(). RTM_SETLINK could call rtnl_link_get_net_capable() in do_setlink() to move a dev to a new netns, but the netns needs to be fetched before holding rtnl_net_lock(). Let's move it to rtnl_setlink() and pass the netns to do_setlink(). Now, RTM_NEWLINK paths (rtnl_changelink() and rtnl_group_changelink()) can pass the prefetched netns to do_setlink(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-22rtnetlink: Clean up rtnl_setlink().Kuniyuki Iwashima1-10/+7
We will push RTNL down to rtnl_setlink(). Let's unify the error path to make it easy to place rtnl_net_lock(). While at it, keep the variables in reverse xmas order. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-22rtnetlink: Clean up rtnl_dellink().Kuniyuki Iwashima1-17/+10
We will push RTNL down to rtnl_delink(). Let's unify the error path to make it easy to place rtnl_net_lock(). While at it, keep the variables in reverse xmas order. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-22rtnetlink: Fetch IFLA_LINK_NETNSID in rtnl_newlink().Kuniyuki Iwashima1-25/+24
Another netns option for RTM_NEWLINK is IFLA_LINK_NETNSID and is fetched in rtnl_newlink_create(). This must be done before holding rtnl_net_lock(). Let's move IFLA_LINK_NETNSID processing to rtnl_newlink(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-22rtnetlink: Call rtnl_link_get_net_capable() in rtnl_newlink().Kuniyuki Iwashima1-23/+28
As a prerequisite of per-netns RTNL, we must fetch netns before looking up dev or moving it to another netns. rtnl_link_get_net_capable() is called in rtnl_newlink_create() and do_setlink(), but both of them need to be moved to the RTNL-independent region, which will be rtnl_newlink(). Let's call rtnl_link_get_net_capable() in rtnl_newlink() and pass the netns down to where needed. Note that the latter two have not passed the nets to do_setlink() yet but will do so after the remaining rtnl_link_get_net_capable() is moved to rtnl_setlink() later. While at it, dest_net is renamed to tgt_net in rtnl_newlink_create() to align with rtnl_{del,set}link(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-22rtnetlink: Protect struct rtnl_link_ops with SRCU.Kuniyuki Iwashima1-22/+61
Once RTNL is replaced with rtnl_net_lock(), we need a mechanism to guarantee that rtnl_link_ops is alive during inflight RTM_NEWLINK even when its module is being unloaded. Let's use SRCU to protect ops. rtnl_link_ops_get() now iterates link_ops under RCU and returns SRCU-protected ops pointer. The caller must call rtnl_link_ops_put() to release the pointer after the use. Also, __rtnl_link_unregister() unlinks the ops first and calls synchronize_srcu() to wait for inflight RTM_NEWLINK requests to complete. Note that link_ops needs to be protected by its dedicated lock when RTNL is removed. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-22rtnetlink: Move ops->validate to rtnl_newlink().Kuniyuki Iwashima1-25/+24
ops->validate() does not require RTNL. Let's move it to rtnl_newlink(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-22rtnetlink: Move rtnl_link_ops_get() and retry to rtnl_newlink().Kuniyuki Iwashima1-24/+18
Currently, if neither dev nor rtnl_link_ops is found in __rtnl_newlink(), we release RTNL and redo the whole process after request_module(), which complicates the logic. The ops will be RTNL-independent later. Let's move the ops lookup to rtnl_newlink() and do the retry earlier. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-22rtnetlink: Move simple validation from __rtnl_newlink() to rtnl_newlink().Kuniyuki Iwashima1-19/+24
We will push RTNL down to rtnl_newlink(). Let's move RTNL-independent validation to rtnl_newlink(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-22rtnetlink: Factorise do_setlink() path from __rtnl_newlink().Kuniyuki Iwashima1-68/+74
__rtnl_newlink() got too long to maintain. For example, netdev_master_upper_dev_get()->rtnl_link_ops is fetched even when IFLA_INFO_SLAVE_DATA is not specified. Let's factorise the single dev do_setlink() path to a separate function. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-22rtnetlink: Call validate_linkmsg() in do_setlink().Kuniyuki Iwashima1-11/+4
There are 3 paths that finally call do_setlink(), and validate_linkmsg() is called in each path. 1. RTM_NEWLINK 1-1. dev is found in __rtnl_newlink() 1-2. dev isn't found, but IFLA_GROUP is specified in rtnl_group_changelink() 2. RTM_SETLINK The next patch factorises 1-1 to a separate function. As a preparation, let's move validate_linkmsg() calls to do_setlink(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-22rtnetlink: Allocate linkinfo[] as struct rtnl_newlink_tbs.Kuniyuki Iwashima1-3/+5
We will move linkinfo to rtnl_newlink() and pass it down to other functions. Let's pack it into rtnl_newlink_tbs. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-19Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfLinus Torvalds2-3/+13
Pull bpf fixes from Daniel Borkmann: - Fix BPF verifier to not affect subreg_def marks in its range propagation (Eduard Zingerman) - Fix a truncation bug in the BPF verifier's handling of coerce_reg_to_size_sx (Dimitar Kanaliev) - Fix the BPF verifier's delta propagation between linked registers under 32-bit addition (Daniel Borkmann) - Fix a NULL pointer dereference in BPF devmap due to missing rxq information (Florian Kauer) - Fix a memory leak in bpf_core_apply (Jiri Olsa) - Fix an UBSAN-reported array-index-out-of-bounds in BTF parsing for arrays of nested structs (Hou Tao) - Fix build ID fetching where memory areas backing the file were created with memfd_secret (Andrii Nakryiko) - Fix BPF task iterator tid filtering which was incorrectly using pid instead of tid (Jordan Rome) - Several fixes for BPF sockmap and BPF sockhash redirection in combination with vsocks (Michal Luczaj) - Fix riscv BPF JIT and make BPF_CMPXCHG fully ordered (Andrea Parri) - Fix riscv BPF JIT under CONFIG_CFI_CLANG to prevent the possibility of an infinite BPF tailcall (Pu Lehui) - Fix a build warning from resolve_btfids that bpf_lsm_key_free cannot be resolved (Thomas Weißschuh) - Fix a bug in kfunc BTF caching for modules where the wrong BTF object was returned (Toke Høiland-Jørgensen) - Fix a BPF selftest compilation error in cgroup-related tests with musl libc (Tony Ambardar) - Several fixes to BPF link info dumps to fill missing fields (Tyrone Wu) - Add BPF selftests for kfuncs from multiple modules, checking that the correct kfuncs are called (Simon Sundberg) - Ensure that internal and user-facing bpf_redirect flags don't overlap (Toke Høiland-Jørgensen) - Switch to use kvzmalloc to allocate BPF verifier environment (Rik van Riel) - Use raw_spinlock_t in BPF ringbuf to fix a sleep in atomic splat under RT (Wander Lairson Costa) * tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: (38 commits) lib/buildid: Handle memfd_secret() files in build_id_parse() selftests/bpf: Add test case for delta propagation bpf: Fix print_reg_state's constant scalar dump bpf: Fix incorrect delta propagation between linked registers bpf: Properly test iter/task tid filtering bpf: Fix iter/task tid filtering riscv, bpf: Make BPF_CMPXCHG fully ordered bpf, vsock: Drop static vsock_bpf_prot initialization vsock: Update msg_count on read_skb() vsock: Update rx_bytes on read_skb() bpf, sockmap: SK_DROP on attempted redirects of unsupported af_vsock selftests/bpf: Add asserts for netfilter link info bpf: Fix link info netfilter flags to populate defrag flag selftests/bpf: Add test for sign extension in coerce_subreg_to_size_sx() selftests/bpf: Add test for truncation after sign extension in coerce_reg_to_size_sx() bpf: Fix truncation bug in coerce_reg_to_size_sx() selftests/bpf: Assert link info uprobe_multi count & path_size if unset bpf: Fix unpopulated path_size when uprobe_multi fields unset selftests/bpf: Fix cross-compiling urandom_read selftests/bpf: Add test for kfunc module order ...
2024-10-17bpf, sockmap: SK_DROP on attempted redirects of unsupported af_vsockMichal Luczaj1-0/+8
Don't mislead the callers of bpf_{sk,msg}_redirect_{map,hash}(): make sure to immediately and visibly fail the forwarding of unsupported af_vsock packets. Fixes: 634f1a7110b4 ("vsock: support sockmap") Signed-off-by: Michal Luczaj <mhal@rbox.co> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20241013-vsock-fixes-for-redir-v2-1-d6577bbfe742@rbox.co
2024-10-16rtnetlink: Remove rtnl_register() and rtnl_register_module().Kuniyuki Iwashima1-53/+21
No one uses rtnl_register() and rtnl_register_module(). Let's remove them. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20241014201828.91221-12-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-16ipv4: Use rtnl_register_many().Kuniyuki Iwashima1-7/+10
We will remove rtnl_register() in favour of rtnl_register_many(). When it succeeds, rtnl_register_many() guarantees all rtnetlink types in the passed array are supported, and there is no chance that a part of message types is not supported. Let's use rtnl_register_many() instead. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20241014201828.91221-7-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-16net: Use rtnl_register_many().Kuniyuki Iwashima1-5/+9
We will remove rtnl_register() in favour of rtnl_register_many(). When it succeeds, rtnl_register_many() guarantees all rtnetlink types in the passed array are supported, and there is no chance that a part of message types is not supported. Let's use rtnl_register_many() instead. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20241014201828.91221-6-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-16neighbour: Use rtnl_register_many().Kuniyuki Iwashima1-9/+10
We will remove rtnl_register() in favour of rtnl_register_many(). When it succeeds, rtnl_register_many() guarantees all rtnetlink types in the passed array are supported, and there is no chance that a part of message types is not supported. Let's use rtnl_register_many() instead. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20241014201828.91221-4-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-16rtnetlink: Use rtnl_register_many().Kuniyuki Iwashima1-30/+33
We will remove rtnl_register() in favour of rtnl_register_many(). When it succeeds, rtnl_register_many() guarantees all rtnetlink types in the passed array are supported, and there is no chance that a part of message types is not supported. Let's use rtnl_register_many() instead. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20241014201828.91221-3-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-16rtnetlink: Panic when __rtnl_register_many() fails for builtin callers.Kuniyuki Iwashima1-0/+4
We will replace all rtnl_register() and rtnl_register_module() with rtnl_register_many(). Currently, rtnl_register() returns nothing and prints an error message when it fails to register a rtnetlink message type and handlers. The failure happens only when rtnl_register_internal() fails to allocate rtnl_msg_handlers[protocol][msgtype], but it's unlikely for built-in callers on boot time. rtnl_register_many() unwinds the previous successful registrations on failure and returns an error, but it will be useless for built-in callers, especially some subsystems that do not have the legacy ioctl() interface and do not work without rtnetlink. Instead of booting up without rtnetlink functionality, let's panic on failure for built-in rtnl_register_many() callers. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20241014201828.91221-2-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-16Revert "net: do not leave a dangling sk pointer, when socket creation fails"Ignat Korchagin1-3/+0
This reverts commit 6cd4a78d962bebbaf8beb7d2ead3f34120e3f7b2. inet/inet6->create() implementations have been fixed to explicitly NULL the allocated sk object on error. A warning was put in place to make sure any future changes will not leave a dangling pointer in pf->create() implementations. So this code is now redundant. Suggested-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Ignat Korchagin <ignat@cloudflare.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20241014153808.51894-10-ignat@cloudflare.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15Merge tag 'for-netdev' of ↵Paolo Abeni1-4/+13
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2024-10-14 The following pull-request contains BPF updates for your *net-next* tree. We've added 21 non-merge commits during the last 18 day(s) which contain a total of 21 files changed, 1185 insertions(+), 127 deletions(-). The main changes are: 1) Put xsk sockets on a struct diet and add various cleanups. Overall, this helps to bump performance by 12% for some workloads, from Maciej Fijalkowski. 2) Extend BPF selftests to increase coverage of XDP features in combination with BPF cpumap, from Alexis Lothoré (eBPF Foundation). 3) Extend netkit with an option to delegate skb->{mark,priority} scrubbing to its BPF program, from Daniel Borkmann. 4) Make the bpf_get_netns_cookie() helper available also to tc(x) BPF programs, from Mahe Tardy. 5) Extend BPF selftests covering a BPF program setting socket options per MPTCP subflow, from Geliang Tang and Nicolas Rybowski. bpf-next-for-netdev * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (21 commits) xsk: Use xsk_buff_pool directly for cq functions xsk: Wrap duplicated code to function xsk: Carry a copy of xdp_zc_max_segs within xsk_buff_pool xsk: Get rid of xdp_buff_xsk::orig_addr xsk: s/free_list_node/list_node/ xsk: Get rid of xdp_buff_xsk::xskb_list_node selftests/bpf: check program redirect in xdp_cpumap_attach selftests/bpf: make xdp_cpumap_attach keep redirect prog attached selftests/bpf: fix bpf_map_redirect call for cpu map test selftests/bpf: add tcx netns cookie tests bpf: add get_netns_cookie helper to tc programs selftests/bpf: add missing header include for htons selftests/bpf: Extend netkit tests to validate skb meta data tools: Sync if_link.h uapi tooling header netkit: Add add netkit scrub support to rt_link.yaml netkit: Simplify netkit mode over to use NLA_POLICY_MAX netkit: Add option for scrubbing skb meta data bpf: Remove unused macro selftests/bpf: Add mptcp subflow subtest selftests/bpf: Add getsockopt to inspect mptcp subflow ... ==================== Link: https://patch.msgid.link/20241014211110.16562-1-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-15rtnl_net_debug: Remove rtnl_net_debug_exit().Kuniyuki Iwashima1-6/+0
kernel test robot reported section mismatch in rtnl_net_debug_exit(). WARNING: modpost: vmlinux: section mismatch in reference: rtnl_net_debug_exit+0x20 (section: .exit.text) -> rtnl_net_debug_net_ops (section: .init.data) rtnl_net_debug_exit() uses rtnl_net_debug_net_ops() that is annotated as __net_initdata, but this file is always built-in. Let's remove rtnl_net_debug_exit(). Fixes: 03fa53485659 ("rtnetlink: Add ASSERT_RTNL_NET() placeholder for netdev notifier.") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202410101854.i0vQCaDz-lkp@intel.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20241010172433.67694-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-15tools: ynl-gen: use names of constants in generated limitsJakub Kicinski1-3/+3
YNL specs can use string expressions for limits, like s32-min or u16-max. We convert all of those into their numeric values when generating the code, which isn't always helpful. Try to retain the string representations in the output. Any sort of calculations still need the integers. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Joe Damato <jdamato@fastly.com> Link: https://patch.msgid.link/20241010151248.2049755-1-kuba@kernel.org [pabeni@redhat.com: regenerated netdev-genl-gen.c] Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-15netdev-genl: Support setting per-NAPI config valuesJoe Damato3-0/+64
Add support to set per-NAPI defer_hard_irqs and gro_flush_timeout. Signed-off-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20241011184527.16393-7-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15net: napi: Add napi_configJoe Damato2-7/+85
Add a persistent NAPI config area for NAPI configuration to the core. Drivers opt-in to setting the persistent config for a NAPI by passing an index when calling netif_napi_add_config. napi_config is allocated in alloc_netdev_mqs, freed in free_netdev (after the NAPIs are deleted). Drivers which call netif_napi_add_config will have persistent per-NAPI settings: NAPI IDs, gro_flush_timeout, and defer_hard_irq settings. Per-NAPI settings are saved in napi_disable and restored in napi_enable. Co-developed-by: Martin Karsten <mkarsten@uwaterloo.ca> Signed-off-by: Martin Karsten <mkarsten@uwaterloo.ca> Signed-off-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20241011184527.16393-6-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15netdev-genl: Dump gro_flush_timeoutJoe Damato1-0/+6
Support dumping gro_flush_timeout for a NAPI ID. Signed-off-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20241011184527.16393-5-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15net: napi: Make gro_flush_timeout per-NAPIJoe Damato3-7/+47
Allow per-NAPI gro_flush_timeout setting. The existing sysfs parameter is respected; writes to sysfs will write to all NAPI structs for the device and the net_device gro_flush_timeout field. Reads from sysfs will read from the net_device field. The ability to set gro_flush_timeout on specific NAPI instances will be added in a later commit, via netdev-genl. Signed-off-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20241011184527.16393-4-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15netdev-genl: Dump napi_defer_hard_irqsJoe Damato1-0/+6
Support dumping defer_hard_irqs for a NAPI ID. Signed-off-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20241011184527.16393-3-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15net: napi: Make napi_defer_hard_irqs per-NAPIJoe Damato3-6/+42
Add defer_hard_irqs to napi_struct in preparation for per-NAPI settings. The existing sysfs parameter is respected; writes to sysfs will write to all NAPI structs for the device and the net_device defer_hard_irq field. Reads from sysfs show the net_device field. The ability to set defer_hard_irqs on specific NAPI instances will be added in a later commit, via netdev-genl. Signed-off-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20241011184527.16393-2-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15net: add skb_set_owner_edemux() helperEric Dumazet1-6/+3
This can be used to attach a socket to an skb, taking a reference on sk->sk_refcnt. This helper might be a NOP if sk->sk_refcnt is zero. Use it from tcp_make_synack(). Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Brian Vazquez <brianvv@google.com> Link: https://patch.msgid.link/20241010174817.1543642-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15net: add TIME_WAIT logic to sk_to_full_sk()Eric Dumazet1-5/+1
TCP will soon attach TIME_WAIT sockets to some ACK and RST. Make sure sk_to_full_sk() detects this and does not return a non full socket. v3: also changed sk_const_to_full_sk() Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Brian Vazquez <brianvv@google.com> Link: https://patch.msgid.link/20241010174817.1543642-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>