kernel/linux.git/include/net/netns, branch v6.11.8

Merge tag 'ipsec-next-2024-07-13' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next

2024-07-14T14:56:32+00:00

Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2024-07-13 1) Support sending NAT keepalives in ESP in UDP states. Userspace IKE daemon had to do this before, but the kernel can better keep track of it. From Eyal Birger. 2) Support IPsec crypto offload for IPv6 ESP and IPv4 UDP-encapsulated ESP data paths. Currently, IPsec crypto offload is enabled for GRO code path only. This patchset support UDP encapsulation for the non GRO path. From Mike Yu. * tag 'ipsec-next-2024-07-13' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next: xfrm: Support crypto offload for outbound IPv4 UDP-encapsulated ESP packet xfrm: Support crypto offload for inbound IPv4 UDP-encapsulated ESP packet xfrm: Allow UDP encapsulation in crypto offload control path xfrm: Support crypto offload for inbound IPv6 ESP packets not in GRO path xfrm: support sending NAT keepalives in ESP in UDP states ==================== Link: https://patch.msgid.link/20240713102416.3272997-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski

xfrm: support sending NAT keepalives in ESP in UDP states

2024-06-26T11:22:42+00:00

Add the ability to send out RFC-3948 NAT keepalives from the xfrm stack. To use, Userspace sets an XFRM_NAT_KEEPALIVE_INTERVAL integer property when creating XFRM outbound states which denotes the number of seconds between keepalive messages. Keepalive messages are sent from a per net delayed work which iterates over the xfrm states. The logic is guarded by the xfrm state spinlock due to the xfrm state walk iterator. Possible future enhancements: - Adding counters to keep track of sent keepalives. - deduplicate NAT keepalives between states sharing the same nat keepalive parameters. - provisioning hardware offloads for devices capable of implementing this. - revise xfrm state list to use an rcu list in order to avoid running this under spinlock. Suggested-by: Paul Wouters Tested-by: Paul Wouters Tested-by: Antony Antony Signed-off-by: Eyal Birger Signed-off-by: Steffen Klassert

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

2024-06-20T20:49:59+00:00

Cross-merge networking fixes after downstream PR. Conflicts: drivers/net/ethernet/broadcom/bnxt/bnxt.c 1e7962114c10 ("bnxt_en: Restore PTP tx_avail count in case of skb_pad() error") 165f87691a89 ("bnxt_en: add timestamping statistics support") No adjacent changes. Signed-off-by: Jakub Kicinski

netfilter: move the sysctl nf_hooks_lwtunnel into the netfilter core

2024-06-19T16:41:59+00:00

Currently, the sysctl net.netfilter.nf_hooks_lwtunnel depends on the nf_conntrack module, but the nf_conntrack module is not always loaded. Therefore, accessing net.netfilter.nf_hooks_lwtunnel may have an error. Move sysctl nf_hooks_lwtunnel into the netfilter core. Fixes: 7a3f5b0de364 ("netfilter: add netfilter hooks to SRv6 data plane") Suggested-by: Pablo Neira Ayuso Signed-off-by: Jianguo Wu Signed-off-by: Pablo Neira Ayuso

net: ipv4: Add a sysctl to set multipath hash seed

2024-06-12T23:42:11+00:00

When calculating hashes for the purpose of multipath forwarding, both IPv4 and IPv6 code currently fall back on flow_hash_from_keys(). That uses a randomly-generated seed. That's a fine choice by default, but unfortunately some deployments may need a tighter control over the seed used. In this patch, make the seed configurable by adding a new sysctl key, net.ipv4.fib_multipath_hash_seed to control the seed. This seed is used specifically for multipath forwarding and not for the other concerns that flow_hash_from_keys() is used for, such as queue selection. Expose the knob as sysctl because other such settings, such as headers to hash, are also handled that way. Like those, the multipath hash seed is a per-netns variable. Despite being placed in the net.ipv4 namespace, the multipath seed sysctl is used for both IPv4 and IPv6, similarly to e.g. a number of TCP variables. The seed used by flow_hash_from_keys() is a 128-bit quantity. However it seems that usually the seed is a much more modest value. 32 bits seem typical (Cisco, Cumulus), some systems go even lower. For that reason, and to decouple the user interface from implementation details, go with a 32-bit quantity, which is then quadruplicated to form the siphash key. Signed-off-by: Petr Machata Reviewed-by: Ido Schimmel Reviewed-by: Nikolay Aleksandrov Reviewed-by: David Ahern Link: https://lore.kernel.org/r/20240607151357.421181-3-petrm@nvidia.com Signed-off-by: Jakub Kicinski

tcp: add sysctl_tcp_rto_min_us

2024-06-05T12:42:54+00:00

Adding a sysctl knob to allow user to specify a default rto_min at socket init time, other than using the hard coded 200ms default rto_min. Note that the rto_min route option has the highest precedence for configuring this setting, followed by the TCP_BPF_RTO_MIN socket option, followed by the tcp_rto_min_us sysctl. Signed-off-by: Kevin Yang Reviewed-by: Neal Cardwell Reviewed-by: Yuchung Cheng Reviewed-by: Eric Dumazet Reviewed-by: Tony Lu Reviewed-by: Jakub Kicinski Signed-off-by: David S. Miller

net: Namespace-ify sysctl_optmem_max

2023-12-15T11:01:27+00:00

optmem_max being used in tx zerocopy, we want to be able to control it on a netns basis. Following patch changes two tests. Tested: oqq130:~# cat /proc/sys/net/core/optmem_max 131072 oqq130:~# echo 1000000 >/proc/sys/net/core/optmem_max oqq130:~# cat /proc/sys/net/core/optmem_max 1000000 oqq130:~# unshare -n oqq130:~# cat /proc/sys/net/core/optmem_max 131072 oqq130:~# exit logout oqq130:~# cat /proc/sys/net/core/optmem_max 1000000 Signed-off-by: Eric Dumazet Reviewed-by: Willem de Bruijn Acked-by: Neal Cardwell Signed-off-by: David S. Miller

Use READ/WRITE_ONCE() for IP local_port_range.

2023-12-08T18:44:42+00:00

Commit 227b60f5102cd added a seqlock to ensure that the low and high port numbers were always updated together. This is overkill because the two 16bit port numbers can be held in a u32 and read/written in a single instruction. More recently 91d0b78c5177f added support for finer per-socket limits. The user-supplied value is 'high << 16 | low' but they are held separately and the socket options protected by the socket lock. Use a u32 containing 'high << 16 | low' for both the 'net' and 'sk' fields and use READ_ONCE()/WRITE_ONCE() to ensure both values are always updated together. Change (the now trival) inet_get_local_port_range() to a static inline to optimise the calling code. (In particular avoiding returning integers by reference.) Signed-off-by: David Laight Reviewed-by: Eric Dumazet Reviewed-by: David Ahern Acked-by: Mat Martineau Reviewed-by: Kuniyuki Iwashima Link: https://lore.kernel.org/r/4e505d4198e946a8be03fb1b4c3072b0@AcuMS.aculab.com Signed-off-by: Jakub Kicinski

netns-ipv4: reorganize netns_ipv4 fast path variables

2023-12-02T22:24:36+00:00

Reorganize fast path variables on tx-txrx-rx order. Fastpath cacheline ends after sysctl_tcp_rmem. There are only read-only variables here. (write is on the control path and not considered in this case) Below data generated with pahole on x86 architecture. Fast path variables span cache lines before change: 4 Fast path variables span cache lines after change: 2 Suggested-by: Eric Dumazet Reviewed-by: Wei Wang Reviewed-by: David Ahern Signed-off-by: Coco Li Reviewed-by: Eric Dumazet Reviewed-by: Shakeel Butt Signed-off-by: David S. Miller

net/smc: add sysctl for max conns per lgr for SMC-R v2.1

2023-11-24T12:13:14+00:00

Add a new sysctl: net.smc.smcr_max_conns_per_lgr, which is used to control the preferred max connections per lgr for SMC-R v2.1. The default value of this sysctl is 255, and the acceptable value ranges from 16 to 255. Signed-off-by: Guangguan Wang Reviewed-by: Dust Li Signed-off-by: David S. Miller