diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2022-08-04 02:29:08 +0300 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2022-08-04 02:29:08 +0300 |
commit | f86d1fbbe7858884d6754534a0afbb74fc30bc26 (patch) | |
tree | f61796870edefbe77d495e9d719c68af1d14275b /net/ipv6/seg6_iptunnel.c | |
parent | 526942b8134cc34d25d27f95dfff98b8ce2f6fcd (diff) | |
parent | 7c6327c77d509e78bff76f2a4551fcfee851682e (diff) | |
download | linux-f86d1fbbe7858884d6754534a0afbb74fc30bc26.tar.xz |
Merge tag 'net-next-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking changes from Paolo Abeni:
"Core:
- Refactor the forward memory allocation to better cope with memory
pressure with many open sockets, moving from a per socket cache to
a per-CPU one
- Replace rwlocks with RCU for better fairness in ping, raw sockets
and IP multicast router.
- Network-side support for IO uring zero-copy send.
- A few skb drop reason improvements, including codegen the source
file with string mapping instead of using macro magic.
- Rename reference tracking helpers to a more consistent netdev_*
schema.
- Adapt u64_stats_t type to address load/store tearing issues.
- Refine debug helper usage to reduce the log noise caused by bots.
BPF:
- Improve socket map performance, avoiding skb cloning on read
operation.
- Add support for 64 bits enum, to match types exposed by kernel.
- Introduce support for sleepable uprobes program.
- Introduce support for enum textual representation in libbpf.
- New helpers to implement synproxy with eBPF/XDP.
- Improve loop performances, inlining indirect calls when possible.
- Removed all the deprecated libbpf APIs.
- Implement new eBPF-based LSM flavor.
- Add type match support, which allow accurate queries to the eBPF
used types.
- A few TCP congetsion control framework usability improvements.
- Add new infrastructure to manipulate CT entries via eBPF programs.
- Allow for livepatch (KLP) and BPF trampolines to attach to the same
kernel function.
Protocols:
- Introduce per network namespace lookup tables for unix sockets,
increasing scalability and reducing contention.
- Preparation work for Wi-Fi 7 Multi-Link Operation (MLO) support.
- Add support to forciby close TIME_WAIT TCP sockets via user-space
tools.
- Significant performance improvement for the TLS 1.3 receive path,
both for zero-copy and not-zero-copy.
- Support for changing the initial MTPCP subflow priority/backup
status
- Introduce virtually contingus buffers for sockets over RDMA, to
cope better with memory pressure.
- Extend CAN ethtool support with timestamping capabilities
- Refactor CAN build infrastructure to allow building only the needed
features.
Driver API:
- Remove devlink mutex to allow parallel commands on multiple links.
- Add support for pause stats in distributed switch.
- Implement devlink helpers to query and flash line cards.
- New helper for phy mode to register conversion.
New hardware / drivers:
- Ethernet DSA driver for the rockchip mt7531 on BPI-R2 Pro.
- Ethernet DSA driver for the Renesas RZ/N1 A5PSW switch.
- Ethernet DSA driver for the Microchip LAN937x switch.
- Ethernet PHY driver for the Aquantia AQR113C EPHY.
- CAN driver for the OBD-II ELM327 interface.
- CAN driver for RZ/N1 SJA1000 CAN controller.
- Bluetooth: Infineon CYW55572 Wi-Fi plus Bluetooth combo device.
Drivers:
- Intel Ethernet NICs:
- i40e: add support for vlan pruning
- i40e: add support for XDP framented packets
- ice: improved vlan offload support
- ice: add support for PPPoE offload
- Mellanox Ethernet (mlx5)
- refactor packet steering offload for performance and scalability
- extend support for TC offload
- refactor devlink code to clean-up the locking schema
- support stacked vlans for bridge offloads
- use TLS objects pool to improve connection rate
- Netronome Ethernet NICs (nfp):
- extend support for IPv6 fields mangling offload
- add support for vepa mode in HW bridge
- better support for virtio data path acceleration (VDPA)
- enable TSO by default
- Microsoft vNIC driver (mana)
- add support for XDP redirect
- Others Ethernet drivers:
- bonding: add per-port priority support
- microchip lan743x: extend phy support
- Fungible funeth: support UDP segmentation offload and XDP xmit
- Solarflare EF100: add support for virtual function representors
- MediaTek SoC: add XDP support
- Mellanox Ethernet/IB switch (mlxsw):
- dropped support for unreleased H/W (XM router).
- improved stats accuracy
- unified bridge model coversion improving scalability (parts 1-6)
- support for PTP in Spectrum-2 asics
- Broadcom PHYs
- add PTP support for BCM54210E
- add support for the BCM53128 internal PHY
- Marvell Ethernet switches (prestera):
- implement support for multicast forwarding offload
- Embedded Ethernet switches:
- refactor OcteonTx MAC filter for better scalability
- improve TC H/W offload for the Felix driver
- refactor the Microchip ksz8 and ksz9477 drivers to share the
probe code (parts 1, 2), add support for phylink mac
configuration
- Other WiFi:
- Microchip wilc1000: diable WEP support and enable WPA3
- Atheros ath10k: encapsulation offload support
Old code removal:
- Neterion vxge ethernet driver: this is untouched since more than 10 years"
* tag 'net-next-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1890 commits)
doc: sfp-phylink: Fix a broken reference
wireguard: selftests: support UML
wireguard: allowedips: don't corrupt stack when detecting overflow
wireguard: selftests: update config fragments
wireguard: ratelimiter: use hrtimer in selftest
net/mlx5e: xsk: Discard unaligned XSK frames on striding RQ
net: usb: ax88179_178a: Bind only to vendor-specific interface
selftests: net: fix IOAM test skip return code
net: usb: make USB_RTL8153_ECM non user configurable
net: marvell: prestera: remove reduntant code
octeontx2-pf: Reduce minimum mtu size to 60
net: devlink: Fix missing mutex_unlock() call
net/tls: Remove redundant workqueue flush before destroy
net: txgbe: Fix an error handling path in txgbe_probe()
net: dsa: Fix spelling mistakes and cleanup code
Documentation: devlink: add add devlink-selftests to the table of contents
dccp: put dccp_qpolicy_full() and dccp_qpolicy_push() in the same lock
net: ionic: fix error check for vlan flags in ionic_set_nic_features()
net: ice: fix error NETIF_F_HW_VLAN_CTAG_FILTER check in ice_vsi_sync_fltr()
nfp: flower: add support for tunnel offload without key ID
...
Diffstat (limited to 'net/ipv6/seg6_iptunnel.c')
-rw-r--r-- | net/ipv6/seg6_iptunnel.c | 140 |
1 files changed, 138 insertions, 2 deletions
diff --git a/net/ipv6/seg6_iptunnel.c b/net/ipv6/seg6_iptunnel.c index e756ba705fd9..34db881204d2 100644 --- a/net/ipv6/seg6_iptunnel.c +++ b/net/ipv6/seg6_iptunnel.c @@ -36,9 +36,11 @@ static size_t seg6_lwt_headroom(struct seg6_iptunnel_encap *tuninfo) case SEG6_IPTUN_MODE_INLINE: break; case SEG6_IPTUN_MODE_ENCAP: + case SEG6_IPTUN_MODE_ENCAP_RED: head = sizeof(struct ipv6hdr); break; case SEG6_IPTUN_MODE_L2ENCAP: + case SEG6_IPTUN_MODE_L2ENCAP_RED: return 0; } @@ -197,6 +199,124 @@ int seg6_do_srh_encap(struct sk_buff *skb, struct ipv6_sr_hdr *osrh, int proto) } EXPORT_SYMBOL_GPL(seg6_do_srh_encap); +/* encapsulate an IPv6 packet within an outer IPv6 header with reduced SRH */ +static int seg6_do_srh_encap_red(struct sk_buff *skb, + struct ipv6_sr_hdr *osrh, int proto) +{ + __u8 first_seg = osrh->first_segment; + struct dst_entry *dst = skb_dst(skb); + struct net *net = dev_net(dst->dev); + struct ipv6hdr *hdr, *inner_hdr; + int hdrlen = ipv6_optlen(osrh); + int red_tlv_offset, tlv_offset; + struct ipv6_sr_hdr *isrh; + bool skip_srh = false; + __be32 flowlabel; + int tot_len, err; + int red_hdrlen; + int tlvs_len; + + if (first_seg > 0) { + red_hdrlen = hdrlen - sizeof(struct in6_addr); + } else { + /* NOTE: if tag/flags and/or other TLVs are introduced in the + * seg6_iptunnel infrastructure, they should be considered when + * deciding to skip the SRH. + */ + skip_srh = !sr_has_hmac(osrh); + + red_hdrlen = skip_srh ? 0 : hdrlen; + } + + tot_len = red_hdrlen + sizeof(struct ipv6hdr); + + err = skb_cow_head(skb, tot_len + skb->mac_len); + if (unlikely(err)) + return err; + + inner_hdr = ipv6_hdr(skb); + flowlabel = seg6_make_flowlabel(net, skb, inner_hdr); + + skb_push(skb, tot_len); + skb_reset_network_header(skb); + skb_mac_header_rebuild(skb); + hdr = ipv6_hdr(skb); + + /* based on seg6_do_srh_encap() */ + if (skb->protocol == htons(ETH_P_IPV6)) { + ip6_flow_hdr(hdr, ip6_tclass(ip6_flowinfo(inner_hdr)), + flowlabel); + hdr->hop_limit = inner_hdr->hop_limit; + } else { + ip6_flow_hdr(hdr, 0, flowlabel); + hdr->hop_limit = ip6_dst_hoplimit(skb_dst(skb)); + + memset(IP6CB(skb), 0, sizeof(*IP6CB(skb))); + IP6CB(skb)->iif = skb->skb_iif; + } + + /* no matter if we have to skip the SRH or not, the first segment + * always comes in the pushed IPv6 header. + */ + hdr->daddr = osrh->segments[first_seg]; + + if (skip_srh) { + hdr->nexthdr = proto; + + set_tun_src(net, dst->dev, &hdr->daddr, &hdr->saddr); + goto out; + } + + /* we cannot skip the SRH, slow path */ + + hdr->nexthdr = NEXTHDR_ROUTING; + isrh = (void *)hdr + sizeof(struct ipv6hdr); + + if (unlikely(!first_seg)) { + /* this is a very rare case; we have only one SID but + * we cannot skip the SRH since we are carrying some + * other info. + */ + memcpy(isrh, osrh, hdrlen); + goto srcaddr; + } + + tlv_offset = sizeof(*osrh) + (first_seg + 1) * sizeof(struct in6_addr); + red_tlv_offset = tlv_offset - sizeof(struct in6_addr); + + memcpy(isrh, osrh, red_tlv_offset); + + tlvs_len = hdrlen - tlv_offset; + if (unlikely(tlvs_len > 0)) { + const void *s = (const void *)osrh + tlv_offset; + void *d = (void *)isrh + red_tlv_offset; + + memcpy(d, s, tlvs_len); + } + + --isrh->first_segment; + isrh->hdrlen -= 2; + +srcaddr: + isrh->nexthdr = proto; + set_tun_src(net, dst->dev, &hdr->daddr, &hdr->saddr); + +#ifdef CONFIG_IPV6_SEG6_HMAC + if (unlikely(!skip_srh && sr_has_hmac(isrh))) { + err = seg6_push_hmac(net, &hdr->saddr, isrh); + if (unlikely(err)) + return err; + } +#endif + +out: + hdr->payload_len = htons(skb->len - sizeof(struct ipv6hdr)); + + skb_postpush_rcsum(skb, hdr, tot_len); + + return 0; +} + /* insert an SRH within an IPv6 packet, just after the IPv6 header */ int seg6_do_srh_inline(struct sk_buff *skb, struct ipv6_sr_hdr *osrh) { @@ -269,6 +389,7 @@ static int seg6_do_srh(struct sk_buff *skb) return err; break; case SEG6_IPTUN_MODE_ENCAP: + case SEG6_IPTUN_MODE_ENCAP_RED: err = iptunnel_handle_offloads(skb, SKB_GSO_IPXIP6); if (err) return err; @@ -280,7 +401,11 @@ static int seg6_do_srh(struct sk_buff *skb) else return -EINVAL; - err = seg6_do_srh_encap(skb, tinfo->srh, proto); + if (tinfo->mode == SEG6_IPTUN_MODE_ENCAP) + err = seg6_do_srh_encap(skb, tinfo->srh, proto); + else + err = seg6_do_srh_encap_red(skb, tinfo->srh, proto); + if (err) return err; @@ -289,6 +414,7 @@ static int seg6_do_srh(struct sk_buff *skb) skb->protocol = htons(ETH_P_IPV6); break; case SEG6_IPTUN_MODE_L2ENCAP: + case SEG6_IPTUN_MODE_L2ENCAP_RED: if (!skb_mac_header_was_set(skb)) return -EINVAL; @@ -298,7 +424,13 @@ static int seg6_do_srh(struct sk_buff *skb) skb_mac_header_rebuild(skb); skb_push(skb, skb->mac_len); - err = seg6_do_srh_encap(skb, tinfo->srh, IPPROTO_ETHERNET); + if (tinfo->mode == SEG6_IPTUN_MODE_L2ENCAP) + err = seg6_do_srh_encap(skb, tinfo->srh, + IPPROTO_ETHERNET); + else + err = seg6_do_srh_encap_red(skb, tinfo->srh, + IPPROTO_ETHERNET); + if (err) return err; @@ -517,6 +649,10 @@ static int seg6_build_state(struct net *net, struct nlattr *nla, break; case SEG6_IPTUN_MODE_L2ENCAP: break; + case SEG6_IPTUN_MODE_ENCAP_RED: + break; + case SEG6_IPTUN_MODE_L2ENCAP_RED: + break; default: return -EINVAL; } |