summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2026-02-06netfilter: nft_set_rbtree: fix bogus EEXIST with NLM_F_CREATE with null intervalPablo Neira Ayuso2-0/+18
Userspace adds a non-matching null element to the kernel for historical reasons. This null element is added when the set is populated with elements. Inclusion of this element is conditional, therefore, userspace needs to dump the set content to check for its presence. If the NLM_F_CREATE flag is turned on, this becomes an issue because kernel bogusly reports EEXIST. Add special case to ignore NLM_F_CREATE in this case, therefore, re-adding the nul-element never fails. Fixes: c016c7e45ddf ("netfilter: nf_tables: honor NLM_F_EXCL flag in set element insertion") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
2026-02-06netfilter: nft_counter: fix reset of counters on 32bit archsAnders Grahn2-2/+12
nft_counter_reset() calls u64_stats_add() with a negative value to reset the counter. This will work on 64bit archs, hence the negative value added will wrap as a 64bit value which then can wrap the stat counter as well. On 32bit archs, the added negative value will wrap as a 32bit value and _not_ wrapping the stat counter properly. In most cases, this would just lead to a very large 32bit value being added to the stat counter. Fix by introducing u64_stats_sub(). Fixes: 4a1d3acd6ea8 ("netfilter: nft_counter: Use u64_stats_t for statistic.") Signed-off-by: Anders Grahn <anders.grahn@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de>
2026-02-06netfilter: nft_set_hash: fix get operation on big endianFlorian Westphal1-2/+7
tests/shell/testcases/packetpath/set_match_nomatch_hash_fast fails on big endian with: Error: Could not process rule: No such file or directory reset element ip test s { 244.147.90.126 } ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Fatal: Cannot fetch element "244.147.90.126" ... because the wrong bucket is searched, jhash() and jhash1_word are not interchangeable on big endian. Fixes: 3b02b0adc242 ("netfilter: nft_set_hash: fix lookups with fixed size hash on big endian") Signed-off-by: Florian Westphal <fw@strlen.de>
2026-02-06selftests: netfilter: add IPV6_TUNNEL to configFlorian Westphal2-6/+14
The script now requires IPV6 tunnel support, enable this. This should have caught by CI, but as the config option is missing, the tunnel interface isn't added. This results in an error cascade that ends with "route change default" failure. That in turn means the "ipv6 tunnel" test re-uses the previous test setup so the "ip6ip6" test passes and script returns 0. Make sure to catch such bugs, set ret=1 if device cannot be added and delete the old default route before installing the new one. After this change, IPV6_TUNNEL=n kernel builds fail with the expected FAIL: flow offload for ns1/ns2 with IP6IP6 tunnel ... while builds with IPV6_TUNNEL=m pass as before. Fixes: 5e5180352193 ("selftests: netfilter: nft_flowtable.sh: Add IP6IP6 flowtable selftest") Acked-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Florian Westphal <fw@strlen.de>
2026-02-06netfilter: flowtable: dedicated slab for flow entryQingfang Deng1-2/+10
The size of `struct flow_offload` has grown beyond 256 bytes on 64-bit kernels (currently 280 bytes) because of the `flow_offload_tunnel` member added recently. So kmalloc() allocates from the kmalloc-512 slab, causing significant memory waste per entry. Introduce a dedicated slab cache for flow entries to reduce memory footprint. Results in a reduction from 512 bytes to 320 bytes per entry on x86_64 kernels. Signed-off-by: Qingfang Deng <dqfext@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de>
2026-02-06selftests: netfilter: nft_queue.sh: add udp fraglist gro test caseFlorian Westphal1-6/+136
Without the preceding patch, this fails with: FAIL: test_udp_gro_ct: Expected udp conntrack entry FAIL: test_udp_gro_ct: Expected software segmentation to occur, had 10 and 0 Signed-off-by: Florian Westphal <fw@strlen.de>
2026-02-06netfilter: nfnetlink_queue: do shared-unconfirmed check before segmentationFlorian Westphal2-49/+75
Ulrich reports a regression with nfqueue: If an application did not set the 'F_GSO' capability flag and a gso packet with an unconfirmed nf_conn entry is received all packets are now dropped instead of queued, because the check happens after skb_gso_segment(). In that case, we did have exclusive ownership of the skb and its associated conntrack entry. The elevated use count is due to skb_clone happening via skb_gso_segment(). Move the check so that its peformed vs. the aggregated packet. Then, annotate the individual segments except the first one so we can do a 2nd check at reinject time. For the normal case, where userspace does in-order reinjects, this avoids packet drops: first reinjected segment continues traversal and confirms entry, remaining segments observe the confirmed entry. While at it, simplify nf_ct_drop_unconfirmed(): We only care about unconfirmed entries with a refcnt > 1, there is no need to special-case dying entries. This only happens with UDP. With TCP, the only unconfirmed packet will be the TCP SYN, those aren't aggregated by GRO. Next patch adds a udpgro test case to cover this scenario. Reported-by: Ulrich Weber <ulrich.weber@gmail.com> Fixes: 7d8dc1c7be8d ("netfilter: nf_queue: drop packets with cloned unconfirmed conntracks") Signed-off-by: Florian Westphal <fw@strlen.de>
2026-02-06netfilter: nft_set_rbtree: don't gc elements on insertFlorian Westphal1-68/+68
During insertion we can queue up expired elements for garbage collection. In case of later abort, the commit hook will never be called. Packet path and 'get' requests will find free'd elements in the binary search blob: nft_set_ext_key include/net/netfilter/nf_tables.h:800 [inline] nft_array_get_cmp+0x1f6/0x2a0 net/netfilter/nft_set_rbtree.c:133 __inline_bsearch include/linux/bsearch.h:15 [inline] bsearch+0x50/0xc0 lib/bsearch.c:33 nft_rbtree_get+0x16b/0x400 net/netfilter/nft_set_rbtree.c:169 nft_setelem_get net/netfilter/nf_tables_api.c:6495 [inline] nft_get_set_elem+0x420/0xaa0 net/netfilter/nf_tables_api.c:6543 nf_tables_getsetelem+0x448/0x5e0 net/netfilter/nf_tables_api.c:6632 nfnetlink_rcv_msg+0x8ae/0x12c0 net/netfilter/nfnetlink.c:290 Also, when we insert an element that triggers -EEXIST, and that insertion happens to also zap a timed-out entry, we end up with same issue: Neither commit nor abort hook is called. Fix this by removing gc api usage during insertion. The blamed commit also removes concurrency of the rbtree with the packet path, so we can now safely rb_erase() the element and move it to a new expired list that can be reaped in the commit hook before building the next blob iteration. This also avoids the need to rebuild the blob in the abort path: Expired elements seen during insertion attempts are kept around until a transaction passes. Reported-by: syzbot+d417922a3e7935517ef6@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=d417922a3e7935517ef6 Fixes: 7e43e0a1141d ("netfilter: nft_set_rbtree: translate rbtree to array for binary search") Signed-off-by: Florian Westphal <fw@strlen.de>
2026-02-06net/mlx5e: SHAMPO, Switch to header memcpyDragos Tatulea5-484/+188
Previously the HW-GRO code was using a separate page_pool for the header buffer. The pages of the header buffer were replenished via UMR. This mechanism has some drawbacks: - Reference counting on the page_pool page frags is not cheap. - UMRs have HW overhead for updating and also for access. Especially for the KLM type which was previously used. - UMR code for headers is complex. This patch switches to using a static memory area (static MTT MKEY) for the header buffer and does a header memcpy. This happens only once per GRO session. The SKB is allocated from the per-cpu NAPI SKB cache. Performance numbers for x86: +---------------------------------------------------------+ | Test | Baseline | Header Copy | Change | |---------------------+------------+-------------+--------| | iperf3 oncpu | 59.5 Gbps | 64.00 Gbps | 7 % | | iperf3 offcpu | 102.5 Gbps | 104.20 Gbps | 2 % | | kperf oncpu | 115.0 Gbps | 130.00 Gbps | 12 % | | XDP_DROP (skb mode) | 3.9 Mpps | 3.9 Mpps | 0 % | +---------------------------------------------------------+ Notes on test: - System: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz - oncpu: NAPI and application running on same CPU - offcpu: NAPI and application running on different CPUs - MTU: 1500 - iperf3 tests are single stream, 60s with IPv6 (for slightly larger headers) - kperf version [1] [1] git://git.kernel.dk/kperf.git Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260204200345.1724098-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-06net/mlx5: Fix 1600G link mode enum namingYael Chemla4-4/+4
Rename TAUI/TBASE to GAUI/GBASE in 1600G link mode identifier and its usage in ethtool and link-info tables. Reported-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com> Signed-off-by: Yael Chemla <ychemla@nvidia.com> Reviewed-by: Shahar Shitrit <shshitrit@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com> Reported-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com> Signed-off-by: Yael Chemla <ychemla@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://patch.msgid.link/20260204194324.1723534-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski187-815/+1545
Cross-merge networking fixes after downstream PR (net-6.19-rc9). No adjacent changes, conflicts: drivers/net/ethernet/spacemit/k1_emac.c 3125fc1701694 ("net: spacemit: k1-emac: fix jumbo frame support") f66086798f91f ("net: spacemit: Remove broken flow control support") https://lore.kernel.org/aYIysFIE9ooavWia@sirena.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-05Merge tag 'net-6.19-rc9' of ↵Linus Torvalds38-234/+513
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from wireless and Netfilter. Previous releases - regressions: - eth: stmmac: fix stm32 (and potentially others) resume regression - nf_tables: fix inverted genmask check in nft_map_catchall_activate() - usb: r8152: fix resume reset deadlock - fix reporting RXH_XFRM_NO_CHANGE as input_xfrm for RSS contexts Previous releases - always broken: - sched: cls_u32: use skb_header_pointer_careful() to avoid OOB reads with malicious u32 rules - eth: ice: timestamping related fixes" * tag 'net-6.19-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (38 commits) ipv6: Fix ECMP sibling count mismatch when clearing RTF_ADDRCONF netfilter: nf_tables: fix inverted genmask check in nft_map_catchall_activate() net: cpsw: Execute ndo_set_rx_mode callback in a work queue net: cpsw_new: Execute ndo_set_rx_mode callback in a work queue gve: Correct ethtool rx_dropped calculation gve: Fix stats report corruption on queue count change selftest: net: add a test-case for encap segmentation after GRO net: gro: fix outer network offset net: add proper RCU protection to /proc/net/ptype net: ethernet: adi: adin1110: Check return value of devm_gpiod_get_optional() in adin1110_check_spi() wifi: iwlwifi: mvm: pause TCM on fast resume wifi: iwlwifi: mld: cancel mlo_scan_start_wk net: spacemit: k1-emac: fix jumbo frame support net: enetc: Convert 16-bit register reads to 32-bit for ENETC v4 net: enetc: Convert 16-bit register writes to 32-bit for ENETC v4 net: enetc: Remove CBDR cacheability AXI settings for ENETC v4 net: enetc: Remove SI/BDR cacheability AXI settings for ENETC v4 tipc: use kfree_sensitive() for session key material net: stmmac: fix stm32 (and potentially others) resume regression net: rss: fix reporting RXH_XFRM_NO_CHANGE as input_xfrm for contexts ...
2026-02-05net/sched: don't use dynamic lockdep keys with clsact/ingress/noqueueDavide Caratti3-6/+28
Currently we are registering one dynamic lockdep key for each allocated qdisc, to avoid false deadlock reports when mirred (or TC eBPF) redirects packets to another device while the root lock is acquired [1]. Since dynamic keys are a limited resource, we can save them at least for qdiscs that are not meant to acquire the root lock in the traffic path, or to carry traffic at all, like: - clsact - ingress - noqueue Don't register dynamic keys for the above schedulers, so that we hit MAX_LOCKDEP_KEYS later in our tests. [1] https://github.com/multipath-tcp/mptcp_net-next/issues/451 Changes in v2: - change ordering of spin_lock_init() vs. lockdep_register_key() (Jakub Kicinski) Signed-off-by: Davide Caratti <dcaratti@redhat.com> Link: https://patch.msgid.link/94448f7fa7c4f52d2ce416a4895ec87d456d7417.1770220576.git.dcaratti@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-05net: stmmac: imx: fix iMX93 register definitionsRussell King (Oracle)1-11/+14
When looking at the iMX93 documentation, the definitions in the driver do not correspond with the documentation, which makes the driver confusing. The driver, for example, re-uses a definition for bit 0 for two different registers, where this bit have completely different purposes. Fix this by renaming the second register, and adding a definition that reflects the true purpose of bit 0 in the first register (EQOS enable.) Replace MX93_GPR_ENET_QOS_INTF_MODE_MASK with MX93_GPR_ENET_QOS_ENABLE and MX93_GPR_ENET_QOS_INTF_SEL_MASK as MX93_GPR_ENET_QOS_INTF_MODE_MASK is not a register field. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1vnaGl-00000007i9f-0ZMw@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-05ipv6: change inet6_sk_rebuild_header() to use inet->cork.fl.u.ip6Eric Dumazet1-30/+27
TCP v6 spends a good amount of time rebuilding a fresh fl6 at each transmit in inet6_csk_xmit()/inet6_csk_route_socket(). TCP v4 caches the information in inet->cork.fl.u.ip4 instead. This patch is a first step converting IPv6 to the same strategy: Before this patch inet6_sk_rebuild_header() only validated/rebuilt a dst. Automatic variable @fl6 content was lost. After this patch inet6_sk_rebuild_header() also initializes inet->cork.fl.u.ip6, which can be reused in the future. This makes inet6_sk_rebuild_header() very similar to inet_sk_rebuild_header(). Also remove the EXPORT_SYMBOL_GPL(), inet6_sk_rebuild_header() is not called from any module. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260204163035.4123817-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-05Merge branch ↵Jakub Kicinski5-138/+107
'tcp-remove-net-core-request_sock-c-and-no-longer-inline-__reqsk_free' Eric Dumazet says: ==================== tcp: remove net/core/request_sock.c and no longer inline __reqsk_free() After DCCP removal, net/core/request_sock.c makes no more sense. Move reqsk_queue_alloc() and reqsk_fastopen_remove() to TCP files. Then put __reqsk_free() out of line to save ~2 Kbytes of text. ==================== Link: https://patch.msgid.link/20260204055147.1682705-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-05tcp: move __reqsk_free() out of lineEric Dumazet2-8/+11
Inlining __reqsk_free() is overkill, let's reclaim 2 Kbytes of text. $ scripts/bloat-o-meter -t vmlinux.old vmlinux.new add/remove: 2/4 grow/shrink: 2/14 up/down: 225/-2338 (-2113) Function old new delta __reqsk_free - 114 +114 sock_edemux 18 82 +64 inet_csk_listen_start 233 264 +31 __pfx___reqsk_free - 16 +16 __pfx_reqsk_queue_alloc 16 - -16 __pfx_reqsk_free 16 - -16 reqsk_queue_alloc 46 - -46 tcp_req_err 272 177 -95 reqsk_fastopen_remove 348 253 -95 cookie_bpf_check 157 62 -95 cookie_tcp_reqsk_alloc 387 290 -97 cookie_v4_check 1568 1465 -103 reqsk_free 105 - -105 cookie_v6_check 1519 1412 -107 sock_gen_put 187 78 -109 sock_pfree 212 82 -130 tcp_try_fastopen 1818 1683 -135 tcp_v4_rcv 3478 3294 -184 reqsk_put 306 90 -216 tcp_get_cookie_sock 551 318 -233 tcp_v6_rcv 3404 3141 -263 tcp_conn_request 2677 2384 -293 Total: Before=24887415, After=24885302, chg -0.01% Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260204055147.1682705-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-05net: get rid of net/core/request_sock.cEric Dumazet2-19/+1
After DCCP removal, this file was not needed any more. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260204055147.1682705-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-05tcp: move reqsk_fastopen_remove to net/ipv4/tcp_fastopen.cEric Dumazet2-85/+86
This function belongs to TCP stack, not to net/core/request_sock.c We get rid of the now empty request_sock.c n the following patch. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260204055147.1682705-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-05inet: move reqsk_queue_alloc() to net/ipv4/inet_connection_sock.cEric Dumazet3-26/+9
Only called once from inet_csk_listen_start(), it can be static. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260204055147.1682705-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-05Merge branch 'net-stmmac-rk-final-cleanups-part'Jakub Kicinski1-211/+150
Russell King says: ==================== net: stmmac: rk: final cleanups part This is the last part of my current dwmac-rk cleanups. ==================== Link: https://patch.msgid.link/aYMN2gZMfLPKuukG@shell.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-05net: stmmac: rk: rk3506, rk3528 and rk3588 have rmii_mode in clock registerRussell King (Oracle)1-40/+24
rk3506, rk3528 and rk3588 have the rmii_mode bit in the clock GRF register rather than the gmac GRF register. Provide a mask for this field in the clock register, and convert these SoCs to use this. Add the necessary code in rk_gmac_powerup() to write this field. This allows us to get rid of these SoCs set_to_rmii() function. As such, we need to mark these SoCs as supporting RMII mode. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Heiko Stuebner <heiko@sntech.de> Tested-by: Heiko Stuebner <heiko@sntech.de> #px30,rk3328,rk3568,rk3588 Link: https://patch.msgid.link/E1vnYyB-00000007hpF-1dwK@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-05net: stmmac: rk: use rk_encode_wm16() for clock selectionRussell King (Oracle)1-99/+75
Use rk_encode_wm16() for RMII clock gating control, and also for the io_clksel bit used to select the transmit clock between CRU-derived and IO-derived clock sources. Both of these were configured via the "set_clock_selection" method in the SoC specific operations, but there is no requirement to change the io_clksel except when enabling clocks. It is also possible that we don't need to ungate the RMII clock if we are operating in RGMII mode, but this commit makes no change there. Split up the configuration of these as separate functions, and remove the set_clock_selection() method. Since these clocking bits are in the same register that we call the "speed" register, move the logic for writing that register into rk_write_speed_grf_reg(). Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Heiko Stuebner <heiko@sntech.de> Tested-by: Heiko Stuebner <heiko@sntech.de> #px30,rk3328,rk3568,rk3588 Link: https://patch.msgid.link/E1vnYy6-00000007hp9-1AJM@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-05net: stmmac: rk: rk3528: gmac0 only supports RMIIRussell King (Oracle)1-0/+1
RK3528 gmac0 dtsi contains: gmac0: ethernet@ffbd0000 { phy-handle = <&rmii0_phy>; phy-mode = "rmii"; mdio0: mdio { rmii0_phy: ethernet-phy@2 { phy-is-integrated; }; }; }; This follows the same pattern as rk3328, where this gmac instance only supports RMII. Disable RGMII in phylink's supported_interfaces mask for this gmac instance. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Heiko Stuebner <heiko@sntech.de> Link: https://patch.msgid.link/E1vnYy1-00000007hp3-0hKm@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-05net: stmmac: rk: rk3328: gmac2phy only supports RMIIRussell King (Oracle)1-0/+1
As detailed in a previous commit ("net: stmmac: rk: convert rk3328 to use bsp_priv->id") rk3328 gmac2phy only supports RMII, whereas gmac2io supports both RMII and RGMII. Clear supports_rgmii for gmac2phy. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Heiko Stuebner <heiko@sntech.de> Tested-by: Heiko Stuebner <heiko@sntech.de> #px30,rk3328 gmac2io,rk3568,rk3588 Link: https://patch.msgid.link/E1vnYxw-00000007hox-0DqH@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-05net: stmmac: rk: replace empty set_to_rmii() with supports_rmiiRussell King (Oracle)1-62/+24
Rather than providing a now-empty set_to_rmii() method to indicate that RMII is supported, switch to setting ops->supports_rmii instead. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Heiko Stuebner <heiko@sntech.de> Tested-by: Heiko Stuebner <heiko@sntech.de> #px30,rk3328,rk3568,rk3588 Link: https://patch.msgid.link/E1vnYxq-00000007hor-3yXt@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-05net: stmmac: rk: introduce flags indicating support for RGMII/RMIIRussell King (Oracle)1-10/+25
Introduce two boolean flags into struct rk_priv_data indicating whether RGMII and/or RMII is supported for this instance. Use these to configure the supported_interfaces mask for phylink, validate the interface mode. Initialise these from equivalent flags in the rk_gmac_ops or depending on the presence of the ops->set_to_rgmii and ops->set_to_mii methods. Finally, make ops->set_to_* optional. This will allow us to get rid of empty set_to_rmii() methods. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Heiko Stuebner <heiko@sntech.de> Tested-by: Heiko Stuebner <heiko@sntech.de> #px30,rk3328,rk3568,rk3588 Link: https://patch.msgid.link/E1vnYxl-00000007hol-3XiH@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-05ipv6: Fix ECMP sibling count mismatch when clearing RTF_ADDRCONFShigeru Yoshida1-1/+2
syzbot reported a kernel BUG in fib6_add_rt2node() when adding an IPv6 route. [0] Commit f72514b3c569 ("ipv6: clear RA flags when adding a static route") introduced logic to clear RTF_ADDRCONF from existing routes when a static route with the same nexthop is added. However, this causes a problem when the existing route has a gateway. When RTF_ADDRCONF is cleared from a route that has a gateway, that route becomes eligible for ECMP, i.e. rt6_qualify_for_ecmp() returns true. The issue is that this route was never added to the fib6_siblings list. This leads to a mismatch between the following counts: - The sibling count computed by iterating fib6_next chain, which includes the newly ECMP-eligible route - The actual siblings in fib6_siblings list, which does not include that route When a subsequent ECMP route is added, fib6_add_rt2node() hits BUG_ON(sibling->fib6_nsiblings != rt->fib6_nsiblings) because the counts don't match. Fix this by only clearing RTF_ADDRCONF when the existing route does not have a gateway. Routes without a gateway cannot qualify for ECMP anyway (rt6_qualify_for_ecmp() requires fib_nh_gw_family), so clearing RTF_ADDRCONF on them is safe and matches the original intent of the commit. [0]: kernel BUG at net/ipv6/ip6_fib.c:1217! Oops: invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 0 UID: 0 PID: 6010 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025 RIP: 0010:fib6_add_rt2node+0x3433/0x3470 net/ipv6/ip6_fib.c:1217 [...] Call Trace: <TASK> fib6_add+0x8da/0x18a0 net/ipv6/ip6_fib.c:1532 __ip6_ins_rt net/ipv6/route.c:1351 [inline] ip6_route_add+0xde/0x1b0 net/ipv6/route.c:3946 ipv6_route_ioctl+0x35c/0x480 net/ipv6/route.c:4571 inet6_ioctl+0x219/0x280 net/ipv6/af_inet6.c:577 sock_do_ioctl+0xdc/0x300 net/socket.c:1245 sock_ioctl+0x576/0x790 net/socket.c:1366 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:597 [inline] __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:583 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xfa/0xf80 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f Fixes: f72514b3c569 ("ipv6: clear RA flags when adding a static route") Reported-by: syzbot+cb809def1baaac68ab92@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=cb809def1baaac68ab92 Tested-by: syzbot+cb809def1baaac68ab92@syzkaller.appspotmail.com Signed-off-by: Shigeru Yoshida <syoshida@redhat.com> Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de> Link: https://patch.msgid.link/20260204095837.1285552-1-syoshida@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-05Merge tag 'nf-26-02-05' of ↵Jakub Kicinski1-1/+1
https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Florian Westphal says: ==================== netfilter: update for net This is one last-minute crash fix for nf_tables, from Andrew Fasano: Logical check is inverted, this makes kernel fail to correctly undo the transaction, leading to a use-after-free. * tag 'nf-26-02-05' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_tables: fix inverted genmask check in nft_map_catchall_activate() ==================== Link: https://patch.msgid.link/20260205074450.3187-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-05net: add vlan_get_protocol_offset_inline() helperEric Dumazet3-31/+61
skb_protocol() is bloated, and forces slow stack canaries in many fast paths. Add vlan_get_protocol_offset_inline() which deals with the non-vlan common cases. __vlan_get_protocol_offset() is now out of line. It returns a vlan_type_depth struct to avoid stack canaries in callers. struct vlan_type_depth { __be16 type; u16 depth; }; $ scripts/bloat-o-meter -t vmlinux.old vmlinux.new add/remove: 0/2 grow/shrink: 0/22 up/down: 0/-6320 (-6320) Function old new delta vlan_get_protocol_dgram 61 59 -2 __pfx_skb_protocol 16 - -16 __vlan_get_protocol_offset 307 273 -34 tap_get_user 1374 1207 -167 ip_md_tunnel_xmit 1625 1452 -173 tap_sendmsg 940 753 -187 netif_skb_features 1079 866 -213 netem_enqueue 3017 2800 -217 vlan_parse_protocol 271 50 -221 tso_start 567 344 -223 fq_dequeue 1908 1685 -223 skb_network_protocol 434 205 -229 ip6_tnl_xmit 2639 2409 -230 br_dev_queue_push_xmit 474 236 -238 skb_protocol 258 - -258 packet_parse_headers 621 357 -264 __ip6_tnl_rcv 1306 1039 -267 skb_csum_hwoffload_help 515 224 -291 ip_tunnel_xmit 2635 2339 -296 sch_frag_xmit_hook 1582 1233 -349 bpf_skb_ecn_set_ce 868 457 -411 IP6_ECN_decapsulate 1297 768 -529 ip_tunnel_rcv 2121 1489 -632 ipip6_rcv 2572 1922 -650 Total: Before=24892803, After=24886483, chg -0.03% Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260204053023.1622775-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-05flow_offload: add const qualifiers to function argumentsDavid Yang1-2/+2
Some functions do not modify the pointed-to data, but lack const qualifiers. Add const qualifiers to the arguments of flow_rule_match_has_control_flags() and flow_cls_offload_flow_rule(). Signed-off-by: David Yang <mmyangfl@gmail.com> Link: https://patch.msgid.link/20260204052839.198602-1-mmyangfl@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-05Merge branch 'dpll-core-improvements-and-ice-e825-c-synce-support'Paolo Abeni18-150/+1350
Ivan Vecera says: ==================== dpll: Core improvements and ice E825-C SyncE support This series introduces Synchronous Ethernet (SyncE) support for the Intel E825-C Ethernet controller. Unlike previous generations where DPLL connections were implicitly assumed, the E825-C architecture relies on the platform firmware (ACPI) to describe the physical connections between the Ethernet controller and external DPLLs (such as the ZL3073x). To accommodate this, the series extends the DPLL subsystem to support firmware node (fwnode) associations, asynchronous discovery via notifiers, and dynamic pin management. Additionally, a significant refactor of the DPLL reference counting logic is included to ensure robustness and debuggability. DPLL Core Extensions: * Firmware Node Association: Pins can now be associated with a struct fwnode_handle after allocation via dpll_pin_fwnode_set(). This allows drivers to link pin objects with their corresponding DT/ACPI nodes. * Asynchronous Notifiers: A raw notifier chain is added to the DPLL core. This allows the Ethernet driver to subscribe to events and react when the platform DPLL driver registers the parent pins, resolving probe ordering dependencies. * Dynamic Indexing: Drivers can now request DPLL_PIN_IDX_UNSPEC to have the core automatically allocate a unique pin index. Reference Counting & Debugging: * Refactor: The reference counting logic in the core is consolidated. Internal list management helpers now automatically handle hold/put operations, removing fragile open-coded logic in the registration paths. * Reference Tracking: A new Kconfig option DPLL_REFCNT_TRACKER is added. This allows developers to instrument and debug reference leaks by recording stack traces for every get/put operation. Driver Updates: * zl3073x: Updated to associate pins with fwnode handles using the new setter and support the 'mux' pin type. * ice: Implements the E825-C specific hardware configuration for SyncE (CGU registers). It utilizes the new notifier and fwnode APIs to dynamically discover and attach to the platform DPLLs. Patch Summary: Patch 1: DPLL Core (fwnode association). Patch 2: Driver zl3073x (Set fwnode). Patch 3-4: DPLL Core (Notifiers and dynamic IDs). Patch 5: Driver zl3073x (Mux type). Patch 6: DPLL Core (Refcount refactor). Patch 7-8: Refcount tracking infrastructure and driver updates. Patch 9: Driver ice (E825-C SyncE logic). ==================== Link: https://patch.msgid.link/20260203174002.705176-1-ivecera@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-05ice: dpll: Support E825-C SyncE and dynamic pin discoveryArkadiusz Kubalewski8-92/+959
Implement SyncE support for the E825-C Ethernet controller using the DPLL subsystem. Unlike E810, the E825-C architecture relies on platform firmware (ACPI) to describe connections between the NIC's recovered clock outputs and external DPLL inputs. Implement the following mechanisms to support this architecture: 1. Discovery Mechanism: The driver parses the 'dpll-pins' and 'dpll-pin names' firmware properties to identify the external DPLL pins (parents) corresponding to its RCLK outputs ("rclk0", "rclk1"). It uses fwnode_dpll_pin_find() to locate these parent pins in the DPLL core. 2. Asynchronous Registration: Since the platform DPLL driver (e.g. zl3073x) may probe independently of the network driver, utilize the DPLL notifier chain The driver listens for DPLL_PIN_CREATED events to detect when the parent MUX pins become available, then registers its own Recovered Clock (RCLK) pins as children of those parents. 3. Hardware Configuration: Implement the specific register access logic for E825-C CGU (Clock Generation Unit) registers (R10, R11). This includes configuring the bypass MUXes and clock dividers required to drive SyncE signals. 4. Split Initialization: Refactor `ice_dpll_init()` to separate the static initialization path of E810 from the dynamic, firmware-driven path required for E825-C. Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Co-developed-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Co-developed-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Link: https://patch.msgid.link/20260203174002.705176-10-ivecera@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-05drivers: Add support for DPLL reference count trackingIvan Vecera6-26/+41
Update existing DPLL drivers to utilize the DPLL reference count tracking infrastructure. Add dpll_tracker fields to the drivers' internal device and pin structures. Pass pointers to these trackers when calling dpll_device_get/put() and dpll_pin_get/put(). This allows developers to inspect the specific references held by this driver via debugfs when CONFIG_DPLL_REFCNT_TRACKER is enabled, aiding in the debugging of resource leaks. Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Link: https://patch.msgid.link/20260203174002.705176-9-ivecera@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-05dpll: Add reference count tracking supportIvan Vecera8-54/+139
Add support for the REF_TRACKER infrastructure to the DPLL subsystem. When enabled, this allows developers to track and debug reference counting leaks or imbalances for dpll_device and dpll_pin objects. It records stack traces for every get/put operation and exposes this information via debugfs at: /sys/kernel/debug/ref_tracker/dpll_device_* /sys/kernel/debug/ref_tracker/dpll_pin_* The following API changes are made to support this: 1. dpll_device_get() / dpll_device_put() now accept a 'dpll_tracker *' (which is a typedef to 'struct ref_tracker *' when enabled, or an empty struct otherwise). 2. dpll_pin_get() / dpll_pin_put() and fwnode_dpll_pin_find() similarly accept the tracker argument. 3. Internal registration structures now hold a tracker to associate the reference held by the registration with the specific owner. All existing in-tree drivers (ice, mlx5, ptp_ocp, zl3073x) are updated to pass NULL for the new tracker argument, maintaining current behavior while enabling future debugging capabilities. Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Co-developed-by: Petr Oros <poros@redhat.com> Signed-off-by: Petr Oros <poros@redhat.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Link: https://patch.msgid.link/20260203174002.705176-8-ivecera@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-05dpll: Enhance and consolidate reference counting logicIvan Vecera1-24/+50
Refactor the reference counting mechanism for DPLL devices and pins to improve consistency and prevent potential lifetime issues. Introduce internal helpers __dpll_{device,pin}_{hold,put}() to centralize reference management. Update the internal XArray reference helpers (dpll_xa_ref_*) to automatically grab a reference to the target object when it is added to a list, and release it when removed. This ensures that objects linked internally (e.g., pins referenced by parent pins) are properly kept alive without relying on the caller to manually manage the count. Consequently, remove the now redundant manual `refcount_inc/dec` calls in dpll_pin_on_pin_{,un}register()`, as ownership is now correctly handled by the dpll_xa_ref_* functions. Additionally, ensure that dpll_device_{,un}register()` takes/releases a reference to the device, ensuring the device object remains valid for the duration of its registration. Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Link: https://patch.msgid.link/20260203174002.705176-7-ivecera@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-05dpll: zl3073x: Add support for mux pin typeIvan Vecera1-0/+2
Add parsing for the "mux" string in the 'connection-type' pin property mapping it to DPLL_PIN_TYPE_MUX. Recognizing this type in the driver allows these pins to be taken as parent pins for pin-on-pin pins coming from different modules (e.g. network drivers). Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Link: https://patch.msgid.link/20260203174002.705176-6-ivecera@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-05dpll: Support dynamic pin index allocationIvan Vecera2-2/+48
Allow drivers to register DPLL pins without manually specifying a pin index. Currently, drivers must provide a unique pin index when calling dpll_pin_get(). This works well for hardware-mapped pins but creates friction for drivers handling virtual pins or those without a strict hardware indexing scheme. Introduce DPLL_PIN_IDX_UNSPEC (U32_MAX). When a driver passes this value as the pin index: 1. The core allocates a unique index using an IDA 2. The allocated index is mapped to a range starting above `INT_MAX` This separation ensures that dynamically allocated indices never collide with standard driver-provided hardware indices, which are assumed to be within the `0` to `INT_MAX` range. The index is automatically freed when the pin is released in dpll_pin_put(). Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Link: https://patch.msgid.link/20260203174002.705176-5-ivecera@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-05dpll: Add notifier chain for dpll eventsPetr Oros4-0/+96
Currently, the DPLL subsystem reports events (creation, deletion, changes) to userspace via Netlink. However, there is no mechanism for other kernel components to be notified of these events directly. Add a raw notifier chain to the DPLL core protected by dpll_lock. This allows other kernel subsystems or drivers to register callbacks and receive notifications when DPLL devices or pins are created, deleted, or modified. Define the following: - Registration helpers: {,un}register_dpll_notifier() - Event types: DPLL_DEVICE_CREATED, DPLL_PIN_CREATED, etc. - Context structures: dpll_{device,pin}_notifier_info to pass relevant data to the listeners. The notification chain is invoked alongside the existing Netlink event generation to ensure in-kernel listeners are kept in sync with the subsystem state. Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Co-developed-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: Petr Oros <poros@redhat.com> Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Link: https://patch.msgid.link/20260203174002.705176-4-ivecera@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-05dpll: zl3073x: Associate pin with fwnode handleIvan Vecera1-0/+1
Associate the registered DPLL pin with its firmware node by calling dpll_pin_fwnode_set(). This links the created pin object to its corresponding DT/ACPI node in the DPLL core. Consequently, this enables consumer drivers (such as network drivers) to locate and request this specific pin using the fwnode_dpll_pin_find() helper. Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Link: https://patch.msgid.link/20260203174002.705176-3-ivecera@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-05dpll: Allow associating dpll pin with a firmware nodeIvan Vecera3-0/+62
Extend the DPLL core to support associating a DPLL pin with a firmware node. This association is required to allow other subsystems (such as network drivers) to locate and request specific DPLL pins defined in the Device Tree or ACPI. * Add a .fwnode field to the struct dpll_pin * Introduce dpll_pin_fwnode_set() helper to allow the provider driver to associate a pin with a fwnode after the pin has been allocated * Introduce fwnode_dpll_pin_find() helper to allow consumers to search for a registered DPLL pin using its associated fwnode handle * Ensure the fwnode reference is properly released in dpll_pin_put() Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Link: https://patch.msgid.link/20260203174002.705176-2-ivecera@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-05net/mlx5: Support devlink port state for host PFMoshe Shemesh5-12/+108
Add support for devlink port function state get/set operations for the host physical function (PF). Until now, mlx5 only allowed state get/set for subfunctions (SFs) ports. This change enables an administrator with eSwitch manager privileges to query or modify the host PF’s function state, allowing it to be explicitly inactivated or activated. While inactivated, the administrator can modify the functions attributes, such as enable/disable roce. $ devlink port show pci/0000:03:00.0/196608 pci/0000:03:00.0/196608: type eth netdev eth1 flavour pcipf controller 1 pfnum 0 external true splittable false function: hw_addr a0:88:c2:45:17:7c state active opstate attached roce enable max_io_eqs 120 $ devlink port function set pci/0000:03:00.0/196608 state inactive $ devlink port show pci/0000:03:00.0/196608 pci/0000:03:00.0/196608: type eth netdev eth1 flavour pcipf controller 1 pfnum 0 external true splittable false function: hw_addr a0:88:c2:45:17:7c state inactive opstate detached roce enable max_io_eqs 120 Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260203102402.1712218-1-tariqt@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-05Merge branch 'move-can-skb-headroom-content-to-skb-extensions'Paolo Abeni16-142/+287
Oliver Hartkopp says: ==================== move CAN skb headroom content to skb extensions CAN bus related skbuffs (ETH_P_CAN/ETH_P_CANFD/ETH_P_CANXL) simply contain CAN frame structs for CAN CC/FD/XL of skb->len length at skb->data. Those CAN skbs do not have network/mac/transport headers nor other such references for encapsulated protocols like ethernet/IP protocols. To store data for CAN specific use-cases all CAN bus related skbuffs are created with a 16 byte private skb headroom (struct can_skb_priv). Using the skb headroom and accessing skb->head for this private data led to several problems in the past likely due to "The struct can_skb_priv business is highly unconventional for the networking stack." [1] This patch set aims to remove the unconventional skb headroom usage for CAN bus related skbuffs and use the common skb extensions instead. [1] https://lore.kernel.org/linux-can/20260104074222.29e660ac@kernel.org/ - v1: https://patch.msgid.link/20260125201601.5018-1-socketcan@hartkopp.net - v2: https://lore.kernel.org/linux-can/20260128-can-skb-ext-v2-0-fe64aa152c8a@pengutronix.de/ - v4: https://lore.kernel.org/netdev/20260128-can_skb_ext-v1-0-330f60fd5d7e@hartkopp.net/ - v5: https://patch.msgid.link/20260129-can_skb_ext-v5-0-21252fdc8900@hartkopp.net - v6: https://patch.msgid.link/20260130-can_skb_ext-v6-0-8fceafab7f26@hartkopp.net - v7: https://patch.msgid.link/20260131-can_skb_ext-v7-0-dd0f8f84a83d@hartkopp.net Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> ==================== Link: https://patch.msgid.link/20260201-can_skb_ext-v8-0-3635d790fe8b@hartkopp.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-05can: gw: use can_gw_hops instead of sk_buff::csum_startOliver Hartkopp2-18/+7
As CAN skbs don't use IP checksums the skb->csum_start variable was used to store the can-gw CAN frame time-to-live counter together with skb->ip_summed set to CHECKSUM_UNNECESSARY. Remove the 'hack' using the skb->csum_start variable and move the content to can_skb_ext::can_gw_hops of the CAN skb extensions. The module parameter 'max_hops' has been reduced to a single byte to fit can_skb_ext::can_gw_hops as the maximum value to be stored is 6. Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://patch.msgid.link/20260201-can_skb_ext-v8-6-3635d790fe8b@hartkopp.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-05can: remove private CAN skb headroom infrastructureOliver Hartkopp7-77/+20
This patch removes struct can_skb_priv which was stored at skb->head and the can_skb_reserve() helper which was used to shift skb->head. Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://patch.msgid.link/20260201-can_skb_ext-v8-5-3635d790fe8b@hartkopp.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-05can: move frame_len to CAN skb extensionsOliver Hartkopp1-8/+21
The can_skb_priv::frame_len variable is used to cache a previous calculated CAN frame length to be passed to BQL queueing disciplines. Move the can_skb_priv::frame_len content to can_skb_ext::can_framelen. Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://patch.msgid.link/20260201-can_skb_ext-v8-4-3635d790fe8b@hartkopp.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-05can: move ifindex to CAN skb extensionsOliver Hartkopp7-15/+14
When routing CAN frames over different CAN interfaces the interface index skb->iif is overwritten with every single hop. To prevent sending a CAN frame back to its originating (first) incoming CAN interface another ifindex variable is needed, which was stored in can_skb_priv::ifindex. Move the can_skb_priv::ifindex content to can_skb_ext::can_iif. Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://patch.msgid.link/20260201-can_skb_ext-v8-3-3635d790fe8b@hartkopp.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-05can: add CAN skb extension infrastructureOliver Hartkopp15-14/+219
To remove the private CAN bus skb headroom infrastructure 8 bytes need to be stored in the skb. The skb extensions are a common pattern and an easy and efficient way to hold private data travelling along with the skb. We only need the skb_ext_add() and skb_ext_find() functions to allocate and access CAN specific content as the skb helpers to copy/clone/free skbs automatically take care of skb extensions and their final removal. This patch introduces the complete CAN skb extensions infrastructure: - add struct can_skb_ext in new file include/net/can.h - add include/net/can.h in MAINTAINERS - add SKB_EXT_CAN to skbuff.c and skbuff.h - select SKB_EXTENSIONS in Kconfig when CONFIG_CAN is enabled - check for existing CAN skb extensions in can_rcv() in af_can.c - add CAN skb extensions allocation at every skb_alloc() location - duplicate the skb extensions if cloning outgoing skbs (framelen/gw_hops) - introduce can_skb_ext_add() and can_skb_ext_find() helpers The patch also corrects an indention issue in the original code from 2018: Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202602010426.PnGrYAk3-lkp@intel.com/ Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://patch.msgid.link/20260201-can_skb_ext-v8-2-3635d790fe8b@hartkopp.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-05can: use skb hash instead of private variable in headroomOliver Hartkopp9-19/+15
The can_skb_priv::skbcnt variable is used to identify CAN skbs in the RX path analogue to the skb->hash. As the skb hash is not filled in CAN skbs move the private skbcnt value to skb->hash and set skb->sw_hash accordingly. The skb->hash is a value used for RPS to identify skbs. Use it as intended. Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://patch.msgid.link/20260201-can_skb_ext-v8-1-3635d790fe8b@hartkopp.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-05MAINTAINERS: Remove myself from TC maintainersCong Wang1-1/+0
Recently TC maintainer Jamal intentionally a broke reasonable use case: https://lore.kernel.org/netdev/aG10rqwjX6elG1Gx@pop-os.localdomain/ Although I tried my best to help by: 1) Strongly objecting this breakage from the very beginning 2) Reverting it and offering a much better solution 3) Offering Jamal for video chat on 8 Jul 2025 and 26 Nov 2025 None of them worked. So it makes no sense for me to continue caring about this subsystem. Most importantly, intentionally breaking reasonable use cases is against my moral, I don't want to get ashamed. Thanks for the opportunity! Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Link: https://patch.msgid.link/20260130212021.46610-1-xiyou.wangcong@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>