Age, Commit message, Author, Files, Lines
2025-01-08i40e: add ability to reset VF for Tx and Rx MDD eventsAleksandr Loktionov7-17/+123
Implement "mdd-auto-reset-vf" priv-flag to handle Tx and Rx MDD events for VFs. This flag is also used in other network adapters like ICE.

Usage:
- "on" - The problematic VF will be automatically reset if a malformed descriptor is detected.
- "off" - The problematic VF will be disabled.

In cases where a VF sends malformed packets classified as malicious, it can cause the Tx queue to freeze, rendering it unusable for several minutes. When an MDD event occurs, this new implementation allows for a graceful VF reset to quickly restore operational state.

Currently, VF queues are disabled if an MDD event occurs. This patch adds the ability to reset the VF if a Tx or Rx MDD event occurs. It also includes MDD event logging throttling to avoid dmesg pollution and unifies the format of Tx and Rx MDD messages.

Note: Standard message rate limiting functions like dev_info_ratelimited() do not meet our requirements. Custom rate limiting is implemented, please see the code for details.

Co-developed-by: Jan Sokolowski <jan.sokolowski@intel.com>
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Co-developed-by: Padraig J Connolly <padraig.j.connolly@intel.com>
Signed-off-by: Padraig J Connolly <padraig.j.connolly@intel.com>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Michal Schmidt <mschmidt@redhat.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20250106221929.956999-13-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-08ixgbevf: Fix passing 0 to ERR_PTR in ixgbevf_run_xdp()Yue Haibing1-13/+10
ixgbevf_run_xdp() converts a custom XDP action into a negative error code carried in an sk_buff pointer, which is then checked with IS_ERR() in ixgbevf_clean_rx_irq(). Remove this error-pointer handling and use a plain int return value instead. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20250106221929.956999-12-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-08ixgbe: Fix passing 0 to ERR_PTR in ixgbe_run_xdp()Yue Haibing1-14/+9
ixgbe_run_xdp() converts a custom XDP action into a negative error code carried in an sk_buff pointer, which is then checked with IS_ERR() in ixgbe_clean_rx_irq(). Remove this error-pointer handling and use a plain int return value instead. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20250106221929.956999-11-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-08igb: Fix passing 0 to ERR_PTR in igb_run_xdp()Yue Haibing1-14/+8
igb_run_xdp() converts a custom XDP action into a negative error code carried in an sk_buff pointer, which is then checked with IS_ERR() in igb_clean_rx_irq(). Remove this error-pointer handling and use a plain int return value instead. Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20250106221929.956999-10-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-08igc: Fix passing 0 to ERR_PTR in igc_xdp_run_prog()Yue Haibing1-13/+7
igc_xdp_run_prog() converts a custom XDP action into a negative error code carried in an sk_buff pointer, which is then checked with IS_ERR() in igc_clean_rx_irq(). Remove this error-pointer handling and use a plain int return value instead, to fix this smatch warning: drivers/net/ethernet/intel/igc/igc_main.c:2533 igc_xdp_run_prog() warn: passing zero to 'ERR_PTR' Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Avigail Dahan <avigailx.dahan@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20250106221929.956999-9-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
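Editor's note: the four "passing 0 to ERR_PTR" fixes above share one underlying issue, which can be sketched in userspace C. The helpers below are simplified stand-ins modeled on the kernel's ERR_PTR()/IS_ERR(); the result codes and the run_xdp_old()/run_xdp_new() functions are hypothetical and are not taken from any of the drivers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline bool IS_ERR(const void *ptr)
{
	/* Only the top MAX_ERRNO addresses count as encoded errors. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical result codes; 0 means "pass the frame up the stack". */
enum { XDP_RES_PASS = 0, XDP_RES_CONSUMED = 1 };

/* Old pattern: the result is encoded into a pointer as ERR_PTR(-result).
 * For result == 0 this is ERR_PTR(0), i.e. plain NULL, which is what
 * smatch warns about.
 */
static void *run_xdp_old(int result)
{
	return ERR_PTR(-result);
}

/* New pattern: the result is returned as a plain int, no pointer tricks. */
static int run_xdp_new(int result)
{
	return result;
}
```

With these definitions, run_xdp_old(XDP_RES_PASS) is NULL and IS_ERR() reports false for it, so the "pass" case is indistinguishable from an ordinary null pointer. Returning an int avoids the encoding altogether.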
2025-01-08igc: Allow hot-swapping XDP programSong Yoong Siang1-2/+4
Currently, the driver would always close and reopen the network interface when setting/removing the XDP program, regardless of the presence of XDP resources. This could cause unnecessary disruptions. To avoid this, introduce a check to determine if there is a need to close and reopen the interface, allowing for seamless hot-swapping of XDP programs. Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Avigail Dahan <avigailx.dahan@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20250106221929.956999-8-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-08igb: Add AF_XDP zero-copy Tx supportSriram Yagnaraman3-10/+116
Add support for AF_XDP zero-copy transmit path. A new TX buffer type IGB_TYPE_XSK is introduced to indicate that the Tx frame was allocated from the xsk buff pool, so igb_clean_tx_ring() and igb_clean_tx_irq() can clean the buffers correctly based on type. igb_xmit_zc() performs the actual packet transmit when AF_XDP zero-copy is enabled. We share the TX ring between slow path, XDP and AF_XDP zero-copy, so we use the netdev queue lock to ensure mutual exclusion. Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech> [Kurt: Set olinfo_status in igb_xmit_zc() so that frames are transmitted, Use READ_ONCE() for xsk_pool and check Tx disabled and carrier in igb_xmit_zc(), Add FIXME for RS bit] Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20250106221929.956999-7-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-08igb: Add AF_XDP zero-copy Rx supportSriram Yagnaraman3-19/+360
Add support for AF_XDP zero-copy receive path. When AF_XDP zero-copy is enabled, the rx buffers are allocated from the xsk buff pool using igb_alloc_rx_buffers_zc(). Use xsk_pool_get_rx_frame_size() to set SRRCTL rx buf size when zero-copy is enabled. Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech> [Kurt: Port to v6.12 and provide napi_id for xdp_rxq_info_reg(), RCT, remove NETDEV_XDP_ACT_XSK_ZEROCOPY, update NTC handling, READ_ONCE() xsk_pool, likelyfy for XDP_REDIRECT case] Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20250106221929.956999-6-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-08igb: Add XDP finalize and stats update functionsKurt Kanzenbach2-19/+38
Move XDP finalize and Rx statistics update into separate functions. This way, they can be reused by the XDP and XDP/ZC code later. Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20250106221929.956999-5-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-08igb: Introduce XSK data structures and helpersSriram Yagnaraman4-2/+229
Add the following ring flag: - IGB_RING_FLAG_TX_DISABLED (when xsk pool is being setup) Add a xdp_buff array for use with XSK receive batch API, and a pointer to xsk_pool in igb_adapter. Add enable/disable functions for TX and RX rings. Add enable/disable functions for XSK pool. Add xsk wakeup function. None of the above functionality will be active until NETDEV_XDP_ACT_XSK_ZEROCOPY is advertised in netdev->xdp_features. Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech> [Kurt: Add READ/WRITE_ONCE(), synchronize_net(), remove IGB_RING_FLAG_AF_XDP_ZC] Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20250106221929.956999-4-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-08igb: Introduce igb_xdp_is_enabled()Sriram Yagnaraman2-3/+10
Introduce igb_xdp_is_enabled() to check if an XDP program is assigned to the device. Use that wherever xdp_prog is read and evaluated. Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech> [Kurt: Split patches and use READ_ONCE()] Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20250106221929.956999-3-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-08igb: Remove static qualifiersSriram Yagnaraman2-31/+35
Remove static qualifiers on the following functions to be able to call from XSK specific file that is added in the later patches: - igb_xdp_tx_queue_mapping() - igb_xdp_ring_update_tail() - igb_clean_tx_ring() - igb_clean_rx_ring() - igb_xdp_xmit_back() - igb_process_skb_fields() While at it, inline igb_xdp_tx_queue_mapping() and igb_xdp_ring_update_tail(). These functions are small enough and used in XDP hot paths. Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech> [Kurt: Split patches, inline small XDP functions] Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20250106221929.956999-2-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-08Merge branch 'tools-ynl-decode-link-types-present-in-tests'Jakub Kicinski4-40/+152
Jakub Kicinski says: ==================== tools: ynl: decode link types present in tests Using a kernel built for the net selftest target to run drivers/net tests currently fails, because the net kernel automatically spawns a handful of tunnel devices which YNL can't decode. Fill in those missing link types in rt_link. We need to extend subset support a bit for it to work. v1: https://lore.kernel.org/20250105012523.1722231-1-kuba@kernel.org ==================== Link: https://patch.msgid.link/20250107022820.2087101-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-08netlink: specs: rt_link: decode ip6tnl, vti and vti6 link attrsJakub Kicinski1-0/+87
Some of our tests load vti and ip6tnl so not being able to decode the link attrs gets in the way of using Python YNL for testing. Decode link attributes for ip6tnl, vti and vti6. ip6tnl uses IFLA_IPTUN_FLAGS as u32, while ipv4 and sit expect a u16 attribute, so we have a (first?) subset type override... Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250107022820.2087101-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
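Editor's note: the u16 versus u32 mismatch described above is a width check at heart. The following is a minimal sketch in C of a decoder that refuses a payload whose length does not match the width promised by the spec; decode_scalar() and its parameters are hypothetical and do not correspond to any YNL or kernel function.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical decoder: copy a scalar of the expected width out of an
 * attribute payload. Returns 0 on success, or -1 if the payload length
 * does not match the expected width (or the width is unsupported).
 */
static int decode_scalar(const void *payload, size_t payload_len,
			 size_t expected_width, uint32_t *out)
{
	uint16_t v16;
	uint32_t v32;

	if (payload_len != expected_width)
		return -1;

	if (expected_width == sizeof(v16)) {
		memcpy(&v16, payload, sizeof(v16));
		*out = v16;
		return 0;
	}
	if (expected_width == sizeof(v32)) {
		memcpy(&v32, payload, sizeof(v32));
		*out = v32;
		return 0;
	}
	return -1;
}
```

A 4-byte flags payload decoded with an expected width of 2 is rejected here, while the same payload with a per-subset override to 4 bytes decodes cleanly. This is the situation the subset type override is meant to handle.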
2025-01-08tools: ynl: print some information about attribute we can't parseJakub Kicinski1-35/+39
When parsing throws an exception one often has to figure out which attribute couldn't be parsed from first principles. For families with large message parsing trees like rtnetlink guessing the attribute can be hard.

Print a bit of information as the exception travels out, e.g.:

  # when dumping rt links
  Error decoding 'flags' from 'linkinfo-ip6tnl-attrs'
  Error decoding 'data' from 'linkinfo-attrs'
  Error decoding 'linkinfo' from 'link-attrs'
  Traceback (most recent call last):
    File "/home/kicinski/linux/./tools/net/ynl/cli.py", line 119, in <module>
      main()
    File "/home/kicinski/linux/./tools/net/ynl/cli.py", line 100, in main
      reply = ynl.dump(args.dump, attrs)
    File "/home/kicinski/linux/tools/net/ynl/lib/ynl.py", line 1064, in dump
      return self._op(method, vals, dump=True)
    File "/home/kicinski/linux/tools/net/ynl/lib/ynl.py", line 1058, in _op
      return self._ops(ops)[0]
    File "/home/kicinski/linux/tools/net/ynl/lib/ynl.py", line 1045, in _ops
      rsp_msg = self._decode(decoded.raw_attrs, op.attr_set.name)
    File "/home/kicinski/linux/tools/net/ynl/lib/ynl.py", line 738, in _decode
      subdict = self._decode(NlAttrs(attr.raw), attr_spec['nested-attributes'], search_attrs)
    File "/home/kicinski/linux/tools/net/ynl/lib/ynl.py", line 763, in _decode
      decoded = self._decode_sub_msg(attr, attr_spec, search_attrs)
    File "/home/kicinski/linux/tools/net/ynl/lib/ynl.py", line 714, in _decode_sub_msg
      subdict = self._decode(NlAttrs(attr.raw, offset), msg_format.attr_set)
    File "/home/kicinski/linux/tools/net/ynl/lib/ynl.py", line 749, in _decode
      decoded = attr.as_scalar(attr_spec['type'], attr_spec.byte_order)
    File "/home/kicinski/linux/tools/net/ynl/lib/ynl.py", line 147, in as_scalar
      return format.unpack(self.raw)[0]
  struct.error: unpack requires a buffer of 2 bytes

The Traceback is what we would previously see, the "Error..." messages are new. We print a message per level (in the stack order). Printing single combined message gets tricky quickly given sub-messages etc.
Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250107022820.2087101-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-08tools: ynl: correctly handle overrides of fields in subsetJakub Kicinski2-5/+26
We stated in documentation [1] and previous discussions [2] that the need for overriding fields in members of subsets is anticipated. Implement it. Since each attr is now a new object we need to make sure that the modifications are propagated. Specifically C codegen wants to annotate which attrs are used in requests and replies to generate the right validation artifacts. [1] https://docs.kernel.org/next/userspace-api/netlink/specs.html#subset-of [2] https://lore.kernel.org/netdev/20231004171350.1f59cd1d@kernel.org/ Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250107022820.2087101-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-08if_vlan: fix kdoc warningsJakub Kicinski1-2/+11
While merging net to net-next I noticed that the kdoc above __vlan_get_protocol_offset() has the wrong function name. Fix that and all the other kdoc warnings in this file. Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250106174620.1855269-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-08Merge branch 'net-dsa-cleanup-eee-part-2'Jakub Kicinski11-64/+0
Russell King says: ==================== net: dsa: cleanup EEE (part 2) This is part 2 of the DSA EEE cleanups, removing what has become dead code as a result of the EEE management phylib now does. Patch 1 removes the useless setting of tx_lpi parameters in the ksz driver. Patch 2 does the same for mt753x. Patch 3 removes the DSA core code that calls the get_mac_eee() operation. This needs to be done before removing the implementations because doing otherwise would cause dsa_user_get_eee() to return -EOPNOTSUPP. Patches 4..8 remove the trivial get_mac_eee() implementations from DSA drivers. Patch 9 finally removes the get_mac_eee() method from struct dsa_switch_ops. ==================== Link: https://patch.msgid.link/Z3vDwwsHSxH5D6Pm@shell.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-08net: dsa: remove get_mac_eee() methodRussell King (Oracle)1-2/+0
The get_mac_eee() is no longer called by the core DSA code, nor are there any implementations of this method. Remove it. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1tUllU-007UzL-KV@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-08net: dsa: qca: remove qca8k_get_mac_eee()Russell King (Oracle)3-9/+0
qca8k_get_mac_eee() is no longer called by the core DSA code. Remove it. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1tUllP-007UzF-Gk@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-08net: dsa: mv88e6xxx: remove mv88e6xxx_get_mac_eee()Russell King (Oracle)1-8/+0
mv88e6xxx_get_mac_eee() is no longer called by the core DSA code. Remove it. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1tUllK-007Uz9-D7@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-08net: dsa: mt753x: remove mt753x_get_mac_eee()Russell King (Oracle)1-7/+0
mt753x_get_mac_eee() is no longer called by the core DSA code. Remove it. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Chester A. Unal <chester.a.unal@arinc9.com> Link: https://patch.msgid.link/E1tUllF-007Uz3-95@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-08net: dsa: ksz: remove ksz_get_mac_eee()Russell King (Oracle)1-7/+0
ksz_get_mac_eee() is no longer called by the core DSA code. Remove it. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1tUllA-007Uyx-4o@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-08net: dsa: b53/bcm_sf2: remove b53_get_mac_eee()Russell King (Oracle)3-9/+0
b53_get_mac_eee() is no longer called by the core DSA code. Remove it. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1tUll5-007Uyr-1U@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-08net: dsa: no longer call ds->ops->get_mac_eee()Russell King (Oracle)1-8/+0
All implementations of get_mac_eee() now just return zero without doing anything useful. Remove the call to this method in preparation to removing the method from each DSA driver. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1tUlkz-007Uyl-UA@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-08net: dsa: mt753x: remove setting of tx_lpi parametersRussell King (Oracle)1-6/+0
dsa_user_get_eee() calls the DSA switch get_mac_eee() method followed by phylink_ethtool_get_eee(), which goes on to call phy_ethtool_get_eee(). This overwrites all members of the passed ethtool_keee, which means anything written by the DSA switch get_mac_eee() method will be discarded. Remove setting any members in mt753x_get_mac_eee(). Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Chester A. Unal <chester.a.unal@arinc9.com> Link: https://patch.msgid.link/E1tUlku-007Uyc-RP@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-08net: dsa: ksz: remove setting of tx_lpi parametersRussell King (Oracle)1-8/+0
dsa_user_get_eee() calls the DSA switch get_mac_eee() method followed by phylink_ethtool_get_eee(), which goes on to call phy_ethtool_get_eee(). This overwrites all members of the passed ethtool_keee, which means anything written by the DSA switch get_mac_eee() method will be discarded. Remove setting any members in ksz_get_mac_eee(). Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1tUlkp-007UyW-OR@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
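Editor's note: the reasoning in the two tx_lpi commits above (a value written by the first callee is discarded because the second callee overwrites every member) can be sketched in plain C. The struct and functions below are hypothetical, trimmed-down stand-ins and not the kernel's struct ethtool_keee or the real call chain.

```c
#include <stdbool.h>

/* Hypothetical, trimmed-down stand-in for an EEE settings structure. */
struct keee_sketch {
	bool tx_lpi_enabled;
	unsigned int tx_lpi_timer;
};

/* Stand-in for a driver's get_mac_eee(): fills in MAC-side values. */
static void mac_get_eee(struct keee_sketch *e)
{
	e->tx_lpi_enabled = true;
	e->tx_lpi_timer = 123;
}

/* Stand-in for the phylib step: overwrites every member from PHY state. */
static void phy_get_eee(struct keee_sketch *e,
			const struct keee_sketch *phy_state)
{
	*e = *phy_state;
}

/* Mirrors the call order described above: MAC callback first, then phylib. */
static struct keee_sketch user_get_eee(const struct keee_sketch *phy_state)
{
	struct keee_sketch e = { 0 };

	mac_get_eee(&e);
	phy_get_eee(&e, phy_state);
	return e;
}
```

Whatever mac_get_eee() writes never reaches the caller in this sketch, which is why removing those assignments does not change observable behavior.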
2025-01-08Merge branch 'net-hold-per-netns-rtnl-during-netdev-notifier-registration'Jakub Kicinski1-12/+29
Kuniyuki Iwashima says: ==================== net: Hold per-netns RTNL during netdev notifier registration. This series adds per-netns RTNL for registration of the global and per-netns netdev notifiers. v1: https://lore.kernel.org/netdev/20250104063735.36945-1-kuniyu@amazon.com/ ==================== Link: https://patch.msgid.link/20250106070751.63146-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-08net: Hold rtnl_net_lock() in (un)?register_netdevice_notifier_dev_net().Kuniyuki Iwashima1-6/+10
(un)?register_netdevice_notifier_dev_net() hold RTNL before triggering the notifier for all netdev in the netns. Let's convert the RTNL to rtnl_net_lock(). Note that move_netdevice_notifiers_dev_net() is assumed to be (but not yet) protected by per-netns RTNL of both src and dst netns; we need to convert wireless and hyperv drivers that call dev_change_net_namespace(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250106070751.63146-4-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-08net: Hold rtnl_net_lock() in (un)?register_netdevice_notifier_net().Kuniyuki Iwashima1-4/+6
(un)?register_netdevice_notifier_net() hold RTNL before triggering the notifier for all netdev in the netns. Let's convert the RTNL to rtnl_net_lock(). Note that the per-netns netdev notifier is protected by per-netns RTNL. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250106070751.63146-3-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-08net: Hold __rtnl_net_lock() in (un)?register_netdevice_notifier().Kuniyuki Iwashima1-2/+13
(un)?register_netdevice_notifier() hold pernet_ops_rwsem and RTNL, iterate all netns, and trigger the notifier for all netdev. Let's hold __rtnl_net_lock() before triggering the notifier. Note that we will need protection for netdev_chain when RTNL is removed. (e.g. blocking_notifier conversion [0] with a lockdep annotation [1]) Link: https://lore.kernel.org/netdev/20250104063735.36945-2-kuniyu@amazon.com/ [0] Link: https://lore.kernel.org/netdev/20250105075957.67334-1-kuniyu@amazon.com/ [1] Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250106070751.63146-2-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-08ixgbevf: Remove unused ixgbevf_hv_mbx_opsDr. David Alan Gilbert2-13/+0
The const struct ixgbevf_hv_mbx_ops was added in 2016 as part of commit c6d45171d706 ("ixgbevf: Support Windows hosts (Hyper-V)") but has remained unused. The functions it references are still referenced elsewhere. Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Link: https://patch.msgid.link/20250105122847.27341-1-linux@treblig.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-08net: watchdog: rename __dev_watchdog_up() and dev_watchdog_down()Eric Dumazet4-22/+17
In commit d7811e623dd4 ("[NET]: Drop tx lock in dev_watchdog_up") dev_watchdog_up() became a simple wrapper for __netdev_watchdog_up() Herbert also said : "In 2.6.19 we can eliminate the unnecessary __dev_watchdog_up and replace it with dev_watchdog_up." This patch consolidates things to have only two functions, with a common prefix. - netdev_watchdog_up(), exported for the sake of one freescale driver. This replaces __netdev_watchdog_up() and dev_watchdog_up(). - netdev_watchdog_down(), static to net/sched/sch_generic.c This replaces dev_watchdog_down(). Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Link: https://patch.msgid.link/20250105090924.1661822-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-08Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski11-103/+190
Daniel Borkmann says:

====================
pull-request: bpf-next 2025-01-07

We've added 7 non-merge commits during the last 32 day(s) which contain a total of 11 files changed, 190 insertions(+), 103 deletions(-).

The main changes are:

1) Migrate the test_xdp_meta.sh BPF selftest into test_progs framework, from Bastien Curutchet.

2) Add ability to configure head/tailroom for netkit devices, from Daniel Borkmann.

3) Fixes and improvements to the xdp_hw_metadata selftest, from Song Yoong Siang.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
  selftests/bpf: Extend netkit tests to validate set {head,tail}room
  netkit: Add netkit {head,tail}room to rt_link.yaml
  netkit: Allow for configuring needed_{head,tail}room
  selftests/bpf: Migrate test_xdp_meta.sh into xdp_context_test_run.c
  selftests/bpf: test_xdp_meta: Rename BPF sections
  selftests/bpf: Enable Tx hwtstamp in xdp_hw_metadata
  selftests/bpf: Actuate tx_metadata_len in xdp_hw_metadata
====================

Link: https://patch.msgid.link/20250107130908.143644-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-07bridge: Make br_is_nd_neigh_msg() accept pointer to "const struct sk_buff"Ted Chen2-2/+2
The sk_buff struct in br_is_nd_neigh_msg() is never modified. Mark it as const. Signed-off-by: Ted Chen <znscnchen@gmail.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20250104083846.71612-1-znscnchen@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-07Merge branch 'dev-hold-per-netns-rtnl-in-register-netdev'Paolo Abeni3-5/+26
Kuniyuki Iwashima says:

====================
dev: Hold per-netns RTNL in register_netdev().

Patch 1 adds rtnl_net_lock_killable() and Patch 2 uses it in register_netdev() and converts it and unregister_netdev() to per-netns RTNL.

With this and the netdev notifier series [0], ASSERT_RTNL_NET() for NETDEV_REGISTER [1] wasn't fired on a simplest QEMU setup like e1000 + x86_64_defconfig + CONFIG_DEBUG_NET_SMALL_RTNL.

[0]: https://lore.kernel.org/netdev/20250104063735.36945-1-kuniyu@amazon.com/

[1]:
---8<---
diff --git a/net/core/rtnl_net_debug.c b/net/core/rtnl_net_debug.c
index f406045cbd0e..c0c30929002e 100644
--- a/net/core/rtnl_net_debug.c
+++ b/net/core/rtnl_net_debug.c
@@ -21,7 +21,6 @@ static int rtnl_net_debug_event(struct notifier_block *nb,
 	case NETDEV_DOWN:
 	case NETDEV_REBOOT:
 	case NETDEV_CHANGE:
-	case NETDEV_REGISTER:
 	case NETDEV_UNREGISTER:
 	case NETDEV_CHANGEMTU:
 	case NETDEV_CHANGEADDR:
@@ -60,19 +59,10 @@ static int rtnl_net_debug_event(struct notifier_block *nb,
 		ASSERT_RTNL();
 		break;
 
-	/* Once an event fully supports RTNL_NET, move it here
-	 * and remove "if (0)" below.
-	 *
-	 * case NETDEV_XXX:
-	 *	ASSERT_RTNL_NET(net);
-	 *	break;
-	 */
-	}
-
-	/* Just to avoid unused-variable error for dev and net. */
-	if (0)
+	case NETDEV_REGISTER:
 		ASSERT_RTNL_NET(net);
+		break;
+	}
 
 	return NOTIFY_DONE;
 }
---8<---
====================

Link: https://patch.msgid.link/20250104082149.48493-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-07dev: Hold per-netns RTNL in (un)?register_netdev().Kuniyuki Iwashima1-4/+10
Let's hold per-netns RTNL of dev_net(dev) in register_netdev() and unregister_netdev(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-07rtnetlink: Add rtnl_net_lock_killable().Kuniyuki Iwashima2-1/+16
rtnl_lock_killable() is used only in register_netdev() and will be converted to per-netns RTNL. Let's unexport it and add the corresponding helper. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-07eth: fbnic: update fbnic_poll return valueMohsin Bashir1-1/+1
In cases where the work done is less than the budget, `fbnic_poll` is returning 0. This affects the tracing of `napi_poll`. Following is a snippet of before and after result from `napi_poll` tracepoint. Instead, returning the work done improves the manual tracing. Before: @[10]: 1 ... @[64]: 208175 @[0]: 2128008 After: @[56]: 86 @[48]: 222 ... @[5]: 1885756 @[6]: 1933841 Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/20250104015316.3192946-1-mohsin.bashr@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
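Editor's note: the before/after histograms above follow from what the poll routine reports back. The sketch below contrasts the two behaviors in plain C; poll_old() and poll_new() are hypothetical and only model the return value, not the fbnic driver itself.

```c
/* Number of packets a poll call processes: at most budget. */
static int do_work(int pending, int budget)
{
	return pending < budget ? pending : budget;
}

/* Old behavior as described above: report 0 whenever less than the
 * full budget was consumed.
 */
static int poll_old(int pending, int budget)
{
	int work_done = do_work(pending, budget);

	return work_done < budget ? 0 : budget;
}

/* New behavior: always report the work actually done. */
static int poll_new(int pending, int budget)
{
	return do_work(pending, budget);
}
```

With a budget of 64, poll_old() can only ever report 0 or 64, which matches the shape of the "Before" histogram. poll_new() reports the intermediate values seen in the "After" histogram.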
2025-01-07Merge branch 'net-airoha-add-qdisc-offload-support'Paolo Abeni1-4/+514
Lorenzo Bianconi says: ==================== net: airoha: Add Qdisc offload support Introduce support for ETS and HTB Qdisc offload available on the Airoha EN7581 ethernet controller. ==================== Link: https://patch.msgid.link/20250103-airoha-en7581-qdisc-offload-v1-0-608a23fa65d5@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-07net: airoha: Add sched HTB offload supportLorenzo Bianconi1-1/+287
Introduce support for HTB Qdisc offload available in the Airoha EN7581 ethernet controller. EN7581 can offload only one level of HTB leaves. Each HTB leaf represents a QoS channel supported by the EN7581 SoC. The typical use-case is creating an HTB leaf for a QoS channel to rate limit the egress traffic and attaching an ETS Qdisc to each HTB leaf in order to enforce traffic prioritization. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-07net: airoha: Add sched ETS offload supportLorenzo Bianconi1-1/+195
Introduce support for ETS Qdisc offload available on the Airoha EN7581 ethernet controller. In order to be effective, the ETS Qdisc must be configured as a leaf of an HTB Qdisc (HTB Qdisc offload will be added in the following patch). The ETS Qdisc available on the EN7581 ethernet controller supports at most 8 concurrent bands (QoS queues). We can enable an ETS Qdisc for each available QoS channel. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-07net: airoha: Introduce ndo_select_queue callbackLorenzo Bianconi1-2/+28
Airoha EN7581 SoC supports 32 Tx DMA rings used to feed packets to QoS channels. Each channel supports 8 QoS queues where the user can apply QoS scheduling policies. In a similar way, the user can configure hw rate shaping for each QoS channel. Introduce the ndo_select_queue callback in order to select the tx queue based on QoS channel and QoS queue. In particular, for DSA devices select the QoS channel according to the DSA user port index, and rely on the port id otherwise. Select the QoS queue based on the skb priority. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
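Editor's note: the channel and queue selection described above can be sketched as a simple index computation. The figures 32 and 8 come from the commit message; everything else is an assumption made for illustration, including the even split into 4 channels, the modulo wrapping, and the function name select_txq(). The driver's actual mapping may differ.

```c
#include <assert.h>

#define NUM_TX_RINGS     32	/* from the commit message */
#define NUM_QOS_QUEUES   8	/* from the commit message */
/* Assumption: rings are split evenly, giving 32 / 8 = 4 channels. */
#define NUM_QOS_CHANNELS (NUM_TX_RINGS / NUM_QOS_QUEUES)

/* Hypothetical mapping: channel_hint stands for the DSA user port index
 * (or the port id otherwise), priority for the skb priority.
 */
static unsigned int select_txq(unsigned int channel_hint,
			       unsigned int priority)
{
	unsigned int channel = channel_hint % NUM_QOS_CHANNELS;
	unsigned int queue = priority % NUM_QOS_QUEUES;
	unsigned int txq = channel * NUM_QOS_QUEUES + queue;

	/* Every result must name one of the 32 Tx rings. */
	assert(txq < NUM_TX_RINGS);
	return txq;
}
```

Under these assumptions, channel 1 with priority 3 lands on Tx ring 11, and out-of-range inputs wrap around rather than indexing past the last ring.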
2025-01-07net: airoha: Enable Tx drop capability for each Tx DMA ringLorenzo Bianconi1-0/+4
This is a preliminary patch in order to enable hw Qdisc offloading. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-07selftests/net: packetdrill: report benign debug flakes as xfailWillem de Bruijn2-7/+31
A few recently added packetdrill tests that are known time sensitive (e.g., because testing timestamping) occasionally fail in debug mode: https://netdev.bots.linux.dev/contest.html?executor=vmksft-packetdrill-dbg These failures are well understood. Correctness of the tests is verified in non-debug mode. Continue running in debug mode also, to keep coverage with debug instrumentation. But, only in debug mode, mark these tests with well understood timing issues as XFAIL (known failing) rather than FAIL when failing. Introduce an allow list xfail_list with known cases. Expand the ktap infrastructure with XFAIL support. Fixes: eab35989cc37 ("selftests/net: packetdrill: import tcp/fast_recovery, tcp/nagle, tcp/timestamping") Reported-by: Jakub Kicinski <kuba@kernel.org> Closes: https://lore.kernel.org/netdev/20241218100013.0c698629@kernel.org/ Signed-off-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250103113142.129251-1-willemdebruijn.kernel@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-07 | net: stmmac: Set dma_sync_size to zero for discarded frames | Furong Xu | 1 | -1/+1
If a frame is going to be discarded by the driver, the driver never touches it, so its cache lines never become dirty. In that case page_pool_recycle_direct() wastes CPU cycles by calling page_pool_dma_sync_for_device() unnecessarily to sync the entire frame. Calling page_pool_put_page() with sync_size set to 0 is the proper method. Signed-off-by: Furong Xu <0x1207@gmail.com> Link: https://patch.msgid.link/20250103093733.3872939-1-0x1207@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
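The cost difference can be illustrated with a toy user-space model. This is only a sketch: `toy_pool`, `toy_put_page` and `toy_recycle_full` are hypothetical stand-ins, not the kernel page_pool API, and the model assumes that the sync length is clamped to a per-pool maximum and that a "recycle whole page" helper always syncs that maximum.

```c
#include <stddef.h>

/* Toy stand-in for a page pool; only the fields needed for the illustration. */
struct toy_pool {
	size_t max_len;       /* largest region the device may write into a page */
	size_t bytes_synced;  /* running total of bytes synced for the device */
};

/* Models a put with an explicit sync size: syncs min(sync_size, max_len) bytes. */
static void toy_put_page(struct toy_pool *pool, size_t sync_size)
{
	size_t len = sync_size < pool->max_len ? sync_size : pool->max_len;

	pool->bytes_synced += len;
}

/* Models a "recycle the whole page" helper: always syncs the full max_len. */
static void toy_recycle_full(struct toy_pool *pool)
{
	toy_put_page(pool, pool->max_len);
}
```

In this model, returning an untouched page with a sync size of 0 adds nothing to `bytes_synced`, whereas the full recycle pays for `max_len` bytes each time a frame is discarded.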
2025-01-07 | octeontx2-pf: mcs: Remove dead code and semi-colon from rsrc_name() | Nihar Chaithanya | 1 | -3/+1
Every case in the switch-block ends with a return statement, and the default: branch handles the cases where rsrc_type is invalid and returns "Unknown". This makes the return statement at the end of the function unreachable and redundant. The semi-colon after the switch-block's closing curly brace is not required either. Remove both the semi-colon and the return statement at the end of the function. This issue was reported by Coverity Scan. Signed-off-by: Nihar Chaithanya <niharchaithanya@gmail.com> Link: https://patch.msgid.link/20250104171905.13293-1-niharchaithanya@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
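The resulting shape of such a function can be sketched as follows. The enum and the strings are hypothetical placeholders, not the driver's actual resource types; the point is only that a switch whose every branch returns, including default:, needs neither a trailing return nor a semi-colon after the closing brace.

```c
/* Hypothetical resource kinds, standing in for the driver's rsrc_type values. */
enum rsrc_kind { RSRC_KIND_A, RSRC_KIND_B };

static const char *rsrc_kind_name(enum rsrc_kind kind)
{
	switch (kind) {
	case RSRC_KIND_A:
		return "A";
	case RSRC_KIND_B:
		return "B";
	default:
		/* Covers every invalid value, so control never leaves the switch. */
		return "Unknown";
	}
	/* No statement here: it would be unreachable. */
}
```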
2025-01-07 | nfc: st21nfca: Drop unneeded null check in st21nfca_tx_work() | Krzysztof Kozlowski | 1 | -10/+8
Variable 'info' is obtained via container_of() of struct work_struct, so it cannot be NULL. Simplify the code and solve Smatch warning: drivers/nfc/st21nfca/dep.c:119 st21nfca_tx_work() warn: can 'info' even be NULL? Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250104142043.116045-1-krzysztof.kozlowski@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
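Why the result of container_of() cannot be NULL here can be seen from a user-space equivalent of the macro. This is a simplified sketch: the macro below omits the kernel's type checking, and `struct work_item`, `struct dep_info` and `tx_work_handler` are hypothetical stand-ins for the driver's types and work function.

```c
#include <stddef.h>

/* User-space equivalent of the kernel's container_of(), without type checks. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct work_item { int pending; };  /* stand-in for struct work_struct */

struct dep_info {                   /* hypothetical containing structure */
	int id;
	struct work_item tx_work;
};

static int tx_work_handler(struct work_item *work)
{
	/*
	 * The result is the member's address minus a fixed offset. For a work
	 * item embedded in a dep_info it always points at that dep_info, so a
	 * NULL check on 'info' can never trigger.
	 */
	struct dep_info *info = container_of(work, struct dep_info, tx_work);

	return info->id;
}
```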
2025-01-07 | Merge branch 'mlx5-hardware-steering-part-2' | Jakub Kicinski | 19 | -299/+227
Tariq Toukan says: ==================== mlx5 Hardware Steering part 2 This series contains HWS code cleanups, enhancements, bug fixes, and additions. Note that some of these patches are fixing bugs in existing code, but we submit them without 'Fixes' tag to avoid the unnecessary burden for stable releases, as HWS still couldn't be enabled. Patches 1-5: HWS, various code cleanups and enhancements Patches 6-14: HWS, various bug fixes and additions Patch 15: HWS, setting timeout on polling ==================== Link: https://patch.msgid.link/20250102181415.1477316-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-07 | net/mlx5: HWS, set timeout on polling for completion | Yevgeny Kliteynik | 2 | -10/+18
Consolidate BWC polling for completion into one function and set a time limit on the loop that polls for completion. The time limit can be reached only if there is some issue with FW/PCI/HW, such as FW being stuck, a PCI issue, etc. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Itamar Gozlan <igozlan@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250102181415.1477316-16-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
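The general pattern of a single polling helper with a deadline can be sketched as below. This is a user-space illustration under stated assumptions, not the mlx5 code: `poll_for_completion`, the callback types and the `fake_dev` test double are all hypothetical, and the clock is injected through a callback so that the behaviour is deterministic.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

typedef bool (*poll_fn)(void *ctx);       /* returns true once work completed */
typedef uint64_t (*clock_fn)(void *ctx);  /* returns the current time in ms */

/* Poll until completion, or give up with -ETIMEDOUT once the deadline passes. */
static int poll_for_completion(poll_fn poll, clock_fn now_ms, void *ctx,
			       uint64_t timeout_ms)
{
	uint64_t deadline = now_ms(ctx) + timeout_ms;

	for (;;) {
		if (poll(ctx))
			return 0;
		if (now_ms(ctx) >= deadline)
			return -ETIMEDOUT;
	}
}

/* Hypothetical test double: time advances by 1 ms on every clock read. */
struct fake_dev {
	uint64_t time_ms;
	unsigned int polls_until_done;  /* 0 means the device never completes */
	unsigned int polls;
};

static uint64_t fake_now_ms(void *ctx)
{
	struct fake_dev *dev = ctx;

	return dev->time_ms++;
}

static bool fake_poll(void *ctx)
{
	struct fake_dev *dev = ctx;

	dev->polls++;
	return dev->polls_until_done && dev->polls >= dev->polls_until_done;
}
```

A healthy device returns 0 after a few polls, while a device that never completes makes the helper return -ETIMEDOUT instead of spinning forever, which is the failure mode the time limit guards against.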