summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet
AgeCommit message (Collapse)AuthorFilesLines
2024-09-12net: dpaa: Pad packets to ETH_ZLENSean Anderson1-1/+8
When sending packets under 60 bytes, up to three bytes of the buffer following the data may be leaked. Avoid this by extending all packets to ETH_ZLEN, ensuring nothing is leaked in the padding. This bug can be reproduced by running $ ping -s 11 destination Fixes: 9ad1a3749333 ("dpaa_eth: add support for DPAA Ethernet") Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20240910143144.1439910-1-sean.anderson@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-12net: libwx: fix number of Rx and Tx descriptorsJiawen Wu1-3/+3
The number of transmit and receive descriptors must be a multiple of 128 due to the hardware limitation. If it is set to a multiple of 8 instead of a multiple 128, the queues will easily be hung. Cc: stable@vger.kernel.org Fixes: 883b5984a5d2 ("net: wangxun: add ethtool_ops for ring parameters") Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240910095629.570674-1-jiawenwu@trustnetic.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-11Merge branch '100GbE' of ↵Jakub Kicinski4-15/+23
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2024-09-09 (ice, igb) This series contains updates to ice and igb drivers. Martyna moves LLDP rule removal to the proper uninitialization function for ice. Jake corrects accounting logic for FWD_TO_VSI_LIST switch filters on ice. Przemek removes incorrect, explicit calls to pci_disable_device() for ice. Michal Schmidt stops incorrect use of VSI list for VLAN use on ice. Sriram Yagnaraman adjusts igb_xdp_ring_update_tail() to be called under Tx lock on igb. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: igb: Always call igb_xdp_ring_update_tail() under Tx lock ice: fix VSI lists confusion when adding VLANs ice: stop calling pci_disable_device() as we use pcim ice: fix accounting for filters shared by multiple VSIs ice: Fix lldp packets dropping after changing the number of channels ==================== Link: https://patch.msgid.link/20240909203842.3109822-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-11Merge tag 'mlx5-fixes-2024-09-09' of ↵Jakub Kicinski5-22/+51
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5 fixes 2024-09-09 This series provides bug fixes to mlx5 driver. * tag 'mlx5-fixes-2024-09-09' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5: Fix bridge mode operations when there are no VFs net/mlx5: Verify support for scheduling element and TSAR type net/mlx5: Add missing masks and QoS bit masks for scheduling elements net/mlx5: Explicitly set scheduling element and TSAR type net/mlx5e: Add missing link mode to ptys2ext_ethtool_map net/mlx5e: Add missing link modes to ptys2ethtool_map net/mlx5: Update the list of the PCI supported devices ==================== Link: https://patch.msgid.link/20240909194505.69715-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-10net: ftgmac100: Enable TX interrupt to avoid TX timeoutJacky Chou1-1/+1
Currently, the driver only enables RX interrupt to handle RX packets and TX resources. Sometimes there is not RX traffic, so the TX resource needs to wait for RX interrupt to free. This situation will toggle the TX timeout watchdog when the MAC TX ring has no more resources to transmit packets. Therefore, enable TX interrupt to release TX resources at any time. When I am verifying iperf3 over UDP, the network hangs. Like the log below. root# iperf3 -c 192.168.100.100 -i1 -t10 -u -b0 Connecting to host 192.168.100.100, port 5201 [ 4] local 192.168.100.101 port 35773 connected to 192.168.100.100 port 5201 [ ID] Interval Transfer Bandwidth Total Datagrams [ 4] 0.00-20.42 sec 160 KBytes 64.2 Kbits/sec 20 [ 4] 20.42-20.42 sec 0.00 Bytes 0.00 bits/sec 0 [ 4] 20.42-20.42 sec 0.00 Bytes 0.00 bits/sec 0 [ 4] 20.42-20.42 sec 0.00 Bytes 0.00 bits/sec 0 [ 4] 20.42-20.42 sec 0.00 Bytes 0.00 bits/sec 0 [ 4] 20.42-20.42 sec 0.00 Bytes 0.00 bits/sec 0 [ 4] 20.42-20.42 sec 0.00 Bytes 0.00 bits/sec 0 [ 4] 20.42-20.42 sec 0.00 Bytes 0.00 bits/sec 0 [ 4] 20.42-20.42 sec 0.00 Bytes 0.00 bits/sec 0 [ 4] 20.42-20.42 sec 0.00 Bytes 0.00 bits/sec 0 - - - - - - - - - - - - - - - - - - - - - - - - - [ ID] Interval Transfer Bandwidth Jitter Lost/Total Datagrams [ 4] 0.00-20.42 sec 160 KBytes 64.2 Kbits/sec 0.000 ms 0/20 (0%) [ 4] Sent 20 datagrams iperf3: error - the server has terminated The network topology is FTGMAC connects directly to a PC. UDP does not need to wait for ACK, unlike TCP. Therefore, FTGMAC needs to enable TX interrupt to release TX resources instead of waiting for the RX interrupt. Fixes: 10cbd6407609 ("ftgmac100: Rework NAPI & interrupts handling") Signed-off-by: Jacky Chou <jacky_chou@aspeedtech.com> Link: https://patch.msgid.link/20240906062831.2243399-1-jacky_chou@aspeedtech.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-09-10octeontx2-af: Modify SMQ flush sequence to drop packetsNaveen Mamindlapalli2-14/+48
The current implementation of SMQ flush sequence waits for the packets in the TM pipeline to be transmitted out of the link. This sequence doesn't succeed in HW when there is any issue with link such as lack of link credits, link down or any other traffic that is fully occupying the link bandwidth (QoS). This patch modifies the SMQ flush sequence to drop the packets after TL1 level (SQM) instead of polling for the packets to be sent out of RPM/CGX link. Fixes: 5d9b976d4480 ("octeontx2-af: Support fixed transmit scheduler topology") Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com> Reviewed-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Link: https://patch.msgid.link/20240906045838.1620308-1-naveenm@marvell.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-09-09net/mlx5: Fix bridge mode operations when there are no VFsBenjamin Poirier1-2/+2
Currently, trying to set the bridge mode attribute when numvfs=0 leads to a crash: bridge link set dev eth2 hwmode vepa [ 168.967392] BUG: kernel NULL pointer dereference, address: 0000000000000030 [...] [ 168.969989] RIP: 0010:mlx5_add_flow_rules+0x1f/0x300 [mlx5_core] [...] [ 168.976037] Call Trace: [ 168.976188] <TASK> [ 168.978620] _mlx5_eswitch_set_vepa_locked+0x113/0x230 [mlx5_core] [ 168.979074] mlx5_eswitch_set_vepa+0x7f/0xa0 [mlx5_core] [ 168.979471] rtnl_bridge_setlink+0xe9/0x1f0 [ 168.979714] rtnetlink_rcv_msg+0x159/0x400 [ 168.980451] netlink_rcv_skb+0x54/0x100 [ 168.980675] netlink_unicast+0x241/0x360 [ 168.980918] netlink_sendmsg+0x1f6/0x430 [ 168.981162] ____sys_sendmsg+0x3bb/0x3f0 [ 168.982155] ___sys_sendmsg+0x88/0xd0 [ 168.985036] __sys_sendmsg+0x59/0xa0 [ 168.985477] do_syscall_64+0x79/0x150 [ 168.987273] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 168.987773] RIP: 0033:0x7f8f7950f917 (esw->fdb_table.legacy.vepa_fdb is null) The bridge mode is only relevant when there are multiple functions per port. Therefore, prevent setting and getting this setting when there are no VFs. Note that after this change, there are no settings to change on the PF interface using `bridge link` when there are no VFs, so the interface no longer appears in the `bridge link` output. Fixes: 4b89251de024 ("net/mlx5: Support ndo bridge_setlink and getlink") Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-09-09net/mlx5: Verify support for scheduling element and TSAR typeCarolina Jubran2-20/+31
Before creating a scheduling element in a NIC or E-Switch scheduler, ensure that the requested element type is supported. If the element is of type Transmit Scheduling Arbiter (TSAR), also verify that the specific TSAR type is supported. Fixes: 214baf22870c ("net/mlx5e: Support HTB offload") Fixes: 85c5f7c9200e ("net/mlx5: E-switch, Create QoS on demand") Fixes: 0fe132eac38c ("net/mlx5: E-switch, Allow to add vports to rate groups") Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-09-09net/mlx5: Explicitly set scheduling element and TSAR typeCarolina Jubran1-0/+7
Ensure the scheduling element type and TSAR type are explicitly initialized in the QoS rate group creation. This prevents potential issues due to default values. Fixes: 1ae258f8b343 ("net/mlx5: E-switch, Introduce rate limiting groups API") Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-09-09net/mlx5e: Add missing link mode to ptys2ext_ethtool_mapShahar Shitrit1-0/+6
Add MLX5E_400GAUI_8_400GBASE_CR8 to the extended modes in ptys2ext_ethtool_table, since it was missing. Fixes: 6a897372417e ("net/mlx5: ethtool, Add ethtool support for 50Gbps per lane link modes") Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-09-09net/mlx5e: Add missing link modes to ptys2ethtool_mapShahar Shitrit1-0/+4
Add MLX5E_1000BASE_T and MLX5E_100BASE_TX to the legacy modes in ptys2legacy_ethtool_table, since they were missing. Fixes: 665bc53969d7 ("net/mlx5e: Use new ethtool get/set link ksettings API") Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-09-09net/mlx5: Update the list of the PCI supported devicesMaher Sanalla1-0/+1
Add the upcoming ConnectX-9 device ID to the table of supported PCI device IDs. Fixes: f908a35b2218 ("net/mlx5: Update the list of the PCI supported devices") Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-09-09igb: Always call igb_xdp_ring_update_tail() under Tx lockSriram Yagnaraman1-4/+13
Always call igb_xdp_ring_update_tail() under __netif_tx_lock, add a comment and lockdep assert to indicate that. This is needed to share the same TX ring between XDP, XSK and slow paths. Furthermore, the current XDP implementation is racy on tail updates. Fixes: 9cbc948b5a20 ("igb: add XDP support") Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech> [Kurt: Add lockdep assert and fixes tag] Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-09ice: fix VSI lists confusion when adding VLANsMichal Schmidt1-1/+1
The description of function ice_find_vsi_list_entry says: Search VSI list map with VSI count 1 However, since the blamed commit (see Fixes below), the function no longer checks vsi_count. This causes a problem in ice_add_vlan_internal, where the decision to share VSI lists between filter rules relies on the vsi_count of the found existing VSI list being 1. The reproducing steps: 1. Have a PF and two VFs. There will be a filter rule for VLAN 0, referring to a VSI list containing VSIs: 0 (PF), 2 (VF#0), 3 (VF#1). 2. Add VLAN 1234 to VF#0. ice will make the wrong decision to share the VSI list with the new rule. The wrong behavior may not be immediately apparent, but it can be observed with debug prints. 3. Add VLAN 1234 to VF#1. ice will unshare the VSI list for the VLAN 1234 rule. Due to the earlier bad decision, the newly created VSI list will contain VSIs 0 (PF) and 3 (VF#1), instead of expected 2 (VF#0) and 3 (VF#1). 4. Try pinging a network peer over the VLAN interface on VF#0. This fails. Reproducer script at: https://gitlab.com/mschmidt2/repro/-/blob/master/RHEL-46814/test-vlan-vsi-list-confusion.sh Commented debug trace: https://gitlab.com/mschmidt2/repro/-/blob/master/RHEL-46814/ice-vlan-vsi-lists-debug.txt Patch adding the debug prints: https://gitlab.com/mschmidt2/linux/-/commit/f8a8814623944a45091a77c6094c40bfe726bfdb (Unsafe, by the way. Lacks rule_lock when dumping in ice_remove_vlan.) Michal Swiatkowski added to the explanation that the bug is caused by reusing a VSI list created for VLAN 0. All created VFs' VSIs are added to VLAN 0 filter. When a non-zero VLAN is created on a VF which is already in VLAN 0 (normal case), the VSI list from VLAN 0 is reused. It leads to a problem because all VFs (VSIs to be specific) that are subscribed to VLAN 0 will now receive a new VLAN tag traffic. This is one bug, another is the bug described above. Removing filters from one VF will remove VLAN filter from the previous VF. It happens a VF is reset. Example: - creation of 3 VFs - we have VSI list (used for VLAN 0) [0 (pf), 2 (vf1), 3 (vf2), 4 (vf3)] - we are adding VLAN 100 on VF1, we are reusing the previous list because 2 is there - VLAN traffic works fine, but VLAN 100 tagged traffic can be received on all VSIs from the list (for example broadcast or unicast) - trust is turning on VF2, VF2 is resetting, all filters from VF2 are removed; the VLAN 100 filter is also removed because 3 is on the list - VLAN traffic to VF1 isn't working anymore, there is a need to recreate VLAN interface to readd VLAN filter One thing I'm not certain about is the implications for the LAG feature, which is another caller of ice_find_vsi_list_entry. I don't have a LAG-capable card at hand to test. Fixes: 23ccae5ce15f ("ice: changes to the interface with the HW and FW for SRIOV_VF+LAG") Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Reviewed-by: Dave Ertman <David.m.ertman@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-09ice: stop calling pci_disable_device() as we use pcimPrzemek Kitszel1-2/+0
Our driver uses devres to manage resources, in particular we call pcim_enable_device(), what also means we express the intent to get automatic pci_disable_device() call at driver removal. Manual calls to pci_disable_device() misuse the API. Recent commit (see "Fixes" tag) has changed the removal action from conditional (silent ignore of double call to pci_disable_device()) to unconditional, but able to catch unwanted redundant calls; see cited "Fixes" commit for details. Since that, unloading the driver yields following warn+splat: [70633.628490] ice 0000:af:00.7: disabling already-disabled device [70633.628512] WARNING: CPU: 52 PID: 33890 at drivers/pci/pci.c:2250 pci_disable_device+0xf4/0x100 ... [70633.628744] ? pci_disable_device+0xf4/0x100 [70633.628752] release_nodes+0x4a/0x70 [70633.628759] devres_release_all+0x8b/0xc0 [70633.628768] device_unbind_cleanup+0xe/0x70 [70633.628774] device_release_driver_internal+0x208/0x250 [70633.628781] driver_detach+0x47/0x90 [70633.628786] bus_remove_driver+0x80/0x100 [70633.628791] pci_unregister_driver+0x2a/0xb0 [70633.628799] ice_module_exit+0x11/0x3a [ice] Note that this is the only Intel ethernet driver that needs such fix. Fixes: f748a07a0b64 ("PCI: Remove legacy pcim_release()") Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com> Reviewed-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-09ice: fix accounting for filters shared by multiple VSIsJacob Keller1-1/+1
When adding a switch filter (such as a MAC or VLAN filter), it is expected that the driver will detect the case where the filter already exists, and return -EEXIST. This is used by calling code such as ice_vc_add_mac_addr, and ice_vsi_add_vlan to avoid incrementing the accounting fields such as vsi->num_vlan or vf->num_mac. This logic works correctly for the case where only a single VSI has added a given switch filter. When a second VSI adds the same switch filter, the driver converts the existing filter from an ICE_FWD_TO_VSI filter into an ICE_FWD_TO_VSI_LIST filter. This saves switch resources, by ensuring that multiple VSIs can re-use the same filter. The ice_add_update_vsi_list() function is responsible for doing this conversion. When first converting a filter from the FWD_TO_VSI into FWD_TO_VSI_LIST, it checks if the VSI being added is the same as the existing rule's VSI. In such a case it returns -EEXIST. However, when the switch rule has already been converted to a FWD_TO_VSI_LIST, the logic is different. Adding a new VSI in this case just requires extending the VSI list entry. The logic for checking if the rule already exists in this case returns 0 instead of -EEXIST. This breaks the accounting logic mentioned above, so the counters for how many MAC and VLAN filters exist for a given VF or VSI no longer accurately reflect the actual count. This breaks other code which relies on these counts. In typical usage this primarily affects such filters generally shared by multiple VSIs such as VLAN 0, or broadcast and multicast MAC addresses. Fix this by correctly reporting -EEXIST in the case of adding the same VSI to a switch rule already converted to ICE_FWD_TO_VSI_LIST. Fixes: 9daf8208dd4d ("ice: Add support for switch filter programming") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-09ice: Fix lldp packets dropping after changing the number of channelsMartyna Szapar-Mudlaw1-7/+8
After vsi setup refactor commit 6624e780a577 ("ice: split ice_vsi_setup into smaller functions") ice_cfg_sw_lldp function which removes rx rule directing LLDP packets to vsi is moved from ice_vsi_release to ice_vsi_decfg function. ice_vsi_decfg is used in more cases than just in vsi_release resulting in unnecessary removal of rx lldp packets handling switch rule. This leads to lldp packets being dropped after a change number of channels via ethtool. This patch moves ice_cfg_sw_lldp function that removes rx lldp sw rule back to ice_vsi_release function. Fixes: 6624e780a577 ("ice: split ice_vsi_setup into smaller functions") Reported-by: Matěj Grégr <mgregr@netx.as> Closes: https://lore.kernel.org/intel-wired-lan/1be45a76-90af-4813-824f-8398b69745a9@netx.as/T/#u Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Martyna Szapar-Mudlaw <martyna.szapar-mudlaw@linux.intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-05Merge branch '100GbE' of ↵Jakub Kicinski6-161/+106
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== ice: fix synchronization between .ndo_bpf() and reset Larysa Zaremba says: PF reset can be triggered asynchronously, by tx_timeout or by a user. With some unfortunate timings both ice_vsi_rebuild() and .ndo_bpf will try to access and modify XDP rings at the same time, causing system crash. The first patch factors out rtnl-locked code from VSI rebuild code to avoid deadlock. The following changes lock rebuild and .ndo_bpf() critical sections with an internal mutex as well and provide complementary fixes. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: ice: do not bring the VSI up, if it was down before the XDP setup ice: remove ICE_CFG_BUSY locking from AF_XDP code ice: check ICE_VSI_DOWN under rtnl_lock when preparing for reset ice: check for XDP rings instead of bpf program when unconfiguring ice: protect XDP configuration with a mutex ice: move netif_queue_set_napi to rtnl-protected sections ==================== Link: https://patch.msgid.link/20240903183034.3530411-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-05net: xilinx: axienet: Fix race in axienet_stopSean Anderson2-0/+11
axienet_dma_err_handler can race with axienet_stop in the following manner: CPU 1 CPU 2 ====================== ================== axienet_stop() napi_disable() axienet_dma_stop() axienet_dma_err_handler() napi_disable() axienet_dma_stop() axienet_dma_start() napi_enable() cancel_work_sync() free_irq() Fix this by setting a flag in axienet_stop telling axienet_dma_err_handler not to bother doing anything. I chose not to use disable_work_sync to allow for easier backporting. Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Fixes: 8a3b7a252dca ("drivers/net/ethernet/xilinx: added Xilinx AXI Ethernet driver") Link: https://patch.msgid.link/20240903175141.4132898-1-sean.anderson@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-04net: mana: Fix error handling in mana_create_txq/rxq's NAPI cleanupSouradeep Chakrabarti1-9/+13
Currently napi_disable() gets called during rxq and txq cleanup, even before napi is enabled and hrtimer is initialized. It causes kernel panic. ? page_fault_oops+0x136/0x2b0 ? page_counter_cancel+0x2e/0x80 ? do_user_addr_fault+0x2f2/0x640 ? refill_obj_stock+0xc4/0x110 ? exc_page_fault+0x71/0x160 ? asm_exc_page_fault+0x27/0x30 ? __mmdrop+0x10/0x180 ? __mmdrop+0xec/0x180 ? hrtimer_active+0xd/0x50 hrtimer_try_to_cancel+0x2c/0xf0 hrtimer_cancel+0x15/0x30 napi_disable+0x65/0x90 mana_destroy_rxq+0x4c/0x2f0 mana_create_rxq.isra.0+0x56c/0x6d0 ? mana_uncfg_vport+0x50/0x50 mana_alloc_queues+0x21b/0x320 ? skb_dequeue+0x5f/0x80 Cc: stable@vger.kernel.org Fixes: e1b5683ff62e ("net: mana: Move NAPI from EQ to CQ") Signed-off-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Shradha Gupta <shradhagupta@linux.microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-03ice: do not bring the VSI up, if it was down before the XDP setupLarysa Zaremba1-2/+5
After XDP configuration is completed, we bring the interface up unconditionally, regardless of its state before the call to .ndo_bpf(). Preserve the information whether the interface had to be brought down and later bring it up only in such case. Fixes: efc2214b6047 ("ice: Add support for XDP") Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-03ice: remove ICE_CFG_BUSY locking from AF_XDP codeLarysa Zaremba1-9/+0
Locking used in ice_qp_ena() and ice_qp_dis() does pretty much nothing, because ICE_CFG_BUSY is a state flag that is supposed to be set in a PF state, not VSI one. Therefore it does not protect the queue pair from e.g. reset. Remove ICE_CFG_BUSY locking from ice_qp_dis() and ice_qp_ena(). Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-03ice: check ICE_VSI_DOWN under rtnl_lock when preparing for resetLarysa Zaremba1-6/+6
Consider the following scenario: .ndo_bpf() | ice_prepare_for_reset() | ________________________|_______________________________________| rtnl_lock() | | ice_down() | | | test_bit(ICE_VSI_DOWN) - true | | ice_dis_vsi() returns | ice_up() | | | proceeds to rebuild a running VSI | .ndo_bpf() is not the only rtnl-locked callback that toggles the interface to apply new configuration. Another example is .set_channels(). To avoid the race condition above, act only after reading ICE_VSI_DOWN under rtnl_lock. Fixes: 0f9d5027a749 ("ice: Refactor VSI allocation, deletion and rebuild flow") Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-03ice: check for XDP rings instead of bpf program when unconfiguringLarysa Zaremba3-7/+7
If VSI rebuild is pending, .ndo_bpf() can attach/detach the XDP program on VSI without applying new ring configuration. When unconfiguring the VSI, we can encounter the state in which there is an XDP program but no XDP rings to destroy or there will be XDP rings that need to be destroyed, but no XDP program to indicate their presence. When unconfiguring, rely on the presence of XDP rings rather then XDP program, as they better represent the current state that has to be destroyed. Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-03ice: protect XDP configuration with a mutexLarysa Zaremba4-19/+39
The main threat to data consistency in ice_xdp() is a possible asynchronous PF reset. It can be triggered by a user or by TX timeout handler. XDP setup and PF reset code access the same resources in the following sections: * ice_vsi_close() in ice_prepare_for_reset() - already rtnl-locked * ice_vsi_rebuild() for the PF VSI - not protected * ice_vsi_open() - already rtnl-locked With an unfortunate timing, such accesses can result in a crash such as the one below: [ +1.999878] ice 0000:b1:00.0: Registered XDP mem model MEM_TYPE_XSK_BUFF_POOL on Rx ring 14 [ +2.002992] ice 0000:b1:00.0: Registered XDP mem model MEM_TYPE_XSK_BUFF_POOL on Rx ring 18 [Mar15 18:17] ice 0000:b1:00.0 ens801f0np0: NETDEV WATCHDOG: CPU: 38: transmit queue 14 timed out 80692736 ms [ +0.000093] ice 0000:b1:00.0 ens801f0np0: tx_timeout: VSI_num: 6, Q 14, NTC: 0x0, HW_HEAD: 0x0, NTU: 0x0, INT: 0x4000001 [ +0.000012] ice 0000:b1:00.0 ens801f0np0: tx_timeout recovery level 1, txqueue 14 [ +0.394718] ice 0000:b1:00.0: PTP reset successful [ +0.006184] BUG: kernel NULL pointer dereference, address: 0000000000000098 [ +0.000045] #PF: supervisor read access in kernel mode [ +0.000023] #PF: error_code(0x0000) - not-present page [ +0.000023] PGD 0 P4D 0 [ +0.000018] Oops: 0000 [#1] PREEMPT SMP NOPTI [ +0.000023] CPU: 38 PID: 7540 Comm: kworker/38:1 Not tainted 6.8.0-rc7 #1 [ +0.000031] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0014.082620210524 08/26/2021 [ +0.000036] Workqueue: ice ice_service_task [ice] [ +0.000183] RIP: 0010:ice_clean_tx_ring+0xa/0xd0 [ice] [...] [ +0.000013] Call Trace: [ +0.000016] <TASK> [ +0.000014] ? __die+0x1f/0x70 [ +0.000029] ? page_fault_oops+0x171/0x4f0 [ +0.000029] ? schedule+0x3b/0xd0 [ +0.000027] ? exc_page_fault+0x7b/0x180 [ +0.000022] ? asm_exc_page_fault+0x22/0x30 [ +0.000031] ? ice_clean_tx_ring+0xa/0xd0 [ice] [ +0.000194] ice_free_tx_ring+0xe/0x60 [ice] [ +0.000186] ice_destroy_xdp_rings+0x157/0x310 [ice] [ +0.000151] ice_vsi_decfg+0x53/0xe0 [ice] [ +0.000180] ice_vsi_rebuild+0x239/0x540 [ice] [ +0.000186] ice_vsi_rebuild_by_type+0x76/0x180 [ice] [ +0.000145] ice_rebuild+0x18c/0x840 [ice] [ +0.000145] ? delay_tsc+0x4a/0xc0 [ +0.000022] ? delay_tsc+0x92/0xc0 [ +0.000020] ice_do_reset+0x140/0x180 [ice] [ +0.000886] ice_service_task+0x404/0x1030 [ice] [ +0.000824] process_one_work+0x171/0x340 [ +0.000685] worker_thread+0x277/0x3a0 [ +0.000675] ? preempt_count_add+0x6a/0xa0 [ +0.000677] ? _raw_spin_lock_irqsave+0x23/0x50 [ +0.000679] ? __pfx_worker_thread+0x10/0x10 [ +0.000653] kthread+0xf0/0x120 [ +0.000635] ? __pfx_kthread+0x10/0x10 [ +0.000616] ret_from_fork+0x2d/0x50 [ +0.000612] ? __pfx_kthread+0x10/0x10 [ +0.000604] ret_from_fork_asm+0x1b/0x30 [ +0.000604] </TASK> The previous way of handling this through returning -EBUSY is not viable, particularly when destroying AF_XDP socket, because the kernel proceeds with removal anyway. There is plenty of code between those calls and there is no need to create a large critical section that covers all of them, same as there is no need to protect ice_vsi_rebuild() with rtnl_lock(). Add xdp_state_lock mutex to protect ice_vsi_rebuild() and ice_xdp(). Leaving unprotected sections in between would result in two states that have to be considered: 1. when the VSI is closed, but not yet rebuild 2. when VSI is already rebuild, but not yet open The latter case is actually already handled through !netif_running() case, we just need to adjust flag checking a little. The former one is not as trivial, because between ice_vsi_close() and ice_vsi_rebuild(), a lot of hardware interaction happens, this can make adding/deleting rings exit with an error. Luckily, VSI rebuild is pending and can apply new configuration for us in a managed fashion. Therefore, add an additional VSI state flag ICE_VSI_REBUILD_PENDING to indicate that ice_xdp() can just hot-swap the program. Also, as ice_vsi_rebuild() flow is touched in this patch, make it more consistent by deconfiguring VSI when coalesce allocation fails. Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") Fixes: efc2214b6047 ("ice: Add support for XDP") Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-03ice: move netif_queue_set_napi to rtnl-protected sectionsLarysa Zaremba4-118/+49
Currently, netif_queue_set_napi() is called from ice_vsi_rebuild() that is not rtnl-locked when called from the reset. This creates the need to take the rtnl_lock just for a single function and complicates the synchronization with .ndo_bpf. At the same time, there no actual need to fill napi-to-queue information at this exact point. Fill napi-to-queue information when opening the VSI and clear it when the VSI is being closed. Those routines are already rtnl-locked. Also, rewrite napi-to-queue assignment in a way that prevents inclusion of XDP queues, as this leads to out-of-bounds writes, such as one below. [ +0.000004] BUG: KASAN: slab-out-of-bounds in netif_queue_set_napi+0x1c2/0x1e0 [ +0.000012] Write of size 8 at addr ffff889881727c80 by task bash/7047 [ +0.000006] CPU: 24 PID: 7047 Comm: bash Not tainted 6.10.0-rc2+ #2 [ +0.000004] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0014.082620210524 08/26/2021 [ +0.000003] Call Trace: [ +0.000003] <TASK> [ +0.000002] dump_stack_lvl+0x60/0x80 [ +0.000007] print_report+0xce/0x630 [ +0.000007] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 [ +0.000007] ? __virt_addr_valid+0x1c9/0x2c0 [ +0.000005] ? netif_queue_set_napi+0x1c2/0x1e0 [ +0.000003] kasan_report+0xe9/0x120 [ +0.000004] ? netif_queue_set_napi+0x1c2/0x1e0 [ +0.000004] netif_queue_set_napi+0x1c2/0x1e0 [ +0.000005] ice_vsi_close+0x161/0x670 [ice] [ +0.000114] ice_dis_vsi+0x22f/0x270 [ice] [ +0.000095] ice_pf_dis_all_vsi.constprop.0+0xae/0x1c0 [ice] [ +0.000086] ice_prepare_for_reset+0x299/0x750 [ice] [ +0.000087] pci_dev_save_and_disable+0x82/0xd0 [ +0.000006] pci_reset_function+0x12d/0x230 [ +0.000004] reset_store+0xa0/0x100 [ +0.000006] ? __pfx_reset_store+0x10/0x10 [ +0.000002] ? __pfx_mutex_lock+0x10/0x10 [ +0.000004] ? __check_object_size+0x4c1/0x640 [ +0.000007] kernfs_fop_write_iter+0x30b/0x4a0 [ +0.000006] vfs_write+0x5d6/0xdf0 [ +0.000005] ? fd_install+0x180/0x350 [ +0.000005] ? __pfx_vfs_write+0x10/0xA10 [ +0.000004] ? do_fcntl+0x52c/0xcd0 [ +0.000004] ? kasan_save_track+0x13/0x60 [ +0.000003] ? kasan_save_free_info+0x37/0x60 [ +0.000006] ksys_write+0xfa/0x1d0 [ +0.000003] ? __pfx_ksys_write+0x10/0x10 [ +0.000002] ? __x64_sys_fcntl+0x121/0x180 [ +0.000004] ? _raw_spin_lock+0x87/0xe0 [ +0.000005] do_syscall_64+0x80/0x170 [ +0.000007] ? _raw_spin_lock+0x87/0xe0 [ +0.000004] ? __pfx__raw_spin_lock+0x10/0x10 [ +0.000003] ? file_close_fd_locked+0x167/0x230 [ +0.000005] ? syscall_exit_to_user_mode+0x7d/0x220 [ +0.000005] ? do_syscall_64+0x8c/0x170 [ +0.000004] ? do_syscall_64+0x8c/0x170 [ +0.000003] ? do_syscall_64+0x8c/0x170 [ +0.000003] ? fput+0x1a/0x2c0 [ +0.000004] ? filp_close+0x19/0x30 [ +0.000004] ? do_dup2+0x25a/0x4c0 [ +0.000004] ? __x64_sys_dup2+0x6e/0x2e0 [ +0.000002] ? syscall_exit_to_user_mode+0x7d/0x220 [ +0.000004] ? do_syscall_64+0x8c/0x170 [ +0.000003] ? __count_memcg_events+0x113/0x380 [ +0.000005] ? handle_mm_fault+0x136/0x820 [ +0.000005] ? do_user_addr_fault+0x444/0xa80 [ +0.000004] ? clear_bhb_loop+0x25/0x80 [ +0.000004] ? clear_bhb_loop+0x25/0x80 [ +0.000002] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ +0.000005] RIP: 0033:0x7f2033593154 Fixes: 080b0c8d6d26 ("ice: Fix ASSERT_RTNL() warning during certain scenarios") Fixes: 91fdbce7e8d6 ("ice: Add support in the driver for associating queue with napi") Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Amritha Nambiar <amritha.nambiar@intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-03net: ethernet: ti: am65-cpsw: Fix RX statistics for XDP_TX and XDP_REDIRECTRoger Quadros1-4/+13
We are not using ndev->stats for rx_packets and rx_bytes anymore. Instead, we use per CPU stats which are collated in am65_cpsw_nuss_ndo_get_stats(). Fix RX statistics for XDP_TX and XDP_REDIRECT cases. Fixes: 8acacc40f733 ("net: ethernet: ti: am65-cpsw: Add minimal XDP support") Signed-off-by: Roger Quadros <rogerq@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Julien Panis <jpanis@baylibre.com> Reviewed-by: MD Danish Anwar <danishanwar@ti.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-09-03net: ethernet: ti: am65-cpsw: Fix NULL dereference on XDP_TXRoger Quadros1-1/+2
If number of TX queues are set to 1 we get a NULL pointer dereference during XDP_TX. ~# ethtool -L eth0 tx 1 ~# ./xdp-trafficgen udp -A <ipv6-src> -a <ipv6-dst> eth0 -t 2 Transmitting on eth0 (ifindex 2) [ 241.135257] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000030 Fix this by using actual TX queues instead of max TX queues when picking the TX channel in am65_cpsw_ndo_xdp_xmit(). Fixes: 8acacc40f733 ("net: ethernet: ti: am65-cpsw: Add minimal XDP support") Signed-off-by: Roger Quadros <rogerq@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Julien Panis <jpanis@baylibre.com> Reviewed-by: MD Danish Anwar <danishanwar@ti.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-09-03net: ethernet: ti: am65-cpsw: fix XDP_DROP, XDP_TX and XDP_REDIRECTRoger Quadros1-28/+34
The following XDP_DROP test from [1] stalls the interface after 250 packets. ~# xdb-bench drop -m native eth0 This is because new RX requests are never queued. Fix that. The below XDP_TX test from [1] fails with a warning [ 499.947381] XDP_WARN: xdp_update_frame_from_buff(line:277): Driver BUG: missing reserved tailroom ~# xdb-bench tx -m native eth0 Fix that by using PAGE_SIZE during xdp_init_buf(). In XDP_REDIRECT case only 1 packet was processed in rx_poll. Fix it to process up to budget packets. Fix all XDP error cases to call trace_xdp_exception() and drop the packet in am65_cpsw_run_xdp(). [1] xdp-tools suite https://github.com/xdp-project/xdp-tools Fixes: 8acacc40f733 ("net: ethernet: ti: am65-cpsw: Add minimal XDP support") Signed-off-by: Roger Quadros <rogerq@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Julien Panis <jpanis@baylibre.com> Reviewed-by: MD Danish Anwar <danishanwar@ti.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-09-02igc: Unlock on error in igc_io_resume()Dan Carpenter1-0/+1
Call rtnl_unlock() on this error path, before returning. Fixes: bc23aa949aeb ("igc: Add pcie error handler support") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Gerhard Engleder <gerhard@engleder-embedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-02net: microchip: vcap: Fix use-after-free error in kunit testJens Emil Schulz Østergaard1-12/+2
This is a clear use-after-free error. We remove it, and rely on checking the return code of vcap_del_rule. Reported-by: Dan Carpenter <error27@gmail.com> Closes: https://lore.kernel.org/kernel-janitors/7bffefc6-219a-4f71-baa0-ad4526e5c198@kili.mountain/ Fixes: c956b9b318d9 ("net: microchip: sparx5: Adding KUNIT tests of key/action values in VCAP API") Signed-off-by: Jens Emil Schulz Østergaard <jensemil.schulzostergaard@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-29ice: Add netif_device_attach/detach into PF reset flowDawid Osuchowski1-0/+7
Ethtool callbacks can be executed while reset is in progress and try to access deleted resources, e.g. getting coalesce settings can result in a NULL pointer dereference seen below. Reproduction steps: Once the driver is fully initialized, trigger reset: # echo 1 > /sys/class/net/<interface>/device/reset when reset is in progress try to get coalesce settings using ethtool: # ethtool -c <interface> BUG: kernel NULL pointer dereference, address: 0000000000000020 PGD 0 P4D 0 Oops: Oops: 0000 [#1] PREEMPT SMP PTI CPU: 11 PID: 19713 Comm: ethtool Tainted: G S 6.10.0-rc7+ #7 RIP: 0010:ice_get_q_coalesce+0x2e/0xa0 [ice] RSP: 0018:ffffbab1e9bcf6a8 EFLAGS: 00010206 RAX: 000000000000000c RBX: ffff94512305b028 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff9451c3f2e588 RDI: ffff9451c3f2e588 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: ffff9451c3f2e580 R11: 000000000000001f R12: ffff945121fa9000 R13: ffffbab1e9bcf760 R14: 0000000000000013 R15: ffffffff9e65dd40 FS: 00007faee5fbe740(0000) GS:ffff94546fd80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000020 CR3: 0000000106c2e005 CR4: 00000000001706f0 Call Trace: <TASK> ice_get_coalesce+0x17/0x30 [ice] coalesce_prepare_data+0x61/0x80 ethnl_default_doit+0xde/0x340 genl_family_rcv_msg_doit+0xf2/0x150 genl_rcv_msg+0x1b3/0x2c0 netlink_rcv_skb+0x5b/0x110 genl_rcv+0x28/0x40 netlink_unicast+0x19c/0x290 netlink_sendmsg+0x222/0x490 __sys_sendto+0x1df/0x1f0 __x64_sys_sendto+0x24/0x30 do_syscall_64+0x82/0x160 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7faee60d8e27 Calling netif_device_detach() before reset makes the net core not call the driver when ethtool command is issued, the attempt to execute an ethtool command during reset will result in the following message: netlink error: No such device instead of NULL pointer dereference. Once reset is done and ice_rebuild() is executing, the netif_device_attach() is called to allow for ethtool operations to occur again in a safe manner. Fixes: fcea6f3da546 ("ice: Add stats and ethtool support") Suggested-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Igor Bagnucki <igor.bagnucki@intel.com> Signed-off-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Reviewed-by: Michal Schmidt <mschmidt@redhat.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-08-29igb: Fix not clearing TimeSync interrupts for 82580Daiwei Li1-0/+10
82580 NICs have a hardware bug that makes it necessary to write into the TSICR (TimeSync Interrupt Cause) register to clear it: https://lore.kernel.org/all/CDCB8BE0.1EC2C%25matthew.vick@intel.com/ Add a conditional so only for 82580 we write into the TSICR register, so we don't risk losing events for other models. Without this change, when running ptp4l with an Intel 82580 card, I get the following output: > timed out while polling for tx timestamp increasing tx_timestamp_timeout or > increasing kworker priority may correct this issue, but a driver bug likely > causes it This goes away with this change. This (partially) reverts commit ee14cc9ea19b ("igb: Fix missing time sync events"). Fixes: ee14cc9ea19b ("igb: Fix missing time sync events") Closes: https://lore.kernel.org/intel-wired-lan/CAN0jFd1kO0MMtOh8N2Ztxn6f7vvDKp2h507sMryobkBKe=xk=w@mail.gmail.com/ Tested-by: Daiwei Li <daiweili@google.com> Suggested-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Daiwei Li <daiweili@google.com> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-08-27ionic: Prevent tx_timeout due to frequent doorbell ringingBrett Creeley2-2/+2
With recent work to the doorbell workaround code a small hole was introduced that could cause a tx_timeout. This happens if the rx dbell_deadline goes beyond the netdev watchdog timeout set by the driver (i.e. 2 seconds). Fix this by changing the netdev watchdog timeout to 5 seconds and reduce the max rx dbell_deadline to 4 seconds. The test that can reproduce the issue being fixed is a multi-queue send test via pktgen with the "burst" setting to 1. This causes the queue's doorbell to be rung on every packet sent to the driver, which may result in the device missing doorbells due to the high doorbell rate. Cc: stable@vger.kernel.org Fixes: 4ded136c78f8 ("ionic: add work item for missed-doorbell check") Signed-off-by: Brett Creeley <brett.creeley@amd.com> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Link: https://patch.msgid.link/20240822192557.9089-1-brett.creeley@amd.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-27net: ti: icssg-prueth: Fix 10M Link issue on AM64xMD Danish Anwar1-0/+1
Crash is seen on AM64x 10M link when connecting / disconnecting multiple times. The fix for this is to enable quirk_10m_link_issue for AM64x. Fixes: b256e13378a9 ("net: ti: icssg-prueth: Add AM64x icssg support") Signed-off-by: MD Danish Anwar <danishanwar@ti.com> Reviewed-by: Roger Quadros <rogerq@kernel.org> Link: https://patch.msgid.link/20240823120412.1262536-1-danishanwar@ti.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-26net: ftgmac100: Ensure tx descriptor updates are visibleJacky Chou1-8/+18
The driver must ensure TX descriptor updates are visible before updating TX pointer and TX clear pointer. This resolves TX hangs observed on AST2600 when running iperf3. Signed-off-by: Jacky Chou <jacky_chou@aspeedtech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-23net: mana: Fix race of mana_hwc_post_rx_wqe and new hwc responseHaiyang Zhang1-28/+34
The mana_hwc_rx_event_handler() / mana_hwc_handle_resp() calls complete(&ctx->comp_event) before posting the wqe back. It's possible that other callers, like mana_create_txq(), start the next round of mana_hwc_send_request() before the posting of wqe. And if the HW is fast enough to respond, it can hit no_wqe error on the HW channel, then the response message is lost. The mana driver may fail to create queues and open, because of waiting for the HW response and timed out. Sample dmesg: [ 528.610840] mana 39d4:00:02.0: HWC: Request timed out! [ 528.614452] mana 39d4:00:02.0: Failed to send mana message: -110, 0x0 [ 528.618326] mana 39d4:00:02.0 enP14804s2: Failed to create WQ object: -110 To fix it, move posting of rx wqe before complete(&ctx->comp_event). Cc: stable@vger.kernel.org Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)") Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-22net: xilinx: axienet: Fix dangling multicast addressesSean Anderson2-12/+10
If a multicast address is removed but there are still some multicast addresses, that address would remain programmed into the frame filter. Fix this by explicitly setting the enable bit for each filter. Fixes: 8a3b7a252dca ("drivers/net/ethernet/xilinx: added Xilinx AXI Ethernet driver") Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240822154059.1066595-3-sean.anderson@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-22net: xilinx: axienet: Always disable promiscuous modeSean Anderson1-0/+4
If promiscuous mode is disabled when there are fewer than four multicast addresses, then it will not be reflected in the hardware. Fix this by always clearing the promiscuous mode flag even when we program multicast addresses. Fixes: 8a3b7a252dca ("drivers/net/ethernet/xilinx: added Xilinx AXI Ethernet driver") Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240822154059.1066595-2-sean.anderson@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-22octeontx2-af: Fix CPT AF register offset calculationBharat Bhushan1-12/+11
Some CPT AF registers are per LF and others are global. Translation of PF/VF local LF slot number to actual LF slot number is required only for accessing perf LF registers. CPT AF global registers access do not require any LF slot number. Also, there is no reason CPT PF/VF to know actual lf's register offset. Without this fix microcode loading will fail, VFs cannot be created and hardware is not usable. Fixes: bc35e28af789 ("octeontx2-af: replace cpt slot with lf id on reg write") Signed-off-by: Bharat Bhushan <bbhushan2@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240821070558.1020101-1-bbhushan2@marvell.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-22net: ngbe: Fix phy mode set to external phyMengyuan Lou1-2/+6
The MAC only has add the TX delay and it can not be modified. MAC and PHY are both set the TX delay cause transmission problems. So just disable TX delay in PHY, when use rgmii to attach to external phy, set PHY_INTERFACE_MODE_RGMII_RXID to phy drivers. And it is does not matter to internal phy. Fixes: bc2426d74aa3 ("net: ngbe: convert phylib to phylink") Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com> Cc: stable@vger.kernel.org # 6.3+ Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/E6759CF1387CF84C+20240820030425.93003-1-mengyuanlou@net-swift.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-22Merge branch '100GbE' of ↵Jakub Kicinski3-46/+26
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2024-08-20 (ice) This series contains updates to ice driver only. Maciej fixes issues with Rx data path on architectures with PAGE_SIZE >= 8192; correcting page reuse usage and calculations for last offset and truesize. Michal corrects assignment of devlink port number to use PF id. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: ice: use internal pf id instead of function number ice: fix truesize operations for PAGE_SIZE >= 8192 ice: fix ICE_LAST_OFFSET formula ice: fix page reuse when PAGE_SIZE is over 8k ==================== Link: https://patch.msgid.link/20240820215620.1245310-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-22bnxt_en: Fix double DMA unmapping for XDP_REDIRECTSomnath Kotur1-5/+0
Remove the dma_unmap_page_attrs() call in the driver's XDP_REDIRECT code path. This should have been removed when we let the page pool handle the DMA mapping. This bug causes the warning: WARNING: CPU: 7 PID: 59 at drivers/iommu/dma-iommu.c:1198 iommu_dma_unmap_page+0xd5/0x100 CPU: 7 PID: 59 Comm: ksoftirqd/7 Tainted: G W 6.8.0-1010-gcp #11-Ubuntu Hardware name: Dell Inc. PowerEdge R7525/0PYVT1, BIOS 2.15.2 04/02/2024 RIP: 0010:iommu_dma_unmap_page+0xd5/0x100 Code: 89 ee 48 89 df e8 cb f2 69 ff 48 83 c4 08 5b 41 5c 41 5d 41 5e 41 5f 5d 31 c0 31 d2 31 c9 31 f6 31 ff 45 31 c0 e9 ab 17 71 00 <0f> 0b 48 83 c4 08 5b 41 5c 41 5d 41 5e 41 5f 5d 31 c0 31 d2 31 c9 RSP: 0018:ffffab1fc0597a48 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff99ff838280c8 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffffab1fc0597a78 R08: 0000000000000002 R09: ffffab1fc0597c1c R10: ffffab1fc0597cd3 R11: ffff99ffe375acd8 R12: 00000000e65b9000 R13: 0000000000000050 R14: 0000000000001000 R15: 0000000000000002 FS: 0000000000000000(0000) GS:ffff9a06efb80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000565c34c37210 CR3: 00000005c7e3e000 CR4: 0000000000350ef0 ? show_regs+0x6d/0x80 ? __warn+0x89/0x150 ? iommu_dma_unmap_page+0xd5/0x100 ? report_bug+0x16a/0x190 ? handle_bug+0x51/0xa0 ? exc_invalid_op+0x18/0x80 ? iommu_dma_unmap_page+0xd5/0x100 ? iommu_dma_unmap_page+0x35/0x100 dma_unmap_page_attrs+0x55/0x220 ? bpf_prog_4d7e87c0d30db711_xdp_dispatcher+0x64/0x9f bnxt_rx_xdp+0x237/0x520 [bnxt_en] bnxt_rx_pkt+0x640/0xdd0 [bnxt_en] __bnxt_poll_work+0x1a1/0x3d0 [bnxt_en] bnxt_poll+0xaa/0x1e0 [bnxt_en] __napi_poll+0x33/0x1e0 net_rx_action+0x18a/0x2f0 Fixes: 578fcfd26e2a ("bnxt_en: Let the page pool manage the DMA mapping") Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20240820203415.168178-1-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-21igb: cope with large MAX_SKB_FRAGSPaolo Abeni1-0/+1
Sabrina reports that the igb driver does not cope well with large MAX_SKB_FRAG values: setting MAX_SKB_FRAG to 45 causes payload corruption on TX. An easy reproducer is to run ssh to connect to the machine. With MAX_SKB_FRAGS=17 it works, with MAX_SKB_FRAGS=45 it fails. This has been reported originally in https://bugzilla.redhat.com/show_bug.cgi?id=2265320 The root cause of the issue is that the driver does not take into account properly the (possibly large) shared info size when selecting the ring layout, and will try to fit two packets inside the same 4K page even when the 1st fraglist will trump over the 2nd head. Address the issue by checking if 2K buffers are insufficient. Fixes: 3948b05950fd ("net: introduce a config option to tweak MAX_SKB_FRAGS") Reported-by: Jan Tluka <jtluka@redhat.com> Reported-by: Jirka Hladky <jhladky@redhat.com> Reported-by: Sabrina Dubroca <sd@queasysnail.net> Tested-by: Sabrina Dubroca <sd@queasysnail.net> Tested-by: Corinna Vinschen <vinschen@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Corinna Vinschen <vinschen@redhat.com> Link: https://patch.msgid.link/20240816152034.1453285-1-vinschen@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-21cxgb4: add forgotten u64 ivlan cast before shiftNikolay Kuratov1-1/+2
It is done everywhere in cxgb4 code, e.g. in is_filter_exact_match() There is no reason it should not be done here Found by Linux Verification Center (linuxtesting.org) with SVACE Signed-off-by: Nikolay Kuratov <kniv@yandex-team.ru> Cc: stable@vger.kernel.org Fixes: 12b276fbf6e0 ("cxgb4: add support to create hash filters") Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20240819075408.92378-1-kniv@yandex-team.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-21dpaa2-switch: Fix error checking in dpaa2_switch_seed_bp()Dan Carpenter1-3/+4
The dpaa2_switch_add_bufs() function returns the number of bufs that it was able to add. It returns BUFS_PER_CMD (7) for complete success or a smaller number if there are not enough pages available. However, the error checking is looking at the total number of bufs instead of the number which were added on this iteration. Thus the error checking only works correctly for the first iteration through the loop and subsequent iterations are always counted as a success. Fix this by checking only the bufs added in the current iteration. Fixes: 0b1b71370458 ("staging: dpaa2-switch: handle Rx path on control interface") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com> Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com> Link: https://patch.msgid.link/eec27f30-b43f-42b6-b8ee-04a6f83423b6@stanley.mountain Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-20ice: use internal pf id instead of function numberMichal Swiatkowski1-2/+2
Use always the same pf id in devlink port number. When doing pass-through the PF to VM bus info func number can be any value. Fixes: 2ae0aa4758b0 ("ice: Move devlink port to PF/VF struct") Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Suggested-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-08-20ice: fix truesize operations for PAGE_SIZE >= 8192Maciej Fijalkowski2-34/+20
When working on multi-buffer packet on arch that has PAGE_SIZE >= 8192, truesize is calculated and stored in xdp_buff::frame_sz per each processed Rx buffer. This means that frame_sz will contain the truesize based on last received buffer, but commit 1dc1a7e7f410 ("ice: Centrallize Rx buffer recycling") assumed this value will be constant for each buffer, which breaks the page recycling scheme and mess up the way we update the page::page_offset. To fix this, let us work on constant truesize when PAGE_SIZE >= 8192 instead of basing this on size of a packet read from Rx descriptor. This way we can simplify the code and avoid calculating truesize per each received frame and on top of that when using xdp_update_skb_shared_info(), current formula for truesize update will be valid. This means ice_rx_frame_truesize() can be removed altogether. Furthermore, first call to it within ice_clean_rx_irq() for 4k PAGE_SIZE was redundant as xdp_buff::frame_sz is initialized via xdp_init_buff() in ice_vsi_cfg_rxq(). This should have been removed at the point where xdp_buff struct started to be a member of ice_rx_ring and it was no longer a stack based variable. There are two fixes tags as my understanding is that the first one exposed us to broken truesize and page_offset handling and then second introduced broken skb_shared_info update in ice_{construct,build}_skb(). Reported-and-tested-by: Luiz Capitulino <luizcap@redhat.com> Closes: https://lore.kernel.org/netdev/8f9e2a5c-fd30-4206-9311-946a06d031bb@redhat.com/ Fixes: 1dc1a7e7f410 ("ice: Centrallize Rx buffer recycling") Fixes: 2fba7dc5157b ("ice: Add support for XDP multi-buffer on Rx side") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-08-20ice: fix ICE_LAST_OFFSET formulaMaciej Fijalkowski1-1/+1
For bigger PAGE_SIZE archs, ice driver works on 3k Rx buffers. Therefore, ICE_LAST_OFFSET should take into account ICE_RXBUF_3072, not ICE_RXBUF_2048. Fixes: 7237f5b0dba4 ("ice: introduce legacy Rx flag") Suggested-by: Luiz Capitulino <luizcap@redhat.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-08-20ice: fix page reuse when PAGE_SIZE is over 8kMaciej Fijalkowski1-9/+3
Architectures that have PAGE_SIZE >= 8192 such as arm64 should act the same as x86 currently, meaning reuse of a page should only take place when no one else is busy with it. Do two things independently of underlying PAGE_SIZE: - store the page count under ice_rx_buf::pgcnt - then act upon its value vs ice_rx_buf::pagecnt_bias when making the decision regarding page reuse Fixes: 2b245cb29421 ("ice: Implement transmit and NAPI support") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>