summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet
AgeCommit message (Collapse)AuthorFilesLines
2023-04-26e1000e: Disable TSO on i219-LM card to increase speedSebastian Basierski1-25/+26
[ Upstream commit 67d47b95119ad589b0a0b16b88b1dd9a04061ced ] While using i219-LM card currently it was only possible to achieve about 60% of maximum speed due to regression introduced in Linux 5.8. This was caused by TSO not being disabled by default despite commit f29801030ac6 ("e1000e: Disable TSO for buffer overrun workaround"). Fix that by disabling TSO during driver probe. Fixes: f29801030ac6 ("e1000e: Disable TSO for buffer overrun workaround") Signed-off-by: Sebastian Basierski <sebastianx.basierski@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230417205345.1030801-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-26bnxt_en: fix free-runnig PHC modeVadim Fedorenko1-1/+1
[ Upstream commit 8c154d272c3e03b100baaf1df473f22a78fa403e ] The patch in fixes changed the way real-time mode is chosen for PHC on the NIC. Apparently there is one more use case of the check outside of ptp part of the driver which was not converted to the new macro and is making a lot of noise in free-running mode. Fixes: 131db4991622 ("bnxt_en: reset PHC frequency in free-running mode") Signed-off-by: Vadim Fedorenko <vadfed@meta.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://lore.kernel.org/r/20230418202511.1544735-1-vadfed@meta.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-26mlxsw: pci: Fix possible crash during initializationIdo Schimmel1-1/+1
[ Upstream commit 1f64757ee2bb22a93ec89b4c71707297e8cca0ba ] During initialization the driver issues a reset command via its command interface in order to remove previous configuration from the device. After issuing the reset, the driver waits for 200ms before polling on the "system_status" register using memory-mapped IO until the device reaches a ready state (0x5E). The wait is necessary because the reset command only triggers the reset, but the reset itself happens asynchronously. If the driver starts polling too soon, the read of the "system_status" register will never return and the system will crash [1]. The issue was discovered when the device was flashed with a development firmware version where the reset routine took longer to complete. The issue was fixed in the firmware, but it exposed the fact that the current wait time is borderline. Fix by increasing the wait time from 200ms to 400ms. With this patch and the buggy firmware version, the issue did not reproduce in 10 reboots whereas without the patch the issue is reproduced quite consistently. [1] mce: CPUs not responding to MCE broadcast (may include false positives): 0,4 mce: CPUs not responding to MCE broadcast (may include false positives): 0,4 Kernel panic - not syncing: Timeout: Not all CPUs entered broadcast exception handler Shutting down cpus with NMI Kernel Offset: 0x12000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) Fixes: ac004e84164e ("mlxsw: pci: Wait longer before accessing the device after reset") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-26mlxfw: fix null-ptr-deref in mlxfw_mfa2_tlv_next()Nikita Zhandarovich1-0/+2
[ Upstream commit c0e73276f0fcbbd3d4736ba975d7dc7a48791b0c ] Function mlxfw_mfa2_tlv_multi_get() returns NULL if 'tlv' in question does not pass checks in mlxfw_mfa2_tlv_payload_get(). This behaviour may lead to NULL pointer dereference in 'multi->total_len'. Fix this issue by testing mlxfw_mfa2_tlv_multi_get()'s return value against NULL. Found by Linux Verification Center (linuxtesting.org) with static analysis tool SVACE. Fixes: 410ed13cae39 ("Add the mlxfw module for Mellanox firmware flash process") Co-developed-by: Natalia Petrova <n.petrova@fintech.ru> Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/20230417120718.52325-1-n.zhandarovich@fintech.ru Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-26bnxt_en: Do not initialize PTP on older P3/P4 chipsMichael Chan1-1/+1
[ Upstream commit e8b51a1a15d5a3cce231e0669f6a161dc5bb9b75 ] The driver does not support PTP on these older chips and it is assuming that firmware on these older chips will not return the PORT_MAC_PTP_QCFG_RESP_FLAGS_HWRM_ACCESS flag in __bnxt_hwrm_ptp_qcfg(), causing the function to abort quietly. But newer firmware now sets this flag and so __bnxt_hwrm_ptp_qcfg() will proceed further. Eventually it will fail in bnxt_ptp_init() -> bnxt_map_ptp_regs() because there is no code to support the older chips. The driver will then complain: "PTP initialization failed.\n" Fix it so that we abort quietly earlier without going through the unnecessary steps and alarming the user with the warning log. Fixes: ae5c42f0b92c ("bnxt_en: Get PTP hardware capability from firmware") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-26cxgb4: fix use after free bugs caused by circular dependency problemDuoming Zhou1-1/+1
[ Upstream commit e50b9b9e8610d47b7c22529443e45a16b1ea3a15 ] The flower_stats_timer can schedule flower_stats_work and flower_stats_work can also arm the flower_stats_timer. The process is shown below: ----------- timer schedules work ------------ ch_flower_stats_cb() //timer handler schedule_work(&adap->flower_stats_work); ----------- work arms timer ------------ ch_flower_stats_handler() //workqueue callback function mod_timer(&adap->flower_stats_timer, ...); When the cxgb4 device is detaching, the timer and workqueue could still be rearmed. The process is shown below: (cleanup routine) | (timer and workqueue routine) remove_one() | free_some_resources() | ch_flower_stats_cb() //timer cxgb4_cleanup_tc_flower() | schedule_work() del_timer_sync() | | ch_flower_stats_handler() //workqueue | mod_timer() cancel_work_sync() | kfree(adapter) //FREE | ch_flower_stats_cb() //timer | adap->flower_stats_work //USE This patch changes del_timer_sync() to timer_shutdown_sync(), which could prevent rearming of the timer from the workqueue. Fixes: e0f911c81e93 ("cxgb4: fetch stats for offloaded tc flower flows") Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Link: https://lore.kernel.org/r/20230415081227.7463-1-duoming@zju.edu.cn Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-26i40e: fix i40e_setup_misc_vector() error handlingAleksandr Loktionov1-1/+4
[ Upstream commit c86c00c6935505929cc9adb29ddb85e48c71f828 ] Add error handling of i40e_setup_misc_vector() in i40e_rebuild(). In case interrupt vectors setup fails do not re-open vsi-s and do not bring up vf-s, we have no interrupts to serve a traffic anyway. Fixes: 41c445ff0f48 ("i40e: main driver core") Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-26i40e: fix accessing vsi->active_filters without holding lockAleksandr Loktionov1-2/+2
[ Upstream commit 8485d093b076e59baff424552e8aecfc5bd2d261 ] Fix accessing vsi->active_filters without holding the mac_filter_hash_lock. Move vsi->active_filters = 0 inside critical section and move clear_bit(__I40E_VSI_OVERFLOW_PROMISC, vsi->state) after the critical section to ensure the new filters from other threads can be added only after filters cleaning in the critical section is finished. Fixes: 278e7d0b9d68 ("i40e: store MAC/VLAN filters in a hash with the MAC Address as key") Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-26sfc: Fix use-after-free due to selftest_workDing Hui2-1/+2
[ Upstream commit a80bb8e7233b2ad6ff119646b6e33fb3edcec37b ] There is a use-after-free scenario that is: When the NIC is down, user set mac address or vlan tag to VF, the xxx_set_vf_mac() or xxx_set_vf_vlan() will invoke efx_net_stop() and efx_net_open(), since netif_running() is false, the port will not start and keep port_enabled false, but selftest_work is scheduled in efx_net_open(). If we remove the device before selftest_work run, the efx_stop_port() will not be called since the NIC is down, and then efx is freed, we will soon get a UAF in run_timer_softirq() like this: [ 1178.907941] ================================================================== [ 1178.907948] BUG: KASAN: use-after-free in run_timer_softirq+0xdea/0xe90 [ 1178.907950] Write of size 8 at addr ff11001f449cdc80 by task swapper/47/0 [ 1178.907950] [ 1178.907953] CPU: 47 PID: 0 Comm: swapper/47 Kdump: loaded Tainted: G O --------- -t - 4.18.0 #1 [ 1178.907954] Hardware name: SANGFOR X620G40/WI2HG-208T1061A, BIOS SPYH051032-U01 04/01/2022 [ 1178.907955] Call Trace: [ 1178.907956] <IRQ> [ 1178.907960] dump_stack+0x71/0xab [ 1178.907963] print_address_description+0x6b/0x290 [ 1178.907965] ? run_timer_softirq+0xdea/0xe90 [ 1178.907967] kasan_report+0x14a/0x2b0 [ 1178.907968] run_timer_softirq+0xdea/0xe90 [ 1178.907971] ? init_timer_key+0x170/0x170 [ 1178.907973] ? hrtimer_cancel+0x20/0x20 [ 1178.907976] ? sched_clock+0x5/0x10 [ 1178.907978] ? sched_clock_cpu+0x18/0x170 [ 1178.907981] __do_softirq+0x1c8/0x5fa [ 1178.907985] irq_exit+0x213/0x240 [ 1178.907987] smp_apic_timer_interrupt+0xd0/0x330 [ 1178.907989] apic_timer_interrupt+0xf/0x20 [ 1178.907990] </IRQ> [ 1178.907991] RIP: 0010:mwait_idle+0xae/0x370 If the NIC is not actually brought up, there is no need to schedule selftest_work, so let's move invoking efx_selftest_async_start() into efx_start_all(), and it will be canceled by broughting down. Fixes: dd40781e3a4e ("sfc: Run event/IRQ self-test asynchronously when interface is brought up") Fixes: e340be923012 ("sfc: add ndo_set_vf_mac() function for EF10") Debugged-by: Huang Cun <huangcun@sangfor.com.cn> Cc: Donglin Peng <pengdonglin@sangfor.com.cn> Suggested-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: Ding Hui <dinghui@sangfor.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-20net: macb: fix a memory corruption in extended buffer descriptor modeRoman Gushchin1-0/+4
[ Upstream commit e8b74453555872851bdd7ea43a7c0ec39659834f ] For quite some time we were chasing a bug which looked like a sudden permanent failure of networking and mmc on some of our devices. The bug was very sensitive to any software changes and even more to any kernel debug options. Finally we got a setup where the problem was reproducible with CONFIG_DMA_API_DEBUG=y and it revealed the issue with the rx dma: [ 16.992082] ------------[ cut here ]------------ [ 16.996779] DMA-API: macb ff0b0000.ethernet: device driver tries to free DMA memory it has not allocated [device address=0x0000000875e3e244] [size=1536 bytes] [ 17.011049] WARNING: CPU: 0 PID: 85 at kernel/dma/debug.c:1011 check_unmap+0x6a0/0x900 [ 17.018977] Modules linked in: xxxxx [ 17.038823] CPU: 0 PID: 85 Comm: irq/55-8000f000 Not tainted 5.4.0 #28 [ 17.045345] Hardware name: xxxxx [ 17.049528] pstate: 60000005 (nZCv daif -PAN -UAO) [ 17.054322] pc : check_unmap+0x6a0/0x900 [ 17.058243] lr : check_unmap+0x6a0/0x900 [ 17.062163] sp : ffffffc010003c40 [ 17.065470] x29: ffffffc010003c40 x28: 000000004000c03c [ 17.070783] x27: ffffffc010da7048 x26: ffffff8878e38800 [ 17.076095] x25: ffffff8879d22810 x24: ffffffc010003cc8 [ 17.081407] x23: 0000000000000000 x22: ffffffc010a08750 [ 17.086719] x21: ffffff8878e3c7c0 x20: ffffffc010acb000 [ 17.092032] x19: 0000000875e3e244 x18: 0000000000000010 [ 17.097343] x17: 0000000000000000 x16: 0000000000000000 [ 17.102647] x15: ffffff8879e4a988 x14: 0720072007200720 [ 17.107959] x13: 0720072007200720 x12: 0720072007200720 [ 17.113261] x11: 0720072007200720 x10: 0720072007200720 [ 17.118565] x9 : 0720072007200720 x8 : 000000000000022d [ 17.123869] x7 : 0000000000000015 x6 : 0000000000000098 [ 17.129173] x5 : 0000000000000000 x4 : 0000000000000000 [ 17.134475] x3 : 00000000ffffffff x2 : ffffffc010a1d370 [ 17.139778] x1 : b420c9d75d27bb00 x0 : 0000000000000000 [ 17.145082] Call trace: [ 17.147524] check_unmap+0x6a0/0x900 [ 17.151091] debug_dma_unmap_page+0x88/0x90 [ 17.155266] gem_rx+0x114/0x2f0 [ 17.158396] macb_poll+0x58/0x100 [ 17.161705] net_rx_action+0x118/0x400 [ 17.165445] __do_softirq+0x138/0x36c [ 17.169100] irq_exit+0x98/0xc0 [ 17.172234] __handle_domain_irq+0x64/0xc0 [ 17.176320] gic_handle_irq+0x5c/0xc0 [ 17.179974] el1_irq+0xb8/0x140 [ 17.183109] xiic_process+0x5c/0xe30 [ 17.186677] irq_thread_fn+0x28/0x90 [ 17.190244] irq_thread+0x208/0x2a0 [ 17.193724] kthread+0x130/0x140 [ 17.196945] ret_from_fork+0x10/0x20 [ 17.200510] ---[ end trace 7240980785f81d6f ]--- [ 237.021490] ------------[ cut here ]------------ [ 237.026129] DMA-API: exceeded 7 overlapping mappings of cacheline 0x0000000021d79e7b [ 237.033886] WARNING: CPU: 0 PID: 0 at kernel/dma/debug.c:499 add_dma_entry+0x214/0x240 [ 237.041802] Modules linked in: xxxxx [ 237.061637] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 5.4.0 #28 [ 237.068941] Hardware name: xxxxx [ 237.073116] pstate: 80000085 (Nzcv daIf -PAN -UAO) [ 237.077900] pc : add_dma_entry+0x214/0x240 [ 237.081986] lr : add_dma_entry+0x214/0x240 [ 237.086072] sp : ffffffc010003c30 [ 237.089379] x29: ffffffc010003c30 x28: ffffff8878a0be00 [ 237.094683] x27: 0000000000000180 x26: ffffff8878e387c0 [ 237.099987] x25: 0000000000000002 x24: 0000000000000000 [ 237.105290] x23: 000000000000003b x22: ffffffc010a0fa00 [ 237.110594] x21: 0000000021d79e7b x20: ffffffc010abe600 [ 237.115897] x19: 00000000ffffffef x18: 0000000000000010 [ 237.121201] x17: 0000000000000000 x16: 0000000000000000 [ 237.126504] x15: ffffffc010a0fdc8 x14: 0720072007200720 [ 237.131807] x13: 0720072007200720 x12: 0720072007200720 [ 237.137111] x11: 0720072007200720 x10: 0720072007200720 [ 237.142415] x9 : 0720072007200720 x8 : 0000000000000259 [ 237.147718] x7 : 0000000000000001 x6 : 0000000000000000 [ 237.153022] x5 : ffffffc010003a20 x4 : 0000000000000001 [ 237.158325] x3 : 0000000000000006 x2 : 0000000000000007 [ 237.163628] x1 : 8ac721b3a7dc1c00 x0 : 0000000000000000 [ 237.168932] Call trace: [ 237.171373] add_dma_entry+0x214/0x240 [ 237.175115] debug_dma_map_page+0xf8/0x120 [ 237.179203] gem_rx_refill+0x190/0x280 [ 237.182942] gem_rx+0x224/0x2f0 [ 237.186075] macb_poll+0x58/0x100 [ 237.189384] net_rx_action+0x118/0x400 [ 237.193125] __do_softirq+0x138/0x36c [ 237.196780] irq_exit+0x98/0xc0 [ 237.199914] __handle_domain_irq+0x64/0xc0 [ 237.204000] gic_handle_irq+0x5c/0xc0 [ 237.207654] el1_irq+0xb8/0x140 [ 237.210789] arch_cpu_idle+0x40/0x200 [ 237.214444] default_idle_call+0x18/0x30 [ 237.218359] do_idle+0x200/0x280 [ 237.221578] cpu_startup_entry+0x20/0x30 [ 237.225493] rest_init+0xe4/0xf0 [ 237.228713] arch_call_rest_init+0xc/0x14 [ 237.232714] start_kernel+0x47c/0x4a8 [ 237.236367] ---[ end trace 7240980785f81d70 ]--- Lars was fast to find an explanation: according to the datasheet bit 2 of the rx buffer descriptor entry has a different meaning in the extended mode: Address [2] of beginning of buffer, or in extended buffer descriptor mode (DMA configuration register [28] = 1), indicates a valid timestamp in the buffer descriptor entry. The macb driver didn't mask this bit while getting an address and it eventually caused a memory corruption and a dma failure. The problem is resolved by explicitly clearing the problematic bit if hw timestamping is used. Fixes: 7b4296148066 ("net: macb: Add support for PTP timestamps in DMA descriptors") Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev> Co-developed-by: Lars-Peter Clausen <lars@metafoo.de> Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20230412232144.770336-1-roman.gushchin@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-20qlcnic: check pci_reset_function resultDenis Plotnikov1-1/+7
[ Upstream commit 7573099e10ca69c3be33995c1fcd0d241226816d ] Static code analyzer complains to unchecked return value. The result of pci_reset_function() is unchecked. Despite, the issue is on the FLR supported code path and in that case reset can be done with pcie_flr(), the patch uses less invasive approach by adding the result check of pci_reset_function(). Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 7e2cf4feba05 ("qlcnic: change driver hardware interface mechanism") Signed-off-by: Denis Plotnikov <den-plotnikov@yandex-team.ru> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-20iavf: remove active_cvlans and active_svlans bitmapsAhmed Zaki3-47/+45
[ Upstream commit 9c85b7fa12ef2e4fc11a4e31ac595fb5f9d0ddf9 ] The VLAN filters info is currently being held in a list and 2 bitmaps (active_cvlans and active_svlans). We are experiencing some racing where data is not in sync in the list and bitmaps. For example, the VLAN is initially added to the list but only when the PF replies, it is added to the bitmap. If a user adds many V2 VLANS before the PF responds: while [ $((i++)) ] ip l add l eth0 name eth0.$i type vlan id $i we might end up with more VLAN list entries than the designated limit. Also, The "ip link show" will show more links added than the PF limit. On the other and, the bitmaps are only used to check the number of VLAN filters and to re-enable the filters when the interface goes from DOWN to UP. This patch gets rid of the bitmaps and uses the list only. To do that, the states of the VLAN filter are modified: 1 - IAVF_VLAN_REMOVE: the entry needs to be totally removed after informing the PF. This is the "ip link del eth0.$i" path. 2 - IAVF_VLAN_DISABLE: (new) the netdev went down. The filter needs to be removed from the PF and then marked INACTIVE. 3 - IAVF_VLAN_INACTIVE: (new) no PF filter exists, but the user did not delete the VLAN. Fixes: 48ccc43ecf10 ("iavf: Add support VIRTCHNL_VF_OFFLOAD_VLAN_V2 during netdev config") Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-20iavf: refactor VLAN filter statesAhmed Zaki3-26/+28
[ Upstream commit 0c0da0e951053fda20412cd284e2714bbbb31bff ] The VLAN filter states are currently being saved as individual bits. This is error prone as multiple bits might be mistakenly set. Fix by replacing the bits with a single state enum. Also, add an "ACTIVE" state for filters that are accepted by the PF. Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Stable-dep-of: 9c85b7fa12ef ("iavf: remove active_cvlans and active_svlans bitmaps") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-20niu: Fix missing unwind goto in niu_alloc_channels()Harshit Mogalapalli1-1/+1
[ Upstream commit 8ce07be703456acb00e83d99f3b8036252c33b02 ] Smatch reports: drivers/net/ethernet/sun/niu.c:4525 niu_alloc_channels() warn: missing unwind goto? If niu_rbr_fill() fails, then we are directly returning 'err' without freeing the channels. Fix this by changing direct return to a goto 'out_err'. Fixes: a3138df9f20e ("[NIU]: Add Sun Neptune ethernet driver.") Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-13net: stmmac: Add queue reset into stmmac_xdp_open() functionSong Yoong Siang1-0/+2
commit 24e3fce00c0b557491ff596c0682a29dee6fe848 upstream. Queue reset was moved out from __init_dma_rx_desc_rings() and __init_dma_tx_desc_rings() functions. Thus, the driver fails to transmit and receive packet after XDP prog setup. This commit adds the missing queue reset into stmmac_xdp_open() function. Fixes: f9ec5723c3db ("net: ethernet: stmicro: stmmac: move queue reset to dedicated functions") Cc: <stable@vger.kernel.org> # 6.0+ Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Link: https://lore.kernel.org/r/20230404044823.3226144-1-yoong.siang.song@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-13net: stmmac: check fwnode for phy device before scanning for phyMichael Sit Wei Hong1-4/+11
[ Upstream commit 8fbc10b995a506e173f1080dfa2764f232a65e02 ] Some DT devices already have phy device configured in the DT/ACPI. Current implementation scans for a phy unconditionally even though there is a phy listed in the DT/ACPI and already attached. We should check the fwnode if there is any phy device listed in fwnode and decide whether to scan for a phy to attach to. Fixes: fe2cfbc96803 ("net: stmmac: check if MAC needs to attach to a PHY") Reported-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Link: https://lore.kernel.org/lkml/20230403212434.296975-1-martin.blumenstingl@googlemail.com/ Tested-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Shahab Vahedi <shahab@synopsys.com> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Tested-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Suggested-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Michael Sit Wei Hong <michael.wei.hong.sit@intel.com> Link: https://lore.kernel.org/r/20230406024541.3556305-1-michael.wei.hong.sit@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-13gve: Secure enough bytes in the first TX desc for all TCP pktsShailend Chand2-7/+7
[ Upstream commit 3ce9345580974863c060fa32971537996a7b2d57 ] Non-GSO TCP packets whose SKBs' linear portion did not include the entire TCP header were not populating the first Tx descriptor with as many bytes as the vNIC expected. This change ensures that all TCP packets populate the first descriptor with the correct number of bytes. Fixes: 893ce44df565 ("gve: Add basic driver framework for Compute Engine Virtual NIC") Signed-off-by: Shailend Chand <shailend@google.com> Link: https://lore.kernel.org/r/20230403172809.2939306-1-shailend@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-13ice: Reset FDIR counter in FDIR init stageLingyu Liu1-0/+16
[ Upstream commit 83c911dc5e0e8e6eaa6431c06972a8f159bfe2fc ] Reset the FDIR counters when FDIR inits. Without this patch, when VF initializes or resets, all the FDIR counters are not cleaned, which may cause unexpected behaviors for future FDIR rule create (e.g., rule conflict). Fixes: 1f7ea1cd6a37 ("ice: Enable FDIR Configure for AVF") Signed-off-by: Junfeng Guo <junfeng.guo@intel.com> Signed-off-by: Lingyu Liu <lingyu.liu@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-13ice: fix wrong fallback logic for FDIRSimei Su1-3/+4
[ Upstream commit b4a01ace20f5c93c724abffc0a83ec84f514b98d ] When adding a FDIR filter, if ice_vc_fdir_set_irq_ctx returns failure, the inserted fdir entry will not be removed and if ice_vc_fdir_write_fltr returns failure, the fdir context info for irq handler will not be cleared which may lead to inconsistent or memory leak issue. This patch refines failure cases to resolve this issue. Fixes: 1f7ea1cd6a37 ("ice: Enable FDIR Configure for AVF") Signed-off-by: Simei Su <simei.su@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-13net: stmmac: fix up RX flow hash indirection table when setting channelsCorinna Vinschen1-1/+5
[ Upstream commit 218c597325f4faf7b7a6049233a30d7842b5b2dc ] stmmac_reinit_queues() fails to fix up the RX hash. Even if the number of channels gets restricted, the output of `ethtool -x' indicates that all RX queues are used: $ ethtool -l enp0s29f2 Channel parameters for enp0s29f2: Pre-set maximums: RX: 8 TX: 8 Other: n/a Combined: n/a Current hardware settings: RX: 8 TX: 8 Other: n/a Combined: n/a $ ethtool -x enp0s29f2 RX flow hash indirection table for enp0s29f2 with 8 RX ring(s): 0: 0 1 2 3 4 5 6 7 8: 0 1 2 3 4 5 6 7 [...] $ ethtool -L enp0s29f2 rx 3 $ ethtool -x enp0s29f2 RX flow hash indirection table for enp0s29f2 with 3 RX ring(s): 0: 0 1 2 3 4 5 6 7 8: 0 1 2 3 4 5 6 7 [...] Fix this by setting the indirection table according to the number of specified queues. The result is now as expected: $ ethtool -L enp0s29f2 rx 3 $ ethtool -x enp0s29f2 RX flow hash indirection table for enp0s29f2 with 3 RX ring(s): 0: 0 1 2 0 1 2 0 1 8: 2 0 1 2 0 1 2 0 [...] Tested on Intel Elkhart Lake. Fixes: 0366f7e06a6b ("net: stmmac: add ethtool support for get/set channels") Signed-off-by: Corinna Vinschen <vinschen@redhat.com> Link: https://lore.kernel.org/r/20230403121120.489138-1-vinschen@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-13net: ethernet: ti: am65-cpsw: Fix mdio cleanup in probeSiddharth Vadapalli1-2/+4
[ Upstream commit c6b486fb33680ad5a3a6390ce693c835caaae3f7 ] In the am65_cpsw_nuss_probe() function's cleanup path, the call to of_platform_device_destroy() for the common->mdio_dev device is invoked unconditionally. It is possible that either the MDIO node is not present in the device-tree, or the MDIO node is disabled in the device-tree. In both these cases, the MDIO device is not created, resulting in a NULL pointer dereference when the of_platform_device_destroy() function is invoked on the common->mdio_dev device on the cleanup path. Fix this by ensuring that the common->mdio_dev device exists, before attempting to invoke of_platform_device_destroy(). Fixes: a45cfcc69a25 ("net: ethernet: ti: am65-cpsw-nuss: use of_platform_device_create() for mdio") Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com> Reviewed-by: Roger Quadros <rogerq@kernel.org> Link: https://lore.kernel.org/r/20230403090321.835877-1-s-vadapalli@ti.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-13net: ethernet: mtk_eth_soc: fix remaining throughput regressionFelix Fietkau1-0/+4
[ Upstream commit e669ce46740a9815953bb4452a6bc5a7fdc21a50 ] Based on further tests, it seems that the QDMA shaper is not able to perform shaping close to the MAC link rate without throughput loss. This cannot be compensated by increasing the shaping rate, so it seems to be an internal limit. Fix the remaining throughput regression by detecting that condition and limiting shaping to ports with lower link speed. This patch intentionally ignores link speed gain from TRGMII, because even on such links, shaping to 1000 Mbit/s incurs some throughput degradation. Fixes: f63959c7eec3 ("net: ethernet: mtk_eth_soc: implement multi-queue support for per-port queues") Tested-By: Frank Wunderlich <frank-w@public-files.de> Reported-by: Frank Wunderlich <frank-w@public-files.de> Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-13net: stmmac: remove redundant fixup to support fixed-link modeMichael Sit Wei Hong1-1/+0
[ Upstream commit 6fc21a6ed5953b1dd3a41ce7be1ea57f5ef8c081 ] Currently, intel_speed_mode_2500() will fix-up xpcs_an_inband to 1 if the underlying controller has a max speed of 1000Mbps. The value has been initialized and modified if it is a fixed-linked setup earlier. This patch removes the fix-up to allow for fixed-linked setup support. In stmmac_phy_setup(), ovr_an_inband is set based on the value of xpcs_an_inband. Which in turn will return an error in phylink_parse_mode() where MLO_AN_FIXED and ovr_an_inband are both set. Fixes: c82386310d95 ("stmmac: intel: prepare to support 1000BASE-X phy interface setting") Signed-off-by: Michael Sit Wei Hong <michael.wei.hong.sit@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-13net: stmmac: check if MAC needs to attach to a PHYMichael Sit Wei Hong1-1/+3
[ Upstream commit fe2cfbc9680356a3d9f8adde8a38e715831e32f5 ] After the introduction of the fixed-link support, the MAC driver no longer attempt to scan for a PHY to attach to. This causes the non fixed-link setups to stop working. Using the phylink_expects_phy() to check and determine if the MAC should expect and attach a PHY. Fixes: ab21cf920928 ("net: stmmac: make mdio register skips PHY scanning for fixed-link") Signed-off-by: Michael Sit Wei Hong <michael.wei.hong.sit@intel.com> Signed-off-by: Lai Peter Jun Ann <peter.jun.ann.lai@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-06net: ethernet: mtk_eth_soc: add missing ppe cache flush when deleting a flowFelix Fietkau1-0/+1
[ Upstream commit 924531326e2dd4ceabe7240f2b55a88e7d894ec2 ] The cache needs to be flushed to ensure that the hardware stops offloading the flow immediately. Fixes: 33fc42de3327 ("net: ethernet: mtk_eth_soc: support creating mac address based offload entries") Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Felix Fietkau <nbd@nbd.name> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20230330120840.52079-3-nbd@nbd.name Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-06net: ethernet: mtk_eth_soc: fix L2 offloading with DSA untag offloadFelix Fietkau2-4/+7
[ Upstream commit 5f36ca1b841fb17a20249fd9fedafc7dc7fdd940 ] Check for skb metadata in order to detect the case where the DSA header is not present. Fixes: 2d7605a72906 ("net: ethernet: mtk_eth_soc: enable hardware DSA untagging") Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Felix Fietkau <nbd@nbd.name> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20230330120840.52079-2-nbd@nbd.name Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-06net: ethernet: mtk_eth_soc: fix flow block refcounting logicFelix Fietkau1-1/+2
[ Upstream commit 8c1cb87c2a5c29da416848451a687473f379611c ] Since we call flow_block_cb_decref on FLOW_BLOCK_UNBIND, we also need to call flow_block_cb_incref for a newly allocated cb. Also fix the accidentally inverted refcount check on unbind. Fixes: 502e84e2382d ("net: ethernet: mtk_eth_soc: add flow offloading support") Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Felix Fietkau <nbd@nbd.name> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20230330120840.52079-1-nbd@nbd.name Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-06bnxt_en: Add missing 200G link speed reportingMichael Chan2-0/+3
[ Upstream commit 581bce7bcb7e7f100908728e7b292e266c76895b ] bnxt_fw_to_ethtool_speed() is missing the case statement for 200G link speed reported by firmware. As a result, ethtool will report unknown speed when the firmware reports 200G link speed. Fixes: 532262ba3b84 ("bnxt_en: ethtool: support PAM4 link speeds up to 200G") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-06bnxt_en: Fix typo in PCI id to device description string mappingKalesh AP1-4/+4
[ Upstream commit 62aad36ed31abc80f35db11e187e690448a79f7d ] Fix 57502 and 57508 NPAR description string entries. The typos caused these devices to not match up with lspci output. Fixes: 49c98421e6ab ("bnxt_en: Add PCI IDs for 57500 series NPAR devices.") Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-06bnxt_en: Fix reporting of test result in ethtool selftestKalesh AP1-0/+1
[ Upstream commit 83714dc3db0e4a088673601bc8099b079bc1a077 ] When the selftest command fails, driver is not reporting the failure by updating the "test->flags" when bnxt_close_nic() fails. Fixes: eb51365846bc ("bnxt_en: Add basic ethtool -t selftest support.") Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-06i40e: fix registers dump after run ethtool adapter self testRadoslaw Tyl2-6/+7
[ Upstream commit c5cff16f461a4a434a9915a7be7ac9ced861a8a4 ] Fix invalid registers dump from ethtool -d ethX after adapter self test by ethtool -t ethY. It causes invalid data display. The problem was caused by overwriting i40e_reg_list[].elements which is common for ethtool self test and dump. Fixes: 22dd9ae8afcc ("i40e: Rework register diagnostic") Signed-off-by: Radoslaw Tyl <radoslawx.tyl@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20230328172659.3906413-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-06bnx2x: use the right build_skb() helperJakub Kicinski1-2/+14
[ Upstream commit 8c495270845d6b4854607e946baef3637a8259ed ] build_skb() no longer accepts slab buffers. Since slab use is fairly uncommon we prefer the drivers to call a separate slab_build_skb() function appropriately. bnx2x uses the old semantics where size of 0 meant buffer from slab. It sets the fp->rx_frag_size to 0 for MTUs which don't fit in a page. It needs to call slab_build_skb(). This fixes the WARN_ONCE() of incorrect API use seen with bnx2x. Reported-by: Thomas Voegtle <tv@lio96.de> Link: https://lore.kernel.org/all/b8f295e4-ba57-8bfb-7d9c-9d62a498a727@lio96.de/ Fixes: ce098da1497c ("skbuff: Introduce slab_build_skb()") Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20230329000013.2734957-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-06net: ethernet: mtk_eth_soc: fix tx throughput regression with direct 1G linksFelix Fietkau1-2/+0
[ Upstream commit 07b3af42d8d528374d4f42d688bae86eeb30831a ] Using the QDMA tx scheduler to throttle tx to line speed works fine for switch ports, but apparently caused a regression on non-switch ports. Based on a number of tests, it seems that this throttling can be safely dropped without re-introducing the issues on switch ports that the tx scheduling changes resolved. Link: https://lore.kernel.org/netdev/trinity-92c3826f-c2c8-40af-8339-bc6d0d3ffea4-1678213958520@3c-app-gmx-bs16/ Fixes: f63959c7eec3 ("net: ethernet: mtk_eth_soc: implement multi-queue support for per-port queues") Reported-by: Frank Wunderlich <frank-w@public-files.de> Reported-by: Daniel Golle <daniel@makrotopia.org> Tested-by: Daniel Golle <daniel@makrotopia.org> Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20230324140404.95745-1-nbd@nbd.name Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-06ice: fix invalid check for empty list in ice_sched_assoc_vsi_to_agg()Jakob Koschel1-3/+5
[ Upstream commit e9a1cc2e4c4ee7c7e60fb26345618c2522a2a10f ] The code implicitly assumes that the list iterator finds a correct handle. If 'vsi_handle' is not found the 'old_agg_vsi_info' was pointing to an bogus memory location. For safety a separate list iterator variable should be used to make the != NULL check on 'old_agg_vsi_info' correct under any circumstances. Additionally Linus proposed to avoid any use of the list iterator variable after the loop, in the attempt to move the list iterator variable declaration into the macro to avoid any potential misuse after the loop. Using it in a pointer comparison after the loop is undefined behavior and should be omitted if possible [1]. Fixes: 37c592062b16 ("ice: remove the VSI info from previous agg") Link: https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=EHreAsk5SqXPwr9Y7k9sA6cWXJ6w@mail.gmail.com/ [1] Signed-off-by: Jakob Koschel <jkl820.git@gmail.com> Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-06ice: add profile conflict check for AVF FDIRJunfeng Guo1-0/+73
[ Upstream commit 29486b6df3e6a63b57d1ed1dce06051267282ff4 ] Add profile conflict check while adding some FDIR rules to avoid unexpected flow behavior, rules may have conflict including: IPv4 <---> {IPv4_UDP, IPv4_TCP, IPv4_SCTP} IPv6 <---> {IPv6_UDP, IPv6_TCP, IPv6_SCTP} For example, when we create an FDIR rule for IPv4, this rule will work on packets including IPv4, IPv4_UDP, IPv4_TCP and IPv4_SCTP. But if we then create an FDIR rule for IPv4_UDP and then destroy it, the first FDIR rule for IPv4 cannot work on pkt IPv4_UDP then. To prevent this unexpected behavior, we add restriction in software when creating FDIR rules by adding necessary profile conflict check. Fixes: 1f7ea1cd6a37 ("ice: Enable FDIR Configure for AVF") Signed-off-by: Junfeng Guo <junfeng.guo@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-06ice: Fix ice_cfg_rdma_fltr() to only update relevant fieldsBrett Creeley1-4/+22
[ Upstream commit d94dbdc4e0209b5e7d736ab696f8d635b034e3ee ] The current implementation causes ice_vsi_update() to update all VSI fields based on the cached VSI context. This also assumes that the ICE_AQ_VSI_PROP_Q_OPT_VALID bit is set. This can cause problems if the VSI context is not correctly synced by the driver. Fix this by only updating the fields that correspond to ICE_AQ_VSI_PROP_Q_OPT_VALID. Also, make sure to save the updated result in the cached VSI context on success. Fixes: 348048e724a0 ("ice: Implement iidc operations") Co-developed-by: Robert Malz <robertx.malz@intel.com> Signed-off-by: Robert Malz <robertx.malz@intel.com> Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Reviewed-by: Piotr Raczynski <piotr.raczynski@intel.com> Tested-by: Jakub Andrysiak <jakub.andrysiak@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-06smsc911x: avoid PHY being resumed when interface is not upWolfram Sang1-2/+5
[ Upstream commit f22c993f31fa9615df46e49cd768b713d39a852f ] SMSC911x doesn't need mdiobus suspend/resume, that's why it sets 'mac_managed_pm'. However, setting it needs to be moved from init to probe, so mdiobus PM functions will really never be called (e.g. when the interface is not up yet during suspend/resume). Fixes: 3ce9f2bef755 ("net: smsc911x: Stop and start PHY during suspend and resume") Suggested-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230327083138.6044-1-wsa+renesas@sang-engineering.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-06net: mvpp2: parser fix PPPoESven Auhagen1-48/+34
[ Upstream commit 031a416c2170866be5132ae42e14453d669b0cb1 ] In PPPoE add all IPv4 header option length to the parser and adjust the L3 and L4 offset accordingly. Currently the L4 match does not work with PPPoE and all packets are matched as L3 IP4 OPT. Fixes: 3f518509dedc ("ethernet: Add new driver for Marvell Armada 375 network unit") Signed-off-by: Sven Auhagen <sven.auhagen@voleatech.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-06net: mvpp2: parser fix QinQSven Auhagen1-2/+2
[ Upstream commit a587a84813b90372cb0a7565e201a4075da67919 ] The mvpp2 parser entry for QinQ has the inner and outer VLAN in the wrong order. Fix the problem by swapping them. Fixes: 3f518509dedc ("ethernet: Add new driver for Marvell Armada 375 network unit") Signed-off-by: Sven Auhagen <sven.auhagen@voleatech.de> Reviewed-by: Marcin Wojtas <mw@semihalf.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-06net: mvpp2: classifier flow fix fragmentation flagsSven Auhagen1-12/+18
[ Upstream commit 9a251cae51d57289908222e6c322ca61fccc25fd ] Add missing IP Fragmentation Flag. Fixes: f9358e12a0af ("net: mvpp2: split ingress traffic into multiple flows") Signed-off-by: Sven Auhagen <sven.auhagen@voleatech.de> Reviewed-by: Marcin Wojtas <mw@semihalf.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-06net: stmmac: don't reject VLANs when IFF_PROMISC is setVladimir Oltean2-59/+3
[ Upstream commit a7602e7332b97cfbec7bacb0f1ade99a575fe104 ] The blamed commit has introduced the following tests to dwmac4_add_hw_vlan_rx_fltr(), called from stmmac_vlan_rx_add_vid(): if (hw->promisc) { netdev_err(dev, "Adding VLAN in promisc mode not supported\n"); return -EPERM; } "VLAN promiscuous" mode is keyed in this driver to IFF_PROMISC, and so, vlan_vid_add() and vlan_vid_del() calls cannot take place in IFF_PROMISC mode. I have the following 2 arguments that this restriction is.... hm, how shall I put it nicely... unproductive :) First, take the case of a Linux bridge. If the kernel is compiled with CONFIG_BRIDGE_VLAN_FILTERING=y, then this bridge shall have a VLAN database. The bridge shall try to call vlan_add_vid() on its bridge ports for each VLAN in the VLAN table. It will do this irrespectively of whether that port is *currently* VLAN-aware or not. So it will do this even when the bridge was created with vlan_filtering 0. But the Linux bridge, in VLAN-unaware mode, configures its ports in promiscuous (IFF_PROMISC) mode, so that they accept packets with any MAC DA (a switch must do this in order to forward those packets which are not directly targeted to its MAC address). As a result, the stmmac driver does not work as a bridge port, when the kernel is compiled with CONFIG_BRIDGE_VLAN_FILTERING=y. $ ip link add br0 type bridge && ip link set br0 up $ ip link set eth0 master br0 && ip link set eth0 up [ 2333.943296] br0: port 1(eth0) entered blocking state [ 2333.943381] br0: port 1(eth0) entered disabled state [ 2333.943782] device eth0 entered promiscuous mode [ 2333.944080] 4033c000.ethernet eth0: Adding VLAN in promisc mode not supported [ 2333.976509] 4033c000.ethernet eth0: failed to initialize vlan filtering on this port RTNETLINK answers: Operation not permitted Secondly, take the case of stmmac as DSA master. Some switch tagging protocols are based on 802.1Q VLANs (tag_sja1105.c), and as such, tag_8021q.c uses vlan_vid_add() to work with VLAN-filtering DSA masters. But also, when a DSA port becomes promiscuous (for example when it joins a bridge), the DSA framework also makes the DSA master promiscuous. Moreover, for every VLAN that a DSA switch sends to the CPU, DSA also programs a VLAN filter on the DSA master, because if the the DSA switch uses a tail tag, then the hardware frame parser of the DSA master will see VLAN as VLAN, and might filter them out, for being unknown. Due to the above 2 reasons, my belief is that the stmmac driver does not get to choose to not accept vlan_vid_add() calls while IFF_PROMISC is enabled, because the 2 are completely independent and there are code paths in the network stack which directly lead to this situation occurring, without the user's direct input. In fact, my belief is that "VLAN promiscuous" mode should have never been keyed on IFF_PROMISC in the first place, but rather, on the NETIF_F_HW_VLAN_CTAG_FILTER feature flag which can be toggled by the user through ethtool -k, when present in netdev->hw_features. In the stmmac driver, NETIF_F_HW_VLAN_CTAG_FILTER is only present in "features", making this feature "on [fixed]". I have this belief because I am unaware of any definition of promiscuity which implies having an effect on anything other than MAC DA (therefore not VLAN). However, I seem to be rather alone in having this opinion, looking back at the disagreements from this discussion: https://lore.kernel.org/netdev/20201110153958.ci5ekor3o2ekg3ky@ipetronik.com/ In any case, to remove the vlan_vid_add() dependency on !IFF_PROMISC, one would need to remove the check and see what fails. I guess the test was there because of the way in which dwmac4_vlan_promisc_enable() is implemented. For context, the dwmac4 supports Perfect Filtering for a limited number of VLANs - dwmac4_get_num_vlan(), priv->hw->num_vlan, with a fallback on Hash Filtering - priv->dma_cap.vlhash - see stmmac_vlan_update(), also visible in cat /sys/kernel/debug/stmmaceth/eth0/dma_cap | grep 'VLAN Hash Filtering'. The perfect filtering is based on MAC_VLAN_Tag_Filter/MAC_VLAN_Tag_Data registers, accessed in the driver through dwmac4_write_vlan_filter(). The hash filtering is based on the MAC_VLAN_Hash_Table register, named GMAC_VLAN_HASH_TABLE in the driver and accessed by dwmac4_update_vlan_hash(). The control bit for enabling hash filtering is GMAC_VLAN_VTHM (MAC_VLAN_Tag_Ctrl bit VTHM: VLAN Tag Hash Table Match Enable). Now, the description of dwmac4_vlan_promisc_enable() is that it iterates through the driver's cache of perfect filter entries (hw->vlan_filter[i], added by dwmac4_add_hw_vlan_rx_fltr()), and evicts them from hardware by unsetting their GMAC_VLAN_TAG_DATA_VEN (MAC_VLAN_Tag_Data bit VEN - VLAN Tag Enable) bit. Then it unsets the GMAC_VLAN_VTHM bit, which disables hash matching. This leaves the MAC, according to table "VLAN Match Status" from the documentation, to always enter these data paths: VID |VLAN Perfect Filter |VTHM Bit |VLAN Hash Filter |Final VLAN Match |Match Result | |Match Result |Status -------|--------------------|---------|-----------------|---------------- VID!=0 |Fail |0 |don't care |Pass So, dwmac4_vlan_promisc_enable() does its job, but by unsetting GMAC_VLAN_VTHM, it conflicts with the other code path which controls this bit: dwmac4_update_vlan_hash(), called through stmmac_update_vlan_hash() from stmmac_vlan_rx_add_vid() and from stmmac_vlan_rx_kill_vid(). This is, I guess, why dwmac4_add_hw_vlan_rx_fltr() is not allowed to run after dwmac4_vlan_promisc_enable() has unset GMAC_VLAN_VTHM: because if it did, then dwmac4_update_vlan_hash() would set GMAC_VLAN_VTHM again, breaking the "VLAN promiscuity". It turns out that dwmac4_vlan_promisc_enable() is way too complicated for what needs to be done. The MAC_Packet_Filter register also has the VTFE bit (VLAN Tag Filter Enable), which simply controls whether VLAN tagged packets which don't match the filtering tables (either perfect or hash) are dropped or not. At the moment, this driver unconditionally sets GMAC_PACKET_FILTER_VTFE if NETIF_F_HW_VLAN_CTAG_FILTER was detected through the priv->dma_cap.vlhash capability bits of the device, in stmmac_dvr_probe(). I would suggest deleting the unnecessarily complex logic from dwmac4_vlan_promisc_enable(), and simply unsetting GMAC_PACKET_FILTER_VTFE when becoming IFF_PROMISC, which has the same effect of allowing packets with any VLAN tags, but has the additional benefit of being able to run concurrently with stmmac_vlan_rx_add_vid() and stmmac_vlan_rx_kill_vid(). As much as I believe that the VTFE bit should have been exclusively controlled by NETIF_F_HW_VLAN_CTAG_FILTER through ethtool, and not by IFF_PROMISC, changing that is not a punctual fix to the problem, and it would probably break the VFFQ feature added by the later commit e0f9956a3862 ("net: stmmac: Add option for VLAN filter fail queue enable"). From the commit description, VFFQ needs IFF_PROMISC=on and VTFE=off in order to work (and this change respects that). But if VTFE was changed to be controlled through ethtool -k, then a user-visible change would have been introduced in Intel's scripts (a need to run "ethtool -k eth0 rx-vlan-filter off" which did not exist before). The patch was tested with this set of commands: ip link set eth0 up ip link add link eth0 name eth0.100 type vlan id 100 ip addr add 192.168.100.2/24 dev eth0.100 && ip link set eth0.100 up ip link set eth0 promisc on ip link add link eth0 name eth0.101 type vlan id 101 ip addr add 192.168.101.2/24 dev eth0.101 && ip link set eth0.101 up ip link set eth0 promisc off ping -c 5 192.168.100.1 ping -c 5 192.168.101.1 ip link set eth0 promisc on ping -c 5 192.168.100.1 ping -c 5 192.168.101.1 ip link del eth0.100 ip link del eth0.101 # Wait for VLAN-tagged pings from the other end... # Check with "tcpdump -i eth0 -e -n -p" and we should see them ip link set eth0 promisc off # Wait for VLAN-tagged pings from the other end... # Check with "tcpdump -i eth0 -e -n -p" and we shouldn't see them # anymore, but remove the "-p" argument from tcpdump and they're there. Fixes: c89f44ff10fd ("net: stmmac: Add support for VLAN promiscuous mode") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-06r8169: fix RTL8168H and RTL8107E rx crc errorChunHao Lin1-0/+3
[ Upstream commit 33189f0a94b9639c058781fcf82e4ea3803b1682 ] When link speed is 10 Mbps and temperature is under -20°C, RTL8168H and RTL8107E may have rx crc error. Disable phy 10 Mbps pll off to fix this issue. Fixes: 6e1d0b898818 ("r8169:add support for RTL8168H and RTL8107E") Signed-off-by: ChunHao Lin <hau@realtek.com> Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-06sfc: ef10: don't overwrite offload features at NIC resetÍñigo Huguet2-22/+33
[ Upstream commit ca4a80e4bb7e87daf33b27d2ab9e4f5311018a89 ] At NIC reset, some offload features related to encapsulated traffic might have changed (this mainly happens if the firmware-variant is changed with the sfboot userspace tool). Because of this, features are checked and set again at reset time. However, this was not done right, and some features were improperly overwritten at NIC reset: - Tunneled IPv6 segmentation was always disabled - Features disabled with ethtool were reenabled - Features that becomes unsupported after the reset were not disabled Also, checking if the device supports IPV6_CSUM to enable TSO6 is no longer necessary because all currently supported devices support it. Additionally, move the assignment of some other features to the EF10_OFFLOAD_FEATURES macro, like it is done in ef100, leaving the selection of features in efx_pci_probe_post_io a bit cleaner. Fixes: ffffd2454a7a ("sfc: correctly advertise tunneled IPv6 segmentation") Fixes: 24b2c3751aa3 ("sfc: advertise encapsulated offloads on EF10") Reported-by: Tianhao Zhao <tizhao@redhat.com> Suggested-by: Jonathan Cooper <jonathan.s.cooper@amd.com> Tested-by: Jonathan Cooper <jonathan.s.cooper@amd.com> Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Acked-by: Edward Cree <ecree.xilinx@gmail.com> Link: https://lore.kernel.org/r/20230323083417.7345-1-ihuguet@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-06net/mlx5e: Lower maximum allowed MTU in XSK to match XDP prerequisitesAdham Faris1-3/+7
[ Upstream commit 78dee7befd56987283c13877b834c0aa97ad51b9 ] XSK redirecting XDP programs require linearity, hence applies restrictions on the MTU. For PAGE_SIZE=4K, MTU shouldn't exceed 3498. Features that contradict with XDP such HW-LRO and HW-GRO are enforced by the driver in advance, during XSK params validation, except for MTU, which was not enforced before this patch. This has been spotted during test scenario described below: Attaching xdpsock program (PAGE_SIZE=4K), with MTU < 3498, detaching XDP program, changing the MTU to arbitrary value in the range [3499, 3754], attaching XDP program again, which ended up with failure since MTU is > 3498. This commit lowers the XSK MTU limitation to be aligned with XDP MTU limitation, since XSK socket is meaningless without XDP program. Signed-off-by: Adham Faris <afaris@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-30igb: revert rtnl_lock() that causes deadlockLin Ma1-2/+0
commit 65f69851e44d71248b952a687e44759a7abb5016 upstream. The commit 6faee3d4ee8b ("igb: Add lock to avoid data race") adds rtnl_lock to eliminate a false data race shown below (FREE from device detaching) | (USE from netdev core) igb_remove | igb_ndo_get_vf_config igb_disable_sriov | vf >= adapter->vfs_allocated_count? kfree(adapter->vf_data) | adapter->vfs_allocated_count = 0 | | memcpy(... adapter->vf_data[vf] The above race will never happen and the extra rtnl_lock causes deadlock below [ 141.420169] <TASK> [ 141.420672] __schedule+0x2dd/0x840 [ 141.421427] schedule+0x50/0xc0 [ 141.422041] schedule_preempt_disabled+0x11/0x20 [ 141.422678] __mutex_lock.isra.13+0x431/0x6b0 [ 141.423324] unregister_netdev+0xe/0x20 [ 141.423578] igbvf_remove+0x45/0xe0 [igbvf] [ 141.423791] pci_device_remove+0x36/0xb0 [ 141.423990] device_release_driver_internal+0xc1/0x160 [ 141.424270] pci_stop_bus_device+0x6d/0x90 [ 141.424507] pci_stop_and_remove_bus_device+0xe/0x20 [ 141.424789] pci_iov_remove_virtfn+0xba/0x120 [ 141.425452] sriov_disable+0x2f/0xf0 [ 141.425679] igb_disable_sriov+0x4e/0x100 [igb] [ 141.426353] igb_remove+0xa0/0x130 [igb] [ 141.426599] pci_device_remove+0x36/0xb0 [ 141.426796] device_release_driver_internal+0xc1/0x160 [ 141.427060] driver_detach+0x44/0x90 [ 141.427253] bus_remove_driver+0x55/0xe0 [ 141.427477] pci_unregister_driver+0x2a/0xa0 [ 141.428296] __x64_sys_delete_module+0x141/0x2b0 [ 141.429126] ? mntput_no_expire+0x4a/0x240 [ 141.429363] ? syscall_trace_enter.isra.19+0x126/0x1a0 [ 141.429653] do_syscall_64+0x5b/0x80 [ 141.429847] ? exit_to_user_mode_prepare+0x14d/0x1c0 [ 141.430109] ? syscall_exit_to_user_mode+0x12/0x30 [ 141.430849] ? do_syscall_64+0x67/0x80 [ 141.431083] ? syscall_exit_to_user_mode_prepare+0x183/0x1b0 [ 141.431770] ? syscall_exit_to_user_mode+0x12/0x30 [ 141.432482] ? do_syscall_64+0x67/0x80 [ 141.432714] ? exc_page_fault+0x64/0x140 [ 141.432911] entry_SYSCALL_64_after_hwframe+0x72/0xdc Since the igb_disable_sriov() will call pci_disable_sriov() before releasing any resources, the netdev core will synchronize the cleanup to avoid any races. This patch removes the useless rtnl_(un)lock to guarantee correctness. CC: stable@vger.kernel.org Fixes: 6faee3d4ee8b ("igb: Add lock to avoid data race") Reported-by: Corinna Vinschen <vinschen@redhat.com> Link: https://lore.kernel.org/intel-wired-lan/ZAcJvkEPqWeJHO2r@calimero.vinschen.de/ Signed-off-by: Lin Ma <linma@zju.edu.cn> Tested-by: Corinna Vinschen <vinschen@redhat.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-30gve: Cache link_speed value from deviceJoshua Washington1-1/+4
[ Upstream commit 68c3e4fc8628b1487c965aabb29207249657eb5f ] The link speed is never changed for the uptime of a VM, and the current implementation sends an admin queue command for each call. Admin queue command invocations have nontrivial overhead (e.g., VM exits), which can be disruptive to users if triggered frequently. Our telemetry data shows that there are VMs that make frequent calls to this admin queue command. Caching the result of the original admin queue command would eliminate the need to send multiple admin queue commands on subsequent calls to retrieve link speed. Fixes: 7e074d5a76ca ("gve: Enable Link Speed Reporting in the driver.") Signed-off-by: Joshua Washington <joshwash@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230321172332.91678-1-joshwash@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-30mlxsw: spectrum_fid: Fix incorrect local port typeIdo Schimmel1-2/+2
[ Upstream commit bb765a743377d46d8da8e7f7e5128022504741b9 ] Local port is a 10-bit number, but it was mistakenly stored in a u8, resulting in firmware errors when using a netdev corresponding to a local port higher than 255. Fix by storing the local port in u16, as is done in the rest of the code. Fixes: bf73904f5fba ("mlxsw: Add support for 802.1Q FID family") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/eace1f9d96545ab8a2775db857cb7e291a9b166b.1679398549.git.petrm@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-30net/sonic: use dma_mapping_error() for error checkZhang Changzhong1-2/+2
[ Upstream commit 4107b8746d93ace135b8c4da4f19bbae81db785f ] The DMA address returned by dma_map_single() should be checked with dma_mapping_error(). Fix it accordingly. Fixes: efcce839360f ("[PATCH] macsonic/jazzsonic network drivers update") Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Finn Thain <fthain@linux-m68k.org> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/6645a4b5c1e364312103f48b7b36783b94e197a2.1679370343.git.fthain@linux-m68k.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-30net: mscc: ocelot: fix stats region batchingVladimir Oltean1-1/+2
[ Upstream commit 6acc72a43eac78a309160d0a7512bbc59bcdd757 ] The blamed commit changed struct ocelot_stat_layout :: "u32 offset" to "u32 reg". However, "u32 reg" is not quite a register address, but an enum ocelot_reg, which in itself encodes an enum ocelot_target target in the upper bits, and an index into the ocelot->map[target][] array in the lower bits. So, whereas the previous code comparison between stats_layout[i].offset and last + 1 was correct (because those "offsets" at the time were 32-bit relative addresses), the new code, comparing layout[i].reg to last + 4 is not correct, because the "reg" here is an enum/index, not an actual register address. What we want to compare are indeed register addresses, but to do that, we need to actually go through the same motions as __ocelot_bulk_read_ix() itself. With this bug, all statistics counters are deemed by ocelot_prepare_stats_regions() as constituting their own region. (Truncated) log on VSC9959 (Felix) below (prints added by me): Before: region of 1 contiguous counters starting with SYS:STAT:CNT[0x000] region of 1 contiguous counters starting with SYS:STAT:CNT[0x001] region of 1 contiguous counters starting with SYS:STAT:CNT[0x002] ... region of 1 contiguous counters starting with SYS:STAT:CNT[0x041] region of 1 contiguous counters starting with SYS:STAT:CNT[0x042] region of 1 contiguous counters starting with SYS:STAT:CNT[0x080] region of 1 contiguous counters starting with SYS:STAT:CNT[0x081] ... region of 1 contiguous counters starting with SYS:STAT:CNT[0x0ac] region of 1 contiguous counters starting with SYS:STAT:CNT[0x100] region of 1 contiguous counters starting with SYS:STAT:CNT[0x101] ... region of 1 contiguous counters starting with SYS:STAT:CNT[0x111] After: region of 67 contiguous counters starting with SYS:STAT:CNT[0x000] region of 45 contiguous counters starting with SYS:STAT:CNT[0x080] region of 18 contiguous counters starting with SYS:STAT:CNT[0x100] Since commit d87b1c08f38a ("net: mscc: ocelot: use bulk reads for stats") intended bulking as a performance improvement, and since now, with trivial-sized regions, performance is even worse than without bulking at all, this could easily qualify as a performance regression. Fixes: d4c367650704 ("net: mscc: ocelot: keep ocelot_stat_layout by reg address, not offset") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Colin Foster <colin.foster@in-advantage.com> Tested-by: Colin Foster <colin.foster@in-advantage.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-30net/mlx5: E-Switch, Fix an Oops in error handling codeDan Carpenter1-2/+1
[ Upstream commit 640fcdbcf27fc62de9223f958ceb4e897a00e791 ] The error handling dereferences "vport". There is nothing we can do if it is an error pointer except returning the error code. Fixes: 133dcfc577ea ("net/mlx5: E-Switch, Alloc and free unique metadata for match") Signed-off-by: Dan Carpenter <error27@gmail.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>