summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/intel
AgeCommit message (Collapse)AuthorFilesLines
2020-09-23e1000e: Add support for Comet LakeSasha Neftin2-0/+12
commit 914ee9c436cbe90c8ca8a46ec8433cb614a2ada5 upstream. Add devices ID's for the next LOM generations that will be available on the next Intel Client platform (Comet Lake) This patch provides the initial support for these devices Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: Anthony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-09-03scsi: fcoe: Fix I/O path allocationMike Christie1-1/+1
[ Upstream commit fa39ab5184d64563cd36f2fb5f0d3fbad83a432c ] ixgbe_fcoe_ddp_setup() can be called from the main I/O path and is called with a spin_lock held, so we have to use GFP_ATOMIC allocation instead of GFP_KERNEL. Link: https://lore.kernel.org/r/1596831813-9839-1-git-send-email-michael.christie@oracle.com cc: Hannes Reinecke <hare@suse.de> Reviewed-by: Lee Duncan <lduncan@suse.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-26i40e: Fix crash during removing i40e driverGrzegorz Szczurek1-0/+3
[ Upstream commit 5b6d4a7f20b09c47ca598760f6dafd554af8b6d5 ] Fix the reason of crashing system by add waiting time to finish reset recovery process before starting remove driver procedure. Now VSI is releasing if VSI is not in reset recovery mode. Without this fix it was possible to start remove driver if other processing command need reset recovery procedure which resulted in null pointer dereference. VSI used by the ethtool process has been cleared by remove driver process. [ 6731.508665] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 6731.508668] #PF: supervisor read access in kernel mode [ 6731.508670] #PF: error_code(0x0000) - not-present page [ 6731.508671] PGD 0 P4D 0 [ 6731.508674] Oops: 0000 [#1] SMP PTI [ 6731.508679] Hardware name: Intel Corporation S2600WT2R/S2600WT2R, BIOS SE5C610.86B.01.01.0021.032120170601 03/21/2017 [ 6731.508694] RIP: 0010:i40e_down+0x252/0x310 [i40e] [ 6731.508696] Code: c7 78 de fa c0 e8 61 02 3a c1 66 83 bb f6 0c 00 00 00 0f 84 bf 00 00 00 45 31 e4 45 31 ff eb 03 41 89 c7 48 8b 83 98 0c 00 00 <4a> 8b 3c 20 e8 a5 79 02 00 48 83 bb d0 0c 00 00 00 74 10 48 8b 83 [ 6731.508698] RSP: 0018:ffffb75ac7b3faf0 EFLAGS: 00010246 [ 6731.508700] RAX: 0000000000000000 RBX: ffff9c9874bd5000 RCX: 0000000000000007 [ 6731.508701] RDX: 0000000000000000 RSI: 0000000000000096 RDI: ffff9c987f4d9780 [ 6731.508703] RBP: ffffb75ac7b3fb30 R08: 0000000000005b60 R09: 0000000000000004 [ 6731.508704] R10: ffffb75ac64fbd90 R11: 0000000000000001 R12: 0000000000000000 [ 6731.508706] R13: ffff9c97a08e0000 R14: ffff9c97a08e0a68 R15: 0000000000000000 [ 6731.508708] FS: 00007f2617cd2740(0000) GS:ffff9c987f4c0000(0000) knlGS:0000000000000000 [ 6731.508710] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 6731.508711] CR2: 0000000000000000 CR3: 0000001e765c4006 CR4: 00000000003606e0 [ 6731.508713] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 6731.508714] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 6731.508715] Call Trace: [ 6731.508734] i40e_vsi_close+0x84/0x90 [i40e] [ 6731.508742] i40e_quiesce_vsi.part.98+0x3c/0x40 [i40e] [ 6731.508749] i40e_pf_quiesce_all_vsi+0x55/0x60 [i40e] [ 6731.508757] i40e_prep_for_reset+0x59/0x130 [i40e] [ 6731.508765] i40e_reconfig_rss_queues+0x5a/0x120 [i40e] [ 6731.508774] i40e_set_channels+0xda/0x170 [i40e] [ 6731.508778] ethtool_set_channels+0xe9/0x150 [ 6731.508781] dev_ethtool+0x1b94/0x2920 [ 6731.508805] dev_ioctl+0xc2/0x590 [ 6731.508811] sock_do_ioctl+0xae/0x150 [ 6731.508813] sock_ioctl+0x34f/0x3c0 [ 6731.508821] ksys_ioctl+0x98/0xb0 [ 6731.508828] __x64_sys_ioctl+0x1a/0x20 [ 6731.508831] do_syscall_64+0x57/0x1c0 [ 6731.508835] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 4b8164467b85 ("i40e: Add common function for finding VSI by type") Signed-off-by: Grzegorz Szczurek <grzegorzx.szczurek@intel.com> Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-26i40e: Set RX_ONLY mode for unicast promiscuous on VLANPrzemyslaw Patynowski2-9/+28
[ Upstream commit 4bd5e02a2ed1575c2f65bd3c557a077dd399f0e8 ] Trusted VF with unicast promiscuous mode set, could listen to TX traffic of other VFs. Set unicast promiscuous mode to RX traffic, if VSI has port VLAN configured. Rename misleading I40E_AQC_SET_VSI_PROMISC_TX bit to I40E_AQC_SET_VSI_PROMISC_RX_ONLY. Aligned unicast promiscuous with VLAN to the one without VLAN. Fixes: 6c41a7606967 ("i40e: Add promiscuous on VLAN support") Fixes: 3b1200891b7f ("i40e: When in promisc mode apply promisc mode to Tx Traffic as well") Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-19ice: Graceful error handling in HW table calloc failureSurabhi Boob1-1/+3
[ Upstream commit bcc46cb8a077c6189b44f1555b8659837f748eb2 ] In the ice_init_hw_tbls, if the devm_kcalloc for es->written fails, catch that error and bail out gracefully, instead of continuing with a NULL pointer. Fixes: 32d63fa1e9f3 ("ice: Initialize DDP package structures") Signed-off-by: Surabhi Boob <surabhi.boob@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-19iavf: Fix updating statisticsTony Nguyen1-1/+4
[ Upstream commit 9358076642f14cec8c414850d5a909cafca3a9d6 ] Commit bac8486116b0 ("iavf: Refactor the watchdog state machine") inverted the logic for when to update statistics. Statistics should be updated when no other commands are pending, instead they were only requested when a command was processed. iavf_request_stats() would see a pending request and not request statistics to be updated. This caused statistics to never be updated; fix the logic. Fixes: bac8486116b0 ("iavf: Refactor the watchdog state machine") Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-19iavf: fix error return code in iavf_init_get_resources()Wei Yongjun1-1/+3
[ Upstream commit 753f3884f253de6b6d3a516e6651bda0baf4aede ] Fix to return negative error code -ENOMEM from the error handling case instead of 0, as done elsewhere in this function. Fixes: b66c7bc1cd4d ("iavf: Refactor init state machine") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-11igb: reinit_locked() should be called with rtnl_lockFrancesco Ruggeri1-0/+9
[ Upstream commit 024a8168b749db7a4aa40a5fbdfa04bf7e77c1c0 ] We observed two panics involving races with igb_reset_task. The first panic is caused by this race condition: kworker reboot -f igb_reset_task igb_reinit_locked igb_down napi_synchronize __igb_shutdown igb_clear_interrupt_scheme igb_free_q_vectors igb_free_q_vector adapter->q_vector[v_idx] = NULL; napi_disable Panics trying to access adapter->q_vector[v_idx].napi_state The second panic (a divide error) is caused by this race: kworker reboot -f tx packet igb_reset_task __igb_shutdown rtnl_lock() ... igb_clear_interrupt_scheme igb_free_q_vectors adapter->num_tx_queues = 0 ... rtnl_unlock() rtnl_lock() igb_reinit_locked igb_down igb_up netif_tx_start_all_queues dev_hard_start_xmit igb_xmit_frame igb_tx_queue_mapping Panics on r_idx % adapter->num_tx_queues This commit applies to igb_reset_task the same changes that were applied to ixgbe in commit 2f90b8657ec9 ("ixgbe: this patch adds support for DCB to the kernel and ixgbe driver"), commit 8f4c5c9fb87a ("ixgbe: reinit_locked() should be called with rtnl_lock") and commit 88adce4ea8f9 ("ixgbe: fix possible race in reset subtask"). Signed-off-by: Francesco Ruggeri <fruggeri@arista.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-07-16i40e: protect ring accesses with READ- and WRITE_ONCECiara Loftus1-10/+19
[ Upstream commit d59e267912cd90b0adf33b4659050d831e746317 ] READ_ONCE should be used when reading rings prior to accessing the statistics pointer. Introduce this as well as the corresponding WRITE_ONCE usage when allocating and freeing the rings, to ensure protected access. Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-07-16ixgbe: protect ring accesses with READ- and WRITE_ONCECiara Loftus2-9/+17
[ Upstream commit f140ad9fe2ae16f385f8fe4dc9cf67bb4c51d794 ] READ_ONCE should be used when reading rings prior to accessing the statistics pointer. Introduce this as well as the corresponding WRITE_ONCE usage when allocating and freeing the rings, to ensure protected access. Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-24e1000e: Do not wake up the system via WOL if device wakeup is disabledChen Yu1-4/+10
commit 6bf6be1127f7e6d4bf39f84d56854e944d045d74 upstream. Currently the system will be woken up via WOL(Wake On LAN) even if the device wakeup ability has been disabled via sysfs: cat /sys/devices/pci0000:00/0000:00:1f.6/power/wakeup disabled The system should not be woken up if the user has explicitly disabled the wake up ability for this device. This patch clears the WOL ability of this network device if the user has disabled the wake up ability in sysfs. Fixes: bc7f75fa9788 ("[E1000E]: New pci-express e1000 driver") Reported-by: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: <Stable@vger.kernel.org> Signed-off-by: Chen Yu <yu.c.chen@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-24iavf: fix speed reporting over virtchnlBrett Creeley4-21/+120
[ Upstream commit e0ef26fbe2b0c62f42ba7667076dc38b693b6fb8 ] Link speeds are communicated over virtchnl using an enum virtchnl_link_speed. Currently, the highest link speed is 40Gbps which leaves us unable to reflect some speeds that an ice VF is capable of. This causes link speed to be misreported on the iavf driver. Allow for communicating link speeds using Mbps so that the proper speed can be reported for an ice VF. Moving away from the enum allows us to communicate future speed changes without requiring a new enum to be added. In order to support communicating link speeds over virtchnl in Mbps the following functionality was added: - Added u32 link_speed_mbps in the iavf_adapter structure. - Added the macro ADV_LINK_SUPPORT(_a) to determine if the VF driver supports communicating link speeds in Mbps. - Added the function iavf_get_vpe_link_status() to fill the correct link_status in the event_data union based on the ADV_LINK_SUPPORT(_a) macro. - Added the function iavf_set_adapter_link_speed_from_vpe() to determine whether or not to fill the u32 link_speed_mbps or enum virtchnl_link_speed link_speed field in the iavf_adapter structure based on the ADV_LINK_SUPPORT(_a) macro. - Do not free vf_res in iavf_init_get_resources() as vf_res will be accessed in iavf_get_link_ksettings(); memset to 0 instead. This memory is subsequently freed in iavf_remove(). Fixes: 7c710869d64e ("ice: Add handlers for VF netdevice operations") Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Sergey Nemov <sergey.nemov@intel.com> Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-22igb: Report speed and duplex as unknown when device is runtime suspendedKai-Heng Feng1-1/+2
commit 165ae7a8feb53dc47fb041357e4b253bfc927cf9 upstream. igb device gets runtime suspended when there's no link partner. We can't get correct speed under that state: $ cat /sys/class/net/enp3s0/speed 1000 In addition to that, an error can also be spotted in dmesg: [ 385.991957] igb 0000:03:00.0 enp3s0: PCIe link lost Since device can only be runtime suspended when there's no link partner, we can skip reading register and let the following logic set speed and duplex with correct status. The more generic approach will be wrap get_link_ksettings() with begin() and complete() callbacks. However, for this particular issue, begin() calls igb_runtime_resume() , which tries to rtnl_lock() while the lock is already hold by upper ethtool layer. So let's take this approach until the igb_runtime_resume() no longer needs to hold rtnl_lock. CC: stable <stable@vger.kernel.org> Suggested-by: Alexander Duyck <alexander.duyck@gmail.com> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-22e1000e: Relax condition to trigger reset for ME workaroundPunit Agrawal2-8/+5
commit d601afcae2febc49665008e9a79e701248d56c50 upstream. It's an error if the value of the RX/TX tail descriptor does not match what was written. The error condition is true regardless the duration of the interference from ME. But the driver only performs the reset if E1000_ICH_FWSM_PCIM2PCI_COUNT (2000) iterations of 50us delay have transpired. The extra condition can lead to inconsistency between the state of hardware as expected by the driver. Fix this by dropping the check for number of delay iterations. While at it, also make __ew32_prepare() static as it's not used anywhere else. CC: stable <stable@vger.kernel.org> Signed-off-by: Punit Agrawal <punit1.agrawal@toshiba.co.jp> Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-22e1000e: Disable TSO for buffer overrun workaroundKai-Heng Feng1-0/+4
commit f29801030ac67bf98b7a65d3aea67b30769d4f7c upstream. Commit b10effb92e27 ("e1000e: fix buffer overrun while the I219 is processing DMA transactions") imposes roughly 30% performance penalty. The commit log states that "Disabling TSO eliminates performance loss for TCP traffic without a noticeable impact on CPU performance", so let's disable TSO by default to regain the loss. CC: stable <stable@vger.kernel.org> Fixes: b10effb92e27 ("e1000e: fix buffer overrun while the I219 is processing DMA transactions") BugLink: https://bugs.launchpad.net/bugs/1802691 Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-22ixgbe: fix signed-integer-overflow warningXie XiuQi1-1/+1
[ Upstream commit 3b70683fc4d68f5d915d9dc7e5ba72c732c7315c ] ubsan report this warning, fix it by adding a unsigned suffix. UBSAN: signed-integer-overflow in drivers/net/ethernet/intel/ixgbe/ixgbe_common.c:2246:26 65535 * 65537 cannot be represented in type 'int' CPU: 21 PID: 7 Comm: kworker/u256:0 Not tainted 5.7.0-rc3-debug+ #39 Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 2280-V2 03/27/2020 Workqueue: ixgbe ixgbe_service_task [ixgbe] Call trace: dump_backtrace+0x0/0x3f0 show_stack+0x28/0x38 dump_stack+0x154/0x1e4 ubsan_epilogue+0x18/0x60 handle_overflow+0xf8/0x148 __ubsan_handle_mul_overflow+0x34/0x48 ixgbe_fc_enable_generic+0x4d0/0x590 [ixgbe] ixgbe_service_task+0xc20/0x1f78 [ixgbe] process_one_work+0x8f0/0xf18 worker_thread+0x430/0x6d0 kthread+0x218/0x238 ret_from_fork+0x10/0x18 Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-22ice: fix potential double free in probe unrollingJacob Keller1-1/+2
[ Upstream commit bc3a024101ca497bea4c69be4054c32a5c349f1d ] If ice_init_interrupt_scheme fails, ice_probe will jump to clearing up the interrupts. This can lead to some static analysis tools such as the compiler sanitizers complaining about double free problems. Since ice_init_interrupt_scheme already unrolls internally on failure, there is no need to call ice_clear_interrupt_scheme when it fails. Add a new unroll label and use that instead. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-22e1000: Distribute switch variables for initializationKees Cook1-1/+3
[ Upstream commit a34c7f5156654ebaf7eaace102938be7ff7036cb ] Variables declared in a switch statement before any case statements cannot be automatically initialized with compiler instrumentation (as they are not part of any execution flow). With GCC's proposed automatic stack variable initialization feature, this triggers a warning (and they don't get initialized). Clang's automatic stack variable initialization (via CONFIG_INIT_STACK_ALL=y) doesn't throw a warning, but it also doesn't initialize such variables[1]. Note that these warnings (or silent skipping) happen before the dead-store elimination optimization phase, so even when the automatic initializations are later elided in favor of direct initializations, the warnings remain. To avoid these problems, move such variables into the "case" where they're used or lift them up into the main function body. drivers/net/ethernet/intel/e1000/e1000_main.c: In function ‘e1000_xmit_frame’: drivers/net/ethernet/intel/e1000/e1000_main.c:3143:18: warning: statement will never be executed [-Wswitch-unreachable] 3143 | unsigned int pull_size; | ^~~~~~~~~ [1] https://bugs.llvm.org/show_bug.cgi?id=44916 Signed-off-by: Kees Cook <keescook@chromium.org> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-22ice: Fix for memory leaks and modify ICE_FREE_CQ_BUFSSurabhi Boob1-21/+28
[ Upstream commit 68d270783742783f96e89ef92ac24ab3c7fb1d31 ] Handle memory leaks during control queue initialization and buffer allocation failures. The macro ICE_FREE_CQ_BUFS is modified to re-use for this fix. Signed-off-by: Surabhi Boob <surabhi.boob@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-22ice: Fix memory leakSurabhi Boob1-1/+7
[ Upstream commit 1aaef2bc4e0a5ce9e4dd86359e6a0bf52c6aa64f ] Handle memory leak on filter management initialization failure. Signed-off-by: Surabhi Boob <surabhi.boob@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-22ixgbe: Fix XDP redirect on archs with PAGE_SIZE above 4KJesper Dangaard Brouer1-1/+2
[ Upstream commit 88eb0ee17b2ece64fcf6689a4557a5c2e7a89c4b ] The ixgbe driver have another memory model when compiled on archs with PAGE_SIZE above 4096 bytes. In this mode it doesn't split the page in two halves, but instead increment rx_buffer->page_offset by truesize of packet (which include headroom and tailroom for skb_shared_info). This is done correctly in ixgbe_build_skb(), but in ixgbe_rx_buffer_flip which is currently only called on XDP_TX and XDP_REDIRECT, it forgets to add the tailroom for skb_shared_info. This breaks XDP_REDIRECT, for veth and cpumap. Fix by adding size of skb_shared_info tailroom. Maintainers notice: This fix have been queued to Jeff. Fixes: 6453073987ba ("ixgbe: add initial support for xdp redirect") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Link: https://lore.kernel.org/bpf/158945344946.97035.17031588499266605743.stgit@firesoul Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-03-12ice: Don't tell the OS that link is going downMichal Swiatkowski1-7/+0
[ Upstream commit 8a55c08d3bbc9ffc9639f69f742e59ebd99f913b ] Remove code that tell the OS that link is going down when user change flow control via ethtool. When link is up it isn't certain that link goes down after 0x0605 aq command. If link doesn't go down, OS thinks that link is down, but physical link is up. To reset this state user have to take interface down and up. If link goes down after 0x0605 command, FW send information about that and after that driver tells the OS that the link goes down. So this code in ethtool is unnecessary. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-03-05ice: update Unit Load Status bitmask to check after resetBruce Allan2-5/+18
[ Upstream commit cf8fc2a0863f9ff27ebd2efcdb1f7d378b9fb8a6 ] After a reset the Unit Load Status bits in the GLNVM_ULD register to check for completion should be 0x7FF before continuing. Update the mask to check (minus the three reserved bits that are always set). Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-03-05i40e: Fix the conditional for i40e_vc_validate_vqs_bitmapsBrett Creeley1-2/+2
[ Upstream commit f27f37a04a69890ac85d9155f03ee2d23b678d8f ] Commit d9d6a9aed3f6 ("i40e: Fix virtchnl_queue_select bitmap validation") introduced a necessary change for verifying how queue bitmaps from the iavf driver get validated. Unfortunately, the conditional was reversed. Fix this. Fixes: d9d6a9aed3f6 ("i40e: Fix virtchnl_queue_select bitmap validation") Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-28e1000e: Use rtnl_lock to prevent race conditions between net and pci/pmAlexander Duyck1-33/+35
commit a7023819404ac9bd2bb311a4fafd38515cfa71ec upstream. This patch is meant to address possible race conditions that can exist between network configuration and power management. A similar issue was fixed for igb in commit 9474933caf21 ("igb: close/suspend race in netif_device_detach"). In addition it consolidates the code so that the PCI error handling code will essentially perform the power management freeze on the device prior to attempting a reset, and will thaw the device afterwards if that is what it is planning to do. Otherwise when we call close on the interface it should see it is detached and not attempt to call the logic to down the interface and free the IRQs again. From what I can tell the check that was adding the check for __E1000_DOWN in e1000e_close was added when runtime power management was added. However it should not be relevant for us as we perform a call to pm_runtime_get_sync before we call e1000_down/free_irq so it should always be back up before we call into this anyway. Reported-by: Morumuri Srivalli <smorumu1@in.ibm.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Tested-by: David Dai <zdai@linux.vnet.ibm.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Cc: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-24i40e: Relax i40e_xsk_wakeup's return value when PF is busyMaciej Fijalkowski1-1/+1
[ Upstream commit c77e9f09143822623dd71a0fdc84331129e97c3a ] Return -EAGAIN instead of -ENETDOWN to provide a slightly milder information to user space so that an application will know to retry the syscall when __I40E_CONFIG_BUSY bit is set on pf->state. Fixes: b3873a5be757 ("net/i40e: Fix concurrency issues between config flow and XSK") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/20200205045834.56795-2-maciej.fijalkowski@intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-06iavf: remove current MAC address filter on VF resetStefan Assmann3-4/+18
[ Upstream commit 9e05229190380f6b8f702da39aaeb97a0fc80dc3 ] Currently MAC filters are not altered during a VF reset event. This may lead to a stale filter when an administratively set MAC is forced by the PF. For an administratively set MAC the PF driver deletes the VFs filters, overwrites the VFs MAC address and triggers a VF reset. However the VF driver itself is not aware of the filter removal, which is what the VF reset is for. The VF reset queues all filters present in the VF driver to be re-added to the PF filter list (including the filter for the now stale VF MAC address) and triggers a VIRTCHNL_OP_GET_VF_RESOURCES event, which provides the new MAC address to the VF. When this happens i40e will complain and reject the stale MAC filter, at least in the untrusted VF case. i40e 0000:08:00.0: Setting MAC 3c:fa:fa:fa:fa:01 on VF 0 iavf 0000:08:02.0: Reset warning received from the PF iavf 0000:08:02.0: Scheduling reset task i40e 0000:08:00.0: Bring down and up the VF interface to make this change effective. i40e 0000:08:00.0: VF attempting to override administratively set MAC address, bring down and up the VF interface to resume normal operation i40e 0000:08:00.0: VF 0 failed opcode 10, retval: -1 iavf 0000:08:02.0: Failed to add MAC filter, error IAVF_ERR_NVM To avoid re-adding the stale MAC filter it needs to be removed from the VF driver's filter list before queuing the existing filters. Then during the VIRTCHNL_OP_GET_VF_RESOURCES event the correct filter needs to be added again, at which point the MAC address has been updated. As a bonus this change makes bringing the VF down and up again superfluous for the administratively set MAC case. Signed-off-by: Stefan Assmann <sassmann@kpanic.de> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-06igb: Fix SGMII SFP module discovery for 100FX/LX.Manfred Rudigier2-7/+3
[ Upstream commit 5365ec1aeff5b9f2962a9c9b31d63f9dad7e0e2d ] Changing the link mode should also be done for 100BaseFX SGMII modules, otherwise they just don't work when the default link mode in CTRL_EXT coming from the EEPROM is SERDES. Additionally 100Base-LX SGMII SFP modules are also supported now, which was not the case before. Tested with an i210 using Flexoptix S.1303.2M.G 100FX and S.1303.10.G 100LX SGMII SFP modules. Signed-off-by: Manfred Rudigier <manfred.rudigier@omicronenergy.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-06ixgbe: Fix calculation of queue with VFs and flow director on interface flapCambda Zhu1-10/+27
[ Upstream commit 4fad78ad6422d9bca62135bbed8b6abc4cbb85b8 ] This patch fixes the calculation of queue when we restore flow director filters after resetting adapter. In ixgbe_fdir_filter_restore(), filter's vf may be zero which makes the queue outside of the rx_ring array. The calculation is changed to the same as ixgbe_add_ethtool_fdir_entry(). Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-06ixgbevf: Remove limit of 10 entries for unicast filter listRadoslaw Tyl1-5/+0
[ Upstream commit aa604651d523b1493988d0bf6710339f3ee60272 ] Currently, though the FDB entry is added to VF, it does not appear in RAR filters. VF driver only allows to add 10 entries. Attempting to add another causes an error. This patch removes limitation and allows use of all free RAR entries for the FDB if needed. Fixes: 46ec20ff7d ("ixgbevf: Add macvlan support in the set rx mode op") Signed-off-by: Radoslaw Tyl <radoslawx.tyl@intel.com> Acked-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-06i40e: Fix virtchnl_queue_select bitmap validationBrett Creeley1-4/+18
[ Upstream commit d9d6a9aed3f66f8ce5fa3ca6ca26007d75032296 ] Currently in i40e_vc_disable_queues_msg() we are incorrectly validating the virtchnl queue select bitmaps. The virtchnl_queue_select rx_queues and tx_queue bitmap is being compared against ICE_MAX_VF_QUEUES, but the problem is that these bitmaps can have a value greater than I40E_MAX_VF_QUEUES. Fix this by comparing the bitmaps against BIT(I40E_MAX_VF_QUEUES). Also, add the function i40e_vc_validate_vqs_bitmaps() that checks to see if both virtchnl_queue_select bitmaps are empty along with checking that the bitmaps only have valid bits set. This function can then be used in both the queue enable and disable flows. Suggested-by: Arkady Gilinksky <arkady.gilinsky@harmonicinc.com> Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-06e1000e: Revert "e1000e: Make watchdog use delayed work"Jeff Kirsher2-32/+27
[ Upstream commit d5ad7a6a7f3c87b278d7e4973b65682be4e588dd ] This reverts commit 59653e6497d16f7ac1d9db088f3959f57ee8c3db. This is due to this commit causing driver crashes and connections to reset unexpectedly. Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-06e1000e: Drop unnecessary __E1000_DOWN bit twiddlingAlexander Duyck1-6/+1
[ Upstream commit daee5598e491d8d3979bd4ad6c447d89ce57b446 ] Since we no longer check for __E1000_DOWN in e1000e_close we can drop the spot where we were restoring the bit. This saves us a bit of unnecessary complexity. Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-26ice: fix stack leakageJesse Brandeburg1-2/+1
commit 949375de945f7042df2b6488228a1a2b36e69f35 upstream. In the case of an invalid virtchannel request the driver would return uninitialized data to the VF from the PF stack which is a bug. Fix by initializing the stack variable earlier in the function before any return paths can be taken. Fixes: 1071a8358a28 ("ice: Implement virtchnl commands for AVF support") Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-23i40e: prevent memory leak in i40e_setup_macvlansNavid Emamdoost1-0/+1
[ Upstream commit 27d461333459d282ffa4a2bdb6b215a59d493a8f ] In i40e_setup_macvlans if i40e_setup_channel fails the allocated memory for ch should be released. Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-12net/ixgbe: Fix concurrency issues between config flow and XSKMaxim Mikityanskiy2-3/+12
[ Upstream commit c0fdccfd226a1424683d3000d9e08384391210a2 ] Use synchronize_rcu to wait until the XSK wakeup function finishes before destroying the resources it uses: 1. ixgbe_down already calls synchronize_rcu after setting __IXGBE_DOWN. 2. After switching the XDP program, call synchronize_rcu to let ixgbe_xsk_wakeup exit before the XDP program is freed. 3. Changing the number of channels brings the interface down. 4. Disabling UMEM sets __IXGBE_TX_DISABLED before closing hardware resources and resetting xsk_umem. Check that bit in ixgbe_xsk_wakeup to avoid using the XDP ring when it's already destroyed. synchronize_rcu is called from ixgbe_txrx_ring_disable. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191217162023.16011-5-maximmi@mellanox.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-12net/i40e: Fix concurrency issues between config flow and XSKMaxim Mikityanskiy3-4/+12
[ Upstream commit b3873a5be757b44d51af542a50a6f2a3b6f95284 ] Use synchronize_rcu to wait until the XSK wakeup function finishes before destroying the resources it uses: 1. i40e_down already calls synchronize_rcu. On i40e_down either __I40E_VSI_DOWN or __I40E_CONFIG_BUSY is set. Check the latter in i40e_xsk_wakeup (the former is already checked there). 2. After switching the XDP program, call synchronize_rcu to let i40e_xsk_wakeup exit before the XDP program is freed. 3. Changing the number of channels brings the interface down (see i40e_prep_for_reset and i40e_pf_quiesce_all_vsi). 4. Disabling UMEM sets __I40E_CONFIG_BUSY, too. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191217162023.16011-4-maximmi@mellanox.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-31ice: Fix setting coalesce to handle DCB configurationBrett Creeley1-3/+10
[ Upstream commit e25f9152bc07de534b2b590ce6c052ea25dd8900 ] Currently there can be a case where a DCB map is applied and there are more interrupt vectors (vsi->num_q_vectors) than Rx queues (vsi->num_rxq) and Tx queues (vsi->num_txq). If we try to set coalesce settings in this case it will report a false failure. Fix this by checking if vector index is valid with respect to the number of Tx and Rx queues configured. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-31ice: Only disable VF state when freeing each VF resourcesAkeem G Abodunrin1-4/+8
[ Upstream commit 1f9639d2fb9188a59acafae9dea626391c442a8d ] It is wrong to set PF disable state flag for all VFs when freeing VF resources - Instead, we should set VF disable state flag for each VF with its resources being returned to the device. Right now, all VF opcodes, mailbox communication to clear its resources as well fails - since we already indicate that PF is in disable state, with all VFs not active. In addition, we don't need to notify VF that PF is intending to reset it, if it is already in disabled state. Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-31ice: delay lessMitch Williams2-3/+4
[ Upstream commit 88bb432a55de8ae62106305083a8bfbb23b01ad2 ] Shorten the delay for SQ responses, but increase the number of loops. Max delay time is unchanged, but some operations complete much more quickly. In the process, add a new define to make the delay count and delay time more explicit. Add comments to make things more explicit. This fixes a problem with VF resets failing on with many VFs. Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-31ice: Check for null pointer dereference when setting ringsMichal Swiatkowski1-4/+14
[ Upstream commit eb0ee8abfeb9ff4b98e8e40217b8667bfb08587a ] Without this check rebuild vsi can lead to kernel panic. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-31ixgbe: protect TX timestamping from API misuseManjunath Patil1-1/+2
[ Upstream commit 07066d9dc3d2326fbad8f7b0cb0120cff7b7dedb ] HW timestamping can only be requested for a packet if the NIC is first setup via ioctl(SIOCSHWTSTAMP). If this step was skipped, then the ixgbe driver still allowed TX packets to request HW timestamping. In this situation, we see 'clearing Tx Timestamp hang' noise in the log. Fix this by checking that the NIC is configured for HW TX timestamping before accepting a HW TX timestamping request. Similar-to: commit 26bd4e2db06b ("igb: protect TX timestamping from API misuse") commit 0a6f2f05a2f5 ("igb: Fix a test with HWTSTAMP_TX_ON") Signed-off-by: Manjunath Patil <manjunath.b.patil@oracle.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-31i40e: Wrong 'Advertised FEC modes' after set FEC to AUTOJaroslaw Gawin2-19/+26
[ Upstream commit e42b7e9cefca9dd008cbafffca97285cf264f72d ] Fix display of parameters "Configured FEC encodings:" and "Advertised FEC modes:" in ethtool. Implemented by setting proper FEC bits in “advertising” bitmask of link_modes struct and “fec” bitmask in ethtool_fecparam struct. Without this patch wrong FEC settings can be shown. Signed-off-by: Jaroslaw Gawin <jaroslawx.gawin@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-31i40e: initialize ITRN registers with correct valuesNicholas Nunley1-5/+5
[ Upstream commit 998e5166e604fd37afe94352f7b8c2d816b11049 ] Since commit 92418fb14750 ("i40e/i40evf: Use usec value instead of reg value for ITR defines") the driver tracks the interrupt throttling intervals in single usec units, although the actual ITRN/ITR0 registers are programmed in 2 usec units. Most register programming flows in the driver correctly handle the conversion, although it is currently not applied when the registers are initialized to their default values. Most of the time this doesn't present a problem since the default values are usually immediately overwritten through the standard adaptive throttling mechanism, or updated manually by the user, but if adaptive throttling is disabled and the interval values are left alone then the incorrect value will persist. Since the intended default interval of 50 usecs (vs. 100 usecs as programmed) performs better for most traffic workloads, this can lead to performance regressions. This patch adds the correct conversion when writing the initial values to the ITRN registers. Signed-off-by: Nicholas Nunley <nicholas.d.nunley@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-15igb: Reject requests that fail to enable time stamping on both edges.Richard Cochran1-0/+6
This hardware always time stamps rising and falling edges, and so this patch validates that the request does contains both edges. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15ptp: Introduce strict checking of external time stamp options.Richard Cochran1-1/+2
User space may request time stamps on rising edges, falling edges, or both. However, the particular mode may or may not be supported in the hardware or in the driver. This patch adds a "strict" flag that tells drivers to ensure that the requested mode will be honored. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15igb: reject unsupported external timestamp flagsJacob Keller1-0/+6
Fix the igb PTP support to explicitly reject any future flags that get added to the external timestamp request ioctl. In order to maintain currently functioning code, this patch accepts all three current flags. This is because the PTP_RISING_EDGE and PTP_FALLING_EDGE flags have unclear semantics and each driver seems to have interpreted them slightly differently. This HW always time stamps both edges: flags Meaning ---------------------------------------------------- -------------------------- PTP_ENABLE_FEATURE Time stamp both edges PTP_ENABLE_FEATURE|PTP_RISING_EDGE Time stamp both edges PTP_ENABLE_FEATURE|PTP_FALLING_EDGE Time stamp both edges PTP_ENABLE_FEATURE|PTP_RISING_EDGE|PTP_FALLING_EDGE Time stamp both edges Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15net: reject PTP periodic output requests with unsupported flagsJacob Keller1-0/+4
Commit 823eb2a3c4c7 ("PTP: add support for one-shot output") introduced a new flag for the PTP periodic output request ioctl. This flag is not currently supported by any driver. Fix all drivers which implement the periodic output request ioctl to explicitly reject any request with flags they do not understand. This ensures that the driver does not accidentally misinterpret the PTP_PEROUT_ONE_SHOT flag, or any new flag introduced in the future. This is important for forward compatibility: if a new flag is introduced, the driver should reject requests to enable the flag until the driver has actually been modified to support the flag in question. Cc: Felipe Balbi <felipe.balbi@linux.intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: Christopher Hall <christopher.s.hall@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Richard Cochran <richardcochran@gmail.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-09ixgbe: need_wakeup flag might not be set for TxMagnus Karlsson1-8/+2
The need_wakeup flag for Tx might not be set for AF_XDP sockets that are only used to send packets. This happens if there is at least one outstanding packet that has not been completed by the hardware and we get that corresponding completion (which will not generate an interrupt since interrupts are disabled in the napi poll loop) between the time we stopped processing the Tx completions and interrupts are enabled again. In this case, the need_wakeup flag will have been cleared at the end of the Tx completion processing as we believe we will get an interrupt from the outstanding completion at a later point in time. But if this completion interrupt occurs before interrupts are enable, we lose it and should at that point really have set the need_wakeup flag since there are no more outstanding completions that can generate an interrupt to continue the processing. When this happens, user space will see a Tx queue need_wakeup of 0 and skip issuing a syscall, which means will never get into the Tx processing again and we have a deadlock. This patch introduces a quick fix for this issue by just setting the need_wakeup flag for Tx to 1 all the time. I am working on a proper fix for this that will toggle the flag appropriately, but it is more challenging than I anticipated and I am afraid that this patch will not be completed before the merge window closes, therefore this easier fix for now. This fix has a negative performance impact in the range of 0% to 4%. Towards the higher end of the scale if you have driver and application on the same core and issue a lot of packets, and towards no negative impact if you use two cores, lower transmission speeds and/or a workload that also receives packets. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-11-09i40e: need_wakeup flag might not be set for TxMagnus Karlsson1-8/+2
The need_wakeup flag for Tx might not be set for AF_XDP sockets that are only used to send packets. This happens if there is at least one outstanding packet that has not been completed by the hardware and we get that corresponding completion (which will not generate an interrupt since interrupts are disabled in the napi poll loop) between the time we stopped processing the Tx completions and interrupts are enabled again. In this case, the need_wakeup flag will have been cleared at the end of the Tx completion processing as we believe we will get an interrupt from the outstanding completion at a later point in time. But if this completion interrupt occurs before interrupts are enable, we lose it and should at that point really have set the need_wakeup flag since there are no more outstanding completions that can generate an interrupt to continue the processing. When this happens, user space will see a Tx queue need_wakeup of 0 and skip issuing a syscall, which means will never get into the Tx processing again and we have a deadlock. This patch introduces a quick fix for this issue by just setting the need_wakeup flag for Tx to 1 all the time. I am working on a proper fix for this that will toggle the flag appropriately, but it is more challenging than I anticipated and I am afraid that this patch will not be completed before the merge window closes, therefore this easier fix for now. This fix has a negative performance impact in the range of 0% to 4%. Towards the higher end of the scale if you have driver and application on the same core and issue a lot of packets, and towards no negative impact if you use two cores, lower transmission speeds and/or a workload that also receives packets. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>