summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/intel
AgeCommit message (Collapse)AuthorFilesLines
2021-11-25ice: Delete always true check of PF pointerLeon Romanovsky1-3/+0
commit 2ff04286a9569675948f39cec2c6ad47c3584633 upstream. PF pointer is always valid when PCI core calls its .shutdown() and .remove() callbacks. There is no need to check it again. Fixes: 837f08fdecbe ("ice: Add basic driver framework for Intel(R) E800 Series") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-25ice: Fix VF true promiscuous modeBrett Creeley2-43/+40
commit 1a8c7778bcde5981463a5b9f9b2caa44a327ff93 upstream. When a VF requests promiscuous mode and it's trusted and true promiscuous mode is enabled the PF driver attempts to enable unicast and/or multicast promiscuous mode filters based on the request. This is fine, but there are a couple issues with the current code. [1] The define to configure the unicast promiscuous mode mask also includes bits to configure the multicast promiscuous mode mask, which causes multicast to be set/cleared unintentionally. [2] All 4 cases for enable/disable unicast/multicast mode are not handled in the promiscuous mode message handler, which causes unexpected results regarding the current promiscuous mode settings. To fix [1] make sure any promiscuous mask defines include the correct bits for each of the promiscuous modes. To fix [2] make sure that all 4 cases are handled since there are 2 bits (FLAG_VF_UNICAST_PROMISC and FLAG_VF_MULTICAST_PROMISC) that can be either set or cleared. Also, since either unicast and/or multicast promiscuous configuration can fail, introduce two separate error values to handle each of these cases. Fixes: 01b5e89aab49 ("ice: Add VF promiscuous support") Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Tony Brelinski <tony.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-25e100: fix device suspend/resumeJesse Brandeburg1-5/+13
[ Upstream commit 5d2ca2e12dfb2aff3388ca57b06f570fa6206ced ] As reported in [1], e100 was no longer working for suspend/resume cycles. The previous commit mentioned in the fixes appears to have broken things and this attempts to practice best known methods for device power management and keep wake-up working while allowing suspend/resume to work. To do this, I reorder a little bit of code and fix the resume path to make sure the device is enabled. [1] https://bugzilla.kernel.org/show_bug.cgi?id=214933 Fixes: 69a74aef8a18 ("e100: use generic power management") Cc: Vaibhav Gupta <vaibhavgupta40@gmail.com> Reported-by: Alexey Kuznetsov <axet@me.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Alexey Kuznetsov <axet@me.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-25i40e: Fix display error code in dmesgGrzegorz Szczurek1-3/+2
[ Upstream commit 5aff430d4e33a0b48a6b3d5beb06f79da23f9916 ] Fix misleading display error in dmesg if tc filter return fail. Only i40e status error code should be converted to string, not linux error code. Otherwise, we return false information about the error. Fixes: 2f4b411a3d67 ("i40e: Enable cloud filters via tc-flower") Signed-off-by: Grzegorz Szczurek <grzegorzx.szczurek@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Dave Switzer <david.switzer@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-25i40e: Fix creation of first queue by omitting it if is not power of twoJedrzej Jagielski1-40/+19
[ Upstream commit 2e6d218c1ec6fb9cd70693b78134cbc35ae0b5a9 ] Reject TCs creation with proper message if the first queue assignment is not equal to the power of two. The first queue number was checked too late in the second queue iteration, if second queue was configured at all. Now if first queue value is not a power of two, then trying to create qdisc will be rejected. Fixes: 8f88b3034db3 ("i40e: Add infrastructure for queue channel support") Signed-off-by: Grzegorz Szczurek <grzegorzx.szczurek@intel.com> Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Tested-by: Tony Brelinski <tony.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-25i40e: Fix warning message and call stack during rmmod i40e driverKaren Sornek1-32/+21
[ Upstream commit 3a3b311e3881172fc8e019b6508f04bc40c92d9d ] Restore part of reset functionality used when reset is called from the VF to reset itself. Without this fix warning message is displayed when VF is being removed via sysfs. Fix the crash of the VF during reset by ensuring that the PF receives the reset message successfully. Refactor code to use one function instead of two. Fixes: 5c3c48ac6bf5 ("i40e: implement virtual device interface") Signed-off-by: Grzegorz Szczurek <grzegorzx.szczurek@intel.com> Signed-off-by: Karen Sornek <karen.sornek@intel.com> Tested-by: Tony Brelinski <tony.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-25i40e: Fix ping is lost after configuring ADq on VFEryk Rybak3-8/+74
[ Upstream commit 9e0a603cb7dce2a19d98116d42de84b6db26d716 ] Properly reconfigure VF VSIs after VF request ADQ. Created new function to update queue mapping and queue pairs per TC with AQ update VSI. This sets proper RSS size on NIC. VFs num_queue_pairs should not be changed during setup of queue maps. Previously, VF main VSI in ADQ had configured too many queues and had wrong RSS size, which lead to packets not being consumed and drops in connectivity. Fixes: bc6d33c8d93f ("i40e: Fix the number of queues available to be mapped for use") Co-developed-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com> Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com> Signed-off-by: Eryk Rybak <eryk.roch.rybak@intel.com> Tested-by: Tony Brelinski <tony.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-25i40e: Fix changing previously set num_queue_pairs for PFsEryk Rybak1-12/+23
[ Upstream commit d2a69fefd75683004ffe87166de5635b3267ee07 ] Currently, the i40e_vsi_setup_queue_map is basing the count of queues in TCs on a VSI's alloc_queue_pairs member which is not changed throughout any user's action (for example via ethtool's set_channels callback). This implies that vsi->tc_config.tc_info[n].qcount value that is given to the kernel via netdev_set_tc_queue() that notifies about the count of queues per particular traffic class is constant even if user has changed the total count of queues. This in turn caused the kernel warning after setting the queue count to the lower value than the initial one: $ ethtool -l ens801f0 Channel parameters for ens801f0: Pre-set maximums: RX: 0 TX: 0 Other: 1 Combined: 64 Current hardware settings: RX: 0 TX: 0 Other: 1 Combined: 64 $ ethtool -L ens801f0 combined 40 [dmesg] Number of in use tx queues changed invalidating tc mappings. Priority traffic classification disabled! Reason was that vsi->alloc_queue_pairs stayed at 64 value which was used to set the qcount on TC0 (by default only TC0 exists so all of the existing queues are assigned to TC0). we update the offset/qcount via netdev_set_tc_queue() back to the old value but then the netif_set_real_num_tx_queues() is using the vsi->num_queue_pairs as a value which got set to 40. Fix it by using vsi->req_queue_pairs as a queue count that will be distributed across TCs. Do it only for non-zero values, which implies that user actually requested the new count of queues. For VSIs other than main, stay with the vsi->alloc_queue_pairs as we only allow manipulating the queue count on main VSI. Fixes: bc6d33c8d93f ("i40e: Fix the number of queues available to be mapped for use") Co-developed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Co-developed-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com> Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com> Signed-off-by: Eryk Rybak <eryk.roch.rybak@intel.com> Tested-by: Tony Brelinski <tony.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-25i40e: Fix NULL ptr dereference on VSI filter syncMichal Maloszewski2-2/+4
[ Upstream commit 37d9e304acd903a445df8208b8a13d707902dea6 ] Remove the reason of null pointer dereference in sync VSI filters. Added new I40E_VSI_RELEASING flag to signalize deleting and releasing of VSI resources to sync this thread with sync filters subtask. Without this patch it is possible to start update the VSI filter list after VSI is removed, that's causing a kernel oops. Fixes: 41c445ff0f48 ("i40e: main driver core") Signed-off-by: Grzegorz Szczurek <grzegorzx.szczurek@intel.com> Signed-off-by: Michal Maloszewski <michal.maloszewski@intel.com> Reviewed-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com> Reviewed-by: Witold Fijalkowski <witoldx.fijalkowski@intel.com> Reviewed-by: Jaroslaw Gawin <jaroslawx.gawin@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Tony Brelinski <tony.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-25i40e: Fix correct max_pkt_size on VF RX queueEryk Rybak1-44/+9
[ Upstream commit 6afbd7b3c53cb7417189f476e99d431daccb85b0 ] Setting VLAN port increasing RX queue max_pkt_size by 4 bytes to take VLAN tag into account. Trigger the VF reset when setting port VLAN for VF to renegotiate its capabilities and reinitialize. Fixes: ba4e003d29c1 ("i40e: don't hold spinlock while resetting VF") Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Eryk Rybak <eryk.roch.rybak@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-25iavf: Restore VLAN filters after link downAkeem G Abodunrin2-5/+31
[ Upstream commit 4293014230b887d94b68aa460ff00153454a3709 ] Restore VLAN filters after the link is brought down, and up - since all filters are deleted from HW during the netdev link down routine. Fixes: ed1f5b58ea01 ("i40evf: remove VLAN filters on close") Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-25iavf: Fix for setting queues to 0Grzegorz Szczurek1-1/+1
[ Upstream commit 9a6e9e483a9684a34573fd9f9e30ecfb047cb8cb ] Now setting combine to 0 will be rejected with the appropriate error code. This has been implemented by adding a condition that checks the value of combine equal to zero. Without this patch, when the user requested it, no error was returned and combine was set to the default value for VF. Fixes: 5520deb15326 ("iavf: Enable support for up to 16 queues") Signed-off-by: Grzegorz Szczurek <grzegorzx.szczurek@intel.com> Tested-by: Tony Brelinski <tony.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-25iavf: Fix for the false positive ASQ/ARQ errors while issuing VF resetSurabhi Boob1-1/+1
[ Upstream commit 321421b57a12e933f92b228e0e6d0b2c6541f41d ] While issuing VF Reset from the guest OS, the VF driver prints logs about critical / Overflow error detection. This is not an actual error since the VF_MBX_ARQLEN register is set to all FF's for a short period of time and the VF would catch the bits set if it was reading the register during that spike of time. This patch introduces an additional check to ignore this condition since the VF is in reset. Fixes: 19b73d8efaa4 ("i40evf: Add additional check for reset") Signed-off-by: Surabhi Boob <surabhi.boob@intel.com> Tested-by: Tony Brelinski <tony.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-25iavf: validate pointersMitch Williams1-7/+6
[ Upstream commit 131b0edc4028bb88bb472456b1ddba526cfb7036 ] In some cases, the ethtool get_rxfh handler may be called with a null key or indir parameter. So check these pointers, or you will have a very bad day. Fixes: 43a3d9ba34c9 ("i40evf: Allow PF driver to configure RSS") Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Tony Brelinski <tony.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-25iavf: prevent accidental free of filter structureJacob Keller1-2/+2
[ Upstream commit 4f0400803818f2642f066d3eacaf013f23554cc7 ] In iavf_config_clsflower, the filter structure could be accidentally released at the end, if iavf_parse_cls_flower or iavf_handle_tclass ever return a non-zero but positive value. In this case, the function continues through to the end, and will call kfree() on the filter structure even though it has been added to the linked list. This can actually happen because iavf_parse_cls_flower will return a positive IAVF_ERR_CONFIG value instead of the traditional negative error codes. Fix this by ensuring that the kfree() check and error checks are similar. Use the more idiomatic "if (err)" to catch all non-zero error codes. Fixes: 0075fa0fadd0 ("i40evf: Add support to apply cloud filters") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Tony Brelinski <tony.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-25iavf: Fix failure to exit out from last all-multicast modePiotr Marczak1-2/+1
[ Upstream commit 8905072a192fffe9389255489db250c73ecab008 ] The driver could only quit allmulti when allmulti and promisc modes are turn on at the same time. If promisc had been off there was no way to turn off allmulti mode. The patch corrects this behavior. Switching allmulti does not depends on promisc state mode anymore Fixes: f42a5c74da99 ("i40e: Add allmulti support for the VF") Signed-off-by: Piotr Marczak <piotr.marczak@intel.com> Tested-by: Tony Brelinski <tony.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-25iavf: don't clear a lock we don't holdNicholas Nunley1-2/+4
[ Upstream commit 2135a8d5c8186bc92901dc00f179ffd50e54c2ac ] In iavf_configure_clsflower() the function will bail out if it is unable to obtain the crit_section lock in a reasonable time. However, it will clear the lock when exiting, so fix this. Fixes: 640a8af5841f ("i40evf: Reorder configure_clsflower to avoid deadlock on error") Signed-off-by: Nicholas Nunley <nicholas.d.nunley@intel.com> Tested-by: Tony Brelinski <tony.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-25iavf: free q_vectors before queues in iavf_disable_vfNicholas Nunley1-1/+1
[ Upstream commit 89f22f129696ab53cfbc608e0a2184d0fea46ac1 ] iavf_free_queues() clears adapter->num_active_queues, which iavf_free_q_vectors() relies on, so swap the order of these two function calls in iavf_disable_vf(). This resolves a panic encountered when the interface is disabled and then later brought up again after PF communication is restored. Fixes: 65c7006f234c ("i40evf: assign num_active_queues inside i40evf_alloc_queues") Signed-off-by: Nicholas Nunley <nicholas.d.nunley@intel.com> Tested-by: Tony Brelinski <tony.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-25iavf: check for null in iavf_fix_featuresNicholas Nunley1-1/+2
[ Upstream commit 8a4a126f4be88eb8b5f00a165ab58c35edf4ef76 ] If the driver has lost contact with the PF then it enters a disabled state and frees adapter->vf_res. However, ndo_fix_features can still be called on the interface, so we need to check for this condition first. Since we have no information on the features at this time simply leave them unmodified and return. Fixes: c4445aedfe09 ("i40evf: Fix VLAN features") Signed-off-by: Nicholas Nunley <nicholas.d.nunley@intel.com> Tested-by: Tony Brelinski <tony.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-25iavf: Fix return of set the new channel countMateusz Palczewski1-0/+15
[ Upstream commit 4e5e6b5d9d1334d3490326b6922a2daaf56a867f ] Fixed return correct code from set the new channel count. Implemented by check if reset is done in appropriate time. This solution give a extra time to pf for reset vf in case when user want set new channel count for all vfs. Without this patch it is possible to return misleading output code to user and vf reset not to be correctly performed by pf. Fixes: 5520deb15326 ("iavf: Enable support for up to 16 queues") Signed-off-by: Grzegorz Szczurek <grzegorzx.szczurek@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18ice: Fix not stopping Tx queues for VFsBrett Creeley2-5/+3
[ Upstream commit b385cca47363316c6d9a74ae9db407bbc281f815 ] When a VF is removed and/or reset its Tx queues need to be stopped from the PF. This is done by calling the ice_dis_vf_qs() function, which calls ice_vsi_stop_lan_tx_rings(). Currently ice_dis_vf_qs() is protected by the VF state bit ICE_VF_STATE_QS_ENA. Unfortunately, this is causing the Tx queues to not be disabled in some cases and when the VF tries to re-enable/reconfigure its Tx queues over virtchnl the op is failing. This is because a VF can be reset and/or removed before the ICE_VF_STATE_QS_ENA bit is set, but the Tx queues were already configured via ice_vsi_cfg_single_txq() in the VIRTCHNL_OP_CONFIG_VSI_QUEUES op. However, the ICE_VF_STATE_QS_ENA bit is set on a successful VIRTCHNL_OP_ENABLE_QUEUES, which will always happen after the VIRTCHNL_OP_CONFIG_VSI_QUEUES op. This was causing the following error message when loading the ice driver, creating VFs, and modifying VF trust in an endless loop: [35274.192484] ice 0000:88:00.0: Failed to set LAN Tx queue context, error: ICE_ERR_PARAM [35274.193074] ice 0000:88:00.0: VF 0 failed opcode 6, retval: -5 [35274.193640] iavf 0000:88:01.0: PF returned error -5 (IAVF_ERR_PARAM) to our request 6 Fix this by always calling ice_dis_vf_qs() and silencing the error message in ice_vsi_stop_tx_ring() since the calling code ignores the return anyway. Also, all other places that call ice_vsi_stop_tx_ring() catch the error, so this doesn't affect those flows since there was no change to the values the function returns. Other solutions were considered (i.e. tracking which VF queues had been "started/configured" in VIRTCHNL_OP_CONFIG_VSI_QUEUES, but it seemed more complicated than it was worth. This solution also brings in the chance for other unexpected conditions due to invalid state bit checks. So, the proposed solution seemed like the best option since there is no harm in failing to stop Tx queues that were never started. This issue can be seen using the following commands: for i in {0..50}; do rmmod ice modprobe ice sleep 1 echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs echo 1 > /sys/class/net/ens785f1/device/sriov_numvfs ip link set ens785f1 vf 0 trust on ip link set ens785f0 vf 0 trust on sleep 2 echo 0 > /sys/class/net/ens785f0/device/sriov_numvfs echo 0 > /sys/class/net/ens785f1/device/sriov_numvfs sleep 1 echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs echo 1 > /sys/class/net/ens785f1/device/sriov_numvfs ip link set ens785f1 vf 0 trust on ip link set ens785f0 vf 0 trust on done Fixes: 77ca27c41705 ("ice: add support for virtchnl_queue_select.[tx|rx]_queues bitmap") Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18ice: Fix replacing VF hardware MAC to existing MAC filterSylwester Dziedziuch1-5/+9
[ Upstream commit ce572a5b88d5ca6737b5e23da9892792fd708ad3 ] VF was not able to change its hardware MAC address in case the new address was already present in the MAC filter list. Change the handling of VF add mac request to not return if requested MAC address is already present on the list and check if its hardware MAC needs to be updated in this case. Fixes: ed4c068d46f6 ("ice: Enable ip link show on the PF to display VF unicast MAC(s)") Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com> Tested-by: Tony Brelinski <tony.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18net: intel: igc_ptp: fix build for UMLRandy Dunlap1-1/+1
[ Upstream commit 523994ba3ad1b7b55abe4a72e156897b5e2db825 ] On a UML build, the igc_ptp driver uses CONFIG_X86_TSC for timestamp conversion. The function that is used is not available on UML builds, so have the function use the default system_counterval_t timestamp instead for UML builds. Prevents this build error on UML: ../drivers/net/ethernet/intel/igc/igc_ptp.c: In function ‘igc_device_tstamp_to_system’: ../drivers/net/ethernet/intel/igc/igc_ptp.c:777:9: error: implicit declaration of function ‘convert_art_ns_to_tsc’ [-Werror=implicit-function-declaration] return convert_art_ns_to_tsc(tstamp); ../drivers/net/ethernet/intel/igc/igc_ptp.c:777:9: error: incompatible types when returning type ‘int’ but ‘struct system_counterval_t’ was expected return convert_art_ns_to_tsc(tstamp); Fixes: 68f5d3f3b654 ("um: add PCI over virtio emulation driver") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: linux-um@lists.infradead.org Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Jesse Brandeburg <jesse.brandeburg@intel.com> Cc: Tony Nguyen <anthony.l.nguyen@intel.com> Cc: intel-wired-lan@lists.osuosl.org Link: https://lore.kernel.org/r/20211014050516.6846-1-rdunlap@infradead.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18ice: Move devlink port to PF/VF structWojciech Drewek7-37/+103
[ Upstream commit 2ae0aa4758b0f4a247d45cb3bf01548a7f396751 ] Keeping devlink port inside VSI data structure causes some issues. Since VF VSI is released during reset that means that we have to unregister devlink port and register it again every time reset is triggered. With the new changes in devlink API it might cause deadlock issues. After calling devlink_port_register/devlink_port_unregister devlink API is going to lock rtnl_mutex. It's an issue when VF reset is triggered in netlink operation context (like setting VF MAC address or VLAN), because rtnl_lock is already taken by netlink. Another call of rtnl_lock from devlink API results in dead-lock. By moving devlink port to PF/VF we avoid creating/destroying it during reset. Since this patch, devlink ports are created during ice_probe, destroyed during ice_remove for PF and created during ice_repr_add, destroyed during ice_repr_rem for VF. Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-25ice: check whether PTP is initialized in ice_ptp_release()Yongxin Liu1-0/+3
PTP is currently only supported on E810 devices, it is checked in ice_ptp_init(). However, there is no check in ice_ptp_release(). For other E800 series devices, ice_ptp_release() will be wrongly executed. Fix the following calltrace. INFO: trying to register non-static key. The code is fine but needs lockdep annotation, or maybe you didn't initialize this object before use? turning off the locking correctness validator. Workqueue: ice ice_service_task [ice] Call Trace: dump_stack_lvl+0x5b/0x82 dump_stack+0x10/0x12 register_lock_class+0x495/0x4a0 ? find_held_lock+0x3c/0xb0 __lock_acquire+0x71/0x1830 lock_acquire+0x1e6/0x330 ? ice_ptp_release+0x3c/0x1e0 [ice] ? _raw_spin_lock+0x19/0x70 ? ice_ptp_release+0x3c/0x1e0 [ice] _raw_spin_lock+0x38/0x70 ? ice_ptp_release+0x3c/0x1e0 [ice] ice_ptp_release+0x3c/0x1e0 [ice] ice_prepare_for_reset+0xcb/0xe0 [ice] ice_do_reset+0x38/0x110 [ice] ice_service_task+0x138/0xf10 [ice] ? __this_cpu_preempt_check+0x13/0x20 process_one_work+0x26a/0x650 worker_thread+0x3f/0x3b0 ? __kthread_parkme+0x51/0xb0 ? process_one_work+0x650/0x650 kthread+0x161/0x190 ? set_kthread_struct+0x40/0x40 ret_from_fork+0x1f/0x30 Fixes: 4dd0d5c33c3e ("ice: add lock around Tx timestamp tracker flush") Signed-off-by: Yongxin Liu <yongxin.liu@windriver.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-25ice: Respond to a NETDEV_UNREGISTER event for LAGDave Ertman1-14/+4
When the PF is a member of a link aggregate, and the driver is removed, the process will hang unless we respond to the NETDEV_UNREGISTER event that is sent to the event_handler for LAG. Add a case statement for the ice_lag_event_handler to unlink the PF from the link aggregate. Also remove code that was incorrectly applying a dev_hold to peer_netdevs that were associated with the ice driver. Fixes: df006dd4b1dc ("ice: Add initial support framework for LAG") Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Tony Brelinski <tony.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-20ice: Add missing E810 device idsTony Nguyen3-0/+8
As part of support for E810 XXV devices, some device ids were inadvertently left out. Add those missing ids. Fixes: 195fb97766da ("ice: add additional E810 device id") Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Acked-by: Paul Menzel <pmenzel@molgen.mpg.de>
2021-10-20igc: Update I226_K device IDSasha Neftin1-1/+1
The device ID for I226_K was incorrectly assigned, update the device ID to the correct one. Fixes: bfa5e98c9de4 ("igc: Add new device ID") Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Nechama Kraus <nechamax.kraus@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-20e1000e: Fix packet loss on Tiger Lake and laterSasha Neftin2-1/+13
Update the HW MAC initialization flow. Do not gate DMA clock from the modPHY block. Keeping this clock will prevent dropped packets sent in burst mode on the Kumeran interface. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=213651 Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=213377 Fixes: fb776f5d57ee ("e1000e: Add support for Tiger Lake") Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Mark Pearson <markpearson@lenovo.com> Tested-by: Nechama Kraus <nechamax.kraus@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-20e1000e: Separate TGP board type from SPTSasha Neftin3-23/+46
We have the same LAN controller on different PCHs. Separate TGP board type from SPT which will allow for specific fixes to be applied for TGP platforms. Suggested-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Mark Pearson <markpearson@lenovo.com> Tested-by: Nechama Kraus <nechamax.kraus@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-14ice: Print the api_patch as part of the fw.mgmt.apiBrett Creeley1-1/+2
Currently when a user uses "devlink dev info", the fw.mgmt.api will be the major.minor numbers as shown below: devlink dev info pci/0000:3b:00.0 pci/0000:3b:00.0: driver ice serial_number 00-01-00-ff-ff-00-00-00 versions: fixed: board.id K91258-000 running: fw.mgmt 6.1.2 fw.mgmt.api 1.7 <--- No patch number included fw.mgmt.build 0xd75e7d06 fw.mgmt.srev 5 fw.undi 1.2992.0 fw.undi.srev 5 fw.psid.api 3.10 fw.bundle_id 0x800085cc fw.app.name ICE OS Default Package fw.app 1.3.27.0 fw.app.bundle_id 0xc0000001 fw.netlist 3.10.2000-3.1e.0 fw.netlist.build 0x2a76e110 stored: fw.mgmt.srev 5 fw.undi 1.2992.0 fw.undi.srev 5 fw.psid.api 3.10 fw.bundle_id 0x800085cc fw.netlist 3.10.2000-3.1e.0 fw.netlist.build 0x2a76e110 There are many features in the driver that depend on the major, minor, and patch version of the FW. Without the patch number in the output for fw.mgmt.api debugging issues related to the FW API version is difficult. Also, using major.minor.patch aligns with the existing firmware version which uses a 3 digit value. Fix this by making the fw.mgmt.api print the major.minor.patch versions. Shown below is the result: devlink dev info pci/0000:3b:00.0 pci/0000:3b:00.0: driver ice serial_number 00-01-00-ff-ff-00-00-00 versions: fixed: board.id K91258-000 running: fw.mgmt 6.1.2 fw.mgmt.api 1.7.9 <--- patch number included fw.mgmt.build 0xd75e7d06 fw.mgmt.srev 5 fw.undi 1.2992.0 fw.undi.srev 5 fw.psid.api 3.10 fw.bundle_id 0x800085cc fw.app.name ICE OS Default Package fw.app 1.3.27.0 fw.app.bundle_id 0xc0000001 fw.netlist 3.10.2000-3.1e.0 fw.netlist.build 0x2a76e110 stored: fw.mgmt.srev 5 fw.undi 1.2992.0 fw.undi.srev 5 fw.psid.api 3.10 fw.bundle_id 0x800085cc fw.netlist 3.10.2000-3.1e.0 fw.netlist.build 0x2a76e110 Fixes: ff2e5c700e08 ("ice: add basic handler for devlink .info_get") Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-14ice: fix getting UDP tunnel entryMichal Swiatkowski1-2/+2
Correct parameters order in call to ice_tunnel_idx_to_entry function. Entry in sparse port table is correct when the idx is 0. For idx 1 one correct entry should be skipped, for idx 2 two of them should be skipped etc. Change if condition to be true when idx is 0, which means that previous valid entry of this tunnel type were skipped. Fixes: b20e6c17c468 ("ice: convert to new udp_tunnel infrastructure") Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-14ice: Avoid crash from unnecessary IDA freeDave Ertman1-1/+5
In the remove path, there is an attempt to free the aux_idx IDA whether it was allocated or not. This can potentially cause a crash when unloading the driver on systems that do not initialize support for RDMA. But, this free cannot be gated by the status bit for RDMA, since it is allocated if the driver detects support for RDMA at probe time, but the driver can enter into a state where RDMA is not supported after the IDA has been allocated at probe time and this would lead to a memory leak. Initialize aux_idx to an invalid value and check for a valid value when unloading to determine if an IDA free is necessary. Fixes: d25a0fc41c1f9 ("ice: Initialize RDMA support") Reported-by: Jun Miao <jun.miao@windriver.com> Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-14ice: Fix failure to re-add LAN/RDMA Tx queuesBrett Creeley3-0/+23
Currently if the VSI is rebuilt/removed and the RDMA PF driver is active the RDMA Tx queue scheduler node configuration will not be cleaned up. This will cause the rebuild/re-add of the VSI to fail due to the software structures not being correctly cleaned up for the VSI index. Fix this by always calling ice_rm_vsi_rdma_cfg() for all VSI. If there are no RDMA scheduler nodes created, then there is no harm in calling ice_rm_vsi_rdma_cfg(). This change applies to all VSI types, so if RDMA support is added for other VSI types they will also get this change. Fixes: 348048e724a0 ("ice: Implement iidc operations") Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Jerzy Wiktor Jurkowski <jerzy.wiktor.jurkowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-12ice: fix locking for Tx timestamp tracking flushJacob Keller1-8/+7
Commit 4dd0d5c33c3e ("ice: add lock around Tx timestamp tracker flush") added a lock around the Tx timestamp tracker flow which is used to cleanup any left over SKBs and prepare for device removal. This lock is problematic because it is being held around a call to ice_clear_phy_tstamp. The clear function takes a mutex to send a PHY write command to firmware. This could lead to a deadlock if the mutex actually sleeps, and causes the following warning on a kernel with preemption debugging enabled: [ 715.419426] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:573 [ 715.427900] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 3100, name: rmmod [ 715.435652] INFO: lockdep is turned off. [ 715.439591] Preemption disabled at: [ 715.439594] [<0000000000000000>] 0x0 [ 715.446678] CPU: 52 PID: 3100 Comm: rmmod Tainted: G W OE 5.15.0-rc4+ #42 bdd7ec3018e725f159ca0d372ce8c2c0e784891c [ 715.458058] Hardware name: Intel Corporation S2600STQ/S2600STQ, BIOS SE5C620.86B.02.01.0010.010620200716 01/06/2020 [ 715.468483] Call Trace: [ 715.470940] dump_stack_lvl+0x6a/0x9a [ 715.474613] ___might_sleep.cold+0x224/0x26a [ 715.478895] __mutex_lock+0xb3/0x1440 [ 715.482569] ? stack_depot_save+0x378/0x500 [ 715.486763] ? ice_sq_send_cmd+0x78/0x14c0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d] [ 715.494979] ? kfree+0xc1/0x520 [ 715.498128] ? mutex_lock_io_nested+0x12a0/0x12a0 [ 715.502837] ? kasan_set_free_info+0x20/0x30 [ 715.507110] ? __kasan_slab_free+0x10b/0x140 [ 715.511385] ? slab_free_freelist_hook+0xc7/0x220 [ 715.516092] ? kfree+0xc1/0x520 [ 715.519235] ? ice_deinit_lag+0x16c/0x220 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d] [ 715.527359] ? ice_remove+0x1cf/0x6a0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d] [ 715.535133] ? pci_device_remove+0xab/0x1d0 [ 715.539318] ? __device_release_driver+0x35b/0x690 [ 715.544110] ? driver_detach+0x214/0x2f0 [ 715.548035] ? bus_remove_driver+0x11d/0x2f0 [ 715.552309] ? pci_unregister_driver+0x26/0x250 [ 715.556840] ? ice_module_exit+0xc/0x2f [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d] [ 715.564799] ? __do_sys_delete_module.constprop.0+0x2d8/0x4e0 [ 715.570554] ? do_syscall_64+0x3b/0x90 [ 715.574303] ? entry_SYSCALL_64_after_hwframe+0x44/0xae [ 715.579529] ? start_flush_work+0x542/0x8f0 [ 715.583719] ? ice_sq_send_cmd+0x78/0x14c0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d] [ 715.591923] ice_sq_send_cmd+0x78/0x14c0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d] [ 715.599960] ? wait_for_completion_io+0x250/0x250 [ 715.604662] ? lock_acquire+0x196/0x200 [ 715.608504] ? do_raw_spin_trylock+0xa5/0x160 [ 715.612864] ice_sbq_rw_reg+0x1e6/0x2f0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d] [ 715.620813] ? ice_reset+0x130/0x130 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d] [ 715.628497] ? __debug_check_no_obj_freed+0x1e8/0x3c0 [ 715.633550] ? trace_hardirqs_on+0x1c/0x130 [ 715.637748] ice_write_phy_reg_e810+0x70/0xf0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d] [ 715.646220] ? do_raw_spin_trylock+0xa5/0x160 [ 715.650581] ? ice_ptp_release+0x910/0x910 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d] [ 715.658797] ? ice_ptp_release+0x255/0x910 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d] [ 715.667013] ice_clear_phy_tstamp+0x2c/0x110 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d] [ 715.675403] ice_ptp_release+0x408/0x910 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d] [ 715.683440] ice_remove+0x560/0x6a0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d] [ 715.691037] ? _raw_spin_unlock_irqrestore+0x46/0x73 [ 715.696005] pci_device_remove+0xab/0x1d0 [ 715.700018] __device_release_driver+0x35b/0x690 [ 715.704637] driver_detach+0x214/0x2f0 [ 715.708389] bus_remove_driver+0x11d/0x2f0 [ 715.712489] pci_unregister_driver+0x26/0x250 [ 715.716857] ice_module_exit+0xc/0x2f [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d] [ 715.724637] __do_sys_delete_module.constprop.0+0x2d8/0x4e0 [ 715.730210] ? free_module+0x6d0/0x6d0 [ 715.733963] ? task_work_run+0xe1/0x170 [ 715.737803] ? exit_to_user_mode_loop+0x17f/0x1d0 [ 715.742509] ? rcu_read_lock_sched_held+0x12/0x80 [ 715.747215] ? trace_hardirqs_on+0x1c/0x130 [ 715.751401] do_syscall_64+0x3b/0x90 [ 715.754981] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 715.760033] RIP: 0033:0x7f4dfe59000b [ 715.763612] Code: 73 01 c3 48 8b 0d 6d 1e 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3d 1e 0c 00 f7 d8 64 89 01 48 [ 715.782357] RSP: 002b:00007ffe8c891708 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 [ 715.789923] RAX: ffffffffffffffda RBX: 00005558a20468b0 RCX: 00007f4dfe59000b [ 715.797054] RDX: 000000000000000a RSI: 0000000000000800 RDI: 00005558a2046918 [ 715.804189] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 [ 715.811319] R10: 00007f4dfe603ac0 R11: 0000000000000206 R12: 00007ffe8c891940 [ 715.818455] R13: 00007ffe8c8920a3 R14: 00005558a20462a0 R15: 00005558a20468b0 Notice that this is the only case where we use the lock in this way. In the cleanup kthread and work kthread the lock is only taken around the bit accesses. This was done intentionally to avoid this kind of issue. The way the lock is used, we only protect ordering of bit sets vs bit clears. The Tx writers in the hot path don't need to be protected against the entire kthread loop. The Tx queues threads only need to ensure that they do not re-use an index that is currently in use. The cleanup loop does not need to block all new set bits, since it will re-queue itself if new timestamps are present. Fix the tracker flow so that it uses the same flow as the standard cleanup thread. In addition, ensure the in_use bitmap actually gets cleared properly. This fixes the warning and also avoids the potential deadlock that might have occurred otherwise. Fixes: 4dd0d5c33c3e ("ice: add lock around Tx timestamp tracker flush") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-06iavf: fix double unlock of crit_lockStefan Assmann1-1/+0
The crit_lock mutex could be unlocked twice as reported here https://lists.osuosl.org/pipermail/intel-wired-lan/Week-of-Mon-20210823/025525.html Remove the superfluous unlock. Technically the problem was already present before 5ac49f3c2702 as that commit only replaced the locking primitive, but no functional change. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: 5ac49f3c2702 ("iavf: use mutexes for locking of critical sections") Fixes: bac8486116b0 ("iavf: Refactor the watchdog state machine") Signed-off-by: Stefan Assmann <sassmann@kpanic.de> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-06i40e: Fix freeing of uninitialized misc IRQ vectorSylwester Dziedziuch1-1/+2
When VSI set up failed in i40e_probe() as part of PF switch set up driver was trying to free misc IRQ vectors in i40e_clear_interrupt_scheme and produced a kernel Oops: Trying to free already-free IRQ 266 WARNING: CPU: 0 PID: 5 at kernel/irq/manage.c:1731 __free_irq+0x9a/0x300 Workqueue: events work_for_cpu_fn RIP: 0010:__free_irq+0x9a/0x300 Call Trace: ? synchronize_irq+0x3a/0xa0 free_irq+0x2e/0x60 i40e_clear_interrupt_scheme+0x53/0x190 [i40e] i40e_probe.part.108+0x134b/0x1a40 [i40e] ? kmem_cache_alloc+0x158/0x1c0 ? acpi_ut_update_ref_count.part.1+0x8e/0x345 ? acpi_ut_update_object_reference+0x15e/0x1e2 ? strstr+0x21/0x70 ? irq_get_irq_data+0xa/0x20 ? mp_check_pin_attr+0x13/0xc0 ? irq_get_irq_data+0xa/0x20 ? mp_map_pin_to_irq+0xd3/0x2f0 ? acpi_register_gsi_ioapic+0x93/0x170 ? pci_conf1_read+0xa4/0x100 ? pci_bus_read_config_word+0x49/0x70 ? do_pci_enable_device+0xcc/0x100 local_pci_probe+0x41/0x90 work_for_cpu_fn+0x16/0x20 process_one_work+0x1a7/0x360 worker_thread+0x1cf/0x390 ? create_worker+0x1a0/0x1a0 kthread+0x112/0x130 ? kthread_flush_work_fn+0x10/0x10 ret_from_fork+0x1f/0x40 The problem is that at that point misc IRQ vectors were not allocated yet and we get a call trace that driver is trying to free already free IRQ vectors. Add a check in i40e_clear_interrupt_scheme for __I40E_MISC_IRQ_REQUESTED PF state before calling i40e_free_misc_vector. This state is set only if misc IRQ vectors were properly initialized. Fixes: c17401a1dd21 ("i40e: use separate state bit for miscellaneous IRQ setup") Reported-by: PJ Waskiewicz <pwaskiewicz@jumptrading.com> Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Dave Switzer <david.switzer@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-06i40e: fix endless loop under rtnlJiri Benc1-1/+1
The loop in i40e_get_capabilities can never end. The problem is that although i40e_aq_discover_capabilities returns with an error if there's a firmware problem, the returned error is not checked. There is a check for pf->hw.aq.asq_last_status but that value is set to I40E_AQ_RC_OK on most firmware problems. When i40e_aq_discover_capabilities encounters a firmware problem, it will encounter the same problem on its next invocation. As the result, the loop becomes endless. We hit this with I40E_ERR_ADMIN_QUEUE_TIMEOUT but looking at the code, it can happen with a range of other firmware errors. I don't know what the correct behavior should be: whether the firmware should be retried a few times, or whether pf->hw.aq.asq_last_status should be always set to the encountered firmware error (but then it would be pointless and can be just replaced by the i40e_aq_discover_capabilities return value). However, the current behavior with an endless loop under the rtnl mutex(!) is unacceptable and Intel has not submitted a fix, although we explained the bug to them 7 months ago. This may not be the best possible fix but it's better than hanging the whole system on a firmware bug. Fixes: 56a62fc86895 ("i40e: init code and hardware support") Tested-by: Stefan Assmann <sassmann@redhat.com> Signed-off-by: Jiri Benc <jbenc@redhat.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Dave Switzer <david.switzer@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-09-29ixgbe: Fix NULL pointer dereference in ixgbe_xdp_setupFeng Zhou2-3/+7
The ixgbe driver currently generates a NULL pointer dereference with some machine (online cpus < 63). This is due to the fact that the maximum value of num_xdp_queues is nr_cpu_ids. Code is in "ixgbe_set_rss_queues"". Here's how the problem repeats itself: Some machine (online cpus < 63), And user set num_queues to 63 through ethtool. Code is in the "ixgbe_set_channels", adapter->ring_feature[RING_F_FDIR].limit = count; It becomes 63. When user use xdp, "ixgbe_set_rss_queues" will set queues num. adapter->num_rx_queues = rss_i; adapter->num_tx_queues = rss_i; adapter->num_xdp_queues = ixgbe_xdp_queues(adapter); And rss_i's value is from f = &adapter->ring_feature[RING_F_FDIR]; rss_i = f->indices = f->limit; So "num_rx_queues" > "num_xdp_queues", when run to "ixgbe_xdp_setup", for (i = 0; i < adapter->num_rx_queues; i++) if (adapter->xdp_ring[i]->xsk_umem) It leads to panic. Call trace: [exception RIP: ixgbe_xdp+368] RIP: ffffffffc02a76a0 RSP: ffff9fe16202f8d0 RFLAGS: 00010297 RAX: 0000000000000000 RBX: 0000000000000020 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 000000000000001c RDI: ffffffffa94ead90 RBP: ffff92f8f24c0c18 R8: 0000000000000000 R9: 0000000000000000 R10: ffff9fe16202f830 R11: 0000000000000000 R12: ffff92f8f24c0000 R13: ffff9fe16202fc01 R14: 000000000000000a R15: ffffffffc02a7530 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 7 [ffff9fe16202f8f0] dev_xdp_install at ffffffffa89fbbcc 8 [ffff9fe16202f920] dev_change_xdp_fd at ffffffffa8a08808 9 [ffff9fe16202f960] do_setlink at ffffffffa8a20235 10 [ffff9fe16202fa88] rtnl_setlink at ffffffffa8a20384 11 [ffff9fe16202fc78] rtnetlink_rcv_msg at ffffffffa8a1a8dd 12 [ffff9fe16202fcf0] netlink_rcv_skb at ffffffffa8a717eb 13 [ffff9fe16202fd40] netlink_unicast at ffffffffa8a70f88 14 [ffff9fe16202fd80] netlink_sendmsg at ffffffffa8a71319 15 [ffff9fe16202fdf0] sock_sendmsg at ffffffffa89df290 16 [ffff9fe16202fe08] __sys_sendto at ffffffffa89e19c8 17 [ffff9fe16202ff30] __x64_sys_sendto at ffffffffa89e1a64 18 [ffff9fe16202ff38] do_syscall_64 at ffffffffa84042b9 19 [ffff9fe16202ff50] entry_SYSCALL_64_after_hwframe at ffffffffa8c0008c So I fix ixgbe_max_channels so that it will not allow a setting of queues to be higher than the num_online_cpus(). And when run to ixgbe_xdp_setup, take the smaller value of num_rx_queues and num_xdp_queues. Fixes: 4a9b32f30f80 ("ixgbe: fix potential RX buffer starvation for AF_XDP") Signed-off-by: Feng Zhou <zhoufeng.zf@bytedance.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-27e100: fix buffer overrun in e100_get_regsJacob Keller1-6/+10
The e100_get_regs function is used to implement a simple register dump for the e100 device. The data is broken into a couple of MAC control registers, and then a series of PHY registers, followed by a memory dump buffer. The total length of the register dump is defined as (1 + E100_PHY_REGS) * sizeof(u32) + sizeof(nic->mem->dump_buf). The logic for filling in the PHY registers uses a convoluted inverted count for loop which counts from E100_PHY_REGS (0x1C) down to 0, and assigns the slots 1 + E100_PHY_REGS - i. The first loop iteration will fill in [1] and the final loop iteration will fill in [1 + 0x1C]. This is actually one more than the supposed number of PHY registers. The memory dump buffer is then filled into the space at [2 + E100_PHY_REGS] which will cause that memcpy to assign 4 bytes past the total size. The end result is that we overrun the total buffer size allocated by the kernel, which could lead to a panic or other issues due to memory corruption. It is difficult to determine the actual total number of registers here. The only 8255x datasheet I could find indicates there are 28 total MDI registers. However, we're reading 29 here, and reading them in reverse! In addition, the ethtool e100 register dump interface appears to read the first PHY register to determine if the device is in MDI or MDIx mode. This doesn't appear to be documented anywhere within the 8255x datasheet. I can only assume it must be in register 28 (the extra register we're reading here). Lets not change any of the intended meaning of what we copy here. Just extend the space by 4 bytes to account for the extra register and continue copying the data out in the same order. Change the E100_PHY_REGS value to be the correct total (29) so that the total register dump size is calculated properly. Fix the offset for where we copy the dump buffer so that it doesn't overrun the total size. Re-write the for loop to use counting up instead of the convoluted down-counting. Correct the mdio_read offset to use the 0-based register offsets, but maintain the bizarre reverse ordering so that we have the ABI expected by applications like ethtool. This requires and additional subtraction of 1. It seems a bit odd but it makes the flow of assignment into the register buffer easier to follow. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Felicitas Hetzelt <felicitashetzelt@gmail.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-09-27e100: fix length calculation in e100_get_regs_lenJacob Keller1-1/+5
commit abf9b902059f ("e100: cleanup unneeded math") tried to simplify e100_get_regs_len and remove a double 'divide and then multiply' calculation that the e100_reg_regs_len function did. This change broke the size calculation entirely as it failed to account for the fact that the numbered registers are actually 4 bytes wide and not 1 byte. This resulted in a significant under allocation of the register buffer used by e100_get_regs. Fix this by properly multiplying the register count by u32 first before adding the size of the dump buffer. Fixes: abf9b902059f ("e100: cleanup unneeded math") Reported-by: Felicitas Hetzelt <felicitashetzelt@gmail.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-09-19igc: fix build errors for PTPRandy Dunlap1-0/+1
When IGC=y and PTP_1588_CLOCK=m, the ptp_*() interface family is not available to the igc driver. Make this driver depend on PTP_1588_CLOCK_OPTIONAL so that it will build without errors. Various igc commits have used ptp_*() functions without checking that PTP_1588_CLOCK is enabled. Fix all of these here. Fixes these build errors: ld: drivers/net/ethernet/intel/igc/igc_main.o: in function `igc_msix_other': igc_main.c:(.text+0x6494): undefined reference to `ptp_clock_event' ld: igc_main.c:(.text+0x64ef): undefined reference to `ptp_clock_event' ld: igc_main.c:(.text+0x6559): undefined reference to `ptp_clock_event' ld: drivers/net/ethernet/intel/igc/igc_ethtool.o: in function `igc_ethtool_get_ts_info': igc_ethtool.c:(.text+0xc7a): undefined reference to `ptp_clock_index' ld: drivers/net/ethernet/intel/igc/igc_ptp.o: in function `igc_ptp_feature_enable_i225': igc_ptp.c:(.text+0x330): undefined reference to `ptp_find_pin' ld: igc_ptp.c:(.text+0x36f): undefined reference to `ptp_find_pin' ld: drivers/net/ethernet/intel/igc/igc_ptp.o: in function `igc_ptp_init': igc_ptp.c:(.text+0x11cd): undefined reference to `ptp_clock_register' ld: drivers/net/ethernet/intel/igc/igc_ptp.o: in function `igc_ptp_stop': igc_ptp.c:(.text+0x12dd): undefined reference to `ptp_clock_unregister' ld: drivers/platform/x86/dell/dell-wmi-privacy.o: in function `dell_privacy_wmi_probe': Fixes: 64433e5bf40ab ("igc: Enable internal i225 PPS") Fixes: 60dbede0c4f3d ("igc: Add support for ethtool GET_TS_INFO command") Fixes: 87938851b6efb ("igc: enable auxiliary PHC functions for the i225") Fixes: 5f2958052c582 ("igc: Add basic skeleton for PTP") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Ederson de Souza <ederson.desouza@intel.com> Cc: Tony Nguyen <anthony.l.nguyen@intel.com> Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Jesse Brandeburg <jesse.brandeburg@intel.com> Cc: intel-wired-lan@lists.osuosl.org Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-16igc: fix tunnel offloadingPaolo Abeni1-1/+3
Checking tunnel offloading, it turns out that offloading doesn't work as expected. The following script allows to reproduce the issue. Call it as `testscript DEVICE LOCALIP REMOTEIP NETMASK' === SNIP === if [ $# -ne 4 ] then echo "Usage $0 DEVICE LOCALIP REMOTEIP NETMASK" exit 1 fi DEVICE="$1" LOCAL_ADDRESS="$2" REMOTE_ADDRESS="$3" NWMASK="$4" echo "Driver: $(ethtool -i ${DEVICE} | awk '/^driver:/{print $2}') " ethtool -k "${DEVICE}" | grep tx-udp echo echo "Set up NIC and tunnel..." ip addr add "${LOCAL_ADDRESS}/${NWMASK}" dev "${DEVICE}" ip link set "${DEVICE}" up sleep 2 ip link add vxlan1 type vxlan id 42 \ remote "${REMOTE_ADDRESS}" \ local "${LOCAL_ADDRESS}" \ dstport 0 \ dev "${DEVICE}" ip addr add fc00::1/64 dev vxlan1 ip link set vxlan1 up sleep 2 rm -f vxlan.pcap echo "Running tcpdump and iperf3..." ( nohup tcpdump -i any -w vxlan.pcap >/dev/null 2>&1 ) & sleep 2 iperf3 -c fc00::2 >/dev/null pkill tcpdump echo echo -n "Max. Paket Size: " tcpdump -r vxlan.pcap -nnle 2>/dev/null \ | grep "${LOCAL_ADDRESS}.*> ${REMOTE_ADDRESS}.*OTV" \ | awk '{print $8}' | awk -F ':' '{print $1}' \ | sort -n | tail -1 echo ip link del vxlan1 ip addr del ${LOCAL_ADDRESS}/${NWMASK} dev "${DEVICE}" === SNAP === The expected outcome is Max. Paket Size: 64904 This is what you see on igb, the code igc has been taken from. However, on igc the output is Max. Paket Size: 1516 so the GSO aggregate packets are segmented by the kernel before calling igc_xmit_frame. Inside the subsequent call to igc_tso, the check for skb_is_gso(skb) fails and the function returns prematurely. It turns out that this occurs because the feature flags aren't set entirely correctly in igc_probe. In contrast to the original code from igb_probe, igc_probe neglects to set the flags required to allow tunnel offloading. Setting the same flags as igb fixes the issue on igc. Fixes: 34428dff3679 ("igc: Add GSO partial support") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Tested-by: Corinna Vinschen <vinschen@redhat.com> Acked-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Nechama Kraus <nechamax.kraus@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-10ice: Correctly deal with PFs that do not support RDMADave Ertman2-0/+8
There are two cases where the current PF does not support RDMA functionality. The first is if the NVM loaded on the device is set to not support RDMA (common_caps.rdma is false). The second is if the kernel bonding driver has included the current PF in an active link aggregate. When the driver has determined that this PF does not support RDMA, then auxiliary devices should not be created on the auxiliary bus. Without a device on the auxiliary bus, even if the irdma driver is present, there will be no RDMA activity attempted on this PF. Currently, in the reset flow, an attempt to create auxiliary devices is performed without regard to the ability of the PF. There needs to be a check in ice_aux_plug_dev (as the central point that creates auxiliary devices) to see if the PF is in a state to support the functionality. When disabling and re-enabling RDMA due to the inclusion/removal of the PF in a link aggregate, we also need to set/clear the bit which controls auxiliary device creation so that a reset recovery in a link aggregate situation doesn't try to create auxiliary devices when it shouldn't. Fixes: f9f5301e7e2d ("ice: Register auxiliary device to provide RDMA") Reported-by: Yongxin Liu <yongxin.liu@windriver.com> Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2-16/+63
include/linux/netdevice.h net/socket.c d0efb16294d1 ("net: don't unconditionally copy_from_user a struct ifreq for socket ioctls") 876f0bf9d0d5 ("net: socket: simplify dev_ifconf handling") 29c4964822aa ("net: socket: rework compat_ifreq_ioctl()") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-27ice: Only lock to update netdev dev_addrBrett Creeley1-4/+9
commit 3ba7f53f8bf1 ("ice: don't remove netdev->dev_addr from uc sync list") introduced calls to netif_addr_lock_bh() and netif_addr_unlock_bh() in the driver's ndo_set_mac() callback. This is fine since the driver is updated the netdev's dev_addr, but since this is a spinlock, the driver cannot sleep when the lock is held. Unfortunately the functions to add/delete MAC filters depend on a mutex. This was causing a trace with the lock debug kernel config options enabled when changing the mac address via iproute. [ 203.273059] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:281 [ 203.273065] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 6698, name: ip [ 203.273068] Preemption disabled at: [ 203.273068] [<ffffffffc04aaeab>] ice_set_mac_address+0x8b/0x1c0 [ice] [ 203.273097] CPU: 31 PID: 6698 Comm: ip Tainted: G S W I 5.14.0-rc4 #2 [ 203.273100] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0010.010620200716 01/06/2020 [ 203.273102] Call Trace: [ 203.273107] dump_stack_lvl+0x33/0x42 [ 203.273113] ? ice_set_mac_address+0x8b/0x1c0 [ice] [ 203.273124] ___might_sleep.cold.150+0xda/0xea [ 203.273131] mutex_lock+0x1c/0x40 [ 203.273136] ice_remove_mac+0xe3/0x180 [ice] [ 203.273155] ? ice_fltr_add_mac_list+0x20/0x20 [ice] [ 203.273175] ice_fltr_prepare_mac+0x43/0xa0 [ice] [ 203.273194] ice_set_mac_address+0xab/0x1c0 [ice] [ 203.273206] dev_set_mac_address+0xb8/0x120 [ 203.273210] dev_set_mac_address_user+0x2c/0x50 [ 203.273212] do_setlink+0x1dd/0x10e0 [ 203.273217] ? __nla_validate_parse+0x12d/0x1a0 [ 203.273221] __rtnl_newlink+0x530/0x910 [ 203.273224] ? __kmalloc_node_track_caller+0x17f/0x380 [ 203.273230] ? preempt_count_add+0x68/0xa0 [ 203.273236] ? _raw_spin_lock_irqsave+0x1f/0x30 [ 203.273241] ? kmem_cache_alloc_trace+0x4d/0x440 [ 203.273244] rtnl_newlink+0x43/0x60 [ 203.273245] rtnetlink_rcv_msg+0x13a/0x380 [ 203.273248] ? rtnl_calcit.isra.40+0x130/0x130 [ 203.273250] netlink_rcv_skb+0x4e/0x100 [ 203.273256] netlink_unicast+0x1a2/0x280 [ 203.273258] netlink_sendmsg+0x242/0x490 [ 203.273260] sock_sendmsg+0x58/0x60 [ 203.273263] ____sys_sendmsg+0x1ef/0x260 [ 203.273265] ? copy_msghdr_from_user+0x5c/0x90 [ 203.273268] ? ____sys_recvmsg+0xe6/0x170 [ 203.273270] ___sys_sendmsg+0x7c/0xc0 [ 203.273272] ? copy_msghdr_from_user+0x5c/0x90 [ 203.273274] ? ___sys_recvmsg+0x89/0xc0 [ 203.273276] ? __netlink_sendskb+0x50/0x50 [ 203.273278] ? mod_objcg_state+0xee/0x310 [ 203.273282] ? __dentry_kill+0x114/0x170 [ 203.273286] ? get_max_files+0x10/0x10 [ 203.273288] __sys_sendmsg+0x57/0xa0 [ 203.273290] do_syscall_64+0x37/0x80 [ 203.273295] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 203.273296] RIP: 0033:0x7f8edf96e278 [ 203.273298] Code: 89 02 48 c7 c0 ff ff ff ff eb b5 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 25 63 2c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 41 89 d4 55 [ 203.273300] RSP: 002b:00007ffcb8bdac08 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 203.273303] RAX: ffffffffffffffda RBX: 000000006115e0ae RCX: 00007f8edf96e278 [ 203.273304] RDX: 0000000000000000 RSI: 00007ffcb8bdac70 RDI: 0000000000000003 [ 203.273305] RBP: 0000000000000000 R08: 0000000000000001 R09: 00007ffcb8bda5b0 [ 203.273306] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001 [ 203.273306] R13: 0000555e10092020 R14: 0000000000000000 R15: 0000000000000005 Fix this by only locking when changing the netdev->dev_addr. Also, make sure to restore the old netdev->dev_addr on any failures. Fixes: 3ba7f53f8bf1 ("ice: don't remove netdev->dev_addr from uc sync list") Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-08-27ice: restart periodic outputs around time changesJacob Keller1-0/+49
When we enabled auxiliary input/output support for the E810 device, we forgot to add logic to restart the output when we change time. This is important as the periodic output will be incorrect after a time change otherwise. This unfortunately includes the adjust time function, even though it uses an atomic hardware interface. The atomic adjustment can still cause the pin output to stall permanently, so we need to stop and restart it. Introduce wrapper functions to temporarily disable and then re-enable the clock outputs. Fixes: 172db5f91d5f ("ice: add support for auxiliary input/output pins") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Sunitha D Mekala <sunithax.d.mekala@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-08-27igc: Add support for CBS offloadingAravindhan Gunasekaran5-1/+195
Implement support for Credit-based shaper(CBS) Qdisc hardware offload mode in the driver. There are two sets of IEEE802.1Qav (CBS) HW logic in i225 controller and this patch supports enabling them in the top two priority TX queues. Driver implemented as recommended by Foxville External Architecture Specification v0.993. Idleslope and Hi-credit are the CBS tunable parameters for i225 NIC, programmed in TQAVCC and TQAVHC registers respectively. In-order for IEEE802.1Qav (CBS) algorithm to work as intended and provide BW reservation CBS should be enabled in highest priority queue first. If we enable CBS on any of low priority queues, the traffic in high priority queue does not allow low priority queue to be selected for transmission and bandwidth reservation is not guaranteed. Signed-off-by: Aravindhan Gunasekaran <aravindhan.gunasekaran@intel.com> Signed-off-by: Mallikarjuna Chilakala <mallikarjuna.chilakala@intel.com> Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-08-27igc: Simplify TSN flags handlingVinicius Costa Gomes4-27/+43
Separates the procedure done during reset from applying a configuration, knowing when the code is executing allow us to separate the better what changes the hardware state from what changes only the driver state. Introduces a flag for bookkeeping the driver state of TSN features. When Qav and frame-preemption is also implemented this flag makes it easier to keep track on whether a TSN feature driver state is enabled or not though controller state changes, say, during a reset. Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Aravindhan Gunasekaran <aravindhan.gunasekaran@intel.com> Signed-off-by: Mallikarjuna Chilakala <mallikarjuna.chilakala@intel.com> Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-08-27igc: Use default cycle 'start' and 'end' values for queuesVinicius Costa Gomes2-22/+21
Sets default values for each queue cycle start and cycle end. This allows some simplification in the handling of these configurations as most TSN features in i225 require a cycle to be configured. In i225, cycle start and end time is required to be programmed for CBS to work properly. Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Aravindhan Gunasekaran <aravindhan.gunasekaran@intel.com> Signed-off-by: Mallikarjuna Chilakala <mallikarjuna.chilakala@intel.com> Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>