summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/intel
AgeCommit message (Collapse)AuthorFilesLines
2022-03-15ice: cleanup error logging for ice_ena_vfsJacob Keller1-13/+19
The ice_ena_vfs function and some of its sub-functions like ice_set_per_vf_res use a "if (<function>) { <print error> ; <exit> }" flow. This flow discards specialized errors reported by the called function. This style is generally not preferred as it makes tracing error sources more difficult. It also means we cannot log the actual error received properly. Refactor several calls in the ice_ena_vfs function that do this to catch the error in the 'ret' variable. Report this in the messages, and then return the more precise error value. Doing this reveals that ice_set_per_vf_res returns -EINVAL or -EIO in places where -ENOSPC makes more sense. Fix these calls up to return the more appropriate value. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-15ice: move ice_set_vf_port_vlan near other .ndo opsJacob Keller1-96/+96
The ice_set_vf_port_vlan function is located in ice_sriov.c very far away from the other .ndo operations that it is similar to. Move this so that its located near the other .ndo operation definitions. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-15ice: refactor spoofchk control code in ice_sriov.cJacob Keller1-11/+7
The API to control the VSI spoof checking for a VF VSI has three functions: enable, disable, and set. The set function takes the VSI and the VF and decides whether to call enable or disable based on the vf->spoofchk field. In some flows, vf->spoofchk is not yet set, such as the function used to control the setting for a VF. (vf->spoofchk is only updated after a success). Simplify this API by refactoring ice_vf_set_spoofchk_cfg to be "ice_vsi_apply_spoofchk" which takes the boolean and allows all callers to avoid having to determine whether to call enable or disable themselves. This matches the expected callers better, and will prevent the need to export more than one function when this code must be called from another file. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-15ice: rename ICE_MAX_VF_COUNT to avoid confusionJacob Keller3-7/+7
The ICE_MAX_VF_COUNT field is defined in ice_sriov.h. This count is true for SR-IOV but will not be true for all VF implementations, such as when the ice driver supports Scalable IOV. Rename this definition to clearly indicate ICE_MAX_SRIOV_VFS. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-15ice: remove unused definitions from ice_sriov.hJacob Keller2-7/+1
A few more macros exist in ice_sriov.h which are not used anywhere. These can be safely removed. Note that ICE_VIRTCHNL_VF_CAP_L2 capability is set but never checked anywhere in the driver. Thus it is also safe to remove. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-15ice: convert vf->vc_ops to a const pointerJacob Keller3-23/+55
The vc_ops structure is used to allow different handlers for virtchnl commands when the driver is in representor mode. The current implementation uses a copy of the ops table in each VF, and modifies this copy dynamically. The usual practice in kernel code is to store the ops table in a constant structure and point to different versions. This has a number of advantages: 1. Reduced memory usage. Each VF merely points to the correct table, so they're able to re-use the same constant lookup table in memory. 2. Consistency. It becomes more difficult to accidentally update or edit only one op call. Instead, the code switches to the correct able by a single pointer write. In general this is atomic, either the pointer is updated or its not. 3. Code Layout. The VF structure can store a pointer to the table without needing to have the full structure definition defined prior to the VF structure definition. This will aid in future refactoring of code by allowing the VF pointer to be kept in ice_vf_lib.h while the virtchnl ops table can be maintained in ice_virtchnl.h There is one major downside in the case of the vc_ops structure. Most of the operations in the table are the same between the two current implementations. This can appear to lead to duplication since each implementation must now fill in the complete table. It could make spotting the differences in the representor mode more challenging. Unfortunately, methods to make this less error prone either add complexity overhead (macros using CPP token concatenation) or don't work on all compilers we support (constant initializer from another constant structure). The cost of maintaining two structures does not out weigh the benefits of the constant table model. While we're making these changes, go ahead and rename the structure and implementations with "virtchnl" instead of "vc_vf_". This will more closely align with the planned file renaming, and avoid similar names when we later introduce a "vf ops" table for separating Scalable IOV and Single Root IOV implementations. Leave the accessor/assignment functions in order to avoid issues with compiling with options disabled. The interface makes it easier to handle when CONFIG_PCI_IOV is disabled in the kernel. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-15ice: remove circular header dependencies on ice.hJacob Keller14-11/+35
Several headers in the ice driver include ice.h even though they are themselves included by that header. The most notable of these is ice_common.h, but several other headers also do this. Such a recursive inclusion is problematic as it forces headers to be included in a strict order, otherwise compilation errors can result. The circular inclusions do not trigger an endless loop due to standard header inclusion guards, however other errors can occur. For example, ice_flow.h defines ice_rss_hash_cfg, which is used by ice_sriov.h as part of the definition of ice_vf_hash_ip_ctx. ice_flow.h includes ice_acl.h, which includes ice_common.h, and which finally includes ice.h. Since ice.h itself includes ice_sriov.h, this creates a circular dependency. The definition in ice_sriov.h requires things from ice_flow.h, but ice_flow.h itself will lead to trying to load ice_sriov.h as part of its process for expanding ice.h. The current code avoids this issue by having an implicit dependency without the include of ice_flow.h. If we were to fix that so that ice_sriov.h explicitly depends on ice_flow.h the following pattern would occur: ice_flow.h -> ice_acl.h -> ice_common.h -> ice.h -> ice_sriov.h At this point, during the expansion of, the header guard for ice_flow.h is already set, so when ice_sriov.h attempts to load the ice_flow.h header it is skipped. Then, we go on to begin including the rest of ice_sriov.h, including structure definitions which depend on ice_rss_hash_cfg. This produces a compiler warning because ice_rss_hash_cfg hasn't yet been included. Remember, we're just at the start of ice_flow.h! If the order of headers is incorrect (ice_flow.h is not implicitly loaded first in all files which include ice_sriov.h) then we get the same failure. Removing this recursive inclusion requires fixing a few cases where some headers depended on the header inclusions from ice.h. In addition, a few other changes are also required. Most notably, ice_hw_to_dev is implemented as a macro in ice_osdep.h, which is the likely reason that ice_common.h includes ice.h at all. This macro implementation requires the full definition of ice_pf in order to properly compile. Fix this by moving it to a function declared in ice_main.c, so that we do not require all files to depend on the layout of the ice_pf structure. Note that this change only fixes circular dependencies, but it does not fully resolve all implicit dependencies where one header may depend on the inclusion of another. I tried to fix as many of the implicit dependencies as I noticed, but fixing them all requires a somewhat tedious analysis of each header and attempting to compile it separately. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-15ice: rename ice_virtchnl_pf.c to ice_sriov.cJacob Keller7-8/+8
The ice_virtchnl_pf.c and ice_virtchnl_pf.h files are where most of the code for implementing Single Root IOV virtualization resides. This code includes support for bringing up and tearing down VFs, hooks into the kernel SR-IOV netdev operations, and for handling virtchnl messages from VFs. In the future, we plan to support Scalable IOV in addition to Single Root IOV as an alternative virtualization scheme. This implementation will re-use some but not all of the code in ice_virtchnl_pf.c To prepare for this future, we want to refactor and split up the code in ice_virtchnl_pf.c into the following scheme: * ice_vf_lib.[ch] Basic VF structures and accessors. This is where scheme-independent code will reside. * ice_virtchnl.[ch] Virtchnl message handling. This is where the bulk of the logic for processing messages from VFs using the virtchnl messaging scheme will reside. This is separated from ice_vf_lib.c because it is distinct and has a bulk of the processing code. * ice_sriov.[ch] Single Root IOV implementation, including initialization and the routines for interacting with SR-IOV based netdev operations. * (future) ice_siov.[ch] Scalable IOV implementation. As a first step, lets assume that all of the code in ice_virtchnl_pf.[ch] is for Single Root IOV. Rename this file to ice_sriov.c and its header to ice_sriov.h Future changes will further split out the code in these files following the plan outlined here. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-15ice: rename ice_sriov.c to ice_vf_mbx.cJacob Keller4-6/+6
The ice_sriov.c file primarily contains code which handles the logic for mailbox overflow detection and some other utility functions related to the virtualization mailbox. The bulk of the SR-IOV implementation is actually found in ice_virtchnl_pf.c, and this file isn't strictly SR-IOV specific. In the future, the ice driver will support an additional virtualization scheme known as Scalable IOV, and the code in this file will be used for this alternative implementation. Rename this file (and its associated header) to ice_vf_mbx.c, so that we can later re-use the ice_sriov.c file as the SR-IOV specific file. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-11ice: Support GTP-U and GTP-C offload in switchdevMarcin Szycik8-24/+765
Add support for creating filters for GTP-U and GTP-C in switchdev mode. Add support for parsing GTP-specific options (QFI and PDU type) and TEID. By default, a filter for GTP-U will be added. To add a filter for GTP-C, specify enc_dst_port = 2123, e.g.: tc filter add dev $GTP0 ingress prio 1 flower enc_key_id 1337 \ enc_dst_port 2123 action mirred egress redirect dev $VF1_PR Note: GTP-U with outer IPv6 offload is not supported yet. Note: GTP-U with no payload offload is not supported yet. Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-11ice: Fix FV offset searchingMichal Swiatkowski3-51/+12
Checking only protocol ids while searching for correct FVs can lead to a situation, when incorrect FV will be added to the list. Incorrect means that FV has correct protocol id but incorrect offset. Call ice_get_sw_fv_list with ice_prot_lkup_ext struct which contains all protocol ids with offsets. With this modification allocating and collecting protocol ids list is not longer needed. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-11Merge branch '100GbE' of ↵Jakub Kicinski8-28/+370
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== 100GbE Intel Wired LAN Driver Updates 2022-03-09 This series contains updates to ice driver only. Martyna implements switchdev filtering on inner EtherType field for tunnels. Marcin adds reporting of slowpath statistics for port representors. Jonathan Toppins changes a non-fatal link error message from warning to debug. Maciej removes unnecessary checks in ice_clean_tx_irq(). Amritha adds support for ADQ to match outer destination MAC for tunnels. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: ice: Add support for outer dest MAC for ADQ tunnels ice: avoid XDP checks in ice_clean_tx_irq() ice: change "can't set link" message to dbg level ice: Add slow path offload stats on port representor in switchdev ice: Add support for inner etype in switchdev ==================== Link: https://lore.kernel.org/r/20220309190315.1380414-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski11-103/+97
net/dsa/dsa2.c commit afb3cc1a397d ("net: dsa: unlock the rtnl_mutex when dsa_master_setup() fails") commit e83d56537859 ("net: dsa: replay master state events in dsa_tree_{setup,teardown}_master") https://lore.kernel.org/all/20220307101436.7ae87da0@canb.auug.org.au/ drivers/net/ethernet/intel/ice/ice.h commit 97b0129146b1 ("ice: Fix error with handling of bonding MTU") commit 43113ff73453 ("ice: add TTY for GNSS module for E810T device") https://lore.kernel.org/all/20220310112843.3233bcf1@canb.auug.org.au/ drivers/staging/gdm724x/gdm_lte.c commit fc7f750dc9d1 ("staging: gdm724x: fix use after free in gdm_lte_rx()") commit 4bcc4249b4cf ("staging: Use netif_rx().") https://lore.kernel.org/all/20220308111043.1018a59d@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-11ice: Fix race condition during interface enslaveIvan Vecera2-2/+21
Commit 5dbbbd01cbba83 ("ice: Avoid RTNL lock when re-creating auxiliary device") changes a process of re-creation of aux device so ice_plug_aux_dev() is called from ice_service_task() context. This unfortunately opens a race window that can result in dead-lock when interface has left LAG and immediately enters LAG again. Reproducer: ``` #!/bin/sh ip link add lag0 type bond mode 1 miimon 100 ip link set lag0 for n in {1..10}; do echo Cycle: $n ip link set ens7f0 master lag0 sleep 1 ip link set ens7f0 nomaster done ``` This results in: [20976.208697] Workqueue: ice ice_service_task [ice] [20976.213422] Call Trace: [20976.215871] __schedule+0x2d1/0x830 [20976.219364] schedule+0x35/0xa0 [20976.222510] schedule_preempt_disabled+0xa/0x10 [20976.227043] __mutex_lock.isra.7+0x310/0x420 [20976.235071] enum_all_gids_of_dev_cb+0x1c/0x100 [ib_core] [20976.251215] ib_enum_roce_netdev+0xa4/0xe0 [ib_core] [20976.256192] ib_cache_setup_one+0x33/0xa0 [ib_core] [20976.261079] ib_register_device+0x40d/0x580 [ib_core] [20976.266139] irdma_ib_register_device+0x129/0x250 [irdma] [20976.281409] irdma_probe+0x2c1/0x360 [irdma] [20976.285691] auxiliary_bus_probe+0x45/0x70 [20976.289790] really_probe+0x1f2/0x480 [20976.298509] driver_probe_device+0x49/0xc0 [20976.302609] bus_for_each_drv+0x79/0xc0 [20976.306448] __device_attach+0xdc/0x160 [20976.310286] bus_probe_device+0x9d/0xb0 [20976.314128] device_add+0x43c/0x890 [20976.321287] __auxiliary_device_add+0x43/0x60 [20976.325644] ice_plug_aux_dev+0xb2/0x100 [ice] [20976.330109] ice_service_task+0xd0c/0xed0 [ice] [20976.342591] process_one_work+0x1a7/0x360 [20976.350536] worker_thread+0x30/0x390 [20976.358128] kthread+0x10a/0x120 [20976.365547] ret_from_fork+0x1f/0x40 ... [20976.438030] task:ip state:D stack: 0 pid:213658 ppid:213627 flags:0x00004084 [20976.446469] Call Trace: [20976.448921] __schedule+0x2d1/0x830 [20976.452414] schedule+0x35/0xa0 [20976.455559] schedule_preempt_disabled+0xa/0x10 [20976.460090] __mutex_lock.isra.7+0x310/0x420 [20976.464364] device_del+0x36/0x3c0 [20976.467772] ice_unplug_aux_dev+0x1a/0x40 [ice] [20976.472313] ice_lag_event_handler+0x2a2/0x520 [ice] [20976.477288] notifier_call_chain+0x47/0x70 [20976.481386] __netdev_upper_dev_link+0x18b/0x280 [20976.489845] bond_enslave+0xe05/0x1790 [bonding] [20976.494475] do_setlink+0x336/0xf50 [20976.502517] __rtnl_newlink+0x529/0x8b0 [20976.543441] rtnl_newlink+0x43/0x60 [20976.546934] rtnetlink_rcv_msg+0x2b1/0x360 [20976.559238] netlink_rcv_skb+0x4c/0x120 [20976.563079] netlink_unicast+0x196/0x230 [20976.567005] netlink_sendmsg+0x204/0x3d0 [20976.570930] sock_sendmsg+0x4c/0x50 [20976.574423] ____sys_sendmsg+0x1eb/0x250 [20976.586807] ___sys_sendmsg+0x7c/0xc0 [20976.606353] __sys_sendmsg+0x57/0xa0 [20976.609930] do_syscall_64+0x5b/0x1a0 [20976.613598] entry_SYSCALL_64_after_hwframe+0x65/0xca 1. Command 'ip link ... set nomaster' causes that ice_plug_aux_dev() is called from ice_service_task() context, aux device is created and associated device->lock is taken. 2. Command 'ip link ... set master...' calls ice's notifier under RTNL lock and that notifier calls ice_unplug_aux_dev(). That function tries to take aux device->lock but this is already taken by ice_plug_aux_dev() in step 1 3. Later ice_plug_aux_dev() tries to take RTNL lock but this is already taken in step 2 4. Dead-lock The patch fixes this issue by following changes: - Bit ICE_FLAG_PLUG_AUX_DEV is kept to be set during ice_plug_aux_dev() call in ice_service_task() - The bit is checked in ice_clear_rdma_cap() and only if it is not set then ice_unplug_aux_dev() is called. If it is set (in other words plugging of aux device was requested and ice_plug_aux_dev() is potentially running) then the function only clears the bit - Once ice_plug_aux_dev() call (in ice_service_task) is finished the bit ICE_FLAG_PLUG_AUX_DEV is cleared but it is also checked whether it was already cleared by ice_clear_rdma_cap(). If so then aux device is unplugged. Signed-off-by: Ivan Vecera <ivecera@redhat.com> Co-developed-by: Petr Oros <poros@redhat.com> Signed-off-by: Petr Oros <poros@redhat.com> Reviewed-by: Dave Ertman <david.m.ertman@intel.com> Link: https://lore.kernel.org/r/20220310171641.3863659-1-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-10e1000e: Print PHY register address when MDI read/write failsChen Yu1-4/+4
There is occasional suspend error from e1000e which blocks the system from further suspending. And the issue was found on a WhiskeyLake-U platform with I219-V: [ 20.078957] PM: pci_pm_suspend(): e1000e_pm_suspend+0x0/0x780 [e1000e] returns -2 [ 20.078970] PM: dpm_run_callback(): pci_pm_suspend+0x0/0x170 returns -2 [ 20.078974] e1000e 0000:00:1f.6: PM: pci_pm_suspend+0x0/0x170 returned -2 after 371012 usecs [ 20.078978] e1000e 0000:00:1f.6: PM: failed to suspend async: error -2 According to the code flow, this might be caused by broken MDI read/write to PHY registers. However currently the code does not tell us which register is broken. Thus enhance the debug information to print the offender PHY register. So the next the issue is reproduced, this information could be used for narrow down. Acked-by: Paul Menzel <pmenzel@molgen.mpg.de> Reported-by: Todd Brandt <todd.e.brandt@intel.com> Signed-off-by: Chen Yu <yu.c.chen@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20220308172030.451566-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-09ice: Add support for outer dest MAC for ADQ tunnelsAmritha Nambiar1-4/+28
TC flower does not support matching on user specified outer MAC address for tunnels. For ADQ tunnels, the driver adds outer destination MAC address as lower netdev's active unicast MAC address to filter out packets with unrelated MAC address being delivered to ADQ VSIs. Example: - create tunnel device ip l add $VXLAN_DEV type vxlan id $VXLAN_VNI dstport $VXLAN_PORT \ dev $PF - add TC filter (in ADQ mode) $tc filter add dev $VXLAN_DEV protocol ip parent ffff: flower \ dst_ip $INNER_DST_IP ip_proto tcp dst_port $INNER_DST_PORT \ enc_key_id $VXLAN_VNI hw_tc $ADQ_TC Note: Filters with wild-card tunnel ID (when user does not specify tunnel key) are also supported. Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com> Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-09ice: avoid XDP checks in ice_clean_tx_irq()Maciej Fijalkowski1-6/+1
Commit 9610bd988df9 ("ice: optimize XDP_TX workloads") introduced Tx IRQ cleaning routine dedicated for XDP rings. Currently it is impossible to call ice_clean_tx_irq() against XDP ring, so it is safe to drop ice_ring_is_xdp() calls in there. Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-09ice: change "can't set link" message to dbg levelJonathan Toppins1-3/+3
In the case where the link is owned by manageability, the firmware is not allowed to set the link state, so an error code is returned. This however is non-fatal and there is nothing the operator can do, so instead of confusing the operator with messages they can do nothing about hide this message behind the debug log level. Signed-off-by: Jonathan Toppins <jtoppins@redhat.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-09ice: Add slow path offload stats on port representor in switchdevMarcin Szycik3-3/+61
Implement callbacks to check for stats and fetch port representor stats. Stats are taken from RX/TX ring corresponding to port representor and show the number of bytes/packets that were not offloaded. To see slow path stats run: ifstat -x cpu_hits -a Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-09ice: Add support for inner etype in switchdevMartyna Szapar-Mudlaw3-12/+277
Enable support for adding TC rules that filter on the inner EtherType field of tunneled packet headers. Signed-off-by: Martyna Szapar-Mudlaw <martyna.szapar-mudlaw@intel.com> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-09ice: Fix curr_link_speed advertised speedJedrzej Jagielski1-1/+1
Change curr_link_speed advertised speed, due to link_info.link_speed is not equal phy.curr_user_speed_req. Without this patch it is impossible to set advertised speed to same as link_speed. Testing Hints: Try to set advertised speed to 25G only with 25G default link (use ethtool -s 0x80000000) Fixes: 48cb27f2fd18 ("ice: Implement handlers for ethtool PHY/link operations") Signed-off-by: Grzegorz Siwik <grzegorz.siwik@intel.com> Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-09ice: Don't use GFP_KERNEL in atomic contextChristophe JAILLET1-1/+1
ice_misc_intr() is an irq handler. It should not sleep. Use GFP_ATOMIC instead of GFP_KERNEL when allocating some memory. Fixes: 348048e724a0 ("ice: Implement iidc operations") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Tested-by: Leszek Kaliszczuk <leszek.kaliszczuk@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-09ice: Fix error with handling of bonding MTUDave Ertman2-15/+15
When a bonded interface is destroyed, .ndo_change_mtu can be called during the tear-down process while the RTNL lock is held. This is a problem since the auxiliary driver linked to the LAN driver needs to be notified of the MTU change, and this requires grabbing a device_lock on the auxiliary_device's dev. Currently this is being attempted in the same execution context as the call to .ndo_change_mtu which is causing a dead-lock. Move the notification of the changed MTU to a separate execution context (watchdog service task) and eliminate the "before" notification. Fixes: 348048e724a0e ("ice: Implement iidc operations") Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Jonathan Toppins <jtoppins@redhat.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-09ice: stop disabling VFs due to PF error responsesJacob Keller2-21/+0
The ice_vc_send_msg_to_vf function has logic to detect "failure" responses being sent to a VF. If a VF is sent more than ICE_DFLT_NUM_INVAL_MSGS_ALLOWED then the VF is marked as disabled. Almost identical logic also existed in the i40e driver. This logic was added to the ice driver in commit 1071a8358a28 ("ice: Implement virtchnl commands for AVF support") which itself copied from the i40e implementation in commit 5c3c48ac6bf5 ("i40e: implement virtual device interface"). Neither commit provides a proper explanation or justification of the check. In fact, later commits to i40e changed the logic to allow bypassing the check in some specific instances. The "logic" for this seems to be that error responses somehow indicate a malicious VF. This is not really true. The PF might be sending an error for any number of reasons such as lack of resources, etc. Additionally, this causes the PF to log an info message for every failed VF response which may confuse users, and can spam the kernel log. This behavior is not documented as part of any requirement for our products and other operating system drivers such as the FreeBSD implementation of our drivers do not include this type of check. In fact, the change from dev_err to dev_info in i40e commit 18b7af57d9c1 ("i40e: Lower some message levels") explains that these messages typically don't actually indicate a real issue. It is quite likely that a user who hits this in practice will be very confused as the VF will be disabled without an obvious way to recover. We already have robust malicious driver detection logic using actual hardware detection mechanisms that detect and prevent invalid device usage. Remove the logic since its not a documented requirement and the behavior is not intuitive. Fixes: 1071a8358a28 ("ice: Implement virtchnl commands for AVF support") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-09i40e: stop disabling VFs due to PF error responsesJacob Keller3-59/+9
The i40e_vc_send_msg_to_vf_ex (and its wrapper i40e_vc_send_msg_to_vf) function has logic to detect "failure" responses sent to the VF. If a VF is sent more than I40E_DEFAULT_NUM_INVALID_MSGS_ALLOWED, then the VF is marked as disabled. In either case, a dev_info message is printed stating that a VF opcode failed. This logic originates from the early implementation of VF support in commit 5c3c48ac6bf5 ("i40e: implement virtual device interface"). That commit did not go far enough. The "logic" for this behavior seems to be that error responses somehow indicate a malicious VF. This is not really true. The PF might be sending an error for any number of reasons such as lacking resources, an unsupported operation, etc. This does not indicate a malicious VF. We already have a separate robust malicious VF detection which relies on hardware logic to detect and prevent a variety of behaviors. There is no justification for this behavior in the original implementation. In fact, a later commit 18b7af57d9c1 ("i40e: Lower some message levels") reduced the opcode failure message from a dev_err to a dev_info. In addition, recent commit 01cbf50877e6 ("i40e: Fix to not show opcode msg on unsuccessful VF MAC change") changed the logic to allow quieting it for expected failures. That commit prevented this logic from kicking in for specific circumstances. This change did not go far enough. The behavior is not documented nor is it part of any requirement for our products. Other operating systems such as the FreeBSD implementation of our driver do not include this logic. It is clear this check does not make sense, and causes problems which led to ugly workarounds. Fix this by just removing the entire logic and the need for the i40e_vc_send_msg_to_vf_ex function. Fixes: 01cbf50877e6 ("i40e: Fix to not show opcode msg on unsuccessful VF MAC change") Fixes: 5c3c48ac6bf5 ("i40e: implement virtual device interface") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-09iavf: Fix adopting new combined settingMichal Maloszewski2-4/+10
In some cases overloaded flag IAVF_FLAG_REINIT_ITR_NEEDED which should indicate that interrupts need to be completely reinitialized during reset leads to RTNL deadlocks using ethtool -C while a reset is in progress. To fix, it was added a new flag IAVF_FLAG_REINIT_MSIX_NEEDED used to trigger MSI-X reinit. New combined setting is fixed adopt after VF reset. This has been implemented by call reinit interrupt scheme during VF reset. Without this fix new combined setting has never been adopted. Fixes: 209f2f9c7181 ("iavf: Add support for VIRTCHNL_VF_OFFLOAD_VLAN_V2 negotiation") Signed-off-by: Grzegorz Szczurek <grzegorzx.szczurek@intel.com> Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Signed-off-by: Michal Maloszewski <michal.maloszewski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-08iavf: Fix handling of vlan strip virtual channel messagesMichal Maloszewski1-0/+40
Modify netdev->features for vlan stripping based on virtual channel messages received from the PF. Change is needed to synchronize vlan strip status between PF sysfs and iavf ethtool. Fixes: 5951a2b9812d ("iavf: Fix VLAN feature flags after VFR") Signed-off-by: Norbert Ciosek <norbertx.ciosek@intel.com> Signed-off-by: Michal Maloszewski <michal.maloszewski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-08ixgbevf: add disable link stateSlawomir Mrozowicz5-1/+57
Add possibility to disable link state if it is administratively disabled in PF. It is part of the general functionality that allows the PF driver to control the state of the virtual link VF devices. Signed-off-by: Slawomir Mrozowicz <slawomirx.mrozowicz@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-08ixgbe: add improvement for MDD response functionalitySlawomir Mrozowicz3-1/+52
The 82599 PF driver disable VF driver after a special MDD event occurs. Adds the option for administrators to control whether VFs are automatically disabled after several MDD events. The automatically disabling is now the default mode for 82599 PF driver, as it is more reliable. This addresses CVE-2021-33061. Signed-off-by: Slawomir Mrozowicz <slawomirx.mrozowicz@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-08ixgbe: add the ability for the PF to disable VF link stateSlawomir Mrozowicz5-44/+182
Add support for ndo_set_vf_link_state the Network Device Option that allows the PF driver to control the virtual link state of the VF devices. Without this change a VF cannot be disabled/enabled by the administrator. In the implementation the auto state takes over PF link state to VF link setting, the enable state is not supported, the disable state shut off the VF link regardless of the PF setting. Signed-off-by: Slawomir Mrozowicz <slawomirx.mrozowicz@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-08ice: xsk: fix GCC version checking against pragma unroll presenceMaciej Fijalkowski1-1/+1
Pragma unroll was introduced around GCC 8, whereas current xsk code in ice that prepares loop_unrolled_for macro that is based on mentioned pragma, compares GCC version against 4, which is wrong and Stephen found this out by compiling kernel with GCC 5.4 [0]. Fix this mistake and check if GCC version is >= 8. [0]: https://lore.kernel.org/netdev/20220307213659.47658125@canb.auug.org.au/ Fixes: 126cdfe1007a ("ice: xsk: Improve AF_XDP ZC Tx and use batching API") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/r/20220307231353.56638-1-maciej.fijalkowski@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-04Merge branch '100GbE' of ↵David S. Miller13-554/+872
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== 100GbE Intel Wired LAN Driver Updates 2022-03-03 Jacob Keller says: This series refactors the ice networking driver VF storage from a simple static array to a hash table. It also introduces krefs and proper locking and protection to prevent common use-after-free and concurrency issues. There are two motivations for this work. First is to make the ice driver more resilient by preventing a whole class of use-after-free bugs that can occur around concurrent access to VF structures while removing VFs. The second is to prepare the ice driver for future virtualization work to support Scalable IOV, an alternative VF implementation compared to Single Root IOV. The new VF implementation will allow for more dynamic VF creation and removal, necessitating a more robust implementation for VF storage that can't rely on the existing mechanisms to prevent concurrent access violations. The first few patches are cleanup and preparatory work needed to make the conversion to the hash table safe. Following this preparatory work is a patch to migrate the VF structures and variables to a new sub-structure for code clarity. Next introduce new interface functions to abstract the VF storage. Finally, the driver is actually converted to the hash table and kref implementation. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-03ice: convert VF storage to hash table with krefs and RCUJacob Keller8-125/+363
The ice driver stores VF structures in a simple array which is allocated once at the time of VF creation. The VF structures are then accessed from the array by their VF ID. The ID must be between 0 and the number of allocated VFs. Multiple threads can access this table: * .ndo operations such as .ndo_get_vf_cfg or .ndo_set_vf_trust * interrupts, such as due to messages from the VF using the virtchnl communication * processing such as device reset * commands to add or remove VFs The current implementation does not keep track of when all threads are done operating on a VF and can potentially result in use-after-free issues caused by one thread accessing a VF structure after it has been released when removing VFs. Some of these are prevented with various state flags and checks. In addition, this structure is quite static and does not support a planned future where virtualization can be more dynamic. As we begin to look at supporting Scalable IOV with the ice driver (as opposed to just supporting Single Root IOV), this structure is not sufficient. In the future, VFs will be able to be added and removed individually and dynamically. To allow for this, and to better protect against a whole class of use-after-free bugs, replace the VF storage with a combination of a hash table and krefs to reference track all of the accesses to VFs through the hash table. A hash table still allows efficient look up of the VF given its ID, but also allows adding and removing VFs. It does not require contiguous VF IDs. The use of krefs allows the cleanup of the VF memory to be delayed until after all threads have released their reference (by calling ice_put_vf). To prevent corruption of the hash table, a combination of RCU and the mutex table_lock are used. Addition and removal from the hash table use the RCU-aware hash macros. This allows simple read-only look ups that iterate to locate a single VF can be fast using RCU. Accesses which modify the hash table, or which can't take RCU because they sleep, will hold the mutex lock. By using this design, we have a stronger guarantee that the VF structure can't be released until after all threads are finished operating on it. We also pave the way for the more dynamic Scalable IOV implementation in the future. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski9-83/+152
net/batman-adv/hard-interface.c commit 690bb6fb64f5 ("batman-adv: Request iflink once in batadv-on-batadv check") commit 6ee3c393eeb7 ("batman-adv: Demote batadv-on-batadv skip error message") https://lore.kernel.org/all/20220302163049.101957-1-sw@simonwunderlich.de/ net/smc/af_smc.c commit 4d08b7b57ece ("net/smc: Fix cleanup when register ULP fails") commit 462791bbfa35 ("net/smc: add sysctl interface for SMC") https://lore.kernel.org/all/20220302112209.355def40@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-03ice: introduce VF accessor functionsJacob Keller5-53/+115
Before we switch the VF data structure storage mechanism to a hash, introduce new accessor functions to define the new interface. * ice_get_vf_by_id is a function used to obtain a reference to a VF from the table based on its VF ID * ice_has_vfs is used to quickly check if any VFs are configured * ice_get_num_vfs is used to get an exact count of how many VFs are configured We can drop the old ice_validate_vf_id function, since every caller was just going to immediately access the VF table to get a reference anyways. This way we simply use the single ice_get_vf_by_id to both validate the VF ID is within range and that there exists a VF with that ID. This change enables us to more easily convert the codebase to the hash table since most callers now properly use the interface. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-03ice: factor VF variables to separate structureJacob Keller7-68/+83
We maintain a number of values for VFs within the ice_pf structure. This includes the VF table, the number of allocated VFs, the maximum number of supported SR-IOV VFs, the number of queue pairs per VF, the number of MSI-X vectors per VF, and a bitmap of the VFs with detected MDD events. We're about to add a few more variables to this list. Clean this up first by extracting these members out into a new ice_vfs structure defined in ice_virtchnl_pf.h Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-03ice: convert ice_for_each_vf to include VF entry iteratorJacob Keller8-152/+163
The ice_for_each_vf macro is intended to be used to loop over all VFs. The current implementation relies on an iterator that is the index into the VF array in the PF structure. This forces all users to perform a look up themselves. This abstraction forces a lot of duplicate work on callers and leaks the interface implementation to the caller. Replace this with an implementation that includes the VF pointer the primary iterator. This version simplifies callers which just want to iterate over every VF, as they no longer need to perform their own lookup. The "i" iterator value is replaced with a new unsigned int "bkt" parameter, as this will match the necessary interface for replacing the VF array with a hash table. For now, the bkt is the VF ID, but in the future it will simply be the hash bucket index. Document that it should not be treated as a VF ID. This change aims to simplify switching from the array to a hash table. I considered alternative implementations such as an xarray but decided that the hash table was the simplest and most suitable implementation. I also looked at methods to hide the bkt iterator entirely, but I couldn't come up with a feasible solution that worked for hash table iterators. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-03ice: use ice_for_each_vf for iteration during removalJacob Keller1-5/+4
When removing VFs, the driver takes a weird approach of assigning pf->num_alloc_vfs to 0 before iterating over the VFs using a temporary variable. This logic has been in the driver for a long time, and seems to have been carried forward from i40e. We want to refactor the way VFs are stored, and iterating over the data structure without the ice_for_each_vf interface impedes this work. The logic relies on implicitly using the num_alloc_vfs as a sort of "safe guard" for accessing VF data. While this sort of guard makes sense for Single Root IOV where all VFs are added at once, the data structures don't work for VFs which can be added and removed dynamically. We also have a separate state flag, ICE_VF_DEINIT_IN_PROGRESS which is a stronger protection against concurrent removal and access. Avoid the custom tmp iteration and replace it with the standard ice_for_each_vf iterator. Delay the assignment of num_alloc_vfs until after this loop finishes. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-03ice: remove checks in ice_vc_send_msg_to_vfJacob Keller1-8/+3
The ice_vc_send_msg_to_vf function is used by the PF to send a response to a VF. This function has overzealous checks to ensure its not passed a NULL VF pointer and to ensure that the passed in struct ice_vf has a valid vf_id sub-member. These checks have existed since commit 1071a8358a28 ("ice: Implement virtchnl commands for AVF support") and function as simple sanity checks. We are planning to refactor the ice driver to use a hash table along with appropriate locks in a future refactor. This change will modify how the ice_validate_vf_id function works. Instead of a simple >= check to ensure the VF ID is between some range, it will check the hash table to see if the specified VF ID is actually in the table. This requires that the function properly lock the table to prevent race conditions. The checks may seem ok at first glance, but they don't really provide much benefit. In order for ice_vc_send_msg_to_vf to have these checks fail, the callers must either (1) pass NULL as the VF, (2) construct an invalid VF pointer manually, or (3) be using a VF pointer which becomes invalid after they obtain it properly using ice_get_vf_by_id. For (1), a cursory glance over callers of ice_vc_send_msg_to_vf can show that in most cases the functions already operate assuming their VF pointer is valid, such as by derferencing vf->pf or other members. They obtain the VF pointer by accessing the VF array using the VF ID, which can never produce a NULL value (since its a simple address operation on the array it will not be NULL. The sole exception for (1) is that ice_vc_process_vf_msg will forward a NULL VF pointer to this function as part of its goto error handler logic. This requires some minor cleanup to simply exit immediately when an invalid VF ID is detected (Rather than use the same error flow as the rest of the function). For (2), it is unexpected for a flow to construct a VF pointer manually instead of accessing the VF array. Defending against this is likely to just hide bad programming. For (3), it is definitely true that VF pointers could become invalid, for example if a thread is processing a VF message while the VF gets removed. However, the correct solution is not to add additional checks like this which do not guarantee to prevent the race. Instead we plan to solve the root of the problem by preventing the possibility entirely. This solution will require the change to a hash table with proper locking and reference counts of the VF structures. When this is done, ice_validate_vf_id will require locking of the hash table. This will be problematic because all of the callers of ice_vc_send_msg_to_vf will already have to take the lock to obtain the VF pointer anyways. With a mutex, this leads to a double lock that could hang the kernel thread. Avoid this by removing the checks which don't provide much value, so that we can safely add the necessary protections properly. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-03ice: move VFLR acknowledge during ice_free_vfsJacob Keller1-19/+8
After removing all VFs, the driver clears the VFLR indication for VFs. This has been in ice since the beginning of SR-IOV support in the ice driver. The implementation was copied from i40e, and the motivation for the VFLR indication clearing is described in the commit f7414531a0cf ("i40e: acknowledge VFLR when disabling SR-IOV") The commit explains that we need to clear the VFLR indication because the virtual function undergoes a VFLR event. If we don't indicate that it is complete it can cause an issue when VFs are re-enabled due to a "phantom" VFLR. The register block read was added under a pci_vfs_assigned check originally. This was done because we added the check after calling pci_disable_sriov. This was later moved to disable SRIOV earlier in the flow so that the VF drivers could be torn down before we removed functionality. Move the VFLR acknowledge into the main loop that tears down VF resources. This avoids using the tmp value for iterating over VFs multiple times. The result will make it easier to refactor the VF array in a future change. It's possible we might want to modify this flow to also stop checking pci_vfs_assigned. However, it seems reasonable to keep this change: we should only clear the VFLR if we actually disabled SR-IOV. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-03ice: move clear_malvf call in ice_free_vfsJacob Keller1-7/+6
The ice_mbx_clear_malvf function is used to clear the indication and count of how many times a VF was detected as malicious. During ice_free_vfs, we use this function to ensure that all removed VFs are reset to a clean state. The call currently is done at the end of ice_free_vfs() using a tmp value to iterate over all of the entries in the bitmap. This separate iteration using tmp is problematic for a planned refactor of the VF array data structure. To avoid this, lets move the call slightly higher into the function inside the loop where we teardown all of the VFs. This avoids one use of the tmp value used for iteration. We'll fix the other user in a future change. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-03ice: pass num_vfs to ice_set_per_vf_res()Jacob Keller1-61/+26
We are planning to replace the simple array structure tracking VFs with a hash table. This change will also remove the "num_alloc_vfs" variable. Instead, new access functions to use the hash table as the source of truth will be introduced. These will generally be equivalent to existing checks, except during VF initialization. Specifically, ice_set_per_vf_res() cannot use the hash table as it will be operating prior to VF structures being inserted into the hash table. Instead of using pf->num_alloc_vfs, simply pass the num_vfs value in from the caller. Note that a sub-function of ice_set_per_vf_res, ice_determine_res, also implicitly depends on pf->num_alloc_vfs. Replace ice_determine_res with a simpler inline implementation based on rounddown_pow_of_two. Note that we must explicitly check that the argument is non-zero since it does not play well with zero as a value. Instead of using the function and while loop, simply calculate the number of queues we have available by dividing by num_vfs. Check if the desired queues are available. If not, round down to the nearest power of 2 that fits within our available queues. This matches the behavior of ice_determine_res but is easier to follow as simple in-line logic. Remove ice_determine_res entirely. With this change, we no longer depend on the pf->num_alloc_vfs during the initialization phase of VFs. This will allow us to safely remove it in a future planned refactor of the VF data structures. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-03ice: store VF pointer instead of VF IDJacob Keller10-95/+142
The VSI structure contains a vf_id field used to associate a VSI with a VF. This is used mainly for ICE_VSI_VF as well as partially for ICE_VSI_CTRL associated with the VFs. This API was designed with the idea that VFs are stored in a simple array that was expected to be static throughout most of the driver's life. We plan on refactoring VF storage in a few key ways: 1) converting from a simple static array to a hash table 2) using krefs to track VF references obtained from the hash table 3) use RCU to delay release of VF memory until after all references are dropped This is motivated by the goal to ensure that the lifetime of VF structures is accounted for, and prevent various use-after-free bugs. With the existing vsi->vf_id, the reference tracking for VFs would become somewhat convoluted, because each VSI maintains a vf_id field which will then require performing a look up. This means all these flows will require reference tracking and proper usage of rcu_read_lock, etc. We know that the VF VSI will always be backed by a valid VF structure, because the VSI is created during VF initialization and removed before the VF is destroyed. Rely on this and store a reference to the VF in the VSI structure instead of storing a VF ID. This will simplify the usage and avoid the need to perform lookups on the hash table in the future. For ICE_VSI_VF, it is expected that vsi->vf is always non-NULL after ice_vsi_alloc succeeds. Because of this, use WARN_ON when checking if a vsi->vf pointer is valid when dealing with VF VSIs. This will aid in debugging code which violates this assumption and avoid more disastrous panics. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-03ice: refactor unwind cleanup in eswitch modeJacob Keller2-55/+53
The code for supporting eswitch mode and port representors on VFs uses an unwind based cleanup flow when handling errors. These flows are used to cleanup and get everything back to the state prior to attempting to switch from legacy to representor mode or back. The unwind iterations make sense, but complicate a plan to refactor the VF array structure. In the future we won't have a clean method of reversing an iteration of the VFs. Instead, we can change the cleanup flow to just iterate over all VF structures and clean up appropriately. First notice that ice_repr_add_for_all_vfs and ice_repr_rem_from_all_vfs have an additional step of re-assigning the VC ops. There is no good reason to do this outside of ice_repr_add and ice_repr_rem. It can simply be done as the last step of these functions. Second, make sure ice_repr_rem is safe to call on a VF which does not have a representor. Check if vf->repr is NULL first and exit early if so. Move ice_repr_rem_from_all_vfs above ice_repr_add_for_all_vfs so that we can call it from the cleanup function. In ice_eswitch.c, replace the unwind iteration with a call to ice_eswitch_release_reprs. This will go through all of the VFs and revert the VF back to the standard model without the eswitch mode. To make this safe, ensure this function checks whether or not the represent or has been moved. Rely on the metadata destination in vf->repr->dst. This must be NULL if the representor has not been moved to eswitch mode. Ensure that we always re-assign this value back to NULL after freeing it, and move the ice_eswitch_release_reprs so that it can be called from the setup function. With these changes, eswitch cleanup no longer uses an unwind flow that is problematic for the planned VF data structure change. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-03ixgbe: xsk: change !netif_carrier_ok() handling in ixgbe_xmit_zc()Maciej Fijalkowski1-2/+4
Commit c685c69fba71 ("ixgbe: don't do any AF_XDP zero-copy transmit if netif is not OK") addressed the ring transient state when MEM_TYPE_XSK_BUFF_POOL was being configured which in turn caused the interface to through down/up. Maurice reported that when carrier is not ok and xsk_pool is present on ring pair, ksoftirqd will consume 100% CPU cycles due to the constant NAPI rescheduling as ixgbe_poll() states that there is still some work to be done. To fix this, do not set work_done to false for a !netif_carrier_ok(). Fixes: c685c69fba71 ("ixgbe: don't do any AF_XDP zero-copy transmit if netif is not OK") Reported-by: Maurice Baijens <maurice.baijens@ellips.com> Tested-by: Maurice Baijens <maurice.baijens@ellips.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-03ice: add TTY for GNSS module for E810T deviceKarol Kolacinski11-1/+566
Add a new ice_gnss.c file for holding the basic GNSS module functions. If the device supports GNSS module, call the new ice_gnss_init and ice_gnss_release functions where appropriate. Implement basic functionality for reading the data from GNSS module using TTY device. Add I2C read AQ command. It is now required for controlling the external physical connectors via external I2C port expander on E810-T adapters. Future changes will introduce write functionality. Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Signed-off-by: Sudhansu Sekhar Mishra <sudhansu.mishra@intel.com> Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-01iavf: Remove non-inclusive languageMateusz Palczewski3-4/+4
Remove non-inclusive language from the iavf driver. Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-01iavf: Fix incorrect use of assigning iavf_status to intMateusz Palczewski1-92/+68
Currently there are functions in iavf_virtchnl.c for polling specific virtchnl receive events. These are all assigning iavf_status values to int values. Fix this and explicitly assign int values if iavf_status is not IAVF_SUCCESS. Also, refactor a small amount of duplicated code that can be reused by all of the previously mentioned functions. Finally, fix some spacing errors for variable assignment and get rid of all the goto statements in the refactored functions for clarity. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-01iavf: stop leaking iavf_status as "errno" valuesMateusz Palczewski3-39/+157
Several functions in the iAVF core files take status values of the enum iavf_status and convert them into integer values. This leads to confusion as functions return both Linux errno values and status codes intermixed. Reporting status codes as if they were "errno" values can lead to confusion when reviewing error logs. Additionally, it can lead to unexpected behavior if a return value is not interpreted properly. Fix this by introducing iavf_status_to_errno, a switch that explicitly converts from the status codes into an appropriate error value. Also introduce a virtchnl_status_to_errno function for the one case where we were returning both virtchnl status codes and iavf_status codes in the same function. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-01iavf: remove redundant ret variableMinghao Chi1-7/+2
Return value directly instead of taking this in another redundant variable. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn> Signed-off-by: CGEL ZTE <cgel.zte@gmail.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>