summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/intel
AgeCommit message (Collapse)AuthorFilesLines
2023-09-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netPaolo Abeni2-14/+19
Cross-merge networking fixes after downstream PR. No conflicts. Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14idpf: add SRIOV support and other ndo_opsJoshua Hay8-3/+953
Add support for SRIOV: send the requested number of VFs to the device Control Plane, via the virtchnl message and then enable the VFs using 'pci_enable_sriov'. Add other ndo ops supported by the driver such as features_check, set_rx_mode, validate_addr, set_mac_address, change_mtu, get_stats64, set_features, and tx_timeout. Initialize the statistics task which requests the queue related statistics to the CP. Add loopback and promiscuous mode support and the respective virtchnl messages. Finally, add documentation and build support for the driver. Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Co-developed-by: Alan Brady <alan.brady@intel.com> Signed-off-by: Alan Brady <alan.brady@intel.com> Co-developed-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Madhu Chittim <madhu.chittim@intel.com> Co-developed-by: Phani Burra <phani.r.burra@intel.com> Signed-off-by: Phani Burra <phani.r.burra@intel.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Co-developed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-14idpf: add ethtool callbacksAlan Brady7-3/+1851
Initialize all the ethtool ops that are supported by the driver and add the necessary support for the ethtool callbacks. Also add asynchronous link notification virtchnl support where the device Control Plane sends the link status and link speed as an asynchronous event message. Driver report the link speed on ethtool .idpf_get_link_ksettings query. Introduce soft reset function which is used by some of the ethtool callbacks such as .set_channels, .set_ringparam etc. to change the existing queue configuration. It deletes the existing queues by sending delete queues virtchnl message to the CP and calls the 'vport_stop' flow which disables the queues, vport etc. New set of queues are requested to the CP and reconfigure the queue context by calling the 'vport_open' flow. Soft reset flow also adjusts the number of vectors associated to a vport if .set_channels is called. Signed-off-by: Alan Brady <alan.brady@intel.com> Co-developed-by: Alice Michael <alice.michael@intel.com> Signed-off-by: Alice Michael <alice.michael@intel.com> Co-developed-by: Joshua Hay <joshua.a.hay@intel.com> Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Co-developed-by: Phani Burra <phani.r.burra@intel.com> Signed-off-by: Phani Burra <phani.r.burra@intel.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Co-developed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-14idpf: add singleq start_xmit and napi pollJoshua Hay8-31/+1290
Add the start_xmit, TX and RX napi poll support for the single queue model. Unlike split queue model, single queue uses same queue to post buffer descriptors and completed descriptors. Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Co-developed-by: Alan Brady <alan.brady@intel.com> Signed-off-by: Alan Brady <alan.brady@intel.com> Co-developed-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Madhu Chittim <madhu.chittim@intel.com> Co-developed-by: Phani Burra <phani.r.burra@intel.com> Signed-off-by: Phani Burra <phani.r.burra@intel.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Co-developed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-14idpf: add RX splitq napi poll supportAlan Brady4-6/+892
Add support to handle interrupts for the RX completion queue and RX buffer queue. When the interrupt fires on RX completion queue, process the RX descriptors that are received. Allocate and prepare the SKB with the RX packet info, for both data and header buffer. IDPF uses software maintained refill queues to manage buffers between RX queue producer and the buffer queue consumer. They are required in order to maintain a lockless buffer management system and are strictly software only constructs. Instead of updating the RX buffer queue tail with available buffers right after the clean routine, it posts the buffer ids to the refill queues, only to post them to the HW later. If the generic receive offload (GRO) is enabled in the capabilities and turned on by default or via ethtool, then HW performs the packet coalescing if certain criteria are met by the incoming packets and updates the RX descriptor. Similar to GRO, if generic checksum is enabled, HW computes the checksum and updates the respective fields in the descriptor. Add support to update the SKB fields with the GRO and the generic checksum received. Signed-off-by: Alan Brady <alan.brady@intel.com> Co-developed-by: Joshua Hay <joshua.a.hay@intel.com> Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Co-developed-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Madhu Chittim <madhu.chittim@intel.com> Co-developed-by: Phani Burra <phani.r.burra@intel.com> Signed-off-by: Phani Burra <phani.r.burra@intel.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Co-developed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-14idpf: add TX splitq napi poll supportJoshua Hay6-5/+926
Add support to handle the interrupts for the TX completion queue and process the various completion types. In the flow scheduling mode, the driver processes primarily buffer completions as well as descriptor completions occasionally. This mode supports out of order TX completions. To do so, HW generates one buffer completion per packet. Each of those completions contains the unique tag provided during the TX encoding which is used to locate the packet either on the TX buffer ring or in a hash table. The hash table is used to track TX buffer information so the descriptor(s) for a given packet can be reused while the driver is still waiting on the buffer completion(s). Packets end up in the hash table in one of 2 ways: 1) a packet was stashed during descriptor completion cleaning, or 2) because an out of order buffer completion was processed. A descriptor completion arrives only every so often and is primarily used to guarantee the TX descriptor ring can be reused without having to wait on the individual buffer completions. E.g. a descriptor completion for N+16 guarantees HW read all of the descriptors for packets N through N+15, therefore all of the buffers for packets N through N+15 are stashed into the hash table and the descriptors can be reused for more TX packets. Similarly, a packet can be stashed in the hash table because an out an order buffer completion was processed. E.g. processing a buffer completion for packet N+3 implies that HW read all of the descriptors for packets N through N+3 and they can be reused. However, the HW did not do the DMA yet. The buffers for packets N through N+2 cannot be freed, so they are stashed in the hash table. In either case, the buffer completions will eventually be processed for all of the stashed packets, and all of the buffers will be cleaned from the hash table. In queue based scheduling mode, the driver processes primarily descriptor completions and cleans the TX ring the conventional way. Finally, the driver triggers a TX queue drain after sending the disable queues virtchnl message. When the HW completes the queue draining, it sends the driver a queue marker packet completion. The driver determines when all TX queues have been drained and proceeds with the disable flow. With this, the driver can send TX packets and clean up the resources properly. Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Co-developed-by: Alan Brady <alan.brady@intel.com> Signed-off-by: Alan Brady <alan.brady@intel.com> Co-developed-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Madhu Chittim <madhu.chittim@intel.com> Co-developed-by: Phani Burra <phani.r.burra@intel.com> Signed-off-by: Phani Burra <phani.r.burra@intel.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Co-developed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-14idpf: add splitq start_xmitJoshua Hay5-1/+1107
Add start_xmit support for split queue model. To start with, add the necessary checks to linearize the skb if it uses more number of buffers than the hardware supported limit. Stop the transmit queue if there are no enough descriptors available for the skb to use or if there we're going to potentially overrun the completion queue. Finally prepare the descriptor with all the required information and update the tail. Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Co-developed-by: Alan Brady <alan.brady@intel.com> Signed-off-by: Alan Brady <alan.brady@intel.com> Co-developed-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Madhu Chittim <madhu.chittim@intel.com> Co-developed-by: Phani Burra <phani.r.burra@intel.com> Signed-off-by: Phani Burra <phani.r.burra@intel.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Co-developed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-14idpf: initialize interrupts and enable vportPavan Kumar Linga10-3/+1389
To further continue 'vport open', initialize all the resources required for the interrupts. To start with, initialize the queue vector indices with the ones received from the device Control Plane. Now that all the TX and RX queues are initialized, map the RX descriptor and buffer queues as well as TX completion queues to the allocated vectors. Initialize and enable the napi handler for the napi polling. Finally, request the IRQs for the interrupt vectors from the stack and setup the interrupt handler. Once the interrupt init is done, send 'map queue vector', 'enable queues' and 'enable vport' virtchnl messages to the CP to complete the 'vport open' flow. Co-developed-by: Alan Brady <alan.brady@intel.com> Signed-off-by: Alan Brady <alan.brady@intel.com> Co-developed-by: Joshua Hay <joshua.a.hay@intel.com> Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Co-developed-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Madhu Chittim <madhu.chittim@intel.com> Co-developed-by: Phani Burra <phani.r.burra@intel.com> Signed-off-by: Phani Burra <phani.r.burra@intel.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-14idpf: configure resources for RX queuesAlan Brady8-12/+1705
Similar to the TX, RX also supports both single and split queue models. In single queue model, the same descriptor queue is used by SW to post buffer descriptors to HW and by HW to post completed descriptors to SW. In split queue model, "RX buffer queues" are used to pass descriptor buffers from SW to HW whereas "RX queues" are used to post the descriptor completions i.e. descriptors that point to completed buffers, from HW to SW. "RX queue group" is a set of RX queues grouped together and will be serviced by a "RX buffer queue group". IDPF supports 2 buffer queues i.e. large buffer (4KB) queue and small buffer (2KB) queue per buffer queue group. HW uses large buffers for 'hardware gro' feature and also if the packet size is more than 2KB, if not 2KB buffers are used. Add all the resources required for the RX queues initialization. Allocate memory for the RX queue and RX buffer queue groups. Initialize the software maintained refill queues for buffer management algorithm. Same like the TX queues, initialize the queue parameters for the RX queues and send the config RX queue virtchnl message to the device Control Plane. Signed-off-by: Alan Brady <alan.brady@intel.com> Co-developed-by: Alice Michael <alice.michael@intel.com> Signed-off-by: Alice Michael <alice.michael@intel.com> Co-developed-by: Joshua Hay <joshua.a.hay@intel.com> Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Co-developed-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Madhu Chittim <madhu.chittim@intel.com> Co-developed-by: Phani Burra <phani.r.burra@intel.com> Signed-off-by: Phani Burra <phani.r.burra@intel.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Co-developed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-14idpf: configure resources for TX queuesAlan Brady6-0/+1410
IDPF supports two queue models i.e. single queue which is a traditional queueing model as well as split queue model. In single queue model, the same descriptor queue is used by SW to post descriptors to the HW, HW to post completed descriptors to SW. In split queue model, "TX Queues" are used to pass buffers from SW to HW and "TX Completion Queues" are used to post descriptor completions from HW to SW. Device supports asymmetric ratio of TX queues to TX completion queues. Considering this, queue group mechanism is used i.e. some TX queues are grouped together which will be serviced by only one TX completion queue per TX queue group. Add all the resources required for the TX queues initialization. To start with, allocate memory for the TX queue groups, TX queues and TX completion queues. Then, allocate the descriptors for both TX and TX completion queues, and bookkeeping buffers for TX queues alone. Also, allocate queue vectors for the vport and initialize the TX queue related fields for each queue vector. Initialize the queue parameters such as q_id, q_type and tail register offset with the info received from the device control plane (CP). Once all the TX queues are configured, send config TX queue virtchnl message to the CP with all the TX queue context information. Signed-off-by: Alan Brady <alan.brady@intel.com> Co-developed-by: Alice Michael <alice.michael@intel.com> Signed-off-by: Alice Michael <alice.michael@intel.com> Co-developed-by: Joshua Hay <joshua.a.hay@intel.com> Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Co-developed-by: Phani Burra <phani.r.burra@intel.com> Signed-off-by: Phani Burra <phani.r.burra@intel.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Co-developed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-14idpf: add ptypes and MAC filter supportPavan Kumar Linga4-1/+837
Add the virtchnl support to request the packet types. Parse the responses received from CP and based on the protocol headers, populate the packet type structure with necessary information. Initialize the MAC address and add the virtchnl support to add and del MAC address. Co-developed-by: Alan Brady <alan.brady@intel.com> Signed-off-by: Alan Brady <alan.brady@intel.com> Co-developed-by: Joshua Hay <joshua.a.hay@intel.com> Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Co-developed-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Madhu Chittim <madhu.chittim@intel.com> Co-developed-by: Phani Burra <phani.r.burra@intel.com> Signed-off-by: Phani Burra <phani.r.burra@intel.com> Co-developed-by: Shailendra Bhatnagar <shailendra.bhatnagar@intel.com> Signed-off-by: Shailendra Bhatnagar <shailendra.bhatnagar@intel.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-14idpf: add create vport and netdev configurationPavan Kumar Linga7-20/+1522
Add the required support to create a vport by spawning the init task. Once the vport is created, initialize and allocate the resources needed for it. Configure and register a netdev for each vport with all the features supported by the device based on the capabilities received from the device Control Plane. Spawn the init task till all the default vports are created. Co-developed-by: Alan Brady <alan.brady@intel.com> Signed-off-by: Alan Brady <alan.brady@intel.com> Co-developed-by: Joshua Hay <joshua.a.hay@intel.com> Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Co-developed-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Madhu Chittim <madhu.chittim@intel.com> Co-developed-by: Phani Burra <phani.r.burra@intel.com> Signed-off-by: Phani Burra <phani.r.burra@intel.com> Co-developed-by: Shailendra Bhatnagar <shailendra.bhatnagar@intel.com> Signed-off-by: Shailendra Bhatnagar <shailendra.bhatnagar@intel.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-14idpf: add core init and interrupt requestPavan Kumar Linga9-3/+1471
As the mailbox is setup, add the necessary send and receive mailbox message framework to support the virtchnl communication between the driver and device Control Plane (CP). Add the core initialization. To start with, driver confirms the virtchnl version with the CP. Once that is done, it requests and gets the required capabilities and resources needed such as max vectors, queues etc. Based on the vector information received in 'VIRTCHNL2_OP_GET_CAPS', request the stack to allocate the required vectors. Finally add the interrupt handling mechanism for the mailbox queue and enable the interrupt. Note: Checkpatch issues a warning about IDPF_FOREACH_VPORT_VC_STATE and IDPF_GEN_STRING being complex macros and should be enclosed in parentheses but it's not the case. They are never used as a statement and instead only used to define the enum and array. Co-developed-by: Alan Brady <alan.brady@intel.com> Signed-off-by: Alan Brady <alan.brady@intel.com> Co-developed-by: Emil Tantilov <emil.s.tantilov@intel.com> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Co-developed-by: Joshua Hay <joshua.a.hay@intel.com> Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Co-developed-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Madhu Chittim <madhu.chittim@intel.com> Co-developed-by: Phani Burra <phani.r.burra@intel.com> Signed-off-by: Phani Burra <phani.r.burra@intel.com> Co-developed-by: Shailendra Bhatnagar <shailendra.bhatnagar@intel.com> Signed-off-by: Shailendra Bhatnagar <shailendra.bhatnagar@intel.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-14idpf: add controlq init and reset checksJoshua Hay14-3/+1851
At the end of the probe, initialize and schedule the event workqueue. It calls the hard reset function where reset checks are done to find if the device is out of the reset. Control queue initialization and the necessary control queue support is added. Introduce function pointers for the register operations which are different between PF and VF devices. Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Co-developed-by: Alan Brady <alan.brady@intel.com> Signed-off-by: Alan Brady <alan.brady@intel.com> Co-developed-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Madhu Chittim <madhu.chittim@intel.com> Co-developed-by: Phani Burra <phani.r.burra@intel.com> Signed-off-by: Phani Burra <phani.r.burra@intel.com> Co-developed-by: Shailendra Bhatnagar <shailendra.bhatnagar@intel.com> Signed-off-by: Shailendra Bhatnagar <shailendra.bhatnagar@intel.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Co-developed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-14idpf: add module register and probe functionalityPhani Burra5-0/+193
Add the required support to register IDPF PCI driver, as well as probe and remove call backs. Enable the PCI device and request the kernel to reserve the memory resources that will be used by the driver. Finally map the BAR0 address space. Signed-off-by: Phani Burra <phani.r.burra@intel.com> Co-developed-by: Alan Brady <alan.brady@intel.com> Signed-off-by: Alan Brady <alan.brady@intel.com> Co-developed-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Madhu Chittim <madhu.chittim@intel.com> Co-developed-by: Shailendra Bhatnagar <shailendra.bhatnagar@intel.com> Signed-off-by: Shailendra Bhatnagar <shailendra.bhatnagar@intel.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Co-developed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-14virtchnl: add virtchnl version 2 opsPavan Kumar Linga2-0/+1724
Virtchnl version 1 is an interface used by the current generation of foundational NICs to negotiate the capabilities and configure the HW resources such as queues, vectors, RSS LUT, etc between the PF and VF drivers. It is not extensible to enable new features supported in the next generation of NICs/IPUs and to negotiate descriptor types, packet types and register offsets. To overcome the limitations of the existing interface, introduce the virtchnl version 2 and add the necessary opcodes, structures, definitions, and descriptor formats. The driver also learns the data queue and other register offsets to use instead of hardcoding them. The advantage of this approach is that it gives the flexibility to modify the register offsets if needed, restrict the use of certain descriptor types and negotiate the supported packet types. Co-developed-by: Alan Brady <alan.brady@intel.com> Signed-off-by: Alan Brady <alan.brady@intel.com> Co-developed-by: Joshua Hay <joshua.a.hay@intel.com> Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Co-developed-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Madhu Chittim <madhu.chittim@intel.com> Co-developed-by: Phani Burra <phani.r.burra@intel.com> Signed-off-by: Phani Burra <phani.r.burra@intel.com> Co-developed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-13iavf: Add ability to turn off CRC stripping for VFNorbert Zulinski3-1/+64
Previously CRC stripping was always enabled for VF. Now it is possible to turn off CRC stripping via ethtool: #ethtool -K <interface> rx-fcs on To turn off CRC stripping, first VLAN stripping must be disabled: #ethtool -K <interface> rx-vlan-offload off if any VLAN interfaces exists, otherwise VLAN stripping will be turned off by the driver. In iavf_configure_queues add check if CRC stripping is enabled for VF, if it's enabled then set crc_disabled to false on every VF's queue. In iavf_set_features add check if CRC stripping setting was changed then schedule reset. Signed-off-by: Norbert Zulinski <norbertx.zulinski@intel.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-13ice: Check CRC strip requirement for VLAN stripHaiyue Wang2-9/+58
When VLAN strip is enabled, the CRC strip must not be disabled. And when the CRC strip is disabled, the VLAN strip should not be enabled. The driver needs to check CRC strip disable setting parameter before configuring the Rx/Tx queues, otherwise, in current error handling, the already set Tx queue context doesn't roll back correctly, it will cause the Tx queue setup failure next time: "Failed to set LAN Tx queue context" Signed-off-by: Haiyue Wang <haiyue.wang@intel.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-13ice: Support FCS/CRC strip disable for VFHaiyue Wang1-0/+15
To support CRC strip enable/disable functionality, VF needs the explicit request VIRTCHNL_VF_OFFLOAD_CRC offload. Then according to crc_disable flag of Rx queue configuration information to set up the queue context. Signed-off-by: Haiyue Wang <haiyue.wang@intel.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-13igb: clean up in all error paths when enabling SR-IOVCorinna Vinschen1-1/+4
After commit 50f303496d92 ("igb: Enable SR-IOV after reinit"), removing the igb module could hang or crash (depending on the machine) when the module has been loaded with the max_vfs parameter set to some value != 0. In case of one test machine with a dual port 82580, this hang occurred: [ 232.480687] igb 0000:41:00.1: removed PHC on enp65s0f1 [ 233.093257] igb 0000:41:00.1: IOV Disabled [ 233.329969] pcieport 0000:40:01.0: AER: Multiple Uncorrected (Non-Fatal) err0 [ 233.340302] igb 0000:41:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fata) [ 233.352248] igb 0000:41:00.0: device [8086:1516] error status/mask=00100000 [ 233.361088] igb 0000:41:00.0: [20] UnsupReq (First) [ 233.368183] igb 0000:41:00.0: AER: TLP Header: 40000001 0000040f cdbfc00c c [ 233.376846] igb 0000:41:00.1: PCIe Bus Error: severity=Uncorrected (Non-Fata) [ 233.388779] igb 0000:41:00.1: device [8086:1516] error status/mask=00100000 [ 233.397629] igb 0000:41:00.1: [20] UnsupReq (First) [ 233.404736] igb 0000:41:00.1: AER: TLP Header: 40000001 0000040f cdbfc00c c [ 233.538214] pci 0000:41:00.1: AER: can't recover (no error_detected callback) [ 233.538401] igb 0000:41:00.0: removed PHC on enp65s0f0 [ 233.546197] pcieport 0000:40:01.0: AER: device recovery failed [ 234.157244] igb 0000:41:00.0: IOV Disabled [ 371.619705] INFO: task irq/35-aerdrv:257 blocked for more than 122 seconds. [ 371.627489] Not tainted 6.4.0-dirty #2 [ 371.632257] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this. [ 371.641000] task:irq/35-aerdrv state:D stack:0 pid:257 ppid:2 f0 [ 371.650330] Call Trace: [ 371.653061] <TASK> [ 371.655407] __schedule+0x20e/0x660 [ 371.659313] schedule+0x5a/0xd0 [ 371.662824] schedule_preempt_disabled+0x11/0x20 [ 371.667983] __mutex_lock.constprop.0+0x372/0x6c0 [ 371.673237] ? __pfx_aer_root_reset+0x10/0x10 [ 371.678105] report_error_detected+0x25/0x1c0 [ 371.682974] ? __pfx_report_normal_detected+0x10/0x10 [ 371.688618] pci_walk_bus+0x72/0x90 [ 371.692519] pcie_do_recovery+0xb2/0x330 [ 371.696899] aer_process_err_devices+0x117/0x170 [ 371.702055] aer_isr+0x1c0/0x1e0 [ 371.705661] ? __set_cpus_allowed_ptr+0x54/0xa0 [ 371.710723] ? __pfx_irq_thread_fn+0x10/0x10 [ 371.715496] irq_thread_fn+0x20/0x60 [ 371.719491] irq_thread+0xe6/0x1b0 [ 371.723291] ? __pfx_irq_thread_dtor+0x10/0x10 [ 371.728255] ? __pfx_irq_thread+0x10/0x10 [ 371.732731] kthread+0xe2/0x110 [ 371.736243] ? __pfx_kthread+0x10/0x10 [ 371.740430] ret_from_fork+0x2c/0x50 [ 371.744428] </TASK> The reproducer was a simple script: #!/bin/sh for i in `seq 1 5`; do modprobe -rv igb modprobe -v igb max_vfs=1 sleep 1 modprobe -rv igb done It turned out that this could only be reproduce on 82580 (quad and dual-port), but not on 82576, i350 and i210. Further debugging showed that igb_enable_sriov()'s call to pci_enable_sriov() is failing, because dev->is_physfn is 0 on 82580. Prior to commit 50f303496d92 ("igb: Enable SR-IOV after reinit"), igb_enable_sriov() jumped into the "err_out" cleanup branch. After this commit it only returned the error code. So the cleanup didn't take place, and the incorrect VF setup in the igb_adapter structure fooled the igb driver into assuming that VFs have been set up where no VF actually existed. Fix this problem by cleaning up again if pci_enable_sriov() fails. Fixes: 50f303496d92 ("igb: Enable SR-IOV after reinit") Signed-off-by: Corinna Vinschen <vinschen@redhat.com> Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-13ixgbe: fix timestamp configuration codeVadim Fedorenko1-13/+15
The commit in fixes introduced flags to control the status of hardware configuration while processing packets. At the same time another structure is used to provide configuration of timestamper to user-space applications. The way it was coded makes this structures go out of sync easily. The repro is easy for 82599 chips: [root@hostname ~]# hwstamp_ctl -i eth0 -r 12 -t 1 current settings: tx_type 0 rx_filter 0 new settings: tx_type 1 rx_filter 12 The eth0 device is properly configured to timestamp any PTPv2 events. [root@hostname ~]# hwstamp_ctl -i eth0 -r 1 -t 1 current settings: tx_type 1 rx_filter 12 SIOCSHWTSTAMP failed: Numerical result out of range The requested time stamping mode is not supported by the hardware. The error is properly returned because HW doesn't support all packets timestamping. But the adapter->flags is cleared of timestamp flags even though no HW configuration was done. From that point no RX timestamps are received by user-space application. But configuration shows good values: [root@hostname ~]# hwstamp_ctl -i eth0 current settings: tx_type 1 rx_filter 12 Fix the issue by applying new flags only when the HW was actually configured. Fixes: a9763f3cb54c ("ixgbe: Update PTP to support X550EM_x devices") Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-11iavf: Fix promiscuous mode configuration flow messagesBrett Creeley3-60/+74
Currently when configuring promiscuous mode on the AVF we detect a change in the netdev->flags. We use IFF_PROMISC and IFF_ALLMULTI to determine whether or not we need to request/release promiscuous mode and/or multicast promiscuous mode. The problem is that the AQ calls for setting/clearing promiscuous/multicast mode are treated separately. This leads to a case where we can trigger two promiscuous mode AQ calls in a row with the incorrect state. To fix this make a few changes. Use IAVF_FLAG_AQ_CONFIGURE_PROMISC_MODE instead of the previous IAVF_FLAG_AQ_[REQUEST|RELEASE]_[PROMISC|ALLMULTI] flags. In iavf_set_rx_mode() detect if there is a change in the netdev->flags in comparison with adapter->flags and set the IAVF_FLAG_AQ_CONFIGURE_PROMISC_MODE aq_required bit. Then in iavf_process_aq_command() only check for IAVF_FLAG_CONFIGURE_PROMISC_MODE and call iavf_set_promiscuous() if it's set. In iavf_set_promiscuous() check again to see which (if any) promiscuous mode bits have changed when comparing the netdev->flags with the adapter->flags. Use this to set the flags which get sent to the PF driver. Add a spinlock that is used for updating current_netdev_promisc_flags and only allows one promiscuous mode AQ at a time. [1] Fixes the fact that we will only have one AQ call in the aq_required queue at any one time. [2] Streamlines the change in promiscuous mode to only set one AQ required bit. [3] This allows us to keep track of the current state of the flags and also makes it so we can take the most recent netdev->flags promiscuous mode state. [4] This fixes the problem where a change in the netdev->flags can cause IAVF_FLAG_AQ_CONFIGURE_PROMISC_MODE to be set in iavf_set_rx_mode(), but cleared in iavf_set_promiscuous() before the change is ever made via AQ call. Fixes: 47d3483988f6 ("i40evf: Add driver support for promiscuous mode") Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-11i40e: fix potential memory leaks in i40e_remove()Andrii Staikov1-3/+7
Instead of freeing memory of a single VSI, make sure the memory for all VSIs is cleared before releasing VSIs. Add releasing of their resources in a loop with the iteration number equal to the number of allocated VSIs. Fixes: 41c445ff0f48 ("i40e: main driver core") Signed-off-by: Andrii Staikov <andrii.staikov@intel.com> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-05igb: Change IGB_MIN to allow set rx/tx value between 64 and 80Olga Zaborska1-2/+2
Change the minimum value of RX/TX descriptors to 64 to enable setting the rx/tx value between 64 and 80. All igb devices can use as low as 64 descriptors. This change will unify igb with other drivers. Based on commit 7b1be1987c1e ("e1000e: lower ring minimum size to 64") Fixes: 9d5c824399de ("igb: PCI-Express 82575 Gigabit Ethernet driver") Signed-off-by: Olga Zaborska <olga.zaborska@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-05igbvf: Change IGBVF_MIN to allow set rx/tx value between 64 and 80Olga Zaborska1-2/+2
Change the minimum value of RX/TX descriptors to 64 to enable setting the rx/tx value between 64 and 80. All igbvf devices can use as low as 64 descriptors. This change will unify igbvf with other drivers. Based on commit 7b1be1987c1e ("e1000e: lower ring minimum size to 64") Fixes: d4e0fe01a38a ("igbvf: add new driver to support 82576 virtual functions") Signed-off-by: Olga Zaborska <olga.zaborska@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-05igc: Change IGC_MIN to allow set rx/tx value between 64 and 80Olga Zaborska1-2/+2
Change the minimum value of RX/TX descriptors to 64 to enable setting the rx/tx value between 64 and 80. All igc devices can use as low as 64 descriptors. This change will unify igc with other drivers. Based on commit 7b1be1987c1e ("e1000e: lower ring minimum size to 64") Fixes: 0507ef8a0372 ("igc: Add transmit and receive fastpath and interrupt handlers") Signed-off-by: Olga Zaborska <olga.zaborska@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-03igb: disable virtualization features on 82580Corinna Vinschen1-2/+3
Disable virtualization features on 82580 just as on i210/i211. This avoids that virt functions are acidentally called on 82850. Fixes: 55cac248caa4 ("igb: Add full support for 82580 devices") Signed-off-by: Corinna Vinschen <vinschen@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netPaolo Abeni3-10/+59
Merge in late fixes to prepare for the 6.6 net-next PR. No conflicts. Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-08-28igb: set max size RX buffer when store bad packet is enabledRadoslaw Tyl1-4/+7
Increase the RX buffer size to 3K when the SBP bit is on. The size of the RX buffer determines the number of pages allocated which may not be sufficient for receive frames larger than the set MTU size. Cc: stable@vger.kernel.org Fixes: 89eaefb61dc9 ("igb: Support RX-ALL feature flag.") Reported-by: Manfred Rudigier <manfred.rudigier@omicronenergy.com> Signed-off-by: Radoslaw Tyl <radoslawx.tyl@intel.com> Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-25ice: avoid executing commands on other ports when driving syncJacob Keller2-6/+52
The ice hardware has a synchronization mechanism used to drive the simultaneous application of commands on both PHY ports and the source timer in the MAC. When issuing a sync via ice_ptp_exec_tmr_cmd(), the hardware will simultaneously apply the commands programmed for the main timer and each PHY port. Neither the main timer command register, nor the PHY port command registers auto clear on command execution. During the execution of a timer command intended for a single port on E822 devices, such as those used to configure a PHY during link up, the driver is not correctly clearing the previous commands. This results in unintentionally executing the last programmed command on the main timer and other PHY ports whenever performing reconfiguration on E822 ports after link up. This results in unintended side effects on other timers, depending on what command was previously programmed. To fix this, the driver must ensure that the main timer and all other PHY ports are properly initialized to perform no action. The enumeration for timer commands does not include an enumeration value for doing nothing. Introduce ICE_PTP_NOP for this purpose. When writing a timer command to hardware, leave the command bits set to zero which indicates that no operation should be performed on that port. Modify ice_ptp_one_port_cmd() to always initialize all ports. For all ports other than the one being configured, write their timer command register to ICE_PTP_NOP. This ensures that no side effect happens on the timer command. To fix this for the PHY ports, modify ice_ptp_one_port_cmd() to always initialize all other ports to ICE_PTP_NOP. This ensures that no side effects happen on the other ports. Call ice_ptp_src_cmd() with a command value if ICE_PTP_NOP in ice_sync_phy_timer_e822() and ice_start_phy_timer_e822(). With both of these changes, the driver should no longer execute a stale command on the main timer or another PHY port when reconfiguring one of the PHY ports after link up. Fixes: 3a7496234d17 ("ice: implement basic E822 PTP support") Signed-off-by: Siddaraju DH <siddaraju.dh@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-24e1000e: Add support for the next LOM generationSasha Neftin5-0/+17
Add devices IDs for the next LOM generations that will be available on the next Intel Client platforms. This patch provides the initial support for these devices. Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-08-24igc: Decrease PTM short interval from 10 us to 1 usSasha Neftin1-1/+1
With the 10us interval, we were seeing PTM transactions take around 12us. Hardware team suggested this interval could be lowered to 1us which was confirmed with PCIe sniffer. With the 1us interval, PTM dialogs took around 2us. Suggested-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Reviewed-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-08-24igc: Add support for multiple in-flight TX timestampsVinicius Costa Gomes6-63/+192
Add support for using the four sets of timestamping registers that i225/i226 have available for TX. In some workloads, where multiple applications request hardware transmission timestamps, it was possible that some of those requests were denied because the only in use register was already occupied. This is also in preparation to future support for hardware timestamping with multiple PTP domains. With multiple domains chances of multiple TX timestamps being requested at the same time increase. Before: $ sudo ./ntpperf -i enp3s0 -m 10:22:22:22:22:21 -d 192.168.1.3 -s 172.18.0.0/16 -I -H -o 37 | responses | TX timestamp offset (ns) rate clients | lost invalid basic xleave | min mean max stddev 1000 100 0.00% 0.00% 0.00% 100.00% +1 +41 +73 13 1500 150 0.00% 0.00% 0.00% 100.00% +9 +49 +87 15 2250 225 0.00% 0.00% 0.00% 100.00% +9 +42 +79 13 3375 337 0.00% 0.00% 0.00% 100.00% +11 +46 +81 13 5062 506 0.00% 0.00% 0.00% 100.00% +7 +44 +80 13 7593 759 0.00% 0.00% 0.00% 100.00% +9 +44 +79 12 11389 1138 0.00% 0.00% 0.00% 100.00% +14 +51 +87 13 17083 1708 0.00% 0.00% 0.00% 100.00% +1 +41 +80 14 25624 2562 0.00% 0.00% 0.00% 100.00% +11 +50 +5107 51 38436 3843 0.00% 0.00% 0.00% 100.00% -2 +36 +7843 38 57654 5765 0.00% 0.00% 0.00% 100.00% +4 +42 +10503 69 86481 8648 0.00% 0.00% 0.00% 100.00% +11 +54 +5492 65 129721 12972 0.00% 0.00% 0.00% 100.00% +31 +2680 +6942 2606 194581 16384 16.79% 0.00% 0.87% 82.34% +73 +4444 +15879 3116 291871 16384 35.05% 0.00% 1.53% 63.42% +188 +5381 +17019 3035 437806 16384 54.95% 0.00% 2.55% 42.50% +233 +6302 +13885 2846 After: $ sudo ./ntpperf -i enp3s0 -m 10:22:22:22:22:21 -d 192.168.1.3 -s 172.18.0.0/16 -I -H -o 37 | responses | TX timestamp offset (ns) rate clients | lost invalid basic xleave | min mean max stddev 1000 100 0.00% 0.00% 0.00% 100.00% -20 +12 +43 13 1500 150 0.00% 0.00% 0.00% 100.00% -23 +18 +57 14 2250 225 0.00% 0.00% 0.00% 100.00% -2 +33 +67 13 3375 337 0.00% 0.00% 0.00% 100.00% +1 +38 +76 13 5062 506 0.00% 0.00% 0.00% 100.00% +9 +52 +93 14 7593 759 0.00% 0.00% 0.00% 100.00% +11 +47 +82 13 11389 1138 0.00% 0.00% 0.00% 100.00% -9 +27 +74 13 17083 1708 0.00% 0.00% 0.00% 100.00% -13 +25 +66 14 25624 2562 0.00% 0.00% 0.00% 100.00% -8 +28 +65 13 38436 3843 0.00% 0.00% 0.00% 100.00% -13 +28 +69 13 57654 5765 0.00% 0.00% 0.00% 100.00% -11 +32 +71 14 86481 8648 0.00% 0.00% 0.00% 100.00% +2 +44 +83 14 129721 12972 15.36% 0.00% 0.35% 84.29% -2 +2248 +22907 4252 194581 16384 42.98% 0.00% 1.98% 55.04% -4 +5278 +65039 5856 291871 16384 54.33% 0.00% 2.21% 43.46% -3 +6306 +22608 5665 We can see that with 4 registers, as expected, we are able to handle a increasing number of requests more consistently, but as soon as all registers are in use, the decrease in quality of service happens in a sharp step. Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-08-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski8-48/+30
Cross-merge networking fixes after downstream PR. Conflicts: include/net/inet_sock.h f866fbc842de ("ipv4: fix data-races around inet->inet_id") c274af224269 ("inet: introduce inet->inet_flags") https://lore.kernel.org/all/679ddff6-db6e-4ff6-b177-574e90d0103d@tessares.net/ Adjacent changes: drivers/net/bonding/bond_alb.c e74216b8def3 ("bonding: fix macvlan over alb bond support") f11e5bd159b0 ("bonding: support balance-alb with openvswitch") drivers/net/ethernet/broadcom/bgmac.c d6499f0b7c7c ("net: bgmac: Return PTR_ERR() for fixed_phy_register()") 23a14488ea58 ("net: bgmac: Fix return value check for fixed_phy_register()") drivers/net/ethernet/broadcom/genet/bcmmii.c 32bbe64a1386 ("net: bcmgenet: Fix return value check for fixed_phy_register()") acf50d1adbf4 ("net: bcmgenet: Return PTR_ERR() for fixed_phy_register()") net/sctp/socket.c f866fbc842de ("ipv4: fix data-races around inet->inet_id") b09bde5c3554 ("inet: move inet->mc_loop to inet->inet_frags") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-23i40e: fix potential NULL pointer dereferencing of pf->vf i40e_sync_vsi_filters()Andrii Staikov1-2/+3
Add check for pf->vf not being NULL before dereferencing pf->vf[vsi->vf_id] in updating VSI filter sync. Add a similar check before dereferencing !pf->vf[vsi->vf_id].trusted in the condition for clearing promisc mode bit. Fixes: c87c938f62d8 ("i40e: Add VF VLAN pruning") Signed-off-by: Andrii Staikov <andrii.staikov@intel.com> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-23igc: Fix the typo in the PTM Control macroSasha Neftin1-1/+1
The IGC_PTM_CTRL_SHRT_CYC defines the time between two consecutive PTM requests. The bit resolution of this field is six bits. That bit five was missing in the mask. This patch comes to correct the typo in the IGC_PTM_CTRL_SHRT_CYC macro. Fixes: a90ec8483732 ("igc: Add support for PTP getcrosststamp()") Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://lore.kernel.org/r/20230821171721.2203572-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-23igb: Avoid starting unnecessary workqueuesAlessio Igor Bogani1-12/+12
If ptp_clock_register() fails or CONFIG_PTP isn't enabled, avoid starting PTP related workqueues. In this way we can fix this: BUG: unable to handle page fault for address: ffffc9000440b6f8 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 100000067 P4D 100000067 PUD 1001e0067 PMD 107dc5067 PTE 0 Oops: 0000 [#1] PREEMPT SMP [...] Workqueue: events igb_ptp_overflow_check RIP: 0010:igb_rd32+0x1f/0x60 [...] Call Trace: igb_ptp_read_82580+0x20/0x50 timecounter_read+0x15/0x60 igb_ptp_overflow_check+0x1a/0x50 process_one_work+0x1cb/0x3c0 worker_thread+0x53/0x3f0 ? rescuer_thread+0x370/0x370 kthread+0x142/0x160 ? kthread_associate_blkcg+0xc0/0xc0 ret_from_fork+0x1f/0x30 Fixes: 1f6e8178d685 ("igb: Prevent dropped Tx timestamps via work items and interrupts.") Fixes: d339b1331616 ("igb: add PTP Hardware Clock code") Signed-off-by: Alessio Igor Bogani <alessio.bogani@elettra.eu> Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230821171927.2203644-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-21ice: Fix NULL pointer deref during VF resetPetr Oros1-7/+8
During stress test with attaching and detaching VF from KVM and simultaneously changing VFs spoofcheck and trust there was a NULL pointer dereference in ice_reset_vf that VF's VSI is null. More than one instance of ice_reset_vf() can be running at a given time. When we rebuild the VSI in ice_reset_vf, another reset can be triaged from ice_service_task. In this case we can access the currently uninitialized VSI and cause panic. The window for this racing condition has been around for a long time but it's much worse after commit 227bf4500aaa ("ice: move VSI delete outside deconfig") because the reset runs faster. ice_reset_vf() using vf->cfg_lock and when we move this lock before accessing to the VF VSI, we can fix BUG for all cases. Panic occurs sometimes in ice_vsi_is_rx_queue_active() and sometimes in ice_vsi_stop_all_rx_rings() With our reproducer, we can hit BUG: ~8h before commit 227bf4500aaa ("ice: move VSI delete outside deconfig"). ~20m after commit 227bf4500aaa ("ice: move VSI delete outside deconfig"). After this fix we are not able to reproduce it after ~48h There was commit cf90b74341ee ("ice: Fix call trace with null VSI during VF reset") which also tried to fix this issue, but it was only partially resolved and the bug still exists. [ 6420.658415] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 6420.665382] #PF: supervisor read access in kernel mode [ 6420.670521] #PF: error_code(0x0000) - not-present page [ 6420.675659] PGD 0 [ 6420.677679] Oops: 0000 [#1] PREEMPT SMP NOPTI [ 6420.682038] CPU: 53 PID: 326472 Comm: kworker/53:0 Kdump: loaded Not tainted 5.14.0-317.el9.x86_64 #1 [ 6420.691250] Hardware name: Dell Inc. PowerEdge R750/04V528, BIOS 1.6.5 04/15/2022 [ 6420.698729] Workqueue: ice ice_service_task [ice] [ 6420.703462] RIP: 0010:ice_vsi_is_rx_queue_active+0x2d/0x60 [ice] [ 6420.705860] ice 0000:ca:00.0: VF 0 is now untrusted [ 6420.709494] Code: 00 00 66 83 bf 76 04 00 00 00 48 8b 77 10 74 3e 31 c0 eb 0f 0f b7 97 76 04 00 00 48 83 c0 01 39 c2 7e 2b 48 8b 97 68 04 00 00 <0f> b7 0c 42 48 8b 96 20 13 00 00 48 8d 94 8a 00 00 12 00 8b 12 83 [ 6420.714426] ice 0000:ca:00.0 ens7f0: Setting MAC 22:22:22:22:22:00 on VF 0. VF driver will be reinitialized [ 6420.733120] RSP: 0018:ff778d2ff383fdd8 EFLAGS: 00010246 [ 6420.733123] RAX: 0000000000000000 RBX: ff2acf1916294000 RCX: 0000000000000000 [ 6420.733125] RDX: 0000000000000000 RSI: ff2acf1f2c6401a0 RDI: ff2acf1a27301828 [ 6420.762346] RBP: ff2acf1a27301828 R08: 0000000000000010 R09: 0000000000001000 [ 6420.769476] R10: ff2acf1916286000 R11: 00000000019eba3f R12: ff2acf19066460d0 [ 6420.776611] R13: ff2acf1f2c6401a0 R14: ff2acf1f2c6401a0 R15: 00000000ffffffff [ 6420.783742] FS: 0000000000000000(0000) GS:ff2acf28ffa80000(0000) knlGS:0000000000000000 [ 6420.791829] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 6420.797575] CR2: 0000000000000000 CR3: 00000016ad410003 CR4: 0000000000773ee0 [ 6420.804708] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 6420.811034] vfio-pci 0000:ca:01.0: enabling device (0000 -> 0002) [ 6420.811840] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 6420.811841] PKRU: 55555554 [ 6420.811842] Call Trace: [ 6420.811843] <TASK> [ 6420.811844] ice_reset_vf+0x9a/0x450 [ice] [ 6420.811876] ice_process_vflr_event+0x8f/0xc0 [ice] [ 6420.841343] ice_service_task+0x23b/0x600 [ice] [ 6420.845884] ? __schedule+0x212/0x550 [ 6420.849550] process_one_work+0x1e2/0x3b0 [ 6420.853563] ? rescuer_thread+0x390/0x390 [ 6420.857577] worker_thread+0x50/0x3a0 [ 6420.861242] ? rescuer_thread+0x390/0x390 [ 6420.865253] kthread+0xdd/0x100 [ 6420.868400] ? kthread_complete_and_exit+0x20/0x20 [ 6420.873194] ret_from_fork+0x1f/0x30 [ 6420.876774] </TASK> [ 6420.878967] Modules linked in: vfio_pci vfio_pci_core vfio_iommu_type1 vfio iavf vhost_net vhost vhost_iotlb tap tun xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_counter nf_tables bridge stp llc sctp ip6_udp_tunnel udp_tunnel nfp tls nfnetlink bluetooth mlx4_en mlx4_core rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill sunrpc intel_rapl_msr intel_rapl_common i10nm_edac nfit libnvdimm ipmi_ssif x86_pkg_temp_thermal intel_powerclamp coretemp irdma kvm_intel i40e kvm iTCO_wdt dcdbas ib_uverbs irqbypass iTCO_vendor_support mgag200 mei_me ib_core dell_smbios isst_if_mmio isst_if_mbox_pci rapl i2c_algo_bit drm_shmem_helper intel_cstate drm_kms_helper syscopyarea sysfillrect isst_if_common sysimgblt intel_uncore fb_sys_fops dell_wmi_descriptor wmi_bmof intel_vsec mei i2c_i801 acpi_ipmi ipmi_si i2c_smbus ipmi_devintf intel_pch_thermal acpi_power_meter pcspk r Fixes: efe41860008e ("ice: Fix memory corruption in VF driver") Fixes: f23df5220d2b ("ice: Fix spurious interrupt during removal of trusted VF") Signed-off-by: Petr Oros <poros@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-08-21Revert "ice: Fix ice VF reset during iavf initialization"Petr Oros4-25/+4
This reverts commit 7255355a0636b4eff08d5e8139c77d98f151c4fc. After this commit we are not able to attach VF to VM: virsh attach-interface v0 hostdev --managed 0000:41:01.0 --mac 52:52:52:52:52:52 error: Failed to attach interface error: Cannot set interface MAC to 52:52:52:52:52:52 for ifname enp65s0f0np0 vf 0: Resource temporarily unavailable ice_check_vf_ready_for_cfg() already contain waiting for reset. New condition in ice_check_vf_ready_for_reset() causing only problems. Fixes: 7255355a0636 ("ice: Fix ice VF reset during iavf initialization") Signed-off-by: Petr Oros <poros@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-08-21ice: fix receive buffer size miscalculationJesse Brandeburg1-1/+2
The driver is misconfiguring the hardware for some values of MTU such that it could use multiple descriptors to receive a packet when it could have simply used one. Change the driver to use a round-up instead of the result of a shift, as the shift can truncate the lower bits of the size, and result in the problem noted above. It also aligns this driver with similar code in i40e. The insidiousness of this problem is that everything works with the wrong size, it's just not working as well as it could, as some MTU sizes end up using two or more descriptors, and there is no way to tell that is happening without looking at ice_trace or a bus analyzer. Fixes: efc2214b6047 ("ice: Add support for XDP") Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-08-19Merge branch '100GbE' of ↵Jakub Kicinski19-735/+624
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2023-08-17 (ice) This series contains updates to ice driver only. Jan removes unused functions and refactors code to make, possible, functions static. Jake rearranges some functions to be logically grouped. Marcin removes an unnecessary call to disable VLAN stripping. Yang Yingliang utilizes list_for_each_entry() helper for a couple list traversals. Przemek removes some parameters from ice_aq_alloc_free_res() which were always the same and reworks ice_aq_wait_for_event() to reduce chance of race. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: ice: split ice_aq_wait_for_event() func into two ice: embed &ice_rq_event_info event into struct ice_aq_task ice: ice_aq_check_events: fix off-by-one check when filling buffer ice: drop two params from ice_aq_alloc_free_res() ice: use list_for_each_entry() helper ice: Remove redundant VSI configuration in eswitch setup ice: move E810T functions to before device agnostic ones ice: refactor ice_vsi_is_vlan_pruning_ena ice: refactor ice_ptp_hw to make functions static ice: refactor ice_sched to make functions static ice: Utilize assign_bit() helper ice: refactor ice_vf_lib to make functions static ice: refactor ice_lib to make functions static ice: refactor ice_ddp to make functions static ice: remove unused methods ==================== Link: https://lore.kernel.org/r/20230817212239.2601543-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-19Merge branch '40GbE' of ↵Jakub Kicinski6-56/+42
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== virtchnl: fix fake 1-elem arrays Alexander Lobakin says: 6.5-rc1 started spitting warning splats when composing virtchnl messages, precisely on virtchnl_rss_key and virtchnl_lut: [ 84.167709] memcpy: detected field-spanning write (size 52) of single field "vrk->key" at drivers/net/ethernet/intel/iavf/iavf_virtchnl.c:1095 (size 1) [ 84.169915] WARNING: CPU: 3 PID: 11 at drivers/net/ethernet/intel/ iavf/iavf_virtchnl.c:1095 iavf_set_rss_key+0x123/0x140 [iavf] ... [ 84.191982] Call Trace: [ 84.192439] <TASK> [ 84.192900] ? __warn+0xc9/0x1a0 [ 84.193353] ? iavf_set_rss_key+0x123/0x140 [iavf] [ 84.193818] ? report_bug+0x12c/0x1b0 [ 84.194266] ? handle_bug+0x42/0x70 [ 84.194714] ? exc_invalid_op+0x1a/0x50 [ 84.195149] ? asm_exc_invalid_op+0x1a/0x20 [ 84.195592] ? iavf_set_rss_key+0x123/0x140 [iavf] [ 84.196033] iavf_watchdog_task+0xb0c/0xe00 [iavf] ... [ 84.225476] memcpy: detected field-spanning write (size 64) of single field "vrl->lut" at drivers/net/ethernet/intel/iavf/iavf_virtchnl.c:1127 (size 1) [ 84.227190] WARNING: CPU: 27 PID: 1044 at drivers/net/ethernet/intel/ iavf/iavf_virtchnl.c:1127 iavf_set_rss_lut+0x123/0x140 [iavf] ... [ 84.246601] Call Trace: [ 84.247228] <TASK> [ 84.247840] ? __warn+0xc9/0x1a0 [ 84.248263] ? iavf_set_rss_lut+0x123/0x140 [iavf] [ 84.248698] ? report_bug+0x12c/0x1b0 [ 84.249122] ? handle_bug+0x42/0x70 [ 84.249549] ? exc_invalid_op+0x1a/0x50 [ 84.249970] ? asm_exc_invalid_op+0x1a/0x20 [ 84.250390] ? iavf_set_rss_lut+0x123/0x140 [iavf] [ 84.250820] iavf_watchdog_task+0xb16/0xe00 [iavf] Gustavo already tried to fix those back in 2021[0][1]. Unfortunately, a VM can run a different kernel than the host, meaning that those structures are sorta ABI. However, it is possible to have proper flex arrays + struct_size() calculations and still send the very same messages with the same sizes. The common rule is: elem[1] -> elem[] size = struct_size() + <difference between the old and the new msg size> The "old" size in the current code is calculated 3 different ways for 10 virtchnl structures total. Each commit addresses one of the ways cumulatively instead of per-structure. I was planning to send it to -net initially, but given that virtchnl was renamed from i40evf and got some fat style cleanup commits in the past, it's not very straightforward to even pick appropriate SHAs, not speaking of automatic portability. I may send manual backports for a couple of the latest supported kernels later on if anyone needs it at all. [0] https://lore.kernel.org/all/20210525230912.GA175802@embeddedor [1] https://lore.kernel.org/all/20210525231851.GA176647@embeddedor * '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: virtchnl: fix fake 1-elem arrays for structures allocated as `nents` virtchnl: fix fake 1-elem arrays in structures allocated as `nents + 1` virtchnl: fix fake 1-elem arrays in structs allocated as `nents + 1` - 1 ==================== Link: https://lore.kernel.org/r/20230816210657.1326772-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski6-12/+104
Cross-merge networking fixes after downstream PR. Conflicts: drivers/net/ethernet/sfc/tc.c fa165e194997 ("sfc: don't unregister flow_indr if it was never registered") 3bf969e88ada ("sfc: add MAE table machinery for conntrack table") https://lore.kernel.org/all/20230818112159.7430e9b4@canb.auug.org.au/ No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-18Merge branch '40GbE' of ↵Jakub Kicinski4-12/+93
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2023-08-16 (iavf, i40e) This series contains updates to iavf and i40e drivers. Piotr adds checks for unsupported Flow Director rules on iavf. Andrii replaces incorrect 'write' messaging on read operations for i40e. * '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: i40e: fix misleading debug logs iavf: fix FDIR rule fields masks validation ==================== Link: https://lore.kernel.org/r/20230816193308.1307535-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-17ice: split ice_aq_wait_for_event() func into twoPrzemek Kitszel3-28/+57
Mitigate race between registering on wait list and receiving AQ Response from FW. ice_aq_prep_for_event() should be called before sending AQ command, ice_aq_wait_for_event() should be called after sending AQ command, to wait for AQ Response. Please note, that this was found by reading the code, an actual race has not yet materialized. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-08-17ice: embed &ice_rq_event_info event into struct ice_aq_taskPrzemek Kitszel3-49/+40
Expose struct ice_aq_task to callers, what takes burden of memory ownership out from AQ-wait family of functions, and reduces need for heap-based allocations. Embed struct ice_rq_event_info event into struct ice_aq_task (instead of it being a ptr) to remove some more code from the callers. Subsequent commit will improve more based on this one. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-08-17ice: ice_aq_check_events: fix off-by-one check when filling bufferPrzemek Kitszel1-6/+7
Allow task's event buffer to be filled also in the case that it's size is exactly the size of the message. Fixes: d69ea414c9b4 ("ice: implement device flash update via devlink") Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-08-17ice: drop two params from ice_aq_alloc_free_res()Przemek Kitszel4-34/+22
Drop @num_entries and @cd params, latter of which was always NULL. Number of entities to alloc is passed in internal buffer, the outer layer (that @num_entries was assigned to) meaning is closer to "the number of requests", which was =1 in all cases. ice_free_hw_res() was always called with 1 as its @num arg. Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-08-17ice: use list_for_each_entry() helperYang Yingliang1-6/+2
Convert list_for_each() to list_for_each_entry() where applicable. No functional changed. Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-08-17ice: Remove redundant VSI configuration in eswitch setupMarcin Szycik1-4/+0
Remove a call to disable VLAN stripping on switchdev control plane VSI, as it is disabled by default. Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>