summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/intel/idpf/idpf_singleq_txrx.c
AgeCommit message (Collapse)AuthorFilesLines
2025-05-30idpf: fix a race in txq wakeupBrian Vazquez1-4/+5
Add a helper function to correctly handle the lockless synchronization when the sender needs to block. The paradigm is if (no_resources()) { stop_queue(); barrier(); if (!no_resources()) restart_queue(); } netif_subqueue_maybe_stop already handles the paradigm correctly, but the code split the check for resources in three parts, the first one (descriptors) followed the protocol, but the other two (completions and tx_buf) were only doing the first part and so race prone. Luckily netif_subqueue_maybe_stop macro already allows you to use a function to evaluate the start/stop conditions so the fix only requires the right helper function to evaluate all the conditions at once. The patch removes idpf_tx_maybe_stop_common since it's no longer needed and instead adjusts separately the conditions for singleq and splitq. Note that idpf_tx_buf_hw_update doesn't need to check for resources since that will be covered in idpf_tx_splitq_frame. To reproduce: Reduce the threshold for pending completions to increase the chances of hitting this pause by changing your kernel: drivers/net/ethernet/intel/idpf/idpf_txrx.h -#define IDPF_TX_COMPLQ_OVERFLOW_THRESH(txcq) ((txcq)->desc_count >> 1) +#define IDPF_TX_COMPLQ_OVERFLOW_THRESH(txcq) ((txcq)->desc_count >> 4) Use pktgen to force the host to push small pkts very aggressively: ./pktgen_sample02_multiqueue.sh -i eth1 -s 100 -6 -d $IP -m $MAC \ -p 10000-10000 -t 16 -n 0 -v -x -c 64 Fixes: 6818c4d5b3c2 ("idpf: add splitq start_xmit") Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Josh Hay <joshua.a.hay@intel.com> Signed-off-by: Brian Vazquez <brianvv@google.com> Signed-off-by: Luigi Rizzo <lrizzo@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-30idpf: assign extracted ptype to struct libeth_rqe_info fieldMateusz Polchlopek1-14/+11
Assign the ptype extracted from qword to the ptype field of struct libeth_rqe_info. Remove the now excess ptype param of idpf_rx_singleq_extract_fields(), idpf_rx_singleq_extract_base_fields() and idpf_rx_singleq_extract_flex_fields(). Suggested-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-02-14libeth: move idpf_rx_csum_decoded and idpf_rx_extractedMateusz Polchlopek1-24/+27
Structs idpf_rx_csum_decoded and idpf_rx_extracted are used both in idpf and iavf Intel drivers. Change the prefix from idpf_* to libeth_* and move mentioned structs to libeth's rx.h header file. Adjust usage in idpf driver. Suggested-by: Alexander Lobakin <aleksander.lobakin@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-09idpf: enable WB_ON_ITRJoshua Hay1-1/+5
Tell hardware to write back completed descriptors even when interrupts are disabled. Otherwise, descriptors might not be written back until the hardware can flush a full cacheline of descriptors. This can cause unnecessary delays when traffic is light (or even trigger Tx queue timeout). The example scenario to reproduce the Tx timeout if the fix is not applied: - configure at least 2 Tx queues to be assigned to the same q_vector, - generate a huge Tx traffic on the first Tx queue - try to send a few packets using the second Tx queue. In such a case Tx timeout will appear on the second Tx queue because no completion descriptors are written back for that queue while interrupts are disabled due to NAPI polling. Fixes: c2d548cad150 ("idpf: add TX splitq napi poll support") Fixes: a5ab9ee0df0b ("idpf: add singleq start_xmit and napi poll") Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Co-developed-by: Michal Kubiak <michal.kubiak@intel.com> Signed-off-by: Michal Kubiak <michal.kubiak@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-09idpf: fix netdev Tx queue stop/wakeMichal Kubiak1-0/+4
netif_txq_maybe_stop() returns -1, 0, or 1, while idpf_tx_maybe_stop_common() says it returns 0 or -EBUSY. As a result, there sometimes are Tx queue timeout warnings despite that the queue is empty or there is at least enough space to restart it. Make idpf_tx_maybe_stop_common() inline and returning true or false, handling the return of netif_txq_maybe_stop() properly. Use a correct goto in idpf_tx_maybe_stop_splitq() to avoid stopping the queue or incrementing the stops counter twice. Fixes: 6818c4d5b3c2 ("idpf: add splitq start_xmit") Fixes: a5ab9ee0df0b ("idpf: add singleq start_xmit and napi poll") Cc: stable@vger.kernel.org # 6.7+ Signed-off-by: Michal Kubiak <michal.kubiak@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-09idpf: refactor Tx completion routinesJoshua Hay1-10/+14
Add a mechanism to guard against stashing partial packets into the hash table to make the driver more robust, with more efficient decision making when cleaning. Don't stash partial packets. This can happen when an RE (Report Event) completion is received in flow scheduling mode, or when an out of order RS (Report Status) completion is received. The first buffer with the skb is stashed, but some or all of its frags are not because the stack is out of reserve buffers. This leaves the ring in a weird state since the frags are still on the ring. Use the field libeth_sqe::nr_frags to track the number of fragments/tx_bufs representing the packet. The clean routines check to make sure there are enough reserve buffers on the stack before stashing any part of the packet. If there are not, next_to_clean is left pointing to the first buffer of the packet that failed to be stashed. This leaves the whole packet on the ring, and the next time around, cleaning will start from this packet. An RS completion is still expected for this packet in either case. So instead of being cleaned from the hash table, it will be cleaned from the ring directly. This should all still be fine since the DESC_UNUSED and BUFS_UNUSED will reflect the state of the ring. If we ever fall below the thresholds, the TxQ will still be stopped, giving the completion queue time to catch up. This may lead to stopping the queue more frequently, but it guarantees the Tx ring will always be in a good state. Also, always use the idpf_tx_splitq_clean function to clean descriptors, i.e. use it from clean_buf_ring as well. This way we avoid duplicating the logic and make sure we're using the same reserve buffers guard rail. This does require a switch from the s16 next_to_clean overflow descriptor ring wrap calculation to u16 and the normal ring size check. Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-09idpf: convert to libeth Tx buffer completionAlexander Lobakin1-52/+30
&idpf_tx_buffer is almost identical to the previous generations, as well as the way it's handled. Moreover, relying on dma_unmap_addr() and !!buf->skb instead of explicit defining of buffer's type was never good. Use the newly added libeth helpers to do it properly and reduce the copy-paste around the Tx code. Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-07-10idpf: use libeth Rx buffer management for payload bufferAlexander Lobakin1-13/+14
idpf uses Page Pool for data buffers with hardcoded buffer lengths of 4k for "classic" buffers and 2k for "short" ones. This is not flexible and does not ensure optimal memory usage. Why would you need 4k buffers when the MTU is 1500? Use libeth for the data buffers and don't hardcode any buffer sizes. Let them be calculated from the MTU for "classics" and then divide the truesize by 2 for "short" ones. The memory usage is now greatly reduced and 2 buffer queues starts make sense: on frames <= 1024, you'll recycle (and resync) a page only after 4 HW writes rather than two. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-07-10idpf: convert header split mode to libeth + napi_build_skb()Alexander Lobakin1-0/+1
Currently, idpf uses the following model for the header buffers: * buffers are allocated via dma_alloc_coherent(); * when receiving, napi_alloc_skb() is called and then the header is copied to the newly allocated linear part. This is far from optimal as DMA coherent zone is slow on many systems and memcpy() neutralizes the idea and benefits of the header split. Not speaking of that XDP can't be run on DMA coherent buffers, but at the same time the idea of allocating an skb to run XDP program is ill. Instead, use libeth to create page_pools for the header buffers, allocate them dynamically and then build an skb via napi_build_skb() around them with no memory copy. With one exception... When you enable header split, you expect you'll always have a separate header buffer, so that you could reserve headroom and tailroom only there and then use full buffers for the data. For example, this is how TCP zerocopy works -- you have to have the payload aligned to PAGE_SIZE. The current hardware running idpf does *not* guarantee that you'll always have headers placed separately. For example, on my setup, even ICMP packets are written as one piece to the data buffers. You can't build a valid skb around a data buffer in this case. To not complicate things and not lose TCP zerocopy etc., when such thing happens, use the empty header buffer and pull either full frame (if it's short) or the Ethernet header there and build an skb around it. GRO layer will pull more from the data buffer later. This W/A will hopefully be removed one day. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-07-10idpf: reuse libeth's definitions of parsed ptype structuresAlexander Lobakin1-64/+49
idpf's in-kernel parsed ptype structure is almost identical to the one used in the previous Intel drivers, which means it can be converted to use libeth's definitions and even helpers. The only difference is that it doesn't use a constant table (libie), rather than one obtained from the device. Remove the driver counterpart and use libeth's helpers for hashes and checksums. This slightly optimizes skb fields processing due to faster checks. Also don't define big static array of ptypes in &idpf_vport -- allocate them dynamically. The pointer to it is anyway cached in &idpf_rx_queue. Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-07-10idpf: merge singleq and splitq &net_device_opsAlexander Lobakin1-29/+2
It makes no sense to have a second &net_device_ops struct (800 bytes of rodata) with only one difference in .ndo_start_xmit, which can easily be just one `if`. This `if` is a drop in the ocean and you won't see any difference. Define unified idpf_xmit_start(). The preparation for sending is the same, just call either idpf_tx_splitq_frame() or idpf_tx_singleq_frame() depending on the active model to actually map and send the skb. Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-07-10idpf: split &idpf_queue into 4 strictly-typed queue structuresAlexander Lobakin1-68/+76
Currently, sizeof(struct idpf_queue) is 32 Kb. This is due to the 12-bit hashtable declaration at the end of the queue. This HT is needed only for Tx queues when the flow scheduling mode is enabled. But &idpf_queue is unified for all of the queue types, provoking excessive memory usage. The unified structure in general makes the code less effective via suboptimal fields placement. You can't avoid that unless you make unions each 2 fields. Even then, different field alignment etc., doesn't allow you to optimize things to the limit. Split &idpf_queue into 4 structures corresponding to the queue types: RQ (Rx queue), SQ (Tx queue), FQ (buffer queue), and CQ (completion queue). Place only needed fields there and shortcuts handy for hotpath. Allocate the abovementioned hashtable dynamically and only when needed, keeping &idpf_tx_queue relatively short (192 bytes, same as Rx). This HT is used only for OOO completions, which aren't really hotpath anyway. Note that this change must be done atomically, otherwise it's really easy to get lost and miss something. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-07-10idpf: stop using macros for accessing queue descriptorsAlexander Lobakin1-10/+10
In C, we have structures and unions. Casting `void *` via macros is not only error-prone, but also looks confusing and awful in general. In preparation for splitting the queue structs, replace it with a union and direct array dereferences. Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Mina Almasry <almasrymina@google.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-01-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-1/+0
Cross-merge networking fixes after downstream PR. Conflicts: drivers/net/ethernet/broadcom/bnxt/bnxt.c e009b2efb7a8 ("bnxt_en: Remove mis-applied code from bnxt_cfg_ntp_filters()") 0f2b21477988 ("bnxt_en: Fix compile error without CONFIG_RFS_ACCEL") https://lore.kernel.org/all/20240105115509.225aa8a2@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-26idpf: fix corrupted frames and skb leaks in singleq modeAlexander Lobakin1-1/+0
idpf_ring::skb serves only for keeping an incomplete frame between several NAPI Rx polling cycles, as one cycle may end up before processing the end of packet descriptor. The pointer is taken from the ring onto the stack before entering the loop and gets written there after the loop exits. When inside the loop, only the onstack pointer is used. For some reason, the logics is broken in the singleq mode, where the pointer is taken from the ring each iteration. This means that if a frame got fragmented into several descriptors, each fragment will have its own skb, but only the last one will be passed up the stack (containing garbage), leaving the rest leaked. Then, on ifdown, rxq::skb is being freed only in the splitq mode, while it can point to a valid skb in singleq as well. This can lead to a yet another skb leak. Just don't touch the ring skb field inside the polling loop, letting the onstack skb pointer work as expected: build a new skb if it's the first frame descriptor and attach a frag otherwise. On ifdown, free rxq::skb unconditionally if the pointer is non-NULL. Fixes: a5ab9ee0df0b ("idpf: add singleq start_xmit and napi poll") Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Tested-by: Scott Register <scott.register@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-18idpf: refactor some missing field get/prep conversionsJesse Brandeburg1-4/+3
Most of idpf correctly uses FIELD_GET and FIELD_PREP, but a couple spots were missed so fix those. Automated conversion with coccinelle script and manually fixed up, including audits for opportunities to convert to {get,encode,replace} bits functions. Add conversions to le16_get/encode/replace_bits where appropriate. And in one place fix up a cast from a u16 to a u16. @prep2@ constant shift,mask; type T; expression a; @@ -(((T)(a) << shift) & mask) +FIELD_PREP(mask, a) @prep@ constant shift,mask; type T; expression a; @@ -((T)((a) << shift) & mask) +FIELD_PREP(mask, a) @get@ constant shift,mask; type T; expression a; @@ -((T)((a) & mask) >> shift) +FIELD_GET(mask, a) and applied via: spatch --sp-file field_prep.cocci --in-place --dir \ drivers/net/ethernet/intel/ CC: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-14idpf: add singleq start_xmit and napi pollJoshua Hay1-2/+1117
Add the start_xmit, TX and RX napi poll support for the single queue model. Unlike split queue model, single queue uses same queue to post buffer descriptors and completed descriptors. Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Co-developed-by: Alan Brady <alan.brady@intel.com> Signed-off-by: Alan Brady <alan.brady@intel.com> Co-developed-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Madhu Chittim <madhu.chittim@intel.com> Co-developed-by: Phani Burra <phani.r.burra@intel.com> Signed-off-by: Phani Burra <phani.r.burra@intel.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Co-developed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-14idpf: initialize interrupts and enable vportPavan Kumar Linga1-0/+11
To further continue 'vport open', initialize all the resources required for the interrupts. To start with, initialize the queue vector indices with the ones received from the device Control Plane. Now that all the TX and RX queues are initialized, map the RX descriptor and buffer queues as well as TX completion queues to the allocated vectors. Initialize and enable the napi handler for the napi polling. Finally, request the IRQs for the interrupt vectors from the stack and setup the interrupt handler. Once the interrupt init is done, send 'map queue vector', 'enable queues' and 'enable vport' virtchnl messages to the CP to complete the 'vport open' flow. Co-developed-by: Alan Brady <alan.brady@intel.com> Signed-off-by: Alan Brady <alan.brady@intel.com> Co-developed-by: Joshua Hay <joshua.a.hay@intel.com> Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Co-developed-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Madhu Chittim <madhu.chittim@intel.com> Co-developed-by: Phani Burra <phani.r.burra@intel.com> Signed-off-by: Phani Burra <phani.r.burra@intel.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-14idpf: configure resources for RX queuesAlan Brady1-0/+57
Similar to the TX, RX also supports both single and split queue models. In single queue model, the same descriptor queue is used by SW to post buffer descriptors to HW and by HW to post completed descriptors to SW. In split queue model, "RX buffer queues" are used to pass descriptor buffers from SW to HW whereas "RX queues" are used to post the descriptor completions i.e. descriptors that point to completed buffers, from HW to SW. "RX queue group" is a set of RX queues grouped together and will be serviced by a "RX buffer queue group". IDPF supports 2 buffer queues i.e. large buffer (4KB) queue and small buffer (2KB) queue per buffer queue group. HW uses large buffers for 'hardware gro' feature and also if the packet size is more than 2KB, if not 2KB buffers are used. Add all the resources required for the RX queues initialization. Allocate memory for the RX queue and RX buffer queue groups. Initialize the software maintained refill queues for buffer management algorithm. Same like the TX queues, initialize the queue parameters for the RX queues and send the config RX queue virtchnl message to the device Control Plane. Signed-off-by: Alan Brady <alan.brady@intel.com> Co-developed-by: Alice Michael <alice.michael@intel.com> Signed-off-by: Alice Michael <alice.michael@intel.com> Co-developed-by: Joshua Hay <joshua.a.hay@intel.com> Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Co-developed-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Madhu Chittim <madhu.chittim@intel.com> Co-developed-by: Phani Burra <phani.r.burra@intel.com> Signed-off-by: Phani Burra <phani.r.burra@intel.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Co-developed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>