summaryrefslogtreecommitdiff
path: root/include/linux/mlx5
diff options
context:
space:
mode:
authorOfer Levi <oferle@mellanox.com>2020-05-17 10:16:49 +0300
committerSaeed Mahameed <saeedm@nvidia.com>2020-09-15 21:59:53 +0300
commitb7cf0806e8783e38f881cae3c56f0869e70b8da2 (patch)
treeeed5f62532eed307fc508cb9b2a3389fe9ef8896 /include/linux/mlx5
parent748cde9a3802e1ababedabbf759b3eedbaeaba52 (diff)
downloadlinux-b7cf0806e8783e38f881cae3c56f0869e70b8da2.tar.xz
net/mlx5e: Add CQE compression support for multi-strides packets
Add CQE compression support for completions of packets that span multiple strides in a Striding RQ, per the HW capability. In our memory model, we use small strides (256B as of today) for the non-linear SKB mode. This feature allows CQE compression to work also for multiple strides packets. In this case decompressing the mini CQE array will use stride index provided by HW as part of the mini CQE. Before this feature, compression was possible only for single-strided packets, i.e. for packets of size up to 256 bytes when in non-linear mode, and the index was maintained by SW. This feature is supported for ConnectX-5 and above. Feature performance test: This was whitebox-tested, we reduced the PCI speed from 125Gb/s to 62.5Gb/s to overload pci and manipulated mlx5 driver to drop incoming packets before building the SKB to achieve low cpu utilization. Outcome is low cpu utilization and bottleneck on pci only. Test setup: Server: Intel(R) Xeon(R) Silver 4108 CPU @ 1.80GHz server, 32 cores NIC: ConnectX-6 DX. Sender side generates 300 byte packets at full pci bandwidth. Receiver side configuration: Single channel, one cpu processing with one ring allocated. Cpu utilization is ~20% while pci bandwidth is fully utilized. For the generated traffic and interface MTU of 4500B (to activate the non-linear SKB mode), packet rate improvement is about 19% from ~17.6Mpps to ~21Mpps. Without this feature, counters show no CQE compression blocks for this setup, while with the feature, counters show ~20.7Mpps compressed CQEs in ~500K compression blocks. Signed-off-by: Ofer Levi <oferle@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Diffstat (limited to 'include/linux/mlx5')
-rw-r--r--include/linux/mlx5/device.h3
1 files changed, 2 insertions, 1 deletions
diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h
index 4d3376e20f5e..81ca5989009b 100644
--- a/include/linux/mlx5/device.h
+++ b/include/linux/mlx5/device.h
@@ -816,7 +816,7 @@ struct mlx5_mini_cqe8 {
__be32 rx_hash_result;
struct {
__be16 checksum;
- __be16 rsvd;
+ __be16 stridx;
};
struct {
__be16 wqe_counter;
@@ -836,6 +836,7 @@ enum {
enum {
MLX5_CQE_FORMAT_CSUM = 0x1,
+ MLX5_CQE_FORMAT_CSUM_STRIDX = 0x3,
};
#define MLX5_MINI_CQE_ARRAY_SIZE 8