summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2024-02-01octeontx2-pf: Remove xdp queues on program detachGeetha sowjanya3-7/+4
XDP queues are created/destroyed when a XDP program is attached/detached. In current driver xdp_queues are not getting destroyed on program exit due to incorrect xdp_queue and tot_tx_queue count values. This patch fixes the issue by setting tot_tx_queue and xdp_queue count to correct values. It also fixes xdp.data_hard_start address. Fixes: 06059a1a9a4a ("octeontx2-pf: Add XDP support to netdev PF") Signed-off-by: Geetha sowjanya <gakula@marvell.com> Link: https://lore.kernel.org/r/20240130120610.16673-1-gakula@marvell.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-01Merge branch 'ena-driver-changes'Paolo Abeni8-329/+258
David Arinzon says: ==================== ENA driver changes From: David Arinzon <darinzon@amazon.com> This patchset contains a set of minor and cosmetic changes to the ENA driver. Changes from v1: - Address comments from Shannon Nelson ==================== Link: https://lore.kernel.org/r/20240130095353.2881-1-darinzon@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-01net: ena: Reduce lines with longer column width boundaryDavid Arinzon4-260/+151
This patch reduces some of the lines by removing newlines where more variables or print strings can be pushed back to the previous line while still adhering to the styling guidelines. Signed-off-by: David Arinzon <darinzon@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-01net: ena: handle ena_calc_io_queue_size() possible errorsDavid Arinzon1-3/+21
Fail queue size calculation when the device returns maximum TX/RX queue sizes that are smaller than the allowed minimum. Signed-off-by: Osama Abboud <osamaabb@amazon.com> Signed-off-by: David Arinzon <darinzon@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-01net: ena: Change default print level for netif_ printsDavid Arinzon1-2/+2
The netif_* functions are used by the driver to log events into the kernel ring (dmesg) similar to the netdev_* ones. Unlike the latter, the netif_* function family allow the user to choose what events get logged using ethtool: sudo ethtool -s [interface] msglvl [msg_type] on By default the events which get logged are slow-path related and aren't printed often (e.g. interface up related prints). This patch removes the NETIF_MSG_TX_DONE type (called every TX completion polling) from the defaults and adds NETIF_MSG_IFDOWN instead as it makes more sensible defaults. This patch also transforms ena_down() print from netif_info into netif_dbg (same as the analogue print in ena_up()) as it suits it better. Signed-off-by: Shay Agroskin <shayagr@amazon.com> Signed-off-by: David Arinzon <darinzon@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-01net: ena: Relocate skb_tx_timestamp() to improve time stamping accuracyDavid Arinzon1-2/+2
Move skb_tx_timestamp() closer to the actual time the driver sends the packets to the device. Signed-off-by: Osama Abboud <osamaabb@amazon.com> Signed-off-by: David Arinzon <darinzon@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-01net: ena: Add more information on TX timeoutsDavid Arinzon2-14/+64
The function responsible for polling TX completions might not receive the CPU resources it needs due to higher priority tasks running on the requested core. The driver might not be able to recognize such cases, but it can use its state to suspect that they happened. If both conditions are met: - napi hasn't been executed more than the TX completion timeout value - napi is scheduled (meaning that we've received an interrupt) Then it's more likely that the napi handler isn't scheduled because of an overloaded CPU. It was decided that for this case, the driver would wait twice as long as the regular timeout before scheduling a reset. The driver uses ENA_REGS_RESET_SUSPECTED_POLL_STARVATION reset reason to indicate this case to the device. This patch also adds more information to the ena_tx_timeout() callback. This function is called by the kernel when it detects that a specific TX queue has been closed for too long. Signed-off-by: Shay Agroskin <shayagr@amazon.com> Signed-off-by: David Arinzon <darinzon@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-01net: ena: Change error print during ena_device_init()David Arinzon1-1/+2
The print was re-worded to a more informative one. Signed-off-by: Shahar Itzko <itzko@amazon.com> Signed-off-by: David Arinzon <darinzon@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-01net: ena: Remove CQ tail pointer updateDavid Arinzon5-39/+2
The functionality was added to allow the drivers to create an SQ and CQ of different sizes. When the RX/TX SQ and CQ have the same size, such update isn't necessary as the device can safely assume it doesn't override unprocessed completions. However, if the SQ is larger than the CQ, the device might "have" more completions it wants to update about than there's room in the CQ. There's no support for different SQ and CQ sizes, therefore, removing the API and its usage. '____cacheline_aligned' compiler attribute was added to 'struct ena_com_io_cq' to ensure that the removal of the 'cq_head_db_reg' field doesn't change the cache-line layout of this struct. Signed-off-by: Shay Agroskin <shayagr@amazon.com> Signed-off-by: David Arinzon <darinzon@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-01net: ena: Enable DIM by defaultDavid Arinzon1-0/+6
Dynamic Interrupt Moderation (DIM) is a technique designed to balance the need for timely data processing with the desire to minimize CPU overhead. Instead of generating an interrupt for every received packet, the system can dynamically adjust the rate at which interrupts are generated based on the incoming traffic patterns. Enabling DIM by default to improve the user experience. DIM can be turned on/off through ethtool: `ethtool -C <interface> adaptive-rx <on/off>` Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: Osama Abboud <osamaabb@amazon.com> Signed-off-by: David Arinzon <darinzon@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-01net: ena: Minor cosmetic changesDavid Arinzon1-4/+2
A few changes for better readability and style 1. Adding / Removing newlines 2. Removing an unnecessary and confusing comment 3. Using an existing variable rather than re-checking a field Signed-off-by: Shay Agroskin <shayagr@amazon.com> Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David Arinzon <darinzon@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-01net: ena: Add more documentation for RX copybreakDavid Arinzon1-0/+6
This patch contains more details about the functionality of RX copybreak. Signed-off-by: Shay Agroskin <shayagr@amazon.com> Signed-off-by: David Arinzon <darinzon@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-01net: ena: Remove an unused fieldDavid Arinzon2-4/+0
Remove io_sq->header_addr field because it is no longer in use. LLQ was updated to support a bounce buffer so there is no need in saving the header address of the sq. Signed-off-by: Nati Koler <nkoler@amazon.com> Signed-off-by: David Arinzon <darinzon@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-01Merge branch 'net-mana-assigning-irq-affinity-on-ht-cores'Paolo Abeni4-10/+113
Souradeep Chakrabarti says: ==================== net: mana: Assigning IRQ affinity on HT cores This patch set introduces a new helper function irq_setup(), to optimize IRQ distribution for MANA network devices. The patch set makes the driver working 15% faster than with cpumask_local_spread(). ==================== Link: https://lore.kernel.org/r/1706509267-17754-1-git-send-email-schakrabarti@linux.microsoft.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-01net: mana: Assigning IRQ affinity on HT coresSouradeep Chakrabarti1-11/+50
Existing MANA design assigns IRQ to every CPU, including sibling hyper-threads. This may cause multiple IRQs to be active simultaneously in the same core and may reduce the network performance. Improve the performance by assigning IRQ to non sibling CPUs in local NUMA node. The performance improvement we are getting using ntttcp with following patch is around 15 percent against existing design and approximately 11 percent, when trying to assign one IRQ in each core across NUMA nodes, if enough cores are present. The change will improve the performance for the system with high number of CPU, where number of CPUs in a node is more than 64 CPUs. Nodes with 64 CPUs or less than 64 CPUs will not be affected by this change. The performance study was done using ntttcp tool in Azure. The node had 2 nodes with 32 cores each, total 128 vCPU and number of channels were 32 for 32 RX rings. The below table shows a comparison between existing design and new design: IRQ node-num core-num CPU performance(%) 1 0 | 0 0 | 0 0 | 0-1 0 2 0 | 0 0 | 1 1 | 2-3 3 3 0 | 0 1 | 2 2 | 4-5 10 4 0 | 0 1 | 3 3 | 6-7 15 5 0 | 0 2 | 4 4 | 8-9 15 ... ... 25 0 | 0 12| 24 24| 48-49 12 ... 32 0 | 0 15| 31 31| 62-63 12 33 0 | 0 16| 0 32| 0-1 10 ... 64 0 | 0 31| 31 63| 62-63 0 Signed-off-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-01net: mana: add a function to spread IRQs per CPUsYury Norov1-0/+29
Souradeep investigated that the driver performs faster if IRQs are spread on CPUs with the following heuristics: 1. No more than one IRQ per CPU, if possible; 2. NUMA locality is the second priority; 3. Sibling dislocality is the last priority. Let's consider this topology: Node 0 1 Core 0 1 2 3 CPU 0 1 2 3 4 5 6 7 The most performant IRQ distribution based on the above topology and heuristics may look like this: IRQ Nodes Cores CPUs 0 1 0 0-1 1 1 1 2-3 2 1 0 0-1 3 1 1 2-3 4 2 2 4-5 5 2 3 6-7 6 2 2 4-5 7 2 3 6-7 The irq_setup() routine introduced in this patch leverages the for_each_numa_hop_mask() iterator and assigns IRQs to sibling groups as described above. According to [1], for NUMA-aware but sibling-ignorant IRQ distribution based on cpumask_local_spread() performance test results look like this: ./ntttcp -r -m 16 NTTTCP for Linux 1.4.0 --------------------------------------------------------- 08:05:20 INFO: 17 threads created 08:05:28 INFO: Network activity progressing... 08:06:28 INFO: Test run completed. 08:06:28 INFO: Test cycle finished. 08:06:28 INFO: ##### Totals: ##### 08:06:28 INFO: test duration :60.00 seconds 08:06:28 INFO: total bytes :630292053310 08:06:28 INFO: throughput :84.04Gbps 08:06:28 INFO: retrans segs :4 08:06:28 INFO: cpu cores :192 08:06:28 INFO: cpu speed :3799.725MHz 08:06:28 INFO: user :0.05% 08:06:28 INFO: system :1.60% 08:06:28 INFO: idle :96.41% 08:06:28 INFO: iowait :0.00% 08:06:28 INFO: softirq :1.94% 08:06:28 INFO: cycles/byte :2.50 08:06:28 INFO: cpu busy (all) :534.41% For NUMA- and sibling-aware IRQ distribution, the same test works 15% faster: ./ntttcp -r -m 16 NTTTCP for Linux 1.4.0 --------------------------------------------------------- 08:08:51 INFO: 17 threads created 08:08:56 INFO: Network activity progressing... 08:09:56 INFO: Test run completed. 08:09:56 INFO: Test cycle finished. 08:09:56 INFO: ##### Totals: ##### 08:09:56 INFO: test duration :60.00 seconds 08:09:56 INFO: total bytes :741966608384 08:09:56 INFO: throughput :98.93Gbps 08:09:56 INFO: retrans segs :6 08:09:56 INFO: cpu cores :192 08:09:56 INFO: cpu speed :3799.791MHz 08:09:56 INFO: user :0.06% 08:09:56 INFO: system :1.81% 08:09:56 INFO: idle :96.18% 08:09:56 INFO: iowait :0.00% 08:09:56 INFO: softirq :1.95% 08:09:56 INFO: cycles/byte :2.25 08:09:56 INFO: cpu busy (all) :569.22% [1] https://lore.kernel.org/all/20231211063726.GA4977@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net/ Signed-off-by: Yury Norov <yury.norov@gmail.com> Co-developed-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-01cpumask: define cleanup function for cpumasksYury Norov1-0/+3
Now we can simplify code that allocates cpumasks for local needs. Signed-off-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-01cpumask: add cpumask_weight_andnot()Yury Norov3-0/+32
Similarly to cpumask_weight_and(), cpumask_weight_andnot() is a handy helper that may help to avoid creating an intermediate mask just to calculate number of bits that set in a 1st given mask, and clear in 2nd one. Signed-off-by: Yury Norov <yury.norov@gmail.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-01firewire: core: search descriptor leaf just after vendor directory entry in ↵Takashi Sakamoto1-1/+10
root directory It appears that Sony DVMC-DA1 has a quirk that the descriptor leaf entry locates just after the vendor directory entry in root directory. This is not conformant to the legacy layout of configuration ROM described in Configuration ROM for AV/C Devices 1.0 (1394 Trading Association, Dec 2000, TA Document 1999027). This commit changes current implementation to parse configuration ROM for device attributes so that the descriptor leaf entry can be detected for the vendor name. $ config-rom-pretty-printer < Sony-DVMC-DA1.img ROM header and bus information block ----------------------------------------------------------------- 1024 041ee7fb bus_info_length 4, crc_length 30, crc 59387 1028 31333934 bus_name "1394" 1032 e0644000 irmc 1, cmc 1, isc 1, bmc 0, cyc_clk_acc 100, max_rec 4 (32) 1036 08004603 company_id 080046 | 1040 0014193c device_id 12886219068 | EUI-64 576537731003586876 root directory ----------------------------------------------------------------- 1044 0006b681 directory_length 6, crc 46721 1048 03080046 vendor 1052 0c0083c0 node capabilities: per IEEE 1394 1056 8d00000a --> eui-64 leaf at 1096 1060 d1000003 --> unit directory at 1072 1064 c3000005 --> vendor directory at 1084 1068 8100000a --> descriptor leaf at 1108 unit directory at 1072 ----------------------------------------------------------------- 1072 0002cdbf directory_length 2, crc 52671 1076 1200a02d specifier id 1080 13010000 version vendor directory at 1084 ----------------------------------------------------------------- 1084 00020cfe directory_length 2, crc 3326 1088 17fa0000 model 1092 81000008 --> descriptor leaf at 1124 eui-64 leaf at 1096 ----------------------------------------------------------------- 1096 0002c66e leaf_length 2, crc 50798 1100 08004603 company_id 080046 | 1104 0014193c device_id 12886219068 | EUI-64 576537731003586876 descriptor leaf at 1108 ----------------------------------------------------------------- 1108 00039e26 leaf_length 3, crc 40486 1112 00000000 textual descriptor 1116 00000000 minimal ASCII 1120 536f6e79 "Sony" descriptor leaf at 1124 ----------------------------------------------------------------- 1124 0005001d leaf_length 5, crc 29 1128 00000000 textual descriptor 1132 00000000 minimal ASCII 1136 44564d43 "DVMC" 1140 2d444131 "-DA1" 1144 00000000 Suggested-by: Adam Goldman <adamg@pobox.com> Tested-by: Adam Goldman <adamg@pobox.com> Link: https://lore.kernel.org/r/20240130100409.30128-3-o-takashi@sakamocchi.jp Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-02-01firewire: core: correct documentation of fw_csr_string() kernel APITakashi Sakamoto1-4/+3
Against its current description, the kernel API can accepts all types of directory entries. This commit corrects the documentation. Cc: stable@vger.kernel.org Fixes: 3c2c58cb33b3 ("firewire: core: fw_csr_string addendum") Link: https://lore.kernel.org/r/20240130100409.30128-2-o-takashi@sakamocchi.jp Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-02-01net: dsa: Add KSZ8567 switch supportPhilippe Schenker5-1/+53
This commit introduces support for the KSZ8567, a robust 7-port Ethernet switch. The KSZ8567 features two RGMII/MII/RMII interfaces, each capable of gigabit speeds, complemented by five 10/100 Mbps MAC/PHYs. Signed-off-by: Philippe Schenker <philippe.schenker@impulsing.ch> Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://lore.kernel.org/r/20240130083419.135763-2-dev@pschenker.ch Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-01dt-bindings: net: dsa: Add KSZ8567 switch supportPhilippe Schenker1-0/+1
This commit adds the dt-binding for KSZ8567, a robust 7-port Ethernet switch. The KSZ8567 features two RGMII/MII/RMII interfaces, each capable of gigabit speeds, complemented by five 10/100 Mbps MAC/PHYs. This binding is necessary to set specific capabilities for this switch chip that are necessary due to the ksz dsa driver only accepting specific chip ids. The KSZ8567 is very similar to KSZ9567 however only containing 100 Mbps phys on its downstream ports. Signed-off-by: Philippe Schenker <philippe.schenker@impulsing.ch> Acked-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://lore.kernel.org/r/20240130083419.135763-1-dev@pschenker.ch Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-01dt-bindings: net: qcom,ipa: do not override firmware-name $refKrzysztof Kozlowski1-1/+1
dtschema package defines firmware-name as string-array, so individual bindings should not make it a string but instead just narrow the number of expected firmware file names. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Acked-by: Conor Dooley <conor.dooley@microchip.com> Acked-by: Alex Elder <elder@linaro.org> Link: https://lore.kernel.org/r/20240129142121.102450-1-krzysztof.kozlowski@linaro.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-01Merge branch 'tools-net-ynl-add-features-for-tc-family'Jakub Kicinski7-187/+2221
Donald Hunter says: ==================== tools/net/ynl: Add features for tc family Add features to ynl for tc and update the tc spec to use them. Patch 1 adds an option to output json instead of python pretty printing. Patch 2, 3 adds support and docs for sub-messages in nested attribute spaces that reference keys from a parent space. Patches 4 and 7-9 refactor ynl in support of nested struct definitions Patch 5 implements sub-message encoding for write ops. Patch 6 adds logic to set default zero values for binary blobs Patches 10, 11 adds support and docs for nested struct definitions Patch 12 updates the ynl doc generator to include type information for struct members. Patch 13 updates the tc spec - still a work in progress but more complete ==================== Link: https://lore.kernel.org/r/20240129223458.52046-1-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-01doc/netlink/specs: Update the tc specDonald Hunter1-101/+2017
Fill in many of the gaps in the tc netlink spec, including stats attrs, classes and actions. Many documentation strings have also been added. This is still a work in progress, albeit fairly complete: - there are still many attributes left as binary blobs. - actions have not had much testing Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240129223458.52046-14-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-01tools/net/ynl: Add type info to struct members in generated docsDonald Hunter1-1/+8
Extend the ynl doc generator to include type information for struct members, ignoring the pad type. Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240129223458.52046-13-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-01doc/netlink: Describe nested structs in netlink raw docsDonald Hunter1-0/+34
Add a description and example of nested struct definitions to the netlink raw documentation. Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240129223458.52046-12-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-01tools/net/ynl: Add support for nested structsDonald Hunter3-9/+34
Make it possible for struct definitions to reference other struct definitions ofr binary members. For example, the tbf qdisc uses this struct definition for its parms attribute: - name: tc-tbf-qopt type: struct members: - name: rate type: binary struct: tc-ratespec - name: peakrate type: binary struct: tc-ratespec - name: limit type: u32 - name: buffer type: u32 - name: mtu type: u32 This adds the necessary schema changes and adds nested struct encoding and decoding to ynl. Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240129223458.52046-11-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-01tools/net/ynl: Move formatted_string method out of NlAttrDonald Hunter1-16/+15
The formatted_string() class method was in NlAttr so that it could be accessed by NlAttr.as_struct(). Now that as_struct() has been removed, move formatted_string() to YnlFamily as an internal helper method. Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Breno Leitao <leitao@debian.org> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240129223458.52046-10-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-01tools/net/ynl: Rename _fixed_header_size() to _struct_size()Donald Hunter1-6/+6
Refactor the _fixed_header_size() method to be _struct_size() so that naming is consistent with _encode_struct() and _decode_struct(). Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240129223458.52046-9-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-01tools/net/ynl: Combine struct decoding logic in ynlDonald Hunter1-33/+14
_decode_fixed_header() and NlAttr.as_struct() both implemented struct decoding logic. Deduplicate the code into newly named _decode_struct() method. Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240129223458.52046-8-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-01tools/net/ynl: Encode default values for binary blobsDonald Hunter1-2/+7
Add support for defaulting binary byte arrays to all zeros as well as defaulting scalar values to 0 when encoding input parameters. Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240129223458.52046-7-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-01tools/net/ynl: Add support for encoding sub-messagesDonald Hunter1-4/+23
Add sub-message encoding to ynl. This makes it possible to create tc qdiscs and other polymorphic netlink objects. Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240129223458.52046-6-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-01tools/net/ynl: Refactor fixed header encoding into separate methodDonald Hunter1-11/+15
Refactor the fixed header encoding into a separate _encode_struct method so that it can be reused for fixed headers in sub-messages and for encoding structs. Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Breno Leitao <leitao@debian.org> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240129223458.52046-5-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-01doc/netlink: Describe sub-message selector resolutionDonald Hunter1-0/+8
Update the netlink-raw docs to add a description of sub-message selector resolution to explain that selector resolution is constrained by the spec. Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240129223458.52046-4-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-01tools/net/ynl: Support sub-messages in nested attribute spacesDonald Hunter1-9/+29
Sub-message selectors could only be resolved using values from the current nest level. Enable value lookup in outer scopes by using collections.ChainMap to implement an ordered lookup from nested to outer scopes. Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240129223458.52046-3-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-01tools/net/ynl: Add --output-json arg to ynl cliDonald Hunter1-3/+19
The ynl cli currently emits python pretty printed structures which is hard to consume. Add a new --output-json argument to emit JSON. Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Breno Leitao <leitao@debian.org> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240129223458.52046-2-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-01selftests: Declare local variable for pause in fcnal-test.shDavid Ahern1-4/+5
Running fcnal-test.sh script with -P argument is causing test failures: $ ./fcnal-test.sh -t ping -P TEST: ping out - ns-B IP [ OK ] hit enter to continue, 'q' to quit fcnal-test.sh: line 106: [: ping: integer expression expected TEST: out, [FAIL] expected rc ping; actual rc 0 hit enter to continue, 'q' to quit The test functions use local variable 'a' for addresses and then log_test is also using 'a' without a local declaration. Fix by declaring a local variable and using 'ans' (for answer) in the read. Signed-off-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240130154327.33848-1-dsahern@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-01Merge branch 'selftests-net-a-few-pmtu-sh-fixes'Jakub Kicinski2-9/+12
Paolo Abeni says: ==================== selftests: net: a few pmtu.sh fixes This series try to address CI failures for the pmtu.sh tests. It does _not_ attempt to enable all the currently skipped cases, to avoid adding more entropy. Tested with: make -C tools/testing/selftests/ TARGETS=net install vng --build --config tools/testing/selftests/net/config vng --run . --user root -- \ ./tools/testing/selftests/kselftest_install/run_kselftest.sh \ -t net:pmtu.sh ==================== Link: https://lore.kernel.org/r/cover.1706635101.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-01selftests: net: don't access /dev/stdout in pmtu.shPaolo Abeni1-1/+1
When running the pmtu.sh via the kselftest infra, accessing /dev/stdout gives unexpected results: # dd: failed to open '/dev/stdout': Device or resource busy # TEST: IPv4, bridged vxlan4: PMTU exceptions [FAIL] Let dd use directly the standard output to fix the above: # TEST: IPv4, bridged vxlan4: PMTU exceptions - nexthop objects [ OK ] Fixes: 136a1b434bbb ("selftests: net: test vxlan pmtu exceptions with tcp") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/23d7592c5d77d75cff9b34f15c227f92e911c2ae.1706635101.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-01selftests: net: fix available tunnels detectionPaolo Abeni1-8/+8
The pmtu.sh test tries to detect the tunnel protocols available in the running kernel and properly skip the unsupported cases. In a few more complex setup, such detection is unsuccessful, as the script currently ignores some intermediate error code at setup time. Before: # which: no nettest in (/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin) # TEST: vti6: PMTU exceptions (ESP-in-UDP) [FAIL] # PMTU exception wasn't created after creating tunnel exceeding link layer MTU # ./pmtu.sh: line 931: kill: (7543) - No such process # ./pmtu.sh: line 931: kill: (7544) - No such process After: # xfrm4 not supported # TEST: vti4: PMTU exceptions [SKIP] Fixes: ece1278a9b81 ("selftests: net: add ESP-in-UDP PMTU test") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/cab10e75fda618e6fff8c595b632f47db58b9309.1706635101.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-01selftests: net: add missing config for pmtu.sh testsPaolo Abeni1-0/+3
The mentioned test uses a few Kconfig still missing the net config, add them. Before: # Error: Specified qdisc kind is unknown. # Error: Specified qdisc kind is unknown. # Error: Qdisc not classful. # We have an error talking to the kernel # Error: Qdisc not classful. # We have an error talking to the kernel # policy_routing not supported # TEST: ICMPv4 with DSCP and ECN: PMTU exceptions [SKIP] After: # TEST: ICMPv4 with DSCP and ECN: PMTU exceptions [ OK ] Fixes: ec730c3e1f0e ("selftest: net: Test IPv4 PMTU exceptions with DSCP and ECN") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/8d27bf6762a5c7b3acc457d6e6872c533040f9c1.1706635101.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-01Merge branch 'pds_core-various-fixes'Jakub Kicinski8-40/+124
Brett Creeley says: ==================== pds_core: Various fixes This series includes the following changes: There can be many users of the pds_core's adminq. This includes pds_core's uses and any clients that depend on it. When the pds_core device goes through a reset for any reason the adminq is freed and reconfigured. There are some gaps in the current implementation that will cause crashes during reset if any of the previously mentioned users of the adminq attempt to use it after it's been freed. Issues around how resets are handled, specifically regarding the driver's error handlers. Originally these patches were aimed at net-next, but it was requested to push the fixes patches to net. The original patches can be found here: https://lore.kernel.org/netdev/20240126174255.17052-1-brett.creeley@amd.com/ Also, the Reviewed-by tags were left in place from net-next reviews as the patches didn't change. ==================== Link: https://lore.kernel.org/r/20240129234035.69802-1-brett.creeley@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-01pds_core: Rework teardown/setup flow to be more commonBrett Creeley5-18/+9
Currently the teardown/setup flow for driver probe/remove is quite a bit different from the reset flows in pdsc_fw_down()/pdsc_fw_up(). One key piece that's missing are the calls to pci_alloc_irq_vectors() and pci_free_irq_vectors(). The pcie reset case is calling pci_free_irq_vectors() on reset_prepare, but not calling the corresponding pci_alloc_irq_vectors() on reset_done. This is causing unexpected/unwanted interrupt behavior due to the adminq interrupt being accidentally put into legacy interrupt mode. Also, the pci_alloc_irq_vectors()/pci_free_irq_vectors() functions are being called directly in probe/remove respectively. Fix this inconsistency by making the following changes: 1. Always call pdsc_dev_init() in pdsc_setup(), which calls pci_alloc_irq_vectors() and get rid of the now unused pds_dev_reinit(). 2. Always free/clear the pdsc->intr_info in pdsc_teardown() since this structure will get re-alloced in pdsc_setup(). 3. Move the calls of pci_free_irq_vectors() to pdsc_teardown() since pci_alloc_irq_vectors() will always be called in pdsc_setup()->pdsc_dev_init() for both the probe/remove and reset flows. 4. Make sure to only create the debugfs "identity" entry when it doesn't already exist, which it will in the reset case because it's already been created in the initial call to pdsc_dev_init(). Fixes: ffa55858330f ("pds_core: implement pci reset handlers") Signed-off-by: Brett Creeley <brett.creeley@amd.com> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Link: https://lore.kernel.org/r/20240129234035.69802-7-brett.creeley@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-01pds_core: Clear BARs on resetBrett Creeley6-11/+45
During reset the BARs might be accessed when they are unmapped. This can cause unexpected issues, so fix it by clearing the cached BAR values so they are not accessed until they are re-mapped. Also, make sure any places that can access the BARs when they are NULL are prevented. Fixes: 49ce92fbee0b ("pds_core: add FW update feature to devlink") Signed-off-by: Brett Creeley <brett.creeley@amd.com> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://lore.kernel.org/r/20240129234035.69802-6-brett.creeley@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-01pds_core: Prevent race issues involving the adminqBrett Creeley3-6/+47
There are multiple paths that can result in using the pdsc's adminq. [1] pdsc_adminq_isr and the resulting work from queue_work(), i.e. pdsc_work_thread()->pdsc_process_adminq() [2] pdsc_adminq_post() When the device goes through reset via PCIe reset and/or a fw_down/fw_up cycle due to bad PCIe state or bad device state the adminq is destroyed and recreated. A NULL pointer dereference can happen if [1] or [2] happens after the adminq is already destroyed. In order to fix this, add some further state checks and implement reference counting for adminq uses. Reference counting was used because multiple threads can attempt to access the adminq at the same time via [1] or [2]. Additionally, multiple clients (i.e. pds-vfio-pci) can be using [2] at the same time. The adminq_refcnt is initialized to 1 when the adminq has been allocated and is ready to use. Users/clients of the adminq (i.e. [1] and [2]) will increment the refcnt when they are using the adminq. When the driver goes into a fw_down cycle it will set the PDSC_S_FW_DEAD bit and then wait for the adminq_refcnt to hit 1. Setting the PDSC_S_FW_DEAD before waiting will prevent any further adminq_refcnt increments. Waiting for the adminq_refcnt to hit 1 allows for any current users of the adminq to finish before the driver frees the adminq. Once the adminq_refcnt hits 1 the driver clears the refcnt to signify that the adminq is deleted and cannot be used. On the fw_up cycle the driver will once again initialize the adminq_refcnt to 1 allowing the adminq to be used again. Fixes: 01ba61b55b20 ("pds_core: Add adminq processing and commands") Signed-off-by: Brett Creeley <brett.creeley@amd.com> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://lore.kernel.org/r/20240129234035.69802-5-brett.creeley@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-01pds_core: Use struct pdsc for the pdsc_adminq_isr private dataBrett Creeley2-3/+4
The initial design for the adminq interrupt was done based on client drivers having their own adminq and adminq interrupt. So, each client driver's adminq isr would use their specific adminqcq for the private data struct. For the time being the design has changed to only use a single adminq for all clients. So, instead use the struct pdsc for the private data to simplify things a bit. This also has the benefit of not dereferencing the adminqcq to access the pdsc struct when the PDSC_S_STOPPING_DRIVER bit is set and the adminqcq has actually been cleared/freed. Fixes: 01ba61b55b20 ("pds_core: Add adminq processing and commands") Signed-off-by: Brett Creeley <brett.creeley@amd.com> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://lore.kernel.org/r/20240129234035.69802-4-brett.creeley@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-01pds_core: Cancel AQ work on teardownBrett Creeley1-0/+2
There is a small window where pdsc_work_thread() calls pdsc_process_adminq() and pdsc_process_adminq() passes the PDSC_S_STOPPING_DRIVER check and starts to process adminq/notifyq work and then the driver starts a fw_down cycle. This could cause some undefined behavior if the notifyqcq/adminqcq are free'd while pdsc_process_adminq() is running. Use cancel_work_sync() on the adminqcq's work struct to make sure any pending work items are cancelled and any in progress work items are completed. Also, make sure to not call cancel_work_sync() if the work item has not be initialized. Without this, traces will happen in cases where a reset fails and teardown is called again or if reset fails and the driver is removed. Fixes: 01ba61b55b20 ("pds_core: Add adminq processing and commands") Signed-off-by: Brett Creeley <brett.creeley@amd.com> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://lore.kernel.org/r/20240129234035.69802-3-brett.creeley@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-01pds_core: Prevent health thread from running during reset/removeBrett Creeley1-2/+17
The PCIe reset handlers can run at the same time as the health thread. This can cause the health thread to stomp on the PCIe reset. Fix this by preventing the health thread from running while a PCIe reset is happening. As part of this use timer_shutdown_sync() during reset and remove to make sure the timer doesn't ever get rearmed. Fixes: ffa55858330f ("pds_core: implement pci reset handlers") Signed-off-by: Brett Creeley <brett.creeley@amd.com> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://lore.kernel.org/r/20240129234035.69802-2-brett.creeley@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-01af_unix: fix lockdep positive in sk_diag_dump_icons()Eric Dumazet3-15/+21
syzbot reported a lockdep splat [1]. Blamed commit hinted about the possible lockdep violation, and code used unix_state_lock_nested() in an attempt to silence lockdep. It is not sufficient, because unix_state_lock_nested() is already used from unix_state_double_lock(). We need to use a separate subclass. This patch adds a distinct enumeration to make things more explicit. Also use swap() in unix_state_double_lock() as a clean up. v2: add a missing inline keyword to unix_state_lock_nested() [1] WARNING: possible circular locking dependency detected 6.8.0-rc1-syzkaller-00356-g8a696a29c690 #0 Not tainted syz-executor.1/2542 is trying to acquire lock: ffff88808b5df9e8 (rlock-AF_UNIX){+.+.}-{2:2}, at: skb_queue_tail+0x36/0x120 net/core/skbuff.c:3863 but task is already holding lock: ffff88808b5dfe70 (&u->lock/1){+.+.}-{2:2}, at: unix_dgram_sendmsg+0xfc7/0x2200 net/unix/af_unix.c:2089 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&u->lock/1){+.+.}-{2:2}: lock_acquire+0x1e3/0x530 kernel/locking/lockdep.c:5754 _raw_spin_lock_nested+0x31/0x40 kernel/locking/spinlock.c:378 sk_diag_dump_icons net/unix/diag.c:87 [inline] sk_diag_fill+0x6ea/0xfe0 net/unix/diag.c:157 sk_diag_dump net/unix/diag.c:196 [inline] unix_diag_dump+0x3e9/0x630 net/unix/diag.c:220 netlink_dump+0x5c1/0xcd0 net/netlink/af_netlink.c:2264 __netlink_dump_start+0x5d7/0x780 net/netlink/af_netlink.c:2370 netlink_dump_start include/linux/netlink.h:338 [inline] unix_diag_handler_dump+0x1c3/0x8f0 net/unix/diag.c:319 sock_diag_rcv_msg+0xe3/0x400 netlink_rcv_skb+0x1df/0x430 net/netlink/af_netlink.c:2543 sock_diag_rcv+0x2a/0x40 net/core/sock_diag.c:280 netlink_unicast_kernel net/netlink/af_netlink.c:1341 [inline] netlink_unicast+0x7e6/0x980 net/netlink/af_netlink.c:1367 netlink_sendmsg+0xa37/0xd70 net/netlink/af_netlink.c:1908 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg net/socket.c:745 [inline] sock_write_iter+0x39a/0x520 net/socket.c:1160 call_write_iter include/linux/fs.h:2085 [inline] new_sync_write fs/read_write.c:497 [inline] vfs_write+0xa74/0xca0 fs/read_write.c:590 ksys_write+0x1a0/0x2c0 fs/read_write.c:643 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf5/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x63/0x6b -> #0 (rlock-AF_UNIX){+.+.}-{2:2}: check_prev_add kernel/locking/lockdep.c:3134 [inline] check_prevs_add kernel/locking/lockdep.c:3253 [inline] validate_chain+0x1909/0x5ab0 kernel/locking/lockdep.c:3869 __lock_acquire+0x1345/0x1fd0 kernel/locking/lockdep.c:5137 lock_acquire+0x1e3/0x530 kernel/locking/lockdep.c:5754 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] _raw_spin_lock_irqsave+0xd5/0x120 kernel/locking/spinlock.c:162 skb_queue_tail+0x36/0x120 net/core/skbuff.c:3863 unix_dgram_sendmsg+0x15d9/0x2200 net/unix/af_unix.c:2112 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg net/socket.c:745 [inline] ____sys_sendmsg+0x592/0x890 net/socket.c:2584 ___sys_sendmsg net/socket.c:2638 [inline] __sys_sendmmsg+0x3b2/0x730 net/socket.c:2724 __do_sys_sendmmsg net/socket.c:2753 [inline] __se_sys_sendmmsg net/socket.c:2750 [inline] __x64_sys_sendmmsg+0xa0/0xb0 net/socket.c:2750 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf5/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x63/0x6b other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&u->lock/1); lock(rlock-AF_UNIX); lock(&u->lock/1); lock(rlock-AF_UNIX); *** DEADLOCK *** 1 lock held by syz-executor.1/2542: #0: ffff88808b5dfe70 (&u->lock/1){+.+.}-{2:2}, at: unix_dgram_sendmsg+0xfc7/0x2200 net/unix/af_unix.c:2089 stack backtrace: CPU: 1 PID: 2542 Comm: syz-executor.1 Not tainted 6.8.0-rc1-syzkaller-00356-g8a696a29c690 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106 check_noncircular+0x366/0x490 kernel/locking/lockdep.c:2187 check_prev_add kernel/locking/lockdep.c:3134 [inline] check_prevs_add kernel/locking/lockdep.c:3253 [inline] validate_chain+0x1909/0x5ab0 kernel/locking/lockdep.c:3869 __lock_acquire+0x1345/0x1fd0 kernel/locking/lockdep.c:5137 lock_acquire+0x1e3/0x530 kernel/locking/lockdep.c:5754 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] _raw_spin_lock_irqsave+0xd5/0x120 kernel/locking/spinlock.c:162 skb_queue_tail+0x36/0x120 net/core/skbuff.c:3863 unix_dgram_sendmsg+0x15d9/0x2200 net/unix/af_unix.c:2112 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg net/socket.c:745 [inline] ____sys_sendmsg+0x592/0x890 net/socket.c:2584 ___sys_sendmsg net/socket.c:2638 [inline] __sys_sendmmsg+0x3b2/0x730 net/socket.c:2724 __do_sys_sendmmsg net/socket.c:2753 [inline] __se_sys_sendmmsg net/socket.c:2750 [inline] __x64_sys_sendmmsg+0xa0/0xb0 net/socket.c:2750 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf5/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x63/0x6b RIP: 0033:0x7f26d887cda9 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 20 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f26d95a60c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000133 RAX: ffffffffffffffda RBX: 00007f26d89abf80 RCX: 00007f26d887cda9 RDX: 000000000000003e RSI: 00000000200bd000 RDI: 0000000000000004 RBP: 00007f26d88c947a R08: 0000000000000000 R09: 0000000000000000 R10: 00000000000008c0 R11: 0000000000000246 R12: 0000000000000000 R13: 000000000000000b R14: 00007f26d89abf80 R15: 00007ffcfe081a68 Fixes: 2aac7a2cb0d9 ("unix_diag: Pending connections IDs NLA") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20240130184235.1620738-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>