summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2023-12-13tcp: disable tcp_autocorking for socket when TCP_NODELAY flag is setSalvatore Dipietro1-1/+1
Based on the tcp man page, if TCP_NODELAY is set, it disables Nagle's algorithm and packets are sent as soon as possible. However in the `tcp_push` function where autocorking is evaluated the `nonagle` value set by TCP_NODELAY is not considered which can trigger unexpected corking of packets and induce delays. For example, if two packets are generated as part of a server's reply, if the first one is not transmitted on the wire quickly enough, the second packet can trigger the autocorking in `tcp_push` and be delayed instead of sent as soon as possible. It will either wait for additional packets to be coalesced or an ACK from the client before transmitting the corked packet. This can interact badly if the receiver has tcp delayed acks enabled, introducing 40ms extra delay in completion times. It is not always possible to control who has delayed acks set, but it is possible to adjust when and how autocorking is triggered. Patch prevents autocorking if the TCP_NODELAY flag is set on the socket. Patch has been tested using an AWS c7g.2xlarge instance with Ubuntu 22.04 and Apache Tomcat 9.0.83 running the basic servlet below: import java.io.IOException; import java.io.OutputStreamWriter; import java.io.PrintWriter; import javax.servlet.ServletException; import javax.servlet.http.HttpServlet; import javax.servlet.http.HttpServletRequest; import javax.servlet.http.HttpServletResponse; public class HelloWorldServlet extends HttpServlet { @Override protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException { response.setContentType("text/html;charset=utf-8"); OutputStreamWriter osw = new OutputStreamWriter(response.getOutputStream(),"UTF-8"); String s = "a".repeat(3096); osw.write(s,0,s.length()); osw.flush(); } } Load was applied using wrk2 (https://github.com/kinvolk/wrk2) from an AWS c6i.8xlarge instance. With the current auto-corking behavior and TCP_NODELAY set an additional 40ms latency from P99.99+ values are observed. With the patch applied we see no occurrences of 40ms latencies. The patch has also been tested with iperf and uperf benchmarks and no regression was observed. # No patch with tcp_autocorking=1 and TCP_NODELAY set on all sockets ./wrk -t32 -c128 -d40s --latency -R10000 http://172.31.49.177:8080/hello/hello' ... 50.000% 0.91ms 75.000% 1.12ms 90.000% 1.46ms 99.000% 1.73ms 99.900% 1.96ms 99.990% 43.62ms <<< 40+ ms extra latency 99.999% 48.32ms 100.000% 49.34ms # With patch ./wrk -t32 -c128 -d40s --latency -R10000 http://172.31.49.177:8080/hello/hello' ... 50.000% 0.89ms 75.000% 1.13ms 90.000% 1.44ms 99.000% 1.67ms 99.900% 1.78ms 99.990% 2.27ms <<< no 40+ ms extra latency 99.999% 3.71ms 100.000% 4.57ms Fixes: f54b311142a9 ("tcp: auto corking") Signed-off-by: Salvatore Dipietro <dipiets@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-13Merge branch 'net-at803x-cleanups'David S. Miller1-349/+426
Christian Marangi says: ==================== net: phy: at803x: cleanup The intention of this big series is to try to cleanup the big at803x PHY driver. It currently have 3 different family of PHY in it. at803x, qca83xx and qca808x. The current codebase required lots of cleanup and reworking to make the split possible as currently there is a greater use of adding special function matching the phy_id. This has been reworked to make the function actually generic and make the change only in more specific one. The result is the addition of micro additional function but that is for good as it massively simplify splitting the driver later. Consider that this is all in preparation for the addition of qca807x PHY driver that will also uso some of the functions of at803x. Subsequent series will come with the actual PHY split and other required cleanup. This is only to start the process with minor changes. Changes v4: - Improve at8031_probe function Changes v3: - Add Reviewed-by tag from Andrew - Split patch 10 (at8031 rename) to rename and move Changes v2: - Drop split part due to series too big - Split changes even more - Fix problem pointed out by Russell (flawed reworked function logic) - Add Reviewed-by tag from Andrew - Minor rework to prevent further code duplication for cdt ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-13net: phy: at803x: drop specific PHY ID check from cable test functionsChristian Marangi1-45/+50
Drop specific PHY ID check for cable test functions for at803x. This is done to make functions more generic. While at it better describe what the functions does by using more symbolic function names. PHYs that requires to set additional reg are moved to specific function calling the more generic one. cdt_start and cdt_wait_for_completion are changed to take an additional arg to pass specific values specific to the PHY. Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-13net: phy: at803x: move at8035 specific DT parse to dedicated probeChristian Marangi1-19/+41
Move at8035 specific DT parse for clock out frequency to dedicated probe to make at803x probe function more generic. This is to tidy code and no behaviour change are intended. Detection logic is changed, we check if the clk 25m mask is set and if it's not zero, we assume the qca,clk-out-frequency property is set. The property is checked in the generic at803x_parse_dt called by at803x_probe. Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-13net: phy: at803x: move at8031 functions in dedicated sectionChristian Marangi1-133/+133
Move at8031 functions in dedicated section with dedicated at8031 parse_dt and probe. Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-13net: phy: at803x: make at8031 related DT functions name more specificChristian Marangi1-8/+8
Rename at8031 related DT function name to a more specific name referencing they are only related to at8031 and not to the generic at803x PHY family. Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-13net: phy: at803x: move specific at8031 config_intr to dedicated functionChristian Marangi1-6/+24
Move specific at8031 config_intr bits to dedicated function to make at803x_config_initr more generic. This is needed in preparation for PHY driver split as qca8081 share the same function to setup interrupts. Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-13net: phy: at803x: move specific at8031 WOL bits to dedicated functionChristian Marangi1-17/+25
Move specific at8031 WOL enable/disable to dedicated function to make at803x_set_wol more generic. This is needed in preparation for PHY driver split as qca8081 share the same function to toggle WOL settings. In this new implementation WOL module in at8031 is enabled after the generic interrupt is setup. This should not cause any problem as the WOL_INT has a separate implementation and only relay on MAC bits. Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-13net: phy: at803x: move specific at8031 config_init to dedicated functionChristian Marangi1-20/+25
Move specific at8031 config_init to dedicated function to make at803x_config_init more generic and tidy things up. Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-13net: phy: at803x: move specific at8031 probe mode check to dedicated probeChristian Marangi1-20/+19
Move specific at8031 probe mode check to dedicated probe to make at803x_probe more generic and keep code tidy. Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-13net: phy: at803x: move specific DT option for at8031 to specific probeChristian Marangi1-24/+31
Move specific DT options for at8031 to specific probe to tidy things up and make at803x_parse_dt more generic. Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-13net: phy: at803x: move qca83xx specific check in dedicated functionsChristian Marangi1-31/+37
Rework qca83xx specific check to dedicated function to tidy things up and drop useless phy_id check. Also drop an useless link_change_notify for QCA8337 as it did nothing an returned early. Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-13net: phy: at803x: raname hw_stats functions to qca83xx specific nameChristian Marangi1-22/+22
The function and the struct related to hw_stats were specific to qca83xx PHY but were called following the convention in the driver of calling everything with at803x prefix. To better organize the code, rename these function a more specific name to better describe that they are specific to 83xx PHY family. Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-13net: phy: at803x: move disable WOL to specific at8031 probeChristian Marangi1-10/+17
Move the WOL disable call to specific at8031 probe to make at803x_probe more generic and drop extra check for PHY ID. Keep the same previous behaviour by first calling at803x_probe and then disabling WOL. Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-13net: phy: at803x: fix passing the wrong reference for config_intrChristian Marangi1-3/+3
Fix passing the wrong reference for config_initr on passing the function pointer, drop the wrong & from at803x_config_intr in the PHY struct. Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-13dpll: remove leftover mode_supported() op and use mode_get() insteadJiri Pirko5-52/+10
Mode supported is currently reported to the user exactly the same, as the current mode. That's because mode changing is not implemented. Remove the leftover mode_supported() op and use mode_get() to fill up the supported mode exposed to user. One, if even, mode changing is going to be introduced, this could be very easily taken back. In the meantime, prevent drivers form implementing this in wrong way (as for example recent netdevsim implementation attempt intended to do). Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-13Merge tag 'hid-for-linus-2023121201' of ↵Linus Torvalds6-1/+14
git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid Pull HID fixes from Jiri Kosina: - Lenovo ThinkPad TrackPoint Keyboard II firmware-specific regression fix (Mikhail Khvainitski) - device-specific fixes (various authors) * tag 'hid-for-linus-2023121201' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: HID: apple: Add "hfd.cn" and "WKB603" to the list of non-apple keyboards HID: lenovo: Restrict detection of patched firmware only to USB cptkbd HID: Add quirk for Labtec/ODDOR/aikeec handbrake HID: i2c-hid: Add IDEA5002 to i2c_hid_acpi_blacklist[] mailmap: add address mapping for Jiri Kosina
2023-12-13dpll: sanitize possible null pointer dereference in dpll_pin_parent_pin_set()Jiri Pirko1-5/+8
User may not pass DPLL_A_PIN_STATE attribute in the pin set operation message. Sanitize that by checking if the attr pointer is not null and process the passed state attribute value only in that case. Reported-by: Xingyuan Mo <hdthky0@gmail.com> Fixes: 9d71b54b65b1 ("dpll: netlink: Add DPLL framework base functions") Signed-off-by: Jiri Pirko <jiri@nvidia.com> Acked-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://lore.kernel.org/r/20231211083758.1082853-1-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-13Merge branch 'ena-driver-xdp-bug-fixes'Jakub Kicinski2-30/+26
David Arinzon says: ==================== ENA driver XDP bug fixes This patchset contains multiple XDP-related bug fixes in the ENA driver. ==================== Link: https://lore.kernel.org/r/20231211062801.27891-1-darinzon@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-13net: ena: Fix XDP redirection errorDavid Arinzon1-3/+0
When sending TX packets, the meta descriptor can be all zeroes as no meta information is required (as in XDP). This patch removes the validity check, as when `disable_meta_caching` is enabled, such TX packets will be dropped otherwise. Fixes: 0e3a3f6dacf0 ("net: ena: support new LLQ acceleration mode") Signed-off-by: Shay Agroskin <shayagr@amazon.com> Signed-off-by: David Arinzon <darinzon@amazon.com> Link: https://lore.kernel.org/r/20231211062801.27891-5-darinzon@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-13net: ena: Fix DMA syncing in XDP path when SWIOTLB is onDavid Arinzon1-14/+9
This patch fixes two issues: Issue 1 ------- Description ``````````` Current code does not call dma_sync_single_for_cpu() to sync data from the device side memory to the CPU side memory before the XDP code path uses the CPU side data. This causes the XDP code path to read the unset garbage data in the CPU side memory, resulting in incorrect handling of the packet by XDP. Solution ```````` 1. Add a call to dma_sync_single_for_cpu() before the XDP code starts to use the data in the CPU side memory. 2. The XDP code verdict can be XDP_PASS, in which case there is a fallback to the non-XDP code, which also calls dma_sync_single_for_cpu(). To avoid calling dma_sync_single_for_cpu() twice: 2.1. Put the dma_sync_single_for_cpu() in the code in such a place where it happens before XDP and non-XDP code. 2.2. Remove the calls to dma_sync_single_for_cpu() in the non-XDP code for the first buffer only (rx_copybreak and non-rx_copybreak cases), since the new call that was added covers these cases. The call to dma_sync_single_for_cpu() for the second buffer and on stays because only the first buffer is handled by the newly added dma_sync_single_for_cpu(). And there is no need for special handling of the second buffer and on for the XDP path since currently the driver supports only single buffer packets. Issue 2 ------- Description ``````````` In case the XDP code forwarded the packet (ENA_XDP_FORWARDED), ena_unmap_rx_buff_attrs() is called with attrs set to 0. This means that before unmapping the buffer, the internal function dma_unmap_page_attrs() will also call dma_sync_single_for_cpu() on the whole buffer (not only on the data part of it). This sync is both wasteful (since a sync was already explicitly called before) and also causes a bug, which will be explained using the below diagram. The following diagram shows the flow of events causing the bug. The order of events is (1)-(4) as shown in the diagram. CPU side memory area (3)convert_to_xdp_frame() initializes the headroom with xdpf metadata || \/ ___________________________________ | | 0 | V 4K --------------------------------------------------------------------- | xdpf->data | other xdpf | < data > | tailroom ||...| | | fields | | GARBAGE || | --------------------------------------------------------------------- /\ /\ || || (4)ena_unmap_rx_buff_attrs() calls (2)dma_sync_single_for_cpu() dma_sync_single_for_cpu() on the copies data from device whole buffer page, overwriting side to CPU side memory the xdpf->data with GARBAGE. || 0 4K --------------------------------------------------------------------- | headroom | < data > | tailroom ||...| | GARBAGE | | GARBAGE || | --------------------------------------------------------------------- Device side memory area /\ || (1) device writes RX packet data After the call to ena_unmap_rx_buff_attrs() in (4), the xdpf->data becomes corrupted, and so when it is later accessed in ena_clean_xdp_irq()->xdp_return_frame(), it causes a page fault, crashing the kernel. Solution ```````` Explicitly tell ena_unmap_rx_buff_attrs() not to call dma_sync_single_for_cpu() by passing it the ENA_DMA_ATTR_SKIP_CPU_SYNC flag. Fixes: f7d625adeb7b ("net: ena: Add dynamic recycling mechanism for rx buffers") Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David Arinzon <darinzon@amazon.com> Link: https://lore.kernel.org/r/20231211062801.27891-4-darinzon@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-13net: ena: Fix xdp drops handling due to multibuf packetsDavid Arinzon1-7/+10
Current xdp code drops packets larger than ENA_XDP_MAX_MTU. This is an incorrect condition since the problem is not the size of the packet, rather the number of buffers it contains. This commit: 1. Identifies and drops XDP multi-buffer packets at the beginning of the function. 2. Increases the xdp drop statistic when this drop occurs. 3. Adds a one-time print that such drops are happening to give better indication to the user. Fixes: 838c93dc5449 ("net: ena: implement XDP drop support") Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David Arinzon <darinzon@amazon.com> Link: https://lore.kernel.org/r/20231211062801.27891-3-darinzon@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-13net: ena: Destroy correct number of xdp queues upon failureDavid Arinzon1-6/+7
The ena_setup_and_create_all_xdp_queues() function freed all the resources upon failure, after creating only xdp_num_queues queues, instead of freeing just the created ones. In this patch, the only resources that are freed, are the ones allocated right before the failure occurs. Fixes: 548c4940b9f1 ("net: ena: Implement XDP_TX action") Signed-off-by: Shahar Itzko <itzko@amazon.com> Signed-off-by: David Arinzon <darinzon@amazon.com> Link: https://lore.kernel.org/r/20231211062801.27891-2-darinzon@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-13Merge branch 'bnxt_en-update-for-net-next'Jakub Kicinski6-47/+153
Michael Chan says: ==================== bnxt_en: Update for net-next The first 4 patches in the series fix issues in the net-next tree introduced in the last 4 weeks. The first 3 patches fix ring accounting and indexing logic. The 4th patch fix TX timeout when the TX ring is very small. The next 7 patches add new features on the P7 chips, including TX coalesced completions, VXLAN GPE and UDP GSO stateless offload, a new rx_filter_miss counters, and more QP backing store memory for RoCE. The last 2 patches are PTP improvements. ==================== Link: https://lore.kernel.org/r/20231212005122.2401-1-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-13bnxt_en: Make PTP TX timestamp HWRM query silentPavan Chebbi1-3/+3
In a busy network, especially with flow control enabled, we may experience timestamp query failures fairly regularly. After a while, dmesg may be flooded with timestamp query failure error messages. Silence the error message from the low level hwrm function that sends the firmware message. Change netdev_err() to netdev_WARN_ONCE() if this FW call ever fails. Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20231212005122.2401-14-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-13bnxt_en: Skip nic close/open when configuring tstamp filtersPavan Chebbi2-20/+11
We don't have to close and open the nic to make sure we have valid rx timestamps. Once we have the timestamp filter applied to the HW and the timestamp_fld_format bit is cleared in the rx completion and the timestamp is non-zero, we can be sure that rx timestamp is valid data. Skip close/open when we set any timestamp filter. Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20231212005122.2401-13-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-13bnxt_en: Add support for UDP GSO on 5760X chipsMichael Chan2-3/+19
The new 5760X chips supports UDP GSO. Tested using udpgso_bench_tx. Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20231212005122.2401-12-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-13bnxt_en: add rx_filter_miss extended statsDamodharam Ammepalli1-0/+1
rx_filter_miss counter is newly added to the rx_port_stats_ext stats structure for newer chips. Newer firmware will return the structure size that includes this counter. Add this entry to the bnxt_port_stats_ext_arr array and the ethtool -S code will pick up this counter if it is supported. Signed-off-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20231212005122.2401-11-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-13bnxt_en: Configure UDP tunnel TPAMichael Chan2-0/+34
On the new P7 chips, TPA for tunnel packets can be independently enabled for each VNIC. The default TPA configuration should not include UDP tunnels because the UDP ports for these tunnels are not known yet. The chip should not aggregate these UDP tunneled packets using default UDP ports until the ports are known. Add a new function bnxt_hwrm_vnic_update_tunl_tpa() to enable VXLAN and Geneve TPA if the corresponding UDP ports are known. Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20231212005122.2401-10-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-13bnxt_en: Add support for VXLAN GPEMichael Chan2-5/+40
Add a new bnxt_udp_tunnels_p7 struct to support the new P7 chips that can parse VXLAN GPE packets. Add VXLAN GPE tunnel type handling to the .set_port() and .unset_port() functions. .ndo_features_check() is also enhanced to support VXLAN GPE which may encapsulate inner IP packets instead of ethernet packets. Reviewed-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com> Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20231212005122.2401-9-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-13bnxt_en: Use proper TUNNEL_DST_PORT_ALLOC* commandsMichael Chan1-2/+2
In bnxt_udp_tunnel_set_port(), use the proper ALLOC commands instead of the FREE commands for correctness. The ALLOC and FREE commands happen to be identical so this is just a cosmetic fix for correctness. Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20231212005122.2401-8-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-13bnxt_en: Allocate extra QP backing store memory when RoCE FW reports itSelvin Xavier1-2/+12
The Fast QP modify destroy RoCE feature requires additional QP entries in QP context backing store. FW reports the extra count to be allocated during backing store query. Use this value and allocate extra memory. Note that this works for both the V1 and V1 backing store FW APIs. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20231212005122.2401-7-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-13bnxt_en: Support TX coalesced completion on 5760X chipsMichael Chan2-2/+12
TX coalesced completions are supported on newer chips to provide one TX completion record for multiple TX packets up to the sq_cons_idx in the completion record. This method saves PCIe bandwidth by reducing the number of TX completions. Only very minor changes are now required to support this mode with the new framework that handles TX completions based on the consumer indices. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20231212005122.2401-6-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-13bnxt_en: Prevent TX timeout with a very small TX ringMichael Chan1-1/+4
If xmit_more condition is true, the driver may set the TX_BD_FLAGS_NO_CMPL flag. If after this packet, the TX ring can no longer hold a packet with maximum fragments, we will stop the TX queue. When this happens, we must clear the TX_BD_FLAGS_NO_CMPL flag on the last packet or there will be no completion and cause TX timeout. Fixes: c1056a59aee1 ("bnxt_en: Optimize xmit_more TX path") Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20231212005122.2401-5-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-13bnxt_en: Fix TX ring indexing logicMichael Chan2-2/+2
Two spots were missed when modifying the TX ring indexing logic. The use of unmasked TX index in bnxt_tx_int() will cause unnecessary __bnxt_tx_int() calls. The same issue in bnxt_tx_int_xdp() can result in illegal array index. Fixes: 6d1add95536b ("bnxt_en: Modify TX ring indexing logic.") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20231212005122.2401-4-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-13bnxt_en: Fix AGG ring check logic in bnxt_check_rings()Somnath Kotur1-3/+3
_bnxt_get_max_rings() that is invoked in bnxt_check_rings() already accounts for the AGG ring(s) and gives a max value based on that. Increasing for AGG rings before calling _bnxt_get_max_rings() will result in checking for twice the number of rings than required and it can fail. Fix it by adjusting for AGG rings after calling _bnxt_get_max_rings(). Fixes: f5b29c6afe36 ("bnxt_en: Add helper to get the number of CP rings required for TX rings") Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20231212005122.2401-3-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-13bnxt_en: Fix trimming of P5 RX and TX ringsMichael Chan1-5/+11
The recent commit to trim the RX and TX rings on P5 chips by assigning each with max CP rings divided by 2 is not correct. Max CP rings divided by 2 may be bigger than the original RX or TX and would lead to failure. In other words, we may be checking for increased RX/TX rings than required and it may fail. Fix it by calling __bnxt_trim_rings() instead that would properly trim RX and TX without the possibility of increasing their values. Fixes: f5b29c6afe36 ("bnxt_en: Add helper to get the number of CP rings required for TX rings") Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20231212005122.2401-2-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-13net: mdio-gpio: replace deprecated strncpy with strscpyJustin Stitt1-2/+2
strncpy() is deprecated for use on NUL-terminated destination strings [1] and as such we should prefer more robust and less ambiguous string interfaces. We expect new_bus->id to be NUL-terminated but not NUL-padded based on its prior assignment through snprintf: | snprintf(new_bus->id, MII_BUS_ID_SIZE, "gpio-%x", bus_id); Due to this, a suitable replacement is `strscpy` [2] due to the fact that it guarantees NUL-termination on the destination buffer without unnecessarily NUL-padding. We can also use sizeof() instead of a length macro as this more closely ties the maximum buffer size to the destination buffer. Do this for two instances. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1] Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2] Link: https://github.com/KSPP/linux/issues/90 Signed-off-by: Justin Stitt <justinstitt@google.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20231211-strncpy-drivers-net-mdio-mdio-gpio-c-v3-1-76dea53a1a52@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-13net: Remove acked SYN flag from packet in the transmit queue correctlyDong Chenchen1-0/+6
syzkaller report: kernel BUG at net/core/skbuff.c:3452! invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.7.0-rc4-00009-gbee0e7762ad2-dirty #135 RIP: 0010:skb_copy_and_csum_bits (net/core/skbuff.c:3452) Call Trace: icmp_glue_bits (net/ipv4/icmp.c:357) __ip_append_data.isra.0 (net/ipv4/ip_output.c:1165) ip_append_data (net/ipv4/ip_output.c:1362 net/ipv4/ip_output.c:1341) icmp_push_reply (net/ipv4/icmp.c:370) __icmp_send (./include/net/route.h:252 net/ipv4/icmp.c:772) ip_fragment.constprop.0 (./include/linux/skbuff.h:1234 net/ipv4/ip_output.c:592 net/ipv4/ip_output.c:577) __ip_finish_output (net/ipv4/ip_output.c:311 net/ipv4/ip_output.c:295) ip_output (net/ipv4/ip_output.c:427) __ip_queue_xmit (net/ipv4/ip_output.c:535) __tcp_transmit_skb (net/ipv4/tcp_output.c:1462) __tcp_retransmit_skb (net/ipv4/tcp_output.c:3387) tcp_retransmit_skb (net/ipv4/tcp_output.c:3404) tcp_retransmit_timer (net/ipv4/tcp_timer.c:604) tcp_write_timer (./include/linux/spinlock.h:391 net/ipv4/tcp_timer.c:716) The panic issue was trigered by tcp simultaneous initiation. The initiation process is as follows: TCP A TCP B 1. CLOSED CLOSED 2. SYN-SENT --> <SEQ=100><CTL=SYN> ... 3. SYN-RECEIVED <-- <SEQ=300><CTL=SYN> <-- SYN-SENT 4. ... <SEQ=100><CTL=SYN> --> SYN-RECEIVED 5. SYN-RECEIVED --> <SEQ=100><ACK=301><CTL=SYN,ACK> ... // TCP B: not send challenge ack for ack limit or packet loss // TCP A: close tcp_close tcp_send_fin if (!tskb && tcp_under_memory_pressure(sk)) tskb = skb_rb_last(&sk->tcp_rtx_queue); //pick SYN_ACK packet TCP_SKB_CB(tskb)->tcp_flags |= TCPHDR_FIN; // set FIN flag 6. FIN_WAIT_1 --> <SEQ=100><ACK=301><END_SEQ=102><CTL=SYN,FIN,ACK> ... // TCP B: send challenge ack to SYN_FIN_ACK 7. ... <SEQ=301><ACK=101><CTL=ACK> <-- SYN-RECEIVED //challenge ack // TCP A: <SND.UNA=101> 8. FIN_WAIT_1 --> <SEQ=101><ACK=301><END_SEQ=102><CTL=SYN,FIN,ACK> ... // retransmit panic __tcp_retransmit_skb //skb->len=0 tcp_trim_head len = tp->snd_una - TCP_SKB_CB(skb)->seq // len=101-100 __pskb_trim_head skb->data_len -= len // skb->len=-1, wrap around ... ... ip_fragment icmp_glue_bits //BUG_ON If we use tcp_trim_head() to remove acked SYN from packet that contains data or other flags, skb->len will be incorrectly decremented. We can remove SYN flag that has been acked from rtx_queue earlier than tcp_trim_head(), which can fix the problem mentioned above. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Co-developed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com> Link: https://lore.kernel.org/r/20231210020200.1539875-1-dongchenchen2@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-13Merge tag 'pef2256-framer' of ↵Jakub Kicinski16-0/+3098
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Linus Walleij says: ==================== Immutable tag for the PEF2256 framer * tag 'pef2256-framer' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: MAINTAINERS: Add the Lantiq PEF2256 driver entry pinctrl: Add support for the Lantic PEF2256 pinmux net: wan: framer: Add support for the Lantiq PEF2256 framer dt-bindings: net: Add the Lantiq PEF2256 E1/T1/J1 framer net: wan: Add framer framework support ==================== Link: https://lore.kernel.org/all/CACRpkdYT1J7noFUhObFgfA60XQAfL4rb=knEmWS__TKKtCMh7Q@mail.gmail.com/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-13MAINTAINERS: Add the Lantiq PEF2256 driver entryHerve Codina1-0/+8
After contributing the driver, add myself as the maintainer for the Lantiq PEF2256 driver. Signed-off-by: Herve Codina <herve.codina@bootlin.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Link: https://lore.kernel.org/r/20231128132534.258459-6-herve.codina@bootlin.com Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2023-12-13pinctrl: Add support for the Lantic PEF2256 pinmuxHerve Codina3-0/+374
The Lantiq PEF2256 is a framer and line interface component designed to fulfill all required interfacing between an analog E1/T1/J1 line and the digital PCM system highway/H.100 bus. This kind of component can be found in old telecommunication system. It was used to digital transmission of many simultaneous telephone calls by time-division multiplexing. Also using HDLC protocol, WAN networks can be reached through the framer. This pinmux support handles the pin muxing part (pins RP(A..D) and pins XP(A..D)) of the PEF2256. Signed-off-by: Herve Codina <herve.codina@bootlin.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20231128132534.258459-5-herve.codina@bootlin.com Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2023-12-13net: wan: framer: Add support for the Lantiq PEF2256 framerHerve Codina6-0/+1187
The Lantiq PEF2256 is a framer and line interface component designed to fulfill all required interfacing between an analog E1/T1/J1 line and the digital PCM system highway/H.100 bus. Signed-off-by: Herve Codina <herve.codina@bootlin.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20231128132534.258459-4-herve.codina@bootlin.com Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2023-12-13dt-bindings: net: Add the Lantiq PEF2256 E1/T1/J1 framerHerve Codina1-0/+213
The Lantiq PEF2256 is a framer and line interface component designed to fulfill all required interfacing between an analog E1/T1/J1 line and the digital PCM system highway/H.100 bus. Signed-off-by: Herve Codina <herve.codina@bootlin.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20231128132534.258459-3-herve.codina@bootlin.com Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2023-12-13net: wan: Add framer framework supportHerve Codina7-0/+1316
A framer is a component in charge of an E1/T1 line interface. Connected usually to a TDM bus, it converts TDM frames to/from E1/T1 frames. It also provides information related to the E1/T1 line. The framer framework provides a set of APIs for the framer drivers (framer provider) to create/destroy a framer and APIs for the framer users (framer consumer) to obtain a reference to the framer, and use the framer. This basic implementation provides a framer abstraction for: - power on/off the framer - get the framer status (line state) - be notified on framer status changes - get/set the framer configuration Signed-off-by: Herve Codina <herve.codina@bootlin.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20231128132534.258459-2-herve.codina@bootlin.com Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2023-12-13qed: Fix a potential use-after-free in qed_cxt_tables_allocDinghao Liu1-0/+1
qed_ilt_shadow_alloc() will call qed_ilt_shadow_free() to free p_hwfn->p_cxt_mngr->ilt_shadow on error. However, qed_cxt_tables_alloc() accesses the freed pointer on failure of qed_ilt_shadow_alloc() through calling qed_cxt_mngr_free(), which may lead to use-after-free. Fix this issue by setting p_mngr->ilt_shadow to NULL in qed_ilt_shadow_free(). Fixes: fe56b9e6a8d9 ("qed: Add module with basic common support") Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn> Link: https://lore.kernel.org/r/20231210045255.21383-1-dinghao.liu@zju.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-13net: asix: fix fortify warningDmitry Antipov2-4/+6
When compiling with gcc version 14.0.0 20231129 (experimental) and CONFIG_FORTIFY_SOURCE=y, I've noticed the following warning: ... In function 'fortify_memcpy_chk', inlined from 'ax88796c_tx_fixup' at drivers/net/ethernet/asix/ax88796c_main.c:287:2: ./include/linux/fortify-string.h:588:25: warning: call to '__read_overflow2_field' declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Wattribute-warning] 588 | __read_overflow2_field(q_size_field, size); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ... This call to 'memcpy()' is interpreted as an attempt to copy TX_OVERHEAD (which is 8) bytes from 4-byte 'sop' field of 'struct tx_pkt_info' and thus overread warning is issued. Since we actually want to copy both 'sop' and 'seg' fields at once, use the convenient 'struct_group()' here. Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Acked-by: Łukasz Stelmach <l.stelmach@samsung.com> Link: https://lore.kernel.org/r/20231211090535.9730-1-dmantipov@yandex.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-12e1000e: Use pcie_capability_read_word() for reading LNKSTAIlpo Järvinen2-8/+4
Use pcie_capability_read_word() for reading LNKSTA and remove the custom define that matches to PCI_EXP_LNKSTA. As only single user for cap_offset remains, replace it with a call to pci_pcie_cap(). Instead of e1000_adapter, make local variable out of pci_dev because both users are interested in it. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-12Merge tag 'ext4_for_linus-6.7-rc6' of ↵Linus Torvalds5-20/+35
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "Fix various bugs / regressions for ext4, including a soft lockup, a WARN_ON, and a BUG" * tag 'ext4_for_linus-6.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: jbd2: fix soft lockup in journal_finish_inode_data_buffers() ext4: fix warning in ext4_dio_write_end_io() jbd2: increase the journal IO's priority jbd2: correct the printing of write_flags in jbd2_write_superblock() ext4: prevent the normalized size from exceeding EXT_MAX_BLOCKS
2023-12-12iavf: Fix iavf_shutdown to call iavf_remove instead iavf_closeSlawomir Laba1-51/+21
Make the flow for pci shutdown be the same to the pci remove. iavf_shutdown was implementing an incomplete version of iavf_remove. It misses several calls to the kernel like iavf_free_misc_irq, iavf_reset_interrupt_capability, iounmap that might break the system on reboot or hibernation. Implement the call of iavf_remove directly in iavf_shutdown to close this gap. Fixes below error messages (dmesg) during shutdown stress tests - [685814.900917] ice 0000:88:00.0: MAC 02:d0:5f:82:43:5d does not exist for VF 0 [685814.900928] ice 0000:88:00.0: MAC 33:33:00:00:00:01 does not exist for VF 0 Reproduction: 1. Create one VF interface: echo 1 > /sys/class/net/<interface_name>/device/sriov_numvfs 2. Run live dmesg on the host: dmesg -wH 3. On SUT, script below steps into vf_namespace_assignment.sh <#!/bin/sh> // Remove <>. Git removes # line if=<VF name> (edit this per VF name) loop=0 while true; do echo test round $loop let loop++ ip netns add ns$loop ip link set dev $if up ip link set dev $if netns ns$loop ip netns exec ns$loop ip link set dev $if up ip netns exec ns$loop ip link set dev $if netns 1 ip netns delete ns$loop done 4. Run the script for at least 1000 iterations on SUT: ./vf_namespace_assignment.sh Expected result: No errors in dmesg. Fixes: 129cf89e5856 ("iavf: rename functions and structs to new name") Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Ahmed Zaki <ahmed.zaki@intel.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Co-developed-by: Ranganatha Rao <ranganatha.rao@intel.com> Signed-off-by: Ranganatha Rao <ranganatha.rao@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>