| Age | Commit message (Collapse) | Author | Files | Lines |
|
Configure SerDes (port 4) RX and TX polarities using the newly
introduced generic properties. The polarities are described at the port
level which equals the polarities of the external pins of the chip.
Note that the RX lane is inverted internally and the vendor driver
simply always sets bit GSW1XX_SGMII_PHY_RX0_CFG2_INVERT unconditionally
to end up with the correct (ie. as documented in datasheets) polarity at
the external pins.
In this sense, PHY_POLARITY_NORMAL denotes normal polarity for pins as
documented for the MRQFN 105-pin package (GSW120, GSW125, GSW140, GSW141
and GSW145 all use the same package and have identical pin layouts
except for TP port 2 and 3 being N/C on GSW12x):
pin B18 (TX0_P) positive signal of the differential SGMII data output pair
pin B19 (TX0_M) negative signal of the differential SGMII data output pair
pin B20 (RX0_P) positive signal of the differential SGMII data input pair
pin B21 (RX0_M) negative signal of the differential SGMII data input pair
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/8bf79b3476e23673fceffbe2bc9d6abc13d132e5.1769916962.git.daniel@makrotopia.org
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Reference the common PHY properties so RX and TX SerDes lane polarity
of the SGMII/1000Base-X/2500Base-X port can be configured.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/f556ef8be75e37a2f864b9d905a78962bbe76d18.1769916962.git.daniel@makrotopia.org
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Jakub Kicinski says:
====================
net: stats, tools, driver tests for HW GRO [part]
Add miscellaneous pieces related to production use of HW-GRO:
- report standard stats from drivers (bnxt included here,
Gal recently posted patches for mlx5 which is great)
- CLI tool for calculating HW GRO savings / effectiveness
====================
Link: https://patch.msgid.link/20260207003509.3927744-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Extend ynltool to compute HW GRO savings metric - how many
packets has HW GRO been able to save the kernel from seeing.
Note that this definition does not actually take into account
whether the segments were or weren't eligible for HW GRO.
If a machine is receiving all-UDP traffic - new metric will show
HW-GRO savings of 0%. Conversely since the super-packet still
counts as a received packet, savings of 100% is not achievable.
Perfect HW-GRO on a machine with 4k MTU and 64kB super-frames
would show ~93.75% savings. With 1.5k MTU we may see up to
~97.8% savings (if my math is right).
Example after 10 sec of iperf on a freshly booted machine
with 1.5k MTU:
$ ynltool qstats show
eth0 rx-packets: 40681280 rx-bytes: 61575208437
rx-alloc-fail: 0 rx-hw-gro-packets: 1225133
rx-hw-gro-wire-packets: 40656633
$ ynltool qstats hw-gro
eth0: 96.9% savings
None of the NICs I have access to can report "missed" HW-GRO
opportunities so computing a true "effectiveness" metric
is not possible. One could also argue that effectiveness metric
is inferior in environments where we control both senders and
receivers, the savings metrics will capture both regressions
in receiver's HW GRO effectiveness but also regressions in senders
sending smaller TSO trains. And we care about both. The main
downside is that it's hard to tell at a glance how well the NIC
is doing because the savings will be dependent on traffic patterns.
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20260207003509.3927744-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The logic to open a socket and dump the queues is the same
across sub-commands. Factor it out, we'll need it again.
No functional changes intended.
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20260207003509.3927744-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Count and report HW-GRO stats as seen by the kernel.
The device stats for GRO seem to not reflect the reality,
perhaps they count sessions which did not actually result
in any aggregation. Also they count wire packets, so we
have to count super-frames, anyway.
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20260207003509.3927744-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
For NXP NCI devices (e.g. PN7150), the interrupt is level-triggered and
active high, not edge-triggered.
Using IRQF_TRIGGER_RISING in the driver can cause interrupts to fail
to trigger correctly.
Remove IRQF_TRIGGER_RISING and rely on the IRQ trigger type configured
via Device Tree.
Signed-off-by: Carl Lee <carl.lee@amd.com>
Link: https://patch.msgid.link/20260205-fc-nxp-nci-remove-interrupt-trigger-type-v2-1-79d2ed4a7e42@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Alice Mikityanska says:
====================
BIG TCP without HBH in IPv6
Resubmitting after the grace period.
This series is part 1 of "BIG TCP for UDP tunnels". Due to the number of
patches, I'm splitting it into two logical parts:
* Remove hop-by-hop header for BIG TCP IPv6 to align with BIG TCP IPv4.
* Fix up things that prevent BIG TCP from working with UDP tunnels.
The current BIG TCP IPv6 code inserts a hop-by-hop extension header with
32-bit length of the packet. When the packet is encapsulated, and either
the outer or the inner protocol is IPv6, or both are IPv6, there will be
1 or 2 HBH headers that need to be dealt with. The issues that arise:
1. The drivers don't strip it, and they'd all need to know the structure
of each tunnel protocol in order to strip it correctly, also taking into
account all combinations of IPv4/IPv6 inner/outer protocols.
2. Even if (1) is implemented, it would be an additional performance
penalty per aggregated packet.
3. The skb_gso_validate_network_len check is skipped in
ip6_finish_output_gso when IP6SKB_FAKEJUMBO is set, but it seems that it
would make sense to do the actual validation, just taking into account
the length of the HBH header. When the support for tunnels is added, it
becomes trickier, because there may be one or two HBH headers, depending
on whether it's IPv6 in IPv6 or not.
At the same time, having an HBH header to store the 32-bit length is not
strictly necessary, as BIG TCP IPv4 doesn't do anything like this and
just restores the length from skb->len. The same thing can be done for
BIG TCP IPv6. Removing HBH from BIG TCP would allow to simplify the
implementation significantly, and align it with BIG TCP IPv4, which has
been a long-standing goal.
====================
Link: https://patch.msgid.link/20260205133925.526371-1-alice.kernel@fastmail.im
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Now that the HBH jumbo helpers are not used by any driver or GSO, remove
them altogether.
Signed-off-by: Alice Mikityanska <alice@isovalent.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260205133925.526371-13-alice.kernel@fastmail.im
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Now that the kernel doesn't insert HBH for BIG TCP IPv6 packets, remove
unnecessary steps from the bng_en TX path, that used to check and
remove HBH.
Signed-off-by: Alice Mikityanska <alice@isovalent.com>
Link: https://patch.msgid.link/20260205133925.526371-12-alice.kernel@fastmail.im
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Now that the kernel doesn't insert HBH for BIG TCP IPv6 packets, remove
unnecessary steps from the mana TX path, that used to check and remove
HBH.
Signed-off-by: Alice Mikityanska <alice@isovalent.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260205133925.526371-11-alice.kernel@fastmail.im
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Now that the kernel doesn't insert HBH for BIG TCP IPv6 packets, remove
unnecessary steps from the gve TX path, that used to check and remove
HBH.
Signed-off-by: Alice Mikityanska <alice@isovalent.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260205133925.526371-10-alice.kernel@fastmail.im
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Now that the kernel doesn't insert HBH for BIG TCP IPv6 packets, remove
unnecessary steps from the bnxt_en TX path, that used to check and
remove HBH.
Signed-off-by: Alice Mikityanska <alice@isovalent.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20260205133925.526371-9-alice.kernel@fastmail.im
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Now that the kernel doesn't insert HBH for BIG TCP IPv6 packets, remove
unnecessary steps from the ice TX path, that used to check and remove
HBH.
Signed-off-by: Alice Mikityanska <alice@isovalent.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260205133925.526371-8-alice.kernel@fastmail.im
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Now that the kernel doesn't insert HBH for BIG TCP IPv6 packets, remove
unnecessary steps from the mlx4 TX path, that used to check and remove
HBH.
Signed-off-by: Alice Mikityanska <alice@isovalent.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260205133925.526371-7-alice.kernel@fastmail.im
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Now that the kernel doesn't insert HBH for BIG TCP IPv6 packets, remove
unnecessary steps from the mlx5e and mlx5i TX path, that used to check
and remove HBH.
Signed-off-by: Alice Mikityanska <alice@isovalent.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260205133925.526371-6-alice.kernel@fastmail.im
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Now that the kernel doesn't insert HBH for BIG TCP IPv6 packets, remove
unnecessary steps from the GSO TX path, that used to check and remove
HBH.
Signed-off-by: Alice Mikityanska <alice@isovalent.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260205133925.526371-5-alice.kernel@fastmail.im
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Complementary to the previous commit, stop inserting HBH when building
BIG TCP GRO SKBs.
Signed-off-by: Alice Mikityanska <alice@isovalent.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260205133925.526371-4-alice.kernel@fastmail.im
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
BIG TCP IPv6 inserts a hop-by-hop extension header to indicate the real
IPv6 payload length when it doesn't fit into the 16-bit field in the
IPv6 header itself. While it helps tools parse the packet, it also
requires every driver that supports TSO and BIG TCP to remove this
8-byte extension header. It might not sound that bad until we try to
apply it to tunneled traffic. Currently, the drivers don't attempt to
strip HBH if skb->encapsulation = 1. Moreover, trying to do so would
require dissecting different tunnel protocols and making corresponding
adjustments on case-by-case basis, which would slow down the fastpath
(potentially also requiring adjusting checksums in outer headers).
At the same time, BIG TCP IPv4 doesn't insert any extra headers and just
calculates the payload length from skb->len, significantly simplifying
implementing BIG TCP for tunnels.
Stop inserting HBH when building BIG TCP GSO SKBs.
Signed-off-by: Alice Mikityanska <alice@isovalent.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260205133925.526371-3-alice.kernel@fastmail.im
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The next commits will transition away from using the hop-by-hop
extension header to encode packet length for BIG TCP. Add wrappers
around ip6->payload_len that return the actual value if it's non-zero,
and calculate it from skb->len if payload_len is set to zero (and a
symmetrical setter).
The new helpers are used wherever the surrounding code supports the
hop-by-hop jumbo header for BIG TCP IPv6, or the corresponding IPv4 code
uses skb_ip_totlen (e.g., in include/net/netfilter/nf_tables_ipv6.h).
No behavioral change in this commit.
Signed-off-by: Alice Mikityanska <alice@isovalent.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260205133925.526371-2-alice.kernel@fastmail.im
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
'dpll-zl3073x-include-current-frequency-in-supported-frequencies-list'
Ivan Vecera says:
====================
dpll: zl3073x: Include current frequency in supported frequencies list
This series ensures that the current operating frequency of a DPLL pin
is always reported in its supported frequencies list.
Problem:
When a ZL3073x DPLL pin is registered, its supported frequencies are
read from the firmware node's "supported-frequencies-hz" property.
However, if the firmware node is missing, or doesn't include the
current operating frequency, the pin reports a frequency that isn't
in its supported list. This inconsistency can confuse userspace tools
that expect the current frequency to be among the supported values.
Solution:
Always include the current pin frequency as the first entry in the
supported frequencies list, followed by any additional frequencies
from the firmware node (with duplicates filtered out).
Patch 1 refactors the output pin frequency calculation into a reusable
helper function zl3073x_dev_output_pin_freq_get(), which mirrors the
existing zl3073x_dev_ref_freq_get() for input pins.
Patch 2 modifies zl3073x_pin_props_get() to obtain the current
frequency early and place it at index 0 of the supported frequencies
array, ensuring it is always present regardless of firmware node
contents.
====================
Link: https://patch.msgid.link/20260205154350.3180465-1-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Ensure the current pin frequency is always present in the list of
supported frequencies reported to userspace. Previously, if the
firmware node was missing or didn't include the current operating
frequency in the supported-frequencies-hz property, the pin would
report a frequency that wasn't in its supported list.
Get the current frequency early in zl3073x_pin_props_get():
- For input pins: use zl3073x_dev_ref_freq_get()
- For output pins: use zl3073x_dev_output_pin_freq_get()
Place the current frequency at index 0 of the supported frequencies
array, then append frequencies from the firmware node (if present),
skipping any duplicate of the current frequency.
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Link: https://patch.msgid.link/20260205154350.3180465-3-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Introduce zl3073x_dev_output_pin_freq_get() helper function to compute
the output pin frequency based on synthesizer frequency, output divisor,
and signal format. For N-div signal formats, the N-pin frequency is
additionally divided by esync_n_period.
Add zl3073x_out_is_ndiv() helper to check if an output is configured
in N-div mode (2_NDIV or 2_NDIV_INV signal formats).
Refactor zl3073x_dpll_output_pin_frequency_get() callback to use the
new helper, reducing code duplication and enabling reuse of the
frequency calculation logic in other contexts.
This is a preparatory change for adding current frequency to the
supported frequencies list in pin properties.
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Link: https://patch.msgid.link/20260205154350.3180465-2-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Prefer pskb_may_pull() to avoid a stack canary in raw6_local_deliver().
Note: skb->head can change, hence we reload ip6h pointer in
ipv6_raw_deliver()
$ scripts/bloat-o-meter -t vmlinux.old vmlinux.new
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-86 (-86)
Function old new delta
raw6_local_deliver 780 694 -86
Total: Before=24889784, After=24889698, chg -0.00%
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260205211909.4115285-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Currently, both the icssg-prueth and icssg-prueth-sr1 drivers create
a dedicated 'emac->cmd_wq' workqueue.
In the icssg-prueth-sr1 driver, this workqueue is not utilized at all.
In the icssg-prueth driver, the workqueue is only used to execute the
actual processing of ndo_set_rx_mode. However, creating a dedicated
workqueue for such a simple use case is unnecessary. To simplify the
code, switch to using the system default workqueue instead.
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Tested-by: Meghana Malladi <m-malladi@ti.com>
Reviewed-by: MD Danish Anwar <danishanwar@ti.com>
Link: https://patch.msgid.link/20260205-icssg-prueth-workqueue-v2-1-cf5cf97efb37@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This helper is already (auto)inlined from IPv4 TCP stack.
Make it an inline function to benefit IPv6 as well.
$ scripts/bloat-o-meter -t vmlinux.old vmlinux.new
add/remove: 0/2 grow/shrink: 1/0 up/down: 30/-49 (-19)
Function old new delta
tcp_v6_rcv 3448 3478 +30
__pfx_tcp_filter 16 - -16
tcp_filter 33 - -33
Total: Before=24891904, After=24891885, chg -0.00%
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260205164329.3401481-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
While compile testing on less common architectures, I noticed that gcc-10 on
s390 finds a bug that all other configurations seem to miss:
drivers/net/ethernet/myricom/myri10ge/myri10ge.c: In function 'myri10ge_set_multicast_list':
drivers/net/ethernet/myricom/myri10ge/myri10ge.c:391:25: error: 'cmd.data0' is used uninitialized in this function [-Werror=uninitialized]
391 | buf->data0 = htonl(data->data0);
| ^~
drivers/net/ethernet/myricom/myri10ge/myri10ge.c:392:25: error: '*((void *)&cmd+4)' is used uninitialized in this function [-Werror=uninitialized]
392 | buf->data1 = htonl(data->data1);
| ^~
drivers/net/ethernet/myricom/myri10ge/myri10ge.c: In function 'myri10ge_allocate_rings':
drivers/net/ethernet/myricom/myri10ge/myri10ge.c:392:13: error: 'cmd.data1' is used uninitialized in this function [-Werror=uninitialized]
392 | buf->data1 = htonl(data->data1);
drivers/net/ethernet/myricom/myri10ge/myri10ge.c:1939:22: note: 'cmd.data1' was declared here
1939 | struct myri10ge_cmd cmd;
| ^~~
drivers/net/ethernet/myricom/myri10ge/myri10ge.c:393:13: error: 'cmd.data2' is used uninitialized in this function [-Werror=uninitialized]
393 | buf->data2 = htonl(data->data2);
drivers/net/ethernet/myricom/myri10ge/myri10ge.c:1939:22: note: 'cmd.data2' was declared here
1939 | struct myri10ge_cmd cmd;
It would be nice to understand how to make other compilers catch this as
well, but for the moment I'll just shut up the warning by fixing the
undefined behavior in this driver.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://patch.msgid.link/20260205162935.2126442-1-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The driver started using dimlib but fails to select the corresponding
symbol, which results in a link failure:
x86_64-linux-ld: drivers/net/ethernet/huawei/hinic3/hinic3_irq.o: in function `hinic3_poll':
hinic3_irq.c:(.text+0x179): undefined reference to `net_dim'
x86_64-linux-ld: drivers/net/ethernet/huawei/hinic3/hinic3_irq.o: in function `hinic3_rx_dim_work':
hinic3_irq.c:(.text+0x1fb): undefined reference to `net_dim_get_rx_moderation'
Fixes: b35a6fd37a00 ("hinic3: Add adaptive IRQ coalescing with DIM")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20260205161530.1308504-1-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The skb extension ids range from 0 .. 7 to fit their bits as flags into
a single byte. The ids are automatically enumnerated in enum skb_ext_id
in skbuff.h, where SKB_EXT_NUM is defined as the last value.
When having 8 skb extension ids (0 .. 7), SKB_EXT_NUM becomes 8 which is
a valid value for SKB_EXT_NUM.
Fixes: 96ea3a1e2d31 ("can: add CAN skb extension infrastructure")
Link: https://lore.kernel.org/netdev/aXoMqaA7b2CqJZNA@strlen.de/
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://patch.msgid.link/20260205-skb_ext-v1-1-9ba992ccee8b@hartkopp.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In prestera_ethtool_set_fecparam(), the error message is opposite of
the condition checking PRESTERA_PORT_TCVR_SFP. FEC configuration is
not allowed on SFP ports, but the message says "non-SFP ports", which
does not match the condition. However, FEC may be required depending on
the transceiver, cable, or mode, and firmware already validates invalid
combinations.
Remove the SFP transceiver check and let firmware handle validation.
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Acked-by: Elad Nachman <enachman@marvell.com>
Link: https://patch.msgid.link/20260205091958.231413-1-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Using kmem_cache_free_bulk() in fq_gc() was not optimal.
1) It needs an array.
2) It is only saving cpu cycles for large batches.
The automatic array forces a stack canary, which is expensive.
In practice fq_gc was finding zero, one or two flows at most
per round.
Remove the array, use kmem_cache_free().
This makes fq_enqueue() smaller and faster.
$ scripts/bloat-o-meter -t vmlinux.old vmlinux.new
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-79 (-79)
Function old new delta
fq_enqueue 1629 1550 -79
Total: Before=24886583, After=24886504, chg -0.00%
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260204190034.76277-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Currently, unhash_nsid() scans the entire system for each netns being
killed, leading to O(L_dying_net * M_alive_net * N_id) complexity, as
__peernet2id() also performs a linear search in the IDR.
Optimize this to O(M_alive_net * N_id) by batching unhash operations. Move
unhash_nsid() out of the per-netns loop in cleanup_net() to perform a
single-pass traversal over survivor namespaces.
Identify dying peers by an 'is_dying' flag, which is set under net_rwsem
write lock after the netns is removed from the global list. This batches
the unhashing work and eliminates the O(L_dying_net) multiplier.
To minimize the impact on struct net size, 'is_dying' is placed in an
existing hole after 'hash_mix' in struct net.
Use a restartable idr_get_next() loop for iteration. This avoids the
unsafe modification issue inherent to idr_for_each() callbacks and allows
dropping the nsid_lock to safely call sleepy rtnl_net_notifyid().
Clean up redundant nsid_lock and simplify the destruction loop now that
unhashing is centralized.
Signed-off-by: Qiliang Yuan <yuanql9@chinatelecom.cn>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260204074854.3506916-1-realwujing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Previously the HW-GRO code was using a separate page_pool for the header
buffer. The pages of the header buffer were replenished via UMR. This
mechanism has some drawbacks:
- Reference counting on the page_pool page frags is not cheap.
- UMRs have HW overhead for updating and also for access. Especially for
the KLM type which was previously used.
- UMR code for headers is complex.
This patch switches to using a static memory area (static MTT MKEY) for
the header buffer and does a header memcpy. This happens only once per
GRO session. The SKB is allocated from the per-cpu NAPI SKB cache.
Performance numbers for x86:
+---------------------------------------------------------+
| Test | Baseline | Header Copy | Change |
|---------------------+------------+-------------+--------|
| iperf3 oncpu | 59.5 Gbps | 64.00 Gbps | 7 % |
| iperf3 offcpu | 102.5 Gbps | 104.20 Gbps | 2 % |
| kperf oncpu | 115.0 Gbps | 130.00 Gbps | 12 % |
| XDP_DROP (skb mode) | 3.9 Mpps | 3.9 Mpps | 0 % |
+---------------------------------------------------------+
Notes on test:
- System: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz
- oncpu: NAPI and application running on same CPU
- offcpu: NAPI and application running on different CPUs
- MTU: 1500
- iperf3 tests are single stream, 60s with IPv6 (for slightly larger
headers)
- kperf version [1]
[1] git://git.kernel.dk/kperf.git
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260204200345.1724098-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Rename TAUI/TBASE to GAUI/GBASE in 1600G link mode identifier and its
usage in ethtool and link-info tables.
Reported-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com>
Signed-off-by: Yael Chemla <ychemla@nvidia.com>
Reviewed-by: Shahar Shitrit <shshitrit@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com>
Reported-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com>
Signed-off-by: Yael Chemla <ychemla@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://patch.msgid.link/20260204194324.1723534-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Cross-merge networking fixes after downstream PR (net-6.19-rc9).
No adjacent changes, conflicts:
drivers/net/ethernet/spacemit/k1_emac.c
3125fc1701694 ("net: spacemit: k1-emac: fix jumbo frame support")
f66086798f91f ("net: spacemit: Remove broken flow control support")
https://lore.kernel.org/aYIysFIE9ooavWia@sirena.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from wireless and Netfilter.
Previous releases - regressions:
- eth: stmmac: fix stm32 (and potentially others) resume regression
- nf_tables: fix inverted genmask check in nft_map_catchall_activate()
- usb: r8152: fix resume reset deadlock
- fix reporting RXH_XFRM_NO_CHANGE as input_xfrm for RSS contexts
Previous releases - always broken:
- sched: cls_u32: use skb_header_pointer_careful() to avoid OOB reads
with malicious u32 rules
- eth: ice: timestamping related fixes"
* tag 'net-6.19-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (38 commits)
ipv6: Fix ECMP sibling count mismatch when clearing RTF_ADDRCONF
netfilter: nf_tables: fix inverted genmask check in nft_map_catchall_activate()
net: cpsw: Execute ndo_set_rx_mode callback in a work queue
net: cpsw_new: Execute ndo_set_rx_mode callback in a work queue
gve: Correct ethtool rx_dropped calculation
gve: Fix stats report corruption on queue count change
selftest: net: add a test-case for encap segmentation after GRO
net: gro: fix outer network offset
net: add proper RCU protection to /proc/net/ptype
net: ethernet: adi: adin1110: Check return value of devm_gpiod_get_optional() in adin1110_check_spi()
wifi: iwlwifi: mvm: pause TCM on fast resume
wifi: iwlwifi: mld: cancel mlo_scan_start_wk
net: spacemit: k1-emac: fix jumbo frame support
net: enetc: Convert 16-bit register reads to 32-bit for ENETC v4
net: enetc: Convert 16-bit register writes to 32-bit for ENETC v4
net: enetc: Remove CBDR cacheability AXI settings for ENETC v4
net: enetc: Remove SI/BDR cacheability AXI settings for ENETC v4
tipc: use kfree_sensitive() for session key material
net: stmmac: fix stm32 (and potentially others) resume regression
net: rss: fix reporting RXH_XFRM_NO_CHANGE as input_xfrm for contexts
...
|
|
Currently we are registering one dynamic lockdep key for each allocated
qdisc, to avoid false deadlock reports when mirred (or TC eBPF) redirects
packets to another device while the root lock is acquired [1].
Since dynamic keys are a limited resource, we can save them at least for
qdiscs that are not meant to acquire the root lock in the traffic path,
or to carry traffic at all, like:
- clsact
- ingress
- noqueue
Don't register dynamic keys for the above schedulers, so that we hit
MAX_LOCKDEP_KEYS later in our tests.
[1] https://github.com/multipath-tcp/mptcp_net-next/issues/451
Changes in v2:
- change ordering of spin_lock_init() vs. lockdep_register_key()
(Jakub Kicinski)
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Link: https://patch.msgid.link/94448f7fa7c4f52d2ce416a4895ec87d456d7417.1770220576.git.dcaratti@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When looking at the iMX93 documentation, the definitions in the driver
do not correspond with the documentation, which makes the driver
confusing.
The driver, for example, re-uses a definition for bit 0 for two
different registers, where this bit have completely different purposes.
Fix this by renaming the second register, and adding a definition that
reflects the true purpose of bit 0 in the first register (EQOS enable.)
Replace MX93_GPR_ENET_QOS_INTF_MODE_MASK with MX93_GPR_ENET_QOS_ENABLE
and MX93_GPR_ENET_QOS_INTF_SEL_MASK as MX93_GPR_ENET_QOS_INTF_MODE_MASK
is not a register field.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vnaGl-00000007i9f-0ZMw@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
TCP v6 spends a good amount of time rebuilding a fresh fl6 at each
transmit in inet6_csk_xmit()/inet6_csk_route_socket().
TCP v4 caches the information in inet->cork.fl.u.ip4 instead.
This patch is a first step converting IPv6 to the same strategy:
Before this patch inet6_sk_rebuild_header() only validated/rebuilt
a dst. Automatic variable @fl6 content was lost.
After this patch inet6_sk_rebuild_header() also initializes
inet->cork.fl.u.ip6, which can be reused in the future.
This makes inet6_sk_rebuild_header() very similar to
inet_sk_rebuild_header().
Also remove the EXPORT_SYMBOL_GPL(), inet6_sk_rebuild_header()
is not called from any module.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260204163035.4123817-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
'tcp-remove-net-core-request_sock-c-and-no-longer-inline-__reqsk_free'
Eric Dumazet says:
====================
tcp: remove net/core/request_sock.c and no longer inline __reqsk_free()
After DCCP removal, net/core/request_sock.c makes no more sense.
Move reqsk_queue_alloc() and reqsk_fastopen_remove() to TCP files.
Then put __reqsk_free() out of line to save ~2 Kbytes of text.
====================
Link: https://patch.msgid.link/20260204055147.1682705-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Inlining __reqsk_free() is overkill, let's reclaim 2 Kbytes of text.
$ scripts/bloat-o-meter -t vmlinux.old vmlinux.new
add/remove: 2/4 grow/shrink: 2/14 up/down: 225/-2338 (-2113)
Function old new delta
__reqsk_free - 114 +114
sock_edemux 18 82 +64
inet_csk_listen_start 233 264 +31
__pfx___reqsk_free - 16 +16
__pfx_reqsk_queue_alloc 16 - -16
__pfx_reqsk_free 16 - -16
reqsk_queue_alloc 46 - -46
tcp_req_err 272 177 -95
reqsk_fastopen_remove 348 253 -95
cookie_bpf_check 157 62 -95
cookie_tcp_reqsk_alloc 387 290 -97
cookie_v4_check 1568 1465 -103
reqsk_free 105 - -105
cookie_v6_check 1519 1412 -107
sock_gen_put 187 78 -109
sock_pfree 212 82 -130
tcp_try_fastopen 1818 1683 -135
tcp_v4_rcv 3478 3294 -184
reqsk_put 306 90 -216
tcp_get_cookie_sock 551 318 -233
tcp_v6_rcv 3404 3141 -263
tcp_conn_request 2677 2384 -293
Total: Before=24887415, After=24885302, chg -0.01%
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260204055147.1682705-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
After DCCP removal, this file was not needed any more.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260204055147.1682705-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This function belongs to TCP stack, not to net/core/request_sock.c
We get rid of the now empty request_sock.c n the following patch.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260204055147.1682705-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Only called once from inet_csk_listen_start(), it can be static.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260204055147.1682705-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Russell King says:
====================
net: stmmac: rk: final cleanups part
This is the last part of my current dwmac-rk cleanups.
====================
Link: https://patch.msgid.link/aYMN2gZMfLPKuukG@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
rk3506, rk3528 and rk3588 have the rmii_mode bit in the clock GRF
register rather than the gmac GRF register. Provide a mask for this
field in the clock register, and convert these SoCs to use this.
Add the necessary code in rk_gmac_powerup() to write this field.
This allows us to get rid of these SoCs set_to_rmii() function. As
such, we need to mark these SoCs as supporting RMII mode.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Tested-by: Heiko Stuebner <heiko@sntech.de> #px30,rk3328,rk3568,rk3588
Link: https://patch.msgid.link/E1vnYyB-00000007hpF-1dwK@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Use rk_encode_wm16() for RMII clock gating control, and also for the
io_clksel bit used to select the transmit clock between CRU-derived
and IO-derived clock sources.
Both of these were configured via the "set_clock_selection" method in
the SoC specific operations, but there is no requirement to change the
io_clksel except when enabling clocks.
It is also possible that we don't need to ungate the RMII clock if we
are operating in RGMII mode, but this commit makes no change there.
Split up the configuration of these as separate functions, and remove
the set_clock_selection() method. Since these clocking bits are in the
same register that we call the "speed" register, move the logic for
writing that register into rk_write_speed_grf_reg().
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Tested-by: Heiko Stuebner <heiko@sntech.de> #px30,rk3328,rk3568,rk3588
Link: https://patch.msgid.link/E1vnYy6-00000007hp9-1AJM@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
RK3528 gmac0 dtsi contains:
gmac0: ethernet@ffbd0000 {
phy-handle = <&rmii0_phy>;
phy-mode = "rmii";
mdio0: mdio {
rmii0_phy: ethernet-phy@2 {
phy-is-integrated;
};
};
};
This follows the same pattern as rk3328, where this gmac instance
only supports RMII. Disable RGMII in phylink's supported_interfaces
mask for this gmac instance.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Link: https://patch.msgid.link/E1vnYy1-00000007hp3-0hKm@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
As detailed in a previous commit ("net: stmmac: rk: convert rk3328 to
use bsp_priv->id") rk3328 gmac2phy only supports RMII, whereas gmac2io
supports both RMII and RGMII. Clear supports_rgmii for gmac2phy.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Tested-by: Heiko Stuebner <heiko@sntech.de> #px30,rk3328 gmac2io,rk3568,rk3588
Link: https://patch.msgid.link/E1vnYxw-00000007hox-0DqH@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Rather than providing a now-empty set_to_rmii() method to indicate
that RMII is supported, switch to setting ops->supports_rmii instead.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Tested-by: Heiko Stuebner <heiko@sntech.de> #px30,rk3328,rk3568,rk3588
Link: https://patch.msgid.link/E1vnYxq-00000007hor-3yXt@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|