Age | Commit message (Collapse) | Author | Files | Lines |
|
Jan Stancek says:
====================
tools: ynl: add install target
This series adds an install target for ynl. The python code
is moved to a subdirectory, so it can be used as a package
with flat layout, as well as directly from the tree.
To try the install as a non-root user you can run:
$ mkdir /tmp/myroot
$ make DESTDIR=/tmp/myroot install
$ PATH="/tmp/myroot/usr/bin:$PATH" PYTHONPATH="$(ls -1d /tmp/myroot/usr/lib/python*/site-packages)" ynl --help
Proposed install layout is described in last patch.
====================
Link: https://patch.msgid.link/cover.1736343575.git.jstancek@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This will install C library, specs, rsts and pyynl. The initial
structure is:
$ mkdir /tmp/myroot
$ make DESTDIR=/tmp/myroot install
/usr
/usr/lib64
/usr/lib64/libynl.a
/usr/lib/python3.XX/site-packages/pyynl/*
/usr/lib/python3.XX/site-packages/pyynl-0.0.1.dist-info/*
/usr/bin
/usr/bin/ynl
/usr/bin/ynl-ethtool
/usr/include/ynl/*.h
/usr/share
/usr/share/doc
/usr/share/doc/ynl
/usr/share/doc/ynl/*.rst
/usr/share/ynl
/usr/share/ynl/genetlink-c.yaml
/usr/share/ynl/genetlink-legacy.yaml
/usr/share/ynl/genetlink.yaml
/usr/share/ynl/netlink-raw.yaml
/usr/share/ynl/specs
/usr/share/ynl/specs/devlink.yaml
/usr/share/ynl/specs/dpll.yaml
/usr/share/ynl/specs/ethtool.yaml
/usr/share/ynl/specs/fou.yaml
/usr/share/ynl/specs/handshake.yaml
/usr/share/ynl/specs/mptcp_pm.yaml
/usr/share/ynl/specs/netdev.yaml
/usr/share/ynl/specs/net_shaper.yaml
/usr/share/ynl/specs/nfsd.yaml
/usr/share/ynl/specs/nftables.yaml
/usr/share/ynl/specs/nlctrl.yaml
/usr/share/ynl/specs/ovs_datapath.yaml
/usr/share/ynl/specs/ovs_flow.yaml
/usr/share/ynl/specs/ovs_vport.yaml
/usr/share/ynl/specs/rt_addr.yaml
/usr/share/ynl/specs/rt_link.yaml
/usr/share/ynl/specs/rt_neigh.yaml
/usr/share/ynl/specs/rt_route.yaml
/usr/share/ynl/specs/rt_rule.yaml
/usr/share/ynl/specs/tcp_metrics.yaml
/usr/share/ynl/specs/tc.yaml
/usr/share/ynl/specs/team.yaml
Signed-off-by: Jan Stancek <jstancek@redhat.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/c882688d751295c7f35c7d4eba104cd5174a0861.1736343575.git.jstancek@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Generate docs using ynl_gen_rst and add install target for
headers, specs and generates rst files.
Factor out SPECS_DIR since it's repeated many times.
Signed-off-by: Jan Stancek <jstancek@redhat.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/645c68e3d201f1ef4276e3daddfe06262a0c2804.1736343575.git.jstancek@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add pyproject.toml and define authors, dependencies and
user-facing scripts. This will be used later by pip to
install python code.
Signed-off-by: Jan Stancek <jstancek@redhat.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/b184b43340f08aef97387bfd7f2b2cd9b015c343.1736343575.git.jstancek@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Move python code to a separate directory so it can be
packaged as a python module. Updates existing references
in selftests and docs.
Also rename ynl-gen-[c|rst] to ynl_gen_[c|rst], avoid
dashes as these prevent easy imports for entrypoints.
Signed-off-by: Jan Stancek <jstancek@redhat.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/a4151bad0e6984e7164d395125ce87fd2e048bf1.1736343575.git.jstancek@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Empty nests are the same size as a flag at the netlink level
(just a 4 byte nlattr without a payload). They are sometimes
useful in case we want to only communicate a presence of
something but may want to add more details later.
This may be the case in the upcoming io_uring ZC patches,
for example.
Improve handling of nested empty structs. We already support
empty structs since a lot of netlink replies are empty, but
for nested ones we need minor tweaks to avoid pointless empty
lines and unused variables.
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://patch.msgid.link/20250108200758.2693155-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
John Daley says:
====================
enic: Set link speed only after link up
====================
Link: https://patch.msgid.link/20250107214159.18807-1-johndale@cisco.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The RX adaptive interrupt moderation table is indexed by link speed
range, where the last row of the table is the catch-all for all link
speeds greater than 10Gbps. The comment said 10 - 40Gbps, but since
there are now adapters with link speeds than 40Gbps, the comment is now
wrong and should indicate it applies to all speeds greater than 10Gbps.
Co-developed-by: Nelson Escobar <neescoba@cisco.com>
Signed-off-by: Nelson Escobar <neescoba@cisco.com>
Co-developed-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: John Daley <johndale@cisco.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/20250107214159.18807-4-johndale@cisco.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The link speed is obtained in the RX adaptive coalescing function. It
was being called at probe time when the link may not be up. Change the
call to run after the Link comes up.
The impact of not getting the correct link speed was that the low end of
the adaptive interrupt range was always being set to 0 which could have
caused a slight increase in the number of RX interrupts.
Co-developed-by: Nelson Escobar <neescoba@cisco.com>
Signed-off-by: Nelson Escobar <neescoba@cisco.com>
Co-developed-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: John Daley <johndale@cisco.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/20250107214159.18807-3-johndale@cisco.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Move the function used for setting the RX coalescing range to before
the function that checks the link status. It needs to be called from
there instead of from the probe function.
There is no functional change.
Co-developed-by: Nelson Escobar <neescoba@cisco.com>
Signed-off-by: Nelson Escobar <neescoba@cisco.com>
Co-developed-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: John Daley <johndale@cisco.com>
Link: https://patch.msgid.link/20250107214159.18807-2-johndale@cisco.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
All Qualcomm firmwares uploaded to linux-firmware are in MBN format,
instead of split MDT. No functional changes, just correct the DTS
example so people will not rely on unaccepted files.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Alex Elder <elder@kernel.org>
Link: https://patch.msgid.link/20250108120242.156201-1-krzysztof.kozlowski@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Jakub Kicinski says:
====================
net: make sure we retain NAPI ordering on netdev->napi_list
I promised Eric to remove the rtnl protection of the NAPI list,
when I sat down to implement it over the break I realized that
the recently added NAPI ID retention will break the list ordering
assumption we have in netlink dump. The ordering used to happen
"naturally", because we'd always add NAPIs that the head of the
list, and assign a new monotonically increasing ID.
Before the first patch of this series we'd still only add at
the head of the list but now the newly added NAPI may inherit
from its config an ID lower than something else already on the list.
The fix is in the first patch, the rest is netdevsim churn to test it.
I'm posting this for net-next, because AFAICT the problem can't
be triggered in net, given the very limited queue API adoption.
v2:
- [patch 2] allocate the array with kcalloc() instead of kvcalloc()
- [patch 2] set GFP_KERNEL_ACCOUNT when allocating queues
- [patch 6] don't null-check page pool before page_pool_destroy()
- [patch 6] controled -> controlled
- [patch 7] change mode to 0200
- [patch 7] reorder removal to be inverse of add
- [patch 7] fix the spaces vs tabs
v1: https://lore.kernel.org/20250103185954.1236510-1-kuba@kernel.org
====================
Link: https://patch.msgid.link/20250107160846.2223263-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Test listing netdevsim NAPIs before and after a single queue
has been reset (and NAPIs re-added).
Start from resetting the middle queue because edge cases
(first / last) may actually be less likely to trigger bugs.
# ./tools/testing/selftests/net/nl_netdev.py
KTAP version 1
1..4
ok 1 nl_netdev.empty_check
ok 2 nl_netdev.lo_check
ok 3 nl_netdev.page_pool_check
ok 4 nl_netdev.napi_list_check
# Totals: pass:4 fail:0 xfail:0 xpass:0 skip:0 error:0
Reviewed-by: Willem de Bruijn <willemb@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Support triggering queue reset via debugfs for an upcoming test.
Reviewed-by: Willem de Bruijn <willemb@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Add queue management API support. We need a way to reset queues
to test NAPI reordering, the queue management API provides a
handy scaffolding for that.
Reviewed-by: Willem de Bruijn <willemb@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
We'll need the code to allocate and free queues in the queue management
API, factor it out.
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Make nsim->rqs an array of pointers and allocate them individually
so that we can swap them out one by one.
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Link the NAPI instances to their configs. This will be needed to test
that NAPI config doesn't break list ordering.
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Linus suggested during one of past maintainer summits (in context of
a DMA_BUF discussion) that symbol namespaces can be used to prevent
unwelcome but in-tree code from using all exported functions.
Create a namespace for netdev.
Export netdev_rx_queue_restart(), drivers may want to use it since
it gives them a simple and safe way to restart a queue to apply
config changes. But it's both too low level and too actively developed
to be used outside netdev.
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Netlink code depends on NAPI instances being sorted by ID on
the netdev list for dump continuation. We need to be able to
find the position on the list where we left off if dump does
not fit in a single skb, and in the meantime NAPI instances
can come and go.
This was trivially true when we were assigning a new ID to every
new NAPI instance. Since we added the NAPI config API, we try
to retain the ID previously used for the same queue, but still
add the new NAPI instance at the start of the list.
This is fine if we reset the entire netdev and all NAPIs get
removed and added back. If driver replaces a NAPI instance
during an operation like DEVMEM queue reset, or recreates
a subset of NAPI instances in other ways we may end up with
broken ordering, and therefore Netlink dumps with either
missing or duplicated entries.
At this stage the problem is theoretical. Only two drivers
support queue API, bnxt and gve. gve recreates NAPIs during
queue reset, but it doesn't support NAPI config.
bnxt supports NAPI config but doesn't recreate instances
during reset.
We need to save the ID in the config as soon as it is assigned
because otherwise the new NAPI will not know what ID it will
get at enable time, at the time it is being added.
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
A synchronize_rcu() was added by mistake in commit
c5a759117210 ("net/hsr: Use list_head (and rcu) instead
of array for slave devices.")
RCU does not mandate to observe a grace period after
list_add_tail_rcu().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250107144701.503884-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
In commit 66e4c8d95008 ("net: warn if transport header was not set")
I added a debug check in skb_transport_header() to detect
if a caller expects the transport_header to be set to a meaningful
value by a prior code path.
Unfortunately, __netif_receive_skb_core() resets the transport header
to the same value than the network header, defeating this check
in receive paths.
Pretending the transport and network headers are the same
is usually wrong.
This patch removes this reset for CONFIG_DEBUG_NET=y builds
to let fuzzers and CI find bugs.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250107144342.499759-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
DTS example in the bindings should be indented with 2- or 4-spaces and
aligned with opening '- |', so correct any differences like 3-spaces or
mixtures 2- and 4-spaces in one binding.
No functional changes here, but saves some comments during reviews of
new patches built on existing code.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> # for net/can
Reviewed-by: Roger Quadros <rogerq@kernel.org> # for ti,k3-am654-*
Acked-by: Florian Fainelli <florian.fainelli@broadcom.com> # net/brcm,*
Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Link: https://patch.msgid.link/20250107125613.211478-1-krzysztof.kozlowski@linaro.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
This change introduces a mechanism for notifying userspace
applications about changes to IPv6 anycast addresses via netlink. It
includes:
* Addition and deletion of IPv6 anycast addresses are reported using
RTM_NEWANYCAST and RTM_DELANYCAST.
* A new netlink group (RTNLGRP_IPV6_ACADDR) for subscribing to these
notifications.
This enables user space applications(e.g. ip monitor) to efficiently
track anycast addresses through netlink messages, improving metrics
collection and system monitoring. It also unlocks the potential for
advanced anycast management in user space, such as hardware offload
control and fine grained network control.
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Yuyang Huang <yuyanghuang@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250107114355.1766086-1-yuyanghuang@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The overflow_work is using system wq to do overflow checks and updates
for PHC device timecounter, which might be overhelmed by other tasks.
But there is dedicated kthread in PTP subsystem designed for such
things. This patch changes the work queue to proper align with PTP
subsystem and to avoid overloading system work queue.
The adjfine() function acts the same way as overflow check worker,
we can postpone ptp aux worker till the next overflow period after
adjfine() was called.
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Acked-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250107104812.380225-1-vadfed@meta.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
stmmac_rx_offset() is referenced in stmmac_main.c only,
let's move it to stmmac_main.c.
Drop the inline keyword by the way, it is better to let the compiler
to decide.
Compile tested only.
No functional change intended.
Signed-off-by: Furong Xu <0x1207@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250107075448.4039925-1-0x1207@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Add support for RTL8125BP rev.b. Its XID is 0x689. This chip supports
DASH and its dash type is "RTL_DASH_25_BP".
Signed-off-by: ChunHao Lin <hau@realtek.com>
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/20250107064355.104711-1-hau@realtek.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Most of our tests use rtnetlink to read device stats, so they
don't expose the drivers much to paths in which device stats
are read under RCU. Add tests which hammer profcs reads to
make sure drivers:
- don't sleep while reporting stats,
- can handle parallel reads,
- can handle device going down while reading.
Set ifname on the env class in NetDrvEnv, we already do that
in NetDrvEpEnv.
KTAP version 1
1..7
ok 1 stats.check_pause
ok 2 stats.check_fec
ok 3 stats.pkt_byte_sum
ok 4 stats.qstat_by_ifindex
ok 5 stats.check_down
ok 6 stats.procfs_hammer
# completed up/down cycles: 6
ok 7 stats.procfs_downup_hammer
# Totals: pass:7 fail:0 xfail:0 xpass:0 skip:0 error:0
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250107022932.2087744-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
'intel-wired-lan-driver-updates-2025-01-06-igb-igc-ixgbe-ixgbevf-i40e-fm10k'
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2025-01-06 (igb, igc, ixgbe, ixgbevf, i40e, fm10k)
For igb:
Sriram Yagnaraman and Kurt Kanzenbach add support for AF_XDP
zero-copy.
Original cover letter:
The first couple of patches adds helper functions to prepare for AF_XDP
zero-copy support which comes in the last couple of patches, one each
for Rx and TX paths.
As mentioned in v1 patchset [0], I don't have access to an actual IGB
device to provide correct performance numbers. I have used Intel 82576EB
emulator in QEMU [1] to test the changes to IGB driver.
The tests use one isolated vCPU for RX/TX and one isolated vCPU for the
xdp-sock application [2]. Hope these measurements provide at the least
some indication on the increase in performance when using ZC, especially
in the TX path. It would be awesome if someone with a real IGB NIC can
test the patch.
AF_XDP performance using 64 byte packets in Kpps.
Benchmark: XDP-SKB XDP-DRV XDP-DRV(ZC)
rxdrop 220 235 350
txpush 1.000 1.000 410
l2fwd 1.000 1.000 200
AF_XDP performance using 1500 byte packets in Kpps.
Benchmark: XDP-SKB XDP-DRV XDP-DRV(ZC)
rxdrop 200 210 310
txpush 1.000 1.000 410
l2fwd 0.900 1.000 160
[0]: https://lore.kernel.org/intel-wired-lan/20230704095915.9750-1-sriram.yagnaraman@est.tech/
[1]: https://www.qemu.org/docs/master/system/devices/igb.html
[2]: https://github.com/xdp-project/bpf-examples/tree/master/AF_XDP-example
Subsequent changes and information can be found here:
https://lore.kernel.org/intel-wired-lan/20241018-b4-igb_zero_copy-v9-0-da139d78d796@linutronix.de/
Yue Haibing converts use of ERR_PTR return to traditional error code
which resolves a smatch warning.
For igc:
Song Yoong Siang allows for the XDP program to be hot-swapped.
Yue Haibing converts use of ERR_PTR return to traditional error code
which resolves a smatch warning.
Joe Damato adds sets IRQ and queues to NAPI instances to allow for
reporting via netdev-genl API.
For ixgbe:
Yue Haibing converts use of ERR_PTR return to traditional error code
which resolves a smatch warning.
For ixgbevf:
Yue Haibing converts use of ERR_PTR return to traditional error code
which resolves a smatch warning.
For i40e:
Alex implements "mdd-auto-reset-vf" private flag to automatically reset
VFs when encountering an MDD event.
For fm10k:
Dr. David Alan Gilbert removes an unused function.
====================
Link: https://patch.msgid.link/20250106221929.956999-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
fm10k_iov_msg_mac_vlan_pf() has been unused since 2017's
commit 1f5c27e52857 ("fm10k: use the MAC/VLAN queue for VF<->PF MAC/VLAN
requests")
Remove it.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20250106221929.956999-16-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Link queues to NAPI instances via netdev-genl API so that users can
query this information with netlink. Handle a few cases in the driver:
1. Link/unlink the NAPIs when XDP is enabled/disabled
2. Handle IGC_FLAG_QUEUE_PAIRS enabled and disabled
Example output when IGC_FLAG_QUEUE_PAIRS is enabled:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump queue-get --json='{"ifindex": 2}'
[{'id': 0, 'ifindex': 2, 'napi-id': 8193, 'type': 'rx'},
{'id': 1, 'ifindex': 2, 'napi-id': 8194, 'type': 'rx'},
{'id': 2, 'ifindex': 2, 'napi-id': 8195, 'type': 'rx'},
{'id': 3, 'ifindex': 2, 'napi-id': 8196, 'type': 'rx'},
{'id': 0, 'ifindex': 2, 'napi-id': 8193, 'type': 'tx'},
{'id': 1, 'ifindex': 2, 'napi-id': 8194, 'type': 'tx'},
{'id': 2, 'ifindex': 2, 'napi-id': 8195, 'type': 'tx'},
{'id': 3, 'ifindex': 2, 'napi-id': 8196, 'type': 'tx'}]
Since IGC_FLAG_QUEUE_PAIRS is enabled, you'll note that the same NAPI ID
is present for both rx and tx queues at the same index, for example
index 0:
{'id': 0, 'ifindex': 2, 'napi-id': 8193, 'type': 'rx'},
{'id': 0, 'ifindex': 2, 'napi-id': 8193, 'type': 'tx'},
To test IGC_FLAG_QUEUE_PAIRS disabled, a test system was booted using
the grub command line option "maxcpus=2" to force
igc_set_interrupt_capability to disable IGC_FLAG_QUEUE_PAIRS.
Example output when IGC_FLAG_QUEUE_PAIRS is disabled:
$ lscpu | grep "On-line CPU"
On-line CPU(s) list: 0,2
$ ethtool -l enp86s0 | tail -5
Current hardware settings:
RX: n/a
TX: n/a
Other: 1
Combined: 2
$ cat /proc/interrupts | grep enp
144: [...] enp86s0
145: [...] enp86s0-rx-0
146: [...] enp86s0-rx-1
147: [...] enp86s0-tx-0
148: [...] enp86s0-tx-1
1 "other" IRQ, and 2 IRQs for each of RX and Tx, so we expect netlink to
report 4 IRQs with unique NAPI IDs:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump napi-get --json='{"ifindex": 2}'
[{'id': 8196, 'ifindex': 2, 'irq': 148},
{'id': 8195, 'ifindex': 2, 'irq': 147},
{'id': 8194, 'ifindex': 2, 'irq': 146},
{'id': 8193, 'ifindex': 2, 'irq': 145}]
Now we examine which queues these NAPIs are associated with, expecting
that since IGC_FLAG_QUEUE_PAIRS is disabled each RX and TX queue will
have its own NAPI instance:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump queue-get --json='{"ifindex": 2}'
[{'id': 0, 'ifindex': 2, 'napi-id': 8193, 'type': 'rx'},
{'id': 1, 'ifindex': 2, 'napi-id': 8194, 'type': 'rx'},
{'id': 0, 'ifindex': 2, 'napi-id': 8195, 'type': 'tx'},
{'id': 1, 'ifindex': 2, 'napi-id': 8196, 'type': 'tx'}]
Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Tested-by: Avigail Dahan <avigailx.dahan@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20250106221929.956999-15-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Link IRQs to NAPI instances via netdev-genl API so that users can query
this information with netlink.
Compare the output of /proc/interrupts (noting that IRQ 128 is the
"other" IRQ which does not appear to have a NAPI instance):
$ cat /proc/interrupts | grep enp86s0 | cut --delimiter=":" -f1
128
129
130
131
132
The output from netlink shows the mapping of NAPI IDs to IRQs (again
noting that 128 is absent as it is the "other" IRQ):
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump napi-get --json='{"ifindex": 2}'
[{'defer-hard-irqs': 0,
'gro-flush-timeout': 0,
'id': 8196,
'ifindex': 2,
'irq': 132},
{'defer-hard-irqs': 0,
'gro-flush-timeout': 0,
'id': 8195,
'ifindex': 2,
'irq': 131},
{'defer-hard-irqs': 0,
'gro-flush-timeout': 0,
'id': 8194,
'ifindex': 2,
'irq': 130},
{'defer-hard-irqs': 0,
'gro-flush-timeout': 0,
'id': 8193,
'ifindex': 2,
'irq': 129}]
Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Tested-by: Avigail Dahan <avigailx.dahan@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20250106221929.956999-14-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Implement "mdd-auto-reset-vf" priv-flag to handle Tx and Rx MDD events for VFs.
This flag is also used in other network adapters like ICE.
Usage:
- "on" - The problematic VF will be automatically reset
if a malformed descriptor is detected.
- "off" - The problematic VF will be disabled.
In cases where a VF sends malformed packets classified as malicious, it can
cause the Tx queue to freeze, rendering it unusable for several minutes. When
an MDD event occurs, this new implementation allows for a graceful VF reset to
quickly restore operational state.
Currently, VF queues are disabled if an MDD event occurs. This patch adds the
ability to reset the VF if a Tx or Rx MDD event occurs. It also includes MDD
event logging throttling to avoid dmesg pollution and unifies the format of
Tx and Rx MDD messages.
Note: Standard message rate limiting functions like dev_info_ratelimited()
do not meet our requirements. Custom rate limiting is implemented,
please see the code for details.
Co-developed-by: Jan Sokolowski <jan.sokolowski@intel.com>
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Co-developed-by: Padraig J Connolly <padraig.j.connolly@intel.com>
Signed-off-by: Padraig J Connolly <padraig.j.connolly@intel.com>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Michal Schmidt <mschmidt@redhat.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20250106221929.956999-13-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
ixgbevf_run_xdp() converts customed xdp action to a negative error code
with the sk_buff pointer type which be checked with IS_ERR in
ixgbevf_clean_rx_irq(). Remove this error pointer handing instead use
plain int return value.
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20250106221929.956999-12-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
ixgbe_run_xdp() converts customed xdp action to a negative error code
with the sk_buff pointer type which be checked with IS_ERR in
ixgbe_clean_rx_irq(). Remove this error pointer handing instead use
plain int return value.
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20250106221929.956999-11-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
igb_run_xdp() converts customed xdp action to a negative error code
with the sk_buff pointer type which be checked with IS_ERR in
igb_clean_rx_irq(). Remove this error pointer handing instead use plain
int return value.
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20250106221929.956999-10-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
igc_xdp_run_prog() converts customed xdp action to a negative error code
with the sk_buff pointer type which be checked with IS_ERR in
igc_clean_rx_irq(). Remove this error pointer handing instead use plain
int return value to fix this smatch warnings:
drivers/net/ethernet/intel/igc/igc_main.c:2533
igc_xdp_run_prog() warn: passing zero to 'ERR_PTR'
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Avigail Dahan <avigailx.dahan@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20250106221929.956999-9-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Currently, the driver would always close and reopen the network interface
when setting/removing the XDP program, regardless of the presence of XDP
resources. This could cause unnecessary disruptions.
To avoid this, introduces a check to determine if there is a need to
close and reopen the interface, allowing for seamless hot-swapping of
XDP programs.
Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Avigail Dahan <avigailx.dahan@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20250106221929.956999-8-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add support for AF_XDP zero-copy transmit path.
A new TX buffer type IGB_TYPE_XSK is introduced to indicate that the Tx
frame was allocated from the xsk buff pool, so igb_clean_tx_ring() and
igb_clean_tx_irq() can clean the buffers correctly based on type.
igb_xmit_zc() performs the actual packet transmit when AF_XDP zero-copy is
enabled. We share the TX ring between slow path, XDP and AF_XDP
zero-copy, so we use the netdev queue lock to ensure mutual exclusion.
Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
[Kurt: Set olinfo_status in igb_xmit_zc() so that frames are transmitted,
Use READ_ONCE() for xsk_pool and check Tx disabled and carrier in
igb_xmit_zc(), Add FIXME for RS bit]
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20250106221929.956999-7-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add support for AF_XDP zero-copy receive path.
When AF_XDP zero-copy is enabled, the rx buffers are allocated from the
xsk buff pool using igb_alloc_rx_buffers_zc().
Use xsk_pool_get_rx_frame_size() to set SRRCTL rx buf size when zero-copy
is enabled.
Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
[Kurt: Port to v6.12 and provide napi_id for xdp_rxq_info_reg(),
RCT, remove NETDEV_XDP_ACT_XSK_ZEROCOPY, update NTC handling,
READ_ONCE() xsk_pool, likelyfy for XDP_REDIRECT case]
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20250106221929.956999-6-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Move XDP finalize and Rx statistics update into separate functions. This
way, they can be reused by the XDP and XDP/ZC code later.
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20250106221929.956999-5-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add the following ring flag:
- IGB_RING_FLAG_TX_DISABLED (when xsk pool is being setup)
Add a xdp_buff array for use with XSK receive batch API, and a pointer
to xsk_pool in igb_adapter.
Add enable/disable functions for TX and RX rings.
Add enable/disable functions for XSK pool.
Add xsk wakeup function.
None of the above functionality will be active until
NETDEV_XDP_ACT_XSK_ZEROCOPY is advertised in netdev->xdp_features.
Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
[Kurt: Add READ/WRITE_ONCE(), synchronize_net(),
remove IGB_RING_FLAG_AF_XDP_ZC]
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20250106221929.956999-4-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Introduce igb_xdp_is_enabled() to check if an XDP program is assigned to
the device. Use that wherever xdp_prog is read and evaluated.
Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
[Kurt: Split patches and use READ_ONCE()]
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20250106221929.956999-3-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Remove static qualifiers on the following functions to be able to call
from XSK specific file that is added in the later patches:
- igb_xdp_tx_queue_mapping()
- igb_xdp_ring_update_tail()
- igb_clean_tx_ring()
- igb_clean_rx_ring()
- igb_xdp_xmit_back()
- igb_process_skb_fields()
While at it, inline igb_xdp_tx_queue_mapping() and
igb_xdp_ring_update_tail(). These functions are small enough and used in
XDP hot paths.
Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
[Kurt: Split patches, inline small XDP functions]
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20250106221929.956999-2-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Jakub Kicinski says:
====================
tools: ynl: decode link types present in tests
Using a kernel built for the net selftest target to run drivers/net
tests currently fails, because the net kernel automatically spawns
a handful of tunnel devices which YNL can't decode.
Fill in those missing link types in rt_link. We need to extend subset
support a bit for it to work.
v1: https://lore.kernel.org/20250105012523.1722231-1-kuba@kernel.org
====================
Link: https://patch.msgid.link/20250107022820.2087101-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Some of our tests load vti and ip6tnl so not being able to decode
the link attrs gets in the way of using Python YNL for testing.
Decode link attributes for ip6tnl, vti and vti6.
ip6tnl uses IFLA_IPTUN_FLAGS as u32, while ipv4 and sit expect
a u16 attribute, so we have a (first?) subset type override...
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250107022820.2087101-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When parsing throws an exception one often has to figure out which
attribute couldn't be parsed from first principles. For families
with large message parsing trees like rtnetlink guessing the
attribute can be hard.
Print a bit of information as the exception travels out, e.g.:
# when dumping rt links
Error decoding 'flags' from 'linkinfo-ip6tnl-attrs'
Error decoding 'data' from 'linkinfo-attrs'
Error decoding 'linkinfo' from 'link-attrs'
Traceback (most recent call last):
File "/home/kicinski/linux/./tools/net/ynl/cli.py", line 119, in <module>
main()
File "/home/kicinski/linux/./tools/net/ynl/cli.py", line 100, in main
reply = ynl.dump(args.dump, attrs)
File "/home/kicinski/linux/tools/net/ynl/lib/ynl.py", line 1064, in dump
return self._op(method, vals, dump=True)
File "/home/kicinski/linux/tools/net/ynl/lib/ynl.py", line 1058, in _op
return self._ops(ops)[0]
File "/home/kicinski/linux/tools/net/ynl/lib/ynl.py", line 1045, in _ops
rsp_msg = self._decode(decoded.raw_attrs, op.attr_set.name)
File "/home/kicinski/linux/tools/net/ynl/lib/ynl.py", line 738, in _decode
subdict = self._decode(NlAttrs(attr.raw), attr_spec['nested-attributes'], search_attrs)
File "/home/kicinski/linux/tools/net/ynl/lib/ynl.py", line 763, in _decode
decoded = self._decode_sub_msg(attr, attr_spec, search_attrs)
File "/home/kicinski/linux/tools/net/ynl/lib/ynl.py", line 714, in _decode_sub_msg
subdict = self._decode(NlAttrs(attr.raw, offset), msg_format.attr_set)
File "/home/kicinski/linux/tools/net/ynl/lib/ynl.py", line 749, in _decode
decoded = attr.as_scalar(attr_spec['type'], attr_spec.byte_order)
File "/home/kicinski/linux/tools/net/ynl/lib/ynl.py", line 147, in as_scalar
return format.unpack(self.raw)[0]
struct.error: unpack requires a buffer of 2 bytes
The Traceback is what we would previously see, the "Error..."
messages are new. We print a message per level (in the stack
order). Printing single combined message gets tricky quickly
given sub-messages etc.
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250107022820.2087101-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We stated in documentation [1] and previous discussions [2]
that the need for overriding fields in members of subsets
is anticipated. Implement it.
Since each attr is now a new object we need to make sure
that the modifications are propagated. Specifically C codegen
wants to annotate which attrs are used in requests and replies
to generate the right validation artifacts.
[1] https://docs.kernel.org/next/userspace-api/netlink/specs.html#subset-of
[2] https://lore.kernel.org/netdev/20231004171350.1f59cd1d@kernel.org/
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250107022820.2087101-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
While merging net to net-next I noticed that the kdoc above
__vlan_get_protocol_offset() has the wrong function name.
Fix that and all the other kdoc warnings in this file.
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250106174620.1855269-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Russell King says:
====================
net: dsa: cleanup EEE (part 2)
This is part 2 of the DSA EEE cleanups, removing what has become dead
code as a result of the EEE management phylib now does.
Patch 1 removes the useless setting of tx_lpi parameters in the
ksz driver.
Patch 2 does the same for mt753x.
Patch 3 removes the DSA core code that calls the get_mac_eee() operation.
This needs to be done before removing the implementations because doing
otherwise would cause dsa_user_get_eee() to return -EOPNOTSUPP.
Patches 4..8 remove the trivial get_mac_eee() implementations from DSA
drivers.
Patch 9 finally removes the get_mac_eee() method from struct
dsa_switch_ops.
====================
Link: https://patch.msgid.link/Z3vDwwsHSxH5D6Pm@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|