|
No driver should spam the kernel log when merely being loaded.
The message in hclge_init() is clearly a debug message.
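For illustration, the change amounts to lowering the print level; a rough sketch (the exact message text and call site in hclge_init() may differ):

  /* before: unconditionally printed at probe time */
  dev_info(&hdev->pdev->dev, "hclge driver initialization finished.\n");

  /* after: only emitted when dynamic debug enables it */
  dev_dbg(&hdev->pdev->dev, "hclge driver initialization finished.\n");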
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Jijie Shao <shaojijie@huawei.com>
Link: https://patch.msgid.link/c2ac6f20f85056e7b35bd56d424040f996d32109.1749657070.git.geert+renesas@glider.be
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Hash-based multicast filtering is an optional feature. Currently, the
driver overrides the value of multicast_filter_bins based on the hash
table size. If the feature is not supported, the hash table size reads 0,
yet multicast_filter_bins remains set to the default HASH_TABLE_SIZE,
which is incorrect. Let's extend the use of the property
snps,multicast-filter-bins to xgmac so it can be set to 0 via devicetree
to indicate that multicast filtering is not supported.
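A rough sketch of how platform glue can honor an explicit zero from the devicetree while keeping the old default; this is illustrative, not the exact patch, and assumes the usual plat_stmmacenet_data field name:

  u32 bins = HASH_TABLE_SIZE;     /* driver default when the property is absent */

  /* DT may override the default, including with 0 to signal that
   * hash-based multicast filtering is not supported */
  of_property_read_u32(np, "snps,multicast-filter-bins", &bins);
  plat->multicast_filter_bins = bins;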
Signed-off-by: Nikunj Kela <nikunj.kela@sima.ai>
Reviewed-by: Yanteng Si <si.yanteng@linux.dev>
Link: https://patch.msgid.link/20250610200411.3751943-1-nikunj.kela@sima.ai
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Some stress/negative firmware testing around devcmd(s) returning
EAGAIN found that the done bit could get out of sync in the
firmware when it wasn't cleared in a retry case.
While here, change the type of the local done variable to a bool
to match the return type from ionic_dev_cmd_done().
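Roughly, the retry path needs to clear the stale done indication before re-issuing the command; apart from ionic_dev_cmd_done(), the helpers below are hypothetical stand-ins for the driver's internals:

  bool done = false;
  int err;

  do {
          devcmd_clear_done(idev);        /* hypothetical: reset stale done bit before retrying */
          devcmd_post(idev, &cmd);        /* hypothetical: (re)issue the command to firmware */
          err = devcmd_wait(idev);        /* hypothetical: poll for completion or -EAGAIN */
          done = ionic_dev_cmd_done(idev);
  } while (err == -EAGAIN && !done);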
Fixes: ec8ee714736e ("ionic: stretch heartbeat detection")
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250609212827.53842-1-shannon.nelson@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add support for the new rxfh_fields callbacks, instead of de-muxing
the rxnfc calls. The code is moved as we try to declare the functions
in the order in which they appear in the ops struct.
Link: https://patch.msgid.link/20250611145949.2674086-7-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We're migrating RXFH config to new callbacks.
Remove RXFH handling from drivers where it does nothing.
Reviewed-by: Ziwei Xiao <ziweixiao@google.com>
Link: https://patch.msgid.link/20250611145949.2674086-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
RX Flow Hashing supports using different configuration for different
RSS contexts. Only two drivers seem to support it. Make sure we
uniformly error out for drivers which don't.
Reviewed-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20250611145949.2674086-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Cross-merge networking fixes after downstream PR (net-6.16-rc2).
No conflicts or adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bluetooth and wireless.
Current release - regressions:
- af_unix: allow passing cred for embryo without SO_PASSCRED/SO_PASSPIDFD
Current release - new code bugs:
- eth: airoha: correct enable mask for RX queues 16-31
- veth: prevent NULL pointer dereference in veth_xdp_rcv when peer
disappears under traffic
- ipv6: move fib6_config_validate() to ip6_route_add(), prevent
invalid routes
Previous releases - regressions:
- phy: phy_caps: don't skip better duplex match on non-exact match
- dsa: b53: fix untagged traffic sent via cpu tagged with VID 0
- Revert "wifi: mwifiex: Fix HT40 bandwidth issue.", it caused
transient packet loss, exact reason not fully understood, yet
Previous releases - always broken:
- net: clear the dst when BPF is changing skb protocol (IPv4 <> IPv6)
- sched: sfq: fix a potential crash on gso_skb handling
- Bluetooth: intel: improve rx buffer posting to avoid causing issues
in the firmware
- eth: intel: i40e: make reset handling robust against multiple
requests
- eth: mlx5: ensure FW pages are always allocated on the local NUMA
node, even when the device is configured to 'serve' another node
- wifi: ath12k: fix GCC_GCC_PCIE_HOT_RST definition for WCN7850,
prevent kernel crashes
- wifi: ath11k: avoid burning CPU in ath11k_debugfs_fw_stats_request()
for 3 sec if fw_stats_done is not set"
* tag 'net-6.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (70 commits)
selftests: drv-net: rss_ctx: Add test for ntuple rules targeting default RSS context
net: ethtool: Don't check if RSS context exists in case of context 0
af_unix: Allow passing cred for embryo without SO_PASSCRED/SO_PASSPIDFD.
ipv6: Move fib6_config_validate() to ip6_route_add().
net: drv: netdevsim: don't napi_complete() from netpoll
net/mlx5: HWS, Add error checking to hws_bwc_rule_complex_hash_node_get()
veth: prevent NULL pointer dereference in veth_xdp_rcv
net_sched: remove qdisc_tree_flush_backlog()
net_sched: ets: fix a race in ets_qdisc_change()
net_sched: tbf: fix a race in tbf_change()
net_sched: red: fix a race in __red_change()
net_sched: prio: fix a race in prio_tune()
net_sched: sch_sfq: reject invalid perturb period
net: phy: phy_caps: Don't skip better duplex macth on non-exact match
MAINTAINERS: Update Kuniyuki Iwashima's email address.
selftests: net: add test case for NAT46 looping back dst
net: clear the dst when changing skb protocol
net/mlx5e: Fix number of lanes to UNKNOWN when using data_rate_oper
net/mlx5e: Fix leak of Geneve TLV option object
net/mlx5: HWS, make sure the uplink is the last destination
...
|
|
Check whether ida_alloc() or rhashtable_lookup_get_insert_fast() fails.
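For reference, the usual shape of the checks being added (the node and table names are illustrative; requires <linux/idr.h>, <linux/rhashtable.h> and <linux/err.h>):

  int id;
  struct my_node *old;

  id = ida_alloc(&ida, GFP_KERNEL);
  if (id < 0)
          return id;                      /* negative errno on allocation failure */

  old = rhashtable_lookup_get_insert_fast(&ht, &node->hash_node, params);
  if (IS_ERR(old)) {
          ida_free(&ida, id);             /* roll back the id on insertion failure */
          return PTR_ERR(old);
  }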
Fixes: 17e0accac577 ("net/mlx5: HWS, support complex matchers")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Link: https://patch.msgid.link/aEmBONjyiF6z5yCV@stanley.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Apply conservative defaults.
Signed-off-by: Zak Kemble <zakkemble@gmail.com>
Tested-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20250610220403.935-3-zakkemble@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Make use of the return value from napi_complete_done(). This allows users to
use the gro_flush_timeout and napi_defer_hard_irqs sysfs attributes for
configuring software interrupt coalescing.
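The usual pattern this enables looks roughly like the following; the rx-clean and irq helpers are placeholders for the driver's own:

  static int my_poll(struct napi_struct *napi, int budget)
  {
          int work_done = my_clean_rx(napi, budget);      /* placeholder rx processing */

          /* napi_complete_done() returns false when the core keeps NAPI
           * scheduled (gro_flush_timeout / napi_defer_hard_irqs), so only
           * re-arm the hardware interrupt when it returns true */
          if (work_done < budget && napi_complete_done(napi, work_done))
                  my_enable_irq(napi);                    /* placeholder irq re-arm */

          return work_done;
  }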
Signed-off-by: Zak Kemble <zakkemble@gmail.com>
Tested-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20250610220403.935-2-zakkemble@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Implement the shutdown hook to ensure clean and complete deactivation of
the MACB controller. The shutdown sequence is protected with 'rtnl_lock()'
to serialize access and prevent race conditions while detaching and
closing the network device. This ensures a safe transition when the kexec
utility calls the shutdown hook, facilitating seamless loading and
booting of a new kernel from the currently running one.
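A minimal sketch of the general shape of such a shutdown hook (not the exact macb implementation):

  static void my_shutdown(struct platform_device *pdev)
  {
          struct net_device *ndev = platform_get_drvdata(pdev);

          rtnl_lock();                    /* serialize with other netdev operations */
          netif_device_detach(ndev);      /* mark the device as not present to the stack */
          if (netif_running(ndev))
                  dev_close(ndev);        /* quiesce queues and DMA before kexec */
          rtnl_unlock();
  }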
Signed-off-by: Abin Joseph <abin.joseph@amd.com>
Link: https://patch.msgid.link/20250610114111.1708614-1-abin.joseph@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Expand coverage of MAC stats via ethtool by adding rmon and eth-ctrl
stats.
ethtool -S eth0 --groups eth-ctrl
Standard stats for eth0:
eth-ctrl-MACControlFramesTransmitted: 0
eth-ctrl-MACControlFramesReceived: 0
ethtool -S eth0 --groups rmon
Standard stats for eth0:
rmon-etherStatsUndersizePkts: 0
rmon-etherStatsOversizePkts: 0
rmon-etherStatsFragments: 0
rmon-etherStatsJabbers: 0
rx-rmon-etherStatsPkts64Octets: 32807689
rx-rmon-etherStatsPkts65to127Octets: 567512968
rx-rmon-etherStatsPkts128to255Octets: 64730266
rx-rmon-etherStatsPkts256to511Octets: 20136039
rx-rmon-etherStatsPkts512to1023Octets: 28476870
rx-rmon-etherStatsPkts1024to1518Octets: 6958335
rx-rmon-etherStatsPkts1519to2047Octets: 164
rx-rmon-etherStatsPkts2048to4095Octets: 3844
rx-rmon-etherStatsPkts4096to8191Octets: 21814
rx-rmon-etherStatsPkts8192to9216Octets: 6540818
rx-rmon-etherStatsPkts9217to9742Octets: 4180897
tx-rmon-etherStatsPkts64Octets: 8786
tx-rmon-etherStatsPkts65to127Octets: 31475804
tx-rmon-etherStatsPkts128to255Octets: 3581331
tx-rmon-etherStatsPkts256to511Octets: 2483038
tx-rmon-etherStatsPkts512to1023Octets: 4500916
tx-rmon-etherStatsPkts1024to1518Octets: 38741270
tx-rmon-etherStatsPkts1519to2047Octets: 15521
tx-rmon-etherStatsPkts2048to4095Octets: 4109
tx-rmon-etherStatsPkts4096to8191Octets: 20817
tx-rmon-etherStatsPkts8192to9216Octets: 6904055
tx-rmon-etherStatsPkts9217to9742Octets: 6757746
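These groups map onto the standard ethtool_ops stats callbacks; a rough sketch, where the register accessor and histogram ranges are placeholders, not this driver's actual code:

  static const struct ethtool_rmon_hist_range my_rmon_ranges[] = {
          {  64,  64 },
          {  65, 127 },
          { 128, 255 },
          { 0 },                          /* sentinel */
  };

  static void my_get_rmon_stats(struct net_device *dev,
                                struct ethtool_rmon_stats *s,
                                const struct ethtool_rmon_hist_range **ranges)
  {
          *ranges = my_rmon_ranges;
          s->undersize_pkts = my_rd64(dev, RMON_UNDERSIZE);       /* placeholder reads */
          s->hist[0]    = my_rd64(dev, RMON_RX_64);
          s->hist_tx[0] = my_rd64(dev, RMON_TX_64);
  }

  static void my_get_eth_ctrl_stats(struct net_device *dev,
                                    struct ethtool_eth_ctrl_stats *s)
  {
          s->MACControlFramesTransmitted = my_rd64(dev, MAC_CTRL_TX);
          s->MACControlFramesReceived    = my_rd64(dev, MAC_CTRL_RX);
  }

  /* wired up via .get_rmon_stats / .get_eth_ctrl_stats in struct ethtool_ops */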
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250610171109.1481229-3-mohsin.bashr@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When the link is up, either eth_proto_oper or ext_eth_proto_oper
typically reports the active link protocol, from which both speed
and number of lanes can be retrieved. However, in certain cases,
such as when a NIC is connected via a non-standard cable, the
firmware may not report the protocol.
In such scenarios, the speed can still be obtained from the
data_rate_oper field in the PTYS register. Since data_rate_oper
provides only speed information and lacks lane details, it is
incorrect to derive the number of lanes from it.
This patch corrects the behavior by setting the number of lanes to
UNKNOWN instead of incorrectly using MAX_LANES when relying on
data_rate_oper.
Fixes: 7e959797f021 ("net/mlx5e: Enable lanes configuration when auto-negotiation is off")
Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250610151514.1094735-10-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Previously, a unique tunnel id was added for matching on non-zero TC
chains, to support inner header rewrite with the goto action. Later, it
was used to support VF tunnel offload for vxlan, then for Geneve and
GRE. To support VF tunnel offload, a temporary mlx5_flow_spec is used to
parse tunnel options. For Geneve, if there is a TLV option, an object is
created, or its refcount is incremented if it already exists. But the
temporary mlx5_flow_spec is freed right after parsing, which causes the
leak: no information about the object is saved in the flow's
mlx5_flow_spec, which is what is used to free the object when the flow
is deleted.
To fix the leak, call mlx5_geneve_tlv_option_del() before freeing the
temporary spec if it holds a TLV object.
Fixes: 521933cdc4aa ("net/mlx5e: Support Geneve and GRE with VF tunnel offload")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Alex Lazar <alazar@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250610151514.1094735-9-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When there is more than one destination, we create a FW flow
table and provide it with all the destinations. FW requires the wire
destination, if present, to be the last in the list; otherwise the
operation fails with a FW syndrome.
This patch fixes the destination array action creation: if it
contains a wire destination, it is moved to the end.
Fixes: 504e536d9010 ("net/mlx5: HWS, added actions handling")
Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250610151514.1094735-7-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix missing field handling in definer - outer IP version.
Fixes: 74a778b4a63f ("net/mlx5: HWS, added definers handling")
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250610151514.1094735-6-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The newly introduced mutex is only used for reformat actions, but it was
initialized for modify header instead.
The struct that contains the mutex is zero-initialized and an all-zero
mutex is valid, so the issue only shows up with CONFIG_DEBUG_MUTEXES.
Fixes: b206d9ec19df ("net/mlx5: HWS, register reformat actions with fw")
Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250610151514.1094735-5-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When attempting to add a rule to an existing flow group, if a matching
flow group exists but is not active, the error code returned should be
EAGAIN, so that the rule can be added to the matching flow group once
it is active, rather than ENOENT, which indicates that no matching
flow group was found.
Fixes: bd71b08ec2ee ("net/mlx5: Support multiple updates of steering rules in parallel")
Signed-off-by: Gavi Teitz <gavi@nvidia.com>
Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250610151514.1094735-4-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix a shutdown flow UAF when a virtual function is created on the embedded
chip (ECVF) of a BlueField device. In such a case the vport ACL ingress
table is not properly destroyed.
ECVF functionality is independent of the ecpf_vport_exists capability, and
thus the functions mlx5_eswitch_(enable|disable)_pf_vf_vports() should not
test it when enabling/disabling ECVF vports.
kernel log:
[] refcount_t: underflow; use-after-free.
[] WARNING: CPU: 3 PID: 1 at lib/refcount.c:28
refcount_warn_saturate+0x124/0x220
----------------
[] Call trace:
[] refcount_warn_saturate+0x124/0x220
[] tree_put_node+0x164/0x1e0 [mlx5_core]
[] mlx5_destroy_flow_table+0x98/0x2c0 [mlx5_core]
[] esw_acl_ingress_table_destroy+0x28/0x40 [mlx5_core]
[] esw_acl_ingress_lgcy_cleanup+0x80/0xf4 [mlx5_core]
[] esw_legacy_vport_acl_cleanup+0x44/0x60 [mlx5_core]
[] esw_vport_cleanup+0x64/0x90 [mlx5_core]
[] mlx5_esw_vport_disable+0xc0/0x1d0 [mlx5_core]
[] mlx5_eswitch_unload_ec_vf_vports+0xcc/0x150 [mlx5_core]
[] mlx5_eswitch_disable_sriov+0x198/0x2a0 [mlx5_core]
[] mlx5_device_disable_sriov+0xb8/0x1e0 [mlx5_core]
[] mlx5_sriov_detach+0x40/0x50 [mlx5_core]
[] mlx5_unload+0x40/0xc4 [mlx5_core]
[] mlx5_unload_one_devl_locked+0x6c/0xe4 [mlx5_core]
[] mlx5_unload_one+0x3c/0x60 [mlx5_core]
[] shutdown+0x7c/0xa4 [mlx5_core]
[] pci_device_shutdown+0x3c/0xa0
[] device_shutdown+0x170/0x340
[] __do_sys_reboot+0x1f4/0x2a0
[] __arm64_sys_reboot+0x2c/0x40
[] invoke_syscall+0x78/0x100
[] el0_svc_common.constprop.0+0x54/0x184
[] do_el0_svc+0x30/0xac
[] el0_svc+0x48/0x160
[] el0t_64_sync_handler+0xa4/0x12c
[] el0t_64_sync+0x1a4/0x1a8
[] --[ end trace 9c4601d68c70030e ]---
Fixes: a7719b29a821 ("net/mlx5: Add management of EC VF vports")
Reviewed-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Amir Tzin <amirtz@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250610151514.1094735-3-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When firmware asks the driver to allocate more pages, using the
give_pages event, the driver should always allocate them from the same
NUMA node, the original device NUMA node. The current code uses
dev_to_node(), which can return a different node because it is changed
by other driver flows, such as mlx5_dma_zalloc_coherent_node(). Instead,
use the saved NUMA node for allocating firmware pages.
Fixes: 311c7c71c9bb ("net/mlx5e: Allocate DMA coherent memory on reader NUMA node")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250610151514.1094735-2-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2025-06-10 (i40e, iavf, ice, e1000)
For i40e:
Robert Malz improves reset handling for situations where multiple reset
requests could cause some to be missed.
For iavf:
Ahmed adds detection, and handling, of reset that could occur early in
the initialization process to stop long wait/hangs.
For ice:
Anton properly sets the missed use_nsecs value.
For e1000:
Joe Damato moves cancel_work_sync() call to avoid deadlock.
* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
e1000: Move cancel_work_sync to avoid deadlock
ice/ptp: fix crosstimestamp reporting
iavf: fix reset_task for early reset event
i40e: retry VFLR handling if there is ongoing VF reset
i40e: return false from i40e_reset_vf if reset is in progress
====================
Link: https://patch.msgid.link/20250610171348.1476574-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
'managed' is a non-boolean property specified in ethernet-controller.yaml.
Since commit c141ecc3cecd7 ("of: Warn when of_property_read_bool() is
used on non-boolean properties") this raises a warning. Use the
replacement of_property_present() instead.
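The change amounts to swapping the accessor, roughly:

  managed = of_property_read_bool(np, "managed");   /* before: warns since c141ecc3cecd7 */
  managed = of_property_present(np, "managed");     /* after: only tests for presence */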
Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Reviewed-by: Sean Anderson <sean.anderson@linux.dev>
Link: https://patch.msgid.link/20250610114057.414791-1-alexander.stein@ew.tq-group.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This entry is covered by the entry in the next line already.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/2d81fe20-f71d-4483-817d-d46f9ec88cce@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
According to Realtek [0] it's safe to enable EEE at 5Gbps on RTL8126.
[0] https://www.spinics.net/lists/netdev/msg1091873.html
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/18ce0996-0182-4a11-a93a-df14b0e6876c@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Devlink info allows exposing the serial number and board serial number.
Get the values from PCI VPD and expose them.
$ devlink dev info
pci/0000:08:00.0:
  driver mlx5_core
  serial_number e4397f872caeed218000846daa7d2f49
  board.serial_number MT2314XZ00YA
  versions:
      fixed:
        fw.psid MT_0000000894
      running:
        fw.version 28.41.1000
        fw 28.41.1000
      stored:
        fw.version 28.41.1000
        fw 28.41.1000
auxiliary/mlx5_core.eth.0:
  driver mlx5_core.eth
pci/0000:08:00.1:
  driver mlx5_core
  serial_number e4397f872caeed218000846daa7d2f49
  board.serial_number MT2314XZ00YA
  versions:
      fixed:
        fw.psid MT_0000000894
      running:
        fw.version 28.41.1000
        fw 28.41.1000
      stored:
        fw.version 28.41.1000
        fw 28.41.1000
auxiliary/mlx5_core.eth.1:
  driver mlx5_core.eth
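The values are reported from the driver's devlink .info_get callback; a rough sketch, assuming the VPD strings have already been parsed into the private struct:

  static int my_devlink_info_get(struct devlink *devlink,
                                 struct devlink_info_req *req,
                                 struct netlink_ext_ack *extack)
  {
          struct my_priv *priv = devlink_priv(devlink);
          int err;

          /* strings previously read out of the PCI VPD read-only section */
          err = devlink_info_serial_number_put(req, priv->vpd_sn);
          if (err)
                  return err;

          return devlink_info_board_serial_number_put(req, priv->vpd_board_sn);
  }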
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Acked-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250610025128.109232-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
igc already supports enabling MAC Merge for FPE. This patch adds
support for preemptible queues in mqprio.
Tested preemption with mqprio by:
1. Enable FPE:
ethtool --set-mm enp1s0 pmac-enabled on tx-enabled on verify-enabled on
2. Enable preemptible queue in mqprio:
mqprio num_tc 4 map 0 1 2 3 0 0 0 0 0 0 0 0 0 0 0 0 \
queues 1@0 1@1 1@2 1@3 \
fp P P P E
Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Changes:
1. Introduce tx_enabled flag to control preemptible queue. tx_enabled
is set via mmsv module based on multiple factors, including link
up/down status, to determine if FPE is active or inactive.
2. Add priority field to TXDCTL for express queue to improve data
fetch performance.
3. Block preemptible queue setup in taprio unless the reverse-tsn-txq-prio
private flag is set. This encourages adoption of the standard queue
priority scheme for new features.
4. Hardware-padded frames from preemptible queues result in incorrect
mCRC values, as padding bytes are excluded from the computation. Pad
frames to at least 60 bytes using skb_padto() before transmission to
ensure the hardware includes the padding in the mCRC calculation (see
the sketch below).
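A sketch of the padding fix in the transmit path; ETH_ZLEN is the 60-byte minimum Ethernet frame size, and the surrounding xmit code is elided:

  /* in ndo_start_xmit, before handing the frame to the hardware */
  if (skb_padto(skb, ETH_ZLEN))
          return NETDEV_TX_OK;    /* skb_padto() frees the skb on failure */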
Tested preemption with taprio by:
1. Enable FPE:
ethtool --set-mm enp1s0 pmac-enabled on tx-enabled on verify-enabled on
2. Enable private flag to reverse TX queue priority:
ethtool --set-priv-flags enp1s0 reverse-txq-prio on
3. Enable preemptible queue in taprio:
taprio num_tc 4 map 0 1 2 3 0 0 0 0 0 0 0 0 0 0 0 0 \
queues 1@0 1@1 1@2 1@3 \
fp P P P E
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Co-developed-by: Chwee-Lin Choong <chwee.lin.choong@intel.com>
Signed-off-by: Chwee-Lin Choong <chwee.lin.choong@intel.com>
Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
By default, igc assigns TX hw queue 0 the highest priority and queue 3
the lowest. This is the opposite of most NICs, where TX hw queue 3 has
the highest priority and queue 0 the lowest.
mqprio in igc already uses TX arbitration unconditionally to reverse TX
queue priority when mqprio is enabled. The TX arbitration logic does not
require a private flag, because mqprio was added recently and no known
users depend on the default queue ordering, which differs from the typical
convention.
taprio does not use TX arbitration, so it inherits the default igc TX
queue priority order. This causes tc command inconsistencies when
configuring frame preemption with taprio compared to mqprio in igc.
Other tc command inconsistencies and configuration issues already exist
when using taprio on igc compared to other network controllers. These
issues are described in a later section.
To harmonize TX queue priority behavior between taprio and mqprio, and
to fix these issues without breaking long-standing taprio use cases,
this patch adds a new private flag, called reverse-tsn-txq-prio, to
reverse the TX queue priority. It makes queue 3 the highest and queue 0
the lowest, reusing the TX arbitration logic already used by mqprio.
Users must set the private flag when enabling frame preemption with
taprio to follow the standard convention. Doing so promotes adoption of
the correct priority model for new features while preserving
compatibility with legacy configurations.
This new private flag addresses:
1. Non-standard socket -> tc -> TX hw queue mapping for taprio in igc
Without the private flag:
- taprio maps (socket -> tc -> TX hardware queue) differently on igc
compared to other network controllers
- On igc, mqprio maps tc differently from taprio, since mqprio already
uses TX arbitration
The following examples compare taprio configuration on igc and other
network controllers:
a) On other NICs (TX hw queue 3 is highest priority):
taprio num_tc 4 map 0 1 2 3 .... \
queues 1@0 1@1 1@2 1@3
Mapping translates to:
socket 0 -> tc 0 -> queue 0
socket 3 -> tc 3 -> queue 3
This is the normal mapping that respects the standard convention:
higher socket number -> higher tc -> higher priority TX hw queue
b) On igc (TX hw queue 0 is highest priority by default):
taprio num_tc 4 map 3 2 1 0 .... \
queues 1@0 1@1 1@2 1@3
Mapping translates to:
socket 0 -> tc 3 -> queue 3
socket 3 -> tc 0 -> queue 0
This igc tc mapping example is based on Intel's TSN validation test
case, where a higher socket priority maps to a higher priority queue.
It respects the mapping:
higher socket number -> higher priority TX hw queue
but breaks the expected ordering:
higher tc -> higher priority TX hw queue
as defined in [Ref1]. This custom mapping complicates common taprio
setup across NICs.
2. Non-standard frame preemption mapping for taprio in igc
Without the private flag:
- Compared to other network controllers, taprio on igc must flip the
expected fp sequence, since express traffic is expected to map to the
highest priority queue and preemptible traffic to lower ones
- On igc, frame preemption configuration for mqprio differs from taprio,
since mqprio already uses TX arbitration
The following examples compare taprio frame preemption configuration on
igc and other network controllers:
a) On other NICs (TX hw queue 3 is highest priority):
taprio num_tc 4 map ..... \
queues 1@0 1@1 1@2 1@3 \
fp P P P E
Mapping translates to:
tc0, tc1, tc2 -> preemptible -> queue 0, 1, 2
tc3 -> express -> queue 3
This is the normal mapping that respects the standard convention:
higher tc -> express traffic -> higher priority TX hw queue
lower tc -> preemptible traffic -> lower priority TX hw queue
b) On igc (TX hw queue 0 is highest priority by default):
taprio num_tc 4 map ...... \
queues 1@0 1@1 1@2 1@3 \
fp E P P P
Mapping translates to:
tc0 -> express -> queue 0
tc1, tc2, tc3 -> preemptible -> queue 1, 2, 3
This inversion respects the mapping of:
express traffic -> higher priority TX hw queue
but breaks the expected ordering:
higher tc -> express traffic
as defined in [Ref1] where higher tc indicates higher priority. In
this case, the lower tc0 is assigned to express traffic. This custom
mapping further complicates common preemption setup across NICs.
Tests were performed on taprio with the following combinations, where
two apps send traffic simultaneously on different queues:
Private Flag   Traffic Sent By            Traffic Sent By
----------------------------------------------------------------
enabled        iperf3 (queue 3)           iperf3 (queue 0)
disabled       iperf3 (queue 0)           iperf3 (queue 3)
enabled        iperf3 (queue 3)           real-time app (queue 0)
disabled       iperf3 (queue 0)           real-time app (queue 3)
enabled        real-time app (queue 3)    iperf3 (queue 0)
disabled       real-time app (queue 0)    iperf3 (queue 3)
enabled        real-time app (queue 3)    real-time app (queue 0)
disabled       real-time app (queue 0)    real-time app (queue 3)
Private flag is controlled with:
ethtool --set-priv-flags enp1s0 reverse-tsn-txq-prio <on|off>
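Roughly, a driver exposes such a flag through the private-flags ethtool hooks; the flag string is the one from this patch, while the adapter fields and helpers are placeholders:

  static const char my_priv_flags[][ETH_GSTRING_LEN] = {
          "reverse-tsn-txq-prio",
  };

  static u32 my_get_priv_flags(struct net_device *netdev)
  {
          struct my_adapter *adapter = netdev_priv(netdev);

          return adapter->reverse_txq_prio ? BIT(0) : 0;
  }

  static int my_set_priv_flags(struct net_device *netdev, u32 flags)
  {
          struct my_adapter *adapter = netdev_priv(netdev);

          adapter->reverse_txq_prio = !!(flags & BIT(0));
          return 0;       /* a real driver would reprogram TX arbitration here */
  }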
[Ref1]
IEEE 802.1Q clause 8.6.8 Transmission selection:
"For a given Port and traffic class, frames are selected from the
corresponding queue for transmission if and only if:
...
b) For each queue corresponding to a numerically higher value of traffic
class supported by the Port, the operation of the transmission selection
algorithm supported by that queue determines that there is no frame
available for transmission."
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Previously, TX arbitration prioritized queues based on the TC they were
mapped to. A queue mapped to TC 3 had higher priority than one mapped to
TC 0.
To improve code reuse for upcoming patches and align with typical NIC
behavior, this patch updates the logic to prioritize higher queue numbers
when mqprio is used. As a result, queue 0 becomes the lowest priority and
queue 3 becomes the highest.
This patch also introduces igc_tsn_is_tc_to_queue_priority_ordered() to
preserve the original TC-based priority rule and reject configurations
where a higher TC maps to a lower queue offset.
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Refactor TXDCTL macro handling to use FIELD_PREP and GENMASK macros.
This prepares the code for adding a new TXDCTL priority field in an
upcoming patch.
Verified that the macro values remain unchanged before and after
refactoring.
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Rename macros to use the DCTL prefix for consistency with existing
macros that reference the same register. This prepares for an upcoming
patch that adds new fields to TXDCTL.
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Move and consolidate TXDCTL and RXDCTL macros in preparation for
upcoming TXDCTL changes. This improves organization and readability.
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Fix RX_DONE_INT_MASK definition in order to enable RX queues 16-31.
Fixes: f252493e18353 ("net: airoha: Enable multiple IRQ lines support in airoha_eth driver.")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20250609-aioha-fix-rx-queue-mask-v1-1-f33706a06fa2@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Introduce flowtable hw acceleration for PPPoE traffic.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20250609-b4-airoha-flowtable-pppoe-v1-1-1520fa7711b4@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Enable memory-mapped I/O access to RMON statistics registers for devices
known to work correctly. Currently, only the D-Link DGE-550T (`0x4000`)
with PCI revision A3 (`0x0c`) is allowed.
To avoid issues on other hardware, a runtime check was added to restrict
MMIO usage. The `MEM_MAPPING` macro was removed in favor of runtime
detection.
To access RMON registers, the code `dw32(RmonStatMask, 0x0007ffff);`
must also be skipped, so this patch conditionally disables it as well.
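A rough shape of the runtime gate, using the device and revision values given above; the flag field name is illustrative, not the driver's actual one:

  /* D-Link DGE-550T (device 0x4000) rev A3 (0x0c) is known to handle
   * memory-mapped access to the RMON counters correctly */
  np->rmon_mmio = pdev->device == 0x4000 && pdev->revision == 0x0c;

  if (!np->rmon_mmio)
          dw32(RmonStatMask, 0x0007ffff);         /* keep RMON stats masked otherwise */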
Tested-on: D-Link DGE-550T Rev-A3
Signed-off-by: Moon Yeounsu <yyyynoom@gmail.com>
Link: https://patch.msgid.link/20250610000130.49065-2-yyyynoom@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Because otx2_atomic64_add() takes a u64 pointer as its argument, all
callers have to typecast __iomem void pointers, which in turn causes
sparse warnings. Fix those by changing the otx2_atomic64_add() argument
to a void pointer.
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Link: https://patch.msgid.link/1749484421-3607-1-git-send-email-sbhatta@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch removes unnecessary typecasts by marking the
mbox_regions array as __iomem, since it is used to store
pointers to memory-mapped I/O (MMIO) regions. Also simplify
the call to readq() in the PF driver by removing redundant typecasts.
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Link: https://patch.msgid.link/1749484309-3434-1-git-send-email-sbhatta@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
A local tx_q variable worked around a name collision with the internal
txq variable in the netif_subqueue macros.
This workaround is no longer needed.
Signed-off-by: Gur Stavi <gur.stavi@huawei.com>
Link: https://patch.msgid.link/6376db2a39b8d3bf2fa893f58f56246bed128d5d.1749038081.git.gur.stavi@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Improve code consistency by using only the netif_subqueue variant APIs.
Signed-off-by: Gur Stavi <gur.stavi@huawei.com>
Link: https://patch.msgid.link/5fd897b75729cf078385aacd9ed40091314ea63d.1749038081.git.gur.stavi@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2025-06-09 (ice, i40e, ixgbe, iavf)
Jake moves from individual virtchnl RSS configuration values, for ice,
i40e, and iavf, to a common libie location and values.
Martyna and Dawid add counters for link_down_events to ice, i40e, and
ixgbe drivers. The counter increments only on actual physical link-down
events visible to the PHY. It does not increment when the user performs
a software-only interface down/up (e.g. ip link set dev down).
The counter does increment in cases where the interface is reinitialized
in a way that causes a real link drop, such as when attaching
an XDP program, reconfiguring channels, or toggling certain priv-flags.
For ice:
Arkadiusz and Karol separate PTP and DPLL functionality to their
respective APIs.
Michal adds a separate handler for Flow Director command processing.
For iavf:
Ahmed converts driver to utilize core's IRQ affinity API.
For ixgbe:
Alok Tiwari fixes issues with some comments; typos, copy/paste errors,
etc.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
ixgbe: Fix typos and clarify comments in X550 driver code
iavf: convert to NAPI IRQ affinity API
ice: add a separate Rx handler for flow director commands
ice: add ice driver PTP pin documentation
ice: change SMA pins to SDP in PTP API
ice: redesign dpll sma/u.fl pins control
ixgbe: add link_down_events statistic
i40e: add link_down_events statistic
ice: add link_down_events statistic
net: intel: move RSS packet classifier types to libie
net: intel: rename 'hena' to 'hashcfg' for clarity
====================
Link: https://patch.msgid.link/20250609212652.1138933-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The last use of t3_l2t_send_event() was removed in 2019 by
commit 30e0f6cf5acb ("RDMA/iw_cxgb3: Remove the iw_cxgb3 module from
kernel")
Remove it.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Link: https://patch.msgid.link/20250609152330.24027-1-linux@treblig.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add support for reporting additional hardware counters for drop and
TC using the ethtool -S interface.
These counters include:
- Aggregate Rx/Tx drop counters
- Per-TC Rx/Tx packet counters
- Per-TC Rx/Tx byte counters
- Per-TC Rx/Tx pause frame counters
The counters are exposed using ethtool_ops->get_ethtool_stats and
ethtool_ops->get_strings. These counters are not available on all
hardware versions.
Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com>
Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://patch.msgid.link/20250609100103.GA7102@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Previously, e1000_down called cancel_work_sync for the e1000 reset task
(via e1000_down_and_stop), which takes RTNL.
As reported by users and syzbot, a deadlock is possible in the following
scenario:
CPU 0:
- RTNL is held
- e1000_close
- e1000_down
- cancel_work_sync (cancel / wait for e1000_reset_task())
CPU 1:
- process_one_work
- e1000_reset_task
- take RTNL
To remedy this, avoid calling cancel_work_sync from e1000_down
(e1000_reset_task does nothing if the device is down anyway). Instead,
call cancel_work_sync for e1000_reset_task when the device is being
removed.
Fixes: e400c7444d84 ("e1000: Hold RTNL when e1000_down can be called")
Reported-by: syzbot+846bb38dc67fe62cc733@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/683837bf.a00a0220.52848.0003.GAE@google.com/
Reported-by: John <john.cs.hey@gmail.com>
Closes: https://lore.kernel.org/netdev/CAP=Rh=OEsn4y_2LvkO3UtDWurKcGPnZ_NPSXK=FbgygNXL37Sw@mail.gmail.com/
Signed-off-by: Joe Damato <jdamato@fastly.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Acked-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Set use_nsecs=true as the timestamp is reported in ns. Lack of this
results in a smaller timestamp error window, which causes an error
during phc2sys execution on E825 NICs:
phc2sys[1768.256]: ioctl PTP_SYS_OFFSET_PRECISE: Invalid argument
This problem was introduced in the cited commit which omitted setting
use_nsecs to true when converting the ice driver to use
convert_base_to_cs().
Testing hints (ethX is PF netdev):
phc2sys -s ethX -c CLOCK_REALTIME -O 37 -m
phc2sys[1769.256]: CLOCK_REALTIME phc offset -5 s0 freq -0 delay 0
Fixes: d4bea547ebb57 ("ice/ptp: Remove convert_art_to_tsc()")
Signed-off-by: Anton Nadezhdin <anton.nadezhdin@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
If a reset event is received from the PF early in the init cycle, the
state machine hangs for about 25 seconds.
Reproducer:
echo 1 > /sys/class/net/$PF0/device/sriov_numvfs
ip link set dev $PF0 vf 0 mac $NEW_MAC
The log shows:
[792.620416] ice 0000:5e:00.0: Enabling 1 VFs
[792.738812] iavf 0000:5e:01.0: enabling device (0000 -> 0002)
[792.744182] ice 0000:5e:00.0: Enabling 1 VFs with 17 vectors and 16 queues per VF
[792.839964] ice 0000:5e:00.0: Setting MAC 52:54:00:00:00:11 on VF 0. VF driver will be reinitialized
[813.389684] iavf 0000:5e:01.0: Failed to communicate with PF; waiting before retry
[818.635918] iavf 0000:5e:01.0: Hardware came out of reset. Attempting reinit.
[818.766273] iavf 0000:5e:01.0: Multiqueue Enabled: Queue pair count = 16
Fix it by scheduling the reset task and making the reset task capable of
resetting early in the init cycle.
Fixes: ef8693eb90ae3 ("i40evf: refactor reset handling")
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Tested-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
When a VFLR interrupt is received during a VF reset initiated from a
different source, the VFLR may not be fully handled. This can
leave the VF in an undefined state.
To address this, set the I40E_VFLR_EVENT_PENDING bit again during VFLR
handling if the reset is not yet complete. This ensures the driver
will properly complete the VF reset in such scenarios.
Fixes: 52424f974bc5 ("i40e: Fix VF hang when reset is triggered on another VF")
Signed-off-by: Robert Malz <robert.malz@canonical.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The function i40e_vc_reset_vf attempts, up to 20 times, to handle a
VF reset request, using the return value of i40e_reset_vf as an indicator
of whether the reset was successfully triggered. Currently, i40e_reset_vf
always returns true, which causes new reset requests to be ignored if a
different VF reset is already in progress.
This patch updates the return value of i40e_reset_vf to reflect when
another VF reset is in progress, allowing the caller to properly use
the retry mechanism.
Fixes: 52424f974bc5 ("i40e: Fix VF hang when reset is triggered on another VF")
Signed-off-by: Robert Malz <robert.malz@canonical.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Corrected spelling errors such as "simular" -> "similar",
"excepted" -> "accepted", and "Determime" -> "Determine".
Fixed incorrect word usage ("to MAC" -> "two MAC")
and improved awkward phrasing.
Aligned function header descriptions with their actual functionality
(e.g., "Writes a value" -> "Reads a value").
Corrected a typo in an error code from -ENIVAL to -EINVAL.
Improved overall clarity and consistency of comments across various
functions.
These changes improve maintainability and readability of the code
without affecting functionality.
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Commit bd7c00605ee0 ("net: move aRFS rmap management and CPU affinity
to core") allows the drivers to delegate the IRQ affinity to the NAPI
instance. However, the driver needs to use a persistent NAPI config
and explicitly set/unset the NAPI<->IRQ association.
Convert to the new IRQ affinity API.
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|