|
The VIC hardware has a constraint that the MSIX interrupt used for errors
be specified as a 7 bit number. Before this patch, it was allocated after
the I/O interrupts, which would cause a problem if 128 or more I/O
interrupts are in use.
So make the required interrupts come before the I/O interrupts to
guarantee the error interrupt offset never exceeds 7 bits.
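As a rough illustration of the constraint (not the driver's code), the sketch below compares the two layouts. Only the 7-bit limit comes from the description above; the function names and the choice of index 0 for the error interrupt in the new layout are assumptions made for this example.

```python
ERR_INTR_BITS = 7                        # hardware field width, from the text above
ERR_INTR_MAX = (1 << ERR_INTR_BITS) - 1  # largest encodable index: 127


def err_index_after_io(num_io_intrs: int) -> int:
    """Old layout (assumed): I/O interrupts first, error interrupt right after."""
    return num_io_intrs


def err_index_before_io(num_io_intrs: int) -> int:
    """New layout (assumed): error interrupt at a fixed index ahead of all
    I/O interrupts, so the I/O count no longer affects it."""
    return 0


def fits(index: int) -> bool:
    """True if the index can be written into the 7-bit field."""
    return 0 <= index <= ERR_INTR_MAX
```

With the old layout the index stops fitting once 128 I/O interrupts are in use; with the required interrupts placed first, it fits for any I/O count.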
Co-developed-by: John Daley <johndale@cisco.com>
Signed-off-by: John Daley <johndale@cisco.com>
Co-developed-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Satish Kharat <satishkh@cisco.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Nelson Escobar <neescoba@cisco.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20241113-remove_vic_resource_limits-v4-2-a34cf8570c67@cisco.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Bundling the wq/rq specific data into dedicated enic_wq/rq structures
cleans up the enic structure and simplifies future changes related to
wq/rq.
Co-developed-by: John Daley <johndale@cisco.com>
Signed-off-by: John Daley <johndale@cisco.com>
Co-developed-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Nelson Escobar <neescoba@cisco.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20241113-remove_vic_resource_limits-v4-1-a34cf8570c67@cisco.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When configuring flow steering rules, the driver is currently going
through a reset for all errors from the device. Instead, the driver
should only reset when there's a timeout error from the device.
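A minimal sketch of the intended decision, assuming errno-style negative return codes. The helper name and the use of ETIMEDOUT to stand for the device timeout are assumptions for illustration; the driver's actual error code may differ.

```python
import errno


def should_reset(adminq_err: int) -> bool:
    """Return True only when the device timed out.

    adminq_err is a negative errno-style code (0 means success). Using
    -ETIMEDOUT for the timeout case is an assumption of this sketch.
    """
    return adminq_err == -errno.ETIMEDOUT
```

Any other failure, such as an invalid rule, is simply reported to the caller without a reset.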
Fixes: 57718b60df9b ("gve: Add flow steering adminq commands")
Cc: stable@vger.kernel.org
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241113175930.2585680-1-jeroendb@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Current implementation of gettimex64() makes at least 3 PCIe reads to
get current PHC time. It takes at least 2.2us to get this value back to
userspace. At the same time there is cached value of upper bits of PHC
available for packet timestamps already. This patch reuses cached value
to speed up reading of PHC time.
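The idea of reusing cached upper bits can be sketched as follows. This is not the driver's code: the 32-bit width of the freshly read part, the function name, and the assumption that the low counter wraps at most once between cache refreshes are all choices made for this example.

```python
LOW_BITS = 32                     # assumed width of the value read from hardware
LOW_MASK = (1 << LOW_BITS) - 1


def extend_with_cache(cached_time: int, low: int) -> int:
    """Combine cached upper bits with a fresh low-order reading.

    If the fresh low bits are smaller than the cached low bits, the low
    counter is assumed to have wrapped exactly once since the cache was
    refreshed, so a carry is added to the upper part.
    """
    low &= LOW_MASK
    full = (cached_time & ~LOW_MASK) | low
    if low < (cached_time & LOW_MASK):
        full += 1 << LOW_BITS
    return full
```

The scheme only works if the cache is refreshed more often than the low part wraps, which is why it relies on a periodically updated cached value.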
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20241114114820.1411660-1-vadfed@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Ethtool ntuple filters with FLOW_RSS were originally defined as adding
the base queue ID (ring_cookie) to the value from the indirection table,
so that the same table could distribute over more than one set of queues
when used by different filters.
However, some drivers / hardware ignore the ring_cookie, and simply use
the indirection table entries as queue IDs directly. Thus, for drivers
which have not opted in by setting ethtool_ops.cap_rss_rxnfc_adds to
declare that they support the original (addition) semantics, reject in
ethtool_set_rxnfc any filter which combines FLOW_RSS and a nonzero ring.
(For a ring_cookie of zero, both behaviours are equivalent.)
Set the cap bit in sfc, as it is known to support this feature.
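The difference between the two behaviours, and the new rejection rule, can be modelled in a few lines. This is an illustrative model only: the function names are hypothetical, and mapping a hash to a table slot with a modulo is an assumption.

```python
def queue_additive(ring_cookie: int, indir_table: list[int], rss_hash: int) -> int:
    """Original semantics: the table entry is an offset from the base queue."""
    return ring_cookie + indir_table[rss_hash % len(indir_table)]


def queue_direct(ring_cookie: int, indir_table: list[int], rss_hash: int) -> int:
    """Behaviour of drivers that ignore ring_cookie entirely."""
    return indir_table[rss_hash % len(indir_table)]


def filter_allowed(has_flow_rss: bool, ring_cookie: int,
                   cap_rss_rxnfc_adds: bool) -> bool:
    """Model of the new check: FLOW_RSS with a nonzero ring needs the cap bit."""
    if has_flow_rss and ring_cookie != 0 and not cap_rss_rxnfc_adds:
        return False
    return True
```

For a ring_cookie of zero both functions return the same queue, which is why that case is always accepted.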
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
Link: https://patch.msgid.link/cc3da0844083b0e301a33092a6299e4042b65221.1731499022.git.ecree.xilinx@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
After assembling the new private flags on a PF, the operation to determine
the changed flags uses the wrong bitmaps. Instead of xor-ing orig_flags
with new_flags, it uses the still unchanged pf->flags, thus changed_flags
is always 0.
Fix it by using the correct bitmaps.
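The effect of the bug is easy to reproduce with plain integers standing in for the bitmaps. The bit position below is hypothetical; the variable names follow the description above.

```python
DISABLE_SOURCE_PRUNING = 1 << 0  # hypothetical bit position


def changed(a: int, b: int) -> int:
    """Bits that differ between two flag words."""
    return a ^ b


orig_flags = 0                                   # snapshot taken from pf->flags
pf_flags = orig_flags                            # pf->flags itself, not yet updated
new_flags = orig_flags | DISABLE_SOURCE_PRUNING  # flags requested by the user

buggy = changed(pf_flags, orig_flags)   # compares a value with its own copy
fixed = changed(new_flags, orig_flags)  # compares requested against original
```

Here `buggy` is 0, so no flag appears to have changed, while `fixed` has the disable-source-pruning bit set.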
The issue was discovered while debugging why disabling source pruning
stopped working with release 6.7. Although the new flags will be copied to
pf->flags later on in that function, disabling source pruning requires
a reset of the PF, which was skipped due to this bug.
Disabling source pruning:
$ sudo ethtool --set-priv-flags eno1 disable-source-pruning on
$ sudo ethtool --show-priv-flags eno1
Private flags for eno1:
MFP : off
total-port-shutdown : off
LinkPolling : off
flow-director-atr : on
veb-stats : off
hw-atr-eviction : off
link-down-on-close : off
legacy-rx : off
disable-source-pruning: on
disable-fw-lldp : off
rs-fec : off
base-r-fec : off
vf-vlan-pruning : off
Regarding reproducing:
I observed the issue with a rather complicated lab setup, where
* two VLAN interfaces are created on eno1
* each with a different MAC address assigned
* each moved into a separate namespace
* both VLANs are bridged externally, so they form a single layer 2 network
The external bridge is done via a channel emulator adding packet loss and
delay and the application in the namespaces tries to send/receive traffic
and measure the performance. Sender and receiver are separated by
namespaces, yet the network card "sees its own traffic" sent back to it.

To make that work, source pruning has to be disabled.
Cc: stable@vger.kernel.org
Fixes: 70756d0a4727 ("i40e: Use DECLARE_BITMAP for flags and hw_features fields in i40e_pf")
Signed-off-by: Peter Große <pegro@friiks.de>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20241113210705.1296408-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The sparx5 switchdev driver can be built either with or without support
for the Lan969x switch. However, it cannot be built-in when the lan969x
driver is a loadable module because of a link-time dependency:
arm-linux-gnueabi-ld: drivers/net/ethernet/microchip/sparx5/sparx5_main.o:(.rodata+0xd44): undefined reference to `lan969x_desc'
Add a Kconfig dependency to reflect this in Kconfig, allowing all
the valid configurations but forcing sparx5 to be a loadable module
as well if lan969x is.
Fixes: 98a01119608d ("net: sparx5: add compatible string for lan969x")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241113115513.4132548-1-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We recently added this error path. We need to call enetc_pci_remove()
before returning. It cleans up the resources from enetc_pci_probe().
Fixes: 99100d0d9922 ("net: enetc: add preliminary support for i.MX95 ENETC PF")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/93888efa-c838-4682-a7e5-e6bf318e844e@stanley.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Vendor driver r8125 doesn't advertise 2.5G EEE on RTL8125A, and r8126
doesn't advertise 5G EEE. Likely there are compatibility issues,
therefore do the same in r8169.
With this change we don't have to disable 2.5G EEE advertisement in
rtl8125a_config_eee_phy() any longer.
We use new phylib accessor phy_set_eee_broken() to mark the respective
EEE modes as broken.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/ce185e10-8a2f-4cf8-a49b-fd8fb3c3c8a1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Cross-merge networking fixes after downstream PR (net-6.12-rc8).
Conflicts:
tools/testing/selftests/net/.gitignore
252e01e68241 ("selftests: net: add netlink-dumps to .gitignore")
be43a6b23829 ("selftests: ncdevmem: Move ncdevmem under drivers/net/hw")
https://lore.kernel.org/all/20241113122359.1b95180a@canb.auug.org.au/
drivers/net/phy/phylink.c
671154f174e0 ("net: phylink: ensure PHY momentary link-fails are handled")
7530ea26c810 ("net: phylink: remove "using_mac_select_pcs"")
Adjacent changes:
drivers/net/ethernet/stmicro/stmmac/dwmac-intel-plat.c
5b366eae7193 ("stmmac: dwmac-intel-plat: fix call balance of tx_clk handling routines")
e96321fad3ad ("net: ethernet: Switch back to struct platform_driver::remove()")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add support for the 'ethtool -d <dev>' command to retrieve and print
a register dump for fbnic. The dump defaults to version 1 and consists
of two parts: all the register sections that can be dumped linearly, and
an RPC RAM section that is structured in an interleaved fashion and
requires special handling. For each register section, the dump also
contains the start and end boundary information which can simplify parsing.
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Link: https://patch.msgid.link/20241112222605.3303211-1-mohsin.bashr@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The first PPS latch time needs to be calculated by the driver
(in rounded off seconds) and configured as the start time
offset for the cycle. After synchronizing two PTP clocks
running as master/slave, missing this would cause master
and slave to start immediately with some milliseconds
drift which causes the PPS signal to never synchronize with
the PTP master.
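One way to read "rounded off seconds" is sketched below, assuming the start time is rounded up to the next whole-second boundary. The function name and the rounding direction are assumptions, not taken from the driver.

```python
NSEC_PER_SEC = 1_000_000_000


def first_pps_start_ns(now_ns: int) -> int:
    """Round the current time up to a whole-second boundary (ceiling division)."""
    return -(-now_ns // NSEC_PER_SEC) * NSEC_PER_SEC
```

Two clocks that differ by a few milliseconds within the same second then compute the same start time, so their PPS edges line up instead of keeping the initial offset.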
Fixes: 186734c15886 ("net: ti: icssg-prueth: add packet timestamping and ptp support")
Signed-off-by: Meghana Malladi <m-malladi@ti.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: MD Danish Anwar <danishanwar@ti.com>
Link: https://patch.msgid.link/20241111095842.478833-1-m-malladi@ti.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
If the clock dwmac->tx_clk was not enabled in intel_eth_plat_probe,
it should not be disabled in any path.
Conversely, if it was enabled in intel_eth_plat_probe, it must be disabled
in all error paths to ensure proper cleanup.
Found by Linux Verification Center (linuxtesting.org) with Klever.
Fixes: 9efc9b2b04c7 ("net: stmmac: Add dwmac-intel-plat for GBE driver")
Signed-off-by: Vitalii Mordan <mordan@ispras.ru>
Link: https://patch.msgid.link/20241108173334.2973603-1-mordan@ispras.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The mediatek,mac-wol property is being handled backwards to what is
described in the binding: it currently enables PHY WOL when the property
is present and vice versa. Invert the driver logic so it matches the
binding description.
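The inversion amounts to a single negation. The helper names below are hypothetical and the sketch assumes that PHY WOL is the fallback when the property is absent.

```python
def use_phy_wol_before_fix(mac_wol_present: bool) -> bool:
    """Old behaviour: PHY WOL was selected when the property was present."""
    return mac_wol_present


def use_phy_wol_after_fix(mac_wol_present: bool) -> bool:
    """Fixed behaviour: PHY WOL is selected only when the property is absent."""
    return not mac_wol_present
```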
Fixes: fd1d62d80ebc ("net: stmmac: replace the use_phy_wol field with a flag")
Signed-off-by: Nícolas F. R. A. Prado <nfraprado@collabora.com>
Link: https://patch.msgid.link/20241109-mediatek-mac-wol-noninverted-v2-1-0e264e213878@collabora.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Indicate that dwmac_socfpga has a gmac. This will make sure that
gmac-specific interrupt processing is done, including timestamp
interrupt handling. Without this, the external snapshot interrupt is
never ack'd and we have an interrupt storm on external snapshot event.
Reviewed-by: Daniel Machon <daniel.machon@microchip.com>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20241112170658.2388529-10-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The PTP_TCR (Timestamp Control Register) is used to configure several
features related to packet timestamping.
On one hand, it configures the 1588 packet processing, to indicate what
types of frames should be timestamped (all, only 1588v1 or 1588v2, using
L2 or L4 timestamping, on IPv4 or IPv6, etc.). This is usually configured
through the ioctl / ndo dedicated for such setup. This
configuration is done by setting some fields in that register, that seem
to behave the same way on all dwmac variants, including DWMAC1000.
On the other hand, and only on DWMAC1000 apparently, some fields in that
register are used to configure external snapshots (bits 24/25).
On DWMAC4 and others, these fields are reserved and external
snapshots are configured through a dedicated register that simply
doesn't seem to exist on DWMAC1000.
This configuration is done in the dwmac1000-specific ptp_clock_info ops
(cf dwmac1000_ptp_enable()).
So to avoid the timestamping configuration interfering with the external
snapshots, this commit makes sure that the config_hw_tstamping only
configures the relevant bits in PTP_TCR, so that the DWMAC1000
timestamping can correctly rely on these otherwise reserved fields.
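The approach is a masked read-modify-write. In the sketch below, bits 24/25 come from the text above, while the mask covering the timestamping fields is a hypothetical value chosen for illustration and the function operates on plain integers rather than a hardware register.

```python
SNAPSHOT_BITS = (1 << 24) | (1 << 25)  # external snapshot fields on DWMAC1000
TSTAMP_CFG_MASK = 0x00FF_FFFF          # hypothetical: bits owned by timestamping config


def config_hw_tstamping(current_tcr: int, new_cfg: int) -> int:
    """Replace only the timestamping bits and keep everything else as it was."""
    return (current_tcr & ~TSTAMP_CFG_MASK) | (new_cfg & TSTAMP_CFG_MASK)
```

A plain overwrite of the register with `new_cfg` would clear the snapshot bits; the masked update leaves them alone.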
Reviewed-by: Daniel Machon <daniel.machon@microchip.com>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20241112170658.2388529-9-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The stmmac_ptp code doesn't need the dwmac4 register definitions, remove
the inclusion.
Reviewed-by: Daniel Machon <daniel.machon@microchip.com>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20241112170658.2388529-8-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The default configuration for the interrupts on dwmac1000 has the
timestamping interrupt masked. Now that the timestamping has been
adapted to dwmac1000, enable the timestamping interrupt on these
platforms.
On dwmac1000, the external snapshot interrupt is configured through a
dedicated bit, that is set as reserved on other dwmac variants. The
timestamping interrupt is acknowledged by reading the
GMAC3_X_TIMESTAMP_STATUS register.
Make sure that this interrupt is enabled when snapshot is enabled, and
masked when disabled.
Reviewed-by: Daniel Machon <daniel.machon@microchip.com>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20241112170658.2388529-7-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In GMAC3_X, the timestamping configuration differs from GMAC4 in the
layout of the registers accessed to grab the number of snapshots in FIFO
as well as the register offset to grab the aux snapshot timestamp.
Introduce dedicated ops to configure timestamping on dwmac100 and
dwmac1000. The latency correction doesn't seem to exist on GMAC3, so its
corresponding operation isn't populated.
Reviewed-by: Daniel Machon <daniel.machon@microchip.com>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20241112170658.2388529-6-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The PTP configuration for GMAC3_X differs from the other implementations
in several ways:
- There's only one external snapshot trigger
- The snapshot configuration is done through the PTP_TCR register,
whereas the other dwmac variants have a dedicated ACR (auxiliary
control reg) for that purpose
- The layout for the PTP_TCR register also differs, as bits 24/25 are
used for the snapshot configuration. These bits are reserved on other
variants.
On GMAC3_X, we also can't discover the number of snapshot triggers
automatically.
The GMAC3_X has one PPS output, however its configuration isn't
supported yet, so report 0 n_per_out for now.
Introduce a dedicated set of ptp_clock_info ops and configuration
parameters to reflect these differences specific to GMAC3_X.
This was tested on dwmac_socfpga.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20241112170658.2388529-5-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Some DWMAC variants such as dwmac1000 don't support discovering the
number of output pps and auxiliary snapshots. Allow these parameters to
be defined in default ptp_clock_info, and let them be updated only when
the feature discovery yielded a result.
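A small model of "defaults that discovery may override" is shown below. The field names mirror ptp_clock_info, but the class and helper are hypothetical stand-ins, and treating zero as "nothing discovered" is an assumption.

```python
from dataclasses import dataclass


@dataclass
class PtpClockInfo:
    n_per_out: int = 0  # number of PPS outputs
    n_ext_ts: int = 0   # number of auxiliary snapshot inputs


def apply_discovery(info: PtpClockInfo, pps_out_num: int, aux_snapshot_n: int) -> None:
    """Overwrite a default only when feature discovery reported a nonzero value."""
    if pps_out_num:
        info.n_per_out = pps_out_num
    if aux_snapshot_n:
        info.n_ext_ts = aux_snapshot_n
```

A variant that cannot discover anything keeps whatever its default structure declared.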
Reviewed-by: Daniel Machon <daniel.machon@microchip.com>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20241112170658.2388529-4-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The auxiliary snapshot configuration was found to differ depending on
the dwmac version. To prepare for supporting this, allow specifying the
ptp_clock_info ops in the hwif array.
Reviewed-by: Daniel Machon <daniel.machon@microchip.com>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20241112170658.2388529-3-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The stmmac_ptp_clock_ops are copied into the stmmac_priv structure
before being registered to the PTP core. Some adjustments are made prior
to that, such as the number of snapshots or max adjustment parameters.
Instead of modifying the global definition, then copying into the local
private data, let's first copy then modify the local parameters.
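The ordering change can be illustrated with a shared template and per-device copies. The class, the field selection, and the helper name are hypothetical; the point is only that the shared definition is never written to.

```python
import copy
from dataclasses import dataclass


@dataclass
class ClockOps:
    max_adj: int = 0
    n_ext_ts: int = 0


GLOBAL_OPS = ClockOps()  # shared template, analogous to the global definition


def setup_private(max_adj: int, n_ext_ts: int) -> ClockOps:
    """Copy the template first, then adjust only the private copy."""
    local = copy.copy(GLOBAL_OPS)
    local.max_adj = max_adj
    local.n_ext_ts = n_ext_ts
    return local
```

Two devices with different parameters no longer influence each other through the shared structure.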
Reviewed-by: Daniel Machon <daniel.machon@microchip.com>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20241112170658.2388529-2-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
e1000_down calls netif_queue_set_napi, which assumes that RTNL is held.
There are a few paths for e1000_down to be called in e1000 where RTNL is
not currently being held:
- e1000_shutdown (pci shutdown)
- e1000_suspend (power management)
- e1000_reinit_locked (via e1000_reset_task delayed work)
- e1000_io_error_detected (via pci error handler)
Hold RTNL in three places to fix this issue:
- e1000_reset_task: igc, igb, and e1000e all hold rtnl in this path.
- e1000_io_error_detected (pci error handler): e1000e and ixgbe hold
rtnl in this path. A patch has been posted for igc to do the same
[1].
- __e1000_shutdown (which is called from both e1000_shutdown and
e1000_suspend): igb, ixgbe, and e1000e all hold rtnl in the same
path.
The other paths which call e1000_down seemingly hold RTNL and are OK:
- e1000_close (ndo_stop)
- e1000_change_mtu (ndo_change_mtu)
Based on the above analysis and mailing list discussion [2], I believe
adding rtnl in the three places mentioned above is correct.
Fixes: 8f7ff18a5ec7 ("e1000: Link NAPI instances to queues and IRQs")
Reported-by: Dmitry Antipov <dmantipov@yandex.ru>
Closes: https://lore.kernel.org/netdev/8cf62307-1965-46a0-a411-ff0080090ff9@yandex.ru/
Link: https://lore.kernel.org/netdev/20241022215246.307821-3-jdamato@fastly.com/ [1]
Link: https://lore.kernel.org/netdev/ZxgVRX7Ne-lTjwiJ@LQ3V64L9R2/ [2]
Signed-off-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
tx_queue_lock and stats_lock are declared and initialized, but never
used. Remove them.
Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Fix 2 spelling mistakes in comments in `igb_main.c`.
Signed-off-by: Johnny Park <pjohnny0508@gmail.com>
Acked-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Since the igc driver doesn't support forced speed configuration and
its current related hardware doesn't support it either, there is no
use of the mac.autoneg parameter. Moreover, in one case this usage
might result in a NULL pointer dereference due to an uninitialized
function pointer, phy.ops.force_speed_duplex.
Therefore, remove this parameter from the igc code.
Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Header ixgbe_type.h includes ixgbe_mbx.h. Also, header
ixgbe_mbx.h included ixgbe_type.h, thus introducing a circular
dependency.
- Remove ixgbe_mbx.h inclusion from ixgbe_type.h.
- ixgbe_mbx.h requires the definition of struct ixgbe_mbx_operations
so move its definition there. While at it, add missing argument
identifier names.
- Add required forward structure declarations.
- Include ixgbe_mbx.h in the .c files that need it, for the
following reasons:
ixgbe_sriov.c uses ixgbe_check_for_msg
ixgbe_main.c uses ixgbe_init_mbx_params_pf
ixgbe_82599.c uses mbx_ops_generic
ixgbe_x540.c uses mbx_ops_generic
ixgbe_x550.c uses mbx_ops_generic
Signed-off-by: Diomidis Spinellis <dds@aueb.gr>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The ice workqueue doesn't seem to rely on any CPU locality and should
therefore be able to run on any CPU. In practice this is already
happening through the unbound ice_service_timer that may fire anywhere
and queue the workqueue accordingly to any CPU.
Make this official so that the ice workqueue is only ever queued to
housekeeping CPUs on nohz_full.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The ice_vc_query_rxdid() function allocates memory to store the
virtchnl_supported_rxdids structure used to communicate the bitmap of
supported RXDIDs to a VF.
This structure is only 8 bytes in size. The function must hold the
allocated length on the stack as well as the pointer to the structure which
itself is 8 bytes. Allocating this storage on the heap adds unnecessary
overhead including a potential error path that must be handled in case
kzalloc fails. Because this structure is so small, we're not saving stack
space. Additionally, because we must ensure that we free the allocated
memory, the return value from ice_vc_send_msg_to_vf() must also be saved in
the stack ret variable. Depending on compiler optimization, this means
allocating the 8-byte structure requires up to 16 bytes of stack
memory!
Simplify this function to keep the rxdid variable on the stack, saving
memory and removing a potential failure exit path from this function.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The pf->supported_rxdids field is used to populate the list of valid RXDIDs
that a VF may use when negotiating VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC.
The set of supported RXDIDs is dependent on the DDP, and can be read from
the GLXFLXP_RXDID_FLAGS register. The PF needs to send this list to the
VF upon receiving the VIRTCHNL_OP_GET_SUPPORTED_RXDIDs. It also needs to
use this list to validate the requested descriptor ID from the VF when
programming the Rx queues.
A future update to support VF live migration will also want to validate
that the target VF can support the same descriptor ID when migrating.
Currently, pf->supported_rxdids is initialized inside the
ice_vc_query_rxdid() function. This means that it is only ever initialized
if at least one VF actually tries to negotiate
VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC. It is also unnecessarily re-initialized
every time the VF loads and requests the descriptor list. This worked
before because the PF only checks pf->supported_rxdids when programming
the Rx queue if the VF actually negotiates the
VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC feature.
This will be problematic for VF live migration. We need the list of
supported Rx descriptor IDs when migrating. It is possible that no VF on
the target PF has ever actually issued a VIRTCHNL_OP_GET_SUPPORTED_RXDIDs.
Refactor the driver to initialize pf->supported_rxdids during driver
initialization after the DDP is loaded. This is simpler, avoids unnecessary
duplicate work, and avoids issues with the live migration process.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Currently when any VF is trusted and true promiscuous mode is enabled on
the PF, the VF will receive all unicast traffic directed to the device's
internal switch. This includes traffic external to the NIC and also from
other VSI (i.e. VFs). This does not match the expected behavior as
unicast traffic should only be visible from external sources in this
case. Disable the Tx promiscuous mode bits for unicast promiscuous mode.
Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Use netif_napi_add_config to assign persistent per-NAPI config when
initializing NAPIs. This preserves NAPI config settings when queue
counts are adjusted.
Tested with an E810-2CQDA2 NIC.
Begin by setting the queue count to 4:
$ sudo ethtool -L eth4 combined 4
Check the queue settings:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump napi-get --json='{"ifindex": 4}'
[{'defer-hard-irqs': 0,
'gro-flush-timeout': 0,
'id': 8452,
'ifindex': 4,
'irq': 2782},
{'defer-hard-irqs': 0,
'gro-flush-timeout': 0,
'id': 8451,
'ifindex': 4,
'irq': 2781},
{'defer-hard-irqs': 0,
'gro-flush-timeout': 0,
'id': 8450,
'ifindex': 4,
'irq': 2780},
{'defer-hard-irqs': 0,
'gro-flush-timeout': 0,
'id': 8449,
'ifindex': 4,
'irq': 2779}]
Now, set the queue with NAPI ID 8451 to have a gro-flush-timeout of
1111:
$ sudo ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/netdev.yaml \
--do napi-set --json='{"id": 8451, "gro-flush-timeout": 1111}'
None
Check that worked:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump napi-get --json='{"ifindex": 4}'
[{'defer-hard-irqs': 0,
'gro-flush-timeout': 0,
'id': 8452,
'ifindex': 4,
'irq': 2782},
{'defer-hard-irqs': 0,
'gro-flush-timeout': 1111,
'id': 8451,
'ifindex': 4,
'irq': 2781},
{'defer-hard-irqs': 0,
'gro-flush-timeout': 0,
'id': 8450,
'ifindex': 4,
'irq': 2780},
{'defer-hard-irqs': 0,
'gro-flush-timeout': 0,
'id': 8449,
'ifindex': 4,
'irq': 2779}]
Now reduce the queue count to 2, which would destroy the queue with NAPI
ID 8451:
$ sudo ethtool -L eth4 combined 2
Check the queue settings, noting that NAPI ID 8451 is gone:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump napi-get --json='{"ifindex": 4}'
[{'defer-hard-irqs': 0,
'gro-flush-timeout': 0,
'id': 8450,
'ifindex': 4,
'irq': 2780},
{'defer-hard-irqs': 0,
'gro-flush-timeout': 0,
'id': 8449,
'ifindex': 4,
'irq': 2779}]
Now, increase the number of queues back to 4:
$ sudo ethtool -L eth4 combined 4
Dump the settings, expecting to see the same NAPI IDs as above and for
NAPI ID 8451 to have its gro-flush-timeout set to 1111:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump napi-get --json='{"ifindex": 4}'
[{'defer-hard-irqs': 0,
'gro-flush-timeout': 0,
'id': 8452,
'ifindex': 4,
'irq': 2782},
{'defer-hard-irqs': 0,
'gro-flush-timeout': 1111,
'id': 8451,
'ifindex': 4,
'irq': 2781},
{'defer-hard-irqs': 0,
'gro-flush-timeout': 0,
'id': 8450,
'ifindex': 4,
'irq': 2780},
{'defer-hard-irqs': 0,
'gro-flush-timeout': 0,
'id': 8449,
'ifindex': 4,
'irq': 2779}]
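The behaviour demonstrated above can be approximated with a toy model in which settings live in a per-index slot that outlives the NAPI instance. The class and method names are hypothetical and the model covers only the settings, not the NAPI IDs or IRQs.

```python
class NetDevice:
    """Toy model: per-queue-index config outlives the NAPI instances."""

    DEFAULTS = {"gro-flush-timeout": 0, "defer-hard-irqs": 0}

    def __init__(self) -> None:
        self.config = {}  # queue index -> settings, kept for the device lifetime
        self.active = {}  # queue index -> settings of currently existing NAPIs

    def napi_add_config(self, index: int) -> None:
        """Create a NAPI bound to the persistent config slot for this index."""
        self.active[index] = self.config.setdefault(index, dict(self.DEFAULTS))

    def napi_del(self, index: int) -> None:
        """Destroy the NAPI; the config slot is left untouched."""
        del self.active[index]

    def set_queue_count(self, count: int) -> None:
        """Remove NAPIs above the new count and create any that are missing."""
        for index in [i for i in self.active if i >= count]:
            self.napi_del(index)
        for index in range(count):
            if index not in self.active:
                self.napi_add_config(index)
```

Shrinking from 4 to 2 queues and growing back to 4 leaves the setting stored for index 2 in place, matching the sequence shown in the test run.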
Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
An optional flag field has been added to the signature segment header.
The field contains two flags, a "valid" bit, and a "last segment" bit
that indicates whether the segment is the last segment that will be
sent to firmware.
If the flag field's valid bit is NOT set, then as was done before,
assume that this is the last segment being downloaded.
However, if the flag field's valid bit IS set, then use the last segment
flag to determine if this segment is the last segment to download.
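The decision described above reduces to a two-step check. The bit positions and the function name are assumptions for illustration; only the valid/last logic comes from the text.

```python
FLAG_VALID = 1 << 0  # hypothetical bit positions
FLAG_LAST = 1 << 1


def is_last_segment(flags: int) -> bool:
    """Decide whether a signature segment is the last one to download."""
    if not flags & FLAG_VALID:
        return True  # legacy packages: behave as before
    return bool(flags & FLAG_LAST)
```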
Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Co-developed-by: Dan Nowlin <dan.nowlin@intel.com>
Signed-off-by: Dan Nowlin <dan.nowlin@intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Add ice_ddp_send_hunk() that buffers "sent FW hunk" calls to AQ in order
to mark the "last" one in a more elegant way. The next commit will add an
even more complicated "sent FW" flow, so it's better to untangle a bit first.
Note that metadata buffers were not skipped for NOT-@indicate_last
segments, this is fixed now.
Minor:
+ use ice_is_buffer_metadata() instead of open coding it in
ice_dwnld_cfg_bufs();
+ ice_dwnld_cfg_bufs_no_lock() + dependencies were moved up a bit to have
better git-diff, as this function was rewritten (in terms of git-blame)
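The buffering idea, holding back one hunk so that the final one can be flagged, might look like the sketch below. It is a generic one-item delay pattern with hypothetical names, not a transcription of ice_ddp_send_hunk().

```python
from typing import Callable, Iterable


def send_hunks(hunks: Iterable[bytes], send: Callable[[bytes, bool], None]) -> None:
    """Call send(hunk, is_last) for each hunk; is_last is True only for the final one."""
    pending = None
    have_pending = False
    for hunk in hunks:
        if have_pending:
            send(pending, False)  # a newer hunk exists, so this one is not last
        pending = hunk
        have_pending = True
    if have_pending:
        send(pending, True)       # nothing followed, so flag it as last
```

Because the flag is decided only after the next hunk is seen, the caller does not need to know the total count in advance.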
CC: Paul Greenwalt <paul.greenwalt@intel.com>
CC: Dan Nowlin <dan.nowlin@intel.com>
CC: Ahmed Zaki <ahmed.zaki@intel.com>
CC: Simon Horman <horms@kernel.org>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Extend the work done in commit 70838938e89c ("ice: Implement driver
functionality to dump serdes equalizer values") by adding the new set of
Rx registers that can be read using command:
$ ethtool -d interface_name
Rx equalization parameters are E810 PHY registers used by end user to
gather information about configuration and status to debug link and
connection issues in the field.
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Refactor function ice_get_tx_rx_equa() to iterate over new table of
params instead of multiple calls to ice_aq_get_phy_equalization().
A subsequent commit will extend that function by adding more serdes
equalizer values to dump.
Shorten the fields of struct ice_serdes_equalization_to_ethtool for
readability purposes.
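A table-driven loop of the kind described might look like this. The table entries, selector values, and the `read_param` callback are all hypothetical placeholders standing in for the real parameter list and ice_aq_get_phy_equalization().

```python
from typing import Callable

# Hypothetical parameter table: (field name, parameter selector).
EQ_PARAMS = [
    ("rx_ffe_pre1", 0x10),
    ("rx_ffe_post1", 0x11),
    ("tx_pre1", 0x20),
]


def read_equalization(read_param: Callable[[int, int], int], serdes_num: int) -> dict:
    """Read every table entry with one loop instead of one call site per field."""
    return {name: read_param(selector, serdes_num) for name, selector in EQ_PARAMS}
```

Adding another value to the dump then means adding one row to the table rather than another call site.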
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Implements tc offload support for rvu representors.
Usage example:
- Add tc rule to drop packets with vlan id 3 using port
representor(Rpf1vf0).
# tc filter add dev Rpf1vf0 protocol 802.1Q parent ffff: flower
vlan_id 3 vlan_ethtype ipv4 skip_sw action drop
- Redirect packets with vlan id 5 and IPv4 packets to eth1,
after stripping vlan header.
# tc filter add dev Rpf1vf0 ingress protocol 802.1Q flower vlan_id 5
vlan_ethtype ipv4 skip_sw action vlan pop action mirred ingress
redirect dev eth1
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Implement the offload stat ndo by fetching the HW stats
of rx/tx queues attached to the representor.
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Register devlink port for the rvu representors.
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Hardware supports different types of MACs, e.g. RPM, SDP, LBK.
LBK is for the internal Tx->Rx HW loopback path. RPM and SDP MACs support
ingress/egress pkt IO on interfaces with different sets of capabilities
like interface modes. At the time of netdev driver registration the PF
seeks MAC related information from the Admin function driver
'drivers/net/ethernet/marvell/octeontx2/af' and sets up ingress/egress
queues etc. such that pkt IO on the channels of these different MACs is
possible. This patch adds representors for SDP MAC.
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Adds support to manage the mtu configuration for a VF through its
representor. On update of the representor mtu, a mbox notification is
sent to the VF to update its mtu.
This feature is implemented based on the "Network Function Representors"
kernel documentation.
"
Setting an MTU on the representor should cause that same MTU
to be reported to the representee.
"
Signed-off-by: Sai Krishna <saikrishnag@marvell.com>
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Implements the below requirement mentioned
in the representors documentation.
"
The representee's link state is controlled through the
representor. Setting the representor administratively UP
or DOWN should cause carrier ON or OFF at the representee.
"
This patch enables
- Reflecting the link state of the representor based on the VF state and
the link state of the VF based on the representor.
- On VF interface up/down, a notification is sent via mbox to the
representor to update the link state.
e.g. ip link set eth0 up/down will set carrier on/off
on the corresponding representor(r0p1) interface.
- Representor interface up/down causes a link state update of the VF.
e.g. ip link set r0p1 up/down will set carrier on/off
on the corresponding representee(eth0) interface.
Signed-off-by: Harman Kalra <hkalra@marvell.com>
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
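The two-way link state reflection described in the commit above can be modelled with a small sketch. It is only an illustration: struct netdev_state, struct rep_pair, rep_set_admin() and vf_set_admin() are hypothetical names, and a plain assignment stands in for the mbox notification the driver sends.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of one interface. */
struct netdev_state {
	bool admin_up;		/* set by "ip link set <dev> up/down" */
	bool carrier_on;	/* link state reported to the stack */
};

/* A representor (e.g. r0p1) and its representee (e.g. eth0). */
struct rep_pair {
	struct netdev_state rep;
	struct netdev_state vf;
};

/* Representor admin UP/DOWN drives carrier ON/OFF at the representee. */
static void rep_set_admin(struct rep_pair *p, bool up)
{
	p->rep.admin_up = up;
	p->vf.carrier_on = up;	/* stands in for the mbox notification */
}

/* VF admin UP/DOWN is reflected as carrier ON/OFF at the representor. */
static void vf_set_admin(struct rep_pair *p, bool up)
{
	p->vf.admin_up = up;
	p->rep.carrier_on = up;	/* stands in for the mbox notification */
}
```

Each direction changes only the peer's carrier, so taking the representor down does not alter the representor's own carrier, which follows the VF state.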
|
|
Adds support to export VF port statistics via representor
netdev. Defines new mbox "NIX_LF_STATS" to fetch VF hw stats.
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Current HW does not support an in-built switch which would forward pkts
between representee and representor. When a representor is put under
a bridge and pkts need to be sent to the representee, pkts from the
representor are sent on a HW internal loopback channel, which are again
punted to the ingress pkt parser. The rules that this patch installs
are the MCAM filters/rules which match against these pkts and forward
them to the representee.
The rules that this patch installs are for basic
representor <=> representee path similar to Tun/TAP between VM and
Host.
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Implements basic set of net_device_ops.
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Adds initial devlink support to set/get the switchdev mode.
Representor netdevs are created for each rvu device when
the switch mode is set to 'switchdev'. These netdevs are
used to control and configure VFs.
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Adds basic driver for the RVU representor.
On probe, the driver does PCI-specific initialization and
HW resource configuration. Introduces the RVU_ESWITCH
kernel config to enable/disable the driver. The representor
and NIC share code, but representor netdevs support only a
subset of NIC functionality. Hence the "otx2_rep_dev" API
helps to skip initialization of features that are not
supported by the representors.
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Testing small size RPCs (300B-400B) on a large AMD system suggests
that page pool recycling is very useful even for just the head frags.
With this patch (and copy break disabled) I see a 30% performance
improvement (82Gbps -> 106Gbps).
Convert bnxt from normal page frags to page pool frags for head buffers.
On systems with small page size we can use the same pool as for TPA
pages. On systems with large pages the frag allocation logic of the
page pool is already used to split a large page into TPA chunks.
TPA chunks are much larger than heads (8k or 64k, AFAICT vs 1kB)
and we always allocate the same sized chunks. Mixing allocation
of TPA and head pages would lead to sub-optimal memory use.
Plus Taehee's work on zero-copy / devmem will need to differentiate
between TPA and non-TPA page pool, anyway. Conditionally allocate
a new page pool for heads.
Link: https://patch.msgid.link/20241109035119.3391864-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
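The "conditionally allocate a new page pool for heads" decision in the bnxt commit above can be sketched as follows. This is a simplified model, not the driver's code: struct pool, struct rx_ring, need_separate_head_pool() and rx_ring_init_pools() are hypothetical names, and comparing the system page size against the TPA chunk size is an assumption about how "small" and "large" page systems are told apart.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page pool handing out fixed-size chunks. */
struct pool {
	size_t chunk_size;
};

struct rx_ring {
	struct pool tpa_pool;		/* always present */
	struct pool head_pool_storage;	/* used only when heads get their own pool */
	struct pool *head_pool;		/* pool that head buffers allocate from */
};

/*
 * Assumption: when a page is larger than one TPA chunk, the TPA pool is
 * already splitting pages into large chunks, so heads get a separate pool.
 */
static bool need_separate_head_pool(size_t page_size, size_t tpa_chunk_size)
{
	return page_size > tpa_chunk_size;
}

static void rx_ring_init_pools(struct rx_ring *r, size_t page_size,
			       size_t tpa_chunk_size, size_t head_size)
{
	r->tpa_pool.chunk_size = tpa_chunk_size;

	if (need_separate_head_pool(page_size, tpa_chunk_size)) {
		/* Large pages: keep small heads out of the TPA chunk pool. */
		r->head_pool_storage.chunk_size = head_size;
		r->head_pool = &r->head_pool_storage;
	} else {
		/* Small pages: heads can share the TPA pool. */
		r->head_pool = &r->tpa_pool;
	}
}
```

Keeping a single head_pool pointer lets the allocation path stay the same in both cases; only the setup code knows whether the pools are shared.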
|
|
This reverts commit 338c4d3902feb5be49bfda530a72c7ab860e2c9f.
Sebastian noticed the ISR indirectly acquires spin_locks, which are
sleeping locks under PREEMPT_RT, which leads to kernel splats.
Fixes: 338c4d3902feb ("igb: Disable threaded IRQ for igb_msix_other")
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://patch.msgid.link/20241106111427.7272-1-wander@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|