|
It supports XDP_PASS, XDP_DROP and multi buffer.
The new function aq_nic_xmit_xdpf() is used to send a packet with an
xdp_frame; internally it calls aq_nic_map_xdp().
The AQC chip supports 32 multi-queues and 8 vectors (IRQs).
There are two options:
1. up to 8 cores, with 4 tx queues per core.
2. up to 4 cores, with 8 tx queues per core.
Like ixgbe, these tx queues could be dedicated to XDP_TX and
XDP_REDIRECT. In that case, no tx_lock would be needed.
But this patchset doesn't use this strategy because the cost of getting
the hardware tx queue index is too high.
So, tx_lock is used in aq_nic_xmit_xdpf().
single-core, single queue, 80% cpu utilization.
30.75% bpf_prog_xxx_xdp_prog_tx [k] bpf_prog_xxx_xdp_prog_tx
10.35% [kernel] [k] aq_hw_read_reg <---------- here
4.38% [kernel] [k] get_page_from_freelist
single-core, 8 queues, 100% cpu utilization, half PPS.
45.56% [kernel] [k] aq_hw_read_reg <---------- here
17.58% bpf_prog_xxx_xdp_prog_tx [k] bpf_prog_xxx_xdp_prog_tx
4.72% [kernel] [k] hw_atl_b0_hw_ring_rx_receive
The new function __aq_ring_xdp_clean() is an XDP rx handler that is
called only when XDP is attached.
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
aq_xdp() is the XDP setup callback function for the Atlantic driver.
When XDP is attached or detached, the device is restarted because
it uses different headroom, tailroom, and page order values.
If XDP is enabled, the default page order value switches from 0 to 2,
because the default maximum frame size is still 2K and additional
area is needed for headroom and tailroom.
The total size (headroom + frame size + tailroom) is 2624 bytes.
So, with order-0 pages, 1472 bytes are always wasted for every frame.
But when order-2 is used, each page block can be used 6 times
with the flip strategy.
This means only about 106 bytes per frame are wasted.
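The waste figures above can be reproduced with a few lines of arithmetic. This is only a sketch: it assumes 4096-byte base pages (not stated in the message, but consistent with the 1472-byte figure), and the names are illustrative rather than taken from the driver.

```python
PAGE_SIZE = 4096      # assumed base page size in bytes
FRAME_TOTAL = 2624    # headroom + 2K frame + tailroom, as stated above


def waste_per_frame(order):
    """Average unused bytes per frame when frames share one 2**order page block."""
    block = PAGE_SIZE << order
    frames = block // FRAME_TOTAL
    return (block - frames * FRAME_TOTAL) / frames


order0_waste = waste_per_frame(0)                  # one frame per 4K page
frames_order2 = (PAGE_SIZE << 2) // FRAME_TOTAL    # frames per 16K block
order2_waste = waste_per_frame(2)

print(order0_waste, frames_order2, round(order2_waste, 1))
```

Under that assumption an order-0 page holds one frame and an order-2 block holds six, which is where the "6 times" and "about 106 bytes" figures come from.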
It also supports the XDP fragment feature.
The MTU can be 16K if the XDP program supports XDP fragments.
If not, the MTU cannot exceed 2K - ETH_HLEN - ETH_FCS.
A static key is also added; it will be used to call the xdp_clean
handler in ->poll(). The data plane implementation is contained in
the following patch.
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Vladimir Oltean says:
====================
DSA cross-chip notifier cleanups
This patch set makes the following improvements:
- Cross-chip notifiers pass a switch index, port index, sometimes tree
index, all as integers. Sometimes we need to recover the struct
dsa_port based on those integers. That recovery involves traversing a
list. By passing directly a pointer to the struct dsa_port we can
avoid that, and the indices passed previously can still be obtained
from the passed struct dsa_port.
- Resetting VLAN filtering on a switch has explicit code to make it run
on a single switch, so it has no place to stay in the cross-chip
notifier code. Move it out.
- Changing the MTU on a user port affects only that single port, yet the
code passes through the cross-chip notifier layer where all switches
are notified. Avoid that.
- Other related cosmetic changes in the MTU changing procedure.
Apart from the slight improvement in performance given by
(a) doing less work in cross-chip notifiers
(b) emitting fewer cross-chip notifiers
we also end up with about 100 fewer lines of code.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A cross-chip notifier with "targeted_match=true" is one that matches
only the local port of the switch that emitted it. In other words,
passing through the cross-chip notifier layer serves no purpose.
Eliminate this concept by calling directly ds->ops->port_change_mtu
instead of emitting a targeted cross-chip notifier. This leaves the
DSA_NOTIFIER_MTU event being emitted only for MTU updates on the CPU
port, which need to be reflected also across all DSA links.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We can get a hold of the "ds" pointer directly from "dp", no need for
the dsa_slave_priv.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We could retrieve the cpu_dp pointer directly from the "dp" we already
have, no need to resort to dsa_to_port(ds, port).
This change also removes the need for an "int port", so that is also
deleted.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use the more conventional iterator over user ports instead of explicitly
ignoring them, and use the more conventional name "other_dp" instead of
"dp_iter", for readability.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
To determine whether a given port should react to the port targeted by
the notifier, dsa_port_host_vlan_match() and dsa_port_host_address_match()
look at the positioning of the switch port currently executing the
notifier relative to the switch port for which the notifier was emitted.
To maintain stylistic compatibility with the other match functions from
switch.c, the host address and host VLAN match functions take the
notifier information about targeted port, switch and tree indices as
argument. However, these functions only use that information to retrieve
the struct dsa_port *targeted_dp, which is an invariant for the outer
loop that calls them. So it makes more sense to calculate the targeted
dp only once, and pass it to them as argument.
But furthermore, the targeted dp is actually known at the time the call
to dsa_port_notify() is made. It is just that we decide to only save the
indices of the port, switch and tree in the notifier structure, just to
retrace our steps and find the dp again using dsa_switch_find() and
dsa_to_port().
But both the above functions are relatively expensive, since they need
to iterate through lists. It appears more straightforward to make all
notifiers just pass the targeted dp inside their info structure, and
have the code that needs the indices to look at info->dp->index instead
of info->port, or info->dp->ds->index instead of info->sw_index, or
info->dp->ds->dst->index instead of info->tree_index.
For the sake of consistency, all cross-chip notifiers are converted to
pass the "dp" directly.
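The relationship between the old index triple and the single pointer can be modelled as follows. This is an illustrative Python model, not kernel code: the classes loosely mirror struct dsa_port, struct dsa_switch and struct dsa_switch_tree, and legacy_indices() is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class SwitchTree:          # models struct dsa_switch_tree
    index: int


@dataclass
class Switch:              # models struct dsa_switch
    index: int
    dst: SwitchTree


@dataclass
class Port:                # models struct dsa_port
    index: int
    ds: Switch


def legacy_indices(dp: Port) -> Tuple[int, int, int]:
    """Recover (port, sw_index, tree_index) from the port alone (hypothetical helper)."""
    return dp.index, dp.ds.index, dp.ds.dst.index


dp = Port(index=3, ds=Switch(index=1, dst=SwitchTree(index=0)))
port, sw_index, tree_index = legacy_indices(dp)
print(port, sw_index, tree_index)
```

Going from the port to the indices is a few pointer dereferences, while the reverse direction needs the list walks that the message describes as relatively expensive.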
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In dsa_port_switchdev_unsync_attrs() there is a comment that resetting
the VLAN filtering isn't done where it is expected. And since commit
108dc8741c20 ("net: dsa: Avoid cross-chip syncing of VLAN filtering"),
there is no reason to handle this in switch.c either.
Therefore, move the logic to port.c, and adapt it slightly to the data
structures and naming conventions from there.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Let's describe this sysctl.
Fixes: 5cbf777cfdf6 ("route: add support for directed broadcast forwarding")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Huge page backed vmalloc memory could benefit performance in many cases.
However, some users of vmalloc may not be ready to handle huge pages for
various reasons: hardware constraints, potential page splits, etc.
VM_NO_HUGE_VMAP was introduced to allow vmalloc users to opt out of huge
pages. However, it is not easy to track down all the users that require
the opt-out, as the allocations pass through different stacks and may cause
issues in different layers.
To address this issue, replace VM_NO_HUGE_VMAP with an opt-in flag,
VM_ALLOW_HUGE_VMAP, so that users that benefit from huge pages can ask
for them specifically.
Also, remove vmalloc_no_huge() and add opt-in helper vmalloc_huge().
Fixes: fac54e2bfb5b ("x86/Kconfig: Select HAVE_ARCH_HUGE_VMALLOC with HAVE_ARCH_HUGE_VMAP")
Link: https://lore.kernel.org/netdev/14444103-d51b-0fb3-ee63-c3f182f0b546@molgen.mpg.de/
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Song Liu <song@kernel.org>
Reviewed-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A few more fixes for SPI, plus one new PCI ID for another Intel
chipset.
All device specific stuff"
* tag 'spi-fix-v5.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: atmel-quadspi: Fix the buswidth adjustment between spi-mem and controller
spi: cadence-quadspi: fix incorrect supports_op() return value
spi: intel: Add support for Raptor Lake-S SPI serial flash
spi: spi-mtk-nor: initialize spi controller after resume
|
|
Last cycle we extended the idmapped mounts infrastructure to support
idmapped mounts of idmapped filesystems (no such filesystem exists yet).
Since then, an idmapped mount is defined as a mount whose idmapping
is different from the filesystem's idmapping.
While doing that work we failed to adapt the acl translation helpers.
They still assume that checking for the identity mapping is enough, but
they need to use the no_idmapping() helper instead.
Note, POSIX ACLs are always translated right at the userspace-kernel
boundary using the caller's current idmapping and the initial idmapping.
The order depends on whether we're coming from or going to userspace.
The filesystem's idmapping doesn't matter at the border.
Consequently, if a non-idmapped mount is passed we need to make sure to
always pass the initial idmapping as the mount's idmapping and not the
filesystem idmapping. Since it's irrelevant here it would yield invalid
ids and prevent setting acls for filesystems that are mountable in a
userns and support posix acls (tmpfs and fuse).
I verified the regression reported in [1] and verified that this patch
fixes it. A regression test will be added to xfstests in parallel.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215849 [1]
Fixes: bd303368b776 ("fs: support mapped mounts of mapped filesystems")
Cc: Seth Forshee <sforshee@digitalocean.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: <stable@vger.kernel.org> # 5.17
Cc: <regressions@lists.linux.dev>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This patch adds an entry for the CTU CAN FD IP to the maintainers
file.
Link: https://lore.kernel.org/all/2cc77e2999d9688bed155e4c7f7807e46d1bf9e3.1647904780.git.pisa@cmp.felk.cvut.cz
Signed-off-by: Pavel Pisa <pisa@cmp.felk.cvut.cz>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
CTU CAN FD IP core documentation based on Martin Jeřábek's diploma thesis
Open-source and Open-hardware CAN FD Protocol Support
https://dspace.cvut.cz/handle/10467/80366
Link: https://lore.kernel.org/all/692b965999ff6c272239df0fe1c76b68d02b134d.1647932262.git.pisa@cmp.felk.cvut.cz
Signed-off-by: Pavel Pisa <pisa@cmp.felk.cvut.cz>
Signed-off-by: Martin Jerabek <martin.jerabek01@gmail.com>
Signed-off-by: Ondrej Ille <ondrej.ille@gmail.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
Platform bus adaptation for CTU CAN FD open-source IP core.
The core has been tested together with OpenCores SJA1000
modified to be CAN FD frames tolerant on MicroZed Zynq based
MZ_APO education kits designed by Petr Porazil from PiKRON.com
company. FPGA design
https://gitlab.fel.cvut.cz/canbus/zynq/zynq-can-sja1000-top.
The kit description at the Computer Architectures course pages
https://cw.fel.cvut.cz/wiki/courses/b35apo/documentation/mz_apo/start .
Kit carrier board and mechanics design source files
https://gitlab.com/pikron/projects/mz_apo/microzed_apo
The work is documented in Martin Jeřábek's diploma thesis
Open-source and Open-hardware CAN FD Protocol Support
https://dspace.cvut.cz/handle/10467/80366
Link: https://lore.kernel.org/all/4d5c53499bafe7717815f948801bd5aedaa05c12.1647904780.git.pisa@cmp.felk.cvut.cz
Signed-off-by: Pavel Pisa <pisa@cmp.felk.cvut.cz>
Signed-off-by: Martin Jerabek <martin.jerabek01@gmail.com>
Signed-off-by: Ondrej Ille <ondrej.ille@gmail.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
PCI bus adaptation for CTU CAN FD open-source IP core.
The project providing FPGA design for Intel EP4CGX15 based DB4CGX15
PCIe board with PiKRON.com designed transceiver riser shield is available
at https://gitlab.fel.cvut.cz/canbus/pcie-ctucanfd .
Link: https://lore.kernel.org/all/a81333e206a9bcf9434797f6f54d8664775542e2.1647904780.git.pisa@cmp.felk.cvut.cz
Signed-off-by: Pavel Pisa <pisa@cmp.felk.cvut.cz>
Signed-off-by: Martin Jerabek <martin.jerabek01@gmail.com>
Signed-off-by: Ondrej Ille <ondrej.ille@gmail.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
independent part.
This driver adds support for the CTU CAN FD open-source IP core.
More documentation and core sources at project page
(https://gitlab.fel.cvut.cz/canbus/ctucanfd_ip_core).
The core integration to Xilinx Zynq system as platform driver
is available (https://gitlab.fel.cvut.cz/canbus/zynq/zynq-can-sja1000-top).
Implementation on Intel FPGA based PCI Express board is available
from project (https://gitlab.fel.cvut.cz/canbus/pcie-ctucanfd).
More about CAN bus related projects used and developed at CTU FEE at
https://canbus.pages.fel.cvut.cz/ .
Link: https://lore.kernel.org/all/1906e4941560ae2ce4b8d181131fd4963aa31611.1647904780.git.pisa@cmp.felk.cvut.cz
Signed-off-by: Martin Jerabek <martin.jerabek01@gmail.com>
Signed-off-by: Ondrej Ille <ondrej.ille@gmail.com>
Signed-off-by: Pavel Pisa <pisa@cmp.felk.cvut.cz>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
The device-tree bindings for open-source/open-hardware CAN FD IP core
designed at the Czech Technical University in Prague.
CTU CAN FD IP core and other CTU CAN bus related projects
listing and documentation page
http://canbus.pages.fel.cvut.cz/
Link: https://lore.kernel.org/all/c5a37fc470ae065b21e79caa65863539393c0d7c.1647904780.git.pisa@cmp.felk.cvut.cz
Signed-off-by: Pavel Pisa <pisa@cmp.felk.cvut.cz>
Reviewed-by: Rob Herring <robh@kernel.org>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
Prague.
The Czech Technical University in Prague (CTU) is one of
the biggest and oldest (founded 1707) technical universities
in Europe. The abbreviation in Czech language is ČVUT according
to official name in Czech language
České vysoké učení technické v Praze
The English translation
The Czech Technical University in Prague
The university pages in English
https://www.cvut.cz/en
Link: https://lore.kernel.org/all/ff3a7216114fcd83530e70b994ef0e4277ddf000.1647904780.git.pisa@cmp.felk.cvut.cz
Signed-off-by: Pavel Pisa <pisa@cmp.felk.cvut.cz>
Acked-by: Rob Herring <robh@kernel.org>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
The MCP251863 device is a CAN-FD controller (MCP2518FD) with an
integrated transceiver (ATA6563). This patch adds support for the new
device.
Link: https://lore.kernel.org/all/20220419072805.2840340-3-mkl@pengutronix.de
Cc: Thomas Kopp <thomas.kopp@microchip.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
The MCP251863 device is a CAN-FD controller (MCP2518FD) with an
integrated transceiver (ATA6563). Add microchip,mcp251863 as a new
compatible to the binding.
Link: https://lore.kernel.org/all/20220419072805.2840340-2-mkl@pengutronix.de
Cc: devicetree@vger.kernel.org
Cc: Thomas Kopp <thomas.kopp@microchip.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
This patch adds documentation for the r8a77961 to the
renesas,rcar-canfd binding.
Link: https://lore.kernel.org/all/20220401153743.77871-1-wsa+renesas@sang-engineering.com
Cc: devicetree@vger.kernel.org
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
This patch marks the bit timing constants as const.
Fixes: c223da689324 ("can: xilinx_can: Add support for CANFD FD frames")
Link: https://lore.kernel.org/all/20220317203119.792552-1-mkl@pengutronix.de
Cc: Appana Durga Kedareswara rao <appana.durga.rao@xilinx.com>
Cc: Naga Sureshkumar Relli <naga.sureshkumar.relli@xilinx.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
Commit 7843d3c8e5e6 ("dt-bindings: can: xilinx_can: Convert Xilinx CAN
binding to YAML") converts xilinx_can.txt to xilinx,can.yaml, but
did not adjust its reference in MAINTAINERS.
Hence, ./scripts/get_maintainer.pl --self-test=patterns complains
about a broken reference.
Repair this file reference in XILINX CAN DRIVER.
Fixes: 7843d3c8e5e6 ("dt-bindings: can: xilinx_can: Convert Xilinx CAN binding to YAML")
Link: https://lore.kernel.org/all/20220321122840.17841-1-lukas.bulwahn@gmail.com
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
Using pm_runtime_resume_and_get() is more appropriate
for simplifying the code.
Link: https://lore.kernel.org/all/20220419081449.2574026-1-chi.minghao@zte.com.cn
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
powerpc's asm/prom.h brings some headers that it doesn't need itself.
In order to clean it up, first add missing headers in users of
asm/prom.h
Link: https://lore.kernel.org/all/878888f9057ad2f66ca0621a0007472bf57f3e3d.1648833432.git.christophe.leroy@csgroup.eu
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
Technologic Systems has rebranded as embeddedTS with the current
domain eventually going offline. Update web/doc URLs to correct
resource locations.
Link: https://lore.kernel.org/all/20220329201229.16279-1-kris@embeddedTS.com
Signed-off-by: Kris Bahnsen <kris@embeddedTS.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
larger ones
The CiA (CAN in Automation) lists in their Newsletter 1/2018 in the
"Recommendation for the CAN FD bit-timing" [1] article several
recommendations, one of them is:
| Recommendation 3: Choose BRPA and BRPD as low as possible
[1] https://can-newsletter.org/uploads/media/raw/f6a36d1461371a2f86ef0011a513712c.pdf
With the current bit timing algorithm Srinivas Neeli noticed that on
the Xilinx Versal ACAP board the CAN data bit timing parameters are
not calculated optimally. For most bit rates, the bit rate
prescaler (BRP) is != 1, although it's possible to configure the
requested bit rate with a prescaler of 1:
| Data Bit timing parameters for xilinx_can_fd2i with 79.999999 MHz ref clock (cmd-line) using algo 'v4.8'
|  nominal                                  real  Bitrt    nom   real  SampP
|  Bitrate TQ[ns] PrS PhS1 PhS2 SJW BRP  Bitrate  Error  SampP  SampP  Error
| 12000000     12   2    2    2   1   1 11428571   4.8%  75.0%  71.4%   4.8%
| 10000000     25   1    1    1   1   2  9999999   0.0%  75.0%  75.0%   0.0%
|  8000000     12   3    3    3   1   1  7999999   0.0%  75.0%  70.0%   6.7%
|  5000000     50   1    1    1   1   4  4999999   0.0%  75.0%  75.0%   0.0%
|  4000000     62   1    1    1   1   5  3999999   0.0%  75.0%  75.0%   0.0%
|  2000000    125   1    1    1   1  10  1999999   0.0%  75.0%  75.0%   0.0%
|  1000000    250   1    1    1   1  20   999999   0.0%  75.0%  75.0%   0.0%
The bit timing parameter calculation algorithm effectively iterates
from low to high BRP values. It selects a new best parameter set if
the sample point error of the current parameter set is less than or
equal to that of the old best parameter set.
If the given hardware constraints (clock rate and bit timing parameter
constants) don't allow a sample point error of 0, the algorithm will
first find a valid bit timing parameter set with a low BRP, but then
will accept parameter sets with higher BRPs that have the same sample
point error.
This patch changes the algorithm to only accept a new parameter set,
if the resulting sample point error is lower. This leads to the
following data bit timing parameter for the Versal ACAP board:
| Data Bit timing parameters for xilinx_can_fd2i with 79.999999 MHz ref clock (cmd-line) using algo 'can-next'
|  nominal                                  real  Bitrt    nom   real  SampP
|  Bitrate TQ[ns] PrS PhS1 PhS2 SJW BRP  Bitrate  Error  SampP  SampP  Error
| 12000000     12   2    2    2   1   1 11428571   4.8%  75.0%  71.4%   4.8%
| 10000000     12   2    3    2   1   1  9999999   0.0%  75.0%  75.0%   0.0%
|  8000000     12   3    3    3   1   1  7999999   0.0%  75.0%  70.0%   6.7%
|  5000000     12   5    6    4   1   1  4999999   0.0%  75.0%  75.0%   0.0%
|  4000000     12   7    7    5   1   1  3999999   0.0%  75.0%  75.0%   0.0%
|  2000000     12  14   15   10   1   1  1999999   0.0%  75.0%  75.0%   0.0%
|  1000000     25  14   15   10   1   2   999999   0.0%  75.0%  75.0%   0.0%
Note: Due to HW constraints a data bit rate of 1 MBit/s with BRP = 1 is not possible.
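The effect of the comparison change can be illustrated with a simplified model. The sketch below is not the kernel algorithm: it assumes an exact 80 MHz clock, ignores the hardware limits on the individual segments, expresses sample points in permille, and uses a hypothetical function name pick_brp().

```python
def pick_brp(clock_hz, bitrate, nominal_sp=750, strict=True, brp_max=32, min_tq=4):
    """Return (brp, tq_per_bit, sample_point_error) of the selected parameter set.

    Simplified model: only BRP values that hit the bit rate exactly are
    considered, and segment limits of real hardware are ignored.
    """
    best = None
    for brp in range(1, brp_max + 1):              # iterate from low to high BRP
        if clock_hz % (brp * bitrate):
            continue                               # bit rate not exactly reachable
        tq_per_bit = clock_hz // (brp * bitrate)
        if tq_per_bit < min_tq:
            break                                  # too few time quanta per bit
        # Phase segment 2 sized so the sample point lands near nominal_sp.
        tseg2 = max(1, (tq_per_bit * (1000 - nominal_sp) + 500) // 1000)
        real_sp = 1000 * (tq_per_bit - tseg2) // tq_per_bit
        error = abs(nominal_sp - real_sp)
        if best is None:
            accept = True
        elif strict:
            accept = error < best[2]               # new rule: strictly better only
        else:
            accept = error <= best[2]              # old rule: ties replace the best
        if accept:
            best = (brp, tq_per_bit, error)
    return best


old = pick_brp(80_000_000, 5_000_000, strict=False)
new = pick_brp(80_000_000, 5_000_000, strict=True)
print(old, new)
```

For a 5 Mbit/s data bit rate, BRP 1, 2 and 4 all reach a sample point error of 0 in this model. The non-strict comparison therefore ends on BRP 4 and the strict one keeps BRP 1, mirroring the 5000000 rows of the two tables.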
Link: https://lore.kernel.org/all/20220318144913.873614-1-mkl@pengutronix.de
Link: https://lore.kernel.org/all/20220113203004.jf2rqj2pirhgx72i@pengutronix.de
Cc: Srinivas Neeli <sneeli@xilinx.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
can_rx_offload_queue_timestamp()
This patch renames the function can_rx_offload_queue_sorted() to
can_rx_offload_queue_timestamp(). This better describes what the
function does, it adds a newly RX'ed skb to the sorted queue by its
timestamp.
Link: https://lore.kernel.org/all/20220417194327.2699059-1-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
netlink_dump() allocates an skb and reserves space in it,
but forgets to reset the network header.
This allows a BPF program, invoked later from sk_filter(),
to access uninitialized kernel memory from the reserved
space.
Theoretically, the mac header reset could be omitted, because
it is set to a special initial value.
bpf_internal_load_pointer_neg_helper() calls skb_mac_header()
without checking skb_mac_header_was_set().
Relying on skb->len not being too big seems fragile.
We could also add a sanity check in bpf_internal_load_pointer_neg_helper()
to avoid surprises in the future.
syzbot report was:
BUG: KMSAN: uninit-value in ___bpf_prog_run+0xa22b/0xb420 kernel/bpf/core.c:1637
___bpf_prog_run+0xa22b/0xb420 kernel/bpf/core.c:1637
__bpf_prog_run32+0x121/0x180 kernel/bpf/core.c:1796
bpf_dispatcher_nop_func include/linux/bpf.h:784 [inline]
__bpf_prog_run include/linux/filter.h:626 [inline]
bpf_prog_run include/linux/filter.h:633 [inline]
__bpf_prog_run_save_cb+0x168/0x580 include/linux/filter.h:756
bpf_prog_run_save_cb include/linux/filter.h:770 [inline]
sk_filter_trim_cap+0x3bc/0x8c0 net/core/filter.c:150
sk_filter include/linux/filter.h:905 [inline]
netlink_dump+0xe0c/0x16c0 net/netlink/af_netlink.c:2276
netlink_recvmsg+0x1129/0x1c80 net/netlink/af_netlink.c:2002
sock_recvmsg_nosec net/socket.c:948 [inline]
sock_recvmsg net/socket.c:966 [inline]
sock_read_iter+0x5a9/0x630 net/socket.c:1039
do_iter_readv_writev+0xa7f/0xc70
do_iter_read+0x52c/0x14c0 fs/read_write.c:786
vfs_readv fs/read_write.c:906 [inline]
do_readv+0x432/0x800 fs/read_write.c:943
__do_sys_readv fs/read_write.c:1034 [inline]
__se_sys_readv fs/read_write.c:1031 [inline]
__x64_sys_readv+0xe5/0x120 fs/read_write.c:1031
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:81
entry_SYSCALL_64_after_hwframe+0x44/0xae
Uninit was stored to memory at:
___bpf_prog_run+0x96c/0xb420 kernel/bpf/core.c:1558
__bpf_prog_run32+0x121/0x180 kernel/bpf/core.c:1796
bpf_dispatcher_nop_func include/linux/bpf.h:784 [inline]
__bpf_prog_run include/linux/filter.h:626 [inline]
bpf_prog_run include/linux/filter.h:633 [inline]
__bpf_prog_run_save_cb+0x168/0x580 include/linux/filter.h:756
bpf_prog_run_save_cb include/linux/filter.h:770 [inline]
sk_filter_trim_cap+0x3bc/0x8c0 net/core/filter.c:150
sk_filter include/linux/filter.h:905 [inline]
netlink_dump+0xe0c/0x16c0 net/netlink/af_netlink.c:2276
netlink_recvmsg+0x1129/0x1c80 net/netlink/af_netlink.c:2002
sock_recvmsg_nosec net/socket.c:948 [inline]
sock_recvmsg net/socket.c:966 [inline]
sock_read_iter+0x5a9/0x630 net/socket.c:1039
do_iter_readv_writev+0xa7f/0xc70
do_iter_read+0x52c/0x14c0 fs/read_write.c:786
vfs_readv fs/read_write.c:906 [inline]
do_readv+0x432/0x800 fs/read_write.c:943
__do_sys_readv fs/read_write.c:1034 [inline]
__se_sys_readv fs/read_write.c:1031 [inline]
__x64_sys_readv+0xe5/0x120 fs/read_write.c:1031
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:81
entry_SYSCALL_64_after_hwframe+0x44/0xae
Uninit was created at:
slab_post_alloc_hook mm/slab.h:737 [inline]
slab_alloc_node mm/slub.c:3244 [inline]
__kmalloc_node_track_caller+0xde3/0x14f0 mm/slub.c:4972
kmalloc_reserve net/core/skbuff.c:354 [inline]
__alloc_skb+0x545/0xf90 net/core/skbuff.c:426
alloc_skb include/linux/skbuff.h:1158 [inline]
netlink_dump+0x30f/0x16c0 net/netlink/af_netlink.c:2242
netlink_recvmsg+0x1129/0x1c80 net/netlink/af_netlink.c:2002
sock_recvmsg_nosec net/socket.c:948 [inline]
sock_recvmsg net/socket.c:966 [inline]
sock_read_iter+0x5a9/0x630 net/socket.c:1039
do_iter_readv_writev+0xa7f/0xc70
do_iter_read+0x52c/0x14c0 fs/read_write.c:786
vfs_readv fs/read_write.c:906 [inline]
do_readv+0x432/0x800 fs/read_write.c:943
__do_sys_readv fs/read_write.c:1034 [inline]
__se_sys_readv fs/read_write.c:1031 [inline]
__x64_sys_readv+0xe5/0x120 fs/read_write.c:1031
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:81
entry_SYSCALL_64_after_hwframe+0x44/0xae
CPU: 0 PID: 3470 Comm: syz-executor751 Not tainted 5.17.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Fixes: db65a3aaf29e ("netlink: Trim skb to alloc size to avoid MSG_TRUNC")
Fixes: 9063e21fb026 ("netlink: autosize skb lengthes")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Link: https://lore.kernel.org/r/20220415181442.551228-1-eric.dumazet@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Florent Fourcot says:
====================
rtnetlink: improve ALT_IFNAME config and fix dangerous GROUP usage
First commit forbids dangerous calls when both IFNAME and GROUP are
given, since it can introduce unexpected behaviour when IFNAME does not
match any interface.
Second patch achieves primary goal of this patchset to fix/improve
IFLA_ALT_IFNAME attribute, since previous code was never working for
newlink/setlink. ip-link command is probably getting interface index
before, and was not using this feature.
Last two patches are improving error code on corner cases.
Changes in v2:
* Remove ifname argument in rtnl_dev_get/do_setlink
functions (simplify code)
* Use a boolean to avoid condition duplication in __rtnl_newlink
Changes in v3:
* Simplify rtnl_dev_get signature
Changes in v4:
* Rename link_lookup to link_specified
Changes in v5:
* Re-order patches
====================
Link: https://lore.kernel.org/r/20220415165330.10497-1-florent.fourcot@wifirst.fr
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
A request without interface name/interface index/interface group cannot
work. We should return EINVAL.
Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr>
Signed-off-by: Brian Baboch <brian.baboch@wifirst.fr>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
If IFLA_ALT_IFNAME is set and the given interface is not found,
we should return ENODEV, consistent with the IFLA_IFNAME
behaviour.
This commit extends the feature added by commit 76c9ac0ee878,
"net: rtnetlink: add possibility to use alternative names as message handle".
CC: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr>
Signed-off-by: Brian Baboch <brian.baboch@wifirst.fr>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The buffer called "ifname" passed to rtnl_dev_get()
is always valid when called by setlink/newlink,
but contains only an empty string when IFLA_IFNAME is not given. So
IFLA_ALT_IFNAME is always ignored.
This patch fixes rtnl_dev_get() by removing the ifname argument
and moving the ifname copy into do_setlink() when required.
It extends the feature added by commit 76c9ac0ee878,
"net: rtnetlink: add possibility to use alternative names as message
handle".
CC: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr>
Signed-off-by: Brian Baboch <brian.baboch@wifirst.fr>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
When the interface does not exist and a group is given, the given
parameters are applied to all interfaces of that group. The given
IFNAME/ALT_IFNAME is ignored in that case.
That can be dangerous, since a typo (or a deleted interface) can produce
unexpected side effects for the caller:
Case 1:
IFLA_IFNAME=valid_interface
IFLA_GROUP=1
MTU=1234
Case 1 will update MTU and group of the given interface "valid_interface".
Case 2:
IFLA_IFNAME=doesnotexist
IFLA_GROUP=1
MTU=1234
Case 2 will update the MTU of all interfaces in group 1. IFLA_IFNAME is
ignored in this case.
This behaviour is inconsistent and dangerous. To fix this issue,
we now return ENODEV when the given IFNAME does not exist.
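The two cases can be replayed against a toy model of the lookup. This is a sketch under stated assumptions, not the rtnetlink code: select_targets() and its `fixed` switch are hypothetical, and the device list is reduced to a name-to-group dictionary.

```python
import errno


def select_targets(interfaces, ifname=None, group=None, fixed=True):
    """Return the interface names a setlink-style request would modify.

    interfaces: dict mapping interface name -> group id (toy device list).
    fixed=False models the behaviour described before this change.
    """
    if ifname is not None:
        if ifname in interfaces:
            return [ifname]
        if fixed:
            raise OSError(errno.ENODEV, "no such device")
        # Old behaviour: the unknown name is ignored and the group is used.
    if group is not None:
        return sorted(name for name, grp in interfaces.items() if grp == group)
    raise OSError(errno.EINVAL, "no interface specified")


devs = {"eth0": 1, "eth1": 1, "lo": 0}

case1 = select_targets(devs, ifname="eth0", group=1)
old_case2 = select_targets(devs, ifname="doesnotexist", group=1, fixed=False)
try:
    select_targets(devs, ifname="doesnotexist", group=1)
    new_case2 = None
except OSError as exc:
    new_case2 = exc.errno

print(case1, old_case2, new_case2)
```

In this model the old path silently widens a single-interface request to every member of the group, while the new path fails with ENODEV before anything is modified.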
Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr>
Signed-off-by: Brian Baboch <brian.baboch@wifirst.fr>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Tonghao Zhang says:
====================
net: sched: allow user to select txqueue
From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Patch 1 allows users to select the txqueue in the clsact hook.
Patch 2 supports using skbhash to select the txqueue.
====================
Link: https://lore.kernel.org/r/20220415164046.26636-1-xiangxia.m.yue@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
This patch allows users to pick a queue_mapping range
from A to B, so that packets can be load balanced across
tx queues A to B. Each bound of the range is an unsigned
16-bit value in decimal format.
$ tc filter ... action skbedit queue_mapping skbhash A B
"skbedit queue_mapping QUEUE_MAPPING" (from "man 8 tc-skbedit")
is enhanced with flags: SKBEDIT_F_TXQ_SKBHASH
  +----+      +----+      +----+
  | P1 |      | P2 |      | Pn |
  +----+      +----+      +----+
    |           |           |
    +-----------+-----------+
                |
                | clsact/skbedit
                |      MQ
                v
    +-----------+-----------+
    | q0        | qn        | qm
    v           v           v
  HTB/FQ       FIFO   ...  FIFO
For example:
If P1 sends out packets to different Pods on another host, and
we want to distribute flows across queues qn - qm, then we can use
skb->hash as the hash.
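One plausible way to map a flow hash onto the inclusive range A..B is a modulo over the range size. The sketch below assumes that mapping; it is not quoted from the patch, and pick_txqueue() is a hypothetical name.

```python
def pick_txqueue(skb_hash, a, b):
    """Map a flow hash onto the inclusive tx queue range [a, b] (assumed mapping)."""
    if not (0 <= a <= b <= 0xFFFF):
        raise ValueError("queue range must satisfy 0 <= A <= B <= 65535")
    span = b - a + 1              # number of queues in the range
    return a + skb_hash % span


# Packets of one flow share a hash, so they always land on the same queue.
same_flow = {pick_txqueue(0xDEADBEEF, 2, 6) for _ in range(10)}

# Many different hashes spread over every queue in the range.
queues = {pick_txqueue(h, 2, 6) for h in range(1000)}

print(same_flow, sorted(queues))
```

With the range 2 to 6 used in the setup commands, every selected queue falls within 2..6 and a given flow stays on a single queue, which keeps its packets in order.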
setup commands:
$ NETDEV=eth0
$ ip netns add n1
$ ip link add ipv1 link $NETDEV type ipvlan mode l2
$ ip link set ipv1 netns n1
$ ip netns exec n1 ifconfig ipv1 2.2.2.100/24 up
$ tc qdisc add dev $NETDEV clsact
$ tc filter add dev $NETDEV egress protocol ip prio 1 \
flower skip_hw src_ip 2.2.2.100 action skbedit queue_mapping skbhash 2 6
$ tc qdisc add dev $NETDEV handle 1: root mq
$ tc qdisc add dev $NETDEV parent 1:1 handle 2: htb
$ tc class add dev $NETDEV parent 2: classid 2:1 htb rate 100kbit
$ tc class add dev $NETDEV parent 2: classid 2:2 htb rate 200kbit
$ tc qdisc add dev $NETDEV parent 1:2 tbf rate 100mbit burst 100mb latency 1
$ tc qdisc add dev $NETDEV parent 1:3 pfifo
$ tc qdisc add dev $NETDEV parent 1:4 pfifo
$ tc qdisc add dev $NETDEV parent 1:5 pfifo
$ tc qdisc add dev $NETDEV parent 1:6 pfifo
$ tc qdisc add dev $NETDEV parent 1:7 pfifo
$ ip netns exec n1 iperf3 -c 2.2.2.1 -i 1 -t 10 -P 10
pick txqueue from 2 - 6:
$ ethtool -S $NETDEV | grep -i tx_queue_[0-9]_bytes
tx_queue_0_bytes: 42
tx_queue_1_bytes: 0
tx_queue_2_bytes: 11442586444
tx_queue_3_bytes: 7383615334
tx_queue_4_bytes: 3981365579
tx_queue_5_bytes: 3983235051
tx_queue_6_bytes: 6706236461
tx_queue_7_bytes: 42
tx_queue_8_bytes: 0
tx_queue_9_bytes: 0
txqueues 2 - 6 are mapped to classid 1:3 - 1:7
$ tc -s class show dev $NETDEV
...
class mq 1:3 root leaf 8002:
Sent 11949133672 bytes 7929798 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
class mq 1:4 root leaf 8003:
Sent 7710449050 bytes 5117279 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
class mq 1:5 root leaf 8004:
Sent 4157648675 bytes 2758990 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
class mq 1:6 root leaf 8005:
Sent 4159632195 bytes 2759990 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
class mq 1:7 root leaf 8006:
Sent 7003169603 bytes 4646912 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
...
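The range selection used in the example above ("skbhash 2 6") can be pictured as folding the flow hash into an inclusive queue range. The following is a minimal userspace sketch under that assumption; pick_txq_from_range() is a hypothetical helper written for illustration and is not the code added to act_skbedit.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Map a flow hash onto the inclusive Tx queue range [min, max].
 * Hypothetical helper for illustration only; not the kernel's code.
 */
static uint16_t pick_txq_from_range(uint32_t flow_hash, uint16_t min, uint16_t max)
{
	assert(min <= max);

	/* Number of queues in the range, e.g. 5 for "skbhash 2 6". */
	uint32_t span = (uint32_t)(max - min) + 1;

	/* The same hash always lands on the same queue inside the range. */
	return (uint16_t)(min + flow_hash % span);
}
```

Because the result depends only on the hash, all packets of one flow stay on one queue, while different flows spread over queues 2 - 6 as seen in the counters above.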
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Jonathan Lemon <jonathan.lemon@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexander Lobakin <alobakin@pm.me>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Talal Ahmad <talalahmad@google.com>
Cc: Kevin Hao <haokexin@gmail.com>
Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Cc: Antoine Tenart <atenart@kernel.org>
Cc: Wei Wang <weiwan@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
This patch fixes the following issue:
* If we install tc filters with act_skbedit in the clsact hook, they
do not work, because netdev_core_pick_tx() overwrites
queue_mapping.
$ tc filter ... action skbedit queue_mapping 1
This patch is also useful:
* We can use FQ + EDT to implement efficient policies. Tx queues
are picked by xps, by the ndo_select_queue of the netdev driver, or by
the skb hash in netdev_core_pick_tx(). In fact, the netdev driver and
the skb hash are _not_ under our control. xps uses the CPU map to select Tx
queues, but in most cases we can't figure out which pod/container
task_struct is running on a given CPU. We can use clsact filters to classify
the traffic of one pod/container to one Tx queue. Why?
In a container networking environment, there are two kinds of pod/
container/net-namespace. For one kind (e.g. P1, P2), high throughput
is key. But to avoid running out of network resources,
the outbound traffic of these pods is limited; they use or share
dedicated Tx queues with an HTB/TBF/FQ Qdisc attached. For the other kind
of pods (e.g. Pn), low data-access latency is key, and the traffic is not
limited. These pods use or share other dedicated Tx queues with a FIFO Qdisc attached.
This choice provides two benefits. First, contention on the HTB/FQ Qdisc
lock is significantly reduced since fewer CPUs contend for the same queue.
More importantly, Qdisc contention can be eliminated completely if each
CPU has its own FIFO Qdisc for the second kind of pods.
There must be a mechanism in place to support classifying traffic based on
pods/container to different Tx queues. Note that clsact is outside of Qdisc
while Qdisc can run a classifier to select a sub-queue under the lock.
In general recording the decision in the skb seems a little heavy handed.
This patch introduces a per-CPU variable, suggested by Eric.
The xmit.skip_txqueue flag is first cleared in __dev_queue_xmit().
- A Tx Qdisc may install skbedit actions; the xmit.skip_txqueue flag
is then set in qdisc->enqueue() even though the Tx queue has already been
selected in netdev_tx_queue_mapping() or netdev_core_pick_tx(). Clearing the
flag first in __dev_queue_xmit() is useful because it lets us:
- Avoid picking the Tx queue with netdev_tx_queue_mapping() in the next netdev
in a case such as: eth0 macvlan - eth0.3 vlan - eth0 ixgbe-phy.
For example, eth0, a macvlan in the pod whose root Qdisc installs skbedit
queue_mapping, sends packets to eth0.3, a vlan in the host. __dev_queue_xmit() of
eth0.3 clears the flag and does not select the Tx queue according to
skb->queue_mapping, because there are no filters in clsact or the Tx Qdisc of this netdev.
The same happens for eth0, the ixgbe device in the host.
- Avoid picking the Tx queue for the next packet. If we set xmit.skip_txqueue
in the Tx Qdisc (qdisc->enqueue()), the proper way to clear it is
in __dev_queue_xmit() when processing the next packet.
For performance reasons, a static key is used. If the user does not configure
NET_EGRESS, the code added by this patch is not compiled.
  +----+      +----+      +----+
  | P1 |      | P2 |      | Pn |
  +----+      +----+      +----+
    |           |           |
    +-----------+-----------+
                |
                | clsact/skbedit
                |      MQ
                v
    +-----------+-----------+
    | q0        | q1        | qn
    v           v           v
  HTB/FQ      HTB/FQ  ...  FIFO
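The flag handling described above can be modelled in userspace as follows. This is only a toy sketch: the plain global stands in for the per-CPU xmit.skip_txqueue, and toy_pkt, toy_skbedit() and toy_xmit() are hypothetical names that do not exist in the kernel. Falling back to queue 0 for an out-of-range request is an assumption of this model.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the per-CPU xmit.skip_txqueue flag (one CPU only). */
static bool skip_txqueue;

/* Toy packet: queue_mapping as written by a filter, hash for the default pick. */
struct toy_pkt {
	unsigned int queue_mapping;
	unsigned int hash;
};

/* Models an egress filter with "action skbedit queue_mapping N". */
static void toy_skbedit(struct toy_pkt *pkt, unsigned int queue)
{
	pkt->queue_mapping = queue;
	skip_txqueue = true;		/* ask the xmit path to keep this choice */
}

/*
 * Models one pass through the transmit path of a device with num_txq queues.
 * filter_queue < 0 means that no skbedit filter is installed on this device.
 */
static unsigned int toy_xmit(struct toy_pkt *pkt, int filter_queue,
			     unsigned int num_txq)
{
	assert(num_txq > 0);

	skip_txqueue = false;		/* cleared first, for every packet */

	if (filter_queue >= 0)
		toy_skbedit(pkt, (unsigned int)filter_queue);

	if (skip_txqueue)		/* honour the filter's choice */
		return pkt->queue_mapping < num_txq ? pkt->queue_mapping : 0;

	return pkt->hash % num_txq;	/* default pick, ignores queue_mapping */
}
```

Sending the same packet through a second device without a filter (filter_queue < 0) shows the point of clearing the flag first: the stale queue_mapping left by the first device is not used for the pick.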
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Jonathan Lemon <jonathan.lemon@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexander Lobakin <alobakin@pm.me>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Talal Ahmad <talalahmad@google.com>
Cc: Kevin Hao <haokexin@gmail.com>
Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Cc: Antoine Tenart <atenart@kernel.org>
Cc: Wei Wang <weiwan@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
When the user runs:
bridge link set dev $br_port mcast_flood on
this command should affect not only L2 multicast, but also IPv4 and IPv6
multicast.
In the Ocelot switch, unknown multicast is flooded according to
different PGIDs depending on its type, and PGID_MC only handles L2
multicast. Therefore, by leaving PGID_MCIPV4 and PGID_MCIPV6 at their
default value of 0, unknown IP multicast traffic is never flooded.
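The intended behaviour can be sketched with a toy model in which each flood group holds a bitmask of destination ports. The array and toy_set_mcast_flood() below are assumptions made for illustration; the real driver programs hardware registers and is not shown here. Only the three PGID names come from this commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flood destination groups named in the commit message. */
enum toy_pgid { PGID_MC, PGID_MCIPV4, PGID_MCIPV6, TOY_PGID_COUNT };

/* One port bitmask per group; a mask of 0 means "flood to no port". */
static uint32_t toy_flood_mask[TOY_PGID_COUNT];

/* Apply "mcast_flood on/off" for one port to all three multicast groups. */
static void toy_set_mcast_flood(unsigned int port, bool on)
{
	assert(port < 32);

	for (int pgid = PGID_MC; pgid < TOY_PGID_COUNT; pgid++) {
		if (on)
			toy_flood_mask[pgid] |= UINT32_C(1) << port;
		else
			toy_flood_mask[pgid] &= ~(UINT32_C(1) << port);
	}
}
```

Updating only the PGID_MC entry would leave the two IP multicast masks at 0, which corresponds to the bug described above.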
Fixes: 421741ea5672 ("net: mscc: ocelot: offload bridge port flags to device")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20220415151950.219660-1-vladimir.oltean@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
In case the checksum calculation is offloaded to the DSA master network
interface, it will include the switch trailing tag. As soon as the switch strips
that tag on egress, the calculated checksum is wrong.
Therefore, add the checksum calculation to the tagger (if required) before
adding the switch tag. This way, the hellcreek code works with all DSA master
interfaces regardless of their declared feature set.
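Why a trailing tag disturbs an offloaded checksum can be shown with a small userspace sketch. toy_csum() below is a plain 16-bit ones' complement sum in the style of RFC 1071 over an arbitrary byte buffer; it is a simplification (no pseudo header, no real frame layout) and is not the hellcreek tagger code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16-bit ones' complement checksum over a byte buffer (RFC 1071 style). */
static uint16_t toy_csum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	assert(data != NULL || len == 0);

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];
	if (i < len)			/* odd trailing byte, padded with zero */
		sum += (uint32_t)data[i] << 8;
	while (sum >> 16)		/* fold the carries back in */
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}
```

A checksum taken over the payload bytes 12 34 56 78 differs from one taken over the same bytes followed by a one-byte tag 01. Once the switch strips the tag, only the first value matches the frame on the wire, which is why the checksum has to be computed before the tag is added.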
Fixes: 01ef09caad66 ("net: dsa: Add tag handling for Hirschmann Hellcreek switches")
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20220415103320.90657-1-kurt@linutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
This resets the device deeply on freeze and thaw instead of on suspend and
resume, and prevents NULL pointer dereferences of the uninitialized ring
0 buffer while thawing.
The impact is an indefinitely hanging kernel. You can't switch
consoles after this and the only possible user interaction is SysRq.
BUG: kernel NULL pointer dereference
RIP: 0010:aq_ring_rx_fill+0xcf/0x210 [atlantic]
aq_vec_init+0x85/0xe0 [atlantic]
aq_nic_init+0xf7/0x1d0 [atlantic]
atl_resume_common+0x4f/0x100 [atlantic]
pci_pm_thaw+0x42/0xa0
resolves in aq_ring.o to
```
0000000000000ae0 <aq_ring_rx_fill>:
{
/* ... */
baf: 48 8b 43 08 mov 0x8(%rbx),%rax
buff->flags = 0U; /* buff is NULL */
```
The bug has been present since the introduction of the new pm code in
8aaa112a57c1 ("net: atlantic: refactoring pm logic") and was hidden
until 8ce84271697a ("net: atlantic: changes for multi-TC support"),
which refactored the aq_vec_{free,alloc} functions into
aq_vec_{,ring}_{free,alloc}; that refactoring itself is technically not wrong. The
original functions just always reinitialized the buffers on S3/S4. If
the interface is down before freezing, the bug does not occur. It does
not matter whether the initrd contains and loads the module before
thawing.
So the fix is to invert the boolean parameter deep in all pm function
calls, which was clearly intended to be set like that.
The first report was on GitHub [1], although you have to infer that from the
resume logs in the posted dmesg snippet. I only recently posted one on
Bugzilla [2], since I did not have an AQC device until now.
#regzbot introduced: 8ce84271697a
#regzbot from: koo5 <kolman.jindrich@gmail.com>
#regzbot monitor: https://github.com/Aquantia/AQtion/issues/32
Fixes: 8aaa112a57c1 ("net: atlantic: refactoring pm logic")
Link: https://github.com/Aquantia/AQtion/issues/32 [1]
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215798 [2]
Cc: stable@vger.kernel.org
Reported-by: koo5 <kolman.jindrich@gmail.com>
Signed-off-by: Manuel Ullmann <labre@posteo.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
DSA tags before IP header (categories 1 and 2) or after the payload (3)
might introduce offload checksum issues.
Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Ido Schimmel says:
====================
mlxsw: Introduce line card support for modular switch
Jiri says:
This patchset introduces support for modular switch systems and also
introduces mlxsw support for NVIDIA Mellanox SN4800 modular switch.
It contains 8 slots to accommodate line cards - replaceable PHY modules
which may contain gearboxes.
Currently supported line card:
16X 100GbE (QSFP28)
Other line cards that are going to be supported:
8X 200GbE (QSFP56)
4X 400GbE (QSFP-DD)
There may be other types of line cards added in the future.
To be consistent with the port split configuration (splitter cables),
the line card entities are treated in a similar way. The nature of
a line card is not "a pluggable device", but "a pluggable PHY module".
A concept of "provisioning" is introduced. The user may "provision"
certain slot with a line card type. Driver then creates all instances
(devlink ports, netdevices, etc) related to this line card type. It does
not matter if the line card is plugged-in at the time. User is able to
configure netdevices, devlink ports, setup port splitters, etc. From the
perspective of the switch ASIC, all is present and can be configured.
The carrier of netdevices stays down if the line card is not plugged-in.
Once the line card is inserted and activated, the carrier of
the related netdevices is then reflecting the physical line state,
same as for an ordinary fixed port.
Once the user no longer wants to use the line card related instances,
they can "unprovision" the slot. The driver then removes the
instances.
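The provisioning life cycle described above can be summarised in a small state model. This is a toy sketch only: the three states, struct toy_slot and the helper functions are hypothetical simplifications and do not reflect the devlink or mlxsw data structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified slot states for this sketch. */
enum toy_lc_state { TOY_LC_UNPROVISIONED, TOY_LC_PROVISIONED, TOY_LC_ACTIVE };

struct toy_slot {
	enum toy_lc_state state;
	bool netdevs_exist;	/* instances created by provisioning */
};

/* Provisioning creates the instances even if no card is plugged in. */
static void toy_provision(struct toy_slot *slot)
{
	assert(slot->state == TOY_LC_UNPROVISIONED);
	slot->netdevs_exist = true;
	slot->state = TOY_LC_PROVISIONED;
}

/* Called once an inserted line card has been activated. */
static void toy_activate(struct toy_slot *slot)
{
	if (slot->state == TOY_LC_PROVISIONED)
		slot->state = TOY_LC_ACTIVE;
}

/* Unprovisioning removes the instances again. */
static void toy_unprovision(struct toy_slot *slot)
{
	slot->netdevs_exist = false;
	slot->state = TOY_LC_UNPROVISIONED;
}

/* Carrier follows the physical link only while the line card is active. */
static bool toy_carrier(const struct toy_slot *slot, bool phys_link_up)
{
	return slot->netdevs_exist && slot->state == TOY_LC_ACTIVE && phys_link_up;
}
```

In this model a provisioned but empty slot already has configurable netdevices, yet toy_carrier() stays false until toy_activate() has run.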
Patches 1-4 are extending devlink driver API and UAPI in order to
register, show, dump, provision and activate the line card.
Patches 5-17 are implementing the introduced API in mlxsw.
The last patch adds a selftest for mlxsw line cards.
Example:
$ devlink port # No ports are listed
$ devlink lc
pci/0000:01:00.0:
lc 1 state unprovisioned
supported_types:
16x100G
lc 2 state unprovisioned
supported_types:
16x100G
lc 3 state unprovisioned
supported_types:
16x100G
lc 4 state unprovisioned
supported_types:
16x100G
lc 5 state unprovisioned
supported_types:
16x100G
lc 6 state unprovisioned
supported_types:
16x100G
lc 7 state unprovisioned
supported_types:
16x100G
lc 8 state unprovisioned
supported_types:
16x100G
Note that the driver exposes a list of supported line card types. Currently
there is only one: "16x100G".
To provision the slot #8:
$ devlink lc set pci/0000:01:00.0 lc 8 type 16x100G
$ devlink lc show pci/0000:01:00.0 lc 8
pci/0000:01:00.0:
lc 8 state active type 16x100G
supported_types:
16x100G
$ devlink port
pci/0000:01:00.0/0: type notset flavour cpu port 0 splittable false
pci/0000:01:00.0/53: type eth netdev enp1s0nl8p1 flavour physical lc 8 port 1 splittable true lanes 4
pci/0000:01:00.0/54: type eth netdev enp1s0nl8p2 flavour physical lc 8 port 2 splittable true lanes 4
pci/0000:01:00.0/55: type eth netdev enp1s0nl8p3 flavour physical lc 8 port 3 splittable true lanes 4
pci/0000:01:00.0/56: type eth netdev enp1s0nl8p4 flavour physical lc 8 port 4 splittable true lanes 4
pci/0000:01:00.0/57: type eth netdev enp1s0nl8p5 flavour physical lc 8 port 5 splittable true lanes 4
pci/0000:01:00.0/58: type eth netdev enp1s0nl8p6 flavour physical lc 8 port 6 splittable true lanes 4
pci/0000:01:00.0/59: type eth netdev enp1s0nl8p7 flavour physical lc 8 port 7 splittable true lanes 4
pci/0000:01:00.0/60: type eth netdev enp1s0nl8p8 flavour physical lc 8 port 8 splittable true lanes 4
pci/0000:01:00.0/61: type eth netdev enp1s0nl8p9 flavour physical lc 8 port 9 splittable true lanes 4
pci/0000:01:00.0/62: type eth netdev enp1s0nl8p10 flavour physical lc 8 port 10 splittable true lanes 4
pci/0000:01:00.0/63: type eth netdev enp1s0nl8p11 flavour physical lc 8 port 11 splittable true lanes 4
pci/0000:01:00.0/64: type eth netdev enp1s0nl8p12 flavour physical lc 8 port 12 splittable true lanes 4
pci/0000:01:00.0/125: type eth netdev enp1s0nl8p13 flavour physical lc 8 port 13 splittable true lanes 4
pci/0000:01:00.0/126: type eth netdev enp1s0nl8p14 flavour physical lc 8 port 14 splittable true lanes 4
pci/0000:01:00.0/127: type eth netdev enp1s0nl8p15 flavour physical lc 8 port 15 splittable true lanes 4
pci/0000:01:00.0/128: type eth netdev enp1s0nl8p16 flavour physical lc 8 port 16 splittable true lanes 4
To unprovision the slot #8:
$ devlink lc set pci/0000:01:00.0 lc 8 notype
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
provision/unprovision/activation tests
Introduce basic line card manipulation which consists of provisioning,
unprovisioning and activation of a line card.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
For each port, get the slot_index using the PMLP register. For a port
residing on a line card, associate it with the line card by setting the
mapping with the devlink_port_linecard_set() helper. Use the line card
slot index for PMTDB register queries.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In case of line card implementation, the core has to have a way to
remove relevant ports manually. Extend the Spectrum driver ops by an op
that implements port removal of selected ports upon request.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Allow to process events generated upon line card getting "ready" and
"active".
When DSDSC event with "ready" bit set is delivered, that means the
line card is powered up. Use MDDC register to push the line card to
active state. Once FW is done with that, the DSDSC event with "active"
bit set is delivered.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Introduce objects for line cards and an infrastructure around that.
Use devlink_linecard_create/destroy() to register the line card with
devlink core. Implement provisioning ops with a list of supported
line cards.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The MBCT register allows transferring binary INI codes from the host to
the management FW in chunks of at most 1KB.
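Splitting a buffer into chunks of at most 1KB can be sketched as below. This is a generic userspace illustration, assuming 1KB means 1024 bytes; toy_send_in_chunks(), the callback type and the counting callback are hypothetical and say nothing about the actual MBCT register layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_CHUNK_MAX 1024	/* assumed meaning of "1KB" in this sketch */

/* Callback that transfers one chunk; returns 0 on success. */
typedef int (*toy_send_fn)(const uint8_t *chunk, size_t len, void *ctx);

/* Walk the buffer and hand it to send() in chunks of at most TOY_CHUNK_MAX. */
static int toy_send_in_chunks(const uint8_t *data, size_t size,
			      toy_send_fn send, void *ctx)
{
	size_t off = 0;

	assert(send != NULL);

	while (off < size) {
		size_t len = size - off;
		int err;

		if (len > TOY_CHUNK_MAX)
			len = TOY_CHUNK_MAX;
		err = send(data + off, len, ctx);
		if (err)
			return err;	/* stop at the first failed chunk */
		off += len;
	}
	return 0;
}

/* Example callback that only counts what it is given. */
struct toy_stats {
	size_t chunks;
	size_t bytes;
	size_t last_len;
};

static int toy_count_chunk(const uint8_t *chunk, size_t len, void *ctx)
{
	struct toy_stats *st = ctx;

	(void)chunk;
	st->chunks++;
	st->bytes += len;
	st->last_len = len;
	return 0;
}
```

With a 2500-byte buffer the callback sees two full chunks followed by a final chunk of 452 bytes.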
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|