Age | Commit message (Collapse) | Author | Files | Lines |
|
At the time of introduction, in commit bdeced75b13f ("net: dsa: felix:
Add PCS operations for PHYLINK"), support for the Lynx PCS inside Felix
was relying, for USXGMII support, on the fact that get_phy_device() is
able to parse the Lynx PCS "device-in-package" registers for this C45
MDIO device and identify it correctly.
However, this was actually working somewhat by mistake (in the sense
that, even though it was detected, it was detected for the wrong
reasons).
The get_phy_c45_ids() function works by iterating through all MMDs
starting from 1 (MDIO_MMD_PMAPMD) and stops at the first one which
returns a non-zero value in the "device-in-package" register pair,
proceeding to see what that non-zero value is.
For the Felix PCS, the first MMD (1, for the PMA/PMD) returns a non-zero
value of 0xffffffff in the "device-in-package" registers. There is a
code branch which is supposed to treat this case and flag it as wrong,
and normally, this would have caught my attention when adding initial
support for this PCS:
if ((devs_in_pkg & 0x1fffffff) == 0x1fffffff) {
/* If mostly Fs, there is no device there, then let's probe
* MMD 0, as some 10G PHYs have zero Devices In package,
* e.g. Cortina CS4315/CS4340 PHY.
*/
However, this code never actually kicked in, it seems, because this
snippet from get_phy_c45_devs_in_pkg() was basically sabotaging itself,
by returning 0xfffffffe instead of 0xffffffff:
/* Bit 0 doesn't represent a device, it indicates c22 regs presence */
*devices_in_package &= ~BIT(0);
Then the rest of the code just carried on thinking "ok, MMD 1 (PMA/PMD)
says that there are 31 devices in that package, each having a device id
of ffff:ffff, that's perfectly fine, let's go ahead and probe this PHY
device".
But after cleanup commit 320ed3bf9000 ("net: phy: split
devices_in_package"), this got "fixed", and now devs_in_pkg is no longer
0xfffffffe, but 0xffffffff. So now, get_phy_device is returning -ENODEV
for the Lynx PCS, because the semantics have remained mostly unchanged:
the loop stops at the first MMD that returns a non-zero value, and that
is MMD 1.
But the Lynx PCS is simply a clause 37 PCS which implements the required
MAC-side functionality for USXGMII (when operated in C45 mode, which is
where C45 devices-in-package detection is relevant to). Of course it
will fail the PMD/PMA test (MMD 1), since it is not a PHY. But it does
implement detection for MDIO_MMD_PCS (3):
- MDIO_DEVS1=0x008a, MDIO_DEVS2=0x0000,
- MDIO_DEVID1=0x0083, MDIO_DEVID2=0xe400
Let get_phy_c45_ids() continue searching for valid MMDs, and don't
assume that every phy_device has a PMA/PMD MMD implemented.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Petr Machata says:
====================
net: sched: Do not drop root lock in tcf_qevent_handle()
Mirred currently does not mix well with blocks executed after the qdisc
root lock is taken. This includes classification blocks (such as in PRIO,
ETS, DRR qdiscs) and qevents. The locking caused by the packet mirrored by
mirred can cause deadlocks: either when the thread of execution attempts to
take the lock a second time, or when two threads end up waiting on each
other's locks.
The qevent patchset attempted to not introduce further badness of this
sort, and dropped the lock before executing the qevent block. However this
lead to too little locking and races between qdisc configuration and packet
enqueue in the RED qdisc.
Before the deadlock issues are solved in a way that can be applied across
many qdiscs reasonably easily, do for qevents what is done for the
classification blocks and just keep holding the root lock.
That is done in patch #1. Patch #2 then drops the now unnecessary root_lock
argument from Qdisc_ops.enqueue.
====================
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This reverts commit aebe4426ccaa4838f36ea805cdf7d76503e65117.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Mirred currently does not mix well with blocks executed after the qdisc
root lock is taken. This includes classification blocks (such as in PRIO,
ETS, DRR qdiscs) and qevents. The locking caused by the packet mirrored by
mirred can cause deadlocks: either when the thread of execution attempts to
take the lock a second time, or when two threads end up waiting on each
other's locks.
The qevent patchset attempted to not introduce further badness of this
sort, and dropped the lock before executing the qevent block. However this
lead to too little locking and races between qdisc configuration and packet
enqueue in the RED qdisc.
Before the deadlock issues are solved in a way that can be applied across
many qdiscs reasonably easily, do for qevents what is done for the
classification blocks and just keep holding the root lock.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The 128 bits ct_label field is matched using a 32 bit hardware register.
As such, only the lower 32 bits of ct_label field are offloaded. Change
this logic to support setting and matching higher bits too.
Map the 128 bits data to a unique 32 bits ID. Matching is done as exact
match of the mapping ID of key & mask.
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Maor Dickman <maord@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
UMR WQEs are posted in bulks, and HW is notified once per a bulk.
Reduce the number of completions by requesting such only for
the last WQE of the bulk.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Use INDIRECT_CALL_2() helper to avoid the cost of the indirect call
when/if CONFIG_RETPOLINE=y.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Use INDIRECT_CALL_2() helper to avoid the cost of the indirect call
when/if CONFIG_RETPOLINE=y.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Synchronize offloading device ESN with xfrm received SN
by updating an existing IPsec HW context with the new SN.
Signed-off-by: Raed Salem <raeds@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
On receive flow inspect received packets for IPsec offload indication
using the cqe, for IPsec offloaded packets propagate offload status
and stack handle to stack for further processing.
Supported statuses:
- Offload ok.
- Authentication failure.
- Bad trailer indication.
Connect-X IPsec does not use mlx5e_ipsec_handle_rx_cqe.
For RX only offload, we see the BW gain. Below is the iperf3
performance report on two server of 24 cores Intel(R) Xeon(R)
CPU E5-2620 v3 @ 2.40GHz with ConnectX6-DX.
We use one thread per IPsec tunnel.
---------------------------------------------------------------------
Mode | Num tunnel | BW | Send CPU util | Recv CPU util
| | (Gbps) | (Average %) | (Average %)
---------------------------------------------------------------------
Cryto offload | 1 | 4.6 | 4.2 | 14.5
---------------------------------------------------------------------
Cryto offload | 24 | 38 | 73 | 63
---------------------------------------------------------------------
Non-offload | 1 | 4 | 4 | 13
---------------------------------------------------------------------
Non-offload | 24 | 23 | 52 | 67
Signed-off-by: Raed Salem <raeds@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Introduce decrypt FT, the RX error FT and the default rules.
The IPsec RX decrypt flow table is pointed by the TTC
(Traffic Type Classifier) ESP steering rules.
The decrypt flow table has two flow groups. The first flow group
keeps the decrypt steering rule programmed via the "ip xfrm s" interface.
The second flow group has a default rule to forward all non-offloaded
ESP packet to the TTC ESP default RSS TIR.
The RX error flow table is the destination of the decrypt steering rules
in the IPsec RX decrypt flow table. It has a fixed rule with single
copy action that copies ipsec_syndrome to metadata_regB[0:6]. The IPsec
syndrome is used to filter out non-ipsec packet and to return the IPsec
crypto offload status in Rx flow. The destination of RX error flow table
is the TTC ESP default RSS TIR.
All the FTs (decrypt FT and error FT) are created only when IPsec SAs
are added. If there is no IPsec SAs, the FTs are removed.
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Add FTE actions IPsec ENCRYPT/DECRYPT
Add ipsec_obj_id field in FTE
Add new action field MLX5_ACTION_IN_FIELD_IPSEC_SYNDROME
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
This patch adds support for Connect-X IPsec crypto offload
by implementing the IPsec acceleration layer needed routines,
which delegates IPsec offloads to Connect-X routines.
In Connect-X IPsec, a Security Association (SA) is added or deleted
via allocating a HW context of an encryption/decryption key and
a HW context of a matching SA (IPsec object).
The Security Policy (SP) is added or deleted by creating matching Tx/Rx
steering rules whith an action of encryption/decryption respectively,
executed using the previously allocated SA HW context.
When new xfrm state (SA) is added:
- Use a separate crypto key HW context.
- Create a separate IPsec context in HW to inlcude the SA properties:
- aes-gcm salt.
- ICV properties (ICV length, implicit IV).
- on supported devices also update ESN.
- associate the allocated crypto key with this IPsec context.
Introduce a new compilation flag MLX5_IPSEC for it.
Downstream patches will implement the Rx,Tx steering
and will add the update esn.
Signed-off-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
This to set the base for downstream patches to support
the new IPsec implementation of the Connect-X family.
Following modifications made:
- Remove accel layer dependency from MLX5_FPGA_IPSEC.
- Introduce accel_ipsec_ops, each IPsec device will
have to support these ops.
Signed-off-by: Raed Salem <raeds@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Currently only ECPF allows enabling eswitch when SR-IOV is disabled.
Enable PF also to enable eswitch when SR-IOV is disabled.
Load VF vports when eswitch is already enabled.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
for non ECPF eswitch manager function, vports are already
enabled/disabled when eswitch is enabled/disabled respectively.
Simplify function change handler for such eswitch manager function.
Therefore, ECPF is the only one which remains PF/VF function change
handler.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
TLS runs only over Eth, and the Eth driver is the only user of
the core TLS functionality.
There is no meaning of having the core functionality without the usage
in Eth driver.
Hence, let both TLS core implementations depend on MLX5_CORE_EN,
and select MLX5_EN_TLS.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Raed Salem <raeds@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
mlx5e_accel_sk_get_rxq is only used in ktls_rx.c file which already
depends on XPS to be compiled, move it from the generic en_accel.h
header to be local in ktls_rx.c, to fix the below build break
In file included from
../drivers/net/ethernet/mellanox/mlx5/core/en_main.c:49:0:
../drivers/net/ethernet/mellanox/mlx5/core/en_accel/en_accel.h:
In function ‘mlx5e_accel_sk_get_rxq’:
../drivers/net/ethernet/mellanox/mlx5/core/en_accel/en_accel.h:153:12:
error: implicit declaration of function ‘sk_rx_queue_get’ ...
int rxq = sk_rx_queue_get(sk);
^~~~~~~~~~~~~~~
Fixes: 1182f3659357 ("net/mlx5e: kTLS, Add kTLS RX HW offload support")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
|
|
Cited commit in fixes tag missed to set the switch id of the PF and VF
ports. Due to this flow cannot be offloaded, a simple command like below
fails to offload with below error.
tc filter add dev ens2f0np0 parent ffff: prio 1 flower \
dst_mac 00:00:00:00:00:00/00:00:00:00:00:00 skip_sw \
action mirred egress redirect dev ens2f0np0pf0vf0
Error: mlx5_core: devices are not on same switch HW, can't offload forwarding.
Hence, fix it by setting switch id for each PF and VF representors port
as before the cited commit.
Fixes: 71ad8d55f8e5 ("devlink: Replace devlink_port_attrs_set parameters with a struct")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
|
|
Having the users of MSCC_OCELOT_SWITCH_LIB depend on REGMAP_MMIO was a
bad idea, since that symbol is not user-selectable. So we should have
kept a 'select REGMAP_MMIO'.
When we do that, we run into 2 more problems:
- By depending on GENERIC_PHY, we are causing a recursive dependency.
But it looks like GENERIC_PHY has no other dependencies, and other
drivers select it, so we can select it too:
drivers/of/Kconfig:69:error: recursive dependency detected!
drivers/of/Kconfig:69: symbol OF_IRQ depends on IRQ_DOMAIN
kernel/irq/Kconfig:68: symbol IRQ_DOMAIN is selected by REGMAP
drivers/base/regmap/Kconfig:7: symbol REGMAP default is visible depending on REGMAP_MMIO
drivers/base/regmap/Kconfig:39: symbol REGMAP_MMIO is selected by MSCC_OCELOT_SWITCH_LIB
drivers/net/ethernet/mscc/Kconfig:15: symbol MSCC_OCELOT_SWITCH_LIB is selected by MSCC_OCELOT_SWITCH
drivers/net/ethernet/mscc/Kconfig:22: symbol MSCC_OCELOT_SWITCH depends on GENERIC_PHY
drivers/phy/Kconfig:8: symbol GENERIC_PHY is selected by PHY_BCM_NS_USB3
drivers/phy/broadcom/Kconfig:41: symbol PHY_BCM_NS_USB3 depends on MDIO_BUS
drivers/net/phy/Kconfig:13: symbol MDIO_BUS depends on MDIO_DEVICE
drivers/net/phy/Kconfig:6: symbol MDIO_DEVICE is selected by PHYLIB
drivers/net/phy/Kconfig:254: symbol PHYLIB is selected by ARC_EMAC_CORE
drivers/net/ethernet/arc/Kconfig:19: symbol ARC_EMAC_CORE is selected by ARC_EMAC
drivers/net/ethernet/arc/Kconfig:25: symbol ARC_EMAC depends on OF_IRQ
- By depending on PHYLIB, we are causing a recursive dependency. PHYLIB
only has a single dependency, "depends on NETDEVICES", which we are
already depending on, so we can again hack our way into conformance by
turning the PHYLIB dependency into a select.
drivers/of/Kconfig:69:error: recursive dependency detected!
drivers/of/Kconfig:69: symbol OF_IRQ depends on IRQ_DOMAIN
kernel/irq/Kconfig:68: symbol IRQ_DOMAIN is selected by REGMAP
drivers/base/regmap/Kconfig:7: symbol REGMAP default is visible depending on REGMAP_MMIO
drivers/base/regmap/Kconfig:39: symbol REGMAP_MMIO is selected by MSCC_OCELOT_SWITCH_LIB
drivers/net/ethernet/mscc/Kconfig:15: symbol MSCC_OCELOT_SWITCH_LIB is selected by MSCC_OCELOT_SWITCH
drivers/net/ethernet/mscc/Kconfig:22: symbol MSCC_OCELOT_SWITCH depends on PHYLIB
drivers/net/phy/Kconfig:254: symbol PHYLIB is selected by ARC_EMAC_CORE
drivers/net/ethernet/arc/Kconfig:19: symbol ARC_EMAC_CORE is selected by ARC_EMAC
drivers/net/ethernet/arc/Kconfig:25: symbol ARC_EMAC depends on OF_IRQ
Fixes: f4d0323bae4e ("net: mscc: ocelot: convert MSCC_OCELOT_SWITCH into a library")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Drop doubled words "will" and "attach".
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/6b9f71ae-4f8e-0259-2c5d-187ddaefe6eb@infradead.org
|
|
Andrii reported that sockopt_inherit occasionally hangs up on 5.5 kernel [0].
This can happen if server_thread runs faster than the main thread.
In that case, pthread_cond_wait will wait forever because
pthread_cond_signal was executed before the main thread was blocking.
Let's move pthread_mutex_lock up a bit to make sure server_thread
runs strictly after the main thread goes to sleep.
(Not sure why this is 5.5 specific, maybe scheduling is less
deterministic? But I was able to confirm that it does indeed
happen in a VM.)
[0] https://lore.kernel.org/bpf/CAEf4BzY0-bVNHmCkMFPgObs=isUAyg-dFzGDY7QWYkmm7rmTSg@mail.gmail.com/
Reported-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200715224107.3591967-1-sdf@google.com
|
|
This reverts commit 3203c9010060 ("test_bpf: flag tests that cannot
be jited on s390").
The s390 bpf JIT previously had a restriction on the maximum program
size, which required some tests in test_bpf to be flagged as expected
failures. The program size limitation has been removed, and the tests
now pass, so these tests should no longer be flagged.
Fixes: d1242b10ff03 ("s390/bpf: Remove JITed image size limitations")
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/bpf/20200716143931.330122-1-seth.forshee@canonical.com
|
|
A busy-wait loop is used to implement waiting for bits to be copied
from the skb to the kernel buffer before retiring a block. This is
a problem on PREEMPT_RT because the copying task could be preempted
by the busy-waiting task and thus live lock in the busy-wait loop.
Replace the busy-wait logic with an rwlock_t. This provides lockdep
coverage and makes the code RT ready.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Sergey Organov says:
====================
net: fec: a few improvements
This is a collection of simple improvements that reduce and/or
simplify code. They got developed out of attempt to use DP83640 PTP
PHY connected to built-in FEC (that has its own PTP support) of the
iMX 6SX micro-controller. The primary bug-fix was now submitted
separately, and this is the rest of the changes.
NOTE: the patches are developed and tested on 4.9.146, and rebased on
top of recent 'net-next/master', where, besides visual inspection, I
only tested that they do compile.
====================
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
No need to use snprintf() on a constant string, nor using magic
constant in the fixed code was a good idea.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Acked-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Code of the form "if(x) x = 0" replaced with "x = 0".
Code of the form "if(x == a) x = a" removed.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Acked-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Initializing with 0 makes it much easier to identify time stamps from
otherwise uninitialized clock.
Initialization of PTP clock with current kernel time makes little sense as
PTP time scale differs from UTC time scale that kernel time represents.
It only leads to confusion when no actual PTP initialization happens, as
these time scales differ in a small integer number of seconds (37 at the
time of writing.)
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Acked-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
PPS feature could be useful even when hardware time stamping
of network packets is not in use, so remove offending check
for this condition from fec_ptp_enable_pps().
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Acked-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Similar to what have been done for DEVMAP, introduce tests to verify
ability to add a XDP program to an entry in a CPUMAP.
Verify CPUMAP programs can not be attached to devices as a normal
XDP program, and only programs with BPF_XDP_CPUMAP attach type can
be loaded in a CPUMAP.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/9c632fcea5382ea7b4578bd06b6eddf382c3550b.1594734381.git.lorenzo@kernel.org
|
|
Extend xdp_redirect_cpu_{usr,kern}.c adding the possibility to load
a XDP program on cpumap entries. The following options have been added:
- mprog-name: cpumap entry program name
- mprog-filename: cpumap entry program filename
- redirect-device: output interface if the cpumap program performs a
XDP_REDIRECT to an egress interface
- redirect-map: bpf map used to perform XDP_REDIRECT to an egress
interface
- mprog-disable: disable loading XDP program on cpumap entries
Add xdp_pass, xdp_drop, xdp_redirect stats accounting
Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/aa5a9a281b9dac425620fdabe82670ffb6bbdb92.1594734381.git.lorenzo@kernel.org
|
|
As for DEVMAP, support SEC("xdp_cpumap/") as a short cut for loading
the program with type BPF_PROG_TYPE_XDP and expected attach type
BPF_XDP_CPUMAP.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/33174c41993a6d860d9c7c1f280a2477ee39ed11.1594734381.git.lorenzo@kernel.org
|
|
Introduce XDP_REDIRECT support for eBPF programs attached to cpumap
entries.
This patch has been tested on Marvell ESPRESSObin using a modified
version of xdp_redirect_cpu sample in order to attach a XDP program
to CPUMAP entries to perform a redirect on the mvneta interface.
In particular the following scenario has been tested:
rq (cpu0) --> mvneta - XDP_REDIRECT (cpu0) --> CPUMAP - XDP_REDIRECT (cpu1) --> mvneta
$./xdp_redirect_cpu -p xdp_cpu_map0 -d eth0 -c 1 -e xdp_redirect \
-f xdp_redirect_kern.o -m tx_port -r eth0
tx: 285.2 Kpps rx: 285.2 Kpps
Attaching a simple XDP program on eth0 to perform XDP_TX gives
comparable results:
tx: 288.4 Kpps rx: 288.4 Kpps
Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/2cf8373a731867af302b00c4ff16c122630c4980.1594734381.git.lorenzo@kernel.org
|
|
Introduce the capability to attach an eBPF program to cpumap entries.
The idea behind this feature is to add the possibility to define on
which CPU run the eBPF program if the underlying hw does not support
RSS. Current supported verdicts are XDP_DROP and XDP_PASS.
This patch has been tested on Marvell ESPRESSObin using xdp_redirect_cpu
sample available in the kernel tree to identify possible performance
regressions. Results show there are no observable differences in
packet-per-second:
$./xdp_redirect_cpu --progname xdp_cpu_map0 --dev eth0 --cpu 1
rx: 354.8 Kpps
rx: 356.0 Kpps
rx: 356.8 Kpps
rx: 356.3 Kpps
rx: 356.6 Kpps
rx: 356.6 Kpps
rx: 356.7 Kpps
rx: 355.8 Kpps
rx: 356.8 Kpps
rx: 356.8 Kpps
Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/5c9febdf903d810b3415732e5cd98491d7d9067a.1594734381.git.lorenzo@kernel.org
|
|
As it has been already done for devmap, introduce 'struct bpf_cpumap_val'
to formalize the expected values that can be passed in for a CPUMAP.
Update cpumap code to use the struct.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/754f950674665dae6139c061d28c1d982aaf4170.1594734381.git.lorenzo@kernel.org
|
|
Do not update xdp_redirect_cpu maps running while option loop but
defer it after all available options have been parsed. This is a
preliminary patch to pass the program name we want to attach to the
map entries as a user option
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/95dc46286fd2c609042948e04bb7ae1f5b425538.1594734381.git.lorenzo@kernel.org
|
|
Move the guts of xdp_convert_buff_to_frame to a new helper,
xdp_update_frame_from_buff so it can be reused removing code duplication
Suggested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/90a68c283d7ebeb48924934c9b7ac79492300472.1594734381.git.lorenzo@kernel.org
|
|
Commit 77361825bb01 ("bpf: cpumap use ptr_ring_consume_batched") changed
away from using single frame ptr_ring dequeue (__ptr_ring_consume) to
consume a batched, but it uses a locked version, which as the comment
explain isn't needed.
Change to use the non-locked version __ptr_ring_consume_batched.
Fixes: 77361825bb01 ("bpf: cpumap use ptr_ring_consume_batched")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/a9c7d06f9a009e282209f0c8c7b2c5d9b9ad60b9.1594734381.git.lorenzo@kernel.org
|
|
Drop the doubled word "by" in a comment.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Drop doubled words in several comments.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Drop doubled word "the" in a comment.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Drop doubled word "to" in a comment.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Drop doubled words "or" and "the" in several comments.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Drop doubled word "not" in a comment.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Drop doubled words in two comments.
Fix a spello/typo.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Drop doubled words in several comments.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Drop doubled word "the" in two comments.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The word 'descriptor' is misspelled throughout the tree.
Fix it up accordingly:
decriptor -> descriptor
Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Ido Schimmel says:
====================
mlxsw: Offload tc police action
This patch set adds support for tc police action in mlxsw.
Patches #1-#2 add defines for policer bandwidth limits and resource
identifiers (e.g., maximum number of policers).
Patch #3 adds a common policer core in mlxsw. Currently it is only used
by the policy engine, but future patch sets will use it for trap
policers and storm control policers. The common core allows us to share
common logic between all policer types and abstract certain details from
the various users in mlxsw.
Patch #4 exposes the maximum number of supported policers and their
current usage to user space via devlink-resource. This provides better
visibility and also used for selftests purposes.
Patches #5-#7 gradually add support for tc police action in the policy
engine by calling into previously mentioned policer core.
Patch #8 adds a generic selftest for tc-police that can be used with
veth pairs or physical loopbacks.
Patches #9-#11 add mlxsw-specific selftests.
====================
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Test that policers shared by different tc filters are correctly
reference counted by observing policers' occupancy via devlink-resource.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|