Age | Commit message (Collapse) | Author | Files | Lines |
|
SFD register configures FDB table. As preparation for unified bridge model,
add some required fields for future use.
In the new model, firmware no longer configures the egress VID, this
responsibility is moved to software. For layer 2 this means that software
needs to determine the egress VID for both unicast and multicast.
For unicast FDB records and unicast LAG FDB records, the VID needs to be
set via new fields in SFD - 'set_vid' and 'vid'.
Add the two mentioned fields for future use.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
SFMR register creates and configures FIDs. As preparation unified bridge
model, add some required fields for future use.
The device includes two main tables to support layer 2 multicast (i.e.,
MDB and flooding). These are the PGT (Port Group Table) and the
MPE (Multicast Port Egress) table.
- PGT is {MID -> (bitmap of local_port, SPME index)}
- MPE is {(Local port, SMPE index) -> eVID}
In Spectrum-2 and later ASICs, the SMPE index is an attribute of the FID
and programmed via new fields in SFMR register - 'smpe_valid' and 'smpe'.
Add the two mentioned fields for future use and increase the length of
the register accordingly.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
SMID register maps multicast ID (MID) into a list of local ports.
As preparation for unified bridge model, add some required fields for
future use.
The device includes two main tables to support layer 2 multicast (i.e.,
MDB and flooding). These are the PGT (Port Group Table) and the
MPE (Multicast Port Egress) table.
- PGT is {MID -> (bitmap of local_port, SPME index)}
- MPE is {(Local port, SMPE index) -> eVID}
In Spectrum-1, both indexes into the MPE table (local port and SMPE) are
derived from the PGT table. Therefore, the SMPE index needs to be
programmed as part of the PGT entry via new fields in SMID - 'smpe_valid'
and 'smpe'.
Add the two mentioned fields for future use and align the callers of
mlxsw_reg_smid2_pack() to pass zeros for SMPE fields.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The SMPE register maps {egress_port, SMPE index} -> VID.
The device includes two main tables to support layer 2 multicast (i.e.,
MDB and flooding). These are the PGT (Port Group Table) and the
MPE (Multicast Port Egress) table.
- PGT is {MID -> (bitmap of local_port, SPME index)}
- MPE is {(Local port, SMPE index) -> eVID}
In Spectrum-1, the index into the MPE table - called switch multicast to
port egress VID (SMPE) - is derived from the PGT entry, whereas in
Spectrum-2 and later ASICs it is derived from the FID.
In the legacy model, software did not interact with this table as it was
completely hidden in firmware. In the new model, software needs to
populate the table itself in order to map from {Local port, SMPE index} to
an egress VID. This is done using the SMPE register.
Add the register for future use.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
SVFA register controls the VID to FID mapping and {Port, VID} to FID
mapping for virtualized ports. As preparation for unified bridge model,
add some required fields for future use.
On ingress, after ingress ACL, a packet needs to be classified to a FID.
The key for this lookup can be one of:
1. VID. When port is not in virtual mode.
2. {RQ, VID}. When port is in virtual mode.
3. FID. When FID was set by ingress ACL.
Since RITR no longer performs ingress configuration, the ingress RIF for
the first two entry types needs to be set via new fields in SVFA -
'irif_v' and 'irif'.
Add the two mentioned fields for future use and increase the length of
the register accordingly.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
SFMR register creates and configures FIDs. As preparation for unified
bridge model, add some required fields for future use.
On ingress, after ingress ACL, a packet needs to be classified to a FID.
The key for this lookup can be one of:
1. VID. When port is not in virtual mode.
2. {RQ, VID}. When port is in virtual mode.
3. FID. When FID was set by ingress ACL.
For example, via VR_AND_FID_ACTION.
Since RITR no longer performs ingress configuration, the ingress RIF for
the last entry type needs to be set via new fields in SFMR - 'irif_v'
and 'irif'.
Add the two mentioned fields for future use.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
SFMR register creates and configures FIDs. As preparation for unified
bridge model, add a field for future use.
In the new model, RITR no longer configures the rFID used for sub-port RIFs
and it has to be created by software via SFMR. Such FIDs need to be created
with special flood indication using 'flood_rsp' field. When set, this bit
instructs the device to manage the flooding entries for this FID in a
reserved part of the port group table (PGT).
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
'Commit 6f91f4ba046e ("vmxnet3: add support for capability registers")'
added support for capability registers. These registers are used
to advertize capabilities of the device.
The patch updated the dev_caps to disable outer checksum offload if
PTCR register does not support it. However, it missed to update
other overlay offloads. This patch fixes this issue.
Fixes: 6f91f4ba046e ("vmxnet3: add support for capability registers")
Signed-off-by: Ronak Doshi <doshir@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Kuniyuki Iwashima says:
====================
raw: Fix nits of RCU conversion series.
The first patch fixes a build error by commit ba44f8182ec2 ("raw: use
more conventional iterators"), but it does not land in the net tree,
so this series is targeted to net-next. The second patch replaces some
hlist functions with sk's helper macros.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
hlist_nulls_add_head_rcu() and hlist_nulls_for_each_entry() have dedicated
macros for sk.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The trailing semicolon causes a compiler error, so let's remove it.
net/ipv4/raw.c: In function ‘raw_icmp_error’:
net/ipv4/raw.c:266:2: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
266 | struct hlist_nulls_head *hlist;
| ^~~~~~
Fixes: ba44f8182ec2 ("raw: use more conventional iterators")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Delete the redundant word 'and'.
Signed-off-by: Xiang wangx <wangxiang@cdjrlc.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Delete the redundant word 'and'.
Signed-off-by: Xiang wangx <wangxiang@cdjrlc.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Delete the redundant word 'and'.
Signed-off-by: Xiang wangx <wangxiang@cdjrlc.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This reverts commit 9386ebccfc59 ("nfp: update nfp_X logging definitions")
The reverted patch was intended to improve logging for the NFP driver by
including information such as the source code file and number in log
messages.
Unfortunately our experience is that this has not improved things as
we had hoped. The resulting logs are inconsistent with (most) other
kernel log messages. And rely on knowledge of the source code version
in order for the extra information to be useful.
Thus, revert the change.
We acknowledge that Jakub Kicinski <kuba@kernel.org> foresaw this problem.
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Russell King says:
====================
net: introduce mii_bmcr_encode_fixed()
While converting the mv88e6xxx driver to phylink pcs, it has been
noticed that we've started to have repeated cases where we convert a
speed and duplex to a BMCR value.
Rather than open coding this in multiple locations, let's provide a
helper for this - in linux/mii.h. This helper not only takes care of
the standard 10, 100 and 1000Mbps encodings, but also includes
2500Mbps (which is the same as 1000Mbps) for those users who require
that encoding as well. Unknown speeds will be encoded to 10Mbps, and
non-full duplexes will be encoded as half duplex.
This series converts the existing users to the new helper, and the
mv88e6xxx conversion will add further users in the 6352 and 639x PCS
code.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use the newly introduced mii_bmcr_encode_fixed() for the xpcs driver.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Make use of the newly introduced mii_bmcr_encode_fixed() to get the
BMCR value when setting loopback mode for the 88e1510.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
phylib can make use of the newly introduced mii_bmcr_encode_fixed()
macro, so let's convert it over.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add a function to encode a fixed speed/duplex to a BMCR value.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Eric Dumazet says:
====================
raw: RCU conversion
Using rwlock in networking code is extremely risky.
writers can starve if enough readers are constantly
grabing the rwlock.
I thought rwlock were at fault and sent this patch:
https://lkml.org/lkml/2022/6/17/272
But Peter and Linus essentially told me rwlock had to be unfair.
We need to get rid of rwlock in networking stacks.
Without this conversion, following script triggers soft lockups:
for i in {1..48}
do
ping -f -n -q 127.0.0.1 &
sleep 0.1
done
Next step will be to convert ping sockets to RCU as well.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Using rwlock in networking code is extremely risky.
writers can starve if enough readers are constantly
grabing the rwlock.
I thought rwlock were at fault and sent this patch:
https://lkml.org/lkml/2022/6/17/272
But Peter and Linus essentially told me rwlock had to be unfair.
We need to get rid of rwlock in networking code.
Without this fix, following script triggers soft lockups:
for i in {1..48}
do
ping -f -n -q 127.0.0.1 &
sleep 0.1
done
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In order to prepare the following patch,
I change raw v4 & v6 code to use more conventional
iterators.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When adjusting the PTP clock, the base time of the TAS configuration
will become unreliable. We need reset the TAS configuration by using a
new base time.
For example, if the driver gets a base time 0 of Qbv configuration from
user, and current time is 20000. The driver will set the TAS base time
to be 20000. After the PTP clock adjustment, the current time becomes
10000. If the TAS base time is still 20000, it will be a future time,
and TAS entry list will stop running. Another example, if the current
time becomes to be 10000000 after PTP clock adjust, a large time offset
can cause the hardware to hang.
This patch introduces a tas_clock_adjust() function to reset the TAS
module by using a new base time after the PTP clock adjustment. This can
avoid issues above.
Due to PTP clock adjustment can occur at any time, it may conflict with
the TAS configuration. We introduce a new TAS lock to serialize the
access to the TAS registers.
Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
QCOM_SOCINFO depends on QCOM_SMEM but is not selected, this cause some
problems with QCOM_SOCINFO getting selected with the dependency of
QCOM_SMEM not met.
To fix this remove the select in Kconfig and add additional info in the
DWMAC_IPQ806X config description.
Reported-by: kernel test robot <lkp@intel.com>
Fixes: 9ec092d2feb6 ("net: ethernet: stmmac: add missing sgmii configure for ipq806x")
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Using rwlock in networking code is extremely risky.
writers can starve if enough readers are constantly
grabing the rwlock.
I thought rwlock were at fault and sent this patch:
https://lkml.org/lkml/2022/6/17/272
But Peter and Linus essentially told me rwlock had to be unfair.
We need to get rid of rwlock in networking code.
Fixes: c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
ax25_dev_device_up() is only called during device setup, which is
done in user context. In addition, ax25_dev_device_up()
unconditionally calls ax25_register_dev_sysctl(), which already
allocates with GFP_KERNEL.
Since it is allowed to sleep in this function, here we change
ax25_dev_device_up() to use GFP_KERNEL to reduce unnecessary
out-of-memory errors.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Peter Lafreniere <pjlafren@mtu.edu>
Link: https://lore.kernel.org/r/20220616152333.9812-1-pjlafren@mtu.edu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Delete the redundant word 'the'.
Signed-off-by: Xiang wangx <wangxiang@cdjrlc.com>
Link: https://lore.kernel.org/r/20220616164155.11686-1-wangxiang@cdjrlc.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Delete the redundant word 'the'.
Signed-off-by: Xiang wangx <wangxiang@cdjrlc.com>
Link: https://lore.kernel.org/r/20220616142624.3397-1-wangxiang@cdjrlc.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Show correct pause frame parameters for nfp. These parameters cannot
be configured, so .set_pauseparam() is not implemented. With this
change:
#ethtool --show-pause enp1s0np0
Pause parameters for enp1s0np0:
Autonegotiate: off
RX: on
TX: on
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20220616133358.135305-1-simon.horman@corigine.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Rework MDIO locking to avoid potential circular locking:
WARNING: possible circular locking dependency detected
5.19.0-rc1-ar9331-00017-g3ab364c7c48c #5 Not tainted
------------------------------------------------------
kworker/u2:4/68 is trying to acquire lock:
81f3c83c (ar9331:1005:(&ar9331_mdio_regmap_config)->lock){+.+.}-{4:4}, at: regmap_write+0x50/0x8c
but task is already holding lock:
81f60494 (&bus->mdio_lock){+.+.}-{4:4}, at: mdiobus_read+0x40/0x78
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&bus->mdio_lock){+.+.}-{4:4}:
lock_acquire+0x2d4/0x360
__mutex_lock+0xf8/0x384
mutex_lock_nested+0x2c/0x38
mdiobus_write+0x44/0x80
ar9331_sw_bus_write+0x50/0xe4
_regmap_raw_write_impl+0x604/0x724
_regmap_bus_raw_write+0x9c/0xb4
_regmap_write+0xdc/0x1a0
_regmap_update_bits+0xf4/0x118
_regmap_select_page+0x108/0x138
_regmap_raw_read+0x25c/0x288
_regmap_bus_read+0x60/0x98
_regmap_read+0xd4/0x1b0
_regmap_update_bits+0xc4/0x118
regmap_update_bits_base+0x64/0x8c
ar9331_sw_irq_bus_sync_unlock+0x40/0x6c
__irq_set_handler+0x7c/0xac
ar9331_sw_irq_map+0x48/0x7c
irq_domain_associate+0x174/0x208
irq_create_mapping_affinity+0x1a8/0x230
ar9331_sw_probe+0x22c/0x388
mdio_probe+0x44/0x70
really_probe+0x200/0x424
__driver_probe_device+0x290/0x298
driver_probe_device+0x54/0xe4
__device_attach_driver+0xe4/0x130
bus_for_each_drv+0xb4/0xd8
__device_attach+0x104/0x1a4
bus_probe_device+0x48/0xc4
device_add+0x600/0x800
mdio_device_register+0x68/0xa0
of_mdiobus_register+0x2bc/0x3c4
ag71xx_probe+0x6e4/0x984
platform_probe+0x78/0xd0
really_probe+0x200/0x424
__driver_probe_device+0x290/0x298
driver_probe_device+0x54/0xe4
__driver_attach+0x17c/0x190
bus_for_each_dev+0x8c/0xd0
bus_add_driver+0x110/0x228
driver_register+0xe4/0x12c
do_one_initcall+0x104/0x2a0
kernel_init_freeable+0x250/0x288
kernel_init+0x34/0x130
ret_from_kernel_thread+0x14/0x1c
-> #0 (ar9331:1005:(&ar9331_mdio_regmap_config)->lock){+.+.}-{4:4}:
check_noncircular+0x88/0xc0
__lock_acquire+0x10bc/0x18bc
lock_acquire+0x2d4/0x360
__mutex_lock+0xf8/0x384
mutex_lock_nested+0x2c/0x38
regmap_write+0x50/0x8c
ar9331_sw_mbus_read+0x74/0x1b8
__mdiobus_read+0x90/0xec
mdiobus_read+0x50/0x78
get_phy_device+0xa0/0x18c
fwnode_mdiobus_register_phy+0x120/0x1d4
of_mdiobus_register+0x244/0x3c4
devm_of_mdiobus_register+0xe8/0x100
ar9331_sw_setup+0x16c/0x3a0
dsa_register_switch+0x7dc/0xcc0
ar9331_sw_probe+0x370/0x388
mdio_probe+0x44/0x70
really_probe+0x200/0x424
__driver_probe_device+0x290/0x298
driver_probe_device+0x54/0xe4
__device_attach_driver+0xe4/0x130
bus_for_each_drv+0xb4/0xd8
__device_attach+0x104/0x1a4
bus_probe_device+0x48/0xc4
deferred_probe_work_func+0xf0/0x10c
process_one_work+0x314/0x4d4
worker_thread+0x2a4/0x354
kthread+0x134/0x13c
ret_from_kernel_thread+0x14/0x1c
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&bus->mdio_lock);
lock(ar9331:1005:(&ar9331_mdio_regmap_config)->lock);
lock(&bus->mdio_lock);
lock(ar9331:1005:(&ar9331_mdio_regmap_config)->lock);
*** DEADLOCK ***
5 locks held by kworker/u2:4/68:
#0: 81c04eb4 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x1e4/0x4d4
#1: 81f0de78 (deferred_probe_work){+.+.}-{0:0}, at: process_one_work+0x1e4/0x4d4
#2: 81f0a880 (&dev->mutex){....}-{4:4}, at: __device_attach+0x40/0x1a4
#3: 80c8aee0 (dsa2_mutex){+.+.}-{4:4}, at: dsa_register_switch+0x5c/0xcc0
#4: 81f60494 (&bus->mdio_lock){+.+.}-{4:4}, at: mdiobus_read+0x40/0x78
stack backtrace:
CPU: 0 PID: 68 Comm: kworker/u2:4 Not tainted 5.19.0-rc1-ar9331-00017-g3ab364c7c48c #5
Workqueue: events_unbound deferred_probe_work_func
Stack : 00000056 800d4638 81f0d64c 00000004 00000018 00000000 80a20000 80a20000
80937590 81ef3858 81f0d760 3913578a 00000005 8045e824 81f0d600 a8db84cc
00000000 00000000 80937590 00000a44 00000000 00000002 00000001 ffffffff
81f0d6a4 80982d7c 0000000f 20202020 80a20000 00000001 80937590 81ef3858
81f0d760 3913578a 00000005 00000005 00000000 03bd0000 00000000 80e00000
...
Call Trace:
[<80069db0>] show_stack+0x94/0x130
[<8045e824>] dump_stack_lvl+0x54/0x8c
[<800c7fac>] check_noncircular+0x88/0xc0
[<800ca068>] __lock_acquire+0x10bc/0x18bc
[<800cb478>] lock_acquire+0x2d4/0x360
[<807b84c4>] __mutex_lock+0xf8/0x384
[<807b877c>] mutex_lock_nested+0x2c/0x38
[<804ea640>] regmap_write+0x50/0x8c
[<80501e38>] ar9331_sw_mbus_read+0x74/0x1b8
[<804fe9a0>] __mdiobus_read+0x90/0xec
[<804feac4>] mdiobus_read+0x50/0x78
[<804fcf74>] get_phy_device+0xa0/0x18c
[<804ffeb4>] fwnode_mdiobus_register_phy+0x120/0x1d4
[<805004f0>] of_mdiobus_register+0x244/0x3c4
[<804f0c50>] devm_of_mdiobus_register+0xe8/0x100
[<805017a0>] ar9331_sw_setup+0x16c/0x3a0
[<807355c8>] dsa_register_switch+0x7dc/0xcc0
[<80501468>] ar9331_sw_probe+0x370/0x388
[<804ff0c0>] mdio_probe+0x44/0x70
[<804d1848>] really_probe+0x200/0x424
[<804d1cfc>] __driver_probe_device+0x290/0x298
[<804d1d58>] driver_probe_device+0x54/0xe4
[<804d2298>] __device_attach_driver+0xe4/0x130
[<804cf048>] bus_for_each_drv+0xb4/0xd8
[<804d200c>] __device_attach+0x104/0x1a4
[<804d026c>] bus_probe_device+0x48/0xc4
[<804d108c>] deferred_probe_work_func+0xf0/0x10c
[<800a0ffc>] process_one_work+0x314/0x4d4
[<800a17fc>] worker_thread+0x2a4/0x354
[<800a9a54>] kthread+0x134/0x13c
[<8006306c>] ret_from_kernel_thread+0x14/0x1c
[
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://lore.kernel.org/r/20220616112550.877118-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Daniel Borkmann says:
====================
pull-request: bpf-next 2022-06-17
We've added 72 non-merge commits during the last 15 day(s) which contain
a total of 92 files changed, 4582 insertions(+), 834 deletions(-).
The main changes are:
1) Add 64 bit enum value support to BTF, from Yonghong Song.
2) Implement support for sleepable BPF uprobe programs, from Delyan Kratunov.
3) Add new BPF helpers to issue and check TCP SYN cookies without binding to a
socket especially useful in synproxy scenarios, from Maxim Mikityanskiy.
4) Fix libbpf's internal USDT address translation logic for shared libraries as
well as uprobe's symbol file offset calculation, from Andrii Nakryiko.
5) Extend libbpf to provide an API for textual representation of the various
map/prog/attach/link types and use it in bpftool, from Daniel Müller.
6) Provide BTF line info for RV64 and RV32 JITs, and fix a put_user bug in the
core seen in 32 bit when storing BPF function addresses, from Pu Lehui.
7) Fix libbpf's BTF pointer size guessing by adding a list of various aliases
for 'long' types, from Douglas Raillard.
8) Fix bpftool to readd setting rlimit since probing for memcg-based accounting
has been unreliable and caused a regression on COS, from Quentin Monnet.
9) Fix UAF in BPF cgroup's effective program computation triggered upon BPF link
detachment, from Tadeusz Struk.
10) Fix bpftool build bootstrapping during cross compilation which was pointing
to the wrong AR process, from Shahab Vahedi.
11) Fix logic bug in libbpf's is_pow_of_2 implementation, from Yuze Chi.
12) BPF hash map optimization to avoid grabbing spinlocks of all CPUs when there
is no free element. Also add a benchmark as reproducer, from Feng Zhou.
13) Fix bpftool's codegen to bail out when there's no BTF, from Michael Mullin.
14) Various minor cleanup and improvements all over the place.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (72 commits)
bpf: Fix bpf_skc_lookup comment wrt. return type
bpf: Fix non-static bpf_func_proto struct definitions
selftests/bpf: Don't force lld on non-x86 architectures
selftests/bpf: Add selftests for raw syncookie helpers in TC mode
bpf: Allow the new syncookie helpers to work with SKBs
selftests/bpf: Add selftests for raw syncookie helpers
bpf: Add helpers to issue and check SYN cookies in XDP
bpf: Allow helpers to accept pointers with a fixed size
bpf: Fix documentation of th_len in bpf_tcp_{gen,check}_syncookie
selftests/bpf: add tests for sleepable (uk)probes
libbpf: add support for sleepable uprobe programs
bpf: allow sleepable uprobe programs to attach
bpf: implement sleepable uprobes by chaining gps
bpf: move bpf_prog to bpf.h
libbpf: Fix internal USDT address translation logic for shared libraries
samples/bpf: Check detach prog exist or not in xdp_fwd
selftests/bpf: Avoid skipping certain subtests
selftests/bpf: Fix test_varlen verification failure with latest llvm
bpftool: Do not check return value from libbpf_set_strict_mode()
Revert "bpftool: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK"
...
====================
Link: https://lore.kernel.org/r/20220617220836.7373-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The function no longer returns 'unsigned long' as of commit edbf8c01de5a
("bpf: add skc_lookup_tcp helper").
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220617152121.29617-1-tklauser@distanz.ch
|
|
This patch does two things:
1) Marks the dynptr bpf_func_proto structs that were added in [1]
as static, as pointed out by the kernel test robot in [2].
2) There are some bpf_func_proto structs marked as extern which can
instead be statically defined.
[1] https://lore.kernel.org/bpf/20220523210712.3641569-1-joannelkoong@gmail.com/
[2] https://lore.kernel.org/bpf/62ab89f2.Pko7sI08RAKdF8R6%25lkp@intel.com/
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220616225407.1878436-1-joannelkoong@gmail.com
|
|
tipc_dest_list_len() is not being called anywhere. Clean it up.
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
JML register on probe will return zero . This register is configured
later on macb_init_hw() which is called on open.
Since we have zero, after header and FCS length subtraction we will get
negative max_mtu size. This issue was affecting DSA drivers with MTU support
(for example KSZ9477).
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Under CONFIG_FORTIFY_SOURCE=y and CONFIG_UBSAN_BOUNDS=y, Clang is bugged
here for calculating the size of the destination buffer (0x10 instead of
0x14). This copy is a fixed size (sizeof(struct fw_section_info_st)), with
the source and dest being struct fw_section_info_st, so the memcpy should
be safe, assuming the index is within bounds, which is UBSAN_BOUNDS's
responsibility to figure out.
Avoid the whole thing and just do a direct assignment. This results in
no change to the executable code.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Tom Rix <trix@redhat.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Jiri Pirko <jiri@nvidia.com>
Cc: Vladimir Oltean <olteanv@gmail.com>
Cc: Simon Horman <simon.horman@corigine.com>
Cc: netdev@vger.kernel.org
Cc: llvm@lists.linux.dev
Link: https://github.com/ClangBuiltLinux/linux/issues/1592
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org> # build
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Current kernel will compile this driver with warnings. This patch will
fix it.
drivers/net/ethernet/atheros/ag71xx.c: In function 'ag71xx_fast_reset':
drivers/net/ethernet/atheros/ag71xx.c:996:31: warning: passing argument 2 of 'ag71xx_hw_set
_macaddr' discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
996 | ag71xx_hw_set_macaddr(ag, dev->dev_addr);
| ~~~^~~~~~~~~~
drivers/net/ethernet/atheros/ag71xx.c:951:69: note: expected 'unsigned char *' but argument
is of type 'const unsigned char *'
951 | static void ag71xx_hw_set_macaddr(struct ag71xx *ag, unsigned char *mac)
| ~~~~~~~~~~~~~~~^~~
drivers/net/ethernet/atheros/ag71xx.c: In function 'ag71xx_open':
drivers/net/ethernet/atheros/ag71xx.c:1441:32: warning: passing argument 2 of 'ag71xx_hw_se
t_macaddr' discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
1441 | ag71xx_hw_set_macaddr(ag, ndev->dev_addr);
| ~~~~^~~~~~~~~~
drivers/net/ethernet/atheros/ag71xx.c:951:69: note: expected 'unsigned char *' but argument
is of type 'const unsigned char *'
951 | static void ag71xx_hw_set_macaddr(struct ag71xx *ag, unsigned char *mac)
| ~~~~~~~~~~~~~~~^~~
Fixes: adeef3e32146 ("net: constify netdev->dev_addr")
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Remove accidental dup of tcp_wmem_schedule.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Ong Boon Leong says:
====================
pcs-xpcs, stmmac: add 1000BASE-X AN for network switch
Thanks for v4 review feedback in [1] and [2]. I have changed the v5
implementation as follow.
v5 changes:
1/5 - No change from v4.
2/5 - No change from v4.
3/5 - [Fix] make xpcs_modify_changed() static and use
mdiodev_modify_changed() for cleaner code as suggested by
Russell King.
4/5 - [Fix] Use fwnode_get_phy_mode() as recommended by Andrew Lunn.
5/5 - [Fix] Make fwnode = of_fwnode_handle(priv->plat->phylink_node)
order after priv = netdev_priv(dev).
v4 changes:
1/5 - Squash v3:1/7 & 2/7 patches into v4:1/6 so that it passes build.
2/5 - [No change] same as v3:3/7
3/5 - [Fix] Fix issues identified by Russell in [1]
4/5 - [Fix] Drop v3:5/7 patch per input by Russell in [2] and make
dwmac-intel clear the ovr_an_inband flag if fixed-link
is used in ACPI _DSD.
5/5 - [No change] same as v3:7/7
For the steps to setup ACPI _DSD and checking, they are the same
as in [3]
Reference:
[1] https://patchwork.kernel.org/comment/24894239/
[2] https://patchwork.kernel.org/comment/24895330/
[3] https://patchwork.kernel.org/project/netdevbpf/cover/20220610033610.114084-1-boon.leong.ong@intel.com/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
stmmac_mdio_register() lacks fixed-link consideration and only skip PHY
scanning if it has done DT style PHY discovery. So, for DT or ACPI _DSD
setting of fixed-link, the PHY scanning should not happen.
v2: fix incorrect order related to fwnode that is not caught in non-DT
platform.
Tested-by: Emilio Riva <emilio.riva@ericsson.com>
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, phy_interface for TSN controller instance is set based on its
PCI Device ID. For SGMII PHY interface, phy_interface default to
PHY_INTERFACE_MODE_SGMII. As C37 AN supports both SGMII and 1000BASE-X
mode, we add support for 'phy-mode' ACPI _DSD for port-specific
and customer platform specific customization.
v3: use fwnode_get_phy_mode() as suggested by Andrew Lunn in
https://patchwork.kernel.org/comment/24895330/
v2:
For platform that sets 'fixed-link' using ACPI _DSD, we will unset
xpcs_an_inband within stmmac. Thanks to Russell King for his comment in
https://patchwork.kernel.org/comment/24890222/
v1:
Thanks to Andrew Lunn's guidance in
https://patchwork.kernel.org/comment/24827101/
Tested-by: Emilio Riva <emilio.riva@ericsson.com>
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
For CL37 1000BASE-X AN, DW xPCS does not support C22 method but offers
C45 vendor-specific MII MMD for programming.
We also add the ability to disable Autoneg (through ethtool for certain
network switch that supports 1000BASE-X (1000Mbps and Full-Duplex) but
not Autoneg capability.
v4: Fixes to comment from Russell King. Thanks!
https://patchwork.kernel.org/comment/24894239/
Make xpcs_modify_changed() as private, change to use
mdiodev_modify_changed() for cleaner code.
v3: Fixes to issues spotted by Russell King. Thanks!
https://patchwork.kernel.org/comment/24890210/
Use phylink_mii_c22_pcs_decode_state(), remove unnecessary
interrupt clearing and skip speed & duplex setting if AN
is enabled.
v2: Fixes to issues spotted by Russell King in v1. Thanks!
https://patchwork.kernel.org/comment/24826650/
Use phylink_mii_c22_pcs_encode_advertisement() and implement
C45 MII ADV handling since IP only support C45 access.
Tested-by: Emilio Riva <emilio.riva@ericsson.com>
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, intel_speed_mode_2500() redundantly fix-up phy_interface to
PHY_INTERFACE_MODE_SGMII if the underlying controller is in 1000Mbps
SGMII mode. The value of phy_interface has been initialized earlier.
This patch removes such redundancy to prepare for setting 1000BASE-X
mode for certain hardware platform configuration.
Also update the intel_mgbe_common_data() to include 1000BASE-X setup.
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
xpcs_config() has 'advertising' input that is required for C37 1000BASE-X
AN in later patch series. So, we prepare xpcs_do_config() for it.
For sja1105, xpcs_do_config() is used for xpcs configuration without
depending on advertising input, so set to NULL.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Ido Schimmel says:
====================
mlxsw: L3 HW stats improvements
While testing L3 HW stats [1] on top of mlxsw, two issues were found:
1. Stats cannot be enabled for more than 205 netdevs. This was fixed in
commit 4b7a632ac4e7 ("mlxsw: spectrum_cnt: Reorder counter pools").
2. ARP packets are counted as errors. Patch #1 takes care of that. See
the commit message for details.
The goal of the majority of the rest of the patches is to add selftests
that would have discovered that only about 205 netdevs can have L3 HW
stats supported, despite the HW supporting much more. The obvious place
to plug this in is the scale test framework.
The scale tests are currently testing two things: that some number of
instances of a given resource can actually be created; and that when an
attempt is made to create more than the supported amount, the failures
are noted and handled gracefully.
However the ability to allocate the resource does not mean that the
resource actually works when passing traffic. For that, make it possible
for a given scale to also test traffic.
To that end, this patchset adds traffic tests. The goal of these is to
run traffic and observe whether a sample of the allocated resource
instances actually perform their task. Traffic tests are only run on the
positive leg of the scale test (no point trying to pass traffic when the
expected outcome is that the resource will not be allocated). They are
opt-in, if a given test does not expose it, it is not run.
The patchset proceeds as follows:
- Patches #2 and #3 add to "devlink resource" support for number of
allocated RIFs, and the capacity. This is necessary, because when
evaluating how many L3 HW stats instances it should be possible to
allocate, the limiting resource on Spectrum-2 and above currently is
not the counters themselves, but actually the RIFs.
- Patch #6 adds support for invocation of a traffic test, if a given scale
tests exposes it.
- Patch #7 adds support for skipping a given scale test. Because on
Spectrum-2 and above, the limiting factor to L3 HW stats instances is
actually the number of RIFs, there is no point in running the failing leg
of a scale tests, because it would test exhaustion of RIFs, not of RIF
counters.
- With patch #8, the scale tests drivers pass the target number to the
cleanup function of a scale test.
- In patch #9, add a traffic test to the tc_flower selftests. This makes
sure that the flow counters installed with the ACLs actually do count as
they are supposed to.
- In patch #10, add a new scale selftest for RIF counter scale, including a
traffic test.
- In patch #11, the scale target for the tc_flower selftest is
dynamically set instead of being hard coded.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ca0a53dcec9495d1dc5bbc369c810c520d728373
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Instead of hard coding the scale target in the test, dynamically set it
based on the maximum number of flow counters and their current
occupancy.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This tests creates as many RIFs as possible, ideally more than there can be
RIF counters (though that is currently only possible on Spectrum-1). It
then tries to enable L3 HW stats on each of the RIFs. It also contains the
traffic test, which tries to run traffic through a log2 of those counters
and checks that the traffic is shown in the counter values.
Like with tc_flower traffic test, take a log2 subset of rules. The logic
behind picking log2 rules is that then every bit of the instantiated item's
number is exercised. This should catch issues whether they happen at the
high end, low end, or somewhere in between.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add a test that checks that the created filters do actually trigger on
matching traffic.
Exercising all the rules would be a very lengthy process. Instead, take a
log2 subset of rules. The logic behind picking log2 rules is that then
every bit of the instantiated item's number is exercised. This should catch
issues whether they happen at the high end, low end, or somewhere in
between.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The scale tests are verifying behavior of mlxsw when number of instances of
some resource reaches the ASIC capacity. The number of instances is
referred to as "target" number.
No scale tests so far needed to know this target number to clean up. E.g.
the tc_flower simply removes the clsact qdisc that all the tested filters
are hooked onto, and that takes care of collecting all the filters.
However, for the RIF counter test, which is being added in a future patch,
VLAN netdevices are created. These are created as part of the test, but of
course the cleanup needs to undo them again. For that it needs to know how
many there were. To support this usage, pass the target number to the
cleanup callback.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|