summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2022-07-07sfc: falcon: Use the bitmap API to allocate bitmapsChristophe JAILLET1-4/+2
Use bitmap_zalloc()/bitmap_free() instead of hand-writing them. It is less verbose and it improves the semantic. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by: Martin Habets <habetsm.xilinx@gmail.com> Link: https://lore.kernel.org/r/c62c1774e6a34bc64323ce526b385aa87c1ca575.1657049799.git.christophe.jaillet@wanadoo.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-07sfc/siena: Use the bitmap API to allocate bitmapsChristophe JAILLET1-4/+2
Use bitmap_zalloc()/bitmap_free() instead of hand-writing them. It is less verbose and it improves the semantic. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by: Martin Habets <habetsm.xilinx@gmail.com> Link: https://lore.kernel.org/r/717ba530215f4d7ce9fedcc73d98dba1f70d7f71.1657049636.git.christophe.jaillet@wanadoo.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-07net: dsa: b53: remove unnecessary spi_set_drvdata()Yang Yingliang1-2/+0
Remove unnecessary spi_set_drvdata() in b53_spi_remove(), the driver_data will be set to NULL in device_unbind_cleanup() after calling ->remove(). Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Link: https://lore.kernel.org/r/20220705131733.351962-1-yangyingliang@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-07Revert "Merge branch 'octeontx2-af-next'"Jakub Kicinski20-2876/+67
This reverts commit 2ef8e39f58f08589ab035223c2687830c0eba30f, reversing changes made to e7ce9fc9ad38773b660ef663ae98df4f93cb6a37. There are build warnings here which break the normal build due to -Werror. Ratheesh was nice enough to quickly follow up with fixes but didn't hit all the warnings I see on GCC 12 so to unlock net-next from taking patches let get this series out for now. Link: https://lore.kernel.org/r/20220707013201.1372433-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-06Merge branch 'hinic-dev_get_stats-fixes'David S. Miller4-52/+16
Qiao Ma says: ==================== net: hinic: fix bugs about dev_get_stats These patches fixes 2 bugs of hinic driver: - fix bug that ethtool get wrong stats because of hinic_{txq|rxq}_clean_stats() is called - avoid kernel hung in hinic_get_stats64() See every patch for more information. Changes in v4: - removed meaningless u64_stats_sync protection in hinic_{txq|rxq}_get_stats - merged the third patch in v2 into first one Changes in v3: - fixes a compile warning reported by kernel test robot <lkp@intel.com> Changes in v2: - fixes another 2 bugs. (v1 is a single patch, see: https://lore.kernel.org/all/07736c2b7019b6883076a06129e06e8f7c5f7154.1656487154.git.mqaio@linux.alibaba.com/). - to fix extra bugs, hinic_dev.tx_stats/rx_stats is removed, so there is no need to use spinlock or semaphore now. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-06net: hinic: avoid kernel hung in hinic_get_stats64()Qiao Ma1-4/+0
When using hinic device as a bond slave device, and reading device stats of master bond device, the kernel may hung. The kernel panic calltrace as follows: Kernel panic - not syncing: softlockup: hung tasks Call trace: native_queued_spin_lock_slowpath+0x1ec/0x31c dev_get_stats+0x60/0xcc dev_seq_printf_stats+0x40/0x120 dev_seq_show+0x1c/0x40 seq_read_iter+0x3c8/0x4dc seq_read+0xe0/0x130 proc_reg_read+0xa8/0xe0 vfs_read+0xb0/0x1d4 ksys_read+0x70/0xfc __arm64_sys_read+0x20/0x30 el0_svc_common+0x88/0x234 do_el0_svc+0x2c/0x90 el0_svc+0x1c/0x30 el0_sync_handler+0xa8/0xb0 el0_sync+0x148/0x180 And the calltrace of task that actually caused kernel hungs as follows: __switch_to+124 __schedule+548 schedule+72 schedule_timeout+348 __down_common+188 __down+24 down+104 hinic_get_stats64+44 [hinic] dev_get_stats+92 bond_get_stats+172 [bonding] dev_get_stats+92 dev_seq_printf_stats+60 dev_seq_show+24 seq_read_iter+964 seq_read+220 proc_reg_read+164 vfs_read+172 ksys_read+108 __arm64_sys_read+28 el0_svc_common+132 do_el0_svc+40 el0_svc+24 el0_sync_handler+164 el0_sync+324 When getting device stats from bond, kernel will call bond_get_stats(). It first holds the spinlock bond->stats_lock, and then call hinic_get_stats64() to collect hinic device's stats. However, hinic_get_stats64() calls `down(&nic_dev->mgmt_lock)` to protect its critical section, which may schedule current task out. And if system is under high pressure, the task cannot be woken up immediately, which eventually triggers kernel hung panic. Since previous patch has replaced hinic_dev.tx_stats/rx_stats with local variable in hinic_get_stats64(), there is nothing need to be protected by lock, so just removing down()/up() is ok. Fixes: edd384f682cc ("net-next/hinic: Add ethtool and stats") Signed-off-by: Qiao Ma <mqaio@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-06net: hinic: fix bug that ethtool get wrong statsQiao Ma4-48/+16
Function hinic_get_stats64() will do two operations: 1. reads stats from every hinic_rxq/txq and accumulates them 2. calls hinic_rxq/txq_clean_stats() to clean every rxq/txq's stats For hinic_get_stats64(), it could get right data, because it sums all data to nic_dev->rx_stats/tx_stats. But it is wrong for get_drv_queue_stats(), this function will read hinic_rxq's stats, which have been cleared to zero by hinic_get_stats64(). I have observed hinic's cleanup operation by using such command: > watch -n 1 "cat ethtool -S eth4 | tail -40" Result before: ... rxq7_pkts: 1 rxq7_bytes: 90 rxq7_errors: 0 rxq7_csum_errors: 0 rxq7_other_errors: 0 ... rxq9_pkts: 11 rxq9_bytes: 726 rxq9_errors: 0 rxq9_csum_errors: 0 rxq9_other_errors: 0 ... rxq11_pkts: 0 rxq11_bytes: 0 rxq11_errors: 0 rxq11_csum_errors: 0 rxq11_other_errors: 0 Result after a few seconds: ... rxq7_pkts: 0 rxq7_bytes: 0 rxq7_errors: 0 rxq7_csum_errors: 0 rxq7_other_errors: 0 ... rxq9_pkts: 2 rxq9_bytes: 132 rxq9_errors: 0 rxq9_csum_errors: 0 rxq9_other_errors: 0 ... rxq11_pkts: 1 rxq11_bytes: 170 rxq11_errors: 0 rxq11_csum_errors: 0 rxq11_other_errors: 0 To solve this problem, we just keep every queue's total stats in their own queue (aka hinic_{rxq|txq}), and simply sum all per-queue stats every time calling hinic_get_stats64(). With that solution, there is no need to clean per-queue stats now, and there is no need to maintain global hinic_dev.{tx|rx}_stats, too. Fixes: edd384f682cc ("net-next/hinic: Add ethtool and stats") Signed-off-by: Qiao Ma <mqaio@linux.alibaba.com> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-06Merge branch 'tls-rx-nopad-and-backlog-flushing'David S. Miller10-17/+191
Jakub Kicinski says: ==================== tls: rx: nopad and backlog flushing This small series contains the two changes I've been working towards in the previous ~50 patches a couple of months ago. The first major change is the optional "nopad" optimization. Currently TLS 1.3 Rx performs quite poorly because it does not support the "zero-copy" or rather direct decrypt to a user space buffer. Because of TLS 1.3 record padding we don't know if a record contains data or a control message until we decrypt it. Most records will contain data, tho, so the optimization is to try the decryption hoping its data and retry if it wasn't. The performance gain from doing that is significant (~40%) but if I'm completely honest the major reason is that we call skb_cow_data() on the non-"zc" path. The next series will remove the CoW, dropping the gain to only ~10%. The second change is to flush the backlog every 128kB. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-06tls: rx: periodically flush socket backlogJakub Kicinski2-0/+24
We continuously hold the socket lock during large reads and writes. This may inflate RTT and negatively impact TCP performance. Flush the backlog periodically. I tried to pick a flush period (128kB) which gives significant benefit but the max Bps rate is not yet visibly impacted. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-06selftests: tls: add selftest variant for padJakub Kicinski1-0/+15
Add a self-test variant with TLS 1.3 nopad set. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-06tls: rx: add sockopt for enabling optimistic decrypt with TLS 1.3Jakub Kicinski8-7/+122
Since optimisitic decrypt may add extra load in case of retries require socket owner to explicitly opt-in. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-06tls: rx: support optimistic decrypt to user buffer with TLS 1.3Jakub Kicinski1-9/+29
We currently don't support decrypt to user buffer with TLS 1.3 because we don't know the record type and how much padding record contains before decryption. In practice data records are by far most common and padding gets used rarely so we can assume data record, no padding, and if we find out that wasn't the case - retry the crypto in place (decrypt to skb). To safeguard from user overwriting content type and padding before we can check it attach a 1B sg entry where last byte of the record will land. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-06tls: rx: don't include tail size in data_lenJakub Kicinski1-4/+4
To make future patches easier to review make data_len contain the length of the data, without the tail. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-06Merge branch 'octeontx2-af-next'David S. Miller20-67/+2876
Ratheesh Kannoth says: ==================== octeontx2: *** Exact Match Table and Field hash *** *** Exact match table and Field hash support for CN10KB silicon *** Ratheesh Kannoth (11): These patch series enables exact match table in CN10KB silicon. Legacy silicon used NPC mcam to do packet fields/channel matching for NPC rules. NPC mcam resources exahausted as customer use case increased. Supporting many DMAC filter becomes a challenge, as RPM based filter count is less. Exact match table has 4way 2K entry table and a 32 entry fully associative cam table. Second table is to handle hash table collision overflows in 4way 2K entry table. Enabling exact match table results in KEX key to be appended with Hit/Miss status. This can be used to match in NPC mcam for a more generic rule and drop those packets than having DMAC drop rules for each DMAC entry in NPC mcam. octeontx2-af: Exact match support octeontx2-af: Exact match scan from kex profile octeontx2-af: devlink configuration support octeontx2-af: FLR handler for exact match table. octeontx2-af: Drop rules for NPC MCAM octeontx2-af: Debugsfs support for exact match. octeontx2: Modify mbox request and response structures octeontx2-af: Wrapper functions for mac addr add/del/update/reset octeontx2-af: Invoke exact match functions if supported octeontx2-pf: Add support for exact match table. octeontx2-af: Enable Exact match flag in kex profile Suman Ghosh (1): CN10KB variant of CN10K series of silicons supports a new feature where in a large protocol field (eg 128bit IPv6 DIP) can be condensed into a small hashed 32bit data. This saves a lot of space in MCAM key and allows user to add more protocol fields into the filter. A max of two such protocol data can be hashed. This patch adds support for hashing IPv6 SIP and/or DIP. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-06octeontx2-af: Enable Exact match flag in kex profileRatheesh Kannoth1-2/+3
Enabled EXACT match flag in Kex default profile. Since there is no space in key, NPC_PARSE_NIBBLE_ERRCODE is removed Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-06octeontx2-pf: Add support for exact match table.Ratheesh Kannoth4-29/+67
NPC exact match table can support more entries than RPM dmac filters. This requires field size of DMAC filter count and index to be increased. Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-06octeontx2-af: Invoke exact match functions if supportedRatheesh Kannoth2-0/+41
If exact match table is suppoted, call functions to add/del/update entries in exact match table instead of RPM dmac filters Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-06octeontx2-af: Wrapper functions for MAC addr add/del/update/resetRatheesh Kannoth2-11/+363
These functions are wrappers for mac add/addr/del/update in exact match table. These will be invoked from mbox handler routines if exact matct table is supported and enabled. Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-06octeontx2: Modify mbox request and response structuresRatheesh Kannoth3-9/+24
Exact match table modification requires wider fields as it has more number of slots to fill in. Modifying an entry in exact match table may cause hash collision and may be required to delete entry from 4-way 2K table and add to fully associative 32 entry CAM table. Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-06octeontx2-af: Debugsfs support for exact match.Ratheesh Kannoth1-0/+179
There debugfs files created. 1. General information on exact match table 2. Exact match table entries. 3. NPC mcam drop on hit count stats. Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-06octeontx2-af: Drop rules for NPC MCAMRatheesh Kannoth6-6/+476
NPC exact match table installs drop on hit rules in NPC mcam for each channel. This rule has broadcast and multicast bits cleared. Exact match bit cleared and channel bits set. If exact match table hit bit is 0, corresponding NPC mcam drop rule will be hit for the packet and will be dropped. Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-06octeontx2-af: FLR handler for exact match table.Ratheesh Kannoth3-0/+31
FLR handler should remove/free all exact match table resources corresponding to each interface. Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-06octeontx2-af: devlink configuration supportRatheesh Kannoth3-2/+100
CN10KB silicon supports Exact match feature. This feature can be disabled through devlink configuration. Devlink command fails if DMAC filter rules are already present. Once disabled, legacy RPM based DMAC filters will be configured. Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-06octeontx2-af: Exact match scan from kex profileRatheesh Kannoth2-3/+35
CN10KB silicon supports exact match table. Scanning KEX profile should check for exact match feature is enabled and then set profile masks properly. These kex profile masks are required to configure NPC MCAM drop rules. If there is a miss in exact match table, these drop rules will drop those packets. Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-06octeontx2-af: Exact match supportRatheesh Kannoth6-1/+1016
CN10KB silicon has support for exact match table. This table can be used to match maimum 64 bit value of KPU parsed output. Hit/non hit in exact match table can be used as a KEX key to NPC mcam. This patch makes use of Exact match table to increase number of DMAC filters supported. NPC mcam is no more need for each of these DMAC entries as will be populated in Exact match table. This patch implements following 1. Initialization of exact match table only for CN10KB. 2. Add/del/update interface function for exact match table. Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-06octeontx2-af: Use hashed field in MCAM keyRatheesh Kannoth12-17/+554
CN10KB variant of CN10K series of silicons supports a new feature where in a large protocol field (eg 128bit IPv6 DIP) can be condensed into a small hashed 32bit data. This saves a lot of space in MCAM key and allows user to add more protocol fields into the filter. A max of two such protocol data can be hashed. This patch adds support for hashing IPv6 SIP and/or DIP. Signed-off-by: Suman Ghosh <sumang@marvell.com> Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-06Merge branch 'nfp-tso'David S. Miller3-24/+5
Merge branch 'nfp-tso' Simon Horman says: ==================== nfp: enable TSO by default this short series enables TSO by default on all NICs supported by the NFP driver. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-06nfp: enable TSO by default for nfp netdevSimon Horman2-7/+5
We can benefit from TSO when the host CPU is not powerful enough, so enable it by default now. Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com> Reviewed-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-06nfp: allow TSO packets with metadata prepended in NFDK pathYinjun Zhang1-17/+0
Packets with metadata prepended can be correctly handled in firmware when TSO is enabled, now remove the error path and related comments. Since there's no existing firmware that uses prepended metadata, no need to add compatibility check here. Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com> Reviewed-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-06net: asix: change the type of asix_set_sw/hw_mii to staticZhengchao Shao2-22/+21
The functions of asix_set_sw/hw_mii are not called in other files, so change them to static. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Link: https://lore.kernel.org/r/20220704123448.128980-1-shaozhengchao@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-06net: dsa: felix: build as module when tc-taprio is moduleVladimir Oltean1-0/+1
felix_vsc9959.c calls taprio_offload_get() and taprio_offload_free(), symbols exported by net/sched/sch_taprio.c. As such, we must disallow building the Felix driver as built-in when the symbol exported by tc-taprio isn't present in the kernel image. Fixes: 1c9017e44af2 ("net: dsa: felix: keep reference on entire tc-taprio config") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20220704190241.1288847-2-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-06net: sched: provide shim definitions for taprio_offload_{get,free}Vladimir Oltean1-0/+17
All callers of taprio_offload_get() and taprio_offload_free() prior to the blamed commit are conditionally compiled based on CONFIG_NET_SCH_TAPRIO. felix_vsc9959.c is different; it provides vsc9959_qos_port_tas_set() even when taprio is compiled out. Provide shim definitions for the functions exported by taprio so that felix_vsc9959.c is able to compile. vsc9959_qos_port_tas_set() in that case is dead code anyway, and ocelot_port->taprio remains NULL, which is fine for the rest of the logic. Fixes: 1c9017e44af2 ("net: dsa: felix: keep reference on entire tc-taprio config") Reported-by: Colin Foster <colin.foster@in-advantage.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Colin Foster <colin.foster@in-advantage.com> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Link: https://lore.kernel.org/r/20220704190241.1288847-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-06eth: remove neterion/vxgeJakub Kicinski16-23265/+1
The last meaningful change to this driver was made by Jon in 2011. As much as we'd like to believe that this is because the code is perfect the chances are nobody is using this hardware. Because of the size of this driver there is a nontrivial maintenance cost to keeping this code around, in the last 2 years we're averaging more than 1 change a month. Some of which require nontrivial review effort, see commit 877fe9d49b74 ("Revert "drivers/net/ethernet/neterion/vxge: Fix a use-after-free bug in vxge-main.c"") for example. Let's try to remove this driver. In general, IMHO, we need to establish a clear path for shedding dead code. It will be hard to unless we have some experience trying to delete stuff. Link: https://lore.kernel.org/r/20220701044234.706229-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-06dt-bindings: net: dsa: mediatek,mt7530: Add missing 'reg' propertyRob Herring1-0/+3
The 'reg' property is missing from the mediatek,mt7530 schema which results in the following warning once 'unevaluatedProperties' is fixed: Documentation/devicetree/bindings/net/dsa/mediatek,mt7530.example.dtb: switch@0: Unevaluated properties are not allowed ('reg' was unexpected) Fixes: e0dda3119741 ("dt-bindings: net: dsa: convert binding for mediatek switches") Signed-off-by: Rob Herring <robh@kernel.org> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20220701222240.1706272-1-robh@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-05cxgb4: Use the bitmap API to allocate bitmapsChristophe JAILLET3-24/+17
Use bitmap_zalloc()/bitmap_free() instead of hand-writing them. It is less verbose and it improves the semantic. While at it, remove a useless bitmap_zero(). The bitmap is already zeroed when allocated. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/8a2168ef9871bd9c4f1cf19b8d5f7530662a5d15.1656866770.git.christophe.jaillet@wanadoo.fr Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-07-05net/mlx5: fix 32bit buildPaolo Abeni1-1/+2
We can't use the division operator on 64 bits integers, that breaks 32 bits build. Instead use the relevant helper. Fixes: 6ddac26cf763 ("net/mlx5e: Add support to modify hardware flow meter parameters") Acked-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/ecb00ddd1197b4f8a4882090206bd2eee1eb8b5b.1657005206.git.pabeni@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-07-05Merge branch 'af_unix-fix-regression-by-the-per-netns-hash-table-series'Paolo Abeni4-11/+189
Kuniyuki Iwashima says: ==================== af_unix: Fix regression by the per-netns hash table series. The series 6dd4142fb5a9 ("Merge branch 'af_unix-per-netns-socket-hash'") replaced a global hash table with per-netns tables, which caused regression reported in the links below. [0][1] When a pathname socket is visible, any socket with the same type has to be able to connect to it even in different netns. The series puts all sockets into each namespace's hash table, making it impossible to look up a visible socket in different netns. On the other hand, while dumping sockets, they are filtered by netns. To keep such code simple, let's add a new global hash table only for pathname sockets and link them with sk_bind_node. Then we can keep all sockets in each per-netns table and look up pathname sockets via the global table. [0]: https://lore.kernel.org/netdev/B2AA3091-796D-475E-9A11-0021996E1C00@linux.ibm.com/ [1]: https://lore.kernel.org/netdev/5fb8d86f-b633-7552-8ba9-41e42f07c02a@gmail.com/ Changes: v3: * 1st: Update changelog s/named/pathname/ * 2nd: Fix checkpatch.pl CHECK by --strict option v2: https://lore.kernel.org/netdev/20220702014447.93746-1-kuniyu@amazon.com/ * Add selftest v1: https://lore.kernel.org/netdev/20220701072519.96097-1-kuniyu@amazon.com/ ==================== Link: https://lore.kernel.org/r/20220702154818.66761-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-07-05selftests: net: af_unix: Test connect() with different netns.Kuniyuki Iwashima3-1/+152
This patch add a test that checks connect()ivity between two sockets: unnamed socket -> bound socket * SOCK_STREAM or SOCK_DGRAM * pathname or abstract * same or different netns Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-07-05af_unix: Put pathname sockets in the global hash table.Kuniyuki Iwashima1-10/+37
Commit cf2f225e2653 ("af_unix: Put a socket into a per-netns hash table.") accidentally broke user API for pathname sockets. A socket was able to connect() to a pathname socket whose file was visible even if they were in different network namespaces. The commit puts all sockets into a per-netns hash table. As a result, connect() to a pathname socket in a different netns fails to find it in the caller's per-netns hash table and returns -ECONNREFUSED even when the task can view the peer socket file. We can reproduce this issue by: Console A: # python3 >>> from socket import * >>> s = socket(AF_UNIX, SOCK_STREAM, 0) >>> s.bind('test') >>> s.listen(32) Console B: # ip netns add test # ip netns exec test sh # python3 >>> from socket import * >>> s = socket(AF_UNIX, SOCK_STREAM, 0) >>> s.connect('test') Note when dumping sockets by sock_diag, procfs, and bpf_iter, they are filtered only by netns. In other words, even if they are visible and connect()able, all sockets in different netns are skipped while iterating sockets. Thus, we need a fix only for finding a peer pathname socket. This patch adds a global hash table for pathname sockets, links them with sk_bind_node, and uses it in unix_find_socket_byinode(). By doing so, we can keep sockets in per-netns hash tables and dump them easily. Thanks to Sachin Sant and Leonard Crestez for reports, logs and a reproducer. Fixes: cf2f225e2653 ("af_unix: Put a socket into a per-netns hash table.") Reported-by: Sachin Sant <sachinp@linux.ibm.com> Reported-by: Leonard Crestez <cdleonard@gmail.com> Tested-by: Sachin Sant <sachinp@linux.ibm.com> Tested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Tested-by: Leonard Crestez <cdleonard@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-07-04net: hns: Fix spelling mistakes in comments.Zhang Jiaming1-2/+2
Fix spelling of 'waitting' in comments. remove unnecessary space of 'MDIO_COMMAND_REG 's'. Signed-off-by: Zhang Jiaming <jiaming@nfschina.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-04Merge branch 'nfp-vlan-strip-and-insert'David S. Miller11-40/+187
Simon Horman says: ==================== nfp: support VLAN strip and insert this series adds support to the NFP driver for HW offload of both: * RX VLAN ctag/stag strip * TX VLAN ctag insert ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-04nfp: support TX VLAN ctag insertDiana Wang5-19/+45
Add support for TX VLAN ctag insert which may be configured via ethtool. e.g. # ethtool -K $DEV tx-vlan-offload on The NIC supplies VLAN insert information as packet metadata. The fields of this VLAN metadata are gotten from sk_buff, including vlan_proto and vlan tag. Configuration control bit NFP_NET_CFG_CTRL_TXVLAN_V2 is to signal availability of ctag-insert features of the firmware. NFDK is used to communicate via PCIE to NFP-3800 based NICs while NFD3 is used for other NICs supported by the NFP driver. The metadata format on tx side of NFD3 is different from NFDK. This feature is not currently implemented for NFDK. Signed-off-by: Diana Wang <na.wang@corigine.com> Reviewed-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-04nfp: support RX VLAN ctag/stag stripDiana Wang11-22/+143
Add support for RX VLAN ctag/stag strip which may be configured via ethtool. e.g. # ethtool -K $DEV rx-vlan-offload on # ethtool -K $DEV rx-vlan-stag-hw-parse on Ctag-stripped and stag-stripped cannot be enabled at the same time because currently the kernel supports only one layer of VLAN stripping. The NIC supplies VLAN strip information as packet metadata. The fields of this VLAN metadata are: * strip flag: 1 for stripped; 0 for unstripped * tci: VLAN TCI ID * tpid: 1 for ETH_P_8021AD; 0 for ETH_P_8021Q Configuration control bits NFP_NET_CFG_CTRL_RXVLAN_V2 and NFP_NET_CFG_CTRL_RXQINQ are to signal availability of ctag-strip and stag-strip features of the firmware. Signed-off-by: Diana Wang <na.wang@corigine.com> Reviewed-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-04Merge branch 'smsc95xx-deadlock'David S. Miller1-116/+86
Lukas Wunner says: ==================== Deadlock no more in LAN95xx Second attempt at fixing a runtime resume deadlock in the LAN95xx USB driver: In short, the driver isn't using the "nopm" register accessors in portions of its runtime resume path, causing a deadlock. I'm fixing that by auto-detecting whether nopm accessors shall be used, instead of having to explicitly call them wherever it's necessary. As a byproduct, code size shrinks significantly (see diffstat below). Back in April I submitted a first attempt which was rejected by Alan Stern: https://lore.kernel.org/all/6710d8c18ff54139cdc538763ba544187c5a0cee.1651041411.git.lukas@wunner.de/ That approach only detected whether a PM callback is running concurrently, not whether the access is performed by the PM callback. I've come up with a different approach which should resolve the objection (see patch [1/3]). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-04usbnet: smsc95xx: Clean up unnecessary BUG_ON() upon register accessLukas Wunner1-4/+0
smsc95xx_read_reg() and smsc95xx_write_reg() call BUG_ON() if the struct usbnet pointer passed in is NULL. The functions have just been amended to dereference the pointer on entry. So the kernel now oopses if the pointer is NULL, eliminating the need for an explicit BUG_ON(). Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-04usbnet: smsc95xx: Clean up nopm handlingLukas Wunner1-106/+66
The LAN95xx driver has just been amended to auto-detect whether the _nopm variant of usbnet_read_cmd() / usbnet_write_cmd() shall be used. Drop all the now unnecessary open coding of that distinction. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-04usbnet: smsc95xx: Fix deadlock on runtime resumeLukas Wunner1-6/+20
Commit 05b35e7eb9a1 ("smsc95xx: add phylib support") amended smsc95xx_resume() to call phy_init_hw(). That function waits for the device to runtime resume even though it is placed in the runtime resume path, causing a deadlock. The problem is that phy_init_hw() calls down to smsc95xx_mdiobus_read(), which never uses the _nopm variant of usbnet_read_cmd(). Commit b4df480f68ae ("usbnet: smsc95xx: add reset_resume function with reset operation") causes a similar deadlock on resume if the device was already runtime suspended when entering system sleep: That's because the commit introduced smsc95xx_reset_resume(), which calls down to smsc95xx_reset(), which neglects to use _nopm accessors. Fix by auto-detecting whether a device access is performed by the suspend/resume task_struct and use the _nopm variant if so. This works because the PM core guarantees that suspend/resume callbacks are run in task context. Stacktrace for posterity: INFO: task kworker/2:1:49 blocked for more than 122 seconds. Workqueue: usb_hub_wq hub_event schedule rpm_resume __pm_runtime_resume usb_autopm_get_interface usbnet_read_cmd __smsc95xx_read_reg __smsc95xx_phy_wait_not_busy __smsc95xx_mdio_read smsc95xx_mdiobus_read __mdiobus_read mdiobus_read smsc_phy_reset phy_init_hw smsc95xx_resume usb_resume_interface usb_resume_both usb_runtime_resume __rpm_callback rpm_callback rpm_resume __pm_runtime_resume usb_autoresume_device hub_event process_one_work Fixes: b4df480f68ae ("usbnet: smsc95xx: add reset_resume function with reset operation") Signed-off-by: Lukas Wunner <lukas@wunner.de> Cc: stable@vger.kernel.org # v3.16+ Cc: Andre Edich <andre.edich@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-04net: phy: broadcom: Add support for BCM53128 internal PHYsKurt Kanzenbach2-0/+16
Add support for BCM53128 internal PHYs. These support interrupts as well as statistics. Therefore, enable the Broadcom PHY driver for them. Tested on BCM53128 switch using the mainline b53 DSA driver. Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-04dt-bindings: net: dsa: renesas,rzn1-a5psw: add interrupts descriptionClément Léger1-0/+23
Describe the switch interrupts (dlr, switch, prp, hub, pattern) which are connected to the GIC. Signed-off-by: Clément Léger <clement.leger@bootlin.com> Reviewed-by: Rob Herring <robh@kernel.org> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-04selftest: net: bridge mdb add/del entry to port that is downCasper Andersson2-0/+119
Tests that permanent mdb entries can be added/deleted on ports with state down. Signed-off-by: Casper Andersson <casper.casan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>