Age | Commit message | Author | Files | Lines
2021-10-22Merge branch 'ax88796c-spi-ethernet-adapter'Jakub Kicinski13-0/+2304
Łukasz Stelmach says: ==================== AX88796C SPI Ethernet Adapter This is a driver for AX88796C Ethernet Adapter connected in SPI mode as found on ARTIK5 evaluation board. The driver has been ported from a v3.10.9 vendor kernel for ARTIK5 board. ==================== Link: https://lore.kernel.org/r/20211020182422.362647-1-l.stelmach@samsung.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-22net: ax88796c: ASIX AX88796C SPI Ethernet Adapter DriverŁukasz Stelmach11-0/+2229
ASIX AX88796[1] is a versatile Ethernet adapter chip that can be connected to a CPU with an 8/16-bit bus or with an SPI. This driver supports SPI connection. The driver has been ported from the vendor kernel for ARTIK5[2] boards. Several changes were made to adapt it to the current kernel, which include: + updated DT configuration, + clock configuration moved to DT, + new timer, ethtool and gpio APIs, + dev_* instead of pr_* and custom printk() wrappers, + removed awkward vendor power management, + introduced ethtool tunable to control SPI compression. [1] https://www.asix.com.tw/products.php?op=pItemdetail&PItemID=104;65;86&PLine=65 [2] https://git.tizen.org/cgit/profile/common/platform/kernel/linux-3.10-artik/ The other ax88796 driver is for the NE2000-compatible AX88796L chip. These chips are not compatible. Hence, two separate drivers are required. Signed-off-by: Łukasz Stelmach <l.stelmach@samsung.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-22dt-bindings: net: Add bindings for AX88796C SPI Ethernet AdapterŁukasz Stelmach1-0/+73
Add bindings for AX88796C SPI Ethernet Adapter. Signed-off-by: Łukasz Stelmach <l.stelmach@samsung.com> Reviewed-by: Rob Herring <robh@kernel.org> Acked-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-22dt-bindings: vendor-prefixes: Add asix prefixŁukasz Stelmach1-0/+2
Add the prefix for ASIX Electronics Corporation. Signed-off-by: Łukasz Stelmach <l.stelmach@samsung.com> Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-22Merge branch 'enetc-trivial-ptp-one-step-tx-timestamping-cleanups'Jakub Kicinski1-6/+2
Vladimir Oltean says: ==================== enetc: trivial PTP one-step TX timestamping cleanups These are two cleanup patches for some inconsistencies I noticed in the driver's TX ring cleanup function. ==================== Link: https://lore.kernel.org/r/20211020174220.1093032-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-22net: enetc: use the skb variable directly in enetc_clean_tx_ring()Vladimir Oltean1-2/+1
The code checks whether the skb had one-step TX timestamping enabled, in order to schedule the work item for emptying the priv->tx_skbs queue. That code checks for "tx_swbd->skb" directly, when we already had a skb retrieved using enetc_tx_swbd_get_skb(tx_swbd) - a TX software BD can also hold an XDP_TX packet or an XDP frame. But since the direct tx_swbd dereference is in an "if" block guarded by the non-NULL quality of "skb", accessing "tx_swbd->skb" directly is not wrong, just confusing. Just use the local variable named "skb". Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-22net: enetc: remove local "priv" variable in enetc_clean_tx_ring()Vladimir Oltean1-4/+1
The "priv" variable is needed in the "check_writeback" scope since commit d39823121911 ("enetc: add hardware timestamping support"). Since commit 7294380c5211 ("enetc: support PTP Sync packet one-step timestamping"), we also need "priv" in the larger function scope. So the local variable from the "if" block scope is not needed, and actually shadows the other one. Delete it. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-21net/core: Remove unused assignment operations and variableluo penghao1-2/+1
Although if_info_size is assigned, it has not been used. And the variable should also be deleted. The clang_analyzer complains as follows: net/core/rtnetlink.c:3806: warning: Although the value stored to 'if_info_size' is used in the enclosing expression, the value is never actually read from 'if_info_size'. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: luo penghao <luo.penghao@zte.com.cn> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-21net: stats: Read the statistics in ___gnet_stats_copy_basic() instead of adding.Sebastian Andrzej Siewior1-6/+37
Since the rework, the statistics code always adds up the byte and packet value(s). On 32bit architectures a seqcount_t is used in gnet_stats_basic_sync to ensure that the 64bit values are not modified during the read since two 32bit loads are required. The usage of a seqcount_t requires a lock to ensure that only one writer is active at a time. This lock leads to disabled preemption during the update. The lack of disabling preemption is now creating a warning as reported by Naresh since the query done by gnet_stats_copy_basic() is in preemptible context. For ___gnet_stats_copy_basic() there is no need to disable preemption since the update is performed on stack and can't be modified by another writer. Instead of disabling preemption, to avoid the warning, simply create a read function to just read the values and return as u64. Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Fixes: 67c9e6270f301 ("net: sched: Protect Qdisc::bstats with u64_stats") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-21Merge branch 'dsa_to_port-loops'David S. Miller11-214/+224
Vladimir Oltean says: ==================== Remove the "dsa_to_port in a loop" antipattern v1->v2: more patches v2->v3: less patches As opposed to previous series, I would now like to first refactor the DSA core, since that sees fewer patches than drivers, and make the helpers available. Since the refactoring is fairly noisy, I don't want to force it on driver maintainers right away, patches can be submitted independently. The original cover letter is below: The DSA core and drivers currently iterate too much through the port list of a switch. For example, this snippet: for (port = 0; port < ds->num_ports; port++) { if (!dsa_is_cpu_port(ds, port)) continue; ds->ops->change_tag_protocol(ds, port, tag_ops->proto); } iterates through ds->num_ports once, and then calls dsa_is_cpu_port to filter out the other types of ports. But that function has a hidden call to dsa_to_port() in it, which contains: list_for_each_entry(dp, &dst->ports, list) if (dp->ds == ds && dp->index == p) return dp; where the only thing we wanted to know in the first place was whether dp->type == DSA_PORT_TYPE_CPU or not. So it seems that the problem is that we are not iterating with the right variable. We have an "int port" but in fact need a "struct dsa_port *dp". This has started being an issue since this patch series: https://patchwork.ozlabs.org/project/netdev/cover/20191020031941.3805884-1-vivien.didelot@gmail.com/ The currently proposed set of changes iterates like this: dsa_switch_for_each_cpu_port(cpu_dp, ds) err = ds->ops->change_tag_protocol(ds, cpu_dp->index, tag_ops->proto); which iterates directly over ds->dst->ports, which is a list of struct dsa_port *dp. This makes it much easier and more efficient to check dp->type. As a nice side effect, with the proposed driver API, driver writers are now encouraged to use more efficient patterns, and not only due to less iterations through the port list. 
For example, something like this: for (port = 0; port < ds->num_ports; port++) do_something(); probably does not need to do_something() for the ports that are disabled in the device tree. But adding extra code for that would look like this: for (port = 0; port < ds->num_ports; port++) { if (dsa_is_unused_port(ds, port)) continue; do_something(); } and therefore, it is understandable that some driver writers may decide to not bother. This patch series introduces a "dsa_switch_for_each_available_port" macro which comes at no extra cost in terms of lines of code / number of braces to the driver writer, but it has the "dsa_is_unused_port" check embedded within it. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
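The cost difference described in this cover letter can be illustrated with a small userspace sketch. The struct layout and the names `to_port`, `count_cpu_ports_by_index` and `count_cpu_ports_by_dp` are simplified, hypothetical stand-ins rather than the kernel's definitions; the sketch only counts how many list nodes each pattern touches.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical stand-ins for the DSA structures. */
enum port_type { PORT_UNUSED, PORT_USER, PORT_CPU };

struct port {
	int sw;               /* index of the owning switch */
	int index;            /* port number within that switch */
	enum port_type type;
	struct port *next;    /* tree-wide port list, like dst->ports */
};

static unsigned long visits; /* list nodes touched, for comparison */

/* Analogue of dsa_to_port(): walk the list to find (sw, index). */
static struct port *to_port(struct port *head, int sw, int index)
{
	for (struct port *dp = head; dp; dp = dp->next) {
		visits++;
		if (dp->sw == sw && dp->index == index)
			return dp;
	}
	return NULL;
}

/* Old pattern: iterate by port number and look every port up again. */
static int count_cpu_ports_by_index(struct port *head, int sw, int num_ports)
{
	int n = 0;

	assert(num_ports >= 0);
	for (int port = 0; port < num_ports; port++) {
		struct port *dp = to_port(head, sw, port);

		if (dp && dp->type == PORT_CPU)
			n++;
	}
	return n;
}

/* New pattern: walk the list once and test dp->type directly. */
static int count_cpu_ports_by_dp(struct port *head, int sw)
{
	int n = 0;

	for (struct port *dp = head; dp; dp = dp->next) {
		visits++;
		if (dp->sw == sw && dp->type == PORT_CPU)
			n++;
	}
	return n;
}
```

On a single switch with four ports, the first function touches 1 + 2 + 3 + 4 list nodes while the second touches four, which is the quadratic versus linear behaviour the series is about.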
2021-10-21net: dsa: tag_8021q: make dsa_8021q_{rx,tx}_vid take dp as argumentVladimir Oltean5-22/+24
Pass a single argument to dsa_8021q_rx_vid and dsa_8021q_tx_vid that contains the necessary information from the two arguments that are currently provided: the switch and the port number. Also rename those functions so that they have a dsa_port_* prefix, since they operate on a struct dsa_port *. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-21net: dsa: tag_sja1105: do not open-code dsa_switch_for_each_portVladimir Oltean1-4/+1
Find the remaining iterators over dst->ports that only filter for the ports belonging to a certain switch, and replace those with the dsa_switch_for_each_port helper that we have now. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-21net: dsa: convert cross-chip notifiers to iterate using dpVladimir Oltean2-102/+112
The majority of cross-chip switch notifiers need to filter in some way over the type of ports: some install VLANs etc on all cascade ports. The difference is that the matching function, which filters by port type, is separate from the function where the iteration happens. So this patch needs to refactor the matching functions' prototypes as well, to take the dp as argument. In a future patch/series, I might convert dsa_towards_port to return a struct dsa_port *dp too, but at the moment it is a bit entangled with dsa_routing_port which is also used by mv88e6xxx and they both return an int port. So keep dsa_towards_port the way it is and convert it into a dp using dsa_to_port. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-21net: dsa: remove gratuitous use of dsa_is_{user,dsa,cpu}_portVladimir Oltean1-2/+2
Find the occurrences of dsa_is_{user,dsa,cpu}_port where a struct dsa_port *dp was already available in the function scope, and replace them with the dsa_port_is_{user,dsa,cpu} equivalent function which uses that dp directly and does not perform another hidden dsa_to_port(). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-21net: dsa: do not open-code dsa_switch_for_each_portVladimir Oltean1-30/+14
Find the remaining iterators over dst->ports that only filter for the ports belonging to a certain switch, and replace those with the dsa_switch_for_each_port helper that we have now. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-21net: dsa: remove the "dsa_to_port in a loop" antipattern from the coreVladimir Oltean6-56/+45
Ever since Vivien's conversion of the ds->ports array into a dst->ports list, and the introduction of dsa_to_port, iterations through the ports of a switch became quadratic whenever dsa_to_port was needed. dsa_to_port can either be called directly, or indirectly through the dsa_is_{user,cpu,dsa,unused}_port helpers. Use the newly introduced dsa_switch_for_each_port() iteration macro that works with the iterator variable being a struct dsa_port *dp directly, and not an int i. It is expensive to go from i to dp, but cheap to go from dp to i. This macro iterates through the entire ds->dst->ports list and filters by the ports belonging just to the switch provided as argument. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-21net: dsa: introduce helpers for iterating through ports using dpVladimir Oltean1-0/+28
Since the DSA conversion from the ds->ports array into the dst->ports list, the DSA API has encouraged driver writers, as well as the core itself, to write inefficient code. Currently, code that wants to filter by a specific type of port when iterating, like {!unused, user, cpu, dsa}, uses the dsa_is_*_port helper. Under the hood, this uses dsa_to_port which iterates again through dst->ports. But the driver iterates through the port list already, so the complexity is quadratic for the typical case of a single-switch tree. This patch introduces some iteration helpers where the iterator is already a struct dsa_port *dp, so that the other variant of the filtering functions, dsa_port_is_{unused,user,cpu,dsa}, can be used directly on the iterator. This eliminates the second lookup. These functions can be used both by the core and by drivers. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-21Merge branch '100GbE' of ↵David S. Miller18-154/+2021
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== 100GbE Intel Wired LAN Driver Updates 2021-10-20 Sudheer Mogilappagari says: This series introduces initial support for Application Device Queues(ADQ) in ice driver. ADQ provides traffic isolation for application flows in hardware and ability to steer traffic to a given traffic class. This helps in aligning NIC queues to application threads. Traffic classes are configured using mqprio framework of tc command and mapped to HW channels(VSIs) in the driver. The queue set of each traffic class is managed by corresponding VSI. Each traffic channel can be configured with bandwidth rate-limiting limits and is offloaded to the hardware through the mqprio framework by specifying the mode option as 'channel' and shaper option as 'bw_rlimit'. Next, the flows of application can be steered into a given traffic class using "tc filter" command. The option "skip_sw hw_tc x" indicates hw-offload of filtering and steering filtered traffic into specified TC. Non-matching traffic flows through TC0. When channel configuration are removed queue configuration is set to default and filters configured on individual traffic classes are deleted. example: $ ethtool -K eth0 hw-tc-offload on Configure 3 traffic classes and map priority 0,1,2 to TC0, TC1 and TC2 respectively. TC0 has 2 queues from offset 0 & TC1 has 8 queues from offset 2 and TC2 has 4 queues from offset 10. Enable hardware offload of channels. $ tc qdisc add dev eth0 root mqprio num_tc 3 map 0 1 2 queues \ 2@0 8@2 4@10 hw 1 mode channel $ tc qdisc show dev eth0 qdisc mqprio 8001: root tc 2 map 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 queues:(0:1) (2:9) (10:13) mode:channel Configure two filters to match based on dst ipaddr, dst tcp port and redirect to TC1 and TC2. 
$ tc qdisc add dev eth0 clsact $ tc filter add dev eth0 protocol ip ingress prio 1 flower\ dst_ip 192.168.1.1/32 ip_proto tcp dst_port 80\ skip_sw hw_tc 1 $ tc filter add dev eth0 protocol ip ingress prio 1 flower\ dst_ip 192.168.1.1/32 ip_proto tcp dst_port 5001\ skip_sw hw_tc 2 $ tc filter show dev eth0 ingress Delete traffic classes configuration: $ sudo tc qdisc del dev eth0 root ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
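The `count@offset` notation in the mqprio `queues` argument and the `(first:last)` ranges printed by `tc qdisc show` in the example above are related by simple arithmetic. A minimal sketch, assuming hypothetical type and function names (`tc_queues`, `queue_range`, `tc_queue_range`) that are not part of the ice driver or of tc:

```c
#include <assert.h>

/* Hypothetical helper type: one "count@offset" entry from the
 * mqprio "queues" argument. */
struct tc_queues {
	unsigned int count;
	unsigned int offset;
};

/* Inclusive queue range, as shown in the "queues:(first:last)" output. */
struct queue_range {
	unsigned int first;
	unsigned int last;
};

static struct queue_range tc_queue_range(struct tc_queues q)
{
	struct queue_range r;

	assert(q.count > 0); /* a traffic class needs at least one queue */
	r.first = q.offset;
	r.last = q.offset + q.count - 1;
	return r;
}
```

With the values from the example, `2@0`, `8@2` and `4@10` give the ranges (0:1), (2:9) and (10:13).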
2021-10-21Merge branch 'mscc-ocelot-all-ports-vlan-untagged-egress'David S. Miller5-94/+223
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-21net: mscc: ocelot: track the port pvid using a pointerVladimir Oltean2-25/+13
Now that we have a list of struct ocelot_bridge_vlan entries, we can rewrite the pvid logic to simply point to one of those structures, instead of having a separate structure with a "bool valid". The NULL pointer will represent the lack of a bridge pvid (not to be confused with the lack of a hardware pvid on the port, that is present at all times). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-21net: mscc: ocelot: add the local station MAC addresses in VID 0Vladimir Oltean4-15/+17
The ocelot switchdev driver does not include the CPU port in the list of flooding destinations for unknown traffic, instead that traffic is supposed to match FDB entries to reach the CPU. The addresses it installs are: (a) the station MAC address, in ocelot_probe_port() and later during runtime in ocelot_port_set_mac_address(). These are the VLAN-unaware addresses. The VLAN-aware addresses are in ocelot_vlan_vid_add(). (b) multicast addresses added with dev_mc_add() (not bridge host MDB entries) in ocelot_mc_sync() (c) multicast destination MAC addresses for MRP in ocelot_mrp_save_mac(), to make sure those are dropped (not forwarded) by the bridging service, just trapped to the CPU So we can see that the logic is slightly buggy ever since the initial commit a556c76adc05 ("net: mscc: Add initial Ocelot switch support"). This is because, when ocelot_probe_port() runs, the port pvid is 0. Then we join a VLAN-aware bridge, the pvid becomes 1, we call ocelot_port_set_mac_address(), this learns the new MAC address in VID 1 (also fails to forget the old one, since it thinks it's in VID 1, but that's not so important). Then when we leave the VLAN-aware bridge, outside world is unable to ping our new MAC address because it isn't learned in VID 0, the VLAN-unaware pvid. [ note: this is strictly based on static analysis, I don't have hardware to test. But there are also many more corner cases ] The basic idea is that we should have a separation of concerns, and the FDB entries used for standalone operation should be managed by the driver, and the FDB entries used by the bridging service should be managed by the bridge. So the standalone and VLAN-unaware bridge FDB entries should not follow the bridge PVID, because that will only be active when the bridge is VLAN-aware. So since the port pvid is coincidentally zero during probe time, just make those entries statically go to VID 0. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-21net: mscc: ocelot: allow a config where all bridge VLANs are egress-untaggedVladimir Oltean2-36/+113
At present, the ocelot driver accepts a single egress-untagged bridge VLAN, meaning that this sequence of operations: ip link add br0 type bridge vlan_filtering 1 ip link set swp0 master br0 bridge vlan add dev swp0 vid 2 pvid untagged fails because the bridge automatically installs VID 1 as a pvid & untagged VLAN, and vid 2 would be the second untagged VLAN on this port. It is necessary to delete VID 1 before proceeding to add VID 2. This limitation comes from the fact that we operate the port tag, when it has an egress-untagged VID, in the OCELOT_PORT_TAG_NATIVE mode. The ocelot switches do not have full flexibility and can either have one single VID as egress-untagged, or all of them. There are use cases for having all VLANs as egress-untagged as well, and this patch adds support for that. The change rewrites ocelot_port_set_native_vlan() into a more generic ocelot_port_manage_port_tag() function. Because the software bridge's state, transmitted to us via switchdev, can become very complex, we don't attempt to track all possible state transitions, but instead take a more declarative approach and just make ocelot_port_manage_port_tag() figure out which mode to operate in: - port is VLAN-unaware: the classified VLAN (internal, unrelated to the 802.1Q header) is not inserted into packets on egress - port is VLAN-aware: - port has tagged VLANs: -> port has no untagged VLAN: set up as pure trunk -> port has one untagged VLAN: set up as trunk port + native VLAN -> port has more than one untagged VLAN: this is an invalid config which is rejected by ocelot_vlan_prepare - port has no tagged VLANs -> set up as pure egress-untagged port We don't keep the number of tagged and untagged VLANs, we just count the structures we keep. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
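The declarative decision described in this commit message can be sketched as a pure function of the port state. This is an illustration only: the enum values and `pick_tag_mode` are hypothetical names (only OCELOT_PORT_TAG_NATIVE is mentioned in the commit text), and the sketch folds the "invalid config" case into the same function, whereas the message says the driver rejects it earlier, in ocelot_vlan_prepare.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mode names, for illustration. */
enum tag_mode {
	TAG_MODE_UNTAGGED,      /* no VLAN tag inserted on egress */
	TAG_MODE_TRUNK,         /* every VLAN is egress-tagged */
	TAG_MODE_TRUNK_NATIVE,  /* tagged, except one native VLAN */
	TAG_MODE_INVALID,       /* configuration that must be rejected */
};

static enum tag_mode pick_tag_mode(bool vlan_aware, int num_tagged,
				   int num_untagged)
{
	assert(num_tagged >= 0 && num_untagged >= 0);

	if (!vlan_aware)
		return TAG_MODE_UNTAGGED;      /* classified VLAN not inserted */
	if (num_tagged == 0)
		return TAG_MODE_UNTAGGED;      /* pure egress-untagged port */
	if (num_untagged == 0)
		return TAG_MODE_TRUNK;         /* pure trunk */
	if (num_untagged == 1)
		return TAG_MODE_TRUNK_NATIVE;  /* trunk + native VLAN */
	return TAG_MODE_INVALID;               /* tagged + several untagged */
}
```

Deriving the mode from counts each time, instead of tracking transitions, means the result cannot drift from the bridge state regardless of the order in which VLANs were added or removed.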
2021-10-21net: mscc: ocelot: convert the VLAN masks to a listVladimir Oltean2-18/+72
First and foremost, the driver currently allocates a constant sized 4K * u32 (16KB memory) array for the VLAN masks. However, a typical application might not need so many VLANs, so if we dynamically allocate the memory as needed, we might actually save some space. Secondly, we'll need to keep more advanced bookkeeping of the VLANs we have, notably we'll have to check how many untagged and how many tagged VLANs we have. This will have to stay in a structure, and allocating another 16 KB array for that is again a bit too much. So refactor the bridge VLANs in a linked list of structures. The hook points inside the driver are ocelot_vlan_member_add() and ocelot_vlan_member_del(), which previously used to operate on the ocelot->vlan_mask[vid] array element. ocelot_vlan_member_add() and ocelot_vlan_member_del() used to call ocelot_vlan_member_set() to commit to the ocelot->vlan_mask. Additionally, we had two calls to ocelot_vlan_member_set() from outside those callers, and those were directly from ocelot_vlan_init(). Those calls do not set up bridging service VLANs, instead they: - clear the VLAN table on reset - set the port pvid to the value used by this driver for VLAN-unaware standalone port operation (VID 0) So now, when we have a structure which represents actual bridge VLANs, VID 0 doesn't belong in that structure, since it is not part of the bridging layer. So delete the middle man, ocelot_vlan_member_set(), and let ocelot_vlan_init() call directly ocelot_vlant_set_mask() which forgoes any data structure and writes directly to hardware, which is all that we need. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
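The trade-off described above (a fixed 4096-entry array of u32 masks, 16 KB, versus nodes allocated only for VLANs in use) can be sketched in userspace C. The node layout and the functions `vlan_find`, `vlan_member_add` and `vlan_member_del` below are simplified, hypothetical stand-ins, not the driver's struct ocelot_bridge_vlan or its hook functions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define VLAN_N_VID 4096 /* old fixed array: 4096 entries * 4 bytes = 16 KB */

/* Hypothetical, simplified list node. */
struct bridge_vlan {
	uint16_t vid;
	uint32_t portmask;          /* bit n set: port n is a member */
	struct bridge_vlan *next;
};

static struct bridge_vlan *vlan_find(struct bridge_vlan *head, uint16_t vid)
{
	for (; head; head = head->next)
		if (head->vid == vid)
			return head;
	return NULL;
}

/* Add a port to a VLAN, allocating the node on first use. Returns 0 or -1. */
static int vlan_member_add(struct bridge_vlan **head, uint16_t vid, int port)
{
	struct bridge_vlan *v = vlan_find(*head, vid);

	assert(port >= 0 && port < 32);
	if (!v) {
		v = calloc(1, sizeof(*v));
		if (!v)
			return -1;
		v->vid = vid;
		v->next = *head;
		*head = v;
	}
	v->portmask |= UINT32_C(1) << port;
	return 0;
}

/* Remove a port from a VLAN; free the node once no port is left in it. */
static void vlan_member_del(struct bridge_vlan **head, uint16_t vid, int port)
{
	assert(port >= 0 && port < 32);
	for (struct bridge_vlan **pp = head; *pp; pp = &(*pp)->next) {
		struct bridge_vlan *v = *pp;

		if (v->vid != vid)
			continue;
		v->portmask &= ~(UINT32_C(1) << port);
		if (!v->portmask) {
			*pp = v->next;
			free(v);
		}
		return;
	}
}
```

A node per VLAN also gives a natural place for the extra per-VLAN bookkeeping the message mentions, such as which ports are tagged or untagged members.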
2021-10-21net: mscc: ocelot: add a type definition for REW_TAG_CFG_TAG_CFGVladimir Oltean2-8/+16
This is a cosmetic patch which clarifies what are the port tagging options for Ocelot switches. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-21ice: Add tc-flower filter support for channelKiran Patil6-8/+438
Add support to add/delete channel specific filter using tc-flower. For now, only supported action is "skip_sw hw_tc <tc_num>" Filter criteria is specific to channel and it can be combination of L3, L3+L4, L2+L4. Example: MATCH criteria Action --------------------------- src and/or dest IPv4[6]/mask -> Forward to "hw_tc <tc_num>" dest IPv4[6]/mask + dest L4 port -> Forward to "hw_tc <tc_num>" dest MAC + dest L4 port -> Forward to "hw_tc <tc_num>" src IPv4[6]/mask + src L4 port -> Forward to "hw_tc <tc_num>" src MAC + src L4 port -> Forward to "hw_tc <tc_num>" Adding tc-flower filter for channel using "hw_tc" ------------------------------------------------- tc qdisc add dev <ethX> clsact Above two steps are only needed the first time when adding tc-flower filter. tc filter add dev <ethX> protocol ip ingress prio 1 flower \ dst_ip 192.168.0.1/32 ip_proto tcp dst_port 5001 \ skip_sw hw_tc 1 tc filter show dev <ethX> ingress filter protocol ip pref 1 flower chain 0 filter protocol ip pref 1 flower chain 0 handle 0x1 hw_tc 1 eth_type ipv4 ip_proto tcp dst_ip 192.168.0.1 dst_port 5001 skip_sw in_hw in_hw_count 1 Delete specific filter: ------------------------- tc filter del dev <ethx> ingress pref 1 handle 0x1 flower Delete All filters: ------------------ tc filter del dev <ethX> ingress Co-developed-by: Amritha Nambiar <amritha.nambiar@intel.com> Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com> Signed-off-by: Kiran Patil <kiran.patil@intel.com> Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com> Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-21ice: enable ndo_setup_tc support for mqprio_qdiscKiran Patil5-10/+913
Add support in driver for TC_QDISC_SETUP_MQPRIO. This support enables instantiation of channels in HW using existing MQPRIO infrastructure which is extended to be offloadable. This provides a mechanism to configure dedicated set of queues for each TC. Configuring channels using "tc mqprio": -------------------------------------- tc qdisc add dev <ethX> root mqprio num_tc 3 map 0 1 2 \ queues 4@0 4@4 4@8 hw 1 mode channel Above command configures 3 TCs having 4 queues each. "hw 1 mode channel" implies offload of channel configuration to HW. When driver processes configuration received via "ndo_setup_tc: QDISC_SETUP_MQPRIO", each TC maps to HW VSI with specified queues. User can optionally specify bandwidth min and max rate limit per TC (see example below). If shaper params like min and/or max bandwidth rate limit are specified, driver configures VSI specific rate limiter in HW. Configuring channels and bandwidth shaper parameters using "tc mqprio": ---------------------------------------------------------------- tc qdisc add dev <ethX> root mqprio \ num_tc 4 map 0 1 2 3 queues 4@0 4@4 4@8 4@12 hw 1 mode channel \ shaper bw_rlimit min_rate 1Gbit 2Gbit 3Gbit 4Gbit \ max_rate 4Gbit 5Gbit 6Gbit 7Gbit Command to view configured TCs: ----------------------------- tc qdisc show dev <ethX> Deleting TCs: ------------ tc qdisc del dev <ethX> root mqprio Signed-off-by: Kiran Patil <kiran.patil@intel.com> Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com> Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com> Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-21ice: Add infrastructure for mqprio support via ndo_setup_tcKiran Patil13-137/+671
Add infrastructure required for "ndo_setup_tc:qdisc_mqprio". ice_vsi_setup is modified to configure traffic classes based on mqprio data received from the stack. This includes low-level functions to configure min, max rate-limit parameters in hardware for traffic classes. Each traffic class gets mapped to a hardware channel (VSI) which can be individually configured with different bandwidth parameters. Co-developed-by: Tarun Singh <tarun.k.singh@intel.com> Signed-off-by: Tarun Singh <tarun.k.singh@intel.com> Signed-off-by: Kiran Patil <kiran.patil@intel.com> Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com> Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com> Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-21fq_codel: generalise ce_threshold marking for subset of trafficToke Høiland-Jørgensen5-15/+25
Commit e72aeb9ee0e3 ("fq_codel: implement L4S style ce_threshold_ect1 marking") expanded the ce_threshold feature of FQ-CoDel so it can be applied to a subset of the traffic, using the ECT(1) bit of the ECN field as the classifier. However, hard-coding ECT(1) as the only classifier for this feature seems limiting, so let's expand it to be more general. To this end, change the parameter from a ce_threshold_ect1 boolean, to a one-byte selector/mask pair (ce_threshold_{selector,mask}) which is applied to the whole diffserv/ECN field in the IP header. This makes it possible to classify packets by any value in either the ECN field or the diffserv field. In particular, setting a selector of INET_ECN_ECT_1 and a mask of INET_ECN_MASK corresponds to the functionality before this patch, and a mask of ~INET_ECN_MASK allows using the selector as a straight-forward match against a diffserv code point: # apply ce_threshold to ECT(1) traffic tc qdisc replace dev eth0 root fq_codel ce_threshold 1ms ce_threshold_selector 0x1/0x3 # apply ce_threshold to ECN-capable traffic marked as diffserv AF22 tc qdisc replace dev eth0 root fq_codel ce_threshold 1ms ce_threshold_selector 0x50/0xfc Regardless of the selector chosen, the normal rules for ECN-marking of packets still apply, i.e., the flow must still declare itself ECN-capable by setting one of the bits in the ECN field to get marked at all. v2: - Add tc usage examples to patch description Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20211019174709.69081-1-toke@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
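The selector/mask classification described above amounts to a single masked comparison on the diffserv/ECN byte. A minimal sketch, assuming a hypothetical function name `ce_threshold_applies` and locally defined constants; it illustrates the matching rule from the commit message and is not the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Two-bit ECN field in the low bits of the diffserv/ECN byte. */
#define ECN_ECT_1 0x01
#define ECN_MASK  0x03

/* True when the packet's diffserv/ECN byte matches the selector/mask pair. */
static bool ce_threshold_applies(uint8_t dsfield, uint8_t selector,
				 uint8_t mask)
{
	/* A selector bit outside the mask could never match. */
	assert((selector & ~mask) == 0);
	return (dsfield & mask) == selector;
}
```

With selector `ECN_ECT_1` and mask `ECN_MASK` (0x1/0x3) only ECT(1) packets match; with 0x50/0xfc the two ECN bits are ignored and any packet carrying diffserv code point AF22 (DSCP 20, shifted left by two bits) matches.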
2021-10-20phy: micrel: ksz8041nl: do not use power down modeStefan Agner1-2/+3
Some Micrel KSZ8041NL PHY chips exhibit continuous RX errors after using the power down mode bit (0.11). If the PHY is taken out of power down mode in a certain temperature range, the PHY enters a weird state which leads to continuously reporting RX errors. In that state, the MAC is not able to receive or send any Ethernet frames and the activity LED is constantly blinking. Since Linux is using the suspend callback when the interface is taken down, ending up in that state can easily happen during a normal startup. Micrel confirmed the issue in errata DS80000700A [*], caused by abnormal clock recovery when using power down mode. Even the latest revision (A4, Revision ID 0x1513) seems to suffer that problem, and according to the errata is not going to be fixed. Remove the suspend/resume callback to avoid using the power down mode completely. [*] https://ww1.microchip.com/downloads/en/DeviceDoc/80000700A.pdf Fixes: 1a5465f5d6a2 ("phy/micrel: Add suspend/resume support to Micrel PHYs") Signed-off-by: Stefan Agner <stefan@agner.ch> Acked-by: Marcel Ziswiler <marcel.ziswiler@toradex.com> Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-20net: enetc: unmap DMA in enetc_send_cmd()Tim Gardner1-7/+11
Coverity complains of a possible dereference of a null return value. 5. returned_null: kzalloc returns NULL. [show details] 6. var_assigned: Assigning: si_data = NULL return value from kzalloc. 488 si_data = kzalloc(data_size, __GFP_DMA | GFP_KERNEL); 489 cbd.length = cpu_to_le16(data_size); 490 491 dma = dma_map_single(&priv->si->pdev->dev, si_data, 492 data_size, DMA_FROM_DEVICE); While this kzalloc() is unlikely to fail, I did notice that the function returned without unmapping si_data. Fix this by refactoring the error paths and checking for kzalloc() failure. Fixes: 888ae5a3952ba ("net: enetc: add tc flower psfp offload driver") Cc: Claudiu Manoil <claudiu.manoil@nxp.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org (open list) Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
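The shape of the fix described above (check the allocation, then route every later failure through a common unmap-and-free path) can be sketched in userspace C. Everything here is a hypothetical stand-in: `fake_map`, `fake_unmap` and `fake_send_cmd` merely imitate the roles of the DMA mapping calls and the command submission, and `query_stats` is not the driver's function.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

static int live_mappings; /* stand-in for outstanding DMA mappings */

/* Hypothetical stand-ins for mapping and unmapping a buffer. */
static void *fake_map(void *buf) { live_mappings++; return buf; }
static void fake_unmap(void *handle) { (void)handle; live_mappings--; }

/* Hypothetical command submission that fails on request. */
static int fake_send_cmd(int fail) { return fail ? -EIO : 0; }

static int query_stats(size_t size, int fail_cmd)
{
	void *buf, *handle;
	int err;

	assert(size > 0);
	buf = calloc(1, size);
	if (!buf)
		return -ENOMEM; /* check the allocation before using it */

	handle = fake_map(buf);

	err = fake_send_cmd(fail_cmd);
	if (err)
		goto out_unmap; /* never return with the buffer still mapped */

	/* ... consume the data in buf here ... */

out_unmap:
	fake_unmap(handle);
	free(buf);
	return err;
}
```

Funnelling success and failure through one label keeps the map and unmap calls paired, so the mapping count returns to zero on every path.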
2021-10-20net-core: use netdev_* calls for kernel messagesJesse Brandeburg1-12/+10
While loading a driver and changing the number of queues, I noticed this message in the kernel log: "[253489.070080] Number of in use tx queues changed invalidating tc mappings. Priority traffic classification disabled!" But I had no idea what interface was being talked about because this message used pr_warn(). After investigating, it appears we can use the netdev_* helpers already defined to create predictably formatted messages, and that already handle <unknown netdev> cases, in more of the messages in dev.c. After this change, this message (and others) will look like this: "[ 170.181093] ice 0000:3b:00.0 ens785f0: Number of in use tx queues changed invalidating tc mappings. Priority traffic classification disabled!" One goal here was not to change the message significantly from the original format so as to not break user's expectations, so I just changed messages that used pr_* and generally started with %s == dev->name. Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-20batman-adv: use eth_hw_addr_set() instead of ether_addr_copy()Jakub Kicinski1-1/+1
Commit 406f42fa0d3c ("net-next: When a bond have a massive amount of VLANs...") introduced an rbtree for faster Ethernet address look up. To maintain netdev->dev_addr in this tree we need to make all the writes to it go through appropriate helpers.

Convert batman from ether_addr_copy() to eth_hw_addr_set():

  @@
  expression dev, np;
  @@
  - ether_addr_copy(dev->dev_addr, np)
  + eth_hw_addr_set(dev, np)

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-20mac802154: use dev_addr_set() - manualJakub Kicinski2-8/+9
Commit 406f42fa0d3c ("net-next: When a bond have a massive amount of VLANs...") introduced an rbtree for faster Ethernet address look up. To maintain netdev->dev_addr in this tree we need to make all the writes to it go through appropriate helpers. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-20mac802154: use dev_addr_set()Jakub Kicinski1-1/+1
Commit 406f42fa0d3c ("net-next: When a bond have a massive amount of VLANs...") introduced an rbtree for faster Ethernet address look up. To maintain netdev->dev_addr in this tree we need to make all the writes to it go through appropriate helpers. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-20batman-adv: prepare for const netdev->dev_addrJakub Kicinski5-13/+14
netdev->dev_addr will be constant soon; make sure the qualifier is propagated through batman-adv. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-20soc: fsl: dpio: Unsigned compared against 0 in qbman_swp_set_irq_coalescing()Tim Gardner1-6/+5
Coverity complains of unsigned compare against 0. There are 2 cases in this function:

	itp = (irq_holdoff * 1000) / p->desc->qman_256_cycles_per_ns;

CID 121131 (#1 of 1): Unsigned compared against 0 (NO_EFFECT)
unsigned_compare: This less-than-zero comparison of an unsigned value is never true. itp < 0U.

	if (itp < 0 || itp > 4096) {
		max_holdoff = (p->desc->qman_256_cycles_per_ns * 4096) / 1000;
		pr_err("irq_holdoff must be between 0..%dus\n", max_holdoff);
		return -EINVAL;
	}

unsigned_compare: This less-than-zero comparison of an unsigned value is never true. irq_threshold < 0U.

	if (irq_threshold >= p->dqrr.dqrr_size || irq_threshold < 0) {
		pr_err("irq_threshold must be between 0..%d\n",
		       p->dqrr.dqrr_size - 1);
		return -EINVAL;
	}

Fix this by removing the comparisons altogether as they are incorrect. Zero is a possible value in either case. Also fix a minor comment typo and update the 2 pr_err() calls to use %u formatting as well as be more precise regarding the exact error.

Fixes: ed1d2143fee5 ("soc: fsl: dpio: add support for irq coalescing per software portal")
Cc: Ioana Ciornei <ioana.ciornei@nxp.com>
Cc: Roy Pledge <Roy.Pledge@nxp.com>
Cc: Li Yang <leoyang.li@nxp.com>
Cc: linux-kernel@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: netdev@vger.kernel.org
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
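A minimal userspace sketch of the corrected range checks follows. The function name check_coalescing() and its flat parameter list are hypothetical; the real function takes the software portal and derives itp from irq_holdoff.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Validate the two values from the report above. Both are unsigned,
 * so a "< 0" test can never be true and only the upper bounds need
 * checking. Zero is accepted in both cases.
 */
static int check_coalescing(uint32_t itp, uint32_t irq_threshold,
			    uint32_t dqrr_size)
{
	if (itp > 4096)
		return -EINVAL;

	if (irq_threshold >= dqrr_size)
		return -EINVAL;

	return 0;
}
```

A caller that passes a negative number by mistake does not slip through: converted to an unsigned type it becomes a very large value, which the upper-bound test rejects.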
2021-10-20net: dsa: qca8k: tidy for loop in setup and add cpu port checkAnsuel Smith1-30/+44
Tidy and reorganize the multiple for loops in the qca8k setup function. Change the for loops in bridge leave/join to scan all ports and skip the CPU port. No functional change intended. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-20Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queueDavid S. Miller17-286/+764
Tony Nguyen says: ==================== 100GbE Intel Wired LAN Driver Updates 2021-10-19 This series contains updates to ice driver only. Brett implements support for ndo_set_vf_rate allowing for min_tx_rate and max_tx_rate to be set for a VF. Jesse updates DIM moderation to improve latency and resolves problems with reported rate limit and extra software generated interrupts. Wojciech moves a check for trusted VFs to the correct function, disables lb_en for switchdev offloads, and refactors ethtool ops due to differences in support between PF and port representor. Cai Huoqing utilizes the helper function devm_add_action_or_reset(). Gustavo A. R. Silva converts allocations to devm_kcalloc() where applicable. Dan Carpenter propagates an error instead of returning success. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-20Merge branch 'dev_addr-conversions-part-three'David S. Miller6-17/+31
Jakub Kicinski says: ==================== ethernet: manual netdev->dev_addr conversions (part 3) Manual conversions of Ethernet drivers writing directly to netdev->dev_addr (part 3 out of 3). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-20ethernet: via-velocity: use eth_hw_addr_set()Jakub Kicinski1-1/+3
Commit 406f42fa0d3c ("net-next: When a bond have a massive amount of VLANs...") introduced an rbtree for faster Ethernet address look up. To maintain netdev->dev_addr in this tree we need to make all the writes to it go through appropriate helpers. Read the address into an array on the stack, then call eth_hw_addr_set(). Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
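The "stack array, then helper" conversion used by this group of commits can be sketched in userspace C. Everything here is a hypothetical stand-in: struct fake_netdev, fake_hw_addr_set() (playing the role of eth_hw_addr_set()), fake_read_reg() and probe_mac() are illustration names, and the update counter only represents whatever bookkeeping the real helper performs.

```c
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

struct fake_netdev {
	uint8_t dev_addr[ETH_ALEN];
	unsigned int addr_updates;	/* stands in for address tree bookkeeping */
};

/* Hypothetical stand-in for eth_hw_addr_set(): the only writer of dev_addr. */
static void fake_hw_addr_set(struct fake_netdev *dev, const uint8_t *addr)
{
	memcpy(dev->dev_addr, addr, ETH_ALEN);
	dev->addr_updates++;
}

/* Hypothetical register read; a real driver would read the device. */
static uint8_t fake_read_reg(const uint8_t *regs, int i)
{
	return regs[i];
}

static void probe_mac(struct fake_netdev *dev, const uint8_t *regs)
{
	uint8_t addr[ETH_ALEN];
	int i;

	/* Assemble the address in a local buffer first ... */
	for (i = 0; i < ETH_ALEN; i++)
		addr[i] = fake_read_reg(regs, i);

	/* ... then publish it with a single call to the helper. */
	fake_hw_addr_set(dev, addr);
}
```

Writing byte by byte into dev_addr would bypass the helper six times; the local buffer turns that into one tracked update.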
2021-10-20ethernet: via-rhine: use eth_hw_addr_set()Jakub Kicinski1-1/+3
Commit 406f42fa0d3c ("net-next: When a bond have a massive amount of VLANs...") introduced an rbtree for faster Ethernet address look up. To maintain netdev->dev_addr in this tree we need to make all the writes to it go through appropriate helpers. Read the address into an array on the stack, then call eth_hw_addr_set(). Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-20ethernet: tlan: use eth_hw_addr_set()Jakub Kicinski1-4/+6
Commit 406f42fa0d3c ("net-next: When a bond have a massive amount of VLANs...") introduced an rbtree for faster Ethernet address look up. To maintain netdev->dev_addr in this tree we need to make all the writes to it go through appropriate helpers. Read the address into an array on the stack, do the swapping, then call eth_hw_addr_set(). Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-20ethernet: tehuti: use eth_hw_addr_set()Jakub Kicinski1-2/+4
Commit 406f42fa0d3c ("net-next: When a bond have a massive amount of VLANs...") introduced an rbtree for faster Ethernet address look up. To maintain netdev->dev_addr in this tree we need to make all the writes to it go through appropriate helpers. Break the address up into an array on the stack, then call eth_hw_addr_set(). Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
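"Breaking the address up" can be illustrated with a small userspace helper. This is a sketch under assumptions: the name words_to_addr() is hypothetical, and the layout (three 16-bit words, high byte first) is chosen for illustration rather than taken from the tehuti hardware.

```c
#include <stdint.h>

#define ETH_ALEN 6

/*
 * Split three 16-bit words into six address bytes, high byte first
 * (assumed byte order). The result is meant to be handed to the
 * address helper in one call, instead of writing dev_addr piecemeal.
 */
static void words_to_addr(const uint16_t words[3], uint8_t addr[ETH_ALEN])
{
	int i;

	for (i = 0; i < 3; i++) {
		addr[i * 2] = (uint8_t)(words[i] >> 8);
		addr[i * 2 + 1] = (uint8_t)(words[i] & 0xff);
	}
}
```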
2021-10-20ethernet: stmmac: use eth_hw_addr_set()Jakub Kicinski1-2/+6
Commit 406f42fa0d3c ("net-next: When a bond have a massive amount of VLANs...") introduced an rbtree for faster Ethernet address look up. To maintain netdev->dev_addr in this tree we need to make all the writes to it go through appropriate helpers. Read the address into an array on the stack, then call eth_hw_addr_set(). Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-20ethernet: netsec: use eth_hw_addr_set()Jakub Kicinski1-7/+9
Commit 406f42fa0d3c ("net-next: When a bond have a massive amount of VLANs...") introduced an rbtree for faster Ethernet address look up. To maintain netdev->dev_addr in this tree we need to make all the writes to it go through appropriate helpers. Read the address into an array on the stack, then call eth_hw_addr_set(). Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-20Merge branch 'sja1105-next'David S. Miller5-47/+157
Vladimir Oltean says: ==================== New RGMII delay DT bindings for the SJA1105 DSA driver During recent reviews I've been telling people that new MAC drivers should adopt a certain DT binding format for RGMII delays in order to avoid conflicting interpretations. Some suggestions were better received than others, and it appears we are still far from a consensus. Part of the problem seems to be that there are still drivers that apply RGMII delays based on an incorrect interpretation of the device tree, and these serve as a bad example for others. I happen to maintain one of those drivers and I am able to test it, so I figure that one of the ways in which I can make a change is to stop providing a bad example. Therefore, this series adds support for the "rx-internal-delay-ps" and "tx-internal-delay-ps" properties inside sja1105 switch port DT nodes, and if these are present, they decide which RGMII delays the driver applies. The in-tree device trees are also updated to follow the new format, as well as the schema validator. I assume it's okay to get all changes merged in through the same tree (net-next). The DTS changes could be split if needed, though: the driver works with or without them. There is one more DTS which should be changed, which is in Shawn's tree but not in net-next: https://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux.git/tree/arch/arm64/boot/dts/freescale/fsl-lx2160a-bluebox3.dts?h=for-next For that, I'd have to send a separate patch. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-20net: dsa: sja1105: parse {rx, tx}-internal-delay-ps properties for RGMII delaysVladimir Oltean3-47/+107
This change does not fix any functional issue or address any real life use case that wasn't possible before. It is just a small step in the process of standardizing the way in which Ethernet MAC drivers may apply RGMII delays (traditionally these have been applied by PHYs, with no clear definition of what to do in the case of a fixed-link). The sja1105 driver used to apply MAC-level RGMII delays on the RX data lines when in fixed-link mode and using a phy-mode of "rgmii-rxid" or "rgmii-id" and on the TX data lines when using "rgmii-txid" or "rgmii-id". But the standard definitions don't say anything about behaving differently when the port is in fixed-link vs when it isn't, and the new device tree bindings are about having a way of applying the delays in a way that is independent of the phy-mode and of the fixed-link property. When the {rx,tx}-internal-delay-ps properties are present, use them, otherwise fall back to the old behavior and warn. One other thing to note is that the SJA1105 hardware applies a delay value in degrees rather than in picoseconds (the delay in ps changes depending on the frequency of the RGMII clock - 125 MHz at 1G, 25 MHz at 100M, 2.5 MHz at 10M). I assume that is fine: we calculate the phase shift of the internal delay lines assuming that the device tree meant gigabit, and we let the hardware scale those according to the link speed. Link: https://patchwork.kernel.org/project/netdevbpf/patch/20210723173108.459770-6-prasanna.vengateshan@microchip.com/ Link: https://patchwork.ozlabs.org/project/netdev/patch/20200616074955.GA9092@laureti-dev/#2461123 Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
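The relation between a delay in picoseconds and a phase shift in degrees can be worked through with plain arithmetic. The sketch below is not the driver's register encoding; the function names are hypothetical, and it only applies phase = 360 * delay / period to the three RGMII clock rates named above, using tenths of a degree to stay in integer math.

```c
#include <stdint.h>

/* Clock period in picoseconds for a clock given in kHz. */
static uint64_t period_ps(uint32_t clk_khz)
{
	return 1000000000ULL / clk_khz;
}

/* Phase shift, in tenths of a degree, of delay_ps at clk_khz. */
static uint32_t delay_to_decidegrees(uint32_t delay_ps, uint32_t clk_khz)
{
	return (uint32_t)((3600ULL * delay_ps) / period_ps(clk_khz));
}

/* Delay in picoseconds that a fixed phase shift produces at clk_khz. */
static uint32_t decidegrees_to_delay(uint32_t decidegrees, uint32_t clk_khz)
{
	return (uint32_t)((decidegrees * period_ps(clk_khz)) / 3600);
}
```

At 125 MHz the period is 8000 ps, so a 2000 ps delay is a 90 degree shift. The same 90 degrees corresponds to 10000 ps at 25 MHz and 100000 ps at 2.5 MHz, which is why a value computed for gigabit scales with the link speed when the hardware works in degrees.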
2021-10-20dt-bindings: net: dsa: sja1105: add {rx,tx}-internal-delay-psVladimir Oltean2-0/+46
Add a schema validator to nxp,sja1105.yaml and to dsa.yaml for explicit MAC-level RGMII delays. These properties must be per port and must be present only for a phy-mode that represents RGMII. We tell dsa.yaml that these port properties might be present, and we also define their valid values for SJA1105. We create a common definition for the RX and TX valid range, since it's quite a mouthful. We also modify the example to include the explicit RGMII delay properties. On the fixed-link ports (in the example, port 4), having these explicit delays is actually mandatory, since with the new behavior, the driver shouts that it is interpreting what delays to apply based on phy-mode. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-20dt-bindings: net: dsa: inherit the ethernet-controller DT schemaVladimir Oltean1-0/+3
Since a switch is basically a bunch of Ethernet controllers, just inherit the common schema for one to get stronger type validation of the properties of a port. For example, before this change it was valid to have a phy-mode = "xfi" even though "xfi" is not part of ethernet-controller.yaml; now it is not. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-20dt-bindings: net: dsa: sja1105: fix example so all ports have a phy-handle or fixed-linkVladimir Oltean1-0/+1
All ports require either a phy-handle or a fixed-link, and port 3 in the example didn't have one. Add it. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>