summaryrefslogtreecommitdiff
path: root/drivers/net/dsa
AgeCommit message (Collapse)AuthorFilesLines
2025-01-09net: dsa: microchip: Fix LAN937X set_ageing_time functionTristram Ha2-6/+65
[ Upstream commit bb9869043438af5b94230f94fb4c39206525d758 ] The aging count is not a simple 20-bit value but comprises a 3-bit multiplier and a 20-bit second time. The code tries to use the original multiplier which is 4 as the second count is still 300 seconds by default. As the 20-bit number is now too large for practical use there is an option to interpret it as microseconds instead of seconds. Fixes: 2c119d9982b1 ("net: dsa: microchip: add the support for set_ageing_time") Signed-off-by: Tristram Ha <tristram.ha@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20241218020224.70590-3-Tristram.Ha@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-01-09net: dsa: microchip: add ksz_rmw8() functionOleksij Rempel1-0/+5
[ Upstream commit 6f1b986a43ce9aa67b11a7e54ac75530705d04e7 ] Add ksz_rmw8(), it will be used in the next patch. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Stable-dep-of: bb9869043438 ("net: dsa: microchip: Fix LAN937X set_ageing_time function") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-01-09net: dsa: microchip: Fix KSZ9477 set_ageing_time functionTristram Ha2-14/+37
[ Upstream commit 262bfba8ab820641c8cfbbf03b86d6c00242c078 ] The aging count is not a simple 11-bit value but comprises a 3-bit multiplier and an 8-bit second count. The code tries to use the original multiplier which is 4 as the second count is still 300 seconds by default. Fixes: 2c119d9982b1 ("net: dsa: microchip: add the support for set_ageing_time") Signed-off-by: Tristram Ha <tristram.ha@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20241218020224.70590-2-Tristram.Ha@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-19net: dsa: felix: fix stuck CPU-injected packets with short taprio windowsVladimir Oltean1-6/+11
[ Upstream commit acfcdb78d5d4cdb78e975210c8825b9a112463f6 ] With this port schedule: tc qdisc replace dev $send_if parent root handle 100 taprio \ num_tc 8 queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \ map 0 1 2 3 4 5 6 7 \ base-time 0 cycle-time 10000 \ sched-entry S 01 1250 \ sched-entry S 02 1250 \ sched-entry S 04 1250 \ sched-entry S 08 1250 \ sched-entry S 10 1250 \ sched-entry S 20 1250 \ sched-entry S 40 1250 \ sched-entry S 80 1250 \ flags 2 ptp4l would fail to take TX timestamps of Pdelay_Resp messages like this: increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug ptp4l[4134.168]: port 2: send peer delay response failed It turns out that the driver can't take their TX timestamps because it can't transmit them in the first place. And there's nothing special about the Pdelay_Resp packets - they're just regular 68 byte packets. But with this taprio configuration, the switch would refuse to send even the ETH_ZLEN minimum packet size. This should have definitely not been the case. When applying the taprio config, the driver prints: mscc_felix 0000:00:00.5: port 0 tc 0 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS mscc_felix 0000:00:00.5: port 0 tc 1 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS mscc_felix 0000:00:00.5: port 0 tc 2 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS mscc_felix 0000:00:00.5: port 0 tc 3 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS mscc_felix 0000:00:00.5: port 0 tc 4 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS mscc_felix 0000:00:00.5: port 0 tc 5 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS mscc_felix 0000:00:00.5: port 0 tc 6 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS mscc_felix 0000:00:00.5: port 0 tc 7 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS and thus, everything under 132 bytes - ETH_FCS_LEN should have been sent without problems. Yet it's not. For the forwarding path, the configuration is fine, yet packets injected from Linux get stuck with this schedule no matter what. The first hint that the static guard bands are the cause of the problem is that reverting Michael Walle's commit 297c4de6f780 ("net: dsa: felix: re-enable TAS guard band mode") made things work. It must be that the guard bands are calculated incorrectly. I remembered that there is a magic constant in the driver, set to 33 ns for no logical reason other than experimentation, which says "never let the static guard bands get so large as to leave less than this amount of remaining space in the time slot, because the queue system will refuse to schedule packets otherwise, and they will get stuck". I had a hunch that my previous experimentally-determined value was only good for packets coming from the forwarding path, and that the CPU injection path needed more. I came to the new value of 35 ns through binary search, after seeing that with 544 ns (the bit time required to send the Pdelay_Resp packet at gigabit) it works. Again, this is purely experimental, there's no logic and the manual doesn't say anything. The new driver prints for this schedule look like this: mscc_felix 0000:00:00.5: port 0 tc 0 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS mscc_felix 0000:00:00.5: port 0 tc 1 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS mscc_felix 0000:00:00.5: port 0 tc 2 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS mscc_felix 0000:00:00.5: port 0 tc 3 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS mscc_felix 0000:00:00.5: port 0 tc 4 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS mscc_felix 0000:00:00.5: port 0 tc 5 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS mscc_felix 0000:00:00.5: port 0 tc 6 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS mscc_felix 0000:00:00.5: port 0 tc 7 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS So yes, the maximum MTU is now even smaller by 1 byte than before. This is maybe counter-intuitive, but makes more sense with a diagram of one time slot. Before: Gate open Gate close | | v 1250 ns total time slot duration v <----------------------------------------------------> <----><----------------------------------------------> 33 ns 1217 ns static guard band useful Gate open Gate close | | v 1250 ns total time slot duration v <----------------------------------------------------> <-----><---------------------------------------------> 35 ns 1215 ns static guard band useful The static guard band implemented by this switch hardware directly determines the maximum allowable MTU for that traffic class. The larger it is, the earlier the switch will stop scheduling frames for transmission, because otherwise they might overrun the gate close time (and avoiding that is the entire purpose of Michael's patch). So, we now have guard bands smaller by 2 ns, thus, in this particular case, we lose a byte of the maximum MTU. Fixes: 11afdc6526de ("net: dsa: felix: tc-taprio intervals smaller than MTU should send at least one packet") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Michael Walle <mwalle@kernel.org> Link: https://patch.msgid.link/20241210132640.3426788-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-14dsa: qca8k: Use nested lock to avoid splatAndrew Lunn1-1/+1
[ Upstream commit 078e0d596f7b5952dad8662ace8f20ed2165e2ce ] qca8k_phy_eth_command() is used to probe the child MDIO bus while the parent MDIO is locked. This causes lockdep splat, reporting a possible deadlock. It is not an actually deadlock, because different locks are used. By making use of mutex_lock_nested() we can avoid this false positive. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20241110175955.3053664-1-andrew@lunn.ch Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-11-01net: dsa: mv88e6xxx: Fix error when setting port policy on mv88e6393xPeter Rashleigh1-0/+1
[ Upstream commit 12bc14949c4a7272b509af0f1022a0deeb215fd8 ] mv88e6393x_port_set_policy doesn't correctly shift the ptr value when converting the policy format between the old and new styles, so the target register ends up with the ptr being written over the data bits. Shift the pointer to align with the format expected by mv88e6393x_port_policy_write(). Fixes: 6584b26020fc ("net: dsa: mv88e6xxx: implement .port_set_policy for Amethyst") Signed-off-by: Peter Rashleigh <peter@rashleigh.ca> Reviewed-by: Simon Horman <horms@kernel.org> Message-ID: <20241016040822.3917-1-peter@rashleigh.ca> Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-17net: dsa: lan9303: ensure chip reset and wait for READY statusAnatolij Gustschin1-0/+29
commit 5c14e51d2d7df49fe0d4e64a12c58d2542f452ff upstream. Accessing device registers seems to be not reliable, the chip revision is sometimes detected wrongly (0 instead of expected 1). Ensure that the chip reset is performed via reset GPIO and then wait for 'Device Ready' status in HW_CFG register before doing any register initializations. Cc: stable@vger.kernel.org Fixes: a1292595e006 ("net: dsa: add new DSA switch driver for the SMSC-LAN9303") Signed-off-by: Anatolij Gustschin <agust@denx.de> [alex: reworked using read_poll_timeout()] Signed-off-by: Alexander Sverdlin <alexander.sverdlin@siemens.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://patch.msgid.link/20241004113655.3436296-1-alexander.sverdlin@siemens.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-17net: dsa: b53: fix jumbo frames on 10/100 portsJonas Gorski1-1/+1
[ Upstream commit 2f3dcd0d39affe5b9ba1c351ce0e270c8bdd5109 ] All modern chips support and need the 10_100 bit set for supporting jumbo frames on 10/100 ports, so instead of enabling it only for 583XX enable it for everything except bcm63xx, where the bit is writeable, but does nothing. Tested on BCM53115, where jumbo frames were dropped at 10/100 speeds without the bit set. Fixes: 6ae5834b983a ("net: dsa: b53: add MTU configuration support") Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-17net: dsa: b53: allow lower MTUs on BCM5325/5365Jonas Gorski1-1/+1
[ Upstream commit e4b294f88a32438baf31762441f3dd1c996778be ] While BCM5325/5365 do not support jumbo frames, they do support slightly oversized frames, so do not error out if requesting a supported MTU for them. Fixes: 6ae5834b983a ("net: dsa: b53: add MTU configuration support") Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-17net: dsa: b53: fix max MTU for BCM5325/BCM5365Jonas Gorski1-0/+6
[ Upstream commit ca8c1f71c10193c270f772d70d34b15ad765d6a8 ] BCM5325/BCM5365 do not support jumbo frames, so we should not report a jumbo frame mtu for them. But they do support so called "oversized" frames up to 1536 bytes long by default, so report an appropriate MTU. Fixes: 6ae5834b983a ("net: dsa: b53: add MTU configuration support") Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-17net: dsa: b53: fix max MTU for 1g switchesJonas Gorski1-1/+4
[ Upstream commit 680a8217dc00dc7e7da57888b3c053289b60eb2b ] JMS_MAX_SIZE is the ethernet frame length, not the MTU, which is payload without ethernet headers. According to the datasheets maximum supported frame length for most gigabyte swithes is 9720 bytes, so convert that to the expected MTU when using VLAN tagged frames. Fixes: 6ae5834b983a ("net: dsa: b53: add MTU configuration support") Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-17net: dsa: b53: fix jumbo frame mtu checkJonas Gorski1-1/+1
[ Upstream commit 42fb3acf6826c6764ba79feb6e15229b43fd2f9f ] JMS_MIN_SIZE is the full ethernet frame length, while mtu is just the data payload size. Comparing these two meant that mtus between 1500 and 1518 did not trigger enabling jumbo frames. So instead compare the set mtu ETH_DATA_LEN, which is equal to JMS_MIN_SIZE - ETH_HLEN - ETH_FCS_LEN; Also do a check that the requested mtu is actually greater than the minimum length, else we do not need to enable jumbo frames. In practice this only introduced a very small range of mtus that did not work properly. Newer chips allow 2000 byte large frames by default, and older chips allow 1536 bytes long, which is equivalent to an mtu of 1514. So effectivly only mtus of 1515~1517 were broken. Fixes: 6ae5834b983a ("net: dsa: b53: add MTU configuration support") Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-09-12net: dsa: vsc73xx: fix possible subblocks range of CAPT blockPawel Dembicki1-2/+8
[ Upstream commit 8e69c96df771ab469cec278edb47009351de4da6 ] CAPT block (CPU Capture Buffer) have 7 sublocks: 0-3, 4, 6, 7. Function 'vsc73xx_is_addr_valid' allows to use only block 0 at this moment. This patch fix it. Fixes: 05bd97fc559d ("net: dsa: Add Vitesse VSC73xx DSA router driver") Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20240903203340.1518789-1-paweldembicki@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-29net: dsa: mv88e6xxx: Fix out-of-bound accessJoseph Huang1-1/+2
[ Upstream commit 528876d867a23b5198022baf2e388052ca67c952 ] If an ATU violation was caused by a CPU Load operation, the SPID could be larger than DSA_MAX_PORTS (the size of mv88e6xxx_chip.ports[] array). Fixes: 75c05a74e745 ("net: dsa: mv88e6xxx: Fix counting of ATU violations") Signed-off-by: Joseph Huang <Joseph.Huang@garmin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20240819235251.1331763-1-Joseph.Huang@garmin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-29net: mscc: ocelot: serialize access to the injection/extraction groupsVladimir Oltean1-0/+11
[ Upstream commit c5e12ac3beb0dd3a718296b2d8af5528e9ab728e ] As explained by Horatiu Vultur in commit 603ead96582d ("net: sparx5: Add spinlock for frame transmission from CPU") which is for a similar hardware design, multiple CPUs can simultaneously perform injection or extraction. There are only 2 register groups for injection and 2 for extraction, and the driver only uses one of each. So we'd better serialize access using spin locks, otherwise frame corruption is possible. Note that unlike in sparx5, FDMA in ocelot does not have this issue because struct ocelot_fdma_tx_ring already contains an xmit_lock. I guess this is mostly a problem for NXP LS1028A, as that is dual core. I don't think VSC7514 is. So I'm blaming the commit where LS1028A (aka the felix DSA driver) started using register-based packet injection and extraction. Fixes: 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-29net: dsa: vsc73xx: check busy flag in MDIO operationsPawel Dembicki1-1/+36
[ Upstream commit fa63c6434b6f6aaf9d8d599dc899bc0a074cc0ad ] The VSC73xx has a busy flag used during MDIO operations. It is raised when MDIO read/write operations are in progress. Without it, PHYs are misconfigured and bus operations do not work as expected. Fixes: 05bd97fc559d ("net: dsa: Add Vitesse VSC73xx DSA router driver") Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-29net: dsa: vsc73xx: use read_poll_timeout instead delay loopPawel Dembicki1-14/+16
[ Upstream commit eb7e33d01db3aec128590391b2397384bab406b6 ] Switch the delay loop during the Arbiter empty check from vsc73xx_adjust_link() to use read_poll_timeout(). Functionally, one msleep() call is eliminated at the end of the loop in the timeout case. As Russell King suggested: "This [change] avoids the issue that on the last iteration, the code reads the register, tests it, finds the condition that's being waiting for is false, _then_ waits and end up printing the error message - that last wait is rather useless, and as the arbiter state isn't checked after waiting, it could be that we had success during the last wait." Suggested-by: Russell King <linux@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com> Link: https://lore.kernel.org/r/20240417205048.3542839-2-paweldembicki@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Stable-dep-of: fa63c6434b6f ("net: dsa: vsc73xx: check busy flag in MDIO operations") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-29net: dsa: vsc73xx: pass value in phy_write operationPawel Dembicki1-1/+1
[ Upstream commit 5b9eebc2c7a5f0cc7950d918c1e8a4ad4bed5010 ] In the 'vsc73xx_phy_write' function, the register value is missing, and the phy write operation always sends zeros. This commit passes the value variable into the proper register. Fixes: 05bd97fc559d ("net: dsa: Add Vitesse VSC73xx DSA router driver") Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-14net: dsa: bcm_sf2: Fix a possible memory leak in bcm_sf2_mdio_register()Joe Hattori1-1/+3
[ Upstream commit e3862093ee93fcfbdadcb7957f5f8974fffa806a ] bcm_sf2_mdio_register() calls of_phy_find_device() and then phy_device_remove() in a loop to remove existing PHY devices. of_phy_find_device() eventually calls bus_find_device(), which calls get_device() on the returned struct device * to increment the refcount. The current implementation does not decrement the refcount, which causes memory leak. This commit adds the missing phy_device_free() call to decrement the refcount via put_device() to balance the refcount. Fixes: 771089c2a485 ("net: dsa: bcm_sf2: Ensure that MDIO diversion is used") Signed-off-by: Joe Hattori <joe@pf.is.s.u-tokyo.ac.jp> Tested-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20240806011327.3817861-1-joe@pf.is.s.u-tokyo.ac.jp Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03net: dsa: b53: Limit chip-wide jumbo frame config to CPU portsMartin Willi1-0/+3
[ Upstream commit c5118072e228e7e4385fc5ac46b2e31cf6c4f2d3 ] Broadcom switches supported by the b53 driver use a chip-wide jumbo frame configuration. In the commit referenced with the Fixes tag, the setting is applied just for the last port changing its MTU. While configuring CPU ports accounts for tagger overhead, user ports do not. When setting the MTU for a user port, the chip-wide setting is reduced to not include the tagger overhead, resulting in an potentially insufficient chip-wide maximum frame size for the CPU port. As, by design, the CPU port MTU is adjusted for any user port change, apply the chip-wide setting only for CPU ports. This aligns the driver to the behavior of other switch drivers. Fixes: 6ae5834b983a ("net: dsa: b53: add MTU configuration support") Suggested-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Martin Willi <martin@strongswan.org> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03net: dsa: mv88e6xxx: Limit chip-wide frame size config to CPU portsMartin Willi1-1/+2
[ Upstream commit 66b6095c264e1b4e0a441c6329861806504e06c6 ] Marvell chips not supporting per-port jumbo frame size configurations use a chip-wide frame size configuration. In the commit referenced with the Fixes tag, the setting is applied just for the last port changing its MTU. While configuring CPU ports accounts for tagger overhead, user ports do not. When setting the MTU for a user port, the chip-wide setting is reduced to not include the tagger overhead, resulting in an potentially insufficient maximum frame size for the CPU port. Specifically, sending full-size frames from the CPU port on a MV88E6097 having a user port MTU of 1500 bytes results in dropped frames. As, by design, the CPU port MTU is adjusted for any user port change, apply the chip-wide setting only for CPU ports. Fixes: 1baf0fac10fb ("net: dsa: mv88e6xxx: Use chip-wide max frame size for MTU") Suggested-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Martin Willi <martin@strongswan.org> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-07-11net: dsa: mv88e6xxx: Correct check for empty listSimon Horman1-2/+2
[ Upstream commit 4c7f3950a9fd53a62b156c0fe7c3a2c43b0ba19b ] Since commit a3c53be55c95 ("net: dsa: mv88e6xxx: Support multiple MDIO busses") mv88e6xxx_default_mdio_bus() has checked that the return value of list_first_entry() is non-NULL. This appears to be intended to guard against the list chip->mdios being empty. However, it is not the correct check as the implementation of list_first_entry is not designed to return NULL for empty lists. Instead, use list_first_entry_or_null() which does return NULL if the list is empty. Flagged by Smatch. Compile tested only. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240430-mv88e6xx-list_empty-v3-1-c35c69d88d2e@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-07-05net: dsa: microchip: fix wrong register write when masking interruptTristram Ha1-1/+1
[ Upstream commit b1c4b4d45263241ec6c2405a8df8265d4b58e707 ] The switch global port interrupt mask, REG_SW_PORT_INT_MASK__4, is defined as 0x001C in ksz9477_reg.h. The designers used 32-bit value in anticipation for increase of port count in future product but currently the maximum port count is 7 and the effective value is 0x7F in register 0x001F. Each port has its own interrupt mask and is defined as 0x#01F. It uses only 4 bits for different interrupts. The developer who implemented the current interrupt mechanism in the switch driver noticed there are similarities between the mechanism to mask port interrupts in global interrupt and individual interrupts in each port and so used the same code to handle these interrupts. He updated the code to use the new macro REG_SW_PORT_INT_MASK__1 which is defined as 0x1F in ksz_common.h but he forgot to update the 32-bit write to 8-bit as now the mask registers are 0x1F and 0x#01F. In addition all KSZ switches other than the KSZ9897/KSZ9893 and LAN937X families use only 8-bit access and so this common code will eventually be changed to accommodate them. Fixes: e1add7dd6183 ("net: dsa: microchip: use common irq routines for girq and pirq") Signed-off-by: Tristram Ha <tristram.ha@microchip.com> Link: https://lore.kernel.org/r/1719009262-2948-1-git-send-email-Tristram.Ha@microchip.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-07-05net: dsa: microchip: use collision based back pressure modeEnguerrand de Ribaucourt2-0/+5
[ Upstream commit d963c95bc9840d070a788c35e41b715a648717f7 ] Errata DS80000758 states that carrier sense back pressure mode can cause link down issues in 100BASE-TX half duplex mode. The datasheet also recommends to always use the collision based back pressure mode. Fixes: b987e98e50ab ("dsa: add DSA switch driver for Microchip KSZ9477") Signed-off-by: Enguerrand de Ribaucourt <enguerrand.de-ribaucourt@savoirfairelinux.com> Reviewed-by: Woojung Huh <Woojung.huh@microchip.com> Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-07-05net: dsa: microchip: fix initial port flush problemTristram Ha1-4/+2
[ Upstream commit ad53f5f54f351e967128edbc431f0f26427172cf ] The very first flush in any port will flush all learned addresses in all ports. This can be observed by unplugging the cable from one port while additional ports are connected and dumping the fdb entries. This problem is caused by the initially wrong value programmed to the REG_SW_LUE_CTRL_1 register. Setting SW_FLUSH_STP_TABLE and SW_FLUSH_MSTP_TABLE bits does not have an immediate effect. It is when ksz9477_flush_dyn_mac_table() is called then the SW_FLUSH_STP_TABLE bit takes effect and flushes all learned entries. After that call both bits are reset and so the next port flush will not cause such problem again. Fixes: b987e98e50ab ("dsa: add DSA switch driver for Microchip KSZ9477") Signed-off-by: Tristram Ha <tristram.ha@microchip.com> Link: https://patch.msgid.link/1718756202-2731-1-git-send-email-Tristram.Ha@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-27net: dsa: realtek: keep default LED state in rtl8366rbLuiz Angelo Daros de Luca1-67/+20
[ Upstream commit 5edc6585aafefa3d44fb8a84adf241d90227f7a3 ] This switch family supports four LEDs for each of its six ports. Each LED group is composed of one of these four LEDs from all six ports. LED groups can be configured to display hardware information, such as link activity, or manually controlled through a bitmap in registers RTL8366RB_LED_0_1_CTRL_REG and RTL8366RB_LED_2_3_CTRL_REG. After a reset, the default LED group configuration for groups 0 to 3 indicates, respectively, link activity, link at 1000M, 100M, and 10M, or RTL8366RB_LED_CTRL_REG as 0x5432. These configurations are commonly used for LED indications. However, the driver was replacing that configuration to use manually controlled LEDs (RTL8366RB_LED_FORCE) without providing a way for the OS to control them. The default configuration is deemed more useful than fixed, uncontrollable turned-on LEDs. The driver was enabling/disabling LEDs during port_enable/disable. However, these events occur when the port is administratively controlled (up or down) and are not related to link presence. Additionally, when a port N was disabled, the driver was turning off all LEDs for group N, not only the corresponding LED for port N in any of those 4 groups. In such cases, if port 0 was brought down, the LEDs for all ports in LED group 0 would be turned off. As another side effect, the driver was wrongly warning that port 5 didn't have an LED ("no LED for port 5"). Since showing the administrative state of ports is not an orthodox way to use LEDs, it was not worth it to fix it and all this code was dropped. The code to disable LEDs was simplified only changing each LED group to the RTL8366RB_LED_OFF state. Registers RTL8366RB_LED_0_1_CTRL_REG and RTL8366RB_LED_2_3_CTRL_REG are only used when the corresponding LED group is configured with RTL8366RB_LED_FORCE and they don't need to be cleaned. The code still references an LED controlled by RTL8366RB_INTERRUPT_CONTROL_REG, but as of now, no test device has actually used it. Also, some magic numbers were replaced by macros. Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-12net: dsa: microchip: fix RGMII error in KSZ DSA driverTristram Ha1-1/+1
[ Upstream commit 278d65ccdadb5f0fa0ceaf7b9cc97b305cd72822 ] The driver should return RMII interface when XMII is running in RMII mode. Fixes: 0ab7f6bf1675 ("net: dsa: microchip: ksz9477: use common xmii function") Signed-off-by: Tristram Ha <tristram.ha@microchip.com> Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com> Acked-by: Jerry Ray <jerry.ray@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/1716932066-3342-1-git-send-email-Tristram.Ha@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-12net: dsa: mv88e6xxx: Avoid EEPROM timeout without EEPROM on 88E6250-family ↵Matthias Schiffer3-2/+93
switches [ Upstream commit e44894e2aa4eb311ceda134de8b6f51ff979211b ] 88E6250-family switches have the quirk that the EEPROM Running flag can get stuck at 1 when no EEPROM is connected, causing mv88e6xxx_g2_eeprom_wait() to time out. We still want to wait for the EEPROM however, to avoid interrupting a transfer and leaving the EEPROM in an invalid state. The condition to wait for recommended by the hardware spec is the EEInt flag, however this flag is cleared on read, so before the hardware reset, is may have been cleared already even though the EEPROM has been read successfully. For this reason, we revive the mv88e6xxx_g1_wait_eeprom_done() function that was removed in commit 6ccf50d4d474 ("net: dsa: mv88e6xxx: Avoid EEPROM timeout when EEPROM is absent") in a slightly refactored form, and introduce a new mv88e6xxx_g1_wait_eeprom_done_prereset() that additionally handles this case by triggering another EEPROM reload that can be waited on. On other switch models without this quirk, mv88e6xxx_g2_eeprom_wait() is kept, as it avoids the additional reload. Fixes: 6ccf50d4d474 ("net: dsa: mv88e6xxx: Avoid EEPROM timeout when EEPROM is absent") Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-12net: dsa: mv88e6xxx: Add support for model-specific pre- and post-reset handlersMatthias Schiffer2-4/+52
[ Upstream commit 0fdd27b9d6d7c60bd319d3497ad797934bab13cb ] Instead of calling mv88e6xxx_g2_eeprom_wait() directly from mv88e6xxx_hardware_reset(), add configurable pre- and post-reset hard reset handlers. Initially, the handlers are set to mv88e6xxx_g2_eeprom_wait() for all families that have get/set_eeprom() to match the existing behavior. No functional change intended (except for additional error messages on failure). Fixes: 6ccf50d4d474 ("net: dsa: mv88e6xxx: Avoid EEPROM timeout when EEPROM is absent") Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-17net: dsa: mv88e6xxx: Fix number of databases for 88E6141 / 88E6341Marek Behún1-2/+2
[ Upstream commit b9a61c20179fda7bdfe2c1210aa72451991ab81a ] The Topaz family (88E6141 and 88E6341) only support 256 Forwarding Information Tables. Fixes: a75961d0ebfd ("net: dsa: mv88e6xxx: Add support for ethernet switch 88E6341") Fixes: 1558727a1c1b ("net: dsa: mv88e6xxx: Add support for ethernet switch 88E6141") Signed-off-by: Marek Behún <kabel@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://lore.kernel.org/r/20240429133832.9547-1-kabel@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-04-27net: dsa: mt7530: fix enabling EEE on MT7531 switch on all boardsArınç ÜNAL2-5/+13
commit 06dfcd4098cfdc4d4577d94793a4f9125386da8b upstream. The commit 40b5d2f15c09 ("net: dsa: mt7530: Add support for EEE features") brought EEE support but did not enable EEE on MT7531 switch MACs. EEE is enabled on MT7531 switch MACs by pulling the LAN2LED0 pin low on the board (bootstrapping), unsetting the EEE_DIS bit on the trap register, or setting the internal EEE switch bit on the CORE_PLL_GROUP4 register. Thanks to SkyLake Huang (黃啟澤) from MediaTek for providing information on the internal EEE switch bit. There are existing boards that were not designed to pull the pin low. Because of that, the EEE status currently depends on the board design. The EEE_DIS bit on the trap pertains to the LAN2LED0 pin which is usually used to control an LED. Once the bit is unset, the pin will be low. That will make the active low LED turn on. The pin is controlled by the switch PHY. It seems that the PHY controls the pin in the way that it inverts the pin state. That means depending on the wiring of the LED connected to LAN2LED0 on the board, the LED may be on without an active link. To not cause this unwanted behaviour whilst enabling EEE on all boards, set the internal EEE switch bit on the CORE_PLL_GROUP4 register. My testing on MT7531 shows a certain amount of traffic loss when EEE is enabled. That said, I haven't come across a board that enables EEE. So enable EEE on the switch MACs but disable EEE advertisement on the switch PHYs. This way, we don't change the behaviour of the majority of the boards that have this switch. The mediatek-ge PHY driver already disables EEE advertisement on the switch PHYs but my testing shows that it is somehow enabled afterwards. Disabling EEE advertisement before the PHY driver initialises keeps it off. With this change, EEE can now be enabled using ethtool. Fixes: 40b5d2f15c09 ("net: dsa: mt7530: Add support for EEE features") Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Tested-by: Daniel Golle <daniel@makrotopia.org> Reviewed-by: Daniel Golle <daniel@makrotopia.org> Link: https://lore.kernel.org/r/20240408-for-net-mt7530-fix-eee-for-mt7531-mt7988-v3-1-84fdef1f008b@arinc9.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-27net: dsa: mt7530: fix improper frames on all 25MHz and 40MHz XTAL MT7530Arınç ÜNAL1-2/+3
commit 5f563c31ff0c40ce395d0bae7daa94c7950dac97 upstream. The MT7530 switch after reset initialises with a core clock frequency that works with a 25MHz XTAL connected to it. For 40MHz XTAL, the core clock frequency must be set to 500MHz. The mt7530_pll_setup() function is responsible of setting the core clock frequency. Currently, it runs on MT7530 with 25MHz and 40MHz XTAL. This causes MT7530 switch with 25MHz XTAL to egress and ingress frames improperly. Introduce a check to run it only on MT7530 with 40MHz XTAL. The core clock frequency is set by writing to a switch PHY's register. Access to the PHY's register is done via the MDIO bus the switch is also on. Therefore, it works only when the switch makes switch PHYs listen on the MDIO bus the switch is on. This is controlled either by the state of the ESW_P1_LED_1 pin after reset deassertion or modifying bit 5 of the modifiable trap register. When ESW_P1_LED_1 is pulled high, PHY indirect access is used. That means accessing PHY registers via the PHY indirect access control register of the switch. When ESW_P1_LED_1 is pulled low, PHY direct access is used. That means accessing PHY registers via the MDIO bus the switch is on. For MT7530 switch with 40MHz XTAL on a board with ESW_P1_LED_1 pulled high, the core clock frequency won't be set to 500MHz, causing the switch to egress and ingress frames improperly. Run mt7530_pll_setup() after PHY direct access is set on the modifiable trap register. With these two changes, all MT7530 switches with 25MHz and 40MHz, and P1_LED_1 pulled high or low, will egress and ingress frames properly. Link: https://github.com/BPI-SINOVOIP/BPI-R2-bsp/blob/4a5dd143f2172ec97a2872fa29c7c4cd520f45b5/linux-mt/drivers/net/ethernet/mediatek/gsw_mt7623.c#L1039 Fixes: b8f126a8d543 ("net-next: dsa: add dsa support for Mediatek MT7530 switch") Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Link: https://lore.kernel.org/r/20240320-for-net-mt7530-fix-25mhz-xtal-with-direct-phy-access-v1-1-d92f605f1160@arinc9.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-27net: dsa: introduce preferred_default_local_cpu_port and use on MT7530Vladimir Oltean1-0/+15
commit b79d7c14f48083abb3fb061370c0c64a569edf4c upstream. Since the introduction of the OF bindings, DSA has always had a policy that in case multiple CPU ports are present in the device tree, the numerically smallest one is always chosen. The MT7530 switch family, except the switch on the MT7988 SoC, has 2 CPU ports, 5 and 6, where port 6 is preferable on the MT7531BE switch because it has higher bandwidth. The MT7530 driver developers had 3 options: - to modify DSA when the MT7531 switch support was introduced, such as to prefer the better port - to declare both CPU ports in device trees as CPU ports, and live with the sub-optimal performance resulting from not preferring the better port - to declare just port 6 in the device tree as a CPU port Of course they chose the path of least resistance (3rd option), kicking the can down the road. The hardware description in the device tree is supposed to be stable - developers are not supposed to adopt the strategy of piecemeal hardware description, where the device tree is updated in lockstep with the features that the kernel currently supports. Now, as a result of the fact that they did that, any attempts to modify the device tree and describe both CPU ports as CPU ports would make DSA change its default selection from port 6 to 5, effectively resulting in a performance degradation visible to users with the MT7531BE switch as can be seen below. Without preferring port 6: [ ID][Role] Interval Transfer Bitrate Retr [ 5][TX-C] 0.00-20.00 sec 374 MBytes 157 Mbits/sec 734 sender [ 5][TX-C] 0.00-20.00 sec 373 MBytes 156 Mbits/sec receiver [ 7][RX-C] 0.00-20.00 sec 1.81 GBytes 778 Mbits/sec 0 sender [ 7][RX-C] 0.00-20.00 sec 1.81 GBytes 777 Mbits/sec receiver With preferring port 6: [ ID][Role] Interval Transfer Bitrate Retr [ 5][TX-C] 0.00-20.00 sec 1.99 GBytes 856 Mbits/sec 273 sender [ 5][TX-C] 0.00-20.00 sec 1.99 GBytes 855 Mbits/sec receiver [ 7][RX-C] 0.00-20.00 sec 1.72 GBytes 737 Mbits/sec 15 sender [ 7][RX-C] 0.00-20.00 sec 1.71 GBytes 736 Mbits/sec receiver Using one port for WAN and the other ports for LAN is a very popular use case which is what this test emulates. As such, this change proposes that we retroactively modify stable kernels (which don't support the modification of the CPU port assignments, so as to let user space fix the problem and restore the throughput) to keep the mt7530 driver preferring port 6 even with device trees where the hardware is more fully described. Fixes: c288575f7810 ("net: dsa: mt7530: Add the support of MT7531 switch") Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-27net: dsa: mt7530: set all CPU ports in MT7531_CPU_PMAPArınç ÜNAL2-8/+8
commit ff221029a51fd54cacac66e193e0c75e4de940e7 upstream. MT7531_CPU_PMAP represents the destination port mask for trapped-to-CPU frames (further restricted by PCR_MATRIX). Currently the driver sets the first CPU port as the single port in this bit mask, which works fine regardless of whether the device tree defines port 5, 6 or 5+6 as CPU ports. This is because the logic coincides with DSA's logic of picking the first CPU port as the CPU port that all user ports are affine to, by default. An upcoming change would like to influence DSA's selection of the default CPU port to no longer be the first one, and in that case, this logic needs adaptation. Since there is no observed leakage or duplication of frames if all CPU ports are defined in this bit mask, simply include them all. Suggested-by: Russell King (Oracle) <linux@armlinux.org.uk> Suggested-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-27net: dsa: mt7530: fix mirroring frames received on local portArınç ÜNAL2-0/+10
[ Upstream commit d59cf049c8378677053703e724808836f180888e ] This switch intellectual property provides a bit on the ARL global control register which controls allowing mirroring frames which are received on the local port (monitor port). This bit is unset after reset. This ability must be enabled to fully support the port mirroring feature on this switch intellectual property. Therefore, this patch fixes the traffic not being reflected on a port, which would be configured like below: tc qdisc add dev swp0 clsact tc filter add dev swp0 ingress matchall skip_sw \ action mirred egress mirror dev swp0 As a side note, this configuration provides the hairpinning feature for a single port. Fixes: 37feab6076aa ("net: dsa: mt7530: add support for port mirroring") Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-04-17net: dsa: mt7530: trap link-local frames regardless of ST Port StateArınç ÜNAL2-34/+200
[ Upstream commit 17c560113231ddc20088553c7b499b289b664311 ] In Clause 5 of IEEE Std 802-2014, two sublayers of the data link layer (DLL) of the Open Systems Interconnection basic reference model (OSI/RM) are described; the medium access control (MAC) and logical link control (LLC) sublayers. The MAC sublayer is the one facing the physical layer. In 8.2 of IEEE Std 802.1Q-2022, the Bridge architecture is described. A Bridge component comprises a MAC Relay Entity for interconnecting the Ports of the Bridge, at least two Ports, and higher layer entities with at least a Spanning Tree Protocol Entity included. Each Bridge Port also functions as an end station and shall provide the MAC Service to an LLC Entity. Each instance of the MAC Service is provided to a distinct LLC Entity that supports protocol identification, multiplexing, and demultiplexing, for protocol data unit (PDU) transmission and reception by one or more higher layer entities. It is described in 8.13.9 of IEEE Std 802.1Q-2022 that in a Bridge, the LLC Entity associated with each Bridge Port is modeled as being directly connected to the attached Local Area Network (LAN). On the switch with CPU port architecture, CPU port functions as Management Port, and the Management Port functionality is provided by software which functions as an end station. Software is connected to an IEEE 802 LAN that is wholly contained within the system that incorporates the Bridge. Software provides access to the LLC Entity associated with each Bridge Port by the value of the source port field on the special tag on the frame received by software. We call frames that carry control information to determine the active topology and current extent of each Virtual Local Area Network (VLAN), i.e., spanning tree or Shortest Path Bridging (SPB) and Multiple VLAN Registration Protocol Data Units (MVRPDUs), and frames from other link constrained protocols, such as Extensible Authentication Protocol over LAN (EAPOL) and Link Layer Discovery Protocol (LLDP), link-local frames. They are not forwarded by a Bridge. Permanently configured entries in the filtering database (FDB) ensure that such frames are discarded by the Forwarding Process. In 8.6.3 of IEEE Std 802.1Q-2022, this is described in detail: Each of the reserved MAC addresses specified in Table 8-1 (01-80-C2-00-00-[00,01,02,03,04,05,06,07,08,09,0A,0B,0C,0D,0E,0F]) shall be permanently configured in the FDB in C-VLAN components and ERs. Each of the reserved MAC addresses specified in Table 8-2 (01-80-C2-00-00-[01,02,03,04,05,06,07,08,09,0A,0E]) shall be permanently configured in the FDB in S-VLAN components. Each of the reserved MAC addresses specified in Table 8-3 (01-80-C2-00-00-[01,02,04,0E]) shall be permanently configured in the FDB in TPMR components. The FDB entries for reserved MAC addresses shall specify filtering for all Bridge Ports and all VIDs. Management shall not provide the capability to modify or remove entries for reserved MAC addresses. The addresses in Table 8-1, Table 8-2, and Table 8-3 determine the scope of propagation of PDUs within a Bridged Network, as follows: The Nearest Bridge group address (01-80-C2-00-00-0E) is an address that no conformant Two-Port MAC Relay (TPMR) component, Service VLAN (S-VLAN) component, Customer VLAN (C-VLAN) component, or MAC Bridge can forward. PDUs transmitted using this destination address, or any other addresses that appear in Table 8-1, Table 8-2, and Table 8-3 (01-80-C2-00-00-[00,01,02,03,04,05,06,07,08,09,0A,0B,0C,0D,0E,0F]), can therefore travel no further than those stations that can be reached via a single individual LAN from the originating station. The Nearest non-TPMR Bridge group address (01-80-C2-00-00-03), is an address that no conformant S-VLAN component, C-VLAN component, or MAC Bridge can forward; however, this address is relayed by a TPMR component. PDUs using this destination address, or any of the other addresses that appear in both Table 8-1 and Table 8-2 but not in Table 8-3 (01-80-C2-00-00-[00,03,05,06,07,08,09,0A,0B,0C,0D,0F]), will be relayed by any TPMRs but will propagate no further than the nearest S-VLAN component, C-VLAN component, or MAC Bridge. The Nearest Customer Bridge group address (01-80-C2-00-00-00) is an address that no conformant C-VLAN component, MAC Bridge can forward; however, it is relayed by TPMR components and S-VLAN components. PDUs using this destination address, or any of the other addresses that appear in Table 8-1 but not in either Table 8-2 or Table 8-3 (01-80-C2-00-00-[00,0B,0C,0D,0F]), will be relayed by TPMR components and S-VLAN components but will propagate no further than the nearest C-VLAN component or MAC Bridge. Because the LLC Entity associated with each Bridge Port is provided via CPU port, we must not filter these frames but forward them to CPU port. In a Bridge, the transmission Port is majorly decided by ingress and egress rules, FDB, and spanning tree Port State functions of the Forwarding Process. For link-local frames, only CPU port should be designated as destination port in the FDB, and the other functions of the Forwarding Process must not interfere with the decision of the transmission Port. We call this process trapping frames to CPU port. Therefore, on the switch with CPU port architecture, link-local frames must be trapped to CPU port, and certain link-local frames received by a Port of a Bridge comprising a TPMR component or an S-VLAN component must be excluded from it. A Bridge of the switch with CPU port architecture cannot comprise a Two-Port MAC Relay (TPMR) component as a TPMR component supports only a subset of the functionality of a MAC Bridge. A Bridge comprising two Ports (Management Port doesn't count) of this architecture will either function as a standard MAC Bridge or a standard VLAN Bridge. Therefore, a Bridge of this architecture can only comprise S-VLAN components, C-VLAN components, or MAC Bridge components. Since there's no TPMR component, we don't need to relay PDUs using the destination addresses specified on the Nearest non-TPMR section, and the proportion of the Nearest Customer Bridge section where they must be relayed by TPMR components. One option to trap link-local frames to CPU port is to add static FDB entries with CPU port designated as destination port. However, because that Independent VLAN Learning (IVL) is being used on every VID, each entry only applies to a single VLAN Identifier (VID). For a Bridge comprising a MAC Bridge component or a C-VLAN component, there would have to be 16 times 4096 entries. This switch intellectual property can only hold a maximum of 2048 entries. Using this option, there also isn't a mechanism to prevent link-local frames from being discarded when the spanning tree Port State of the reception Port is discarding. The remaining option is to utilise the BPC, RGAC1, RGAC2, RGAC3, and RGAC4 registers. Whilst this applies to every VID, it doesn't contain all of the reserved MAC addresses without affecting the remaining Standard Group MAC Addresses. The REV_UN frame tag utilised using the RGAC4 register covers the remaining 01-80-C2-00-00-[04,05,06,07,08,09,0A,0B,0C,0D,0F] destination addresses. It also includes the 01-80-C2-00-00-22 to 01-80-C2-00-00-FF destination addresses which may be relayed by MAC Bridges or VLAN Bridges. The latter option provides better but not complete conformance. This switch intellectual property also does not provide a mechanism to trap link-local frames with specific destination addresses to CPU port by Bridge, to conform to the filtering rules for the distinct Bridge components. Therefore, regardless of the type of the Bridge component, link-local frames with these destination addresses will be trapped to CPU port: 01-80-C2-00-00-[00,01,02,03,0E] In a Bridge comprising a MAC Bridge component or a C-VLAN component: Link-local frames with these destination addresses won't be trapped to CPU port which won't conform to IEEE Std 802.1Q-2022: 01-80-C2-00-00-[04,05,06,07,08,09,0A,0B,0C,0D,0F] In a Bridge comprising an S-VLAN component: Link-local frames with these destination addresses will be trapped to CPU port which won't conform to IEEE Std 802.1Q-2022: 01-80-C2-00-00-00 Link-local frames with these destination addresses won't be trapped to CPU port which won't conform to IEEE Std 802.1Q-2022: 01-80-C2-00-00-[04,05,06,07,08,09,0A] Currently on this switch intellectual property, if the spanning tree Port State of the reception Port is discarding, link-local frames will be discarded. To trap link-local frames regardless of the spanning tree Port State, make the switch regard them as Bridge Protocol Data Units (BPDUs). This switch intellectual property only lets the frames regarded as BPDUs bypass the spanning tree Port State function of the Forwarding Process. With this change, the only remaining interference is the ingress rules. When the reception Port has no PVID assigned on software, VLAN-untagged frames won't be allowed in. There doesn't seem to be a mechanism on the switch intellectual property to have link-local frames bypass this function of the Forwarding Process. Fixes: b8f126a8d543 ("net-next: dsa: add dsa support for Mediatek MT7530 switch") Reviewed-by: Daniel Golle <daniel@makrotopia.org> Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Link: https://lore.kernel.org/r/20240409-b4-for-net-mt7530-fix-link-local-when-stp-discarding-v2-1-07b1150164ac@arinc9.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-27net: dsa: mt7530: fix handling of all link-local framesArınç ÜNAL2-4/+46
[ Upstream commit 69ddba9d170bdaee1dc0eb4ced38d7e4bb7b92af ] Currently, the MT753X switches treat frames with :01-0D and :0F MAC DAs as regular multicast frames, therefore flooding them to user ports. On page 205, section "8.6.3 Frame filtering" of the active standard, IEEE Std 802.1Q™-2022, it is stated that frames with 01:80:C2:00:00:00-0F as MAC DA must only be propagated to C-VLAN and MAC Bridge components. That means VLAN-aware and VLAN-unaware bridges. On the switch designs with CPU ports, these frames are supposed to be processed by the CPU (software). So we make the switch only forward them to the CPU port. And if received from a CPU port, forward to a single port. The software is responsible of making the switch conform to the latter by setting a single port as destination port on the special tag. This switch intellectual property cannot conform to this part of the standard fully. Whilst the REV_UN frame tag covers the remaining :04-0D and :0F MAC DAs, it also includes :22-FF which the scope of propagation is not supposed to be restricted for these MAC DAs. Set frames with :01-03 MAC DAs to be trapped to the CPU port(s). Add a comment for the remaining MAC DAs. Note that the ingress port must have a PVID assigned to it for the switch to forward untagged frames. A PVID is set by default on VLAN-aware and VLAN-unaware ports. However, when the network interface that pertains to the ingress port is attached to a vlan_filtering enabled bridge, the user can remove the PVID assignment from it which would prevent the link-local frames from being trapped to the CPU port. I am yet to see a way to forward link-local frames while preventing other untagged frames from being forwarded too. Fixes: b8f126a8d543 ("net-next: dsa: add dsa support for Mediatek MT7530 switch") Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-27net: dsa: mt7530: fix link-local frames that ingress vlan filtering portsArınç ÜNAL2-9/+23
[ Upstream commit e8bf353577f382c7066c661fed41b2adc0fc7c40 ] Whether VLAN-aware or not, on every VID VLAN table entry that has the CPU port as a member of it, frames are set to egress the CPU port with the VLAN tag stacked. This is so that VLAN tags can be appended after hardware special tag (called DSA tag in the context of Linux drivers). For user ports on a VLAN-unaware bridge, frame ingressing the user port egresses CPU port with only the special tag. For user ports on a VLAN-aware bridge, frame ingressing the user port egresses CPU port with the special tag and the VLAN tag. This causes issues with link-local frames, specifically BPDUs, because the software expects to receive them VLAN-untagged. There are two options to make link-local frames egress untagged. Setting CONSISTENT or UNTAGGED on the EG_TAG bits on the relevant register. CONSISTENT means frames egress exactly as they ingress. That means egressing with the VLAN tag they had at ingress or egressing untagged if they ingressed untagged. Although link-local frames are not supposed to be transmitted VLAN-tagged, if they are done so, when egressing through a CPU port, the special tag field will be broken. BPDU egresses CPU port with VLAN tag egressing stacked, received on software: 00:01:25.104821 AF Unknown (382365846), length 106: | STAG | | VLAN | 0x0000: 0000 6c27 614d 4143 0001 0000 8100 0001 ..l'aMAC........ 0x0010: 0026 4242 0300 0000 0000 0000 6c27 614d .&BB........l'aM 0x0020: 4143 0000 0000 0000 6c27 614d 4143 0000 AC......l'aMAC.. 0x0030: 0000 1400 0200 0f00 0000 0000 0000 0000 ................ BPDU egresses CPU port with VLAN tag egressing untagged, received on software: 00:23:56.628708 AF Unknown (25215488), length 64: | STAG | 0x0000: 0000 6c27 614d 4143 0001 0000 0026 4242 ..l'aMAC.....&BB 0x0010: 0300 0000 0000 0000 6c27 614d 4143 0000 ........l'aMAC.. 0x0020: 0000 0000 6c27 614d 4143 0000 0000 1400 ....l'aMAC...... 0x0030: 0200 0f00 0000 0000 0000 0000 ............ BPDU egresses CPU port with VLAN tag egressing tagged, received on software: 00:01:34.311963 AF Unknown (25215488), length 64: | Mess | 0x0000: 0000 6c27 614d 4143 0001 0001 0026 4242 ..l'aMAC.....&BB 0x0010: 0300 0000 0000 0000 6c27 614d 4143 0000 ........l'aMAC.. 0x0020: 0000 0000 6c27 614d 4143 0000 0000 1400 ....l'aMAC...... 0x0030: 0200 0f00 0000 0000 0000 0000 ............ To prevent confusing the software, force the frame to egress UNTAGGED instead of CONSISTENT. This way, frames can't possibly be received TAGGED by software which would have the special tag field broken. VLAN Tag Egress Procedure For all frames, one of these options set the earliest in this order will apply to the frame: - EG_TAG in certain registers for certain frames. This will apply to frame with matching MAC DA or EtherType. - EG_TAG in the address table. This will apply to frame at its incoming port. - EG_TAG in the PVC register. This will apply to frame at its incoming port. - EG_CON and [EG_TAG per port] in the VLAN table. This will apply to frame at its outgoing port. - EG_TAG in the PCR register. This will apply to frame at its outgoing port. EG_TAG in certain registers for certain frames: PPPoE Discovery_ARP/RARP: PPP_EG_TAG and ARP_EG_TAG in the APC register. IGMP_MLD: IGMP_EG_TAG and MLD_EG_TAG in the IMC register. BPDU and PAE: BPDU_EG_TAG and PAE_EG_TAG in the BPC register. REV_01 and REV_02: R01_EG_TAG and R02_EG_TAG in the RGAC1 register. REV_03 and REV_0E: R03_EG_TAG and R0E_EG_TAG in the RGAC2 register. REV_10 and REV_20: R10_EG_TAG and R20_EG_TAG in the RGAC3 register. REV_21 and REV_UN: R21_EG_TAG and RUN_EG_TAG in the RGAC4 register. With this change, it can be observed that a bridge interface with stp_state and vlan_filtering enabled will properly block ports now. Fixes: b8f126a8d543 ("net-next: dsa: add dsa support for Mediatek MT7530 switch") Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-27net: dsa: mt7530: prevent possible incorrect XTAL frequency selectionArınç ÜNAL1-4/+4
[ Upstream commit f490c492e946d8ffbe65ad4efc66de3c5ede30a4 ] On MT7530, the HT_XTAL_FSEL field of the HWTRAP register stores a 2-bit value that represents the frequency of the crystal oscillator connected to the switch IC. The field is populated by the state of the ESW_P4_LED_0 and ESW_P4_LED_0 pins, which is done right after reset is deasserted. ESW_P4_LED_0 ESW_P3_LED_0 Frequency ----------------------------------------- 0 0 Reserved 0 1 20MHz 1 0 40MHz 1 1 25MHz On MT7531, the XTAL25 bit of the STRAP register stores this. The LAN0LED0 pin is used to populate the bit. 25MHz when the pin is high, 40MHz when it's low. These pins are also used with LEDs, therefore, their state can be set to something other than the bootstrapping configuration. For example, a link may be established on port 3 before the DSA subdriver takes control of the switch which would set ESW_P3_LED_0 to high. Currently on mt7530_setup() and mt7531_setup(), 1000 - 1100 usec delay is described between reset assertion and deassertion. Some switch ICs in real life conditions cannot always have these pins set back to the bootstrapping configuration before reset deassertion in this amount of delay. This causes wrong crystal frequency to be selected which puts the switch in a nonfunctional state after reset deassertion. The tests below are conducted on an MT7530 with a 40MHz crystal oscillator by Justin Swartz. With a cable from an active peer connected to port 3 before reset, an incorrect crystal frequency (0b11 = 25MHz) is selected: [1] [3] [5] : : : _____________________________ __________________ ESW_P4_LED_0 |_______| _____________________________ ESW_P3_LED_0 |__________________________ : : : : : : [4]...: : : [2]................: [1] Reset is asserted. [2] Period of 1000 - 1100 usec. [3] Reset is deasserted. [4] Period of 315 usec. HWTRAP register is populated with incorrect XTAL frequency. [5] Signals reflect the bootstrapped configuration. Increase the delay between reset_control_assert() and reset_control_deassert(), and gpiod_set_value_cansleep(priv->reset, 0) and gpiod_set_value_cansleep(priv->reset, 1) to 5000 - 5100 usec. This amount ensures a higher possibility that the switch IC will have these pins back to the bootstrapping configuration before reset deassertion. With a cable from an active peer connected to port 3 before reset, the correct crystal frequency (0b10 = 40MHz) is selected: [1] [2-1] [3] [5] : : : : _____________________________ __________________ ESW_P4_LED_0 |_______| ___________________ _______ ESW_P3_LED_0 |_________| |__________________ : : : : : : [2-2]...: [4]...: [2]................: [1] Reset is asserted. [2] Period of 5000 - 5100 usec. [2-1] ESW_P3_LED_0 goes low. [2-2] Remaining period of 5000 - 5100 usec. [3] Reset is deasserted. [4] Period of 310 usec. HWTRAP register is populated with bootstrapped XTAL frequency. [5] Signals reflect the bootstrapped configuration. ESW_P3_LED_0 low period before reset deassertion: 5000 usec - 5100 usec TEST RESET HOLD # (usec) --------------------- 1 5410 2 5440 3 4375 4 5490 5 5475 6 4335 7 4370 8 5435 9 4205 10 4335 11 3750 12 3170 13 4395 14 4375 15 3515 16 4335 17 4220 18 4175 19 4175 20 4350 Min 3170 Max 5490 Median 4342.500 Avg 4466.500 Revert commit 2920dd92b980 ("net: dsa: mt7530: disable LEDs before reset"). Changing the state of pins via reset assertion is simpler and more efficient than doing so by setting the LED controller off. Fixes: b8f126a8d543 ("net-next: dsa: add dsa support for Mediatek MT7530 switch") Fixes: c288575f7810 ("net: dsa: mt7530: Add the support of MT7531 switch") Co-developed-by: Justin Swartz <justin.swartz@risingedge.co.za> Signed-off-by: Justin Swartz <justin.swartz@risingedge.co.za> Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-15net: dsa: microchip: fix register write order in ksz8_ind_write8()Tobias Jakobi (Compleo)1-2/+2
[ Upstream commit b7fb7729c94fb2d23c79ff44f7a2da089c92d81c ] This bug was noticed while re-implementing parts of the kernel driver in userspace using spidev. The goal was to enable some of the errata workarounds that Microchip describes in their errata sheet [1]. Both the errata sheet and the regular datasheet of e.g. the KSZ8795 imply that you need to do this for indirect register accesses: - write a 16-bit value to a control register pair (this value consists of the indirect register table, and the offset inside the table) - either read or write an 8-bit value from the data storage register (indicated by REG_IND_BYTE in the kernel) The current implementation has the order swapped. It can be proven, by reading back some indirect register with known content (the EEE register modified in ksz8_handle_global_errata() is one of these), that this implementation does not work. Private discussion with Oleksij Rempel of Pengutronix has revealed that the workaround was apparantly never tested on actual hardware. [1] https://ww1.microchip.com/downloads/aemDocuments/documents/OTH/ProductDocuments/Errata/KSZ87xx-Errata-DS80000687C.pdf Signed-off-by: Tobias Jakobi (Compleo) <tobias.jakobi.compleo@gmail.com> Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de> Fixes: 7b6e6235b664 ("net: dsa: microchip: ksz8795: handle eee specif erratum") Link: https://lore.kernel.org/r/20240304154135.161332-1-tobias.jakobi.compleo@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-05net: dsa: qca8k: fix illegal usage of GPIOMichal Vokáč1-2/+1
[ Upstream commit c44fc98f0a8ffd94fa0bd291928e7e312ffc7ca4 ] When working with GPIO, its direction must be set either when the GPIO is requested by gpiod_get*() or later on by one of the gpiod_direction_*() functions. Neither of this is done here which results in undefined behavior on some systems. As the reset GPIO is used right after it is requested here, it makes sense to configure it as GPIOD_OUT_HIGH right away. With that, the following gpiod_set_value_cansleep(1) becomes redundant and can be safely removed. Fixes: a653f2f538f9 ("net: dsa: qca8k: introduce reset via gpio feature") Signed-off-by: Michal Vokáč <michal.vokac@ysoft.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/1706266175-3408-1-git-send-email-michal.vokac@ysoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-05net: dsa: qca8k: put MDIO bus OF node on qca8k_mdio_register() failureVladimir Oltean1-5/+16
[ Upstream commit 68e1010cda7967cfca9c8650ee1f4efcae54ab90 ] of_get_child_by_name() gives us an OF node with an elevated refcount, which should be dropped when we're done with it. This is so that, if (of_node_check_flag(node, OF_DYNAMIC)) is true, the node's memory can eventually be freed. There are 2 distinct paths to be considered in qca8k_mdio_register(): - devm_of_mdiobus_register() succeeds: since commit 3b73a7b8ec38 ("net: mdio_bus: add refcounting for fwnodes to mdiobus"), the MDIO core treats this well. - devm_of_mdiobus_register() or anything up to that point fails: it is the duty of the qca8k driver to release the OF node. This change addresses the second case by making sure that the OF node reference is not leaked. The "mdio" node may be NULL, but of_node_put(NULL) is safe. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Alvin Šipraga <alsi@bang-olufsen.dk> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-05net: dsa: mv88e6xxx: Fix mv88e6352_serdes_get_stats error pathTobias Waldekranz3-11/+11
[ Upstream commit fc82a08ae795ee6b73fb6b50785f7be248bec7b5 ] mv88e6xxx_get_stats, which collects stats from various sources, expects all callees to return the number of stats read. If an error occurs, 0 should be returned. Prevent future mishaps of this kind by updating the return type to reflect this contract. Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-26net: dsa: vsc73xx: Add null pointer check to vsc73xx_gpio_probeKunwu Chan1-0/+2
[ Upstream commit 776dac5a662774f07a876b650ba578d0a62d20db ] devm_kasprintf() returns a pointer to dynamically allocated memory which can be NULL upon failure. Fixes: 05bd97fc559d ("net: dsa: Add Vitesse VSC73xx DSA router driver") Signed-off-by: Kunwu Chan <chentao@kylinos.cn> Suggested-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240111072018.75971-1-chentao@kylinos.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-28net: dsa: lan9303: consequently nested-lock physical MDIOAlexander Sverdlin1-2/+2
commit 5a22fbcc10f3f7d94c5d88afbbffa240a3677057 upstream. When LAN9303 is MDIO-connected two callchains exist into mdio->bus->write(): 1. switch ports 1&2 ("physical" PHYs): virtual (switch-internal) MDIO bus (lan9303_switch_ops->phy_{read|write})-> lan9303_mdio_phy_{read|write} -> mdiobus_{read|write}_nested 2. LAN9303 virtual PHY: virtual MDIO bus (lan9303_phy_{read|write}) -> lan9303_virt_phy_reg_{read|write} -> regmap -> lan9303_mdio_{read|write} If the latter functions just take mutex_lock(&sw_dev->device->bus->mdio_lock) it triggers a LOCKDEP false-positive splat. It's false-positive because the first mdio_lock in the second callchain above belongs to virtual MDIO bus, the second mdio_lock belongs to physical MDIO bus. Consequent annotation in lan9303_mdio_{read|write} as nested lock (similar to lan9303_mdio_phy_{read|write}, it's the same physical MDIO bus) prevents the following splat: WARNING: possible circular locking dependency detected 5.15.71 #1 Not tainted ------------------------------------------------------ kworker/u4:3/609 is trying to acquire lock: ffff000011531c68 (lan9303_mdio:131:(&lan9303_mdio_regmap_config)->lock){+.+.}-{3:3}, at: regmap_lock_mutex but task is already holding lock: ffff0000114c44d8 (&bus->mdio_lock){+.+.}-{3:3}, at: mdiobus_read which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&bus->mdio_lock){+.+.}-{3:3}: lock_acquire __mutex_lock mutex_lock_nested lan9303_mdio_read _regmap_read regmap_read lan9303_probe lan9303_mdio_probe mdio_probe really_probe __driver_probe_device driver_probe_device __device_attach_driver bus_for_each_drv __device_attach device_initial_probe bus_probe_device deferred_probe_work_func process_one_work worker_thread kthread ret_from_fork -> #0 (lan9303_mdio:131:(&lan9303_mdio_regmap_config)->lock){+.+.}-{3:3}: __lock_acquire lock_acquire.part.0 lock_acquire __mutex_lock mutex_lock_nested regmap_lock_mutex regmap_read lan9303_phy_read dsa_slave_phy_read __mdiobus_read mdiobus_read get_phy_device mdiobus_scan __mdiobus_register dsa_register_switch lan9303_probe lan9303_mdio_probe mdio_probe really_probe __driver_probe_device driver_probe_device __device_attach_driver bus_for_each_drv __device_attach device_initial_probe bus_probe_device deferred_probe_work_func process_one_work worker_thread kthread ret_from_fork other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&bus->mdio_lock); lock(lan9303_mdio:131:(&lan9303_mdio_regmap_config)->lock); lock(&bus->mdio_lock); lock(lan9303_mdio:131:(&lan9303_mdio_regmap_config)->lock); *** DEADLOCK *** 5 locks held by kworker/u4:3/609: #0: ffff000002842938 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work #1: ffff80000bacbd60 (deferred_probe_work){+.+.}-{0:0}, at: process_one_work #2: ffff000007645178 (&dev->mutex){....}-{3:3}, at: __device_attach #3: ffff8000096e6e78 (dsa2_mutex){+.+.}-{3:3}, at: dsa_register_switch #4: ffff0000114c44d8 (&bus->mdio_lock){+.+.}-{3:3}, at: mdiobus_read stack backtrace: CPU: 1 PID: 609 Comm: kworker/u4:3 Not tainted 5.15.71 #1 Workqueue: events_unbound deferred_probe_work_func Call trace: dump_backtrace show_stack dump_stack_lvl dump_stack print_circular_bug check_noncircular __lock_acquire lock_acquire.part.0 lock_acquire __mutex_lock mutex_lock_nested regmap_lock_mutex regmap_read lan9303_phy_read dsa_slave_phy_read __mdiobus_read mdiobus_read get_phy_device mdiobus_scan __mdiobus_register dsa_register_switch lan9303_probe lan9303_mdio_probe ... Cc: stable@vger.kernel.org Fixes: dc7005831523 ("net: dsa: LAN9303: add MDIO managed mode support") Signed-off-by: Alexander Sverdlin <alexander.sverdlin@siemens.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20231027065741.534971-1-alexander.sverdlin@siemens.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-25net: dsa: bcm_sf2: Fix possible memory leak in bcm_sf2_mdio_register()Jinjie Ruan1-9/+15
commit 61b40cefe51af005c72dbdcf975a3d166c6e6406 upstream. In bcm_sf2_mdio_register(), the class_find_device() will call get_device() to increment reference count for priv->master_mii_bus->dev if of_mdio_find_bus() succeeds. If mdiobus_alloc() or mdiobus_register() fails, it will call get_device() twice without decrement reference count for the device. And it is the same if bcm_sf2_mdio_register() succeeds but fails in bcm_sf2_sw_probe(), or if bcm_sf2_sw_probe() succeeds. If the reference count has not decremented to zero, the dev related resource will not be freed. So remove the get_device() in bcm_sf2_mdio_register(), and call put_device() if mdiobus_alloc() or mdiobus_register() fails and in bcm_sf2_mdio_unregister() to solve the issue. And as Simon suggested, unwind from errors for bcm_sf2_mdio_register() and just return 0 if it succeeds to make it cleaner. Fixes: 461cd1b03e32 ("net: dsa: bcm_sf2: Register our slave MDIO bus") Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Suggested-by: Simon Horman <horms@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://lore.kernel.org/r/20231011032419.2423290-1-ruanjinjie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-20net: dsa: qca8k: fix potential MDIO bus conflict when accessing internal ↵Marek Behún1-0/+11
PHYs via management frames [ Upstream commit 526c8ee04bdbd4d8d19a583b1f3b06700229a815 ] Besides the QCA8337 switch the Turris 1.x device has on it's MDIO bus also Micron ethernet PHY (dedicated to the WAN port). We've been experiencing a strange behavior of the WAN ethernet interface, wherein the WAN PHY started timing out the MDIO accesses, for example when the interface was brought down and then back up. Bisecting led to commit 2cd548566384 ("net: dsa: qca8k: add support for phy read/write with mgmt Ethernet"), which added support to access the QCA8337 switch's internal PHYs via management ethernet frames. Connecting the MDIO bus pins onto an oscilloscope, I was able to see that the MDIO bus was active whenever a request to read/write an internal PHY register was done via an management ethernet frame. My theory is that when the switch core always communicates with the internal PHYs via the MDIO bus, even when externally we request the access via ethernet. This MDIO bus is the same one via which the switch and internal PHYs are accessible to the board, and the board may have other devices connected on this bus. An ASCII illustration may give more insight: +---------+ +----| | | | WAN PHY | | +--| | | | +---------+ | | | | +----------------------------------+ | | | QCA8337 | MDC | | | +-------+ | ------o-+--|--------o------------o--| | | MDIO | | | | | PHY 1 |-|--to RJ45 --------o--|---o----+---------o--+--| | | | | | | | +-------+ | | +-------------+ | o--| | | | | MDIO MDC | | | | PHY 2 |-|--to RJ45 eth1 | | | o--+--| | | -----------|-|port0 | | | +-------+ | | | | | o--| | | | | switch core | | | | PHY 3 |-|--to RJ45 | +-------------+ o--+--| | | | | | +-------+ | | | o--| ... | | +----------------------------------+ When we send a request to read an internal PHY register via an ethernet management frame via eth1, the switch core receives the ethernet frame on port 0 and then communicates with the internal PHY via MDIO. At this time, other potential devices, such as the WAN PHY on Turris 1.x, cannot use the MDIO bus, since it may cause a bus conflict. Fix this issue by locking the MDIO bus even when we are accessing the PHY registers via ethernet management frames. Fixes: 2cd548566384 ("net: dsa: qca8k: add support for phy read/write with mgmt Ethernet") Signed-off-by: Marek Behún <kabel@kernel.org> Reviewed-by: Christian Marangi <ansuelsmth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-10net: dsa: mv88e6xxx: Avoid EEPROM timeout when EEPROM is absentFabio Estevam5-35/+6
[ Upstream commit 6ccf50d4d4741e064ba35511a95402c63bbe21a8 ] Since commit 23d775f12dcd ("net: dsa: mv88e6xxx: Wait for EEPROM done before HW reset") the following error is seen on a imx8mn board with a 88E6320 switch: mv88e6085 30be0000.ethernet-1:00: Timeout waiting for EEPROM done This board does not have an EEPROM attached to the switch though. This problem is well explained by Andrew Lunn: "If there is an EEPROM, and the EEPROM contains a lot of data, it could be that when we perform a hardware reset towards the end of probe, it interrupts an I2C bus transaction, leaving the I2C bus in a bad state, and future reads of the EEPROM do not work. The work around for this was to poll the EEInt status and wait for it to go true before performing the hardware reset. However, we have discovered that for some boards which do not have an EEPROM, EEInt never indicates complete. As a result, mv88e6xxx_g1_wait_eeprom_done() spins for a second and then prints a warning. We probably need a different solution than calling mv88e6xxx_g1_wait_eeprom_done(). The datasheet for 6352 documents the EEPROM Command register: bit 15 is: EEPROM Unit Busy. This bit must be set to a one to start an EEPROM operation (see EEOp below). Only one EEPROM operation can be executing at one time so this bit must be zero before setting it to a one. When the requested EEPROM operation completes this bit will automatically be cleared to a zero. The transition of this bit from a one to a zero can be used to generate an interrupt (the EEInt in Global 1, offset 0x00). and more interesting is bit 11: Register Loader Running. This bit is set to one whenever the register loader is busy executing instructions contained in the EEPROM." Change to using mv88e6xxx_g2_eeprom_wait() to fix the timeout error when the EEPROM chip is not present. Fixes: 23d775f12dcd ("net: dsa: mv88e6xxx: Wait for EEPROM done before HW reset") Suggested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Fabio Estevam <festevam@denx.de> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-19net: dsa: sja1105: block FDB accesses that are concurrent with a switch resetVladimir Oltean1-0/+2
[ Upstream commit 86899e9e1e29e854b5f6dcc24ba4f75f792c89aa ] Currently, when we add the first sja1105 port to a bridge with vlan_filtering 1, then we sometimes see this output: sja1105 spi2.2: port 4 failed to read back entry for be:79:b4:9e:9e:96 vid 3088: -ENOENT sja1105 spi2.2: Reset switch and programmed static config. Reason: VLAN filtering sja1105 spi2.2: port 0 failed to add be:79:b4:9e:9e:96 vid 0 to fdb: -2 It is because sja1105_fdb_add() runs from the dsa_owq which is no longer serialized with switch resets since it dropped the rtnl_lock() in the blamed commit. Either performing the FDB accesses before the reset, or after the reset, is equally fine, because sja1105_static_fdb_change() backs up those changes in the static config, but FDB access during reset isn't ok. Make sja1105_static_config_reload() take the fdb_lock to fix that. Fixes: 0faf890fc519 ("net: dsa: drop rtnl_lock from dsa_slave_switchdev_event_work") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-19net: dsa: sja1105: serialize sja1105_port_mcast_flood() with other FDB accessesVladimir Oltean2-13/+45
[ Upstream commit ea32690daf4fa525dc5a4d164bd00ed8c756e1c6 ] sja1105_fdb_add() runs from the dsa_owq, and sja1105_port_mcast_flood() runs from switchdev_deferred_process_work(). Prior to the blamed commit, they used to be indirectly serialized through the rtnl_lock(), which no longer holds true because dsa_owq dropped that. So, it is now possible that we traverse the static config BLK_IDX_L2_LOOKUP elements concurrently compared to when we change them, in sja1105_static_fdb_change(). That is not ideal, since it might result in data corruption. Introduce a mutex which serializes accesses to the hardware FDB and to the static config elements for the L2 Address Lookup table. I can't find a good reason to add locking around sja1105_fdb_dump(). I'll add it later if needed. Fixes: 0faf890fc519 ("net: dsa: drop rtnl_lock from dsa_slave_switchdev_event_work") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>