summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2022-02-28net: netsec: enable pp skb recyclingLorenzo Bianconi1-1/+1
Similar to mvneta or mvpp2, enable page_pool skb recycling for netsec dirver. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-28atm: firestream: check the return value of ioremap() in fs_init()Jia-Ju Bai1-0/+2
The function ioremap() in fs_init() can fail, so its return value should be checked. Reported-by: TOTE Robot <oslab@tsinghua.edu.cn> Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-28net: sparx5: Add #include to remove warningCasper Andersson1-0/+2
main.h uses NUM_TARGETS from main_regs.h, but the missing include never causes any errors because everywhere main.h is (currently) included, main_regs.h is included before. But since it is dependent on main_regs.h it should always be included. Signed-off-by: Casper Andersson <casper.casan@gmail.com> Reviewed-by: Joacim Zetterling <joacim.zetterling@westermo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-28net/smc: Call trace_smc_tx_sendmsg when data corkedTony Lu1-9/+8
This also calls trace_smc_tx_sendmsg() even if data is corked. For ease of understanding, if statements are not expanded here. Link: https://lore.kernel.org/all/f4166712-9a1e-51a0-409d-b7df25a66c52@linux.ibm.com/ Fixes: 139653bc6635 ("net/smc: Remove corked dealyed work") Suggested-by: Stefan Raspl <raspl@linux.ibm.com> Signed-off-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-28net/smc: Fix cleanup when register ULP failsTony Lu1-1/+3
This patch calls smc_ib_unregister_client() when tcp_register_ulp() fails, and make sure to clean it up. Fixes: d7cd421da9da ("net/smc: Introduce TCP ULP support") Signed-off-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-28Merge branch 'flow_offload-tc-police-parameters'David S. Miller14-38/+454
Jianbo Liu says: ==================== flow_offload: add tc police parameters As a preparation for more advanced police offload in mlx5 (e.g., jumping to another chain when bandwidth is not exceeded), extend the flow offload API with more tc-police parameters. Adjust existing drivers to reject unsupported configurations. Changes since v2: * Rename index to extval in exceed and notexceed acts. * Add policer validate functions for all drivers. Changes since v1: * Add one more strict validation for the control of drop/ok. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-28flow_offload: reject offload for all drivers with invalid police parametersJianbo Liu12-38/+369
As more police parameters are passed to flow_offload, driver can check them to make sure hardware handles packets in the way indicated by tc. The conform-exceed control should be drop/pipe or drop/ok. Besides, for drop/ok, the police should be the last action. As hardware can't configure peakrate/avrate/overhead, offload should not be supported if any of them is configured. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-28net: flow_offload: add tc police action parametersJianbo Liu3-0/+85
The current police offload action entry is missing exceed/notexceed actions and parameters that can be configured by tc police action. Add the missing parameters as a pre-step for offloading police actions to hardware. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-28net: ipv6: ensure we call ipv6_mc_down() at most oncej.nixdorf@avm.de1-2/+6
There are two reasons for addrconf_notify() to be called with NETDEV_DOWN: either the network device is actually going down, or IPv6 was disabled on the interface. If either of them stays down while the other is toggled, we repeatedly call the code for NETDEV_DOWN, including ipv6_mc_down(), while never calling the corresponding ipv6_mc_up() in between. This will cause a new entry in idev->mc_tomb to be allocated for each multicast group the interface is subscribed to, which in turn leaks one struct ifmcaddr6 per nontrivial multicast group the interface is subscribed to. The following reproducer will leak at least $n objects: ip addr add ff2e::4242/32 dev eth0 autojoin sysctl -w net.ipv6.conf.eth0.disable_ipv6=1 for i in $(seq 1 $n); do ip link set up eth0; ip link set down eth0 done Joining groups with IPV6_ADD_MEMBERSHIP (unprivileged) or setting the sysctl net.ipv6.conf.eth0.forwarding to 1 (=> subscribing to ff02::2) can also be used to create a nontrivial idev->mc_list, which will the leak objects with the right up-down-sequence. Based on both sources for NETDEV_DOWN events the interface IPv6 state should be considered: - not ready if the network interface is not ready OR IPv6 is disabled for it - ready if the network interface is ready AND IPv6 is enabled for it The functions ipv6_mc_up() and ipv6_down() should only be run when this state changes. Implement this by remembering when the IPv6 state is ready, and only run ipv6_mc_down() if it actually changed from ready to not ready. The other direction (not ready -> ready) already works correctly, as: - the interface notification triggered codepath for NETDEV_UP / NETDEV_CHANGE returns early if ipv6 is disabled, and - the disable_ipv6=0 triggered codepath skips fully initializing the interface as long as addrconf_link_ready(dev) returns false - calling ipv6_mc_up() repeatedly does not leak anything Fixes: 3ce62a84d53c ("ipv6: exit early in addrconf_notify() if IPv6 is disabled") Signed-off-by: Johannes Nixdorf <j.nixdorf@avm.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-28efivars: Respect "block" flag in efivar_entry_set_safe()Jann Horn1-1/+4
When the "block" flag is false, the old code would sometimes still call check_var_size(), which wrongly tells ->query_variable_store() that it can block. As far as I can tell, this can't really materialize as a bug at the moment, because ->query_variable_store only does something on X86 with generic EFI, and in that configuration we always take the efivar_entry_set_nonblocking() path. Fixes: ca0e30dcaa53 ("efi: Add nonblocking option to efi_query_variable_store()") Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20220218180559.1432559-1-jannh@google.com
2022-02-28riscv/efi_stub: Fix get_boot_hartid_from_fdt() return valueSunil V L1-7/+10
The get_boot_hartid_from_fdt() function currently returns U32_MAX for failure case which is not correct because U32_MAX is a valid hartid value. This patch fixes the issue by returning error code. Cc: <stable@vger.kernel.org> Fixes: d7071743db31 ("RISC-V: Add EFI stub support.") Signed-off-by: Sunil V L <sunilvl@ventanamicro.com> Reviewed-by: Heinrich Schuchardt <heinrich.schuchardt@canonical.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2022-02-28Linux 5.17-rc6Linus Torvalds1-1/+1
2022-02-28Merge tag 'irq-urgent-2022-02-27' of ↵Linus Torvalds1-7/+7
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fix from Thomas Gleixner: "A single fix for a regression caused by the recent PCI/MSI rework which resulted in a recursive locking problem in the VMD driver. The cure is to cache the relevant information upfront instead of retrieving it at runtime" * tag 'irq-urgent-2022-02-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: PCI: vmd: Prevent recursive locking on interrupt allocation
2022-02-27Merge tag 'dma-mapping-5.17-1' of git://git.infradead.org/users/hch/dma-mappingLinus Torvalds3-1/+18
Pull dma-mapping fix from Christoph Hellwig: - fix a swiotlb info leak (Halil Pasic) * tag 'dma-mapping-5.17-1' of git://git.infradead.org/users/hch/dma-mapping: swiotlb: fix info leak with DMA_FROM_DEVICE
2022-02-27Merge tag 'pinctrl-v5-17-3' of ↵Linus Torvalds4-8/+13
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pin control fixes from Linus Walleij: - Fix some drive strength and pull-up code in the K210 driver. - Add the Alder Lake-M ACPI ID so it starts to work properly. - Use a static name for the StarFive GPIO irq_chip, forestalling an upcoming fixes series from Marc Zyngier. - Fix an ages old bug in the Tegra 186 driver where we were indexing at random into struct and being lucky getting the right member. * tag 'pinctrl-v5-17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: gpio: tegra186: Fix chip_data type confusion pinctrl: starfive: Use a static name for the GPIO irq_chip pinctrl: tigerlake: Revert "Add Alder Lake-M ACPI ID" pinctrl: k210: Fix bias-pull-up pinctrl: fix loop in k210_pinconf_get_drive()
2022-02-27Merge branch 'dsa-fdb-isolation'David S. Miller31-510/+925
Vladimir Oltean says: ==================== DSA FDB isolation There are use cases which need FDB isolation between standalone ports and bridged ports, as well as isolation between ports of different bridges. Most of these use cases are a result of the fact that packets can now be partially forwarded by the software bridge, so one port might need to send a packet to the CPU but its FDB lookup will see that it can forward it directly to a bridge port where that packet was autonomously learned. So the source port will attempt to shortcircuit the CPU and forward autonomously, which it can't due to the forwarding isolation we have in place. So we will have packet drops instead of proper operation. Additionally, before DSA can implement IFF_UNICAST_FLT for standalone ports, we must have control over which database we install FDB entries corresponding to port MAC addresses in. We don't want to hinder the operation of the bridging layer. DSA does not have a driver API that encourages FDB isolation, so this needs to be created. The basis for this is a new struct dsa_db which annotates each FDB and MDB entry with the database it belongs to. The sja1105 and felix drivers are modified to observe the dsa_db argument, and therefore, enforce the FDB isolation. Compared to the previous RFC patch series from August: https://patchwork.kernel.org/project/netdevbpf/cover/20210818120150.892647-1-vladimir.oltean@nxp.com/ what is different is that I stopped trying to make SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE blocking, instead I'm making use of the fact that DSA waits for switchdev FDB work items to finish before a port leaves the bridge. This is possible since: https://patchwork.kernel.org/project/netdevbpf/patch/20211024171757.3753288-7-vladimir.oltean@nxp.com/ Additionally, v2 is also rebased over the DSA LAG FDB work. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-27net: mscc: ocelot: enforce FDB isolation when VLAN-unawareVladimir Oltean6-56/+317
Currently ocelot uses a pvid of 0 for standalone ports and ports under a VLAN-unaware bridge, and the pvid of the bridge for ports under a VLAN-aware bridge. Standalone ports do not perform learning, but packets received on them are still subject to FDB lookups. So if the MAC DA that a standalone port receives has been also learned on a VLAN-unaware bridge port, ocelot will attempt to forward to that port, even though it can't, so it will drop packets. So there is a desire to avoid that, and isolate the FDBs of different bridges from one another, and from standalone ports. The ocelot switch library has two distinct entry points: the felix DSA driver and the ocelot switchdev driver. We need to code up a minimal bridge_num allocation in the ocelot switchdev driver too, this is copied from DSA with the exception that ocelot does not care about DSA trees, cross-chip bridging etc. So it only looks at its own ports that are already in the same bridge. The ocelot switchdev driver uses the bridge_num it has allocated itself, while the felix driver uses the bridge_num allocated by DSA. They are both stored inside ocelot_port->bridge_num by the common function ocelot_port_bridge_join() which receives the bridge_num passed by value. Once we have a bridge_num, we can only use it to enforce isolation between VLAN-unaware bridges. As far as I can see, ocelot does not have anything like a FID that further makes VLAN 100 from a port be different to VLAN 100 from another port with regard to FDB lookup. So we simply deny multiple VLAN-aware bridges. For VLAN-unaware bridges, we crop the 4000-4095 VLAN region and we allocate a VLAN for each bridge_num. This will be used as the pvid of each port that is under that VLAN-unaware bridge, for as long as that bridge is VLAN-unaware. VID 0 remains only for standalone ports. It is okay if all standalone ports use the same VID 0, since they perform no address learning, the FDB will contain no entry in VLAN 0, so the packets will always be flooded to the only possible destination, the CPU port. The CPU port module doesn't need to be member of the VLANs to receive packets, but if we use the DSA tag_8021q protocol, those packets are part of the data plane as far as ocelot is concerned, so there it needs to. Just ensure that the DSA tag_8021q CPU port is a member of all reserved VLANs when it is created, and is removed when it is deleted. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-27net: dsa: sja1105: enforce FDB isolationVladimir Oltean1-29/+30
For sja1105, to enforce FDB isolation simply means to turn on Independent VLAN Learning unconditionally, and to remap VLAN-unaware FDB and MDB entries towards the private VLAN allocated by tag_8021q for each bridge. Standalone ports each have their own standalone tag_8021q VLAN. No learning happens in that VLAN due to: - learning being disabled on standalone user ports - learning being disabled on the CPU port (we use assisted_learning_on_cpu_port which only installs bridge FDBs) VLAN-aware ports learn FDB entries with the bridge VLANs. VLAN-unaware bridge ports learn with the tag_8021q VLAN for bridging. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-27net: dsa: pass extack to .port_bridge_join driver methodsVladimir Oltean19-20/+40
As FDB isolation cannot be enforced between VLAN-aware bridges in lack of hardware assistance like extra FID bits, it seems plausible that many DSA switches cannot do it. Therefore, they need to reject configurations with multiple VLAN-aware bridges from the two code paths that can transition towards that state: - joining a VLAN-aware bridge - toggling VLAN awareness on an existing bridge The .port_vlan_filtering method already propagates the netlink extack to the driver, let's propagate it from .port_bridge_join too, to make sure that the driver can use the same function for both. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-27net: dsa: request drivers to perform FDB isolationVladimir Oltean17-92/+280
For DSA, to encourage drivers to perform FDB isolation simply means to track which bridge does each FDB and MDB entry belong to. It then becomes the driver responsibility to use something that makes the FDB entry from one bridge not match the FDB lookup of ports from other bridges. The top-level functions where the bridge is determined are: - dsa_port_fdb_{add,del} - dsa_port_host_fdb_{add,del} - dsa_port_mdb_{add,del} - dsa_port_host_mdb_{add,del} aka the pre-crosschip-notifier functions. Changing the API to pass a reference to a bridge is not superfluous, and looking at the passed bridge argument is not the same as having the driver look at dsa_to_port(ds, port)->bridge from the ->port_fdb_add() method. DSA installs FDB and MDB entries on shared (CPU and DSA) ports as well, and those do not have any dp->bridge information to retrieve, because they are not in any bridge - they are merely the pipes that serve the user ports that are in one or multiple bridges. The struct dsa_bridge associated with each FDB/MDB entry is encapsulated in a larger "struct dsa_db" database. Although only databases associated to bridges are notified for now, this API will be the starting point for implementing IFF_UNICAST_FLT in DSA. There, the idea is to install FDB entries on the CPU port which belong to the corresponding user port's port database. These are supposed to match only when the port is standalone. It is better to introduce the API in its expected final form than to introduce it for bridges first, then to have to change drivers which may have made one or more assumptions. Drivers can use the provided bridge.num, but they can also use a different numbering scheme that is more convenient. DSA must perform refcounting on the CPU and DSA ports by also taking into account the bridge number. So if two bridges request the same local address, DSA must notify the driver twice, once for each bridge. In fact, if the driver supports FDB isolation, DSA must perform refcounting per bridge, but if the driver doesn't, DSA must refcount host addresses across all bridges, otherwise it would be telling the driver to delete an FDB entry for a bridge and the driver would delete it for all bridges. So introduce a bool fdb_isolation in drivers which would make all bridge databases passed to the cross-chip notifier have the same number (0). This makes dsa_mac_addr_find() -> dsa_db_equal() say that all bridge databases are the same database - which is essentially the legacy behavior. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-27net: dsa: tag_8021q: rename dsa_8021q_bridge_tx_fwd_offload_vidVladimir Oltean4-7/+7
The dsa_8021q_bridge_tx_fwd_offload_vid is no longer used just for bridge TX forwarding offload, it is the private VLAN reserved for VLAN-unaware bridging in a way that is compatible with FDB isolation. So just rename it dsa_tag_8021q_bridge_vid. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-27net: dsa: tag_8021q: merge RX and TX VLANsVladimir Oltean7-193/+115
In the old Shared VLAN Learning mode of operation that tag_8021q previously used for forwarding, we needed to have distinct concepts for an RX and a TX VLAN. An RX VLAN could be installed on all ports that were members of a given bridge, so that autonomous forwarding could still work, while a TX VLAN was dedicated for precise packet steering, so it just contained the CPU port and one egress port. Now that tag_8021q uses Independent VLAN Learning and imprecise RX/TX all over, those lines have been blurred and we no longer have the need to do precise TX towards a port that is in a bridge. As for standalone ports, it is fine to use the same VLAN ID for both RX and TX. This patch changes the tag_8021q format by shifting the VLAN range it reserves, and halving it. Previously, our DIR bits were encoding the VLAN direction (RX/TX) and were set to either 1 or 2. This meant that tag_8021q reserved 2K VLANs, or 50% of the available range. Change the DIR bits to a hardcoded value of 3 now, which makes tag_8021q reserve only 1K VLANs, and a different range now (the last 1K). This is done so that we leave the old format in place in case we need to return to it. In terms of code, the vid_is_dsa_8021q_rxvlan and vid_is_dsa_8021q_txvlan functions go away. Any vid_is_dsa_8021q is both a TX and an RX VLAN, and they are no longer distinct. For example, felix which did different things for different VLAN types, now needs to handle the RX and the TX logic for the same VLAN. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-27net: dsa: felix: delete workarounds present due to SVL tag_8021q bridgingVladimir Oltean1-17/+2
The felix driver, which also has a tagging protocol implementation based on tag_8021q, does not care about adding the RX VLAN that is pvid on one port on the other ports that are in the same bridge with it. It simply doesn't need that, because in its implementation, the RX VLAN that is pvid of a port is only used to install a TCAM rule that pushes that VLAN ID towards the CPU port. Now that tag_8021q no longer performs Shared VLAN Learning based forwarding, the RX VLANs are actually segregated into two types: standalone VLANs and VLAN-unaware bridging VLANs. Since you actually have to call dsa_tag_8021q_bridge_join() to get a bridging VLAN from tag_8021q, and felix does not do that because it doesn't need it, it means that it only gets standalone port VLANs from tag_8021q. Which is perfect because this means it can drop its workarounds that avoid the VLANs it does not need. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-27docs: net: dsa: sja1105: document limitations of tc-flower rule VLAN awarenessVladimir Oltean1-0/+27
After change "net: dsa: tag_8021q: replace the SVL bridging with VLAN-unaware IVL bridging", tag_8021q enforces two different pvids on a port, depending on whether it is standalone or in a VLAN-unaware bridge. Up until now, there was a single pvid, represented by dsa_tag_8021q_rx_vid(), and that was used as the VLAN for VLAN-unaware virtual link rules, regardless of whether the port was bridged or standalone. To keep VLAN-unaware virtual links working, we need to follow whether the port is in a bridge or not, and update the VLAN ID from those rules. In fact we can't fully do that. Depending on whether the switch is VLAN-aware or not, we can accept Virtual Link rules with just the MAC DA, or with a MAC DA and a VID. So we already deny changes to the VLAN awareness of the switch. But the VLAN awareness may also change as a result of joining or leaving a bridge. One might say we could just allow the following: a port may leave a VLAN-unaware bridge while it has VLAN-unaware VL (tc-flower) rules, and the driver will update those with the new tag_8021q pvid for standalone mode, but the driver won't accept joining a bridge at all while VL rules were installed in standalone mode. This is sort of a compromise made because leaving a bridge is an operation that cannot be vetoed. But this sort of setup change is not fully supported, either: as mentioned, VLAN filtering changes can also be triggered by leaving a bridge, therefore, the existing veto we have in place for turning VLAN filtering off with VLAN-aware VL rules active still isn't fully effective. I really don't know how to deal with this in a way that produces predictable behavior for user space. Since at the moment, keeping this feature fully functional on constellation changes (not changing the tag_8021q port pvid when joining a bridge) is blocking progress for the DSA FDB isolation, I'd rather document it as a (potentially temporary) limitation and go on without it. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-27net: dsa: tag_8021q: add support for imprecise RX based on the VBIDVladimir Oltean4-13/+55
The sja1105 switch can't populate the PORT field of the tag_8021q header when sending a frame to the CPU with a non-zero VBID. Similar to dsa_find_designated_bridge_port_by_vid() which performs imprecise RX for VLAN-aware bridges, let's introduce a helper in tag_8021q for performing imprecise RX based on the VLAN that it has allocated for a VLAN-unaware bridge. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-27net: dsa: tag_8021q: replace the SVL bridging with VLAN-unaware IVL bridgingVladimir Oltean6-101/+70
For VLAN-unaware bridging, tag_8021q uses something perhaps a bit too tied with the sja1105 switch: each port uses the same pvid which is also used for standalone operation (a unique one from which the source port and device ID can be retrieved when packets from that port are forwarded to the CPU). Since each port has a unique pvid when performing autonomous forwarding, the switch must be configured for Shared VLAN Learning (SVL) such that the VLAN ID itself is ignored when performing FDB lookups. Without SVL, packets would always be flooded, since FDB lookup in the source port's VLAN would never find any entry. First of all, to make tag_8021q more palatable to switches which might not support Shared VLAN Learning, let's just use a common VLAN for all ports that are under the same bridge. Secondly, using Shared VLAN Learning means that FDB isolation can never be enforced. But if all ports under the same VLAN-unaware bridge share the same VLAN ID, it can. The disadvantage is that the CPU port can no longer perform precise source port identification for these packets. But at least we have a mechanism which has proven to be adequate for that situation: imprecise RX (dsa_find_designated_bridge_port_by_vid), which is what we use for termination on VLAN-aware bridges. The VLAN ID that VLAN-unaware bridges will use with tag_8021q is the same one as we were previously using for imprecise TX (bridge TX forwarding offload). It is already allocated, it is just a matter of using it. Note that because now all ports under the same bridge share the same VLAN, the complexity of performing a tag_8021q bridge join decreases dramatically. We no longer have to install the RX VLAN of a newly joining port into the port membership of the existing bridge ports. The newly joining port just becomes a member of the VLAN corresponding to that bridge, and the other ports are already members of it from when they joined the bridge themselves. So forwarding works properly. This means that we can unhook dsa_tag_8021q_bridge_{join,leave} from the cross-chip notifier level dsa_switch_bridge_{join,leave}. We can put these calls directly into the sja1105 driver. With this new mode of operation, a port controlled by tag_8021q can have two pvids whereas before it could only have one. The pvid for standalone operation is different from the pvid used for VLAN-unaware bridging. This is done, again, so that FDB isolation can be enforced. Let tag_8021q manage this by deleting the standalone pvid when a port joins a bridge, and restoring it when it leaves it. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-27Merge branch 'FFungible-ethernet-driver'David S. Miller26-0/+8776
Dimitris Michailidis says: ==================== new Fungible Ethernet driver This patch series contains a new network driver for the Ethernet functionality of Fungible cards. It contains two modules. The first one in patch 2 is a library module that implements some of the device setup, queue managenent, and support for operating an admin queue. These are placed in a separate module because the cards provide a number of PCI functions handled by different types of drivers and all use the same common means to interact with the device. Each of the drivers will be relying on this library module for them. The remaining patches provide the Ethernet driver for the cards. v2: - Fix set_pauseparam, remove get_wol, remove module param (Andrew Lunn) - Fix a register poll loop (Andrew) - Replace constants defined with 'static const' - make W=1 C=1 is clean - Remove devlink FW update (Jakub) - Remove duplicate ethtool stats covered by structured API (Jakub) v3: - Make TLS stats unconditional (Andrew) - Remove inline from .c (Andrew) - Replace some ifdef with IS_ENABLED (Andrew) - Fix build failure on 32b arches (build robot) - Fix build issue with make O= (Jakub) v4: - Fix for newer bpf_warn_invalid_xdp_action() (Jakub) - Remove 32b dma_set_mask_and_coherent() v5: - Make XDP enter/exit non-disruptive to active traffic - Remove dormant port state - Style fixes, unused stuff removal (Jakub) v6: - When changing queue depth or numbers allocate the new queues before shutting down the existing ones (Jakub) v7: - Convert IRQ bookeeping to use XArray. - Changes to the numbers of Tx/Rx queues are now incremental and do not disrupt ongoing traffic. - Implement .ndo_eth_ioctl instead of .ndo_do_ioctl. - Replace deprecated irq_set_affinity_hint. - Remove TLS 1.3 support (Jakub) - Remove hwtstamp_config.flags check (Jakub) - Add locking in SR-IOV enable/disable. (Jakub) v8: - Remove dropping of <33B packets and the associated counter (Jakub) - Report CQE size. - Show last MAC stats when the netdev isn't running (Andrew) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-27net/fungible: Kconfig, Makefiles, and MAINTAINERSDimitris Michailidis7-0/+69
Hook up the new driver to configuration and build. Signed-off-by: Dimitris Michailidis <dmichail@fungible.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-27net/funeth: add kTLS TX control partDimitris Michailidis2-0/+186
This provides the control pieces for kTLS Tx offload, implementinng the offload operations. Signed-off-by: Dimitris Michailidis <dmichail@fungible.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-27net/funeth: add the data pathDimitris Michailidis4-0/+1969
Add the driver's data path. Tx handles skbs, XDP, and kTLS, Rx has skbs and XDP. Also included are Rx and Tx queue creation/tear-down and tracing. Signed-off-by: Dimitris Michailidis <dmichail@fungible.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-27net/funeth: devlink supportDimitris Michailidis2-0/+53
The devlink part, which is minimal at this time giving just the driver name. Signed-off-by: Dimitris Michailidis <dmichail@fungible.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-27net/funeth: ethtool operationsDimitris Michailidis2-0/+1259
Add ethtool operations, primarily related to queues and ports, as well as device statistics. Signed-off-by: Dimitris Michailidis <dmichail@fungible.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-27net/funeth: probing and netdev opsDimitris Michailidis2-0/+2262
This is the first part of the Fungible ethernet driver. It deals with device probing, net_device creation, and netdev ops. Signed-off-by: Dimitris Michailidis <dmichail@fungible.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-27net/fungible: Add service module for Fungible driversDimitris Michailidis6-0/+2976
Fungible cards have a number of different PCI functions and thus different drivers, all of which use a common method to initialize and interact with the device. This commit adds a library module that collects these common mechanisms. They mainly deal with device initialization, setting up and destroying queues, and operating an admin queue. A subset of the FW interface is also included here. Signed-off-by: Dimitris Michailidis <dmichail@fungible.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-27PCI: Add Fungible Vendor ID to pci_ids.hDimitris Michailidis1-0/+2
Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: linux-pci@vger.kernel.org Signed-off-by: Dimitris Michailidis <dmichail@fungible.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-26Merge tag 'trace-v5.17-rc4' of ↵Linus Torvalds18-79/+148
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: - rtla (Real-Time Linux Analysis tool): - fix typo in man page - Update API -e to -E before it is released - Error message fix and memory leak fix - Partially uninline trace event soft disable to shrink text - Fix function graph start up test - Have triggers affect the trace instance they are in and not top level - Have osnoise sleep in the units it says it uses - Remove unused ftrace stub function - Remove event probe redundant info from event in the buffer - Fix group ownership setting in tracefs - Ensure trace buffer is minimum size to prevent crashes * tag 'trace-v5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: rtla/osnoise: Fix error message when failing to enable trace instance rtla/osnoise: Free params at the exit rtla/hist: Make -E the short version of --entries tracing: Fix selftest config check for function graph start up test tracefs: Set the group ownership in apply_options() not parse_options() tracing/osnoise: Make osnoise_main to sleep for microseconds ftrace: Remove unused ftrace_startup_enable() stub tracing: Ensure trace buffer is at least 4096 bytes large tracing: Uninline trace_trigger_soft_disabled() partly eprobes: Remove redundant event type information tracing: Have traceon and traceoff trigger honor the instance tracing: Dump stacktrace trigger to the corresponding instance rtla: Fix systme -> system typo on man page
2022-02-26Merge tag 'fixes-2022-02-26' of ↵Linus Torvalds1-2/+8
git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock Pull memblock fix from Mike Rapoport: "Use kfree() to release kmalloced memblock regions memblock.{reserved,memory}.regions may be allocated using kmalloc() in memblock_double_array(). Use kfree() to release these kmalloced regions" * tag 'fixes-2022-02-26' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock: memblock: use kfree() to release kmalloced memblock regions
2022-02-26Merge branch 'akpm' (patches from Andrew)Linus Torvalds7-20/+56
Merge misc fixes from Andrew Morton: "12 patches. Subsystems affected by this patch series: MAINTAINERS, mailmap, memfd, and mm (hugetlb, kasan, hugetlbfs, pagemap, selftests, memcg, and slab)" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: selftests/memfd: clean up mapping in mfd_fail_write mailmap: update Roman Gushchin's email MAINTAINERS, SLAB: add Roman as reviewer, git tree MAINTAINERS: add Shakeel as a memcg co-maintainer MAINTAINERS: remove Vladimir from memcg maintainers MAINTAINERS: add Roman as a memcg co-maintainer selftest/vm: fix map_fixed_noreplace test failure mm: fix use-after-free bug when mm->mmap is reused after being freed hugetlbfs: fix a truncation issue in hugepages parameter kasan: test: prevent cache merging in kmem_cache_double_destroy mm/hugetlb: fix kernel crash with hugetlb mremap MAINTAINERS: add sysctl-next git tree
2022-02-26Merge tag 'riscv-for-linus-5.17-rc6' of ↵Linus Torvalds5-6/+46
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Palmer Dabbelt: - A fix for the K210 sdcard defconfig, to avoid using a fixed delay for the root FS - A fix to make sure there's a proper call frame for trace_hardirqs_{on,off}(). * tag 'riscv-for-linus-5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: fix oops caused by irqsoff latency tracer riscv: fix nommu_k210_sdcard_defconfig
2022-02-26Merge tag 'xfs-5.17-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds1-2/+5
Pull xfs fixes from Darrick Wong: "Nothing exciting, just more fixes for not returning sync_filesystem error values (and eliding it when it's not necessary). Summary: - Only call sync_filesystem when we're remounting the filesystem readonly readonly, and actually check its return value" * tag 'xfs-5.17-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: only bother with sync_filesystem during readonly remount
2022-02-26selftests/memfd: clean up mapping in mfd_fail_writeMike Kravetz1-0/+1
Running the memfd script ./run_hugetlbfs_test.sh will often end in error as follows: memfd-hugetlb: CREATE memfd-hugetlb: BASIC memfd-hugetlb: SEAL-WRITE memfd-hugetlb: SEAL-FUTURE-WRITE memfd-hugetlb: SEAL-SHRINK fallocate(ALLOC) failed: No space left on device ./run_hugetlbfs_test.sh: line 60: 166855 Aborted (core dumped) ./memfd_test hugetlbfs opening: ./mnt/memfd fuse: DONE If no hugetlb pages have been preallocated, run_hugetlbfs_test.sh will allocate 'just enough' pages to run the test. In the SEAL-FUTURE-WRITE test the mfd_fail_write routine maps the file, but does not unmap. As a result, two hugetlb pages remain reserved for the mapping. When the fallocate call in the SEAL-SHRINK test attempts allocate all hugetlb pages, it is short by the two reserved pages. Fix by making sure to unmap in mfd_fail_write. Link: https://lkml.kernel.org/r/20220219004340.56478-1-mike.kravetz@oracle.com Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-02-26mailmap: update Roman Gushchin's emailRoman Gushchin1-0/+3
I'm moving to a @linux.dev account. Map my old addresses. Link: https://lkml.kernel.org/r/20220221200006.416377-1-roman.gushchin@linux.dev Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-02-26MAINTAINERS, SLAB: add Roman as reviewer, git treeVlastimil Babka1-0/+2
The slab code has an overlap with kmem accounting, where Roman has done a lot of work recently and it would be useful to make sure he's CC'd on patches that potentially affect it. Thus add him as a reviewer for the SLAB subsystem. Also while at it, add the link to slab git tree. Link: https://lkml.kernel.org/r/20220222103104.13241-1-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-02-26MAINTAINERS: add Shakeel as a memcg co-maintainerShakeel Butt1-0/+1
I have been contributing and reviewing to the memcg codebase for last couple of years. So, making it official. Link: https://lkml.kernel.org/r/20220224060148.4092228-1-shakeelb@google.com Signed-off-by: Shakeel Butt <shakeelb@google.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-02-26MAINTAINERS: remove Vladimir from memcg maintainersVladimir Davydov1-1/+0
Link: https://lkml.kernel.org/r/4ad1f8da49d7b71c84a0c15bd5347f5ce704e730.1645608825.git.vdavydov.dev@gmail.com Signed-off-by: Vladimir Davydov <vdavydov.dev@gmail.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-02-26MAINTAINERS: add Roman as a memcg co-maintainerRoman Gushchin1-0/+1
Add myself as a memcg co-maintainer. My primary focus over last few years was the kernel memory accounting stack, but I do work on some other parts of the memory controller as well. Link: https://lkml.kernel.org/r/20220221233951.659048-1-roman.gushchin@linux.dev Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-02-26selftest/vm: fix map_fixed_noreplace test failureAneesh Kumar K.V1-12/+37
On the latest RHEL the test fails due to executable mapped at 256MB address # ./map_fixed_noreplace mmap() @ 0x10000000-0x10050000 p=0xffffffffffffffff result=File exists 10000000-10010000 r-xp 00000000 fd:04 34905657 /root/rpmbuild/BUILD/kernel-5.14.0-56.el9/linux-5.14.0-56.el9.ppc64le/tools/testing/selftests/vm/map_fixed_noreplace 10010000-10020000 r--p 00000000 fd:04 34905657 /root/rpmbuild/BUILD/kernel-5.14.0-56.el9/linux-5.14.0-56.el9.ppc64le/tools/testing/selftests/vm/map_fixed_noreplace 10020000-10030000 rw-p 00010000 fd:04 34905657 /root/rpmbuild/BUILD/kernel-5.14.0-56.el9/linux-5.14.0-56.el9.ppc64le/tools/testing/selftests/vm/map_fixed_noreplace 10029b90000-10029bc0000 rw-p 00000000 00:00 0 [heap] 7fffbb510000-7fffbb750000 r-xp 00000000 fd:04 24534 /usr/lib64/libc.so.6 7fffbb750000-7fffbb760000 r--p 00230000 fd:04 24534 /usr/lib64/libc.so.6 7fffbb760000-7fffbb770000 rw-p 00240000 fd:04 24534 /usr/lib64/libc.so.6 7fffbb780000-7fffbb7a0000 r--p 00000000 00:00 0 [vvar] 7fffbb7a0000-7fffbb7b0000 r-xp 00000000 00:00 0 [vdso] 7fffbb7b0000-7fffbb800000 r-xp 00000000 fd:04 24514 /usr/lib64/ld64.so.2 7fffbb800000-7fffbb810000 r--p 00040000 fd:04 24514 /usr/lib64/ld64.so.2 7fffbb810000-7fffbb820000 rw-p 00050000 fd:04 24514 /usr/lib64/ld64.so.2 7fffd93f0000-7fffd9420000 rw-p 00000000 00:00 0 [stack] Error: couldn't map the space we need for the test Fix this by finding a free address using mmap instead of hardcoding BASE_ADDRESS. Link: https://lkml.kernel.org/r/20220217083417.373823-1-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Jann Horn <jannh@google.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-02-26mm: fix use-after-free bug when mm->mmap is reused after being freedSuren Baghdasaryan1-0/+1
oom reaping (__oom_reap_task_mm) relies on a 2 way synchronization with exit_mmap. First it relies on the mmap_lock to exclude from unlock path[1], page tables tear down (free_pgtables) and vma destruction. This alone is not sufficient because mm->mmap is never reset. For historical reasons[2] the lock is taken there is also MMF_OOM_SKIP set for oom victims before. The oom reaper only ever looks at oom victims so the whole scheme works properly but process_mrelease can opearate on any task (with fatal signals pending) which doesn't really imply oom victims. That means that the MMF_OOM_SKIP part of the synchronization doesn't work and it can see a task after the whole address space has been demolished and traverse an already released mm->mmap list. This leads to use after free as properly caught up by KASAN report. Fix the issue by reseting mm->mmap so that MMF_OOM_SKIP synchronization is not needed anymore. The MMF_OOM_SKIP is not removed from exit_mmap yet but it acts mostly as an optimization now. [1] 27ae357fa82b ("mm, oom: fix concurrent munlock and oom reaper unmap, v3") [2] 212925802454 ("mm: oom: let oom_reap_task and exit_mmap run concurrently") [mhocko@suse.com: changelog rewrite] Link: https://lore.kernel.org/all/00000000000072ef2c05d7f81950@google.com/ Link: https://lkml.kernel.org/r/20220215201922.1908156-1-surenb@google.com Fixes: 64591e8605d6 ("mm: protect free_pgtables with mmap_lock write lock in exit_mmap") Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reported-by: syzbot+2ccf63a4bd07cf39cab0@syzkaller.appspotmail.com Suggested-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Rik van Riel <riel@surriel.com> Reviewed-by: Yang Shi <shy828301@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: David Rientjes <rientjes@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Rik van Riel <riel@surriel.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Florian Weimer <fweimer@redhat.com> Cc: Jan Engelhardt <jengelh@inai.de> Cc: Tim Murray <timmurray@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-02-26hugetlbfs: fix a truncation issue in hugepages parameterLiu Yuntao1-2/+2
When we specify a large number for node in hugepages parameter, it may be parsed to another number due to truncation in this statement: node = tmp; For example, add following parameter in command line: hugepagesz=1G hugepages=4294967297:5 and kernel will allocate 5 hugepages for node 1 instead of ignoring it. I move the validation check earlier to fix this issue, and slightly simplifies the condition here. Link: https://lkml.kernel.org/r/20220209134018.8242-1-liuyuntao10@huawei.com Fixes: b5389086ad7be0 ("hugetlbfs: extend the definition of hugepages parameter to support node allocation") Signed-off-by: Liu Yuntao <liuyuntao10@huawei.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-02-26kasan: test: prevent cache merging in kmem_cache_double_destroyAndrey Konovalov1-1/+4
With HW_TAGS KASAN and kasan.stacktrace=off, the cache created in the kmem_cache_double_destroy() test might get merged with an existing one. Thus, the first kmem_cache_destroy() call won't actually destroy it but will only decrease the refcount. This causes the test to fail. Provide an empty constructor for the created cache to prevent the cache from getting merged. Link: https://lkml.kernel.org/r/b597bd434c49591d8af00ee3993a42c609dc9a59.1644346040.git.andreyknvl@google.com Fixes: f98f966cd750 ("kasan: test: add test case for double-kmem_cache_destroy()") Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>