summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2020-06-25Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds22-124/+197
Pull rdma fixes from Jason Gunthorpe: "Several regression fixes from work that landed in the merge window, particularly in the mlx5 driver: - Various static checker and warning fixes - General bug fixes in rvt, qedr, hns, mlx5 and hfi1 - Several regression fixes related to the ECE and QP changes in last cycle - Fixes for a few long standing crashers in CMA, uverbs ioctl, and xrc" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (25 commits) IB/hfi1: Add atomic triggered sleep/wakeup IB/hfi1: Correct -EBUSY handling in tx code IB/hfi1: Fix module use count flaw due to leftover module put calls IB/hfi1: Restore kfree in dummy_netdev cleanup IB/mad: Fix use after free when destroying MAD agent RDMA/mlx5: Protect from kernel crash if XRC_TGT doesn't have udata RDMA/counter: Query a counter before release RDMA/mad: Fix possible memory leak in ib_mad_post_receive_mads() RDMA/mlx5: Fix integrity enabled QP creation RDMA/mlx5: Remove ECE limitation from the RAW_PACKET QPs RDMA/mlx5: Fix remote gid value in query QP RDMA/mlx5: Don't access ib_qp fields in internal destroy QP path RDMA/core: Check that type_attrs is not NULL prior access RDMA/hns: Fix an cmd queue issue when resetting RDMA/hns: Fix a calltrace when registering MR from userspace RDMA/mlx5: Add missed RST2INIT and INIT2INIT steps during ECE handshake RDMA/cma: Protect bind_list and listen_list while finding matching cm id RDMA/qedr: Fix KASAN: use-after-free in ucma_event_handler+0x532 RDMA/efa: Set maximum pkeys device attribute RDMA/rvt: Fix potential memory leak caused by rvt_alloc_rq ...
2020-06-25ptp_pch: use generic power managementVaibhav Gupta1-34/+3
With legacy PM, drivers themselves were responsible for managing the device's power states and takes care of register states. After upgrading to the generic structure, PCI core will take care of required tasks and drivers should do only device-specific operations. In the case of ptp_pch, after removing PCI helper functions, .suspend() and .resume() became empty-body functions. Hence, define them NULL and use dev_pm_ops. Compile-tested only. Signed-off-by: Vaibhav Gupta <vaibhavgupta40@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25tcp: don't ignore ECN CWR on pure ACKDenis Kirjanov1-3/+11
there is a problem with the CWR flag set in an incoming ACK segment and it leads to the situation when the ECE flag is latched forever the following packetdrill script shows what happens: // Stack receives incoming segments with CE set +0.1 <[ect0] . 11001:12001(1000) ack 1001 win 65535 +0.0 <[ce] . 12001:13001(1000) ack 1001 win 65535 +0.0 <[ect0] P. 13001:14001(1000) ack 1001 win 65535 // Stack repsonds with ECN ECHO +0.0 >[noecn] . 1001:1001(0) ack 12001 +0.0 >[noecn] E. 1001:1001(0) ack 13001 +0.0 >[noecn] E. 1001:1001(0) ack 14001 // Write a packet +0.1 write(3, ..., 1000) = 1000 +0.0 >[ect0] PE. 1001:2001(1000) ack 14001 // Pure ACK received +0.01 <[noecn] W. 14001:14001(0) ack 2001 win 65535 // Since CWR was sent, this packet should NOT have ECE set +0.1 write(3, ..., 1000) = 1000 +0.0 >[ect0] P. 2001:3001(1000) ack 14001 // but Linux will still keep ECE latched here, with packetdrill // flagging a missing ECE flag, expecting // >[ect0] PE. 2001:3001(1000) ack 14001 // in the script In the situation above we will continue to send ECN ECHO packets and trigger the peer to reduce the congestion window. To avoid that we can check CWR on pure ACKs received. v3: - Add a sequence check to avoid sending an ACK to an ACK v2: - Adjusted the comment - move CWR check before checking for unacknowledged packets Signed-off-by: Denis Kirjanov <denis.kirjanov@suse.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25net: phy: mscc: avoid skcipher API for single block AES encryptionArd Biesheuvel2-33/+10
The skcipher API dynamically instantiates the transformation object on request that implements the requested algorithm optimally on the given platform. This notion of optimality only matters for cases like bulk network or disk encryption, where performance can be a bottleneck, or in cases where the algorithm itself is not known at compile time. In the mscc case, we are dealing with AES encryption of a single block, and so neither concern applies, and we are better off using the AES library interface, which is lightweight and safe for this kind of use. Note that the scatterlist API does not permit references to buffers that are located on the stack, so the existing code is incorrect in any case, but avoiding the skcipher and scatterlist APIs entirely is the most straight-forward approach to fixing this. Cc: Antoine Tenart <antoine.tenart@bootlin.com> Cc: Andrew Lunn <andrew@lunn.ch> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Heiner Kallweit <hkallweit1@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Fixes: 28c5107aa904e ("net: phy: mscc: macsec support") Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25Merge tag 's390-5.8-3' of ↵Linus Torvalds3-2/+5
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Heiko Carstens: - Fix kernel crash on system call single stepping. - Make sure early program check handler is executed with DAT on to avoid an endless program check loop. - Add __GFP_NOWARN flag to debug feature to avoid user triggerable allocation failure messages. * tag 's390-5.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/debug: avoid kernel warning on too large number of pages s390/kasan: fix early pgm check handler execution s390: fix system call single stepping
2020-06-25Merge tag 'sound-5.8-rc3' of ↵Linus Torvalds35-160/+347
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "A collection of small fixes gathered in the last two weeks. The major changes here are fixes for the recent DPCM regressions found on i.MX and Qualcomm platforms and fixes for resource leaks in ASoC DAI registrations. Other than those are mostly device-specific fixes including the usual USB- and HD-audio quirks, and a fix for syzkaller case and ID updates for new Intel platforms" * tag 'sound-5.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (32 commits) ALSA: usb-audio: Fix OOB access of mixer element list ALSA: usb-audio: add quirk for Samsung USBC Headset (AKG) ALSA: usb-audio: Add registration quirk for Kingston HyperX Cloud Flight S ASoC: rockchip: Fix a reference count leak. ASoC: amd: closing specific instance. ALSA: hda: Intel: add missing PCI IDs for ICL-H, TGL-H and EKL ASoC: hdac_hda: fix memleak with regmap not freed on remove ASoC: SOF: Intel: add PCI IDs for ICL-H and TGL-H ASoC: SOF: Intel: add PCI ID for CometLake-S ASoC: Intel: SOF: merge COMETLAKE_LP and COMETLAKE_H ALSA: hda/realtek: Add mute LED and micmute LED support for HP systems ALSA: usb-audio: Fix potential use-after-free of streams ALSA: hda/realtek - Add quirk for MSI GE63 laptop ASoC: fsl_ssi: Fix bclk calculation for mono channel ASoC: SOF: Intel: hda: Clear RIRB status before reading WP ASoC: rt1015: Update rt1015 default register value according to spec modification. ASoC: qcom: common: set correct directions for dailinks ASoc: q6afe: add support to get port direction ASoC: soc-pcm: fix checks for multi-cpu FE dailinks ASoC: rt5682: Let dai clks be registered whether mclk exists or not ...
2020-06-25net: enetc add tc flower offload flow metering policing actionPo Liu2-12/+172
Flow metering entries in IEEE 802.1Qci is an optional function for a flow filtering module. Flow metering is two rates two buckets and three color marker to policing the frames. This patch only enable one rate one bucket and in color blind mode. Flow metering instance are as specified in the algorithm in MEF 10.3 and in Bandwidth Profile Parameters. They are: a) Flow meter instance identifier. An integer value identifying the flow meter instance. The patch use the police 'index' as thin value. b) Committed Information Rate (CIR), in bits per second. This patch use the 'rate_bytes_ps' represent this value. c) Committed Burst Size (CBS), in octets. This patch use the 'burst' represent this value. d) Excess Information Rate (EIR), in bits per second. e) Excess Burst Size per Bandwidth Profile Flow (EBS), in octets. And plus some other parameters. This patch set EIR/EBS default disable and color blind mode. v1->v2 changes: - Use div_u64() as division replace the '/' report: All errors (new ones prefixed by >>): ld: drivers/net/ethernet/freescale/enetc/enetc_qos.o: in function `enetc_flowmeter_hw_set': >> enetc_qos.c:(.text+0x66): undefined reference to `__udivdi3' Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Po Liu <Po.Liu@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25net: qos: police action add index for tc flower offloadingPo Liu2-0/+2
Hardware device may include more than one police entry. Specifying the action's index make it possible for several tc filters to share the same police action when installing the filters. Propagate this index to device drivers through the flow offload intermediate representation, so that drivers could share a single hardware policer between multiple filters. v1->v2 changes: - Update the commit message suggest by Ido Schimmel <idosch@idosch.org> Signed-off-by: Po Liu <Po.Liu@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25net: enetc: add support max frame size for tc flower offloadPo Liu1-16/+36
Base on the tc flower offload police action add max frame size by the parameter 'mtu'. Tc flower device driver working by the IEEE 802.1Qci stream filter can implement the max frame size filtering. Add it to the current hardware tc flower stearm filter driver. Signed-off-by: Po Liu <Po.Liu@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25net: qos: add tc police offloading action with max frame size limitPo Liu3-0/+12
Current police offloading support the 'burst'' and 'rate_bytes_ps'. Some hardware own the capability to limit the frame size. If the frame size larger than the setting, the frame would be dropped. For the police action itself already accept the 'mtu' parameter in tc command. But not extend to tc flower offloading. So extend 'mtu' to tc flower offloading. Signed-off-by: Po Liu <Po.Liu@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25Merge branch 'net-bcmgenet-use-hardware-padding-of-runt-frames'David S. Miller1-83/+5
Doug Berger says: ==================== net: bcmgenet: use hardware padding of runt frames Now that scatter-gather and tx-checksumming are enabled by default it revealed a packet corruption issue that can occur for very short fragmented packets. When padding these frames to the minimum length it is possible for the non-linear (fragment) data to be added to the end of the linear header in an SKB. Since the number of fragments is read before the padding and used afterward without reloading, the fragment that should have been consumed can be tacked on in place of part of the padding. The third commit in this set corrects this by removing the software padding and allowing the hardware to add the pad bytes if necessary. The first two commits resolve warnings observed by the kbuild test robot and are included here for simplicity of application. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25net: bcmgenet: use hardware padding of runt framesDoug Berger1-5/+3
When commit 474ea9cafc45 ("net: bcmgenet: correctly pad short packets") added the call to skb_padto() it should have been located before the nr_frags parameter was read since that value could be changed when padding packets with lengths between 55 and 59 bytes (inclusive). The use of a stale nr_frags value can cause corruption of the pad data when tx-scatter-gather is enabled. This corruption of the pad can cause invalid checksum computation when hardware offload of tx-checksum is also enabled. Since the original reason for the padding was corrected by commit 7dd399130efb ("net: bcmgenet: fix skb_len in bcmgenet_xmit_single()") we can remove the software padding all together and make use of hardware padding of short frames as long as the hardware also always appends the FCS value to the frame. Fixes: 474ea9cafc45 ("net: bcmgenet: correctly pad short packets") Signed-off-by: Doug Berger <opendmb@gmail.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25net: bcmgenet: use __be16 for htons(ETH_P_IP)Doug Berger1-1/+2
The 16-bit value that holds a short in network byte order should be declared as a restricted big endian type to allow type checks to succeed during assignment. Fixes: 3e370952287c ("net: bcmgenet: add support for ethtool rxnfc flows") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Doug Berger <opendmb@gmail.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25net: bcmgenet: re-remove bcmgenet_hfb_add_filterDoug Berger1-77/+0
This function was originally removed by Baoyou Xie in commit e2072600a241 ("net: bcmgenet: remove unused function in bcmgenet.c") to prevent a build warning. Some of the functions removed by Baoyou Xie are now used for WAKE_FILTER support so his commit was reverted, but this function is still unused and the kbuild test robot dutifully reported the warning. This commit once again removes the remaining unused hfb functions. Fixes: 14da1510fedc ("Revert "net: bcmgenet: remove unused function in bcmgenet.c"") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Doug Berger <opendmb@gmail.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25Merge tag 'erofs-for-5.8-rc3-fixes' of ↵Linus Torvalds1-10/+10
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs fix from Gao Xiang: "Fix a regression which uses potential uninitialized high 32-bit value unexpectedly recently observed with specific compiler options" * tag 'erofs-for-5.8-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: fix partially uninitialized misuse in z_erofs_onlinepage_fixup
2020-06-25selftests: netfilter: add test case for conntrack helper assignmentFlorian Westphal2-1/+176
check that 'nft ... ct helper set <foo>' works: 1. configure ftp helper via nft and assign it to connections on port 2121 2. check with 'conntrack -L' that the next connection has the ftp helper attached to it. Also add a test for auto-assign (old behaviour). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-06-25netfilter: ip6tables: Add a .pre_exit hook in all ip6table_foo.c.David Wilder5-6/+44
Using new helpers ip6t_unregister_table_pre_exit() and ip6t_unregister_table_exit(). Fixes: b9e69e127397 ("netfilter: xtables: don't hook tables by default") Signed-off-by: David Wilder <dwilder@us.ibm.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-06-25netfilter: ip6tables: Split ip6t_unregister_table() into pre_exit and exit ↵David Wilder2-1/+17
helpers. The pre_exit will un-register the underlying hook and .exit will do the table freeing. The netns core does an unconditional synchronize_rcu after the pre_exit hooks insuring no packets are in flight that have picked up the pointer before completing the un-register. Fixes: b9e69e127397 ("netfilter: xtables: don't hook tables by default") Signed-off-by: David Wilder <dwilder@us.ibm.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-06-25netfilter: iptables: Add a .pre_exit hook in all iptable_foo.c.David Wilder5-7/+44
Using new helpers ipt_unregister_table_pre_exit() and ipt_unregister_table_exit(). Fixes: b9e69e127397 ("netfilter: xtables: don't hook tables by default") Signed-off-by: David Wilder <dwilder@us.ibm.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-06-25netfilter: iptables: Split ipt_unregister_table() into pre_exit and exit ↵David Wilder2-1/+20
helpers. The pre_exit will un-register the underlying hook and .exit will do the table freeing. The netns core does an unconditional synchronize_rcu after the pre_exit hooks insuring no packets are in flight that have picked up the pointer before completing the un-register. Fixes: b9e69e127397 ("netfilter: xtables: don't hook tables by default") Signed-off-by: David Wilder <dwilder@us.ibm.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-06-25netfilter: Add MODULE_DESCRIPTION entries to kernel modulesRob Gill41-0/+41
The user tool modinfo is used to get information on kernel modules, including a description where it is available. This patch adds a brief MODULE_DESCRIPTION to netfilter kernel modules (descriptions taken from Kconfig file or code comments) Signed-off-by: Rob Gill <rrobgill@protonmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-06-25netfilter: ipset: fix unaligned atomic accessRussell King1-0/+2
When using ip_set with counters and comment, traffic causes the kernel to panic on 32-bit ARM: Alignment trap: not handling instruction e1b82f9f at [<bf01b0dc>] Unhandled fault: alignment exception (0x221) at 0xea08133c PC is at ip_set_match_extensions+0xe0/0x224 [ip_set] The problem occurs when we try to update the 64-bit counters - the faulting address above is not 64-bit aligned. The problem occurs due to the way elements are allocated, for example: set->dsize = ip_set_elem_len(set, tb, 0, 0); map = ip_set_alloc(sizeof(*map) + elements * set->dsize); If the element has a requirement for a member to be 64-bit aligned, and set->dsize is not a multiple of 8, but is a multiple of four, then every odd numbered elements will be misaligned - and hitting an atomic64_add() on that element will cause the kernel to panic. ip_set_elem_len() must return a size that is rounded to the maximum alignment of any extension field stored in the element. This change ensures that is the case. Fixes: 95ad1f4a9358 ("netfilter: ipset: Fix extension alignment") Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-06-25qed: add missing error test for DBG_STATUS_NO_MATCHING_FRAMING_MODEColin Ian King1-1/+2
The error DBG_STATUS_NO_MATCHING_FRAMING_MODE was added to the enum enum dbg_status however there is a missing corresponding entry for this in the array s_status_str. This causes an out-of-bounds read when indexing into the last entry of s_status_str. Fix this by adding in the missing entry. Addresses-Coverity: ("Out-of-bounds read"). Fixes: 2d22bc8354b1 ("qed: FW 8.42.2.0 debug features") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25Merge branch 'net-phy-call-phy_disable_interrupts-in-phy_init_hw'David S. Miller3-1/+6
Jisheng Zhang says: ==================== net: phy: call phy_disable_interrupts() in phy_init_hw() We face an issue with rtl8211f, a pin is shared between INTB and PMEB, and the PHY Register Accessible Interrupt is enabled by default, so the INTB/PMEB pin is always active in polling mode case. As Heiner pointed out "I was thinking about calling phy_disable_interrupts() in phy_init_hw(), to have a defined init state as we don't know in which state the PHY is if the PHY driver is loaded. We shouldn't assume that it's the chip power-on defaults, BIOS or boot loader could have changed this. Or in case of dual-boot systems the other OS could leave the PHY in whatever state." patch1 makes phy_disable_interrupts() non-static so that it could be used in phy_init_hw() to have a defined init state. patch2 calls phy_disable_interrupts() in phy_init_hw() to have a defined init state. Since v3: - call phy_disable_interrupts() have interrupts disabled first then config_init, thank Florian Since v2: - Don't export phy_disable_interrupts() but just make it non-static Since v1: - EXPORT the correct symbol ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25net: phy: call phy_disable_interrupts() in phy_init_hw()Jisheng Zhang1-0/+4
Call phy_disable_interrupts() in phy_init_hw() to "have a defined init state as we don't know in which state the PHY is if the PHY driver is loaded. We shouldn't assume that it's the chip power-on defaults, BIOS or boot loader could have changed this. Or in case of dual-boot systems the other OS could leave the PHY in whatever state." as pointed out by Heiner. Suggested-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25net: phy: make phy_disable_interrupts() non-staticJisheng Zhang2-1/+2
We face an issue with rtl8211f, a pin is shared between INTB and PMEB, and the PHY Register Accessible Interrupt is enabled by default, so the INTB/PMEB pin is always active in polling mode case. As Heiner pointed out "I was thinking about calling phy_disable_interrupts() in phy_init_hw(), to have a defined init state as we don't know in which state the PHY is if the PHY driver is loaded. We shouldn't assume that it's the chip power-on defaults, BIOS or boot loader could have changed this. Or in case of dual-boot systems the other OS could leave the PHY in whatever state." Make phy_disable_interrupts() non-static so that it could be used in phy_init_hw() to have a defined init state. Suggested-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25net: ethernet: mvneta: Add back interface mode validationSascha Hauer1-3/+19
When writing the serdes configuration register was moved to mvneta_config_interface() the whole code block was removed from mvneta_port_power_up() in the assumption that its only purpose was to write the serdes configuration register. As mentioned by Russell King its purpose was also to check for valid interface modes early so that later in the driver we do not have to care for unexpected interface modes. Add back the test to let the driver bail out early on unhandled interface modes. Fixes: b4748553f53f ("net: ethernet: mvneta: Fix Serdes configuration for SoCs without comphy") Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Reviewed-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25net: ethernet: mvneta: Do not error out in non serdes modesSascha Hauer1-1/+1
In mvneta_config_interface() the RGMII modes are catched by the default case which is an error return. The RGMII modes are valid modes for the driver, so instead of returning an error add a break statement to return successfully. This avoids this warning for non comphy SoCs which use RGMII, like SolidRun Clearfog: WARNING: CPU: 0 PID: 268 at drivers/net/ethernet/marvell/mvneta.c:3512 mvneta_start_dev+0x220/0x23c Fixes: b4748553f53f ("net: ethernet: mvneta: Fix Serdes configuration for SoCs without comphy") Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Reviewed-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25lan743x: Remove duplicated include from lan743x_main.cYueHaibing1-1/+0
Remove duplicated include. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25dsa: Allow forwarding of redirected IGMP trafficDaniel Mack1-3/+34
The driver for Marvell switches puts all ports in IGMP snooping mode which results in all IGMP/MLD frames that ingress on the ports to be forwarded to the CPU only. The bridge code in the kernel can then interpret these frames and act upon them, for instance by updating the mdb in the switch to reflect multicast memberships of stations connected to the ports. However, the IGMP/MLD frames must then also be forwarded to other ports of the bridge so external IGMP queriers can track membership reports, and external multicast clients can receive query reports from foreign IGMP queriers. Currently, this is impossible as the EDSA tagger sets offload_fwd_mark on the skb when it unwraps the tagged frames, and that will make the switchdev layer prevent the skb from egressing on any other port of the same switch. To fix that, look at the To_CPU code in the DSA header and make forwarding of the frame possible for trapped IGMP packets. Introduce some #defines for the frame types to make the code a bit more comprehensive. This was tested on a Marvell 88E6352 variant. Signed-off-by: Daniel Mack <daniel@zonque.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25Merge branch 'net-bridge-fdb-activity-tracking'David S. Miller4-17/+139
Nikolay Aleksandrov says: ==================== net: bridge: fdb activity tracking This set adds extensions needed for EVPN multi-homing proper and efficient mac sync. User-space (e.g. FRR) needs to be able to track non-dynamic entry activity on per-fdb basis depending if a tracked fdb is currently peer active or locally active and needs to be able to add new peer active fdb (static + track + inactive) without refreshing it to get real activity tracking. Patch 02 adds a new NDA attribute - NDA_FDB_EXT_ATTRS to avoid future pollution of NDA attributes by bridge or vxlan. New bridge/vxlan specific fdb attributes are embedded in NDA_FDB_EXT_ATTRS, which is used in patch 03 to pass the new NFEA_ACTIVITY_NOTIFY attribute which controls if an fdb should be tracked and also reflects its current state when dumping. It is treated as a bitfield, current valid bits are: 1 - mark an entry for activity tracking 2 - mark an entry as inactive to avoid multiple notifications and reflect state properly Patch 04 adds the ability to avoid refreshing an entry when changing it via the NFEA_DONT_REFRESH flag. That allows user-space to mark a static entry for tracking and keep its real activity unchanged. The set has been extensively tested with FRR and those changes will be upstreamed if/after it gets accepted. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25net: bridge: add a flag to avoid refreshing fdb when changing/addingNikolay Aleksandrov2-1/+5
When we modify or create a new fdb entry sometimes we want to avoid refreshing its activity in order to track it properly. One example is when a mac is received from EVPN multi-homing peer by FRR, which doesn't want to change local activity accounting. It makes it static and sets a flag to track its activity. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25net: bridge: add option to allow activity notifications for any fdb entriesNikolay Aleksandrov3-13/+119
This patch adds the ability to notify about activity of any entries (static, permanent or ext_learn). EVPN multihoming peers need it to properly and efficiently handle mac sync (peer active/locally active). We add a new NFEA_ACTIVITY_NOTIFY attribute which is used to dump the current activity state and to control if static entries should be monitored at all. We use 2 bits - one to activate fdb entry tracking (disabled by default) and the second to denote that an entry is inactive. We need the second bit in order to avoid multiple notifications of inactivity. Obviously this makes no difference for dynamic entries since at the time of inactivity they get deleted, while the tracked non-dynamic entries get the inactive bit set and get a notification. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25net: neighbor: add fdb extended attributeNikolay Aleksandrov2-0/+13
Add an attribute to NDA which will contain all future fdb-specific attributes in order to avoid polluting the NDA namespace with e.g. bridge or vxlan specific attributes. The attribute is called NDA_FDB_EXT_ATTRS and the structure would look like: [NDA_FDB_EXT_ATTRS] = { [NFEA_xxx] } Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25net: bridge: fdb_add_entry takes ndm as argumentNikolay Aleksandrov1-5/+4
We can just pass ndm as an argument instead of its fields separately. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25openvswitch: take into account de-fragmentation/gso_size in ↵Lorenzo Bianconi1-2/+7
execute_check_pkt_len ovs connection tracking module performs de-fragmentation on incoming fragmented traffic. Take info account if traffic has been de-fragmented in execute_check_pkt_len action otherwise we will perform the wrong nested action considering the original packet size. This issue typically occurs if ovs-vswitchd adds a rule in the pipeline that requires connection tracking (e.g. OVN stateful ACLs) before execute_check_pkt_len action. Moreover take into account GSO fragment size for GSO packet in execute_check_pkt_len routine Fixes: 4d5ec89fc8d14 ("net: openvswitch: Add a new action check_pkt_len") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25Merge branch 'net-phy-mscc-PHC-and-timestamping-support'David S. Miller12-21/+2225
Antoine Tenart says: ==================== net: phy: mscc: PHC and timestamping support This series aims at adding support for PHC and timestamping operations in the MSCC PHY driver, for the VSC858x and VSC8575. Those PHYs are capable of timestamping in 1-step and 2-step for both L2 and L4 traffic. As of this series, only IPv4 support was implemented when using L4 mode. This is because of an hardware limitation which prevents us for supporting both IPv4 and IPv6 at the same time. Implementing support for IPv6 should be quite easy (I do have the modifications needed for the hardware configuration) but I did not see a way to retrieve this information in hwtstamp(). What would you suggest? Those PHYs are distributed in hardware packages containing multiple times the PHY. The VSC8584 for example is composed of 4 PHYs. With hardware packages, parts of the logic is usually common and one of the PHY has to be used for some parts of the initialization. Following this logic, the 1588 blocks of those PHYs are shared between two PHYs and accessing the registers has to be done using the "base" PHY of the group. This is handled thanks to helpers in the PTP code (and locks). We also need the MDIO bus lock while performing a single read or write to the 1588 registers as the read/write are composed of multiple MDIO transactions (and we don't want other threads updating the page). To get and set the PHC time, a GPIO has to be used and changes are only retrieved or committed when on a rising edge. The same GPIO is shared by all PHYs, so the granularity of the lock protecting it has to be different from the ones protecting the 1588 registers (the VSC8584 PHY has 2 1588 blocks, and a single load/save pin). Patch 1 extends the recently added helpers to share information between PHYs of the same hardware package; to allow having part of the probe to be shared (in addition to the already supported init part). This will be used when adding support for PHC/TS to initialize locks. Patches 2 and 3 are mostly cosmetic. Patch 4 takes into account the 1588 block in the MACsec initialization, to allow having both the MACsec and 1588 blocks initialized on a running system. Patches 5 and 6 add support for PHC and timestamping operations in the MSCC driver. An initialization of the 1588 block (plus all the registers definition; and helpers) is added first; and then comes a patch to implement the PHC and timestamping API. Patches 7 and 8 add the required hardware description for device trees, to be able to use the load/save GPIO pin on the PCB120 board. To use this on a PCB120 board, two other series are needed and have already been sent upstream (one is merged). There are no dependency between all those series. Since v3: - Fixed a SKB leak. - Removed ts_lock from the init, as TS and PHC operations aren't registered at this time. - Refectored the ts_base_addr/phy intialization. - Cleaned up the ingr/egr latencies definitons. - Fixed a comment about locking and the shared GPIO. - A few cosmetic fixes. Since v2: - Removed explicit inlines from .c files. - Fixed three warnings. Since v1: - Removed checks in rxtstamp/txtstamp as skb cannot be NULL here. - Reworked get_ptp_header_rx/get_ptp_header. - Reworked the locking logic between the PHC and timestamping operations. - Fixed a compilation issue on x86 reported by Jakub. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25MIPS: dts: ocelot: describe the load/save GPIOQuentin Schulz1-1/+11
This patch adds a description of the load/save GPIN pin, used in the VSC8584 PHY for timestamping operations. The related pinctrl description is also added. Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com> Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25dt-bindings: net: phy: vsc8531: document the load/save GPIOAntoine Tenart1-0/+3
A new optional property can be used to reference the load/save GPIO, used for PTP hardware clock (PHC) operations. This patch documents it in the binding documentation. Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25net: phy: mscc: timestamping and PHC supportAntoine Tenart3-4/+630
This patch adds support for PHC and timestamping operations for the MSCC PHY. PTP 1-step and 2-step modes are supported, over Ethernet and UDP. To get and set the PHC time, a GPIO has to be used and changes are only retrieved or committed when on a rising edge. The same GPIO is shared by all PHYs, so the granularity of the lock protecting it has to be different from the ones protecting the 1588 registers (the VSC8584 PHY has 2 1588 blocks, and a single load/save pin). Co-developed-by: Quentin Schulz <quentin.schulz@bootlin.com> Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com> Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25net: phy: mscc: 1588 block initializationQuentin Schulz5-2/+1552
This patch adds the first parts of the 1588 support in the MSCC PHY, with registers definition and the 1588 block initialization. Those PHYs are distributed in hardware packages containing multiple times the PHY. The VSC8584 for example is composed of 4 PHYs. With hardware packages, parts of the logic is usually common and one of the PHY has to be used for some parts of the initialization. Following this logic, the 1588 blocks of those PHYs are shared between two PHYs and accessing the registers has to be done using the "base" PHY of the group. This is handled thanks to helpers in the PTP code (and locks). We also need the MDIO bus lock while performing a single read or write to the 1588 registers as the read/write are composed of multiple MDIO transactions (and we don't want other threads updating the page). Co-developed-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com> Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25net: phy: mscc: take into account the 1588 block in MACsec initAntoine Tenart1-1/+3
This patch takes in account the use of the 1588 block in the MACsec initialization, as a conditional configuration has to be done (when the 1588 block is used). Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25net: phy: mscc: remove the TR CLK disable magic valueQuentin Schulz2-5/+6
This patch adds a define for the 0x8000 magic value used to perform enable/disable actions on the "token ring clock". The patch is only cosmetic. Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com> Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25net: phy: mscc: fix copyright and author information in MACsecAntoine Tenart4-6/+6
All headers in the MSCC PHY driver have been copied and pasted from the original mscc.c file. However the information is not necessarily correct, as in the MACsec support. Fix this. Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25net: phy: add support for a common probe between shared PHYsAntoine Tenart1-3/+15
Shared PHYs (PHYs in the same hardware package) may have shared registers and their drivers would usually need to share information. There is currently a way to have a shared (part of the) init, by using phy_package_init_once(). This patch extends the logic to share parts of the probe to allow sharing the initialization of locks or resources retrieval. Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds9-35/+207
Pull virtio fixes from Michael Tsirkin: "Fixes all over the place. This includes a couple of tests that I would normally defer, but since they have already been helpful in catching some bugs, don't build for any users at all, and having them upstream makes life easier for everyone, I think it's ok even at this late stage" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: tools/virtio: Use tools/include/list.h instead of stubs tools/virtio: Reset index in virtio_test --reset. tools/virtio: Extract virtqueue initialization in vq_reset tools/virtio: Use __vring_new_virtqueue in virtio_test.c tools/virtio: Add --reset tools/virtio: Add --batch=random option tools/virtio: Add --batch option virtio-mem: add memory via add_memory_driver_managed() virtio-mem: silence a static checker warning vhost_vdpa: Fix potential underflow in vhost_vdpa_mmap() vdpa: fix typos in the comments for __vdpa_alloc_device()
2020-06-25Merge tag 'for-linus-2020-06-24' of ↵Linus Torvalds4-6/+18
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull thread fix from Christian Brauner: "This fixes a regression introduced with 303cc571d107 ("nsproxy: attach to namespaces via pidfds"). The LTP testsuite reported a regression where users would now see EBADF returned instead of EINVAL when an fd was passed that referred to an open file but the file was not a namespace file. Fix this by continuing to report EINVAL and add a regression test" * tag 'for-linus-2020-06-24' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: tests: test for setns() EINVAL regression nsproxy: restore EINVAL for non-namespace file descriptor
2020-06-24IB/hfi1: Add atomic triggered sleep/wakeupMike Marciniszyn2-20/+53
When running iperf in a two host configuration the following trace can occur: [ 319.728730] NETDEV WATCHDOG: ib0 (hfi1): transmit queue 0 timed out The issue happens because the current implementation relies on the netif txq being stopped to control the flushing of the tx list. There are two resources that the transmit logic can wait on and stop the txq: - SDMA descriptors - Ring space to hold completions The ring space is tested on the sending side and relieved when the ring is consumed in the napi tx reaping. Unfortunately, that reaping can run conncurrently with the workqueue flushing of the txlist. If the txq is started just before the workitem executes, the txlist will never be flushed, leading to the txq being stuck. Fix by: - Adding sleep/wakeup wrappers * Use an atomic to control the call to the netif routines inside the wrappers - Use another atomic to record ring space exhaustion * Only wakeup when the a ring space exhaustion has happened and it relieved Add additional wrappers to clarify the ring space resource handling. Fixes: d99dc602e2a5 ("IB/hfi1: Add functions to transmit datagram ipoib packets") Link: https://lore.kernel.org/r/20200623204327.108092.4024.stgit@awfm-01.aw.intel.com Reviewed-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-06-24IB/hfi1: Correct -EBUSY handling in tx codeMike Marciniszyn1-17/+18
The current code mishandles -EBUSY in two ways: - The flow change doesn't test the return from the flush and runs on to process the current packet racing with the wakeup processing - The -EBUSY handling for a single packet inserts the tx into the txlist after the submit call, racing with the same wakeup processing Fix the first by dropping the skb and returning NETDEV_TX_OK. Fix the second by insuring the the list entry within the txreq is inited when allocated. This enables the sleep routine to detect that the txreq has used the non-list api and queue the packet to the txlist. Both flaws can lead to having the flushing thread executing in causing two threads to manipulate the txlist. Fixes: d99dc602e2a5 ("IB/hfi1: Add functions to transmit datagram ipoib packets") Link: https://lore.kernel.org/r/20200623204321.108092.83898.stgit@awfm-01.aw.intel.com Reviewed-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-06-24IB/hfi1: Fix module use count flaw due to leftover module put callsDennis Dalessandro1-17/+2
When the try_module_get calls were removed from opening and closing of the i2c debugfs file, the corresponding module_put calls were missed. This results in an inaccurate module use count that requires a power cycle to fix. Fixes: 09fbca8e6240 ("IB/hfi1: No need to use try_module_get for debugfs") Link: https://lore.kernel.org/r/20200623203230.106975.76240.stgit@awfm-01.aw.intel.com Cc: <stable@vger.kernel.org> Reviewed-by: Kaike Wan <kaike.wan@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>