summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2020-06-25net: bpf: Implement bpf iterator for udpYonghong Song1-0/+116
The bpf iterator for udp is implemented. Both udp4 and udp6 sockets will be traversed. It is up to bpf program to filter for udp4 or udp6 only, or both families of sockets. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200623230813.3988404-1-yhs@fb.com
2020-06-25net: bpf: Add bpf_seq_afinfo in udp_iter_stateYonghong Song2-5/+24
Similar to tcp_iter_state, a new field bpf_seq_afinfo is added to udp_iter_state to provide bpf udp iterator afinfo. This does not change /proc/net/{udp, udp6} behavior. But it enables bpf iterator to avoid get afinfo from PDE_DATA and iterate through all udp and udp6 sockets in one pass. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200623230812.3988347-1-yhs@fb.com
2020-06-25bpf: Add bpf_skc_to_{tcp, tcp_timewait, tcp_request}_sock() helpersYonghong Song6-2/+121
Three more helpers are added to cast a sock_common pointer to an tcp_sock, tcp_timewait_sock or a tcp_request_sock for tracing programs. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200623230811.3988277-1-yhs@fb.com
2020-06-25bpf: Add bpf_skc_to_tcp6_sock() helperYonghong Song8-12/+148
The helper is used in tracing programs to cast a socket pointer to a tcp6_sock pointer. The return value could be NULL if the casting is illegal. A new helper return type RET_PTR_TO_BTF_ID_OR_NULL is added so the verifier is able to deduce proper return types for the helper. Different from the previous BTF_ID based helpers, the bpf_skc_to_tcp6_sock() argument can be several possible btf_ids. More specifically, all possible socket data structures with sock_common appearing in the first in the memory layout. This patch only added socket types related to tcp and udp. All possible argument btf_id and return value btf_id for helper bpf_skc_to_tcp6_sock() are pre-calculcated and cached. In the future, it is even possible to precompute these btf_id's at kernel build time. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200623230809.3988195-1-yhs@fb.com
2020-06-25bpf: Allow tracing programs to use bpf_jiffies64() helperYonghong Song1-0/+2
/proc/net/tcp{4,6} uses jiffies for various computations. Let us add bpf_jiffies64() helper to tracing program so bpf_iter and other programs can use it. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200623230808.3988073-1-yhs@fb.com
2020-06-25bpf: Support 'X' in bpf_seq_printf() helperYonghong Song1-1/+2
'X' tells kernel to print hex with upper case letters. /proc/net/tcp{4,6} seq_file show() used this, and supports it in bpf_seq_printf() helper too. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200623230807.3988014-1-yhs@fb.com
2020-06-25net: bpf: Implement bpf iterator for tcpYonghong Song1-0/+123
The bpf iterator for tcp is implemented. Both tcp4 and tcp6 sockets will be traversed. It is up to bpf program to filter for tcp4 or tcp6 only, or both families of sockets. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200623230805.3987959-1-yhs@fb.com
2020-06-25net: bpf: Add bpf_seq_afinfo in tcp_iter_stateYonghong Song2-6/+25
A new field bpf_seq_afinfo is added to tcp_iter_state to provide bpf tcp iterator afinfo. There are two reasons on why we did this. First, the current way to get afinfo from PDE_DATA does not work for bpf iterator as its seq_file inode does not conform to /proc/net/{tcp,tcp6} inode structures. More specifically, anonymous bpf iterator will use an anonymous inode which is shared in the system and we cannot change inode private data structure at all. Second, bpf iterator for tcp/tcp6 wants to traverse all tcp and tcp6 sockets in one pass and bpf program can control whether they want to skip one sk_family or not. Having a different afinfo with family AF_UNSPEC make it easier to understand in the code. This patch does not change /proc/net/{tcp,tcp6} behavior as the bpf_seq_afinfo will be NULL for these two proc files. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200623230804.3987829-1-yhs@fb.com
2020-06-25Merge tag 'erofs-for-5.8-rc3-fixes' of ↵Linus Torvalds1-10/+10
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs fix from Gao Xiang: "Fix a regression which uses potential uninitialized high 32-bit value unexpectedly recently observed with specific compiler options" * tag 'erofs-for-5.8-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: fix partially uninitialized misuse in z_erofs_onlinepage_fixup
2020-06-25selftests: netfilter: add test case for conntrack helper assignmentFlorian Westphal2-1/+176
check that 'nft ... ct helper set <foo>' works: 1. configure ftp helper via nft and assign it to connections on port 2121 2. check with 'conntrack -L' that the next connection has the ftp helper attached to it. Also add a test for auto-assign (old behaviour). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-06-25netfilter: ip6tables: Add a .pre_exit hook in all ip6table_foo.c.David Wilder5-6/+44
Using new helpers ip6t_unregister_table_pre_exit() and ip6t_unregister_table_exit(). Fixes: b9e69e127397 ("netfilter: xtables: don't hook tables by default") Signed-off-by: David Wilder <dwilder@us.ibm.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-06-25netfilter: ip6tables: Split ip6t_unregister_table() into pre_exit and exit ↵David Wilder2-1/+17
helpers. The pre_exit will un-register the underlying hook and .exit will do the table freeing. The netns core does an unconditional synchronize_rcu after the pre_exit hooks insuring no packets are in flight that have picked up the pointer before completing the un-register. Fixes: b9e69e127397 ("netfilter: xtables: don't hook tables by default") Signed-off-by: David Wilder <dwilder@us.ibm.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-06-25netfilter: iptables: Add a .pre_exit hook in all iptable_foo.c.David Wilder5-7/+44
Using new helpers ipt_unregister_table_pre_exit() and ipt_unregister_table_exit(). Fixes: b9e69e127397 ("netfilter: xtables: don't hook tables by default") Signed-off-by: David Wilder <dwilder@us.ibm.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-06-25netfilter: iptables: Split ipt_unregister_table() into pre_exit and exit ↵David Wilder2-1/+20
helpers. The pre_exit will un-register the underlying hook and .exit will do the table freeing. The netns core does an unconditional synchronize_rcu after the pre_exit hooks insuring no packets are in flight that have picked up the pointer before completing the un-register. Fixes: b9e69e127397 ("netfilter: xtables: don't hook tables by default") Signed-off-by: David Wilder <dwilder@us.ibm.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-06-25netfilter: Add MODULE_DESCRIPTION entries to kernel modulesRob Gill41-0/+41
The user tool modinfo is used to get information on kernel modules, including a description where it is available. This patch adds a brief MODULE_DESCRIPTION to netfilter kernel modules (descriptions taken from Kconfig file or code comments) Signed-off-by: Rob Gill <rrobgill@protonmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-06-25netfilter: ipset: fix unaligned atomic accessRussell King1-0/+2
When using ip_set with counters and comment, traffic causes the kernel to panic on 32-bit ARM: Alignment trap: not handling instruction e1b82f9f at [<bf01b0dc>] Unhandled fault: alignment exception (0x221) at 0xea08133c PC is at ip_set_match_extensions+0xe0/0x224 [ip_set] The problem occurs when we try to update the 64-bit counters - the faulting address above is not 64-bit aligned. The problem occurs due to the way elements are allocated, for example: set->dsize = ip_set_elem_len(set, tb, 0, 0); map = ip_set_alloc(sizeof(*map) + elements * set->dsize); If the element has a requirement for a member to be 64-bit aligned, and set->dsize is not a multiple of 8, but is a multiple of four, then every odd numbered elements will be misaligned - and hitting an atomic64_add() on that element will cause the kernel to panic. ip_set_elem_len() must return a size that is rounded to the maximum alignment of any extension field stored in the element. This change ensures that is the case. Fixes: 95ad1f4a9358 ("netfilter: ipset: Fix extension alignment") Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-06-25qed: add missing error test for DBG_STATUS_NO_MATCHING_FRAMING_MODEColin Ian King1-1/+2
The error DBG_STATUS_NO_MATCHING_FRAMING_MODE was added to the enum enum dbg_status however there is a missing corresponding entry for this in the array s_status_str. This causes an out-of-bounds read when indexing into the last entry of s_status_str. Fix this by adding in the missing entry. Addresses-Coverity: ("Out-of-bounds read"). Fixes: 2d22bc8354b1 ("qed: FW 8.42.2.0 debug features") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25Merge branch 'net-phy-call-phy_disable_interrupts-in-phy_init_hw'David S. Miller3-1/+6
Jisheng Zhang says: ==================== net: phy: call phy_disable_interrupts() in phy_init_hw() We face an issue with rtl8211f, a pin is shared between INTB and PMEB, and the PHY Register Accessible Interrupt is enabled by default, so the INTB/PMEB pin is always active in polling mode case. As Heiner pointed out "I was thinking about calling phy_disable_interrupts() in phy_init_hw(), to have a defined init state as we don't know in which state the PHY is if the PHY driver is loaded. We shouldn't assume that it's the chip power-on defaults, BIOS or boot loader could have changed this. Or in case of dual-boot systems the other OS could leave the PHY in whatever state." patch1 makes phy_disable_interrupts() non-static so that it could be used in phy_init_hw() to have a defined init state. patch2 calls phy_disable_interrupts() in phy_init_hw() to have a defined init state. Since v3: - call phy_disable_interrupts() have interrupts disabled first then config_init, thank Florian Since v2: - Don't export phy_disable_interrupts() but just make it non-static Since v1: - EXPORT the correct symbol ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25net: phy: call phy_disable_interrupts() in phy_init_hw()Jisheng Zhang1-0/+4
Call phy_disable_interrupts() in phy_init_hw() to "have a defined init state as we don't know in which state the PHY is if the PHY driver is loaded. We shouldn't assume that it's the chip power-on defaults, BIOS or boot loader could have changed this. Or in case of dual-boot systems the other OS could leave the PHY in whatever state." as pointed out by Heiner. Suggested-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25net: phy: make phy_disable_interrupts() non-staticJisheng Zhang2-1/+2
We face an issue with rtl8211f, a pin is shared between INTB and PMEB, and the PHY Register Accessible Interrupt is enabled by default, so the INTB/PMEB pin is always active in polling mode case. As Heiner pointed out "I was thinking about calling phy_disable_interrupts() in phy_init_hw(), to have a defined init state as we don't know in which state the PHY is if the PHY driver is loaded. We shouldn't assume that it's the chip power-on defaults, BIOS or boot loader could have changed this. Or in case of dual-boot systems the other OS could leave the PHY in whatever state." Make phy_disable_interrupts() non-static so that it could be used in phy_init_hw() to have a defined init state. Suggested-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25net: ethernet: mvneta: Add back interface mode validationSascha Hauer1-3/+19
When writing the serdes configuration register was moved to mvneta_config_interface() the whole code block was removed from mvneta_port_power_up() in the assumption that its only purpose was to write the serdes configuration register. As mentioned by Russell King its purpose was also to check for valid interface modes early so that later in the driver we do not have to care for unexpected interface modes. Add back the test to let the driver bail out early on unhandled interface modes. Fixes: b4748553f53f ("net: ethernet: mvneta: Fix Serdes configuration for SoCs without comphy") Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Reviewed-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25net: ethernet: mvneta: Do not error out in non serdes modesSascha Hauer1-1/+1
In mvneta_config_interface() the RGMII modes are catched by the default case which is an error return. The RGMII modes are valid modes for the driver, so instead of returning an error add a break statement to return successfully. This avoids this warning for non comphy SoCs which use RGMII, like SolidRun Clearfog: WARNING: CPU: 0 PID: 268 at drivers/net/ethernet/marvell/mvneta.c:3512 mvneta_start_dev+0x220/0x23c Fixes: b4748553f53f ("net: ethernet: mvneta: Fix Serdes configuration for SoCs without comphy") Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Reviewed-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25lan743x: Remove duplicated include from lan743x_main.cYueHaibing1-1/+0
Remove duplicated include. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25dsa: Allow forwarding of redirected IGMP trafficDaniel Mack1-3/+34
The driver for Marvell switches puts all ports in IGMP snooping mode which results in all IGMP/MLD frames that ingress on the ports to be forwarded to the CPU only. The bridge code in the kernel can then interpret these frames and act upon them, for instance by updating the mdb in the switch to reflect multicast memberships of stations connected to the ports. However, the IGMP/MLD frames must then also be forwarded to other ports of the bridge so external IGMP queriers can track membership reports, and external multicast clients can receive query reports from foreign IGMP queriers. Currently, this is impossible as the EDSA tagger sets offload_fwd_mark on the skb when it unwraps the tagged frames, and that will make the switchdev layer prevent the skb from egressing on any other port of the same switch. To fix that, look at the To_CPU code in the DSA header and make forwarding of the frame possible for trapped IGMP packets. Introduce some #defines for the frame types to make the code a bit more comprehensive. This was tested on a Marvell 88E6352 variant. Signed-off-by: Daniel Mack <daniel@zonque.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25Merge branch 'net-bridge-fdb-activity-tracking'David S. Miller4-17/+139
Nikolay Aleksandrov says: ==================== net: bridge: fdb activity tracking This set adds extensions needed for EVPN multi-homing proper and efficient mac sync. User-space (e.g. FRR) needs to be able to track non-dynamic entry activity on per-fdb basis depending if a tracked fdb is currently peer active or locally active and needs to be able to add new peer active fdb (static + track + inactive) without refreshing it to get real activity tracking. Patch 02 adds a new NDA attribute - NDA_FDB_EXT_ATTRS to avoid future pollution of NDA attributes by bridge or vxlan. New bridge/vxlan specific fdb attributes are embedded in NDA_FDB_EXT_ATTRS, which is used in patch 03 to pass the new NFEA_ACTIVITY_NOTIFY attribute which controls if an fdb should be tracked and also reflects its current state when dumping. It is treated as a bitfield, current valid bits are: 1 - mark an entry for activity tracking 2 - mark an entry as inactive to avoid multiple notifications and reflect state properly Patch 04 adds the ability to avoid refreshing an entry when changing it via the NFEA_DONT_REFRESH flag. That allows user-space to mark a static entry for tracking and keep its real activity unchanged. The set has been extensively tested with FRR and those changes will be upstreamed if/after it gets accepted. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25net: bridge: add a flag to avoid refreshing fdb when changing/addingNikolay Aleksandrov2-1/+5
When we modify or create a new fdb entry sometimes we want to avoid refreshing its activity in order to track it properly. One example is when a mac is received from EVPN multi-homing peer by FRR, which doesn't want to change local activity accounting. It makes it static and sets a flag to track its activity. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25net: bridge: add option to allow activity notifications for any fdb entriesNikolay Aleksandrov3-13/+119
This patch adds the ability to notify about activity of any entries (static, permanent or ext_learn). EVPN multihoming peers need it to properly and efficiently handle mac sync (peer active/locally active). We add a new NFEA_ACTIVITY_NOTIFY attribute which is used to dump the current activity state and to control if static entries should be monitored at all. We use 2 bits - one to activate fdb entry tracking (disabled by default) and the second to denote that an entry is inactive. We need the second bit in order to avoid multiple notifications of inactivity. Obviously this makes no difference for dynamic entries since at the time of inactivity they get deleted, while the tracked non-dynamic entries get the inactive bit set and get a notification. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25net: neighbor: add fdb extended attributeNikolay Aleksandrov2-0/+13
Add an attribute to NDA which will contain all future fdb-specific attributes in order to avoid polluting the NDA namespace with e.g. bridge or vxlan specific attributes. The attribute is called NDA_FDB_EXT_ATTRS and the structure would look like: [NDA_FDB_EXT_ATTRS] = { [NFEA_xxx] } Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25net: bridge: fdb_add_entry takes ndm as argumentNikolay Aleksandrov1-5/+4
We can just pass ndm as an argument instead of its fields separately. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25openvswitch: take into account de-fragmentation/gso_size in ↵Lorenzo Bianconi1-2/+7
execute_check_pkt_len ovs connection tracking module performs de-fragmentation on incoming fragmented traffic. Take info account if traffic has been de-fragmented in execute_check_pkt_len action otherwise we will perform the wrong nested action considering the original packet size. This issue typically occurs if ovs-vswitchd adds a rule in the pipeline that requires connection tracking (e.g. OVN stateful ACLs) before execute_check_pkt_len action. Moreover take into account GSO fragment size for GSO packet in execute_check_pkt_len routine Fixes: 4d5ec89fc8d14 ("net: openvswitch: Add a new action check_pkt_len") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25Merge branch 'net-phy-mscc-PHC-and-timestamping-support'David S. Miller12-21/+2225
Antoine Tenart says: ==================== net: phy: mscc: PHC and timestamping support This series aims at adding support for PHC and timestamping operations in the MSCC PHY driver, for the VSC858x and VSC8575. Those PHYs are capable of timestamping in 1-step and 2-step for both L2 and L4 traffic. As of this series, only IPv4 support was implemented when using L4 mode. This is because of an hardware limitation which prevents us for supporting both IPv4 and IPv6 at the same time. Implementing support for IPv6 should be quite easy (I do have the modifications needed for the hardware configuration) but I did not see a way to retrieve this information in hwtstamp(). What would you suggest? Those PHYs are distributed in hardware packages containing multiple times the PHY. The VSC8584 for example is composed of 4 PHYs. With hardware packages, parts of the logic is usually common and one of the PHY has to be used for some parts of the initialization. Following this logic, the 1588 blocks of those PHYs are shared between two PHYs and accessing the registers has to be done using the "base" PHY of the group. This is handled thanks to helpers in the PTP code (and locks). We also need the MDIO bus lock while performing a single read or write to the 1588 registers as the read/write are composed of multiple MDIO transactions (and we don't want other threads updating the page). To get and set the PHC time, a GPIO has to be used and changes are only retrieved or committed when on a rising edge. The same GPIO is shared by all PHYs, so the granularity of the lock protecting it has to be different from the ones protecting the 1588 registers (the VSC8584 PHY has 2 1588 blocks, and a single load/save pin). Patch 1 extends the recently added helpers to share information between PHYs of the same hardware package; to allow having part of the probe to be shared (in addition to the already supported init part). This will be used when adding support for PHC/TS to initialize locks. Patches 2 and 3 are mostly cosmetic. Patch 4 takes into account the 1588 block in the MACsec initialization, to allow having both the MACsec and 1588 blocks initialized on a running system. Patches 5 and 6 add support for PHC and timestamping operations in the MSCC driver. An initialization of the 1588 block (plus all the registers definition; and helpers) is added first; and then comes a patch to implement the PHC and timestamping API. Patches 7 and 8 add the required hardware description for device trees, to be able to use the load/save GPIO pin on the PCB120 board. To use this on a PCB120 board, two other series are needed and have already been sent upstream (one is merged). There are no dependency between all those series. Since v3: - Fixed a SKB leak. - Removed ts_lock from the init, as TS and PHC operations aren't registered at this time. - Refectored the ts_base_addr/phy intialization. - Cleaned up the ingr/egr latencies definitons. - Fixed a comment about locking and the shared GPIO. - A few cosmetic fixes. Since v2: - Removed explicit inlines from .c files. - Fixed three warnings. Since v1: - Removed checks in rxtstamp/txtstamp as skb cannot be NULL here. - Reworked get_ptp_header_rx/get_ptp_header. - Reworked the locking logic between the PHC and timestamping operations. - Fixed a compilation issue on x86 reported by Jakub. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25MIPS: dts: ocelot: describe the load/save GPIOQuentin Schulz1-1/+11
This patch adds a description of the load/save GPIN pin, used in the VSC8584 PHY for timestamping operations. The related pinctrl description is also added. Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com> Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25dt-bindings: net: phy: vsc8531: document the load/save GPIOAntoine Tenart1-0/+3
A new optional property can be used to reference the load/save GPIO, used for PTP hardware clock (PHC) operations. This patch documents it in the binding documentation. Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25net: phy: mscc: timestamping and PHC supportAntoine Tenart3-4/+630
This patch adds support for PHC and timestamping operations for the MSCC PHY. PTP 1-step and 2-step modes are supported, over Ethernet and UDP. To get and set the PHC time, a GPIO has to be used and changes are only retrieved or committed when on a rising edge. The same GPIO is shared by all PHYs, so the granularity of the lock protecting it has to be different from the ones protecting the 1588 registers (the VSC8584 PHY has 2 1588 blocks, and a single load/save pin). Co-developed-by: Quentin Schulz <quentin.schulz@bootlin.com> Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com> Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25net: phy: mscc: 1588 block initializationQuentin Schulz5-2/+1552
This patch adds the first parts of the 1588 support in the MSCC PHY, with registers definition and the 1588 block initialization. Those PHYs are distributed in hardware packages containing multiple times the PHY. The VSC8584 for example is composed of 4 PHYs. With hardware packages, parts of the logic is usually common and one of the PHY has to be used for some parts of the initialization. Following this logic, the 1588 blocks of those PHYs are shared between two PHYs and accessing the registers has to be done using the "base" PHY of the group. This is handled thanks to helpers in the PTP code (and locks). We also need the MDIO bus lock while performing a single read or write to the 1588 registers as the read/write are composed of multiple MDIO transactions (and we don't want other threads updating the page). Co-developed-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com> Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25net: phy: mscc: take into account the 1588 block in MACsec initAntoine Tenart1-1/+3
This patch takes in account the use of the 1588 block in the MACsec initialization, as a conditional configuration has to be done (when the 1588 block is used). Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25net: phy: mscc: remove the TR CLK disable magic valueQuentin Schulz2-5/+6
This patch adds a define for the 0x8000 magic value used to perform enable/disable actions on the "token ring clock". The patch is only cosmetic. Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com> Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25net: phy: mscc: fix copyright and author information in MACsecAntoine Tenart4-6/+6
All headers in the MSCC PHY driver have been copied and pasted from the original mscc.c file. However the information is not necessarily correct, as in the MACsec support. Fix this. Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25net: phy: add support for a common probe between shared PHYsAntoine Tenart1-3/+15
Shared PHYs (PHYs in the same hardware package) may have shared registers and their drivers would usually need to share information. There is currently a way to have a shared (part of the) init, by using phy_package_init_once(). This patch extends the logic to share parts of the probe to allow sharing the initialization of locks or resources retrieval. Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds9-35/+207
Pull virtio fixes from Michael Tsirkin: "Fixes all over the place. This includes a couple of tests that I would normally defer, but since they have already been helpful in catching some bugs, don't build for any users at all, and having them upstream makes life easier for everyone, I think it's ok even at this late stage" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: tools/virtio: Use tools/include/list.h instead of stubs tools/virtio: Reset index in virtio_test --reset. tools/virtio: Extract virtqueue initialization in vq_reset tools/virtio: Use __vring_new_virtqueue in virtio_test.c tools/virtio: Add --reset tools/virtio: Add --batch=random option tools/virtio: Add --batch option virtio-mem: add memory via add_memory_driver_managed() virtio-mem: silence a static checker warning vhost_vdpa: Fix potential underflow in vhost_vdpa_mmap() vdpa: fix typos in the comments for __vdpa_alloc_device()
2020-06-25Merge tag 'for-linus-2020-06-24' of ↵Linus Torvalds4-6/+18
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull thread fix from Christian Brauner: "This fixes a regression introduced with 303cc571d107 ("nsproxy: attach to namespaces via pidfds"). The LTP testsuite reported a regression where users would now see EBADF returned instead of EINVAL when an fd was passed that referred to an open file but the file was not a namespace file. Fix this by continuing to report EINVAL and add a regression test" * tag 'for-linus-2020-06-24' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: tests: test for setns() EINVAL regression nsproxy: restore EINVAL for non-namespace file descriptor
2020-06-24IB/hfi1: Add atomic triggered sleep/wakeupMike Marciniszyn2-20/+53
When running iperf in a two host configuration the following trace can occur: [ 319.728730] NETDEV WATCHDOG: ib0 (hfi1): transmit queue 0 timed out The issue happens because the current implementation relies on the netif txq being stopped to control the flushing of the tx list. There are two resources that the transmit logic can wait on and stop the txq: - SDMA descriptors - Ring space to hold completions The ring space is tested on the sending side and relieved when the ring is consumed in the napi tx reaping. Unfortunately, that reaping can run conncurrently with the workqueue flushing of the txlist. If the txq is started just before the workitem executes, the txlist will never be flushed, leading to the txq being stuck. Fix by: - Adding sleep/wakeup wrappers * Use an atomic to control the call to the netif routines inside the wrappers - Use another atomic to record ring space exhaustion * Only wakeup when the a ring space exhaustion has happened and it relieved Add additional wrappers to clarify the ring space resource handling. Fixes: d99dc602e2a5 ("IB/hfi1: Add functions to transmit datagram ipoib packets") Link: https://lore.kernel.org/r/20200623204327.108092.4024.stgit@awfm-01.aw.intel.com Reviewed-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-06-24IB/hfi1: Correct -EBUSY handling in tx codeMike Marciniszyn1-17/+18
The current code mishandles -EBUSY in two ways: - The flow change doesn't test the return from the flush and runs on to process the current packet racing with the wakeup processing - The -EBUSY handling for a single packet inserts the tx into the txlist after the submit call, racing with the same wakeup processing Fix the first by dropping the skb and returning NETDEV_TX_OK. Fix the second by insuring the the list entry within the txreq is inited when allocated. This enables the sleep routine to detect that the txreq has used the non-list api and queue the packet to the txlist. Both flaws can lead to having the flushing thread executing in causing two threads to manipulate the txlist. Fixes: d99dc602e2a5 ("IB/hfi1: Add functions to transmit datagram ipoib packets") Link: https://lore.kernel.org/r/20200623204321.108092.83898.stgit@awfm-01.aw.intel.com Reviewed-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-06-24IB/hfi1: Fix module use count flaw due to leftover module put callsDennis Dalessandro1-17/+2
When the try_module_get calls were removed from opening and closing of the i2c debugfs file, the corresponding module_put calls were missed. This results in an inaccurate module use count that requires a power cycle to fix. Fixes: 09fbca8e6240 ("IB/hfi1: No need to use try_module_get for debugfs") Link: https://lore.kernel.org/r/20200623203230.106975.76240.stgit@awfm-01.aw.intel.com Cc: <stable@vger.kernel.org> Reviewed-by: Kaike Wan <kaike.wan@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-06-24IB/hfi1: Restore kfree in dummy_netdev cleanupDennis Dalessandro1-1/+1
We need to do some rework on the dummy netdev. Calling the free_netdev() would normally make sense, and that will be addressed in an upcoming patch. For now just revert the behavior to what it was before keeping the unused variable removal part of the patch. The dd->dumm_netdev is mainly used for packet receiving through alloc_netdev_mqs() for typical net devices. A a result, it should be freed with kfree instead of free_netdev() that leads to a crash when unloading the hfi1 module: BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 8000000855b54067 P4D 8000000855b54067 PUD 84a4f5067 PMD 0 Oops: 0000 [#1] SMP PTI CPU: 73 PID: 10299 Comm: modprobe Not tainted 5.6.0-rc5+ #1 Hardware name: Intel Corporation S2600WT2R/S2600WT2R, BIOS SE5C610.86B.01.01.0016.033120161139 03/31/2016 RIP: 0010:__hw_addr_flush+0x12/0x80 Code: 40 00 48 83 c4 08 4c 89 e7 5b 5d 41 5c e9 76 77 18 00 66 0f 1f 44 00 00 0f 1f 44 00 00 41 54 49 89 fc 55 53 48 8b 1f 48 39 df <48> 8b 2b 75 08 eb 4a 48 89 eb 48 89 c5 48 89 df e8 99 bf d0 ff 84 RSP: 0018:ffffb40e08783db8 EFLAGS: 00010282 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002 RDX: ffffb40e00000000 RSI: 0000000000000246 RDI: ffff88ab13662298 RBP: ffff88ab13662000 R08: 0000000000001549 R09: 0000000000001549 R10: 0000000000000001 R11: 0000000000aaaaaa R12: ffff88ab13662298 R13: ffff88ab1b259e20 R14: ffff88ab1b259e42 R15: 0000000000000000 FS: 00007fb39b534740(0000) GS:ffff88b31f940000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000084d3ea004 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: dev_addr_flush+0x15/0x30 free_netdev+0x7e/0x130 hfi1_netdev_free+0x59/0x70 [hfi1] remove_one+0x65/0x110 [hfi1] pci_device_remove+0x3b/0xc0 device_release_driver_internal+0xec/0x1b0 driver_detach+0x46/0x90 bus_remove_driver+0x58/0xd0 pci_unregister_driver+0x26/0xa0 hfi1_mod_cleanup+0xc/0xd54 [hfi1] __x64_sys_delete_module+0x16c/0x260 ? exit_to_usermode_loop+0xa4/0xc0 do_syscall_64+0x5b/0x200 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 193ba03141bb ("IB/hfi1: Use free_netdev() in hfi1_netdev_free()") Link: https://lore.kernel.org/r/20200623203224.106975.16926.stgit@awfm-01.aw.intel.com Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-06-24bpf: Add SO_KEEPALIVE and related options to bpf_setsockoptDmitry Yakunin4-5/+72
This patch adds support of SO_KEEPALIVE flag and TCP related options to bpf_setsockopt() routine. This is helpful if we want to enable or tune TCP keepalive for applications which don't do it in the userspace code. v3: - update kernel-doc in uapi (Nikita Vetoshkin <nekto0n@yandex-team.ru>) v4: - update kernel-doc in tools too (Alexei Starovoitov) - add test to selftests (Alexei Starovoitov) Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200620153052.9439-3-zeil@yandex-team.ru
2020-06-24tcp: Expose tcp_sock_set_keepidle_lockedDmitry Yakunin2-3/+4
This is preparation for usage in bpf_setsockopt. v2: - remove redundant EXPORT_SYMBOL (Alexei Starovoitov) Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200620153052.9439-2-zeil@yandex-team.ru
2020-06-24sock: Move sock_valbool_flag to headerDmitry Yakunin2-9/+9
This is preparation for usage in bpf_setsockopt. Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200620153052.9439-1-zeil@yandex-team.ru
2020-06-24selftests/bpf: Workaround for get_stack_rawtp test.Alexei Starovoitov1-1/+2
./test_progs-no_alu32 -t get_stack_raw_tp fails due to: 52: (85) call bpf_get_stack#67 53: (bf) r8 = r0 54: (bf) r1 = r8 55: (67) r1 <<= 32 56: (c7) r1 s>>= 32 ; if (usize < 0) 57: (c5) if r1 s< 0x0 goto pc+26 R0=inv(id=0,smax_value=800) R1_w=inv(id=0,umax_value=800,var_off=(0x0; 0x3ff)) R6=ctx(id=0,off=0,imm=0) R7=map_value(id=0,off=0,ks=4,vs=1600,imm=0) R8_w=inv(id=0,smax_value=800) R9=inv800 ; ksize = bpf_get_stack(ctx, raw_data + usize, max_len - usize, 0); 58: (1f) r9 -= r8 ; ksize = bpf_get_stack(ctx, raw_data + usize, max_len - usize, 0); 59: (bf) r2 = r7 60: (0f) r2 += r1 regs=1 stack=0 before 52: (85) call bpf_get_stack#67 ; ksize = bpf_get_stack(ctx, raw_data + usize, max_len - usize, 0); 61: (bf) r1 = r6 62: (bf) r3 = r9 63: (b7) r4 = 0 64: (85) call bpf_get_stack#67 R0=inv(id=0,smax_value=800) R1_w=ctx(id=0,off=0,imm=0) R2_w=map_value(id=0,off=0,ks=4,vs=1600,umax_value=800,var_off=(0x0; 0x3ff),s32_max_value=1023,u32_max_value=1023) R3_w=inv(id=0,umax_value=9223372036854776608) R3 unbounded memory access, use 'var &= const' or 'if (var < const)' In the C code: usize = bpf_get_stack(ctx, raw_data, max_len, BPF_F_USER_STACK); if (usize < 0) return 0; ksize = bpf_get_stack(ctx, raw_data + usize, max_len - usize, 0); if (ksize < 0) return 0; We used to have problem with pointer arith in R2. Now it's a problem with two integers in R3. 'if (usize < 0)' is comparing R1 and makes it [0,800], but R8 stays [-inf,800]. Both registers represent the same 'usize' variable. Then R9 -= R8 is doing 800 - [-inf, 800] so the result of "max_len - usize" looks unbounded to the verifier while it's obvious in C code that "max_len - usize" should be [0, 800]. To workaround the problem convert ksize and usize variables from int to long. Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-06-24libbpf: Prevent loading vmlinux BTF twiceAndrii Nakryiko1-11/+22
Prevent loading/parsing vmlinux BTF twice in some cases: for CO-RE relocations and for BTF-aware hooks (tp_btf, fentry/fexit, etc). Fixes: a6ed02cac690 ("libbpf: Load btf_vmlinux only once per object.") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200624043805.1794620-1-andriin@fb.com