kernel/linux.git - Linux kernel stable tree (mirror)

Age	Commit message (Collapse)	Author	Files	Lines
2022-06-17	selftests: mlxsw: resource_scale: Allow skipping a test	Petr Machata	2	-0/+9
	The scale tests are currently testing two things: that some number of instances of a given resource can actually be created; and that when an attempt is made to create more than the supported amount, the failures are noted and handled gracefully. Sometimes the scale test depends on more than one resource. In particular, a following patch will add a RIF counter scale test, which depends on the number of RIF counters that can be bound, and also on the number of RIFs that can be created. When the test is limited by the auxiliary resource and not by the primary one, there's no point trying to run the overflow test, because it would be testing exhaustion of the wrong resource. To support this use case, when the $test_get_target yields 0, skip the test instead. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-06-17	selftests: mlxsw: resource_scale: Introduce traffic tests	Petr Machata	2	-3/+20
	The scale tests are currently testing two things: that some number of instances of a given resource can actually be created; and that when an attempt is made to create more than the supported amount, the failures are noted and handled gracefully. However the ability to allocate the resource does not mean that the resource actually works when passing traffic. For that, make it possible for a given scale to also test traffic. Traffic test is only run on the positive leg of the scale test (no point trying to pass traffic when the expected outcome is that the resource will not be allocated). Traffic tests are opt-in, if a given test does not expose it, it is not run. To this end, delay the test cleanup until after the traffic test is run. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-06-17	selftests: mlxsw: resource_scale: Update scale target after test setup	Ido Schimmel	2	-0/+6
	The scale of each resource is tested in the following manner: 1. The scale target is queried. 2. The test setup is prepared. 3. The test is invoked. In some cases, the occupancy of a resource changes as part of the second step, requiring the test to return a scale target that takes this change into account. Make this more robust by re-querying the scale target after the second step. Another possible solution is to swap the first and second steps, but when a test needs to be skipped (i.e., scale target is zero), the setup would have been in vain. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-06-17	selftests: mirror_gre_bridge_1q_lag: Enslave port to bridge before other ↵	Amit Cohen	1	-3/+4
	configurations Using mlxsw driver, the configurations are offloaded just in case that there is a physical port which is enslaved to the virtual device (e.g., to a bridge). In 'mirror_gre_bridge_1q_lag' test, the bridge gets an address and route before there are ports in the bridge. It means that these configurations are not offloaded. Till now the test passes with mlxsw driver even that the RIF of the bridge is not in the hardware, because the ARP packets are trapped in layer 2 and also mirrored, so there is no real need of the RIF in hardware. The previous patch changed the traps 'ARP_REQUEST' and 'ARP_RESPONSE' to be done at layer 3 instead of layer 2. With this change the ARP packets are not trapped during the test, as the RIF is not in the hardware because of the order of configurations. Reorder the configurations to make them to be offloaded, then the test will pass with the change of the traps. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-06-17	mlxsw: Add a resource describing number of RIFs	Petr Machata	3	-0/+42
	The Spectrum ASIC has a limit on how many L3 devices (called RIFs) can be created. The limit depends on the ASIC and FW revision, and mlxsw reads it from the FW. In order to communicate both the number of RIFs that there can be, and how many are taken now (i.e. occupancy), introduce a corresponding devlink resource. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-06-17	mlxsw: Keep track of number of allocated RIFs	Petr Machata	2	-0/+7
	In order to expose number of RIFs as a resource, it is going to be handy to have the number of currently-allocated RIFs as a single number. Introduce such. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-06-17	mlxsw: Trap ARP packets at layer 3 instead of layer 2	Amit Cohen	2	-6/+6
	Currently, the traps 'ARP_REQUEST' and 'ARP_RESPONSE' occur at layer 2. To allow the packets to be flooded, they are configured with the action 'MIRROR_TO_CPU' which means that the CPU receives a replica of the packet. Today, Spectrum ASICs also support trapping ARP packets at layer 3. This behavior is better, then the packets can just be trapped and there is no need to mirror them. An additional motivation is that using the traps at layer 2, the ARP packets are dropped in the router as they do not have an IP header, then they are counted as error packets, which might confuse users. Add the relevant traps for layer 3 and use them instead of the existing traps. There is no visible change to user space. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-06-17	Merge branch 'tcp-mem-pressure-fixes'	David S. Miller	2	-7/+50
	Eric Dumazet says: ==================== tcp: final (?) round of mem pressure fixes While working on prior patch series (e10b02ee5b6c "Merge branch 'net-reduce-tcp_memory_allocated-inflation'"), I found that we could still have frozen TCP flows under memory pressure. I thought we had solved this in 2015, but the fix was not complete. v2: deal with zerocopy tx paths. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-06-17	tcp: fix possible freeze in tx path under memory pressure	Eric Dumazet	1	-0/+17
	Blamed commit only dealt with applications issuing small writes. Issue here is that we allow to force memory schedule for the sk_buff allocation, but we have no guarantee that sendmsg() is able to copy some payload in it. In this patch, I make sure the socket can use up to tcp_wmem[0] bytes. For example, if we consider tcp_wmem[0] = 4096 (default on x86), and initial skb->truesize being 1280, tcp_sendmsg() is able to copy up to 2816 bytes under memory pressure. Before this patch a sendmsg() sending more than 2816 bytes would either block forever (if persistent memory pressure), or return -EAGAIN. For bigger MTU networks, it is advised to increase tcp_wmem[0] to avoid sending too small packets. v2: deal with zero copy paths. Fixes: 8e4d980ac215 ("tcp: fix behavior for epoll edge trigger") Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Wei Wang <weiwan@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-06-17	tcp: fix possible freeze in tx path under memory pressure	Eric Dumazet	1	-4/+29
	Blamed commit only dealt with applications issuing small writes. Issue here is that we allow to force memory schedule for the sk_buff allocation, but we have no guarantee that sendmsg() is able to copy some payload in it. In this patch, I make sure the socket can use up to tcp_wmem[0] bytes. For example, if we consider tcp_wmem[0] = 4096 (default on x86), and initial skb->truesize being 1280, tcp_sendmsg() is able to copy up to 2816 bytes under memory pressure. Before this patch a sendmsg() sending more than 2816 bytes would either block forever (if persistent memory pressure), or return -EAGAIN. For bigger MTU networks, it is advised to increase tcp_wmem[0] to avoid sending too small packets. v2: deal with zero copy paths. Fixes: 8e4d980ac215 ("tcp: fix behavior for epoll edge trigger") Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Wei Wang <weiwan@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-06-17	tcp: fix over estimation in sk_forced_mem_schedule()	Eric Dumazet	1	-3/+4
	sk_forced_mem_schedule() has a bug similar to ones fixed in commit 7c80b038d23e ("net: fix sk_wmem_schedule() and sk_rmem_schedule() errors") While this bug has little chance to trigger in old kernels, we need to fix it before the following patch. Fixes: d83769a580f1 ("tcp: fix possible deadlock in tcp_send_fin()") Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Reviewed-by: Wei Wang <weiwan@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-06-17	selftests/bpf: Don't force lld on non-x86 architectures	Andrii Nakryiko	1	-2/+9
	LLVM's lld linker doesn't have a universal architecture support (e.g., it definitely doesn't work on s390x), so be safe and force lld for urandom_read and liburandom_read.so only on x86 architectures. This should fix s390x CI runs. Fixes: 3e6fe5ce4d48 ("libbpf: Fix internal USDT address translation logic for shared libraries") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220617045512.1339795-1-andrii@kernel.org
2022-06-17	Merge branch 'New BPF helpers to accelerate synproxy'	Alexei Starovoitov	13	-21/+1833
	Maxim Mikityanskiy says: ==================== The first patch of this series is a documentation fix. The second patch allows BPF helpers to accept memory regions of fixed size without doing runtime size checks. The two next patches add new functionality that allows XDP to accelerate iptables synproxy. v1 of this series [1] used to include a patch that exposed conntrack lookup to BPF using stable helpers. It was superseded by series [2] by Kumar Kartikeya Dwivedi, which implements this functionality using unstable helpers. The third patch adds new helpers to issue and check SYN cookies without binding to a socket, which is useful in the synproxy scenario. The fourth patch adds a selftest, which includes an XDP program and a userspace control application. The XDP program uses socketless SYN cookie helpers and queries conntrack status instead of socket status. The userspace control application allows to tune parameters of the XDP program. This program also serves as a minimal example of usage of the new functionality. The last two patches expose the new helpers to TC BPF and extend the selftest. The draft of the new functionality was presented on Netdev 0x15 [3]. v2 changes: Split into two series, submitted bugfixes to bpf, dropped the conntrack patches, implemented the timestamp cookie in BPF using bpf_loop, dropped the timestamp cookie patch. v3 changes: Moved some patches from bpf to bpf-next, dropped the patch that changed error codes, split the new helpers into IPv4/IPv6, added verifier functionality to accept memory regions of fixed size. v4 changes: Converted the selftest to the test_progs runner. Replaced some deprecated functions in xdp_synproxy userspace helper. v5 changes: Fixed a bug in the selftest. Added questionable functionality to support new helpers in TC BPF, added selftests for it. v6 changes: Wrap the new helpers themselves into #ifdef CONFIG_SYN_COOKIES, replaced fclose with pclose and fixed the MSS for IPv6 in the selftest. v7 changes: Fixed the off-by-one error in indices, changed the section name to "xdp", added missing kernel config options to vmtest in CI. v8 changes: Properly rebased, dropped the first patch (the same change was applied by someone else), updated the cover letter. v9 changes: Fixed selftests for no_alu32. v10 changes: Selftests for s390x were blacklisted due to lack of support of kfunc, rebased the series, split selftests to separate commits, created ARG_PTR_TO_FIXED_SIZE_MEM and packed arg_size, addressed the rest of comments. [1]: https://lore.kernel.org/bpf/20211020095815.GJ28644@breakpoint.cc/t/ [2]: https://lore.kernel.org/bpf/20220114163953.1455836-1-memxor@gmail.com/ [3]: https://netdevconf.info/0x15/session.html?Accelerating-synproxy-with-XDP ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-17	selftests/bpf: Add selftests for raw syncookie helpers in TC mode	Maxim Mikityanskiy	3	-69/+224
	This commit extends selftests for the new BPF helpers bpf_tcp_raw_{gen,check}_syncookie_ipv{4,6} to also test the TC BPF functionality added in the previous commit. Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20220615134847.3753567-7-maximmi@nvidia.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-17	bpf: Allow the new syncookie helpers to work with SKBs	Maxim Mikityanskiy	1	-0/+10
	This commit allows the new BPF helpers to work in SKB context (in TC BPF programs): bpf_tcp_raw_{gen,check}_syncookie_ipv{4,6}. Using these helpers in TC BPF programs is not recommended, because it's unlikely that the BPF program will provide any substantional speedup compared to regular SYN cookies or synproxy, after the SKB is already created. Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20220615134847.3753567-6-maximmi@nvidia.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-17	selftests/bpf: Add selftests for raw syncookie helpers	Maxim Mikityanskiy	5	-1/+1330
	This commit adds selftests for the new BPF helpers: bpf_tcp_raw_{gen,check}_syncookie_ipv{4,6}. xdp_synproxy_kern.c is a BPF program that generates SYN cookies on allowed TCP ports and sends SYNACKs to clients, accelerating synproxy iptables module. xdp_synproxy.c is a userspace control application that allows to configure the following options in runtime: list of allowed ports, MSS, window scale, TTL. A selftest is added to prog_tests that leverages the above programs to test the functionality of the new helpers. Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20220615134847.3753567-5-maximmi@nvidia.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-17	bpf: Add helpers to issue and check SYN cookies in XDP	Maxim Mikityanskiy	6	-1/+281
	The new helpers bpf_tcp_raw_{gen,check}_syncookie_ipv{4,6} allow an XDP program to generate SYN cookies in response to TCP SYN packets and to check those cookies upon receiving the first ACK packet (the final packet of the TCP handshake). Unlike bpf_tcp_{gen,check}_syncookie these new helpers don't need a listening socket on the local machine, which allows to use them together with synproxy to accelerate SYN cookie generation. Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20220615134847.3753567-4-maximmi@nvidia.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-17	bpf: Allow helpers to accept pointers with a fixed size	Maxim Mikityanskiy	2	-11/+45
	Before this commit, the BPF verifier required ARG_PTR_TO_MEM arguments to be followed by ARG_CONST_SIZE holding the size of the memory region. The helpers had to check that size in runtime. There are cases where the size expected by a helper is a compile-time constant. Checking it in runtime is an unnecessary overhead and waste of BPF registers. This commit allows helpers to accept pointers to memory without the corresponding ARG_CONST_SIZE, given that they define the memory region size in struct bpf_func_proto and use ARG_PTR_TO_FIXED_SIZE_MEM type. arg_size is unionized with arg_btf_id to reduce the kernel image size, and it's valid because they are used by different argument types. Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20220615134847.3753567-3-maximmi@nvidia.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-17	bpf: Fix documentation of th_len in bpf_tcp_{gen,check}_syncookie	Maxim Mikityanskiy	2	-8/+12
	bpf_tcp_gen_syncookie expects the full length of the TCP header (with all options), and bpf_tcp_check_syncookie accepts lengths bigger than sizeof(struct tcphdr). Fix the documentation that says these lengths should be exactly sizeof(struct tcphdr). While at it, fix a typo in the name of struct ipv6hdr. Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20220615134847.3753567-2-maximmi@nvidia.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-17	Merge branch 'net-lan743x-pci11010-pci11414-devices-enhancements'	Jakub Kicinski	5	-9/+567
	Raju Lakkaraju says: ==================== net: lan743x: PCI11010 / PCI11414 devices Enhancements This patch series continues with the addition of supported features for the Ethernet function of the PCI11010 / PCI11414 devices to the LAN743x driver. ==================== Link: https://lore.kernel.org/r/20220616041226.26996-1-Raju.Lakkaraju@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-17	net: phy: add support to get Master-Slave configuration	Raju Lakkaraju	1	-0/+3
	Add support to Master-Slave configuration and state Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-17	net: lan743x: Add support to SGMII 1G and 2.5G	Raju Lakkaraju	3	-9/+442
	Add SGMII access read and write functions Add support to SGMII 1G and 2.5G for PCI11010/PCI11414 chips Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-17	net: lan743x: Add support to Secure-ON WOL	Raju Lakkaraju	3	-0/+51
	Add support to Magic Packet Detection with Secure-ON for PCI11010/PCI11414 chips Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-17	net: lan743x: Add support to LAN743x register dump	Raju Lakkaraju	2	-0/+71
	Add support to LAN743x common register dump Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-17	Merge branch 'net-dsa-realtek-rtl8365mb-improve-handling-of-phy-modes'	Jakub Kicinski	1	-122/+177
	Alvin Šipraga says: ==================== net: dsa: realtek: rtl8365mb: improve handling of PHY modes This series introduces some minor cleanup of the driver and improves the handling of PHY interface modes to break the assumption that CPU ports are always over an external interface, and the assumption that user ports are always using an internal PHY. ==================== Link: https://lore.kernel.org/r/20220615225116.432283-1-alvin@pqrs.dk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-17	net: dsa: realtek: rtl8365mb: handle PHY interface modes correctly	Alvin Šipraga	1	-107/+174
	Realtek switches in the rtl8365mb family always have at least one port with a so-called external interface, supporting PHY interface modes such as RGMII or SGMII. The purpose of this patch is to improve the driver's handling of these ports. A new struct rtl8365mb_chip_info is introduced together with a static array of such structs. An instance of this struct is added for each supported switch, distinguished by its chip ID and version. Embedded in each chip_info struct is an array of struct rtl8365mb_extint, describing the external interfaces available. This is more specific than the old rtl8365mb_extint_port_map, which was only valid for switches with up to 6 ports. The struct rtl8365mb_extint also contains a bitmask of supported PHY interface modes, which allows the driver to distinguish which ports support RGMII. This corrects a previous mistake in the driver whereby it was assumed that any port with an external interface supports RGMII. This is not actually the case: for example, the RTL8367S has two external interfaces, only the second of which supports RGMII. The first supports only SGMII and HSGMII. This new design will make it easier to add support for other interface modes. Finally, rtl8365mb_phylink_get_caps() is fixed up to return supported capabilities based on the external interface properties described above. This addresses Vladimir's point in the linked thread that the capabilities are not actually a function of the DSA port type: Although most typical applications will treat the ports with internal PHY as user ports, there is no actual hardware limitation preventing one from using them as a CPU port. Equally, ports with external interface(s) may well be treated as user ports, even though it is typical to use those ports as CPU ports. Link: https://lore.kernel.org/netdev/20220510192301.5djdt3ghoavxulhl@bang-olufsen.dk/ Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk> Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Acked-by: Luiz Angelo Daros de Luca <luizluca@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-17	net: dsa: realtek: rtl8365mb: remove learn_limit_max private data member	Alvin Šipraga	1	-6/+1
	The variable is just assigned the value of a macro, so it can be removed. Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-17	net: dsa: realtek: rtl8365mb: correct the max number of ports	Alvin Šipraga	1	-2/+1
	The maximum number of ports is actually 11, according to two observations: 1. The highest port ID used in the vendor driver is 10. Since port IDs are indexed from 0, and since DSA follows the same numbering system, this means up to 11 ports are to be presumed. 2. The registers with port mask fields always amount to a maximum port mask of 0x7FF, corresponding to a maximum 11 ports. In view of this, I also deleted the comment. Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk> Reviewed-by: Luiz Angelo Daros de Luca <luizluca@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-17	net: dsa: realtek: rtl8365mb: remove port_mask private data member	Alvin Šipraga	1	-7/+1
	There is no real need for this variable: the line change interrupt mask is sufficiently masked out when getting linkup_ind and linkdown_ind in the interrupt handler. Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-17	net: dsa: realtek: rtl8365mb: rename macro RTL8367RB -> RTL8367RB_VB	Alvin Šipraga	1	-3/+3
	The official name of this switch is RTL8367RB-VB, not RTL8367RB. There is also an RTL8367RB-VC which is rather different. Change the name of the CHIP_ID/_VER macros for reasons of consistency. Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk> Reviewed-by: Luiz Angelo Daros de Luca <luizluca@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-17	Merge branch 'net-ipa-more-multi-channel-event-ring-work'	Jakub Kicinski	3	-50/+40
	Alex Elder says: ==================== net: ipa: more multi-channel event ring work This series makes a little more progress toward supporting multiple channels with a single event ring. The first removes the assumption that consecutive events are associated with the same RX channel. The second derives the channel associated with an event from the event itself, and the next does a small cleanup enabled by that. The fourth causes updates to occur for every event processed (rather once). And the final patch does a little more rework to make TX completion have more in common with RX completion. ==================== Link: https://lore.kernel.org/r/20220615165929.5924-1-elder@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-17	net: ipa: move more code out of gsi_channel_update()	Alex Elder	1	-14/+20
	Move the processing done for TX channels in gsi_channel_update() into gsi_evt_ring_rx_update(). The called function is called for both RX and TX channels, so rename it to be gsi_evt_ring_update(). As a result, this code no longer assumes events in an event ring are associated with just one channel. Because all events in a ring are handled in that function, we can move the call to gsi_trans_move_complete() there, and can ring the event ring doorbell there as well after all new events in the ring have been processed. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-17	net: ipa: call gsi_evt_ring_rx_update() unconditionally	Alex Elder	1	-3/+3
	When an RX transaction completes, we update the trans->len field to contain the actual number of bytes received. This is done in a loop in gsi_evt_ring_rx_update(). Change that function so it checks the data transfer direction recorded in the transaction, and only updates trans->len for RX transfers. Then call it unconditionally. This means events for TX endpoints will run through the loop without otherwise doing anything, but this will change shortly. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-17	net: ipa: pass GSI pointer to gsi_evt_ring_rx_update()	Alex Elder	1	-6/+7
	The only reason the event ring's channel pointer is needed in gsi_evt_ring_rx_update() is so we can get at its GSI pointer. We can pass the GSI pointer as an argument, along with the event ring ID, and thereby avoid using the event ring channel pointer. This is another step toward no longer assuming an event ring services a single channel. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-17	net: ipa: don't pass channel when mapping transaction	Alex Elder	1	-6/+10
	Change gsi_channel_trans_map() so it derives the channel used from the transaction. Pass the index of the first TRE used by the transaction, and have the called function account for the fact that the last one used is what's important. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-17	net: ipa: don't assume one channel per event ring	Alex Elder	3	-27/+6
	In gsi_evt_ring_rx_update(), use gsi_event_trans() repeatedly to find the transaction associated with an event, rather than assuming consecutive events are associated with the same channel. This removes the only caller of gsi_trans_pool_next(), so get rid of it. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-17	Merge branch 'dt-bindings-dp83867-add-binding-for-io_impedance_ctrl-nvmem-cell'	Jakub Kicinski	3	-9/+67
	Rasmus Villemoes says: ==================== dt-bindings: dp83867: add binding for io_impedance_ctrl nvmem cell We have a board where measurements indicate that the current three options - leaving IO_IMPEDANCE_CTRL at the reset value (which is factory calibrated to a value corresponding to approximately 50 ohms) or using one of the two boolean properties to set it to the min/max value - are too coarse. This series adds a device tree binding for an nvmem cell which can be populated during production with a suitable value calibrated for each board, and corresponding support in the driver. The second patch adds a trivial phy wrapper for dev_err_probe(), used in the third. ==================== Link: https://lore.kernel.org/r/20220614084612.325229-1-linux@rasmusvillemoes.dk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-17	net: phy: dp83867: implement support for io_impedance_ctrl nvmem cell	Rasmus Villemoes	1	-6/+49
	We have a board where measurements indicate that the current three options - leaving IO_IMPEDANCE_CTRL at the (factory calibrated) reset value or using one of the two boolean properties to set it to the min/max value - are too coarse. Implement support for the newly added binding allowing device tree to specify an nvmem cell containing an appropriate value for this specific board. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-17	linux/phy.h: add phydev_err_probe() wrapper for dev_err_probe()	Rasmus Villemoes	1	-0/+3
	The dev_err_probe() function is quite useful to avoid boilerplate related to -EPROBE_DEFER handling. Add a phydev_err_probe() helper to simplify making use of that from phy drivers which otherwise use the phydev_* helpers. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-17	dt-bindings: dp83867: add binding for io_impedance_ctrl nvmem cell	Rasmus Villemoes	1	-3/+15
	We have a board where measurements indicate that the current three options - leaving IO_IMPEDANCE_CTRL at the reset value (which is factory calibrated to a value corresponding to approximately 50 ohms) or using one of the two boolean properties to set it to the min/max value - are too coarse. There is no fixed mapping from register values to values in the range 35-70 ohms; it varies from chip to chip, and even that target range is approximate. So add a DT binding for an nvmem cell which can be populated during production with a value suitable for each specific board. Reviewed-by: Rob Herring <robh@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-17	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski	201	-1672/+2489
	No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-17	Merge branch 'sleepable uprobe support'	Alexei Starovoitov	10	-49/+231
	Delyan Kratunov says: ==================== This series implements support for sleepable uprobe programs. Key work is in patches 2 and 3, the rest is plumbing and tests. The main observation is that the only obstacle in the way of sleepable uprobe programs is not the uprobe infrastructure, which already runs in a user-like context, but the rcu usage around bpf_prog_array. Details are in patch 2 but the tl;dr is that we chain trace_tasks and normal rcu grace periods when releasing to array to accommodate users of either rcu type. This introduces latency for non-sleepable users (kprobe, tp) but that's deemed acceptable, given recent benchmarks by Andrii [1]. We're a couple of orders of magnitude under the rate of bpf_prog_array churn that would raise flags (~1MM/s per Paul). [1]: https://lore.kernel.org/bpf/CAEf4BzbpjN6ca7D9KOTiFPOoBYkciYvTz0UJNp5c-_3ptm=Mrg@mail.gmail.com/ v3 -> v4: * Fix kdoc and inline issues * Rebase v2 -> v3: * Inline uprobe_call_bpf into trace_uprobe.c, it's just a bpf_prog_run_array_sleepable call now. * Do not disable preemption for uprobe non-sleepable programs. * Add acks. v1 -> v2: * Fix lockdep annotations in bpf_prog_run_array_sleepable * Chain rcu grace periods only for perf_event-attached programs. This limits the additional latency on the free path to use cases where we know it won't be a problem. * Add tests calling helpers only available in sleepable programs. * Remove kprobe.s support from libbpf. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-17	selftests/bpf: add tests for sleepable (uk)probes	Delyan Kratunov	2	-1/+108
	Add tests that ensure sleepable uprobe programs work correctly. Add tests that ensure sleepable kprobe programs cannot attach. Also add tests that attach both sleepable and non-sleepable uprobe programs to the same location (i.e. same bpf_prog_array). Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Delyan Kratunov <delyank@fb.com> Link: https://lore.kernel.org/r/c744e5bb7a5c0703f05444dc41f2522ba3579a48.1655248076.git.delyank@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-17	libbpf: add support for sleepable uprobe programs	Delyan Kratunov	1	-1/+4
	Add section mappings for u(ret)probe.s programs. Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Delyan Kratunov <delyank@fb.com> Link: https://lore.kernel.org/r/aedbc3b74f3523f00010a7b0df8f3388cca59f16.1655248076.git.delyank@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-17	bpf: allow sleepable uprobe programs to attach	Delyan Kratunov	2	-8/+12
	uprobe and kprobe programs have the same program type, KPROBE, which is currently not allowed to load sleepable programs. To avoid adding a new UPROBE type, instead allow sleepable KPROBE programs to load and defer the is-it-actually-a-uprobe-program check to attachment time, where there's already validation of the corresponding perf_event. A corollary of this patch is that you can now load a sleepable kprobe program but cannot attach it. Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Delyan Kratunov <delyank@fb.com> Link: https://lore.kernel.org/r/fcd44a7cd204f372f6bb03ef794e829adeaef299.1655248076.git.delyank@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-17	bpf: implement sleepable uprobes by chaining gps	Delyan Kratunov	4	-5/+71
	uprobes work by raising a trap, setting a task flag from within the interrupt handler, and processing the actual work for the uprobe on the way back to userspace. As a result, uprobe handlers already execute in a might_fault/_sleep context. The primary obstacle to sleepable bpf uprobe programs is therefore on the bpf side. Namely, the bpf_prog_array attached to the uprobe is protected by normal rcu. In order for uprobe bpf programs to become sleepable, it has to be protected by the tasks_trace rcu flavor instead (and kfree() called after a corresponding grace period). Therefore, the free path for bpf_prog_array now chains a tasks_trace and normal grace periods one after the other. Users who iterate under tasks_trace read section would be safe, as would users who iterate under normal read sections (from non-sleepable locations). The downside is that the tasks_trace latency affects all perf_event-attached bpf programs (and not just uprobe ones). This is deemed safe given the possible attach rates for kprobe/uprobe/tp programs. Separately, non-sleepable programs need access to dynamically sized rcu-protected maps, so bpf_run_prog_array_sleepables now conditionally takes an rcu read section, in addition to the overarching tasks_trace section. Signed-off-by: Delyan Kratunov <delyank@fb.com> Link: https://lore.kernel.org/r/ce844d62a2fd0443b08c5ab02e95bc7149f9aeb1.1655248076.git.delyank@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-17	bpf: move bpf_prog to bpf.h	Delyan Kratunov	2	-34/+36
	In order to add a version of bpf_prog_run_array which accesses the bpf_prog->aux member, bpf_prog needs to be more than a forward declaration inside bpf.h. Given that filter.h already includes bpf.h, this merely reorders the type declarations for filter.h users. bpf.h users now have access to bpf_prog internals. Signed-off-by: Delyan Kratunov <delyank@fb.com> Link: https://lore.kernel.org/r/3ed7824e3948f22d84583649ccac0ff0d38b6b58.1655248076.git.delyank@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-17	libbpf: Fix internal USDT address translation logic for shared libraries	Andrii Nakryiko	2	-65/+72
	Perform the same virtual address to file offset translation that libbpf is doing for executable ELF binaries also for shared libraries. Currently libbpf is making a simplifying and sometimes wrong assumption that for shared libraries relative virtual addresses inside ELF are always equal to file offsets. Unfortunately, this is not always the case with LLVM's lld linker, which now by default generates quite more complicated ELF segments layout. E.g., for liburandom_read.so from selftests/bpf, here's an excerpt from readelf output listing ELF segments (a.k.a. program headers): Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align PHDR 0x000040 0x0000000000000040 0x0000000000000040 0x0001f8 0x0001f8 R 0x8 LOAD 0x000000 0x0000000000000000 0x0000000000000000 0x0005e4 0x0005e4 R 0x1000 LOAD 0x0005f0 0x00000000000015f0 0x00000000000015f0 0x000160 0x000160 R E 0x1000 LOAD 0x000750 0x0000000000002750 0x0000000000002750 0x000210 0x000210 RW 0x1000 LOAD 0x000960 0x0000000000003960 0x0000000000003960 0x000028 0x000029 RW 0x1000 Compare that to what is generated by GNU ld (or LLVM lld's with extra -znoseparate-code argument which disables this cleverness in the name of file size reduction): Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align LOAD 0x000000 0x0000000000000000 0x0000000000000000 0x000550 0x000550 R 0x1000 LOAD 0x001000 0x0000000000001000 0x0000000000001000 0x000131 0x000131 R E 0x1000 LOAD 0x002000 0x0000000000002000 0x0000000000002000 0x0000ac 0x0000ac R 0x1000 LOAD 0x002dc0 0x0000000000003dc0 0x0000000000003dc0 0x000262 0x000268 RW 0x1000 You can see from the first example above that for executable (Flg == "R E") PT_LOAD segment (LOAD #2), Offset doesn't match VirtAddr columns. And it does in the second case (GNU ld output). This is important because all the addresses, including USDT specs, operate in a virtual address space, while kernel is expecting file offsets when performing uprobe attach. So such mismatches have to be properly taken care of and compensated by libbpf, which is what this patch is fixing. Also patch clarifies few function and variable names, as well as updates comments to reflect this important distinction (virtaddr vs file offset) and to ephasize that shared libraries are not all that different from executables in this regard. This patch also changes selftests/bpf Makefile to force urand_read and liburand_read.so to be built with Clang and LLVM's lld (and explicitly request this ELF file size optimization through -znoseparate-code linker parameter) to validate libbpf logic and ensure regressions don't happen in the future. I've bundled these selftests changes together with libbpf changes to keep the above description tied with both libbpf and selftests changes. Fixes: 74cc6311cec9 ("libbpf: Add USDT notes parsing and resolution logic") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220616055543.3285835-1-andrii@kernel.org
2022-06-16	Merge tag 'net-5.19-rc3' of ↵	Linus Torvalds	36	-752/+433
	git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Mostly driver fixes. Current release - regressions: - Revert "net: Add a second bind table hashed by port and address", needs more work - amd-xgbe: use platform_irq_count(), static setup of IRQ resources had been removed from DT core - dts: at91: ksz9477_evb: add phy-mode to fix port/phy validation Current release - new code bugs: - hns3: modify the ring param print info Previous releases - always broken: - axienet: make the 64b addressable DMA depends on 64b architectures - iavf: fix issue with MAC address of VF shown as zero - ice: fix PTP TX timestamp offset calculation - usb: ax88179_178a needs FLAG_SEND_ZLP Misc: - document some net.sctp.* sysctls" * tag 'net-5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (31 commits) net: axienet: add missing error return code in axienet_probe() Revert "net: Add a second bind table hashed by port and address" net: ax25: Fix deadlock caused by skb_recv_datagram in ax25_recvmsg net: usb: ax88179_178a needs FLAG_SEND_ZLP MAINTAINERS: add include/dt-bindings/net to NETWORKING DRIVERS ARM: dts: at91: ksz9477_evb: fix port/phy validation net: bgmac: Fix an erroneous kfree() in bgmac_remove() ice: Fix memory corruption in VF driver ice: Fix queue config fail handling ice: Sync VLAN filtering features for DVM ice: Fix PTP TX timestamp offset calculation mlxsw: spectrum_cnt: Reorder counter pools docs: networking: phy: Fix a typo amd-xgbe: Use platform_irq_count() octeontx2-vf: Add support for adaptive interrupt coalescing xilinx: Fix build on x86. net: axienet: Use iowrite64 to write all 64b descriptor pointers net: axienet: make the 64b addresable DMA depends on 64b archectures net: hns3: fix tm port shapping of fibre port is incorrect after driver initialization net: hns3: fix PF rss size initialization bug ...
2022-06-16	net: axienet: add missing error return code in axienet_probe()	Yang Yingliang	1	-0/+1
	It should return error code in error path in axienet_probe(). Fixes: 00be43a74ca2 ("net: axienet: make the 64b addresable DMA depends on 64b archectures") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Link: https://lore.kernel.org/r/20220616062917.3601-1-yangyingliang@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>