summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2021-08-10bpf, tests: Add branch conversion JIT testJohan Almbladh1-0/+43
Some JITs may need to convert a conditional jump instruction to to short PC-relative branch and a long unconditional jump, if the PC-relative offset exceeds offset field width in the CPU instruction. This test triggers such branch conversion on the 32-bit MIPS JIT. Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210809091829.810076-11-johan.almbladh@anyfinetworks.com
2021-08-10bpf, tests: Add word-order tests for load/store of double wordsJohan Almbladh1-0/+36
A double word (64-bit) load/store may be implemented as two successive 32-bit operations, one for each word. Check that the order of those operations is consistent with the machine endianness. Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210809091829.810076-10-johan.almbladh@anyfinetworks.com
2021-08-10bpf, tests: Add tests for ALU operations implemented with function callsJohan Almbladh1-0/+141
32-bit JITs may implement complex ALU64 instructions using function calls. The new tests check aspects related to this, such as register clobbering and register argument re-ordering. Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210809091829.810076-9-johan.almbladh@anyfinetworks.com
2021-08-10bpf, tests: Add more ALU64 BPF_MUL testsJohan Almbladh1-0/+48
This patch adds BPF_MUL tests for 64x32 and 64x64 multiply. Mainly testing 32-bit JITs that implement ALU64 operations with two 32-bit CPU registers per operand. Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210809091829.810076-8-johan.almbladh@anyfinetworks.com
2021-08-10bpf, tests: Add more BPF_LSH/RSH/ARSH tests for ALU64Johan Almbladh1-2/+542
This patch adds a number of tests for BPF_LSH, BPF_RSH amd BPF_ARSH ALU64 operations with values that may trigger different JIT code paths. Mainly testing 32-bit JITs that implement ALU64 operations with two 32-bit CPU registers per operand. Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210809091829.810076-7-johan.almbladh@anyfinetworks.com
2021-08-10bpf, tests: Add more ALU32 tests for BPF_LSH/RSH/ARSHJohan Almbladh1-0/+102
This patch adds more tests of ALU32 shift operations BPF_LSH and BPF_RSH, including the special case of a zero immediate. Also add corresponding BPF_ARSH tests which were missing for ALU32. Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210809091829.810076-6-johan.almbladh@anyfinetworks.com
2021-08-10bpf, tests: Add more tests of ALU32 and ALU64 bitwise operationsJohan Almbladh1-0/+210
This patch adds tests of BPF_AND, BPF_OR and BPF_XOR with different magnitude of the immediate value. Mainly checking 32-bit JIT sub-word handling and zero/sign extension. Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210809091829.810076-5-johan.almbladh@anyfinetworks.com
2021-08-10bpf, tests: Fix typos in test case descriptionsJohan Almbladh1-4/+4
This patch corrects the test description in a number of cases where the description differed from what was actually tested and expected. Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210809091829.810076-4-johan.almbladh@anyfinetworks.com
2021-08-10bpf, tests: Add BPF_MOV tests for zero and sign extensionJohan Almbladh1-0/+84
Tests for ALU32 and ALU64 MOV with different sizes of the immediate value. Depending on the immediate field width of the native CPU instructions, a JIT may generate code differently depending on the immediate value. Test that zero or sign extension is performed as expected. Mainly for JIT testing. Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210809091829.810076-3-johan.almbladh@anyfinetworks.com
2021-08-10bpf, tests: Add BPF_JMP32 test casesJohan Almbladh1-0/+511
An eBPF JIT may implement JMP32 operations in a different way than JMP, especially on 32-bit architectures. This patch adds a series of tests for JMP32 operations, mainly for testing JITs. Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210809091829.810076-2-johan.almbladh@anyfinetworks.com
2021-08-10samples, bpf: Add an explict comment to handle nested vlan tagging.Muhammad Falak R Wani2-0/+4
A codeblock for handling nested vlan trips newbies into thinking it as duplicate code. Explicitly add a comment to clarify. Signed-off-by: Muhammad Falak R Wani <falakreyaz@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210809070046.32142-1-falakreyaz@gmail.com
2021-08-10selftests/bpf: Add tests for XDP bondingJussi Maki1-0/+520
Add a test suite to test XDP bonding implementation over a pair of veth devices. Signed-off-by: Jussi Maki <joamaki@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210731055738.16820-8-joamaki@gmail.com
2021-08-10selftests/bpf: Fix xdp_tx.c prog section nameJussi Maki2-2/+2
The program type cannot be deduced from 'tx' which causes an invalid argument error when trying to load xdp_tx.o using the skeleton. Rename the section name to "xdp" so that libbpf can deduce the type. Signed-off-by: Jussi Maki <joamaki@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210731055738.16820-7-joamaki@gmail.com
2021-08-10net, core: Allow netdev_lower_get_next_private_rcu in bh contextJussi Maki1-1/+1
For the XDP bonding slave lookup to work in the NAPI poll context in which the redudant rcu_read_lock() has been removed we have to follow the same approach as in 694cea395fde ("bpf: Allow RCU-protected lookups to happen from bh context") and modify the WARN_ON to also check rcu_read_lock_bh_held(). Signed-off-by: Jussi Maki <joamaki@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/20210731055738.16820-6-joamaki@gmail.com
2021-08-10bpf, devmap: Exclude XDP broadcast to master deviceJussi Maki1-9/+60
If the ingress device is bond slave, do not broadcast back through it or the bond master. Signed-off-by: Jussi Maki <joamaki@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210731055738.16820-5-joamaki@gmail.com
2021-08-10net, bonding: Add XDP support to the bonding driverJussi Maki2-1/+309
XDP is implemented in the bonding driver by transparently delegating the XDP program loading, removal and xmit operations to the bonding slave devices. The overall goal of this work is that XDP programs can be attached to a bond device *without* any further changes (or awareness) necessary to the program itself, meaning the same XDP program can be attached to a native device but also a bonding device. Semantics of XDP_TX when attached to a bond are equivalent in such setting to the case when a tc/BPF program would be attached to the bond, meaning transmitting the packet out of the bond itself using one of the bond's configured xmit methods to select a slave device (rather than XDP_TX on the slave itself). Handling of XDP_TX to transmit using the configured bonding mechanism is therefore implemented by rewriting the BPF program return value in bpf_prog_run_xdp. To avoid performance impact this check is guarded by a static key, which is incremented when a XDP program is loaded onto a bond device. This approach was chosen to avoid changes to drivers implementing XDP. If the slave device does not match the receive device, then XDP_REDIRECT is transparently used to perform the redirection in order to have the network driver release the packet from its RX ring. The bonding driver hashing functions have been refactored to allow reuse with xdp_buff's to avoid code duplication. The motivation for this change is to enable use of bonding (and 802.3ad) in hairpinning L4 load-balancers such as [1] implemented with XDP and also to transparently support bond devices for projects that use XDP given most modern NICs have dual port adapters. An alternative to this approach would be to implement 802.3ad in user-space and implement the bonding load-balancing in the XDP program itself, but is rather a cumbersome endeavor in terms of slave device management (e.g. by watching netlink) and requires separate programs for native vs bond cases for the orchestrator. A native in-kernel implementation overcomes these issues and provides more flexibility. Below are benchmark results done on two machines with 100Gbit Intel E810 (ice) NIC and with 32-core 3970X on sending machine, and 16-core 3950X on receiving machine. 64 byte packets were sent with pktgen-dpdk at full rate. Two issues [2, 3] were identified with the ice driver, so the tests were performed with iommu=off and patch [2] applied. Additionally the bonding round robin algorithm was modified to use per-cpu tx counters as high CPU load (50% vs 10%) and high rate of cache misses were caused by the shared rr_tx_counter (see patch 2/3). The statistics were collected using "sar -n dev -u 1 10". On top of that, for ice, further work is in progress on improving the XDP_TX numbers [4]. -----------------------| CPU |--| rxpck/s |--| txpck/s |---- without patch (1 dev): XDP_DROP: 3.15% 48.6Mpps XDP_TX: 3.12% 18.3Mpps 18.3Mpps XDP_DROP (RSS): 9.47% 116.5Mpps XDP_TX (RSS): 9.67% 25.3Mpps 24.2Mpps ----------------------- with patch, bond (1 dev): XDP_DROP: 3.14% 46.7Mpps XDP_TX: 3.15% 13.9Mpps 13.9Mpps XDP_DROP (RSS): 10.33% 117.2Mpps XDP_TX (RSS): 10.64% 25.1Mpps 24.0Mpps ----------------------- with patch, bond (2 devs): XDP_DROP: 6.27% 92.7Mpps XDP_TX: 6.26% 17.6Mpps 17.5Mpps XDP_DROP (RSS): 11.38% 117.2Mpps XDP_TX (RSS): 14.30% 28.7Mpps 27.4Mpps -------------------------------------------------------------- RSS: Receive Side Scaling, e.g. the packets were sent to a range of destination IPs. [1]: https://cilium.io/blog/2021/05/20/cilium-110#standalonelb [2]: https://lore.kernel.org/bpf/20210601113236.42651-1-maciej.fijalkowski@intel.com/T/#t [3]: https://lore.kernel.org/bpf/CAHn8xckNXci+X_Eb2WMv4uVYjO2331UWB2JLtXr_58z0Av8+8A@mail.gmail.com/ [4]: https://lore.kernel.org/bpf/20210805230046.28715-1-maciej.fijalkowski@intel.com/T/#t Signed-off-by: Jussi Maki <joamaki@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Cc: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20210731055738.16820-4-joamaki@gmail.com
2021-08-10net, core: Add support for XDP redirection to slave deviceJussi Maki4-2/+55
This adds the ndo_xdp_get_xmit_slave hook for transforming XDP_TX into XDP_REDIRECT after BPF program run when the ingress device is a bond slave. The dev_xdp_prog_count is exposed so that slave devices can be checked for loaded XDP programs in order to avoid the situation where both bond master and slave have programs loaded according to xdp_state. Signed-off-by: Jussi Maki <joamaki@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Link: https://lore.kernel.org/bpf/20210731055738.16820-3-joamaki@gmail.com
2021-08-10net, bonding: Refactor bond_xmit_hash for use with xdp_buffJussi Maki1-57/+90
In preparation for adding XDP support to the bonding driver refactor the packet hashing functions to be able to work with any linear data buffer without an skb. Signed-off-by: Jussi Maki <joamaki@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Link: https://lore.kernel.org/bpf/20210731055738.16820-2-joamaki@gmail.com
2021-08-07Merge branch 'samples/bpf: xdpsock: Minor enhancements'Andrii Nakryiko1-12/+8
Simon Horman says: ==================== This short series provides minor enhancements to the sample code in samples/bpf/xdpsock_user.c. Each change is explained more fully in its own commit message. ==================== Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2021-08-07samples/bpf: xdpsock: Remove forward declaration of ip_fast_csum()Niklas Söderlund1-3/+1
There is a forward declaration of ip_fast_csum() just before its implementation, remove the unneeded forward declaration. While at it mark the implementation as static inline. Signed-off-by: Niklas Söderlund <niklas.soderlund@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Louis Peens <louis.peens@corigine.com> Link: https://lore.kernel.org/bpf/20210806122855.26115-3-simon.horman@corigine.com
2021-08-07samples/bpf: xdpsock: Make the sample more useful outside the treeNiklas Söderlund1-9/+7
The xdpsock sample application is a useful base for experiment's around AF_XDP sockets. Compiling the sample outside of the kernel tree is made harder then it has to be as the sample includes two headers and that are not installed by 'make install_header' nor are usually part of distributions kernel headers. The first header asm/barrier.h is not used and can just be dropped. The second linux/compiler.h are only needed for the decorator __force and are only used in ip_fast_csum(), csum_fold() and csum_tcpudp_nofold(). These functions are copied verbatim from include/asm-generic/checksum.h and lib/checksum.c. While it's fine to copy and use these functions in the sample application the decorator brings no value and can be dropped together with the include. With this change it's trivial to compile the xdpsock sample outside the kernel tree from xdpsock_user.c and xdpsock.h. $ gcc -o xdpsock xdpsock_user.c -lbpf -lpthread Signed-off-by: Niklas Söderlund <niklas.soderlund@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Louis Peens <louis.peens@corigine.com> Link: https://lore.kernel.org/bpf/20210806122855.26115-2-simon.horman@corigine.com
2021-08-06selftests/bpf: Rename reference_tracking BPF programsAndrii Nakryiko2-9/+9
BPF programs for reference_tracking selftest use "fail_" prefix to notify that they are expected to fail. This is really confusing and inconvenient when trying to grep through test_progs output to find *actually* failed tests. So rename the prefix from "fail_" to "err_". Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210805230734.437914-1-andrii@kernel.org
2021-08-06selftests/bpf: Fix bpf-iter-tcp4 test to print correctly the dest IPJose Blanquicet1-1/+1
Currently, this test is incorrectly printing the destination port in place of the destination IP. Fixes: 2767c97765cb ("selftests/bpf: Implement sample tcp/tcp6 bpf_iter programs") Signed-off-by: Jose Blanquicet <josebl@microsoft.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210805164044.527903-1-josebl@microsoft.com
2021-08-05selftests/bpf: Move netcnt test under test_progsStanislav Fomichev7-163/+96
Rewrite to skel and ASSERT macros as well while we are at it. v3: - replace -f with -A to make it work with busybox ping. -A is available on both busybox and iputils, from the man page: On networks with low RTT this mode is essentially equivalent to flood mode. v2: - don't check result of bpf_map__fd (Yonghong Song) - remove from .gitignore (Andrii Nakryiko) - move ping_command into network_helpers (Andrii Nakryiko) - remove assert() (Andrii Nakryiko) Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210804205524.3748709-1-sdf@google.com
2021-08-05bpf, samples: Add missing mprog-disable to xdp_redirect_cpu's optstringMatthew Cover1-1/+1
Commit ce4dade7f12a ("samples/bpf: xdp_redirect_cpu: Load a eBPF program on cpumap") added the following option, but missed adding it to optstring: - mprog-disable: disable loading XDP program on cpumap entries Fix it and add the missing option character. Fixes: ce4dade7f12a ("samples/bpf: xdp_redirect_cpu: Load a eBPF program on cpumap") Signed-off-by: Matthew Cover <matthew.cover@stackpath.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210731005632.13228-1-matthew.cover@stackpath.com
2021-08-05bpf: Fix bpf_prog_test_run_xdp logic after incorrect merge resolutionAndrii Nakryiko1-2/+1
During recent net into net-next merge ([0]) a piece of old logic ([1]) got reintroduced accidentally while resolving merge conflict between bpf's [2] and bpf-next's [3]. This check was removed in bpf-next tree to allow extra ctx_in parameter passed for XDP test runs. Reinstating the check breaks bpf_prog_test_run_xdp logic and causes a corresponding xdp_context_test_run selftest failure. Fix by removing the check and allow ctx_in for XDP test runs. [0] 5af84df962dd ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net") [1] 947e8b595b82 ("bpf: explicitly prohibit ctx_{in, out} in non-skb BPF_PROG_TEST_RUN") [2] 5e21bb4e8125 ("bpf, test: fix NULL pointer dereference on invalid expected_attach_type") [3] 47316f4a3053 ("bpf: Support input xdp_md context in BPF_PROG_TEST_RUN") Fixes: 5af84df962dd ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2021-08-03bpf, unix: Check socket type in unix_bpf_update_proto()Cong Wang1-0/+3
As of now, only AF_UNIX datagram socket supports sockmap. But unix_proto is shared for all kinds of AF_UNIX sockets, so we have to check the socket type in unix_bpf_update_proto() to explicitly reject other types, otherwise they could be added into sockmap, too. Fixes: c63829182c37 ("af_unix: Implement ->psock_update_sk_prot()") Reported-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20210731195038.8084-1-xiyou.wangcong@gmail.com
2021-08-03bpf: Fix off-by-one in tail call count limitingJohan Almbladh1-1/+1
Before, the interpreter allowed up to MAX_TAIL_CALL_CNT + 1 tail calls. Now precisely MAX_TAIL_CALL_CNT is allowed, which is in line with the behavior of the x86 JITs. Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210728164741.350370-1-johan.almbladh@anyfinetworks.com
2021-08-03net/mlx4: make the array states static const, makes object smallerColin Ian King1-1/+1
Don't populate the array states on the stack but instead it static const. Makes the object code smaller by 79 bytes. Before: text data bss dec hex filename 21309 8304 192 29805 746d drivers/net/ethernet/mellanox/mlx4/qp.o After: text data bss dec hex filename 21166 8368 192 29726 741e drivers/net/ethernet/mellanox/mlx4/qp.o (gcc version 10.2.0) Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20210801153742.147304-1-colin.king@canonical.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-03net: 3c509: make the array if_names static const, makes object smallerColin Ian King1-1/+3
Don't populate the array if_names on the stack but instead it static const. Makes the object code smaller by 99 bytes. Before: text data bss dec hex filename 27886 10752 672 39310 998e ./drivers/net/ethernet/3com/3c509.o After: text data bss dec hex filename 27723 10816 672 39211 992b ./drivers/net/ethernet/3com/3c509.o (gcc version 10.2.0) Signed-off-by: Colin Ian King <colin.king@canonical.com> Link: https://lore.kernel.org/r/20210801152650.146572-1-colin.king@canonical.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-03dpaa2-eth: make the array faf_bits static const, makes object smallerColin Ian King1-1/+1
Don't populate the array faf_bits on the stack but instead it static const. Makes the object code smaller by 175 bytes. Before: text data bss dec hex filename 9645 4552 0 14197 3775 ../freescale/dpaa2/dpaa2-eth-devlink.o After: text data bss dec hex filename 9406 4616 0 14022 36c6 ../freescale/dpaa2/dpaa2-eth-devlink.o (gcc version 10.2.0) Signed-off-by: Colin Ian King <colin.king@canonical.com> Link: https://lore.kernel.org/r/20210801152209.146359-1-colin.king@canonical.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-03qlcnic: make the array random_data static const, makes object smallerColin Ian King1-1/+1
Don't populate the array random_data on the stack but instead it static const. Makes the object code smaller by 66 bytes. Before: text data bss dec hex filename 52895 10976 0 63871 f97f ../qlogic/qlcnic/qlcnic_ethtool.o After: text data bss dec hex filename 52701 11104 0 63805 f93d ../qlogic//qlcnic/qlcnic_ethtool.o (gcc version 10.2.0) Signed-off-by: Colin Ian King <colin.king@canonical.com> Link: https://lore.kernel.org/r/20210801151659.146113-1-colin.king@canonical.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-03net: marvell: make the array name static, makes object smallerColin Ian King1-1/+1
Don't populate the const array name on the stack but instead it static. Makes the object code smaller by 28 bytes. Add a missing const to clean up a checkpatch warning. Before: text data bss dec hex filename 124565 31565 384 156514 26362 drivers/net/ethernet/marvell/sky2.o After: text data bss dec hex filename 124441 31661 384 156486 26346 drivers/net/ethernet/marvell/sky2.o (gcc version 10.2.0) Signed-off-by: Colin Ian King <colin.king@canonical.com> Link: https://lore.kernel.org/r/20210801150647.145728-1-colin.king@canonical.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-03cxgb4: make the array match_all_mac static, makes object smallerColin Ian King1-2/+2
Don't populate the array match_all_mac on the stack but instead it static const. Makes the object code smaller by 75 bytes. Before: text data bss dec hex filename 46701 8960 64 55725 d9ad ../chelsio/cxgb4/cxgb4_filter.o After: text data bss dec hex filename 46338 9120 192 55650 d962 ../chelsio/cxgb4/cxgb4_filter.o (gcc version 10.2.0) Signed-off-by: Colin Ian King <colin.king@canonical.com> Link: https://lore.kernel.org/r/20210801151205.145924-1-colin.king@canonical.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-02ipv4: Fix refcount warning for new fib_infoDavid Ahern1-1/+1
Ioana reported a refcount warning when booting over NFS: [ 5.042532] ------------[ cut here ]------------ [ 5.047184] refcount_t: addition on 0; use-after-free. [ 5.052324] WARNING: CPU: 7 PID: 1 at lib/refcount.c:25 refcount_warn_saturate+0xa4/0x150 ... [ 5.167201] Call trace: [ 5.169635] refcount_warn_saturate+0xa4/0x150 [ 5.174067] fib_create_info+0xc00/0xc90 [ 5.177982] fib_table_insert+0x8c/0x620 [ 5.181893] fib_magic.isra.0+0x110/0x11c [ 5.185891] fib_add_ifaddr+0xb8/0x190 [ 5.189629] fib_inetaddr_event+0x8c/0x140 fib_treeref needs to be set after kzalloc. The old code had a ++ which led to the confusion when the int was replaced by a refcount_t. Fixes: 79976892f7ea ("net: convert fib_treeref from int to refcount_t") Signed-off-by: David Ahern <dsahern@kernel.org> Reported-by: Ioana Ciornei <ciorneiioana@gmail.com> Cc: Yajun Deng <yajun.deng@linux.dev> Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net> Link: https://lore.kernel.org/r/20210802160221.27263-1-dsahern@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-02net: phy: mscc: make some arrays static const, makes object smallerColin Ian King1-4/+4
Don't populate arrays on the stack but instead them static const. Makes the object code smaller by 280 bytes. Before: text data bss dec hex filename 24142 4368 192 28702 701e ./drivers/net/phy/mscc/mscc_ptp.o After: text data bss dec hex filename 23830 4400 192 28422 6f06 ./drivers/net/phy/mscc/mscc_ptp.o (gcc version 10.2.0) Signed-off-by: Colin Ian King <colin.king@canonical.com> Link: https://lore.kernel.org/r/20210801070155.139057-1-colin.king@canonical.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-02netdevsim: make array res_ids static const, makes object smallerColin Ian King1-1/+1
Don't populate the array res_ids on the stack but instead it static const. Makes the object code smaller by 14 bytes. Before: text data bss dec hex filename 50833 8314 256 59403 e80b ./drivers/net/netdevsim/fib.o After: text data bss dec hex filename 50755 8378 256 59389 e7fd ./drivers/net/netdevsim/fib.o (gcc version 10.2.0) Signed-off-by: Colin Ian King <colin.king@canonical.com> Link: https://lore.kernel.org/r/20210801065328.138906-1-colin.king@canonical.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-02net/ipv4: Replace one-element array with flexible-array memberGustavo A. R. Silva3-18/+30
There is a regular need in the kernel to provide a way to declare having a dynamically sized set of trailing elements in a structure. Kernel code should always use “flexible array members”[1] for these cases. The older style of one-element or zero-length arrays should no longer be used[2]. Use an anonymous union with a couple of anonymous structs in order to keep userspace unchanged: $ pahole -C ip_msfilter net/ipv4/ip_sockglue.o struct ip_msfilter { union { struct { __be32 imsf_multiaddr_aux; /* 0 4 */ __be32 imsf_interface_aux; /* 4 4 */ __u32 imsf_fmode_aux; /* 8 4 */ __u32 imsf_numsrc_aux; /* 12 4 */ __be32 imsf_slist[1]; /* 16 4 */ }; /* 0 20 */ struct { __be32 imsf_multiaddr; /* 0 4 */ __be32 imsf_interface; /* 4 4 */ __u32 imsf_fmode; /* 8 4 */ __u32 imsf_numsrc; /* 12 4 */ __be32 imsf_slist_flex[0]; /* 16 0 */ }; /* 0 16 */ }; /* 0 20 */ /* size: 20, cachelines: 1, members: 1 */ /* last cacheline: 20 bytes */ }; Also, refactor the code accordingly and make use of the struct_size() and flex_array_size() helpers. This helps with the ongoing efforts to globally enable -Warray-bounds and get us closer to being able to tighten the FORTIFY_SOURCE routines on memcpy(). [1] https://en.wikipedia.org/wiki/Flexible_array_member [2] https://www.kernel.org/doc/html/v5.10/process/deprecated.html#zero-length-and-one-element-arrays Link: https://github.com/KSPP/linux/issues/79 Link: https://github.com/KSPP/linux/issues/109 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-02net: dsa: remove the struct packet_type argument from dsa_device_ops::rcv()Vladimir Oltean17-49/+25
No tagging driver uses this. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-02nfc: hci: pass callback data param as pointer in nci_request()Krzysztof Kozlowski3-72/+67
The nci_request() receives a callback function and unsigned long data argument "opt" which is passed to the callback. Almost all of the nci_request() callers pass pointer to a stack variable as data argument. Only few pass scalar value (e.g. u8). All such callbacks do not modify passed data argument and in previous commit they were made as const. However passing pointers via unsigned long removes the const annotation. The callback could simply cast unsigned long to a pointer to writeable memory. Use "const void *" as type of this "opt" argument to solve this and prevent modifying the pointed contents. This is also consistent with generic pattern of passing data arguments - via "void *". In few places which pass scalar values, use casts via "unsigned long" to suppress any warnings. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-02cavium: switch from 'pci_' to 'dma_' APIChristophe JAILLET4-18/+6
The wrappers in include/linux/pci-dma-compat.h should go away. The patch has been generated with the coccinelle script below. It has been hand modified to use 'dma_set_mask_and_coherent()' instead of 'pci_set_dma_mask()/pci_set_consistent_dma_mask()' when applicable. It has been compile tested. @@ @@ - PCI_DMA_BIDIRECTIONAL + DMA_BIDIRECTIONAL @@ @@ - PCI_DMA_TODEVICE + DMA_TO_DEVICE @@ @@ - PCI_DMA_FROMDEVICE + DMA_FROM_DEVICE @@ @@ - PCI_DMA_NONE + DMA_NONE @@ expression e1, e2, e3; @@ - pci_alloc_consistent(e1, e2, e3) + dma_alloc_coherent(&e1->dev, e2, e3, GFP_) @@ expression e1, e2, e3; @@ - pci_zalloc_consistent(e1, e2, e3) + dma_alloc_coherent(&e1->dev, e2, e3, GFP_) @@ expression e1, e2, e3, e4; @@ - pci_free_consistent(e1, e2, e3, e4) + dma_free_coherent(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_map_single(e1, e2, e3, e4) + dma_map_single(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_unmap_single(e1, e2, e3, e4) + dma_unmap_single(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4, e5; @@ - pci_map_page(e1, e2, e3, e4, e5) + dma_map_page(&e1->dev, e2, e3, e4, e5) @@ expression e1, e2, e3, e4; @@ - pci_unmap_page(e1, e2, e3, e4) + dma_unmap_page(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_map_sg(e1, e2, e3, e4) + dma_map_sg(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_unmap_sg(e1, e2, e3, e4) + dma_unmap_sg(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_dma_sync_single_for_cpu(e1, e2, e3, e4) + dma_sync_single_for_cpu(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_dma_sync_single_for_device(e1, e2, e3, e4) + dma_sync_single_for_device(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_dma_sync_sg_for_cpu(e1, e2, e3, e4) + dma_sync_sg_for_cpu(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_dma_sync_sg_for_device(e1, e2, e3, e4) + dma_sync_sg_for_device(&e1->dev, e2, e3, e4) @@ expression e1, e2; @@ - pci_dma_mapping_error(e1, e2) + dma_mapping_error(&e1->dev, e2) @@ expression e1, e2; @@ - pci_set_dma_mask(e1, e2) + dma_set_mask(&e1->dev, e2) @@ expression e1, e2; @@ - pci_set_consistent_dma_mask(e1, e2) + dma_set_coherent_mask(&e1->dev, e2) Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-02net: dsa: mt7530: drop paranoid checks in .get_tag_protocol()Vladimir Oltean1-9/+1
It is desirable to reduce the surface of DSA_TAG_PROTO_NONE as much as we can, because we now have options for switches without hardware support for DSA tagging, and the occurrence in the mt7530 driver is in fact quite gratuitout and easy to remove. Since ds->ops->get_tag_protocol() is only called for CPU ports, the checks for a CPU port in mtk_get_tag_protocol() are redundant and can be removed. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: DENG Qingfang <dqfext@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-02Merge branch 'octeon-drr-config'David S. Miller10-15/+253
Sunil Goutham says: ==================== cn10k: DWRR MTU and weights configuration On OcteonTx2 DWRR quantum is directly configured into each of the transmit scheduler queues. And PF/VF drivers were free to config any value upto 2^24. On CN10K, HW is modified, the quantum configuration at scheduler queues is in terms of weight. And SW needs to setup a base DWRR MTU at NIX_AF_DWRR_RPM_MTU / NIX_AF_DWRR_SDP_MTU. HW will do 'DWRR MTU * weight' to get the quantum. This patch series addresses this HW change on CN10K silicons, both admin function and PF/VF drivers are modified. Also added support to program DWRR MTU via devlink params. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-02octeontx2-pf: cn10k: Config DWRR weight based on MTUSunil Goutham7-12/+52
Program SQ, MDQ, TL4 to TL2 transmit scheduler queues' DWRR weight based on DWRR MTU programmed at NIX_AF_DWRR_RPM_MTU. The DWRR MTU from admin function is retrieved via mbox. On OcteaonTx2 silicon, admin function driver responds with DWRR MTU as '1'. This helps to avoid silicon specific transmit scheduler DWRR quantum/weight configuration logic. Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-02octeontx2-af: cn10k: DWRR MTU configurationSunil Goutham5-3/+201
On OcteonTx2 DWRR quantum is directly configured into each of the transmit scheduler queues. And PF/VF drivers were free to config any value upto 2^24. On CN10K, HW is modified, the quantum configuration at scheduler queues is in terms of weight. And SW needs to setup a base DWRR MTU at NIX_AF_DWRR_RPM_MTU / NIX_AF_DWRR_SDP_MTU. HW will do 'DWRR MTU * weight' to get the quantum. For LBK traffic, value programmed into NIX_AF_DWRR_RPM_MTU register is considered as DWRR MTU. This patch programs a default DWRR MTU of 8192 into HW and also provides a way to change this via devlink params. Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-02selftests/net: remove min gso test in packet_sndDust Li1-3/+0
This patch removed the 'raw gso min size - 1' test which always fails now: ./in_netns.sh ./psock_snd -v -c -g -l "${mss}" raw gso min size - 1 (expected to fail) tx: 1524 rx: 1472 OK After commit 7c6d2ecbda83 ("net: be more gentle about silly gso requests coming from user"), we relaxed the min gso_size check in virtio_net_hdr_to_skb(). So when a packet which is smaller then the gso_size, GSO for this packet will not be set, the packet will be send/recv successfully. Signed-off-by: Dust Li <dust.li@linux.alibaba.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-02bonding: 3ad: fix the concurrency between __bond_release_one() and ↵Yufeng Mo1-1/+2
bond_3ad_state_machine_handler() Some time ago, I reported a calltrace issue "did not find a suitable aggregator", please see[1]. After a period of analysis and reproduction, I find that this problem is caused by concurrency. Before the problem occurs, the bond structure is like follows: bond0 - slaver0(eth0) - agg0.lag_ports -> port0 - port1 \ port0 \ slaver1(eth1) - agg1.lag_ports -> NULL \ port1 If we run 'ifenslave bond0 -d eth1', the process is like below: excuting __bond_release_one() | bond_upper_dev_unlink()[step1] | | | | | bond_3ad_lacpdu_recv() | | ->bond_3ad_rx_indication() | | spin_lock_bh() | | ->ad_rx_machine() | | ->__record_pdu()[step2] | | spin_unlock_bh() | | | | bond_3ad_state_machine_handler() | spin_lock_bh() | ->ad_port_selection_logic() | ->try to find free aggregator[step3] | ->try to find suitable aggregator[step4] | ->did not find a suitable aggregator[step5] | spin_unlock_bh() | | | | bond_3ad_unbind_slave() | spin_lock_bh() spin_unlock_bh() step1: already removed slaver1(eth1) from list, but port1 remains step2: receive a lacpdu and update port0 step3: port0 will be removed from agg0.lag_ports. The struct is "agg0.lag_ports -> port1" now, and agg0 is not free. At the same time, slaver1/agg1 has been removed from the list by step1. So we can't find a free aggregator now. step4: can't find suitable aggregator because of step2 step5: cause a calltrace since port->aggregator is NULL To solve this concurrency problem, put bond_upper_dev_unlink() after bond_3ad_unbind_slave(). In this way, we can invalid the port first and skip this port in bond_3ad_state_machine_handler(). This eliminates the situation that the slaver has been removed from the list but the port is still valid. [1]https://lore.kernel.org/netdev/10374.1611947473@famine/ Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-02net_sched: refactor TC action init APICong Wang37-169/+185
TC action ->init() API has 10 parameters, it becomes harder to read. Some of them are just boolean and can be replaced by flags. Similarly for the internal API tcf_action_init() and tcf_exts_validate(). This patch converts them to flags and fold them into the upper 16 bits of "flags", whose lower 16 bits are still reserved for user-space. More specifically, the following kernel flags are introduced: TCA_ACT_FLAGS_POLICE replace 'name' in a few contexts, to distinguish whether it is compatible with policer. TCA_ACT_FLAGS_BIND replaces 'bind', to indicate whether this action is bound to a filter. TCA_ACT_FLAGS_REPLACE replaces 'ovr' in most contexts, means we are replacing an existing action. TCA_ACT_FLAGS_NO_RTNL replaces 'rtnl_held' but has the opposite meaning, because we still hold RTNL in most cases. The only user-space flag TCA_ACT_FLAGS_NO_PERCPU_STATS is untouched and still stored as before. I have tested this patch with tdc and I do not see any failure related to this patch. Tested-by: Vlad Buslov <vladbu@nvidia.com> Acked-by: Jamal Hadi Salim<jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-02niu: read property length only if we use itMartin Kaiser1-3/+3
In three places, the driver calls of_get_property and reads the property length although the length is not used. Update the calls to not request the length. Signed-off-by: Martin Kaiser <martin@kaiser.cx> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-31Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski83-1808/+5027
Andrii Nakryiko says: ==================== bpf-next 2021-07-30 We've added 64 non-merge commits during the last 15 day(s) which contain a total of 83 files changed, 5027 insertions(+), 1808 deletions(-). The main changes are: 1) BTF-guided binary data dumping libbpf API, from Alan. 2) Internal factoring out of libbpf CO-RE relocation logic, from Alexei. 3) Ambient BPF run context and cgroup storage cleanup, from Andrii. 4) Few small API additions for libbpf 1.0 effort, from Evgeniy and Hengqi. 5) bpf_program__attach_kprobe_opts() fixes in libbpf, from Jiri. 6) bpf_{get,set}sockopt() support in BPF iterators, from Martin. 7) BPF map pinning improvements in libbpf, from Martynas. 8) Improved module BTF support in libbpf and bpftool, from Quentin. 9) Bpftool cleanups and documentation improvements, from Quentin. 10) Libbpf improvements for supporting CO-RE on old kernels, from Shuyi. 11) Increased maximum cgroup storage size, from Stanislav. 12) Small fixes and improvements to BPF tests and samples, from various folks. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (64 commits) tools: bpftool: Complete metrics list in "bpftool prog profile" doc tools: bpftool: Document and add bash completion for -L, -B options selftests/bpf: Update bpftool's consistency script for checking options tools: bpftool: Update and synchronise option list in doc and help msg tools: bpftool: Complete and synchronise attach or map types selftests/bpf: Check consistency between bpftool source, doc, completion tools: bpftool: Slightly ease bash completion updates unix_bpf: Fix a potential deadlock in unix_dgram_bpf_recvmsg() libbpf: Add btf__load_vmlinux_btf/btf__load_module_btf tools: bpftool: Support dumping split BTF by id libbpf: Add split BTF support for btf__load_from_kernel_by_id() tools: Replace btf__get_from_id() with btf__load_from_kernel_by_id() tools: Free BTF objects at various locations libbpf: Rename btf__get_from_id() as btf__load_from_kernel_by_id() libbpf: Rename btf__load() as btf__load_into_kernel() libbpf: Return non-null error on failures in libbpf_find_prog_btf_id() bpf: Emit better log message if bpf_iter ctx arg btf_id == 0 tools/resolve_btfids: Emit warnings and patch zero id for missing symbols bpf: Increase supported cgroup storage value size libbpf: Fix race when pinning maps in parallel ... ==================== Link: https://lore.kernel.org/r/20210730225606.1897330-1-andrii@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>