2021-09-28  bpf, test, cgroup: Use sk_{alloc,free} for test cases  (Daniel Borkmann; 1 file, -5/+9)

BPF test infra has some hacks in place which kzalloc() a socket and perform minimal init via sock_net_set() and sock_init_data(). As a result, the sk's skcd->cgroup is NULL since it didn't go through proper initialization as it would have from sk_alloc(). Rather than re-adding a NULL test in sock_cgroup_ptr() just for this, use the sk_{alloc,free}() pair for the test socket. The latter also allows us to get rid of the bpf_sk_storage_free() special case.

Fixes: 8520e224f547 ("bpf, cgroups: Fix cgroup v2 fallback on v1/v2 mixed mode")
Fixes: b7a1848e8398 ("bpf: add BPF_PROG_TEST_RUN support for flow dissector")
Fixes: 2cb494a36c98 ("bpf: add tests for direct packet access from CGROUP_SKB")
Reported-by: syzbot+664b58e9a40fbb2cec71@syzkaller.appspotmail.com
Reported-by: syzbot+33f36d0754d4c5c0e102@syzkaller.appspotmail.com
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: syzbot+664b58e9a40fbb2cec71@syzkaller.appspotmail.com
Tested-by: syzbot+33f36d0754d4c5c0e102@syzkaller.appspotmail.com
Link: https://lore.kernel.org/bpf/20210927123921.21535-2-daniel@iogearbox.net

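A minimal sketch of the pattern described above (the dummy proto and the surrounding error handling are illustrative, not the exact upstream diff):

    static struct proto bpf_dummy_proto = {
            .name     = "bpf_dummy",
            .owner    = THIS_MODULE,
            .obj_size = sizeof(struct sock),
    };

    /* Allocate through the regular slow path so skcd->cgroup is set up: */
    sk = sk_alloc(net, AF_UNSPEC, GFP_USER, &bpf_dummy_proto, 1);
    if (!sk)
            return -ENOMEM;
    sock_init_data(NULL, sk);
    sock_net_set(sk, net);

    /* ... run the test ... */

    sk_free(sk);        /* also releases sk storage, no bpf_sk_storage_free() */
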
2021-09-28  bpf, cgroup: Assign cgroup in cgroup_sk_alloc when called from interrupt  (Daniel Borkmann; 1 file, -5/+12)

If cgroup_sk_alloc() is called from interrupt context, then just assign the root cgroup to skcd->cgroup. Prior to commit 8520e224f547 ("bpf, cgroups: Fix cgroup v2 fallback on v1/v2 mixed mode") we would just return, and later on in sock_cgroup_ptr(), we were NULL-testing the cgroup in the fast-path, and iff indeed NULL, returning the root cgroup (v ?: &cgrp_dfl_root.cgrp). Rather than re-adding the NULL-test to the fast-path, we can just assign it once from cgroup_sk_alloc() given v1/v2 handling has been simplified. The migration from the NULL test with returning &cgrp_dfl_root.cgrp to assigning &cgrp_dfl_root.cgrp directly does /not/ change behavior for callers of sock_cgroup_ptr().

syzkaller was able to trigger a splat in the legacy netrom code base, where the RX handler in nr_rx_frame() calls nr_make_new(), which calls sk_alloc() and therefore cgroup_sk_alloc() under the in_interrupt() condition. Thus skcd->cgroup is NULL, and cgroup_sk_free() trips over it since it expects a non-NULL object. There are a few other candidates aside from netrom with a similar pattern, where the accept-like implementation just calls sk_alloc() and thus cgroup_sk_alloc() instead of sk_clone_lock() with the corresponding cgroup_sk_clone(), which would inherit the cgroup from the parent socket. None of them are core protocols that BPF cgroup programs run from. However, in the future, they should follow suit and implement a similar inheritance mechanism.

Additionally, with a !CONFIG_CGROUP_NET_PRIO and !CONFIG_CGROUP_NET_CLASSID configuration, the same issue was exposed also prior to 8520e224f547 due to commit e876ecc67db8 ("cgroup: memcg: net: do not associate sock with unrelated cgroup") which added the early in_interrupt() return back then.

Fixes: 8520e224f547 ("bpf, cgroups: Fix cgroup v2 fallback on v1/v2 mixed mode")
Fixes: e876ecc67db8 ("cgroup: memcg: net: do not associate sock with unrelated cgroup")
Reported-by: syzbot+df709157a4ecaf192b03@syzkaller.appspotmail.com
Reported-by: syzbot+533f389d4026d86a2a95@syzkaller.appspotmail.com
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: syzbot+df709157a4ecaf192b03@syzkaller.appspotmail.com
Tested-by: syzbot+533f389d4026d86a2a95@syzkaller.appspotmail.com
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/bpf/20210927123921.21535-1-daniel@iogearbox.net

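A sketch of the interrupt-context path following the description above (abbreviated; the real function in kernel/cgroup/cgroup.c also covers the task-context path):

    void cgroup_sk_alloc(struct sock_cgroup_data *skcd)
    {
            if (in_interrupt()) {
                    /* Pin the root cgroup so sock_cgroup_ptr() and
                     * cgroup_sk_free() always see a valid object, instead
                     * of the pre-8520e224f547 early return.
                     */
                    cgroup_get(&cgrp_dfl_root.cgrp);
                    skcd->cgroup = &cgrp_dfl_root.cgrp;
                    return;
            }

            /* ... normal path: inherit and pin current task's cgroup ... */
    }
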
2021-09-28  libbpf: Fix segfault in static linker for objects without BTF  (Kumar Kartikeya Dwivedi; 1 file, -1/+7)

When a BPF object is compiled without BTF info (without -g), trying to link such objects using bpftool causes a SIGSEGV due to btf__get_nr_types accessing obj->btf, which is NULL. Fix this by checking for the NULL pointer and returning an error.

Reproducer:

    $ cat a.bpf.c
    extern int foo(void);
    int bar(void) { return foo(); }
    $ cat b.bpf.c
    int foo(void) { return 0; }
    $ clang -O2 -target bpf -c a.bpf.c
    $ clang -O2 -target bpf -c b.bpf.c
    $ bpftool gen obj out a.bpf.o b.bpf.o
    Segmentation fault (core dumped)

After the fix:

    $ bpftool gen obj out a.bpf.o b.bpf.o
    libbpf: failed to find BTF info for object 'a.bpf.o'
    Error: failed to link 'a.bpf.o': Unknown error -22 (-22)

Fixes: a46349227cd8 ("libbpf: Add linker extern resolution support for functions and global variables")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210924023725.70228-1-memxor@gmail.com

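A sketch of the guard; the function and field names here are assumptions about libbpf's linker internals, not the exact upstream code:

    static int linker_fixup_btf(struct src_obj *obj)
    {
            if (!obj->btf) {
                    pr_warn("failed to find BTF info for object '%s'\n",
                            obj->filename);
                    return -EINVAL; /* surfaces as "Unknown error -22" above */
            }
            /* safe to call btf__get_nr_types(obj->btf) from here on */
            return 0;
    }
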
2021-09-28  MAINTAINERS: Add btf headers to BPF  (Dave Marchevsky; 1 file, -0/+2)

BPF folks maintain these and they're not picked up by the current MAINTAINERS entries. Files caught by the added globs:

    include/linux/btf.h
    include/linux/btf_ids.h
    include/uapi/linux/btf.h

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210924193557.3081469-1-davemarchevsky@fb.com

2021-09-28  bpf: Exempt CAP_BPF from checks against bpf_jit_limit  (Lorenz Bauer; 1 file, -1/+1)

When introducing CAP_BPF, bpf_jit_charge_modmem() was not changed to treat programs with CAP_BPF as privileged for the purpose of JIT memory allocation. This means that a program without CAP_BPF can block a program with CAP_BPF from loading a program. Fix this by checking bpf_capable() in bpf_jit_charge_modmem().

Fixes: 2c78ee898d8f ("bpf: Implement CAP_BPF")
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210922111153.19843-1-lmb@cloudflare.com

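A sketch of the changed check, assuming the 5.15-era shape of the function in kernel/bpf/core.c (page-based accounting; the capability test is the point here):

    int bpf_jit_charge_modmem(u32 pages)
    {
            if (atomic_long_add_return(pages, &bpf_jit_current) >
                (bpf_jit_limit >> PAGE_SHIFT)) {
                    if (!bpf_capable()) {   /* was: !capable(CAP_SYS_ADMIN) */
                            atomic_long_sub(pages, &bpf_jit_current);
                            return -EPERM;
                    }
            }
            return 0;
    }
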
2021-09-28  bpf/tests: Add tail call limit test with external function call  (Johan Almbladh; 1 file, -3/+83)

This patch adds a tail call limit test where the program also emits a BPF_CALL to an external function prior to the tail call. It mainly tests that JITed programs preserve their internal register state, for example the tail call count, across such external calls.

Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210914091842.4186267-15-johan.almbladh@anyfinetworks.com

2021-09-28  bpf/tests: Fix error in tail call limit tests  (Johan Almbladh; 1 file, -10/+27)

This patch fixes an error in the tail call limit test that caused the test to fail for the x86-64 JIT. Previously, the register R0 was used to report the total number of tail calls made. However, after a tail call fall-through, the value of the R0 register is undefined. Now, all tail call error path tests instead use context state to store the count.

Fixes: 874be05f525e ("bpf, tests: Add tail call test suite")
Reported-by: Paul Chaignon <paul@cilium.io>
Reported-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Link: https://lore.kernel.org/bpf/20210914091842.4186267-14-johan.almbladh@anyfinetworks.com

2021-09-28  bpf/tests: Add more BPF_END byte order conversion tests  (Johan Almbladh; 1 file, -0/+122)

This patch adds tests of the high 32 bits of 64-bit BPF_END conversions. It also adds a mirrored set of tests where the source bytes are reversed. The MSB of each byte is now set on the high word instead, possibly affecting sign-extension during conversion in a different way. Mainly for JIT testing.

Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210914091842.4186267-13-johan.almbladh@anyfinetworks.com

2021-09-28  bpf/tests: Expand branch conversion JIT test  (Johan Almbladh; 1 file, -34/+91)

This patch expands the branch conversion test introduced by 66e5eb84 ("bpf, tests: Add branch conversion JIT test"). The test now includes a JMP with maximum eBPF offset. This triggers branch conversion for the 64-bit MIPS JIT. Additional variants are also added for cases when the branch is taken or not taken.

Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210914091842.4186267-12-johan.almbladh@anyfinetworks.com

2021-09-28  bpf/tests: Add JMP tests with degenerate conditional  (Johan Almbladh; 1 file, -0/+229)

This patch adds a set of tests for JMP and JMP32 operations where the branch decision is known at JIT time. Mainly testing JIT behaviour.

Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210914091842.4186267-11-johan.almbladh@anyfinetworks.com

2021-09-28  bpf/tests: Add JMP tests with small offsets  (Johan Almbladh; 1 file, -0/+71)

This patch adds a set of tests for JMP to verify that the JITed jump offset is calculated correctly. We pretend that the verifier has inserted any zero extensions to make the jump-over operations JIT to one instruction each, in order to control the exact JITed jump offset.

Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210914091842.4186267-10-johan.almbladh@anyfinetworks.com

2021-09-28  bpf/tests: Add test case flag for verifier zero-extension  (Johan Almbladh; 1 file, -0/+3)

This patch adds a new flag to indicate that the verifier did insert zero-extensions, even though the verifier is not being run for any of the tests.

Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210914091842.4186267-9-johan.almbladh@anyfinetworks.com

2021-09-28  bpf/tests: Add exhaustive test of LD_IMM64 immediate magnitudes  (Johan Almbladh; 1 file, -0/+63)

This patch adds a test for the 64-bit immediate load, a two-instruction operation, to verify correctness for all possible magnitudes of the immediate operand. Mainly intended for JIT testing.

Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210914091842.4186267-8-johan.almbladh@anyfinetworks.com

2021-09-28  bpf/tests: Add staggered JMP and JMP32 tests  (Johan Almbladh; 1 file, -0/+829)

This patch adds a new type of jump test where the program jumps forwards and backwards with increasing offset. It mainly tests JITs where a relative jump may generate different JITed code depending on the offset size, read: MIPS.

Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210914091842.4186267-7-johan.almbladh@anyfinetworks.com

2021-09-28  bpf/tests: Add exhaustive tests of JMP operand magnitudes  (Johan Almbladh; 1 file, -0/+779)

This patch adds a set of tests for conditional JMP and JMP32 operations to verify correctness for all possible magnitudes of the immediate and register operands. Mainly intended for JIT testing.

Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210914091842.4186267-6-johan.almbladh@anyfinetworks.com

2021-09-28  bpf/tests: Add exhaustive tests of ALU operand magnitudes  (Johan Almbladh; 1 file, -0/+772)

This patch adds a set of tests for ALU64 and ALU32 arithmetic and bitwise logical operations to verify correctness for all possible magnitudes of the register and immediate operands. Mainly intended for JIT testing.

The patch introduces a pattern generator that can be used to drive extensive tests of different kinds of operations. It is parameterized to allow tuning of the operand combinations to test.

Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210914091842.4186267-5-johan.almbladh@anyfinetworks.com

2021-09-28  bpf/tests: Add exhaustive tests of ALU shift values  (Johan Almbladh; 1 file, -0/+260)

This patch adds a set of tests for ALU64 and ALU32 shift operations to verify correctness for all possible shift values. Mainly intended for JIT testing.

Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210914091842.4186267-4-johan.almbladh@anyfinetworks.com

2021-09-28  bpf/tests: Reduce memory footprint of test suite  (Johan Almbladh; 1 file, -14/+12)

The test suite used to call any fill_helper callbacks to generate eBPF program data for all test cases at once. This caused ballooning memory requirements as more extensive test cases were added. Now each fill_helper is called just before its test is run, and the allocated memory is released afterwards, before the next test case is processed.

Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210914091842.4186267-3-johan.almbladh@anyfinetworks.com

2021-09-28  bpf/tests: Allow different number of runs per test case  (Johan Almbladh; 1 file, -0/+4)

This patch allows a test case to specify the number of runs to use. For compatibility with existing test case definitions, the default value 0 is interpreted as MAX_TESTRUNS. A reduced number of runs is useful for complex test programs where 1000 runs may take a very long time. Instead of reducing what is tested, one can instead reduce the number of times the test is run.

Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210914091842.4186267-2-johan.almbladh@anyfinetworks.com

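A sketch of how the new field could be interpreted; the nr_testruns member name is an assumption based on the description:

    static int get_nr_runs(const struct bpf_test *test)
    {
            /* 0 means "unspecified", keeping old test definitions working */
            return test->nr_testruns ? test->nr_testruns : MAX_TESTRUNS;
    }
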
2021-09-28  libbpf: Ignore STT_SECTION symbols in 'maps' section  (Toke Høiland-Jørgensen; 1 file, -2/+3)

When parsing legacy map definitions, libbpf would error out when encountering an STT_SECTION symbol. This becomes a problem because some versions of binutils will produce SECTION symbols for every section when processing an ELF file, so BPF files run through 'strip' will end up with such symbols, making libbpf refuse to load them. There's not really any reason why erroring out is strictly necessary, so change libbpf to just ignore SECTION symbols when parsing the ELF.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210927205810.715656-1-toke@redhat.com

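A sketch of the skip inside the symbol-iteration loop (illustrative; libbpf walks the symbol table via gelf):

    GElf_Sym sym;

    /* ... for each symbol belonging to the 'maps' section ... */
    if (GELF_ST_TYPE(sym.st_info) == STT_SECTION)
            continue;   /* strip-generated section symbols define no map */
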
2021-09-28  Merge branch 'bpf-xsk-rx-batch'  (Daniel Borkmann; 10 files, -156/+376)

Magnus Karlsson says:

====================
This patch set introduces a batched interface for Rx buffer allocation in the AF_XDP buffer pool. Instead of using xsk_buff_alloc(*pool), drivers can now use xsk_buff_alloc_batch(*pool, **xdp_buff_array, max). Instead of returning a pointer to an xdp_buff, it returns the number of xdp_buffs it managed to allocate up to the maximum value of the max parameter in the function call. Pointers to the allocated xdp_buffs are put in the xdp_buff_array supplied in the call. This could be a SW ring that already exists in the driver or a new structure that the driver has allocated.

    u32 xsk_buff_alloc_batch(struct xsk_buff_pool *pool, struct xdp_buff **xdp, u32 max);

When using this interface, the driver should also use the new interface below to set the relevant fields in the struct xdp_buff. The reason for this is that xsk_buff_alloc_batch() does not fill in the data and data_meta fields for you as is the case with xsk_buff_alloc(). So it is not sufficient to just set data_end (effectively the size) anymore in the driver. The reason for this is performance, as explained in detail in the commit message.

    void xsk_buff_set_size(struct xdp_buff *xdp, u32 size);

Patch 6 also optimizes the buffer allocation in the aligned case. In this case, we can skip the reinitialization of most fields in the xdp_buff_xsk struct at allocation time. As the number of elements in the heads array is equal to the number of possible buffers in the umem, we can initialize them once and for all at bind time and then just point to the correct one in the xdp_buff_array that is returned to the driver. No reason to have a stack of free head entries. In the unaligned case, the buffers can reside anywhere in the umem, so this optimization is not possible as we still have to fill in the right information in the xdp_buff every single time one is allocated.

I have updated i40e and ice to use this new batched interface. These are the throughput results on my 2.1 GHz Cascade Lake system:

    Aligned mode:
      ice:  +11% / -9 cycles/pkt
      i40e: +12% / -9 cycles/pkt

    Unaligned mode:
      ice:  +1.5% / -1 cycle/pkt
      i40e: +1%   / -1 cycle/pkt

For the aligned case, batching provides around 40% of the performance improvement and the aligned optimization the rest, around 60%. Would have expected a ~4% boost for unaligned with this data, but I only get around 1%. Do not know why. Note that memory consumption in aligned mode is also reduced by this patch set.
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

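A sketch of how a driver might combine the two interfaces; the ring and the posting helper are hypothetical:

    /* Refill: hand out up to nb_buffs xdp_buff pointers in one call. */
    u32 i, n = xsk_buff_alloc_batch(pool, xdp_arr, nb_buffs);

    for (i = 0; i < n; i++) {
            dma_addr_t dma = xsk_buff_xdp_get_dma(xdp_arr[i]);

            post_rx_descriptor(ring, dma);  /* hypothetical HW posting */
    }

    /* Rx irq, once the frame length is known: set data, data_meta and
     * data_end in one go, since the batched alloc did not touch them.
     */
    xsk_buff_set_size(xdp, size);
    xsk_buff_dma_sync_for_cpu(xdp, pool);
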
2021-09-28  selftests: xsk: Add frame_headroom test  (Magnus Karlsson; 2 files, -11/+44)

Add a test for the frame_headroom feature that can be set on the umem. The logic added validates that all offsets in all tests and packets are valid, not just the ones that have a specifically configured frame_headroom.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210922075613.12186-14-magnus.karlsson@gmail.com

2021-09-28  selftests: xsk: Change interleaving of packets in unaligned mode  (Magnus Karlsson; 1 file, -3/+3)

Change the interleaving of packets in unaligned mode. With the current buffer addresses in the packet stream, the last buffer in the umem could not be used as a large packet could potentially write over the end of the umem. The kernel correctly threw this buffer address away and refused to use it. This is perfectly fine for all regular packet streams, but the ones used for unaligned mode have every other packet being at some different offset. As we will add checks for correct offsets in the next patch, this needs to be fixed. Just start these page-boundary straddling buffers one page earlier so that the last one is not on the last page of the umem, making all buffers valid.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210922075613.12186-13-magnus.karlsson@gmail.com

2021-09-28  selftests: xsk: Add single packet test  (Magnus Karlsson; 2 files, -0/+14)

Add a test where a single packet is sent and received. This might sound like a silly test, but since many of the interfaces in xsk are batched, it is important to be able to validate that we did not break something as fundamental as just receiving single packets, instead of batches of packets at high speed.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210922075613.12186-12-magnus.karlsson@gmail.com

2021-09-28  selftests: xsk: Introduce pacing of traffic  (Magnus Karlsson; 2 files, -7/+29)

Introduce pacing of traffic so that the Tx thread can never send more packets than the receiver has processed plus the number of packets it can have in its umem. So at any point in time, the number of in-flight packets (not processed by the Rx thread) is less than or equal to the number of packets that can be held in the Rx thread's umem. The batch size is also increased to improve running time.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210922075613.12186-11-magnus.karlsson@gmail.com

2021-09-28  selftests: xsk: Fix socket creation retry  (Magnus Karlsson; 1 file, -6/+5)

The socket creation retry unnecessarily registered the umem once for every retry. No reason to do this. It wastes memory and it might lead to too many pages being locked at some point and the failure of a test.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210922075613.12186-10-magnus.karlsson@gmail.com

2021-09-28  selftests: xsk: Put the same buffer only once in the fill ring  (Magnus Karlsson; 1 file, -6/+11)

Fix a problem where the fill ring was populated with too many entries. If the number of buffers in the umem was smaller than the fill ring size, the code used to loop over from the beginning of the umem and start putting the same buffers in again. This is racy, as a later packet can be received and overwrite an earlier one before the Rx thread manages to validate it.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210922075613.12186-9-magnus.karlsson@gmail.com

2021-09-28  selftests: xsk: Fix missing initialization  (Magnus Karlsson; 1 file, -0/+7)

Fix missing initialization of the member rx_pkt_nb in the packet stream. This leads to some tests declaring success too early, as the test thought all packets had already been received.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210922075613.12186-8-magnus.karlsson@gmail.com

2021-09-28  xsk: Optimize for aligned case  (Magnus Karlsson; 3 files, -38/+79)

Optimize for the aligned case by precomputing the parameter values of the xdp_buff_xsk and xdp_buff structures in the heads array. We can do this as the heads array size is equal to the number of chunks in the umem for the aligned case. Then every entry in this array will reflect a certain chunk/frame and can therefore be prepopulated with the correct values and we can drop the use of the free_heads stack. Note that it is not possible to allocate more buffers than what has been allocated in the aligned case since each chunk can only contain a single buffer.

We can unfortunately not do this in the unaligned case as one chunk might contain multiple buffers. In this case, we keep the old scheme of populating a heads entry every time it is used and using the free_heads stack.

Also move xp_release() and xp_get_handle() to xsk_buff_pool.h. They were for some reason in xsk.c even though they are buffer pool operations.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210922075613.12186-7-magnus.karlsson@gmail.com

2021-09-28  i40e: Use the xsk batched rx allocation interface  (Magnus Karlsson; 1 file, -27/+25)

Use the new xsk batched rx allocation interface for the zero-copy data path. As the array of struct xdp_buff pointers kept by the driver is really a ring that wraps, the allocation routine is modified to detect a wrap and in that case call the allocation function twice. The allocation function cannot deal with wrapped rings, only arrays. As we now know exactly how many buffers we get and that there is no wrapping, the allocation function can be simplified even more as all if-statements in the allocation loop can be removed, improving performance.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210922075613.12186-6-magnus.karlsson@gmail.com

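A sketch of the wrap handling described above; the struct and field names are hypothetical, the point being the split into at most two linear calls around the wrap:

    static u32 alloc_batch_wrapped(struct rx_ring *rx, u32 count)
    {
            u32 until_wrap = rx->count - rx->next_to_use;
            u32 n;

            if (count <= until_wrap)        /* no wrap: one linear array */
                    return xsk_buff_alloc_batch(rx->pool,
                                &rx->xdp_bufs[rx->next_to_use], count);

            n = xsk_buff_alloc_batch(rx->pool,
                                     &rx->xdp_bufs[rx->next_to_use],
                                     until_wrap);
            if (n == until_wrap)            /* first leg full, continue at 0 */
                    n += xsk_buff_alloc_batch(rx->pool, &rx->xdp_bufs[0],
                                              count - until_wrap);
            return n;
    }
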
2021-09-28  ice: Use the xsk batched rx allocation interface  (Magnus Karlsson; 1 file, -25/+19)

Use the new xsk batched rx allocation interface for the zero-copy data path. As the array of struct xdp_buff pointers kept by the driver is really a ring that wraps, the allocation routine is modified to detect a wrap and in that case call the allocation function twice. The allocation function cannot deal with wrapped rings, only arrays. As we now know exactly how many buffers we get and that there is no wrapping, the allocation function can be simplified even more as all if-statements in the allocation loop can be removed, improving performance.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210922075613.12186-5-magnus.karlsson@gmail.com

2021-09-28  ice: Use xdp_buf instead of rx_buf for xsk zero-copy  (Magnus Karlsson; 2 files, -39/+33)

In order to use the new xsk batched buffer allocation interface, a pointer to an array of struct xdp_buff pointers needs to be provided so that the function can put the result of the allocation there. In the ice driver, we already have a ring that stores pointers to xdp_buffs. This is only used for the xsk zero-copy driver and is a union with the structure that is used for the regular non zero-copy path. Unfortunately, that structure is larger than the xdp_buff pointers, which means that there will be a stride (of 20 bytes) between each xdp_buff pointer. And feeding this into the xsk_buff_alloc_batch interface will not work since it assumes a regular array of xdp_buff pointers (each 8 bytes with 0 bytes in-between them on a 64-bit system).

To fix this, remove the xdp_buff pointer from the rx_buf union and move it one step higher to the union above, which only has pointers to arrays in it. This solves the problem and we can feed the SW ring of xdp_buff pointers straight into the allocation function in the next patch when that interface is used. This will improve performance.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210922075613.12186-4-magnus.karlsson@gmail.com

2021-09-28  xsk: Batched buffer allocation for the pool  (Magnus Karlsson; 4 files, -4/+118)

Add a new driver interface xsk_buff_alloc_batch() offering batched buffer allocations to improve performance. The new interface takes three arguments: the buffer pool to allocate from, a pointer to an array of struct xdp_buff pointers which will contain pointers to the allocated xdp_buffs, and an unsigned integer specifying the max number of buffers to allocate. The return value is the actual number of buffers that the allocator managed to allocate and it will be in the range 0 <= N <= max, where max is the third parameter to the function.

    u32 xsk_buff_alloc_batch(struct xsk_buff_pool *pool, struct xdp_buff **xdp, u32 max);

A second driver interface is also introduced that needs to be used in conjunction with xsk_buff_alloc_batch(). It is a helper that sets the size of struct xdp_buff and is used by the NIC Rx irq routine when receiving a packet. This helper sets the three struct members data, data_meta, and data_end. The first two are, in the xsk_buff_alloc() case, set in the allocation routine, while data_end is set when a packet is received in the receive irq function. This unfortunately leads to worse performance since the xdp_buff is touched twice with a long time period in between, leading to an extra cache miss. Instead, we fill out the xdp_buff with all 3 fields at one single point in time in the driver, when the size of the packet is known. Hence this helper. Note that the driver has to use this helper (or set all three fields itself) when using xsk_buff_alloc_batch(). xsk_buff_alloc() works as before and does not require this.

    void xsk_buff_set_size(struct xdp_buff *xdp, u32 size);

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210922075613.12186-3-magnus.karlsson@gmail.com

2021-09-28  xsk: Get rid of unused entry in struct xdp_buff_xsk  (Magnus Karlsson; 1 file, -1/+0)

Get rid of the unused entry "unaligned" in struct xdp_buff_xsk.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210922075613.12186-2-magnus.karlsson@gmail.com

2021-09-28  Merge tag 'perf-tools-fixes-for-v5.15-2021-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux  (Linus Torvalds; 20 files, -43/+70)

Pull more perf tools fixes from Arnaldo Carvalho de Melo:

 - Fix 'perf test' DWARF unwind for optimized builds.
 - Fix 'perf test' 'Object code reading' when dealing with samples in @plt symbols.
 - Fix off-by-one directory paths in the ARM support code.
 - Fix error message to eliminate confusion in 'perf config' when first creating a config file.
 - 'perf iostat' fix for system wide operation.
 - Fix printing of metrics when 'perf iostat' is used with one or more iio_root_ports and unconnected cpus (using -C).
 - Fix several typos in the documentation files.
 - Fix spelling mistake "icach" -> "icache" in the power8 JSON vendor files.

* tag 'perf-tools-fixes-for-v5.15-2021-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
  perf iostat: Fix Segmentation fault from NULL 'struct perf_counts_values *'
  perf iostat: Use system-wide mode if the target cpu_list is unspecified
  perf config: Refine error message to eliminate confusion
  perf doc: Fix typos all over the place
  perf arm: Fix off-by-one directory paths.
  perf vendor events powerpc: Fix spelling mistake "icach" -> "icache"
  perf tests: Fix flaky test 'Object code reading'
  perf test: Fix DWARF unwind for optimized builds.

2021-09-27  Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm  (Linus Torvalds; 40 files, -269/+556)

Pull kvm fixes from Paolo Bonzini:
 "A bit late... I got sidetracked by back-from-vacation routines and conferences. But most of these patches are already a few weeks old and things look more calm on the mailing list than what this pull request would suggest.

  x86:
   - missing TLB flush
   - nested virtualization fixes for SMM (secure boot on nested hypervisor) and other nested SVM fixes
   - syscall fuzzing fixes
   - live migration fix for AMD SEV
   - mirror VMs now work for SEV-ES too
   - fixes for reset
   - possible out-of-bounds access in IOAPIC emulation
   - fix enlightened VMCS on Windows 2022

  ARM:
   - Add missing FORCE target when building the EL2 object
   - Fix a PMU probe regression on some platforms

  Generic:
   - KCSAN fixes

  selftests:
   - random fixes, mostly for clang compilation"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (43 commits)
  selftests: KVM: Explicitly use movq to read xmm registers
  selftests: KVM: Call ucall_init when setting up in rseq_test
  KVM: Remove tlbs_dirty
  KVM: X86: Synchronize the shadow pagetable before link it
  KVM: X86: Fix missed remote tlb flush in rmap_write_protect()
  KVM: x86: nSVM: don't copy virt_ext from vmcb12
  KVM: x86: nSVM: test eax for 4K alignment for GP errata workaround
  KVM: x86: selftests: test simultaneous uses of V_IRQ from L1 and L0
  KVM: x86: nSVM: restore int_vector in svm_clear_vintr
  kvm: x86: Add AMD PMU MSRs to msrs_to_save_all[]
  KVM: x86: nVMX: re-evaluate emulation_required on nested VM exit
  KVM: x86: nVMX: don't fail nested VM entry on invalid guest state if !from_vmentry
  KVM: x86: VMX: synthesize invalid VM exit when emulating invalid guest state
  KVM: x86: nSVM: refactor svm_leave_smm and smm_enter_smm
  KVM: x86: SVM: call KVM_REQ_GET_NESTED_STATE_PAGES on exit from SMM mode
  KVM: x86: reset pdptrs_from_userspace when exiting smm
  KVM: x86: nSVM: restore the L1 host state prior to resuming nested guest on SMM exit
  KVM: nVMX: Filter out all unsupported controls when eVMCS was activated
  KVM: KVM: Use cpumask_available() to check for NULL cpumask when kicking vCPUs
  KVM: Clean up benign vcpu->cpu data races when kicking vCPUs
  ...

2021-09-27  Merge tag 'media/v5.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media  (Linus Torvalds; 5 files, -26/+45)

Pull media fixes from Mauro Carvalho Chehab:
 "A couple of driver fixes:

   - hantro: Fix check for single irq
   - cedrus: Fix SUNXI tile size calculation
   - s5p-jpeg: rename JPEG marker constants to prevent build warnings
   - ir_toy: prevent device from hanging during transmit"

* tag 'media/v5.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
  media: ir_toy: prevent device from hanging during transmit
  media: s5p-jpeg: rename JPEG marker constants to prevent build warnings
  media: cedrus: Fix SUNXI tile size calculation
  media: hantro: Fix check for single irq

2021-09-27  watchdog/sb_watchdog: fix compilation problem due to COMPILE_TEST  (Jackie Liu; 1 file, -1/+1)

Compiling sb_watchdog requires SIBYTE_HDR_FEATURES to be defined explicitly, as done in arch/mips/sibyte/Platform:

    cflags-$(CONFIG_SIBYTE_BCM112X) += \
        -I$(srctree)/arch/mips/include/asm/mach-sibyte \
        -DSIBYTE_HDR_FEATURES=SIBYTE_HDR_FMASK_1250_112x_ALL

Otherwise, SIBYTE_HDR_FEATURES defaults to SIBYTE_HDR_FMASK_ALL, which means:

    #define SIBYTE_HDR_FMASK_ALL (SIBYTE_HDR_FMASK_1250_ALL | SIBYTE_HDR_FMASK_112x_ALL \
                                  | SIBYTE_HDR_FMASK_1480_ALL)

So, if the build is not limited to CPU_SB1, we get errors such as:

    arch/mips/include/asm/sibyte/bcm1480_scd.h:261: error: "M_SPC_CFG_CLEAR" redefined [-Werror]
    arch/mips/include/asm/sibyte/bcm1480_scd.h:262: error: "M_SPC_CFG_ENABLE" redefined [-Werror]

Fixes: da2a68b3eb47 ("watchdog: Enable COMPILE_TEST where possible")
Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2021-09-27  vboxfs: fix broken legacy mount signature checking  (Linus Torvalds; 1 file, -10/+2)

Commit 9d682ea6bcc7 ("vboxsf: Fix the check for the old binary mount-arguments struct") was meant to fix a build error due to sign mismatch in 'char' and the use of character constants, but it just moved the error elsewhere, in that on some architectures characters are signed and on others they are unsigned, and that's just how the C standard works.

The proper fix is a simple "don't do that then". The code was just being silly and odd, and it should never have cared about signed vs unsigned characters in the first place, since what it is testing is not four "characters", but four bytes. And the way to compare four bytes is by using "memcmp()". Which compilers will know to turn into a single 32-bit compare with a constant, as long as you don't have crazy debug options enabled.

Link: https://lore.kernel.org/lkml/20210927094123.576521-1-arnd@kernel.org/
Cc: Arnd Bergmann <arnd@kernel.org>
Cc: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

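A sketch of the memcmp()-based check, using the signature bytes implied by the driver's old per-byte defines (treat the exact values as illustrative):

    static const unsigned char VBSF_MOUNT_SIGNATURE[4] = "\000\377\376\375";

    /* Four bytes compared as bytes; with optimization this becomes a
     * single 32-bit compare against a constant.
     */
    if (!memcmp(data, VBSF_MOUNT_SIGNATURE, sizeof(VBSF_MOUNT_SIGNATURE)))
            return true;    /* old binary mount-arguments struct */
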
2021-09-27  RDMA/hns: Add the check of the CQE size of the user space  (Wenpeng Liang; 1 file, -9/+22)

If the CQE size passed in from user space is not a size supported by the hardware, the creation of the CQ should be stopped.

Fixes: 09a5f210f67e ("RDMA/hns: Add support for CQE in size of 64 Bytes")
Link: https://lore.kernel.org/r/20210927125557.15031-3-liangwenpeng@huawei.com
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

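A sketch of the validation, assuming the driver's two supported CQE sizes; the exact identifiers in the hns CQ creation path may differ:

    if (udata) {
            if (ucmd.cqe_size != HNS_ROCE_V2_CQE_SIZE &&
                ucmd.cqe_size != HNS_ROCE_V3_CQE_SIZE) {
                    ibdev_err(ibdev, "invalid cqe size %u from userspace\n",
                              ucmd.cqe_size);
                    return -EINVAL;
            }
            hr_cq->cqe_size = ucmd.cqe_size;
    } else {
            hr_cq->cqe_size = HNS_ROCE_V2_CQE_SIZE;
    }
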
2021-09-27  RDMA/hns: Fix the size setting error when copying CQE in clean_cq()  (Wenpeng Liang; 1 file, -1/+1)

The size of a CQE is different for different versions of the hardware, so the driver needs to specify the size of the CQE explicitly.

Fixes: 09a5f210f67e ("RDMA/hns: Add support for CQE in size of 64 Bytes")
Link: https://lore.kernel.org/r/20210927125557.15031-2-liangwenpeng@huawei.com
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

2021-09-27  RDMA/hfi1: Fix kernel pointer leak  (Guo Zhi; 1 file, -4/+4)

Pointers should be printed with %p or %px rather than cast to 'unsigned long long' and printed with %llx. Change %llx to %p to print the secured pointer.

Fixes: 042a00f93aad ("IB/{ipoib,hfi1}: Add a timeout handler for rdma_netdev")
Link: https://lore.kernel.org/r/20210922134857.619602-1-qtxuning1999@sjtu.edu.cn
Signed-off-by: Guo Zhi <qtxuning1999@sjtu.edu.cn>
Acked-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

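The shape of the change as a two-line sketch; the surrounding call and argument are illustrative, not the exact hfi1 code:

    /* Before (leaks the raw kernel address): */
    dd_dev_info(dd, "txq %llx\n", (unsigned long long)txq);
    /* After (%p prints a hashed value; %px would print the raw one): */
    dd_dev_info(dd, "txq %p\n", txq);
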
2021-09-27  Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid  (Linus Torvalds; 5 files, -8/+32)

Pull HID fixes from Jiri Kosina:

 - NULL pointer dereference fixes in amd_sfh driver (Basavaraj Natikar, Evgeny Novikov)
 - data processing fix for hid-u2fzero (Andrej Shadura)
 - fix for out-of-bounds write in hid-betop (F.A.Sulaiman)
 - new device IDs / device-specific quirks

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
  HID: amd_sfh: Fix potential NULL pointer dereference
  HID: u2fzero: ignore incomplete packets without data
  HID: amd_sfh: Fix potential NULL pointer dereference
  HID: wacom: Add new Intuos BT (CTL-4100WL/CTL-6100WL) device IDs
  HID: apple: Fix logical maximum and usage maximum of Magic Keyboard JIS
  HID: betop: fix slab-out-of-bounds Write in betop_probe

2021-09-27  e100: fix buffer overrun in e100_get_regs  (Jacob Keller; 1 file, -6/+10)

The e100_get_regs function is used to implement a simple register dump for the e100 device. The data is broken into a couple of MAC control registers, and then a series of PHY registers, followed by a memory dump buffer. The total length of the register dump is defined as (1 + E100_PHY_REGS) * sizeof(u32) + sizeof(nic->mem->dump_buf).

The logic for filling in the PHY registers uses a convoluted inverted count for loop which counts from E100_PHY_REGS (0x1C) down to 0, and assigns the slots 1 + E100_PHY_REGS - i. The first loop iteration will fill in [1] and the final loop iteration will fill in [1 + 0x1C]. This is actually one more than the supposed number of PHY registers. The memory dump buffer is then filled into the space at [2 + E100_PHY_REGS], which will cause that memcpy to assign 4 bytes past the total size. The end result is that we overrun the total buffer size allocated by the kernel, which could lead to a panic or other issues due to memory corruption.

It is difficult to determine the actual total number of registers here. The only 8255x datasheet I could find indicates there are 28 total MDI registers. However, we're reading 29 here, and reading them in reverse! In addition, the ethtool e100 register dump interface appears to read the first PHY register to determine if the device is in MDI or MDIx mode. This doesn't appear to be documented anywhere within the 8255x datasheet. I can only assume it must be in register 28 (the extra register we're reading here).

Let's not change any of the intended meaning of what we copy here. Just extend the space by 4 bytes to account for the extra register and continue copying the data out in the same order. Change the E100_PHY_REGS value to be the correct total (29) so that the total register dump size is calculated properly. Fix the offset for where we copy the dump buffer so that it doesn't overrun the total size. Re-write the for loop to use counting up instead of the convoluted down-counting. Correct the mdio_read offset to use the 0-based register offsets, but maintain the bizarre reverse ordering so that we have the ABI expected by applications like ethtool. This requires an additional subtraction of 1. It seems a bit odd but it makes the flow of assignment into the register buffer easier to follow.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Felicitas Hetzelt <felicitashetzelt@gmail.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

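A sketch of the resulting layout math and the up-counting loop, assembled from the description above (buff is the u32 register buffer ethtool hands in):

    #define E100_PHY_REGS 0x1D  /* 29 registers actually dumped */

    /* buff[0] holds the MAC control registers; the PHY registers keep
     * their historical reverse order, now filled by counting up:
     */
    for (i = 0; i < E100_PHY_REGS; i++)
            buff[1 + i] = mdio_read(netdev, nic->mii.phy_id,
                                    E100_PHY_REGS - 1 - i);

    /* The dump buffer starts right after the last PHY slot: */
    memcpy(&buff[1 + E100_PHY_REGS], nic->mem->dump_buf,
           sizeof(nic->mem->dump_buf));
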
2021-09-27  e100: fix length calculation in e100_get_regs_len  (Jacob Keller; 1 file, -1/+5)

commit abf9b902059f ("e100: cleanup unneeded math") tried to simplify e100_get_regs_len and remove a double 'divide and then multiply' calculation that the e100_get_regs_len function did. This change broke the size calculation entirely, as it failed to account for the fact that the numbered registers are actually 4 bytes wide and not 1 byte. This resulted in a significant under-allocation of the register buffer used by e100_get_regs. Fix this by properly multiplying the register count by u32 first before adding the size of the dump buffer.

Fixes: abf9b902059f ("e100: cleanup unneeded math")
Reported-by: Felicitas Hetzelt <felicitashetzelt@gmail.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

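A sketch of the corrected length helper; it simply implements the formula quoted in the previous entry:

    static int e100_get_regs_len(struct net_device *netdev)
    {
            struct nic *nic = netdev_priv(netdev);

            /* MAC control reg + E100_PHY_REGS PHY regs, each 4 bytes
             * wide, plus the memory dump buffer:
             */
            return (1 + E100_PHY_REGS) * sizeof(u32) +
                   sizeof(nic->mem->dump_buf);
    }
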
2021-09-27  Merge branch 'bcmgenet-flow-control'  (David S. Miller; 3 files, -89/+130)

Florian Fainelli says:

====================
This patch series adds support for flow control to the GENET driver. The first 2 patches remove superfluous code, the 3rd one re-organizes code a little bit, and the 4th one adds the support for flow control proper.
====================

2021-09-27  net: bcmgenet: add support for ethtool flow control  (Doug Berger; 3 files, -7/+92)

This commit extends the supported ethtool operations to allow MAC level flow control to be configured for the bcmgenet driver. The ethtool utility can be used to change the configuration of auto-negotiated symmetric and asymmetric modes as well as manually configuring support for RX and TX Pause frames individually.

Signed-off-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2021-09-27  net: bcmgenet: pull mac_config from adjust_link  (Doug Berger; 1 file, -45/+49)

This commit separates out the MAC configuration that occurs on a PHY state change into a function named bcmgenet_mac_config(). This allows the function to be called directly elsewhere.

Signed-off-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2021-09-27  net: bcmgenet: remove old link state values  (Doug Berger; 3 files, -45/+0)

The PHY state machine has been fixed to only call the adjust_link callback when the link state has changed. Therefore the old link state variables are no longer needed to detect a change in link state. This commit effectively reverts commit 5ad6e6c50899 ("net: bcmgenet: improve bcmgenet_mii_setup()").

Signed-off-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2021-09-27  net: bcmgenet: remove netif_carrier_off from adjust_link  (Doug Berger; 1 file, -3/+0)

The bcmgenet_mii_setup() function is registered as the adjust_link callback from the phylib for the GENET driver. The phylib always sets the netif_carrier according to phydev->link prior to invoking the adjust_link callback, so there is no need to repeat that in the link down case within the network driver.

Signed-off-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>