summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2024-05-13selftests/bpf: Expand getsockname and getpeername testsJordan Rife5-2/+412
This expands coverage for getsockname and getpeername hooks to include getsockname4, getsockname6, getpeername4, and getpeername6. Signed-off-by: Jordan Rife <jrife@google.com> Link: https://lore.kernel.org/r/20240510190246.3247730-17-jrife@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-13sefltests/bpf: Expand sockaddr hook deny testsJordan Rife7-0/+378
This patch expands test coverage for EPERM tests to include connect and bind calls and rounds out the coverage for sendmsg by adding tests for sendmsg_unix. Signed-off-by: Jordan Rife <jrife@google.com> Link: https://lore.kernel.org/r/20240510190246.3247730-16-jrife@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-13selftests/bpf: Expand sockaddr program return value testsJordan Rife1-0/+294
This patch expands verifier coverage for program return values to cover bind, connect, sendmsg, getsockname, and getpeername hooks. It also rounds out the recvmsg coverage by adding test cases for recvmsg_unix hooks. Signed-off-by: Jordan Rife <jrife@google.com> Link: https://lore.kernel.org/r/20240510190246.3247730-15-jrife@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-13selftests/bpf: Retire test_sock_addr.(c|sh)Jordan Rife4-636/+1
Fully remove test_sock_addr.c and test_sock_addr.sh, as test coverage has been fully moved to prog_tests/sock_addr.c. Signed-off-by: Jordan Rife <jrife@google.com> Link: https://lore.kernel.org/r/20240510190246.3247730-14-jrife@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-13selftests/bpf: Remove redundant sendmsg test casesJordan Rife1-161/+0
Remove these test cases completely, as the same behavior is already covered by other sendmsg* test cases in prog_tests/sock_addr.c. This just rewrites the destination address similar to sendmsg_v4_prog and sendmsg_v6_prog. Signed-off-by: Jordan Rife <jrife@google.com> Link: https://lore.kernel.org/r/20240510190246.3247730-13-jrife@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-13selftests/bpf: Migrate ATTACH_REJECT test casesJordan Rife2-146/+102
Migrate test case from bpf/test_sock_addr.c ensuring that program attachment fails when using an inappropriate attach type. Signed-off-by: Jordan Rife <jrife@google.com> Link: https://lore.kernel.org/r/20240510190246.3247730-12-jrife@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-13selftests/bpf: Migrate expected_attach_type testsJordan Rife2-84/+96
Migrates tests from progs/test_sock_addr.c ensuring that programs fail to load when the expected attach type does not match. Signed-off-by: Jordan Rife <jrife@google.com> Link: https://lore.kernel.org/r/20240510190246.3247730-11-jrife@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-13selftests/bpf: Migrate wildcard destination rewrite testJordan Rife3-20/+37
Migrate test case from bpf/test_sock_addr.c ensuring that sendmsg respects when sendmsg6 hooks rewrite the destination IP with the IPv6 wildcard IP, [::]. Signed-off-by: Jordan Rife <jrife@google.com> Link: https://lore.kernel.org/r/20240510190246.3247730-10-jrife@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-13selftests/bpf: Migrate sendmsg6 v4 mapped address testsJordan Rife3-20/+42
Migrate test case from bpf/test_sock_addr.c ensuring that sendmsg returns -ENOTSUPP when sending to an IPv4-mapped IPv6 address to prog_tests/sock_addr.c. Signed-off-by: Jordan Rife <jrife@google.com> Link: https://lore.kernel.org/r/20240510190246.3247730-9-jrife@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-13selftests/bpf: Migrate sendmsg deny test casesJordan Rife4-45/+110
This set of tests checks that sendmsg calls are rejected (return -EPERM) when the sendmsg* hook returns 0. Replace those in bpf/test_sock_addr.c with corresponding tests in prog_tests/sock_addr.c. Signed-off-by: Jordan Rife <jrife@google.com> Link: https://lore.kernel.org/r/20240510190246.3247730-8-jrife@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-13selftests/bpf: Migrate WILDCARD_IP testJordan Rife3-20/+56
Move wildcard IP sendmsg test case out of bpf/test_sock_addr.c into prog_tests/sock_addr.c. Signed-off-by: Jordan Rife <jrife@google.com> Link: https://lore.kernel.org/r/20240510190246.3247730-7-jrife@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-13selftests/bpf: Handle SYSCALL_EPERM and SYSCALL_ENOTSUPP test casesJordan Rife1-20/+58
In preparation to move test cases from bpf/test_sock_addr.c that expect system calls to return ENOTSUPP or EPERM, this patch propagates errno from relevant system calls up to test_sock_addr() where the result can be checked. Signed-off-by: Jordan Rife <jrife@google.com> Link: https://lore.kernel.org/r/20240510190246.3247730-6-jrife@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-13selftests/bpf: Handle ATTACH_REJECT test casesJordan Rife1-1/+34
In preparation to move test cases from bpf/test_sock_addr.c that expect ATTACH_REJECT, this patch adds BPF_SKEL_FUNCS_RAW to generate load and destroy functions that use bpf_prog_attach() to control the attach_type. The normal load functions use bpf_program__attach_cgroup which does not have the same degree of control over the attach type, as bpf_program_attach_fd() calls bpf_link_create() with the attach type extracted from prog using bpf_program__expected_attach_type(). It is currently not possible to modify the attach type before bpf_program__attach_cgroup() is called, since bpf_program__set_expected_attach_type() has no effect after the program is loaded. Signed-off-by: Jordan Rife <jrife@google.com> Link: https://lore.kernel.org/r/20240510190246.3247730-5-jrife@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-13selftests/bpf: Handle LOAD_REJECT test casesJordan Rife1-5/+98
In preparation to move test cases from bpf/test_sock_addr.c that expect LOAD_REJECT, this patch adds expected_attach_type and extends load_fn to accept an expected attach type and a flag indicating whether or not rejection is expected. Signed-off-by: Jordan Rife <jrife@google.com> Link: https://lore.kernel.org/r/20240510190246.3247730-4-jrife@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-13selftests/bpf: Use program name for skel load/destroy functionsJordan Rife1-46/+50
In preparation to migrate tests from bpf/test_sock_addr.c to sock_addr.c, update BPF_SKEL_FUNCS so that it generates functions based on prog_name instead of skel_name. This allows us to differentiate between programs in the same skeleton. Signed-off-by: Jordan Rife <jrife@google.com> Link: https://lore.kernel.org/r/20240510190246.3247730-3-jrife@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-13selftests/bpf: Migrate recvmsg* return code tests to verifier_sock_addr.cJordan Rife3-70/+39
This set of tests check that the BPF verifier rejects programs with invalid return codes (recvmsg4 and recvmsg6 hooks can only return 1). This patch replaces the tests in test_sock_addr.c with verifier_sock_addr.c, a new verifier prog_tests for sockaddr hooks, in a step towards fully retiring test_sock_addr.c. Signed-off-by: Jordan Rife <jrife@google.com> Link: https://lore.kernel.org/r/20240510190246.3247730-2-jrife@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-13riscv, bpf: make some atomic operations fully orderedPuranjay Mohan1-10/+10
The BPF atomic operations with the BPF_FETCH modifier along with BPF_XCHG and BPF_CMPXCHG are fully ordered but the RISC-V JIT implements all atomic operations except BPF_CMPXCHG with relaxed ordering. Section 8.1 of the "The RISC-V Instruction Set Manual Volume I: Unprivileged ISA" [1], titled, "Specifying Ordering of Atomic Instructions" says: | To provide more efficient support for release consistency [5], each | atomic instruction has two bits, aq and rl, used to specify additional | memory ordering constraints as viewed by other RISC-V harts. and | If only the aq bit is set, the atomic memory operation is treated as | an acquire access. | If only the rl bit is set, the atomic memory operation is treated as a | release access. | | If both the aq and rl bits are set, the atomic memory operation is | sequentially consistent. Fix this by setting both aq and rl bits as 1 for operations with BPF_FETCH and BPF_XCHG. [1] https://riscv.org/wp-content/uploads/2017/05/riscv-spec-v2.2.pdf Fixes: dd642ccb45ec ("riscv, bpf: Implement more atomic operations for RV64") Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Reviewed-by: Pu Lehui <pulehui@huawei.com> Link: https://lore.kernel.org/r/20240505201633.123115-1-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-13riscv, bpf: Fix typo in commentXiao Wang1-2/+2
We can use either "instruction" or "insn" in the comment. Signed-off-by: Xiao Wang <xiao.w.wang@intel.com> Reviewed-by: Pu Lehui <pulehui@huawei.com> Link: https://lore.kernel.org/r/20240507111618.437121-1-xiao.w.wang@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-13s390/bpf: Emit a barrier for BPF_FETCH instructionsIlya Leoshkevich1-2/+6
BPF_ATOMIC_OP() macro documentation states that "BPF_ADD | BPF_FETCH" should be the same as atomic_fetch_add(), which is currently not the case on s390x: the serialization instruction "bcr 14,0" is missing. This applies to "and", "or" and "xor" variants too. s390x is allowed to reorder stores with subsequent fetches from different addresses, so code relying on BPF_FETCH acting as a barrier, for example: stw [%r0], 1 afadd [%r1], %r2 ldxw %r3, [%r4] may be broken. Fix it by emitting "bcr 14,0". Note that a separate serialization instruction is not needed for BPF_XCHG and BPF_CMPXCHG, because COMPARE AND SWAP performs serialization itself. Fixes: ba3b86b9cef0 ("s390/bpf: Implement new atomic ops") Reported-by: Puranjay Mohan <puranjay12@gmail.com> Closes: https://lore.kernel.org/bpf/mb61p34qvq3wf.fsf@kernel.org/ Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Reviewed-by: Puranjay Mohan <puranjay@kernel.org> Link: https://lore.kernel.org/r/20240507000557.12048-1-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-13Merge branch 'bpf-inline-helpers-in-arm64-and-riscv-jits'Alexei Starovoitov8-0/+132
Puranjay Mohan says: ==================== bpf: Inline helpers in arm64 and riscv JITs Changes in v5 -> v6: arm64 v5: https://lore.kernel.org/all/20240430234739.79185-1-puranjay@kernel.org/ riscv v2: https://lore.kernel.org/all/20240430175834.33152-1-puranjay@kernel.org/ - Combine riscv and arm64 changes in single series - Some coding style fixes Changes in v4 -> v5: v4: https://lore.kernel.org/all/20240429131647.50165-1-puranjay@kernel.org/ - Implement the inlining of the bpf_get_smp_processor_id() in the JIT. NOTE: This needs to be based on: https://lore.kernel.org/all/20240430175834.33152-1-puranjay@kernel.org/ to be built. Manual run of bpf-ci with this series rebased on above: https://github.com/kernel-patches/bpf/pull/6929 Changes in v3 -> v4: v3: https://lore.kernel.org/all/20240426121349.97651-1-puranjay@kernel.org/ - Fix coding style issue related to C89 standards. Changes in v2 -> v3: v2: https://lore.kernel.org/all/20240424173550.16359-1-puranjay@kernel.org/ - Fixed the xlated dump of percpu mov to "r0 = &(void __percpu *)(r0)" - Made ARM64 and x86-64 use the same code for inlining. The only difference that remains is the per-cpu address of the cpu_number. Changes in v1 -> v2: v1: https://lore.kernel.org/all/20240405091707.66675-1-puranjay12@gmail.com/ - Add a patch to inline bpf_get_smp_processor_id() - Fix an issue in MRS instruction encoding as pointed out by Will - Remove CONFIG_SMP check because arm64 kernel always compiles with CONFIG_SMP This series adds the support of internal only per-CPU instructions and inlines the bpf_get_smp_processor_id() helper call for ARM64 and RISC-V BPF JITs. Here is an example of calls to bpf_get_smp_processor_id() and percpu_array_map_lookup_elem() before and after this series on ARM64. BPF ===== BEFORE AFTER -------- ------- int cpu = bpf_get_smp_processor_id(); int cpu = bpf_get_smp_processor_id(); (85) call bpf_get_smp_processor_id#229032 (85) call bpf_get_smp_processor_id#8 p = bpf_map_lookup_elem(map, &zero); p = bpf_map_lookup_elem(map, &zero); (18) r1 = map[id:78] (18) r1 = map[id:153] (18) r2 = map[id:82][0]+65536 (18) r2 = map[id:157][0]+65536 (85) call percpu_array_map_lookup_elem#313512 (07) r1 += 496 (61) r0 = *(u32 *)(r2 +0) (35) if r0 >= 0x1 goto pc+5 (67) r0 <<= 3 (0f) r0 += r1 (79) r0 = *(u64 *)(r0 +0) (bf) r0 = &(void __percpu *)(r0) (05) goto pc+1 (b7) r0 = 0 ARM64 JIT =========== BEFORE AFTER -------- ------- int cpu = bpf_get_smp_processor_id(); int cpu = bpf_get_smp_processor_id(); mov x10, #0xfffffffffffff4d0 mrs x10, sp_el0 movk x10, #0x802b, lsl #16 ldr w7, [x10, #24] movk x10, #0x8000, lsl #32 blr x10 add x7, x0, #0x0 p = bpf_map_lookup_elem(map, &zero); p = bpf_map_lookup_elem(map, &zero); mov x0, #0xffff0003ffffffff mov x0, #0xffff0003ffffffff movk x0, #0xce5c, lsl #16 movk x0, #0xe0f3, lsl #16 movk x0, #0xca00 movk x0, #0x7c00 mov x1, #0xffff8000ffffffff mov x1, #0xffff8000ffffffff movk x1, #0x8bdb, lsl #16 movk x1, #0xb0c7, lsl #16 movk x1, #0x6000 movk x1, #0xe000 mov x10, #0xffffffffffff3ed0 add x0, x0, #0x1f0 movk x10, #0x802d, lsl #16 ldr w7, [x1] movk x10, #0x8000, lsl #32 cmp x7, #0x1 blr x10 b.cs 0x0000000000000090 add x7, x0, #0x0 lsl x7, x7, #3 add x7, x7, x0 ldr x7, [x7] mrs x10, tpidr_el1 add x7, x7, x10 b 0x0000000000000094 mov x7, #0x0 Performance improvement found using benchmark[1] ./benchs/run_bench_trigger.sh glob-arr-inc arr-inc hash-inc +---------------+-------------------+-------------------+--------------+ | Name | Before | After | % change | |---------------+-------------------+-------------------+--------------| | glob-arr-inc | 23.380 ± 1.675M/s | 25.893 ± 0.026M/s | + 10.74% | | arr-inc | 23.928 ± 0.034M/s | 25.213 ± 0.063M/s | + 5.37% | | hash-inc | 12.352 ± 0.005M/s | 12.609 ± 0.013M/s | + 2.08% | +---------------+-------------------+-------------------+--------------+ [1] https://github.com/anakryiko/linux/commit/8dec900975ef RISCV64 JIT output for `call bpf_get_smp_processor_id` ======================================================= Before After -------- ------- auipc t1,0x848c ld a5,32(tp) jalr 604(t1) mv a5,a0 Benchmark using [1] on Qemu. ./benchs/run_bench_trigger.sh glob-arr-inc arr-inc hash-inc +---------------+------------------+------------------+--------------+ | Name | Before | After | % change | |---------------+------------------+------------------+--------------| | glob-arr-inc | 1.077 ± 0.006M/s | 1.336 ± 0.010M/s | + 24.04% | | arr-inc | 1.078 ± 0.002M/s | 1.332 ± 0.015M/s | + 23.56% | | hash-inc | 0.494 ± 0.004M/s | 0.653 ± 0.001M/s | + 32.18% | +---------------+------------------+------------------+--------------+ ==================== Link: https://lore.kernel.org/r/20240502151854.9810-1-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-13bpf, arm64: inline bpf_get_smp_processor_id() helperPuranjay Mohan3-0/+28
Inline calls to bpf_get_smp_processor_id() helper in the JIT by emitting a read from struct thread_info. The SP_EL0 system register holds the pointer to the task_struct and thread_info is the first member of this struct. We can read the cpu number from the thread_info. Here is how the ARM64 JITed assembly changes after this commit: ARM64 JIT =========== BEFORE AFTER -------- ------- int cpu = bpf_get_smp_processor_id(); int cpu = bpf_get_smp_processor_id(); mov x10, #0xfffffffffffff4d0 mrs x10, sp_el0 movk x10, #0x802b, lsl #16 ldr w7, [x10, #24] movk x10, #0x8000, lsl #32 blr x10 add x7, x0, #0x0 Performance improvement using benchmark[1] ./benchs/run_bench_trigger.sh glob-arr-inc arr-inc hash-inc +---------------+-------------------+-------------------+--------------+ | Name | Before | After | % change | |---------------+-------------------+-------------------+--------------| | glob-arr-inc | 23.380 ± 1.675M/s | 25.893 ± 0.026M/s | + 10.74% | | arr-inc | 23.928 ± 0.034M/s | 25.213 ± 0.063M/s | + 5.37% | | hash-inc | 12.352 ± 0.005M/s | 12.609 ± 0.013M/s | + 2.08% | +---------------+-------------------+-------------------+--------------+ [1] https://github.com/anakryiko/linux/commit/8dec900975ef Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240502151854.9810-5-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-13arm64, bpf: add internal-only MOV instruction to resolve per-CPU addrsPuranjay Mohan4-0/+38
Support an instruction for resolving absolute addresses of per-CPU data from their per-CPU offsets. This instruction is internal-only and users are not allowed to use them directly. They will only be used for internal inlining optimizations for now between BPF verifier and BPF JITs. Since commit 7158627686f0 ("arm64: percpu: implement optimised pcpu access using tpidr_el1"), the per-cpu offset for the CPU is stored in the tpidr_el1/2 register of that CPU. To support this BPF instruction in the ARM64 JIT, the following ARM64 instructions are emitted: mov dst, src // Move src to dst, if src != dst mrs tmp, tpidr_el1/2 // Move per-cpu offset of the current cpu in tmp. add dst, dst, tmp // Add the per cpu offset to the dst. To measure the performance improvement provided by this change, the benchmark in [1] was used: Before: glob-arr-inc : 23.597 ± 0.012M/s arr-inc : 23.173 ± 0.019M/s hash-inc : 12.186 ± 0.028M/s After: glob-arr-inc : 23.819 ± 0.034M/s arr-inc : 23.285 ± 0.017M/s hash-inc : 12.419 ± 0.011M/s [1] https://github.com/anakryiko/linux/commit/8dec900975ef Signed-off-by: Puranjay Mohan <puranjay12@gmail.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240502151854.9810-4-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-13riscv, bpf: inline bpf_get_smp_processor_id()Puranjay Mohan4-0/+42
Inline the calls to bpf_get_smp_processor_id() in the riscv bpf jit. RISCV saves the pointer to the CPU's task_struct in the TP (thread pointer) register. This makes it trivial to get the CPU's processor id. As thread_info is the first member of task_struct, we can read the processor id from TP + offsetof(struct thread_info, cpu). RISCV64 JIT output for `call bpf_get_smp_processor_id` ====================================================== Before After -------- ------- auipc t1,0x848c ld a5,32(tp) jalr 604(t1) mv a5,a0 Benchmark using [1] on Qemu. ./benchs/run_bench_trigger.sh glob-arr-inc arr-inc hash-inc +---------------+------------------+------------------+--------------+ | Name | Before | After | % change | |---------------+------------------+------------------+--------------| | glob-arr-inc | 1.077 ± 0.006M/s | 1.336 ± 0.010M/s | + 24.04% | | arr-inc | 1.078 ± 0.002M/s | 1.332 ± 0.015M/s | + 23.56% | | hash-inc | 0.494 ± 0.004M/s | 0.653 ± 0.001M/s | + 32.18% | +---------------+------------------+------------------+--------------+ NOTE: This benchmark includes changes from this patch and the previous patch that implemented the per-cpu insn. [1] https://github.com/anakryiko/linux/commit/8dec900975ef Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Björn Töpel <bjorn@kernel.org> Link: https://lore.kernel.org/r/20240502151854.9810-3-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-13riscv, bpf: add internal-only MOV instruction to resolve per-CPU addrsPuranjay Mohan1-0/+24
Support an instruction for resolving absolute addresses of per-CPU data from their per-CPU offsets. This instruction is internal-only and users are not allowed to use them directly. They will only be used for internal inlining optimizations for now between BPF verifier and BPF JITs. RISC-V uses generic per-cpu implementation where the offsets for CPUs are kept in an array called __per_cpu_offset[cpu_number]. RISCV stores the address of the task_struct in TP register. The first element in task_struct is struct thread_info, and we can get the cpu number by reading from the TP register + offsetof(struct thread_info, cpu). Once we have the cpu number in a register we read the offset for that cpu from address: &__per_cpu_offset + cpu_number << 3. Then we add this offset to the destination register. To measure the improvement from this change, the benchmark in [1] was used on Qemu: Before: glob-arr-inc : 1.127 ± 0.013M/s arr-inc : 1.121 ± 0.004M/s hash-inc : 0.681 ± 0.052M/s After: glob-arr-inc : 1.138 ± 0.011M/s arr-inc : 1.366 ± 0.006M/s hash-inc : 0.676 ± 0.001M/s [1] https://github.com/anakryiko/linux/commit/8dec900975ef Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Acked-by: Björn Töpel <bjorn@kernel.org> Link: https://lore.kernel.org/r/20240502151854.9810-2-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-13ARC: Add eBPF JIT supportShahab Vahedi9-2/+4611
This will add eBPF JIT support to the 32-bit ARCv2 processors. The implementation is qualified by running the BPF tests on a Synopsys HSDK board with "ARC HS38 v2.1c at 500 MHz" as the 4-core CPU. The test_bpf.ko reports 2-10 fold improvements in execution time of its tests. For instance: test_bpf: #33 tcpdump port 22 jited:0 704 1766 2104 PASS test_bpf: #33 tcpdump port 22 jited:1 120 224 260 PASS test_bpf: #141 ALU_DIV_X: 4294967295 / 4294967295 = 1 jited:0 238 PASS test_bpf: #141 ALU_DIV_X: 4294967295 / 4294967295 = 1 jited:1 23 PASS test_bpf: #776 JMP32_JGE_K: all ... magnitudes jited:0 2034681 PASS test_bpf: #776 JMP32_JGE_K: all ... magnitudes jited:1 1020022 PASS Deployment and structure ------------------------ The related codes are added to "arch/arc/net": - bpf_jit.h -- The interface that a back-end translator must provide - bpf_jit_core.c -- Knows how to handle the input eBPF byte stream - bpf_jit_arcv2.c -- The back-end code that knows the translation logic The bpf_int_jit_compile() at the end of bpf_jit_core.c is the entrance to the whole process. Normally, the translation is done in one pass, namely the "normal pass". In case some relocations are not known during this pass, some data (arc_jit_data) is allocated for the next pass to come. This possible next (and last) pass is called the "extra pass". 1. Normal pass # The necessary pass 1a. Dry run # Get the whole JIT length, epilogue offset, etc. 1b. Emit phase # Allocate memory and start emitting instructions 2. Extra pass # Only needed if there are relocations to be fixed 2a. Patch relocations Support status -------------- The JIT compiler supports BPF instructions up to "cpu=v4". However, it does not yet provide support for: - Tail calls - Atomic operations - 64-bit division/remainder - BPF_PROBE_MEM* (exception table) The result of "test_bpf" test suite on an HSDK board is: hsdk-lnx# insmod test_bpf.ko test_suite=test_bpf test_bpf: Summary: 863 PASSED, 186 FAILED, [851/851 JIT'ed] All the failing test cases are due to the ones that were not JIT'ed. Categorically, they can be represented as: .-----------.------------.-------------. | test type | opcodes | # of cases | |-----------+------------+-------------| | atomic | 0xC3, 0xDB | 149 | | div64 | 0x37, 0x3F | 22 | | mod64 | 0x97, 0x9F | 15 | `-----------^------------+-------------| | (total) 186 | `-------------' Setup: build config ------------------- The following configs must be set to have a working JIT test: CONFIG_BPF_JIT=y CONFIG_BPF_JIT_ALWAYS_ON=y CONFIG_TEST_BPF=m The following options are not necessary for the tests module, but are good to have: CONFIG_DEBUG_INFO=y # prerequisite for below CONFIG_DEBUG_INFO_BTF=y # so bpftool can generate vmlinux.h CONFIG_FTRACE=y # CONFIG_BPF_SYSCALL=y # all these options lead to CONFIG_KPROBE_EVENTS=y # having CONFIG_BPF_EVENTS=y CONFIG_PERF_EVENTS=y # Some BPF programs provide data through /sys/kernel/debug: CONFIG_DEBUG_FS=y arc# mount -t debugfs debugfs /sys/kernel/debug Setup: elfutils --------------- The libdw.{so,a} library that is used by pahole for processing the final binary must come from elfutils 0.189 or newer. The support for ARCv2 [1] has been added since that version. [1] https://sourceware.org/git/?p=elfutils.git;a=commit;h=de3d46b3e7 Setup: pahole ------------- The line below in linux/scripts/Makefile.btf must be commented out: pahole-flags-$(call test-ge, $(pahole-ver), 121) += --btf_gen_floats Or else, the build will fail: $ make V=1 ... BTF .btf.vmlinux.bin.o pahole -J --btf_gen_floats \ -j --lang_exclude=rust \ --skip_encoding_btf_inconsistent_proto \ --btf_gen_optimized .tmp_vmlinux.btf Complex, interval and imaginary float types are not supported Encountered error while encoding BTF. ... BTFIDS vmlinux ./tools/bpf/resolve_btfids/resolve_btfids vmlinux libbpf: failed to find '.BTF' ELF section in vmlinux FAILED: load BTF from vmlinux: No data available This is due to the fact that the ARC toolchains generate "complex float" DIE entries in libgcc and at the moment, pahole can't handle such entries. Running the tests ----------------- host$ scp /bld/linux/lib/test_bpf.ko arc: arc # sysctl net.core.bpf_jit_enable=1 arc # insmod test_bpf.ko test_suite=test_bpf ... test_bpf: #1048 Staggered jumps: JMP32_JSLE_X jited:1 697811 PASS test_bpf: Summary: 863 PASSED, 186 FAILED, [851/851 JIT'ed] Acknowledgments --------------- - Claudiu Zissulescu for his unwavering support - Yuriy Kolerov for testing and troubleshooting - Vladimir Isaev for the pahole workaround - Sergey Matyukevich for paving the road by adding the interpreter support Signed-off-by: Shahab Vahedi <shahab@synopsys.com> Link: https://lore.kernel.org/r/20240430145604.38592-1-list+bpf@vahedi.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-13hwmon: (nzxt-kraken3) Bail out for unsupported device variantsGuenter Roeck1-1/+2
Dan Carpenter reports: Commit cbeb479ff4cd ("hwmon: (nzxt-kraken3) Decouple device names from kinds") from Apr 28, 2024 (linux-next), leads to the following Smatch static checker warning: drivers/hwmon/nzxt-kraken3.c:957 kraken3_probe() error: uninitialized symbol 'device_name'. Indeed, 'device_name' will be uninitizalized if an unknown product is encountered. In practice this should not matter because the driver should not instantiate on unknown products, but lets play safe and bail out if that happens. Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/linux-hwmon/b1738c50-db42-40f0-a899-9c027c131ffb@moroto.mountain/ Cc: Jonas Malaco <jonas@protocubo.io> Cc: Aleksa Savic <savicaleksa83@gmail.com> Fixes: cbeb479ff4cd ("hwmon: (nzxt-kraken3) Decouple device names from kinds") Acked-by: Jonas Malaco <jonas@protocubo.io> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2024-05-13Linux 6.9Linus Torvalds1-1/+1
2024-05-12Merge tag 'kselftest-fix-vfork-2024-05-12' of ↵Linus Torvalds4-67/+147
git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux Pull Kselftest fixes from Mickaël Salaün: "Fix Kselftest's vfork() side effects. As reported by Kernel Test Robot and Sean Christopherson, some tests fail since v6.9-rc1 . This is due to the use of vfork() which introduced some side effects. Similarly, while making it more generic, a previous commit made some Landlock file system tests flaky, and subject to the host's file system mount configuration. This fixes all these side effects by replacing vfork() with clone3() and CLONE_VFORK, which is cleaner (no arbitrary shared memory) and makes the Kselftest framework more robust" Link: https://lore.kernel.org/oe-lkp/202403291015.1fcfa957-oliver.sang@intel.com Link: https://lore.kernel.org/r/ZjPelW6-AbtYvslu@google.com Link: https://lore.kernel.org/r/20240511171445.904356-1-mic@digikod.net * tag 'kselftest-fix-vfork-2024-05-12' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux: selftests/harness: Handle TEST_F()'s explicit exit codes selftests/harness: Fix vfork() side effects selftests/harness: Share _metadata between forked processes selftests/pidfd: Fix wrong expectation selftests/harness: Constify fixture variants selftests/landlock: Do not allocate memory in fixture data selftests/harness: Fix interleaved scheduling leading to race conditions selftests/harness: Fix fixture teardown selftests/landlock: Fix FS tests when run on a private mount point selftests/pidfd: Fix config for pidfd_setns_test
2024-05-12Merge tag 'for-linus-6.9' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-1/+1
Pull kvm fix from Paolo Bonzini: - Fix NULL pointer read on s390 in ioctl(KVM_CHECK_EXTENSION) for /dev/kvm * tag 'for-linus-6.9' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: s390: Check kvm pointer when testing KVM_CAP_S390_HPAGE_1M
2024-05-12Merge tag 'edac_urgent_for_v6.9' of ↵Linus Torvalds1-13/+37
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC fix from Borislav Petkov: - Fix a race condition when clearing error count bits and toggling the error interrupt throug the same register, in synopsys_edac * tag 'edac_urgent_for_v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC/synopsys: Fix ECC status and IRQ control race condition
2024-05-12hwmon: (emc1403) Add support for EMC1428 and EMC1438.Lars Petter Mostad2-14/+124
EMC1428 and EMC1438 are similar to EMC14xx, but have eight temperature channels, as well as signed data and limit registers. Chips currently supported by this driver have unsigned registers only. Signed-off-by: Lars Petter Mostad <larspm@gmail.com> Link: https://lore.kernel.org/r/20240510142824.824332-1-lars.petter.mostad@appear.net Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2024-05-12Merge tag 'x86_urgent_for_v6.9' of ↵Linus Torvalds3-9/+9
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - Add a new PCI ID which belongs to a new AMD CPU family 0x1a - Ensure that that last level cache ID is set in all cases, in the AMD CPU topology parsing code, in order to prevent invalid scheduling domain CPU masks * tag 'x86_urgent_for_v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/topology/amd: Ensure that LLC ID is initialized x86/amd_nb: Add new PCI IDs for AMD family 0x1a
2024-05-12ALSA: scarlett2: Increase mixer range to +12dBGeoffrey D. Bennett1-4/+5
The values loaded into the mixer are 16-bit values, with 8192 representing 0dB, going up to a current maximum of 16345 (+6dB). All supported interfaces have no problem going up to 32612 (+12dB), so update SCARLETT2_MIXER_MAX_DB and scarlett2_mixer_values[] to allow for this. Tested with: - Scarlett 2nd Gen 6i6, 18i8, 18i20 - Scarlett 3rd Gen 4i4, 8i6, 18i8, 18i20 - Scarlett 4th Gen Solo, 2i2, 4i4 - Clarett+ 2Pre, 4Pre, 8Pre - Vocaster One and Two Signed-off-by: Geoffrey D. Bennett <g@b4.vu> Link: https://lore.kernel.org/r/Zj+gYT4F2XeKTD93@m.b4.vu Signed-off-by: Takashi Iwai <tiwai@suse.de>
2024-05-12ALSA: scarlett2: Add S/PDIF source selection controlsGeoffrey D. Bennett1-0/+179
Add S/PDIF Source/Digital I/O Mode selection controls for the Scarlett 3rd Gen 18i8/18i20 and Clarett 4Pre/8Pre interfaces. These models have both coax S/PDIF and optical inputs, and the optical inputs are switchable between being used as S/PDIF and ADAT inputs. The Scarlett 3rd Gen 18i20 also has a "Dual ADAT" mode for 8-channel audio at 88.2/96kHz. Signed-off-by: Geoffrey D. Bennett <g@b4.vu> Link: https://lore.kernel.org/r/Zj8zCTjzPsTDENN+@m.b4.vu Signed-off-by: Takashi Iwai <tiwai@suse.de>
2024-05-11selftests/harness: Handle TEST_F()'s explicit exit codesMickaël Salaün1-1/+5
If TEST_F() explicitly calls exit(code) with code different than 0, then _metadata->exit_code is set to this code (e.g. KVM_ONE_VCPU_TEST()). We need to keep in mind that _metadata->exit_code can be KSFT_SKIP while the process exit code is 0. Cc: Jakub Kicinski <kuba@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Mark Brown <broonie@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Will Drewry <wad@chromium.org> Reported-by: Sean Christopherson <seanjc@google.com> Tested-by: Sean Christopherson <seanjc@google.com> Closes: https://lore.kernel.org/r/ZjPelW6-AbtYvslu@google.com Fixes: 0710a1a73fb4 ("selftests/harness: Merge TEST_F_FORK() into TEST_F()") Link: https://lore.kernel.org/r/20240511171445.904356-11-mic@digikod.net Signed-off-by: Mickaël Salaün <mic@digikod.net>
2024-05-11selftests/harness: Fix vfork() side effectsMickaël Salaün2-25/+57
Setting the time namespace with CLONE_NEWTIME returns -EUSERS if the calling thread shares memory with another thread (because of the shared vDSO), which is the case when it is created with vfork(). Fix pidfd_setns_test by replacing test harness's vfork() call with a clone3() call with CLONE_VFORK, and an explicit sharing of the _metadata and self objects. Replace _metadata->teardown_parent with a new FIXTURE_TEARDOWN_PARENT() helper that can replace FIXTURE_TEARDOWN(). This is a cleaner approach and it enables to selectively share the fixture data between the child process running tests and the parent process running the fixture teardown. This also avoids updating several tests to not rely on the self object's copy-on-write property (e.g. storing the returned value of a fork() call). Cc: Christian Brauner <brauner@kernel.org> Cc: David S. Miller <davem@davemloft.net> Cc: Günther Noack <gnoack@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Will Drewry <wad@chromium.org> Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202403291015.1fcfa957-oliver.sang@intel.com Fixes: 0710a1a73fb4 ("selftests/harness: Merge TEST_F_FORK() into TEST_F()") Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20240511171445.904356-10-mic@digikod.net Signed-off-by: Mickaël Salaün <mic@digikod.net>
2024-05-11selftests/harness: Share _metadata between forked processesMickaël Salaün1-11/+15
Unconditionally share _metadata between all forked processes, which enables to actually catch errors which were previously ignored. This is required for a following commit replacing vfork() with clone3() and CLONE_VFORK (i.e. not sharing the full memory) . It should also be useful to share _metadata to extend expectations to test process's forks. For instance, this change identified a wrong expectation in pidfd_setns_test. Because this _metadata is used by the new XFAIL_ADD(), use a global pointer initialized in TEST_F(). This is OK because only XFAIL_ADD() use it, and XFAIL_ADD() already depends on TEST_F(). Cc: Jakub Kicinski <kuba@kernel.org> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Will Drewry <wad@chromium.org> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20240511171445.904356-9-mic@digikod.net Signed-off-by: Mickaël Salaün <mic@digikod.net>
2024-05-11selftests/pidfd: Fix wrong expectationMickaël Salaün1-1/+1
Replace a wrong EXPECT_GT(self->child_pid_exited, 0) with EXPECT_GE(), which will be actually tested on the parent and child sides with a following commit. Cc: Shuah Khan <skhan@linuxfoundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/r/20240511171445.904356-8-mic@digikod.net Signed-off-by: Mickaël Salaün <mic@digikod.net>
2024-05-11selftests/harness: Constify fixture variantsMickaël Salaün1-2/+2
FIXTURE_VARIANT_ADD() types are passed as const pointers to FIXTURE_TEARDOWN(). Make that explicit by constifying the variants declarations. Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Will Drewry <wad@chromium.org> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20240511171445.904356-7-mic@digikod.net Signed-off-by: Mickaël Salaün <mic@digikod.net>
2024-05-11selftests/landlock: Do not allocate memory in fixture dataMickaël Salaün1-22/+35
Do not allocate self->dir_path in the test process because this would not be visible in the FIXTURE_TEARDOWN() process when relying on fork()/clone3() instead of vfork(). This change is required for a following commit removing vfork() call to not break the layout3_fs.* test cases. Cc: Günther Noack <gnoack@google.com> Cc: Shuah Khan <skhan@linuxfoundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20240511171445.904356-6-mic@digikod.net Signed-off-by: Mickaël Salaün <mic@digikod.net>
2024-05-11selftests/harness: Fix interleaved scheduling leading to race conditionsMickaël Salaün1-1/+14
Fix a race condition when running several FIXTURE_TEARDOWN() managing the same resource. This fixes a race condition in the Landlock file system tests when creating or unmounting the same directory. Using clone3() with CLONE_VFORK guarantees that the child and grandchild test processes are sequentially scheduled. This is implemented with a new clone3_vfork() helper replacing the fork() call. This avoids triggering this error in __wait_for_test(): Test ended in some other way [127] Cc: Christian Brauner <brauner@kernel.org> Cc: David S. Miller <davem@davemloft.net> Cc: Günther Noack <gnoack@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Will Drewry <wad@chromium.org> Fixes: 41cca0542d7c ("selftests/harness: Fix TEST_F()'s vfork handling") Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20240511171445.904356-5-mic@digikod.net Signed-off-by: Mickaël Salaün <mic@digikod.net>
2024-05-11selftests/harness: Fix fixture teardownMickaël Salaün1-5/+9
Make sure fixture teardowns are run when test cases failed, including when _metadata->teardown_parent is set to true. Make sure only one fixture teardown is run per test case, handling the case where the test child forks. Cc: Jakub Kicinski <kuba@kernel.org> Cc: Shengyu Li <shengyu.li.evgeny@gmail.com> Cc: Shuah Khan <skhan@linuxfoundation.org> Fixes: 72d7cb5c190b ("selftests/harness: Prevent infinite loop due to Assert in FIXTURE_TEARDOWN") Fixes: 0710a1a73fb4 ("selftests/harness: Merge TEST_F_FORK() into TEST_F()") Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20240511171445.904356-4-mic@digikod.net Rule: add Link: https://lore.kernel.org/stable/20240506165518.474504-4-mic%40digikod.net Signed-off-by: Mickaël Salaün <mic@digikod.net>
2024-05-11selftests/landlock: Fix FS tests when run on a private mount pointMickaël Salaün1-1/+9
According to the test environment, the mount point of the test's working directory may be shared or not, which changes the visibility of the nested "tmp" mount point for the test's parent process calling umount("tmp"). This was spotted while running tests in containers [1], where mount points are private. Cc: Günther Noack <gnoack@google.com> Cc: Shuah Khan <skhan@linuxfoundation.org> Link: https://github.com/landlock-lsm/landlock-test-tools/pull/4 [1] Fixes: 41cca0542d7c ("selftests/harness: Fix TEST_F()'s vfork handling") Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20240511171445.904356-3-mic@digikod.net Signed-off-by: Mickaël Salaün <mic@digikod.net>
2024-05-11selftests/pidfd: Fix config for pidfd_setns_testMickaël Salaün1-0/+2
Required by switch_timens() to open /proc/self/ns/time_for_children. CONFIG_GENERIC_VDSO_TIME_NS is not available on UML, so pidfd_setns_test cannot be run successfully on this architecture. Cc: Shuah Khan <skhan@linuxfoundation.org> Fixes: 2b40c5db73e2 ("selftests/pidfd: add pidfd setns tests") Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/r/20240511171445.904356-2-mic@digikod.net Signed-off-by: Mickaël Salaün <mic@digikod.net>
2024-05-11csky: Emulate one-byte cmpxchgPaul E. McKenney2-0/+11
Use the new cmpxchg_emu_u8() to emulate one-byte cmpxchg() on csky. [ paulmck: Apply kernel test robot feedback. ] [ paulmck: Drop two-byte support per Arnd Bergmann feedback. ] Co-developed-by: Yujie Liu <yujie.liu@intel.com> Signed-off-by: Yujie Liu <yujie.liu@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: Yujie Liu <yujie.liu@intel.com> Reviewed-by: Guo Ren <guoren@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: <linux-csky@vger.kernel.org>
2024-05-11Merge branch 'acpica'Rafael J. Wysocki9-47/+314
Merge ACPICA material for v6.10. This is mostly new material included in the 20240322 upstream ACPICA release. - Disable -Wstringop-truncation for some ACPICA code in the kernel to avoid a compiler warning that is not very useful (Arnd Bergmann). - Add EINJ CXL error types to actbl1.h (Ben Cheatham). - Add support for RAS2 table to ACPICA (Shiju Jose). - Fix various spelling mistakes in text files and code comments in ACPICA (Colin Ian King). - Fix spelling and typos in ACPICA (Saket Dumbre). - Modify ACPI_OBJECT_COMMON_HEADER (lijun). - Add RISC-V RINTC affinity structure support to ACPICA (Haibo Xu). - Fix CXL 3.0 structure (RDPAS) in the CEDT table (Hojin Nam). - Add missin increment of registered GPE count to ACPICA (Daniil Tatianin). - Mark new ACPICA release 20240322 (Saket Dumbre). - Add support for the AEST V2 table to ACPICA (Ruidong Tian). * acpica: ACPICA: AEST: Add support for the AEST V2 table ACPICA: Update acpixf.h for new ACPICA release 20240322 ACPICA: events/evgpeinit: don't forget to increment registered GPE count ACPICA: Fix CXL 3.0 structure (RDPAS) in the CEDT table ACPICA: SRAT: Add dump and compiler support for RINTC affinity structure ACPICA: SRAT: Add RISC-V RINTC affinity structure ACPICA: Modify ACPI_OBJECT_COMMON_HEADER ACPICA: Fix spelling and typos ACPICA: Clean up the fix for Issue #900 ACPICA: Fix various spelling mistakes in text files and code comments ACPICA: Attempt 1 to fix issue #900 ACPICA: ACPI 6.5: RAS2: Add support for RAS2 table ACPICA: actbl1.h: Add EINJ CXL error types ACPI: disable -Wstringop-truncation
2024-05-11Merge branch 'mlx5-misc-fixes'Jakub Kicinski9-51/+79
Tariq Toukan says: ==================== mlx5 misc fixes This patchset provides bug fixes to mlx5 driver. Patch 1 by Shay fixes the error flow in mlx5e_suspend(). Patch 2 by Shay aligns the peer devlink set logic with the register devlink flow. Patch 3 by Maher solves a deadlock in lag enable/disable. Patches 4 and 5 by Akiva address issues in command interface corner cases. Series generated against: commit 393ceeb9211e ("Merge branch 'there-are-some-bugfix-for-the-hns3-ethernet-driver'") ==================== Link: https://lore.kernel.org/r/20240509112951.590184-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-11net/mlx5: Discard command completions in internal errorAkiva Goldberger1-0/+3
Fix use after free when FW completion arrives while device is in internal error state. Avoid calling completion handler in this case, since the device will flush the command interface and trigger all completions manually. Kernel log: ------------[ cut here ]------------ refcount_t: underflow; use-after-free. ... RIP: 0010:refcount_warn_saturate+0xd8/0xe0 ... Call Trace: <IRQ> ? __warn+0x79/0x120 ? refcount_warn_saturate+0xd8/0xe0 ? report_bug+0x17c/0x190 ? handle_bug+0x3c/0x60 ? exc_invalid_op+0x14/0x70 ? asm_exc_invalid_op+0x16/0x20 ? refcount_warn_saturate+0xd8/0xe0 cmd_ent_put+0x13b/0x160 [mlx5_core] mlx5_cmd_comp_handler+0x5f9/0x670 [mlx5_core] cmd_comp_notifier+0x1f/0x30 [mlx5_core] notifier_call_chain+0x35/0xb0 atomic_notifier_call_chain+0x16/0x20 mlx5_eq_async_int+0xf6/0x290 [mlx5_core] notifier_call_chain+0x35/0xb0 atomic_notifier_call_chain+0x16/0x20 irq_int_handler+0x19/0x30 [mlx5_core] __handle_irq_event_percpu+0x4b/0x160 handle_irq_event+0x2e/0x80 handle_edge_irq+0x98/0x230 __common_interrupt+0x3b/0xa0 common_interrupt+0x7b/0xa0 </IRQ> <TASK> asm_common_interrupt+0x22/0x40 Fixes: 51d138c2610a ("net/mlx5: Fix health error state handling") Signed-off-by: Akiva Goldberger <agoldberger@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240509112951.590184-6-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-11net/mlx5: Add a timeout to acquire the command queue semaphoreAkiva Goldberger2-9/+33
Prevent forced completion handling on an entry that has not yet been assigned an index, causing an out of bounds access on idx = -22. Instead of waiting indefinitely for the sem, blocking flow now waits for index to be allocated or a sem acquisition timeout before beginning the timer for FW completion. Kernel log example: mlx5_core 0000:06:00.0: wait_func_handle_exec_timeout:1128:(pid 185911): cmd[-22]: CREATE_UCTX(0xa04) No done completion Fixes: 8e715cd613a1 ("net/mlx5: Set command entry semaphore up once got index free") Signed-off-by: Akiva Goldberger <agoldberger@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240509112951.590184-5-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-11net/mlx5: Reload only IB representors upon lag disable/enableMaher Sanalla4-17/+25
On lag disable, the bond IB device along with all of its representors are destroyed, and then the slaves' representors get reloaded. In case the slave IB representor load fails, the eswitch error flow unloads all representors, including ethernet representors, where the netdevs get detached and removed from lag bond. Such flow is inaccurate as the lag driver is not responsible for loading/unloading ethernet representors. Furthermore, the flow described above begins by holding lag lock to prevent bond changes during disable flow. However, when reaching the ethernet representors detachment from lag, the lag lock is required again, triggering the following deadlock: Call trace: __switch_to+0xf4/0x148 __schedule+0x2c8/0x7d0 schedule+0x50/0xe0 schedule_preempt_disabled+0x18/0x28 __mutex_lock.isra.13+0x2b8/0x570 __mutex_lock_slowpath+0x1c/0x28 mutex_lock+0x4c/0x68 mlx5_lag_remove_netdev+0x3c/0x1a0 [mlx5_core] mlx5e_uplink_rep_disable+0x70/0xa0 [mlx5_core] mlx5e_detach_netdev+0x6c/0xb0 [mlx5_core] mlx5e_netdev_change_profile+0x44/0x138 [mlx5_core] mlx5e_netdev_attach_nic_profile+0x28/0x38 [mlx5_core] mlx5e_vport_rep_unload+0x184/0x1b8 [mlx5_core] mlx5_esw_offloads_rep_load+0xd8/0xe0 [mlx5_core] mlx5_eswitch_reload_reps+0x74/0xd0 [mlx5_core] mlx5_disable_lag+0x130/0x138 [mlx5_core] mlx5_lag_disable_change+0x6c/0x70 [mlx5_core] // hold ldev->lock mlx5_devlink_eswitch_mode_set+0xc0/0x410 [mlx5_core] devlink_nl_cmd_eswitch_set_doit+0xdc/0x180 genl_family_rcv_msg_doit.isra.17+0xe8/0x138 genl_rcv_msg+0xe4/0x220 netlink_rcv_skb+0x44/0x108 genl_rcv+0x40/0x58 netlink_unicast+0x198/0x268 netlink_sendmsg+0x1d4/0x418 sock_sendmsg+0x54/0x60 __sys_sendto+0xf4/0x120 __arm64_sys_sendto+0x30/0x40 el0_svc_common+0x8c/0x120 do_el0_svc+0x30/0xa0 el0_svc+0x20/0x30 el0_sync_handler+0x90/0xb8 el0_sync+0x160/0x180 Thus, upon lag enable/disable, load and unload only the IB representors of the slaves preventing the deadlock mentioned above. While at it, refactor the mlx5_esw_offloads_rep_load() function to have a static helper method for its internal logic, in symmetry with the representor unload design. Fixes: 598fe77df855 ("net/mlx5: Lag, Create shared FDB when in switchdev mode") Co-developed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240509112951.590184-4-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>