Age | Commit message (Collapse) | Author | Files | Lines |
|
This expands coverage for getsockname and getpeername hooks to include
getsockname4, getsockname6, getpeername4, and getpeername6.
Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20240510190246.3247730-17-jrife@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This patch expands test coverage for EPERM tests to include connect and
bind calls and rounds out the coverage for sendmsg by adding tests for
sendmsg_unix.
Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20240510190246.3247730-16-jrife@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This patch expands verifier coverage for program return values to cover
bind, connect, sendmsg, getsockname, and getpeername hooks. It also
rounds out the recvmsg coverage by adding test cases for recvmsg_unix
hooks.
Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20240510190246.3247730-15-jrife@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Fully remove test_sock_addr.c and test_sock_addr.sh, as test coverage
has been fully moved to prog_tests/sock_addr.c.
Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20240510190246.3247730-14-jrife@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Remove these test cases completely, as the same behavior is already
covered by other sendmsg* test cases in prog_tests/sock_addr.c. This
just rewrites the destination address similar to sendmsg_v4_prog and
sendmsg_v6_prog.
Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20240510190246.3247730-13-jrife@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Migrate test case from bpf/test_sock_addr.c ensuring that program
attachment fails when using an inappropriate attach type.
Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20240510190246.3247730-12-jrife@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Migrates tests from progs/test_sock_addr.c ensuring that programs fail
to load when the expected attach type does not match.
Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20240510190246.3247730-11-jrife@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Migrate test case from bpf/test_sock_addr.c ensuring that sendmsg
respects when sendmsg6 hooks rewrite the destination IP with the IPv6
wildcard IP, [::].
Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20240510190246.3247730-10-jrife@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Migrate test case from bpf/test_sock_addr.c ensuring that sendmsg
returns -ENOTSUPP when sending to an IPv4-mapped IPv6 address to
prog_tests/sock_addr.c.
Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20240510190246.3247730-9-jrife@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This set of tests checks that sendmsg calls are rejected (return -EPERM)
when the sendmsg* hook returns 0. Replace those in bpf/test_sock_addr.c
with corresponding tests in prog_tests/sock_addr.c.
Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20240510190246.3247730-8-jrife@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Move wildcard IP sendmsg test case out of bpf/test_sock_addr.c into
prog_tests/sock_addr.c.
Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20240510190246.3247730-7-jrife@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
In preparation to move test cases from bpf/test_sock_addr.c that expect
system calls to return ENOTSUPP or EPERM, this patch propagates errno
from relevant system calls up to test_sock_addr() where the result can
be checked.
Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20240510190246.3247730-6-jrife@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
In preparation to move test cases from bpf/test_sock_addr.c that expect
ATTACH_REJECT, this patch adds BPF_SKEL_FUNCS_RAW to generate load and
destroy functions that use bpf_prog_attach() to control the attach_type.
The normal load functions use bpf_program__attach_cgroup which does not
have the same degree of control over the attach type, as
bpf_program_attach_fd() calls bpf_link_create() with the attach type
extracted from prog using bpf_program__expected_attach_type(). It is
currently not possible to modify the attach type before
bpf_program__attach_cgroup() is called, since
bpf_program__set_expected_attach_type() has no effect after the program
is loaded.
Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20240510190246.3247730-5-jrife@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
In preparation to move test cases from bpf/test_sock_addr.c that expect
LOAD_REJECT, this patch adds expected_attach_type and extends load_fn to
accept an expected attach type and a flag indicating whether or not
rejection is expected.
Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20240510190246.3247730-4-jrife@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
In preparation to migrate tests from bpf/test_sock_addr.c to
sock_addr.c, update BPF_SKEL_FUNCS so that it generates functions
based on prog_name instead of skel_name. This allows us to differentiate
between programs in the same skeleton.
Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20240510190246.3247730-3-jrife@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This set of tests check that the BPF verifier rejects programs with
invalid return codes (recvmsg4 and recvmsg6 hooks can only return 1).
This patch replaces the tests in test_sock_addr.c with
verifier_sock_addr.c, a new verifier prog_tests for sockaddr hooks, in a
step towards fully retiring test_sock_addr.c.
Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20240510190246.3247730-2-jrife@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The BPF atomic operations with the BPF_FETCH modifier along with
BPF_XCHG and BPF_CMPXCHG are fully ordered but the RISC-V JIT implements
all atomic operations except BPF_CMPXCHG with relaxed ordering.
Section 8.1 of the "The RISC-V Instruction Set Manual Volume I:
Unprivileged ISA" [1], titled, "Specifying Ordering of Atomic
Instructions" says:
| To provide more efficient support for release consistency [5], each
| atomic instruction has two bits, aq and rl, used to specify additional
| memory ordering constraints as viewed by other RISC-V harts.
and
| If only the aq bit is set, the atomic memory operation is treated as
| an acquire access.
| If only the rl bit is set, the atomic memory operation is treated as a
| release access.
|
| If both the aq and rl bits are set, the atomic memory operation is
| sequentially consistent.
Fix this by setting both aq and rl bits as 1 for operations with
BPF_FETCH and BPF_XCHG.
[1] https://riscv.org/wp-content/uploads/2017/05/riscv-spec-v2.2.pdf
Fixes: dd642ccb45ec ("riscv, bpf: Implement more atomic operations for RV64")
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Reviewed-by: Pu Lehui <pulehui@huawei.com>
Link: https://lore.kernel.org/r/20240505201633.123115-1-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
We can use either "instruction" or "insn" in the comment.
Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Pu Lehui <pulehui@huawei.com>
Link: https://lore.kernel.org/r/20240507111618.437121-1-xiao.w.wang@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
BPF_ATOMIC_OP() macro documentation states that "BPF_ADD | BPF_FETCH"
should be the same as atomic_fetch_add(), which is currently not the
case on s390x: the serialization instruction "bcr 14,0" is missing.
This applies to "and", "or" and "xor" variants too.
s390x is allowed to reorder stores with subsequent fetches from
different addresses, so code relying on BPF_FETCH acting as a barrier,
for example:
stw [%r0], 1
afadd [%r1], %r2
ldxw %r3, [%r4]
may be broken. Fix it by emitting "bcr 14,0".
Note that a separate serialization instruction is not needed for
BPF_XCHG and BPF_CMPXCHG, because COMPARE AND SWAP performs
serialization itself.
Fixes: ba3b86b9cef0 ("s390/bpf: Implement new atomic ops")
Reported-by: Puranjay Mohan <puranjay12@gmail.com>
Closes: https://lore.kernel.org/bpf/mb61p34qvq3wf.fsf@kernel.org/
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20240507000557.12048-1-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Puranjay Mohan says:
====================
bpf: Inline helpers in arm64 and riscv JITs
Changes in v5 -> v6:
arm64 v5: https://lore.kernel.org/all/20240430234739.79185-1-puranjay@kernel.org/
riscv v2: https://lore.kernel.org/all/20240430175834.33152-1-puranjay@kernel.org/
- Combine riscv and arm64 changes in single series
- Some coding style fixes
Changes in v4 -> v5:
v4: https://lore.kernel.org/all/20240429131647.50165-1-puranjay@kernel.org/
- Implement the inlining of the bpf_get_smp_processor_id() in the JIT.
NOTE: This needs to be based on:
https://lore.kernel.org/all/20240430175834.33152-1-puranjay@kernel.org/
to be built.
Manual run of bpf-ci with this series rebased on above:
https://github.com/kernel-patches/bpf/pull/6929
Changes in v3 -> v4:
v3: https://lore.kernel.org/all/20240426121349.97651-1-puranjay@kernel.org/
- Fix coding style issue related to C89 standards.
Changes in v2 -> v3:
v2: https://lore.kernel.org/all/20240424173550.16359-1-puranjay@kernel.org/
- Fixed the xlated dump of percpu mov to "r0 = &(void __percpu *)(r0)"
- Made ARM64 and x86-64 use the same code for inlining. The only difference
that remains is the per-cpu address of the cpu_number.
Changes in v1 -> v2:
v1: https://lore.kernel.org/all/20240405091707.66675-1-puranjay12@gmail.com/
- Add a patch to inline bpf_get_smp_processor_id()
- Fix an issue in MRS instruction encoding as pointed out by Will
- Remove CONFIG_SMP check because arm64 kernel always compiles with CONFIG_SMP
This series adds the support of internal only per-CPU instructions and inlines
the bpf_get_smp_processor_id() helper call for ARM64 and RISC-V BPF JITs.
Here is an example of calls to bpf_get_smp_processor_id() and
percpu_array_map_lookup_elem() before and after this series on ARM64.
BPF
=====
BEFORE AFTER
-------- -------
int cpu = bpf_get_smp_processor_id(); int cpu = bpf_get_smp_processor_id();
(85) call bpf_get_smp_processor_id#229032 (85) call bpf_get_smp_processor_id#8
p = bpf_map_lookup_elem(map, &zero); p = bpf_map_lookup_elem(map, &zero);
(18) r1 = map[id:78] (18) r1 = map[id:153]
(18) r2 = map[id:82][0]+65536 (18) r2 = map[id:157][0]+65536
(85) call percpu_array_map_lookup_elem#313512 (07) r1 += 496
(61) r0 = *(u32 *)(r2 +0)
(35) if r0 >= 0x1 goto pc+5
(67) r0 <<= 3
(0f) r0 += r1
(79) r0 = *(u64 *)(r0 +0)
(bf) r0 = &(void __percpu *)(r0)
(05) goto pc+1
(b7) r0 = 0
ARM64 JIT
===========
BEFORE AFTER
-------- -------
int cpu = bpf_get_smp_processor_id(); int cpu = bpf_get_smp_processor_id();
mov x10, #0xfffffffffffff4d0 mrs x10, sp_el0
movk x10, #0x802b, lsl #16 ldr w7, [x10, #24]
movk x10, #0x8000, lsl #32
blr x10
add x7, x0, #0x0
p = bpf_map_lookup_elem(map, &zero); p = bpf_map_lookup_elem(map, &zero);
mov x0, #0xffff0003ffffffff mov x0, #0xffff0003ffffffff
movk x0, #0xce5c, lsl #16 movk x0, #0xe0f3, lsl #16
movk x0, #0xca00 movk x0, #0x7c00
mov x1, #0xffff8000ffffffff mov x1, #0xffff8000ffffffff
movk x1, #0x8bdb, lsl #16 movk x1, #0xb0c7, lsl #16
movk x1, #0x6000 movk x1, #0xe000
mov x10, #0xffffffffffff3ed0 add x0, x0, #0x1f0
movk x10, #0x802d, lsl #16 ldr w7, [x1]
movk x10, #0x8000, lsl #32 cmp x7, #0x1
blr x10 b.cs 0x0000000000000090
add x7, x0, #0x0 lsl x7, x7, #3
add x7, x7, x0
ldr x7, [x7]
mrs x10, tpidr_el1
add x7, x7, x10
b 0x0000000000000094
mov x7, #0x0
Performance improvement found using benchmark[1]
./benchs/run_bench_trigger.sh glob-arr-inc arr-inc hash-inc
+---------------+-------------------+-------------------+--------------+
| Name | Before | After | % change |
|---------------+-------------------+-------------------+--------------|
| glob-arr-inc | 23.380 ± 1.675M/s | 25.893 ± 0.026M/s | + 10.74% |
| arr-inc | 23.928 ± 0.034M/s | 25.213 ± 0.063M/s | + 5.37% |
| hash-inc | 12.352 ± 0.005M/s | 12.609 ± 0.013M/s | + 2.08% |
+---------------+-------------------+-------------------+--------------+
[1] https://github.com/anakryiko/linux/commit/8dec900975ef
RISCV64 JIT output for `call bpf_get_smp_processor_id`
=======================================================
Before After
-------- -------
auipc t1,0x848c ld a5,32(tp)
jalr 604(t1)
mv a5,a0
Benchmark using [1] on Qemu.
./benchs/run_bench_trigger.sh glob-arr-inc arr-inc hash-inc
+---------------+------------------+------------------+--------------+
| Name | Before | After | % change |
|---------------+------------------+------------------+--------------|
| glob-arr-inc | 1.077 ± 0.006M/s | 1.336 ± 0.010M/s | + 24.04% |
| arr-inc | 1.078 ± 0.002M/s | 1.332 ± 0.015M/s | + 23.56% |
| hash-inc | 0.494 ± 0.004M/s | 0.653 ± 0.001M/s | + 32.18% |
+---------------+------------------+------------------+--------------+
====================
Link: https://lore.kernel.org/r/20240502151854.9810-1-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Inline calls to bpf_get_smp_processor_id() helper in the JIT by emitting
a read from struct thread_info. The SP_EL0 system register holds the
pointer to the task_struct and thread_info is the first member of this
struct. We can read the cpu number from the thread_info.
Here is how the ARM64 JITed assembly changes after this commit:
ARM64 JIT
===========
BEFORE AFTER
-------- -------
int cpu = bpf_get_smp_processor_id(); int cpu = bpf_get_smp_processor_id();
mov x10, #0xfffffffffffff4d0 mrs x10, sp_el0
movk x10, #0x802b, lsl #16 ldr w7, [x10, #24]
movk x10, #0x8000, lsl #32
blr x10
add x7, x0, #0x0
Performance improvement using benchmark[1]
./benchs/run_bench_trigger.sh glob-arr-inc arr-inc hash-inc
+---------------+-------------------+-------------------+--------------+
| Name | Before | After | % change |
|---------------+-------------------+-------------------+--------------|
| glob-arr-inc | 23.380 ± 1.675M/s | 25.893 ± 0.026M/s | + 10.74% |
| arr-inc | 23.928 ± 0.034M/s | 25.213 ± 0.063M/s | + 5.37% |
| hash-inc | 12.352 ± 0.005M/s | 12.609 ± 0.013M/s | + 2.08% |
+---------------+-------------------+-------------------+--------------+
[1] https://github.com/anakryiko/linux/commit/8dec900975ef
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240502151854.9810-5-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Support an instruction for resolving absolute addresses of per-CPU
data from their per-CPU offsets. This instruction is internal-only and
users are not allowed to use them directly. They will only be used for
internal inlining optimizations for now between BPF verifier and BPF
JITs.
Since commit 7158627686f0 ("arm64: percpu: implement optimised pcpu
access using tpidr_el1"), the per-cpu offset for the CPU is stored in
the tpidr_el1/2 register of that CPU.
To support this BPF instruction in the ARM64 JIT, the following ARM64
instructions are emitted:
mov dst, src // Move src to dst, if src != dst
mrs tmp, tpidr_el1/2 // Move per-cpu offset of the current cpu in tmp.
add dst, dst, tmp // Add the per cpu offset to the dst.
To measure the performance improvement provided by this change, the
benchmark in [1] was used:
Before:
glob-arr-inc : 23.597 ± 0.012M/s
arr-inc : 23.173 ± 0.019M/s
hash-inc : 12.186 ± 0.028M/s
After:
glob-arr-inc : 23.819 ± 0.034M/s
arr-inc : 23.285 ± 0.017M/s
hash-inc : 12.419 ± 0.011M/s
[1] https://github.com/anakryiko/linux/commit/8dec900975ef
Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240502151854.9810-4-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Inline the calls to bpf_get_smp_processor_id() in the riscv bpf jit.
RISCV saves the pointer to the CPU's task_struct in the TP (thread
pointer) register. This makes it trivial to get the CPU's processor id.
As thread_info is the first member of task_struct, we can read the
processor id from TP + offsetof(struct thread_info, cpu).
RISCV64 JIT output for `call bpf_get_smp_processor_id`
======================================================
Before After
-------- -------
auipc t1,0x848c ld a5,32(tp)
jalr 604(t1)
mv a5,a0
Benchmark using [1] on Qemu.
./benchs/run_bench_trigger.sh glob-arr-inc arr-inc hash-inc
+---------------+------------------+------------------+--------------+
| Name | Before | After | % change |
|---------------+------------------+------------------+--------------|
| glob-arr-inc | 1.077 ± 0.006M/s | 1.336 ± 0.010M/s | + 24.04% |
| arr-inc | 1.078 ± 0.002M/s | 1.332 ± 0.015M/s | + 23.56% |
| hash-inc | 0.494 ± 0.004M/s | 0.653 ± 0.001M/s | + 32.18% |
+---------------+------------------+------------------+--------------+
NOTE: This benchmark includes changes from this patch and the previous
patch that implemented the per-cpu insn.
[1] https://github.com/anakryiko/linux/commit/8dec900975ef
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/r/20240502151854.9810-3-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Support an instruction for resolving absolute addresses of per-CPU
data from their per-CPU offsets. This instruction is internal-only and
users are not allowed to use them directly. They will only be used for
internal inlining optimizations for now between BPF verifier and BPF
JITs.
RISC-V uses generic per-cpu implementation where the offsets for CPUs
are kept in an array called __per_cpu_offset[cpu_number]. RISCV stores
the address of the task_struct in TP register. The first element in
task_struct is struct thread_info, and we can get the cpu number by
reading from the TP register + offsetof(struct thread_info, cpu).
Once we have the cpu number in a register we read the offset for that
cpu from address: &__per_cpu_offset + cpu_number << 3. Then we add this
offset to the destination register.
To measure the improvement from this change, the benchmark in [1] was
used on Qemu:
Before:
glob-arr-inc : 1.127 ± 0.013M/s
arr-inc : 1.121 ± 0.004M/s
hash-inc : 0.681 ± 0.052M/s
After:
glob-arr-inc : 1.138 ± 0.011M/s
arr-inc : 1.366 ± 0.006M/s
hash-inc : 0.676 ± 0.001M/s
[1] https://github.com/anakryiko/linux/commit/8dec900975ef
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/r/20240502151854.9810-2-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This will add eBPF JIT support to the 32-bit ARCv2 processors. The
implementation is qualified by running the BPF tests on a Synopsys HSDK
board with "ARC HS38 v2.1c at 500 MHz" as the 4-core CPU.
The test_bpf.ko reports 2-10 fold improvements in execution time of its
tests. For instance:
test_bpf: #33 tcpdump port 22 jited:0 704 1766 2104 PASS
test_bpf: #33 tcpdump port 22 jited:1 120 224 260 PASS
test_bpf: #141 ALU_DIV_X: 4294967295 / 4294967295 = 1 jited:0 238 PASS
test_bpf: #141 ALU_DIV_X: 4294967295 / 4294967295 = 1 jited:1 23 PASS
test_bpf: #776 JMP32_JGE_K: all ... magnitudes jited:0 2034681 PASS
test_bpf: #776 JMP32_JGE_K: all ... magnitudes jited:1 1020022 PASS
Deployment and structure
------------------------
The related codes are added to "arch/arc/net":
- bpf_jit.h -- The interface that a back-end translator must provide
- bpf_jit_core.c -- Knows how to handle the input eBPF byte stream
- bpf_jit_arcv2.c -- The back-end code that knows the translation logic
The bpf_int_jit_compile() at the end of bpf_jit_core.c is the entrance
to the whole process. Normally, the translation is done in one pass,
namely the "normal pass". In case some relocations are not known during
this pass, some data (arc_jit_data) is allocated for the next pass to
come. This possible next (and last) pass is called the "extra pass".
1. Normal pass # The necessary pass
1a. Dry run # Get the whole JIT length, epilogue offset, etc.
1b. Emit phase # Allocate memory and start emitting instructions
2. Extra pass # Only needed if there are relocations to be fixed
2a. Patch relocations
Support status
--------------
The JIT compiler supports BPF instructions up to "cpu=v4". However, it
does not yet provide support for:
- Tail calls
- Atomic operations
- 64-bit division/remainder
- BPF_PROBE_MEM* (exception table)
The result of "test_bpf" test suite on an HSDK board is:
hsdk-lnx# insmod test_bpf.ko test_suite=test_bpf
test_bpf: Summary: 863 PASSED, 186 FAILED, [851/851 JIT'ed]
All the failing test cases are due to the ones that were not JIT'ed.
Categorically, they can be represented as:
.-----------.------------.-------------.
| test type | opcodes | # of cases |
|-----------+------------+-------------|
| atomic | 0xC3, 0xDB | 149 |
| div64 | 0x37, 0x3F | 22 |
| mod64 | 0x97, 0x9F | 15 |
`-----------^------------+-------------|
| (total) 186 |
`-------------'
Setup: build config
-------------------
The following configs must be set to have a working JIT test:
CONFIG_BPF_JIT=y
CONFIG_BPF_JIT_ALWAYS_ON=y
CONFIG_TEST_BPF=m
The following options are not necessary for the tests module,
but are good to have:
CONFIG_DEBUG_INFO=y # prerequisite for below
CONFIG_DEBUG_INFO_BTF=y # so bpftool can generate vmlinux.h
CONFIG_FTRACE=y #
CONFIG_BPF_SYSCALL=y # all these options lead to
CONFIG_KPROBE_EVENTS=y # having CONFIG_BPF_EVENTS=y
CONFIG_PERF_EVENTS=y #
Some BPF programs provide data through /sys/kernel/debug:
CONFIG_DEBUG_FS=y
arc# mount -t debugfs debugfs /sys/kernel/debug
Setup: elfutils
---------------
The libdw.{so,a} library that is used by pahole for processing
the final binary must come from elfutils 0.189 or newer. The
support for ARCv2 [1] has been added since that version.
[1]
https://sourceware.org/git/?p=elfutils.git;a=commit;h=de3d46b3e7
Setup: pahole
-------------
The line below in linux/scripts/Makefile.btf must be commented out:
pahole-flags-$(call test-ge, $(pahole-ver), 121) += --btf_gen_floats
Or else, the build will fail:
$ make V=1
...
BTF .btf.vmlinux.bin.o
pahole -J --btf_gen_floats \
-j --lang_exclude=rust \
--skip_encoding_btf_inconsistent_proto \
--btf_gen_optimized .tmp_vmlinux.btf
Complex, interval and imaginary float types are not supported
Encountered error while encoding BTF.
...
BTFIDS vmlinux
./tools/bpf/resolve_btfids/resolve_btfids vmlinux
libbpf: failed to find '.BTF' ELF section in vmlinux
FAILED: load BTF from vmlinux: No data available
This is due to the fact that the ARC toolchains generate
"complex float" DIE entries in libgcc and at the moment, pahole
can't handle such entries.
Running the tests
-----------------
host$ scp /bld/linux/lib/test_bpf.ko arc:
arc # sysctl net.core.bpf_jit_enable=1
arc # insmod test_bpf.ko test_suite=test_bpf
...
test_bpf: #1048 Staggered jumps: JMP32_JSLE_X jited:1 697811 PASS
test_bpf: Summary: 863 PASSED, 186 FAILED, [851/851 JIT'ed]
Acknowledgments
---------------
- Claudiu Zissulescu for his unwavering support
- Yuriy Kolerov for testing and troubleshooting
- Vladimir Isaev for the pahole workaround
- Sergey Matyukevich for paving the road by adding the interpreter support
Signed-off-by: Shahab Vahedi <shahab@synopsys.com>
Link: https://lore.kernel.org/r/20240430145604.38592-1-list+bpf@vahedi.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Dan Carpenter reports:
Commit cbeb479ff4cd ("hwmon: (nzxt-kraken3) Decouple device names
from kinds") from Apr 28, 2024 (linux-next), leads to the following
Smatch static checker warning:
drivers/hwmon/nzxt-kraken3.c:957 kraken3_probe()
error: uninitialized symbol 'device_name'.
Indeed, 'device_name' will be uninitizalized if an unknown product is
encountered. In practice this should not matter because the driver
should not instantiate on unknown products, but lets play safe and
bail out if that happens.
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/linux-hwmon/b1738c50-db42-40f0-a899-9c027c131ffb@moroto.mountain/
Cc: Jonas Malaco <jonas@protocubo.io>
Cc: Aleksa Savic <savicaleksa83@gmail.com>
Fixes: cbeb479ff4cd ("hwmon: (nzxt-kraken3) Decouple device names from kinds")
Acked-by: Jonas Malaco <jonas@protocubo.io>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux
Pull Kselftest fixes from Mickaël Salaün:
"Fix Kselftest's vfork() side effects.
As reported by Kernel Test Robot and Sean Christopherson, some
tests fail since v6.9-rc1 . This is due to the use of vfork() which
introduced some side effects. Similarly, while making it more generic,
a previous commit made some Landlock file system tests flaky, and
subject to the host's file system mount configuration.
This fixes all these side effects by replacing vfork() with clone3()
and CLONE_VFORK, which is cleaner (no arbitrary shared memory) and
makes the Kselftest framework more robust"
Link: https://lore.kernel.org/oe-lkp/202403291015.1fcfa957-oliver.sang@intel.com
Link: https://lore.kernel.org/r/ZjPelW6-AbtYvslu@google.com
Link: https://lore.kernel.org/r/20240511171445.904356-1-mic@digikod.net
* tag 'kselftest-fix-vfork-2024-05-12' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
selftests/harness: Handle TEST_F()'s explicit exit codes
selftests/harness: Fix vfork() side effects
selftests/harness: Share _metadata between forked processes
selftests/pidfd: Fix wrong expectation
selftests/harness: Constify fixture variants
selftests/landlock: Do not allocate memory in fixture data
selftests/harness: Fix interleaved scheduling leading to race conditions
selftests/harness: Fix fixture teardown
selftests/landlock: Fix FS tests when run on a private mount point
selftests/pidfd: Fix config for pidfd_setns_test
|
|
Pull kvm fix from Paolo Bonzini:
- Fix NULL pointer read on s390 in ioctl(KVM_CHECK_EXTENSION) for
/dev/kvm
* tag 'for-linus-6.9' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: s390: Check kvm pointer when testing KVM_CAP_S390_HPAGE_1M
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
Pull EDAC fix from Borislav Petkov:
- Fix a race condition when clearing error count bits and toggling the
error interrupt throug the same register, in synopsys_edac
* tag 'edac_urgent_for_v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC/synopsys: Fix ECC status and IRQ control race condition
|
|
EMC1428 and EMC1438 are similar to EMC14xx, but have eight temperature
channels, as well as signed data and limit registers. Chips currently
supported by this driver have unsigned registers only.
Signed-off-by: Lars Petter Mostad <larspm@gmail.com>
Link: https://lore.kernel.org/r/20240510142824.824332-1-lars.petter.mostad@appear.net
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Add a new PCI ID which belongs to a new AMD CPU family 0x1a
- Ensure that that last level cache ID is set in all cases, in the AMD
CPU topology parsing code, in order to prevent invalid scheduling
domain CPU masks
* tag 'x86_urgent_for_v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/topology/amd: Ensure that LLC ID is initialized
x86/amd_nb: Add new PCI IDs for AMD family 0x1a
|
|
The values loaded into the mixer are 16-bit values, with 8192
representing 0dB, going up to a current maximum of 16345 (+6dB). All
supported interfaces have no problem going up to 32612 (+12dB), so
update SCARLETT2_MIXER_MAX_DB and scarlett2_mixer_values[] to allow
for this.
Tested with:
- Scarlett 2nd Gen 6i6, 18i8, 18i20
- Scarlett 3rd Gen 4i4, 8i6, 18i8, 18i20
- Scarlett 4th Gen Solo, 2i2, 4i4
- Clarett+ 2Pre, 4Pre, 8Pre
- Vocaster One and Two
Signed-off-by: Geoffrey D. Bennett <g@b4.vu>
Link: https://lore.kernel.org/r/Zj+gYT4F2XeKTD93@m.b4.vu
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
Add S/PDIF Source/Digital I/O Mode selection controls for the Scarlett
3rd Gen 18i8/18i20 and Clarett 4Pre/8Pre interfaces. These models have
both coax S/PDIF and optical inputs, and the optical inputs are
switchable between being used as S/PDIF and ADAT inputs. The Scarlett
3rd Gen 18i20 also has a "Dual ADAT" mode for 8-channel audio at
88.2/96kHz.
Signed-off-by: Geoffrey D. Bennett <g@b4.vu>
Link: https://lore.kernel.org/r/Zj8zCTjzPsTDENN+@m.b4.vu
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
If TEST_F() explicitly calls exit(code) with code different than 0, then
_metadata->exit_code is set to this code (e.g. KVM_ONE_VCPU_TEST()). We
need to keep in mind that _metadata->exit_code can be KSFT_SKIP while
the process exit code is 0.
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Will Drewry <wad@chromium.org>
Reported-by: Sean Christopherson <seanjc@google.com>
Tested-by: Sean Christopherson <seanjc@google.com>
Closes: https://lore.kernel.org/r/ZjPelW6-AbtYvslu@google.com
Fixes: 0710a1a73fb4 ("selftests/harness: Merge TEST_F_FORK() into TEST_F()")
Link: https://lore.kernel.org/r/20240511171445.904356-11-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Setting the time namespace with CLONE_NEWTIME returns -EUSERS if the
calling thread shares memory with another thread (because of the shared
vDSO), which is the case when it is created with vfork().
Fix pidfd_setns_test by replacing test harness's vfork() call with a
clone3() call with CLONE_VFORK, and an explicit sharing of the
_metadata and self objects.
Replace _metadata->teardown_parent with a new FIXTURE_TEARDOWN_PARENT()
helper that can replace FIXTURE_TEARDOWN(). This is a cleaner approach
and it enables to selectively share the fixture data between the child
process running tests and the parent process running the fixture
teardown. This also avoids updating several tests to not rely on the
self object's copy-on-write property (e.g. storing the returned value of
a fork() call).
Cc: Christian Brauner <brauner@kernel.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Günther Noack <gnoack@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Will Drewry <wad@chromium.org>
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202403291015.1fcfa957-oliver.sang@intel.com
Fixes: 0710a1a73fb4 ("selftests/harness: Merge TEST_F_FORK() into TEST_F()")
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20240511171445.904356-10-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Unconditionally share _metadata between all forked processes, which
enables to actually catch errors which were previously ignored.
This is required for a following commit replacing vfork() with clone3()
and CLONE_VFORK (i.e. not sharing the full memory) . It should also be
useful to share _metadata to extend expectations to test process's
forks. For instance, this change identified a wrong expectation in
pidfd_setns_test.
Because this _metadata is used by the new XFAIL_ADD(), use a global
pointer initialized in TEST_F(). This is OK because only XFAIL_ADD()
use it, and XFAIL_ADD() already depends on TEST_F().
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Will Drewry <wad@chromium.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20240511171445.904356-9-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Replace a wrong EXPECT_GT(self->child_pid_exited, 0) with EXPECT_GE(),
which will be actually tested on the parent and child sides with a
following commit.
Cc: Shuah Khan <skhan@linuxfoundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Link: https://lore.kernel.org/r/20240511171445.904356-8-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
FIXTURE_VARIANT_ADD() types are passed as const pointers to
FIXTURE_TEARDOWN(). Make that explicit by constifying the variants
declarations.
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Will Drewry <wad@chromium.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20240511171445.904356-7-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Do not allocate self->dir_path in the test process because this would
not be visible in the FIXTURE_TEARDOWN() process when relying on
fork()/clone3() instead of vfork().
This change is required for a following commit removing vfork() call to
not break the layout3_fs.* test cases.
Cc: Günther Noack <gnoack@google.com>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20240511171445.904356-6-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Fix a race condition when running several FIXTURE_TEARDOWN() managing
the same resource. This fixes a race condition in the Landlock file
system tests when creating or unmounting the same directory.
Using clone3() with CLONE_VFORK guarantees that the child and grandchild
test processes are sequentially scheduled. This is implemented with a
new clone3_vfork() helper replacing the fork() call.
This avoids triggering this error in __wait_for_test():
Test ended in some other way [127]
Cc: Christian Brauner <brauner@kernel.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Günther Noack <gnoack@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Will Drewry <wad@chromium.org>
Fixes: 41cca0542d7c ("selftests/harness: Fix TEST_F()'s vfork handling")
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20240511171445.904356-5-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Make sure fixture teardowns are run when test cases failed, including
when _metadata->teardown_parent is set to true.
Make sure only one fixture teardown is run per test case, handling the
case where the test child forks.
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Shengyu Li <shengyu.li.evgeny@gmail.com>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Fixes: 72d7cb5c190b ("selftests/harness: Prevent infinite loop due to Assert in FIXTURE_TEARDOWN")
Fixes: 0710a1a73fb4 ("selftests/harness: Merge TEST_F_FORK() into TEST_F()")
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20240511171445.904356-4-mic@digikod.net
Rule: add
Link: https://lore.kernel.org/stable/20240506165518.474504-4-mic%40digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
According to the test environment, the mount point of the test's working
directory may be shared or not, which changes the visibility of the
nested "tmp" mount point for the test's parent process calling
umount("tmp").
This was spotted while running tests in containers [1], where mount
points are private.
Cc: Günther Noack <gnoack@google.com>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Link: https://github.com/landlock-lsm/landlock-test-tools/pull/4 [1]
Fixes: 41cca0542d7c ("selftests/harness: Fix TEST_F()'s vfork handling")
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20240511171445.904356-3-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Required by switch_timens() to open /proc/self/ns/time_for_children.
CONFIG_GENERIC_VDSO_TIME_NS is not available on UML, so pidfd_setns_test
cannot be run successfully on this architecture.
Cc: Shuah Khan <skhan@linuxfoundation.org>
Fixes: 2b40c5db73e2 ("selftests/pidfd: add pidfd setns tests")
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Link: https://lore.kernel.org/r/20240511171445.904356-2-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Use the new cmpxchg_emu_u8() to emulate one-byte cmpxchg() on csky.
[ paulmck: Apply kernel test robot feedback. ]
[ paulmck: Drop two-byte support per Arnd Bergmann feedback. ]
Co-developed-by: Yujie Liu <yujie.liu@intel.com>
Signed-off-by: Yujie Liu <yujie.liu@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Yujie Liu <yujie.liu@intel.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: <linux-csky@vger.kernel.org>
|
|
Merge ACPICA material for v6.10. This is mostly new material included
in the 20240322 upstream ACPICA release.
- Disable -Wstringop-truncation for some ACPICA code in the kernel to
avoid a compiler warning that is not very useful (Arnd Bergmann).
- Add EINJ CXL error types to actbl1.h (Ben Cheatham).
- Add support for RAS2 table to ACPICA (Shiju Jose).
- Fix various spelling mistakes in text files and code comments in
ACPICA (Colin Ian King).
- Fix spelling and typos in ACPICA (Saket Dumbre).
- Modify ACPI_OBJECT_COMMON_HEADER (lijun).
- Add RISC-V RINTC affinity structure support to ACPICA (Haibo Xu).
- Fix CXL 3.0 structure (RDPAS) in the CEDT table (Hojin Nam).
- Add missin increment of registered GPE count to ACPICA (Daniil
Tatianin).
- Mark new ACPICA release 20240322 (Saket Dumbre).
- Add support for the AEST V2 table to ACPICA (Ruidong Tian).
* acpica:
ACPICA: AEST: Add support for the AEST V2 table
ACPICA: Update acpixf.h for new ACPICA release 20240322
ACPICA: events/evgpeinit: don't forget to increment registered GPE count
ACPICA: Fix CXL 3.0 structure (RDPAS) in the CEDT table
ACPICA: SRAT: Add dump and compiler support for RINTC affinity structure
ACPICA: SRAT: Add RISC-V RINTC affinity structure
ACPICA: Modify ACPI_OBJECT_COMMON_HEADER
ACPICA: Fix spelling and typos
ACPICA: Clean up the fix for Issue #900
ACPICA: Fix various spelling mistakes in text files and code comments
ACPICA: Attempt 1 to fix issue #900
ACPICA: ACPI 6.5: RAS2: Add support for RAS2 table
ACPICA: actbl1.h: Add EINJ CXL error types
ACPI: disable -Wstringop-truncation
|
|
Tariq Toukan says:
====================
mlx5 misc fixes
This patchset provides bug fixes to mlx5 driver.
Patch 1 by Shay fixes the error flow in mlx5e_suspend().
Patch 2 by Shay aligns the peer devlink set logic with the register devlink flow.
Patch 3 by Maher solves a deadlock in lag enable/disable.
Patches 4 and 5 by Akiva address issues in command interface corner cases.
Series generated against:
commit 393ceeb9211e ("Merge branch 'there-are-some-bugfix-for-the-hns3-ethernet-driver'")
====================
Link: https://lore.kernel.org/r/20240509112951.590184-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix use after free when FW completion arrives while device is in
internal error state. Avoid calling completion handler in this case,
since the device will flush the command interface and trigger all
completions manually.
Kernel log:
------------[ cut here ]------------
refcount_t: underflow; use-after-free.
...
RIP: 0010:refcount_warn_saturate+0xd8/0xe0
...
Call Trace:
<IRQ>
? __warn+0x79/0x120
? refcount_warn_saturate+0xd8/0xe0
? report_bug+0x17c/0x190
? handle_bug+0x3c/0x60
? exc_invalid_op+0x14/0x70
? asm_exc_invalid_op+0x16/0x20
? refcount_warn_saturate+0xd8/0xe0
cmd_ent_put+0x13b/0x160 [mlx5_core]
mlx5_cmd_comp_handler+0x5f9/0x670 [mlx5_core]
cmd_comp_notifier+0x1f/0x30 [mlx5_core]
notifier_call_chain+0x35/0xb0
atomic_notifier_call_chain+0x16/0x20
mlx5_eq_async_int+0xf6/0x290 [mlx5_core]
notifier_call_chain+0x35/0xb0
atomic_notifier_call_chain+0x16/0x20
irq_int_handler+0x19/0x30 [mlx5_core]
__handle_irq_event_percpu+0x4b/0x160
handle_irq_event+0x2e/0x80
handle_edge_irq+0x98/0x230
__common_interrupt+0x3b/0xa0
common_interrupt+0x7b/0xa0
</IRQ>
<TASK>
asm_common_interrupt+0x22/0x40
Fixes: 51d138c2610a ("net/mlx5: Fix health error state handling")
Signed-off-by: Akiva Goldberger <agoldberger@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240509112951.590184-6-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Prevent forced completion handling on an entry that has not yet been
assigned an index, causing an out of bounds access on idx = -22.
Instead of waiting indefinitely for the sem, blocking flow now waits for
index to be allocated or a sem acquisition timeout before beginning the
timer for FW completion.
Kernel log example:
mlx5_core 0000:06:00.0: wait_func_handle_exec_timeout:1128:(pid 185911): cmd[-22]: CREATE_UCTX(0xa04) No done completion
Fixes: 8e715cd613a1 ("net/mlx5: Set command entry semaphore up once got index free")
Signed-off-by: Akiva Goldberger <agoldberger@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240509112951.590184-5-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
On lag disable, the bond IB device along with all of its
representors are destroyed, and then the slaves' representors get reloaded.
In case the slave IB representor load fails, the eswitch error flow
unloads all representors, including ethernet representors, where the
netdevs get detached and removed from lag bond. Such flow is inaccurate
as the lag driver is not responsible for loading/unloading ethernet
representors. Furthermore, the flow described above begins by holding
lag lock to prevent bond changes during disable flow. However, when
reaching the ethernet representors detachment from lag, the lag lock is
required again, triggering the following deadlock:
Call trace:
__switch_to+0xf4/0x148
__schedule+0x2c8/0x7d0
schedule+0x50/0xe0
schedule_preempt_disabled+0x18/0x28
__mutex_lock.isra.13+0x2b8/0x570
__mutex_lock_slowpath+0x1c/0x28
mutex_lock+0x4c/0x68
mlx5_lag_remove_netdev+0x3c/0x1a0 [mlx5_core]
mlx5e_uplink_rep_disable+0x70/0xa0 [mlx5_core]
mlx5e_detach_netdev+0x6c/0xb0 [mlx5_core]
mlx5e_netdev_change_profile+0x44/0x138 [mlx5_core]
mlx5e_netdev_attach_nic_profile+0x28/0x38 [mlx5_core]
mlx5e_vport_rep_unload+0x184/0x1b8 [mlx5_core]
mlx5_esw_offloads_rep_load+0xd8/0xe0 [mlx5_core]
mlx5_eswitch_reload_reps+0x74/0xd0 [mlx5_core]
mlx5_disable_lag+0x130/0x138 [mlx5_core]
mlx5_lag_disable_change+0x6c/0x70 [mlx5_core] // hold ldev->lock
mlx5_devlink_eswitch_mode_set+0xc0/0x410 [mlx5_core]
devlink_nl_cmd_eswitch_set_doit+0xdc/0x180
genl_family_rcv_msg_doit.isra.17+0xe8/0x138
genl_rcv_msg+0xe4/0x220
netlink_rcv_skb+0x44/0x108
genl_rcv+0x40/0x58
netlink_unicast+0x198/0x268
netlink_sendmsg+0x1d4/0x418
sock_sendmsg+0x54/0x60
__sys_sendto+0xf4/0x120
__arm64_sys_sendto+0x30/0x40
el0_svc_common+0x8c/0x120
do_el0_svc+0x30/0xa0
el0_svc+0x20/0x30
el0_sync_handler+0x90/0xb8
el0_sync+0x160/0x180
Thus, upon lag enable/disable, load and unload only the IB representors
of the slaves preventing the deadlock mentioned above.
While at it, refactor the mlx5_esw_offloads_rep_load() function to have
a static helper method for its internal logic, in symmetry with the
representor unload design.
Fixes: 598fe77df855 ("net/mlx5: Lag, Create shared FDB when in switchdev mode")
Co-developed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240509112951.590184-4-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|