2023-09-16  bpf: Disallow fentry/fexit/freplace for exception callbacks  (Kumar Kartikeya Dwivedi; 2 files, -0/+7)
During testing, it was discovered that extensions to exception callbacks had no checks: upon running a testcase, the kernel ended up running off the end of a program having bpf_throw as its final call, and hitting int3 instructions. The reason is that while the default exception callback would have reset the stack frame to return back to the main program's caller, the replacing extension program will simply return back to bpf_throw, which will instead return back to the program, and the program will continue execution, now in an undefined state where anything could happen.

The way to support extensions to an exception callback would be to mark the BPF_PROG_TYPE_EXT main subprog as an exception_cb, and prevent it from calling bpf_throw. This would make the JIT produce a prologue that restores saved registers and resets the stack frame. But let's not do that until there is a concrete use case for this, and simply disallow this for now.

Similar issues will exist for the fentry and fexit cases, where the trampoline saves data on the stack when invoking the exception callback, which will then end up resetting the stack frame; on return, the fexit program will never be invoked, as the return address points to the main program's caller in the kernel. Instead of additional complexity and back and forth between the two stacks to enable such a use case, simply forbid it.

One key point to note here is that currently X86_TAIL_CALL_OFFSET didn't require any modifications, even though we emit instructions before the corresponding endbr64 instruction. This is because we ensure that a main subprog never serves as an exception callback, and therefore the exception callback (which will be a global subprog) can never serve as the tail call target, eliminating any discrepancies. However, once we support a BPF_PROG_TYPE_EXT to also act as an exception callback, it will end up requiring a change to the tail call offset to account for the extra instructions. For simplicity, tail calls could be disabled for such targets.

Noting the above, it appears better to wait for a concrete use case before choosing to permit extension programs to replace exception callbacks. As a precaution, we disable fentry and fexit for exception callbacks as well.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20230912233214.1518551-13-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
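A rough sketch of the kind of attach-time rejection this implies, assuming the exception callback subprog is flagged via its bpf_prog_aux (names are illustrative, not the exact upstream diff):

  /* In the attach-target check for fentry/fexit/freplace (sketch): the
   * 'exception_cb' marker on bpf_prog_aux is an assumption here. */
  if (tgt_prog && tgt_prog->aux->func &&
      tgt_prog->aux->func[subprog]->aux->exception_cb) {
          bpf_log(log, "Cannot attach to exception callback subprog\n");
          return -EINVAL;
  }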
2023-09-16  bpf: Detect IP == ksym.end as part of BPF program  (Kumar Kartikeya Dwivedi; 1 file, -1/+5)
Now that the bpf_throw kfunc is the first call instruction that has noreturn semantics within the verifier, this also kicks in dead code elimination in unprecedented ways. For one, any instruction following a bpf_throw call will never be marked as seen. Moreover, if a callchain ends up throwing, any instructions after the call instruction to the eventually throwing subprog in callers will also never be marked as seen.

The tempting way to fix this would be to emit extra 'int3' instructions which bump the jited_len of a program, and ensure that during runtime when a program throws, we can discover its boundaries even if the call instruction to bpf_throw (or to subprogs that always throw) is emitted as the final instruction in the program. An example of such a program would be this:

do_something():
	...
	r0 = 0
	exit

foo():
	r1 = 0
	call bpf_throw
	r0 = 0
	exit

bar(cond):
	if r1 != 0 goto pc+2
	call do_something
	exit
	call foo
	r0 = 0 // Never seen by verifier
	exit

main(ctx):
	r1 = ...
	call bar
	r0 = 0
	exit

Here, if we do end up throwing, the stacktrace would be the following:

bpf_throw
foo
bar
main

In bar, the final instruction emitted will be the call to foo; as such, the return address will be the subsequent instruction (which the JIT emits as int3 on x86). This will end up lying outside the jited_len of the program, thus, when unwinding, we will fail to discover the return address as belonging to any program and end up in a panic due to the unreliable stack unwinding of BPF programs that we never expect.

To remedy this case, make bpf_prog_ksym_find treat IP == ksym.end as part of the BPF program, so that is_bpf_text_address returns true when such a case occurs, and we are able to unwind reliably when the final instruction ends up being a call instruction.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20230912233214.1518551-12-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
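A minimal sketch of the comparator change this describes, assuming a latch-tree comparator along the lines of the one used for BPF ksym lookup (exact names and file layout may differ):

  static int bpf_tree_comp(void *key, struct latch_tree_node *n)
  {
          unsigned long val = (unsigned long)key;
          const struct bpf_ksym *ksym;

          ksym = container_of(n, struct bpf_ksym, tnode);

          if (val < ksym->start)
                  return -1;
          /* '>' instead of '>=': a return address equal to ksym.end
           * (call emitted as the final instruction) still resolves to
           * this program. */
          if (val > ksym->end)
                  return 1;
          return 0;
  }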
2023-09-16  bpf: Prevent KASAN false positive with bpf_throw  (Kumar Kartikeya Dwivedi; 1 file, -0/+6)
The KASAN stack instrumentation, when CONFIG_KASAN_STACK is true, poisons the stack of a function when it is entered and unpoisons it when leaving. However, in the case of bpf_throw, we will never return, as we switch our stack frame to the BPF exception callback. Later, this discrepancy will lead to confusing KASAN splats when the kernel resumes execution on return from the BPF program.

Fix this by unpoisoning everything below the stack pointer of the BPF program, which should cover the range that would not be unpoisoned. An example splat is below:

BUG: KASAN: stack-out-of-bounds in stack_trace_consume_entry+0x14e/0x170
Write of size 8 at addr ffffc900013af958 by task test_progs/227

CPU: 0 PID: 227 Comm: test_progs Not tainted 6.5.0-rc2-g43f1c6c9052a-dirty #26
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-2.fc39 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x4a/0x80
 print_report+0xcf/0x670
 ? arch_stack_walk+0x79/0x100
 kasan_report+0xda/0x110
 ? stack_trace_consume_entry+0x14e/0x170
 ? stack_trace_consume_entry+0x14e/0x170
 ? __pfx_stack_trace_consume_entry+0x10/0x10
 stack_trace_consume_entry+0x14e/0x170
 ? __sys_bpf+0xf2e/0x41b0
 arch_stack_walk+0x8b/0x100
 ? __sys_bpf+0xf2e/0x41b0
 ? bpf_prog_test_run_skb+0x341/0x1c70
 ? bpf_prog_test_run_skb+0x341/0x1c70
 stack_trace_save+0x9b/0xd0
 ? __pfx_stack_trace_save+0x10/0x10
 ? __kasan_slab_free+0x109/0x180
 ? bpf_prog_test_run_skb+0x341/0x1c70
 ? __sys_bpf+0xf2e/0x41b0
 ? __x64_sys_bpf+0x78/0xc0
 ? do_syscall_64+0x3c/0x90
 ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
 kasan_save_stack+0x33/0x60
 ? kasan_save_stack+0x33/0x60
 ? kasan_set_track+0x25/0x30
 ? kasan_save_free_info+0x2b/0x50
 ? __kasan_slab_free+0x109/0x180
 ? kmem_cache_free+0x191/0x460
 ? bpf_prog_test_run_skb+0x341/0x1c70
 kasan_set_track+0x25/0x30
 kasan_save_free_info+0x2b/0x50
 __kasan_slab_free+0x109/0x180
 kmem_cache_free+0x191/0x460
 bpf_prog_test_run_skb+0x341/0x1c70
 ? __pfx_bpf_prog_test_run_skb+0x10/0x10
 ? __fget_light+0x51/0x220
 __sys_bpf+0xf2e/0x41b0
 ? __might_fault+0xa2/0x170
 ? __pfx___sys_bpf+0x10/0x10
 ? lock_release+0x1de/0x620
 ? __might_fault+0xcd/0x170
 ? __pfx_lock_release+0x10/0x10
 ? __pfx_blkcg_maybe_throttle_current+0x10/0x10
 __x64_sys_bpf+0x78/0xc0
 ? syscall_enter_from_user_mode+0x20/0x50
 do_syscall_64+0x3c/0x90
 entry_SYSCALL_64_after_hwframe+0x6e/0xd8
RIP: 0033:0x7f0fbb38880d
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d f3 45 12 00 f7 d8 64 89 01 48
RSP: 002b:00007ffe13907de8 EFLAGS: 00000206 ORIG_RAX: 0000000000000141
RAX: ffffffffffffffda RBX: 00007ffe13908708 RCX: 00007f0fbb38880d
RDX: 0000000000000050 RSI: 00007ffe13907e20 RDI: 000000000000000a
RBP: 00007ffe13907e00 R08: 0000000000000000 R09: 00007ffe13907e20
R10: 0000000000000064 R11: 0000000000000206 R12: 0000000000000003
R13: 0000000000000000 R14: 00007f0fbb532000 R15: 0000000000cfbd90
 </TASK>

The buggy address belongs to stack of task test_progs/227
KASAN internal error: frame info validation failed; invalid marker: 0

The buggy address belongs to the virtual mapping at
 [ffffc900013a8000, ffffc900013b1000) created by:
 kernel_clone+0xcd/0x600

The buggy address belongs to the physical page:
page:00000000b70f4332 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x11418f
flags: 0x2fffe0000000000(node=0|zone=2|lastcpupid=0x7fff)
page_type: 0xffffffff()
raw: 02fffe0000000000 0000000000000000 dead000000000122 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffffc900013af800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffffc900013af880: 00 00 00 f1 f1 f1 f1 00 00 00 f3 f3 f3 f3 f3 00
>ffffc900013af900: 00 00 00 00 00 00 00 00 00 00 00 f1 00 00 00 00
                                                    ^
 ffffc900013af980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffffc900013afa00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================
Disabling lock debugging due to kernel taint

Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Andrey Konovalov <andreyknvl@gmail.com>
Link: https://lore.kernel.org/r/20230912233214.1518551-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
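A sketch of where the unpoisoning slots into bpf_throw, assuming the unwinder has recovered the throwing program's stack pointer into a context structure (type and field names are illustrative):

  __bpf_kfunc void bpf_throw(u64 cookie)
  {
          struct bpf_throw_ctx ctx = {};

          arch_bpf_stack_walk(bpf_stack_walker, &ctx);
          WARN_ON_ONCE(!ctx.aux);
          /* Everything below the recovered stack pointer was poisoned on
           * entry by KASAN stack instrumentation; unpoison it, as these
           * frames are abandoned rather than returned from. */
          kasan_unpoison_task_stack_below((void *)(long)ctx.sp);
          ctx.aux->bpf_exception_cb(cookie, ctx.sp, ctx.bp);
          WARN(1, "A BPF exception callback should never return\n");
  }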
2023-09-16  mm: kasan: Declare kasan_unpoison_task_stack_below in kasan.h  (Kumar Kartikeya Dwivedi; 2 files, -1/+2)
We require access to this kasan helper in BPF code in the next patch where we have to unpoison the task stack when we unwind and reset the stack frame from bpf_throw, and it never really unpoisons the poisoned stack slots on entry when compiler instrumentation is generated by CONFIG_KASAN_STACK and inline instrumentation is supported. Also, remove the declaration from mm/kasan/kasan.h as we put it in the header file kasan.h. Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Suggested-by: Andrey Konovalov <andreyknvl@gmail.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Link: https://lore.kernel.org/r/20230912233214.1518551-10-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-16  bpf: Treat first argument as return value for bpf_throw  (Kumar Kartikeya Dwivedi; 1 file, -13/+24)
In case of the default exception callback, change the behavior of bpf_throw, where the passed cookie value is no longer ignored, but is instead the return value of the default exception callback. As such, we need to place restrictions on the value being passed into bpf_throw in such a case, only allowing those permitted by the check_return_code function. Thus, bpf_throw can now control the return value of the program from each call site without having the user install a custom exception callback just to override the return value when an exception is thrown. We also modify the hidden subprog instructions to now move BPF_REG_1 to BPF_REG_0, so as to set the return value before exit in the default callback. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20230912233214.1518551-9-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
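The resulting hidden subprog body reduces to two instructions; a sketch using the standard instruction macros:

  struct bpf_insn insns[] = {
          /* The bpf_throw cookie arrives in R1; make it the return value
           * of the default exception callback. */
          BPF_MOV64_REG(BPF_REG_0, BPF_REG_1),
          BPF_EXIT_INSN(),
  };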
2023-09-16  bpf: Perform CFG walk for exception callback  (Kumar Kartikeya Dwivedi; 1 file, -2/+13)
Since exception callbacks are not referenced using bpf_pseudo_func and bpf_pseudo_call instructions, check_cfg traversal will never explore instructions of the exception callback. Even after adding the subprog, the program will then fail with an 'unreachable insn' error. We thus need to begin walking from the start of the exception callback again in check_cfg after the complete CFG traversal finishes, so as to explore the CFG rooted at the exception callback.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20230912233214.1518551-8-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-16  bpf: Add support for custom exception callbacks  (Kumar Kartikeya Dwivedi; 5 files, -18/+160)
By default, the subprog generated by the verifier to handle a thrown exception hardcodes a return value of 0. To allow user-defined logic and modification of the return value when an exception is thrown, introduce the 'exception_callback:' declaration tag, which marks a callback as the default exception handler for the program.

The format of the declaration tag is 'exception_callback:<value>', where <value> is the name of the exception callback. Each main program can be tagged using this BTF declaration tag to associate it with an exception callback. In case the tag is absent, the default callback is used.

As such, the exception callback cannot be modified at runtime, only set during verification. Allowing modification of the callback for the current program execution at runtime leads to issues when the programs begin to nest, as any per-CPU state maintaining this information will have to be saved and restored. We don't want it to stay in bpf_prog_aux, as this takes a global effect for all programs. An alternative solution is spilling the callback pointer at a known location on the program stack on entry, and then passing this location to bpf_throw as a parameter. However, since exceptions are geared more towards a use case where they are ideally never invoked, optimizing for this use case and adding to the complexity has diminishing returns.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20230912233214.1518551-7-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
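A hedged usage sketch from BPF C, spelling the tag out with btf_decl_tag per the format above (program and callback names are made up; selftest headers typically wrap this in a macro):

  extern void bpf_throw(u64 cookie) __ksym;

  __noinline int my_exception_cb(u64 cookie)
  {
          return cookie * 2;      /* user-defined return value */
  }

  SEC("tc")
  __attribute__((btf_decl_tag("exception_callback:my_exception_cb")))
  int main_prog(struct __sk_buff *ctx)
  {
          bpf_throw(5);           /* unwinds into my_exception_cb */
          return 0;
  }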
2023-09-16  bpf: Refactor check_btf_func and split into two phases  (Kumar Kartikeya Dwivedi; 1 file, -28/+100)
This patch splits the check_btf_info's check_btf_func check into two separate phases. The first phase sets up the BTF and prepares func_info, but does not perform any validation of required invariants for subprogs just yet. This is left to the second phase, which happens where check_btf_info executes currently, and performs the line_info and CO-RE relocation. The reason to perform this split is to obtain the userspace supplied func_info information before we perform the add_subprog call, where we would now require finding and adding subprogs that may not have a bpf_pseudo_call or bpf_pseudo_func instruction in the program. We require this as we want to enable userspace to supply exception callbacks that can override the default hidden subprogram generated by the verifier (which performs a hardcoded action). In such a case, the exception callback may never be referenced in an instruction, but will still be suitably annotated (by way of BTF declaration tags). For finding this exception callback, we would require the program's BTF information, and the supplied func_info information which maps BTF type IDs to subprograms. Since the exception callback won't actually be referenced through instructions, later checks in check_cfg and do_check_subprogs will not verify the subprog. This means that add_subprog needs to add them in the add_subprog_and_kfunc phase before we move forward, which is why the BTF and func_info are required at that point. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20230912233214.1518551-6-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-16  bpf: Implement BPF exceptions  (Kumar Kartikeya Dwivedi; 8 files, -27/+247)
This patch implements BPF exceptions, and introduces a bpf_throw kfunc to allow programs to throw exceptions during their execution at runtime. A bpf_throw invocation is treated as an immediate termination of the program, returning back to its caller within the kernel, unwinding all stack frames. This allows the program to simplify its implementation, by testing for runtime conditions which the verifier has no visibility into, and asserting that they are true. In case they are not, the program can simply throw an exception from the other branch.

BPF exceptions are explicitly *NOT* an unlikely slowpath error handling primitive, and this objective has guided design choices of their implementation within the kernel (with the bulk of the cost for unwinding the stack offloaded to the bpf_throw kfunc).

The implementation of this mechanism requires use of the add_hidden_subprog mechanism introduced in the previous patch, which generates a couple of instructions to move R1 to R0 and exit. The JIT then rewrites the prologue of this subprog to take the stack pointer and frame pointer as inputs and reset the stack frame, popping all callee-saved registers saved by the main subprog. The bpf_throw function then walks the stack at runtime, and invokes this exception subprog with the stack and frame pointers as parameters.

Reviewers must take note that currently the main program is made to save all callee-saved registers on x86_64 during entry into the program. This is because we must do an equivalent of a lightweight context switch when unwinding the stack; therefore we need the callee-saved registers of the caller of the BPF program to be able to return with a sane state. Note that we have to additionally handle r12, even though it is not used by the program, because when throwing the exception the program makes an entry into the kernel which could clobber r12 after saving it on the stack. To be able to preserve the value we received on program entry, we push r12 and restore it from the generated subprogram when unwinding the stack.

For now, bpf_throw invocation fails when lingering resources or locks exist in that path of the program. In a future followup, bpf_throw will be extended to perform frame-by-frame unwinding to release lingering resources for each stack frame, removing this limitation.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20230912233214.1518551-5-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
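A minimal usage sketch of the runtime-assertion pattern this enables (kfunc declaration as commonly done in BPF C; names are illustrative):

  extern void bpf_throw(u64 cookie) __ksym;

  SEC("tc")
  int assert_pkt_len(struct __sk_buff *ctx)
  {
          /* A condition the verifier has no visibility into; assert it at
           * runtime, terminating the program if it does not hold. */
          if (ctx->len > 8192)
                  bpf_throw(0);
          return 1;
  }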
2023-09-16  bpf: Implement support for adding hidden subprogs  (Kumar Kartikeya Dwivedi; 5 files, -11/+43)
Introduce support in the verifier for generating a subprogram and including it as part of a BPF program dynamically after the do_check phase is complete. The first user will be the next patch, which generates default exception callbacks if none are set for the program. The phase of invocation will be do_misc_fixups. Note that this is an internal verifier function, and should be used with instruction blocks which uphold the invariants stated in check_subprogs.

Since these subprogs are always appended to the end of the instruction sequence of the program, it becomes relatively inexpensive to do the related adjustments to the subprog_info of the program. Only the fake exit subprogram is shifted forward, making room for our new subprog. This is useful to insert a new subprogram, get it JITed, and obtain its function pointer. The next patch will use this functionality to insert a default exception callback which will be invoked after unwinding the stack.

Note that these added subprograms are invisible to userspace, and never reported in BPF_OBJ_GET_INFO_BY_ID etc. For now, only a single subprogram is supported, but more can be easily supported in the future.

To this end, two function counts are introduced now: the existing func_cnt, and real_func_cnt, the latter including hidden programs. This allows us to convert the JIT code to use real_func_cnt for management of resources, while the syscall path continues working with the existing func_cnt.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20230912233214.1518551-4-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-16  arch/x86: Implement arch_bpf_stack_walk  (Kumar Kartikeya Dwivedi; 3 files, -0/+39)
The plumbing for offline unwinding when we throw an exception in programs would require walking the stack, hence introduce a new arch_bpf_stack_walk function. This is provided when the JIT supports exceptions, i.e. bpf_jit_supports_exceptions is true. The arch-specific code is really minimal, hence it should be straightforward to extend this support to other architectures as well, as it reuses the logic of arch_stack_walk, but allowing access to unwind_state data.

Once the stack pointer and frame pointer are known for the main subprog during the unwinding, we know the stack layout and location of any callee-saved registers which must be restored before we return back to the kernel. This handling will be added in the subsequent patches.

Note that while we primarily unwind through BPF frames, which are effectively CONFIG_UNWINDER_FRAME_POINTER, we still need either this option or CONFIG_UNWINDER_ORC to be able to unwind through the bpf_throw frame from which we begin walking the stack. We also require both sp and bp (stack and frame pointers) from the unwind_state structure, which are only available when one of these two options is enabled.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20230912233214.1518551-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
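A sketch of the interface as the description implies, with the consumer receiving ip/sp/bp per frame (the exact prototype may differ):

  /* Walk the kernel stack from within bpf_throw, handing each frame's
   * ip/sp/bp to consume_fn until it returns false. */
  void arch_bpf_stack_walk(bool (*consume_fn)(void *cookie, u64 ip,
                                              u64 sp, u64 bp),
                           void *cookie);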
2023-09-16  bpf: Use bpf_is_subprog to check for subprogs  (Kumar Kartikeya Dwivedi; 4 files, -3/+8)
We would like to know whether a bpf_prog corresponds to the main prog or one of the subprogs. The current JIT implementations simply check this using the func_idx in bpf_prog->aux->func_idx. When the index is 0, it belongs to the main program, otherwise it corresponds to some subprogram. This will also be necessary to halt exception propagation while walking the stack when an exception is thrown, so we add a simple helper function to check this, named bpf_is_subprog, and convert existing JIT implementations to also make use of it. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20230912233214.1518551-2-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
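A sketch of the helper, following directly from the description that func_idx == 0 denotes the main program:

  static inline bool bpf_is_subprog(const struct bpf_prog *prog)
  {
          return prog->aux->func_idx != 0;
  }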
2023-09-16  Merge branch 'arm32-bpf-add-support-for-cpuv4-insns'  (Alexei Starovoitov; 10 files, -31/+694)
Puranjay Mohan says:

====================
arm32, bpf: add support for cpuv4 insns

Changes in V2 -> V3:
- Added comments at places where there could be confusion.
- In the patch for DIV64, fix the if-else case that would never run.
- In the same patch use a single instruction to POP caller saved regs.
- Add a patch to change maintainership of ARM32 BPF JIT.

Changes in V1 -> V2:
- Fix coding style issues.
- Don't use tmp variable for src in emit_ldsx_r() as it is redundant.
- Optimize emit_ldsx_r() when offset can fit in immediate.

Add the support for cpuv4 instructions for ARM32 BPF JIT. 64-bit division was not supported earlier, so this series adds 64-bit DIV, SDIV, MOD, SMOD instructions as well.

This series needs any one of the patches from [1] to disable zero-extension for BPF_MEMSX to support ldsx.

The relevant selftests have passed, except ldsx_insn, which needs fentry.

Tested on BeagleBone Black (ARMv7-A):

[root@alarm del]# echo 1 > /proc/sys/net/core/bpf_jit_enable
[root@alarm del]# ./test_progs -a verifier_sdiv,verifier_movsx,verifier_ldsx,verifier_gotol,verifier_bswap

#337/1 verifier_bswap/BSWAP, 16:OK
#337/2 verifier_bswap/BSWAP, 16 @unpriv:OK
#337/3 verifier_bswap/BSWAP, 32:OK
#337/4 verifier_bswap/BSWAP, 32 @unpriv:OK
#337/5 verifier_bswap/BSWAP, 64:OK
#337/6 verifier_bswap/BSWAP, 64 @unpriv:OK
#337 verifier_bswap:OK
#351/1 verifier_gotol/gotol, small_imm:OK
#351/2 verifier_gotol/gotol, small_imm @unpriv:OK
#351 verifier_gotol:OK
#359/1 verifier_ldsx/LDSX, S8:OK
#359/2 verifier_ldsx/LDSX, S8 @unpriv:OK
#359/3 verifier_ldsx/LDSX, S16:OK
#359/4 verifier_ldsx/LDSX, S16 @unpriv:OK
#359/5 verifier_ldsx/LDSX, S32:OK
#359/6 verifier_ldsx/LDSX, S32 @unpriv:OK
#359/7 verifier_ldsx/LDSX, S8 range checking, privileged:OK
#359/8 verifier_ldsx/LDSX, S16 range checking:OK
#359/9 verifier_ldsx/LDSX, S16 range checking @unpriv:OK
#359/10 verifier_ldsx/LDSX, S32 range checking:OK
#359/11 verifier_ldsx/LDSX, S32 range checking @unpriv:OK
#359 verifier_ldsx:OK
#370/1 verifier_movsx/MOV32SX, S8:OK
#370/2 verifier_movsx/MOV32SX, S8 @unpriv:OK
#370/3 verifier_movsx/MOV32SX, S16:OK
#370/4 verifier_movsx/MOV32SX, S16 @unpriv:OK
#370/5 verifier_movsx/MOV64SX, S8:OK
#370/6 verifier_movsx/MOV64SX, S8 @unpriv:OK
#370/7 verifier_movsx/MOV64SX, S16:OK
#370/8 verifier_movsx/MOV64SX, S16 @unpriv:OK
#370/9 verifier_movsx/MOV64SX, S32:OK
#370/10 verifier_movsx/MOV64SX, S32 @unpriv:OK
#370/11 verifier_movsx/MOV32SX, S8, range_check:OK
#370/12 verifier_movsx/MOV32SX, S8, range_check @unpriv:OK
#370/13 verifier_movsx/MOV32SX, S16, range_check:OK
#370/14 verifier_movsx/MOV32SX, S16, range_check @unpriv:OK
#370/15 verifier_movsx/MOV32SX, S16, range_check 2:OK
#370/16 verifier_movsx/MOV32SX, S16, range_check 2 @unpriv:OK
#370/17 verifier_movsx/MOV64SX, S8, range_check:OK
#370/18 verifier_movsx/MOV64SX, S8, range_check @unpriv:OK
#370/19 verifier_movsx/MOV64SX, S16, range_check:OK
#370/20 verifier_movsx/MOV64SX, S16, range_check @unpriv:OK
#370/21 verifier_movsx/MOV64SX, S32, range_check:OK
#370/22 verifier_movsx/MOV64SX, S32, range_check @unpriv:OK
#370/23 verifier_movsx/MOV64SX, S16, R10 Sign Extension:OK
#370/24 verifier_movsx/MOV64SX, S16, R10 Sign Extension @unpriv:OK
#370 verifier_movsx:OK
#382/1 verifier_sdiv/SDIV32, non-zero imm divisor, check 1:OK
#382/2 verifier_sdiv/SDIV32, non-zero imm divisor, check 1 @unpriv:OK
#382/3 verifier_sdiv/SDIV32, non-zero imm divisor, check 2:OK
#382/4 verifier_sdiv/SDIV32, non-zero imm divisor, check 2 @unpriv:OK
#382/5 verifier_sdiv/SDIV32, non-zero imm divisor, check 3:OK
#382/6 verifier_sdiv/SDIV32, non-zero imm divisor, check 3 @unpriv:OK
#382/7 verifier_sdiv/SDIV32, non-zero imm divisor, check 4:OK
#382/8 verifier_sdiv/SDIV32, non-zero imm divisor, check 4 @unpriv:OK
#382/9 verifier_sdiv/SDIV32, non-zero imm divisor, check 5:OK
#382/10 verifier_sdiv/SDIV32, non-zero imm divisor, check 5 @unpriv:OK
#382/11 verifier_sdiv/SDIV32, non-zero imm divisor, check 6:OK
#382/12 verifier_sdiv/SDIV32, non-zero imm divisor, check 6 @unpriv:OK
#382/13 verifier_sdiv/SDIV32, non-zero imm divisor, check 7:OK
#382/14 verifier_sdiv/SDIV32, non-zero imm divisor, check 7 @unpriv:OK
#382/15 verifier_sdiv/SDIV32, non-zero imm divisor, check 8:OK
#382/16 verifier_sdiv/SDIV32, non-zero imm divisor, check 8 @unpriv:OK
#382/17 verifier_sdiv/SDIV32, non-zero reg divisor, check 1:OK
#382/18 verifier_sdiv/SDIV32, non-zero reg divisor, check 1 @unpriv:OK
#382/19 verifier_sdiv/SDIV32, non-zero reg divisor, check 2:OK
#382/20 verifier_sdiv/SDIV32, non-zero reg divisor, check 2 @unpriv:OK
#382/21 verifier_sdiv/SDIV32, non-zero reg divisor, check 3:OK
#382/22 verifier_sdiv/SDIV32, non-zero reg divisor, check 3 @unpriv:OK
#382/23 verifier_sdiv/SDIV32, non-zero reg divisor, check 4:OK
#382/24 verifier_sdiv/SDIV32, non-zero reg divisor, check 4 @unpriv:OK
#382/25 verifier_sdiv/SDIV32, non-zero reg divisor, check 5:OK
#382/26 verifier_sdiv/SDIV32, non-zero reg divisor, check 5 @unpriv:OK
#382/27 verifier_sdiv/SDIV32, non-zero reg divisor, check 6:OK
#382/28 verifier_sdiv/SDIV32, non-zero reg divisor, check 6 @unpriv:OK
#382/29 verifier_sdiv/SDIV32, non-zero reg divisor, check 7:OK
#382/30 verifier_sdiv/SDIV32, non-zero reg divisor, check 7 @unpriv:OK
#382/31 verifier_sdiv/SDIV32, non-zero reg divisor, check 8:OK
#382/32 verifier_sdiv/SDIV32, non-zero reg divisor, check 8 @unpriv:OK
#382/33 verifier_sdiv/SDIV64, non-zero imm divisor, check 1:OK
#382/34 verifier_sdiv/SDIV64, non-zero imm divisor, check 1 @unpriv:OK
#382/35 verifier_sdiv/SDIV64, non-zero imm divisor, check 2:OK
#382/36 verifier_sdiv/SDIV64, non-zero imm divisor, check 2 @unpriv:OK
#382/37 verifier_sdiv/SDIV64, non-zero imm divisor, check 3:OK
#382/38 verifier_sdiv/SDIV64, non-zero imm divisor, check 3 @unpriv:OK
#382/39 verifier_sdiv/SDIV64, non-zero imm divisor, check 4:OK
#382/40 verifier_sdiv/SDIV64, non-zero imm divisor, check 4 @unpriv:OK
#382/41 verifier_sdiv/SDIV64, non-zero imm divisor, check 5:OK
#382/42 verifier_sdiv/SDIV64, non-zero imm divisor, check 5 @unpriv:OK
#382/43 verifier_sdiv/SDIV64, non-zero imm divisor, check 6:OK
#382/44 verifier_sdiv/SDIV64, non-zero imm divisor, check 6 @unpriv:OK
#382/45 verifier_sdiv/SDIV64, non-zero reg divisor, check 1:OK
#382/46 verifier_sdiv/SDIV64, non-zero reg divisor, check 1 @unpriv:OK
#382/47 verifier_sdiv/SDIV64, non-zero reg divisor, check 2:OK
#382/48 verifier_sdiv/SDIV64, non-zero reg divisor, check 2 @unpriv:OK
#382/49 verifier_sdiv/SDIV64, non-zero reg divisor, check 3:OK
#382/50 verifier_sdiv/SDIV64, non-zero reg divisor, check 3 @unpriv:OK
#382/51 verifier_sdiv/SDIV64, non-zero reg divisor, check 4:OK
#382/52 verifier_sdiv/SDIV64, non-zero reg divisor, check 4 @unpriv:OK
#382/53 verifier_sdiv/SDIV64, non-zero reg divisor, check 5:OK
#382/54 verifier_sdiv/SDIV64, non-zero reg divisor, check 5 @unpriv:OK
#382/55 verifier_sdiv/SDIV64, non-zero reg divisor, check 6:OK
#382/56 verifier_sdiv/SDIV64, non-zero reg divisor, check 6 @unpriv:OK
#382/57 verifier_sdiv/SMOD32, non-zero imm divisor, check 1:OK
#382/58 verifier_sdiv/SMOD32, non-zero imm divisor, check 1 @unpriv:OK
#382/59 verifier_sdiv/SMOD32, non-zero imm divisor, check 2:OK
#382/60 verifier_sdiv/SMOD32, non-zero imm divisor, check 2 @unpriv:OK
#382/61 verifier_sdiv/SMOD32, non-zero imm divisor, check 3:OK
#382/62 verifier_sdiv/SMOD32, non-zero imm divisor, check 3 @unpriv:OK
#382/63 verifier_sdiv/SMOD32, non-zero imm divisor, check 4:OK
#382/64 verifier_sdiv/SMOD32, non-zero imm divisor, check 4 @unpriv:OK
#382/65 verifier_sdiv/SMOD32, non-zero imm divisor, check 5:OK
#382/66 verifier_sdiv/SMOD32, non-zero imm divisor, check 5 @unpriv:OK
#382/67 verifier_sdiv/SMOD32, non-zero imm divisor, check 6:OK
#382/68 verifier_sdiv/SMOD32, non-zero imm divisor, check 6 @unpriv:OK
#382/69 verifier_sdiv/SMOD32, non-zero reg divisor, check 1:OK
#382/70 verifier_sdiv/SMOD32, non-zero reg divisor, check 1 @unpriv:OK
#382/71 verifier_sdiv/SMOD32, non-zero reg divisor, check 2:OK
#382/72 verifier_sdiv/SMOD32, non-zero reg divisor, check 2 @unpriv:OK
#382/73 verifier_sdiv/SMOD32, non-zero reg divisor, check 3:OK
#382/74 verifier_sdiv/SMOD32, non-zero reg divisor, check 3 @unpriv:OK
#382/75 verifier_sdiv/SMOD32, non-zero reg divisor, check 4:OK
#382/76 verifier_sdiv/SMOD32, non-zero reg divisor, check 4 @unpriv:OK
#382/77 verifier_sdiv/SMOD32, non-zero reg divisor, check 5:OK
#382/78 verifier_sdiv/SMOD32, non-zero reg divisor, check 5 @unpriv:OK
#382/79 verifier_sdiv/SMOD32, non-zero reg divisor, check 6:OK
#382/80 verifier_sdiv/SMOD32, non-zero reg divisor, check 6 @unpriv:OK
#382/81 verifier_sdiv/SMOD64, non-zero imm divisor, check 1:OK
#382/82 verifier_sdiv/SMOD64, non-zero imm divisor, check 1 @unpriv:OK
#382/83 verifier_sdiv/SMOD64, non-zero imm divisor, check 2:OK
#382/84 verifier_sdiv/SMOD64, non-zero imm divisor, check 2 @unpriv:OK
#382/85 verifier_sdiv/SMOD64, non-zero imm divisor, check 3:OK
#382/86 verifier_sdiv/SMOD64, non-zero imm divisor, check 3 @unpriv:OK
#382/87 verifier_sdiv/SMOD64, non-zero imm divisor, check 4:OK
#382/88 verifier_sdiv/SMOD64, non-zero imm divisor, check 4 @unpriv:OK
#382/89 verifier_sdiv/SMOD64, non-zero imm divisor, check 5:OK
#382/90 verifier_sdiv/SMOD64, non-zero imm divisor, check 5 @unpriv:OK
#382/91 verifier_sdiv/SMOD64, non-zero imm divisor, check 6:OK
#382/92 verifier_sdiv/SMOD64, non-zero imm divisor, check 6 @unpriv:OK
#382/93 verifier_sdiv/SMOD64, non-zero imm divisor, check 7:OK
#382/94 verifier_sdiv/SMOD64, non-zero imm divisor, check 7 @unpriv:OK
#382/95 verifier_sdiv/SMOD64, non-zero imm divisor, check 8:OK
#382/96 verifier_sdiv/SMOD64, non-zero imm divisor, check 8 @unpriv:OK
#382/97 verifier_sdiv/SMOD64, non-zero reg divisor, check 1:OK
#382/98 verifier_sdiv/SMOD64, non-zero reg divisor, check 1 @unpriv:OK
#382/99 verifier_sdiv/SMOD64, non-zero reg divisor, check 2:OK
#382/100 verifier_sdiv/SMOD64, non-zero reg divisor, check 2 @unpriv:OK
#382/101 verifier_sdiv/SMOD64, non-zero reg divisor, check 3:OK
#382/102 verifier_sdiv/SMOD64, non-zero reg divisor, check 3 @unpriv:OK
#382/103 verifier_sdiv/SMOD64, non-zero reg divisor, check 4:OK
#382/104 verifier_sdiv/SMOD64, non-zero reg divisor, check 4 @unpriv:OK
#382/105 verifier_sdiv/SMOD64, non-zero reg divisor, check 5:OK
#382/106 verifier_sdiv/SMOD64, non-zero reg divisor, check 5 @unpriv:OK
#382/107 verifier_sdiv/SMOD64, non-zero reg divisor, check 6:OK
#382/108 verifier_sdiv/SMOD64, non-zero reg divisor, check 6 @unpriv:OK
#382/109 verifier_sdiv/SMOD64, non-zero reg divisor, check 7:OK
#382/110 verifier_sdiv/SMOD64, non-zero reg divisor, check 7 @unpriv:OK
#382/111 verifier_sdiv/SMOD64, non-zero reg divisor, check 8:OK
#382/112 verifier_sdiv/SMOD64, non-zero reg divisor, check 8 @unpriv:OK
#382/113 verifier_sdiv/SDIV32, zero divisor:OK
#382/114 verifier_sdiv/SDIV32, zero divisor @unpriv:OK
#382/115 verifier_sdiv/SDIV64, zero divisor:OK
#382/116 verifier_sdiv/SDIV64, zero divisor @unpriv:OK
#382/117 verifier_sdiv/SMOD32, zero divisor:OK
#382/118 verifier_sdiv/SMOD32, zero divisor @unpriv:OK
#382/119 verifier_sdiv/SMOD64, zero divisor:OK
#382/120 verifier_sdiv/SMOD64, zero divisor @unpriv:OK
#382 verifier_sdiv:OK
Summary: 5/163 PASSED, 0 SKIPPED, 0 FAILED

As the selftests don't compile for 32-bit architectures without modifications due to long being 32-bit, I have added new tests to lib/test_bpf.c for cpuv4 insns, all are passing:

test_bpf: Summary: 1052 PASSED, 0 FAILED, [891/1040 JIT'ed]
test_bpf: test_tail_calls: Summary: 10 PASSED, 0 FAILED, [10/10 JIT'ed]
test_bpf: test_skb_segment: Summary: 2 PASSED, 0 FAILED

[1] https://lore.kernel.org/all/mb61p5y4u3ptd.fsf@amazon.com/
====================

Link: https://lore.kernel.org/r/20230907230550.1417590-1-puranjay12@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-16  MAINTAINERS: Add myself for ARM32 BPF JIT maintainer.  (Puranjay Mohan; 1 file, -2/+3)
As Shubham has been inactive since 2017, add myself as maintainer for the ARM32 BPF JIT.

Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
Link: https://lore.kernel.org/r/20230907230550.1417590-10-puranjay12@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-16  bpf/tests: add tests for cpuv4 instructions  (Puranjay Mohan; 2 files, -4/+417)
The BPF JITs now support cpuv4 instructions. Add tests for these new instructions to the test suite:

1. Sign extended Load
2. Sign extended Mov
3. Unconditional byte swap
4. Unconditional jump with 32-bit offset
5. Signed division and modulo

Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
Link: https://lore.kernel.org/r/20230907230550.1417590-9-puranjay12@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-16  selftest, bpf: enable cpu v4 tests for arm32  (Puranjay Mohan; 5 files, -5/+10)
Now that all the cpuv4 instructions are supported by the arm32 JIT, enable the selftests for arm32. Signed-off-by: Puranjay Mohan <puranjay12@gmail.com> Link: https://lore.kernel.org/r/20230907230550.1417590-8-puranjay12@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-16  arm32, bpf: add support for 64 bit division instruction  (Puranjay Mohan; 1 file, -1/+115)
ARM32 doesn't have instructions to do 64-bit/64-bit divisions. So, to implement the following instructions:

  BPF_ALU64 | BPF_DIV
  BPF_ALU64 | BPF_MOD
  BPF_ALU64 | BPF_SDIV
  BPF_ALU64 | BPF_SMOD

we do function calls to div64_u64() and div64_u64_rem() for unsigned division/mod, and calls to div64_s64() for signed division/mod.

Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/20230907230550.1417590-7-puranjay12@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
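A sketch of the helper mapping described above (the wrapper names are made up; div64_u64(), div64_u64_rem() and div64_s64() are the kernel's existing 64-bit math routines from linux/math64.h):

  #include <linux/math64.h>

  /* BPF_ALU64 | BPF_DIV (unsigned) */
  static u64 jit_udiv64(u64 dividend, u64 divisor)
  {
          return div64_u64(dividend, divisor);
  }

  /* BPF_ALU64 | BPF_MOD (unsigned) */
  static u64 jit_umod64(u64 dividend, u64 divisor)
  {
          u64 rem;

          div64_u64_rem(dividend, divisor, &rem);
          return rem;
  }

  /* BPF_ALU64 | BPF_SDIV; SMOD can be derived as
   * dividend - div64_s64(dividend, divisor) * divisor */
  static s64 jit_sdiv64(s64 dividend, s64 divisor)
  {
          return div64_s64(dividend, divisor);
  }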
2023-09-16  arm32, bpf: add support for 32-bit signed division  (Puranjay Mohan; 2 files, -8/+32)
The cpuv4 added a new BPF_SDIV instruction that does signed division. The encoding is similar to BPF_DIV, but BPF_SDIV sets offset=1.

ARM32 already supports 32-bit BPF_DIV, which can be easily extended to support BPF_SDIV, as ARM32 has the SDIV instruction. When the CPU is not ARM-v7, we implement SDIV/SMOD with a function call, similar to the implementation of DIV/MOD.

Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/20230907230550.1417590-6-puranjay12@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-16  arm32, bpf: add support for unconditional bswap instruction  (Puranjay Mohan; 1 file, -3/+5)
The cpuv4 added a new unconditional bswap instruction with the following behaviour:

  BPF_ALU64 | BPF_TO_LE | BPF_END with imm = 16/32/64 means:
    dst = bswap16(dst)
    dst = bswap32(dst)
    dst = bswap64(dst)

As we already support converting to big-endian from little-endian, we can use the same for the unconditional bswap: just treat the unconditional scenario the same as the big-endian conversion.

Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/20230907230550.1417590-5-puranjay12@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
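The semantics written out in C, as a sketch of what the JIT must produce for a 64-bit dst (not the emitter itself):

  switch (imm) {
  case 16:
          dst = __builtin_bswap16(dst);   /* upper 48 bits become 0 */
          break;
  case 32:
          dst = __builtin_bswap32(dst);   /* upper 32 bits become 0 */
          break;
  case 64:
          dst = __builtin_bswap64(dst);
          break;
  }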
2023-09-16  arm32, bpf: add support for sign-extension mov instruction  (Puranjay Mohan; 1 file, -5/+30)
The cpuv4 added a new BPF_MOVSX instruction that sign extends the src before moving it to the destination.

BPF_ALU | BPF_MOVSX sign extends 8-bit and 16-bit operands into 32-bit operands, and zeroes the remaining upper 32 bits.

BPF_ALU64 | BPF_MOVSX sign extends 8-bit, 16-bit, and 32-bit operands into 64-bit operands.

The offset field of the instruction is used to tell the number of bits to use for sign-extension. BPF_MOV and BPF_MOVSX have the same code, but the former sets the offset to 0 and the latter sets it to 8, 16 or 32.

The behaviour of this instruction is:

  dst = (s8,s16,s32)src

On ARM32, the implementation uses LSH and ARSH to extend the 8/16 bits to a 32-bit register, which is then sign extended into the upper 32-bit register using ARSH. For 32-bit operands we just move the value to the destination register and use ARSH to extend it into the upper 32-bit register.

Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/20230907230550.1417590-4-puranjay12@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
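In C, the semantics reduce to a pair of casts; a sketch, where 'off' is the instruction's offset field:

  /* BPF_ALU64 | BPF_MOVSX */
  switch (off) {
  case 8:
          dst = (s64)(s8)src;
          break;
  case 16:
          dst = (s64)(s16)src;
          break;
  case 32:
          dst = (s64)(s32)src;
          break;
  }

  /* BPF_ALU | BPF_MOVSX: same casts, but the result is truncated to
   * 32 bits and the upper 32 bits of dst are zeroed. */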
2023-09-16  arm32, bpf: add support for sign-extension load instruction  (Puranjay Mohan; 2 files, -1/+75)
The cpuv4 added support for an instruction that is similar to load but also sign-extends the result after the load:

  BPF_MEMSX | <size> | BPF_LDX means dst = *(signed size *) (src + offset)

where <size> can be one of BPF_B, BPF_H, BPF_W.

ARM32 has instructions to load a byte or a half word with sign extension into a 32-bit register. As the JIT uses two 32-bit registers to simulate a 64-bit BPF register, an extra instruction is emitted to sign-extend the result up to the second register.

Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/20230907230550.1417590-3-puranjay12@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
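Written as C, the load semantics for the three sizes are as below (a sketch; on ARM32 the (s64) widening is what costs the extra ARSH into the upper half of the register pair):

  switch (size) {
  case BPF_B:
          dst = (s64)*(s8 *)(src + off);
          break;
  case BPF_H:
          dst = (s64)*(s16 *)(src + off);
          break;
  case BPF_W:
          dst = (s64)*(s32 *)(src + off);
          break;
  }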
2023-09-16  arm32, bpf: add support for 32-bit offset jmp instruction  (Puranjay Mohan; 1 file, -2/+7)
The cpuv4 adds an unconditional jump with 32-bit offset, where the immediate field of the instruction is used to calculate the jump offset:

  BPF_JA | BPF_K | BPF_JMP32 => gotol +imm => PC += imm.

Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/20230907230550.1417590-2-puranjay12@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-15  bpf: Allow to use kfunc XDP hints and frags together  (Larysa Zaremba; 1 file, -1/+8)
There is no fundamental reason why multi-buffer XDP and XDP kfunc RX hints cannot coexist in a single program. Allow those features to be used together by modifying the flags condition for dev-bound-only programs; segments are still prohibited for fully offloaded programs, hence the additional check.

Suggested-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/CAKH8qBuzgtJj=OKMdsxEkyML36VsAuZpcrsXcyqjdKXSJCBq=Q@mail.gmail.com/
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230915083914.65538-1-larysa.zaremba@intel.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
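A sketch of the adjusted condition (BPF_F_XDP_HAS_FRAGS and BPF_F_XDP_DEV_BOUND_ONLY are the uapi flags; the exact placement in the dev-bound init path is assumed):

  /* Multi-buffer is fine for dev-bound programs, but a fully offloaded
   * program cannot handle frags, hence the additional check. */
  if ((attr->prog_flags & BPF_F_XDP_HAS_FRAGS) &&
      !(attr->prog_flags & BPF_F_XDP_DEV_BOUND_ONLY))
          return -EINVAL;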
2023-09-15  Merge branch 'bpf: expose information about netdev xdp-metadata kfunc support'  (Martin KaFai Lau; 12 files, -13/+123)
Stanislav Fomichev says:

====================
Extend netdev netlink family to expose the bitmask with the kfuncs that the device implements. The source of truth is the device's xdp_metadata_ops. There is some amount of auto-generated netlink boilerplate; the change itself is super minimal.

v2:
- add netdev->xdp_metadata_ops NULL check when dumping to netlink (Martin)

Cc: netdev@vger.kernel.org
Cc: Willem de Bruijn <willemb@google.com>
====================

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-09-15  tools: ynl: extend netdev sample to dump xdp-rx-metadata-features  (Stanislav Fomichev; 4 files, -2/+30)
The tool can be used to verify that everything works end to end.

Unrelated updates:
- include tools/include/uapi to pick the latest kernel uapi headers
- print "xdp-features" and "xdp-rx-metadata-features" so it's clear which bitmask is being dumped

Cc: netdev@vger.kernel.org
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230913171350.369987-4-sdf@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-09-15  bpf: expose information about supported xdp metadata kfunc  (Stanislav Fomichev; 8 files, -5/+78)
Add a new xdp-rx-metadata-features member to netdev netlink which exports a bitmask of supported kfuncs. Most of the patch is autogenerated (headers); the only relevant part is netdev.yaml and the changes in netdev-genl.c to marshal into netlink.

Example output on veth:

$ ip link add veth0 type veth peer name veth1 # ifindex == 12
$ ./tools/net/ynl/samples/netdev 12

Select ifc ($ifindex; or 0 = dump; or -2 ntf check): 12
veth1[12]   xdp-features (23): basic redirect rx-sg
            xdp-rx-metadata-features (3): timestamp hash
            xdp-zc-max-segs=0

Cc: netdev@vger.kernel.org
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230913171350.369987-3-sdf@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-09-15  bpf: make it easier to add new metadata kfunc  (Stanislav Fomichev; 3 files, -10/+19)
No functional changes. Instead of having hand-crafted code in bpf_dev_bound_resolve_kfunc, move the kfunc <> xmo handler relationship into XDP_METADATA_KFUNC_xxx. This way, any time a new kfunc is added, we don't have to touch bpf_dev_bound_resolve_kfunc. Also document the XDP_METADATA_KFUNC_xxx arguments, since we now have more than two and it might be confusing what is what.

Cc: netdev@vger.kernel.org
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230913171350.369987-2-sdf@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-09-15  xsk: add multi-buffer support for sockets sharing umem  (Tirthendu Sarkar; 3 files, -1/+6)
Userspace applications indicate their multi-buffer capability to xsk using the XSK_USE_SG socket bind flag. For sockets using shared umem, the bind flag may contain XSK_USE_SG only for the first socket. For any subsequent socket the only option supported is XDP_SHARED_UMEM.

Add the option XDP_UMEM_SG_FLAG in umem config flags to store the multi-buffer handling capability when indicated by the XSK_USE_SG option in the bind flag by the first socket. Use this to derive the multi-buffer capability for subsequent sockets in the xsk core.

Signed-off-by: Tirthendu Sarkar <tirthendu.sarkar@intel.com>
Fixes: 81470b5c3c66 ("xsk: introduce XSK_USE_SG bind flag for xsk socket")
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20230907035032.2627879-1-tirthendu.sarkar@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-15  bpf: Charge modmem for struct_ops trampoline  (Song Liu; 1 file, -4/+22)
Current code charges modmem for regular trampoline, but not for struct_ops trampoline. Add bpf_jit_[charge|uncharge]_modmem() to struct_ops so the trampoline is charged in both cases. Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230914222542.2986059-1-song@kernel.org Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
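A sketch of the accounting pattern this adds; charging PAGE_SIZE is an assumption for illustration:

  err = bpf_jit_charge_modmem(PAGE_SIZE);
  if (err)
          return err;

  /* ... allocate and populate the struct_ops trampoline image ... */

  /* and on the corresponding free (or error) path: */
  bpf_jit_uncharge_modmem(PAGE_SIZE);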
2023-09-14  selftests/bpf: Skip module_fentry_shadow test when bpf_testmod is not available  (Artem Savkov; 1 file, -0/+5)
This test relies on bpf_testmod, so skip it if the module is not available. Fixes: aa3d65de4b900 ("bpf/selftests: Test fentry attachment to shadowed functions") Signed-off-by: Artem Savkov <asavkov@redhat.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20230914124928.340701-1-asavkov@redhat.com
2023-09-14  Merge branch 'seltests-xsk-various-improvements-to-xskxceiver'  (Alexei Starovoitov; 4 files, -261/+368)
Magnus Karlsson says:

====================
seltests/xsk: various improvements to xskxceiver

This patch set implements several improvements to the xsk selftests test suite that I thought were useful while debugging the xsk multi-buffer code and tests. The largest new feature is the ability to be able to execute a single test instead of the whole test suite. This required some surgery on the current code, details below.

Anatomy of the patch set:

1: Print useful info on a per packet basis with the option -v
2: Add a timeout in the transmission loop too. We only used to have one for the Rx thread, but Tx can lock up too waiting for completions.
3: Add an option (-m) to only run the tests (or a single test with a later patch) in a single mode: skb, drv, or zc (zero-copy).
4-5: Preparatory patches to be able to specify a test to run. Need to define the test names in a single structure and their entry points, so we can use this when wanting to run a specific test.
6: Adds a command line option (-l) that lists all the tests.
7: Adds a command line option (-t) that runs a specific test instead of the whole test suite. Can be combined with -m to specify a single mode too.
8: Use ksft_print_msg() uniformly throughout the tests. It was a mix of printf() and ksft_print_msg() before.
9: In some places, we failed the whole test suite instead of a single test in certain circumstances. Fix this so only the test in question is failed and the rest of the test suite continues.
10: Display the available command line options with -h

v3 -> v4:
* Fixed another spelling error in patch #9 [Maciej]
* Only allow the actual strings for the -m command [Maciej]
* Move some code from patch #7 to #3 [Maciej]

v2 -> v3:
* Drop the support for environment variables. Probably not useful. [Maciej]
* Fixed spelling mistake in patch #9 [Maciej]
* Fail gracefully if unsupported mode is chosen [Maciej]
* Simplified test run loop [Maciej]

v1 -> v2:
* Introduce XSKTEST_MODE env variable to be able to set the mode to use [Przemyslaw]
* Introduce XSKTEST_ETH env variable to be able to set the ethernet interface to use by introducing a new patch (#11) [Magnus]
* Fixed spelling error in patch #5 [Przemyslaw, Maciej]
* Fixed confusing documentation in patch #10 [Przemyslaw]
* The -l option can now be used without being root [Magnus, Maciej]
* Fixed documentation error in patch #7 [Maciej]
* Added error handling to the -t option [Maciej]
* -h now displayed as an option [Maciej]

Thanks: Magnus
====================

Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20230914084900.492-1-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-14  selftests/xsk: display command line options with -h  (Magnus Karlsson; 2 files, -2/+14)
Add the -h option to display all available command line options available for test_xsk.sh and xskxceiver. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/r/20230914084900.492-11-magnus.karlsson@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-14  selftests/xsk: fail single test instead of all tests  (Magnus Karlsson; 1 file, -24/+46)
In a number of places, when an error occurs, exit_with_error() is called, which terminates the whole test suite. This is not always desirable, as it would be more logical to only fail that test and then go along with the other ones. So change this in a number of places in which I thought it would be more logical to just fail the test in question. Examples of this are in code that is only used by a single test.

Also delete a pointless if-statement in receive_pkts() that has an exit_with_error() in it. It can never occur since the return value is unsigned and the test is for less than zero.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/r/20230914084900.492-10-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-14  selftests/xsk: use ksft_print_msg uniformly  (Magnus Karlsson; 1 file, -12/+13)
Use ksft_print_msg() instead of printf() and fprintf() in all places, as the kselftests framework is being used. There is only one exception, and that is for the list-of-tests print-out option, since no tests are run in that case.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/r/20230914084900.492-9-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-14  selftests/xsk: add option to run single test  (Magnus Karlsson; 3 files, -21/+48)
Add a command line option to be able to run a single test. This option (-t) takes a number from the list of tests available with the "-l" option. Here are two examples:

Run test number 2, the "receive single packet" test, in all available modes:

  ./test_xsk.sh -t 2

Run test number 21, the metadata copy test, in skb mode only:

  ./test_xsk.sh -t 21 -m skb

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/r/20230914084900.492-8-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-14  selftests/xsk: add option that lists all tests  (Magnus Karlsson; 3 files, -7/+42)
Add a command line option (-l) that lists all the tests. The number before the test will be used in the next commit for specifying a single test to run. Here is an example of the output:

Tests:
0: SEND_RECEIVE
1: SEND_RECEIVE_2K_FRAME
2: SEND_RECEIVE_SINGLE_PKT
3: POLL_RX
4: POLL_TX
5: POLL_RXQ_FULL
6: POLL_TXQ_FULL
7: SEND_RECEIVE_UNALIGNED
:
:

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/r/20230914084900.492-7-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-14  selftests/xsk: declare test names in struct  (Magnus Karlsson; 2 files, -171/+57)
Declare the test names statically in a struct so that we can refer to them when adding the support to execute a single test in the next commit. Before this patch, the names of them were not declared in a single place which made it not possible to refer to them. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/r/20230914084900.492-6-magnus.karlsson@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-14  selftests/xsk: move all tests to separate functions  (Magnus Karlsson; 1 file, -55/+115)
Prepare for the capability to be able to run a single test by moving all the tests to their own functions. This function can then be called to execute that test in the next commit. Also, the tests named RUN_TO_COMPLETION_* were not named well, so change them to SEND_RECEIVE_* as it is just a basic send and receive test of 4K packets. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/r/20230914084900.492-5-magnus.karlsson@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-14  selftests/xsk: add option to only run tests in a single mode  (Magnus Karlsson; 3 files, -8/+47)
Add an option -m on the command line that allows the user to run the tests in a single mode instead of all of them. Valid modes are skb, drv, and zc (zero-copy). An example, to run the test suite in drv mode only:

  ./test_xsk.sh -m drv

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/r/20230914084900.492-4-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-14  selftests/xsk: add timeout for Tx thread  (Magnus Karlsson; 1 file, -4/+22)
Add a timeout for the transmission thread. If packets are not completed properly, for some reason, the test harness would previously get stuck forever in a while loop. But with this patch, this timeout will trigger, flag the test as a failure, and continue with the next test. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/r/20230914084900.492-3-magnus.karlsson@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-14  selftests/xsk: print per packet info in verbose mode  (Magnus Karlsson; 1 file, -3/+10)
Print info about every packet in verbose mode, both for Tx and Rx. This is useful to have when a test fails or to validate that a test is really doing what it was designed to do. Info on what is supposed to be received and sent is also printed for the custom packet streams, since they differ from the base line. Here is an example:

Tx addr: 37e0 len: 64 options: 0 pkt_nb: 8
Tx addr: 4000 len: 64 options: 0 pkt_nb: 9
Rx: addr: 100 len: 64 options: 0 pkt_nb: 0 valid: 1
Rx: addr: 1100 len: 64 options: 0 pkt_nb: 1 valid: 1
Rx: addr: 2100 len: 64 options: 0 pkt_nb: 4 valid: 1
Rx: addr: 3100 len: 64 options: 0 pkt_nb: 8 valid: 1
Rx: addr: 4100 len: 64 options: 0 pkt_nb: 9 valid: 1

One pointless verbose print statement is also deleted and another one is made clearer.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/r/20230914084900.492-2-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-14  docs/bpf: update out-of-date doc in BPF flow dissector  (Quan Tian; 1 file, -1/+1)
Commit a5e2151ff9d5 ("net/ipv6: SKB symmetric hash should incorporate transport ports") removed the use of FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL in __skb_get_hash_symmetric(), making the doc out-of-date. Signed-off-by: Quan Tian <qtian@vmware.com> Link: https://lore.kernel.org/r/20230911152353.8280-1-qtian@vmware.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-09-12  Merge branch 'bpf-x64-fix-tailcall-infinite-loop'  (Alexei Starovoitov; 7 files, -12/+305)
Leon Hwang says:

====================
bpf, x64: Fix tailcall infinite loop

This patch series fixes a tailcall infinite loop on x64.

From commit ebf7d1f508a73871 ("bpf, x64: rework pro/epilogue and tailcall handling in JIT"), the tailcall on x64 works better than before. From commit e411901c0b775a3a ("bpf: allow for tailcalls in BPF subprograms for x64 JIT"), tailcall is able to run in BPF subprograms on x64. From commit 5b92a28aae4dd0f8 ("bpf: Support attaching tracing BPF program to other BPF programs"), BPF program is able to trace other BPF programs.

How about combining them all together?

1. FENTRY/FEXIT on a BPF subprogram.
2. A tailcall runs in the BPF subprogram.
3. The tailcall calls the subprogram's caller.

As a result, a tailcall infinite loop comes up. And the loop would halt the machine.

As we know, in tail call context, the tail_call_cnt propagates by stack and rax register between BPF subprograms. So it does in trampolines, too.

How did I discover the bug?

From commit 7f6e4312e15a5c37 ("bpf: Limit caller's stack depth 256 for subprogs with tailcalls"), the total stack size limits to around 8KiB. Then, I wrote some bpf progs to validate the stack consumption, namely tailcalls running in bpf2bpf and FENTRY/FEXIT tracing on bpf2bpf. At that time, accidentally, I made a tailcall loop. And then the loop halted my VM. Without the loop, the bpf progs would consume over 8KiB stack size. But the stack-overflow did not halt my VM.

With bpf_printk(), I confirmed that the tail call count limit did not work as expected. Next, I read the code and fixed it.

Thanks to Ilya Leoshkevich, this bug on s390x has been fixed. Hopefully, this bug on arm64 will be fixed in the near future.
====================

Link: https://lore.kernel.org/r/20230912150442.2009-1-hffilwlqm@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-12  selftests/bpf: Add testcases for tailcall infinite loop fixing  (Leon Hwang; 3 files, -4/+269)
Add 4 test cases to confirm the tailcall infinite loop bug has been fixed. Like the tailcall_bpf2bpf cases, do fentry/fexit on the bpf2bpf, and then check the final count result.

tools/testing/selftests/bpf/test_progs -t tailcalls
226/13 tailcalls/tailcall_bpf2bpf_fentry:OK
226/14 tailcalls/tailcall_bpf2bpf_fexit:OK
226/15 tailcalls/tailcall_bpf2bpf_fentry_fexit:OK
226/16 tailcalls/tailcall_bpf2bpf_fentry_entry:OK
226 tailcalls:OK
Summary: 1/16 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Leon Hwang <hffilwlqm@gmail.com>
Link: https://lore.kernel.org/r/20230912150442.2009-4-hffilwlqm@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-12  bpf, x64: Fix tailcall infinite loop  (Leon Hwang; 4 files, -8/+32)
From commit ebf7d1f508a73871 ("bpf, x64: rework pro/epilogue and tailcall handling in JIT"), the tailcall on x64 works better than before. From commit e411901c0b775a3a ("bpf: allow for tailcalls in BPF subprograms for x64 JIT"), tailcall is able to run in BPF subprograms on x64. From commit 5b92a28aae4dd0f8 ("bpf: Support attaching tracing BPF program to other BPF programs"), BPF program is able to trace other BPF programs.

How about combining them all together?

1. FENTRY/FEXIT on a BPF subprogram.
2. A tailcall runs in the BPF subprogram.
3. The tailcall calls the subprogram's caller.

As a result, a tailcall infinite loop comes up. And the loop would halt the machine. As we know, in tail call context, the tail_call_cnt propagates by stack and rax register between BPF subprograms. So it must in trampolines, too.

Fixes: ebf7d1f508a7 ("bpf, x64: rework pro/epilogue and tailcall handling in JIT")
Fixes: e411901c0b77 ("bpf: allow for tailcalls in BPF subprograms for x64 JIT")
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Leon Hwang <hffilwlqm@gmail.com>
Link: https://lore.kernel.org/r/20230912150442.2009-3-hffilwlqm@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
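An illustrative shape of the loop in BPF C, under the three conditions above (names and map setup are made up; attaching an fexit program to 'subprog' closes the loop because the trampoline did not propagate tail_call_cnt):

  struct {
          __uint(type, BPF_MAP_TYPE_PROG_ARRAY);
          __uint(max_entries, 1);
          __uint(key_size, sizeof(__u32));
          __uint(value_size, sizeof(__u32));
  } jmp_table SEC(".maps");

  static __noinline int subprog(struct __sk_buff *skb)
  {
          /* Slot 0 holds 'entry', i.e. this subprog's caller. */
          bpf_tail_call_static(skb, &jmp_table, 0);
          return 0;
  }

  SEC("tc")
  int entry(struct __sk_buff *skb)
  {
          return subprog(skb);    /* bpf2bpf call */
  }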
2023-09-12  bpf, x64: Comment tail_call_cnt initialisation  (Leon Hwang; 1 file, -0/+4)
Without understanding emit_prologue(), it is really hard to figure out where tail_call_cnt comes from, even after searching for tail_call_cnt in the whole kernel repo. By adding these comments, it becomes a little easier to understand the tail_call_cnt initialisation.

Signed-off-by: Leon Hwang <hffilwlqm@gmail.com>
Link: https://lore.kernel.org/r/20230912150442.2009-2-hffilwlqm@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-12  selftests/bpf: Correct map_fd to data_fd in tailcalls  (Leon Hwang; 1 file, -16/+16)
Get and check data_fd. It should not check map_fd again. Meanwhile, correct some 'return' to 'goto out'. Thanks to Maciej for the suggestion in the "bpf, x64: Fix tailcall infinite loop" discussions [0].

[0] https://lore.kernel.org/bpf/e496aef8-1f80-0f8e-dcdd-25a8c300319a@gmail.com/T/#m7d3b601066ba66400d436b7e7579b2df4a101033

Fixes: 79d49ba048ec ("bpf, testing: Add various tail call test cases")
Fixes: 3b0379111197 ("selftests/bpf: Add tailcall_bpf2bpf tests")
Fixes: 5e0b0a4c52d3 ("selftests/bpf: Test tail call counting with bpf2bpf and data on stack")
Signed-off-by: Leon Hwang <hffilwlqm@gmail.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20230906154256.95461-1-hffilwlqm@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-09-09  bpftool: Fix -Wcast-qual warning  (Denys Zagorui; 1 file, -1/+1)
This cast was made on purpose for older libbpf, where the bpf_object_skeleton field is void * instead of const void *, to eliminate a warning (as I understand, -Wincompatible-pointer-types-discards-qualifiers), but this cast introduces another warning (-Wcast-qual) for libbpf, where the data field is const void *.

It makes sense for bpftool to be in sync with libbpf from the kernel sources.

Signed-off-by: Denys Zagorui <dzagorui@cisco.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20230907090210.968612-1-dzagorui@cisco.com
2023-09-09  Merge branch 'selftests/bpf: Optimize kallsyms cache'  (Andrii Nakryiko; 5 files, -46/+122)
Rong Tao says:

====================
We need to optimize the kallsyms cache, including optimizations for the number-of-symbols limit. Also, some test cases add new kernel symbols (such as testmods), so we need to refresh kallsyms (reload or refresh).
====================

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-09-09  selftests/bpf: trace_helpers.c: Add a global ksyms initialization mutex  (Rong Tao; 1 file, -0/+4)
As Jirka said [0], we just need to make sure that global ksyms initialization won't race. [0] https://lore.kernel.org/lkml/ZPCbAs3ItjRd8XVh@krava/ Signed-off-by: Rong Tao <rongtao@cestc.cn> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/tencent_5D0A837E219E2CFDCB0495DAD7D5D1204407@qq.com
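A sketch of the serialization this adds, assuming load_kallsyms() in trace_helpers.c is the shared entry point (the inner parsing helper's name is illustrative):

  static pthread_mutex_t ksyms_mutex = PTHREAD_MUTEX_INITIALIZER;

  int load_kallsyms(void)
  {
          int ret;

          pthread_mutex_lock(&ksyms_mutex);
          /* Parse /proc/kallsyms under the lock so concurrent test
           * threads cannot race the global ksyms initialization. */
          ret = do_load_kallsyms();       /* name illustrative */
          pthread_mutex_unlock(&ksyms_mutex);
          return ret;
  }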