Age | Commit message (Collapse) | Author | Files | Lines |
|
Add a new selftest to check the PTR_UNTRUSTED condition. Below is the
result,
#160 ptr_untrusted:OK
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/r/20230713025642.27477-5-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
We are utilizing BPF LSM to monitor BPF operations within our container
environment. When we add support for raw_tracepoint, it hits below
error.
; (const void *)attr->raw_tracepoint.name);
27: (79) r3 = *(u64 *)(r2 +0)
access beyond the end of member map_type (mend:4) in struct (anon) with off 0 size 8
It can be reproduced with below BPF prog.
SEC("lsm/bpf")
int BPF_PROG(bpf_audit, int cmd, union bpf_attr *attr, unsigned int size)
{
switch (cmd) {
case BPF_RAW_TRACEPOINT_OPEN:
bpf_printk("raw_tracepoint is %s", attr->raw_tracepoint.name);
break;
default:
break;
}
return 0;
}
The reason is that when accessing a field in a union, such as bpf_attr,
if the field is located within a nested struct that is not the first
member of the union, it can result in incorrect field verification.
union bpf_attr {
struct {
__u32 map_type; <<<< Actually it will find that field.
__u32 key_size;
__u32 value_size;
...
};
...
struct {
__u64 name; <<<< We want to verify this field.
__u32 prog_fd;
} raw_tracepoint;
};
Considering the potential deep nesting levels, finding a perfect
solution to address this issue has proven challenging. Therefore, I
propose a solution where we simply skip the verification process if the
field in question is located within a union.
Fixes: 7e3617a72df3 ("bpf: Add array support to btf_struct_access")
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/r/20230713025642.27477-4-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add selftests for nested_strust to check whehter PTR_UNTRUSTED is cleared
as expected, the result as follows:
#141/1 nested_trust/test_read_cpumask:OK
#141/2 nested_trust/test_skb_field:OK <<<<
#141/3 nested_trust/test_invalid_nested_user_cpus:OK
#141/4 nested_trust/test_invalid_nested_offset:OK
#141/5 nested_trust/test_invalid_skb_field:OK <<<<
#141 nested_trust:OK
The #141/2 and #141/5 are newly added.
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/r/20230713025642.27477-3-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Per discussion with Alexei, the PTR_UNTRUSTED flag should not been
cleared when we start to walk a new struct, because the struct in
question may be a struct nested in a union. We should also check and set
this flag before we walk its each member, in case itself is a union.
We will clear this flag if the field is BTF_TYPE_SAFE_RCU_OR_NULL.
Fixes: 6fcd486b3a0a ("bpf: Refactor RCU enforcement in the verifier.")
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/r/20230713025642.27477-2-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Menglong Dong says:
====================
bpf, x86: allow function arguments up to 12 for TRACING
From: Menglong Dong <imagedong@tencent.com>
For now, the BPF program of type BPF_PROG_TYPE_TRACING can only be used
on the kernel functions whose arguments count less than or equal to 6, if
not considering '> 8 bytes' struct argument. This is not friendly at all,
as too many functions have arguments count more than 6. According to the
current kernel version, below is a statistics of the function arguments
count:
argument count | function count
7 | 704
8 | 270
9 | 84
10 | 47
11 | 47
12 | 27
13 | 22
14 | 5
15 | 0
16 | 1
Therefore, let's enhance it by increasing the function arguments count
allowed in arch_prepare_bpf_trampoline(), for now, only x86_64.
In the 1st patch, we save/restore regs with BPF_DW size to make the code
in save_regs()/restore_regs() simpler.
In the 2nd patch, we make arch_prepare_bpf_trampoline() support to copy
function arguments in stack for x86 arch. Therefore, the maximum
arguments can be up to MAX_BPF_FUNC_ARGS for FENTRY, FEXIT and
MODIFY_RETURN. Meanwhile, we clean the potential garbage value when we
copy the arguments on-stack.
And the 3rd patch is for the testcases of the this series.
Changes since v9:
- fix the failed test cases of trampoline_count and get_func_args_test
in the 3rd patch
Changes since v8:
- change the way to test fmod_ret in the 3rd patch
Changes since v7:
- split the testcases, and add fentry_many_args/fexit_many_args to
DENYLIST.aarch64 in 3rd patch
Changes since v6:
- somit nits from commit message and comment in the 1st patch
- remove the inline in get_nr_regs() in the 1st patch
- rename some function and various in the 1st patch
Changes since v5:
- adjust the commit log of the 1st patch, avoiding confusing people that
bugs exist in current code
- introduce get_nr_regs() to get the space that used to pass args on
stack correct in the 2nd patch
- add testcases to tracing_struct.c instead of fentry_test.c and
fexit_test.c
Changes since v4:
- consider the case of the struct in arguments can't be hold by regs
- add comment for some code
- add testcases for MODIFY_RETURN
- rebase to the latest
Changes since v3:
- try make the stack pointer 16-byte aligned. Not sure if I'm right :)
- introduce clean_garbage() to clean the grabage when argument count is 7
- use different data type in bpf_testmod_fentry_test{7,12}
- add testcase for grabage values in ctx
Changes since v2:
- keep MAX_BPF_FUNC_ARGS still
- clean garbage value in upper bytes in the 2nd patch
- move bpf_fentry_test{7,12} to bpf_testmod.c and rename them to
bpf_testmod_fentry_test{7,12} meanwhile in the 3rd patch
Changes since v1:
- change the maximun function arguments to 14 from 12
- add testcases (Jiri Olsa)
- instead EMIT4 with EMIT3_off32 for "lea" to prevent overflow
====================
Link: https://lore.kernel.org/r/20230713040738.1789742-1-imagedong@tencent.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add fentry_many_args.c and fexit_many_args.c to test the fentry/fexit
with 7/11 arguments. As this feature is not supported by arm64 yet, we
disable these testcases for arm64 in DENYLIST.aarch64. We can combine
them with fentry_test.c/fexit_test.c when arm64 is supported too.
Correspondingly, add bpf_testmod_fentry_test7() and
bpf_testmod_fentry_test11() to bpf_testmod.c
Meanwhile, add bpf_modify_return_test2() to test_run.c to test the
MODIFY_RETURN with 7 arguments.
Add bpf_testmod_test_struct_arg_7/bpf_testmod_test_struct_arg_7 in
bpf_testmod.c to test the struct in the arguments.
And the testcases passed on x86_64:
./test_progs -t fexit
Summary: 5/14 PASSED, 0 SKIPPED, 0 FAILED
./test_progs -t fentry
Summary: 3/2 PASSED, 0 SKIPPED, 0 FAILED
./test_progs -t modify_return
Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED
./test_progs -t tracing_struct
Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20230713040738.1789742-4-imagedong@tencent.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
For now, the BPF program of type BPF_PROG_TYPE_TRACING can only be used
on the kernel functions whose arguments count less than or equal to 6, if
not considering '> 8 bytes' struct argument. This is not friendly at all,
as too many functions have arguments count more than 6.
According to the current kernel version, below is a statistics of the
function arguments count:
argument count | function count
7 | 704
8 | 270
9 | 84
10 | 47
11 | 47
12 | 27
13 | 22
14 | 5
15 | 0
16 | 1
Therefore, let's enhance it by increasing the function arguments count
allowed in arch_prepare_bpf_trampoline(), for now, only x86_64.
For the case that we don't need to call origin function, which means
without BPF_TRAMP_F_CALL_ORIG, we need only copy the function arguments
that stored in the frame of the caller to current frame. The 7th and later
arguments are stored in "$rbp + 0x18", and they will be copied to the
stack area following where register values are saved.
For the case with BPF_TRAMP_F_CALL_ORIG, we need prepare the arguments
in stack before call origin function, which means we need alloc extra
"8 * (arg_count - 6)" memory in the top of the stack. Note, there should
not be any data be pushed to the stack before calling the origin function.
So 'rbx' value will be stored on a stack position higher than where stack
arguments are stored for BPF_TRAMP_F_CALL_ORIG.
According to the research of Yonghong, struct members should be all in
register or all on the stack. Meanwhile, the compiler will pass the
argument on regs if the remaining regs can hold the argument. Therefore,
we need save the arguments in order. Otherwise, disorder of the args can
happen. For example:
struct foo_struct {
long a;
int b;
};
int foo(char, char, char, char, char, struct foo_struct,
char);
the arg1-5,arg7 will be passed by regs, and arg6 will by stack. Therefore,
we should save/restore the arguments in the same order with the
declaration of foo(). And the args used as ctx in stack will be like this:
reg_arg6 -- copy from regs
stack_arg2 -- copy from stack
stack_arg1
reg_arg5 -- copy from regs
reg_arg4
reg_arg3
reg_arg2
reg_arg1
We use EMIT3_off32() or EMIT4() for "lea" and "sub". The range of the
imm in "lea" and "sub" is [-128, 127] if EMIT4() is used. Therefore,
we use EMIT3_off32() instead if the imm out of the range.
It works well for the FENTRY/FEXIT/MODIFY_RETURN.
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20230713040738.1789742-3-imagedong@tencent.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
As we already reserve 8 byte in the stack for each reg, it is ok to
store/restore the regs in BPF_DW size. This will make the code in
save_regs()/restore_regs() simpler.
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20230713040738.1789742-2-imagedong@tencent.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
After using "__fallthrough;" in a switch/case block in bpftool's
btf_dumper.c [0], and then turning it into a comment [1] to prevent a
merge conflict in linux-next when the keyword was changed into just
"fallthrough;" [2], we can now drop the comment and use the new keyword,
no underscores.
Also update the other occurrence of "/* fallthrough */" in bpftool.
[0] commit 9fd496848b1c ("bpftool: Support inline annotations when dumping the CFG of a program")
[1] commit 4b7ef71ac977 ("bpftool: Replace "__fallthrough" by a comment to address merge conflict")
[2] commit f7a858bffcdd ("tools: Rename __fallthrough to fallthrough")
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230712152322.81758-1-quentin@isovalent.com
|
|
Alexei Starovoitov says:
====================
v3->v4:
- extra patch 14 from Hou to check for object leaks.
- fixed the race/leak in free_by_rcu_ttrace. Extra hunk in patch 8.
- added Acks and fixed typos.
v2->v3:
- dropped _tail optimization for free_by_rcu_ttrace
- new patch 5 to refactor inc/dec of c->active
- change 'draining' logic in patch 7
- add rcu_barrier in patch 12
- __llist_add-> llist_add(waiting_for_gp_ttrace) in patch 9 to fix race
- David's Ack in patch 13 and explanation that migrate_disable cannot be removed just yet.
v1->v2:
- Fixed race condition spotted by Hou. Patch 7.
v1:
Introduce bpf_mem_cache_free_rcu() that is similar to kfree_rcu except
the objects will go through an additional RCU tasks trace grace period
before being freed into slab.
Patches 1-9 - a bunch of prep work
Patch 10 - a patch from Paul that exports rcu_request_urgent_qs_task().
Patch 12 - the main bpf_mem_cache_free_rcu patch.
Patch 13 - use it in bpf_cpumask.
bpf_local_storage, bpf_obj_drop, qp-trie will be other users eventually.
With additional hack patch to htab that replaces bpf_mem_cache_free with bpf_mem_cache_free_rcu
the following are benchmark results:
- map_perf_test 4 8 16348 1000000
drops from 800k to 600k. Waiting for RCU GP makes objects cache cold.
- bench htab-mem -a -p 8
20% drop in performance and big increase in memory. From 3 Mbyte to 50 Mbyte. As expected.
- bench htab-mem -a -p 16 --use-case add_del_on_diff_cpu
Same performance and better memory consumption.
Before these patches this bench would OOM (with or without 'reuse after GP')
Patch 8 addresses the issue.
At the end the performance drop and additional memory consumption due to _rcu()
were expected and came out to be within reasonable margin.
Without Paul's patch 10 the memory consumption in 'bench htab-mem' is in Gbytes
which wouldn't be acceptable.
Patch 8 is a heuristic to address 'alloc on one cpu, free on another' issue.
It works well in practice. One can probably construct an artificial benchmark
to make heuristic ineffective, but we have to trade off performance, code complexity,
and memory consumption.
The life cycle of objects:
alloc: dequeue free_llist
free: enqeueu free_llist
free_llist above high watermark -> free_by_rcu_ttrace
free_rcu: enqueue free_by_rcu -> waiting_for_gp
after RCU GP waiting_for_gp -> free_by_rcu_ttrace
free_by_rcu_ttrace -> waiting_for_gp_ttrace -> slab
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
The object leak check is cheap. Do it unconditionally to spot difficult races
in bpf_mem_alloc.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230706033447.54696-15-alexei.starovoitov@gmail.com
|
|
Convert bpf_cpumask to bpf_mem_cache_free_rcu.
Note that migrate_disable() in bpf_cpumask_release() is still necessary, since
bpf_cpumask_release() is a dtor. bpf_obj_free_fields() can be converted to do
migrate_disable() there in a follow up.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20230706033447.54696-14-alexei.starovoitov@gmail.com
|
|
Introduce bpf_mem_[cache_]free_rcu() similar to kfree_rcu().
Unlike bpf_mem_[cache_]free() that links objects for immediate reuse into
per-cpu free list the _rcu() flavor waits for RCU grace period and then moves
objects into free_by_rcu_ttrace list where they are waiting for RCU
task trace grace period to be freed into slab.
The life cycle of objects:
alloc: dequeue free_llist
free: enqeueu free_llist
free_rcu: enqueue free_by_rcu -> waiting_for_gp
free_llist above high watermark -> free_by_rcu_ttrace
after RCU GP waiting_for_gp -> free_by_rcu_ttrace
free_by_rcu_ttrace -> waiting_for_gp_ttrace -> slab
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/bpf/20230706033447.54696-13-alexei.starovoitov@gmail.com
|
|
bpf_obj_new() calls bpf_mem_alloc(), but doing alloc/free of 8 elements
is not triggering watermark conditions in bpf_mem_alloc.
Increase to 200 elements to make sure alloc_bulk/free_bulk is exercised.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/bpf/20230706033447.54696-12-alexei.starovoitov@gmail.com
|
|
If a CPU is executing a long series of non-sleeping system calls,
RCU grace periods can be delayed for on the order of a couple hundred
milliseconds. This is normally not a problem, but if each system call
does a call_rcu(), those callbacks can stack up. RCU will eventually
notice this callback storm, but use of rcu_request_urgent_qs_task()
allows the code invoking call_rcu() to give RCU a heads up.
This function is not for general use, not yet, anyway.
Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230706033447.54696-11-alexei.starovoitov@gmail.com
|
|
alloc_bulk() can reuse elements from free_by_rcu_ttrace.
Let it reuse from waiting_for_gp_ttrace as well to avoid unnecessary kmalloc().
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230706033447.54696-10-alexei.starovoitov@gmail.com
|
|
To address OOM issue when one cpu is allocating and another cpu is freeing add
a target bpf_mem_cache hint to allocated objects and when local cpu free_llist
overflows free to that bpf_mem_cache. The hint addresses the OOM while
maintaining the same performance for common case when alloc/free are done on the
same cpu.
Note that do_call_rcu_ttrace() now has to check 'draining' flag in one more case,
since do_call_rcu_ttrace() is called not only for current cpu.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/bpf/20230706033447.54696-9-alexei.starovoitov@gmail.com
|
|
The next patch will introduce cross-cpu llist access and existing
irq_work_sync() + drain_mem_cache() + rcu_barrier_tasks_trace() mechanism will
not be enough, since irq_work_sync() + drain_mem_cache() on cpu A won't
guarantee that llist on cpu A are empty. The free_bulk() on cpu B might add
objects back to llist of cpu A. Add 'bool draining' flag.
The modified sequence looks like:
for_each_cpu:
WRITE_ONCE(c->draining, true); // do_call_rcu_ttrace() won't be doing call_rcu() any more
irq_work_sync(); // wait for irq_work callback (free_bulk) to finish
drain_mem_cache(); // free all objects
rcu_barrier_tasks_trace(); // wait for RCU callbacks to execute
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/bpf/20230706033447.54696-8-alexei.starovoitov@gmail.com
|
|
In certain scenarios alloc_bulk() might be taking free objects mainly from
free_by_rcu_ttrace list. In such case get_memcg() and set_active_memcg() are
redundant, but they show up in perf profile. Split the loop and only set memcg
when allocating from slab. No performance difference in this patch alone, but
it helps in combination with further patches.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/bpf/20230706033447.54696-7-alexei.starovoitov@gmail.com
|
|
Factor out local_inc/dec_return(&c->active) into helpers.
No functional changes.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/bpf/20230706033447.54696-6-alexei.starovoitov@gmail.com
|
|
Factor out inner body of alloc_bulk into separate helper.
No functional changes.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/bpf/20230706033447.54696-5-alexei.starovoitov@gmail.com
|
|
Let free_all() helper return the number of freed elements.
It's not used in this patch, but helps in debug/development of bpf_mem_alloc.
For example this diff for __free_rcu():
- free_all(llist_del_all(&c->waiting_for_gp_ttrace), !!c->percpu_size);
+ printk("cpu %d freed %d objs after tasks trace\n", raw_smp_processor_id(),
+ free_all(llist_del_all(&c->waiting_for_gp_ttrace), !!c->percpu_size));
would show how busy RCU tasks trace is.
In artificial benchmark where one cpu is allocating and different cpu is freeing
the RCU tasks trace won't be able to keep up and the list of objects
would keep growing from thousands to millions and eventually OOMing.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/bpf/20230706033447.54696-4-alexei.starovoitov@gmail.com
|
|
Use kmemdup() to simplify the code.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/bpf/20230706033447.54696-3-alexei.starovoitov@gmail.com
|
|
Rename:
- struct rcu_head rcu;
- struct llist_head free_by_rcu;
- struct llist_head waiting_for_gp;
- atomic_t call_rcu_in_progress;
+ struct llist_head free_by_rcu_ttrace;
+ struct llist_head waiting_for_gp_ttrace;
+ struct rcu_head rcu_ttrace;
+ atomic_t call_rcu_ttrace_in_progress;
...
- static void do_call_rcu(struct bpf_mem_cache *c)
+ static void do_call_rcu_ttrace(struct bpf_mem_cache *c)
to better indicate intended use.
The 'tasks trace' is shortened to 'ttrace' to reduce verbosity.
No functional changes.
Later patches will add free_by_rcu/waiting_for_gp fields to be used with normal RCU.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/bpf/20230706033447.54696-2-alexei.starovoitov@gmail.com
|
|
Add a per-cpu array resizing use case and demonstrate how
bpf_get_smp_processor_id() can be used to directly access proper data
with no extra checks.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230711232400.1658562-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
bpf_get_smp_processor_id() helper returns current CPU on which BPF
program runs. It can't return value that is bigger than maximum allowed
number of CPUs (minus one, due to zero indexing). Teach BPF verifier to
recognize that. This makes it possible to use bpf_get_smp_processor_id()
result to index into arrays without extra checks, as demonstrated in
subsequent selftests/bpf patch.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230711232400.1658562-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
links'
Yafang Shao says:
====================
This patchset enhances the usability of kprobe_multi program by introducing
support for ->fill_link_info. This allows users to easily determine the
probed functions associated with a kprobe_multi program. While
`bpftool perf show` already provides information about functions probed by
perf_event programs, supporting ->fill_link_info ensures consistent access
to this information across all bpf links.
In addition, this patch extends support to generic perf events, which are
currently not covered by `bpftool perf show`. While userspace is exposed to
only the perf type and config, other attributes such as sample_period and
sample_freq are disregarded.
To ensure accurate identification of probed functions, it is preferable to
expose the address directly rather than relying solely on the symbol name.
However, this implementation respects the kptr_restrict setting and avoids
exposing the address if it is not permitted.
v6->v7:
- From Daniel
- No need to explicitly cast in many places
- Use ptr_to_u64() instead of the cast
- return -ENOMEM when calloc fails
- Simplify the code in bpf_get_kprobe_info() further
- Squash #9 with #8
- And other coding style improvement
- From Andrii
- Comment improvement
- Use ENOSPC instead of E2BIG
- Use strlen only when buf in not NULL
- Clear probe_addr in bpf_get_uprobe_info()
v5->v6:
- From Andrii
- if ucount is too less, copy ucount items and return -E2BIG
- zero out kmulti_link->cnt elements if it is not permitted by kptr
- avoid leaking information when ucount is greater than kmulti_link->cnt
- drop the flags, and add BPF_PERF_EVENT_[UK]RETPROBE
- From Quentin
- use jsonw_null instead when we have no module name
- add explanation on perf_type_name in the commit log
- avoid the unnecessary out lable
v4->v5:
- Print "func [module]" in the kprobe_multi header (Andrii)
- Remove MAX_BPF_PERF_EVENT_TYPE (Alexei)
- Add padding field for future reuse (Yonghong)
v3->v4:
- From Quentin
- Rename MODULE_NAME_LEN to MODULE_MAX_NAME
- Convert retprobe to boolean for json output
- Trim the square brackets around module names for json output
- Move perf names into link.c
- Use a generic helper to get perf names
- Show address before func name, for consistency
- Use switch-case instead of if-else
- Increase the buff len to PATH_MAX
- Move macros to the top of the file
- From Andrii
- kprobe_multi flags should always be returned
- Keep it single line if it fits in under 100 characters
- Change the output format when showing kprobe_multi
- Imporve the format of perf_event names
- Rename struct perf_link to struct perf_event, and change the names of
the enum consequently
- From Yonghong
- Avoid disallowing extensions for all structs in the big union
- From Jiri
- Add flags to bpf_kprobe_multi_link
- Report kprobe_multi selftests errors
- Rename bpf_perf_link_fill_name and make it a separate patch
- Avoid breaking compilation when CONFIG_KPROBE_EVENTS or
CONFIG_UPROBE_EVENTS options are not defined
v2->v3:
- Expose flags instead of retporbe (Andrii)
- Simplify the check on kmulti_link->cnt (Andrii)
- Use kallsyms_show_value() instead (Andrii)
- Show also the module name for kprobe_multi (Andrii)
- Add new enum bpf_perf_link_type (Andrii)
- Move perf event names into bpftool (Andrii, Quentin, Jiri)
- Keep perf event names in sync with perf tools (Jiri)
v1->v2:
- Fix sparse warning (Stanislav, lkp@intel.com)
- Fix BPF CI build error
- Reuse kernel_syms_load() (Alexei)
- Print 'name' instead of 'func' (Alexei)
- Show whether the probe is retprobe or not (Andrii)
- Add comment for the meaning of perf_event name (Andrii)
- Add support for generic perf event
- Adhere to the kptr_restrict setting
RFC->v1:
- Use a single copy_to_user() instead (Jiri)
- Show also the symbol name in bpftool (Quentin, Alexei)
- Use calloc() instead of malloc() in bpftool (Quentin)
- Avoid having conditional entries in the JSON output (Quentin)
- Drop ->show_fdinfo (Alexei)
- Use __u64 instead of __aligned_u64 for the field addr (Alexei)
- Avoid the contradiction in perf_event name length (Alexei)
- Address a build warning reported by kernel test robot <lkp@intel.com>
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Enhance bpftool to display comprehensive information about exposed
perf_event links, covering uprobe, kprobe, tracepoint, and generic perf
event. The resulting output will include the following details:
$ tools/bpf/bpftool/bpftool link show
3: perf_event prog 14
event software:cpu-clock
bpf_cookie 0
pids perf_event(19483)
4: perf_event prog 14
event hw-cache:LLC-load-misses
bpf_cookie 0
pids perf_event(19483)
5: perf_event prog 14
event hardware:cpu-cycles
bpf_cookie 0
pids perf_event(19483)
6: perf_event prog 19
tracepoint sched_switch
bpf_cookie 0
pids tracepoint(20947)
7: perf_event prog 26
uprobe /home/dev/waken/bpf/uprobe/a.out+0x1338
bpf_cookie 0
pids uprobe(21973)
8: perf_event prog 27
uretprobe /home/dev/waken/bpf/uprobe/a.out+0x1338
bpf_cookie 0
pids uprobe(21973)
10: perf_event prog 43
kprobe ffffffffb70a9660 kernel_clone
bpf_cookie 0
pids kprobe(35275)
11: perf_event prog 41
kretprobe ffffffffb70a9660 kernel_clone
bpf_cookie 0
pids kprobe(35275)
$ tools/bpf/bpftool/bpftool link show -j
[{"id":3,"type":"perf_event","prog_id":14,"event_type":"software","event_config":"cpu-clock","bpf_cookie":0,"pids":[{"pid":19483,"comm":"perf_event"}]},{"id":4,"type":"perf_event","prog_id":14,"event_type":"hw-cache","event_config":"LLC-load-misses","bpf_cookie":0,"pids":[{"pid":19483,"comm":"perf_event"}]},{"id":5,"type":"perf_event","prog_id":14,"event_type":"hardware","event_config":"cpu-cycles","bpf_cookie":0,"pids":[{"pid":19483,"comm":"perf_event"}]},{"id":6,"type":"perf_event","prog_id":19,"tracepoint":"sched_switch","bpf_cookie":0,"pids":[{"pid":20947,"comm":"tracepoint"}]},{"id":7,"type":"perf_event","prog_id":26,"retprobe":false,"file":"/home/dev/waken/bpf/uprobe/a.out","offset":4920,"bpf_cookie":0,"pids":[{"pid":21973,"comm":"uprobe"}]},{"id":8,"type":"perf_event","prog_id":27,"retprobe":true,"file":"/home/dev/waken/bpf/uprobe/a.out","offset":4920,"bpf_cookie":0,"pids":[{"pid":21973,"comm":"uprobe"}]},{"id":10,"type":"perf_event","prog_id":43,"retprobe":false,"addr":18446744072485508704,"func":"kernel_clone","offset":0,"bpf_cookie":0,"pids":[{"pid":35275,"comm":"kprobe"}]},{"id":11,"type":"perf_event","prog_id":41,"retprobe":true,"addr":18446744072485508704,"func":"kernel_clone","offset":0,"bpf_cookie":0,"pids":[{"pid":35275,"comm":"kprobe"}]}]
For generic perf events, the displayed information in bpftool is limited to
the type and configuration, while other attributes such as sample_period,
sample_freq, etc., are not included.
The kernel function address won't be exposed if it is not permitted by
kptr_restrict. The result as follows when kptr_restrict is 2.
$ tools/bpf/bpftool/bpftool link show
3: perf_event prog 14
event software:cpu-clock
4: perf_event prog 14
event hw-cache:LLC-load-misses
5: perf_event prog 14
event hardware:cpu-cycles
6: perf_event prog 19
tracepoint sched_switch
7: perf_event prog 26
uprobe /home/dev/waken/bpf/uprobe/a.out+0x1338
8: perf_event prog 27
uretprobe /home/dev/waken/bpf/uprobe/a.out+0x1338
10: perf_event prog 43
kprobe kernel_clone
11: perf_event prog 41
kretprobe kernel_clone
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230709025630.3735-11-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add new functions and macros to get perf event names. These names except
the perf_type_name are all copied from
tool/perf/util/{parse-events,evsel}.c, so that in the future we will
have a good chance to use the same code.
Suggested-by: Jiri Olsa <olsajiri@gmail.com>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230709025630.3735-10-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
By introducing support for ->fill_link_info to the perf_event link, users
gain the ability to inspect it using `bpftool link show`. While the current
approach involves accessing this information via `bpftool perf show`,
consolidating link information for all link types in one place offers
greater convenience. Additionally, this patch extends support to the
generic perf event, which is not currently accommodated by
`bpftool perf show`. While only the perf type and config are exposed to
userspace, other attributes such as sample_period and sample_freq are
ignored. It's important to note that if kptr_restrict is not permitted, the
probed address will not be exposed, maintaining security measures.
A new enum bpf_perf_event_type is introduced to help the user understand
which struct is relevant.
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230709025630.3735-9-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add a common helper bpf_copy_to_user(), which will be used at multiple
places.
No functional change.
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230709025630.3735-8-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Since different symbols can share the same name, it is insufficient to only
expose the symbol name. It is essential to also expose the symbol address
so that users can accurately identify which one is being probed.
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230709025630.3735-7-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
To avoid returning uninitialized or random values when querying the file
descriptor (fd) and accessing probe_addr, it is necessary to clear the
variable prior to its use.
Fixes: 41bdc4b40ed6 ("bpf: introduce bpf subcommand BPF_TASK_FD_QUERY")
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230709025630.3735-6-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The probed address can be accessed by userspace through querying the task
file descriptor (fd). However, it is crucial to adhere to the kptr_restrict
setting and refrain from exposing the address if it is not permitted.
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230709025630.3735-5-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Show the already expose kprobe_multi link info in bpftool. The result as
follows,
$ tools/bpf/bpftool/bpftool link show
91: kprobe_multi prog 244
kprobe.multi func_cnt 7
addr func [module]
ffffffff98c44f20 schedule_timeout_interruptible
ffffffff98c44f60 schedule_timeout_killable
ffffffff98c44fa0 schedule_timeout_uninterruptible
ffffffff98c44fe0 schedule_timeout_idle
ffffffffc075b8d0 xfs_trans_get_efd [xfs]
ffffffffc0768a10 xfs_trans_get_buf_map [xfs]
ffffffffc076c320 xfs_trans_get_dqtrx [xfs]
pids kprobe_multi(188367)
92: kprobe_multi prog 244
kretprobe.multi func_cnt 7
addr func [module]
ffffffff98c44f20 schedule_timeout_interruptible
ffffffff98c44f60 schedule_timeout_killable
ffffffff98c44fa0 schedule_timeout_uninterruptible
ffffffff98c44fe0 schedule_timeout_idle
ffffffffc075b8d0 xfs_trans_get_efd [xfs]
ffffffffc0768a10 xfs_trans_get_buf_map [xfs]
ffffffffc076c320 xfs_trans_get_dqtrx [xfs]
pids kprobe_multi(188367)
$ tools/bpf/bpftool/bpftool link show -j
[{"id":91,"type":"kprobe_multi","prog_id":244,"retprobe":false,"func_cnt":7,"funcs":[{"addr":18446744071977586464,"func":"schedule_timeout_interruptible","module":null},{"addr":18446744071977586528,"func":"schedule_timeout_killable","module":null},{"addr":18446744071977586592,"func":"schedule_timeout_uninterruptible","module":null},{"addr":18446744071977586656,"func":"schedule_timeout_idle","module":null},{"addr":18446744072643524816,"func":"xfs_trans_get_efd","module":"xfs"},{"addr":18446744072643578384,"func":"xfs_trans_get_buf_map","module":"xfs"},{"addr":18446744072643592992,"func":"xfs_trans_get_dqtrx","module":"xfs"}],"pids":[{"pid":188367,"comm":"kprobe_multi"}]},{"id":92,"type":"kprobe_multi","prog_id":244,"retprobe":true,"func_cnt":7,"funcs":[{"addr":18446744071977586464,"func":"schedule_timeout_interruptible","module":null},{"addr":18446744071977586528,"func":"schedule_timeout_killable","module":null},{"addr":18446744071977586592,"func":"schedule_timeout_uninterruptible","module":null},{"addr":18446744071977586656,"func":"schedule_timeout_idle","module":null},{"addr":18446744072643524816,"func":"xfs_trans_get_efd","module":"xfs"},{"addr":18446744072643578384,"func":"xfs_trans_get_buf_map","module":"xfs"},{"addr":18446744072643592992,"func":"xfs_trans_get_dqtrx","module":"xfs"}],"pids":[{"pid":188367,"comm":"kprobe_multi"}]}]
When kptr_restrict is 2, the result is,
$ tools/bpf/bpftool/bpftool link show
91: kprobe_multi prog 244
kprobe.multi func_cnt 7
92: kprobe_multi prog 244
kretprobe.multi func_cnt 7
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230709025630.3735-4-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
If the kernel symbol is in a module, we will dump the module name as
well. The square brackets around the module name are trimmed.
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230709025630.3735-3-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
With the addition of support for fill_link_info to the kprobe_multi link,
users will gain the ability to inspect it conveniently using the
`bpftool link show`. This enhancement provides valuable information to the
user, including the count of probed functions and their respective
addresses. It's important to note that if the kptr_restrict setting is not
permitted, the probed address will not be exposed, ensuring security.
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230709025630.3735-2-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
__NR_open never exist on AArch64.
Signed-off-by: Rong Tao <rongtao@cestc.cn>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/tencent_C6AD4AD72BEFE813228FC188905F96C6A506@qq.com
|
|
Remove the wrong HASHMAP_INIT. It's not used anywhere in libbpf.
Signed-off-by: John Sanpe <sanpeqf@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230711070712.2064144-1-sanpeqf@gmail.com
|
|
realloc() and reallocarray() can either return NULL or a special
non-NULL pointer, if their size argument is zero. This requires a bit
more care to handle NULL-as-valid-result situation differently from
NULL-as-error case. This has caused real issues before ([0]), and just
recently bit again in production when performing bpf_program__attach_usdt().
This patch fixes 4 places that do or potentially could suffer from this
mishandling of NULL, including the reported USDT-related one.
There are many other places where realloc()/reallocarray() is used and
NULL is always treated as an error value, but all those have guarantees
that their size is always non-zero, so those spot don't need any extra
handling.
[0] d08ab82f59d5 ("libbpf: Fix double-free when linker processes empty sections")
Fixes: 999783c8bbda ("libbpf: Wire up spec management and other arch-independent USDT logic")
Fixes: b63b3c490eee ("libbpf: Add bpf_program__set_insns function")
Fixes: 697f104db8a6 ("libbpf: Support custom SEC() handlers")
Fixes: b12688267280 ("libbpf: Change the order of data and text relocations.")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230711024150.1566433-1-andrii@kernel.org
|
|
The BPF standardization effort is actively underway with the IETF. As
described in the BPF Working Group (WG) charter in [0], there are a
number of proposed documents, some informational and some proposed
standards, that will be drafted as part of the standardization effort.
[0]: https://datatracker.ietf.org/wg/bpf/about/
Though the specific documents that will formally be standardized will
exist as Internet Drafts (I-D) and WG documents in the BPF WG
datatracker page, the source of truth from where those documents will be
generated will reside in the kernel documentation tree (originating in
the bpf-next tree).
Because these documents will be used to generate the I-D and WG
documents which will be standardized with the IETF, they are a bit
special as far as kernel-tree documentation goes:
- They will be dual licensed with LGPL-2.1 OR BSD-2-Clause
- IETF I-D and WG documents (the documents which will actually be
standardized) will be auto-generated from these documents.
In order to keep things clearly organized in the BPF documentation tree,
and to make it abundantly clear where standards-related documentation
needs to go, we should move standards-relevant documents into a separate
standardization/ subdirectory.
Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230710183027.15132-1-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Quentin Monnet says:
====================
At runtime, bpftool may run its own BPF programs to get the pids of
processes referencing BPF programs, or to profile programs. The skeletons
for these programs rely on a vmlinux.h header and may fail to compile when
building bpftool on hosts running older kernels, where some structs or
enums are not defined. In this set, we address this issue by using local
definitions for struct perf_event, struct bpf_perf_link,
BPF_LINK_TYPE_PERF_EVENT (pids.bpf.c) and struct bpf_perf_event_value
(profiler.bpf.c).
This set contains patches 1 to 3 from Alexander Lobakin's series, "bpf:
random unpopular userspace fixes (32 bit et al)" (v2) [0], from April 2022.
An additional patch defines a local version of BPF_LINK_TYPE_PERF_EVENT in
bpftool's pids.bpf.c.
[0] https://lore.kernel.org/bpf/20220421003152.339542-1-alobakin@pm.me/
v2: Fixed description (CO-RE for container_of()) in patch 2.
Cc: Alexander Lobakin <aleksander.lobakin@intel.com>
Cc: Michal Suchánek <msuchanek@suse.de>
Alexander Lobakin (3):
bpftool: use a local copy of perf_event to fix accessing ::bpf_cookie
bpftool: define a local bpf_perf_link to fix accessing its fields
bpftool: use a local bpf_perf_event_value to fix accessing its fields
====================
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
|
|
Fix the following error when building bpftool:
CLANG profiler.bpf.o
CLANG pid_iter.bpf.o
skeleton/profiler.bpf.c:18:21: error: invalid application of 'sizeof' to an incomplete type 'struct bpf_perf_event_value'
__uint(value_size, sizeof(struct bpf_perf_event_value));
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
tools/bpf/bpftool/bootstrap/libbpf/include/bpf/bpf_helpers.h:13:39: note: expanded from macro '__uint'
tools/bpf/bpftool/bootstrap/libbpf/include/bpf/bpf_helper_defs.h:7:8: note: forward declaration of 'struct bpf_perf_event_value'
struct bpf_perf_event_value;
^
struct bpf_perf_event_value is being used in the kernel only when
CONFIG_BPF_EVENTS is enabled, so it misses a BTF entry then.
Define struct bpf_perf_event_value___local with the
`preserve_access_index` attribute inside the pid_iter BPF prog to
allow compiling on any configs. It is a full mirror of a UAPI
structure, so is compatible both with and w/o CO-RE.
bpf_perf_event_read_value() requires a pointer of the original type,
so a cast is needed.
Fixes: 47c09d6a9f67 ("bpftool: Introduce "prog profile" command")
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230707095425.168126-5-quentin@isovalent.com
|
|
In order to allow the BPF program in bpftool's pid_iter.bpf.c to compile
correctly on hosts where vmlinux.h does not define
BPF_LINK_TYPE_PERF_EVENT (running kernel versions lower than 5.15, for
example), define and use a local copy of the enum value. This requires
LLVM 12 or newer to build the BPF program.
Fixes: cbdaf71f7e65 ("bpftool: Add bpf_cookie to link output")
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230707095425.168126-4-quentin@isovalent.com
|
|
When building bpftool with !CONFIG_PERF_EVENTS:
skeleton/pid_iter.bpf.c:47:14: error: incomplete definition of type 'struct bpf_perf_link'
perf_link = container_of(link, struct bpf_perf_link, link);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
tools/bpf/bpftool/bootstrap/libbpf/include/bpf/bpf_helpers.h:74:22: note: expanded from macro 'container_of'
((type *)(__mptr - offsetof(type, member))); \
^~~~~~~~~~~~~~~~~~~~~~
tools/bpf/bpftool/bootstrap/libbpf/include/bpf/bpf_helpers.h:68:60: note: expanded from macro 'offsetof'
#define offsetof(TYPE, MEMBER) ((unsigned long)&((TYPE *)0)->MEMBER)
~~~~~~~~~~~^
skeleton/pid_iter.bpf.c:44:9: note: forward declaration of 'struct bpf_perf_link'
struct bpf_perf_link *perf_link;
^
&bpf_perf_link is being defined and used only under the ifdef.
Define struct bpf_perf_link___local with the `preserve_access_index`
attribute inside the pid_iter BPF prog to allow compiling on any
configs. CO-RE will substitute it with the real struct bpf_perf_link
accesses later on.
container_of() uses offsetof(), which does the necessary CO-RE
relocation if the field is specified with `preserve_access_index` - as
is the case for struct bpf_perf_link___local.
Fixes: cbdaf71f7e65 ("bpftool: Add bpf_cookie to link output")
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230707095425.168126-3-quentin@isovalent.com
|
|
When CONFIG_PERF_EVENTS is not set, struct perf_event remains empty.
However, the structure is being used by bpftool indirectly via BTF.
This leads to:
skeleton/pid_iter.bpf.c:49:30: error: no member named 'bpf_cookie' in 'struct perf_event'
return BPF_CORE_READ(event, bpf_cookie);
~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~
...
skeleton/pid_iter.bpf.c:49:9: error: returning 'void' from a function with incompatible result type '__u64' (aka 'unsigned long long')
return BPF_CORE_READ(event, bpf_cookie);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Tools and samples can't use any CONFIG_ definitions, so the fields
used there should always be present.
Define struct perf_event___local with the `preserve_access_index`
attribute inside the pid_iter BPF prog to allow compiling on any
configs. CO-RE will substitute it with the real struct perf_event
accesses later on.
Fixes: cbdaf71f7e65 ("bpftool: Add bpf_cookie to link output")
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230707095425.168126-2-quentin@isovalent.com
|
|
Don't reset recorded sec_def handler unconditionally on
bpf_program__set_type(). There are two situations where this is wrong.
First, if the program type didn't actually change. In that case original
SEC handler should work just fine.
Second, catch-all custom SEC handler is supposed to work with any BPF
program type and SEC() annotation, so it also doesn't make sense to
reset that.
This patch fixes both issues. This was reported recently in the context
of breaking perf tool, which uses custom catch-all handler for fancy BPF
prologue generation logic. This patch should fix the issue.
[0] https://lore.kernel.org/linux-perf-users/ab865e6d-06c5-078e-e404-7f90686db50d@amd.com/
Fixes: d6e6286a12e7 ("libbpf: disassociate section handler on explicit bpf_program__set_type() call")
Reported-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230707231156.1711948-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
When wrapping code, use ';' better than using ',' which is more in line with
the coding habits of most engineers.
Signed-off-by: Lu Hongfei <luhongfei@vivo.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Hou Tao <houtao1@huawei.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20230707081253.34638-1-luhongfei@vivo.com
|
|
Now that kernel provides a new available_filter_functions_addrs file
which can help us avoid the need to cross-validate
available_filter_functions and kallsyms, we can improve efficiency of
multi-attach kprobes. For example, on my device, the sample program [1]
of start time:
$ sudo ./funccount "tcp_*"
before after
1.2s 1.0s
[1]: https://github.com/JackieLiu1/ketones/tree/master/src/funccount
Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230705091209.3803873-2-liu.yun@linux.dev
|
|
When using regular expression matching with "kprobe multi", it scans all
the functions under "/proc/kallsyms" that can be matched. However, not all
of them can be traced by kprobe.multi. If any one of the functions fails
to be traced, it will result in the failure of all functions. The best
approach is to filter out the functions that cannot be traced to ensure
proper tracking of the functions.
Closes: https://lore.kernel.org/oe-kbuild-all/202307030355.TdXOHklM-lkp@intel.com/
Reported-by: kernel test robot <lkp@intel.com>
Suggested-by: Jiri Olsa <jolsa@kernel.org>
Suggested-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230705091209.3803873-1-liu.yun@linux.dev
|