author    Eduard Zingerman <eddyz87@gmail.com>    2022-06-21 02:53:42 +0300
committer Alexei Starovoitov <ast@kernel.org>    2022-06-21 03:40:51 +0300
commit    1ade23711971b0eececf0d7fedc29d3c1d2fce01
tree      8e32aa6e6109037ea5020965f237ae3821b17999 /tools
parent    7a42008ca5c700819e4b3003025e5e1695fd1f86
bpf: Inline calls to bpf_loop when callback is known
Calls to `bpf_loop` are replaced with direct loops to avoid
indirection. E.g. the following:

  bpf_loop(10, foo, NULL, 0);

Is replaced by equivalent of the following:

  for (int i = 0; i < 10; ++i)
    foo(i, NULL);

This transformation could be applied when:
- callback is known and does not change during program execution;
- flags passed to `bpf_loop` are always zero.

Inlining logic works as follows:

- During execution simulation, function `update_loop_inline_state` tracks
  the following information for each `bpf_loop` call instruction:
  - is the callback known and constant?
  - are the flags constant and zero?

- Function `optimize_bpf_loop` increases stack depth for functions where
  `bpf_loop` calls can be inlined and invokes `inline_bpf_loop` to apply
  the inlining. The additional stack space is used to spill registers
  R6, R7 and R8. These registers are used as the loop counter, the loop
  maximal bound and the callback context parameter.

Measurements using `benchs/run_bench_bpf_loop.sh` inside QEMU / KVM on
an i7-4710HQ CPU show a drop in latency from 14 ns/op to 2 ns/op.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/r/20220620235344.569325-4-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
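For context, a minimal BPF C sketch of the pattern this patch can inline: a
callback whose address is a known constant and a `bpf_loop` call with zero
flags. The program name, attach section, context struct and callback body
below are illustrative assumptions, not part of the patch itself.

  // SPDX-License-Identifier: GPL-2.0
  /* Illustrative sketch only: shows the bpf_loop pattern the verifier may
   * inline after this patch (known constant callback, zero flags).
   * Program name, section and callback body are made up for the example.
   */
  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  struct loop_ctx {
  	long sum;
  };

  /* Callback invoked once per iteration; returning 0 continues the loop. */
  static long foo(__u32 index, void *ctx)
  {
  	struct loop_ctx *c = ctx;

  	c->sum += index;
  	return 0;
  }

  SEC("tp/syscalls/sys_enter_getpid")
  int loop_example(void *ctx)
  {
  	struct loop_ctx c = { .sum = 0 };

  	/* Callback address is constant and flags are 0, so the verifier may
  	 * rewrite this call into a direct loop roughly equivalent to:
  	 *   for (__u32 i = 0; i < 10; ++i)
  	 *           foo(i, &c);
  	 */
  	bpf_loop(10, foo, &c, 0);
  	return 0;
  }

  char LICENSE[] SEC("license") = "GPL";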