summaryrefslogtreecommitdiff
path: root/tools/perf/scripts/python/stackcollapse.py
diff options
context:
space:
mode:
authorPuranjay Mohan <puranjay@kernel.org>2024-10-26 15:53:37 +0300
committerDaniel Borkmann <daniel@iogearbox.net>2024-10-30 17:29:59 +0300
commit6a4794d5a3e2bf10233ce8a5e53f168e23715e8a (patch)
tree6c79fb12267fb9be945a1e273586783cec9257d3 /tools/perf/scripts/python/stackcollapse.py
parentdb71aae70e3e646d8ba4cb50e4bd4c281a91c804 (diff)
downloadlinux-6a4794d5a3e2bf10233ce8a5e53f168e23715e8a.tar.xz
bpf: bpf_csum_diff: Optimize and homogenize for all archs
1. Optimization ------------ The current implementation copies the 'from' and 'to' buffers to a scratchpad and it takes the bitwise NOT of 'from' buffer while copying. In the next step csum_partial() is called with this scratchpad. so, mathematically, the current implementation is doing: result = csum(to - from) Here, 'to' and '~ from' are copied in to the scratchpad buffer, we need it in the scratchpad buffer because csum_partial() takes a single contiguous buffer and not two disjoint buffers like 'to' and 'from'. We can re write this equation to: result = csum(to) - csum(from) using the distributive property of csum(). this allows 'to' and 'from' to be at different locations and therefore this scratchpad and copying is not needed. This in C code will look like: result = csum_sub(csum_partial(to, to_size, seed), csum_partial(from, from_size, 0)); 2. Homogenization -------------- The bpf_csum_diff() helper calls csum_partial() which is implemented by some architectures like arm and x86 but other architectures rely on the generic implementation in lib/checksum.c The generic implementation in lib/checksum.c returns a 16 bit value but the arch specific implementations can return more than 16 bits, this works out in most places because before the result is used, it is passed through csum_fold() that turns it into a 16-bit value. bpf_csum_diff() directly returns the value from csum_partial() and therefore the returned values could be different on different architectures. see discussion in [1]: for the int value 28 the calculated checksums are: x86 : -29 : 0xffffffe3 generic (arm64, riscv) : 65507 : 0x0000ffe3 arm : 131042 : 0x0001ffe2 Pass the result of bpf_csum_diff() through from32to16() before returning to homogenize this result for all architectures. NOTE: from32to16() is used instead of csum_fold() because csum_fold() does from32to16() + bitwise NOT of the result, which is not what we want to do here. [1] https://lore.kernel.org/bpf/CAJ+HfNiQbOcqCLxFUP2FMm5QrLXUUaj852Fxe3hn_2JNiucn6g@mail.gmail.com/ Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20241026125339.26459-3-puranjay@kernel.org
Diffstat (limited to 'tools/perf/scripts/python/stackcollapse.py')
0 files changed, 0 insertions, 0 deletions