x86/csum: clean up `csum_partial' further - starfive-tech/linux.git - StarFive Tech Linux Kernel for VisionFive (JH7110) boards (mirror)


author	Linus Torvalds <torvalds@linux-foundation.org>	2023-06-27 23:55:32 +0300
committer	Linus Torvalds <torvalds@linux-foundation.org>	2024-01-05 02:42:30 +0300
commit	a476aae3f1dc78a162a0d2e7945feea7d2b29401 (patch)
tree	b8f68a96d244f3a6d5c14decc4c4eb1a707a69a7 /net/x25
parent	5d4acb62853abac1da2deebcb1c1c5b79219bf3b (diff)
download	linux-a476aae3f1dc78a162a0d2e7945feea7d2b29401.tar.xz

x86/csum: clean up `csum_partial' further

Commit 688eb8191b47 ("x86/csum: Improve performance of `csum_partial`") ended up improving the code generation for the IP csum calculations, and in particular special-casing the 40-byte case that is a hot case for IPv6 headers.

It then had _another_ special case for the 64-byte unrolled loop, which did two chains of 32-byte blocks, which allows modern CPUs to improve performance by doing the chains in parallel thanks to renaming the carry flag.

This just unifies the special cases and combines them into one single helper for the 40-byte csum case, and replaces the 64-byte case by an 80-byte case that just does that single helper twice. It avoids having all these different versions of inline assembly, and actually improved performance further in my tests.

There was never anything magical about the 64-byte unrolled case, even though it happens to be a common size (and typically is the cacheline size).

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
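For illustration only, the structure described above (one 40-byte helper, and an 80-byte case that calls it twice) can be sketched in portable C. This is not the kernel's code: the real helper is x86 inline assembly using add-with-carry instructions, and the names `add64_carry`, `sum_40bytes`, `sum_80bytes` and `fold64` are hypothetical, chosen for this sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: 64-bit add with end-around carry
 * (ones' complement addition), standing in for x86 add/adc. */
static inline uint64_t add64_carry(uint64_t sum, uint64_t val)
{
	sum += val;
	return sum + (sum < val);	/* fold the carry-out back in */
}

/* Hypothetical stand-in for the single 40-byte helper:
 * five 64-bit words accumulated into one running sum. */
static inline uint64_t sum_40bytes(uint64_t sum, const void *buf)
{
	uint64_t w[5];

	memcpy(w, buf, sizeof(w));	/* avoids unaligned/aliasing issues */
	for (int i = 0; i < 5; i++)
		sum = add64_carry(sum, w[i]);
	return sum;
}

/* The 80-byte case is just the 40-byte helper applied twice. */
static inline uint64_t sum_80bytes(uint64_t sum, const void *buf)
{
	sum = sum_40bytes(sum, buf);
	return sum_40bytes(sum, (const unsigned char *)buf + 40);
}

/* Fold a 64-bit ones' complement sum down to 16 bits. */
static inline uint16_t fold64(uint64_t sum)
{
	sum = (sum & 0xffffffff) + (sum >> 32);
	sum = (sum & 0xffffffff) + (sum >> 32);
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}
```

A caller would loop over a buffer in 80-byte steps with `sum_80bytes`, handle one remaining 40-byte block with `sum_40bytes`, and deal with any shorter tail separately. The sketch shows only how the two cases share one helper; it says nothing about the performance the commit measured.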

Diffstat (limited to 'net/x25')

0 files changed, 0 insertions, 0 deletions

