diff options
author | Robin Murphy <robin.murphy@arm.com> | 2020-01-15 19:42:39 +0300 |
---|---|---|
committer | Will Deacon <will@kernel.org> | 2020-01-16 18:23:29 +0300 |
commit | 5777eaed566a1d63e344d3dd8f2b5e33be20643e (patch) | |
tree | 9bf4f13c0209f26135e66073760b297ba0cac0d9 /arch/arm64/lib/Makefile | |
parent | 46cf053efec6a3a5f343fead837777efe8252a46 (diff) | |
download | linux-5777eaed566a1d63e344d3dd8f2b5e33be20643e.tar.xz |
arm64: Implement optimised checksum routine
Apparently there exist certain workloads which rely heavily on software
checksumming, for which the generic do_csum() implementation becomes a
significant bottleneck. Therefore let's give arm64 its own optimised
version - for ease of maintenance this foregoes assembly or intrisics,
and is thus not actually arm64-specific, but does rely heavily on C
idioms that translate well to the A64 ISA and the typical load/store
capabilities of most ARMv8 CPU cores.
The resulting increase in checksum throughput scales nicely with buffer
size, tending towards 4x for a small in-order core (Cortex-A53), and up
to 6x or more for an aggressive big core (Ampere eMAG).
Reported-by: Lingyan Huang <huanglingyan2@huawei.com>
Tested-by: Lingyan Huang <huanglingyan2@huawei.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Diffstat (limited to 'arch/arm64/lib/Makefile')
-rw-r--r-- | arch/arm64/lib/Makefile | 6 |
1 files changed, 3 insertions, 3 deletions
diff --git a/arch/arm64/lib/Makefile b/arch/arm64/lib/Makefile index c21b936dc01d..2fc253466dbf 100644 --- a/arch/arm64/lib/Makefile +++ b/arch/arm64/lib/Makefile @@ -1,9 +1,9 @@ # SPDX-License-Identifier: GPL-2.0 lib-y := clear_user.o delay.o copy_from_user.o \ copy_to_user.o copy_in_user.o copy_page.o \ - clear_page.o memchr.o memcpy.o memmove.o memset.o \ - memcmp.o strcmp.o strncmp.o strlen.o strnlen.o \ - strchr.o strrchr.o tishift.o + clear_page.o csum.o memchr.o memcpy.o memmove.o \ + memset.o memcmp.o strcmp.o strncmp.o strlen.o \ + strnlen.o strchr.o strrchr.o tishift.o ifeq ($(CONFIG_KERNEL_MODE_NEON), y) obj-$(CONFIG_XOR_BLOCKS) += xor-neon.o |