<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/arch/arm64/lib/Makefile, branch linux-5.9.y</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=linux-5.9.y</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=linux-5.9.y'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2020-01-16T15:23:29+00:00</updated>
<entry>
<title>arm64: Implement optimised checksum routine</title>
<updated>2020-01-16T15:23:29+00:00</updated>
<author>
<name>Robin Murphy</name>
<email>robin.murphy@arm.com</email>
</author>
<published>2020-01-15T16:42:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5777eaed566a1d63e344d3dd8f2b5e33be20643e'/>
<id>urn:sha1:5777eaed566a1d63e344d3dd8f2b5e33be20643e</id>
<content type='text'>
Apparently there exist certain workloads which rely heavily on software
checksumming, for which the generic do_csum() implementation becomes a
significant bottleneck. Therefore let's give arm64 its own optimised
version - for ease of maintenance this foregoes assembly or intrinsics,
and is thus not actually arm64-specific, but does rely heavily on C
idioms that translate well to the A64 ISA and the typical load/store
capabilities of most ARMv8 CPU cores.
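
As a rough, standalone sketch of the idea (stdint types; this is not
the actual do_csum() the patch adds): accumulate with 64-bit loads and
explicit carry wrap-around, which translates directly to A64 ADDS/ADCS
sequences:

    #include &lt;stdint.h&gt;
    #include &lt;stddef.h&gt;

    /* Fold a 64-bit one's complement accumulator down to 16 bits,
     * re-adding each carry as one's complement arithmetic requires. */
    static uint16_t csum_fold64(uint64_t sum)
    {
            sum = (sum &gt;&gt; 32) + (sum &amp; 0xffffffffu);
            sum = (sum &gt;&gt; 32) + (sum &amp; 0xffffffffu);
            sum = (sum &gt;&gt; 16) + (sum &amp; 0xffffu);
            sum = (sum &gt;&gt; 16) + (sum &amp; 0xffffu);
            return (uint16_t)sum;
    }

    /* Sum aligned 64-bit words, wrapping each carry back in. */
    static uint64_t csum_words(const uint64_t *p, size_t n)
    {
            uint64_t sum = 0;

            while (n--) {
                    uint64_t v = *p++;

                    sum += v;
                    sum += (sum &lt; v);   /* carry out of the addition */
            }
            return sum;
    }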

The resulting increase in checksum throughput scales nicely with buffer
size, tending towards 4x for a small in-order core (Cortex-A53), and up
to 6x or more for an aggressive big core (Ampere eMAG).

Reported-by: Lingyan Huang &lt;huanglingyan2@huawei.com&gt;
Tested-by: Lingyan Huang &lt;huanglingyan2@huawei.com&gt;
Signed-off-by: Robin Murphy &lt;robin.murphy@arm.com&gt;
Signed-off-by: Will Deacon &lt;will@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge branch 'for-next/atomics' into for-next/core</title>
<updated>2019-08-30T11:55:39+00:00</updated>
<author>
<name>Will Deacon</name>
<email>will@kernel.org</email>
</author>
<published>2019-08-30T11:55:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=61b7cddfe861f239bf39ab19a065e29b58153a80'/>
<id>urn:sha1:61b7cddfe861f239bf39ab19a065e29b58153a80</id>
<content type='text'>
* for-next/atomics: (10 commits)
  Rework LSE instruction selection to use static keys instead of alternatives
</content>
</entry>
<entry>
<title>arm64: atomics: Remove atomic_ll_sc compilation unit</title>
<updated>2019-08-29T14:53:49+00:00</updated>
<author>
<name>Andrew Murray</name>
<email>andrew.murray@arm.com</email>
</author>
<published>2019-08-28T17:50:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=eb3aabbfbfc203082d06a64517df97a3746ba9ea'/>
<id>urn:sha1:eb3aabbfbfc203082d06a64517df97a3746ba9ea</id>
<content type='text'>
We no longer fall back to out-of-line atomics on systems with
CONFIG_ARM64_LSE_ATOMICS where ARM64_HAS_LSE_ATOMICS is not set.

Remove the unused compilation unit which provided these symbols.

Signed-off-by: Andrew Murray &lt;andrew.murray@arm.com&gt;
Signed-off-by: Will Deacon &lt;will@kernel.org&gt;
</content>
</entry>
<entry>
<title>arm64: Add support for function error injection</title>
<updated>2019-08-07T12:53:09+00:00</updated>
<author>
<name>Leo Yan</name>
<email>leo.yan@linaro.org</email>
</author>
<published>2019-08-06T10:00:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=42d038c4fb00f1ec1a4c4616784da4561385b628'/>
<id>urn:sha1:42d038c4fb00f1ec1a4c4616784da4561385b628</id>
<content type='text'>
Inspired by commit 7cd01b08d35f ("powerpc: Add support for function
error injection"), this patch supports function error injection for
Arm64.

This patch mainly adds two functions: regs_set_return_value(), which
overwrites the return value of the probed function, and
override_function_with_return(), which overrides the probed function's
return and jumps straight back to its caller.
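
On arm64 both helpers amount to simple pt_regs updates; a simplified
sketch of their shape (hedged, not the verbatim patch):

    static inline void regs_set_return_value(struct pt_regs *regs,
                                             unsigned long rc)
    {
            /* x0 carries the return value under the AAPCS64. */
            regs-&gt;regs[0] = rc;
    }

    void override_function_with_return(struct pt_regs *regs)
    {
            /* Skip the probed function's body: resume at the return
             * address held in the link register (x30). */
            instruction_pointer_set(regs, procedure_link_pointer(regs));
    }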

Reviewed-by: Masami Hiramatsu &lt;mhiramat@kernel.org&gt;
Signed-off-by: Leo Yan &lt;leo.yan@linaro.org&gt;
Signed-off-by: Will Deacon &lt;will@kernel.org&gt;
</content>
</entry>
<entry>
<title>arm64: Makefile: Replace -pg with CC_FLAGS_FTRACE</title>
<updated>2019-04-09T09:34:59+00:00</updated>
<author>
<name>Torsten Duwe</name>
<email>duwe@lst.de</email>
</author>
<published>2019-02-08T15:10:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=edf072d36dbfdf74465b66988f30084b6c996fbf'/>
<id>urn:sha1:edf072d36dbfdf74465b66988f30084b6c996fbf</id>
<content type='text'>
In preparation for arm64 supporting ftrace built with other compiler
options, let's have the arm64 Makefiles remove the $(CC_FLAGS_FTRACE)
flags, whatever these may be, rather than assuming '-pg'.
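
The kbuild idiom is a one-line change per object file; an illustrative
hunk (the object name here is only an example):

    -CFLAGS_REMOVE_ftrace.o = -pg
    +CFLAGS_REMOVE_ftrace.o = $(CC_FLAGS_FTRACE)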

There should be no functional change as a result of this patch.

Reviewed-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Signed-off-by: Torsten Duwe &lt;duwe@suse.de&gt;
Signed-off-by: Will Deacon &lt;will.deacon@arm.com&gt;
</content>
</entry>
<entry>
<title>arm64: crypto: add NEON accelerated XOR implementation</title>
<updated>2018-12-06T16:47:06+00:00</updated>
<author>
<name>Jackie Liu</name>
<email>liuyun01@kylinos.cn</email>
</author>
<published>2018-12-04T01:43:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=cc9f8349cb33965120a96c12e05d00676162eb7f'/>
<id>urn:sha1:cc9f8349cb33965120a96c12e05d00676162eb7f</id>
<content type='text'>
This is a NEON acceleration method that can improve performance by
approximately 20%. I got the following data from CentOS 7.5 on Huawei's
HISI1616 chip:

[ 93.837726] xor: measuring software checksum speed
[ 93.874039]   8regs  : 7123.200 MB/sec
[ 93.914038]   32regs : 7180.300 MB/sec
[ 93.954043]   arm64_neon: 9856.000 MB/sec
[ 93.954047] xor: using function: arm64_neon (9856.000 MB/sec)

I believe this code can bring some optimization for all arm64
platforms. Thanks to Ard Biesheuvel for his suggestions.
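
For illustration only, a hedged userspace sketch of a two-source xor
pass using ACLE NEON intrinsics (the in-kernel code is built from GCC
vector types and runs between kernel_neon_begin()/kernel_neon_end()):

    #include &lt;arm_neon.h&gt;

    static void xor_2src_neon(unsigned long bytes,
                              uint64_t *dst, const uint64_t *src)
    {
            unsigned long lines = bytes / 16;   /* 128 bits per pass */

            while (lines--) {
                    uint64x2_t a = vld1q_u64(dst);
                    uint64x2_t b = vld1q_u64(src);

                    vst1q_u64(dst, veorq_u64(a, b));
                    dst += 2;
                    src += 2;
            }
    }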

Signed-off-by: Jackie Liu &lt;liuyun01@kylinos.cn&gt;
Reviewed-by: Ard Biesheuvel &lt;ard.biesheuvel@linaro.org&gt;
Signed-off-by: Will Deacon &lt;will.deacon@arm.com&gt;
</content>
</entry>
<entry>
<title>arm64: lse: remove -fcall-used-x0 flag</title>
<updated>2018-09-24T09:56:24+00:00</updated>
<author>
<name>Tri Vo</name>
<email>trong@android.com</email>
</author>
<published>2018-09-19T19:27:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2a6c7c367de82951c98a290a21156770f6f82c84'/>
<id>urn:sha1:2a6c7c367de82951c98a290a21156770f6f82c84</id>
<content type='text'>
x0 is not callee-saved in the PCS, so there is no need to specify
-fcall-used-x0.

Clang doesn't currently support -fcall-used flags. This patch will
help build the kernel with Clang.

Tested-by: Nick Desaulniers &lt;ndesaulniers@google.com&gt;
Acked-by: Will Deacon &lt;will.deacon@arm.com&gt;
Signed-off-by: Tri Vo &lt;trong@android.com&gt;
Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
</content>
</entry>
<entry>
<title>arm64/lib: add accelerated crc32 routines</title>
<updated>2018-09-10T15:10:53+00:00</updated>
<author>
<name>Ard Biesheuvel</name>
<email>ard.biesheuvel@linaro.org</email>
</author>
<published>2018-08-27T11:02:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7481cddf29ede204b475facc40e6f65459939881'/>
<id>urn:sha1:7481cddf29ede204b475facc40e6f65459939881</id>
<content type='text'>
Unlike crc32c(), which is wired up to the crypto API internally so the
optimal driver is selected based on the platform's capabilities,
crc32_le() is implemented as a library function using a slice-by-8 table
based C implementation. Even though few of the call sites may be
bottlenecks, calling a time-variant implementation with a non-negligible
D-cache footprint is a bit of a waste, given that ARMv8.1 and later
mandate support for the CRC32 instructions that were optional in
ARMv8.0 but are already widely available, even on the Cortex-A53 based
Raspberry Pi.

So implement routines that use these instructions if available, and fall
back to the existing generic routines otherwise. The selection is based
on alternatives patching.
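
As a hedged userspace illustration of what the instruction-based path
boils down to (ACLE intrinsics, compiled with -march=armv8-a+crc; the
kernel routines themselves are assembly, selected at runtime via
alternatives):

    #include &lt;arm_acle.h&gt;
    #include &lt;stdint.h&gt;
    #include &lt;stddef.h&gt;
    #include &lt;string.h&gt;

    static uint32_t crc32_le_hw(uint32_t crc, const unsigned char *p,
                                size_t len)
    {
            /* Consume 8 bytes per CRC32X, then mop up the tail bytewise. */
            for (; len &gt;= 8; p += 8, len -= 8) {
                    uint64_t v;

                    memcpy(&amp;v, p, sizeof(v));
                    crc = __crc32d(crc, v);
            }
            for (; len; p++, len--)
                    crc = __crc32b(crc, *p);
            return crc;
    }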

Note that this unconditionally selects CONFIG_CRC32 as a builtin. Since
CRC32 is relied upon by core functionality such as CONFIG_OF_FLATTREE,
this just codifies the status quo.

Signed-off-by: Ard Biesheuvel &lt;ard.biesheuvel@linaro.org&gt;
Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
</content>
</entry>
<entry>
<title>locking/atomics/arm64: Replace our atomic/lock bitop implementations with asm-generic</title>
<updated>2018-06-21T10:52:12+00:00</updated>
<author>
<name>Will Deacon</name>
<email>will.deacon@arm.com</email>
</author>
<published>2018-06-19T12:53:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7c8fc35dfc32dfa97d8a1dc25dbd064cf83936db'/>
<id>urn:sha1:7c8fc35dfc32dfa97d8a1dc25dbd064cf83936db</id>
<content type='text'>
The &lt;asm-generic/bitops/{atomic,lock}.h&gt; implementations are built around
the atomic-fetch ops, which we implement efficiently for both LSE and
LL/SC systems. Use that instead of our hand-rolled, out-of-line bitops.S.
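
The generic pattern, sketched in simplified form (adapted from
&lt;asm-generic/bitops/{atomic,lock}.h&gt;):

    /* set_bit() becomes an atomic OR on the containing word. */
    static inline void set_bit(unsigned int nr, volatile unsigned long *p)
    {
            p += nr / BITS_PER_LONG;
            atomic_long_or(1UL &lt;&lt; (nr % BITS_PER_LONG), (atomic_long_t *)p);
    }

    /* test_and_set_bit() falls out of the fetch variant of the same op. */
    static inline int test_and_set_bit(unsigned int nr,
                                       volatile unsigned long *p)
    {
            unsigned long mask = 1UL &lt;&lt; (nr % BITS_PER_LONG);

            p += nr / BITS_PER_LONG;
            return !!(atomic_long_fetch_or(mask, (atomic_long_t *)p) &amp; mask);
    }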

Signed-off-by: Will Deacon &lt;will.deacon@arm.com&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: linux-arm-kernel@lists.infradead.org
Cc: yamada.masahiro@socionext.com
Link: https://lore.kernel.org/lkml/1529412794-17720-9-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>arm64: avoid instrumenting atomic_ll_sc.o</title>
<updated>2018-04-27T11:14:44+00:00</updated>
<author>
<name>Mark Rutland</name>
<email>mark.rutland@arm.com</email>
</author>
<published>2018-04-27T10:50:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3789c122d0a016b947ce5c05d3f1fbafa5db8f26'/>
<id>urn:sha1:3789c122d0a016b947ce5c05d3f1fbafa5db8f26</id>
<content type='text'>
Our out-of-line atomics are built with a special calling convention,
preventing pointless stack spilling, and allowing us to patch call sites
with ARMv8.1 atomic instructions.

Instrumentation inserted by the compiler may result in calls to
functions not following this special calling convention, resulting in
registers being unexpectedly clobbered, and various problems resulting
from this.

For example, if a kernel is built with KCOV and ARM64_LSE_ATOMICS, the
compiler inserts calls to __sanitizer_cov_trace_pc in the prologues of
the atomic functions. This has been observed to result in spurious
cmpxchg failures, leading to a hang early on in the boot process.

This patch avoids such issues by preventing instrumentation of our
out-of-line atomics.
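
In kbuild terms the opt-out is a per-object setting; illustrative lines
(the exact set applied by the patch may differ):

    GCOV_PROFILE_atomic_ll_sc.o     := n
    KASAN_SANITIZE_atomic_ll_sc.o   := n
    KCOV_INSTRUMENT_atomic_ll_sc.o  := n
    UBSAN_SANITIZE_atomic_ll_sc.o   := n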

Signed-off-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Will Deacon &lt;will.deacon@arm.com&gt;
Signed-off-by: Will Deacon &lt;will.deacon@arm.com&gt;
</content>
</entry>
</feed>
