<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/arch/arm64/lib/Makefile, branch linux-5.9.y</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=linux-5.9.y</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=linux-5.9.y'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2020-01-16T15:23:29+00:00</updated>
<entry>
<title>arm64: Implement optimised checksum routine</title>
<updated>2020-01-16T15:23:29+00:00</updated>
<author>
<name>Robin Murphy</name>
<email>robin.murphy@arm.com</email>
</author>
<published>2020-01-15T16:42:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5777eaed566a1d63e344d3dd8f2b5e33be20643e'/>
<id>urn:sha1:5777eaed566a1d63e344d3dd8f2b5e33be20643e</id>
<content type='text'>
Apparently there exist certain workloads which rely heavily on software
checksumming, for which the generic do_csum() implementation becomes a
significant bottleneck. Therefore let's give arm64 its own optimised
version - for ease of maintenance this foregoes assembly or intrinsics,
and is thus not actually arm64-specific, but does rely heavily on C
idioms that translate well to the A64 ISA and the typical load/store
capabilities of most ARMv8 CPU cores.
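
As a rough, standalone sketch of the idea (stdint types; this is not
the actual do_csum() the patch adds): accumulate with 64-bit loads and
explicit carry wrap-around, which translates directly to A64 ADDS/ADCS
sequences:

    #include &lt;stdint.h&gt;
    #include &lt;stddef.h&gt;

    /* Fold a 64-bit one's complement accumulator down to 16 bits,
     * re-adding each carry as one's complement arithmetic requires. */
    static uint16_t csum_fold64(uint64_t sum)
    {
            sum = (sum &gt;&gt; 32) + (sum &amp; 0xffffffffu);
            sum = (sum &gt;&gt; 32) + (sum &amp; 0xffffffffu);
            sum = (sum &gt;&gt; 16) + (sum &amp; 0xffffu);
            sum = (sum &gt;&gt; 16) + (sum &amp; 0xffffu);
            return (uint16_t)sum;
    }

    /* Sum aligned 64-bit words, wrapping each carry back in. */
    static uint64_t csum_words(const uint64_t *p, size_t n)
    {
            uint64_t sum = 0;

            while (n--) {
                    uint64_t v = *p++;

                    sum += v;
                    sum += (sum &lt; v);   /* carry out of the addition */
            }
            return sum;
    }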

The resulting increase in checksum throughput scales nicely with buffer
size, tending towards 4x for a small in-order core (Cortex-A53), and up
to 6x or more for an aggressive big core (Ampere eMAG).

Reported-by: Lingyan Huang &lt;huanglingyan2@huawei.com&gt;
Tested-by: Lingyan Huang &lt;huanglingyan2@huawei.com&gt;
Signed-off-by: Robin Murphy &lt;robin.murphy@arm.com&gt;
Signed-off-by: Will Deacon &lt;will@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge branch 'for-next/atomics' into for-next/core</title>
<updated>2019-08-30T11:55:39+00:00</updated>
<author>
<name>Will Deacon</name>
<email>will@kernel.org</email>
</author>
<published>2019-08-30T11:55:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=61b7cddfe861f239bf39ab19a065e29b58153a80'/>
<id>urn:sha1:61b7cddfe861f239bf39ab19a065e29b58153a80</id>
<content type='text'>
* for-next/atomics: (10 commits)
  Rework LSE instruction selection to use static keys instead of alternatives
</content>
</entry>
<entry>
<title>arm64: atomics: Remove atomic_ll_sc compilation unit</title>
<updated>2019-08-29T14:53:49+00:00</updated>
<author>
<name>Andrew Murray</name>
<email>andrew.murray@arm.com</email>
</author>
<published>2019-08-28T17:50:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=eb3aabbfbfc203082d06a64517df97a3746ba9ea'/>
<id>urn:sha1:eb3aabbfbfc203082d06a64517df97a3746ba9ea</id>
<content type='text'>
We no longer fall back to out-of-line atomics on systems with
CONFIG_ARM64_LSE_ATOMICS where ARM64_HAS_LSE_ATOMICS is not set.

Remove the unused compilation unit which provided these symbols.

Signed-off-by: Andrew Murray &lt;andrew.murray@arm.com&gt;
Signed-off-by: Will Deacon &lt;will@kernel.org&gt;
</content>
</entry>
<entry>
<title>arm64: Add support for function error injection</title>
<updated>2019-08-07T12:53:09+00:00</updated>
<author>
<name>Leo Yan</name>
<email>leo.yan@linaro.org</email>
</author>
<published>2019-08-06T10:00:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=42d038c4fb00f1ec1a4c4616784da4561385b628'/>
<id>urn:sha1:42d038c4fb00f1ec1a4c4616784da4561385b628</id>
<content type='text'>
Inspired by commit 7cd01b08d35f ("powerpc: Add support for function
error injection"), this patch supports function error injection for
Arm64.

This patch mainly adds two functions: regs_set_return_value(), which
overwrites the return value of the probed function, and
override_function_with_return(), which overrides the probed function's
return and jumps straight back to its caller.
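
On arm64 both helpers amount to simple pt_regs updates; a simplified
sketch of their shape (hedged, not the verbatim patch):

    static inline void regs_set_return_value(struct pt_regs *regs,
                                             unsigned long rc)
    {
            /* x0 carries the return value under the AAPCS64. */
            regs-&gt;regs[0] = rc;
    }

    void override_function_with_return(struct pt_regs *regs)
    {
            /* Skip the probed function's body: resume at the return
             * address held in the link register (x30). */
            instruction_pointer_set(regs, procedure_link_pointer(regs));
    }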

Reviewed-by: Masami Hiramatsu &lt;mhiramat@kernel.org&gt;
Signed-off-by: Leo Yan &lt;leo.yan@linaro.org&gt;
Signed-off-by: Will Deacon &lt;will@kernel.org&gt;
</content>
</entry>
<entry>
<title>arm64: Makefile: Replace -pg with CC_FLAGS_FTRACE</title>
<updated>2019-04-09T09:34:59+00:00</updated>
<author>
<name>Torsten Duwe</name>
<email>duwe@lst.de</email>
</author>
<published>2019-02-08T15:10:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=edf072d36dbfdf74465b66988f30084b6c996fbf'/>
<id>urn:sha1:edf072d36dbfdf74465b66988f30084b6c996fbf</id>
<content type='text'>
In preparation for arm64 supporting ftrace built with other compiler
options, let's have the arm64 Makefiles remove the $(CC_FLAGS_FTRACE)
flags, whatever these may be, rather than assuming '-pg'.
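
The kbuild idiom is a one-line change per object file; an illustrative
hunk (the object name here is only an example):

    -CFLAGS_REMOVE_ftrace.o = -pg
    +CFLAGS_REMOVE_ftrace.o = $(CC_FLAGS_FTRACE)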

There should be no functional change as a result of this patch.

Reviewed-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Signed-off-by: Torsten Duwe &lt;duwe@suse.de&gt;
Signed-off-by: Will Deacon &lt;will.deacon@arm.com&gt;
</content>
</entry>
<entry>
<title>arm64: crypto: add NEON accelerated XOR implementation</title>
<updated>2018-12-06T16:47:06+00:00</updated>
<author>
<name>Jackie Liu</name>
<email>liuyun01@kylinos.cn</email>
</author>
<published>2018-12-04T01:43:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=cc9f8349cb33965120a96c12e05d00676162eb7f'/>
<id>urn:sha1:cc9f8349cb33965120a96c12e05d00676162eb7f</id>
<content type='text'>
This is a NEON acceleration method that can improve performance by
approximately 20%. I got the following data from CentOS 7.5 on Huawei's
HISI1616 chip:

[ 93.837726] xor: measuring software checksum speed
[ 93.874039]   8regs  : 7123.200 MB/sec
[ 93.914038]   32regs : 7180.300 MB/sec
[ 93.954043]   arm64_neon: 9856.000 MB/sec
[ 93.954047] xor: using function: arm64_neon (9856.000 MB/sec)

I believe this code can bring some optimization for all arm64
platforms. Thanks to Ard Biesheuvel for his suggestions.
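
For illustration only, a hedged userspace sketch of a two-source xor
pass using ACLE NEON intrinsics (the in-kernel code is built from GCC
vector types and runs between kernel_neon_begin()/kernel_neon_end()):

    #include &lt;arm_neon.h&gt;

    static void xor_2src_neon(unsigned long bytes,
                              uint64_t *dst, const uint64_t *src)
    {
            unsigned long lines = bytes / 16;   /* 128 bits per pass */

            while (lines--) {
                    uint64x2_t a = vld1q_u64(dst);
                    uint64x2_t b = vld1q_u64(src);

                    vst1q_u64(dst, veorq_u64(a, b));
                    dst += 2;
                    src += 2;
            }
    }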

Signed-off-by: Jackie Liu &lt;liuyun01@kylinos.cn&gt;
Reviewed-by: Ard Biesheuvel &lt;ard.biesheuvel@linaro.org&gt;
Signed-off-by: Will Deacon &lt;will.deacon@arm.com&gt;
</content>
</entry>
<entry>
<title>arm64: lse: remove -fcall-used-x0 flag</title>
<updated>2018-09-24T09:56:24+00:00</updated>
<author>
<name>Tri Vo</name>
<email>trong@android.com</email>
</author>
<published>2018-09-19T19:27:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2a6c7c367de82951c98a290a21156770f6f82c84'/>
<id>urn:sha1:2a6c7c367de82951c98a290a21156770f6f82c84</id>
<content type='text'>
x0 is not callee-saved in the PCS, so there is no need to specify
-fcall-used-x0.

Clang doesn't currently support -fcall-used flags. This patch will
help build the kernel with Clang.

Tested-by: Nick Desaulniers &lt;ndesaulniers@google.com&gt;
Acked-by: Will Deacon &lt;will.deacon@arm.com&gt;
Signed-off-by: Tri Vo &lt;trong@android.com&gt;
Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
</content>
</entry>
<entry>
<title>arm64/lib: add accelerated crc32 routines</title>
<updated>2018-09-10T15:10:53+00:00</updated>
<author>
<name>Ard Biesheuvel</name>
<email>ard.biesheuvel@linaro.org</email>
</author>
<published>2018-08-27T11:02:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7481cddf29ede204b475facc40e6f65459939881'/>
<id>urn:sha1:7481cddf29ede204b475facc40e6f65459939881</id>
<content type='text'>
Unlike crc32c(), which is wired up to the crypto API internally so the
optimal driver is selected based on the platform's capabilities,
crc32_le() is implemented as a library function using a slice-by-8 table
based C implementation. Even though few of the call sites may be
bottlenecks, calling a time-variant implementation with a non-negligible
D-cache footprint is a bit of a waste, given that ARMv8.1 and later
mandate support for the CRC32 instructions that were optional in
ARMv8.0 but are already widely available, even on the Cortex-A53 based
Raspberry Pi.

So implement routines that use these instructions if available, and fall
back to the existing generic routines otherwise. The selection is based
on alternatives patching.
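
As a hedged userspace illustration of what the instruction-based path
boils down to (ACLE intrinsics, compiled with -march=armv8-a+crc; the
kernel routines themselves are assembly, selected at runtime via
alternatives):

    #include &lt;arm_acle.h&gt;
    #include &lt;stdint.h&gt;
    #include &lt;stddef.h&gt;
    #include &lt;string.h&gt;

    static uint32_t crc32_le_hw(uint32_t crc, const unsigned char *p,
                                size_t len)
    {
            /* Consume 8 bytes per CRC32X, then mop up the tail bytewise. */
            for (; len &gt;= 8; p += 8, len -= 8) {
                    uint64_t v;

                    memcpy(&amp;v, p, sizeof(v));
                    crc = __crc32d(crc, v);
            }
            for (; len; p++, len--)
                    crc = __crc32b(crc, *p);
            return crc;
    }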

Note that this unconditionally selects CONFIG_CRC32 as a builtin. Since
CRC32 is relied upon by core functionality such as CONFIG_OF_FLATTREE,
this just codifies the status quo.

Signed-off-by: Ard Biesheuvel &lt;ard.biesheuvel@linaro.org&gt;
Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
</content>
</entry>
<entry>
<title>locking/atomics/arm64: Replace our atomic/lock bitop implementations with asm-generic</title>
<updated>2018-06-21T10:52:12+00:00</updated>
<author>
<name>Will Deacon</name>
<email>will.deacon@arm.com</email>
</author>
<published>2018-06-19T12:53:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7c8fc35dfc32dfa97d8a1dc25dbd064cf83936db'/>
<id>urn:sha1:7c8fc35dfc32dfa97d8a1dc25dbd064cf83936db</id>
<content type='text'>
The &lt;asm-generic/bitops/{atomic,lock}.h&gt; implementations are built around
the atomic-fetch ops, which we implement efficiently for both LSE and
LL/SC systems. Use that instead of our hand-rolled, out-of-line bitops.S.
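
The generic pattern, sketched in simplified form (adapted from
&lt;asm-generic/bitops/{atomic,lock}.h&gt;):

    /* set_bit() becomes an atomic OR on the containing word. */
    static inline void set_bit(unsigned int nr, volatile unsigned long *p)
    {
            p += nr / BITS_PER_LONG;
            atomic_long_or(1UL &lt;&lt; (nr % BITS_PER_LONG), (atomic_long_t *)p);
    }

    /* test_and_set_bit() falls out of the fetch variant of the same op. */
    static inline int test_and_set_bit(unsigned int nr,
                                       volatile unsigned long *p)
    {
            unsigned long mask = 1UL &lt;&lt; (nr % BITS_PER_LONG);

            p += nr / BITS_PER_LONG;
            return !!(atomic_long_fetch_or(mask, (atomic_long_t *)p) &amp; mask);
    }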

Signed-off-by: Will Deacon &lt;will.deacon@arm.com&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: linux-arm-kernel@lists.infradead.org
Cc: yamada.masahiro@socionext.com
Link: https://lore.kernel.org/lkml/1529412794-17720-9-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>arm64: avoid instrumenting atomic_ll_sc.o</title>
<updated>2018-04-27T11:14:44+00:00</updated>
<author>
<name>Mark Rutland</name>
<email>mark.rutland@arm.com</email>
</author>
<published>2018-04-27T10:50:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3789c122d0a016b947ce5c05d3f1fbafa5db8f26'/>
<id>urn:sha1:3789c122d0a016b947ce5c05d3f1fbafa5db8f26</id>
<content type='text'>
Our out-of-line atomics are built with a special calling convention,
preventing pointless stack spilling, and allowing us to patch call sites
with ARMv8.1 atomic instructions.

Instrumentation inserted by the compiler may result in calls to
functions not following this special calling convention, resulting in
registers being unexpectedly clobbered, and various problems resulting
from this.

For example, if a kernel is built with KCOV and ARM64_LSE_ATOMICS, the
compiler inserts calls to __sanitizer_cov_trace_pc in the prologues of
the atomic functions. This has been observed to result in spurious
cmpxchg failures, leading to a hang early on in the boot process.

This patch avoids such issues by preventing instrumentation of our
out-of-line atomics.
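
In kbuild terms the opt-out is a per-object setting; illustrative lines
(the exact set applied by the patch may differ):

    GCOV_PROFILE_atomic_ll_sc.o     := n
    KASAN_SANITIZE_atomic_ll_sc.o   := n
    KCOV_INSTRUMENT_atomic_ll_sc.o  := n
    UBSAN_SANITIZE_atomic_ll_sc.o   := n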

Signed-off-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Will Deacon &lt;will.deacon@arm.com&gt;
Signed-off-by: Will Deacon &lt;will.deacon@arm.com&gt;
</content>
</entry>
</feed>
