| author | Eric Biggers <ebiggers@google.com> | 2018-09-01 10:17:07 +0300 |
|---|---|---|
| committer | Herbert Xu <herbert@gondor.apana.org.au> | 2018-09-04 06:37:05 +0300 |
| commit | a1b22a5f45fe884147a99e7c381bcc48d9b2acef (patch) | |
| tree | a285e59c34660ad0b822769b2746334e428b916d /tools/perf/scripts/python/syscall-counts-by-pid.py | |
| parent | 11dcb1037f40a19f298845a9b2ec093f7b8b958b (diff) | |
crypto: arm/chacha20 - faster 8-bit rotations and other optimizations

Optimize ChaCha20 NEON performance by:
- Implementing the 8-bit rotations using the 'vtbl.8' instruction.
- Streamlining the part that adds the original state and XORs the data.
- Making some other small tweaks.
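
The first item relies on the fact that rotating a 32-bit word by 8 bits is a pure byte permutation, which is why a byte-table lookup such as 'vtbl.8' can do it in one step. The following is a minimal scalar sketch of that idea in C, not the NEON code from this patch. The function names and the index table are hypothetical and assume little-endian byte numbering within each word.

```c
#include <stdint.h>

/* Reference: rotate a 32-bit word left by 8 bits using two shifts and an OR. */
static uint32_t rotl32_by_8_shift(uint32_t x)
{
    return (x << 8) | (x >> 24);
}

/*
 * Hypothetical scalar model of a byte-table lookup: output byte i is taken
 * from input byte idx[i]. Bytes are numbered from the least significant
 * byte, so the result does not depend on the host's endianness.
 */
static uint32_t rotl32_by_8_table(uint32_t x)
{
    static const uint8_t idx[4] = { 3, 0, 1, 2 }; /* assumed index table */
    uint8_t in[4];
    uint32_t r = 0;
    int i;

    /* Split the word into bytes, least significant first. */
    for (i = 0; i < 4; i++)
        in[i] = (uint8_t)(x >> (8 * i));

    /* Permute the bytes through the table and reassemble the word. */
    for (i = 0; i < 4; i++)
        r |= (uint32_t)in[idx[i]] << (8 * i);

    return r;
}
```

Both functions return the same value for every input. The table form replaces two shifts and an OR with a single permutation, which is the kind of saving a vector table-lookup instruction can offer when the index table is already held in a register.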
On ARM Cortex-A7, these optimizations improve ChaCha20 performance from
about 12.08 cycles per byte to about 11.37 -- a 5.9% improvement.

There is a tradeoff involved with the 'vtbl.8' rotation method since
there is at least one CPU (Cortex-A53) where it's not fastest. But it
seems to be a better default; see the added comment. Overall, this
patch reduces Cortex-A53 performance by less than 0.5%.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
