| author | Ard Biesheuvel <ardb@kernel.org> | 2024-11-05 19:09:06 +0300 | 
|---|---|---|
| committer | Herbert Xu <herbert@gondor.apana.org.au> | 2024-11-15 14:52:51 +0300 | 
| commit | e7c1d1c9b2023decb855ec4c921a7d78abbf64eb (patch) | |
| tree | 34837564a16ba4a5787a3b09b712012401f2d287 /scripts/gdb/linux/config.py | |
| parent | 802d8d110ce2b3ae979221551f4cb168e2f5e464 (diff) | |
crypto: arm/crct10dif - Implement plain NEON variant

The CRC-T10DIF algorithm produces a 16-bit CRC, and this is reflected in
the folding coefficients, which are also only 16 bits wide.

This means that the polynomial multiplications involving these
coefficients can be performed using 8-bit long polynomial multiplication
(8x8 -> 16) in only a few steps, and this is an instruction that is part
of the base NEON ISA, which is all most real ARMv7 cores implement. (The
64-bit PMULL instruction is part of the crypto extensions, which are
only implemented by 64-bit cores.)
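
To illustrate why 8x8 -> 16 polynomial multiplies suffice for 16-bit coefficients, the following is a minimal scalar sketch in C, not the NEON assembly from this commit. It models one 8x8 -> 16 carry-less multiply and assembles a 16x16 -> 32 product from four of them. The function names (`clmul8`, `clmul16_from_8`, `clmul16_ref`, `clmul_selfcheck`) are hypothetical and chosen for illustration only; the actual code multiplies 16-bit coefficients against wider vector lanes.

```c
#include <assert.h>
#include <stdint.h>

/* 8x8 -> 16 carry-less (polynomial) multiply over GF(2).
 * This models a single lane of an 8-bit polynomial multiply. */
static uint16_t clmul8(uint8_t a, uint8_t b)
{
    uint16_t r = 0;

    for (int i = 0; i < 8; i++)
        if (b & (1u << i))
            r ^= (uint16_t)((uint16_t)a << i);
    return r;
}

/* 16x16 -> 32 carry-less multiply built from four 8x8 -> 16 products.
 * Partial products are combined with XOR because addition in GF(2)
 * has no carries. */
static uint32_t clmul16_from_8(uint16_t a, uint16_t b)
{
    uint8_t al = a & 0xff, ah = a >> 8;
    uint8_t bl = b & 0xff, bh = b >> 8;

    uint32_t lo  = clmul8(al, bl);
    uint32_t mid = (uint32_t)clmul8(al, bh) ^ clmul8(ah, bl);
    uint32_t hi  = clmul8(ah, bh);

    return lo ^ (mid << 8) ^ (hi << 16);
}

/* Bit-by-bit reference used only to cross-check the composition. */
static uint32_t clmul16_ref(uint16_t a, uint16_t b)
{
    uint32_t r = 0;

    for (int i = 0; i < 16; i++)
        if (b & (1u << i))
            r ^= (uint32_t)a << i;
    return r;
}

/* Hypothetical self-check: the composed product must match the reference. */
static void clmul_selfcheck(void)
{
    assert(clmul8(0x03, 0x03) == 0x0005);   /* (x + 1)^2 = x^2 + 1 */
    assert(clmul16_from_8(0x8BB7, 0x1234) == clmul16_ref(0x8BB7, 0x1234));
}
```

Calling `clmul_selfcheck()` from a test harness exercises both paths. The same decomposition extends to wider data operands by adding more shifted partial products.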

The final reduction is a bit more involved, but we can delegate that to
the generic CRC-T10DIF implementation after folding the entire input
into a 16-byte vector.
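
As a point of reference for what the final reduction computes, here is a minimal bitwise CRC-T10DIF sketch in C. It assumes the standard T10-DIF parameters (polynomial 0x8BB7, initial value 0, no bit reflection, no final XOR). The function names `crc_t10dif_bitwise` and `crc_t10dif_selfcheck` are hypothetical; the kernel's generic implementation is not reproduced here and is not written this way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-T10DIF: MSB-first, polynomial 0x8BB7.
 * 'crc' is the running value, so the input can be fed in pieces. */
static uint16_t crc_t10dif_bitwise(uint16_t crc, const uint8_t *buf, size_t len)
{
    while (len--) {
        crc ^= (uint16_t)((uint16_t)*buf++ << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000)
                crc = (uint16_t)((crc << 1) ^ 0x8BB7);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/* Hypothetical self-check against the standard check value for the
 * ASCII string "123456789". */
static void crc_t10dif_selfcheck(void)
{
    const uint8_t msg[] = "123456789";

    assert(crc_t10dif_bitwise(0, msg, 9) == 0xD0DB);
}
```

Because the running CRC is passed in and returned, a routine of this shape can finish the job on a short buffer, such as a folded 16-byte remainder, without knowing how that buffer was produced.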

This results in a speedup of around 6.6x on Cortex-A72 running in 32-bit
mode. On Cortex-A8 (BeagleBone White), the results are substantially
better than that, but not sufficiently reproducible (with tcrypt) to
quote a number here.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
