author | Martin Willi <martin@strongswan.org> | 2015-07-16 20:14:02 +0300 |
---|---|---|
committer | Herbert Xu <herbert@gondor.apana.org.au> | 2015-07-17 16:20:25 +0300 |
commit | 274f938e0a01286f465d84d5a3f1565225f4ec4b (patch) | |
tree | 57dd7a99156fe50ce0f4c43c32494de0dcf9fd2b /arch/x86/crypto/chacha20_glue.c | |
parent | c9320b6dcb89658a5e53b4f8e31f4c2ee810ec2d (diff) | |
crypto: chacha20 - Add a four block SSSE3 variant for x86_64
Extends the x86_64 SSSE3 ChaCha20 implementation by a function processing
four ChaCha20 blocks in parallel. This avoids the word shuffling needed
in the single block variant, further increasing throughput.
For large messages, throughput increases by ~110% compared to single block
SSSE3:
testing speed of chacha20 (chacha20-simd) encryption
test 0 (256 bit key, 16 byte blocks): 43141886 operations in 10 seconds (690270176 bytes)
test 1 (256 bit key, 64 byte blocks): 46845874 operations in 10 seconds (2998135936 bytes)
test 2 (256 bit key, 256 byte blocks): 18458512 operations in 10 seconds (4725379072 bytes)
test 3 (256 bit key, 1024 byte blocks): 5360533 operations in 10 seconds (5489185792 bytes)
test 4 (256 bit key, 8192 byte blocks): 692846 operations in 10 seconds (5675794432 bytes)
testing speed of chacha20 (chacha20-simd) encryption
test 0 (256 bit key, 16 byte blocks): 42249230 operations in 10 seconds (675987680 bytes)
test 1 (256 bit key, 64 byte blocks): 46441641 operations in 10 seconds (2972265024 bytes)
test 2 (256 bit key, 256 byte blocks): 33028112 operations in 10 seconds (8455196672 bytes)
test 3 (256 bit key, 1024 byte blocks): 11568759 operations in 10 seconds (11846409216 bytes)
test 4 (256 bit key, 8192 byte blocks): 1448761 operations in 10 seconds (11868250112 bytes)
Benchmark results from a Core i5-4670T.
Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Diffstat (limited to 'arch/x86/crypto/chacha20_glue.c')
-rw-r--r-- | arch/x86/crypto/chacha20_glue.c | 8 |
1 files changed, 8 insertions, 0 deletions
diff --git a/arch/x86/crypto/chacha20_glue.c b/arch/x86/crypto/chacha20_glue.c
index 250de401d28f..4d677c3eb7bd 100644
--- a/arch/x86/crypto/chacha20_glue.c
+++ b/arch/x86/crypto/chacha20_glue.c
@@ -20,12 +20,20 @@
 #define CHACHA20_STATE_ALIGN 16
 
 asmlinkage void chacha20_block_xor_ssse3(u32 *state, u8 *dst, const u8 *src);
+asmlinkage void chacha20_4block_xor_ssse3(u32 *state, u8 *dst, const u8 *src);
 
 static void chacha20_dosimd(u32 *state, u8 *dst, const u8 *src,
 			    unsigned int bytes)
 {
 	u8 buf[CHACHA20_BLOCK_SIZE];
 
+	while (bytes >= CHACHA20_BLOCK_SIZE * 4) {
+		chacha20_4block_xor_ssse3(state, dst, src);
+		bytes -= CHACHA20_BLOCK_SIZE * 4;
+		src += CHACHA20_BLOCK_SIZE * 4;
+		dst += CHACHA20_BLOCK_SIZE * 4;
+		state[12] += 4;
+	}
 	while (bytes >= CHACHA20_BLOCK_SIZE) {
 		chacha20_block_xor_ssse3(state, dst, src);
 		bytes -= CHACHA20_BLOCK_SIZE;