From 64edccea594cf7cb1e2975fdf44531e3377b32db Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Sun, 14 Dec 2025 10:17:11 -0800 Subject: lib/crypto: Add ML-DSA verification support Add support for verifying ML-DSA signatures. ML-DSA (Module-Lattice-Based Digital Signature Algorithm) is specified in FIPS 204 and is the standard version of Dilithium. Unlike RSA and elliptic-curve cryptography, ML-DSA is believed to be secure even against adversaries in possession of a large-scale quantum computer. Compared to the earlier patch (https://lore.kernel.org/r/20251117145606.2155773-3-dhowells@redhat.com/) that was based on "leancrypto", this implementation: - Is about 700 lines of source code instead of 4800. - Generates about 4 KB of object code instead of 28 KB. - Uses 9-13 KB of memory to verify a signature instead of 31-84 KB. - Is at least about the same speed, with a microbenchmark showing 3-5% improvements on one x86_64 CPU and -1% to 1% changes on another. When memory is a bottleneck, it's likely much faster. - Correctly implements the RejNTTPoly step of the algorithm. The API just consists of a single function mldsa_verify(), supporting pure ML-DSA with any standard parameter set (ML-DSA-44, ML-DSA-65, or ML-DSA-87) as selected by an enum. That's all that's actually needed. The following four potential features are unneeded and aren't included. However, any that ever become needed could fairly easily be added later, as they only affect how the message representative mu is calculated: - Nonempty context strings - Incremental message hashing - HashML-DSA - External mu Signing support would, of course, be a larger and more complex addition. However, the kernel doesn't, and shouldn't, need ML-DSA signing support. Note that mldsa_verify() allocates memory, so it can sleep and can fail with ENOMEM. Unfortunately we don't have much choice about that, since ML-DSA needs a lot of memory. At least callers have to check for errors anyway, since the signature could be invalid. Note that verification doesn't require constant-time code, and in fact some steps are inherently variable-time. I've used constant-time patterns in some places anyway, but technically they're not needed. Reviewed-by: David Howells Tested-by: David Howells Link: https://lore.kernel.org/r/20251214181712.29132-2-ebiggers@kernel.org Signed-off-by: Eric Biggers --- include/crypto/mldsa.h | 60 ++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 60 insertions(+) create mode 100644 include/crypto/mldsa.h (limited to 'include') diff --git a/include/crypto/mldsa.h b/include/crypto/mldsa.h new file mode 100644 index 000000000000..cf30aef29970 --- /dev/null +++ b/include/crypto/mldsa.h @@ -0,0 +1,60 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * Support for verifying ML-DSA signatures + * + * Copyright 2025 Google LLC + */ +#ifndef _CRYPTO_MLDSA_H +#define _CRYPTO_MLDSA_H + +#include + +/* Identifier for an ML-DSA parameter set */ +enum mldsa_alg { + MLDSA44, /* ML-DSA-44 */ + MLDSA65, /* ML-DSA-65 */ + MLDSA87, /* ML-DSA-87 */ +}; + +/* Lengths of ML-DSA public keys and signatures in bytes */ +#define MLDSA44_PUBLIC_KEY_SIZE 1312 +#define MLDSA65_PUBLIC_KEY_SIZE 1952 +#define MLDSA87_PUBLIC_KEY_SIZE 2592 +#define MLDSA44_SIGNATURE_SIZE 2420 +#define MLDSA65_SIGNATURE_SIZE 3309 +#define MLDSA87_SIGNATURE_SIZE 4627 + +/** + * mldsa_verify() - Verify an ML-DSA signature + * @alg: The ML-DSA parameter set to use + * @sig: The signature + * @sig_len: Length of the signature in bytes. 
Should match the + * MLDSA*_SIGNATURE_SIZE constant associated with @alg, + * otherwise -EBADMSG will be returned. + * @msg: The message + * @msg_len: Length of the message in bytes + * @pk: The public key + * @pk_len: Length of the public key in bytes. Should match the + * MLDSA*_PUBLIC_KEY_SIZE constant associated with @alg, + * otherwise -EBADMSG will be returned. + * + * This verifies a signature using pure ML-DSA with the specified parameter set. + * The context string is assumed to be empty. + * + * Context: Might sleep + * + * Return: + * * 0 if the signature is valid + * * -EBADMSG if the signature and/or public key is malformed + * * -EKEYREJECTED if the signature is invalid but otherwise well-formed + * * -ENOMEM if out of memory so the validity of the signature is unknown + */ +int mldsa_verify(enum mldsa_alg alg, const u8 *sig, size_t sig_len, + const u8 *msg, size_t msg_len, const u8 *pk, size_t pk_len); + +#if IS_ENABLED(CONFIG_CRYPTO_LIB_MLDSA_KUNIT_TEST) +/* Internal function, exposed only for unit testing */ +s32 mldsa_use_hint(u8 h, s32 r, s32 gamma2); +#endif + +#endif /* _CRYPTO_MLDSA_H */ -- cgit v1.2.3 From 14e15c71d7bb591fd08f527669717694810d3973 Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Wed, 10 Dec 2025 17:18:33 -0800 Subject: lib/crypto: nh: Add NH library Add support for the NH "almost-universal hash function" to lib/crypto/, specifically the variant of NH used in Adiantum. This will replace the need for the "nhpoly1305" crypto_shash algorithm. All the implementations of "nhpoly1305" use architecture-optimized code only for the NH stage; they just use the generic C Poly1305 code for the Poly1305 stage. We can achieve the same result in a simpler way using an (architecture-optimized) nh() function combined with code in crypto/adiantum.c that passes the results to the Poly1305 library. This commit begins this cleanup by adding the nh() function. The code is derived from crypto/nhpoly1305.c and include/crypto/nhpoly1305.h. 
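For illustration, the combined NH + Poly1305 step in crypto/adiantum.c
can then be structured roughly like this (a sketch only, assuming the
source length has already been padded to a multiple of NH_MESSAGE_UNIT
and that nh_key, poly_key, and poly_state have been set up by the
caller; the actual adiantum.c code differs in the details):

	__le64 nh_hash[NH_NUM_PASSES];

	while (srclen) {
		/* Each NH hash covers at most 1024 bytes of the source. */
		unsigned int len = min_t(unsigned int, srclen,
					 NH_MESSAGE_BYTES);

		nh(nh_key, src, len, nh_hash);
		/* Absorb the 32-byte NH hash into the Poly1305 state. */
		poly1305_core_blocks(&poly_state, &poly_key, nh_hash,
				     NH_HASH_BYTES / POLY1305_BLOCK_SIZE, 1);
		src += len;
		srclen -= len;
	}

This is the same poly1305_core_blocks() call that the nhpoly1305 code
already uses for this purpose; only the NH stage moves to the new
library function.
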
Link: https://lore.kernel.org/r/20251211011846.8179-2-ebiggers@kernel.org Signed-off-by: Eric Biggers --- include/crypto/nh.h | 52 +++++++++++++++++++++++++++++++++ lib/crypto/Kconfig | 10 +++++++ lib/crypto/Makefile | 8 ++++++ lib/crypto/nh.c | 82 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 152 insertions(+) create mode 100644 include/crypto/nh.h create mode 100644 lib/crypto/nh.c (limited to 'include') diff --git a/include/crypto/nh.h b/include/crypto/nh.h new file mode 100644 index 000000000000..465e85bf203f --- /dev/null +++ b/include/crypto/nh.h @@ -0,0 +1,52 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * NH hash function for Adiantum + */ + +#ifndef _CRYPTO_NH_H +#define _CRYPTO_NH_H + +#include + +/* NH parameterization: */ + +/* Endianness: little */ +/* Word size: 32 bits (works well on NEON, SSE2, AVX2) */ + +/* Stride: 2 words (optimal on ARM32 NEON; works okay on other CPUs too) */ +#define NH_PAIR_STRIDE 2 +#define NH_MESSAGE_UNIT (NH_PAIR_STRIDE * 2 * sizeof(u32)) + +/* Num passes (Toeplitz iteration count): 4, to give ε = 2^{-128} */ +#define NH_NUM_PASSES 4 +#define NH_HASH_BYTES (NH_NUM_PASSES * sizeof(u64)) + +/* Max message size: 1024 bytes (32x compression factor) */ +#define NH_NUM_STRIDES 64 +#define NH_MESSAGE_WORDS (NH_PAIR_STRIDE * 2 * NH_NUM_STRIDES) +#define NH_MESSAGE_BYTES (NH_MESSAGE_WORDS * sizeof(u32)) +#define NH_KEY_WORDS (NH_MESSAGE_WORDS + \ + NH_PAIR_STRIDE * 2 * (NH_NUM_PASSES - 1)) +#define NH_KEY_BYTES (NH_KEY_WORDS * sizeof(u32)) + +/** + * nh() - NH hash function for Adiantum + * @key: The key. @message_len + 48 bytes of it are used. This is NH_KEY_BYTES + * if @message_len has its maximum length of NH_MESSAGE_BYTES. + * @message: The message + * @message_len: The message length in bytes. Must be a multiple of 16 + * (NH_MESSAGE_UNIT) and at most 1024 (NH_MESSAGE_BYTES). + * @hash: (output) The resulting hash value + * + * Note: the pseudocode for NH in the Adiantum paper iterates over 1024-byte + * segments of the message, computes a 32-byte hash for each, and returns all + * the hashes concatenated together. In contrast, this function just hashes one + * segment and returns one hash. It's the caller's responsibility to call this + * function for each 1024-byte segment and collect all the hashes. + * + * Context: Any context. + */ +void nh(const u32 *key, const u8 *message, size_t message_len, + __le64 hash[NH_NUM_PASSES]); + +#endif /* _CRYPTO_NH_H */ diff --git a/lib/crypto/Kconfig b/lib/crypto/Kconfig index ee6ab129d0cf..f14c9f5974d8 100644 --- a/lib/crypto/Kconfig +++ b/lib/crypto/Kconfig @@ -108,6 +108,16 @@ config CRYPTO_LIB_MLDSA The ML-DSA library functions. Select this if your module uses any of the functions from . +config CRYPTO_LIB_NH + tristate + help + Implementation of the NH almost-universal hash function, specifically + the variant of NH used in Adiantum. 
+ +config CRYPTO_LIB_NH_ARCH + bool + depends on CRYPTO_LIB_NH && !UML + config CRYPTO_LIB_POLY1305 tristate help diff --git a/lib/crypto/Makefile b/lib/crypto/Makefile index a0578105266f..929b84568809 100644 --- a/lib/crypto/Makefile +++ b/lib/crypto/Makefile @@ -131,6 +131,14 @@ libmldsa-y := mldsa.o ################################################################################ +obj-$(CONFIG_CRYPTO_LIB_NH) += libnh.o +libnh-y := nh.o +ifeq ($(CONFIG_CRYPTO_LIB_NH_ARCH),y) +CFLAGS_nh.o += -I$(src)/$(SRCARCH) +endif + +################################################################################ + obj-$(CONFIG_CRYPTO_LIB_POLY1305) += libpoly1305.o libpoly1305-y := poly1305.o ifeq ($(CONFIG_ARCH_SUPPORTS_INT128),y) diff --git a/lib/crypto/nh.c b/lib/crypto/nh.c new file mode 100644 index 000000000000..e1d0095b5289 --- /dev/null +++ b/lib/crypto/nh.c @@ -0,0 +1,82 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright 2018 Google LLC + */ + +/* + * Implementation of the NH almost-universal hash function, specifically the + * variant of NH used in Adiantum. This is *not* a cryptographic hash function. + * + * Reference: section 6.3 of "Adiantum: length-preserving encryption for + * entry-level processors" (https://eprint.iacr.org/2018/720.pdf). + */ + +#include +#include +#include +#include +#include + +#ifdef CONFIG_CRYPTO_LIB_NH_ARCH +#include "nh.h" /* $(SRCARCH)/nh.h */ +#else +static bool nh_arch(const u32 *key, const u8 *message, size_t message_len, + __le64 hash[NH_NUM_PASSES]) +{ + return false; +} +#endif + +void nh(const u32 *key, const u8 *message, size_t message_len, + __le64 hash[NH_NUM_PASSES]) +{ + u64 sums[4] = { 0, 0, 0, 0 }; + + if (nh_arch(key, message, message_len, hash)) + return; + + static_assert(NH_PAIR_STRIDE == 2); + static_assert(NH_NUM_PASSES == 4); + + while (message_len) { + u32 m0 = get_unaligned_le32(message + 0); + u32 m1 = get_unaligned_le32(message + 4); + u32 m2 = get_unaligned_le32(message + 8); + u32 m3 = get_unaligned_le32(message + 12); + + sums[0] += (u64)(u32)(m0 + key[0]) * (u32)(m2 + key[2]); + sums[1] += (u64)(u32)(m0 + key[4]) * (u32)(m2 + key[6]); + sums[2] += (u64)(u32)(m0 + key[8]) * (u32)(m2 + key[10]); + sums[3] += (u64)(u32)(m0 + key[12]) * (u32)(m2 + key[14]); + sums[0] += (u64)(u32)(m1 + key[1]) * (u32)(m3 + key[3]); + sums[1] += (u64)(u32)(m1 + key[5]) * (u32)(m3 + key[7]); + sums[2] += (u64)(u32)(m1 + key[9]) * (u32)(m3 + key[11]); + sums[3] += (u64)(u32)(m1 + key[13]) * (u32)(m3 + key[15]); + key += NH_MESSAGE_UNIT / sizeof(key[0]); + message += NH_MESSAGE_UNIT; + message_len -= NH_MESSAGE_UNIT; + } + + hash[0] = cpu_to_le64(sums[0]); + hash[1] = cpu_to_le64(sums[1]); + hash[2] = cpu_to_le64(sums[2]); + hash[3] = cpu_to_le64(sums[3]); +} +EXPORT_SYMBOL_GPL(nh); + +#ifdef nh_mod_init_arch +static int __init nh_mod_init(void) +{ + nh_mod_init_arch(); + return 0; +} +subsys_initcall(nh_mod_init); + +static void __exit nh_mod_exit(void) +{ +} +module_exit(nh_mod_exit); +#endif + +MODULE_DESCRIPTION("NH almost-universal hash function"); +MODULE_LICENSE("GPL"); -- cgit v1.2.3 From f676740c426535d0d2097e0b9adedb085634039b Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Wed, 10 Dec 2025 17:18:42 -0800 Subject: crypto: nhpoly1305 - Remove crypto_shash support Remove nhpoly1305 support from crypto_shash. It no longer has any user now that crypto/adiantum.c no longer uses it. 
Acked-by: Herbert Xu Link: https://lore.kernel.org/r/20251211011846.8179-11-ebiggers@kernel.org Signed-off-by: Eric Biggers --- crypto/Kconfig | 6 -- crypto/Makefile | 1 - crypto/nhpoly1305.c | 255 -------------------------------------------- include/crypto/nhpoly1305.h | 74 ------------- 4 files changed, 336 deletions(-) delete mode 100644 crypto/nhpoly1305.c delete mode 100644 include/crypto/nhpoly1305.h (limited to 'include') diff --git a/crypto/Kconfig b/crypto/Kconfig index 89d1ccaa1fa0..12a87f7cf150 100644 --- a/crypto/Kconfig +++ b/crypto/Kconfig @@ -759,12 +759,6 @@ config CRYPTO_XTS implementation currently can't handle a sectorsize which is not a multiple of 16 bytes. -config CRYPTO_NHPOLY1305 - tristate - select CRYPTO_HASH - select CRYPTO_LIB_POLY1305 - select CRYPTO_LIB_POLY1305_GENERIC - endmenu menu "AEAD (authenticated encryption with associated data) ciphers" diff --git a/crypto/Makefile b/crypto/Makefile index 16a35649dd91..23d3db7be425 100644 --- a/crypto/Makefile +++ b/crypto/Makefile @@ -94,7 +94,6 @@ obj-$(CONFIG_CRYPTO_CTR) += ctr.o obj-$(CONFIG_CRYPTO_XCTR) += xctr.o obj-$(CONFIG_CRYPTO_HCTR2) += hctr2.o obj-$(CONFIG_CRYPTO_ADIANTUM) += adiantum.o -obj-$(CONFIG_CRYPTO_NHPOLY1305) += nhpoly1305.o obj-$(CONFIG_CRYPTO_GCM) += gcm.o obj-$(CONFIG_CRYPTO_CCM) += ccm.o obj-$(CONFIG_CRYPTO_CHACHA20POLY1305) += chacha20poly1305.o diff --git a/crypto/nhpoly1305.c b/crypto/nhpoly1305.c deleted file mode 100644 index 2b648615b5ec..000000000000 --- a/crypto/nhpoly1305.c +++ /dev/null @@ -1,255 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0 -/* - * NHPoly1305 - ε-almost-∆-universal hash function for Adiantum - * - * Copyright 2018 Google LLC - */ - -/* - * "NHPoly1305" is the main component of Adiantum hashing. - * Specifically, it is the calculation - * - * H_L ← Poly1305_{K_L}(NH_{K_N}(pad_{128}(L))) - * - * from the procedure in section 6.4 of the Adiantum paper [1]. It is an - * ε-almost-∆-universal (ε-∆U) hash function for equal-length inputs over - * Z/(2^{128}Z), where the "∆" operation is addition. It hashes 1024-byte - * chunks of the input with the NH hash function [2], reducing the input length - * by 32x. The resulting NH digests are evaluated as a polynomial in - * GF(2^{130}-5), like in the Poly1305 MAC [3]. Note that the polynomial - * evaluation by itself would suffice to achieve the ε-∆U property; NH is used - * for performance since it's over twice as fast as Poly1305. - * - * This is *not* a cryptographic hash function; do not use it as such! 
- * - * [1] Adiantum: length-preserving encryption for entry-level processors - * (https://eprint.iacr.org/2018/720.pdf) - * [2] UMAC: Fast and Secure Message Authentication - * (https://fastcrypto.org/umac/umac_proc.pdf) - * [3] The Poly1305-AES message-authentication code - * (https://cr.yp.to/mac/poly1305-20050329.pdf) - */ - -#include -#include -#include -#include -#include -#include -#include -#include - -static void nh_generic(const u32 *key, const u8 *message, size_t message_len, - __le64 hash[NH_NUM_PASSES]) -{ - u64 sums[4] = { 0, 0, 0, 0 }; - - BUILD_BUG_ON(NH_PAIR_STRIDE != 2); - BUILD_BUG_ON(NH_NUM_PASSES != 4); - - while (message_len) { - u32 m0 = get_unaligned_le32(message + 0); - u32 m1 = get_unaligned_le32(message + 4); - u32 m2 = get_unaligned_le32(message + 8); - u32 m3 = get_unaligned_le32(message + 12); - - sums[0] += (u64)(u32)(m0 + key[ 0]) * (u32)(m2 + key[ 2]); - sums[1] += (u64)(u32)(m0 + key[ 4]) * (u32)(m2 + key[ 6]); - sums[2] += (u64)(u32)(m0 + key[ 8]) * (u32)(m2 + key[10]); - sums[3] += (u64)(u32)(m0 + key[12]) * (u32)(m2 + key[14]); - sums[0] += (u64)(u32)(m1 + key[ 1]) * (u32)(m3 + key[ 3]); - sums[1] += (u64)(u32)(m1 + key[ 5]) * (u32)(m3 + key[ 7]); - sums[2] += (u64)(u32)(m1 + key[ 9]) * (u32)(m3 + key[11]); - sums[3] += (u64)(u32)(m1 + key[13]) * (u32)(m3 + key[15]); - key += NH_MESSAGE_UNIT / sizeof(key[0]); - message += NH_MESSAGE_UNIT; - message_len -= NH_MESSAGE_UNIT; - } - - hash[0] = cpu_to_le64(sums[0]); - hash[1] = cpu_to_le64(sums[1]); - hash[2] = cpu_to_le64(sums[2]); - hash[3] = cpu_to_le64(sums[3]); -} - -/* Pass the next NH hash value through Poly1305 */ -static void process_nh_hash_value(struct nhpoly1305_state *state, - const struct nhpoly1305_key *key) -{ - BUILD_BUG_ON(NH_HASH_BYTES % POLY1305_BLOCK_SIZE != 0); - - poly1305_core_blocks(&state->poly_state, &key->poly_key, state->nh_hash, - NH_HASH_BYTES / POLY1305_BLOCK_SIZE, 1); -} - -/* - * Feed the next portion of the source data, as a whole number of 16-byte - * "NH message units", through NH and Poly1305. Each NH hash is taken over - * 1024 bytes, except possibly the final one which is taken over a multiple of - * 16 bytes up to 1024. Also, in the case where data is passed in misaligned - * chunks, we combine partial hashes; the end result is the same either way. 
- */ -static void nhpoly1305_units(struct nhpoly1305_state *state, - const struct nhpoly1305_key *key, - const u8 *src, unsigned int srclen, nh_t nh_fn) -{ - do { - unsigned int bytes; - - if (state->nh_remaining == 0) { - /* Starting a new NH message */ - bytes = min_t(unsigned int, srclen, NH_MESSAGE_BYTES); - nh_fn(key->nh_key, src, bytes, state->nh_hash); - state->nh_remaining = NH_MESSAGE_BYTES - bytes; - } else { - /* Continuing a previous NH message */ - __le64 tmp_hash[NH_NUM_PASSES]; - unsigned int pos; - int i; - - pos = NH_MESSAGE_BYTES - state->nh_remaining; - bytes = min(srclen, state->nh_remaining); - nh_fn(&key->nh_key[pos / 4], src, bytes, tmp_hash); - for (i = 0; i < NH_NUM_PASSES; i++) - le64_add_cpu(&state->nh_hash[i], - le64_to_cpu(tmp_hash[i])); - state->nh_remaining -= bytes; - } - if (state->nh_remaining == 0) - process_nh_hash_value(state, key); - src += bytes; - srclen -= bytes; - } while (srclen); -} - -int crypto_nhpoly1305_setkey(struct crypto_shash *tfm, - const u8 *key, unsigned int keylen) -{ - struct nhpoly1305_key *ctx = crypto_shash_ctx(tfm); - int i; - - if (keylen != NHPOLY1305_KEY_SIZE) - return -EINVAL; - - poly1305_core_setkey(&ctx->poly_key, key); - key += POLY1305_BLOCK_SIZE; - - for (i = 0; i < NH_KEY_WORDS; i++) - ctx->nh_key[i] = get_unaligned_le32(key + i * sizeof(u32)); - - return 0; -} -EXPORT_SYMBOL(crypto_nhpoly1305_setkey); - -int crypto_nhpoly1305_init(struct shash_desc *desc) -{ - struct nhpoly1305_state *state = shash_desc_ctx(desc); - - poly1305_core_init(&state->poly_state); - state->buflen = 0; - state->nh_remaining = 0; - return 0; -} -EXPORT_SYMBOL(crypto_nhpoly1305_init); - -int crypto_nhpoly1305_update_helper(struct shash_desc *desc, - const u8 *src, unsigned int srclen, - nh_t nh_fn) -{ - struct nhpoly1305_state *state = shash_desc_ctx(desc); - const struct nhpoly1305_key *key = crypto_shash_ctx(desc->tfm); - unsigned int bytes; - - if (state->buflen) { - bytes = min(srclen, (int)NH_MESSAGE_UNIT - state->buflen); - memcpy(&state->buffer[state->buflen], src, bytes); - state->buflen += bytes; - if (state->buflen < NH_MESSAGE_UNIT) - return 0; - nhpoly1305_units(state, key, state->buffer, NH_MESSAGE_UNIT, - nh_fn); - state->buflen = 0; - src += bytes; - srclen -= bytes; - } - - if (srclen >= NH_MESSAGE_UNIT) { - bytes = round_down(srclen, NH_MESSAGE_UNIT); - nhpoly1305_units(state, key, src, bytes, nh_fn); - src += bytes; - srclen -= bytes; - } - - if (srclen) { - memcpy(state->buffer, src, srclen); - state->buflen = srclen; - } - return 0; -} -EXPORT_SYMBOL(crypto_nhpoly1305_update_helper); - -int crypto_nhpoly1305_update(struct shash_desc *desc, - const u8 *src, unsigned int srclen) -{ - return crypto_nhpoly1305_update_helper(desc, src, srclen, nh_generic); -} -EXPORT_SYMBOL(crypto_nhpoly1305_update); - -int crypto_nhpoly1305_final_helper(struct shash_desc *desc, u8 *dst, nh_t nh_fn) -{ - struct nhpoly1305_state *state = shash_desc_ctx(desc); - const struct nhpoly1305_key *key = crypto_shash_ctx(desc->tfm); - - if (state->buflen) { - memset(&state->buffer[state->buflen], 0, - NH_MESSAGE_UNIT - state->buflen); - nhpoly1305_units(state, key, state->buffer, NH_MESSAGE_UNIT, - nh_fn); - } - - if (state->nh_remaining) - process_nh_hash_value(state, key); - - poly1305_core_emit(&state->poly_state, NULL, dst); - return 0; -} -EXPORT_SYMBOL(crypto_nhpoly1305_final_helper); - -int crypto_nhpoly1305_final(struct shash_desc *desc, u8 *dst) -{ - return crypto_nhpoly1305_final_helper(desc, dst, nh_generic); -} 
-EXPORT_SYMBOL(crypto_nhpoly1305_final); - -static struct shash_alg nhpoly1305_alg = { - .base.cra_name = "nhpoly1305", - .base.cra_driver_name = "nhpoly1305-generic", - .base.cra_priority = 100, - .base.cra_ctxsize = sizeof(struct nhpoly1305_key), - .base.cra_module = THIS_MODULE, - .digestsize = POLY1305_DIGEST_SIZE, - .init = crypto_nhpoly1305_init, - .update = crypto_nhpoly1305_update, - .final = crypto_nhpoly1305_final, - .setkey = crypto_nhpoly1305_setkey, - .descsize = sizeof(struct nhpoly1305_state), -}; - -static int __init nhpoly1305_mod_init(void) -{ - return crypto_register_shash(&nhpoly1305_alg); -} - -static void __exit nhpoly1305_mod_exit(void) -{ - crypto_unregister_shash(&nhpoly1305_alg); -} - -module_init(nhpoly1305_mod_init); -module_exit(nhpoly1305_mod_exit); - -MODULE_DESCRIPTION("NHPoly1305 ε-almost-∆-universal hash function"); -MODULE_LICENSE("GPL v2"); -MODULE_AUTHOR("Eric Biggers "); -MODULE_ALIAS_CRYPTO("nhpoly1305"); -MODULE_ALIAS_CRYPTO("nhpoly1305-generic"); diff --git a/include/crypto/nhpoly1305.h b/include/crypto/nhpoly1305.h deleted file mode 100644 index 306925fea190..000000000000 --- a/include/crypto/nhpoly1305.h +++ /dev/null @@ -1,74 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -/* - * Common values and helper functions for the NHPoly1305 hash function. - */ - -#ifndef _NHPOLY1305_H -#define _NHPOLY1305_H - -#include -#include - -/* NH parameterization: */ - -/* Endianness: little */ -/* Word size: 32 bits (works well on NEON, SSE2, AVX2) */ - -/* Stride: 2 words (optimal on ARM32 NEON; works okay on other CPUs too) */ -#define NH_PAIR_STRIDE 2 -#define NH_MESSAGE_UNIT (NH_PAIR_STRIDE * 2 * sizeof(u32)) - -/* Num passes (Toeplitz iteration count): 4, to give ε = 2^{-128} */ -#define NH_NUM_PASSES 4 -#define NH_HASH_BYTES (NH_NUM_PASSES * sizeof(u64)) - -/* Max message size: 1024 bytes (32x compression factor) */ -#define NH_NUM_STRIDES 64 -#define NH_MESSAGE_WORDS (NH_PAIR_STRIDE * 2 * NH_NUM_STRIDES) -#define NH_MESSAGE_BYTES (NH_MESSAGE_WORDS * sizeof(u32)) -#define NH_KEY_WORDS (NH_MESSAGE_WORDS + \ - NH_PAIR_STRIDE * 2 * (NH_NUM_PASSES - 1)) -#define NH_KEY_BYTES (NH_KEY_WORDS * sizeof(u32)) - -#define NHPOLY1305_KEY_SIZE (POLY1305_BLOCK_SIZE + NH_KEY_BYTES) - -struct nhpoly1305_key { - struct poly1305_core_key poly_key; - u32 nh_key[NH_KEY_WORDS]; -}; - -struct nhpoly1305_state { - - /* Running total of polynomial evaluation */ - struct poly1305_state poly_state; - - /* Partial block buffer */ - u8 buffer[NH_MESSAGE_UNIT]; - unsigned int buflen; - - /* - * Number of bytes remaining until the current NH message reaches - * NH_MESSAGE_BYTES. When nonzero, 'nh_hash' holds the partial NH hash. 
- */ - unsigned int nh_remaining; - - __le64 nh_hash[NH_NUM_PASSES]; -}; - -typedef void (*nh_t)(const u32 *key, const u8 *message, size_t message_len, - __le64 hash[NH_NUM_PASSES]); - -int crypto_nhpoly1305_setkey(struct crypto_shash *tfm, - const u8 *key, unsigned int keylen); - -int crypto_nhpoly1305_init(struct shash_desc *desc); -int crypto_nhpoly1305_update(struct shash_desc *desc, - const u8 *src, unsigned int srclen); -int crypto_nhpoly1305_update_helper(struct shash_desc *desc, - const u8 *src, unsigned int srclen, - nh_t nh_fn); -int crypto_nhpoly1305_final(struct shash_desc *desc, u8 *dst); -int crypto_nhpoly1305_final_helper(struct shash_desc *desc, u8 *dst, - nh_t nh_fn); - -#endif /* _NHPOLY1305_H */ -- cgit v1.2.3 From a22fd0e3c495dd2d706c49c26663476e24d96e7d Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Mon, 12 Jan 2026 11:20:00 -0800 Subject: lib/crypto: aes: Introduce improved AES library The kernel's AES library currently has the following issues: - It doesn't take advantage of the architecture-optimized AES code, including the implementations using AES instructions. - It's much slower than even the other software AES implementations: 2-4 times slower than "aes-generic", "aes-arm", and "aes-arm64". - It requires that both the encryption and decryption round keys be computed and cached. This is wasteful for users that need only the forward (encryption) direction of the cipher: the key struct is 484 bytes when only 244 are actually needed. This missed optimization is very common, as many AES modes (e.g. GCM, CFB, CTR, CMAC, and even the tweak key in XTS) use the cipher only in the forward (encryption) direction even when doing decryption. - It doesn't provide the flexibility to customize the prepared key format. The API is defined to do key expansion, and several callers in drivers/crypto/ use it specifically to expand the key. This is an issue when integrating the existing powerpc, s390, and sparc code, which is necessary to provide full parity with the traditional API. To resolve these issues, I'm proposing the following changes: 1. New structs 'aes_key' and 'aes_enckey' are introduced, with corresponding functions aes_preparekey() and aes_prepareenckey(). Generally these structs will include the encryption+decryption round keys and the encryption round keys, respectively. However, the exact format will be under control of the architecture-specific AES code. (The verb "prepare" is chosen over "expand" since key expansion isn't necessarily done. It's also consistent with hmac*_preparekey().) 2. aes_encrypt() and aes_decrypt() will be changed to operate on the new structs instead of struct crypto_aes_ctx. 3. aes_encrypt() and aes_decrypt() will use architecture-optimized code when available, or else fall back to a new generic AES implementation that unifies the existing two fragmented generic AES implementations. The new generic AES implementation uses tables for both SubBytes and MixColumns, making it almost as fast as "aes-generic". However, instead of aes-generic's huge 8192-byte tables per direction, it uses only 1024 bytes for encryption and 1280 bytes for decryption (similar to "aes-arm"). The cost is just some extra rotations. The new generic AES implementation also includes table prefetching, making it have some "constant-time hardening". That's an improvement from aes-generic which has no constant-time hardening. It does slightly regress in constant-time hardening vs. 
the old lib/crypto/aes.c which had smaller tables, and from aes-fixed-time which disabled IRQs on top of that. But I think this is tolerable. The real solutions for constant-time AES are AES instructions or bit-slicing. The table-based code remains a best-effort fallback for the increasingly-rare case where a real solution is unavailable. 4. crypto_aes_ctx and aes_expandkey() will remain for now, but only for callers that are using them specifically for the AES key expansion (as opposed to en/decrypting data with the AES library). This commit begins the migration process by introducing the new structs and functions, backed by the new generic AES implementation. To allow callers to be incrementally converted, aes_encrypt() and aes_decrypt() are temporarily changed into macros that use a _Generic expression to call either the old functions (which take crypto_aes_ctx) or the new functions (which take the new types). Once all callers have been updated, these macros will go away, the old functions will be removed, and the "_new" suffix will be dropped from the new functions. Acked-by: Ard Biesheuvel Link: https://lore.kernel.org/r/20260112192035.10427-3-ebiggers@kernel.org Signed-off-by: Eric Biggers --- include/crypto/aes.h | 157 ++++++++++++++++++-- lib/crypto/Kconfig | 4 + lib/crypto/Makefile | 11 +- lib/crypto/aes.c | 399 +++++++++++++++++++++++++++++++++++++++++++-------- 4 files changed, 501 insertions(+), 70 deletions(-) (limited to 'include') diff --git a/include/crypto/aes.h b/include/crypto/aes.h index 9339da7c20a8..e8b83d4849fe 100644 --- a/include/crypto/aes.h +++ b/include/crypto/aes.h @@ -18,6 +18,60 @@ #define AES_MAX_KEYLENGTH (15 * 16) #define AES_MAX_KEYLENGTH_U32 (AES_MAX_KEYLENGTH / sizeof(u32)) +union aes_enckey_arch { + u32 rndkeys[AES_MAX_KEYLENGTH_U32]; +}; + +union aes_invkey_arch { + u32 inv_rndkeys[AES_MAX_KEYLENGTH_U32]; +}; + +/** + * struct aes_enckey - An AES key prepared for encryption + * @len: Key length in bytes: 16 for AES-128, 24 for AES-192, 32 for AES-256. + * @nrounds: Number of rounds: 10 for AES-128, 12 for AES-192, 14 for AES-256. + * This is '6 + @len / 4' and is cached so that AES implementations + * that need it don't have to recompute it for each en/decryption. + * @padding: Padding to make offsetof(@k) be a multiple of 16, so that aligning + * this struct to a 16-byte boundary results in @k also being 16-byte + * aligned. Users aren't required to align this struct to 16 bytes, + * but it may slightly improve performance. + * @k: This typically contains the AES round keys as an array of '@nrounds + 1' + * groups of four u32 words. However, architecture-specific implementations + * of AES may store something else here, e.g. just the raw key if it's all + * they need. + * + * Note that this struct is about half the size of struct aes_key. This is + * separate from struct aes_key so that modes that need only AES encryption + * (e.g. AES-GCM, AES-CTR, AES-CMAC, tweak key in AES-XTS) don't incur the time + * and space overhead of computing and caching the decryption round keys. + * + * Note that there's no decryption-only equivalent (i.e. "struct aes_deckey"), + * since (a) it's rare that modes need decryption-only, and (b) some AES + * implementations use the same @k for both encryption and decryption, either + * always or conditionally; in the latter case both @k and @inv_k are needed. 
+ */ +struct aes_enckey { + u32 len; + u32 nrounds; + u32 padding[2]; + union aes_enckey_arch k; +}; + +/** + * struct aes_key - An AES key prepared for encryption and decryption + * @aes_enckey: Common fields and the key prepared for encryption + * @inv_k: This generally contains the round keys for the AES Equivalent + * Inverse Cipher, as an array of '@nrounds + 1' groups of four u32 + * words. However, architecture-specific implementations of AES may + * store something else here. For example, they may leave this field + * uninitialized if they use @k for both encryption and decryption. + */ +struct aes_key { + struct aes_enckey; /* Include all fields of aes_enckey. */ + union aes_invkey_arch inv_k; +}; + /* * Please ensure that the first two fields are 16-byte aligned * relative to the start of the structure, i.e., don't move them! @@ -34,7 +88,7 @@ extern const u32 crypto_it_tab[4][256] ____cacheline_aligned; /* * validate key length for AES algorithms */ -static inline int aes_check_keylen(unsigned int keylen) +static inline int aes_check_keylen(size_t keylen) { switch (keylen) { case AES_KEYSIZE_128: @@ -69,23 +123,104 @@ int aes_expandkey(struct crypto_aes_ctx *ctx, const u8 *in_key, unsigned int key_len); /** - * aes_encrypt - Encrypt a single AES block - * @ctx: Context struct containing the key schedule - * @out: Buffer to store the ciphertext - * @in: Buffer containing the plaintext + * aes_preparekey() - Prepare an AES key for encryption and decryption + * @key: (output) The key structure to initialize + * @in_key: The raw AES key + * @key_len: Length of the raw key in bytes. Should be either AES_KEYSIZE_128, + * AES_KEYSIZE_192, or AES_KEYSIZE_256. + * + * This prepares an AES key for both the encryption and decryption directions of + * the block cipher. Typically this involves expanding the raw key into both + * the standard round keys and the Equivalent Inverse Cipher round keys, but + * some architecture-specific implementations don't do the full expansion here. + * + * The caller is responsible for zeroizing both the struct aes_key and the raw + * key once they are no longer needed. + * + * If you don't need decryption support, use aes_prepareenckey() instead. + * + * Return: 0 on success or -EINVAL if the given key length is invalid. No other + * errors are possible, so callers that always pass a valid key length + * don't need to check for errors. + * + * Context: Any context. + */ +int aes_preparekey(struct aes_key *key, const u8 *in_key, size_t key_len); + +/** + * aes_prepareenckey() - Prepare an AES key for encryption-only + * @key: (output) The key structure to initialize + * @in_key: The raw AES key + * @key_len: Length of the raw key in bytes. Should be either AES_KEYSIZE_128, + * AES_KEYSIZE_192, or AES_KEYSIZE_256. + * + * This prepares an AES key for only the encryption direction of the block + * cipher. Typically this involves expanding the raw key into only the standard + * round keys, resulting in a struct about half the size of struct aes_key. + * + * The caller is responsible for zeroizing both the struct aes_enckey and the + * raw key once they are no longer needed. + * + * Note that while the resulting prepared key supports only AES encryption, it + * can still be used for decrypting in a mode of operation that uses AES in only + * the encryption (forward) direction, for example counter mode. + * + * Return: 0 on success or -EINVAL if the given key length is invalid. 
No other + * errors are possible, so callers that always pass a valid key length + * don't need to check for errors. + * + * Context: Any context. + */ +int aes_prepareenckey(struct aes_enckey *key, const u8 *in_key, size_t key_len); + +typedef union { + const struct aes_enckey *enc_key; + const struct aes_key *full_key; +} aes_encrypt_arg __attribute__ ((__transparent_union__)); + +/** + * aes_encrypt() - Encrypt a single AES block + * @key: The AES key, as a pointer to either an encryption-only key + * (struct aes_enckey) or a full, bidirectional key (struct aes_key). + * @out: Buffer to store the ciphertext block + * @in: Buffer containing the plaintext block + * + * Context: Any context. */ -void aes_encrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in); +#define aes_encrypt(key, out, in) \ + _Generic((key), \ + struct crypto_aes_ctx *: aes_encrypt_old((const struct crypto_aes_ctx *)(key), (out), (in)), \ + const struct crypto_aes_ctx *: aes_encrypt_old((const struct crypto_aes_ctx *)(key), (out), (in)), \ + struct aes_enckey *: aes_encrypt_new((const struct aes_enckey *)(key), (out), (in)), \ + const struct aes_enckey *: aes_encrypt_new((const struct aes_enckey *)(key), (out), (in)), \ + struct aes_key *: aes_encrypt_new((const struct aes_key *)(key), (out), (in)), \ + const struct aes_key *: aes_encrypt_new((const struct aes_key *)(key), (out), (in))) +void aes_encrypt_old(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in); +void aes_encrypt_new(aes_encrypt_arg key, u8 out[at_least AES_BLOCK_SIZE], + const u8 in[at_least AES_BLOCK_SIZE]); /** - * aes_decrypt - Decrypt a single AES block - * @ctx: Context struct containing the key schedule - * @out: Buffer to store the plaintext - * @in: Buffer containing the ciphertext + * aes_decrypt() - Decrypt a single AES block + * @key: The AES key, previously initialized by aes_preparekey() + * @out: Buffer to store the plaintext block + * @in: Buffer containing the ciphertext block + * + * Context: Any context. 
*/ -void aes_decrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in); +#define aes_decrypt(key, out, in) \ + _Generic((key), \ + struct crypto_aes_ctx *: aes_decrypt_old((const struct crypto_aes_ctx *)(key), (out), (in)), \ + const struct crypto_aes_ctx *: aes_decrypt_old((const struct crypto_aes_ctx *)(key), (out), (in)), \ + struct aes_key *: aes_decrypt_new((const struct aes_key *)(key), (out), (in)), \ + const struct aes_key *: aes_decrypt_new((const struct aes_key *)(key), (out), (in))) +void aes_decrypt_old(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in); +void aes_decrypt_new(const struct aes_key *key, u8 out[at_least AES_BLOCK_SIZE], + const u8 in[at_least AES_BLOCK_SIZE]); extern const u8 crypto_aes_sbox[]; extern const u8 crypto_aes_inv_sbox[]; +extern const u32 aes_enc_tab[256]; +extern const u32 aes_dec_tab[256]; void aescfb_encrypt(const struct crypto_aes_ctx *ctx, u8 *dst, const u8 *src, int len, const u8 iv[AES_BLOCK_SIZE]); diff --git a/lib/crypto/Kconfig b/lib/crypto/Kconfig index 781a42c5c572..4efad77daa24 100644 --- a/lib/crypto/Kconfig +++ b/lib/crypto/Kconfig @@ -11,6 +11,10 @@ config CRYPTO_LIB_UTILS config CRYPTO_LIB_AES tristate +config CRYPTO_LIB_AES_ARCH + bool + depends on CRYPTO_LIB_AES && !UML && !KMSAN + config CRYPTO_LIB_AESCFB tristate select CRYPTO_LIB_AES diff --git a/lib/crypto/Makefile b/lib/crypto/Makefile index 45128eccedef..01193b3f47ba 100644 --- a/lib/crypto/Makefile +++ b/lib/crypto/Makefile @@ -15,8 +15,15 @@ obj-$(CONFIG_CRYPTO_HASH_INFO) += hash_info.o obj-$(CONFIG_CRYPTO_LIB_UTILS) += libcryptoutils.o libcryptoutils-y := memneq.o utils.o -obj-$(CONFIG_CRYPTO_LIB_AES) += libaes.o -libaes-y := aes.o +################################################################################ + +obj-$(CONFIG_CRYPTO_LIB_AES) += libaes.o +libaes-y := aes.o +ifeq ($(CONFIG_CRYPTO_LIB_AES_ARCH),y) +CFLAGS_aes.o += -I$(src)/$(SRCARCH) +endif # CONFIG_CRYPTO_LIB_AES_ARCH + +################################################################################ obj-$(CONFIG_CRYPTO_LIB_AESCFB) += libaescfb.o libaescfb-y := aescfb.o diff --git a/lib/crypto/aes.c b/lib/crypto/aes.c index 102aaa76bc8d..88da68dcf5a8 100644 --- a/lib/crypto/aes.c +++ b/lib/crypto/aes.c @@ -1,9 +1,11 @@ // SPDX-License-Identifier: GPL-2.0 /* * Copyright (C) 2017-2019 Linaro Ltd + * Copyright 2026 Google LLC */ #include +#include #include #include #include @@ -89,6 +91,110 @@ extern const u8 crypto_aes_inv_sbox[256] __alias(aes_inv_sbox); EXPORT_SYMBOL(crypto_aes_sbox); EXPORT_SYMBOL(crypto_aes_inv_sbox); +/* aes_enc_tab[i] contains MixColumn([SubByte(i), 0, 0, 0]). 
*/ +const u32 ____cacheline_aligned aes_enc_tab[256] = { + 0xa56363c6, 0x847c7cf8, 0x997777ee, 0x8d7b7bf6, 0x0df2f2ff, 0xbd6b6bd6, + 0xb16f6fde, 0x54c5c591, 0x50303060, 0x03010102, 0xa96767ce, 0x7d2b2b56, + 0x19fefee7, 0x62d7d7b5, 0xe6abab4d, 0x9a7676ec, 0x45caca8f, 0x9d82821f, + 0x40c9c989, 0x877d7dfa, 0x15fafaef, 0xeb5959b2, 0xc947478e, 0x0bf0f0fb, + 0xecadad41, 0x67d4d4b3, 0xfda2a25f, 0xeaafaf45, 0xbf9c9c23, 0xf7a4a453, + 0x967272e4, 0x5bc0c09b, 0xc2b7b775, 0x1cfdfde1, 0xae93933d, 0x6a26264c, + 0x5a36366c, 0x413f3f7e, 0x02f7f7f5, 0x4fcccc83, 0x5c343468, 0xf4a5a551, + 0x34e5e5d1, 0x08f1f1f9, 0x937171e2, 0x73d8d8ab, 0x53313162, 0x3f15152a, + 0x0c040408, 0x52c7c795, 0x65232346, 0x5ec3c39d, 0x28181830, 0xa1969637, + 0x0f05050a, 0xb59a9a2f, 0x0907070e, 0x36121224, 0x9b80801b, 0x3de2e2df, + 0x26ebebcd, 0x6927274e, 0xcdb2b27f, 0x9f7575ea, 0x1b090912, 0x9e83831d, + 0x742c2c58, 0x2e1a1a34, 0x2d1b1b36, 0xb26e6edc, 0xee5a5ab4, 0xfba0a05b, + 0xf65252a4, 0x4d3b3b76, 0x61d6d6b7, 0xceb3b37d, 0x7b292952, 0x3ee3e3dd, + 0x712f2f5e, 0x97848413, 0xf55353a6, 0x68d1d1b9, 0x00000000, 0x2cededc1, + 0x60202040, 0x1ffcfce3, 0xc8b1b179, 0xed5b5bb6, 0xbe6a6ad4, 0x46cbcb8d, + 0xd9bebe67, 0x4b393972, 0xde4a4a94, 0xd44c4c98, 0xe85858b0, 0x4acfcf85, + 0x6bd0d0bb, 0x2aefefc5, 0xe5aaaa4f, 0x16fbfbed, 0xc5434386, 0xd74d4d9a, + 0x55333366, 0x94858511, 0xcf45458a, 0x10f9f9e9, 0x06020204, 0x817f7ffe, + 0xf05050a0, 0x443c3c78, 0xba9f9f25, 0xe3a8a84b, 0xf35151a2, 0xfea3a35d, + 0xc0404080, 0x8a8f8f05, 0xad92923f, 0xbc9d9d21, 0x48383870, 0x04f5f5f1, + 0xdfbcbc63, 0xc1b6b677, 0x75dadaaf, 0x63212142, 0x30101020, 0x1affffe5, + 0x0ef3f3fd, 0x6dd2d2bf, 0x4ccdcd81, 0x140c0c18, 0x35131326, 0x2fececc3, + 0xe15f5fbe, 0xa2979735, 0xcc444488, 0x3917172e, 0x57c4c493, 0xf2a7a755, + 0x827e7efc, 0x473d3d7a, 0xac6464c8, 0xe75d5dba, 0x2b191932, 0x957373e6, + 0xa06060c0, 0x98818119, 0xd14f4f9e, 0x7fdcdca3, 0x66222244, 0x7e2a2a54, + 0xab90903b, 0x8388880b, 0xca46468c, 0x29eeeec7, 0xd3b8b86b, 0x3c141428, + 0x79dedea7, 0xe25e5ebc, 0x1d0b0b16, 0x76dbdbad, 0x3be0e0db, 0x56323264, + 0x4e3a3a74, 0x1e0a0a14, 0xdb494992, 0x0a06060c, 0x6c242448, 0xe45c5cb8, + 0x5dc2c29f, 0x6ed3d3bd, 0xefacac43, 0xa66262c4, 0xa8919139, 0xa4959531, + 0x37e4e4d3, 0x8b7979f2, 0x32e7e7d5, 0x43c8c88b, 0x5937376e, 0xb76d6dda, + 0x8c8d8d01, 0x64d5d5b1, 0xd24e4e9c, 0xe0a9a949, 0xb46c6cd8, 0xfa5656ac, + 0x07f4f4f3, 0x25eaeacf, 0xaf6565ca, 0x8e7a7af4, 0xe9aeae47, 0x18080810, + 0xd5baba6f, 0x887878f0, 0x6f25254a, 0x722e2e5c, 0x241c1c38, 0xf1a6a657, + 0xc7b4b473, 0x51c6c697, 0x23e8e8cb, 0x7cdddda1, 0x9c7474e8, 0x211f1f3e, + 0xdd4b4b96, 0xdcbdbd61, 0x868b8b0d, 0x858a8a0f, 0x907070e0, 0x423e3e7c, + 0xc4b5b571, 0xaa6666cc, 0xd8484890, 0x05030306, 0x01f6f6f7, 0x120e0e1c, + 0xa36161c2, 0x5f35356a, 0xf95757ae, 0xd0b9b969, 0x91868617, 0x58c1c199, + 0x271d1d3a, 0xb99e9e27, 0x38e1e1d9, 0x13f8f8eb, 0xb398982b, 0x33111122, + 0xbb6969d2, 0x70d9d9a9, 0x898e8e07, 0xa7949433, 0xb69b9b2d, 0x221e1e3c, + 0x92878715, 0x20e9e9c9, 0x49cece87, 0xff5555aa, 0x78282850, 0x7adfdfa5, + 0x8f8c8c03, 0xf8a1a159, 0x80898909, 0x170d0d1a, 0xdabfbf65, 0x31e6e6d7, + 0xc6424284, 0xb86868d0, 0xc3414182, 0xb0999929, 0x772d2d5a, 0x110f0f1e, + 0xcbb0b07b, 0xfc5454a8, 0xd6bbbb6d, 0x3a16162c, +}; +EXPORT_SYMBOL(aes_enc_tab); + +/* aes_dec_tab[i] contains InvMixColumn([InvSubByte(i), 0, 0, 0]). 
*/ +const u32 ____cacheline_aligned aes_dec_tab[256] = { + 0x50a7f451, 0x5365417e, 0xc3a4171a, 0x965e273a, 0xcb6bab3b, 0xf1459d1f, + 0xab58faac, 0x9303e34b, 0x55fa3020, 0xf66d76ad, 0x9176cc88, 0x254c02f5, + 0xfcd7e54f, 0xd7cb2ac5, 0x80443526, 0x8fa362b5, 0x495ab1de, 0x671bba25, + 0x980eea45, 0xe1c0fe5d, 0x02752fc3, 0x12f04c81, 0xa397468d, 0xc6f9d36b, + 0xe75f8f03, 0x959c9215, 0xeb7a6dbf, 0xda595295, 0x2d83bed4, 0xd3217458, + 0x2969e049, 0x44c8c98e, 0x6a89c275, 0x78798ef4, 0x6b3e5899, 0xdd71b927, + 0xb64fe1be, 0x17ad88f0, 0x66ac20c9, 0xb43ace7d, 0x184adf63, 0x82311ae5, + 0x60335197, 0x457f5362, 0xe07764b1, 0x84ae6bbb, 0x1ca081fe, 0x942b08f9, + 0x58684870, 0x19fd458f, 0x876cde94, 0xb7f87b52, 0x23d373ab, 0xe2024b72, + 0x578f1fe3, 0x2aab5566, 0x0728ebb2, 0x03c2b52f, 0x9a7bc586, 0xa50837d3, + 0xf2872830, 0xb2a5bf23, 0xba6a0302, 0x5c8216ed, 0x2b1ccf8a, 0x92b479a7, + 0xf0f207f3, 0xa1e2694e, 0xcdf4da65, 0xd5be0506, 0x1f6234d1, 0x8afea6c4, + 0x9d532e34, 0xa055f3a2, 0x32e18a05, 0x75ebf6a4, 0x39ec830b, 0xaaef6040, + 0x069f715e, 0x51106ebd, 0xf98a213e, 0x3d06dd96, 0xae053edd, 0x46bde64d, + 0xb58d5491, 0x055dc471, 0x6fd40604, 0xff155060, 0x24fb9819, 0x97e9bdd6, + 0xcc434089, 0x779ed967, 0xbd42e8b0, 0x888b8907, 0x385b19e7, 0xdbeec879, + 0x470a7ca1, 0xe90f427c, 0xc91e84f8, 0x00000000, 0x83868009, 0x48ed2b32, + 0xac70111e, 0x4e725a6c, 0xfbff0efd, 0x5638850f, 0x1ed5ae3d, 0x27392d36, + 0x64d90f0a, 0x21a65c68, 0xd1545b9b, 0x3a2e3624, 0xb1670a0c, 0x0fe75793, + 0xd296eeb4, 0x9e919b1b, 0x4fc5c080, 0xa220dc61, 0x694b775a, 0x161a121c, + 0x0aba93e2, 0xe52aa0c0, 0x43e0223c, 0x1d171b12, 0x0b0d090e, 0xadc78bf2, + 0xb9a8b62d, 0xc8a91e14, 0x8519f157, 0x4c0775af, 0xbbdd99ee, 0xfd607fa3, + 0x9f2601f7, 0xbcf5725c, 0xc53b6644, 0x347efb5b, 0x7629438b, 0xdcc623cb, + 0x68fcedb6, 0x63f1e4b8, 0xcadc31d7, 0x10856342, 0x40229713, 0x2011c684, + 0x7d244a85, 0xf83dbbd2, 0x1132f9ae, 0x6da129c7, 0x4b2f9e1d, 0xf330b2dc, + 0xec52860d, 0xd0e3c177, 0x6c16b32b, 0x99b970a9, 0xfa489411, 0x2264e947, + 0xc48cfca8, 0x1a3ff0a0, 0xd82c7d56, 0xef903322, 0xc74e4987, 0xc1d138d9, + 0xfea2ca8c, 0x360bd498, 0xcf81f5a6, 0x28de7aa5, 0x268eb7da, 0xa4bfad3f, + 0xe49d3a2c, 0x0d927850, 0x9bcc5f6a, 0x62467e54, 0xc2138df6, 0xe8b8d890, + 0x5ef7392e, 0xf5afc382, 0xbe805d9f, 0x7c93d069, 0xa92dd56f, 0xb31225cf, + 0x3b99acc8, 0xa77d1810, 0x6e639ce8, 0x7bbb3bdb, 0x097826cd, 0xf418596e, + 0x01b79aec, 0xa89a4f83, 0x656e95e6, 0x7ee6ffaa, 0x08cfbc21, 0xe6e815ef, + 0xd99be7ba, 0xce366f4a, 0xd4099fea, 0xd67cb029, 0xafb2a431, 0x31233f2a, + 0x3094a5c6, 0xc066a235, 0x37bc4e74, 0xa6ca82fc, 0xb0d090e0, 0x15d8a733, + 0x4a9804f1, 0xf7daec41, 0x0e50cd7f, 0x2ff69117, 0x8dd64d76, 0x4db0ef43, + 0x544daacc, 0xdf0496e4, 0xe3b5d19e, 0x1b886a4c, 0xb81f2cc1, 0x7f516546, + 0x04ea5e9d, 0x5d358c01, 0x737487fa, 0x2e410bfb, 0x5a1d67b3, 0x52d2db92, + 0x335610e9, 0x1347d66d, 0x8c61d79a, 0x7a0ca137, 0x8e14f859, 0x893c13eb, + 0xee27a9ce, 0x35c961b7, 0xede51ce1, 0x3cb1477a, 0x59dfd29c, 0x3f73f255, + 0x79ce1418, 0xbf37c773, 0xeacdf753, 0x5baafd5f, 0x146f3ddf, 0x86db4478, + 0x81f3afca, 0x3ec468b9, 0x2c342438, 0x5f40a3c2, 0x72c31d16, 0x0c25e2bc, + 0x8b493c28, 0x41950dff, 0x7101a839, 0xdeb30c08, 0x9ce4b4d8, 0x90c15664, + 0x6184cb7b, 0x70b632d5, 0x745c6c48, 0x4257b8d0, +}; +EXPORT_SYMBOL(aes_dec_tab); + +/* Prefetch data into L1 cache. @mem should be cacheline-aligned. 
*/ +static __always_inline void aes_prefetch(const void *mem, size_t len) +{ + for (size_t i = 0; i < len; i += L1_CACHE_BYTES) + *(volatile const u8 *)(mem + i); + barrier(); +} + static u32 mul_by_x(u32 w) { u32 x = w & 0x7f7f7f7f; @@ -169,38 +275,17 @@ static u32 subw(u32 in) (aes_sbox[(in >> 24) & 0xff] << 24); } -/** - * aes_expandkey - Expands the AES key as described in FIPS-197 - * @ctx: The location where the computed key will be stored. - * @in_key: The supplied key. - * @key_len: The length of the supplied key. - * - * Returns 0 on success. The function fails only if an invalid key size (or - * pointer) is supplied. - * The expanded key size is 240 bytes (max of 14 rounds with a unique 16 bytes - * key schedule plus a 16 bytes key which is used before the first round). - * The decryption key is prepared for the "Equivalent Inverse Cipher" as - * described in FIPS-197. The first slot (16 bytes) of each key (enc or dec) is - * for the initial combination, the second slot for the first round and so on. - */ -int aes_expandkey(struct crypto_aes_ctx *ctx, const u8 *in_key, - unsigned int key_len) +static void aes_expandkey_generic(u32 rndkeys[], u32 *inv_rndkeys, + const u8 *in_key, int key_len) { u32 kwords = key_len / sizeof(u32); u32 rc, i, j; - int err; - - err = aes_check_keylen(key_len); - if (err) - return err; - - ctx->key_length = key_len; for (i = 0; i < kwords; i++) - ctx->key_enc[i] = get_unaligned_le32(in_key + i * sizeof(u32)); + rndkeys[i] = get_unaligned_le32(&in_key[i * sizeof(u32)]); for (i = 0, rc = 1; i < 10; i++, rc = mul_by_x(rc)) { - u32 *rki = ctx->key_enc + (i * kwords); + u32 *rki = &rndkeys[i * kwords]; u32 *rko = rki + kwords; rko[0] = ror32(subw(rki[kwords - 1]), 8) ^ rc ^ rki[0]; @@ -229,34 +314,38 @@ int aes_expandkey(struct crypto_aes_ctx *ctx, const u8 *in_key, * the Inverse Mix Columns transformation to all but the first and * the last one. 
*/ - ctx->key_dec[0] = ctx->key_enc[key_len + 24]; - ctx->key_dec[1] = ctx->key_enc[key_len + 25]; - ctx->key_dec[2] = ctx->key_enc[key_len + 26]; - ctx->key_dec[3] = ctx->key_enc[key_len + 27]; - - for (i = 4, j = key_len + 20; j > 0; i += 4, j -= 4) { - ctx->key_dec[i] = inv_mix_columns(ctx->key_enc[j]); - ctx->key_dec[i + 1] = inv_mix_columns(ctx->key_enc[j + 1]); - ctx->key_dec[i + 2] = inv_mix_columns(ctx->key_enc[j + 2]); - ctx->key_dec[i + 3] = inv_mix_columns(ctx->key_enc[j + 3]); - } + if (inv_rndkeys) { + inv_rndkeys[0] = rndkeys[key_len + 24]; + inv_rndkeys[1] = rndkeys[key_len + 25]; + inv_rndkeys[2] = rndkeys[key_len + 26]; + inv_rndkeys[3] = rndkeys[key_len + 27]; + + for (i = 4, j = key_len + 20; j > 0; i += 4, j -= 4) { + inv_rndkeys[i] = inv_mix_columns(rndkeys[j]); + inv_rndkeys[i + 1] = inv_mix_columns(rndkeys[j + 1]); + inv_rndkeys[i + 2] = inv_mix_columns(rndkeys[j + 2]); + inv_rndkeys[i + 3] = inv_mix_columns(rndkeys[j + 3]); + } - ctx->key_dec[i] = ctx->key_enc[0]; - ctx->key_dec[i + 1] = ctx->key_enc[1]; - ctx->key_dec[i + 2] = ctx->key_enc[2]; - ctx->key_dec[i + 3] = ctx->key_enc[3]; + inv_rndkeys[i] = rndkeys[0]; + inv_rndkeys[i + 1] = rndkeys[1]; + inv_rndkeys[i + 2] = rndkeys[2]; + inv_rndkeys[i + 3] = rndkeys[3]; + } +} +int aes_expandkey(struct crypto_aes_ctx *ctx, const u8 *in_key, + unsigned int key_len) +{ + if (aes_check_keylen(key_len) != 0) + return -EINVAL; + ctx->key_length = key_len; + aes_expandkey_generic(ctx->key_enc, ctx->key_dec, in_key, key_len); return 0; } EXPORT_SYMBOL(aes_expandkey); -/** - * aes_encrypt - Encrypt a single AES block - * @ctx: Context struct containing the key schedule - * @out: Buffer to store the ciphertext - * @in: Buffer containing the plaintext - */ -void aes_encrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in) +void aes_encrypt_old(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in) { const u32 *rkp = ctx->key_enc + 4; int rounds = 6 + ctx->key_length / 4; @@ -299,15 +388,117 @@ void aes_encrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in) put_unaligned_le32(subshift(st1, 2) ^ rkp[6], out + 8); put_unaligned_le32(subshift(st1, 3) ^ rkp[7], out + 12); } -EXPORT_SYMBOL(aes_encrypt); +EXPORT_SYMBOL(aes_encrypt_old); -/** - * aes_decrypt - Decrypt a single AES block - * @ctx: Context struct containing the key schedule - * @out: Buffer to store the plaintext - * @in: Buffer containing the ciphertext - */ -void aes_decrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in) +static __always_inline u32 enc_quarterround(const u32 w[4], int i, u32 rk) +{ + return rk ^ aes_enc_tab[(u8)w[i]] ^ + rol32(aes_enc_tab[(u8)(w[(i + 1) % 4] >> 8)], 8) ^ + rol32(aes_enc_tab[(u8)(w[(i + 2) % 4] >> 16)], 16) ^ + rol32(aes_enc_tab[(u8)(w[(i + 3) % 4] >> 24)], 24); +} + +static __always_inline u32 enclast_quarterround(const u32 w[4], int i, u32 rk) +{ + return rk ^ ((aes_enc_tab[(u8)w[i]] & 0x0000ff00) >> 8) ^ + (aes_enc_tab[(u8)(w[(i + 1) % 4] >> 8)] & 0x0000ff00) ^ + ((aes_enc_tab[(u8)(w[(i + 2) % 4] >> 16)] & 0x0000ff00) << 8) ^ + ((aes_enc_tab[(u8)(w[(i + 3) % 4] >> 24)] & 0x0000ff00) << 16); +} + +static void __maybe_unused aes_encrypt_generic(const u32 rndkeys[], int nrounds, + u8 out[AES_BLOCK_SIZE], + const u8 in[AES_BLOCK_SIZE]) +{ + const u32 *rkp = rndkeys; + int n = nrounds - 1; + u32 w[4]; + + w[0] = get_unaligned_le32(&in[0]) ^ *rkp++; + w[1] = get_unaligned_le32(&in[4]) ^ *rkp++; + w[2] = get_unaligned_le32(&in[8]) ^ *rkp++; + w[3] = get_unaligned_le32(&in[12]) ^ *rkp++; + + /* + * Prefetch the 
table before doing data and key-dependent loads from it. + * + * This is intended only as a basic constant-time hardening measure that + * avoids interfering with performance too much. Its effectiveness is + * not guaranteed. For proper constant-time AES, a CPU that supports + * AES instructions should be used instead. + */ + aes_prefetch(aes_enc_tab, sizeof(aes_enc_tab)); + + do { + u32 w0 = enc_quarterround(w, 0, *rkp++); + u32 w1 = enc_quarterround(w, 1, *rkp++); + u32 w2 = enc_quarterround(w, 2, *rkp++); + u32 w3 = enc_quarterround(w, 3, *rkp++); + + w[0] = w0; + w[1] = w1; + w[2] = w2; + w[3] = w3; + } while (--n); + + put_unaligned_le32(enclast_quarterround(w, 0, *rkp++), &out[0]); + put_unaligned_le32(enclast_quarterround(w, 1, *rkp++), &out[4]); + put_unaligned_le32(enclast_quarterround(w, 2, *rkp++), &out[8]); + put_unaligned_le32(enclast_quarterround(w, 3, *rkp++), &out[12]); +} + +static __always_inline u32 dec_quarterround(const u32 w[4], int i, u32 rk) +{ + return rk ^ aes_dec_tab[(u8)w[i]] ^ + rol32(aes_dec_tab[(u8)(w[(i + 3) % 4] >> 8)], 8) ^ + rol32(aes_dec_tab[(u8)(w[(i + 2) % 4] >> 16)], 16) ^ + rol32(aes_dec_tab[(u8)(w[(i + 1) % 4] >> 24)], 24); +} + +static __always_inline u32 declast_quarterround(const u32 w[4], int i, u32 rk) +{ + return rk ^ aes_inv_sbox[(u8)w[i]] ^ + ((u32)aes_inv_sbox[(u8)(w[(i + 3) % 4] >> 8)] << 8) ^ + ((u32)aes_inv_sbox[(u8)(w[(i + 2) % 4] >> 16)] << 16) ^ + ((u32)aes_inv_sbox[(u8)(w[(i + 1) % 4] >> 24)] << 24); +} + +static void __maybe_unused aes_decrypt_generic(const u32 inv_rndkeys[], + int nrounds, + u8 out[AES_BLOCK_SIZE], + const u8 in[AES_BLOCK_SIZE]) +{ + const u32 *rkp = inv_rndkeys; + int n = nrounds - 1; + u32 w[4]; + + w[0] = get_unaligned_le32(&in[0]) ^ *rkp++; + w[1] = get_unaligned_le32(&in[4]) ^ *rkp++; + w[2] = get_unaligned_le32(&in[8]) ^ *rkp++; + w[3] = get_unaligned_le32(&in[12]) ^ *rkp++; + + aes_prefetch(aes_dec_tab, sizeof(aes_dec_tab)); + + do { + u32 w0 = dec_quarterround(w, 0, *rkp++); + u32 w1 = dec_quarterround(w, 1, *rkp++); + u32 w2 = dec_quarterround(w, 2, *rkp++); + u32 w3 = dec_quarterround(w, 3, *rkp++); + + w[0] = w0; + w[1] = w1; + w[2] = w2; + w[3] = w3; + } while (--n); + + aes_prefetch((const void *)aes_inv_sbox, sizeof(aes_inv_sbox)); + put_unaligned_le32(declast_quarterround(w, 0, *rkp++), &out[0]); + put_unaligned_le32(declast_quarterround(w, 1, *rkp++), &out[4]); + put_unaligned_le32(declast_quarterround(w, 2, *rkp++), &out[8]); + put_unaligned_le32(declast_quarterround(w, 3, *rkp++), &out[12]); +} + +void aes_decrypt_old(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in) { const u32 *rkp = ctx->key_dec + 4; int rounds = 6 + ctx->key_length / 4; @@ -350,8 +541,102 @@ void aes_decrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in) put_unaligned_le32(inv_subshift(st1, 2) ^ rkp[6], out + 8); put_unaligned_le32(inv_subshift(st1, 3) ^ rkp[7], out + 12); } -EXPORT_SYMBOL(aes_decrypt); +EXPORT_SYMBOL(aes_decrypt_old); + +/* + * Note: the aes_prepare*key_* names reflect the fact that the implementation + * might not actually expand the key. (The s390 code for example doesn't.) + * Where the key is expanded we use the more specific names aes_expandkey_*. + * + * aes_preparekey_arch() is passed an optional pointer 'inv_k' which points to + * the area to store the prepared decryption key. It will be NULL if the user + * is requesting encryption-only. aes_preparekey_arch() is also passed a valid + * 'key_len' and 'nrounds', corresponding to AES-128, AES-192, or AES-256. 
+ */ +#ifdef CONFIG_CRYPTO_LIB_AES_ARCH +/* An arch-specific implementation of AES is available. Include it. */ +#include "aes.h" /* $(SRCARCH)/aes.h */ +#else +/* No arch-specific implementation of AES is available. Use generic code. */ + +static void aes_preparekey_arch(union aes_enckey_arch *k, + union aes_invkey_arch *inv_k, + const u8 *in_key, int key_len, int nrounds) +{ + aes_expandkey_generic(k->rndkeys, inv_k ? inv_k->inv_rndkeys : NULL, + in_key, key_len); +} + +static void aes_encrypt_arch(const struct aes_enckey *key, + u8 out[AES_BLOCK_SIZE], + const u8 in[AES_BLOCK_SIZE]) +{ + aes_encrypt_generic(key->k.rndkeys, key->nrounds, out, in); +} + +static void aes_decrypt_arch(const struct aes_key *key, + u8 out[AES_BLOCK_SIZE], + const u8 in[AES_BLOCK_SIZE]) +{ + aes_decrypt_generic(key->inv_k.inv_rndkeys, key->nrounds, out, in); +} +#endif + +static int __aes_preparekey(struct aes_enckey *enc_key, + union aes_invkey_arch *inv_k, + const u8 *in_key, size_t key_len) +{ + if (aes_check_keylen(key_len) != 0) + return -EINVAL; + enc_key->len = key_len; + enc_key->nrounds = 6 + key_len / 4; + aes_preparekey_arch(&enc_key->k, inv_k, in_key, key_len, + enc_key->nrounds); + return 0; +} + +int aes_preparekey(struct aes_key *key, const u8 *in_key, size_t key_len) +{ + return __aes_preparekey((struct aes_enckey *)key, &key->inv_k, + in_key, key_len); +} +EXPORT_SYMBOL(aes_preparekey); + +int aes_prepareenckey(struct aes_enckey *key, const u8 *in_key, size_t key_len) +{ + return __aes_preparekey(key, NULL, in_key, key_len); +} +EXPORT_SYMBOL(aes_prepareenckey); + +void aes_encrypt_new(aes_encrypt_arg key, u8 out[AES_BLOCK_SIZE], + const u8 in[AES_BLOCK_SIZE]) +{ + aes_encrypt_arch(key.enc_key, out, in); +} +EXPORT_SYMBOL(aes_encrypt_new); + +void aes_decrypt_new(const struct aes_key *key, u8 out[AES_BLOCK_SIZE], + const u8 in[AES_BLOCK_SIZE]) +{ + aes_decrypt_arch(key, out, in); +} +EXPORT_SYMBOL(aes_decrypt_new); + +#ifdef aes_mod_init_arch +static int __init aes_mod_init(void) +{ + aes_mod_init_arch(); + return 0; +} +subsys_initcall(aes_mod_init); + +static void __exit aes_mod_exit(void) +{ +} +module_exit(aes_mod_exit); +#endif -MODULE_DESCRIPTION("Generic AES library"); +MODULE_DESCRIPTION("AES block cipher"); MODULE_AUTHOR("Ard Biesheuvel "); +MODULE_AUTHOR("Eric Biggers "); MODULE_LICENSE("GPL v2"); -- cgit v1.2.3 From a2484474272ef98d9580d8c610b0f7c6ed2f146c Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Mon, 12 Jan 2026 11:20:07 -0800 Subject: crypto: aes - Replace aes-generic with wrapper around lib Now that the AES library's performance has been improved, replace aes_generic.c with a new file aes.c which wraps the AES library. In preparation for making the AES library actually utilize the kernel's existing architecture-optimized AES code including AES instructions, set the driver name to "aes-lib" instead of "aes-generic". This mirrors what's been done for the hash algorithms. Update testmgr.c accordingly. Since this removes the crypto_aes_set_key() helper function, add temporary replacements for it to arch/arm/crypto/aes-cipher-glue.c and arch/arm64/crypto/aes-cipher-glue.c. This is temporary, as that code will be migrated into lib/crypto/ in later commits. 
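For reference, the new crypto/aes.c wrapper is a thin layer over the
prepared-key library API introduced earlier in this series. A minimal
sketch of the underlying library calls (raw_key, pt, and ct are
hypothetical caller-provided buffers; aes_preparekey() can fail only
with -EINVAL on a bad key length):

	struct aes_key key;
	u8 ct[AES_BLOCK_SIZE];

	if (aes_preparekey(&key, raw_key, AES_KEYSIZE_256))
		return -EINVAL;
	aes_encrypt(&key, ct, pt);	/* encrypt one 16-byte block */
	aes_decrypt(&key, pt, ct);	/* decrypt it back into pt */
	memzero_explicit(&key, sizeof(key));	/* caller must zeroize */
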
Acked-by: Ard Biesheuvel Link: https://lore.kernel.org/r/20260112192035.10427-10-ebiggers@kernel.org Signed-off-by: Eric Biggers --- arch/arm/crypto/aes-cipher-glue.c | 10 +- arch/arm64/crypto/aes-cipher-glue.c | 10 +- crypto/Makefile | 3 +- crypto/aes.c | 66 ++ crypto/aes_generic.c | 1320 ---------------------------------- crypto/crypto_user.c | 2 +- crypto/testmgr.c | 43 +- drivers/crypto/starfive/jh7110-aes.c | 10 +- include/crypto/aes.h | 6 - 9 files changed, 117 insertions(+), 1353 deletions(-) create mode 100644 crypto/aes.c delete mode 100644 crypto/aes_generic.c (limited to 'include') diff --git a/arch/arm/crypto/aes-cipher-glue.c b/arch/arm/crypto/aes-cipher-glue.c index 29efb7833960..f302db808cd3 100644 --- a/arch/arm/crypto/aes-cipher-glue.c +++ b/arch/arm/crypto/aes-cipher-glue.c @@ -14,6 +14,14 @@ EXPORT_SYMBOL_GPL(__aes_arm_encrypt); EXPORT_SYMBOL_GPL(__aes_arm_decrypt); +static int aes_arm_setkey(struct crypto_tfm *tfm, const u8 *in_key, + unsigned int key_len) +{ + struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm); + + return aes_expandkey(ctx, in_key, key_len); +} + static void aes_arm_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in) { struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm); @@ -41,7 +49,7 @@ static struct crypto_alg aes_alg = { .cra_cipher.cia_min_keysize = AES_MIN_KEY_SIZE, .cra_cipher.cia_max_keysize = AES_MAX_KEY_SIZE, - .cra_cipher.cia_setkey = crypto_aes_set_key, + .cra_cipher.cia_setkey = aes_arm_setkey, .cra_cipher.cia_encrypt = aes_arm_encrypt, .cra_cipher.cia_decrypt = aes_arm_decrypt, diff --git a/arch/arm64/crypto/aes-cipher-glue.c b/arch/arm64/crypto/aes-cipher-glue.c index 4ec55e568941..9b27cbac278b 100644 --- a/arch/arm64/crypto/aes-cipher-glue.c +++ b/arch/arm64/crypto/aes-cipher-glue.c @@ -12,6 +12,14 @@ asmlinkage void __aes_arm64_encrypt(u32 *rk, u8 *out, const u8 *in, int rounds); asmlinkage void __aes_arm64_decrypt(u32 *rk, u8 *out, const u8 *in, int rounds); +static int aes_arm64_setkey(struct crypto_tfm *tfm, const u8 *in_key, + unsigned int key_len) +{ + struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm); + + return aes_expandkey(ctx, in_key, key_len); +} + static void aes_arm64_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in) { struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm); @@ -39,7 +47,7 @@ static struct crypto_alg aes_alg = { .cra_cipher.cia_min_keysize = AES_MIN_KEY_SIZE, .cra_cipher.cia_max_keysize = AES_MAX_KEY_SIZE, - .cra_cipher.cia_setkey = crypto_aes_set_key, + .cra_cipher.cia_setkey = aes_arm64_setkey, .cra_cipher.cia_encrypt = aes_arm64_encrypt, .cra_cipher.cia_decrypt = aes_arm64_decrypt }; diff --git a/crypto/Makefile b/crypto/Makefile index be403dc20645..65a2c3478814 100644 --- a/crypto/Makefile +++ b/crypto/Makefile @@ -130,8 +130,7 @@ obj-$(CONFIG_CRYPTO_TWOFISH) += twofish_generic.o obj-$(CONFIG_CRYPTO_TWOFISH_COMMON) += twofish_common.o obj-$(CONFIG_CRYPTO_SERPENT) += serpent_generic.o CFLAGS_serpent_generic.o := $(call cc-option,-fsched-pressure) # https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79149 -obj-$(CONFIG_CRYPTO_AES) += aes_generic.o -CFLAGS_aes_generic.o := $(call cc-option,-fno-code-hoisting) # https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83356 +obj-$(CONFIG_CRYPTO_AES) += aes.o obj-$(CONFIG_CRYPTO_SM4) += sm4.o obj-$(CONFIG_CRYPTO_SM4_GENERIC) += sm4_generic.o obj-$(CONFIG_CRYPTO_CAMELLIA) += camellia_generic.o diff --git a/crypto/aes.c b/crypto/aes.c new file mode 100644 index 000000000000..ae8385df0ce5 --- /dev/null +++ b/crypto/aes.c @@ -0,0 +1,66 @@ +// SPDX-License-Identifier: GPL-2.0-or-later 
+/*
+ * Crypto API support for AES block cipher
+ *
+ * Copyright 2026 Google LLC
+ */
+
+#include <crypto/aes.h>
+#include <crypto/algapi.h>
+#include <linux/module.h>
+
+static_assert(__alignof__(struct aes_key) <= CRYPTO_MINALIGN);
+
+static int crypto_aes_setkey(struct crypto_tfm *tfm, const u8 *in_key,
+			     unsigned int key_len)
+{
+	struct aes_key *key = crypto_tfm_ctx(tfm);
+
+	return aes_preparekey(key, in_key, key_len);
+}
+
+static void crypto_aes_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
+{
+	const struct aes_key *key = crypto_tfm_ctx(tfm);
+
+	aes_encrypt(key, out, in);
+}
+
+static void crypto_aes_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
+{
+	const struct aes_key *key = crypto_tfm_ctx(tfm);
+
+	aes_decrypt(key, out, in);
+}
+
+static struct crypto_alg alg = {
+	.cra_name		= "aes",
+	.cra_driver_name	= "aes-lib",
+	.cra_priority		= 100,
+	.cra_flags		= CRYPTO_ALG_TYPE_CIPHER,
+	.cra_blocksize		= AES_BLOCK_SIZE,
+	.cra_ctxsize		= sizeof(struct aes_key),
+	.cra_module		= THIS_MODULE,
+	.cra_u			= { .cipher = { .cia_min_keysize = AES_MIN_KEY_SIZE,
+					.cia_max_keysize = AES_MAX_KEY_SIZE,
+					.cia_setkey = crypto_aes_setkey,
+					.cia_encrypt = crypto_aes_encrypt,
+					.cia_decrypt = crypto_aes_decrypt } }
+};
+
+static int __init crypto_aes_mod_init(void)
+{
+	return crypto_register_alg(&alg);
+}
+module_init(crypto_aes_mod_init);
+
+static void __exit crypto_aes_mod_exit(void)
+{
+	crypto_unregister_alg(&alg);
+}
+module_exit(crypto_aes_mod_exit);
+
+MODULE_DESCRIPTION("Crypto API support for AES block cipher");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_CRYPTO("aes");
+MODULE_ALIAS_CRYPTO("aes-lib");
diff --git a/crypto/aes_generic.c b/crypto/aes_generic.c
deleted file mode 100644
index 85d2e78c8ef2..000000000000
--- a/crypto/aes_generic.c
+++ /dev/null
@@ -1,1320 +0,0 @@
-/*
- * Cryptographic API.
- *
- * AES Cipher Algorithm.
- *
- * Based on Brian Gladman's code.
- *
- * Linux developers:
- *  Alexander Kjeldaas
- *  Herbert Valerio Riedel
- *  Kyle McMartin
- *  Adam J. Richter (conversion to 2.5 API).
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * ---------------------------------------------------------------------------
- * Copyright (c) 2002, Dr Brian Gladman, Worcester, UK.
- * All rights reserved.
- *
- * LICENSE TERMS
- *
- * The free distribution and use of this software in both source and binary
- * form is allowed (with or without changes) provided that:
- *
- *   1. distributions of this source code include the above copyright
- *      notice, this list of conditions and the following disclaimer;
- *
- *   2. distributions in binary form include the above copyright
- *      notice, this list of conditions and the following disclaimer
- *      in the documentation and/or other associated materials;
- *
- *   3. the copyright holder's name is not used to endorse products
- *      built using this software without specific written permission.
- *
- * ALTERNATIVELY, provided that this notice is retained in full, this product
- * may be distributed under the terms of the GNU General Public License (GPL),
- * in which case the provisions of the GPL apply INSTEAD OF those given above.
- *
- * DISCLAIMER
- *
- * This software is provided 'as is' with no explicit or implied warranties
- * in respect of its properties, including, but not limited to, correctness
- * and/or fitness for purpose.
- * --------------------------------------------------------------------------- - */ - -#include -#include -#include -#include -#include -#include -#include -#include - -static inline u8 byte(const u32 x, const unsigned n) -{ - return x >> (n << 3); -} - -/* cacheline-aligned to facilitate prefetching into cache */ -__visible const u32 crypto_ft_tab[4][256] ____cacheline_aligned = { - { - 0xa56363c6, 0x847c7cf8, 0x997777ee, 0x8d7b7bf6, - 0x0df2f2ff, 0xbd6b6bd6, 0xb16f6fde, 0x54c5c591, - 0x50303060, 0x03010102, 0xa96767ce, 0x7d2b2b56, - 0x19fefee7, 0x62d7d7b5, 0xe6abab4d, 0x9a7676ec, - 0x45caca8f, 0x9d82821f, 0x40c9c989, 0x877d7dfa, - 0x15fafaef, 0xeb5959b2, 0xc947478e, 0x0bf0f0fb, - 0xecadad41, 0x67d4d4b3, 0xfda2a25f, 0xeaafaf45, - 0xbf9c9c23, 0xf7a4a453, 0x967272e4, 0x5bc0c09b, - 0xc2b7b775, 0x1cfdfde1, 0xae93933d, 0x6a26264c, - 0x5a36366c, 0x413f3f7e, 0x02f7f7f5, 0x4fcccc83, - 0x5c343468, 0xf4a5a551, 0x34e5e5d1, 0x08f1f1f9, - 0x937171e2, 0x73d8d8ab, 0x53313162, 0x3f15152a, - 0x0c040408, 0x52c7c795, 0x65232346, 0x5ec3c39d, - 0x28181830, 0xa1969637, 0x0f05050a, 0xb59a9a2f, - 0x0907070e, 0x36121224, 0x9b80801b, 0x3de2e2df, - 0x26ebebcd, 0x6927274e, 0xcdb2b27f, 0x9f7575ea, - 0x1b090912, 0x9e83831d, 0x742c2c58, 0x2e1a1a34, - 0x2d1b1b36, 0xb26e6edc, 0xee5a5ab4, 0xfba0a05b, - 0xf65252a4, 0x4d3b3b76, 0x61d6d6b7, 0xceb3b37d, - 0x7b292952, 0x3ee3e3dd, 0x712f2f5e, 0x97848413, - 0xf55353a6, 0x68d1d1b9, 0x00000000, 0x2cededc1, - 0x60202040, 0x1ffcfce3, 0xc8b1b179, 0xed5b5bb6, - 0xbe6a6ad4, 0x46cbcb8d, 0xd9bebe67, 0x4b393972, - 0xde4a4a94, 0xd44c4c98, 0xe85858b0, 0x4acfcf85, - 0x6bd0d0bb, 0x2aefefc5, 0xe5aaaa4f, 0x16fbfbed, - 0xc5434386, 0xd74d4d9a, 0x55333366, 0x94858511, - 0xcf45458a, 0x10f9f9e9, 0x06020204, 0x817f7ffe, - 0xf05050a0, 0x443c3c78, 0xba9f9f25, 0xe3a8a84b, - 0xf35151a2, 0xfea3a35d, 0xc0404080, 0x8a8f8f05, - 0xad92923f, 0xbc9d9d21, 0x48383870, 0x04f5f5f1, - 0xdfbcbc63, 0xc1b6b677, 0x75dadaaf, 0x63212142, - 0x30101020, 0x1affffe5, 0x0ef3f3fd, 0x6dd2d2bf, - 0x4ccdcd81, 0x140c0c18, 0x35131326, 0x2fececc3, - 0xe15f5fbe, 0xa2979735, 0xcc444488, 0x3917172e, - 0x57c4c493, 0xf2a7a755, 0x827e7efc, 0x473d3d7a, - 0xac6464c8, 0xe75d5dba, 0x2b191932, 0x957373e6, - 0xa06060c0, 0x98818119, 0xd14f4f9e, 0x7fdcdca3, - 0x66222244, 0x7e2a2a54, 0xab90903b, 0x8388880b, - 0xca46468c, 0x29eeeec7, 0xd3b8b86b, 0x3c141428, - 0x79dedea7, 0xe25e5ebc, 0x1d0b0b16, 0x76dbdbad, - 0x3be0e0db, 0x56323264, 0x4e3a3a74, 0x1e0a0a14, - 0xdb494992, 0x0a06060c, 0x6c242448, 0xe45c5cb8, - 0x5dc2c29f, 0x6ed3d3bd, 0xefacac43, 0xa66262c4, - 0xa8919139, 0xa4959531, 0x37e4e4d3, 0x8b7979f2, - 0x32e7e7d5, 0x43c8c88b, 0x5937376e, 0xb76d6dda, - 0x8c8d8d01, 0x64d5d5b1, 0xd24e4e9c, 0xe0a9a949, - 0xb46c6cd8, 0xfa5656ac, 0x07f4f4f3, 0x25eaeacf, - 0xaf6565ca, 0x8e7a7af4, 0xe9aeae47, 0x18080810, - 0xd5baba6f, 0x887878f0, 0x6f25254a, 0x722e2e5c, - 0x241c1c38, 0xf1a6a657, 0xc7b4b473, 0x51c6c697, - 0x23e8e8cb, 0x7cdddda1, 0x9c7474e8, 0x211f1f3e, - 0xdd4b4b96, 0xdcbdbd61, 0x868b8b0d, 0x858a8a0f, - 0x907070e0, 0x423e3e7c, 0xc4b5b571, 0xaa6666cc, - 0xd8484890, 0x05030306, 0x01f6f6f7, 0x120e0e1c, - 0xa36161c2, 0x5f35356a, 0xf95757ae, 0xd0b9b969, - 0x91868617, 0x58c1c199, 0x271d1d3a, 0xb99e9e27, - 0x38e1e1d9, 0x13f8f8eb, 0xb398982b, 0x33111122, - 0xbb6969d2, 0x70d9d9a9, 0x898e8e07, 0xa7949433, - 0xb69b9b2d, 0x221e1e3c, 0x92878715, 0x20e9e9c9, - 0x49cece87, 0xff5555aa, 0x78282850, 0x7adfdfa5, - 0x8f8c8c03, 0xf8a1a159, 0x80898909, 0x170d0d1a, - 0xdabfbf65, 0x31e6e6d7, 0xc6424284, 0xb86868d0, - 0xc3414182, 0xb0999929, 0x772d2d5a, 0x110f0f1e, - 0xcbb0b07b, 
0xfc5454a8, 0xd6bbbb6d, 0x3a16162c, - }, { - 0x6363c6a5, 0x7c7cf884, 0x7777ee99, 0x7b7bf68d, - 0xf2f2ff0d, 0x6b6bd6bd, 0x6f6fdeb1, 0xc5c59154, - 0x30306050, 0x01010203, 0x6767cea9, 0x2b2b567d, - 0xfefee719, 0xd7d7b562, 0xabab4de6, 0x7676ec9a, - 0xcaca8f45, 0x82821f9d, 0xc9c98940, 0x7d7dfa87, - 0xfafaef15, 0x5959b2eb, 0x47478ec9, 0xf0f0fb0b, - 0xadad41ec, 0xd4d4b367, 0xa2a25ffd, 0xafaf45ea, - 0x9c9c23bf, 0xa4a453f7, 0x7272e496, 0xc0c09b5b, - 0xb7b775c2, 0xfdfde11c, 0x93933dae, 0x26264c6a, - 0x36366c5a, 0x3f3f7e41, 0xf7f7f502, 0xcccc834f, - 0x3434685c, 0xa5a551f4, 0xe5e5d134, 0xf1f1f908, - 0x7171e293, 0xd8d8ab73, 0x31316253, 0x15152a3f, - 0x0404080c, 0xc7c79552, 0x23234665, 0xc3c39d5e, - 0x18183028, 0x969637a1, 0x05050a0f, 0x9a9a2fb5, - 0x07070e09, 0x12122436, 0x80801b9b, 0xe2e2df3d, - 0xebebcd26, 0x27274e69, 0xb2b27fcd, 0x7575ea9f, - 0x0909121b, 0x83831d9e, 0x2c2c5874, 0x1a1a342e, - 0x1b1b362d, 0x6e6edcb2, 0x5a5ab4ee, 0xa0a05bfb, - 0x5252a4f6, 0x3b3b764d, 0xd6d6b761, 0xb3b37dce, - 0x2929527b, 0xe3e3dd3e, 0x2f2f5e71, 0x84841397, - 0x5353a6f5, 0xd1d1b968, 0x00000000, 0xededc12c, - 0x20204060, 0xfcfce31f, 0xb1b179c8, 0x5b5bb6ed, - 0x6a6ad4be, 0xcbcb8d46, 0xbebe67d9, 0x3939724b, - 0x4a4a94de, 0x4c4c98d4, 0x5858b0e8, 0xcfcf854a, - 0xd0d0bb6b, 0xefefc52a, 0xaaaa4fe5, 0xfbfbed16, - 0x434386c5, 0x4d4d9ad7, 0x33336655, 0x85851194, - 0x45458acf, 0xf9f9e910, 0x02020406, 0x7f7ffe81, - 0x5050a0f0, 0x3c3c7844, 0x9f9f25ba, 0xa8a84be3, - 0x5151a2f3, 0xa3a35dfe, 0x404080c0, 0x8f8f058a, - 0x92923fad, 0x9d9d21bc, 0x38387048, 0xf5f5f104, - 0xbcbc63df, 0xb6b677c1, 0xdadaaf75, 0x21214263, - 0x10102030, 0xffffe51a, 0xf3f3fd0e, 0xd2d2bf6d, - 0xcdcd814c, 0x0c0c1814, 0x13132635, 0xececc32f, - 0x5f5fbee1, 0x979735a2, 0x444488cc, 0x17172e39, - 0xc4c49357, 0xa7a755f2, 0x7e7efc82, 0x3d3d7a47, - 0x6464c8ac, 0x5d5dbae7, 0x1919322b, 0x7373e695, - 0x6060c0a0, 0x81811998, 0x4f4f9ed1, 0xdcdca37f, - 0x22224466, 0x2a2a547e, 0x90903bab, 0x88880b83, - 0x46468cca, 0xeeeec729, 0xb8b86bd3, 0x1414283c, - 0xdedea779, 0x5e5ebce2, 0x0b0b161d, 0xdbdbad76, - 0xe0e0db3b, 0x32326456, 0x3a3a744e, 0x0a0a141e, - 0x494992db, 0x06060c0a, 0x2424486c, 0x5c5cb8e4, - 0xc2c29f5d, 0xd3d3bd6e, 0xacac43ef, 0x6262c4a6, - 0x919139a8, 0x959531a4, 0xe4e4d337, 0x7979f28b, - 0xe7e7d532, 0xc8c88b43, 0x37376e59, 0x6d6ddab7, - 0x8d8d018c, 0xd5d5b164, 0x4e4e9cd2, 0xa9a949e0, - 0x6c6cd8b4, 0x5656acfa, 0xf4f4f307, 0xeaeacf25, - 0x6565caaf, 0x7a7af48e, 0xaeae47e9, 0x08081018, - 0xbaba6fd5, 0x7878f088, 0x25254a6f, 0x2e2e5c72, - 0x1c1c3824, 0xa6a657f1, 0xb4b473c7, 0xc6c69751, - 0xe8e8cb23, 0xdddda17c, 0x7474e89c, 0x1f1f3e21, - 0x4b4b96dd, 0xbdbd61dc, 0x8b8b0d86, 0x8a8a0f85, - 0x7070e090, 0x3e3e7c42, 0xb5b571c4, 0x6666ccaa, - 0x484890d8, 0x03030605, 0xf6f6f701, 0x0e0e1c12, - 0x6161c2a3, 0x35356a5f, 0x5757aef9, 0xb9b969d0, - 0x86861791, 0xc1c19958, 0x1d1d3a27, 0x9e9e27b9, - 0xe1e1d938, 0xf8f8eb13, 0x98982bb3, 0x11112233, - 0x6969d2bb, 0xd9d9a970, 0x8e8e0789, 0x949433a7, - 0x9b9b2db6, 0x1e1e3c22, 0x87871592, 0xe9e9c920, - 0xcece8749, 0x5555aaff, 0x28285078, 0xdfdfa57a, - 0x8c8c038f, 0xa1a159f8, 0x89890980, 0x0d0d1a17, - 0xbfbf65da, 0xe6e6d731, 0x424284c6, 0x6868d0b8, - 0x414182c3, 0x999929b0, 0x2d2d5a77, 0x0f0f1e11, - 0xb0b07bcb, 0x5454a8fc, 0xbbbb6dd6, 0x16162c3a, - }, { - 0x63c6a563, 0x7cf8847c, 0x77ee9977, 0x7bf68d7b, - 0xf2ff0df2, 0x6bd6bd6b, 0x6fdeb16f, 0xc59154c5, - 0x30605030, 0x01020301, 0x67cea967, 0x2b567d2b, - 0xfee719fe, 0xd7b562d7, 0xab4de6ab, 0x76ec9a76, - 0xca8f45ca, 0x821f9d82, 0xc98940c9, 0x7dfa877d, - 0xfaef15fa, 0x59b2eb59, 0x478ec947, 0xf0fb0bf0, - 
0xad41ecad, 0xd4b367d4, 0xa25ffda2, 0xaf45eaaf, - 0x9c23bf9c, 0xa453f7a4, 0x72e49672, 0xc09b5bc0, - 0xb775c2b7, 0xfde11cfd, 0x933dae93, 0x264c6a26, - 0x366c5a36, 0x3f7e413f, 0xf7f502f7, 0xcc834fcc, - 0x34685c34, 0xa551f4a5, 0xe5d134e5, 0xf1f908f1, - 0x71e29371, 0xd8ab73d8, 0x31625331, 0x152a3f15, - 0x04080c04, 0xc79552c7, 0x23466523, 0xc39d5ec3, - 0x18302818, 0x9637a196, 0x050a0f05, 0x9a2fb59a, - 0x070e0907, 0x12243612, 0x801b9b80, 0xe2df3de2, - 0xebcd26eb, 0x274e6927, 0xb27fcdb2, 0x75ea9f75, - 0x09121b09, 0x831d9e83, 0x2c58742c, 0x1a342e1a, - 0x1b362d1b, 0x6edcb26e, 0x5ab4ee5a, 0xa05bfba0, - 0x52a4f652, 0x3b764d3b, 0xd6b761d6, 0xb37dceb3, - 0x29527b29, 0xe3dd3ee3, 0x2f5e712f, 0x84139784, - 0x53a6f553, 0xd1b968d1, 0x00000000, 0xedc12ced, - 0x20406020, 0xfce31ffc, 0xb179c8b1, 0x5bb6ed5b, - 0x6ad4be6a, 0xcb8d46cb, 0xbe67d9be, 0x39724b39, - 0x4a94de4a, 0x4c98d44c, 0x58b0e858, 0xcf854acf, - 0xd0bb6bd0, 0xefc52aef, 0xaa4fe5aa, 0xfbed16fb, - 0x4386c543, 0x4d9ad74d, 0x33665533, 0x85119485, - 0x458acf45, 0xf9e910f9, 0x02040602, 0x7ffe817f, - 0x50a0f050, 0x3c78443c, 0x9f25ba9f, 0xa84be3a8, - 0x51a2f351, 0xa35dfea3, 0x4080c040, 0x8f058a8f, - 0x923fad92, 0x9d21bc9d, 0x38704838, 0xf5f104f5, - 0xbc63dfbc, 0xb677c1b6, 0xdaaf75da, 0x21426321, - 0x10203010, 0xffe51aff, 0xf3fd0ef3, 0xd2bf6dd2, - 0xcd814ccd, 0x0c18140c, 0x13263513, 0xecc32fec, - 0x5fbee15f, 0x9735a297, 0x4488cc44, 0x172e3917, - 0xc49357c4, 0xa755f2a7, 0x7efc827e, 0x3d7a473d, - 0x64c8ac64, 0x5dbae75d, 0x19322b19, 0x73e69573, - 0x60c0a060, 0x81199881, 0x4f9ed14f, 0xdca37fdc, - 0x22446622, 0x2a547e2a, 0x903bab90, 0x880b8388, - 0x468cca46, 0xeec729ee, 0xb86bd3b8, 0x14283c14, - 0xdea779de, 0x5ebce25e, 0x0b161d0b, 0xdbad76db, - 0xe0db3be0, 0x32645632, 0x3a744e3a, 0x0a141e0a, - 0x4992db49, 0x060c0a06, 0x24486c24, 0x5cb8e45c, - 0xc29f5dc2, 0xd3bd6ed3, 0xac43efac, 0x62c4a662, - 0x9139a891, 0x9531a495, 0xe4d337e4, 0x79f28b79, - 0xe7d532e7, 0xc88b43c8, 0x376e5937, 0x6ddab76d, - 0x8d018c8d, 0xd5b164d5, 0x4e9cd24e, 0xa949e0a9, - 0x6cd8b46c, 0x56acfa56, 0xf4f307f4, 0xeacf25ea, - 0x65caaf65, 0x7af48e7a, 0xae47e9ae, 0x08101808, - 0xba6fd5ba, 0x78f08878, 0x254a6f25, 0x2e5c722e, - 0x1c38241c, 0xa657f1a6, 0xb473c7b4, 0xc69751c6, - 0xe8cb23e8, 0xdda17cdd, 0x74e89c74, 0x1f3e211f, - 0x4b96dd4b, 0xbd61dcbd, 0x8b0d868b, 0x8a0f858a, - 0x70e09070, 0x3e7c423e, 0xb571c4b5, 0x66ccaa66, - 0x4890d848, 0x03060503, 0xf6f701f6, 0x0e1c120e, - 0x61c2a361, 0x356a5f35, 0x57aef957, 0xb969d0b9, - 0x86179186, 0xc19958c1, 0x1d3a271d, 0x9e27b99e, - 0xe1d938e1, 0xf8eb13f8, 0x982bb398, 0x11223311, - 0x69d2bb69, 0xd9a970d9, 0x8e07898e, 0x9433a794, - 0x9b2db69b, 0x1e3c221e, 0x87159287, 0xe9c920e9, - 0xce8749ce, 0x55aaff55, 0x28507828, 0xdfa57adf, - 0x8c038f8c, 0xa159f8a1, 0x89098089, 0x0d1a170d, - 0xbf65dabf, 0xe6d731e6, 0x4284c642, 0x68d0b868, - 0x4182c341, 0x9929b099, 0x2d5a772d, 0x0f1e110f, - 0xb07bcbb0, 0x54a8fc54, 0xbb6dd6bb, 0x162c3a16, - }, { - 0xc6a56363, 0xf8847c7c, 0xee997777, 0xf68d7b7b, - 0xff0df2f2, 0xd6bd6b6b, 0xdeb16f6f, 0x9154c5c5, - 0x60503030, 0x02030101, 0xcea96767, 0x567d2b2b, - 0xe719fefe, 0xb562d7d7, 0x4de6abab, 0xec9a7676, - 0x8f45caca, 0x1f9d8282, 0x8940c9c9, 0xfa877d7d, - 0xef15fafa, 0xb2eb5959, 0x8ec94747, 0xfb0bf0f0, - 0x41ecadad, 0xb367d4d4, 0x5ffda2a2, 0x45eaafaf, - 0x23bf9c9c, 0x53f7a4a4, 0xe4967272, 0x9b5bc0c0, - 0x75c2b7b7, 0xe11cfdfd, 0x3dae9393, 0x4c6a2626, - 0x6c5a3636, 0x7e413f3f, 0xf502f7f7, 0x834fcccc, - 0x685c3434, 0x51f4a5a5, 0xd134e5e5, 0xf908f1f1, - 0xe2937171, 0xab73d8d8, 0x62533131, 0x2a3f1515, - 0x080c0404, 0x9552c7c7, 0x46652323, 0x9d5ec3c3, 
- 0x30281818, 0x37a19696, 0x0a0f0505, 0x2fb59a9a, - 0x0e090707, 0x24361212, 0x1b9b8080, 0xdf3de2e2, - 0xcd26ebeb, 0x4e692727, 0x7fcdb2b2, 0xea9f7575, - 0x121b0909, 0x1d9e8383, 0x58742c2c, 0x342e1a1a, - 0x362d1b1b, 0xdcb26e6e, 0xb4ee5a5a, 0x5bfba0a0, - 0xa4f65252, 0x764d3b3b, 0xb761d6d6, 0x7dceb3b3, - 0x527b2929, 0xdd3ee3e3, 0x5e712f2f, 0x13978484, - 0xa6f55353, 0xb968d1d1, 0x00000000, 0xc12ceded, - 0x40602020, 0xe31ffcfc, 0x79c8b1b1, 0xb6ed5b5b, - 0xd4be6a6a, 0x8d46cbcb, 0x67d9bebe, 0x724b3939, - 0x94de4a4a, 0x98d44c4c, 0xb0e85858, 0x854acfcf, - 0xbb6bd0d0, 0xc52aefef, 0x4fe5aaaa, 0xed16fbfb, - 0x86c54343, 0x9ad74d4d, 0x66553333, 0x11948585, - 0x8acf4545, 0xe910f9f9, 0x04060202, 0xfe817f7f, - 0xa0f05050, 0x78443c3c, 0x25ba9f9f, 0x4be3a8a8, - 0xa2f35151, 0x5dfea3a3, 0x80c04040, 0x058a8f8f, - 0x3fad9292, 0x21bc9d9d, 0x70483838, 0xf104f5f5, - 0x63dfbcbc, 0x77c1b6b6, 0xaf75dada, 0x42632121, - 0x20301010, 0xe51affff, 0xfd0ef3f3, 0xbf6dd2d2, - 0x814ccdcd, 0x18140c0c, 0x26351313, 0xc32fecec, - 0xbee15f5f, 0x35a29797, 0x88cc4444, 0x2e391717, - 0x9357c4c4, 0x55f2a7a7, 0xfc827e7e, 0x7a473d3d, - 0xc8ac6464, 0xbae75d5d, 0x322b1919, 0xe6957373, - 0xc0a06060, 0x19988181, 0x9ed14f4f, 0xa37fdcdc, - 0x44662222, 0x547e2a2a, 0x3bab9090, 0x0b838888, - 0x8cca4646, 0xc729eeee, 0x6bd3b8b8, 0x283c1414, - 0xa779dede, 0xbce25e5e, 0x161d0b0b, 0xad76dbdb, - 0xdb3be0e0, 0x64563232, 0x744e3a3a, 0x141e0a0a, - 0x92db4949, 0x0c0a0606, 0x486c2424, 0xb8e45c5c, - 0x9f5dc2c2, 0xbd6ed3d3, 0x43efacac, 0xc4a66262, - 0x39a89191, 0x31a49595, 0xd337e4e4, 0xf28b7979, - 0xd532e7e7, 0x8b43c8c8, 0x6e593737, 0xdab76d6d, - 0x018c8d8d, 0xb164d5d5, 0x9cd24e4e, 0x49e0a9a9, - 0xd8b46c6c, 0xacfa5656, 0xf307f4f4, 0xcf25eaea, - 0xcaaf6565, 0xf48e7a7a, 0x47e9aeae, 0x10180808, - 0x6fd5baba, 0xf0887878, 0x4a6f2525, 0x5c722e2e, - 0x38241c1c, 0x57f1a6a6, 0x73c7b4b4, 0x9751c6c6, - 0xcb23e8e8, 0xa17cdddd, 0xe89c7474, 0x3e211f1f, - 0x96dd4b4b, 0x61dcbdbd, 0x0d868b8b, 0x0f858a8a, - 0xe0907070, 0x7c423e3e, 0x71c4b5b5, 0xccaa6666, - 0x90d84848, 0x06050303, 0xf701f6f6, 0x1c120e0e, - 0xc2a36161, 0x6a5f3535, 0xaef95757, 0x69d0b9b9, - 0x17918686, 0x9958c1c1, 0x3a271d1d, 0x27b99e9e, - 0xd938e1e1, 0xeb13f8f8, 0x2bb39898, 0x22331111, - 0xd2bb6969, 0xa970d9d9, 0x07898e8e, 0x33a79494, - 0x2db69b9b, 0x3c221e1e, 0x15928787, 0xc920e9e9, - 0x8749cece, 0xaaff5555, 0x50782828, 0xa57adfdf, - 0x038f8c8c, 0x59f8a1a1, 0x09808989, 0x1a170d0d, - 0x65dabfbf, 0xd731e6e6, 0x84c64242, 0xd0b86868, - 0x82c34141, 0x29b09999, 0x5a772d2d, 0x1e110f0f, - 0x7bcbb0b0, 0xa8fc5454, 0x6dd6bbbb, 0x2c3a1616, - } -}; - -static const u32 crypto_fl_tab[4][256] ____cacheline_aligned = { - { - 0x00000063, 0x0000007c, 0x00000077, 0x0000007b, - 0x000000f2, 0x0000006b, 0x0000006f, 0x000000c5, - 0x00000030, 0x00000001, 0x00000067, 0x0000002b, - 0x000000fe, 0x000000d7, 0x000000ab, 0x00000076, - 0x000000ca, 0x00000082, 0x000000c9, 0x0000007d, - 0x000000fa, 0x00000059, 0x00000047, 0x000000f0, - 0x000000ad, 0x000000d4, 0x000000a2, 0x000000af, - 0x0000009c, 0x000000a4, 0x00000072, 0x000000c0, - 0x000000b7, 0x000000fd, 0x00000093, 0x00000026, - 0x00000036, 0x0000003f, 0x000000f7, 0x000000cc, - 0x00000034, 0x000000a5, 0x000000e5, 0x000000f1, - 0x00000071, 0x000000d8, 0x00000031, 0x00000015, - 0x00000004, 0x000000c7, 0x00000023, 0x000000c3, - 0x00000018, 0x00000096, 0x00000005, 0x0000009a, - 0x00000007, 0x00000012, 0x00000080, 0x000000e2, - 0x000000eb, 0x00000027, 0x000000b2, 0x00000075, - 0x00000009, 0x00000083, 0x0000002c, 0x0000001a, - 0x0000001b, 0x0000006e, 0x0000005a, 0x000000a0, - 0x00000052, 
0x0000003b, 0x000000d6, 0x000000b3, - 0x00000029, 0x000000e3, 0x0000002f, 0x00000084, - 0x00000053, 0x000000d1, 0x00000000, 0x000000ed, - 0x00000020, 0x000000fc, 0x000000b1, 0x0000005b, - 0x0000006a, 0x000000cb, 0x000000be, 0x00000039, - 0x0000004a, 0x0000004c, 0x00000058, 0x000000cf, - 0x000000d0, 0x000000ef, 0x000000aa, 0x000000fb, - 0x00000043, 0x0000004d, 0x00000033, 0x00000085, - 0x00000045, 0x000000f9, 0x00000002, 0x0000007f, - 0x00000050, 0x0000003c, 0x0000009f, 0x000000a8, - 0x00000051, 0x000000a3, 0x00000040, 0x0000008f, - 0x00000092, 0x0000009d, 0x00000038, 0x000000f5, - 0x000000bc, 0x000000b6, 0x000000da, 0x00000021, - 0x00000010, 0x000000ff, 0x000000f3, 0x000000d2, - 0x000000cd, 0x0000000c, 0x00000013, 0x000000ec, - 0x0000005f, 0x00000097, 0x00000044, 0x00000017, - 0x000000c4, 0x000000a7, 0x0000007e, 0x0000003d, - 0x00000064, 0x0000005d, 0x00000019, 0x00000073, - 0x00000060, 0x00000081, 0x0000004f, 0x000000dc, - 0x00000022, 0x0000002a, 0x00000090, 0x00000088, - 0x00000046, 0x000000ee, 0x000000b8, 0x00000014, - 0x000000de, 0x0000005e, 0x0000000b, 0x000000db, - 0x000000e0, 0x00000032, 0x0000003a, 0x0000000a, - 0x00000049, 0x00000006, 0x00000024, 0x0000005c, - 0x000000c2, 0x000000d3, 0x000000ac, 0x00000062, - 0x00000091, 0x00000095, 0x000000e4, 0x00000079, - 0x000000e7, 0x000000c8, 0x00000037, 0x0000006d, - 0x0000008d, 0x000000d5, 0x0000004e, 0x000000a9, - 0x0000006c, 0x00000056, 0x000000f4, 0x000000ea, - 0x00000065, 0x0000007a, 0x000000ae, 0x00000008, - 0x000000ba, 0x00000078, 0x00000025, 0x0000002e, - 0x0000001c, 0x000000a6, 0x000000b4, 0x000000c6, - 0x000000e8, 0x000000dd, 0x00000074, 0x0000001f, - 0x0000004b, 0x000000bd, 0x0000008b, 0x0000008a, - 0x00000070, 0x0000003e, 0x000000b5, 0x00000066, - 0x00000048, 0x00000003, 0x000000f6, 0x0000000e, - 0x00000061, 0x00000035, 0x00000057, 0x000000b9, - 0x00000086, 0x000000c1, 0x0000001d, 0x0000009e, - 0x000000e1, 0x000000f8, 0x00000098, 0x00000011, - 0x00000069, 0x000000d9, 0x0000008e, 0x00000094, - 0x0000009b, 0x0000001e, 0x00000087, 0x000000e9, - 0x000000ce, 0x00000055, 0x00000028, 0x000000df, - 0x0000008c, 0x000000a1, 0x00000089, 0x0000000d, - 0x000000bf, 0x000000e6, 0x00000042, 0x00000068, - 0x00000041, 0x00000099, 0x0000002d, 0x0000000f, - 0x000000b0, 0x00000054, 0x000000bb, 0x00000016, - }, { - 0x00006300, 0x00007c00, 0x00007700, 0x00007b00, - 0x0000f200, 0x00006b00, 0x00006f00, 0x0000c500, - 0x00003000, 0x00000100, 0x00006700, 0x00002b00, - 0x0000fe00, 0x0000d700, 0x0000ab00, 0x00007600, - 0x0000ca00, 0x00008200, 0x0000c900, 0x00007d00, - 0x0000fa00, 0x00005900, 0x00004700, 0x0000f000, - 0x0000ad00, 0x0000d400, 0x0000a200, 0x0000af00, - 0x00009c00, 0x0000a400, 0x00007200, 0x0000c000, - 0x0000b700, 0x0000fd00, 0x00009300, 0x00002600, - 0x00003600, 0x00003f00, 0x0000f700, 0x0000cc00, - 0x00003400, 0x0000a500, 0x0000e500, 0x0000f100, - 0x00007100, 0x0000d800, 0x00003100, 0x00001500, - 0x00000400, 0x0000c700, 0x00002300, 0x0000c300, - 0x00001800, 0x00009600, 0x00000500, 0x00009a00, - 0x00000700, 0x00001200, 0x00008000, 0x0000e200, - 0x0000eb00, 0x00002700, 0x0000b200, 0x00007500, - 0x00000900, 0x00008300, 0x00002c00, 0x00001a00, - 0x00001b00, 0x00006e00, 0x00005a00, 0x0000a000, - 0x00005200, 0x00003b00, 0x0000d600, 0x0000b300, - 0x00002900, 0x0000e300, 0x00002f00, 0x00008400, - 0x00005300, 0x0000d100, 0x00000000, 0x0000ed00, - 0x00002000, 0x0000fc00, 0x0000b100, 0x00005b00, - 0x00006a00, 0x0000cb00, 0x0000be00, 0x00003900, - 0x00004a00, 0x00004c00, 0x00005800, 0x0000cf00, - 0x0000d000, 0x0000ef00, 0x0000aa00, 0x0000fb00, - 
0x00004300, 0x00004d00, 0x00003300, 0x00008500, - 0x00004500, 0x0000f900, 0x00000200, 0x00007f00, - 0x00005000, 0x00003c00, 0x00009f00, 0x0000a800, - 0x00005100, 0x0000a300, 0x00004000, 0x00008f00, - 0x00009200, 0x00009d00, 0x00003800, 0x0000f500, - 0x0000bc00, 0x0000b600, 0x0000da00, 0x00002100, - 0x00001000, 0x0000ff00, 0x0000f300, 0x0000d200, - 0x0000cd00, 0x00000c00, 0x00001300, 0x0000ec00, - 0x00005f00, 0x00009700, 0x00004400, 0x00001700, - 0x0000c400, 0x0000a700, 0x00007e00, 0x00003d00, - 0x00006400, 0x00005d00, 0x00001900, 0x00007300, - 0x00006000, 0x00008100, 0x00004f00, 0x0000dc00, - 0x00002200, 0x00002a00, 0x00009000, 0x00008800, - 0x00004600, 0x0000ee00, 0x0000b800, 0x00001400, - 0x0000de00, 0x00005e00, 0x00000b00, 0x0000db00, - 0x0000e000, 0x00003200, 0x00003a00, 0x00000a00, - 0x00004900, 0x00000600, 0x00002400, 0x00005c00, - 0x0000c200, 0x0000d300, 0x0000ac00, 0x00006200, - 0x00009100, 0x00009500, 0x0000e400, 0x00007900, - 0x0000e700, 0x0000c800, 0x00003700, 0x00006d00, - 0x00008d00, 0x0000d500, 0x00004e00, 0x0000a900, - 0x00006c00, 0x00005600, 0x0000f400, 0x0000ea00, - 0x00006500, 0x00007a00, 0x0000ae00, 0x00000800, - 0x0000ba00, 0x00007800, 0x00002500, 0x00002e00, - 0x00001c00, 0x0000a600, 0x0000b400, 0x0000c600, - 0x0000e800, 0x0000dd00, 0x00007400, 0x00001f00, - 0x00004b00, 0x0000bd00, 0x00008b00, 0x00008a00, - 0x00007000, 0x00003e00, 0x0000b500, 0x00006600, - 0x00004800, 0x00000300, 0x0000f600, 0x00000e00, - 0x00006100, 0x00003500, 0x00005700, 0x0000b900, - 0x00008600, 0x0000c100, 0x00001d00, 0x00009e00, - 0x0000e100, 0x0000f800, 0x00009800, 0x00001100, - 0x00006900, 0x0000d900, 0x00008e00, 0x00009400, - 0x00009b00, 0x00001e00, 0x00008700, 0x0000e900, - 0x0000ce00, 0x00005500, 0x00002800, 0x0000df00, - 0x00008c00, 0x0000a100, 0x00008900, 0x00000d00, - 0x0000bf00, 0x0000e600, 0x00004200, 0x00006800, - 0x00004100, 0x00009900, 0x00002d00, 0x00000f00, - 0x0000b000, 0x00005400, 0x0000bb00, 0x00001600, - }, { - 0x00630000, 0x007c0000, 0x00770000, 0x007b0000, - 0x00f20000, 0x006b0000, 0x006f0000, 0x00c50000, - 0x00300000, 0x00010000, 0x00670000, 0x002b0000, - 0x00fe0000, 0x00d70000, 0x00ab0000, 0x00760000, - 0x00ca0000, 0x00820000, 0x00c90000, 0x007d0000, - 0x00fa0000, 0x00590000, 0x00470000, 0x00f00000, - 0x00ad0000, 0x00d40000, 0x00a20000, 0x00af0000, - 0x009c0000, 0x00a40000, 0x00720000, 0x00c00000, - 0x00b70000, 0x00fd0000, 0x00930000, 0x00260000, - 0x00360000, 0x003f0000, 0x00f70000, 0x00cc0000, - 0x00340000, 0x00a50000, 0x00e50000, 0x00f10000, - 0x00710000, 0x00d80000, 0x00310000, 0x00150000, - 0x00040000, 0x00c70000, 0x00230000, 0x00c30000, - 0x00180000, 0x00960000, 0x00050000, 0x009a0000, - 0x00070000, 0x00120000, 0x00800000, 0x00e20000, - 0x00eb0000, 0x00270000, 0x00b20000, 0x00750000, - 0x00090000, 0x00830000, 0x002c0000, 0x001a0000, - 0x001b0000, 0x006e0000, 0x005a0000, 0x00a00000, - 0x00520000, 0x003b0000, 0x00d60000, 0x00b30000, - 0x00290000, 0x00e30000, 0x002f0000, 0x00840000, - 0x00530000, 0x00d10000, 0x00000000, 0x00ed0000, - 0x00200000, 0x00fc0000, 0x00b10000, 0x005b0000, - 0x006a0000, 0x00cb0000, 0x00be0000, 0x00390000, - 0x004a0000, 0x004c0000, 0x00580000, 0x00cf0000, - 0x00d00000, 0x00ef0000, 0x00aa0000, 0x00fb0000, - 0x00430000, 0x004d0000, 0x00330000, 0x00850000, - 0x00450000, 0x00f90000, 0x00020000, 0x007f0000, - 0x00500000, 0x003c0000, 0x009f0000, 0x00a80000, - 0x00510000, 0x00a30000, 0x00400000, 0x008f0000, - 0x00920000, 0x009d0000, 0x00380000, 0x00f50000, - 0x00bc0000, 0x00b60000, 0x00da0000, 0x00210000, - 0x00100000, 0x00ff0000, 0x00f30000, 0x00d20000, 
- 0x00cd0000, 0x000c0000, 0x00130000, 0x00ec0000, - 0x005f0000, 0x00970000, 0x00440000, 0x00170000, - 0x00c40000, 0x00a70000, 0x007e0000, 0x003d0000, - 0x00640000, 0x005d0000, 0x00190000, 0x00730000, - 0x00600000, 0x00810000, 0x004f0000, 0x00dc0000, - 0x00220000, 0x002a0000, 0x00900000, 0x00880000, - 0x00460000, 0x00ee0000, 0x00b80000, 0x00140000, - 0x00de0000, 0x005e0000, 0x000b0000, 0x00db0000, - 0x00e00000, 0x00320000, 0x003a0000, 0x000a0000, - 0x00490000, 0x00060000, 0x00240000, 0x005c0000, - 0x00c20000, 0x00d30000, 0x00ac0000, 0x00620000, - 0x00910000, 0x00950000, 0x00e40000, 0x00790000, - 0x00e70000, 0x00c80000, 0x00370000, 0x006d0000, - 0x008d0000, 0x00d50000, 0x004e0000, 0x00a90000, - 0x006c0000, 0x00560000, 0x00f40000, 0x00ea0000, - 0x00650000, 0x007a0000, 0x00ae0000, 0x00080000, - 0x00ba0000, 0x00780000, 0x00250000, 0x002e0000, - 0x001c0000, 0x00a60000, 0x00b40000, 0x00c60000, - 0x00e80000, 0x00dd0000, 0x00740000, 0x001f0000, - 0x004b0000, 0x00bd0000, 0x008b0000, 0x008a0000, - 0x00700000, 0x003e0000, 0x00b50000, 0x00660000, - 0x00480000, 0x00030000, 0x00f60000, 0x000e0000, - 0x00610000, 0x00350000, 0x00570000, 0x00b90000, - 0x00860000, 0x00c10000, 0x001d0000, 0x009e0000, - 0x00e10000, 0x00f80000, 0x00980000, 0x00110000, - 0x00690000, 0x00d90000, 0x008e0000, 0x00940000, - 0x009b0000, 0x001e0000, 0x00870000, 0x00e90000, - 0x00ce0000, 0x00550000, 0x00280000, 0x00df0000, - 0x008c0000, 0x00a10000, 0x00890000, 0x000d0000, - 0x00bf0000, 0x00e60000, 0x00420000, 0x00680000, - 0x00410000, 0x00990000, 0x002d0000, 0x000f0000, - 0x00b00000, 0x00540000, 0x00bb0000, 0x00160000, - }, { - 0x63000000, 0x7c000000, 0x77000000, 0x7b000000, - 0xf2000000, 0x6b000000, 0x6f000000, 0xc5000000, - 0x30000000, 0x01000000, 0x67000000, 0x2b000000, - 0xfe000000, 0xd7000000, 0xab000000, 0x76000000, - 0xca000000, 0x82000000, 0xc9000000, 0x7d000000, - 0xfa000000, 0x59000000, 0x47000000, 0xf0000000, - 0xad000000, 0xd4000000, 0xa2000000, 0xaf000000, - 0x9c000000, 0xa4000000, 0x72000000, 0xc0000000, - 0xb7000000, 0xfd000000, 0x93000000, 0x26000000, - 0x36000000, 0x3f000000, 0xf7000000, 0xcc000000, - 0x34000000, 0xa5000000, 0xe5000000, 0xf1000000, - 0x71000000, 0xd8000000, 0x31000000, 0x15000000, - 0x04000000, 0xc7000000, 0x23000000, 0xc3000000, - 0x18000000, 0x96000000, 0x05000000, 0x9a000000, - 0x07000000, 0x12000000, 0x80000000, 0xe2000000, - 0xeb000000, 0x27000000, 0xb2000000, 0x75000000, - 0x09000000, 0x83000000, 0x2c000000, 0x1a000000, - 0x1b000000, 0x6e000000, 0x5a000000, 0xa0000000, - 0x52000000, 0x3b000000, 0xd6000000, 0xb3000000, - 0x29000000, 0xe3000000, 0x2f000000, 0x84000000, - 0x53000000, 0xd1000000, 0x00000000, 0xed000000, - 0x20000000, 0xfc000000, 0xb1000000, 0x5b000000, - 0x6a000000, 0xcb000000, 0xbe000000, 0x39000000, - 0x4a000000, 0x4c000000, 0x58000000, 0xcf000000, - 0xd0000000, 0xef000000, 0xaa000000, 0xfb000000, - 0x43000000, 0x4d000000, 0x33000000, 0x85000000, - 0x45000000, 0xf9000000, 0x02000000, 0x7f000000, - 0x50000000, 0x3c000000, 0x9f000000, 0xa8000000, - 0x51000000, 0xa3000000, 0x40000000, 0x8f000000, - 0x92000000, 0x9d000000, 0x38000000, 0xf5000000, - 0xbc000000, 0xb6000000, 0xda000000, 0x21000000, - 0x10000000, 0xff000000, 0xf3000000, 0xd2000000, - 0xcd000000, 0x0c000000, 0x13000000, 0xec000000, - 0x5f000000, 0x97000000, 0x44000000, 0x17000000, - 0xc4000000, 0xa7000000, 0x7e000000, 0x3d000000, - 0x64000000, 0x5d000000, 0x19000000, 0x73000000, - 0x60000000, 0x81000000, 0x4f000000, 0xdc000000, - 0x22000000, 0x2a000000, 0x90000000, 0x88000000, - 0x46000000, 0xee000000, 0xb8000000, 
0x14000000, - 0xde000000, 0x5e000000, 0x0b000000, 0xdb000000, - 0xe0000000, 0x32000000, 0x3a000000, 0x0a000000, - 0x49000000, 0x06000000, 0x24000000, 0x5c000000, - 0xc2000000, 0xd3000000, 0xac000000, 0x62000000, - 0x91000000, 0x95000000, 0xe4000000, 0x79000000, - 0xe7000000, 0xc8000000, 0x37000000, 0x6d000000, - 0x8d000000, 0xd5000000, 0x4e000000, 0xa9000000, - 0x6c000000, 0x56000000, 0xf4000000, 0xea000000, - 0x65000000, 0x7a000000, 0xae000000, 0x08000000, - 0xba000000, 0x78000000, 0x25000000, 0x2e000000, - 0x1c000000, 0xa6000000, 0xb4000000, 0xc6000000, - 0xe8000000, 0xdd000000, 0x74000000, 0x1f000000, - 0x4b000000, 0xbd000000, 0x8b000000, 0x8a000000, - 0x70000000, 0x3e000000, 0xb5000000, 0x66000000, - 0x48000000, 0x03000000, 0xf6000000, 0x0e000000, - 0x61000000, 0x35000000, 0x57000000, 0xb9000000, - 0x86000000, 0xc1000000, 0x1d000000, 0x9e000000, - 0xe1000000, 0xf8000000, 0x98000000, 0x11000000, - 0x69000000, 0xd9000000, 0x8e000000, 0x94000000, - 0x9b000000, 0x1e000000, 0x87000000, 0xe9000000, - 0xce000000, 0x55000000, 0x28000000, 0xdf000000, - 0x8c000000, 0xa1000000, 0x89000000, 0x0d000000, - 0xbf000000, 0xe6000000, 0x42000000, 0x68000000, - 0x41000000, 0x99000000, 0x2d000000, 0x0f000000, - 0xb0000000, 0x54000000, 0xbb000000, 0x16000000, - } -}; - -__visible const u32 crypto_it_tab[4][256] ____cacheline_aligned = { - { - 0x50a7f451, 0x5365417e, 0xc3a4171a, 0x965e273a, - 0xcb6bab3b, 0xf1459d1f, 0xab58faac, 0x9303e34b, - 0x55fa3020, 0xf66d76ad, 0x9176cc88, 0x254c02f5, - 0xfcd7e54f, 0xd7cb2ac5, 0x80443526, 0x8fa362b5, - 0x495ab1de, 0x671bba25, 0x980eea45, 0xe1c0fe5d, - 0x02752fc3, 0x12f04c81, 0xa397468d, 0xc6f9d36b, - 0xe75f8f03, 0x959c9215, 0xeb7a6dbf, 0xda595295, - 0x2d83bed4, 0xd3217458, 0x2969e049, 0x44c8c98e, - 0x6a89c275, 0x78798ef4, 0x6b3e5899, 0xdd71b927, - 0xb64fe1be, 0x17ad88f0, 0x66ac20c9, 0xb43ace7d, - 0x184adf63, 0x82311ae5, 0x60335197, 0x457f5362, - 0xe07764b1, 0x84ae6bbb, 0x1ca081fe, 0x942b08f9, - 0x58684870, 0x19fd458f, 0x876cde94, 0xb7f87b52, - 0x23d373ab, 0xe2024b72, 0x578f1fe3, 0x2aab5566, - 0x0728ebb2, 0x03c2b52f, 0x9a7bc586, 0xa50837d3, - 0xf2872830, 0xb2a5bf23, 0xba6a0302, 0x5c8216ed, - 0x2b1ccf8a, 0x92b479a7, 0xf0f207f3, 0xa1e2694e, - 0xcdf4da65, 0xd5be0506, 0x1f6234d1, 0x8afea6c4, - 0x9d532e34, 0xa055f3a2, 0x32e18a05, 0x75ebf6a4, - 0x39ec830b, 0xaaef6040, 0x069f715e, 0x51106ebd, - 0xf98a213e, 0x3d06dd96, 0xae053edd, 0x46bde64d, - 0xb58d5491, 0x055dc471, 0x6fd40604, 0xff155060, - 0x24fb9819, 0x97e9bdd6, 0xcc434089, 0x779ed967, - 0xbd42e8b0, 0x888b8907, 0x385b19e7, 0xdbeec879, - 0x470a7ca1, 0xe90f427c, 0xc91e84f8, 0x00000000, - 0x83868009, 0x48ed2b32, 0xac70111e, 0x4e725a6c, - 0xfbff0efd, 0x5638850f, 0x1ed5ae3d, 0x27392d36, - 0x64d90f0a, 0x21a65c68, 0xd1545b9b, 0x3a2e3624, - 0xb1670a0c, 0x0fe75793, 0xd296eeb4, 0x9e919b1b, - 0x4fc5c080, 0xa220dc61, 0x694b775a, 0x161a121c, - 0x0aba93e2, 0xe52aa0c0, 0x43e0223c, 0x1d171b12, - 0x0b0d090e, 0xadc78bf2, 0xb9a8b62d, 0xc8a91e14, - 0x8519f157, 0x4c0775af, 0xbbdd99ee, 0xfd607fa3, - 0x9f2601f7, 0xbcf5725c, 0xc53b6644, 0x347efb5b, - 0x7629438b, 0xdcc623cb, 0x68fcedb6, 0x63f1e4b8, - 0xcadc31d7, 0x10856342, 0x40229713, 0x2011c684, - 0x7d244a85, 0xf83dbbd2, 0x1132f9ae, 0x6da129c7, - 0x4b2f9e1d, 0xf330b2dc, 0xec52860d, 0xd0e3c177, - 0x6c16b32b, 0x99b970a9, 0xfa489411, 0x2264e947, - 0xc48cfca8, 0x1a3ff0a0, 0xd82c7d56, 0xef903322, - 0xc74e4987, 0xc1d138d9, 0xfea2ca8c, 0x360bd498, - 0xcf81f5a6, 0x28de7aa5, 0x268eb7da, 0xa4bfad3f, - 0xe49d3a2c, 0x0d927850, 0x9bcc5f6a, 0x62467e54, - 0xc2138df6, 0xe8b8d890, 0x5ef7392e, 0xf5afc382, - 
0xbe805d9f, 0x7c93d069, 0xa92dd56f, 0xb31225cf, - 0x3b99acc8, 0xa77d1810, 0x6e639ce8, 0x7bbb3bdb, - 0x097826cd, 0xf418596e, 0x01b79aec, 0xa89a4f83, - 0x656e95e6, 0x7ee6ffaa, 0x08cfbc21, 0xe6e815ef, - 0xd99be7ba, 0xce366f4a, 0xd4099fea, 0xd67cb029, - 0xafb2a431, 0x31233f2a, 0x3094a5c6, 0xc066a235, - 0x37bc4e74, 0xa6ca82fc, 0xb0d090e0, 0x15d8a733, - 0x4a9804f1, 0xf7daec41, 0x0e50cd7f, 0x2ff69117, - 0x8dd64d76, 0x4db0ef43, 0x544daacc, 0xdf0496e4, - 0xe3b5d19e, 0x1b886a4c, 0xb81f2cc1, 0x7f516546, - 0x04ea5e9d, 0x5d358c01, 0x737487fa, 0x2e410bfb, - 0x5a1d67b3, 0x52d2db92, 0x335610e9, 0x1347d66d, - 0x8c61d79a, 0x7a0ca137, 0x8e14f859, 0x893c13eb, - 0xee27a9ce, 0x35c961b7, 0xede51ce1, 0x3cb1477a, - 0x59dfd29c, 0x3f73f255, 0x79ce1418, 0xbf37c773, - 0xeacdf753, 0x5baafd5f, 0x146f3ddf, 0x86db4478, - 0x81f3afca, 0x3ec468b9, 0x2c342438, 0x5f40a3c2, - 0x72c31d16, 0x0c25e2bc, 0x8b493c28, 0x41950dff, - 0x7101a839, 0xdeb30c08, 0x9ce4b4d8, 0x90c15664, - 0x6184cb7b, 0x70b632d5, 0x745c6c48, 0x4257b8d0, - }, { - 0xa7f45150, 0x65417e53, 0xa4171ac3, 0x5e273a96, - 0x6bab3bcb, 0x459d1ff1, 0x58faacab, 0x03e34b93, - 0xfa302055, 0x6d76adf6, 0x76cc8891, 0x4c02f525, - 0xd7e54ffc, 0xcb2ac5d7, 0x44352680, 0xa362b58f, - 0x5ab1de49, 0x1bba2567, 0x0eea4598, 0xc0fe5de1, - 0x752fc302, 0xf04c8112, 0x97468da3, 0xf9d36bc6, - 0x5f8f03e7, 0x9c921595, 0x7a6dbfeb, 0x595295da, - 0x83bed42d, 0x217458d3, 0x69e04929, 0xc8c98e44, - 0x89c2756a, 0x798ef478, 0x3e58996b, 0x71b927dd, - 0x4fe1beb6, 0xad88f017, 0xac20c966, 0x3ace7db4, - 0x4adf6318, 0x311ae582, 0x33519760, 0x7f536245, - 0x7764b1e0, 0xae6bbb84, 0xa081fe1c, 0x2b08f994, - 0x68487058, 0xfd458f19, 0x6cde9487, 0xf87b52b7, - 0xd373ab23, 0x024b72e2, 0x8f1fe357, 0xab55662a, - 0x28ebb207, 0xc2b52f03, 0x7bc5869a, 0x0837d3a5, - 0x872830f2, 0xa5bf23b2, 0x6a0302ba, 0x8216ed5c, - 0x1ccf8a2b, 0xb479a792, 0xf207f3f0, 0xe2694ea1, - 0xf4da65cd, 0xbe0506d5, 0x6234d11f, 0xfea6c48a, - 0x532e349d, 0x55f3a2a0, 0xe18a0532, 0xebf6a475, - 0xec830b39, 0xef6040aa, 0x9f715e06, 0x106ebd51, - 0x8a213ef9, 0x06dd963d, 0x053eddae, 0xbde64d46, - 0x8d5491b5, 0x5dc47105, 0xd406046f, 0x155060ff, - 0xfb981924, 0xe9bdd697, 0x434089cc, 0x9ed96777, - 0x42e8b0bd, 0x8b890788, 0x5b19e738, 0xeec879db, - 0x0a7ca147, 0x0f427ce9, 0x1e84f8c9, 0x00000000, - 0x86800983, 0xed2b3248, 0x70111eac, 0x725a6c4e, - 0xff0efdfb, 0x38850f56, 0xd5ae3d1e, 0x392d3627, - 0xd90f0a64, 0xa65c6821, 0x545b9bd1, 0x2e36243a, - 0x670a0cb1, 0xe757930f, 0x96eeb4d2, 0x919b1b9e, - 0xc5c0804f, 0x20dc61a2, 0x4b775a69, 0x1a121c16, - 0xba93e20a, 0x2aa0c0e5, 0xe0223c43, 0x171b121d, - 0x0d090e0b, 0xc78bf2ad, 0xa8b62db9, 0xa91e14c8, - 0x19f15785, 0x0775af4c, 0xdd99eebb, 0x607fa3fd, - 0x2601f79f, 0xf5725cbc, 0x3b6644c5, 0x7efb5b34, - 0x29438b76, 0xc623cbdc, 0xfcedb668, 0xf1e4b863, - 0xdc31d7ca, 0x85634210, 0x22971340, 0x11c68420, - 0x244a857d, 0x3dbbd2f8, 0x32f9ae11, 0xa129c76d, - 0x2f9e1d4b, 0x30b2dcf3, 0x52860dec, 0xe3c177d0, - 0x16b32b6c, 0xb970a999, 0x489411fa, 0x64e94722, - 0x8cfca8c4, 0x3ff0a01a, 0x2c7d56d8, 0x903322ef, - 0x4e4987c7, 0xd138d9c1, 0xa2ca8cfe, 0x0bd49836, - 0x81f5a6cf, 0xde7aa528, 0x8eb7da26, 0xbfad3fa4, - 0x9d3a2ce4, 0x9278500d, 0xcc5f6a9b, 0x467e5462, - 0x138df6c2, 0xb8d890e8, 0xf7392e5e, 0xafc382f5, - 0x805d9fbe, 0x93d0697c, 0x2dd56fa9, 0x1225cfb3, - 0x99acc83b, 0x7d1810a7, 0x639ce86e, 0xbb3bdb7b, - 0x7826cd09, 0x18596ef4, 0xb79aec01, 0x9a4f83a8, - 0x6e95e665, 0xe6ffaa7e, 0xcfbc2108, 0xe815efe6, - 0x9be7bad9, 0x366f4ace, 0x099fead4, 0x7cb029d6, - 0xb2a431af, 0x233f2a31, 0x94a5c630, 0x66a235c0, - 0xbc4e7437, 0xca82fca6, 0xd090e0b0, 0xd8a73315, 
- 0x9804f14a, 0xdaec41f7, 0x50cd7f0e, 0xf691172f, - 0xd64d768d, 0xb0ef434d, 0x4daacc54, 0x0496e4df, - 0xb5d19ee3, 0x886a4c1b, 0x1f2cc1b8, 0x5165467f, - 0xea5e9d04, 0x358c015d, 0x7487fa73, 0x410bfb2e, - 0x1d67b35a, 0xd2db9252, 0x5610e933, 0x47d66d13, - 0x61d79a8c, 0x0ca1377a, 0x14f8598e, 0x3c13eb89, - 0x27a9ceee, 0xc961b735, 0xe51ce1ed, 0xb1477a3c, - 0xdfd29c59, 0x73f2553f, 0xce141879, 0x37c773bf, - 0xcdf753ea, 0xaafd5f5b, 0x6f3ddf14, 0xdb447886, - 0xf3afca81, 0xc468b93e, 0x3424382c, 0x40a3c25f, - 0xc31d1672, 0x25e2bc0c, 0x493c288b, 0x950dff41, - 0x01a83971, 0xb30c08de, 0xe4b4d89c, 0xc1566490, - 0x84cb7b61, 0xb632d570, 0x5c6c4874, 0x57b8d042, - }, { - 0xf45150a7, 0x417e5365, 0x171ac3a4, 0x273a965e, - 0xab3bcb6b, 0x9d1ff145, 0xfaacab58, 0xe34b9303, - 0x302055fa, 0x76adf66d, 0xcc889176, 0x02f5254c, - 0xe54ffcd7, 0x2ac5d7cb, 0x35268044, 0x62b58fa3, - 0xb1de495a, 0xba25671b, 0xea45980e, 0xfe5de1c0, - 0x2fc30275, 0x4c8112f0, 0x468da397, 0xd36bc6f9, - 0x8f03e75f, 0x9215959c, 0x6dbfeb7a, 0x5295da59, - 0xbed42d83, 0x7458d321, 0xe0492969, 0xc98e44c8, - 0xc2756a89, 0x8ef47879, 0x58996b3e, 0xb927dd71, - 0xe1beb64f, 0x88f017ad, 0x20c966ac, 0xce7db43a, - 0xdf63184a, 0x1ae58231, 0x51976033, 0x5362457f, - 0x64b1e077, 0x6bbb84ae, 0x81fe1ca0, 0x08f9942b, - 0x48705868, 0x458f19fd, 0xde94876c, 0x7b52b7f8, - 0x73ab23d3, 0x4b72e202, 0x1fe3578f, 0x55662aab, - 0xebb20728, 0xb52f03c2, 0xc5869a7b, 0x37d3a508, - 0x2830f287, 0xbf23b2a5, 0x0302ba6a, 0x16ed5c82, - 0xcf8a2b1c, 0x79a792b4, 0x07f3f0f2, 0x694ea1e2, - 0xda65cdf4, 0x0506d5be, 0x34d11f62, 0xa6c48afe, - 0x2e349d53, 0xf3a2a055, 0x8a0532e1, 0xf6a475eb, - 0x830b39ec, 0x6040aaef, 0x715e069f, 0x6ebd5110, - 0x213ef98a, 0xdd963d06, 0x3eddae05, 0xe64d46bd, - 0x5491b58d, 0xc471055d, 0x06046fd4, 0x5060ff15, - 0x981924fb, 0xbdd697e9, 0x4089cc43, 0xd967779e, - 0xe8b0bd42, 0x8907888b, 0x19e7385b, 0xc879dbee, - 0x7ca1470a, 0x427ce90f, 0x84f8c91e, 0x00000000, - 0x80098386, 0x2b3248ed, 0x111eac70, 0x5a6c4e72, - 0x0efdfbff, 0x850f5638, 0xae3d1ed5, 0x2d362739, - 0x0f0a64d9, 0x5c6821a6, 0x5b9bd154, 0x36243a2e, - 0x0a0cb167, 0x57930fe7, 0xeeb4d296, 0x9b1b9e91, - 0xc0804fc5, 0xdc61a220, 0x775a694b, 0x121c161a, - 0x93e20aba, 0xa0c0e52a, 0x223c43e0, 0x1b121d17, - 0x090e0b0d, 0x8bf2adc7, 0xb62db9a8, 0x1e14c8a9, - 0xf1578519, 0x75af4c07, 0x99eebbdd, 0x7fa3fd60, - 0x01f79f26, 0x725cbcf5, 0x6644c53b, 0xfb5b347e, - 0x438b7629, 0x23cbdcc6, 0xedb668fc, 0xe4b863f1, - 0x31d7cadc, 0x63421085, 0x97134022, 0xc6842011, - 0x4a857d24, 0xbbd2f83d, 0xf9ae1132, 0x29c76da1, - 0x9e1d4b2f, 0xb2dcf330, 0x860dec52, 0xc177d0e3, - 0xb32b6c16, 0x70a999b9, 0x9411fa48, 0xe9472264, - 0xfca8c48c, 0xf0a01a3f, 0x7d56d82c, 0x3322ef90, - 0x4987c74e, 0x38d9c1d1, 0xca8cfea2, 0xd498360b, - 0xf5a6cf81, 0x7aa528de, 0xb7da268e, 0xad3fa4bf, - 0x3a2ce49d, 0x78500d92, 0x5f6a9bcc, 0x7e546246, - 0x8df6c213, 0xd890e8b8, 0x392e5ef7, 0xc382f5af, - 0x5d9fbe80, 0xd0697c93, 0xd56fa92d, 0x25cfb312, - 0xacc83b99, 0x1810a77d, 0x9ce86e63, 0x3bdb7bbb, - 0x26cd0978, 0x596ef418, 0x9aec01b7, 0x4f83a89a, - 0x95e6656e, 0xffaa7ee6, 0xbc2108cf, 0x15efe6e8, - 0xe7bad99b, 0x6f4ace36, 0x9fead409, 0xb029d67c, - 0xa431afb2, 0x3f2a3123, 0xa5c63094, 0xa235c066, - 0x4e7437bc, 0x82fca6ca, 0x90e0b0d0, 0xa73315d8, - 0x04f14a98, 0xec41f7da, 0xcd7f0e50, 0x91172ff6, - 0x4d768dd6, 0xef434db0, 0xaacc544d, 0x96e4df04, - 0xd19ee3b5, 0x6a4c1b88, 0x2cc1b81f, 0x65467f51, - 0x5e9d04ea, 0x8c015d35, 0x87fa7374, 0x0bfb2e41, - 0x67b35a1d, 0xdb9252d2, 0x10e93356, 0xd66d1347, - 0xd79a8c61, 0xa1377a0c, 0xf8598e14, 0x13eb893c, - 0xa9ceee27, 0x61b735c9, 0x1ce1ede5, 
0x477a3cb1, - 0xd29c59df, 0xf2553f73, 0x141879ce, 0xc773bf37, - 0xf753eacd, 0xfd5f5baa, 0x3ddf146f, 0x447886db, - 0xafca81f3, 0x68b93ec4, 0x24382c34, 0xa3c25f40, - 0x1d1672c3, 0xe2bc0c25, 0x3c288b49, 0x0dff4195, - 0xa8397101, 0x0c08deb3, 0xb4d89ce4, 0x566490c1, - 0xcb7b6184, 0x32d570b6, 0x6c48745c, 0xb8d04257, - }, { - 0x5150a7f4, 0x7e536541, 0x1ac3a417, 0x3a965e27, - 0x3bcb6bab, 0x1ff1459d, 0xacab58fa, 0x4b9303e3, - 0x2055fa30, 0xadf66d76, 0x889176cc, 0xf5254c02, - 0x4ffcd7e5, 0xc5d7cb2a, 0x26804435, 0xb58fa362, - 0xde495ab1, 0x25671bba, 0x45980eea, 0x5de1c0fe, - 0xc302752f, 0x8112f04c, 0x8da39746, 0x6bc6f9d3, - 0x03e75f8f, 0x15959c92, 0xbfeb7a6d, 0x95da5952, - 0xd42d83be, 0x58d32174, 0x492969e0, 0x8e44c8c9, - 0x756a89c2, 0xf478798e, 0x996b3e58, 0x27dd71b9, - 0xbeb64fe1, 0xf017ad88, 0xc966ac20, 0x7db43ace, - 0x63184adf, 0xe582311a, 0x97603351, 0x62457f53, - 0xb1e07764, 0xbb84ae6b, 0xfe1ca081, 0xf9942b08, - 0x70586848, 0x8f19fd45, 0x94876cde, 0x52b7f87b, - 0xab23d373, 0x72e2024b, 0xe3578f1f, 0x662aab55, - 0xb20728eb, 0x2f03c2b5, 0x869a7bc5, 0xd3a50837, - 0x30f28728, 0x23b2a5bf, 0x02ba6a03, 0xed5c8216, - 0x8a2b1ccf, 0xa792b479, 0xf3f0f207, 0x4ea1e269, - 0x65cdf4da, 0x06d5be05, 0xd11f6234, 0xc48afea6, - 0x349d532e, 0xa2a055f3, 0x0532e18a, 0xa475ebf6, - 0x0b39ec83, 0x40aaef60, 0x5e069f71, 0xbd51106e, - 0x3ef98a21, 0x963d06dd, 0xddae053e, 0x4d46bde6, - 0x91b58d54, 0x71055dc4, 0x046fd406, 0x60ff1550, - 0x1924fb98, 0xd697e9bd, 0x89cc4340, 0x67779ed9, - 0xb0bd42e8, 0x07888b89, 0xe7385b19, 0x79dbeec8, - 0xa1470a7c, 0x7ce90f42, 0xf8c91e84, 0x00000000, - 0x09838680, 0x3248ed2b, 0x1eac7011, 0x6c4e725a, - 0xfdfbff0e, 0x0f563885, 0x3d1ed5ae, 0x3627392d, - 0x0a64d90f, 0x6821a65c, 0x9bd1545b, 0x243a2e36, - 0x0cb1670a, 0x930fe757, 0xb4d296ee, 0x1b9e919b, - 0x804fc5c0, 0x61a220dc, 0x5a694b77, 0x1c161a12, - 0xe20aba93, 0xc0e52aa0, 0x3c43e022, 0x121d171b, - 0x0e0b0d09, 0xf2adc78b, 0x2db9a8b6, 0x14c8a91e, - 0x578519f1, 0xaf4c0775, 0xeebbdd99, 0xa3fd607f, - 0xf79f2601, 0x5cbcf572, 0x44c53b66, 0x5b347efb, - 0x8b762943, 0xcbdcc623, 0xb668fced, 0xb863f1e4, - 0xd7cadc31, 0x42108563, 0x13402297, 0x842011c6, - 0x857d244a, 0xd2f83dbb, 0xae1132f9, 0xc76da129, - 0x1d4b2f9e, 0xdcf330b2, 0x0dec5286, 0x77d0e3c1, - 0x2b6c16b3, 0xa999b970, 0x11fa4894, 0x472264e9, - 0xa8c48cfc, 0xa01a3ff0, 0x56d82c7d, 0x22ef9033, - 0x87c74e49, 0xd9c1d138, 0x8cfea2ca, 0x98360bd4, - 0xa6cf81f5, 0xa528de7a, 0xda268eb7, 0x3fa4bfad, - 0x2ce49d3a, 0x500d9278, 0x6a9bcc5f, 0x5462467e, - 0xf6c2138d, 0x90e8b8d8, 0x2e5ef739, 0x82f5afc3, - 0x9fbe805d, 0x697c93d0, 0x6fa92dd5, 0xcfb31225, - 0xc83b99ac, 0x10a77d18, 0xe86e639c, 0xdb7bbb3b, - 0xcd097826, 0x6ef41859, 0xec01b79a, 0x83a89a4f, - 0xe6656e95, 0xaa7ee6ff, 0x2108cfbc, 0xefe6e815, - 0xbad99be7, 0x4ace366f, 0xead4099f, 0x29d67cb0, - 0x31afb2a4, 0x2a31233f, 0xc63094a5, 0x35c066a2, - 0x7437bc4e, 0xfca6ca82, 0xe0b0d090, 0x3315d8a7, - 0xf14a9804, 0x41f7daec, 0x7f0e50cd, 0x172ff691, - 0x768dd64d, 0x434db0ef, 0xcc544daa, 0xe4df0496, - 0x9ee3b5d1, 0x4c1b886a, 0xc1b81f2c, 0x467f5165, - 0x9d04ea5e, 0x015d358c, 0xfa737487, 0xfb2e410b, - 0xb35a1d67, 0x9252d2db, 0xe9335610, 0x6d1347d6, - 0x9a8c61d7, 0x377a0ca1, 0x598e14f8, 0xeb893c13, - 0xceee27a9, 0xb735c961, 0xe1ede51c, 0x7a3cb147, - 0x9c59dfd2, 0x553f73f2, 0x1879ce14, 0x73bf37c7, - 0x53eacdf7, 0x5f5baafd, 0xdf146f3d, 0x7886db44, - 0xca81f3af, 0xb93ec468, 0x382c3424, 0xc25f40a3, - 0x1672c31d, 0xbc0c25e2, 0x288b493c, 0xff41950d, - 0x397101a8, 0x08deb30c, 0xd89ce4b4, 0x6490c156, - 0x7b6184cb, 0xd570b632, 0x48745c6c, 0xd04257b8, - } -}; - -static const u32 
crypto_il_tab[4][256] ____cacheline_aligned = { - { - 0x00000052, 0x00000009, 0x0000006a, 0x000000d5, - 0x00000030, 0x00000036, 0x000000a5, 0x00000038, - 0x000000bf, 0x00000040, 0x000000a3, 0x0000009e, - 0x00000081, 0x000000f3, 0x000000d7, 0x000000fb, - 0x0000007c, 0x000000e3, 0x00000039, 0x00000082, - 0x0000009b, 0x0000002f, 0x000000ff, 0x00000087, - 0x00000034, 0x0000008e, 0x00000043, 0x00000044, - 0x000000c4, 0x000000de, 0x000000e9, 0x000000cb, - 0x00000054, 0x0000007b, 0x00000094, 0x00000032, - 0x000000a6, 0x000000c2, 0x00000023, 0x0000003d, - 0x000000ee, 0x0000004c, 0x00000095, 0x0000000b, - 0x00000042, 0x000000fa, 0x000000c3, 0x0000004e, - 0x00000008, 0x0000002e, 0x000000a1, 0x00000066, - 0x00000028, 0x000000d9, 0x00000024, 0x000000b2, - 0x00000076, 0x0000005b, 0x000000a2, 0x00000049, - 0x0000006d, 0x0000008b, 0x000000d1, 0x00000025, - 0x00000072, 0x000000f8, 0x000000f6, 0x00000064, - 0x00000086, 0x00000068, 0x00000098, 0x00000016, - 0x000000d4, 0x000000a4, 0x0000005c, 0x000000cc, - 0x0000005d, 0x00000065, 0x000000b6, 0x00000092, - 0x0000006c, 0x00000070, 0x00000048, 0x00000050, - 0x000000fd, 0x000000ed, 0x000000b9, 0x000000da, - 0x0000005e, 0x00000015, 0x00000046, 0x00000057, - 0x000000a7, 0x0000008d, 0x0000009d, 0x00000084, - 0x00000090, 0x000000d8, 0x000000ab, 0x00000000, - 0x0000008c, 0x000000bc, 0x000000d3, 0x0000000a, - 0x000000f7, 0x000000e4, 0x00000058, 0x00000005, - 0x000000b8, 0x000000b3, 0x00000045, 0x00000006, - 0x000000d0, 0x0000002c, 0x0000001e, 0x0000008f, - 0x000000ca, 0x0000003f, 0x0000000f, 0x00000002, - 0x000000c1, 0x000000af, 0x000000bd, 0x00000003, - 0x00000001, 0x00000013, 0x0000008a, 0x0000006b, - 0x0000003a, 0x00000091, 0x00000011, 0x00000041, - 0x0000004f, 0x00000067, 0x000000dc, 0x000000ea, - 0x00000097, 0x000000f2, 0x000000cf, 0x000000ce, - 0x000000f0, 0x000000b4, 0x000000e6, 0x00000073, - 0x00000096, 0x000000ac, 0x00000074, 0x00000022, - 0x000000e7, 0x000000ad, 0x00000035, 0x00000085, - 0x000000e2, 0x000000f9, 0x00000037, 0x000000e8, - 0x0000001c, 0x00000075, 0x000000df, 0x0000006e, - 0x00000047, 0x000000f1, 0x0000001a, 0x00000071, - 0x0000001d, 0x00000029, 0x000000c5, 0x00000089, - 0x0000006f, 0x000000b7, 0x00000062, 0x0000000e, - 0x000000aa, 0x00000018, 0x000000be, 0x0000001b, - 0x000000fc, 0x00000056, 0x0000003e, 0x0000004b, - 0x000000c6, 0x000000d2, 0x00000079, 0x00000020, - 0x0000009a, 0x000000db, 0x000000c0, 0x000000fe, - 0x00000078, 0x000000cd, 0x0000005a, 0x000000f4, - 0x0000001f, 0x000000dd, 0x000000a8, 0x00000033, - 0x00000088, 0x00000007, 0x000000c7, 0x00000031, - 0x000000b1, 0x00000012, 0x00000010, 0x00000059, - 0x00000027, 0x00000080, 0x000000ec, 0x0000005f, - 0x00000060, 0x00000051, 0x0000007f, 0x000000a9, - 0x00000019, 0x000000b5, 0x0000004a, 0x0000000d, - 0x0000002d, 0x000000e5, 0x0000007a, 0x0000009f, - 0x00000093, 0x000000c9, 0x0000009c, 0x000000ef, - 0x000000a0, 0x000000e0, 0x0000003b, 0x0000004d, - 0x000000ae, 0x0000002a, 0x000000f5, 0x000000b0, - 0x000000c8, 0x000000eb, 0x000000bb, 0x0000003c, - 0x00000083, 0x00000053, 0x00000099, 0x00000061, - 0x00000017, 0x0000002b, 0x00000004, 0x0000007e, - 0x000000ba, 0x00000077, 0x000000d6, 0x00000026, - 0x000000e1, 0x00000069, 0x00000014, 0x00000063, - 0x00000055, 0x00000021, 0x0000000c, 0x0000007d, - }, { - 0x00005200, 0x00000900, 0x00006a00, 0x0000d500, - 0x00003000, 0x00003600, 0x0000a500, 0x00003800, - 0x0000bf00, 0x00004000, 0x0000a300, 0x00009e00, - 0x00008100, 0x0000f300, 0x0000d700, 0x0000fb00, - 0x00007c00, 0x0000e300, 0x00003900, 0x00008200, - 0x00009b00, 0x00002f00, 0x0000ff00, 
0x00008700, - 0x00003400, 0x00008e00, 0x00004300, 0x00004400, - 0x0000c400, 0x0000de00, 0x0000e900, 0x0000cb00, - 0x00005400, 0x00007b00, 0x00009400, 0x00003200, - 0x0000a600, 0x0000c200, 0x00002300, 0x00003d00, - 0x0000ee00, 0x00004c00, 0x00009500, 0x00000b00, - 0x00004200, 0x0000fa00, 0x0000c300, 0x00004e00, - 0x00000800, 0x00002e00, 0x0000a100, 0x00006600, - 0x00002800, 0x0000d900, 0x00002400, 0x0000b200, - 0x00007600, 0x00005b00, 0x0000a200, 0x00004900, - 0x00006d00, 0x00008b00, 0x0000d100, 0x00002500, - 0x00007200, 0x0000f800, 0x0000f600, 0x00006400, - 0x00008600, 0x00006800, 0x00009800, 0x00001600, - 0x0000d400, 0x0000a400, 0x00005c00, 0x0000cc00, - 0x00005d00, 0x00006500, 0x0000b600, 0x00009200, - 0x00006c00, 0x00007000, 0x00004800, 0x00005000, - 0x0000fd00, 0x0000ed00, 0x0000b900, 0x0000da00, - 0x00005e00, 0x00001500, 0x00004600, 0x00005700, - 0x0000a700, 0x00008d00, 0x00009d00, 0x00008400, - 0x00009000, 0x0000d800, 0x0000ab00, 0x00000000, - 0x00008c00, 0x0000bc00, 0x0000d300, 0x00000a00, - 0x0000f700, 0x0000e400, 0x00005800, 0x00000500, - 0x0000b800, 0x0000b300, 0x00004500, 0x00000600, - 0x0000d000, 0x00002c00, 0x00001e00, 0x00008f00, - 0x0000ca00, 0x00003f00, 0x00000f00, 0x00000200, - 0x0000c100, 0x0000af00, 0x0000bd00, 0x00000300, - 0x00000100, 0x00001300, 0x00008a00, 0x00006b00, - 0x00003a00, 0x00009100, 0x00001100, 0x00004100, - 0x00004f00, 0x00006700, 0x0000dc00, 0x0000ea00, - 0x00009700, 0x0000f200, 0x0000cf00, 0x0000ce00, - 0x0000f000, 0x0000b400, 0x0000e600, 0x00007300, - 0x00009600, 0x0000ac00, 0x00007400, 0x00002200, - 0x0000e700, 0x0000ad00, 0x00003500, 0x00008500, - 0x0000e200, 0x0000f900, 0x00003700, 0x0000e800, - 0x00001c00, 0x00007500, 0x0000df00, 0x00006e00, - 0x00004700, 0x0000f100, 0x00001a00, 0x00007100, - 0x00001d00, 0x00002900, 0x0000c500, 0x00008900, - 0x00006f00, 0x0000b700, 0x00006200, 0x00000e00, - 0x0000aa00, 0x00001800, 0x0000be00, 0x00001b00, - 0x0000fc00, 0x00005600, 0x00003e00, 0x00004b00, - 0x0000c600, 0x0000d200, 0x00007900, 0x00002000, - 0x00009a00, 0x0000db00, 0x0000c000, 0x0000fe00, - 0x00007800, 0x0000cd00, 0x00005a00, 0x0000f400, - 0x00001f00, 0x0000dd00, 0x0000a800, 0x00003300, - 0x00008800, 0x00000700, 0x0000c700, 0x00003100, - 0x0000b100, 0x00001200, 0x00001000, 0x00005900, - 0x00002700, 0x00008000, 0x0000ec00, 0x00005f00, - 0x00006000, 0x00005100, 0x00007f00, 0x0000a900, - 0x00001900, 0x0000b500, 0x00004a00, 0x00000d00, - 0x00002d00, 0x0000e500, 0x00007a00, 0x00009f00, - 0x00009300, 0x0000c900, 0x00009c00, 0x0000ef00, - 0x0000a000, 0x0000e000, 0x00003b00, 0x00004d00, - 0x0000ae00, 0x00002a00, 0x0000f500, 0x0000b000, - 0x0000c800, 0x0000eb00, 0x0000bb00, 0x00003c00, - 0x00008300, 0x00005300, 0x00009900, 0x00006100, - 0x00001700, 0x00002b00, 0x00000400, 0x00007e00, - 0x0000ba00, 0x00007700, 0x0000d600, 0x00002600, - 0x0000e100, 0x00006900, 0x00001400, 0x00006300, - 0x00005500, 0x00002100, 0x00000c00, 0x00007d00, - }, { - 0x00520000, 0x00090000, 0x006a0000, 0x00d50000, - 0x00300000, 0x00360000, 0x00a50000, 0x00380000, - 0x00bf0000, 0x00400000, 0x00a30000, 0x009e0000, - 0x00810000, 0x00f30000, 0x00d70000, 0x00fb0000, - 0x007c0000, 0x00e30000, 0x00390000, 0x00820000, - 0x009b0000, 0x002f0000, 0x00ff0000, 0x00870000, - 0x00340000, 0x008e0000, 0x00430000, 0x00440000, - 0x00c40000, 0x00de0000, 0x00e90000, 0x00cb0000, - 0x00540000, 0x007b0000, 0x00940000, 0x00320000, - 0x00a60000, 0x00c20000, 0x00230000, 0x003d0000, - 0x00ee0000, 0x004c0000, 0x00950000, 0x000b0000, - 0x00420000, 0x00fa0000, 0x00c30000, 0x004e0000, - 0x00080000, 0x002e0000, 
0x00a10000, 0x00660000, - 0x00280000, 0x00d90000, 0x00240000, 0x00b20000, - 0x00760000, 0x005b0000, 0x00a20000, 0x00490000, - 0x006d0000, 0x008b0000, 0x00d10000, 0x00250000, - 0x00720000, 0x00f80000, 0x00f60000, 0x00640000, - 0x00860000, 0x00680000, 0x00980000, 0x00160000, - 0x00d40000, 0x00a40000, 0x005c0000, 0x00cc0000, - 0x005d0000, 0x00650000, 0x00b60000, 0x00920000, - 0x006c0000, 0x00700000, 0x00480000, 0x00500000, - 0x00fd0000, 0x00ed0000, 0x00b90000, 0x00da0000, - 0x005e0000, 0x00150000, 0x00460000, 0x00570000, - 0x00a70000, 0x008d0000, 0x009d0000, 0x00840000, - 0x00900000, 0x00d80000, 0x00ab0000, 0x00000000, - 0x008c0000, 0x00bc0000, 0x00d30000, 0x000a0000, - 0x00f70000, 0x00e40000, 0x00580000, 0x00050000, - 0x00b80000, 0x00b30000, 0x00450000, 0x00060000, - 0x00d00000, 0x002c0000, 0x001e0000, 0x008f0000, - 0x00ca0000, 0x003f0000, 0x000f0000, 0x00020000, - 0x00c10000, 0x00af0000, 0x00bd0000, 0x00030000, - 0x00010000, 0x00130000, 0x008a0000, 0x006b0000, - 0x003a0000, 0x00910000, 0x00110000, 0x00410000, - 0x004f0000, 0x00670000, 0x00dc0000, 0x00ea0000, - 0x00970000, 0x00f20000, 0x00cf0000, 0x00ce0000, - 0x00f00000, 0x00b40000, 0x00e60000, 0x00730000, - 0x00960000, 0x00ac0000, 0x00740000, 0x00220000, - 0x00e70000, 0x00ad0000, 0x00350000, 0x00850000, - 0x00e20000, 0x00f90000, 0x00370000, 0x00e80000, - 0x001c0000, 0x00750000, 0x00df0000, 0x006e0000, - 0x00470000, 0x00f10000, 0x001a0000, 0x00710000, - 0x001d0000, 0x00290000, 0x00c50000, 0x00890000, - 0x006f0000, 0x00b70000, 0x00620000, 0x000e0000, - 0x00aa0000, 0x00180000, 0x00be0000, 0x001b0000, - 0x00fc0000, 0x00560000, 0x003e0000, 0x004b0000, - 0x00c60000, 0x00d20000, 0x00790000, 0x00200000, - 0x009a0000, 0x00db0000, 0x00c00000, 0x00fe0000, - 0x00780000, 0x00cd0000, 0x005a0000, 0x00f40000, - 0x001f0000, 0x00dd0000, 0x00a80000, 0x00330000, - 0x00880000, 0x00070000, 0x00c70000, 0x00310000, - 0x00b10000, 0x00120000, 0x00100000, 0x00590000, - 0x00270000, 0x00800000, 0x00ec0000, 0x005f0000, - 0x00600000, 0x00510000, 0x007f0000, 0x00a90000, - 0x00190000, 0x00b50000, 0x004a0000, 0x000d0000, - 0x002d0000, 0x00e50000, 0x007a0000, 0x009f0000, - 0x00930000, 0x00c90000, 0x009c0000, 0x00ef0000, - 0x00a00000, 0x00e00000, 0x003b0000, 0x004d0000, - 0x00ae0000, 0x002a0000, 0x00f50000, 0x00b00000, - 0x00c80000, 0x00eb0000, 0x00bb0000, 0x003c0000, - 0x00830000, 0x00530000, 0x00990000, 0x00610000, - 0x00170000, 0x002b0000, 0x00040000, 0x007e0000, - 0x00ba0000, 0x00770000, 0x00d60000, 0x00260000, - 0x00e10000, 0x00690000, 0x00140000, 0x00630000, - 0x00550000, 0x00210000, 0x000c0000, 0x007d0000, - }, { - 0x52000000, 0x09000000, 0x6a000000, 0xd5000000, - 0x30000000, 0x36000000, 0xa5000000, 0x38000000, - 0xbf000000, 0x40000000, 0xa3000000, 0x9e000000, - 0x81000000, 0xf3000000, 0xd7000000, 0xfb000000, - 0x7c000000, 0xe3000000, 0x39000000, 0x82000000, - 0x9b000000, 0x2f000000, 0xff000000, 0x87000000, - 0x34000000, 0x8e000000, 0x43000000, 0x44000000, - 0xc4000000, 0xde000000, 0xe9000000, 0xcb000000, - 0x54000000, 0x7b000000, 0x94000000, 0x32000000, - 0xa6000000, 0xc2000000, 0x23000000, 0x3d000000, - 0xee000000, 0x4c000000, 0x95000000, 0x0b000000, - 0x42000000, 0xfa000000, 0xc3000000, 0x4e000000, - 0x08000000, 0x2e000000, 0xa1000000, 0x66000000, - 0x28000000, 0xd9000000, 0x24000000, 0xb2000000, - 0x76000000, 0x5b000000, 0xa2000000, 0x49000000, - 0x6d000000, 0x8b000000, 0xd1000000, 0x25000000, - 0x72000000, 0xf8000000, 0xf6000000, 0x64000000, - 0x86000000, 0x68000000, 0x98000000, 0x16000000, - 0xd4000000, 0xa4000000, 0x5c000000, 0xcc000000, - 0x5d000000, 
0x65000000, 0xb6000000, 0x92000000, - 0x6c000000, 0x70000000, 0x48000000, 0x50000000, - 0xfd000000, 0xed000000, 0xb9000000, 0xda000000, - 0x5e000000, 0x15000000, 0x46000000, 0x57000000, - 0xa7000000, 0x8d000000, 0x9d000000, 0x84000000, - 0x90000000, 0xd8000000, 0xab000000, 0x00000000, - 0x8c000000, 0xbc000000, 0xd3000000, 0x0a000000, - 0xf7000000, 0xe4000000, 0x58000000, 0x05000000, - 0xb8000000, 0xb3000000, 0x45000000, 0x06000000, - 0xd0000000, 0x2c000000, 0x1e000000, 0x8f000000, - 0xca000000, 0x3f000000, 0x0f000000, 0x02000000, - 0xc1000000, 0xaf000000, 0xbd000000, 0x03000000, - 0x01000000, 0x13000000, 0x8a000000, 0x6b000000, - 0x3a000000, 0x91000000, 0x11000000, 0x41000000, - 0x4f000000, 0x67000000, 0xdc000000, 0xea000000, - 0x97000000, 0xf2000000, 0xcf000000, 0xce000000, - 0xf0000000, 0xb4000000, 0xe6000000, 0x73000000, - 0x96000000, 0xac000000, 0x74000000, 0x22000000, - 0xe7000000, 0xad000000, 0x35000000, 0x85000000, - 0xe2000000, 0xf9000000, 0x37000000, 0xe8000000, - 0x1c000000, 0x75000000, 0xdf000000, 0x6e000000, - 0x47000000, 0xf1000000, 0x1a000000, 0x71000000, - 0x1d000000, 0x29000000, 0xc5000000, 0x89000000, - 0x6f000000, 0xb7000000, 0x62000000, 0x0e000000, - 0xaa000000, 0x18000000, 0xbe000000, 0x1b000000, - 0xfc000000, 0x56000000, 0x3e000000, 0x4b000000, - 0xc6000000, 0xd2000000, 0x79000000, 0x20000000, - 0x9a000000, 0xdb000000, 0xc0000000, 0xfe000000, - 0x78000000, 0xcd000000, 0x5a000000, 0xf4000000, - 0x1f000000, 0xdd000000, 0xa8000000, 0x33000000, - 0x88000000, 0x07000000, 0xc7000000, 0x31000000, - 0xb1000000, 0x12000000, 0x10000000, 0x59000000, - 0x27000000, 0x80000000, 0xec000000, 0x5f000000, - 0x60000000, 0x51000000, 0x7f000000, 0xa9000000, - 0x19000000, 0xb5000000, 0x4a000000, 0x0d000000, - 0x2d000000, 0xe5000000, 0x7a000000, 0x9f000000, - 0x93000000, 0xc9000000, 0x9c000000, 0xef000000, - 0xa0000000, 0xe0000000, 0x3b000000, 0x4d000000, - 0xae000000, 0x2a000000, 0xf5000000, 0xb0000000, - 0xc8000000, 0xeb000000, 0xbb000000, 0x3c000000, - 0x83000000, 0x53000000, 0x99000000, 0x61000000, - 0x17000000, 0x2b000000, 0x04000000, 0x7e000000, - 0xba000000, 0x77000000, 0xd6000000, 0x26000000, - 0xe1000000, 0x69000000, 0x14000000, 0x63000000, - 0x55000000, 0x21000000, 0x0c000000, 0x7d000000, - } -}; - -EXPORT_SYMBOL_GPL(crypto_ft_tab); -EXPORT_SYMBOL_GPL(crypto_it_tab); - -/** - * crypto_aes_set_key - Set the AES key. - * @tfm: The %crypto_tfm that is used in the context. - * @in_key: The input key. - * @key_len: The size of the key. - * - * This function uses aes_expand_key() to expand the key. &crypto_aes_ctx - * _must_ be the private data embedded in @tfm which is retrieved with - * crypto_tfm_ctx(). 
- * - * Return: 0 on success; -EINVAL on failure (only happens for bad key lengths) - */ -int crypto_aes_set_key(struct crypto_tfm *tfm, const u8 *in_key, - unsigned int key_len) -{ - struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm); - - return aes_expandkey(ctx, in_key, key_len); -} -EXPORT_SYMBOL_GPL(crypto_aes_set_key); - -/* encrypt a block of text */ - -#define f_rn(bo, bi, n, k) do { \ - bo[n] = crypto_ft_tab[0][byte(bi[n], 0)] ^ \ - crypto_ft_tab[1][byte(bi[(n + 1) & 3], 1)] ^ \ - crypto_ft_tab[2][byte(bi[(n + 2) & 3], 2)] ^ \ - crypto_ft_tab[3][byte(bi[(n + 3) & 3], 3)] ^ *(k + n); \ -} while (0) - -#define f_nround(bo, bi, k) do {\ - f_rn(bo, bi, 0, k); \ - f_rn(bo, bi, 1, k); \ - f_rn(bo, bi, 2, k); \ - f_rn(bo, bi, 3, k); \ - k += 4; \ -} while (0) - -#define f_rl(bo, bi, n, k) do { \ - bo[n] = crypto_fl_tab[0][byte(bi[n], 0)] ^ \ - crypto_fl_tab[1][byte(bi[(n + 1) & 3], 1)] ^ \ - crypto_fl_tab[2][byte(bi[(n + 2) & 3], 2)] ^ \ - crypto_fl_tab[3][byte(bi[(n + 3) & 3], 3)] ^ *(k + n); \ -} while (0) - -#define f_lround(bo, bi, k) do {\ - f_rl(bo, bi, 0, k); \ - f_rl(bo, bi, 1, k); \ - f_rl(bo, bi, 2, k); \ - f_rl(bo, bi, 3, k); \ -} while (0) - -static void crypto_aes_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in) -{ - const struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm); - u32 b0[4], b1[4]; - const u32 *kp = ctx->key_enc + 4; - const int key_len = ctx->key_length; - - b0[0] = ctx->key_enc[0] ^ get_unaligned_le32(in); - b0[1] = ctx->key_enc[1] ^ get_unaligned_le32(in + 4); - b0[2] = ctx->key_enc[2] ^ get_unaligned_le32(in + 8); - b0[3] = ctx->key_enc[3] ^ get_unaligned_le32(in + 12); - - if (key_len > 24) { - f_nround(b1, b0, kp); - f_nround(b0, b1, kp); - } - - if (key_len > 16) { - f_nround(b1, b0, kp); - f_nround(b0, b1, kp); - } - - f_nround(b1, b0, kp); - f_nround(b0, b1, kp); - f_nround(b1, b0, kp); - f_nround(b0, b1, kp); - f_nround(b1, b0, kp); - f_nround(b0, b1, kp); - f_nround(b1, b0, kp); - f_nround(b0, b1, kp); - f_nround(b1, b0, kp); - f_lround(b0, b1, kp); - - put_unaligned_le32(b0[0], out); - put_unaligned_le32(b0[1], out + 4); - put_unaligned_le32(b0[2], out + 8); - put_unaligned_le32(b0[3], out + 12); -} - -/* decrypt a block of text */ - -#define i_rn(bo, bi, n, k) do { \ - bo[n] = crypto_it_tab[0][byte(bi[n], 0)] ^ \ - crypto_it_tab[1][byte(bi[(n + 3) & 3], 1)] ^ \ - crypto_it_tab[2][byte(bi[(n + 2) & 3], 2)] ^ \ - crypto_it_tab[3][byte(bi[(n + 1) & 3], 3)] ^ *(k + n); \ -} while (0) - -#define i_nround(bo, bi, k) do {\ - i_rn(bo, bi, 0, k); \ - i_rn(bo, bi, 1, k); \ - i_rn(bo, bi, 2, k); \ - i_rn(bo, bi, 3, k); \ - k += 4; \ -} while (0) - -#define i_rl(bo, bi, n, k) do { \ - bo[n] = crypto_il_tab[0][byte(bi[n], 0)] ^ \ - crypto_il_tab[1][byte(bi[(n + 3) & 3], 1)] ^ \ - crypto_il_tab[2][byte(bi[(n + 2) & 3], 2)] ^ \ - crypto_il_tab[3][byte(bi[(n + 1) & 3], 3)] ^ *(k + n); \ -} while (0) - -#define i_lround(bo, bi, k) do {\ - i_rl(bo, bi, 0, k); \ - i_rl(bo, bi, 1, k); \ - i_rl(bo, bi, 2, k); \ - i_rl(bo, bi, 3, k); \ -} while (0) - -static void crypto_aes_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in) -{ - const struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm); - u32 b0[4], b1[4]; - const int key_len = ctx->key_length; - const u32 *kp = ctx->key_dec + 4; - - b0[0] = ctx->key_dec[0] ^ get_unaligned_le32(in); - b0[1] = ctx->key_dec[1] ^ get_unaligned_le32(in + 4); - b0[2] = ctx->key_dec[2] ^ get_unaligned_le32(in + 8); - b0[3] = ctx->key_dec[3] ^ get_unaligned_le32(in + 12); - - if (key_len > 24) { - i_nround(b1, b0, kp); - i_nround(b0, b1, kp); 
- } - - if (key_len > 16) { - i_nround(b1, b0, kp); - i_nround(b0, b1, kp); - } - - i_nround(b1, b0, kp); - i_nround(b0, b1, kp); - i_nround(b1, b0, kp); - i_nround(b0, b1, kp); - i_nround(b1, b0, kp); - i_nround(b0, b1, kp); - i_nround(b1, b0, kp); - i_nround(b0, b1, kp); - i_nround(b1, b0, kp); - i_lround(b0, b1, kp); - - put_unaligned_le32(b0[0], out); - put_unaligned_le32(b0[1], out + 4); - put_unaligned_le32(b0[2], out + 8); - put_unaligned_le32(b0[3], out + 12); -} - -static struct crypto_alg aes_alg = { - .cra_name = "aes", - .cra_driver_name = "aes-generic", - .cra_priority = 100, - .cra_flags = CRYPTO_ALG_TYPE_CIPHER, - .cra_blocksize = AES_BLOCK_SIZE, - .cra_ctxsize = sizeof(struct crypto_aes_ctx), - .cra_module = THIS_MODULE, - .cra_u = { - .cipher = { - .cia_min_keysize = AES_MIN_KEY_SIZE, - .cia_max_keysize = AES_MAX_KEY_SIZE, - .cia_setkey = crypto_aes_set_key, - .cia_encrypt = crypto_aes_encrypt, - .cia_decrypt = crypto_aes_decrypt - } - } -}; - -static int __init aes_init(void) -{ - return crypto_register_alg(&aes_alg); -} - -static void __exit aes_fini(void) -{ - crypto_unregister_alg(&aes_alg); -} - -module_init(aes_init); -module_exit(aes_fini); - -MODULE_DESCRIPTION("Rijndael (AES) Cipher Algorithm"); -MODULE_LICENSE("Dual BSD/GPL"); -MODULE_ALIAS_CRYPTO("aes"); -MODULE_ALIAS_CRYPTO("aes-generic"); diff --git a/crypto/crypto_user.c b/crypto/crypto_user.c index aad429bef03e..3187e0d276f9 100644 --- a/crypto/crypto_user.c +++ b/crypto/crypto_user.c @@ -293,7 +293,7 @@ static int crypto_del_alg(struct sk_buff *skb, struct nlmsghdr *nlh, if (!alg) return -ENOENT; - /* We can not unregister core algorithms such as aes-generic. + /* We can not unregister core algorithms such as aes. * We would loose the reference in the crypto_alg_list to this algorithm * if we try to unregister. 
Unregistering such an algorithm without * removing the module is not possible, so we restrict to crypto diff --git a/crypto/testmgr.c b/crypto/testmgr.c index 5df204d9c9dd..cbc049d697a1 100644 --- a/crypto/testmgr.c +++ b/crypto/testmgr.c @@ -4061,14 +4061,14 @@ static int alg_test_null(const struct alg_test_desc *desc, static const struct alg_test_desc alg_test_descs[] = { { .alg = "adiantum(xchacha12,aes)", - .generic_driver = "adiantum(xchacha12-lib,aes-generic)", + .generic_driver = "adiantum(xchacha12-lib,aes-lib)", .test = alg_test_skcipher, .suite = { .cipher = __VECS(adiantum_xchacha12_aes_tv_template) }, }, { .alg = "adiantum(xchacha20,aes)", - .generic_driver = "adiantum(xchacha20-lib,aes-generic)", + .generic_driver = "adiantum(xchacha20-lib,aes-lib)", .test = alg_test_skcipher, .suite = { .cipher = __VECS(adiantum_xchacha20_aes_tv_template) @@ -4088,7 +4088,7 @@ static const struct alg_test_desc alg_test_descs[] = { } }, { .alg = "authenc(hmac(sha1),cbc(aes))", - .generic_driver = "authenc(hmac-sha1-lib,cbc(aes-generic))", + .generic_driver = "authenc(hmac-sha1-lib,cbc(aes-lib))", .test = alg_test_aead, .fips_allowed = 1, .suite = { @@ -4139,7 +4139,7 @@ static const struct alg_test_desc alg_test_descs[] = { } }, { .alg = "authenc(hmac(sha256),cbc(aes))", - .generic_driver = "authenc(hmac-sha256-lib,cbc(aes-generic))", + .generic_driver = "authenc(hmac-sha256-lib,cbc(aes-lib))", .test = alg_test_aead, .fips_allowed = 1, .suite = { @@ -4165,7 +4165,7 @@ static const struct alg_test_desc alg_test_descs[] = { .fips_allowed = 1, }, { .alg = "authenc(hmac(sha256),cts(cbc(aes)))", - .generic_driver = "authenc(hmac-sha256-lib,cts(cbc(aes-generic)))", + .generic_driver = "authenc(hmac-sha256-lib,cts(cbc(aes-lib)))", .test = alg_test_aead, .suite = { .aead = __VECS(krb5_test_aes128_cts_hmac_sha256_128) @@ -4194,7 +4194,7 @@ static const struct alg_test_desc alg_test_descs[] = { .fips_allowed = 1, }, { .alg = "authenc(hmac(sha384),cts(cbc(aes)))", - .generic_driver = "authenc(hmac-sha384-lib,cts(cbc(aes-generic)))", + .generic_driver = "authenc(hmac-sha384-lib,cts(cbc(aes-lib)))", .test = alg_test_aead, .suite = { .aead = __VECS(krb5_test_aes256_cts_hmac_sha384_192) @@ -4205,7 +4205,7 @@ static const struct alg_test_desc alg_test_descs[] = { .fips_allowed = 1, }, { .alg = "authenc(hmac(sha512),cbc(aes))", - .generic_driver = "authenc(hmac-sha512-lib,cbc(aes-generic))", + .generic_driver = "authenc(hmac-sha512-lib,cbc(aes-lib))", .fips_allowed = 1, .test = alg_test_aead, .suite = { @@ -4267,6 +4267,7 @@ static const struct alg_test_desc alg_test_descs[] = { } }, { .alg = "cbc(aes)", + .generic_driver = "cbc(aes-lib)", .test = alg_test_skcipher, .fips_allowed = 1, .suite = { @@ -4362,6 +4363,7 @@ static const struct alg_test_desc alg_test_descs[] = { }, { #endif .alg = "cbcmac(aes)", + .generic_driver = "cbcmac(aes-lib)", .test = alg_test_hash, .suite = { .hash = __VECS(aes_cbcmac_tv_template) @@ -4374,7 +4376,7 @@ static const struct alg_test_desc alg_test_descs[] = { } }, { .alg = "ccm(aes)", - .generic_driver = "ccm_base(ctr(aes-generic),cbcmac(aes-generic))", + .generic_driver = "ccm_base(ctr(aes-lib),cbcmac(aes-lib))", .test = alg_test_aead, .fips_allowed = 1, .suite = { @@ -4402,6 +4404,7 @@ static const struct alg_test_desc alg_test_descs[] = { }, }, { .alg = "cmac(aes)", + .generic_driver = "cmac(aes-lib)", .fips_allowed = 1, .test = alg_test_hash, .suite = { @@ -4443,6 +4446,7 @@ static const struct alg_test_desc alg_test_descs[] = { } }, { .alg = "ctr(aes)", + .generic_driver = 
"ctr(aes-lib)", .test = alg_test_skcipher, .fips_allowed = 1, .suite = { @@ -4533,6 +4537,7 @@ static const struct alg_test_desc alg_test_descs[] = { }, { #endif .alg = "cts(cbc(aes))", + .generic_driver = "cts(cbc(aes-lib))", .test = alg_test_skcipher, .fips_allowed = 1, .suite = { @@ -4689,6 +4694,7 @@ static const struct alg_test_desc alg_test_descs[] = { .test = alg_test_null, }, { .alg = "ecb(aes)", + .generic_driver = "ecb(aes-lib)", .test = alg_test_skcipher, .fips_allowed = 1, .suite = { @@ -4881,7 +4887,7 @@ static const struct alg_test_desc alg_test_descs[] = { } }, { .alg = "essiv(authenc(hmac(sha256),cbc(aes)),sha256)", - .generic_driver = "essiv(authenc(hmac-sha256-lib,cbc(aes-generic)),sha256-lib)", + .generic_driver = "essiv(authenc(hmac-sha256-lib,cbc(aes-lib)),sha256-lib)", .test = alg_test_aead, .fips_allowed = 1, .suite = { @@ -4889,7 +4895,7 @@ static const struct alg_test_desc alg_test_descs[] = { } }, { .alg = "essiv(cbc(aes),sha256)", - .generic_driver = "essiv(cbc(aes-generic),sha256-lib)", + .generic_driver = "essiv(cbc(aes-lib),sha256-lib)", .test = alg_test_skcipher, .fips_allowed = 1, .suite = { @@ -4934,7 +4940,7 @@ static const struct alg_test_desc alg_test_descs[] = { }, { #endif /* CONFIG_CRYPTO_DH_RFC7919_GROUPS */ .alg = "gcm(aes)", - .generic_driver = "gcm_base(ctr(aes-generic),ghash-generic)", + .generic_driver = "gcm_base(ctr(aes-lib),ghash-generic)", .test = alg_test_aead, .fips_allowed = 1, .suite = { @@ -4962,7 +4968,7 @@ static const struct alg_test_desc alg_test_descs[] = { } }, { .alg = "hctr2(aes)", - .generic_driver = "hctr2_base(xctr(aes-generic),polyval-lib)", + .generic_driver = "hctr2_base(xctr(aes-lib),polyval-lib)", .test = alg_test_skcipher, .suite = { .cipher = __VECS(aes_hctr2_tv_template) @@ -5080,7 +5086,7 @@ static const struct alg_test_desc alg_test_descs[] = { .suite.aead = __VECS(krb5_test_camellia_cts_cmac) }, { .alg = "lrw(aes)", - .generic_driver = "lrw(ecb(aes-generic))", + .generic_driver = "lrw(ecb(aes-lib))", .test = alg_test_skcipher, .suite = { .cipher = __VECS(aes_lrw_tv_template) @@ -5269,6 +5275,7 @@ static const struct alg_test_desc alg_test_descs[] = { .fips_allowed = 1, }, { .alg = "rfc3686(ctr(aes))", + .generic_driver = "rfc3686(ctr(aes-lib))", .test = alg_test_skcipher, .fips_allowed = 1, .suite = { @@ -5282,7 +5289,7 @@ static const struct alg_test_desc alg_test_descs[] = { } }, { .alg = "rfc4106(gcm(aes))", - .generic_driver = "rfc4106(gcm_base(ctr(aes-generic),ghash-generic))", + .generic_driver = "rfc4106(gcm_base(ctr(aes-lib),ghash-generic))", .test = alg_test_aead, .fips_allowed = 1, .suite = { @@ -5294,7 +5301,7 @@ static const struct alg_test_desc alg_test_descs[] = { } }, { .alg = "rfc4309(ccm(aes))", - .generic_driver = "rfc4309(ccm_base(ctr(aes-generic),cbcmac(aes-generic)))", + .generic_driver = "rfc4309(ccm_base(ctr(aes-lib),cbcmac(aes-lib)))", .test = alg_test_aead, .fips_allowed = 1, .suite = { @@ -5306,7 +5313,7 @@ static const struct alg_test_desc alg_test_descs[] = { } }, { .alg = "rfc4543(gcm(aes))", - .generic_driver = "rfc4543(gcm_base(ctr(aes-generic),ghash-generic))", + .generic_driver = "rfc4543(gcm_base(ctr(aes-lib),ghash-generic))", .test = alg_test_aead, .suite = { .aead = { @@ -5483,6 +5490,7 @@ static const struct alg_test_desc alg_test_descs[] = { } }, { .alg = "xcbc(aes)", + .generic_driver = "xcbc(aes-lib)", .test = alg_test_hash, .suite = { .hash = __VECS(aes_xcbc128_tv_template) @@ -5509,13 +5517,14 @@ static const struct alg_test_desc alg_test_descs[] = { }, }, { .alg = 
"xctr(aes)", + .generic_driver = "xctr(aes-lib)", .test = alg_test_skcipher, .suite = { .cipher = __VECS(aes_xctr_tv_template) } }, { .alg = "xts(aes)", - .generic_driver = "xts(ecb(aes-generic))", + .generic_driver = "xts(ecb(aes-lib))", .test = alg_test_skcipher, .fips_allowed = 1, .suite = { diff --git a/drivers/crypto/starfive/jh7110-aes.c b/drivers/crypto/starfive/jh7110-aes.c index 426b24889af8..f1edb4fbf364 100644 --- a/drivers/crypto/starfive/jh7110-aes.c +++ b/drivers/crypto/starfive/jh7110-aes.c @@ -983,27 +983,27 @@ static int starfive_aes_ccm_decrypt(struct aead_request *req) static int starfive_aes_ecb_init_tfm(struct crypto_skcipher *tfm) { - return starfive_aes_init_tfm(tfm, "ecb(aes-generic)"); + return starfive_aes_init_tfm(tfm, "ecb(aes-lib)"); } static int starfive_aes_cbc_init_tfm(struct crypto_skcipher *tfm) { - return starfive_aes_init_tfm(tfm, "cbc(aes-generic)"); + return starfive_aes_init_tfm(tfm, "cbc(aes-lib)"); } static int starfive_aes_ctr_init_tfm(struct crypto_skcipher *tfm) { - return starfive_aes_init_tfm(tfm, "ctr(aes-generic)"); + return starfive_aes_init_tfm(tfm, "ctr(aes-lib)"); } static int starfive_aes_ccm_init_tfm(struct crypto_aead *tfm) { - return starfive_aes_aead_init_tfm(tfm, "ccm_base(ctr(aes-generic),cbcmac(aes-generic))"); + return starfive_aes_aead_init_tfm(tfm, "ccm_base(ctr(aes-lib),cbcmac(aes-lib))"); } static int starfive_aes_gcm_init_tfm(struct crypto_aead *tfm) { - return starfive_aes_aead_init_tfm(tfm, "gcm_base(ctr(aes-generic),ghash-generic)"); + return starfive_aes_aead_init_tfm(tfm, "gcm_base(ctr(aes-lib),ghash-generic)"); } static struct skcipher_engine_alg skcipher_algs[] = { diff --git a/include/crypto/aes.h b/include/crypto/aes.h index e8b83d4849fe..66421795cdab 100644 --- a/include/crypto/aes.h +++ b/include/crypto/aes.h @@ -82,9 +82,6 @@ struct crypto_aes_ctx { u32 key_length; }; -extern const u32 crypto_ft_tab[4][256] ____cacheline_aligned; -extern const u32 crypto_it_tab[4][256] ____cacheline_aligned; - /* * validate key length for AES algorithms */ @@ -102,9 +99,6 @@ static inline int aes_check_keylen(size_t keylen) return 0; } -int crypto_aes_set_key(struct crypto_tfm *tfm, const u8 *in_key, - unsigned int key_len); - /** * aes_expandkey - Expands the AES key as described in FIPS-197 * @ctx: The location where the computed key will be stored. -- cgit v1.2.3 From 2b1ef7aeeb184ee78523f3d24e221296574c6f2d Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Mon, 12 Jan 2026 11:20:09 -0800 Subject: lib/crypto: arm64/aes: Migrate optimized code into library Move the ARM64 optimized AES key expansion and single-block AES en/decryption code into lib/crypto/, wire it up to the AES library API, and remove the superseded crypto_cipher algorithms. The result is that both the AES library and crypto_cipher APIs are now optimized for ARM64, whereas previously only crypto_cipher was (and the optimizations weren't enabled by default, which this fixes as well). Note: to see the diff from arch/arm64/crypto/aes-ce-glue.c to lib/crypto/arm64/aes.h, view this commit with 'git show -M10'. 
Acked-by: Ard Biesheuvel Link: https://lore.kernel.org/r/20260112192035.10427-12-ebiggers@kernel.org Signed-off-by: Eric Biggers --- arch/arm64/crypto/Kconfig | 26 +----- arch/arm64/crypto/Makefile | 6 -- arch/arm64/crypto/aes-ce-ccm-glue.c | 2 - arch/arm64/crypto/aes-ce-core.S | 84 ----------------- arch/arm64/crypto/aes-ce-glue.c | 178 ------------------------------------ arch/arm64/crypto/aes-ce-setkey.h | 6 -- arch/arm64/crypto/aes-cipher-core.S | 132 -------------------------- arch/arm64/crypto/aes-cipher-glue.c | 71 -------------- arch/arm64/crypto/aes-glue.c | 2 - include/crypto/aes.h | 10 ++ lib/crypto/Kconfig | 1 + lib/crypto/Makefile | 5 + lib/crypto/arm64/aes-ce-core.S | 84 +++++++++++++++++ lib/crypto/arm64/aes-cipher-core.S | 132 ++++++++++++++++++++++++++ lib/crypto/arm64/aes.h | 164 +++++++++++++++++++++++++++++++++ 15 files changed, 397 insertions(+), 506 deletions(-) delete mode 100644 arch/arm64/crypto/aes-ce-core.S delete mode 100644 arch/arm64/crypto/aes-ce-glue.c delete mode 100644 arch/arm64/crypto/aes-ce-setkey.h delete mode 100644 arch/arm64/crypto/aes-cipher-core.S delete mode 100644 arch/arm64/crypto/aes-cipher-glue.c create mode 100644 lib/crypto/arm64/aes-ce-core.S create mode 100644 lib/crypto/arm64/aes-cipher-core.S create mode 100644 lib/crypto/arm64/aes.h (limited to 'include') diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig index 4453dff8f0c1..81ed892b3b72 100644 --- a/arch/arm64/crypto/Kconfig +++ b/arch/arm64/crypto/Kconfig @@ -37,34 +37,11 @@ config CRYPTO_SM3_ARM64_CE Architecture: arm64 using: - ARMv8.2 Crypto Extensions -config CRYPTO_AES_ARM64 - tristate "Ciphers: AES, modes: ECB, CBC, CTR, CTS, XCTR, XTS" - select CRYPTO_AES - help - Block ciphers: AES cipher algorithms (FIPS-197) - Length-preserving ciphers: AES with ECB, CBC, CTR, CTS, - XCTR, and XTS modes - AEAD cipher: AES with CBC, ESSIV, and SHA-256 - for fscrypt and dm-crypt - - Architecture: arm64 - -config CRYPTO_AES_ARM64_CE - tristate "Ciphers: AES (ARMv8 Crypto Extensions)" - depends on KERNEL_MODE_NEON - select CRYPTO_ALGAPI - select CRYPTO_LIB_AES - help - Block ciphers: AES cipher algorithms (FIPS-197) - - Architecture: arm64 using: - - ARMv8 Crypto Extensions - config CRYPTO_AES_ARM64_CE_BLK tristate "Ciphers: AES, modes: ECB/CBC/CTR/XTS (ARMv8 Crypto Extensions)" depends on KERNEL_MODE_NEON select CRYPTO_SKCIPHER - select CRYPTO_AES_ARM64_CE + select CRYPTO_LIB_AES select CRYPTO_LIB_SHA256 help Length-preserving ciphers: AES cipher algorithms (FIPS-197) @@ -165,7 +142,6 @@ config CRYPTO_AES_ARM64_CE_CCM tristate "AEAD cipher: AES in CCM mode (ARMv8 Crypto Extensions)" depends on KERNEL_MODE_NEON select CRYPTO_ALGAPI - select CRYPTO_AES_ARM64_CE select CRYPTO_AES_ARM64_CE_BLK select CRYPTO_AEAD select CRYPTO_LIB_AES diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile index 3ab4b58e5c4c..3574e917bc37 100644 --- a/arch/arm64/crypto/Makefile +++ b/arch/arm64/crypto/Makefile @@ -29,9 +29,6 @@ sm4-neon-y := sm4-neon-glue.o sm4-neon-core.o obj-$(CONFIG_CRYPTO_GHASH_ARM64_CE) += ghash-ce.o ghash-ce-y := ghash-ce-glue.o ghash-ce-core.o -obj-$(CONFIG_CRYPTO_AES_ARM64_CE) += aes-ce-cipher.o -aes-ce-cipher-y := aes-ce-core.o aes-ce-glue.o - obj-$(CONFIG_CRYPTO_AES_ARM64_CE_CCM) += aes-ce-ccm.o aes-ce-ccm-y := aes-ce-ccm-glue.o aes-ce-ccm-core.o @@ -41,8 +38,5 @@ aes-ce-blk-y := aes-glue-ce.o aes-ce.o obj-$(CONFIG_CRYPTO_AES_ARM64_NEON_BLK) += aes-neon-blk.o aes-neon-blk-y := aes-glue-neon.o aes-neon.o -obj-$(CONFIG_CRYPTO_AES_ARM64) += aes-arm64.o 
-aes-arm64-y := aes-cipher-core.o aes-cipher-glue.o - obj-$(CONFIG_CRYPTO_AES_ARM64_BS) += aes-neon-bs.o aes-neon-bs-y := aes-neonbs-core.o aes-neonbs-glue.o diff --git a/arch/arm64/crypto/aes-ce-ccm-glue.c b/arch/arm64/crypto/aes-ce-ccm-glue.c index c4fd648471f1..db371ac051fc 100644 --- a/arch/arm64/crypto/aes-ce-ccm-glue.c +++ b/arch/arm64/crypto/aes-ce-ccm-glue.c @@ -17,8 +17,6 @@ #include -#include "aes-ce-setkey.h" - MODULE_IMPORT_NS("CRYPTO_INTERNAL"); static int num_rounds(struct crypto_aes_ctx *ctx) diff --git a/arch/arm64/crypto/aes-ce-core.S b/arch/arm64/crypto/aes-ce-core.S deleted file mode 100644 index e52e13eb8fdb..000000000000 --- a/arch/arm64/crypto/aes-ce-core.S +++ /dev/null @@ -1,84 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0-only */ -/* - * Copyright (C) 2013 - 2017 Linaro Ltd - */ - -#include -#include - - .arch armv8-a+crypto - -SYM_FUNC_START(__aes_ce_encrypt) - sub w3, w3, #2 - ld1 {v0.16b}, [x2] - ld1 {v1.4s}, [x0], #16 - cmp w3, #10 - bmi 0f - bne 3f - mov v3.16b, v1.16b - b 2f -0: mov v2.16b, v1.16b - ld1 {v3.4s}, [x0], #16 -1: aese v0.16b, v2.16b - aesmc v0.16b, v0.16b -2: ld1 {v1.4s}, [x0], #16 - aese v0.16b, v3.16b - aesmc v0.16b, v0.16b -3: ld1 {v2.4s}, [x0], #16 - subs w3, w3, #3 - aese v0.16b, v1.16b - aesmc v0.16b, v0.16b - ld1 {v3.4s}, [x0], #16 - bpl 1b - aese v0.16b, v2.16b - eor v0.16b, v0.16b, v3.16b - st1 {v0.16b}, [x1] - ret -SYM_FUNC_END(__aes_ce_encrypt) - -SYM_FUNC_START(__aes_ce_decrypt) - sub w3, w3, #2 - ld1 {v0.16b}, [x2] - ld1 {v1.4s}, [x0], #16 - cmp w3, #10 - bmi 0f - bne 3f - mov v3.16b, v1.16b - b 2f -0: mov v2.16b, v1.16b - ld1 {v3.4s}, [x0], #16 -1: aesd v0.16b, v2.16b - aesimc v0.16b, v0.16b -2: ld1 {v1.4s}, [x0], #16 - aesd v0.16b, v3.16b - aesimc v0.16b, v0.16b -3: ld1 {v2.4s}, [x0], #16 - subs w3, w3, #3 - aesd v0.16b, v1.16b - aesimc v0.16b, v0.16b - ld1 {v3.4s}, [x0], #16 - bpl 1b - aesd v0.16b, v2.16b - eor v0.16b, v0.16b, v3.16b - st1 {v0.16b}, [x1] - ret -SYM_FUNC_END(__aes_ce_decrypt) - -/* - * __aes_ce_sub() - use the aese instruction to perform the AES sbox - * substitution on each byte in 'input' - */ -SYM_FUNC_START(__aes_ce_sub) - dup v1.4s, w0 - movi v0.16b, #0 - aese v0.16b, v1.16b - umov w0, v0.s[0] - ret -SYM_FUNC_END(__aes_ce_sub) - -SYM_FUNC_START(__aes_ce_invert) - ld1 {v0.4s}, [x1] - aesimc v1.16b, v0.16b - st1 {v1.4s}, [x0] - ret -SYM_FUNC_END(__aes_ce_invert) diff --git a/arch/arm64/crypto/aes-ce-glue.c b/arch/arm64/crypto/aes-ce-glue.c deleted file mode 100644 index a4dad370991d..000000000000 --- a/arch/arm64/crypto/aes-ce-glue.c +++ /dev/null @@ -1,178 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0-only -/* - * aes-ce-cipher.c - core AES cipher using ARMv8 Crypto Extensions - * - * Copyright (C) 2013 - 2017 Linaro Ltd - */ - -#include -#include -#include -#include -#include -#include -#include -#include - -#include "aes-ce-setkey.h" - -MODULE_DESCRIPTION("Synchronous AES cipher using ARMv8 Crypto Extensions"); -MODULE_AUTHOR("Ard Biesheuvel "); -MODULE_LICENSE("GPL v2"); - -struct aes_block { - u8 b[AES_BLOCK_SIZE]; -}; - -asmlinkage void __aes_ce_encrypt(u32 *rk, u8 *out, const u8 *in, int rounds); -asmlinkage void __aes_ce_decrypt(u32 *rk, u8 *out, const u8 *in, int rounds); - -asmlinkage u32 __aes_ce_sub(u32 l); -asmlinkage void __aes_ce_invert(struct aes_block *out, - const struct aes_block *in); - -static int num_rounds(struct crypto_aes_ctx *ctx) -{ - /* - * # of rounds specified by AES: - * 128 bit key 10 rounds - * 192 bit key 12 rounds - * 256 bit key 14 rounds - * => n byte key => 6 + (n/4) rounds - 
*/ - return 6 + ctx->key_length / 4; -} - -static void aes_cipher_encrypt(struct crypto_tfm *tfm, u8 dst[], u8 const src[]) -{ - struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm); - - if (!crypto_simd_usable()) { - aes_encrypt(ctx, dst, src); - return; - } - - scoped_ksimd() - __aes_ce_encrypt(ctx->key_enc, dst, src, num_rounds(ctx)); -} - -static void aes_cipher_decrypt(struct crypto_tfm *tfm, u8 dst[], u8 const src[]) -{ - struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm); - - if (!crypto_simd_usable()) { - aes_decrypt(ctx, dst, src); - return; - } - - scoped_ksimd() - __aes_ce_decrypt(ctx->key_dec, dst, src, num_rounds(ctx)); -} - -int ce_aes_expandkey(struct crypto_aes_ctx *ctx, const u8 *in_key, - unsigned int key_len) -{ - /* - * The AES key schedule round constants - */ - static u8 const rcon[] = { - 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1b, 0x36, - }; - - u32 kwords = key_len / sizeof(u32); - struct aes_block *key_enc, *key_dec; - int i, j; - - if (key_len != AES_KEYSIZE_128 && - key_len != AES_KEYSIZE_192 && - key_len != AES_KEYSIZE_256) - return -EINVAL; - - ctx->key_length = key_len; - for (i = 0; i < kwords; i++) - ctx->key_enc[i] = get_unaligned_le32(in_key + i * sizeof(u32)); - - scoped_ksimd() { - for (i = 0; i < sizeof(rcon); i++) { - u32 *rki = ctx->key_enc + (i * kwords); - u32 *rko = rki + kwords; - - rko[0] = ror32(__aes_ce_sub(rki[kwords - 1]), 8) ^ - rcon[i] ^ rki[0]; - rko[1] = rko[0] ^ rki[1]; - rko[2] = rko[1] ^ rki[2]; - rko[3] = rko[2] ^ rki[3]; - - if (key_len == AES_KEYSIZE_192) { - if (i >= 7) - break; - rko[4] = rko[3] ^ rki[4]; - rko[5] = rko[4] ^ rki[5]; - } else if (key_len == AES_KEYSIZE_256) { - if (i >= 6) - break; - rko[4] = __aes_ce_sub(rko[3]) ^ rki[4]; - rko[5] = rko[4] ^ rki[5]; - rko[6] = rko[5] ^ rki[6]; - rko[7] = rko[6] ^ rki[7]; - } - } - - /* - * Generate the decryption keys for the Equivalent Inverse - * Cipher. This involves reversing the order of the round - * keys, and applying the Inverse Mix Columns transformation on - * all but the first and the last one. 
- */ - key_enc = (struct aes_block *)ctx->key_enc; - key_dec = (struct aes_block *)ctx->key_dec; - j = num_rounds(ctx); - - key_dec[0] = key_enc[j]; - for (i = 1, j--; j > 0; i++, j--) - __aes_ce_invert(key_dec + i, key_enc + j); - key_dec[i] = key_enc[0]; - } - - return 0; -} -EXPORT_SYMBOL(ce_aes_expandkey); - -int ce_aes_setkey(struct crypto_tfm *tfm, const u8 *in_key, - unsigned int key_len) -{ - struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm); - - return ce_aes_expandkey(ctx, in_key, key_len); -} -EXPORT_SYMBOL(ce_aes_setkey); - -static struct crypto_alg aes_alg = { - .cra_name = "aes", - .cra_driver_name = "aes-ce", - .cra_priority = 250, - .cra_flags = CRYPTO_ALG_TYPE_CIPHER, - .cra_blocksize = AES_BLOCK_SIZE, - .cra_ctxsize = sizeof(struct crypto_aes_ctx), - .cra_module = THIS_MODULE, - .cra_cipher = { - .cia_min_keysize = AES_MIN_KEY_SIZE, - .cia_max_keysize = AES_MAX_KEY_SIZE, - .cia_setkey = ce_aes_setkey, - .cia_encrypt = aes_cipher_encrypt, - .cia_decrypt = aes_cipher_decrypt - } -}; - -static int __init aes_mod_init(void) -{ - return crypto_register_alg(&aes_alg); -} - -static void __exit aes_mod_exit(void) -{ - crypto_unregister_alg(&aes_alg); -} - -module_cpu_feature_match(AES, aes_mod_init); -module_exit(aes_mod_exit); diff --git a/arch/arm64/crypto/aes-ce-setkey.h b/arch/arm64/crypto/aes-ce-setkey.h deleted file mode 100644 index fd9ecf07d88c..000000000000 --- a/arch/arm64/crypto/aes-ce-setkey.h +++ /dev/null @@ -1,6 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ - -int ce_aes_setkey(struct crypto_tfm *tfm, const u8 *in_key, - unsigned int key_len); -int ce_aes_expandkey(struct crypto_aes_ctx *ctx, const u8 *in_key, - unsigned int key_len); diff --git a/arch/arm64/crypto/aes-cipher-core.S b/arch/arm64/crypto/aes-cipher-core.S deleted file mode 100644 index 651f701c56a8..000000000000 --- a/arch/arm64/crypto/aes-cipher-core.S +++ /dev/null @@ -1,132 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0-only */ -/* - * Scalar AES core transform - * - * Copyright (C) 2017 Linaro Ltd - */ - -#include -#include -#include - - .text - - rk .req x0 - out .req x1 - in .req x2 - rounds .req x3 - tt .req x2 - - .macro __pair1, sz, op, reg0, reg1, in0, in1e, in1d, shift - .ifc \op\shift, b0 - ubfiz \reg0, \in0, #2, #8 - ubfiz \reg1, \in1e, #2, #8 - .else - ubfx \reg0, \in0, #\shift, #8 - ubfx \reg1, \in1e, #\shift, #8 - .endif - - /* - * AArch64 cannot do byte size indexed loads from a table containing - * 32-bit quantities, i.e., 'ldrb w12, [tt, w12, uxtw #2]' is not a - * valid instruction. 
So perform the shift explicitly first for the - * high bytes (the low byte is shifted implicitly by using ubfiz rather - * than ubfx above) - */ - .ifnc \op, b - ldr \reg0, [tt, \reg0, uxtw #2] - ldr \reg1, [tt, \reg1, uxtw #2] - .else - .if \shift > 0 - lsl \reg0, \reg0, #2 - lsl \reg1, \reg1, #2 - .endif - ldrb \reg0, [tt, \reg0, uxtw] - ldrb \reg1, [tt, \reg1, uxtw] - .endif - .endm - - .macro __pair0, sz, op, reg0, reg1, in0, in1e, in1d, shift - ubfx \reg0, \in0, #\shift, #8 - ubfx \reg1, \in1d, #\shift, #8 - ldr\op \reg0, [tt, \reg0, uxtw #\sz] - ldr\op \reg1, [tt, \reg1, uxtw #\sz] - .endm - - .macro __hround, out0, out1, in0, in1, in2, in3, t0, t1, enc, sz, op - ldp \out0, \out1, [rk], #8 - - __pair\enc \sz, \op, w12, w13, \in0, \in1, \in3, 0 - __pair\enc \sz, \op, w14, w15, \in1, \in2, \in0, 8 - __pair\enc \sz, \op, w16, w17, \in2, \in3, \in1, 16 - __pair\enc \sz, \op, \t0, \t1, \in3, \in0, \in2, 24 - - eor \out0, \out0, w12 - eor \out1, \out1, w13 - eor \out0, \out0, w14, ror #24 - eor \out1, \out1, w15, ror #24 - eor \out0, \out0, w16, ror #16 - eor \out1, \out1, w17, ror #16 - eor \out0, \out0, \t0, ror #8 - eor \out1, \out1, \t1, ror #8 - .endm - - .macro fround, out0, out1, out2, out3, in0, in1, in2, in3, sz=2, op - __hround \out0, \out1, \in0, \in1, \in2, \in3, \out2, \out3, 1, \sz, \op - __hround \out2, \out3, \in2, \in3, \in0, \in1, \in1, \in2, 1, \sz, \op - .endm - - .macro iround, out0, out1, out2, out3, in0, in1, in2, in3, sz=2, op - __hround \out0, \out1, \in0, \in3, \in2, \in1, \out2, \out3, 0, \sz, \op - __hround \out2, \out3, \in2, \in1, \in0, \in3, \in1, \in0, 0, \sz, \op - .endm - - .macro do_crypt, round, ttab, ltab, bsz - ldp w4, w5, [in] - ldp w6, w7, [in, #8] - ldp w8, w9, [rk], #16 - ldp w10, w11, [rk, #-8] - -CPU_BE( rev w4, w4 ) -CPU_BE( rev w5, w5 ) -CPU_BE( rev w6, w6 ) -CPU_BE( rev w7, w7 ) - - eor w4, w4, w8 - eor w5, w5, w9 - eor w6, w6, w10 - eor w7, w7, w11 - - adr_l tt, \ttab - - tbnz rounds, #1, 1f - -0: \round w8, w9, w10, w11, w4, w5, w6, w7 - \round w4, w5, w6, w7, w8, w9, w10, w11 - -1: subs rounds, rounds, #4 - \round w8, w9, w10, w11, w4, w5, w6, w7 - b.ls 3f -2: \round w4, w5, w6, w7, w8, w9, w10, w11 - b 0b -3: adr_l tt, \ltab - \round w4, w5, w6, w7, w8, w9, w10, w11, \bsz, b - -CPU_BE( rev w4, w4 ) -CPU_BE( rev w5, w5 ) -CPU_BE( rev w6, w6 ) -CPU_BE( rev w7, w7 ) - - stp w4, w5, [out] - stp w6, w7, [out, #8] - ret - .endm - -SYM_FUNC_START(__aes_arm64_encrypt) - do_crypt fround, aes_enc_tab, aes_enc_tab + 1, 2 -SYM_FUNC_END(__aes_arm64_encrypt) - - .align 5 -SYM_FUNC_START(__aes_arm64_decrypt) - do_crypt iround, aes_dec_tab, crypto_aes_inv_sbox, 0 -SYM_FUNC_END(__aes_arm64_decrypt) diff --git a/arch/arm64/crypto/aes-cipher-glue.c b/arch/arm64/crypto/aes-cipher-glue.c deleted file mode 100644 index 9b27cbac278b..000000000000 --- a/arch/arm64/crypto/aes-cipher-glue.c +++ /dev/null @@ -1,71 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0-only -/* - * Scalar AES core transform - * - * Copyright (C) 2017 Linaro Ltd - */ - -#include -#include -#include - -asmlinkage void __aes_arm64_encrypt(u32 *rk, u8 *out, const u8 *in, int rounds); -asmlinkage void __aes_arm64_decrypt(u32 *rk, u8 *out, const u8 *in, int rounds); - -static int aes_arm64_setkey(struct crypto_tfm *tfm, const u8 *in_key, - unsigned int key_len) -{ - struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm); - - return aes_expandkey(ctx, in_key, key_len); -} - -static void aes_arm64_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in) -{ - struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm); - 
int rounds = 6 + ctx->key_length / 4; - - __aes_arm64_encrypt(ctx->key_enc, out, in, rounds); -} - -static void aes_arm64_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in) -{ - struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm); - int rounds = 6 + ctx->key_length / 4; - - __aes_arm64_decrypt(ctx->key_dec, out, in, rounds); -} - -static struct crypto_alg aes_alg = { - .cra_name = "aes", - .cra_driver_name = "aes-arm64", - .cra_priority = 200, - .cra_flags = CRYPTO_ALG_TYPE_CIPHER, - .cra_blocksize = AES_BLOCK_SIZE, - .cra_ctxsize = sizeof(struct crypto_aes_ctx), - .cra_module = THIS_MODULE, - - .cra_cipher.cia_min_keysize = AES_MIN_KEY_SIZE, - .cra_cipher.cia_max_keysize = AES_MAX_KEY_SIZE, - .cra_cipher.cia_setkey = aes_arm64_setkey, - .cra_cipher.cia_encrypt = aes_arm64_encrypt, - .cra_cipher.cia_decrypt = aes_arm64_decrypt -}; - -static int __init aes_init(void) -{ - return crypto_register_alg(&aes_alg); -} - -static void __exit aes_fini(void) -{ - crypto_unregister_alg(&aes_alg); -} - -module_init(aes_init); -module_exit(aes_fini); - -MODULE_DESCRIPTION("Scalar AES cipher for arm64"); -MODULE_AUTHOR("Ard Biesheuvel "); -MODULE_LICENSE("GPL v2"); -MODULE_ALIAS_CRYPTO("aes"); diff --git a/arch/arm64/crypto/aes-glue.c b/arch/arm64/crypto/aes-glue.c index c51d4487e9e9..92f43e1cd097 100644 --- a/arch/arm64/crypto/aes-glue.c +++ b/arch/arm64/crypto/aes-glue.c @@ -21,8 +21,6 @@ #include #include -#include "aes-ce-setkey.h" - #ifdef USE_V8_CRYPTO_EXTENSIONS #define MODE "ce" #define PRIO 300 diff --git a/include/crypto/aes.h b/include/crypto/aes.h index 66421795cdab..18af1acbde58 100644 --- a/include/crypto/aes.h +++ b/include/crypto/aes.h @@ -116,6 +116,16 @@ static inline int aes_check_keylen(size_t keylen) int aes_expandkey(struct crypto_aes_ctx *ctx, const u8 *in_key, unsigned int key_len); +/* + * The following functions are temporarily exported for use by the AES mode + * implementations in arch/$(SRCARCH)/crypto/. These exports will go away when + * that code is migrated into lib/crypto/. 
+ */ +#ifdef CONFIG_ARM64 +int ce_aes_expandkey(struct crypto_aes_ctx *ctx, const u8 *in_key, + unsigned int key_len); +#endif + /** * aes_preparekey() - Prepare an AES key for encryption and decryption * @key: (output) The key structure to initialize diff --git a/lib/crypto/Kconfig b/lib/crypto/Kconfig index 60420b421e04..ead47b2a7db6 100644 --- a/lib/crypto/Kconfig +++ b/lib/crypto/Kconfig @@ -15,6 +15,7 @@ config CRYPTO_LIB_AES_ARCH bool depends on CRYPTO_LIB_AES && !UML && !KMSAN default y if ARM + default y if ARM64 config CRYPTO_LIB_AESCFB tristate diff --git a/lib/crypto/Makefile b/lib/crypto/Makefile index 2f6b0f59eb1b..1b690c63fafb 100644 --- a/lib/crypto/Makefile +++ b/lib/crypto/Makefile @@ -24,6 +24,11 @@ CFLAGS_aes.o += -I$(src)/$(SRCARCH) libaes-$(CONFIG_ARM) += arm/aes-cipher-core.o +ifeq ($(CONFIG_ARM64),y) +libaes-y += arm64/aes-cipher-core.o +libaes-$(CONFIG_KERNEL_MODE_NEON) += arm64/aes-ce-core.o +endif + endif # CONFIG_CRYPTO_LIB_AES_ARCH ################################################################################ diff --git a/lib/crypto/arm64/aes-ce-core.S b/lib/crypto/arm64/aes-ce-core.S new file mode 100644 index 000000000000..e52e13eb8fdb --- /dev/null +++ b/lib/crypto/arm64/aes-ce-core.S @@ -0,0 +1,84 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (C) 2013 - 2017 Linaro Ltd + */ + +#include +#include + + .arch armv8-a+crypto + +SYM_FUNC_START(__aes_ce_encrypt) + sub w3, w3, #2 + ld1 {v0.16b}, [x2] + ld1 {v1.4s}, [x0], #16 + cmp w3, #10 + bmi 0f + bne 3f + mov v3.16b, v1.16b + b 2f +0: mov v2.16b, v1.16b + ld1 {v3.4s}, [x0], #16 +1: aese v0.16b, v2.16b + aesmc v0.16b, v0.16b +2: ld1 {v1.4s}, [x0], #16 + aese v0.16b, v3.16b + aesmc v0.16b, v0.16b +3: ld1 {v2.4s}, [x0], #16 + subs w3, w3, #3 + aese v0.16b, v1.16b + aesmc v0.16b, v0.16b + ld1 {v3.4s}, [x0], #16 + bpl 1b + aese v0.16b, v2.16b + eor v0.16b, v0.16b, v3.16b + st1 {v0.16b}, [x1] + ret +SYM_FUNC_END(__aes_ce_encrypt) + +SYM_FUNC_START(__aes_ce_decrypt) + sub w3, w3, #2 + ld1 {v0.16b}, [x2] + ld1 {v1.4s}, [x0], #16 + cmp w3, #10 + bmi 0f + bne 3f + mov v3.16b, v1.16b + b 2f +0: mov v2.16b, v1.16b + ld1 {v3.4s}, [x0], #16 +1: aesd v0.16b, v2.16b + aesimc v0.16b, v0.16b +2: ld1 {v1.4s}, [x0], #16 + aesd v0.16b, v3.16b + aesimc v0.16b, v0.16b +3: ld1 {v2.4s}, [x0], #16 + subs w3, w3, #3 + aesd v0.16b, v1.16b + aesimc v0.16b, v0.16b + ld1 {v3.4s}, [x0], #16 + bpl 1b + aesd v0.16b, v2.16b + eor v0.16b, v0.16b, v3.16b + st1 {v0.16b}, [x1] + ret +SYM_FUNC_END(__aes_ce_decrypt) + +/* + * __aes_ce_sub() - use the aese instruction to perform the AES sbox + * substitution on each byte in 'input' + */ +SYM_FUNC_START(__aes_ce_sub) + dup v1.4s, w0 + movi v0.16b, #0 + aese v0.16b, v1.16b + umov w0, v0.s[0] + ret +SYM_FUNC_END(__aes_ce_sub) + +SYM_FUNC_START(__aes_ce_invert) + ld1 {v0.4s}, [x1] + aesimc v1.16b, v0.16b + st1 {v1.4s}, [x0] + ret +SYM_FUNC_END(__aes_ce_invert) diff --git a/lib/crypto/arm64/aes-cipher-core.S b/lib/crypto/arm64/aes-cipher-core.S new file mode 100644 index 000000000000..651f701c56a8 --- /dev/null +++ b/lib/crypto/arm64/aes-cipher-core.S @@ -0,0 +1,132 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Scalar AES core transform + * + * Copyright (C) 2017 Linaro Ltd + */ + +#include +#include +#include + + .text + + rk .req x0 + out .req x1 + in .req x2 + rounds .req x3 + tt .req x2 + + .macro __pair1, sz, op, reg0, reg1, in0, in1e, in1d, shift + .ifc \op\shift, b0 + ubfiz \reg0, \in0, #2, #8 + ubfiz \reg1, \in1e, #2, #8 + .else + ubfx \reg0, \in0, #\shift, #8 + ubfx 
\reg1, \in1e, #\shift, #8 + .endif + + /* + * AArch64 cannot do byte size indexed loads from a table containing + * 32-bit quantities, i.e., 'ldrb w12, [tt, w12, uxtw #2]' is not a + * valid instruction. So perform the shift explicitly first for the + * high bytes (the low byte is shifted implicitly by using ubfiz rather + * than ubfx above) + */ + .ifnc \op, b + ldr \reg0, [tt, \reg0, uxtw #2] + ldr \reg1, [tt, \reg1, uxtw #2] + .else + .if \shift > 0 + lsl \reg0, \reg0, #2 + lsl \reg1, \reg1, #2 + .endif + ldrb \reg0, [tt, \reg0, uxtw] + ldrb \reg1, [tt, \reg1, uxtw] + .endif + .endm + + .macro __pair0, sz, op, reg0, reg1, in0, in1e, in1d, shift + ubfx \reg0, \in0, #\shift, #8 + ubfx \reg1, \in1d, #\shift, #8 + ldr\op \reg0, [tt, \reg0, uxtw #\sz] + ldr\op \reg1, [tt, \reg1, uxtw #\sz] + .endm + + .macro __hround, out0, out1, in0, in1, in2, in3, t0, t1, enc, sz, op + ldp \out0, \out1, [rk], #8 + + __pair\enc \sz, \op, w12, w13, \in0, \in1, \in3, 0 + __pair\enc \sz, \op, w14, w15, \in1, \in2, \in0, 8 + __pair\enc \sz, \op, w16, w17, \in2, \in3, \in1, 16 + __pair\enc \sz, \op, \t0, \t1, \in3, \in0, \in2, 24 + + eor \out0, \out0, w12 + eor \out1, \out1, w13 + eor \out0, \out0, w14, ror #24 + eor \out1, \out1, w15, ror #24 + eor \out0, \out0, w16, ror #16 + eor \out1, \out1, w17, ror #16 + eor \out0, \out0, \t0, ror #8 + eor \out1, \out1, \t1, ror #8 + .endm + + .macro fround, out0, out1, out2, out3, in0, in1, in2, in3, sz=2, op + __hround \out0, \out1, \in0, \in1, \in2, \in3, \out2, \out3, 1, \sz, \op + __hround \out2, \out3, \in2, \in3, \in0, \in1, \in1, \in2, 1, \sz, \op + .endm + + .macro iround, out0, out1, out2, out3, in0, in1, in2, in3, sz=2, op + __hround \out0, \out1, \in0, \in3, \in2, \in1, \out2, \out3, 0, \sz, \op + __hround \out2, \out3, \in2, \in1, \in0, \in3, \in1, \in0, 0, \sz, \op + .endm + + .macro do_crypt, round, ttab, ltab, bsz + ldp w4, w5, [in] + ldp w6, w7, [in, #8] + ldp w8, w9, [rk], #16 + ldp w10, w11, [rk, #-8] + +CPU_BE( rev w4, w4 ) +CPU_BE( rev w5, w5 ) +CPU_BE( rev w6, w6 ) +CPU_BE( rev w7, w7 ) + + eor w4, w4, w8 + eor w5, w5, w9 + eor w6, w6, w10 + eor w7, w7, w11 + + adr_l tt, \ttab + + tbnz rounds, #1, 1f + +0: \round w8, w9, w10, w11, w4, w5, w6, w7 + \round w4, w5, w6, w7, w8, w9, w10, w11 + +1: subs rounds, rounds, #4 + \round w8, w9, w10, w11, w4, w5, w6, w7 + b.ls 3f +2: \round w4, w5, w6, w7, w8, w9, w10, w11 + b 0b +3: adr_l tt, \ltab + \round w4, w5, w6, w7, w8, w9, w10, w11, \bsz, b + +CPU_BE( rev w4, w4 ) +CPU_BE( rev w5, w5 ) +CPU_BE( rev w6, w6 ) +CPU_BE( rev w7, w7 ) + + stp w4, w5, [out] + stp w6, w7, [out, #8] + ret + .endm + +SYM_FUNC_START(__aes_arm64_encrypt) + do_crypt fround, aes_enc_tab, aes_enc_tab + 1, 2 +SYM_FUNC_END(__aes_arm64_encrypt) + + .align 5 +SYM_FUNC_START(__aes_arm64_decrypt) + do_crypt iround, aes_dec_tab, crypto_aes_inv_sbox, 0 +SYM_FUNC_END(__aes_arm64_decrypt) diff --git a/lib/crypto/arm64/aes.h b/lib/crypto/arm64/aes.h new file mode 100644 index 000000000000..63eea6271ef9 --- /dev/null +++ b/lib/crypto/arm64/aes.h @@ -0,0 +1,164 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * AES block cipher, optimized for ARM64 + * + * Copyright (C) 2013 - 2017 Linaro Ltd + * Copyright 2026 Google LLC + */ + +#include +#include +#include +#include + +static __ro_after_init DEFINE_STATIC_KEY_FALSE(have_aes); + +struct aes_block { + u8 b[AES_BLOCK_SIZE]; +}; + +asmlinkage void __aes_arm64_encrypt(const u32 rk[], u8 out[AES_BLOCK_SIZE], + const u8 in[AES_BLOCK_SIZE], int rounds); +asmlinkage void __aes_arm64_decrypt(const u32 
inv_rk[], u8 out[AES_BLOCK_SIZE], + const u8 in[AES_BLOCK_SIZE], int rounds); +asmlinkage void __aes_ce_encrypt(const u32 rk[], u8 out[AES_BLOCK_SIZE], + const u8 in[AES_BLOCK_SIZE], int rounds); +asmlinkage void __aes_ce_decrypt(const u32 inv_rk[], u8 out[AES_BLOCK_SIZE], + const u8 in[AES_BLOCK_SIZE], int rounds); +asmlinkage u32 __aes_ce_sub(u32 l); +asmlinkage void __aes_ce_invert(struct aes_block *out, + const struct aes_block *in); + +/* + * Expand an AES key using the crypto extensions if supported and usable or + * generic code otherwise. The expanded key format is compatible between the + * two cases. The outputs are @rndkeys (required) and @inv_rndkeys (optional). + */ +static void aes_expandkey_arm64(u32 rndkeys[], u32 *inv_rndkeys, + const u8 *in_key, int key_len, int nrounds) +{ + /* + * The AES key schedule round constants + */ + static u8 const rcon[] = { + 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1b, 0x36, + }; + + u32 kwords = key_len / sizeof(u32); + struct aes_block *key_enc, *key_dec; + int i, j; + + if (!IS_ENABLED(CONFIG_KERNEL_MODE_NEON) || + !static_branch_likely(&have_aes) || unlikely(!may_use_simd())) { + aes_expandkey_generic(rndkeys, inv_rndkeys, in_key, key_len); + return; + } + + for (i = 0; i < kwords; i++) + rndkeys[i] = get_unaligned_le32(&in_key[i * sizeof(u32)]); + + scoped_ksimd() { + for (i = 0; i < sizeof(rcon); i++) { + u32 *rki = &rndkeys[i * kwords]; + u32 *rko = rki + kwords; + + rko[0] = ror32(__aes_ce_sub(rki[kwords - 1]), 8) ^ + rcon[i] ^ rki[0]; + rko[1] = rko[0] ^ rki[1]; + rko[2] = rko[1] ^ rki[2]; + rko[3] = rko[2] ^ rki[3]; + + if (key_len == AES_KEYSIZE_192) { + if (i >= 7) + break; + rko[4] = rko[3] ^ rki[4]; + rko[5] = rko[4] ^ rki[5]; + } else if (key_len == AES_KEYSIZE_256) { + if (i >= 6) + break; + rko[4] = __aes_ce_sub(rko[3]) ^ rki[4]; + rko[5] = rko[4] ^ rki[5]; + rko[6] = rko[5] ^ rki[6]; + rko[7] = rko[6] ^ rki[7]; + } + } + + /* + * Generate the decryption keys for the Equivalent Inverse + * Cipher. This involves reversing the order of the round + * keys, and applying the Inverse Mix Columns transformation on + * all but the first and the last one. + */ + if (inv_rndkeys) { + key_enc = (struct aes_block *)rndkeys; + key_dec = (struct aes_block *)inv_rndkeys; + j = nrounds; + + key_dec[0] = key_enc[j]; + for (i = 1, j--; j > 0; i++, j--) + __aes_ce_invert(key_dec + i, key_enc + j); + key_dec[i] = key_enc[0]; + } + } +} + +static void aes_preparekey_arch(union aes_enckey_arch *k, + union aes_invkey_arch *inv_k, + const u8 *in_key, int key_len, int nrounds) +{ + aes_expandkey_arm64(k->rndkeys, inv_k ? inv_k->inv_rndkeys : NULL, + in_key, key_len, nrounds); +} + +/* + * This is here temporarily until the remaining AES mode implementations are + * migrated from arch/arm64/crypto/ to lib/crypto/arm64/. 
+ */ +int ce_aes_expandkey(struct crypto_aes_ctx *ctx, const u8 *in_key, + unsigned int key_len) +{ + if (aes_check_keylen(key_len) != 0) + return -EINVAL; + ctx->key_length = key_len; + aes_expandkey_arm64(ctx->key_enc, ctx->key_dec, in_key, key_len, + 6 + key_len / 4); + return 0; +} +EXPORT_SYMBOL(ce_aes_expandkey); + +static void aes_encrypt_arch(const struct aes_enckey *key, + u8 out[AES_BLOCK_SIZE], + const u8 in[AES_BLOCK_SIZE]) +{ + if (IS_ENABLED(CONFIG_KERNEL_MODE_NEON) && + static_branch_likely(&have_aes) && likely(may_use_simd())) { + scoped_ksimd() + __aes_ce_encrypt(key->k.rndkeys, out, in, key->nrounds); + } else { + __aes_arm64_encrypt(key->k.rndkeys, out, in, key->nrounds); + } +} + +static void aes_decrypt_arch(const struct aes_key *key, + u8 out[AES_BLOCK_SIZE], + const u8 in[AES_BLOCK_SIZE]) +{ + if (IS_ENABLED(CONFIG_KERNEL_MODE_NEON) && + static_branch_likely(&have_aes) && likely(may_use_simd())) { + scoped_ksimd() + __aes_ce_decrypt(key->inv_k.inv_rndkeys, out, in, + key->nrounds); + } else { + __aes_arm64_decrypt(key->inv_k.inv_rndkeys, out, in, + key->nrounds); + } +} + +#ifdef CONFIG_KERNEL_MODE_NEON +#define aes_mod_init_arch aes_mod_init_arch +static void aes_mod_init_arch(void) +{ + if (cpu_have_named_feature(AES)) + static_branch_enable(&have_aes); +} +#endif /* CONFIG_KERNEL_MODE_NEON */ -- cgit v1.2.3 From 0892c91b81cc889c95dc03b095b9f4a6fdf93106 Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Mon, 12 Jan 2026 11:20:10 -0800 Subject: lib/crypto: powerpc/aes: Migrate SPE optimized code into library Move the PowerPC SPE AES assembly code into lib/crypto/, wire the key expansion and single-block en/decryption functions up to the AES library API, and remove the superseded "aes-ppc-spe" crypto_cipher algorithm. The result is that both the AES library and crypto_cipher APIs are now optimized with SPE, whereas previously only crypto_cipher was (and optimizations weren't enabled by default, which this commit fixes too). Note that many of the functions in the PowerPC SPE assembly code are still used by the AES mode implementations in arch/powerpc/crypto/. For now, just export these functions. These exports will go away once the AES modes are migrated to the library as well. (Trying to split up the assembly files seemed like much more trouble than it would be worth.) 
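To make the wiring easier to follow: the key-expansion glue in the new
lib/crypto/powerpc/aes.h (its full contents appear later in this patch)
reduces to roughly the sketch below.  The name aes_expandkey_spe() is
hypothetical; ppc_expand_key_128/192/256() and ppc_generate_decrypt_key()
are the real assembly routines being moved.  Note that aes-spe-keys.S
uses only scalar GPR instructions (no ev* SPE ops), so, unlike the block
en/decryption paths, key expansion needs no SPE register save/restore
around it.

	static void aes_expandkey_spe(u32 *key_enc, u32 *key_dec,
				      const u8 *in_key, unsigned int key_len)
	{
		/* Select the SPE-optimized expansion routine by key size. */
		switch (key_len) {
		case AES_KEYSIZE_128:
			ppc_expand_key_128(key_enc, in_key);
			break;
		case AES_KEYSIZE_192:
			ppc_expand_key_192(key_enc, in_key);
			break;
		case AES_KEYSIZE_256:
			ppc_expand_key_256(key_enc, in_key);
			break;
		}
		/* The decryption schedule is derived from the encryption
		 * one, so compute it only when a decryption key is wanted. */
		if (key_dec)
			ppc_generate_decrypt_key(key_dec, key_enc, key_len);
	}
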
Acked-by: Ard Biesheuvel Link: https://lore.kernel.org/r/20260112192035.10427-13-ebiggers@kernel.org Signed-off-by: Eric Biggers --- arch/powerpc/crypto/Kconfig | 2 +- arch/powerpc/crypto/Makefile | 2 +- arch/powerpc/crypto/aes-spe-core.S | 346 -------------------- arch/powerpc/crypto/aes-spe-glue.c | 88 +---- arch/powerpc/crypto/aes-spe-keys.S | 278 ---------------- arch/powerpc/crypto/aes-spe-modes.S | 625 ------------------------------------ arch/powerpc/crypto/aes-spe-regs.h | 37 --- arch/powerpc/crypto/aes-tab-4k.S | 326 ------------------- include/crypto/aes.h | 31 ++ lib/crypto/Kconfig | 1 + lib/crypto/Makefile | 9 + lib/crypto/powerpc/aes-spe-core.S | 346 ++++++++++++++++++++ lib/crypto/powerpc/aes-spe-keys.S | 278 ++++++++++++++++ lib/crypto/powerpc/aes-spe-modes.S | 625 ++++++++++++++++++++++++++++++++++++ lib/crypto/powerpc/aes-spe-regs.h | 37 +++ lib/crypto/powerpc/aes-tab-4k.S | 326 +++++++++++++++++++ lib/crypto/powerpc/aes.h | 74 +++++ 17 files changed, 1734 insertions(+), 1697 deletions(-) delete mode 100644 arch/powerpc/crypto/aes-spe-core.S delete mode 100644 arch/powerpc/crypto/aes-spe-keys.S delete mode 100644 arch/powerpc/crypto/aes-spe-modes.S delete mode 100644 arch/powerpc/crypto/aes-spe-regs.h delete mode 100644 arch/powerpc/crypto/aes-tab-4k.S create mode 100644 lib/crypto/powerpc/aes-spe-core.S create mode 100644 lib/crypto/powerpc/aes-spe-keys.S create mode 100644 lib/crypto/powerpc/aes-spe-modes.S create mode 100644 lib/crypto/powerpc/aes-spe-regs.h create mode 100644 lib/crypto/powerpc/aes-tab-4k.S create mode 100644 lib/crypto/powerpc/aes.h (limited to 'include') diff --git a/arch/powerpc/crypto/Kconfig b/arch/powerpc/crypto/Kconfig index 662aed46f9c7..2d056f1fc90f 100644 --- a/arch/powerpc/crypto/Kconfig +++ b/arch/powerpc/crypto/Kconfig @@ -5,9 +5,9 @@ menu "Accelerated Cryptographic Algorithms for CPU (powerpc)" config CRYPTO_AES_PPC_SPE tristate "Ciphers: AES, modes: ECB/CBC/CTR/XTS (SPE)" depends on SPE + select CRYPTO_LIB_AES select CRYPTO_SKCIPHER help - Block ciphers: AES cipher algorithms (FIPS-197) Length-preserving ciphers: AES with ECB, CBC, CTR, and XTS modes Architecture: powerpc using: diff --git a/arch/powerpc/crypto/Makefile b/arch/powerpc/crypto/Makefile index 5960e5300db7..e22310da86b5 100644 --- a/arch/powerpc/crypto/Makefile +++ b/arch/powerpc/crypto/Makefile @@ -9,7 +9,7 @@ obj-$(CONFIG_CRYPTO_AES_PPC_SPE) += aes-ppc-spe.o obj-$(CONFIG_CRYPTO_AES_GCM_P10) += aes-gcm-p10-crypto.o obj-$(CONFIG_CRYPTO_DEV_VMX_ENCRYPT) += vmx-crypto.o -aes-ppc-spe-y := aes-spe-core.o aes-spe-keys.o aes-tab-4k.o aes-spe-modes.o aes-spe-glue.o +aes-ppc-spe-y := aes-spe-glue.o aes-gcm-p10-crypto-y := aes-gcm-p10-glue.o aes-gcm-p10.o ghashp10-ppc.o aesp10-ppc.o vmx-crypto-objs := vmx.o aesp8-ppc.o ghashp8-ppc.o aes.o aes_cbc.o aes_ctr.o aes_xts.o ghash.o diff --git a/arch/powerpc/crypto/aes-spe-core.S b/arch/powerpc/crypto/aes-spe-core.S deleted file mode 100644 index 8e00eccc352b..000000000000 --- a/arch/powerpc/crypto/aes-spe-core.S +++ /dev/null @@ -1,346 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0-or-later */ -/* - * Fast AES implementation for SPE instruction set (PPC) - * - * This code makes use of the SPE SIMD instruction set as defined in - * http://cache.freescale.com/files/32bit/doc/ref_manual/SPEPIM.pdf - * Implementation is based on optimization guide notes from - * http://cache.freescale.com/files/32bit/doc/app_note/AN2665.pdf - * - * Copyright (c) 2015 Markus Stockhausen - */ - -#include -#include "aes-spe-regs.h" - -#define EAD(in, bpos) \ - 
rlwimi rT0,in,28-((bpos+3)%4)*8,20,27; - -#define DAD(in, bpos) \ - rlwimi rT1,in,24-((bpos+3)%4)*8,24,31; - -#define LWH(out, off) \ - evlwwsplat out,off(rT0); /* load word high */ - -#define LWL(out, off) \ - lwz out,off(rT0); /* load word low */ - -#define LBZ(out, tab, off) \ - lbz out,off(tab); /* load byte */ - -#define LAH(out, in, bpos, off) \ - EAD(in, bpos) /* calc addr + load word high */ \ - LWH(out, off) - -#define LAL(out, in, bpos, off) \ - EAD(in, bpos) /* calc addr + load word low */ \ - LWL(out, off) - -#define LAE(out, in, bpos) \ - EAD(in, bpos) /* calc addr + load enc byte */ \ - LBZ(out, rT0, 8) - -#define LBE(out) \ - LBZ(out, rT0, 8) /* load enc byte */ - -#define LAD(out, in, bpos) \ - DAD(in, bpos) /* calc addr + load dec byte */ \ - LBZ(out, rT1, 0) - -#define LBD(out) \ - LBZ(out, rT1, 0) - -/* - * ppc_encrypt_block: The central encryption function for a single 16 bytes - * block. It does no stack handling or register saving to support fast calls - * via bl/blr. It expects that caller has pre-xored input data with first - * 4 words of encryption key into rD0-rD3. Pointer/counter registers must - * have also been set up before (rT0, rKP, CTR). Output is stored in rD0-rD3 - * and rW0-rW3 and caller must execute a final xor on the output registers. - * All working registers rD0-rD3 & rW0-rW7 are overwritten during processing. - * - */ -_GLOBAL(ppc_encrypt_block) - LAH(rW4, rD1, 2, 4) - LAH(rW6, rD0, 3, 0) - LAH(rW3, rD0, 1, 8) -ppc_encrypt_block_loop: - LAH(rW0, rD3, 0, 12) - LAL(rW0, rD0, 0, 12) - LAH(rW1, rD1, 0, 12) - LAH(rW2, rD2, 1, 8) - LAL(rW2, rD3, 1, 8) - LAL(rW3, rD1, 1, 8) - LAL(rW4, rD2, 2, 4) - LAL(rW6, rD1, 3, 0) - LAH(rW5, rD3, 2, 4) - LAL(rW5, rD0, 2, 4) - LAH(rW7, rD2, 3, 0) - evldw rD1,16(rKP) - EAD(rD3, 3) - evxor rW2,rW2,rW4 - LWL(rW7, 0) - evxor rW2,rW2,rW6 - EAD(rD2, 0) - evxor rD1,rD1,rW2 - LWL(rW1, 12) - evxor rD1,rD1,rW0 - evldw rD3,24(rKP) - evmergehi rD0,rD0,rD1 - EAD(rD1, 2) - evxor rW3,rW3,rW5 - LWH(rW4, 4) - evxor rW3,rW3,rW7 - EAD(rD0, 3) - evxor rD3,rD3,rW3 - LWH(rW6, 0) - evxor rD3,rD3,rW1 - EAD(rD0, 1) - evmergehi rD2,rD2,rD3 - LWH(rW3, 8) - LAH(rW0, rD3, 0, 12) - LAL(rW0, rD0, 0, 12) - LAH(rW1, rD1, 0, 12) - LAH(rW2, rD2, 1, 8) - LAL(rW2, rD3, 1, 8) - LAL(rW3, rD1, 1, 8) - LAL(rW4, rD2, 2, 4) - LAL(rW6, rD1, 3, 0) - LAH(rW5, rD3, 2, 4) - LAL(rW5, rD0, 2, 4) - LAH(rW7, rD2, 3, 0) - evldw rD1,32(rKP) - EAD(rD3, 3) - evxor rW2,rW2,rW4 - LWL(rW7, 0) - evxor rW2,rW2,rW6 - EAD(rD2, 0) - evxor rD1,rD1,rW2 - LWL(rW1, 12) - evxor rD1,rD1,rW0 - evldw rD3,40(rKP) - evmergehi rD0,rD0,rD1 - EAD(rD1, 2) - evxor rW3,rW3,rW5 - LWH(rW4, 4) - evxor rW3,rW3,rW7 - EAD(rD0, 3) - evxor rD3,rD3,rW3 - LWH(rW6, 0) - evxor rD3,rD3,rW1 - EAD(rD0, 1) - evmergehi rD2,rD2,rD3 - LWH(rW3, 8) - addi rKP,rKP,32 - bdnz ppc_encrypt_block_loop - LAH(rW0, rD3, 0, 12) - LAL(rW0, rD0, 0, 12) - LAH(rW1, rD1, 0, 12) - LAH(rW2, rD2, 1, 8) - LAL(rW2, rD3, 1, 8) - LAL(rW3, rD1, 1, 8) - LAL(rW4, rD2, 2, 4) - LAH(rW5, rD3, 2, 4) - LAL(rW6, rD1, 3, 0) - LAL(rW5, rD0, 2, 4) - LAH(rW7, rD2, 3, 0) - evldw rD1,16(rKP) - EAD(rD3, 3) - evxor rW2,rW2,rW4 - LWL(rW7, 0) - evxor rW2,rW2,rW6 - EAD(rD2, 0) - evxor rD1,rD1,rW2 - LWL(rW1, 12) - evxor rD1,rD1,rW0 - evldw rD3,24(rKP) - evmergehi rD0,rD0,rD1 - EAD(rD1, 0) - evxor rW3,rW3,rW5 - LBE(rW2) - evxor rW3,rW3,rW7 - EAD(rD0, 1) - evxor rD3,rD3,rW3 - LBE(rW6) - evxor rD3,rD3,rW1 - EAD(rD0, 0) - evmergehi rD2,rD2,rD3 - LBE(rW1) - LAE(rW0, rD3, 0) - LAE(rW1, rD0, 0) - LAE(rW4, rD2, 1) - LAE(rW5, rD3, 1) - LAE(rW3, rD2, 0) - LAE(rW7, rD1, 
1) - rlwimi rW0,rW4,8,16,23 - rlwimi rW1,rW5,8,16,23 - LAE(rW4, rD1, 2) - LAE(rW5, rD2, 2) - rlwimi rW2,rW6,8,16,23 - rlwimi rW3,rW7,8,16,23 - LAE(rW6, rD3, 2) - LAE(rW7, rD0, 2) - rlwimi rW0,rW4,16,8,15 - rlwimi rW1,rW5,16,8,15 - LAE(rW4, rD0, 3) - LAE(rW5, rD1, 3) - rlwimi rW2,rW6,16,8,15 - lwz rD0,32(rKP) - rlwimi rW3,rW7,16,8,15 - lwz rD1,36(rKP) - LAE(rW6, rD2, 3) - LAE(rW7, rD3, 3) - rlwimi rW0,rW4,24,0,7 - lwz rD2,40(rKP) - rlwimi rW1,rW5,24,0,7 - lwz rD3,44(rKP) - rlwimi rW2,rW6,24,0,7 - rlwimi rW3,rW7,24,0,7 - blr - -/* - * ppc_decrypt_block: The central decryption function for a single 16 bytes - * block. It does no stack handling or register saving to support fast calls - * via bl/blr. It expects that caller has pre-xored input data with first - * 4 words of encryption key into rD0-rD3. Pointer/counter registers must - * have also been set up before (rT0, rKP, CTR). Output is stored in rD0-rD3 - * and rW0-rW3 and caller must execute a final xor on the output registers. - * All working registers rD0-rD3 & rW0-rW7 are overwritten during processing. - * - */ -_GLOBAL(ppc_decrypt_block) - LAH(rW0, rD1, 0, 12) - LAH(rW6, rD0, 3, 0) - LAH(rW3, rD0, 1, 8) -ppc_decrypt_block_loop: - LAH(rW1, rD3, 0, 12) - LAL(rW0, rD2, 0, 12) - LAH(rW2, rD2, 1, 8) - LAL(rW2, rD3, 1, 8) - LAH(rW4, rD3, 2, 4) - LAL(rW4, rD0, 2, 4) - LAL(rW6, rD1, 3, 0) - LAH(rW5, rD1, 2, 4) - LAH(rW7, rD2, 3, 0) - LAL(rW7, rD3, 3, 0) - LAL(rW3, rD1, 1, 8) - evldw rD1,16(rKP) - EAD(rD0, 0) - evxor rW4,rW4,rW6 - LWL(rW1, 12) - evxor rW0,rW0,rW4 - EAD(rD2, 2) - evxor rW0,rW0,rW2 - LWL(rW5, 4) - evxor rD1,rD1,rW0 - evldw rD3,24(rKP) - evmergehi rD0,rD0,rD1 - EAD(rD1, 0) - evxor rW3,rW3,rW7 - LWH(rW0, 12) - evxor rW3,rW3,rW1 - EAD(rD0, 3) - evxor rD3,rD3,rW3 - LWH(rW6, 0) - evxor rD3,rD3,rW5 - EAD(rD0, 1) - evmergehi rD2,rD2,rD3 - LWH(rW3, 8) - LAH(rW1, rD3, 0, 12) - LAL(rW0, rD2, 0, 12) - LAH(rW2, rD2, 1, 8) - LAL(rW2, rD3, 1, 8) - LAH(rW4, rD3, 2, 4) - LAL(rW4, rD0, 2, 4) - LAL(rW6, rD1, 3, 0) - LAH(rW5, rD1, 2, 4) - LAH(rW7, rD2, 3, 0) - LAL(rW7, rD3, 3, 0) - LAL(rW3, rD1, 1, 8) - evldw rD1,32(rKP) - EAD(rD0, 0) - evxor rW4,rW4,rW6 - LWL(rW1, 12) - evxor rW0,rW0,rW4 - EAD(rD2, 2) - evxor rW0,rW0,rW2 - LWL(rW5, 4) - evxor rD1,rD1,rW0 - evldw rD3,40(rKP) - evmergehi rD0,rD0,rD1 - EAD(rD1, 0) - evxor rW3,rW3,rW7 - LWH(rW0, 12) - evxor rW3,rW3,rW1 - EAD(rD0, 3) - evxor rD3,rD3,rW3 - LWH(rW6, 0) - evxor rD3,rD3,rW5 - EAD(rD0, 1) - evmergehi rD2,rD2,rD3 - LWH(rW3, 8) - addi rKP,rKP,32 - bdnz ppc_decrypt_block_loop - LAH(rW1, rD3, 0, 12) - LAL(rW0, rD2, 0, 12) - LAH(rW2, rD2, 1, 8) - LAL(rW2, rD3, 1, 8) - LAH(rW4, rD3, 2, 4) - LAL(rW4, rD0, 2, 4) - LAL(rW6, rD1, 3, 0) - LAH(rW5, rD1, 2, 4) - LAH(rW7, rD2, 3, 0) - LAL(rW7, rD3, 3, 0) - LAL(rW3, rD1, 1, 8) - evldw rD1,16(rKP) - EAD(rD0, 0) - evxor rW4,rW4,rW6 - LWL(rW1, 12) - evxor rW0,rW0,rW4 - EAD(rD2, 2) - evxor rW0,rW0,rW2 - LWL(rW5, 4) - evxor rD1,rD1,rW0 - evldw rD3,24(rKP) - evmergehi rD0,rD0,rD1 - DAD(rD1, 0) - evxor rW3,rW3,rW7 - LBD(rW0) - evxor rW3,rW3,rW1 - DAD(rD0, 1) - evxor rD3,rD3,rW3 - LBD(rW6) - evxor rD3,rD3,rW5 - DAD(rD0, 0) - evmergehi rD2,rD2,rD3 - LBD(rW3) - LAD(rW2, rD3, 0) - LAD(rW1, rD2, 0) - LAD(rW4, rD2, 1) - LAD(rW5, rD3, 1) - LAD(rW7, rD1, 1) - rlwimi rW0,rW4,8,16,23 - rlwimi rW1,rW5,8,16,23 - LAD(rW4, rD3, 2) - LAD(rW5, rD0, 2) - rlwimi rW2,rW6,8,16,23 - rlwimi rW3,rW7,8,16,23 - LAD(rW6, rD1, 2) - LAD(rW7, rD2, 2) - rlwimi rW0,rW4,16,8,15 - rlwimi rW1,rW5,16,8,15 - LAD(rW4, rD0, 3) - LAD(rW5, rD1, 3) - rlwimi rW2,rW6,16,8,15 - lwz rD0,32(rKP) - rlwimi 
rW3,rW7,16,8,15 - lwz rD1,36(rKP) - LAD(rW6, rD2, 3) - LAD(rW7, rD3, 3) - rlwimi rW0,rW4,24,0,7 - lwz rD2,40(rKP) - rlwimi rW1,rW5,24,0,7 - lwz rD3,44(rKP) - rlwimi rW2,rW6,24,0,7 - rlwimi rW3,rW7,24,0,7 - blr diff --git a/arch/powerpc/crypto/aes-spe-glue.c b/arch/powerpc/crypto/aes-spe-glue.c index efab78a3a8f6..7d2827e65240 100644 --- a/arch/powerpc/crypto/aes-spe-glue.c +++ b/arch/powerpc/crypto/aes-spe-glue.c @@ -51,30 +51,6 @@ struct ppc_xts_ctx { u32 rounds; }; -extern void ppc_encrypt_aes(u8 *out, const u8 *in, u32 *key_enc, u32 rounds); -extern void ppc_decrypt_aes(u8 *out, const u8 *in, u32 *key_dec, u32 rounds); -extern void ppc_encrypt_ecb(u8 *out, const u8 *in, u32 *key_enc, u32 rounds, - u32 bytes); -extern void ppc_decrypt_ecb(u8 *out, const u8 *in, u32 *key_dec, u32 rounds, - u32 bytes); -extern void ppc_encrypt_cbc(u8 *out, const u8 *in, u32 *key_enc, u32 rounds, - u32 bytes, u8 *iv); -extern void ppc_decrypt_cbc(u8 *out, const u8 *in, u32 *key_dec, u32 rounds, - u32 bytes, u8 *iv); -extern void ppc_crypt_ctr (u8 *out, const u8 *in, u32 *key_enc, u32 rounds, - u32 bytes, u8 *iv); -extern void ppc_encrypt_xts(u8 *out, const u8 *in, u32 *key_enc, u32 rounds, - u32 bytes, u8 *iv, u32 *key_twk); -extern void ppc_decrypt_xts(u8 *out, const u8 *in, u32 *key_dec, u32 rounds, - u32 bytes, u8 *iv, u32 *key_twk); - -extern void ppc_expand_key_128(u32 *key_enc, const u8 *key); -extern void ppc_expand_key_192(u32 *key_enc, const u8 *key); -extern void ppc_expand_key_256(u32 *key_enc, const u8 *key); - -extern void ppc_generate_decrypt_key(u32 *key_dec,u32 *key_enc, - unsigned int key_len); - static void spe_begin(void) { /* disable preemption and save users SPE registers if required */ @@ -89,10 +65,10 @@ static void spe_end(void) preempt_enable(); } -static int ppc_aes_setkey(struct crypto_tfm *tfm, const u8 *in_key, - unsigned int key_len) +static int ppc_aes_setkey_skcipher(struct crypto_skcipher *tfm, + const u8 *in_key, unsigned int key_len) { - struct ppc_aes_ctx *ctx = crypto_tfm_ctx(tfm); + struct ppc_aes_ctx *ctx = crypto_skcipher_ctx(tfm); switch (key_len) { case AES_KEYSIZE_128: @@ -116,12 +92,6 @@ static int ppc_aes_setkey(struct crypto_tfm *tfm, const u8 *in_key, return 0; } -static int ppc_aes_setkey_skcipher(struct crypto_skcipher *tfm, - const u8 *in_key, unsigned int key_len) -{ - return ppc_aes_setkey(crypto_skcipher_tfm(tfm), in_key, key_len); -} - static int ppc_xts_setkey(struct crypto_skcipher *tfm, const u8 *in_key, unsigned int key_len) { @@ -159,24 +129,6 @@ static int ppc_xts_setkey(struct crypto_skcipher *tfm, const u8 *in_key, return 0; } -static void ppc_aes_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in) -{ - struct ppc_aes_ctx *ctx = crypto_tfm_ctx(tfm); - - spe_begin(); - ppc_encrypt_aes(out, in, ctx->key_enc, ctx->rounds); - spe_end(); -} - -static void ppc_aes_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in) -{ - struct ppc_aes_ctx *ctx = crypto_tfm_ctx(tfm); - - spe_begin(); - ppc_decrypt_aes(out, in, ctx->key_dec, ctx->rounds); - spe_end(); -} - static int ppc_ecb_crypt(struct skcipher_request *req, bool enc) { struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); @@ -410,26 +362,6 @@ static int ppc_xts_decrypt(struct skcipher_request *req) * with kmalloc() in the crypto infrastructure */ -static struct crypto_alg aes_cipher_alg = { - .cra_name = "aes", - .cra_driver_name = "aes-ppc-spe", - .cra_priority = 300, - .cra_flags = CRYPTO_ALG_TYPE_CIPHER, - .cra_blocksize = AES_BLOCK_SIZE, - .cra_ctxsize = sizeof(struct ppc_aes_ctx), - 
.cra_alignmask = 0, - .cra_module = THIS_MODULE, - .cra_u = { - .cipher = { - .cia_min_keysize = AES_MIN_KEY_SIZE, - .cia_max_keysize = AES_MAX_KEY_SIZE, - .cia_setkey = ppc_aes_setkey, - .cia_encrypt = ppc_aes_encrypt, - .cia_decrypt = ppc_aes_decrypt - } - } -}; - static struct skcipher_alg aes_skcipher_algs[] = { { .base.cra_name = "ecb(aes)", @@ -488,22 +420,12 @@ static struct skcipher_alg aes_skcipher_algs[] = { static int __init ppc_aes_mod_init(void) { - int err; - - err = crypto_register_alg(&aes_cipher_alg); - if (err) - return err; - - err = crypto_register_skciphers(aes_skcipher_algs, - ARRAY_SIZE(aes_skcipher_algs)); - if (err) - crypto_unregister_alg(&aes_cipher_alg); - return err; + return crypto_register_skciphers(aes_skcipher_algs, + ARRAY_SIZE(aes_skcipher_algs)); } static void __exit ppc_aes_mod_fini(void) { - crypto_unregister_alg(&aes_cipher_alg); crypto_unregister_skciphers(aes_skcipher_algs, ARRAY_SIZE(aes_skcipher_algs)); } diff --git a/arch/powerpc/crypto/aes-spe-keys.S b/arch/powerpc/crypto/aes-spe-keys.S deleted file mode 100644 index 2e1bc0d099bf..000000000000 --- a/arch/powerpc/crypto/aes-spe-keys.S +++ /dev/null @@ -1,278 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0-or-later */ -/* - * Key handling functions for PPC AES implementation - * - * Copyright (c) 2015 Markus Stockhausen - */ - -#include - -#ifdef __BIG_ENDIAN__ -#define LOAD_KEY(d, s, off) \ - lwz d,off(s); -#else -#define LOAD_KEY(d, s, off) \ - li r0,off; \ - lwbrx d,s,r0; -#endif - -#define INITIALIZE_KEY \ - stwu r1,-32(r1); /* create stack frame */ \ - stw r14,8(r1); /* save registers */ \ - stw r15,12(r1); \ - stw r16,16(r1); - -#define FINALIZE_KEY \ - lwz r14,8(r1); /* restore registers */ \ - lwz r15,12(r1); \ - lwz r16,16(r1); \ - xor r5,r5,r5; /* clear sensitive data */ \ - xor r6,r6,r6; \ - xor r7,r7,r7; \ - xor r8,r8,r8; \ - xor r9,r9,r9; \ - xor r10,r10,r10; \ - xor r11,r11,r11; \ - xor r12,r12,r12; \ - addi r1,r1,32; /* cleanup stack */ - -#define LS_BOX(r, t1, t2) \ - lis t2,PPC_AES_4K_ENCTAB@h; \ - ori t2,t2,PPC_AES_4K_ENCTAB@l; \ - rlwimi t2,r,4,20,27; \ - lbz t1,8(t2); \ - rlwimi r,t1,0,24,31; \ - rlwimi t2,r,28,20,27; \ - lbz t1,8(t2); \ - rlwimi r,t1,8,16,23; \ - rlwimi t2,r,20,20,27; \ - lbz t1,8(t2); \ - rlwimi r,t1,16,8,15; \ - rlwimi t2,r,12,20,27; \ - lbz t1,8(t2); \ - rlwimi r,t1,24,0,7; - -#define GF8_MUL(out, in, t1, t2) \ - lis t1,0x8080; /* multiplication in GF8 */ \ - ori t1,t1,0x8080; \ - and t1,t1,in; \ - srwi t1,t1,7; \ - mulli t1,t1,0x1b; \ - lis t2,0x7f7f; \ - ori t2,t2,0x7f7f; \ - and t2,t2,in; \ - slwi t2,t2,1; \ - xor out,t1,t2; - -/* - * ppc_expand_key_128(u32 *key_enc, const u8 *key) - * - * Expand 128 bit key into 176 bytes encryption key. 
It consists of - * key itself plus 10 rounds with 16 bytes each - * - */ -_GLOBAL(ppc_expand_key_128) - INITIALIZE_KEY - LOAD_KEY(r5,r4,0) - LOAD_KEY(r6,r4,4) - LOAD_KEY(r7,r4,8) - LOAD_KEY(r8,r4,12) - stw r5,0(r3) /* key[0..3] = input data */ - stw r6,4(r3) - stw r7,8(r3) - stw r8,12(r3) - li r16,10 /* 10 expansion rounds */ - lis r0,0x0100 /* RCO(1) */ -ppc_expand_128_loop: - addi r3,r3,16 - mr r14,r8 /* apply LS_BOX to 4th temp */ - rotlwi r14,r14,8 - LS_BOX(r14, r15, r4) - xor r14,r14,r0 - xor r5,r5,r14 /* xor next 4 keys */ - xor r6,r6,r5 - xor r7,r7,r6 - xor r8,r8,r7 - stw r5,0(r3) /* store next 4 keys */ - stw r6,4(r3) - stw r7,8(r3) - stw r8,12(r3) - GF8_MUL(r0, r0, r4, r14) /* multiply RCO by 2 in GF */ - subi r16,r16,1 - cmpwi r16,0 - bt eq,ppc_expand_128_end - b ppc_expand_128_loop -ppc_expand_128_end: - FINALIZE_KEY - blr - -/* - * ppc_expand_key_192(u32 *key_enc, const u8 *key) - * - * Expand 192 bit key into 208 bytes encryption key. It consists of key - * itself plus 12 rounds with 16 bytes each - * - */ -_GLOBAL(ppc_expand_key_192) - INITIALIZE_KEY - LOAD_KEY(r5,r4,0) - LOAD_KEY(r6,r4,4) - LOAD_KEY(r7,r4,8) - LOAD_KEY(r8,r4,12) - LOAD_KEY(r9,r4,16) - LOAD_KEY(r10,r4,20) - stw r5,0(r3) - stw r6,4(r3) - stw r7,8(r3) - stw r8,12(r3) - stw r9,16(r3) - stw r10,20(r3) - li r16,8 /* 8 expansion rounds */ - lis r0,0x0100 /* RCO(1) */ -ppc_expand_192_loop: - addi r3,r3,24 - mr r14,r10 /* apply LS_BOX to 6th temp */ - rotlwi r14,r14,8 - LS_BOX(r14, r15, r4) - xor r14,r14,r0 - xor r5,r5,r14 /* xor next 6 keys */ - xor r6,r6,r5 - xor r7,r7,r6 - xor r8,r8,r7 - xor r9,r9,r8 - xor r10,r10,r9 - stw r5,0(r3) - stw r6,4(r3) - stw r7,8(r3) - stw r8,12(r3) - subi r16,r16,1 - cmpwi r16,0 /* last round early kick out */ - bt eq,ppc_expand_192_end - stw r9,16(r3) - stw r10,20(r3) - GF8_MUL(r0, r0, r4, r14) /* multiply RCO GF8 */ - b ppc_expand_192_loop -ppc_expand_192_end: - FINALIZE_KEY - blr - -/* - * ppc_expand_key_256(u32 *key_enc, const u8 *key) - * - * Expand 256 bit key into 240 bytes encryption key. 
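For orientation, one iteration of ppc_expand_128_loop above maps onto the C sketch below. aes_sbox[], xtime4(), sub_word() and expand128_round() are illustrative stand-ins, not part of this patch: LS_BOX actually pulls its S-box bytes out of PPC_AES_4K_ENCTAB at offset 8, and GF8_MUL is the packed GF(2^8) doubling.

	extern const u8 aes_sbox[256];	/* assumed S-box table, sketch only */

	static u32 xtime4(u32 x)	/* GF8_MUL: double 4 packed GF(2^8) bytes */
	{
		return ((x & 0x7f7f7f7f) << 1) ^
		       (((x & 0x80808080) >> 7) * 0x1b);
	}

	static u32 sub_word(u32 x)	/* LS_BOX: S-box on each byte of a word */
	{
		return (aes_sbox[x >> 24] << 24) |
		       (aes_sbox[(x >> 16) & 0xff] << 16) |
		       (aes_sbox[(x >> 8) & 0xff] << 8) |
		        aes_sbox[x & 0xff];
	}

	/*
	 * One ppc_expand_128_loop iteration: w[] holds the previous round
	 * key (r5..r8), *rco the round constant kept in the top byte
	 * (starts as 0x01000000); rol32() as in <linux/bitops.h>.
	 */
	static void expand128_round(u32 w[4], u32 *rco)
	{
		w[0] ^= sub_word(rol32(w[3], 8)) ^ *rco; /* RotWord+SubWord+Rcon */
		w[1] ^= w[0];
		w[2] ^= w[1];
		w[3] ^= w[2];
		*rco = xtime4(*rco);	/* GF8_MUL: next round constant */
	}
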
It consists of key - * itself plus 14 rounds with 16 bytes each - * - */ -_GLOBAL(ppc_expand_key_256) - INITIALIZE_KEY - LOAD_KEY(r5,r4,0) - LOAD_KEY(r6,r4,4) - LOAD_KEY(r7,r4,8) - LOAD_KEY(r8,r4,12) - LOAD_KEY(r9,r4,16) - LOAD_KEY(r10,r4,20) - LOAD_KEY(r11,r4,24) - LOAD_KEY(r12,r4,28) - stw r5,0(r3) - stw r6,4(r3) - stw r7,8(r3) - stw r8,12(r3) - stw r9,16(r3) - stw r10,20(r3) - stw r11,24(r3) - stw r12,28(r3) - li r16,7 /* 7 expansion rounds */ - lis r0,0x0100 /* RCO(1) */ -ppc_expand_256_loop: - addi r3,r3,32 - mr r14,r12 /* apply LS_BOX to 8th temp */ - rotlwi r14,r14,8 - LS_BOX(r14, r15, r4) - xor r14,r14,r0 - xor r5,r5,r14 /* xor 4 keys */ - xor r6,r6,r5 - xor r7,r7,r6 - xor r8,r8,r7 - mr r14,r8 - LS_BOX(r14, r15, r4) /* apply LS_BOX to 4th temp */ - xor r9,r9,r14 /* xor 4 keys */ - xor r10,r10,r9 - xor r11,r11,r10 - xor r12,r12,r11 - stw r5,0(r3) - stw r6,4(r3) - stw r7,8(r3) - stw r8,12(r3) - subi r16,r16,1 - cmpwi r16,0 /* last round early kick out */ - bt eq,ppc_expand_256_end - stw r9,16(r3) - stw r10,20(r3) - stw r11,24(r3) - stw r12,28(r3) - GF8_MUL(r0, r0, r4, r14) - b ppc_expand_256_loop -ppc_expand_256_end: - FINALIZE_KEY - blr - -/* - * ppc_generate_decrypt_key: derive decryption key from encryption key - * number of bytes to handle are calculated from length of key (16/24/32) - * - */ -_GLOBAL(ppc_generate_decrypt_key) - addi r6,r5,24 - slwi r6,r6,2 - lwzx r7,r4,r6 /* first/last 4 words are same */ - stw r7,0(r3) - lwz r7,0(r4) - stwx r7,r3,r6 - addi r6,r6,4 - lwzx r7,r4,r6 - stw r7,4(r3) - lwz r7,4(r4) - stwx r7,r3,r6 - addi r6,r6,4 - lwzx r7,r4,r6 - stw r7,8(r3) - lwz r7,8(r4) - stwx r7,r3,r6 - addi r6,r6,4 - lwzx r7,r4,r6 - stw r7,12(r3) - lwz r7,12(r4) - stwx r7,r3,r6 - addi r3,r3,16 - add r4,r4,r6 - subi r4,r4,28 - addi r5,r5,20 - srwi r5,r5,2 -ppc_generate_decrypt_block: - li r6,4 - mtctr r6 -ppc_generate_decrypt_word: - lwz r6,0(r4) - GF8_MUL(r7, r6, r0, r7) - GF8_MUL(r8, r7, r0, r8) - GF8_MUL(r9, r8, r0, r9) - xor r10,r9,r6 - xor r11,r7,r8 - xor r11,r11,r9 - xor r12,r7,r10 - rotrwi r12,r12,24 - xor r11,r11,r12 - xor r12,r8,r10 - rotrwi r12,r12,16 - xor r11,r11,r12 - rotrwi r12,r10,8 - xor r11,r11,r12 - stw r11,0(r3) - addi r3,r3,4 - addi r4,r4,4 - bdnz ppc_generate_decrypt_word - subi r4,r4,32 - subi r5,r5,1 - cmpwi r5,0 - bt gt,ppc_generate_decrypt_block - blr diff --git a/arch/powerpc/crypto/aes-spe-modes.S b/arch/powerpc/crypto/aes-spe-modes.S deleted file mode 100644 index 3f92a6a85785..000000000000 --- a/arch/powerpc/crypto/aes-spe-modes.S +++ /dev/null @@ -1,625 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0-or-later */ -/* - * AES modes (ECB/CBC/CTR/XTS) for PPC AES implementation - * - * Copyright (c) 2015 Markus Stockhausen - */ - -#include -#include "aes-spe-regs.h" - -#ifdef __BIG_ENDIAN__ /* Macros for big endian builds */ - -#define LOAD_DATA(reg, off) \ - lwz reg,off(rSP); /* load with offset */ -#define SAVE_DATA(reg, off) \ - stw reg,off(rDP); /* save with offset */ -#define NEXT_BLOCK \ - addi rSP,rSP,16; /* increment pointers per bloc */ \ - addi rDP,rDP,16; -#define LOAD_IV(reg, off) \ - lwz reg,off(rIP); /* IV loading with offset */ -#define SAVE_IV(reg, off) \ - stw reg,off(rIP); /* IV saving with offset */ -#define START_IV /* nothing to reset */ -#define CBC_DEC 16 /* CBC decrement per block */ -#define CTR_DEC 1 /* CTR decrement one byte */ - -#else /* Macros for little endian */ - -#define LOAD_DATA(reg, off) \ - lwbrx reg,0,rSP; /* load reversed */ \ - addi rSP,rSP,4; /* and increment pointer */ -#define SAVE_DATA(reg, off) \ - stwbrx 
reg,0,rDP; /* save reversed */ \ - addi rDP,rDP,4; /* and increment pointer */ -#define NEXT_BLOCK /* nothing todo */ -#define LOAD_IV(reg, off) \ - lwbrx reg,0,rIP; /* load reversed */ \ - addi rIP,rIP,4; /* and increment pointer */ -#define SAVE_IV(reg, off) \ - stwbrx reg,0,rIP; /* load reversed */ \ - addi rIP,rIP,4; /* and increment pointer */ -#define START_IV \ - subi rIP,rIP,16; /* must reset pointer */ -#define CBC_DEC 32 /* 2 blocks because of incs */ -#define CTR_DEC 17 /* 1 block because of incs */ - -#endif - -#define SAVE_0_REGS -#define LOAD_0_REGS - -#define SAVE_4_REGS \ - stw rI0,96(r1); /* save 32 bit registers */ \ - stw rI1,100(r1); \ - stw rI2,104(r1); \ - stw rI3,108(r1); - -#define LOAD_4_REGS \ - lwz rI0,96(r1); /* restore 32 bit registers */ \ - lwz rI1,100(r1); \ - lwz rI2,104(r1); \ - lwz rI3,108(r1); - -#define SAVE_8_REGS \ - SAVE_4_REGS \ - stw rG0,112(r1); /* save 32 bit registers */ \ - stw rG1,116(r1); \ - stw rG2,120(r1); \ - stw rG3,124(r1); - -#define LOAD_8_REGS \ - LOAD_4_REGS \ - lwz rG0,112(r1); /* restore 32 bit registers */ \ - lwz rG1,116(r1); \ - lwz rG2,120(r1); \ - lwz rG3,124(r1); - -#define INITIALIZE_CRYPT(tab,nr32bitregs) \ - mflr r0; \ - stwu r1,-160(r1); /* create stack frame */ \ - lis rT0,tab@h; /* en-/decryption table pointer */ \ - stw r0,8(r1); /* save link register */ \ - ori rT0,rT0,tab@l; \ - evstdw r14,16(r1); \ - mr rKS,rKP; \ - evstdw r15,24(r1); /* We must save non volatile */ \ - evstdw r16,32(r1); /* registers. Take the chance */ \ - evstdw r17,40(r1); /* and save the SPE part too */ \ - evstdw r18,48(r1); \ - evstdw r19,56(r1); \ - evstdw r20,64(r1); \ - evstdw r21,72(r1); \ - evstdw r22,80(r1); \ - evstdw r23,88(r1); \ - SAVE_##nr32bitregs##_REGS - -#define FINALIZE_CRYPT(nr32bitregs) \ - lwz r0,8(r1); \ - evldw r14,16(r1); /* restore SPE registers */ \ - evldw r15,24(r1); \ - evldw r16,32(r1); \ - evldw r17,40(r1); \ - evldw r18,48(r1); \ - evldw r19,56(r1); \ - evldw r20,64(r1); \ - evldw r21,72(r1); \ - evldw r22,80(r1); \ - evldw r23,88(r1); \ - LOAD_##nr32bitregs##_REGS \ - mtlr r0; /* restore link register */ \ - xor r0,r0,r0; \ - stw r0,16(r1); /* delete sensitive data */ \ - stw r0,24(r1); /* that we might have pushed */ \ - stw r0,32(r1); /* from other context that runs */ \ - stw r0,40(r1); /* the same code */ \ - stw r0,48(r1); \ - stw r0,56(r1); \ - stw r0,64(r1); \ - stw r0,72(r1); \ - stw r0,80(r1); \ - stw r0,88(r1); \ - addi r1,r1,160; /* cleanup stack frame */ - -#define ENDIAN_SWAP(t0, t1, s0, s1) \ - rotrwi t0,s0,8; /* swap endianness for 2 GPRs */ \ - rotrwi t1,s1,8; \ - rlwimi t0,s0,8,8,15; \ - rlwimi t1,s1,8,8,15; \ - rlwimi t0,s0,8,24,31; \ - rlwimi t1,s1,8,24,31; - -#define GF128_MUL(d0, d1, d2, d3, t0) \ - li t0,0x87; /* multiplication in GF128 */ \ - cmpwi d3,-1; \ - iselgt t0,0,t0; \ - rlwimi d3,d2,0,0,0; /* propagate "carry" bits */ \ - rotlwi d3,d3,1; \ - rlwimi d2,d1,0,0,0; \ - rotlwi d2,d2,1; \ - rlwimi d1,d0,0,0,0; \ - slwi d0,d0,1; /* shift left 128 bit */ \ - rotlwi d1,d1,1; \ - xor d0,d0,t0; - -#define START_KEY(d0, d1, d2, d3) \ - lwz rW0,0(rKP); \ - mtctr rRR; \ - lwz rW1,4(rKP); \ - lwz rW2,8(rKP); \ - lwz rW3,12(rKP); \ - xor rD0,d0,rW0; \ - xor rD1,d1,rW1; \ - xor rD2,d2,rW2; \ - xor rD3,d3,rW3; - -/* - * ppc_encrypt_aes(u8 *out, const u8 *in, u32 *key_enc, - * u32 rounds) - * - * called from glue layer to encrypt a single 16 byte block - * round values are AES128 = 4, AES192 = 5, AES256 = 6 - * - */ -_GLOBAL(ppc_encrypt_aes) - INITIALIZE_CRYPT(PPC_AES_4K_ENCTAB, 0) - LOAD_DATA(rD0, 
0) - LOAD_DATA(rD1, 4) - LOAD_DATA(rD2, 8) - LOAD_DATA(rD3, 12) - START_KEY(rD0, rD1, rD2, rD3) - bl ppc_encrypt_block - xor rD0,rD0,rW0 - SAVE_DATA(rD0, 0) - xor rD1,rD1,rW1 - SAVE_DATA(rD1, 4) - xor rD2,rD2,rW2 - SAVE_DATA(rD2, 8) - xor rD3,rD3,rW3 - SAVE_DATA(rD3, 12) - FINALIZE_CRYPT(0) - blr - -/* - * ppc_decrypt_aes(u8 *out, const u8 *in, u32 *key_dec, - * u32 rounds) - * - * called from glue layer to decrypt a single 16 byte block - * round values are AES128 = 4, AES192 = 5, AES256 = 6 - * - */ -_GLOBAL(ppc_decrypt_aes) - INITIALIZE_CRYPT(PPC_AES_4K_DECTAB,0) - LOAD_DATA(rD0, 0) - addi rT1,rT0,4096 - LOAD_DATA(rD1, 4) - LOAD_DATA(rD2, 8) - LOAD_DATA(rD3, 12) - START_KEY(rD0, rD1, rD2, rD3) - bl ppc_decrypt_block - xor rD0,rD0,rW0 - SAVE_DATA(rD0, 0) - xor rD1,rD1,rW1 - SAVE_DATA(rD1, 4) - xor rD2,rD2,rW2 - SAVE_DATA(rD2, 8) - xor rD3,rD3,rW3 - SAVE_DATA(rD3, 12) - FINALIZE_CRYPT(0) - blr - -/* - * ppc_encrypt_ecb(u8 *out, const u8 *in, u32 *key_enc, - * u32 rounds, u32 bytes); - * - * called from glue layer to encrypt multiple blocks via ECB - * Bytes must be larger or equal 16 and only whole blocks are - * processed. round values are AES128 = 4, AES192 = 5 and - * AES256 = 6 - * - */ -_GLOBAL(ppc_encrypt_ecb) - INITIALIZE_CRYPT(PPC_AES_4K_ENCTAB, 0) -ppc_encrypt_ecb_loop: - LOAD_DATA(rD0, 0) - mr rKP,rKS - LOAD_DATA(rD1, 4) - subi rLN,rLN,16 - LOAD_DATA(rD2, 8) - cmpwi rLN,15 - LOAD_DATA(rD3, 12) - START_KEY(rD0, rD1, rD2, rD3) - bl ppc_encrypt_block - xor rD0,rD0,rW0 - SAVE_DATA(rD0, 0) - xor rD1,rD1,rW1 - SAVE_DATA(rD1, 4) - xor rD2,rD2,rW2 - SAVE_DATA(rD2, 8) - xor rD3,rD3,rW3 - SAVE_DATA(rD3, 12) - NEXT_BLOCK - bt gt,ppc_encrypt_ecb_loop - FINALIZE_CRYPT(0) - blr - -/* - * ppc_decrypt_ecb(u8 *out, const u8 *in, u32 *key_dec, - * u32 rounds, u32 bytes); - * - * called from glue layer to decrypt multiple blocks via ECB - * Bytes must be larger or equal 16 and only whole blocks are - * processed. round values are AES128 = 4, AES192 = 5 and - * AES256 = 6 - * - */ -_GLOBAL(ppc_decrypt_ecb) - INITIALIZE_CRYPT(PPC_AES_4K_DECTAB, 0) - addi rT1,rT0,4096 -ppc_decrypt_ecb_loop: - LOAD_DATA(rD0, 0) - mr rKP,rKS - LOAD_DATA(rD1, 4) - subi rLN,rLN,16 - LOAD_DATA(rD2, 8) - cmpwi rLN,15 - LOAD_DATA(rD3, 12) - START_KEY(rD0, rD1, rD2, rD3) - bl ppc_decrypt_block - xor rD0,rD0,rW0 - SAVE_DATA(rD0, 0) - xor rD1,rD1,rW1 - SAVE_DATA(rD1, 4) - xor rD2,rD2,rW2 - SAVE_DATA(rD2, 8) - xor rD3,rD3,rW3 - SAVE_DATA(rD3, 12) - NEXT_BLOCK - bt gt,ppc_decrypt_ecb_loop - FINALIZE_CRYPT(0) - blr - -/* - * ppc_encrypt_cbc(u8 *out, const u8 *in, u32 *key_enc, - * 32 rounds, u32 bytes, u8 *iv); - * - * called from glue layer to encrypt multiple blocks via CBC - * Bytes must be larger or equal 16 and only whole blocks are - * processed. 
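(The round values quoted here and in the other mode functions are the trip counts that START_KEY loads into CTR for ppc_encrypt_block/ppc_decrypt_block, which execute two AES rounds per loop iteration and two more after the loop, the last one using the S-box only, on top of the initial whitening xor. The true round counts therefore work out to 2 * 4 + 2 = 10, 2 * 5 + 2 = 12 and 2 * 6 + 2 = 14.)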
round values are AES128 = 4, AES192 = 5 and - * AES256 = 6 - * - */ -_GLOBAL(ppc_encrypt_cbc) - INITIALIZE_CRYPT(PPC_AES_4K_ENCTAB, 4) - LOAD_IV(rI0, 0) - LOAD_IV(rI1, 4) - LOAD_IV(rI2, 8) - LOAD_IV(rI3, 12) -ppc_encrypt_cbc_loop: - LOAD_DATA(rD0, 0) - mr rKP,rKS - LOAD_DATA(rD1, 4) - subi rLN,rLN,16 - LOAD_DATA(rD2, 8) - cmpwi rLN,15 - LOAD_DATA(rD3, 12) - xor rD0,rD0,rI0 - xor rD1,rD1,rI1 - xor rD2,rD2,rI2 - xor rD3,rD3,rI3 - START_KEY(rD0, rD1, rD2, rD3) - bl ppc_encrypt_block - xor rI0,rD0,rW0 - SAVE_DATA(rI0, 0) - xor rI1,rD1,rW1 - SAVE_DATA(rI1, 4) - xor rI2,rD2,rW2 - SAVE_DATA(rI2, 8) - xor rI3,rD3,rW3 - SAVE_DATA(rI3, 12) - NEXT_BLOCK - bt gt,ppc_encrypt_cbc_loop - START_IV - SAVE_IV(rI0, 0) - SAVE_IV(rI1, 4) - SAVE_IV(rI2, 8) - SAVE_IV(rI3, 12) - FINALIZE_CRYPT(4) - blr - -/* - * ppc_decrypt_cbc(u8 *out, const u8 *in, u32 *key_dec, - * u32 rounds, u32 bytes, u8 *iv); - * - * called from glue layer to decrypt multiple blocks via CBC - * round values are AES128 = 4, AES192 = 5, AES256 = 6 - * - */ -_GLOBAL(ppc_decrypt_cbc) - INITIALIZE_CRYPT(PPC_AES_4K_DECTAB, 4) - li rT1,15 - LOAD_IV(rI0, 0) - andc rLN,rLN,rT1 - LOAD_IV(rI1, 4) - subi rLN,rLN,16 - LOAD_IV(rI2, 8) - add rSP,rSP,rLN /* reverse processing */ - LOAD_IV(rI3, 12) - add rDP,rDP,rLN - LOAD_DATA(rD0, 0) - addi rT1,rT0,4096 - LOAD_DATA(rD1, 4) - LOAD_DATA(rD2, 8) - LOAD_DATA(rD3, 12) - START_IV - SAVE_IV(rD0, 0) - SAVE_IV(rD1, 4) - SAVE_IV(rD2, 8) - cmpwi rLN,16 - SAVE_IV(rD3, 12) - bt lt,ppc_decrypt_cbc_end -ppc_decrypt_cbc_loop: - mr rKP,rKS - START_KEY(rD0, rD1, rD2, rD3) - bl ppc_decrypt_block - subi rLN,rLN,16 - subi rSP,rSP,CBC_DEC - xor rW0,rD0,rW0 - LOAD_DATA(rD0, 0) - xor rW1,rD1,rW1 - LOAD_DATA(rD1, 4) - xor rW2,rD2,rW2 - LOAD_DATA(rD2, 8) - xor rW3,rD3,rW3 - LOAD_DATA(rD3, 12) - xor rW0,rW0,rD0 - SAVE_DATA(rW0, 0) - xor rW1,rW1,rD1 - SAVE_DATA(rW1, 4) - xor rW2,rW2,rD2 - SAVE_DATA(rW2, 8) - xor rW3,rW3,rD3 - SAVE_DATA(rW3, 12) - cmpwi rLN,15 - subi rDP,rDP,CBC_DEC - bt gt,ppc_decrypt_cbc_loop -ppc_decrypt_cbc_end: - mr rKP,rKS - START_KEY(rD0, rD1, rD2, rD3) - bl ppc_decrypt_block - xor rW0,rW0,rD0 - xor rW1,rW1,rD1 - xor rW2,rW2,rD2 - xor rW3,rW3,rD3 - xor rW0,rW0,rI0 /* decrypt with initial IV */ - SAVE_DATA(rW0, 0) - xor rW1,rW1,rI1 - SAVE_DATA(rW1, 4) - xor rW2,rW2,rI2 - SAVE_DATA(rW2, 8) - xor rW3,rW3,rI3 - SAVE_DATA(rW3, 12) - FINALIZE_CRYPT(4) - blr - -/* - * ppc_crypt_ctr(u8 *out, const u8 *in, u32 *key_enc, - * u32 rounds, u32 bytes, u8 *iv); - * - * called from glue layer to encrypt/decrypt multiple blocks - * via CTR. Number of bytes does not need to be a multiple of - * 16. 
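The counter in rI0..rI3 is one 128-bit big-endian integer; the addic/addze chain in the loop below ripples the carry from the lowest word upward, equivalent to this sketch (ctr128_inc() is a hypothetical helper, with ctr[3] as the least significant word):

	static void ctr128_inc(u32 ctr[4])
	{
		/* matches: addic rI3,rI3,1; addze rI2; addze rI1; addze rI0 */
		if (++ctr[3] == 0 && ++ctr[2] == 0 && ++ctr[1] == 0)
			++ctr[0];
	}

A trailing partial block is handled after the main loop by encrypting one more counter block into the IV buffer and xoring it into the data byte by byte.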
Round values are AES128 = 4, AES192 = 5, AES256 = 6 - * - */ -_GLOBAL(ppc_crypt_ctr) - INITIALIZE_CRYPT(PPC_AES_4K_ENCTAB, 4) - LOAD_IV(rI0, 0) - LOAD_IV(rI1, 4) - LOAD_IV(rI2, 8) - cmpwi rLN,16 - LOAD_IV(rI3, 12) - START_IV - bt lt,ppc_crypt_ctr_partial -ppc_crypt_ctr_loop: - mr rKP,rKS - START_KEY(rI0, rI1, rI2, rI3) - bl ppc_encrypt_block - xor rW0,rD0,rW0 - xor rW1,rD1,rW1 - xor rW2,rD2,rW2 - xor rW3,rD3,rW3 - LOAD_DATA(rD0, 0) - subi rLN,rLN,16 - LOAD_DATA(rD1, 4) - LOAD_DATA(rD2, 8) - LOAD_DATA(rD3, 12) - xor rD0,rD0,rW0 - SAVE_DATA(rD0, 0) - xor rD1,rD1,rW1 - SAVE_DATA(rD1, 4) - xor rD2,rD2,rW2 - SAVE_DATA(rD2, 8) - xor rD3,rD3,rW3 - SAVE_DATA(rD3, 12) - addic rI3,rI3,1 /* increase counter */ - addze rI2,rI2 - addze rI1,rI1 - addze rI0,rI0 - NEXT_BLOCK - cmpwi rLN,15 - bt gt,ppc_crypt_ctr_loop -ppc_crypt_ctr_partial: - cmpwi rLN,0 - bt eq,ppc_crypt_ctr_end - mr rKP,rKS - START_KEY(rI0, rI1, rI2, rI3) - bl ppc_encrypt_block - xor rW0,rD0,rW0 - SAVE_IV(rW0, 0) - xor rW1,rD1,rW1 - SAVE_IV(rW1, 4) - xor rW2,rD2,rW2 - SAVE_IV(rW2, 8) - xor rW3,rD3,rW3 - SAVE_IV(rW3, 12) - mtctr rLN - subi rIP,rIP,CTR_DEC - subi rSP,rSP,1 - subi rDP,rDP,1 -ppc_crypt_ctr_xorbyte: - lbzu rW4,1(rIP) /* bytewise xor for partial block */ - lbzu rW5,1(rSP) - xor rW4,rW4,rW5 - stbu rW4,1(rDP) - bdnz ppc_crypt_ctr_xorbyte - subf rIP,rLN,rIP - addi rIP,rIP,1 - addic rI3,rI3,1 - addze rI2,rI2 - addze rI1,rI1 - addze rI0,rI0 -ppc_crypt_ctr_end: - SAVE_IV(rI0, 0) - SAVE_IV(rI1, 4) - SAVE_IV(rI2, 8) - SAVE_IV(rI3, 12) - FINALIZE_CRYPT(4) - blr - -/* - * ppc_encrypt_xts(u8 *out, const u8 *in, u32 *key_enc, - * u32 rounds, u32 bytes, u8 *iv, u32 *key_twk); - * - * called from glue layer to encrypt multiple blocks via XTS - * If key_twk is given, the initial IV encryption will be - * processed too. Round values are AES128 = 4, AES192 = 5, - * AES256 = 6 - * - */ -_GLOBAL(ppc_encrypt_xts) - INITIALIZE_CRYPT(PPC_AES_4K_ENCTAB, 8) - LOAD_IV(rI0, 0) - LOAD_IV(rI1, 4) - LOAD_IV(rI2, 8) - cmpwi rKT,0 - LOAD_IV(rI3, 12) - bt eq,ppc_encrypt_xts_notweak - mr rKP,rKT - START_KEY(rI0, rI1, rI2, rI3) - bl ppc_encrypt_block - xor rI0,rD0,rW0 - xor rI1,rD1,rW1 - xor rI2,rD2,rW2 - xor rI3,rD3,rW3 -ppc_encrypt_xts_notweak: - ENDIAN_SWAP(rG0, rG1, rI0, rI1) - ENDIAN_SWAP(rG2, rG3, rI2, rI3) -ppc_encrypt_xts_loop: - LOAD_DATA(rD0, 0) - mr rKP,rKS - LOAD_DATA(rD1, 4) - subi rLN,rLN,16 - LOAD_DATA(rD2, 8) - LOAD_DATA(rD3, 12) - xor rD0,rD0,rI0 - xor rD1,rD1,rI1 - xor rD2,rD2,rI2 - xor rD3,rD3,rI3 - START_KEY(rD0, rD1, rD2, rD3) - bl ppc_encrypt_block - xor rD0,rD0,rW0 - xor rD1,rD1,rW1 - xor rD2,rD2,rW2 - xor rD3,rD3,rW3 - xor rD0,rD0,rI0 - SAVE_DATA(rD0, 0) - xor rD1,rD1,rI1 - SAVE_DATA(rD1, 4) - xor rD2,rD2,rI2 - SAVE_DATA(rD2, 8) - xor rD3,rD3,rI3 - SAVE_DATA(rD3, 12) - GF128_MUL(rG0, rG1, rG2, rG3, rW0) - ENDIAN_SWAP(rI0, rI1, rG0, rG1) - ENDIAN_SWAP(rI2, rI3, rG2, rG3) - cmpwi rLN,0 - NEXT_BLOCK - bt gt,ppc_encrypt_xts_loop - START_IV - SAVE_IV(rI0, 0) - SAVE_IV(rI1, 4) - SAVE_IV(rI2, 8) - SAVE_IV(rI3, 12) - FINALIZE_CRYPT(8) - blr - -/* - * ppc_decrypt_xts(u8 *out, const u8 *in, u32 *key_dec, - * u32 rounds, u32 blocks, u8 *iv, u32 *key_twk); - * - * called from glue layer to decrypt multiple blocks via XTS - * If key_twk is given, the initial IV encryption will be - * processed too. 
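Both XTS entry points step the tweak between blocks with GF128_MUL, a multiply-by-x in GF(2^128) reduced by x^128 + x^7 + x^2 + x + 1 (hence the 0x87 constant). Modulo the word reshuffling that ENDIAN_SWAP performs around it, that is the familiar byte-wise doubling, sketched here with a hypothetical helper (t[0] being the least significant byte):

	static void xts_double_tweak(u8 t[16])
	{
		unsigned int carry = 0;
		int i;

		for (i = 0; i < 16; i++) {
			unsigned int c = t[i] >> 7;

			t[i] = (t[i] << 1) | carry;	/* u8 store drops bit 8 */
			carry = c;
		}
		if (carry)
			t[0] ^= 0x87;	/* fold x^128 back in */
	}
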
Round values are AES128 = 4, AES192 = 5, - * AES256 = 6 - * - */ -_GLOBAL(ppc_decrypt_xts) - INITIALIZE_CRYPT(PPC_AES_4K_DECTAB, 8) - LOAD_IV(rI0, 0) - addi rT1,rT0,4096 - LOAD_IV(rI1, 4) - LOAD_IV(rI2, 8) - cmpwi rKT,0 - LOAD_IV(rI3, 12) - bt eq,ppc_decrypt_xts_notweak - subi rT0,rT0,4096 - mr rKP,rKT - START_KEY(rI0, rI1, rI2, rI3) - bl ppc_encrypt_block - xor rI0,rD0,rW0 - xor rI1,rD1,rW1 - xor rI2,rD2,rW2 - xor rI3,rD3,rW3 - addi rT0,rT0,4096 -ppc_decrypt_xts_notweak: - ENDIAN_SWAP(rG0, rG1, rI0, rI1) - ENDIAN_SWAP(rG2, rG3, rI2, rI3) -ppc_decrypt_xts_loop: - LOAD_DATA(rD0, 0) - mr rKP,rKS - LOAD_DATA(rD1, 4) - subi rLN,rLN,16 - LOAD_DATA(rD2, 8) - LOAD_DATA(rD3, 12) - xor rD0,rD0,rI0 - xor rD1,rD1,rI1 - xor rD2,rD2,rI2 - xor rD3,rD3,rI3 - START_KEY(rD0, rD1, rD2, rD3) - bl ppc_decrypt_block - xor rD0,rD0,rW0 - xor rD1,rD1,rW1 - xor rD2,rD2,rW2 - xor rD3,rD3,rW3 - xor rD0,rD0,rI0 - SAVE_DATA(rD0, 0) - xor rD1,rD1,rI1 - SAVE_DATA(rD1, 4) - xor rD2,rD2,rI2 - SAVE_DATA(rD2, 8) - xor rD3,rD3,rI3 - SAVE_DATA(rD3, 12) - GF128_MUL(rG0, rG1, rG2, rG3, rW0) - ENDIAN_SWAP(rI0, rI1, rG0, rG1) - ENDIAN_SWAP(rI2, rI3, rG2, rG3) - cmpwi rLN,0 - NEXT_BLOCK - bt gt,ppc_decrypt_xts_loop - START_IV - SAVE_IV(rI0, 0) - SAVE_IV(rI1, 4) - SAVE_IV(rI2, 8) - SAVE_IV(rI3, 12) - FINALIZE_CRYPT(8) - blr diff --git a/arch/powerpc/crypto/aes-spe-regs.h b/arch/powerpc/crypto/aes-spe-regs.h deleted file mode 100644 index 2eb4c9b94152..000000000000 --- a/arch/powerpc/crypto/aes-spe-regs.h +++ /dev/null @@ -1,37 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0-or-later */ -/* - * Common registers for PPC AES implementation - * - * Copyright (c) 2015 Markus Stockhausen - */ - -#define rKS r0 /* copy of en-/decryption key pointer */ -#define rDP r3 /* destination pointer */ -#define rSP r4 /* source pointer */ -#define rKP r5 /* pointer to en-/decryption key pointer */ -#define rRR r6 /* en-/decryption rounds */ -#define rLN r7 /* length of data to be processed */ -#define rIP r8 /* potiner to IV (CBC/CTR/XTS modes) */ -#define rKT r9 /* pointer to tweak key (XTS mode) */ -#define rT0 r11 /* pointers to en-/decryption tables */ -#define rT1 r10 -#define rD0 r9 /* data */ -#define rD1 r14 -#define rD2 r12 -#define rD3 r15 -#define rW0 r16 /* working registers */ -#define rW1 r17 -#define rW2 r18 -#define rW3 r19 -#define rW4 r20 -#define rW5 r21 -#define rW6 r22 -#define rW7 r23 -#define rI0 r24 /* IV */ -#define rI1 r25 -#define rI2 r26 -#define rI3 r27 -#define rG0 r28 /* endian reversed tweak (XTS mode) */ -#define rG1 r29 -#define rG2 r30 -#define rG3 r31 diff --git a/arch/powerpc/crypto/aes-tab-4k.S b/arch/powerpc/crypto/aes-tab-4k.S deleted file mode 100644 index ceb604bc6f72..000000000000 --- a/arch/powerpc/crypto/aes-tab-4k.S +++ /dev/null @@ -1,326 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0-or-later */ -/* - * 4K AES tables for PPC AES implementation - * - * Copyright (c) 2015 Markus Stockhausen - */ - -/* - * These big endian AES encryption/decryption tables have been taken from - * crypto/aes_generic.c and are designed to be simply accessed by a combination - * of rlwimi/lwz instructions with a minimum of table registers (usually only - * one required). Thus they are aligned to 4K. The locality of rotated values - * is derived from the reduced offsets that are available in the SPE load - * instructions. E.g. evldw, evlwwsplat, ... - * - * For the safety-conscious it has to be noted that they might be vulnerable - * to cache timing attacks because of their size. 
Nevertheless in contrast to - * the generic tables they have been reduced from 16KB to 8KB + 256 bytes. - * This is a quite good tradeoff for low power devices (e.g. routers) without - * dedicated encryption hardware where we usually have no multiuser - * environment. - * - */ - -#define R(a, b, c, d) \ - 0x##a##b##c##d, 0x##d##a##b##c, 0x##c##d##a##b, 0x##b##c##d##a - -.data -.align 12 -.globl PPC_AES_4K_ENCTAB -PPC_AES_4K_ENCTAB: -/* encryption table, same as crypto_ft_tab in crypto/aes-generic.c */ - .long R(c6, 63, 63, a5), R(f8, 7c, 7c, 84) - .long R(ee, 77, 77, 99), R(f6, 7b, 7b, 8d) - .long R(ff, f2, f2, 0d), R(d6, 6b, 6b, bd) - .long R(de, 6f, 6f, b1), R(91, c5, c5, 54) - .long R(60, 30, 30, 50), R(02, 01, 01, 03) - .long R(ce, 67, 67, a9), R(56, 2b, 2b, 7d) - .long R(e7, fe, fe, 19), R(b5, d7, d7, 62) - .long R(4d, ab, ab, e6), R(ec, 76, 76, 9a) - .long R(8f, ca, ca, 45), R(1f, 82, 82, 9d) - .long R(89, c9, c9, 40), R(fa, 7d, 7d, 87) - .long R(ef, fa, fa, 15), R(b2, 59, 59, eb) - .long R(8e, 47, 47, c9), R(fb, f0, f0, 0b) - .long R(41, ad, ad, ec), R(b3, d4, d4, 67) - .long R(5f, a2, a2, fd), R(45, af, af, ea) - .long R(23, 9c, 9c, bf), R(53, a4, a4, f7) - .long R(e4, 72, 72, 96), R(9b, c0, c0, 5b) - .long R(75, b7, b7, c2), R(e1, fd, fd, 1c) - .long R(3d, 93, 93, ae), R(4c, 26, 26, 6a) - .long R(6c, 36, 36, 5a), R(7e, 3f, 3f, 41) - .long R(f5, f7, f7, 02), R(83, cc, cc, 4f) - .long R(68, 34, 34, 5c), R(51, a5, a5, f4) - .long R(d1, e5, e5, 34), R(f9, f1, f1, 08) - .long R(e2, 71, 71, 93), R(ab, d8, d8, 73) - .long R(62, 31, 31, 53), R(2a, 15, 15, 3f) - .long R(08, 04, 04, 0c), R(95, c7, c7, 52) - .long R(46, 23, 23, 65), R(9d, c3, c3, 5e) - .long R(30, 18, 18, 28), R(37, 96, 96, a1) - .long R(0a, 05, 05, 0f), R(2f, 9a, 9a, b5) - .long R(0e, 07, 07, 09), R(24, 12, 12, 36) - .long R(1b, 80, 80, 9b), R(df, e2, e2, 3d) - .long R(cd, eb, eb, 26), R(4e, 27, 27, 69) - .long R(7f, b2, b2, cd), R(ea, 75, 75, 9f) - .long R(12, 09, 09, 1b), R(1d, 83, 83, 9e) - .long R(58, 2c, 2c, 74), R(34, 1a, 1a, 2e) - .long R(36, 1b, 1b, 2d), R(dc, 6e, 6e, b2) - .long R(b4, 5a, 5a, ee), R(5b, a0, a0, fb) - .long R(a4, 52, 52, f6), R(76, 3b, 3b, 4d) - .long R(b7, d6, d6, 61), R(7d, b3, b3, ce) - .long R(52, 29, 29, 7b), R(dd, e3, e3, 3e) - .long R(5e, 2f, 2f, 71), R(13, 84, 84, 97) - .long R(a6, 53, 53, f5), R(b9, d1, d1, 68) - .long R(00, 00, 00, 00), R(c1, ed, ed, 2c) - .long R(40, 20, 20, 60), R(e3, fc, fc, 1f) - .long R(79, b1, b1, c8), R(b6, 5b, 5b, ed) - .long R(d4, 6a, 6a, be), R(8d, cb, cb, 46) - .long R(67, be, be, d9), R(72, 39, 39, 4b) - .long R(94, 4a, 4a, de), R(98, 4c, 4c, d4) - .long R(b0, 58, 58, e8), R(85, cf, cf, 4a) - .long R(bb, d0, d0, 6b), R(c5, ef, ef, 2a) - .long R(4f, aa, aa, e5), R(ed, fb, fb, 16) - .long R(86, 43, 43, c5), R(9a, 4d, 4d, d7) - .long R(66, 33, 33, 55), R(11, 85, 85, 94) - .long R(8a, 45, 45, cf), R(e9, f9, f9, 10) - .long R(04, 02, 02, 06), R(fe, 7f, 7f, 81) - .long R(a0, 50, 50, f0), R(78, 3c, 3c, 44) - .long R(25, 9f, 9f, ba), R(4b, a8, a8, e3) - .long R(a2, 51, 51, f3), R(5d, a3, a3, fe) - .long R(80, 40, 40, c0), R(05, 8f, 8f, 8a) - .long R(3f, 92, 92, ad), R(21, 9d, 9d, bc) - .long R(70, 38, 38, 48), R(f1, f5, f5, 04) - .long R(63, bc, bc, df), R(77, b6, b6, c1) - .long R(af, da, da, 75), R(42, 21, 21, 63) - .long R(20, 10, 10, 30), R(e5, ff, ff, 1a) - .long R(fd, f3, f3, 0e), R(bf, d2, d2, 6d) - .long R(81, cd, cd, 4c), R(18, 0c, 0c, 14) - .long R(26, 13, 13, 35), R(c3, ec, ec, 2f) - .long R(be, 5f, 5f, e1), R(35, 97, 97, a2) - .long R(88, 44, 44, cc), R(2e, 
17, 17, 39) - .long R(93, c4, c4, 57), R(55, a7, a7, f2) - .long R(fc, 7e, 7e, 82), R(7a, 3d, 3d, 47) - .long R(c8, 64, 64, ac), R(ba, 5d, 5d, e7) - .long R(32, 19, 19, 2b), R(e6, 73, 73, 95) - .long R(c0, 60, 60, a0), R(19, 81, 81, 98) - .long R(9e, 4f, 4f, d1), R(a3, dc, dc, 7f) - .long R(44, 22, 22, 66), R(54, 2a, 2a, 7e) - .long R(3b, 90, 90, ab), R(0b, 88, 88, 83) - .long R(8c, 46, 46, ca), R(c7, ee, ee, 29) - .long R(6b, b8, b8, d3), R(28, 14, 14, 3c) - .long R(a7, de, de, 79), R(bc, 5e, 5e, e2) - .long R(16, 0b, 0b, 1d), R(ad, db, db, 76) - .long R(db, e0, e0, 3b), R(64, 32, 32, 56) - .long R(74, 3a, 3a, 4e), R(14, 0a, 0a, 1e) - .long R(92, 49, 49, db), R(0c, 06, 06, 0a) - .long R(48, 24, 24, 6c), R(b8, 5c, 5c, e4) - .long R(9f, c2, c2, 5d), R(bd, d3, d3, 6e) - .long R(43, ac, ac, ef), R(c4, 62, 62, a6) - .long R(39, 91, 91, a8), R(31, 95, 95, a4) - .long R(d3, e4, e4, 37), R(f2, 79, 79, 8b) - .long R(d5, e7, e7, 32), R(8b, c8, c8, 43) - .long R(6e, 37, 37, 59), R(da, 6d, 6d, b7) - .long R(01, 8d, 8d, 8c), R(b1, d5, d5, 64) - .long R(9c, 4e, 4e, d2), R(49, a9, a9, e0) - .long R(d8, 6c, 6c, b4), R(ac, 56, 56, fa) - .long R(f3, f4, f4, 07), R(cf, ea, ea, 25) - .long R(ca, 65, 65, af), R(f4, 7a, 7a, 8e) - .long R(47, ae, ae, e9), R(10, 08, 08, 18) - .long R(6f, ba, ba, d5), R(f0, 78, 78, 88) - .long R(4a, 25, 25, 6f), R(5c, 2e, 2e, 72) - .long R(38, 1c, 1c, 24), R(57, a6, a6, f1) - .long R(73, b4, b4, c7), R(97, c6, c6, 51) - .long R(cb, e8, e8, 23), R(a1, dd, dd, 7c) - .long R(e8, 74, 74, 9c), R(3e, 1f, 1f, 21) - .long R(96, 4b, 4b, dd), R(61, bd, bd, dc) - .long R(0d, 8b, 8b, 86), R(0f, 8a, 8a, 85) - .long R(e0, 70, 70, 90), R(7c, 3e, 3e, 42) - .long R(71, b5, b5, c4), R(cc, 66, 66, aa) - .long R(90, 48, 48, d8), R(06, 03, 03, 05) - .long R(f7, f6, f6, 01), R(1c, 0e, 0e, 12) - .long R(c2, 61, 61, a3), R(6a, 35, 35, 5f) - .long R(ae, 57, 57, f9), R(69, b9, b9, d0) - .long R(17, 86, 86, 91), R(99, c1, c1, 58) - .long R(3a, 1d, 1d, 27), R(27, 9e, 9e, b9) - .long R(d9, e1, e1, 38), R(eb, f8, f8, 13) - .long R(2b, 98, 98, b3), R(22, 11, 11, 33) - .long R(d2, 69, 69, bb), R(a9, d9, d9, 70) - .long R(07, 8e, 8e, 89), R(33, 94, 94, a7) - .long R(2d, 9b, 9b, b6), R(3c, 1e, 1e, 22) - .long R(15, 87, 87, 92), R(c9, e9, e9, 20) - .long R(87, ce, ce, 49), R(aa, 55, 55, ff) - .long R(50, 28, 28, 78), R(a5, df, df, 7a) - .long R(03, 8c, 8c, 8f), R(59, a1, a1, f8) - .long R(09, 89, 89, 80), R(1a, 0d, 0d, 17) - .long R(65, bf, bf, da), R(d7, e6, e6, 31) - .long R(84, 42, 42, c6), R(d0, 68, 68, b8) - .long R(82, 41, 41, c3), R(29, 99, 99, b0) - .long R(5a, 2d, 2d, 77), R(1e, 0f, 0f, 11) - .long R(7b, b0, b0, cb), R(a8, 54, 54, fc) - .long R(6d, bb, bb, d6), R(2c, 16, 16, 3a) -.globl PPC_AES_4K_DECTAB -PPC_AES_4K_DECTAB: -/* decryption table, same as crypto_it_tab in crypto/aes-generic.c */ - .long R(51, f4, a7, 50), R(7e, 41, 65, 53) - .long R(1a, 17, a4, c3), R(3a, 27, 5e, 96) - .long R(3b, ab, 6b, cb), R(1f, 9d, 45, f1) - .long R(ac, fa, 58, ab), R(4b, e3, 03, 93) - .long R(20, 30, fa, 55), R(ad, 76, 6d, f6) - .long R(88, cc, 76, 91), R(f5, 02, 4c, 25) - .long R(4f, e5, d7, fc), R(c5, 2a, cb, d7) - .long R(26, 35, 44, 80), R(b5, 62, a3, 8f) - .long R(de, b1, 5a, 49), R(25, ba, 1b, 67) - .long R(45, ea, 0e, 98), R(5d, fe, c0, e1) - .long R(c3, 2f, 75, 02), R(81, 4c, f0, 12) - .long R(8d, 46, 97, a3), R(6b, d3, f9, c6) - .long R(03, 8f, 5f, e7), R(15, 92, 9c, 95) - .long R(bf, 6d, 7a, eb), R(95, 52, 59, da) - .long R(d4, be, 83, 2d), R(58, 74, 21, d3) - .long R(49, e0, 69, 29), R(8e, c9, c8, 44) - 
.long R(75, c2, 89, 6a), R(f4, 8e, 79, 78) - .long R(99, 58, 3e, 6b), R(27, b9, 71, dd) - .long R(be, e1, 4f, b6), R(f0, 88, ad, 17) - .long R(c9, 20, ac, 66), R(7d, ce, 3a, b4) - .long R(63, df, 4a, 18), R(e5, 1a, 31, 82) - .long R(97, 51, 33, 60), R(62, 53, 7f, 45) - .long R(b1, 64, 77, e0), R(bb, 6b, ae, 84) - .long R(fe, 81, a0, 1c), R(f9, 08, 2b, 94) - .long R(70, 48, 68, 58), R(8f, 45, fd, 19) - .long R(94, de, 6c, 87), R(52, 7b, f8, b7) - .long R(ab, 73, d3, 23), R(72, 4b, 02, e2) - .long R(e3, 1f, 8f, 57), R(66, 55, ab, 2a) - .long R(b2, eb, 28, 07), R(2f, b5, c2, 03) - .long R(86, c5, 7b, 9a), R(d3, 37, 08, a5) - .long R(30, 28, 87, f2), R(23, bf, a5, b2) - .long R(02, 03, 6a, ba), R(ed, 16, 82, 5c) - .long R(8a, cf, 1c, 2b), R(a7, 79, b4, 92) - .long R(f3, 07, f2, f0), R(4e, 69, e2, a1) - .long R(65, da, f4, cd), R(06, 05, be, d5) - .long R(d1, 34, 62, 1f), R(c4, a6, fe, 8a) - .long R(34, 2e, 53, 9d), R(a2, f3, 55, a0) - .long R(05, 8a, e1, 32), R(a4, f6, eb, 75) - .long R(0b, 83, ec, 39), R(40, 60, ef, aa) - .long R(5e, 71, 9f, 06), R(bd, 6e, 10, 51) - .long R(3e, 21, 8a, f9), R(96, dd, 06, 3d) - .long R(dd, 3e, 05, ae), R(4d, e6, bd, 46) - .long R(91, 54, 8d, b5), R(71, c4, 5d, 05) - .long R(04, 06, d4, 6f), R(60, 50, 15, ff) - .long R(19, 98, fb, 24), R(d6, bd, e9, 97) - .long R(89, 40, 43, cc), R(67, d9, 9e, 77) - .long R(b0, e8, 42, bd), R(07, 89, 8b, 88) - .long R(e7, 19, 5b, 38), R(79, c8, ee, db) - .long R(a1, 7c, 0a, 47), R(7c, 42, 0f, e9) - .long R(f8, 84, 1e, c9), R(00, 00, 00, 00) - .long R(09, 80, 86, 83), R(32, 2b, ed, 48) - .long R(1e, 11, 70, ac), R(6c, 5a, 72, 4e) - .long R(fd, 0e, ff, fb), R(0f, 85, 38, 56) - .long R(3d, ae, d5, 1e), R(36, 2d, 39, 27) - .long R(0a, 0f, d9, 64), R(68, 5c, a6, 21) - .long R(9b, 5b, 54, d1), R(24, 36, 2e, 3a) - .long R(0c, 0a, 67, b1), R(93, 57, e7, 0f) - .long R(b4, ee, 96, d2), R(1b, 9b, 91, 9e) - .long R(80, c0, c5, 4f), R(61, dc, 20, a2) - .long R(5a, 77, 4b, 69), R(1c, 12, 1a, 16) - .long R(e2, 93, ba, 0a), R(c0, a0, 2a, e5) - .long R(3c, 22, e0, 43), R(12, 1b, 17, 1d) - .long R(0e, 09, 0d, 0b), R(f2, 8b, c7, ad) - .long R(2d, b6, a8, b9), R(14, 1e, a9, c8) - .long R(57, f1, 19, 85), R(af, 75, 07, 4c) - .long R(ee, 99, dd, bb), R(a3, 7f, 60, fd) - .long R(f7, 01, 26, 9f), R(5c, 72, f5, bc) - .long R(44, 66, 3b, c5), R(5b, fb, 7e, 34) - .long R(8b, 43, 29, 76), R(cb, 23, c6, dc) - .long R(b6, ed, fc, 68), R(b8, e4, f1, 63) - .long R(d7, 31, dc, ca), R(42, 63, 85, 10) - .long R(13, 97, 22, 40), R(84, c6, 11, 20) - .long R(85, 4a, 24, 7d), R(d2, bb, 3d, f8) - .long R(ae, f9, 32, 11), R(c7, 29, a1, 6d) - .long R(1d, 9e, 2f, 4b), R(dc, b2, 30, f3) - .long R(0d, 86, 52, ec), R(77, c1, e3, d0) - .long R(2b, b3, 16, 6c), R(a9, 70, b9, 99) - .long R(11, 94, 48, fa), R(47, e9, 64, 22) - .long R(a8, fc, 8c, c4), R(a0, f0, 3f, 1a) - .long R(56, 7d, 2c, d8), R(22, 33, 90, ef) - .long R(87, 49, 4e, c7), R(d9, 38, d1, c1) - .long R(8c, ca, a2, fe), R(98, d4, 0b, 36) - .long R(a6, f5, 81, cf), R(a5, 7a, de, 28) - .long R(da, b7, 8e, 26), R(3f, ad, bf, a4) - .long R(2c, 3a, 9d, e4), R(50, 78, 92, 0d) - .long R(6a, 5f, cc, 9b), R(54, 7e, 46, 62) - .long R(f6, 8d, 13, c2), R(90, d8, b8, e8) - .long R(2e, 39, f7, 5e), R(82, c3, af, f5) - .long R(9f, 5d, 80, be), R(69, d0, 93, 7c) - .long R(6f, d5, 2d, a9), R(cf, 25, 12, b3) - .long R(c8, ac, 99, 3b), R(10, 18, 7d, a7) - .long R(e8, 9c, 63, 6e), R(db, 3b, bb, 7b) - .long R(cd, 26, 78, 09), R(6e, 59, 18, f4) - .long R(ec, 9a, b7, 01), R(83, 4f, 9a, a8) - .long R(e6, 95, 6e, 65), R(aa, ff, e6, 7e) - 
.long R(21, bc, cf, 08), R(ef, 15, e8, e6) - .long R(ba, e7, 9b, d9), R(4a, 6f, 36, ce) - .long R(ea, 9f, 09, d4), R(29, b0, 7c, d6) - .long R(31, a4, b2, af), R(2a, 3f, 23, 31) - .long R(c6, a5, 94, 30), R(35, a2, 66, c0) - .long R(74, 4e, bc, 37), R(fc, 82, ca, a6) - .long R(e0, 90, d0, b0), R(33, a7, d8, 15) - .long R(f1, 04, 98, 4a), R(41, ec, da, f7) - .long R(7f, cd, 50, 0e), R(17, 91, f6, 2f) - .long R(76, 4d, d6, 8d), R(43, ef, b0, 4d) - .long R(cc, aa, 4d, 54), R(e4, 96, 04, df) - .long R(9e, d1, b5, e3), R(4c, 6a, 88, 1b) - .long R(c1, 2c, 1f, b8), R(46, 65, 51, 7f) - .long R(9d, 5e, ea, 04), R(01, 8c, 35, 5d) - .long R(fa, 87, 74, 73), R(fb, 0b, 41, 2e) - .long R(b3, 67, 1d, 5a), R(92, db, d2, 52) - .long R(e9, 10, 56, 33), R(6d, d6, 47, 13) - .long R(9a, d7, 61, 8c), R(37, a1, 0c, 7a) - .long R(59, f8, 14, 8e), R(eb, 13, 3c, 89) - .long R(ce, a9, 27, ee), R(b7, 61, c9, 35) - .long R(e1, 1c, e5, ed), R(7a, 47, b1, 3c) - .long R(9c, d2, df, 59), R(55, f2, 73, 3f) - .long R(18, 14, ce, 79), R(73, c7, 37, bf) - .long R(53, f7, cd, ea), R(5f, fd, aa, 5b) - .long R(df, 3d, 6f, 14), R(78, 44, db, 86) - .long R(ca, af, f3, 81), R(b9, 68, c4, 3e) - .long R(38, 24, 34, 2c), R(c2, a3, 40, 5f) - .long R(16, 1d, c3, 72), R(bc, e2, 25, 0c) - .long R(28, 3c, 49, 8b), R(ff, 0d, 95, 41) - .long R(39, a8, 01, 71), R(08, 0c, b3, de) - .long R(d8, b4, e4, 9c), R(64, 56, c1, 90) - .long R(7b, cb, 84, 61), R(d5, 32, b6, 70) - .long R(48, 6c, 5c, 74), R(d0, b8, 57, 42) -.globl PPC_AES_4K_DECTAB2 -PPC_AES_4K_DECTAB2: -/* decryption table, same as crypto_il_tab in crypto/aes-generic.c */ - .byte 0x52, 0x09, 0x6a, 0xd5, 0x30, 0x36, 0xa5, 0x38 - .byte 0xbf, 0x40, 0xa3, 0x9e, 0x81, 0xf3, 0xd7, 0xfb - .byte 0x7c, 0xe3, 0x39, 0x82, 0x9b, 0x2f, 0xff, 0x87 - .byte 0x34, 0x8e, 0x43, 0x44, 0xc4, 0xde, 0xe9, 0xcb - .byte 0x54, 0x7b, 0x94, 0x32, 0xa6, 0xc2, 0x23, 0x3d - .byte 0xee, 0x4c, 0x95, 0x0b, 0x42, 0xfa, 0xc3, 0x4e - .byte 0x08, 0x2e, 0xa1, 0x66, 0x28, 0xd9, 0x24, 0xb2 - .byte 0x76, 0x5b, 0xa2, 0x49, 0x6d, 0x8b, 0xd1, 0x25 - .byte 0x72, 0xf8, 0xf6, 0x64, 0x86, 0x68, 0x98, 0x16 - .byte 0xd4, 0xa4, 0x5c, 0xcc, 0x5d, 0x65, 0xb6, 0x92 - .byte 0x6c, 0x70, 0x48, 0x50, 0xfd, 0xed, 0xb9, 0xda - .byte 0x5e, 0x15, 0x46, 0x57, 0xa7, 0x8d, 0x9d, 0x84 - .byte 0x90, 0xd8, 0xab, 0x00, 0x8c, 0xbc, 0xd3, 0x0a - .byte 0xf7, 0xe4, 0x58, 0x05, 0xb8, 0xb3, 0x45, 0x06 - .byte 0xd0, 0x2c, 0x1e, 0x8f, 0xca, 0x3f, 0x0f, 0x02 - .byte 0xc1, 0xaf, 0xbd, 0x03, 0x01, 0x13, 0x8a, 0x6b - .byte 0x3a, 0x91, 0x11, 0x41, 0x4f, 0x67, 0xdc, 0xea - .byte 0x97, 0xf2, 0xcf, 0xce, 0xf0, 0xb4, 0xe6, 0x73 - .byte 0x96, 0xac, 0x74, 0x22, 0xe7, 0xad, 0x35, 0x85 - .byte 0xe2, 0xf9, 0x37, 0xe8, 0x1c, 0x75, 0xdf, 0x6e - .byte 0x47, 0xf1, 0x1a, 0x71, 0x1d, 0x29, 0xc5, 0x89 - .byte 0x6f, 0xb7, 0x62, 0x0e, 0xaa, 0x18, 0xbe, 0x1b - .byte 0xfc, 0x56, 0x3e, 0x4b, 0xc6, 0xd2, 0x79, 0x20 - .byte 0x9a, 0xdb, 0xc0, 0xfe, 0x78, 0xcd, 0x5a, 0xf4 - .byte 0x1f, 0xdd, 0xa8, 0x33, 0x88, 0x07, 0xc7, 0x31 - .byte 0xb1, 0x12, 0x10, 0x59, 0x27, 0x80, 0xec, 0x5f - .byte 0x60, 0x51, 0x7f, 0xa9, 0x19, 0xb5, 0x4a, 0x0d - .byte 0x2d, 0xe5, 0x7a, 0x9f, 0x93, 0xc9, 0x9c, 0xef - .byte 0xa0, 0xe0, 0x3b, 0x4d, 0xae, 0x2a, 0xf5, 0xb0 - .byte 0xc8, 0xeb, 0xbb, 0x3c, 0x83, 0x53, 0x99, 0x61 - .byte 0x17, 0x2b, 0x04, 0x7e, 0xba, 0x77, 0xd6, 0x26 - .byte 0xe1, 0x69, 0x14, 0x63, 0x55, 0x21, 0x0c, 0x7d diff --git a/include/crypto/aes.h b/include/crypto/aes.h index 18af1acbde58..c893c9214cb7 100644 --- a/include/crypto/aes.h +++ b/include/crypto/aes.h @@ -20,10 +20,22 @@ union aes_enckey_arch 
{ u32 rndkeys[AES_MAX_KEYLENGTH_U32]; +#ifdef CONFIG_CRYPTO_LIB_AES_ARCH +#if defined(CONFIG_PPC) && defined(CONFIG_SPE) + /* Used unconditionally (when SPE AES code is enabled in kconfig) */ + u32 spe_enc_key[AES_MAX_KEYLENGTH_U32] __aligned(8); +#endif +#endif /* CONFIG_CRYPTO_LIB_AES_ARCH */ }; union aes_invkey_arch { u32 inv_rndkeys[AES_MAX_KEYLENGTH_U32]; +#ifdef CONFIG_CRYPTO_LIB_AES_ARCH +#if defined(CONFIG_PPC) && defined(CONFIG_SPE) + /* Used unconditionally (when SPE AES code is enabled in kconfig) */ + u32 spe_dec_key[AES_MAX_KEYLENGTH_U32] __aligned(8); +#endif +#endif /* CONFIG_CRYPTO_LIB_AES_ARCH */ }; /** @@ -124,6 +136,25 @@ int aes_expandkey(struct crypto_aes_ctx *ctx, const u8 *in_key, #ifdef CONFIG_ARM64 int ce_aes_expandkey(struct crypto_aes_ctx *ctx, const u8 *in_key, unsigned int key_len); +#elif defined(CONFIG_PPC) +void ppc_expand_key_128(u32 *key_enc, const u8 *key); +void ppc_expand_key_192(u32 *key_enc, const u8 *key); +void ppc_expand_key_256(u32 *key_enc, const u8 *key); +void ppc_generate_decrypt_key(u32 *key_dec, u32 *key_enc, unsigned int key_len); +void ppc_encrypt_ecb(u8 *out, const u8 *in, u32 *key_enc, u32 rounds, + u32 bytes); +void ppc_decrypt_ecb(u8 *out, const u8 *in, u32 *key_dec, u32 rounds, + u32 bytes); +void ppc_encrypt_cbc(u8 *out, const u8 *in, u32 *key_enc, u32 rounds, u32 bytes, + u8 *iv); +void ppc_decrypt_cbc(u8 *out, const u8 *in, u32 *key_dec, u32 rounds, u32 bytes, + u8 *iv); +void ppc_crypt_ctr(u8 *out, const u8 *in, u32 *key_enc, u32 rounds, u32 bytes, + u8 *iv); +void ppc_encrypt_xts(u8 *out, const u8 *in, u32 *key_enc, u32 rounds, u32 bytes, + u8 *iv, u32 *key_twk); +void ppc_decrypt_xts(u8 *out, const u8 *in, u32 *key_dec, u32 rounds, u32 bytes, + u8 *iv, u32 *key_twk); #endif /** diff --git a/lib/crypto/Kconfig b/lib/crypto/Kconfig index ead47b2a7db6..cfc6171203d0 100644 --- a/lib/crypto/Kconfig +++ b/lib/crypto/Kconfig @@ -16,6 +16,7 @@ config CRYPTO_LIB_AES_ARCH depends on CRYPTO_LIB_AES && !UML && !KMSAN default y if ARM default y if ARM64 + default y if PPC && SPE config CRYPTO_LIB_AESCFB tristate diff --git a/lib/crypto/Makefile b/lib/crypto/Makefile index 1b690c63fafb..d68fde004104 100644 --- a/lib/crypto/Makefile +++ b/lib/crypto/Makefile @@ -29,6 +29,15 @@ libaes-y += arm64/aes-cipher-core.o libaes-$(CONFIG_KERNEL_MODE_NEON) += arm64/aes-ce-core.o endif +ifeq ($(CONFIG_PPC),y) +ifeq ($(CONFIG_SPE),y) +libaes-y += powerpc/aes-spe-core.o \ + powerpc/aes-spe-keys.o \ + powerpc/aes-spe-modes.o \ + powerpc/aes-tab-4k.o +endif +endif # CONFIG_PPC + endif # CONFIG_CRYPTO_LIB_AES_ARCH ################################################################################ diff --git a/lib/crypto/powerpc/aes-spe-core.S b/lib/crypto/powerpc/aes-spe-core.S new file mode 100644 index 000000000000..8e00eccc352b --- /dev/null +++ b/lib/crypto/powerpc/aes-spe-core.S @@ -0,0 +1,346 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * Fast AES implementation for SPE instruction set (PPC) + * + * This code makes use of the SPE SIMD instruction set as defined in + * http://cache.freescale.com/files/32bit/doc/ref_manual/SPEPIM.pdf + * Implementation is based on optimization guide notes from + * http://cache.freescale.com/files/32bit/doc/app_note/AN2665.pdf + * + * Copyright (c) 2015 Markus Stockhausen + */ + +#include +#include "aes-spe-regs.h" + +#define EAD(in, bpos) \ + rlwimi rT0,in,28-((bpos+3)%4)*8,20,27; + +#define DAD(in, bpos) \ + rlwimi rT1,in,24-((bpos+3)%4)*8,24,31; + +#define LWH(out, off) \ + evlwwsplat out,off(rT0); /* load word 
high */ + +#define LWL(out, off) \ + lwz out,off(rT0); /* load word low */ + +#define LBZ(out, tab, off) \ + lbz out,off(tab); /* load byte */ + +#define LAH(out, in, bpos, off) \ + EAD(in, bpos) /* calc addr + load word high */ \ + LWH(out, off) + +#define LAL(out, in, bpos, off) \ + EAD(in, bpos) /* calc addr + load word low */ \ + LWL(out, off) + +#define LAE(out, in, bpos) \ + EAD(in, bpos) /* calc addr + load enc byte */ \ + LBZ(out, rT0, 8) + +#define LBE(out) \ + LBZ(out, rT0, 8) /* load enc byte */ + +#define LAD(out, in, bpos) \ + DAD(in, bpos) /* calc addr + load dec byte */ \ + LBZ(out, rT1, 0) + +#define LBD(out) \ + LBZ(out, rT1, 0) + +/* + * ppc_encrypt_block: The central encryption function for a single 16 bytes + * block. It does no stack handling or register saving to support fast calls + * via bl/blr. It expects that caller has pre-xored input data with first + * 4 words of encryption key into rD0-rD3. Pointer/counter registers must + * have also been set up before (rT0, rKP, CTR). Output is stored in rD0-rD3 + * and rW0-rW3 and caller must execute a final xor on the output registers. + * All working registers rD0-rD3 & rW0-rW7 are overwritten during processing. + * + */ +_GLOBAL(ppc_encrypt_block) + LAH(rW4, rD1, 2, 4) + LAH(rW6, rD0, 3, 0) + LAH(rW3, rD0, 1, 8) +ppc_encrypt_block_loop: + LAH(rW0, rD3, 0, 12) + LAL(rW0, rD0, 0, 12) + LAH(rW1, rD1, 0, 12) + LAH(rW2, rD2, 1, 8) + LAL(rW2, rD3, 1, 8) + LAL(rW3, rD1, 1, 8) + LAL(rW4, rD2, 2, 4) + LAL(rW6, rD1, 3, 0) + LAH(rW5, rD3, 2, 4) + LAL(rW5, rD0, 2, 4) + LAH(rW7, rD2, 3, 0) + evldw rD1,16(rKP) + EAD(rD3, 3) + evxor rW2,rW2,rW4 + LWL(rW7, 0) + evxor rW2,rW2,rW6 + EAD(rD2, 0) + evxor rD1,rD1,rW2 + LWL(rW1, 12) + evxor rD1,rD1,rW0 + evldw rD3,24(rKP) + evmergehi rD0,rD0,rD1 + EAD(rD1, 2) + evxor rW3,rW3,rW5 + LWH(rW4, 4) + evxor rW3,rW3,rW7 + EAD(rD0, 3) + evxor rD3,rD3,rW3 + LWH(rW6, 0) + evxor rD3,rD3,rW1 + EAD(rD0, 1) + evmergehi rD2,rD2,rD3 + LWH(rW3, 8) + LAH(rW0, rD3, 0, 12) + LAL(rW0, rD0, 0, 12) + LAH(rW1, rD1, 0, 12) + LAH(rW2, rD2, 1, 8) + LAL(rW2, rD3, 1, 8) + LAL(rW3, rD1, 1, 8) + LAL(rW4, rD2, 2, 4) + LAL(rW6, rD1, 3, 0) + LAH(rW5, rD3, 2, 4) + LAL(rW5, rD0, 2, 4) + LAH(rW7, rD2, 3, 0) + evldw rD1,32(rKP) + EAD(rD3, 3) + evxor rW2,rW2,rW4 + LWL(rW7, 0) + evxor rW2,rW2,rW6 + EAD(rD2, 0) + evxor rD1,rD1,rW2 + LWL(rW1, 12) + evxor rD1,rD1,rW0 + evldw rD3,40(rKP) + evmergehi rD0,rD0,rD1 + EAD(rD1, 2) + evxor rW3,rW3,rW5 + LWH(rW4, 4) + evxor rW3,rW3,rW7 + EAD(rD0, 3) + evxor rD3,rD3,rW3 + LWH(rW6, 0) + evxor rD3,rD3,rW1 + EAD(rD0, 1) + evmergehi rD2,rD2,rD3 + LWH(rW3, 8) + addi rKP,rKP,32 + bdnz ppc_encrypt_block_loop + LAH(rW0, rD3, 0, 12) + LAL(rW0, rD0, 0, 12) + LAH(rW1, rD1, 0, 12) + LAH(rW2, rD2, 1, 8) + LAL(rW2, rD3, 1, 8) + LAL(rW3, rD1, 1, 8) + LAL(rW4, rD2, 2, 4) + LAH(rW5, rD3, 2, 4) + LAL(rW6, rD1, 3, 0) + LAL(rW5, rD0, 2, 4) + LAH(rW7, rD2, 3, 0) + evldw rD1,16(rKP) + EAD(rD3, 3) + evxor rW2,rW2,rW4 + LWL(rW7, 0) + evxor rW2,rW2,rW6 + EAD(rD2, 0) + evxor rD1,rD1,rW2 + LWL(rW1, 12) + evxor rD1,rD1,rW0 + evldw rD3,24(rKP) + evmergehi rD0,rD0,rD1 + EAD(rD1, 0) + evxor rW3,rW3,rW5 + LBE(rW2) + evxor rW3,rW3,rW7 + EAD(rD0, 1) + evxor rD3,rD3,rW3 + LBE(rW6) + evxor rD3,rD3,rW1 + EAD(rD0, 0) + evmergehi rD2,rD2,rD3 + LBE(rW1) + LAE(rW0, rD3, 0) + LAE(rW1, rD0, 0) + LAE(rW4, rD2, 1) + LAE(rW5, rD3, 1) + LAE(rW3, rD2, 0) + LAE(rW7, rD1, 1) + rlwimi rW0,rW4,8,16,23 + rlwimi rW1,rW5,8,16,23 + LAE(rW4, rD1, 2) + LAE(rW5, rD2, 2) + rlwimi rW2,rW6,8,16,23 + rlwimi rW3,rW7,8,16,23 + LAE(rW6, rD3, 2) + LAE(rW7, rD0, 
2) + rlwimi rW0,rW4,16,8,15 + rlwimi rW1,rW5,16,8,15 + LAE(rW4, rD0, 3) + LAE(rW5, rD1, 3) + rlwimi rW2,rW6,16,8,15 + lwz rD0,32(rKP) + rlwimi rW3,rW7,16,8,15 + lwz rD1,36(rKP) + LAE(rW6, rD2, 3) + LAE(rW7, rD3, 3) + rlwimi rW0,rW4,24,0,7 + lwz rD2,40(rKP) + rlwimi rW1,rW5,24,0,7 + lwz rD3,44(rKP) + rlwimi rW2,rW6,24,0,7 + rlwimi rW3,rW7,24,0,7 + blr + +/* + * ppc_decrypt_block: The central decryption function for a single 16 bytes + * block. It does no stack handling or register saving to support fast calls + * via bl/blr. It expects that caller has pre-xored input data with first + * 4 words of encryption key into rD0-rD3. Pointer/counter registers must + * have also been set up before (rT0, rKP, CTR). Output is stored in rD0-rD3 + * and rW0-rW3 and caller must execute a final xor on the output registers. + * All working registers rD0-rD3 & rW0-rW7 are overwritten during processing. + * + */ +_GLOBAL(ppc_decrypt_block) + LAH(rW0, rD1, 0, 12) + LAH(rW6, rD0, 3, 0) + LAH(rW3, rD0, 1, 8) +ppc_decrypt_block_loop: + LAH(rW1, rD3, 0, 12) + LAL(rW0, rD2, 0, 12) + LAH(rW2, rD2, 1, 8) + LAL(rW2, rD3, 1, 8) + LAH(rW4, rD3, 2, 4) + LAL(rW4, rD0, 2, 4) + LAL(rW6, rD1, 3, 0) + LAH(rW5, rD1, 2, 4) + LAH(rW7, rD2, 3, 0) + LAL(rW7, rD3, 3, 0) + LAL(rW3, rD1, 1, 8) + evldw rD1,16(rKP) + EAD(rD0, 0) + evxor rW4,rW4,rW6 + LWL(rW1, 12) + evxor rW0,rW0,rW4 + EAD(rD2, 2) + evxor rW0,rW0,rW2 + LWL(rW5, 4) + evxor rD1,rD1,rW0 + evldw rD3,24(rKP) + evmergehi rD0,rD0,rD1 + EAD(rD1, 0) + evxor rW3,rW3,rW7 + LWH(rW0, 12) + evxor rW3,rW3,rW1 + EAD(rD0, 3) + evxor rD3,rD3,rW3 + LWH(rW6, 0) + evxor rD3,rD3,rW5 + EAD(rD0, 1) + evmergehi rD2,rD2,rD3 + LWH(rW3, 8) + LAH(rW1, rD3, 0, 12) + LAL(rW0, rD2, 0, 12) + LAH(rW2, rD2, 1, 8) + LAL(rW2, rD3, 1, 8) + LAH(rW4, rD3, 2, 4) + LAL(rW4, rD0, 2, 4) + LAL(rW6, rD1, 3, 0) + LAH(rW5, rD1, 2, 4) + LAH(rW7, rD2, 3, 0) + LAL(rW7, rD3, 3, 0) + LAL(rW3, rD1, 1, 8) + evldw rD1,32(rKP) + EAD(rD0, 0) + evxor rW4,rW4,rW6 + LWL(rW1, 12) + evxor rW0,rW0,rW4 + EAD(rD2, 2) + evxor rW0,rW0,rW2 + LWL(rW5, 4) + evxor rD1,rD1,rW0 + evldw rD3,40(rKP) + evmergehi rD0,rD0,rD1 + EAD(rD1, 0) + evxor rW3,rW3,rW7 + LWH(rW0, 12) + evxor rW3,rW3,rW1 + EAD(rD0, 3) + evxor rD3,rD3,rW3 + LWH(rW6, 0) + evxor rD3,rD3,rW5 + EAD(rD0, 1) + evmergehi rD2,rD2,rD3 + LWH(rW3, 8) + addi rKP,rKP,32 + bdnz ppc_decrypt_block_loop + LAH(rW1, rD3, 0, 12) + LAL(rW0, rD2, 0, 12) + LAH(rW2, rD2, 1, 8) + LAL(rW2, rD3, 1, 8) + LAH(rW4, rD3, 2, 4) + LAL(rW4, rD0, 2, 4) + LAL(rW6, rD1, 3, 0) + LAH(rW5, rD1, 2, 4) + LAH(rW7, rD2, 3, 0) + LAL(rW7, rD3, 3, 0) + LAL(rW3, rD1, 1, 8) + evldw rD1,16(rKP) + EAD(rD0, 0) + evxor rW4,rW4,rW6 + LWL(rW1, 12) + evxor rW0,rW0,rW4 + EAD(rD2, 2) + evxor rW0,rW0,rW2 + LWL(rW5, 4) + evxor rD1,rD1,rW0 + evldw rD3,24(rKP) + evmergehi rD0,rD0,rD1 + DAD(rD1, 0) + evxor rW3,rW3,rW7 + LBD(rW0) + evxor rW3,rW3,rW1 + DAD(rD0, 1) + evxor rD3,rD3,rW3 + LBD(rW6) + evxor rD3,rD3,rW5 + DAD(rD0, 0) + evmergehi rD2,rD2,rD3 + LBD(rW3) + LAD(rW2, rD3, 0) + LAD(rW1, rD2, 0) + LAD(rW4, rD2, 1) + LAD(rW5, rD3, 1) + LAD(rW7, rD1, 1) + rlwimi rW0,rW4,8,16,23 + rlwimi rW1,rW5,8,16,23 + LAD(rW4, rD3, 2) + LAD(rW5, rD0, 2) + rlwimi rW2,rW6,8,16,23 + rlwimi rW3,rW7,8,16,23 + LAD(rW6, rD1, 2) + LAD(rW7, rD2, 2) + rlwimi rW0,rW4,16,8,15 + rlwimi rW1,rW5,16,8,15 + LAD(rW4, rD0, 3) + LAD(rW5, rD1, 3) + rlwimi rW2,rW6,16,8,15 + lwz rD0,32(rKP) + rlwimi rW3,rW7,16,8,15 + lwz rD1,36(rKP) + LAD(rW6, rD2, 3) + LAD(rW7, rD3, 3) + rlwimi rW0,rW4,24,0,7 + lwz rD2,40(rKP) + rlwimi rW1,rW5,24,0,7 + lwz rD3,44(rKP) + rlwimi 
rW2,rW6,24,0,7 + rlwimi rW3,rW7,24,0,7 + blr diff --git a/lib/crypto/powerpc/aes-spe-keys.S b/lib/crypto/powerpc/aes-spe-keys.S new file mode 100644 index 000000000000..2e1bc0d099bf --- /dev/null +++ b/lib/crypto/powerpc/aes-spe-keys.S @@ -0,0 +1,278 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * Key handling functions for PPC AES implementation + * + * Copyright (c) 2015 Markus Stockhausen + */ + +#include + +#ifdef __BIG_ENDIAN__ +#define LOAD_KEY(d, s, off) \ + lwz d,off(s); +#else +#define LOAD_KEY(d, s, off) \ + li r0,off; \ + lwbrx d,s,r0; +#endif + +#define INITIALIZE_KEY \ + stwu r1,-32(r1); /* create stack frame */ \ + stw r14,8(r1); /* save registers */ \ + stw r15,12(r1); \ + stw r16,16(r1); + +#define FINALIZE_KEY \ + lwz r14,8(r1); /* restore registers */ \ + lwz r15,12(r1); \ + lwz r16,16(r1); \ + xor r5,r5,r5; /* clear sensitive data */ \ + xor r6,r6,r6; \ + xor r7,r7,r7; \ + xor r8,r8,r8; \ + xor r9,r9,r9; \ + xor r10,r10,r10; \ + xor r11,r11,r11; \ + xor r12,r12,r12; \ + addi r1,r1,32; /* cleanup stack */ + +#define LS_BOX(r, t1, t2) \ + lis t2,PPC_AES_4K_ENCTAB@h; \ + ori t2,t2,PPC_AES_4K_ENCTAB@l; \ + rlwimi t2,r,4,20,27; \ + lbz t1,8(t2); \ + rlwimi r,t1,0,24,31; \ + rlwimi t2,r,28,20,27; \ + lbz t1,8(t2); \ + rlwimi r,t1,8,16,23; \ + rlwimi t2,r,20,20,27; \ + lbz t1,8(t2); \ + rlwimi r,t1,16,8,15; \ + rlwimi t2,r,12,20,27; \ + lbz t1,8(t2); \ + rlwimi r,t1,24,0,7; + +#define GF8_MUL(out, in, t1, t2) \ + lis t1,0x8080; /* multiplication in GF8 */ \ + ori t1,t1,0x8080; \ + and t1,t1,in; \ + srwi t1,t1,7; \ + mulli t1,t1,0x1b; \ + lis t2,0x7f7f; \ + ori t2,t2,0x7f7f; \ + and t2,t2,in; \ + slwi t2,t2,1; \ + xor out,t1,t2; + +/* + * ppc_expand_key_128(u32 *key_enc, const u8 *key) + * + * Expand 128 bit key into 176 bytes encryption key. It consists of + * key itself plus 10 rounds with 16 bytes each + * + */ +_GLOBAL(ppc_expand_key_128) + INITIALIZE_KEY + LOAD_KEY(r5,r4,0) + LOAD_KEY(r6,r4,4) + LOAD_KEY(r7,r4,8) + LOAD_KEY(r8,r4,12) + stw r5,0(r3) /* key[0..3] = input data */ + stw r6,4(r3) + stw r7,8(r3) + stw r8,12(r3) + li r16,10 /* 10 expansion rounds */ + lis r0,0x0100 /* RCO(1) */ +ppc_expand_128_loop: + addi r3,r3,16 + mr r14,r8 /* apply LS_BOX to 4th temp */ + rotlwi r14,r14,8 + LS_BOX(r14, r15, r4) + xor r14,r14,r0 + xor r5,r5,r14 /* xor next 4 keys */ + xor r6,r6,r5 + xor r7,r7,r6 + xor r8,r8,r7 + stw r5,0(r3) /* store next 4 keys */ + stw r6,4(r3) + stw r7,8(r3) + stw r8,12(r3) + GF8_MUL(r0, r0, r4, r14) /* multiply RCO by 2 in GF */ + subi r16,r16,1 + cmpwi r16,0 + bt eq,ppc_expand_128_end + b ppc_expand_128_loop +ppc_expand_128_end: + FINALIZE_KEY + blr + +/* + * ppc_expand_key_192(u32 *key_enc, const u8 *key) + * + * Expand 192 bit key into 208 bytes encryption key. 
It consists of key + * itself plus 12 rounds with 16 bytes each + * + */ +_GLOBAL(ppc_expand_key_192) + INITIALIZE_KEY + LOAD_KEY(r5,r4,0) + LOAD_KEY(r6,r4,4) + LOAD_KEY(r7,r4,8) + LOAD_KEY(r8,r4,12) + LOAD_KEY(r9,r4,16) + LOAD_KEY(r10,r4,20) + stw r5,0(r3) + stw r6,4(r3) + stw r7,8(r3) + stw r8,12(r3) + stw r9,16(r3) + stw r10,20(r3) + li r16,8 /* 8 expansion rounds */ + lis r0,0x0100 /* RCO(1) */ +ppc_expand_192_loop: + addi r3,r3,24 + mr r14,r10 /* apply LS_BOX to 6th temp */ + rotlwi r14,r14,8 + LS_BOX(r14, r15, r4) + xor r14,r14,r0 + xor r5,r5,r14 /* xor next 6 keys */ + xor r6,r6,r5 + xor r7,r7,r6 + xor r8,r8,r7 + xor r9,r9,r8 + xor r10,r10,r9 + stw r5,0(r3) + stw r6,4(r3) + stw r7,8(r3) + stw r8,12(r3) + subi r16,r16,1 + cmpwi r16,0 /* last round early kick out */ + bt eq,ppc_expand_192_end + stw r9,16(r3) + stw r10,20(r3) + GF8_MUL(r0, r0, r4, r14) /* multiply RCO GF8 */ + b ppc_expand_192_loop +ppc_expand_192_end: + FINALIZE_KEY + blr + +/* + * ppc_expand_key_256(u32 *key_enc, const u8 *key) + * + * Expand 256 bit key into 240 bytes encryption key. It consists of key + * itself plus 14 rounds with 16 bytes each + * + */ +_GLOBAL(ppc_expand_key_256) + INITIALIZE_KEY + LOAD_KEY(r5,r4,0) + LOAD_KEY(r6,r4,4) + LOAD_KEY(r7,r4,8) + LOAD_KEY(r8,r4,12) + LOAD_KEY(r9,r4,16) + LOAD_KEY(r10,r4,20) + LOAD_KEY(r11,r4,24) + LOAD_KEY(r12,r4,28) + stw r5,0(r3) + stw r6,4(r3) + stw r7,8(r3) + stw r8,12(r3) + stw r9,16(r3) + stw r10,20(r3) + stw r11,24(r3) + stw r12,28(r3) + li r16,7 /* 7 expansion rounds */ + lis r0,0x0100 /* RCO(1) */ +ppc_expand_256_loop: + addi r3,r3,32 + mr r14,r12 /* apply LS_BOX to 8th temp */ + rotlwi r14,r14,8 + LS_BOX(r14, r15, r4) + xor r14,r14,r0 + xor r5,r5,r14 /* xor 4 keys */ + xor r6,r6,r5 + xor r7,r7,r6 + xor r8,r8,r7 + mr r14,r8 + LS_BOX(r14, r15, r4) /* apply LS_BOX to 4th temp */ + xor r9,r9,r14 /* xor 4 keys */ + xor r10,r10,r9 + xor r11,r11,r10 + xor r12,r12,r11 + stw r5,0(r3) + stw r6,4(r3) + stw r7,8(r3) + stw r8,12(r3) + subi r16,r16,1 + cmpwi r16,0 /* last round early kick out */ + bt eq,ppc_expand_256_end + stw r9,16(r3) + stw r10,20(r3) + stw r11,24(r3) + stw r12,28(r3) + GF8_MUL(r0, r0, r4, r14) + b ppc_expand_256_loop +ppc_expand_256_end: + FINALIZE_KEY + blr + +/* + * ppc_generate_decrypt_key: derive decryption key from encryption key + * number of bytes to handle are calculated from length of key (16/24/32) + * + */ +_GLOBAL(ppc_generate_decrypt_key) + addi r6,r5,24 + slwi r6,r6,2 + lwzx r7,r4,r6 /* first/last 4 words are same */ + stw r7,0(r3) + lwz r7,0(r4) + stwx r7,r3,r6 + addi r6,r6,4 + lwzx r7,r4,r6 + stw r7,4(r3) + lwz r7,4(r4) + stwx r7,r3,r6 + addi r6,r6,4 + lwzx r7,r4,r6 + stw r7,8(r3) + lwz r7,8(r4) + stwx r7,r3,r6 + addi r6,r6,4 + lwzx r7,r4,r6 + stw r7,12(r3) + lwz r7,12(r4) + stwx r7,r3,r6 + addi r3,r3,16 + add r4,r4,r6 + subi r4,r4,28 + addi r5,r5,20 + srwi r5,r5,2 +ppc_generate_decrypt_block: + li r6,4 + mtctr r6 +ppc_generate_decrypt_word: + lwz r6,0(r4) + GF8_MUL(r7, r6, r0, r7) + GF8_MUL(r8, r7, r0, r8) + GF8_MUL(r9, r8, r0, r9) + xor r10,r9,r6 + xor r11,r7,r8 + xor r11,r11,r9 + xor r12,r7,r10 + rotrwi r12,r12,24 + xor r11,r11,r12 + xor r12,r8,r10 + rotrwi r12,r12,16 + xor r11,r11,r12 + rotrwi r12,r10,8 + xor r11,r11,r12 + stw r11,0(r3) + addi r3,r3,4 + addi r4,r4,4 + bdnz ppc_generate_decrypt_word + subi r4,r4,32 + subi r5,r5,1 + cmpwi r5,0 + bt gt,ppc_generate_decrypt_block + blr diff --git a/lib/crypto/powerpc/aes-spe-modes.S b/lib/crypto/powerpc/aes-spe-modes.S new file mode 100644 index 000000000000..3f92a6a85785 --- /dev/null 
+++ b/lib/crypto/powerpc/aes-spe-modes.S @@ -0,0 +1,625 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * AES modes (ECB/CBC/CTR/XTS) for PPC AES implementation + * + * Copyright (c) 2015 Markus Stockhausen + */ + +#include +#include "aes-spe-regs.h" + +#ifdef __BIG_ENDIAN__ /* Macros for big endian builds */ + +#define LOAD_DATA(reg, off) \ + lwz reg,off(rSP); /* load with offset */ +#define SAVE_DATA(reg, off) \ + stw reg,off(rDP); /* save with offset */ +#define NEXT_BLOCK \ + addi rSP,rSP,16; /* increment pointers per bloc */ \ + addi rDP,rDP,16; +#define LOAD_IV(reg, off) \ + lwz reg,off(rIP); /* IV loading with offset */ +#define SAVE_IV(reg, off) \ + stw reg,off(rIP); /* IV saving with offset */ +#define START_IV /* nothing to reset */ +#define CBC_DEC 16 /* CBC decrement per block */ +#define CTR_DEC 1 /* CTR decrement one byte */ + +#else /* Macros for little endian */ + +#define LOAD_DATA(reg, off) \ + lwbrx reg,0,rSP; /* load reversed */ \ + addi rSP,rSP,4; /* and increment pointer */ +#define SAVE_DATA(reg, off) \ + stwbrx reg,0,rDP; /* save reversed */ \ + addi rDP,rDP,4; /* and increment pointer */ +#define NEXT_BLOCK /* nothing todo */ +#define LOAD_IV(reg, off) \ + lwbrx reg,0,rIP; /* load reversed */ \ + addi rIP,rIP,4; /* and increment pointer */ +#define SAVE_IV(reg, off) \ + stwbrx reg,0,rIP; /* load reversed */ \ + addi rIP,rIP,4; /* and increment pointer */ +#define START_IV \ + subi rIP,rIP,16; /* must reset pointer */ +#define CBC_DEC 32 /* 2 blocks because of incs */ +#define CTR_DEC 17 /* 1 block because of incs */ + +#endif + +#define SAVE_0_REGS +#define LOAD_0_REGS + +#define SAVE_4_REGS \ + stw rI0,96(r1); /* save 32 bit registers */ \ + stw rI1,100(r1); \ + stw rI2,104(r1); \ + stw rI3,108(r1); + +#define LOAD_4_REGS \ + lwz rI0,96(r1); /* restore 32 bit registers */ \ + lwz rI1,100(r1); \ + lwz rI2,104(r1); \ + lwz rI3,108(r1); + +#define SAVE_8_REGS \ + SAVE_4_REGS \ + stw rG0,112(r1); /* save 32 bit registers */ \ + stw rG1,116(r1); \ + stw rG2,120(r1); \ + stw rG3,124(r1); + +#define LOAD_8_REGS \ + LOAD_4_REGS \ + lwz rG0,112(r1); /* restore 32 bit registers */ \ + lwz rG1,116(r1); \ + lwz rG2,120(r1); \ + lwz rG3,124(r1); + +#define INITIALIZE_CRYPT(tab,nr32bitregs) \ + mflr r0; \ + stwu r1,-160(r1); /* create stack frame */ \ + lis rT0,tab@h; /* en-/decryption table pointer */ \ + stw r0,8(r1); /* save link register */ \ + ori rT0,rT0,tab@l; \ + evstdw r14,16(r1); \ + mr rKS,rKP; \ + evstdw r15,24(r1); /* We must save non volatile */ \ + evstdw r16,32(r1); /* registers. 
Take the chance */ \ + evstdw r17,40(r1); /* and save the SPE part too */ \ + evstdw r18,48(r1); \ + evstdw r19,56(r1); \ + evstdw r20,64(r1); \ + evstdw r21,72(r1); \ + evstdw r22,80(r1); \ + evstdw r23,88(r1); \ + SAVE_##nr32bitregs##_REGS + +#define FINALIZE_CRYPT(nr32bitregs) \ + lwz r0,8(r1); \ + evldw r14,16(r1); /* restore SPE registers */ \ + evldw r15,24(r1); \ + evldw r16,32(r1); \ + evldw r17,40(r1); \ + evldw r18,48(r1); \ + evldw r19,56(r1); \ + evldw r20,64(r1); \ + evldw r21,72(r1); \ + evldw r22,80(r1); \ + evldw r23,88(r1); \ + LOAD_##nr32bitregs##_REGS \ + mtlr r0; /* restore link register */ \ + xor r0,r0,r0; \ + stw r0,16(r1); /* delete sensitive data */ \ + stw r0,24(r1); /* that we might have pushed */ \ + stw r0,32(r1); /* from other context that runs */ \ + stw r0,40(r1); /* the same code */ \ + stw r0,48(r1); \ + stw r0,56(r1); \ + stw r0,64(r1); \ + stw r0,72(r1); \ + stw r0,80(r1); \ + stw r0,88(r1); \ + addi r1,r1,160; /* cleanup stack frame */ + +#define ENDIAN_SWAP(t0, t1, s0, s1) \ + rotrwi t0,s0,8; /* swap endianness for 2 GPRs */ \ + rotrwi t1,s1,8; \ + rlwimi t0,s0,8,8,15; \ + rlwimi t1,s1,8,8,15; \ + rlwimi t0,s0,8,24,31; \ + rlwimi t1,s1,8,24,31; + +#define GF128_MUL(d0, d1, d2, d3, t0) \ + li t0,0x87; /* multiplication in GF128 */ \ + cmpwi d3,-1; \ + iselgt t0,0,t0; \ + rlwimi d3,d2,0,0,0; /* propagate "carry" bits */ \ + rotlwi d3,d3,1; \ + rlwimi d2,d1,0,0,0; \ + rotlwi d2,d2,1; \ + rlwimi d1,d0,0,0,0; \ + slwi d0,d0,1; /* shift left 128 bit */ \ + rotlwi d1,d1,1; \ + xor d0,d0,t0; + +#define START_KEY(d0, d1, d2, d3) \ + lwz rW0,0(rKP); \ + mtctr rRR; \ + lwz rW1,4(rKP); \ + lwz rW2,8(rKP); \ + lwz rW3,12(rKP); \ + xor rD0,d0,rW0; \ + xor rD1,d1,rW1; \ + xor rD2,d2,rW2; \ + xor rD3,d3,rW3; + +/* + * ppc_encrypt_aes(u8 *out, const u8 *in, u32 *key_enc, + * u32 rounds) + * + * called from glue layer to encrypt a single 16 byte block + * round values are AES128 = 4, AES192 = 5, AES256 = 6 + * + */ +_GLOBAL(ppc_encrypt_aes) + INITIALIZE_CRYPT(PPC_AES_4K_ENCTAB, 0) + LOAD_DATA(rD0, 0) + LOAD_DATA(rD1, 4) + LOAD_DATA(rD2, 8) + LOAD_DATA(rD3, 12) + START_KEY(rD0, rD1, rD2, rD3) + bl ppc_encrypt_block + xor rD0,rD0,rW0 + SAVE_DATA(rD0, 0) + xor rD1,rD1,rW1 + SAVE_DATA(rD1, 4) + xor rD2,rD2,rW2 + SAVE_DATA(rD2, 8) + xor rD3,rD3,rW3 + SAVE_DATA(rD3, 12) + FINALIZE_CRYPT(0) + blr + +/* + * ppc_decrypt_aes(u8 *out, const u8 *in, u32 *key_dec, + * u32 rounds) + * + * called from glue layer to decrypt a single 16 byte block + * round values are AES128 = 4, AES192 = 5, AES256 = 6 + * + */ +_GLOBAL(ppc_decrypt_aes) + INITIALIZE_CRYPT(PPC_AES_4K_DECTAB,0) + LOAD_DATA(rD0, 0) + addi rT1,rT0,4096 + LOAD_DATA(rD1, 4) + LOAD_DATA(rD2, 8) + LOAD_DATA(rD3, 12) + START_KEY(rD0, rD1, rD2, rD3) + bl ppc_decrypt_block + xor rD0,rD0,rW0 + SAVE_DATA(rD0, 0) + xor rD1,rD1,rW1 + SAVE_DATA(rD1, 4) + xor rD2,rD2,rW2 + SAVE_DATA(rD2, 8) + xor rD3,rD3,rW3 + SAVE_DATA(rD3, 12) + FINALIZE_CRYPT(0) + blr + +/* + * ppc_encrypt_ecb(u8 *out, const u8 *in, u32 *key_enc, + * u32 rounds, u32 bytes); + * + * called from glue layer to encrypt multiple blocks via ECB + * Bytes must be larger or equal 16 and only whole blocks are + * processed. 
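As in the existing glue code, callers are expected to expand the key, pick the matching round value and bracket each call with spe_begin()/spe_end(). Roughly, as a simplified sketch of what aes-spe-glue.c does, with the scatterwalk and error handling elided:

	ppc_expand_key_128(ctx->key_enc, in_key);	/* AES_KEYSIZE_128 */
	ctx->rounds = 4;				/* 5 for 192, 6 for 256 */

	spe_begin();
	ppc_encrypt_ecb(dst, src, ctx->key_enc, ctx->rounds,
			round_down(nbytes, AES_BLOCK_SIZE));
	spe_end();
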
+
+#define START_KEY(d0, d1, d2, d3) \
+ lwz rW0,0(rKP); \
+ mtctr rRR; \
+ lwz rW1,4(rKP); \
+ lwz rW2,8(rKP); \
+ lwz rW3,12(rKP); \
+ xor rD0,d0,rW0; \
+ xor rD1,d1,rW1; \
+ xor rD2,d2,rW2; \
+ xor rD3,d3,rW3;
+
+/*
+ * ppc_encrypt_aes(u8 *out, const u8 *in, u32 *key_enc,
+ * u32 rounds)
+ *
+ * called from glue layer to encrypt a single 16 byte block
+ * round values are AES128 = 4, AES192 = 5, AES256 = 6
+ *
+ */
+_GLOBAL(ppc_encrypt_aes)
+ INITIALIZE_CRYPT(PPC_AES_4K_ENCTAB, 0)
+ LOAD_DATA(rD0, 0)
+ LOAD_DATA(rD1, 4)
+ LOAD_DATA(rD2, 8)
+ LOAD_DATA(rD3, 12)
+ START_KEY(rD0, rD1, rD2, rD3)
+ bl ppc_encrypt_block
+ xor rD0,rD0,rW0
+ SAVE_DATA(rD0, 0)
+ xor rD1,rD1,rW1
+ SAVE_DATA(rD1, 4)
+ xor rD2,rD2,rW2
+ SAVE_DATA(rD2, 8)
+ xor rD3,rD3,rW3
+ SAVE_DATA(rD3, 12)
+ FINALIZE_CRYPT(0)
+ blr
+
+/*
+ * ppc_decrypt_aes(u8 *out, const u8 *in, u32 *key_dec,
+ * u32 rounds)
+ *
+ * called from glue layer to decrypt a single 16 byte block
+ * round values are AES128 = 4, AES192 = 5, AES256 = 6
+ *
+ */
+_GLOBAL(ppc_decrypt_aes)
+ INITIALIZE_CRYPT(PPC_AES_4K_DECTAB,0)
+ LOAD_DATA(rD0, 0)
+ addi rT1,rT0,4096
+ LOAD_DATA(rD1, 4)
+ LOAD_DATA(rD2, 8)
+ LOAD_DATA(rD3, 12)
+ START_KEY(rD0, rD1, rD2, rD3)
+ bl ppc_decrypt_block
+ xor rD0,rD0,rW0
+ SAVE_DATA(rD0, 0)
+ xor rD1,rD1,rW1
+ SAVE_DATA(rD1, 4)
+ xor rD2,rD2,rW2
+ SAVE_DATA(rD2, 8)
+ xor rD3,rD3,rW3
+ SAVE_DATA(rD3, 12)
+ FINALIZE_CRYPT(0)
+ blr
+
+/*
+ * ppc_encrypt_ecb(u8 *out, const u8 *in, u32 *key_enc,
+ * u32 rounds, u32 bytes);
+ *
+ * called from glue layer to encrypt multiple blocks via ECB
+ * Bytes must be greater than or equal to 16 and only whole
+ * blocks are processed. round values are AES128 = 4,
+ * AES192 = 5 and AES256 = 6
+ *
+ */
+_GLOBAL(ppc_encrypt_ecb)
+ INITIALIZE_CRYPT(PPC_AES_4K_ENCTAB, 0)
+ppc_encrypt_ecb_loop:
+ LOAD_DATA(rD0, 0)
+ mr rKP,rKS
+ LOAD_DATA(rD1, 4)
+ subi rLN,rLN,16
+ LOAD_DATA(rD2, 8)
+ cmpwi rLN,15
+ LOAD_DATA(rD3, 12)
+ START_KEY(rD0, rD1, rD2, rD3)
+ bl ppc_encrypt_block
+ xor rD0,rD0,rW0
+ SAVE_DATA(rD0, 0)
+ xor rD1,rD1,rW1
+ SAVE_DATA(rD1, 4)
+ xor rD2,rD2,rW2
+ SAVE_DATA(rD2, 8)
+ xor rD3,rD3,rW3
+ SAVE_DATA(rD3, 12)
+ NEXT_BLOCK
+ bt gt,ppc_encrypt_ecb_loop
+ FINALIZE_CRYPT(0)
+ blr
+
+/*
+ * ppc_decrypt_ecb(u8 *out, const u8 *in, u32 *key_dec,
+ * u32 rounds, u32 bytes);
+ *
+ * called from glue layer to decrypt multiple blocks via ECB
+ * Bytes must be greater than or equal to 16 and only whole
+ * blocks are processed. round values are AES128 = 4,
+ * AES192 = 5 and AES256 = 6
+ *
+ */
+_GLOBAL(ppc_decrypt_ecb)
+ INITIALIZE_CRYPT(PPC_AES_4K_DECTAB, 0)
+ addi rT1,rT0,4096
+ppc_decrypt_ecb_loop:
+ LOAD_DATA(rD0, 0)
+ mr rKP,rKS
+ LOAD_DATA(rD1, 4)
+ subi rLN,rLN,16
+ LOAD_DATA(rD2, 8)
+ cmpwi rLN,15
+ LOAD_DATA(rD3, 12)
+ START_KEY(rD0, rD1, rD2, rD3)
+ bl ppc_decrypt_block
+ xor rD0,rD0,rW0
+ SAVE_DATA(rD0, 0)
+ xor rD1,rD1,rW1
+ SAVE_DATA(rD1, 4)
+ xor rD2,rD2,rW2
+ SAVE_DATA(rD2, 8)
+ xor rD3,rD3,rW3
+ SAVE_DATA(rD3, 12)
+ NEXT_BLOCK
+ bt gt,ppc_decrypt_ecb_loop
+ FINALIZE_CRYPT(0)
+ blr
+
+/*
+ * ppc_encrypt_cbc(u8 *out, const u8 *in, u32 *key_enc,
+ * u32 rounds, u32 bytes, u8 *iv);
+ *
+ * called from glue layer to encrypt multiple blocks via CBC
+ * Bytes must be greater than or equal to 16 and only whole
+ * blocks are processed. round values are AES128 = 4,
+ * AES192 = 5 and AES256 = 6
+ *
+ */
+_GLOBAL(ppc_encrypt_cbc)
+ INITIALIZE_CRYPT(PPC_AES_4K_ENCTAB, 4)
+ LOAD_IV(rI0, 0)
+ LOAD_IV(rI1, 4)
+ LOAD_IV(rI2, 8)
+ LOAD_IV(rI3, 12)
+ppc_encrypt_cbc_loop:
+ LOAD_DATA(rD0, 0)
+ mr rKP,rKS
+ LOAD_DATA(rD1, 4)
+ subi rLN,rLN,16
+ LOAD_DATA(rD2, 8)
+ cmpwi rLN,15
+ LOAD_DATA(rD3, 12)
+ xor rD0,rD0,rI0
+ xor rD1,rD1,rI1
+ xor rD2,rD2,rI2
+ xor rD3,rD3,rI3
+ START_KEY(rD0, rD1, rD2, rD3)
+ bl ppc_encrypt_block
+ xor rI0,rD0,rW0
+ SAVE_DATA(rI0, 0)
+ xor rI1,rD1,rW1
+ SAVE_DATA(rI1, 4)
+ xor rI2,rD2,rW2
+ SAVE_DATA(rI2, 8)
+ xor rI3,rD3,rW3
+ SAVE_DATA(rI3, 12)
+ NEXT_BLOCK
+ bt gt,ppc_encrypt_cbc_loop
+ START_IV
+ SAVE_IV(rI0, 0)
+ SAVE_IV(rI1, 4)
+ SAVE_IV(rI2, 8)
+ SAVE_IV(rI3, 12)
+ FINALIZE_CRYPT(4)
+ blr
+
+/*
+ * ppc_decrypt_cbc(u8 *out, const u8 *in, u32 *key_dec,
+ * u32 rounds, u32 bytes, u8 *iv);
+ *
+ * called from glue layer to decrypt multiple blocks via CBC
+ * round values are AES128 = 4, AES192 = 5, AES256 = 6
+ *
+ */
+_GLOBAL(ppc_decrypt_cbc)
+ INITIALIZE_CRYPT(PPC_AES_4K_DECTAB, 4)
+ li rT1,15
+ LOAD_IV(rI0, 0)
+ andc rLN,rLN,rT1
+ LOAD_IV(rI1, 4)
+ subi rLN,rLN,16
+ LOAD_IV(rI2, 8)
+ add rSP,rSP,rLN /* reverse processing */
+ LOAD_IV(rI3, 12)
+ add rDP,rDP,rLN
+ LOAD_DATA(rD0, 0)
+ addi rT1,rT0,4096
+ LOAD_DATA(rD1, 4)
+ LOAD_DATA(rD2, 8)
+ LOAD_DATA(rD3, 12)
+ START_IV
+ SAVE_IV(rD0, 0)
+ SAVE_IV(rD1, 4)
+ SAVE_IV(rD2, 8)
+ cmpwi rLN,16
+ SAVE_IV(rD3, 12)
+ bt lt,ppc_decrypt_cbc_end
+ppc_decrypt_cbc_loop:
+ mr rKP,rKS
+ START_KEY(rD0, rD1, rD2, rD3)
+ bl ppc_decrypt_block
+ subi rLN,rLN,16
+ subi rSP,rSP,CBC_DEC
+ xor rW0,rD0,rW0
+ LOAD_DATA(rD0, 0)
+ xor rW1,rD1,rW1
+ LOAD_DATA(rD1, 4)
+ xor rW2,rD2,rW2
+ LOAD_DATA(rD2, 8)
+ xor rW3,rD3,rW3
+ LOAD_DATA(rD3, 12)
+ xor rW0,rW0,rD0
+ SAVE_DATA(rW0, 0)
+ xor rW1,rW1,rD1
+ SAVE_DATA(rW1, 4)
+ xor rW2,rW2,rD2
+ SAVE_DATA(rW2, 8)
+ xor rW3,rW3,rD3
+ SAVE_DATA(rW3, 12)
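+ /*
+  * Decrypting the buffer back to front lets the code save the last
+  * ciphertext block as the next IV up front, before in-place
+  * decryption can overwrite it; inside the loop rSP/rDP step back
+  * one block per iteration by CBC_DEC.
+  */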
+ cmpwi rLN,15 + subi rDP,rDP,CBC_DEC + bt gt,ppc_decrypt_cbc_loop +ppc_decrypt_cbc_end: + mr rKP,rKS + START_KEY(rD0, rD1, rD2, rD3) + bl ppc_decrypt_block + xor rW0,rW0,rD0 + xor rW1,rW1,rD1 + xor rW2,rW2,rD2 + xor rW3,rW3,rD3 + xor rW0,rW0,rI0 /* decrypt with initial IV */ + SAVE_DATA(rW0, 0) + xor rW1,rW1,rI1 + SAVE_DATA(rW1, 4) + xor rW2,rW2,rI2 + SAVE_DATA(rW2, 8) + xor rW3,rW3,rI3 + SAVE_DATA(rW3, 12) + FINALIZE_CRYPT(4) + blr + +/* + * ppc_crypt_ctr(u8 *out, const u8 *in, u32 *key_enc, + * u32 rounds, u32 bytes, u8 *iv); + * + * called from glue layer to encrypt/decrypt multiple blocks + * via CTR. Number of bytes does not need to be a multiple of + * 16. Round values are AES128 = 4, AES192 = 5, AES256 = 6 + * + */ +_GLOBAL(ppc_crypt_ctr) + INITIALIZE_CRYPT(PPC_AES_4K_ENCTAB, 4) + LOAD_IV(rI0, 0) + LOAD_IV(rI1, 4) + LOAD_IV(rI2, 8) + cmpwi rLN,16 + LOAD_IV(rI3, 12) + START_IV + bt lt,ppc_crypt_ctr_partial +ppc_crypt_ctr_loop: + mr rKP,rKS + START_KEY(rI0, rI1, rI2, rI3) + bl ppc_encrypt_block + xor rW0,rD0,rW0 + xor rW1,rD1,rW1 + xor rW2,rD2,rW2 + xor rW3,rD3,rW3 + LOAD_DATA(rD0, 0) + subi rLN,rLN,16 + LOAD_DATA(rD1, 4) + LOAD_DATA(rD2, 8) + LOAD_DATA(rD3, 12) + xor rD0,rD0,rW0 + SAVE_DATA(rD0, 0) + xor rD1,rD1,rW1 + SAVE_DATA(rD1, 4) + xor rD2,rD2,rW2 + SAVE_DATA(rD2, 8) + xor rD3,rD3,rW3 + SAVE_DATA(rD3, 12) + addic rI3,rI3,1 /* increase counter */ + addze rI2,rI2 + addze rI1,rI1 + addze rI0,rI0 + NEXT_BLOCK + cmpwi rLN,15 + bt gt,ppc_crypt_ctr_loop +ppc_crypt_ctr_partial: + cmpwi rLN,0 + bt eq,ppc_crypt_ctr_end + mr rKP,rKS + START_KEY(rI0, rI1, rI2, rI3) + bl ppc_encrypt_block + xor rW0,rD0,rW0 + SAVE_IV(rW0, 0) + xor rW1,rD1,rW1 + SAVE_IV(rW1, 4) + xor rW2,rD2,rW2 + SAVE_IV(rW2, 8) + xor rW3,rD3,rW3 + SAVE_IV(rW3, 12) + mtctr rLN + subi rIP,rIP,CTR_DEC + subi rSP,rSP,1 + subi rDP,rDP,1 +ppc_crypt_ctr_xorbyte: + lbzu rW4,1(rIP) /* bytewise xor for partial block */ + lbzu rW5,1(rSP) + xor rW4,rW4,rW5 + stbu rW4,1(rDP) + bdnz ppc_crypt_ctr_xorbyte + subf rIP,rLN,rIP + addi rIP,rIP,1 + addic rI3,rI3,1 + addze rI2,rI2 + addze rI1,rI1 + addze rI0,rI0 +ppc_crypt_ctr_end: + SAVE_IV(rI0, 0) + SAVE_IV(rI1, 4) + SAVE_IV(rI2, 8) + SAVE_IV(rI3, 12) + FINALIZE_CRYPT(4) + blr + +/* + * ppc_encrypt_xts(u8 *out, const u8 *in, u32 *key_enc, + * u32 rounds, u32 bytes, u8 *iv, u32 *key_twk); + * + * called from glue layer to encrypt multiple blocks via XTS + * If key_twk is given, the initial IV encryption will be + * processed too. 
Round values are AES128 = 4, AES192 = 5,
+ * AES256 = 6
+ *
+ */
+_GLOBAL(ppc_encrypt_xts)
+ INITIALIZE_CRYPT(PPC_AES_4K_ENCTAB, 8)
+ LOAD_IV(rI0, 0)
+ LOAD_IV(rI1, 4)
+ LOAD_IV(rI2, 8)
+ cmpwi rKT,0
+ LOAD_IV(rI3, 12)
+ bt eq,ppc_encrypt_xts_notweak
+ mr rKP,rKT
+ START_KEY(rI0, rI1, rI2, rI3)
+ bl ppc_encrypt_block
+ xor rI0,rD0,rW0
+ xor rI1,rD1,rW1
+ xor rI2,rD2,rW2
+ xor rI3,rD3,rW3
+ppc_encrypt_xts_notweak:
+ ENDIAN_SWAP(rG0, rG1, rI0, rI1)
+ ENDIAN_SWAP(rG2, rG3, rI2, rI3)
+ppc_encrypt_xts_loop:
+ LOAD_DATA(rD0, 0)
+ mr rKP,rKS
+ LOAD_DATA(rD1, 4)
+ subi rLN,rLN,16
+ LOAD_DATA(rD2, 8)
+ LOAD_DATA(rD3, 12)
+ xor rD0,rD0,rI0
+ xor rD1,rD1,rI1
+ xor rD2,rD2,rI2
+ xor rD3,rD3,rI3
+ START_KEY(rD0, rD1, rD2, rD3)
+ bl ppc_encrypt_block
+ xor rD0,rD0,rW0
+ xor rD1,rD1,rW1
+ xor rD2,rD2,rW2
+ xor rD3,rD3,rW3
+ xor rD0,rD0,rI0
+ SAVE_DATA(rD0, 0)
+ xor rD1,rD1,rI1
+ SAVE_DATA(rD1, 4)
+ xor rD2,rD2,rI2
+ SAVE_DATA(rD2, 8)
+ xor rD3,rD3,rI3
+ SAVE_DATA(rD3, 12)
+ GF128_MUL(rG0, rG1, rG2, rG3, rW0)
+ ENDIAN_SWAP(rI0, rI1, rG0, rG1)
+ ENDIAN_SWAP(rI2, rI3, rG2, rG3)
+ cmpwi rLN,0
+ NEXT_BLOCK
+ bt gt,ppc_encrypt_xts_loop
+ START_IV
+ SAVE_IV(rI0, 0)
+ SAVE_IV(rI1, 4)
+ SAVE_IV(rI2, 8)
+ SAVE_IV(rI3, 12)
+ FINALIZE_CRYPT(8)
+ blr
+
+/*
+ * ppc_decrypt_xts(u8 *out, const u8 *in, u32 *key_dec,
+ * u32 rounds, u32 bytes, u8 *iv, u32 *key_twk);
+ *
+ * called from glue layer to decrypt multiple blocks via XTS
+ * If key_twk is given, the initial IV encryption will be
+ * processed too. Round values are AES128 = 4, AES192 = 5,
+ * AES256 = 6
+ *
+ */
+_GLOBAL(ppc_decrypt_xts)
+ INITIALIZE_CRYPT(PPC_AES_4K_DECTAB, 8)
+ LOAD_IV(rI0, 0)
+ addi rT1,rT0,4096
+ LOAD_IV(rI1, 4)
+ LOAD_IV(rI2, 8)
+ cmpwi rKT,0
+ LOAD_IV(rI3, 12)
+ bt eq,ppc_decrypt_xts_notweak
+ subi rT0,rT0,4096
+ mr rKP,rKT
+ START_KEY(rI0, rI1, rI2, rI3)
+ bl ppc_encrypt_block
+ xor rI0,rD0,rW0
+ xor rI1,rD1,rW1
+ xor rI2,rD2,rW2
+ xor rI3,rD3,rW3
+ addi rT0,rT0,4096
+ppc_decrypt_xts_notweak:
+ ENDIAN_SWAP(rG0, rG1, rI0, rI1)
+ ENDIAN_SWAP(rG2, rG3, rI2, rI3)
+ppc_decrypt_xts_loop:
+ LOAD_DATA(rD0, 0)
+ mr rKP,rKS
+ LOAD_DATA(rD1, 4)
+ subi rLN,rLN,16
+ LOAD_DATA(rD2, 8)
+ LOAD_DATA(rD3, 12)
+ xor rD0,rD0,rI0
+ xor rD1,rD1,rI1
+ xor rD2,rD2,rI2
+ xor rD3,rD3,rI3
+ START_KEY(rD0, rD1, rD2, rD3)
+ bl ppc_decrypt_block
+ xor rD0,rD0,rW0
+ xor rD1,rD1,rW1
+ xor rD2,rD2,rW2
+ xor rD3,rD3,rW3
+ xor rD0,rD0,rI0
+ SAVE_DATA(rD0, 0)
+ xor rD1,rD1,rI1
+ SAVE_DATA(rD1, 4)
+ xor rD2,rD2,rI2
+ SAVE_DATA(rD2, 8)
+ xor rD3,rD3,rI3
+ SAVE_DATA(rD3, 12)
+ GF128_MUL(rG0, rG1, rG2, rG3, rW0)
+ ENDIAN_SWAP(rI0, rI1, rG0, rG1)
+ ENDIAN_SWAP(rI2, rI3, rG2, rG3)
+ cmpwi rLN,0
+ NEXT_BLOCK
+ bt gt,ppc_decrypt_xts_loop
+ START_IV
+ SAVE_IV(rI0, 0)
+ SAVE_IV(rI1, 4)
+ SAVE_IV(rI2, 8)
+ SAVE_IV(rI3, 12)
+ FINALIZE_CRYPT(8)
+ blr
diff --git a/lib/crypto/powerpc/aes-spe-regs.h b/lib/crypto/powerpc/aes-spe-regs.h
new file mode 100644
index 000000000000..2eb4c9b94152
--- /dev/null
+++ b/lib/crypto/powerpc/aes-spe-regs.h
@@ -0,0 +1,37 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Common registers for PPC AES implementation
+ *
+ * Copyright (c) 2015 Markus Stockhausen
+ */
+
+#define rKS r0 /* copy of en-/decryption key pointer */
+#define rDP r3 /* destination pointer */
+#define rSP r4 /* source pointer */
+#define rKP r5 /* pointer to en-/decryption key pointer */
+#define rRR r6 /* en-/decryption rounds */
+#define rLN r7 /* length of data to be processed */
+#define rIP r8 /* pointer to IV (CBC/CTR/XTS modes) */
+#define rKT r9 /* pointer to tweak key
(XTS mode) */ +#define rT0 r11 /* pointers to en-/decryption tables */ +#define rT1 r10 +#define rD0 r9 /* data */ +#define rD1 r14 +#define rD2 r12 +#define rD3 r15 +#define rW0 r16 /* working registers */ +#define rW1 r17 +#define rW2 r18 +#define rW3 r19 +#define rW4 r20 +#define rW5 r21 +#define rW6 r22 +#define rW7 r23 +#define rI0 r24 /* IV */ +#define rI1 r25 +#define rI2 r26 +#define rI3 r27 +#define rG0 r28 /* endian reversed tweak (XTS mode) */ +#define rG1 r29 +#define rG2 r30 +#define rG3 r31 diff --git a/lib/crypto/powerpc/aes-tab-4k.S b/lib/crypto/powerpc/aes-tab-4k.S new file mode 100644 index 000000000000..ceb604bc6f72 --- /dev/null +++ b/lib/crypto/powerpc/aes-tab-4k.S @@ -0,0 +1,326 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * 4K AES tables for PPC AES implementation + * + * Copyright (c) 2015 Markus Stockhausen + */ + +/* + * These big endian AES encryption/decryption tables have been taken from + * crypto/aes_generic.c and are designed to be simply accessed by a combination + * of rlwimi/lwz instructions with a minimum of table registers (usually only + * one required). Thus they are aligned to 4K. The locality of rotated values + * is derived from the reduced offsets that are available in the SPE load + * instructions. E.g. evldw, evlwwsplat, ... + * + * For the safety-conscious it has to be noted that they might be vulnerable + * to cache timing attacks because of their size. Nevertheless in contrast to + * the generic tables they have been reduced from 16KB to 8KB + 256 bytes. + * This is a quite good tradeoff for low power devices (e.g. routers) without + * dedicated encryption hardware where we usually have no multiuser + * environment. + * + */ + +#define R(a, b, c, d) \ + 0x##a##b##c##d, 0x##d##a##b##c, 0x##c##d##a##b, 0x##b##c##d##a + +.data +.align 12 +.globl PPC_AES_4K_ENCTAB +PPC_AES_4K_ENCTAB: +/* encryption table, same as crypto_ft_tab in crypto/aes-generic.c */ + .long R(c6, 63, 63, a5), R(f8, 7c, 7c, 84) + .long R(ee, 77, 77, 99), R(f6, 7b, 7b, 8d) + .long R(ff, f2, f2, 0d), R(d6, 6b, 6b, bd) + .long R(de, 6f, 6f, b1), R(91, c5, c5, 54) + .long R(60, 30, 30, 50), R(02, 01, 01, 03) + .long R(ce, 67, 67, a9), R(56, 2b, 2b, 7d) + .long R(e7, fe, fe, 19), R(b5, d7, d7, 62) + .long R(4d, ab, ab, e6), R(ec, 76, 76, 9a) + .long R(8f, ca, ca, 45), R(1f, 82, 82, 9d) + .long R(89, c9, c9, 40), R(fa, 7d, 7d, 87) + .long R(ef, fa, fa, 15), R(b2, 59, 59, eb) + .long R(8e, 47, 47, c9), R(fb, f0, f0, 0b) + .long R(41, ad, ad, ec), R(b3, d4, d4, 67) + .long R(5f, a2, a2, fd), R(45, af, af, ea) + .long R(23, 9c, 9c, bf), R(53, a4, a4, f7) + .long R(e4, 72, 72, 96), R(9b, c0, c0, 5b) + .long R(75, b7, b7, c2), R(e1, fd, fd, 1c) + .long R(3d, 93, 93, ae), R(4c, 26, 26, 6a) + .long R(6c, 36, 36, 5a), R(7e, 3f, 3f, 41) + .long R(f5, f7, f7, 02), R(83, cc, cc, 4f) + .long R(68, 34, 34, 5c), R(51, a5, a5, f4) + .long R(d1, e5, e5, 34), R(f9, f1, f1, 08) + .long R(e2, 71, 71, 93), R(ab, d8, d8, 73) + .long R(62, 31, 31, 53), R(2a, 15, 15, 3f) + .long R(08, 04, 04, 0c), R(95, c7, c7, 52) + .long R(46, 23, 23, 65), R(9d, c3, c3, 5e) + .long R(30, 18, 18, 28), R(37, 96, 96, a1) + .long R(0a, 05, 05, 0f), R(2f, 9a, 9a, b5) + .long R(0e, 07, 07, 09), R(24, 12, 12, 36) + .long R(1b, 80, 80, 9b), R(df, e2, e2, 3d) + .long R(cd, eb, eb, 26), R(4e, 27, 27, 69) + .long R(7f, b2, b2, cd), R(ea, 75, 75, 9f) + .long R(12, 09, 09, 1b), R(1d, 83, 83, 9e) + .long R(58, 2c, 2c, 74), R(34, 1a, 1a, 2e) + .long R(36, 1b, 1b, 2d), R(dc, 6e, 6e, b2) + .long R(b4, 5a, 5a, ee), R(5b, a0, 
a0, fb) + .long R(a4, 52, 52, f6), R(76, 3b, 3b, 4d) + .long R(b7, d6, d6, 61), R(7d, b3, b3, ce) + .long R(52, 29, 29, 7b), R(dd, e3, e3, 3e) + .long R(5e, 2f, 2f, 71), R(13, 84, 84, 97) + .long R(a6, 53, 53, f5), R(b9, d1, d1, 68) + .long R(00, 00, 00, 00), R(c1, ed, ed, 2c) + .long R(40, 20, 20, 60), R(e3, fc, fc, 1f) + .long R(79, b1, b1, c8), R(b6, 5b, 5b, ed) + .long R(d4, 6a, 6a, be), R(8d, cb, cb, 46) + .long R(67, be, be, d9), R(72, 39, 39, 4b) + .long R(94, 4a, 4a, de), R(98, 4c, 4c, d4) + .long R(b0, 58, 58, e8), R(85, cf, cf, 4a) + .long R(bb, d0, d0, 6b), R(c5, ef, ef, 2a) + .long R(4f, aa, aa, e5), R(ed, fb, fb, 16) + .long R(86, 43, 43, c5), R(9a, 4d, 4d, d7) + .long R(66, 33, 33, 55), R(11, 85, 85, 94) + .long R(8a, 45, 45, cf), R(e9, f9, f9, 10) + .long R(04, 02, 02, 06), R(fe, 7f, 7f, 81) + .long R(a0, 50, 50, f0), R(78, 3c, 3c, 44) + .long R(25, 9f, 9f, ba), R(4b, a8, a8, e3) + .long R(a2, 51, 51, f3), R(5d, a3, a3, fe) + .long R(80, 40, 40, c0), R(05, 8f, 8f, 8a) + .long R(3f, 92, 92, ad), R(21, 9d, 9d, bc) + .long R(70, 38, 38, 48), R(f1, f5, f5, 04) + .long R(63, bc, bc, df), R(77, b6, b6, c1) + .long R(af, da, da, 75), R(42, 21, 21, 63) + .long R(20, 10, 10, 30), R(e5, ff, ff, 1a) + .long R(fd, f3, f3, 0e), R(bf, d2, d2, 6d) + .long R(81, cd, cd, 4c), R(18, 0c, 0c, 14) + .long R(26, 13, 13, 35), R(c3, ec, ec, 2f) + .long R(be, 5f, 5f, e1), R(35, 97, 97, a2) + .long R(88, 44, 44, cc), R(2e, 17, 17, 39) + .long R(93, c4, c4, 57), R(55, a7, a7, f2) + .long R(fc, 7e, 7e, 82), R(7a, 3d, 3d, 47) + .long R(c8, 64, 64, ac), R(ba, 5d, 5d, e7) + .long R(32, 19, 19, 2b), R(e6, 73, 73, 95) + .long R(c0, 60, 60, a0), R(19, 81, 81, 98) + .long R(9e, 4f, 4f, d1), R(a3, dc, dc, 7f) + .long R(44, 22, 22, 66), R(54, 2a, 2a, 7e) + .long R(3b, 90, 90, ab), R(0b, 88, 88, 83) + .long R(8c, 46, 46, ca), R(c7, ee, ee, 29) + .long R(6b, b8, b8, d3), R(28, 14, 14, 3c) + .long R(a7, de, de, 79), R(bc, 5e, 5e, e2) + .long R(16, 0b, 0b, 1d), R(ad, db, db, 76) + .long R(db, e0, e0, 3b), R(64, 32, 32, 56) + .long R(74, 3a, 3a, 4e), R(14, 0a, 0a, 1e) + .long R(92, 49, 49, db), R(0c, 06, 06, 0a) + .long R(48, 24, 24, 6c), R(b8, 5c, 5c, e4) + .long R(9f, c2, c2, 5d), R(bd, d3, d3, 6e) + .long R(43, ac, ac, ef), R(c4, 62, 62, a6) + .long R(39, 91, 91, a8), R(31, 95, 95, a4) + .long R(d3, e4, e4, 37), R(f2, 79, 79, 8b) + .long R(d5, e7, e7, 32), R(8b, c8, c8, 43) + .long R(6e, 37, 37, 59), R(da, 6d, 6d, b7) + .long R(01, 8d, 8d, 8c), R(b1, d5, d5, 64) + .long R(9c, 4e, 4e, d2), R(49, a9, a9, e0) + .long R(d8, 6c, 6c, b4), R(ac, 56, 56, fa) + .long R(f3, f4, f4, 07), R(cf, ea, ea, 25) + .long R(ca, 65, 65, af), R(f4, 7a, 7a, 8e) + .long R(47, ae, ae, e9), R(10, 08, 08, 18) + .long R(6f, ba, ba, d5), R(f0, 78, 78, 88) + .long R(4a, 25, 25, 6f), R(5c, 2e, 2e, 72) + .long R(38, 1c, 1c, 24), R(57, a6, a6, f1) + .long R(73, b4, b4, c7), R(97, c6, c6, 51) + .long R(cb, e8, e8, 23), R(a1, dd, dd, 7c) + .long R(e8, 74, 74, 9c), R(3e, 1f, 1f, 21) + .long R(96, 4b, 4b, dd), R(61, bd, bd, dc) + .long R(0d, 8b, 8b, 86), R(0f, 8a, 8a, 85) + .long R(e0, 70, 70, 90), R(7c, 3e, 3e, 42) + .long R(71, b5, b5, c4), R(cc, 66, 66, aa) + .long R(90, 48, 48, d8), R(06, 03, 03, 05) + .long R(f7, f6, f6, 01), R(1c, 0e, 0e, 12) + .long R(c2, 61, 61, a3), R(6a, 35, 35, 5f) + .long R(ae, 57, 57, f9), R(69, b9, b9, d0) + .long R(17, 86, 86, 91), R(99, c1, c1, 58) + .long R(3a, 1d, 1d, 27), R(27, 9e, 9e, b9) + .long R(d9, e1, e1, 38), R(eb, f8, f8, 13) + .long R(2b, 98, 98, b3), R(22, 11, 11, 33) + .long R(d2, 69, 69, bb), R(a9, d9, 
d9, 70) + .long R(07, 8e, 8e, 89), R(33, 94, 94, a7) + .long R(2d, 9b, 9b, b6), R(3c, 1e, 1e, 22) + .long R(15, 87, 87, 92), R(c9, e9, e9, 20) + .long R(87, ce, ce, 49), R(aa, 55, 55, ff) + .long R(50, 28, 28, 78), R(a5, df, df, 7a) + .long R(03, 8c, 8c, 8f), R(59, a1, a1, f8) + .long R(09, 89, 89, 80), R(1a, 0d, 0d, 17) + .long R(65, bf, bf, da), R(d7, e6, e6, 31) + .long R(84, 42, 42, c6), R(d0, 68, 68, b8) + .long R(82, 41, 41, c3), R(29, 99, 99, b0) + .long R(5a, 2d, 2d, 77), R(1e, 0f, 0f, 11) + .long R(7b, b0, b0, cb), R(a8, 54, 54, fc) + .long R(6d, bb, bb, d6), R(2c, 16, 16, 3a) +.globl PPC_AES_4K_DECTAB +PPC_AES_4K_DECTAB: +/* decryption table, same as crypto_it_tab in crypto/aes-generic.c */ + .long R(51, f4, a7, 50), R(7e, 41, 65, 53) + .long R(1a, 17, a4, c3), R(3a, 27, 5e, 96) + .long R(3b, ab, 6b, cb), R(1f, 9d, 45, f1) + .long R(ac, fa, 58, ab), R(4b, e3, 03, 93) + .long R(20, 30, fa, 55), R(ad, 76, 6d, f6) + .long R(88, cc, 76, 91), R(f5, 02, 4c, 25) + .long R(4f, e5, d7, fc), R(c5, 2a, cb, d7) + .long R(26, 35, 44, 80), R(b5, 62, a3, 8f) + .long R(de, b1, 5a, 49), R(25, ba, 1b, 67) + .long R(45, ea, 0e, 98), R(5d, fe, c0, e1) + .long R(c3, 2f, 75, 02), R(81, 4c, f0, 12) + .long R(8d, 46, 97, a3), R(6b, d3, f9, c6) + .long R(03, 8f, 5f, e7), R(15, 92, 9c, 95) + .long R(bf, 6d, 7a, eb), R(95, 52, 59, da) + .long R(d4, be, 83, 2d), R(58, 74, 21, d3) + .long R(49, e0, 69, 29), R(8e, c9, c8, 44) + .long R(75, c2, 89, 6a), R(f4, 8e, 79, 78) + .long R(99, 58, 3e, 6b), R(27, b9, 71, dd) + .long R(be, e1, 4f, b6), R(f0, 88, ad, 17) + .long R(c9, 20, ac, 66), R(7d, ce, 3a, b4) + .long R(63, df, 4a, 18), R(e5, 1a, 31, 82) + .long R(97, 51, 33, 60), R(62, 53, 7f, 45) + .long R(b1, 64, 77, e0), R(bb, 6b, ae, 84) + .long R(fe, 81, a0, 1c), R(f9, 08, 2b, 94) + .long R(70, 48, 68, 58), R(8f, 45, fd, 19) + .long R(94, de, 6c, 87), R(52, 7b, f8, b7) + .long R(ab, 73, d3, 23), R(72, 4b, 02, e2) + .long R(e3, 1f, 8f, 57), R(66, 55, ab, 2a) + .long R(b2, eb, 28, 07), R(2f, b5, c2, 03) + .long R(86, c5, 7b, 9a), R(d3, 37, 08, a5) + .long R(30, 28, 87, f2), R(23, bf, a5, b2) + .long R(02, 03, 6a, ba), R(ed, 16, 82, 5c) + .long R(8a, cf, 1c, 2b), R(a7, 79, b4, 92) + .long R(f3, 07, f2, f0), R(4e, 69, e2, a1) + .long R(65, da, f4, cd), R(06, 05, be, d5) + .long R(d1, 34, 62, 1f), R(c4, a6, fe, 8a) + .long R(34, 2e, 53, 9d), R(a2, f3, 55, a0) + .long R(05, 8a, e1, 32), R(a4, f6, eb, 75) + .long R(0b, 83, ec, 39), R(40, 60, ef, aa) + .long R(5e, 71, 9f, 06), R(bd, 6e, 10, 51) + .long R(3e, 21, 8a, f9), R(96, dd, 06, 3d) + .long R(dd, 3e, 05, ae), R(4d, e6, bd, 46) + .long R(91, 54, 8d, b5), R(71, c4, 5d, 05) + .long R(04, 06, d4, 6f), R(60, 50, 15, ff) + .long R(19, 98, fb, 24), R(d6, bd, e9, 97) + .long R(89, 40, 43, cc), R(67, d9, 9e, 77) + .long R(b0, e8, 42, bd), R(07, 89, 8b, 88) + .long R(e7, 19, 5b, 38), R(79, c8, ee, db) + .long R(a1, 7c, 0a, 47), R(7c, 42, 0f, e9) + .long R(f8, 84, 1e, c9), R(00, 00, 00, 00) + .long R(09, 80, 86, 83), R(32, 2b, ed, 48) + .long R(1e, 11, 70, ac), R(6c, 5a, 72, 4e) + .long R(fd, 0e, ff, fb), R(0f, 85, 38, 56) + .long R(3d, ae, d5, 1e), R(36, 2d, 39, 27) + .long R(0a, 0f, d9, 64), R(68, 5c, a6, 21) + .long R(9b, 5b, 54, d1), R(24, 36, 2e, 3a) + .long R(0c, 0a, 67, b1), R(93, 57, e7, 0f) + .long R(b4, ee, 96, d2), R(1b, 9b, 91, 9e) + .long R(80, c0, c5, 4f), R(61, dc, 20, a2) + .long R(5a, 77, 4b, 69), R(1c, 12, 1a, 16) + .long R(e2, 93, ba, 0a), R(c0, a0, 2a, e5) + .long R(3c, 22, e0, 43), R(12, 1b, 17, 1d) + .long R(0e, 09, 0d, 0b), R(f2, 8b, c7, ad) + .long 
R(2d, b6, a8, b9), R(14, 1e, a9, c8) + .long R(57, f1, 19, 85), R(af, 75, 07, 4c) + .long R(ee, 99, dd, bb), R(a3, 7f, 60, fd) + .long R(f7, 01, 26, 9f), R(5c, 72, f5, bc) + .long R(44, 66, 3b, c5), R(5b, fb, 7e, 34) + .long R(8b, 43, 29, 76), R(cb, 23, c6, dc) + .long R(b6, ed, fc, 68), R(b8, e4, f1, 63) + .long R(d7, 31, dc, ca), R(42, 63, 85, 10) + .long R(13, 97, 22, 40), R(84, c6, 11, 20) + .long R(85, 4a, 24, 7d), R(d2, bb, 3d, f8) + .long R(ae, f9, 32, 11), R(c7, 29, a1, 6d) + .long R(1d, 9e, 2f, 4b), R(dc, b2, 30, f3) + .long R(0d, 86, 52, ec), R(77, c1, e3, d0) + .long R(2b, b3, 16, 6c), R(a9, 70, b9, 99) + .long R(11, 94, 48, fa), R(47, e9, 64, 22) + .long R(a8, fc, 8c, c4), R(a0, f0, 3f, 1a) + .long R(56, 7d, 2c, d8), R(22, 33, 90, ef) + .long R(87, 49, 4e, c7), R(d9, 38, d1, c1) + .long R(8c, ca, a2, fe), R(98, d4, 0b, 36) + .long R(a6, f5, 81, cf), R(a5, 7a, de, 28) + .long R(da, b7, 8e, 26), R(3f, ad, bf, a4) + .long R(2c, 3a, 9d, e4), R(50, 78, 92, 0d) + .long R(6a, 5f, cc, 9b), R(54, 7e, 46, 62) + .long R(f6, 8d, 13, c2), R(90, d8, b8, e8) + .long R(2e, 39, f7, 5e), R(82, c3, af, f5) + .long R(9f, 5d, 80, be), R(69, d0, 93, 7c) + .long R(6f, d5, 2d, a9), R(cf, 25, 12, b3) + .long R(c8, ac, 99, 3b), R(10, 18, 7d, a7) + .long R(e8, 9c, 63, 6e), R(db, 3b, bb, 7b) + .long R(cd, 26, 78, 09), R(6e, 59, 18, f4) + .long R(ec, 9a, b7, 01), R(83, 4f, 9a, a8) + .long R(e6, 95, 6e, 65), R(aa, ff, e6, 7e) + .long R(21, bc, cf, 08), R(ef, 15, e8, e6) + .long R(ba, e7, 9b, d9), R(4a, 6f, 36, ce) + .long R(ea, 9f, 09, d4), R(29, b0, 7c, d6) + .long R(31, a4, b2, af), R(2a, 3f, 23, 31) + .long R(c6, a5, 94, 30), R(35, a2, 66, c0) + .long R(74, 4e, bc, 37), R(fc, 82, ca, a6) + .long R(e0, 90, d0, b0), R(33, a7, d8, 15) + .long R(f1, 04, 98, 4a), R(41, ec, da, f7) + .long R(7f, cd, 50, 0e), R(17, 91, f6, 2f) + .long R(76, 4d, d6, 8d), R(43, ef, b0, 4d) + .long R(cc, aa, 4d, 54), R(e4, 96, 04, df) + .long R(9e, d1, b5, e3), R(4c, 6a, 88, 1b) + .long R(c1, 2c, 1f, b8), R(46, 65, 51, 7f) + .long R(9d, 5e, ea, 04), R(01, 8c, 35, 5d) + .long R(fa, 87, 74, 73), R(fb, 0b, 41, 2e) + .long R(b3, 67, 1d, 5a), R(92, db, d2, 52) + .long R(e9, 10, 56, 33), R(6d, d6, 47, 13) + .long R(9a, d7, 61, 8c), R(37, a1, 0c, 7a) + .long R(59, f8, 14, 8e), R(eb, 13, 3c, 89) + .long R(ce, a9, 27, ee), R(b7, 61, c9, 35) + .long R(e1, 1c, e5, ed), R(7a, 47, b1, 3c) + .long R(9c, d2, df, 59), R(55, f2, 73, 3f) + .long R(18, 14, ce, 79), R(73, c7, 37, bf) + .long R(53, f7, cd, ea), R(5f, fd, aa, 5b) + .long R(df, 3d, 6f, 14), R(78, 44, db, 86) + .long R(ca, af, f3, 81), R(b9, 68, c4, 3e) + .long R(38, 24, 34, 2c), R(c2, a3, 40, 5f) + .long R(16, 1d, c3, 72), R(bc, e2, 25, 0c) + .long R(28, 3c, 49, 8b), R(ff, 0d, 95, 41) + .long R(39, a8, 01, 71), R(08, 0c, b3, de) + .long R(d8, b4, e4, 9c), R(64, 56, c1, 90) + .long R(7b, cb, 84, 61), R(d5, 32, b6, 70) + .long R(48, 6c, 5c, 74), R(d0, b8, 57, 42) +.globl PPC_AES_4K_DECTAB2 +PPC_AES_4K_DECTAB2: +/* decryption table, same as crypto_il_tab in crypto/aes-generic.c */ + .byte 0x52, 0x09, 0x6a, 0xd5, 0x30, 0x36, 0xa5, 0x38 + .byte 0xbf, 0x40, 0xa3, 0x9e, 0x81, 0xf3, 0xd7, 0xfb + .byte 0x7c, 0xe3, 0x39, 0x82, 0x9b, 0x2f, 0xff, 0x87 + .byte 0x34, 0x8e, 0x43, 0x44, 0xc4, 0xde, 0xe9, 0xcb + .byte 0x54, 0x7b, 0x94, 0x32, 0xa6, 0xc2, 0x23, 0x3d + .byte 0xee, 0x4c, 0x95, 0x0b, 0x42, 0xfa, 0xc3, 0x4e + .byte 0x08, 0x2e, 0xa1, 0x66, 0x28, 0xd9, 0x24, 0xb2 + .byte 0x76, 0x5b, 0xa2, 0x49, 0x6d, 0x8b, 0xd1, 0x25 + .byte 0x72, 0xf8, 0xf6, 0x64, 0x86, 0x68, 0x98, 0x16 + .byte 0xd4, 0xa4, 
0x5c, 0xcc, 0x5d, 0x65, 0xb6, 0x92 + .byte 0x6c, 0x70, 0x48, 0x50, 0xfd, 0xed, 0xb9, 0xda + .byte 0x5e, 0x15, 0x46, 0x57, 0xa7, 0x8d, 0x9d, 0x84 + .byte 0x90, 0xd8, 0xab, 0x00, 0x8c, 0xbc, 0xd3, 0x0a + .byte 0xf7, 0xe4, 0x58, 0x05, 0xb8, 0xb3, 0x45, 0x06 + .byte 0xd0, 0x2c, 0x1e, 0x8f, 0xca, 0x3f, 0x0f, 0x02 + .byte 0xc1, 0xaf, 0xbd, 0x03, 0x01, 0x13, 0x8a, 0x6b + .byte 0x3a, 0x91, 0x11, 0x41, 0x4f, 0x67, 0xdc, 0xea + .byte 0x97, 0xf2, 0xcf, 0xce, 0xf0, 0xb4, 0xe6, 0x73 + .byte 0x96, 0xac, 0x74, 0x22, 0xe7, 0xad, 0x35, 0x85 + .byte 0xe2, 0xf9, 0x37, 0xe8, 0x1c, 0x75, 0xdf, 0x6e + .byte 0x47, 0xf1, 0x1a, 0x71, 0x1d, 0x29, 0xc5, 0x89 + .byte 0x6f, 0xb7, 0x62, 0x0e, 0xaa, 0x18, 0xbe, 0x1b + .byte 0xfc, 0x56, 0x3e, 0x4b, 0xc6, 0xd2, 0x79, 0x20 + .byte 0x9a, 0xdb, 0xc0, 0xfe, 0x78, 0xcd, 0x5a, 0xf4 + .byte 0x1f, 0xdd, 0xa8, 0x33, 0x88, 0x07, 0xc7, 0x31 + .byte 0xb1, 0x12, 0x10, 0x59, 0x27, 0x80, 0xec, 0x5f + .byte 0x60, 0x51, 0x7f, 0xa9, 0x19, 0xb5, 0x4a, 0x0d + .byte 0x2d, 0xe5, 0x7a, 0x9f, 0x93, 0xc9, 0x9c, 0xef + .byte 0xa0, 0xe0, 0x3b, 0x4d, 0xae, 0x2a, 0xf5, 0xb0 + .byte 0xc8, 0xeb, 0xbb, 0x3c, 0x83, 0x53, 0x99, 0x61 + .byte 0x17, 0x2b, 0x04, 0x7e, 0xba, 0x77, 0xd6, 0x26 + .byte 0xe1, 0x69, 0x14, 0x63, 0x55, 0x21, 0x0c, 0x7d diff --git a/lib/crypto/powerpc/aes.h b/lib/crypto/powerpc/aes.h new file mode 100644 index 000000000000..cf22020f9050 --- /dev/null +++ b/lib/crypto/powerpc/aes.h @@ -0,0 +1,74 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (c) 2015 Markus Stockhausen + * Copyright 2026 Google LLC + */ +#include +#include +#include +#include +#include +#include + +EXPORT_SYMBOL_GPL(ppc_expand_key_128); +EXPORT_SYMBOL_GPL(ppc_expand_key_192); +EXPORT_SYMBOL_GPL(ppc_expand_key_256); +EXPORT_SYMBOL_GPL(ppc_generate_decrypt_key); +EXPORT_SYMBOL_GPL(ppc_encrypt_ecb); +EXPORT_SYMBOL_GPL(ppc_decrypt_ecb); +EXPORT_SYMBOL_GPL(ppc_encrypt_cbc); +EXPORT_SYMBOL_GPL(ppc_decrypt_cbc); +EXPORT_SYMBOL_GPL(ppc_crypt_ctr); +EXPORT_SYMBOL_GPL(ppc_encrypt_xts); +EXPORT_SYMBOL_GPL(ppc_decrypt_xts); + +void ppc_encrypt_aes(u8 *out, const u8 *in, const u32 *key_enc, u32 rounds); +void ppc_decrypt_aes(u8 *out, const u8 *in, const u32 *key_dec, u32 rounds); + +static void spe_begin(void) +{ + /* disable preemption and save users SPE registers if required */ + preempt_disable(); + enable_kernel_spe(); +} + +static void spe_end(void) +{ + disable_kernel_spe(); + /* reenable preemption */ + preempt_enable(); +} + +static void aes_preparekey_arch(union aes_enckey_arch *k, + union aes_invkey_arch *inv_k, + const u8 *in_key, int key_len, int nrounds) +{ + if (key_len == AES_KEYSIZE_128) + ppc_expand_key_128(k->spe_enc_key, in_key); + else if (key_len == AES_KEYSIZE_192) + ppc_expand_key_192(k->spe_enc_key, in_key); + else + ppc_expand_key_256(k->spe_enc_key, in_key); + + if (inv_k) + ppc_generate_decrypt_key(inv_k->spe_dec_key, k->spe_enc_key, + key_len); +} + +static void aes_encrypt_arch(const struct aes_enckey *key, + u8 out[AES_BLOCK_SIZE], + const u8 in[AES_BLOCK_SIZE]) +{ + spe_begin(); + ppc_encrypt_aes(out, in, key->k.spe_enc_key, key->nrounds / 2 - 1); + spe_end(); +} + +static void aes_decrypt_arch(const struct aes_key *key, + u8 out[AES_BLOCK_SIZE], + const u8 in[AES_BLOCK_SIZE]) +{ + spe_begin(); + ppc_decrypt_aes(out, in, key->inv_k.spe_dec_key, key->nrounds / 2 - 1); + spe_end(); +} -- cgit v1.2.3 From 7cf2082e74ce7f4f4b5e14cbe67a194d75e257ef Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Mon, 12 Jan 2026 11:20:11 -0800 Subject: lib/crypto: powerpc/aes: Migrate POWER8 optimized 
code into library Move the POWER8 AES assembly code into lib/crypto/, wire the key expansion and single-block en/decryption functions up to the AES library API, and remove the superseded "p8_aes" crypto_cipher algorithm. The result is that both the AES library and crypto_cipher APIs are now optimized for POWER8, whereas previously only crypto_cipher was (and optimizations weren't enabled by default, which this commit fixes too). Note that many of the functions in the POWER8 assembly code are still used by the AES mode implementations in arch/powerpc/crypto/. For now, just export these functions. These exports will go away once the AES modes are migrated to the library as well. (Trying to split up the assembly file seemed like much more trouble than it would be worth.) Another challenge with this code is that the POWER8 assembly code uses a custom format for the expanded AES key. Since that code is imported from OpenSSL and is also targeted to POWER8 (rather than POWER9 which has better data movement and byteswap instructions), that is not easily changed. For now I've just kept the custom format. To maintain full correctness, this requires executing some slow fallback code in the case where the usability of VSX changes between key expansion and use. This should be tolerable, as this case shouldn't happen in practice. Acked-by: Ard Biesheuvel Link: https://lore.kernel.org/r/20260112192035.10427-14-ebiggers@kernel.org Signed-off-by: Eric Biggers --- arch/powerpc/crypto/Makefile | 7 +- arch/powerpc/crypto/aes.c | 134 -- arch/powerpc/crypto/aesp8-ppc.h | 23 - arch/powerpc/crypto/aesp8-ppc.pl | 3889 ------------------------------------- arch/powerpc/crypto/vmx.c | 10 +- include/crypto/aes.h | 41 + lib/crypto/Kconfig | 2 +- lib/crypto/Makefile | 14 +- lib/crypto/powerpc/.gitignore | 2 + lib/crypto/powerpc/aes.h | 164 ++ lib/crypto/powerpc/aesp8-ppc.pl | 3890 ++++++++++++++++++++++++++++++++++++++ 11 files changed, 4115 insertions(+), 4061 deletions(-) delete mode 100644 arch/powerpc/crypto/aes.c delete mode 100644 arch/powerpc/crypto/aesp8-ppc.pl create mode 100644 lib/crypto/powerpc/.gitignore create mode 100644 lib/crypto/powerpc/aesp8-ppc.pl (limited to 'include') diff --git a/arch/powerpc/crypto/Makefile b/arch/powerpc/crypto/Makefile index e22310da86b5..3ac0886282a2 100644 --- a/arch/powerpc/crypto/Makefile +++ b/arch/powerpc/crypto/Makefile @@ -11,7 +11,7 @@ obj-$(CONFIG_CRYPTO_DEV_VMX_ENCRYPT) += vmx-crypto.o aes-ppc-spe-y := aes-spe-glue.o aes-gcm-p10-crypto-y := aes-gcm-p10-glue.o aes-gcm-p10.o ghashp10-ppc.o aesp10-ppc.o -vmx-crypto-objs := vmx.o aesp8-ppc.o ghashp8-ppc.o aes.o aes_cbc.o aes_ctr.o aes_xts.o ghash.o +vmx-crypto-objs := vmx.o ghashp8-ppc.o aes_cbc.o aes_ctr.o aes_xts.o ghash.o ifeq ($(CONFIG_CPU_LITTLE_ENDIAN),y) override flavour := linux-ppc64le @@ -26,15 +26,14 @@ endif quiet_cmd_perl = PERL $@ cmd_perl = $(PERL) $< $(flavour) > $@ -targets += aesp10-ppc.S ghashp10-ppc.S aesp8-ppc.S ghashp8-ppc.S +targets += aesp10-ppc.S ghashp10-ppc.S ghashp8-ppc.S $(obj)/aesp10-ppc.S $(obj)/ghashp10-ppc.S: $(obj)/%.S: $(src)/%.pl FORCE $(call if_changed,perl) -$(obj)/aesp8-ppc.S $(obj)/ghashp8-ppc.S: $(obj)/%.S: $(src)/%.pl FORCE +$(obj)/ghashp8-ppc.S: $(obj)/%.S: $(src)/%.pl FORCE $(call if_changed,perl) OBJECT_FILES_NON_STANDARD_aesp10-ppc.o := y OBJECT_FILES_NON_STANDARD_ghashp10-ppc.o := y -OBJECT_FILES_NON_STANDARD_aesp8-ppc.o := y OBJECT_FILES_NON_STANDARD_ghashp8-ppc.o := y diff --git a/arch/powerpc/crypto/aes.c b/arch/powerpc/crypto/aes.c deleted file mode 100644 index 
b7192ee719fc..000000000000 --- a/arch/powerpc/crypto/aes.c +++ /dev/null @@ -1,134 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0-only -/* - * AES routines supporting VMX instructions on the Power 8 - * - * Copyright (C) 2015 International Business Machines Inc. - * - * Author: Marcelo Henrique Cerri - */ - -#include -#include -#include -#include -#include -#include -#include -#include -#include - -#include "aesp8-ppc.h" - -struct p8_aes_ctx { - struct crypto_cipher *fallback; - struct p8_aes_key enc_key; - struct p8_aes_key dec_key; -}; - -static int p8_aes_init(struct crypto_tfm *tfm) -{ - const char *alg = crypto_tfm_alg_name(tfm); - struct crypto_cipher *fallback; - struct p8_aes_ctx *ctx = crypto_tfm_ctx(tfm); - - fallback = crypto_alloc_cipher(alg, 0, CRYPTO_ALG_NEED_FALLBACK); - if (IS_ERR(fallback)) { - printk(KERN_ERR - "Failed to allocate transformation for '%s': %ld\n", - alg, PTR_ERR(fallback)); - return PTR_ERR(fallback); - } - - crypto_cipher_set_flags(fallback, - crypto_cipher_get_flags((struct - crypto_cipher *) - tfm)); - ctx->fallback = fallback; - - return 0; -} - -static void p8_aes_exit(struct crypto_tfm *tfm) -{ - struct p8_aes_ctx *ctx = crypto_tfm_ctx(tfm); - - if (ctx->fallback) { - crypto_free_cipher(ctx->fallback); - ctx->fallback = NULL; - } -} - -static int p8_aes_setkey(struct crypto_tfm *tfm, const u8 *key, - unsigned int keylen) -{ - int ret; - struct p8_aes_ctx *ctx = crypto_tfm_ctx(tfm); - - preempt_disable(); - pagefault_disable(); - enable_kernel_vsx(); - ret = aes_p8_set_encrypt_key(key, keylen * 8, &ctx->enc_key); - ret |= aes_p8_set_decrypt_key(key, keylen * 8, &ctx->dec_key); - disable_kernel_vsx(); - pagefault_enable(); - preempt_enable(); - - ret |= crypto_cipher_setkey(ctx->fallback, key, keylen); - - return ret ? 
-EINVAL : 0; -} - -static void p8_aes_encrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src) -{ - struct p8_aes_ctx *ctx = crypto_tfm_ctx(tfm); - - if (!crypto_simd_usable()) { - crypto_cipher_encrypt_one(ctx->fallback, dst, src); - } else { - preempt_disable(); - pagefault_disable(); - enable_kernel_vsx(); - aes_p8_encrypt(src, dst, &ctx->enc_key); - disable_kernel_vsx(); - pagefault_enable(); - preempt_enable(); - } -} - -static void p8_aes_decrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src) -{ - struct p8_aes_ctx *ctx = crypto_tfm_ctx(tfm); - - if (!crypto_simd_usable()) { - crypto_cipher_decrypt_one(ctx->fallback, dst, src); - } else { - preempt_disable(); - pagefault_disable(); - enable_kernel_vsx(); - aes_p8_decrypt(src, dst, &ctx->dec_key); - disable_kernel_vsx(); - pagefault_enable(); - preempt_enable(); - } -} - -struct crypto_alg p8_aes_alg = { - .cra_name = "aes", - .cra_driver_name = "p8_aes", - .cra_module = THIS_MODULE, - .cra_priority = 1000, - .cra_type = NULL, - .cra_flags = CRYPTO_ALG_TYPE_CIPHER | CRYPTO_ALG_NEED_FALLBACK, - .cra_alignmask = 0, - .cra_blocksize = AES_BLOCK_SIZE, - .cra_ctxsize = sizeof(struct p8_aes_ctx), - .cra_init = p8_aes_init, - .cra_exit = p8_aes_exit, - .cra_cipher = { - .cia_min_keysize = AES_MIN_KEY_SIZE, - .cia_max_keysize = AES_MAX_KEY_SIZE, - .cia_setkey = p8_aes_setkey, - .cia_encrypt = p8_aes_encrypt, - .cia_decrypt = p8_aes_decrypt, - }, -}; diff --git a/arch/powerpc/crypto/aesp8-ppc.h b/arch/powerpc/crypto/aesp8-ppc.h index 0bea010128cb..6862c605cc33 100644 --- a/arch/powerpc/crypto/aesp8-ppc.h +++ b/arch/powerpc/crypto/aesp8-ppc.h @@ -2,30 +2,7 @@ #include #include -struct p8_aes_key { - u8 key[AES_MAX_KEYLENGTH]; - int rounds; -}; - extern struct shash_alg p8_ghash_alg; -extern struct crypto_alg p8_aes_alg; extern struct skcipher_alg p8_aes_cbc_alg; extern struct skcipher_alg p8_aes_ctr_alg; extern struct skcipher_alg p8_aes_xts_alg; - -int aes_p8_set_encrypt_key(const u8 *userKey, const int bits, - struct p8_aes_key *key); -int aes_p8_set_decrypt_key(const u8 *userKey, const int bits, - struct p8_aes_key *key); -void aes_p8_encrypt(const u8 *in, u8 *out, const struct p8_aes_key *key); -void aes_p8_decrypt(const u8 *in, u8 *out, const struct p8_aes_key *key); -void aes_p8_cbc_encrypt(const u8 *in, u8 *out, size_t len, - const struct p8_aes_key *key, u8 *iv, const int enc); -void aes_p8_ctr32_encrypt_blocks(const u8 *in, u8 *out, size_t len, - const struct p8_aes_key *key, const u8 *iv); -void aes_p8_xts_encrypt(const u8 *in, u8 *out, size_t len, - const struct p8_aes_key *key1, - const struct p8_aes_key *key2, u8 *iv); -void aes_p8_xts_decrypt(const u8 *in, u8 *out, size_t len, - const struct p8_aes_key *key1, - const struct p8_aes_key *key2, u8 *iv); diff --git a/arch/powerpc/crypto/aesp8-ppc.pl b/arch/powerpc/crypto/aesp8-ppc.pl deleted file mode 100644 index f729589d792e..000000000000 --- a/arch/powerpc/crypto/aesp8-ppc.pl +++ /dev/null @@ -1,3889 +0,0 @@ -#! /usr/bin/env perl -# SPDX-License-Identifier: GPL-2.0 - -# This code is taken from CRYPTOGAMs[1] and is included here using the option -# in the license to distribute the code under the GPL. Therefore this program -# is free software; you can redistribute it and/or modify it under the terms of -# the GNU General Public License version 2 as published by the Free Software -# Foundation. -# -# [1] https://www.openssl.org/~appro/cryptogams/ - -# Copyright (c) 2006-2017, CRYPTOGAMS by -# All rights reserved. 
-# -# Redistribution and use in source and binary forms, with or without -# modification, are permitted provided that the following conditions -# are met: -# -# * Redistributions of source code must retain copyright notices, -# this list of conditions and the following disclaimer. -# -# * Redistributions in binary form must reproduce the above -# copyright notice, this list of conditions and the following -# disclaimer in the documentation and/or other materials -# provided with the distribution. -# -# * Neither the name of the CRYPTOGAMS nor the names of its -# copyright holder and contributors may be used to endorse or -# promote products derived from this software without specific -# prior written permission. -# -# ALTERNATIVELY, provided that this notice is retained in full, this -# product may be distributed under the terms of the GNU General Public -# License (GPL), in which case the provisions of the GPL apply INSTEAD OF -# those given above. -# -# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDER AND CONTRIBUTORS -# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR -# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT -# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, -# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT -# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY -# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT -# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -# ==================================================================== -# Written by Andy Polyakov for the OpenSSL -# project. The module is, however, dual licensed under OpenSSL and -# CRYPTOGAMS licenses depending on where you obtain it. For further -# details see https://www.openssl.org/~appro/cryptogams/. -# ==================================================================== -# -# This module implements support for AES instructions as per PowerISA -# specification version 2.07, first implemented by POWER8 processor. -# The module is endian-agnostic in sense that it supports both big- -# and little-endian cases. Data alignment in parallelizable modes is -# handled with VSX loads and stores, which implies MSR.VSX flag being -# set. It should also be noted that ISA specification doesn't prohibit -# alignment exceptions for these instructions on page boundaries. -# Initially alignment was handled in pure AltiVec/VMX way [when data -# is aligned programmatically, which in turn guarantees exception- -# free execution], but it turned to hamper performance when vcipher -# instructions are interleaved. It's reckoned that eventual -# misalignment penalties at page boundaries are in average lower -# than additional overhead in pure AltiVec approach. -# -# May 2016 -# -# Add XTS subroutine, 9x on little- and 12x improvement on big-endian -# systems were measured. -# -###################################################################### -# Current large-block performance in cycles per byte processed with -# 128-bit key (less is better). 
-# -# CBC en-/decrypt CTR XTS -# POWER8[le] 3.96/0.72 0.74 1.1 -# POWER8[be] 3.75/0.65 0.66 1.0 - -$flavour = shift; - -if ($flavour =~ /64/) { - $SIZE_T =8; - $LRSAVE =2*$SIZE_T; - $STU ="stdu"; - $POP ="ld"; - $PUSH ="std"; - $UCMP ="cmpld"; - $SHL ="sldi"; -} elsif ($flavour =~ /32/) { - $SIZE_T =4; - $LRSAVE =$SIZE_T; - $STU ="stwu"; - $POP ="lwz"; - $PUSH ="stw"; - $UCMP ="cmplw"; - $SHL ="slwi"; -} else { die "nonsense $flavour"; } - -$LITTLE_ENDIAN = ($flavour=~/le$/) ? $SIZE_T : 0; - -$0 =~ m/(.*[\/\\])[^\/\\]+$/; $dir=$1; -( $xlate="${dir}ppc-xlate.pl" and -f $xlate ) or -( $xlate="${dir}../../perlasm/ppc-xlate.pl" and -f $xlate) or -die "can't locate ppc-xlate.pl"; - -open STDOUT,"| $^X $xlate $flavour ".shift || die "can't call $xlate: $!"; - -$FRAME=8*$SIZE_T; -$prefix="aes_p8"; - -$sp="r1"; -$vrsave="r12"; - -######################################################################### -{{{ # Key setup procedures # -my ($inp,$bits,$out,$ptr,$cnt,$rounds)=map("r$_",(3..8)); -my ($zero,$in0,$in1,$key,$rcon,$mask,$tmp)=map("v$_",(0..6)); -my ($stage,$outperm,$outmask,$outhead,$outtail)=map("v$_",(7..11)); - -$code.=<<___; -.machine "any" - -.text - -.align 7 -rcon: -.long 0x01000000, 0x01000000, 0x01000000, 0x01000000 ?rev -.long 0x1b000000, 0x1b000000, 0x1b000000, 0x1b000000 ?rev -.long 0x0d0e0f0c, 0x0d0e0f0c, 0x0d0e0f0c, 0x0d0e0f0c ?rev -.long 0,0,0,0 ?asis -.long 0x0f102132, 0x43546576, 0x8798a9ba, 0xcbdcedfe -Lconsts: - mflr r0 - bcl 20,31,\$+4 - mflr $ptr #vvvvv "distance between . and rcon - addi $ptr,$ptr,-0x58 - mtlr r0 - blr - .long 0 - .byte 0,12,0x14,0,0,0,0,0 -.asciz "AES for PowerISA 2.07, CRYPTOGAMS by " - -.globl .${prefix}_set_encrypt_key -Lset_encrypt_key: - mflr r11 - $PUSH r11,$LRSAVE($sp) - - li $ptr,-1 - ${UCMP}i $inp,0 - beq- Lenc_key_abort # if ($inp==0) return -1; - ${UCMP}i $out,0 - beq- Lenc_key_abort # if ($out==0) return -1; - li $ptr,-2 - cmpwi $bits,128 - blt- Lenc_key_abort - cmpwi $bits,256 - bgt- Lenc_key_abort - andi. 
r0,$bits,0x3f - bne- Lenc_key_abort - - lis r0,0xfff0 - mfspr $vrsave,256 - mtspr 256,r0 - - bl Lconsts - mtlr r11 - - neg r9,$inp - lvx $in0,0,$inp - addi $inp,$inp,15 # 15 is not typo - lvsr $key,0,r9 # borrow $key - li r8,0x20 - cmpwi $bits,192 - lvx $in1,0,$inp - le?vspltisb $mask,0x0f # borrow $mask - lvx $rcon,0,$ptr - le?vxor $key,$key,$mask # adjust for byte swap - lvx $mask,r8,$ptr - addi $ptr,$ptr,0x10 - vperm $in0,$in0,$in1,$key # align [and byte swap in LE] - li $cnt,8 - vxor $zero,$zero,$zero - mtctr $cnt - - ?lvsr $outperm,0,$out - vspltisb $outmask,-1 - lvx $outhead,0,$out - ?vperm $outmask,$zero,$outmask,$outperm - - blt Loop128 - addi $inp,$inp,8 - beq L192 - addi $inp,$inp,8 - b L256 - -.align 4 -Loop128: - vperm $key,$in0,$in0,$mask # rotate-n-splat - vsldoi $tmp,$zero,$in0,12 # >>32 - vperm $outtail,$in0,$in0,$outperm # rotate - vsel $stage,$outhead,$outtail,$outmask - vmr $outhead,$outtail - vcipherlast $key,$key,$rcon - stvx $stage,0,$out - addi $out,$out,16 - - vxor $in0,$in0,$tmp - vsldoi $tmp,$zero,$tmp,12 # >>32 - vxor $in0,$in0,$tmp - vsldoi $tmp,$zero,$tmp,12 # >>32 - vxor $in0,$in0,$tmp - vadduwm $rcon,$rcon,$rcon - vxor $in0,$in0,$key - bdnz Loop128 - - lvx $rcon,0,$ptr # last two round keys - - vperm $key,$in0,$in0,$mask # rotate-n-splat - vsldoi $tmp,$zero,$in0,12 # >>32 - vperm $outtail,$in0,$in0,$outperm # rotate - vsel $stage,$outhead,$outtail,$outmask - vmr $outhead,$outtail - vcipherlast $key,$key,$rcon - stvx $stage,0,$out - addi $out,$out,16 - - vxor $in0,$in0,$tmp - vsldoi $tmp,$zero,$tmp,12 # >>32 - vxor $in0,$in0,$tmp - vsldoi $tmp,$zero,$tmp,12 # >>32 - vxor $in0,$in0,$tmp - vadduwm $rcon,$rcon,$rcon - vxor $in0,$in0,$key - - vperm $key,$in0,$in0,$mask # rotate-n-splat - vsldoi $tmp,$zero,$in0,12 # >>32 - vperm $outtail,$in0,$in0,$outperm # rotate - vsel $stage,$outhead,$outtail,$outmask - vmr $outhead,$outtail - vcipherlast $key,$key,$rcon - stvx $stage,0,$out - addi $out,$out,16 - - vxor $in0,$in0,$tmp - vsldoi $tmp,$zero,$tmp,12 # >>32 - vxor $in0,$in0,$tmp - vsldoi $tmp,$zero,$tmp,12 # >>32 - vxor $in0,$in0,$tmp - vxor $in0,$in0,$key - vperm $outtail,$in0,$in0,$outperm # rotate - vsel $stage,$outhead,$outtail,$outmask - vmr $outhead,$outtail - stvx $stage,0,$out - - addi $inp,$out,15 # 15 is not typo - addi $out,$out,0x50 - - li $rounds,10 - b Ldone - -.align 4 -L192: - lvx $tmp,0,$inp - li $cnt,4 - vperm $outtail,$in0,$in0,$outperm # rotate - vsel $stage,$outhead,$outtail,$outmask - vmr $outhead,$outtail - stvx $stage,0,$out - addi $out,$out,16 - vperm $in1,$in1,$tmp,$key # align [and byte swap in LE] - vspltisb $key,8 # borrow $key - mtctr $cnt - vsububm $mask,$mask,$key # adjust the mask - -Loop192: - vperm $key,$in1,$in1,$mask # roate-n-splat - vsldoi $tmp,$zero,$in0,12 # >>32 - vcipherlast $key,$key,$rcon - - vxor $in0,$in0,$tmp - vsldoi $tmp,$zero,$tmp,12 # >>32 - vxor $in0,$in0,$tmp - vsldoi $tmp,$zero,$tmp,12 # >>32 - vxor $in0,$in0,$tmp - - vsldoi $stage,$zero,$in1,8 - vspltw $tmp,$in0,3 - vxor $tmp,$tmp,$in1 - vsldoi $in1,$zero,$in1,12 # >>32 - vadduwm $rcon,$rcon,$rcon - vxor $in1,$in1,$tmp - vxor $in0,$in0,$key - vxor $in1,$in1,$key - vsldoi $stage,$stage,$in0,8 - - vperm $key,$in1,$in1,$mask # rotate-n-splat - vsldoi $tmp,$zero,$in0,12 # >>32 - vperm $outtail,$stage,$stage,$outperm # rotate - vsel $stage,$outhead,$outtail,$outmask - vmr $outhead,$outtail - vcipherlast $key,$key,$rcon - stvx $stage,0,$out - addi $out,$out,16 - - vsldoi $stage,$in0,$in1,8 - vxor $in0,$in0,$tmp - vsldoi $tmp,$zero,$tmp,12 # >>32 - vperm 
$outtail,$stage,$stage,$outperm # rotate - vsel $stage,$outhead,$outtail,$outmask - vmr $outhead,$outtail - vxor $in0,$in0,$tmp - vsldoi $tmp,$zero,$tmp,12 # >>32 - vxor $in0,$in0,$tmp - stvx $stage,0,$out - addi $out,$out,16 - - vspltw $tmp,$in0,3 - vxor $tmp,$tmp,$in1 - vsldoi $in1,$zero,$in1,12 # >>32 - vadduwm $rcon,$rcon,$rcon - vxor $in1,$in1,$tmp - vxor $in0,$in0,$key - vxor $in1,$in1,$key - vperm $outtail,$in0,$in0,$outperm # rotate - vsel $stage,$outhead,$outtail,$outmask - vmr $outhead,$outtail - stvx $stage,0,$out - addi $inp,$out,15 # 15 is not typo - addi $out,$out,16 - bdnz Loop192 - - li $rounds,12 - addi $out,$out,0x20 - b Ldone - -.align 4 -L256: - lvx $tmp,0,$inp - li $cnt,7 - li $rounds,14 - vperm $outtail,$in0,$in0,$outperm # rotate - vsel $stage,$outhead,$outtail,$outmask - vmr $outhead,$outtail - stvx $stage,0,$out - addi $out,$out,16 - vperm $in1,$in1,$tmp,$key # align [and byte swap in LE] - mtctr $cnt - -Loop256: - vperm $key,$in1,$in1,$mask # rotate-n-splat - vsldoi $tmp,$zero,$in0,12 # >>32 - vperm $outtail,$in1,$in1,$outperm # rotate - vsel $stage,$outhead,$outtail,$outmask - vmr $outhead,$outtail - vcipherlast $key,$key,$rcon - stvx $stage,0,$out - addi $out,$out,16 - - vxor $in0,$in0,$tmp - vsldoi $tmp,$zero,$tmp,12 # >>32 - vxor $in0,$in0,$tmp - vsldoi $tmp,$zero,$tmp,12 # >>32 - vxor $in0,$in0,$tmp - vadduwm $rcon,$rcon,$rcon - vxor $in0,$in0,$key - vperm $outtail,$in0,$in0,$outperm # rotate - vsel $stage,$outhead,$outtail,$outmask - vmr $outhead,$outtail - stvx $stage,0,$out - addi $inp,$out,15 # 15 is not typo - addi $out,$out,16 - bdz Ldone - - vspltw $key,$in0,3 # just splat - vsldoi $tmp,$zero,$in1,12 # >>32 - vsbox $key,$key - - vxor $in1,$in1,$tmp - vsldoi $tmp,$zero,$tmp,12 # >>32 - vxor $in1,$in1,$tmp - vsldoi $tmp,$zero,$tmp,12 # >>32 - vxor $in1,$in1,$tmp - - vxor $in1,$in1,$key - b Loop256 - -.align 4 -Ldone: - lvx $in1,0,$inp # redundant in aligned case - vsel $in1,$outhead,$in1,$outmask - stvx $in1,0,$inp - li $ptr,0 - mtspr 256,$vrsave - stw $rounds,0($out) - -Lenc_key_abort: - mr r3,$ptr - blr - .long 0 - .byte 0,12,0x14,1,0,0,3,0 - .long 0 -.size .${prefix}_set_encrypt_key,.-.${prefix}_set_encrypt_key - -.globl .${prefix}_set_decrypt_key - $STU $sp,-$FRAME($sp) - mflr r10 - $PUSH r10,$FRAME+$LRSAVE($sp) - bl Lset_encrypt_key - mtlr r10 - - cmpwi r3,0 - bne- Ldec_key_abort - - slwi $cnt,$rounds,4 - subi $inp,$out,240 # first round key - srwi $rounds,$rounds,1 - add $out,$inp,$cnt # last round key - mtctr $rounds - -Ldeckey: - lwz r0, 0($inp) - lwz r6, 4($inp) - lwz r7, 8($inp) - lwz r8, 12($inp) - addi $inp,$inp,16 - lwz r9, 0($out) - lwz r10,4($out) - lwz r11,8($out) - lwz r12,12($out) - stw r0, 0($out) - stw r6, 4($out) - stw r7, 8($out) - stw r8, 12($out) - subi $out,$out,16 - stw r9, -16($inp) - stw r10,-12($inp) - stw r11,-8($inp) - stw r12,-4($inp) - bdnz Ldeckey - - xor r3,r3,r3 # return value -Ldec_key_abort: - addi $sp,$sp,$FRAME - blr - .long 0 - .byte 0,12,4,1,0x80,0,3,0 - .long 0 -.size .${prefix}_set_decrypt_key,.-.${prefix}_set_decrypt_key -___ -}}} -######################################################################### -{{{ # Single block en- and decrypt procedures # -sub gen_block () { -my $dir = shift; -my $n = $dir eq "de" ? 
"n" : ""; -my ($inp,$out,$key,$rounds,$idx)=map("r$_",(3..7)); - -$code.=<<___; -.globl .${prefix}_${dir}crypt - lwz $rounds,240($key) - lis r0,0xfc00 - mfspr $vrsave,256 - li $idx,15 # 15 is not typo - mtspr 256,r0 - - lvx v0,0,$inp - neg r11,$out - lvx v1,$idx,$inp - lvsl v2,0,$inp # inpperm - le?vspltisb v4,0x0f - ?lvsl v3,0,r11 # outperm - le?vxor v2,v2,v4 - li $idx,16 - vperm v0,v0,v1,v2 # align [and byte swap in LE] - lvx v1,0,$key - ?lvsl v5,0,$key # keyperm - srwi $rounds,$rounds,1 - lvx v2,$idx,$key - addi $idx,$idx,16 - subi $rounds,$rounds,1 - ?vperm v1,v1,v2,v5 # align round key - - vxor v0,v0,v1 - lvx v1,$idx,$key - addi $idx,$idx,16 - mtctr $rounds - -Loop_${dir}c: - ?vperm v2,v2,v1,v5 - v${n}cipher v0,v0,v2 - lvx v2,$idx,$key - addi $idx,$idx,16 - ?vperm v1,v1,v2,v5 - v${n}cipher v0,v0,v1 - lvx v1,$idx,$key - addi $idx,$idx,16 - bdnz Loop_${dir}c - - ?vperm v2,v2,v1,v5 - v${n}cipher v0,v0,v2 - lvx v2,$idx,$key - ?vperm v1,v1,v2,v5 - v${n}cipherlast v0,v0,v1 - - vspltisb v2,-1 - vxor v1,v1,v1 - li $idx,15 # 15 is not typo - ?vperm v2,v1,v2,v3 # outmask - le?vxor v3,v3,v4 - lvx v1,0,$out # outhead - vperm v0,v0,v0,v3 # rotate [and byte swap in LE] - vsel v1,v1,v0,v2 - lvx v4,$idx,$out - stvx v1,0,$out - vsel v0,v0,v4,v2 - stvx v0,$idx,$out - - mtspr 256,$vrsave - blr - .long 0 - .byte 0,12,0x14,0,0,0,3,0 - .long 0 -.size .${prefix}_${dir}crypt,.-.${prefix}_${dir}crypt -___ -} -&gen_block("en"); -&gen_block("de"); -}}} -######################################################################### -{{{ # CBC en- and decrypt procedures # -my ($inp,$out,$len,$key,$ivp,$enc,$rounds,$idx)=map("r$_",(3..10)); -my ($rndkey0,$rndkey1,$inout,$tmp)= map("v$_",(0..3)); -my ($ivec,$inptail,$inpperm,$outhead,$outperm,$outmask,$keyperm)= - map("v$_",(4..10)); -$code.=<<___; -.globl .${prefix}_cbc_encrypt - ${UCMP}i $len,16 - bltlr- - - cmpwi $enc,0 # test direction - lis r0,0xffe0 - mfspr $vrsave,256 - mtspr 256,r0 - - li $idx,15 - vxor $rndkey0,$rndkey0,$rndkey0 - le?vspltisb $tmp,0x0f - - lvx $ivec,0,$ivp # load [unaligned] iv - lvsl $inpperm,0,$ivp - lvx $inptail,$idx,$ivp - le?vxor $inpperm,$inpperm,$tmp - vperm $ivec,$ivec,$inptail,$inpperm - - neg r11,$inp - ?lvsl $keyperm,0,$key # prepare for unaligned key - lwz $rounds,240($key) - - lvsr $inpperm,0,r11 # prepare for unaligned load - lvx $inptail,0,$inp - addi $inp,$inp,15 # 15 is not typo - le?vxor $inpperm,$inpperm,$tmp - - ?lvsr $outperm,0,$out # prepare for unaligned store - vspltisb $outmask,-1 - lvx $outhead,0,$out - ?vperm $outmask,$rndkey0,$outmask,$outperm - le?vxor $outperm,$outperm,$tmp - - srwi $rounds,$rounds,1 - li $idx,16 - subi $rounds,$rounds,1 - beq Lcbc_dec - -Lcbc_enc: - vmr $inout,$inptail - lvx $inptail,0,$inp - addi $inp,$inp,16 - mtctr $rounds - subi $len,$len,16 # len-=16 - - lvx $rndkey0,0,$key - vperm $inout,$inout,$inptail,$inpperm - lvx $rndkey1,$idx,$key - addi $idx,$idx,16 - ?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm - vxor $inout,$inout,$rndkey0 - lvx $rndkey0,$idx,$key - addi $idx,$idx,16 - vxor $inout,$inout,$ivec - -Loop_cbc_enc: - ?vperm $rndkey1,$rndkey1,$rndkey0,$keyperm - vcipher $inout,$inout,$rndkey1 - lvx $rndkey1,$idx,$key - addi $idx,$idx,16 - ?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm - vcipher $inout,$inout,$rndkey0 - lvx $rndkey0,$idx,$key - addi $idx,$idx,16 - bdnz Loop_cbc_enc - - ?vperm $rndkey1,$rndkey1,$rndkey0,$keyperm - vcipher $inout,$inout,$rndkey1 - lvx $rndkey1,$idx,$key - li $idx,16 - ?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm - vcipherlast $ivec,$inout,$rndkey0 - ${UCMP}i $len,16 
- - vperm $tmp,$ivec,$ivec,$outperm - vsel $inout,$outhead,$tmp,$outmask - vmr $outhead,$tmp - stvx $inout,0,$out - addi $out,$out,16 - bge Lcbc_enc - - b Lcbc_done - -.align 4 -Lcbc_dec: - ${UCMP}i $len,128 - bge _aesp8_cbc_decrypt8x - vmr $tmp,$inptail - lvx $inptail,0,$inp - addi $inp,$inp,16 - mtctr $rounds - subi $len,$len,16 # len-=16 - - lvx $rndkey0,0,$key - vperm $tmp,$tmp,$inptail,$inpperm - lvx $rndkey1,$idx,$key - addi $idx,$idx,16 - ?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm - vxor $inout,$tmp,$rndkey0 - lvx $rndkey0,$idx,$key - addi $idx,$idx,16 - -Loop_cbc_dec: - ?vperm $rndkey1,$rndkey1,$rndkey0,$keyperm - vncipher $inout,$inout,$rndkey1 - lvx $rndkey1,$idx,$key - addi $idx,$idx,16 - ?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm - vncipher $inout,$inout,$rndkey0 - lvx $rndkey0,$idx,$key - addi $idx,$idx,16 - bdnz Loop_cbc_dec - - ?vperm $rndkey1,$rndkey1,$rndkey0,$keyperm - vncipher $inout,$inout,$rndkey1 - lvx $rndkey1,$idx,$key - li $idx,16 - ?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm - vncipherlast $inout,$inout,$rndkey0 - ${UCMP}i $len,16 - - vxor $inout,$inout,$ivec - vmr $ivec,$tmp - vperm $tmp,$inout,$inout,$outperm - vsel $inout,$outhead,$tmp,$outmask - vmr $outhead,$tmp - stvx $inout,0,$out - addi $out,$out,16 - bge Lcbc_dec - -Lcbc_done: - addi $out,$out,-1 - lvx $inout,0,$out # redundant in aligned case - vsel $inout,$outhead,$inout,$outmask - stvx $inout,0,$out - - neg $enc,$ivp # write [unaligned] iv - li $idx,15 # 15 is not typo - vxor $rndkey0,$rndkey0,$rndkey0 - vspltisb $outmask,-1 - le?vspltisb $tmp,0x0f - ?lvsl $outperm,0,$enc - ?vperm $outmask,$rndkey0,$outmask,$outperm - le?vxor $outperm,$outperm,$tmp - lvx $outhead,0,$ivp - vperm $ivec,$ivec,$ivec,$outperm - vsel $inout,$outhead,$ivec,$outmask - lvx $inptail,$idx,$ivp - stvx $inout,0,$ivp - vsel $inout,$ivec,$inptail,$outmask - stvx $inout,$idx,$ivp - - mtspr 256,$vrsave - blr - .long 0 - .byte 0,12,0x14,0,0,0,6,0 - .long 0 -___ -######################################################################### -{{ # Optimized CBC decrypt procedure # -my $key_="r11"; -my ($x00,$x10,$x20,$x30,$x40,$x50,$x60,$x70)=map("r$_",(0,8,26..31)); -my ($in0, $in1, $in2, $in3, $in4, $in5, $in6, $in7 )=map("v$_",(0..3,10..13)); -my ($out0,$out1,$out2,$out3,$out4,$out5,$out6,$out7)=map("v$_",(14..21)); -my $rndkey0="v23"; # v24-v25 rotating buffer for first found keys - # v26-v31 last 6 round keys -my ($tmp,$keyperm)=($in3,$in4); # aliases with "caller", redundant assignment - -$code.=<<___; -.align 5 -_aesp8_cbc_decrypt8x: - $STU $sp,-`($FRAME+21*16+6*$SIZE_T)`($sp) - li r10,`$FRAME+8*16+15` - li r11,`$FRAME+8*16+31` - stvx v20,r10,$sp # ABI says so - addi r10,r10,32 - stvx v21,r11,$sp - addi r11,r11,32 - stvx v22,r10,$sp - addi r10,r10,32 - stvx v23,r11,$sp - addi r11,r11,32 - stvx v24,r10,$sp - addi r10,r10,32 - stvx v25,r11,$sp - addi r11,r11,32 - stvx v26,r10,$sp - addi r10,r10,32 - stvx v27,r11,$sp - addi r11,r11,32 - stvx v28,r10,$sp - addi r10,r10,32 - stvx v29,r11,$sp - addi r11,r11,32 - stvx v30,r10,$sp - stvx v31,r11,$sp - li r0,-1 - stw $vrsave,`$FRAME+21*16-4`($sp) # save vrsave - li $x10,0x10 - $PUSH r26,`$FRAME+21*16+0*$SIZE_T`($sp) - li $x20,0x20 - $PUSH r27,`$FRAME+21*16+1*$SIZE_T`($sp) - li $x30,0x30 - $PUSH r28,`$FRAME+21*16+2*$SIZE_T`($sp) - li $x40,0x40 - $PUSH r29,`$FRAME+21*16+3*$SIZE_T`($sp) - li $x50,0x50 - $PUSH r30,`$FRAME+21*16+4*$SIZE_T`($sp) - li $x60,0x60 - $PUSH r31,`$FRAME+21*16+5*$SIZE_T`($sp) - li $x70,0x70 - mtspr 256,r0 - - subi $rounds,$rounds,3 # -4 in total - subi $len,$len,128 # bias - - 
lvx $rndkey0,$x00,$key # load key schedule - lvx v30,$x10,$key - addi $key,$key,0x20 - lvx v31,$x00,$key - ?vperm $rndkey0,$rndkey0,v30,$keyperm - addi $key_,$sp,$FRAME+15 - mtctr $rounds - -Load_cbc_dec_key: - ?vperm v24,v30,v31,$keyperm - lvx v30,$x10,$key - addi $key,$key,0x20 - stvx v24,$x00,$key_ # off-load round[1] - ?vperm v25,v31,v30,$keyperm - lvx v31,$x00,$key - stvx v25,$x10,$key_ # off-load round[2] - addi $key_,$key_,0x20 - bdnz Load_cbc_dec_key - - lvx v26,$x10,$key - ?vperm v24,v30,v31,$keyperm - lvx v27,$x20,$key - stvx v24,$x00,$key_ # off-load round[3] - ?vperm v25,v31,v26,$keyperm - lvx v28,$x30,$key - stvx v25,$x10,$key_ # off-load round[4] - addi $key_,$sp,$FRAME+15 # rewind $key_ - ?vperm v26,v26,v27,$keyperm - lvx v29,$x40,$key - ?vperm v27,v27,v28,$keyperm - lvx v30,$x50,$key - ?vperm v28,v28,v29,$keyperm - lvx v31,$x60,$key - ?vperm v29,v29,v30,$keyperm - lvx $out0,$x70,$key # borrow $out0 - ?vperm v30,v30,v31,$keyperm - lvx v24,$x00,$key_ # pre-load round[1] - ?vperm v31,v31,$out0,$keyperm - lvx v25,$x10,$key_ # pre-load round[2] - - #lvx $inptail,0,$inp # "caller" already did this - #addi $inp,$inp,15 # 15 is not typo - subi $inp,$inp,15 # undo "caller" - - le?li $idx,8 - lvx_u $in0,$x00,$inp # load first 8 "words" - le?lvsl $inpperm,0,$idx - le?vspltisb $tmp,0x0f - lvx_u $in1,$x10,$inp - le?vxor $inpperm,$inpperm,$tmp # transform for lvx_u/stvx_u - lvx_u $in2,$x20,$inp - le?vperm $in0,$in0,$in0,$inpperm - lvx_u $in3,$x30,$inp - le?vperm $in1,$in1,$in1,$inpperm - lvx_u $in4,$x40,$inp - le?vperm $in2,$in2,$in2,$inpperm - vxor $out0,$in0,$rndkey0 - lvx_u $in5,$x50,$inp - le?vperm $in3,$in3,$in3,$inpperm - vxor $out1,$in1,$rndkey0 - lvx_u $in6,$x60,$inp - le?vperm $in4,$in4,$in4,$inpperm - vxor $out2,$in2,$rndkey0 - lvx_u $in7,$x70,$inp - addi $inp,$inp,0x80 - le?vperm $in5,$in5,$in5,$inpperm - vxor $out3,$in3,$rndkey0 - le?vperm $in6,$in6,$in6,$inpperm - vxor $out4,$in4,$rndkey0 - le?vperm $in7,$in7,$in7,$inpperm - vxor $out5,$in5,$rndkey0 - vxor $out6,$in6,$rndkey0 - vxor $out7,$in7,$rndkey0 - - mtctr $rounds - b Loop_cbc_dec8x -.align 5 -Loop_cbc_dec8x: - vncipher $out0,$out0,v24 - vncipher $out1,$out1,v24 - vncipher $out2,$out2,v24 - vncipher $out3,$out3,v24 - vncipher $out4,$out4,v24 - vncipher $out5,$out5,v24 - vncipher $out6,$out6,v24 - vncipher $out7,$out7,v24 - lvx v24,$x20,$key_ # round[3] - addi $key_,$key_,0x20 - - vncipher $out0,$out0,v25 - vncipher $out1,$out1,v25 - vncipher $out2,$out2,v25 - vncipher $out3,$out3,v25 - vncipher $out4,$out4,v25 - vncipher $out5,$out5,v25 - vncipher $out6,$out6,v25 - vncipher $out7,$out7,v25 - lvx v25,$x10,$key_ # round[4] - bdnz Loop_cbc_dec8x - - subic $len,$len,128 # $len-=128 - vncipher $out0,$out0,v24 - vncipher $out1,$out1,v24 - vncipher $out2,$out2,v24 - vncipher $out3,$out3,v24 - vncipher $out4,$out4,v24 - vncipher $out5,$out5,v24 - vncipher $out6,$out6,v24 - vncipher $out7,$out7,v24 - - subfe. 
r0,r0,r0 # borrow?-1:0 - vncipher $out0,$out0,v25 - vncipher $out1,$out1,v25 - vncipher $out2,$out2,v25 - vncipher $out3,$out3,v25 - vncipher $out4,$out4,v25 - vncipher $out5,$out5,v25 - vncipher $out6,$out6,v25 - vncipher $out7,$out7,v25 - - and r0,r0,$len - vncipher $out0,$out0,v26 - vncipher $out1,$out1,v26 - vncipher $out2,$out2,v26 - vncipher $out3,$out3,v26 - vncipher $out4,$out4,v26 - vncipher $out5,$out5,v26 - vncipher $out6,$out6,v26 - vncipher $out7,$out7,v26 - - add $inp,$inp,r0 # $inp is adjusted in such - # way that at exit from the - # loop inX-in7 are loaded - # with last "words" - vncipher $out0,$out0,v27 - vncipher $out1,$out1,v27 - vncipher $out2,$out2,v27 - vncipher $out3,$out3,v27 - vncipher $out4,$out4,v27 - vncipher $out5,$out5,v27 - vncipher $out6,$out6,v27 - vncipher $out7,$out7,v27 - - addi $key_,$sp,$FRAME+15 # rewind $key_ - vncipher $out0,$out0,v28 - vncipher $out1,$out1,v28 - vncipher $out2,$out2,v28 - vncipher $out3,$out3,v28 - vncipher $out4,$out4,v28 - vncipher $out5,$out5,v28 - vncipher $out6,$out6,v28 - vncipher $out7,$out7,v28 - lvx v24,$x00,$key_ # re-pre-load round[1] - - vncipher $out0,$out0,v29 - vncipher $out1,$out1,v29 - vncipher $out2,$out2,v29 - vncipher $out3,$out3,v29 - vncipher $out4,$out4,v29 - vncipher $out5,$out5,v29 - vncipher $out6,$out6,v29 - vncipher $out7,$out7,v29 - lvx v25,$x10,$key_ # re-pre-load round[2] - - vncipher $out0,$out0,v30 - vxor $ivec,$ivec,v31 # xor with last round key - vncipher $out1,$out1,v30 - vxor $in0,$in0,v31 - vncipher $out2,$out2,v30 - vxor $in1,$in1,v31 - vncipher $out3,$out3,v30 - vxor $in2,$in2,v31 - vncipher $out4,$out4,v30 - vxor $in3,$in3,v31 - vncipher $out5,$out5,v30 - vxor $in4,$in4,v31 - vncipher $out6,$out6,v30 - vxor $in5,$in5,v31 - vncipher $out7,$out7,v30 - vxor $in6,$in6,v31 - - vncipherlast $out0,$out0,$ivec - vncipherlast $out1,$out1,$in0 - lvx_u $in0,$x00,$inp # load next input block - vncipherlast $out2,$out2,$in1 - lvx_u $in1,$x10,$inp - vncipherlast $out3,$out3,$in2 - le?vperm $in0,$in0,$in0,$inpperm - lvx_u $in2,$x20,$inp - vncipherlast $out4,$out4,$in3 - le?vperm $in1,$in1,$in1,$inpperm - lvx_u $in3,$x30,$inp - vncipherlast $out5,$out5,$in4 - le?vperm $in2,$in2,$in2,$inpperm - lvx_u $in4,$x40,$inp - vncipherlast $out6,$out6,$in5 - le?vperm $in3,$in3,$in3,$inpperm - lvx_u $in5,$x50,$inp - vncipherlast $out7,$out7,$in6 - le?vperm $in4,$in4,$in4,$inpperm - lvx_u $in6,$x60,$inp - vmr $ivec,$in7 - le?vperm $in5,$in5,$in5,$inpperm - lvx_u $in7,$x70,$inp - addi $inp,$inp,0x80 - - le?vperm $out0,$out0,$out0,$inpperm - le?vperm $out1,$out1,$out1,$inpperm - stvx_u $out0,$x00,$out - le?vperm $in6,$in6,$in6,$inpperm - vxor $out0,$in0,$rndkey0 - le?vperm $out2,$out2,$out2,$inpperm - stvx_u $out1,$x10,$out - le?vperm $in7,$in7,$in7,$inpperm - vxor $out1,$in1,$rndkey0 - le?vperm $out3,$out3,$out3,$inpperm - stvx_u $out2,$x20,$out - vxor $out2,$in2,$rndkey0 - le?vperm $out4,$out4,$out4,$inpperm - stvx_u $out3,$x30,$out - vxor $out3,$in3,$rndkey0 - le?vperm $out5,$out5,$out5,$inpperm - stvx_u $out4,$x40,$out - vxor $out4,$in4,$rndkey0 - le?vperm $out6,$out6,$out6,$inpperm - stvx_u $out5,$x50,$out - vxor $out5,$in5,$rndkey0 - le?vperm $out7,$out7,$out7,$inpperm - stvx_u $out6,$x60,$out - vxor $out6,$in6,$rndkey0 - stvx_u $out7,$x70,$out - addi $out,$out,0x80 - vxor $out7,$in7,$rndkey0 - - mtctr $rounds - beq Loop_cbc_dec8x # did $len-=128 borrow? - - addic. $len,$len,128 - beq Lcbc_dec8x_done - nop - nop - -Loop_cbc_dec8x_tail: # up to 7 "words" tail... 
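
The tail path picks up the same branchless pointer fixup used in the main 8x loop above: subic/subfe produce an all-ones mask when the length subtraction borrows, the mask is ANDed with the (now negative) remaining length, and the result is added to $inp so the trailing loads still read the last 128 bytes of the buffer without a branch. A minimal C sketch of that idiom follows; the function name is hypothetical and this is illustrative only:

	/*
	 * Hypothetical C rendering of the subic/subfe/and/add sequence:
	 * when fewer than 128 bytes remain, pull the input pointer back
	 * so the eight trailing loads stay within the buffer.
	 */
	#include <stddef.h>
	#include <stdint.h>

	static const uint8_t *fixup_tail_ptr(const uint8_t *inp, ptrdiff_t len)
	{
		ptrdiff_t rem  = len - 128;		/* subic  $len,$len,128 */
		ptrdiff_t mask = rem < 0 ? -1 : 0;	/* subfe  r0,r0,r0      */
		return inp + (mask & rem);		/* and r0,...; add $inp */
	}
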
- vncipher $out1,$out1,v24 - vncipher $out2,$out2,v24 - vncipher $out3,$out3,v24 - vncipher $out4,$out4,v24 - vncipher $out5,$out5,v24 - vncipher $out6,$out6,v24 - vncipher $out7,$out7,v24 - lvx v24,$x20,$key_ # round[3] - addi $key_,$key_,0x20 - - vncipher $out1,$out1,v25 - vncipher $out2,$out2,v25 - vncipher $out3,$out3,v25 - vncipher $out4,$out4,v25 - vncipher $out5,$out5,v25 - vncipher $out6,$out6,v25 - vncipher $out7,$out7,v25 - lvx v25,$x10,$key_ # round[4] - bdnz Loop_cbc_dec8x_tail - - vncipher $out1,$out1,v24 - vncipher $out2,$out2,v24 - vncipher $out3,$out3,v24 - vncipher $out4,$out4,v24 - vncipher $out5,$out5,v24 - vncipher $out6,$out6,v24 - vncipher $out7,$out7,v24 - - vncipher $out1,$out1,v25 - vncipher $out2,$out2,v25 - vncipher $out3,$out3,v25 - vncipher $out4,$out4,v25 - vncipher $out5,$out5,v25 - vncipher $out6,$out6,v25 - vncipher $out7,$out7,v25 - - vncipher $out1,$out1,v26 - vncipher $out2,$out2,v26 - vncipher $out3,$out3,v26 - vncipher $out4,$out4,v26 - vncipher $out5,$out5,v26 - vncipher $out6,$out6,v26 - vncipher $out7,$out7,v26 - - vncipher $out1,$out1,v27 - vncipher $out2,$out2,v27 - vncipher $out3,$out3,v27 - vncipher $out4,$out4,v27 - vncipher $out5,$out5,v27 - vncipher $out6,$out6,v27 - vncipher $out7,$out7,v27 - - vncipher $out1,$out1,v28 - vncipher $out2,$out2,v28 - vncipher $out3,$out3,v28 - vncipher $out4,$out4,v28 - vncipher $out5,$out5,v28 - vncipher $out6,$out6,v28 - vncipher $out7,$out7,v28 - - vncipher $out1,$out1,v29 - vncipher $out2,$out2,v29 - vncipher $out3,$out3,v29 - vncipher $out4,$out4,v29 - vncipher $out5,$out5,v29 - vncipher $out6,$out6,v29 - vncipher $out7,$out7,v29 - - vncipher $out1,$out1,v30 - vxor $ivec,$ivec,v31 # last round key - vncipher $out2,$out2,v30 - vxor $in1,$in1,v31 - vncipher $out3,$out3,v30 - vxor $in2,$in2,v31 - vncipher $out4,$out4,v30 - vxor $in3,$in3,v31 - vncipher $out5,$out5,v30 - vxor $in4,$in4,v31 - vncipher $out6,$out6,v30 - vxor $in5,$in5,v31 - vncipher $out7,$out7,v30 - vxor $in6,$in6,v31 - - cmplwi $len,32 # switch($len) - blt Lcbc_dec8x_one - nop - beq Lcbc_dec8x_two - cmplwi $len,64 - blt Lcbc_dec8x_three - nop - beq Lcbc_dec8x_four - cmplwi $len,96 - blt Lcbc_dec8x_five - nop - beq Lcbc_dec8x_six - -Lcbc_dec8x_seven: - vncipherlast $out1,$out1,$ivec - vncipherlast $out2,$out2,$in1 - vncipherlast $out3,$out3,$in2 - vncipherlast $out4,$out4,$in3 - vncipherlast $out5,$out5,$in4 - vncipherlast $out6,$out6,$in5 - vncipherlast $out7,$out7,$in6 - vmr $ivec,$in7 - - le?vperm $out1,$out1,$out1,$inpperm - le?vperm $out2,$out2,$out2,$inpperm - stvx_u $out1,$x00,$out - le?vperm $out3,$out3,$out3,$inpperm - stvx_u $out2,$x10,$out - le?vperm $out4,$out4,$out4,$inpperm - stvx_u $out3,$x20,$out - le?vperm $out5,$out5,$out5,$inpperm - stvx_u $out4,$x30,$out - le?vperm $out6,$out6,$out6,$inpperm - stvx_u $out5,$x40,$out - le?vperm $out7,$out7,$out7,$inpperm - stvx_u $out6,$x50,$out - stvx_u $out7,$x60,$out - addi $out,$out,0x70 - b Lcbc_dec8x_done - -.align 5 -Lcbc_dec8x_six: - vncipherlast $out2,$out2,$ivec - vncipherlast $out3,$out3,$in2 - vncipherlast $out4,$out4,$in3 - vncipherlast $out5,$out5,$in4 - vncipherlast $out6,$out6,$in5 - vncipherlast $out7,$out7,$in6 - vmr $ivec,$in7 - - le?vperm $out2,$out2,$out2,$inpperm - le?vperm $out3,$out3,$out3,$inpperm - stvx_u $out2,$x00,$out - le?vperm $out4,$out4,$out4,$inpperm - stvx_u $out3,$x10,$out - le?vperm $out5,$out5,$out5,$inpperm - stvx_u $out4,$x20,$out - le?vperm $out6,$out6,$out6,$inpperm - stvx_u $out5,$x30,$out - le?vperm $out7,$out7,$out7,$inpperm - stvx_u 
$out6,$x40,$out - stvx_u $out7,$x50,$out - addi $out,$out,0x60 - b Lcbc_dec8x_done - -.align 5 -Lcbc_dec8x_five: - vncipherlast $out3,$out3,$ivec - vncipherlast $out4,$out4,$in3 - vncipherlast $out5,$out5,$in4 - vncipherlast $out6,$out6,$in5 - vncipherlast $out7,$out7,$in6 - vmr $ivec,$in7 - - le?vperm $out3,$out3,$out3,$inpperm - le?vperm $out4,$out4,$out4,$inpperm - stvx_u $out3,$x00,$out - le?vperm $out5,$out5,$out5,$inpperm - stvx_u $out4,$x10,$out - le?vperm $out6,$out6,$out6,$inpperm - stvx_u $out5,$x20,$out - le?vperm $out7,$out7,$out7,$inpperm - stvx_u $out6,$x30,$out - stvx_u $out7,$x40,$out - addi $out,$out,0x50 - b Lcbc_dec8x_done - -.align 5 -Lcbc_dec8x_four: - vncipherlast $out4,$out4,$ivec - vncipherlast $out5,$out5,$in4 - vncipherlast $out6,$out6,$in5 - vncipherlast $out7,$out7,$in6 - vmr $ivec,$in7 - - le?vperm $out4,$out4,$out4,$inpperm - le?vperm $out5,$out5,$out5,$inpperm - stvx_u $out4,$x00,$out - le?vperm $out6,$out6,$out6,$inpperm - stvx_u $out5,$x10,$out - le?vperm $out7,$out7,$out7,$inpperm - stvx_u $out6,$x20,$out - stvx_u $out7,$x30,$out - addi $out,$out,0x40 - b Lcbc_dec8x_done - -.align 5 -Lcbc_dec8x_three: - vncipherlast $out5,$out5,$ivec - vncipherlast $out6,$out6,$in5 - vncipherlast $out7,$out7,$in6 - vmr $ivec,$in7 - - le?vperm $out5,$out5,$out5,$inpperm - le?vperm $out6,$out6,$out6,$inpperm - stvx_u $out5,$x00,$out - le?vperm $out7,$out7,$out7,$inpperm - stvx_u $out6,$x10,$out - stvx_u $out7,$x20,$out - addi $out,$out,0x30 - b Lcbc_dec8x_done - -.align 5 -Lcbc_dec8x_two: - vncipherlast $out6,$out6,$ivec - vncipherlast $out7,$out7,$in6 - vmr $ivec,$in7 - - le?vperm $out6,$out6,$out6,$inpperm - le?vperm $out7,$out7,$out7,$inpperm - stvx_u $out6,$x00,$out - stvx_u $out7,$x10,$out - addi $out,$out,0x20 - b Lcbc_dec8x_done - -.align 5 -Lcbc_dec8x_one: - vncipherlast $out7,$out7,$ivec - vmr $ivec,$in7 - - le?vperm $out7,$out7,$out7,$inpperm - stvx_u $out7,0,$out - addi $out,$out,0x10 - -Lcbc_dec8x_done: - le?vperm $ivec,$ivec,$ivec,$inpperm - stvx_u $ivec,0,$ivp # write [unaligned] iv - - li r10,`$FRAME+15` - li r11,`$FRAME+31` - stvx $inpperm,r10,$sp # wipe copies of round keys - addi r10,r10,32 - stvx $inpperm,r11,$sp - addi r11,r11,32 - stvx $inpperm,r10,$sp - addi r10,r10,32 - stvx $inpperm,r11,$sp - addi r11,r11,32 - stvx $inpperm,r10,$sp - addi r10,r10,32 - stvx $inpperm,r11,$sp - addi r11,r11,32 - stvx $inpperm,r10,$sp - addi r10,r10,32 - stvx $inpperm,r11,$sp - addi r11,r11,32 - - mtspr 256,$vrsave - lvx v20,r10,$sp # ABI says so - addi r10,r10,32 - lvx v21,r11,$sp - addi r11,r11,32 - lvx v22,r10,$sp - addi r10,r10,32 - lvx v23,r11,$sp - addi r11,r11,32 - lvx v24,r10,$sp - addi r10,r10,32 - lvx v25,r11,$sp - addi r11,r11,32 - lvx v26,r10,$sp - addi r10,r10,32 - lvx v27,r11,$sp - addi r11,r11,32 - lvx v28,r10,$sp - addi r10,r10,32 - lvx v29,r11,$sp - addi r11,r11,32 - lvx v30,r10,$sp - lvx v31,r11,$sp - $POP r26,`$FRAME+21*16+0*$SIZE_T`($sp) - $POP r27,`$FRAME+21*16+1*$SIZE_T`($sp) - $POP r28,`$FRAME+21*16+2*$SIZE_T`($sp) - $POP r29,`$FRAME+21*16+3*$SIZE_T`($sp) - $POP r30,`$FRAME+21*16+4*$SIZE_T`($sp) - $POP r31,`$FRAME+21*16+5*$SIZE_T`($sp) - addi $sp,$sp,`$FRAME+21*16+6*$SIZE_T` - blr - .long 0 - .byte 0,12,0x14,0,0x80,6,6,0 - .long 0 -.size .${prefix}_cbc_encrypt,.-.${prefix}_cbc_encrypt -___ -}} }}} - -######################################################################### -{{{ # CTR procedure[s] # - -####################### WARNING: Here be dragons! 
####################### -# -# This code is written as 'ctr32', based on a 32-bit counter used -# upstream. The kernel does *not* use a 32-bit counter. The kernel uses -# a 128-bit counter. -# -# This leads to subtle changes from the upstream code: the counter -# is incremented with vaddu_q_m rather than vaddu_w_m. This occurs in -# both the bulk (8 blocks at a time) path, and in the individual block -# path. Be aware of this when doing updates. -# -# See: -# 1d4aa0b4c181 ("crypto: vmx - Fixing AES-CTR counter bug") -# 009b30ac7444 ("crypto: vmx - CTR: always increment IV as quadword") -# https://github.com/openssl/openssl/pull/8942 -# -######################################################################### -my ($inp,$out,$len,$key,$ivp,$x10,$rounds,$idx)=map("r$_",(3..10)); -my ($rndkey0,$rndkey1,$inout,$tmp)= map("v$_",(0..3)); -my ($ivec,$inptail,$inpperm,$outhead,$outperm,$outmask,$keyperm,$one)= - map("v$_",(4..11)); -my $dat=$tmp; - -$code.=<<___; -.globl .${prefix}_ctr32_encrypt_blocks - ${UCMP}i $len,1 - bltlr- - - lis r0,0xfff0 - mfspr $vrsave,256 - mtspr 256,r0 - - li $idx,15 - vxor $rndkey0,$rndkey0,$rndkey0 - le?vspltisb $tmp,0x0f - - lvx $ivec,0,$ivp # load [unaligned] iv - lvsl $inpperm,0,$ivp - lvx $inptail,$idx,$ivp - vspltisb $one,1 - le?vxor $inpperm,$inpperm,$tmp - vperm $ivec,$ivec,$inptail,$inpperm - vsldoi $one,$rndkey0,$one,1 - - neg r11,$inp - ?lvsl $keyperm,0,$key # prepare for unaligned key - lwz $rounds,240($key) - - lvsr $inpperm,0,r11 # prepare for unaligned load - lvx $inptail,0,$inp - addi $inp,$inp,15 # 15 is not typo - le?vxor $inpperm,$inpperm,$tmp - - srwi $rounds,$rounds,1 - li $idx,16 - subi $rounds,$rounds,1 - - ${UCMP}i $len,8 - bge _aesp8_ctr32_encrypt8x - - ?lvsr $outperm,0,$out # prepare for unaligned store - vspltisb $outmask,-1 - lvx $outhead,0,$out - ?vperm $outmask,$rndkey0,$outmask,$outperm - le?vxor $outperm,$outperm,$tmp - - lvx $rndkey0,0,$key - mtctr $rounds - lvx $rndkey1,$idx,$key - addi $idx,$idx,16 - ?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm - vxor $inout,$ivec,$rndkey0 - lvx $rndkey0,$idx,$key - addi $idx,$idx,16 - b Loop_ctr32_enc - -.align 5 -Loop_ctr32_enc: - ?vperm $rndkey1,$rndkey1,$rndkey0,$keyperm - vcipher $inout,$inout,$rndkey1 - lvx $rndkey1,$idx,$key - addi $idx,$idx,16 - ?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm - vcipher $inout,$inout,$rndkey0 - lvx $rndkey0,$idx,$key - addi $idx,$idx,16 - bdnz Loop_ctr32_enc - - vadduqm $ivec,$ivec,$one # Kernel change for 128-bit - vmr $dat,$inptail - lvx $inptail,0,$inp - addi $inp,$inp,16 - subic. 
$len,$len,1 # blocks-- - - ?vperm $rndkey1,$rndkey1,$rndkey0,$keyperm - vcipher $inout,$inout,$rndkey1 - lvx $rndkey1,$idx,$key - vperm $dat,$dat,$inptail,$inpperm - li $idx,16 - ?vperm $rndkey1,$rndkey0,$rndkey1,$keyperm - lvx $rndkey0,0,$key - vxor $dat,$dat,$rndkey1 # last round key - vcipherlast $inout,$inout,$dat - - lvx $rndkey1,$idx,$key - addi $idx,$idx,16 - vperm $inout,$inout,$inout,$outperm - vsel $dat,$outhead,$inout,$outmask - mtctr $rounds - ?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm - vmr $outhead,$inout - vxor $inout,$ivec,$rndkey0 - lvx $rndkey0,$idx,$key - addi $idx,$idx,16 - stvx $dat,0,$out - addi $out,$out,16 - bne Loop_ctr32_enc - - addi $out,$out,-1 - lvx $inout,0,$out # redundant in aligned case - vsel $inout,$outhead,$inout,$outmask - stvx $inout,0,$out - - mtspr 256,$vrsave - blr - .long 0 - .byte 0,12,0x14,0,0,0,6,0 - .long 0 -___ -######################################################################### -{{ # Optimized CTR procedure # -my $key_="r11"; -my ($x00,$x10,$x20,$x30,$x40,$x50,$x60,$x70)=map("r$_",(0,8,26..31)); -my ($in0, $in1, $in2, $in3, $in4, $in5, $in6, $in7 )=map("v$_",(0..3,10,12..14)); -my ($out0,$out1,$out2,$out3,$out4,$out5,$out6,$out7)=map("v$_",(15..22)); -my $rndkey0="v23"; # v24-v25 rotating buffer for first found keys - # v26-v31 last 6 round keys -my ($tmp,$keyperm)=($in3,$in4); # aliases with "caller", redundant assignment -my ($two,$three,$four)=($outhead,$outperm,$outmask); - -$code.=<<___; -.align 5 -_aesp8_ctr32_encrypt8x: - $STU $sp,-`($FRAME+21*16+6*$SIZE_T)`($sp) - li r10,`$FRAME+8*16+15` - li r11,`$FRAME+8*16+31` - stvx v20,r10,$sp # ABI says so - addi r10,r10,32 - stvx v21,r11,$sp - addi r11,r11,32 - stvx v22,r10,$sp - addi r10,r10,32 - stvx v23,r11,$sp - addi r11,r11,32 - stvx v24,r10,$sp - addi r10,r10,32 - stvx v25,r11,$sp - addi r11,r11,32 - stvx v26,r10,$sp - addi r10,r10,32 - stvx v27,r11,$sp - addi r11,r11,32 - stvx v28,r10,$sp - addi r10,r10,32 - stvx v29,r11,$sp - addi r11,r11,32 - stvx v30,r10,$sp - stvx v31,r11,$sp - li r0,-1 - stw $vrsave,`$FRAME+21*16-4`($sp) # save vrsave - li $x10,0x10 - $PUSH r26,`$FRAME+21*16+0*$SIZE_T`($sp) - li $x20,0x20 - $PUSH r27,`$FRAME+21*16+1*$SIZE_T`($sp) - li $x30,0x30 - $PUSH r28,`$FRAME+21*16+2*$SIZE_T`($sp) - li $x40,0x40 - $PUSH r29,`$FRAME+21*16+3*$SIZE_T`($sp) - li $x50,0x50 - $PUSH r30,`$FRAME+21*16+4*$SIZE_T`($sp) - li $x60,0x60 - $PUSH r31,`$FRAME+21*16+5*$SIZE_T`($sp) - li $x70,0x70 - mtspr 256,r0 - - subi $rounds,$rounds,3 # -4 in total - - lvx $rndkey0,$x00,$key # load key schedule - lvx v30,$x10,$key - addi $key,$key,0x20 - lvx v31,$x00,$key - ?vperm $rndkey0,$rndkey0,v30,$keyperm - addi $key_,$sp,$FRAME+15 - mtctr $rounds - -Load_ctr32_enc_key: - ?vperm v24,v30,v31,$keyperm - lvx v30,$x10,$key - addi $key,$key,0x20 - stvx v24,$x00,$key_ # off-load round[1] - ?vperm v25,v31,v30,$keyperm - lvx v31,$x00,$key - stvx v25,$x10,$key_ # off-load round[2] - addi $key_,$key_,0x20 - bdnz Load_ctr32_enc_key - - lvx v26,$x10,$key - ?vperm v24,v30,v31,$keyperm - lvx v27,$x20,$key - stvx v24,$x00,$key_ # off-load round[3] - ?vperm v25,v31,v26,$keyperm - lvx v28,$x30,$key - stvx v25,$x10,$key_ # off-load round[4] - addi $key_,$sp,$FRAME+15 # rewind $key_ - ?vperm v26,v26,v27,$keyperm - lvx v29,$x40,$key - ?vperm v27,v27,v28,$keyperm - lvx v30,$x50,$key - ?vperm v28,v28,v29,$keyperm - lvx v31,$x60,$key - ?vperm v29,v29,v30,$keyperm - lvx $out0,$x70,$key # borrow $out0 - ?vperm v30,v30,v31,$keyperm - lvx v24,$x00,$key_ # pre-load round[1] - ?vperm v31,v31,$out0,$keyperm - lvx 
v25,$x10,$key_ # pre-load round[2] - - vadduqm $two,$one,$one - subi $inp,$inp,15 # undo "caller" - $SHL $len,$len,4 - - vadduqm $out1,$ivec,$one # counter values ... - vadduqm $out2,$ivec,$two # (do all ctr adds as 128-bit) - vxor $out0,$ivec,$rndkey0 # ... xored with rndkey[0] - le?li $idx,8 - vadduqm $out3,$out1,$two - vxor $out1,$out1,$rndkey0 - le?lvsl $inpperm,0,$idx - vadduqm $out4,$out2,$two - vxor $out2,$out2,$rndkey0 - le?vspltisb $tmp,0x0f - vadduqm $out5,$out3,$two - vxor $out3,$out3,$rndkey0 - le?vxor $inpperm,$inpperm,$tmp # transform for lvx_u/stvx_u - vadduqm $out6,$out4,$two - vxor $out4,$out4,$rndkey0 - vadduqm $out7,$out5,$two - vxor $out5,$out5,$rndkey0 - vadduqm $ivec,$out6,$two # next counter value - vxor $out6,$out6,$rndkey0 - vxor $out7,$out7,$rndkey0 - - mtctr $rounds - b Loop_ctr32_enc8x -.align 5 -Loop_ctr32_enc8x: - vcipher $out0,$out0,v24 - vcipher $out1,$out1,v24 - vcipher $out2,$out2,v24 - vcipher $out3,$out3,v24 - vcipher $out4,$out4,v24 - vcipher $out5,$out5,v24 - vcipher $out6,$out6,v24 - vcipher $out7,$out7,v24 -Loop_ctr32_enc8x_middle: - lvx v24,$x20,$key_ # round[3] - addi $key_,$key_,0x20 - - vcipher $out0,$out0,v25 - vcipher $out1,$out1,v25 - vcipher $out2,$out2,v25 - vcipher $out3,$out3,v25 - vcipher $out4,$out4,v25 - vcipher $out5,$out5,v25 - vcipher $out6,$out6,v25 - vcipher $out7,$out7,v25 - lvx v25,$x10,$key_ # round[4] - bdnz Loop_ctr32_enc8x - - subic r11,$len,256 # $len-256, borrow $key_ - vcipher $out0,$out0,v24 - vcipher $out1,$out1,v24 - vcipher $out2,$out2,v24 - vcipher $out3,$out3,v24 - vcipher $out4,$out4,v24 - vcipher $out5,$out5,v24 - vcipher $out6,$out6,v24 - vcipher $out7,$out7,v24 - - subfe r0,r0,r0 # borrow?-1:0 - vcipher $out0,$out0,v25 - vcipher $out1,$out1,v25 - vcipher $out2,$out2,v25 - vcipher $out3,$out3,v25 - vcipher $out4,$out4,v25 - vcipher $out5,$out5,v25 - vcipher $out6,$out6,v25 - vcipher $out7,$out7,v25 - - and r0,r0,r11 - addi $key_,$sp,$FRAME+15 # rewind $key_ - vcipher $out0,$out0,v26 - vcipher $out1,$out1,v26 - vcipher $out2,$out2,v26 - vcipher $out3,$out3,v26 - vcipher $out4,$out4,v26 - vcipher $out5,$out5,v26 - vcipher $out6,$out6,v26 - vcipher $out7,$out7,v26 - lvx v24,$x00,$key_ # re-pre-load round[1] - - subic $len,$len,129 # $len-=129 - vcipher $out0,$out0,v27 - addi $len,$len,1 # $len-=128 really - vcipher $out1,$out1,v27 - vcipher $out2,$out2,v27 - vcipher $out3,$out3,v27 - vcipher $out4,$out4,v27 - vcipher $out5,$out5,v27 - vcipher $out6,$out6,v27 - vcipher $out7,$out7,v27 - lvx v25,$x10,$key_ # re-pre-load round[2] - - vcipher $out0,$out0,v28 - lvx_u $in0,$x00,$inp # load input - vcipher $out1,$out1,v28 - lvx_u $in1,$x10,$inp - vcipher $out2,$out2,v28 - lvx_u $in2,$x20,$inp - vcipher $out3,$out3,v28 - lvx_u $in3,$x30,$inp - vcipher $out4,$out4,v28 - lvx_u $in4,$x40,$inp - vcipher $out5,$out5,v28 - lvx_u $in5,$x50,$inp - vcipher $out6,$out6,v28 - lvx_u $in6,$x60,$inp - vcipher $out7,$out7,v28 - lvx_u $in7,$x70,$inp - addi $inp,$inp,0x80 - - vcipher $out0,$out0,v29 - le?vperm $in0,$in0,$in0,$inpperm - vcipher $out1,$out1,v29 - le?vperm $in1,$in1,$in1,$inpperm - vcipher $out2,$out2,v29 - le?vperm $in2,$in2,$in2,$inpperm - vcipher $out3,$out3,v29 - le?vperm $in3,$in3,$in3,$inpperm - vcipher $out4,$out4,v29 - le?vperm $in4,$in4,$in4,$inpperm - vcipher $out5,$out5,v29 - le?vperm $in5,$in5,$in5,$inpperm - vcipher $out6,$out6,v29 - le?vperm $in6,$in6,$in6,$inpperm - vcipher $out7,$out7,v29 - le?vperm $in7,$in7,$in7,$inpperm - - add $inp,$inp,r0 # $inp is adjusted in such - # way that at exit from the - # loop 
inX-in7 are loaded - # with last "words" - subfe. r0,r0,r0 # borrow?-1:0 - vcipher $out0,$out0,v30 - vxor $in0,$in0,v31 # xor with last round key - vcipher $out1,$out1,v30 - vxor $in1,$in1,v31 - vcipher $out2,$out2,v30 - vxor $in2,$in2,v31 - vcipher $out3,$out3,v30 - vxor $in3,$in3,v31 - vcipher $out4,$out4,v30 - vxor $in4,$in4,v31 - vcipher $out5,$out5,v30 - vxor $in5,$in5,v31 - vcipher $out6,$out6,v30 - vxor $in6,$in6,v31 - vcipher $out7,$out7,v30 - vxor $in7,$in7,v31 - - bne Lctr32_enc8x_break # did $len-129 borrow? - - vcipherlast $in0,$out0,$in0 - vcipherlast $in1,$out1,$in1 - vadduqm $out1,$ivec,$one # counter values ... - vcipherlast $in2,$out2,$in2 - vadduqm $out2,$ivec,$two - vxor $out0,$ivec,$rndkey0 # ... xored with rndkey[0] - vcipherlast $in3,$out3,$in3 - vadduqm $out3,$out1,$two - vxor $out1,$out1,$rndkey0 - vcipherlast $in4,$out4,$in4 - vadduqm $out4,$out2,$two - vxor $out2,$out2,$rndkey0 - vcipherlast $in5,$out5,$in5 - vadduqm $out5,$out3,$two - vxor $out3,$out3,$rndkey0 - vcipherlast $in6,$out6,$in6 - vadduqm $out6,$out4,$two - vxor $out4,$out4,$rndkey0 - vcipherlast $in7,$out7,$in7 - vadduqm $out7,$out5,$two - vxor $out5,$out5,$rndkey0 - le?vperm $in0,$in0,$in0,$inpperm - vadduqm $ivec,$out6,$two # next counter value - vxor $out6,$out6,$rndkey0 - le?vperm $in1,$in1,$in1,$inpperm - vxor $out7,$out7,$rndkey0 - mtctr $rounds - - vcipher $out0,$out0,v24 - stvx_u $in0,$x00,$out - le?vperm $in2,$in2,$in2,$inpperm - vcipher $out1,$out1,v24 - stvx_u $in1,$x10,$out - le?vperm $in3,$in3,$in3,$inpperm - vcipher $out2,$out2,v24 - stvx_u $in2,$x20,$out - le?vperm $in4,$in4,$in4,$inpperm - vcipher $out3,$out3,v24 - stvx_u $in3,$x30,$out - le?vperm $in5,$in5,$in5,$inpperm - vcipher $out4,$out4,v24 - stvx_u $in4,$x40,$out - le?vperm $in6,$in6,$in6,$inpperm - vcipher $out5,$out5,v24 - stvx_u $in5,$x50,$out - le?vperm $in7,$in7,$in7,$inpperm - vcipher $out6,$out6,v24 - stvx_u $in6,$x60,$out - vcipher $out7,$out7,v24 - stvx_u $in7,$x70,$out - addi $out,$out,0x80 - - b Loop_ctr32_enc8x_middle - -.align 5 -Lctr32_enc8x_break: - cmpwi $len,-0x60 - blt Lctr32_enc8x_one - nop - beq Lctr32_enc8x_two - cmpwi $len,-0x40 - blt Lctr32_enc8x_three - nop - beq Lctr32_enc8x_four - cmpwi $len,-0x20 - blt Lctr32_enc8x_five - nop - beq Lctr32_enc8x_six - cmpwi $len,0x00 - blt Lctr32_enc8x_seven - -Lctr32_enc8x_eight: - vcipherlast $out0,$out0,$in0 - vcipherlast $out1,$out1,$in1 - vcipherlast $out2,$out2,$in2 - vcipherlast $out3,$out3,$in3 - vcipherlast $out4,$out4,$in4 - vcipherlast $out5,$out5,$in5 - vcipherlast $out6,$out6,$in6 - vcipherlast $out7,$out7,$in7 - - le?vperm $out0,$out0,$out0,$inpperm - le?vperm $out1,$out1,$out1,$inpperm - stvx_u $out0,$x00,$out - le?vperm $out2,$out2,$out2,$inpperm - stvx_u $out1,$x10,$out - le?vperm $out3,$out3,$out3,$inpperm - stvx_u $out2,$x20,$out - le?vperm $out4,$out4,$out4,$inpperm - stvx_u $out3,$x30,$out - le?vperm $out5,$out5,$out5,$inpperm - stvx_u $out4,$x40,$out - le?vperm $out6,$out6,$out6,$inpperm - stvx_u $out5,$x50,$out - le?vperm $out7,$out7,$out7,$inpperm - stvx_u $out6,$x60,$out - stvx_u $out7,$x70,$out - addi $out,$out,0x80 - b Lctr32_enc8x_done - -.align 5 -Lctr32_enc8x_seven: - vcipherlast $out0,$out0,$in1 - vcipherlast $out1,$out1,$in2 - vcipherlast $out2,$out2,$in3 - vcipherlast $out3,$out3,$in4 - vcipherlast $out4,$out4,$in5 - vcipherlast $out5,$out5,$in6 - vcipherlast $out6,$out6,$in7 - - le?vperm $out0,$out0,$out0,$inpperm - le?vperm $out1,$out1,$out1,$inpperm - stvx_u $out0,$x00,$out - le?vperm $out2,$out2,$out2,$inpperm - stvx_u 
$out1,$x10,$out - le?vperm $out3,$out3,$out3,$inpperm - stvx_u $out2,$x20,$out - le?vperm $out4,$out4,$out4,$inpperm - stvx_u $out3,$x30,$out - le?vperm $out5,$out5,$out5,$inpperm - stvx_u $out4,$x40,$out - le?vperm $out6,$out6,$out6,$inpperm - stvx_u $out5,$x50,$out - stvx_u $out6,$x60,$out - addi $out,$out,0x70 - b Lctr32_enc8x_done - -.align 5 -Lctr32_enc8x_six: - vcipherlast $out0,$out0,$in2 - vcipherlast $out1,$out1,$in3 - vcipherlast $out2,$out2,$in4 - vcipherlast $out3,$out3,$in5 - vcipherlast $out4,$out4,$in6 - vcipherlast $out5,$out5,$in7 - - le?vperm $out0,$out0,$out0,$inpperm - le?vperm $out1,$out1,$out1,$inpperm - stvx_u $out0,$x00,$out - le?vperm $out2,$out2,$out2,$inpperm - stvx_u $out1,$x10,$out - le?vperm $out3,$out3,$out3,$inpperm - stvx_u $out2,$x20,$out - le?vperm $out4,$out4,$out4,$inpperm - stvx_u $out3,$x30,$out - le?vperm $out5,$out5,$out5,$inpperm - stvx_u $out4,$x40,$out - stvx_u $out5,$x50,$out - addi $out,$out,0x60 - b Lctr32_enc8x_done - -.align 5 -Lctr32_enc8x_five: - vcipherlast $out0,$out0,$in3 - vcipherlast $out1,$out1,$in4 - vcipherlast $out2,$out2,$in5 - vcipherlast $out3,$out3,$in6 - vcipherlast $out4,$out4,$in7 - - le?vperm $out0,$out0,$out0,$inpperm - le?vperm $out1,$out1,$out1,$inpperm - stvx_u $out0,$x00,$out - le?vperm $out2,$out2,$out2,$inpperm - stvx_u $out1,$x10,$out - le?vperm $out3,$out3,$out3,$inpperm - stvx_u $out2,$x20,$out - le?vperm $out4,$out4,$out4,$inpperm - stvx_u $out3,$x30,$out - stvx_u $out4,$x40,$out - addi $out,$out,0x50 - b Lctr32_enc8x_done - -.align 5 -Lctr32_enc8x_four: - vcipherlast $out0,$out0,$in4 - vcipherlast $out1,$out1,$in5 - vcipherlast $out2,$out2,$in6 - vcipherlast $out3,$out3,$in7 - - le?vperm $out0,$out0,$out0,$inpperm - le?vperm $out1,$out1,$out1,$inpperm - stvx_u $out0,$x00,$out - le?vperm $out2,$out2,$out2,$inpperm - stvx_u $out1,$x10,$out - le?vperm $out3,$out3,$out3,$inpperm - stvx_u $out2,$x20,$out - stvx_u $out3,$x30,$out - addi $out,$out,0x40 - b Lctr32_enc8x_done - -.align 5 -Lctr32_enc8x_three: - vcipherlast $out0,$out0,$in5 - vcipherlast $out1,$out1,$in6 - vcipherlast $out2,$out2,$in7 - - le?vperm $out0,$out0,$out0,$inpperm - le?vperm $out1,$out1,$out1,$inpperm - stvx_u $out0,$x00,$out - le?vperm $out2,$out2,$out2,$inpperm - stvx_u $out1,$x10,$out - stvx_u $out2,$x20,$out - addi $out,$out,0x30 - b Lctr32_enc8x_done - -.align 5 -Lctr32_enc8x_two: - vcipherlast $out0,$out0,$in6 - vcipherlast $out1,$out1,$in7 - - le?vperm $out0,$out0,$out0,$inpperm - le?vperm $out1,$out1,$out1,$inpperm - stvx_u $out0,$x00,$out - stvx_u $out1,$x10,$out - addi $out,$out,0x20 - b Lctr32_enc8x_done - -.align 5 -Lctr32_enc8x_one: - vcipherlast $out0,$out0,$in7 - - le?vperm $out0,$out0,$out0,$inpperm - stvx_u $out0,0,$out - addi $out,$out,0x10 - -Lctr32_enc8x_done: - li r10,`$FRAME+15` - li r11,`$FRAME+31` - stvx $inpperm,r10,$sp # wipe copies of round keys - addi r10,r10,32 - stvx $inpperm,r11,$sp - addi r11,r11,32 - stvx $inpperm,r10,$sp - addi r10,r10,32 - stvx $inpperm,r11,$sp - addi r11,r11,32 - stvx $inpperm,r10,$sp - addi r10,r10,32 - stvx $inpperm,r11,$sp - addi r11,r11,32 - stvx $inpperm,r10,$sp - addi r10,r10,32 - stvx $inpperm,r11,$sp - addi r11,r11,32 - - mtspr 256,$vrsave - lvx v20,r10,$sp # ABI says so - addi r10,r10,32 - lvx v21,r11,$sp - addi r11,r11,32 - lvx v22,r10,$sp - addi r10,r10,32 - lvx v23,r11,$sp - addi r11,r11,32 - lvx v24,r10,$sp - addi r10,r10,32 - lvx v25,r11,$sp - addi r11,r11,32 - lvx v26,r10,$sp - addi r10,r10,32 - lvx v27,r11,$sp - addi r11,r11,32 - lvx v28,r10,$sp - addi r10,r10,32 - lvx 
v29,r11,$sp - addi r11,r11,32 - lvx v30,r10,$sp - lvx v31,r11,$sp - $POP r26,`$FRAME+21*16+0*$SIZE_T`($sp) - $POP r27,`$FRAME+21*16+1*$SIZE_T`($sp) - $POP r28,`$FRAME+21*16+2*$SIZE_T`($sp) - $POP r29,`$FRAME+21*16+3*$SIZE_T`($sp) - $POP r30,`$FRAME+21*16+4*$SIZE_T`($sp) - $POP r31,`$FRAME+21*16+5*$SIZE_T`($sp) - addi $sp,$sp,`$FRAME+21*16+6*$SIZE_T` - blr - .long 0 - .byte 0,12,0x14,0,0x80,6,6,0 - .long 0 -.size .${prefix}_ctr32_encrypt_blocks,.-.${prefix}_ctr32_encrypt_blocks -___ -}} }}} - -######################################################################### -{{{ # XTS procedures # -# int aes_p8_xts_[en|de]crypt(const char *inp, char *out, size_t len, # -# const AES_KEY *key1, const AES_KEY *key2, # -# [const] unsigned char iv[16]); # -# If $key2 is NULL, then a "tweak chaining" mode is engaged, in which # -# input tweak value is assumed to be encrypted already, and last tweak # -# value, one suitable for consecutive call on same chunk of data, is # -# written back to original buffer. In addition, in "tweak chaining" # -# mode only complete input blocks are processed. # - -my ($inp,$out,$len,$key1,$key2,$ivp,$rounds,$idx) = map("r$_",(3..10)); -my ($rndkey0,$rndkey1,$inout) = map("v$_",(0..2)); -my ($output,$inptail,$inpperm,$leperm,$keyperm) = map("v$_",(3..7)); -my ($tweak,$seven,$eighty7,$tmp,$tweak1) = map("v$_",(8..12)); -my $taillen = $key2; - - ($inp,$idx) = ($idx,$inp); # reassign - -$code.=<<___; -.globl .${prefix}_xts_encrypt - mr $inp,r3 # reassign - li r3,-1 - ${UCMP}i $len,16 - bltlr- - - lis r0,0xfff0 - mfspr r12,256 # save vrsave - li r11,0 - mtspr 256,r0 - - vspltisb $seven,0x07 # 0x070707..07 - le?lvsl $leperm,r11,r11 - le?vspltisb $tmp,0x0f - le?vxor $leperm,$leperm,$seven - - li $idx,15 - lvx $tweak,0,$ivp # load [unaligned] iv - lvsl $inpperm,0,$ivp - lvx $inptail,$idx,$ivp - le?vxor $inpperm,$inpperm,$tmp - vperm $tweak,$tweak,$inptail,$inpperm - - neg r11,$inp - lvsr $inpperm,0,r11 # prepare for unaligned load - lvx $inout,0,$inp - addi $inp,$inp,15 # 15 is not typo - le?vxor $inpperm,$inpperm,$tmp - - ${UCMP}i $key2,0 # key2==NULL? 
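
The "next tweak value" computed throughout these XTS procedures with vsrab/vaddubm/vsldoi/vand/vxor is the standard GF(2^128) doubling from IEEE P1619: shift the 128-bit tweak left by one bit (little-endian byte order) and, on overflow, fold the carry back in with the reduction constant 0x87 built in $eighty7. A minimal C sketch of that step, illustrative only and not the kernel's interface:

	/*
	 * Multiply the XTS tweak by x in GF(2^128) with the
	 * x^128 + x^7 + x^2 + x + 1 reduction (constant 0x87).
	 */
	#include <stdint.h>

	static void xts_gf128_double(uint8_t t[16])
	{
		unsigned int carry = 0;

		for (int i = 0; i < 16; i++) {
			unsigned int b = t[i];

			t[i] = (uint8_t)((b << 1) | carry);
			carry = b >> 7;
		}
		if (carry)
			t[0] ^= 0x87;	/* reduction constant from $eighty7 */
	}
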
- beq Lxts_enc_no_key2 - - ?lvsl $keyperm,0,$key2 # prepare for unaligned key - lwz $rounds,240($key2) - srwi $rounds,$rounds,1 - subi $rounds,$rounds,1 - li $idx,16 - - lvx $rndkey0,0,$key2 - lvx $rndkey1,$idx,$key2 - addi $idx,$idx,16 - ?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm - vxor $tweak,$tweak,$rndkey0 - lvx $rndkey0,$idx,$key2 - addi $idx,$idx,16 - mtctr $rounds - -Ltweak_xts_enc: - ?vperm $rndkey1,$rndkey1,$rndkey0,$keyperm - vcipher $tweak,$tweak,$rndkey1 - lvx $rndkey1,$idx,$key2 - addi $idx,$idx,16 - ?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm - vcipher $tweak,$tweak,$rndkey0 - lvx $rndkey0,$idx,$key2 - addi $idx,$idx,16 - bdnz Ltweak_xts_enc - - ?vperm $rndkey1,$rndkey1,$rndkey0,$keyperm - vcipher $tweak,$tweak,$rndkey1 - lvx $rndkey1,$idx,$key2 - ?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm - vcipherlast $tweak,$tweak,$rndkey0 - - li $ivp,0 # don't chain the tweak - b Lxts_enc - -Lxts_enc_no_key2: - li $idx,-16 - and $len,$len,$idx # in "tweak chaining" - # mode only complete - # blocks are processed -Lxts_enc: - lvx $inptail,0,$inp - addi $inp,$inp,16 - - ?lvsl $keyperm,0,$key1 # prepare for unaligned key - lwz $rounds,240($key1) - srwi $rounds,$rounds,1 - subi $rounds,$rounds,1 - li $idx,16 - - vslb $eighty7,$seven,$seven # 0x808080..80 - vor $eighty7,$eighty7,$seven # 0x878787..87 - vspltisb $tmp,1 # 0x010101..01 - vsldoi $eighty7,$eighty7,$tmp,15 # 0x870101..01 - - ${UCMP}i $len,96 - bge _aesp8_xts_encrypt6x - - andi. $taillen,$len,15 - subic r0,$len,32 - subi $taillen,$taillen,16 - subfe r0,r0,r0 - and r0,r0,$taillen - add $inp,$inp,r0 - - lvx $rndkey0,0,$key1 - lvx $rndkey1,$idx,$key1 - addi $idx,$idx,16 - vperm $inout,$inout,$inptail,$inpperm - ?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm - vxor $inout,$inout,$tweak - vxor $inout,$inout,$rndkey0 - lvx $rndkey0,$idx,$key1 - addi $idx,$idx,16 - mtctr $rounds - b Loop_xts_enc - -.align 5 -Loop_xts_enc: - ?vperm $rndkey1,$rndkey1,$rndkey0,$keyperm - vcipher $inout,$inout,$rndkey1 - lvx $rndkey1,$idx,$key1 - addi $idx,$idx,16 - ?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm - vcipher $inout,$inout,$rndkey0 - lvx $rndkey0,$idx,$key1 - addi $idx,$idx,16 - bdnz Loop_xts_enc - - ?vperm $rndkey1,$rndkey1,$rndkey0,$keyperm - vcipher $inout,$inout,$rndkey1 - lvx $rndkey1,$idx,$key1 - li $idx,16 - ?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm - vxor $rndkey0,$rndkey0,$tweak - vcipherlast $output,$inout,$rndkey0 - - le?vperm $tmp,$output,$output,$leperm - be?nop - le?stvx_u $tmp,0,$out - be?stvx_u $output,0,$out - addi $out,$out,16 - - subic. 
$len,$len,16 - beq Lxts_enc_done - - vmr $inout,$inptail - lvx $inptail,0,$inp - addi $inp,$inp,16 - lvx $rndkey0,0,$key1 - lvx $rndkey1,$idx,$key1 - addi $idx,$idx,16 - - subic r0,$len,32 - subfe r0,r0,r0 - and r0,r0,$taillen - add $inp,$inp,r0 - - vsrab $tmp,$tweak,$seven # next tweak value - vaddubm $tweak,$tweak,$tweak - vsldoi $tmp,$tmp,$tmp,15 - vand $tmp,$tmp,$eighty7 - vxor $tweak,$tweak,$tmp - - vperm $inout,$inout,$inptail,$inpperm - ?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm - vxor $inout,$inout,$tweak - vxor $output,$output,$rndkey0 # just in case $len<16 - vxor $inout,$inout,$rndkey0 - lvx $rndkey0,$idx,$key1 - addi $idx,$idx,16 - - mtctr $rounds - ${UCMP}i $len,16 - bge Loop_xts_enc - - vxor $output,$output,$tweak - lvsr $inpperm,0,$len # $inpperm is no longer needed - vxor $inptail,$inptail,$inptail # $inptail is no longer needed - vspltisb $tmp,-1 - vperm $inptail,$inptail,$tmp,$inpperm - vsel $inout,$inout,$output,$inptail - - subi r11,$out,17 - subi $out,$out,16 - mtctr $len - li $len,16 -Loop_xts_enc_steal: - lbzu r0,1(r11) - stb r0,16(r11) - bdnz Loop_xts_enc_steal - - mtctr $rounds - b Loop_xts_enc # one more time... - -Lxts_enc_done: - ${UCMP}i $ivp,0 - beq Lxts_enc_ret - - vsrab $tmp,$tweak,$seven # next tweak value - vaddubm $tweak,$tweak,$tweak - vsldoi $tmp,$tmp,$tmp,15 - vand $tmp,$tmp,$eighty7 - vxor $tweak,$tweak,$tmp - - le?vperm $tweak,$tweak,$tweak,$leperm - stvx_u $tweak,0,$ivp - -Lxts_enc_ret: - mtspr 256,r12 # restore vrsave - li r3,0 - blr - .long 0 - .byte 0,12,0x04,0,0x80,6,6,0 - .long 0 -.size .${prefix}_xts_encrypt,.-.${prefix}_xts_encrypt - -.globl .${prefix}_xts_decrypt - mr $inp,r3 # reassign - li r3,-1 - ${UCMP}i $len,16 - bltlr- - - lis r0,0xfff8 - mfspr r12,256 # save vrsave - li r11,0 - mtspr 256,r0 - - andi. r0,$len,15 - neg r0,r0 - andi. r0,r0,16 - sub $len,$len,r0 - - vspltisb $seven,0x07 # 0x070707..07 - le?lvsl $leperm,r11,r11 - le?vspltisb $tmp,0x0f - le?vxor $leperm,$leperm,$seven - - li $idx,15 - lvx $tweak,0,$ivp # load [unaligned] iv - lvsl $inpperm,0,$ivp - lvx $inptail,$idx,$ivp - le?vxor $inpperm,$inpperm,$tmp - vperm $tweak,$tweak,$inptail,$inpperm - - neg r11,$inp - lvsr $inpperm,0,r11 # prepare for unaligned load - lvx $inout,0,$inp - addi $inp,$inp,15 # 15 is not typo - le?vxor $inpperm,$inpperm,$tmp - - ${UCMP}i $key2,0 # key2==NULL? - beq Lxts_dec_no_key2 - - ?lvsl $keyperm,0,$key2 # prepare for unaligned key - lwz $rounds,240($key2) - srwi $rounds,$rounds,1 - subi $rounds,$rounds,1 - li $idx,16 - - lvx $rndkey0,0,$key2 - lvx $rndkey1,$idx,$key2 - addi $idx,$idx,16 - ?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm - vxor $tweak,$tweak,$rndkey0 - lvx $rndkey0,$idx,$key2 - addi $idx,$idx,16 - mtctr $rounds - -Ltweak_xts_dec: - ?vperm $rndkey1,$rndkey1,$rndkey0,$keyperm - vcipher $tweak,$tweak,$rndkey1 - lvx $rndkey1,$idx,$key2 - addi $idx,$idx,16 - ?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm - vcipher $tweak,$tweak,$rndkey0 - lvx $rndkey0,$idx,$key2 - addi $idx,$idx,16 - bdnz Ltweak_xts_dec - - ?vperm $rndkey1,$rndkey1,$rndkey0,$keyperm - vcipher $tweak,$tweak,$rndkey1 - lvx $rndkey1,$idx,$key2 - ?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm - vcipherlast $tweak,$tweak,$rndkey0 - - li $ivp,0 # don't chain the tweak - b Lxts_dec - -Lxts_dec_no_key2: - neg $idx,$len - andi. 
$idx,$idx,15 - add $len,$len,$idx # in "tweak chaining" - # mode only complete - # blocks are processed -Lxts_dec: - lvx $inptail,0,$inp - addi $inp,$inp,16 - - ?lvsl $keyperm,0,$key1 # prepare for unaligned key - lwz $rounds,240($key1) - srwi $rounds,$rounds,1 - subi $rounds,$rounds,1 - li $idx,16 - - vslb $eighty7,$seven,$seven # 0x808080..80 - vor $eighty7,$eighty7,$seven # 0x878787..87 - vspltisb $tmp,1 # 0x010101..01 - vsldoi $eighty7,$eighty7,$tmp,15 # 0x870101..01 - - ${UCMP}i $len,96 - bge _aesp8_xts_decrypt6x - - lvx $rndkey0,0,$key1 - lvx $rndkey1,$idx,$key1 - addi $idx,$idx,16 - vperm $inout,$inout,$inptail,$inpperm - ?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm - vxor $inout,$inout,$tweak - vxor $inout,$inout,$rndkey0 - lvx $rndkey0,$idx,$key1 - addi $idx,$idx,16 - mtctr $rounds - - ${UCMP}i $len,16 - blt Ltail_xts_dec - be?b Loop_xts_dec - -.align 5 -Loop_xts_dec: - ?vperm $rndkey1,$rndkey1,$rndkey0,$keyperm - vncipher $inout,$inout,$rndkey1 - lvx $rndkey1,$idx,$key1 - addi $idx,$idx,16 - ?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm - vncipher $inout,$inout,$rndkey0 - lvx $rndkey0,$idx,$key1 - addi $idx,$idx,16 - bdnz Loop_xts_dec - - ?vperm $rndkey1,$rndkey1,$rndkey0,$keyperm - vncipher $inout,$inout,$rndkey1 - lvx $rndkey1,$idx,$key1 - li $idx,16 - ?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm - vxor $rndkey0,$rndkey0,$tweak - vncipherlast $output,$inout,$rndkey0 - - le?vperm $tmp,$output,$output,$leperm - be?nop - le?stvx_u $tmp,0,$out - be?stvx_u $output,0,$out - addi $out,$out,16 - - subic. $len,$len,16 - beq Lxts_dec_done - - vmr $inout,$inptail - lvx $inptail,0,$inp - addi $inp,$inp,16 - lvx $rndkey0,0,$key1 - lvx $rndkey1,$idx,$key1 - addi $idx,$idx,16 - - vsrab $tmp,$tweak,$seven # next tweak value - vaddubm $tweak,$tweak,$tweak - vsldoi $tmp,$tmp,$tmp,15 - vand $tmp,$tmp,$eighty7 - vxor $tweak,$tweak,$tmp - - vperm $inout,$inout,$inptail,$inpperm - ?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm - vxor $inout,$inout,$tweak - vxor $inout,$inout,$rndkey0 - lvx $rndkey0,$idx,$key1 - addi $idx,$idx,16 - - mtctr $rounds - ${UCMP}i $len,16 - bge Loop_xts_dec - -Ltail_xts_dec: - vsrab $tmp,$tweak,$seven # next tweak value - vaddubm $tweak1,$tweak,$tweak - vsldoi $tmp,$tmp,$tmp,15 - vand $tmp,$tmp,$eighty7 - vxor $tweak1,$tweak1,$tmp - - subi $inp,$inp,16 - add $inp,$inp,$len - - vxor $inout,$inout,$tweak # :-( - vxor $inout,$inout,$tweak1 # :-) - -Loop_xts_dec_short: - ?vperm $rndkey1,$rndkey1,$rndkey0,$keyperm - vncipher $inout,$inout,$rndkey1 - lvx $rndkey1,$idx,$key1 - addi $idx,$idx,16 - ?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm - vncipher $inout,$inout,$rndkey0 - lvx $rndkey0,$idx,$key1 - addi $idx,$idx,16 - bdnz Loop_xts_dec_short - - ?vperm $rndkey1,$rndkey1,$rndkey0,$keyperm - vncipher $inout,$inout,$rndkey1 - lvx $rndkey1,$idx,$key1 - li $idx,16 - ?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm - vxor $rndkey0,$rndkey0,$tweak1 - vncipherlast $output,$inout,$rndkey0 - - le?vperm $tmp,$output,$output,$leperm - be?nop - le?stvx_u $tmp,0,$out - be?stvx_u $output,0,$out - - vmr $inout,$inptail - lvx $inptail,0,$inp - #addi $inp,$inp,16 - lvx $rndkey0,0,$key1 - lvx $rndkey1,$idx,$key1 - addi $idx,$idx,16 - vperm $inout,$inout,$inptail,$inpperm - ?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm - - lvsr $inpperm,0,$len # $inpperm is no longer needed - vxor $inptail,$inptail,$inptail # $inptail is no longer needed - vspltisb $tmp,-1 - vperm $inptail,$inptail,$tmp,$inpperm - vsel $inout,$inout,$output,$inptail - - vxor $rndkey0,$rndkey0,$tweak - vxor $inout,$inout,$rndkey0 - lvx 
$rndkey0,$idx,$key1 - addi $idx,$idx,16 - - subi r11,$out,1 - mtctr $len - li $len,16 -Loop_xts_dec_steal: - lbzu r0,1(r11) - stb r0,16(r11) - bdnz Loop_xts_dec_steal - - mtctr $rounds - b Loop_xts_dec # one more time... - -Lxts_dec_done: - ${UCMP}i $ivp,0 - beq Lxts_dec_ret - - vsrab $tmp,$tweak,$seven # next tweak value - vaddubm $tweak,$tweak,$tweak - vsldoi $tmp,$tmp,$tmp,15 - vand $tmp,$tmp,$eighty7 - vxor $tweak,$tweak,$tmp - - le?vperm $tweak,$tweak,$tweak,$leperm - stvx_u $tweak,0,$ivp - -Lxts_dec_ret: - mtspr 256,r12 # restore vrsave - li r3,0 - blr - .long 0 - .byte 0,12,0x04,0,0x80,6,6,0 - .long 0 -.size .${prefix}_xts_decrypt,.-.${prefix}_xts_decrypt -___ -######################################################################### -{{ # Optimized XTS procedures # -my $key_=$key2; -my ($x00,$x10,$x20,$x30,$x40,$x50,$x60,$x70)=map("r$_",(0,3,26..31)); - $x00=0 if ($flavour =~ /osx/); -my ($in0, $in1, $in2, $in3, $in4, $in5 )=map("v$_",(0..5)); -my ($out0, $out1, $out2, $out3, $out4, $out5)=map("v$_",(7,12..16)); -my ($twk0, $twk1, $twk2, $twk3, $twk4, $twk5)=map("v$_",(17..22)); -my $rndkey0="v23"; # v24-v25 rotating buffer for first found keys - # v26-v31 last 6 round keys -my ($keyperm)=($out0); # aliases with "caller", redundant assignment -my $taillen=$x70; - -$code.=<<___; -.align 5 -_aesp8_xts_encrypt6x: - $STU $sp,-`($FRAME+21*16+6*$SIZE_T)`($sp) - mflr r11 - li r7,`$FRAME+8*16+15` - li r3,`$FRAME+8*16+31` - $PUSH r11,`$FRAME+21*16+6*$SIZE_T+$LRSAVE`($sp) - stvx v20,r7,$sp # ABI says so - addi r7,r7,32 - stvx v21,r3,$sp - addi r3,r3,32 - stvx v22,r7,$sp - addi r7,r7,32 - stvx v23,r3,$sp - addi r3,r3,32 - stvx v24,r7,$sp - addi r7,r7,32 - stvx v25,r3,$sp - addi r3,r3,32 - stvx v26,r7,$sp - addi r7,r7,32 - stvx v27,r3,$sp - addi r3,r3,32 - stvx v28,r7,$sp - addi r7,r7,32 - stvx v29,r3,$sp - addi r3,r3,32 - stvx v30,r7,$sp - stvx v31,r3,$sp - li r0,-1 - stw $vrsave,`$FRAME+21*16-4`($sp) # save vrsave - li $x10,0x10 - $PUSH r26,`$FRAME+21*16+0*$SIZE_T`($sp) - li $x20,0x20 - $PUSH r27,`$FRAME+21*16+1*$SIZE_T`($sp) - li $x30,0x30 - $PUSH r28,`$FRAME+21*16+2*$SIZE_T`($sp) - li $x40,0x40 - $PUSH r29,`$FRAME+21*16+3*$SIZE_T`($sp) - li $x50,0x50 - $PUSH r30,`$FRAME+21*16+4*$SIZE_T`($sp) - li $x60,0x60 - $PUSH r31,`$FRAME+21*16+5*$SIZE_T`($sp) - li $x70,0x70 - mtspr 256,r0 - - xxlor 2, 32+$eighty7, 32+$eighty7 - vsldoi $eighty7,$tmp,$eighty7,1 # 0x010101..87 - xxlor 1, 32+$eighty7, 32+$eighty7 - - # Load XOR Lconsts. 
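
The 6x setup that follows derives six tweaks up front ($twk0..$twk5, each one GF(2^128) doubling ahead of the last) and keeps six blocks in flight through the AES rounds, masking each block with its tweak both before and after encryption. As a rough C sketch of that shape, assuming a hypothetical aes_encrypt() block primitive and the xts_gf128_double() helper sketched earlier:

	#include <stdint.h>
	#include <string.h>

	/* Stand-ins for the vcipher rounds and the tweak-doubling sequence. */
	void aes_encrypt(const void *key, uint8_t dst[16], const uint8_t src[16]);
	void xts_gf128_double(uint8_t t[16]);

	static void xts_enc6(const void *key, uint8_t *out, const uint8_t *in,
			     uint8_t tweak[16])
	{
		uint8_t twk[6][16], buf[16];

		for (int i = 0; i < 6; i++) {
			memcpy(twk[i], tweak, 16);	/* $twk0..$twk5 */
			xts_gf128_double(tweak);	/* next tweak value */
		}
		for (int i = 0; i < 6; i++) {
			for (int j = 0; j < 16; j++)
				buf[j] = in[16 * i + j] ^ twk[i][j];
			aes_encrypt(key, buf, buf);
			for (int j = 0; j < 16; j++)
				out[16 * i + j] = buf[j] ^ twk[i][j];
		}
	}
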
- mr $x70, r6 - bl Lconsts - lxvw4x 0, $x40, r6 # load XOR contents - mr r6, $x70 - li $x70,0x70 - - subi $rounds,$rounds,3 # -4 in total - - lvx $rndkey0,$x00,$key1 # load key schedule - lvx v30,$x10,$key1 - addi $key1,$key1,0x20 - lvx v31,$x00,$key1 - ?vperm $rndkey0,$rndkey0,v30,$keyperm - addi $key_,$sp,$FRAME+15 - mtctr $rounds - -Load_xts_enc_key: - ?vperm v24,v30,v31,$keyperm - lvx v30,$x10,$key1 - addi $key1,$key1,0x20 - stvx v24,$x00,$key_ # off-load round[1] - ?vperm v25,v31,v30,$keyperm - lvx v31,$x00,$key1 - stvx v25,$x10,$key_ # off-load round[2] - addi $key_,$key_,0x20 - bdnz Load_xts_enc_key - - lvx v26,$x10,$key1 - ?vperm v24,v30,v31,$keyperm - lvx v27,$x20,$key1 - stvx v24,$x00,$key_ # off-load round[3] - ?vperm v25,v31,v26,$keyperm - lvx v28,$x30,$key1 - stvx v25,$x10,$key_ # off-load round[4] - addi $key_,$sp,$FRAME+15 # rewind $key_ - ?vperm v26,v26,v27,$keyperm - lvx v29,$x40,$key1 - ?vperm v27,v27,v28,$keyperm - lvx v30,$x50,$key1 - ?vperm v28,v28,v29,$keyperm - lvx v31,$x60,$key1 - ?vperm v29,v29,v30,$keyperm - lvx $twk5,$x70,$key1 # borrow $twk5 - ?vperm v30,v30,v31,$keyperm - lvx v24,$x00,$key_ # pre-load round[1] - ?vperm v31,v31,$twk5,$keyperm - lvx v25,$x10,$key_ # pre-load round[2] - - # Switch to use the following codes with 0x010101..87 to generate tweak. - # eighty7 = 0x010101..87 - # vsrab tmp, tweak, seven # next tweak value, right shift 7 bits - # vand tmp, tmp, eighty7 # last byte with carry - # vaddubm tweak, tweak, tweak # left shift 1 bit (x2) - # xxlor vsx, 0, 0 - # vpermxor tweak, tweak, tmp, vsx - - vperm $in0,$inout,$inptail,$inpperm - subi $inp,$inp,31 # undo "caller" - vxor $twk0,$tweak,$rndkey0 - vsrab $tmp,$tweak,$seven # next tweak value - vaddubm $tweak,$tweak,$tweak - vand $tmp,$tmp,$eighty7 - vxor $out0,$in0,$twk0 - xxlor 32+$in1, 0, 0 - vpermxor $tweak, $tweak, $tmp, $in1 - - lvx_u $in1,$x10,$inp - vxor $twk1,$tweak,$rndkey0 - vsrab $tmp,$tweak,$seven # next tweak value - vaddubm $tweak,$tweak,$tweak - le?vperm $in1,$in1,$in1,$leperm - vand $tmp,$tmp,$eighty7 - vxor $out1,$in1,$twk1 - xxlor 32+$in2, 0, 0 - vpermxor $tweak, $tweak, $tmp, $in2 - - lvx_u $in2,$x20,$inp - andi. 
$taillen,$len,15 - vxor $twk2,$tweak,$rndkey0 - vsrab $tmp,$tweak,$seven # next tweak value - vaddubm $tweak,$tweak,$tweak - le?vperm $in2,$in2,$in2,$leperm - vand $tmp,$tmp,$eighty7 - vxor $out2,$in2,$twk2 - xxlor 32+$in3, 0, 0 - vpermxor $tweak, $tweak, $tmp, $in3 - - lvx_u $in3,$x30,$inp - sub $len,$len,$taillen - vxor $twk3,$tweak,$rndkey0 - vsrab $tmp,$tweak,$seven # next tweak value - vaddubm $tweak,$tweak,$tweak - le?vperm $in3,$in3,$in3,$leperm - vand $tmp,$tmp,$eighty7 - vxor $out3,$in3,$twk3 - xxlor 32+$in4, 0, 0 - vpermxor $tweak, $tweak, $tmp, $in4 - - lvx_u $in4,$x40,$inp - subi $len,$len,0x60 - vxor $twk4,$tweak,$rndkey0 - vsrab $tmp,$tweak,$seven # next tweak value - vaddubm $tweak,$tweak,$tweak - le?vperm $in4,$in4,$in4,$leperm - vand $tmp,$tmp,$eighty7 - vxor $out4,$in4,$twk4 - xxlor 32+$in5, 0, 0 - vpermxor $tweak, $tweak, $tmp, $in5 - - lvx_u $in5,$x50,$inp - addi $inp,$inp,0x60 - vxor $twk5,$tweak,$rndkey0 - vsrab $tmp,$tweak,$seven # next tweak value - vaddubm $tweak,$tweak,$tweak - le?vperm $in5,$in5,$in5,$leperm - vand $tmp,$tmp,$eighty7 - vxor $out5,$in5,$twk5 - xxlor 32+$in0, 0, 0 - vpermxor $tweak, $tweak, $tmp, $in0 - - vxor v31,v31,$rndkey0 - mtctr $rounds - b Loop_xts_enc6x - -.align 5 -Loop_xts_enc6x: - vcipher $out0,$out0,v24 - vcipher $out1,$out1,v24 - vcipher $out2,$out2,v24 - vcipher $out3,$out3,v24 - vcipher $out4,$out4,v24 - vcipher $out5,$out5,v24 - lvx v24,$x20,$key_ # round[3] - addi $key_,$key_,0x20 - - vcipher $out0,$out0,v25 - vcipher $out1,$out1,v25 - vcipher $out2,$out2,v25 - vcipher $out3,$out3,v25 - vcipher $out4,$out4,v25 - vcipher $out5,$out5,v25 - lvx v25,$x10,$key_ # round[4] - bdnz Loop_xts_enc6x - - xxlor 32+$eighty7, 1, 1 # 0x010101..87 - - subic $len,$len,96 # $len-=96 - vxor $in0,$twk0,v31 # xor with last round key - vcipher $out0,$out0,v24 - vcipher $out1,$out1,v24 - vsrab $tmp,$tweak,$seven # next tweak value - vxor $twk0,$tweak,$rndkey0 - vaddubm $tweak,$tweak,$tweak - vcipher $out2,$out2,v24 - vcipher $out3,$out3,v24 - vcipher $out4,$out4,v24 - vcipher $out5,$out5,v24 - - subfe. 
r0,r0,r0 # borrow?-1:0 - vand $tmp,$tmp,$eighty7 - vcipher $out0,$out0,v25 - vcipher $out1,$out1,v25 - xxlor 32+$in1, 0, 0 - vpermxor $tweak, $tweak, $tmp, $in1 - vcipher $out2,$out2,v25 - vcipher $out3,$out3,v25 - vxor $in1,$twk1,v31 - vsrab $tmp,$tweak,$seven # next tweak value - vxor $twk1,$tweak,$rndkey0 - vcipher $out4,$out4,v25 - vcipher $out5,$out5,v25 - - and r0,r0,$len - vaddubm $tweak,$tweak,$tweak - vcipher $out0,$out0,v26 - vcipher $out1,$out1,v26 - vand $tmp,$tmp,$eighty7 - vcipher $out2,$out2,v26 - vcipher $out3,$out3,v26 - xxlor 32+$in2, 0, 0 - vpermxor $tweak, $tweak, $tmp, $in2 - vcipher $out4,$out4,v26 - vcipher $out5,$out5,v26 - - add $inp,$inp,r0 # $inp is adjusted in such - # way that at exit from the - # loop inX-in5 are loaded - # with last "words" - vxor $in2,$twk2,v31 - vsrab $tmp,$tweak,$seven # next tweak value - vxor $twk2,$tweak,$rndkey0 - vaddubm $tweak,$tweak,$tweak - vcipher $out0,$out0,v27 - vcipher $out1,$out1,v27 - vcipher $out2,$out2,v27 - vcipher $out3,$out3,v27 - vand $tmp,$tmp,$eighty7 - vcipher $out4,$out4,v27 - vcipher $out5,$out5,v27 - - addi $key_,$sp,$FRAME+15 # rewind $key_ - xxlor 32+$in3, 0, 0 - vpermxor $tweak, $tweak, $tmp, $in3 - vcipher $out0,$out0,v28 - vcipher $out1,$out1,v28 - vxor $in3,$twk3,v31 - vsrab $tmp,$tweak,$seven # next tweak value - vxor $twk3,$tweak,$rndkey0 - vcipher $out2,$out2,v28 - vcipher $out3,$out3,v28 - vaddubm $tweak,$tweak,$tweak - vcipher $out4,$out4,v28 - vcipher $out5,$out5,v28 - lvx v24,$x00,$key_ # re-pre-load round[1] - vand $tmp,$tmp,$eighty7 - - vcipher $out0,$out0,v29 - vcipher $out1,$out1,v29 - xxlor 32+$in4, 0, 0 - vpermxor $tweak, $tweak, $tmp, $in4 - vcipher $out2,$out2,v29 - vcipher $out3,$out3,v29 - vxor $in4,$twk4,v31 - vsrab $tmp,$tweak,$seven # next tweak value - vxor $twk4,$tweak,$rndkey0 - vcipher $out4,$out4,v29 - vcipher $out5,$out5,v29 - lvx v25,$x10,$key_ # re-pre-load round[2] - vaddubm $tweak,$tweak,$tweak - - vcipher $out0,$out0,v30 - vcipher $out1,$out1,v30 - vand $tmp,$tmp,$eighty7 - vcipher $out2,$out2,v30 - vcipher $out3,$out3,v30 - xxlor 32+$in5, 0, 0 - vpermxor $tweak, $tweak, $tmp, $in5 - vcipher $out4,$out4,v30 - vcipher $out5,$out5,v30 - vxor $in5,$twk5,v31 - vsrab $tmp,$tweak,$seven # next tweak value - vxor $twk5,$tweak,$rndkey0 - - vcipherlast $out0,$out0,$in0 - lvx_u $in0,$x00,$inp # load next input block - vaddubm $tweak,$tweak,$tweak - vcipherlast $out1,$out1,$in1 - lvx_u $in1,$x10,$inp - vcipherlast $out2,$out2,$in2 - le?vperm $in0,$in0,$in0,$leperm - lvx_u $in2,$x20,$inp - vand $tmp,$tmp,$eighty7 - vcipherlast $out3,$out3,$in3 - le?vperm $in1,$in1,$in1,$leperm - lvx_u $in3,$x30,$inp - vcipherlast $out4,$out4,$in4 - le?vperm $in2,$in2,$in2,$leperm - lvx_u $in4,$x40,$inp - xxlor 10, 32+$in0, 32+$in0 - xxlor 32+$in0, 0, 0 - vpermxor $tweak, $tweak, $tmp, $in0 - xxlor 32+$in0, 10, 10 - vcipherlast $tmp,$out5,$in5 # last block might be needed - # in stealing mode - le?vperm $in3,$in3,$in3,$leperm - lvx_u $in5,$x50,$inp - addi $inp,$inp,0x60 - le?vperm $in4,$in4,$in4,$leperm - le?vperm $in5,$in5,$in5,$leperm - - le?vperm $out0,$out0,$out0,$leperm - le?vperm $out1,$out1,$out1,$leperm - stvx_u $out0,$x00,$out # store output - vxor $out0,$in0,$twk0 - le?vperm $out2,$out2,$out2,$leperm - stvx_u $out1,$x10,$out - vxor $out1,$in1,$twk1 - le?vperm $out3,$out3,$out3,$leperm - stvx_u $out2,$x20,$out - vxor $out2,$in2,$twk2 - le?vperm $out4,$out4,$out4,$leperm - stvx_u $out3,$x30,$out - vxor $out3,$in3,$twk3 - le?vperm $out5,$tmp,$tmp,$leperm - stvx_u $out4,$x40,$out - vxor 
$out4,$in4,$twk4 - le?stvx_u $out5,$x50,$out - be?stvx_u $tmp, $x50,$out - vxor $out5,$in5,$twk5 - addi $out,$out,0x60 - - mtctr $rounds - beq Loop_xts_enc6x # did $len-=96 borrow? - - xxlor 32+$eighty7, 2, 2 # 0x010101..87 - - addic. $len,$len,0x60 - beq Lxts_enc6x_zero - cmpwi $len,0x20 - blt Lxts_enc6x_one - nop - beq Lxts_enc6x_two - cmpwi $len,0x40 - blt Lxts_enc6x_three - nop - beq Lxts_enc6x_four - -Lxts_enc6x_five: - vxor $out0,$in1,$twk0 - vxor $out1,$in2,$twk1 - vxor $out2,$in3,$twk2 - vxor $out3,$in4,$twk3 - vxor $out4,$in5,$twk4 - - bl _aesp8_xts_enc5x - - le?vperm $out0,$out0,$out0,$leperm - vmr $twk0,$twk5 # unused tweak - le?vperm $out1,$out1,$out1,$leperm - stvx_u $out0,$x00,$out # store output - le?vperm $out2,$out2,$out2,$leperm - stvx_u $out1,$x10,$out - le?vperm $out3,$out3,$out3,$leperm - stvx_u $out2,$x20,$out - vxor $tmp,$out4,$twk5 # last block prep for stealing - le?vperm $out4,$out4,$out4,$leperm - stvx_u $out3,$x30,$out - stvx_u $out4,$x40,$out - addi $out,$out,0x50 - bne Lxts_enc6x_steal - b Lxts_enc6x_done - -.align 4 -Lxts_enc6x_four: - vxor $out0,$in2,$twk0 - vxor $out1,$in3,$twk1 - vxor $out2,$in4,$twk2 - vxor $out3,$in5,$twk3 - vxor $out4,$out4,$out4 - - bl _aesp8_xts_enc5x - - le?vperm $out0,$out0,$out0,$leperm - vmr $twk0,$twk4 # unused tweak - le?vperm $out1,$out1,$out1,$leperm - stvx_u $out0,$x00,$out # store output - le?vperm $out2,$out2,$out2,$leperm - stvx_u $out1,$x10,$out - vxor $tmp,$out3,$twk4 # last block prep for stealing - le?vperm $out3,$out3,$out3,$leperm - stvx_u $out2,$x20,$out - stvx_u $out3,$x30,$out - addi $out,$out,0x40 - bne Lxts_enc6x_steal - b Lxts_enc6x_done - -.align 4 -Lxts_enc6x_three: - vxor $out0,$in3,$twk0 - vxor $out1,$in4,$twk1 - vxor $out2,$in5,$twk2 - vxor $out3,$out3,$out3 - vxor $out4,$out4,$out4 - - bl _aesp8_xts_enc5x - - le?vperm $out0,$out0,$out0,$leperm - vmr $twk0,$twk3 # unused tweak - le?vperm $out1,$out1,$out1,$leperm - stvx_u $out0,$x00,$out # store output - vxor $tmp,$out2,$twk3 # last block prep for stealing - le?vperm $out2,$out2,$out2,$leperm - stvx_u $out1,$x10,$out - stvx_u $out2,$x20,$out - addi $out,$out,0x30 - bne Lxts_enc6x_steal - b Lxts_enc6x_done - -.align 4 -Lxts_enc6x_two: - vxor $out0,$in4,$twk0 - vxor $out1,$in5,$twk1 - vxor $out2,$out2,$out2 - vxor $out3,$out3,$out3 - vxor $out4,$out4,$out4 - - bl _aesp8_xts_enc5x - - le?vperm $out0,$out0,$out0,$leperm - vmr $twk0,$twk2 # unused tweak - vxor $tmp,$out1,$twk2 # last block prep for stealing - le?vperm $out1,$out1,$out1,$leperm - stvx_u $out0,$x00,$out # store output - stvx_u $out1,$x10,$out - addi $out,$out,0x20 - bne Lxts_enc6x_steal - b Lxts_enc6x_done - -.align 4 -Lxts_enc6x_one: - vxor $out0,$in5,$twk0 - nop -Loop_xts_enc1x: - vcipher $out0,$out0,v24 - lvx v24,$x20,$key_ # round[3] - addi $key_,$key_,0x20 - - vcipher $out0,$out0,v25 - lvx v25,$x10,$key_ # round[4] - bdnz Loop_xts_enc1x - - add $inp,$inp,$taillen - cmpwi $taillen,0 - vcipher $out0,$out0,v24 - - subi $inp,$inp,16 - vcipher $out0,$out0,v25 - - lvsr $inpperm,0,$taillen - vcipher $out0,$out0,v26 - - lvx_u $in0,0,$inp - vcipher $out0,$out0,v27 - - addi $key_,$sp,$FRAME+15 # rewind $key_ - vcipher $out0,$out0,v28 - lvx v24,$x00,$key_ # re-pre-load round[1] - - vcipher $out0,$out0,v29 - lvx v25,$x10,$key_ # re-pre-load round[2] - vxor $twk0,$twk0,v31 - - le?vperm $in0,$in0,$in0,$leperm - vcipher $out0,$out0,v30 - - vperm $in0,$in0,$in0,$inpperm - vcipherlast $out0,$out0,$twk0 - - vmr $twk0,$twk1 # unused tweak - vxor $tmp,$out0,$twk1 # last block prep for stealing - le?vperm 
$out0,$out0,$out0,$leperm - stvx_u $out0,$x00,$out # store output - addi $out,$out,0x10 - bne Lxts_enc6x_steal - b Lxts_enc6x_done - -.align 4 -Lxts_enc6x_zero: - cmpwi $taillen,0 - beq Lxts_enc6x_done - - add $inp,$inp,$taillen - subi $inp,$inp,16 - lvx_u $in0,0,$inp - lvsr $inpperm,0,$taillen # $in5 is no more - le?vperm $in0,$in0,$in0,$leperm - vperm $in0,$in0,$in0,$inpperm - vxor $tmp,$tmp,$twk0 -Lxts_enc6x_steal: - vxor $in0,$in0,$twk0 - vxor $out0,$out0,$out0 - vspltisb $out1,-1 - vperm $out0,$out0,$out1,$inpperm - vsel $out0,$in0,$tmp,$out0 # $tmp is last block, remember? - - subi r30,$out,17 - subi $out,$out,16 - mtctr $taillen -Loop_xts_enc6x_steal: - lbzu r0,1(r30) - stb r0,16(r30) - bdnz Loop_xts_enc6x_steal - - li $taillen,0 - mtctr $rounds - b Loop_xts_enc1x # one more time... - -.align 4 -Lxts_enc6x_done: - ${UCMP}i $ivp,0 - beq Lxts_enc6x_ret - - vxor $tweak,$twk0,$rndkey0 - le?vperm $tweak,$tweak,$tweak,$leperm - stvx_u $tweak,0,$ivp - -Lxts_enc6x_ret: - mtlr r11 - li r10,`$FRAME+15` - li r11,`$FRAME+31` - stvx $seven,r10,$sp # wipe copies of round keys - addi r10,r10,32 - stvx $seven,r11,$sp - addi r11,r11,32 - stvx $seven,r10,$sp - addi r10,r10,32 - stvx $seven,r11,$sp - addi r11,r11,32 - stvx $seven,r10,$sp - addi r10,r10,32 - stvx $seven,r11,$sp - addi r11,r11,32 - stvx $seven,r10,$sp - addi r10,r10,32 - stvx $seven,r11,$sp - addi r11,r11,32 - - mtspr 256,$vrsave - lvx v20,r10,$sp # ABI says so - addi r10,r10,32 - lvx v21,r11,$sp - addi r11,r11,32 - lvx v22,r10,$sp - addi r10,r10,32 - lvx v23,r11,$sp - addi r11,r11,32 - lvx v24,r10,$sp - addi r10,r10,32 - lvx v25,r11,$sp - addi r11,r11,32 - lvx v26,r10,$sp - addi r10,r10,32 - lvx v27,r11,$sp - addi r11,r11,32 - lvx v28,r10,$sp - addi r10,r10,32 - lvx v29,r11,$sp - addi r11,r11,32 - lvx v30,r10,$sp - lvx v31,r11,$sp - $POP r26,`$FRAME+21*16+0*$SIZE_T`($sp) - $POP r27,`$FRAME+21*16+1*$SIZE_T`($sp) - $POP r28,`$FRAME+21*16+2*$SIZE_T`($sp) - $POP r29,`$FRAME+21*16+3*$SIZE_T`($sp) - $POP r30,`$FRAME+21*16+4*$SIZE_T`($sp) - $POP r31,`$FRAME+21*16+5*$SIZE_T`($sp) - addi $sp,$sp,`$FRAME+21*16+6*$SIZE_T` - blr - .long 0 - .byte 0,12,0x04,1,0x80,6,6,0 - .long 0 - -.align 5 -_aesp8_xts_enc5x: - vcipher $out0,$out0,v24 - vcipher $out1,$out1,v24 - vcipher $out2,$out2,v24 - vcipher $out3,$out3,v24 - vcipher $out4,$out4,v24 - lvx v24,$x20,$key_ # round[3] - addi $key_,$key_,0x20 - - vcipher $out0,$out0,v25 - vcipher $out1,$out1,v25 - vcipher $out2,$out2,v25 - vcipher $out3,$out3,v25 - vcipher $out4,$out4,v25 - lvx v25,$x10,$key_ # round[4] - bdnz _aesp8_xts_enc5x - - add $inp,$inp,$taillen - cmpwi $taillen,0 - vcipher $out0,$out0,v24 - vcipher $out1,$out1,v24 - vcipher $out2,$out2,v24 - vcipher $out3,$out3,v24 - vcipher $out4,$out4,v24 - - subi $inp,$inp,16 - vcipher $out0,$out0,v25 - vcipher $out1,$out1,v25 - vcipher $out2,$out2,v25 - vcipher $out3,$out3,v25 - vcipher $out4,$out4,v25 - vxor $twk0,$twk0,v31 - - vcipher $out0,$out0,v26 - lvsr $inpperm,r0,$taillen # $in5 is no more - vcipher $out1,$out1,v26 - vcipher $out2,$out2,v26 - vcipher $out3,$out3,v26 - vcipher $out4,$out4,v26 - vxor $in1,$twk1,v31 - - vcipher $out0,$out0,v27 - lvx_u $in0,0,$inp - vcipher $out1,$out1,v27 - vcipher $out2,$out2,v27 - vcipher $out3,$out3,v27 - vcipher $out4,$out4,v27 - vxor $in2,$twk2,v31 - - addi $key_,$sp,$FRAME+15 # rewind $key_ - vcipher $out0,$out0,v28 - vcipher $out1,$out1,v28 - vcipher $out2,$out2,v28 - vcipher $out3,$out3,v28 - vcipher $out4,$out4,v28 - lvx v24,$x00,$key_ # re-pre-load round[1] - vxor $in3,$twk3,v31 - - vcipher 
$out0,$out0,v29 - le?vperm $in0,$in0,$in0,$leperm - vcipher $out1,$out1,v29 - vcipher $out2,$out2,v29 - vcipher $out3,$out3,v29 - vcipher $out4,$out4,v29 - lvx v25,$x10,$key_ # re-pre-load round[2] - vxor $in4,$twk4,v31 - - vcipher $out0,$out0,v30 - vperm $in0,$in0,$in0,$inpperm - vcipher $out1,$out1,v30 - vcipher $out2,$out2,v30 - vcipher $out3,$out3,v30 - vcipher $out4,$out4,v30 - - vcipherlast $out0,$out0,$twk0 - vcipherlast $out1,$out1,$in1 - vcipherlast $out2,$out2,$in2 - vcipherlast $out3,$out3,$in3 - vcipherlast $out4,$out4,$in4 - blr - .long 0 - .byte 0,12,0x14,0,0,0,0,0 - -.align 5 -_aesp8_xts_decrypt6x: - $STU $sp,-`($FRAME+21*16+6*$SIZE_T)`($sp) - mflr r11 - li r7,`$FRAME+8*16+15` - li r3,`$FRAME+8*16+31` - $PUSH r11,`$FRAME+21*16+6*$SIZE_T+$LRSAVE`($sp) - stvx v20,r7,$sp # ABI says so - addi r7,r7,32 - stvx v21,r3,$sp - addi r3,r3,32 - stvx v22,r7,$sp - addi r7,r7,32 - stvx v23,r3,$sp - addi r3,r3,32 - stvx v24,r7,$sp - addi r7,r7,32 - stvx v25,r3,$sp - addi r3,r3,32 - stvx v26,r7,$sp - addi r7,r7,32 - stvx v27,r3,$sp - addi r3,r3,32 - stvx v28,r7,$sp - addi r7,r7,32 - stvx v29,r3,$sp - addi r3,r3,32 - stvx v30,r7,$sp - stvx v31,r3,$sp - li r0,-1 - stw $vrsave,`$FRAME+21*16-4`($sp) # save vrsave - li $x10,0x10 - $PUSH r26,`$FRAME+21*16+0*$SIZE_T`($sp) - li $x20,0x20 - $PUSH r27,`$FRAME+21*16+1*$SIZE_T`($sp) - li $x30,0x30 - $PUSH r28,`$FRAME+21*16+2*$SIZE_T`($sp) - li $x40,0x40 - $PUSH r29,`$FRAME+21*16+3*$SIZE_T`($sp) - li $x50,0x50 - $PUSH r30,`$FRAME+21*16+4*$SIZE_T`($sp) - li $x60,0x60 - $PUSH r31,`$FRAME+21*16+5*$SIZE_T`($sp) - li $x70,0x70 - mtspr 256,r0 - - xxlor 2, 32+$eighty7, 32+$eighty7 - vsldoi $eighty7,$tmp,$eighty7,1 # 0x010101..87 - xxlor 1, 32+$eighty7, 32+$eighty7 - - # Load XOR Lconsts. - mr $x70, r6 - bl Lconsts - lxvw4x 0, $x40, r6 # load XOR contents - mr r6, $x70 - li $x70,0x70 - - subi $rounds,$rounds,3 # -4 in total - - lvx $rndkey0,$x00,$key1 # load key schedule - lvx v30,$x10,$key1 - addi $key1,$key1,0x20 - lvx v31,$x00,$key1 - ?vperm $rndkey0,$rndkey0,v30,$keyperm - addi $key_,$sp,$FRAME+15 - mtctr $rounds - -Load_xts_dec_key: - ?vperm v24,v30,v31,$keyperm - lvx v30,$x10,$key1 - addi $key1,$key1,0x20 - stvx v24,$x00,$key_ # off-load round[1] - ?vperm v25,v31,v30,$keyperm - lvx v31,$x00,$key1 - stvx v25,$x10,$key_ # off-load round[2] - addi $key_,$key_,0x20 - bdnz Load_xts_dec_key - - lvx v26,$x10,$key1 - ?vperm v24,v30,v31,$keyperm - lvx v27,$x20,$key1 - stvx v24,$x00,$key_ # off-load round[3] - ?vperm v25,v31,v26,$keyperm - lvx v28,$x30,$key1 - stvx v25,$x10,$key_ # off-load round[4] - addi $key_,$sp,$FRAME+15 # rewind $key_ - ?vperm v26,v26,v27,$keyperm - lvx v29,$x40,$key1 - ?vperm v27,v27,v28,$keyperm - lvx v30,$x50,$key1 - ?vperm v28,v28,v29,$keyperm - lvx v31,$x60,$key1 - ?vperm v29,v29,v30,$keyperm - lvx $twk5,$x70,$key1 # borrow $twk5 - ?vperm v30,v30,v31,$keyperm - lvx v24,$x00,$key_ # pre-load round[1] - ?vperm v31,v31,$twk5,$keyperm - lvx v25,$x10,$key_ # pre-load round[2] - - vperm $in0,$inout,$inptail,$inpperm - subi $inp,$inp,31 # undo "caller" - vxor $twk0,$tweak,$rndkey0 - vsrab $tmp,$tweak,$seven # next tweak value - vaddubm $tweak,$tweak,$tweak - vand $tmp,$tmp,$eighty7 - vxor $out0,$in0,$twk0 - xxlor 32+$in1, 0, 0 - vpermxor $tweak, $tweak, $tmp, $in1 - - lvx_u $in1,$x10,$inp - vxor $twk1,$tweak,$rndkey0 - vsrab $tmp,$tweak,$seven # next tweak value - vaddubm $tweak,$tweak,$tweak - le?vperm $in1,$in1,$in1,$leperm - vand $tmp,$tmp,$eighty7 - vxor $out1,$in1,$twk1 - xxlor 32+$in2, 0, 0 - vpermxor $tweak, $tweak, $tmp, $in2 - - lvx_u 
$in2,$x20,$inp - andi. $taillen,$len,15 - vxor $twk2,$tweak,$rndkey0 - vsrab $tmp,$tweak,$seven # next tweak value - vaddubm $tweak,$tweak,$tweak - le?vperm $in2,$in2,$in2,$leperm - vand $tmp,$tmp,$eighty7 - vxor $out2,$in2,$twk2 - xxlor 32+$in3, 0, 0 - vpermxor $tweak, $tweak, $tmp, $in3 - - lvx_u $in3,$x30,$inp - sub $len,$len,$taillen - vxor $twk3,$tweak,$rndkey0 - vsrab $tmp,$tweak,$seven # next tweak value - vaddubm $tweak,$tweak,$tweak - le?vperm $in3,$in3,$in3,$leperm - vand $tmp,$tmp,$eighty7 - vxor $out3,$in3,$twk3 - xxlor 32+$in4, 0, 0 - vpermxor $tweak, $tweak, $tmp, $in4 - - lvx_u $in4,$x40,$inp - subi $len,$len,0x60 - vxor $twk4,$tweak,$rndkey0 - vsrab $tmp,$tweak,$seven # next tweak value - vaddubm $tweak,$tweak,$tweak - le?vperm $in4,$in4,$in4,$leperm - vand $tmp,$tmp,$eighty7 - vxor $out4,$in4,$twk4 - xxlor 32+$in5, 0, 0 - vpermxor $tweak, $tweak, $tmp, $in5 - - lvx_u $in5,$x50,$inp - addi $inp,$inp,0x60 - vxor $twk5,$tweak,$rndkey0 - vsrab $tmp,$tweak,$seven # next tweak value - vaddubm $tweak,$tweak,$tweak - le?vperm $in5,$in5,$in5,$leperm - vand $tmp,$tmp,$eighty7 - vxor $out5,$in5,$twk5 - xxlor 32+$in0, 0, 0 - vpermxor $tweak, $tweak, $tmp, $in0 - - vxor v31,v31,$rndkey0 - mtctr $rounds - b Loop_xts_dec6x - -.align 5 -Loop_xts_dec6x: - vncipher $out0,$out0,v24 - vncipher $out1,$out1,v24 - vncipher $out2,$out2,v24 - vncipher $out3,$out3,v24 - vncipher $out4,$out4,v24 - vncipher $out5,$out5,v24 - lvx v24,$x20,$key_ # round[3] - addi $key_,$key_,0x20 - - vncipher $out0,$out0,v25 - vncipher $out1,$out1,v25 - vncipher $out2,$out2,v25 - vncipher $out3,$out3,v25 - vncipher $out4,$out4,v25 - vncipher $out5,$out5,v25 - lvx v25,$x10,$key_ # round[4] - bdnz Loop_xts_dec6x - - xxlor 32+$eighty7, 1, 1 # 0x010101..87 - - subic $len,$len,96 # $len-=96 - vxor $in0,$twk0,v31 # xor with last round key - vncipher $out0,$out0,v24 - vncipher $out1,$out1,v24 - vsrab $tmp,$tweak,$seven # next tweak value - vxor $twk0,$tweak,$rndkey0 - vaddubm $tweak,$tweak,$tweak - vncipher $out2,$out2,v24 - vncipher $out3,$out3,v24 - vncipher $out4,$out4,v24 - vncipher $out5,$out5,v24 - - subfe. 
r0,r0,r0 # borrow?-1:0 - vand $tmp,$tmp,$eighty7 - vncipher $out0,$out0,v25 - vncipher $out1,$out1,v25 - xxlor 32+$in1, 0, 0 - vpermxor $tweak, $tweak, $tmp, $in1 - vncipher $out2,$out2,v25 - vncipher $out3,$out3,v25 - vxor $in1,$twk1,v31 - vsrab $tmp,$tweak,$seven # next tweak value - vxor $twk1,$tweak,$rndkey0 - vncipher $out4,$out4,v25 - vncipher $out5,$out5,v25 - - and r0,r0,$len - vaddubm $tweak,$tweak,$tweak - vncipher $out0,$out0,v26 - vncipher $out1,$out1,v26 - vand $tmp,$tmp,$eighty7 - vncipher $out2,$out2,v26 - vncipher $out3,$out3,v26 - xxlor 32+$in2, 0, 0 - vpermxor $tweak, $tweak, $tmp, $in2 - vncipher $out4,$out4,v26 - vncipher $out5,$out5,v26 - - add $inp,$inp,r0 # $inp is adjusted in such - # way that at exit from the - # loop inX-in5 are loaded - # with last "words" - vxor $in2,$twk2,v31 - vsrab $tmp,$tweak,$seven # next tweak value - vxor $twk2,$tweak,$rndkey0 - vaddubm $tweak,$tweak,$tweak - vncipher $out0,$out0,v27 - vncipher $out1,$out1,v27 - vncipher $out2,$out2,v27 - vncipher $out3,$out3,v27 - vand $tmp,$tmp,$eighty7 - vncipher $out4,$out4,v27 - vncipher $out5,$out5,v27 - - addi $key_,$sp,$FRAME+15 # rewind $key_ - xxlor 32+$in3, 0, 0 - vpermxor $tweak, $tweak, $tmp, $in3 - vncipher $out0,$out0,v28 - vncipher $out1,$out1,v28 - vxor $in3,$twk3,v31 - vsrab $tmp,$tweak,$seven # next tweak value - vxor $twk3,$tweak,$rndkey0 - vncipher $out2,$out2,v28 - vncipher $out3,$out3,v28 - vaddubm $tweak,$tweak,$tweak - vncipher $out4,$out4,v28 - vncipher $out5,$out5,v28 - lvx v24,$x00,$key_ # re-pre-load round[1] - vand $tmp,$tmp,$eighty7 - - vncipher $out0,$out0,v29 - vncipher $out1,$out1,v29 - xxlor 32+$in4, 0, 0 - vpermxor $tweak, $tweak, $tmp, $in4 - vncipher $out2,$out2,v29 - vncipher $out3,$out3,v29 - vxor $in4,$twk4,v31 - vsrab $tmp,$tweak,$seven # next tweak value - vxor $twk4,$tweak,$rndkey0 - vncipher $out4,$out4,v29 - vncipher $out5,$out5,v29 - lvx v25,$x10,$key_ # re-pre-load round[2] - vaddubm $tweak,$tweak,$tweak - - vncipher $out0,$out0,v30 - vncipher $out1,$out1,v30 - vand $tmp,$tmp,$eighty7 - vncipher $out2,$out2,v30 - vncipher $out3,$out3,v30 - xxlor 32+$in5, 0, 0 - vpermxor $tweak, $tweak, $tmp, $in5 - vncipher $out4,$out4,v30 - vncipher $out5,$out5,v30 - vxor $in5,$twk5,v31 - vsrab $tmp,$tweak,$seven # next tweak value - vxor $twk5,$tweak,$rndkey0 - - vncipherlast $out0,$out0,$in0 - lvx_u $in0,$x00,$inp # load next input block - vaddubm $tweak,$tweak,$tweak - vncipherlast $out1,$out1,$in1 - lvx_u $in1,$x10,$inp - vncipherlast $out2,$out2,$in2 - le?vperm $in0,$in0,$in0,$leperm - lvx_u $in2,$x20,$inp - vand $tmp,$tmp,$eighty7 - vncipherlast $out3,$out3,$in3 - le?vperm $in1,$in1,$in1,$leperm - lvx_u $in3,$x30,$inp - vncipherlast $out4,$out4,$in4 - le?vperm $in2,$in2,$in2,$leperm - lvx_u $in4,$x40,$inp - xxlor 10, 32+$in0, 32+$in0 - xxlor 32+$in0, 0, 0 - vpermxor $tweak, $tweak, $tmp, $in0 - xxlor 32+$in0, 10, 10 - vncipherlast $out5,$out5,$in5 - le?vperm $in3,$in3,$in3,$leperm - lvx_u $in5,$x50,$inp - addi $inp,$inp,0x60 - le?vperm $in4,$in4,$in4,$leperm - le?vperm $in5,$in5,$in5,$leperm - - le?vperm $out0,$out0,$out0,$leperm - le?vperm $out1,$out1,$out1,$leperm - stvx_u $out0,$x00,$out # store output - vxor $out0,$in0,$twk0 - le?vperm $out2,$out2,$out2,$leperm - stvx_u $out1,$x10,$out - vxor $out1,$in1,$twk1 - le?vperm $out3,$out3,$out3,$leperm - stvx_u $out2,$x20,$out - vxor $out2,$in2,$twk2 - le?vperm $out4,$out4,$out4,$leperm - stvx_u $out3,$x30,$out - vxor $out3,$in3,$twk3 - le?vperm $out5,$out5,$out5,$leperm - stvx_u $out4,$x40,$out - vxor $out4,$in4,$twk4 - 
stvx_u $out5,$x50,$out - vxor $out5,$in5,$twk5 - addi $out,$out,0x60 - - mtctr $rounds - beq Loop_xts_dec6x # did $len-=96 borrow? - - xxlor 32+$eighty7, 2, 2 # 0x010101..87 - - addic. $len,$len,0x60 - beq Lxts_dec6x_zero - cmpwi $len,0x20 - blt Lxts_dec6x_one - nop - beq Lxts_dec6x_two - cmpwi $len,0x40 - blt Lxts_dec6x_three - nop - beq Lxts_dec6x_four - -Lxts_dec6x_five: - vxor $out0,$in1,$twk0 - vxor $out1,$in2,$twk1 - vxor $out2,$in3,$twk2 - vxor $out3,$in4,$twk3 - vxor $out4,$in5,$twk4 - - bl _aesp8_xts_dec5x - - le?vperm $out0,$out0,$out0,$leperm - vmr $twk0,$twk5 # unused tweak - vxor $twk1,$tweak,$rndkey0 - le?vperm $out1,$out1,$out1,$leperm - stvx_u $out0,$x00,$out # store output - vxor $out0,$in0,$twk1 - le?vperm $out2,$out2,$out2,$leperm - stvx_u $out1,$x10,$out - le?vperm $out3,$out3,$out3,$leperm - stvx_u $out2,$x20,$out - le?vperm $out4,$out4,$out4,$leperm - stvx_u $out3,$x30,$out - stvx_u $out4,$x40,$out - addi $out,$out,0x50 - bne Lxts_dec6x_steal - b Lxts_dec6x_done - -.align 4 -Lxts_dec6x_four: - vxor $out0,$in2,$twk0 - vxor $out1,$in3,$twk1 - vxor $out2,$in4,$twk2 - vxor $out3,$in5,$twk3 - vxor $out4,$out4,$out4 - - bl _aesp8_xts_dec5x - - le?vperm $out0,$out0,$out0,$leperm - vmr $twk0,$twk4 # unused tweak - vmr $twk1,$twk5 - le?vperm $out1,$out1,$out1,$leperm - stvx_u $out0,$x00,$out # store output - vxor $out0,$in0,$twk5 - le?vperm $out2,$out2,$out2,$leperm - stvx_u $out1,$x10,$out - le?vperm $out3,$out3,$out3,$leperm - stvx_u $out2,$x20,$out - stvx_u $out3,$x30,$out - addi $out,$out,0x40 - bne Lxts_dec6x_steal - b Lxts_dec6x_done - -.align 4 -Lxts_dec6x_three: - vxor $out0,$in3,$twk0 - vxor $out1,$in4,$twk1 - vxor $out2,$in5,$twk2 - vxor $out3,$out3,$out3 - vxor $out4,$out4,$out4 - - bl _aesp8_xts_dec5x - - le?vperm $out0,$out0,$out0,$leperm - vmr $twk0,$twk3 # unused tweak - vmr $twk1,$twk4 - le?vperm $out1,$out1,$out1,$leperm - stvx_u $out0,$x00,$out # store output - vxor $out0,$in0,$twk4 - le?vperm $out2,$out2,$out2,$leperm - stvx_u $out1,$x10,$out - stvx_u $out2,$x20,$out - addi $out,$out,0x30 - bne Lxts_dec6x_steal - b Lxts_dec6x_done - -.align 4 -Lxts_dec6x_two: - vxor $out0,$in4,$twk0 - vxor $out1,$in5,$twk1 - vxor $out2,$out2,$out2 - vxor $out3,$out3,$out3 - vxor $out4,$out4,$out4 - - bl _aesp8_xts_dec5x - - le?vperm $out0,$out0,$out0,$leperm - vmr $twk0,$twk2 # unused tweak - vmr $twk1,$twk3 - le?vperm $out1,$out1,$out1,$leperm - stvx_u $out0,$x00,$out # store output - vxor $out0,$in0,$twk3 - stvx_u $out1,$x10,$out - addi $out,$out,0x20 - bne Lxts_dec6x_steal - b Lxts_dec6x_done - -.align 4 -Lxts_dec6x_one: - vxor $out0,$in5,$twk0 - nop -Loop_xts_dec1x: - vncipher $out0,$out0,v24 - lvx v24,$x20,$key_ # round[3] - addi $key_,$key_,0x20 - - vncipher $out0,$out0,v25 - lvx v25,$x10,$key_ # round[4] - bdnz Loop_xts_dec1x - - subi r0,$taillen,1 - vncipher $out0,$out0,v24 - - andi. 
r0,r0,16 - cmpwi $taillen,0 - vncipher $out0,$out0,v25 - - sub $inp,$inp,r0 - vncipher $out0,$out0,v26 - - lvx_u $in0,0,$inp - vncipher $out0,$out0,v27 - - addi $key_,$sp,$FRAME+15 # rewind $key_ - vncipher $out0,$out0,v28 - lvx v24,$x00,$key_ # re-pre-load round[1] - - vncipher $out0,$out0,v29 - lvx v25,$x10,$key_ # re-pre-load round[2] - vxor $twk0,$twk0,v31 - - le?vperm $in0,$in0,$in0,$leperm - vncipher $out0,$out0,v30 - - mtctr $rounds - vncipherlast $out0,$out0,$twk0 - - vmr $twk0,$twk1 # unused tweak - vmr $twk1,$twk2 - le?vperm $out0,$out0,$out0,$leperm - stvx_u $out0,$x00,$out # store output - addi $out,$out,0x10 - vxor $out0,$in0,$twk2 - bne Lxts_dec6x_steal - b Lxts_dec6x_done - -.align 4 -Lxts_dec6x_zero: - cmpwi $taillen,0 - beq Lxts_dec6x_done - - lvx_u $in0,0,$inp - le?vperm $in0,$in0,$in0,$leperm - vxor $out0,$in0,$twk1 -Lxts_dec6x_steal: - vncipher $out0,$out0,v24 - lvx v24,$x20,$key_ # round[3] - addi $key_,$key_,0x20 - - vncipher $out0,$out0,v25 - lvx v25,$x10,$key_ # round[4] - bdnz Lxts_dec6x_steal - - add $inp,$inp,$taillen - vncipher $out0,$out0,v24 - - cmpwi $taillen,0 - vncipher $out0,$out0,v25 - - lvx_u $in0,0,$inp - vncipher $out0,$out0,v26 - - lvsr $inpperm,0,$taillen # $in5 is no more - vncipher $out0,$out0,v27 - - addi $key_,$sp,$FRAME+15 # rewind $key_ - vncipher $out0,$out0,v28 - lvx v24,$x00,$key_ # re-pre-load round[1] - - vncipher $out0,$out0,v29 - lvx v25,$x10,$key_ # re-pre-load round[2] - vxor $twk1,$twk1,v31 - - le?vperm $in0,$in0,$in0,$leperm - vncipher $out0,$out0,v30 - - vperm $in0,$in0,$in0,$inpperm - vncipherlast $tmp,$out0,$twk1 - - le?vperm $out0,$tmp,$tmp,$leperm - le?stvx_u $out0,0,$out - be?stvx_u $tmp,0,$out - - vxor $out0,$out0,$out0 - vspltisb $out1,-1 - vperm $out0,$out0,$out1,$inpperm - vsel $out0,$in0,$tmp,$out0 - vxor $out0,$out0,$twk0 - - subi r30,$out,1 - mtctr $taillen -Loop_xts_dec6x_steal: - lbzu r0,1(r30) - stb r0,16(r30) - bdnz Loop_xts_dec6x_steal - - li $taillen,0 - mtctr $rounds - b Loop_xts_dec1x # one more time... 
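[Note: the steal path above is the decrypt-side ciphertext-stealing step of XTS. The last full ciphertext block is decrypted with the later tweak ($twk1), its leading bytes become the final partial plaintext, the partial ciphertext block is grafted onto the remaining bytes, and the branch back to Loop_xts_dec1x then decrypts the grafted block with the earlier tweak ($twk0). Below is a rough C sketch of that data movement only; decrypt_block(), xts_dec_steal(), and the tweak arguments are illustrative stand-ins rather than kernel APIs, and the XOR "cipher" merely keeps the sketch self-contained and runnable.

	#include <stdio.h>
	#include <stdint.h>
	#include <string.h>

	#define BLK 16

	/*
	 * Toy stand-in for one tweaked block decryption.  The real code
	 * runs the AES rounds (vncipher/vncipherlast) with the tweak
	 * XORed into input and output; plain XOR keeps this compilable.
	 */
	static void decrypt_block(uint8_t out[BLK], const uint8_t in[BLK],
				  const uint8_t twk[BLK])
	{
		for (int i = 0; i < BLK; i++)
			out[i] = in[i] ^ twk[i];
	}

	/*
	 * Decrypt the final 16 + tail ciphertext bytes, 0 < tail < 16.
	 * Note the swapped tweak order on decryption: the last full
	 * ciphertext block uses the *later* tweak (twk_last), and the
	 * grafted block the earlier one (twk_prev).
	 */
	static void xts_dec_steal(uint8_t *dst, const uint8_t *src,
				  size_t tail, const uint8_t twk_prev[BLK],
				  const uint8_t twk_last[BLK])
	{
		uint8_t b[BLK];

		decrypt_block(b, src, twk_last);  /* last full block */
		memcpy(dst + BLK, b, tail);       /* final partial plaintext */
		memcpy(b, src + BLK, tail);       /* graft partial ciphertext */
		decrypt_block(dst, b, twk_prev);  /* grafted block, as in the
						     extra Loop_xts_dec1x pass */
	}

	int main(void)
	{
		uint8_t ct[BLK + 5] = "ciphertext bytes....";
		uint8_t pt[BLK + 5];
		uint8_t t1[BLK] = { 1 }, t2[BLK] = { 2 };

		xts_dec_steal(pt, ct, 5, t1, t2);
		for (size_t i = 0; i < sizeof(pt); i++)
			printf("%02x", pt[i]);
		printf("\n");
		return 0;
	}

The swapped tweak order is inherent to stealing: encryption produced the stored full block under the later tweak, so decryption must open that block first to recover the stolen padding bytes before the grafted block can be decrypted.]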
- -.align 4 -Lxts_dec6x_done: - ${UCMP}i $ivp,0 - beq Lxts_dec6x_ret - - vxor $tweak,$twk0,$rndkey0 - le?vperm $tweak,$tweak,$tweak,$leperm - stvx_u $tweak,0,$ivp - -Lxts_dec6x_ret: - mtlr r11 - li r10,`$FRAME+15` - li r11,`$FRAME+31` - stvx $seven,r10,$sp # wipe copies of round keys - addi r10,r10,32 - stvx $seven,r11,$sp - addi r11,r11,32 - stvx $seven,r10,$sp - addi r10,r10,32 - stvx $seven,r11,$sp - addi r11,r11,32 - stvx $seven,r10,$sp - addi r10,r10,32 - stvx $seven,r11,$sp - addi r11,r11,32 - stvx $seven,r10,$sp - addi r10,r10,32 - stvx $seven,r11,$sp - addi r11,r11,32 - - mtspr 256,$vrsave - lvx v20,r10,$sp # ABI says so - addi r10,r10,32 - lvx v21,r11,$sp - addi r11,r11,32 - lvx v22,r10,$sp - addi r10,r10,32 - lvx v23,r11,$sp - addi r11,r11,32 - lvx v24,r10,$sp - addi r10,r10,32 - lvx v25,r11,$sp - addi r11,r11,32 - lvx v26,r10,$sp - addi r10,r10,32 - lvx v27,r11,$sp - addi r11,r11,32 - lvx v28,r10,$sp - addi r10,r10,32 - lvx v29,r11,$sp - addi r11,r11,32 - lvx v30,r10,$sp - lvx v31,r11,$sp - $POP r26,`$FRAME+21*16+0*$SIZE_T`($sp) - $POP r27,`$FRAME+21*16+1*$SIZE_T`($sp) - $POP r28,`$FRAME+21*16+2*$SIZE_T`($sp) - $POP r29,`$FRAME+21*16+3*$SIZE_T`($sp) - $POP r30,`$FRAME+21*16+4*$SIZE_T`($sp) - $POP r31,`$FRAME+21*16+5*$SIZE_T`($sp) - addi $sp,$sp,`$FRAME+21*16+6*$SIZE_T` - blr - .long 0 - .byte 0,12,0x04,1,0x80,6,6,0 - .long 0 - -.align 5 -_aesp8_xts_dec5x: - vncipher $out0,$out0,v24 - vncipher $out1,$out1,v24 - vncipher $out2,$out2,v24 - vncipher $out3,$out3,v24 - vncipher $out4,$out4,v24 - lvx v24,$x20,$key_ # round[3] - addi $key_,$key_,0x20 - - vncipher $out0,$out0,v25 - vncipher $out1,$out1,v25 - vncipher $out2,$out2,v25 - vncipher $out3,$out3,v25 - vncipher $out4,$out4,v25 - lvx v25,$x10,$key_ # round[4] - bdnz _aesp8_xts_dec5x - - subi r0,$taillen,1 - vncipher $out0,$out0,v24 - vncipher $out1,$out1,v24 - vncipher $out2,$out2,v24 - vncipher $out3,$out3,v24 - vncipher $out4,$out4,v24 - - andi. 
r0,r0,16 - cmpwi $taillen,0 - vncipher $out0,$out0,v25 - vncipher $out1,$out1,v25 - vncipher $out2,$out2,v25 - vncipher $out3,$out3,v25 - vncipher $out4,$out4,v25 - vxor $twk0,$twk0,v31 - - sub $inp,$inp,r0 - vncipher $out0,$out0,v26 - vncipher $out1,$out1,v26 - vncipher $out2,$out2,v26 - vncipher $out3,$out3,v26 - vncipher $out4,$out4,v26 - vxor $in1,$twk1,v31 - - vncipher $out0,$out0,v27 - lvx_u $in0,0,$inp - vncipher $out1,$out1,v27 - vncipher $out2,$out2,v27 - vncipher $out3,$out3,v27 - vncipher $out4,$out4,v27 - vxor $in2,$twk2,v31 - - addi $key_,$sp,$FRAME+15 # rewind $key_ - vncipher $out0,$out0,v28 - vncipher $out1,$out1,v28 - vncipher $out2,$out2,v28 - vncipher $out3,$out3,v28 - vncipher $out4,$out4,v28 - lvx v24,$x00,$key_ # re-pre-load round[1] - vxor $in3,$twk3,v31 - - vncipher $out0,$out0,v29 - le?vperm $in0,$in0,$in0,$leperm - vncipher $out1,$out1,v29 - vncipher $out2,$out2,v29 - vncipher $out3,$out3,v29 - vncipher $out4,$out4,v29 - lvx v25,$x10,$key_ # re-pre-load round[2] - vxor $in4,$twk4,v31 - - vncipher $out0,$out0,v30 - vncipher $out1,$out1,v30 - vncipher $out2,$out2,v30 - vncipher $out3,$out3,v30 - vncipher $out4,$out4,v30 - - vncipherlast $out0,$out0,$twk0 - vncipherlast $out1,$out1,$in1 - vncipherlast $out2,$out2,$in2 - vncipherlast $out3,$out3,$in3 - vncipherlast $out4,$out4,$in4 - mtctr $rounds - blr - .long 0 - .byte 0,12,0x14,0,0,0,0,0 -___ -}} }}} - -my $consts=1; -foreach(split("\n",$code)) { - s/\`([^\`]*)\`/eval($1)/geo; - - # constants table endian-specific conversion - if ($consts && m/\.(long|byte)\s+(.+)\s+(\?[a-z]*)$/o) { - my $conv=$3; - my @bytes=(); - - # convert to endian-agnostic format - if ($1 eq "long") { - foreach (split(/,\s*/,$2)) { - my $l = /^0/?oct:int; - push @bytes,($l>>24)&0xff,($l>>16)&0xff,($l>>8)&0xff,$l&0xff; - } - } else { - @bytes = map(/^0/?oct:int,split(/,\s*/,$2)); - } - - # little-endian conversion - if ($flavour =~ /le$/o) { - SWITCH: for($conv) { - /\?inv/ && do { @bytes=map($_^0xf,@bytes); last; }; - /\?rev/ && do { @bytes=reverse(@bytes); last; }; - } - } - - #emit - print ".byte\t",join(',',map (sprintf("0x%02x",$_),@bytes)),"\n"; - next; - } - $consts=0 if (m/Lconsts:/o); # end of table - - # instructions prefixed with '?' are endian-specific and need - # to be adjusted accordingly... 
- if ($flavour =~ /le$/o) { # little-endian - s/le\?//o or - s/be\?/#be#/o or - s/\?lvsr/lvsl/o or - s/\?lvsl/lvsr/o or - s/\?(vperm\s+v[0-9]+,\s*)(v[0-9]+,\s*)(v[0-9]+,\s*)(v[0-9]+)/$1$3$2$4/o or - s/\?(vsldoi\s+v[0-9]+,\s*)(v[0-9]+,)\s*(v[0-9]+,\s*)([0-9]+)/$1$3$2 16-$4/o or - s/\?(vspltw\s+v[0-9]+,\s*)(v[0-9]+,)\s*([0-9])/$1$2 3-$3/o; - } else { # big-endian - s/le\?/#le#/o or - s/be\?//o or - s/\?([a-z]+)/$1/o; - } - - print $_,"\n"; -} - -close STDOUT; diff --git a/arch/powerpc/crypto/vmx.c b/arch/powerpc/crypto/vmx.c index 0b725e826388..7d2beb774f99 100644 --- a/arch/powerpc/crypto/vmx.c +++ b/arch/powerpc/crypto/vmx.c @@ -27,13 +27,9 @@ static int __init p8_init(void) if (ret) goto err; - ret = crypto_register_alg(&p8_aes_alg); - if (ret) - goto err_unregister_ghash; - ret = crypto_register_skcipher(&p8_aes_cbc_alg); if (ret) - goto err_unregister_aes; + goto err_unregister_ghash; ret = crypto_register_skcipher(&p8_aes_ctr_alg); if (ret) @@ -49,8 +45,6 @@ err_unregister_aes_ctr: crypto_unregister_skcipher(&p8_aes_ctr_alg); err_unregister_aes_cbc: crypto_unregister_skcipher(&p8_aes_cbc_alg); -err_unregister_aes: - crypto_unregister_alg(&p8_aes_alg); err_unregister_ghash: crypto_unregister_shash(&p8_ghash_alg); err: @@ -62,7 +56,6 @@ static void __exit p8_exit(void) crypto_unregister_skcipher(&p8_aes_xts_alg); crypto_unregister_skcipher(&p8_aes_ctr_alg); crypto_unregister_skcipher(&p8_aes_cbc_alg); - crypto_unregister_alg(&p8_aes_alg); crypto_unregister_shash(&p8_ghash_alg); } @@ -74,4 +67,3 @@ MODULE_DESCRIPTION("IBM VMX cryptographic acceleration instructions " "support on Power 8"); MODULE_LICENSE("GPL"); MODULE_VERSION("1.0.0"); -MODULE_IMPORT_NS("CRYPTO_INTERNAL"); diff --git a/include/crypto/aes.h b/include/crypto/aes.h index c893c9214cb7..bff71cfaedeb 100644 --- a/include/crypto/aes.h +++ b/include/crypto/aes.h @@ -18,12 +18,34 @@ #define AES_MAX_KEYLENGTH (15 * 16) #define AES_MAX_KEYLENGTH_U32 (AES_MAX_KEYLENGTH / sizeof(u32)) +/* + * The POWER8 VSX optimized AES assembly code is borrowed from OpenSSL and + * inherits OpenSSL's AES_KEY format, which stores the number of rounds after + * the round keys. That assembly code is difficult to change. So for + * compatibility purposes we reserve space for the extra nrounds field on PPC64. + * + * Note: when prepared for decryption, the round keys are just the reversed + * standard round keys, not the round keys for the Equivalent Inverse Cipher. + */ +struct p8_aes_key { + u32 rndkeys[AES_MAX_KEYLENGTH_U32]; + int nrounds; +}; + union aes_enckey_arch { u32 rndkeys[AES_MAX_KEYLENGTH_U32]; #ifdef CONFIG_CRYPTO_LIB_AES_ARCH #if defined(CONFIG_PPC) && defined(CONFIG_SPE) /* Used unconditionally (when SPE AES code is enabled in kconfig) */ u32 spe_enc_key[AES_MAX_KEYLENGTH_U32] __aligned(8); +#elif defined(CONFIG_PPC) + /* + * Kernels that include the POWER8 VSX optimized AES code use this field + * when that code is usable at key preparation time. Otherwise they + * fall back to rndkeys. In the latter case, p8.nrounds (which doesn't + * overlap rndkeys) is set to 0 to differentiate the two formats. 
+ */ + struct p8_aes_key p8; #endif #endif /* CONFIG_CRYPTO_LIB_AES_ARCH */ }; @@ -34,6 +56,9 @@ union aes_invkey_arch { #if defined(CONFIG_PPC) && defined(CONFIG_SPE) /* Used unconditionally (when SPE AES code is enabled in kconfig) */ u32 spe_dec_key[AES_MAX_KEYLENGTH_U32] __aligned(8); +#elif defined(CONFIG_PPC) + /* Used conditionally, analogous to aes_enckey_arch::p8 */ + struct p8_aes_key p8; #endif #endif /* CONFIG_CRYPTO_LIB_AES_ARCH */ }; @@ -155,6 +180,22 @@ void ppc_encrypt_xts(u8 *out, const u8 *in, u32 *key_enc, u32 rounds, u32 bytes, u8 *iv, u32 *key_twk); void ppc_decrypt_xts(u8 *out, const u8 *in, u32 *key_dec, u32 rounds, u32 bytes, u8 *iv, u32 *key_twk); +int aes_p8_set_encrypt_key(const u8 *userKey, const int bits, + struct p8_aes_key *key); +int aes_p8_set_decrypt_key(const u8 *userKey, const int bits, + struct p8_aes_key *key); +void aes_p8_encrypt(const u8 *in, u8 *out, const struct p8_aes_key *key); +void aes_p8_decrypt(const u8 *in, u8 *out, const struct p8_aes_key *key); +void aes_p8_cbc_encrypt(const u8 *in, u8 *out, size_t len, + const struct p8_aes_key *key, u8 *iv, const int enc); +void aes_p8_ctr32_encrypt_blocks(const u8 *in, u8 *out, size_t len, + const struct p8_aes_key *key, const u8 *iv); +void aes_p8_xts_encrypt(const u8 *in, u8 *out, size_t len, + const struct p8_aes_key *key1, + const struct p8_aes_key *key2, u8 *iv); +void aes_p8_xts_decrypt(const u8 *in, u8 *out, size_t len, + const struct p8_aes_key *key1, + const struct p8_aes_key *key2, u8 *iv); #endif /** diff --git a/lib/crypto/Kconfig b/lib/crypto/Kconfig index cfc6171203d0..a0f1c105827e 100644 --- a/lib/crypto/Kconfig +++ b/lib/crypto/Kconfig @@ -16,7 +16,7 @@ config CRYPTO_LIB_AES_ARCH depends on CRYPTO_LIB_AES && !UML && !KMSAN default y if ARM default y if ARM64 - default y if PPC && SPE + default y if PPC && (SPE || (PPC64 && VSX)) config CRYPTO_LIB_AESCFB tristate diff --git a/lib/crypto/Makefile b/lib/crypto/Makefile index d68fde004104..16140616ace8 100644 --- a/lib/crypto/Makefile +++ b/lib/crypto/Makefile @@ -35,7 +35,19 @@ libaes-y += powerpc/aes-spe-core.o \ powerpc/aes-spe-keys.o \ powerpc/aes-spe-modes.o \ powerpc/aes-tab-4k.o -endif +else +libaes-y += powerpc/aesp8-ppc.o +aes-perlasm-flavour-y := linux-ppc64 +aes-perlasm-flavour-$(CONFIG_PPC64_ELF_ABI_V2) := linux-ppc64-elfv2 +aes-perlasm-flavour-$(CONFIG_CPU_LITTLE_ENDIAN) := linux-ppc64le +quiet_cmd_perlasm_aes = PERLASM $@ + cmd_perlasm_aes = $(PERL) $< $(aes-perlasm-flavour-y) $@ +# Use if_changed instead of cmd, in case the flavour changed. +$(obj)/powerpc/aesp8-ppc.S: $(src)/powerpc/aesp8-ppc.pl FORCE + $(call if_changed,perlasm_aes) +targets += powerpc/aesp8-ppc.S +OBJECT_FILES_NON_STANDARD_powerpc/aesp8-ppc.o := y +endif # !CONFIG_SPE endif # CONFIG_PPC endif # CONFIG_CRYPTO_LIB_AES_ARCH diff --git a/lib/crypto/powerpc/.gitignore b/lib/crypto/powerpc/.gitignore new file mode 100644 index 000000000000..598ca7aff6b1 --- /dev/null +++ b/lib/crypto/powerpc/.gitignore @@ -0,0 +1,2 @@ +# SPDX-License-Identifier: GPL-2.0-only +aesp8-ppc.S diff --git a/lib/crypto/powerpc/aes.h b/lib/crypto/powerpc/aes.h index cf22020f9050..42e0a993c619 100644 --- a/lib/crypto/powerpc/aes.h +++ b/lib/crypto/powerpc/aes.h @@ -1,6 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0-only */ /* * Copyright (c) 2015 Markus Stockhausen + * Copyright (C) 2015 International Business Machines Inc. 
* Copyright 2026 Google LLC */ #include @@ -10,6 +11,8 @@ #include #include +#ifdef CONFIG_SPE + EXPORT_SYMBOL_GPL(ppc_expand_key_128); EXPORT_SYMBOL_GPL(ppc_expand_key_192); EXPORT_SYMBOL_GPL(ppc_expand_key_256); @@ -72,3 +75,164 @@ static void aes_decrypt_arch(const struct aes_key *key, ppc_decrypt_aes(out, in, key->inv_k.spe_dec_key, key->nrounds / 2 - 1); spe_end(); } + +#else /* CONFIG_SPE */ + +static __ro_after_init DEFINE_STATIC_KEY_FALSE(have_vec_crypto); + +EXPORT_SYMBOL_GPL(aes_p8_set_encrypt_key); +EXPORT_SYMBOL_GPL(aes_p8_set_decrypt_key); +EXPORT_SYMBOL_GPL(aes_p8_encrypt); +EXPORT_SYMBOL_GPL(aes_p8_decrypt); +EXPORT_SYMBOL_GPL(aes_p8_cbc_encrypt); +EXPORT_SYMBOL_GPL(aes_p8_ctr32_encrypt_blocks); +EXPORT_SYMBOL_GPL(aes_p8_xts_encrypt); +EXPORT_SYMBOL_GPL(aes_p8_xts_decrypt); + +static inline bool is_vsx_format(const struct p8_aes_key *key) +{ + return key->nrounds != 0; +} + +/* + * Convert a round key from VSX to generic format by reflecting the 16 bytes, + * and (if apply_inv_mix=true) applying InvMixColumn to each column. + * + * It would be nice if the VSX and generic key formats would be compatible. But + * that's very difficult to do, with the assembly code having been borrowed from + * OpenSSL and also targeted to POWER8 rather than POWER9. + * + * Fortunately, this conversion should only be needed in extremely rare cases, + * possibly not at all in practice. It's just included for full correctness. + */ +static void rndkey_from_vsx(u32 out[4], const u32 in[4], bool apply_inv_mix) +{ + u32 k0 = swab32(in[0]); + u32 k1 = swab32(in[1]); + u32 k2 = swab32(in[2]); + u32 k3 = swab32(in[3]); + + if (apply_inv_mix) { + k0 = inv_mix_columns(k0); + k1 = inv_mix_columns(k1); + k2 = inv_mix_columns(k2); + k3 = inv_mix_columns(k3); + } + out[0] = k3; + out[1] = k2; + out[2] = k1; + out[3] = k0; +} + +static void aes_preparekey_arch(union aes_enckey_arch *k, + union aes_invkey_arch *inv_k, + const u8 *in_key, int key_len, int nrounds) +{ + const int keybits = 8 * key_len; + int ret; + + if (static_branch_likely(&have_vec_crypto) && likely(may_use_simd())) { + preempt_disable(); + pagefault_disable(); + enable_kernel_vsx(); + ret = aes_p8_set_encrypt_key(in_key, keybits, &k->p8); + /* + * aes_p8_set_encrypt_key() should never fail here, since the + * key length was already validated. + */ + WARN_ON_ONCE(ret); + if (inv_k) { + ret = aes_p8_set_decrypt_key(in_key, keybits, + &inv_k->p8); + /* ... and likewise for aes_p8_set_decrypt_key(). */ + WARN_ON_ONCE(ret); + } + disable_kernel_vsx(); + pagefault_enable(); + preempt_enable(); + } else { + aes_expandkey_generic(k->rndkeys, + inv_k ? inv_k->inv_rndkeys : NULL, + in_key, key_len); + /* Mark the key as using the generic format. */ + k->p8.nrounds = 0; + if (inv_k) + inv_k->p8.nrounds = 0; + } +} + +static void aes_encrypt_arch(const struct aes_enckey *key, + u8 out[AES_BLOCK_SIZE], + const u8 in[AES_BLOCK_SIZE]) +{ + if (static_branch_likely(&have_vec_crypto) && + likely(is_vsx_format(&key->k.p8) && may_use_simd())) { + preempt_disable(); + pagefault_disable(); + enable_kernel_vsx(); + aes_p8_encrypt(in, out, &key->k.p8); + disable_kernel_vsx(); + pagefault_enable(); + preempt_enable(); + } else if (unlikely(is_vsx_format(&key->k.p8))) { + /* + * This handles (the hopefully extremely rare) case where a key + * was prepared using the VSX optimized format, then encryption + * is done in a context that cannot use VSX instructions. 
+ */ + u32 rndkeys[AES_MAX_KEYLENGTH_U32]; + + for (int i = 0; i < 4 * (key->nrounds + 1); i += 4) + rndkey_from_vsx(&rndkeys[i], + &key->k.p8.rndkeys[i], false); + aes_encrypt_generic(rndkeys, key->nrounds, out, in); + } else { + aes_encrypt_generic(key->k.rndkeys, key->nrounds, out, in); + } +} + +static void aes_decrypt_arch(const struct aes_key *key, u8 out[AES_BLOCK_SIZE], + const u8 in[AES_BLOCK_SIZE]) +{ + if (static_branch_likely(&have_vec_crypto) && + likely(is_vsx_format(&key->inv_k.p8) && may_use_simd())) { + preempt_disable(); + pagefault_disable(); + enable_kernel_vsx(); + aes_p8_decrypt(in, out, &key->inv_k.p8); + disable_kernel_vsx(); + pagefault_enable(); + preempt_enable(); + } else if (unlikely(is_vsx_format(&key->inv_k.p8))) { + /* + * This handles (the hopefully extremely rare) case where a key + * was prepared using the VSX optimized format, then decryption + * is done in a context that cannot use VSX instructions. + */ + u32 inv_rndkeys[AES_MAX_KEYLENGTH_U32]; + int i; + + rndkey_from_vsx(&inv_rndkeys[0], + &key->inv_k.p8.rndkeys[0], false); + for (i = 4; i < 4 * key->nrounds; i += 4) { + rndkey_from_vsx(&inv_rndkeys[i], + &key->inv_k.p8.rndkeys[i], true); + } + rndkey_from_vsx(&inv_rndkeys[i], + &key->inv_k.p8.rndkeys[i], false); + aes_decrypt_generic(inv_rndkeys, key->nrounds, out, in); + } else { + aes_decrypt_generic(key->inv_k.inv_rndkeys, key->nrounds, + out, in); + } +} + +#define aes_mod_init_arch aes_mod_init_arch +static void aes_mod_init_arch(void) +{ + if (cpu_has_feature(CPU_FTR_ARCH_207S) && + (cur_cpu_spec->cpu_user_features2 & PPC_FEATURE2_VEC_CRYPTO)) + static_branch_enable(&have_vec_crypto); +} + +#endif /* !CONFIG_SPE */ diff --git a/lib/crypto/powerpc/aesp8-ppc.pl b/lib/crypto/powerpc/aesp8-ppc.pl new file mode 100644 index 000000000000..253a06758057 --- /dev/null +++ b/lib/crypto/powerpc/aesp8-ppc.pl @@ -0,0 +1,3890 @@ +#! /usr/bin/env perl +# SPDX-License-Identifier: GPL-2.0 + +# This code is taken from CRYPTOGAMs[1] and is included here using the option +# in the license to distribute the code under the GPL. Therefore this program +# is free software; you can redistribute it and/or modify it under the terms of +# the GNU General Public License version 2 as published by the Free Software +# Foundation. +# +# [1] https://www.openssl.org/~appro/cryptogams/ + +# Copyright (c) 2006-2017, CRYPTOGAMS by +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions +# are met: +# +# * Redistributions of source code must retain copyright notices, +# this list of conditions and the following disclaimer. +# +# * Redistributions in binary form must reproduce the above +# copyright notice, this list of conditions and the following +# disclaimer in the documentation and/or other materials +# provided with the distribution. +# +# * Neither the name of the CRYPTOGAMS nor the names of its +# copyright holder and contributors may be used to endorse or +# promote products derived from this software without specific +# prior written permission. +# +# ALTERNATIVELY, provided that this notice is retained in full, this +# product may be distributed under the terms of the GNU General Public +# License (GPL), in which case the provisions of the GPL apply INSTEAD OF +# those given above. 
+# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDER AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +# ==================================================================== +# Written by Andy Polyakov for the OpenSSL +# project. The module is, however, dual licensed under OpenSSL and +# CRYPTOGAMS licenses depending on where you obtain it. For further +# details see https://www.openssl.org/~appro/cryptogams/. +# ==================================================================== +# +# This module implements support for AES instructions as per PowerISA +# specification version 2.07, first implemented by POWER8 processor. +# The module is endian-agnostic in sense that it supports both big- +# and little-endian cases. Data alignment in parallelizable modes is +# handled with VSX loads and stores, which implies MSR.VSX flag being +# set. It should also be noted that ISA specification doesn't prohibit +# alignment exceptions for these instructions on page boundaries. +# Initially alignment was handled in pure AltiVec/VMX way [when data +# is aligned programmatically, which in turn guarantees exception- +# free execution], but it turned to hamper performance when vcipher +# instructions are interleaved. It's reckoned that eventual +# misalignment penalties at page boundaries are in average lower +# than additional overhead in pure AltiVec approach. +# +# May 2016 +# +# Add XTS subroutine, 9x on little- and 12x improvement on big-endian +# systems were measured. +# +###################################################################### +# Current large-block performance in cycles per byte processed with +# 128-bit key (less is better). +# +# CBC en-/decrypt CTR XTS +# POWER8[le] 3.96/0.72 0.74 1.1 +# POWER8[be] 3.75/0.65 0.66 1.0 + +$flavour = shift; + +if ($flavour =~ /64/) { + $SIZE_T =8; + $LRSAVE =2*$SIZE_T; + $STU ="stdu"; + $POP ="ld"; + $PUSH ="std"; + $UCMP ="cmpld"; + $SHL ="sldi"; +} elsif ($flavour =~ /32/) { + $SIZE_T =4; + $LRSAVE =$SIZE_T; + $STU ="stwu"; + $POP ="lwz"; + $PUSH ="stw"; + $UCMP ="cmplw"; + $SHL ="slwi"; +} else { die "nonsense $flavour"; } + +$LITTLE_ENDIAN = ($flavour=~/le$/) ? 
$SIZE_T : 0; + +$0 =~ m/(.*[\/\\])[^\/\\]+$/; $dir=$1; +( $xlate="${dir}ppc-xlate.pl" and -f $xlate ) or +( $xlate="${dir}../../perlasm/ppc-xlate.pl" and -f $xlate) or +( $xlate="${dir}../../../arch/powerpc/crypto/ppc-xlate.pl" and -f $xlate) or +die "can't locate ppc-xlate.pl"; + +open STDOUT,"| $^X $xlate $flavour ".shift || die "can't call $xlate: $!"; + +$FRAME=8*$SIZE_T; +$prefix="aes_p8"; + +$sp="r1"; +$vrsave="r12"; + +######################################################################### +{{{ # Key setup procedures # +my ($inp,$bits,$out,$ptr,$cnt,$rounds)=map("r$_",(3..8)); +my ($zero,$in0,$in1,$key,$rcon,$mask,$tmp)=map("v$_",(0..6)); +my ($stage,$outperm,$outmask,$outhead,$outtail)=map("v$_",(7..11)); + +$code.=<<___; +.machine "any" + +.text + +.align 7 +rcon: +.long 0x01000000, 0x01000000, 0x01000000, 0x01000000 ?rev +.long 0x1b000000, 0x1b000000, 0x1b000000, 0x1b000000 ?rev +.long 0x0d0e0f0c, 0x0d0e0f0c, 0x0d0e0f0c, 0x0d0e0f0c ?rev +.long 0,0,0,0 ?asis +.long 0x0f102132, 0x43546576, 0x8798a9ba, 0xcbdcedfe +Lconsts: + mflr r0 + bcl 20,31,\$+4 + mflr $ptr #vvvvv "distance between . and rcon + addi $ptr,$ptr,-0x58 + mtlr r0 + blr + .long 0 + .byte 0,12,0x14,0,0,0,0,0 +.asciz "AES for PowerISA 2.07, CRYPTOGAMS by " + +.globl .${prefix}_set_encrypt_key +Lset_encrypt_key: + mflr r11 + $PUSH r11,$LRSAVE($sp) + + li $ptr,-1 + ${UCMP}i $inp,0 + beq- Lenc_key_abort # if ($inp==0) return -1; + ${UCMP}i $out,0 + beq- Lenc_key_abort # if ($out==0) return -1; + li $ptr,-2 + cmpwi $bits,128 + blt- Lenc_key_abort + cmpwi $bits,256 + bgt- Lenc_key_abort + andi. r0,$bits,0x3f + bne- Lenc_key_abort + + lis r0,0xfff0 + mfspr $vrsave,256 + mtspr 256,r0 + + bl Lconsts + mtlr r11 + + neg r9,$inp + lvx $in0,0,$inp + addi $inp,$inp,15 # 15 is not typo + lvsr $key,0,r9 # borrow $key + li r8,0x20 + cmpwi $bits,192 + lvx $in1,0,$inp + le?vspltisb $mask,0x0f # borrow $mask + lvx $rcon,0,$ptr + le?vxor $key,$key,$mask # adjust for byte swap + lvx $mask,r8,$ptr + addi $ptr,$ptr,0x10 + vperm $in0,$in0,$in1,$key # align [and byte swap in LE] + li $cnt,8 + vxor $zero,$zero,$zero + mtctr $cnt + + ?lvsr $outperm,0,$out + vspltisb $outmask,-1 + lvx $outhead,0,$out + ?vperm $outmask,$zero,$outmask,$outperm + + blt Loop128 + addi $inp,$inp,8 + beq L192 + addi $inp,$inp,8 + b L256 + +.align 4 +Loop128: + vperm $key,$in0,$in0,$mask # rotate-n-splat + vsldoi $tmp,$zero,$in0,12 # >>32 + vperm $outtail,$in0,$in0,$outperm # rotate + vsel $stage,$outhead,$outtail,$outmask + vmr $outhead,$outtail + vcipherlast $key,$key,$rcon + stvx $stage,0,$out + addi $out,$out,16 + + vxor $in0,$in0,$tmp + vsldoi $tmp,$zero,$tmp,12 # >>32 + vxor $in0,$in0,$tmp + vsldoi $tmp,$zero,$tmp,12 # >>32 + vxor $in0,$in0,$tmp + vadduwm $rcon,$rcon,$rcon + vxor $in0,$in0,$key + bdnz Loop128 + + lvx $rcon,0,$ptr # last two round keys + + vperm $key,$in0,$in0,$mask # rotate-n-splat + vsldoi $tmp,$zero,$in0,12 # >>32 + vperm $outtail,$in0,$in0,$outperm # rotate + vsel $stage,$outhead,$outtail,$outmask + vmr $outhead,$outtail + vcipherlast $key,$key,$rcon + stvx $stage,0,$out + addi $out,$out,16 + + vxor $in0,$in0,$tmp + vsldoi $tmp,$zero,$tmp,12 # >>32 + vxor $in0,$in0,$tmp + vsldoi $tmp,$zero,$tmp,12 # >>32 + vxor $in0,$in0,$tmp + vadduwm $rcon,$rcon,$rcon + vxor $in0,$in0,$key + + vperm $key,$in0,$in0,$mask # rotate-n-splat + vsldoi $tmp,$zero,$in0,12 # >>32 + vperm $outtail,$in0,$in0,$outperm # rotate + vsel $stage,$outhead,$outtail,$outmask + vmr $outhead,$outtail + vcipherlast $key,$key,$rcon + stvx $stage,0,$out + addi $out,$out,16 + + vxor 
$in0,$in0,$tmp + vsldoi $tmp,$zero,$tmp,12 # >>32 + vxor $in0,$in0,$tmp + vsldoi $tmp,$zero,$tmp,12 # >>32 + vxor $in0,$in0,$tmp + vxor $in0,$in0,$key + vperm $outtail,$in0,$in0,$outperm # rotate + vsel $stage,$outhead,$outtail,$outmask + vmr $outhead,$outtail + stvx $stage,0,$out + + addi $inp,$out,15 # 15 is not typo + addi $out,$out,0x50 + + li $rounds,10 + b Ldone + +.align 4 +L192: + lvx $tmp,0,$inp + li $cnt,4 + vperm $outtail,$in0,$in0,$outperm # rotate + vsel $stage,$outhead,$outtail,$outmask + vmr $outhead,$outtail + stvx $stage,0,$out + addi $out,$out,16 + vperm $in1,$in1,$tmp,$key # align [and byte swap in LE] + vspltisb $key,8 # borrow $key + mtctr $cnt + vsububm $mask,$mask,$key # adjust the mask + +Loop192: + vperm $key,$in1,$in1,$mask # roate-n-splat + vsldoi $tmp,$zero,$in0,12 # >>32 + vcipherlast $key,$key,$rcon + + vxor $in0,$in0,$tmp + vsldoi $tmp,$zero,$tmp,12 # >>32 + vxor $in0,$in0,$tmp + vsldoi $tmp,$zero,$tmp,12 # >>32 + vxor $in0,$in0,$tmp + + vsldoi $stage,$zero,$in1,8 + vspltw $tmp,$in0,3 + vxor $tmp,$tmp,$in1 + vsldoi $in1,$zero,$in1,12 # >>32 + vadduwm $rcon,$rcon,$rcon + vxor $in1,$in1,$tmp + vxor $in0,$in0,$key + vxor $in1,$in1,$key + vsldoi $stage,$stage,$in0,8 + + vperm $key,$in1,$in1,$mask # rotate-n-splat + vsldoi $tmp,$zero,$in0,12 # >>32 + vperm $outtail,$stage,$stage,$outperm # rotate + vsel $stage,$outhead,$outtail,$outmask + vmr $outhead,$outtail + vcipherlast $key,$key,$rcon + stvx $stage,0,$out + addi $out,$out,16 + + vsldoi $stage,$in0,$in1,8 + vxor $in0,$in0,$tmp + vsldoi $tmp,$zero,$tmp,12 # >>32 + vperm $outtail,$stage,$stage,$outperm # rotate + vsel $stage,$outhead,$outtail,$outmask + vmr $outhead,$outtail + vxor $in0,$in0,$tmp + vsldoi $tmp,$zero,$tmp,12 # >>32 + vxor $in0,$in0,$tmp + stvx $stage,0,$out + addi $out,$out,16 + + vspltw $tmp,$in0,3 + vxor $tmp,$tmp,$in1 + vsldoi $in1,$zero,$in1,12 # >>32 + vadduwm $rcon,$rcon,$rcon + vxor $in1,$in1,$tmp + vxor $in0,$in0,$key + vxor $in1,$in1,$key + vperm $outtail,$in0,$in0,$outperm # rotate + vsel $stage,$outhead,$outtail,$outmask + vmr $outhead,$outtail + stvx $stage,0,$out + addi $inp,$out,15 # 15 is not typo + addi $out,$out,16 + bdnz Loop192 + + li $rounds,12 + addi $out,$out,0x20 + b Ldone + +.align 4 +L256: + lvx $tmp,0,$inp + li $cnt,7 + li $rounds,14 + vperm $outtail,$in0,$in0,$outperm # rotate + vsel $stage,$outhead,$outtail,$outmask + vmr $outhead,$outtail + stvx $stage,0,$out + addi $out,$out,16 + vperm $in1,$in1,$tmp,$key # align [and byte swap in LE] + mtctr $cnt + +Loop256: + vperm $key,$in1,$in1,$mask # rotate-n-splat + vsldoi $tmp,$zero,$in0,12 # >>32 + vperm $outtail,$in1,$in1,$outperm # rotate + vsel $stage,$outhead,$outtail,$outmask + vmr $outhead,$outtail + vcipherlast $key,$key,$rcon + stvx $stage,0,$out + addi $out,$out,16 + + vxor $in0,$in0,$tmp + vsldoi $tmp,$zero,$tmp,12 # >>32 + vxor $in0,$in0,$tmp + vsldoi $tmp,$zero,$tmp,12 # >>32 + vxor $in0,$in0,$tmp + vadduwm $rcon,$rcon,$rcon + vxor $in0,$in0,$key + vperm $outtail,$in0,$in0,$outperm # rotate + vsel $stage,$outhead,$outtail,$outmask + vmr $outhead,$outtail + stvx $stage,0,$out + addi $inp,$out,15 # 15 is not typo + addi $out,$out,16 + bdz Ldone + + vspltw $key,$in0,3 # just splat + vsldoi $tmp,$zero,$in1,12 # >>32 + vsbox $key,$key + + vxor $in1,$in1,$tmp + vsldoi $tmp,$zero,$tmp,12 # >>32 + vxor $in1,$in1,$tmp + vsldoi $tmp,$zero,$tmp,12 # >>32 + vxor $in1,$in1,$tmp + + vxor $in1,$in1,$key + b Loop256 + +.align 4 +Ldone: + lvx $in1,0,$inp # redundant in aligned case + vsel $in1,$outhead,$in1,$outmask + stvx 
$in1,0,$inp + li $ptr,0 + mtspr 256,$vrsave + stw $rounds,0($out) + +Lenc_key_abort: + mr r3,$ptr + blr + .long 0 + .byte 0,12,0x14,1,0,0,3,0 + .long 0 +.size .${prefix}_set_encrypt_key,.-.${prefix}_set_encrypt_key + +.globl .${prefix}_set_decrypt_key + $STU $sp,-$FRAME($sp) + mflr r10 + $PUSH r10,$FRAME+$LRSAVE($sp) + bl Lset_encrypt_key + mtlr r10 + + cmpwi r3,0 + bne- Ldec_key_abort + + slwi $cnt,$rounds,4 + subi $inp,$out,240 # first round key + srwi $rounds,$rounds,1 + add $out,$inp,$cnt # last round key + mtctr $rounds + +Ldeckey: + lwz r0, 0($inp) + lwz r6, 4($inp) + lwz r7, 8($inp) + lwz r8, 12($inp) + addi $inp,$inp,16 + lwz r9, 0($out) + lwz r10,4($out) + lwz r11,8($out) + lwz r12,12($out) + stw r0, 0($out) + stw r6, 4($out) + stw r7, 8($out) + stw r8, 12($out) + subi $out,$out,16 + stw r9, -16($inp) + stw r10,-12($inp) + stw r11,-8($inp) + stw r12,-4($inp) + bdnz Ldeckey + + xor r3,r3,r3 # return value +Ldec_key_abort: + addi $sp,$sp,$FRAME + blr + .long 0 + .byte 0,12,4,1,0x80,0,3,0 + .long 0 +.size .${prefix}_set_decrypt_key,.-.${prefix}_set_decrypt_key +___ +}}} +######################################################################### +{{{ # Single block en- and decrypt procedures # +sub gen_block () { +my $dir = shift; +my $n = $dir eq "de" ? "n" : ""; +my ($inp,$out,$key,$rounds,$idx)=map("r$_",(3..7)); + +$code.=<<___; +.globl .${prefix}_${dir}crypt + lwz $rounds,240($key) + lis r0,0xfc00 + mfspr $vrsave,256 + li $idx,15 # 15 is not typo + mtspr 256,r0 + + lvx v0,0,$inp + neg r11,$out + lvx v1,$idx,$inp + lvsl v2,0,$inp # inpperm + le?vspltisb v4,0x0f + ?lvsl v3,0,r11 # outperm + le?vxor v2,v2,v4 + li $idx,16 + vperm v0,v0,v1,v2 # align [and byte swap in LE] + lvx v1,0,$key + ?lvsl v5,0,$key # keyperm + srwi $rounds,$rounds,1 + lvx v2,$idx,$key + addi $idx,$idx,16 + subi $rounds,$rounds,1 + ?vperm v1,v1,v2,v5 # align round key + + vxor v0,v0,v1 + lvx v1,$idx,$key + addi $idx,$idx,16 + mtctr $rounds + +Loop_${dir}c: + ?vperm v2,v2,v1,v5 + v${n}cipher v0,v0,v2 + lvx v2,$idx,$key + addi $idx,$idx,16 + ?vperm v1,v1,v2,v5 + v${n}cipher v0,v0,v1 + lvx v1,$idx,$key + addi $idx,$idx,16 + bdnz Loop_${dir}c + + ?vperm v2,v2,v1,v5 + v${n}cipher v0,v0,v2 + lvx v2,$idx,$key + ?vperm v1,v1,v2,v5 + v${n}cipherlast v0,v0,v1 + + vspltisb v2,-1 + vxor v1,v1,v1 + li $idx,15 # 15 is not typo + ?vperm v2,v1,v2,v3 # outmask + le?vxor v3,v3,v4 + lvx v1,0,$out # outhead + vperm v0,v0,v0,v3 # rotate [and byte swap in LE] + vsel v1,v1,v0,v2 + lvx v4,$idx,$out + stvx v1,0,$out + vsel v0,v0,v4,v2 + stvx v0,$idx,$out + + mtspr 256,$vrsave + blr + .long 0 + .byte 0,12,0x14,0,0,0,3,0 + .long 0 +.size .${prefix}_${dir}crypt,.-.${prefix}_${dir}crypt +___ +} +&gen_block("en"); +&gen_block("de"); +}}} +######################################################################### +{{{ # CBC en- and decrypt procedures # +my ($inp,$out,$len,$key,$ivp,$enc,$rounds,$idx)=map("r$_",(3..10)); +my ($rndkey0,$rndkey1,$inout,$tmp)= map("v$_",(0..3)); +my ($ivec,$inptail,$inpperm,$outhead,$outperm,$outmask,$keyperm)= + map("v$_",(4..10)); +$code.=<<___; +.globl .${prefix}_cbc_encrypt + ${UCMP}i $len,16 + bltlr- + + cmpwi $enc,0 # test direction + lis r0,0xffe0 + mfspr $vrsave,256 + mtspr 256,r0 + + li $idx,15 + vxor $rndkey0,$rndkey0,$rndkey0 + le?vspltisb $tmp,0x0f + + lvx $ivec,0,$ivp # load [unaligned] iv + lvsl $inpperm,0,$ivp + lvx $inptail,$idx,$ivp + le?vxor $inpperm,$inpperm,$tmp + vperm $ivec,$ivec,$inptail,$inpperm + + neg r11,$inp + ?lvsl $keyperm,0,$key # prepare for unaligned key + lwz $rounds,240($key) + + 
lvsr $inpperm,0,r11 # prepare for unaligned load + lvx $inptail,0,$inp + addi $inp,$inp,15 # 15 is not typo + le?vxor $inpperm,$inpperm,$tmp + + ?lvsr $outperm,0,$out # prepare for unaligned store + vspltisb $outmask,-1 + lvx $outhead,0,$out + ?vperm $outmask,$rndkey0,$outmask,$outperm + le?vxor $outperm,$outperm,$tmp + + srwi $rounds,$rounds,1 + li $idx,16 + subi $rounds,$rounds,1 + beq Lcbc_dec + +Lcbc_enc: + vmr $inout,$inptail + lvx $inptail,0,$inp + addi $inp,$inp,16 + mtctr $rounds + subi $len,$len,16 # len-=16 + + lvx $rndkey0,0,$key + vperm $inout,$inout,$inptail,$inpperm + lvx $rndkey1,$idx,$key + addi $idx,$idx,16 + ?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm + vxor $inout,$inout,$rndkey0 + lvx $rndkey0,$idx,$key + addi $idx,$idx,16 + vxor $inout,$inout,$ivec + +Loop_cbc_enc: + ?vperm $rndkey1,$rndkey1,$rndkey0,$keyperm + vcipher $inout,$inout,$rndkey1 + lvx $rndkey1,$idx,$key + addi $idx,$idx,16 + ?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm + vcipher $inout,$inout,$rndkey0 + lvx $rndkey0,$idx,$key + addi $idx,$idx,16 + bdnz Loop_cbc_enc + + ?vperm $rndkey1,$rndkey1,$rndkey0,$keyperm + vcipher $inout,$inout,$rndkey1 + lvx $rndkey1,$idx,$key + li $idx,16 + ?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm + vcipherlast $ivec,$inout,$rndkey0 + ${UCMP}i $len,16 + + vperm $tmp,$ivec,$ivec,$outperm + vsel $inout,$outhead,$tmp,$outmask + vmr $outhead,$tmp + stvx $inout,0,$out + addi $out,$out,16 + bge Lcbc_enc + + b Lcbc_done + +.align 4 +Lcbc_dec: + ${UCMP}i $len,128 + bge _aesp8_cbc_decrypt8x + vmr $tmp,$inptail + lvx $inptail,0,$inp + addi $inp,$inp,16 + mtctr $rounds + subi $len,$len,16 # len-=16 + + lvx $rndkey0,0,$key + vperm $tmp,$tmp,$inptail,$inpperm + lvx $rndkey1,$idx,$key + addi $idx,$idx,16 + ?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm + vxor $inout,$tmp,$rndkey0 + lvx $rndkey0,$idx,$key + addi $idx,$idx,16 + +Loop_cbc_dec: + ?vperm $rndkey1,$rndkey1,$rndkey0,$keyperm + vncipher $inout,$inout,$rndkey1 + lvx $rndkey1,$idx,$key + addi $idx,$idx,16 + ?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm + vncipher $inout,$inout,$rndkey0 + lvx $rndkey0,$idx,$key + addi $idx,$idx,16 + bdnz Loop_cbc_dec + + ?vperm $rndkey1,$rndkey1,$rndkey0,$keyperm + vncipher $inout,$inout,$rndkey1 + lvx $rndkey1,$idx,$key + li $idx,16 + ?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm + vncipherlast $inout,$inout,$rndkey0 + ${UCMP}i $len,16 + + vxor $inout,$inout,$ivec + vmr $ivec,$tmp + vperm $tmp,$inout,$inout,$outperm + vsel $inout,$outhead,$tmp,$outmask + vmr $outhead,$tmp + stvx $inout,0,$out + addi $out,$out,16 + bge Lcbc_dec + +Lcbc_done: + addi $out,$out,-1 + lvx $inout,0,$out # redundant in aligned case + vsel $inout,$outhead,$inout,$outmask + stvx $inout,0,$out + + neg $enc,$ivp # write [unaligned] iv + li $idx,15 # 15 is not typo + vxor $rndkey0,$rndkey0,$rndkey0 + vspltisb $outmask,-1 + le?vspltisb $tmp,0x0f + ?lvsl $outperm,0,$enc + ?vperm $outmask,$rndkey0,$outmask,$outperm + le?vxor $outperm,$outperm,$tmp + lvx $outhead,0,$ivp + vperm $ivec,$ivec,$ivec,$outperm + vsel $inout,$outhead,$ivec,$outmask + lvx $inptail,$idx,$ivp + stvx $inout,0,$ivp + vsel $inout,$ivec,$inptail,$outmask + stvx $inout,$idx,$ivp + + mtspr 256,$vrsave + blr + .long 0 + .byte 0,12,0x14,0,0,0,6,0 + .long 0 +___ +######################################################################### +{{ # Optimized CBC decrypt procedure # +my $key_="r11"; +my ($x00,$x10,$x20,$x30,$x40,$x50,$x60,$x70)=map("r$_",(0,8,26..31)); +my ($in0, $in1, $in2, $in3, $in4, $in5, $in6, $in7 )=map("v$_",(0..3,10..13)); +my 
($out0,$out1,$out2,$out3,$out4,$out5,$out6,$out7)=map("v$_",(14..21)); +my $rndkey0="v23"; # v24-v25 rotating buffer for first found keys + # v26-v31 last 6 round keys +my ($tmp,$keyperm)=($in3,$in4); # aliases with "caller", redundant assignment + +$code.=<<___; +.align 5 +_aesp8_cbc_decrypt8x: + $STU $sp,-`($FRAME+21*16+6*$SIZE_T)`($sp) + li r10,`$FRAME+8*16+15` + li r11,`$FRAME+8*16+31` + stvx v20,r10,$sp # ABI says so + addi r10,r10,32 + stvx v21,r11,$sp + addi r11,r11,32 + stvx v22,r10,$sp + addi r10,r10,32 + stvx v23,r11,$sp + addi r11,r11,32 + stvx v24,r10,$sp + addi r10,r10,32 + stvx v25,r11,$sp + addi r11,r11,32 + stvx v26,r10,$sp + addi r10,r10,32 + stvx v27,r11,$sp + addi r11,r11,32 + stvx v28,r10,$sp + addi r10,r10,32 + stvx v29,r11,$sp + addi r11,r11,32 + stvx v30,r10,$sp + stvx v31,r11,$sp + li r0,-1 + stw $vrsave,`$FRAME+21*16-4`($sp) # save vrsave + li $x10,0x10 + $PUSH r26,`$FRAME+21*16+0*$SIZE_T`($sp) + li $x20,0x20 + $PUSH r27,`$FRAME+21*16+1*$SIZE_T`($sp) + li $x30,0x30 + $PUSH r28,`$FRAME+21*16+2*$SIZE_T`($sp) + li $x40,0x40 + $PUSH r29,`$FRAME+21*16+3*$SIZE_T`($sp) + li $x50,0x50 + $PUSH r30,`$FRAME+21*16+4*$SIZE_T`($sp) + li $x60,0x60 + $PUSH r31,`$FRAME+21*16+5*$SIZE_T`($sp) + li $x70,0x70 + mtspr 256,r0 + + subi $rounds,$rounds,3 # -4 in total + subi $len,$len,128 # bias + + lvx $rndkey0,$x00,$key # load key schedule + lvx v30,$x10,$key + addi $key,$key,0x20 + lvx v31,$x00,$key + ?vperm $rndkey0,$rndkey0,v30,$keyperm + addi $key_,$sp,$FRAME+15 + mtctr $rounds + +Load_cbc_dec_key: + ?vperm v24,v30,v31,$keyperm + lvx v30,$x10,$key + addi $key,$key,0x20 + stvx v24,$x00,$key_ # off-load round[1] + ?vperm v25,v31,v30,$keyperm + lvx v31,$x00,$key + stvx v25,$x10,$key_ # off-load round[2] + addi $key_,$key_,0x20 + bdnz Load_cbc_dec_key + + lvx v26,$x10,$key + ?vperm v24,v30,v31,$keyperm + lvx v27,$x20,$key + stvx v24,$x00,$key_ # off-load round[3] + ?vperm v25,v31,v26,$keyperm + lvx v28,$x30,$key + stvx v25,$x10,$key_ # off-load round[4] + addi $key_,$sp,$FRAME+15 # rewind $key_ + ?vperm v26,v26,v27,$keyperm + lvx v29,$x40,$key + ?vperm v27,v27,v28,$keyperm + lvx v30,$x50,$key + ?vperm v28,v28,v29,$keyperm + lvx v31,$x60,$key + ?vperm v29,v29,v30,$keyperm + lvx $out0,$x70,$key # borrow $out0 + ?vperm v30,v30,v31,$keyperm + lvx v24,$x00,$key_ # pre-load round[1] + ?vperm v31,v31,$out0,$keyperm + lvx v25,$x10,$key_ # pre-load round[2] + + #lvx $inptail,0,$inp # "caller" already did this + #addi $inp,$inp,15 # 15 is not typo + subi $inp,$inp,15 # undo "caller" + + le?li $idx,8 + lvx_u $in0,$x00,$inp # load first 8 "words" + le?lvsl $inpperm,0,$idx + le?vspltisb $tmp,0x0f + lvx_u $in1,$x10,$inp + le?vxor $inpperm,$inpperm,$tmp # transform for lvx_u/stvx_u + lvx_u $in2,$x20,$inp + le?vperm $in0,$in0,$in0,$inpperm + lvx_u $in3,$x30,$inp + le?vperm $in1,$in1,$in1,$inpperm + lvx_u $in4,$x40,$inp + le?vperm $in2,$in2,$in2,$inpperm + vxor $out0,$in0,$rndkey0 + lvx_u $in5,$x50,$inp + le?vperm $in3,$in3,$in3,$inpperm + vxor $out1,$in1,$rndkey0 + lvx_u $in6,$x60,$inp + le?vperm $in4,$in4,$in4,$inpperm + vxor $out2,$in2,$rndkey0 + lvx_u $in7,$x70,$inp + addi $inp,$inp,0x80 + le?vperm $in5,$in5,$in5,$inpperm + vxor $out3,$in3,$rndkey0 + le?vperm $in6,$in6,$in6,$inpperm + vxor $out4,$in4,$rndkey0 + le?vperm $in7,$in7,$in7,$inpperm + vxor $out5,$in5,$rndkey0 + vxor $out6,$in6,$rndkey0 + vxor $out7,$in7,$rndkey0 + + mtctr $rounds + b Loop_cbc_dec8x +.align 5 +Loop_cbc_dec8x: + vncipher $out0,$out0,v24 + vncipher $out1,$out1,v24 + vncipher $out2,$out2,v24 + vncipher $out3,$out3,v24 + 
vncipher $out4,$out4,v24 + vncipher $out5,$out5,v24 + vncipher $out6,$out6,v24 + vncipher $out7,$out7,v24 + lvx v24,$x20,$key_ # round[3] + addi $key_,$key_,0x20 + + vncipher $out0,$out0,v25 + vncipher $out1,$out1,v25 + vncipher $out2,$out2,v25 + vncipher $out3,$out3,v25 + vncipher $out4,$out4,v25 + vncipher $out5,$out5,v25 + vncipher $out6,$out6,v25 + vncipher $out7,$out7,v25 + lvx v25,$x10,$key_ # round[4] + bdnz Loop_cbc_dec8x + + subic $len,$len,128 # $len-=128 + vncipher $out0,$out0,v24 + vncipher $out1,$out1,v24 + vncipher $out2,$out2,v24 + vncipher $out3,$out3,v24 + vncipher $out4,$out4,v24 + vncipher $out5,$out5,v24 + vncipher $out6,$out6,v24 + vncipher $out7,$out7,v24 + + subfe. r0,r0,r0 # borrow?-1:0 + vncipher $out0,$out0,v25 + vncipher $out1,$out1,v25 + vncipher $out2,$out2,v25 + vncipher $out3,$out3,v25 + vncipher $out4,$out4,v25 + vncipher $out5,$out5,v25 + vncipher $out6,$out6,v25 + vncipher $out7,$out7,v25 + + and r0,r0,$len + vncipher $out0,$out0,v26 + vncipher $out1,$out1,v26 + vncipher $out2,$out2,v26 + vncipher $out3,$out3,v26 + vncipher $out4,$out4,v26 + vncipher $out5,$out5,v26 + vncipher $out6,$out6,v26 + vncipher $out7,$out7,v26 + + add $inp,$inp,r0 # $inp is adjusted in such + # way that at exit from the + # loop inX-in7 are loaded + # with last "words" + vncipher $out0,$out0,v27 + vncipher $out1,$out1,v27 + vncipher $out2,$out2,v27 + vncipher $out3,$out3,v27 + vncipher $out4,$out4,v27 + vncipher $out5,$out5,v27 + vncipher $out6,$out6,v27 + vncipher $out7,$out7,v27 + + addi $key_,$sp,$FRAME+15 # rewind $key_ + vncipher $out0,$out0,v28 + vncipher $out1,$out1,v28 + vncipher $out2,$out2,v28 + vncipher $out3,$out3,v28 + vncipher $out4,$out4,v28 + vncipher $out5,$out5,v28 + vncipher $out6,$out6,v28 + vncipher $out7,$out7,v28 + lvx v24,$x00,$key_ # re-pre-load round[1] + + vncipher $out0,$out0,v29 + vncipher $out1,$out1,v29 + vncipher $out2,$out2,v29 + vncipher $out3,$out3,v29 + vncipher $out4,$out4,v29 + vncipher $out5,$out5,v29 + vncipher $out6,$out6,v29 + vncipher $out7,$out7,v29 + lvx v25,$x10,$key_ # re-pre-load round[2] + + vncipher $out0,$out0,v30 + vxor $ivec,$ivec,v31 # xor with last round key + vncipher $out1,$out1,v30 + vxor $in0,$in0,v31 + vncipher $out2,$out2,v30 + vxor $in1,$in1,v31 + vncipher $out3,$out3,v30 + vxor $in2,$in2,v31 + vncipher $out4,$out4,v30 + vxor $in3,$in3,v31 + vncipher $out5,$out5,v30 + vxor $in4,$in4,v31 + vncipher $out6,$out6,v30 + vxor $in5,$in5,v31 + vncipher $out7,$out7,v30 + vxor $in6,$in6,v31 + + vncipherlast $out0,$out0,$ivec + vncipherlast $out1,$out1,$in0 + lvx_u $in0,$x00,$inp # load next input block + vncipherlast $out2,$out2,$in1 + lvx_u $in1,$x10,$inp + vncipherlast $out3,$out3,$in2 + le?vperm $in0,$in0,$in0,$inpperm + lvx_u $in2,$x20,$inp + vncipherlast $out4,$out4,$in3 + le?vperm $in1,$in1,$in1,$inpperm + lvx_u $in3,$x30,$inp + vncipherlast $out5,$out5,$in4 + le?vperm $in2,$in2,$in2,$inpperm + lvx_u $in4,$x40,$inp + vncipherlast $out6,$out6,$in5 + le?vperm $in3,$in3,$in3,$inpperm + lvx_u $in5,$x50,$inp + vncipherlast $out7,$out7,$in6 + le?vperm $in4,$in4,$in4,$inpperm + lvx_u $in6,$x60,$inp + vmr $ivec,$in7 + le?vperm $in5,$in5,$in5,$inpperm + lvx_u $in7,$x70,$inp + addi $inp,$inp,0x80 + + le?vperm $out0,$out0,$out0,$inpperm + le?vperm $out1,$out1,$out1,$inpperm + stvx_u $out0,$x00,$out + le?vperm $in6,$in6,$in6,$inpperm + vxor $out0,$in0,$rndkey0 + le?vperm $out2,$out2,$out2,$inpperm + stvx_u $out1,$x10,$out + le?vperm $in7,$in7,$in7,$inpperm + vxor $out1,$in1,$rndkey0 + le?vperm $out3,$out3,$out3,$inpperm + stvx_u 
$out2,$x20,$out + vxor $out2,$in2,$rndkey0 + le?vperm $out4,$out4,$out4,$inpperm + stvx_u $out3,$x30,$out + vxor $out3,$in3,$rndkey0 + le?vperm $out5,$out5,$out5,$inpperm + stvx_u $out4,$x40,$out + vxor $out4,$in4,$rndkey0 + le?vperm $out6,$out6,$out6,$inpperm + stvx_u $out5,$x50,$out + vxor $out5,$in5,$rndkey0 + le?vperm $out7,$out7,$out7,$inpperm + stvx_u $out6,$x60,$out + vxor $out6,$in6,$rndkey0 + stvx_u $out7,$x70,$out + addi $out,$out,0x80 + vxor $out7,$in7,$rndkey0 + + mtctr $rounds + beq Loop_cbc_dec8x # did $len-=128 borrow? + + addic. $len,$len,128 + beq Lcbc_dec8x_done + nop + nop + +Loop_cbc_dec8x_tail: # up to 7 "words" tail... + vncipher $out1,$out1,v24 + vncipher $out2,$out2,v24 + vncipher $out3,$out3,v24 + vncipher $out4,$out4,v24 + vncipher $out5,$out5,v24 + vncipher $out6,$out6,v24 + vncipher $out7,$out7,v24 + lvx v24,$x20,$key_ # round[3] + addi $key_,$key_,0x20 + + vncipher $out1,$out1,v25 + vncipher $out2,$out2,v25 + vncipher $out3,$out3,v25 + vncipher $out4,$out4,v25 + vncipher $out5,$out5,v25 + vncipher $out6,$out6,v25 + vncipher $out7,$out7,v25 + lvx v25,$x10,$key_ # round[4] + bdnz Loop_cbc_dec8x_tail + + vncipher $out1,$out1,v24 + vncipher $out2,$out2,v24 + vncipher $out3,$out3,v24 + vncipher $out4,$out4,v24 + vncipher $out5,$out5,v24 + vncipher $out6,$out6,v24 + vncipher $out7,$out7,v24 + + vncipher $out1,$out1,v25 + vncipher $out2,$out2,v25 + vncipher $out3,$out3,v25 + vncipher $out4,$out4,v25 + vncipher $out5,$out5,v25 + vncipher $out6,$out6,v25 + vncipher $out7,$out7,v25 + + vncipher $out1,$out1,v26 + vncipher $out2,$out2,v26 + vncipher $out3,$out3,v26 + vncipher $out4,$out4,v26 + vncipher $out5,$out5,v26 + vncipher $out6,$out6,v26 + vncipher $out7,$out7,v26 + + vncipher $out1,$out1,v27 + vncipher $out2,$out2,v27 + vncipher $out3,$out3,v27 + vncipher $out4,$out4,v27 + vncipher $out5,$out5,v27 + vncipher $out6,$out6,v27 + vncipher $out7,$out7,v27 + + vncipher $out1,$out1,v28 + vncipher $out2,$out2,v28 + vncipher $out3,$out3,v28 + vncipher $out4,$out4,v28 + vncipher $out5,$out5,v28 + vncipher $out6,$out6,v28 + vncipher $out7,$out7,v28 + + vncipher $out1,$out1,v29 + vncipher $out2,$out2,v29 + vncipher $out3,$out3,v29 + vncipher $out4,$out4,v29 + vncipher $out5,$out5,v29 + vncipher $out6,$out6,v29 + vncipher $out7,$out7,v29 + + vncipher $out1,$out1,v30 + vxor $ivec,$ivec,v31 # last round key + vncipher $out2,$out2,v30 + vxor $in1,$in1,v31 + vncipher $out3,$out3,v30 + vxor $in2,$in2,v31 + vncipher $out4,$out4,v30 + vxor $in3,$in3,v31 + vncipher $out5,$out5,v30 + vxor $in4,$in4,v31 + vncipher $out6,$out6,v30 + vxor $in5,$in5,v31 + vncipher $out7,$out7,v30 + vxor $in6,$in6,v31 + + cmplwi $len,32 # switch($len) + blt Lcbc_dec8x_one + nop + beq Lcbc_dec8x_two + cmplwi $len,64 + blt Lcbc_dec8x_three + nop + beq Lcbc_dec8x_four + cmplwi $len,96 + blt Lcbc_dec8x_five + nop + beq Lcbc_dec8x_six + +Lcbc_dec8x_seven: + vncipherlast $out1,$out1,$ivec + vncipherlast $out2,$out2,$in1 + vncipherlast $out3,$out3,$in2 + vncipherlast $out4,$out4,$in3 + vncipherlast $out5,$out5,$in4 + vncipherlast $out6,$out6,$in5 + vncipherlast $out7,$out7,$in6 + vmr $ivec,$in7 + + le?vperm $out1,$out1,$out1,$inpperm + le?vperm $out2,$out2,$out2,$inpperm + stvx_u $out1,$x00,$out + le?vperm $out3,$out3,$out3,$inpperm + stvx_u $out2,$x10,$out + le?vperm $out4,$out4,$out4,$inpperm + stvx_u $out3,$x20,$out + le?vperm $out5,$out5,$out5,$inpperm + stvx_u $out4,$x30,$out + le?vperm $out6,$out6,$out6,$inpperm + stvx_u $out5,$x40,$out + le?vperm $out7,$out7,$out7,$inpperm + stvx_u $out6,$x50,$out + stvx_u 
$out7,$x60,$out + addi $out,$out,0x70 + b Lcbc_dec8x_done + +.align 5 +Lcbc_dec8x_six: + vncipherlast $out2,$out2,$ivec + vncipherlast $out3,$out3,$in2 + vncipherlast $out4,$out4,$in3 + vncipherlast $out5,$out5,$in4 + vncipherlast $out6,$out6,$in5 + vncipherlast $out7,$out7,$in6 + vmr $ivec,$in7 + + le?vperm $out2,$out2,$out2,$inpperm + le?vperm $out3,$out3,$out3,$inpperm + stvx_u $out2,$x00,$out + le?vperm $out4,$out4,$out4,$inpperm + stvx_u $out3,$x10,$out + le?vperm $out5,$out5,$out5,$inpperm + stvx_u $out4,$x20,$out + le?vperm $out6,$out6,$out6,$inpperm + stvx_u $out5,$x30,$out + le?vperm $out7,$out7,$out7,$inpperm + stvx_u $out6,$x40,$out + stvx_u $out7,$x50,$out + addi $out,$out,0x60 + b Lcbc_dec8x_done + +.align 5 +Lcbc_dec8x_five: + vncipherlast $out3,$out3,$ivec + vncipherlast $out4,$out4,$in3 + vncipherlast $out5,$out5,$in4 + vncipherlast $out6,$out6,$in5 + vncipherlast $out7,$out7,$in6 + vmr $ivec,$in7 + + le?vperm $out3,$out3,$out3,$inpperm + le?vperm $out4,$out4,$out4,$inpperm + stvx_u $out3,$x00,$out + le?vperm $out5,$out5,$out5,$inpperm + stvx_u $out4,$x10,$out + le?vperm $out6,$out6,$out6,$inpperm + stvx_u $out5,$x20,$out + le?vperm $out7,$out7,$out7,$inpperm + stvx_u $out6,$x30,$out + stvx_u $out7,$x40,$out + addi $out,$out,0x50 + b Lcbc_dec8x_done + +.align 5 +Lcbc_dec8x_four: + vncipherlast $out4,$out4,$ivec + vncipherlast $out5,$out5,$in4 + vncipherlast $out6,$out6,$in5 + vncipherlast $out7,$out7,$in6 + vmr $ivec,$in7 + + le?vperm $out4,$out4,$out4,$inpperm + le?vperm $out5,$out5,$out5,$inpperm + stvx_u $out4,$x00,$out + le?vperm $out6,$out6,$out6,$inpperm + stvx_u $out5,$x10,$out + le?vperm $out7,$out7,$out7,$inpperm + stvx_u $out6,$x20,$out + stvx_u $out7,$x30,$out + addi $out,$out,0x40 + b Lcbc_dec8x_done + +.align 5 +Lcbc_dec8x_three: + vncipherlast $out5,$out5,$ivec + vncipherlast $out6,$out6,$in5 + vncipherlast $out7,$out7,$in6 + vmr $ivec,$in7 + + le?vperm $out5,$out5,$out5,$inpperm + le?vperm $out6,$out6,$out6,$inpperm + stvx_u $out5,$x00,$out + le?vperm $out7,$out7,$out7,$inpperm + stvx_u $out6,$x10,$out + stvx_u $out7,$x20,$out + addi $out,$out,0x30 + b Lcbc_dec8x_done + +.align 5 +Lcbc_dec8x_two: + vncipherlast $out6,$out6,$ivec + vncipherlast $out7,$out7,$in6 + vmr $ivec,$in7 + + le?vperm $out6,$out6,$out6,$inpperm + le?vperm $out7,$out7,$out7,$inpperm + stvx_u $out6,$x00,$out + stvx_u $out7,$x10,$out + addi $out,$out,0x20 + b Lcbc_dec8x_done + +.align 5 +Lcbc_dec8x_one: + vncipherlast $out7,$out7,$ivec + vmr $ivec,$in7 + + le?vperm $out7,$out7,$out7,$inpperm + stvx_u $out7,0,$out + addi $out,$out,0x10 + +Lcbc_dec8x_done: + le?vperm $ivec,$ivec,$ivec,$inpperm + stvx_u $ivec,0,$ivp # write [unaligned] iv + + li r10,`$FRAME+15` + li r11,`$FRAME+31` + stvx $inpperm,r10,$sp # wipe copies of round keys + addi r10,r10,32 + stvx $inpperm,r11,$sp + addi r11,r11,32 + stvx $inpperm,r10,$sp + addi r10,r10,32 + stvx $inpperm,r11,$sp + addi r11,r11,32 + stvx $inpperm,r10,$sp + addi r10,r10,32 + stvx $inpperm,r11,$sp + addi r11,r11,32 + stvx $inpperm,r10,$sp + addi r10,r10,32 + stvx $inpperm,r11,$sp + addi r11,r11,32 + + mtspr 256,$vrsave + lvx v20,r10,$sp # ABI says so + addi r10,r10,32 + lvx v21,r11,$sp + addi r11,r11,32 + lvx v22,r10,$sp + addi r10,r10,32 + lvx v23,r11,$sp + addi r11,r11,32 + lvx v24,r10,$sp + addi r10,r10,32 + lvx v25,r11,$sp + addi r11,r11,32 + lvx v26,r10,$sp + addi r10,r10,32 + lvx v27,r11,$sp + addi r11,r11,32 + lvx v28,r10,$sp + addi r10,r10,32 + lvx v29,r11,$sp + addi r11,r11,32 + lvx v30,r10,$sp + lvx v31,r11,$sp + $POP 
r26,`$FRAME+21*16+0*$SIZE_T`($sp)
+	$POP	r27,`$FRAME+21*16+1*$SIZE_T`($sp)
+	$POP	r28,`$FRAME+21*16+2*$SIZE_T`($sp)
+	$POP	r29,`$FRAME+21*16+3*$SIZE_T`($sp)
+	$POP	r30,`$FRAME+21*16+4*$SIZE_T`($sp)
+	$POP	r31,`$FRAME+21*16+5*$SIZE_T`($sp)
+	addi	$sp,$sp,`$FRAME+21*16+6*$SIZE_T`
+	blr
+	.long	0
+	.byte	0,12,0x14,0,0x80,6,6,0
+	.long	0
+.size	.${prefix}_cbc_encrypt,.-.${prefix}_cbc_encrypt
+___
+}}	}}}
+
+#########################################################################
+{{{	# CTR procedure[s]						#
+
+####################### WARNING: Here be dragons! #######################
+#
+# This code is written as 'ctr32', based on a 32-bit counter used
+# upstream. The kernel does *not* use a 32-bit counter. The kernel uses
+# a 128-bit counter.
+#
+# This leads to subtle changes from the upstream code: the counter
+# is incremented with vaddu_q_m rather than vaddu_w_m. This occurs in
+# both the bulk (8 blocks at a time) path, and in the individual block
+# path. Be aware of this when doing updates.
+#
+# See:
+# 1d4aa0b4c181 ("crypto: vmx - Fixing AES-CTR counter bug")
+# 009b30ac7444 ("crypto: vmx - CTR: always increment IV as quadword")
+# https://github.com/openssl/openssl/pull/8942
+#
+#########################################################################
+my ($inp,$out,$len,$key,$ivp,$x10,$rounds,$idx)=map("r$_",(3..10));
+my ($rndkey0,$rndkey1,$inout,$tmp)=		map("v$_",(0..3));
+my ($ivec,$inptail,$inpperm,$outhead,$outperm,$outmask,$keyperm,$one)=
+						map("v$_",(4..11));
+my $dat=$tmp;
+
+$code.=<<___;
+.globl	.${prefix}_ctr32_encrypt_blocks
+	${UCMP}i	$len,1
+	bltlr-
+
+	lis	r0,0xfff0
+	mfspr	$vrsave,256
+	mtspr	256,r0
+
+	li	$idx,15
+	vxor	$rndkey0,$rndkey0,$rndkey0
+	le?vspltisb	$tmp,0x0f
+
+	lvx	$ivec,0,$ivp		# load [unaligned] iv
+	lvsl	$inpperm,0,$ivp
+	lvx	$inptail,$idx,$ivp
+	vspltisb	$one,1
+	le?vxor	$inpperm,$inpperm,$tmp
+	vperm	$ivec,$ivec,$inptail,$inpperm
+	vsldoi	$one,$rndkey0,$one,1
+
+	neg	r11,$inp
+	?lvsl	$keyperm,0,$key		# prepare for unaligned key
+	lwz	$rounds,240($key)
+
+	lvsr	$inpperm,0,r11		# prepare for unaligned load
+	lvx	$inptail,0,$inp
+	addi	$inp,$inp,15		# 15 is not typo
+	le?vxor	$inpperm,$inpperm,$tmp
+
+	srwi	$rounds,$rounds,1
+	li	$idx,16
+	subi	$rounds,$rounds,1
+
+	${UCMP}i	$len,8
+	bge	_aesp8_ctr32_encrypt8x
+
+	?lvsr	$outperm,0,$out		# prepare for unaligned store
+	vspltisb	$outmask,-1
+	lvx	$outhead,0,$out
+	?vperm	$outmask,$rndkey0,$outmask,$outperm
+	le?vxor	$outperm,$outperm,$tmp
+
+	lvx	$rndkey0,0,$key
+	mtctr	$rounds
+	lvx	$rndkey1,$idx,$key
+	addi	$idx,$idx,16
+	?vperm	$rndkey0,$rndkey0,$rndkey1,$keyperm
+	vxor	$inout,$ivec,$rndkey0
+	lvx	$rndkey0,$idx,$key
+	addi	$idx,$idx,16
+	b	Loop_ctr32_enc
+
+.align	5
+Loop_ctr32_enc:
+	?vperm	$rndkey1,$rndkey1,$rndkey0,$keyperm
+	vcipher	$inout,$inout,$rndkey1
+	lvx	$rndkey1,$idx,$key
+	addi	$idx,$idx,16
+	?vperm	$rndkey0,$rndkey0,$rndkey1,$keyperm
+	vcipher	$inout,$inout,$rndkey0
+	lvx	$rndkey0,$idx,$key
+	addi	$idx,$idx,16
+	bdnz	Loop_ctr32_enc
+
+	vadduqm	$ivec,$ivec,$one	# Kernel change for 128-bit
+	vmr	$dat,$inptail
+	lvx	$inptail,0,$inp
+	addi	$inp,$inp,16
+	subic.
$len,$len,1 # blocks-- + + ?vperm $rndkey1,$rndkey1,$rndkey0,$keyperm + vcipher $inout,$inout,$rndkey1 + lvx $rndkey1,$idx,$key + vperm $dat,$dat,$inptail,$inpperm + li $idx,16 + ?vperm $rndkey1,$rndkey0,$rndkey1,$keyperm + lvx $rndkey0,0,$key + vxor $dat,$dat,$rndkey1 # last round key + vcipherlast $inout,$inout,$dat + + lvx $rndkey1,$idx,$key + addi $idx,$idx,16 + vperm $inout,$inout,$inout,$outperm + vsel $dat,$outhead,$inout,$outmask + mtctr $rounds + ?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm + vmr $outhead,$inout + vxor $inout,$ivec,$rndkey0 + lvx $rndkey0,$idx,$key + addi $idx,$idx,16 + stvx $dat,0,$out + addi $out,$out,16 + bne Loop_ctr32_enc + + addi $out,$out,-1 + lvx $inout,0,$out # redundant in aligned case + vsel $inout,$outhead,$inout,$outmask + stvx $inout,0,$out + + mtspr 256,$vrsave + blr + .long 0 + .byte 0,12,0x14,0,0,0,6,0 + .long 0 +___ +######################################################################### +{{ # Optimized CTR procedure # +my $key_="r11"; +my ($x00,$x10,$x20,$x30,$x40,$x50,$x60,$x70)=map("r$_",(0,8,26..31)); +my ($in0, $in1, $in2, $in3, $in4, $in5, $in6, $in7 )=map("v$_",(0..3,10,12..14)); +my ($out0,$out1,$out2,$out3,$out4,$out5,$out6,$out7)=map("v$_",(15..22)); +my $rndkey0="v23"; # v24-v25 rotating buffer for first found keys + # v26-v31 last 6 round keys +my ($tmp,$keyperm)=($in3,$in4); # aliases with "caller", redundant assignment +my ($two,$three,$four)=($outhead,$outperm,$outmask); + +$code.=<<___; +.align 5 +_aesp8_ctr32_encrypt8x: + $STU $sp,-`($FRAME+21*16+6*$SIZE_T)`($sp) + li r10,`$FRAME+8*16+15` + li r11,`$FRAME+8*16+31` + stvx v20,r10,$sp # ABI says so + addi r10,r10,32 + stvx v21,r11,$sp + addi r11,r11,32 + stvx v22,r10,$sp + addi r10,r10,32 + stvx v23,r11,$sp + addi r11,r11,32 + stvx v24,r10,$sp + addi r10,r10,32 + stvx v25,r11,$sp + addi r11,r11,32 + stvx v26,r10,$sp + addi r10,r10,32 + stvx v27,r11,$sp + addi r11,r11,32 + stvx v28,r10,$sp + addi r10,r10,32 + stvx v29,r11,$sp + addi r11,r11,32 + stvx v30,r10,$sp + stvx v31,r11,$sp + li r0,-1 + stw $vrsave,`$FRAME+21*16-4`($sp) # save vrsave + li $x10,0x10 + $PUSH r26,`$FRAME+21*16+0*$SIZE_T`($sp) + li $x20,0x20 + $PUSH r27,`$FRAME+21*16+1*$SIZE_T`($sp) + li $x30,0x30 + $PUSH r28,`$FRAME+21*16+2*$SIZE_T`($sp) + li $x40,0x40 + $PUSH r29,`$FRAME+21*16+3*$SIZE_T`($sp) + li $x50,0x50 + $PUSH r30,`$FRAME+21*16+4*$SIZE_T`($sp) + li $x60,0x60 + $PUSH r31,`$FRAME+21*16+5*$SIZE_T`($sp) + li $x70,0x70 + mtspr 256,r0 + + subi $rounds,$rounds,3 # -4 in total + + lvx $rndkey0,$x00,$key # load key schedule + lvx v30,$x10,$key + addi $key,$key,0x20 + lvx v31,$x00,$key + ?vperm $rndkey0,$rndkey0,v30,$keyperm + addi $key_,$sp,$FRAME+15 + mtctr $rounds + +Load_ctr32_enc_key: + ?vperm v24,v30,v31,$keyperm + lvx v30,$x10,$key + addi $key,$key,0x20 + stvx v24,$x00,$key_ # off-load round[1] + ?vperm v25,v31,v30,$keyperm + lvx v31,$x00,$key + stvx v25,$x10,$key_ # off-load round[2] + addi $key_,$key_,0x20 + bdnz Load_ctr32_enc_key + + lvx v26,$x10,$key + ?vperm v24,v30,v31,$keyperm + lvx v27,$x20,$key + stvx v24,$x00,$key_ # off-load round[3] + ?vperm v25,v31,v26,$keyperm + lvx v28,$x30,$key + stvx v25,$x10,$key_ # off-load round[4] + addi $key_,$sp,$FRAME+15 # rewind $key_ + ?vperm v26,v26,v27,$keyperm + lvx v29,$x40,$key + ?vperm v27,v27,v28,$keyperm + lvx v30,$x50,$key + ?vperm v28,v28,v29,$keyperm + lvx v31,$x60,$key + ?vperm v29,v29,v30,$keyperm + lvx $out0,$x70,$key # borrow $out0 + ?vperm v30,v30,v31,$keyperm + lvx v24,$x00,$key_ # pre-load round[1] + ?vperm v31,v31,$out0,$keyperm + lvx 
v25,$x10,$key_ # pre-load round[2] + + vadduqm $two,$one,$one + subi $inp,$inp,15 # undo "caller" + $SHL $len,$len,4 + + vadduqm $out1,$ivec,$one # counter values ... + vadduqm $out2,$ivec,$two # (do all ctr adds as 128-bit) + vxor $out0,$ivec,$rndkey0 # ... xored with rndkey[0] + le?li $idx,8 + vadduqm $out3,$out1,$two + vxor $out1,$out1,$rndkey0 + le?lvsl $inpperm,0,$idx + vadduqm $out4,$out2,$two + vxor $out2,$out2,$rndkey0 + le?vspltisb $tmp,0x0f + vadduqm $out5,$out3,$two + vxor $out3,$out3,$rndkey0 + le?vxor $inpperm,$inpperm,$tmp # transform for lvx_u/stvx_u + vadduqm $out6,$out4,$two + vxor $out4,$out4,$rndkey0 + vadduqm $out7,$out5,$two + vxor $out5,$out5,$rndkey0 + vadduqm $ivec,$out6,$two # next counter value + vxor $out6,$out6,$rndkey0 + vxor $out7,$out7,$rndkey0 + + mtctr $rounds + b Loop_ctr32_enc8x +.align 5 +Loop_ctr32_enc8x: + vcipher $out0,$out0,v24 + vcipher $out1,$out1,v24 + vcipher $out2,$out2,v24 + vcipher $out3,$out3,v24 + vcipher $out4,$out4,v24 + vcipher $out5,$out5,v24 + vcipher $out6,$out6,v24 + vcipher $out7,$out7,v24 +Loop_ctr32_enc8x_middle: + lvx v24,$x20,$key_ # round[3] + addi $key_,$key_,0x20 + + vcipher $out0,$out0,v25 + vcipher $out1,$out1,v25 + vcipher $out2,$out2,v25 + vcipher $out3,$out3,v25 + vcipher $out4,$out4,v25 + vcipher $out5,$out5,v25 + vcipher $out6,$out6,v25 + vcipher $out7,$out7,v25 + lvx v25,$x10,$key_ # round[4] + bdnz Loop_ctr32_enc8x + + subic r11,$len,256 # $len-256, borrow $key_ + vcipher $out0,$out0,v24 + vcipher $out1,$out1,v24 + vcipher $out2,$out2,v24 + vcipher $out3,$out3,v24 + vcipher $out4,$out4,v24 + vcipher $out5,$out5,v24 + vcipher $out6,$out6,v24 + vcipher $out7,$out7,v24 + + subfe r0,r0,r0 # borrow?-1:0 + vcipher $out0,$out0,v25 + vcipher $out1,$out1,v25 + vcipher $out2,$out2,v25 + vcipher $out3,$out3,v25 + vcipher $out4,$out4,v25 + vcipher $out5,$out5,v25 + vcipher $out6,$out6,v25 + vcipher $out7,$out7,v25 + + and r0,r0,r11 + addi $key_,$sp,$FRAME+15 # rewind $key_ + vcipher $out0,$out0,v26 + vcipher $out1,$out1,v26 + vcipher $out2,$out2,v26 + vcipher $out3,$out3,v26 + vcipher $out4,$out4,v26 + vcipher $out5,$out5,v26 + vcipher $out6,$out6,v26 + vcipher $out7,$out7,v26 + lvx v24,$x00,$key_ # re-pre-load round[1] + + subic $len,$len,129 # $len-=129 + vcipher $out0,$out0,v27 + addi $len,$len,1 # $len-=128 really + vcipher $out1,$out1,v27 + vcipher $out2,$out2,v27 + vcipher $out3,$out3,v27 + vcipher $out4,$out4,v27 + vcipher $out5,$out5,v27 + vcipher $out6,$out6,v27 + vcipher $out7,$out7,v27 + lvx v25,$x10,$key_ # re-pre-load round[2] + + vcipher $out0,$out0,v28 + lvx_u $in0,$x00,$inp # load input + vcipher $out1,$out1,v28 + lvx_u $in1,$x10,$inp + vcipher $out2,$out2,v28 + lvx_u $in2,$x20,$inp + vcipher $out3,$out3,v28 + lvx_u $in3,$x30,$inp + vcipher $out4,$out4,v28 + lvx_u $in4,$x40,$inp + vcipher $out5,$out5,v28 + lvx_u $in5,$x50,$inp + vcipher $out6,$out6,v28 + lvx_u $in6,$x60,$inp + vcipher $out7,$out7,v28 + lvx_u $in7,$x70,$inp + addi $inp,$inp,0x80 + + vcipher $out0,$out0,v29 + le?vperm $in0,$in0,$in0,$inpperm + vcipher $out1,$out1,v29 + le?vperm $in1,$in1,$in1,$inpperm + vcipher $out2,$out2,v29 + le?vperm $in2,$in2,$in2,$inpperm + vcipher $out3,$out3,v29 + le?vperm $in3,$in3,$in3,$inpperm + vcipher $out4,$out4,v29 + le?vperm $in4,$in4,$in4,$inpperm + vcipher $out5,$out5,v29 + le?vperm $in5,$in5,$in5,$inpperm + vcipher $out6,$out6,v29 + le?vperm $in6,$in6,$in6,$inpperm + vcipher $out7,$out7,v29 + le?vperm $in7,$in7,$in7,$inpperm + + add $inp,$inp,r0 # $inp is adjusted in such + # way that at exit from the + # loop 
inX-in7 are loaded + # with last "words" + subfe. r0,r0,r0 # borrow?-1:0 + vcipher $out0,$out0,v30 + vxor $in0,$in0,v31 # xor with last round key + vcipher $out1,$out1,v30 + vxor $in1,$in1,v31 + vcipher $out2,$out2,v30 + vxor $in2,$in2,v31 + vcipher $out3,$out3,v30 + vxor $in3,$in3,v31 + vcipher $out4,$out4,v30 + vxor $in4,$in4,v31 + vcipher $out5,$out5,v30 + vxor $in5,$in5,v31 + vcipher $out6,$out6,v30 + vxor $in6,$in6,v31 + vcipher $out7,$out7,v30 + vxor $in7,$in7,v31 + + bne Lctr32_enc8x_break # did $len-129 borrow? + + vcipherlast $in0,$out0,$in0 + vcipherlast $in1,$out1,$in1 + vadduqm $out1,$ivec,$one # counter values ... + vcipherlast $in2,$out2,$in2 + vadduqm $out2,$ivec,$two + vxor $out0,$ivec,$rndkey0 # ... xored with rndkey[0] + vcipherlast $in3,$out3,$in3 + vadduqm $out3,$out1,$two + vxor $out1,$out1,$rndkey0 + vcipherlast $in4,$out4,$in4 + vadduqm $out4,$out2,$two + vxor $out2,$out2,$rndkey0 + vcipherlast $in5,$out5,$in5 + vadduqm $out5,$out3,$two + vxor $out3,$out3,$rndkey0 + vcipherlast $in6,$out6,$in6 + vadduqm $out6,$out4,$two + vxor $out4,$out4,$rndkey0 + vcipherlast $in7,$out7,$in7 + vadduqm $out7,$out5,$two + vxor $out5,$out5,$rndkey0 + le?vperm $in0,$in0,$in0,$inpperm + vadduqm $ivec,$out6,$two # next counter value + vxor $out6,$out6,$rndkey0 + le?vperm $in1,$in1,$in1,$inpperm + vxor $out7,$out7,$rndkey0 + mtctr $rounds + + vcipher $out0,$out0,v24 + stvx_u $in0,$x00,$out + le?vperm $in2,$in2,$in2,$inpperm + vcipher $out1,$out1,v24 + stvx_u $in1,$x10,$out + le?vperm $in3,$in3,$in3,$inpperm + vcipher $out2,$out2,v24 + stvx_u $in2,$x20,$out + le?vperm $in4,$in4,$in4,$inpperm + vcipher $out3,$out3,v24 + stvx_u $in3,$x30,$out + le?vperm $in5,$in5,$in5,$inpperm + vcipher $out4,$out4,v24 + stvx_u $in4,$x40,$out + le?vperm $in6,$in6,$in6,$inpperm + vcipher $out5,$out5,v24 + stvx_u $in5,$x50,$out + le?vperm $in7,$in7,$in7,$inpperm + vcipher $out6,$out6,v24 + stvx_u $in6,$x60,$out + vcipher $out7,$out7,v24 + stvx_u $in7,$x70,$out + addi $out,$out,0x80 + + b Loop_ctr32_enc8x_middle + +.align 5 +Lctr32_enc8x_break: + cmpwi $len,-0x60 + blt Lctr32_enc8x_one + nop + beq Lctr32_enc8x_two + cmpwi $len,-0x40 + blt Lctr32_enc8x_three + nop + beq Lctr32_enc8x_four + cmpwi $len,-0x20 + blt Lctr32_enc8x_five + nop + beq Lctr32_enc8x_six + cmpwi $len,0x00 + blt Lctr32_enc8x_seven + +Lctr32_enc8x_eight: + vcipherlast $out0,$out0,$in0 + vcipherlast $out1,$out1,$in1 + vcipherlast $out2,$out2,$in2 + vcipherlast $out3,$out3,$in3 + vcipherlast $out4,$out4,$in4 + vcipherlast $out5,$out5,$in5 + vcipherlast $out6,$out6,$in6 + vcipherlast $out7,$out7,$in7 + + le?vperm $out0,$out0,$out0,$inpperm + le?vperm $out1,$out1,$out1,$inpperm + stvx_u $out0,$x00,$out + le?vperm $out2,$out2,$out2,$inpperm + stvx_u $out1,$x10,$out + le?vperm $out3,$out3,$out3,$inpperm + stvx_u $out2,$x20,$out + le?vperm $out4,$out4,$out4,$inpperm + stvx_u $out3,$x30,$out + le?vperm $out5,$out5,$out5,$inpperm + stvx_u $out4,$x40,$out + le?vperm $out6,$out6,$out6,$inpperm + stvx_u $out5,$x50,$out + le?vperm $out7,$out7,$out7,$inpperm + stvx_u $out6,$x60,$out + stvx_u $out7,$x70,$out + addi $out,$out,0x80 + b Lctr32_enc8x_done + +.align 5 +Lctr32_enc8x_seven: + vcipherlast $out0,$out0,$in1 + vcipherlast $out1,$out1,$in2 + vcipherlast $out2,$out2,$in3 + vcipherlast $out3,$out3,$in4 + vcipherlast $out4,$out4,$in5 + vcipherlast $out5,$out5,$in6 + vcipherlast $out6,$out6,$in7 + + le?vperm $out0,$out0,$out0,$inpperm + le?vperm $out1,$out1,$out1,$inpperm + stvx_u $out0,$x00,$out + le?vperm $out2,$out2,$out2,$inpperm + stvx_u 
$out1,$x10,$out + le?vperm $out3,$out3,$out3,$inpperm + stvx_u $out2,$x20,$out + le?vperm $out4,$out4,$out4,$inpperm + stvx_u $out3,$x30,$out + le?vperm $out5,$out5,$out5,$inpperm + stvx_u $out4,$x40,$out + le?vperm $out6,$out6,$out6,$inpperm + stvx_u $out5,$x50,$out + stvx_u $out6,$x60,$out + addi $out,$out,0x70 + b Lctr32_enc8x_done + +.align 5 +Lctr32_enc8x_six: + vcipherlast $out0,$out0,$in2 + vcipherlast $out1,$out1,$in3 + vcipherlast $out2,$out2,$in4 + vcipherlast $out3,$out3,$in5 + vcipherlast $out4,$out4,$in6 + vcipherlast $out5,$out5,$in7 + + le?vperm $out0,$out0,$out0,$inpperm + le?vperm $out1,$out1,$out1,$inpperm + stvx_u $out0,$x00,$out + le?vperm $out2,$out2,$out2,$inpperm + stvx_u $out1,$x10,$out + le?vperm $out3,$out3,$out3,$inpperm + stvx_u $out2,$x20,$out + le?vperm $out4,$out4,$out4,$inpperm + stvx_u $out3,$x30,$out + le?vperm $out5,$out5,$out5,$inpperm + stvx_u $out4,$x40,$out + stvx_u $out5,$x50,$out + addi $out,$out,0x60 + b Lctr32_enc8x_done + +.align 5 +Lctr32_enc8x_five: + vcipherlast $out0,$out0,$in3 + vcipherlast $out1,$out1,$in4 + vcipherlast $out2,$out2,$in5 + vcipherlast $out3,$out3,$in6 + vcipherlast $out4,$out4,$in7 + + le?vperm $out0,$out0,$out0,$inpperm + le?vperm $out1,$out1,$out1,$inpperm + stvx_u $out0,$x00,$out + le?vperm $out2,$out2,$out2,$inpperm + stvx_u $out1,$x10,$out + le?vperm $out3,$out3,$out3,$inpperm + stvx_u $out2,$x20,$out + le?vperm $out4,$out4,$out4,$inpperm + stvx_u $out3,$x30,$out + stvx_u $out4,$x40,$out + addi $out,$out,0x50 + b Lctr32_enc8x_done + +.align 5 +Lctr32_enc8x_four: + vcipherlast $out0,$out0,$in4 + vcipherlast $out1,$out1,$in5 + vcipherlast $out2,$out2,$in6 + vcipherlast $out3,$out3,$in7 + + le?vperm $out0,$out0,$out0,$inpperm + le?vperm $out1,$out1,$out1,$inpperm + stvx_u $out0,$x00,$out + le?vperm $out2,$out2,$out2,$inpperm + stvx_u $out1,$x10,$out + le?vperm $out3,$out3,$out3,$inpperm + stvx_u $out2,$x20,$out + stvx_u $out3,$x30,$out + addi $out,$out,0x40 + b Lctr32_enc8x_done + +.align 5 +Lctr32_enc8x_three: + vcipherlast $out0,$out0,$in5 + vcipherlast $out1,$out1,$in6 + vcipherlast $out2,$out2,$in7 + + le?vperm $out0,$out0,$out0,$inpperm + le?vperm $out1,$out1,$out1,$inpperm + stvx_u $out0,$x00,$out + le?vperm $out2,$out2,$out2,$inpperm + stvx_u $out1,$x10,$out + stvx_u $out2,$x20,$out + addi $out,$out,0x30 + b Lctr32_enc8x_done + +.align 5 +Lctr32_enc8x_two: + vcipherlast $out0,$out0,$in6 + vcipherlast $out1,$out1,$in7 + + le?vperm $out0,$out0,$out0,$inpperm + le?vperm $out1,$out1,$out1,$inpperm + stvx_u $out0,$x00,$out + stvx_u $out1,$x10,$out + addi $out,$out,0x20 + b Lctr32_enc8x_done + +.align 5 +Lctr32_enc8x_one: + vcipherlast $out0,$out0,$in7 + + le?vperm $out0,$out0,$out0,$inpperm + stvx_u $out0,0,$out + addi $out,$out,0x10 + +Lctr32_enc8x_done: + li r10,`$FRAME+15` + li r11,`$FRAME+31` + stvx $inpperm,r10,$sp # wipe copies of round keys + addi r10,r10,32 + stvx $inpperm,r11,$sp + addi r11,r11,32 + stvx $inpperm,r10,$sp + addi r10,r10,32 + stvx $inpperm,r11,$sp + addi r11,r11,32 + stvx $inpperm,r10,$sp + addi r10,r10,32 + stvx $inpperm,r11,$sp + addi r11,r11,32 + stvx $inpperm,r10,$sp + addi r10,r10,32 + stvx $inpperm,r11,$sp + addi r11,r11,32 + + mtspr 256,$vrsave + lvx v20,r10,$sp # ABI says so + addi r10,r10,32 + lvx v21,r11,$sp + addi r11,r11,32 + lvx v22,r10,$sp + addi r10,r10,32 + lvx v23,r11,$sp + addi r11,r11,32 + lvx v24,r10,$sp + addi r10,r10,32 + lvx v25,r11,$sp + addi r11,r11,32 + lvx v26,r10,$sp + addi r10,r10,32 + lvx v27,r11,$sp + addi r11,r11,32 + lvx v28,r10,$sp + addi r10,r10,32 + lvx 
v29,r11,$sp
+	addi	r11,r11,32
+	lvx	v30,r10,$sp
+	lvx	v31,r11,$sp
+	$POP	r26,`$FRAME+21*16+0*$SIZE_T`($sp)
+	$POP	r27,`$FRAME+21*16+1*$SIZE_T`($sp)
+	$POP	r28,`$FRAME+21*16+2*$SIZE_T`($sp)
+	$POP	r29,`$FRAME+21*16+3*$SIZE_T`($sp)
+	$POP	r30,`$FRAME+21*16+4*$SIZE_T`($sp)
+	$POP	r31,`$FRAME+21*16+5*$SIZE_T`($sp)
+	addi	$sp,$sp,`$FRAME+21*16+6*$SIZE_T`
+	blr
+	.long	0
+	.byte	0,12,0x14,0,0x80,6,6,0
+	.long	0
+.size	.${prefix}_ctr32_encrypt_blocks,.-.${prefix}_ctr32_encrypt_blocks
+___
+}}	}}}
+
+#########################################################################
+{{{	# XTS procedures						#
+# int aes_p8_xts_[en|de]crypt(const char *inp, char *out, size_t len,	#
+#                             const AES_KEY *key1, const AES_KEY *key2,#
+#                             [const] unsigned char iv[16]);		#
+# If $key2 is NULL, then a "tweak chaining" mode is engaged, in which	#
+# input tweak value is assumed to be encrypted already, and last tweak	#
+# value, one suitable for consecutive call on same chunk of data, is	#
+# written back to original buffer. In addition, in "tweak chaining"	#
+# mode only complete input blocks are processed.			#
+
+my ($inp,$out,$len,$key1,$key2,$ivp,$rounds,$idx) =	map("r$_",(3..10));
+my ($rndkey0,$rndkey1,$inout) =				map("v$_",(0..2));
+my ($output,$inptail,$inpperm,$leperm,$keyperm) =	map("v$_",(3..7));
+my ($tweak,$seven,$eighty7,$tmp,$tweak1) =		map("v$_",(8..12));
+my $taillen = $key2;
+
+	($inp,$idx) = ($idx,$inp);	# reassign
+
+$code.=<<___;
+.globl	.${prefix}_xts_encrypt
+	mr	$inp,r3			# reassign
+	li	r3,-1
+	${UCMP}i	$len,16
+	bltlr-
+
+	lis	r0,0xfff0
+	mfspr	r12,256			# save vrsave
+	li	r11,0
+	mtspr	256,r0
+
+	vspltisb	$seven,0x07	# 0x070707..07
+	le?lvsl	$leperm,r11,r11
+	le?vspltisb	$tmp,0x0f
+	le?vxor	$leperm,$leperm,$seven
+
+	li	$idx,15
+	lvx	$tweak,0,$ivp		# load [unaligned] iv
+	lvsl	$inpperm,0,$ivp
+	lvx	$inptail,$idx,$ivp
+	le?vxor	$inpperm,$inpperm,$tmp
+	vperm	$tweak,$tweak,$inptail,$inpperm
+
+	neg	r11,$inp
+	lvsr	$inpperm,0,r11		# prepare for unaligned load
+	lvx	$inout,0,$inp
+	addi	$inp,$inp,15		# 15 is not typo
+	le?vxor	$inpperm,$inpperm,$tmp
+
+	${UCMP}i	$key2,0		# key2==NULL?
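+	# (Editor's note, not part of the original file: the beq below
+	# skips to Lxts_enc_no_key2 when $key2 is NULL; otherwise the
+	# code falls through and encrypts the IV with $key2 at
+	# Ltweak_xts_enc to form the initial tweak.  The "next tweak
+	# value" sequences used throughout this section double the
+	# tweak in GF(2^128).  A rough scalar C sketch, with a
+	# hypothetical helper name and modulo the vector byte ordering:
+	#
+	#	static void gf128_mul_x(unsigned char t[16])
+	#	{
+	#		/* carry out of the top bit of the 128-bit value */
+	#		unsigned char carry = t[15] >> 7;
+	#		for (int i = 15; i > 0; i--)
+	#			t[i] = (t[i] << 1) | (t[i - 1] >> 7);
+	#		/* reduce by the XTS polynomial x^128+x^7+x^2+x+1 */
+	#		t[0] = (t[0] << 1) ^ (carry ? 0x87 : 0);
+	#	}
+	#
+	# The vsrab/vaddubm/vsldoi/vand/vxor form used below is the
+	# branch-free vector equivalent: vsrab turns each byte's sign
+	# bit into a 0x00/0xff mask, vaddubm doubles every byte, and
+	# the rotated mask ANDed with $eighty7 = 0x870101..01
+	# re-injects the inter-byte carries and the 0x87 reduction.)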
+ beq Lxts_enc_no_key2 + + ?lvsl $keyperm,0,$key2 # prepare for unaligned key + lwz $rounds,240($key2) + srwi $rounds,$rounds,1 + subi $rounds,$rounds,1 + li $idx,16 + + lvx $rndkey0,0,$key2 + lvx $rndkey1,$idx,$key2 + addi $idx,$idx,16 + ?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm + vxor $tweak,$tweak,$rndkey0 + lvx $rndkey0,$idx,$key2 + addi $idx,$idx,16 + mtctr $rounds + +Ltweak_xts_enc: + ?vperm $rndkey1,$rndkey1,$rndkey0,$keyperm + vcipher $tweak,$tweak,$rndkey1 + lvx $rndkey1,$idx,$key2 + addi $idx,$idx,16 + ?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm + vcipher $tweak,$tweak,$rndkey0 + lvx $rndkey0,$idx,$key2 + addi $idx,$idx,16 + bdnz Ltweak_xts_enc + + ?vperm $rndkey1,$rndkey1,$rndkey0,$keyperm + vcipher $tweak,$tweak,$rndkey1 + lvx $rndkey1,$idx,$key2 + ?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm + vcipherlast $tweak,$tweak,$rndkey0 + + li $ivp,0 # don't chain the tweak + b Lxts_enc + +Lxts_enc_no_key2: + li $idx,-16 + and $len,$len,$idx # in "tweak chaining" + # mode only complete + # blocks are processed +Lxts_enc: + lvx $inptail,0,$inp + addi $inp,$inp,16 + + ?lvsl $keyperm,0,$key1 # prepare for unaligned key + lwz $rounds,240($key1) + srwi $rounds,$rounds,1 + subi $rounds,$rounds,1 + li $idx,16 + + vslb $eighty7,$seven,$seven # 0x808080..80 + vor $eighty7,$eighty7,$seven # 0x878787..87 + vspltisb $tmp,1 # 0x010101..01 + vsldoi $eighty7,$eighty7,$tmp,15 # 0x870101..01 + + ${UCMP}i $len,96 + bge _aesp8_xts_encrypt6x + + andi. $taillen,$len,15 + subic r0,$len,32 + subi $taillen,$taillen,16 + subfe r0,r0,r0 + and r0,r0,$taillen + add $inp,$inp,r0 + + lvx $rndkey0,0,$key1 + lvx $rndkey1,$idx,$key1 + addi $idx,$idx,16 + vperm $inout,$inout,$inptail,$inpperm + ?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm + vxor $inout,$inout,$tweak + vxor $inout,$inout,$rndkey0 + lvx $rndkey0,$idx,$key1 + addi $idx,$idx,16 + mtctr $rounds + b Loop_xts_enc + +.align 5 +Loop_xts_enc: + ?vperm $rndkey1,$rndkey1,$rndkey0,$keyperm + vcipher $inout,$inout,$rndkey1 + lvx $rndkey1,$idx,$key1 + addi $idx,$idx,16 + ?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm + vcipher $inout,$inout,$rndkey0 + lvx $rndkey0,$idx,$key1 + addi $idx,$idx,16 + bdnz Loop_xts_enc + + ?vperm $rndkey1,$rndkey1,$rndkey0,$keyperm + vcipher $inout,$inout,$rndkey1 + lvx $rndkey1,$idx,$key1 + li $idx,16 + ?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm + vxor $rndkey0,$rndkey0,$tweak + vcipherlast $output,$inout,$rndkey0 + + le?vperm $tmp,$output,$output,$leperm + be?nop + le?stvx_u $tmp,0,$out + be?stvx_u $output,0,$out + addi $out,$out,16 + + subic. 
$len,$len,16 + beq Lxts_enc_done + + vmr $inout,$inptail + lvx $inptail,0,$inp + addi $inp,$inp,16 + lvx $rndkey0,0,$key1 + lvx $rndkey1,$idx,$key1 + addi $idx,$idx,16 + + subic r0,$len,32 + subfe r0,r0,r0 + and r0,r0,$taillen + add $inp,$inp,r0 + + vsrab $tmp,$tweak,$seven # next tweak value + vaddubm $tweak,$tweak,$tweak + vsldoi $tmp,$tmp,$tmp,15 + vand $tmp,$tmp,$eighty7 + vxor $tweak,$tweak,$tmp + + vperm $inout,$inout,$inptail,$inpperm + ?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm + vxor $inout,$inout,$tweak + vxor $output,$output,$rndkey0 # just in case $len<16 + vxor $inout,$inout,$rndkey0 + lvx $rndkey0,$idx,$key1 + addi $idx,$idx,16 + + mtctr $rounds + ${UCMP}i $len,16 + bge Loop_xts_enc + + vxor $output,$output,$tweak + lvsr $inpperm,0,$len # $inpperm is no longer needed + vxor $inptail,$inptail,$inptail # $inptail is no longer needed + vspltisb $tmp,-1 + vperm $inptail,$inptail,$tmp,$inpperm + vsel $inout,$inout,$output,$inptail + + subi r11,$out,17 + subi $out,$out,16 + mtctr $len + li $len,16 +Loop_xts_enc_steal: + lbzu r0,1(r11) + stb r0,16(r11) + bdnz Loop_xts_enc_steal + + mtctr $rounds + b Loop_xts_enc # one more time... + +Lxts_enc_done: + ${UCMP}i $ivp,0 + beq Lxts_enc_ret + + vsrab $tmp,$tweak,$seven # next tweak value + vaddubm $tweak,$tweak,$tweak + vsldoi $tmp,$tmp,$tmp,15 + vand $tmp,$tmp,$eighty7 + vxor $tweak,$tweak,$tmp + + le?vperm $tweak,$tweak,$tweak,$leperm + stvx_u $tweak,0,$ivp + +Lxts_enc_ret: + mtspr 256,r12 # restore vrsave + li r3,0 + blr + .long 0 + .byte 0,12,0x04,0,0x80,6,6,0 + .long 0 +.size .${prefix}_xts_encrypt,.-.${prefix}_xts_encrypt + +.globl .${prefix}_xts_decrypt + mr $inp,r3 # reassign + li r3,-1 + ${UCMP}i $len,16 + bltlr- + + lis r0,0xfff8 + mfspr r12,256 # save vrsave + li r11,0 + mtspr 256,r0 + + andi. r0,$len,15 + neg r0,r0 + andi. r0,r0,16 + sub $len,$len,r0 + + vspltisb $seven,0x07 # 0x070707..07 + le?lvsl $leperm,r11,r11 + le?vspltisb $tmp,0x0f + le?vxor $leperm,$leperm,$seven + + li $idx,15 + lvx $tweak,0,$ivp # load [unaligned] iv + lvsl $inpperm,0,$ivp + lvx $inptail,$idx,$ivp + le?vxor $inpperm,$inpperm,$tmp + vperm $tweak,$tweak,$inptail,$inpperm + + neg r11,$inp + lvsr $inpperm,0,r11 # prepare for unaligned load + lvx $inout,0,$inp + addi $inp,$inp,15 # 15 is not typo + le?vxor $inpperm,$inpperm,$tmp + + ${UCMP}i $key2,0 # key2==NULL? + beq Lxts_dec_no_key2 + + ?lvsl $keyperm,0,$key2 # prepare for unaligned key + lwz $rounds,240($key2) + srwi $rounds,$rounds,1 + subi $rounds,$rounds,1 + li $idx,16 + + lvx $rndkey0,0,$key2 + lvx $rndkey1,$idx,$key2 + addi $idx,$idx,16 + ?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm + vxor $tweak,$tweak,$rndkey0 + lvx $rndkey0,$idx,$key2 + addi $idx,$idx,16 + mtctr $rounds + +Ltweak_xts_dec: + ?vperm $rndkey1,$rndkey1,$rndkey0,$keyperm + vcipher $tweak,$tweak,$rndkey1 + lvx $rndkey1,$idx,$key2 + addi $idx,$idx,16 + ?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm + vcipher $tweak,$tweak,$rndkey0 + lvx $rndkey0,$idx,$key2 + addi $idx,$idx,16 + bdnz Ltweak_xts_dec + + ?vperm $rndkey1,$rndkey1,$rndkey0,$keyperm + vcipher $tweak,$tweak,$rndkey1 + lvx $rndkey1,$idx,$key2 + ?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm + vcipherlast $tweak,$tweak,$rndkey0 + + li $ivp,0 # don't chain the tweak + b Lxts_dec + +Lxts_dec_no_key2: + neg $idx,$len + andi. 
$idx,$idx,15 + add $len,$len,$idx # in "tweak chaining" + # mode only complete + # blocks are processed +Lxts_dec: + lvx $inptail,0,$inp + addi $inp,$inp,16 + + ?lvsl $keyperm,0,$key1 # prepare for unaligned key + lwz $rounds,240($key1) + srwi $rounds,$rounds,1 + subi $rounds,$rounds,1 + li $idx,16 + + vslb $eighty7,$seven,$seven # 0x808080..80 + vor $eighty7,$eighty7,$seven # 0x878787..87 + vspltisb $tmp,1 # 0x010101..01 + vsldoi $eighty7,$eighty7,$tmp,15 # 0x870101..01 + + ${UCMP}i $len,96 + bge _aesp8_xts_decrypt6x + + lvx $rndkey0,0,$key1 + lvx $rndkey1,$idx,$key1 + addi $idx,$idx,16 + vperm $inout,$inout,$inptail,$inpperm + ?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm + vxor $inout,$inout,$tweak + vxor $inout,$inout,$rndkey0 + lvx $rndkey0,$idx,$key1 + addi $idx,$idx,16 + mtctr $rounds + + ${UCMP}i $len,16 + blt Ltail_xts_dec + be?b Loop_xts_dec + +.align 5 +Loop_xts_dec: + ?vperm $rndkey1,$rndkey1,$rndkey0,$keyperm + vncipher $inout,$inout,$rndkey1 + lvx $rndkey1,$idx,$key1 + addi $idx,$idx,16 + ?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm + vncipher $inout,$inout,$rndkey0 + lvx $rndkey0,$idx,$key1 + addi $idx,$idx,16 + bdnz Loop_xts_dec + + ?vperm $rndkey1,$rndkey1,$rndkey0,$keyperm + vncipher $inout,$inout,$rndkey1 + lvx $rndkey1,$idx,$key1 + li $idx,16 + ?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm + vxor $rndkey0,$rndkey0,$tweak + vncipherlast $output,$inout,$rndkey0 + + le?vperm $tmp,$output,$output,$leperm + be?nop + le?stvx_u $tmp,0,$out + be?stvx_u $output,0,$out + addi $out,$out,16 + + subic. $len,$len,16 + beq Lxts_dec_done + + vmr $inout,$inptail + lvx $inptail,0,$inp + addi $inp,$inp,16 + lvx $rndkey0,0,$key1 + lvx $rndkey1,$idx,$key1 + addi $idx,$idx,16 + + vsrab $tmp,$tweak,$seven # next tweak value + vaddubm $tweak,$tweak,$tweak + vsldoi $tmp,$tmp,$tmp,15 + vand $tmp,$tmp,$eighty7 + vxor $tweak,$tweak,$tmp + + vperm $inout,$inout,$inptail,$inpperm + ?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm + vxor $inout,$inout,$tweak + vxor $inout,$inout,$rndkey0 + lvx $rndkey0,$idx,$key1 + addi $idx,$idx,16 + + mtctr $rounds + ${UCMP}i $len,16 + bge Loop_xts_dec + +Ltail_xts_dec: + vsrab $tmp,$tweak,$seven # next tweak value + vaddubm $tweak1,$tweak,$tweak + vsldoi $tmp,$tmp,$tmp,15 + vand $tmp,$tmp,$eighty7 + vxor $tweak1,$tweak1,$tmp + + subi $inp,$inp,16 + add $inp,$inp,$len + + vxor $inout,$inout,$tweak # :-( + vxor $inout,$inout,$tweak1 # :-) + +Loop_xts_dec_short: + ?vperm $rndkey1,$rndkey1,$rndkey0,$keyperm + vncipher $inout,$inout,$rndkey1 + lvx $rndkey1,$idx,$key1 + addi $idx,$idx,16 + ?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm + vncipher $inout,$inout,$rndkey0 + lvx $rndkey0,$idx,$key1 + addi $idx,$idx,16 + bdnz Loop_xts_dec_short + + ?vperm $rndkey1,$rndkey1,$rndkey0,$keyperm + vncipher $inout,$inout,$rndkey1 + lvx $rndkey1,$idx,$key1 + li $idx,16 + ?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm + vxor $rndkey0,$rndkey0,$tweak1 + vncipherlast $output,$inout,$rndkey0 + + le?vperm $tmp,$output,$output,$leperm + be?nop + le?stvx_u $tmp,0,$out + be?stvx_u $output,0,$out + + vmr $inout,$inptail + lvx $inptail,0,$inp + #addi $inp,$inp,16 + lvx $rndkey0,0,$key1 + lvx $rndkey1,$idx,$key1 + addi $idx,$idx,16 + vperm $inout,$inout,$inptail,$inpperm + ?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm + + lvsr $inpperm,0,$len # $inpperm is no longer needed + vxor $inptail,$inptail,$inptail # $inptail is no longer needed + vspltisb $tmp,-1 + vperm $inptail,$inptail,$tmp,$inpperm + vsel $inout,$inout,$output,$inptail + + vxor $rndkey0,$rndkey0,$tweak + vxor $inout,$inout,$rndkey0 + lvx 
$rndkey0,$idx,$key1 + addi $idx,$idx,16 + + subi r11,$out,1 + mtctr $len + li $len,16 +Loop_xts_dec_steal: + lbzu r0,1(r11) + stb r0,16(r11) + bdnz Loop_xts_dec_steal + + mtctr $rounds + b Loop_xts_dec # one more time... + +Lxts_dec_done: + ${UCMP}i $ivp,0 + beq Lxts_dec_ret + + vsrab $tmp,$tweak,$seven # next tweak value + vaddubm $tweak,$tweak,$tweak + vsldoi $tmp,$tmp,$tmp,15 + vand $tmp,$tmp,$eighty7 + vxor $tweak,$tweak,$tmp + + le?vperm $tweak,$tweak,$tweak,$leperm + stvx_u $tweak,0,$ivp + +Lxts_dec_ret: + mtspr 256,r12 # restore vrsave + li r3,0 + blr + .long 0 + .byte 0,12,0x04,0,0x80,6,6,0 + .long 0 +.size .${prefix}_xts_decrypt,.-.${prefix}_xts_decrypt +___ +######################################################################### +{{ # Optimized XTS procedures # +my $key_=$key2; +my ($x00,$x10,$x20,$x30,$x40,$x50,$x60,$x70)=map("r$_",(0,3,26..31)); + $x00=0 if ($flavour =~ /osx/); +my ($in0, $in1, $in2, $in3, $in4, $in5 )=map("v$_",(0..5)); +my ($out0, $out1, $out2, $out3, $out4, $out5)=map("v$_",(7,12..16)); +my ($twk0, $twk1, $twk2, $twk3, $twk4, $twk5)=map("v$_",(17..22)); +my $rndkey0="v23"; # v24-v25 rotating buffer for first found keys + # v26-v31 last 6 round keys +my ($keyperm)=($out0); # aliases with "caller", redundant assignment +my $taillen=$x70; + +$code.=<<___; +.align 5 +_aesp8_xts_encrypt6x: + $STU $sp,-`($FRAME+21*16+6*$SIZE_T)`($sp) + mflr r11 + li r7,`$FRAME+8*16+15` + li r3,`$FRAME+8*16+31` + $PUSH r11,`$FRAME+21*16+6*$SIZE_T+$LRSAVE`($sp) + stvx v20,r7,$sp # ABI says so + addi r7,r7,32 + stvx v21,r3,$sp + addi r3,r3,32 + stvx v22,r7,$sp + addi r7,r7,32 + stvx v23,r3,$sp + addi r3,r3,32 + stvx v24,r7,$sp + addi r7,r7,32 + stvx v25,r3,$sp + addi r3,r3,32 + stvx v26,r7,$sp + addi r7,r7,32 + stvx v27,r3,$sp + addi r3,r3,32 + stvx v28,r7,$sp + addi r7,r7,32 + stvx v29,r3,$sp + addi r3,r3,32 + stvx v30,r7,$sp + stvx v31,r3,$sp + li r0,-1 + stw $vrsave,`$FRAME+21*16-4`($sp) # save vrsave + li $x10,0x10 + $PUSH r26,`$FRAME+21*16+0*$SIZE_T`($sp) + li $x20,0x20 + $PUSH r27,`$FRAME+21*16+1*$SIZE_T`($sp) + li $x30,0x30 + $PUSH r28,`$FRAME+21*16+2*$SIZE_T`($sp) + li $x40,0x40 + $PUSH r29,`$FRAME+21*16+3*$SIZE_T`($sp) + li $x50,0x50 + $PUSH r30,`$FRAME+21*16+4*$SIZE_T`($sp) + li $x60,0x60 + $PUSH r31,`$FRAME+21*16+5*$SIZE_T`($sp) + li $x70,0x70 + mtspr 256,r0 + + xxlor 2, 32+$eighty7, 32+$eighty7 + vsldoi $eighty7,$tmp,$eighty7,1 # 0x010101..87 + xxlor 1, 32+$eighty7, 32+$eighty7 + + # Load XOR Lconsts. 
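+	# (Editor's note, not part of the original file: the lxvw4x
+	# below is assumed to pick up a vpermxor control vector at
+	# offset 0x40 from the Lconsts base returned in r6, which is
+	# why r6 is saved in $x70 around the call.  The constant is
+	# kept in VSX register 0 and copied out with xxlor just before
+	# each "vpermxor $tweak,$tweak,$tmp,..." so that the
+	# rotate-and-xor tail of the tweak doubling (the vsldoi/vxor
+	# pair in the scalar path) collapses into a single instruction;
+	# see the "Switch to use the following codes" comment below.)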
+ mr $x70, r6 + bl Lconsts + lxvw4x 0, $x40, r6 # load XOR contents + mr r6, $x70 + li $x70,0x70 + + subi $rounds,$rounds,3 # -4 in total + + lvx $rndkey0,$x00,$key1 # load key schedule + lvx v30,$x10,$key1 + addi $key1,$key1,0x20 + lvx v31,$x00,$key1 + ?vperm $rndkey0,$rndkey0,v30,$keyperm + addi $key_,$sp,$FRAME+15 + mtctr $rounds + +Load_xts_enc_key: + ?vperm v24,v30,v31,$keyperm + lvx v30,$x10,$key1 + addi $key1,$key1,0x20 + stvx v24,$x00,$key_ # off-load round[1] + ?vperm v25,v31,v30,$keyperm + lvx v31,$x00,$key1 + stvx v25,$x10,$key_ # off-load round[2] + addi $key_,$key_,0x20 + bdnz Load_xts_enc_key + + lvx v26,$x10,$key1 + ?vperm v24,v30,v31,$keyperm + lvx v27,$x20,$key1 + stvx v24,$x00,$key_ # off-load round[3] + ?vperm v25,v31,v26,$keyperm + lvx v28,$x30,$key1 + stvx v25,$x10,$key_ # off-load round[4] + addi $key_,$sp,$FRAME+15 # rewind $key_ + ?vperm v26,v26,v27,$keyperm + lvx v29,$x40,$key1 + ?vperm v27,v27,v28,$keyperm + lvx v30,$x50,$key1 + ?vperm v28,v28,v29,$keyperm + lvx v31,$x60,$key1 + ?vperm v29,v29,v30,$keyperm + lvx $twk5,$x70,$key1 # borrow $twk5 + ?vperm v30,v30,v31,$keyperm + lvx v24,$x00,$key_ # pre-load round[1] + ?vperm v31,v31,$twk5,$keyperm + lvx v25,$x10,$key_ # pre-load round[2] + + # Switch to use the following codes with 0x010101..87 to generate tweak. + # eighty7 = 0x010101..87 + # vsrab tmp, tweak, seven # next tweak value, right shift 7 bits + # vand tmp, tmp, eighty7 # last byte with carry + # vaddubm tweak, tweak, tweak # left shift 1 bit (x2) + # xxlor vsx, 0, 0 + # vpermxor tweak, tweak, tmp, vsx + + vperm $in0,$inout,$inptail,$inpperm + subi $inp,$inp,31 # undo "caller" + vxor $twk0,$tweak,$rndkey0 + vsrab $tmp,$tweak,$seven # next tweak value + vaddubm $tweak,$tweak,$tweak + vand $tmp,$tmp,$eighty7 + vxor $out0,$in0,$twk0 + xxlor 32+$in1, 0, 0 + vpermxor $tweak, $tweak, $tmp, $in1 + + lvx_u $in1,$x10,$inp + vxor $twk1,$tweak,$rndkey0 + vsrab $tmp,$tweak,$seven # next tweak value + vaddubm $tweak,$tweak,$tweak + le?vperm $in1,$in1,$in1,$leperm + vand $tmp,$tmp,$eighty7 + vxor $out1,$in1,$twk1 + xxlor 32+$in2, 0, 0 + vpermxor $tweak, $tweak, $tmp, $in2 + + lvx_u $in2,$x20,$inp + andi. 
$taillen,$len,15 + vxor $twk2,$tweak,$rndkey0 + vsrab $tmp,$tweak,$seven # next tweak value + vaddubm $tweak,$tweak,$tweak + le?vperm $in2,$in2,$in2,$leperm + vand $tmp,$tmp,$eighty7 + vxor $out2,$in2,$twk2 + xxlor 32+$in3, 0, 0 + vpermxor $tweak, $tweak, $tmp, $in3 + + lvx_u $in3,$x30,$inp + sub $len,$len,$taillen + vxor $twk3,$tweak,$rndkey0 + vsrab $tmp,$tweak,$seven # next tweak value + vaddubm $tweak,$tweak,$tweak + le?vperm $in3,$in3,$in3,$leperm + vand $tmp,$tmp,$eighty7 + vxor $out3,$in3,$twk3 + xxlor 32+$in4, 0, 0 + vpermxor $tweak, $tweak, $tmp, $in4 + + lvx_u $in4,$x40,$inp + subi $len,$len,0x60 + vxor $twk4,$tweak,$rndkey0 + vsrab $tmp,$tweak,$seven # next tweak value + vaddubm $tweak,$tweak,$tweak + le?vperm $in4,$in4,$in4,$leperm + vand $tmp,$tmp,$eighty7 + vxor $out4,$in4,$twk4 + xxlor 32+$in5, 0, 0 + vpermxor $tweak, $tweak, $tmp, $in5 + + lvx_u $in5,$x50,$inp + addi $inp,$inp,0x60 + vxor $twk5,$tweak,$rndkey0 + vsrab $tmp,$tweak,$seven # next tweak value + vaddubm $tweak,$tweak,$tweak + le?vperm $in5,$in5,$in5,$leperm + vand $tmp,$tmp,$eighty7 + vxor $out5,$in5,$twk5 + xxlor 32+$in0, 0, 0 + vpermxor $tweak, $tweak, $tmp, $in0 + + vxor v31,v31,$rndkey0 + mtctr $rounds + b Loop_xts_enc6x + +.align 5 +Loop_xts_enc6x: + vcipher $out0,$out0,v24 + vcipher $out1,$out1,v24 + vcipher $out2,$out2,v24 + vcipher $out3,$out3,v24 + vcipher $out4,$out4,v24 + vcipher $out5,$out5,v24 + lvx v24,$x20,$key_ # round[3] + addi $key_,$key_,0x20 + + vcipher $out0,$out0,v25 + vcipher $out1,$out1,v25 + vcipher $out2,$out2,v25 + vcipher $out3,$out3,v25 + vcipher $out4,$out4,v25 + vcipher $out5,$out5,v25 + lvx v25,$x10,$key_ # round[4] + bdnz Loop_xts_enc6x + + xxlor 32+$eighty7, 1, 1 # 0x010101..87 + + subic $len,$len,96 # $len-=96 + vxor $in0,$twk0,v31 # xor with last round key + vcipher $out0,$out0,v24 + vcipher $out1,$out1,v24 + vsrab $tmp,$tweak,$seven # next tweak value + vxor $twk0,$tweak,$rndkey0 + vaddubm $tweak,$tweak,$tweak + vcipher $out2,$out2,v24 + vcipher $out3,$out3,v24 + vcipher $out4,$out4,v24 + vcipher $out5,$out5,v24 + + subfe. 
r0,r0,r0 # borrow?-1:0 + vand $tmp,$tmp,$eighty7 + vcipher $out0,$out0,v25 + vcipher $out1,$out1,v25 + xxlor 32+$in1, 0, 0 + vpermxor $tweak, $tweak, $tmp, $in1 + vcipher $out2,$out2,v25 + vcipher $out3,$out3,v25 + vxor $in1,$twk1,v31 + vsrab $tmp,$tweak,$seven # next tweak value + vxor $twk1,$tweak,$rndkey0 + vcipher $out4,$out4,v25 + vcipher $out5,$out5,v25 + + and r0,r0,$len + vaddubm $tweak,$tweak,$tweak + vcipher $out0,$out0,v26 + vcipher $out1,$out1,v26 + vand $tmp,$tmp,$eighty7 + vcipher $out2,$out2,v26 + vcipher $out3,$out3,v26 + xxlor 32+$in2, 0, 0 + vpermxor $tweak, $tweak, $tmp, $in2 + vcipher $out4,$out4,v26 + vcipher $out5,$out5,v26 + + add $inp,$inp,r0 # $inp is adjusted in such + # way that at exit from the + # loop inX-in5 are loaded + # with last "words" + vxor $in2,$twk2,v31 + vsrab $tmp,$tweak,$seven # next tweak value + vxor $twk2,$tweak,$rndkey0 + vaddubm $tweak,$tweak,$tweak + vcipher $out0,$out0,v27 + vcipher $out1,$out1,v27 + vcipher $out2,$out2,v27 + vcipher $out3,$out3,v27 + vand $tmp,$tmp,$eighty7 + vcipher $out4,$out4,v27 + vcipher $out5,$out5,v27 + + addi $key_,$sp,$FRAME+15 # rewind $key_ + xxlor 32+$in3, 0, 0 + vpermxor $tweak, $tweak, $tmp, $in3 + vcipher $out0,$out0,v28 + vcipher $out1,$out1,v28 + vxor $in3,$twk3,v31 + vsrab $tmp,$tweak,$seven # next tweak value + vxor $twk3,$tweak,$rndkey0 + vcipher $out2,$out2,v28 + vcipher $out3,$out3,v28 + vaddubm $tweak,$tweak,$tweak + vcipher $out4,$out4,v28 + vcipher $out5,$out5,v28 + lvx v24,$x00,$key_ # re-pre-load round[1] + vand $tmp,$tmp,$eighty7 + + vcipher $out0,$out0,v29 + vcipher $out1,$out1,v29 + xxlor 32+$in4, 0, 0 + vpermxor $tweak, $tweak, $tmp, $in4 + vcipher $out2,$out2,v29 + vcipher $out3,$out3,v29 + vxor $in4,$twk4,v31 + vsrab $tmp,$tweak,$seven # next tweak value + vxor $twk4,$tweak,$rndkey0 + vcipher $out4,$out4,v29 + vcipher $out5,$out5,v29 + lvx v25,$x10,$key_ # re-pre-load round[2] + vaddubm $tweak,$tweak,$tweak + + vcipher $out0,$out0,v30 + vcipher $out1,$out1,v30 + vand $tmp,$tmp,$eighty7 + vcipher $out2,$out2,v30 + vcipher $out3,$out3,v30 + xxlor 32+$in5, 0, 0 + vpermxor $tweak, $tweak, $tmp, $in5 + vcipher $out4,$out4,v30 + vcipher $out5,$out5,v30 + vxor $in5,$twk5,v31 + vsrab $tmp,$tweak,$seven # next tweak value + vxor $twk5,$tweak,$rndkey0 + + vcipherlast $out0,$out0,$in0 + lvx_u $in0,$x00,$inp # load next input block + vaddubm $tweak,$tweak,$tweak + vcipherlast $out1,$out1,$in1 + lvx_u $in1,$x10,$inp + vcipherlast $out2,$out2,$in2 + le?vperm $in0,$in0,$in0,$leperm + lvx_u $in2,$x20,$inp + vand $tmp,$tmp,$eighty7 + vcipherlast $out3,$out3,$in3 + le?vperm $in1,$in1,$in1,$leperm + lvx_u $in3,$x30,$inp + vcipherlast $out4,$out4,$in4 + le?vperm $in2,$in2,$in2,$leperm + lvx_u $in4,$x40,$inp + xxlor 10, 32+$in0, 32+$in0 + xxlor 32+$in0, 0, 0 + vpermxor $tweak, $tweak, $tmp, $in0 + xxlor 32+$in0, 10, 10 + vcipherlast $tmp,$out5,$in5 # last block might be needed + # in stealing mode + le?vperm $in3,$in3,$in3,$leperm + lvx_u $in5,$x50,$inp + addi $inp,$inp,0x60 + le?vperm $in4,$in4,$in4,$leperm + le?vperm $in5,$in5,$in5,$leperm + + le?vperm $out0,$out0,$out0,$leperm + le?vperm $out1,$out1,$out1,$leperm + stvx_u $out0,$x00,$out # store output + vxor $out0,$in0,$twk0 + le?vperm $out2,$out2,$out2,$leperm + stvx_u $out1,$x10,$out + vxor $out1,$in1,$twk1 + le?vperm $out3,$out3,$out3,$leperm + stvx_u $out2,$x20,$out + vxor $out2,$in2,$twk2 + le?vperm $out4,$out4,$out4,$leperm + stvx_u $out3,$x30,$out + vxor $out3,$in3,$twk3 + le?vperm $out5,$tmp,$tmp,$leperm + stvx_u $out4,$x40,$out + vxor 
$out4,$in4,$twk4 + le?stvx_u $out5,$x50,$out + be?stvx_u $tmp, $x50,$out + vxor $out5,$in5,$twk5 + addi $out,$out,0x60 + + mtctr $rounds + beq Loop_xts_enc6x # did $len-=96 borrow? + + xxlor 32+$eighty7, 2, 2 # 0x010101..87 + + addic. $len,$len,0x60 + beq Lxts_enc6x_zero + cmpwi $len,0x20 + blt Lxts_enc6x_one + nop + beq Lxts_enc6x_two + cmpwi $len,0x40 + blt Lxts_enc6x_three + nop + beq Lxts_enc6x_four + +Lxts_enc6x_five: + vxor $out0,$in1,$twk0 + vxor $out1,$in2,$twk1 + vxor $out2,$in3,$twk2 + vxor $out3,$in4,$twk3 + vxor $out4,$in5,$twk4 + + bl _aesp8_xts_enc5x + + le?vperm $out0,$out0,$out0,$leperm + vmr $twk0,$twk5 # unused tweak + le?vperm $out1,$out1,$out1,$leperm + stvx_u $out0,$x00,$out # store output + le?vperm $out2,$out2,$out2,$leperm + stvx_u $out1,$x10,$out + le?vperm $out3,$out3,$out3,$leperm + stvx_u $out2,$x20,$out + vxor $tmp,$out4,$twk5 # last block prep for stealing + le?vperm $out4,$out4,$out4,$leperm + stvx_u $out3,$x30,$out + stvx_u $out4,$x40,$out + addi $out,$out,0x50 + bne Lxts_enc6x_steal + b Lxts_enc6x_done + +.align 4 +Lxts_enc6x_four: + vxor $out0,$in2,$twk0 + vxor $out1,$in3,$twk1 + vxor $out2,$in4,$twk2 + vxor $out3,$in5,$twk3 + vxor $out4,$out4,$out4 + + bl _aesp8_xts_enc5x + + le?vperm $out0,$out0,$out0,$leperm + vmr $twk0,$twk4 # unused tweak + le?vperm $out1,$out1,$out1,$leperm + stvx_u $out0,$x00,$out # store output + le?vperm $out2,$out2,$out2,$leperm + stvx_u $out1,$x10,$out + vxor $tmp,$out3,$twk4 # last block prep for stealing + le?vperm $out3,$out3,$out3,$leperm + stvx_u $out2,$x20,$out + stvx_u $out3,$x30,$out + addi $out,$out,0x40 + bne Lxts_enc6x_steal + b Lxts_enc6x_done + +.align 4 +Lxts_enc6x_three: + vxor $out0,$in3,$twk0 + vxor $out1,$in4,$twk1 + vxor $out2,$in5,$twk2 + vxor $out3,$out3,$out3 + vxor $out4,$out4,$out4 + + bl _aesp8_xts_enc5x + + le?vperm $out0,$out0,$out0,$leperm + vmr $twk0,$twk3 # unused tweak + le?vperm $out1,$out1,$out1,$leperm + stvx_u $out0,$x00,$out # store output + vxor $tmp,$out2,$twk3 # last block prep for stealing + le?vperm $out2,$out2,$out2,$leperm + stvx_u $out1,$x10,$out + stvx_u $out2,$x20,$out + addi $out,$out,0x30 + bne Lxts_enc6x_steal + b Lxts_enc6x_done + +.align 4 +Lxts_enc6x_two: + vxor $out0,$in4,$twk0 + vxor $out1,$in5,$twk1 + vxor $out2,$out2,$out2 + vxor $out3,$out3,$out3 + vxor $out4,$out4,$out4 + + bl _aesp8_xts_enc5x + + le?vperm $out0,$out0,$out0,$leperm + vmr $twk0,$twk2 # unused tweak + vxor $tmp,$out1,$twk2 # last block prep for stealing + le?vperm $out1,$out1,$out1,$leperm + stvx_u $out0,$x00,$out # store output + stvx_u $out1,$x10,$out + addi $out,$out,0x20 + bne Lxts_enc6x_steal + b Lxts_enc6x_done + +.align 4 +Lxts_enc6x_one: + vxor $out0,$in5,$twk0 + nop +Loop_xts_enc1x: + vcipher $out0,$out0,v24 + lvx v24,$x20,$key_ # round[3] + addi $key_,$key_,0x20 + + vcipher $out0,$out0,v25 + lvx v25,$x10,$key_ # round[4] + bdnz Loop_xts_enc1x + + add $inp,$inp,$taillen + cmpwi $taillen,0 + vcipher $out0,$out0,v24 + + subi $inp,$inp,16 + vcipher $out0,$out0,v25 + + lvsr $inpperm,0,$taillen + vcipher $out0,$out0,v26 + + lvx_u $in0,0,$inp + vcipher $out0,$out0,v27 + + addi $key_,$sp,$FRAME+15 # rewind $key_ + vcipher $out0,$out0,v28 + lvx v24,$x00,$key_ # re-pre-load round[1] + + vcipher $out0,$out0,v29 + lvx v25,$x10,$key_ # re-pre-load round[2] + vxor $twk0,$twk0,v31 + + le?vperm $in0,$in0,$in0,$leperm + vcipher $out0,$out0,v30 + + vperm $in0,$in0,$in0,$inpperm + vcipherlast $out0,$out0,$twk0 + + vmr $twk0,$twk1 # unused tweak + vxor $tmp,$out0,$twk1 # last block prep for stealing + le?vperm 
$out0,$out0,$out0,$leperm + stvx_u $out0,$x00,$out # store output + addi $out,$out,0x10 + bne Lxts_enc6x_steal + b Lxts_enc6x_done + +.align 4 +Lxts_enc6x_zero: + cmpwi $taillen,0 + beq Lxts_enc6x_done + + add $inp,$inp,$taillen + subi $inp,$inp,16 + lvx_u $in0,0,$inp + lvsr $inpperm,0,$taillen # $in5 is no more + le?vperm $in0,$in0,$in0,$leperm + vperm $in0,$in0,$in0,$inpperm + vxor $tmp,$tmp,$twk0 +Lxts_enc6x_steal: + vxor $in0,$in0,$twk0 + vxor $out0,$out0,$out0 + vspltisb $out1,-1 + vperm $out0,$out0,$out1,$inpperm + vsel $out0,$in0,$tmp,$out0 # $tmp is last block, remember? + + subi r30,$out,17 + subi $out,$out,16 + mtctr $taillen +Loop_xts_enc6x_steal: + lbzu r0,1(r30) + stb r0,16(r30) + bdnz Loop_xts_enc6x_steal + + li $taillen,0 + mtctr $rounds + b Loop_xts_enc1x # one more time... + +.align 4 +Lxts_enc6x_done: + ${UCMP}i $ivp,0 + beq Lxts_enc6x_ret + + vxor $tweak,$twk0,$rndkey0 + le?vperm $tweak,$tweak,$tweak,$leperm + stvx_u $tweak,0,$ivp + +Lxts_enc6x_ret: + mtlr r11 + li r10,`$FRAME+15` + li r11,`$FRAME+31` + stvx $seven,r10,$sp # wipe copies of round keys + addi r10,r10,32 + stvx $seven,r11,$sp + addi r11,r11,32 + stvx $seven,r10,$sp + addi r10,r10,32 + stvx $seven,r11,$sp + addi r11,r11,32 + stvx $seven,r10,$sp + addi r10,r10,32 + stvx $seven,r11,$sp + addi r11,r11,32 + stvx $seven,r10,$sp + addi r10,r10,32 + stvx $seven,r11,$sp + addi r11,r11,32 + + mtspr 256,$vrsave + lvx v20,r10,$sp # ABI says so + addi r10,r10,32 + lvx v21,r11,$sp + addi r11,r11,32 + lvx v22,r10,$sp + addi r10,r10,32 + lvx v23,r11,$sp + addi r11,r11,32 + lvx v24,r10,$sp + addi r10,r10,32 + lvx v25,r11,$sp + addi r11,r11,32 + lvx v26,r10,$sp + addi r10,r10,32 + lvx v27,r11,$sp + addi r11,r11,32 + lvx v28,r10,$sp + addi r10,r10,32 + lvx v29,r11,$sp + addi r11,r11,32 + lvx v30,r10,$sp + lvx v31,r11,$sp + $POP r26,`$FRAME+21*16+0*$SIZE_T`($sp) + $POP r27,`$FRAME+21*16+1*$SIZE_T`($sp) + $POP r28,`$FRAME+21*16+2*$SIZE_T`($sp) + $POP r29,`$FRAME+21*16+3*$SIZE_T`($sp) + $POP r30,`$FRAME+21*16+4*$SIZE_T`($sp) + $POP r31,`$FRAME+21*16+5*$SIZE_T`($sp) + addi $sp,$sp,`$FRAME+21*16+6*$SIZE_T` + blr + .long 0 + .byte 0,12,0x04,1,0x80,6,6,0 + .long 0 + +.align 5 +_aesp8_xts_enc5x: + vcipher $out0,$out0,v24 + vcipher $out1,$out1,v24 + vcipher $out2,$out2,v24 + vcipher $out3,$out3,v24 + vcipher $out4,$out4,v24 + lvx v24,$x20,$key_ # round[3] + addi $key_,$key_,0x20 + + vcipher $out0,$out0,v25 + vcipher $out1,$out1,v25 + vcipher $out2,$out2,v25 + vcipher $out3,$out3,v25 + vcipher $out4,$out4,v25 + lvx v25,$x10,$key_ # round[4] + bdnz _aesp8_xts_enc5x + + add $inp,$inp,$taillen + cmpwi $taillen,0 + vcipher $out0,$out0,v24 + vcipher $out1,$out1,v24 + vcipher $out2,$out2,v24 + vcipher $out3,$out3,v24 + vcipher $out4,$out4,v24 + + subi $inp,$inp,16 + vcipher $out0,$out0,v25 + vcipher $out1,$out1,v25 + vcipher $out2,$out2,v25 + vcipher $out3,$out3,v25 + vcipher $out4,$out4,v25 + vxor $twk0,$twk0,v31 + + vcipher $out0,$out0,v26 + lvsr $inpperm,r0,$taillen # $in5 is no more + vcipher $out1,$out1,v26 + vcipher $out2,$out2,v26 + vcipher $out3,$out3,v26 + vcipher $out4,$out4,v26 + vxor $in1,$twk1,v31 + + vcipher $out0,$out0,v27 + lvx_u $in0,0,$inp + vcipher $out1,$out1,v27 + vcipher $out2,$out2,v27 + vcipher $out3,$out3,v27 + vcipher $out4,$out4,v27 + vxor $in2,$twk2,v31 + + addi $key_,$sp,$FRAME+15 # rewind $key_ + vcipher $out0,$out0,v28 + vcipher $out1,$out1,v28 + vcipher $out2,$out2,v28 + vcipher $out3,$out3,v28 + vcipher $out4,$out4,v28 + lvx v24,$x00,$key_ # re-pre-load round[1] + vxor $in3,$twk3,v31 + + vcipher 
$out0,$out0,v29 + le?vperm $in0,$in0,$in0,$leperm + vcipher $out1,$out1,v29 + vcipher $out2,$out2,v29 + vcipher $out3,$out3,v29 + vcipher $out4,$out4,v29 + lvx v25,$x10,$key_ # re-pre-load round[2] + vxor $in4,$twk4,v31 + + vcipher $out0,$out0,v30 + vperm $in0,$in0,$in0,$inpperm + vcipher $out1,$out1,v30 + vcipher $out2,$out2,v30 + vcipher $out3,$out3,v30 + vcipher $out4,$out4,v30 + + vcipherlast $out0,$out0,$twk0 + vcipherlast $out1,$out1,$in1 + vcipherlast $out2,$out2,$in2 + vcipherlast $out3,$out3,$in3 + vcipherlast $out4,$out4,$in4 + blr + .long 0 + .byte 0,12,0x14,0,0,0,0,0 + +.align 5 +_aesp8_xts_decrypt6x: + $STU $sp,-`($FRAME+21*16+6*$SIZE_T)`($sp) + mflr r11 + li r7,`$FRAME+8*16+15` + li r3,`$FRAME+8*16+31` + $PUSH r11,`$FRAME+21*16+6*$SIZE_T+$LRSAVE`($sp) + stvx v20,r7,$sp # ABI says so + addi r7,r7,32 + stvx v21,r3,$sp + addi r3,r3,32 + stvx v22,r7,$sp + addi r7,r7,32 + stvx v23,r3,$sp + addi r3,r3,32 + stvx v24,r7,$sp + addi r7,r7,32 + stvx v25,r3,$sp + addi r3,r3,32 + stvx v26,r7,$sp + addi r7,r7,32 + stvx v27,r3,$sp + addi r3,r3,32 + stvx v28,r7,$sp + addi r7,r7,32 + stvx v29,r3,$sp + addi r3,r3,32 + stvx v30,r7,$sp + stvx v31,r3,$sp + li r0,-1 + stw $vrsave,`$FRAME+21*16-4`($sp) # save vrsave + li $x10,0x10 + $PUSH r26,`$FRAME+21*16+0*$SIZE_T`($sp) + li $x20,0x20 + $PUSH r27,`$FRAME+21*16+1*$SIZE_T`($sp) + li $x30,0x30 + $PUSH r28,`$FRAME+21*16+2*$SIZE_T`($sp) + li $x40,0x40 + $PUSH r29,`$FRAME+21*16+3*$SIZE_T`($sp) + li $x50,0x50 + $PUSH r30,`$FRAME+21*16+4*$SIZE_T`($sp) + li $x60,0x60 + $PUSH r31,`$FRAME+21*16+5*$SIZE_T`($sp) + li $x70,0x70 + mtspr 256,r0 + + xxlor 2, 32+$eighty7, 32+$eighty7 + vsldoi $eighty7,$tmp,$eighty7,1 # 0x010101..87 + xxlor 1, 32+$eighty7, 32+$eighty7 + + # Load XOR Lconsts. + mr $x70, r6 + bl Lconsts + lxvw4x 0, $x40, r6 # load XOR contents + mr r6, $x70 + li $x70,0x70 + + subi $rounds,$rounds,3 # -4 in total + + lvx $rndkey0,$x00,$key1 # load key schedule + lvx v30,$x10,$key1 + addi $key1,$key1,0x20 + lvx v31,$x00,$key1 + ?vperm $rndkey0,$rndkey0,v30,$keyperm + addi $key_,$sp,$FRAME+15 + mtctr $rounds + +Load_xts_dec_key: + ?vperm v24,v30,v31,$keyperm + lvx v30,$x10,$key1 + addi $key1,$key1,0x20 + stvx v24,$x00,$key_ # off-load round[1] + ?vperm v25,v31,v30,$keyperm + lvx v31,$x00,$key1 + stvx v25,$x10,$key_ # off-load round[2] + addi $key_,$key_,0x20 + bdnz Load_xts_dec_key + + lvx v26,$x10,$key1 + ?vperm v24,v30,v31,$keyperm + lvx v27,$x20,$key1 + stvx v24,$x00,$key_ # off-load round[3] + ?vperm v25,v31,v26,$keyperm + lvx v28,$x30,$key1 + stvx v25,$x10,$key_ # off-load round[4] + addi $key_,$sp,$FRAME+15 # rewind $key_ + ?vperm v26,v26,v27,$keyperm + lvx v29,$x40,$key1 + ?vperm v27,v27,v28,$keyperm + lvx v30,$x50,$key1 + ?vperm v28,v28,v29,$keyperm + lvx v31,$x60,$key1 + ?vperm v29,v29,v30,$keyperm + lvx $twk5,$x70,$key1 # borrow $twk5 + ?vperm v30,v30,v31,$keyperm + lvx v24,$x00,$key_ # pre-load round[1] + ?vperm v31,v31,$twk5,$keyperm + lvx v25,$x10,$key_ # pre-load round[2] + + vperm $in0,$inout,$inptail,$inpperm + subi $inp,$inp,31 # undo "caller" + vxor $twk0,$tweak,$rndkey0 + vsrab $tmp,$tweak,$seven # next tweak value + vaddubm $tweak,$tweak,$tweak + vand $tmp,$tmp,$eighty7 + vxor $out0,$in0,$twk0 + xxlor 32+$in1, 0, 0 + vpermxor $tweak, $tweak, $tmp, $in1 + + lvx_u $in1,$x10,$inp + vxor $twk1,$tweak,$rndkey0 + vsrab $tmp,$tweak,$seven # next tweak value + vaddubm $tweak,$tweak,$tweak + le?vperm $in1,$in1,$in1,$leperm + vand $tmp,$tmp,$eighty7 + vxor $out1,$in1,$twk1 + xxlor 32+$in2, 0, 0 + vpermxor $tweak, $tweak, $tmp, $in2 + + lvx_u 
$in2,$x20,$inp + andi. $taillen,$len,15 + vxor $twk2,$tweak,$rndkey0 + vsrab $tmp,$tweak,$seven # next tweak value + vaddubm $tweak,$tweak,$tweak + le?vperm $in2,$in2,$in2,$leperm + vand $tmp,$tmp,$eighty7 + vxor $out2,$in2,$twk2 + xxlor 32+$in3, 0, 0 + vpermxor $tweak, $tweak, $tmp, $in3 + + lvx_u $in3,$x30,$inp + sub $len,$len,$taillen + vxor $twk3,$tweak,$rndkey0 + vsrab $tmp,$tweak,$seven # next tweak value + vaddubm $tweak,$tweak,$tweak + le?vperm $in3,$in3,$in3,$leperm + vand $tmp,$tmp,$eighty7 + vxor $out3,$in3,$twk3 + xxlor 32+$in4, 0, 0 + vpermxor $tweak, $tweak, $tmp, $in4 + + lvx_u $in4,$x40,$inp + subi $len,$len,0x60 + vxor $twk4,$tweak,$rndkey0 + vsrab $tmp,$tweak,$seven # next tweak value + vaddubm $tweak,$tweak,$tweak + le?vperm $in4,$in4,$in4,$leperm + vand $tmp,$tmp,$eighty7 + vxor $out4,$in4,$twk4 + xxlor 32+$in5, 0, 0 + vpermxor $tweak, $tweak, $tmp, $in5 + + lvx_u $in5,$x50,$inp + addi $inp,$inp,0x60 + vxor $twk5,$tweak,$rndkey0 + vsrab $tmp,$tweak,$seven # next tweak value + vaddubm $tweak,$tweak,$tweak + le?vperm $in5,$in5,$in5,$leperm + vand $tmp,$tmp,$eighty7 + vxor $out5,$in5,$twk5 + xxlor 32+$in0, 0, 0 + vpermxor $tweak, $tweak, $tmp, $in0 + + vxor v31,v31,$rndkey0 + mtctr $rounds + b Loop_xts_dec6x + +.align 5 +Loop_xts_dec6x: + vncipher $out0,$out0,v24 + vncipher $out1,$out1,v24 + vncipher $out2,$out2,v24 + vncipher $out3,$out3,v24 + vncipher $out4,$out4,v24 + vncipher $out5,$out5,v24 + lvx v24,$x20,$key_ # round[3] + addi $key_,$key_,0x20 + + vncipher $out0,$out0,v25 + vncipher $out1,$out1,v25 + vncipher $out2,$out2,v25 + vncipher $out3,$out3,v25 + vncipher $out4,$out4,v25 + vncipher $out5,$out5,v25 + lvx v25,$x10,$key_ # round[4] + bdnz Loop_xts_dec6x + + xxlor 32+$eighty7, 1, 1 # 0x010101..87 + + subic $len,$len,96 # $len-=96 + vxor $in0,$twk0,v31 # xor with last round key + vncipher $out0,$out0,v24 + vncipher $out1,$out1,v24 + vsrab $tmp,$tweak,$seven # next tweak value + vxor $twk0,$tweak,$rndkey0 + vaddubm $tweak,$tweak,$tweak + vncipher $out2,$out2,v24 + vncipher $out3,$out3,v24 + vncipher $out4,$out4,v24 + vncipher $out5,$out5,v24 + + subfe. 
r0,r0,r0 # borrow?-1:0 + vand $tmp,$tmp,$eighty7 + vncipher $out0,$out0,v25 + vncipher $out1,$out1,v25 + xxlor 32+$in1, 0, 0 + vpermxor $tweak, $tweak, $tmp, $in1 + vncipher $out2,$out2,v25 + vncipher $out3,$out3,v25 + vxor $in1,$twk1,v31 + vsrab $tmp,$tweak,$seven # next tweak value + vxor $twk1,$tweak,$rndkey0 + vncipher $out4,$out4,v25 + vncipher $out5,$out5,v25 + + and r0,r0,$len + vaddubm $tweak,$tweak,$tweak + vncipher $out0,$out0,v26 + vncipher $out1,$out1,v26 + vand $tmp,$tmp,$eighty7 + vncipher $out2,$out2,v26 + vncipher $out3,$out3,v26 + xxlor 32+$in2, 0, 0 + vpermxor $tweak, $tweak, $tmp, $in2 + vncipher $out4,$out4,v26 + vncipher $out5,$out5,v26 + + add $inp,$inp,r0 # $inp is adjusted in such + # way that at exit from the + # loop inX-in5 are loaded + # with last "words" + vxor $in2,$twk2,v31 + vsrab $tmp,$tweak,$seven # next tweak value + vxor $twk2,$tweak,$rndkey0 + vaddubm $tweak,$tweak,$tweak + vncipher $out0,$out0,v27 + vncipher $out1,$out1,v27 + vncipher $out2,$out2,v27 + vncipher $out3,$out3,v27 + vand $tmp,$tmp,$eighty7 + vncipher $out4,$out4,v27 + vncipher $out5,$out5,v27 + + addi $key_,$sp,$FRAME+15 # rewind $key_ + xxlor 32+$in3, 0, 0 + vpermxor $tweak, $tweak, $tmp, $in3 + vncipher $out0,$out0,v28 + vncipher $out1,$out1,v28 + vxor $in3,$twk3,v31 + vsrab $tmp,$tweak,$seven # next tweak value + vxor $twk3,$tweak,$rndkey0 + vncipher $out2,$out2,v28 + vncipher $out3,$out3,v28 + vaddubm $tweak,$tweak,$tweak + vncipher $out4,$out4,v28 + vncipher $out5,$out5,v28 + lvx v24,$x00,$key_ # re-pre-load round[1] + vand $tmp,$tmp,$eighty7 + + vncipher $out0,$out0,v29 + vncipher $out1,$out1,v29 + xxlor 32+$in4, 0, 0 + vpermxor $tweak, $tweak, $tmp, $in4 + vncipher $out2,$out2,v29 + vncipher $out3,$out3,v29 + vxor $in4,$twk4,v31 + vsrab $tmp,$tweak,$seven # next tweak value + vxor $twk4,$tweak,$rndkey0 + vncipher $out4,$out4,v29 + vncipher $out5,$out5,v29 + lvx v25,$x10,$key_ # re-pre-load round[2] + vaddubm $tweak,$tweak,$tweak + + vncipher $out0,$out0,v30 + vncipher $out1,$out1,v30 + vand $tmp,$tmp,$eighty7 + vncipher $out2,$out2,v30 + vncipher $out3,$out3,v30 + xxlor 32+$in5, 0, 0 + vpermxor $tweak, $tweak, $tmp, $in5 + vncipher $out4,$out4,v30 + vncipher $out5,$out5,v30 + vxor $in5,$twk5,v31 + vsrab $tmp,$tweak,$seven # next tweak value + vxor $twk5,$tweak,$rndkey0 + + vncipherlast $out0,$out0,$in0 + lvx_u $in0,$x00,$inp # load next input block + vaddubm $tweak,$tweak,$tweak + vncipherlast $out1,$out1,$in1 + lvx_u $in1,$x10,$inp + vncipherlast $out2,$out2,$in2 + le?vperm $in0,$in0,$in0,$leperm + lvx_u $in2,$x20,$inp + vand $tmp,$tmp,$eighty7 + vncipherlast $out3,$out3,$in3 + le?vperm $in1,$in1,$in1,$leperm + lvx_u $in3,$x30,$inp + vncipherlast $out4,$out4,$in4 + le?vperm $in2,$in2,$in2,$leperm + lvx_u $in4,$x40,$inp + xxlor 10, 32+$in0, 32+$in0 + xxlor 32+$in0, 0, 0 + vpermxor $tweak, $tweak, $tmp, $in0 + xxlor 32+$in0, 10, 10 + vncipherlast $out5,$out5,$in5 + le?vperm $in3,$in3,$in3,$leperm + lvx_u $in5,$x50,$inp + addi $inp,$inp,0x60 + le?vperm $in4,$in4,$in4,$leperm + le?vperm $in5,$in5,$in5,$leperm + + le?vperm $out0,$out0,$out0,$leperm + le?vperm $out1,$out1,$out1,$leperm + stvx_u $out0,$x00,$out # store output + vxor $out0,$in0,$twk0 + le?vperm $out2,$out2,$out2,$leperm + stvx_u $out1,$x10,$out + vxor $out1,$in1,$twk1 + le?vperm $out3,$out3,$out3,$leperm + stvx_u $out2,$x20,$out + vxor $out2,$in2,$twk2 + le?vperm $out4,$out4,$out4,$leperm + stvx_u $out3,$x30,$out + vxor $out3,$in3,$twk3 + le?vperm $out5,$out5,$out5,$leperm + stvx_u $out4,$x40,$out + vxor $out4,$in4,$twk4 + 
stvx_u $out5,$x50,$out + vxor $out5,$in5,$twk5 + addi $out,$out,0x60 + + mtctr $rounds + beq Loop_xts_dec6x # did $len-=96 borrow? + + xxlor 32+$eighty7, 2, 2 # 0x010101..87 + + addic. $len,$len,0x60 + beq Lxts_dec6x_zero + cmpwi $len,0x20 + blt Lxts_dec6x_one + nop + beq Lxts_dec6x_two + cmpwi $len,0x40 + blt Lxts_dec6x_three + nop + beq Lxts_dec6x_four + +Lxts_dec6x_five: + vxor $out0,$in1,$twk0 + vxor $out1,$in2,$twk1 + vxor $out2,$in3,$twk2 + vxor $out3,$in4,$twk3 + vxor $out4,$in5,$twk4 + + bl _aesp8_xts_dec5x + + le?vperm $out0,$out0,$out0,$leperm + vmr $twk0,$twk5 # unused tweak + vxor $twk1,$tweak,$rndkey0 + le?vperm $out1,$out1,$out1,$leperm + stvx_u $out0,$x00,$out # store output + vxor $out0,$in0,$twk1 + le?vperm $out2,$out2,$out2,$leperm + stvx_u $out1,$x10,$out + le?vperm $out3,$out3,$out3,$leperm + stvx_u $out2,$x20,$out + le?vperm $out4,$out4,$out4,$leperm + stvx_u $out3,$x30,$out + stvx_u $out4,$x40,$out + addi $out,$out,0x50 + bne Lxts_dec6x_steal + b Lxts_dec6x_done + +.align 4 +Lxts_dec6x_four: + vxor $out0,$in2,$twk0 + vxor $out1,$in3,$twk1 + vxor $out2,$in4,$twk2 + vxor $out3,$in5,$twk3 + vxor $out4,$out4,$out4 + + bl _aesp8_xts_dec5x + + le?vperm $out0,$out0,$out0,$leperm + vmr $twk0,$twk4 # unused tweak + vmr $twk1,$twk5 + le?vperm $out1,$out1,$out1,$leperm + stvx_u $out0,$x00,$out # store output + vxor $out0,$in0,$twk5 + le?vperm $out2,$out2,$out2,$leperm + stvx_u $out1,$x10,$out + le?vperm $out3,$out3,$out3,$leperm + stvx_u $out2,$x20,$out + stvx_u $out3,$x30,$out + addi $out,$out,0x40 + bne Lxts_dec6x_steal + b Lxts_dec6x_done + +.align 4 +Lxts_dec6x_three: + vxor $out0,$in3,$twk0 + vxor $out1,$in4,$twk1 + vxor $out2,$in5,$twk2 + vxor $out3,$out3,$out3 + vxor $out4,$out4,$out4 + + bl _aesp8_xts_dec5x + + le?vperm $out0,$out0,$out0,$leperm + vmr $twk0,$twk3 # unused tweak + vmr $twk1,$twk4 + le?vperm $out1,$out1,$out1,$leperm + stvx_u $out0,$x00,$out # store output + vxor $out0,$in0,$twk4 + le?vperm $out2,$out2,$out2,$leperm + stvx_u $out1,$x10,$out + stvx_u $out2,$x20,$out + addi $out,$out,0x30 + bne Lxts_dec6x_steal + b Lxts_dec6x_done + +.align 4 +Lxts_dec6x_two: + vxor $out0,$in4,$twk0 + vxor $out1,$in5,$twk1 + vxor $out2,$out2,$out2 + vxor $out3,$out3,$out3 + vxor $out4,$out4,$out4 + + bl _aesp8_xts_dec5x + + le?vperm $out0,$out0,$out0,$leperm + vmr $twk0,$twk2 # unused tweak + vmr $twk1,$twk3 + le?vperm $out1,$out1,$out1,$leperm + stvx_u $out0,$x00,$out # store output + vxor $out0,$in0,$twk3 + stvx_u $out1,$x10,$out + addi $out,$out,0x20 + bne Lxts_dec6x_steal + b Lxts_dec6x_done + +.align 4 +Lxts_dec6x_one: + vxor $out0,$in5,$twk0 + nop +Loop_xts_dec1x: + vncipher $out0,$out0,v24 + lvx v24,$x20,$key_ # round[3] + addi $key_,$key_,0x20 + + vncipher $out0,$out0,v25 + lvx v25,$x10,$key_ # round[4] + bdnz Loop_xts_dec1x + + subi r0,$taillen,1 + vncipher $out0,$out0,v24 + + andi. 
r0,r0,16 + cmpwi $taillen,0 + vncipher $out0,$out0,v25 + + sub $inp,$inp,r0 + vncipher $out0,$out0,v26 + + lvx_u $in0,0,$inp + vncipher $out0,$out0,v27 + + addi $key_,$sp,$FRAME+15 # rewind $key_ + vncipher $out0,$out0,v28 + lvx v24,$x00,$key_ # re-pre-load round[1] + + vncipher $out0,$out0,v29 + lvx v25,$x10,$key_ # re-pre-load round[2] + vxor $twk0,$twk0,v31 + + le?vperm $in0,$in0,$in0,$leperm + vncipher $out0,$out0,v30 + + mtctr $rounds + vncipherlast $out0,$out0,$twk0 + + vmr $twk0,$twk1 # unused tweak + vmr $twk1,$twk2 + le?vperm $out0,$out0,$out0,$leperm + stvx_u $out0,$x00,$out # store output + addi $out,$out,0x10 + vxor $out0,$in0,$twk2 + bne Lxts_dec6x_steal + b Lxts_dec6x_done + +.align 4 +Lxts_dec6x_zero: + cmpwi $taillen,0 + beq Lxts_dec6x_done + + lvx_u $in0,0,$inp + le?vperm $in0,$in0,$in0,$leperm + vxor $out0,$in0,$twk1 +Lxts_dec6x_steal: + vncipher $out0,$out0,v24 + lvx v24,$x20,$key_ # round[3] + addi $key_,$key_,0x20 + + vncipher $out0,$out0,v25 + lvx v25,$x10,$key_ # round[4] + bdnz Lxts_dec6x_steal + + add $inp,$inp,$taillen + vncipher $out0,$out0,v24 + + cmpwi $taillen,0 + vncipher $out0,$out0,v25 + + lvx_u $in0,0,$inp + vncipher $out0,$out0,v26 + + lvsr $inpperm,0,$taillen # $in5 is no more + vncipher $out0,$out0,v27 + + addi $key_,$sp,$FRAME+15 # rewind $key_ + vncipher $out0,$out0,v28 + lvx v24,$x00,$key_ # re-pre-load round[1] + + vncipher $out0,$out0,v29 + lvx v25,$x10,$key_ # re-pre-load round[2] + vxor $twk1,$twk1,v31 + + le?vperm $in0,$in0,$in0,$leperm + vncipher $out0,$out0,v30 + + vperm $in0,$in0,$in0,$inpperm + vncipherlast $tmp,$out0,$twk1 + + le?vperm $out0,$tmp,$tmp,$leperm + le?stvx_u $out0,0,$out + be?stvx_u $tmp,0,$out + + vxor $out0,$out0,$out0 + vspltisb $out1,-1 + vperm $out0,$out0,$out1,$inpperm + vsel $out0,$in0,$tmp,$out0 + vxor $out0,$out0,$twk0 + + subi r30,$out,1 + mtctr $taillen +Loop_xts_dec6x_steal: + lbzu r0,1(r30) + stb r0,16(r30) + bdnz Loop_xts_dec6x_steal + + li $taillen,0 + mtctr $rounds + b Loop_xts_dec1x # one more time... 
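+	# The vsel sequence above splices the leftover ciphertext tail
+	# into the block being re-decrypted, and the lbzu/stb loop moves
+	# the first $taillen bytes of the block just stored up into the
+	# final partial-block position; with $taillen now zero, one last
+	# pass through Loop_xts_dec1x completes the ciphertext stealing.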
+ +.align 4 +Lxts_dec6x_done: + ${UCMP}i $ivp,0 + beq Lxts_dec6x_ret + + vxor $tweak,$twk0,$rndkey0 + le?vperm $tweak,$tweak,$tweak,$leperm + stvx_u $tweak,0,$ivp + +Lxts_dec6x_ret: + mtlr r11 + li r10,`$FRAME+15` + li r11,`$FRAME+31` + stvx $seven,r10,$sp # wipe copies of round keys + addi r10,r10,32 + stvx $seven,r11,$sp + addi r11,r11,32 + stvx $seven,r10,$sp + addi r10,r10,32 + stvx $seven,r11,$sp + addi r11,r11,32 + stvx $seven,r10,$sp + addi r10,r10,32 + stvx $seven,r11,$sp + addi r11,r11,32 + stvx $seven,r10,$sp + addi r10,r10,32 + stvx $seven,r11,$sp + addi r11,r11,32 + + mtspr 256,$vrsave + lvx v20,r10,$sp # ABI says so + addi r10,r10,32 + lvx v21,r11,$sp + addi r11,r11,32 + lvx v22,r10,$sp + addi r10,r10,32 + lvx v23,r11,$sp + addi r11,r11,32 + lvx v24,r10,$sp + addi r10,r10,32 + lvx v25,r11,$sp + addi r11,r11,32 + lvx v26,r10,$sp + addi r10,r10,32 + lvx v27,r11,$sp + addi r11,r11,32 + lvx v28,r10,$sp + addi r10,r10,32 + lvx v29,r11,$sp + addi r11,r11,32 + lvx v30,r10,$sp + lvx v31,r11,$sp + $POP r26,`$FRAME+21*16+0*$SIZE_T`($sp) + $POP r27,`$FRAME+21*16+1*$SIZE_T`($sp) + $POP r28,`$FRAME+21*16+2*$SIZE_T`($sp) + $POP r29,`$FRAME+21*16+3*$SIZE_T`($sp) + $POP r30,`$FRAME+21*16+4*$SIZE_T`($sp) + $POP r31,`$FRAME+21*16+5*$SIZE_T`($sp) + addi $sp,$sp,`$FRAME+21*16+6*$SIZE_T` + blr + .long 0 + .byte 0,12,0x04,1,0x80,6,6,0 + .long 0 + +.align 5 +_aesp8_xts_dec5x: + vncipher $out0,$out0,v24 + vncipher $out1,$out1,v24 + vncipher $out2,$out2,v24 + vncipher $out3,$out3,v24 + vncipher $out4,$out4,v24 + lvx v24,$x20,$key_ # round[3] + addi $key_,$key_,0x20 + + vncipher $out0,$out0,v25 + vncipher $out1,$out1,v25 + vncipher $out2,$out2,v25 + vncipher $out3,$out3,v25 + vncipher $out4,$out4,v25 + lvx v25,$x10,$key_ # round[4] + bdnz _aesp8_xts_dec5x + + subi r0,$taillen,1 + vncipher $out0,$out0,v24 + vncipher $out1,$out1,v24 + vncipher $out2,$out2,v24 + vncipher $out3,$out3,v24 + vncipher $out4,$out4,v24 + + andi. 
r0,r0,16 + cmpwi $taillen,0 + vncipher $out0,$out0,v25 + vncipher $out1,$out1,v25 + vncipher $out2,$out2,v25 + vncipher $out3,$out3,v25 + vncipher $out4,$out4,v25 + vxor $twk0,$twk0,v31 + + sub $inp,$inp,r0 + vncipher $out0,$out0,v26 + vncipher $out1,$out1,v26 + vncipher $out2,$out2,v26 + vncipher $out3,$out3,v26 + vncipher $out4,$out4,v26 + vxor $in1,$twk1,v31 + + vncipher $out0,$out0,v27 + lvx_u $in0,0,$inp + vncipher $out1,$out1,v27 + vncipher $out2,$out2,v27 + vncipher $out3,$out3,v27 + vncipher $out4,$out4,v27 + vxor $in2,$twk2,v31 + + addi $key_,$sp,$FRAME+15 # rewind $key_ + vncipher $out0,$out0,v28 + vncipher $out1,$out1,v28 + vncipher $out2,$out2,v28 + vncipher $out3,$out3,v28 + vncipher $out4,$out4,v28 + lvx v24,$x00,$key_ # re-pre-load round[1] + vxor $in3,$twk3,v31 + + vncipher $out0,$out0,v29 + le?vperm $in0,$in0,$in0,$leperm + vncipher $out1,$out1,v29 + vncipher $out2,$out2,v29 + vncipher $out3,$out3,v29 + vncipher $out4,$out4,v29 + lvx v25,$x10,$key_ # re-pre-load round[2] + vxor $in4,$twk4,v31 + + vncipher $out0,$out0,v30 + vncipher $out1,$out1,v30 + vncipher $out2,$out2,v30 + vncipher $out3,$out3,v30 + vncipher $out4,$out4,v30 + + vncipherlast $out0,$out0,$twk0 + vncipherlast $out1,$out1,$in1 + vncipherlast $out2,$out2,$in2 + vncipherlast $out3,$out3,$in3 + vncipherlast $out4,$out4,$in4 + mtctr $rounds + blr + .long 0 + .byte 0,12,0x14,0,0,0,0,0 +___ +}} }}} + +my $consts=1; +foreach(split("\n",$code)) { + s/\`([^\`]*)\`/eval($1)/geo; + + # constants table endian-specific conversion + if ($consts && m/\.(long|byte)\s+(.+)\s+(\?[a-z]*)$/o) { + my $conv=$3; + my @bytes=(); + + # convert to endian-agnostic format + if ($1 eq "long") { + foreach (split(/,\s*/,$2)) { + my $l = /^0/?oct:int; + push @bytes,($l>>24)&0xff,($l>>16)&0xff,($l>>8)&0xff,$l&0xff; + } + } else { + @bytes = map(/^0/?oct:int,split(/,\s*/,$2)); + } + + # little-endian conversion + if ($flavour =~ /le$/o) { + SWITCH: for($conv) { + /\?inv/ && do { @bytes=map($_^0xf,@bytes); last; }; + /\?rev/ && do { @bytes=reverse(@bytes); last; }; + } + } + + #emit + print ".byte\t",join(',',map (sprintf("0x%02x",$_),@bytes)),"\n"; + next; + } + $consts=0 if (m/Lconsts:/o); # end of table + + # instructions prefixed with '?' are endian-specific and need + # to be adjusted accordingly... + if ($flavour =~ /le$/o) { # little-endian + s/le\?//o or + s/be\?/#be#/o or + s/\?lvsr/lvsl/o or + s/\?lvsl/lvsr/o or + s/\?(vperm\s+v[0-9]+,\s*)(v[0-9]+,\s*)(v[0-9]+,\s*)(v[0-9]+)/$1$3$2$4/o or + s/\?(vsldoi\s+v[0-9]+,\s*)(v[0-9]+,)\s*(v[0-9]+,\s*)([0-9]+)/$1$3$2 16-$4/o or + s/\?(vspltw\s+v[0-9]+,\s*)(v[0-9]+,)\s*([0-9])/$1$2 3-$3/o; + } else { # big-endian + s/le\?/#le#/o or + s/be\?//o or + s/\?([a-z]+)/$1/o; + } + + print $_,"\n"; +} + +close STDOUT; -- cgit v1.2.3 From 0cab15611e839142f4fd3c8a366acd1f7334b30b Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Mon, 12 Jan 2026 11:20:13 -0800 Subject: lib/crypto: s390/aes: Migrate optimized code into library Implement aes_preparekey_arch(), aes_encrypt_arch(), and aes_decrypt_arch() using the CPACF AES instructions. Then, remove the superseded "aes-s390" crypto_cipher. The result is that both the AES library and crypto_cipher APIs use the CPACF AES instructions, whereas previously only crypto_cipher did (and it wasn't enabled by default, which this commit fixes as well). Note that this preserves the optimization where the AES key is stored in raw form rather than expanded form. CPACF just takes the raw key. 
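To make the raw-key point concrete, here is a minimal, self-contained C
sketch of the dispatch pattern (every name and stub body below is an
illustrative stand-in, not the kernel's actual symbols; the real code also
gates each key length on its own static key rather than a runtime flag):

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define AES_BLOCK_SIZE   16
#define AES_MAX_KEY_SIZE 32

/* Stand-in for CPACF capability detection (cf. cpacf_test_func()). */
static bool hw_aes_available(unsigned int key_len)
{
	return key_len == 16;	/* pretend only AES-128 is wired up */
}

/* Stand-in for the KM instruction, which consumes the *raw* key. */
static void hw_aes_crypt(const uint8_t *raw_key, unsigned int key_len,
			 uint8_t out[AES_BLOCK_SIZE],
			 const uint8_t in[AES_BLOCK_SIZE])
{
	(void)raw_key; (void)key_len;
	memcpy(out, in, AES_BLOCK_SIZE);	/* stub, not real AES */
}

/* Stand-ins for the generic software fallback. */
static void generic_expandkey(uint32_t *rndkeys, const uint8_t *key,
			      unsigned int key_len)
{
	memcpy(rndkeys, key, key_len);	/* stub, not a real key schedule */
}

static void generic_encrypt(const uint32_t *rndkeys,
			    uint8_t out[AES_BLOCK_SIZE],
			    const uint8_t in[AES_BLOCK_SIZE])
{
	(void)rndkeys;
	memcpy(out, in, AES_BLOCK_SIZE);	/* stub, not real AES */
}

struct sketch_aes_key {
	unsigned int len;
	bool hw;	/* hardware handles this key length directly */
	union {
		uint8_t raw[AES_MAX_KEY_SIZE];	/* HW path: raw key only */
		uint32_t rndkeys[60];	/* SW path: expanded schedule */
	} u;
};

static void sketch_preparekey(struct sketch_aes_key *k,
			      const uint8_t *key, unsigned int len)
{
	k->len = len;
	k->hw = hw_aes_available(len);
	if (k->hw)
		memcpy(k->u.raw, key, len);	/* no key schedule at all */
	else
		generic_expandkey(k->u.rndkeys, key, len);
}

static void sketch_encrypt(const struct sketch_aes_key *k,
			   uint8_t out[AES_BLOCK_SIZE],
			   const uint8_t in[AES_BLOCK_SIZE])
{
	if (k->hw)
		hw_aes_crypt(k->u.raw, k->len, out, in);
	else
		generic_encrypt(k->u.rndkeys, out, in);
}

int main(void)
{
	static const uint8_t key[16], block[AES_BLOCK_SIZE];
	struct sketch_aes_key k;
	uint8_t out[AES_BLOCK_SIZE];

	sketch_preparekey(&k, key, sizeof(key));
	sketch_encrypt(&k, out, block);
	return 0;
}

The payoff is exactly the one described above: on the hardware path no
expanded key schedule ever exists, so key preparation degenerates to a
bounded memcpy and each per-block call hands the raw key straight to the
instruction.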
Acked-by: Ard Biesheuvel Tested-by: Holger Dengler Reviewed-by: Holger Dengler Link: https://lore.kernel.org/r/20260112192035.10427-16-ebiggers@kernel.org Signed-off-by: Eric Biggers --- arch/s390/crypto/Kconfig | 2 - arch/s390/crypto/aes_s390.c | 113 -------------------------------------------- include/crypto/aes.h | 3 ++ lib/crypto/Kconfig | 1 + lib/crypto/s390/aes.h | 106 +++++++++++++++++++++++++++++++++++++++++ 5 files changed, 110 insertions(+), 115 deletions(-) create mode 100644 lib/crypto/s390/aes.h (limited to 'include') diff --git a/arch/s390/crypto/Kconfig b/arch/s390/crypto/Kconfig index f838ca055f6d..79a2d0034258 100644 --- a/arch/s390/crypto/Kconfig +++ b/arch/s390/crypto/Kconfig @@ -14,10 +14,8 @@ config CRYPTO_GHASH_S390 config CRYPTO_AES_S390 tristate "Ciphers: AES, modes: ECB, CBC, CTR, XTS, GCM" - select CRYPTO_ALGAPI select CRYPTO_SKCIPHER help - Block cipher: AES cipher algorithms (FIPS 197) AEAD cipher: AES with GCM Length-preserving ciphers: AES with ECB, CBC, XTS, and CTR modes diff --git a/arch/s390/crypto/aes_s390.c b/arch/s390/crypto/aes_s390.c index d0a295435680..62edc66d5478 100644 --- a/arch/s390/crypto/aes_s390.c +++ b/arch/s390/crypto/aes_s390.c @@ -20,7 +20,6 @@ #include #include #include -#include #include #include #include @@ -45,7 +44,6 @@ struct s390_aes_ctx { unsigned long fc; union { struct crypto_skcipher *skcipher; - struct crypto_cipher *cip; } fallback; }; @@ -72,109 +70,6 @@ struct gcm_sg_walk { unsigned int nbytes; }; -static int setkey_fallback_cip(struct crypto_tfm *tfm, const u8 *in_key, - unsigned int key_len) -{ - struct s390_aes_ctx *sctx = crypto_tfm_ctx(tfm); - - sctx->fallback.cip->base.crt_flags &= ~CRYPTO_TFM_REQ_MASK; - sctx->fallback.cip->base.crt_flags |= (tfm->crt_flags & - CRYPTO_TFM_REQ_MASK); - - return crypto_cipher_setkey(sctx->fallback.cip, in_key, key_len); -} - -static int aes_set_key(struct crypto_tfm *tfm, const u8 *in_key, - unsigned int key_len) -{ - struct s390_aes_ctx *sctx = crypto_tfm_ctx(tfm); - unsigned long fc; - - /* Pick the correct function code based on the key length */ - fc = (key_len == 16) ? CPACF_KM_AES_128 : - (key_len == 24) ? CPACF_KM_AES_192 : - (key_len == 32) ? CPACF_KM_AES_256 : 0; - - /* Check if the function code is available */ - sctx->fc = (fc && cpacf_test_func(&km_functions, fc)) ? 
fc : 0; - if (!sctx->fc) - return setkey_fallback_cip(tfm, in_key, key_len); - - sctx->key_len = key_len; - memcpy(sctx->key, in_key, key_len); - return 0; -} - -static void crypto_aes_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in) -{ - struct s390_aes_ctx *sctx = crypto_tfm_ctx(tfm); - - if (unlikely(!sctx->fc)) { - crypto_cipher_encrypt_one(sctx->fallback.cip, out, in); - return; - } - cpacf_km(sctx->fc, &sctx->key, out, in, AES_BLOCK_SIZE); -} - -static void crypto_aes_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in) -{ - struct s390_aes_ctx *sctx = crypto_tfm_ctx(tfm); - - if (unlikely(!sctx->fc)) { - crypto_cipher_decrypt_one(sctx->fallback.cip, out, in); - return; - } - cpacf_km(sctx->fc | CPACF_DECRYPT, - &sctx->key, out, in, AES_BLOCK_SIZE); -} - -static int fallback_init_cip(struct crypto_tfm *tfm) -{ - const char *name = tfm->__crt_alg->cra_name; - struct s390_aes_ctx *sctx = crypto_tfm_ctx(tfm); - - sctx->fallback.cip = crypto_alloc_cipher(name, 0, - CRYPTO_ALG_NEED_FALLBACK); - - if (IS_ERR(sctx->fallback.cip)) { - pr_err("Allocating AES fallback algorithm %s failed\n", - name); - return PTR_ERR(sctx->fallback.cip); - } - - return 0; -} - -static void fallback_exit_cip(struct crypto_tfm *tfm) -{ - struct s390_aes_ctx *sctx = crypto_tfm_ctx(tfm); - - crypto_free_cipher(sctx->fallback.cip); - sctx->fallback.cip = NULL; -} - -static struct crypto_alg aes_alg = { - .cra_name = "aes", - .cra_driver_name = "aes-s390", - .cra_priority = 300, - .cra_flags = CRYPTO_ALG_TYPE_CIPHER | - CRYPTO_ALG_NEED_FALLBACK, - .cra_blocksize = AES_BLOCK_SIZE, - .cra_ctxsize = sizeof(struct s390_aes_ctx), - .cra_module = THIS_MODULE, - .cra_init = fallback_init_cip, - .cra_exit = fallback_exit_cip, - .cra_u = { - .cipher = { - .cia_min_keysize = AES_MIN_KEY_SIZE, - .cia_max_keysize = AES_MAX_KEY_SIZE, - .cia_setkey = aes_set_key, - .cia_encrypt = crypto_aes_encrypt, - .cia_decrypt = crypto_aes_decrypt, - } - } -}; - static int setkey_fallback_skcipher(struct crypto_skcipher *tfm, const u8 *key, unsigned int len) { @@ -1049,7 +944,6 @@ static struct aead_alg gcm_aes_aead = { }, }; -static struct crypto_alg *aes_s390_alg; static struct skcipher_alg *aes_s390_skcipher_algs[5]; static int aes_s390_skciphers_num; static struct aead_alg *aes_s390_aead_alg; @@ -1066,8 +960,6 @@ static int aes_s390_register_skcipher(struct skcipher_alg *alg) static void aes_s390_fini(void) { - if (aes_s390_alg) - crypto_unregister_alg(aes_s390_alg); while (aes_s390_skciphers_num--) crypto_unregister_skcipher(aes_s390_skcipher_algs[aes_s390_skciphers_num]); if (ctrblk) @@ -1090,10 +982,6 @@ static int __init aes_s390_init(void) if (cpacf_test_func(&km_functions, CPACF_KM_AES_128) || cpacf_test_func(&km_functions, CPACF_KM_AES_192) || cpacf_test_func(&km_functions, CPACF_KM_AES_256)) { - ret = crypto_register_alg(&aes_alg); - if (ret) - goto out_err; - aes_s390_alg = &aes_alg; ret = aes_s390_register_skcipher(&ecb_aes_alg); if (ret) goto out_err; @@ -1156,4 +1044,3 @@ MODULE_ALIAS_CRYPTO("aes-all"); MODULE_DESCRIPTION("Rijndael (AES) Cipher Algorithm"); MODULE_LICENSE("GPL"); -MODULE_IMPORT_NS("CRYPTO_INTERNAL"); diff --git a/include/crypto/aes.h b/include/crypto/aes.h index bff71cfaedeb..19fd99f383fb 100644 --- a/include/crypto/aes.h +++ b/include/crypto/aes.h @@ -46,6 +46,9 @@ union aes_enckey_arch { * overlap rndkeys) is set to 0 to differentiate the two formats. 
*/ struct p8_aes_key p8; +#elif defined(CONFIG_S390) + /* Used when the CPU supports CPACF AES for this key's length */ + u8 raw_key[AES_MAX_KEY_SIZE]; #endif #endif /* CONFIG_CRYPTO_LIB_AES_ARCH */ }; diff --git a/lib/crypto/Kconfig b/lib/crypto/Kconfig index 2690b5ffc5ca..56a9b4f53b0e 100644 --- a/lib/crypto/Kconfig +++ b/lib/crypto/Kconfig @@ -19,6 +19,7 @@ config CRYPTO_LIB_AES_ARCH default y if PPC && (SPE || (PPC64 && VSX)) default y if RISCV && 64BIT && TOOLCHAIN_HAS_VECTOR_CRYPTO && \ RISCV_EFFICIENT_VECTOR_UNALIGNED_ACCESS + default y if S390 config CRYPTO_LIB_AESCFB tristate diff --git a/lib/crypto/s390/aes.h b/lib/crypto/s390/aes.h new file mode 100644 index 000000000000..5466f6ecbce7 --- /dev/null +++ b/lib/crypto/s390/aes.h @@ -0,0 +1,106 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * AES optimized using the CP Assist for Cryptographic Functions (CPACF) + * + * Copyright 2026 Google LLC + */ +#include +#include + +static __ro_after_init DEFINE_STATIC_KEY_FALSE(have_cpacf_aes128); +static __ro_after_init DEFINE_STATIC_KEY_FALSE(have_cpacf_aes192); +static __ro_after_init DEFINE_STATIC_KEY_FALSE(have_cpacf_aes256); + +/* + * When the CPU supports CPACF AES for the requested key length, we need only + * save a copy of the raw AES key, as that's what the CPACF instructions need. + * + * When unsupported, fall back to the generic key expansion and en/decryption. + */ +static void aes_preparekey_arch(union aes_enckey_arch *k, + union aes_invkey_arch *inv_k, + const u8 *in_key, int key_len, int nrounds) +{ + if (key_len == AES_KEYSIZE_128) { + if (static_branch_likely(&have_cpacf_aes128)) { + memcpy(k->raw_key, in_key, AES_KEYSIZE_128); + return; + } + } else if (key_len == AES_KEYSIZE_192) { + if (static_branch_likely(&have_cpacf_aes192)) { + memcpy(k->raw_key, in_key, AES_KEYSIZE_192); + return; + } + } else { + if (static_branch_likely(&have_cpacf_aes256)) { + memcpy(k->raw_key, in_key, AES_KEYSIZE_256); + return; + } + } + aes_expandkey_generic(k->rndkeys, inv_k ? 
inv_k->inv_rndkeys : NULL, + in_key, key_len); +} + +static inline bool aes_crypt_s390(const struct aes_enckey *key, + u8 out[AES_BLOCK_SIZE], + const u8 in[AES_BLOCK_SIZE], int decrypt) +{ + if (key->len == AES_KEYSIZE_128) { + if (static_branch_likely(&have_cpacf_aes128)) { + cpacf_km(CPACF_KM_AES_128 | decrypt, + (void *)key->k.raw_key, out, in, + AES_BLOCK_SIZE); + return true; + } + } else if (key->len == AES_KEYSIZE_192) { + if (static_branch_likely(&have_cpacf_aes192)) { + cpacf_km(CPACF_KM_AES_192 | decrypt, + (void *)key->k.raw_key, out, in, + AES_BLOCK_SIZE); + return true; + } + } else { + if (static_branch_likely(&have_cpacf_aes256)) { + cpacf_km(CPACF_KM_AES_256 | decrypt, + (void *)key->k.raw_key, out, in, + AES_BLOCK_SIZE); + return true; + } + } + return false; +} + +static void aes_encrypt_arch(const struct aes_enckey *key, + u8 out[AES_BLOCK_SIZE], + const u8 in[AES_BLOCK_SIZE]) +{ + if (likely(aes_crypt_s390(key, out, in, 0))) + return; + aes_encrypt_generic(key->k.rndkeys, key->nrounds, out, in); +} + +static void aes_decrypt_arch(const struct aes_key *key, + u8 out[AES_BLOCK_SIZE], + const u8 in[AES_BLOCK_SIZE]) +{ + if (likely(aes_crypt_s390((const struct aes_enckey *)key, out, in, + CPACF_DECRYPT))) + return; + aes_decrypt_generic(key->inv_k.inv_rndkeys, key->nrounds, out, in); +} + +#define aes_mod_init_arch aes_mod_init_arch +static void aes_mod_init_arch(void) +{ + if (cpu_have_feature(S390_CPU_FEATURE_MSA)) { + cpacf_mask_t km_functions; + + cpacf_query(CPACF_KM, &km_functions); + if (cpacf_test_func(&km_functions, CPACF_KM_AES_128)) + static_branch_enable(&have_cpacf_aes128); + if (cpacf_test_func(&km_functions, CPACF_KM_AES_192)) + static_branch_enable(&have_cpacf_aes192); + if (cpacf_test_func(&km_functions, CPACF_KM_AES_256)) + static_branch_enable(&have_cpacf_aes256); + } +} -- cgit v1.2.3 From 293c7cd5c6c00f3b6fa0072fc4b017a3a13ad1e7 Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Mon, 12 Jan 2026 11:20:14 -0800 Subject: lib/crypto: sparc/aes: Migrate optimized code into library Move the SPARC64 AES assembly code into lib/crypto/, wire the key expansion and single-block en/decryption functions up to the AES library API, and remove the "aes-sparc64" crypto_cipher algorithm. The result is that both the AES library and crypto_cipher APIs use the SPARC64 AES opcodes, whereas previously only crypto_cipher did (and it wasn't enabled by default, which this commit fixes as well). Note that some of the functions in the SPARC64 AES assembly code are still used by the AES mode implementations in arch/sparc/crypto/aes_glue.c. For now, just export these functions. These exports will go away once the AES mode implementations are migrated to the library as well. (Trying to split up the assembly file seemed like much more trouble than it would be worth.) 
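As a rough sketch of the "wiring up" this patch does for single-block
encryption, dispatching on key length to the per-size assembly entry
points might look like the following (the function bodies here are stubs
standing in for the real SPARC64 assembly, and the wrapper shape is an
assumption; only the three signatures match the extern declarations this
patch removes from aes_glue.c):

#include <stdint.h>
#include <string.h>

typedef uint32_t u32;
typedef uint64_t u64;

/* Stubs standing in for the SPARC64 assembly entry points. */
static void aes_sparc64_encrypt_128(const u64 *key, const u32 *input,
				    u32 *output)
{
	(void)key;
	memcpy(output, input, 16);	/* stub, not real AES */
}

static void aes_sparc64_encrypt_192(const u64 *key, const u32 *input,
				    u32 *output)
{
	(void)key;
	memcpy(output, input, 16);	/* stub, not real AES */
}

static void aes_sparc64_encrypt_256(const u64 *key, const u32 *input,
				    u32 *output)
{
	(void)key;
	memcpy(output, input, 16);	/* stub, not real AES */
}

struct sketch_sparc_aes_key {
	u64 expanded[30];	/* filled by the key-expansion routine */
	unsigned int key_len;	/* 16, 24, or 32 bytes */
};

/* Hypothetical library-facing wrapper: pick the entry point by length. */
static void sketch_encrypt_arch(const struct sketch_sparc_aes_key *k,
				u32 out[4], const u32 in[4])
{
	switch (k->key_len) {
	case 16:
		aes_sparc64_encrypt_128(k->expanded, in, out);
		break;
	case 24:
		aes_sparc64_encrypt_192(k->expanded, in, out);
		break;
	default:
		aes_sparc64_encrypt_256(k->expanded, in, out);
		break;
	}
}

int main(void)
{
	struct sketch_sparc_aes_key k = { .key_len = 16 };
	u32 in[4] = { 0 }, out[4];

	sketch_encrypt_arch(&k, out, in);
	return 0;
}

The decrypt side would dispatch the same way to the corresponding
_decrypt_ entry points, and the still-exported ECB/CBC/CTR helpers keep
their existing signatures until the mode implementations migrate as well.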
Acked-by: Ard Biesheuvel Link: https://lore.kernel.org/r/20260112192035.10427-17-ebiggers@kernel.org Signed-off-by: Eric Biggers --- arch/sparc/crypto/Kconfig | 2 +- arch/sparc/crypto/Makefile | 2 +- arch/sparc/crypto/aes_asm.S | 1543 ------------------------------------------ arch/sparc/crypto/aes_glue.c | 140 +--- include/crypto/aes.h | 42 ++ lib/crypto/Kconfig | 1 + lib/crypto/Makefile | 1 + lib/crypto/sparc/aes.h | 149 ++++ lib/crypto/sparc/aes_asm.S | 1543 ++++++++++++++++++++++++++++++++++++++++++ 9 files changed, 1743 insertions(+), 1680 deletions(-) delete mode 100644 arch/sparc/crypto/aes_asm.S create mode 100644 lib/crypto/sparc/aes.h create mode 100644 lib/crypto/sparc/aes_asm.S (limited to 'include') diff --git a/arch/sparc/crypto/Kconfig b/arch/sparc/crypto/Kconfig index f755da979534..c1932ce46c7f 100644 --- a/arch/sparc/crypto/Kconfig +++ b/arch/sparc/crypto/Kconfig @@ -19,9 +19,9 @@ config CRYPTO_DES_SPARC64 config CRYPTO_AES_SPARC64 tristate "Ciphers: AES, modes: ECB, CBC, CTR" depends on SPARC64 + select CRYPTO_LIB_AES select CRYPTO_SKCIPHER help - Block ciphers: AES cipher algorithms (FIPS-197) Length-preseving ciphers: AES with ECB, CBC, and CTR modes Architecture: sparc64 using crypto instructions diff --git a/arch/sparc/crypto/Makefile b/arch/sparc/crypto/Makefile index 7b4796842ddd..cdf9f4b3efbb 100644 --- a/arch/sparc/crypto/Makefile +++ b/arch/sparc/crypto/Makefile @@ -7,6 +7,6 @@ obj-$(CONFIG_CRYPTO_AES_SPARC64) += aes-sparc64.o obj-$(CONFIG_CRYPTO_DES_SPARC64) += des-sparc64.o obj-$(CONFIG_CRYPTO_CAMELLIA_SPARC64) += camellia-sparc64.o -aes-sparc64-y := aes_asm.o aes_glue.o +aes-sparc64-y := aes_glue.o des-sparc64-y := des_asm.o des_glue.o camellia-sparc64-y := camellia_asm.o camellia_glue.o diff --git a/arch/sparc/crypto/aes_asm.S b/arch/sparc/crypto/aes_asm.S deleted file mode 100644 index f291174a72a1..000000000000 --- a/arch/sparc/crypto/aes_asm.S +++ /dev/null @@ -1,1543 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -#include -#include -#include - -#define ENCRYPT_TWO_ROUNDS(KEY_BASE, I0, I1, T0, T1) \ - AES_EROUND01(KEY_BASE + 0, I0, I1, T0) \ - AES_EROUND23(KEY_BASE + 2, I0, I1, T1) \ - AES_EROUND01(KEY_BASE + 4, T0, T1, I0) \ - AES_EROUND23(KEY_BASE + 6, T0, T1, I1) - -#define ENCRYPT_TWO_ROUNDS_2(KEY_BASE, I0, I1, I2, I3, T0, T1, T2, T3) \ - AES_EROUND01(KEY_BASE + 0, I0, I1, T0) \ - AES_EROUND23(KEY_BASE + 2, I0, I1, T1) \ - AES_EROUND01(KEY_BASE + 0, I2, I3, T2) \ - AES_EROUND23(KEY_BASE + 2, I2, I3, T3) \ - AES_EROUND01(KEY_BASE + 4, T0, T1, I0) \ - AES_EROUND23(KEY_BASE + 6, T0, T1, I1) \ - AES_EROUND01(KEY_BASE + 4, T2, T3, I2) \ - AES_EROUND23(KEY_BASE + 6, T2, T3, I3) - -#define ENCRYPT_TWO_ROUNDS_LAST(KEY_BASE, I0, I1, T0, T1) \ - AES_EROUND01(KEY_BASE + 0, I0, I1, T0) \ - AES_EROUND23(KEY_BASE + 2, I0, I1, T1) \ - AES_EROUND01_L(KEY_BASE + 4, T0, T1, I0) \ - AES_EROUND23_L(KEY_BASE + 6, T0, T1, I1) - -#define ENCRYPT_TWO_ROUNDS_LAST_2(KEY_BASE, I0, I1, I2, I3, T0, T1, T2, T3) \ - AES_EROUND01(KEY_BASE + 0, I0, I1, T0) \ - AES_EROUND23(KEY_BASE + 2, I0, I1, T1) \ - AES_EROUND01(KEY_BASE + 0, I2, I3, T2) \ - AES_EROUND23(KEY_BASE + 2, I2, I3, T3) \ - AES_EROUND01_L(KEY_BASE + 4, T0, T1, I0) \ - AES_EROUND23_L(KEY_BASE + 6, T0, T1, I1) \ - AES_EROUND01_L(KEY_BASE + 4, T2, T3, I2) \ - AES_EROUND23_L(KEY_BASE + 6, T2, T3, I3) - - /* 10 rounds */ -#define ENCRYPT_128(KEY_BASE, I0, I1, T0, T1) \ - ENCRYPT_TWO_ROUNDS(KEY_BASE + 0, I0, I1, T0, T1) \ - ENCRYPT_TWO_ROUNDS(KEY_BASE + 8, I0, I1, T0, T1) \ - ENCRYPT_TWO_ROUNDS(KEY_BASE + 16, I0, I1, T0, T1) 
\ - ENCRYPT_TWO_ROUNDS(KEY_BASE + 24, I0, I1, T0, T1) \ - ENCRYPT_TWO_ROUNDS_LAST(KEY_BASE + 32, I0, I1, T0, T1) - -#define ENCRYPT_128_2(KEY_BASE, I0, I1, I2, I3, T0, T1, T2, T3) \ - ENCRYPT_TWO_ROUNDS_2(KEY_BASE + 0, I0, I1, I2, I3, T0, T1, T2, T3) \ - ENCRYPT_TWO_ROUNDS_2(KEY_BASE + 8, I0, I1, I2, I3, T0, T1, T2, T3) \ - ENCRYPT_TWO_ROUNDS_2(KEY_BASE + 16, I0, I1, I2, I3, T0, T1, T2, T3) \ - ENCRYPT_TWO_ROUNDS_2(KEY_BASE + 24, I0, I1, I2, I3, T0, T1, T2, T3) \ - ENCRYPT_TWO_ROUNDS_LAST_2(KEY_BASE + 32, I0, I1, I2, I3, T0, T1, T2, T3) - - /* 12 rounds */ -#define ENCRYPT_192(KEY_BASE, I0, I1, T0, T1) \ - ENCRYPT_TWO_ROUNDS(KEY_BASE + 0, I0, I1, T0, T1) \ - ENCRYPT_TWO_ROUNDS(KEY_BASE + 8, I0, I1, T0, T1) \ - ENCRYPT_TWO_ROUNDS(KEY_BASE + 16, I0, I1, T0, T1) \ - ENCRYPT_TWO_ROUNDS(KEY_BASE + 24, I0, I1, T0, T1) \ - ENCRYPT_TWO_ROUNDS(KEY_BASE + 32, I0, I1, T0, T1) \ - ENCRYPT_TWO_ROUNDS_LAST(KEY_BASE + 40, I0, I1, T0, T1) - -#define ENCRYPT_192_2(KEY_BASE, I0, I1, I2, I3, T0, T1, T2, T3) \ - ENCRYPT_TWO_ROUNDS_2(KEY_BASE + 0, I0, I1, I2, I3, T0, T1, T2, T3) \ - ENCRYPT_TWO_ROUNDS_2(KEY_BASE + 8, I0, I1, I2, I3, T0, T1, T2, T3) \ - ENCRYPT_TWO_ROUNDS_2(KEY_BASE + 16, I0, I1, I2, I3, T0, T1, T2, T3) \ - ENCRYPT_TWO_ROUNDS_2(KEY_BASE + 24, I0, I1, I2, I3, T0, T1, T2, T3) \ - ENCRYPT_TWO_ROUNDS_2(KEY_BASE + 32, I0, I1, I2, I3, T0, T1, T2, T3) \ - ENCRYPT_TWO_ROUNDS_LAST_2(KEY_BASE + 40, I0, I1, I2, I3, T0, T1, T2, T3) - - /* 14 rounds */ -#define ENCRYPT_256(KEY_BASE, I0, I1, T0, T1) \ - ENCRYPT_TWO_ROUNDS(KEY_BASE + 0, I0, I1, T0, T1) \ - ENCRYPT_TWO_ROUNDS(KEY_BASE + 8, I0, I1, T0, T1) \ - ENCRYPT_TWO_ROUNDS(KEY_BASE + 16, I0, I1, T0, T1) \ - ENCRYPT_TWO_ROUNDS(KEY_BASE + 24, I0, I1, T0, T1) \ - ENCRYPT_TWO_ROUNDS(KEY_BASE + 32, I0, I1, T0, T1) \ - ENCRYPT_TWO_ROUNDS(KEY_BASE + 40, I0, I1, T0, T1) \ - ENCRYPT_TWO_ROUNDS_LAST(KEY_BASE + 48, I0, I1, T0, T1) - -#define ENCRYPT_256_TWO_ROUNDS_2(KEY_BASE, I0, I1, I2, I3, TMP_BASE) \ - ENCRYPT_TWO_ROUNDS_2(KEY_BASE, I0, I1, I2, I3, \ - TMP_BASE + 0, TMP_BASE + 2, TMP_BASE + 4, TMP_BASE + 6) - -#define ENCRYPT_256_2(KEY_BASE, I0, I1, I2, I3) \ - ENCRYPT_256_TWO_ROUNDS_2(KEY_BASE + 0, I0, I1, I2, I3, KEY_BASE + 48) \ - ldd [%o0 + 0xd0], %f56; \ - ldd [%o0 + 0xd8], %f58; \ - ENCRYPT_256_TWO_ROUNDS_2(KEY_BASE + 8, I0, I1, I2, I3, KEY_BASE + 0) \ - ldd [%o0 + 0xe0], %f60; \ - ldd [%o0 + 0xe8], %f62; \ - ENCRYPT_256_TWO_ROUNDS_2(KEY_BASE + 16, I0, I1, I2, I3, KEY_BASE + 0) \ - ENCRYPT_256_TWO_ROUNDS_2(KEY_BASE + 24, I0, I1, I2, I3, KEY_BASE + 0) \ - ENCRYPT_256_TWO_ROUNDS_2(KEY_BASE + 32, I0, I1, I2, I3, KEY_BASE + 0) \ - ENCRYPT_256_TWO_ROUNDS_2(KEY_BASE + 40, I0, I1, I2, I3, KEY_BASE + 0) \ - AES_EROUND01(KEY_BASE + 48, I0, I1, KEY_BASE + 0) \ - AES_EROUND23(KEY_BASE + 50, I0, I1, KEY_BASE + 2) \ - AES_EROUND01(KEY_BASE + 48, I2, I3, KEY_BASE + 4) \ - AES_EROUND23(KEY_BASE + 50, I2, I3, KEY_BASE + 6) \ - AES_EROUND01_L(KEY_BASE + 52, KEY_BASE + 0, KEY_BASE + 2, I0) \ - AES_EROUND23_L(KEY_BASE + 54, KEY_BASE + 0, KEY_BASE + 2, I1) \ - ldd [%o0 + 0x10], %f8; \ - ldd [%o0 + 0x18], %f10; \ - AES_EROUND01_L(KEY_BASE + 52, KEY_BASE + 4, KEY_BASE + 6, I2) \ - AES_EROUND23_L(KEY_BASE + 54, KEY_BASE + 4, KEY_BASE + 6, I3) \ - ldd [%o0 + 0x20], %f12; \ - ldd [%o0 + 0x28], %f14; - -#define DECRYPT_TWO_ROUNDS(KEY_BASE, I0, I1, T0, T1) \ - AES_DROUND23(KEY_BASE + 0, I0, I1, T1) \ - AES_DROUND01(KEY_BASE + 2, I0, I1, T0) \ - AES_DROUND23(KEY_BASE + 4, T0, T1, I1) \ - AES_DROUND01(KEY_BASE + 6, T0, T1, I0) - -#define DECRYPT_TWO_ROUNDS_2(KEY_BASE, I0, I1, I2, I3, 
T0, T1, T2, T3) \ - AES_DROUND23(KEY_BASE + 0, I0, I1, T1) \ - AES_DROUND01(KEY_BASE + 2, I0, I1, T0) \ - AES_DROUND23(KEY_BASE + 0, I2, I3, T3) \ - AES_DROUND01(KEY_BASE + 2, I2, I3, T2) \ - AES_DROUND23(KEY_BASE + 4, T0, T1, I1) \ - AES_DROUND01(KEY_BASE + 6, T0, T1, I0) \ - AES_DROUND23(KEY_BASE + 4, T2, T3, I3) \ - AES_DROUND01(KEY_BASE + 6, T2, T3, I2) - -#define DECRYPT_TWO_ROUNDS_LAST(KEY_BASE, I0, I1, T0, T1) \ - AES_DROUND23(KEY_BASE + 0, I0, I1, T1) \ - AES_DROUND01(KEY_BASE + 2, I0, I1, T0) \ - AES_DROUND23_L(KEY_BASE + 4, T0, T1, I1) \ - AES_DROUND01_L(KEY_BASE + 6, T0, T1, I0) - -#define DECRYPT_TWO_ROUNDS_LAST_2(KEY_BASE, I0, I1, I2, I3, T0, T1, T2, T3) \ - AES_DROUND23(KEY_BASE + 0, I0, I1, T1) \ - AES_DROUND01(KEY_BASE + 2, I0, I1, T0) \ - AES_DROUND23(KEY_BASE + 0, I2, I3, T3) \ - AES_DROUND01(KEY_BASE + 2, I2, I3, T2) \ - AES_DROUND23_L(KEY_BASE + 4, T0, T1, I1) \ - AES_DROUND01_L(KEY_BASE + 6, T0, T1, I0) \ - AES_DROUND23_L(KEY_BASE + 4, T2, T3, I3) \ - AES_DROUND01_L(KEY_BASE + 6, T2, T3, I2) - - /* 10 rounds */ -#define DECRYPT_128(KEY_BASE, I0, I1, T0, T1) \ - DECRYPT_TWO_ROUNDS(KEY_BASE + 0, I0, I1, T0, T1) \ - DECRYPT_TWO_ROUNDS(KEY_BASE + 8, I0, I1, T0, T1) \ - DECRYPT_TWO_ROUNDS(KEY_BASE + 16, I0, I1, T0, T1) \ - DECRYPT_TWO_ROUNDS(KEY_BASE + 24, I0, I1, T0, T1) \ - DECRYPT_TWO_ROUNDS_LAST(KEY_BASE + 32, I0, I1, T0, T1) - -#define DECRYPT_128_2(KEY_BASE, I0, I1, I2, I3, T0, T1, T2, T3) \ - DECRYPT_TWO_ROUNDS_2(KEY_BASE + 0, I0, I1, I2, I3, T0, T1, T2, T3) \ - DECRYPT_TWO_ROUNDS_2(KEY_BASE + 8, I0, I1, I2, I3, T0, T1, T2, T3) \ - DECRYPT_TWO_ROUNDS_2(KEY_BASE + 16, I0, I1, I2, I3, T0, T1, T2, T3) \ - DECRYPT_TWO_ROUNDS_2(KEY_BASE + 24, I0, I1, I2, I3, T0, T1, T2, T3) \ - DECRYPT_TWO_ROUNDS_LAST_2(KEY_BASE + 32, I0, I1, I2, I3, T0, T1, T2, T3) - - /* 12 rounds */ -#define DECRYPT_192(KEY_BASE, I0, I1, T0, T1) \ - DECRYPT_TWO_ROUNDS(KEY_BASE + 0, I0, I1, T0, T1) \ - DECRYPT_TWO_ROUNDS(KEY_BASE + 8, I0, I1, T0, T1) \ - DECRYPT_TWO_ROUNDS(KEY_BASE + 16, I0, I1, T0, T1) \ - DECRYPT_TWO_ROUNDS(KEY_BASE + 24, I0, I1, T0, T1) \ - DECRYPT_TWO_ROUNDS(KEY_BASE + 32, I0, I1, T0, T1) \ - DECRYPT_TWO_ROUNDS_LAST(KEY_BASE + 40, I0, I1, T0, T1) - -#define DECRYPT_192_2(KEY_BASE, I0, I1, I2, I3, T0, T1, T2, T3) \ - DECRYPT_TWO_ROUNDS_2(KEY_BASE + 0, I0, I1, I2, I3, T0, T1, T2, T3) \ - DECRYPT_TWO_ROUNDS_2(KEY_BASE + 8, I0, I1, I2, I3, T0, T1, T2, T3) \ - DECRYPT_TWO_ROUNDS_2(KEY_BASE + 16, I0, I1, I2, I3, T0, T1, T2, T3) \ - DECRYPT_TWO_ROUNDS_2(KEY_BASE + 24, I0, I1, I2, I3, T0, T1, T2, T3) \ - DECRYPT_TWO_ROUNDS_2(KEY_BASE + 32, I0, I1, I2, I3, T0, T1, T2, T3) \ - DECRYPT_TWO_ROUNDS_LAST_2(KEY_BASE + 40, I0, I1, I2, I3, T0, T1, T2, T3) - - /* 14 rounds */ -#define DECRYPT_256(KEY_BASE, I0, I1, T0, T1) \ - DECRYPT_TWO_ROUNDS(KEY_BASE + 0, I0, I1, T0, T1) \ - DECRYPT_TWO_ROUNDS(KEY_BASE + 8, I0, I1, T0, T1) \ - DECRYPT_TWO_ROUNDS(KEY_BASE + 16, I0, I1, T0, T1) \ - DECRYPT_TWO_ROUNDS(KEY_BASE + 24, I0, I1, T0, T1) \ - DECRYPT_TWO_ROUNDS(KEY_BASE + 32, I0, I1, T0, T1) \ - DECRYPT_TWO_ROUNDS(KEY_BASE + 40, I0, I1, T0, T1) \ - DECRYPT_TWO_ROUNDS_LAST(KEY_BASE + 48, I0, I1, T0, T1) - -#define DECRYPT_256_TWO_ROUNDS_2(KEY_BASE, I0, I1, I2, I3, TMP_BASE) \ - DECRYPT_TWO_ROUNDS_2(KEY_BASE, I0, I1, I2, I3, \ - TMP_BASE + 0, TMP_BASE + 2, TMP_BASE + 4, TMP_BASE + 6) - -#define DECRYPT_256_2(KEY_BASE, I0, I1, I2, I3) \ - DECRYPT_256_TWO_ROUNDS_2(KEY_BASE + 0, I0, I1, I2, I3, KEY_BASE + 48) \ - ldd [%o0 + 0x18], %f56; \ - ldd [%o0 + 0x10], %f58; \ - DECRYPT_256_TWO_ROUNDS_2(KEY_BASE + 8, I0, 
I1, I2, I3, KEY_BASE + 0) \ - ldd [%o0 + 0x08], %f60; \ - ldd [%o0 + 0x00], %f62; \ - DECRYPT_256_TWO_ROUNDS_2(KEY_BASE + 16, I0, I1, I2, I3, KEY_BASE + 0) \ - DECRYPT_256_TWO_ROUNDS_2(KEY_BASE + 24, I0, I1, I2, I3, KEY_BASE + 0) \ - DECRYPT_256_TWO_ROUNDS_2(KEY_BASE + 32, I0, I1, I2, I3, KEY_BASE + 0) \ - DECRYPT_256_TWO_ROUNDS_2(KEY_BASE + 40, I0, I1, I2, I3, KEY_BASE + 0) \ - AES_DROUND23(KEY_BASE + 48, I0, I1, KEY_BASE + 2) \ - AES_DROUND01(KEY_BASE + 50, I0, I1, KEY_BASE + 0) \ - AES_DROUND23(KEY_BASE + 48, I2, I3, KEY_BASE + 6) \ - AES_DROUND01(KEY_BASE + 50, I2, I3, KEY_BASE + 4) \ - AES_DROUND23_L(KEY_BASE + 52, KEY_BASE + 0, KEY_BASE + 2, I1) \ - AES_DROUND01_L(KEY_BASE + 54, KEY_BASE + 0, KEY_BASE + 2, I0) \ - ldd [%o0 + 0xd8], %f8; \ - ldd [%o0 + 0xd0], %f10; \ - AES_DROUND23_L(KEY_BASE + 52, KEY_BASE + 4, KEY_BASE + 6, I3) \ - AES_DROUND01_L(KEY_BASE + 54, KEY_BASE + 4, KEY_BASE + 6, I2) \ - ldd [%o0 + 0xc8], %f12; \ - ldd [%o0 + 0xc0], %f14; - - .align 32 -ENTRY(aes_sparc64_key_expand) - /* %o0=input_key, %o1=output_key, %o2=key_len */ - VISEntry - ld [%o0 + 0x00], %f0 - ld [%o0 + 0x04], %f1 - ld [%o0 + 0x08], %f2 - ld [%o0 + 0x0c], %f3 - - std %f0, [%o1 + 0x00] - std %f2, [%o1 + 0x08] - add %o1, 0x10, %o1 - - cmp %o2, 24 - bl 2f - nop - - be 1f - nop - - /* 256-bit key expansion */ - ld [%o0 + 0x10], %f4 - ld [%o0 + 0x14], %f5 - ld [%o0 + 0x18], %f6 - ld [%o0 + 0x1c], %f7 - - std %f4, [%o1 + 0x00] - std %f6, [%o1 + 0x08] - add %o1, 0x10, %o1 - - AES_KEXPAND1(0, 6, 0x0, 8) - AES_KEXPAND2(2, 8, 10) - AES_KEXPAND0(4, 10, 12) - AES_KEXPAND2(6, 12, 14) - AES_KEXPAND1(8, 14, 0x1, 16) - AES_KEXPAND2(10, 16, 18) - AES_KEXPAND0(12, 18, 20) - AES_KEXPAND2(14, 20, 22) - AES_KEXPAND1(16, 22, 0x2, 24) - AES_KEXPAND2(18, 24, 26) - AES_KEXPAND0(20, 26, 28) - AES_KEXPAND2(22, 28, 30) - AES_KEXPAND1(24, 30, 0x3, 32) - AES_KEXPAND2(26, 32, 34) - AES_KEXPAND0(28, 34, 36) - AES_KEXPAND2(30, 36, 38) - AES_KEXPAND1(32, 38, 0x4, 40) - AES_KEXPAND2(34, 40, 42) - AES_KEXPAND0(36, 42, 44) - AES_KEXPAND2(38, 44, 46) - AES_KEXPAND1(40, 46, 0x5, 48) - AES_KEXPAND2(42, 48, 50) - AES_KEXPAND0(44, 50, 52) - AES_KEXPAND2(46, 52, 54) - AES_KEXPAND1(48, 54, 0x6, 56) - AES_KEXPAND2(50, 56, 58) - - std %f8, [%o1 + 0x00] - std %f10, [%o1 + 0x08] - std %f12, [%o1 + 0x10] - std %f14, [%o1 + 0x18] - std %f16, [%o1 + 0x20] - std %f18, [%o1 + 0x28] - std %f20, [%o1 + 0x30] - std %f22, [%o1 + 0x38] - std %f24, [%o1 + 0x40] - std %f26, [%o1 + 0x48] - std %f28, [%o1 + 0x50] - std %f30, [%o1 + 0x58] - std %f32, [%o1 + 0x60] - std %f34, [%o1 + 0x68] - std %f36, [%o1 + 0x70] - std %f38, [%o1 + 0x78] - std %f40, [%o1 + 0x80] - std %f42, [%o1 + 0x88] - std %f44, [%o1 + 0x90] - std %f46, [%o1 + 0x98] - std %f48, [%o1 + 0xa0] - std %f50, [%o1 + 0xa8] - std %f52, [%o1 + 0xb0] - std %f54, [%o1 + 0xb8] - std %f56, [%o1 + 0xc0] - ba,pt %xcc, 80f - std %f58, [%o1 + 0xc8] - -1: - /* 192-bit key expansion */ - ld [%o0 + 0x10], %f4 - ld [%o0 + 0x14], %f5 - - std %f4, [%o1 + 0x00] - add %o1, 0x08, %o1 - - AES_KEXPAND1(0, 4, 0x0, 6) - AES_KEXPAND2(2, 6, 8) - AES_KEXPAND2(4, 8, 10) - AES_KEXPAND1(6, 10, 0x1, 12) - AES_KEXPAND2(8, 12, 14) - AES_KEXPAND2(10, 14, 16) - AES_KEXPAND1(12, 16, 0x2, 18) - AES_KEXPAND2(14, 18, 20) - AES_KEXPAND2(16, 20, 22) - AES_KEXPAND1(18, 22, 0x3, 24) - AES_KEXPAND2(20, 24, 26) - AES_KEXPAND2(22, 26, 28) - AES_KEXPAND1(24, 28, 0x4, 30) - AES_KEXPAND2(26, 30, 32) - AES_KEXPAND2(28, 32, 34) - AES_KEXPAND1(30, 34, 0x5, 36) - AES_KEXPAND2(32, 36, 38) - AES_KEXPAND2(34, 38, 40) - AES_KEXPAND1(36, 40, 0x6, 42) - 
AES_KEXPAND2(38, 42, 44) - AES_KEXPAND2(40, 44, 46) - AES_KEXPAND1(42, 46, 0x7, 48) - AES_KEXPAND2(44, 48, 50) - - std %f6, [%o1 + 0x00] - std %f8, [%o1 + 0x08] - std %f10, [%o1 + 0x10] - std %f12, [%o1 + 0x18] - std %f14, [%o1 + 0x20] - std %f16, [%o1 + 0x28] - std %f18, [%o1 + 0x30] - std %f20, [%o1 + 0x38] - std %f22, [%o1 + 0x40] - std %f24, [%o1 + 0x48] - std %f26, [%o1 + 0x50] - std %f28, [%o1 + 0x58] - std %f30, [%o1 + 0x60] - std %f32, [%o1 + 0x68] - std %f34, [%o1 + 0x70] - std %f36, [%o1 + 0x78] - std %f38, [%o1 + 0x80] - std %f40, [%o1 + 0x88] - std %f42, [%o1 + 0x90] - std %f44, [%o1 + 0x98] - std %f46, [%o1 + 0xa0] - std %f48, [%o1 + 0xa8] - ba,pt %xcc, 80f - std %f50, [%o1 + 0xb0] - -2: - /* 128-bit key expansion */ - AES_KEXPAND1(0, 2, 0x0, 4) - AES_KEXPAND2(2, 4, 6) - AES_KEXPAND1(4, 6, 0x1, 8) - AES_KEXPAND2(6, 8, 10) - AES_KEXPAND1(8, 10, 0x2, 12) - AES_KEXPAND2(10, 12, 14) - AES_KEXPAND1(12, 14, 0x3, 16) - AES_KEXPAND2(14, 16, 18) - AES_KEXPAND1(16, 18, 0x4, 20) - AES_KEXPAND2(18, 20, 22) - AES_KEXPAND1(20, 22, 0x5, 24) - AES_KEXPAND2(22, 24, 26) - AES_KEXPAND1(24, 26, 0x6, 28) - AES_KEXPAND2(26, 28, 30) - AES_KEXPAND1(28, 30, 0x7, 32) - AES_KEXPAND2(30, 32, 34) - AES_KEXPAND1(32, 34, 0x8, 36) - AES_KEXPAND2(34, 36, 38) - AES_KEXPAND1(36, 38, 0x9, 40) - AES_KEXPAND2(38, 40, 42) - - std %f4, [%o1 + 0x00] - std %f6, [%o1 + 0x08] - std %f8, [%o1 + 0x10] - std %f10, [%o1 + 0x18] - std %f12, [%o1 + 0x20] - std %f14, [%o1 + 0x28] - std %f16, [%o1 + 0x30] - std %f18, [%o1 + 0x38] - std %f20, [%o1 + 0x40] - std %f22, [%o1 + 0x48] - std %f24, [%o1 + 0x50] - std %f26, [%o1 + 0x58] - std %f28, [%o1 + 0x60] - std %f30, [%o1 + 0x68] - std %f32, [%o1 + 0x70] - std %f34, [%o1 + 0x78] - std %f36, [%o1 + 0x80] - std %f38, [%o1 + 0x88] - std %f40, [%o1 + 0x90] - std %f42, [%o1 + 0x98] -80: - retl - VISExit -ENDPROC(aes_sparc64_key_expand) - - .align 32 -ENTRY(aes_sparc64_encrypt_128) - /* %o0=key, %o1=input, %o2=output */ - VISEntry - ld [%o1 + 0x00], %f4 - ld [%o1 + 0x04], %f5 - ld [%o1 + 0x08], %f6 - ld [%o1 + 0x0c], %f7 - ldd [%o0 + 0x00], %f8 - ldd [%o0 + 0x08], %f10 - ldd [%o0 + 0x10], %f12 - ldd [%o0 + 0x18], %f14 - ldd [%o0 + 0x20], %f16 - ldd [%o0 + 0x28], %f18 - ldd [%o0 + 0x30], %f20 - ldd [%o0 + 0x38], %f22 - ldd [%o0 + 0x40], %f24 - ldd [%o0 + 0x48], %f26 - ldd [%o0 + 0x50], %f28 - ldd [%o0 + 0x58], %f30 - ldd [%o0 + 0x60], %f32 - ldd [%o0 + 0x68], %f34 - ldd [%o0 + 0x70], %f36 - ldd [%o0 + 0x78], %f38 - ldd [%o0 + 0x80], %f40 - ldd [%o0 + 0x88], %f42 - ldd [%o0 + 0x90], %f44 - ldd [%o0 + 0x98], %f46 - ldd [%o0 + 0xa0], %f48 - ldd [%o0 + 0xa8], %f50 - fxor %f8, %f4, %f4 - fxor %f10, %f6, %f6 - ENCRYPT_128(12, 4, 6, 0, 2) - st %f4, [%o2 + 0x00] - st %f5, [%o2 + 0x04] - st %f6, [%o2 + 0x08] - st %f7, [%o2 + 0x0c] - retl - VISExit -ENDPROC(aes_sparc64_encrypt_128) - - .align 32 -ENTRY(aes_sparc64_encrypt_192) - /* %o0=key, %o1=input, %o2=output */ - VISEntry - ld [%o1 + 0x00], %f4 - ld [%o1 + 0x04], %f5 - ld [%o1 + 0x08], %f6 - ld [%o1 + 0x0c], %f7 - - ldd [%o0 + 0x00], %f8 - ldd [%o0 + 0x08], %f10 - - fxor %f8, %f4, %f4 - fxor %f10, %f6, %f6 - - ldd [%o0 + 0x10], %f8 - ldd [%o0 + 0x18], %f10 - ldd [%o0 + 0x20], %f12 - ldd [%o0 + 0x28], %f14 - add %o0, 0x20, %o0 - - ENCRYPT_TWO_ROUNDS(8, 4, 6, 0, 2) - - ldd [%o0 + 0x10], %f12 - ldd [%o0 + 0x18], %f14 - ldd [%o0 + 0x20], %f16 - ldd [%o0 + 0x28], %f18 - ldd [%o0 + 0x30], %f20 - ldd [%o0 + 0x38], %f22 - ldd [%o0 + 0x40], %f24 - ldd [%o0 + 0x48], %f26 - ldd [%o0 + 0x50], %f28 - ldd [%o0 + 0x58], %f30 - ldd [%o0 + 0x60], %f32 - ldd 
[%o0 + 0x68], %f34 - ldd [%o0 + 0x70], %f36 - ldd [%o0 + 0x78], %f38 - ldd [%o0 + 0x80], %f40 - ldd [%o0 + 0x88], %f42 - ldd [%o0 + 0x90], %f44 - ldd [%o0 + 0x98], %f46 - ldd [%o0 + 0xa0], %f48 - ldd [%o0 + 0xa8], %f50 - - - ENCRYPT_128(12, 4, 6, 0, 2) - - st %f4, [%o2 + 0x00] - st %f5, [%o2 + 0x04] - st %f6, [%o2 + 0x08] - st %f7, [%o2 + 0x0c] - - retl - VISExit -ENDPROC(aes_sparc64_encrypt_192) - - .align 32 -ENTRY(aes_sparc64_encrypt_256) - /* %o0=key, %o1=input, %o2=output */ - VISEntry - ld [%o1 + 0x00], %f4 - ld [%o1 + 0x04], %f5 - ld [%o1 + 0x08], %f6 - ld [%o1 + 0x0c], %f7 - - ldd [%o0 + 0x00], %f8 - ldd [%o0 + 0x08], %f10 - - fxor %f8, %f4, %f4 - fxor %f10, %f6, %f6 - - ldd [%o0 + 0x10], %f8 - - ldd [%o0 + 0x18], %f10 - ldd [%o0 + 0x20], %f12 - ldd [%o0 + 0x28], %f14 - add %o0, 0x20, %o0 - - ENCRYPT_TWO_ROUNDS(8, 4, 6, 0, 2) - - ldd [%o0 + 0x10], %f8 - - ldd [%o0 + 0x18], %f10 - ldd [%o0 + 0x20], %f12 - ldd [%o0 + 0x28], %f14 - add %o0, 0x20, %o0 - - ENCRYPT_TWO_ROUNDS(8, 4, 6, 0, 2) - - ldd [%o0 + 0x10], %f12 - ldd [%o0 + 0x18], %f14 - ldd [%o0 + 0x20], %f16 - ldd [%o0 + 0x28], %f18 - ldd [%o0 + 0x30], %f20 - ldd [%o0 + 0x38], %f22 - ldd [%o0 + 0x40], %f24 - ldd [%o0 + 0x48], %f26 - ldd [%o0 + 0x50], %f28 - ldd [%o0 + 0x58], %f30 - ldd [%o0 + 0x60], %f32 - ldd [%o0 + 0x68], %f34 - ldd [%o0 + 0x70], %f36 - ldd [%o0 + 0x78], %f38 - ldd [%o0 + 0x80], %f40 - ldd [%o0 + 0x88], %f42 - ldd [%o0 + 0x90], %f44 - ldd [%o0 + 0x98], %f46 - ldd [%o0 + 0xa0], %f48 - ldd [%o0 + 0xa8], %f50 - - ENCRYPT_128(12, 4, 6, 0, 2) - - st %f4, [%o2 + 0x00] - st %f5, [%o2 + 0x04] - st %f6, [%o2 + 0x08] - st %f7, [%o2 + 0x0c] - - retl - VISExit -ENDPROC(aes_sparc64_encrypt_256) - - .align 32 -ENTRY(aes_sparc64_decrypt_128) - /* %o0=key, %o1=input, %o2=output */ - VISEntry - ld [%o1 + 0x00], %f4 - ld [%o1 + 0x04], %f5 - ld [%o1 + 0x08], %f6 - ld [%o1 + 0x0c], %f7 - ldd [%o0 + 0xa0], %f8 - ldd [%o0 + 0xa8], %f10 - ldd [%o0 + 0x98], %f12 - ldd [%o0 + 0x90], %f14 - ldd [%o0 + 0x88], %f16 - ldd [%o0 + 0x80], %f18 - ldd [%o0 + 0x78], %f20 - ldd [%o0 + 0x70], %f22 - ldd [%o0 + 0x68], %f24 - ldd [%o0 + 0x60], %f26 - ldd [%o0 + 0x58], %f28 - ldd [%o0 + 0x50], %f30 - ldd [%o0 + 0x48], %f32 - ldd [%o0 + 0x40], %f34 - ldd [%o0 + 0x38], %f36 - ldd [%o0 + 0x30], %f38 - ldd [%o0 + 0x28], %f40 - ldd [%o0 + 0x20], %f42 - ldd [%o0 + 0x18], %f44 - ldd [%o0 + 0x10], %f46 - ldd [%o0 + 0x08], %f48 - ldd [%o0 + 0x00], %f50 - fxor %f8, %f4, %f4 - fxor %f10, %f6, %f6 - DECRYPT_128(12, 4, 6, 0, 2) - st %f4, [%o2 + 0x00] - st %f5, [%o2 + 0x04] - st %f6, [%o2 + 0x08] - st %f7, [%o2 + 0x0c] - retl - VISExit -ENDPROC(aes_sparc64_decrypt_128) - - .align 32 -ENTRY(aes_sparc64_decrypt_192) - /* %o0=key, %o1=input, %o2=output */ - VISEntry - ld [%o1 + 0x00], %f4 - ld [%o1 + 0x04], %f5 - ld [%o1 + 0x08], %f6 - ld [%o1 + 0x0c], %f7 - ldd [%o0 + 0xc0], %f8 - ldd [%o0 + 0xc8], %f10 - ldd [%o0 + 0xb8], %f12 - ldd [%o0 + 0xb0], %f14 - ldd [%o0 + 0xa8], %f16 - ldd [%o0 + 0xa0], %f18 - fxor %f8, %f4, %f4 - fxor %f10, %f6, %f6 - ldd [%o0 + 0x98], %f20 - ldd [%o0 + 0x90], %f22 - ldd [%o0 + 0x88], %f24 - ldd [%o0 + 0x80], %f26 - DECRYPT_TWO_ROUNDS(12, 4, 6, 0, 2) - ldd [%o0 + 0x78], %f28 - ldd [%o0 + 0x70], %f30 - ldd [%o0 + 0x68], %f32 - ldd [%o0 + 0x60], %f34 - ldd [%o0 + 0x58], %f36 - ldd [%o0 + 0x50], %f38 - ldd [%o0 + 0x48], %f40 - ldd [%o0 + 0x40], %f42 - ldd [%o0 + 0x38], %f44 - ldd [%o0 + 0x30], %f46 - ldd [%o0 + 0x28], %f48 - ldd [%o0 + 0x20], %f50 - ldd [%o0 + 0x18], %f52 - ldd [%o0 + 0x10], %f54 - ldd [%o0 + 0x08], %f56 - ldd [%o0 + 0x00], 
%f58 - DECRYPT_128(20, 4, 6, 0, 2) - st %f4, [%o2 + 0x00] - st %f5, [%o2 + 0x04] - st %f6, [%o2 + 0x08] - st %f7, [%o2 + 0x0c] - retl - VISExit -ENDPROC(aes_sparc64_decrypt_192) - - .align 32 -ENTRY(aes_sparc64_decrypt_256) - /* %o0=key, %o1=input, %o2=output */ - VISEntry - ld [%o1 + 0x00], %f4 - ld [%o1 + 0x04], %f5 - ld [%o1 + 0x08], %f6 - ld [%o1 + 0x0c], %f7 - ldd [%o0 + 0xe0], %f8 - ldd [%o0 + 0xe8], %f10 - ldd [%o0 + 0xd8], %f12 - ldd [%o0 + 0xd0], %f14 - ldd [%o0 + 0xc8], %f16 - fxor %f8, %f4, %f4 - ldd [%o0 + 0xc0], %f18 - fxor %f10, %f6, %f6 - ldd [%o0 + 0xb8], %f20 - AES_DROUND23(12, 4, 6, 2) - ldd [%o0 + 0xb0], %f22 - AES_DROUND01(14, 4, 6, 0) - ldd [%o0 + 0xa8], %f24 - AES_DROUND23(16, 0, 2, 6) - ldd [%o0 + 0xa0], %f26 - AES_DROUND01(18, 0, 2, 4) - ldd [%o0 + 0x98], %f12 - AES_DROUND23(20, 4, 6, 2) - ldd [%o0 + 0x90], %f14 - AES_DROUND01(22, 4, 6, 0) - ldd [%o0 + 0x88], %f16 - AES_DROUND23(24, 0, 2, 6) - ldd [%o0 + 0x80], %f18 - AES_DROUND01(26, 0, 2, 4) - ldd [%o0 + 0x78], %f20 - AES_DROUND23(12, 4, 6, 2) - ldd [%o0 + 0x70], %f22 - AES_DROUND01(14, 4, 6, 0) - ldd [%o0 + 0x68], %f24 - AES_DROUND23(16, 0, 2, 6) - ldd [%o0 + 0x60], %f26 - AES_DROUND01(18, 0, 2, 4) - ldd [%o0 + 0x58], %f28 - AES_DROUND23(20, 4, 6, 2) - ldd [%o0 + 0x50], %f30 - AES_DROUND01(22, 4, 6, 0) - ldd [%o0 + 0x48], %f32 - AES_DROUND23(24, 0, 2, 6) - ldd [%o0 + 0x40], %f34 - AES_DROUND01(26, 0, 2, 4) - ldd [%o0 + 0x38], %f36 - AES_DROUND23(28, 4, 6, 2) - ldd [%o0 + 0x30], %f38 - AES_DROUND01(30, 4, 6, 0) - ldd [%o0 + 0x28], %f40 - AES_DROUND23(32, 0, 2, 6) - ldd [%o0 + 0x20], %f42 - AES_DROUND01(34, 0, 2, 4) - ldd [%o0 + 0x18], %f44 - AES_DROUND23(36, 4, 6, 2) - ldd [%o0 + 0x10], %f46 - AES_DROUND01(38, 4, 6, 0) - ldd [%o0 + 0x08], %f48 - AES_DROUND23(40, 0, 2, 6) - ldd [%o0 + 0x00], %f50 - AES_DROUND01(42, 0, 2, 4) - AES_DROUND23(44, 4, 6, 2) - AES_DROUND01(46, 4, 6, 0) - AES_DROUND23_L(48, 0, 2, 6) - AES_DROUND01_L(50, 0, 2, 4) - st %f4, [%o2 + 0x00] - st %f5, [%o2 + 0x04] - st %f6, [%o2 + 0x08] - st %f7, [%o2 + 0x0c] - retl - VISExit -ENDPROC(aes_sparc64_decrypt_256) - - .align 32 -ENTRY(aes_sparc64_load_encrypt_keys_128) - /* %o0=key */ - VISEntry - ldd [%o0 + 0x10], %f8 - ldd [%o0 + 0x18], %f10 - ldd [%o0 + 0x20], %f12 - ldd [%o0 + 0x28], %f14 - ldd [%o0 + 0x30], %f16 - ldd [%o0 + 0x38], %f18 - ldd [%o0 + 0x40], %f20 - ldd [%o0 + 0x48], %f22 - ldd [%o0 + 0x50], %f24 - ldd [%o0 + 0x58], %f26 - ldd [%o0 + 0x60], %f28 - ldd [%o0 + 0x68], %f30 - ldd [%o0 + 0x70], %f32 - ldd [%o0 + 0x78], %f34 - ldd [%o0 + 0x80], %f36 - ldd [%o0 + 0x88], %f38 - ldd [%o0 + 0x90], %f40 - ldd [%o0 + 0x98], %f42 - ldd [%o0 + 0xa0], %f44 - retl - ldd [%o0 + 0xa8], %f46 -ENDPROC(aes_sparc64_load_encrypt_keys_128) - - .align 32 -ENTRY(aes_sparc64_load_encrypt_keys_192) - /* %o0=key */ - VISEntry - ldd [%o0 + 0x10], %f8 - ldd [%o0 + 0x18], %f10 - ldd [%o0 + 0x20], %f12 - ldd [%o0 + 0x28], %f14 - ldd [%o0 + 0x30], %f16 - ldd [%o0 + 0x38], %f18 - ldd [%o0 + 0x40], %f20 - ldd [%o0 + 0x48], %f22 - ldd [%o0 + 0x50], %f24 - ldd [%o0 + 0x58], %f26 - ldd [%o0 + 0x60], %f28 - ldd [%o0 + 0x68], %f30 - ldd [%o0 + 0x70], %f32 - ldd [%o0 + 0x78], %f34 - ldd [%o0 + 0x80], %f36 - ldd [%o0 + 0x88], %f38 - ldd [%o0 + 0x90], %f40 - ldd [%o0 + 0x98], %f42 - ldd [%o0 + 0xa0], %f44 - ldd [%o0 + 0xa8], %f46 - ldd [%o0 + 0xb0], %f48 - ldd [%o0 + 0xb8], %f50 - ldd [%o0 + 0xc0], %f52 - retl - ldd [%o0 + 0xc8], %f54 -ENDPROC(aes_sparc64_load_encrypt_keys_192) - - .align 32 -ENTRY(aes_sparc64_load_encrypt_keys_256) - /* %o0=key */ - VISEntry - ldd [%o0 + 
0x10], %f8 - ldd [%o0 + 0x18], %f10 - ldd [%o0 + 0x20], %f12 - ldd [%o0 + 0x28], %f14 - ldd [%o0 + 0x30], %f16 - ldd [%o0 + 0x38], %f18 - ldd [%o0 + 0x40], %f20 - ldd [%o0 + 0x48], %f22 - ldd [%o0 + 0x50], %f24 - ldd [%o0 + 0x58], %f26 - ldd [%o0 + 0x60], %f28 - ldd [%o0 + 0x68], %f30 - ldd [%o0 + 0x70], %f32 - ldd [%o0 + 0x78], %f34 - ldd [%o0 + 0x80], %f36 - ldd [%o0 + 0x88], %f38 - ldd [%o0 + 0x90], %f40 - ldd [%o0 + 0x98], %f42 - ldd [%o0 + 0xa0], %f44 - ldd [%o0 + 0xa8], %f46 - ldd [%o0 + 0xb0], %f48 - ldd [%o0 + 0xb8], %f50 - ldd [%o0 + 0xc0], %f52 - ldd [%o0 + 0xc8], %f54 - ldd [%o0 + 0xd0], %f56 - ldd [%o0 + 0xd8], %f58 - ldd [%o0 + 0xe0], %f60 - retl - ldd [%o0 + 0xe8], %f62 -ENDPROC(aes_sparc64_load_encrypt_keys_256) - - .align 32 -ENTRY(aes_sparc64_load_decrypt_keys_128) - /* %o0=key */ - VISEntry - ldd [%o0 + 0x98], %f8 - ldd [%o0 + 0x90], %f10 - ldd [%o0 + 0x88], %f12 - ldd [%o0 + 0x80], %f14 - ldd [%o0 + 0x78], %f16 - ldd [%o0 + 0x70], %f18 - ldd [%o0 + 0x68], %f20 - ldd [%o0 + 0x60], %f22 - ldd [%o0 + 0x58], %f24 - ldd [%o0 + 0x50], %f26 - ldd [%o0 + 0x48], %f28 - ldd [%o0 + 0x40], %f30 - ldd [%o0 + 0x38], %f32 - ldd [%o0 + 0x30], %f34 - ldd [%o0 + 0x28], %f36 - ldd [%o0 + 0x20], %f38 - ldd [%o0 + 0x18], %f40 - ldd [%o0 + 0x10], %f42 - ldd [%o0 + 0x08], %f44 - retl - ldd [%o0 + 0x00], %f46 -ENDPROC(aes_sparc64_load_decrypt_keys_128) - - .align 32 -ENTRY(aes_sparc64_load_decrypt_keys_192) - /* %o0=key */ - VISEntry - ldd [%o0 + 0xb8], %f8 - ldd [%o0 + 0xb0], %f10 - ldd [%o0 + 0xa8], %f12 - ldd [%o0 + 0xa0], %f14 - ldd [%o0 + 0x98], %f16 - ldd [%o0 + 0x90], %f18 - ldd [%o0 + 0x88], %f20 - ldd [%o0 + 0x80], %f22 - ldd [%o0 + 0x78], %f24 - ldd [%o0 + 0x70], %f26 - ldd [%o0 + 0x68], %f28 - ldd [%o0 + 0x60], %f30 - ldd [%o0 + 0x58], %f32 - ldd [%o0 + 0x50], %f34 - ldd [%o0 + 0x48], %f36 - ldd [%o0 + 0x40], %f38 - ldd [%o0 + 0x38], %f40 - ldd [%o0 + 0x30], %f42 - ldd [%o0 + 0x28], %f44 - ldd [%o0 + 0x20], %f46 - ldd [%o0 + 0x18], %f48 - ldd [%o0 + 0x10], %f50 - ldd [%o0 + 0x08], %f52 - retl - ldd [%o0 + 0x00], %f54 -ENDPROC(aes_sparc64_load_decrypt_keys_192) - - .align 32 -ENTRY(aes_sparc64_load_decrypt_keys_256) - /* %o0=key */ - VISEntry - ldd [%o0 + 0xd8], %f8 - ldd [%o0 + 0xd0], %f10 - ldd [%o0 + 0xc8], %f12 - ldd [%o0 + 0xc0], %f14 - ldd [%o0 + 0xb8], %f16 - ldd [%o0 + 0xb0], %f18 - ldd [%o0 + 0xa8], %f20 - ldd [%o0 + 0xa0], %f22 - ldd [%o0 + 0x98], %f24 - ldd [%o0 + 0x90], %f26 - ldd [%o0 + 0x88], %f28 - ldd [%o0 + 0x80], %f30 - ldd [%o0 + 0x78], %f32 - ldd [%o0 + 0x70], %f34 - ldd [%o0 + 0x68], %f36 - ldd [%o0 + 0x60], %f38 - ldd [%o0 + 0x58], %f40 - ldd [%o0 + 0x50], %f42 - ldd [%o0 + 0x48], %f44 - ldd [%o0 + 0x40], %f46 - ldd [%o0 + 0x38], %f48 - ldd [%o0 + 0x30], %f50 - ldd [%o0 + 0x28], %f52 - ldd [%o0 + 0x20], %f54 - ldd [%o0 + 0x18], %f56 - ldd [%o0 + 0x10], %f58 - ldd [%o0 + 0x08], %f60 - retl - ldd [%o0 + 0x00], %f62 -ENDPROC(aes_sparc64_load_decrypt_keys_256) - - .align 32 -ENTRY(aes_sparc64_ecb_encrypt_128) - /* %o0=key, %o1=input, %o2=output, %o3=len */ - ldx [%o0 + 0x00], %g1 - subcc %o3, 0x10, %o3 - be 10f - ldx [%o0 + 0x08], %g2 -1: ldx [%o1 + 0x00], %g3 - ldx [%o1 + 0x08], %g7 - ldx [%o1 + 0x10], %o4 - ldx [%o1 + 0x18], %o5 - xor %g1, %g3, %g3 - xor %g2, %g7, %g7 - MOVXTOD_G3_F4 - MOVXTOD_G7_F6 - xor %g1, %o4, %g3 - xor %g2, %o5, %g7 - MOVXTOD_G3_F60 - MOVXTOD_G7_F62 - ENCRYPT_128_2(8, 4, 6, 60, 62, 0, 2, 56, 58) - std %f4, [%o2 + 0x00] - std %f6, [%o2 + 0x08] - std %f60, [%o2 + 0x10] - std %f62, [%o2 + 0x18] - sub %o3, 0x20, %o3 - add %o1, 0x20, %o1 - brgz 
%o3, 1b - add %o2, 0x20, %o2 - brlz,pt %o3, 11f - nop -10: ldx [%o1 + 0x00], %g3 - ldx [%o1 + 0x08], %g7 - xor %g1, %g3, %g3 - xor %g2, %g7, %g7 - MOVXTOD_G3_F4 - MOVXTOD_G7_F6 - ENCRYPT_128(8, 4, 6, 0, 2) - std %f4, [%o2 + 0x00] - std %f6, [%o2 + 0x08] -11: retl - nop -ENDPROC(aes_sparc64_ecb_encrypt_128) - - .align 32 -ENTRY(aes_sparc64_ecb_encrypt_192) - /* %o0=key, %o1=input, %o2=output, %o3=len */ - ldx [%o0 + 0x00], %g1 - subcc %o3, 0x10, %o3 - be 10f - ldx [%o0 + 0x08], %g2 -1: ldx [%o1 + 0x00], %g3 - ldx [%o1 + 0x08], %g7 - ldx [%o1 + 0x10], %o4 - ldx [%o1 + 0x18], %o5 - xor %g1, %g3, %g3 - xor %g2, %g7, %g7 - MOVXTOD_G3_F4 - MOVXTOD_G7_F6 - xor %g1, %o4, %g3 - xor %g2, %o5, %g7 - MOVXTOD_G3_F60 - MOVXTOD_G7_F62 - ENCRYPT_192_2(8, 4, 6, 60, 62, 0, 2, 56, 58) - std %f4, [%o2 + 0x00] - std %f6, [%o2 + 0x08] - std %f60, [%o2 + 0x10] - std %f62, [%o2 + 0x18] - sub %o3, 0x20, %o3 - add %o1, 0x20, %o1 - brgz %o3, 1b - add %o2, 0x20, %o2 - brlz,pt %o3, 11f - nop -10: ldx [%o1 + 0x00], %g3 - ldx [%o1 + 0x08], %g7 - xor %g1, %g3, %g3 - xor %g2, %g7, %g7 - MOVXTOD_G3_F4 - MOVXTOD_G7_F6 - ENCRYPT_192(8, 4, 6, 0, 2) - std %f4, [%o2 + 0x00] - std %f6, [%o2 + 0x08] -11: retl - nop -ENDPROC(aes_sparc64_ecb_encrypt_192) - - .align 32 -ENTRY(aes_sparc64_ecb_encrypt_256) - /* %o0=key, %o1=input, %o2=output, %o3=len */ - ldx [%o0 + 0x00], %g1 - subcc %o3, 0x10, %o3 - be 10f - ldx [%o0 + 0x08], %g2 -1: ldx [%o1 + 0x00], %g3 - ldx [%o1 + 0x08], %g7 - ldx [%o1 + 0x10], %o4 - ldx [%o1 + 0x18], %o5 - xor %g1, %g3, %g3 - xor %g2, %g7, %g7 - MOVXTOD_G3_F4 - MOVXTOD_G7_F6 - xor %g1, %o4, %g3 - xor %g2, %o5, %g7 - MOVXTOD_G3_F0 - MOVXTOD_G7_F2 - ENCRYPT_256_2(8, 4, 6, 0, 2) - std %f4, [%o2 + 0x00] - std %f6, [%o2 + 0x08] - std %f0, [%o2 + 0x10] - std %f2, [%o2 + 0x18] - sub %o3, 0x20, %o3 - add %o1, 0x20, %o1 - brgz %o3, 1b - add %o2, 0x20, %o2 - brlz,pt %o3, 11f - nop -10: ldd [%o0 + 0xd0], %f56 - ldd [%o0 + 0xd8], %f58 - ldd [%o0 + 0xe0], %f60 - ldd [%o0 + 0xe8], %f62 - ldx [%o1 + 0x00], %g3 - ldx [%o1 + 0x08], %g7 - xor %g1, %g3, %g3 - xor %g2, %g7, %g7 - MOVXTOD_G3_F4 - MOVXTOD_G7_F6 - ENCRYPT_256(8, 4, 6, 0, 2) - std %f4, [%o2 + 0x00] - std %f6, [%o2 + 0x08] -11: retl - nop -ENDPROC(aes_sparc64_ecb_encrypt_256) - - .align 32 -ENTRY(aes_sparc64_ecb_decrypt_128) - /* %o0=&key[key_len], %o1=input, %o2=output, %o3=len */ - ldx [%o0 - 0x10], %g1 - subcc %o3, 0x10, %o3 - be 10f - ldx [%o0 - 0x08], %g2 -1: ldx [%o1 + 0x00], %g3 - ldx [%o1 + 0x08], %g7 - ldx [%o1 + 0x10], %o4 - ldx [%o1 + 0x18], %o5 - xor %g1, %g3, %g3 - xor %g2, %g7, %g7 - MOVXTOD_G3_F4 - MOVXTOD_G7_F6 - xor %g1, %o4, %g3 - xor %g2, %o5, %g7 - MOVXTOD_G3_F60 - MOVXTOD_G7_F62 - DECRYPT_128_2(8, 4, 6, 60, 62, 0, 2, 56, 58) - std %f4, [%o2 + 0x00] - std %f6, [%o2 + 0x08] - std %f60, [%o2 + 0x10] - std %f62, [%o2 + 0x18] - sub %o3, 0x20, %o3 - add %o1, 0x20, %o1 - brgz,pt %o3, 1b - add %o2, 0x20, %o2 - brlz,pt %o3, 11f - nop -10: ldx [%o1 + 0x00], %g3 - ldx [%o1 + 0x08], %g7 - xor %g1, %g3, %g3 - xor %g2, %g7, %g7 - MOVXTOD_G3_F4 - MOVXTOD_G7_F6 - DECRYPT_128(8, 4, 6, 0, 2) - std %f4, [%o2 + 0x00] - std %f6, [%o2 + 0x08] -11: retl - nop -ENDPROC(aes_sparc64_ecb_decrypt_128) - - .align 32 -ENTRY(aes_sparc64_ecb_decrypt_192) - /* %o0=&key[key_len], %o1=input, %o2=output, %o3=len */ - ldx [%o0 - 0x10], %g1 - subcc %o3, 0x10, %o3 - be 10f - ldx [%o0 - 0x08], %g2 -1: ldx [%o1 + 0x00], %g3 - ldx [%o1 + 0x08], %g7 - ldx [%o1 + 0x10], %o4 - ldx [%o1 + 0x18], %o5 - xor %g1, %g3, %g3 - xor %g2, %g7, %g7 - MOVXTOD_G3_F4 - MOVXTOD_G7_F6 - xor %g1, %o4, %g3 - 
xor %g2, %o5, %g7 - MOVXTOD_G3_F60 - MOVXTOD_G7_F62 - DECRYPT_192_2(8, 4, 6, 60, 62, 0, 2, 56, 58) - std %f4, [%o2 + 0x00] - std %f6, [%o2 + 0x08] - std %f60, [%o2 + 0x10] - std %f62, [%o2 + 0x18] - sub %o3, 0x20, %o3 - add %o1, 0x20, %o1 - brgz,pt %o3, 1b - add %o2, 0x20, %o2 - brlz,pt %o3, 11f - nop -10: ldx [%o1 + 0x00], %g3 - ldx [%o1 + 0x08], %g7 - xor %g1, %g3, %g3 - xor %g2, %g7, %g7 - MOVXTOD_G3_F4 - MOVXTOD_G7_F6 - DECRYPT_192(8, 4, 6, 0, 2) - std %f4, [%o2 + 0x00] - std %f6, [%o2 + 0x08] -11: retl - nop -ENDPROC(aes_sparc64_ecb_decrypt_192) - - .align 32 -ENTRY(aes_sparc64_ecb_decrypt_256) - /* %o0=&key[key_len], %o1=input, %o2=output, %o3=len */ - ldx [%o0 - 0x10], %g1 - subcc %o3, 0x10, %o3 - ldx [%o0 - 0x08], %g2 - be 10f - sub %o0, 0xf0, %o0 -1: ldx [%o1 + 0x00], %g3 - ldx [%o1 + 0x08], %g7 - ldx [%o1 + 0x10], %o4 - ldx [%o1 + 0x18], %o5 - xor %g1, %g3, %g3 - xor %g2, %g7, %g7 - MOVXTOD_G3_F4 - MOVXTOD_G7_F6 - xor %g1, %o4, %g3 - xor %g2, %o5, %g7 - MOVXTOD_G3_F0 - MOVXTOD_G7_F2 - DECRYPT_256_2(8, 4, 6, 0, 2) - std %f4, [%o2 + 0x00] - std %f6, [%o2 + 0x08] - std %f0, [%o2 + 0x10] - std %f2, [%o2 + 0x18] - sub %o3, 0x20, %o3 - add %o1, 0x20, %o1 - brgz,pt %o3, 1b - add %o2, 0x20, %o2 - brlz,pt %o3, 11f - nop -10: ldd [%o0 + 0x18], %f56 - ldd [%o0 + 0x10], %f58 - ldd [%o0 + 0x08], %f60 - ldd [%o0 + 0x00], %f62 - ldx [%o1 + 0x00], %g3 - ldx [%o1 + 0x08], %g7 - xor %g1, %g3, %g3 - xor %g2, %g7, %g7 - MOVXTOD_G3_F4 - MOVXTOD_G7_F6 - DECRYPT_256(8, 4, 6, 0, 2) - std %f4, [%o2 + 0x00] - std %f6, [%o2 + 0x08] -11: retl - nop -ENDPROC(aes_sparc64_ecb_decrypt_256) - - .align 32 -ENTRY(aes_sparc64_cbc_encrypt_128) - /* %o0=key, %o1=input, %o2=output, %o3=len, %o4=IV */ - ldd [%o4 + 0x00], %f4 - ldd [%o4 + 0x08], %f6 - ldx [%o0 + 0x00], %g1 - ldx [%o0 + 0x08], %g2 -1: ldx [%o1 + 0x00], %g3 - ldx [%o1 + 0x08], %g7 - add %o1, 0x10, %o1 - xor %g1, %g3, %g3 - xor %g2, %g7, %g7 - MOVXTOD_G3_F0 - MOVXTOD_G7_F2 - fxor %f4, %f0, %f4 - fxor %f6, %f2, %f6 - ENCRYPT_128(8, 4, 6, 0, 2) - std %f4, [%o2 + 0x00] - std %f6, [%o2 + 0x08] - subcc %o3, 0x10, %o3 - bne,pt %xcc, 1b - add %o2, 0x10, %o2 - std %f4, [%o4 + 0x00] - std %f6, [%o4 + 0x08] - retl - nop -ENDPROC(aes_sparc64_cbc_encrypt_128) - - .align 32 -ENTRY(aes_sparc64_cbc_encrypt_192) - /* %o0=key, %o1=input, %o2=output, %o3=len, %o4=IV */ - ldd [%o4 + 0x00], %f4 - ldd [%o4 + 0x08], %f6 - ldx [%o0 + 0x00], %g1 - ldx [%o0 + 0x08], %g2 -1: ldx [%o1 + 0x00], %g3 - ldx [%o1 + 0x08], %g7 - add %o1, 0x10, %o1 - xor %g1, %g3, %g3 - xor %g2, %g7, %g7 - MOVXTOD_G3_F0 - MOVXTOD_G7_F2 - fxor %f4, %f0, %f4 - fxor %f6, %f2, %f6 - ENCRYPT_192(8, 4, 6, 0, 2) - std %f4, [%o2 + 0x00] - std %f6, [%o2 + 0x08] - subcc %o3, 0x10, %o3 - bne,pt %xcc, 1b - add %o2, 0x10, %o2 - std %f4, [%o4 + 0x00] - std %f6, [%o4 + 0x08] - retl - nop -ENDPROC(aes_sparc64_cbc_encrypt_192) - - .align 32 -ENTRY(aes_sparc64_cbc_encrypt_256) - /* %o0=key, %o1=input, %o2=output, %o3=len, %o4=IV */ - ldd [%o4 + 0x00], %f4 - ldd [%o4 + 0x08], %f6 - ldx [%o0 + 0x00], %g1 - ldx [%o0 + 0x08], %g2 -1: ldx [%o1 + 0x00], %g3 - ldx [%o1 + 0x08], %g7 - add %o1, 0x10, %o1 - xor %g1, %g3, %g3 - xor %g2, %g7, %g7 - MOVXTOD_G3_F0 - MOVXTOD_G7_F2 - fxor %f4, %f0, %f4 - fxor %f6, %f2, %f6 - ENCRYPT_256(8, 4, 6, 0, 2) - std %f4, [%o2 + 0x00] - std %f6, [%o2 + 0x08] - subcc %o3, 0x10, %o3 - bne,pt %xcc, 1b - add %o2, 0x10, %o2 - std %f4, [%o4 + 0x00] - std %f6, [%o4 + 0x08] - retl - nop -ENDPROC(aes_sparc64_cbc_encrypt_256) - - .align 32 -ENTRY(aes_sparc64_cbc_decrypt_128) - /* %o0=&key[key_len], %o1=input, 
%o2=output, %o3=len, %o4=iv */ - ldx [%o0 - 0x10], %g1 - ldx [%o0 - 0x08], %g2 - ldx [%o4 + 0x00], %o0 - ldx [%o4 + 0x08], %o5 -1: ldx [%o1 + 0x00], %g3 - ldx [%o1 + 0x08], %g7 - add %o1, 0x10, %o1 - xor %g1, %g3, %g3 - xor %g2, %g7, %g7 - MOVXTOD_G3_F4 - MOVXTOD_G7_F6 - DECRYPT_128(8, 4, 6, 0, 2) - MOVXTOD_O0_F0 - MOVXTOD_O5_F2 - xor %g1, %g3, %o0 - xor %g2, %g7, %o5 - fxor %f4, %f0, %f4 - fxor %f6, %f2, %f6 - std %f4, [%o2 + 0x00] - std %f6, [%o2 + 0x08] - subcc %o3, 0x10, %o3 - bne,pt %xcc, 1b - add %o2, 0x10, %o2 - stx %o0, [%o4 + 0x00] - stx %o5, [%o4 + 0x08] - retl - nop -ENDPROC(aes_sparc64_cbc_decrypt_128) - - .align 32 -ENTRY(aes_sparc64_cbc_decrypt_192) - /* %o0=&key[key_len], %o1=input, %o2=output, %o3=len, %o4=iv */ - ldx [%o0 - 0x10], %g1 - ldx [%o0 - 0x08], %g2 - ldx [%o4 + 0x00], %o0 - ldx [%o4 + 0x08], %o5 -1: ldx [%o1 + 0x00], %g3 - ldx [%o1 + 0x08], %g7 - add %o1, 0x10, %o1 - xor %g1, %g3, %g3 - xor %g2, %g7, %g7 - MOVXTOD_G3_F4 - MOVXTOD_G7_F6 - DECRYPT_192(8, 4, 6, 0, 2) - MOVXTOD_O0_F0 - MOVXTOD_O5_F2 - xor %g1, %g3, %o0 - xor %g2, %g7, %o5 - fxor %f4, %f0, %f4 - fxor %f6, %f2, %f6 - std %f4, [%o2 + 0x00] - std %f6, [%o2 + 0x08] - subcc %o3, 0x10, %o3 - bne,pt %xcc, 1b - add %o2, 0x10, %o2 - stx %o0, [%o4 + 0x00] - stx %o5, [%o4 + 0x08] - retl - nop -ENDPROC(aes_sparc64_cbc_decrypt_192) - - .align 32 -ENTRY(aes_sparc64_cbc_decrypt_256) - /* %o0=&key[key_len], %o1=input, %o2=output, %o3=len, %o4=iv */ - ldx [%o0 - 0x10], %g1 - ldx [%o0 - 0x08], %g2 - ldx [%o4 + 0x00], %o0 - ldx [%o4 + 0x08], %o5 -1: ldx [%o1 + 0x00], %g3 - ldx [%o1 + 0x08], %g7 - add %o1, 0x10, %o1 - xor %g1, %g3, %g3 - xor %g2, %g7, %g7 - MOVXTOD_G3_F4 - MOVXTOD_G7_F6 - DECRYPT_256(8, 4, 6, 0, 2) - MOVXTOD_O0_F0 - MOVXTOD_O5_F2 - xor %g1, %g3, %o0 - xor %g2, %g7, %o5 - fxor %f4, %f0, %f4 - fxor %f6, %f2, %f6 - std %f4, [%o2 + 0x00] - std %f6, [%o2 + 0x08] - subcc %o3, 0x10, %o3 - bne,pt %xcc, 1b - add %o2, 0x10, %o2 - stx %o0, [%o4 + 0x00] - stx %o5, [%o4 + 0x08] - retl - nop -ENDPROC(aes_sparc64_cbc_decrypt_256) - - .align 32 -ENTRY(aes_sparc64_ctr_crypt_128) - /* %o0=key, %o1=input, %o2=output, %o3=len, %o4=IV */ - ldx [%o4 + 0x00], %g3 - ldx [%o4 + 0x08], %g7 - subcc %o3, 0x10, %o3 - ldx [%o0 + 0x00], %g1 - be 10f - ldx [%o0 + 0x08], %g2 -1: xor %g1, %g3, %o5 - MOVXTOD_O5_F0 - xor %g2, %g7, %o5 - MOVXTOD_O5_F2 - add %g7, 1, %g7 - add %g3, 1, %o5 - movrz %g7, %o5, %g3 - xor %g1, %g3, %o5 - MOVXTOD_O5_F4 - xor %g2, %g7, %o5 - MOVXTOD_O5_F6 - add %g7, 1, %g7 - add %g3, 1, %o5 - movrz %g7, %o5, %g3 - ENCRYPT_128_2(8, 0, 2, 4, 6, 56, 58, 60, 62) - ldd [%o1 + 0x00], %f56 - ldd [%o1 + 0x08], %f58 - ldd [%o1 + 0x10], %f60 - ldd [%o1 + 0x18], %f62 - fxor %f56, %f0, %f56 - fxor %f58, %f2, %f58 - fxor %f60, %f4, %f60 - fxor %f62, %f6, %f62 - std %f56, [%o2 + 0x00] - std %f58, [%o2 + 0x08] - std %f60, [%o2 + 0x10] - std %f62, [%o2 + 0x18] - subcc %o3, 0x20, %o3 - add %o1, 0x20, %o1 - brgz %o3, 1b - add %o2, 0x20, %o2 - brlz,pt %o3, 11f - nop -10: xor %g1, %g3, %o5 - MOVXTOD_O5_F0 - xor %g2, %g7, %o5 - MOVXTOD_O5_F2 - add %g7, 1, %g7 - add %g3, 1, %o5 - movrz %g7, %o5, %g3 - ENCRYPT_128(8, 0, 2, 4, 6) - ldd [%o1 + 0x00], %f4 - ldd [%o1 + 0x08], %f6 - fxor %f4, %f0, %f4 - fxor %f6, %f2, %f6 - std %f4, [%o2 + 0x00] - std %f6, [%o2 + 0x08] -11: stx %g3, [%o4 + 0x00] - retl - stx %g7, [%o4 + 0x08] -ENDPROC(aes_sparc64_ctr_crypt_128) - - .align 32 -ENTRY(aes_sparc64_ctr_crypt_192) - /* %o0=key, %o1=input, %o2=output, %o3=len, %o4=IV */ - ldx [%o4 + 0x00], %g3 - ldx [%o4 + 0x08], %g7 - subcc %o3, 0x10, %o3 - ldx [%o0 
+ 0x00], %g1 - be 10f - ldx [%o0 + 0x08], %g2 -1: xor %g1, %g3, %o5 - MOVXTOD_O5_F0 - xor %g2, %g7, %o5 - MOVXTOD_O5_F2 - add %g7, 1, %g7 - add %g3, 1, %o5 - movrz %g7, %o5, %g3 - xor %g1, %g3, %o5 - MOVXTOD_O5_F4 - xor %g2, %g7, %o5 - MOVXTOD_O5_F6 - add %g7, 1, %g7 - add %g3, 1, %o5 - movrz %g7, %o5, %g3 - ENCRYPT_192_2(8, 0, 2, 4, 6, 56, 58, 60, 62) - ldd [%o1 + 0x00], %f56 - ldd [%o1 + 0x08], %f58 - ldd [%o1 + 0x10], %f60 - ldd [%o1 + 0x18], %f62 - fxor %f56, %f0, %f56 - fxor %f58, %f2, %f58 - fxor %f60, %f4, %f60 - fxor %f62, %f6, %f62 - std %f56, [%o2 + 0x00] - std %f58, [%o2 + 0x08] - std %f60, [%o2 + 0x10] - std %f62, [%o2 + 0x18] - subcc %o3, 0x20, %o3 - add %o1, 0x20, %o1 - brgz %o3, 1b - add %o2, 0x20, %o2 - brlz,pt %o3, 11f - nop -10: xor %g1, %g3, %o5 - MOVXTOD_O5_F0 - xor %g2, %g7, %o5 - MOVXTOD_O5_F2 - add %g7, 1, %g7 - add %g3, 1, %o5 - movrz %g7, %o5, %g3 - ENCRYPT_192(8, 0, 2, 4, 6) - ldd [%o1 + 0x00], %f4 - ldd [%o1 + 0x08], %f6 - fxor %f4, %f0, %f4 - fxor %f6, %f2, %f6 - std %f4, [%o2 + 0x00] - std %f6, [%o2 + 0x08] -11: stx %g3, [%o4 + 0x00] - retl - stx %g7, [%o4 + 0x08] -ENDPROC(aes_sparc64_ctr_crypt_192) - - .align 32 -ENTRY(aes_sparc64_ctr_crypt_256) - /* %o0=key, %o1=input, %o2=output, %o3=len, %o4=IV */ - ldx [%o4 + 0x00], %g3 - ldx [%o4 + 0x08], %g7 - subcc %o3, 0x10, %o3 - ldx [%o0 + 0x00], %g1 - be 10f - ldx [%o0 + 0x08], %g2 -1: xor %g1, %g3, %o5 - MOVXTOD_O5_F0 - xor %g2, %g7, %o5 - MOVXTOD_O5_F2 - add %g7, 1, %g7 - add %g3, 1, %o5 - movrz %g7, %o5, %g3 - xor %g1, %g3, %o5 - MOVXTOD_O5_F4 - xor %g2, %g7, %o5 - MOVXTOD_O5_F6 - add %g7, 1, %g7 - add %g3, 1, %o5 - movrz %g7, %o5, %g3 - ENCRYPT_256_2(8, 0, 2, 4, 6) - ldd [%o1 + 0x00], %f56 - ldd [%o1 + 0x08], %f58 - ldd [%o1 + 0x10], %f60 - ldd [%o1 + 0x18], %f62 - fxor %f56, %f0, %f56 - fxor %f58, %f2, %f58 - fxor %f60, %f4, %f60 - fxor %f62, %f6, %f62 - std %f56, [%o2 + 0x00] - std %f58, [%o2 + 0x08] - std %f60, [%o2 + 0x10] - std %f62, [%o2 + 0x18] - subcc %o3, 0x20, %o3 - add %o1, 0x20, %o1 - brgz %o3, 1b - add %o2, 0x20, %o2 - brlz,pt %o3, 11f - nop -10: ldd [%o0 + 0xd0], %f56 - ldd [%o0 + 0xd8], %f58 - ldd [%o0 + 0xe0], %f60 - ldd [%o0 + 0xe8], %f62 - xor %g1, %g3, %o5 - MOVXTOD_O5_F0 - xor %g2, %g7, %o5 - MOVXTOD_O5_F2 - add %g7, 1, %g7 - add %g3, 1, %o5 - movrz %g7, %o5, %g3 - ENCRYPT_256(8, 0, 2, 4, 6) - ldd [%o1 + 0x00], %f4 - ldd [%o1 + 0x08], %f6 - fxor %f4, %f0, %f4 - fxor %f6, %f2, %f6 - std %f4, [%o2 + 0x00] - std %f6, [%o2 + 0x08] -11: stx %g3, [%o4 + 0x00] - retl - stx %g7, [%o4 + 0x08] -ENDPROC(aes_sparc64_ctr_crypt_256) diff --git a/arch/sparc/crypto/aes_glue.c b/arch/sparc/crypto/aes_glue.c index 359f22643b05..661561837415 100644 --- a/arch/sparc/crypto/aes_glue.c +++ b/arch/sparc/crypto/aes_glue.c @@ -32,8 +32,6 @@ #include struct aes_ops { - void (*encrypt)(const u64 *key, const u32 *input, u32 *output); - void (*decrypt)(const u64 *key, const u32 *input, u32 *output); void (*load_encrypt_keys)(const u64 *key); void (*load_decrypt_keys)(const u64 *key); void (*ecb_encrypt)(const u64 *key, const u64 *input, u64 *output, @@ -55,79 +53,7 @@ struct crypto_sparc64_aes_ctx { u32 expanded_key_length; }; -extern void aes_sparc64_encrypt_128(const u64 *key, const u32 *input, - u32 *output); -extern void aes_sparc64_encrypt_192(const u64 *key, const u32 *input, - u32 *output); -extern void aes_sparc64_encrypt_256(const u64 *key, const u32 *input, - u32 *output); - -extern void aes_sparc64_decrypt_128(const u64 *key, const u32 *input, - u32 *output); -extern void aes_sparc64_decrypt_192(const u64 
*key, const u32 *input, - u32 *output); -extern void aes_sparc64_decrypt_256(const u64 *key, const u32 *input, - u32 *output); - -extern void aes_sparc64_load_encrypt_keys_128(const u64 *key); -extern void aes_sparc64_load_encrypt_keys_192(const u64 *key); -extern void aes_sparc64_load_encrypt_keys_256(const u64 *key); - -extern void aes_sparc64_load_decrypt_keys_128(const u64 *key); -extern void aes_sparc64_load_decrypt_keys_192(const u64 *key); -extern void aes_sparc64_load_decrypt_keys_256(const u64 *key); - -extern void aes_sparc64_ecb_encrypt_128(const u64 *key, const u64 *input, - u64 *output, unsigned int len); -extern void aes_sparc64_ecb_encrypt_192(const u64 *key, const u64 *input, - u64 *output, unsigned int len); -extern void aes_sparc64_ecb_encrypt_256(const u64 *key, const u64 *input, - u64 *output, unsigned int len); - -extern void aes_sparc64_ecb_decrypt_128(const u64 *key, const u64 *input, - u64 *output, unsigned int len); -extern void aes_sparc64_ecb_decrypt_192(const u64 *key, const u64 *input, - u64 *output, unsigned int len); -extern void aes_sparc64_ecb_decrypt_256(const u64 *key, const u64 *input, - u64 *output, unsigned int len); - -extern void aes_sparc64_cbc_encrypt_128(const u64 *key, const u64 *input, - u64 *output, unsigned int len, - u64 *iv); - -extern void aes_sparc64_cbc_encrypt_192(const u64 *key, const u64 *input, - u64 *output, unsigned int len, - u64 *iv); - -extern void aes_sparc64_cbc_encrypt_256(const u64 *key, const u64 *input, - u64 *output, unsigned int len, - u64 *iv); - -extern void aes_sparc64_cbc_decrypt_128(const u64 *key, const u64 *input, - u64 *output, unsigned int len, - u64 *iv); - -extern void aes_sparc64_cbc_decrypt_192(const u64 *key, const u64 *input, - u64 *output, unsigned int len, - u64 *iv); - -extern void aes_sparc64_cbc_decrypt_256(const u64 *key, const u64 *input, - u64 *output, unsigned int len, - u64 *iv); - -extern void aes_sparc64_ctr_crypt_128(const u64 *key, const u64 *input, - u64 *output, unsigned int len, - u64 *iv); -extern void aes_sparc64_ctr_crypt_192(const u64 *key, const u64 *input, - u64 *output, unsigned int len, - u64 *iv); -extern void aes_sparc64_ctr_crypt_256(const u64 *key, const u64 *input, - u64 *output, unsigned int len, - u64 *iv); - static struct aes_ops aes128_ops = { - .encrypt = aes_sparc64_encrypt_128, - .decrypt = aes_sparc64_decrypt_128, .load_encrypt_keys = aes_sparc64_load_encrypt_keys_128, .load_decrypt_keys = aes_sparc64_load_decrypt_keys_128, .ecb_encrypt = aes_sparc64_ecb_encrypt_128, @@ -138,8 +64,6 @@ static struct aes_ops aes128_ops = { }; static struct aes_ops aes192_ops = { - .encrypt = aes_sparc64_encrypt_192, - .decrypt = aes_sparc64_decrypt_192, .load_encrypt_keys = aes_sparc64_load_encrypt_keys_192, .load_decrypt_keys = aes_sparc64_load_decrypt_keys_192, .ecb_encrypt = aes_sparc64_ecb_encrypt_192, @@ -150,8 +74,6 @@ static struct aes_ops aes192_ops = { }; static struct aes_ops aes256_ops = { - .encrypt = aes_sparc64_encrypt_256, - .decrypt = aes_sparc64_decrypt_256, .load_encrypt_keys = aes_sparc64_load_encrypt_keys_256, .load_decrypt_keys = aes_sparc64_load_decrypt_keys_256, .ecb_encrypt = aes_sparc64_ecb_encrypt_256, @@ -161,13 +83,10 @@ static struct aes_ops aes256_ops = { .ctr_crypt = aes_sparc64_ctr_crypt_256, }; -extern void aes_sparc64_key_expand(const u32 *in_key, u64 *output_key, - unsigned int key_len); - -static int aes_set_key(struct crypto_tfm *tfm, const u8 *in_key, - unsigned int key_len) +static int aes_set_key_skcipher(struct crypto_skcipher *tfm, const u8 
*in_key, + unsigned int key_len) { - struct crypto_sparc64_aes_ctx *ctx = crypto_tfm_ctx(tfm); + struct crypto_sparc64_aes_ctx *ctx = crypto_skcipher_ctx(tfm); switch (key_len) { case AES_KEYSIZE_128: @@ -195,26 +114,6 @@ static int aes_set_key(struct crypto_tfm *tfm, const u8 *in_key, return 0; } -static int aes_set_key_skcipher(struct crypto_skcipher *tfm, const u8 *in_key, - unsigned int key_len) -{ - return aes_set_key(crypto_skcipher_tfm(tfm), in_key, key_len); -} - -static void crypto_aes_encrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src) -{ - struct crypto_sparc64_aes_ctx *ctx = crypto_tfm_ctx(tfm); - - ctx->ops->encrypt(&ctx->key[0], (const u32 *) src, (u32 *) dst); -} - -static void crypto_aes_decrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src) -{ - struct crypto_sparc64_aes_ctx *ctx = crypto_tfm_ctx(tfm); - - ctx->ops->decrypt(&ctx->key[0], (const u32 *) src, (u32 *) dst); -} - static int ecb_encrypt(struct skcipher_request *req) { struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); @@ -358,26 +257,6 @@ static int ctr_crypt(struct skcipher_request *req) return err; } -static struct crypto_alg cipher_alg = { - .cra_name = "aes", - .cra_driver_name = "aes-sparc64", - .cra_priority = SPARC_CR_OPCODE_PRIORITY, - .cra_flags = CRYPTO_ALG_TYPE_CIPHER, - .cra_blocksize = AES_BLOCK_SIZE, - .cra_ctxsize = sizeof(struct crypto_sparc64_aes_ctx), - .cra_alignmask = 3, - .cra_module = THIS_MODULE, - .cra_u = { - .cipher = { - .cia_min_keysize = AES_MIN_KEY_SIZE, - .cia_max_keysize = AES_MAX_KEY_SIZE, - .cia_setkey = aes_set_key, - .cia_encrypt = crypto_aes_encrypt, - .cia_decrypt = crypto_aes_decrypt - } - } -}; - static struct skcipher_alg skcipher_algs[] = { { .base.cra_name = "ecb(aes)", @@ -440,26 +319,17 @@ static bool __init sparc64_has_aes_opcode(void) static int __init aes_sparc64_mod_init(void) { - int err; - if (!sparc64_has_aes_opcode()) { pr_info("sparc64 aes opcodes not available.\n"); return -ENODEV; } pr_info("Using sparc64 aes opcodes optimized AES implementation\n"); - err = crypto_register_alg(&cipher_alg); - if (err) - return err; - err = crypto_register_skciphers(skcipher_algs, - ARRAY_SIZE(skcipher_algs)); - if (err) - crypto_unregister_alg(&cipher_alg); - return err; + return crypto_register_skciphers(skcipher_algs, + ARRAY_SIZE(skcipher_algs)); } static void __exit aes_sparc64_mod_fini(void) { - crypto_unregister_alg(&cipher_alg); crypto_unregister_skciphers(skcipher_algs, ARRAY_SIZE(skcipher_algs)); } diff --git a/include/crypto/aes.h b/include/crypto/aes.h index 19fd99f383fb..4a56aed59973 100644 --- a/include/crypto/aes.h +++ b/include/crypto/aes.h @@ -49,6 +49,9 @@ union aes_enckey_arch { #elif defined(CONFIG_S390) /* Used when the CPU supports CPACF AES for this key's length */ u8 raw_key[AES_MAX_KEY_SIZE]; +#elif defined(CONFIG_SPARC64) + /* Used when the CPU supports the SPARC64 AES opcodes */ + u64 sparc_rndkeys[AES_MAX_KEYLENGTH / sizeof(u64)]; #endif #endif /* CONFIG_CRYPTO_LIB_AES_ARCH */ }; @@ -199,6 +202,45 @@ void aes_p8_xts_encrypt(const u8 *in, u8 *out, size_t len, void aes_p8_xts_decrypt(const u8 *in, u8 *out, size_t len, const struct p8_aes_key *key1, const struct p8_aes_key *key2, u8 *iv); +#elif defined(CONFIG_SPARC64) +void aes_sparc64_key_expand(const u32 *in_key, u64 *output_key, + unsigned int key_len); +void aes_sparc64_load_encrypt_keys_128(const u64 *key); +void aes_sparc64_load_encrypt_keys_192(const u64 *key); +void aes_sparc64_load_encrypt_keys_256(const u64 *key); +void aes_sparc64_load_decrypt_keys_128(const u64 *key); +void 
aes_sparc64_load_decrypt_keys_192(const u64 *key); +void aes_sparc64_load_decrypt_keys_256(const u64 *key); +void aes_sparc64_ecb_encrypt_128(const u64 *key, const u64 *input, u64 *output, + unsigned int len); +void aes_sparc64_ecb_encrypt_192(const u64 *key, const u64 *input, u64 *output, + unsigned int len); +void aes_sparc64_ecb_encrypt_256(const u64 *key, const u64 *input, u64 *output, + unsigned int len); +void aes_sparc64_ecb_decrypt_128(const u64 *key, const u64 *input, u64 *output, + unsigned int len); +void aes_sparc64_ecb_decrypt_192(const u64 *key, const u64 *input, u64 *output, + unsigned int len); +void aes_sparc64_ecb_decrypt_256(const u64 *key, const u64 *input, u64 *output, + unsigned int len); +void aes_sparc64_cbc_encrypt_128(const u64 *key, const u64 *input, u64 *output, + unsigned int len, u64 *iv); +void aes_sparc64_cbc_encrypt_192(const u64 *key, const u64 *input, u64 *output, + unsigned int len, u64 *iv); +void aes_sparc64_cbc_encrypt_256(const u64 *key, const u64 *input, u64 *output, + unsigned int len, u64 *iv); +void aes_sparc64_cbc_decrypt_128(const u64 *key, const u64 *input, u64 *output, + unsigned int len, u64 *iv); +void aes_sparc64_cbc_decrypt_192(const u64 *key, const u64 *input, u64 *output, + unsigned int len, u64 *iv); +void aes_sparc64_cbc_decrypt_256(const u64 *key, const u64 *input, u64 *output, + unsigned int len, u64 *iv); +void aes_sparc64_ctr_crypt_128(const u64 *key, const u64 *input, u64 *output, + unsigned int len, u64 *iv); +void aes_sparc64_ctr_crypt_192(const u64 *key, const u64 *input, u64 *output, + unsigned int len, u64 *iv); +void aes_sparc64_ctr_crypt_256(const u64 *key, const u64 *input, u64 *output, + unsigned int len, u64 *iv); #endif /** diff --git a/lib/crypto/Kconfig b/lib/crypto/Kconfig index 56a9b4f53b0e..920d96e6b498 100644 --- a/lib/crypto/Kconfig +++ b/lib/crypto/Kconfig @@ -20,6 +20,7 @@ config CRYPTO_LIB_AES_ARCH default y if RISCV && 64BIT && TOOLCHAIN_HAS_VECTOR_CRYPTO && \ RISCV_EFFICIENT_VECTOR_UNALIGNED_ACCESS default y if S390 + default y if SPARC64 config CRYPTO_LIB_AESCFB tristate diff --git a/lib/crypto/Makefile b/lib/crypto/Makefile index 811b60787dd5..761d52d91f92 100644 --- a/lib/crypto/Makefile +++ b/lib/crypto/Makefile @@ -51,6 +51,7 @@ endif # !CONFIG_SPE endif # CONFIG_PPC libaes-$(CONFIG_RISCV) += riscv/aes-riscv64-zvkned.o +libaes-$(CONFIG_SPARC) += sparc/aes_asm.o endif # CONFIG_CRYPTO_LIB_AES_ARCH ################################################################################ diff --git a/lib/crypto/sparc/aes.h b/lib/crypto/sparc/aes.h new file mode 100644 index 000000000000..e354aa507ee0 --- /dev/null +++ b/lib/crypto/sparc/aes.h @@ -0,0 +1,149 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * AES accelerated using the sparc64 aes opcodes + * + * Copyright (C) 2008, Intel Corp. + * Copyright (c) 2010, Intel Corporation. 
+ * Copyright 2026 Google LLC + */ + +#include +#include +#include +#include + +static __ro_after_init DEFINE_STATIC_KEY_FALSE(have_aes_opcodes); + +EXPORT_SYMBOL_GPL(aes_sparc64_key_expand); +EXPORT_SYMBOL_GPL(aes_sparc64_load_encrypt_keys_128); +EXPORT_SYMBOL_GPL(aes_sparc64_load_encrypt_keys_192); +EXPORT_SYMBOL_GPL(aes_sparc64_load_encrypt_keys_256); +EXPORT_SYMBOL_GPL(aes_sparc64_load_decrypt_keys_128); +EXPORT_SYMBOL_GPL(aes_sparc64_load_decrypt_keys_192); +EXPORT_SYMBOL_GPL(aes_sparc64_load_decrypt_keys_256); +EXPORT_SYMBOL_GPL(aes_sparc64_ecb_encrypt_128); +EXPORT_SYMBOL_GPL(aes_sparc64_ecb_encrypt_192); +EXPORT_SYMBOL_GPL(aes_sparc64_ecb_encrypt_256); +EXPORT_SYMBOL_GPL(aes_sparc64_ecb_decrypt_128); +EXPORT_SYMBOL_GPL(aes_sparc64_ecb_decrypt_192); +EXPORT_SYMBOL_GPL(aes_sparc64_ecb_decrypt_256); +EXPORT_SYMBOL_GPL(aes_sparc64_cbc_encrypt_128); +EXPORT_SYMBOL_GPL(aes_sparc64_cbc_encrypt_192); +EXPORT_SYMBOL_GPL(aes_sparc64_cbc_encrypt_256); +EXPORT_SYMBOL_GPL(aes_sparc64_cbc_decrypt_128); +EXPORT_SYMBOL_GPL(aes_sparc64_cbc_decrypt_192); +EXPORT_SYMBOL_GPL(aes_sparc64_cbc_decrypt_256); +EXPORT_SYMBOL_GPL(aes_sparc64_ctr_crypt_128); +EXPORT_SYMBOL_GPL(aes_sparc64_ctr_crypt_192); +EXPORT_SYMBOL_GPL(aes_sparc64_ctr_crypt_256); + +void aes_sparc64_encrypt_128(const u64 *key, const u32 *input, u32 *output); +void aes_sparc64_encrypt_192(const u64 *key, const u32 *input, u32 *output); +void aes_sparc64_encrypt_256(const u64 *key, const u32 *input, u32 *output); +void aes_sparc64_decrypt_128(const u64 *key, const u32 *input, u32 *output); +void aes_sparc64_decrypt_192(const u64 *key, const u32 *input, u32 *output); +void aes_sparc64_decrypt_256(const u64 *key, const u32 *input, u32 *output); + +static void aes_preparekey_arch(union aes_enckey_arch *k, + union aes_invkey_arch *inv_k, + const u8 *in_key, int key_len, int nrounds) +{ + if (static_branch_likely(&have_aes_opcodes)) { + u32 aligned_key[AES_MAX_KEY_SIZE / 4]; + + if (IS_ALIGNED((uintptr_t)in_key, 4)) { + aes_sparc64_key_expand((const u32 *)in_key, + k->sparc_rndkeys, key_len); + } else { + memcpy(aligned_key, in_key, key_len); + aes_sparc64_key_expand(aligned_key, + k->sparc_rndkeys, key_len); + memzero_explicit(aligned_key, key_len); + } + /* + * Note that nothing needs to be written to inv_k (if it's + * non-NULL) here, since the SPARC64 assembly code uses + * k->sparc_rndkeys for both encryption and decryption. + */ + } else { + aes_expandkey_generic(k->rndkeys, + inv_k ? 
inv_k->inv_rndkeys : NULL, + in_key, key_len); + } +} + +static void aes_sparc64_encrypt(const struct aes_enckey *key, + const u32 *input, u32 *output) +{ + if (key->len == AES_KEYSIZE_128) + aes_sparc64_encrypt_128(key->k.sparc_rndkeys, input, output); + else if (key->len == AES_KEYSIZE_192) + aes_sparc64_encrypt_192(key->k.sparc_rndkeys, input, output); + else + aes_sparc64_encrypt_256(key->k.sparc_rndkeys, input, output); +} + +static void aes_encrypt_arch(const struct aes_enckey *key, + u8 out[AES_BLOCK_SIZE], + const u8 in[AES_BLOCK_SIZE]) +{ + u32 bounce_buf[AES_BLOCK_SIZE / 4]; + + if (static_branch_likely(&have_aes_opcodes)) { + if (IS_ALIGNED((uintptr_t)in | (uintptr_t)out, 4)) { + aes_sparc64_encrypt(key, (const u32 *)in, (u32 *)out); + } else { + memcpy(bounce_buf, in, AES_BLOCK_SIZE); + aes_sparc64_encrypt(key, bounce_buf, bounce_buf); + memcpy(out, bounce_buf, AES_BLOCK_SIZE); + } + } else { + aes_encrypt_generic(key->k.rndkeys, key->nrounds, out, in); + } +} + +static void aes_sparc64_decrypt(const struct aes_key *key, + const u32 *input, u32 *output) +{ + if (key->len == AES_KEYSIZE_128) + aes_sparc64_decrypt_128(key->k.sparc_rndkeys, input, output); + else if (key->len == AES_KEYSIZE_192) + aes_sparc64_decrypt_192(key->k.sparc_rndkeys, input, output); + else + aes_sparc64_decrypt_256(key->k.sparc_rndkeys, input, output); +} + +static void aes_decrypt_arch(const struct aes_key *key, + u8 out[AES_BLOCK_SIZE], + const u8 in[AES_BLOCK_SIZE]) +{ + u32 bounce_buf[AES_BLOCK_SIZE / 4]; + + if (static_branch_likely(&have_aes_opcodes)) { + if (IS_ALIGNED((uintptr_t)in | (uintptr_t)out, 4)) { + aes_sparc64_decrypt(key, (const u32 *)in, (u32 *)out); + } else { + memcpy(bounce_buf, in, AES_BLOCK_SIZE); + aes_sparc64_decrypt(key, bounce_buf, bounce_buf); + memcpy(out, bounce_buf, AES_BLOCK_SIZE); + } + } else { + aes_decrypt_generic(key->inv_k.inv_rndkeys, key->nrounds, + out, in); + } +} + +#define aes_mod_init_arch aes_mod_init_arch +static void aes_mod_init_arch(void) +{ + unsigned long cfr; + + if (!(sparc64_elf_hwcap & HWCAP_SPARC_CRYPTO)) + return; + + __asm__ __volatile__("rd %%asr26, %0" : "=r" (cfr)); + if (!(cfr & CFR_AES)) + return; + + static_branch_enable(&have_aes_opcodes); +} diff --git a/lib/crypto/sparc/aes_asm.S b/lib/crypto/sparc/aes_asm.S new file mode 100644 index 000000000000..f291174a72a1 --- /dev/null +++ b/lib/crypto/sparc/aes_asm.S @@ -0,0 +1,1543 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#include +#include +#include + +#define ENCRYPT_TWO_ROUNDS(KEY_BASE, I0, I1, T0, T1) \ + AES_EROUND01(KEY_BASE + 0, I0, I1, T0) \ + AES_EROUND23(KEY_BASE + 2, I0, I1, T1) \ + AES_EROUND01(KEY_BASE + 4, T0, T1, I0) \ + AES_EROUND23(KEY_BASE + 6, T0, T1, I1) + +#define ENCRYPT_TWO_ROUNDS_2(KEY_BASE, I0, I1, I2, I3, T0, T1, T2, T3) \ + AES_EROUND01(KEY_BASE + 0, I0, I1, T0) \ + AES_EROUND23(KEY_BASE + 2, I0, I1, T1) \ + AES_EROUND01(KEY_BASE + 0, I2, I3, T2) \ + AES_EROUND23(KEY_BASE + 2, I2, I3, T3) \ + AES_EROUND01(KEY_BASE + 4, T0, T1, I0) \ + AES_EROUND23(KEY_BASE + 6, T0, T1, I1) \ + AES_EROUND01(KEY_BASE + 4, T2, T3, I2) \ + AES_EROUND23(KEY_BASE + 6, T2, T3, I3) + +#define ENCRYPT_TWO_ROUNDS_LAST(KEY_BASE, I0, I1, T0, T1) \ + AES_EROUND01(KEY_BASE + 0, I0, I1, T0) \ + AES_EROUND23(KEY_BASE + 2, I0, I1, T1) \ + AES_EROUND01_L(KEY_BASE + 4, T0, T1, I0) \ + AES_EROUND23_L(KEY_BASE + 6, T0, T1, I1) + +#define ENCRYPT_TWO_ROUNDS_LAST_2(KEY_BASE, I0, I1, I2, I3, T0, T1, T2, T3) \ + AES_EROUND01(KEY_BASE + 0, I0, I1, T0) \ + AES_EROUND23(KEY_BASE + 2, I0, I1, T1) \ + 
AES_EROUND01(KEY_BASE + 0, I2, I3, T2) \ + AES_EROUND23(KEY_BASE + 2, I2, I3, T3) \ + AES_EROUND01_L(KEY_BASE + 4, T0, T1, I0) \ + AES_EROUND23_L(KEY_BASE + 6, T0, T1, I1) \ + AES_EROUND01_L(KEY_BASE + 4, T2, T3, I2) \ + AES_EROUND23_L(KEY_BASE + 6, T2, T3, I3) + + /* 10 rounds */ +#define ENCRYPT_128(KEY_BASE, I0, I1, T0, T1) \ + ENCRYPT_TWO_ROUNDS(KEY_BASE + 0, I0, I1, T0, T1) \ + ENCRYPT_TWO_ROUNDS(KEY_BASE + 8, I0, I1, T0, T1) \ + ENCRYPT_TWO_ROUNDS(KEY_BASE + 16, I0, I1, T0, T1) \ + ENCRYPT_TWO_ROUNDS(KEY_BASE + 24, I0, I1, T0, T1) \ + ENCRYPT_TWO_ROUNDS_LAST(KEY_BASE + 32, I0, I1, T0, T1) + +#define ENCRYPT_128_2(KEY_BASE, I0, I1, I2, I3, T0, T1, T2, T3) \ + ENCRYPT_TWO_ROUNDS_2(KEY_BASE + 0, I0, I1, I2, I3, T0, T1, T2, T3) \ + ENCRYPT_TWO_ROUNDS_2(KEY_BASE + 8, I0, I1, I2, I3, T0, T1, T2, T3) \ + ENCRYPT_TWO_ROUNDS_2(KEY_BASE + 16, I0, I1, I2, I3, T0, T1, T2, T3) \ + ENCRYPT_TWO_ROUNDS_2(KEY_BASE + 24, I0, I1, I2, I3, T0, T1, T2, T3) \ + ENCRYPT_TWO_ROUNDS_LAST_2(KEY_BASE + 32, I0, I1, I2, I3, T0, T1, T2, T3) + + /* 12 rounds */ +#define ENCRYPT_192(KEY_BASE, I0, I1, T0, T1) \ + ENCRYPT_TWO_ROUNDS(KEY_BASE + 0, I0, I1, T0, T1) \ + ENCRYPT_TWO_ROUNDS(KEY_BASE + 8, I0, I1, T0, T1) \ + ENCRYPT_TWO_ROUNDS(KEY_BASE + 16, I0, I1, T0, T1) \ + ENCRYPT_TWO_ROUNDS(KEY_BASE + 24, I0, I1, T0, T1) \ + ENCRYPT_TWO_ROUNDS(KEY_BASE + 32, I0, I1, T0, T1) \ + ENCRYPT_TWO_ROUNDS_LAST(KEY_BASE + 40, I0, I1, T0, T1) + +#define ENCRYPT_192_2(KEY_BASE, I0, I1, I2, I3, T0, T1, T2, T3) \ + ENCRYPT_TWO_ROUNDS_2(KEY_BASE + 0, I0, I1, I2, I3, T0, T1, T2, T3) \ + ENCRYPT_TWO_ROUNDS_2(KEY_BASE + 8, I0, I1, I2, I3, T0, T1, T2, T3) \ + ENCRYPT_TWO_ROUNDS_2(KEY_BASE + 16, I0, I1, I2, I3, T0, T1, T2, T3) \ + ENCRYPT_TWO_ROUNDS_2(KEY_BASE + 24, I0, I1, I2, I3, T0, T1, T2, T3) \ + ENCRYPT_TWO_ROUNDS_2(KEY_BASE + 32, I0, I1, I2, I3, T0, T1, T2, T3) \ + ENCRYPT_TWO_ROUNDS_LAST_2(KEY_BASE + 40, I0, I1, I2, I3, T0, T1, T2, T3) + + /* 14 rounds */ +#define ENCRYPT_256(KEY_BASE, I0, I1, T0, T1) \ + ENCRYPT_TWO_ROUNDS(KEY_BASE + 0, I0, I1, T0, T1) \ + ENCRYPT_TWO_ROUNDS(KEY_BASE + 8, I0, I1, T0, T1) \ + ENCRYPT_TWO_ROUNDS(KEY_BASE + 16, I0, I1, T0, T1) \ + ENCRYPT_TWO_ROUNDS(KEY_BASE + 24, I0, I1, T0, T1) \ + ENCRYPT_TWO_ROUNDS(KEY_BASE + 32, I0, I1, T0, T1) \ + ENCRYPT_TWO_ROUNDS(KEY_BASE + 40, I0, I1, T0, T1) \ + ENCRYPT_TWO_ROUNDS_LAST(KEY_BASE + 48, I0, I1, T0, T1) + +#define ENCRYPT_256_TWO_ROUNDS_2(KEY_BASE, I0, I1, I2, I3, TMP_BASE) \ + ENCRYPT_TWO_ROUNDS_2(KEY_BASE, I0, I1, I2, I3, \ + TMP_BASE + 0, TMP_BASE + 2, TMP_BASE + 4, TMP_BASE + 6) + +#define ENCRYPT_256_2(KEY_BASE, I0, I1, I2, I3) \ + ENCRYPT_256_TWO_ROUNDS_2(KEY_BASE + 0, I0, I1, I2, I3, KEY_BASE + 48) \ + ldd [%o0 + 0xd0], %f56; \ + ldd [%o0 + 0xd8], %f58; \ + ENCRYPT_256_TWO_ROUNDS_2(KEY_BASE + 8, I0, I1, I2, I3, KEY_BASE + 0) \ + ldd [%o0 + 0xe0], %f60; \ + ldd [%o0 + 0xe8], %f62; \ + ENCRYPT_256_TWO_ROUNDS_2(KEY_BASE + 16, I0, I1, I2, I3, KEY_BASE + 0) \ + ENCRYPT_256_TWO_ROUNDS_2(KEY_BASE + 24, I0, I1, I2, I3, KEY_BASE + 0) \ + ENCRYPT_256_TWO_ROUNDS_2(KEY_BASE + 32, I0, I1, I2, I3, KEY_BASE + 0) \ + ENCRYPT_256_TWO_ROUNDS_2(KEY_BASE + 40, I0, I1, I2, I3, KEY_BASE + 0) \ + AES_EROUND01(KEY_BASE + 48, I0, I1, KEY_BASE + 0) \ + AES_EROUND23(KEY_BASE + 50, I0, I1, KEY_BASE + 2) \ + AES_EROUND01(KEY_BASE + 48, I2, I3, KEY_BASE + 4) \ + AES_EROUND23(KEY_BASE + 50, I2, I3, KEY_BASE + 6) \ + AES_EROUND01_L(KEY_BASE + 52, KEY_BASE + 0, KEY_BASE + 2, I0) \ + AES_EROUND23_L(KEY_BASE + 54, KEY_BASE + 0, KEY_BASE + 2, I1) \ + ldd [%o0 + 0x10], %f8; \ + ldd [%o0 + 
0x18], %f10; \ + AES_EROUND01_L(KEY_BASE + 52, KEY_BASE + 4, KEY_BASE + 6, I2) \ + AES_EROUND23_L(KEY_BASE + 54, KEY_BASE + 4, KEY_BASE + 6, I3) \ + ldd [%o0 + 0x20], %f12; \ + ldd [%o0 + 0x28], %f14; + +#define DECRYPT_TWO_ROUNDS(KEY_BASE, I0, I1, T0, T1) \ + AES_DROUND23(KEY_BASE + 0, I0, I1, T1) \ + AES_DROUND01(KEY_BASE + 2, I0, I1, T0) \ + AES_DROUND23(KEY_BASE + 4, T0, T1, I1) \ + AES_DROUND01(KEY_BASE + 6, T0, T1, I0) + +#define DECRYPT_TWO_ROUNDS_2(KEY_BASE, I0, I1, I2, I3, T0, T1, T2, T3) \ + AES_DROUND23(KEY_BASE + 0, I0, I1, T1) \ + AES_DROUND01(KEY_BASE + 2, I0, I1, T0) \ + AES_DROUND23(KEY_BASE + 0, I2, I3, T3) \ + AES_DROUND01(KEY_BASE + 2, I2, I3, T2) \ + AES_DROUND23(KEY_BASE + 4, T0, T1, I1) \ + AES_DROUND01(KEY_BASE + 6, T0, T1, I0) \ + AES_DROUND23(KEY_BASE + 4, T2, T3, I3) \ + AES_DROUND01(KEY_BASE + 6, T2, T3, I2) + +#define DECRYPT_TWO_ROUNDS_LAST(KEY_BASE, I0, I1, T0, T1) \ + AES_DROUND23(KEY_BASE + 0, I0, I1, T1) \ + AES_DROUND01(KEY_BASE + 2, I0, I1, T0) \ + AES_DROUND23_L(KEY_BASE + 4, T0, T1, I1) \ + AES_DROUND01_L(KEY_BASE + 6, T0, T1, I0) + +#define DECRYPT_TWO_ROUNDS_LAST_2(KEY_BASE, I0, I1, I2, I3, T0, T1, T2, T3) \ + AES_DROUND23(KEY_BASE + 0, I0, I1, T1) \ + AES_DROUND01(KEY_BASE + 2, I0, I1, T0) \ + AES_DROUND23(KEY_BASE + 0, I2, I3, T3) \ + AES_DROUND01(KEY_BASE + 2, I2, I3, T2) \ + AES_DROUND23_L(KEY_BASE + 4, T0, T1, I1) \ + AES_DROUND01_L(KEY_BASE + 6, T0, T1, I0) \ + AES_DROUND23_L(KEY_BASE + 4, T2, T3, I3) \ + AES_DROUND01_L(KEY_BASE + 6, T2, T3, I2) + + /* 10 rounds */ +#define DECRYPT_128(KEY_BASE, I0, I1, T0, T1) \ + DECRYPT_TWO_ROUNDS(KEY_BASE + 0, I0, I1, T0, T1) \ + DECRYPT_TWO_ROUNDS(KEY_BASE + 8, I0, I1, T0, T1) \ + DECRYPT_TWO_ROUNDS(KEY_BASE + 16, I0, I1, T0, T1) \ + DECRYPT_TWO_ROUNDS(KEY_BASE + 24, I0, I1, T0, T1) \ + DECRYPT_TWO_ROUNDS_LAST(KEY_BASE + 32, I0, I1, T0, T1) + +#define DECRYPT_128_2(KEY_BASE, I0, I1, I2, I3, T0, T1, T2, T3) \ + DECRYPT_TWO_ROUNDS_2(KEY_BASE + 0, I0, I1, I2, I3, T0, T1, T2, T3) \ + DECRYPT_TWO_ROUNDS_2(KEY_BASE + 8, I0, I1, I2, I3, T0, T1, T2, T3) \ + DECRYPT_TWO_ROUNDS_2(KEY_BASE + 16, I0, I1, I2, I3, T0, T1, T2, T3) \ + DECRYPT_TWO_ROUNDS_2(KEY_BASE + 24, I0, I1, I2, I3, T0, T1, T2, T3) \ + DECRYPT_TWO_ROUNDS_LAST_2(KEY_BASE + 32, I0, I1, I2, I3, T0, T1, T2, T3) + + /* 12 rounds */ +#define DECRYPT_192(KEY_BASE, I0, I1, T0, T1) \ + DECRYPT_TWO_ROUNDS(KEY_BASE + 0, I0, I1, T0, T1) \ + DECRYPT_TWO_ROUNDS(KEY_BASE + 8, I0, I1, T0, T1) \ + DECRYPT_TWO_ROUNDS(KEY_BASE + 16, I0, I1, T0, T1) \ + DECRYPT_TWO_ROUNDS(KEY_BASE + 24, I0, I1, T0, T1) \ + DECRYPT_TWO_ROUNDS(KEY_BASE + 32, I0, I1, T0, T1) \ + DECRYPT_TWO_ROUNDS_LAST(KEY_BASE + 40, I0, I1, T0, T1) + +#define DECRYPT_192_2(KEY_BASE, I0, I1, I2, I3, T0, T1, T2, T3) \ + DECRYPT_TWO_ROUNDS_2(KEY_BASE + 0, I0, I1, I2, I3, T0, T1, T2, T3) \ + DECRYPT_TWO_ROUNDS_2(KEY_BASE + 8, I0, I1, I2, I3, T0, T1, T2, T3) \ + DECRYPT_TWO_ROUNDS_2(KEY_BASE + 16, I0, I1, I2, I3, T0, T1, T2, T3) \ + DECRYPT_TWO_ROUNDS_2(KEY_BASE + 24, I0, I1, I2, I3, T0, T1, T2, T3) \ + DECRYPT_TWO_ROUNDS_2(KEY_BASE + 32, I0, I1, I2, I3, T0, T1, T2, T3) \ + DECRYPT_TWO_ROUNDS_LAST_2(KEY_BASE + 40, I0, I1, I2, I3, T0, T1, T2, T3) + + /* 14 rounds */ +#define DECRYPT_256(KEY_BASE, I0, I1, T0, T1) \ + DECRYPT_TWO_ROUNDS(KEY_BASE + 0, I0, I1, T0, T1) \ + DECRYPT_TWO_ROUNDS(KEY_BASE + 8, I0, I1, T0, T1) \ + DECRYPT_TWO_ROUNDS(KEY_BASE + 16, I0, I1, T0, T1) \ + DECRYPT_TWO_ROUNDS(KEY_BASE + 24, I0, I1, T0, T1) \ + DECRYPT_TWO_ROUNDS(KEY_BASE + 32, I0, I1, T0, T1) \ + DECRYPT_TWO_ROUNDS(KEY_BASE + 40, 
I0, I1, T0, T1) \ + DECRYPT_TWO_ROUNDS_LAST(KEY_BASE + 48, I0, I1, T0, T1) + +#define DECRYPT_256_TWO_ROUNDS_2(KEY_BASE, I0, I1, I2, I3, TMP_BASE) \ + DECRYPT_TWO_ROUNDS_2(KEY_BASE, I0, I1, I2, I3, \ + TMP_BASE + 0, TMP_BASE + 2, TMP_BASE + 4, TMP_BASE + 6) + +#define DECRYPT_256_2(KEY_BASE, I0, I1, I2, I3) \ + DECRYPT_256_TWO_ROUNDS_2(KEY_BASE + 0, I0, I1, I2, I3, KEY_BASE + 48) \ + ldd [%o0 + 0x18], %f56; \ + ldd [%o0 + 0x10], %f58; \ + DECRYPT_256_TWO_ROUNDS_2(KEY_BASE + 8, I0, I1, I2, I3, KEY_BASE + 0) \ + ldd [%o0 + 0x08], %f60; \ + ldd [%o0 + 0x00], %f62; \ + DECRYPT_256_TWO_ROUNDS_2(KEY_BASE + 16, I0, I1, I2, I3, KEY_BASE + 0) \ + DECRYPT_256_TWO_ROUNDS_2(KEY_BASE + 24, I0, I1, I2, I3, KEY_BASE + 0) \ + DECRYPT_256_TWO_ROUNDS_2(KEY_BASE + 32, I0, I1, I2, I3, KEY_BASE + 0) \ + DECRYPT_256_TWO_ROUNDS_2(KEY_BASE + 40, I0, I1, I2, I3, KEY_BASE + 0) \ + AES_DROUND23(KEY_BASE + 48, I0, I1, KEY_BASE + 2) \ + AES_DROUND01(KEY_BASE + 50, I0, I1, KEY_BASE + 0) \ + AES_DROUND23(KEY_BASE + 48, I2, I3, KEY_BASE + 6) \ + AES_DROUND01(KEY_BASE + 50, I2, I3, KEY_BASE + 4) \ + AES_DROUND23_L(KEY_BASE + 52, KEY_BASE + 0, KEY_BASE + 2, I1) \ + AES_DROUND01_L(KEY_BASE + 54, KEY_BASE + 0, KEY_BASE + 2, I0) \ + ldd [%o0 + 0xd8], %f8; \ + ldd [%o0 + 0xd0], %f10; \ + AES_DROUND23_L(KEY_BASE + 52, KEY_BASE + 4, KEY_BASE + 6, I3) \ + AES_DROUND01_L(KEY_BASE + 54, KEY_BASE + 4, KEY_BASE + 6, I2) \ + ldd [%o0 + 0xc8], %f12; \ + ldd [%o0 + 0xc0], %f14; + + .align 32 +ENTRY(aes_sparc64_key_expand) + /* %o0=input_key, %o1=output_key, %o2=key_len */ + VISEntry + ld [%o0 + 0x00], %f0 + ld [%o0 + 0x04], %f1 + ld [%o0 + 0x08], %f2 + ld [%o0 + 0x0c], %f3 + + std %f0, [%o1 + 0x00] + std %f2, [%o1 + 0x08] + add %o1, 0x10, %o1 + + cmp %o2, 24 + bl 2f + nop + + be 1f + nop + + /* 256-bit key expansion */ + ld [%o0 + 0x10], %f4 + ld [%o0 + 0x14], %f5 + ld [%o0 + 0x18], %f6 + ld [%o0 + 0x1c], %f7 + + std %f4, [%o1 + 0x00] + std %f6, [%o1 + 0x08] + add %o1, 0x10, %o1 + + AES_KEXPAND1(0, 6, 0x0, 8) + AES_KEXPAND2(2, 8, 10) + AES_KEXPAND0(4, 10, 12) + AES_KEXPAND2(6, 12, 14) + AES_KEXPAND1(8, 14, 0x1, 16) + AES_KEXPAND2(10, 16, 18) + AES_KEXPAND0(12, 18, 20) + AES_KEXPAND2(14, 20, 22) + AES_KEXPAND1(16, 22, 0x2, 24) + AES_KEXPAND2(18, 24, 26) + AES_KEXPAND0(20, 26, 28) + AES_KEXPAND2(22, 28, 30) + AES_KEXPAND1(24, 30, 0x3, 32) + AES_KEXPAND2(26, 32, 34) + AES_KEXPAND0(28, 34, 36) + AES_KEXPAND2(30, 36, 38) + AES_KEXPAND1(32, 38, 0x4, 40) + AES_KEXPAND2(34, 40, 42) + AES_KEXPAND0(36, 42, 44) + AES_KEXPAND2(38, 44, 46) + AES_KEXPAND1(40, 46, 0x5, 48) + AES_KEXPAND2(42, 48, 50) + AES_KEXPAND0(44, 50, 52) + AES_KEXPAND2(46, 52, 54) + AES_KEXPAND1(48, 54, 0x6, 56) + AES_KEXPAND2(50, 56, 58) + + std %f8, [%o1 + 0x00] + std %f10, [%o1 + 0x08] + std %f12, [%o1 + 0x10] + std %f14, [%o1 + 0x18] + std %f16, [%o1 + 0x20] + std %f18, [%o1 + 0x28] + std %f20, [%o1 + 0x30] + std %f22, [%o1 + 0x38] + std %f24, [%o1 + 0x40] + std %f26, [%o1 + 0x48] + std %f28, [%o1 + 0x50] + std %f30, [%o1 + 0x58] + std %f32, [%o1 + 0x60] + std %f34, [%o1 + 0x68] + std %f36, [%o1 + 0x70] + std %f38, [%o1 + 0x78] + std %f40, [%o1 + 0x80] + std %f42, [%o1 + 0x88] + std %f44, [%o1 + 0x90] + std %f46, [%o1 + 0x98] + std %f48, [%o1 + 0xa0] + std %f50, [%o1 + 0xa8] + std %f52, [%o1 + 0xb0] + std %f54, [%o1 + 0xb8] + std %f56, [%o1 + 0xc0] + ba,pt %xcc, 80f + std %f58, [%o1 + 0xc8] + +1: + /* 192-bit key expansion */ + ld [%o0 + 0x10], %f4 + ld [%o0 + 0x14], %f5 + + std %f4, [%o1 + 0x00] + add %o1, 0x08, %o1 + + AES_KEXPAND1(0, 4, 0x0, 6) + AES_KEXPAND2(2, 6, 8) + 
AES_KEXPAND2(4, 8, 10) + AES_KEXPAND1(6, 10, 0x1, 12) + AES_KEXPAND2(8, 12, 14) + AES_KEXPAND2(10, 14, 16) + AES_KEXPAND1(12, 16, 0x2, 18) + AES_KEXPAND2(14, 18, 20) + AES_KEXPAND2(16, 20, 22) + AES_KEXPAND1(18, 22, 0x3, 24) + AES_KEXPAND2(20, 24, 26) + AES_KEXPAND2(22, 26, 28) + AES_KEXPAND1(24, 28, 0x4, 30) + AES_KEXPAND2(26, 30, 32) + AES_KEXPAND2(28, 32, 34) + AES_KEXPAND1(30, 34, 0x5, 36) + AES_KEXPAND2(32, 36, 38) + AES_KEXPAND2(34, 38, 40) + AES_KEXPAND1(36, 40, 0x6, 42) + AES_KEXPAND2(38, 42, 44) + AES_KEXPAND2(40, 44, 46) + AES_KEXPAND1(42, 46, 0x7, 48) + AES_KEXPAND2(44, 48, 50) + + std %f6, [%o1 + 0x00] + std %f8, [%o1 + 0x08] + std %f10, [%o1 + 0x10] + std %f12, [%o1 + 0x18] + std %f14, [%o1 + 0x20] + std %f16, [%o1 + 0x28] + std %f18, [%o1 + 0x30] + std %f20, [%o1 + 0x38] + std %f22, [%o1 + 0x40] + std %f24, [%o1 + 0x48] + std %f26, [%o1 + 0x50] + std %f28, [%o1 + 0x58] + std %f30, [%o1 + 0x60] + std %f32, [%o1 + 0x68] + std %f34, [%o1 + 0x70] + std %f36, [%o1 + 0x78] + std %f38, [%o1 + 0x80] + std %f40, [%o1 + 0x88] + std %f42, [%o1 + 0x90] + std %f44, [%o1 + 0x98] + std %f46, [%o1 + 0xa0] + std %f48, [%o1 + 0xa8] + ba,pt %xcc, 80f + std %f50, [%o1 + 0xb0] + +2: + /* 128-bit key expansion */ + AES_KEXPAND1(0, 2, 0x0, 4) + AES_KEXPAND2(2, 4, 6) + AES_KEXPAND1(4, 6, 0x1, 8) + AES_KEXPAND2(6, 8, 10) + AES_KEXPAND1(8, 10, 0x2, 12) + AES_KEXPAND2(10, 12, 14) + AES_KEXPAND1(12, 14, 0x3, 16) + AES_KEXPAND2(14, 16, 18) + AES_KEXPAND1(16, 18, 0x4, 20) + AES_KEXPAND2(18, 20, 22) + AES_KEXPAND1(20, 22, 0x5, 24) + AES_KEXPAND2(22, 24, 26) + AES_KEXPAND1(24, 26, 0x6, 28) + AES_KEXPAND2(26, 28, 30) + AES_KEXPAND1(28, 30, 0x7, 32) + AES_KEXPAND2(30, 32, 34) + AES_KEXPAND1(32, 34, 0x8, 36) + AES_KEXPAND2(34, 36, 38) + AES_KEXPAND1(36, 38, 0x9, 40) + AES_KEXPAND2(38, 40, 42) + + std %f4, [%o1 + 0x00] + std %f6, [%o1 + 0x08] + std %f8, [%o1 + 0x10] + std %f10, [%o1 + 0x18] + std %f12, [%o1 + 0x20] + std %f14, [%o1 + 0x28] + std %f16, [%o1 + 0x30] + std %f18, [%o1 + 0x38] + std %f20, [%o1 + 0x40] + std %f22, [%o1 + 0x48] + std %f24, [%o1 + 0x50] + std %f26, [%o1 + 0x58] + std %f28, [%o1 + 0x60] + std %f30, [%o1 + 0x68] + std %f32, [%o1 + 0x70] + std %f34, [%o1 + 0x78] + std %f36, [%o1 + 0x80] + std %f38, [%o1 + 0x88] + std %f40, [%o1 + 0x90] + std %f42, [%o1 + 0x98] +80: + retl + VISExit +ENDPROC(aes_sparc64_key_expand) + + .align 32 +ENTRY(aes_sparc64_encrypt_128) + /* %o0=key, %o1=input, %o2=output */ + VISEntry + ld [%o1 + 0x00], %f4 + ld [%o1 + 0x04], %f5 + ld [%o1 + 0x08], %f6 + ld [%o1 + 0x0c], %f7 + ldd [%o0 + 0x00], %f8 + ldd [%o0 + 0x08], %f10 + ldd [%o0 + 0x10], %f12 + ldd [%o0 + 0x18], %f14 + ldd [%o0 + 0x20], %f16 + ldd [%o0 + 0x28], %f18 + ldd [%o0 + 0x30], %f20 + ldd [%o0 + 0x38], %f22 + ldd [%o0 + 0x40], %f24 + ldd [%o0 + 0x48], %f26 + ldd [%o0 + 0x50], %f28 + ldd [%o0 + 0x58], %f30 + ldd [%o0 + 0x60], %f32 + ldd [%o0 + 0x68], %f34 + ldd [%o0 + 0x70], %f36 + ldd [%o0 + 0x78], %f38 + ldd [%o0 + 0x80], %f40 + ldd [%o0 + 0x88], %f42 + ldd [%o0 + 0x90], %f44 + ldd [%o0 + 0x98], %f46 + ldd [%o0 + 0xa0], %f48 + ldd [%o0 + 0xa8], %f50 + fxor %f8, %f4, %f4 + fxor %f10, %f6, %f6 + ENCRYPT_128(12, 4, 6, 0, 2) + st %f4, [%o2 + 0x00] + st %f5, [%o2 + 0x04] + st %f6, [%o2 + 0x08] + st %f7, [%o2 + 0x0c] + retl + VISExit +ENDPROC(aes_sparc64_encrypt_128) + + .align 32 +ENTRY(aes_sparc64_encrypt_192) + /* %o0=key, %o1=input, %o2=output */ + VISEntry + ld [%o1 + 0x00], %f4 + ld [%o1 + 0x04], %f5 + ld [%o1 + 0x08], %f6 + ld [%o1 + 0x0c], %f7 + + ldd [%o0 + 0x00], %f8 + ldd [%o0 + 0x08], %f10 + + 
fxor %f8, %f4, %f4 + fxor %f10, %f6, %f6 + + ldd [%o0 + 0x10], %f8 + ldd [%o0 + 0x18], %f10 + ldd [%o0 + 0x20], %f12 + ldd [%o0 + 0x28], %f14 + add %o0, 0x20, %o0 + + ENCRYPT_TWO_ROUNDS(8, 4, 6, 0, 2) + + ldd [%o0 + 0x10], %f12 + ldd [%o0 + 0x18], %f14 + ldd [%o0 + 0x20], %f16 + ldd [%o0 + 0x28], %f18 + ldd [%o0 + 0x30], %f20 + ldd [%o0 + 0x38], %f22 + ldd [%o0 + 0x40], %f24 + ldd [%o0 + 0x48], %f26 + ldd [%o0 + 0x50], %f28 + ldd [%o0 + 0x58], %f30 + ldd [%o0 + 0x60], %f32 + ldd [%o0 + 0x68], %f34 + ldd [%o0 + 0x70], %f36 + ldd [%o0 + 0x78], %f38 + ldd [%o0 + 0x80], %f40 + ldd [%o0 + 0x88], %f42 + ldd [%o0 + 0x90], %f44 + ldd [%o0 + 0x98], %f46 + ldd [%o0 + 0xa0], %f48 + ldd [%o0 + 0xa8], %f50 + + + ENCRYPT_128(12, 4, 6, 0, 2) + + st %f4, [%o2 + 0x00] + st %f5, [%o2 + 0x04] + st %f6, [%o2 + 0x08] + st %f7, [%o2 + 0x0c] + + retl + VISExit +ENDPROC(aes_sparc64_encrypt_192) + + .align 32 +ENTRY(aes_sparc64_encrypt_256) + /* %o0=key, %o1=input, %o2=output */ + VISEntry + ld [%o1 + 0x00], %f4 + ld [%o1 + 0x04], %f5 + ld [%o1 + 0x08], %f6 + ld [%o1 + 0x0c], %f7 + + ldd [%o0 + 0x00], %f8 + ldd [%o0 + 0x08], %f10 + + fxor %f8, %f4, %f4 + fxor %f10, %f6, %f6 + + ldd [%o0 + 0x10], %f8 + + ldd [%o0 + 0x18], %f10 + ldd [%o0 + 0x20], %f12 + ldd [%o0 + 0x28], %f14 + add %o0, 0x20, %o0 + + ENCRYPT_TWO_ROUNDS(8, 4, 6, 0, 2) + + ldd [%o0 + 0x10], %f8 + + ldd [%o0 + 0x18], %f10 + ldd [%o0 + 0x20], %f12 + ldd [%o0 + 0x28], %f14 + add %o0, 0x20, %o0 + + ENCRYPT_TWO_ROUNDS(8, 4, 6, 0, 2) + + ldd [%o0 + 0x10], %f12 + ldd [%o0 + 0x18], %f14 + ldd [%o0 + 0x20], %f16 + ldd [%o0 + 0x28], %f18 + ldd [%o0 + 0x30], %f20 + ldd [%o0 + 0x38], %f22 + ldd [%o0 + 0x40], %f24 + ldd [%o0 + 0x48], %f26 + ldd [%o0 + 0x50], %f28 + ldd [%o0 + 0x58], %f30 + ldd [%o0 + 0x60], %f32 + ldd [%o0 + 0x68], %f34 + ldd [%o0 + 0x70], %f36 + ldd [%o0 + 0x78], %f38 + ldd [%o0 + 0x80], %f40 + ldd [%o0 + 0x88], %f42 + ldd [%o0 + 0x90], %f44 + ldd [%o0 + 0x98], %f46 + ldd [%o0 + 0xa0], %f48 + ldd [%o0 + 0xa8], %f50 + + ENCRYPT_128(12, 4, 6, 0, 2) + + st %f4, [%o2 + 0x00] + st %f5, [%o2 + 0x04] + st %f6, [%o2 + 0x08] + st %f7, [%o2 + 0x0c] + + retl + VISExit +ENDPROC(aes_sparc64_encrypt_256) + + .align 32 +ENTRY(aes_sparc64_decrypt_128) + /* %o0=key, %o1=input, %o2=output */ + VISEntry + ld [%o1 + 0x00], %f4 + ld [%o1 + 0x04], %f5 + ld [%o1 + 0x08], %f6 + ld [%o1 + 0x0c], %f7 + ldd [%o0 + 0xa0], %f8 + ldd [%o0 + 0xa8], %f10 + ldd [%o0 + 0x98], %f12 + ldd [%o0 + 0x90], %f14 + ldd [%o0 + 0x88], %f16 + ldd [%o0 + 0x80], %f18 + ldd [%o0 + 0x78], %f20 + ldd [%o0 + 0x70], %f22 + ldd [%o0 + 0x68], %f24 + ldd [%o0 + 0x60], %f26 + ldd [%o0 + 0x58], %f28 + ldd [%o0 + 0x50], %f30 + ldd [%o0 + 0x48], %f32 + ldd [%o0 + 0x40], %f34 + ldd [%o0 + 0x38], %f36 + ldd [%o0 + 0x30], %f38 + ldd [%o0 + 0x28], %f40 + ldd [%o0 + 0x20], %f42 + ldd [%o0 + 0x18], %f44 + ldd [%o0 + 0x10], %f46 + ldd [%o0 + 0x08], %f48 + ldd [%o0 + 0x00], %f50 + fxor %f8, %f4, %f4 + fxor %f10, %f6, %f6 + DECRYPT_128(12, 4, 6, 0, 2) + st %f4, [%o2 + 0x00] + st %f5, [%o2 + 0x04] + st %f6, [%o2 + 0x08] + st %f7, [%o2 + 0x0c] + retl + VISExit +ENDPROC(aes_sparc64_decrypt_128) + + .align 32 +ENTRY(aes_sparc64_decrypt_192) + /* %o0=key, %o1=input, %o2=output */ + VISEntry + ld [%o1 + 0x00], %f4 + ld [%o1 + 0x04], %f5 + ld [%o1 + 0x08], %f6 + ld [%o1 + 0x0c], %f7 + ldd [%o0 + 0xc0], %f8 + ldd [%o0 + 0xc8], %f10 + ldd [%o0 + 0xb8], %f12 + ldd [%o0 + 0xb0], %f14 + ldd [%o0 + 0xa8], %f16 + ldd [%o0 + 0xa0], %f18 + fxor %f8, %f4, %f4 + fxor %f10, %f6, %f6 + ldd [%o0 + 0x98], %f20 + ldd [%o0 + 0x90], 
%f22 + ldd [%o0 + 0x88], %f24 + ldd [%o0 + 0x80], %f26 + DECRYPT_TWO_ROUNDS(12, 4, 6, 0, 2) + ldd [%o0 + 0x78], %f28 + ldd [%o0 + 0x70], %f30 + ldd [%o0 + 0x68], %f32 + ldd [%o0 + 0x60], %f34 + ldd [%o0 + 0x58], %f36 + ldd [%o0 + 0x50], %f38 + ldd [%o0 + 0x48], %f40 + ldd [%o0 + 0x40], %f42 + ldd [%o0 + 0x38], %f44 + ldd [%o0 + 0x30], %f46 + ldd [%o0 + 0x28], %f48 + ldd [%o0 + 0x20], %f50 + ldd [%o0 + 0x18], %f52 + ldd [%o0 + 0x10], %f54 + ldd [%o0 + 0x08], %f56 + ldd [%o0 + 0x00], %f58 + DECRYPT_128(20, 4, 6, 0, 2) + st %f4, [%o2 + 0x00] + st %f5, [%o2 + 0x04] + st %f6, [%o2 + 0x08] + st %f7, [%o2 + 0x0c] + retl + VISExit +ENDPROC(aes_sparc64_decrypt_192) + + .align 32 +ENTRY(aes_sparc64_decrypt_256) + /* %o0=key, %o1=input, %o2=output */ + VISEntry + ld [%o1 + 0x00], %f4 + ld [%o1 + 0x04], %f5 + ld [%o1 + 0x08], %f6 + ld [%o1 + 0x0c], %f7 + ldd [%o0 + 0xe0], %f8 + ldd [%o0 + 0xe8], %f10 + ldd [%o0 + 0xd8], %f12 + ldd [%o0 + 0xd0], %f14 + ldd [%o0 + 0xc8], %f16 + fxor %f8, %f4, %f4 + ldd [%o0 + 0xc0], %f18 + fxor %f10, %f6, %f6 + ldd [%o0 + 0xb8], %f20 + AES_DROUND23(12, 4, 6, 2) + ldd [%o0 + 0xb0], %f22 + AES_DROUND01(14, 4, 6, 0) + ldd [%o0 + 0xa8], %f24 + AES_DROUND23(16, 0, 2, 6) + ldd [%o0 + 0xa0], %f26 + AES_DROUND01(18, 0, 2, 4) + ldd [%o0 + 0x98], %f12 + AES_DROUND23(20, 4, 6, 2) + ldd [%o0 + 0x90], %f14 + AES_DROUND01(22, 4, 6, 0) + ldd [%o0 + 0x88], %f16 + AES_DROUND23(24, 0, 2, 6) + ldd [%o0 + 0x80], %f18 + AES_DROUND01(26, 0, 2, 4) + ldd [%o0 + 0x78], %f20 + AES_DROUND23(12, 4, 6, 2) + ldd [%o0 + 0x70], %f22 + AES_DROUND01(14, 4, 6, 0) + ldd [%o0 + 0x68], %f24 + AES_DROUND23(16, 0, 2, 6) + ldd [%o0 + 0x60], %f26 + AES_DROUND01(18, 0, 2, 4) + ldd [%o0 + 0x58], %f28 + AES_DROUND23(20, 4, 6, 2) + ldd [%o0 + 0x50], %f30 + AES_DROUND01(22, 4, 6, 0) + ldd [%o0 + 0x48], %f32 + AES_DROUND23(24, 0, 2, 6) + ldd [%o0 + 0x40], %f34 + AES_DROUND01(26, 0, 2, 4) + ldd [%o0 + 0x38], %f36 + AES_DROUND23(28, 4, 6, 2) + ldd [%o0 + 0x30], %f38 + AES_DROUND01(30, 4, 6, 0) + ldd [%o0 + 0x28], %f40 + AES_DROUND23(32, 0, 2, 6) + ldd [%o0 + 0x20], %f42 + AES_DROUND01(34, 0, 2, 4) + ldd [%o0 + 0x18], %f44 + AES_DROUND23(36, 4, 6, 2) + ldd [%o0 + 0x10], %f46 + AES_DROUND01(38, 4, 6, 0) + ldd [%o0 + 0x08], %f48 + AES_DROUND23(40, 0, 2, 6) + ldd [%o0 + 0x00], %f50 + AES_DROUND01(42, 0, 2, 4) + AES_DROUND23(44, 4, 6, 2) + AES_DROUND01(46, 4, 6, 0) + AES_DROUND23_L(48, 0, 2, 6) + AES_DROUND01_L(50, 0, 2, 4) + st %f4, [%o2 + 0x00] + st %f5, [%o2 + 0x04] + st %f6, [%o2 + 0x08] + st %f7, [%o2 + 0x0c] + retl + VISExit +ENDPROC(aes_sparc64_decrypt_256) + + .align 32 +ENTRY(aes_sparc64_load_encrypt_keys_128) + /* %o0=key */ + VISEntry + ldd [%o0 + 0x10], %f8 + ldd [%o0 + 0x18], %f10 + ldd [%o0 + 0x20], %f12 + ldd [%o0 + 0x28], %f14 + ldd [%o0 + 0x30], %f16 + ldd [%o0 + 0x38], %f18 + ldd [%o0 + 0x40], %f20 + ldd [%o0 + 0x48], %f22 + ldd [%o0 + 0x50], %f24 + ldd [%o0 + 0x58], %f26 + ldd [%o0 + 0x60], %f28 + ldd [%o0 + 0x68], %f30 + ldd [%o0 + 0x70], %f32 + ldd [%o0 + 0x78], %f34 + ldd [%o0 + 0x80], %f36 + ldd [%o0 + 0x88], %f38 + ldd [%o0 + 0x90], %f40 + ldd [%o0 + 0x98], %f42 + ldd [%o0 + 0xa0], %f44 + retl + ldd [%o0 + 0xa8], %f46 +ENDPROC(aes_sparc64_load_encrypt_keys_128) + + .align 32 +ENTRY(aes_sparc64_load_encrypt_keys_192) + /* %o0=key */ + VISEntry + ldd [%o0 + 0x10], %f8 + ldd [%o0 + 0x18], %f10 + ldd [%o0 + 0x20], %f12 + ldd [%o0 + 0x28], %f14 + ldd [%o0 + 0x30], %f16 + ldd [%o0 + 0x38], %f18 + ldd [%o0 + 0x40], %f20 + ldd [%o0 + 0x48], %f22 + ldd [%o0 + 0x50], %f24 + ldd [%o0 + 0x58], %f26 + ldd [%o0 + 
0x60], %f28 + ldd [%o0 + 0x68], %f30 + ldd [%o0 + 0x70], %f32 + ldd [%o0 + 0x78], %f34 + ldd [%o0 + 0x80], %f36 + ldd [%o0 + 0x88], %f38 + ldd [%o0 + 0x90], %f40 + ldd [%o0 + 0x98], %f42 + ldd [%o0 + 0xa0], %f44 + ldd [%o0 + 0xa8], %f46 + ldd [%o0 + 0xb0], %f48 + ldd [%o0 + 0xb8], %f50 + ldd [%o0 + 0xc0], %f52 + retl + ldd [%o0 + 0xc8], %f54 +ENDPROC(aes_sparc64_load_encrypt_keys_192) + + .align 32 +ENTRY(aes_sparc64_load_encrypt_keys_256) + /* %o0=key */ + VISEntry + ldd [%o0 + 0x10], %f8 + ldd [%o0 + 0x18], %f10 + ldd [%o0 + 0x20], %f12 + ldd [%o0 + 0x28], %f14 + ldd [%o0 + 0x30], %f16 + ldd [%o0 + 0x38], %f18 + ldd [%o0 + 0x40], %f20 + ldd [%o0 + 0x48], %f22 + ldd [%o0 + 0x50], %f24 + ldd [%o0 + 0x58], %f26 + ldd [%o0 + 0x60], %f28 + ldd [%o0 + 0x68], %f30 + ldd [%o0 + 0x70], %f32 + ldd [%o0 + 0x78], %f34 + ldd [%o0 + 0x80], %f36 + ldd [%o0 + 0x88], %f38 + ldd [%o0 + 0x90], %f40 + ldd [%o0 + 0x98], %f42 + ldd [%o0 + 0xa0], %f44 + ldd [%o0 + 0xa8], %f46 + ldd [%o0 + 0xb0], %f48 + ldd [%o0 + 0xb8], %f50 + ldd [%o0 + 0xc0], %f52 + ldd [%o0 + 0xc8], %f54 + ldd [%o0 + 0xd0], %f56 + ldd [%o0 + 0xd8], %f58 + ldd [%o0 + 0xe0], %f60 + retl + ldd [%o0 + 0xe8], %f62 +ENDPROC(aes_sparc64_load_encrypt_keys_256) + + .align 32 +ENTRY(aes_sparc64_load_decrypt_keys_128) + /* %o0=key */ + VISEntry + ldd [%o0 + 0x98], %f8 + ldd [%o0 + 0x90], %f10 + ldd [%o0 + 0x88], %f12 + ldd [%o0 + 0x80], %f14 + ldd [%o0 + 0x78], %f16 + ldd [%o0 + 0x70], %f18 + ldd [%o0 + 0x68], %f20 + ldd [%o0 + 0x60], %f22 + ldd [%o0 + 0x58], %f24 + ldd [%o0 + 0x50], %f26 + ldd [%o0 + 0x48], %f28 + ldd [%o0 + 0x40], %f30 + ldd [%o0 + 0x38], %f32 + ldd [%o0 + 0x30], %f34 + ldd [%o0 + 0x28], %f36 + ldd [%o0 + 0x20], %f38 + ldd [%o0 + 0x18], %f40 + ldd [%o0 + 0x10], %f42 + ldd [%o0 + 0x08], %f44 + retl + ldd [%o0 + 0x00], %f46 +ENDPROC(aes_sparc64_load_decrypt_keys_128) + + .align 32 +ENTRY(aes_sparc64_load_decrypt_keys_192) + /* %o0=key */ + VISEntry + ldd [%o0 + 0xb8], %f8 + ldd [%o0 + 0xb0], %f10 + ldd [%o0 + 0xa8], %f12 + ldd [%o0 + 0xa0], %f14 + ldd [%o0 + 0x98], %f16 + ldd [%o0 + 0x90], %f18 + ldd [%o0 + 0x88], %f20 + ldd [%o0 + 0x80], %f22 + ldd [%o0 + 0x78], %f24 + ldd [%o0 + 0x70], %f26 + ldd [%o0 + 0x68], %f28 + ldd [%o0 + 0x60], %f30 + ldd [%o0 + 0x58], %f32 + ldd [%o0 + 0x50], %f34 + ldd [%o0 + 0x48], %f36 + ldd [%o0 + 0x40], %f38 + ldd [%o0 + 0x38], %f40 + ldd [%o0 + 0x30], %f42 + ldd [%o0 + 0x28], %f44 + ldd [%o0 + 0x20], %f46 + ldd [%o0 + 0x18], %f48 + ldd [%o0 + 0x10], %f50 + ldd [%o0 + 0x08], %f52 + retl + ldd [%o0 + 0x00], %f54 +ENDPROC(aes_sparc64_load_decrypt_keys_192) + + .align 32 +ENTRY(aes_sparc64_load_decrypt_keys_256) + /* %o0=key */ + VISEntry + ldd [%o0 + 0xd8], %f8 + ldd [%o0 + 0xd0], %f10 + ldd [%o0 + 0xc8], %f12 + ldd [%o0 + 0xc0], %f14 + ldd [%o0 + 0xb8], %f16 + ldd [%o0 + 0xb0], %f18 + ldd [%o0 + 0xa8], %f20 + ldd [%o0 + 0xa0], %f22 + ldd [%o0 + 0x98], %f24 + ldd [%o0 + 0x90], %f26 + ldd [%o0 + 0x88], %f28 + ldd [%o0 + 0x80], %f30 + ldd [%o0 + 0x78], %f32 + ldd [%o0 + 0x70], %f34 + ldd [%o0 + 0x68], %f36 + ldd [%o0 + 0x60], %f38 + ldd [%o0 + 0x58], %f40 + ldd [%o0 + 0x50], %f42 + ldd [%o0 + 0x48], %f44 + ldd [%o0 + 0x40], %f46 + ldd [%o0 + 0x38], %f48 + ldd [%o0 + 0x30], %f50 + ldd [%o0 + 0x28], %f52 + ldd [%o0 + 0x20], %f54 + ldd [%o0 + 0x18], %f56 + ldd [%o0 + 0x10], %f58 + ldd [%o0 + 0x08], %f60 + retl + ldd [%o0 + 0x00], %f62 +ENDPROC(aes_sparc64_load_decrypt_keys_256) + + .align 32 +ENTRY(aes_sparc64_ecb_encrypt_128) + /* %o0=key, %o1=input, %o2=output, %o3=len */ + ldx [%o0 + 0x00], %g1 + subcc %o3, 
0x10, %o3 + be 10f + ldx [%o0 + 0x08], %g2 +1: ldx [%o1 + 0x00], %g3 + ldx [%o1 + 0x08], %g7 + ldx [%o1 + 0x10], %o4 + ldx [%o1 + 0x18], %o5 + xor %g1, %g3, %g3 + xor %g2, %g7, %g7 + MOVXTOD_G3_F4 + MOVXTOD_G7_F6 + xor %g1, %o4, %g3 + xor %g2, %o5, %g7 + MOVXTOD_G3_F60 + MOVXTOD_G7_F62 + ENCRYPT_128_2(8, 4, 6, 60, 62, 0, 2, 56, 58) + std %f4, [%o2 + 0x00] + std %f6, [%o2 + 0x08] + std %f60, [%o2 + 0x10] + std %f62, [%o2 + 0x18] + sub %o3, 0x20, %o3 + add %o1, 0x20, %o1 + brgz %o3, 1b + add %o2, 0x20, %o2 + brlz,pt %o3, 11f + nop +10: ldx [%o1 + 0x00], %g3 + ldx [%o1 + 0x08], %g7 + xor %g1, %g3, %g3 + xor %g2, %g7, %g7 + MOVXTOD_G3_F4 + MOVXTOD_G7_F6 + ENCRYPT_128(8, 4, 6, 0, 2) + std %f4, [%o2 + 0x00] + std %f6, [%o2 + 0x08] +11: retl + nop +ENDPROC(aes_sparc64_ecb_encrypt_128) + + .align 32 +ENTRY(aes_sparc64_ecb_encrypt_192) + /* %o0=key, %o1=input, %o2=output, %o3=len */ + ldx [%o0 + 0x00], %g1 + subcc %o3, 0x10, %o3 + be 10f + ldx [%o0 + 0x08], %g2 +1: ldx [%o1 + 0x00], %g3 + ldx [%o1 + 0x08], %g7 + ldx [%o1 + 0x10], %o4 + ldx [%o1 + 0x18], %o5 + xor %g1, %g3, %g3 + xor %g2, %g7, %g7 + MOVXTOD_G3_F4 + MOVXTOD_G7_F6 + xor %g1, %o4, %g3 + xor %g2, %o5, %g7 + MOVXTOD_G3_F60 + MOVXTOD_G7_F62 + ENCRYPT_192_2(8, 4, 6, 60, 62, 0, 2, 56, 58) + std %f4, [%o2 + 0x00] + std %f6, [%o2 + 0x08] + std %f60, [%o2 + 0x10] + std %f62, [%o2 + 0x18] + sub %o3, 0x20, %o3 + add %o1, 0x20, %o1 + brgz %o3, 1b + add %o2, 0x20, %o2 + brlz,pt %o3, 11f + nop +10: ldx [%o1 + 0x00], %g3 + ldx [%o1 + 0x08], %g7 + xor %g1, %g3, %g3 + xor %g2, %g7, %g7 + MOVXTOD_G3_F4 + MOVXTOD_G7_F6 + ENCRYPT_192(8, 4, 6, 0, 2) + std %f4, [%o2 + 0x00] + std %f6, [%o2 + 0x08] +11: retl + nop +ENDPROC(aes_sparc64_ecb_encrypt_192) + + .align 32 +ENTRY(aes_sparc64_ecb_encrypt_256) + /* %o0=key, %o1=input, %o2=output, %o3=len */ + ldx [%o0 + 0x00], %g1 + subcc %o3, 0x10, %o3 + be 10f + ldx [%o0 + 0x08], %g2 +1: ldx [%o1 + 0x00], %g3 + ldx [%o1 + 0x08], %g7 + ldx [%o1 + 0x10], %o4 + ldx [%o1 + 0x18], %o5 + xor %g1, %g3, %g3 + xor %g2, %g7, %g7 + MOVXTOD_G3_F4 + MOVXTOD_G7_F6 + xor %g1, %o4, %g3 + xor %g2, %o5, %g7 + MOVXTOD_G3_F0 + MOVXTOD_G7_F2 + ENCRYPT_256_2(8, 4, 6, 0, 2) + std %f4, [%o2 + 0x00] + std %f6, [%o2 + 0x08] + std %f0, [%o2 + 0x10] + std %f2, [%o2 + 0x18] + sub %o3, 0x20, %o3 + add %o1, 0x20, %o1 + brgz %o3, 1b + add %o2, 0x20, %o2 + brlz,pt %o3, 11f + nop +10: ldd [%o0 + 0xd0], %f56 + ldd [%o0 + 0xd8], %f58 + ldd [%o0 + 0xe0], %f60 + ldd [%o0 + 0xe8], %f62 + ldx [%o1 + 0x00], %g3 + ldx [%o1 + 0x08], %g7 + xor %g1, %g3, %g3 + xor %g2, %g7, %g7 + MOVXTOD_G3_F4 + MOVXTOD_G7_F6 + ENCRYPT_256(8, 4, 6, 0, 2) + std %f4, [%o2 + 0x00] + std %f6, [%o2 + 0x08] +11: retl + nop +ENDPROC(aes_sparc64_ecb_encrypt_256) + + .align 32 +ENTRY(aes_sparc64_ecb_decrypt_128) + /* %o0=&key[key_len], %o1=input, %o2=output, %o3=len */ + ldx [%o0 - 0x10], %g1 + subcc %o3, 0x10, %o3 + be 10f + ldx [%o0 - 0x08], %g2 +1: ldx [%o1 + 0x00], %g3 + ldx [%o1 + 0x08], %g7 + ldx [%o1 + 0x10], %o4 + ldx [%o1 + 0x18], %o5 + xor %g1, %g3, %g3 + xor %g2, %g7, %g7 + MOVXTOD_G3_F4 + MOVXTOD_G7_F6 + xor %g1, %o4, %g3 + xor %g2, %o5, %g7 + MOVXTOD_G3_F60 + MOVXTOD_G7_F62 + DECRYPT_128_2(8, 4, 6, 60, 62, 0, 2, 56, 58) + std %f4, [%o2 + 0x00] + std %f6, [%o2 + 0x08] + std %f60, [%o2 + 0x10] + std %f62, [%o2 + 0x18] + sub %o3, 0x20, %o3 + add %o1, 0x20, %o1 + brgz,pt %o3, 1b + add %o2, 0x20, %o2 + brlz,pt %o3, 11f + nop +10: ldx [%o1 + 0x00], %g3 + ldx [%o1 + 0x08], %g7 + xor %g1, %g3, %g3 + xor %g2, %g7, %g7 + MOVXTOD_G3_F4 + MOVXTOD_G7_F6 + DECRYPT_128(8, 4, 6, 0, 2) + 
std %f4, [%o2 + 0x00] + std %f6, [%o2 + 0x08] +11: retl + nop +ENDPROC(aes_sparc64_ecb_decrypt_128) + + .align 32 +ENTRY(aes_sparc64_ecb_decrypt_192) + /* %o0=&key[key_len], %o1=input, %o2=output, %o3=len */ + ldx [%o0 - 0x10], %g1 + subcc %o3, 0x10, %o3 + be 10f + ldx [%o0 - 0x08], %g2 +1: ldx [%o1 + 0x00], %g3 + ldx [%o1 + 0x08], %g7 + ldx [%o1 + 0x10], %o4 + ldx [%o1 + 0x18], %o5 + xor %g1, %g3, %g3 + xor %g2, %g7, %g7 + MOVXTOD_G3_F4 + MOVXTOD_G7_F6 + xor %g1, %o4, %g3 + xor %g2, %o5, %g7 + MOVXTOD_G3_F60 + MOVXTOD_G7_F62 + DECRYPT_192_2(8, 4, 6, 60, 62, 0, 2, 56, 58) + std %f4, [%o2 + 0x00] + std %f6, [%o2 + 0x08] + std %f60, [%o2 + 0x10] + std %f62, [%o2 + 0x18] + sub %o3, 0x20, %o3 + add %o1, 0x20, %o1 + brgz,pt %o3, 1b + add %o2, 0x20, %o2 + brlz,pt %o3, 11f + nop +10: ldx [%o1 + 0x00], %g3 + ldx [%o1 + 0x08], %g7 + xor %g1, %g3, %g3 + xor %g2, %g7, %g7 + MOVXTOD_G3_F4 + MOVXTOD_G7_F6 + DECRYPT_192(8, 4, 6, 0, 2) + std %f4, [%o2 + 0x00] + std %f6, [%o2 + 0x08] +11: retl + nop +ENDPROC(aes_sparc64_ecb_decrypt_192) + + .align 32 +ENTRY(aes_sparc64_ecb_decrypt_256) + /* %o0=&key[key_len], %o1=input, %o2=output, %o3=len */ + ldx [%o0 - 0x10], %g1 + subcc %o3, 0x10, %o3 + ldx [%o0 - 0x08], %g2 + be 10f + sub %o0, 0xf0, %o0 +1: ldx [%o1 + 0x00], %g3 + ldx [%o1 + 0x08], %g7 + ldx [%o1 + 0x10], %o4 + ldx [%o1 + 0x18], %o5 + xor %g1, %g3, %g3 + xor %g2, %g7, %g7 + MOVXTOD_G3_F4 + MOVXTOD_G7_F6 + xor %g1, %o4, %g3 + xor %g2, %o5, %g7 + MOVXTOD_G3_F0 + MOVXTOD_G7_F2 + DECRYPT_256_2(8, 4, 6, 0, 2) + std %f4, [%o2 + 0x00] + std %f6, [%o2 + 0x08] + std %f0, [%o2 + 0x10] + std %f2, [%o2 + 0x18] + sub %o3, 0x20, %o3 + add %o1, 0x20, %o1 + brgz,pt %o3, 1b + add %o2, 0x20, %o2 + brlz,pt %o3, 11f + nop +10: ldd [%o0 + 0x18], %f56 + ldd [%o0 + 0x10], %f58 + ldd [%o0 + 0x08], %f60 + ldd [%o0 + 0x00], %f62 + ldx [%o1 + 0x00], %g3 + ldx [%o1 + 0x08], %g7 + xor %g1, %g3, %g3 + xor %g2, %g7, %g7 + MOVXTOD_G3_F4 + MOVXTOD_G7_F6 + DECRYPT_256(8, 4, 6, 0, 2) + std %f4, [%o2 + 0x00] + std %f6, [%o2 + 0x08] +11: retl + nop +ENDPROC(aes_sparc64_ecb_decrypt_256) + + .align 32 +ENTRY(aes_sparc64_cbc_encrypt_128) + /* %o0=key, %o1=input, %o2=output, %o3=len, %o4=IV */ + ldd [%o4 + 0x00], %f4 + ldd [%o4 + 0x08], %f6 + ldx [%o0 + 0x00], %g1 + ldx [%o0 + 0x08], %g2 +1: ldx [%o1 + 0x00], %g3 + ldx [%o1 + 0x08], %g7 + add %o1, 0x10, %o1 + xor %g1, %g3, %g3 + xor %g2, %g7, %g7 + MOVXTOD_G3_F0 + MOVXTOD_G7_F2 + fxor %f4, %f0, %f4 + fxor %f6, %f2, %f6 + ENCRYPT_128(8, 4, 6, 0, 2) + std %f4, [%o2 + 0x00] + std %f6, [%o2 + 0x08] + subcc %o3, 0x10, %o3 + bne,pt %xcc, 1b + add %o2, 0x10, %o2 + std %f4, [%o4 + 0x00] + std %f6, [%o4 + 0x08] + retl + nop +ENDPROC(aes_sparc64_cbc_encrypt_128) + + .align 32 +ENTRY(aes_sparc64_cbc_encrypt_192) + /* %o0=key, %o1=input, %o2=output, %o3=len, %o4=IV */ + ldd [%o4 + 0x00], %f4 + ldd [%o4 + 0x08], %f6 + ldx [%o0 + 0x00], %g1 + ldx [%o0 + 0x08], %g2 +1: ldx [%o1 + 0x00], %g3 + ldx [%o1 + 0x08], %g7 + add %o1, 0x10, %o1 + xor %g1, %g3, %g3 + xor %g2, %g7, %g7 + MOVXTOD_G3_F0 + MOVXTOD_G7_F2 + fxor %f4, %f0, %f4 + fxor %f6, %f2, %f6 + ENCRYPT_192(8, 4, 6, 0, 2) + std %f4, [%o2 + 0x00] + std %f6, [%o2 + 0x08] + subcc %o3, 0x10, %o3 + bne,pt %xcc, 1b + add %o2, 0x10, %o2 + std %f4, [%o4 + 0x00] + std %f6, [%o4 + 0x08] + retl + nop +ENDPROC(aes_sparc64_cbc_encrypt_192) + + .align 32 +ENTRY(aes_sparc64_cbc_encrypt_256) + /* %o0=key, %o1=input, %o2=output, %o3=len, %o4=IV */ + ldd [%o4 + 0x00], %f4 + ldd [%o4 + 0x08], %f6 + ldx [%o0 + 0x00], %g1 + ldx [%o0 + 0x08], %g2 +1: ldx [%o1 + 0x00], %g3 + 
ldx [%o1 + 0x08], %g7 + add %o1, 0x10, %o1 + xor %g1, %g3, %g3 + xor %g2, %g7, %g7 + MOVXTOD_G3_F0 + MOVXTOD_G7_F2 + fxor %f4, %f0, %f4 + fxor %f6, %f2, %f6 + ENCRYPT_256(8, 4, 6, 0, 2) + std %f4, [%o2 + 0x00] + std %f6, [%o2 + 0x08] + subcc %o3, 0x10, %o3 + bne,pt %xcc, 1b + add %o2, 0x10, %o2 + std %f4, [%o4 + 0x00] + std %f6, [%o4 + 0x08] + retl + nop +ENDPROC(aes_sparc64_cbc_encrypt_256) + + .align 32 +ENTRY(aes_sparc64_cbc_decrypt_128) + /* %o0=&key[key_len], %o1=input, %o2=output, %o3=len, %o4=iv */ + ldx [%o0 - 0x10], %g1 + ldx [%o0 - 0x08], %g2 + ldx [%o4 + 0x00], %o0 + ldx [%o4 + 0x08], %o5 +1: ldx [%o1 + 0x00], %g3 + ldx [%o1 + 0x08], %g7 + add %o1, 0x10, %o1 + xor %g1, %g3, %g3 + xor %g2, %g7, %g7 + MOVXTOD_G3_F4 + MOVXTOD_G7_F6 + DECRYPT_128(8, 4, 6, 0, 2) + MOVXTOD_O0_F0 + MOVXTOD_O5_F2 + xor %g1, %g3, %o0 + xor %g2, %g7, %o5 + fxor %f4, %f0, %f4 + fxor %f6, %f2, %f6 + std %f4, [%o2 + 0x00] + std %f6, [%o2 + 0x08] + subcc %o3, 0x10, %o3 + bne,pt %xcc, 1b + add %o2, 0x10, %o2 + stx %o0, [%o4 + 0x00] + stx %o5, [%o4 + 0x08] + retl + nop +ENDPROC(aes_sparc64_cbc_decrypt_128) + + .align 32 +ENTRY(aes_sparc64_cbc_decrypt_192) + /* %o0=&key[key_len], %o1=input, %o2=output, %o3=len, %o4=iv */ + ldx [%o0 - 0x10], %g1 + ldx [%o0 - 0x08], %g2 + ldx [%o4 + 0x00], %o0 + ldx [%o4 + 0x08], %o5 +1: ldx [%o1 + 0x00], %g3 + ldx [%o1 + 0x08], %g7 + add %o1, 0x10, %o1 + xor %g1, %g3, %g3 + xor %g2, %g7, %g7 + MOVXTOD_G3_F4 + MOVXTOD_G7_F6 + DECRYPT_192(8, 4, 6, 0, 2) + MOVXTOD_O0_F0 + MOVXTOD_O5_F2 + xor %g1, %g3, %o0 + xor %g2, %g7, %o5 + fxor %f4, %f0, %f4 + fxor %f6, %f2, %f6 + std %f4, [%o2 + 0x00] + std %f6, [%o2 + 0x08] + subcc %o3, 0x10, %o3 + bne,pt %xcc, 1b + add %o2, 0x10, %o2 + stx %o0, [%o4 + 0x00] + stx %o5, [%o4 + 0x08] + retl + nop +ENDPROC(aes_sparc64_cbc_decrypt_192) + + .align 32 +ENTRY(aes_sparc64_cbc_decrypt_256) + /* %o0=&key[key_len], %o1=input, %o2=output, %o3=len, %o4=iv */ + ldx [%o0 - 0x10], %g1 + ldx [%o0 - 0x08], %g2 + ldx [%o4 + 0x00], %o0 + ldx [%o4 + 0x08], %o5 +1: ldx [%o1 + 0x00], %g3 + ldx [%o1 + 0x08], %g7 + add %o1, 0x10, %o1 + xor %g1, %g3, %g3 + xor %g2, %g7, %g7 + MOVXTOD_G3_F4 + MOVXTOD_G7_F6 + DECRYPT_256(8, 4, 6, 0, 2) + MOVXTOD_O0_F0 + MOVXTOD_O5_F2 + xor %g1, %g3, %o0 + xor %g2, %g7, %o5 + fxor %f4, %f0, %f4 + fxor %f6, %f2, %f6 + std %f4, [%o2 + 0x00] + std %f6, [%o2 + 0x08] + subcc %o3, 0x10, %o3 + bne,pt %xcc, 1b + add %o2, 0x10, %o2 + stx %o0, [%o4 + 0x00] + stx %o5, [%o4 + 0x08] + retl + nop +ENDPROC(aes_sparc64_cbc_decrypt_256) + + .align 32 +ENTRY(aes_sparc64_ctr_crypt_128) + /* %o0=key, %o1=input, %o2=output, %o3=len, %o4=IV */ + ldx [%o4 + 0x00], %g3 + ldx [%o4 + 0x08], %g7 + subcc %o3, 0x10, %o3 + ldx [%o0 + 0x00], %g1 + be 10f + ldx [%o0 + 0x08], %g2 +1: xor %g1, %g3, %o5 + MOVXTOD_O5_F0 + xor %g2, %g7, %o5 + MOVXTOD_O5_F2 + add %g7, 1, %g7 + add %g3, 1, %o5 + movrz %g7, %o5, %g3 + xor %g1, %g3, %o5 + MOVXTOD_O5_F4 + xor %g2, %g7, %o5 + MOVXTOD_O5_F6 + add %g7, 1, %g7 + add %g3, 1, %o5 + movrz %g7, %o5, %g3 + ENCRYPT_128_2(8, 0, 2, 4, 6, 56, 58, 60, 62) + ldd [%o1 + 0x00], %f56 + ldd [%o1 + 0x08], %f58 + ldd [%o1 + 0x10], %f60 + ldd [%o1 + 0x18], %f62 + fxor %f56, %f0, %f56 + fxor %f58, %f2, %f58 + fxor %f60, %f4, %f60 + fxor %f62, %f6, %f62 + std %f56, [%o2 + 0x00] + std %f58, [%o2 + 0x08] + std %f60, [%o2 + 0x10] + std %f62, [%o2 + 0x18] + subcc %o3, 0x20, %o3 + add %o1, 0x20, %o1 + brgz %o3, 1b + add %o2, 0x20, %o2 + brlz,pt %o3, 11f + nop +10: xor %g1, %g3, %o5 + MOVXTOD_O5_F0 + xor %g2, %g7, %o5 + MOVXTOD_O5_F2 + add %g7, 1, %g7 + add 
%g3, 1, %o5 + movrz %g7, %o5, %g3 + ENCRYPT_128(8, 0, 2, 4, 6) + ldd [%o1 + 0x00], %f4 + ldd [%o1 + 0x08], %f6 + fxor %f4, %f0, %f4 + fxor %f6, %f2, %f6 + std %f4, [%o2 + 0x00] + std %f6, [%o2 + 0x08] +11: stx %g3, [%o4 + 0x00] + retl + stx %g7, [%o4 + 0x08] +ENDPROC(aes_sparc64_ctr_crypt_128) + + .align 32 +ENTRY(aes_sparc64_ctr_crypt_192) + /* %o0=key, %o1=input, %o2=output, %o3=len, %o4=IV */ + ldx [%o4 + 0x00], %g3 + ldx [%o4 + 0x08], %g7 + subcc %o3, 0x10, %o3 + ldx [%o0 + 0x00], %g1 + be 10f + ldx [%o0 + 0x08], %g2 +1: xor %g1, %g3, %o5 + MOVXTOD_O5_F0 + xor %g2, %g7, %o5 + MOVXTOD_O5_F2 + add %g7, 1, %g7 + add %g3, 1, %o5 + movrz %g7, %o5, %g3 + xor %g1, %g3, %o5 + MOVXTOD_O5_F4 + xor %g2, %g7, %o5 + MOVXTOD_O5_F6 + add %g7, 1, %g7 + add %g3, 1, %o5 + movrz %g7, %o5, %g3 + ENCRYPT_192_2(8, 0, 2, 4, 6, 56, 58, 60, 62) + ldd [%o1 + 0x00], %f56 + ldd [%o1 + 0x08], %f58 + ldd [%o1 + 0x10], %f60 + ldd [%o1 + 0x18], %f62 + fxor %f56, %f0, %f56 + fxor %f58, %f2, %f58 + fxor %f60, %f4, %f60 + fxor %f62, %f6, %f62 + std %f56, [%o2 + 0x00] + std %f58, [%o2 + 0x08] + std %f60, [%o2 + 0x10] + std %f62, [%o2 + 0x18] + subcc %o3, 0x20, %o3 + add %o1, 0x20, %o1 + brgz %o3, 1b + add %o2, 0x20, %o2 + brlz,pt %o3, 11f + nop +10: xor %g1, %g3, %o5 + MOVXTOD_O5_F0 + xor %g2, %g7, %o5 + MOVXTOD_O5_F2 + add %g7, 1, %g7 + add %g3, 1, %o5 + movrz %g7, %o5, %g3 + ENCRYPT_192(8, 0, 2, 4, 6) + ldd [%o1 + 0x00], %f4 + ldd [%o1 + 0x08], %f6 + fxor %f4, %f0, %f4 + fxor %f6, %f2, %f6 + std %f4, [%o2 + 0x00] + std %f6, [%o2 + 0x08] +11: stx %g3, [%o4 + 0x00] + retl + stx %g7, [%o4 + 0x08] +ENDPROC(aes_sparc64_ctr_crypt_192) + + .align 32 +ENTRY(aes_sparc64_ctr_crypt_256) + /* %o0=key, %o1=input, %o2=output, %o3=len, %o4=IV */ + ldx [%o4 + 0x00], %g3 + ldx [%o4 + 0x08], %g7 + subcc %o3, 0x10, %o3 + ldx [%o0 + 0x00], %g1 + be 10f + ldx [%o0 + 0x08], %g2 +1: xor %g1, %g3, %o5 + MOVXTOD_O5_F0 + xor %g2, %g7, %o5 + MOVXTOD_O5_F2 + add %g7, 1, %g7 + add %g3, 1, %o5 + movrz %g7, %o5, %g3 + xor %g1, %g3, %o5 + MOVXTOD_O5_F4 + xor %g2, %g7, %o5 + MOVXTOD_O5_F6 + add %g7, 1, %g7 + add %g3, 1, %o5 + movrz %g7, %o5, %g3 + ENCRYPT_256_2(8, 0, 2, 4, 6) + ldd [%o1 + 0x00], %f56 + ldd [%o1 + 0x08], %f58 + ldd [%o1 + 0x10], %f60 + ldd [%o1 + 0x18], %f62 + fxor %f56, %f0, %f56 + fxor %f58, %f2, %f58 + fxor %f60, %f4, %f60 + fxor %f62, %f6, %f62 + std %f56, [%o2 + 0x00] + std %f58, [%o2 + 0x08] + std %f60, [%o2 + 0x10] + std %f62, [%o2 + 0x18] + subcc %o3, 0x20, %o3 + add %o1, 0x20, %o1 + brgz %o3, 1b + add %o2, 0x20, %o2 + brlz,pt %o3, 11f + nop +10: ldd [%o0 + 0xd0], %f56 + ldd [%o0 + 0xd8], %f58 + ldd [%o0 + 0xe0], %f60 + ldd [%o0 + 0xe8], %f62 + xor %g1, %g3, %o5 + MOVXTOD_O5_F0 + xor %g2, %g7, %o5 + MOVXTOD_O5_F2 + add %g7, 1, %g7 + add %g3, 1, %o5 + movrz %g7, %o5, %g3 + ENCRYPT_256(8, 0, 2, 4, 6) + ldd [%o1 + 0x00], %f4 + ldd [%o1 + 0x08], %f6 + fxor %f4, %f0, %f4 + fxor %f6, %f2, %f6 + std %f4, [%o2 + 0x00] + std %f6, [%o2 + 0x08] +11: stx %g3, [%o4 + 0x00] + retl + stx %g7, [%o4 + 0x08] +ENDPROC(aes_sparc64_ctr_crypt_256) -- cgit v1.2.3 From b2c15db74a379a50d723002799c3db0c6781c75a Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Mon, 12 Jan 2026 11:20:27 -0800 Subject: crypto: drbg - Use new AES library API Switch from the old AES library functions (which use struct crypto_aes_ctx) to the new ones (which use struct aes_enckey). This eliminates the unnecessary computation and caching of the decryption round keys. The new AES en/decryption functions are also much faster and use AES instructions when supported by the CPU. 
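To make the shape of the conversion concrete, here is a minimal sketch of a caller of the new API. encrypt_one_block() is a hypothetical helper used only for illustration (not something added by this patch); the aes_prepareenckey()/aes_encrypt() pair is the same one used in the diff below:

    #include <crypto/aes.h>
    #include <linux/string.h>	/* memzero_explicit() */

    /* Hypothetical example: encrypt one block with AES-256. */
    static void encrypt_one_block(const u8 key_bytes[AES_KEYSIZE_256],
                                  u8 out[AES_BLOCK_SIZE],
                                  const u8 in[AES_BLOCK_SIZE])
    {
            struct aes_enckey enc;

            /* Expands only the encryption round keys. */
            aes_prepareenckey(&enc, key_bytes, AES_KEYSIZE_256);

            /*
             * aes_encrypt() dispatches on the key type, so passing a
             * struct aes_enckey selects the new implementation.
             */
            aes_encrypt(&enc, out, in);

            /* Wipe the expanded key material when done with it. */
            memzero_explicit(&enc, sizeof(enc));
    }

Since struct aes_enckey holds only the encryption round keys, nothing resembling crypto_aes_ctx's inverse key schedule is ever computed or stored.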
Note that besides the switch to the new key preparation function and key struct type, changing the key struct's type also causes aes_encrypt() (which is temporarily a type-generic macro) to dispatch to the new encryption function rather than the old one. Acked-by: Ard Biesheuvel Link: https://lore.kernel.org/r/20260112192035.10427-30-ebiggers@kernel.org Signed-off-by: Eric Biggers --- crypto/df_sp80090a.c | 30 ++++++++++-------------------- crypto/drbg.c | 12 ++++++------ drivers/crypto/xilinx/xilinx-trng.c | 8 ++++---- include/crypto/df_sp80090a.h | 2 +- 4 files changed, 21 insertions(+), 31 deletions(-) (limited to 'include') diff --git a/crypto/df_sp80090a.c b/crypto/df_sp80090a.c index dc63b31a93fc..b8134be6f7ad 100644 --- a/crypto/df_sp80090a.c +++ b/crypto/df_sp80090a.c @@ -14,27 +14,17 @@ #include #include -static void drbg_kcapi_symsetkey(struct crypto_aes_ctx *aesctx, - const unsigned char *key, - u8 keylen); -static void drbg_kcapi_symsetkey(struct crypto_aes_ctx *aesctx, - const unsigned char *key, u8 keylen) -{ - aes_expandkey(aesctx, key, keylen); -} - -static void drbg_kcapi_sym(struct crypto_aes_ctx *aesctx, - unsigned char *outval, +static void drbg_kcapi_sym(struct aes_enckey *aeskey, unsigned char *outval, const struct drbg_string *in, u8 blocklen_bytes) { /* there is only component in *in */ BUG_ON(in->len < blocklen_bytes); - aes_encrypt(aesctx, outval, in->buf); + aes_encrypt(aeskey, outval, in->buf); } /* BCC function for CTR DRBG as defined in 10.4.3 */ -static void drbg_ctr_bcc(struct crypto_aes_ctx *aesctx, +static void drbg_ctr_bcc(struct aes_enckey *aeskey, unsigned char *out, const unsigned char *key, struct list_head *in, u8 blocklen_bytes, @@ -47,7 +37,7 @@ static void drbg_ctr_bcc(struct crypto_aes_ctx *aesctx, drbg_string_fill(&data, out, blocklen_bytes); /* 10.4.3 step 2 / 4 */ - drbg_kcapi_symsetkey(aesctx, key, keylen); + aes_prepareenckey(aeskey, key, keylen); list_for_each_entry(curr, in, list) { const unsigned char *pos = curr->buf; size_t len = curr->len; @@ -56,7 +46,7 @@ static void drbg_ctr_bcc(struct crypto_aes_ctx *aesctx, /* 10.4.3 step 4.2 */ if (blocklen_bytes == cnt) { cnt = 0; - drbg_kcapi_sym(aesctx, out, &data, blocklen_bytes); + drbg_kcapi_sym(aeskey, out, &data, blocklen_bytes); } out[cnt] ^= *pos; pos++; @@ -66,7 +56,7 @@ static void drbg_ctr_bcc(struct crypto_aes_ctx *aesctx, } /* 10.4.3 step 4.2 for last block */ if (cnt) - drbg_kcapi_sym(aesctx, out, &data, blocklen_bytes); + drbg_kcapi_sym(aeskey, out, &data, blocklen_bytes); } /* @@ -110,7 +100,7 @@ static void drbg_ctr_bcc(struct crypto_aes_ctx *aesctx, */ /* Derivation Function for CTR DRBG as defined in 10.4.2 */ -int crypto_drbg_ctr_df(struct crypto_aes_ctx *aesctx, +int crypto_drbg_ctr_df(struct aes_enckey *aeskey, unsigned char *df_data, size_t bytes_to_return, struct list_head *seedlist, u8 blocklen_bytes, @@ -187,7 +177,7 @@ int crypto_drbg_ctr_df(struct crypto_aes_ctx *aesctx, */ drbg_cpu_to_be32(i, iv); /* 10.4.2 step 9.2 -- BCC and concatenation with temp */ - drbg_ctr_bcc(aesctx, temp + templen, K, &bcc_list, + drbg_ctr_bcc(aeskey, temp + templen, K, &bcc_list, blocklen_bytes, keylen); /* 10.4.2 step 9.3 */ i++; @@ -201,7 +191,7 @@ /* 10.4.2 step 12: overwriting of outval is implemented in next step */ /* 10.4.2 step 13 */ - drbg_kcapi_symsetkey(aesctx, temp, keylen); + aes_prepareenckey(aeskey, temp, keylen); while (generated_len < bytes_to_return) { short blocklen = 0; /* @@ -209,7 +199,7 @@ int
crypto_drbg_ctr_df(struct crypto_aes_ctx *aesctx, * implicit as the key is only drbg_blocklen in size based on * the implementation of the cipher function callback */ - drbg_kcapi_sym(aesctx, X, &cipherin, blocklen_bytes); + drbg_kcapi_sym(aeskey, X, &cipherin, blocklen_bytes); blocklen = (blocklen_bytes < (bytes_to_return - generated_len)) ? blocklen_bytes : diff --git a/crypto/drbg.c b/crypto/drbg.c index 1d433dae9955..85cc4549bd58 100644 --- a/crypto/drbg.c +++ b/crypto/drbg.c @@ -1505,9 +1505,9 @@ static int drbg_kcapi_hash(struct drbg_state *drbg, unsigned char *outval, #ifdef CONFIG_CRYPTO_DRBG_CTR static int drbg_fini_sym_kernel(struct drbg_state *drbg) { - struct crypto_aes_ctx *aesctx = (struct crypto_aes_ctx *)drbg->priv_data; + struct aes_enckey *aeskey = drbg->priv_data; - kfree(aesctx); + kfree(aeskey); drbg->priv_data = NULL; if (drbg->ctr_handle) @@ -1526,16 +1526,16 @@ static int drbg_fini_sym_kernel(struct drbg_state *drbg) static int drbg_init_sym_kernel(struct drbg_state *drbg) { - struct crypto_aes_ctx *aesctx; + struct aes_enckey *aeskey; struct crypto_skcipher *sk_tfm; struct skcipher_request *req; unsigned int alignmask; char ctr_name[CRYPTO_MAX_ALG_NAME]; - aesctx = kzalloc(sizeof(*aesctx), GFP_KERNEL); - if (!aesctx) + aeskey = kzalloc(sizeof(*aeskey), GFP_KERNEL); + if (!aeskey) return -ENOMEM; - drbg->priv_data = aesctx; + drbg->priv_data = aeskey; if (snprintf(ctr_name, CRYPTO_MAX_ALG_NAME, "ctr(%s)", drbg->core->backend_cra_name) >= CRYPTO_MAX_ALG_NAME) { diff --git a/drivers/crypto/xilinx/xilinx-trng.c b/drivers/crypto/xilinx/xilinx-trng.c index db0fbb28ff32..5276ac2d82bb 100644 --- a/drivers/crypto/xilinx/xilinx-trng.c +++ b/drivers/crypto/xilinx/xilinx-trng.c @@ -60,7 +60,7 @@ struct xilinx_rng { void __iomem *rng_base; struct device *dev; unsigned char *scratchpadbuf; - struct crypto_aes_ctx *aesctx; + struct aes_enckey *aeskey; struct mutex lock; /* Protect access to TRNG device */ struct hwrng trng; }; @@ -198,7 +198,7 @@ static int xtrng_reseed_internal(struct xilinx_rng *rng) ret = xtrng_collect_random_data(rng, entropy, TRNG_SEED_LEN_BYTES, true); if (ret != TRNG_SEED_LEN_BYTES) return -EINVAL; - ret = crypto_drbg_ctr_df(rng->aesctx, rng->scratchpadbuf, + ret = crypto_drbg_ctr_df(rng->aeskey, rng->scratchpadbuf, TRNG_SEED_LEN_BYTES, &seedlist, AES_BLOCK_SIZE, TRNG_SEED_LEN_BYTES); if (ret) @@ -349,8 +349,8 @@ static int xtrng_probe(struct platform_device *pdev) return PTR_ERR(rng->rng_base); } - rng->aesctx = devm_kzalloc(&pdev->dev, sizeof(*rng->aesctx), GFP_KERNEL); - if (!rng->aesctx) + rng->aeskey = devm_kzalloc(&pdev->dev, sizeof(*rng->aeskey), GFP_KERNEL); + if (!rng->aeskey) return -ENOMEM; sb_size = crypto_drbg_ctr_df_datalen(TRNG_SEED_LEN_BYTES, AES_BLOCK_SIZE); diff --git a/include/crypto/df_sp80090a.h b/include/crypto/df_sp80090a.h index 6b25305fe611..cb5d6fe15d40 100644 --- a/include/crypto/df_sp80090a.h +++ b/include/crypto/df_sp80090a.h @@ -18,7 +18,7 @@ static inline int crypto_drbg_ctr_df_datalen(u8 statelen, u8 blocklen) statelen + blocklen; /* temp */ } -int crypto_drbg_ctr_df(struct crypto_aes_ctx *aes, +int crypto_drbg_ctr_df(struct aes_enckey *aes, unsigned char *df_data, size_t bytes_to_return, struct list_head *seedlist, -- cgit v1.2.3 From 7fcb22dc7765185a86077f65179ec4fa3f30b910 Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Mon, 12 Jan 2026 11:20:30 -0800 Subject: lib/crypto: aescfb: Use new AES library API Switch from the old AES library functions (which use struct crypto_aes_ctx) to the new ones (which use struct 
aes_enckey). This eliminates the unnecessary computation and caching of the decryption round keys. The new AES en/decryption functions are also much faster and use AES instructions when supported by the CPU. Note that besides the switch to the new key preparation function and key struct type, changing the key struct's type also causes aes_encrypt() (which is temporarily a type-generic macro) to dispatch to the new encryption function rather than the old one. Acked-by: Ard Biesheuvel Link: https://lore.kernel.org/r/20260112192035.10427-33-ebiggers@kernel.org Signed-off-by: Eric Biggers --- drivers/char/tpm/tpm2-sessions.c | 10 +++++----- include/crypto/aes.h | 4 ++-- lib/crypto/aescfb.c | 30 +++++++++++++++--------------- 3 files changed, 22 insertions(+), 22 deletions(-) (limited to 'include') diff --git a/drivers/char/tpm/tpm2-sessions.c b/drivers/char/tpm/tpm2-sessions.c index 4149379665c4..09df6353ef04 100644 --- a/drivers/char/tpm/tpm2-sessions.c +++ b/drivers/char/tpm/tpm2-sessions.c @@ -126,7 +126,7 @@ struct tpm2_auth { u8 session_key[SHA256_DIGEST_SIZE]; u8 passphrase[SHA256_DIGEST_SIZE]; int passphrase_len; - struct crypto_aes_ctx aes_ctx; + struct aes_enckey aes_key; /* saved session attributes: */ u8 attrs; __be32 ordinal; @@ -677,8 +677,8 @@ int tpm_buf_fill_hmac_session(struct tpm_chip *chip, struct tpm_buf *buf) auth->scratch); len = tpm_buf_read_u16(buf, &offset_p); - aes_expandkey(&auth->aes_ctx, auth->scratch, AES_KEY_BYTES); - aescfb_encrypt(&auth->aes_ctx, &buf->data[offset_p], + aes_prepareenckey(&auth->aes_key, auth->scratch, AES_KEY_BYTES); + aescfb_encrypt(&auth->aes_key, &buf->data[offset_p], &buf->data[offset_p], len, auth->scratch + AES_KEY_BYTES); /* reset p to beginning of parameters for HMAC */ @@ -858,8 +858,8 @@ int tpm_buf_check_hmac_response(struct tpm_chip *chip, struct tpm_buf *buf, auth->scratch); len = tpm_buf_read_u16(buf, &offset_p); - aes_expandkey(&auth->aes_ctx, auth->scratch, AES_KEY_BYTES); - aescfb_decrypt(&auth->aes_ctx, &buf->data[offset_p], + aes_prepareenckey(&auth->aes_key, auth->scratch, AES_KEY_BYTES); + aescfb_decrypt(&auth->aes_key, &buf->data[offset_p], &buf->data[offset_p], len, auth->scratch + AES_KEY_BYTES); } diff --git a/include/crypto/aes.h b/include/crypto/aes.h index 4a56aed59973..4cb3c27d1bf5 100644 --- a/include/crypto/aes.h +++ b/include/crypto/aes.h @@ -343,9 +343,9 @@ extern const u8 crypto_aes_inv_sbox[]; extern const u32 aes_enc_tab[256]; extern const u32 aes_dec_tab[256]; -void aescfb_encrypt(const struct crypto_aes_ctx *ctx, u8 *dst, const u8 *src, +void aescfb_encrypt(const struct aes_enckey *key, u8 *dst, const u8 *src, int len, const u8 iv[AES_BLOCK_SIZE]); -void aescfb_decrypt(const struct crypto_aes_ctx *ctx, u8 *dst, const u8 *src, +void aescfb_decrypt(const struct aes_enckey *key, u8 *dst, const u8 *src, int len, const u8 iv[AES_BLOCK_SIZE]); #endif diff --git a/lib/crypto/aescfb.c b/lib/crypto/aescfb.c index 0f294c8cbf3c..147e5211728f 100644 --- a/lib/crypto/aescfb.c +++ b/lib/crypto/aescfb.c @@ -11,7 +11,7 @@ #include #include -static void aescfb_encrypt_block(const struct crypto_aes_ctx *ctx, void *dst, +static void aescfb_encrypt_block(const struct aes_enckey *key, void *dst, const void *src) { unsigned long flags; @@ -25,27 +25,27 @@ static void aescfb_encrypt_block(const struct crypto_aes_ctx *ctx, void *dst, * interrupts disabled.
*/ local_irq_save(flags); - aes_encrypt(ctx, dst, src); + aes_encrypt(key, dst, src); local_irq_restore(flags); } /** * aescfb_encrypt - Perform AES-CFB encryption on a block of data * - * @ctx: The AES-CFB key schedule + * @key: The AES-CFB key schedule * @dst: Pointer to the ciphertext output buffer * @src: Pointer the plaintext (may equal @dst for encryption in place) * @len: The size in bytes of the plaintext and ciphertext. * @iv: The initialization vector (IV) to use for this block of data */ -void aescfb_encrypt(const struct crypto_aes_ctx *ctx, u8 *dst, const u8 *src, +void aescfb_encrypt(const struct aes_enckey *key, u8 *dst, const u8 *src, int len, const u8 iv[AES_BLOCK_SIZE]) { u8 ks[AES_BLOCK_SIZE]; const u8 *v = iv; while (len > 0) { - aescfb_encrypt_block(ctx, ks, v); + aescfb_encrypt_block(key, ks, v); crypto_xor_cpy(dst, src, ks, min(len, AES_BLOCK_SIZE)); v = dst; @@ -61,18 +61,18 @@ EXPORT_SYMBOL(aescfb_encrypt); /** * aescfb_decrypt - Perform AES-CFB decryption on a block of data * - * @ctx: The AES-CFB key schedule + * @key: The AES-CFB key schedule * @dst: Pointer to the plaintext output buffer * @src: Pointer the ciphertext (may equal @dst for decryption in place) * @len: The size in bytes of the plaintext and ciphertext. * @iv: The initialization vector (IV) to use for this block of data */ -void aescfb_decrypt(const struct crypto_aes_ctx *ctx, u8 *dst, const u8 *src, +void aescfb_decrypt(const struct aes_enckey *key, u8 *dst, const u8 *src, int len, const u8 iv[AES_BLOCK_SIZE]) { u8 ks[2][AES_BLOCK_SIZE]; - aescfb_encrypt_block(ctx, ks[0], iv); + aescfb_encrypt_block(key, ks[0], iv); for (int i = 0; len > 0; i ^= 1) { if (len > AES_BLOCK_SIZE) @@ -81,7 +81,7 @@ void aescfb_decrypt(const struct crypto_aes_ctx *ctx, u8 *dst, const u8 *src, * performing the XOR, as that may update in place and * overwrite the ciphertext. 
*/ - aescfb_encrypt_block(ctx, ks[!i], src); + aescfb_encrypt_block(key, ks[!i], src); crypto_xor_cpy(dst, src, ks[i], min(len, AES_BLOCK_SIZE)); @@ -214,15 +214,15 @@ static struct { static int __init libaescfb_init(void) { for (int i = 0; i < ARRAY_SIZE(aescfb_tv); i++) { - struct crypto_aes_ctx ctx; + struct aes_enckey key; u8 buf[64]; - if (aes_expandkey(&ctx, aescfb_tv[i].key, aescfb_tv[i].klen)) { - pr_err("aes_expandkey() failed on vector %d\n", i); + if (aes_prepareenckey(&key, aescfb_tv[i].key, aescfb_tv[i].klen)) { + pr_err("aes_prepareenckey() failed on vector %d\n", i); return -ENODEV; } - aescfb_encrypt(&ctx, buf, aescfb_tv[i].ptext, aescfb_tv[i].len, + aescfb_encrypt(&key, buf, aescfb_tv[i].ptext, aescfb_tv[i].len, aescfb_tv[i].iv); if (memcmp(buf, aescfb_tv[i].ctext, aescfb_tv[i].len)) { pr_err("aescfb_encrypt() #1 failed on vector %d\n", i); @@ -230,14 +230,14 @@ static int __init libaescfb_init(void) } /* decrypt in place */ - aescfb_decrypt(&ctx, buf, buf, aescfb_tv[i].len, aescfb_tv[i].iv); + aescfb_decrypt(&key, buf, buf, aescfb_tv[i].len, aescfb_tv[i].iv); if (memcmp(buf, aescfb_tv[i].ptext, aescfb_tv[i].len)) { pr_err("aescfb_decrypt() failed on vector %d\n", i); return -ENODEV; } /* encrypt in place */ - aescfb_encrypt(&ctx, buf, buf, aescfb_tv[i].len, aescfb_tv[i].iv); + aescfb_encrypt(&key, buf, buf, aescfb_tv[i].len, aescfb_tv[i].iv); if (memcmp(buf, aescfb_tv[i].ctext, aescfb_tv[i].len)) { pr_err("aescfb_encrypt() #2 failed on vector %d\n", i); -- cgit v1.2.3 From bc79efa08c038846b962edf04dc00922b48efdc7 Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Mon, 12 Jan 2026 11:20:31 -0800 Subject: lib/crypto: aesgcm: Use new AES library API Switch from the old AES library functions (which use struct crypto_aes_ctx) to the new ones (which use struct aes_enckey). This eliminates the unnecessary computation and caching of the decryption round keys. The new AES en/decryption functions are also much faster and use AES instructions when supported by the CPU. Note that besides the switch to the new key preparation function and key struct type, changing the key struct's type also causes aes_encrypt() (which is temporarily a type-generic macro) to dispatch to the new encryption function rather than the old one. Acked-by: Ard Biesheuvel Link: https://lore.kernel.org/r/20260112192035.10427-34-ebiggers@kernel.org Signed-off-by: Eric Biggers --- include/crypto/gcm.h | 2 +- lib/crypto/aesgcm.c | 12 ++++++------ 2 files changed, 7 insertions(+), 7 deletions(-) (limited to 'include') diff --git a/include/crypto/gcm.h b/include/crypto/gcm.h index fd9df607a836..b524e47bd4d0 100644 --- a/include/crypto/gcm.h +++ b/include/crypto/gcm.h @@ -66,7 +66,7 @@ static inline int crypto_ipsec_check_assoclen(unsigned int assoclen) struct aesgcm_ctx { be128 ghash_key; - struct crypto_aes_ctx aes_ctx; + struct aes_enckey aes_key; unsigned int authsize; }; diff --git a/lib/crypto/aesgcm.c b/lib/crypto/aesgcm.c index ac0b2fcfd606..02f5b5f32c76 100644 --- a/lib/crypto/aesgcm.c +++ b/lib/crypto/aesgcm.c @@ -12,7 +12,7 @@ #include #include -static void aesgcm_encrypt_block(const struct crypto_aes_ctx *ctx, void *dst, +static void aesgcm_encrypt_block(const struct aes_enckey *key, void *dst, const void *src) { unsigned long flags; @@ -26,7 +26,7 @@ static void aesgcm_encrypt_block(const struct crypto_aes_ctx *ctx, void *dst, * effective when running with interrupts disabled.
*/ local_irq_save(flags); - aes_encrypt(ctx, dst, src); + aes_encrypt(key, dst, src); local_irq_restore(flags); } @@ -49,12 +49,12 @@ int aesgcm_expandkey(struct aesgcm_ctx *ctx, const u8 *key, int ret; ret = crypto_gcm_check_authsize(authsize) ?: - aes_expandkey(&ctx->aes_ctx, key, keysize); + aes_prepareenckey(&ctx->aes_key, key, keysize); if (ret) return ret; ctx->authsize = authsize; - aesgcm_encrypt_block(&ctx->aes_ctx, &ctx->ghash_key, kin); + aesgcm_encrypt_block(&ctx->aes_key, &ctx->ghash_key, kin); return 0; } @@ -97,7 +97,7 @@ static void aesgcm_mac(const struct aesgcm_ctx *ctx, const u8 *src, int src_len, aesgcm_ghash(&ghash, &ctx->ghash_key, &tail, sizeof(tail)); ctr[3] = cpu_to_be32(1); - aesgcm_encrypt_block(&ctx->aes_ctx, buf, ctr); + aesgcm_encrypt_block(&ctx->aes_key, buf, ctr); crypto_xor_cpy(authtag, buf, (u8 *)&ghash, ctx->authsize); memzero_explicit(&ghash, sizeof(ghash)); @@ -119,7 +119,7 @@ static void aesgcm_crypt(const struct aesgcm_ctx *ctx, u8 *dst, const u8 *src, * len', this cannot happen, so no explicit test is necessary. */ ctr[3] = cpu_to_be32(n++); - aesgcm_encrypt_block(&ctx->aes_ctx, buf, ctr); + aesgcm_encrypt_block(&ctx->aes_key, buf, ctr); crypto_xor_cpy(dst, src, buf, min(len, AES_BLOCK_SIZE)); dst += AES_BLOCK_SIZE; -- cgit v1.2.3 From 953f2db0bfc2ab68e69f385441d0f4b54aee0cd1 Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Mon, 12 Jan 2026 11:20:32 -0800 Subject: lib/crypto: aes: Remove old AES en/decryption functions Now that all callers of the aes_encrypt() and aes_decrypt() type-generic macros are using the new types, remove the old functions. Then, replace the macro with direct calls to the new functions, dropping the "_new" suffix from them. This completes the change in the type of the key struct that is passed to aes_encrypt() and aes_decrypt(). Acked-by: Ard Biesheuvel Link: https://lore.kernel.org/r/20260112192035.10427-35-ebiggers@kernel.org Signed-off-by: Eric Biggers --- include/crypto/aes.h | 24 ++--------- lib/crypto/aes.c | 118 +++------------------------------------------------ 2 files changed, 10 insertions(+), 132 deletions(-) (limited to 'include') diff --git a/include/crypto/aes.h b/include/crypto/aes.h index 4cb3c27d1bf5..cbf1cc96db52 100644 --- a/include/crypto/aes.h +++ b/include/crypto/aes.h @@ -308,17 +308,8 @@ typedef union { * * Context: Any context. */ -#define aes_encrypt(key, out, in) \ - _Generic((key), \ - struct crypto_aes_ctx *: aes_encrypt_old((const struct crypto_aes_ctx *)(key), (out), (in)), \ - const struct crypto_aes_ctx *: aes_encrypt_old((const struct crypto_aes_ctx *)(key), (out), (in)), \ - struct aes_enckey *: aes_encrypt_new((const struct aes_enckey *)(key), (out), (in)), \ - const struct aes_enckey *: aes_encrypt_new((const struct aes_enckey *)(key), (out), (in)), \ - struct aes_key *: aes_encrypt_new((const struct aes_key *)(key), (out), (in)), \ - const struct aes_key *: aes_encrypt_new((const struct aes_key *)(key), (out), (in))) -void aes_encrypt_old(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in); -void aes_encrypt_new(aes_encrypt_arg key, u8 out[at_least AES_BLOCK_SIZE], - const u8 in[at_least AES_BLOCK_SIZE]); +void aes_encrypt(aes_encrypt_arg key, u8 out[at_least AES_BLOCK_SIZE], + const u8 in[at_least AES_BLOCK_SIZE]); /** * aes_decrypt() - Decrypt a single AES block @@ -328,15 +319,8 @@ void aes_encrypt_new(aes_encrypt_arg key, u8 out[at_least AES_BLOCK_SIZE], * * Context: Any context. 
*/ -#define aes_decrypt(key, out, in) \ - _Generic((key), \ - struct crypto_aes_ctx *: aes_decrypt_old((const struct crypto_aes_ctx *)(key), (out), (in)), \ - const struct crypto_aes_ctx *: aes_decrypt_old((const struct crypto_aes_ctx *)(key), (out), (in)), \ - struct aes_key *: aes_decrypt_new((const struct aes_key *)(key), (out), (in)), \ - const struct aes_key *: aes_decrypt_new((const struct aes_key *)(key), (out), (in))) -void aes_decrypt_old(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in); -void aes_decrypt_new(const struct aes_key *key, u8 out[at_least AES_BLOCK_SIZE], - const u8 in[at_least AES_BLOCK_SIZE]); +void aes_decrypt(const struct aes_key *key, u8 out[at_least AES_BLOCK_SIZE], + const u8 in[at_least AES_BLOCK_SIZE]); extern const u8 crypto_aes_sbox[]; extern const u8 crypto_aes_inv_sbox[]; diff --git a/lib/crypto/aes.c b/lib/crypto/aes.c index 88da68dcf5a8..b7ab2d0c4e59 100644 --- a/lib/crypto/aes.c +++ b/lib/crypto/aes.c @@ -251,22 +251,6 @@ static u32 inv_mix_columns(u32 x) return mix_columns(x ^ y ^ ror32(y, 16)); } -static __always_inline u32 subshift(u32 in[], int pos) -{ - return (aes_sbox[in[pos] & 0xff]) ^ - (aes_sbox[(in[(pos + 1) % 4] >> 8) & 0xff] << 8) ^ - (aes_sbox[(in[(pos + 2) % 4] >> 16) & 0xff] << 16) ^ - (aes_sbox[(in[(pos + 3) % 4] >> 24) & 0xff] << 24); -} - -static __always_inline u32 inv_subshift(u32 in[], int pos) -{ - return (aes_inv_sbox[in[pos] & 0xff]) ^ - (aes_inv_sbox[(in[(pos + 3) % 4] >> 8) & 0xff] << 8) ^ - (aes_inv_sbox[(in[(pos + 2) % 4] >> 16) & 0xff] << 16) ^ - (aes_inv_sbox[(in[(pos + 1) % 4] >> 24) & 0xff] << 24); -} - static u32 subw(u32 in) { return (aes_sbox[in & 0xff]) ^ @@ -345,51 +329,6 @@ int aes_expandkey(struct crypto_aes_ctx *ctx, const u8 *in_key, } EXPORT_SYMBOL(aes_expandkey); -void aes_encrypt_old(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in) -{ - const u32 *rkp = ctx->key_enc + 4; - int rounds = 6 + ctx->key_length / 4; - u32 st0[4], st1[4]; - int round; - - st0[0] = ctx->key_enc[0] ^ get_unaligned_le32(in); - st0[1] = ctx->key_enc[1] ^ get_unaligned_le32(in + 4); - st0[2] = ctx->key_enc[2] ^ get_unaligned_le32(in + 8); - st0[3] = ctx->key_enc[3] ^ get_unaligned_le32(in + 12); - - /* - * Force the compiler to emit data independent Sbox references, - * by xoring the input with Sbox values that are known to add up - * to zero. This pulls the entire Sbox into the D-cache before any - * data dependent lookups are done. 
- */ - st0[0] ^= aes_sbox[ 0] ^ aes_sbox[ 64] ^ aes_sbox[134] ^ aes_sbox[195]; - st0[1] ^= aes_sbox[16] ^ aes_sbox[ 82] ^ aes_sbox[158] ^ aes_sbox[221]; - st0[2] ^= aes_sbox[32] ^ aes_sbox[ 96] ^ aes_sbox[160] ^ aes_sbox[234]; - st0[3] ^= aes_sbox[48] ^ aes_sbox[112] ^ aes_sbox[186] ^ aes_sbox[241]; - - for (round = 0;; round += 2, rkp += 8) { - st1[0] = mix_columns(subshift(st0, 0)) ^ rkp[0]; - st1[1] = mix_columns(subshift(st0, 1)) ^ rkp[1]; - st1[2] = mix_columns(subshift(st0, 2)) ^ rkp[2]; - st1[3] = mix_columns(subshift(st0, 3)) ^ rkp[3]; - - if (round == rounds - 2) - break; - - st0[0] = mix_columns(subshift(st1, 0)) ^ rkp[4]; - st0[1] = mix_columns(subshift(st1, 1)) ^ rkp[5]; - st0[2] = mix_columns(subshift(st1, 2)) ^ rkp[6]; - st0[3] = mix_columns(subshift(st1, 3)) ^ rkp[7]; - } - - put_unaligned_le32(subshift(st1, 0) ^ rkp[4], out); - put_unaligned_le32(subshift(st1, 1) ^ rkp[5], out + 4); - put_unaligned_le32(subshift(st1, 2) ^ rkp[6], out + 8); - put_unaligned_le32(subshift(st1, 3) ^ rkp[7], out + 12); -} -EXPORT_SYMBOL(aes_encrypt_old); - static __always_inline u32 enc_quarterround(const u32 w[4], int i, u32 rk) { return rk ^ aes_enc_tab[(u8)w[i]] ^ @@ -498,51 +437,6 @@ static void __maybe_unused aes_decrypt_generic(const u32 inv_rndkeys[], put_unaligned_le32(declast_quarterround(w, 3, *rkp++), &out[12]); } -void aes_decrypt_old(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in) -{ - const u32 *rkp = ctx->key_dec + 4; - int rounds = 6 + ctx->key_length / 4; - u32 st0[4], st1[4]; - int round; - - st0[0] = ctx->key_dec[0] ^ get_unaligned_le32(in); - st0[1] = ctx->key_dec[1] ^ get_unaligned_le32(in + 4); - st0[2] = ctx->key_dec[2] ^ get_unaligned_le32(in + 8); - st0[3] = ctx->key_dec[3] ^ get_unaligned_le32(in + 12); - - /* - * Force the compiler to emit data independent Sbox references, - * by xoring the input with Sbox values that are known to add up - * to zero. This pulls the entire Sbox into the D-cache before any - * data dependent lookups are done. - */ - st0[0] ^= aes_inv_sbox[ 0] ^ aes_inv_sbox[ 64] ^ aes_inv_sbox[129] ^ aes_inv_sbox[200]; - st0[1] ^= aes_inv_sbox[16] ^ aes_inv_sbox[ 83] ^ aes_inv_sbox[150] ^ aes_inv_sbox[212]; - st0[2] ^= aes_inv_sbox[32] ^ aes_inv_sbox[ 96] ^ aes_inv_sbox[160] ^ aes_inv_sbox[236]; - st0[3] ^= aes_inv_sbox[48] ^ aes_inv_sbox[112] ^ aes_inv_sbox[187] ^ aes_inv_sbox[247]; - - for (round = 0;; round += 2, rkp += 8) { - st1[0] = inv_mix_columns(inv_subshift(st0, 0)) ^ rkp[0]; - st1[1] = inv_mix_columns(inv_subshift(st0, 1)) ^ rkp[1]; - st1[2] = inv_mix_columns(inv_subshift(st0, 2)) ^ rkp[2]; - st1[3] = inv_mix_columns(inv_subshift(st0, 3)) ^ rkp[3]; - - if (round == rounds - 2) - break; - - st0[0] = inv_mix_columns(inv_subshift(st1, 0)) ^ rkp[4]; - st0[1] = inv_mix_columns(inv_subshift(st1, 1)) ^ rkp[5]; - st0[2] = inv_mix_columns(inv_subshift(st1, 2)) ^ rkp[6]; - st0[3] = inv_mix_columns(inv_subshift(st1, 3)) ^ rkp[7]; - } - - put_unaligned_le32(inv_subshift(st1, 0) ^ rkp[4], out); - put_unaligned_le32(inv_subshift(st1, 1) ^ rkp[5], out + 4); - put_unaligned_le32(inv_subshift(st1, 2) ^ rkp[6], out + 8); - put_unaligned_le32(inv_subshift(st1, 3) ^ rkp[7], out + 12); -} -EXPORT_SYMBOL(aes_decrypt_old); - /* * Note: the aes_prepare*key_* names reflect the fact that the implementation * might not actually expand the key. (The s390 code for example doesn't.) 
@@ -608,19 +502,19 @@ int aes_prepareenckey(struct aes_enckey *key, const u8 *in_key, size_t key_len) } EXPORT_SYMBOL(aes_prepareenckey); -void aes_encrypt_new(aes_encrypt_arg key, u8 out[AES_BLOCK_SIZE], - const u8 in[AES_BLOCK_SIZE]) +void aes_encrypt(aes_encrypt_arg key, u8 out[AES_BLOCK_SIZE], + const u8 in[AES_BLOCK_SIZE]) { aes_encrypt_arch(key.enc_key, out, in); } -EXPORT_SYMBOL(aes_encrypt_new); +EXPORT_SYMBOL(aes_encrypt); -void aes_decrypt_new(const struct aes_key *key, u8 out[AES_BLOCK_SIZE], - const u8 in[AES_BLOCK_SIZE]) +void aes_decrypt(const struct aes_key *key, u8 out[AES_BLOCK_SIZE], + const u8 in[AES_BLOCK_SIZE]) { aes_decrypt_arch(key, out, in); } -EXPORT_SYMBOL(aes_decrypt_new); +EXPORT_SYMBOL(aes_decrypt); #ifdef aes_mod_init_arch static int __init aes_mod_init(void) -- cgit v1.2.3 From ffd42b6d0420c4be97cc28fd1bb5f4c29e286e98 Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Mon, 2 Feb 2026 14:15:52 -0800 Subject: lib/crypto: mldsa: Clarify the documentation for mldsa_verify() slightly mldsa_verify() implements ML-DSA.Verify with ctx='', so document this more explicitly. Remove the one-liner comment above mldsa_verify() which was somewhat misleading. Reviewed-by: David Howells Link: https://lore.kernel.org/r/20260202221552.174341-1-ebiggers@kernel.org Signed-off-by: Eric Biggers --- include/crypto/mldsa.h | 4 +++- lib/crypto/mldsa.c | 1 - 2 files changed, 3 insertions(+), 2 deletions(-) (limited to 'include') diff --git a/include/crypto/mldsa.h b/include/crypto/mldsa.h index cf30aef29970..3ef2676787c9 100644 --- a/include/crypto/mldsa.h +++ b/include/crypto/mldsa.h @@ -39,7 +39,9 @@ enum mldsa_alg { * otherwise -EBADMSG will be returned. * * This verifies a signature using pure ML-DSA with the specified parameter set. - * The context string is assumed to be empty. + * The context string is assumed to be empty. This corresponds to FIPS 204 + * Algorithm 3 "ML-DSA.Verify" with the ctx parameter set to the empty string + * and the lengths of the signature and key given explicitly by the caller. * * Context: Might sleep * diff --git a/lib/crypto/mldsa.c b/lib/crypto/mldsa.c index ba0c0468956e..c96fddc4e7dc 100644 --- a/lib/crypto/mldsa.c +++ b/lib/crypto/mldsa.c @@ -525,7 +525,6 @@ static size_t encode_w1(u8 out[MAX_W1_ENCODED_LEN], return pos; } -/* Reference: FIPS 204 Section 6.3 "ML-DSA Verifying (Internal)" */ int mldsa_verify(enum mldsa_alg alg, const u8 *sig, size_t sig_len, const u8 *msg, size_t msg_len, const u8 *pk, size_t pk_len) { -- cgit v1.2.3
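As a final illustration, here is a minimal hypothetical caller of mldsa_verify() as documented above (the wrapper function and the MLDSA65 enumerator name for the ML-DSA-65 parameter set are assumptions for illustration; the prototype is the one visible in the lib/crypto/mldsa.c hunk):

#include <crypto/mldsa.h>

/*
 * Hypothetical wrapper: verify a pure ML-DSA-65 signature with an empty
 * context string, per FIPS 204 Algorithm 3 "ML-DSA.Verify".  Returns 0
 * only if @sig is a valid signature on @msg under the public key @pk;
 * for example, -EBADMSG is returned if the signature or key is
 * malformed.
 */
static int example_verify(const u8 *sig, size_t sig_len,
			  const u8 *msg, size_t msg_len,
			  const u8 *pk, size_t pk_len)
{
	/* MLDSA65 is the assumed enumerator for the ML-DSA-65 parameter set. */
	return mldsa_verify(MLDSA65, sig, sig_len, msg, msg_len, pk, pk_len);
}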