path: root/arch/arm/crypto
2022-03-31  Merge tag 'v5.18-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6  (Linus Torvalds; 1 file, +2/-0)

Pull crypto fixes from Herbert Xu:

 - Missing Kconfig dependency on arm that leads to boot failure
 - x86 SLS fixes
 - Reference leak in the stm32 driver

* tag 'v5.18-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: x86/sm3 - Fixup SLS
  crypto: x86/poly1305 - Fixup SLS
  crypto: x86/chacha20 - Avoid spurious jumps to other functions
  crypto: stm32 - fix reference leak in stm32_crc_remove
  crypto: arm/aes-neonbs-cbc - Select generic cbc and aes

2022-03-25  crypto: arm/aes-neonbs-cbc - Select generic cbc and aes  (Herbert Xu; 1 file, +2/-0)

The algorithm __cbc-aes-neonbs requires a fallback, so we need to select the config options for them; otherwise it will fail to register on boot-up.

Fixes: 00b99ad2bac2 ("crypto: arm/aes-neonbs - Use generic cbc...")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2022-03-22  Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6  (Linus Torvalds; 2 files, +77/-63)

Pull crypto updates from Herbert Xu:

 "API:
   - hwrng core now credits for low-quality RNG devices.

  Algorithms:
   - Optimisations for neon aes on arm/arm64.
   - Add accelerated crc32_be on arm64.
   - Add ffdheXYZ(dh) templates.
   - Disallow hmac keys < 112 bits in FIPS mode.
   - Add AVX assembly implementation for sm3 on x86.

  Drivers:
   - Add missing local_bh_disable calls for crypto_engine callback.
   - Ensure BH is disabled in crypto_engine callback path.
   - Fix zero length DMA mappings in ccree.
   - Add synchronization between mailbox accesses in octeontx2.
   - Add Xilinx SHA3 driver.
   - Add support for the TDES IP available on sama7g5 SoC in atmel"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (137 commits)
  crypto: xilinx - Turn SHA into a tristate and allow COMPILE_TEST
  MAINTAINERS: update HPRE/SEC2/TRNG driver maintainers list
  crypto: dh - Remove the unused function dh_safe_prime_dh_alg()
  hwrng: nomadik - Change clk_disable to clk_disable_unprepare
  crypto: arm64 - cleanup comments
  crypto: qat - fix initialization of pfvf rts_map_msg structures
  crypto: qat - fix initialization of pfvf cap_msg structures
  crypto: qat - remove unneeded assignment
  crypto: qat - disable registration of algorithms
  crypto: hisilicon/qm - fix memset during queues clearing
  crypto: xilinx: prevent probing on non-xilinx hardware
  crypto: marvell/octeontx - Use swap() instead of open coding it
  crypto: ccree - Fix use after free in cc_cipher_exit()
  crypto: ccp - ccp_dmaengine_unregister release dma channels
  crypto: octeontx2 - fix missing unlock
  hwrng: cavium - fix NULL but dereferenced coccicheck error
  crypto: cavium/nitrox - don't cast parameter in bit operations
  crypto: vmx - add missing dependencies
  MAINTAINERS: Add maintainer for Xilinx ZynqMP SHA3 driver
  crypto: xilinx - Add Xilinx SHA3 driver
  ...

2022-02-05  crypto: arm/aes-neonbs-ctr - deal with non-multiples of AES block size  (Ard Biesheuvel; 2 files, +77/-63)

Instead of falling back to C code to deal with the final bit of input that is not a round multiple of the block size, handle this in the asm code, permitting us to use overlapping loads and stores for performance, and implement the 16-byte wide XOR using a single NEON instruction.

Since NEON loads and stores have a natural width of 16 bytes, we need to handle inputs of less than 16 bytes in a special way, but this rarely occurs in practice so it does not impact performance. All other input sizes can be consumed directly by the NEON asm code, although it should be noted that the core AES transform can still only process 128 bytes (8 AES blocks) at a time.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2022-02-04  lib/crypto: blake2s: avoid indirect calls to compression function for Clang CFI  (Jason A. Donenfeld; 1 file, +2/-2)

blake2s_compress_generic is weakly aliased by blake2s_compress. The current harness for function selection uses a function pointer, which is ordinarily inlined and resolved at compile time. But when Clang's CFI is enabled, CFI still triggers when making an indirect call via a weak symbol. This seems like a bug in Clang's CFI, as though it's bucketing weak symbols and strong symbols differently. It also only seems to trigger when "full LTO" mode is used, rather than "thin LTO".

  [ 0.000000][ T0] Kernel panic - not syncing: CFI failure (target: blake2s_compress_generic+0x0/0x1444)
  [ 0.000000][ T0] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.16.0-mainline-06981-g076c855b846e #1
  [ 0.000000][ T0] Hardware name: MT6873 (DT)
  [ 0.000000][ T0] Call trace:
  [ 0.000000][ T0]  dump_backtrace+0xfc/0x1dc
  [ 0.000000][ T0]  dump_stack_lvl+0xa8/0x11c
  [ 0.000000][ T0]  panic+0x194/0x464
  [ 0.000000][ T0]  __cfi_check_fail+0x54/0x58
  [ 0.000000][ T0]  __cfi_slowpath_diag+0x354/0x4b0
  [ 0.000000][ T0]  blake2s_update+0x14c/0x178
  [ 0.000000][ T0]  _extract_entropy+0xf4/0x29c
  [ 0.000000][ T0]  crng_initialize_primary+0x24/0x94
  [ 0.000000][ T0]  rand_initialize+0x2c/0x6c
  [ 0.000000][ T0]  start_kernel+0x2f8/0x65c
  [ 0.000000][ T0]  __primary_switched+0xc4/0x7be4
  [ 0.000000][ T0] Rebooting in 5 seconds..

Nonetheless, the function pointer method isn't so terrific anyway, so this patch replaces it with a simple boolean, which also gets inlined away. This successfully works around the Clang bug.

In general, I'm not too keen on all of the indirection involved here; it clearly does more harm than good. Hopefully the whole thing can get cleaned up down the road when lib/crypto is overhauled more comprehensively. But for now, we go with a simple bandaid.

Fixes: 6048fdcc5f26 ("lib/crypto: blake2s: include as built-in")
Link: https://github.com/ClangBuiltLinux/linux/issues/1567
Reported-by: Miles Chen <miles.chen@mediatek.com>
Tested-by: Miles Chen <miles.chen@mediatek.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: John Stultz <john.stultz@linaro.org>
Acked-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

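A sketch of the boolean dispatch described above, simplified from lib/crypto/blake2s.c (the helper name here is illustrative, not the exact kernel diff):

    #include <crypto/internal/blake2s.h>

    /* blake2s_compress() is a weak alias of blake2s_compress_generic()
     * that arch code can override with a strong symbol. */
    static inline void blake2s_dispatch(struct blake2s_state *state,
    				    const u8 *block, size_t nblocks,
    				    u32 inc, bool force_generic)
    {
    	/* A plain boolean branch, inlined away at the call sites, instead
    	 * of an indirect call through a function pointer: nothing is left
    	 * for Clang's CFI to instrument. */
    	if (force_generic)
    		blake2s_compress_generic(state, block, nblocks, inc);
    	else
    		blake2s_compress(state, block, nblocks, inc);
    }
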
2022-01-07  lib/crypto: blake2s: include as built-in  (Jason A. Donenfeld; 4 files, +83/-77)

In preparation for using blake2s in the RNG, we change the way that it is wired-in to the build system. Instead of using ifdefs to select the right symbol, we use weak symbols. And because ARM doesn't need the generic implementation, we make the generic one default only if an arch library doesn't need it already, and then have arch libraries that do need it opt-in. So that the arch libraries can remain tristate rather than bool, we then split the shash part from the glue code.

Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: linux-kbuild@vger.kernel.org
Cc: linux-crypto@vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

2021-07-16  crypto: arm/curve25519 - rename 'mod_init' & 'mod_exit' functions to be module-specific  (Randy Dunlap; 1 file, +4/-4)

Rename module_init & module_exit functions that are named "mod_init" and "mod_exit" so that they are unique in both the System.map file and in initcall_debug output, instead of showing up as almost anonymous "mod_init". This is helpful for debugging and in determining how long certain module_init calls take to execute.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: patches@armlinux.org.uk
Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

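The pattern, as a sketch with hypothetical names (the actual functions renamed here live in curve25519-glue.c):

    #include <linux/module.h>

    /* Instead of the near-anonymous mod_init()/mod_exit(), give the
     * initcalls module-specific names so System.map and initcall_debug
     * output stay attributable. Names below are illustrative. */
    static int __init arm_curve25519_init(void)
    {
    	return 0;	/* algorithm registration would go here */
    }

    static void __exit arm_curve25519_exit(void)
    {
    }

    module_init(arm_curve25519_init);
    module_exit(arm_curve25519_exit);
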
2021-05-14  crypto: arm - use a pattern rule for generating *.S files  (Masahiro Yamada; 1 file, +1/-7)

Unify similar build rules.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2021-05-14  crypto: arm - generate *.S by Perl at build time instead of shipping them  (Masahiro Yamada; 4 files, +3/-5848)

Generate *.S by Perl like arch/{mips,x86}/crypto/Makefile.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2021-04-16  crypto: arm/curve25519 - Move '.fpu' after '.arch'  (Nathan Chancellor; 1 file, +1/-1)

Debian's clang carries a patch that makes the default FPU mode 'vfp3-d16' instead of 'neon' for 'armv7-a' to avoid generating NEON instructions on hardware that does not support them:

https://salsa.debian.org/pkg-llvm-team/llvm-toolchain/-/raw/5a61ca6f21b4ad8c6ac4970e5ea5a7b5b4486d22/debian/patches/clang-arm-default-vfp3-on-armv7a.patch
https://bugs.debian.org/841474
https://bugs.debian.org/842142
https://bugs.debian.org/914268

This results in the following build error when clang's integrated assembler is used because the '.arch' directive overrides the '.fpu' directive:

arch/arm/crypto/curve25519-core.S:25:2: error: instruction requires: NEON
 vmov.i32 q0, #1
 ^
arch/arm/crypto/curve25519-core.S:26:2: error: instruction requires: NEON
 vshr.u64 q1, q0, #7
 ^
arch/arm/crypto/curve25519-core.S:27:2: error: instruction requires: NEON
 vshr.u64 q0, q0, #8
 ^
arch/arm/crypto/curve25519-core.S:28:2: error: instruction requires: NEON
 vmov.i32 d4, #19
 ^

Shuffle the order of the '.arch' and '.fpu' directives so that the code builds regardless of the default FPU mode. This has been tested against both clang with and without Debian's patch and GCC.

Cc: stable@vger.kernel.org
Fixes: d8f1308a025f ("crypto: arm/curve25519 - wire up NEON implementation")
Link: https://github.com/ClangBuiltLinux/continuous-integration2/issues/118
Reported-by: Arnd Bergmann <arnd@arndb.de>
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Suggested-by: Jessica Clarke <jrtc27@jrtc27.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2021-04-02  crypto: poly1305 - fix poly1305_core_setkey() declaration  (Arnd Bergmann; 1 file, +1/-1)

gcc-11 points out a mismatch between the declaration and the definition of poly1305_core_setkey():

lib/crypto/poly1305-donna32.c:13:67: error: argument 2 of type ‘const u8[16]’ {aka ‘const unsigned char[16]’} with mismatched bound [-Werror=array-parameter=]
   13 | void poly1305_core_setkey(struct poly1305_core_key *key, const u8 raw_key[16])
      |                                                          ~~~~~~~~~^~~~~~~~~~~
In file included from lib/crypto/poly1305-donna32.c:11:
include/crypto/internal/poly1305.h:21:68: note: previously declared as ‘const u8 *’ {aka ‘const unsigned char *’}
   21 | void poly1305_core_setkey(struct poly1305_core_key *key, const u8 *raw_key);

This is harmless in principle, as the calling conventions are the same, but the more specific prototype allows better type checking in the caller. Change the declaration to match the actual function definition.

The poly1305_simd_init() is a bit suspicious here, as it previously had a 32-byte argument type, but looks like it needs to take the 16-byte POLY1305_BLOCK_SIZE array instead.

Fixes: 1c08a104360f ("crypto: poly1305 - add new 32 and 64-bit generic versions")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

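The shape of the fix, sketched (the real change is the one-line declaration in include/crypto/internal/poly1305.h):

    #include <crypto/internal/poly1305.h>

    /* Before (sketch): an unbounded pointer, so gcc-11's -Warray-parameter
     * flags the mismatch against the definition's const u8 raw_key[16]:
     *
     *   void poly1305_core_setkey(struct poly1305_core_key *key,
     *                             const u8 *raw_key);
     *
     * After (sketch): the declared bound matches the definition, and a
     * caller passing a too-small array can now be diagnosed. */
    void poly1305_core_setkey(struct poly1305_core_key *key,
    			  const u8 raw_key[POLY1305_BLOCK_SIZE]);
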
2021-03-19  crypto: arm/chacha-scalar - switch to common rev_l macro  (Ard Biesheuvel; 1 file, +13/-30)

Drop the local definition of a byte swapping macro and use the common one instead.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nicolas Pitre <nico@fluxnic.net>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2021-03-19  crypto: arm/aes-scalar - switch to common rev_l/mov_l macros  (Ard Biesheuvel; 1 file, +10/-32)

The scalar AES implementation has some locally defined macros which reimplement things that are now available in macros defined in assembler.h. So let's switch to those.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nicolas Pitre <nico@fluxnic.net>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2021-03-19  crypto: arm/blake2s - fix for big endian  (Eric Biggers; 1 file, +21/-0)

The new ARM BLAKE2s code doesn't work correctly (fails the self-tests) in big endian kernel builds because it doesn't swap the endianness of the message words when loading them. Fix this.

Fixes: 5172d322d34c ("crypto: arm/blake2s - add ARM scalar optimized BLAKE2s")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2021-03-07  crypto: arm/blake2b - drop unnecessary return statement  (Eric Biggers; 1 file, +2/-2)

Neither crypto_unregister_shashes() nor the module_exit function return a value, so the explicit 'return' is unnecessary.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2021-01-03  crypto: arm/blake2b - add NEON-accelerated BLAKE2b  (Eric Biggers; 4 files, +464/-0)

Add a NEON-accelerated implementation of BLAKE2b. On Cortex-A7 (which these days is the most common ARM processor that doesn't have the ARMv8 Crypto Extensions), this is over twice as fast as SHA-256, and slightly faster than SHA-1. It is also almost three times as fast as the generic implementation of BLAKE2b:

	Algorithm            Cycles per byte (on 4096-byte messages)
	===================  =======================================
	blake2b-256-neon     14.0
	sha1-neon            16.3
	blake2s-256-arm      18.8
	sha1-asm             20.8
	blake2s-256-generic  26.0
	sha256-neon          28.9
	sha256-asm           32.0
	blake2b-256-generic  38.9

This implementation isn't directly based on any other implementation, but it borrows some ideas from previous NEON code I've written as well as from chacha-neon-core.S. At least on Cortex-A7, it is faster than the other NEON implementations of BLAKE2b I'm aware of (the implementation in the BLAKE2 official repository using intrinsics, and Andrew Moon's implementation which can be found in SUPERCOP). It does only one block at a time, so it performs well on short messages too.

NEON-accelerated BLAKE2b is useful because there is interest in using BLAKE2b-256 for dm-verity on low-end Android devices (specifically, devices that lack the ARMv8 Crypto Extensions) to replace SHA-1. On these devices, the performance cost of upgrading to SHA-256 may be unacceptable, whereas BLAKE2b-256 would actually improve performance.

Although BLAKE2b is intended for 64-bit platforms (unlike BLAKE2s which is intended for 32-bit platforms), on 32-bit ARM processors with NEON, BLAKE2b is actually faster than BLAKE2s. This is because NEON supports 64-bit operations, and because BLAKE2s's block size is too small for NEON to be helpful for it. The best I've been able to do with BLAKE2s on Cortex-A7 is 18.8 cpb with an optimized scalar implementation. (I didn't try BLAKE2sp and BLAKE3, which in theory would be faster, but they're more complex as they require running multiple hashes at once. Note that BLAKE2b already uses all the NEON bandwidth on the Cortex-A7, so I expect that any speedup from BLAKE2sp or BLAKE3 would come only from the smaller number of rounds, not from the extra parallelism.)

For now this BLAKE2b implementation is only wired up to the shash API, since there is no library API for BLAKE2b yet. However, I've tried to keep things consistent with BLAKE2s, e.g. by defining blake2b_compress_arch() which is analogous to blake2s_compress_arch() and could be exported for use by the library API later if needed.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2021-01-03  crypto: arm/blake2s - add ARM scalar optimized BLAKE2s  (Eric Biggers; 4 files, +374/-0)

Add an ARM scalar optimized implementation of BLAKE2s. NEON isn't very useful for BLAKE2s because the BLAKE2s block size is too small for NEON to help. Each NEON instruction would depend on the previous one, resulting in poor performance. With scalar instructions, on the other hand, we can take advantage of ARM's "free" rotations (like I did in chacha-scalar-core.S) to get an implementation that runs much faster than the C implementation.

Performance results on Cortex-A7 in cycles per byte using the shash API:

	4096-byte messages:
		blake2s-256-arm:     18.8
		blake2s-256-generic: 26.0

	500-byte messages:
		blake2s-256-arm:     20.3
		blake2s-256-generic: 27.9

	100-byte messages:
		blake2s-256-arm:     29.7
		blake2s-256-generic: 39.2

	32-byte messages:
		blake2s-256-arm:     50.6
		blake2s-256-generic: 66.2

Except on very short messages, this is still slower than the NEON implementation of BLAKE2b which I've written; that is 14.0, 16.4, 25.8, and 76.1 cpb on 4096, 500, 100, and 32-byte messages, respectively. However, optimized BLAKE2s is useful for cases where BLAKE2s is used instead of BLAKE2b, such as WireGuard.

This new implementation is added in the form of a new module blake2s-arm.ko, which is analogous to blake2s-x86_64.ko in that it provides blake2s_compress_arch() for use by the library API, as well as optionally registering the algorithms with the shash API.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2021-01-03  crypto: remove cipher routines from public crypto API  (Ard Biesheuvel; 1 file, +3/-0)

The cipher routines in the crypto API are mostly intended for templates implementing skcipher modes generically in software, and shouldn't be used outside of the crypto subsystem. So move the prototypes and all related definitions to a new header file under include/crypto/internal. Also, let's use the new module namespace feature to move the symbol exports into a new namespace CRYPTO_INTERNAL.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

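For a module that legitimately needs the low-level cipher API, the consumer side looks like this (a sketch of the standard module-namespace mechanism rather than anything specific to this commit):

    #include <crypto/internal/cipher.h>
    #include <linux/module.h>

    /* Without this, loading a module that references the moved exports
     * fails with a "module uses symbol from namespace CRYPTO_INTERNAL,
     * but does not import it" error. */
    MODULE_IMPORT_NS(CRYPTO_INTERNAL);
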
2021-01-03  crypto: arm/chacha-neon - add missing counter increment  (Ard Biesheuvel; 1 file, +1/-0)

Commit 86cd97ec4b943af3 ("crypto: arm/chacha-neon - optimize for non-block size multiples") refactored the chacha block handling in the glue code in a way that may result in the counter increment to be omitted when calling chacha_block_xor_neon() to process a full block. This violates the skcipher API, which requires that the output IV is suitable for handling more input as long as the preceding input has been presented in round multiples of the block size.

Also, the same code is exposed via the chacha library interface whose callers may actually rely on this increment to occur even for final blocks that are smaller than the chacha block size.

So increment the counter after calling chacha_block_xor_neon().

Fixes: 86cd97ec4b943af3 ("crypto: arm/chacha-neon - optimize for non-block size multiples")
Reported-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

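Roughly, the fixed final-block path in the glue code looks like this (a simplified sketch reconstructed from the description above, not the literal diff; the helper name is hypothetical):

    #include <crypto/chacha.h>
    #include <linux/string.h>

    /* chacha_block_xor_neon() is the single-block NEON primitive from
     * chacha-neon-core.S; state[12] is the ChaCha block counter word. */
    void chacha_block_xor_neon(const u32 *state, u8 *dst, const u8 *src,
    			   int nrounds);

    static void chacha_final_block(u32 *state, u8 *dst, const u8 *src,
    			       unsigned int len, int nrounds)
    {
    	u8 buf[CHACHA_BLOCK_SIZE];

    	memcpy(buf, src, len);
    	chacha_block_xor_neon(state, buf, buf, nrounds);
    	memcpy(dst, buf, len);
    	state[12]++;	/* the missing increment: keeps the output IV valid */
    }
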
2020-12-04  crypto: arm/aes-ce - work around Cortex-A57/A72 silicon errata  (Ard Biesheuvel; 1 file, +22/-10)

ARM Cortex-A57 and Cortex-A72 cores running in 32-bit mode are affected by silicon errata #1742098 and #1655431, respectively, where the second instruction of an AES instruction pair may execute twice if an interrupt is taken right after the first instruction consumes an input register of which a single 32-bit lane has been updated the last time it was modified.

This is not such a rare occurrence as it may seem: in counter mode, only the least significant 32-bit word is incremented in the absence of a carry, which makes our counter mode implementation susceptible to these errata.

So let's shuffle the counter assignments around a bit so that the most recent updates when the AES instruction pair executes are 128-bit wide.

[0] ARM-EPM-049219 v23 Cortex-A57 MPCore Software Developers Errata Notice
[1] ARM-EPM-012079 v11.0 Cortex-A72 MPCore Software Developers Errata Notice

Cc: <stable@vger.kernel.org> # v5.4+
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2020-11-20  crypto: sha - split sha.h into sha1.h and sha2.h  (Eric Biggers; 9 files, +9/-9)

Currently <crypto/sha.h> contains declarations for both SHA-1 and SHA-2, and <crypto/sha3.h> contains declarations for SHA-3. This organization is inconsistent, but more importantly SHA-1 is no longer considered to be cryptographically secure. So to the extent possible, SHA-1 shouldn't be grouped together with any of the other SHA versions, and usage of it should be phased out.

Therefore, split <crypto/sha.h> into two headers <crypto/sha1.h> and <crypto/sha2.h>, and make everyone explicitly specify whether they want the declarations for SHA-1, SHA-2, or both.

This avoids making the SHA-1 declarations visible to files that don't want anything to do with SHA-1. It also prepares for potentially moving sha1.h into a new insecure/ or dangerous/ directory.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2020-11-13  crypto: arm/chacha-neon - optimize for non-block size multiples  (Ard Biesheuvel; 2 files, +107/-24)

The current NEON based ChaCha implementation for ARM is optimized for multiples of 4x the ChaCha block size (64 bytes). This makes sense for block encryption, but given that ChaCha is also often used in the context of networking, it makes sense to consider arbitrary length inputs as well.

For example, WireGuard typically uses 1420 byte packets, and performing ChaCha encryption involves 5 invocations of chacha_4block_xor_neon() and 3 invocations of chacha_block_xor_neon(), where the last one also involves a memcpy() using a buffer on the stack to process the final chunk of 1420 % 64 == 12 bytes.

Let's optimize for this case as well, by letting chacha_4block_xor_neon() deal with any input size between 64 and 256 bytes, using NEON permutation instructions and overlapping loads and stores. This way, the 140 byte tail of a 1420 byte input buffer can simply be processed in one go.

This results in the following performance improvements for 1420 byte blocks, without significant impact on power-of-2 input sizes. (Note that Raspberry Pi is widely used in combination with a 32-bit kernel, even though the core is 64-bit capable)

   Cortex-A8  (BeagleBone)       :  7%
   Cortex-A15 (Calxeda Midway)   : 21%
   Cortex-A53 (Raspberry Pi 3)   :  3%
   Cortex-A72 (Raspberry Pi 4)   : 19%

Cc: Eric Biggers <ebiggers@google.com>
Cc: "Jason A . Donenfeld" <Jason@zx2c4.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2020-11-06  crypto: arm/aes-neonbs - fix usage of cbc(aes) fallback  (Horia Geantă; 1 file, +5/-3)

Loading the module deadlocks since:
 - the local cbc(aes) implementation needs a fallback, and
 - the crypto API tries to find one, but the request_module() resolves back to the same module.

Fix this by changing the module alias for cbc(aes) and using the NEED_FALLBACK flag when requesting a fallback algorithm.

Fixes: 00b99ad2bac2 ("crypto: arm/aes-neonbs - Use generic cbc encryption path")
Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

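The second half of the fix has roughly this shape (a sketch with a hypothetical context struct; the real code is in arch/arm/crypto/aes-neonbs-glue.c):

    #include <crypto/internal/skcipher.h>
    #include <linux/crypto.h>
    #include <linux/err.h>

    struct aesbs_cbc_ctx {		/* hypothetical; mirrors the glue code */
    	struct crypto_skcipher *fallback;
    };

    static int cbc_init(struct crypto_skcipher *tfm)
    {
    	struct aesbs_cbc_ctx *ctx = crypto_skcipher_ctx(tfm);

    	/* CRYPTO_ALG_NEED_FALLBACK tells the API not to hand back an
    	 * implementation that itself needs a fallback, which is what
    	 * made the lookup resolve to this very module and deadlock. */
    	ctx->fallback = crypto_alloc_skcipher("cbc(aes)", 0,
    					      CRYPTO_ALG_ASYNC |
    					      CRYPTO_ALG_NEED_FALLBACK);

    	return PTR_ERR_OR_ZERO(ctx->fallback);
    }
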
2020-09-25  crypto: arm/aes-neonbs - use typed init/exit routines for XTS  (Ard Biesheuvel; 1 file, +6/-6)

Use the typed skcipher init/exit routines instead of the generic cra_init/_exit routines when instantiating/releasing the XTS skciphers.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2020-09-25  crypto: arm/aes-neonbs - avoid loading reorder argument on encryption  (Ard Biesheuvel; 1 file, +3/-2)

Reordering the tweak is never necessary for encryption, so avoid the argument load on the encryption path.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2020-09-25  crypto: arm/aes-neonbs - avoid hacks to prevent Thumb2 mode switches  (Ard Biesheuvel; 1 file, +22/-27)

Instead of using a homegrown macrofied version of the adr instruction that sets the Thumb bit in the output value, only to ensure that any bx instructions consuming that value will not switch out of Thumb mode when branching, use non-interworking mov (to PC) instructions, which achieve the same thing.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2020-09-25  crypto: arm/sha512-neon - avoid ADRL pseudo instruction  (Ard Biesheuvel; 2 files, +4/-4)

The ADRL pseudo instruction is not an architectural construct, but a convenience macro that was supported by the ARM proprietary assembler and adopted by binutils GAS as well, but only when assembling in 32-bit ARM mode. Therefore, it can only be used in assembler code that is known to assemble in ARM mode only, but as it turns out, the Clang assembler does not implement ADRL at all, and so it is better to get rid of it entirely.

So replace the ADRL instruction with an ADR instruction that refers to a nearer symbol, and apply the delta explicitly using an additional instruction.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2020-09-25  crypto: arm/sha256-neon - avoid ADRL pseudo instruction  (Ard Biesheuvel; 2 files, +4/-4)

The ADRL pseudo instruction is not an architectural construct, but a convenience macro that was supported by the ARM proprietary assembler and adopted by binutils GAS as well, but only when assembling in 32-bit ARM mode. Therefore, it can only be used in assembler code that is known to assemble in ARM mode only, but as it turns out, the Clang assembler does not implement ADRL at all, and so it is better to get rid of it entirely.

So replace the ADRL instruction with an ADR instruction that refers to a nearer symbol, and apply the delta explicitly using an additional instruction.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2020-09-11  crypto: arm/aes-neonbs - Use generic cbc encryption path  (Herbert Xu; 1 file, +28/-18)

Since commit b56f5cbc7e08ec7d31c42fc41e5247677f20b143 ("crypto: arm/aes-neonbs - resolve fallback cipher at runtime") the CBC encryption path in aes-neonbs is now identical to that obtained through the cbc template. This means that it can simply call the generic cbc template instead of doing its own thing.

This patch removes the custom encryption path and simply invokes the generic cbc template.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2020-09-04  crypto: arm/poly1305 - Add prototype for poly1305_blocks_neon  (Herbert Xu; 1 file, +1/-0)

This patch adds a prototype for poly1305_blocks_neon to silence a compiler warning:

  CC [M]  arch/arm/crypto/poly1305-glue.o
../arch/arm/crypto/poly1305-glue.c:25:13: warning: no previous prototype for `poly1305_blocks_neon' [-Wmissing-prototypes]
 void __weak poly1305_blocks_neon(void *state, const u8 *src, u32 len, u32 hibit)
             ^~~~~~~~~~~~~~~~~~~~

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

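The fix itself is essentially a one-line declaration ahead of the weak definition; a sketch of the pattern (the stub body here is illustrative):

    #include <linux/types.h>

    /* Declaring the prototype before the __weak definition satisfies
     * -Wmissing-prototypes; the strong NEON implementation still
     * overrides the weak symbol at link time. */
    void poly1305_blocks_neon(void *state, const u8 *src, u32 len, u32 hibit);

    void __weak poly1305_blocks_neon(void *state, const u8 *src, u32 len,
    				 u32 hibit)
    {
    	/* not reached when the NEON implementation is linked in */
    }
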
2020-08-25  crypto: arm/curve25519 - include <linux/scatterlist.h>  (Fabio Estevam; 1 file, +1/-0)

Building ARM allmodconfig leads to the following warnings:

arch/arm/crypto/curve25519-glue.c:73:12: error: implicit declaration of function 'sg_copy_to_buffer' [-Werror=implicit-function-declaration]
arch/arm/crypto/curve25519-glue.c:74:9: error: implicit declaration of function 'sg_nents_for_len' [-Werror=implicit-function-declaration]
arch/arm/crypto/curve25519-glue.c:88:11: error: implicit declaration of function 'sg_copy_from_buffer' [-Werror=implicit-function-declaration]

Include <linux/scatterlist.h> to fix such warnings.

Reported-by: Olof's autobuilder <build@lixom.net>
Fixes: 0c3dc787a62a ("crypto: algapi - Remove skbuff.h inclusion")
Signed-off-by: Fabio Estevam <festevam@gmail.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2020-07-23  crypto: Replace HTTP links with HTTPS ones  (Alexander A. Klimov; 6 files, +8/-8)

Rationale: Reduces attack surface on kernel devs opening the links for MITM as HTTPS traffic is much harder to manipulate.

Deterministic algorithm:
 For each file:
   If not .svg:
     For each line:
       If doesn't contain `\bxmlns\b`:
         For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`:
           If neither `\bgnu\.org/license`, nor `\bmozilla\.org/MPL\b`:
             If both the HTTP and HTTPS versions return 200 OK and serve the same content:
               Replace HTTP with HTTPS.

Signed-off-by: Alexander A. Klimov <grandmaster@al2klimov.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2020-07-09  crypto: arm/ghash - use variably sized key struct  (Ard Biesheuvel; 1 file, +24/-27)

Of the two versions of GHASH that the ARM driver implements, only one performs aggregation, and so the other one has no use for the powers of H to be precomputed, or space to be allocated for them in the key struct. So make the context size dependent on which version is being selected, and while at it, use a static key to carry this decision, and get rid of the function pointer.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2020-06-11  Merge branch 'rwonce/rework' of git://git.kernel.org/pub/scm/linux/kernel/git/will/linux  (Linus Torvalds; 1 file, +6/-6)

Pull READ/WRITE_ONCE rework from Will Deacon:
 "This is the READ_ONCE rework I've been working on for a while, which bumps the minimum GCC version and improves code-gen on arm64 when stack protector is enabled"

[ Side note: I'm _really_ tempted to raise the minimum gcc version to 4.9, so that we can just say that we require _Generic() support. That would allow us to more cleanly handle a lot of the cases where we depend on very complex macros with 'sizeof' or __builtin_choose_expr() with __builtin_types_compatible_p() etc.

  This branch has a workaround for sparse not handling _Generic(), either, but that was already fixed in the sparse development branch, so it's really just gcc-4.9 that we'd require.   - Linus ]

* 'rwonce/rework' of git://git.kernel.org/pub/scm/linux/kernel/git/will/linux:
  compiler_types.h: Use unoptimized __unqual_scalar_typeof for sparse
  compiler_types.h: Optimize __unqual_scalar_typeof compilation time
  compiler.h: Enforce that READ_ONCE_NOCHECK() access size is sizeof(long)
  compiler-types.h: Include naked type in __pick_integer_type() match
  READ_ONCE: Fix comment describing 2x32-bit atomicity
  gcov: Remove old GCC 3.4 support
  arm64: barrier: Use '__unqual_scalar_typeof' for acquire/release macros
  locking/barriers: Use '__unqual_scalar_typeof' for load-acquire macros
  READ_ONCE: Drop pointer qualifiers when reading from scalar types
  READ_ONCE: Enforce atomicity for {READ,WRITE}_ONCE() memory accesses
  READ_ONCE: Simplify implementations of {READ,WRITE}_ONCE()
  arm64: csum: Disable KASAN for do_csum()
  fault_inject: Don't rely on "return value" from WRITE_ONCE()
  net: tls: Avoid assigning 'const' pointer to non-const pointer
  netfilter: Avoid assigning 'const' pointer to non-const pointer
  compiler/gcc: Raise minimum GCC version for kernel builds to 4.8

2020-06-01  Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6  (Linus Torvalds; 4 files, +0/-4)

Pull crypto updates from Herbert Xu:

 "API:
   - Introduce crypto_shash_tfm_digest() and use it wherever possible.
   - Fix use-after-free and race in crypto_spawn_alg.
   - Add support for parallel and batch requests to crypto_engine.

  Algorithms:
   - Update jitter RNG for SP800-90B compliance.
   - Always use jitter RNG as seed in drbg.

  Drivers:
   - Add Arm CryptoCell driver cctrng.
   - Add support for SEV-ES to the PSP driver in ccp"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (114 commits)
  crypto: hisilicon - fix driver compatibility issue with different versions of devices
  crypto: engine - do not requeue in case of fatal error
  crypto: cavium/nitrox - Fix a typo in a comment
  crypto: hisilicon/qm - change debugfs file name from qm_regs to regs
  crypto: hisilicon/qm - add DebugFS for xQC and xQE dump
  crypto: hisilicon/zip - add debugfs for Hisilicon ZIP
  crypto: hisilicon/hpre - add debugfs for Hisilicon HPRE
  crypto: hisilicon/sec2 - add debugfs for Hisilicon SEC
  crypto: hisilicon/qm - add debugfs to the QM state machine
  crypto: hisilicon/qm - add debugfs for QM
  crypto: stm32/crc32 - protect from concurrent accesses
  crypto: stm32/crc32 - don't sleep in runtime pm
  crypto: stm32/crc32 - fix multi-instance
  crypto: stm32/crc32 - fix run-time self test issue.
  crypto: stm32/crc32 - fix ext4 chksum BUG_ON()
  crypto: hisilicon/zip - Use temporary sqe when doing work
  crypto: hisilicon - add device error report through abnormal irq
  crypto: hisilicon - remove codes of directly report device errors through MSI
  crypto: hisilicon - QM memory management optimization
  crypto: hisilicon - unify initial value assignment into QM
  ...

2020-05-08  crypto: lib/sha1 - remove unnecessary includes of linux/cryptohash.h  (Eric Biggers; 4 files, +0/-4)

<linux/cryptohash.h> sounds very generic and important, like it's the header to include if you're doing cryptographic hashing in the kernel. But actually it only includes the library implementation of the SHA-1 compression function (not even the full SHA-1). This should basically never be used anymore; SHA-1 is no longer considered secure, and there are much better ways to do cryptographic hashing in the kernel.

Most files that include this header don't actually need it. So in preparation for removing it, remove all these unneeded includes of it.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2020-04-30  crypto: arch/nhpoly1305 - process in explicit 4k chunks  (Jason A. Donenfeld; 1 file, +1/-1)

Rather than chunking via PAGE_SIZE, this commit changes the arch implementations to chunk in explicit 4k parts, so that calculations on maximum acceptable latency don't suddenly become invalid on platforms where PAGE_SIZE isn't 4k, such as arm64.

Fixes: 0f961f9f670e ("crypto: x86/nhpoly1305 - add AVX2 accelerated NHPoly1305")
Fixes: 012c82388c03 ("crypto: x86/nhpoly1305 - add SSE2 accelerated NHPoly1305")
Fixes: a00fa0c88774 ("crypto: arm64/nhpoly1305 - add NEON-accelerated NHPoly1305")
Fixes: 16aae3595a9d ("crypto: arm/nhpoly1305 - add NEON-accelerated NHPoly1305")
Cc: stable@vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2020-04-30  crypto: arch/lib - limit simd usage to 4k chunks  (Jason A. Donenfeld; 2 files, +22/-7)

The initial Zinc patchset, after some mailing list discussion, contained code to ensure that kernel_fpu_enable would not be kept on for more than a 4k chunk, since it disables preemption. The choice of 4k isn't totally scientific, but it's not a bad guess either, and it's what's used in both the x86 poly1305, blake2s, and nhpoly1305 code already (in the form of PAGE_SIZE, which this commit corrects to be explicitly 4k for the former two).

Ard did some back of the envelope calculations and found that at 5 cycles/byte (overestimate) on a 1ghz processor (pretty slow), 4k means we have a maximum preemption disabling of 20us, which Sebastian confirmed was probably a good limit.

Unfortunately the chunking appears to have been left out of the final patchset that added the glue code. So, this commit adds it back in.

Fixes: 84e03fa39fbe ("crypto: x86/chacha - expose SIMD ChaCha routine as library function")
Fixes: b3aad5bad26a ("crypto: arm64/chacha - expose arm64 ChaCha routine as library function")
Fixes: a44a3430d71b ("crypto: arm/chacha - expose ARM ChaCha routine as library function")
Fixes: d7d7b8535662 ("crypto: x86/poly1305 - wire up faster implementations for kernel")
Fixes: f569ca164751 ("crypto: arm64/poly1305 - incorporate OpenSSL/CRYPTOGAMS NEON implementation")
Fixes: a6b803b3ddc7 ("crypto: arm/poly1305 - incorporate OpenSSL/CRYPTOGAMS NEON implementation")
Fixes: ed0356eda153 ("crypto: blake2s - x86_64 SIMD implementation")
Cc: Eric Biggers <ebiggers@google.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

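On ARM the chunking pattern looks roughly like this (a sketch assuming a chacha_doneon()-style NEON helper as in the glue code; not the literal diff):

    #include <asm/neon.h>
    #include <linux/kernel.h>
    #include <linux/sizes.h>
    #include <linux/types.h>

    void chacha_doneon(u32 *state, u8 *dst, const u8 *src,
    		   unsigned int bytes, int nrounds);	/* assumed helper */

    static void chacha_crypt_neon_chunked(u32 *state, u8 *dst, const u8 *src,
    				      unsigned int bytes, int nrounds)
    {
    	do {
    		/* Cap each NEON section at 4 KiB so preemption is never
    		 * disabled for more than roughly 20us on a slow core. */
    		unsigned int todo = min_t(unsigned int, bytes, SZ_4K);

    		kernel_neon_begin();
    		chacha_doneon(state, dst, src, todo, nrounds);
    		kernel_neon_end();

    		bytes -= todo;
    		src += todo;
    		dst += todo;
    	} while (bytes);
    }
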
2020-04-15  compiler/gcc: Raise minimum GCC version for kernel builds to 4.8  (Will Deacon; 1 file, +6/-6)

It is very rare to see versions of GCC prior to 4.8 being used to build the mainline kernel. These old compilers are also known to have codegen issues which can lead to silent miscompilation:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145

Raise the minimum GCC version for kernel build to 4.8 and remove some tautological Kconfig dependencies as a consequence.

Cc: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Will Deacon <will@kernel.org>

2020-04-03  Merge tag 'spdx-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx  (Linus Torvalds; 1 file, +1/-0)

Pull SPDX updates from Greg KH:
 "Here are three SPDX patches for 5.7-rc1.

  One fixes up the SPDX tag for a single driver, while the other two go through the tree and add SPDX tags for all of the .gitignore files as needed.

  Nothing too complex, but you will get a merge conflict with your current tree, that should be trivial to handle (one file modified by two things, one file deleted.)

  All three of these have been in linux-next for a while, with no reported issues other than the merge conflict"

* tag 'spdx-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx:
  ASoC: MT6660: make spdxcheck.py happy
  .gitignore: add SPDX License Identifier
  .gitignore: remove too obvious comments

2020-03-30  crypto: arm[64]/poly1305 - add artifact to .gitignore files  (Jason A. Donenfeld; 1 file, +1/-0)

The .S_shipped yields a .S, and the pattern in these directories is to add that to .gitignore so that git-status doesn't raise a fuss.

Fixes: a6b803b3ddc7 ("crypto: arm/poly1305 - incorporate OpenSSL/CRYPTOGAMS NEON implementation")
Fixes: f569ca164751 ("crypto: arm64/poly1305 - incorporate OpenSSL/CRYPTOGAMS NEON implementation")
Reported-by: Emil Renner Berthing <kernel@esmil.dk>
Cc: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2020-03-25  .gitignore: add SPDX License Identifier  (Masahiro Yamada; 1 file, +1/-0)

Add SPDX License Identifier to all .gitignore files.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

2020-03-20  crypto: arm/neon - memzero_explicit aes-cbc key  (Torsten Duwe; 1 file, +1/-0)

At function exit, do not leave the expanded key in the rk struct which got allocated on the stack.

Signed-off-by: Torsten Duwe <duwe@suse.de>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

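The pattern, as a hedged sketch (the demo function is hypothetical; the one-line fix itself just adds the memzero_explicit() call):

    #include <crypto/aes.h>
    #include <linux/string.h>

    static void demo_encrypt_with_stack_key(const u8 *key, unsigned int keylen)
    {
    	struct crypto_aes_ctx rk;	/* expanded key lives on the stack */

    	if (aes_expandkey(&rk, key, keylen))
    		return;
    	/* ... run the CBC rounds using rk ... */

    	/* A plain memset() here may be elided as a dead store by the
    	 * compiler; memzero_explicit() guarantees the key schedule is
    	 * actually wiped before the stack frame is reused. */
    	memzero_explicit(&rk, sizeof(rk));
    }
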
2020-03-06  crypto: arm/ghash-ce - define fpu before fpu registers are referenced  (Stefan Agner; 1 file, +3/-2)

Building ARMv7 with Clang's integrated assembler leads to errors such as:

arch/arm/crypto/ghash-ce-core.S:34:11: error: register name expected
 t3l .req d16
          ^

Since no FPU has been selected yet, Clang considers d16 not a valid register. Moving the FPU directive on top allows Clang to parse the registers and allows us to successfully build this file with Clang's integrated assembler.

Signed-off-by: Stefan Agner <stefan@agner.ch>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2020-01-22  crypto: arm/chacha - fix build failure when kernel mode NEON is disabled  (Ard Biesheuvel; 1 file, +2/-2)

When the ARM accelerated ChaCha driver is built as part of a configuration that has kernel mode NEON disabled, we expect the compiler to propagate the build time constant expression IS_ENABLED(CONFIG_KERNEL_MODE_NEON) in a way that eliminates all the cross-object references to the actual NEON routines, which allows the chacha-neon-core.o object to be omitted from the build entirely.

Unfortunately, this fails to work as expected in some cases, and we may end up with a build error such as

  chacha-glue.c:(.text+0xc0): undefined reference to `chacha_4block_xor_neon'

caused by the fact that chacha_doneon() has not been eliminated from the object code, even though it will never be called in practice.

Let's fix this by adding some IS_ENABLED(CONFIG_KERNEL_MODE_NEON) tests that are not strictly needed from a logical point of view, but should help the compiler infer that the NEON code paths are unreachable in those cases.

Fixes: b36d8c09e710c71f ("crypto: arm/chacha - remove dependency on generic ...")
Reported-by: Russell King <linux@armlinux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

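A sketch of the idiom (simplified; the real guards are in chacha-glue.c):

    #include <asm/neon.h>
    #include <crypto/chacha.h>
    #include <linux/types.h>

    void chacha_doneon(u32 *state, u8 *dst, const u8 *src,
    		   unsigned int bytes, int nrounds);	/* NEON routine */

    static void chacha_crypt_sketch(u32 *state, u8 *dst, const u8 *src,
    				unsigned int bytes, int nrounds, bool neon)
    {
    	/* 'neon' can never be true without kernel-mode NEON, so the
    	 * extra IS_ENABLED() test is logically redundant, but it lets
    	 * the compiler prove chacha_doneon() is unreachable and drop
    	 * the cross-object reference entirely. */
    	if (IS_ENABLED(CONFIG_KERNEL_MODE_NEON) && neon) {
    		kernel_neon_begin();
    		chacha_doneon(state, dst, src, bytes, nrounds);
    		kernel_neon_end();
    	} else {
    		chacha_crypt_generic(state, dst, src, bytes, nrounds);
    	}
    }
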
2020-01-16  crypto: {arm,arm64,mips}/poly1305 - remove redundant non-reduction from emit  (Jason A. Donenfeld; 1 file, +2/-16)

This appears to be some kind of copy and paste error, and is actually dead code.

Pre: f = 0 ⇒ (f >> 32) = 0
    f = (f >> 32) + le32_to_cpu(digest[0]);
Post: 0 ≤ f < 2³²
    put_unaligned_le32(f, dst);

Pre: 0 ≤ f < 2³² ⇒ (f >> 32) = 0
    f = (f >> 32) + le32_to_cpu(digest[1]);
Post: 0 ≤ f < 2³²
    put_unaligned_le32(f, dst + 4);

Pre: 0 ≤ f < 2³² ⇒ (f >> 32) = 0
    f = (f >> 32) + le32_to_cpu(digest[2]);
Post: 0 ≤ f < 2³²
    put_unaligned_le32(f, dst + 8);

Pre: 0 ≤ f < 2³² ⇒ (f >> 32) = 0
    f = (f >> 32) + le32_to_cpu(digest[3]);
Post: 0 ≤ f < 2³²
    put_unaligned_le32(f, dst + 12);

Therefore this sequence is redundant. And Andy's code appears to handle misalignment acceptably.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

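Concretely, the invariant above means the carry term is always zero, so the emit path reduces to direct stores; a sketch of the simplified shape:

    #include <asm/byteorder.h>
    #include <asm/unaligned.h>
    #include <linux/types.h>

    /* Sketch: 'digest' already holds the fully reduced little-endian
     * result, so each word can be stored as-is; the (f >> 32) carry chain
     * removed by the patch always contributed zero. */
    static void poly1305_emit_sketch(const __le32 digest[4], u8 dst[16])
    {
    	int i;

    	for (i = 0; i < 4; i++)
    		put_unaligned_le32(le32_to_cpu(digest[i]), dst + 4 * i);
    }
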
2020-01-16  Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6  (Herbert Xu; 1 file, +7/-0)

Merge crypto tree to pick up hisilicon patch.

2020-01-09  crypto: remove propagation of CRYPTO_TFM_RES_* flags  (Eric Biggers; 1 file, +1/-6)

The CRYPTO_TFM_RES_* flags were apparently meant as a way to make the ->setkey() functions provide more information about errors. But these flags weren't actually being used or tested, and in many cases they weren't being set correctly anyway. So they've now been removed.

Also, if someone ever actually needs to start better distinguishing ->setkey() errors (which is somewhat unlikely, as this has been unneeded for a long time), we'd be much better off just defining different return values, like -EINVAL if the key is invalid for the algorithm vs. -EKEYREJECTED if the key was rejected by a policy like "no weak keys". That would be much simpler, less error-prone, and easier to test.

So just remove CRYPTO_TFM_RES_MASK and all the unneeded logic that propagates these flags around.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2020-01-09  crypto: remove CRYPTO_TFM_RES_BAD_KEY_LEN  (Eric Biggers; 3 files, +4/-18)

The CRYPTO_TFM_RES_BAD_KEY_LEN flag was apparently meant as a way to make the ->setkey() functions provide more information about errors. However, no one actually checks for this flag, which makes it pointless.

Also, many algorithms fail to set this flag when given a bad length key. Reviewing just the generic implementations, this is the case for aes-fixed-time, cbcmac, echainiv, nhpoly1305, pcrypt, rfc3686, rfc4309, rfc7539, rfc7539esp, salsa20, seqiv, and xcbc. But there are probably many more in arch/*/crypto/ and drivers/crypto/.

Some algorithms can even set this flag when the key is the correct length. For example, authenc and authencesn set it when the key payload is malformed in any way (not just a bad length), the atmel-sha and ccree drivers can set it if a memory allocation fails, and the chelsio driver sets it for bad auth tag lengths, not just bad key lengths.

So even if someone actually wanted to start checking this flag (which seems unlikely, since it's been unused for a long time), there would be a lot of work needed to get it working correctly. But it would probably be much better to go back to the drawing board and just define different return values, like -EINVAL if the key is invalid for the algorithm vs. -EKEYREJECTED if the key was rejected by a policy like "no weak keys". That would be much simpler, less error-prone, and easier to test.

So just remove this flag.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

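After the cleanup, a ->setkey() just returns an error code with no flag bookkeeping; a minimal sketch (the function name is hypothetical):

    #include <crypto/aes.h>
    #include <crypto/skcipher.h>
    #include <linux/errno.h>

    /* Post-cleanup setkey: no crypto_skcipher_set_flags(...,
     * CRYPTO_TFM_RES_BAD_KEY_LEN) dance, just a meaningful return value. */
    static int demo_setkey(struct crypto_skcipher *tfm, const u8 *key,
    		       unsigned int keylen)
    {
    	if (keylen != AES_KEYSIZE_128 && keylen != AES_KEYSIZE_192 &&
    	    keylen != AES_KEYSIZE_256)
    		return -EINVAL;

    	/* ... expand and store the key in the tfm context ... */
    	return 0;
    }
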
2019-12-12  crypto: arm/curve25519 - add arch-specific key generation function  (Jason A. Donenfeld; 1 file, +7/-0)

Somehow this was forgotten when Zinc was being split into oddly shaped pieces, resulting in linker errors. The x86_64 glue has a specific key generation implementation, but the Arm one does not. However, it can still receive the NEON speedups by calling the ordinary DH function using the base point.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

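In other words, public-key generation is just the DH primitive evaluated at the standard base point; a sketch of the shape of the added function (the exact name in curve25519-glue.c may differ):

    #include <crypto/curve25519.h>

    /* Sketch: public-key generation is scalar multiplication by the
     * standard base point, so the NEON-accelerated curve25519_arch() DH
     * primitive (declared in <crypto/curve25519.h>) is reused as-is. */
    static void curve25519_base_sketch(u8 pub[CURVE25519_KEY_SIZE],
    				   const u8 secret[CURVE25519_KEY_SIZE])
    {
    	curve25519_arch(pub, secret, curve25519_base_point);
    }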