summaryrefslogtreecommitdiff
path: root/crypto
AgeCommit message (Collapse)AuthorFilesLines
2017-04-10crypto: lz4 - fixed decompress function to return error codeMyungho Jung2-2/+2
Decompress function in LZ4 library is supposed to return an error code or negative result. But, it returns -1 when any error is detected. Return error code when the library returns negative value. Signed-off-by: Myungho Jung <mhjungk@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-04-10crypto: af_alg - Allow arbitrarily long algorithm namesHerbert Xu1-2/+2
This patch removes the hard-coded 64-byte limit on the length of the algorithm name through bind(2). The address length can now exceed that. The user-space structure remains unchanged. In order to use a longer name simply extend the salg_name array beyond its defined 64 bytes length. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-04-10crypto: user - Prepare for CRYPTO_MAX_ALG_NAME expansionHerbert Xu1-9/+9
This patch hard-codes CRYPTO_MAX_NAME in the user-space API to 64, which is the current value of CRYPTO_MAX_ALG_NAME. This patch also replaces all remaining occurences of CRYPTO_MAX_ALG_NAME in the user-space API with CRYPTO_MAX_NAME. This way the user-space API will not be modified when we raise the value of CRYPTO_MAX_ALG_NAME. Furthermore, the code has been updated to handle names longer than the user-space API. They will be truncated. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Alexander Sverdlin <alexander.sverdlin@nokia.com> Tested-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
2017-04-05crypto: xts - drop gf128mul dependencyOndrej Mosnáček1-1/+0
Since the gf128mul_x_ble function used by xts.c is now defined inline in the header file, the XTS module no longer depends on gf128mul. Therefore, the 'select CRYPTO_GF128MUL' line can be safely removed. Signed-off-by: Ondrej Mosnacek <omosnacek@gmail.com> Reviewd-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-04-05crypto: gf128mul - switch gf128mul_x_ble to le128Ondrej Mosnáček1-19/+19
Currently, gf128mul_x_ble works with pointers to be128, even though it actually interprets the words as little-endian. Consequently, it uses cpu_to_le64/le64_to_cpu on fields of type __be64, which is incorrect. This patch fixes that by changing the function to accept pointers to le128 and updating all users accordingly. Signed-off-by: Ondrej Mosnacek <omosnacek@gmail.com> Reviewd-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-04-05crypto: gf128mul - define gf128mul_x_* in gf128mul.hOndrej Mosnáček1-32/+1
The gf128mul_x_ble function is currently defined in gf128mul.c, because it depends on the gf128mul_table_be multiplication table. However, since the function is very small and only uses two values from the table, it is better for it to be defined as inline function in gf128mul.h. That way, the function can be inlined by the compiler for better performance. For consistency, the other gf128mul_x_* functions are also moved to the header file. In addition, the code is rewritten to be constant-time. After this change, the speed of the generic 'xts(aes)' implementation increased from ~225 MiB/s to ~235 MiB/s (measured using 'cryptsetup benchmark -c aes-xts-plain64' on an Intel system with CRYPTO_AES_X86_64 and CRYPTO_AES_NI_INTEL disabled). Signed-off-by: Ondrej Mosnacek <omosnacek@gmail.com> Reviewd-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-04-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6Herbert Xu2-4/+10
Merge the crypto tree to resolve conflict between caam changes.
2017-03-24crypto: DRBG - initialize SGL only onceStephan Mueller1-3/+2
An SGL to be initialized only once even when its buffers are written to several times. Signed-off-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-03-24crypto: testmgr - mark ctr(des3_ede) as fips_allowedMarcelo Cerri1-0/+1
3DES is missing the fips_allowed flag for CTR mode. Signed-off-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com> Acked-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-03-24md5: remove from lib and only live in cryptoJason A. Donenfeld1-1/+94
The md5_transform function is no longer used any where in the tree, except for the crypto api's actual implementation of md5, so we can drop the function from lib and put it as a static function of the crypto file, where it belongs. There should be no new users of md5_transform, anyway, since there are more modern ways of doing what it once achieved. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-03-24crypto: powerpc - Stress test for vpmsum implementationsDaniel Axtens1-0/+8
vpmsum implementations often don't kick in for short test vectors. This is a simple test module that does a configurable number of random tests, each up to 64kB and each with random offsets. Both CRC-T10DIF and CRC32C are tested. Cc: Anton Blanchard <anton@samba.org> Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-03-24crypto: powerpc - Add CRC-T10DIF accelerationDaniel Axtens1-0/+9
T10DIF is a CRC16 used heavily in NVMe. It turns out we can accelerate it with a CRC32 library and a few little tricks. Provide the accelerator based the refactored CRC32 code. Cc: Anton Blanchard <anton@samba.org> Thanks-to: Hong Bo Peng <penghb@cn.ibm.com> Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-03-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linuxHerbert Xu2-8/+10
Merging 4.11-rc3 to pick up md5 removal from /dev/random.
2017-03-24crypto: xts,lrw - fix out-of-bounds write after kmalloc failureEric Biggers2-4/+10
In the generic XTS and LRW algorithms, for input data > 128 bytes, a temporary buffer is allocated to hold the values to be XOR'ed with the data before and after encryption or decryption. If the allocation fails, the fixed-size buffer embedded in the request buffer is meant to be used as a fallback --- resulting in more calls to the ECB algorithm, but still producing the correct result. However, we weren't correctly limiting subreq->cryptlen in this case, resulting in pre_crypt() overrunning the embedded buffer. Fix this by setting subreq->cryptlen correctly. Fixes: f1c131b45410 ("crypto: xts - Convert to skcipher") Fixes: 700cb3f5fe75 ("crypto: lrw - Convert to skcipher") Cc: stable@vger.kernel.org # v4.10+ Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-03-10net: Work around lockdep limitation in sockets that use socketsDavid Howells2-8/+10
Lockdep issues a circular dependency warning when AFS issues an operation through AF_RXRPC from a context in which the VFS/VM holds the mmap_sem. The theory lockdep comes up with is as follows: (1) If the pagefault handler decides it needs to read pages from AFS, it calls AFS with mmap_sem held and AFS begins an AF_RXRPC call, but creating a call requires the socket lock: mmap_sem must be taken before sk_lock-AF_RXRPC (2) afs_open_socket() opens an AF_RXRPC socket and binds it. rxrpc_bind() binds the underlying UDP socket whilst holding its socket lock. inet_bind() takes its own socket lock: sk_lock-AF_RXRPC must be taken before sk_lock-AF_INET (3) Reading from a TCP socket into a userspace buffer might cause a fault and thus cause the kernel to take the mmap_sem, but the TCP socket is locked whilst doing this: sk_lock-AF_INET must be taken before mmap_sem However, lockdep's theory is wrong in this instance because it deals only with lock classes and not individual locks. The AF_INET lock in (2) isn't really equivalent to the AF_INET lock in (3) as the former deals with a socket entirely internal to the kernel that never sees userspace. This is a limitation in the design of lockdep. Fix the general case by: (1) Double up all the locking keys used in sockets so that one set are used if the socket is created by userspace and the other set is used if the socket is created by the kernel. (2) Store the kern parameter passed to sk_alloc() in a variable in the sock struct (sk_kern_sock). This informs sock_lock_init(), sock_init_data() and sk_clone_lock() as to the lock keys to be used. Note that the child created by sk_clone_lock() inherits the parent's kern setting. (3) Add a 'kern' parameter to ->accept() that is analogous to the one passed in to ->create() that distinguishes whether kernel_accept() or sys_accept4() was the caller and can be passed to sk_alloc(). Note that a lot of accept functions merely dequeue an already allocated socket. I haven't touched these as the new socket already exists before we get the parameter. Note also that there are a couple of places where I've made the accepted socket unconditionally kernel-based: irda_accept() rds_rcp_accept_one() tcp_accept_from_sock() because they follow a sock_create_kern() and accept off of that. Whilst creating this, I noticed that lustre and ocfs don't create sockets through sock_create_kern() and thus they aren't marked as for-kernel, though they appear to be internal. I wonder if these should do that so that they use the new set of lock keys. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-09crypto: ctr - Propagate NEED_FALLBACK bitMarcelo Cerri1-5/+18
When requesting a fallback algorithm, we should propagate the NEED_FALLBACK bit when search for the underlying algorithm. This will prevents drivers from allocating unnecessary fallbacks that are never called. For instance, currently the vmx-crypto driver will use the following chain of calls when calling the fallback implementation: p8_aes_ctr -> ctr(p8_aes) -> aes-generic However p8_aes will always delegate its calls to aes-generic. With this patch, p8_aes_ctr will be able to use ctr(aes-generic) directly as its fallback. The same applies to aes_s390. Signed-off-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-03-09crypto: cbc - Propagate NEED_FALLBACK bitMarcelo Cerri1-2/+13
When requesting a fallback algorithm, we should propagate the NEED_FALLBACK bit when search for the underlying algorithm. This will prevents drivers from allocating unnecessary fallbacks that are never called. For instance, currently the vmx-crypto driver will use the following chain of calls when calling the fallback implementation: p8_aes_cbc -> cbc(p8_aes) -> aes-generic However p8_aes will always delegate its calls to aes-generic. With this patch, p8_aes_cbc will be able to use cbc(aes-generic) directly as its fallback. The same applies to aes_s390. Signed-off-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-03-09crypto: testmgr - constify all test vectorsEric Biggers2-286/+297
Cryptographic test vectors should never be modified, so constify them to enforce this at both compile-time and run-time. This moves a significant amount of data from .data to .rodata when the crypto tests are enabled. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-03-09crypto: kpp - constify buffer passed to crypto_kpp_set_secret()Eric Biggers2-2/+4
Constify the buffer passed to crypto_kpp_set_secret() and kpp_alg.set_secret, since it is never modified. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-03-09crypto: algapi - annotate expected branch behavior in crypto_inc()Ard Biesheuvel1-2/+2
To prevent unnecessary branching, mark the exit condition of the primary loop as likely(), given that a carry in a 32-bit counter occurs very rarely. On arm64, the resulting code is emitted by GCC as 9a8: cmp w1, #0x3 9ac: add x3, x0, w1, uxtw 9b0: b.ls 9e0 <crypto_inc+0x38> 9b4: ldr w2, [x3,#-4]! 9b8: rev w2, w2 9bc: add w2, w2, #0x1 9c0: rev w4, w2 9c4: str w4, [x3] 9c8: cbz w2, 9d0 <crypto_inc+0x28> 9cc: ret where the two remaining branch conditions (one for size < 4 and one for the carry) are statically predicted as non-taken, resulting in optimal execution in the vast majority of cases. Also, replace the open coded alignment test with IS_ALIGNED(). Cc: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-03-09crypto: gf128mul - constify 4k and 64k multiplication tablesEric Biggers1-3/+3
Constify the multiplication tables passed to the 4k and 64k multiplication functions, as they are not modified by these functions. Cc: Alex Cope <alexcope@google.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-03-09crypto: gf128mul - rename the byte overflow tablesEric Biggers1-17/+32
Though the GF(2^128) byte overflow tables were named the "lle" and "bbe" tables, they are not actually tied to these element formats specifically, but rather to particular a "bit endianness". For example, the bbe table is actually used for both bbe and ble multiplication. Therefore, rename the tables to "le" and "be" and update the comment to explain this. Cc: Alex Cope <alexcope@google.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-03-09crypto: gf128mul - remove xx() macroEric Biggers1-10/+8
The xx() macro serves no purpose and can be removed. Cc: Alex Cope <alexcope@google.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-03-09crypto: gf128mul - fix some commentsEric Biggers1-6/+7
Fix incorrect references to GF(128) instead of GF(2^128), as these are two entirely different fields, and fix a few other incorrect comments. Cc: Alex Cope <alexcope@google.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-03-04Merge branch 'linus' of ↵Linus Torvalds3-9/+12
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fixes from Herbert Xu: - vmalloc stack regression in CCM - Build problem in CRC32 on ARM - Memory leak in cavium - Missing Kconfig dependencies in atmel and mediatek - XTS Regression on some platforms (s390 and ppc) - Memory overrun in CCM test vector * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: vmx - Use skcipher for xts fallback crypto: vmx - Use skcipher for cbc fallback crypto: testmgr - Pad aes_ccm_enc_tv_template vector crypto: arm/crc32 - add build time test for CRC instruction support crypto: arm/crc32 - fix build error with outdated binutils crypto: ccm - move cbcmac input off the stack crypto: xts - Propagate NEED_FALLBACK bit crypto: api - Add crypto_requires_off helper crypto: atmel - CRYPTO_DEV_MEDIATEK should depend on HAS_DMA crypto: atmel - CRYPTO_DEV_ATMEL_TDES and CRYPTO_DEV_ATMEL_SHA should depend on HAS_DMA crypto: cavium - fix leak on curr if curr->head fails to be allocated crypto: cavium - Fix couple of static checker errors
2017-03-02sched/headers: Prepare for new header dependencies before moving code to ↵Ingo Molnar1-0/+1
<linux/sched/stat.h> We are going to split <linux/sched/stat.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/stat.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02sched/headers: Prepare to move signal wakeup & sigpending methods from ↵Ingo Molnar4-2/+4
<linux/sched.h> into <linux/sched/signal.h> Fix up affected files that include this signal functionality via sched.h. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02sched/headers: Prepare for new header dependencies before moving code to ↵Ingo Molnar1-0/+1
<uapi/linux/sched/types.h> We are going to move scheduler ABI details to <uapi/linux/sched/types.h>, which will be used from a number of .c files. Create empty placeholder header that maps to <linux/types.h>. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-01crypto: testmgr - Pad aes_ccm_enc_tv_template vectorLaura Abbott1-1/+1
Running with KASAN and crypto tests currently gives BUG: KASAN: global-out-of-bounds in __test_aead+0x9d9/0x2200 at addr ffffffff8212fca0 Read of size 16 by task cryptomgr_test/1107 Address belongs to variable 0xffffffff8212fca0 CPU: 0 PID: 1107 Comm: cryptomgr_test Not tainted 4.10.0+ #45 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.1-1.fc24 04/01/2014 Call Trace: dump_stack+0x63/0x8a kasan_report.part.1+0x4a7/0x4e0 ? __test_aead+0x9d9/0x2200 ? crypto_ccm_init_crypt+0x218/0x3c0 [ccm] kasan_report+0x20/0x30 check_memory_region+0x13c/0x1a0 memcpy+0x23/0x50 __test_aead+0x9d9/0x2200 ? kasan_unpoison_shadow+0x35/0x50 ? alg_test_akcipher+0xf0/0xf0 ? crypto_skcipher_init_tfm+0x2e3/0x310 ? crypto_spawn_tfm2+0x37/0x60 ? crypto_ccm_init_tfm+0xa9/0xd0 [ccm] ? crypto_aead_init_tfm+0x7b/0x90 ? crypto_alloc_tfm+0xc4/0x190 test_aead+0x28/0xc0 alg_test_aead+0x54/0xd0 alg_test+0x1eb/0x3d0 ? alg_find_test+0x90/0x90 ? __sched_text_start+0x8/0x8 ? __wake_up_common+0x70/0xb0 cryptomgr_test+0x4d/0x60 kthread+0x173/0x1c0 ? crypto_acomp_scomp_free_ctx+0x60/0x60 ? kthread_create_on_node+0xa0/0xa0 ret_from_fork+0x2c/0x40 Memory state around the buggy address: ffffffff8212fb80: 00 00 00 00 01 fa fa fa fa fa fa fa 00 00 00 00 ffffffff8212fc00: 00 01 fa fa fa fa fa fa 00 00 00 00 01 fa fa fa >ffffffff8212fc80: fa fa fa fa 00 05 fa fa fa fa fa fa 00 00 00 00 ^ ffffffff8212fd00: 01 fa fa fa fa fa fa fa 00 00 00 00 01 fa fa fa ffffffff8212fd80: fa fa fa fa 00 00 00 00 00 05 fa fa fa fa fa fa This always happens on the same IV which is less than 16 bytes. Per Ard, "CCM IVs are 16 bytes, but due to the way they are constructed internally, the final couple of bytes of input IV are dont-cares. Apparently, we do read all 16 bytes, which triggers the KASAN errors." Fix this by padding the IV with null bytes to be at least 16 bytes. Cc: stable@vger.kernel.org Fixes: 0bc5a6c5c79a ("crypto: testmgr - Disable rfc4309 test and convert test vectors") Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Laura Abbott <labbott@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-28crypto: ccm - move cbcmac input off the stackArd Biesheuvel1-2/+3
Commit f15f05b0a5de ("crypto: ccm - switch to separate cbcmac driver") refactored the CCM driver to allow separate implementations of the underlying MAC to be provided by a platform. However, in doing so, it moved some data from the linear region to the stack, which violates the SG constraints when the stack is virtually mapped. So move idata/odata back to the request ctx struct, of which we can reasonably expect that it has been allocated using kmalloc() et al. Reported-by: Johannes Berg <johannes@sipsolutions.net> Fixes: f15f05b0a5de ("crypto: ccm - switch to separate cbcmac driver") Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Tested-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-27crypto: xts - Propagate NEED_FALLBACK bitHerbert Xu1-6/+8
When we're used as a fallback algorithm, we should propagate the NEED_FALLBACK bit when searching for the underlying ECB mode. This just happens to fix a hang too because otherwise the search may end up loading the same module that triggered this XTS creation. Cc: stable@vger.kernel.org #4.10 Fixes: f1c131b45410 ("crypto: xts - Convert to skcipher") Reported-by: Harald Freudenberger <freude@linux.vnet.ibm.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-25crypto: change LZ4 modules to work with new LZ4 module versionSven Schmidt3-68/+120
Update the crypto modules using LZ4 compression as well as the test cases in testmgr.h to work with the new LZ4 module version. Link: http://lkml.kernel.org/r/1486321748-19085-4-git-send-email-4sschmid@informatik.uni-hamburg.de Signed-off-by: Sven Schmidt <4sschmid@informatik.uni-hamburg.de> Cc: Bongkyu Kim <bongkyu.kim@lge.com> Cc: Rui Salvaterra <rsalvaterra@gmail.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: David S. Miller <davem@davemloft.net> Cc: Anton Vorontsov <anton@enomsg.org> Cc: Colin Cross <ccross@android.com> Cc: Kees Cook <keescook@chromium.org> Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-23crypto: xts - Add ECB dependencyMilan Broz1-0/+1
Since the commit f1c131b45410a202eb45cc55980a7a9e4e4b4f40 crypto: xts - Convert to skcipher the XTS mode is based on ECB, so the mode must select ECB otherwise it can fail to initialize. Signed-off-by: Milan Broz <gmazyland@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-15crypto: ccm - drop unnecessary minimum 32-bit alignmentArd Biesheuvel1-2/+1
The CCM driver forces 32-bit alignment even if the underlying ciphers don't care about alignment. This is because crypto_xor() used to require this, but since this is no longer the case, drop the hardcoded minimum of 32 bits. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-15crypto: ccm - honour alignmask of subordinate MAC cipherArd Biesheuvel1-8/+10
The CCM driver was recently updated to defer the MAC part of the algorithm to a dedicated crypto transform, and a template for instantiating such transforms was added at the same time. However, this new cbcmac template fails to take the alignmask of the encapsulated cipher into account, which may result in buffer addresses being passed down that are not sufficiently aligned. So update the code to ensure that the digest buffer in the desc ctx appears at a sufficiently aligned offset, and tweak the code so that all calls to crypto_cipher_encrypt_one() operate on this buffer exclusively. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-11crypto: algapi - make crypto_xor() and crypto_inc() alignment agnosticArd Biesheuvel7-32/+52
Instead of unconditionally forcing 4 byte alignment for all generic chaining modes that rely on crypto_xor() or crypto_inc() (which may result in unnecessary copying of data when the underlying hardware can perform unaligned accesses efficiently), make those functions deal with unaligned input explicitly, but only if the Kconfig symbol HAVE_EFFICIENT_UNALIGNED_ACCESS is set. This will allow us to drop the alignmasks from the CBC, CMAC, CTR, CTS, PCBC and SEQIV drivers. For crypto_inc(), this simply involves making the 4-byte stride conditional on HAVE_EFFICIENT_UNALIGNED_ACCESS being set, given that it typically operates on 16 byte buffers. For crypto_xor(), an algorithm is implemented that simply runs through the input using the largest strides possible if unaligned accesses are allowed. If they are not, an optimal sequence of memory accesses is emitted that takes the relative alignment of the input buffers into account, e.g., if the relative misalignment of dst and src is 4 bytes, the entire xor operation will be completed using 4 byte loads and stores (modulo unaligned bits at the start and end). Note that all expressions involving misalign are simply eliminated by the compiler when HAVE_EFFICIENT_UNALIGNED_ACCESS is defined. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-11crypto: improve gcc optimization flags for serpent and wp512Arnd Bergmann1-0/+2
An ancient gcc bug (first reported in 2003) has apparently resurfaced on MIPS, where kernelci.org reports an overly large stack frame in the whirlpool hash algorithm: crypto/wp512.c:987:1: warning: the frame size of 1112 bytes is larger than 1024 bytes [-Wframe-larger-than=] With some testing in different configurations, I'm seeing large variations in stack frames size up to 1500 bytes for what should have around 300 bytes at most. I also checked the reference implementation, which is essentially the same code but also comes with some test and benchmarking infrastructure. It seems that recent compiler versions on at least arm, arm64 and powerpc have a partial fix for this problem, but enabling "-fsched-pressure", but even with that fix they suffer from the issue to a certain degree. Some testing on arm64 shows that the time needed to hash a given amount of data is roughly proportional to the stack frame size here, which makes sense given that the wp512 implementation is doing lots of loads for table lookups, and the problem with the overly large stack is a result of doing a lot more loads and stores for spilled registers (as seen from inspecting the object code). Disabling -fschedule-insns consistently fixes the problem for wp512, in my collection of cross-compilers, the results are consistently better or identical when comparing the stack sizes in this function, though some architectures (notable x86) have schedule-insns disabled by default. The four columns are: default: -O2 press: -O2 -fsched-pressure nopress: -O2 -fschedule-insns -fno-sched-pressure nosched: -O2 -no-schedule-insns (disables sched-pressure) default press nopress nosched alpha-linux-gcc-4.9.3 1136 848 1136 176 am33_2.0-linux-gcc-4.9.3 2100 2076 2100 2104 arm-linux-gnueabi-gcc-4.9.3 848 848 1048 352 cris-linux-gcc-4.9.3 272 272 272 272 frv-linux-gcc-4.9.3 1128 1000 1128 280 hppa64-linux-gcc-4.9.3 1128 336 1128 184 hppa-linux-gcc-4.9.3 644 308 644 276 i386-linux-gcc-4.9.3 352 352 352 352 m32r-linux-gcc-4.9.3 720 656 720 268 microblaze-linux-gcc-4.9.3 1108 604 1108 256 mips64-linux-gcc-4.9.3 1328 592 1328 208 mips-linux-gcc-4.9.3 1096 624 1096 240 powerpc64-linux-gcc-4.9.3 1088 432 1088 160 powerpc-linux-gcc-4.9.3 1080 584 1080 224 s390-linux-gcc-4.9.3 456 456 624 360 sh3-linux-gcc-4.9.3 292 292 292 292 sparc64-linux-gcc-4.9.3 992 240 992 208 sparc-linux-gcc-4.9.3 680 592 680 312 x86_64-linux-gcc-4.9.3 224 240 272 224 xtensa-linux-gcc-4.9.3 1152 704 1152 304 aarch64-linux-gcc-7.0.0 224 224 1104 208 arm-linux-gnueabi-gcc-7.0.1 824 824 1048 352 mips-linux-gcc-7.0.0 1120 648 1120 272 x86_64-linux-gcc-7.0.1 240 240 304 240 arm-linux-gnueabi-gcc-4.4.7 840 392 arm-linux-gnueabi-gcc-4.5.4 784 728 784 320 arm-linux-gnueabi-gcc-4.6.4 736 728 736 304 arm-linux-gnueabi-gcc-4.7.4 944 784 944 352 arm-linux-gnueabi-gcc-4.8.5 464 464 760 352 arm-linux-gnueabi-gcc-4.9.3 848 848 1048 352 arm-linux-gnueabi-gcc-5.3.1 824 824 1064 336 arm-linux-gnueabi-gcc-6.1.1 808 808 1056 344 arm-linux-gnueabi-gcc-7.0.1 824 824 1048 352 Trying the same test for serpent-generic, the picture is a bit different, and while -fno-schedule-insns is generally better here than the default, -fsched-pressure wins overall, so I picked that instead. default press nopress nosched alpha-linux-gcc-4.9.3 1392 864 1392 960 am33_2.0-linux-gcc-4.9.3 536 524 536 528 arm-linux-gnueabi-gcc-4.9.3 552 552 776 536 cris-linux-gcc-4.9.3 528 528 528 528 frv-linux-gcc-4.9.3 536 400 536 504 hppa64-linux-gcc-4.9.3 524 208 524 480 hppa-linux-gcc-4.9.3 768 472 768 508 i386-linux-gcc-4.9.3 564 564 564 564 m32r-linux-gcc-4.9.3 712 576 712 532 microblaze-linux-gcc-4.9.3 724 392 724 512 mips64-linux-gcc-4.9.3 720 384 720 496 mips-linux-gcc-4.9.3 728 384 728 496 powerpc64-linux-gcc-4.9.3 704 304 704 480 powerpc-linux-gcc-4.9.3 704 296 704 480 s390-linux-gcc-4.9.3 560 560 592 536 sh3-linux-gcc-4.9.3 540 540 540 540 sparc64-linux-gcc-4.9.3 544 352 544 496 sparc-linux-gcc-4.9.3 544 344 544 496 x86_64-linux-gcc-4.9.3 528 536 576 528 xtensa-linux-gcc-4.9.3 752 544 752 544 aarch64-linux-gcc-7.0.0 432 432 656 480 arm-linux-gnueabi-gcc-7.0.1 616 616 808 536 mips-linux-gcc-7.0.0 720 464 720 488 x86_64-linux-gcc-7.0.1 536 528 600 536 arm-linux-gnueabi-gcc-4.4.7 592 440 arm-linux-gnueabi-gcc-4.5.4 776 448 776 544 arm-linux-gnueabi-gcc-4.6.4 776 448 776 544 arm-linux-gnueabi-gcc-4.7.4 768 448 768 544 arm-linux-gnueabi-gcc-4.8.5 488 488 776 544 arm-linux-gnueabi-gcc-4.9.3 552 552 776 536 arm-linux-gnueabi-gcc-5.3.1 552 552 776 536 arm-linux-gnueabi-gcc-6.1.1 560 560 776 536 arm-linux-gnueabi-gcc-7.0.1 616 616 808 536 I did not do any runtime tests with serpent, so it is possible that stack frame size does not directly correlate with runtime performance here and it actually makes things worse, but it's more likely to help here, and the reduced stack frame size is probably enough reason to apply the patch, especially given that the crypto code is often used in deep call chains. Link: https://kernelci.org/build/id/58797d7559b5149efdf6c3a9/logs/ Link: http://www.larc.usp.br/~pbarreto/WhirlpoolPage.html Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=11488 Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79149 Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-11crypto: ccm - switch to separate cbcmac driverArd Biesheuvel2-137/+245
Update the generic CCM driver to defer CBC-MAC processing to a dedicated CBC-MAC ahash transform rather than open coding this transform (and much of the associated scatterwalk plumbing) in the CCM driver itself. This cleans up the code considerably, but more importantly, it allows the use of alternative CBC-MAC implementations that don't suffer from performance degradation due to significant setup time (e.g., the NEON based AES code needs to enable/disable the NEON, and load the S-box into 16 SIMD registers, which cannot be amortized over the entire input when using the cipher interface) Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-11crypto: testmgr - add test cases for cbcmac(aes)Ard Biesheuvel2-0/+67
In preparation of splitting off the CBC-MAC transform in the CCM driver into a separate algorithm, define some test cases for the AES incarnation of cbcmac. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-11crypto: aes - add generic time invariant AES cipherArd Biesheuvel3-0/+393
Lookup table based AES is sensitive to timing attacks, which is due to the fact that such table lookups are data dependent, and the fact that 8 KB worth of tables covers a significant number of cachelines on any architecture, resulting in an exploitable correlation between the key and the processing time for known plaintexts. For network facing algorithms such as CTR, CCM or GCM, this presents a security risk, which is why arch specific AES ports are typically time invariant, either through the use of special instructions, or by using SIMD algorithms that don't rely on table lookups. For generic code, this is difficult to achieve without losing too much performance, but we can improve the situation significantly by switching to an implementation that only needs 256 bytes of table data (the actual S-box itself), which can be prefetched at the start of each block to eliminate data dependent latencies. This code encrypts at ~25 cycles per byte on ARM Cortex-A57 (while the ordinary generic AES driver manages 18 cycles per byte on this hardware). Decryption is substantially slower. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-11crypto: aes-generic - drop alignment requirementArd Biesheuvel1-32/+32
The generic AES code exposes a 32-bit align mask, which forces all users of the code to use temporary buffers or take other measures to ensure the alignment requirement is adhered to, even on architectures that don't care about alignment for software algorithms such as this one. So drop the align mask, and fix the code to use get_unaligned_le32() where appropriate, which will resolve to whatever is optimal for the architecture. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6Herbert Xu2-1/+2
Merge the crypto tree to pick up arm64 output IV patch.
2017-02-03crypto: algif_aead - Fix kernel panic on list_delHarsh Jain1-1/+1
Kernel panics when userspace program try to access AEAD interface. Remove node from Linked List before freeing its memory. Cc: <stable@vger.kernel.org> Signed-off-by: Harsh Jain <harsh@chelsio.com> Reviewed-by: Stephan Müller <smueller@chronox.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-23crypto: tcrypt - Add debug printsRabin Vincent1-0/+6
tcrypt is very tight-lipped when it succeeds, but a bit more feedback would be useful when developing or debugging crypto drivers, especially since even a successful run ends with the module failing to insert. Add a couple of debug prints, which can be enabled with dynamic debug: Before: # insmod tcrypt.ko mode=10 insmod: can't insert 'tcrypt.ko': Resource temporarily unavailable After: # insmod tcrypt.ko mode=10 dyndbg tcrypt: testing ecb(aes) tcrypt: testing cbc(aes) tcrypt: testing lrw(aes) tcrypt: testing xts(aes) tcrypt: testing ctr(aes) tcrypt: testing rfc3686(ctr(aes)) tcrypt: all tests passed insmod: can't insert 'tcrypt.ko': Resource temporarily unavailable Signed-off-by: Rabin Vincent <rabinv@axis.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-23crypto: api - Clear CRYPTO_ALG_DEAD bit before registering an algSalvatore Benedetto1-0/+1
Make sure CRYPTO_ALG_DEAD bit is cleared before proceeding with the algorithm registration. This fixes qat-dh registration when driver is restarted Cc: <stable@vger.kernel.org> Signed-off-by: Salvatore Benedetto <salvatore.benedetto@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-13crypto: testmgr - use calculated count for number of test vectorsArd Biesheuvel2-1101/+204
When working on AES in CCM mode for ARM, my code passed the internal tcrypt test before I had even bothered to implement the AES-192 and AES-256 code paths, which is strange because the tcrypt does contain AES-192 and AES-256 test vectors for CCM. As it turned out, the define AES_CCM_ENC_TEST_VECTORS was out of sync with the actual number of test vectors, causing only the AES-128 ones to be executed. So get rid of the defines, and wrap the test vector references in a macro that calculates the number of vectors automatically. The following test vector counts were out of sync with the respective defines: BF_CTR_ENC_TEST_VECTORS 2 -> 3 BF_CTR_DEC_TEST_VECTORS 2 -> 3 TF_CTR_ENC_TEST_VECTORS 2 -> 3 TF_CTR_DEC_TEST_VECTORS 2 -> 3 SERPENT_CTR_ENC_TEST_VECTORS 2 -> 3 SERPENT_CTR_DEC_TEST_VECTORS 2 -> 3 AES_CCM_ENC_TEST_VECTORS 8 -> 14 AES_CCM_DEC_TEST_VECTORS 7 -> 17 AES_CCM_4309_ENC_TEST_VECTORS 7 -> 23 AES_CCM_4309_DEC_TEST_VECTORS 10 -> 23 CAMELLIA_CTR_ENC_TEST_VECTORS 2 -> 3 CAMELLIA_CTR_DEC_TEST_VECTORS 2 -> 3 Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-12crypto: testmgr - Allocate only the required output size for hash testsAndrew Lutomirski1-4/+5
There are some hashes (e.g. sha224) that have some internal trickery to make sure that only the correct number of output bytes are generated. If something goes wrong, they could potentially overrun the output buffer. Make the test more robust by allocating only enough space for the correct output size so that memory debugging will catch the error if the output is overrun. Tested by intentionally breaking sha224 to output all 256 internally-generated bits while running on KASAN. Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-12crypto: Replaced gcc specific attributes with macros from compiler.hGideon Israel Dsouza13-20/+33
Continuing from this commit: 52f5684c8e1e ("kernel: use macros from compiler.h instead of __attribute__((...))") I submitted 4 total patches. They are part of task I've taken up to increase compiler portability in the kernel. I've cleaned up the subsystems under /kernel /mm /block and /security, this patch targets /crypto. There is <linux/compiler.h> which provides macros for various gcc specific constructs. Eg: __weak for __attribute__((weak)). I've cleaned all instances of gcc specific attributes with the right macros for the crypto subsystem. I had to make one additional change into compiler-gcc.h for the case when one wants to use this: __attribute__((aligned) and not specify an alignment factor. From the gcc docs, this will result in the largest alignment for that data type on the target machine so I've named the macro __aligned_largest. Please advise if another name is more appropriate. Signed-off-by: Gideon Israel Dsouza <gidisrael@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-12crypto: testmgr - use kmemdup instead of kmalloc+memcpyEric Biggers1-4/+2
It's recommended to use kmemdup instead of kmalloc followed by memcpy. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-12-30crypto: skcipher - introduce walksize attribute for SIMD algosArd Biesheuvel1-8/+12
In some cases, SIMD algorithms can only perform optimally when allowed to operate on multiple input blocks in parallel. This is especially true for bit slicing algorithms, which typically take the same amount of time processing a single block or 8 blocks in parallel. However, other SIMD algorithms may benefit as well from bigger strides. So add a walksize attribute to the skcipher algorithm definition, and wire it up to the skcipher walk API. To avoid confusion between the skcipher and AEAD attributes, rename the skcipher_walk chunksize attribute to 'stride', and set it from the walksize (in the skcipher case) or from the chunksize (in the AEAD case). Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>