<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/lib/crypto/x86, branch v7.0-rc7</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v7.0-rc7</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v7.0-rc7'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2026-01-15T22:09:07+00:00</updated>
<entry>
<title>lib/crypto: x86/aes: Add AES-NI optimization</title>
<updated>2026-01-15T22:09:07+00:00</updated>
<author>
<name>Eric Biggers</name>
<email>ebiggers@kernel.org</email>
</author>
<published>2026-01-12T19:20:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=24eb22d8161380eba65edc5b499299639cbe8bf9'/>
<id>urn:sha1:24eb22d8161380eba65edc5b499299639cbe8bf9</id>
<content type='text'>
Optimize the AES library with x86 AES-NI instructions.

The relevant existing assembly functions, aesni_set_key(), aesni_enc(),
and aesni_dec(), are a bit difficult to extract into the library:

- They're coupled to the code for the AES modes.
- They operate on struct crypto_aes_ctx.  The AES library now uses
  different structs.
- They assume the key is 16-byte aligned.  The AES library only
  *prefers* 16-byte alignment; it doesn't require it.

Moreover, they're not all that great in the first place:

- They use unrolled loops, which isn't a great choice on x86.
- They use the 'aeskeygenassist' instruction, which is unnecessary, is
  slow on Intel CPUs, and forces the loop to be unrolled.
- They have special code for AES-192 key expansion, despite that being
  kind of useless.  AES-128 and AES-256 are the ones used in practice.

These are small functions anyway.

Therefore, I opted to just write replacements of these functions for the
library.  They address all the above issues.

Acked-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Link: https://lore.kernel.org/r/20260112192035.10427-18-ebiggers@kernel.org
Signed-off-by: Eric Biggers &lt;ebiggers@kernel.org&gt;
</content>
</entry>
<entry>
<title>lib/crypto: x86/nh: Migrate optimized code into library</title>
<updated>2026-01-12T19:07:50+00:00</updated>
<author>
<name>Eric Biggers</name>
<email>ebiggers@kernel.org</email>
</author>
<published>2025-12-11T01:18:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a229d83235c7627c490deb7dd4744a72567cea12'/>
<id>urn:sha1:a229d83235c7627c490deb7dd4744a72567cea12</id>
<content type='text'>
Migrate the x86_64 implementations of NH into lib/crypto/.  This makes
the nh() function be optimized on x86_64 kernels.

Note: this temporarily makes the adiantum template not utilize the
x86_64 optimized NH code.  This is resolved in a later commit that
converts the adiantum template to use nh() instead of "nhpoly1305".

Link: https://lore.kernel.org/r/20251211011846.8179-6-ebiggers@kernel.org
Signed-off-by: Eric Biggers &lt;ebiggers@kernel.org&gt;
</content>
</entry>
<entry>
<title>lib/crypto: x86/polyval: Migrate optimized code into library</title>
<updated>2025-11-11T19:03:38+00:00</updated>
<author>
<name>Eric Biggers</name>
<email>ebiggers@kernel.org</email>
</author>
<published>2025-11-09T23:47:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4d8da35579daad0392d238460ed7e9629d49ca35'/>
<id>urn:sha1:4d8da35579daad0392d238460ed7e9629d49ca35</id>
<content type='text'>
Migrate the x86_64 implementation of POLYVAL into lib/crypto/, wiring it
up to the POLYVAL library interface.  This makes the POLYVAL library be
properly optimized on x86_64.

This drops the x86_64 optimizations of polyval in the crypto_shash API.
That's fine, since polyval will be removed from crypto_shash entirely
since it is unneeded there.  But even if it comes back, the crypto_shash
API could just be implemented on top of the library API, as usual.

Adjust the names and prototypes of the assembly functions to align more
closely with the rest of the library code.

Also replace a movaps instruction with movups to remove the assumption
that the key struct is 16-byte aligned.  Users can still align the key
if they want (and at least in this case, movups is just as fast as
movaps), but it's inconvenient to require it.

Reviewed-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Link: https://lore.kernel.org/r/20251109234726.638437-6-ebiggers@kernel.org
Signed-off-by: Eric Biggers &lt;ebiggers@kernel.org&gt;
</content>
</entry>
<entry>
<title>lib/crypto: x86/blake2s: Use vpternlogd for 3-input XORs</title>
<updated>2025-11-06T04:30:52+00:00</updated>
<author>
<name>Eric Biggers</name>
<email>ebiggers@kernel.org</email>
</author>
<published>2025-11-02T23:42:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8ba60c5914f25a44f10189c6919a737b199f6dbf'/>
<id>urn:sha1:8ba60c5914f25a44f10189c6919a737b199f6dbf</id>
<content type='text'>
AVX-512 supports 3-input XORs via the vpternlogd (or vpternlogq)
instruction with immediate 0x96.  This approach, vs. the alternative of
two vpxor instructions, is already used in the CRC, AES-GCM, and AES-XTS
code, since it reduces the instruction count and is faster on some CPUs.
Make blake2s_compress_avx512() take advantage of it too.

Reviewed-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Link: https://lore.kernel.org/r/20251102234209.62133-7-ebiggers@kernel.org
Signed-off-by: Eric Biggers &lt;ebiggers@kernel.org&gt;
</content>
</entry>
<entry>
<title>lib/crypto: x86/blake2s: Avoid writing back unchanged 'f' value</title>
<updated>2025-11-06T04:30:52+00:00</updated>
<author>
<name>Eric Biggers</name>
<email>ebiggers@kernel.org</email>
</author>
<published>2025-11-02T23:42:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=cd5528621abb01664a477392cd3e76be2ef6296b'/>
<id>urn:sha1:cd5528621abb01664a477392cd3e76be2ef6296b</id>
<content type='text'>
Just before returning, blake2s_compress_ssse3() and
blake2s_compress_avx512() store updated values to the 'h', 't', and 'f'
fields of struct blake2s_ctx.  But 'f' is always unchanged (which is
correct; only the C code changes it).  So, there's no need to write to
'f'.  Use 64-bit stores (movq and vmovq) instead of 128-bit stores
(movdqu and vmovdqu) so that only 't' is written.

Reviewed-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Link: https://lore.kernel.org/r/20251102234209.62133-6-ebiggers@kernel.org
Signed-off-by: Eric Biggers &lt;ebiggers@kernel.org&gt;
</content>
</entry>
<entry>
<title>lib/crypto: x86/blake2s: Improve readability</title>
<updated>2025-11-06T04:30:52+00:00</updated>
<author>
<name>Eric Biggers</name>
<email>ebiggers@kernel.org</email>
</author>
<published>2025-11-02T23:42:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a7acd77ebd7f17b07a6ab2ca1dd1e4d487bdfa80'/>
<id>urn:sha1:a7acd77ebd7f17b07a6ab2ca1dd1e4d487bdfa80</id>
<content type='text'>
Various cleanups for readability.  No change to the generated code:

- Add some comments
- Add #defines for arguments
- Rename some labels
- Use decimal constants instead of hex where it makes sense.
  (The pshufd immediates intentionally remain as hex.)
- Add blank lines when there's a logical break

The round loop still could use some work, but this is at least a start.

Reviewed-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Link: https://lore.kernel.org/r/20251102234209.62133-5-ebiggers@kernel.org
Signed-off-by: Eric Biggers &lt;ebiggers@kernel.org&gt;
</content>
</entry>
<entry>
<title>lib/crypto: x86/blake2s: Use local labels for data</title>
<updated>2025-11-06T04:30:52+00:00</updated>
<author>
<name>Eric Biggers</name>
<email>ebiggers@kernel.org</email>
</author>
<published>2025-11-02T23:42:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=83c1a867c9999b3bb83e3da8f22eec9ec77a523c'/>
<id>urn:sha1:83c1a867c9999b3bb83e3da8f22eec9ec77a523c</id>
<content type='text'>
Following the usual practice, prefix the names of the data labels with
".L" so that the assembler treats them as truly local.  This more
clearly expresses the intent and is less error-prone.

Reviewed-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Link: https://lore.kernel.org/r/20251102234209.62133-4-ebiggers@kernel.org
Signed-off-by: Eric Biggers &lt;ebiggers@kernel.org&gt;
</content>
</entry>
<entry>
<title>lib/crypto: x86/blake2s: Drop check for nblocks == 0</title>
<updated>2025-11-06T04:30:52+00:00</updated>
<author>
<name>Eric Biggers</name>
<email>ebiggers@kernel.org</email>
</author>
<published>2025-11-02T23:42:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c19bdf24cc274e96006267173d664df2ef4b13db'/>
<id>urn:sha1:c19bdf24cc274e96006267173d664df2ef4b13db</id>
<content type='text'>
Since blake2s_compress() is always passed nblocks != 0, remove the
unnecessary check for nblocks == 0 from blake2s_compress_ssse3().

Note that this makes it consistent with blake2s_compress_avx512() in the
same file as well as the arm32 blake2s_compress().

Reviewed-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Link: https://lore.kernel.org/r/20251102234209.62133-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers &lt;ebiggers@kernel.org&gt;
</content>
</entry>
<entry>
<title>lib/crypto: x86/blake2s: Fix 32-bit arg treated as 64-bit</title>
<updated>2025-11-06T04:30:51+00:00</updated>
<author>
<name>Eric Biggers</name>
<email>ebiggers@kernel.org</email>
</author>
<published>2025-11-02T23:42:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2f22115709fc7ebcfa40af3367a508fbbd2f71e9'/>
<id>urn:sha1:2f22115709fc7ebcfa40af3367a508fbbd2f71e9</id>
<content type='text'>
In the C code, the 'inc' argument to the assembly functions
blake2s_compress_ssse3() and blake2s_compress_avx512() is declared with
type u32, matching blake2s_compress().  The assembly code then reads it
from the 64-bit %rcx.  However, the ABI doesn't guarantee zero-extension
to 64 bits, nor do gcc or clang guarantee it.  Therefore, fix these
functions to read this argument from the 32-bit %ecx.

In theory, this bug could have caused the wrong 'inc' value to be used,
causing incorrect BLAKE2s hashes.  In practice, probably not: I've fixed
essentially this same bug in many other assembly files too, but there's
never been a real report of it having caused a problem.  In x86_64, all
writes to 32-bit registers are zero-extended to 64 bits.  That results
in zero-extension in nearly all situations.  I've only been able to
demonstrate a lack of zero-extension with a somewhat contrived example
involving truncation, e.g. when the C code has a u64 variable holding
0x1234567800000040 and passes it as a u32 expecting it to be truncated
to 0x40 (64).  But that's not what the real code does, of course.

Fixes: ed0356eda153 ("crypto: blake2s - x86_64 SIMD implementation")
Cc: stable@vger.kernel.org
Reviewed-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Link: https://lore.kernel.org/r/20251102234209.62133-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers &lt;ebiggers@kernel.org&gt;
</content>
</entry>
<entry>
<title>lib/crypto: blake2s: Drop excessive const &amp; rename block =&gt; data</title>
<updated>2025-10-30T05:04:24+00:00</updated>
<author>
<name>Eric Biggers</name>
<email>ebiggers@kernel.org</email>
</author>
<published>2025-10-18T04:30:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5385bcbffe5a76a74d6bb135af1c88fb235f8134'/>
<id>urn:sha1:5385bcbffe5a76a74d6bb135af1c88fb235f8134</id>
<content type='text'>
A couple more small cleanups to the BLAKE2s code before these things get
propagated into the BLAKE2b code:

- Drop 'const' from some non-pointer function parameters.  It was a bit
  excessive and not conventional.

- Rename 'block' argument of blake2s_compress*() to 'data'.  This is for
  consistency with the SHA-* code, and also to avoid the implication
  that it points to a singular "block".

No functional changes.

Reviewed-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Link: https://lore.kernel.org/r/20251018043106.375964-4-ebiggers@kernel.org
Signed-off-by: Eric Biggers &lt;ebiggers@kernel.org&gt;
</content>
</entry>
</feed>
