<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/lib/crypto/x86, branch master</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=master</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=master'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2026-03-24T00:50:59+00:00</updated>
<entry>
<title>lib/crypto: x86/sm3: Migrate optimized code into library</title>
<updated>2026-03-24T00:50:59+00:00</updated>
<author>
<name>Eric Biggers</name>
<email>ebiggers@kernel.org</email>
</author>
<published>2026-03-21T04:09:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=17ba6108d3e084652807826cc49c851c00976f1a'/>
<id>urn:sha1:17ba6108d3e084652807826cc49c851c00976f1a</id>
<content type='text'>
Instead of exposing the x86-optimized SM3 code via an x86-specific
crypto_shash algorithm, just implement the sm3_blocks() library
function.  This is much simpler, it makes the SM3 library functions
x86-optimized, and it fixes the longstanding issue where the
x86-optimized SM3 code was disabled by default.  SM3 still remains
available through crypto_shash, but individual architectures no longer
need to handle it.

Tweak the prototype of sm3_transform_avx() to match what the library
expects, including changing the block count to size_t.  Note that the
assembly code actually already treated this argument as size_t.

Acked-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Link: https://lore.kernel.org/r/20260321040935.410034-10-ebiggers@kernel.org
Signed-off-by: Eric Biggers &lt;ebiggers@kernel.org&gt;
</content>
</entry>
<entry>
<title>lib/crypto: x86/ghash: Migrate optimized code into library</title>
<updated>2026-03-23T23:44:29+00:00</updated>
<author>
<name>Eric Biggers</name>
<email>ebiggers@kernel.org</email>
</author>
<published>2026-03-19T06:17:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3e79c8ec49596288c4460029c4971b9c838103b9'/>
<id>urn:sha1:3e79c8ec49596288c4460029c4971b9c838103b9</id>
<content type='text'>
Remove the "ghash-pclmulqdqni" crypto_shash algorithm.  Move the
corresponding assembly code into lib/crypto/, and wire it up to the
GHASH library.

This makes the GHASH library optimized with x86's carryless
multiplication instructions.  It also greatly reduces the amount of
x86-specific glue code needed, and it fixes the issue where this
GHASH optimization was disabled by default.

Rename and adjust the prototypes of the assembly functions to make them
fit better with the library.  Remove the byte-swaps (pshufb
instructions) that are no longer necessary because the library keeps the
accumulator in POLYVAL format rather than GHASH format.

Rename clmul_ghash_mul() to polyval_mul_pclmul() to reflect that it
really does a POLYVAL style multiplication.  Wire it up to both
ghash_mul_arch() and polyval_mul_arch().

Acked-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Link: https://lore.kernel.org/r/20260319061723.1140720-15-ebiggers@kernel.org
Signed-off-by: Eric Biggers &lt;ebiggers@kernel.org&gt;
</content>
</entry>
<entry>
<title>lib/crypto: gf128hash: Support GF128HASH_ARCH without all POLYVAL functions</title>
<updated>2026-03-23T20:15:13+00:00</updated>
<author>
<name>Eric Biggers</name>
<email>ebiggers@kernel.org</email>
</author>
<published>2026-03-19T06:17:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b3b6e8f9b38911e9b30a5abe845541ade0797327'/>
<id>urn:sha1:b3b6e8f9b38911e9b30a5abe845541ade0797327</id>
<content type='text'>
Currently, some architectures (arm64 and x86) have optimized code for
both GHASH and POLYVAL.  Others (arm, powerpc, riscv, and s390) have
optimized code only for GHASH.  While POLYVAL support could be
implemented on these other architectures, until then we need to support
the case where arch-optimized functions are present only for GHASH.

Therefore, update the support for arch-optimized POLYVAL functions to
allow architectures to opt into supporting these functions individually.

The new meaning of CONFIG_CRYPTO_LIB_GF128HASH_ARCH is that some level
of GHASH and/or POLYVAL acceleration is provided.

Also provide an implementation of polyval_mul() based on
polyval_blocks_arch(), for when polyval_mul_arch() isn't implemented.

Acked-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Link: https://lore.kernel.org/r/20260319061723.1140720-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers &lt;ebiggers@kernel.org&gt;
</content>
</entry>
<entry>
<title>lib/crypto: gf128hash: Rename polyval module to gf128hash</title>
<updated>2026-03-23T20:15:13+00:00</updated>
<author>
<name>Eric Biggers</name>
<email>ebiggers@kernel.org</email>
</author>
<published>2026-03-19T06:17:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=61f66c5216a961784b12307be60a25204525605c'/>
<id>urn:sha1:61f66c5216a961784b12307be60a25204525605c</id>
<content type='text'>
Currently, the standalone GHASH code is coupled with crypto_shash.  This
has resulted in unnecessary complexity and overhead, as well as the code
being unavailable to library code such as the AES-GCM library.  As was
done with POLYVAL, it needs to find a new home in lib/crypto/.

GHASH and POLYVAL are closely related and can each be implemented in
terms of each other.  Optimized code for one can be reused with the
other.  Moreover, since GHASH tends to be difficult to implement directly
due to its unnatural bit order, most modern GHASH implementations
(including the existing arm, arm64, powerpc, and x86 optimized GHASH
code, and the new generic GHASH code I'll be adding) actually
reinterpret the GHASH computation as an equivalent POLYVAL computation,
pre- and post-processing the inputs and outputs to map to/from POLYVAL.

Given this close relationship, it makes sense to group the GHASH and
POLYVAL code together in the same module.  This gives us a wide range of
options for implementing them, reusing code between the two and properly
utilizing whatever instructions each architecture provides.

Thus, GHASH support will be added to the library module that is
currently called "polyval".  Rename it to an appropriate name:
"gf128hash".  Rename files, options, functions, etc. where appropriate
to reflect the upcoming sharing with GHASH.  (Note: polyval_kunit is not
renamed, as ghash_kunit will be added alongside it instead.)

Acked-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Link: https://lore.kernel.org/r/20260319061723.1140720-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers &lt;ebiggers@kernel.org&gt;
</content>
</entry>
<entry>
<title>lib/crypto: x86/sha256: PHE Extensions optimized SHA256 transform function</title>
<updated>2026-03-14T18:44:18+00:00</updated>
<author>
<name>AlanSong-oc</name>
<email>AlanSong-oc@zhaoxin.com</email>
</author>
<published>2026-03-13T08:01:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=44b02a14d993d91ae36409a54941ac5a5ad20b44'/>
<id>urn:sha1:44b02a14d993d91ae36409a54941ac5a5ad20b44</id>
<content type='text'>
Zhaoxin CPUs implement SHA (Secure Hash Algorithm) as CPU instructions
via the PHE (Padlock Hash Engine) extensions, which provide the XSHA1,
XSHA256, XSHA384, and XSHA512 instructions. The instruction
specification is available at the following link.
(https://gitee.com/openzhaoxin/zhaoxin_specifications/blob/20260227/ZX_Padlock_Reference.pdf)

With SHA implemented in hardware instead of software, applications can
achieve higher performance, better security, and more flexibility.

This patch includes the XSHA256 instruction optimized implementation of
SHA-256 transform function.

The table below shows the benchmark results before and after applying
this patch by using CRYPTO_LIB_BENCHMARK on Zhaoxin KX-7000 platform,
highlighting the achieved speedups.

+---------+--------------------------+
|         |          SHA256          |
+---------+--------+-----------------+
|   Len   | Before |      After      |
+---------+--------+-----------------+
|      1* |    2   |    7 (3.50x)    |
|     16  |   35   |  119 (3.40x)    |
|     64  |   74   |  280 (3.78x)    |
|    127  |   99   |  387 (3.91x)    |
|    128  |  103   |  427 (4.15x)    |
|    200  |  123   |  537 (4.37x)    |
|    256  |  128   |  582 (4.55x)    |
|    511  |  144   |  679 (4.72x)    |
|    512  |  146   |  714 (4.89x)    |
|   1024  |  157   |  796 (5.07x)    |
|   3173  |  167   |  883 (5.28x)    |
|   4096  |  166   |  876 (5.28x)    |
|  16384  |  169   |  899 (5.32x)    |
+---------+--------+-----------------+
*: The length of each data block to be processed by one complete SHA
   sequence.
**: The throughput of processing the data blocks, in MB/s.

After applying this patch, the SHA256 KUnit test suite passes on Zhaoxin
platforms. Detailed test logs are shown below.

[    7.767257]     # Subtest: sha256
[    7.770542]     # module: sha256_kunit
[    7.770544]     1..15
[    7.777383]     ok 1 test_hash_test_vectors
[    7.788563]     ok 2 test_hash_all_lens_up_to_4096
[    7.806090]     ok 3 test_hash_incremental_updates
[    7.813553]     ok 4 test_hash_buffer_overruns
[    7.822384]     ok 5 test_hash_overlaps
[    7.829388]     ok 6 test_hash_alignment_consistency
[    7.833843]     ok 7 test_hash_ctx_zeroization
[    7.915191]     ok 8 test_hash_interrupt_context_1
[    8.362312]     ok 9 test_hash_interrupt_context_2
[    8.401607]     ok 10 test_hmac
[    8.415458]     ok 11 test_sha256_finup_2x
[    8.419397]     ok 12 test_sha256_finup_2x_defaultctx
[    8.424107]     ok 13 test_sha256_finup_2x_hugelen
[    8.451289]     # benchmark_hash: len=1: 7 MB/s
[    8.465372]     # benchmark_hash: len=16: 119 MB/s
[    8.481760]     # benchmark_hash: len=64: 280 MB/s
[    8.499344]     # benchmark_hash: len=127: 387 MB/s
[    8.515800]     # benchmark_hash: len=128: 427 MB/s
[    8.531970]     # benchmark_hash: len=200: 537 MB/s
[    8.548241]     # benchmark_hash: len=256: 582 MB/s
[    8.564838]     # benchmark_hash: len=511: 679 MB/s
[    8.580872]     # benchmark_hash: len=512: 714 MB/s
[    8.596858]     # benchmark_hash: len=1024: 796 MB/s
[    8.612567]     # benchmark_hash: len=3173: 883 MB/s
[    8.628546]     # benchmark_hash: len=4096: 876 MB/s
[    8.644482]     # benchmark_hash: len=16384: 899 MB/s
[    8.649773]     ok 14 benchmark_hash
[    8.655505]     ok 15 benchmark_sha256_finup_2x # SKIP not relevant
[    8.659065] # sha256: pass:14 fail:0 skip:1 total:15
[    8.665276] # Totals: pass:14 fail:0 skip:1 total:15
[    8.670195] ok 7 sha256

Signed-off-by: AlanSong-oc &lt;AlanSong-oc@zhaoxin.com&gt;
Link: https://lore.kernel.org/r/20260313080150.9393-3-AlanSong-oc@zhaoxin.com
Signed-off-by: Eric Biggers &lt;ebiggers@kernel.org&gt;
</content>
</entry>
<entry>
<title>lib/crypto: x86/aes: Add AES-NI optimization</title>
<updated>2026-01-15T22:09:07+00:00</updated>
<author>
<name>Eric Biggers</name>
<email>ebiggers@kernel.org</email>
</author>
<published>2026-01-12T19:20:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=24eb22d8161380eba65edc5b499299639cbe8bf9'/>
<id>urn:sha1:24eb22d8161380eba65edc5b499299639cbe8bf9</id>
<content type='text'>
Optimize the AES library with x86 AES-NI instructions.

The relevant existing assembly functions, aesni_set_key(), aesni_enc(),
and aesni_dec(), are a bit difficult to extract into the library:

- They're coupled to the code for the AES modes.
- They operate on struct crypto_aes_ctx.  The AES library now uses
  different structs.
- They assume the key is 16-byte aligned.  The AES library only
  *prefers* 16-byte alignment; it doesn't require it.

Moreover, they're not all that great in the first place:

- They use unrolled loops, which isn't a great choice on x86.
- They use the 'aeskeygenassist' instruction, which is unnecessary, is
  slow on Intel CPUs, and forces the loop to be unrolled.
- They have special code for AES-192 key expansion, despite that being
  kind of useless.  AES-128 and AES-256 are the ones used in practice.

These are small functions anyway.

Therefore, I opted to just write replacements of these functions for the
library.  They address all the above issues.

Acked-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Link: https://lore.kernel.org/r/20260112192035.10427-18-ebiggers@kernel.org
Signed-off-by: Eric Biggers &lt;ebiggers@kernel.org&gt;
</content>
</entry>
<entry>
<title>lib/crypto: x86/nh: Migrate optimized code into library</title>
<updated>2026-01-12T19:07:50+00:00</updated>
<author>
<name>Eric Biggers</name>
<email>ebiggers@kernel.org</email>
</author>
<published>2025-12-11T01:18:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a229d83235c7627c490deb7dd4744a72567cea12'/>
<id>urn:sha1:a229d83235c7627c490deb7dd4744a72567cea12</id>
<content type='text'>
Migrate the x86_64 implementations of NH into lib/crypto/.  This makes
the nh() function optimized on x86_64 kernels.

Note: this temporarily makes the adiantum template not utilize the
x86_64 optimized NH code.  This is resolved in a later commit that
converts the adiantum template to use nh() instead of "nhpoly1305".

Link: https://lore.kernel.org/r/20251211011846.8179-6-ebiggers@kernel.org
Signed-off-by: Eric Biggers &lt;ebiggers@kernel.org&gt;
</content>
</entry>
<entry>
<title>lib/crypto: x86/polyval: Migrate optimized code into library</title>
<updated>2025-11-11T19:03:38+00:00</updated>
<author>
<name>Eric Biggers</name>
<email>ebiggers@kernel.org</email>
</author>
<published>2025-11-09T23:47:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4d8da35579daad0392d238460ed7e9629d49ca35'/>
<id>urn:sha1:4d8da35579daad0392d238460ed7e9629d49ca35</id>
<content type='text'>
Migrate the x86_64 implementation of POLYVAL into lib/crypto/, wiring it
up to the POLYVAL library interface.  This makes the POLYVAL library
properly optimized on x86_64.

This drops the x86_64 optimizations of polyval in the crypto_shash API.
That's fine, since polyval will be removed from crypto_shash entirely
since it is unneeded there.  But even if it comes back, the crypto_shash
API could just be implemented on top of the library API, as usual.

Adjust the names and prototypes of the assembly functions to align more
closely with the rest of the library code.

Also replace a movaps instruction with movups to remove the assumption
that the key struct is 16-byte aligned.  Users can still align the key
if they want (and at least in this case, movups is just as fast as
movaps), but it's inconvenient to require it.

Reviewed-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Link: https://lore.kernel.org/r/20251109234726.638437-6-ebiggers@kernel.org
Signed-off-by: Eric Biggers &lt;ebiggers@kernel.org&gt;
</content>
</entry>
<entry>
<title>lib/crypto: x86/blake2s: Use vpternlogd for 3-input XORs</title>
<updated>2025-11-06T04:30:52+00:00</updated>
<author>
<name>Eric Biggers</name>
<email>ebiggers@kernel.org</email>
</author>
<published>2025-11-02T23:42:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8ba60c5914f25a44f10189c6919a737b199f6dbf'/>
<id>urn:sha1:8ba60c5914f25a44f10189c6919a737b199f6dbf</id>
<content type='text'>
AVX-512 supports 3-input XORs via the vpternlogd (or vpternlogq)
instruction with immediate 0x96.  This approach, vs. the alternative of
two vpxor instructions, is already used in the CRC, AES-GCM, and AES-XTS
code, since it reduces the instruction count and is faster on some CPUs.
Make blake2s_compress_avx512() take advantage of it too.

Reviewed-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Link: https://lore.kernel.org/r/20251102234209.62133-7-ebiggers@kernel.org
Signed-off-by: Eric Biggers &lt;ebiggers@kernel.org&gt;
</content>
</entry>
<entry>
<title>lib/crypto: x86/blake2s: Avoid writing back unchanged 'f' value</title>
<updated>2025-11-06T04:30:52+00:00</updated>
<author>
<name>Eric Biggers</name>
<email>ebiggers@kernel.org</email>
</author>
<published>2025-11-02T23:42:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=cd5528621abb01664a477392cd3e76be2ef6296b'/>
<id>urn:sha1:cd5528621abb01664a477392cd3e76be2ef6296b</id>
<content type='text'>
Just before returning, blake2s_compress_ssse3() and
blake2s_compress_avx512() store updated values to the 'h', 't', and 'f'
fields of struct blake2s_ctx.  But 'f' is always unchanged (which is
correct; only the C code changes it).  So, there's no need to write to
'f'.  Use 64-bit stores (movq and vmovq) instead of 128-bit stores
(movdqu and vmovdqu) so that only 't' is written.

Reviewed-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Link: https://lore.kernel.org/r/20251102234209.62133-6-ebiggers@kernel.org
Signed-off-by: Eric Biggers &lt;ebiggers@kernel.org&gt;
</content>
</entry>
</feed>
