<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/lib/crypto/x86, branch v6.18.21</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.18.21</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.18.21'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2026-01-02T11:57:07+00:00</updated>
<entry>
<title>lib/crypto: x86/blake2s: Fix 32-bit arg treated as 64-bit</title>
<updated>2026-01-02T11:57:07+00:00</updated>
<author>
<name>Eric Biggers</name>
<email>ebiggers@kernel.org</email>
</author>
<published>2025-11-02T23:42:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3b95fdb75d74c9bc4aee5868f7b3499af00c1632'/>
<id>urn:sha1:3b95fdb75d74c9bc4aee5868f7b3499af00c1632</id>
<content type='text'>
commit 2f22115709fc7ebcfa40af3367a508fbbd2f71e9 upstream.

In the C code, the 'inc' argument to the assembly functions
blake2s_compress_ssse3() and blake2s_compress_avx512() is declared with
type u32, matching blake2s_compress().  The assembly code then reads it
from the 64-bit %rcx.  However, the ABI doesn't guarantee zero-extension
to 64 bits, nor do gcc or clang guarantee it.  Therefore, fix these
functions to read this argument from the 32-bit %ecx.

In theory, this bug could have caused the wrong 'inc' value to be used,
causing incorrect BLAKE2s hashes.  In practice, probably not: I've fixed
essentially this same bug in many other assembly files too, but there's
never been a real report of it having caused a problem.  In x86_64, all
writes to 32-bit registers are zero-extended to 64 bits.  That results
in zero-extension in nearly all situations.  I've only been able to
demonstrate a lack of zero-extension with a somewhat contrived example
involving truncation, e.g. when the C code has a u64 variable holding
0x1234567800000040 and passes it as a u32 expecting it to be truncated
to 0x40 (64).  But that's not what the real code does, of course.

Fixes: ed0356eda153 ("crypto: blake2s - x86_64 SIMD implementation")
Cc: stable@vger.kernel.org
Reviewed-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Link: https://lore.kernel.org/r/20251102234209.62133-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers &lt;ebiggers@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux</title>
<updated>2025-09-29T22:55:20+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2025-09-29T22:55:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1896ce8eb6c61824f6c1125d69d8fda1f44a22f8'/>
<id>urn:sha1:1896ce8eb6c61824f6c1125d69d8fda1f44a22f8</id>
<content type='text'>
Pull interleaved SHA-256 hashing support from Eric Biggers:
 "Optimize fsverity with 2-way interleaved hashing

  Add support for 2-way interleaved SHA-256 hashing to lib/crypto/, and
  make fsverity use it for faster file data verification. This improves
  fsverity performance on many x86_64 and arm64 processors.

  Later, I plan to make dm-verity use this too"

* tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux:
  fsverity: Use 2-way interleaved SHA-256 hashing when supported
  fsverity: Remove inode parameter from fsverity_hash_block()
  lib/crypto: tests: Add tests and benchmark for sha256_finup_2x()
  lib/crypto: x86/sha256: Add support for 2-way interleaved hashing
  lib/crypto: arm64/sha256: Add support for 2-way interleaved hashing
  lib/crypto: sha256: Add support for 2-way interleaved hashing
</content>
</entry>
<entry>
<title>lib/crypto: x86/sha256: Add support for 2-way interleaved hashing</title>
<updated>2025-09-17T18:09:40+00:00</updated>
<author>
<name>Eric Biggers</name>
<email>ebiggers@kernel.org</email>
</author>
<published>2025-09-15T16:08:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=bc6d6a4172a7fa599fe66f16e9a2c1cdc6983207'/>
<id>urn:sha1:bc6d6a4172a7fa599fe66f16e9a2c1cdc6983207</id>
<content type='text'>
Add an implementation of sha256_finup_2x_arch() for x86_64.  It
interleaves the computation of two SHA-256 hashes using the x86 SHA-NI
instructions.  dm-verity and fs-verity will take advantage of this for
greatly improved performance on capable CPUs.

This increases the throughput of SHA-256 hashing 4096-byte messages by
the following amounts on the following CPUs:

    Intel Ice Lake (server):        4%
    Intel Sapphire Rapids:          38%
    Intel Emerald Rapids:           38%
    AMD Zen 1 (Threadripper 1950X): 84%
    AMD Zen 4 (EPYC 9B14):          98%
    AMD Zen 5 (Ryzen 9 9950X):      64%

For now, this seems to benefit AMD more than Intel.  This seems to be
because current AMD CPUs support concurrent execution of the SHA-NI
instructions, but unfortunately current Intel CPUs don't, except for the
sha256msg2 instruction.  Hopefully future Intel CPUs will support SHA-NI
on more execution ports.  Zen 1 supports 2 concurrent sha256rnds2, and
Zen 4 supports 4 concurrent sha256rnds2, which suggests that even better
performance may be achievable on Zen 4 by interleaving more than two
hashes.  However, doing so poses a number of trade-offs, and furthermore
Zen 5 goes back to supporting "only" 2 concurrent sha256rnds2.

Reviewed-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Link: https://lore.kernel.org/r/20250915160819.140019-4-ebiggers@kernel.org
Signed-off-by: Eric Biggers &lt;ebiggers@kernel.org&gt;
</content>
</entry>
<entry>
<title>lib/crypto: curve25519: Consolidate into single module</title>
<updated>2025-09-06T23:32:43+00:00</updated>
<author>
<name>Eric Biggers</name>
<email>ebiggers@kernel.org</email>
</author>
<published>2025-09-06T21:35:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=68546e5632c0b982663af575ae12cc5d81facc91'/>
<id>urn:sha1:68546e5632c0b982663af575ae12cc5d81facc91</id>
<content type='text'>
Reorganize the Curve25519 library code:

- Build a single libcurve25519 module, instead of up to three modules:
  libcurve25519, libcurve25519-generic, and an arch-specific module.

- Move the arch-specific Curve25519 code from arch/$(SRCARCH)/crypto/ to
  lib/crypto/$(SRCARCH)/.  Centralize the build rules into
  lib/crypto/Makefile and lib/crypto/Kconfig.

- Include the arch-specific code directly in lib/crypto/curve25519.c via
  a header, rather than using a separate .c file.

- Eliminate the entanglement with CRYPTO.  CRYPTO_LIB_CURVE25519 no
  longer selects CRYPTO, and the arch-specific Curve25519 code no longer
  depends on CRYPTO.

This brings Curve25519 in line with the latest conventions for
lib/crypto/, used by other algorithms.  The exception is that I kept the
generic code in separate translation units for now.  (Some of the
function names collide between the x86 and generic Curve25519 code.  And
the Curve25519 functions are very long anyway, so inlining doesn't
matter as much for Curve25519 as it does for some other algorithms.)

Link: https://lore.kernel.org/r/20250906213523.84915-11-ebiggers@kernel.org
Signed-off-by: Eric Biggers &lt;ebiggers@kernel.org&gt;
</content>
</entry>
<entry>
<title>lib/crypto: blake2s: Consolidate into single C translation unit</title>
<updated>2025-08-29T16:50:19+00:00</updated>
<author>
<name>Eric Biggers</name>
<email>ebiggers@kernel.org</email>
</author>
<published>2025-08-27T15:11:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=39ee3970f26d55b57343da392d45117d7f893205'/>
<id>urn:sha1:39ee3970f26d55b57343da392d45117d7f893205</id>
<content type='text'>
As was done with the other algorithms, reorganize the BLAKE2s code so
that the generic implementation and the arch-specific "glue" code is
consolidated into a single translation unit, so that the compiler will
inline the functions and automatically decide whether to include the
generic code in the resulting binary or not.

Similarly, also consolidate the build rules into
lib/crypto/{Makefile,Kconfig}.  This removes the last uses of
lib/crypto/{arm,x86}/{Makefile,Kconfig}, so remove those too.

Don't keep the !KMSAN dependency.  It was needed only for other
algorithms such as ChaCha that initialize memory from assembly code.

Reviewed-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Link: https://lore.kernel.org/r/20250827151131.27733-12-ebiggers@kernel.org
Signed-off-by: Eric Biggers &lt;ebiggers@kernel.org&gt;
</content>
</entry>
<entry>
<title>lib/crypto: blake2s: Always enable arch-optimized BLAKE2s code</title>
<updated>2025-08-29T16:50:19+00:00</updated>
<author>
<name>Eric Biggers</name>
<email>ebiggers@kernel.org</email>
</author>
<published>2025-08-27T15:11:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=56e48d4e138cb105a17e0f8c257f3bc41b1bd69d'/>
<id>urn:sha1:56e48d4e138cb105a17e0f8c257f3bc41b1bd69d</id>
<content type='text'>
When support for a crypto algorithm is enabled, the arch-optimized
implementation of that algorithm should be enabled too.  We've learned
this the hard way many times over the years: people regularly forget to
enable the arch-optimized implementations of the crypto algorithms,
resulting in significant performance being left on the table.

Currently, BLAKE2s support is always enabled ('obj-y'), since random.c
uses it.  Therefore, the arch-optimized BLAKE2s code, which exists for
ARM and x86_64, should be always enabled too.  Let's do that.

Note that the effect on kernel image size is very small and should not
be a concern.  On ARM, enabling CRYPTO_BLAKE2S_ARM actually *shrinks*
the kernel size by about 1200 bytes, since the ARM-optimized
blake2s_compress() completely replaces the generic blake2s_compress().
On x86_64, enabling CRYPTO_BLAKE2S_X86 increases the kernel size by
about 1400 bytes, as the generic blake2s_compress() is still included as
a fallback; however, for context, that is only about a quarter the size
of the generic blake2s_compress().  The x86_64 optimized BLAKE2s code
uses much less icache at runtime than the generic code.

Reviewed-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Link: https://lore.kernel.org/r/20250827151131.27733-10-ebiggers@kernel.org
Signed-off-by: Eric Biggers &lt;ebiggers@kernel.org&gt;
</content>
</entry>
<entry>
<title>lib/crypto: x86/blake2s: Reduce size of BLAKE2S_SIGMA2</title>
<updated>2025-08-29T16:50:19+00:00</updated>
<author>
<name>Eric Biggers</name>
<email>ebiggers@kernel.org</email>
</author>
<published>2025-08-27T15:11:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=453eda46b7f807f6fc4283f9639085697100ec08'/>
<id>urn:sha1:453eda46b7f807f6fc4283f9639085697100ec08</id>
<content type='text'>
Save 480 bytes of .rodata by replacing the .long constants with .bytes,
and using the vpmovzxbd instruction to expand them.

Also update the code to do the loads before incrementing %rax rather
than after.  This avoids the need for the first load to use an offset.

Reviewed-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Link: https://lore.kernel.org/r/20250827151131.27733-8-ebiggers@kernel.org
Signed-off-by: Eric Biggers &lt;ebiggers@kernel.org&gt;
</content>
</entry>
<entry>
<title>lib/crypto: chacha: Consolidate into single module</title>
<updated>2025-08-29T16:50:19+00:00</updated>
<author>
<name>Eric Biggers</name>
<email>ebiggers@kernel.org</email>
</author>
<published>2025-08-27T15:11:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=13cecc526d8fe7eeb9b136159738688a1a10cd82'/>
<id>urn:sha1:13cecc526d8fe7eeb9b136159738688a1a10cd82</id>
<content type='text'>
Consolidate the ChaCha code into a single module (excluding
chacha-block-generic.c which remains always built-in for random.c),
similar to various other algorithms:

- Each arch now provides a header file lib/crypto/$(SRCARCH)/chacha.h,
  replacing lib/crypto/$(SRCARCH)/chacha*.c.  The header defines
  chacha_crypt_arch() and hchacha_block_arch().  It is included by
  lib/crypto/chacha.c, and thus the code gets built into the single
  libchacha module, with improved inlining in some cases.

- Whether arch-optimized ChaCha is buildable is now controlled centrally
  by lib/crypto/Kconfig instead of by lib/crypto/$(SRCARCH)/Kconfig.
  The conditions for enabling it remain the same as before, and it
  remains enabled by default.

- Any additional arch-specific translation units for the optimized
  ChaCha code, such as assembly files, are now compiled by
  lib/crypto/Makefile instead of lib/crypto/$(SRCARCH)/Makefile.

This removes the last use for the Makefile and Kconfig files in the
arm64, mips, powerpc, riscv, and s390 subdirectories of lib/crypto/.  So
also remove those files and the references to them.

Reviewed-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Link: https://lore.kernel.org/r/20250827151131.27733-7-ebiggers@kernel.org
Signed-off-by: Eric Biggers &lt;ebiggers@kernel.org&gt;
</content>
</entry>
<entry>
<title>lib/crypto: chacha: Remove unused function chacha_is_arch_optimized()</title>
<updated>2025-08-29T16:50:19+00:00</updated>
<author>
<name>Eric Biggers</name>
<email>ebiggers@kernel.org</email>
</author>
<published>2025-08-27T15:11:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c4b846ff6ecab0427cc7dcccbe0af60b244a6d56'/>
<id>urn:sha1:c4b846ff6ecab0427cc7dcccbe0af60b244a6d56</id>
<content type='text'>
chacha_is_arch_optimized() is no longer used, so remove it.

Reviewed-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Link: https://lore.kernel.org/r/20250827151131.27733-4-ebiggers@kernel.org
Signed-off-by: Eric Biggers &lt;ebiggers@kernel.org&gt;
</content>
</entry>
<entry>
<title>lib/crypto: poly1305: Consolidate into single module</title>
<updated>2025-08-29T16:49:18+00:00</updated>
<author>
<name>Eric Biggers</name>
<email>ebiggers@kernel.org</email>
</author>
<published>2025-08-29T15:25:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b646b782e522da3509e61f971e5502fccb3a3723'/>
<id>urn:sha1:b646b782e522da3509e61f971e5502fccb3a3723</id>
<content type='text'>
Consolidate the Poly1305 code into a single module, similar to various
other algorithms (SHA-1, SHA-256, SHA-512, etc.):

- Each arch now provides a header file lib/crypto/$(SRCARCH)/poly1305.h,
  replacing lib/crypto/$(SRCARCH)/poly1305*.c.  The header defines
  poly1305_block_init(), poly1305_blocks(), poly1305_emit(), and
  optionally poly1305_mod_init_arch().  It is included by
  lib/crypto/poly1305.c, and thus the code gets built into the single
  libpoly1305 module, with improved inlining in some cases.

- Whether arch-optimized Poly1305 is buildable is now controlled
  centrally by lib/crypto/Kconfig instead of by
  lib/crypto/$(SRCARCH)/Kconfig.  The conditions for enabling it remain
  the same as before, and it remains enabled by default.  (The PPC64 one
  remains unconditionally disabled due to 'depends on BROKEN'.)

- Any additional arch-specific translation units for the optimized
  Poly1305 code, such as assembly files, are now compiled by
  lib/crypto/Makefile instead of lib/crypto/$(SRCARCH)/Makefile.

A special consideration is needed because the Adiantum code uses the
poly1305_core_*() functions directly.  For now, just carry forward that
approach.  This means retaining the CRYPTO_LIB_POLY1305_GENERIC kconfig
symbol, and keeping the poly1305_core_*() functions in separate
translation units.  So it's not quite as streamlined I've done with the
other hash functions, but we still get a single libpoly1305 module.

Note: to see the diff from the arm, arm64, and x86 .c files to the new
.h files, view this commit with 'git show -M10'.

Reviewed-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Link: https://lore.kernel.org/r/20250829152513.92459-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers &lt;ebiggers@kernel.org&gt;
</content>
</entry>
</feed>
