summaryrefslogtreecommitdiff
path: root/scripts/generate_rust_analyzer.py
diff options
context:
space:
mode:
authorEric Biggers <ebiggers@google.com>2025-02-10 20:26:44 +0300
committerEric Biggers <ebiggers@google.com>2025-02-10 20:49:28 +0300
commita03fda967eb3da15014d8e1f8ae778a60033d5e4 (patch)
tree056475ea1d0884af128afa80233f9b751607a856 /scripts/generate_rust_analyzer.py
parent8d2d3e72e35b7de26746ef18dc8ad5f89a2b2e25 (diff)
downloadlinux-a03fda967eb3da15014d8e1f8ae778a60033d5e4.tar.xz
x86/crc32: implement crc32_le using new template
Instantiate crc-pclmul-template.S for crc32_le, and delete the original PCLMULQDQ optimized implementation. This has the following advantages: - Less CRC-variant-specific code. - VPCLMULQDQ support, greatly improving performance on sufficiently long messages on newer CPUs. - A faster reduction from 128 bits to the final CRC. - Support for lengths not a multiple of 16 bytes, improving performance for such lengths. - Support for misaligned buffers, improving performance in such cases. Benchmark results on AMD Ryzen 9 9950X (Zen 5) using crc_kunit: Length Before After ------ ------ ----- 1 427 MB/s 605 MB/s 16 710 MB/s 3631 MB/s 64 704 MB/s 7615 MB/s 127 3610 MB/s 9710 MB/s 128 8759 MB/s 12702 MB/s 200 7083 MB/s 15343 MB/s 256 17284 MB/s 22904 MB/s 511 10919 MB/s 27309 MB/s 512 19849 MB/s 48900 MB/s 1024 21216 MB/s 62630 MB/s 3173 22150 MB/s 72437 MB/s 4096 22496 MB/s 79593 MB/s 16384 22018 MB/s 85106 MB/s Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Keith Busch <kbusch@kernel.org> Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20250210174540.161705-5-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
Diffstat (limited to 'scripts/generate_rust_analyzer.py')
0 files changed, 0 insertions, 0 deletions