| author | Eric Biggers <ebiggers@google.com> | 2025-02-10 20:26:45 +0300 | 
|---|---|---|
| committer | Eric Biggers <ebiggers@google.com> | 2025-02-10 20:49:32 +0300 | 
| commit | dbdda1fde38259623a79f4f14b8c90c16c64b36b | |
| tree | 9980e88af53937e2fb3de9304e4cf7ba163d662a /tools/perf/scripts/python | |
| parent | a03fda967eb3da15014d8e1f8ae778a60033d5e4 | |
x86/crc-t10dif: implement crc_t10dif using new template

Instantiate crc-pclmul-template.S for crc_t10dif and delete the original
PCLMULQDQ optimized implementation.  This has the following advantages:
- Less CRC-variant-specific code.
- VPCLMULQDQ support, greatly improving performance on sufficiently long
  messages on newer CPUs.
- A faster reduction from 128 bits to the final CRC.
- Support for i386.
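
For orientation, the CRC variant being accelerated can be written as a
plain bit-at-a-time reference. This is a minimal sketch and not the code
from this commit: the name crc_t10dif_ref is hypothetical, and the
parameters (16-bit width, generator polynomial 0x8BB7, most-significant
bit first, initial value 0, no final XOR) are the commonly published
CRC-16/T10-DIF parameters rather than anything stated in the message
above. Any accelerated implementation of that variant should return the
same values as this loop, which is the kind of agreement a test suite
such as crc_kunit is positioned to check.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Assumed CRC-16/T10-DIF generator polynomial, with the x^16 term implicit:
 * x^16 + x^15 + x^11 + x^9 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1
 */
#define T10DIF_POLY 0x8BB7u

/*
 * Hypothetical reference routine (illustration only, not kernel code).
 * Processes the message MSB-first, one bit per inner iteration.
 * Pass crc = 0 to start a new message; pass a previous return value
 * to continue a CRC over additional data.
 */
static uint16_t crc_t10dif_ref(uint16_t crc, const uint8_t *p, size_t len)
{
	while (len--) {
		/* Fold the next byte into the top 8 bits of the CRC. */
		crc ^= (uint16_t)((uint16_t)*p++ << 8);
		for (int i = 0; i < 8; i++) {
			if (crc & 0x8000)
				crc = (uint16_t)((crc << 1) ^ T10DIF_POLY);
			else
				crc = (uint16_t)(crc << 1);
		}
	}
	return crc;
}
```

With these parameters, the standard check string "123456789" yields
0xD0DB. A loop like this handles one bit per step, whereas a carryless
multiplication approach folds many bytes per instruction, which is why
the gap widens on longer messages.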

Benchmark results on AMD Ryzen 9 9950X (Zen 5) using crc_kunit:

        Length     Before        After
        ------     ------        -----
             1     440 MB/s      386 MB/s
            16    1865 MB/s     2008 MB/s
            64    4343 MB/s     6917 MB/s
           127    5440 MB/s     8909 MB/s
           128    5533 MB/s    12150 MB/s
           200    5908 MB/s    14423 MB/s
           256   15870 MB/s    21288 MB/s
           511   14219 MB/s    25840 MB/s
           512   18361 MB/s    37797 MB/s
          1024   19941 MB/s    61374 MB/s
          3173   20461 MB/s    74909 MB/s
          4096   21310 MB/s    78919 MB/s
         16384   21663 MB/s    85012 MB/s

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20250210174540.161705-6-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
