|
Add a "template" crc-clmul-template.h that can generate RISC-V Zbc
optimized CRC functions. Each generated CRC function is parameterized
by CRC length and bit order, and it accepts a pointer to the constants
struct required for the specific CRC polynomial desired. Update
gen-crc-consts.py to support generating the needed constants structs.
This makes it possible to easily wire up a Zbc optimized implementation
of almost any CRC.
The design generally follows what I did for x86, but it is simplified by
using RISC-V's scalar carryless multiplication Zbc, which has no
equivalent on x86. RISC-V's clmulr instruction is also helpful. A
potential switch to Zvbc (or support for Zvbc alongside Zbc) is left for
future work. For long messages Zvbc should be fastest, but it would
need to be shown to be worthwhile over just using Zbc which is
significantly more convenient to use, especially in the kernel context.
Compared to the existing Zbc-optimized CRC32 code and the earlier
proposed Zbc-optimized CRC-T10DIF code
(https://lore.kernel.org/r/20250211071101.181652-1-zhihang.shao.iscas@gmail.com),
this submission deduplicates the code among CRC variants and is
significantly more optimized. It uses "folding" to take better
advantage of instruction-level parallelism (to a more limited extent
than x86 for now, but it could be extended to more), it reworks the
Barrett reduction to eliminate unnecessary instructions, and it
documents all the math used and makes all the constants reproducible.
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Acked-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20250216225530.306980-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
|
|
Add a Python script that generates constants for computing the given CRC
variant(s) using x86's pclmulqdq or vpclmulqdq instructions.
This is specifically tuned for x86's crc-pclmul-template.S. However,
other architectures with a 64x64 => 128-bit carryless multiplication
instruction should be able to use the generated constants too. (Some
tweaks may be warranted based on the exact instructions available on
each arch, so the script may grow an arch argument in the future.)
The script also supports generating the tables needed for table-based
CRC computation. Thus, it can also be used to reproduce the tables like
t10_dif_crc_table[] and crc16_table[] that are currently hardcoded in
the source with no generation script explicitly documented.
Python is used rather than C since it enables implementing the CRC math
in the simplest way possible, using arbitrary precision integers. The
outputs of this script are intended to be checked into the repo, so
Python will continue to not be required to build the kernel, and the
script has been optimized for simplicity rather than performance.
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Keith Busch <kbusch@kernel.org>
Link: https://lore.kernel.org/r/20250210174540.161705-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
|