<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/lib/raid6/neon.uc, branch v6.1.168</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.1.168</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.1.168'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2019-02-28T17:44:51+00:00</updated>
<entry>
<title>lib/raid6: use vdupq_n_u8 to avoid endianness warnings</title>
<updated>2019-02-28T17:44:51+00:00</updated>
<author>
<name>ndesaulniers@google.com</name>
<email>ndesaulniers@google.com</email>
</author>
<published>2019-02-26T04:03:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1ad3935b39da78a403e7df7a3813f866c731bc64'/>
<id>urn:sha1:1ad3935b39da78a403e7df7a3813f866c731bc64</id>
<content type='text'>
Clang warns: vector initializers are not compatible with NEON intrinsics
in big endian mode [-Wnonportable-vector-initialization]

While this is usually the case, it's not an issue for this case since
we're initializing the uint8x16_t (16x uint8_t's) with the same value.

Instead, use vdupq_n_u8 which both compilers lower into a single movi
instruction: https://godbolt.org/z/vBrgzt

This avoids the static storage for a constant value.

Link: https://github.com/ClangBuiltLinux/linux/issues/214
Suggested-by: Nathan Chancellor &lt;natechancellor@gmail.com&gt;
Reviewed-by: Ard Biesheuvel &lt;ard.biesheuvel@linaro.org&gt;
Signed-off-by: Nick Desaulniers &lt;ndesaulniers@google.com&gt;
Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
</content>
</entry>
<entry>
<title>md/raid6: use faster multiplication for ARM NEON delta syndrome</title>
<updated>2017-08-09T17:51:57+00:00</updated>
<author>
<name>Ard Biesheuvel</name>
<email>ard.biesheuvel@linaro.org</email>
</author>
<published>2017-07-13T17:16:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=35129dde88afad07f54b332d4f9eda2d254b80f2'/>
<id>urn:sha1:35129dde88afad07f54b332d4f9eda2d254b80f2</id>
<content type='text'>
The P/Q left side optimization in the delta syndrome simply involves
repeatedly multiplying a value by polynomial 'x' in GF(2^8). Given
that 'x * x * x * x' equals 'x^4' even in the polynomial world, we
can accelerate this substantially by performing up to 4 such operations
at once, using the NEON instructions for polynomial multiplication.

Results on a Cortex-A57 running in 64-bit mode:

  Before:
  -------
  raid6: neonx1   xor()  1680 MB/s
  raid6: neonx2   xor()  2286 MB/s
  raid6: neonx4   xor()  3162 MB/s
  raid6: neonx8   xor()  3389 MB/s

  After:
  ------
  raid6: neonx1   xor()  2281 MB/s
  raid6: neonx2   xor()  3362 MB/s
  raid6: neonx4   xor()  3787 MB/s
  raid6: neonx8   xor()  4239 MB/s

While we're at it, simplify MASK() by using a signed shift rather than
a vector compare involving a temp register.

Signed-off-by: Ard Biesheuvel &lt;ard.biesheuvel@linaro.org&gt;
Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
</content>
</entry>
<entry>
<title>md/raid6: delta syndrome for ARM NEON</title>
<updated>2015-08-31T17:29:05+00:00</updated>
<author>
<name>Ard Biesheuvel</name>
<email>ard.biesheuvel@linaro.org</email>
</author>
<published>2015-07-01T02:19:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=0e833e697bcf4c2f3f7fb9fce39d08cd4439e5d7'/>
<id>urn:sha1:0e833e697bcf4c2f3f7fb9fce39d08cd4439e5d7</id>
<content type='text'>
This implements XOR syndrome calculation using NEON intrinsics.
As before, the module can be built for ARM and arm64 from the
same source.

Relative performance on a Cortex-A57 based system:

  raid6: int64x1  gen()   905 MB/s
  raid6: int64x1  xor()   881 MB/s
  raid6: int64x2  gen()  1343 MB/s
  raid6: int64x2  xor()  1286 MB/s
  raid6: int64x4  gen()  1896 MB/s
  raid6: int64x4  xor()  1321 MB/s
  raid6: int64x8  gen()  1773 MB/s
  raid6: int64x8  xor()  1165 MB/s
  raid6: neonx1   gen()  1834 MB/s
  raid6: neonx1   xor()  1278 MB/s
  raid6: neonx2   gen()  2528 MB/s
  raid6: neonx2   xor()  1942 MB/s
  raid6: neonx4   gen()  2888 MB/s
  raid6: neonx4   xor()  2334 MB/s
  raid6: neonx8   gen()  2957 MB/s
  raid6: neonx8   xor()  2232 MB/s
  raid6: using algorithm neonx8 gen() 2957 MB/s
  raid6: .... xor() 2232 MB/s, rmw enabled

Cc: Markus Stockhausen &lt;stockhausen@collogia.de&gt;
Cc: Neil Brown &lt;neilb@suse.de&gt;
Signed-off-by: Ard Biesheuvel &lt;ard.biesheuvel@linaro.org&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
</content>
</entry>
<entry>
<title>lib/raid6: add ARM-NEON accelerated syndrome calculation</title>
<updated>2013-07-08T21:09:18+00:00</updated>
<author>
<name>Ard Biesheuvel</name>
<email>ard.biesheuvel@linaro.org</email>
</author>
<published>2013-05-16T15:20:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7d11965ddb9b9b1e0a5d13c58345ada1ccbc663b'/>
<id>urn:sha1:7d11965ddb9b9b1e0a5d13c58345ada1ccbc663b</id>
<content type='text'>
Rebased/reworked a patch contributed by Rob Herring that uses
NEON intrinsics to perform the RAID-6 syndrome calculations.
It uses the existing unroll.awk code to generate several
unrolled versions of which the best performing one is selected
at boot time.

Signed-off-by: Ard Biesheuvel &lt;ard.biesheuvel@linaro.org&gt;
Acked-by: Nicolas Pitre &lt;nico@linaro.org&gt;
Cc: hpa@linux.intel.com
</content>
</entry>
</feed>
