<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/lib/raid6/algos.c, branch v6.6.131</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.6.131</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.6.131'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2023-09-06T14:53:55+00:00</updated>
<entry>
<title>raid6: Add LoongArch SIMD recovery implementation</title>
<updated>2023-09-06T14:53:55+00:00</updated>
<author>
<name>WANG Xuerui</name>
<email>git@xen0n.name</email>
</author>
<published>2023-09-06T14:53:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f2091321044d9fbcadb93dfc1c9cf23e563ea40c'/>
<id>urn:sha1:f2091321044d9fbcadb93dfc1c9cf23e563ea40c</id>
<content type='text'>
Similar to the syndrome calculation, the recovery algorithms also work
on 64 bytes at a time to align with the L1 cache line size of current
and future LoongArch cores (that we care about). Which means
unrolled-by-4 LSX and unrolled-by-2 LASX code.

The assembly is originally based on the x86 SSSE3/AVX2 ports, but
register allocation has been redone to take advantage of LSX/LASX's 32
vector registers, and instruction sequence has been optimized to suit
(e.g. LoongArch can perform per-byte srl and andi on vectors, but x86
cannot).

Performance numbers measured by instrumenting the raid6test code, on a
3A5000 system clocked at 2.5GHz:

&gt; lasx  2data: 354.987 MiB/s
&gt; lasx  datap: 350.430 MiB/s
&gt; lsx   2data: 340.026 MiB/s
&gt; lsx   datap: 337.318 MiB/s
&gt; intx1 2data: 164.280 MiB/s
&gt; intx1 datap: 187.966 MiB/s

Because recovery algorithms are chosen solely based on priority and
availability, lasx is marked as priority 2 and lsx priority 1. At least
for the current generation of LoongArch micro-architectures, LASX should
always be faster than LSX whenever supported, and have similar power
consumption characteristics (because the only known LASX-capable uarch,
the LA464, always compute the full 256-bit result for vector ops).

Acked-by: Song Liu &lt;song@kernel.org&gt;
Signed-off-by: WANG Xuerui &lt;git@xen0n.name&gt;
Signed-off-by: Huacai Chen &lt;chenhuacai@loongson.cn&gt;
</content>
</entry>
<entry>
<title>raid6: Add LoongArch SIMD syndrome calculation</title>
<updated>2023-09-06T14:53:55+00:00</updated>
<author>
<name>WANG Xuerui</name>
<email>git@xen0n.name</email>
</author>
<published>2023-09-06T14:53:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8f3f06dfd6873135068ccf1a0b386308e8c4da38'/>
<id>urn:sha1:8f3f06dfd6873135068ccf1a0b386308e8c4da38</id>
<content type='text'>
The algorithms work on 64 bytes at a time, which is the L1 cache line
size of all current and future LoongArch cores (that we care about), as
confirmed by Huacai. The code is based on the generic int.uc algorithm,
unrolled 4 times for LSX and 2 times for LASX. Further unrolling does
not meaningfully improve the performance according to experiments.

Performance numbers measured during system boot on a 3A5000 @ 2.5GHz:

&gt; raid6: lasx     gen() 12726 MB/s
&gt; raid6: lsx      gen() 10001 MB/s
&gt; raid6: int64x8  gen()  2876 MB/s
&gt; raid6: int64x4  gen()  3867 MB/s
&gt; raid6: int64x2  gen()  2531 MB/s
&gt; raid6: int64x1  gen()  1945 MB/s

Comparison of xor() speeds (from different boots but meaningful anyway):

&gt; lasx:    11226 MB/s
&gt; lsx:     6395 MB/s
&gt; int64x4: 2147 MB/s

Performance as measured by raid6test:

&gt; raid6: lasx     gen() 25109 MB/s
&gt; raid6: lsx      gen() 13233 MB/s
&gt; raid6: int64x8  gen()  4164 MB/s
&gt; raid6: int64x4  gen()  6005 MB/s
&gt; raid6: int64x2  gen()  5781 MB/s
&gt; raid6: int64x1  gen()  4119 MB/s
&gt; raid6: using algorithm lasx gen() 25109 MB/s
&gt; raid6: .... xor() 14439 MB/s, rmw enabled

Acked-by: Song Liu &lt;song@kernel.org&gt;
Signed-off-by: WANG Xuerui &lt;git@xen0n.name&gt;
Signed-off-by: Huacai Chen &lt;chenhuacai@loongson.cn&gt;
</content>
</entry>
<entry>
<title>lib/raid6: drop RAID6_USE_EMPTY_ZERO_PAGE</title>
<updated>2022-11-14T17:35:50+00:00</updated>
<author>
<name>Giulio Benetti</name>
<email>giulio.benetti@benettiengineering.com</email>
</author>
<published>2022-10-19T16:04:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=42271ca389edb0446b9e492858b4c38083b0b9f8'/>
<id>urn:sha1:42271ca389edb0446b9e492858b4c38083b0b9f8</id>
<content type='text'>
RAID6_USE_EMPTY_ZERO_PAGE is unused and hardcoded to 0, so let's drop it.

Signed-off-by: Giulio Benetti &lt;giulio.benetti@benettiengineering.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
</content>
</entry>
<entry>
<title>lib/raid6: Use strict priority ranking for pq gen() benchmarking</title>
<updated>2022-01-06T16:37:03+00:00</updated>
<author>
<name>Dirk Müller</name>
<email>dmueller@suse.de</email>
</author>
<published>2022-01-05T16:38:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=36dacddbf0bdba86cd00f066b4d724157eeb63f1'/>
<id>urn:sha1:36dacddbf0bdba86cd00f066b4d724157eeb63f1</id>
<content type='text'>
On x86_64, currently 3 variants of AVX512, 3 variants of AVX2
and 3 variants of SSE2 are benchmarked on initialization, taking
between 144-153 jiffies. Testing across a hardware pool of
various generations of intel cpus I could not find a single
case where SSE2 won over AVX2 or AVX512. There are cases where
AVX2 wins over AVX512 however.

Change "prefer" into an integer priority field (similar to
how recov selection works) to have more than one ranking level
available, which is backwards compatible with existing behavior.

Give AVX2/512 variants higher priority over SSE2 in order to skip
SSE testing when AVX is available. in a AVX2/x86_64/HZ=250 case this
saves in the order of 200ms of initialization time.

Signed-off-by: Dirk Müller &lt;dmueller@suse.de&gt;
Acked-by: Paul Menzel &lt;pmenzel@molgen.mpg.de&gt;
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
</content>
</entry>
<entry>
<title>lib/raid6: skip benchmark of non-chosen xor_syndrome functions</title>
<updated>2022-01-06T16:37:03+00:00</updated>
<author>
<name>Dirk Müller</name>
<email>dmueller@suse.de</email>
</author>
<published>2022-01-05T16:38:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=38640c480939d56cc8b03d58642fc5261761a697'/>
<id>urn:sha1:38640c480939d56cc8b03d58642fc5261761a697</id>
<content type='text'>
In commit fe5cbc6e06c7 ("md/raid6 algorithms: delta syndrome functions")
a xor_syndrome() benchmarking was added also to the raid6_choose_gen()
function. However, the results of that benchmarking were intentionally
discarded and did not influence the choice. It picked the
xor_syndrome() variant related to the best performing gen_syndrome().

Reduce runtime of raid6_choose_gen() without modifying its outcome by
only benchmarking the xor_syndrome() of the best gen_syndrome() variant.

For a HZ=250 x86_64 system with avx2 and without avx512 this removes
5 out of 6 xor() benchmarks, saving 340ms of raid6 initialization time.

Signed-off-by: Dirk Müller &lt;dmueller@suse.de&gt;
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
</content>
</entry>
<entry>
<title>x86: update AS_* macros to binutils &gt;=2.23, supporting ADX and AVX2</title>
<updated>2020-04-08T15:12:48+00:00</updated>
<author>
<name>Jason A. Donenfeld</name>
<email>Jason@zx2c4.com</email>
</author>
<published>2020-03-26T20:26:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e6abef610c7363cbd25205674b962031ef3bc790'/>
<id>urn:sha1:e6abef610c7363cbd25205674b962031ef3bc790</id>
<content type='text'>
Now that the kernel specifies binutils 2.23 as the minimum version, we
can remove ifdefs for AVX2 and ADX throughout.

Signed-off-by: Jason A. Donenfeld &lt;Jason@zx2c4.com&gt;
Acked-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Reviewed-by: Nick Desaulniers &lt;ndesaulniers@google.com&gt;
Signed-off-by: Masahiro Yamada &lt;masahiroy@kernel.org&gt;
</content>
</entry>
<entry>
<title>x86: remove always-defined CONFIG_AS_SSSE3</title>
<updated>2020-04-08T15:01:59+00:00</updated>
<author>
<name>Masahiro Yamada</name>
<email>masahiroy@kernel.org</email>
</author>
<published>2020-03-26T08:00:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=92203b02805d99d8aca88b1c6b93c721237205fe'/>
<id>urn:sha1:92203b02805d99d8aca88b1c6b93c721237205fe</id>
<content type='text'>
CONFIG_AS_SSSE3 was introduced by commit 75aaf4c3e6a4 ("x86/raid6:
correctly check for assembler capabilities").

We raise the minimal supported binutils version from time to time.
The last bump was commit 1fb12b35e5ff ("kbuild: Raise the minimum
required binutils version to 2.21").

I confirmed the code in $(call as-instr,...) can be assembled by the
binutils 2.21 assembler and also by LLVM integrated assembler.

Remove CONFIG_AS_SSSE3, which is always defined.

I added ifdef CONFIG_X86 to lib/raid6/algos.c to avoid link errors
on non-x86 architectures.

lib/raid6/algos.c is built not only for the kernel but also for
testing the library code from userspace. I added -DCONFIG_X86 to
lib/raid6/test/Makefile to cator to this usecase.

Signed-off-by: Masahiro Yamada &lt;masahiroy@kernel.org&gt;
Reviewed-by: Jason A. Donenfeld &lt;Jason@zx2c4.com&gt;
Reviewed-by: Nick Desaulniers &lt;ndesaulniers@google.com&gt;
Acked-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>md/raid6: fix algorithm choice under larger PAGE_SIZE</title>
<updated>2020-01-13T19:44:09+00:00</updated>
<author>
<name>Zhengyuan Liu</name>
<email>liuzhengyuan@kylinos.cn</email>
</author>
<published>2019-12-20T02:21:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f591df3cc6d60cadf8ceff5d44af73ea6ba0a39a'/>
<id>urn:sha1:f591df3cc6d60cadf8ceff5d44af73ea6ba0a39a</id>
<content type='text'>
There are several algorithms available for raid6 to generate xor and syndrome
parity, including basic int1, int2 ... int32 and SIMD optimized implementation
like sse and neon.  To test and choose the best algorithms at the initial
stage, we need provide enough disk data to feed the algorithms. However, the
disk number we provided depends on page size and gfmul table, seeing bellow:

    const int disks = (65536/PAGE_SIZE) + 2;

So when come to 64K PAGE_SIZE, there is only one data disk plus 2 parity disk,
as a result the chosed algorithm is not reliable. For example, on my arm64
machine with 64K page enabled, it will choose intx32 as the best one, although
the NEON implementation is better.

This patch tries to fix the problem by defining a constant raid6 disk number to
supporting arbitrary page size.

Suggested-by: H. Peter Anvin &lt;hpa@zytor.com&gt;
Signed-off-by: Zhengyuan Liu &lt;liuzhengyuan@kylinos.cn&gt;
Signed-off-by: Song Liu &lt;songliubraving@fb.com&gt;
</content>
</entry>
<entry>
<title>treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 48</title>
<updated>2019-05-24T15:27:13+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2019-05-20T17:08:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=dd165a658d9018cf31f87d2ea2f26293f215d91d'/>
<id>urn:sha1:dd165a658d9018cf31f87d2ea2f26293f215d91d</id>
<content type='text'>
Based on 1 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms of the gnu general public license as published by
  the free software foundation inc 53 temple place ste 330 boston ma
  02111 1307 usa either version 2 of the license or at your option any
  later version incorporated herein by reference

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 13 file(s).

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Allison Randal &lt;allison@lohutok.net&gt;
Reviewed-by: Kate Stewart &lt;kstewart@linuxfoundation.org&gt;
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190520170858.645641371@linutronix.de
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>lib/raid6: add option to skip algo benchmarking</title>
<updated>2018-12-20T16:53:23+00:00</updated>
<author>
<name>Daniel Verkamp</name>
<email>dverkamp@chromium.org</email>
</author>
<published>2018-11-12T23:26:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=be85f93ae2df32dea0b20908316f1d894c3e0f64'/>
<id>urn:sha1:be85f93ae2df32dea0b20908316f1d894c3e0f64</id>
<content type='text'>
This is helpful for systems where fast startup time is important.
It is especially nice to avoid benchmarking RAID functions that are
never used (for example, BTRFS selects RAID6_PQ even if the parity RAID
mode is not in use).

This saves 250+ milliseconds of boot time on modern x86 and ARM systems
with a dozen or more available implementations.

The new option is defaulted to 'y' to match the previous behavior of
always benchmarking on init.

Signed-off-by: Daniel Verkamp &lt;dverkamp@chromium.org&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</content>
</entry>
</feed>
