<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/include/linux/raid, branch v6.6.132</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.6.132</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.6.132'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2023-09-06T14:53:55+00:00</updated>
<entry>
<title>raid6: Add LoongArch SIMD recovery implementation</title>
<updated>2023-09-06T14:53:55+00:00</updated>
<author>
<name>WANG Xuerui</name>
<email>git@xen0n.name</email>
</author>
<published>2023-09-06T14:53:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f2091321044d9fbcadb93dfc1c9cf23e563ea40c'/>
<id>urn:sha1:f2091321044d9fbcadb93dfc1c9cf23e563ea40c</id>
<content type='text'>
Similar to the syndrome calculation, the recovery algorithms also work
on 64 bytes at a time to align with the L1 cache line size of current
and future LoongArch cores (that we care about). Which means
unrolled-by-4 LSX and unrolled-by-2 LASX code.

The assembly is originally based on the x86 SSSE3/AVX2 ports, but
register allocation has been redone to take advantage of LSX/LASX's 32
vector registers, and instruction sequence has been optimized to suit
(e.g. LoongArch can perform per-byte srl and andi on vectors, but x86
cannot).

Performance numbers measured by instrumenting the raid6test code, on a
3A5000 system clocked at 2.5GHz:

&gt; lasx  2data: 354.987 MiB/s
&gt; lasx  datap: 350.430 MiB/s
&gt; lsx   2data: 340.026 MiB/s
&gt; lsx   datap: 337.318 MiB/s
&gt; intx1 2data: 164.280 MiB/s
&gt; intx1 datap: 187.966 MiB/s

Because recovery algorithms are chosen solely based on priority and
availability, lasx is marked as priority 2 and lsx priority 1. At least
for the current generation of LoongArch micro-architectures, LASX should
always be faster than LSX whenever supported, and have similar power
consumption characteristics (because the only known LASX-capable uarch,
the LA464, always compute the full 256-bit result for vector ops).

Acked-by: Song Liu &lt;song@kernel.org&gt;
Signed-off-by: WANG Xuerui &lt;git@xen0n.name&gt;
Signed-off-by: Huacai Chen &lt;chenhuacai@loongson.cn&gt;
</content>
</entry>
<entry>
<title>raid6: Add LoongArch SIMD syndrome calculation</title>
<updated>2023-09-06T14:53:55+00:00</updated>
<author>
<name>WANG Xuerui</name>
<email>git@xen0n.name</email>
</author>
<published>2023-09-06T14:53:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8f3f06dfd6873135068ccf1a0b386308e8c4da38'/>
<id>urn:sha1:8f3f06dfd6873135068ccf1a0b386308e8c4da38</id>
<content type='text'>
The algorithms work on 64 bytes at a time, which is the L1 cache line
size of all current and future LoongArch cores (that we care about), as
confirmed by Huacai. The code is based on the generic int.uc algorithm,
unrolled 4 times for LSX and 2 times for LASX. Further unrolling does
not meaningfully improve the performance according to experiments.

Performance numbers measured during system boot on a 3A5000 @ 2.5GHz:

&gt; raid6: lasx     gen() 12726 MB/s
&gt; raid6: lsx      gen() 10001 MB/s
&gt; raid6: int64x8  gen()  2876 MB/s
&gt; raid6: int64x4  gen()  3867 MB/s
&gt; raid6: int64x2  gen()  2531 MB/s
&gt; raid6: int64x1  gen()  1945 MB/s

Comparison of xor() speeds (from different boots but meaningful anyway):

&gt; lasx:    11226 MB/s
&gt; lsx:     6395 MB/s
&gt; int64x4: 2147 MB/s

Performance as measured by raid6test:

&gt; raid6: lasx     gen() 25109 MB/s
&gt; raid6: lsx      gen() 13233 MB/s
&gt; raid6: int64x8  gen()  4164 MB/s
&gt; raid6: int64x4  gen()  6005 MB/s
&gt; raid6: int64x2  gen()  5781 MB/s
&gt; raid6: int64x1  gen()  4119 MB/s
&gt; raid6: using algorithm lasx gen() 25109 MB/s
&gt; raid6: .... xor() 14439 MB/s, rmw enabled

Acked-by: Song Liu &lt;song@kernel.org&gt;
Signed-off-by: WANG Xuerui &lt;git@xen0n.name&gt;
Signed-off-by: Huacai Chen &lt;chenhuacai@loongson.cn&gt;
</content>
</entry>
<entry>
<title>lib/raid6: drop RAID6_USE_EMPTY_ZERO_PAGE</title>
<updated>2022-11-14T17:35:50+00:00</updated>
<author>
<name>Giulio Benetti</name>
<email>giulio.benetti@benettiengineering.com</email>
</author>
<published>2022-10-19T16:04:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=42271ca389edb0446b9e492858b4c38083b0b9f8'/>
<id>urn:sha1:42271ca389edb0446b9e492858b4c38083b0b9f8</id>
<content type='text'>
RAID6_USE_EMPTY_ZERO_PAGE is unused and hardcoded to 0, so let's drop it.

Signed-off-by: Giulio Benetti &lt;giulio.benetti@benettiengineering.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
</content>
</entry>
<entry>
<title>lib/xor: make xor prototypes more friendly to compiler vectorization</title>
<updated>2022-02-11T09:39:39+00:00</updated>
<author>
<name>Ard Biesheuvel</name>
<email>ardb@kernel.org</email>
</author>
<published>2022-02-05T15:23:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=297565aa22cfa80ab0f88c3569693aea0b6afb6d'/>
<id>urn:sha1:297565aa22cfa80ab0f88c3569693aea0b6afb6d</id>
<content type='text'>
Modern compilers are perfectly capable of extracting parallelism from
the XOR routines, provided that the prototypes reflect the nature of the
input accurately, in particular, the fact that the input vectors are
expected not to overlap. This is not documented explicitly, but is
implied by the interchangeability of the various C routines, some of
which use temporary variables while others don't: this means that these
routines only behave identically for non-overlapping inputs.

So let's decorate these input vectors with the __restrict modifier,
which informs the compiler that there is no overlap. While at it, make
the input-only vectors pointer-to-const as well.

Tested-by: Nathan Chancellor &lt;nathan@kernel.org&gt;
Signed-off-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Reviewed-by: Nick Desaulniers &lt;ndesaulniers@google.com&gt;
Link: https://github.com/ClangBuiltLinux/linux/issues/563
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
</content>
</entry>
<entry>
<title>lib/raid6: Use strict priority ranking for pq gen() benchmarking</title>
<updated>2022-01-06T16:37:03+00:00</updated>
<author>
<name>Dirk Müller</name>
<email>dmueller@suse.de</email>
</author>
<published>2022-01-05T16:38:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=36dacddbf0bdba86cd00f066b4d724157eeb63f1'/>
<id>urn:sha1:36dacddbf0bdba86cd00f066b4d724157eeb63f1</id>
<content type='text'>
On x86_64, currently 3 variants of AVX512, 3 variants of AVX2
and 3 variants of SSE2 are benchmarked on initialization, taking
between 144-153 jiffies. Testing across a hardware pool of
various generations of intel cpus I could not find a single
case where SSE2 won over AVX2 or AVX512. There are cases where
AVX2 wins over AVX512 however.

Change "prefer" into an integer priority field (similar to
how recov selection works) to have more than one ranking level
available, which is backwards compatible with existing behavior.

Give AVX2/512 variants higher priority over SSE2 in order to skip
SSE testing when AVX is available. in a AVX2/x86_64/HZ=250 case this
saves in the order of 200ms of initialization time.

Signed-off-by: Dirk Müller &lt;dmueller@suse.de&gt;
Acked-by: Paul Menzel &lt;pmenzel@molgen.mpg.de&gt;
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
</content>
</entry>
<entry>
<title>md: remove the kernel version of md_u.h</title>
<updated>2020-07-16T13:35:21+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2020-06-07T14:33:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1a6a050620e496abf42749a2e1d3882645cc053f'/>
<id>urn:sha1:1a6a050620e496abf42749a2e1d3882645cc053f</id>
<content type='text'>
mdp_major can just move to drivers/md/md.h.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Acked-by: Song Liu &lt;song@kernel.org&gt;
Acked-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>md: move the early init autodetect code to drivers/md/</title>
<updated>2020-07-16T13:34:47+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2020-06-07T14:18:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4f5b246b37e024955c0fcca0c7f5952089052d1d'/>
<id>urn:sha1:4f5b246b37e024955c0fcca0c7f5952089052d1d</id>
<content type='text'>
Just like the NFS and CIFS root code this better lives with the
driver it is tightly integrated with.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Acked-by: Song Liu &lt;song@kernel.org&gt;
Acked-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>block: cleanup how md_autodetect_dev is called</title>
<updated>2020-03-24T13:57:08+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2020-03-24T07:25:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=74cc979c3c7f8328b24651daf15280f07533e735'/>
<id>urn:sha1:74cc979c3c7f8328b24651daf15280f07533e735</id>
<content type='text'>
Add a new include/linux/raid/detect.h header to declare the
md_autodetect_dev prototype which can be shared between md and
the partition code.  Then use IS_BUILTIN to call it instead of the
ifdef magic.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>md/raid6: fix algorithm choice under larger PAGE_SIZE</title>
<updated>2020-01-13T19:44:09+00:00</updated>
<author>
<name>Zhengyuan Liu</name>
<email>liuzhengyuan@kylinos.cn</email>
</author>
<published>2019-12-20T02:21:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f591df3cc6d60cadf8ceff5d44af73ea6ba0a39a'/>
<id>urn:sha1:f591df3cc6d60cadf8ceff5d44af73ea6ba0a39a</id>
<content type='text'>
There are several algorithms available for raid6 to generate xor and syndrome
parity, including basic int1, int2 ... int32 and SIMD optimized implementation
like sse and neon.  To test and choose the best algorithms at the initial
stage, we need provide enough disk data to feed the algorithms. However, the
disk number we provided depends on page size and gfmul table, seeing bellow:

    const int disks = (65536/PAGE_SIZE) + 2;

So when come to 64K PAGE_SIZE, there is only one data disk plus 2 parity disk,
as a result the chosed algorithm is not reliable. For example, on my arm64
machine with 64K page enabled, it will choose intx32 as the best one, although
the NEON implementation is better.

This patch tries to fix the problem by defining a constant raid6 disk number to
supporting arbitrary page size.

Suggested-by: H. Peter Anvin &lt;hpa@zytor.com&gt;
Signed-off-by: Zhengyuan Liu &lt;liuzhengyuan@kylinos.cn&gt;
Signed-off-by: Song Liu &lt;songliubraving@fb.com&gt;
</content>
</entry>
<entry>
<title>raid6/test: fix a compilation warning</title>
<updated>2020-01-13T19:44:09+00:00</updated>
<author>
<name>Zhengyuan Liu</name>
<email>liuzhengyuan@kylinos.cn</email>
</author>
<published>2019-12-20T02:21:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5e5ac01c2b8802921fee680518a986011cb59820'/>
<id>urn:sha1:5e5ac01c2b8802921fee680518a986011cb59820</id>
<content type='text'>
The compilation warning is redefination showed as following:

        In file included from tables.c:2:
        ../../../include/linux/export.h:180: warning: "EXPORT_SYMBOL" redefined
         #define EXPORT_SYMBOL(sym)  __EXPORT_SYMBOL(sym, "")

        In file included from tables.c:1:
        ../../../include/linux/raid/pq.h:61: note: this is the location of the previous definition
         #define EXPORT_SYMBOL(sym)

Fixes: 69a94abb82ee ("export.h, genksyms: do not make genksyms calculate CRC of trimmed symbols")
Signed-off-by: Zhengyuan Liu &lt;liuzhengyuan@kylinos.cn&gt;
Signed-off-by: Song Liu &lt;songliubraving@fb.com&gt;
</content>
</entry>
</feed>
