starfive-tech/linux.git/lib, branch visionfive_v1_5.13

lib/string: optimized memset

2021-08-26T21:07:31+00:00

The generic memset is defined as a byte at time write. This is always safe, but it's slower than a 4 byte or even 8 byte write. Write a generic memset which fills the data one byte at time until the destination is aligned, then fills using the largest size allowed, and finally fills the remaining data one byte at time. On a RISC-V machine the speed goes from 140 Mb/s to 241 Mb/s, and this the binary size increase according to bloat-o-meter: Function old new delta memset 32 148 +116 Link: https://lkml.kernel.org/r/20210702123153.14093-4-mcroce@linux.microsoft.com Signed-off-by: Matteo Croce Cc: Christoph Hellwig Cc: David Laight Cc: Drew Fustini Cc: Emil Renner Berthing Cc: Guo Ren Cc: Nick Desaulniers Cc: Nick Kossifidis Cc: Palmer Dabbelt Signed-off-by: Andrew Morton Signed-off-by: Stephen Rothwell

lib/string: optimized memmove

2021-08-26T21:07:31+00:00

When the destination buffer is before the source one, or when the buffers doesn't overlap, it's safe to use memcpy() instead, which is optimized to use a bigger data size possible. This "optimization" only covers a common case. In future, proper code which does the same thing as memcpy() does but backwards can be done. Link: https://lkml.kernel.org/r/20210702123153.14093-3-mcroce@linux.microsoft.com Signed-off-by: Matteo Croce Cc: Christoph Hellwig Cc: David Laight Cc: Drew Fustini Cc: Emil Renner Berthing Cc: Guo Ren Cc: Nick Desaulniers Cc: Nick Kossifidis Cc: Palmer Dabbelt Signed-off-by: Andrew Morton Signed-off-by: Stephen Rothwell

lib/string: optimized memcpy

2021-08-26T21:07:31+00:00

Patch series "lib/string: optimized mem* functions", v2. Rewrite the generic mem{cpy,move,set} so that memory is accessed with the widest size possible, but without doing unaligned accesses. This was originally posted as C string functions for RISC-V[1], but as there was no specific RISC-V code, it was proposed for the generic lib/string.c implementation. Tested on RISC-V and on x86_64 by undefining __HAVE_ARCH_MEM{CPY,SET,MOVE} and HAVE_EFFICIENT_UNALIGNED_ACCESS. These are the performances of memcpy() and memset() of a RISC-V machine on a 32 mbyte buffer: memcpy: original aligned: 75 Mb/s original unaligned: 75 Mb/s new aligned: 114 Mb/s new unaligned: 107 Mb/s memset: original aligned: 140 Mb/s original unaligned: 140 Mb/s new aligned: 241 Mb/s new unaligned: 241 Mb/s The size increase is negligible: $ scripts/bloat-o-meter vmlinux.orig vmlinux add/remove: 0/0 grow/shrink: 4/1 up/down: 427/-6 (421) Function old new delta memcpy 29 351 +322 memset 29 117 +88 strlcat 68 78 +10 strlcpy 50 57 +7 memmove 56 50 -6 Total: Before=8556964, After=8557385, chg +0.00% These functions will be used for RISC-V initially. [1] https://lore.kernel.org/linux-riscv/20210617152754.17960-1-mcroce@linux.microsoft.com/ This patch (of 3): Rewrite the generic memcpy() to copy a word at time, without generating unaligned accesses. The procedure is made of three steps: First copy data one byte at time until the destination buffer is aligned to a long boundary. Then copy the data one long at time shifting the current and the next long to compose a long at every cycle. Finally, copy the remainder one byte at time. This is the improvement on RISC-V: original aligned: 75 Mb/s original unaligned: 75 Mb/s new aligned: 114 Mb/s new unaligned: 107 Mb/s and this the binary size increase according to bloat-o-meter: Function old new delta memcpy 36 324 +288 Link: https://lkml.kernel.org/r/20210702123153.14093-1-mcroce@linux.microsoft.com Link: https://lkml.kernel.org/r/20210702123153.14093-2-mcroce@linux.microsoft.com Signed-off-by: Matteo Croce Cc: Nick Kossifidis Cc: Guo Ren Cc: Christoph Hellwig Cc: David Laight Cc: Palmer Dabbelt Cc: Emil Renner Berthing Cc: Drew Fustini Cc: Nick Desaulniers Signed-off-by: Andrew Morton Signed-off-by: Stephen Rothwell

lib: use PFN_PHYS() in devmem_is_allowed()

2021-08-18T07:06:46+00:00

commit 854f32648b8a5e424d682953b1a9f3b7c3322701 upstream. The physical address may exceed 32 bits on 32-bit systems with more than 32 bits of physcial address. Use PFN_PHYS() in devmem_is_allowed(), or the physical address may overflow and be truncated. We found this bug when mapping a high addresses through devmem tool, when CONFIG_STRICT_DEVMEM is enabled on the ARM with ARM_LPAE and devmem is used to map a high address that is not in the iomem address range, an unexpected error indicating no permission is returned. This bug was initially introduced from v2.6.37, and the function was moved to lib in v5.11. Link: https://lkml.kernel.org/r/20210731025057.78825-1-wangliang101@huawei.com Fixes: 087aaffcdf9c ("ARM: implement CONFIG_STRICT_DEVMEM by disabling access to RAM via /dev/mem") Fixes: 527701eda5f1 ("lib: Add a generic version of devmem_is_allowed()") Signed-off-by: Liang Wang Reviewed-by: Luis Chamberlain Cc: Palmer Dabbelt Cc: Greg Kroah-Hartman Cc: Russell King Cc: Liang Wang Cc: Xiaoming Ni Cc: Kefeng Wang Cc: [2.6.37+] Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman

lib/decompress_unlz4.c: correctly handle zero-padding around initrds.

2021-07-20T14:00:22+00:00

[ Upstream commit 2c484419efc09e7234c667aa72698cb79ba8d8ed ] lz4 compatible decompressor is simple. The format is underspecified and relies on EOF notification to determine when to stop. Initramfs buffer format[1] explicitly states that it can have arbitrary number of zero padding. Thus when operating without a fill function, be extra careful to ensure that sizes less than 4, or apperantly empty chunksizes are treated as EOF. To test this I have created two cpio initrds, first a normal one, main.cpio. And second one with just a single /test-file with content "second" second.cpio. Then i compressed both of them with gzip, and with lz4 -l. Then I created a padding of 4 bytes (dd if=/dev/zero of=pad4 bs=1 count=4). To create four testcase initrds: 1) main.cpio.gzip + extra.cpio.gzip = pad0.gzip 2) main.cpio.lz4 + extra.cpio.lz4 = pad0.lz4 3) main.cpio.gzip + pad4 + extra.cpio.gzip = pad4.gzip 4) main.cpio.lz4 + pad4 + extra.cpio.lz4 = pad4.lz4 The pad4 test-cases replicate the initrd load by grub, as it pads and aligns every initrd it loads. All of the above boot, however /test-file was not accessible in the initrd for the testcase #4, as decoding in lz4 decompressor failed. Also an error message printed which usually is harmless. Whith a patched kernel, all of the above testcases now pass, and /test-file is accessible. This fixes lz4 initrd decompress warning on every boot with grub. And more importantly this fixes inability to load multiple lz4 compressed initrds with grub. This patch has been shipping in Ubuntu kernels since January 2021. [1] ./Documentation/driver-api/early-userspace/buffer-format.rst BugLink: https://bugs.launchpad.net/bugs/1835660 Link: https://lore.kernel.org/lkml/20210114200256.196589-1-xnox@ubuntu.com/ # v0 Link: https://lkml.kernel.org/r/20210513104831.432975-1-dimitri.ledkov@canonical.com Signed-off-by: Dimitri John Ledkov Cc: Kyungsik Lee Cc: Yinghai Lu Cc: Bongkyu Kim Cc: Kees Cook Cc: Sven Schmidt <4sschmid@informatik.uni-hamburg.de> Cc: Rajat Asthana Cc: Nick Terrell Cc: Gao Xiang Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Sasha Levin

iov_iter_advance(): use consistent semantics for move past the end

2021-07-20T14:00:16+00:00

[ Upstream commit 3b3fc051cd2cba42bf736fa62780857d251a1236 ] asking to advance by more than we have left in the iov_iter should move to the very end; it should *not* leave negative i->count and it should not spew into syslog, etc. - it's a legitimate operation. Signed-off-by: Al Viro Signed-off-by: Sasha Levin

seq_buf: Fix overflow in seq_buf_putmem_hex()

2021-07-19T08:04:52+00:00

commit d3b16034a24a112bb83aeb669ac5b9b01f744bb7 upstream. There's two variables being increased in that loop (i and j), and i follows the raw data, and j follows what is being written into the buffer. We should compare 'i' to MAX_MEMHEX_BYTES or compare 'j' to HEX_CHARS. Otherwise, if 'j' goes bigger than HEX_CHARS, it will overflow the destination buffer. Link: https://lore.kernel.org/lkml/20210625122453.5e2fe304@oasis.local.home/ Link: https://lkml.kernel.org/r/20210626032156.47889-1-yun.zhou@windriver.com Cc: stable@vger.kernel.org Fixes: 5e3ca0ec76fce ("ftrace: introduce the "hex" output method") Signed-off-by: Yun Zhou Signed-off-by: Steven Rostedt (VMware) Signed-off-by: Greg Kroah-Hartman

lib/math/rational.c: fix divide by zero

2021-07-14T15:07:48+00:00

[ Upstream commit 65a0d3c14685663ba111038a35db70f559e39336 ] If the input is out of the range of the allowed values, either larger than the largest value or closer to zero than the smallest non-zero allowed value, then a division by zero would occur. In the case of input too large, the division by zero will occur on the first iteration. The best result (largest allowed value) will be found by always choosing the semi-convergent and excluding the denominator based limit when finding it. In the case of the input too small, the division by zero will occur on the second iteration. The numerator based semi-convergent should not be calculated to avoid the division by zero. But the semi-convergent vs previous convergent test is still needed, which effectively chooses between 0 (the previous convergent) vs the smallest allowed fraction (best semi-convergent) as the result. Link: https://lkml.kernel.org/r/20210525144250.214670-1-tpiepho@gmail.com Fixes: 323dd2c3ed0 ("lib/math/rational.c: fix possible incorrect result from rational fractions helper") Signed-off-by: Trent Piepho Reported-by: Yiyuan Guo Reviewed-by: Andy Shevchenko Cc: Oskar Schirmer Cc: Daniel Latypov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Sasha Levin

kunit: Fix result propagation for parameterised tests

2021-07-14T15:07:39+00:00

[ Upstream commit 384426bd101cb3cd580b18de19d4891ec5ca5bf9 ] When one parameter of a parameterised test failed, its failure would be propagated to the overall test, but not to the suite result (unless it was the last parameter). This is because test_case->success was being reset to the test->success result after each parameter was used, so a failing test's result would be overwritten by a non-failing result. The overall test result was handled in a third variable, test_result, but this was discarded after the status line was printed. Instead, just propagate the result after each parameter run. Signed-off-by: David Gow Fixes: fadb08e7c750 ("kunit: Support for Parameterized Testing") Reviewed-by: Marco Elver Reviewed-by: Brendan Higgins Signed-off-by: Shuah Khan Signed-off-by: Sasha Levin

lockdep/selftests: Fix selftests vs PROVE_RAW_LOCK_NESTING

2021-07-14T15:06:48+00:00

[ Upstream commit c0c2c0dad6a06e0c05e9a52d65f932bd54364c97 ] When PROVE_RAW_LOCK_NESTING=y many of the selftests FAILED because HARDIRQ context is out-of-bounds for spinlocks. Instead make the default hardware context the threaded hardirq context, which preserves the old locking rules. The wait-type specific locking selftests will have a non-threaded HARDIRQ variant. Fixes: de8f5e4f2dc1 ("lockdep: Introduce wait-type checks") Signed-off-by: Peter Zijlstra (Intel) Tested-by: Joerg Roedel Link: https://lore.kernel.org/r/20210617190313.322096283@infradead.org Signed-off-by: Sasha Levin