kernel/linux.git/lib, branch v4.9.168

ARM: 8833/1: Ensure that NEON code always compiles with Clang

2019-04-05T20:29:11+00:00

[ Upstream commit de9c0d49d85dc563549972edc5589d195cd5e859 ] While building arm32 allyesconfig, I ran into the following errors: arch/arm/lib/xor-neon.c:17:2: error: You should compile this file with '-mfloat-abi=softfp -mfpu=neon' In file included from lib/raid6/neon1.c:27: /home/nathan/cbl/prebuilt/lib/clang/8.0.0/include/arm_neon.h:28:2: error: "NEON support not enabled" Building V=1 showed NEON_FLAGS getting passed along to Clang but __ARM_NEON__ was not getting defined. Ultimately, it boils down to Clang only defining __ARM_NEON__ when targeting armv7, rather than armv6k, which is the '-march' value for allyesconfig. >From lib/Basic/Targets/ARM.cpp in the Clang source: // This only gets set when Neon instructions are actually available, unlike // the VFP define, hence the soft float and arch check. This is subtly // different from gcc, we follow the intent which was that it should be set // when Neon instructions are actually available. if ((FPU & NeonFPU) && !SoftFloat && ArchVersion >= 7) { Builder.defineMacro("__ARM_NEON", "1"); Builder.defineMacro("__ARM_NEON__"); // current AArch32 NEON implementations do not support double-precision // floating-point even when it is present in VFP. Builder.defineMacro("__ARM_NEON_FP", "0x" + Twine::utohexstr(HW_FP & ~HW_FP_DP)); } Ard Biesheuvel recommended explicitly adding '-march=armv7-a' at the beginning of the NEON_FLAGS definitions so that __ARM_NEON__ always gets definined by Clang. This doesn't functionally change anything because that code will only run where NEON is supported, which is implicitly armv7. Link: https://github.com/ClangBuiltLinux/linux/issues/287 Suggested-by: Ard Biesheuvel Signed-off-by: Nathan Chancellor Acked-by: Nicolas Pitre Reviewed-by: Nick Desaulniers Reviewed-by: Stefan Agner Signed-off-by: Russell King Signed-off-by: Sasha Levin

kprobes: Prohibit probing on bsearch()

2019-04-05T20:29:11+00:00

[ Upstream commit 02106f883cd745523f7766d90a739f983f19e650 ] Since kprobe breakpoing handler is using bsearch(), probing on this routine can cause recursive breakpoint problem. int3 ->do_int3() ->ftrace_int3_handler() ->ftrace_location() ->ftrace_location_range() ->bsearch() -> int3 Prohibit probing on bsearch(). Signed-off-by: Andrea Righi Acked-by: Masami Hiramatsu Cc: Alexander Shishkin Cc: Arnaldo Carvalho de Melo Cc: Jiri Olsa Cc: Linus Torvalds Cc: Mathieu Desnoyers Cc: Peter Zijlstra Cc: Steven Rostedt Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/154998813406.31052.8791425358974650922.stgit@devbox Signed-off-by: Ingo Molnar Signed-off-by: Sasha Levin

lib/int_sqrt: optimize initial value compute

2019-04-05T20:29:04+00:00

commit f8ae107eef209bff29a5816bc1aad40d5cd69a80 upstream. The initial value (@m) compute is: m = 1UL << (BITS_PER_LONG - 2); while (m > x) m >>= 2; Which is a linear search for the highest even bit smaller or equal to @x We can implement this using a binary search using __fls() (or better when its hardware implemented). m = 1UL << (__fls(x) & ~1UL); Especially for small values of @x; which are the more common arguments when doing a CDF on idle times; the linear search is near to worst case, while the binary search of __fls() is a constant 6 (or 5 on 32bit) branches. cycles: branches: branch-misses: PRE: hot: 43.633557 +- 0.034373 45.333132 +- 0.002277 0.023529 +- 0.000681 cold: 207.438411 +- 0.125840 45.333132 +- 0.002277 6.976486 +- 0.004219 SOFTWARE FLS: hot: 29.576176 +- 0.028850 26.666730 +- 0.004511 0.019463 +- 0.000663 cold: 165.947136 +- 0.188406 26.666746 +- 0.004511 6.133897 +- 0.004386 HARDWARE FLS: hot: 24.720922 +- 0.025161 20.666784 +- 0.004509 0.020836 +- 0.000677 cold: 132.777197 +- 0.127471 20.666776 +- 0.004509 5.080285 +- 0.003874 Averages computed over all values <128k using a LFSR to generate order. Cold numbers have a LFSR based branch trace buffer 'confuser' ran between each int_sqrt() invocation. Link: http://lkml.kernel.org/r/20171020164644.936577234@infradead.org Signed-off-by: Peter Zijlstra (Intel) Suggested-by: Joe Perches Acked-by: Will Deacon Acked-by: Linus Torvalds Cc: Anshul Garg Cc: Davidlohr Bueso Cc: David Miller Cc: Ingo Molnar Cc: Kees Cook Cc: Matthew Wilcox Cc: Michael Davidson Cc: Thomas Gleixner Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Cc: Joe Perches Signed-off-by: Greg Kroah-Hartman

lib/int_sqrt: optimize small argument

2019-03-27T05:13:04+00:00

commit 3f3295709edea6268ff1609855f498035286af73 upstream. The current int_sqrt() computation is sub-optimal for the case of small @x. Which is the interesting case when we're going to do cumulative distribution functions on idle times, which we assume to be a random variable, where the target residency of the deepest idle state gives an upper bound on the variable (5e6ns on recent Intel chips). In the case of small @x, the compute loop: while (m != 0) { b = y + m; y >>= 1; if (x >= b) { x -= b; y += m; } m >>= 2; } can be reduced to: while (m > x) m >>= 2; Because y==0, b==m and until x>=m y will remain 0. And while this is computationally equivalent, it runs much faster because there's less code, in particular less branches. cycles: branches: branch-misses: OLD: hot: 45.109444 +- 0.044117 44.333392 +- 0.002254 0.018723 +- 0.000593 cold: 187.737379 +- 0.156678 44.333407 +- 0.002254 6.272844 +- 0.004305 PRE: hot: 67.937492 +- 0.064124 66.999535 +- 0.000488 0.066720 +- 0.001113 cold: 232.004379 +- 0.332811 66.999527 +- 0.000488 6.914634 +- 0.006568 POST: hot: 43.633557 +- 0.034373 45.333132 +- 0.002277 0.023529 +- 0.000681 cold: 207.438411 +- 0.125840 45.333132 +- 0.002277 6.976486 +- 0.004219 Averages computed over all values <128k using a LFSR to generate order. Cold numbers have a LFSR based branch trace buffer 'confuser' ran between each int_sqrt() invocation. Link: http://lkml.kernel.org/r/20171020164644.876503355@infradead.org Fixes: 30493cc9dddb ("lib/int_sqrt.c: optimize square root algorithm") Signed-off-by: Peter Zijlstra (Intel) Suggested-by: Anshul Garg Acked-by: Linus Torvalds Cc: Davidlohr Bueso Cc: Thomas Gleixner Cc: Ingo Molnar Cc: Will Deacon Cc: Joe Perches Cc: David Miller Cc: Matthew Wilcox Cc: Kees Cook Cc: Michael Davidson Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Arnd Bergmann Signed-off-by: Greg Kroah-Hartman

assoc_array: Fix shortcut creation

2019-03-23T12:19:42+00:00

[ Upstream commit bb2ba2d75a2d673e76ddaf13a9bd30d6a8b1bb08 ] Fix the creation of shortcuts for which the length of the index key value is an exact multiple of the machine word size. The problem is that the code that blanks off the unused bits of the shortcut value malfunctions if the number of bits in the last word equals machine word size. This is due to the "<<" operator being given a shift of zero in this case, and so the mask that should be all zeros is all ones instead. This causes the subsequent masking operation to clear everything rather than clearing nothing. Ordinarily, the presence of the hash at the beginning of the tree index key makes the issue very hard to test for, but in this case, it was encountered due to a development mistake that caused the hash output to be either 0 (keyring) or 1 (non-keyring) only. This made it susceptible to the keyctl/unlink/valid test in the keyutils package. The fix is simply to skip the blanking if the shift would be 0. For example, an index key that is 64 bits long would produce a 0 shift and thus a 'blank' of all 1s. This would then be inverted and AND'd onto the index_key, incorrectly clearing the entire last word. Fixes: 3cb989501c26 ("Add a generic associative array implementation.") Signed-off-by: David Howells Signed-off-by: James Morris Signed-off-by: Sasha Levin

seq_buf: Make seq_buf_puts() null-terminate the buffer

2019-02-12T18:44:57+00:00

[ Upstream commit 0464ed24380905d640030d368cd84a4e4d1e15e2 ] Currently seq_buf_puts() will happily create a non null-terminated string for you in the buffer. This is particularly dangerous if the buffer is on the stack. For example: char buf[8]; char secret = "secret"; struct seq_buf s; seq_buf_init(&s, buf, sizeof(buf)); seq_buf_puts(&s, "foo"); printk("Message is %s\n", buf); Can result in: Message is fooªªªªªsecret We could require all users to memset() their buffer to zero before use. But that seems likely to be forgotten and lead to bugs. Instead we can change seq_buf_puts() to always leave the buffer in a null-terminated state. The only downside is that this makes the buffer 1 character smaller for seq_buf_puts(), but that seems like a good trade off. Link: http://lkml.kernel.org/r/20181019042109.8064-1-mpe@ellerman.id.au Acked-by: Kees Cook Signed-off-by: Michael Ellerman Signed-off-by: Steven Rostedt (VMware) Signed-off-by: Sasha Levin

lib/interval_tree_test.c: allow users to limit scope of endpoint

2018-12-21T13:11:30+00:00

[ Upstream commit a8ec14d4f6aa8e245efacc992c8ee6ea0464ce2a ] Add a 'max_endpoint' parameter such that users may easily limit the size of the intervals that are randomly generated. Link: http://lkml.kernel.org/r/20170518174936.20265-4-dave@stgolabs.net Signed-off-by: Davidlohr Bueso Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Sasha Levin

lib/rbtree-test: lower default params

2018-12-21T13:11:30+00:00

[ Upstream commit 0b548e33e6cb2bff240fdaf1783783be15c29080 ] Fengguang reported soft lockups while running the rbtree and interval tree test modules. The logic for these tests all occur in init phase, and we currently are pounding with the default values for number of nodes and number of iterations of each test. Reduce the latter by two orders of magnitude. This does not influence the value of the tests in that one thousand times by default is enough to get the picture. Link: http://lkml.kernel.org/r/20171109161715.xai2dtwqw2frhkcm@linux-n805 Signed-off-by: Davidlohr Bueso Reported-by: Fengguang Wu Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Sasha Levin

lib/rbtree_test.c: make input module parameters

2018-12-21T13:11:30+00:00

[ Upstream commit 223f8911eace60c787f8767c25148b80ece9732a ] Allows for more flexible debugging. Link: http://lkml.kernel.org/r/20170719014603.19029-5-dave@stgolabs.net Signed-off-by: Davidlohr Bueso Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Sasha Levin

lib/interval_tree_test.c: allow full tree search

2018-12-21T13:11:30+00:00

[ Upstream commit c46ecce431ebe6b1a9551d1f530eb432dae5c39b ] ... such that a user can specify visiting all the nodes in the tree (intersects with the world). This is a nice opposite from the very basic default query which is a single point. Link: http://lkml.kernel.org/r/20170518174936.20265-5-dave@stgolabs.net Signed-off-by: Davidlohr Bueso Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Sasha Levin