path: root/arch/powerpc/include/asm/bitops.h
Age | Commit message | Author | Files | Lines
2022-01-23 | Merge tag 'bitmap-5.17-rc1' of git://github.com/norov/linux | Linus Torvalds | 1 | -2/+0
Pull bitmap updates from Yury Norov:
 - introduce for_each_set_bitrange()
 - use find_first_*_bit() instead of find_next_*_bit() where possible
 - unify for_each_bit() macros
* tag 'bitmap-5.17-rc1' of git://github.com/norov/linux:
  vsprintf: rework bitmap_list_string
  lib: bitmap: add performance test for bitmap_print_to_pagebuf
  bitmap: unify find_bit operations
  mm/percpu: micro-optimize pcpu_is_populated()
  Replace for_each_*_bit_from() with for_each_*_bit() where appropriate
  find: micro-optimize for_each_{set,clear}_bit()
  include/linux: move for_each_bit() macros from bitops.h to find.h
  cpumask: replace cpumask_next_* with cpumask_first_* where appropriate
  tools: sync tools/bitmap with mother linux
  all: replace find_next{,_zero}_bit with find_first{,_zero}_bit where appropriate
  cpumask: use find_first_and_bit()
  lib: add find_first_and_bit()
  arch: remove GENERIC_FIND_FIRST_BIT entirely
  include: move find.h from asm_generic to linux
  bitops: move find_bit_*_le functions from le.h to find.h
  bitops: protect find_first_{,zero}_bit properly
2022-01-15 | include: move find.h from asm_generic to linux | Yury Norov | 1 | -2/+0
The find_bit API and the bitmap API are closely related, but their inclusion paths differ: include/asm-generic and include/linux, respectively. In the past this caused a lot of trouble due to circular dependencies and/or undefined symbols. Fix this by moving find.h under include/linux.
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
2021-11-30 | powerpc/bitops: Use immediate operand when possible | Christophe Leroy | 1 | -8/+81
Today we get the following code generation for bitops like set or clear bit:
    c0009fe0: 39 40 08 00 li r10,2048
    c0009fe4: 7c e0 40 28 lwarx r7,0,r8
    c0009fe8: 7c e7 53 78 or r7,r7,r10
    c0009fec: 7c e0 41 2d stwcx. r7,0,r8

    c000d568: 39 00 18 00 li r8,6144
    c000d56c: 7c c0 38 28 lwarx r6,0,r7
    c000d570: 7c c6 40 78 andc r6,r6,r8
    c000d574: 7c c0 39 2d stwcx. r6,0,r7
Most masks used to set bits are constants that fit in the lower 16 bits, so the operation can easily be replaced by its "immediate" version. Allow GCC to choose between the normal or immediate form.
For clear bits, on 32 bits 'rlwinm' can be used instead of 'andc' when all bits to be cleared are consecutive.
On 64 bits there is no equivalent single operation for clearing single bits or a few bits. We'd need two 'rldicl', so it is not worth it: the li/andc sequence does the same.
With this patch we get:
    c0009fe0: 7d 00 50 28 lwarx r8,0,r10
    c0009fe4: 61 08 08 00 ori r8,r8,2048
    c0009fe8: 7d 00 51 2d stwcx. r8,0,r10

    c000d558: 7c e0 40 28 lwarx r7,0,r8
    c000d55c: 54 e7 05 64 rlwinm r7,r7,0,21,18
    c000d560: 7c e0 41 2d stwcx. r7,0,r8
On pmac32_defconfig, it reduces the text by approx 10 kbytes.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e6f815d9181bab09df3b350af51149437863e9f9.1632236981.git.christophe.leroy@csgroup.eu
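The rlwinm form depends on the bits to be cleared being consecutive, a condition that can be tested with plain integer arithmetic. The following is a minimal sketch of such a test and is not taken from the patch: `mask_is_contiguous` is a hypothetical name, and unlike rlwinm itself the sketch does not accept masks that wrap around the word.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): returns true when the set
 * bits of mask form exactly one consecutive, non-wrapping run.
 */
static bool mask_is_contiguous(uint32_t mask)
{
    if (mask == 0)
        return false;

    /* Adding the lowest set bit carries through the lowest run and clears it. */
    uint32_t rest = mask + (mask & (0u - mask));

    /* If that run was the only one, at most a single carry bit remains. */
    return (rest & (rest - 1)) == 0;
}
```

With the value from the listing, `mask_is_contiguous(6144)` holds because 6144 is 0x1800, two adjacent bits, while a mask such as 0x5 is rejected.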
2021-08-25 | powerpc: Use lwarx/ldarx directly instead of PPC_LWARX/LDARX macros | Christophe Leroy | 1 | -4/+4
Force the eh flag at 0 on PPC32. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1fc81f07cabebb875b963e295408cc3dd38c8d85.1614674882.git.christophe.leroy@csgroup.eu
2020-11-19 | powerpc/bitops: Fix possible undefined behaviour with fls() and fls64() | Christophe Leroy | 1 | -2/+21
fls() and fls64() are using __builtin_clz() and __builtin_clzll(). On powerpc, those builtins trivially use the cntlzw and cntlzd power instructions.
Although those instructions provide the expected result with input argument 0, __builtin_clz() and __builtin_clzll() are documented as undefined for value 0.
The easiest fix would be to use the fls() and fls64() functions defined in include/asm-generic/bitops/builtin-fls.h and include/asm-generic/bitops/fls64.h, but GCC output is not optimal:
    00000388 <testfls>:
      388: 2c 03 00 00 cmpwi r3,0
      38c: 41 82 00 10 beq 39c <testfls+0x14>
      390: 7c 63 00 34 cntlzw r3,r3
      394: 20 63 00 20 subfic r3,r3,32
      398: 4e 80 00 20 blr
      39c: 38 60 00 00 li r3,0
      3a0: 4e 80 00 20 blr
    000003b0 <testfls64>:
      3b0: 2c 03 00 00 cmpwi r3,0
      3b4: 40 82 00 1c bne 3d0 <testfls64+0x20>
      3b8: 2f 84 00 00 cmpwi cr7,r4,0
      3bc: 38 60 00 00 li r3,0
      3c0: 4d 9e 00 20 beqlr cr7
      3c4: 7c 83 00 34 cntlzw r3,r4
      3c8: 20 63 00 20 subfic r3,r3,32
      3cc: 4e 80 00 20 blr
      3d0: 7c 63 00 34 cntlzw r3,r3
      3d4: 20 63 00 40 subfic r3,r3,64
      3d8: 4e 80 00 20 blr
When the input of fls(x) is a constant, just check x for nullity and return either 0 or 32 - __builtin_clz(x). Otherwise, use the cntlzw instruction directly.
For fls64() on PPC64, do the same but with __builtin_clzll() and the cntlzd instruction. On PPC32, let's take the generic fls64() which will use our fls().
The result is as expected:
    00000388 <testfls>:
      388: 7c 63 00 34 cntlzw r3,r3
      38c: 20 63 00 20 subfic r3,r3,32
      390: 4e 80 00 20 blr
    000003a0 <testfls64>:
      3a0: 2c 03 00 00 cmpwi r3,0
      3a4: 40 82 00 10 bne 3b4 <testfls64+0x14>
      3a8: 7c 83 00 34 cntlzw r3,r4
      3ac: 20 63 00 20 subfic r3,r3,32
      3b0: 4e 80 00 20 blr
      3b4: 7c 63 00 34 cntlzw r3,r3
      3b8: 20 63 00 40 subfic r3,r3,64
      3bc: 4e 80 00 20 blr
Fixes: 2fcff790dcb4 ("powerpc: Use builtin functions for fls()/__fls()/fls64()")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/348c2d3f19ffcff8abe50d52513f989c4581d000.1603375524.git.christophe.leroy@csgroup.eu
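The constant/non-constant split described in this commit can be sketched in portable C. This is an illustration under stated assumptions, not the kernel code: `fls_sketch` is a hypothetical name, `unsigned int` is assumed to be 32 bits wide, and a counting loop stands in for the cntlzw instruction (which the commit says gives the expected result for a zero input, whereas the builtin is undefined there).

```c
/* Hypothetical sketch; assumes a 32-bit unsigned int and GCC/Clang builtins. */
static inline int fls_sketch(unsigned int x)
{
    int lz;

    /* Constant input: guard the builtin, which is undefined for zero. */
    if (__builtin_constant_p(x))
        return x ? 32 - __builtin_clz(x) : 0;

    /*
     * Non-constant input: portable stand-in for cntlzw. The loop counts
     * leading zero bits and yields 32 when x is zero.
     */
    for (lz = 0; lz < 32 && !(x & (1u << (31 - lz))); lz++)
        ;
    return 32 - lz;
}
```

Both paths return 0 for a zero input, which is the behaviour the fix is meant to guarantee without the extra compare-and-branch of the generic version.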
2020-05-28 | powerpc: Remove IBM405 Erratum #77 | Christophe Leroy | 1 | -4/+0
This erratum is dedicated to IBM 405GP and STB03xxx which are now gone. Remove this erratum. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/44dbc08e9034681eb28324cbabc086e97044c36c.1590079969.git.christophe.leroy@csgroup.eu
2019-11-07 | powerpc: support KASAN instrumentation of bitops | Daniel Axtens | 1 | -22/+29
The powerpc-specific bitops are not being picked up by the KASAN test suite. Instrumentation is done via the bitops/instrumented-{atomic,lock}.h headers. They require that arch-specific versions of bitop functions are renamed to arch_*. Do this renaming. For clear_bit_unlock_is_negative_byte, the current implementation uses the PG_waiters constant. This works because it's a preprocessor macro - so it's only actually evaluated in contexts where PG_waiters is defined. With instrumentation however, it becomes a static inline function, and all of a sudden we need the actual value of PG_waiters. Because of the order of header includes, it's not available and we fail to compile. Instead, manually specify that we care about bit 7. This is still correct: bit 7 is the bit that would mark a negative byte. While we're at it, replace __inline__ with inline across the file. Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Daniel Axtens <dja@axtens.net> Tested-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190820024941.12640-2-dja@axtens.net
2019-05-30 | treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 | Thomas Gleixner | 1 | -5/+1
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 3029 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-30 | powerpc/405: move PPC405_ERR77 in asm-405.h | Christophe Leroy | 1 | -0/+1
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-02 | powerpc: Remove __ilog2()s and use generic ones | Christophe Leroy | 1 | -26/+1
With the __ilog2() function as defined in arch/powerpc/include/asm/bitops.h, GCC will not optimise the code when the parameter is a constant.
The generic ilog2() function in include/linux/log2.h is written to handle the case of a constant parameter.
This patch discards the three __ilog2() functions and defines __ilog2() as ilog2().
For non-constant calls, the generated code is the same:
    int test__ilog2(unsigned long x) { return __ilog2(x); }
    int test__ilog2_u32(u32 n) { return __ilog2_u32(n); }
    int test__ilog2_u64(u64 n) { return __ilog2_u64(n); }
On PPC32 before the patch:
    00000000 <test__ilog2>:
      0: 7c 63 00 34 cntlzw r3,r3
      4: 20 63 00 1f subfic r3,r3,31
      8: 4e 80 00 20 blr
    0000000c <test__ilog2_u32>:
      c: 7c 63 00 34 cntlzw r3,r3
      10: 20 63 00 1f subfic r3,r3,31
      14: 4e 80 00 20 blr
On PPC32 after the patch:
    00000000 <test__ilog2>:
      0: 7c 63 00 34 cntlzw r3,r3
      4: 20 63 00 1f subfic r3,r3,31
      8: 4e 80 00 20 blr
    0000000c <test__ilog2_u32>:
      c: 7c 63 00 34 cntlzw r3,r3
      10: 20 63 00 1f subfic r3,r3,31
      14: 4e 80 00 20 blr
On PPC64 before the patch:
    0000000000000000 <.test__ilog2>:
      0: 7c 63 00 74 cntlzd r3,r3
      4: 20 63 00 3f subfic r3,r3,63
      8: 7c 63 07 b4 extsw r3,r3
      c: 4e 80 00 20 blr
    0000000000000010 <.test__ilog2_u32>:
      10: 7c 63 00 34 cntlzw r3,r3
      14: 20 63 00 1f subfic r3,r3,31
      18: 7c 63 07 b4 extsw r3,r3
      1c: 4e 80 00 20 blr
    0000000000000020 <.test__ilog2_u64>:
      20: 7c 63 00 74 cntlzd r3,r3
      24: 20 63 00 3f subfic r3,r3,63
      28: 7c 63 07 b4 extsw r3,r3
      2c: 4e 80 00 20 blr
On PPC64 after the patch:
    0000000000000000 <.test__ilog2>:
      0: 7c 63 00 74 cntlzd r3,r3
      4: 20 63 00 3f subfic r3,r3,63
      8: 7c 63 07 b4 extsw r3,r3
      c: 4e 80 00 20 blr
    0000000000000010 <.test__ilog2_u32>:
      10: 7c 63 00 34 cntlzw r3,r3
      14: 20 63 00 1f subfic r3,r3,31
      18: 7c 63 07 b4 extsw r3,r3
      1c: 4e 80 00 20 blr
    0000000000000020 <.test__ilog2_u64>:
      20: 7c 63 00 74 cntlzd r3,r3
      24: 20 63 00 3f subfic r3,r3,63
      28: 7c 63 07 b4 extsw r3,r3
      2c: 4e 80 00 20 blr
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
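The cntlzw/subfic pair in the 32-bit listings computes 31 minus the number of leading zeros, which is the integer base-2 logarithm. A minimal sketch of that relation follows; `ilog2_u32_sketch` is a hypothetical name, a 32-bit `unsigned int` is assumed, and this is not the generic ilog2() from include/linux/log2.h.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for a 32-bit ilog2(); x must be non-zero because
 * __builtin_clz() is undefined for a zero argument.
 */
static inline int ilog2_u32_sketch(unsigned int x)
{
    assert(x != 0);
    /* Same arithmetic as "cntlzw r3,r3; subfic r3,r3,31". */
    return 31 - __builtin_clz(x);
}
```

Because the builtin is visible to the compiler, a call with a constant argument can be folded at compile time, which is the property the commit says the old inline-assembly version lacked.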
2017-06-02 | powerpc: Replace ffz() by equivalent generic function | Christophe Leroy | 1 | -19/+1
With the ffz() function as defined in arch/powerpc/include/asm/bitops.h, GCC will not optimise the code when the parameter is a constant. This patch replaces ffz() by the generic function.
The generic ffz(x) expects to never be called with ~x == 0, as written in the comment in include/asm-generic/bitops/ffz.h.
The only user of ffz() within arch/powerpc/ is platforms/512x/mpc5121_ads_cpld.c, which checks that x is not 0xff.
For non-constant calls, the generated code is doing the same:
    unsigned long testffz(unsigned long x) { return ffz(x); }
On PPC32, before the patch:
    00000018 <testffz>:
      18: 7c 63 18 f9 not. r3,r3
      1c: 40 82 00 0c bne 28 <testffz+0x10>
      20: 38 60 00 20 li r3,32
      24: 4e 80 00 20 blr
      28: 7d 23 00 d0 neg r9,r3
      2c: 7d 23 18 38 and r3,r9,r3
      30: 7c 63 00 34 cntlzw r3,r3
      34: 20 63 00 1f subfic r3,r3,31
      38: 4e 80 00 20 blr
On PPC32, after the patch:
    00000018 <testffz>:
      18: 39 23 00 01 addi r9,r3,1
      1c: 7d 23 18 78 andc r3,r9,r3
      20: 7c 63 00 34 cntlzw r3,r3
      24: 20 63 00 1f subfic r3,r3,31
      28: 4e 80 00 20 blr
On PPC64, before the patch:
    0000000000000030 <.testffz>:
      30: 7c 60 18 f9 not. r0,r3
      34: 38 60 00 40 li r3,64
      38: 4d 82 00 20 beqlr
      3c: 7c 60 00 d0 neg r3,r0
      40: 7c 63 00 38 and r3,r3,r0
      44: 7c 63 00 74 cntlzd r3,r3
      48: 20 63 00 3f subfic r3,r3,63
      4c: 7c 63 07 b4 extsw r3,r3
      50: 4e 80 00 20 blr
On PPC64, after the patch:
    0000000000000030 <.testffz>:
      30: 38 03 00 01 addi r0,r3,1
      34: 7c 03 18 78 andc r3,r0,r3
      38: 7c 63 00 74 cntlzd r3,r3
      3c: 20 63 00 3f subfic r3,r3,63
      40: 4e 80 00 20 blr
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
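The addi/andc pair in the "after" listings isolates the lowest clear bit of x as `(x + 1) & ~x`, and the following count gives its position. The sketch below shows that arithmetic in C; `ffz_sketch` is a hypothetical name, and it uses `__builtin_ctzl()` on the isolated bit where the listings use a leading-zero count subtracted from 31 or 63, which gives the same position for a single set bit.

```c
#include <assert.h>

/*
 * Hypothetical sketch: index of the lowest clear bit in x.
 * As with the generic ffz(), x must not be all ones.
 */
static inline unsigned long ffz_sketch(unsigned long x)
{
    assert(~x != 0);

    /* (x + 1) & ~x leaves exactly one bit set: the lowest zero bit of x. */
    unsigned long lowest_zero = (x + 1) & ~x;

    /* Position of that single bit, counted from bit 0. */
    return (unsigned long)__builtin_ctzl(lowest_zero);
}
```

For example, 0xff has its lowest clear bit at index 8, and 0xfe at index 0.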
2017-06-02 | powerpc: Use builtin functions for fls()/__fls()/fls64() | Christophe Leroy | 1 | -21/+3
With the fls() functions as defined in arch/powerpc/include/asm/bitops.h, GCC will not optimise the code when the parameter is a constant.
This patch replaces __fls() by the builtin function, and modifies fls() and fls64() to use builtins instead of inline assembly.
For non-constant calls, the generated code is doing the same:
    int testfls(unsigned int x) { return fls(x); }
    unsigned long test__fls(unsigned long x) { return __fls(x); }
    int testfls64(__u64 x) { return fls64(x); }
On PPC32, before the patch:
    00000064 <testfls>:
      64: 7c 63 00 34 cntlzw r3,r3
      68: 20 63 00 20 subfic r3,r3,32
      6c: 4e 80 00 20 blr
    00000070 <test__fls>:
      70: 7c 63 00 34 cntlzw r3,r3
      74: 20 63 00 1f subfic r3,r3,31
      78: 4e 80 00 20 blr
    0000007c <testfls64>:
      7c: 2c 03 00 00 cmpwi r3,0
      80: 40 82 00 10 bne 90 <testfls64+0x14>
      84: 7c 83 00 34 cntlzw r3,r4
      88: 20 63 00 20 subfic r3,r3,32
      8c: 4e 80 00 20 blr
      90: 7c 63 00 34 cntlzw r3,r3
      94: 20 63 00 40 subfic r3,r3,64
      98: 4e 80 00 20 blr
On PPC32, after the patch:
    00000054 <testfls>:
      54: 7c 63 00 34 cntlzw r3,r3
      58: 20 63 00 20 subfic r3,r3,32
      5c: 4e 80 00 20 blr
    00000060 <test__fls>:
      60: 7c 63 00 34 cntlzw r3,r3
      64: 20 63 00 1f subfic r3,r3,31
      68: 4e 80 00 20 blr
    0000006c <testfls64>:
      6c: 2c 03 00 00 cmpwi r3,0
      70: 41 82 00 10 beq 80 <testfls64+0x14>
      74: 7c 63 00 34 cntlzw r3,r3
      78: 20 63 00 40 subfic r3,r3,64
      7c: 4e 80 00 20 blr
      80: 7c 83 00 34 cntlzw r3,r4
      84: 20 63 00 40 subfic r3,r3,32
      88: 4e 80 00 20 blr
On PPC64, before the patch:
    00000000000000a0 <.testfls>:
      a0: 7c 63 00 34 cntlzw r3,r3
      a4: 20 63 00 20 subfic r3,r3,32
      a8: 7c 63 07 b4 extsw r3,r3
      ac: 4e 80 00 20 blr
    00000000000000b0 <.test__fls>:
      b0: 7c 63 00 74 cntlzd r3,r3
      b4: 20 63 00 3f subfic r3,r3,63
      b8: 7c 63 07 b4 extsw r3,r3
      bc: 4e 80 00 20 blr
    00000000000000c0 <.testfls64>:
      c0: 7c 63 00 74 cntlzd r3,r3
      c4: 20 63 00 40 subfic r3,r3,64
      c8: 7c 63 07 b4 extsw r3,r3
      cc: 4e 80 00 20 blr
On PPC64, after the patch:
    0000000000000090 <.testfls>:
      90: 7c 63 00 34 cntlzw r3,r3
      94: 20 63 00 20 subfic r3,r3,32
      98: 7c 63 07 b4 extsw r3,r3
      9c: 4e 80 00 20 blr
    00000000000000a0 <.test__fls>:
      a0: 7c 63 00 74 cntlzd r3,r3
      a4: 20 63 00 3f subfic r3,r3,63
      a8: 4e 80 00 20 blr
      ac: 60 00 00 00 nop
    00000000000000b0 <.testfls64>:
      b0: 7c 63 00 74 cntlzd r3,r3
      b4: 20 63 00 40 subfic r3,r3,64
      b8: 7c 63 07 b4 extsw r3,r3
      bc: 4e 80 00 20 blr
Those builtins have been in GCC since at least 3.4.6 (see https://gcc.gnu.org/onlinedocs/gcc-3.4.6/gcc/Other-Builtins.html )
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
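The PPC32 testfls64 listings test the high word (r3) first and fall back to the low word (r4) when it is zero. That shape can be sketched in C as follows; `fls32_sketch` and `fls64_sketch` are hypothetical names, and the zero checks are added here because `__builtin_clz()` is undefined for zero, so this is an illustration rather than the patched kernel code.

```c
#include <stdint.h>

/* Hypothetical helper: 1-based index of the highest set bit, 0 for x == 0. */
static inline int fls32_sketch(uint32_t x)
{
    return x ? 32 - __builtin_clz(x) : 0;
}

/* Hypothetical 64-bit version built from two 32-bit halves. */
static inline int fls64_sketch(uint64_t x)
{
    uint32_t high = (uint32_t)(x >> 32);

    /* High word non-zero: the answer lies in bits 33..64. */
    if (high)
        return fls32_sketch(high) + 32;

    /* Otherwise only the low word can contribute. */
    return fls32_sketch((uint32_t)x);
}
```

The sum `fls32_sketch(high) + 32` equals 64 minus the leading-zero count of the high word, matching the `subfic r3,r3,64` in the listings.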
2017-06-02 | powerpc: Discard ffs()/__ffs() function and use builtin functions instead | Christophe Leroy | 1 | -14/+2
With the ffs() function as defined in arch/powerpc/include/asm/bitops.h, GCC will not optimise the code when the parameter is a constant, as shown by the small example below.
    int ffs_test(void) { return 4 << ffs(31); }
    c0012334 <ffs_test>:
      c0012334: 39 20 00 01 li r9,1
      c0012338: 38 60 00 04 li r3,4
      c001233c: 7d 29 00 34 cntlzw r9,r9
      c0012340: 21 29 00 20 subfic r9,r9,32
      c0012344: 7c 63 48 30 slw r3,r3,r9
      c0012348: 4e 80 00 20 blr
With this patch, the same function will compile as follows:
    c0012334 <ffs_test>:
      c0012334: 38 60 00 08 li r3,8
      c0012338: 4e 80 00 20 blr
The same happens with __ffs().
For non-constant calls, the generated code is doing the same, although it is slightly different on 64 bits for ffs():
    unsigned long test__ffs(unsigned long x) { return __ffs(x); }
    int testffs(int x) { return ffs(x); }
On PPC32, before the patch:
    0000003c <test__ffs>:
      3c: 7d 23 00 d0 neg r9,r3
      40: 7d 23 18 38 and r3,r9,r3
      44: 7c 63 00 34 cntlzw r3,r3
      48: 20 63 00 1f subfic r3,r3,31
      4c: 4e 80 00 20 blr
    00000050 <testffs>:
      50: 7d 23 00 d0 neg r9,r3
      54: 7d 23 18 38 and r3,r9,r3
      58: 7c 63 00 34 cntlzw r3,r3
      5c: 20 63 00 20 subfic r3,r3,32
      60: 4e 80 00 20 blr
On PPC32, after the patch:
    0000002c <test__ffs>:
      2c: 7d 23 00 d0 neg r9,r3
      30: 7d 23 18 38 and r3,r9,r3
      34: 7c 63 00 34 cntlzw r3,r3
      38: 20 63 00 1f subfic r3,r3,31
      3c: 4e 80 00 20 blr
    00000040 <testffs>:
      40: 7d 23 00 d0 neg r9,r3
      44: 7d 23 18 38 and r3,r9,r3
      48: 7c 63 00 34 cntlzw r3,r3
      4c: 20 63 00 20 subfic r3,r3,32
      50: 4e 80 00 20 blr
On PPC64, before the patch:
    0000000000000060 <.test__ffs>:
      60: 7c 03 00 d0 neg r0,r3
      64: 7c 03 18 38 and r3,r0,r3
      68: 7c 63 00 74 cntlzd r3,r3
      6c: 20 63 00 3f subfic r3,r3,63
      70: 7c 63 07 b4 extsw r3,r3
      74: 4e 80 00 20 blr
    0000000000000080 <.testffs>:
      80: 7c 03 00 d0 neg r0,r3
      84: 7c 03 18 38 and r3,r0,r3
      88: 7c 63 00 74 cntlzd r3,r3
      8c: 20 63 00 40 subfic r3,r3,64
      90: 7c 63 07 b4 extsw r3,r3
      94: 4e 80 00 20 blr
On PPC64, after the patch:
    0000000000000050 <.test__ffs>:
      50: 7c 03 00 d0 neg r0,r3
      54: 7c 03 18 38 and r3,r0,r3
      58: 7c 63 00 74 cntlzd r3,r3
      5c: 20 63 00 3f subfic r3,r3,63
      60: 4e 80 00 20 blr
    0000000000000070 <.testffs>:
      70: 7c 03 00 d0 neg r0,r3
      74: 7c 03 18 38 and r3,r0,r3
      78: 7c 63 00 34 cntlzw r3,r3
      7c: 20 63 00 20 subfic r3,r3,32
      80: 7c 63 07 b4 extsw r3,r3
      84: 4e 80 00 20 blr
(ffs() operates on an int so cntlzw is equivalent to cntlzd)
In addition, when reading the generated vmlinux, we can observe that with the builtin functions, GCC sometimes efficiently spreads the instructions within the generated functions, while the inline assembly forces them to remain grouped together.
__builtin_ffs() is already used in arch/powerpc/include/asm/page_32.h
Those builtins have been in GCC since at least 3.4.6 (see https://gcc.gnu.org/onlinedocs/gcc-3.4.6/gcc/Other-Builtins.html )
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
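The ffs_test() example from this commit can be reproduced outside the kernel with the GCC builtin. This is a small sketch under that assumption; `ffs_sketch` and `ffs_test_sketch` are hypothetical names standing in for the kernel's ffs() and the commit's test function.

```c
/* Hypothetical wrapper: 1-based index of the lowest set bit, 0 for x == 0. */
static inline int ffs_sketch(int x)
{
    return __builtin_ffs(x);
}

/* Same expression as ffs_test() in the commit message. */
static inline int ffs_test_sketch(void)
{
    /* The lowest set bit of 31 is bit 0, so ffs is 1 and the result is 4 << 1. */
    return 4 << ffs_sketch(31);
}
```

Since the argument is a literal, the compiler is free to fold the whole expression, which corresponds to the single `li r3,8` in the patched listing.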
2017-04-06 | powerpc: Add more PPC bit conversion macros | Benjamin Herrenschmidt | 1 | -0/+8
Add 32 and 8 bit variants Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
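The commit message does not show the macros themselves. As a hedged illustration only, assuming the conversion in question maps IBM-style bit numbers (bit 0 is the most significant bit) to masks, width-specific variants could look like the sketch below. The `SKETCH_BIT*` names are hypothetical and are not the kernel's macro names.

```c
#include <stdint.h>

/*
 * Hypothetical macros, assuming IBM bit numbering where bit 0 is the
 * most significant bit of the 64-, 32- or 8-bit field.
 */
#define SKETCH_BIT64(be) (UINT64_C(1) << (63 - (be)))
#define SKETCH_BIT32(be) (UINT32_C(1) << (31 - (be)))
#define SKETCH_BIT8(be)  (1u << (7 - (be)))
```

Under this assumption, `SKETCH_BIT32(0)` is 0x80000000 and `SKETCH_BIT8(7)` is 0x01.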
2017-03-10 | powerpc/64s: POWER9 machine check handler | Nicholas Piggin | 1 | -0/+4
Add POWER9 machine check handler. There are several new types of errors added, so logging messages for those are also added. This doesn't attempt to reuse any of the P7/8 defines or functions, because that becomes too complex. The better option in future is to use a table driven approach. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-18 | powerpc/64: Implement clear_bit_unlock_is_negative_byte() | Nicholas Piggin | 1 | -0/+28
Commit b91e1302ad9b8 ("mm: optimize PageWaiters bit use for unlock_page()") added a special bitop function to speed up unlock_page(). Implement this for 64-bit powerpc.
This improves the unlock_page() core code from this:
        li 9,1
        lwsync
    1:  ldarx 10,0,3,0
        andc 10,10,9
        stdcx. 10,0,3
        bne- 1b
        ori 2,2,0
        ld 9,0(3)
        andi. 10,9,0x80
        beqlr
        li 4,0
        b wake_up_page_bit
To this:
        li 10,1
        lwsync
    1:  ldarx 9,0,3,0
        andc 9,9,10
        stdcx. 9,0,3
        bne- 1b
        andi. 10,9,0x80
        beqlr
        li 4,0
        b wake_up_page_bit
In a test of elapsed time for dd writing into 16GB of already-dirty pagecache on a POWER8 with 4K pages, which has one unlock_page per 4kB, this patch reduced overhead by 1.1%:
        N    Min     Max     Median  Avg     Stddev
    x   19   2.578   2.619   2.594   2.595   0.011
    +   19   2.552   2.592   2.564   2.565   0.008
    Difference at 95.0% confidence
        -0.030 +/- 0.006
        -1.142% +/- 0.243%
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Made 64-bit only until I can test it properly on 32-bit]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
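The saving in the second listing comes from testing bit 7 (0x80) of the value already held by the atomic sequence instead of reloading the word. A portable sketch of the same idea using C11 atomics is shown below. The function name is hypothetical, release ordering stands in for the lwsync, and the sketch tests the pre-update value, which matches the listing as long as the cleared bit is not bit 7 itself.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical sketch: clear bit nr with release semantics and report
 * whether bit 7 (the sign bit of the low byte) was set in the word.
 */
static inline bool clear_unlock_is_negative_byte_sketch(unsigned long nr,
                                                        atomic_ulong *word)
{
    unsigned long mask = 1UL << nr;

    /* One atomic read-modify-write; no second load of *word is needed. */
    unsigned long old = atomic_fetch_and_explicit(word, ~mask,
                                                  memory_order_release);

    /* 0x80 is the same constant as in the "andi. 10,9,0x80" above. */
    return (old & 0x80UL) != 0;
}
```

A caller in the style of unlock_page() would clear its lock bit this way and only take the wake-up path when the function returns true.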
2014-11-12 | powerpc: Fix comment typos in arch/powerpc/include/asm/bitops.h | Boqun Feng | 1 | -2/+2
In arch/powerpc/include/asm/bitops.h, the comments about bit numbers in large (> 1 word) bitmaps have two typos:
 - On ppc64 system, the LSB of the 4th word should be bit 192 rather than 196, because if it's bit 196, bit 192-195 will be missing in the bitmap.
 - On ppc32 system, the LSB of the second word should be bit 32 rather than 31, because bit 31 is already in the first word.
This patch fixes these typos.
Signed-off-by: Boqun Feng <boqun.feng@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
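The two corrected numbers follow from simple arithmetic: word n (counting from 1) of a bitmap starts at bit (n - 1) times the word size. A tiny sketch makes the check explicit; `first_bit_of_word` is a hypothetical helper, not something defined in bitops.h.

```c
/*
 * Hypothetical helper: lowest bit number stored in the word with
 * 1-based index word_no, for a given word size in bits.
 */
static inline unsigned int first_bit_of_word(unsigned int word_no,
                                             unsigned int bits_per_long)
{
    return (word_no - 1) * bits_per_long;
}
```

For the 4th word on ppc64 this gives 3 * 64 = 192, and for the second word on ppc32 it gives 1 * 32 = 32, the values the patch puts into the comments.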
2014-11-10 | powerpc: make __ffs return unsigned long | Anton Blanchard | 1 | -1/+1
I'm seeing a build warning in mm/nobootmem.c after removing bootmem:
    mm/nobootmem.c: In function '__free_pages_memory':
    include/linux/kernel.h:713:17: warning: comparison of distinct pointer types lacks a cast [enabled by default]
      (void) (&_min1 == &_min2); \
                     ^
    mm/nobootmem.c:90:11: note: in expansion of macro 'min'
      order = min(MAX_ORDER - 1UL, __ffs(start));
              ^
The rest of the world seems to define __ffs as returning unsigned long, so let's do that.
Signed-off-by: Anton Blanchard <anton@samba.org>
Tested-by: Emil Medve <Emilian.Medve@Freescale.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-04-18 | arch,powerpc: Convert smp_mb__*() | Peter Zijlstra | 1 | -5/+1
Powerpc allows reordering over its ll/sc implementation. Implement the two new barriers as appropriate. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/n/tip-gg2ffgq32sjgy9b8lj6m3hsc@git.kernel.org Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-kernel@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-12-05 | powerpc/book3s: Flush SLB/TLBs if we get SLB/TLB machine check errors on power7. | Mahesh Salgaonkar | 1 | -0/+5
If we get a machine check exception due to SLB or TLB errors, then flush SLBs/TLBs and reload SLBs to recover. We do this in real mode before turning on MMU. Otherwise we would run into nested machine checks. If we get a machine check when we are in guest, then just flush the SLBs and continue. This patch handles errors for power7. The next patch will handle errors for power8 Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-18 | powerpc: Remove unused postfix parameter to DEFINE_BITOP() | Michael Ellerman | 1 | -6/+5
None of the users of DEFINE_BITOP pass a postfix, and as far as I can tell none ever did, so drop it. Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
2013-03-05 | powerpc: Remove unused BITOP_LE_SWIZZLE macro | Akinobu Mita | 1 | -2/+0
The BITOP_LE_SWIZZLE macro was used in the little-endian bitops functions for powerpc. But these functions were converted to generic bitops and the BITOP_LE_SWIZZLE is not used anymore. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 | powerpc: Use asm-generic/bitops/le.h | Akinobu Mita | 1 | -54/+1
The only difference between powerpc and asm-generic le-bitops is test_bit_le(). Usually all bitops require a long aligned bitmap. But powerpc test_bit_le() can take an unaligned address. There is no special callsite of test_bit_le() that needs unaligned access in powerpc as far as I can see. So convert to use asm-generic/bitops/le.h for powerpc. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
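One way a little-endian bit test can tolerate an unaligned address is to index the bitmap byte by byte rather than by long. The sketch below illustrates that approach as an assumption about how such a function can work; it is not the removed powerpc code, and `test_bit_le_sketch` is a hypothetical name.

```c
#include <stddef.h>

/*
 * Hypothetical sketch: test bit nr of a little-endian bitmap one byte at
 * a time, so addr needs no particular alignment.
 */
static inline int test_bit_le_sketch(size_t nr, const void *addr)
{
    const unsigned char *bytes = addr;

    /* Byte nr / 8 holds the bit; bit nr % 8 within that byte, LSB first. */
    return (bytes[nr >> 3] >> (nr & 7)) & 1;
}
```

With the bytes `{0x01, 0x80}`, bits 0 and 15 are set and all others are clear.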
2012-11-15 | powerpc: Remove BITOP_MASK and BITOP_WORD from asm/bitops.h | Akinobu Mita | 1 | -11/+9
Replace BITOP_MASK and BITOP_WORD with BIT_MASK and BIT_WORD defined in linux/bitops.h and remove BITOP_* which are not used anymore. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-10-05 | powerpc: bitops: introduce {clear,set}_bit_le() | Takuya Yoshikawa | 1 | -0/+10
Needed to replace test_and_set_bit_le() in virt/kvm/kvm_main.c which is being used for this missing function. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Avi Kivity <avi@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-17 | powerpc: Fix atomic_xxx_return barrier semantics | Benjamin Herrenschmidt | 1 | -6/+6
The Documentation/memory-barriers.txt document requires that atomic operations that return a value act as a memory barrier both before and after the actual atomic operation.
Our current implementation doesn't guarantee this. More specifically, while a load following the isync cannot be issued before stwcx. has completed, that completion doesn't architecturally mean that the result of stwcx. is visible to other processors (or any previous stores for that matter) (typically, the other processors' L1 caches can still hold the old value).
This has caused an actual crash in RCU torture testing on Power 7.
This fixes it by changing those atomic ops to use new macros instead of RELEASE/ACQUIRE barriers, called ATOMIC_ENTRY and ATOMIC_EXIT barriers, which are then defined respectively to lwsync and sync.
I haven't had a chance to measure the performance impact (or rather what I measured with kernel compiles is in the noise, I have yet to find a more precise benchmark).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-07-27 | asm-generic: add another generic ext2 atomic bitops | Akinobu Mita | 1 | -4/+1
The majority of architectures implement ext2 atomic bitops as test_and_{set,clear}_bit() without spinlock. This adds this type of generic implementation in ext2-atomic-setbit.h and use it wherever possible. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Suggested-by: Andreas Dilger <adilger@dilger.ca> Suggested-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-31 | Fix common misspellings | Lucas De Marchi | 1 | -2/+2
Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2011-03-24 | bitops: remove minix bitops from asm/bitops.h | Akinobu Mita | 1 | -14/+0
minix bit operations are only used by the minix filesystem and are of no use to other modules. The byte order of inode and block bitmaps differs between architectures, as listed below:
    m68k:
        big-endian 16bit indexed bitmaps
    h8300, microblaze, s390, sparc, m68knommu:
        big-endian 32 or 64bit indexed bitmaps
    m32r, mips, sh, xtensa:
        big-endian 32 or 64bit indexed bitmaps for big-endian mode
        little-endian bitmaps for little-endian mode
    Others:
        little-endian bitmaps
In order to move minix bit operations from asm/bitops.h to architecture independent code in minix filesystem, this provides two config options.
CONFIG_MINIX_FS_BIG_ENDIAN_16BIT_INDEXED is only selected by m68k.
CONFIG_MINIX_FS_NATIVE_ENDIAN is selected by the architectures which use native byte order bitmaps (h8300, microblaze, s390, sparc, m68knommu, m32r, mips, sh, xtensa). The architectures which always use little-endian bitmaps do not select these options.
Finally, we can remove minix bit operations from asm/bitops.h for all architectures.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Greg Ungerer <gerg@uclinux.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Michal Simek <monstr@monstr.eu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Cc: Chris Zankel <chris@zankel.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-24 | bitops: remove ext2 non-atomic bitops from asm/bitops.h | Akinobu Mita | 1 | -14/+0
As the result of conversions, there are no users of ext2 non-atomic bit operations except for ext2 filesystem itself. Now we can put them into architecture independent code in ext2 filesystem, and remove from asm/bitops.h for all architectures. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-24 | powerpc: introduce little-endian bitops | Akinobu Mita | 1 | -23/+38
Introduce little-endian bit operations by renaming existing powerpc native little-endian bit operations and changing them to take any pointer types. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-24 | asm-generic: change little-endian bitops to take any pointer types | Akinobu Mita | 1 | -2/+2
This makes the little-endian bitops take any pointer types by changing the prototypes and adding casts in the preprocessor macros. That would seem to at least make all the filesystem code happier, and they can continue to do just something like #define ext2_set_bit __test_and_set_bit_le (or whatever the exact sequence ends up being). Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Mikael Starvik <starvik@axis.com> Cc: David Howells <dhowells@redhat.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Matthew Wilcox <willy@debian.org> Cc: Grant Grundler <grundler@parisc-linux.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp> Cc: Hirokazu Takata <takata@linux-m32r.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Chris Zankel <chris@zankel.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-24 | asm-generic: rename generic little-endian bitops functions | Akinobu Mita | 1 | -7/+8
As a preparation for providing little-endian bitops for all architectures, this renames the generic implementation of little-endian bitops (remove "generic_" prefix and postfix "_le"):
    s/generic_find_next_le_bit/find_next_bit_le/
    s/generic_find_next_zero_le_bit/find_next_zero_bit_le/
    s/generic_find_first_zero_le_bit/find_first_zero_bit_le/
    s/generic___test_and_set_le_bit/__test_and_set_bit_le/
    s/generic___test_and_clear_le_bit/__test_and_clear_bit_le/
    s/generic_test_le_bit/test_bit_le/
    s/generic___set_le_bit/__set_bit_le/
    s/generic___clear_le_bit/__clear_bit_le/
    s/generic_test_and_set_le_bit/test_and_set_bit_le/
    s/generic_test_and_clear_le_bit/test_and_clear_bit_le/
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Cc: Greg Ungerer <gerg@uclinux.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-11-29 | powerpc: Add support for popcnt instructions | Anton Blanchard | 1 | -0/+9
POWER5 added popcntb, and POWER7 added popcntw and popcntd. As a first step this patch does all the work out of line, but it would be nice to implement them as inlines with an out of line fallback. The performance issue with hweight was noticed when disabling SMT on a large (192 thread) POWER7 box. The patch improves that testcase by about 8%. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
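For context, the operation these instructions accelerate is the Hamming weight of a word. The sketch below is a well-known software fallback written for illustration, not the kernel's hweight code; `hweight32_sketch` is a hypothetical name. Its third step leaves one count per byte, which is broadly the kind of result a per-byte instruction such as popcntb produces before the final sum.

```c
#include <stdint.h>

/* Hypothetical software fallback: number of set bits in a 32-bit word. */
static inline unsigned int hweight32_sketch(uint32_t w)
{
    /* Count bits in pairs, then in nibbles. */
    w = w - ((w >> 1) & 0x55555555u);
    w = (w & 0x33333333u) + ((w >> 2) & 0x33333333u);

    /* Each byte now holds the population count of that byte. */
    w = (w + (w >> 4)) & 0x0f0f0f0fu;

    /* Sum the four byte counts into the top byte and extract it. */
    return (w * 0x01010101u) >> 24;
}
```

On hardware with popcntw or popcntd the whole function collapses to a single instruction, which is where the reported improvement would come from.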
2010-02-17 | powerpc: Rename LWSYNC_ON_SMP to PPC_RELEASE_BARRIER, ISYNC_ON_SMP to PPC_ACQUIRE_BARRIER | Anton Blanchard | 1 | -6/+10
For performance reasons we are about to change ISYNC_ON_SMP to sometimes be lwsync. Now that the macro name doesn't make sense, change it and LWSYNC_ON_SMP to better explain what the barriers are doing.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-02-17 | powerpc: Use lwarx/ldarx hint in bit locks | Anton Blanchard | 1 | -24/+24
This patch implements the lwarx/ldarx hint bit for bit locks. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-20 | powerpc: expose the multi-bit ops that underlie single-bit ops. | Geoff Thorpe | 1 | -134/+62
The bitops.h functions that operate on a single bit in a bitfield are implemented by operating on the corresponding word location. In all cases the inner logic is valid if the mask being applied has more than one bit set, so this patch exposes those inner operations. Indeed, set_bits() was already available, but it duplicated code from set_bit() (rather than making the latter a wrapper) - it was also missing the PPC405_ERR77() workaround and the "volatile" address qualifier present in other APIs. This corrects that, and exposes the other multi-bit equivalents. One advantage of these multi-bit forms is that they allow word-sized variables to essentially be their own spinlocks, eg. very useful for state machines where an atomic "flags" variable can obviate the need for any additional locking. Signed-off-by: Geoff Thorpe <geoff@geoffthorpe.net> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
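The relationship between the single-bit and multi-bit forms can be sketched with C11 atomics on a single word. This is an illustration only: the `_sketch` names are hypothetical, the kernel versions are built on lwarx/stwcx. loops rather than `<stdatomic.h>`, and the real single-bit functions also select the word within a larger bitmap.

```c
#include <stdatomic.h>

/* Hypothetical multi-bit primitives operating on one word. */
static inline void set_bits_sketch(unsigned long mask, atomic_ulong *p)
{
    atomic_fetch_or_explicit(p, mask, memory_order_relaxed);
}

static inline void clear_bits_sketch(unsigned long mask, atomic_ulong *p)
{
    atomic_fetch_and_explicit(p, ~mask, memory_order_relaxed);
}

/* Returns the bits of mask that were already set before the update. */
static inline unsigned long test_and_set_bits_sketch(unsigned long mask,
                                                     atomic_ulong *p)
{
    return atomic_fetch_or(p, mask) & mask;
}

/* The single-bit form becomes a thin wrapper with a one-bit mask. */
static inline void set_bit_sketch(unsigned int nr, atomic_ulong *p)
{
    set_bits_sketch(1UL << nr, p);
}
```

A state machine can then claim several flags in one step with `test_and_set_bits_sketch()` and inspect the return value to see which were already held, which is the "word as its own spinlock" use the commit describes.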
2008-08-04 | powerpc: Move include files to arch/powerpc/include/asm | Stephen Rothwell | 1 | -0/+410
from include/asm-powerpc. This is the result of a
    mkdir arch/powerpc/include/asm
    git mv include/asm-powerpc/* arch/powerpc/include/asm
Followed by a few documentation/comment fixups and a couple of places where <asm-powerpc/...> was being used explicitly. Of the latter only one was outside the arch code and it is a driver only built for powerpc.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>