<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/arch/powerpc/include/asm/bitops.h, branch v6.19.11</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.19.11</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.19.11'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2025-09-08T21:58:51+00:00</updated>
<entry>
<title>powerpc: Add __attribute_const__ to ffs()-family implementations</title>
<updated>2025-09-08T21:58:51+00:00</updated>
<author>
<name>Kees Cook</name>
<email>kees@kernel.org</email>
</author>
<published>2025-08-04T16:44:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=69057d3db759cc260aee4ab189b76ae91fbc18f9'/>
<id>urn:sha1:69057d3db759cc260aee4ab189b76ae91fbc18f9</id>
<content type='text'>
While tracking down a problem where constant expressions used by
BUILD_BUG_ON() suddenly stopped working[1], we found that an added static
initializer was convincing the compiler that it couldn't track the state
of the prior statically initialized value. Tracing this down found that
ffs() was used in the initializer macro, but since it wasn't marked with
__attribute__const__, the compiler had to assume the function might
change variable states as a side-effect (which is not true for ffs(),
which provides deterministic math results).

Add missing __attribute_const__ annotations to PowerPC's implementations of
fls() function. These are pure mathematical functions that always return
the same result for the same input with no side effects, making them eligible
for compiler optimization.

Build tested ARCH=powerpc defconfig with GCC powerpc-linux-gnu 14.2.0.

Link: https://github.com/KSPP/linux/issues/364 [1]
Acked-by: Madhavan Srinivasan &lt;maddy@linux.ibm.com&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lore.kernel.org/r/20250804164417.1612371-5-kees@kernel.org
Signed-off-by: Kees Cook &lt;kees@kernel.org&gt;
</content>
</entry>
<entry>
<title>powerpc: implement arch_xor_unlock_is_negative_byte on 32-bit</title>
<updated>2023-10-18T21:34:17+00:00</updated>
<author>
<name>Matthew Wilcox (Oracle)</name>
<email>willy@infradead.org</email>
</author>
<published>2023-10-04T16:53:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=51a752c28bcf901618bbc25a43f84ef539f9e682'/>
<id>urn:sha1:51a752c28bcf901618bbc25a43f84ef539f9e682</id>
<content type='text'>
Simply remove the ifdef.  The assembly is identical to that in the
non-optimised case of test_and_clear_bits() on PPC32, and it's not clear
to me how the PPC32 optimisation works, nor whether it would work for
arch_xor_unlock_is_negative_byte().  If that optimisation would work,
someone can implement it later, but this is more efficient than the
implementation in filemap.c.

Link: https://lkml.kernel.org/r/20231004165317.1061855-12-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Cc: Albert Ou &lt;aou@eecs.berkeley.edu&gt;
Cc: Alexander Gordeev &lt;agordeev@linux.ibm.com&gt;
Cc: Andreas Dilger &lt;adilger.kernel@dilger.ca&gt;
Cc: Christian Borntraeger &lt;borntraeger@linux.ibm.com&gt;
Cc: Christophe Leroy &lt;christophe.leroy@csgroup.eu&gt;
Cc: Geert Uytterhoeven &lt;geert@linux-m68k.org&gt;
Cc: Heiko Carstens &lt;hca@linux.ibm.com&gt;
Cc: Ivan Kokshaysky &lt;ink@jurassic.park.msu.ru&gt;
Cc: Matt Turner &lt;mattst88@gmail.com&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Cc: Palmer Dabbelt &lt;palmer@dabbelt.com&gt;
Cc: Paul Walmsley &lt;paul.walmsley@sifive.com&gt;
Cc: Richard Henderson &lt;richard.henderson@linaro.org&gt;
Cc: Sven Schnelle &lt;svens@linux.ibm.com&gt;
Cc: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Cc: Thomas Bogendoerfer &lt;tsbogend@alpha.franken.de&gt;
Cc: Vasily Gorbik &lt;gor@linux.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>bitops: add xor_unlock_is_negative_byte()</title>
<updated>2023-10-18T21:34:16+00:00</updated>
<author>
<name>Matthew Wilcox (Oracle)</name>
<email>willy@infradead.org</email>
</author>
<published>2023-10-04T16:53:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=247dbcdbf790c52fc76cf8e327cd0a5778e41e66'/>
<id>urn:sha1:247dbcdbf790c52fc76cf8e327cd0a5778e41e66</id>
<content type='text'>
Replace clear_bit_and_unlock_is_negative_byte() with
xor_unlock_is_negative_byte().  We have a few places that like to lock a
folio, set a flag and unlock it again.  Allow for the possibility of
combining the latter two operations for efficiency.  We are guaranteed
that the caller holds the lock, so it is safe to unlock it with the xor. 
The caller must guarantee that nobody else will set the flag without
holding the lock; it is not safe to do this with the PG_dirty flag, for
example.

Link: https://lkml.kernel.org/r/20231004165317.1061855-8-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Cc: Albert Ou &lt;aou@eecs.berkeley.edu&gt;
Cc: Alexander Gordeev &lt;agordeev@linux.ibm.com&gt;
Cc: Andreas Dilger &lt;adilger.kernel@dilger.ca&gt;
Cc: Christian Borntraeger &lt;borntraeger@linux.ibm.com&gt;
Cc: Christophe Leroy &lt;christophe.leroy@csgroup.eu&gt;
Cc: Geert Uytterhoeven &lt;geert@linux-m68k.org&gt;
Cc: Heiko Carstens &lt;hca@linux.ibm.com&gt;
Cc: Ivan Kokshaysky &lt;ink@jurassic.park.msu.ru&gt;
Cc: Matt Turner &lt;mattst88@gmail.com&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Cc: Palmer Dabbelt &lt;palmer@dabbelt.com&gt;
Cc: Paul Walmsley &lt;paul.walmsley@sifive.com&gt;
Cc: Richard Henderson &lt;richard.henderson@linaro.org&gt;
Cc: Sven Schnelle &lt;svens@linux.ibm.com&gt;
Cc: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Cc: Thomas Bogendoerfer &lt;tsbogend@alpha.franken.de&gt;
Cc: Vasily Gorbik &lt;gor@linux.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>powerpc: Don't hide eh field of lwarx behind a macro</title>
<updated>2022-08-10T05:32:02+00:00</updated>
<author>
<name>Christophe Leroy</name>
<email>christophe.leroy@csgroup.eu</email>
</author>
<published>2022-08-02T09:02:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=eb5a33ea31190c189ca4a59de4687b0877662c06'/>
<id>urn:sha1:eb5a33ea31190c189ca4a59de4687b0877662c06</id>
<content type='text'>
The eh field must remain 0 for PPC32 and is only used
by PPC64.

Don't hide that behind a macro, just leave the responsibility
to the user.

At the time being, the only users of PPC_RAW_L{WDQ}ARX are
setting the eh field to 0, so the special handling of __PPC_EH
is useless. Just take the value given by the caller.

Same for DEFINE_TESTOP(), don't do special handling in that
macro, ensure the caller hands over the proper eh value.

Signed-off-by: Christophe Leroy &lt;christophe.leroy@csgroup.eu&gt;
[mpe: Use 'n' constraint per Segher]
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/8b9c8a1a14f9143552a85fcbf96698224a8c2469.1659430931.git.christophe.leroy@csgroup.eu
</content>
</entry>
<entry>
<title>powerpc/bitops: Force inlining of fls()</title>
<updated>2022-03-08T11:33:03+00:00</updated>
<author>
<name>Christophe Leroy</name>
<email>christophe.leroy@csgroup.eu</email>
</author>
<published>2022-02-11T08:51:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=0b0057cc4193c7cd9c0829a440e4901b29ce4ff8'/>
<id>urn:sha1:0b0057cc4193c7cd9c0829a440e4901b29ce4ff8</id>
<content type='text'>
Building a kernel with CONFIG_CC_OPTIMISE_FOR_SIZE leads to
the following functions being copied several times in vmlinux:

	31 times __ilog2_u32()
	34 times fls()

Disassembly follows:

	c00f476c &lt;fls&gt;:
	c00f476c:	7c 63 00 34 	cntlzw  r3,r3
	c00f4770:	20 63 00 20 	subfic  r3,r3,32
	c00f4774:	4e 80 00 20 	blr

	c00f4778 &lt;__ilog2_u32&gt;:
	c00f4778:	94 21 ff f0 	stwu    r1,-16(r1)
	c00f477c:	7c 08 02 a6 	mflr    r0
	c00f4780:	90 01 00 14 	stw     r0,20(r1)
	c00f4784:	4b ff ff e9 	bl      c00f476c &lt;fls&gt;
	c00f4788:	80 01 00 14 	lwz     r0,20(r1)
	c00f478c:	38 63 ff ff 	addi    r3,r3,-1
	c00f4790:	7c 08 03 a6 	mtlr    r0
	c00f4794:	38 21 00 10 	addi    r1,r1,16
	c00f4798:	4e 80 00 20 	blr

When forcing inlining of fls(), we get

	c0008b80 &lt;__ilog2_u32&gt;:
	c0008b80:	7c 63 00 34 	cntlzw  r3,r3
	c0008b84:	20 63 00 1f 	subfic  r3,r3,31
	c0008b88:	4e 80 00 20 	blr

vmlinux size gets reduced by 1 kbyte with that change.

Signed-off-by: Christophe Leroy &lt;christophe.leroy@csgroup.eu&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/adc9c9d6378f6b5008246ca717993d7870188efb.1644569473.git.christophe.leroy@csgroup.eu

</content>
</entry>
<entry>
<title>Merge tag 'bitmap-5.17-rc1' of git://github.com/norov/linux</title>
<updated>2022-01-23T04:20:44+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2022-01-23T04:20:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3689f9f8b0c52dfd8f5995e4b58917f8f3ac3ee3'/>
<id>urn:sha1:3689f9f8b0c52dfd8f5995e4b58917f8f3ac3ee3</id>
<content type='text'>
Pull bitmap updates from Yury Norov:

 - introduce for_each_set_bitrange()

 - use find_first_*_bit() instead of find_next_*_bit() where possible

 - unify for_each_bit() macros

* tag 'bitmap-5.17-rc1' of git://github.com/norov/linux:
  vsprintf: rework bitmap_list_string
  lib: bitmap: add performance test for bitmap_print_to_pagebuf
  bitmap: unify find_bit operations
  mm/percpu: micro-optimize pcpu_is_populated()
  Replace for_each_*_bit_from() with for_each_*_bit() where appropriate
  find: micro-optimize for_each_{set,clear}_bit()
  include/linux: move for_each_bit() macros from bitops.h to find.h
  cpumask: replace cpumask_next_* with cpumask_first_* where appropriate
  tools: sync tools/bitmap with mother linux
  all: replace find_next{,_zero}_bit with find_first{,_zero}_bit where appropriate
  cpumask: use find_first_and_bit()
  lib: add find_first_and_bit()
  arch: remove GENERIC_FIND_FIRST_BIT entirely
  include: move find.h from asm_generic to linux
  bitops: move find_bit_*_le functions from le.h to find.h
  bitops: protect find_first_{,zero}_bit properly
</content>
</entry>
<entry>
<title>include: move find.h from asm_generic to linux</title>
<updated>2022-01-15T16:47:31+00:00</updated>
<author>
<name>Yury Norov</name>
<email>yury.norov@gmail.com</email>
</author>
<published>2021-08-14T21:16:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=47d8c15615c0a2046d2d90b04cb80b81ddf31fb1'/>
<id>urn:sha1:47d8c15615c0a2046d2d90b04cb80b81ddf31fb1</id>
<content type='text'>
find_bit API and bitmap API are closely related, but inclusion paths
are different - include/asm-generic and include/linux, correspondingly.
In the past it made a lot of troubles due to circular dependencies
and/or undefined symbols. Fix this by moving find.h under include/linux.

Signed-off-by: Yury Norov &lt;yury.norov@gmail.com&gt;
Tested-by: Wolfram Sang &lt;wsa+renesas@sang-engineering.com&gt;
Acked-by: Geert Uytterhoeven &lt;geert@linux-m68k.org&gt;
</content>
</entry>
<entry>
<title>powerpc/bitops: Use immediate operand when possible</title>
<updated>2021-11-30T00:45:50+00:00</updated>
<author>
<name>Christophe Leroy</name>
<email>christophe.leroy@csgroup.eu</email>
</author>
<published>2021-09-21T15:09:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=fb350784d8d17952afa93383bb47aaa6b715c459'/>
<id>urn:sha1:fb350784d8d17952afa93383bb47aaa6b715c459</id>
<content type='text'>
Today we get the following code generation for bitops like
set or clear bit:

	c0009fe0:	39 40 08 00 	li      r10,2048
	c0009fe4:	7c e0 40 28 	lwarx   r7,0,r8
	c0009fe8:	7c e7 53 78 	or      r7,r7,r10
	c0009fec:	7c e0 41 2d 	stwcx.  r7,0,r8

	c000d568:	39 00 18 00 	li      r8,6144
	c000d56c:	7c c0 38 28 	lwarx   r6,0,r7
	c000d570:	7c c6 40 78 	andc    r6,r6,r8
	c000d574:	7c c0 39 2d 	stwcx.  r6,0,r7

Most set bits are constant on lower 16 bits, so it can easily
be replaced by the "immediate" version of the operation. Allow
GCC to choose between the normal or immediate form.

For clear bits, on 32 bits 'rlwinm' can be used instead of 'andc' for
when all bits to be cleared are consecutive.

On 64 bits we don't have any equivalent single operation for clearing,
single bits or a few bits, we'd need two 'rldicl' so it is not
worth it, the li/andc sequence is doing the same.

With this patch we get:

	c0009fe0:	7d 00 50 28 	lwarx   r8,0,r10
	c0009fe4:	61 08 08 00 	ori     r8,r8,2048
	c0009fe8:	7d 00 51 2d 	stwcx.  r8,0,r10

	c000d558:	7c e0 40 28 	lwarx   r7,0,r8
	c000d55c:	54 e7 05 64 	rlwinm  r7,r7,0,21,18
	c000d560:	7c e0 41 2d 	stwcx.  r7,0,r8

On pmac32_defconfig, it reduces the text by approx 10 kbytes.

Signed-off-by: Christophe Leroy &lt;christophe.leroy@csgroup.eu&gt;
Reviewed-by: Segher Boessenkool &lt;segher@kernel.crashing.org&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/e6f815d9181bab09df3b350af51149437863e9f9.1632236981.git.christophe.leroy@csgroup.eu

</content>
</entry>
<entry>
<title>powerpc: Use lwarx/ldarx directly instead of PPC_LWARX/LDARX macros</title>
<updated>2021-08-25T03:35:49+00:00</updated>
<author>
<name>Christophe Leroy</name>
<email>christophe.leroy@csgroup.eu</email>
</author>
<published>2021-03-02T08:48:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9401f4e46cf6965e23738f70e149172344a01eef'/>
<id>urn:sha1:9401f4e46cf6965e23738f70e149172344a01eef</id>
<content type='text'>
Force the eh flag at 0 on PPC32.

Signed-off-by: Christophe Leroy &lt;christophe.leroy@csgroup.eu&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/1fc81f07cabebb875b963e295408cc3dd38c8d85.1614674882.git.christophe.leroy@csgroup.eu

</content>
</entry>
<entry>
<title>powerpc/bitops: Fix possible undefined behaviour with fls() and fls64()</title>
<updated>2020-11-19T03:50:13+00:00</updated>
<author>
<name>Christophe Leroy</name>
<email>christophe.leroy@csgroup.eu</email>
</author>
<published>2020-10-22T14:05:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1891ef21d92c4801ea082ee8ed478e304ddc6749'/>
<id>urn:sha1:1891ef21d92c4801ea082ee8ed478e304ddc6749</id>
<content type='text'>
fls() and fls64() are using __builtin_ctz() and _builtin_ctzll().
On powerpc, those builtins trivially use ctlzw and ctlzd power
instructions.

Allthough those instructions provide the expected result with
input argument 0, __builtin_ctz() and __builtin_ctzll() are
documented as undefined for value 0.

The easiest fix would be to use fls() and fls64() functions
defined in include/asm-generic/bitops/builtin-fls.h and
include/asm-generic/bitops/fls64.h, but GCC output is not optimal:

00000388 &lt;testfls&gt;:
 388:   2c 03 00 00     cmpwi   r3,0
 38c:   41 82 00 10     beq     39c &lt;testfls+0x14&gt;
 390:   7c 63 00 34     cntlzw  r3,r3
 394:   20 63 00 20     subfic  r3,r3,32
 398:   4e 80 00 20     blr
 39c:   38 60 00 00     li      r3,0
 3a0:   4e 80 00 20     blr

000003b0 &lt;testfls64&gt;:
 3b0:   2c 03 00 00     cmpwi   r3,0
 3b4:   40 82 00 1c     bne     3d0 &lt;testfls64+0x20&gt;
 3b8:   2f 84 00 00     cmpwi   cr7,r4,0
 3bc:   38 60 00 00     li      r3,0
 3c0:   4d 9e 00 20     beqlr   cr7
 3c4:   7c 83 00 34     cntlzw  r3,r4
 3c8:   20 63 00 20     subfic  r3,r3,32
 3cc:   4e 80 00 20     blr
 3d0:   7c 63 00 34     cntlzw  r3,r3
 3d4:   20 63 00 40     subfic  r3,r3,64
 3d8:   4e 80 00 20     blr

When the input of fls(x) is a constant, just check x for nullity and
return either 0 or __builtin_clz(x). Otherwise, use cntlzw instruction
directly.

For fls64() on PPC64, do the same but with __builtin_clzll() and
cntlzd instruction. On PPC32, lets take the generic fls64() which
will use our fls(). The result is as expected:

00000388 &lt;testfls&gt;:
 388:   7c 63 00 34     cntlzw  r3,r3
 38c:   20 63 00 20     subfic  r3,r3,32
 390:   4e 80 00 20     blr

000003a0 &lt;testfls64&gt;:
 3a0:   2c 03 00 00     cmpwi   r3,0
 3a4:   40 82 00 10     bne     3b4 &lt;testfls64+0x14&gt;
 3a8:   7c 83 00 34     cntlzw  r3,r4
 3ac:   20 63 00 20     subfic  r3,r3,32
 3b0:   4e 80 00 20     blr
 3b4:   7c 63 00 34     cntlzw  r3,r3
 3b8:   20 63 00 40     subfic  r3,r3,64
 3bc:   4e 80 00 20     blr

Fixes: 2fcff790dcb4 ("powerpc: Use builtin functions for fls()/__fls()/fls64()")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy &lt;christophe.leroy@csgroup.eu&gt;
Acked-by: Segher Boessenkool &lt;segher@kernel.crashing.org&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/348c2d3f19ffcff8abe50d52513f989c4581d000.1603375524.git.christophe.leroy@csgroup.eu
</content>
</entry>
</feed>
