<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/lib/find_bit.c, branch v6.19.11</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.19.11</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.19.11'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2025-07-08T23:11:57+00:00</updated>
<entry>
<title>bitmap: generalize node_random()</title>
<updated>2025-07-08T23:11:57+00:00</updated>
<author>
<name>Yury Norov [NVIDIA]</name>
<email>yury.norov@gmail.com</email>
</author>
<published>2025-06-19T18:26:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c56f97c5c71f17d781461d44acb777cd21521b81'/>
<id>urn:sha1:c56f97c5c71f17d781461d44acb777cd21521b81</id>
<content type='text'>
Generalize node_random() and make it available to general bitmaps and
cpumasks users.

Notice, find_first_bit() is generally faster than find_nth_bit(), and we
employ it when there's a single set bit in the bitmap.

See commit 3e061d924fe9c7b4 ("lib/nodemask: optimize node_random for
nodemask with single NUMA node").

CC: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: "Yury Norov [NVIDIA]" &lt;yury.norov@gmail.com&gt;
</content>
</entry>
<entry>
<title>find: Add find_first_andnot_bit()</title>
<updated>2025-05-15T18:24:40+00:00</updated>
<author>
<name>Yury Norov [NVIDIA]</name>
<email>yury.norov@gmail.com</email>
</author>
<published>2025-05-15T16:58:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=13f0a02bf4c1c5888c736cedef9ca50de666adb3'/>
<id>urn:sha1:13f0a02bf4c1c5888c736cedef9ca50de666adb3</id>
<content type='text'>
The function helps to implement cpumask_andnot() APIs.

Signed-off-by: Yury Norov [NVIDIA] &lt;yury.norov@gmail.com&gt;
Signed-off-by: James Morse &lt;james.morse@arm.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Reviewed-by: James Morse &lt;james.morse@arm.com&gt;
Reviewed-by: Reinette Chatre &lt;reinette.chatre@intel.com&gt;
Reviewed-by: Fenghua Yu &lt;fenghuay@nvidia.com&gt;
Tested-by: James Morse &lt;james.morse@arm.com&gt;
Tested-by: Tony Luck &lt;tony.luck@intel.com&gt;
Tested-by: Fenghua Yu &lt;fenghuay@nvidia.com&gt;
Link: https://lore.kernel.org/20250515165855.31452-3-james.morse@arm.com
</content>
</entry>
<entry>
<title>Merge tag 'bitmap-for-6.10v2' of https://github.com/norov/linux</title>
<updated>2024-05-21T22:29:01+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2024-05-21T22:29:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4865a27c66fda6a32511ec5492f4bbec437f512d'/>
<id>urn:sha1:4865a27c66fda6a32511ec5492f4bbec437f512d</id>
<content type='text'>
Pull bitmap updates from Yury Norov:

 - topology_span_sane() optimization from Kyle Meyer

 - fns() rework from Kuan-Wei Chiu (used in cpumask_local_spread() and
   other places)

 - headers cleanup from Andy

 - add a MAINTAINERS record for bitops API

* tag 'bitmap-for-6.10v2' of https://github.com/norov/linux:
  usercopy: Don't use "proxy" headers
  bitops: Move aligned_byte_mask() to wordpart.h
  MAINTAINERS: add BITOPS API record
  bitmap: relax find_nth_bit() limitation on return value
  lib: make test_bitops compilable into the kernel image
  bitops: Optimize fns() for improved performance
  lib/test_bitops: Add benchmark test for fns()
  Compiler Attributes: Add __always_used macro
  sched/topology: Optimize topology_span_sane()
  cpumask: Add for_each_cpu_from()
</content>
</entry>
<entry>
<title>bitmap: relax find_nth_bit() limitation on return value</title>
<updated>2024-05-09T16:25:08+00:00</updated>
<author>
<name>Yury Norov</name>
<email>yury.norov@gmail.com</email>
</author>
<published>2024-05-02T17:12:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=0b2811ba11b04353033237359c9d042eb0cdc1c1'/>
<id>urn:sha1:0b2811ba11b04353033237359c9d042eb0cdc1c1</id>
<content type='text'>
The function claims to return the bitmap size, if Nth bit doesn't exist.
This rule is violated in inline case because the fns() that is used
there doesn't know anything about size of the bitmap.

So, relax this requirement to '&gt;= size', and make the outline
implementation a bit cheaper.

All in-tree kernel users of find_nth_bit() are safe against that.

Reported-by: Rasmus Villemoes &lt;linux@rasmusvillemoes.dk&gt;
Closes: https://lore.kernel.org/all/Zi50cAgR8nZvgLa3@yury-ThinkPad/T/#m6da806a0525e74dcc91f35e5f20766ed4e853e8a
Signed-off-by: Yury Norov &lt;yury.norov@gmail.com&gt;
</content>
</entry>
<entry>
<title>cpumask: Introduce cpumask_first_and_and()</title>
<updated>2024-04-24T19:23:49+00:00</updated>
<author>
<name>Dawei Li</name>
<email>dawei.li@shingroup.cn</email>
</author>
<published>2024-04-16T08:54:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=cdc66553c4130735f0a2db943a5259e54ff1597a'/>
<id>urn:sha1:cdc66553c4130735f0a2db943a5259e54ff1597a</id>
<content type='text'>
Introduce cpumask_first_and_and() to get intersection between 3 cpumasks,
free of any intermediate cpumask variable. Instead, cpumask_first_and_and()
works in-place with all inputs and produces desired output directly.

Signed-off-by: Dawei Li &lt;dawei.li@shingroup.cn&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Acked-by: Yury Norov &lt;yury.norov@gmail.com&gt;
Link: https://lore.kernel.org/r/20240416085454.3547175-2-dawei.li@shingroup.cn

</content>
</entry>
<entry>
<title>cpumask: introduce for_each_cpu_or</title>
<updated>2023-03-19T17:02:04+00:00</updated>
<author>
<name>Dave Chinner</name>
<email>dchinner@redhat.com</email>
</author>
<published>2023-03-16T00:31:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1470afefc3c42df5d1662f87d079b46651bdc95b'/>
<id>urn:sha1:1470afefc3c42df5d1662f87d079b46651bdc95b</id>
<content type='text'>
Equivalent of for_each_cpu_and, except it ORs the two masks together
so it iterates all the CPUs present in either mask.

Signed-off-by: Dave Chinner &lt;dchinner@redhat.com&gt;
Reviewed-by: Darrick J. Wong &lt;djwong@kernel.org&gt;
Signed-off-by: Darrick J. Wong &lt;djwong@kernel.org&gt;
</content>
</entry>
<entry>
<title>lib/find: introduce find_nth_and_andnot_bit</title>
<updated>2023-02-08T02:20:00+00:00</updated>
<author>
<name>Yury Norov</name>
<email>yury.norov@gmail.com</email>
</author>
<published>2023-01-21T04:24:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=43245117806ff8914e37327b610fc08b5ddedc91'/>
<id>urn:sha1:43245117806ff8914e37327b610fc08b5ddedc91</id>
<content type='text'>
In the following patches the function is used to implement in-place bitmaps
traversing without storing intermediate result in temporary bitmaps.

Signed-off-by: Yury Norov &lt;yury.norov@gmail.com&gt;
Acked-by: Tariq Toukan &lt;tariqt@nvidia.com&gt;
Reviewed-by: Jacob Keller &lt;jacob.e.keller@intel.com&gt;
Reviewed-by: Peter Lafreniere &lt;peter@n8pjl.ca&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>lib/find_bit: Introduce find_next_andnot_bit()</title>
<updated>2022-10-06T12:57:36+00:00</updated>
<author>
<name>Valentin Schneider</name>
<email>vschneid@redhat.com</email>
</author>
<published>2022-10-03T15:34:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=90d482908eedd56f01a707325aa541cf9c40f936'/>
<id>urn:sha1:90d482908eedd56f01a707325aa541cf9c40f936</id>
<content type='text'>
In preparation of introducing for_each_cpu_andnot(), add a variant of
find_next_bit() that negate the bits in @addr2 when ANDing them with the
bits in @addr1.

Signed-off-by: Valentin Schneider &lt;vschneid@redhat.com&gt;
</content>
</entry>
<entry>
<title>lib: add find_nth{,_and,_andnot}_bit()</title>
<updated>2022-09-26T19:19:12+00:00</updated>
<author>
<name>Yury Norov</name>
<email>yury.norov@gmail.com</email>
</author>
<published>2022-09-18T03:07:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3cea8d475327756066e2a54f0b651bb7284dd448'/>
<id>urn:sha1:3cea8d475327756066e2a54f0b651bb7284dd448</id>
<content type='text'>
Kernel lacks for a function that searches for Nth bit in a bitmap.
Usually people do it like this:
	for_each_set_bit(bit, mask, size)
		if (n-- == 0)
			return bit;

We can do it more efficiently, if we:
1. find a word containing Nth bit, using hweight(); and
2. find the bit, using a helper fns(), that works similarly to
   __ffs() and ffz().

fns() is implemented as a simple loop. For x86_64, there's PDEP instruction
to do that: ret = clz(pdep(1 &lt;&lt; idx, num)). However, for large bitmaps the
most of improvement comes from using hweight(), so I kept fns() simple.

New find_nth_bit() is ~70 times faster on x86_64/kvm in find_bit benchmark:
find_nth_bit:                  7154190 ns,  16411 iterations
for_each_bit:                505493126 ns,  16315 iterations

With all that, a family of 3 new functions is added, and used where
appropriate in the following patches.

Signed-off-by: Yury Norov &lt;yury.norov@gmail.com&gt;
</content>
</entry>
<entry>
<title>lib/find_bit: optimize find_next_bit() functions</title>
<updated>2022-09-21T19:21:32+00:00</updated>
<author>
<name>Yury Norov</name>
<email>yury.norov@gmail.com</email>
</author>
<published>2022-09-15T02:07:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e79864f3164c573afce09ec4b72b75ebe061c14d'/>
<id>urn:sha1:e79864f3164c573afce09ec4b72b75ebe061c14d</id>
<content type='text'>
Over the past couple years, the function _find_next_bit() was extended
with parameters that modify its behavior to implement and- zero- and le-
flavors. The parameters are passed at compile time, but current design
prevents a compiler from optimizing out the conditionals.

As find_next_bit() API grows, I expect that more parameters will be added.
Current design would require more conditional code in _find_next_bit(),
which would bloat the helper even more and make it barely readable.

This patch replaces _find_next_bit() with a macro FIND_NEXT_BIT, and adds
a set of wrappers, so that the compile-time optimizations become possible.

The common logic is moved to the new macro, and all flavors may be
generated by providing a FETCH macro parameter, like in this example:

  #define FIND_NEXT_BIT(FETCH, MUNGE, size, start) ...

  find_next_xornot_and_bit(addr1, addr2, addr3, size, start)
  {
	return FIND_NEXT_BIT(addr1[idx] ^ ~addr2[idx] &amp; addr3[idx],
				/* nop */, size, start);
  }

The FETCH may be of any complexity, as soon as it only refers the bitmap(s)
and an iterator idx.

MUNGE is here to support _le code generation for BE builds. May be
empty.

I ran find_bit_benchmark 16 times on top of 6.0-rc2 and 16 times on top
of 6.0-rc2 + this series. The results for kvm/x86_64 are:

                      v6.0-rc2  Optimized       Difference  Z-score
Random dense bitmap         ns         ns        ns      %
find_next_bit:          787735     670546    117189   14.9     3.97
find_next_zero_bit:     777492     664208    113284   14.6    10.51
find_last_bit:          830925     687573    143352   17.3     2.35
find_first_bit:        3874366    3306635    567731   14.7     1.84
find_first_and_bit:   40677125   37739887   2937238    7.2     1.36
find_next_and_bit:      347865     304456     43409   12.5     1.35

Random sparse bitmap
find_next_bit:           19816      14021      5795   29.2     6.10
find_next_zero_bit:    1318901    1223794     95107    7.2     1.41
find_last_bit:           14573      13514      1059    7.3     6.92
find_first_bit:        1313321    1249024     64297    4.9     1.53
find_first_and_bit:       8921       8098       823    9.2     4.56
find_next_and_bit:        9796       7176      2620   26.7     5.39

Where the statistics is significant (z-score &gt; 3), the improvement
is ~15%.

According to the bloat-o-meter, the Image size is 10-11K less:

x86_64/defconfig:
add/remove: 32/14 grow/shrink: 61/782 up/down: 6344/-16521 (-10177)

arm64/defconfig:
add/remove: 3/2 grow/shrink: 50/714 up/down: 608/-11556 (-10948)

Suggested-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Yury Norov &lt;yury.norov@gmail.com&gt;
</content>
</entry>
</feed>
