<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/include/linux/cpumask.h, branch v5.10.258</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v5.10.258</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v5.10.258'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2024-09-04T11:17:30+00:00</updated>
<entry>
<title>bitmap: introduce generic optimized bitmap_size()</title>
<updated>2024-09-04T11:17:30+00:00</updated>
<author>
<name>Alexander Lobakin</name>
<email>aleksander.lobakin@intel.com</email>
</author>
<published>2024-03-27T15:23:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=de7be1940c34e752a2027fea7586cba0c8fa39d7'/>
<id>urn:sha1:de7be1940c34e752a2027fea7586cba0c8fa39d7</id>
<content type='text'>
commit a37fbe666c016fd89e4460d0ebfcea05baba46dc upstream.

The number of times yet another open coded
`BITS_TO_LONGS(nbits) * sizeof(long)` can be spotted is huge.
Some generic helper is long overdue.

Add one, bitmap_size(), but with one detail.
BITS_TO_LONGS() uses DIV_ROUND_UP(). The latter works well when both
divident and divisor are compile-time constants or when the divisor
is not a pow-of-2. When it is however, the compilers sometimes tend
to generate suboptimal code (GCC 13):

48 83 c0 3f          	add    $0x3f,%rax
48 c1 e8 06          	shr    $0x6,%rax
48 8d 14 c5 00 00 00 00	lea    0x0(,%rax,8),%rdx

%BITS_PER_LONG is always a pow-2 (either 32 or 64), but GCC still does
full division of `nbits + 63` by it and then multiplication by 8.
Instead of BITS_TO_LONGS(), use ALIGN() and then divide by 8. GCC:

8d 50 3f             	lea    0x3f(%rax),%edx
c1 ea 03             	shr    $0x3,%edx
81 e2 f8 ff ff 1f    	and    $0x1ffffff8,%edx

Now it shifts `nbits + 63` by 3 positions (IOW performs fast division
by 8) and then masks bits[2:0]. bloat-o-meter:

add/remove: 0/0 grow/shrink: 20/133 up/down: 156/-773 (-617)

Clang does it better and generates the same code before/after starting
from -O1, except that with the ALIGN() approach it uses %edx and thus
still saves some bytes:

add/remove: 0/0 grow/shrink: 9/133 up/down: 18/-538 (-520)

Note that we can't expand DIV_ROUND_UP() by adding a check and using
this approach there, as it's used in array declarations where
expressions are not allowed.
Add this helper to tools/ as well.

Reviewed-by: Przemek Kitszel &lt;przemyslaw.kitszel@intel.com&gt;
Acked-by: Yury Norov &lt;yury.norov@gmail.com&gt;
Signed-off-by: Alexander Lobakin &lt;aleksander.lobakin@intel.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>sched/core: Distribute tasks within affinity masks</title>
<updated>2020-03-20T12:06:18+00:00</updated>
<author>
<name>Paul Turner</name>
<email>pjt@google.com</email>
</author>
<published>2020-03-11T01:01:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=46a87b3851f0d6eb05e6d83d5c5a30df0eca8f76'/>
<id>urn:sha1:46a87b3851f0d6eb05e6d83d5c5a30df0eca8f76</id>
<content type='text'>
Currently, when updating the affinity of tasks via either cpusets.cpus,
or, sched_setaffinity(); tasks not currently running within the newly
specified mask will be arbitrarily assigned to the first CPU within the
mask.

This (particularly in the case that we are restricting masks) can
result in many tasks being assigned to the first CPUs of their new
masks.

This:
 1) Can induce scheduling delays while the load-balancer has a chance to
    spread them between their new CPUs.
 2) Can antogonize a poor load-balancer behavior where it has a
    difficult time recognizing that a cross-socket imbalance has been
    forced by an affinity mask.

This change adds a new cpumask interface to allow iterated calls to
distribute within the intersection of the provided masks.

The cases that this mainly affects are:
 - modifying cpuset.cpus
 - when tasks join a cpuset
 - when modifying a task's affinity via sched_setaffinity(2)

Signed-off-by: Paul Turner &lt;pjt@google.com&gt;
Signed-off-by: Josh Don &lt;joshdon@google.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Qais Yousef &lt;qais.yousef@arm.com&gt;
Tested-by: Qais Yousef &lt;qais.yousef@arm.com&gt;
Link: https://lkml.kernel.org/r/20200311010113.136465-1-joshdon@google.com
</content>
</entry>
<entry>
<title>include/linux/cpumask.h: don't calculate length of the input string</title>
<updated>2020-02-04T03:05:27+00:00</updated>
<author>
<name>Yury Norov</name>
<email>yury.norov@gmail.com</email>
</author>
<published>2020-02-04T01:37:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=190535f7cf50f2d6d6e603715201c58cd6ec696b'/>
<id>urn:sha1:190535f7cf50f2d6d6e603715201c58cd6ec696b</id>
<content type='text'>
New design of inner bitmap_parse() allows to avoid calculating the size of
a null-terminated string.

Link: http://lkml.kernel.org/r/20200102043031.30357-8-yury.norov@gmail.com
Signed-off-by: Yury Norov &lt;yury.norov@gmail.com&gt;
Reviewed-by: Andy Shevchenko &lt;andriy.shevchenko@linux.intel.com&gt;
Cc: Amritha Nambiar &lt;amritha.nambiar@intel.com&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Cc: Chris Wilson &lt;chris@chris-wilson.co.uk&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
Cc: Rasmus Villemoes &lt;linux@rasmusvillemoes.dk&gt;
Cc: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
Cc: "Tobin C . Harding" &lt;tobin@kernel.org&gt;
Cc: Vineet Gupta &lt;vineet.gupta1@synopsys.com&gt;
Cc: Will Deacon &lt;will.deacon@arm.com&gt;
Cc: Willem de Bruijn &lt;willemb@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>cpumask: nicer for_each_cpumask_and() signature</title>
<updated>2019-09-26T00:51:40+00:00</updated>
<author>
<name>Alexey Dobriyan</name>
<email>adobriyan@gmail.com</email>
</author>
<published>2019-09-25T23:47:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2a4a4082cd4438333b5ecffdd15d1a484e5a83c7'/>
<id>urn:sha1:2a4a4082cd4438333b5ecffdd15d1a484e5a83c7</id>
<content type='text'>
Mask arguments can be swapped without changing anything.  Make arguments
names reflect that:

	#define for_each_cpu_and(cpu, mask1, mask2)

Link: http://lkml.kernel.org/r/20190724183350.GA15041@avx2
Signed-off-by: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Reviewed-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>cpu/hotplug: Cache number of online CPUs</title>
<updated>2019-07-25T13:48:01+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2019-07-09T14:23:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=0c09ab96fc820109d63097a2adcbbd20836b655f'/>
<id>urn:sha1:0c09ab96fc820109d63097a2adcbbd20836b655f</id>
<content type='text'>
Re-evaluating the bitmap wheight of the online cpus bitmap in every
invocation of num_online_cpus() over and over is a pretty useless
exercise. Especially when num_online_cpus() is used in code paths
like the IPI delivery of x86 or the membarrier code.

Cache the number of online CPUs in the core and just return the cached
variable. The accessor function provides only a snapshot when used without
protection against concurrent CPU hotplug.

The storage needs to use an atomic_t because the kexec and reboot code
(ab)use set_cpu_online() in their 'shutdown' handlers without any form of
serialization as pointed out by Mathieu. Regular CPU hotplug usage is
properly serialized.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Mathieu Desnoyers &lt;mathieu.desnoyers@efficios.com&gt;
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1907091622590.1634@nanos.tec.linutronix.de

</content>
</entry>
<entry>
<title>cpumask: Implement cpumask_or_equal()</title>
<updated>2019-07-25T13:47:37+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2019-07-22T18:47:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b9fa6442f7043e2cdd247905d4f3b80f2e9605cb'/>
<id>urn:sha1:b9fa6442f7043e2cdd247905d4f3b80f2e9605cb</id>
<content type='text'>
The IPI code of x86 needs to evaluate whether the target cpumask is equal
to the cpu_online_mask or equal except for the calling CPU.

To replace the current implementation which requires the usage of a
temporary cpumask, which might involve allocations, add a new function
which compares a cpumask to the result of two other cpumasks which are
or'ed together before comparison.

This allows to make the required decision in one go and the calling code
then can check for the calling CPU being set in the target mask with
cpumask_test_cpu().

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20190722105220.585449120@linutronix.de

</content>
</entry>
<entry>
<title>smp/hotplug: Track booted once CPUs in a cpumask</title>
<updated>2019-07-25T13:47:37+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2019-07-22T18:47:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e797bda3fd29137f6c151dfa10ea6a61c17895ce'/>
<id>urn:sha1:e797bda3fd29137f6c151dfa10ea6a61c17895ce</id>
<content type='text'>
The booted once information which is required to deal with the MCE
broadcast issue on X86 correctly is stored in the per cpu hotplug state,
which is perfectly fine for the intended purpose.

X86 needs that information for supporting NMI broadcasting via shortcuts,
but retrieving it from per cpu data is cumbersome.

Move it to a cpumask so the information can be checked against the
cpu_present_mask quickly.

No functional change intended.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20190722105219.818822855@linutronix.de

</content>
</entry>
<entry>
<title>include/linux/cpumask.h: fix double string traverse in cpumask_parse</title>
<updated>2019-05-15T02:52:50+00:00</updated>
<author>
<name>Yury Norov</name>
<email>ynorov@marvell.com</email>
</author>
<published>2019-05-14T22:44:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3713a4e1fdb8da86f96a3e770b08e278d97529b4'/>
<id>urn:sha1:3713a4e1fdb8da86f96a3e770b08e278d97529b4</id>
<content type='text'>
cpumask_parse() finds first occurrence of either or strchr() and
strlen().  We can do it better with a single call of strchrnul().

[akpm@linux-foundation.org: remove unneeded cast]
Link: http://lkml.kernel.org/r/20190409204208.12190-1-ynorov@marvell.com
Signed-off-by: Yury Norov &lt;ynorov@marvell.com&gt;
Acked-by: Rasmus Villemoes &lt;linux@rasmusvillemoes.dk&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>cpumask: make cpumask_next_wrap available without smp</title>
<updated>2018-08-13T16:05:05+00:00</updated>
<author>
<name>Willem de Bruijn</name>
<email>willemb@google.com</email>
</author>
<published>2018-08-12T13:14:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9af18e56d43ca4864ce65c3542c513827c2697de'/>
<id>urn:sha1:9af18e56d43ca4864ce65c3542c513827c2697de</id>
<content type='text'>
The kbuild robot shows build failure on machines without CONFIG_SMP:

  drivers/net/virtio_net.c:1916:10: error:
    implicit declaration of function 'cpumask_next_wrap'

cpumask_next_wrap is exported from lib/cpumask.o, which has

    lib-$(CONFIG_SMP) += cpumask.o

same as other functions, also define it as static inline in the
NR_CPUS==1 branch in include/linux/cpumask.h.

If wrap is true and next == start, return nr_cpumask_bits, or 1.
Else wrap across the range of valid cpus, here [0].

Fixes: 2ca653d607ce ("virtio_net: Stripe queue affinities across cores.")
Signed-off-by: Willem de Bruijn &lt;willemb@google.com&gt;
Tested-by: Krzysztof Kozlowski &lt;krzk@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: Refactor XPS for CPUs and Rx queues</title>
<updated>2018-07-02T00:06:23+00:00</updated>
<author>
<name>Amritha Nambiar</name>
<email>amritha.nambiar@intel.com</email>
</author>
<published>2018-06-30T04:26:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=80d19669ecd34423e85ca04f2210b0e42a47cb16'/>
<id>urn:sha1:80d19669ecd34423e85ca04f2210b0e42a47cb16</id>
<content type='text'>
Refactor XPS code to support Tx queue selection based on
CPU(s) map or Rx queue(s) map.

Signed-off-by: Amritha Nambiar &lt;amritha.nambiar@intel.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
