<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/lib/int_sqrt.c, branch v4.9.287</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v4.9.287</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v4.9.287'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2019-04-05T20:29:04+00:00</updated>
<entry>
<title>lib/int_sqrt: optimize initial value compute</title>
<updated>2019-04-05T20:29:04+00:00</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2017-11-17T23:28:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=627f9c3af3845789a31cab587b49037c62255093'/>
<id>urn:sha1:627f9c3af3845789a31cab587b49037c62255093</id>
<content type='text'>
commit f8ae107eef209bff29a5816bc1aad40d5cd69a80 upstream.

The initial value (@m) compute is:

	m = 1UL &lt;&lt; (BITS_PER_LONG - 2);
	while (m &gt; x)
		m &gt;&gt;= 2;

Which is a linear search for the highest even bit smaller or equal to @x
We can implement this using a binary search using __fls() (or better when
its hardware implemented).

	m = 1UL &lt;&lt; (__fls(x) &amp; ~1UL);

Especially for small values of @x; which are the more common arguments
when doing a CDF on idle times; the linear search is near to worst case,
while the binary search of __fls() is a constant 6 (or 5 on 32bit)
branches.

      cycles:                 branches:              branch-misses:

PRE:

hot:   43.633557 +- 0.034373  45.333132 +- 0.002277  0.023529 +- 0.000681
cold: 207.438411 +- 0.125840  45.333132 +- 0.002277  6.976486 +- 0.004219

SOFTWARE FLS:

hot:   29.576176 +- 0.028850  26.666730 +- 0.004511  0.019463 +- 0.000663
cold: 165.947136 +- 0.188406  26.666746 +- 0.004511  6.133897 +- 0.004386

HARDWARE FLS:

hot:   24.720922 +- 0.025161  20.666784 +- 0.004509  0.020836 +- 0.000677
cold: 132.777197 +- 0.127471  20.666776 +- 0.004509  5.080285 +- 0.003874

Averages computed over all values &lt;128k using a LFSR to generate order.
Cold numbers have a LFSR based branch trace buffer 'confuser' ran between
each int_sqrt() invocation.

Link: http://lkml.kernel.org/r/20171020164644.936577234@infradead.org
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Suggested-by: Joe Perches &lt;joe@perches.com&gt;
Acked-by: Will Deacon &lt;will.deacon@arm.com&gt;
Acked-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Anshul Garg &lt;aksgarg1989@gmail.com&gt;
Cc: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Cc: David Miller &lt;davem@davemloft.net&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Matthew Wilcox &lt;mawilcox@microsoft.com&gt;
Cc: Michael Davidson &lt;md@google.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Joe Perches &lt;joe@perches.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>lib/int_sqrt: optimize small argument</title>
<updated>2019-03-27T05:13:04+00:00</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2017-11-17T23:28:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e6008a0525e4f9d6298762530e51dbf114455a47'/>
<id>urn:sha1:e6008a0525e4f9d6298762530e51dbf114455a47</id>
<content type='text'>
commit 3f3295709edea6268ff1609855f498035286af73 upstream.

The current int_sqrt() computation is sub-optimal for the case of small
@x.  Which is the interesting case when we're going to do cumulative
distribution functions on idle times, which we assume to be a random
variable, where the target residency of the deepest idle state gives an
upper bound on the variable (5e6ns on recent Intel chips).

In the case of small @x, the compute loop:

	while (m != 0) {
		b = y + m;
		y &gt;&gt;= 1;

		if (x &gt;= b) {
			x -= b;
			y += m;
		}
		m &gt;&gt;= 2;
	}

can be reduced to:

	while (m &gt; x)
		m &gt;&gt;= 2;

Because y==0, b==m and until x&gt;=m y will remain 0.

And while this is computationally equivalent, it runs much faster
because there's less code, in particular less branches.

      cycles:                 branches:              branch-misses:

OLD:

hot:   45.109444 +- 0.044117  44.333392 +- 0.002254  0.018723 +- 0.000593
cold: 187.737379 +- 0.156678  44.333407 +- 0.002254  6.272844 +- 0.004305

PRE:

hot:   67.937492 +- 0.064124  66.999535 +- 0.000488  0.066720 +- 0.001113
cold: 232.004379 +- 0.332811  66.999527 +- 0.000488  6.914634 +- 0.006568

POST:

hot:   43.633557 +- 0.034373  45.333132 +- 0.002277  0.023529 +- 0.000681
cold: 207.438411 +- 0.125840  45.333132 +- 0.002277  6.976486 +- 0.004219

Averages computed over all values &lt;128k using a LFSR to generate order.
Cold numbers have a LFSR based branch trace buffer 'confuser' ran between
each int_sqrt() invocation.

Link: http://lkml.kernel.org/r/20171020164644.876503355@infradead.org
Fixes: 30493cc9dddb ("lib/int_sqrt.c: optimize square root algorithm")
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Suggested-by: Anshul Garg &lt;aksgarg1989@gmail.com&gt;
Acked-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Will Deacon &lt;will.deacon@arm.com&gt;
Cc: Joe Perches &lt;joe@perches.com&gt;
Cc: David Miller &lt;davem@davemloft.net&gt;
Cc: Matthew Wilcox &lt;mawilcox@microsoft.com&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Michael Davidson &lt;md@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>lib/int_sqrt.c: optimize square root algorithm</title>
<updated>2013-04-30T01:28:19+00:00</updated>
<author>
<name>Davidlohr Bueso</name>
<email>davidlohr.bueso@hp.com</email>
</author>
<published>2013-04-29T23:18:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=30493cc9dddb68066dcc4878015660fdaa8e0965'/>
<id>urn:sha1:30493cc9dddb68066dcc4878015660fdaa8e0965</id>
<content type='text'>
Optimize the current version of the shift-and-subtract (hardware)
algorithm, described by John von Newmann[1] and Guy L Steele.

Iterating 1,000,000 times, perf shows for the current version:

 Performance counter stats for './sqrt-curr' (10 runs):

         27.170996 task-clock                #    0.979 CPUs utilized            ( +-  3.19% )
                 3 context-switches          #    0.103 K/sec                    ( +-  4.76% )
                 0 cpu-migrations            #    0.004 K/sec                    ( +-100.00% )
               104 page-faults               #    0.004 M/sec                    ( +-  0.16% )
        64,921,199 cycles                    #    2.389 GHz                      ( +-  0.03% )
        28,967,789 stalled-cycles-frontend   #   44.62% frontend cycles idle     ( +-  0.18% )
   &lt;not supported&gt; stalled-cycles-backend
       104,502,623 instructions              #    1.61  insns per cycle
                                             #    0.28  stalled cycles per insn  ( +-  0.00% )
        34,088,368 branches                  # 1254.587 M/sec                    ( +-  0.00% )
             4,901 branch-misses             #    0.01% of all branches          ( +-  1.32% )

       0.027763015 seconds time elapsed                                          ( +-  3.22% )

And for the new version:

Performance counter stats for './sqrt-new' (10 runs):

          0.496869 task-clock                #    0.519 CPUs utilized            ( +-  2.38% )
                 0 context-switches          #    0.000 K/sec
                 0 cpu-migrations            #    0.403 K/sec                    ( +-100.00% )
               104 page-faults               #    0.209 M/sec                    ( +-  0.15% )
           590,760 cycles                    #    1.189 GHz                      ( +-  2.35% )
           395,053 stalled-cycles-frontend   #   66.87% frontend cycles idle     ( +-  3.67% )
   &lt;not supported&gt; stalled-cycles-backend
           398,963 instructions              #    0.68  insns per cycle
                                             #    0.99  stalled cycles per insn  ( +-  0.39% )
            70,228 branches                  #  141.341 M/sec                    ( +-  0.36% )
             3,364 branch-misses             #    4.79% of all branches          ( +-  5.45% )

       0.000957440 seconds time elapsed                                          ( +-  2.42% )

Furthermore, this saves space in instruction text:

   text    data     bss     dec     hex filename
    111       0       0     111      6f lib/int_sqrt-baseline.o
     89       0       0      89      59 lib/int_sqrt.o

[1] http://en.wikipedia.org/wiki/First_Draft_of_a_Report_on_the_EDVAC

Signed-off-by: Davidlohr Bueso &lt;davidlohr.bueso@hp.com&gt;
Reviewed-by: Jonathan Gonzalez &lt;jgonzlez@linets.cl&gt;
Tested-by: Jonathan Gonzalez &lt;jgonzlez@linets.cl&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>lib: reduce the use of module.h wherever possible</title>
<updated>2012-03-07T20:04:04+00:00</updated>
<author>
<name>Paul Gortmaker</name>
<email>paul.gortmaker@windriver.com</email>
</author>
<published>2011-11-17T02:29:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8bc3bcc93a2b4e47d5d410146f6546bca6171663'/>
<id>urn:sha1:8bc3bcc93a2b4e47d5d410146f6546bca6171663</id>
<content type='text'>
For files only using THIS_MODULE and/or EXPORT_SYMBOL, map
them onto including export.h -- or if the file isn't even
using those, then just delete the include.  Fix up any implicit
include dependencies that were being masked by module.h along
the way.

Signed-off-by: Paul Gortmaker &lt;paul.gortmaker@windriver.com&gt;
</content>
</entry>
<entry>
<title>[PATCH] lib: Fix bug in int_sqrt() for 64 bit longs</title>
<updated>2006-02-03T16:32:08+00:00</updated>
<author>
<name>Peter Williams</name>
<email>pwil3058@bigpond.net.au</email>
</author>
<published>2006-02-03T11:04:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f0c00257d63463fa9d692e632fae037d6c0e67b0'/>
<id>urn:sha1:f0c00257d63463fa9d692e632fae037d6c0e67b0</id>
<content type='text'>
The implementation of int_sqrt() assumes that longs have 32 bits.  On
systems that have 64 bit longs this will result in gross errors when the
argument to the function is greater than 2^32 - 1 on such systems.  I doubt
whether any such use is currently made of int_sqrt() but the attached patch
fixes the problem anyway.

Signed-off-by: Peter Williams &lt;pwil3058@bigpond.com.au&gt;
Cc: Dave Jones &lt;davej@codemonkey.org.uk&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
</entry>
<entry>
<title>Linux-2.6.12-rc2</title>
<updated>2005-04-16T22:20:36+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@ppc970.osdl.org</email>
</author>
<published>2005-04-16T22:20:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1da177e4c3f41524e886b7f1b8a0c1fc7321cac2'/>
<id>urn:sha1:1da177e4c3f41524e886b7f1b8a0c1fc7321cac2</id>
<content type='text'>
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!
</content>
</entry>
</feed>
