| author | David S. Miller <davem@davemloft.net> | 2017-08-11 00:59:18 +0300 | 
|---|---|---|
| committer | David S. Miller <davem@davemloft.net> | 2017-08-11 00:59:18 +0300 | 
| commit | fa5dc772e32e5c4945760f44adc7c7ee89b3475b (patch) | |
| tree | 252bf2b8dc79a6a9b6a6d0b4d3d17953b0a73111 /lib/mpi/mpi-internal.h | |
| parent | 061273f9ecdb9e55e80676bbbb8f65ecd3b9699a (diff) | |
| parent | 34060b8fffa76ded52d9e115d6b759b0456114ee (diff) | |
| download | linux-fa5dc772e32e5c4945760f44adc7c7ee89b3475b.tar.xz | |
Merge branch 'sparc64-M7-memcpy'
Babu Moger says:
====================
sparc64: Update memcpy, memset etc. for M7/M8 architectures
This series of patches updates memcpy, memset, copy_to_user, copy_from_user,
etc. for the SPARC M7/M8 architectures.
The new algorithm takes advantage of the M7/M8 block init store ASIs to
improve performance. More details are in the code comments.
Tested and compared the latency, measured in ticks (NG4memcpy vs. the new M7memcpy).
1. Memset numbers (aligned memset)
No. of bytes  NG4memset      M7memset       Delta % = ((B-A)/A)*100
              (avg. ticks A) (avg. ticks B) (negative = latency reduction)
  3		77		25		-67.53
  7		43		33		-23.25
  32		72		68		 -5.55
  128		164		44		-73.17
  256		335		68		-79.70
  512		511		220		-56.94
  1024		1552		627		-59.60
  2048		3515		1322		-62.38
  4096		6303		2472		-60.78
  8192		13118		4867		-62.89
  16384		26206		10371		-60.42
  32768		52501		18569		-64.63
  65536		100219		35899		-64.17
2. Memcpy numbers (aligned memcpy)
No. of bytes  NG4memcpy      M7memcpy       Delta % = ((B-A)/A)*100
              (avg. ticks A) (avg. ticks B) (negative = latency reduction)
  3		20		19		-5
  7		29		27		-6.89
  32		30		28		-6.66
  128		89		69		-22.47
  256		142		143		 0.70
  512		341		283		-17.00
  1024		1588		655		-58.75
  2048		3553		1357		-61.80
  4096		7218		2590		-64.11
  8192		13701		5231		-61.82
  16384		28304		10716		-62.13
  32768		56516		22995		-59.31
  65536		115443		50840		-55.96
3. Memset numbers (unaligned memset)
No. of bytes  NG4memset      M7memset       Delta % = ((B-A)/A)*100
              (avg. ticks A) (avg. ticks B) (negative = latency reduction)
  3		40		31		-22.50
  7		52		29		-44.23
  32		89		86		 -3.37
  128		201		74		-63.18
  256		340		154		-54.71
  512		961		335		-65.14
  1024		1799		686		-61.87
  2048		3575		1260		-64.76
  4096		6560		2627		-59.95
  8192		13161		6018		-54.27
  16384		26465		10439		-60.56
  32768		52119		18649		-64.22
  65536		101593		35724		-64.84
4. Memcpy numbers (unaligned memcpy)
No. of bytes  NG4memcpy      M7memcpy       Delta % = ((B-A)/A)*100
              (avg. ticks A) (avg. ticks B) (negative = latency reduction)
  3		26		19		-26.92
  7		48		45		 -6.25
  32		52		49		 -5.77
  128		284		334		 17.61
  256		430		482		 12.09
  512		646		690		  6.81
  1024		1051		1016		 -3.33
  2048		1787		1818		  1.73
  4096		3309		3376		  2.02
  8192		8151		7444		 -8.67
  16384		34222		34556		  0.98
  32768		87851		95044		  8.19
  65536		158331		159572		  0.78
There is not much difference in the numbers for unaligned copies
between NG4memcpy and M7memcpy because both mostly use the
same algorithms.
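The Delta column in the tables above is ((B-A)/A)*100, where A is the
average tick count of the NG4 routine and B that of the M7 routine, so a
negative value means the M7 routine was faster. A minimal Python sketch of
that arithmetic follows. The helper name delta_percent and the choice of
sample rows are illustrative assumptions and not part of the patch series;
the tick counts are copied from the aligned memcpy table.

```python
def delta_percent(ticks_a, ticks_b):
    """Relative latency change of B versus A, in percent.

    Negative means B (M7) took fewer ticks than A (NG4).
    """
    return (ticks_b - ticks_a) / ticks_a * 100.0


# (bytes, NG4memcpy avg. ticks, M7memcpy avg. ticks), taken from the
# aligned memcpy table; the selection of rows is arbitrary.
aligned_memcpy = [
    (1024, 1588, 655),
    (4096, 7218, 2590),
    (65536, 115443, 50840),
]

for size, ng4, m7 in aligned_memcpy:
    print(f"{size:6d} bytes: {delta_percent(ng4, m7):7.2f} %")
```

The last printed digit may differ slightly from the tables, since some of
the posted values appear to be truncated rather than rounded.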
v2:
 1. Fixed indentation issues found by David Miller
 2. Used ENTRY and ENDPROC for the labels in M7patch.S as suggested by David Miller
 3. M8 now also uses M7memcpy. Also tested on an M8 config.
 4. These patches are created on top of the M8 patches below:
    https://patchwork.ozlabs.org/patch/792661/
    https://patchwork.ozlabs.org/patch/792662/
    However, I did not see these patches in the sparc-next tree; they may be
    in the queue now. This series might therefore cause some build problems,
    which will resolve once all M8 patches are in the sparc-next tree.
v0: Initial version
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Diffstat (limited to 'lib/mpi/mpi-internal.h')
0 files changed, 0 insertions, 0 deletions
