diff options
author | Kirill A. Shutemov <kirill.shutemov@linux.intel.com> | 2018-02-16 14:49:48 +0300 |
---|---|---|
committer | Ingo Molnar <mingo@kernel.org> | 2018-02-21 12:19:18 +0300 |
commit | 39b9552281abfcdfc54162897018890dafe7ffef (patch) | |
tree | 43fdf0c1942430f0c9551bf85ce09462d9b0a092 /arch/x86/entry/entry_64.S | |
parent | 92e1c5b3f7bf5407cfdbf13613e7101831216dc5 (diff) | |
download | linux-39b9552281abfcdfc54162897018890dafe7ffef.tar.xz |
x86/mm: Optimize boot-time paging mode switching cost
By this point we have functioning boot-time switching between 4- and
5-level paging mode. But naive approach comes with cost.
Numbers below are for kernel build, allmodconfig, 5 times.
CONFIG_X86_5LEVEL=n:
Performance counter stats for 'sh -c make -j100 -B -k >/dev/null' (5 runs):
17308719.892691 task-clock:u (msec) # 26.772 CPUs utilized ( +- 0.11% )
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
331,993,164 page-faults:u # 0.019 M/sec ( +- 0.01% )
43,614,978,867,455 cycles:u # 2.520 GHz ( +- 0.01% )
39,371,534,575,126 stalled-cycles-frontend:u # 90.27% frontend cycles idle ( +- 0.09% )
28,363,350,152,428 instructions:u # 0.65 insn per cycle
# 1.39 stalled cycles per insn ( +- 0.00% )
6,316,784,066,413 branches:u # 364.948 M/sec ( +- 0.00% )
250,808,144,781 branch-misses:u # 3.97% of all branches ( +- 0.01% )
646.531974142 seconds time elapsed ( +- 1.15% )
CONFIG_X86_5LEVEL=y:
Performance counter stats for 'sh -c make -j100 -B -k >/dev/null' (5 runs):
17411536.780625 task-clock:u (msec) # 26.426 CPUs utilized ( +- 0.10% )
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
331,868,663 page-faults:u # 0.019 M/sec ( +- 0.01% )
43,865,909,056,301 cycles:u # 2.519 GHz ( +- 0.01% )
39,740,130,365,581 stalled-cycles-frontend:u # 90.59% frontend cycles idle ( +- 0.05% )
28,363,358,997,959 instructions:u # 0.65 insn per cycle
# 1.40 stalled cycles per insn ( +- 0.00% )
6,316,784,937,460 branches:u # 362.793 M/sec ( +- 0.00% )
251,531,919,485 branch-misses:u # 3.98% of all branches ( +- 0.00% )
658.886307752 seconds time elapsed ( +- 0.92% )
The patch tries to fix the performance regression by using
cpu_feature_enabled(X86_FEATURE_LA57) instead of pgtable_l5_enabled in
all hot code paths. These will statically patch the target code for
additional performance.
CONFIG_X86_5LEVEL=y + the patch:
Performance counter stats for 'sh -c make -j100 -B -k >/dev/null' (5 runs):
17381990.268506 task-clock:u (msec) # 26.907 CPUs utilized ( +- 0.19% )
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
331,862,625 page-faults:u # 0.019 M/sec ( +- 0.01% )
43,697,726,320,051 cycles:u # 2.514 GHz ( +- 0.03% )
39,480,408,690,401 stalled-cycles-frontend:u # 90.35% frontend cycles idle ( +- 0.05% )
28,363,394,221,388 instructions:u # 0.65 insn per cycle
# 1.39 stalled cycles per insn ( +- 0.00% )
6,316,794,985,573 branches:u # 363.410 M/sec ( +- 0.00% )
251,013,232,547 branch-misses:u # 3.97% of all branches ( +- 0.01% )
645.991174661 seconds time elapsed ( +- 1.19% )
Unfortunately, this approach doesn't help with text size:
vmlinux.before .text size: 8190319
vmlinux.after .text size: 8200623
The .text section is increased by about 4k. Not sure if we can do anything
about this.
Signed-off-by: Kirill A. Shuemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180216114948.68868-4-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Diffstat (limited to 'arch/x86/entry/entry_64.S')
-rw-r--r-- | arch/x86/entry/entry_64.S | 11 |
1 files changed, 2 insertions, 9 deletions
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index 2c06348b7807..b18acdff9c3f 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -275,15 +275,8 @@ GLOBAL(entry_SYSCALL_64_after_hwframe) * depending on paging mode) in the address. */ #ifdef CONFIG_X86_5LEVEL - testl $1, pgtable_l5_enabled(%rip) - jz 1f - shl $(64 - 57), %rcx - sar $(64 - 57), %rcx - jmp 2f -1: - shl $(64 - 48), %rcx - sar $(64 - 48), %rcx -2: + ALTERNATIVE "shl $(64 - 48), %rcx; sar $(64 - 48), %rcx", \ + "shl $(64 - 57), %rcx; sar $(64 - 57), %rcx", X86_FEATURE_LA57 #else shl $(64 - (__VIRTUAL_MASK_SHIFT+1)), %rcx sar $(64 - (__VIRTUAL_MASK_SHIFT+1)), %rcx |