Age | Commit message (Collapse) | Author | Files | Lines |
|
So the dwarf2 annotations in low level assembly code have
become an increasing hindrance: unreadable, messy macros
mixed into some of the most security sensitive code paths
of the Linux kernel.
These debug info annotations don't even buy the upstream
kernel anything: dwarf driven stack unwinding has caused
problems in the past so it's out of tree, and the upstream
kernel only uses the much more robust framepointers based
stack unwinding method.
In addition to that there's a steady, slow bitrot going
on with these annotations, requiring frequent fixups.
There's no tooling and no functionality upstream that
keeps it correct.
So burn down the sick forest, allowing new, healthier growth:
27 files changed, 350 insertions(+), 1101 deletions(-)
Someone who has the willingness and time to do this
properly can attempt to reintroduce dwarf debuginfo in x86
assembly code plus dwarf unwinding from first principles,
with the following conditions:
- it should be maximally readable, and maximally low-key to
'ordinary' code reading and maintenance.
- find a build time method to insert dwarf annotations
automatically in the most common cases, for pop/push
instructions that manipulate the stack pointer. This could
be done for example via a preprocessing step that just
looks for common patterns - plus special annotations for
the few cases where we want to depart from the default.
We have hundreds of CFI annotations, so automating most of
that makes sense.
- it should come with build tooling checks that ensure that
CFI annotations are sensible. We've seen such efforts from
the framepointer side, and there's no reason it couldn't be
done on the dwarf side.
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The early_idt_handlers asm code generates an array of entry
points spaced nine bytes apart. It's not really clear from that
code or from the places that reference it what's going on, and
the code only works in the first place because GAS never
generates two-byte JMP instructions when jumping to global
labels.
Clean up the code to generate the correct array stride (member size)
explicitly. This should be considerably more robust against
screw-ups, as GAS will warn if a .fill directive has a negative
count. Using '. =' to advance would have been even more robust
(it would generate an actual error if it tried to move
backwards), but it would pad with nulls, confusing anyone who
tries to disassemble the code. The new scheme should be much
clearer to future readers.
While we're at it, improve the comments and rename the array and
common code.
Binutils may start relaxing jumps to non-weak labels. If so,
this change will fix our build, and we may need to backport this
change.
Before, on x86_64:
0000000000000000 <early_idt_handlers>:
0: 6a 00 pushq $0x0
2: 6a 00 pushq $0x0
4: e9 00 00 00 00 jmpq 9 <early_idt_handlers+0x9>
5: R_X86_64_PC32 early_idt_handler-0x4
...
48: 66 90 xchg %ax,%ax
4a: 6a 08 pushq $0x8
4c: e9 00 00 00 00 jmpq 51 <early_idt_handlers+0x51>
4d: R_X86_64_PC32 early_idt_handler-0x4
...
117: 6a 00 pushq $0x0
119: 6a 1f pushq $0x1f
11b: e9 00 00 00 00 jmpq 120 <early_idt_handler>
11c: R_X86_64_PC32 early_idt_handler-0x4
After:
0000000000000000 <early_idt_handler_array>:
0: 6a 00 pushq $0x0
2: 6a 00 pushq $0x0
4: e9 14 01 00 00 jmpq 11d <early_idt_handler_common>
...
48: 6a 08 pushq $0x8
4a: e9 d1 00 00 00 jmpq 120 <early_idt_handler_common>
4f: cc int3
50: cc int3
...
117: 6a 00 pushq $0x0
119: 6a 1f pushq $0x1f
11b: eb 03 jmp 120 <early_idt_handler_common>
11d: cc int3
11e: cc int3
11f: cc int3
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Binutils <binutils@sourceware.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/ac027962af343b0c599cbfcf50b945ad2ef3d7a8.1432336324.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The "movw %ds,%cx" instruction needs a 0x66 prefix, while
"movl %ds,%ecx" does not.
The difference is that latter form (on 64-bit CPUs)
overwrites the entire %ecx, not only its lower half.
But subsequent code doesn't depend on the value of upper
half of %ecx, so we can safely use the shorter instruction.
The new code is also faster than the old one - now we don't
depend on the old value of %ecx, but this code fragment is
not performance-critical so it does not matter much.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1431722346-26585-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Make the disassembly look less confusing:
-- head_64.o.before.asm
++ head_64.o.after.asm
0000000000000120 <early_idt_handler>:
120: fc cld
121: 83 3c 24 02 cmpl $0x2,(%rsp)
- 125: 0f 84 9d 00 00 00 je 1c8 <is_nmi>
+ 125: 0f 84 9d 00 00 00 je 1c8 <early_idt_handler+0xa8>
12b: 83 3d 00 00 00 00 02 cmpl $0x2,0x0(%rip) # 132 <early_idt_handler+0x12>
132: 74 7e je 1b2 <early_idt_handler+0x92>
134: ff 05 00 00 00 00 incl 0x0(%rip) # 13a <early_idt_handler+0x1a>
@@ -1198,9 +1198,7 @@ Disassembly of section .init.text:
1bf: 5a pop %rdx
1c0: 59 pop %rcx
1c1: 58 pop %rax
- 1c2: ff 0d 00 00 00 00 decl 0x0(%rip) # 1c8 <is_nmi>
-
-00000000000001c8 <is_nmi>:
+ 1c2: ff 0d 00 00 00 00 decl 0x0(%rip) # 1c8 <early_idt_handler+0xa8>
1c8: 48 83 c4 10 add $0x10,%rsp
1cc: 48 cf iretq
-- head_32.o.before.asm
++ head_32.o.after.asm
0000016c <early_idt_handler>:
16c: fc cld
16d: 83 3c 24 02 cmpl $0x2,(%esp)
- 171: 74 73 je 1e6 <is_nmi>
+ 171: 74 73 je 1e6 <ex_entry+0xc>
173: 36 83 3d 00 00 00 00 cmpl $0x2,%ss:0x0
17a: 02
17b: 74 5a je 1d7 <hlt_loop>
@@ -483,8 +483,6 @@ Disassembly of section .init.text:
1dd: 59 pop %ecx
1de: 58 pop %eax
1df: 36 ff 0d 00 00 00 00 decl %ss:0x0
-
-000001e6 <is_nmi>:
1e6: 83 c4 08 add $0x8,%esp
1e9: cf iret
1ea: 66 90 xchg %ax,%ax
No functionality change.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1431793079-11153-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Conflicts:
tools/testing/selftests/x86/Makefile
tools/testing/selftests/x86/run_x86_tests.sh
|
|
Packing loops tightly (-falign-loops=1) is beneficial to code size:
text data bss dec filename
12566391 1617840 1089536 15273767 vmlinux.align.16-byte
12224951 1617840 1089536 14932327 vmlinux.align.1-byte
11976567 1617840 1089536 14683943 vmlinux.align.1-byte.funcs-1-byte
11903735 1617840 1089536 14611111 vmlinux.align.1-byte.funcs-1-byte.loops-1-byte
Which reduces the size of the kernel by another 0.6%, so the
the total combined size reduction of the alignment-packing
patches is ~5.5%.
The x86 decoder bandwidth and caching arguments laid out in:
be6cb02779ca ("x86: Align jump targets to 1-byte boundaries")
apply to loop alignment as well.
Furtermore, modern CPU uarchs have a loop cache/buffer that
is a L0 cache before even any uop cache, covering a few
dozen most recently executed instructions.
This loop cache generally does not have the 16-byte alignment
restrictions of the uop cache.
Now loop alignment can still be beneficial if:
- a loop is cache-hot and its surroundings are not.
- if the loop is so cache hot that the instruction
flow becomes x86 decoder bandwidth limited
But loop alignment is harmful if:
- a loop is cache-cold
- a loop's surroundings are cache-hot as well
- two cache-hot loops are close to each other
- if the loop fits into the loop cache
- if the code flow is not decoder bandwidth limited
and I'd argue that the latter five scenarios are much
more common in the kernel, as our hottest loops are
typically:
- pointer chasing: this should fit into the loop cache
in most cases and is typically data cache and address
generation limited
- generic memory ops (memset, memcpy, etc.): these generally
fit into the loop cache as well, and are likewise data
cache limited.
So this patch packs loop addresses tightly as well.
Acked-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Aswin Chandramouleeswaran <aswin@hp.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jason Low <jason.low2@hp.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Link: http://lkml.kernel.org/r/20150410123017.GB19918@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 build fix from Ingo Molnar:
"A bzImage build fix on older distros"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/vdso: Fix 'make bzImage' on older distros
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"Mostly tooling fixes, but also a lockdep annotation fix, a PMU event
list fix and a new model addition"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tools/liblockdep: Fix compilation error
tools/liblockdep: Fix linker error in case of cross compile
perf tools: Use getconf to determine number of online CPUs
tools: Fix tools/vm build
perf/x86/rapl: Enable Broadwell-U RAPL support
perf/x86/intel: Fix SLM cache event list
perf: Annotate inherited event ctx->mutex recursion
|
|
The following NOP in a hot function caught my attention:
> 5a: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
That's a dead NOP that bloats the function a bit, added for the
default 16-byte alignment that GCC applies for jump targets.
I realize that x86 CPU manufacturers recommend 16-byte jump
target alignments (it's in the Intel optimization manual),
to help their relatively narrow decoder prefetch alignment
and uop cache constraints, but the cost of that is very
significant:
text data bss dec filename
12566391 1617840 1089536 15273767 vmlinux.align.16-byte
12224951 1617840 1089536 14932327 vmlinux.align.1-byte
By using 1-byte jump target alignment (i.e. no alignment at all)
we get an almost 3% reduction in kernel size (!) - and a
probably similar reduction in I$ footprint.
Now, the usual justification for jump target alignment is the
following:
- modern decoders tend to have 16-byte (effective) decoder
prefetch windows. (AMD documents it higher but measurements
suggest the effective prefetch window on curretn uarchs is
still around 16 bytes)
- on Intel there's also the uop-cache with cachelines that have
16-byte granularity and limited associativity.
- older x86 uarchs had a penalty for decoder fetches that crossed
16-byte boundaries. These limits are mostly gone from recent
uarchs.
So if a forward jump target is aligned to cacheline boundary then
prefetches will start from a new prefetch-cacheline and there's
higher chance for decoding in fewer steps and packing tightly.
But I think that argument is flawed for typical optimized kernel
code flows: forward jumps often go to 'cold' (uncommon) pieces
of code, and aligning cold code to cache lines does not bring a
lot of advantages (they are uncommon), while it causes
collateral damage:
- their alignment 'spreads out' the cache footprint, it shifts
followup hot code further out
- plus it slows down even 'cold' code that immediately follows 'hot'
code (like in the above case), which could have benefited from the
partial cacheline that comes off the end of hot code.
But even in the cache-hot case the 16 byte alignment brings
disadvantages:
- it spreads out the cache footprint, possibly making the code
fall out of the L1 I$.
- On Intel CPUs, recent microarchitectures have plenty of
uop cache (typically doubling every 3 years) - while the
size of the L1 cache grows much less aggressively. So
workloads are rarely uop cache limited.
The only situation where alignment might matter are tight
loops that could fit into a single 16 byte chunk - but those
are pretty rare in the kernel: if they exist they tend
to be pointer chasing or generic memory ops, which both tend
to be cache miss (or cache allocation) intensive and are not
decoder bandwidth limited.
So the balance of arguments strongly favors packing kernel
instructions tightly versus maximizing for decoder bandwidth:
this patch changes the jump target alignment from 16 bytes
to 1 byte (tightly packed, unaligned).
Acked-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Aswin Chandramouleeswaran <aswin@hp.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jason Low <jason.low2@hp.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Link: http://lkml.kernel.org/r/20150410120846.GA17101@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Move __copy_user_nocache() to arch/x86/lib/copy_user_64.S and
kill the containing file.
No functionality change.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1431538944-27724-4-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Pull it up into the header and kill duplicate versions.
Separately, both macros are identical:
35948b2bd3431aee7149e85cfe4becbc /tmp/a
35948b2bd3431aee7149e85cfe4becbc /tmp/b
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1431538944-27724-3-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
No code changed:
# arch/x86/lib/copy_user_nocache_64.o:
text data bss dec hex filename
390 0 0 390 186 copy_user_nocache_64.o.before
390 0 0 390 186 copy_user_nocache_64.o.after
md5:
7fa0577b28700af89d3a67a8b590426e copy_user_nocache_64.o.before.asm
7fa0577b28700af89d3a67a8b590426e copy_user_nocache_64.o.after.asm
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1431538944-27724-2-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Pull networking fixes from David Miller:
1) Handle max TX power properly wrt VIFs and the MAC in iwlwifi, from
Avri Altman.
2) Use the correct FW API for scan completions in iwlwifi, from Avraham
Stern.
3) FW monitor in iwlwifi accidently uses unmapped memory, fix from Liad
Kaufman.
4) rhashtable conversion of mac80211 station table was buggy, the
virtual interface was not taken into account. Fix from Johannes
Berg.
5) Fix deadlock in rtlwifi by not using a zero timeout for
usb_control_msg(), from Larry Finger.
6) Update reordering state before calculating loss detection, from
Yuchung Cheng.
7) Fix off by one in bluetooth firmward parsing, from Dan Carpenter.
8) Fix extended frame handling in xiling_can driver, from Jeppe
Ledet-Pedersen.
9) Fix CODEL packet scheduler behavior in the presence of TSO packets,
from Eric Dumazet.
10) Fix NAPI budget testing in fm10k driver, from Alexander Duyck.
11) macvlan needs to propagate promisc settings down the the lower
device, from Vlad Yasevich.
12) igb driver can oops when changing number of rings, from Toshiaki
Makita.
13) Source specific default routes not handled properly in ipv6, from
Markus Stenberg.
14) Use after free in tc_ctl_tfilter(), from WANG Cong.
15) Use softirq spinlocking in netxen driver, from Tony Camuso.
16) Two ARM bpf JIT fixes from Nicolas Schichan.
17) Handle MSG_DONTWAIT properly in ring based AF_PACKET sends, from
Mathias Kretschmer.
18) Fix x86 bpf JIT implementation of FROM_{BE16,LE16,LE32}, from Alexei
Starovoitov.
19) ll_temac driver DMA maps TX packet header with incorrect length, fix
from Michal Simek.
20) We removed pm_qos bits from netdevice.h, but some indirect
references remained. Kill them. From David Ahern.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (90 commits)
net: Remove remaining remnants of pm_qos from netdevice.h
e1000e: Add pm_qos header
net: phy: micrel: Fix regression in kszphy_probe
net: ll_temac: Fix DMA map size bug
x86: bpf_jit: fix FROM_BE16 and FROM_LE16/32 instructions
netns: return RTM_NEWNSID instead of RTM_GETNSID on a get
Update be2net maintainers' email addresses
net_sched: gred: use correct backlog value in WRED mode
pppoe: drop pppoe device in pppoe_unbind_sock_work
net: qca_spi: Fix possible race during probe
net: mdio-gpio: Allow for unspecified bus id
af_packet / TX_RING not fully non-blocking (w/ MSG_DONTWAIT).
bnx2x: limit fw delay in kdump to 5s after boot
ARM: net: delegate filter to kernel interpreter when imm_offset() return value can't fit into 12bits.
ARM: net fix emit_udiv() for BPF_ALU | BPF_DIV | BPF_K intruction.
mpls: Change reserved label names to be consistent with netbsd
usbnet: avoid integer overflow in start_xmit
netxen_nic: use spin_[un]lock_bh around tx_clean_lock (2)
net: xgene_enet: Set hardware dependency
net: amd-xgbe: Add hardware dependency
...
|
|
FROM_BE16:
'ror %reg, 8' doesn't clear upper bits of the register,
so use additional 'movzwl' insn to zero extend 16 bits into 64
FROM_LE16:
should zero extend lower 16 bits into 64 bit
FROM_LE32:
should zero extend lower 32 bits into 64 bit
Fixes: 89aa075832b0 ("net: sock: allow eBPF programs to be attached to sockets")
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch enables RAPL counters (energy consumption counters)
support for Intel Broadwell-U processors (Model 61):
To use:
$ perf stat -a -I 1000 -e power/energy-cores/,power/energy-pkg/,power/energy-ram/ sleep 10
Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: <stable@vger.kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: jacob.jun.pan@linux.intel.com
Cc: kan.liang@intel.com
Cc: peterz@infradead.org
Cc: sonnyrao@chromium.org
Link: http://lkml.kernel.org/r/20150423070709.GA4970@thinkpad
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Software optimization guides for both F15h and F16h cite those
NOPs as the optimal ones. A microbenchmark confirms that
actually even older families are better with the single-insn
NOPs so switch to them for the alternatives.
Cycles count below includes the loop overhead of the measurement
but that overhead is the same with all runs.
F10h, revE:
-----------
Running NOP tests, 1000 NOPs x 1000000 repetitions
K8:
90 288.212282 cycles
66 90 288.220840 cycles
66 66 90 288.219447 cycles
66 66 66 90 288.223204 cycles
66 66 90 66 90 571.393424 cycles
66 66 90 66 66 90 571.374919 cycles
66 66 66 90 66 66 90 572.249281 cycles
66 66 66 90 66 66 66 90 571.388651 cycles
P6:
90 288.214193 cycles
66 90 288.225550 cycles
0f 1f 00 288.224441 cycles
0f 1f 40 00 288.225030 cycles
0f 1f 44 00 00 288.233558 cycles
66 0f 1f 44 00 00 324.792342 cycles
0f 1f 80 00 00 00 00 325.657462 cycles
0f 1f 84 00 00 00 00 00 430.246643 cycles
F14h:
----
Running NOP tests, 1000 NOPs x 1000000 repetitions
K8:
90 510.404890 cycles
66 90 510.432117 cycles
66 66 90 510.561858 cycles
66 66 66 90 510.541865 cycles
66 66 90 66 90 1014.192782 cycles
66 66 90 66 66 90 1014.226546 cycles
66 66 66 90 66 66 90 1014.334299 cycles
66 66 66 90 66 66 66 90 1014.381205 cycles
P6:
90 510.436710 cycles
66 90 510.448229 cycles
0f 1f 00 510.545100 cycles
0f 1f 40 00 510.502792 cycles
0f 1f 44 00 00 510.589517 cycles
66 0f 1f 44 00 00 510.611462 cycles
0f 1f 80 00 00 00 00 511.166794 cycles
0f 1f 84 00 00 00 00 00 511.651641 cycles
F15h:
-----
Running NOP tests, 1000 NOPs x 1000000 repetitions
K8:
90 243.128396 cycles
66 90 243.129883 cycles
66 66 90 243.131631 cycles
66 66 66 90 242.499324 cycles
66 66 90 66 90 481.829083 cycles
66 66 90 66 66 90 481.884413 cycles
66 66 66 90 66 66 90 481.851446 cycles
66 66 66 90 66 66 66 90 481.409220 cycles
P6:
90 243.127026 cycles
66 90 243.130711 cycles
0f 1f 00 243.122747 cycles
0f 1f 40 00 242.497617 cycles
0f 1f 44 00 00 245.354461 cycles
66 0f 1f 44 00 00 361.930417 cycles
0f 1f 80 00 00 00 00 362.844944 cycles
0f 1f 84 00 00 00 00 00 480.514948 cycles
F16h:
-----
Running NOP tests, 1000 NOPs x 1000000 repetitions
K8:
90 507.793298 cycles
66 90 507.789636 cycles
66 66 90 507.826490 cycles
66 66 66 90 507.859075 cycles
66 66 90 66 90 1008.663129 cycles
66 66 90 66 66 90 1008.696259 cycles
66 66 66 90 66 66 90 1008.692517 cycles
66 66 66 90 66 66 66 90 1008.755399 cycles
P6:
90 507.795232 cycles
66 90 507.794761 cycles
0f 1f 00 507.834901 cycles
0f 1f 40 00 507.822629 cycles
0f 1f 44 00 00 507.838493 cycles
66 0f 1f 44 00 00 507.908597 cycles
0f 1f 80 00 00 00 00 507.946417 cycles
0f 1f 84 00 00 00 00 00 507.954960 cycles
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1431332153-18566-2-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Change HOST_EXTRACFLAGS to include arch/x86/include/uapi along
with include/uapi.
This looks more consistent, and this fixes "make bzImage" on my
old distro which doesn't have asm/bitsperlong.h in /usr/include/.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: <stable@vger.kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 6f121e548f83 ("x86, vdso: Reimplement vdso.so preparation in build-time C")
Link: http://lkml.kernel.org/r/1431332153-18566-6-git-send-email-bp@alien8.de
Link: http://lkml.kernel.org/r/20150507165835.GB18652@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Since the ISA irqs are in a single block, use
ISA_IRQ_VECTOR(irq) instead of individual macros.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1431185813-15413-5-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Use IA32_SYSCALL_VECTOR for both compat and native.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1431185813-15413-4-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The invalidate_interrupt* functions no longer exist.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1431185813-15413-3-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Move irq_regs and irq_stat definitions to irq.c.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1431185813-15413-2-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
32-bit code has PER_CPU_VAR(cpu_current_top_of_stack).
64-bit code uses somewhat more obscure: PER_CPU_VAR(cpu_tss + TSS_sp0).
Define the 'cpu_current_top_of_stack' macro on CONFIG_X86_64
as well so that the PER_CPU_VAR(cpu_current_top_of_stack)
expression can be used in both 32-bit and 64-bit code.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1429889495-27850-3-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1429889495-27850-2-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
PER_CPU_VAR(kernel_stack) is redundant:
- On the 64-bit build, we can use PER_CPU_VAR(cpu_tss + TSS_sp0).
- On the 32-bit build, we can use PER_CPU_VAR(cpu_current_top_of_stack).
PER_CPU_VAR(kernel_stack) will be deleted by a separate change.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1429889495-27850-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
With both gcc 4.7.2 and 4.9.2, sometimes gcc mysteriously
doesn't inline very small functions we expect to be inlined:
$ nm --size-sort vmlinux | grep -iF ' t ' | uniq -c | grep -v '^
*1 ' | sort -rn 473 000000000000000b t spin_unlock_irqrestore
449 000000000000005f t rcu_read_unlock
355 0000000000000009 t atomic_inc <== THIS
353 000000000000006e t rcu_read_lock
350 0000000000000075 t rcu_read_lock_sched_held
291 000000000000000b t spin_unlock
266 0000000000000019 t arch_local_irq_restore
215 000000000000000b t spin_lock
180 0000000000000011 t kzalloc
165 0000000000000012 t list_add_tail
161 0000000000000019 t arch_local_save_flags
153 0000000000000016 t test_and_set_bit
134 000000000000000b t spin_unlock_irq
134 0000000000000009 t atomic_dec <== THIS
130 000000000000000b t spin_unlock_bh
122 0000000000000010 t brelse
120 0000000000000016 t test_and_clear_bit
120 000000000000000b t spin_lock_irq
119 000000000000001e t get_dma_ops
117 0000000000000053 t cpumask_next
116 0000000000000036 t kref_get
114 000000000000001a t schedule_work
106 000000000000000b t spin_lock_bh
103 0000000000000019 t arch_local_irq_disable
...
Note sizes of marked functions. They are merely 9 bytes long!
Selecting function with 'atomic' in their names:
355 0000000000000009 t atomic_inc
134 0000000000000009 t atomic_dec
98 0000000000000014 t atomic_dec_and_test
31 000000000000000e t atomic_add_return
27 000000000000000a t atomic64_inc
26 000000000000002f t kmap_atomic
24 0000000000000009 t atomic_add
12 0000000000000009 t atomic_sub
10 0000000000000021 t __atomic_add_unless
10 000000000000000a t atomic64_add
5 000000000000001f t __atomic_add_unless.constprop.7
5 000000000000000a t atomic64_dec
4 000000000000001f t __atomic_add_unless.constprop.18
4 000000000000001f t __atomic_add_unless.constprop.12
4 000000000000001f t __atomic_add_unless.constprop.10
3 000000000000001f t __atomic_add_unless.constprop.13
3 0000000000000011 t atomic64_add_return
2 000000000000001f t __atomic_add_unless.constprop.9
2 000000000000001f t __atomic_add_unless.constprop.8
2 000000000000001f t __atomic_add_unless.constprop.6
2 000000000000001f t __atomic_add_unless.constprop.5
2 000000000000001f t __atomic_add_unless.constprop.3
2 000000000000001f t __atomic_add_unless.constprop.22
2 000000000000001f t __atomic_add_unless.constprop.14
2 000000000000001f t __atomic_add_unless.constprop.11
2 000000000000001e t atomic_dec_if_positive
2 0000000000000014 t atomic_inc_and_test
2 0000000000000011 t atomic_add_return.constprop.4
2 0000000000000011 t atomic_add_return.constprop.17
2 0000000000000011 t atomic_add_return.constprop.16
2 000000000000000d t atomic_inc.constprop.4
2 000000000000000c t atomic_cmpxchg
This patch fixes this for x86 atomic ops via
s/inline/__always_inline/. This decreases allyesconfig kernel by
about 25k:
text data bss dec hex filename
82399481 22255416 20627456 125282353 777a831 vmlinux.before
82375570 22255544 20627456 125258570 7774b4a vmlinux
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1431080762-17797-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
iTLB-load-misses and LLC-load-misses count incorrectly on SLM.
There is no ITLB.MISSES support on SLM. Event PAGE_WALKS.I_SIDE_WALK
should be used to count iTLB-load-misses. This event counts when an
instruction (I) page walk is completed or started. Since a page walk
implies a TLB miss, the number of TLB misses can be counted by counting
the number of pagewalks.
DMND_DATA_RD counts both demand and DCU prefetch data reads. However,
LLC-load-misses should only count demand reads. There is no way to not
include prefetches with a single counter on SLM. So the LLC-load-misses
support should be removed on SLM.
Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1429608881-5055-1-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
By the nature of TEST operation, it is often possible
to test a narrower part of the operand:
"testl $3, mem" -> "testb $3, mem"
This results in shorter insns, because TEST insn has no
sign-entending byte-immediate forms unlike other ALU ops.
text data bss dec hex filename
11674 0 0 11674 2d9a entry_64.o.before
11658 0 0 11658 2d8a entry_64.o
Changes in object code:
- f7 84 24 88 00 00 00 03 00 00 00 testl $0x3,0x88(%rsp)
+ f6 84 24 88 00 00 00 03 testb $0x3,0x88(%rsp)
- f7 44 24 68 03 00 00 00 testl $0x3,0x68(%rsp)
+ f6 44 24 68 03 testb $0x3,0x68(%rsp)
- f7 84 24 90 00 00 00 03 00 00 00 testl $0x3,0x90(%rsp)
+ f6 84 24 90 00 00 00 03 testb $0x3,0x90(%rsp)
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1430140912-7960-2-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
After TESTs, use logically correct JZ/JNZ mnemonics instead of
JE/JNE. This doesn't change code.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1430140912-7960-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management and ACPI fixes from Rafael Wysocki:
"These include three regression fixes (PCI resources management,
ACPI/PNP device enumeration, ACPI SBS on MacBook) and two ACPI
documentation fixes related to GPIO.
Specifics:
- Fix for a PCI resources management regression introduced during the
4.0 cycle and related to the handling of ACPI resources'
Producer/Consumer flags that turn out to be useless (Jiang Liu)
- Fix for a MacBook regression related to the Smart Battery Subsystem
(SBS) driver causing various problems (stalls on boot, failure to
detect or report battery) to happen and introduced during the 3.18
cycle (Chris Bainbridge)
- Fix for an ACPI/PNP device enumeration regression introduced during
the 3.16 cycle caused by failing to include two PNP device IDs into
the list of IDs that PNP device objects need to be created for
(Witold Szczeponik)
- Fixes for two minor mistakes in the ACPI GPIO properties
documentation (Antonio Ospite, Rafael J Wysocki)"
* tag 'pm+acpi-4.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / PNP: add two IDs to list for PNPACPI device enumeration
ACPI / documentation: Fix ambiguity in the GPIO properties document
ACPI / documentation: fix a sentence about GPIO resources
ACPI / SBS: Add 5 us delay to fix SBS hangs on MacBook
x86/PCI/ACPI: Make all resources except [io 0xcf8-0xcff] available on PCI bus
|
|
* acpi-resources:
x86/PCI/ACPI: Make all resources except [io 0xcf8-0xcff] available on PCI bus
* acpi-battery:
ACPI / SBS: Add 5 us delay to fix SBS hangs on MacBook
* acpi-doc:
ACPI / documentation: Fix ambiguity in the GPIO properties document
ACPI / documentation: fix a sentence about GPIO resources
* acpi-pnp:
ACPI / PNP: add two IDs to list for PNPACPI device enumeration
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen bug fixes from David Vrabel:
- fix blkback regression if using persistent grants
- fix various event channel related suspend/resume bugs
- fix AMD x86 regression with X86_BUG_SYSRET_SS_ATTRS
- SWIOTLB on ARM now uses frames <4 GiB (if available) so device only
capable of 32-bit DMA work.
* tag 'for-linus-4.1b-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen: Add __GFP_DMA flag when xen_swiotlb_init gets free pages on ARM
hypervisor/x86/xen: Unset X86_BUG_SYSRET_SS_ATTRS on Xen PV guests
xen/events: Set irq_info->evtchn before binding the channel to CPU in __startup_pirq()
xen/console: Update console event channel on resume
xen/xenbus: Update xenbus event channel on resume
xen/events: Clear cpu_evtchn_mask before resuming
xen-pciback: Add name prefix to global 'permissive' variable
xen: Suspend ticks on all CPUs during suspend
xen/grant: introduce func gnttab_unmap_refs_sync()
xen/blkback: safely unmap purge persistent grants
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"EFI fixes, and FPU fix, a ticket spinlock boundary condition fix and
two build fixes"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/fpu: Always restore_xinit_state() when use_eager_cpu()
x86: Make cpu_tss available to external modules
efi: Fix error handling in add_sysfs_runtime_map_entry()
x86/spinlocks: Fix regression in spinlock contention detection
x86/mm: Clean up types in xlate_dev_mem_ptr()
x86/efi: Store upper bits of command line buffer address in ext_cmd_line_ptr
efivarfs: Ensure VariableName is NUL-terminated
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"Mostly tooling fixes, but also an uncore PMU driver fix and an uncore
PMU driver hardware-enablement addition"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf probe: Fix segfault if passed with ''.
perf report: Fix -T/--threads option to work again
perf bench numa: Fix immediate meeting of convergence condition
perf bench numa: Fixes of --quiet argument
perf bench futex: Fix hung wakeup tasks after requeueing
perf probe: Fix bug with global variables handling
perf top: Fix a segfault when kernel map is restricted.
tools lib traceevent: Fix build failure on 32-bit arch
perf kmem: Fix compiles on RHEL6/OL6
tools lib api: Undefine _FORTIFY_SOURCE before setting it
perf kmem: Consistently use PRIu64 for printing u64 values
perf trace: Disable events and drain events when forked workload ends
perf trace: Enable events when doing system wide tracing and starting a workload
perf/x86/intel/uncore: Move PCI IDs for IMC to uncore driver
perf/x86/intel/uncore: Add support for Intel Haswell ULT (lower power Mobile Processor) IMC uncore PMUs
perf/x86/intel: Add cpu_(prepare|starting|dying) for core_pmu
|
|
Make sure that xen_swiotlb_init allocates buffers that are DMA capable
when at least one memblock is available below 4G. Otherwise we assume
that all devices on the SoC can cope with >4G addresses. We do this on
ARM and ARM64, where dom0 is mapped 1:1, so pfn == mfn in this case.
No functional changes on x86.
From: Chen Baozi <baozich@gmail.com>
Signed-off-by: Chen Baozi <baozich@gmail.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Tested-by: Chen Baozi <baozich@gmail.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
|
|
Add some text to the macro magic for future reference and against
failing human memory.
Requested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The following commit:
f893959b0898 ("x86/fpu: Don't abuse drop_init_fpu() in flush_thread()")
removed drop_init_fpu() usage from flush_thread(). This seems to break
things for me - the Go 1.4 test suite fails all over the place with
floating point comparision errors (offending commit found through
bisection).
The functional change was that flush_thread() after this commit
only calls restore_init_xstate() when both use_eager_fpu() and
!used_math() are true. drop_init_fpu() (now fpu_reset_state()) calls
restore_init_xstate() regardless of whether current used_math() - apply
the same logic here.
Switch used_math() -> tsk_used_math(tsk) to consistently use the grabbed
tsk instead of current, like in the rest of flush_thread().
Tested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Bobby Powers <bobbypowers@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pekka Riikonen <priikone@iki.fi>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: f893959b ("x86/fpu: Don't abuse drop_init_fpu() in flush_thread()")
Link: http://lkml.kernel.org/r/1430147441-9820-1-git-send-email-bobbypowers@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
GCC 5 added a compiler option, -mskip-rax-setup, for x86-64. It skips
setting up the RAX register when SSE is disabled and there are no
variable arguments passed in vector registers. (According to the x86_64
ABI, %al is used as a hidden register containing the number of vector
registers used).
Since the kernel doesn't pass vector registers to functions with
variable arguments, this option can be used to optimize the x86-64
kernel.
This GCC feature was suggested by Rasmus Villemoes <linux@rasmusvillemoes.dk>.
This is the corresponding kernel change using it.
For kernel v3.17:
text data bss dec filename
11455921 2204048 5853184 19513153 vmlinux #with -mskip-rax-setup
11480079 2204048 5853184 19537311 vmlinux
For Kernel v4.0+ - custom config:
text data bss dec filename
10231778 3479800 16617472 30329050 vmlinux-gcc5+-mskip-rax-setup
10268797 3547448 16621568 30437813 vmlinux
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/urgent
Pull EFI fixes from Matt Fleming:
* Avoid garbage names in efivarfs due to buggy firmware by zeroing
EFI variable name. (Ross Lagerwall)
* Stop erroneously dropping upper 32 bits of boot command line pointer
in EFI boot stub and stash them in ext_cmd_line_ptr. (Roy Franz)
* Fix double-free bug in error handling code path of EFI runtime map
code. (Dan Carpenter)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Commit 75182b1632 ("x86/asm/entry: Switch all C consumers of
kernel_stack to this_cpu_sp0()") changed current_thread_info
to use this_cpu_sp0, and indirectly made it rely on init_tss
which was exported with EXPORT_PER_CPU_SYMBOL_GPL.
As a result some macros and inline functions such as set/get_fs,
test_thread_flag and variants have been made unusable for
external modules.
Make cpu_tss exported with EXPORT_PER_CPU_SYMBOL so that these
functions are accessible again, as they were previously.
Signed-off-by: Marc Dionne <marc.dionne@your-file-system.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/1430763404-21221-1-git-send-email-marc.dionne@your-file-system.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Commit 61f01dd941ba ("x86_64, asm: Work around AMD SYSRET SS descriptor
attribute issue") makes AMD processors set SS to __KERNEL_DS in
__switch_to() to deal with cases when SS is NULL.
This breaks Xen PV guests who do not want to load SS with__KERNEL_DS.
Since the problem that the commit is trying to address would have to be
fixed in the hypervisor (if it in fact exists under Xen) there is no
reason to set X86_BUG_SYSRET_SS_ATTRS flag for PV VPCUs here.
This can be easily achieved by adding x86_hyper_xen_hvm.set_cpu_features
op which will clear this flag. (And since this structure is no longer
HVM-specific we should do some renaming).
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
|
|
A spinlock is regarded as contended when there is at least one waiter.
Currently, the code that checks whether there are any waiters rely on
tail value being greater than head. However, this is not true if tail
reaches the max value and wraps back to zero, so arch_spin_is_contended()
incorrectly returns 0 (not contended) when tail is smaller than head.
The original code (before regression) handled this case by casting the
(tail - head) to an unsigned value. This change simply restores that
behavior.
Fixes: d6abfdb20223 ("x86/spinlocks/paravirt: Fix memory corruption on unlock")
Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Cc: peterz@infradead.org
Cc: Waiman.Long@hp.com
Cc: borntraeger@de.ibm.com
Cc: oleg@redhat.com
Cc: raghavendra.kt@linux.vnet.ibm.com
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1430799331-20445-1-git-send-email-tahsin@google.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
An IO port or MMIO resource assigned to a PCI host bridge may be
consumed by the host bridge itself or available to its child
bus/devices. The ACPI specification defines a bit (Producer/Consumer)
to tell whether the resource is consumed by the host bridge itself,
but firmware hasn't used that bit consistently, so we can't rely on it.
Before commit 593669c2ac0f ("x86/PCI/ACPI: Use common ACPI resource
interfaces to simplify implementation"), arch/x86/pci/acpi.c ignored
all IO port resources defined by acpi_resource_io and
acpi_resource_fixed_io to filter out IO ports consumed by the host
bridge itself.
Commit 593669c2ac0f ("x86/PCI/ACPI: Use common ACPI resource interfaces
to simplify implementation") started accepting all IO port and MMIO
resources, which caused a regression that IO port resources consumed
by the host bridge itself became available to its child devices.
Then commit 63f1789ec716 ("x86/PCI/ACPI: Ignore resources consumed by
host bridge itself") ignored resources consumed by the host bridge
itself by checking the IORESOURCE_WINDOW flag, which accidently removed
MMIO resources defined by acpi_resource_memory24, acpi_resource_memory32
and acpi_resource_fixed_memory32.
On x86 and IA64 platforms, all IO port and MMIO resources are assumed
to be available to child bus/devices except one special case:
IO port [0xCF8-0xCFF] is consumed by the host bridge itself
to access PCI configuration space.
So explicitly filter out PCI CFG IO ports[0xCF8-0xCFF]. This solution
will also ease the way to consolidate ACPI PCI host bridge common code
from x86, ia64 and ARM64.
Related ACPI table are archived at:
https://bugzilla.kernel.org/show_bug.cgi?id=94221
Related discussions at:
http://patchwork.ozlabs.org/patch/461633/
https://lkml.org/lkml/2015/3/29/304
Fixes: 63f1789ec716 (Ignore resources consumed by host bridge itself)
Reported-by: Bernhard Thaler <bernhard.thaler@wvnet.at>
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: 4.0+ <stable@vger.kernel.org> # 4.0+
Reviewed-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Commit 77e32c89a711 ("clockevents: Manage device's state separately for
the core") decouples clockevent device's modes from states. With this
change when a Xen guest tries to resume, it won't be calling its
set_mode op which needs to be done on each VCPU in order to make the
hypervisor aware that we are in oneshot mode.
This happens because clockevents_tick_resume() (which is an intermediate
step of resuming ticks on a processor) doesn't call clockevents_set_state()
anymore and because during suspend clockevent devices on all VCPUs (except
for the one doing the suspend) are left in ONESHOT state. As result, during
resume the clockevents state machine will assume that device is already
where it should be and doesn't need to be updated.
To avoid this problem we should suspend ticks on all VCPUs during
suspend.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
|
|
This reverts commits 0a4e6be9ca17c54817cf814b4b5aa60478c6df27
and 80f7fdb1c7f0f9266421f823964fd1962681f6ce.
The task migration notifier was originally introduced in order to support
the pvclock vsyscall with non-synchronized TSC, but KVM only supports it
with synchronized TSC. Hence, on KVM the race condition is only needed
due to a bad implementation on the host side, and even then it's so rare
that it's mostly theoretical.
As far as KVM is concerned it's possible to fix the host, avoiding the
additional complexity in the vDSO and the (re)introduction of the task
migration notifier.
Xen, on the other hand, hasn't yet implemented vsyscall support at
all, so we do not care about its plans for non-synchronized TSC.
Reported-by: Peter Zijlstra <peterz@infradead.org>
Suggested-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The kvmclock spec says that the host will increment a version field to
an odd number, then update stuff, then increment it to an even number.
The host is buggy and doesn't do this, and the result is observable
when one vcpu reads another vcpu's kvmclock data.
There's no good way for a guest kernel to keep its vdso from reading
a different vcpu's kvmclock data, but we don't need to care about
changing VCPUs as long as we read a consistent data from kvmclock.
(VCPU can change outside of this loop too, so it doesn't matter if we
return a value not fit for this VCPU.)
Based on a patch by Radim Krčmář.
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
AMD CPUs don't reinitialize the SS descriptor on SYSRET, so SYSRET with
SS == 0 results in an invalid usermode state in which SS is apparently
equal to __USER_DS but causes #SS if used.
Work around the issue by setting SS to __KERNEL_DS __switch_to, thus
ensuring that SYSRET never happens with SS set to NULL.
This was exposed by a recent vDSO cleanup.
Fixes: e7d6eefaaa44 x86/vdso32/syscall.S: Do not load __USER32_DS to %ss
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <vda.linux@googlemail.com>
Cc: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull fourth vfs update from Al Viro:
"d_inode() annotations from David Howells (sat in for-next since before
the beginning of merge window) + four assorted fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
RCU pathwalk breakage when running into a symlink overmounting something
fix I_DIO_WAKEUP definition
direct-io: only inc/dec inode->i_dio_count for file systems
fs/9p: fix readdir()
VFS: assorted d_backing_inode() annotations
VFS: fs/inode.c helpers: d_inode() annotations
VFS: fs/cachefiles: d_backing_inode() annotations
VFS: fs library helpers: d_inode() annotations
VFS: assorted weird filesystems: d_inode() annotations
VFS: normal filesystems (and lustre): d_inode() annotations
VFS: security/: d_inode() annotations
VFS: security/: d_backing_inode() annotations
VFS: net/: d_inode() annotations
VFS: net/unix: d_backing_inode() annotations
VFS: kernel/: d_inode() annotations
VFS: audit: d_backing_inode() annotations
VFS: Fix up some ->d_inode accesses in the chelsio driver
VFS: Cachefiles should perform fs modifications on the top layer only
VFS: AF_UNIX sockets should call mknod on the top layer only
|
|
Pull crypto fixes from Herbert Xu:
"This push fixes a build problem with img-hash under non-standard
configurations and a serious regression with sha512_ssse3 which can
lead to boot failures"
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: img-hash - CRYPTO_DEV_IMGTEC_HASH should depend on HAS_DMA
crypto: x86/sha512_ssse3 - fixup for asm function prototype change
|
|
Pull second batch of KVM changes from Paolo Bonzini:
"This mostly includes the PPC changes for 4.1, which this time cover
Book3S HV only (debugging aids, minor performance improvements and
some cleanups). But there are also bug fixes and small cleanups for
ARM, x86 and s390.
The task_migration_notifier revert and real fix is still pending
review, but I'll send it as soon as possible after -rc1"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (29 commits)
KVM: arm/arm64: check IRQ number on userland injection
KVM: arm: irqfd: fix value returned by kvm_irq_map_gsi
KVM: VMX: Preserve host CR4.MCE value while in guest mode.
KVM: PPC: Book3S HV: Use msgsnd for signalling threads on POWER8
KVM: PPC: Book3S HV: Translate kvmhv_commence_exit to C
KVM: PPC: Book3S HV: Streamline guest entry and exit
KVM: PPC: Book3S HV: Use bitmap of active threads rather than count
KVM: PPC: Book3S HV: Use decrementer to wake napping threads
KVM: PPC: Book3S HV: Don't wake thread with no vcpu on guest IPI
KVM: PPC: Book3S HV: Get rid of vcore nap_count and n_woken
KVM: PPC: Book3S HV: Move vcore preemption point up into kvmppc_run_vcpu
KVM: PPC: Book3S HV: Minor cleanups
KVM: PPC: Book3S HV: Simplify handling of VCPUs that need a VPA update
KVM: PPC: Book3S HV: Accumulate timing information for real-mode code
KVM: PPC: Book3S HV: Create debugfs file for each guest's HPT
KVM: PPC: Book3S HV: Add ICP real mode counters
KVM: PPC: Book3S HV: Move virtual mode ICP functions to real-mode
KVM: PPC: Book3S HV: Convert ICS mutex lock to spin lock
KVM: PPC: Book3S HV: Add guest->host real mode completion counters
KVM: PPC: Book3S HV: Add helpers for lock/unlock hpte
...
|