Age | Commit message (Collapse) | Author | Files | Lines |
|
[ Upstream commit 81e96851ea32deb2c921c870eecabf335f598aeb ]
The clang integrated assembler requires the 'cmp' instruction to
have a length prefix here:
arch/x86/math-emu/wm_sqrt.S:212:2: error: ambiguous instructions require an explicit suffix (could be 'cmpb', 'cmpw', or 'cmpl')
cmp $0xffffffff,-24(%ebp)
^
Make this a 32-bit comparison, which it was clearly meant to be.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lkml.kernel.org/r/20200527135352.1198078-1-arnd@arndb.de
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 5afc78551bf5d53279036e0bf63314e35631d79f ]
Rather than open-code test_tsk_thread_flag() at each callsite, simply
replace the couple of offenders with calls to test_tsk_thread_flag()
directly.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 68239654acafe6aad5a3c1dc7237e60accfebc03 ]
The sequence
fpu->initialized = 1; /* step A */
preempt_disable(); /* step B */
fpu__restore(fpu);
preempt_enable();
in __fpu__restore_sig() is racy in regard to a context switch.
For 32bit frames, __fpu__restore_sig() prepares the FPU state within
fpu->state. To ensure that a context switch (switch_fpu_prepare() in
particular) does not modify fpu->state it uses fpu__drop() which sets
fpu->initialized to 0.
After fpu->initialized is cleared, the CPU's FPU state is not saved
to fpu->state during a context switch. The new state is loaded via
fpu__restore(). It gets loaded into fpu->state from userland and
ensured it is sane. fpu->initialized is then set to 1 in order to avoid
fpu__initialize() doing anything (overwrite the new state) which is part
of fpu__restore().
A context switch between step A and B above would save CPU's current FPU
registers to fpu->state and overwrite the newly prepared state. This
looks like a tiny race window but the Kernel Test Robot reported this
back in 2016 while we had lazy FPU support. Borislav Petkov made the
link between that report and another patch that has been posted. Since
the removal of the lazy FPU support, this race goes unnoticed because
the warning has been removed.
Disable bottom halves around the restore sequence to avoid the race. BH
need to be disabled because BH is allowed to run (even with preemption
disabled) and might invoke kernel_fpu_begin() by doing IPsec.
[ bp: massage commit message a bit. ]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: kvm ML <kvm@vger.kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: stable@vger.kernel.org
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/20181120102635.ddv3fvavxajjlfqk@linutronix.de
Link: https://lkml.kernel.org/r/20160226074940.GA28911@pd.tnic
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 0d5ab144429e8bd80889b856a44d56ab4a5cd59b ]
Increment *pos in the cpuinfo_op.next to fix the following warning
triggered by cat /proc/cpuinfo:
seq_file: buggy .next function c_next did not update position index
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 73f9941306d5ce030f3ffc7db425c7b2a798cf8e ]
Building xtensa kernel with gcc-10 produces the following warnings:
arch/xtensa/kernel/xtensa_ksyms.c:90:15: warning: conflicting types
for built-in function ‘__sync_fetch_and_and_4’;
expected ‘unsigned int(volatile void *, unsigned int)’
[-Wbuiltin-declaration-mismatch]
arch/xtensa/kernel/xtensa_ksyms.c:96:15: warning: conflicting types
for built-in function ‘__sync_fetch_and_or_4’;
expected ‘unsigned int(volatile void *, unsigned int)’
[-Wbuiltin-declaration-mismatch]
Fix declarations of these functions to avoid the warning.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
Commit ed26aacfb5f71eecb20a ("mips: Add udelay lpj numbers adjustment")
has backported to 4.4~5.4, but the "struct cpufreq_freqs" (and also the
cpufreq notifier machanism) of 4.4~4.19 are different from the upstream
kernel. These differences cause build errors, and this patch can fix the
build.
Cc: Serge Semin <Sergey.Semin@baikalelectronics.ru>
Cc: Stable <stable@vger.kernel.org> # 4.4/4.9/4.14/4.19
Signed-off-by: Huacai Chen <chenhc@lemote.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b7faf971081a4e56147f082234bfff55135305cb upstream.
Cc: <stable@vger.kernel.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 00fdec98d9881bf5173af09aebd353ab3b9ac729 upstream.
Trap handler for syscall tracing reads EFA (Exception Fault Address),
in case strace wants PC of trap instruction (EFA is not part of pt_regs
as of current code).
However this EFA read is racy as it happens after dropping to pure
kernel mode (re-enabling interrupts). A taken interrupt could
context-switch, trigger a different task's trap, clobbering EFA for this
execution context.
Fix this by reading EFA early, before re-enabling interrupts. A slight
side benefit is de-duplication of FAKE_RET_FROM_EXCPN in trap handler.
The trap handler is common to both ARCompact and ARCv2 builds too.
This just came out of code rework/review and no real problem was reported
but is clearly a potential problem specially for strace.
Cc: <stable@vger.kernel.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 5ecad245de2ae23dc4e2dbece92f8ccfbaed2fa7 upstream.
Bit 8 would be the "global" bit, which does not quite make sense for non-leaf
page table entries. Intel ignores it; AMD ignores it in PDEs and PDPEs, but
reserves it in PML4Es.
Probably, earlier versions of the AMD manual documented it as reserved in PDPEs
as well, and that behavior made it into KVM as well as kvm-unit-tests; fix it.
Cc: stable@vger.kernel.org
Reported-by: Nadav Amit <namit@vmware.com>
Fixes: a0c0feb57992 ("KVM: x86: reserve bit 8 of non-leaf PDPEs and PML4Es in 64-bit mode on AMD", 2014-09-03)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 8523c006264df65aac7d77284cc69aac46a6f842 ]
After entering kdb due to breakpoint, when we execute 'ss' or 'go' (will
delay installing breakpoints, do single-step first), it won't work
correctly, and it will enter kdb due to oops.
It's because the reason gotten in kdb_stub() is not as expected, and it
seems that the ex_vector for single-step should be 0, like what arch
powerpc/sh/parisc has implemented.
Before the patch:
Entering kdb (current=0xffff8000119e2dc0, pid 0) on processor 0 due to Keyboard Entry
[0]kdb> bp printk
Instruction(i) BP #0 at 0xffff8000101486cc (printk)
is enabled addr at ffff8000101486cc, hardtype=0 installed=0
[0]kdb> g
/ # echo h > /proc/sysrq-trigger
Entering kdb (current=0xffff0000fa878040, pid 266) on processor 3 due to Breakpoint @ 0xffff8000101486cc
[3]kdb> ss
Entering kdb (current=0xffff0000fa878040, pid 266) on processor 3 Oops: (null)
due to oops @ 0xffff800010082ab8
CPU: 3 PID: 266 Comm: sh Not tainted 5.7.0-rc4-13839-gf0e5ad491718 #6
Hardware name: linux,dummy-virt (DT)
pstate: 00000085 (nzcv daIf -PAN -UAO)
pc : el1_irq+0x78/0x180
lr : __handle_sysrq+0x80/0x190
sp : ffff800015003bf0
x29: ffff800015003d20 x28: ffff0000fa878040
x27: 0000000000000000 x26: ffff80001126b1f0
x25: ffff800011b6a0d8 x24: 0000000000000000
x23: 0000000080200005 x22: ffff8000101486cc
x21: ffff800015003d30 x20: 0000ffffffffffff
x19: ffff8000119f2000 x18: 0000000000000000
x17: 0000000000000000 x16: 0000000000000000
x15: 0000000000000000 x14: 0000000000000000
x13: 0000000000000000 x12: 0000000000000000
x11: 0000000000000000 x10: 0000000000000000
x9 : 0000000000000000 x8 : ffff800015003e50
x7 : 0000000000000002 x6 : 00000000380b9990
x5 : ffff8000106e99e8 x4 : ffff0000fadd83c0
x3 : 0000ffffffffffff x2 : ffff800011b6a0d8
x1 : ffff800011b6a000 x0 : ffff80001130c9d8
Call trace:
el1_irq+0x78/0x180
printk+0x0/0x84
write_sysrq_trigger+0xb0/0x118
proc_reg_write+0xb4/0xe0
__vfs_write+0x18/0x40
vfs_write+0xb0/0x1b8
ksys_write+0x64/0xf0
__arm64_sys_write+0x14/0x20
el0_svc_common.constprop.2+0xb0/0x168
do_el0_svc+0x20/0x98
el0_sync_handler+0xec/0x1a8
el0_sync+0x140/0x180
[3]kdb>
After the patch:
Entering kdb (current=0xffff8000119e2dc0, pid 0) on processor 0 due to Keyboard Entry
[0]kdb> bp printk
Instruction(i) BP #0 at 0xffff8000101486cc (printk)
is enabled addr at ffff8000101486cc, hardtype=0 installed=0
[0]kdb> g
/ # echo h > /proc/sysrq-trigger
Entering kdb (current=0xffff0000fa852bc0, pid 268) on processor 0 due to Breakpoint @ 0xffff8000101486cc
[0]kdb> g
Entering kdb (current=0xffff0000fa852bc0, pid 268) on processor 0 due to Breakpoint @ 0xffff8000101486cc
[0]kdb> ss
Entering kdb (current=0xffff0000fa852bc0, pid 268) on processor 0 due to SS trap @ 0xffff800010082ab8
[0]kdb>
Fixes: 44679a4f142b ("arm64: KGDB: Add step debugging support")
Signed-off-by: Wei Li <liwei391@huawei.com>
Tested-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20200509214159.19680-2-liwei391@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 774911290c589e98e3638e73b24b0a4d4530e97c ]
The current number of KVM_IRQCHIP_NUM_PINS results in an order 3
allocation (32kb) for each guest start/restart. This can result in OOM
killer activity even with free swap when the memory is fragmented
enough:
kernel: qemu-system-s39 invoked oom-killer: gfp_mask=0x440dc0(GFP_KERNEL_ACCOUNT|__GFP_COMP|__GFP_ZERO), order=3, oom_score_adj=0
kernel: CPU: 1 PID: 357274 Comm: qemu-system-s39 Kdump: loaded Not tainted 5.4.0-29-generic #33-Ubuntu
kernel: Hardware name: IBM 8562 T02 Z06 (LPAR)
kernel: Call Trace:
kernel: ([<00000001f848fe2a>] show_stack+0x7a/0xc0)
kernel: [<00000001f8d3437a>] dump_stack+0x8a/0xc0
kernel: [<00000001f8687032>] dump_header+0x62/0x258
kernel: [<00000001f8686122>] oom_kill_process+0x172/0x180
kernel: [<00000001f8686abe>] out_of_memory+0xee/0x580
kernel: [<00000001f86e66b8>] __alloc_pages_slowpath+0xd18/0xe90
kernel: [<00000001f86e6ad4>] __alloc_pages_nodemask+0x2a4/0x320
kernel: [<00000001f86b1ab4>] kmalloc_order+0x34/0xb0
kernel: [<00000001f86b1b62>] kmalloc_order_trace+0x32/0xe0
kernel: [<00000001f84bb806>] kvm_set_irq_routing+0xa6/0x2e0
kernel: [<00000001f84c99a4>] kvm_arch_vm_ioctl+0x544/0x9e0
kernel: [<00000001f84b8936>] kvm_vm_ioctl+0x396/0x760
kernel: [<00000001f875df66>] do_vfs_ioctl+0x376/0x690
kernel: [<00000001f875e304>] ksys_ioctl+0x84/0xb0
kernel: [<00000001f875e39a>] __s390x_sys_ioctl+0x2a/0x40
kernel: [<00000001f8d55424>] system_call+0xd8/0x2c8
As far as I can tell s390x does not use the iopins as we bail our for
anything other than KVM_IRQ_ROUTING_S390_ADAPTER and the chip/pin is
only used for KVM_IRQ_ROUTING_IRQCHIP. So let us use a small number to
reduce the memory footprint.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20200617083620.5409-1-borntraeger@de.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit fcec538ef8cca0ad0b84432235dccd9059c8e6f8 upstream.
This resolves the hazard between the mtc0 in the change_c0_status() and
the mfc0 in configure_exception_vector(). Without resolving this hazard
configure_exception_vector() could read an old value and would restore
this old value again. This would revert the changes change_c0_status()
did. I checked this by printing out the read_c0_status() at the end of
per_cpu_trap_init() and the ST0_MX is not set without this patch.
The hazard is documented in the MIPS Architecture Reference Manual Vol.
III: MIPS32/microMIPS32 Privileged Resource Architecture (MD00088), rev
6.03 table 8.1 which includes:
Producer | Consumer | Hazard
----------|----------|----------------------------
mtc0 | mfc0 | any coprocessor 0 register
I saw this hazard on an Atheros AR9344 rev 2 SoC with a MIPS 74Kc CPU.
There the change_c0_status() function would activate the DSPen by
setting ST0_MX in the c0_status register. This was reverted and then the
system got a DSP exception when the DSP registers were saved in
save_dsp() in the first process switch. The crash looks like this:
[ 0.089999] Mount-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[ 0.097796] Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[ 0.107070] Kernel panic - not syncing: Unexpected DSP exception
[ 0.113470] Rebooting in 1 seconds..
We saw this problem in OpenWrt only on the MIPS 74Kc based Atheros SoCs,
not on the 24Kc based SoCs. We only saw it with kernel 5.4 not with
kernel 4.19, in addition we had to use GCC 8.4 or 9.X, with GCC 8.3 it
did not happen.
In the kernel I bisected this problem to commit 9012d011660e ("compiler:
allow all arches to enable CONFIG_OPTIMIZE_INLINING"), but when this was
reverted it also happened after commit 172dcd935c34b ("MIPS: Always
allocate exception vector for MIPSr2+").
Commit 0b24cae4d535 ("MIPS: Add missing EHB in mtc0 -> mfc0 sequence.")
does similar changes to a different file. I am not sure if there are
more places affected by this problem.
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 8dfe804a4031ca6ba3a3efb2048534249b64f3a5 upstream.
A 32-bit perf querying the registers of a compat task using REGS_ABI_32
will receive zeroes from w15, when it expects to find the PC.
Return the PC value for register dwarf register 15 when returning register
values for a compat task to perf.
Cc: <stable@vger.kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Jiping Ma <jiping.ma2@windriver.com>
Link: https://lore.kernel.org/r/1589165527-188401-1-git-send-email-jiping.ma2@windriver.com
[will: Shuffled code and added a comment]
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit bf10bd0be53282183f374af23577b18b5fbf7801 upstream.
Only MSR address range 0x800 through 0x8ff is architecturally reserved
and dedicated for accessing APIC registers in x2APIC mode.
Fixes: 0105d1a52640 ("KVM: x2apic interface to lapic")
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20200616073307.16440-1-xiaoyao.li@intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 586745f1598ccf71b0a5a6df2222dee0a865954e ]
if of_find_device_by_node() succeed, imx_suspend_alloc_ocram() doesn't
have a corresponding put_device(). Thus add a jump target to fix the
exception handling for this function implementation.
Fixes: 1579c7b9fe01 ("ARM: imx53: Set DDR pins to high impedance when in suspend to RAM.")
Signed-off-by: yu kuai <yukuai3@huawei.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 9d964e1b82d8182184153b70174f445ea616f053 ]
lost npc in PTRACE_SETREGSET, breaking PTRACE_SETREGS as well
Fixes: cf51e129b968 "sparc32: fix register window handling in genregs32_[gs]et()"
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 9b38cc704e844e41d9cf74e647bff1d249512cb3 ]
Ziqian reported lockup when adding retprobe on _raw_spin_lock_irqsave.
My test was also able to trigger lockdep output:
============================================
WARNING: possible recursive locking detected
5.6.0-rc6+ #6 Not tainted
--------------------------------------------
sched-messaging/2767 is trying to acquire lock:
ffffffff9a492798 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_hash_lock+0x52/0xa0
but task is already holding lock:
ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&(kretprobe_table_locks[i].lock));
lock(&(kretprobe_table_locks[i].lock));
*** DEADLOCK ***
May be due to missing lock nesting notation
1 lock held by sched-messaging/2767:
#0: ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50
stack backtrace:
CPU: 3 PID: 2767 Comm: sched-messaging Not tainted 5.6.0-rc6+ #6
Call Trace:
dump_stack+0x96/0xe0
__lock_acquire.cold.57+0x173/0x2b7
? native_queued_spin_lock_slowpath+0x42b/0x9e0
? lockdep_hardirqs_on+0x590/0x590
? __lock_acquire+0xf63/0x4030
lock_acquire+0x15a/0x3d0
? kretprobe_hash_lock+0x52/0xa0
_raw_spin_lock_irqsave+0x36/0x70
? kretprobe_hash_lock+0x52/0xa0
kretprobe_hash_lock+0x52/0xa0
trampoline_handler+0xf8/0x940
? kprobe_fault_handler+0x380/0x380
? find_held_lock+0x3a/0x1c0
kretprobe_trampoline+0x25/0x50
? lock_acquired+0x392/0xbc0
? _raw_spin_lock_irqsave+0x50/0x70
? __get_valid_kprobe+0x1f0/0x1f0
? _raw_spin_unlock_irqrestore+0x3b/0x40
? finish_task_switch+0x4b9/0x6d0
? __switch_to_asm+0x34/0x70
? __switch_to_asm+0x40/0x70
The code within the kretprobe handler checks for probe reentrancy,
so we won't trigger any _raw_spin_lock_irqsave probe in there.
The problem is in outside kprobe_flush_task, where we call:
kprobe_flush_task
kretprobe_table_lock
raw_spin_lock_irqsave
_raw_spin_lock_irqsave
where _raw_spin_lock_irqsave triggers the kretprobe and installs
kretprobe_trampoline handler on _raw_spin_lock_irqsave return.
The kretprobe_trampoline handler is then executed with already
locked kretprobe_table_locks, and first thing it does is to
lock kretprobe_table_locks ;-) the whole lockup path like:
kprobe_flush_task
kretprobe_table_lock
raw_spin_lock_irqsave
_raw_spin_lock_irqsave ---> probe triggered, kretprobe_trampoline installed
---> kretprobe_table_locks locked
kretprobe_trampoline
trampoline_handler
kretprobe_hash_lock(current, &head, &flags); <--- deadlock
Adding kprobe_busy_begin/end helpers that mark code with fake
probe installed to prevent triggering of another kprobe within
this code.
Using these helpers in kprobe_flush_task, so the probe recursion
protection check is hit and the probe is never set to prevent
above lockup.
Link: http://lkml.kernel.org/r/158927059835.27680.7011202830041561604.stgit@devnote2
Fixes: ef53d9c5e4da ("kprobes: improve kretprobe scalability with hashed locking")
Cc: Ingo Molnar <mingo@kernel.org>
Cc: "Gustavo A . R . Silva" <gustavoars@kernel.org>
Cc: Anders Roxell <anders.roxell@linaro.org>
Cc: "Naveen N . Rao" <naveen.n.rao@linux.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David Miller <davem@davemloft.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Reported-by: "Ziqian SUN (Zamir)" <zsun@redhat.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit b191fa96ea6dc00d331dcc28c1f7db5e075693a0 ]
Avoid kretprobe recursion loop bg by setting a dummy
kprobes to current_kprobe per-CPU variable.
This bug has been introduced with the asm-coded trampoline
code, since previously it used another kprobe for hooking
the function return placeholder (which only has a nop) and
trampoline handler was called from that kprobe.
This revives the old lost kprobe again.
With this fix, we don't see deadlock anymore.
And you can see that all inner-called kretprobe are skipped.
event_1 235 0
event_2 19375 19612
The 1st column is recorded count and the 2nd is missed count.
Above shows (event_1 rec) + (event_2 rec) ~= (event_2 missed)
(some difference are here because the counter is racy)
Reported-by: Andrea Righi <righi.andrea@gmail.com>
Tested-by: Andrea Righi <righi.andrea@gmail.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Fixes: c9becf58d935 ("[PATCH] kretprobe: kretprobe-booster")
Link: http://lkml.kernel.org/r/155094064889.6137.972160690963039.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 30176466e36aadba01e1a630cf42397a3438efa4 ]
Fix two issues with kprobes.h on BE which were exposed with the
optprobes work:
- one, having to do with a missing include for linux/module.h for
MODULE_NAME_LEN -- this didn't show up previously since the only
users of kprobe_lookup_name were in kprobes.c, which included
linux/module.h through other headers, and
- two, with a missing const qualifier for a local variable which ends
up referring a string literal. Again, this is unique to how
kprobe_lookup_name is being invoked in optprobes.c
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 142cd25293f6a7ecbdff4fb0af17de6438d46433 ]
We do need access_process_vm() to access the target's reg_window.
However, access to caller's memory (storing the result in
genregs32_get(), fetching the new values in case of genregs32_set())
should be done by normal uaccess primitives.
Fixes: ad4f95764040 ([SPARC64]: Fix user accesses in regset code.)
Cc: stable@kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 6bd140e14d9aaa734ec37985b8b20a96c0ece948 ]
Working on the OpenRISC glibc port I found that sometimes clone was
working strange. That the tls data argument sent in r7 was always
wrong. Further investigation revealed that the arguments were getting
clobbered in the entry code. This patch removes the code that writes to
the argument registers. This was likely due to some old code hanging
around.
This patch fixes this up for clone and fork. This fork clobber is
harmless but also useless so remove.
Signed-off-by: Stafford Horne <shorne@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 126554465d93b10662742128918a5fc338cda4aa ]
The ps3_mm_region_destroy() and ps3_mm_vas_destroy() routines
are called very late in the shutdown via kexec's mmu_cleanup_all
routine. By the time mmu_cleanup_all runs it is too late to use
udbg_printf, and calling it will cause PS3 systems to hang.
Remove all debugging statements from ps3_mm_region_destroy() and
ps3_mm_vas_destroy() and replace any error reporting with calls
to lv1_panic.
With this change builds with 'DEBUG' defined will not cause kexec
reboots to hang, and builds with 'DEBUG' defined or not will end
in lv1_panic if an error is encountered.
Signed-off-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7325c4af2b4c989c19d6a26b90b1fec9c0615ddf.1589049250.git.geoff@infradead.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit deb70f7a35a22dffa55b2c3aac71bc6fb0f486ce ]
This was discovered developing qemu fwnmi sreset support. This
off-by-one bug means the last 16 bytes of the rtas area can not
be used for a 16 byte save area.
It's not a serious bug, and QEMU implementation has to retain a
workaround for old kernels, but it's good to tighten it.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Link: https://lore.kernel.org/r/20200508043408.886394-7-npiggin@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit be5470e0c285a68dc3afdea965032f5ddc8269d7 ]
'mem=" option is an easy way to put high pressure on memory during
some test. Hence after applying the memory limit, instead of total
mem, the actual usable memory should be considered when reserving mem
for crashkernel. Otherwise the boot up may experience OOM issue.
E.g. it would reserve 4G prior to the change and 512M afterward, if
passing
crashkernel="2G-4G:384M,4G-16G:512M,16G-64G:1G,64G-128G:2G,128G-:4G",
and mem=5G on a 256G machine.
This issue is powerpc specific because it puts higher priority on
fadump and kdump reservation than on "mem=". Referring the following
code:
if (fadump_reserve_mem() == 0)
reserve_crashkernel();
...
/* Ensure that total memory size is page-aligned. */
limit = ALIGN(memory_limit ?: memblock_phys_mem_size(), PAGE_SIZE);
memblock_enforce_memory_limit(limit);
While on other arches, the effect of "mem=" takes a higher priority
and pass through memblock_phys_mem_size() before calling
reserve_crashkernel().
Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Reviewed-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1585749644-4148-1-git-send-email-kernelfans@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit b3583fca5fb654af2cfc1c08259abb9728272538 upstream.
If both the tracer and the tracee are compat processes, and gprs[2]
is assigned a value by __poke_user_compat, then the higher 32 bits
of gprs[2] are cleared, IS_ERR_VALUE() always returns false, and
syscall_get_error() always returns 0.
Fix the implementation by sign-extending the value for compat processes
the same way as x86 implementation does.
The bug was exposed to user space by commit 201766a20e30f ("ptrace: add
PTRACE_GET_SYSCALL_INFO request") and detected by strace test suite.
This change fixes strace syscall tampering on s390.
Link: https://lkml.kernel.org/r/20200602180051.GA2427@altlinux.org
Fixes: 753c4dd6a2fa2 ("[S390] ptrace changes")
Cc: Elvira Khabirova <lineprinter@altlinux.org>
Cc: stable@vger.kernel.org # v2.6.28+
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 35509737c8f958944e059d501255a0bf18361ba0 upstream.
The PL310 Auxiliary Control Register shouldn't have the "Full line of
zero" optimization bit being set before L2 cache is enabled. The L2X0
driver takes care of enabling the optimization by itself.
This patch fixes a noisy error message on Tegra20 and Tegra30 telling
that cache optimization is erroneously enabled without enabling it for
the CPU:
L2C-310: enabling full line of zeros but not enabled in Cortex-A9
Cc: <stable@vger.kernel.org>
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Tested-by: Nicolas Chauvet <kwizart@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit cf51e129b96847f969bfb8af1ee1516a01a70b39 upstream.
It needs access_process_vm() if the traced process does not share
mm with the caller. Solution is similar to what sparc64 does.
Note that genregs32_set() is only ever called with pos being 0
or 32 * sizeof(u32) (the latter - as part of PTRACE_SETREGS
handling).
Cc: stable@kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 31e1b3efa802f97a17628dde280006c4cee4ce5e ]
Register "a1" is unsaved in this function,
when CONFIG_TRACE_IRQFLAGS is enabled,
the TRACE_IRQS_OFF macro will call trace_hardirqs_off(),
and this may change register "a1".
The changed register "a1" as argument will be send
to do_fpe() and do_msa_fpe().
Signed-off-by: YuanJunQing <yuanjunqing66@163.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit bcc44f6b74106b31f0b0408b70305a40360d63b7 ]
There is no VIA2 chip on the Mac IIfx, so don't call via_flush_cache().
This avoids a boot crash which appeared in v5.4.
printk: console [ttyS0] enabled
printk: bootconsole [debug0] disabled
printk: bootconsole [debug0] disabled
Calibrating delay loop... 9.61 BogoMIPS (lpj=48064)
pid_max: default: 32768 minimum: 301
Mount-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
devtmpfs: initialized
random: get_random_u32 called from bucket_table_alloc.isra.27+0x68/0x194 with crng_init=0
clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
futex hash table entries: 256 (order: -1, 3072 bytes, linear)
NET: Registered protocol family 16
Data read fault at 0x00000000 in Super Data (pc=0x8a6a)
BAD KERNEL BUSERR
Oops: 00000000
Modules linked in:
PC: [<00008a6a>] via_flush_cache+0x12/0x2c
SR: 2700 SP: 01c1fe3c a2: 01c24000
d0: 00001119 d1: 0000000c d2: 00012000 d3: 0000000f
d4: 01c06840 d5: 00033b92 a0: 00000000 a1: 00000000
Process swapper (pid: 1, task=01c24000)
Frame format=B ssw=0755 isc=0200 isb=fff7 daddr=00000000 dobuf=01c1fed0
baddr=00008a6e dibuf=0000004e ver=f
Stack from 01c1fec4:
01c1fed0 00007d7e 00010080 01c1fedc 0000792e 00000001 01c1fef4 00006b40
01c80000 00040000 00000006 00000003 01c1ff1c 004a545e 004ff200 00040000
00000000 00000003 01c06840 00033b92 004a5410 004b6c88 01c1ff84 000021e2
00000073 00000003 01c06840 00033b92 0038507a 004bb094 004b6ca8 004b6c88
004b6ca4 004b6c88 000021ae 00020002 00000000 01c0685d 00000000 01c1ffb4
0049f938 00409c85 01c06840 0045bd40 00000073 00000002 00000002 00000000
Call Trace: [<00007d7e>] mac_cache_card_flush+0x12/0x1c
[<00010080>] fix_dnrm+0x2/0x18
[<0000792e>] cache_push+0x46/0x5a
[<00006b40>] arch_dma_prep_coherent+0x60/0x6e
[<00040000>] switched_to_dl+0x76/0xd0
[<004a545e>] dma_atomic_pool_init+0x4e/0x188
[<00040000>] switched_to_dl+0x76/0xd0
[<00033b92>] parse_args+0x0/0x370
[<004a5410>] dma_atomic_pool_init+0x0/0x188
[<000021e2>] do_one_initcall+0x34/0x1be
[<00033b92>] parse_args+0x0/0x370
[<0038507a>] strcpy+0x0/0x1e
[<000021ae>] do_one_initcall+0x0/0x1be
[<00020002>] do_proc_dointvec_conv+0x54/0x74
[<0049f938>] kernel_init_freeable+0x126/0x190
[<0049f94c>] kernel_init_freeable+0x13a/0x190
[<004a5410>] dma_atomic_pool_init+0x0/0x188
[<00041798>] complete+0x0/0x3c
[<000b9b0c>] kfree+0x0/0x20a
[<0038df98>] schedule+0x0/0xd0
[<0038d604>] kernel_init+0x0/0xda
[<0038d610>] kernel_init+0xc/0xda
[<0038d604>] kernel_init+0x0/0xda
[<00002d38>] ret_from_kernel_thread+0xc/0x14
Code: 0000 2079 0048 10da 2279 0048 10c8 d3c8 <1011> 0200 fff7 1280 d1f9 0048 10c8 1010 0000 0008 1080 4e5e 4e75 4e56 0000 2039
Disabling lock debugging due to kernel taint
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
Thanks to Stan Johnson for capturing the console log and running git
bisect.
Git bisect said commit 8e3a68fb55e0 ("dma-mapping: make
dma_atomic_pool_init self-contained") is the first "bad" commit. I don't
know why. Perhaps mach_l2_flush first became reachable with that commit.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-and-tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Cc: Joshua Thompson <funaho@jurai.org>
Link: https://lore.kernel.org/r/b8bbeef197d6b3898e82ed0d231ad08f575a4b34.1589949122.git.fthain@telegraphics.com.au
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 67d631b7c05eff955ccff4139327f0f92a5117e5 ]
This currently leaks kernel physical addresses into userspace.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Link: https://lkml.kernel.org/r/20200229231120.1147527-1-nivedita@alum.mit.edu
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit ed26aacfb5f71eecb20a51c4467da440cb719d66 ]
Loops-per-jiffies is a special number which represents a number of
noop-loop cycles per CPU-scheduler quantum - jiffies. As you
understand aside from CPU-specific implementation it depends on
the CPU frequency. So when a platform has the CPU frequency fixed,
we have no problem and the current udelay interface will work
just fine. But as soon as CPU-freq driver is enabled and the cores
frequency changes, we'll end up with distorted udelay's. In order
to fix this we have to accordinly adjust the per-CPU udelay_val
(the same as the global loops_per_jiffy) number. This can be done
in the CPU-freq transition event handler. We subscribe to that event
in the MIPS arch time-inititalization method.
Co-developed-by: Alexey Malahov <Alexey.Malahov@baikalelectronics.ru>
Signed-off-by: Alexey Malahov <Alexey.Malahov@baikalelectronics.ru>
Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Paul Burton <paulburton@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: devicetree@vger.kernel.org
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 8a0efb8b101665a843205eab3d67ab09cb2d9a8d ]
Commit 3885c2b463f6 ("MIPS: CM: Add support for reporting CM cache
errors") adds cm2_causes[] array with map of error type ID and
pointers to the short description string. There is a mistake in
the table, since according to MIPS32 manual CM2_ERROR_TYPE = {17,18}
correspond to INTVN_WR_ERR and INTVN_RD_ERR, while the table
claims they have {0x17,0x18} codes. This is obviously hex-dec
copy-paste bug. Moreover codes {0x18 - 0x1a} indicate L2 ECC errors.
Fixes: 3885c2b463f6 ("MIPS: CM: Add support for reporting CM cache errors")
Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
Cc: Alexey Malahov <Alexey.Malahov@baikalelectronics.ru>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Paul Burton <paulburton@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: linux-pm@vger.kernel.org
Cc: devicetree@vger.kernel.org
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 88413a6bfbbe2f648df399b62f85c934460b7a4d ]
Currently, we may perform a copy_to_user (through
simple_read_from_buffer()) while holding a context's register_lock,
while accessing the context save area.
This change uses a temporary buffer for the context save area data,
which we then pass to simple_read_from_buffer.
Includes changes from Christoph Hellwig <hch@lst.de>.
Fixes: bf1ab978be23 ("[POWERPC] coredump: Add SPU elf notes to coredump.")
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
[hch: renamed to function to avoid ___-prefixes]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 269b3a9ac538c4ae87f84be640b9fa89914a2489 ]
In the current code, if CONFIG_SWIOTLB is set, when failed to get IO TLB
memory from the low pages by plat_swiotlb_setup(), it may lead to the boot
process failed with kernel panic.
(1) On the Loongson and SiByte platform
arch/mips/loongson64/dma.c
arch/mips/sibyte/common/dma.c
void __init plat_swiotlb_setup(void)
{
swiotlb_init(1);
}
kernel/dma/swiotlb.c
void __init
swiotlb_init(int verbose)
{
...
vstart = memblock_alloc_low(PAGE_ALIGN(bytes), PAGE_SIZE);
if (vstart && !swiotlb_init_with_tbl(vstart, io_tlb_nslabs, verbose))
return;
...
pr_warn("Cannot allocate buffer");
no_iotlb_memory = true;
}
phys_addr_t swiotlb_tbl_map_single()
{
...
if (no_iotlb_memory)
panic("Can not allocate SWIOTLB buffer earlier ...");
...
}
(2) On the Cavium OCTEON platform
arch/mips/cavium-octeon/dma-octeon.c
void __init plat_swiotlb_setup(void)
{
...
octeon_swiotlb = memblock_alloc_low(swiotlbsize, PAGE_SIZE);
if (!octeon_swiotlb)
panic("%s: Failed to allocate %zu bytes align=%lx\n",
__func__, swiotlbsize, PAGE_SIZE);
...
}
Because IO_TLB_DEFAULT_SIZE is 64M, if the rest size of low memory is less
than 64M when call plat_swiotlb_setup(), we can easily reproduce the panic
case.
In order to reduce the possibility of kernel panic when failed to get IO
TLB memory under CONFIG_SWIOTLB, it is better to allocate low memory as
small as possible before plat_swiotlb_setup(), so make sparse_init() using
top-down allocation.
Reported-by: Juxin Gao <gaojuxin@loongson.cn>
Co-developed-by: Juxin Gao <gaojuxin@loongson.cn>
Signed-off-by: Juxin Gao <gaojuxin@loongson.cn>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit e1de94380af588bdf6ad6f0cc1f75004c35bc096 ]
Recent work with KASan exposed the folling hard-coded bitmask
in arch/arm/mm/proc-macros.S:
bic rd, sp, #8128
bic rd, rd, #63
This forms the bitmask 0x1FFF that is coinciding with
(PAGE_SIZE << THREAD_SIZE_ORDER) - 1, this code was assuming
that THREAD_SIZE is always 8K (8192).
As KASan was increasing THREAD_SIZE_ORDER to 2, I ran into
this bug.
Fix it by this little oneline suggested by Ard:
bic rd, sp, #(THREAD_SIZE - 1) & ~63
Where THREAD_SIZE is defined using THREAD_SIZE_ORDER.
We have to also include <linux/const.h> since the THREAD_SIZE
expands to use the _AC() macro.
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Suggested-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 3204be4109ad681523e3461ce64454c79278450a upstream.
AArch32 CP1x registers are overlayed on their AArch64 counterparts
in the vcpu struct. This leads to an interesting problem as they
are stored in their CPU-local format, and thus a CP1x register
doesn't "hit" the lower 32bit portion of the AArch64 register on
a BE host.
To workaround this unfortunate situation, introduce a bias trick
in the vcpu_cp1x() accessors which picks the correct half of the
64bit register.
Cc: stable@vger.kernel.org
Reported-by: James Morse <james.morse@arm.com>
Tested-by: James Morse <james.morse@arm.com>
Acked-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 2ebac8bb3c2d35f5135466490fc8eeaf3f3e2d37 upstream.
Consult only the basic exit reason, i.e. bits 15:0 of vmcs.EXIT_REASON,
when determining whether a nested VM-Exit should be reflected into L1 or
handled by KVM in L0.
For better or worse, the switch statement in nested_vmx_exit_reflected()
currently defaults to "true", i.e. reflects any nested VM-Exit without
dedicated logic. Because the case statements only contain the basic
exit reason, any VM-Exit with modifier bits set will be reflected to L1,
even if KVM intended to handle it in L0.
Practically speaking, this only affects EXIT_REASON_MCE_DURING_VMENTRY,
i.e. a #MC that occurs on nested VM-Enter would be incorrectly routed to
L1, as "failed VM-Entry" is the only modifier that KVM can currently
encounter. The SMM modifiers will never be generated as KVM doesn't
support/employ a SMI Transfer Monitor. Ditto for "exit from enclave",
as KVM doesn't yet support virtualizing SGX, i.e. it's impossible to
enter an enclave in a KVM guest (L1 or L2).
Fixes: 644d711aa0e1 ("KVM: nVMX: Deciding if L0 or L1 should handle an L2 exit")
Cc: Jim Mattson <jmattson@google.com>
Cc: Xiaoyao Li <xiaoyao.li@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200227174430.26371-1-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 6c0238c4a62b3a0b1201aeb7e33a4636d552a436 upstream.
Restoring the ASID from the hsave area on VMEXIT is wrong, because its
value depends on the handling of TLB flushes. Just skipping the field in
copy_vmcb_control_area will do.
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 4d8df8cbb9156b0a0ab3f802b80cb5db57acc0bf ]
Currently, it is possible to enable indirect branch speculation even after
it was force-disabled using the PR_SPEC_FORCE_DISABLE option. Moreover, the
PR_GET_SPECULATION_CTRL command gives afterwards an incorrect result
(force-disabled when it is in fact enabled). This also is inconsistent
vs. STIBP and the documention which cleary states that
PR_SPEC_FORCE_DISABLE cannot be undone.
Fix this by actually enforcing force-disabled indirect branch
speculation. PR_SPEC_ENABLE called after PR_SPEC_FORCE_DISABLE now fails
with -EPERM as described in the documentation.
Fixes: 9137bb27e60e ("x86/speculation: Add prctl() control for indirect branch speculation")
Signed-off-by: Anthony Steinhauser <asteinhauser@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 21998a351512eba4ed5969006f0c55882d995ada ]
When STIBP is unavailable or enhanced IBRS is available, Linux
force-disables the IBPB mitigation of Spectre-BTB even when simultaneous
multithreading is disabled. While attempts to enable IBPB using
prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, ...) fail with
EPERM, the seccomp syscall (or its prctl(PR_SET_SECCOMP, ...) equivalent)
which are used e.g. by Chromium or OpenSSH succeed with no errors but the
application remains silently vulnerable to cross-process Spectre v2 attacks
(classical BTB poisoning). At the same time the SYSFS reporting
(/sys/devices/system/cpu/vulnerabilities/spectre_v2) displays that IBPB is
conditionally enabled when in fact it is unconditionally disabled.
STIBP is useful only when SMT is enabled. When SMT is disabled and STIBP is
unavailable, it makes no sense to force-disable also IBPB, because IBPB
protects against cross-process Spectre-BTB attacks regardless of the SMT
state. At the same time since missing STIBP was only observed on AMD CPUs,
AMD does not recommend using STIBP, but recommends using IBPB, so disabling
IBPB because of missing STIBP goes directly against AMD's advice:
https://developer.amd.com/wp-content/resources/Architecture_Guidelines_Update_Indirect_Branch_Control.pdf
Similarly, enhanced IBRS is designed to protect cross-core BTB poisoning
and BTB-poisoning attacks from user space against kernel (and
BTB-poisoning attacks from guest against hypervisor), it is not designed
to prevent cross-process (or cross-VM) BTB poisoning between processes (or
VMs) running on the same core. Therefore, even with enhanced IBRS it is
necessary to flush the BTB during context-switches, so there is no reason
to force disable IBPB when enhanced IBRS is available.
Enable the prctl control of IBPB even when STIBP is unavailable or enhanced
IBRS is available.
Fixes: 7cc765a67d8e ("x86/speculation: Enable prctl mode for spectre_v2_user")
Signed-off-by: Anthony Steinhauser <asteinhauser@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 20c3a2c33e9fdc82e9e8e8d2a6445b3256d20191 ]
Different AMD processors may have different implementations of STIBP.
When STIBP is conditionally enabled, some implementations would benefit
from having STIBP always on instead of toggling the STIBP bit through MSR
writes. This preference is advertised through a CPUID feature bit.
When conditional STIBP support is requested at boot and the CPU advertises
STIBP always-on mode as preferred, switch to STIBP "on" support. To show
that this transition has occurred, create a new spectre_v2_user_mitigation
value and a new spectre_v2_user_strings message. The new mitigation value
is used in spectre_v2_user_select_mitigation() to print the new mitigation
message as well as to return a new string from stibp_state().
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lkml.kernel.org/r/20181213230352.6937.74943.stgit@tlendack-t1.amdoffice.net
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit aa77bfb354c495fc4361199e63fc5765b9e1e783 ]
STIBP stands for Single Thread Indirect Branch Predictors. The acronym,
however, can be easily mis-spelled as STIPB. It is perhaps due to the
presence of another related term - IBPB (Indirect Branch Predictor
Barrier).
Fix the mis-spelling in the code.
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: KarimAllah Ahmed <karahmed@amazon.de>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/1544039368-9009-1-git-send-email-longman@redhat.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 140fd4ac78d385e6c8e6a5757585f6c707085f87 upstream.
On MacBook6,1 reboot would hang unless parameter reboot=pci is added.
Make it automatic.
Signed-off-by: Hill Ma <maahiuzeon@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20200425200641.GA1554@cslab.localdomain
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit dbbe2ad02e9df26e372f38cc3e70dab9222c832e upstream.
On context switch the change of TIF_SSBD and TIF_SPEC_IB are evaluated
to adjust the mitigations accordingly. This is optimized to avoid the
expensive MSR write if not needed.
This optimization is buggy and allows an attacker to shutdown the SSBD
protection of a victim process.
The update logic reads the cached base value for the speculation control
MSR which has neither the SSBD nor the STIBP bit set. It then OR's the
SSBD bit only when TIF_SSBD is different and requests the MSR update.
That means if TIF_SSBD of the previous and next task are the same, then
the base value is not updated, even if TIF_SSBD is set. The MSR write is
not requested.
Subsequently if the TIF_STIBP bit differs then the STIBP bit is updated
in the base value and the MSR is written with a wrong SSBD value.
This was introduced when the per task/process conditional STIPB
switching was added on top of the existing SSBD switching.
It is exploitable if the attacker creates a process which enforces SSBD
and has the contrary value of STIBP than the victim process (i.e. if the
victim process enforces STIBP, the attacker process must not enforce it;
if the victim process does not enforce STIBP, the attacker process must
enforce it) and schedule it on the same core as the victim process. If
the victim runs after the attacker the victim becomes vulnerable to
Spectre V4.
To fix this, update the MSR value independent of the TIF_SSBD difference
and dependent on the SSBD mitigation method available. This ensures that
a subsequent STIPB initiated MSR write has the correct state of SSBD.
[ tglx: Handle X86_FEATURE_VIRT_SSBD & X86_FEATURE_VIRT_SSBD correctly
and massaged changelog ]
Fixes: 5bfbe3ad5840 ("x86/speculation: Prepare for per task indirect branch speculation control")
Signed-off-by: Anthony Steinhauser <asteinhauser@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit d8ad6d39c35d2b44b3d48b787df7f3359381dcbf upstream.
'jiffies' and 'jiffies_64' are meant to alias (two different symbols that
share the same address). Most architectures make the symbols alias to the
same address via a linker script assignment in their
arch/<arch>/kernel/vmlinux.lds.S:
jiffies = jiffies_64;
which is effectively a definition of jiffies.
jiffies and jiffies_64 are both forward declared for all architectures in
include/linux/jiffies.h. jiffies_64 is defined in kernel/time/timer.c.
x86_64 was peculiar in that it wasn't doing the above linker script
assignment, but rather was:
1. defining jiffies in arch/x86/kernel/time.c instead via the linker script.
2. overriding the symbol jiffies_64 from kernel/time/timer.c in
arch/x86/kernel/vmlinux.lds.s via 'jiffies_64 = jiffies;'.
As Fangrui notes:
In LLD, symbol assignments in linker scripts override definitions in
object files. GNU ld appears to have the same behavior. It would
probably make sense for LLD to error "duplicate symbol" but GNU ld
is unlikely to adopt for compatibility reasons.
This results in an ODR violation (UB), which seems to have survived
thus far. Where it becomes harmful is when;
1. -fno-semantic-interposition is used:
As Fangrui notes:
Clang after LLVM commit 5b22bcc2b70d
("[X86][ELF] Prefer to lower MC_GlobalAddress operands to .Lfoo$local")
defaults to -fno-semantic-interposition similar semantics which help
-fpic/-fPIC code avoid GOT/PLT when the referenced symbol is defined
within the same translation unit. Unlike GCC
-fno-semantic-interposition, Clang emits such relocations referencing
local symbols for non-pic code as well.
This causes references to jiffies to refer to '.Ljiffies$local' when
jiffies is defined in the same translation unit. Likewise, references to
jiffies_64 become references to '.Ljiffies_64$local' in translation units
that define jiffies_64. Because these differ from the names used in the
linker script, they will not be rewritten to alias one another.
2. Full LTO
Full LTO effectively treats all source files as one translation
unit, causing these local references to be produced everywhere. When
the linker processes the linker script, there are no longer any
references to jiffies_64' anywhere to replace with 'jiffies'. And
thus '.Ljiffies$local' and '.Ljiffies_64$local' no longer alias
at all.
In the process of porting patches enabling Full LTO from arm64 to x86_64,
spooky bugs have been observed where the kernel appeared to boot, but init
doesn't get scheduled.
Avoid the ODR violation by matching other architectures and define jiffies
only by linker script. For -fno-semantic-interposition + Full LTO, there
is no longer a global definition of jiffies for the compiler to produce a
local symbol which the linker script won't ensure aliases to jiffies_64.
Fixes: 40747ffa5aa8 ("asmlinkage: Make jiffies visible")
Reported-by: Nathan Chancellor <natechancellor@gmail.com>
Reported-by: Alistair Delva <adelva@google.com>
Debugged-by: Nick Desaulniers <ndesaulniers@google.com>
Debugged-by: Sami Tolvanen <samitolvanen@google.com>
Suggested-by: Fangrui Song <maskray@google.com>
Signed-off-by: Bob Haarman <inglorion@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # build+boot on
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: stable@vger.kernel.org
Link: https://github.com/ClangBuiltLinux/linux/issues/852
Link: https://lkml.kernel.org/r/20200602193100.229287-1-inglorion@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 3866f217aaa81bf7165c7f27362eee5d7919c496 ]
call_undef_hook() in traps.c applies the same instr_mask for both 16-bit
and 32-bit thumb instructions. If instr_mask then is only 16 bits wide
(0xffff as opposed to 0xffffffff), the first half-word of 32-bit thumb
instructions will be masked out. This makes the function match 32-bit
thumb instructions where the second half-word is equal to instr_val,
regardless of the first half-word.
The result in this case is that all undefined 32-bit thumb instructions
with the second half-word equal to 0xde01 (udf #1) work as breakpoints
and will raise a SIGTRAP instead of a SIGILL, instead of just the one
intended 16-bit instruction. An example of such an instruction is
0xeaa0de01, which is unallocated according to Arm ARM and should raise a
SIGILL, but instead raises a SIGTRAP.
This patch fixes the issue by setting all the bits in instr_mask, which
will still match the intended 16-bit thumb instruction (where the
upper half is always 0), but not any 32-bit thumb instructions.
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Fredrik Strupe <fredrik@strupe.net>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 7e5b3c267d256822407a22fdce6afdf9cd13f9fb upstream
SRBDS is an MDS-like speculative side channel that can leak bits from the
random number generator (RNG) across cores and threads. New microcode
serializes the processor access during the execution of RDRAND and
RDSEED. This ensures that the shared buffer is overwritten before it is
released for reuse.
While it is present on all affected CPU models, the microcode mitigation
is not needed on models that enumerate ARCH_CAPABILITIES[MDS_NO] in the
cases where TSX is not supported or has been disabled with TSX_CTRL.
The mitigation is activated by default on affected processors and it
increases latency for RDRAND and RDSEED instructions. Among other
effects this will reduce throughput from /dev/urandom.
* Enable administrator to configure the mitigation off when desired using
either mitigations=off or srbds=off.
* Export vulnerability status via sysfs
* Rename file-scoped macros to apply for non-whitelist table initializations.
[ bp: Massage,
- s/VULNBL_INTEL_STEPPING/VULNBL_INTEL_STEPPINGS/g,
- do not read arch cap MSR a second time in tsx_fused_off() - just pass it in,
- flip check in cpu_set_bug_bits() to save an indentation level,
- reflow comments.
jpoimboe: s/Mitigated/Mitigation/ in user-visible strings
tglx: Dropped the fused off magic for now
]
Signed-off-by: Mark Gross <mgross@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Neelima Krishnan <neelima.krishnan@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 93920f61c2ad7edb01e63323832585796af75fc9 upstream
To make cpu_matches() reusable for other matching tables, have it take a
pointer to a x86_cpu_id table as an argument.
[ bp: Flip arguments order. ]
Signed-off-by: Mark Gross <mgross@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit e9d7144597b10ff13ff2264c059f7d4a7fbc89ac upstream
Intel uses the same family/model for several CPUs. Sometimes the
stepping must be checked to tell them apart.
On x86 there can be at most 16 steppings. Add a steppings bitmask to
x86_cpu_id and a X86_MATCH_VENDOR_FAMILY_MODEL_STEPPING_FEATURE macro
and support for matching against family/model/stepping.
[ bp: Massage.
tglx: Lightweight variant for backporting ]
Signed-off-by: Mark Gross <mgross@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b399151cb48db30ad1e0e93dd40d68c6d007b637 upstream.
x86_mask is a confusing name which is hard to associate with the
processor's stepping.
Additionally, correct an indent issue in lib/cpu.c.
[ Backport by Mark Gross to simplify the SRBDS backport ]
Signed-off-by: Jia Zhang <qianyue.zj@alibaba-inc.com>
[ Updated it to more recent kernels. ]
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bp@alien8.de
Cc: tony.luck@intel.com
Link: http://lkml.kernel.org/r/1514771530-70829-1-git-send-email-qianyue.zj@alibaba-inc.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|