Age | Commit message (Collapse) | Author | Files | Lines |
|
[ Upstream commit a8de1304b7df30e3a14f2a8b9709bb4ff31a0385 ]
The DTC v1.5.1 added references to (U)INT32_MAX.
This is no problem for user-space programs since <stdint.h> defines
(U)INT32_MAX along with (u)int32_t.
For the kernel space, libfdt_env.h needs to be adjusted before we
pull in the changes.
In the kernel, we usually use s/u32 instead of (u)int32_t for the
fixed-width types.
Accordingly, we already have S/U32_MAX for their max values.
So, we should not add (U)INT32_MAX to <linux/limits.h> any more.
Instead, add them to the in-kernel libfdt_env.h to compile the
latest libfdt.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 4e706af3cd8e1d0503c25332b30cad33c97ed442 ]
The issue was showing "Mitigation" message via sysfs whatever the
state of "RFI Flush", but it should show "Vulnerable" when it is
disabled.
If you have "L1D private" feature enabled and not "RFI Flush" you are
vulnerable to meltdown attacks.
"RFI Flush" is the key feature to mitigate the meltdown whatever the
"L1D private" state.
SEC_FTR_L1D_THREAD_PRIV is a feature for Power9 only.
So the message should be as the truth table shows:
CPU | L1D private | RFI Flush | sysfs
----|-------------|-----------|-------------------------------------
P9 | False | False | Vulnerable
P9 | False | True | Mitigation: RFI Flush
P9 | True | False | Vulnerable: L1D private per thread
P9 | True | True | Mitigation: RFI Flush, L1D private per thread
P8 | False | False | Vulnerable
P8 | False | True | Mitigation: RFI Flush
Output before this fix:
# cat /sys/devices/system/cpu/vulnerabilities/meltdown
Mitigation: RFI Flush, L1D private per thread
# echo 0 > /sys/kernel/debug/powerpc/rfi_flush
# cat /sys/devices/system/cpu/vulnerabilities/meltdown
Mitigation: L1D private per thread
Output after fix:
# cat /sys/devices/system/cpu/vulnerabilities/meltdown
Mitigation: RFI Flush, L1D private per thread
# echo 0 > /sys/kernel/debug/powerpc/rfi_flush
# cat /sys/devices/system/cpu/vulnerabilities/meltdown
Vulnerable: L1D private per thread
Signed-off-by: Gustavo L. F. Walbon <gwalbon@linux.ibm.com>
Signed-off-by: Mauro S. M. Rodrigues <maurosr@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190502210907.42375-1-gwalbon@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 7d8212747435c534c8d564fbef4541a463c976ff ]
When unloading the module, one gets
------------[ cut here ]------------
Device 'cmm0' does not have a release() function, it is broken and must be fixed. See Documentation/kobject.txt.
WARNING: CPU: 0 PID: 19308 at drivers/base/core.c:1244 .device_release+0xcc/0xf0
...
We only have one static fake device. There is nothing to do when
releasing the device (via cmm_exit()).
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191031142933.10779-2-david@redhat.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 16f6b67cf03cb43db7104acb2ca877bdc2606c92 ]
With large memory (8TB and more) hotplug, we can get soft lockup
warnings as below. These were caused by a long loop without any
explicit cond_resched which is a problem for !PREEMPT kernels.
Avoid this using cond_resched() while inserting hash page table
entries. We already do similar cond_resched() in __add_pages(), see
commit f64ac5e6e306 ("mm, memory_hotplug: add scheduling point to
__add_pages").
rcu: 3-....: (24002 ticks this GP) idle=13e/1/0x4000000000000002 softirq=722/722 fqs=12001
(t=24003 jiffies g=4285 q=2002)
NMI backtrace for cpu 3
CPU: 3 PID: 3870 Comm: ndctl Not tainted 5.3.0-197.18-default+ #2
Call Trace:
dump_stack+0xb0/0xf4 (unreliable)
nmi_cpu_backtrace+0x124/0x130
nmi_trigger_cpumask_backtrace+0x1ac/0x1f0
arch_trigger_cpumask_backtrace+0x28/0x3c
rcu_dump_cpu_stacks+0xf8/0x154
rcu_sched_clock_irq+0x878/0xb40
update_process_times+0x48/0x90
tick_sched_handle.isra.16+0x4c/0x80
tick_sched_timer+0x68/0xe0
__hrtimer_run_queues+0x180/0x430
hrtimer_interrupt+0x110/0x300
timer_interrupt+0x108/0x2f0
decrementer_common+0x114/0x120
--- interrupt: 901 at arch_add_memory+0xc0/0x130
LR = arch_add_memory+0x74/0x130
memremap_pages+0x494/0x650
devm_memremap_pages+0x3c/0xa0
pmem_attach_disk+0x188/0x750
nvdimm_bus_probe+0xac/0x2c0
really_probe+0x148/0x570
driver_probe_device+0x19c/0x1d0
device_driver_attach+0xcc/0x100
bind_store+0x134/0x1c0
drv_attr_store+0x44/0x60
sysfs_kf_write+0x64/0x90
kernfs_fop_write+0x1a0/0x270
__vfs_write+0x3c/0x70
vfs_write+0xd0/0x260
ksys_write+0xdc/0x130
system_call+0x5c/0x68
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191001084656.31277-1-aneesh.kumar@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 8e6b6da91ac9b9ec5a925b6cb13f287a54bd547d ]
Some PowerPC CPUs are vulnerable to L1TF to the same extent as to
Meltdown. It is also mitigated by flushing the L1D on privilege
transition.
Currently the sysfs gives a false negative on L1TF on CPUs that I
verified to be vulnerable, a Power9 Talos II Boston 004e 1202, PowerNV
T2P9D01.
Signed-off-by: Anthony Steinhauser <asteinhauser@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
[mpe: Just have cpu_show_l1tf() call cpu_show_meltdown() directly]
Link: https://lore.kernel.org/r/20191029190759.84821-1-asteinhauser@google.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 75838a3290cd4ebbd1f567f310ba04b6ef017ce4 ]
If the hypervisor returned H_PTEG_FULL for H_ENTER hcall, retry a hash page table
insert by removing a random entry from the group.
After some runtime, it is very well possible to find all the 8 hash page table
entry slot in the hpte group used for mapping. Don't fail a bolted entry insert
in that case. With Storage class memory a user can find this error easily since
a namespace enable/disable is equivalent to memory add/remove.
This results in failures as reported below:
$ ndctl create-namespace -r region1 -t pmem -m devdax -a 65536 -s 100M
libndctl: ndctl_dax_enable: dax1.3: failed to enable
Error: namespace1.2: failed to enable
failed to create namespace: No such device or address
In kernel log we find the details as below:
Unable to create mapping for hot added memory 0xc000042006000000..0xc00004200d000000: -1
dax_pmem: probe of dax1.3 failed with error -14
This indicates that we failed to create a bolted hash table entry for direct-map
address backing the namespace.
We also observe failures such that not all namespaces will be enabled with
ndctl enable-namespace all command.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191024093542.29777-2-aneesh.kumar@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit eb8e20f89093b64f48975c74ccb114e6775cee22 ]
accumulate_stolen_time() is called prior to interrupt state being
reconciled, which can trip the warning in arch_local_irq_restore():
WARNING: CPU: 5 PID: 1017 at arch/powerpc/kernel/irq.c:258 .arch_local_irq_restore+0x9c/0x130
...
NIP .arch_local_irq_restore+0x9c/0x130
LR .rb_start_commit+0x38/0x80
Call Trace:
.ring_buffer_lock_reserve+0xe4/0x620
.trace_function+0x44/0x210
.function_trace_call+0x148/0x170
.ftrace_ops_no_ops+0x180/0x1d0
ftrace_call+0x4/0x8
.accumulate_stolen_time+0x1c/0xb0
decrementer_common+0x124/0x160
For now just mark it as notrace. We may change the ordering to call it
after interrupt state has been reconciled, but that is a larger
change.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191024055932.27940-1-mpe@ellerman.id.au
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 099bc4812f09155da77eeb960a983470249c9ce1 upstream.
Before commit 0366a1c70b89 ("powerpc/irq: Run softirqs off the top of
the irq stack"), check_stack_overflow() was called by do_IRQ(), before
switching to the irq stack.
In that commit, do_IRQ() was renamed __do_irq(), and is now executing
on the irq stack, so check_stack_overflow() has just become almost
useless.
Move check_stack_overflow() call in do_IRQ() to do the check while
still on the current stack.
Fixes: 0366a1c70b89 ("powerpc/irq: Run softirqs off the top of the irq stack")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e033aa8116ab12b7ca9a9c75189ad0741e3b9b5f.1575872340.git.christophe.leroy@c-s.fr
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 552263456215ada7ee8700ce022d12b0cffe4802 ]
clock_getres in the vDSO library has to preserve the same behaviour
of posix_get_hrtimer_res().
In particular, posix_get_hrtimer_res() does:
sec = 0;
ns = hrtimer_resolution;
and hrtimer_resolution depends on the enablement of the high
resolution timers that can happen either at compile or at run time.
Fix the powerpc vdso implementation of clock_getres keeping a copy of
hrtimer_resolution in vdso data and using that directly.
Fixes: a7f290dad32e ("[PATCH] powerpc: Merge vdso's and add vdso support to 32 bits kernel")
Cc: stable@vger.kernel.org
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
[chleroy: changed CLOCK_REALTIME_RES to CLOCK_HRTIMER_RES]
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a55eca3a5e85233838c2349783bcb5164dae1d09.1575273217.git.christophe.leroy@c-s.fr
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit f9ec11165301982585e5e5f606739b5bae5331f3 upstream.
When calling __kernel_sync_dicache with a size >4GB, we were masking
off the upper 32 bits, so we would incorrectly flush a range smaller
than intended.
This patch replaces the 32 bit shifts with 64 bit ones, so that
the full size is accounted for.
Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
Cc: stable@vger.kernel.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191104023305.9581-3-alastair@au1.ibm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit b682c8692442711684befe413cf93cf01c5324ea ]
The add_ssaaaa, sub_ddmmss, umul_ppmm and udiv_qrnnd macros originate
from GCC's longlong.h which in turn was copied from GMP's longlong.h a
few decades ago.
This was found when compiling with clang:
arch/powerpc/math-emu/fnmsub.c:46:2: error: invalid use of a cast in a
inline asm context requiring an l-value: remove the cast or build with
-fheinous-gnu-extensions
FP_ADD_D(R, T, B);
^~~~~~~~~~~~~~~~~
...
./arch/powerpc/include/asm/sfp-machine.h:283:27: note: expanded from
macro 'sub_ddmmss'
: "=r" ((USItype)(sh)), \
~~~~~~~~~~^~~
Segher points out: this was fixed in GCC over 16 years ago
( https://gcc.gnu.org/r56600 ), and in GMP (where it comes from)
presumably before that.
Update the add_ssaaaa, sub_ddmmss, umul_ppmm and udiv_qrnnd macros to
the latest GCC version in order to git rid of the invalid casts. These
were taken as-is from GCC's longlong in order to make future syncs
obvious. Other parts of sfp-machine.h were left as-is as the file
contains more features than present in longlong.h.
Link: https://github.com/ClangBuiltLinux/linux/issues/260
Signed-off-by: Joel Stanley <joel@jms.id.au>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit efa9ace68e487ddd29c2b4d6dd23242158f1f607 ]
In dlpar_parse_cc_property(), 'prop->name' is allocated by kstrdup().
kstrdup() may return NULL, so it should be checked and handle error.
And prop should be freed if 'prop->name' is NULL.
Signed-off-by: Gen Zhang <blackgod016574@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 47918bc68b7427e961035949cc1501a864578a69 ]
In update_lmb_associativity_index() we lookup dr_node using
of_find_node_by_path() which takes a reference for us. In the
non-error case we forget to drop the reference. Note that
find_aa_index() does modify properties of the node, but doesn't need
an extra reference held once it's returned.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 0deae39cec6dab3a66794f3e9e83ca4dc30080f1 ]
When the watchdog timer is set in interrupt mode, it causes a
machine check when it times out. The purpose of this mode is to
ease debugging, not to crash the kernel and reboot the machine.
This patch implements a special handling for that, in order to not
crash the kernel if the watchdog times out while in interrupt or
within the idle task.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[scottwood: added missing #include]
Signed-off-by: Scott Wood <oss@buserror.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
opal_pci_eeh_freeze_status
[ Upstream commit c20577014f85f36d4e137d3d52a1f61225b4a3d2 ]
The current implementation of the OPAL_PCI_EEH_FREEZE_STATUS call in
skiboot's NPU driver does not touch the pci_error_type parameter so
it might have garbage but the powernv code analyzes it nevertheless.
This initializes pcierr and fstate to zero in all call sites.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 3cfb9ebe906b51f2942b1e251009bb251efd2ba6 ]
The bamboo dts has a bug: it uses a non-naturally aligned range
for PCI memory space. This isnt' supported by the code, thus
causing PCI to break on this system.
This is due to the fact that while the chip memory map has 1G
reserved for PCI memory, it's only 512M aligned. The code doesn't
know how to split that into 2 different PMMs and fails, so limit
the region to 512M.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 49a502ea23bf9dec47f8f3c3960909ff409cd1bb ]
As several other arches including x86, this patch makes it explicit
that a bad page fault is a NULL pointer dereference when the fault
address is lower than PAGE_SIZE
In the mean time, this page makes all bad_page_fault() messages
shorter so that they remain on one single line. And it prefixes them
by "BUG: " so that they get easily grepped.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: Avoid pr_cont()]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit b18f0ae92b0a1db565c3e505fa87b6971ad3b641 ]
This patch fixes early DEBUG messages in prom.c:
- Use %px instead of %p to see the addresses
- Cast memblock_phys_mem_size() with (unsigned long long) to
avoid build failure when phys_addr_t is not 64 bits.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 32c8c4c621897199e690760c2d57054f8b84b6e6 ]
mfsrin() takes segment num from bits 31-28 (IBM bits 0-3).
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: Clarify bit numbering]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit e93ba1b7eb5b188c749052df7af1c90821c5f320 ]
This patch fixes the loop in p_block_mapped() and v_block_mapped()
to scan the entire bat_addrs[] array.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit af2e8c68b9c5403f77096969c516f742f5bb29e0 upstream.
On some systems that are vulnerable to Spectre v2, it is up to
software to flush the link stack (return address stack), in order to
protect against Spectre-RSB.
When exiting from a guest we do some house keeping and then
potentially exit to C code which is several stack frames deep in the
host kernel. We will then execute a series of returns without
preceeding calls, opening up the possiblity that the guest could have
poisoned the link stack, and direct speculative execution of the host
to a gadget of some sort.
To prevent this we add a flush of the link stack on exit from a guest.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
[dja: straightforward backport to v4.14]
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 39e72bf96f5847ba87cc5bd7a3ce0fed813dc9ad upstream.
In commit ee13cb249fab ("powerpc/64s: Add support for software count
cache flush"), I added support for software to flush the count
cache (indirect branch cache) on context switch if firmware told us
that was the required mitigation for Spectre v2.
As part of that code we also added a software flush of the link
stack (return address stack), which protects against Spectre-RSB
between user processes.
That is all correct for CPUs that activate that mitigation, which is
currently Power9 Nimbus DD2.3.
What I got wrong is that on older CPUs, where firmware has disabled
the count cache, we also need to flush the link stack on context
switch.
To fix it we create a new feature bit which is not set by firmware,
which tells us we need to flush the link stack. We set that when
firmware tells us that either of the existing Spectre v2 mitigations
are enabled.
Then we adjust the patching code so that if we see that feature bit we
enable the link stack flush. If we're also told to flush the count
cache in software then we fall through and do that also.
On the older CPUs we don't need to do do the software count cache
flush, firmware has disabled it, so in that case we patch in an early
return after the link stack flush.
The naming of some of the functions is awkward after this patch,
because they're called "count cache" but they also do link stack. But
we'll fix that up in a later commit to ease backporting.
This is the fix for CVE-2019-18660.
Reported-by: Anthony Steinhauser <asteinhauser@google.com>
Fixes: ee13cb249fab ("powerpc/64s: Add support for software count cache flush")
Cc: stable@vger.kernel.org # v4.4+
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
[dja: straightforward backport to v4.14]
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit d8f0e0b073e1ec52a05f0c2a56318b47387d2f10 upstream.
Add support for disabling the kernel implemented spectre v2 mitigation
(count cache flush on context switch) via the nospectre_v2 and
mitigations=off cmdline options.
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Christopher M. Riedl <cmr@informatik.wtf>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190524024647.381-1-cmr@informatik.wtf
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 8df1d0e4a265f25dc1e7e7624ccdbcb4a6630c89 ]
add_memory() currently does not take the device_hotplug_lock, however
is aleady called under the lock from
arch/powerpc/platforms/pseries/hotplug-memory.c
drivers/acpi/acpi_memhotplug.c
to synchronize against CPU hot-remove and similar.
In general, we should hold the device_hotplug_lock when adding memory to
synchronize against online/offline request (e.g. from user space) - which
already resulted in lock inversions due to device_lock() and
mem_hotplug_lock - see 30467e0b3be ("mm, hotplug: fix concurrent memory
hot-add deadlock"). add_memory()/add_memory_resource() will create memory
block devices, so this really feels like the right thing to do.
Holding the device_hotplug_lock makes sure that a memory block device
can really only be accessed (e.g. via .online/.state) from user space,
once the memory has been fully added to the system.
The lock is not held yet in
drivers/xen/balloon.c
arch/powerpc/platforms/powernv/memtrace.c
drivers/s390/char/sclp_cmd.c
drivers/hv/hv_balloon.c
So, let's either use the locked variants or take the lock.
Don't export add_memory_resource(), as it once was exported to be used by
XEN, which is never built as a module. If somebody requires it, we also
have to export a locked variant (as device_hotplug_lock is never
exported).
Link: http://lkml.kernel.org/r/20180925091457.28651-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Rashmica Gupta <rashmica.g@gmail.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Len Brown <lenb@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Cc: John Allen <jallen@linux.vnet.ibm.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mathieu Malaterre <malat@debian.org>
Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
Cc: YASUAKI ISHIMATSU <yasu.isimatu@gmail.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit e901378578c62202594cba0f6c076f3df365ec91 ]
Fix a bug introduced by the creation of flush_all_to_thread() for
processors that have SPE (Signal Processing Engine) and use it to
compute floating-point operations.
>From userspace perspective, the problem was seen in attempts of
computing floating-point operations which should generate exceptions.
For example:
fork();
float x = 0.0 / 0.0;
isnan(x); // forked process returns False (should be True)
The operation above also should always cause the SPEFSCR FINV bit to
be set. However, the SPE floating-point exceptions were turned off
after a fork().
Kernel versions prior to the bug used flush_spe_to_thread(), which
first saves SPEFSCR register values in tsk->thread and then calls
giveup_spe(tsk).
After commit 579e633e764e, the save_all() function was called first
to giveup_spe(), and then the SPEFSCR register values were saved in
tsk->thread. This would save the SPEFSCR register values after
disabling SPE for that thread, causing the bug described above.
Fixes 579e633e764e ("powerpc: create flush_all_to_thread()")
Signed-off-by: Felipe Rechia <felipe.rechia@datacom.com.br>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 473af09b56dc4be68e4af33220ceca6be67aa60d ]
eeh_add_to_parent_pe() sometimes removes the EEH_PE_KEEP flag, but it
incorrectly removes it from pe->type, instead of pe->state.
However, rather than clearing it from the correct field, remove it.
Inspection of the code shows that it can't ever have had any effect
(even if it had been cleared from the correct field), because the
field is never tested after it is cleared by the statement in
question.
The clear statement was added by commit 807a827d4e74 ("powerpc/eeh:
Keep PE during hotplug"), but it didn't explain why it was necessary.
Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 014704e6f54189a203cc14c7c0bb411b940241bc ]
The "count < sizeof(struct os_area_db)" comparison is type promoted to
size_t so negative values of "count" are treated as very high values
and we accidentally return success instead of a negative error code.
This doesn't really change runtime much but it fixes a static checker
warning.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 1006284c5e411872333967b1970c2ca46a9e225f ]
When an OS (currently only classic Mac OS) is running in KVM-PR and makes a
linked jump from code with split hack addressing enabled into code that does
not, LR is not correctly updated and reflects the previously munged PC.
To fix this, this patch undoes the address munge when exiting split
hack mode so that code relying on LR being a proper address will now
execute. This does not affect OS X or other operating systems running
on KVM-PR.
Signed-off-by: Cameron Kaiser <spectre@floodgap.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 9258227e9dd1da8feddb07ad9702845546a581c9 ]
When CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is not set, we look up dtl_idx in
the lppaca to determine the number of entries in the buffer. Since
lppaca is in big endian, we need to do an endian conversion before using
this in our calculation to determine the number of entries in the
buffer. Without this, we do not iterate over the existing entries in the
DTL buffer properly.
Fixes: 7c105b63bd98 ("powerpc: Add CONFIG_CPU_LITTLE_ENDIAN kernel config option.")
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit db787af1b8a6b4be428ee2ea7d409dafcaa4a43c ]
When CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is not set, we register the DTL
buffer for a cpu when the associated file under powerpc/dtl in debugfs
is opened. When doing so, we need to set the size of the buffer being
registered in the second u32 word of the buffer. This needs to be in big
endian, but we are not doing the conversion resulting in the below error
showing up in dmesg:
dtl_start: DTL registration for cpu 0 (hw 0) failed with -4
Fix this in the obvious manner.
Fixes: 7c105b63bd98 ("powerpc: Add CONFIG_CPU_LITTLE_ENDIAN kernel config option.")
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 53dd9dce6979bc54d64a3a09a2fb20187a025be7 ]
The next update of libfdt has a new dependency on INT_MAX. Update the
instances of libfdt_env.h in the kernel to either include the necessary
header with the definition or define it locally.
Cc: Russell King <linux@armlinux.org.uk>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 85a88cabad57d26d826dd94ea34d3a785824d802 ]
When performing partition migrations all present CPUs must be online
as all present CPUs must make the H_JOIN call as part of the migration
process. Once all present CPUs make the H_JOIN call, one CPU is returned
to make the rtas call to perform the migration to the destination system.
During testing of migration and changing the SMT state we have found
instances where CPUs are offlined, as part of the SMT state change,
before they make the H_JOIN call. This results in a hung system where
every CPU is either in H_JOIN or offline.
To prevent this this patch disables CPU hotplug during the migration
process.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Reviewed-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 09b4438db13fa83b6219aee5993711a2aa2a0c64 ]
This causes SLB alloation to start 1 beyond the start of the SLB.
There is no real problem because after it wraps it stats behaving
properly, it's just surprisig to see when looking at SLB traces.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 984ecdd68de0fa1f63ce205d6c19ef5a7bc67b40 ]
The tbl pointer is being derefenced by IOMMU_PAGE_SIZE prior the check
if it is not NULL.
Just moving the dereference code to after the check, where there will
be guarantee that 'tbl' will not be NULL.
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 56d20861c027498b5a1112b4f9f05b56d906fdda ]
Call Frame Information is used by gdb for back-traces and inserting
breakpoints on function return for the "finish" command. This failed
when inside __kernel_clock_gettime. More concerning than difficulty
debugging is that CFI is also used by stack frame unwinding code to
implement exceptions. If you have an app that needs to handle
asynchronous exceptions for some reason, and you are unlucky enough to
get one inside the VDSO time functions, your app will crash.
What's wrong: There is control flow in __kernel_clock_gettime that
reaches label 99 without saving lr in r12. CFI info however is
interpreted by the unwinder without reference to control flow: It's a
simple matter of "Execute all the CFI opcodes up to the current
address". That means the unwinder thinks r12 contains the return
address at label 99. Disabuse it of that notion by resetting CFI for
the return address at label 99.
Note that the ".cfi_restore lr" could have gone anywhere from the
"mtlr r12" a few instructions earlier to the instruction at label 99.
I put the CFI as late as possible, because in general that's best
practice (and if possible grouped with other CFI in order to reduce
the number of CFI opcodes executed when unwinding). Using r12 as the
return address is perfectly fine after the "mtlr r12" since r12 on
that code path still contains the return address.
__get_datapage also has a CFI error. That function temporarily saves
lr in r0, and reflects that fact with ".cfi_register lr,r0". A later
use of r0 means the CFI at that point isn't correct, as r0 no longer
contains the return address. Fix that too.
Signed-off-by: Alan Modra <amodra@gmail.com>
Tested-by: Reza Arbab <arbab@linux.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit e7de4f7b64c23e503a8c42af98d56f2a7462bd6d upstream.
Currently the OPAL symbol map is globally readable, which seems bad as
it contains physical addresses.
Restrict it to root.
Fixes: c8742f85125d ("powerpc/powernv: Expose OPAL firmware symbol map")
Cc: stable@vger.kernel.org # v3.19+
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190503075253.22798-1-ajd@linux.ibm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 92c94dfb69e350471473fd3075c74bc68150879e ]
prep_irq_for_idle() is intended to be called before entering
H_CEDE (and it is used by the pseries cpuidle driver). However the
default pseries idle routine does not call it, leading to mismanaged
lazy irq state when the cpuidle driver isn't in use. Manifestations of
this include:
* Dropped IPIs in the time immediately after a cpu comes
online (before it has installed the cpuidle handler), making the
online operation block indefinitely waiting for the new cpu to
respond.
* Hitting this WARN_ON in arch_local_irq_restore():
/*
* We should already be hard disabled here. We had bugs
* where that wasn't the case so let's dbl check it and
* warn if we are wrong. Only do that when IRQ tracing
* is enabled as mfmsr() can be costly.
*/
if (WARN_ON_ONCE(mfmsr() & MSR_EE))
__hard_irq_disable();
Call prep_irq_for_idle() from pseries_lpar_idle() and honor its
result.
Fixes: 363edbe2614a ("powerpc: Default arch idle could cede processor on pseries")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190910225244.25056-1-nathanl@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 0b66370c61fcf5fcc1d6901013e110284da6e2bb ]
Bare metal machine checks run an "early" handler in real mode before
running the main handler which reports the event.
The main handler runs exactly as a normal interrupt handler, after the
"windup" which sets registers back as they were at interrupt entry.
CFAR does not get restored by the windup code, so that will be wrong
when the handler is run.
Restore the CFAR to the saved value before running the late handler.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190802105709.27696-8-npiggin@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit ccfb5bd71d3d1228090a8633800ae7cdf42a94ac ]
After a partition migration, pseries_devicetree_update() processes
changes to the device tree communicated from the platform to
Linux. This is a relatively heavyweight operation, with multiple
device tree searches, memory allocations, and conversations with
partition firmware.
There's a few levels of nested loops which are bounded only by
decisions made by the platform, outside of Linux's control, and indeed
we have seen RCU stalls on large systems while executing this call
graph. Use cond_resched() in these loops so that the cpu is yielded
when needed.
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190802192926.19277-4-nathanl@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 38a0d0cdb46d3f91534e5b9839ec2d67be14c59d ]
We see warnings such as:
kernel/futex.c: In function 'do_futex':
kernel/futex.c:1676:17: warning: 'oldval' may be used uninitialized in this function [-Wmaybe-uninitialized]
return oldval == cmparg;
^
kernel/futex.c:1651:6: note: 'oldval' was declared here
int oldval, ret;
^
This is because arch_futex_atomic_op_inuser() only sets *oval if ret
is 0 and GCC doesn't see that it will only use it when ret is 0.
Anyway, the non-zero ret path is an error path that won't suffer from
setting *oval, and as *oval is a local var in futex_atomic_op_inuser()
it will have no impact.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: reword change log slightly]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/86b72f0c134367b214910b27b9a6dd3321af93bb.1565774657.git.christophe.leroy@c-s.fr
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit a6717c01ddc259f6f73364779df058e2c67309f8 ]
The LPAR migration implementation and userspace-initiated cpu hotplug
can interleave their executions like so:
1. Set cpu 7 offline via sysfs.
2. Begin a partition migration, whose implementation requires the OS
to ensure all present cpus are online; cpu 7 is onlined:
rtas_ibm_suspend_me -> rtas_online_cpus_mask -> cpu_up
This sets cpu 7 online in all respects except for the cpu's
corresponding struct device; dev->offline remains true.
3. Set cpu 7 online via sysfs. _cpu_up() determines that cpu 7 is
already online and returns success. The driver core (device_online)
sets dev->offline = false.
4. The migration completes and restores cpu 7 to offline state:
rtas_ibm_suspend_me -> rtas_offline_cpus_mask -> cpu_down
This leaves cpu7 in a state where the driver core considers the cpu
device online, but in all other respects it is offline and
unused. Attempts to online the cpu via sysfs appear to succeed but the
driver core actually does not pass the request to the lower-level
cpuhp support code. This makes the cpu unusable until the cpu device
is manually set offline and then online again via sysfs.
Instead of directly calling cpu_up/cpu_down, the migration code should
use the higher-level device core APIs to maintain consistent state and
serialize operations.
Fixes: 120496ac2d2d ("powerpc: Bring all threads online prior to migration/hibernation")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190802192926.19277-2-nathanl@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 89a3496e0664577043666791ec07fb731d57c950 upstream.
We use mmu_vmemmap_psize to find the page size for mapping the vmmemap area.
With radix translation, we are suboptimally setting this value to PAGE_SIZE.
We do check for 2M page size support and update mmu_vmemap_psize to use
hugepage size but we suboptimally reset the value to PAGE_SIZE in
radix__early_init_mmu(). This resulted in always mapping vmemmap area with
64K page size.
Fixes: 2bfd65e45e87 ("powerpc/mm/radix: Add radix callbacks for early init routines")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 8205d5d98ef7f155de211f5e2eb6ca03d95a5a60 upstream.
When we take an FP unavailable exception in a transaction we have to
account for the hardware FP TM checkpointed registers being
incorrect. In this case for this process we know the current and
checkpointed FP registers must be the same (since FP wasn't used
inside the transaction) hence in the thread_struct we copy the current
FP registers to the checkpointed ones.
This copy is done in tm_reclaim_thread(). We use thread->ckpt_regs.msr
to determine if FP was on when in userspace. thread->ckpt_regs.msr
represents the state of the MSR when exiting userspace. This is setup
by check_if_tm_restore_required().
Unfortunatley there is an optimisation in giveup_all() which returns
early if tsk->thread.regs->msr (via local variable `usermsr`) has
FP=VEC=VSX=SPE=0. This optimisation means that
check_if_tm_restore_required() is not called and hence
thread->ckpt_regs.msr is not updated and will contain an old value.
This can happen if due to load_fp=255 we start a userspace process
with MSR FP=1 and then we are context switched out. In this case
thread->ckpt_regs.msr will contain FP=1. If that same process is then
context switched in and load_fp overflows, MSR will have FP=0. If that
process now enters a transaction and does an FP instruction, the FP
unavailable will not update thread->ckpt_regs.msr (the bug) and MSR
FP=1 will be retained in thread->ckpt_regs.msr. tm_reclaim_thread()
will then not perform the required memcpy and the checkpointed FP regs
in the thread struct will contain the wrong values.
The code path for this happening is:
Userspace: Kernel
Start userspace
with MSR FP/VEC/VSX/SPE=0 TM=1
< -----
...
tbegin
bne
fp instruction
FP unavailable
---- >
fp_unavailable_tm()
tm_reclaim_current()
tm_reclaim_thread()
giveup_all()
return early since FP/VMX/VSX=0
/* ckpt MSR not updated (Incorrect) */
tm_reclaim()
/* thread_struct ckpt FP regs contain junk (OK) */
/* Sees ckpt MSR FP=1 (Incorrect) */
no memcpy() performed
/* thread_struct ckpt FP regs not fixed (Incorrect) */
tm_recheckpoint()
/* Put junk in hardware checkpoint FP regs */
....
< -----
Return to userspace
with MSR TM=1 FP=1
with junk in the FP TM checkpoint
TM rollback
reads FP junk
This is a data integrity problem for the current process as the FP
registers are corrupted. It's also a security problem as the FP
registers from one process may be leaked to another.
This patch moves up check_if_tm_restore_required() in giveup_all() to
ensure thread->ckpt_regs.msr is updated correctly.
A simple testcase to replicate this will be posted to
tools/testing/selftests/powerpc/tm/tm-poison.c
Similarly for VMX.
This fixes CVE-2019-15030.
Fixes: f48e91e87e67 ("powerpc/tm: Fix FP and VMX register corruption")
Cc: stable@vger.kernel.org # 4.12+
Signed-off-by: Gustavo Romero <gromero@linux.vnet.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190904045529.23002-1-gromero@linux.vnet.ibm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit fa9dd599b4dae841924b022768354cfde9affecb upstream.
Having a pure_initcall() callback just to permanently enable BPF
JITs under CONFIG_BPF_JIT_ALWAYS_ON is unnecessary and could leave
a small race window in future where JIT is still disabled on boot.
Since we know about the setting at compilation time anyway, just
initialize it properly there. Also consolidate all the individual
bpf_jit_enable variables into a single one and move them under one
location. Moreover, don't allow for setting unspecified garbage
values on them.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
[bwh: Backported to 4.9 as dependency of commit 2e4a30983b0f
"bpf: restrict access to core bpf sysctls":
- Drop change in arch/mips/net/ebpf_jit.c
- Drop change to bpf_jit_kallsyms
- Adjust filenames, context]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit f16d80b75a096c52354c6e0a574993f3b0dfbdfe upstream.
On systems like P9 powernv where we have no TM (or P8 booted with
ppc_tm=off), userspace can construct a signal context which still has
the MSR TS bits set. The kernel tries to restore this context which
results in the following crash:
Unexpected TM Bad Thing exception at c0000000000022fc (msr 0x8000000102a03031) tm_scratch=800000020280f033
Oops: Unrecoverable exception, sig: 6 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
Modules linked in:
CPU: 0 PID: 1636 Comm: sigfuz Not tainted 5.2.0-11043-g0a8ad0ffa4 #69
NIP: c0000000000022fc LR: 00007fffb2d67e48 CTR: 0000000000000000
REGS: c00000003fffbd70 TRAP: 0700 Not tainted (5.2.0-11045-g7142b497d8)
MSR: 8000000102a03031 <SF,VEC,VSX,FP,ME,IR,DR,LE,TM[E]> CR: 42004242 XER: 00000000
CFAR: c0000000000022e0 IRQMASK: 0
GPR00: 0000000000000072 00007fffb2b6e560 00007fffb2d87f00 0000000000000669
GPR04: 00007fffb2b6e728 0000000000000000 0000000000000000 00007fffb2b6f2a8
GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR12: 0000000000000000 00007fffb2b76900 0000000000000000 0000000000000000
GPR16: 00007fffb2370000 00007fffb2d84390 00007fffea3a15ac 000001000a250420
GPR20: 00007fffb2b6f260 0000000010001770 0000000000000000 0000000000000000
GPR24: 00007fffb2d843a0 00007fffea3a14a0 0000000000010000 0000000000800000
GPR28: 00007fffea3a14d8 00000000003d0f00 0000000000000000 00007fffb2b6e728
NIP [c0000000000022fc] rfi_flush_fallback+0x7c/0x80
LR [00007fffb2d67e48] 0x7fffb2d67e48
Call Trace:
Instruction dump:
e96a0220 e96a02a8 e96a0330 e96a03b8 394a0400 4200ffdc 7d2903a6 e92d0c00
e94d0c08 e96d0c10 e82d0c18 7db242a6 <4c000024> 7db243a6 7db142a6 f82d0c18
The problem is the signal code assumes TM is enabled when
CONFIG_PPC_TRANSACTIONAL_MEM is enabled. This may not be the case as
with P9 powernv or if `ppc_tm=off` is used on P8.
This means any local user can crash the system.
Fix the problem by returning a bad stack frame to the user if they try
to set the MSR TS bits with sigreturn() on systems where TM is not
supported.
Found with sigfuz kernel selftest on P9.
This fixes CVE-2019-13648.
Fixes: 2b0a576d15e0 ("powerpc: Add new transactional memory state to the signal context")
Cc: stable@vger.kernel.org # v3.9
Reported-by: Praveen Pandey <Praveen.Pandey@in.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190719050502.405-1-mikey@neuling.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 33439620680be5225c1b8806579a291e0d761ca0 ]
In commit 4a7b06c157a2 ("powerpc/eeh: Handle hugepages in ioremap
space") support for using hugepages in the vmalloc and ioremap areas was
enabled for radix. Unfortunately this broke EEH MMIO error checking.
Detection works by inserting a hook which checks the results of the
ioreadXX() set of functions. When a read returns a 0xFFs response we
need to check for an error which we do by mapping the (virtual) MMIO
address back to a physical address, then mapping physical address to a
PCI device via an interval tree.
When translating virt -> phys we currently assume the ioremap space is
only populated by PAGE_SIZE mappings. If a hugepage mapping is found we
emit a WARN_ON(), but otherwise handles the check as though a normal
page was found. In pathalogical cases such as copying a buffer
containing a lot of 0xFFs from BAR memory this can result in the system
not booting because it's too busy printing WARN_ON()s.
There's no real reason to assume huge pages can't be present and we're
prefectly capable of handling them, so do that.
Fixes: 4a7b06c157a2 ("powerpc/eeh: Handle hugepages in ioremap space")
Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190710150517.27114-1-oohall@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 9e005b761e7ad153dcf40a6cba1d681fe0830ac6 ]
The next commit will make the way of passing CONFIG options more robust.
Unfortunately, it would uncover another hidden issue; without this
commit, skiroot_defconfig would be broken like this:
| WRAP arch/powerpc/boot/zImage.pseries
| arch/powerpc/boot/wrapper.a(decompress.o): In function `bcj_powerpc.isra.10':
| decompress.c:(.text+0x720): undefined reference to `get_unaligned_be32'
| decompress.c:(.text+0x7a8): undefined reference to `put_unaligned_be32'
| make[1]: *** [arch/powerpc/boot/Makefile;383: arch/powerpc/boot/zImage.pseries] Error 1
| make: *** [arch/powerpc/Makefile;295: zImage] Error 2
skiroot_defconfig is the only defconfig that enables CONFIG_KERNEL_XZ
for ppc, which has never been correctly built before.
I figured out the root cause in lib/decompress_unxz.c:
| #ifdef CONFIG_PPC
| # define XZ_DEC_POWERPC
| #endif
CONFIG_PPC is undefined here in the ppc bootwrapper because autoconf.h
is not included except by arch/powerpc/boot/serial.c
XZ_DEC_POWERPC is not defined, therefore, bcj_powerpc() is not compiled
for the bootwrapper.
With the next commit passing CONFIG_PPC correctly, we would realize that
{get,put}_unaligned_be32 was missing.
Unlike the other decompressors, the ppc bootwrapper duplicates all the
necessary helpers in arch/powerpc/boot/.
The other architectures define __KERNEL__ and pull in helpers for
building the decompressors.
If ppc bootwrapper had defined __KERNEL__, lib/xz/xz_private.h would
have included <asm/unaligned.h>:
| #ifdef __KERNEL__
| # include <linux/xz.h>
| # include <linux/kernel.h>
| # include <asm/unaligned.h>
However, doing so would cause tons of definition conflicts since the
bootwrapper has duplicated everything.
I just added copies of {get,put}_unaligned_be32, following the
bootwrapper coding convention.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190705100144.28785-1-yamada.masahiro@socionext.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 3ab3a0689e74e6aa5b41360bc18861040ddef5b1 ]
When testing out gpio-keys with a button, a spurious
interrupt (and therefore a key press or release event)
gets triggered as soon as the driver enables the irq
line for the first time.
This patch clears any potential bogus generated interrupt
that was caused by the switching of the associated irq's
type and polarity.
Signed-off-by: Christian Lamparter <chunkeey@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit df5be5be8735ef2ae80d5ae1f2453cd81a035c4b ]
When the firmware does PCI BAR resource allocation, it passes the assigned
addresses and flags (prefetch/64bit/...) via the "reg" property of
a PCI device device tree node so the kernel does not need to do
resource allocation.
The flags are stored in resource::flags - the lower byte stores
PCI_BASE_ADDRESS_SPACE/etc bits and the other bytes are IORESOURCE_IO/etc.
Some flags from PCI_BASE_ADDRESS_xxx and IORESOURCE_xxx are duplicated,
such as PCI_BASE_ADDRESS_MEM_PREFETCH/PCI_BASE_ADDRESS_MEM_TYPE_64/etc.
When parsing the "reg" property, we copy the prefetch flag but we skip
on PCI_BASE_ADDRESS_MEM_TYPE_64 which leaves the flags out of sync.
The missing IORESOURCE_MEM_64 flag comes into play under 2 conditions:
1. we remove PCI_PROBE_ONLY for pseries (by hacking pSeries_setup_arch()
or by passing "/chosen/linux,pci-probe-only");
2. we request resource alignment (by passing pci=resource_alignment=
via the kernel cmd line to request PAGE_SIZE alignment or defining
ppc_md.pcibios_default_alignment which returns anything but 0). Note that
the alignment requests are ignored if PCI_PROBE_ONLY is enabled.
With 1) and 2), the generic PCI code in the kernel unconditionally
decides to:
- reassign the BARs in pci_specified_resource_alignment() (works fine)
- write new BARs to the device - this fails for 64bit BARs as the generic
code looks at IORESOURCE_MEM_64 (not set) and writes only lower 32bits
of the BAR and leaves the upper 32bit unmodified which breaks BAR mapping
in the hypervisor.
This fixes the issue by copying the flag. This is useful if we want to
enforce certain BAR alignment per platform as handling subpage sized BARs
is proven to cause problems with hotplug (SLOF already aligns BARs to 64k).
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com>
Reviewed-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Shawn Anastasio <shawn@anastas.io>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit f474c28fbcbe42faca4eb415172c07d76adcb819 upstream.
powerpc hardware triggers watchpoint before executing the instruction.
To make trigger-after-execute behavior, kernel emulates the
instruction. If the instruction is 'load something into non-volatile
register', exception handler should restore emulated register state
while returning back, otherwise there will be register state
corruption. eg, adding a watchpoint on a list can corrput the list:
# cat /proc/kallsyms | grep kthread_create_list
c00000000121c8b8 d kthread_create_list
Add watchpoint on kthread_create_list->prev:
# perf record -e mem:0xc00000000121c8c0
Run some workload such that new kthread gets invoked. eg, I just
logged out from console:
list_add corruption. next->prev should be prev (c000000001214e00), \
but was c00000000121c8b8. (next=c00000000121c8b8).
WARNING: CPU: 59 PID: 309 at lib/list_debug.c:25 __list_add_valid+0xb4/0xc0
CPU: 59 PID: 309 Comm: kworker/59:0 Kdump: loaded Not tainted 5.1.0-rc7+ #69
...
NIP __list_add_valid+0xb4/0xc0
LR __list_add_valid+0xb0/0xc0
Call Trace:
__list_add_valid+0xb0/0xc0 (unreliable)
__kthread_create_on_node+0xe0/0x260
kthread_create_on_node+0x34/0x50
create_worker+0xe8/0x260
worker_thread+0x444/0x560
kthread+0x160/0x1a0
ret_from_kernel_thread+0x5c/0x70
List corruption happened because it uses 'load into non-volatile
register' instruction:
Snippet from __kthread_create_on_node:
c000000000136be8: addis r29,r2,-19
c000000000136bec: ld r29,31424(r29)
if (!__list_add_valid(new, prev, next))
c000000000136bf0: mr r3,r30
c000000000136bf4: mr r5,r28
c000000000136bf8: mr r4,r29
c000000000136bfc: bl c00000000059a2f8 <__list_add_valid+0x8>
Register state from WARN_ON():
GPR00: c00000000059a3a0 c000007ff23afb50 c000000001344e00 0000000000000075
GPR04: 0000000000000000 0000000000000000 0000001852af8bc1 0000000000000000
GPR08: 0000000000000001 0000000000000007 0000000000000006 00000000000004aa
GPR12: 0000000000000000 c000007ffffeb080 c000000000137038 c000005ff62aaa00
GPR16: 0000000000000000 0000000000000000 c000007fffbe7600 c000007fffbe7370
GPR20: c000007fffbe7320 c000007fffbe7300 c000000001373a00 0000000000000000
GPR24: fffffffffffffef7 c00000000012e320 c000007ff23afcb0 c000000000cb8628
GPR28: c00000000121c8b8 c000000001214e00 c000007fef5b17e8 c000007fef5b17c0
Watchpoint hit at 0xc000000000136bec.
addis r29,r2,-19
=> r29 = 0xc000000001344e00 + (-19 << 16)
=> r29 = 0xc000000001214e00
ld r29,31424(r29)
=> r29 = *(0xc000000001214e00 + 31424)
=> r29 = *(0xc00000000121c8c0)
0xc00000000121c8c0 is where we placed a watchpoint and thus this
instruction was emulated by emulate_step. But because handle_dabr_fault
did not restore emulated register state, r29 still contains stale
value in above register state.
Fixes: 5aae8a5370802 ("powerpc, hw_breakpoints: Implement hw_breakpoints for 64-bit server processors")
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: stable@vger.kernel.org # 2.6.36+
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|