Age | Commit message (Collapse) | Author | Files | Lines |
|
commit 60e065f70bdb0b0e916389024922ad40f3270c96 upstream.
There is a bug in binutils 2.24 which causes miscompilation if we're
building little endian and using weak symbols (which the kernel does).
It is fixed in binutils commit 57fa7b8c7e59 "Correct elf_merge_st_other
arguments for weak symbols", which is in binutils 2.25 and has been
backported to the binutils 2.24 branch and has been picked up by most
distros it seems.
However if we're running stock 2.24 (no extra version) then the bug is
present, so check for that and bail.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 7ed23e1bae8bf7e37fd555066550a00b95a3a98b upstream.
On Power8 & Power9 the early CPU inititialisation in __init_HFSCR()
turns on HFSCR[TM] (Hypervisor Facility Status and Control Register
[Transactional Memory]), but that doesn't take into account that TM
might be disabled by CPU features, or disabled by the kernel being built
with CONFIG_PPC_TRANSACTIONAL_MEM=n.
So later in boot, when we have setup the CPU features, clear HSCR[TM] if
the TM CPU feature has been disabled. We use CPU_FTR_TM_COMP to account
for the CONFIG_PPC_TRANSACTIONAL_MEM=n case.
Without this a KVM guest might try use TM, even if told not to, and
cause an oops in the host kernel. Typically the oops is seen in
__kvmppc_vcore_entry() and may or may not be fatal to the host, but is
always bad news.
In practice all shipping CPU revisions do support TM, and all host
kernels we are aware of build with TM support enabled, so no one should
actually be able to hit this in the wild.
Fixes: 2a3563b023e5 ("powerpc: Setup in HFSCR for POWER8")
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Tested-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
[mpe: Rewrite change log with input from Sam, add Fixes/stable]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
[sb: Backported to linux-4.4.y: adjusted context]
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 48fe9e9488743eec9b7c1addd3c93f12f2123d54 upstream.
In the past, there was only one load-with-reservation instruction,
lwarx, and if a program attempted a lwarx on a misaligned address, it
would take an alignment interrupt and the kernel handler would emulate
it as though it was lwzx, which was not really correct, but benign since
it is loading the right amount of data, and the lwarx should be paired
with a stwcx. to the same address, which would also cause an alignment
interrupt which would result in a SIGBUS being delivered to the process.
We now have 5 different sizes of load-with-reservation instruction. Of
those, lharx and ldarx cause an immediate SIGBUS by luck since their
entries in aligninfo[] overlap instructions which were not fixed up, but
lqarx overlaps with lhz and will be emulated as such. lbarx can never
generate an alignment interrupt since it only operates on 1 byte.
To straighten this out and fix the lqarx case, this adds code to detect
the l[hwdq]arx instructions and return without fixing them up, resulting
in a SIGBUS being delivered to the process.
[js] include disassemble.h in 3.12
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 708e75a3ee750dce1072134e630d66c4e6eaf63c upstream.
If kvmppc_handle_exit_pr() calls kvmppc_emulate_instruction() to emulate
one instruction (in the BOOK3S_INTERRUPT_H_EMUL_ASSIST case), it calls
kvmppc_core_queue_program() afterwards if kvmppc_emulate_instruction()
returned EMULATE_FAIL, so the guest gets an program interrupt for the
illegal opcode.
However, the kvmppc_emulate_instruction() also tried to inject a
program exception for this already, so the program interrupt gets
injected twice and the return address in srr0 gets destroyed.
All other callers of kvmppc_emulate_instruction() are also injecting
a program interrupt, and since the callers have the right knowledge
about the srr1 flags that should be used, it is the function
kvmppc_emulate_instruction() that should _not_ inject program
interrupts, so remove the kvmppc_core_queue_program() here.
This fixes the issue discovered by Laurent Vivier with kvm-unit-tests
where the logs are filled with these messages when the test tries
to execute an illegal instruction:
Couldn't emulate instruction 0x00000000 (op 0 xop 0)
kvmppc_handle_exit_pr: emulation at 700 failed (00000000)
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Alexander Graf <agraf@suse.de>
Tested-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit c21a493a2b44650707d06741601894329486f2ad upstream.
Currently xmon data-breakpoint feature is broken.
Whenever there is a watchpoint match occurs, hw_breakpoint_handler will
be called by do_break via notifier chains mechanism. If watchpoint is
registered by xmon, hw_breakpoint_handler won't find any associated
perf_event and returns immediately with NOTIFY_STOP. Similarly, do_break
also returns without notifying to xmon.
Solve this by returning NOTIFY_DONE when hw_breakpoint_handler does not
find any perf_event associated with matched watchpoint, rather than
NOTIFY_STOP, which tells the core code to continue calling the other
breakpoint handlers including the xmon one.
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit fe0f3168169f7c34c29b0cf0c489f126a7f29643 upstream.
Make sure to drop any reference taken by bus_find_device() in the sysfs
callbacks that are used to create and destroy devices based on
device-tree entries.
Fixes: 6bccf755ff53 ("[POWERPC] ibmebus: dynamic addition/removal of adapters, some code cleanup")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 815a7141c4d1b11610dccb7fcbb38633759824f2 upstream.
Make sure to drop any reference taken by bus_find_device() when creating
devices during init and driver registration.
Fixes: 55347cc9962f ("[POWERPC] ibmebus: Add device creation and bus probing based on of_device")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 8ae679c4bc2ea2d16d92620da8e3e9332fa4039f upstream.
I am getting the following warning when I build kernel 4.9-git on my
PowerBook G4 with a 32-bit PPC processor:
AS arch/powerpc/kernel/misc_32.o
arch/powerpc/kernel/misc_32.S:299:7: warning: "CONFIG_FSL_BOOKE" is not defined [-Wundef]
This problem is evident after commit 989cea5c14be ("kbuild: prevent
lib-ksyms.o rebuilds"); however, this change in kbuild only exposes an
error that has been in the code since 2005 when this source file was
created. That was with commit 9994a33865f4 ("powerpc: Introduce
entry_{32,64}.S, misc_{32,64}.S, systbl.S").
The offending line does not make a lot of sense. This error does not
seem to cause any errors in the executable, thus I am not recommending
that it be applied to any stable versions.
Thanks to Nicholas Piggin for suggesting this solution.
Fixes: 9994a33865f4 ("powerpc: Introduce entry_{32,64}.S, misc_{32,64}.S, systbl.S")
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 80f23935cadb1c654e81951f5a8b7ceae0acc1b4 upstream.
PowerPC's "cmp" instruction has four operands. Normally people write
"cmpw" or "cmpd" for the second cmp operand 0 or 1. But, frequently
people forget, and write "cmp" with just three operands.
With older binutils this is silently accepted as if this was "cmpw",
while often "cmpd" is wanted. With newer binutils GAS will complain
about this for 64-bit code. For 32-bit code it still silently assumes
"cmpw" is what is meant.
In this instance the code comes directly from ISA v2.07, including the
cmp, but cmpd is correct. Backport to stable so that new toolchains can
build old kernels.
Fixes: 948cf67c4726 ("powerpc: Add NAP mode support on Power7 in HV mode")
Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 6dff5b67054e17c91bd630bcdda17cfca5aa4215 upstream.
GCC 5 generates different code for this bootwrapper null check that
causes the PS3 to hang very early in its bootup. This check is of
limited value, so just get rid of it.
Signed-off-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 11b7e154b132232535befe51c55db048069c8461 upstream.
When we merge two contiguous partitions whose signatures are marked
NVRAM_SIG_FREE, We need update prev's length and checksum, then write it
to nvram, not cur's. So lets fix this mistake now.
Also use memset instead of strncpy to set the partition's name. It's
more readable if we want to fill up with duplicate chars .
Fixes: fa2b4e54d41f ("powerpc/nvram: Improve partition removal")
Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 04fec21c06e35b169a83e75a84a015ab4606bf5e upstream.
eeh_pe_bus_get() can return NULL if a PCI bus isn't found for a given PE.
Some callers don't check this, and can cause a null pointer dereference
under certain circumstances.
Fix this by checking NULL everywhere eeh_pe_bus_get() is called.
Fixes: 8a6b1bc70dbb ("powerpc/eeh: EEH core to handle special event")
Signed-off-by: Russell Currey <ruscur@russell.cc>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 05af40e885955065aee8bb7425058eb3e1adca08 upstream.
This commit fixes a stack corruption in the pseries specific code dealing
with the huge pages.
In __pSeries_lpar_hugepage_invalidate() the buffer used to pass arguments
to the hypervisor is not large enough. This leads to a stack corruption
where a previously saved register could be corrupted leading to unexpected
result in the caller, like the following panic:
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=2048 NUMA pSeries
Modules linked in: virtio_balloon ip_tables x_tables autofs4
virtio_blk 8139too virtio_pci virtio_ring 8139cp virtio
CPU: 11 PID: 1916 Comm: mmstress Not tainted 4.8.0 #76
task: c000000005394880 task.stack: c000000005570000
NIP: c00000000027bf6c LR: c00000000027bf64 CTR: 0000000000000000
REGS: c000000005573820 TRAP: 0300 Not tainted (4.8.0)
MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 84822884 XER: 20000000
CFAR: c00000000010a924 DAR: 420000000014e5e0 DSISR: 40000000 SOFTE: 1
GPR00: c00000000027bf64 c000000005573aa0 c000000000e02800 c000000004447964
GPR04: c00000000404de18 c000000004d38810 00000000042100f5 00000000f5002104
GPR08: e0000000f5002104 0000000000000001 042100f5000000e0 00000000042100f5
GPR12: 0000000000002200 c00000000fe02c00 c00000000404de18 0000000000000000
GPR16: c1ffffffffffe7ff 00003fff62000000 420000000014e5e0 00003fff63000000
GPR20: 0008000000000000 c0000000f7014800 0405e600000000e0 0000000000010000
GPR24: c000000004d38810 c000000004447c10 c00000000404de18 c000000004447964
GPR28: c000000005573b10 c000000004d38810 00003fff62000000 420000000014e5e0
NIP [c00000000027bf6c] zap_huge_pmd+0x4c/0x470
LR [c00000000027bf64] zap_huge_pmd+0x44/0x470
Call Trace:
[c000000005573aa0] [c00000000027bf64] zap_huge_pmd+0x44/0x470 (unreliable)
[c000000005573af0] [c00000000022bbd8] unmap_page_range+0xcf8/0xed0
[c000000005573c30] [c00000000022c2d4] unmap_vmas+0x84/0x120
[c000000005573c80] [c000000000235448] unmap_region+0xd8/0x1b0
[c000000005573d80] [c0000000002378f0] do_munmap+0x2d0/0x4c0
[c000000005573df0] [c000000000237be4] SyS_munmap+0x64/0xb0
[c000000005573e30] [c000000000009560] system_call+0x38/0x108
Instruction dump:
fbe1fff8 fb81ffe0 7c7f1b78 7ca32b78 7cbd2b78 f8010010 7c9a2378 f821ffb1
7cde3378 4bfffea9 7c7b1b79 41820298 <e87f0000> 48000130 7fa5eb78 7fc4f378
Most of the time, the bug is surfacing in a caller up in the stack from
__pSeries_lpar_hugepage_invalidate() which is quite confusing.
This bug is pending since v3.11 but was hidden if a caller of the
caller of __pSeries_lpar_hugepage_invalidate() has pushed the corruped
register (r18 in this case) in the stack and is not using it until
restoring it. GCC 6.2.0 seems to raise it more frequently.
This commit also change the definition of the parameter buffer in
pSeries_lpar_flush_hash_range() to rely on the global define
PLPAR_HCALL9_BUFSIZE (no functional change here).
Fixes: 1a5272866f87 ("powerpc: Optimize hugepage invalidate")
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 1a34439e5a0b2235e43f96816dbb15ee1154f656 upstream.
Debugging a data corruption issue with virtio-net/vhost-net led to
the observation that __copy_tofrom_user was occasionally returning
a value 16 larger than it should. Since the return value from
__copy_tofrom_user is the number of bytes not copied, this means
that __copy_tofrom_user can occasionally return a value larger
than the number of bytes it was asked to copy. In turn this can
cause higher-level copy functions such as copy_page_to_iter_iovec
to corrupt memory by copying data into the wrong memory locations.
It turns out that the failing case involves a fault on the store
at label 79, and at that point the first unmodified byte of the
destination is at R3 + 16. Consequently the exception handler
for that store needs to add 16 to R3 before using it to work out
how many bytes were not copied, but in this one case it was not
adding the offset to R3. To fix it, this moves the label 179 to
the point where we add 16 to R3. I have checked manually all the
exception handlers for the loads and stores in this code and the
rest of them are correct (it would be excellent to have an
automated test of all the exception cases).
This bug has been present since this code was initially
committed in May 2002 to Linux version 2.5.20.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 5adaf8629b193f185ca5a1665b9e777a0579f518 upstream.
This fixes the warnings reported from sparse:
pci.c:312:33: warning: restricted __be64 degrades to integer
pci.c:313:33: warning: restricted __be64 degrades to integer
Fixes: cee72d5bb489 ("powerpc/powernv: Display diag data on p7ioc EEH errors")
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 5045ea37377ce8cca6890d32b127ad6770e6dce5 upstream.
__kernel_get_syscall_map() and __kernel_clock_getres() use cmpli to
check if the passed in pointer is non zero. cmpli maps to a 32 bit
compare on binutils, so we ignore the top 32 bits.
A simple test case can be created by passing in a bogus pointer with
the bottom 32 bits clear. Using a clk_id that is handled by the VDSO,
then one that is handled by the kernel shows the problem:
printf("%d\n", clock_getres(CLOCK_REALTIME, (void *)0x100000000));
printf("%d\n", clock_getres(CLOCK_BOOTTIME, (void *)0x100000000));
And we get:
0
-1
The bigger issue is if we pass a valid pointer with the bottom 32 bits
clear, in this case we will return success but won't write any data
to the pointer.
I stumbled across this issue because the LLVM integrated assembler
doesn't accept cmpli with 3 arguments. Fix this by converting them to
cmpldi.
Fixes: a7f290dad32e ("[PATCH] powerpc: Merge vdso's and add vdso support to 32 bits kernel")
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit fa73c3b25bd8d0d393dc6109a1dba3c2aef0451e upstream.
The MMCR2 register is available twice, one time with number 785
(privileged access), and one time with number 769 (unprivileged,
but it can be disabled completely). In former times, the Linux
kernel was using the unprivileged register 769 only, but since
commit 8dd75ccb571f3c92c ("powerpc: Use privileged SPR number
for MMCR2"), it uses the privileged register 785 instead.
The KVM-PR code then of course also switched to use the SPR 785,
but this is causing older guest kernels to crash, since these
kernels still access 769 instead. So to support older kernels
with KVM-PR again, we have to support register 769 in KVM-PR, too.
Fixes: 8dd75ccb571f3c92c48014b3dabd3d51a115ab41
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit a5948fa092a04dfd6b9ee31c99eb6896c158eb08 upstream.
In parallel to the Processor ID Register (PIR) threaded POWER8 also adds a
Thread ID Register (TIR). Since PR KVM doesn't emulate more than one thread
per core, we can just always expose 0 here.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit f8f6eb0d189cf2724af5ebc8cad460c78fb1994e upstream.
When we expose a POWER8 CPU into the guest, it will start accessing PMU SPRs
that we don't emulate. Just ignore accesses to them.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
client-architecture-support
commit 66443efa83dc73775100b7442962ce2cb0d4472e upstream.
When booting from an OpenFirmware which supports it, we use the
"ibm,client-architecture-support" firmware call to communicate
our capabilities to firmware.
The format of the structure we pass to firmware is specified in
PAPR (Power Architecture Platform Requirements), or the public version
LoPAPR (Linux on Power Architecture Platform Reference).
Referring to table 244 in LoPAPR v1.1, option vector 5 contains a 4 byte
field at bytes 17-20 for the "Platform Facilities Enable". This is
followed by a 1 byte field at byte 21 for "Sub-Processor Represenation
Level".
Comparing to the code, there we have the Platform Facilities
options (OV5_PFO_*) at byte 17, but we fail to pad that field out to its
full width of 4 bytes. This means the OV5_SUB_PROCESSORS option is
incorrectly placed at byte 18.
Fix it by adding zero bytes for bytes 18, 19, 20, and comment the bytes
to hopefully make it clearer in future.
As far as I'm aware nothing actually consumes this value at this time,
so the effect of this bug is nil in practice.
It does mean we've been incorrectly setting bit 15 of the "Platform
Facilities Enable" option for the past ~3 1/2 years, so we should avoid
allocating that bit to anything else in future.
Fixes: df77c7992029 ("powerpc/pseries: Update ibm,architecture.vec for PAPR 2.7/POWER8")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit e8a4fd0afe05d5213d809fa686d3b8319464acfd upstream.
The encoding of the lengths in the ibm_architecture_vec array is
"interesting" to say the least. It's non-obvious how the number of bytes
we provide relates to the length value.
In fact we already got it wrong once, see 11e9ed43ca8a "Fix up
ibm_architecture_vec definition".
So add some macros to make it (hopefully) clearer. These at least have
the property that the integer present in the code is equal to the number
of bytes that follows it.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 1e407ee3b21f981140491d5b8a36422979ca246f upstream.
gcc-6 correctly warns about a out of bounds access
arch/powerpc/kernel/ptrace.c:407:24: warning: index 32 denotes an offset greater than size of 'u64[32][1] {aka long long unsigned int[32][1]}' [-Warray-bounds]
offsetof(struct thread_fp_state, fpr[32][0]));
^
check the end of array instead of beginning of next element to fix this
Signed-off-by: Khem Raj <raj.khem@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Tested-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Acked-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 4fa9a3f6b616fd9f2555d9d0c04513a942750986 upstream.
This struct is unused, which is now a build error with gcc 6:
error: 'os_area_db_id_video_mode' defined but not used
There doesn't seem to be any good reason to keep it around so remove it,
it's in the history if anyone needs it.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 224264657b8b228f949b42346e09ed8c90136a8e upstream.
should clear on access_ok() failures. Also remove the useless
range truncation logics.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit f077aaf0754bcba0fffdbd925bc12f09cd1e38aa upstream.
In commit c60ac5693c47 ("powerpc: Update kernel VSID range", 2013-03-13)
we lost a check on the region number (the top four bits of the effective
address) for addresses below PAGE_OFFSET. That commit replaced a check
that the top 18 bits were all zero with a check that bits 46 - 59 were
zero (performed for all addresses, not just user addresses).
This means that userspace can access an address like 0x1000_0xxx_xxxx_xxxx
and we will insert a valid SLB entry for it. The VSID used will be the
same as if the top 4 bits were 0, but the page size will be some random
value obtained by indexing beyond the end of the mm_ctx_high_slices_psize
array in the paca. If that page size is the same as would be used for
region 0, then userspace just has an alias of the region 0 space. If the
page size is different, then no HPTE will be found for the access, and
the process will get a SIGSEGV (since hash_page_mm() will refuse to create
a HPTE for the bogus address).
The access beyond the end of the mm_ctx_high_slices_psize can be at most
5.5MB past the array, and so will be in RAM somewhere. Since the access
is a load performed in real mode, it won't fault or crash the kernel.
At most this bug could perhaps leak a little bit of information about
blocks of 32 bytes of memory located at offsets of i * 512kB past the
paca->mm_ctx_high_slices_psize array, for 1 <= i <= 11.
Fixes: c60ac5693c47 ("powerpc: Update kernel VSID range")
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 8a934efe94347eee843aeea65bdec8077a79e259 upstream.
In commit 8445a87f7092 "powerpc/iommu: Remove the dependency on EEH
struct in DDW mechanism", the PE address was replaced with the PCI
config address in order to remove dependency on EEH. According to PAPR
spec, firmware (pHyp or QEMU) should accept "xxBBSSxx" format PCI config
address, not "xxxxBBSS" provided by the patch. Note that "BB" is PCI bus
number and "SS" is the combination of slot and function number.
This fixes the PCI address passed to DDW RTAS calls.
Fixes: 8445a87f7092 ("powerpc/iommu: Remove the dependency on EEH struct in DDW mechanism")
Reported-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Tested-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 8445a87f7092bc8336ea1305be9306f26b846d93 upstream.
Commit 39baadbf36ce ("powerpc/eeh: Remove eeh information from pci_dn")
changed the pci_dn struct by removing its EEH-related members.
As part of this clean-up, DDW mechanism was modified to read the device
configuration address from eeh_dev struct.
As a consequence, now if we disable EEH mechanism on kernel command-line
for example, the DDW mechanism will fail, generating a kernel oops by
dereferencing a NULL pointer (which turns to be the eeh_dev pointer).
This patch just changes the configuration address calculation on DDW
functions to a manual calculation based on pci_dn members instead of
using eeh_dev-based address.
No functional changes were made. This was tested on pSeries, both
in PHyp and qemu guest.
Fixes: 39baadbf36ce ("powerpc/eeh: Remove eeh information from pci_dn")
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 8dd75ccb571f3c92c48014b3dabd3d51a115ab41 upstream.
We are already using the privileged versions of MMCR0, MMCR1
and MMCRA in the kernel, so for MMCR2, we should better use
the privileged versions, too, to be consistent.
Fixes: 240686c13687 ("powerpc: Initialise PMU related regs on Power8")
Suggested-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit d23fac2b27d94aeb7b65536a50d32bfdc21fe01e upstream.
The SIAR and SDAR registers are available twice, one time as SPRs
780 / 781 (unprivileged, but read-only), and one time as the SPRs
796 / 797 (privileged, but read and write). The Linux kernel code
currently uses the unprivileged SPRs - while this is OK for reading,
writing to that register of course does not work.
Since the KVM code tries to write to this register, too (see the mtspr
in book3s_hv_rmhandlers.S), the contents of this register sometimes get
lost for the guests, e.g. during migration of a VM.
To fix this issue, simply switch to the privileged SPR numbers instead.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 871e178e0f2c4fa788f694721a10b4758d494ce1 upstream.
In the "ibm,configure-pe" and "ibm,configure-bridge" RTAS calls, the
spec states that values of 9900-9905 can be returned, indicating that
software should delay for 10^x (where x is the last digit, i.e. 990x)
milliseconds and attempt the call again. Currently, the kernel doesn't
know about this, and respecting it fixes some PCI failures when the
hypervisor is busy.
The delay is capped at 0.2 seconds.
Signed-off-by: Russell Currey <ruscur@russell.cc>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 8ed8ab40047a570fdd8043a40c104a57248dd3fd upstream.
Some of the interrupt vectors on 64-bit POWER server processors are only
32 bytes long (8 instructions), which is not enough for the full
first-level interrupt handler. For these we need to branch to an
out-of-line (OOL) handler. But when we are running a relocatable kernel,
interrupt vectors till __end_interrupts marker are copied down to real
address 0x100. So, branching to labels (ie. OOL handlers) outside this
section must be handled differently (see LOAD_HANDLER()), considering
relocatable kernel, which would need at least 4 instructions.
However, branching from interrupt vector means that we corrupt the
CFAR (come-from address register) on POWER7 and later processors as
mentioned in commit 1707dd16. So, EXCEPTION_PROLOG_0 (6 instructions)
that contains the part up to the point where the CFAR is saved in the
PACA should be part of the short interrupt vectors before we branch out
to OOL handlers.
But as mentioned already, there are interrupt vectors on 64-bit POWER
server processors that are only 32 bytes long (like vectors 0x4f00,
0x4f20, etc.), which cannot accomodate the above two cases at the same
time owing to space constraint. Currently, in these interrupt vectors,
we simply branch out to OOL handlers, without using LOAD_HANDLER(),
which leaves us vulnerable when running a relocatable kernel (eg. kdump
case). While this has been the case for sometime now and kdump is used
widely, we were fortunate not to see any problems so far, for three
reasons:
1. In almost all cases, production kernel (relocatable) is used for
kdump as well, which would mean that crashed kernel's OOL handler
would be at the same place where we end up branching to, from short
interrupt vector of kdump kernel.
2. Also, OOL handler was unlikely the reason for crash in almost all
the kdump scenarios, which meant we had a sane OOL handler from
crashed kernel that we branched to.
3. On most 64-bit POWER server processors, page size is large enough
that marking interrupt vector code as executable (see commit
429d2e83) leads to marking OOL handler code from crashed kernel,
that sits right below interrupt vector code from kdump kernel, as
executable as well.
Let us fix this by moving the __end_interrupts marker down past OOL
handlers to make sure that we also copy OOL handlers to real address
0x100 when running a relocatable kernel.
This fix has been tested successfully in kdump scenario, on an LPAR with
4K page size by using different default/production kernel and kdump
kernel.
Also tested by manually corrupting the OOL handlers in the first kernel
and then kdump'ing, and then causing the OOL handlers to fire - mpe.
Fixes: c1fb6816fb1b ("powerpc: Add relocation on exception vector handlers")
Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 6997e57d693b07289694239e52a10d2f02c3a46f upstream.
The REAL_LE feature entry in the ibm_pa_feature struct is missing an MMU
feature value, meaning all the remaining elements initialise the wrong
values.
This means instead of checking for byte 5, bit 0, we check for byte 0,
bit 0, and then we incorrectly set the CPU feature bit as well as MMU
feature bit 1 and CPU user feature bits 0 and 2 (5).
Checking byte 0 bit 0 (IBM numbering), means we're looking at the
"Memory Management Unit (MMU)" feature - ie. does the CPU have an MMU.
In practice that bit is set on all platforms which have the property.
This means we set CPU_FTR_REAL_LE always. In practice that seems not to
matter because all the modern cpus which have this property also
implement REAL_LE, and we've never needed to disable it.
We're also incorrectly setting MMU feature bit 1, which is:
#define MMU_FTR_TYPE_8xx 0x00000002
Luckily the only place that looks for MMU_FTR_TYPE_8xx is in Book3E
code, which can't run on the same cpus as scan_features(). So this also
doesn't matter in practice.
Finally in the CPU user feature mask, we're setting bits 0 and 2. Bit 2
is not currently used, and bit 0 is:
#define PPC_FEATURE_PPC_LE 0x00000001
Which says the CPU supports the old style "PPC Little Endian" mode.
Again this should be harmless in practice as no 64-bit CPUs implement
that mode.
Fix the code by adding the missing initialisation of the MMU feature.
Also add a comment marking CPU user feature bit 2 (0x4) as reserved. It
would be unsafe to start using it as old kernels incorrectly set it.
Fixes: 44ae3ab3358e ("powerpc: Free up some CPU feature bits by moving out MMU-related features")
Signed-off-by: Anton Blanchard <anton@samba.org>
[mpe: Flesh out changelog, add comment reserving 0x4]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit f15838e9cac8f78f0cc506529bb9d3b9fa589c1f upstream.
Since binutils 2.26 BFD is doing suffix merging on STRTAB sections. But
dedotify modifies the symbol names in place, which can also modify
unrelated symbols with a name that matches a suffix of a dotted name. To
remove the leading dot of a symbol name we can just increment the pointer
into the STRTAB section instead.
Backport to all stables to avoid breakage when people update their
binutils - mpe.
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 81d7a3294de7e9828310bbf986a67246b13fa01e upstream.
According to memory-barriers.txt, xchg*, cmpxchg* and their atomic_
versions all need to be fully ordered, however they are now just
RELEASE+ACQUIRE, which are not fully ordered.
So also replace PPC_RELEASE_BARRIER and PPC_ACQUIRE_BARRIER with
PPC_ATOMIC_ENTRY_BARRIER and PPC_ATOMIC_EXIT_BARRIER in
__{cmp,}xchg_{u32,u64} respectively to guarantee fully ordered semantics
of atomic{,64}_{cmp,}xchg() and {cmp,}xchg(), as a complement of commit
b97021f85517 ("powerpc: Fix atomic_xxx_return barrier semantics")
This patch depends on patch "powerpc: Make value-returning atomics fully
ordered" for PPC_ATOMIC_ENTRY_BARRIER definition.
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 49e9cf3f0c04bf76ffa59242254110309554861d upstream.
According to memory-barriers.txt:
> Any atomic operation that modifies some state in memory and returns
> information about the state (old or new) implies an SMP-conditional
> general memory barrier (smp_mb()) on each side of the actual
> operation ...
Which mean these operations should be fully ordered. However on PPC,
PPC_ATOMIC_ENTRY_BARRIER is the barrier before the actual operation,
which is currently "lwsync" if SMP=y. The leading "lwsync" can not
guarantee fully ordered atomics, according to Paul Mckenney:
https://lkml.org/lkml/2015/10/14/970
To fix this, we define PPC_ATOMIC_ENTRY_BARRIER as "sync" to guarantee
the fully-ordered semantics.
This also makes futex atomics fully ordered, which can avoid possible
memory ordering problems if userspace code relies on futex system call
for fully ordered semantics.
Fixes: b97021f85517 ("powerpc: Fix atomic_xxx_return barrier semantics")
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit d2b9d2a5ad5ef04ff978c9923d19730cb05efd55 upstream.
Currently we allow both the MSR T and S bits to be set by userspace on
a signal return. Unfortunately this is a reserved configuration and
will cause a TM Bad Thing exception if attempted (via rfid).
This patch checks for this case in both the 32 and 64 bit signals
code. If both T and S are set, we mark the context as invalid.
Found using a syscall fuzzer.
Fixes: 2b0a576d15e0 ("powerpc: Add new transactional memory state to the signal context")
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit c20875a3e638e4a03e099b343ec798edd1af5cc6 upstream.
Currently it is possible for userspace (e.g. QEMU) to set a value
for the MSR for a guest VCPU which has both of the TS bits set,
which is an illegal combination. The result of this is that when
we execute a hrfid (hypervisor return from interrupt doubleword)
instruction to enter the guest, the CPU will take a TM Bad Thing
type of program interrupt (vector 0x700).
Now, if PR KVM is configured in the kernel along with HV KVM, we
actually handle this without crashing the host or giving hypervisor
privilege to the guest; instead what happens is that we deliver a
program interrupt to the guest, with SRR0 reflecting the address
of the hrfid instruction and SRR1 containing the MSR value at that
point. If PR KVM is not configured in the kernel, then we try to
run the host's program interrupt handler with the MMU set to the
guest context, which almost certainly causes a host crash.
This closes the hole by making kvmppc_set_msr_hv() check for the
illegal combination and force the TS field to a safe value (00,
meaning non-transactional).
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 733187e29576041ceccf3b82092ca900fc929170 upstream.
commit f13c13a00512 (powerpc: Stop using non-architected shared_proc
field in lppaca) fixed a potential issue with shared/dedicated
partition detection. The old method of detection relied on an
unarchitected field (shared_proc), and this patch switched
to using something architected (a non zero yield_count).
Unfortunately the assertion in the Linux header that yield_count
is only non zero on shared processor partitions is not true. It
turns out dedicated processor partitions can increment yield_count
and as such we falsely detect dedicated partitions as shared.
Fix the comment, and switch back to using the old method.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 8832317f662c06f5c06e638f57bfe89a71c9b266 upstream.
Currently we do not validate rtas.entry before calling enter_rtas(). This
leads to a kernel oops when user space calls rtas system call on a powernv
platform (see below). This patch adds code to validate rtas.entry before
making enter_rtas() call.
Oops: Exception in kernel mode, sig: 4 [#1]
SMP NR_CPUS=1024 NUMA PowerNV
task: c000000004294b80 ti: c0000007e1a78000 task.ti: c0000007e1a78000
NIP: 0000000000000000 LR: 0000000000009c14 CTR: c000000000423140
REGS: c0000007e1a7b920 TRAP: 0e40 Not tainted (3.18.17-340.el7_1.pkvm3_1_0.2400.1.ppc64le)
MSR: 1000000000081000 <HV,ME> CR: 00000000 XER: 00000000
CFAR: c000000000009c0c SOFTE: 0
NIP [0000000000000000] (null)
LR [0000000000009c14] 0x9c14
Call Trace:
[c0000007e1a7bba0] [c00000000041a7f4] avc_has_perm_noaudit+0x54/0x110 (unreliable)
[c0000007e1a7bd80] [c00000000002ddc0] ppc_rtas+0x150/0x2d0
[c0000007e1a7be30] [c000000000009358] syscall_exit+0x0/0x98
Fixes: 55190f88789a ("powerpc: Add skeleton PowerNV platform")
Reported-by: NAGESWARA R. SASTRY <nasastry@in.ibm.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
[mpe: Reword change log, trim oops, and add stable + fixes]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit e297c939b745e420ef0b9dc989cb87bda617b399 upstream.
This fixes a race which can result in the same virtual IRQ number
being assigned to two different MSI interrupts. The most visible
consequence of that is usually a warning and stack trace from the
sysfs code about an attempt to create a duplicate entry in sysfs.
The race happens when one CPU (say CPU 0) is disposing of an MSI
while another CPU (say CPU 1) is setting up an MSI. CPU 0 calls
(for example) pnv_teardown_msi_irqs(), which calls
msi_bitmap_free_hwirqs() to indicate that the MSI (i.e. its
hardware IRQ number) is no longer in use. Then, before CPU 0 gets
to calling irq_dispose_mapping() to free up the virtal IRQ number,
CPU 1 comes in and calls msi_bitmap_alloc_hwirqs() to allocate an
MSI, and gets the same hardware IRQ number that CPU 0 just freed.
CPU 1 then calls irq_create_mapping() to get a virtual IRQ number,
which sees that there is currently a mapping for that hardware IRQ
number and returns the corresponding virtual IRQ number (which is
the same virtual IRQ number that CPU 0 was using). CPU 0 then
calls irq_dispose_mapping() and frees that virtual IRQ number.
Now, if another CPU comes along and calls irq_create_mapping(), it
is likely to get the virtual IRQ number that was just freed,
resulting in the same virtual IRQ number apparently being used for
two different hardware interrupts.
To fix this race, we just move the call to msi_bitmap_free_hwirqs()
to after the call to irq_dispose_mapping(). Since virq_to_hw()
doesn't work for the virtual IRQ number after irq_dispose_mapping()
has been called, we need to call it before irq_dispose_mapping() and
remember the result for the msi_bitmap_free_hwirqs() call.
The pattern of calling msi_bitmap_free_hwirqs() before
irq_dispose_mapping() appears in 5 places under arch/powerpc, and
appears to have originated in commit 05af7bd2d75e ("[POWERPC] MPIC
U3/U4 MSI backend") from 2007.
Fixes: 05af7bd2d75e ("[POWERPC] MPIC U3/U4 MSI backend")
Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 36b35d5d807b7e57aff7d08e63de8b17731ee211 upstream.
If we had secondary hash flag set, we ended up modifying hash value in
the updatepp code path. Hence with a failed updatepp we will be using
a wrong hash value for the following hash insert. Fix this by
recomputing hash before insert.
Without this patch we can end up with using wrong slot number in linux
pte. That can result in us missing an hash pte update or invalidate
which can cause memory corruption or even machine check.
Fixes: 6d492ecc6489 ("powerpc/THP: Add code to handle HPTE faults for hugepages")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 1c2cb594441d02815d304cccec9742ff5c707495 upstream.
The EPOW interrupt handler uses rtas_get_sensor(), which in turn
uses rtas_busy_delay() to wait for RTAS becoming ready in case it
is necessary. But rtas_busy_delay() is annotated with might_sleep()
and thus may not be used by interrupts handlers like the EPOW handler!
This leads to the following BUG when CONFIG_DEBUG_ATOMIC_SLEEP is
enabled:
BUG: sleeping function called from invalid context at arch/powerpc/kernel/rtas.c:496
in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.2.0-rc2-thuth #6
Call Trace:
[c00000007ffe7b90] [c000000000807670] dump_stack+0xa0/0xdc (unreliable)
[c00000007ffe7bc0] [c0000000000e1f14] ___might_sleep+0x134/0x180
[c00000007ffe7c20] [c00000000002aec0] rtas_busy_delay+0x30/0xd0
[c00000007ffe7c50] [c00000000002bde4] rtas_get_sensor+0x74/0xe0
[c00000007ffe7ce0] [c000000000083264] ras_epow_interrupt+0x44/0x450
[c00000007ffe7d90] [c000000000120260] handle_irq_event_percpu+0xa0/0x300
[c00000007ffe7e70] [c000000000120524] handle_irq_event+0x64/0xc0
[c00000007ffe7eb0] [c000000000124dbc] handle_fasteoi_irq+0xec/0x260
[c00000007ffe7ef0] [c00000000011f4f0] generic_handle_irq+0x50/0x80
[c00000007ffe7f20] [c000000000010f3c] __do_irq+0x8c/0x200
[c00000007ffe7f90] [c0000000000236cc] call_do_irq+0x14/0x24
[c00000007e6f39e0] [c000000000011144] do_IRQ+0x94/0x110
[c00000007e6f3a30] [c000000000002594] hardware_interrupt_common+0x114/0x180
Fix this issue by introducing a new rtas_get_sensor_fast() function
that does not use rtas_busy_delay() - and thus can only be used for
sensors that do not cause a BUSY condition - known as "fast" sensors.
The EPOW sensor is defined to be "fast" in sPAPR - mpe.
Fixes: 587f83e8dd50 ("powerpc/pseries: Use rtas_get_sensor in RAS code")
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 74b5037baa2011a2799e2c43adde7d171b072f9e upstream.
The powerpc kernel can be built to have either a 4K PAGE_SIZE or a 64K
PAGE_SIZE.
However when built with a 4K PAGE_SIZE there is an additional config
option which can be enabled, PPC_HAS_HASH_64K, which means the kernel
also knows how to hash a 64K page even though the base PAGE_SIZE is 4K.
This is used in one obscure configuration, to support 64K pages for SPU
local store on the Cell processor when the rest of the kernel is using
4K pages.
In this configuration, pte_pagesize_index() is defined to just pass
through its arguments to get_slice_psize(). However pte_pagesize_index()
is called for both user and kernel addresses, whereas get_slice_psize()
only knows how to handle user addresses.
This has been broken forever, however until recently it happened to
work. That was because in get_slice_psize() the large kernel address
would cause the right shift of the slice mask to return zero.
However in commit 7aa0727f3302 ("powerpc/mm: Increase the slice range to
64TB"), the get_slice_psize() code was changed so that instead of a
right shift we do an array lookup based on the address. When passed a
kernel address this means we index way off the end of the slice array
and return random junk.
That is only fatal if we happen to hit something non-zero, but when we
do return a non-zero value we confuse the MMU code and eventually cause
a check stop.
This fix is ugly, but simple. When we're called for a kernel address we
return 4K, which is always correct in this configuration, otherwise we
use the slice mask.
Fixes: 7aa0727f3302 ("powerpc/mm: Increase the slice range to 64TB")
Reported-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 3c00cb5e68dc719f2fc73a33b1b230aadfcb1309 upstream.
This function can leak kernel stack data when the user siginfo_t has a
positive si_code value. The top 16 bits of si_code descibe which fields
in the siginfo_t union are active, but they are treated inconsistently
between copy_siginfo_from_user32, copy_siginfo_to_user32 and
copy_siginfo_to_user.
copy_siginfo_from_user32 is called from rt_sigqueueinfo and
rt_tgsigqueueinfo in which the user has full control overthe top 16 bits
of si_code.
This fixes the following information leaks:
x86: 8 bytes leaked when sending a signal from a 32-bit process to
itself. This leak grows to 16 bytes if the process uses x32.
(si_code = __SI_CHLD)
x86: 100 bytes leaked when sending a signal from a 32-bit process to
a 64-bit process. (si_code = -1)
sparc: 4 bytes leaked when sending a signal from a 32-bit process to a
64-bit process. (si_code = any)
parsic and s390 have similar bugs, but they are not vulnerable because
rt_[tg]sigqueueinfo have checks that prevent sending a positive si_code
to a different process. These bugs are also fixed for consistency.
Signed-off-by: Amanieu d'Antras <amanieu@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 47933ad41a86a4a9b50bed7c9b9bd2ba242aac63 upstream.
A number of situations currently require the heavyweight smp_mb(),
even though there is no need to order prior stores against later
loads. Many architectures have much cheaper ways to handle these
situations, but the Linux kernel currently has no portable way
to make use of them.
This commit therefore supplies smp_load_acquire() and
smp_store_release() to remedy this situation. The new
smp_load_acquire() primitive orders the specified load against
any subsequent reads or writes, while the new smp_store_release()
primitive orders the specifed store against any prior reads or
writes. These primitives allow array-based circular FIFOs to be
implemented without an smp_mb(), and also allow a theoretical
hole in rcu_assign_pointer() to be closed at no additional
expense on most architectures.
In addition, the RCU experience transitioning from explicit
smp_read_barrier_depends() and smp_wmb() to rcu_dereference()
and rcu_assign_pointer(), respectively resulted in substantial
improvements in readability. It therefore seems likely that
replacing other explicit barriers with smp_load_acquire() and
smp_store_release() will provide similar benefits. It appears
that roughly half of the explicit barriers in core kernel code
might be so replaced.
[Changelog by PaulMck]
Reviewed-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Michael Ellerman <michael@ellerman.id.au>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Victor Kaplansky <VICTORK@il.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: http://lkml.kernel.org/r/20131213150640.908486364@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 72e349f1124a114435e599479c9b8d14bfd1ebcd upstream.
When we take a PMU exception or a software event we call
perf_read_regs(). This overloads regs->result with a boolean that
describes if we should use the sampled instruction address register
(SIAR) or the regs.
If the exception is in kernel, we start with the kernel regs and
backtrace through the kernel stack. At this point we switch to the
userspace regs and backtrace the user stack with perf_callchain_user().
Unfortunately these regs have not got the perf_read_regs() treatment,
so regs->result could be anything. If it is non zero,
perf_instruction_pointer() decides to use the SIAR, and we get issues
like this:
0.11% qemu-system-ppc [kernel.kallsyms] [k] _raw_spin_lock_irqsave
|
---_raw_spin_lock_irqsave
|
|--52.35%-- 0
| |
| |--46.39%-- __hrtimer_start_range_ns
| | kvmppc_run_core
| | kvmppc_vcpu_run_hv
| | kvmppc_vcpu_run
| | kvm_arch_vcpu_ioctl_run
| | kvm_vcpu_ioctl
| | do_vfs_ioctl
| | sys_ioctl
| | system_call
| | |
| | |--67.08%-- _raw_spin_lock_irqsave <--- hi mum
| | | |
| | | --100.00%-- 0x7e714
| | | 0x7e714
Notice the bogus _raw_spin_irqsave when we transition from kernel
(system_call) to userspace (0x7e714). We inserted what was in the SIAR.
Add a check in regs_use_siar() to check that the regs in question
are from a PMU exception. With this fix the backtrace makes sense:
0.47% qemu-system-ppc [kernel.vmlinux] [k] _raw_spin_lock_irqsave
|
---_raw_spin_lock_irqsave
|
|--53.83%-- 0
| |
| |--44.73%-- hrtimer_try_to_cancel
| | kvmppc_start_thread
| | kvmppc_run_core
| | kvmppc_vcpu_run_hv
| | kvmppc_vcpu_run
| | kvm_arch_vcpu_ioctl_run
| | kvm_vcpu_ioctl
| | do_vfs_ioctl
| | sys_ioctl
| | system_call
| | __ioctl
| | 0x7e714
| | 0x7e714
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 5e95235ccd5442d4a4fe11ec4eb99ba1b7959368 upstream.
Recent toolchains force the TOC to be 256 byte aligned. We need
to enforce this alignment in our linker script, otherwise pointers
to our TOC variables (__toc_start, __prom_init_toc_start) could
be incorrect.
If they are bad, we die a few hundred instructions into boot.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
allowed address space
commit 19751c07b3728748c1253627ce94e6906fa5e273 upstream.
According to Posix, if MAP_FIXED is specified mmap shall set ENOMEM if
the requested mapping exceeds the allowed range for address space of
the process. The generic code set it right, but the specific powerpc
slice_get_unmapped_area() function currently returns -EINVAL in that
case.
This patch corrects it.
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit f7e9e358362557c3aa2c1ec47490f29fe880a09e upstream.
This problem appears to have been introduced in 2.6.29 by commit
93197a36a9c1 "Rewrite sysfs processor cache info code".
This caused lscpu to error out on at least e500v2 devices, eg:
error: cannot open /sys/devices/system/cpu/cpu0/cache/index2/size: No such file or directory
Some embedded powerpc systems use cache-size in DTS for the unified L2
cache size, not d-cache-size, so we need to allow for both DTS names.
Added a new CACHE_TYPE_UNIFIED_D cache_type_info structure to handle
this.
Fixes: 93197a36a9c1 ("powerpc: Rewrite sysfs processor cache info code")
Signed-off-by: Dave Olson <olson@cumulusnetworks.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 9a5cbce421a283e6aea3c4007f141735bf9da8c3 upstream.
We cap 32bit userspace backtraces to PERF_MAX_STACK_DEPTH
(currently 127), but we forgot to do the same for 64bit backtraces.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|