summaryrefslogtreecommitdiff
path: root/arch/powerpc/kernel
AgeCommit message (Collapse)AuthorFilesLines
2020-01-04powerpc/pseries: Mark accumulate_stolen_time() as notraceMichael Ellerman1-1/+1
[ Upstream commit eb8e20f89093b64f48975c74ccb114e6775cee22 ] accumulate_stolen_time() is called prior to interrupt state being reconciled, which can trip the warning in arch_local_irq_restore(): WARNING: CPU: 5 PID: 1017 at arch/powerpc/kernel/irq.c:258 .arch_local_irq_restore+0x9c/0x130 ... NIP .arch_local_irq_restore+0x9c/0x130 LR .rb_start_commit+0x38/0x80 Call Trace: .ring_buffer_lock_reserve+0xe4/0x620 .trace_function+0x44/0x210 .function_trace_call+0x148/0x170 .ftrace_ops_no_ops+0x180/0x1d0 ftrace_call+0x4/0x8 .accumulate_stolen_time+0x1c/0xb0 decrementer_common+0x124/0x160 For now just mark it as notrace. We may change the ordering to call it after interrupt state has been reconciled, but that is a larger change. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191024055932.27940-1-mpe@ellerman.id.au Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-31powerpc/irq: fix stack overflow verificationChristophe Leroy1-2/+2
commit 099bc4812f09155da77eeb960a983470249c9ce1 upstream. Before commit 0366a1c70b89 ("powerpc/irq: Run softirqs off the top of the irq stack"), check_stack_overflow() was called by do_IRQ(), before switching to the irq stack. In that commit, do_IRQ() was renamed __do_irq(), and is now executing on the irq stack, so check_stack_overflow() has just become almost useless. Move check_stack_overflow() call in do_IRQ() to do the check while still on the current stack. Fixes: 0366a1c70b89 ("powerpc/irq: Run softirqs off the top of the irq stack") Cc: stable@vger.kernel.org Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/e033aa8116ab12b7ca9a9c75189ad0741e3b9b5f.1575872340.git.christophe.leroy@c-s.fr Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-17powerpc: Fix vDSO clock_getres()Vincenzo Frascino4-5/+12
[ Upstream commit 552263456215ada7ee8700ce022d12b0cffe4802 ] clock_getres in the vDSO library has to preserve the same behaviour of posix_get_hrtimer_res(). In particular, posix_get_hrtimer_res() does: sec = 0; ns = hrtimer_resolution; and hrtimer_resolution depends on the enablement of the high resolution timers that can happen either at compile or at run time. Fix the powerpc vdso implementation of clock_getres keeping a copy of hrtimer_resolution in vdso data and using that directly. Fixes: a7f290dad32e ("[PATCH] powerpc: Merge vdso's and add vdso support to 32 bits kernel") Cc: stable@vger.kernel.org Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr> Acked-by: Shuah Khan <skhan@linuxfoundation.org> [chleroy: changed CLOCK_REALTIME_RES to CLOCK_HRTIMER_RES] Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/a55eca3a5e85233838c2349783bcb5164dae1d09.1575273217.git.christophe.leroy@c-s.fr Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-17powerpc: Avoid clang warnings around setjmp and longjmpNathan Chancellor1-2/+2
[ Upstream commit c9029ef9c95765e7b63c4d9aa780674447db1ec0 ] Commit aea447141c7e ("powerpc: Disable -Wbuiltin-requires-header when setjmp is used") disabled -Wbuiltin-requires-header because of a warning about the setjmp and longjmp declarations. r367387 in clang added another diagnostic around this, complaining that there is no jmp_buf declaration. In file included from ../arch/powerpc/xmon/xmon.c:47: ../arch/powerpc/include/asm/setjmp.h:10:13: error: declaration of built-in function 'setjmp' requires the declaration of the 'jmp_buf' type, commonly provided in the header <setjmp.h>. [-Werror,-Wincomplete-setjmp-declaration] extern long setjmp(long *); ^ ../arch/powerpc/include/asm/setjmp.h:11:13: error: declaration of built-in function 'longjmp' requires the declaration of the 'jmp_buf' type, commonly provided in the header <setjmp.h>. [-Werror,-Wincomplete-setjmp-declaration] extern void longjmp(long *, long); ^ 2 errors generated. We are not using the standard library's longjmp/setjmp implementations for obvious reasons; make this clear to clang by using -ffreestanding on these files. Cc: stable@vger.kernel.org # 4.14+ Suggested-by: Segher Boessenkool <segher@kernel.crashing.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191119045712.39633-3-natechancellor@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-17powerpc: Allow flush_icache_range to work across ranges >4GBAlastair D'Silva1-2/+2
commit 29430fae82073d39b1b881a3cd507416a56a363f upstream. When calling flush_icache_range with a size >4GB, we were masking off the upper 32 bits, so we would incorrectly flush a range smaller than intended. This patch replaces the 32 bit shifts with 64 bit ones, so that the full size is accounted for. Signed-off-by: Alastair D'Silva <alastair@d-silva.org> Cc: stable@vger.kernel.org Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191104023305.9581-2-alastair@au1.ibm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-17powerpc: Allow 64bit VDSO __kernel_sync_dicache to work across ranges >4GBAlastair D'Silva1-2/+2
commit f9ec11165301982585e5e5f606739b5bae5331f3 upstream. When calling __kernel_sync_dicache with a size >4GB, we were masking off the upper 32 bits, so we would incorrectly flush a range smaller than intended. This patch replaces the 32 bit shifts with 64 bit ones, so that the full size is accounted for. Signed-off-by: Alastair D'Silva <alastair@d-silva.org> Cc: stable@vger.kernel.org Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191104023305.9581-3-alastair@au1.ibm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-05powerpc/83xx: handle machine check caused by watchdog timerChristophe Leroy1-4/+6
[ Upstream commit 0deae39cec6dab3a66794f3e9e83ca4dc30080f1 ] When the watchdog timer is set in interrupt mode, it causes a machine check when it times out. The purpose of this mode is to ease debugging, not to crash the kernel and reboot the machine. This patch implements a special handling for that, in order to not crash the kernel if the watchdog times out while in interrupt or within the idle task. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> [scottwood: added missing #include] Signed-off-by: Scott Wood <oss@buserror.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-05powerpc/prom: fix early DEBUG messagesChristophe Leroy1-3/+3
[ Upstream commit b18f0ae92b0a1db565c3e505fa87b6971ad3b641 ] This patch fixes early DEBUG messages in prom.c: - Use %px instead of %p to see the addresses - Cast memblock_phys_mem_size() with (unsigned long long) to avoid build failure when phys_addr_t is not 64 bits. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-01KVM: PPC: Book3S HV: Flush link stack on guest exit to host kernelMichael Ellerman1-0/+9
commit af2e8c68b9c5403f77096969c516f742f5bb29e0 upstream. On some systems that are vulnerable to Spectre v2, it is up to software to flush the link stack (return address stack), in order to protect against Spectre-RSB. When exiting from a guest we do some house keeping and then potentially exit to C code which is several stack frames deep in the host kernel. We will then execute a series of returns without preceeding calls, opening up the possiblity that the guest could have poisoned the link stack, and direct speculative execution of the host to a gadget of some sort. To prevent this we add a flush of the link stack on exit from a guest. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> [dja: straightforward backport to v4.14] Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-01powerpc/book3s64: Fix link stack flush on context switchMichael Ellerman2-4/+50
commit 39e72bf96f5847ba87cc5bd7a3ce0fed813dc9ad upstream. In commit ee13cb249fab ("powerpc/64s: Add support for software count cache flush"), I added support for software to flush the count cache (indirect branch cache) on context switch if firmware told us that was the required mitigation for Spectre v2. As part of that code we also added a software flush of the link stack (return address stack), which protects against Spectre-RSB between user processes. That is all correct for CPUs that activate that mitigation, which is currently Power9 Nimbus DD2.3. What I got wrong is that on older CPUs, where firmware has disabled the count cache, we also need to flush the link stack on context switch. To fix it we create a new feature bit which is not set by firmware, which tells us we need to flush the link stack. We set that when firmware tells us that either of the existing Spectre v2 mitigations are enabled. Then we adjust the patching code so that if we see that feature bit we enable the link stack flush. If we're also told to flush the count cache in software then we fall through and do that also. On the older CPUs we don't need to do do the software count cache flush, firmware has disabled it, so in that case we patch in an early return after the link stack flush. The naming of some of the functions is awkward after this patch, because they're called "count cache" but they also do link stack. But we'll fix that up in a later commit to ease backporting. This is the fix for CVE-2019-18660. Reported-by: Anthony Steinhauser <asteinhauser@google.com> Fixes: ee13cb249fab ("powerpc/64s: Add support for software count cache flush") Cc: stable@vger.kernel.org # v4.4+ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> [dja: straightforward backport to v4.14] Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-01powerpc/64s: support nospectre_v2 cmdline optionChristopher M. Riedl1-3/+16
commit d8f0e0b073e1ec52a05f0c2a56318b47387d2f10 upstream. Add support for disabling the kernel implemented spectre v2 mitigation (count cache flush on context switch) via the nospectre_v2 and mitigations=off cmdline options. Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Christopher M. Riedl <cmr@informatik.wtf> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190524024647.381-1-cmr@informatik.wtf Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-01powerpc/process: Fix flush_all_to_thread for SPEFelipe Rechia1-2/+1
[ Upstream commit e901378578c62202594cba0f6c076f3df365ec91 ] Fix a bug introduced by the creation of flush_all_to_thread() for processors that have SPE (Signal Processing Engine) and use it to compute floating-point operations. >From userspace perspective, the problem was seen in attempts of computing floating-point operations which should generate exceptions. For example: fork(); float x = 0.0 / 0.0; isnan(x); // forked process returns False (should be True) The operation above also should always cause the SPEFSCR FINV bit to be set. However, the SPE floating-point exceptions were turned off after a fork(). Kernel versions prior to the bug used flush_spe_to_thread(), which first saves SPEFSCR register values in tsk->thread and then calls giveup_spe(tsk). After commit 579e633e764e, the save_all() function was called first to giveup_spe(), and then the SPEFSCR register values were saved in tsk->thread. This would save the SPEFSCR register values after disabling SPE for that thread, causing the bug described above. Fixes 579e633e764e ("powerpc: create flush_all_to_thread()") Signed-off-by: Felipe Rechia <felipe.rechia@datacom.com.br> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-01powerpc/eeh: Fix use of EEH_PE_KEEP on wrong fieldSam Bobroff1-1/+1
[ Upstream commit 473af09b56dc4be68e4af33220ceca6be67aa60d ] eeh_add_to_parent_pe() sometimes removes the EEH_PE_KEEP flag, but it incorrectly removes it from pe->type, instead of pe->state. However, rather than clearing it from the correct field, remove it. Inspection of the code shows that it can't ever have had any effect (even if it had been cleared from the correct field), because the field is never tested after it is cleared by the statement in question. The clear statement was added by commit 807a827d4e74 ("powerpc/eeh: Keep PE during hotplug"), but it didn't explain why it was necessary. Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-24powerpc/time: Fix clockevent_decrementer initalisation for PR KVMMichael Ellerman1-0/+4
[ Upstream commit b4d16ab58c41ff0125822464bdff074cebd0fe47 ] In the recent commit 8b78fdb045de ("powerpc/time: Use clockevents_register_device(), fixing an issue with large decrementer") we changed the way we initialise the decrementer clockevent(s). We no longer initialise the mult & shift values of decrementer_clockevent itself. This has the effect of breaking PR KVM, because it uses those values in kvmppc_emulate_dec(). The symptom is guest kernels spin forever mid-way through boot. For now fix it by assigning back to decrementer_clockevent the mult and shift values. Fixes: 8b78fdb045de ("powerpc/time: Use clockevents_register_device(), fixing an issue with large decrementer") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-24powerpc/time: Use clockevents_register_device(), fixing an issue with large ↵Anton Blanchard1-14/+3
decrementer [ Upstream commit 8b78fdb045de60a4eb35460092bbd3cffa925353 ] We currently cap the decrementer clockevent at 4 seconds, even on systems with large decrementer support. Fix this by converting the code to use clockevents_register_device() which calculates the upper bound based on the max_delta passed in. Signed-off-by: Anton Blanchard <anton@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-20powerpc/pseries: Disable CPU hotplug across migrationsNathan Fontenot1-0/+2
[ Upstream commit 85a88cabad57d26d826dd94ea34d3a785824d802 ] When performing partition migrations all present CPUs must be online as all present CPUs must make the H_JOIN call as part of the migration process. Once all present CPUs make the H_JOIN call, one CPU is returned to make the rtas call to perform the migration to the destination system. During testing of migration and changing the SMT state we have found instances where CPUs are offlined, as part of the SMT state change, before they make the H_JOIN call. This results in a hung system where every CPU is either in H_JOIN or offline. To prevent this this patch disables CPU hotplug during the migration process. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Reviewed-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-20powerpc/iommu: Avoid derefence before pointer checkBreno Leitao1-1/+1
[ Upstream commit 984ecdd68de0fa1f63ce205d6c19ef5a7bc67b40 ] The tbl pointer is being derefenced by IOMMU_PAGE_SIZE prior the check if it is not NULL. Just moving the dereference code to after the check, where there will be guarantee that 'tbl' will not be NULL. Signed-off-by: Breno Leitao <leitao@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-20powerpc/vdso: Correct call frame informationAlan Modra4-0/+4
[ Upstream commit 56d20861c027498b5a1112b4f9f05b56d906fdda ] Call Frame Information is used by gdb for back-traces and inserting breakpoints on function return for the "finish" command. This failed when inside __kernel_clock_gettime. More concerning than difficulty debugging is that CFI is also used by stack frame unwinding code to implement exceptions. If you have an app that needs to handle asynchronous exceptions for some reason, and you are unlucky enough to get one inside the VDSO time functions, your app will crash. What's wrong: There is control flow in __kernel_clock_gettime that reaches label 99 without saving lr in r12. CFI info however is interpreted by the unwinder without reference to control flow: It's a simple matter of "Execute all the CFI opcodes up to the current address". That means the unwinder thinks r12 contains the return address at label 99. Disabuse it of that notion by resetting CFI for the return address at label 99. Note that the ".cfi_restore lr" could have gone anywhere from the "mtlr r12" a few instructions earlier to the instruction at label 99. I put the CFI as late as possible, because in general that's best practice (and if possible grouped with other CFI in order to reduce the number of CFI opcodes executed when unwinding). Using r12 as the return address is perfectly fine after the "mtlr r12" since r12 on that code path still contains the return address. __get_datapage also has a CFI error. That function temporarily saves lr in r0, and reflects that fact with ".cfi_register lr,r0". A later use of r0 means the CFI at that point isn't correct, as r0 no longer contains the return address. Fix that too. Signed-off-by: Alan Modra <amodra@gmail.com> Tested-by: Reza Arbab <arbab@linux.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-10powerpc/mm: Fixup tlbie vs mtpidr/mtlpidr ordering issue on POWER9Aneesh Kumar K.V1-0/+2
commit 047e6575aec71d75b765c22111820c4776cd1c43 upstream. On POWER9, under some circumstances, a broadcast TLB invalidation will fail to invalidate the ERAT cache on some threads when there are parallel mtpidr/mtlpidr happening on other threads of the same core. This can cause stores to continue to go to a page after it's unmapped. The workaround is to force an ERAT flush using PID=0 or LPID=0 tlbie flush. This additional TLB flush will cause the ERAT cache invalidation. Since we are using PID=0 or LPID=0, we don't get filtered out by the TLB snoop filtering logic. We need to still follow this up with another tlbie to take care of store vs tlbie ordering issue explained in commit: a5d4b5891c2f ("powerpc/mm: Fixup tlbie vs store ordering issue on POWER9"). The presence of ERAT cache implies we can still get new stores and they may miss store queue marking flush. Cc: stable@vger.kernel.org # v4.14 Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190924035254.24612-3-aneesh.kumar@linux.ibm.com [sandipan: Backported to v4.14] Signed-off-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-10powerpc/book3s64/radix: Rename CPU_FTR_P9_TLBIE_BUG feature flagAneesh Kumar K.V1-3/+3
commit 09ce98cacd51fcd0fa0af2f79d1e1d3192f4cbb0 upstream. Rename the #define to indicate this is related to store vs tlbie ordering issue. In the next patch, we will be adding another feature flag that is used to handles ERAT flush vs tlbie ordering issue. Cc: stable@vger.kernel.org # v4.14 Fixes: a5d4b5891c2f ("powerpc/mm: Fixup tlbie vs store ordering issue on POWER9") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190924035254.24612-2-aneesh.kumar@linux.ibm.com [sandipan: Backported to v4.14] Signed-off-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-10powerpc/book3s64/mm: Don't do tlbie fixup for some hardware revisionsAneesh Kumar K.V1-3/+28
commit 677733e296b5c7a37c47da391fc70a43dc40bd67 upstream. The store ordering vs tlbie issue mentioned in commit a5d4b5891c2f ("powerpc/mm: Fixup tlbie vs store ordering issue on POWER9") is fixed for Nimbus 2.3 and Cumulus 1.3 revisions. We don't need to apply the fixup if we are running on them We can only do this on PowerNV. On pseries guest with kvm we still don't support redoing the feature fixup after migration. So we should be enabling all the workarounds needed, because whe can possibly migrate between DD 2.3 and DD 2.2 Cc: stable@vger.kernel.org # v4.14 Fixes: a5d4b5891c2f ("powerpc/mm: Fixup tlbie vs store ordering issue on POWER9") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190924035254.24612-1-aneesh.kumar@linux.ibm.com [sandipan: Backported to v4.14] Signed-off-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-10powerpc/mm: Fixup tlbie vs store ordering issue on POWER9Aneesh Kumar K.V1-0/+3
commit a5d4b5891c2f1f865a2def1eb0030f534e77ff86 upstream. On POWER9, under some circumstances, a broadcast TLB invalidation might complete before all previous stores have drained, potentially allowing stale stores from becoming visible after the invalidation. This works around it by doubling up those TLB invalidations which was verified by HW to be sufficient to close the risk window. This will be documented in a yet-to-be-published errata. Cc: stable@vger.kernel.org # v4.14 Fixes: 1a472c9dba6b ("powerpc/mm/radix: Add tlbflush routines") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> [mpe: Enable the feature in the DT CPU features code for all Power9, rename the feature to CPU_FTR_P9_TLBIE_BUG per benh.] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20180323045627.16800-3-aneesh.kumar@linux.vnet.ibm.com/ [sandipan: Backported to v4.14] Signed-off-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-07powerpc/64s/exception: machine check use correct cfar for late handlerNicholas Piggin1-0/+4
[ Upstream commit 0b66370c61fcf5fcc1d6901013e110284da6e2bb ] Bare metal machine checks run an "early" handler in real mode before running the main handler which reports the event. The main handler runs exactly as a normal interrupt handler, after the "windup" which sets registers back as they were at interrupt entry. CFAR does not get restored by the windup code, so that will be wrong when the handler is run. Restore the CFAR to the saved value before running the late handler. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190802105709.27696-8-npiggin@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-07powerpc/rtas: use device model APIs and serialization during LPMNathan Lynch1-3/+8
[ Upstream commit a6717c01ddc259f6f73364779df058e2c67309f8 ] The LPAR migration implementation and userspace-initiated cpu hotplug can interleave their executions like so: 1. Set cpu 7 offline via sysfs. 2. Begin a partition migration, whose implementation requires the OS to ensure all present cpus are online; cpu 7 is onlined: rtas_ibm_suspend_me -> rtas_online_cpus_mask -> cpu_up This sets cpu 7 online in all respects except for the cpu's corresponding struct device; dev->offline remains true. 3. Set cpu 7 online via sysfs. _cpu_up() determines that cpu 7 is already online and returns success. The driver core (device_online) sets dev->offline = false. 4. The migration completes and restores cpu 7 to offline state: rtas_ibm_suspend_me -> rtas_offline_cpus_mask -> cpu_down This leaves cpu7 in a state where the driver core considers the cpu device online, but in all other respects it is offline and unused. Attempts to online the cpu via sysfs appear to succeed but the driver core actually does not pass the request to the lower-level cpuhp support code. This makes the cpu unusable until the cpu device is manually set offline and then online again via sysfs. Instead of directly calling cpu_up/cpu_down, the migration code should use the higher-level device core APIs to maintain consistent state and serialize operations. Fixes: 120496ac2d2d ("powerpc: Bring all threads online prior to migration/hibernation") Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190802192926.19277-2-nathanl@linux.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-09-16powerpc/64: mark start_here_multiplatform as __refChristophe Leroy1-0/+2
[ Upstream commit 9c4e4c90ec24652921e31e9551fcaedc26eec86d ] Otherwise, the following warning is encountered: WARNING: vmlinux.o(.text+0x3dc6): Section mismatch in reference from the variable start_here_multiplatform to the function .init.text:.early_setup() The function start_here_multiplatform() references the function __init .early_setup(). This is often because start_here_multiplatform lacks a __init annotation or the annotation of .early_setup is wrong. Fixes: 56c46bba9bbf ("powerpc/64: Fix booting large kernels with STRICT_KERNEL_RWX") Cc: Russell Currey <ruscur@russell.cc> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-09-16powerpc/tm: Fix FP/VMX unavailable exceptions inside a transactionGustavo Romero1-1/+2
commit 8205d5d98ef7f155de211f5e2eb6ca03d95a5a60 upstream. When we take an FP unavailable exception in a transaction we have to account for the hardware FP TM checkpointed registers being incorrect. In this case for this process we know the current and checkpointed FP registers must be the same (since FP wasn't used inside the transaction) hence in the thread_struct we copy the current FP registers to the checkpointed ones. This copy is done in tm_reclaim_thread(). We use thread->ckpt_regs.msr to determine if FP was on when in userspace. thread->ckpt_regs.msr represents the state of the MSR when exiting userspace. This is setup by check_if_tm_restore_required(). Unfortunatley there is an optimisation in giveup_all() which returns early if tsk->thread.regs->msr (via local variable `usermsr`) has FP=VEC=VSX=SPE=0. This optimisation means that check_if_tm_restore_required() is not called and hence thread->ckpt_regs.msr is not updated and will contain an old value. This can happen if due to load_fp=255 we start a userspace process with MSR FP=1 and then we are context switched out. In this case thread->ckpt_regs.msr will contain FP=1. If that same process is then context switched in and load_fp overflows, MSR will have FP=0. If that process now enters a transaction and does an FP instruction, the FP unavailable will not update thread->ckpt_regs.msr (the bug) and MSR FP=1 will be retained in thread->ckpt_regs.msr. tm_reclaim_thread() will then not perform the required memcpy and the checkpointed FP regs in the thread struct will contain the wrong values. The code path for this happening is: Userspace: Kernel Start userspace with MSR FP/VEC/VSX/SPE=0 TM=1 < ----- ... tbegin bne fp instruction FP unavailable ---- > fp_unavailable_tm() tm_reclaim_current() tm_reclaim_thread() giveup_all() return early since FP/VMX/VSX=0 /* ckpt MSR not updated (Incorrect) */ tm_reclaim() /* thread_struct ckpt FP regs contain junk (OK) */ /* Sees ckpt MSR FP=1 (Incorrect) */ no memcpy() performed /* thread_struct ckpt FP regs not fixed (Incorrect) */ tm_recheckpoint() /* Put junk in hardware checkpoint FP regs */ .... < ----- Return to userspace with MSR TM=1 FP=1 with junk in the FP TM checkpoint TM rollback reads FP junk This is a data integrity problem for the current process as the FP registers are corrupted. It's also a security problem as the FP registers from one process may be leaked to another. This patch moves up check_if_tm_restore_required() in giveup_all() to ensure thread->ckpt_regs.msr is updated correctly. A simple testcase to replicate this will be posted to tools/testing/selftests/powerpc/tm/tm-poison.c Similarly for VMX. This fixes CVE-2019-15030. Fixes: f48e91e87e67 ("powerpc/tm: Fix FP and VMX register corruption") Cc: stable@vger.kernel.org # 4.12+ Signed-off-by: Gustavo Romero <gromero@linux.vnet.ibm.com> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190904045529.23002-1-gromero@linux.vnet.ibm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-29powerpc: Allow flush_(inval_)dcache_range to work across ranges >4GBAlastair D'Silva1-2/+2
The upstream commit: 22e9c88d486a ("powerpc/64: reuse PPC32 static inline flush_dcache_range()") has a similar effect, but since it is a rewrite of the assembler to C, is too invasive for stable. This patch is a minimal fix to address the issue in assembler. This patch applies cleanly to v5.2, v4.19 & v4.14. When calling flush_(inval_)dcache_range with a size >4GB, we were masking off the upper 32 bits, so we would incorrectly flush a range smaller than intended. This patch replaces the 32 bit shifts with 64 bit ones, so that the full size is accounted for. Signed-off-by: Alastair D'Silva <alastair@d-silva.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-31powerpc/tm: Fix oops on sigreturn on systems without TMMichael Neuling2-0/+8
commit f16d80b75a096c52354c6e0a574993f3b0dfbdfe upstream. On systems like P9 powernv where we have no TM (or P8 booted with ppc_tm=off), userspace can construct a signal context which still has the MSR TS bits set. The kernel tries to restore this context which results in the following crash: Unexpected TM Bad Thing exception at c0000000000022fc (msr 0x8000000102a03031) tm_scratch=800000020280f033 Oops: Unrecoverable exception, sig: 6 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries Modules linked in: CPU: 0 PID: 1636 Comm: sigfuz Not tainted 5.2.0-11043-g0a8ad0ffa4 #69 NIP: c0000000000022fc LR: 00007fffb2d67e48 CTR: 0000000000000000 REGS: c00000003fffbd70 TRAP: 0700 Not tainted (5.2.0-11045-g7142b497d8) MSR: 8000000102a03031 <SF,VEC,VSX,FP,ME,IR,DR,LE,TM[E]> CR: 42004242 XER: 00000000 CFAR: c0000000000022e0 IRQMASK: 0 GPR00: 0000000000000072 00007fffb2b6e560 00007fffb2d87f00 0000000000000669 GPR04: 00007fffb2b6e728 0000000000000000 0000000000000000 00007fffb2b6f2a8 GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR12: 0000000000000000 00007fffb2b76900 0000000000000000 0000000000000000 GPR16: 00007fffb2370000 00007fffb2d84390 00007fffea3a15ac 000001000a250420 GPR20: 00007fffb2b6f260 0000000010001770 0000000000000000 0000000000000000 GPR24: 00007fffb2d843a0 00007fffea3a14a0 0000000000010000 0000000000800000 GPR28: 00007fffea3a14d8 00000000003d0f00 0000000000000000 00007fffb2b6e728 NIP [c0000000000022fc] rfi_flush_fallback+0x7c/0x80 LR [00007fffb2d67e48] 0x7fffb2d67e48 Call Trace: Instruction dump: e96a0220 e96a02a8 e96a0330 e96a03b8 394a0400 4200ffdc 7d2903a6 e92d0c00 e94d0c08 e96d0c10 e82d0c18 7db242a6 <4c000024> 7db243a6 7db142a6 f82d0c18 The problem is the signal code assumes TM is enabled when CONFIG_PPC_TRANSACTIONAL_MEM is enabled. This may not be the case as with P9 powernv or if `ppc_tm=off` is used on P8. This means any local user can crash the system. Fix the problem by returning a bad stack frame to the user if they try to set the MSR TS bits with sigreturn() on systems where TM is not supported. Found with sigfuz kernel selftest on P9. This fixes CVE-2019-13648. Fixes: 2b0a576d15e0 ("powerpc: Add new transactional memory state to the signal context") Cc: stable@vger.kernel.org # v3.9 Reported-by: Praveen Pandey <Praveen.Pandey@in.ibm.com> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190719050502.405-1-mikey@neuling.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-31powerpc/eeh: Handle hugepages in ioremap spaceOliver O'Halloran1-3/+12
[ Upstream commit 33439620680be5225c1b8806579a291e0d761ca0 ] In commit 4a7b06c157a2 ("powerpc/eeh: Handle hugepages in ioremap space") support for using hugepages in the vmalloc and ioremap areas was enabled for radix. Unfortunately this broke EEH MMIO error checking. Detection works by inserting a hook which checks the results of the ioreadXX() set of functions. When a read returns a 0xFFs response we need to check for an error which we do by mapping the (virtual) MMIO address back to a physical address, then mapping physical address to a PCI device via an interval tree. When translating virt -> phys we currently assume the ioremap space is only populated by PAGE_SIZE mappings. If a hugepage mapping is found we emit a WARN_ON(), but otherwise handles the check as though a normal page was found. In pathalogical cases such as copying a buffer containing a lot of 0xFFs from BAR memory this can result in the system not booting because it's too busy printing WARN_ON()s. There's no real reason to assume huge pages can't be present and we're prefectly capable of handling them, so do that. Fixes: 4a7b06c157a2 ("powerpc/eeh: Handle hugepages in ioremap space") Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190710150517.27114-1-oohall@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-07-31powerpc/pci/of: Fix OF flags parsing for 64bit BARsAlexey Kardashevskiy1-0/+2
[ Upstream commit df5be5be8735ef2ae80d5ae1f2453cd81a035c4b ] When the firmware does PCI BAR resource allocation, it passes the assigned addresses and flags (prefetch/64bit/...) via the "reg" property of a PCI device device tree node so the kernel does not need to do resource allocation. The flags are stored in resource::flags - the lower byte stores PCI_BASE_ADDRESS_SPACE/etc bits and the other bytes are IORESOURCE_IO/etc. Some flags from PCI_BASE_ADDRESS_xxx and IORESOURCE_xxx are duplicated, such as PCI_BASE_ADDRESS_MEM_PREFETCH/PCI_BASE_ADDRESS_MEM_TYPE_64/etc. When parsing the "reg" property, we copy the prefetch flag but we skip on PCI_BASE_ADDRESS_MEM_TYPE_64 which leaves the flags out of sync. The missing IORESOURCE_MEM_64 flag comes into play under 2 conditions: 1. we remove PCI_PROBE_ONLY for pseries (by hacking pSeries_setup_arch() or by passing "/chosen/linux,pci-probe-only"); 2. we request resource alignment (by passing pci=resource_alignment= via the kernel cmd line to request PAGE_SIZE alignment or defining ppc_md.pcibios_default_alignment which returns anything but 0). Note that the alignment requests are ignored if PCI_PROBE_ONLY is enabled. With 1) and 2), the generic PCI code in the kernel unconditionally decides to: - reassign the BARs in pci_specified_resource_alignment() (works fine) - write new BARs to the device - this fails for 64bit BARs as the generic code looks at IORESOURCE_MEM_64 (not set) and writes only lower 32bits of the BAR and leaves the upper 32bit unmodified which breaks BAR mapping in the hypervisor. This fixes the issue by copying the flag. This is useful if we want to enforce certain BAR alignment per platform as handling subpage sized BARs is proven to cause problems with hotplug (SLOF already aligns BARs to 64k). Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com> Reviewed-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Shawn Anastasio <shawn@anastas.io> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-07-31powerpc/watchpoint: Restore NV GPRs while returning from exceptionRavi Bangoria1-2/+7
commit f474c28fbcbe42faca4eb415172c07d76adcb819 upstream. powerpc hardware triggers watchpoint before executing the instruction. To make trigger-after-execute behavior, kernel emulates the instruction. If the instruction is 'load something into non-volatile register', exception handler should restore emulated register state while returning back, otherwise there will be register state corruption. eg, adding a watchpoint on a list can corrput the list: # cat /proc/kallsyms | grep kthread_create_list c00000000121c8b8 d kthread_create_list Add watchpoint on kthread_create_list->prev: # perf record -e mem:0xc00000000121c8c0 Run some workload such that new kthread gets invoked. eg, I just logged out from console: list_add corruption. next->prev should be prev (c000000001214e00), \ but was c00000000121c8b8. (next=c00000000121c8b8). WARNING: CPU: 59 PID: 309 at lib/list_debug.c:25 __list_add_valid+0xb4/0xc0 CPU: 59 PID: 309 Comm: kworker/59:0 Kdump: loaded Not tainted 5.1.0-rc7+ #69 ... NIP __list_add_valid+0xb4/0xc0 LR __list_add_valid+0xb0/0xc0 Call Trace: __list_add_valid+0xb0/0xc0 (unreliable) __kthread_create_on_node+0xe0/0x260 kthread_create_on_node+0x34/0x50 create_worker+0xe8/0x260 worker_thread+0x444/0x560 kthread+0x160/0x1a0 ret_from_kernel_thread+0x5c/0x70 List corruption happened because it uses 'load into non-volatile register' instruction: Snippet from __kthread_create_on_node: c000000000136be8: addis r29,r2,-19 c000000000136bec: ld r29,31424(r29) if (!__list_add_valid(new, prev, next)) c000000000136bf0: mr r3,r30 c000000000136bf4: mr r5,r28 c000000000136bf8: mr r4,r29 c000000000136bfc: bl c00000000059a2f8 <__list_add_valid+0x8> Register state from WARN_ON(): GPR00: c00000000059a3a0 c000007ff23afb50 c000000001344e00 0000000000000075 GPR04: 0000000000000000 0000000000000000 0000001852af8bc1 0000000000000000 GPR08: 0000000000000001 0000000000000007 0000000000000006 00000000000004aa GPR12: 0000000000000000 c000007ffffeb080 c000000000137038 c000005ff62aaa00 GPR16: 0000000000000000 0000000000000000 c000007fffbe7600 c000007fffbe7370 GPR20: c000007fffbe7320 c000007fffbe7300 c000000001373a00 0000000000000000 GPR24: fffffffffffffef7 c00000000012e320 c000007ff23afcb0 c000000000cb8628 GPR28: c00000000121c8b8 c000000001214e00 c000007fef5b17e8 c000007fef5b17c0 Watchpoint hit at 0xc000000000136bec. addis r29,r2,-19 => r29 = 0xc000000001344e00 + (-19 << 16) => r29 = 0xc000000001214e00 ld r29,31424(r29) => r29 = *(0xc000000001214e00 + 31424) => r29 = *(0xc00000000121c8c0) 0xc00000000121c8c0 is where we placed a watchpoint and thus this instruction was emulated by emulate_step. But because handle_dabr_fault did not restore emulated register state, r29 still contains stale value in above register state. Fixes: 5aae8a5370802 ("powerpc, hw_breakpoints: Implement hw_breakpoints for 64-bit server processors") Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: stable@vger.kernel.org # 2.6.36+ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-31powerpc/32s: fix suspend/resume when IBATs 4-7 are usedChristophe Leroy1-8/+65
commit 6ecb78ef56e08d2119d337ae23cb951a640dc52d upstream. Previously, only IBAT1 and IBAT2 were used to map kernel linear mem. Since commit 63b2bc619565 ("powerpc/mm/32s: Use BATs for STRICT_KERNEL_RWX"), we may have all 8 BATs used for mapping kernel text. But the suspend/restore functions only save/restore BATs 0 to 3, and clears BATs 4 to 7. Make suspend and restore functions respectively save and reload the 8 BATs on CPUs having MMU_FTR_USE_HIGH_BATS feature. Reported-by: Andreas Schwab <schwab@linux-m68k.org> Cc: stable@vger.kernel.org Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11pstore: Convert buf_lock to semaphoreKees Cook1-2/+0
commit ea84b580b95521644429cc6748b6c2bf27c8b0f3 upstream. Instead of running with interrupts disabled, use a semaphore. This should make it easier for backends that may need to sleep (e.g. EFI) when performing a write: |BUG: sleeping function called from invalid context at kernel/sched/completion.c:99 |in_atomic(): 1, irqs_disabled(): 1, pid: 2236, name: sig-xstate-bum |Preemption disabled at: |[<ffffffff99d60512>] pstore_dump+0x72/0x330 |CPU: 26 PID: 2236 Comm: sig-xstate-bum Tainted: G D 4.20.0-rc3 #45 |Call Trace: | dump_stack+0x4f/0x6a | ___might_sleep.cold.91+0xd3/0xe4 | __might_sleep+0x50/0x90 | wait_for_completion+0x32/0x130 | virt_efi_query_variable_info+0x14e/0x160 | efi_query_variable_store+0x51/0x1a0 | efivar_entry_set_safe+0xa3/0x1b0 | efi_pstore_write+0x109/0x140 | pstore_dump+0x11c/0x330 | kmsg_dump+0xa4/0xd0 | oops_exit+0x22/0x30 ... Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Fixes: 21b3ddd39fee ("efi: Don't use spinlocks for efi vars") Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-31powerpc/64: Fix booting large kernels with STRICT_KERNEL_RWXRussell Currey1-1/+3
[ Upstream commit 56c46bba9bbfe229b4472a5be313c44c5b714a39 ] With STRICT_KERNEL_RWX enabled anything marked __init is placed at a 16M boundary. This is necessary so that it can be repurposed later with different permissions. However, in kernels with text larger than 16M, this pushes early_setup past 32M, incapable of being reached by the branch instruction. Fix this by setting the CTR and branching there instead. Fixes: 1e0fc9d1eb2b ("powerpc/Kconfig: Enable STRICT_KERNEL_RWX for some configs") Signed-off-by: Russell Currey <ruscur@russell.cc> [mpe: Fix it to work on BE by using DOTSYM()] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-05-16powerpc/powernv/idle: Restore IAMR after idleRussell Currey1-0/+20
commit a3f3072db6cad40895c585dce65e36aab997f042 upstream. Without restoring the IAMR after idle, execution prevention on POWER9 with Radix MMU is overwritten and the kernel can freely execute userspace without faulting. This is necessary when returning from any stop state that modifies user state, as well as hypervisor state. To test how this fails without this patch, load the lkdtm driver and do the following: $ echo EXEC_USERSPACE > /sys/kernel/debug/provoke-crash/DIRECT which won't fault, then boot the kernel with powersave=off, where it will fault. Applying this patch will fix this. Fixes: 3b10d0095a1e ("powerpc/mm/radix: Prevent kernel execution of user space") Cc: stable@vger.kernel.org # v4.10+ Signed-off-by: Russell Currey <ruscur@russell.cc> Reviewed-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-16powerpc/64s: Include cpu headerBreno Leitao1-0/+1
commit 42e2acde1237878462b028f5a27d9cc5bea7502c upstream. Current powerpc security.c file is defining functions, as cpu_show_meltdown(), cpu_show_spectre_v{1,2} and others, that are being declared at linux/cpu.h header without including the header file that contains these declarations. This is being reported by sparse, which thinks that these functions are static, due to the lack of declaration: arch/powerpc/kernel/security.c:105:9: warning: symbol 'cpu_show_meltdown' was not declared. Should it be static? arch/powerpc/kernel/security.c:139:9: warning: symbol 'cpu_show_spectre_v1' was not declared. Should it be static? arch/powerpc/kernel/security.c:161:9: warning: symbol 'cpu_show_spectre_v2' was not declared. Should it be static? arch/powerpc/kernel/security.c:209:6: warning: symbol 'stf_barrier' was not declared. Should it be static? arch/powerpc/kernel/security.c:289:9: warning: symbol 'cpu_show_spec_store_bypass' was not declared. Should it be static? This patch simply includes the proper header (linux/cpu.h) to match function definition and declaration. Signed-off-by: Breno Leitao <leitao@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Joel Stanley <joel@jms.id.au> Cc: Nathan Chancellor <natechancellor@gmail.com> Cc: Major Hayden <major@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-14powerpc/speculation: Support 'mitigations=' cmdline optionJosh Poimboeuf2-4/+4
commit 782e69efb3dfed6e8360bc612e8c7827a901a8f9 upstream Configure powerpc CPU runtime speculation bug mitigations in accordance with the 'mitigations=' cmdline option. This affects Meltdown, Spectre v1, Spectre v2, and Speculative Store Bypass. The default behavior is unchanged. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Jiri Kosina <jkosina@suse.cz> (on x86) Reviewed-by: Jiri Kosina <jkosina@suse.cz> Cc: Borislav Petkov <bp@alien8.de> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jiri Kosina <jikos@kernel.org> Cc: Waiman Long <longman@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Jon Masters <jcm@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: linuxppc-dev@lists.ozlabs.org Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: linux-s390@vger.kernel.org Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-arch@vger.kernel.org Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Tyler Hicks <tyhicks@canonical.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Steven Price <steven.price@arm.com> Cc: Phil Auld <pauld@redhat.com> Link: https://lkml.kernel.org/r/245a606e1a42a558a310220312d9b6adb9159df6.1555085500.git.jpoimboe@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-08kmemleak: powerpc: skip scanning holes in the .bss sectionCatalin Marinas1-0/+7
[ Upstream commit 298a32b132087550d3fa80641ca58323c5dfd4d9 ] Commit 2d4f567103ff ("KVM: PPC: Introduce kvm_tmp framework") adds kvm_tmp[] into the .bss section and then free the rest of unused spaces back to the page allocator. kernel_init kvm_guest_init kvm_free_tmp free_reserved_area free_unref_page free_unref_page_prepare With DEBUG_PAGEALLOC=y, it will unmap those pages from kernel. As the result, kmemleak scan will trigger a panic when it scans the .bss section with unmapped pages. This patch creates dedicated kmemleak objects for the .data, .bss and potentially .data..ro_after_init sections to allow partial freeing via the kmemleak_free_part() in the powerpc kvm_free_tmp() function. Link: http://lkml.kernel.org/r/20190321171917.62049-1-catalin.marinas@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Reported-by: Qian Cai <cai@lca.pw> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Tested-by: Qian Cai <cai@lca.pw> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Avi Kivity <avi@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krcmar <rkrcmar@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-20powerpc/pseries: Remove prrn_work workqueueNathan Fontenot1-14/+3
[ Upstream commit cd24e457fd8b2d087d9236700c8d2957054598bf ] When a PRRN event is received we are already running in a worker thread. Instead of spawning off another worker thread on the prrn_work workqueue to handle the PRRN event we can just call the PRRN handler routine directly. With this update we can also pass the scope variable for the PRRN event directly to the handler instead of it being a global variable. This patch fixes the following oops mnessage we are seeing in PRRN testing: Oops: Bad kernel stack pointer, sig: 6 [#1] SMP NR_CPUS=2048 NUMA pSeries Modules linked in: nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc fscache binfmt_misc reiserfs vfat fat rpadlpar_io(X) rpaphp(X) tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag af_packet xfs libcrc32c dm_service_time ibmveth(X) ses enclosure scsi_transport_sas rtc_generic btrfs xor raid6_pq sd_mod ibmvscsi(X) scsi_transport_srp ipr(X) libata sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod autofs4 Supported: Yes, External 54 CPU: 7 PID: 18967 Comm: kworker/u96:0 Tainted: G X 4.4.126-94.22-default #1 Workqueue: pseries hotplug workque pseries_hp_work_fn task: c000000775367790 ti: c00000001ebd4000 task.ti: c00000070d140000 NIP: 0000000000000000 LR: 000000001fb3d050 CTR: 0000000000000000 REGS: c00000001ebd7d40 TRAP: 0700 Tainted: G X (4.4.126-94.22-default) MSR: 8000000102081000 <41,VEC,ME5 CR: 28000002 XER: 20040018 4 CFAR: 000000001fb3d084 40 419 1 3 GPR00: 000000000000000040000000000010007 000000001ffff400 000000041fffe200 GPR04: 000000000000008050000000000000000 000000001fb15fa8 0000000500000500 GPR08: 000000000001f40040000000000000001 0000000000000000 000005:5200040002 GPR12: 00000000000000005c000000007a05400 c0000000000e89f8 000000001ed9f668 GPR16: 000000001fbeff944000000001fbeff94 000000001fb545e4 0000006000000060 GPR20: ffffffffffffffff4ffffffffffffffff 0000000000000000 0000000000000000 GPR24: 00000000000000005400000001fb3c000 0000000000000000 000000001fb1b040 GPR28: 000000001fb240004000000001fb440d8 0000000000000008 0000000000000000 NIP [0000000000000000] 5 (null) LR [000000001fb3d050] 031fb3d050 Call Trace: 4 Instruction dump: 4 5:47 12 2 XXXXXXXX XXXXXXXX XXXXX4XX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXX5XX XXXXXXXX 60000000 60000000 60000000 60000000 ---[ end trace aa5627b04a7d9d6b ]--- 3NMI watchdog: BUG: soft lockup - CPU#27 stuck for 23s! [kworker/27:0:13903] Modules linked in: nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc fscache binfmt_misc reiserfs vfat fat rpadlpar_io(X) rpaphp(X) tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag af_packet xfs libcrc32c dm_service_time ibmveth(X) ses enclosure scsi_transport_sas rtc_generic btrfs xor raid6_pq sd_mod ibmvscsi(X) scsi_transport_srp ipr(X) libata sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod autofs4 Supported: Yes, External CPU: 27 PID: 13903 Comm: kworker/27:0 Tainted: G D X 4.4.126-94.22-default #1 Workqueue: events prrn_work_fn task: c000000747cfa390 ti: c00000074712c000 task.ti: c00000074712c000 NIP: c0000000008002a8 LR: c000000000090770 CTR: 000000000032e088 REGS: c00000074712f7b0 TRAP: 0901 Tainted: G D X (4.4.126-94.22-default) MSR: 8000000100009033 <SF,EE,ME,IR,DR,RI,LE> CR: 22482044 XER: 20040000 CFAR: c0000000008002c4 SOFTE: 1 GPR00: c000000000090770 c00000074712fa30 c000000000f09800 c000000000fa1928 6:02 GPR04: c000000775f5e000 fffffffffffffffe 0000000000000001 c000000000f42db8 GPR08: 0000000000000001 0000000080000007 0000000000000000 0000000000000000 GPR12: 8006210083180000 c000000007a14400 NIP [c0000000008002a8] _raw_spin_lock+0x68/0xd0 LR [c000000000090770] mobility_rtas_call+0x50/0x100 Call Trace: 59 5 [c00000074712fa60] [c000000000090770] mobility_rtas_call+0x50/0x100 [c00000074712faf0] [c000000000090b08] pseries_devicetree_update+0xf8/0x530 [c00000074712fc20] [c000000000031ba4] prrn_work_fn+0x34/0x50 [c00000074712fc40] [c0000000000e0390] process_one_work+0x1a0/0x4e0 [c00000074712fcd0] [c0000000000e0870] worker_thread+0x1a0/0x6105:57 2 [c00000074712fd80] [c0000000000e8b18] kthread+0x128/0x150 [c00000074712fe30] [c0000000000096f8] ret_from_kernel_thread+0x5c/0x64 Instruction dump: 2c090000 40c20010 7d40192d 40c2fff0 7c2004ac 2fa90000 40de0018 5:540030 3 e8010010 ebe1fff8 7c0803a6 4e800020 <7c210b78> e92d0000 89290009 792affe3 Signed-off-by: John Allen <jallen@linux.ibm.com> Signed-off-by: Haren Myneni <haren@us.ibm.com> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-17powerpc/tm: Limit TM code inside PPC_TRANSACTIONAL_MEMBreno Leitao1-5/+18
[ Upstream commit 897bc3df8c5aebb54c32d831f917592e873d0559 ] Commit e1c3743e1a20 ("powerpc/tm: Set MSR[TS] just prior to recheckpoint") moved a code block around and this block uses a 'msr' variable outside of the CONFIG_PPC_TRANSACTIONAL_MEM, however the 'msr' variable is declared inside a CONFIG_PPC_TRANSACTIONAL_MEM block, causing a possible error when CONFIG_PPC_TRANSACTION_MEM is not defined. error: 'msr' undeclared (first use in this function) This is not causing a compilation error in the mainline kernel, because 'msr' is being used as an argument of MSR_TM_ACTIVE(), which is defined as the following when CONFIG_PPC_TRANSACTIONAL_MEM is *not* set: #define MSR_TM_ACTIVE(x) 0 This patch just fixes this issue avoiding the 'msr' variable usage outside the CONFIG_PPC_TRANSACTIONAL_MEM block, avoiding trusting in the MSR_TM_ACTIVE() definition. Cc: stable@vger.kernel.org Reported-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de> Fixes: e1c3743e1a20 ("powerpc/tm: Set MSR[TS] just prior to recheckpoint") Signed-off-by: Breno Leitao <leitao@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-03powerpc/security: Fix spectre_v2 reportingMichael Ellerman1-15/+8
commit 92edf8df0ff2ae86cc632eeca0e651fd8431d40d upstream. When I updated the spectre_v2 reporting to handle software count cache flush I got the logic wrong when there's no software count cache enabled at all. The result is that on systems with the software count cache flush disabled we print: Mitigation: Indirect branch cache disabled, Software count cache flush Which correctly indicates that the count cache is disabled, but incorrectly says the software count cache flush is enabled. The root of the problem is that we are trying to handle all combinations of options. But we know now that we only expect to see the software count cache flush enabled if the other options are false. So split the two cases, which simplifies the logic and fixes the bug. We were also missing a space before "(hardware accelerated)". The result is we see one of: Mitigation: Indirect branch serialisation (kernel only) Mitigation: Indirect branch cache disabled Mitigation: Software count cache flush Mitigation: Software count cache flush (hardware accelerated) Fixes: ee13cb249fab ("powerpc/64s: Add support for software count cache flush") Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Michael Neuling <mikey@neuling.org> Reviewed-by: Diana Craciun <diana.craciun@nxp.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-03powerpc/fsl: Fix the flush of branch predictor.Christophe Leroy1-0/+1
commit 27da80719ef132cf8c80eb406d5aeb37dddf78cc upstream. The commit identified below adds MC_BTB_FLUSH macro only when CONFIG_PPC_FSL_BOOK3E is defined. This results in the following error on some configs (seen several times with kisskb randconfig_defconfig) arch/powerpc/kernel/exceptions-64e.S:576: Error: Unrecognized opcode: `mc_btb_flush' make[3]: *** [scripts/Makefile.build:367: arch/powerpc/kernel/exceptions-64e.o] Error 1 make[2]: *** [scripts/Makefile.build:492: arch/powerpc/kernel] Error 2 make[1]: *** [Makefile:1043: arch/powerpc] Error 2 make: *** [Makefile:152: sub-make] Error 2 This patch adds a blank definition of MC_BTB_FLUSH for other cases. Fixes: 10c5e83afd4a ("powerpc/fsl: Flush the branch predictor at each kernel entry (64bit)") Cc: Diana Craciun <diana.craciun@nxp.com> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Diana Craciun <diana.craciun@nxp.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-03powerpc/fsl: Fixed warning: orphan section `__btb_flush_fixup'Diana Craciun1-6/+12
commit 039daac5526932ec731e4499613018d263af8b3e upstream. Fixed the following build warning: powerpc-linux-gnu-ld: warning: orphan section `__btb_flush_fixup' from `arch/powerpc/kernel/head_44x.o' being placed in section `__btb_flush_fixup'. Signed-off-by: Diana Craciun <diana.craciun@nxp.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-03powerpc/fsl: Update Spectre v2 reportingDiana Craciun1-1/+4
commit dfa88658fb0583abb92e062c7a9cd5a5b94f2a46 upstream. Report branch predictor state flush as a mitigation for Spectre variant 2. Signed-off-by: Diana Craciun <diana.craciun@nxp.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-03powerpc/fsl: Enable runtime patching if nospectre_v2 boot arg is usedDiana Craciun1-0/+1
commit 3bc8ea8603ae4c1e09aca8de229ad38b8091fcb3 upstream. If the user choses not to use the mitigations, replace the code sequence with nops. Signed-off-by: Diana Craciun <diana.craciun@nxp.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-03powerpc/fsl: Flush the branch predictor at each kernel entry (32 bit)Diana Craciun2-0/+21
commit 7fef436295bf6c05effe682c8797dfcb0deb112a upstream. In order to protect against speculation attacks on indirect branches, the branch predictor is flushed at kernel entry to protect for the following situations: - userspace process attacking another userspace process - userspace process attacking the kernel Basically when the privillege level change (i.e.the kernel is entered), the branch predictor state is flushed. Signed-off-by: Diana Craciun <diana.craciun@nxp.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-03powerpc/fsl: Flush the branch predictor at each kernel entry (64bit)Diana Craciun2-1/+30
commit 10c5e83afd4a3f01712d97d3bb1ae34d5b74a185 upstream. In order to protect against speculation attacks on indirect branches, the branch predictor is flushed at kernel entry to protect for the following situations: - userspace process attacking another userspace process - userspace process attacking the kernel Basically when the privillege level change (i.e. the kernel is entered), the branch predictor state is flushed. Signed-off-by: Diana Craciun <diana.craciun@nxp.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-03powerpc/fsl: Add nospectre_v2 command line argumentDiana Craciun1-0/+21
commit f633a8ad636efb5d4bba1a047d4a0f1ef719aa06 upstream. When the command line argument is present, the Spectre variant 2 mitigations are disabled. Signed-off-by: Diana Craciun <diana.craciun@nxp.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-03powerpc/fsl: Fix spectre_v2 mitigations reportingDiana Craciun1-1/+1
commit 7d8bad99ba5a22892f0cad6881289fdc3875a930 upstream. Currently for CONFIG_PPC_FSL_BOOK3E the spectre_v2 file is incorrect: $ cat /sys/devices/system/cpu/vulnerabilities/spectre_v2 "Mitigation: Software count cache flush" Which is wrong. Fix it to report vulnerable for now. Fixes: ee13cb249fab ("powerpc/64s: Add support for software count cache flush") Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Diana Craciun <diana.craciun@nxp.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-03powerpc/fsl: Add infrastructure to fixup branch predictor flushDiana Craciun1-0/+8
commit 76a5eaa38b15dda92cd6964248c39b5a6f3a4e9d upstream. In order to protect against speculation attacks (Spectre variant 2) on NXP PowerPC platforms, the branch predictor should be flushed when the privillege level is changed. This patch is adding the infrastructure to fixup at runtime the code sections that are performing the branch predictor flush depending on a boot arg parameter which is added later in a separate patch. Signed-off-by: Diana Craciun <diana.craciun@nxp.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>