summaryrefslogtreecommitdiff
path: root/arch/powerpc/kernel
AgeCommit message (Collapse)AuthorFilesLines
2019-11-29KVM: PPC: Book3S HV: Flush link stack on guest exit to host kernelMichael Ellerman1-0/+9
commit af2e8c68b9c5403f77096969c516f742f5bb29e0 upstream. On some systems that are vulnerable to Spectre v2, it is up to software to flush the link stack (return address stack), in order to protect against Spectre-RSB. When exiting from a guest we do some house keeping and then potentially exit to C code which is several stack frames deep in the host kernel. We will then execute a series of returns without preceeding calls, opening up the possiblity that the guest could have poisoned the link stack, and direct speculative execution of the host to a gadget of some sort. To prevent this we add a flush of the link stack on exit from a guest. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-29powerpc/book3s64: Fix link stack flush on context switchMichael Ellerman2-4/+50
commit 39e72bf96f5847ba87cc5bd7a3ce0fed813dc9ad upstream. In commit ee13cb249fab ("powerpc/64s: Add support for software count cache flush"), I added support for software to flush the count cache (indirect branch cache) on context switch if firmware told us that was the required mitigation for Spectre v2. As part of that code we also added a software flush of the link stack (return address stack), which protects against Spectre-RSB between user processes. That is all correct for CPUs that activate that mitigation, which is currently Power9 Nimbus DD2.3. What I got wrong is that on older CPUs, where firmware has disabled the count cache, we also need to flush the link stack on context switch. To fix it we create a new feature bit which is not set by firmware, which tells us we need to flush the link stack. We set that when firmware tells us that either of the existing Spectre v2 mitigations are enabled. Then we adjust the patching code so that if we see that feature bit we enable the link stack flush. If we're also told to flush the count cache in software then we fall through and do that also. On the older CPUs we don't need to do do the software count cache flush, firmware has disabled it, so in that case we patch in an early return after the link stack flush. The naming of some of the functions is awkward after this patch, because they're called "count cache" but they also do link stack. But we'll fix that up in a later commit to ease backporting. This is the fix for CVE-2019-18660. Reported-by: Anthony Steinhauser <asteinhauser@google.com> Fixes: ee13cb249fab ("powerpc/64s: Add support for software count cache flush") Cc: stable@vger.kernel.org # v4.4+ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-29powerpc/64s: support nospectre_v2 cmdline optionChristopher M. Riedl1-3/+16
commit d8f0e0b073e1ec52a05f0c2a56318b47387d2f10 upstream. Add support for disabling the kernel implemented spectre v2 mitigation (count cache flush on context switch) via the nospectre_v2 and mitigations=off cmdline options. Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Christopher M. Riedl <cmr@informatik.wtf> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190524024647.381-1-cmr@informatik.wtf Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-11powerpc/mm: Fixup tlbie vs mtpidr/mtlpidr ordering issue on POWER9Aneesh Kumar K.V1-0/+2
commit 047e6575aec71d75b765c22111820c4776cd1c43 upstream. On POWER9, under some circumstances, a broadcast TLB invalidation will fail to invalidate the ERAT cache on some threads when there are parallel mtpidr/mtlpidr happening on other threads of the same core. This can cause stores to continue to go to a page after it's unmapped. The workaround is to force an ERAT flush using PID=0 or LPID=0 tlbie flush. This additional TLB flush will cause the ERAT cache invalidation. Since we are using PID=0 or LPID=0, we don't get filtered out by the TLB snoop filtering logic. We need to still follow this up with another tlbie to take care of store vs tlbie ordering issue explained in commit: a5d4b5891c2f ("powerpc/mm: Fixup tlbie vs store ordering issue on POWER9"). The presence of ERAT cache implies we can still get new stores and they may miss store queue marking flush. Cc: stable@vger.kernel.org Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190924035254.24612-3-aneesh.kumar@linux.ibm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-11powerpc/book3s64/radix: Rename CPU_FTR_P9_TLBIE_BUG feature flagAneesh Kumar K.V1-3/+3
commit 09ce98cacd51fcd0fa0af2f79d1e1d3192f4cbb0 upstream. Rename the #define to indicate this is related to store vs tlbie ordering issue. In the next patch, we will be adding another feature flag that is used to handles ERAT flush vs tlbie ordering issue. Fixes: a5d4b5891c2f ("powerpc/mm: Fixup tlbie vs store ordering issue on POWER9") Cc: stable@vger.kernel.org # v4.16+ Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190924035254.24612-2-aneesh.kumar@linux.ibm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-11powerpc/book3s64/mm: Don't do tlbie fixup for some hardware revisionsAneesh Kumar K.V1-2/+28
commit 677733e296b5c7a37c47da391fc70a43dc40bd67 upstream. The store ordering vs tlbie issue mentioned in commit a5d4b5891c2f ("powerpc/mm: Fixup tlbie vs store ordering issue on POWER9") is fixed for Nimbus 2.3 and Cumulus 1.3 revisions. We don't need to apply the fixup if we are running on them We can only do this on PowerNV. On pseries guest with KVM we still don't support redoing the feature fixup after migration. So we should be enabling all the workarounds needed, because whe can possibly migrate between DD 2.3 and DD 2.2 Fixes: a5d4b5891c2f ("powerpc/mm: Fixup tlbie vs store ordering issue on POWER9") Cc: stable@vger.kernel.org # v4.16+ Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190924035254.24612-1-aneesh.kumar@linux.ibm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-11powerpc/32s: Fix boot failure with DEBUG_PAGEALLOC without KASAN.Christophe Leroy1-0/+2
commit 9d6d712fbf7766f21c838940eebcd7b4d476c5e6 upstream. When KASAN is selected, the definitive hash table has to be set up later, but there is already an early temporary one. When KASAN is not selected, there is no early hash table, so the setup of the definitive hash table cannot be delayed. Fixes: 72f208c6a8f7 ("powerpc/32s: move hash code patching out of MMU_init_hw()") Cc: stable@vger.kernel.org # v5.2+ Reported-by: Jonathan Neuschafer <j.neuschaefer@gmx.net> Tested-by: Jonathan Neuschafer <j.neuschaefer@gmx.net> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/b7860c5e1e784d6b96ba67edf47dd6cbc2e78ab6.1565776892.git.christophe.leroy@c-s.fr Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-11powerpc/603: Fix handling of the DIRTY flagChristophe Leroy1-2/+2
commit 415480dce2ef03bb8335deebd2f402f475443ce0 upstream. If a page is already mapped RW without the DIRTY flag, the DIRTY flag is never set and a TLB store miss exception is taken forever. This is easily reproduced with the following app: void main(void) { volatile char *ptr = mmap(0, 4096, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0); *ptr = *ptr; } When DIRTY flag is not set, bail out of TLB miss handler and take a minor page fault which will set the DIRTY flag. Fixes: f8b58c64eaef ("powerpc/603: let's handle PAGE_DIRTY directly") Cc: stable@vger.kernel.org # v5.1+ Reported-by: Doug Crawford <doug.crawford@intelight-its.com> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/80432f71194d7ee75b2f5043ecf1501cf1cca1f3.1566196646.git.christophe.leroy@c-s.fr Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-11powerpc/mce: Schedule work from irq_workSantosh Sivaraj1-1/+10
commit b5bda6263cad9a927e1a4edb7493d542da0c1410 upstream. schedule_work() cannot be called from MCE exception context as MCE can interrupt even in interrupt disabled context. Fixes: 733e4a4c4467 ("powerpc/mce: hookup memory_failure for UE errors") Cc: stable@vger.kernel.org # v4.15+ Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Santosh Sivaraj <santosh@fossix.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190820081352.8641-2-santosh@fossix.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-11powerpc/mce: Fix MCE handling for huge pagesBalbir Singh1-6/+13
commit 99ead78afd1128bfcebe7f88f3b102fb2da09aee upstream. The current code would fail on huge pages addresses, since the shift would be incorrect. Use the correct page shift value returned by __find_linux_pte() to get the correct physical address. The code is more generic and can handle both regular and compound pages. Fixes: ba41e1e1ccb9 ("powerpc/mce: Hookup derror (load/store) UE errors") Signed-off-by: Balbir Singh <bsingharora@gmail.com> [arbab@linux.ibm.com: Fixup pseries_do_memory_failure()] Signed-off-by: Reza Arbab <arbab@linux.ibm.com> Tested-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Santosh Sivaraj <santosh@fossix.org> Cc: stable@vger.kernel.org # v4.15+ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190820081352.8641-3-santosh@fossix.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-07powerpc: dump kernel log before carrying out fadump or kdumpGanesh Goudar1-0/+1
[ Upstream commit e7ca44ed3ba77fc26cf32650bb71584896662474 ] Since commit 4388c9b3a6ee ("powerpc: Do not send system reset request through the oops path"), pstore dmesg file is not updated when dump is triggered from HMC. This commit modified system reset (sreset) handler to invoke fadump or kdump (if configured), without pushing dmesg to pstore. This leaves pstore to have old dmesg data which won't be much of a help if kdump fails to capture the dump. This patch fixes that by calling kmsg_dump() before heading to fadump ot kdump. Fixes: 4388c9b3a6ee ("powerpc: Do not send system reset request through the oops path") Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190904075949.15607-1-ganeshgr@linux.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-07powerpc/eeh: Clean up EEH PEs after recovery finishesOliver O'Halloran3-3/+64
[ Upstream commit 799abe283e5103d48e079149579b4f167c95ea0e ] When the last device in an eeh_pe is removed the eeh_pe structure itself (and any empty parents) are freed since they are no longer needed. This results in a crash when a hotplug driver is involved since the following may occur: 1. Device is suprise removed. 2. Driver performs an MMIO, which fails and queues and eeh_event. 3. Hotplug driver receives a hotplug interrupt and removes any pci_devs that were under the slot. 4. pci_dev is torn down and the eeh_pe is freed. 5. The EEH event handler thread processes the eeh_event and crashes since the eeh_pe pointer in the eeh_event structure is no longer valid. Crashing is generally considered poor form. Instead of doing that use the fact PEs are marked as EEH_PE_INVALID to keep them around until the end of the recovery cycle, at which point we can safely prune any empty PEs. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190903101605.2890-2-oohall@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-07powerpc/64s/exception: machine check use correct cfar for late handlerNicholas Piggin1-0/+4
[ Upstream commit 0b66370c61fcf5fcc1d6901013e110284da6e2bb ] Bare metal machine checks run an "early" handler in real mode before running the main handler which reports the event. The main handler runs exactly as a normal interrupt handler, after the "windup" which sets registers back as they were at interrupt entry. CFAR does not get restored by the windup code, so that will be wrong when the handler is run. Restore the CFAR to the saved value before running the late handler. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190802105709.27696-8-npiggin@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-07powerpc/eeh: Clear stale EEH_DEV_NO_HANDLER flagSam Bobroff1-1/+10
[ Upstream commit aa06e3d60e245284d1e55497eb3108828092818d ] The EEH_DEV_NO_HANDLER flag is used by the EEH system to prevent the use of driver callbacks in drivers that have been bound part way through the recovery process. This is necessary to prevent later stage handlers from being called when the earlier stage handlers haven't, which can be confusing for drivers. However, the flag is set for all devices that are added after boot time and only cleared at the end of the EEH recovery process. This results in hot plugged devices erroneously having the flag set during the first recovery after they are added (causing their driver's handlers to be incorrectly ignored). To remedy this, clear the flag at the beginning of recovery processing. The flag is still cleared at the end of recovery processing, although it is no longer really necessary. Also clear the flag during eeh_handle_special_event(), for the same reasons. Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/b8ca5629d27de74c957d4f4b250177d1b6fc4bbd.1565930772.git.sbobroff@linux.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-07powerpc/rtas: use device model APIs and serialization during LPMNathan Lynch1-3/+8
[ Upstream commit a6717c01ddc259f6f73364779df058e2c67309f8 ] The LPAR migration implementation and userspace-initiated cpu hotplug can interleave their executions like so: 1. Set cpu 7 offline via sysfs. 2. Begin a partition migration, whose implementation requires the OS to ensure all present cpus are online; cpu 7 is onlined: rtas_ibm_suspend_me -> rtas_online_cpus_mask -> cpu_up This sets cpu 7 online in all respects except for the cpu's corresponding struct device; dev->offline remains true. 3. Set cpu 7 online via sysfs. _cpu_up() determines that cpu 7 is already online and returns success. The driver core (device_online) sets dev->offline = false. 4. The migration completes and restores cpu 7 to offline state: rtas_ibm_suspend_me -> rtas_offline_cpus_mask -> cpu_down This leaves cpu7 in a state where the driver core considers the cpu device online, but in all other respects it is offline and unused. Attempts to online the cpu via sysfs appear to succeed but the driver core actually does not pass the request to the lower-level cpuhp support code. This makes the cpu unusable until the cpu device is manually set offline and then online again via sysfs. Instead of directly calling cpu_up/cpu_down, the migration code should use the higher-level device core APIs to maintain consistent state and serialize operations. Fixes: 120496ac2d2d ("powerpc: Bring all threads online prior to migration/hibernation") Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190802192926.19277-2-nathanl@linux.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-09-06Merge tag 'powerpc-5.3-5' of ↵Linus Torvalds1-17/+4
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "One fix for a boot hang on some Freescale machines when PREEMPT is enabled. Two CVE fixes for bugs in our handling of FP registers and transactional memory, both of which can result in corrupted FP state, or FP state leaking between processes. Thanks to: Chris Packham, Christophe Leroy, Gustavo Romero, Michael Neuling" * tag 'powerpc-5.3-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/tm: Fix restoring FP/VMX facility incorrectly on interrupts powerpc/tm: Fix FP/VMX unavailable exceptions inside a transaction powerpc/64e: Drop stale call to smp_processor_id() which hangs SMP startup
2019-09-04powerpc/tm: Fix restoring FP/VMX facility incorrectly on interruptsGustavo Romero1-16/+2
When in userspace and MSR FP=0 the hardware FP state is unrelated to the current process. This is extended for transactions where if tbegin is run with FP=0, the hardware checkpoint FP state will also be unrelated to the current process. Due to this, we need to ensure this hardware checkpoint is updated with the correct state before we enable FP for this process. Unfortunately we get this wrong when returning to a process from a hardware interrupt. A process that starts a transaction with FP=0 can take an interrupt. When the kernel returns back to that process, we change to FP=1 but with hardware checkpoint FP state not updated. If this transaction is then rolled back, the FP registers now contain the wrong state. The process looks like this: Userspace: Kernel Start userspace with MSR FP=0 TM=1 < ----- ... tbegin bne Hardware interrupt ---- > <do_IRQ...> .... ret_from_except restore_math() /* sees FP=0 */ restore_fp() tm_active_with_fp() /* sees FP=1 (Incorrect) */ load_fp_state() FP = 0 -> 1 < ----- Return to userspace with MSR TM=1 FP=1 with junk in the FP TM checkpoint TM rollback reads FP junk When returning from the hardware exception, tm_active_with_fp() is incorrectly making restore_fp() call load_fp_state() which is setting FP=1. The fix is to remove tm_active_with_fp(). tm_active_with_fp() is attempting to handle the case where FP state has been changed inside a transaction. In this case the checkpointed and transactional FP state is different and hence we must restore the FP state (ie. we can't do lazy FP restore inside a transaction that's used FP). It's safe to remove tm_active_with_fp() as this case is handled by restore_tm_state(). restore_tm_state() detects if FP has been using inside a transaction and will set load_fp and call restore_math() to ensure the FP state (checkpoint and transaction) is restored. This is a data integrity problem for the current process as the FP registers are corrupted. It's also a security problem as the FP registers from one process may be leaked to another. Similarly for VMX. A simple testcase to replicate this will be posted to tools/testing/selftests/powerpc/tm/tm-poison.c This fixes CVE-2019-15031. Fixes: a7771176b439 ("powerpc: Don't enable FP/Altivec if not checkpointed") Cc: stable@vger.kernel.org # 4.15+ Signed-off-by: Gustavo Romero <gromero@linux.ibm.com> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190904045529.23002-2-gromero@linux.vnet.ibm.com
2019-09-04powerpc/tm: Fix FP/VMX unavailable exceptions inside a transactionGustavo Romero1-1/+2
When we take an FP unavailable exception in a transaction we have to account for the hardware FP TM checkpointed registers being incorrect. In this case for this process we know the current and checkpointed FP registers must be the same (since FP wasn't used inside the transaction) hence in the thread_struct we copy the current FP registers to the checkpointed ones. This copy is done in tm_reclaim_thread(). We use thread->ckpt_regs.msr to determine if FP was on when in userspace. thread->ckpt_regs.msr represents the state of the MSR when exiting userspace. This is setup by check_if_tm_restore_required(). Unfortunatley there is an optimisation in giveup_all() which returns early if tsk->thread.regs->msr (via local variable `usermsr`) has FP=VEC=VSX=SPE=0. This optimisation means that check_if_tm_restore_required() is not called and hence thread->ckpt_regs.msr is not updated and will contain an old value. This can happen if due to load_fp=255 we start a userspace process with MSR FP=1 and then we are context switched out. In this case thread->ckpt_regs.msr will contain FP=1. If that same process is then context switched in and load_fp overflows, MSR will have FP=0. If that process now enters a transaction and does an FP instruction, the FP unavailable will not update thread->ckpt_regs.msr (the bug) and MSR FP=1 will be retained in thread->ckpt_regs.msr. tm_reclaim_thread() will then not perform the required memcpy and the checkpointed FP regs in the thread struct will contain the wrong values. The code path for this happening is: Userspace: Kernel Start userspace with MSR FP/VEC/VSX/SPE=0 TM=1 < ----- ... tbegin bne fp instruction FP unavailable ---- > fp_unavailable_tm() tm_reclaim_current() tm_reclaim_thread() giveup_all() return early since FP/VMX/VSX=0 /* ckpt MSR not updated (Incorrect) */ tm_reclaim() /* thread_struct ckpt FP regs contain junk (OK) */ /* Sees ckpt MSR FP=1 (Incorrect) */ no memcpy() performed /* thread_struct ckpt FP regs not fixed (Incorrect) */ tm_recheckpoint() /* Put junk in hardware checkpoint FP regs */ .... < ----- Return to userspace with MSR TM=1 FP=1 with junk in the FP TM checkpoint TM rollback reads FP junk This is a data integrity problem for the current process as the FP registers are corrupted. It's also a security problem as the FP registers from one process may be leaked to another. This patch moves up check_if_tm_restore_required() in giveup_all() to ensure thread->ckpt_regs.msr is updated correctly. A simple testcase to replicate this will be posted to tools/testing/selftests/powerpc/tm/tm-poison.c Similarly for VMX. This fixes CVE-2019-15030. Fixes: f48e91e87e67 ("powerpc/tm: Fix FP and VMX register corruption") Cc: stable@vger.kernel.org # 4.12+ Signed-off-by: Gustavo Romero <gromero@linux.vnet.ibm.com> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190904045529.23002-1-gromero@linux.vnet.ibm.com
2019-08-10dma-mapping: fix page attributes for dma_mmap_*Christoph Hellwig2-19/+1
All the way back to introducing dma_common_mmap we've defaulted to mark the pages as uncached. But this is wrong for DMA coherent devices. Later on DMA_ATTR_WRITE_COMBINE also got incorrect treatment as that flag is only treated special on the alloc side for non-coherent devices. Introduce a new dma_pgprot helper that deals with the check for coherent devices so that only the remapping cases ever reach arch_dma_mmap_pgprot and we thus ensure no aliasing of page attributes happens, which makes the powerpc version of arch_dma_mmap_pgprot obsolete and simplifies the remaining ones. Note that this means arch_dma_mmap_pgprot is a bit misnamed now, but we'll phase it out soon. Fixes: 64ccc9c033c6 ("common: dma-mapping: add support for generic dma_mmap_* calls") Reported-by: Shawn Anastasio <shawn@anastas.io> Reported-by: Gavin Li <git@thegavinli.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Catalin Marinas <catalin.marinas@arm.com> # arm64
2019-07-30powerpc/spe: Mark expected switch fall-throughsMichael Ellerman1-0/+4
Mark switch cases where we are expecting to fall through. Fixes errors such as below, seen with mpc85xx_defconfig: arch/powerpc/kernel/align.c: In function 'emulate_spe': arch/powerpc/kernel/align.c:178:8: error: this statement may fall through ret |= __get_user_inatomic(temp.v[3], p++); ^~ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190730141917.21817-1-mpe@ellerman.id.au
2019-07-29powerpc: Wire up clone3 syscallMichael Ellerman3-1/+14
Wire up the new clone3 syscall added in commit 7f192e3cd316 ("fork: add clone3"). This requires a ppc_clone3 wrapper, in order to save the non-volatile GPRs before calling into the generic syscall code. Otherwise we hit the BUG_ON in CHECK_FULL_REGS in copy_thread(). Lightly tested using Christian's test code on a Power8 LE VM. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Christian Brauner <christian@brauner.io> Link: https://lore.kernel.org/r/20190724140259.23554-1-mpe@ellerman.id.au
2019-07-26Merge tag 'docs-5.3-1' of git://git.lwn.net/linuxLinus Torvalds1-1/+1
Pull documentation fixes from Jonathan Corbet: "This is mostly a set of follow-on fixes from Mauro fixing various fallout from the massive RST conversion; a few other small fixes as well" * tag 'docs-5.3-1' of git://git.lwn.net/linux: (21 commits) docs: phy: Drop duplicate 'be made' doc:it_IT: translations in process/ docs/vm: transhuge: fix typo in madvise reference doc:it_IT: rephrase statement doc:it_IT: align translation to mainline docs: load_config.py: ensure subdirs end with "/" docs: virtual: add it to the documentation body docs: remove extra conf.py files docs: load_config.py: avoid needing a conf.py just due to LaTeX docs scripts/sphinx-pre-install: seek for Noto CJK fonts for pdf output scripts/sphinx-pre-install: cleanup Gentoo checks scripts/sphinx-pre-install: fix latexmk dependencies scripts/sphinx-pre-install: don't use LaTeX with CentOS 7 scripts/sphinx-pre-install: fix script for RHEL/CentOS docs: conf.py: only use CJK if the font is available docs: conf.py: add CJK package needed by translations docs: pdf: add all Documentation/*/index.rst to PDF output docs: fix broken doc references due to renames docs: power: add it to to the main documentation index docs: powerpc: convert docs to ReST and rename to *.rst ...
2019-07-24Merge tag 'powerpc-5.3-2' of ↵Linus Torvalds4-1/+27
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "An assortment of non-regression fixes that have accumulated since the start of the merge window. - A fix for a user triggerable oops on machines where transactional memory is disabled, eg. Power9 bare metal, Power8 with TM disabled on the command line, or all Power7 or earlier machines. - Three fixes for handling of PMU and power saving registers when running nested KVM on Power9. - Two fixes for bugs found while stress testing the XIVE interrupt controller code, also on Power9. - A fix to allow guests to boot under Qemu/KVM on Power9 using the the Hash MMU with >= 1TB of memory. - Two fixes for bugs in the recent DMA cleanup, one of which could lead to checkstops. - And finally three fixes for the PAPR SCM nvdimm driver. Thanks to: Alexey Kardashevskiy, Andrea Arcangeli, Cédric Le Goater, Christoph Hellwig, David Gibson, Gautham R. Shenoy, Michael Neuling, Oliver O'Halloran, Satheesh Rajendran, Shawn Anastasio, Suraj Jitindar Singh, Vaibhav Jain" * tag 'powerpc-5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/papr_scm: Force a scm-unbind if initial scm-bind fails powerpc/papr_scm: Update drc_pmem_unbind() to use H_SCM_UNBIND_ALL powerpc/pseries: Update SCM hcall op-codes in hvcall.h powerpc/tm: Fix oops on sigreturn on systems without TM powerpc/dma: Fix invalid DMA mmap behavior KVM: PPC: Book3S HV: XIVE: fix rollback when kvmppc_xive_create fails powerpc/xive: Fix loop exit-condition in xive_find_target_in_mask() powerpc: fix off by one in max_zone_pfn initialization for ZONE_DMA KVM: PPC: Book3S HV: Save and restore guest visible PSSCR bits on pseries powerpc/pmu: Set pmcregs_in_use in paca when running as LPAR KVM: PPC: Book3S HV: Always save guest pmu for guest capable of nesting powerpc/mm: Limit rma_size to 1TB when running without HV mode
2019-07-22powerpc/tm: Fix oops on sigreturn on systems without TMMichael Neuling2-0/+8
On systems like P9 powernv where we have no TM (or P8 booted with ppc_tm=off), userspace can construct a signal context which still has the MSR TS bits set. The kernel tries to restore this context which results in the following crash: Unexpected TM Bad Thing exception at c0000000000022fc (msr 0x8000000102a03031) tm_scratch=800000020280f033 Oops: Unrecoverable exception, sig: 6 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries Modules linked in: CPU: 0 PID: 1636 Comm: sigfuz Not tainted 5.2.0-11043-g0a8ad0ffa4 #69 NIP: c0000000000022fc LR: 00007fffb2d67e48 CTR: 0000000000000000 REGS: c00000003fffbd70 TRAP: 0700 Not tainted (5.2.0-11045-g7142b497d8) MSR: 8000000102a03031 <SF,VEC,VSX,FP,ME,IR,DR,LE,TM[E]> CR: 42004242 XER: 00000000 CFAR: c0000000000022e0 IRQMASK: 0 GPR00: 0000000000000072 00007fffb2b6e560 00007fffb2d87f00 0000000000000669 GPR04: 00007fffb2b6e728 0000000000000000 0000000000000000 00007fffb2b6f2a8 GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR12: 0000000000000000 00007fffb2b76900 0000000000000000 0000000000000000 GPR16: 00007fffb2370000 00007fffb2d84390 00007fffea3a15ac 000001000a250420 GPR20: 00007fffb2b6f260 0000000010001770 0000000000000000 0000000000000000 GPR24: 00007fffb2d843a0 00007fffea3a14a0 0000000000010000 0000000000800000 GPR28: 00007fffea3a14d8 00000000003d0f00 0000000000000000 00007fffb2b6e728 NIP [c0000000000022fc] rfi_flush_fallback+0x7c/0x80 LR [00007fffb2d67e48] 0x7fffb2d67e48 Call Trace: Instruction dump: e96a0220 e96a02a8 e96a0330 e96a03b8 394a0400 4200ffdc 7d2903a6 e92d0c00 e94d0c08 e96d0c10 e82d0c18 7db242a6 <4c000024> 7db243a6 7db142a6 f82d0c18 The problem is the signal code assumes TM is enabled when CONFIG_PPC_TRANSACTIONAL_MEM is enabled. This may not be the case as with P9 powernv or if `ppc_tm=off` is used on P8. This means any local user can crash the system. Fix the problem by returning a bad stack frame to the user if they try to set the MSR TS bits with sigreturn() on systems where TM is not supported. Found with sigfuz kernel selftest on P9. This fixes CVE-2019-13648. Fixes: 2b0a576d15e0 ("powerpc: Add new transactional memory state to the signal context") Cc: stable@vger.kernel.org # v3.9 Reported-by: Praveen Pandey <Praveen.Pandey@in.ibm.com> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190719050502.405-1-mikey@neuling.org
2019-07-19powerpc/dma: Fix invalid DMA mmap behaviorShawn Anastasio2-1/+19
The refactor of powerpc DMA functions in commit 6666cc17d780 ("powerpc/dma: remove dma_nommu_mmap_coherent") incorrectly changes the way DMA mappings are handled on powerpc. Since this change, all mapped pages are marked as cache-inhibited through the default implementation of arch_dma_mmap_pgprot. This differs from the previous behavior of only marking pages in noncoherent mappings as cache-inhibited and has resulted in sporadic system crashes in certain hardware configurations and workloads (see Bugzilla). This commit restores the previous correct behavior by providing an implementation of arch_dma_mmap_pgprot that only marks pages in noncoherent mappings as cache-inhibited. As this behavior should be universal for all powerpc platforms a new file, dma-generic.c, was created to store it. Fixes: 6666cc17d780 ("powerpc/dma: remove dma_nommu_mmap_coherent") # NOTE: fixes commit 6666cc17d780 released in v5.1. # Consider a stable tag: # Cc: stable@vger.kernel.org # v5.1+ # NOTE: fixes commit 6666cc17d780 released in v5.1. # Consider a stable tag: # Cc: stable@vger.kernel.org # v5.1+ Cc: stable@vger.kernel.org # v5.1+ Signed-off-by: Shawn Anastasio <shawn@anastas.io> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190717235437.12908-1-shawn@anastas.io
2019-07-17docs: powerpc: convert docs to ReST and rename to *.rstMauro Carvalho Chehab1-1/+1
Convert docs to ReST and add them to the arch-specific book. The conversion here was trivial, as almost every file there was already using an elegant format close to ReST standard. The changes were mostly to mark literal blocks and add a few missing section title identifiers. One note with regards to "--": on Sphinx, this can't be used to identify a list, as it will format it badly. This can be used, however, to identify a long hyphen - and "---" is an even longer one. At its new index.rst, let's add a :orphan: while this is not linked to the main index.rst file, in order to avoid build warnings. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> # cxl
2019-07-16Merge tag 'for-linus-20190715' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull pidfd and clone3 fixes from Christian Brauner: "This contains a bugfix for CLONE_PIDFD when used with the legacy clone syscall, two fixes to ensure that syscall numbering and clone3 entrypoint implementations will stay consistent, and an update for the maintainers file: - The addition of clone3 broke CLONE_PIDFD for legacy clone on all architectures that use do_fork() directly instead of calling the clone syscall itself. (Fwiw, cleaning do_fork() up is on my todo.) The reason this happened was that during conversion of _do_fork() to use struct kernel_clone_args we missed that do_fork() is called directly by various architectures. This is fixed by making sure that the pidfd argument in struct kernel_clone_args is correctly initialized with the parent_tidptr argument passed down from do_fork(). Additionally, do_fork() missed a check to make CLONE_PIDFD and CLONE_PARENT_SETTID mutually exclusive just a clone() does. This is now fixed too. - When clone3() was introduced we skipped architectures that require special handling for fork-like syscalls. Their syscall tables did not contain any mention of clone3(). To make sure that Arnd's work to make syscall numbers on all architectures identical (minus alpha) was not for naught we are placing a comment in all syscall tables that do not yet implement clone3(). The comment makes it clear that 435 is reserved for clone3 and should not be used. - Also, this contains a patch to make the clone3() syscall definition in asm-generic/unist.h conditional on __ARCH_WANT_SYS_CLONE3. This lets us catch new architectures that implicitly make use of clone3 without setting __ARCH_WANT_SYS_CLONE3 which is a good indicator that they did not check whether it needs special treatment or not. - Finally, this contains a patch to add me as maintainer for pidfd stuff so people can start blaming me (more)" * tag 'for-linus-20190715' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: MAINTAINERS: add new entry for pidfd api unistd: protect clone3 via __ARCH_WANT_SYS_CLONE3 arch: mark syscall number 435 reserved for clone3 clone: fix CLONE_PIDFD support
2019-07-15arch: mark syscall number 435 reserved for clone3Christian Brauner1-0/+1
A while ago Arnd made it possible to give new system calls the same syscall number on all architectures (except alpha). To not break this nice new feature let's mark 435 for clone3 as reserved on all architectures that do not yet implement it. Even if an architecture does not plan to implement it this ensures that new system calls coming after clone3 will have the same number on all architectures. Signed-off-by: Christian Brauner <christian@brauner.io> Cc: linux-arch@vger.kernel.org Cc: linux-alpha@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-ia64@vger.kernel.org Cc: linux-m68k@lists.linux-m68k.org Cc: linux-mips@vger.kernel.org Cc: linux-parisc@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-s390@vger.kernel.org Cc: linux-sh@vger.kernel.org Cc: sparclinux@vger.kernel.org Link: https://lore.kernel.org/r/20190714192205.27190-2-christian@brauner.io Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Christian Brauner <christian@brauner.io>
2019-07-14Merge tag 'powerpc-5.3-1' of ↵Linus Torvalds23-655/+1333
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: "Notable changes: - Removal of the NPU DMA code, used by the out-of-tree Nvidia driver, as well as some other functions only used by drivers that haven't (yet?) made it upstream. - A fix for a bug in our handling of hardware watchpoints (eg. perf record -e mem: ...) which could lead to register corruption and kernel crashes. - Enable HAVE_ARCH_HUGE_VMAP, which allows us to use large pages for vmalloc when using the Radix MMU. - A large but incremental rewrite of our exception handling code to use gas macros rather than multiple levels of nested CPP macros. And the usual small fixes, cleanups and improvements. Thanks to: Alastair D'Silva, Alexey Kardashevskiy, Andreas Schwab, Aneesh Kumar K.V, Anju T Sudhakar, Anton Blanchard, Arnd Bergmann, Athira Rajeev, Cédric Le Goater, Christian Lamparter, Christophe Leroy, Christophe Lombard, Christoph Hellwig, Daniel Axtens, Denis Efremov, Enrico Weigelt, Frederic Barrat, Gautham R. Shenoy, Geert Uytterhoeven, Geliang Tang, Gen Zhang, Greg Kroah-Hartman, Greg Kurz, Gustavo Romero, Krzysztof Kozlowski, Madhavan Srinivasan, Masahiro Yamada, Mathieu Malaterre, Michael Neuling, Nathan Lynch, Naveen N. Rao, Nicholas Piggin, Nishad Kamdar, Oliver O'Halloran, Qian Cai, Ravi Bangoria, Sachin Sant, Sam Bobroff, Satheesh Rajendran, Segher Boessenkool, Shaokun Zhang, Shawn Anastasio, Stewart Smith, Suraj Jitindar Singh, Thiago Jung Bauermann, YueHaibing" * tag 'powerpc-5.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (163 commits) powerpc/powernv/idle: Fix restore of SPRN_LDBAR for POWER9 stop state. powerpc/eeh: Handle hugepages in ioremap space ocxl: Update for AFU descriptor template version 1.1 powerpc/boot: pass CONFIG options in a simpler and more robust way powerpc/boot: add {get, put}_unaligned_be32 to xz_config.h powerpc/irq: Don't WARN continuously in arch_local_irq_restore() powerpc/module64: Use symbolic instructions names. powerpc/module32: Use symbolic instructions names. powerpc: Move PPC_HA() PPC_HI() and PPC_LO() to ppc-opcode.h powerpc/module64: Fix comment in R_PPC64_ENTRY handling powerpc/boot: Add lzo support for uImage powerpc/boot: Add lzma support for uImage powerpc/boot: don't force gzipped uImage powerpc/8xx: Add microcode patch to move SMC parameter RAM. powerpc/8xx: Use IO accessors in microcode programming. powerpc/8xx: replace #ifdefs by IS_ENABLED() in microcode.c powerpc/8xx: refactor programming of microcode CPM params. powerpc/8xx: refactor printing of microcode patch name. powerpc/8xx: Refactor microcode write powerpc/8xx: refactor writing of CPM microcode arrays ...
2019-07-11powerpc/eeh: Handle hugepages in ioremap spaceOliver O'Halloran1-3/+12
In commit 4a7b06c157a2 ("powerpc/eeh: Handle hugepages in ioremap space") support for using hugepages in the vmalloc and ioremap areas was enabled for radix. Unfortunately this broke EEH MMIO error checking. Detection works by inserting a hook which checks the results of the ioreadXX() set of functions. When a read returns a 0xFFs response we need to check for an error which we do by mapping the (virtual) MMIO address back to a physical address, then mapping physical address to a PCI device via an interval tree. When translating virt -> phys we currently assume the ioremap space is only populated by PAGE_SIZE mappings. If a hugepage mapping is found we emit a WARN_ON(), but otherwise handles the check as though a normal page was found. In pathalogical cases such as copying a buffer containing a lot of 0xFFs from BAR memory this can result in the system not booting because it's too busy printing WARN_ON()s. There's no real reason to assume huge pages can't be present and we're prefectly capable of handling them, so do that. Fixes: 4a7b06c157a2 ("powerpc/eeh: Handle hugepages in ioremap space") Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190710150517.27114-1-oohall@gmail.com
2019-07-11Merge tag 'pidfd-updates-v5.3' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull pidfd updates from Christian Brauner: "This adds two main features. - First, it adds polling support for pidfds. This allows process managers to know when a (non-parent) process dies in a race-free way. The notification mechanism used follows the same logic that is currently used when the parent of a task is notified of a child's death. With this patchset it is possible to put pidfds in an {e}poll loop and get reliable notifications for process (i.e. thread-group) exit. - The second feature compliments the first one by making it possible to retrieve pollable pidfds for processes that were not created using CLONE_PIDFD. A lot of processes get created with traditional PID-based calls such as fork() or clone() (without CLONE_PIDFD). For these processes a caller can currently not create a pollable pidfd. This is a problem for Android's low memory killer (LMK) and service managers such as systemd. Both patchsets are accompanied by selftests. It's perhaps worth noting that the work done so far and the work done in this branch for pidfd_open() and polling support do already see some adoption: - Android is in the process of backporting this work to all their LTS kernels [1] - Service managers make use of pidfd_send_signal but will need to wait until we enable waiting on pidfds for full adoption. - And projects I maintain make use of both pidfd_send_signal and CLONE_PIDFD [2] and will use polling support and pidfd_open() too" [1] https://android-review.googlesource.com/q/topic:%22pidfd+polling+support+4.9+backport%22 https://android-review.googlesource.com/q/topic:%22pidfd+polling+support+4.14+backport%22 https://android-review.googlesource.com/q/topic:%22pidfd+polling+support+4.19+backport%22 [2] https://github.com/lxc/lxc/blob/aab6e3eb73c343231cdde775db938994fc6f2803/src/lxc/start.c#L1753 * tag 'pidfd-updates-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: tests: add pidfd_open() tests arch: wire-up pidfd_open() pid: add pidfd_open() pidfd: add polling selftests pidfd: add polling support
2019-07-10powerpc/irq: Don't WARN continuously in arch_local_irq_restore()Michael Ellerman1-3/+3
When CONFIG_PPC_IRQ_SOFT_MASK_DEBUG is enabled (uncommon), we have a series of WARN_ON's in arch_local_irq_restore(). These are "should never happen" conditions, but if they do happen they can flood the console and render the system unusable. So switch them to WARN_ON_ONCE(). Fixes: e2b36d591720 ("powerpc/64: Don't trace code that runs with the soft irq mask unreconciled") Fixes: 9b81c0211c24 ("powerpc/64s: make PACA_IRQ_HARD_DIS track MSR[EE] closely") Fixes: 7c0482e3d055 ("powerpc/irq: Fix another case of lazy IRQ state getting out of sync") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190708061046.7075-1-mpe@ellerman.id.au
2019-07-09Merge tag 'pm-5.3-rc1' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "These update PCI and ACPI power management (improved handling of ACPI power resources and PCIe link delays, fixes related to corner cases, hibernation handling rework), fix and extend the operating performance points (OPP) framework, add new cpufreq drivers for Raspberry Pi and imx8m chips, update some other cpufreq drivers, clean up assorted pieces of PM code and documentation and update tools. Specifics: - Improve the handling of shared ACPI power resources in the PCI bus type layer (Mika Westerberg). - Make the PCI layer take link delays required by the PCIe spec into account as appropriate and avoid polling devices in D3cold for PME (Mika Westerberg). - Fix some corner case issues in ACPI device power management and in the PCI bus type layer, optimiza and clean up the handling of runtime-suspended PCI devices during system-wide transitions to sleep states (Rafael Wysocki). - Rework hibernation handling in the ACPI core and the PCI bus type to resume runtime-suspended devices before hibernation (which allows some functional problems to be avoided) and fix some ACPI power management issues related to hiberation (Rafael Wysocki). - Extend the operating performance points (OPP) framework to support a wider range of devices (Rajendra Nayak, Stehpen Boyd). - Fix issues related to genpd_virt_devs and issues with platforms using the set_opp() callback in the OPP framework (Viresh Kumar, Dmitry Osipenko). - Add new cpufreq driver for Raspberry Pi (Nicolas Saenz Julienne). - Add new cpufreq driver for imx8m and imx7d chips (Leonard Crestez). - Fix and clean up the pcc-cpufreq, brcmstb-avs-cpufreq, s5pv210, and armada-37xx cpufreq drivers (David Arcari, Florian Fainelli, Paweł Chmiel, YueHaibing). - Clean up and fix the cpufreq core (Viresh Kumar, Daniel Lezcano). - Fix minor issue in the ACPI system sleep support code and export one function from it (Lenny Szubowicz, Dexuan Cui). - Clean up assorted pieces of PM code and documentation (Kefeng Wang, Andy Shevchenko, Bart Van Assche, Greg Kroah-Hartman, Fuqian Huang, Geert Uytterhoeven, Mathieu Malaterre, Rafael Wysocki). - Update the pm-graph utility to v5.4 (Todd Brandt). - Fix and clean up the cpupower utility (Abhishek Goel, Nick Black)" * tag 'pm-5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (57 commits) ACPI: PM: Make acpi_sleep_state_supported() non-static PM: sleep: Drop dev_pm_skip_next_resume_phases() ACPI: PM: Unexport acpi_device_get_power() Documentation: ABI: power: Add missing newline at end of file ACPI: PM: Drop unused function and function header ACPI: PM: Introduce "poweroff" callbacks for ACPI PM domain and LPSS ACPI: PM: Simplify and fix PM domain hibernation callbacks PCI: PM: Simplify bus-level hibernation callbacks PM: ACPI/PCI: Resume all devices during hibernation cpufreq: Avoid calling cpufreq_verify_current_freq() from handle_update() cpufreq: Consolidate cpufreq_update_current_freq() and __cpufreq_get() kernel: power: swap: use kzalloc() instead of kmalloc() followed by memset() cpufreq: Don't skip frequency validation for has_target() drivers PCI: PM/ACPI: Refresh all stale power state data in pci_pm_complete() PCI / ACPI: Add _PR0 dependent devices ACPI / PM: Introduce concept of a _PR0 dependent device PCI / ACPI: Use cached ACPI device state to get PCI device power state ACPI: PM: Allow transitions to D0 to occur in special cases ACPI: PM: Avoid evaluating _PS3 on transitions from D3hot to D3cold cpufreq: Use has_target() instead of !setpolicy ...
2019-07-09Merge branch 'siginfo-linus' of ↵Linus Torvalds4-7/+7
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull force_sig() argument change from Eric Biederman: "A source of error over the years has been that force_sig has taken a task parameter when it is only safe to use force_sig with the current task. The force_sig function is built for delivering synchronous signals such as SIGSEGV where the userspace application caused a synchronous fault (such as a page fault) and the kernel responded with a signal. Because the name force_sig does not make this clear, and because the force_sig takes a task parameter the function force_sig has been abused for sending other kinds of signals over the years. Slowly those have been fixed when the oopses have been tracked down. This set of changes fixes the remaining abusers of force_sig and carefully rips out the task parameter from force_sig and friends making this kind of error almost impossible in the future" * 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (27 commits) signal/x86: Move tsk inside of CONFIG_MEMORY_FAILURE in do_sigbus signal: Remove the signal number and task parameters from force_sig_info signal: Factor force_sig_info_to_task out of force_sig_info signal: Generate the siginfo in force_sig signal: Move the computation of force into send_signal and correct it. signal: Properly set TRACE_SIGNAL_LOSE_INFO in __send_signal signal: Remove the task parameter from force_sig_fault signal: Use force_sig_fault_to_task for the two calls that don't deliver to current signal: Explicitly call force_sig_fault on current signal/unicore32: Remove tsk parameter from __do_user_fault signal/arm: Remove tsk parameter from __do_user_fault signal/arm: Remove tsk parameter from ptrace_break signal/nds32: Remove tsk parameter from send_sigtrap signal/riscv: Remove tsk parameter from do_trap signal/sh: Remove tsk parameter from force_sig_info_fault signal/um: Remove task parameter from send_sigtrap signal/x86: Remove task parameter from send_sigtrap signal: Remove task parameter from force_sig_mceerr signal: Remove task parameter from force_sig signal: Remove task parameter from force_sigsegv ...
2019-07-08Merge branch 'smp-hotplug-for-linus' of ↵Linus Torvalds1-2/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull SMP/hotplug updates from Thomas Gleixner: "A small set of updates for SMP and CPU hotplug: - Abort disabling secondary CPUs in the freezer when a wakeup is pending instead of evaluating it only after all CPUs have been offlined. - Remove the shared annotation for the strict per CPU cfd_data in the smp function call core code. - Remove the return values of smp_call_function() and on_each_cpu() as they are unconditionally 0. Fixup the few callers which actually bothered to check the return value" * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: smp: Remove smp_call_function() and on_each_cpu() return values smp: Do not mark call_function_data as shared cpu/hotplug: Abort disabling secondary CPUs if wakeup is pending cpu/hotplug: Fix notify_cpu_starting() reference in bringup_wait_for_ap()
2019-07-08Merge tag 'arm64-upstream' of ↵Linus Torvalds1-1/+0
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: - arm64 support for syscall emulation via PTRACE_SYSEMU{,_SINGLESTEP} - Wire up VM_FLUSH_RESET_PERMS for arm64, allowing the core code to manage the permissions of executable vmalloc regions more strictly - Slight performance improvement by keeping softirqs enabled while touching the FPSIMD/SVE state (kernel_neon_begin/end) - Expose a couple of ARMv8.5 features to user (HWCAP): CondM (new XAFLAG and AXFLAG instructions for floating point comparison flags manipulation) and FRINT (rounding floating point numbers to integers) - Re-instate ARM64_PSEUDO_NMI support which was previously marked as BROKEN due to some bugs (now fixed) - Improve parking of stopped CPUs and implement an arm64-specific panic_smp_self_stop() to avoid warning on not being able to stop secondary CPUs during panic - perf: enable the ARM Statistical Profiling Extensions (SPE) on ACPI platforms - perf: DDR performance monitor support for iMX8QXP - cache_line_size() can now be set from DT or ACPI/PPTT if provided to cope with a system cache info not exposed via the CPUID registers - Avoid warning on hardware cache line size greater than ARCH_DMA_MINALIGN if the system is fully coherent - arm64 do_page_fault() and hugetlb cleanups - Refactor set_pte_at() to avoid redundant READ_ONCE(*ptep) - Ignore ACPI 5.1 FADTs reported as 5.0 (infer from the 'arm_boot_flags' introduced in 5.1) - CONFIG_RANDOMIZE_BASE now enabled in defconfig - Allow the selection of ARM64_MODULE_PLTS, currently only done via RANDOMIZE_BASE (and an erratum workaround), allowing modules to spill over into the vmalloc area - Make ZONE_DMA32 configurable * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (54 commits) perf: arm_spe: Enable ACPI/Platform automatic module loading arm_pmu: acpi: spe: Add initial MADT/SPE probing ACPI/PPTT: Add function to return ACPI 6.3 Identical tokens ACPI/PPTT: Modify node flag detection to find last IDENTICAL x86/entry: Simplify _TIF_SYSCALL_EMU handling arm64: rename dump_instr as dump_kernel_instr arm64/mm: Drop [PTE|PMD]_TYPE_FAULT arm64: Implement panic_smp_self_stop() arm64: Improve parking of stopped CPUs arm64: Expose FRINT capabilities to userspace arm64: Expose ARMv8.5 CondM capability to userspace arm64: defconfig: enable CONFIG_RANDOMIZE_BASE arm64: ARM64_MODULES_PLTS must depend on MODULES arm64: bpf: do not allocate executable memory arm64/kprobes: set VM_FLUSH_RESET_PERMS on kprobe instruction pages arm64/mm: wire up CONFIG_ARCH_HAS_SET_DIRECT_MAP arm64: module: create module allocations without exec permissions arm64: Allow user selection of ARM64_MODULE_PLTS acpi/arm64: ignore 5.1 FADTs that are reported as 5.0 arm64: Allow selecting Pseudo-NMI again ...
2019-07-08Merge branch 'pm-sleep'Rafael J. Wysocki1-0/+1
* pm-sleep: PM: sleep: Drop dev_pm_skip_next_resume_phases() ACPI: PM: Drop unused function and function header ACPI: PM: Introduce "poweroff" callbacks for ACPI PM domain and LPSS ACPI: PM: Simplify and fix PM domain hibernation callbacks PCI: PM: Simplify bus-level hibernation callbacks PM: ACPI/PCI: Resume all devices during hibernation kernel: power: swap: use kzalloc() instead of kmalloc() followed by memset() PM: sleep: Update struct wakeup_source documentation drivers: base: power: remove wakeup_sources_stats_dentry variable PM: suspend: Rename pm_suspend_via_s2idle() PM: sleep: Show how long dpm_suspend_start() and dpm_suspend_end() take PM: hibernate: powerpc: Expose pfn_is_nosave() prototype
2019-07-05powerpc/module64: Use symbolic instructions names.Christophe Leroy1-18/+35
To increase readability/maintainability, replace hard coded instructions values by symbolic names. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> [mpe: Fix R_PPC64_ENTRY case, the addi reads from r2 not r12] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-05powerpc/module32: Use symbolic instructions names.Christophe Leroy1-8/+16
To increase readability/maintainability, replace hard coded instructions values by symbolic names. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-05powerpc: Move PPC_HA() PPC_HI() and PPC_LO() to ppc-opcode.hChristophe Leroy2-11/+0
PPC_HA() PPC_HI() and PPC_LO() macros are nice macros. Move them from module64.c to ppc-opcode.h in order to use them in other places. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> [mpe: Clean up formatting in new code, drop duplicates in ftrace.c] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-05powerpc/module64: Fix comment in R_PPC64_ENTRY handlingMichael Ellerman1-1/+1
The comment here is wrong, the addi reads from r2 not r12. The code is correct, 0x38420000 = addi r2,r2,0. Fixes: a61674bdfc7c ("powerpc/module: Handle R_PPC64_ENTRY relocations") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-04powerpc/64: reuse PPC32 static inline flush_dcache_range()Christophe Leroy1-29/+0
This patch drops the assembly PPC64 version of flush_dcache_range() and re-uses the PPC32 static inline version. With GCC 8.1, the following code is generated: void flush_test(unsigned long start, unsigned long stop) { flush_dcache_range(start, stop); } 0000000000000130 <.flush_test>: 130: 3d 22 00 00 addis r9,r2,0 132: R_PPC64_TOC16_HA .data+0x8 134: 81 09 00 00 lwz r8,0(r9) 136: R_PPC64_TOC16_LO .data+0x8 138: 3d 22 00 00 addis r9,r2,0 13a: R_PPC64_TOC16_HA .data+0xc 13c: 80 e9 00 00 lwz r7,0(r9) 13e: R_PPC64_TOC16_LO .data+0xc 140: 7d 48 00 d0 neg r10,r8 144: 7d 43 18 38 and r3,r10,r3 148: 7c 00 04 ac hwsync 14c: 4c 00 01 2c isync 150: 39 28 ff ff addi r9,r8,-1 154: 7c 89 22 14 add r4,r9,r4 158: 7c 83 20 50 subf r4,r3,r4 15c: 7c 89 3c 37 srd. r9,r4,r7 160: 41 82 00 1c beq 17c <.flush_test+0x4c> 164: 7d 29 03 a6 mtctr r9 168: 60 00 00 00 nop 16c: 60 00 00 00 nop 170: 7c 00 18 ac dcbf 0,r3 174: 7c 63 42 14 add r3,r3,r8 178: 42 00 ff f8 bdnz 170 <.flush_test+0x40> 17c: 7c 00 04 ac hwsync 180: 4c 00 01 2c isync 184: 4e 80 00 20 blr 188: 60 00 00 00 nop 18c: 60 00 00 00 nop Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-04powerpc/64: flush_inval_dcache_range() becomes flush_dcache_range()Christophe Leroy1-25/+2
On most arches having function flush_dcache_range(), including PPC32, this function does a writeback and invalidation of the cache bloc. On PPC64, flush_dcache_range() only does a writeback while flush_inval_dcache_range() does the invalidation in addition. In addition it looks like within arch/powerpc/, there are no PPC64 platforms using flush_dcache_range() This patch drops the existing 64 bits version of flush_dcache_range() and renames flush_inval_dcache_range() into flush_dcache_range(). Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-03powerpc/pci/of: Parse unassigned resourcesAlexey Kardashevskiy1-2/+10
The pseries platform uses the PCI_PROBE_DEVTREE method of PCI probing which reads "assigned-addresses" of every PCI device and initializes the device resources. However if the property is missing or zero sized, then there is no fallback of any kind and the PCI resources remain undiscovered, i.e. pdev->resource[] array remains empty. This adds a fallback which parses the "reg" property in pretty much same way except it marks resources as "unset" which later make Linux assign those resources proper addresses. This has an effect when: 1. a hypervisor failed to assign any resource for a device; 2. /chosen/linux,pci-probe-only=0 is in the DT so the system may try assigning a resource. Neither is likely to happen under PowerVM. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-03powerpc/pseries/dma: Allow SWIOTLBAlexey Kardashevskiy1-0/+36
The commit 8617a5c5bc00 ("powerpc/dma: handle iommu bypass in dma_iommu_ops") merged direct DMA ops into the IOMMU DMA ops allowing SWIOTLB as well but only for mapping; the unmapping and bouncing parts were left unmodified. This adds missing direct unmapping calls to .unmap_page() and .unmap_sg(). This adds missing sync callbacks and directs them to the direct DMA hooks. Fixes: 8617a5c5bc00 ("powerpc/dma: handle iommu bypass in dma_iommu_ops") Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-03powerpc: remove device_to_mask()Christoph Hellwig1-2/+2
Use the dma_get_mask() helper from dma-mapping.h instead, as they are functionally identical. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-03powerpc: Fix compile issue with force DAWRMichael Neuling4-89/+102
If you compile with KVM but without CONFIG_HAVE_HW_BREAKPOINT you fail at linking with: arch/powerpc/kvm/book3s_hv_rmhandlers.o:(.text+0x708): undefined reference to `dawr_force_enable' This was caused by commit c1fe190c0672 ("powerpc: Add force enable of DAWR on P9 option"). This moves a bunch of code around to fix this. It moves a lot of the DAWR code in a new file and creates a new CONFIG_PPC_DAWR to enable compiling it. Fixes: c1fe190c0672 ("powerpc: Add force enable of DAWR on P9 option") Signed-off-by: Michael Neuling <mikey@neuling.org> [mpe: Minor formatting in set_dawr()] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-03powerpc: silence a -Wcast-function-type warning in dawr_write_file_boolMathieu Malaterre1-1/+6
In commit c1fe190c0672 ("powerpc: Add force enable of DAWR on P9 option") the following piece of code was added: smp_call_function((smp_call_func_t)set_dawr, &null_brk, 0); Since GCC 8 this triggers the following warning about incompatible function types: arch/powerpc/kernel/hw_breakpoint.c:408:21: error: cast between incompatible function types from 'int (*)(struct arch_hw_breakpoint *)' to 'void (*)(void *)' [-Werror=cast-function-type] Since the warning is there for a reason, and should not be hidden behind a cast, provide an intermediate callback function to avoid the warning. Fixes: c1fe190c0672 ("powerpc: Add force enable of DAWR on P9 option") Suggested-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-03powerpc/64s: Rename PPC_INVALIDATE_ERAT to PPC_ISA_3_0_INVALIDATE_ERATNicholas Piggin1-2/+1
This makes it clear to the caller that it can only be used on POWER9 and later CPUs. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Use "ISA_3_0" rather than "ARCH_300"] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-03powerpc/64s/exception: simplify hmi control flowNicholas Piggin1-16/+10
Branch to the relocated 0xc000 address early (still in real mode), to simplify subsequent branches. Have the virt mode handler avoid just 'windup' and redo the exception from scratch, rather than branching back to the trampoline. Rearrange the stack setup instruction location to match the system reset handler (e.g., right before EXCEPTION_PROLOG_COMMON). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>