summaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)AuthorFilesLines
2020-08-31Merge tag 'v5.8.5' into dev-5.8dev-5.8Joel Stanley132-820/+821
This is the 5.8.5 stable release Signed-off-by: Joel Stanley <joel@jms.id.au>
2020-08-26KVM: arm64: Only reschedule if MMU_NOTIFIER_RANGE_BLOCKABLE is not setWill Deacon1-4/+13
commit b5331379bc62611d1026173a09c73573384201d9 upstream. When an MMU notifier call results in unmapping a range that spans multiple PGDs, we end up calling into cond_resched_lock() when crossing a PGD boundary, since this avoids running into RCU stalls during VM teardown. Unfortunately, if the VM is destroyed as a result of OOM, then blocking is not permitted and the call to the scheduler triggers the following BUG(): | BUG: sleeping function called from invalid context at arch/arm64/kvm/mmu.c:394 | in_atomic(): 1, irqs_disabled(): 0, non_block: 1, pid: 36, name: oom_reaper | INFO: lockdep is turned off. | CPU: 3 PID: 36 Comm: oom_reaper Not tainted 5.8.0 #1 | Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015 | Call trace: | dump_backtrace+0x0/0x284 | show_stack+0x1c/0x28 | dump_stack+0xf0/0x1a4 | ___might_sleep+0x2bc/0x2cc | unmap_stage2_range+0x160/0x1ac | kvm_unmap_hva_range+0x1a0/0x1c8 | kvm_mmu_notifier_invalidate_range_start+0x8c/0xf8 | __mmu_notifier_invalidate_range_start+0x218/0x31c | mmu_notifier_invalidate_range_start_nonblock+0x78/0xb0 | __oom_reap_task_mm+0x128/0x268 | oom_reap_task+0xac/0x298 | oom_reaper+0x178/0x17c | kthread+0x1e4/0x1fc | ret_from_fork+0x10/0x30 Use the new 'flags' argument to kvm_unmap_hva_range() to ensure that we only reschedule if MMU_NOTIFIER_RANGE_BLOCKABLE is set in the notifier flags. Cc: <stable@vger.kernel.org> Fixes: 8b3405e345b5 ("kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd") Cc: Marc Zyngier <maz@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: James Morse <james.morse@arm.com> Signed-off-by: Will Deacon <will@kernel.org> Message-Id: <20200811102725.7121-3-will@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-26KVM: Pass MMU notifier range flags to kvm_unmap_hva_range()Will Deacon9-9/+15
commit fdfe7cbd58806522e799e2a50a15aee7f2cbb7b6 upstream. The 'flags' field of 'struct mmu_notifier_range' is used to indicate whether invalidate_range_{start,end}() are permitted to block. In the case of kvm_mmu_notifier_invalidate_range_start(), this field is not forwarded on to the architecture-specific implementation of kvm_unmap_hva_range() and therefore the backend cannot sensibly decide whether or not to block. Add an extra 'flags' parameter to kvm_unmap_hva_range() so that architectures are aware as to whether or not they are permitted to block. Cc: <stable@vger.kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: James Morse <james.morse@arm.com> Signed-off-by: Will Deacon <will@kernel.org> Message-Id: <20200811102725.7121-2-will@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-26efi/x86: Mark kernel rodata non-executable for mixed modeArvind Sankar1-0/+2
commit c8502eb2d43b6b9b1dc382299a4d37031be63876 upstream. When remapping the kernel rodata section RO in the EFI pagetables, the protection flags that were used for the text section are being reused, but the rodata section should not be marked executable. Cc: <stable@vger.kernel.org> Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Link: https://lore.kernel.org/r/20200717194526.3452089-1-nivedita@alum.mit.edu Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-26powerpc/pseries: Do not initiate shutdown when system is running on UPSVasant Hegde1-1/+0
commit 90a9b102eddf6a3f987d15f4454e26a2532c1c98 upstream. As per PAPR we have to look for both EPOW sensor value and event modifier to identify the type of event and take appropriate action. In LoPAPR v1.1 section 10.2.2 includes table 136 "EPOW Action Codes": SYSTEM_SHUTDOWN 3 The system must be shut down. An EPOW-aware OS logs the EPOW error log information, then schedules the system to be shut down to begin after an OS defined delay internal (default is 10 minutes.) Then in section 10.3.2.2.8 there is table 146 "Platform Event Log Format, Version 6, EPOW Section", which includes the "EPOW Event Modifier": For EPOW sensor value = 3 0x01 = Normal system shutdown with no additional delay 0x02 = Loss of utility power, system is running on UPS/Battery 0x03 = Loss of system critical functions, system should be shutdown 0x04 = Ambient temperature too high All other values = reserved We have a user space tool (rtas_errd) on LPAR to monitor for EPOW_SHUTDOWN_ON_UPS. Once it gets an event it initiates shutdown after predefined time. It also starts monitoring for any new EPOW events. If it receives "Power restored" event before predefined time it will cancel the shutdown. Otherwise after predefined time it will shutdown the system. Commit 79872e35469b ("powerpc/pseries: All events of EPOW_SYSTEM_SHUTDOWN must initiate shutdown") changed our handling of the "on UPS/Battery" case, to immediately shutdown the system. This breaks existing setups that rely on the userspace tool to delay shutdown and let the system run on the UPS. Fixes: 79872e35469b ("powerpc/pseries: All events of EPOW_SYSTEM_SHUTDOWN must initiate shutdown") Cc: stable@vger.kernel.org # v4.0+ Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> [mpe: Massage change log and add PAPR references] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200820061844.306460-1-hegdevasant@linux.vnet.ibm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-26powerpc: Fix P10 PVR revision in /proc/cpuinfo for SMT4 coresMichael Neuling1-0/+1
commit 030a2c689fb46e1690f7ded8b194bab7678efb28 upstream. On POWER10 bit 12 in the PVR indicates if the core is SMT4 or SMT8. Bit 12 is set for SMT4. Without this patch, /proc/cpuinfo on a SMT4 DD1 POWER10 looks like this: cpu : POWER10, altivec supported revision : 17.0 (pvr 0080 1100) Fixes: a3ea40d5c736 ("powerpc: Add POWER10 architected mode") Cc: stable@vger.kernel.org # v5.8 Signed-off-by: Michael Neuling <mikey@neuling.org> Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200803035600.1820371-1-mikey@neuling.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-26powerpc/pseries/hotplug-cpu: wait indefinitely for vCPU deathMichael Roth1-6/+12
[ Upstream commit 801980f6497946048709b9b09771a1729551d705 ] For a power9 KVM guest with XIVE enabled, running a test loop where we hotplug 384 vcpus and then unplug them, the following traces can be seen (generally within a few loops) either from the unplugged vcpu: cpu 65 (hwid 65) Ready to die... Querying DEAD? cpu 66 (66) shows 2 list_del corruption. next->prev should be c00a000002470208, but was c00a000002470048 ------------[ cut here ]------------ kernel BUG at lib/list_debug.c:56! Oops: Exception in kernel mode, sig: 5 [#1] LE SMP NR_CPUS=2048 NUMA pSeries Modules linked in: fuse nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 ... CPU: 66 PID: 0 Comm: swapper/66 Kdump: loaded Not tainted 4.18.0-221.el8.ppc64le #1 NIP: c0000000007ab50c LR: c0000000007ab508 CTR: 00000000000003ac REGS: c0000009e5a17840 TRAP: 0700 Not tainted (4.18.0-221.el8.ppc64le) MSR: 800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 28000842 XER: 20040000 ... NIP __list_del_entry_valid+0xac/0x100 LR __list_del_entry_valid+0xa8/0x100 Call Trace: __list_del_entry_valid+0xa8/0x100 (unreliable) free_pcppages_bulk+0x1f8/0x940 free_unref_page+0xd0/0x100 xive_spapr_cleanup_queue+0x148/0x1b0 xive_teardown_cpu+0x1bc/0x240 pseries_mach_cpu_die+0x78/0x2f0 cpu_die+0x48/0x70 arch_cpu_idle_dead+0x20/0x40 do_idle+0x2f4/0x4c0 cpu_startup_entry+0x38/0x40 start_secondary+0x7bc/0x8f0 start_secondary_prolog+0x10/0x14 or on the worker thread handling the unplug: pseries-hotplug-cpu: Attempting to remove CPU <NULL>, drc index: 1000013a Querying DEAD? cpu 314 (314) shows 2 BUG: Bad page state in process kworker/u768:3 pfn:95de1 cpu 314 (hwid 314) Ready to die... page:c00a000002577840 refcount:0 mapcount:-128 mapping:0000000000000000 index:0x0 flags: 0x5ffffc00000000() raw: 005ffffc00000000 5deadbeef0000100 5deadbeef0000200 0000000000000000 raw: 0000000000000000 0000000000000000 00000000ffffff7f 0000000000000000 page dumped because: nonzero mapcount Modules linked in: kvm xt_CHECKSUM ipt_MASQUERADE xt_conntrack ... CPU: 0 PID: 548 Comm: kworker/u768:3 Kdump: loaded Not tainted 4.18.0-224.el8.bz1856588.ppc64le #1 Workqueue: pseries hotplug workque pseries_hp_work_fn Call Trace: dump_stack+0xb0/0xf4 (unreliable) bad_page+0x12c/0x1b0 free_pcppages_bulk+0x5bc/0x940 page_alloc_cpu_dead+0x118/0x120 cpuhp_invoke_callback.constprop.5+0xb8/0x760 _cpu_down+0x188/0x340 cpu_down+0x5c/0xa0 cpu_subsys_offline+0x24/0x40 device_offline+0xf0/0x130 dlpar_offline_cpu+0x1c4/0x2a0 dlpar_cpu_remove+0xb8/0x190 dlpar_cpu_remove_by_index+0x12c/0x150 dlpar_cpu+0x94/0x800 pseries_hp_work_fn+0x128/0x1e0 process_one_work+0x304/0x5d0 worker_thread+0xcc/0x7a0 kthread+0x1ac/0x1c0 ret_from_kernel_thread+0x5c/0x80 The latter trace is due to the following sequence: page_alloc_cpu_dead drain_pages drain_pages_zone free_pcppages_bulk where drain_pages() in this case is called under the assumption that the unplugged cpu is no longer executing. To ensure that is the case, and early call is made to __cpu_die()->pseries_cpu_die(), which runs a loop that waits for the cpu to reach a halted state by polling its status via query-cpu-stopped-state RTAS calls. It only polls for 25 iterations before giving up, however, and in the trace above this results in the following being printed only .1 seconds after the hotplug worker thread begins processing the unplug request: pseries-hotplug-cpu: Attempting to remove CPU <NULL>, drc index: 1000013a Querying DEAD? cpu 314 (314) shows 2 At that point the worker thread assumes the unplugged CPU is in some unknown/dead state and procedes with the cleanup, causing the race with the XIVE cleanup code executed by the unplugged CPU. Fix this by waiting indefinitely, but also making an effort to avoid spurious lockup messages by allowing for rescheduling after polling the CPU status and printing a warning if we wait for longer than 120s. Fixes: eac1e731b59ee ("powerpc/xive: guest exploitation of the XIVE interrupt controller") Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Tested-by: Greg Kurz <groug@kaod.org> Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Reviewed-by: Greg Kurz <groug@kaod.org> [mpe: Trim oopses in change log slightly for readability] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200811161544.10513-1-mdroth@linux.vnet.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-26powerpc/fixmap: Fix the size of the early debug areaChristophe Leroy1-1/+1
[ Upstream commit fdc6edbb31fba76fd25d7bd016b675a92908d81e ] Commit ("03fd42d458fb powerpc/fixmap: Fix FIX_EARLY_DEBUG_BASE when page size is 256k") reworked the setup of the early debug area and mistakenly replaced 128 * 1024 by SZ_128. Change to SZ_128K to restore the original 128 kbytes size of the area. Fixes: 03fd42d458fb ("powerpc/fixmap: Fix FIX_EARLY_DEBUG_BASE when page size is 256k") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/996184974d674ff984643778cf1cdd7fe58cc065.1597644194.git.christophe.leroy@csgroup.eu Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-26ARM64: vdso32: Install vdso32 from vdso_installStephen Boyd2-1/+2
[ Upstream commit 8d75785a814241587802655cc33e384230744f0c ] Add the 32-bit vdso Makefile to the vdso_install rule so that 'make vdso_install' installs the 32-bit compat vdso when it is compiled. Fixes: a7f71a2c8903 ("arm64: compat: Add vDSO") Signed-off-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Acked-by: Will Deacon <will@kernel.org> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Link: https://lore.kernel.org/r/20200818014950.42492-1-swboyd@chromium.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-26Fix build error when CONFIG_ACPI is not set/enabled:Randy Dunlap1-0/+1
[ Upstream commit ee87e1557c42dc9c2da11c38e11b87c311569853 ] ../arch/x86/pci/xen.c: In function ‘pci_xen_init’: ../arch/x86/pci/xen.c:410:2: error: implicit declaration of function ‘acpi_noirq_set’; did you mean ‘acpi_irq_get’? [-Werror=implicit-function-declaration] acpi_noirq_set(); Fixes: 88e9ca161c13 ("xen/pci: Use acpi_noirq_set() helper to avoid #ifdef") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: xen-devel@lists.xenproject.org Cc: linux-pci@vger.kernel.org Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-26kvm: x86: Toggling CR4.PKE does not load PDPTEs in PAE modeJim Mattson1-1/+1
[ Upstream commit cb957adb4ea422bd758568df5b2478ea3bb34f35 ] See the SDM, volume 3, section 4.4.1: If PAE paging would be in use following an execution of MOV to CR0 or MOV to CR4 (see Section 4.1.1) and the instruction is modifying any of CR0.CD, CR0.NW, CR0.PG, CR4.PAE, CR4.PGE, CR4.PSE, or CR4.SMEP; then the PDPTEs are loaded from the address in CR3. Fixes: b9baba8614890 ("KVM, pkeys: expose CPUID/CR4 to guest") Cc: Huaitong Han <huaitong.han@intel.com> Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Peter Shier <pshier@google.com> Reviewed-by: Oliver Upton <oupton@google.com> Message-Id: <20200817181655.3716509-1-jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-26kvm: x86: Toggling CR4.SMAP does not load PDPTEs in PAE modeJim Mattson1-1/+1
[ Upstream commit 427890aff8558eb4326e723835e0eae0e6fe3102 ] See the SDM, volume 3, section 4.4.1: If PAE paging would be in use following an execution of MOV to CR0 or MOV to CR4 (see Section 4.1.1) and the instruction is modifying any of CR0.CD, CR0.NW, CR0.PG, CR4.PAE, CR4.PGE, CR4.PSE, or CR4.SMEP; then the PDPTEs are loaded from the address in CR3. Fixes: 0be0226f07d14 ("KVM: MMU: fix SMAP virtualization") Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Peter Shier <pshier@google.com> Reviewed-by: Oliver Upton <oupton@google.com> Message-Id: <20200817181655.3716509-2-jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-26arch/ia64: Restore arch-specific pgd_offset_k implementationJessica Clarke1-0/+9
[ Upstream commit bd05220c7be3356046861c317d9c287ca50445ba ] IA-64 is special and treats pgd_offset_k() differently to pgd_offset(), using different formulae to calculate the indices into the kernel and user PGDs. The index into the user PGDs takes into account the region number, but the index into the kernel (init_mm) PGD always assumes a predefined kernel region number. Commit 974b9b2c68f3 ("mm: consolidate pte_index() and pte_offset_*() definitions") made IA-64 use a generic pgd_offset_k() which incorrectly used pgd_index() for kernel page tables. As a result, the index into the kernel PGD was going out of bounds and the kernel hung during early boot. Allow overrides of pgd_offset_k() and override it on IA-64 with the old implementation that will correctly index the kernel PGD. Fixes: 974b9b2c68f3 ("mm: consolidate pte_index() and pte_offset_*() definitions") Reported-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Signed-off-by: Jessica Clarke <jrtc27@jrtc27.com> Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Acked-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-26s390/ptrace: fix storage key handlingHeiko Carstens1-2/+5
[ Upstream commit fd78c59446b8d050ecf3e0897c5a486c7de7c595 ] The key member of the runtime instrumentation control block contains only the access key, not the complete storage key. Therefore the value must be shifted by four bits. Since existing user space does not necessarily query and set the access key correctly, just ignore the user space provided key and use the correct one. Note: this is only relevant for debugging purposes in case somebody compiles a kernel with a default storage access key set to a value not equal to zero. Fixes: 262832bc5acd ("s390/ptrace: add runtime instrumention register get/set") Reported-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-26s390/runtime_instrumentation: fix storage key handlingHeiko Carstens1-1/+1
[ Upstream commit 9eaba29c7985236e16468f4e6a49cc18cf01443e ] The key member of the runtime instrumentation control block contains only the access key, not the complete storage key. Therefore the value must be shifted by four bits. Note: this is only relevant for debugging purposes in case somebody compiles a kernel with a default storage access key set to a value not equal to zero. Fixes: e4b8b3f33fca ("s390: add support for runtime instrumentation") Reported-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-26alpha: fix annotation of io{read,write}{16,32}be()Luc Van Oostenryck1-4/+4
[ Upstream commit bd72866b8da499e60633ff28f8a4f6e09ca78efe ] These accessors must be used to read/write a big-endian bus. The value returned or written is native-endian. However, these accessors are defined using be{16,32}_to_cpu() or cpu_to_be{16,32}() to make the endian conversion but these expect a __be{16,32} when none is present. Keeping them would need a force cast that would solve nothing at all. So, do the conversion using swab{16,32}, like done in asm-generic for similar situations. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Stephen Boyd <sboyd@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Link: http://lkml.kernel.org/r/20200622114232.80039-1-luc.vanoostenryck@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-26riscv: Fixup static_obj() failGuo Ren1-1/+1
[ Upstream commit 6184358da0004c8fd940afda6c0a0fa4027dc911 ] When enable LOCKDEP, static_obj() will cause error. Because some __initdata static variables is before _stext: static int static_obj(const void *obj) { unsigned long start = (unsigned long) &_stext, end = (unsigned long) &_end, addr = (unsigned long) obj; /* * static variable? */ if ((addr >= start) && (addr < end)) return 1; [ 0.067192] INFO: trying to register non-static key. [ 0.067325] the code is fine but needs lockdep annotation. [ 0.067449] turning off the locking correctness validator. [ 0.067718] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.7.0-rc7-dirty #44 [ 0.067945] Call Trace: [ 0.068369] [<ffffffe00020323c>] walk_stackframe+0x0/0xa4 [ 0.068506] [<ffffffe000203422>] show_stack+0x2a/0x34 [ 0.068631] [<ffffffe000521e4e>] dump_stack+0x94/0xca [ 0.068757] [<ffffffe000255a4e>] register_lock_class+0x5b8/0x5bc [ 0.068969] [<ffffffe000255abe>] __lock_acquire+0x6c/0x1d5c [ 0.069101] [<ffffffe0002550fe>] lock_acquire+0xae/0x312 [ 0.069228] [<ffffffe000989a8e>] _raw_spin_lock_irqsave+0x40/0x5a [ 0.069357] [<ffffffe000247c64>] complete+0x1e/0x50 [ 0.069479] [<ffffffe000984c38>] rest_init+0x1b0/0x28a [ 0.069660] [<ffffffe0000016a2>] 0xffffffe0000016a2 [ 0.069779] [<ffffffe000001b84>] 0xffffffe000001b84 [ 0.069953] [<ffffffe000001092>] 0xffffffe000001092 static __initdata DECLARE_COMPLETION(kthreadd_done); noinline void __ref rest_init(void) { ... complete(&kthreadd_done); Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-26m68knommu: fix overwriting of bits in ColdFire V3 cache controlGreg Ungerer1-3/+3
[ Upstream commit bdee0e793cea10c516ff48bf3ebb4ef1820a116b ] The Cache Control Register (CACR) of the ColdFire V3 has bits that control high level caching functions, and also enable/disable the use of the alternate stack pointer register (the EUSP bit) to provide separate supervisor and user stack pointer registers. The code as it is today will blindly clear the EUSP bit on cache actions like invalidation. So it is broken for this case - and that will result in failed booting (interrupt entry and exit processing will be completely hosed). This only affects ColdFire V3 parts that support the alternate stack register (like the 5329 for example) - generally speaking new parts do, older parts don't. It has no impact on ColdFire V3 parts with the single stack pointer, like the 5307 for example. Fix the cache bit defines used, so they maintain the EUSP bit when carrying out cache actions through the CACR register. Signed-off-by: Greg Ungerer <gerg@linux-m68k.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-26MIPS: Fix unable to reserve memory for Crash kernelJinyang He1-1/+1
[ Upstream commit b1ce9716f3b5ed3b49badf1f003b9e34b7ead0f9 ] Use 0 as the align parameter in memblock_find_in_range() is incorrect when we reserve memory for Crash kernel. The environment as follows: [ 0.000000] MIPS: machine is loongson,loongson64c-4core-rs780e ... [ 1.951016] crashkernel=64M@128M The warning as follows: [ 0.000000] Invalid memory region reserved for crash kernel And the iomem as follows: 00200000-0effffff : System RAM 04000000-0484009f : Kernel code 048400a0-04ad7fff : Kernel data 04b40000-05c4c6bf : Kernel bss 1a000000-1bffffff : pci@1a000000 ... The align parameter may be finally used by round_down() or round_up(). Like the following call tree: mips-next: mm/memblock.c memblock_find_in_range └── memblock_find_in_range_node ├── __memblock_find_range_bottom_up │ └── round_up └── __memblock_find_range_top_down └── round_down \#define round_up(x, y) ((((x)-1) | __round_mask(x, y))+1) \#define round_down(x, y) ((x) & ~__round_mask(x, y)) \#define __round_mask(x, y) ((__typeof__(x))((y)-1)) The round_down(or round_up)'s second parameter must be a power of 2. If the second parameter is 0, it both will return 0. Use 1 as the parameter to fix the bug and the iomem as follows: 00200000-0effffff : System RAM 04000000-0484009f : Kernel code 048400a0-04ad7fff : Kernel data 04b40000-05c4c6bf : Kernel bss 08000000-0bffffff : Crash kernel 1a000000-1bffffff : pci@1a000000 ... Signed-off-by: Jinyang He <hejinyang@loongson.cn> Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-26s390/pci: ignore stale configuration request eventNiklas Schnelle1-0/+3
commit b76fee1bc56c31a9d2a49592810eba30cc06d61a upstream. A configuration request event may be stale, that is the event may reference a zdev which was already configured. This can happen when a hotplug happens during boot such that the device is discovered and configured in the initial clp_list_pci(), then after initialization we enable events and process the original configuration request which additionally still contains the old disabled function handle leading to a failure during device enablement and subsequent I/O lockout. Fix this by restoring the check that the device to be configured is in standby which was removed in commit f606b3ef47c9 ("s390/pci: adapt events for zbus"). This check does not need serialization as we only enable the events after zPCI has fully initialized, which includes the initial clp_list_pci(), rescan only does updates and events are serialized with respect to each other. Fixes: f606b3ef47c9 ("s390/pci: adapt events for zbus") Cc: <stable@vger.kernel.org> # 5.8 Reported-by: Shalini Chellathurai Saroja <shalini@linux.ibm.com> Tested-by: Shalini Chellathurai Saroja <shalini@linux.ibm.com> Acked-by: Pierre Morel <pmorel@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-26s390/pci: fix PF/VF linking on hot plugNiklas Schnelle3-13/+32
commit b97bf44f99155e57088e16974afb1f2d7b5287aa upstream. Currently there are four places in which a PCI function is scanned and made available to drivers: 1. In pci_scan_root_bus() as part of the initial zbus creation. 2. In zpci_bus_add_devices() when registering a device in configured state on a zbus that has already been scanned. 3. When a function is already known to zPCI (in reserved/standby state) and configuration is triggered through firmware by PEC 0x301. 4. When a device is already known to zPCI (in standby/reserved state) and configuration is triggered from within Linux using enable_slot(). The PF/VF linking step and setting of pdev->is_virtfn introduced with commit e5794cf1a270 ("s390/pci: create links between PFs and VFs") was only triggered for the second case, which is where VFs created through sriov_numvfs usually land. However unlike some other platforms but like POWER VFs can be individually enabled/disabled through /sys/bus/pci/slots. Fix this by doing VF setup as part of pcibios_bus_add_device() which is called in all of the above cases. Finally to remove the PF/VF links call the common code pci_iov_remove_virtfn() function to remove linked VFs. This takes care of the necessary sysfs cleanup. Fixes: e5794cf1a270 ("s390/pci: create links between PFs and VFs") Cc: <stable@vger.kernel.org> # 5.8: 2f0230b2f2d5: s390/pci: re-introduce zpci_remove_device() Cc: <stable@vger.kernel.org> # 5.8 Acked-by: Pierre Morel <pmorel@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-26s390/pci: re-introduce zpci_remove_device()Niklas Schnelle2-9/+14
commit 2f0230b2f2d5fd287a85583eefb5aed35b6fe510 upstream. For fixing the PF to VF link removal we need to perform some action on every removal of a zdev from the common PCI subsystem. So in preparation re-introduce zpci_remove_device() and use that instead of directly calling the common code functions. This was actually still declared from earlier code but no longer implemented. Reviewed-by: Pierre Morel <pmorel@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-26s390/pci: fix zpci_bus_link_virtfn()Niklas Schnelle1-10/+15
commit 3cddb79afc60bcdb5fd9dd7a1c64a8d03bdd460f upstream. We were missing the pci_dev_put() for candidate PFs. Furhtermore in discussion with upstream it turns out that somewhat counterintuitively some common code, in particular the vfio-pci driver, assumes that pdev->is_virtfn always implies that pdev->physfn is set, i.e. that VFs are always linked. While POWER does seem to set pdev->is_virtfn even for unlinked functions (see comments in arch/powerpc/kernel/eeh.c:eeh_debugfs_break_device()) for now just be safe and only set pdev->is_virtfn on linking. Also make sure that we only search for parent PFs if the zbus is multifunction and we thus know the devfn values supplied by firmware come from the RID. Fixes: e5794cf1a270 ("s390/pci: create links between PFs and VFs") Cc: <stable@vger.kernel.org> # 5.8 Reviewed-by: Pierre Morel <pmorel@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-21sh: fault: Fix duplicate printing of "PC:"Geert Uytterhoeven1-1/+0
[ Upstream commit 845d9156febcd6b3b20c0c2c8d73b461b48e844c ] Somewhere along the patch handling path, both the old "printk(KERN_ALERT ....)" and the new "pr_alert(...)" were retained, leading to the duplicate printing of "PC:". Drop the old one. Fixes: eaabf98b0932a540 ("sh: fault: modernize printing of kernel messages") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Rich Felker <dalias@libc.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-21sh: landisk: Add missing initialization of sh_io_port_baseGeert Uytterhoeven1-0/+3
[ Upstream commit 0c64a0dce51faa9c706fdf1f957d6f19878f4b81 ] The Landisk setup code maps the CF IDE area using ioremap_prot(), and passes the resulting virtual addresses to the pata_platform driver, disguising them as I/O port addresses. Hence the pata_platform driver translates them again using ioport_map(). As CONFIG_GENERIC_IOMAP=n, and CONFIG_HAS_IOPORT_MAP=y, the SuperH-specific mapping code in arch/sh/kernel/ioport.c translates I/O port addresses to virtual addresses by adding sh_io_port_base, which defaults to -1, thus breaking the assumption of an identity mapping. Fix this by setting sh_io_port_base to zero. Fixes: 37b7a97884ba64bf ("sh: machvec IO death.") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Rich Felker <dalias@libc.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-21perf/x86/rapl: Fix missing psys sysfs attributesZhang Rui1-1/+1
[ Upstream commit 4bb5fcb97a5df0bbc0a27e0252b1e7ce140a8431 ] This fixes a problem introduced by commit: 5fb5273a905c ("perf/x86/rapl: Use new MSR detection interface") that perf event sysfs attributes for psys RAPL domain are missing. Fixes: 5fb5273a905c ("perf/x86/rapl: Use new MSR detection interface") Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Reviewed-by: Len Brown <len.brown@intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20200811153149.12242-2-rui.zhang@intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-21s390/Kconfig: add missing ZCRYPT dependency to VFIO_APKrzysztof Kozlowski1-0/+1
[ Upstream commit 929a343b858612100cb09443a8aaa20d4a4706d3 ] The VFIO_AP uses ap_driver_register() (and deregister) functions implemented in ap_bus.c (compiled into ap.o). However the ap.o will be built only if CONFIG_ZCRYPT is selected. This was not visible before commit e93a1695d7fb ("iommu: Enable compile testing for some of drivers") because the CONFIG_VFIO_AP depends on CONFIG_S390_AP_IOMMU which depends on the missing CONFIG_ZCRYPT. After adding COMPILE_TEST, it is possible to select a configuration with VFIO_AP and S390_AP_IOMMU but without the ZCRYPT. Add proper dependency to the VFIO_AP to fix build errors: ERROR: modpost: "ap_driver_register" [drivers/s390/crypto/vfio_ap.ko] undefined! ERROR: modpost: "ap_driver_unregister" [drivers/s390/crypto/vfio_ap.ko] undefined! Reported-by: kernel test robot <lkp@intel.com> Fixes: e93a1695d7fb ("iommu: Enable compile testing for some of drivers") Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-21s390/test_unwind: fix possible memleak in test_unwind()Wang Hai1-0/+1
[ Upstream commit 75d3e7f4769d276a056efa1cc7f08de571fc9b4b ] test_unwind() misses to call kfree(bt) in an error path. Add the missed function call to fix it. Fixes: 0610154650f1 ("s390/test_unwind: print verbose unwinding results") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Hai <wanghai38@huawei.com> Acked-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-21x86/bugs/multihit: Fix mitigation reporting when VMX is not in usePawan Gupta1-1/+7
[ Upstream commit f29dfa53cc8ae6ad93bae619bcc0bf45cab344f7 ] On systems that have virtualization disabled or unsupported, sysfs mitigation for X86_BUG_ITLB_MULTIHIT is reported incorrectly as: $ cat /sys/devices/system/cpu/vulnerabilities/itlb_multihit KVM: Vulnerable System is not vulnerable to DoS attack from a rogue guest when virtualization is disabled or unsupported in the hardware. Change the mitigation reporting for these cases. Fixes: b8e8c8303ff2 ("kvm: mmu: ITLB_MULTIHIT mitigation") Reported-by: Nelson Dsouza <nelson.dsouza@linux.intel.com> Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Tony Luck <tony.luck@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/0ba029932a816179b9d14a30db38f0f11ef1f166.1594925782.git.pawan.kumar.gupta@linux.intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-21x86/tsr: Fix tsc frequency enumeration bug on Lightning Mountain SoCDilip Kota1-2/+7
[ Upstream commit 7d98585860d845e36ee612832a5ff021f201dbaf ] Frequency descriptor of Lightning Mountain SoC doesn't have all the frequency entries so resulting in the below failure causing a kernel hang: Error MSR_FSB_FREQ index 15 is unknown tsc: Fast TSC calibration failed So, add all the frequency entries in the Lightning Mountain SoC frequency descriptor. Fixes: 0cc5359d8fd45 ("x86/cpu: Update init data for new Airmont CPU model") Fixes: 812c2d7506fd ("x86/tsc_msr: Use named struct initializers") Signed-off-by: Dilip Kota <eswara.kota@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Link: https://lore.kernel.org/r/211c643ae217604b46cbec43a2c0423946dc7d2d.1596440057.git.eswara.kota@linux.intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-21openrisc: Fix oops caused when dumping stackStafford Horne1-2/+16
[ Upstream commit 57b8e277c33620e115633cdf700a260b55095460 ] When dumping a stack with 'cat /proc/#/stack' the kernel would oops. For example: # cat /proc/690/stack Unable to handle kernel access at virtual address 0x7fc60f58 Oops#: 0000 CPU #: 0 PC: c00097fc SR: 0000807f SP: d6f09b9c GPR00: 00000000 GPR01: d6f09b9c GPR02: d6f09bb8 GPR03: d6f09bc4 GPR04: 7fc60f5c GPR05: c00099b4 GPR06: 00000000 GPR07: d6f09ba3 GPR08: ffffff00 GPR09: c0009804 GPR10: d6f08000 GPR11: 00000000 GPR12: ffffe000 GPR13: dbb86000 GPR14: 00000001 GPR15: dbb86250 GPR16: 7fc60f63 GPR17: 00000f5c GPR18: d6f09bc4 GPR19: 00000000 GPR20: c00099b4 GPR21: ffffffc0 GPR22: 00000000 GPR23: 00000000 GPR24: 00000001 GPR25: 000002c6 GPR26: d78b6850 GPR27: 00000001 GPR28: 00000000 GPR29: dbb86000 GPR30: ffffffff GPR31: dbb862fc RES: 00000000 oGPR11: ffffffff Process cat (pid: 702, stackpage=d79d6000) Stack: Call trace: [<598977f2>] save_stack_trace_tsk+0x40/0x74 [<95063f0e>] stack_trace_save_tsk+0x44/0x58 [<b557bfdd>] proc_pid_stack+0xd0/0x13c [<a2df8eda>] proc_single_show+0x6c/0xf0 [<e5a737b7>] seq_read+0x1b4/0x688 [<2d6c7480>] do_iter_read+0x208/0x248 [<2182a2fb>] vfs_readv+0x64/0x90 This was caused by the stack trace code in save_stack_trace_tsk using the wrong stack pointer. It was using the user stack pointer instead of the kernel stack pointer. Fix this by using the right stack. Also for good measure we add try_get_task_stack/put_task_stack to ensure the task is not lost while we are walking it's stack. Fixes: eecac38b0423a ("openrisc: support framepointers and STACKTRACE_SUPPORT") Signed-off-by: Stafford Horne <shorne@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-21pseries: Fix 64 bit logical memory block panicAnton Blanchard1-1/+1
commit 89c140bbaeee7a55ed0360a88f294ead2b95201b upstream. Booting with a 4GB LMB size causes us to panic: qemu-system-ppc64: OS terminated: OS panic: Memory block size not suitable: 0x0 Fix pseries_memory_block_size() to handle 64 bit LMBs. Cc: stable@vger.kernel.org Signed-off-by: Anton Blanchard <anton@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200715000820.1255764-1-anton@ozlabs.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-21MIPS: SGI-IP27: always enable NUMA in KconfigMike Rapoport1-0/+1
commit 6c86a3029ce3b44597526909f2e39a77a497f640 upstream. When a configuration has NUMA disabled and SGI_IP27 enabled, the build fails: CC kernel/bounds.s CC arch/mips/kernel/asm-offsets.s In file included from arch/mips/include/asm/topology.h:11, from include/linux/topology.h:36, from include/linux/gfp.h:9, from include/linux/slab.h:15, from include/linux/crypto.h:19, from include/crypto/hash.h:11, from include/linux/uio.h:10, from include/linux/socket.h:8, from include/linux/compat.h:15, from arch/mips/kernel/asm-offsets.c:12: include/linux/topology.h: In function 'numa_node_id': arch/mips/include/asm/mach-ip27/topology.h:16:27: error: implicit declaration of function 'cputonasid'; did you mean 'cpu_vpe_id'? [-Werror=implicit-function-declaration] #define cpu_to_node(cpu) (cputonasid(cpu)) ^~~~~~~~~~ include/linux/topology.h:119:9: note: in expansion of macro 'cpu_to_node' return cpu_to_node(raw_smp_processor_id()); ^~~~~~~~~~~ include/linux/topology.h: In function 'cpu_cpu_mask': arch/mips/include/asm/mach-ip27/topology.h:19:7: error: implicit declaration of function 'hub_data' [-Werror=implicit-function-declaration] &hub_data(node)->h_cpus) ^~~~~~~~ include/linux/topology.h:210:9: note: in expansion of macro 'cpumask_of_node' return cpumask_of_node(cpu_to_node(cpu)); ^~~~~~~~~~~~~~~ arch/mips/include/asm/mach-ip27/topology.h:19:21: error: invalid type argument of '->' (have 'int') &hub_data(node)->h_cpus) ^~ include/linux/topology.h:210:9: note: in expansion of macro 'cpumask_of_node' return cpumask_of_node(cpu_to_node(cpu)); ^~~~~~~~~~~~~~~ Before switch from discontigmem to sparsemem, there always was CONFIG_NEED_MULTIPLE_NODES=y because it was selected by DISCONTIGMEM. Without DISCONTIGMEM it is possible to have SPARSEMEM without NUMA for SGI_IP27 and as many things there rely on custom node definition, the build breaks. As Thomas noted "... there are right now too many places in IP27 code, which assumes NUMA enabled", the simplest solution would be to always enable NUMA for SGI-IP27 builds. Reported-by: kernel test robot <lkp@intel.com> Fixes: 397dc00e249e ("mips: sgi-ip27: switch from DISCONTIGMEM to SPARSEMEM") Cc: stable@vger.kernel.org Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-21MIPS: qi_lb60: Fix routing to audio amplifierPaul Cercueil1-1/+1
commit 0889a67a9e7a56ba39af223d536630b20b877fda upstream. The ROUT (right channel output of audio codec) was connected to INL (left channel of audio amplifier) instead of INR (right channel of audio amplifier). Fixes: 8ddebad15e9b ("MIPS: qi_lb60: Migrate to devicetree") Cc: stable@vger.kernel.org # v5.3 Signed-off-by: Paul Cercueil <paul@crapouillou.net> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-21MIPS: CPU#0 is not hotpluggableHuacai Chen1-1/+1
commit 9cce844abf07b683cff5f0273977d5f8d0af94c7 upstream. Now CPU#0 is not hotpluggable on MIPS, so prevent to create /sys/devices /system/cpu/cpu0/online which confuses some user-space tools. Cc: stable@vger.kernel.org Signed-off-by: Huacai Chen <chenhc@lemote.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-21powerpc: Fix circular dependency between percpu.h and mmu.hMichael Ellerman1-2/+2
commit 0c83b277ada72b585e6a3e52b067669df15bcedb upstream. Recently random.h started including percpu.h (see commit f227e3ec3b5c ("random32: update the net random state on interrupt and activity")), which broke corenet64_smp_defconfig: In file included from /linux/arch/powerpc/include/asm/paca.h:18, from /linux/arch/powerpc/include/asm/percpu.h:13, from /linux/include/linux/random.h:14, from /linux/lib/uuid.c:14: /linux/arch/powerpc/include/asm/mmu.h:139:22: error: unknown type name 'next_tlbcam_idx' 139 | DECLARE_PER_CPU(int, next_tlbcam_idx); This is due to a circular header dependency: asm/mmu.h includes asm/percpu.h, which includes asm/paca.h, which includes asm/mmu.h Which means DECLARE_PER_CPU() isn't defined when mmu.h needs it. We can fix it by moving the include of paca.h below the include of asm-generic/percpu.h. This moves the include of paca.h out of the #ifdef __powerpc64__, but that is OK because paca.h is almost entirely inside #ifdef CONFIG_PPC64 anyway. It also moves the include of paca.h out of the #ifdef CONFIG_SMP, which could possibly break something, but seems to have no ill effects. Fixes: f227e3ec3b5c ("random32: update the net random state on interrupt and activity") Cc: stable@vger.kernel.org # v5.8 Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200804130558.292328-1-mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-21powerpc: Allow 4224 bytes of stack expansion for the signal frameMichael Ellerman1-2/+5
commit 63dee5df43a31f3844efabc58972f0a206ca4534 upstream. We have powerpc specific logic in our page fault handling to decide if an access to an unmapped address below the stack pointer should expand the stack VMA. The code was originally added in 2004 "ported from 2.4". The rough logic is that the stack is allowed to grow to 1MB with no extra checking. Over 1MB the access must be within 2048 bytes of the stack pointer, or be from a user instruction that updates the stack pointer. The 2048 byte allowance below the stack pointer is there to cover the 288 byte "red zone" as well as the "about 1.5kB" needed by the signal delivery code. Unfortunately since then the signal frame has expanded, and is now 4224 bytes on 64-bit kernels with transactional memory enabled. This means if a process has consumed more than 1MB of stack, and its stack pointer lies less than 4224 bytes from the next page boundary, signal delivery will fault when trying to expand the stack and the process will see a SEGV. The total size of the signal frame is the size of struct rt_sigframe (which includes the red zone) plus __SIGNAL_FRAMESIZE (128 bytes on 64-bit). The 2048 byte allowance was correct until 2008 as the signal frame was: struct rt_sigframe { struct ucontext uc; /* 0 1440 */ /* --- cacheline 11 boundary (1408 bytes) was 32 bytes ago --- */ long unsigned int _unused[2]; /* 1440 16 */ unsigned int tramp[6]; /* 1456 24 */ struct siginfo * pinfo; /* 1480 8 */ void * puc; /* 1488 8 */ struct siginfo info; /* 1496 128 */ /* --- cacheline 12 boundary (1536 bytes) was 88 bytes ago --- */ char abigap[288]; /* 1624 288 */ /* size: 1920, cachelines: 15, members: 7 */ /* padding: 8 */ }; 1920 + 128 = 2048 Then in commit ce48b2100785 ("powerpc: Add VSX context save/restore, ptrace and signal support") (Jul 2008) the signal frame expanded to 2304 bytes: struct rt_sigframe { struct ucontext uc; /* 0 1696 */ <-- /* --- cacheline 13 boundary (1664 bytes) was 32 bytes ago --- */ long unsigned int _unused[2]; /* 1696 16 */ unsigned int tramp[6]; /* 1712 24 */ struct siginfo * pinfo; /* 1736 8 */ void * puc; /* 1744 8 */ struct siginfo info; /* 1752 128 */ /* --- cacheline 14 boundary (1792 bytes) was 88 bytes ago --- */ char abigap[288]; /* 1880 288 */ /* size: 2176, cachelines: 17, members: 7 */ /* padding: 8 */ }; 2176 + 128 = 2304 At this point we should have been exposed to the bug, though as far as I know it was never reported. I no longer have a system old enough to easily test on. Then in 2010 commit 320b2b8de126 ("mm: keep a guard page below a grow-down stack segment") caused our stack expansion code to never trigger, as there was always a VMA found for a write up to PAGE_SIZE below r1. That meant the bug was hidden as we continued to expand the signal frame in commit 2b0a576d15e0 ("powerpc: Add new transactional memory state to the signal context") (Feb 2013): struct rt_sigframe { struct ucontext uc; /* 0 1696 */ /* --- cacheline 13 boundary (1664 bytes) was 32 bytes ago --- */ struct ucontext uc_transact; /* 1696 1696 */ <-- /* --- cacheline 26 boundary (3328 bytes) was 64 bytes ago --- */ long unsigned int _unused[2]; /* 3392 16 */ unsigned int tramp[6]; /* 3408 24 */ struct siginfo * pinfo; /* 3432 8 */ void * puc; /* 3440 8 */ struct siginfo info; /* 3448 128 */ /* --- cacheline 27 boundary (3456 bytes) was 120 bytes ago --- */ char abigap[288]; /* 3576 288 */ /* size: 3872, cachelines: 31, members: 8 */ /* padding: 8 */ /* last cacheline: 32 bytes */ }; 3872 + 128 = 4000 And commit 573ebfa6601f ("powerpc: Increase stack redzone for 64-bit userspace to 512 bytes") (Feb 2014): struct rt_sigframe { struct ucontext uc; /* 0 1696 */ /* --- cacheline 13 boundary (1664 bytes) was 32 bytes ago --- */ struct ucontext uc_transact; /* 1696 1696 */ /* --- cacheline 26 boundary (3328 bytes) was 64 bytes ago --- */ long unsigned int _unused[2]; /* 3392 16 */ unsigned int tramp[6]; /* 3408 24 */ struct siginfo * pinfo; /* 3432 8 */ void * puc; /* 3440 8 */ struct siginfo info; /* 3448 128 */ /* --- cacheline 27 boundary (3456 bytes) was 120 bytes ago --- */ char abigap[512]; /* 3576 512 */ <-- /* size: 4096, cachelines: 32, members: 8 */ /* padding: 8 */ }; 4096 + 128 = 4224 Then finally in 2017, commit 1be7107fbe18 ("mm: larger stack guard gap, between vmas") exposed us to the existing bug, because it changed the stack VMA to be the correct/real size, meaning our stack expansion code is now triggered. Fix it by increasing the allowance to 4224 bytes. Hard-coding 4224 is obviously unsafe against future expansions of the signal frame in the same way as the existing code. We can't easily use sizeof() because the signal frame structure is not in a header. We will either fix that, or rip out all the custom stack expansion checking logic entirely. Fixes: ce48b2100785 ("powerpc: Add VSX context save/restore, ptrace and signal support") Cc: stable@vger.kernel.org # v2.6.27+ Reported-by: Tom Lane <tgl@sss.pgh.pa.us> Tested-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200724092528.1578671-2-mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-21powerpc/ptdump: Fix build failure in hashpagetable.cChristophe Leroy1-1/+1
commit 7c466b0807960edc13e4b855be85ea765df9a6cd upstream. H_SUCCESS is only defined when CONFIG_PPC_PSERIES is defined. != H_SUCCESS means != 0. Modify the test accordingly. Fixes: 65e701b2d2a8 ("powerpc/ptdump: drop non vital #ifdefs") Cc: stable@vger.kernel.org Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/795158fc1d2b3dff3bf7347881947a887ea9391a.1592227105.git.christophe.leroy@csgroup.eu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-21xtensa: fix xtensa_pmu_setup prototypeMax Filippov1-1/+1
commit 6d65d3769d1910379e1cfa61ebf387efc6bfb22c upstream. Fix the following build error in configurations with CONFIG_XTENSA_VARIANT_HAVE_PERF_EVENTS=y: arch/xtensa/kernel/perf_event.c:420:29: error: passing argument 3 of ‘cpuhp_setup_state’ from incompatible pointer type Cc: stable@vger.kernel.org Fixes: 25a77b55e74c ("xtensa/perf: Convert the hotplug notifier to state machine callbacks") Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-21xtensa: add missing exclusive access state managementMax Filippov3-0/+18
commit a0fc1436f1f4f84e93144480bf30e0c958d135b6 upstream. The result of the s32ex opcode is recorded in the ATOMCTL special register and must be retrieved with the getex opcode. Context switch between s32ex and getex may trash the ATOMCTL register and result in duplicate update or missing update of the atomic variable. Add atomctl8 field to the struct thread_info and use getex to swap ATOMCTL bit 8 as a part of context switch. Clear exclusive access monitor on kernel entry. Cc: stable@vger.kernel.org Fixes: f7c34874f04a ("xtensa: add exclusive atomics support") Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-21arm64: perf: Correct the event index in sysfsShaokun Zhang1-5/+8
commit 539707caa1a89ee4efc57b4e4231c20c46575ccc upstream. When PMU event ID is equal or greater than 0x4000, it will be reduced by 0x4000 and it is not the raw number in the sysfs. Let's correct it and obtain the raw event ID. Before this patch: cat /sys/bus/event_source/devices/armv8_pmuv3_0/events/sample_feed event=0x001 After this patch: cat /sys/bus/event_source/devices/armv8_pmuv3_0/events/sample_feed event=0x4001 Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/1592487344-30555-3-git-send-email-zhangshaokun@hisilicon.com [will: fixed formatting of 'if' condition] Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-21arm64: dts: qcom: sc7180: Drop the unused non-MSA SIDSibi Sankar2-2/+2
commit 08257610302159e08fd4f5d33787807374ea63c7 upstream. Having a non-MSA (Modem Self-Authentication) SID bypassed breaks modem sandboxing i.e if a transaction were to originate from it, the hardware memory protections units (XPUs) would fail to flag them (any transaction originating from modem are historically termed as an MSA transaction). Drop the unused non-MSA modem SID on SC7180 SoCs and cheza so that SMMU continues to block them. Tested-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Fixes: bec71ba243e95 ("arm64: dts: qcom: sc7180: Update Q6V5 MSS node") Fixes: 68aee4af5f620 ("arm64: dts: qcom: sdm845-cheza: Add iommus property") Cc: stable@vger.kernel.org Reported-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org> Signed-off-by: Sibi Sankar <sibis@codeaurora.org> Link: https://lore.kernel.org/r/20200630081938.8131-1-sibis@codeaurora.org Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-21genirq/affinity: Make affinity setting if activated opt-inThomas Gleixner1-0/+4
commit f0c7baca180046824e07fc5f1326e83a8fd150c7 upstream. John reported that on a RK3288 system the perf per CPU interrupts are all affine to CPU0 and provided the analysis: "It looks like what happens is that because the interrupts are not per-CPU in the hardware, armpmu_request_irq() calls irq_force_affinity() while the interrupt is deactivated and then request_irq() with IRQF_PERCPU | IRQF_NOBALANCING. Now when irq_startup() runs with IRQ_STARTUP_NORMAL, it calls irq_setup_affinity() which returns early because IRQF_PERCPU and IRQF_NOBALANCING are set, leaving the interrupt on its original CPU." This was broken by the recent commit which blocked interrupt affinity setting in hardware before activation of the interrupt. While this works in general, it does not work for this particular case. As contrary to the initial analysis not all interrupt chip drivers implement an activate callback, the safe cure is to make the deferred interrupt affinity setting at activation time opt-in. Implement the necessary core logic and make the two irqchip implementations for which this is required opt-in. In hindsight this would have been the right thing to do, but ... Fixes: baedb87d1b53 ("genirq/affinity: Handle affinity setting on inactive interrupts correctly") Reported-by: John Keeping <john@metanate.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Marc Zyngier <maz@kernel.org> Acked-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/87blk4tzgm.fsf@nanos.tec.linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-19s390/gmap: improve THP splittingGerald Schaefer1-7/+20
commit ba925fa35057a062ac98c3e8138b013ce4ce351c upstream. During s390_enable_sie(), we need to take care of splitting all qemu user process THP mappings. This is currently done with follow_page(FOLL_SPLIT), by simply iterating over all vma ranges, with PAGE_SIZE increment. This logic is sub-optimal and can result in a lot of unnecessary overhead, especially when using qemu and ASAN with large shadow map. Ilya reported significant system slow-down with one CPU busy for a long time and overall unresponsiveness. Fix this by using walk_page_vma() and directly calling split_huge_pmd() only for present pmds, which greatly reduces overhead. Cc: <stable@vger.kernel.org> # v5.4+ Reported-by: Ilya Leoshkevich <iii@linux.ibm.com> Tested-by: Ilya Leoshkevich <iii@linux.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-19s390/numa: set node distance to LOCAL_DISTANCEAlexander Gordeev1-6/+0
commit 535e4fc623fab2e09a0653fc3a3e17f382ad0251 upstream. The node distance is hardcoded to 0, which causes a trouble for some user-level applications. In particular, "libnuma" expects the distance of a node to itself as LOCAL_DISTANCE. This update removes the offending node distance override. Cc: <stable@vger.kernel.org> # 4.4 Fixes: 3a368f742da1 ("s390/numa: add core infrastructure") Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-19irqdomain/treewide: Free firmware node after domain removalJon Derrick2-0/+8
commit ec0160891e387f4771f953b888b1fe951398e5d9 upstream. Commit 711419e504eb ("irqdomain: Add the missing assignment of domain->fwnode for named fwnode") unintentionally caused a dangling pointer page fault issue on firmware nodes that were freed after IRQ domain allocation. Commit e3beca48a45b fixed that dangling pointer issue by only freeing the firmware node after an IRQ domain allocation failure. That fix no longer frees the firmware node immediately, but leaves the firmware node allocated after the domain is removed. The firmware node must be kept around through irq_domain_remove, but should be freed it afterwards. Add the missing free operations after domain removal where where appropriate. Fixes: e3beca48a45b ("irqdomain/treewide: Keep firmware node unconditionally allocated") Signed-off-by: Jon Derrick <jonathan.derrick@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> # drivers/pci Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1595363169-7157-1-git-send-email-jonathan.derrick@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-19ARM: 8992/1: Fix unwind_frame for clang-built kernelsNathan Huckleberry1-0/+24
commit b4d5ec9b39f8b31d98f65bc5577b5d15d93795d7 upstream. Since clang does not push pc and sp in function prologues, the current implementation of unwind_frame does not work. By using the previous frame's lr/fp instead of saved pc/sp we get valid unwinds on clang-built kernels. The bounds check on next frame pointer must be changed as well since there are 8 less bytes between frames. This fixes /proc/<pid>/stack. Link: https://github.com/ClangBuiltLinux/linux/issues/912 Reported-by: Miles Chen <miles.chen@mediatek.com> Tested-by: Miles Chen <miles.chen@mediatek.com> Cc: stable@vger.kernel.org Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Nathan Huckleberry <nhuck@google.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-19ARM: dts: exynos: Extend all Exynos5800 A15's OPPs with max voltage dataMarek Szyprowski1-3/+3
commit d644853ff8fcbb7a4e3757f9d8ccc39d930b7e3c upstream. On Exynos5422/5800 the regulator supply for the A15 cores ("vdd_arm") is coupled with the regulator supply for the SoC internal circuits ("vdd_int"), thus all operating points that modify one of those supplies have to specify a triplet of the min/target/max values to properly work with regulator coupling. Fixes: eaffc4de16c6 ("ARM: dts: exynos: Add missing CPU frequencies for Exynos5422/5800") Cc: <stable@vger.kernel.org> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-19parisc: Implement __smp_store_release and __smp_load_acquire barriersJohn David Anglin1-0/+61
commit e96ebd589debd9a6a793608c4ec7019c38785dea upstream. This patch implements the __smp_store_release and __smp_load_acquire barriers using ordered stores and loads. This avoids the sync instruction present in the generic implementation. Cc: <stable@vger.kernel.org> # 4.14+ Signed-off-by: Dave Anglin <dave.anglin@bell.net> Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-19parisc: Do not use an ordered store in pa_tlb_lock()John David Anglin1-2/+7
commit e72b23dec1da5e62a0090c5da1d926778284e230 upstream. No need to use an ordered store in pa_tlb_lock() and update the comment regarng usage of the sid register to unlocak a spinlock in tlb_unlock0(). Signed-off-by: John David Anglin <dave.anglin@bell.net> Signed-off-by: Helge Deller <deller@gmx.de> Cc: <stable@vger.kernel.org> # v5.0+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>