summaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)AuthorFilesLines
2022-01-20powerpc/pseries: Get entry and uaccess flush required bits from ↵Nicholas Piggin2-0/+8
H_GET_CPU_CHARACTERISTICS commit 65c7d070850e109a8a75a431f5a7f6eb4c007b77 upstream. This allows the hypervisor / firmware to describe these workarounds to the guest. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210503130243.891868-2-npiggin@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-01-20KVM: x86: remove PMU FIXED_CTR3 from msrs_to_save_allWei Wang1-1/+1
commit 9fb12fe5b93b94b9e607509ba461e17f4cc6a264 upstream. The fixed counter 3 is used for the Topdown metrics, which hasn't been enabled for KVM guests. Userspace accessing to it will fail as it's not included in get_fixed_pmc(). This breaks KVM selftests on ICX+ machines, which have this counter. To reproduce it on ICX+ machines, ./state_test reports: ==== Test Assertion Failure ==== lib/x86_64/processor.c:1078: r == nmsrs pid=4564 tid=4564 - Argument list too long 1 0x000000000040b1b9: vcpu_save_state at processor.c:1077 2 0x0000000000402478: main at state_test.c:209 (discriminator 6) 3 0x00007fbe21ed5f92: ?? ??:0 4 0x000000000040264d: _start at ??:? Unexpected result from KVM_GET_MSRS, r: 17 (failed MSR was 0x30c) With this patch, it works well. Signed-off-by: Wei Wang <wei.w.wang@intel.com> Message-Id: <20211217124934.32893-1-wei.w.wang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Fixes: e2ada66ec418 ("kvm: x86: Add Intel PMU MSRs to msrs_to_save[]") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-01-20KVM: s390: Clarify SIGP orders versus STOP/RESTARTEric Farman4-2/+43
commit 812de04661c4daa7ac385c0dfd62594540538034 upstream. With KVM_CAP_S390_USER_SIGP, there are only five Signal Processor orders (CONDITIONAL EMERGENCY SIGNAL, EMERGENCY SIGNAL, EXTERNAL CALL, SENSE, and SENSE RUNNING STATUS) which are intended for frequent use and thus are processed in-kernel. The remainder are sent to userspace with the KVM_CAP_S390_USER_SIGP capability. Of those, three orders (RESTART, STOP, and STOP AND STORE STATUS) have the potential to inject work back into the kernel, and thus are asynchronous. Let's look for those pending IRQs when processing one of the in-kernel SIGP orders, and return BUSY (CC2) if one is in process. This is in agreement with the Principles of Operation, which states that only one order can be "active" on a CPU at a time. Cc: stable@vger.kernel.org Suggested-by: David Hildenbrand <david@redhat.com> Signed-off-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Acked-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20211213210550.856213-2-farman@linux.ibm.com [borntraeger@linux.ibm.com: add stable tag] Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-01-20KVM: x86: Register Processor Trace interrupt hook iff PT enabled in guestSean Christopherson3-1/+6
commit f4b027c5c8199abd4fb6f00d67d380548dbfdfa8 upstream. Override the Processor Trace (PT) interrupt handler for guest mode if and only if PT is configured for host+guest mode, i.e. is being used independently by both host and guest. If PT is configured for system mode, the host fully controls PT and must handle all events. Fixes: 8479e04e7d6b ("KVM: x86: Inject PMI for KVM guest") Reported-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Reported-by: Artem Kashkanov <artem.kashkanov@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20211111020738.2512932-4-seanjc@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-01-20perf: Protect perf_guest_cbs with RCUSean Christopherson7-31/+60
commit ff083a2d972f56bebfd82409ca62e5dfce950961 upstream. Protect perf_guest_cbs with RCU to fix multiple possible errors. Luckily, all paths that read perf_guest_cbs already require RCU protection, e.g. to protect the callback chains, so only the direct perf_guest_cbs touchpoints need to be modified. Bug #1 is a simple lack of WRITE_ONCE/READ_ONCE behavior to ensure perf_guest_cbs isn't reloaded between a !NULL check and a dereference. Fixed via the READ_ONCE() in rcu_dereference(). Bug #2 is that on weakly-ordered architectures, updates to the callbacks themselves are not guaranteed to be visible before the pointer is made visible to readers. Fixed by the smp_store_release() in rcu_assign_pointer() when the new pointer is non-NULL. Bug #3 is that, because the callbacks are global, it's possible for readers to run in parallel with an unregisters, and thus a module implementing the callbacks can be unloaded while readers are in flight, resulting in a use-after-free. Fixed by a synchronize_rcu() call when unregistering callbacks. Bug #1 escaped notice because it's extremely unlikely a compiler will reload perf_guest_cbs in this sequence. perf_guest_cbs does get reloaded for future derefs, e.g. for ->is_user_mode(), but the ->is_in_guest() guard all but guarantees the consumer will win the race, e.g. to nullify perf_guest_cbs, KVM has to completely exit the guest and teardown down all VMs before KVM start its module unload / unregister sequence. This also makes it all but impossible to encounter bug #3. Bug #2 has not been a problem because all architectures that register callbacks are strongly ordered and/or have a static set of callbacks. But with help, unloading kvm_intel can trigger bug #1 e.g. wrapping perf_guest_cbs with READ_ONCE in perf_misc_flags() while spamming kvm_intel module load/unload leads to: BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP CPU: 6 PID: 1825 Comm: stress Not tainted 5.14.0-rc2+ #459 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:perf_misc_flags+0x1c/0x70 Call Trace: perf_prepare_sample+0x53/0x6b0 perf_event_output_forward+0x67/0x160 __perf_event_overflow+0x52/0xf0 handle_pmi_common+0x207/0x300 intel_pmu_handle_irq+0xcf/0x410 perf_event_nmi_handler+0x28/0x50 nmi_handle+0xc7/0x260 default_do_nmi+0x6b/0x170 exc_nmi+0x103/0x130 asm_exc_nmi+0x76/0xbf Fixes: 39447b386c84 ("perf: Enhance perf to allow for guest statistic collection from host") Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20211111020738.2512932-2-seanjc@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-01-16ARM: dts: exynos: Fix BCM4330 Bluetooth reset polarity in I9100Paul Cercueil1-1/+1
commit 9cb6de45a006a9799ec399bce60d64b6d4fcc4af upstream. The reset GPIO was marked active-high, which is against what's specified in the documentation. Mark the reset GPIO as active-low. With this change, Bluetooth can now be used on the i9100. Fixes: 8620cc2f99b7 ("ARM: dts: exynos: Add devicetree file for the Galaxy S2") Cc: stable@vger.kernel.org Signed-off-by: Paul Cercueil <paul@crapouillou.net> Link: https://lore.kernel.org/r/20211031234137.87070-1-paul@crapouillou.net Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-01-11ARM: dts: gpio-ranges property is now requiredPhil Elwell2-0/+4
[ Upstream commit c8013355ead68dce152cf426686f8a5f80d88b40 ] Since [1], added in 5.7, the absence of a gpio-ranges property has prevented GPIOs from being restored to inputs when released. Add those properties for BCM283x and BCM2711 devices. [1] commit 2ab73c6d8323 ("gpio: Support GPIO controllers without pin-ranges") Link: https://lore.kernel.org/r/20220104170247.956760-1-linus.walleij@linaro.org Fixes: 2ab73c6d8323 ("gpio: Support GPIO controllers without pin-ranges") Fixes: 266423e60ea1 ("pinctrl: bcm2835: Change init order for gpio hogs") Reported-by: Stefan Wahren <stefan.wahren@i2se.com> Reported-by: Florian Fainelli <f.fainelli@gmail.com> Reported-by: Jan Kiszka <jan.kiszka@web.de> Signed-off-by: Phil Elwell <phil@raspberrypi.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20211206092237.4105895-3-phil@raspberrypi.com Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Olof Johansson <olof@lixom.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-05parisc: Clear stale IIR value on instruction access rights trapHelge Deller1-0/+2
[ Upstream commit 484730e5862f6b872dca13840bed40fd7c60fa26 ] When a trap 7 (Instruction access rights) occurs, this means the CPU couldn't execute an instruction due to missing execute permissions on the memory region. In this case it seems the CPU didn't even fetched the instruction from memory and thus did not store it in the cr19 (IIR) register before calling the trap handler. So, the trap handler will find some random old stale value in cr19. This patch simply overwrites the stale IIR value with a constant magic "bad food" value (0xbaadf00d), in the hope people don't start to try to understand the various random IIR values in trap 7 dumps. Noticed-by: John David Anglin <dave.anglin@bell.net> Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-29ARM: 9169/1: entry: fix Thumb2 bug in iWMMXt exception handlingArd Biesheuvel1-5/+3
commit 8536a5ef886005bc443c2da9b842d69fd3d7647f upstream. The Thumb2 version of the FP exception handling entry code treats the register holding the CP number (R8) differently, resulting in the iWMMXT CP number check to be incorrect. Fix this by unifying the ARM and Thumb2 code paths, and switch the order of the additions of the TI_USED_CP offset and the shifted CP index. Cc: <stable@vger.kernel.org> Fixes: b86040a59feb ("Thumb-2: Implementation of the unified start-up and exceptions code") Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-29KVM: VMX: Wake vCPU when delivering posted IRQ even if vCPU == this vCPUSean Christopherson1-2/+1
commit fdba608f15e2427419997b0898750a49a735afcb upstream. Drop a check that guards triggering a posted interrupt on the currently running vCPU, and more importantly guards waking the target vCPU if triggering a posted interrupt fails because the vCPU isn't IN_GUEST_MODE. If a vIRQ is delivered from asynchronous context, the target vCPU can be the currently running vCPU and can also be blocking, in which case skipping kvm_vcpu_wake_up() is effectively dropping what is supposed to be a wake event for the vCPU. The "do nothing" logic when "vcpu == running_vcpu" mostly works only because the majority of calls to ->deliver_posted_interrupt(), especially when using posted interrupts, come from synchronous KVM context. But if a device is exposed to the guest using vfio-pci passthrough, the VFIO IRQ and vCPU are bound to the same pCPU, and the IRQ is _not_ configured to use posted interrupts, wake events from the device will be delivered to KVM from IRQ context, e.g. vfio_msihandler() | |-> eventfd_signal() | |-> ... | |-> irqfd_wakeup() | |->kvm_arch_set_irq_inatomic() | |-> kvm_irq_delivery_to_apic_fast() | |-> kvm_apic_set_irq() This also aligns the non-nested and nested usage of triggering posted interrupts, and will allow for additional cleanups. Fixes: 379a3c8ee444 ("KVM: VMX: Optimize posted-interrupt delivery for timer fastpath") Cc: stable@vger.kernel.org Reported-by: Longpeng (Mike) <longpeng2@huawei.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20211208015236.1616697-18-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-29x86/pkey: Fix undefined behaviour with PKRU_WD_BITAndrew Cooper1-2/+2
commit 57690554abe135fee81d6ac33cc94d75a7e224bb upstream. Both __pkru_allows_write() and arch_set_user_pkey_access() shift PKRU_WD_BIT (a signed constant) by up to 30 bits, hitting the sign bit. Use unsigned constants instead. Clearly pkey 15 has not been used in combination with UBSAN yet. Noticed by code inspection only. I can't actually provoke the compiler into generating incorrect logic as far as this shift is concerned. [ dhansen: add stable@ tag, plus minor changelog massaging, For anyone doing backports, these #defines were in arch/x86/include/asm/pgtable.h before 784a46618f6. ] Fixes: 33a709b25a76 ("mm/gup, x86/mm/pkeys: Check VMAs and PTEs for protection keys") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20211216000856.4480-1-andrew.cooper3@citrix.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-29parisc: Fix mask used to select futex spinlockJohn David Anglin1-2/+2
commit d3a5a68cff47f6eead84504c3c28376b85053242 upstream. The address bits used to select the futex spinlock need to match those used in the LWS code in syscall.S. The mask 0x3f8 only selects 7 bits. It should select 8 bits. This change fixes the glibc nptl/tst-cond24 and nptl/tst-cond25 tests. Signed-off-by: John David Anglin <dave.anglin@bell.net> Fixes: 53a42b6324b8 ("parisc: Switch to more fine grained lws locks") Cc: stable@vger.kernel.org # 5.10+ Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-29parisc: Correct completer in lws startJohn David Anglin1-1/+1
commit 8f66fce0f46560b9e910787ff7ad0974441c4f9c upstream. The completer in the "or,ev %r1,%r30,%r30" instruction is reversed, so we are not clipping the LWS number when we are called from a 32-bit process (W=0). We need to nulify the following depdi instruction when the least-significant bit of %r30 is 1. If the %r20 register is not clipped, a user process could perform a LWS call that would branch to an undefined location in the kernel and potentially crash the machine. Signed-off-by: John David Anglin <dave.anglin@bell.net> Cc: stable@vger.kernel.org # 4.19+ Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-29ARM: dts: imx6qdl-wandboard: Fix Ethernet supportMartin Haaß1-0/+1
[ Upstream commit 39e660687ac0c57499134765abbecf71cfd11eae ] Currently, the imx6q-wandboard Ethernet does not transmit any data. This issue has been exposed by commit f5d9aa79dfdf ("ARM: imx6q: remove clk-out fixup for the Atheros AR8031 and AR8035 PHYs"). Fix it by describing the qca,clk-out-frequency property as suggested by the commit above. Fixes: 77591e42458d ("ARM: dts: imx6qdl-wandboard: add ethernet PHY description") Signed-off-by: Martin Haaß <vvvrrooomm@gmail.com> Tested-by: Fabio Estevam <festevam@gmail.com> Signed-off-by: Fabio Estevam <festevam@gmail.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-29arm64: dts: allwinner: orangepi-zero-plus: fix PHY modeRobert Marko1-1/+1
[ Upstream commit 08d2061ff9c5319a07bf9ca6bbf11fdec68f704a ] Orange Pi Zero Plus uses a Realtek RTL8211E RGMII Gigabit PHY, but its currently set to plain RGMII mode meaning that it doesn't introduce delays. With this setup, TX packets are completely lost and changing the mode to RGMII-ID so the PHY will add delays internally fixes the issue. Fixes: a7affb13b271 ("arm64: allwinner: H5: Add Xunlong Orange Pi Zero Plus") Acked-by: Chen-Yu Tsai <wens@csie.org> Tested-by: Ron Goossens <rgoossens@gmail.com> Tested-by: Samuel Holland <samuel@sholland.org> Signed-off-by: Robert Marko <robert.marko@sartura.hr> Signed-off-by: Maxime Ripard <maxime@cerno.tech> Link: https://lore.kernel.org/r/20211117140222.43692-1-robert.marko@sartura.hr Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-29arm64: vdso32: require CROSS_COMPILE_COMPAT for gcc+bfdNick Desaulniers2-13/+7
commit 3e6f8d1fa18457d54b20917bd9174d27daf09ab9 upstream. Similar to commit 231ad7f409f1 ("Makefile: infer --target from ARCH for CC=clang") There really is no point in setting --target based on $CROSS_COMPILE_COMPAT for clang when the integrated assembler is being used, since commit ef94340583ee ("arm64: vdso32: drop -no-integrated-as flag"). Allows COMPAT_VDSO to be selected without setting $CROSS_COMPILE_COMPAT when using clang and lld together. Before: $ ARCH=arm64 CROSS_COMPILE_COMPAT=arm-linux-gnueabi- make -j72 LLVM=1 defconfig $ grep CONFIG_COMPAT_VDSO .config CONFIG_COMPAT_VDSO=y $ ARCH=arm64 make -j72 LLVM=1 defconfig $ grep CONFIG_COMPAT_VDSO .config $ After: $ ARCH=arm64 CROSS_COMPILE_COMPAT=arm-linux-gnueabi- make -j72 LLVM=1 defconfig $ grep CONFIG_COMPAT_VDSO .config CONFIG_COMPAT_VDSO=y $ ARCH=arm64 make -j72 LLVM=1 defconfig $ grep CONFIG_COMPAT_VDSO .config CONFIG_COMPAT_VDSO=y Reviewed-by: Nathan Chancellor <nathan@kernel.org> Suggested-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Link: https://lore.kernel.org/r/20211019223646.1146945-5-ndesaulniers@google.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-29arm64: vdso32: drop -no-integrated-as flagNick Desaulniers1-8/+0
commit ef94340583eec5cb1544dc41a87baa4f684b3fe1 upstream. Clang can assemble these files just fine; this is a relic from the top level Makefile conditionally adding this. We no longer need --prefix, --gcc-toolchain, or -Qunused-arguments flags either with this change, so remove those too. To test building: $ ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- \ CROSS_COMPILE_COMPAT=arm-linux-gnueabi- make LLVM=1 LLVM_IAS=1 \ defconfig arch/arm64/kernel/vdso32/ Suggested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Tested-by: Stephen Boyd <swboyd@chromium.org> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20210420174427.230228-1-ndesaulniers@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-22ARM: dts: imx6ull-pinfunc: Fix CSI_DATA07__ESAI_TX0 pad nameFabio Estevam1-1/+1
commit 737e65c7956795b3553781fb7bc82fce1c39503f upstream. According to the i.MX6ULL Reference Manual, pad CSI_DATA07 may have the ESAI_TX0 functionality, not ESAI_T0. Also, NXP's i.MX Config Tools 10.0 generates dtsi with the MX6ULL_PAD_CSI_DATA07__ESAI_TX0 naming, so fix it accordingly. There are no devicetree users in mainline that use the old name, so just remove the old entry. Fixes: c201369d4aa5 ("ARM: dts: imx6ull: add imx6ull support") Reported-by: George Makarov <georgemakarov1@gmail.com> Signed-off-by: Fabio Estevam <festevam@gmail.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-22KVM: x86: Drop guest CPUID check for host initiated writes to ↵Vitaly Kuznetsov1-1/+1
MSR_IA32_PERF_CAPABILITIES [ Upstream commit 1aa2abb33a419090c7c87d4ae842a6347078ee12 ] The ability to write to MSR_IA32_PERF_CAPABILITIES from the host should not depend on guest visible CPUID entries, even if just to allow creating/restoring guest MSRs and CPUIDs in any sequence. Fixes: 27461da31089 ("KVM: x86/pmu: Support full width counting") Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20211216165213.338923-3-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-22powerpc/85xx: Fix oops when CONFIG_FSL_PMC=nXiaoming Ni1-2/+2
[ Upstream commit 3dc709e518b47386e6af937eaec37bb36539edfd ] When CONFIG_FSL_PMC is set to n, no value is assigned to cpu_up_prepare in the mpc85xx_pm_ops structure. As a result, oops is triggered in smp_85xx_start_cpu(). smp: Bringing up secondary CPUs ... kernel tried to execute user page (0) - exploit attempt? (uid: 0) BUG: Unable to handle kernel instruction fetch (NULL pointer?) Faulting instruction address: 0x00000000 Oops: Kernel access of bad area, sig: 11 [#1] ... NIP [00000000] 0x0 LR [c0021d2c] smp_85xx_kick_cpu+0xe8/0x568 Call Trace: [c1051da8] [c0021cb8] smp_85xx_kick_cpu+0x74/0x568 (unreliable) [c1051de8] [c0011460] __cpu_up+0xc0/0x228 [c1051e18] [c0031bbc] bringup_cpu+0x30/0x224 [c1051e48] [c0031f3c] cpu_up.constprop.0+0x180/0x33c [c1051e88] [c00322e8] bringup_nonboot_cpus+0x88/0xc8 [c1051eb8] [c07e67bc] smp_init+0x30/0x78 [c1051ed8] [c07d9e28] kernel_init_freeable+0x118/0x2a8 [c1051f18] [c00032d8] kernel_init+0x14/0x124 [c1051f38] [c0010278] ret_from_kernel_thread+0x14/0x1c Fixes: c45361abb918 ("powerpc/85xx: fix timebase sync issue when CONFIG_HOTPLUG_CPU=n") Reported-by: Martin Kennedy <hurricos@gmail.com> Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com> Tested-by: Martin Kennedy <hurricos@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211126041153.16926-1-nixiaoming@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-22s390/kexec_file: fix error handling when applying relocationsPhilipp Rudo1-1/+6
[ Upstream commit 41967a37b8eedfee15b81406a9f3015be90d3980 ] arch_kexec_apply_relocations_add currently ignores all errors returned by arch_kexec_do_relocs. This means that every unknown relocation is silently skipped causing unpredictable behavior while the relocated code runs. Fix this by checking for errors and fail kexec_file_load if an unknown relocation type is encountered. The problem was found after gcc changed its behavior and used R_390_PLT32DBL relocations for brasl instruction and relied on ld to resolve the relocations in the final link in case direct calls are possible. As the purgatory code is only linked partially (option -r) ld didn't resolve the relocations leaving them for arch_kexec_do_relocs. But arch_kexec_do_relocs doesn't know how to handle R_390_PLT32DBL relocations so they were silently skipped. This ultimately caused an endless loop in the purgatory as the brasl instructions kept branching to itself. Fixes: 71406883fd35 ("s390/kexec_file: Add kexec_file_load system call") Reported-by: Tao Liu <ltao@redhat.com> Signed-off-by: Philipp Rudo <prudo@redhat.com> Link: https://lore.kernel.org/r/20211208130741.5821-3-prudo@redhat.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-22ARM: socfpga: dts: fix qspi node compatibleDinh Nguyen7-8/+8
[ Upstream commit cb25b11943cbcc5a34531129952870420f8be858 ] The QSPI flash node needs to have the required "jedec,spi-nor" in the compatible string. Fixes: 1df99da8953 ("ARM: dts: socfpga: Enable QSPI in Arria10 devkit") Signed-off-by: Dinh Nguyen <dinguyen@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-22arm64: dts: rockchip: fix audio-supply for Rock Pi 4Alex Bee1-1/+1
[ Upstream commit 8240e87f16d17a9592c9d67857a3dcdbcb98f10d ] As stated in the schematics [1] and [2] P5 the APIO5 domain is supplied by RK808-D Buck4, which in our case vcc1v8_codec - i.e. a 1.8 V regulator. Currently only white noise comes from the ES8316's output, which - for whatever reason - came up only after the the correct switch from i2s0_8ch_bus to i2s0_2ch_bus for i2s0's pinctrl was done. Fix this by setting the correct regulator for audio-supply. [1] https://dl.radxa.com/rockpi4/docs/hw/rockpi4/rockpi4_v13_sch_20181112.pdf [2] https://dl.radxa.com/rockpi4/docs/hw/rockpi4/rockpi_4c_v12_sch_20200620.pdf Fixes: 1b5715c602fd ("arm64: dts: rockchip: add ROCK Pi 4 DTS support") Signed-off-by: Alex Bee <knaerzche@gmail.com> Link: https://lore.kernel.org/r/20211027143726.165809-1-knaerzche@gmail.com Signed-off-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-22arm64: dts: rockchip: fix rk3399-leez-p710 vcc3v3-lan supplyJohn Keeping1-1/+1
[ Upstream commit 2b454a90e2ccdd6e03f88f930036da4df577be76 ] Correct a typo in the vin-supply property. The input supply is always-on, so this mistake doesn't affect whether the supply is actually enabled correctly. Fixes: fc702ed49a86 ("arm64: dts: rockchip: Add dts for Leez RK3399 P710 SBC") Signed-off-by: John Keeping <john@metanate.com> Link: https://lore.kernel.org/r/20211102182908.3409670-3-john@metanate.com Signed-off-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-22arm64: dts: rockchip: fix rk3308-roc-cc vcc-sd supplyJohn Keeping1-1/+1
[ Upstream commit 772fb46109f635dd75db20c86b7eaf48efa46cef ] Correct a typo in the vin-supply property. The input supply is always-on, so this mistake doesn't affect whether the supply is actually enabled correctly. Fixes: 4403e1237be3 ("arm64: dts: rockchip: Add devicetree for board roc-rk3308-cc") Signed-off-by: John Keeping <john@metanate.com> Link: https://lore.kernel.org/r/20211102182908.3409670-2-john@metanate.com Signed-off-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-22arm64: dts: rockchip: remove mmc-hs400-enhanced-strobe from rk3399-khadas-edgeArtem Lapkin1-1/+0
[ Upstream commit 6dd0053683804427529ef3523f7872f473440a19 ] Remove mmc-hs400-enhanced-strobe from the rk3399-khadas-edge dts to improve compatibility with a wider range of eMMC chips. Before (BJTD4R 29.1 GiB): [ 7.001493] mmc2: CQHCI version 5.10 [ 7.027971] mmc2: SDHCI controller on fe330000.mmc [fe330000.mmc] using ADMA ....... [ 7.207086] mmc2: mmc_select_hs400es failed, error -110 [ 7.207129] mmc2: error -110 whilst initialising MMC card [ 7.308893] mmc2: mmc_select_hs400es failed, error -110 [ 7.308921] mmc2: error -110 whilst initialising MMC card [ 7.427524] mmc2: mmc_select_hs400es failed, error -110 [ 7.427546] mmc2: error -110 whilst initialising MMC card [ 7.590993] mmc2: mmc_select_hs400es failed, error -110 [ 7.591012] mmc2: error -110 whilst initialising MMC card After: [ 6.960785] mmc2: CQHCI version 5.10 [ 6.984672] mmc2: SDHCI controller on fe330000.mmc [fe330000.mmc] using ADMA [ 7.175021] mmc2: Command Queue Engine enabled [ 7.175053] mmc2: new HS400 MMC card at address 0001 [ 7.175808] mmcblk2: mmc2:0001 BJTD4R 29.1 GiB [ 7.176033] mmcblk2boot0: mmc2:0001 BJTD4R 4.00 MiB [ 7.176245] mmcblk2boot1: mmc2:0001 BJTD4R 4.00 MiB [ 7.176495] mmcblk2rpmb: mmc2:0001 BJTD4R 4.00 MiB, chardev (242:0) Fixes: c2aacceedc86 ("arm64: dts: rockchip: Add support for Khadas Edge/Edge-V/Captain boards") Signed-off-by: Artem Lapkin <art@khadas.com> Link: https://lore.kernel.org/r/20211115083321.2627461-1-art@khadas.com Signed-off-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-22arm64: dts: imx8mp-evk: Improve the Ethernet PHY descriptionFabio Estevam1-0/+2
commit 798a1807ab13a38e21c6fecd8d22a513d6786e2d upstream. According to the datasheet RTL8211, it must be asserted low for at least 10ms and at least 72ms "for internal circuits settling time" before accessing the PHY registers. Add properties to describe such requirements. Reported-by: Joakim Zhang <qiangqing.zhang@nxp.com> Signed-off-by: Fabio Estevam <festevam@gmail.com> Tested-by: Joakim Zhang <qiangqing.zhang@nxp.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org> Cc: Rasmus Villemoes <rasmus.villemoes@prevas.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-22arm64: dts: imx8m: correct assigned clocks for FECJoakim Zhang3-9/+12
commit 70eacf42a93aff6589a8b91279bbfe5f73c4ca3d upstream. CLK_ENET_TIMER assigned clocks twice, should be a typo, correct to CLK_ENET_PHY_REF clock. Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org> Cc: Rasmus Villemoes <rasmus.villemoes@prevas.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-17arm: ioremap: don't abuse pfn_valid() to check if pfn is in RAMMike Rapoport1-1/+3
commit 024591f9a6e0164ec23301784d1e6d8f6cacbe59 upstream. [ Upstream commit 024591f9a6e0164ec23301784d1e6d8f6cacbe59 ] The semantics of pfn_valid() is to check presence of the memory map for a PFN and not whether a PFN is in RAM. The memory map may be present for a hole in the physical memory and if such hole corresponds to an MMIO range, __arm_ioremap_pfn_caller() will produce a WARN() and fail: [ 2.863406] WARNING: CPU: 0 PID: 1 at arch/arm/mm/ioremap.c:287 __arm_ioremap_pfn_caller+0xf0/0x1dc [ 2.864812] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.13.0-09882-ga180bd1d7e16 #1 [ 2.865263] Hardware name: Generic DT based system [ 2.865711] Backtrace: [ 2.866063] [<80b07e58>] (dump_backtrace) from [<80b080ac>] (show_stack+0x20/0x24) [ 2.866633] r7:00000009 r6:0000011f r5:60000153 r4:80ddd1c0 [ 2.866922] [<80b0808c>] (show_stack) from [<80b18df0>] (dump_stack_lvl+0x58/0x74) [ 2.867117] [<80b18d98>] (dump_stack_lvl) from [<80b18e20>] (dump_stack+0x14/0x1c) [ 2.867309] r5:80118cac r4:80dc6774 [ 2.867404] [<80b18e0c>] (dump_stack) from [<80122fcc>] (__warn+0xe4/0x150) [ 2.867583] [<80122ee8>] (__warn) from [<80b08850>] (warn_slowpath_fmt+0x88/0xc0) [ 2.867774] r7:0000011f r6:80dc6774 r5:00000000 r4:814c4000 [ 2.867917] [<80b087cc>] (warn_slowpath_fmt) from [<80118cac>] (__arm_ioremap_pfn_caller+0xf0/0x1dc) [ 2.868158] r9:00000001 r8:9ef00000 r7:80e8b0d4 r6:0009ef00 r5:00000000 r4:00100000 [ 2.868346] [<80118bbc>] (__arm_ioremap_pfn_caller) from [<80118df8>] (__arm_ioremap_caller+0x60/0x68) [ 2.868581] r9:9ef00000 r8:821b6dc0 r7:00100000 r6:00000000 r5:815d1010 r4:80118d98 [ 2.868761] [<80118d98>] (__arm_ioremap_caller) from [<80118fcc>] (ioremap+0x28/0x30) [ 2.868958] [<80118fa4>] (ioremap) from [<8062871c>] (__devm_ioremap_resource+0x154/0x1c8) [ 2.869169] r5:815d1010 r4:814c5d2c [ 2.869263] [<806285c8>] (__devm_ioremap_resource) from [<8062899c>] (devm_ioremap_resource+0x14/0x18) [ 2.869495] r9:9e9f57a0 r8:814c4000 r7:815d1000 r6:815d1010 r5:8177c078 r4:815cf400 [ 2.869676] [<80628988>] (devm_ioremap_resource) from [<8091c6e4>] (fsi_master_acf_probe+0x1a8/0x5d8) [ 2.869909] [<8091c53c>] (fsi_master_acf_probe) from [<80723dbc>] (platform_probe+0x68/0xc8) [ 2.870124] r9:80e9dadc r8:00000000 r7:815d1010 r6:810c1000 r5:815d1010 r4:00000000 [ 2.870306] [<80723d54>] (platform_probe) from [<80721208>] (really_probe+0x1cc/0x470) [ 2.870512] r7:815d1010 r6:810c1000 r5:00000000 r4:815d1010 [ 2.870651] [<8072103c>] (really_probe) from [<807215cc>] (__driver_probe_device+0x120/0x1fc) [ 2.870872] r7:815d1010 r6:810c1000 r5:810c1000 r4:815d1010 [ 2.871013] [<807214ac>] (__driver_probe_device) from [<807216e8>] (driver_probe_device+0x40/0xd8) [ 2.871244] r9:80e9dadc r8:00000000 r7:815d1010 r6:810c1000 r5:812feaa0 r4:812fe994 [ 2.871428] [<807216a8>] (driver_probe_device) from [<80721a58>] (__driver_attach+0xa8/0x1d4) [ 2.871647] r9:80e9dadc r8:00000000 r7:00000000 r6:810c1000 r5:815d1054 r4:815d1010 [ 2.871830] [<807219b0>] (__driver_attach) from [<8071ee8c>] (bus_for_each_dev+0x88/0xc8) [ 2.872040] r7:00000000 r6:814c4000 r5:807219b0 r4:810c1000 [ 2.872194] [<8071ee04>] (bus_for_each_dev) from [<80722208>] (driver_attach+0x28/0x30) [ 2.872418] r7:810a2aa0 r6:00000000 r5:821b6000 r4:810c1000 [ 2.872570] [<807221e0>] (driver_attach) from [<8071f80c>] (bus_add_driver+0x114/0x200) [ 2.872788] [<8071f6f8>] (bus_add_driver) from [<80722ec4>] (driver_register+0x98/0x128) [ 2.873011] r7:81011d0c r6:814c4000 r5:00000000 r4:810c1000 [ 2.873167] [<80722e2c>] (driver_register) from [<80725240>] (__platform_driver_register+0x2c/0x34) [ 2.873408] r5:814dcb80 r4:80f2a764 [ 2.873513] [<80725214>] (__platform_driver_register) from [<80f2a784>] (fsi_master_acf_init+0x20/0x28) [ 2.873766] [<80f2a764>] (fsi_master_acf_init) from [<80f014a8>] (do_one_initcall+0x108/0x290) [ 2.874007] [<80f013a0>] (do_one_initcall) from [<80f01840>] (kernel_init_freeable+0x1ac/0x230) [ 2.874248] r9:80e9dadc r8:80f3987c r7:80f3985c r6:00000007 r5:814dcb80 r4:80f627a4 [ 2.874456] [<80f01694>] (kernel_init_freeable) from [<80b19f44>] (kernel_init+0x20/0x138) [ 2.874691] r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:80b19f24 [ 2.874894] r4:00000000 [ 2.874977] [<80b19f24>] (kernel_init) from [<80100170>] (ret_from_fork+0x14/0x24) [ 2.875231] Exception stack(0x814c5fb0 to 0x814c5ff8) [ 2.875535] 5fa0: 00000000 00000000 00000000 00000000 [ 2.875849] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 2.876133] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000 [ 2.876363] r5:80b19f24 r4:00000000 [ 2.876683] ---[ end trace b2f74b8536829970 ]--- [ 2.876911] fsi-master-acf gpio-fsi: ioremap failed for resource [mem 0x9ef00000-0x9effffff] [ 2.877492] fsi-master-acf gpio-fsi: Error -12 mapping coldfire memory [ 2.877689] fsi-master-acf: probe of gpio-fsi failed with error -12 Use memblock_is_map_memory() instead of pfn_valid() to check if a PFN is in RAM or not. Reported-by: Guenter Roeck <linux@roeck-us.net> Fixes: a4d5613c4dc6 ("arm: extend pfn_valid to take into account freed memory map alignment") Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Tested-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/lkml/20210630071211.21011-1-rppt@kernel.org/ Signed-off-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-17arm: extend pfn_valid to take into account freed memory map alignmentMike Rapoport1-1/+12
[ Upstream commit a4d5613c4dc6d413e0733e37db9d116a2a36b9f3 ] When unused memory map is freed the preserved part of the memory map is extended to match pageblock boundaries because lots of core mm functionality relies on homogeneity of the memory map within pageblock boundaries. Since pfn_valid() is used to check whether there is a valid memory map entry for a PFN, make it return true also for PFNs that have memory map entries even if there is no actual memory populated there. Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Tested-by: Kefeng Wang <wangkefeng.wang@huawei.com> Tested-by: Tony Lindgren <tony@atomide.com> Link: https://lore.kernel.org/lkml/20210630071211.21011-1-rppt@kernel.org/ Signed-off-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-17memblock: align freed memory map on pageblock boundaries with SPARSEMEMMike Rapoport1-3/+5
[ Upstream commit f921f53e089a12a192808ac4319f28727b35dc0f ] When CONFIG_SPARSEMEM=y the ranges of the memory map that are freed are not aligned to the pageblock boundaries which breaks assumptions about homogeneity of the memory map throughout core mm code. Make sure that the freed memory map is always aligned on pageblock boundaries regardless of the memory model selection. Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Tested-by: Tony Lindgren <tony@atomide.com> Link: https://lore.kernel.org/lkml/20210630071211.21011-1-rppt@kernel.org/ [backport upstream modification in mm/memblock.c to arch/arm/mm/init.c] Signed-off-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-17memblock: free_unused_memmap: use pageblock units instead of MAX_ORDERMike Rapoport1-8/+8
[ Upstream commit e2a86800d58639b3acde7eaeb9eb393dca066e08 ] The code that frees unused memory map uses rounds start and end of the holes that are freed to MAX_ORDER_NR_PAGES to preserve continuity of the memory map for MAX_ORDER regions. Lots of core memory management functionality relies on homogeneity of the memory map within each pageblock which size may differ from MAX_ORDER in certain configurations. Although currently, for the architectures that use free_unused_memmap(), pageblock_order and MAX_ORDER are equivalent, it is cleaner to have common notation thought mm code. Replace MAX_ORDER_NR_PAGES with pageblock_nr_pages and update the comments to make it more clear why the alignment to pageblock boundaries is required. Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Tested-by: Tony Lindgren <tony@atomide.com> Link: https://lore.kernel.org/lkml/20210630071211.21011-1-rppt@kernel.org/ [backport upstream modification in mm/memblock.c to arch/arm/mm/init.c] Signed-off-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-17KVM: x86: Ignore sparse banks size for an "all CPUs", non-sparse IPI reqSean Christopherson1-2/+5
commit 3244867af8c065e51969f1bffe732d3ebfd9a7d2 upstream. Do not bail early if there are no bits set in the sparse banks for a non-sparse, a.k.a. "all CPUs", IPI request. Per the Hyper-V spec, it is legal to have a variable length of '0', e.g. VP_SET's BankContents in this case, if the request can be serviced without the extra info. It is possible that for a given invocation of a hypercall that does accept variable sized input headers that all the header input fits entirely within the fixed size header. In such cases the variable sized input header is zero-sized and the corresponding bits in the hypercall input should be set to zero. Bailing early results in KVM failing to send IPIs to all CPUs as expected by the guest. Fixes: 214ff83d4473 ("KVM: x86: hyperv: implement PV IPI send hypercalls") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20211207220926.718794-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-17s390/test_unwind: use raw opcode instead of invalid instructionIlie Halip1-2/+3
[ Upstream commit 53ae7230918154d1f4281d7aa3aae9650436eadf ] Building with clang & LLVM_IAS=1 leads to an error: arch/s390/lib/test_unwind.c:179:4: error: invalid register pair " mvcl %%r1,%%r1\n" ^ The test creates an invalid instruction that would trap at runtime, but the LLVM inline assembler tries to validate it at compile time too. Use the raw instruction opcode instead. Reported-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Ilie Halip <ilie.halip@gmail.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Suggested-by: Ulrich Weigand <Ulrich.Weigand@de.ibm.com> Link: https://github.com/ClangBuiltLinux/linux/issues/1421 Link: https://lore.kernel.org/r/20211117174822.3632412-1-ilie.halip@gmail.com Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> [hca@linux.ibm.com: use illegal opcode, and update comment] Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-17KVM: arm64: Save PSTATE early on exitMarc Zyngier2-1/+12
[ Upstream commit 83bb2c1a01d7127d5adc7d69d7aaa3f7072de2b4 ] In order to be able to use primitives such as vcpu_mode_is_32bit(), we need to synchronize the guest PSTATE. However, this is currently done deep into the bowels of the world-switch code, and we do have helpers evaluating this much earlier (__vgic_v3_perform_cpuif_access and handle_aarch32_guest, for example). Move the saving of the guest pstate into the early fixups, which cures the first issue. The second one will be addressed separately. Tested-by: Fuad Tabba <tabba@google.com> Reviewed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-14csky: fix typo of fpu config macroKelly Devilliv1-2/+2
commit a0793fdad9a11a32bc6d21317c93c83f4aa82ebc upstream. Fix typo which will cause fpe and privilege exception error. Signed-off-by: Kelly Devilliv <kelly.devilliv@gmail.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-14KVM: x86: Wait for IPIs to be delivered when handling Hyper-V TLB flush ↵Vitaly Kuznetsov1-1/+1
hypercall commit 1ebfaa11ebb5b603a3c3f54b2e84fcf1030f5a14 upstream. Prior to commit 0baedd792713 ("KVM: x86: make Hyper-V PV TLB flush use tlb_flush_guest()"), kvm_hv_flush_tlb() was using 'KVM_REQ_TLB_FLUSH | KVM_REQUEST_NO_WAKEUP' when making a request to flush TLBs on other vCPUs and KVM_REQ_TLB_FLUSH is/was defined as: (0 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP) so KVM_REQUEST_WAIT was lost. Hyper-V TLFS, however, requires that "This call guarantees that by the time control returns back to the caller, the observable effects of all flushes on the specified virtual processors have occurred." and without KVM_REQUEST_WAIT there's a small chance that the vCPU making the TLB flush will resume running before all IPIs get delivered to other vCPUs and a stale mapping can get read there. Fix the issue by adding KVM_REQUEST_WAIT flag to KVM_REQ_TLB_FLUSH_GUEST: kvm_hv_flush_tlb() is the sole caller which uses it for kvm_make_all_cpus_request()/kvm_make_vcpus_request_mask() where KVM_REQUEST_WAIT makes a difference. Cc: stable@kernel.org Fixes: 0baedd792713 ("KVM: x86: make Hyper-V PV TLB flush use tlb_flush_guest()") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20211209102937.584397-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-14x86/sme: Explicitly map new EFI memmap table as encryptedTom Lendacky2-1/+3
commit 1ff2fc02862d52e18fd3daabcfe840ec27e920a8 upstream. Reserving memory using efi_mem_reserve() calls into the x86 efi_arch_mem_reserve() function. This function will insert a new EFI memory descriptor into the EFI memory map representing the area of memory to be reserved and marking it as EFI runtime memory. As part of adding this new entry, a new EFI memory map is allocated and mapped. The mapping is where a problem can occur. This new memory map is mapped using early_memremap() and generally mapped encrypted, unless the new memory for the mapping happens to come from an area of memory that is marked as EFI_BOOT_SERVICES_DATA memory. In this case, the new memory will be mapped unencrypted. However, during replacement of the old memory map, efi_mem_type() is disabled, so the new memory map will now be long-term mapped encrypted (in efi.memmap), resulting in the map containing invalid data and causing the kernel boot to crash. Since it is known that the area will be mapped encrypted going forward, explicitly map the new memory map as encrypted using early_memremap_prot(). Cc: <stable@vger.kernel.org> # 4.14.x Fixes: 8f716c9b5feb ("x86/mm: Add support to access boot related data in the clear") Link: https://lore.kernel.org/all/ebf1eb2940405438a09d51d121ec0d02c8755558.1634752931.git.thomas.lendacky@amd.com/ Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> [ardb: incorporate Kconfig fix by Arnd] Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-08parisc: Mark cr16 CPU clocksource unstable on all SMP machinesHelge Deller1-19/+5
commit afdb4a5b1d340e4afffc65daa21cc71890d7d589 upstream. In commit c8c3735997a3 ("parisc: Enhance detection of synchronous cr16 clocksources") I assumed that CPUs on the same physical core are syncronous. While booting up the kernel on two different C8000 machines, one with a dual-core PA8800 and one with a dual-core PA8900 CPU, this turned out to be wrong. The symptom was that I saw a jump in the internal clocks printed to the syslog and strange overall behaviour. On machines which have 4 cores (2 dual-cores) the problem isn't visible, because the current logic already marked the cr16 clocksource unstable in this case. This patch now marks the cr16 interval timers unstable if we have more than one CPU in the system, and it fixes this issue. Fixes: c8c3735997a3 ("parisc: Enhance detection of synchronous cr16 clocksources") Signed-off-by: Helge Deller <deller@gmx.de> Cc: <stable@vger.kernel.org> # v5.15+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-08x86/64/mm: Map all kernel memory into trampoline_pgdJoerg Roedel1-1/+11
commit 51523ed1c26758de1af7e58730a656875f72f783 upstream. The trampoline_pgd only maps the 0xfffffff000000000-0xffffffffffffffff range of kernel memory (with 4-level paging). This range contains the kernel's text+data+bss mappings and the module mapping space but not the direct mapping and the vmalloc area. This is enough to get the application processors out of real-mode, but for code that switches back to real-mode the trampoline_pgd is missing important parts of the address space. For example, consider this code from arch/x86/kernel/reboot.c, function machine_real_restart() for a 64-bit kernel: #ifdef CONFIG_X86_32 load_cr3(initial_page_table); #else write_cr3(real_mode_header->trampoline_pgd); /* Exiting long mode will fail if CR4.PCIDE is set. */ if (boot_cpu_has(X86_FEATURE_PCID)) cr4_clear_bits(X86_CR4_PCIDE); #endif /* Jump to the identity-mapped low memory code */ #ifdef CONFIG_X86_32 asm volatile("jmpl *%0" : : "rm" (real_mode_header->machine_real_restart_asm), "a" (type)); #else asm volatile("ljmpl *%0" : : "m" (real_mode_header->machine_real_restart_asm), "D" (type)); #endif The code switches to the trampoline_pgd, which unmaps the direct mapping and also the kernel stack. The call to cr4_clear_bits() will find no stack and crash the machine. The real_mode_header pointer below points into the direct mapping, and dereferencing it also causes a crash. The reason this does not crash always is only that kernel mappings are global and the CR3 switch does not flush those mappings. But if theses mappings are not in the TLB already, the above code will crash before it can jump to the real-mode stub. Extend the trampoline_pgd to contain all kernel mappings to prevent these crashes and to make code which runs on this page-table more robust. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20211202153226.22946-5-joro@8bytes.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-08x86/tsc: Disable clocksource watchdog for TSC on qualified platormsFeng Tang1-4/+24
commit b50db7095fe002fa3e16605546cba66bf1b68a3e upstream. There are cases that the TSC clocksource is wrongly judged as unstable by the clocksource watchdog mechanism which tries to validate the TSC against HPET, PM_TIMER or jiffies. While there is hardly a general reliable way to check the validity of a watchdog, Thomas Gleixner proposed [1]: "I'm inclined to lift that requirement when the CPU has: 1) X86_FEATURE_CONSTANT_TSC 2) X86_FEATURE_NONSTOP_TSC 3) X86_FEATURE_NONSTOP_TSC_S3 4) X86_FEATURE_TSC_ADJUST 5) At max. 4 sockets After two decades of horrors we're finally at a point where TSC seems to be halfway reliable and less abused by BIOS tinkerers. TSC_ADJUST was really key as we can now detect even small modifications reliably and the important point is that we can cure them as well (not pretty but better than all other options)." As feature #3 X86_FEATURE_NONSTOP_TSC_S3 only exists on several generations of Atom processorz, and is always coupled with X86_FEATURE_CONSTANT_TSC and X86_FEATURE_NONSTOP_TSC, skip checking it, and also be more defensive to use maximal 2 sockets. The check is done inside tsc_init() before registering 'tsc-early' and 'tsc' clocksources, as there were cases that both of them had been wrongly judged as unreliable. For more background of tsc/watchdog, there is a good summary in [2] [tglx} Update vs. jiffies: On systems where the only remaining clocksource aside of TSC is jiffies there is no way to make this work because that creates a circular dependency. Jiffies accuracy depends on not missing a periodic timer interrupt, which is not guaranteed. That could be detected by TSC, but as TSC is not trusted this cannot be compensated. The consequence is a circulus vitiosus which results in shutting down TSC and falling back to the jiffies clocksource which is even more unreliable. [1]. https://lore.kernel.org/lkml/87eekfk8bd.fsf@nanos.tec.linutronix.de/ [2]. https://lore.kernel.org/lkml/87a6pimt1f.ffs@nanos.tec.linutronix.de/ [ tglx: Refine comment and amend changelog ] Fixes: 6e3cd95234dc ("x86/hpet: Use another crystalball to evaluate HPET usability") Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Feng Tang <feng.tang@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20211117023751.24190-2-feng.tang@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-08x86/tsc: Add a timer to make sure TSC_adjust is always checkedFeng Tang1-0/+41
commit c7719e79347803b8e3b6b50da8c6db410a3012b5 upstream. The TSC_ADJUST register is checked every time a CPU enters idle state, but Thomas Gleixner mentioned there is still a caveat that a system won't enter idle [1], either because it's too busy or configured purposely to not enter idle. Setup a periodic timer (every 10 minutes) to make sure the check is happening on a regular base. [1] https://lore.kernel.org/lkml/875z286xtk.fsf@nanos.tec.linutronix.de/ Fixes: 6e3cd95234dc ("x86/hpet: Use another crystalball to evaluate HPET usability") Requested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Feng Tang <feng.tang@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20211117023751.24190-1-feng.tang@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-08parisc: Fix "make install" on newer debian releasesHelge Deller1-0/+1
commit 0f9fee4cdebfbe695c297e5b603a275e2557c1cc upstream. On newer debian releases the debian-provided "installkernel" script is installed in /usr/sbin. Fix the kernel install.sh script to look for the script in this directory as well. Signed-off-by: Helge Deller <deller@gmx.de> Cc: <stable@vger.kernel.org> # v3.13+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-08parisc: Fix KBUILD_IMAGE for self-extracting kernelHelge Deller1-0/+5
commit 1d7c29b77725d05faff6754d2f5e7c147aedcf93 upstream. Default KBUILD_IMAGE to $(boot)/bzImage if a self-extracting (CONFIG_PARISC_SELF_EXTRACT=y) kernel is to be built. This fixes the bindeb-pkg make target. Signed-off-by: Helge Deller <deller@gmx.de> Cc: <stable@vger.kernel.org> # v4.14+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-08x86/entry: Add a fence for kernel entry SWAPGS in paranoid_entry()Lai Jiangshan1-11/+5
[ Upstream commit c07e45553da1808aa802e9f0ffa8108cfeaf7a17 ] Commit 18ec54fdd6d18 ("x86/speculation: Prepare entry code for Spectre v1 swapgs mitigations") added FENCE_SWAPGS_{KERNEL|USER}_ENTRY for conditional SWAPGS. In paranoid_entry(), it uses only FENCE_SWAPGS_KERNEL_ENTRY for both branches. This is because the fence is required for both cases since the CR3 write is conditional even when PTI is enabled. But 96b2371413e8f ("x86/entry/64: Switch CR3 before SWAPGS in paranoid entry") changed the order of SWAPGS and the CR3 write. And it missed the needed FENCE_SWAPGS_KERNEL_ENTRY for the user gsbase case. Add it back by changing the branches so that FENCE_SWAPGS_KERNEL_ENTRY can cover both branches. [ bp: Massage, fix typos, remove obsolete comment while at it. ] Fixes: 96b2371413e8f ("x86/entry/64: Switch CR3 before SWAPGS in paranoid entry") Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211126101209.8613-2-jiangshanlai@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-08x86/pv: Switch SWAPGS to ALTERNATIVEJuergen Gross8-47/+13
[ Upstream commit 53c9d9240944088274aadbbbafc6138ca462db4f ] SWAPGS is used only for interrupts coming from user mode or for returning to user mode. So there is no reason to use the PARAVIRT framework, as it can easily be replaced by an ALTERNATIVE depending on X86_FEATURE_XENPV. There are several instances using the PV-aware SWAPGS macro in paths which are never executed in a Xen PV guest. Replace those with the plain swapgs instruction. For SWAPGS_UNSAFE_STACK the same applies. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Andy Lutomirski <luto@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210120135555.32594-5-jgross@suse.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-08x86/xen: Add xenpv_restore_regs_and_return_to_usermode()Lai Jiangshan2-0/+24
[ Upstream commit 5c8f6a2e316efebb3ba93d8c1af258155dcf5632 ] In the native case, PER_CPU_VAR(cpu_tss_rw + TSS_sp0) is the trampoline stack. But XEN pv doesn't use trampoline stack, so PER_CPU_VAR(cpu_tss_rw + TSS_sp0) is also the kernel stack. In that case, source and destination stacks are identical, which means that reusing swapgs_restore_regs_and_return_to_usermode() in XEN pv would cause %rsp to move up to the top of the kernel stack and leave the IRET frame below %rsp. This is dangerous as it can be corrupted if #NMI / #MC hit as either of these events occurring in the middle of the stack pushing would clobber data on the (original) stack. And, with XEN pv, swapgs_restore_regs_and_return_to_usermode() pushing the IRET frame on to the original address is useless and error-prone when there is any future attempt to modify the code. [ bp: Massage commit message. ] Fixes: 7f2590a110b8 ("x86/entry/64: Use a per-CPU trampoline stack for IDT entries") Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lkml.kernel.org/r/20211126101209.8613-4-jiangshanlai@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-08x86/entry: Use the correct fence macro after swapgs in kernel CR3Lai Jiangshan1-7/+8
[ Upstream commit 1367afaa2ee90d1c956dfc224e199fcb3ff3f8cc ] The commit c75890700455 ("x86/entry/64: Remove unneeded kernel CR3 switching") removed a CR3 write in the faulting path of load_gs_index(). But the path's FENCE_SWAPGS_USER_ENTRY has no fence operation if PTI is enabled, see spectre_v1_select_mitigation(). Rather, it depended on the serializing CR3 write of SWITCH_TO_KERNEL_CR3 and since it got removed, add a FENCE_SWAPGS_KERNEL_ENTRY call to make sure speculation is blocked. [ bp: Massage commit message and comment. ] Fixes: c75890700455 ("x86/entry/64: Remove unneeded kernel CR3 switching") Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20211126101209.8613-3-jiangshanlai@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-08x86/sev: Fix SEV-ES INS/OUTS instructions for word, dword, and qwordMichael Sterritt1-18/+39
[ Upstream commit 1d5379d0475419085d3575bd9155f2e558e96390 ] Properly type the operands being passed to __put_user()/__get_user(). Otherwise, these routines truncate data for dependent instructions (e.g., INSW) and only read/write one byte. This has been tested by sending a string with REP OUTSW to a port and then reading it back in with REP INSW on the same port. Previous behavior was to only send and receive the first char of the size. For example, word operations for "abcd" would only read/write "ac". With change, the full string is now written and read back. Fixes: f980f9c31a923 (x86/sev-es: Compile early handler code into kernel image) Signed-off-by: Michael Sterritt <sterritt@google.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Marc Orr <marcorr@google.com> Reviewed-by: Peter Gonda <pgonda@google.com> Reviewed-by: Joerg Roedel <jroedel@suse.de> Link: https://lkml.kernel.org/r/20211119232757.176201-1-sterritt@google.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-08KVM: VMX: Set failure code in prepare_vmcs02()Dan Carpenter1-1/+3
[ Upstream commit bfbb307c628676929c2d329da0daf9d22afa8ad2 ] The error paths in the prepare_vmcs02() function are supposed to set *entry_failure_code but this path does not. It leads to using an uninitialized variable in the caller. Fixes: 71f7347025bf ("KVM: nVMX: Load GUEST_IA32_PERF_GLOBAL_CTRL MSR on VM-Entry") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Message-Id: <20211130125337.GB24578@kili> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>