path: root/arch
2020-05-20  x86: Fix early boot crash on gcc-10, third try  (Borislav Petkov; 3 files changed, +15/-1)
commit a9a3ed1eff3601b63aea4fb462d8b3b92c7c1e7e upstream. ... or the odyssey of trying to disable the stack protector for the function which generates the stack canary value. The whole story started with Sergei reporting a boot crash with a kernel built with gcc-10: Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: start_secondary CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.6.0-rc5-00235-gfffb08b37df9 #139 Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./H77M-D3H, BIOS F12 11/14/2013 Call Trace: dump_stack panic ? start_secondary __stack_chk_fail start_secondary secondary_startup_64 ---[ end Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: start_secondary This happens because gcc-10 tail-call optimizes the last function call in start_secondary() - cpu_startup_entry() - and thus emits a stack canary check which fails because the canary value changes after the boot_init_stack_canary() call. To fix that, the initial attempt was to mark the one function which generates the stack canary with: __attribute__((optimize("-fno-stack-protector"))) ... start_secondary(void *unused) however, using the optimize attribute doesn't work cumulatively as the attribute does not add to but rather replaces previously supplied optimization options - roughly all -fxxx options. The key one among them being -fno-omit-frame-pointer and thus leading to not present frame pointer - frame pointer which the kernel needs. The next attempt to prevent compilers from tail-call optimizing the last function call cpu_startup_entry(), shy of carving out start_secondary() into a separate compilation unit and building it with -fno-stack-protector, was to add an empty asm(""). This current solution was short and sweet, and reportedly, is supported by both compilers but we didn't get very far this time: future (LTO?) optimization passes could potentially eliminate this, which leads us to the third attempt: having an actual memory barrier there which the compiler cannot ignore or move around etc. That should hold for a long time, but hey we said that about the other two solutions too so... Reported-by: Sergei Trofimovich <slyfox@gentoo.org> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Kalle Valo <kvalo@codeaurora.org> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20200314164451.346497-1-slyfox@gentoo.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
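The eventual fix is small; a minimal sketch of the third attempt described above, assuming the upstream macro name prevent_tail_call_optimization() and with the CPU bring-up code elided:

    #include <asm/barrier.h>        /* mb() */

    /* an actual memory barrier the compiler cannot elide or move around */
    #define prevent_tail_call_optimization()        mb()

    static void notrace start_secondary(void *unused)
    {
            /* ... secondary CPU bring-up elided ... */

            /* the stack canary value changes here ... */
            boot_init_stack_canary();

            cpu_startup_entry(CPUHP_AP_ONLINE_IDLE);
            /* ... so the call above must not be tail-call optimized */
            prevent_tail_call_optimization();
    }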
2020-05-20  ARM: dts: imx27-phytec-phycard-s-rdk: Fix the I2C1 pinctrl entries  (Fabio Estevam; 1 file changed, +2/-2)
commit 0caf34350a25907515d929a9c77b9b206aac6d1e upstream. The I2C2 pins are already used and the following errors are seen: imx27-pinctrl 10015000.iomuxc: pin MX27_PAD_I2C2_SDA already requested by 10012000.i2c; cannot claim for 1001d000.i2c imx27-pinctrl 10015000.iomuxc: pin-69 (1001d000.i2c) status -22 imx27-pinctrl 10015000.iomuxc: could not request pin 69 (MX27_PAD_I2C2_SDA) from group i2c2grp on device 10015000.iomuxc imx-i2c 1001d000.i2c: Error applying setting, reverse things back imx-i2c: probe of 1001d000.i2c failed with error -22 Fix it by adding the correct I2C1 IOMUX entries for the pinctrl_i2c1 group. Cc: <stable@vger.kernel.org> Fixes: 61664d0b432a ("ARM: dts: imx27 phyCARD-S pinctrl") Signed-off-by: Fabio Estevam <festevam@gmail.com> Reviewed-by: Stefan Riedmueller <s.riedmueller@phytec.de> Signed-off-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-10  MIPS: perf: Remove incorrect odd/even counter handling for I6400  (Marcin Nowakowski; 1 file changed, +5/-1)
commit f7a31b5e7874f77464a4eae0a8ba84b9ae0b3a54 upstream. All performance counters on I6400 (odd and even) are capable of counting any of the available events, so drop the current logic of using the extra bit to determine which counter to use. Signed-off-by: Marcin Nowakowski <marcin.nowakowski@imgtec.com> Fixes: 4e88a8621301 ("MIPS: Add cases for CPU_I6400") Fixes: fd716fca10fc ("MIPS: perf: Fix I6400 event numbers") Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/15991/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-10  powerpc/pci/of: Parse unassigned resources  (Alexey Kardashevskiy; 1 file changed, +10/-2)
commit dead1c845dbe97e0061dae2017eaf3bd8f8f06ee upstream. The pseries platform uses the PCI_PROBE_DEVTREE method of PCI probing which reads "assigned-addresses" of every PCI device and initializes the device resources. However if the property is missing or zero sized, then there is no fallback of any kind and the PCI resources remain undiscovered, i.e. pdev->resource[] array remains empty. This adds a fallback which parses the "reg" property in pretty much the same way, except it marks resources as "unset", which later makes Linux assign those resources proper addresses. This has an effect when: 1. a hypervisor failed to assign any resource for a device; 2. /chosen/linux,pci-probe-only=0 is in the DT so the system may try assigning a resource. Neither is likely to happen under PowerVM. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-02  bpf, x86: Fix encoding for lower 8-bit registers in BPF_STX BPF_B  (Luke Nelson; 1 file changed, +15/-3)
[ Upstream commit aee194b14dd2b2bde6252b3acf57d36dccfc743a ] This patch fixes an encoding bug in emit_stx for BPF_B when the source register is BPF_REG_FP. The current implementation for BPF_STX BPF_B in emit_stx saves one REX byte when the operands can be encoded using Mod-R/M alone. The lower 8 bits of registers %rax, %rbx, %rcx, and %rdx can be accessed without using a REX prefix via %al, %bl, %cl, and %dl, respectively. Other registers, (e.g., %rsi, %rdi, %rbp, %rsp) require a REX prefix to use their 8-bit equivalents (%sil, %dil, %bpl, %spl). The current code checks if the source for BPF_STX BPF_B is BPF_REG_1 or BPF_REG_2 (which map to %rdi and %rsi), in which case it emits the required REX prefix. However, it misses the case when the source is BPF_REG_FP (mapped to %rbp). The result is that BPF_STX BPF_B with BPF_REG_FP as the source operand will read from register %ch instead of the correct %bpl. This patch fixes the problem by fixing and refactoring the check on which registers need the extra REX byte. Since no BPF registers map to %rsp, there is no need to handle %spl. Fixes: 622582786c9e0 ("net: filter: x86: internal BPF JIT") Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200418232655.23870-1-luke.r.nels@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
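A sketch of the refactored check, following the commit text (helper name as in the upstream patch; is_ereg() is the existing test for registers that always need REX):

    /* true if the low byte of 'reg' is only addressable with a REX prefix */
    static bool is_ereg_8l(u32 reg)
    {
            return is_ereg(reg) ||
                   (1 << reg) & (BIT(BPF_REG_1) |  /* %rdi -> %dil */
                                 BIT(BPF_REG_2) |  /* %rsi -> %sil */
                                 BIT(BPF_REG_FP)); /* %rbp -> %bpl, the missed case */
    }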
2020-05-02  ARM: imx: provide v7_cpu_resume() only on ARM_CPU_SUSPEND=y  (Ahmad Fatoum; 1 file changed, +2/-0)
commit f1baca8896ae18e12c45552a4c4ae2086aa7e02c upstream. 512a928affd5 ("ARM: imx: build v7_cpu_resume() unconditionally") introduced an unintended linker error for i.MX6 configurations that have ARM_CPU_SUSPEND=n which can happen if neither CONFIG_PM, CONFIG_CPU_IDLE, nor ARM_PSCI_FW are selected. Fix this by having v7_cpu_resume() compiled only when cpu_resume() it calls is available as well. The C declaration for the function remains unguarded to avoid future code inadvertently using a stub and introducing a regression to the bug the original commit fixed. Cc: <stable@vger.kernel.org> Fixes: 512a928affd5 ("ARM: imx: build v7_cpu_resume() unconditionally") Reported-by: Clemens Gruber <clemens.gruber@pqgruber.com> Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de> Tested-by: Roland Hieber <rhi@pengutronix.de> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-02  KVM: VMX: Enable machine check support for 32bit targets  (Uros Bizjak; 1 file changed, +1/-1)
commit fb56baae5ea509e63c2a068d66a4d8ea91969fca upstream. There is no reason to limit the use of do_machine_check to 64bit targets. MCE handling works for both target families. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: stable@vger.kernel.org Fixes: a0861c02a981 ("KVM: Add VT-x machine check support") Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Message-Id: <20200414071414.45636-1-ubizjak@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-24  x86/vdso: Fix lsl operand order  (Samuel Neves; 1 file changed, +1/-1)
commit e78e5a91456fcecaa2efbb3706572fe043766f4d upstream. In the __getcpu function, lsl is using the wrong target and destination registers. Luckily, the compiler tends to choose %eax for both variables, so it has been working so far. Fixes: a582c540ac1b ("x86/vdso: Use RDPID in preference to LSL when available") Signed-off-by: Samuel Neves <sneves@dei.uc.pt> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180901201452.27828-1-sneves@dei.uc.pt Signed-off-by: Nobuhiro Iwamatsu (CIP) <nobuhiro1.iwamatsu@toshiba.co.jp> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
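In AT&T syntax the source operand comes first, so the segment selector must be the first operand and the output register the last. A hedged sketch of the corrected inline asm in __getcpu():

    /* lsl loads the segment limit: selector is the source, p the destination */
    asm("lsl %[seg],%[p]" : [p] "=a" (p) : [seg] "r" (__PER_CPU_SEG));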
2020-04-24  x86/microcode/intel: replace sync_core() with native_cpuid_reg(eax)  (Evalds Iodzevics; 1 file changed, +1/-1)
On Intel it is required to do CPUID(1) before reading the microcode revision MSR. Current code in 4.4 and 4.9 relies on sync_core() to call CPUID; unfortunately, on 32-bit machines the code inside sync_core() always jumps past the CPUID instruction, as it depends on the data structure boot_cpu_data, which is not populated correctly so early in the boot sequence. It depends on: commit 5dedade6dfa2 ("x86/CPU: Add native CPUID variants returning a single datum") This patch is for 4.4 but should also apply to 4.9. Signed-off-by: Evalds Iodzevics <evalds.iodzevics@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
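For illustration, a hedged sketch of the resulting read sequence (per the Intel SDM; the function name here is illustrative, the helpers are the kernel's native MSR/CPUID accessors):

    static u32 read_microcode_rev(void)
    {
            u32 rev, dummy;

            native_wrmsrl(MSR_IA32_UCODE_REV, 0);
            /* CPUID(1) is required before reading the revision MSR; the
             * native variant does not consult boot_cpu_data */
            native_cpuid_eax(1);
            native_rdmsr(MSR_IA32_UCODE_REV, dummy, rev);
            return rev;
    }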
2020-04-24  x86/CPU: Add native CPUID variants returning a single datum  (Borislav Petkov; 1 file changed, +18/-0)
commit 5dedade6dfa243c130b85d1e4daba6f027805033 upstream. ... similarly to the cpuid_<reg>() variants. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/20170109114147.5082-2-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Evalds Iodzevics <evalds.iodzevics@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
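A reconstruction of the added helpers (the shape matches the +18 diffstat; one accessor is stamped out per output register):

    #define native_cpuid_reg(reg)                                   \
    static inline unsigned int native_cpuid_##reg(unsigned int op) \
    {                                                               \
            unsigned int eax = op, ebx, ecx = 0, edx;               \
                                                                    \
            native_cpuid(&eax, &ebx, &ecx, &edx);                   \
                                                                    \
            return reg;                                             \
    }

    /* native_cpuid_{eax,ebx,ecx,edx}() */
    native_cpuid_reg(eax)
    native_cpuid_reg(ebx)
    native_cpuid_reg(ecx)
    native_cpuid_reg(edx)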
2020-04-24  KVM: s390: vsie: Fix possible race when shadowing region 3 tables  (David Hildenbrand; 1 file changed, +1/-0)
[ Upstream commit 1493e0f944f3c319d11e067c185c904d01c17ae5 ] We have to properly retry again by returning -EINVAL immediately in case somebody else instantiated the table concurrently. We missed adding the goto in this function only. The code now matches the other, similar shadowing functions. We are overwriting an existing region 2 table entry. All allocated pages are added to the crst_list to be freed later, so they are not lost forever. However, when unshadowing the region 2 table, we wouldn't trigger unshadowing of the original shadowed region 3 table that we replaced. It would get unshadowed when the original region 3 table is modified. As it's no longer connected to the page table hierarchy, it's not going to get used anymore. However, for a limited time, this page table will stick around, so it's in some sense a temporary memory leak. Identified by manual code inspection. I don't think this classifies as stable material. Fixes: 998f637cc4b9 ("s390/mm: avoid races on region/segment/page table shadowing") Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20200403153050.20569-4-david@redhat.com Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-24  powerpc/maple: Fix declaration made after definition  (Nathan Chancellor; 1 file changed, +17/-17)
[ Upstream commit af6cf95c4d003fccd6c2ecc99a598fb854b537e7 ] When building ppc64 defconfig, Clang errors (trimmed for brevity): arch/powerpc/platforms/maple/setup.c:365:1: error: attribute declaration must precede definition [-Werror,-Wignored-attributes] machine_device_initcall(maple, maple_cpc925_edac_setup); ^ machine_device_initcall expands to __define_machine_initcall, which in turn has the macro machine_is used in it, which declares mach_##name with an __attribute__((weak)). define_machine actually defines mach_##name, which in this file happens before the declaration, hence the warning. To fix this, move define_machine after machine_device_initcall so that the declaration occurs before the definition, which matches how machine_device_initcall and define_machine work throughout arch/powerpc. While we're here, remove some spaces before tabs. Fixes: 8f101a051ef0 ("edac: cpc925 MC platform device setup") Reported-by: Nick Desaulniers <ndesaulniers@google.com> Suggested-by: Ilie Halip <ilie.halip@gmail.com> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200323222729.15365-1-natechancellor@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
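A sketch of the reordering (machine callbacks elided): the initcall macro, which carries the weak declaration of mach_maple, now precedes define_machine(), which emits the definition:

    /* declaration (expanded inside the initcall macro) comes first ... */
    machine_device_initcall(maple, maple_cpc925_edac_setup);

    /* ... and the definition of mach_maple follows it */
    define_machine(maple) {
            .name = "Maple",
            /* remaining callbacks unchanged */
    };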
2020-04-24  s390/cpuinfo: fix wrong output when CPU0 is offline  (Alexander Gordeev; 1 file changed, +4/-1)
[ Upstream commit 872f27103874a73783aeff2aac2b41a489f67d7c ] /proc/cpuinfo should not print information about CPU 0 when it is offline. Fixes: 281eaa8cb67c ("s390/cpuinfo: simplify locking and skip offline cpus early") Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> [heiko.carstens@de.ibm.com: shortened commit message] Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
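A hedged sketch of the fix in show_cpuinfo(): anchor the summary to the first online CPU instead of CPU 0 (helper names as in arch/s390/kernel/processor.c; details reconstructed from the commit text):

    static int show_cpuinfo(struct seq_file *m, void *v)
    {
            unsigned long n = (unsigned long) v - 1;
            unsigned long first = cpumask_first(cpu_online_mask);

            /* was: if (!n) - which printed the summary against offline CPU 0 */
            if (n == first)
                    show_cpu_summary(m, v);
            show_cpu_mhz(m, n);
            return 0;
    }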
2020-04-24  arm64: cpu_errata: include required headers  (Arnd Bergmann; 1 file changed, +2/-0)
commit 94a5d8790e79ab78f499d2d9f1ff2cab63849d9f upstream. Without including psci.h and arm-smccc.h, we now get a build failure in some configurations: arch/arm64/kernel/cpu_errata.c: In function 'arm64_update_smccc_conduit': arch/arm64/kernel/cpu_errata.c:278:10: error: 'psci_ops' undeclared (first use in this function); did you mean 'sysfs_ops'? arch/arm64/kernel/cpu_errata.c: In function 'arm64_set_ssbd_mitigation': arch/arm64/kernel/cpu_errata.c:311:3: error: implicit declaration of function 'arm_smccc_1_1_hvc' [-Werror=implicit-function-declaration] arm_smccc_1_1_hvc(ARM_SMCCC_ARCH_WORKAROUND_2, state, NULL); Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
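The fix is exactly the two missing includes (matching the +2 diffstat):

    #include <linux/arm-smccc.h>
    #include <linux/psci.h>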
2020-04-24  kvm: x86: Host feature SSBD doesn't imply guest feature SPEC_CTRL_SSBD  (Jim Mattson; 1 file changed, +2/-1)
commit 396d2e878f92ec108e4293f1c77ea3bc90b414ff upstream. The host reports support for the synthetic feature X86_FEATURE_SSBD when any of the three following hardware features are set: CPUID.(EAX=7,ECX=0):EDX.SSBD[bit 31] CPUID.80000008H:EBX.AMD_SSBD[bit 24] CPUID.80000008H:EBX.VIRT_SSBD[bit 25] Either of the first two hardware features implies the existence of the IA32_SPEC_CTRL MSR, but CPUID.80000008H:EBX.VIRT_SSBD[bit 25] does not. Therefore, CPUID.(EAX=7,ECX=0):EDX.SSBD[bit 31] should only be set in the guest if CPUID.(EAX=7,ECX=0):EDX.SSBD[bit 31] or CPUID.80000008H:EBX.AMD_SSBD[bit 24] is set on the host. Fixes: 0c54914d0c52a ("KVM: x86: use Intel speculation bugs and features as derived in generic x86 code") Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Jacob Xu <jacobhxu@google.com> Reviewed-by: Peter Shier <pshier@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Reported-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [bwh: Backported to 4.x: adjust indentation] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Sasha Levin <sashal@kernel.org>
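A hedged sketch of the corrected guest-CPUID condition: the bit is forwarded only when a host feature that implies the IA32_SPEC_CTRL MSR is present:

    /* VIRT_SSBD alone does not imply the IA32_SPEC_CTRL MSR */
    if (boot_cpu_has(X86_FEATURE_SPEC_CTRL_SSBD) ||
        boot_cpu_has(X86_FEATURE_AMD_SSBD))
            entry->edx |= F(SPEC_CTRL_SSBD);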
2020-04-24  powerpc/fsl_booke: Avoid creating duplicate tlb1 entry  (Laurentiu Tudor; 1 file changed, +11/-1)
[ Upstream commit aa4113340ae6c2811e046f08c2bc21011d20a072 ] In the current implementation, the call to loadcam_multi() is wrapped between switch_to_as1() and restore_to_as0() calls so, when it tries to create its own temporary AS=1 TLB1 entry, it ends up duplicating the existing one created by switch_to_as1(). Add a check to skip creating the temporary entry if already running in AS=1. Fixes: d9e1831a4202 ("powerpc/85xx: Load all early TLB entries at once") Cc: stable@vger.kernel.org # v4.4+ Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com> Acked-by: Scott Wood <oss@buserror.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200123111914.2565-1-laurentiu.tudor@nxp.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-24  powerpc/64/tm: Don't let userspace set regs->trap via sigreturn  (Michael Ellerman; 1 file changed, +3/-1)
commit c7def7fbdeaa25feaa19caf4a27c5d10bd8789e4 upstream. In restore_tm_sigcontexts() we take the trap value directly from the user sigcontext with no checking: err |= __get_user(regs->trap, &sc->gp_regs[PT_TRAP]); This means we can be in the kernel with an arbitrary regs->trap value. Although that's not immediately problematic, there is a risk we could trigger one of the uses of CHECK_FULL_REGS(): #define CHECK_FULL_REGS(regs) BUG_ON(regs->trap & 1) It can also cause us to unnecessarily save non-volatile GPRs again in save_nvgprs(), which shouldn't be problematic but is still wrong. It's also possible it could trick the syscall restart machinery, which relies on regs->trap not being == 0xc00 (see 9a81c16b5275 ("powerpc: fix double syscall restarts")), though I haven't been able to make that happen. Finally it doesn't match the behaviour of the non-TM case, in restore_sigcontext() which zeroes regs->trap. So change restore_tm_sigcontexts() to zero regs->trap. This was discovered while testing Nick's upcoming rewrite of the syscall entry path. In that series the call to save_nvgprs() prior to signal handling (do_notify_resume()) is removed, which leaves the low-bit of regs->trap uncleared which can then trigger the FULL_REGS() WARNs in setup_tm_sigcontexts(). Fixes: 2b0a576d15e0 ("powerpc: Add new transactional memory state to the signal context") Cc: stable@vger.kernel.org # v3.9+ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200401023836.3286664-1-mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
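The resulting change is a one-liner; a sketch mirroring the non-TM path:

    /* was: err |= __get_user(regs->trap, &sc->gp_regs[PT_TRAP]); */
    /* zero the trap value, as restore_sigcontext() does in the non-TM case */
    regs->trap = 0;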
2020-04-24  s390/diag: fix display of diagnose call statistics  (Michael Mueller; 1 file changed, +1/-1)
commit 6c7c851f1b666a8a455678a0b480b9162de86052 upstream. Show the full diag statistic table and not just parts of it. The issue surfaced in a KVM guest with a number of vcpus defined smaller than NR_DIAG_STAT. Fixes: 1ec2772e0c3c ("s390/diag: add a statistic for diagnose calls") Cc: stable@vger.kernel.org Signed-off-by: Michael Mueller <mimu@linux.ibm.com> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-24  arm64: armv8_deprecated: Fix undef_hook mask for thumb setend  (Fredrik Strupe; 1 file changed, +1/-1)
commit fc2266011accd5aeb8ebc335c381991f20e26e33 upstream. For thumb instructions, call_undef_hook() in traps.c first reads a u16, and if the u16 indicates a T32 instruction (u16 >= 0xe800), a second u16 is read, which then makes up the lower half-word of a T32 instruction. For T16 instructions, the second u16 is not read, which makes the resulting u32 opcode always have the upper half set to 0. However, having the upper half of instr_mask in the undef_hook set to 0 masks out the upper half of all thumb instructions - both T16 and T32. This results in trapped T32 instructions with the lower half-word equal to the T16 encoding of setend (b650) being matched, even though the upper half-word is not 0000 and thus indicates a T32 opcode. An example of such a T32 instruction is eaa0b650, which should raise a SIGILL since T32 instructions with an eaa prefix are unallocated as per the Arm ARM, but instead works as a SETEND because the second half-word is set to b650. This patch fixes the issue by extending instr_mask to include the upper u32 half, which will still match T16 instructions where the upper half is 0, but not T32 instructions. Fixes: 2d888f48e056 ("arm64: Emulate SETEND for AArch32 tasks") Cc: <stable@vger.kernel.org> # 4.0.x- Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Fredrik Strupe <fredrik@strupe.net> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
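A hedged sketch of the fixed hook entry (surrounding field names reconstructed for context): widening instr_mask to all 32 bits still matches T16 encodings, whose upper half-word is zero, but rejects T32 opcodes:

    {
            /* thumb T16 setend only; the full-width mask rejects T32 opcodes */
            .instr_mask     = 0xfffffff7,   /* was 0x0000fff7 */
            .instr_val      = 0x0000b650,
            .pstate_mask    = PSR_AA32_T_BIT,
            .pstate_val     = PSR_AA32_T_BIT,
            .fn             = t16_setend_handler,
    },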
2020-04-24  KVM: VMX: fix crash cleanup when KVM wasn't used  (Vitaly Kuznetsov; 1 file changed, +7/-5)
commit dbef2808af6c594922fe32833b30f55f35e9da6d upstream. If KVM wasn't used at all before the crash, the cleanup procedure fails with BUG: unable to handle page fault for address: ffffffffffffffc8 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 23215067 P4D 23215067 PUD 23217067 PMD 0 Oops: 0000 [#8] SMP PTI CPU: 0 PID: 3542 Comm: bash Kdump: loaded Tainted: G D 5.6.0-rc2+ #823 RIP: 0010:crash_vmclear_local_loaded_vmcss.cold+0x19/0x51 [kvm_intel] The root cause is that the loaded_vmcss_on_cpu list is not yet initialized; we initialize it in hardware_enable() but this only happens when we start a VM. Previously, we used to have a bitmap with enabled CPUs and that was preventing [masking] the issue. Initialize the loaded_vmcss_on_cpu list earlier, right before we assign the crash_vmclear_loaded_vmcss pointer. The blocked_vcpu_on_cpu list and blocked_vcpu_on_cpu_lock are moved altogether for consistency. Fixes: 31603d4fc2bb ("KVM: VMX: Always VMCLEAR in-use VMCSes during crash with kexec support") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200401081348.1345307-1-vkuznets@redhat.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
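A sketch of the fix as described: per-CPU state is initialized in module init, before the crash callback is published, rather than in hardware_enable():

    static int __init vmx_init(void)
    {
            int cpu;

            /* ... earlier init elided ... */

            /* must be valid before crash_vmclear_local_loaded_vmcss can run */
            for_each_possible_cpu(cpu) {
                    INIT_LIST_HEAD(&per_cpu(loaded_vmcss_on_cpu, cpu));
                    INIT_LIST_HEAD(&per_cpu(blocked_vcpu_on_cpu, cpu));
                    spin_lock_init(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
            }
    #ifdef CONFIG_KEXEC_CORE
            rcu_assign_pointer(crash_vmclear_loaded_vmcss,
                               crash_vmclear_local_loaded_vmcss);
    #endif
            /* ... */
            return 0;
    }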
2020-04-24  KVM: VMX: Always VMCLEAR in-use VMCSes during crash with kexec support  (Sean Christopherson; 1 file changed, +16/-51)
commit 31603d4fc2bb4f0815245d496cb970b27b4f636a upstream. VMCLEAR all in-use VMCSes during a crash, even if kdump's NMI shootdown interrupted a KVM update of the percpu in-use VMCS list. Because NMIs are not blocked by disabling IRQs, it's possible that crash_vmclear_local_loaded_vmcss() could be called while the percpu list of VMCSes is being modified, e.g. in the middle of list_add() in vmx_vcpu_load_vmcs(). This potential corner case was called out in the original commit[*], but the analysis of its impact was wrong. Skipping the VMCLEARs is wrong because it all but guarantees that a loaded, and therefore cached, VMCS will live across kexec and corrupt memory in the new kernel. Corruption will occur because the CPU's VMCS cache is non-coherent, i.e. not snooped, and so the writeback of VMCS memory on its eviction will overwrite random memory in the new kernel. The VMCS will live because the NMI shootdown also disables VMX, i.e. the in-progress VMCLEAR will #UD, and existing Intel CPUs do not flush the VMCS cache on VMXOFF. Furthermore, interrupting list_add() and list_del() is safe due to crash_vmclear_local_loaded_vmcss() using forward iteration. list_add() ensures the new entry is not visible to forward iteration unless the entire add completes, via WRITE_ONCE(prev->next, new). A bad "prev" pointer could be observed if the NMI shootdown interrupted list_del() or list_add(), but list_for_each_entry() does not consume ->prev. In addition to removing the temporary disabling of VMCLEAR, open code loaded_vmcs_init() in __loaded_vmcs_clear() and reorder VMCLEAR so that the VMCS is deleted from the list only after it's been VMCLEAR'd. Deleting the VMCS before VMCLEAR would allow a race where the NMI shootdown could arrive between list_del() and vmcs_clear() and thus neither flow would execute a successful VMCLEAR. Alternatively, more code could be moved into loaded_vmcs_init(), but that gets rather silly as the only other user, alloc_loaded_vmcs(), doesn't need the smp_wmb() and would need to work around the list_del(). Update the smp_*() comments related to the list manipulation, and opportunistically reword them to improve clarity. [*] https://patchwork.kernel.org/patch/1675731/#3720461 Fixes: 8f536b7697a0 ("KVM: VMX: provide the vmclear function and a bitmap to support VMCLEAR in kdump") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200321193751.24985-2-sean.j.christopherson@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
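A hedged sketch of the reordered clear path: VMCLEAR before unlinking, so a forward-iterating NMI shootdown never skips a still-cached VMCS:

    static void __loaded_vmcs_clear(void *arg)
    {
            struct loaded_vmcs *loaded_vmcs = arg;

            /* cross-CPU and current-VMCS bookkeeping elided */

            vmcs_clear(loaded_vmcs->vmcs);          /* VMCLEAR first ... */
            if (loaded_vmcs->shadow_vmcs && loaded_vmcs->launched)
                    vmcs_clear(loaded_vmcs->shadow_vmcs);

            list_del(&loaded_vmcs->loaded_vmcss_on_cpu_link); /* ... unlink after */

            /* make the list update visible before marking the VMCS unloaded */
            smp_wmb();
            loaded_vmcs->cpu = -1;
            loaded_vmcs->launched = 0;
    }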
2020-04-24  KVM: x86: Allocate new rmap and large page tracking when moving memslot  (Sean Christopherson; 1 file changed, +11/-0)
commit edd4fa37baa6ee8e44dc65523b27bd6fe44c94de upstream. Reallocate the rmap array and recalculate large page compatibility when moving an existing memslot to correctly handle the alignment properties of the new memslot. The number of rmap entries required at each level is dependent on the alignment of the memslot's base gfn with respect to that level, e.g. moving a large-page aligned memslot so that it becomes unaligned will increase the number of rmap entries needed at the now unaligned level. Not updating the rmap array is the most obvious bug, as KVM accesses garbage data beyond the end of the rmap. KVM interprets the bad data as pointers, leading to non-canonical #GPs, unexpected #PFs, etc... general protection fault: 0000 [#1] SMP CPU: 0 PID: 1909 Comm: move_memory_reg Not tainted 5.4.0-rc7+ #139 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:rmap_get_first+0x37/0x50 [kvm] Code: <48> 8b 3b 48 85 ff 74 ec e8 6c f4 ff ff 85 c0 74 e3 48 89 d8 5b c3 RSP: 0018:ffffc9000021bbc8 EFLAGS: 00010246 RAX: ffff00617461642e RBX: ffff00617461642e RCX: 0000000000000012 RDX: ffff88827400f568 RSI: ffffc9000021bbe0 RDI: ffff88827400f570 RBP: 0010000000000000 R08: ffffc9000021bd00 R09: ffffc9000021bda8 R10: ffffc9000021bc48 R11: 0000000000000000 R12: 0030000000000000 R13: 0000000000000000 R14: ffff88827427d700 R15: ffffc9000021bce8 FS: 00007f7eda014700(0000) GS:ffff888277a00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f7ed9216ff8 CR3: 0000000274391003 CR4: 0000000000162eb0 Call Trace: kvm_mmu_slot_set_dirty+0xa1/0x150 [kvm] __kvm_set_memory_region.part.64+0x559/0x960 [kvm] kvm_set_memory_region+0x45/0x60 [kvm] kvm_vm_ioctl+0x30f/0x920 [kvm] do_vfs_ioctl+0xa1/0x620 ksys_ioctl+0x66/0x70 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x4c/0x170 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f7ed9911f47 Code: <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 21 6f 2c 00 f7 d8 64 89 01 48 RSP: 002b:00007ffc00937498 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 0000000001ab0010 RCX: 00007f7ed9911f47 RDX: 0000000001ab1350 RSI: 000000004020ae46 RDI: 0000000000000004 RBP: 000000000000000a R08: 0000000000000000 R09: 00007f7ed9214700 R10: 00007f7ed92149d0 R11: 0000000000000246 R12: 00000000bffff000 R13: 0000000000000003 R14: 00007f7ed9215000 R15: 0000000000000000 Modules linked in: kvm_intel kvm irqbypass ---[ end trace 0c5f570b3358ca89 ]--- The disallow_lpage tracking is more subtle. Failure to update results in KVM creating large pages when it shouldn't, either due to stale data or again due to indexing beyond the end of the metadata arrays, which can lead to memory corruption and/or leaking data to guest/userspace. Note, the arrays for the old memslot are freed by the unconditional call to kvm_free_memslot() in __kvm_set_memory_region(). Fixes: 05da45583de9b ("KVM: MMU: large page support") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-24  KVM: s390: vsie: Fix delivery of addressing exceptions  (David Hildenbrand; 1 file changed, +1/-0)
commit 4d4cee96fb7a3cc53702a9be8299bf525be4ee98 upstream. Whenever we get an -EFAULT, we failed to read in guest 2 physical address space. Such addressing exceptions are reported via a program intercept to the nested hypervisor. As we faked the intercept, we have to return to guest 2. Instead, right now we would be returning -EFAULT from the intercept handler, eventually crashing the VM. The correct thing to do is to return 1, as rc == 1 is the internal representation of "we have to go back into g2". Addressing exceptions can only happen if the g2->g3 page tables reference invalid g2 addresses (say, either a table or the final page is not accessible) - something that basically never happens in sane environments. Identified by manual code inspection. Fixes: a3508fbe9dc6 ("KVM: s390: vsie: initial support for nested virtualization") Cc: <stable@vger.kernel.org> # v4.8+ Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20200403153050.20569-3-david@redhat.com Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> [borntraeger@de.ibm.com: fix patch description] Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-24  KVM: s390: vsie: Fix region 1 ASCE sanity shadow address checks  (David Hildenbrand; 1 file changed, +5/-1)
commit a1d032a49522cb5368e5dfb945a85899b4c74f65 upstream. In case we have a region 1, the following calculation (31 + ((gmap->asce & _ASCE_TYPE_MASK) >> 2)*11) results in 64. As shifts beyond the operand size are undefined, the compiler is free to use instructions like sllg. sllg will only use 6 bits of the shift value (here 64), resulting in no shift at all. That means that ALL addresses will be rejected. This can result in endless loops, e.g. when the prefix cannot get mapped. Fixes: 4be130a08420 ("s390/mm: add shadow gmap support") Tested-by: Janosch Frank <frankja@linux.ibm.com> Reported-by: Janosch Frank <frankja@linux.ibm.com> Cc: <stable@vger.kernel.org> # v4.8+ Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20200403153050.20569-2-david@redhat.com Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> [borntraeger@de.ibm.com: fix patch description, remove WARN_ON_ONCE] Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-24  x86/entry/32: Add missing ASM_CLAC to general_protection entry  (Thomas Gleixner; 1 file changed, +1/-0)
commit 3d51507f29f2153a658df4a0674ec5b592b62085 upstream. All exception entry points must have ASM_CLAC right at the beginning. The general_protection entry is missing one. Fixes: e59d1b0a2419 ("x86-32, smap: Add STAC/CLAC instructions to 32-bit kernel entry") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Reviewed-by: Andy Lutomirski <luto@kernel.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20200225220216.219537887@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-24  MIPS: OCTEON: irq: Fix potential NULL pointer dereference  (Gustavo A. R. Silva; 1 file changed, +3/-0)
commit 792a402c2840054533ef56279c212ef6da87d811 upstream. There is a potential NULL pointer dereference in case kzalloc() fails and returns NULL. Fix this by adding a NULL check on *cd* This bug was detected with the help of Coccinelle. Fixes: 64b139f97c01 ("MIPS: OCTEON: irq: add CIB and other fixes") Cc: stable@vger.kernel.org Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
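The added check is a straightforward early return after the allocation (matching the +3 diffstat):

    cd = kzalloc(sizeof(*cd), GFP_KERNEL);
    if (!cd)
            return -ENOMEM;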
2020-04-24  acpi/x86: ignore unspecified bit positions in the ACPI global lock field  (Jan Engelhardt; 1 file changed, +1/-1)
commit ecb9c790999fd6c5af0f44783bd0217f0b89ec2b upstream. The value in "new" is constructed from "old" such that all bits defined as reserved by the ACPI spec[1] are left untouched. But if those bits do not happen to be all zero, "new < 3" will not evaluate to true. The firmware of the laptop(s) Medion MD63490 / Akoya P15648 comes with garbage inside the "FACS" ACPI table. The starting value is old=0x4944454d, therefore new=0x4944454e, which is >= 3. Mask off the reserved bits. [1] https://uefi.org/sites/default/files/resources/ACPI_6_2.pdf Link: https://bugzilla.kernel.org/show_bug.cgi?id=206553 Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Jan Engelhardt <jengelh@inai.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
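A hedged sketch of the acquire logic with the fix applied; only the two architecturally defined bits take part in the final comparison:

    do {
            old = *lock;
            new = (((old & ~0x3) + 2) + ((old >> 1) & 0x1));
            val = cmpxchg(lock, old, new);
    } while (unlikely(val != old));
    /* the fix: ignore reserved bit positions when testing the result */
    return ((new & 0x3) < 3) ? -1 : 0;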
2020-04-24  x86/boot: Use unsigned comparison for addresses  (Arvind Sankar; 2 files changed, +3/-3)
[ Upstream commit 81a34892c2c7c809f9c4e22c5ac936ae673fb9a2 ] The load address is compared with LOAD_PHYSICAL_ADDR using a signed comparison currently (using the jge instruction). When loading a 64-bit kernel using the new efi32_pe_entry() point added by: 97aa276579b2 ("efi/x86: Add true mixed mode entry point into .compat section") using Qemu with -m 3072, the firmware actually loads us above 2Gb, resulting in a very early crash. Use the JAE instruction to perform an unsigned comparison instead, as physical addresses should be considered unsigned. Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200301230436.2246909-6-nivedita@alum.mit.edu Link: https://lore.kernel.org/r/20200308080859.21568-14-ardb@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-13  arm64: Fix size of __early_cpu_boot_status  (Arun KS; 1 file changed, +1/-1)
commit 61cf61d81e326163ce1557ceccfca76e11d0e57c upstream. __early_cpu_boot_status is of type long. Use the .quad assembler directive to allocate the proper size. Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Arun KS <arunks@codeaurora.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-02  arm64: alternative: fix build with clang integrated assembler  (Ilie Halip; 1 file changed, +1/-1)
commit 6f5459da2b8736720afdbd67c4bd2d1edba7d0e3 upstream. Building an arm64 defconfig with clang's integrated assembler, this error occurs: <instantiation>:2:2: error: unrecognized instruction mnemonic _ASM_EXTABLE 9999b, 9f ^ arch/arm64/mm/cache.S:50:1: note: while in macro instantiation user_alt 9f, "dc cvau, x4", "dc civac, x4", 0 ^ While GNU as seems fine with case-sensitive macro instantiations, clang doesn't, so use the actual macro name (_asm_extable) as in the rest of the file. Also checked that the generated assembly matches the GCC output. Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Fixes: 290622efc76e ("arm64: fix "dc cvau" cache operation on errata-affected core") Link: https://github.com/ClangBuiltLinux/linux/issues/924 Signed-off-by: Ilie Halip <ilie.halip@gmail.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-02  ARM: dts: omap5: Add bus_dma_limit for L3 bus  (Roger Quadros; 1 file changed, +1/-0)
commit dfa7ea303f56a3a8b1ed3b91ef35af2da67ca4ee upstream. The L3 interconnect's memory map is from 0x0 to 0xffffffff. Out of this, System memory (SDRAM) can be accessed from 0x80000000 to 0xffffffff (2GB). OMAP5 does support 4GB of SDRAM, but the upper 2GB can only be accessed by the MPU subsystem. Add the dma-ranges property to reflect the physical address limit of the L3 bus. Cc: stable@kernel.org Signed-off-by: Roger Quadros <rogerq@ti.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-02  ARM: dts: dra7: Add bus_dma_limit for L3 bus  (Roger Quadros; 1 file changed, +1/-0)
commit cfb5d65f25959f724081bae8445a0241db606af6 upstream. The L3 interconnect's memory map is from 0x0 to 0xffffffff. Out of this, System memory (SDRAM) can be accessed from 0x80000000 to 0xffffffff (2GB). DRA7 does support 4GB of SDRAM, but the upper 2GB can only be accessed by the MPU subsystem. Add the dma-ranges property to reflect the physical address limit of the L3 bus. Issues were observed only with SATA on DRA7-EVM with 4GB RAM and CONFIG_ARM_LPAE enabled. This is because the controller supports 64-bit DMA and its driver sets the dma_mask to 64-bit, thus resulting in DMA accesses beyond the L3 limit of 2G. Setting the correct bus_dma_limit fixes the issue. Signed-off-by: Roger Quadros <rogerq@ti.com> Cc: stable@kernel.org Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-02  KVM: VMX: Do not allow reexecute_instruction() when skipping MMIO instr  (Sean Christopherson; 1 file changed, +2/-2)
[ Upstream commit c4409905cd6eb42cfd06126e9226b0150e05a715 ] Re-execution after an emulation decode failure is only intended to handle a case where two or more vCPUs race to write a shadowed page, i.e. we should never re-execute an instruction as part of MMIO emulation. As handle_ept_misconfig() is only used for MMIO emulation, it should pass EMULTYPE_NO_REEXECUTE when using the emulator to skip an instr in the fast-MMIO case where VM_EXIT_INSTRUCTION_LEN is invalid. And because the cr2 value passed to x86_emulate_instruction() is only destined for use when retrying or reexecuting, we can simply call emulate_instruction(). Fixes: d391f1207067 ("x86/kvm/vmx: do not use vm-exit instruction length for fast MMIO when running nested") Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Cc: stable@vger.kernel.org Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-02  arm64: smp: fix smp_send_stop() behaviour  (Cristian Marussi; 1 file changed, +14/-3)
commit d0bab0c39e32d39a8c5cddca72e5b4a3059fe050 upstream. On a system with only one CPU online, when another CPU panics while starting up, smp_send_stop() will fail to send any STOP message to the other already-online core, resulting in a system that is still responsive and alive at the end of the panic procedure. [ 186.700083] CPU3: shutdown [ 187.075462] CPU2: shutdown [ 187.162869] CPU1: shutdown [ 188.689998] ------------[ cut here ]------------ [ 188.691645] kernel BUG at arch/arm64/kernel/cpufeature.c:886! [ 188.692079] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP [ 188.692444] Modules linked in: [ 188.693031] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.6.0-rc4-00001-g338d25c35a98 #104 [ 188.693175] Hardware name: Foundation-v8A (DT) [ 188.693492] pstate: 200001c5 (nzCv dAIF -PAN -UAO) [ 188.694183] pc : has_cpuid_feature+0xf0/0x348 [ 188.694311] lr : verify_local_elf_hwcaps+0x84/0xe8 [ 188.694410] sp : ffff800011b1bf60 [ 188.694536] x29: ffff800011b1bf60 x28: 0000000000000000 [ 188.694707] x27: 0000000000000000 x26: 0000000000000000 [ 188.694801] x25: 0000000000000000 x24: ffff80001189a25c [ 188.694905] x23: 0000000000000000 x22: 0000000000000000 [ 188.694996] x21: ffff8000114aa018 x20: ffff800011156a38 [ 188.695089] x19: ffff800010c944a0 x18: 0000000000000004 [ 188.695187] x17: 0000000000000000 x16: 0000000000000000 [ 188.695280] x15: 0000249dbde5431e x14: 0262cbe497efa1fa [ 188.695371] x13: 0000000000000002 x12: 0000000000002592 [ 188.695472] x11: 0000000000000080 x10: 00400032b5503510 [ 188.695572] x9 : 0000000000000000 x8 : ffff800010c80204 [ 188.695659] x7 : 00000000410fd0f0 x6 : 0000000000000001 [ 188.695750] x5 : 00000000410fd0f0 x4 : 0000000000000000 [ 188.695836] x3 : 0000000000000000 x2 : ffff8000100939d8 [ 188.695919] x1 : 0000000000180420 x0 : 0000000000180480 [ 188.696253] Call trace: [ 188.696410] has_cpuid_feature+0xf0/0x348 [ 188.696504] verify_local_elf_hwcaps+0x84/0xe8 [ 188.696591] check_local_cpu_capabilities+0x44/0x128 [ 188.696666] secondary_start_kernel+0xf4/0x188 [ 188.697150] Code: 52805001 72a00301 6b01001f 54000ec0 (d4210000) [ 188.698639] ---[ end trace 3f12ca47652f7b72 ]--- [ 188.699160] Kernel panic - not syncing: Attempted to kill the idle task! [ 188.699546] Kernel Offset: disabled [ 188.699828] CPU features: 0x00004,20c02008 [ 188.700012] Memory Limit: none [ 188.700538] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]--- [root@arch ~]# echo Helo Helo [root@arch ~]# cat /proc/cpuinfo | grep proce processor : 0 Make smp_send_stop() account also for the online status of the calling CPU while evaluating how many CPUs are effectively online: this way, the right number of STOPs is sent, enforcing a proper freeze of the system at the end of panic even under the above conditions. Fixes: 08e875c16a16c ("arm64: SMP support") Reported-by: Dave Martin <Dave.Martin@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Cristian Marussi <cristian.marussi@arm.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-02  x86/mm: split vmalloc_sync_all()  (Joerg Roedel; 1 file changed, +24/-2)
commit 763802b53a427ed3cbd419dbba255c414fdd9e7c upstream. Commit 3f8fd02b1bf1 ("mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()") introduced a call to vmalloc_sync_all() in the vunmap() code-path. While this change was necessary to maintain correctness on x86-32-pae kernels, it also adds additional cycles for architectures that don't need it. Specifically on x86-64 with CONFIG_VMAP_STACK=y some people reported severe performance regressions in micro-benchmarks because it now also calls the x86-64 implementation of vmalloc_sync_all() on vunmap(). But the vmalloc_sync_all() implementation on x86-64 is only needed for newly created mappings. To avoid the unnecessary work on x86-64 and to gain the performance back, split up vmalloc_sync_all() into two functions: * vmalloc_sync_mappings(), and * vmalloc_sync_unmappings() Most call-sites to vmalloc_sync_all() only care about new mappings being synchronized. The only exception is the new call-site added in the above mentioned commit. Shile Zhang directed us to a report of an 80% regression in reaim throughput. Fixes: 3f8fd02b1bf1 ("mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()") Reported-by: kernel test robot <oliver.sang@intel.com> Reported-by: Shile Zhang <shile.zhang@linux.alibaba.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Borislav Petkov <bp@suse.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> [GHES] Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20191009124418.8286-1-joro@8bytes.org Link: https://lists.01.org/hyperkitty/list/lkp@lists.01.org/thread/4D3JPPHBNOSPFK2KEPC6KGKS6J25AIDB/ Link: http://lkml.kernel.org/r/20191113095530.228959-1-shile.zhang@linux.alibaba.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
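A hedged sketch of the x86-64 side of the split: only new mappings need syncing there, so the unmap variant becomes a no-op:

    void vmalloc_sync_mappings(void)
    {
            /* new mappings may allocate p4d/pud pages that must be
             * propagated into every task's page tables */
            sync_global_pgds(VMALLOC_START & PGDIR_MASK, VMALLOC_END);
    }

    void vmalloc_sync_unmappings(void)
    {
            /* unmappings never free p4d/pud pages: nothing to do */
    }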
2020-04-02ARM: dts: dra7: Add "dma-ranges" property to PCIe RC DT nodesKishon Vijay Abraham I1-0/+2
[ Upstream commit 27f13774654ea6bd0b6fc9b97cce8d19e5735661 ] 'dma-ranges' in a PCI bridge node does correctly set dma masks for PCI devices not described in the DT. Certain DRA7 platforms (e.g., DRA76) have RAM above the 32-bit boundary (accessible with LPAE config) though the PCIe bridge will be able to access only 32 bits. Add the 'dma-ranges' property in PCIe RC DT nodes to indicate the host bridge can access only 32 bits. Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-02  powerpc: Include .BTF section  (Naveen N. Rao; 1 file changed, +6/-0)
[ Upstream commit cb0cc635c7a9fa8a3a0f75d4d896721819c63add ] Selecting CONFIG_DEBUG_INFO_BTF results in the below warning from ld: ld: warning: orphan section `.BTF' from `.btf.vmlinux.bin.o' being placed in section `.BTF' Include .BTF section in vmlinux explicitly to fix the same. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200220113132.857132-1-naveen.n.rao@linux.vnet.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-03-20  ARM: 8958/1: rename missed uaccess .fixup section  (Kees Cook; 1 file changed, +1/-1)
commit f87b1c49bc675da30d8e1e8f4b60b800312c7b90 upstream. When the uaccess .fixup section was renamed to .text.fixup, one case was missed. Under ld.bfd, the orphaned section was moved close to .text (since they share the "ax" bits), so things would work normally on uaccess faults. Under ld.lld, the orphaned section was placed outside the .text section, making it unreachable. Link: https://github.com/ClangBuiltLinux/linux/issues/282 Link: https://bugs.chromium.org/p/chromium/issues/detail?id=1020633#c44 Link: https://lore.kernel.org/r/nycvar.YSQ.7.76.1912032147340.17114@knanqh.ubzr Link: https://lore.kernel.org/lkml/202002071754.F5F073F1D@keescook/ Fixes: c4a84ae39b4a5 ("ARM: 8322/1: keep .text and .fixup regions closer together") Cc: stable@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-20  ARM: 8957/1: VDSO: Match ARMv8 timer in cntvct_functional()  (Florian Fainelli; 1 file changed, +2/-0)
commit 45939ce292b4b11159719faaf60aba7d58d5fe33 upstream. It is possible for a system with an ARMv8 timer to run a 32-bit kernel. When this happens we will unconditionally have the vDSO code remove the __vdso_gettimeofday and __vdso_clock_gettime symbols because cntvct_functional() returns false since it does not match that compatibility string. Fixes: ecf99a439105 ("ARM: 8331/1: VDSO initialization, mapping, and synchronization") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-20  perf/amd/uncore: Replace manual sampling check with CAP_NO_INTERRUPT flag  (Kim Phillips; 1 file changed, +7/-7)
[ Upstream commit f967140dfb7442e2db0868b03b961f9c59418a1b ] Enable the sampling check in kernel/events/core.c::perf_event_open(), which returns the more appropriate -EOPNOTSUPP. BEFORE: $ sudo perf record -a -e instructions,l3_request_g1.caching_l3_cache_accesses true Error: The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (l3_request_g1.caching_l3_cache_accesses). /bin/dmesg | grep -i perf may provide additional information. With nothing relevant in dmesg. AFTER: $ sudo perf record -a -e instructions,l3_request_g1.caching_l3_cache_accesses true Error: l3_request_g1.caching_l3_cache_accesses: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat' Fixes: c43ca5091a37 ("perf/x86/amd: Add support for AMD NB and L2I "uncore" counters") Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20200311191323.13124-1-kim.phillips@amd.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-03-20  ARC: define __ALIGN_STR and __ALIGN symbols for ARC  (Eugeniy Paltsev; 1 file changed, +2/-0)
commit 8d92e992a785f35d23f845206cf8c6cafbc264e0 upstream. The default definitions use fill pattern 0x90 for padding, which for ARC generates an unintended "ldh_s r12,[r0,0x20]" corresponding to opcode 0x9090. So use ".align 4", which inserts a "nop_s" instruction instead. Cc: stable@vger.kernel.org Acked-by: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-20  KVM: x86: clear stale x86_emulate_ctxt->intercept value  (Vitaly Kuznetsov; 1 file changed, +1/-0)
commit 342993f96ab24d5864ab1216f46c0b199c2baf8e upstream. After commit 07721feee46b ("KVM: nVMX: Don't emulate instructions in guest mode") Hyper-V guests on KVM stopped booting with: kvm_nested_vmexit: rip fffff802987d6169 reason EPT_VIOLATION info1 181 info2 0 int_info 0 int_info_err 0 kvm_page_fault: address febd0000 error_code 181 kvm_emulate_insn: 0:fffff802987d6169: f3 a5 kvm_emulate_insn: 0:fffff802987d6169: f3 a5 FAIL kvm_inj_exception: #UD (0x0) "f3 a5" is a "rep movsw" instruction, which should not be intercepted at all. Commit c44b4c6ab80e ("KVM: emulate: clean up initializations in init_decode_cache") reduced the number of fields cleared by init_decode_cache() claiming that they are being cleared elsewhere, 'intercept', however, is left uncleared if the instruction does not have any of the "slow path" flags (NotImpl, Stack, Op3264, Sse, Mmx, CheckPerm, NearBranch, No16 and of course Intercept itself). Fixes: c44b4c6ab80e ("KVM: emulate: clean up initializations in init_decode_cache") Fixes: 07721feee46b ("KVM: nVMX: Don't emulate instructions in guest mode") Cc: stable@vger.kernel.org Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
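The fix itself is a single line in the decode path, sketched in context:

    /* in x86_decode_insn(), alongside the other per-decode resets */
    ctxt->opcode_len = 1;
    ctxt->intercept = x86_intercept_none;   /* don't inherit a stale value */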
2020-03-11  powerpc: fix hardware PMU exception bug on PowerVM compatibility mode systems  (Desnes A. Nunes do Rosario; 1 file changed, +3/-1)
commit fc37a1632d40c80c067eb1bc235139f5867a2667 upstream. PowerVM systems running compatibility mode on a few Power8 revisions are still vulnerable to the hardware defect that loses PMU exceptions arriving prior to a context switch. The software fix for this issue is enabled through the CPU_FTR_PMAO_BUG cpu_feature bit, nevertheless this bit also needs to be set for PowerVM compatibility mode systems. Fixes: 68f2f0d431d9ea4 ("powerpc: Add a cpu feature CPU_FTR_PMAO_BUG") Signed-off-by: Desnes A. Nunes do Rosario <desnesn@linux.ibm.com> Reviewed-by: Leonardo Bras <leonardo@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200227134715.9715-1-desnesn@linux.ibm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-11  ARM: imx: build v7_cpu_resume() unconditionally  (Ahmad Fatoum; 4 files changed, +28/-16)
commit 512a928affd51c2dc631401e56ad5ee5d5dd68b6 upstream. This function is not only needed by the platform suspend code, but is also reused as the CPU resume function when the ARM cores can be powered down completely in deep idle, which is the case on i.MX6SX and i.MX6UL(L). Providing the static inline stub whenever CONFIG_SUSPEND is disabled means that those platforms will hang on resume from cpuidle if suspend is disabled. So there are two problems: - The static inline stub masks the linker error - The function is not available where needed Fix both by just building the function unconditionally, when CONFIG_SOC_IMX6 is enabled. The actual code is three instructions long, so it's arguably ok to just leave it in for all i.MX6 kernel configurations. Fixes: 05136f0897b5 ("ARM: imx: support arm power off in cpuidle for i.mx6sx") Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de> Signed-off-by: Rouven Czerwinski <r.czerwinski@pengutronix.de> Signed-off-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-11  ARM: dts: ls1021a: Restore MDIO compatible to gianfar  (Vladimir Oltean; 1 file changed, +2/-2)
commit 7155c44624d061692b4c13aa8343f119c67d4fc0 upstream. The difference between "fsl,etsec2-mdio" and "gianfar" has to do with the .get_tbipa function, which calculates the address of the TBIPA register automatically, if not explicitly specified. [ see drivers/net/ethernet/freescale/fsl_pq_mdio.c ]. On LS1021A, the TBIPA register is at offset 0x30 within the port register block, which is what the "gianfar" method of calculating addresses actually does. Luckily, the bad "compatible" is inconsequential for ls1021a.dtsi, because the TBIPA register is explicitly specified via the second "reg" (<0x0 0x2d10030 0x0 0x4>), so the "get_tbipa" function is dead code. Nonetheless it's good to restore it to its correct value. Background discussion: https://www.spinics.net/lists/stable/msg361156.html Fixes: c7861adbe37f ("ARM: dts: ls1021: Fix SGMII PCS link remaining down after PHY disconnect") Reported-by: Pavel Machek <pavel@denx.de> Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-11  x86/pkeys: Manually set X86_FEATURE_OSPKE to preserve existing changes  (Sean Christopherson; 1 file changed, +1/-1)
commit 735a6dd02222d8d070c7bb748f25895239ca8c92 upstream. Explicitly set X86_FEATURE_OSPKE via set_cpu_cap() instead of calling get_cpu_cap() to pull the feature bit from CPUID after enabling CR4.PKE. Invoking get_cpu_cap() effectively wipes out any {set,clear}_cpu_cap() changes that were made between this_cpu->c_init() and setup_pku(), as all non-synthetic feature words are reinitialized from the CPU's CPUID values. Blasting away capability updates manifests most visibly when running on a VMX capable CPU, but with VMX disabled by BIOS. To indicate that VMX is disabled, init_ia32_feat_ctl() clears X86_FEATURE_VMX, using clear_cpu_cap() instead of setup_clear_cpu_cap() so that KVM can report which CPU is misconfigured (KVM needs to probe every CPU anyways). Restoring X86_FEATURE_VMX from CPUID causes KVM to think VMX is enabled, ultimately leading to an unexpected #GP when KVM attempts to do VMXON. Arguably, init_ia32_feat_ctl() should use setup_clear_cpu_cap() and let KVM figure out a different way to report the misconfigured CPU, but VMX is not the only feature bit that is affected, i.e. there is precedent that tweaking feature bits via {set,clear}_cpu_cap() after ->c_init() is expected to work. Most notably, x86_init_rdrand()'s clearing of X86_FEATURE_RDRAND when RDRAND malfunctions is also overwritten. Fixes: 0697694564c8 ("x86/mm/pkeys: Actually enable Memory Protection Keys in the CPU") Reported-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Tested-by: Jacob Keller <jacob.e.keller@intel.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20200226231615.13664-1-sean.j.christopherson@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
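A hedged sketch of the change in setup_pku() (the pku-disabled check is elided):

    static void setup_pku(struct cpuinfo_x86 *c)
    {
            if (!cpu_has(c, X86_FEATURE_PKU))
                    return;

            cr4_set_bits(X86_CR4_PKE);
            /*
             * CR4.PKE flips CPUID.OSPKE; reflect it with set_cpu_cap()
             * instead of get_cpu_cap(), which would wipe any earlier
             * {set,clear}_cpu_cap() adjustments (e.g. a cleared VMX bit).
             */
            set_cpu_cap(c, X86_FEATURE_OSPKE);
    }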
2020-03-11  MIPS: VPE: Fix a double free and a memory leak in 'release_vpe()'  (Christophe JAILLET; 1 file changed, +1/-1)
commit bef8e2dfceed6daeb6ca3e8d33f9c9d43b926580 upstream. The pointer to the memory allocated by 'alloc_progmem()' is stored in 'v->load_addr'. So this is the memory that should be freed by 'release_progmem()'. 'release_progmem()' is only a call to 'kfree()'. With the current code, there is both a double free and a memory leak. Fix it by passing the correct pointer to 'release_progmem()'. Fixes: e01402b115ccc ("More AP / SP bits for the 34K, the Malta bits and things. Still wants") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Paul Burton <paulburton@kernel.org> Cc: ralf@linux-mips.org Cc: linux-mips@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: kernel-janitors@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-28  s390/mm: Explicitly compare PAGE_DEFAULT_KEY against zero in storage_key_init_range  (Nathan Chancellor; 1 file changed, +1/-1)
commit 380324734956c64cd060e1db4304f3117ac15809 upstream. Clang warns: In file included from ../arch/s390/purgatory/purgatory.c:10: In file included from ../include/linux/kexec.h:18: In file included from ../include/linux/crash_core.h:6: In file included from ../include/linux/elfcore.h:5: In file included from ../include/linux/user.h:1: In file included from ../arch/s390/include/asm/user.h:11: ../arch/s390/include/asm/page.h:45:6: warning: converting the result of '<<' to a boolean always evaluates to false [-Wtautological-constant-compare] if (PAGE_DEFAULT_KEY) ^ ../arch/s390/include/asm/page.h:23:44: note: expanded from macro 'PAGE_DEFAULT_KEY' #define PAGE_DEFAULT_KEY (PAGE_DEFAULT_ACC << 4) ^ 1 warning generated. Explicitly compare this against zero to silence the warning as it is intended to be used in a boolean context. Fixes: de3fa841e429 ("s390/mm: fix compile for PAGE_DEFAULT_KEY != 0") Link: https://github.com/ClangBuiltLinux/linux/issues/860 Link: https://lkml.kernel.org/r/20200214064207.10381-1-natechancellor@gmail.com Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
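A sketch of the silenced comparison in arch/s390/include/asm/page.h:

    static inline void storage_key_init_range(unsigned long start, unsigned long end)
    {
            /* explicit comparison keeps clang's tautology warning quiet */
            if (PAGE_DEFAULT_KEY != 0)
                    __storage_key_init_range(start, end, PAGE_DEFAULT_KEY);
    }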
2020-02-28  KVM: apic: avoid calculating pending eoi from an uninitialized val  (Miaohe Lin; 1 file changed, +3/-1)
commit 23520b2def95205f132e167cf5b25c609975e959 upstream. When pv_eoi_get_user() fails, 'val' may remain uninitialized and the return value of pv_eoi_get_pending() becomes random. Fix the issue by initializing the variable. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
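A hedged sketch of the fixed helper: initializing val guarantees a defined result even when the guest-memory read fails:

    static bool pv_eoi_get_pending(struct kvm_vcpu *vcpu)
    {
            u8 val = 0;     /* the fix: defined value if the read below fails */

            if (pv_eoi_get_user(vcpu, &val) < 0)
                    printk(KERN_WARNING "Can't read EOI MSR value: 0x%llx\n",
                           (unsigned long long)vcpu->arch.pv_eoi.msr_val);
            return val & 0x1;
    }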
2020-02-28  KVM: nVMX: Check IO instruction VM-exit conditions  (Oliver Upton; 1 file changed, +52/-7)
commit 35a571346a94fb93b5b3b6a599675ef3384bc75c upstream. Consult the 'unconditional IO exiting' and 'use IO bitmaps' VM-execution controls when checking instruction interception. If the 'use IO bitmaps' VM-execution control is 1, check the instruction access against the IO bitmaps to determine if the instruction causes a VM-exit. Signed-off-by: Oliver Upton <oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
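A hedged sketch of the interception decision the commit describes (nested_vmx_check_io_bitmaps() is the bitmap walker the patch adds; the wrapper here is simplified):

    /* does an IN/OUT at 'port' of 'size' bytes cause a VM-exit from L2? */
    static bool nested_vmx_io_intercepted(struct kvm_vcpu *vcpu,
                                          struct vmcs12 *vmcs12,
                                          u16 port, int size)
    {
            if (!nested_cpu_has(vmcs12, CPU_BASED_USE_IO_BITMAPS))
                    return nested_cpu_has(vmcs12, CPU_BASED_UNCOND_IO_EXITING);

            /* one bit per port across the two 4K IO bitmaps */
            return nested_vmx_check_io_bitmaps(vcpu, port, size);
    }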