summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2018-03-09Linux 4.14.25v4.14.25Greg Kroah-Hartman1-1/+1
2018-03-09nvme-rdma: don't suppress send completionsSagi Grimberg1-40/+14
commit b4b591c87f2b0f4ebaf3a68d4f13873b241aa584 upstream. The entire completions suppress mechanism is currently broken because the HCA might retry a send operation (due to dropped ack) after the nvme transaction has completed. In order to handle this, we signal all send completions and introduce a separate done handler for async events as they will be handled differently (as they don't include in-capsule data by definition). Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-09md: only allow remove_and_add_spares when no sync_thread running.NeilBrown1-0/+4
commit 39772f0a7be3b3dc26c74ea13fe7847fd1522c8b upstream. The locking protocols in md assume that a device will never be removed from an array during resync/recovery/reshape. When that isn't happening, rcu or reconfig_mutex is needed to protect an rdev pointer while taking a refcount. When it is happening, that protection isn't needed. Unfortunately there are cases were remove_and_add_spares() is called when recovery might be happening: is state_store(), slot_store() and hot_remove_disk(). In each case, this is just an optimization, to try to expedite removal from the personality so the device can be removed from the array. If resync etc is happening, we just have to wait for md_check_recover to find a suitable time to call remove_and_add_spares(). This optimization and not essential so it doesn't matter if it fails. So change remove_and_add_spares() to abort early if resync/recovery/reshape is happening, unless it is called from md_check_recovery() as part of a newly started recovery. The parameter "this" is only NULL when called from md_check_recovery() so when it is NULL, there is no need to abort. As this can result in a NULL dereference, the fix is suitable for -stable. cc: yuyufen <yuyufen@huawei.com> Cc: Tomasz Majchrzak <tomasz.majchrzak@intel.com> Fixes: 8430e7e0af9a ("md: disconnect device from personality before trying to remove it.") Cc: stable@ver.kernel.org (v4.8+) Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <sh.li@alibaba-inc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-09ARM: dts: LogicPD Torpedo: Fix I2C1 pinmuxAdam Ford1-0/+8
commit 74402055a2d3ec998a1ded599e86185a27d9bbf4 upstream. The pinmuxing was missing for I2C1 which was causing intermittent issues with the PMIC which is connected to I2C1. The bootloader did not quite configure the I2C1 either, so when running at 2.6MHz, it was generating errors at time. This correctly sets the I2C1 pinmuxing so it can operate at 2.6MHz Fixes: 687c27676151 ("ARM: dts: Add minimal support for LogicPD Torpedo DM3730 devkit") Signed-off-by: Adam Ford <aford173@gmail.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-09ARM: dts: LogicPD SOM-LV: Fix I2C1 pinmuxAdam Ford1-1/+8
commit 84c7efd607e7fb6933920322086db64654f669b2 upstream. The pinmuxing was missing for I2C1 which was causing intermittent issues with the PMIC which is connected to I2C1. The bootloader did not quite configure the I2C1 either, so when running at 2.6MHz, it was generating errors at times. This correctly sets the I2C1 pinmuxing so it can operate at 2.6MHz Fixes: ab8dd3aed011 ("ARM: DTS: Add minimal Support for Logic PD DM3730 SOM-LV") Signed-off-by: Adam Ford <aford173@gmail.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-09ACPI / bus: Parse tables as term_list for Dell XPS 9570 and Precision M5530Kai Heng Feng1-7/+31
commit 36904703aeeeb6cd31993f1353c8325006229f9a upstream. The i2c touchpad on Dell XPS 9570 and Precision M5530 doesn't work out of box. The touchpad relies on its _INI method to update its _HID value from XXXX0000 to SYNA2393. Also, the _STA relies on value of I2CN to report correct status. Set acpi_gbl_parse_table_as_term_list so the value of I2CN can be correctly set up, and _INI can get run. The ACPI table in this machine is designed to get parsed this way. Also, change the quirk table to a more generic name. Link: https://bugzilla.kernel.org/show_bug.cgi?id=198515 Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-09KVM/x86: remove WARN_ON() for when vm_munmap() failsEric Biggers1-4/+2
commit 103c763c72dd2df3e8c91f2d7ec88f98ed391111 upstream. On x86, special KVM memslots such as the TSS region have anonymous memory mappings created on behalf of userspace, and these mappings are removed when the VM is destroyed. It is however possible for removing these mappings via vm_munmap() to fail. This can most easily happen if the thread receives SIGKILL while it's waiting to acquire ->mmap_sem. This triggers the 'WARN_ON(r < 0)' in __x86_set_memory_region(). syzkaller was able to hit this, using 'exit()' to send the SIGKILL. Note that while the vm_munmap() failure results in the mapping not being removed immediately, it is not leaked forever but rather will be freed when the process exits. It's not really possible to handle this failure properly, so almost every other caller of vm_munmap() doesn't check the return value. It's a limitation of having the kernel manage these mappings rather than userspace. So just remove the WARN_ON() so that users can't spam the kernel log with this warning. Fixes: f0d648bdf0a5 ("KVM: x86: map/unmap private slots in __x86_set_memory_region") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-09KVM/x86: Fix wrong macro references of X86_CR0_PG_BIT and X86_CR4_PAE_BIT in ↵Tianyu Lan1-2/+2
kvm_valid_sregs() commit 37b95951c58fdf08dc10afa9d02066ed9f176fb5 upstream. kvm_valid_sregs() should use X86_CR0_PG and X86_CR4_PAE to check bit status rather than X86_CR0_PG_BIT and X86_CR4_PAE_BIT. This patch is to fix it. Fixes: f29810335965a(KVM/x86: Check input paging mode when cs.l is set) Reported-by: Jeremi Piotrowski <jeremi.piotrowski@gmail.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-09PCI/ASPM: Deal with missing root ports in link state handlingArd Biesheuvel1-2/+6
commit ee8bdfb6568d86bb93f55f8d99c4c643e77304ee upstream. Even though it is unconventional, some PCIe host implementations omit the root ports entirely, and simply consist of a host bridge (which is not modeled as a device in the PCI hierarchy) and a link. When the downstream device is an endpoint, our current code does not seem to mind this unusual configuration. However, when PCIe switches are involved, the ASPM code assumes that any downstream switch port has a parent, and blindly dereferences the bus->parent->self field of the pci_dev struct to chain the downstream link state to the link state of the root port. Given that the root port is missing, the link is not modeled at all, and nor is the link state, and attempting to access it results in a NULL pointer dereference and a crash. Avoid this by allowing the link state chain to terminate at the downstream port if no root port exists. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-09KVM: x86: fix vcpu initialization with userspace lapicRadim Krčmář2-7/+6
commit b7e31be385584afe7f073130e8e570d53c95f7fe upstream. Moving the code around broke this rare configuration. Use this opportunity to finally call lapic reset from vcpu reset. Reported-by: syzbot+fb7a33a4b6c35007a72b@syzkaller.appspotmail.com Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Fixes: 0b2e9904c159 ("KVM: x86: move LAPIC initialization after VMCS creation") Cc: stable@vger.kernel.org Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-09KVM/VMX: Optimize vmx_vcpu_run() and svm_vcpu_run() by marking the RDMSR ↵Paolo Bonzini2-2/+2
path as unlikely() commit 946fbbc13dce68902f64515b610eeb2a6c3d7a64 upstream. vmx_vcpu_run() and svm_vcpu_run() are large functions, and giving branch hints to the compiler can actually make a substantial cycle difference by keeping the fast path contiguous in memory. With this optimization, the retpoline-guest/retpoline-host case is about 50 cycles faster. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Jim Mattson <jmattson@google.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: KarimAllah Ahmed <karahmed@amazon.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kvm@vger.kernel.org Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20180222154318.20361-3-pbonzini@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-09KVM: x86: move LAPIC initialization after VMCS creationPaolo Bonzini2-1/+1
commit 0b2e9904c15963e715d33e5f3f1387f17d19333a upstream. The initial reset of the local APIC is performed before the VMCS has been created, but it tries to do a vmwrite: vmwrite error: reg 810 value 4a00 (err 18944) CPU: 54 PID: 38652 Comm: qemu-kvm Tainted: G W I 4.16.0-0.rc2.git0.1.fc28.x86_64 #1 Hardware name: Intel Corporation S2600CW/S2600CW, BIOS SE5C610.86B.01.01.0003.090520141303 09/05/2014 Call Trace: vmx_set_rvi [kvm_intel] vmx_hwapic_irr_update [kvm_intel] kvm_lapic_reset [kvm] kvm_create_lapic [kvm] kvm_arch_vcpu_init [kvm] kvm_vcpu_init [kvm] vmx_create_vcpu [kvm_intel] kvm_vm_ioctl [kvm] Move it later, after the VMCS has been created. Fixes: 4191db26b714 ("KVM: x86: Update APICv on APIC reset") Cc: stable@vger.kernel.org Cc: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-09KVM/x86: Remove indirect MSR op calls from SPEC_CTRLPaolo Bonzini2-6/+8
commit ecb586bd29c99fb4de599dec388658e74388daad upstream. Having a paravirt indirect call in the IBRS restore path is not a good idea, since we are trying to protect from speculative execution of bogus indirect branch targets. It is also slower, so use native_wrmsrl() on the vmentry path too. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Jim Mattson <jmattson@google.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: KarimAllah Ahmed <karahmed@amazon.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kvm@vger.kernel.org Cc: stable@vger.kernel.org Fixes: d28b387fb74da95d69d2615732f50cceb38e9a4d Link: http://lkml.kernel.org/r/20180222154318.20361-2-pbonzini@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-09KVM: mmu: Fix overlap between public and private memslotsWanpeng Li1-2/+1
commit b28676bb8ae4569cced423dc2a88f7cb319d5379 upstream. Reported by syzkaller: pte_list_remove: ffff9714eb1f8078 0->BUG ------------[ cut here ]------------ kernel BUG at arch/x86/kvm/mmu.c:1157! invalid opcode: 0000 [#1] SMP RIP: 0010:pte_list_remove+0x11b/0x120 [kvm] Call Trace: drop_spte+0x83/0xb0 [kvm] mmu_page_zap_pte+0xcc/0xe0 [kvm] kvm_mmu_prepare_zap_page+0x81/0x4a0 [kvm] kvm_mmu_invalidate_zap_all_pages+0x159/0x220 [kvm] kvm_arch_flush_shadow_all+0xe/0x10 [kvm] kvm_mmu_notifier_release+0x6c/0xa0 [kvm] ? kvm_mmu_notifier_release+0x5/0xa0 [kvm] __mmu_notifier_release+0x79/0x110 ? __mmu_notifier_release+0x5/0x110 exit_mmap+0x15a/0x170 ? do_exit+0x281/0xcb0 mmput+0x66/0x160 do_exit+0x2c9/0xcb0 ? __context_tracking_exit.part.5+0x4a/0x150 do_group_exit+0x50/0xd0 SyS_exit_group+0x14/0x20 do_syscall_64+0x73/0x1f0 entry_SYSCALL64_slow_path+0x25/0x25 The reason is that when creates new memslot, there is no guarantee for new memslot not overlap with private memslots. This can be triggered by the following program: #include <fcntl.h> #include <pthread.h> #include <setjmp.h> #include <signal.h> #include <stddef.h> #include <stdint.h> #include <stdio.h> #include <stdlib.h> #include <string.h> #include <sys/ioctl.h> #include <sys/stat.h> #include <sys/syscall.h> #include <sys/types.h> #include <unistd.h> #include <linux/kvm.h> long r[16]; int main() { void *p = valloc(0x4000); r[2] = open("/dev/kvm", 0); r[3] = ioctl(r[2], KVM_CREATE_VM, 0x0ul); uint64_t addr = 0xf000; ioctl(r[3], KVM_SET_IDENTITY_MAP_ADDR, &addr); r[6] = ioctl(r[3], KVM_CREATE_VCPU, 0x0ul); ioctl(r[3], KVM_SET_TSS_ADDR, 0x0ul); ioctl(r[6], KVM_RUN, 0); ioctl(r[6], KVM_RUN, 0); struct kvm_userspace_memory_region mr = { .slot = 0, .flags = KVM_MEM_LOG_DIRTY_PAGES, .guest_phys_addr = 0xf000, .memory_size = 0x4000, .userspace_addr = (uintptr_t) p }; ioctl(r[3], KVM_SET_USER_MEMORY_REGION, &mr); return 0; } This patch fixes the bug by not adding a new memslot even if it overlaps with private memslots. Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
2018-03-09KVM: X86: Fix SMRAM accessing even if VM is shutdownWanpeng Li1-1/+1
commit 95e057e25892eaa48cad1e2d637b80d0f1a4fac5 upstream. Reported by syzkaller: WARNING: CPU: 6 PID: 2434 at arch/x86/kvm/vmx.c:6660 handle_ept_misconfig+0x54/0x1e0 [kvm_intel] CPU: 6 PID: 2434 Comm: repro_test Not tainted 4.15.0+ #4 RIP: 0010:handle_ept_misconfig+0x54/0x1e0 [kvm_intel] Call Trace: vmx_handle_exit+0xbd/0xe20 [kvm_intel] kvm_arch_vcpu_ioctl_run+0xdaf/0x1d50 [kvm] kvm_vcpu_ioctl+0x3e9/0x720 [kvm] do_vfs_ioctl+0xa4/0x6a0 SyS_ioctl+0x79/0x90 entry_SYSCALL_64_fastpath+0x25/0x9c The testcase creates a first thread to issue KVM_SMI ioctl, and then creates a second thread to mmap and operate on the same vCPU. This triggers a race condition when running the testcase with multiple threads. Sometimes one thread exits with a triple fault while another thread mmaps and operates on the same vCPU. Because CS=0x3000/IP=0x8000 is not mapped, accessing the SMI handler results in an EPT misconfig. This patch fixes it by returning RET_PF_EMULATE in kvm_handle_bad_page(), which will go on to cause an emulation failure and an exit with KVM_EXIT_INTERNAL_ERROR. Reported-by: syzbot+c1d9517cab094dae65e446c0c5b4de6c40f4dc58@syzkaller.appspotmail.com Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-09KVM: x86: extend usage of RET_MMIO_PF_* constantsPaolo Bonzini2-58/+55
commit 9b8ebbdb74b5ad76b9dfd8b101af17839174b126 upstream. The x86 MMU if full of code that returns 0 and 1 for retry/emulate. Use the existing RET_MMIO_PF_RETRY/RET_MMIO_PF_EMULATE enum, renaming it to drop the MMIO part. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Thomas Backlund <tmb@mageia.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-09ARM: kvm: fix building with gcc-8Arnd Bergmann2-0/+9
commit 67870eb1204223598ea6d8a4467b482e9f5875b5 upstream. In banked-sr.c, we use a top-level '__asm__(".arch_extension virt")' statement to allow compilation of a multi-CPU kernel for ARMv6 and older ARMv7-A that don't normally support access to the banked registers. This is considered to be a programming error by the gcc developers and will no longer work in gcc-8, where we now get a build error: /tmp/cc4Qy7GR.s:34: Error: Banked registers are not available with this architecture. -- `mrs r3,SP_usr' /tmp/cc4Qy7GR.s:41: Error: Banked registers are not available with this architecture. -- `mrs r3,ELR_hyp' /tmp/cc4Qy7GR.s:55: Error: Banked registers are not available with this architecture. -- `mrs r3,SP_svc' /tmp/cc4Qy7GR.s:62: Error: Banked registers are not available with this architecture. -- `mrs r3,LR_svc' /tmp/cc4Qy7GR.s:69: Error: Banked registers are not available with this architecture. -- `mrs r3,SPSR_svc' /tmp/cc4Qy7GR.s:76: Error: Banked registers are not available with this architecture. -- `mrs r3,SP_abt' Passign the '-march-armv7ve' flag to gcc works, and is ok here, because we know the functions won't ever be called on pre-ARMv7VE machines. Unfortunately, older compiler versions (4.8 and earlier) do not understand that flag, so we still need to keep the asm around. Backporting to stable kernels (4.6+) is needed to allow those to be built with future compilers as well. Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84129 Fixes: 33280b4cd1dc ("ARM: KVM: Add banked registers save/restore") Cc: stable@vger.kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-09ARM: mvebu: Fix broken PL310_ERRATA_753970 selectsUlf Magnusson1-2/+2
commit 8aa36a8dcde3183d84db7b0d622ffddcebb61077 upstream. The MACH_ARMADA_375 and MACH_ARMADA_38X boards select ARM_ERRATA_753970, but it was renamed to PL310_ERRATA_753970 by commit fa0ce4035d48 ("ARM: 7162/1: errata: tidy up Kconfig options for PL310 errata workarounds"). Fix the selects to use the new name. Discovered with the https://github.com/ulfalizer/Kconfiglib/blob/master/examples/list_undefined.py script. Fixes: fa0ce4035d48 ("ARM: 7162/1: errata: tidy up Kconfig options for PL310 errata workarounds" cc: stable@vger.kernel.org Signed-off-by: Ulf Magnusson <ulfalizer@gmail.com> Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-09ARM: dts: rockchip: Remove 1.8 GHz operation point from phycore somDaniel Schultz1-20/+0
commit 5ce0bad4ccd04c8a989e94d3c89e4e796ac22e48 upstream. Rockchip recommends to run the CPU cores only with operations points of 1.6 GHz or lower. Removed the cpu0 node with too high operation points and use the default values instead. Fixes: 903d31e34628 ("ARM: dts: rockchip: Add support for phyCORE-RK3288 SoM") Cc: stable@vger.kernel.org Signed-off-by: Daniel Schultz <d.schultz@phytec.de> Signed-off-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-09ARM: orion: fix orion_ge00_switch_board_info initializationArnd Bergmann1-12/+11
commit 8337d083507b9827dfb36d545538b7789df834fd upstream. A section type mismatch warning shows up when building with LTO, since orion_ge00_mvmdio_bus_name was put in __initconst but not marked const itself: include/linux/of.h: In function 'spear_setup_of_timer': arch/arm/mach-spear/time.c:207:34: error: 'timer_of_match' causes a section type conflict with 'orion_ge00_mvmdio_bus_name' static const struct of_device_id timer_of_match[] __initconst = { ^ arch/arm/plat-orion/common.c:475:32: note: 'orion_ge00_mvmdio_bus_name' was declared here static __initconst const char *orion_ge00_mvmdio_bus_name = "orion-mii"; ^ As pointed out by Andrew Lunn, it should in fact be 'const' but not '__initconst' because the string is never copied but may be accessed after the init sections are freed. To fix that, I get rid of the extra symbol and rewrite the initialization in a simpler way that assigns both the bus_id and modalias statically. I spotted another theoretical bug in the same place, where d->netdev[i] may be an out of bounds access, this can be fixed by moving the device assignment into the loop. Cc: stable@vger.kernel.org Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-09x86/mm: Fix {pmd,pud}_{set,clear}_flags()Jan Beulich2-4/+14
commit 842cef9113c2120f74f645111ded1e020193d84c upstream. Just like pte_{set,clear}_flags() their PMD and PUD counterparts should not do any address translation. This was outright wrong under Xen (causing a dead boot with no useful output on "suitable" systems), and produced needlessly more complicated code (even if just slightly) when paravirt was enabled. Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/5A8AF1BB02000078001A91C3@prv-mh.provo.novell.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-09nospec: Allow index argument to have const-qualified typeRasmus Villemoes1-2/+1
commit b98c6a160a057d5686a8c54c79cc6c8c94a7d0c8 upstream. The last expression in a statement expression need not be a bare variable, quoting gcc docs The last thing in the compound statement should be an expression followed by a semicolon; the value of this subexpression serves as the value of the entire construct. and we already use that in e.g. the min/max macros which end with a ternary expression. This way, we can allow index to have const-qualified type, which will in some cases avoid the need for introducing a local copy of index of non-const qualified type. That, in turn, can prevent readers not familiar with the internals of array_index_nospec from wondering about the seemingly redundant extra variable, and I think that's worthwhile considering how confusing the whole _nospec business is. The expression _i&_mask has type unsigned long (since that is the type of _mask, and the BUILD_BUG_ONs guarantee that _i will get promoted to that), so in order not to change the type of the whole expression, add a cast back to typeof(_i). Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-arch@vger.kernel.org Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/151881604837.17395.10812767547837568328.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-09KVM: s390: consider epoch index on TOD clock syncsDavid Hildenbrand1-3/+29
commit 1575767ef3cf5326701d2ae3075b7732cbc855e4 upstream. For now, we don't take care of over/underflows. Especially underflows are critical: Assume the epoch is currently 0 and we get a sync request for delta=1, meaning the TOD is moved forward by 1 and we have to fix it up by subtracting 1 from the epoch. Right now, this will leave the epoch index untouched, resulting in epoch=-1, epoch_idx=0, which is wrong. We have to take care of over and underflows, also for the VSIE case. So let's factor out calculation into a separate function. Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20180207114647.6220-5-david@redhat.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Fixes: 8fa1696ea781 ("KVM: s390: Multiple Epoch Facility support") Cc: stable@vger.kernel.org Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> [use u8 for idx] Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-09KVM: s390: consider epoch index on hotplugged CPUsDavid Hildenbrand1-0/+1
commit d16b52cb9cdb6f06dea8ab2f0a428e7d7f0b0a81 upstream. We must copy both, the epoch and the epoch_idx. Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20180207114647.6220-4-david@redhat.com> Fixes: 8fa1696ea781 ("KVM: s390: Multiple Epoch Facility support") Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Fixes: 8fa1696ea781 ("KVM: s390: Multiple Epoch Facility support") Cc: stable@vger.kernel.org Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-09KVM: s390: provide only a single function for setting the tod (fix SCK)David Hildenbrand3-38/+22
commit 0e7def5fb0dc53ddbb9f62a497d15f1e11ccdc36 upstream. Right now, SET CLOCK called in the guest does not properly take care of the epoch index, as the call goes via the old kvm_s390_set_tod_clock() interface. So the epoch index is neither reset to 0, if required, nor properly set to e.g. 0xff on negative values. Fix this by providing a single kvm_s390_set_tod_clock() function. Move Multiple-epoch facility handling into it. Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20180207114647.6220-3-david@redhat.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Fixes: 8fa1696ea781 ("KVM: s390: Multiple Epoch Facility support") Cc: stable@vger.kernel.org Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-09KVM: s390: take care of clock-comparator sign controlDavid Hildenbrand1-6/+19
commit 5fe01793dd953ab947fababe8abaf5ed5258c8df upstream. Missed when enabling the Multiple-epoch facility. If the facility is installed and the control is set, a sign based comaprison has to be performed. Right now we would inject wrong interrupts and ignore interrupt conditions. Also the sleep time is calculated in a wrong way. Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20180207114647.6220-2-david@redhat.com> Fixes: 8fa1696ea781 ("KVM: s390: Multiple Epoch Facility support") Cc: stable@vger.kernel.org Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-09EDAC, sb_edac: Fix out of bound writes during DIMM configuration on KNLAnna Karbownik1-1/+1
commit bf8486709ac7fad99e4040dea73fe466c57a4ae1 upstream. Commit 3286d3eb906c ("EDAC, sb_edac: Drop NUM_CHANNELS from 8 back to 4") decreased NUM_CHANNELS from 8 to 4, but this is not enough for Knights Landing which supports up to 6 channels. This caused out-of-bounds writes to pvt->mirror_mode and pvt->tolm variables which don't pay critical role on KNL code path, so the memory corruption wasn't causing any visible driver failures. The easiest way of fixing it is to change NUM_CHANNELS to 6. Do that. An alternative solution would be to restructure the KNL part of the driver to 2MC/3channel representation. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Anna Karbownik <anna.karbownik@intel.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Cc: jim.m.snow@intel.com Cc: krzysztof.paliswiat@intel.com Cc: lukasz.odzioba@intel.com Cc: qiuxu.zhuo@intel.com Cc: linux-edac <linux-edac@vger.kernel.org> Cc: <stable@vger.kernel.org> Fixes: 3286d3eb906c ("EDAC, sb_edac: Drop NUM_CHANNELS from 8 back to 4") Link: http://lkml.kernel.org/r/1519312693-4789-1-git-send-email-anna.karbownik@intel.com [ Massage commit message. ] Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-09media: m88ds3103: don't call a non-initalized functionMauro Carvalho Chehab1-2/+5
commit b9c97c67fd19262c002d94ced2bfb513083e161e upstream. If m88d3103 chip ID is not recognized, the device is not initialized. However, it returns from probe without any error, causing this OOPS: [ 7.689289] Unable to handle kernel NULL pointer dereference at virtual address 00000000 [ 7.689297] pgd = 7b0bd7a7 [ 7.689302] [00000000] *pgd=00000000 [ 7.689318] Internal error: Oops: 80000005 [#1] SMP ARM [ 7.689322] Modules linked in: dvb_usb_dvbsky(+) m88ds3103 dvb_usb_v2 dvb_core videobuf2_vmalloc videobuf2_memops videobuf2_core crc32_arm_ce videodev media [ 7.689358] CPU: 3 PID: 197 Comm: systemd-udevd Not tainted 4.15.0-mcc+ #23 [ 7.689361] Hardware name: BCM2835 [ 7.689367] PC is at 0x0 [ 7.689382] LR is at m88ds3103_attach+0x194/0x1d0 [m88ds3103] [ 7.689386] pc : [<00000000>] lr : [<bf0ae1ec>] psr: 60000013 [ 7.689391] sp : ed8e5c20 ip : ed8c1e00 fp : ed8945c0 [ 7.689395] r10: ed894000 r9 : ed894378 r8 : eda736c0 [ 7.689400] r7 : ed894070 r6 : ed8e5c44 r5 : bf0bb040 r4 : eda77600 [ 7.689405] r3 : 00000000 r2 : 00000000 r1 : 00000000 r0 : eda77600 [ 7.689412] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none [ 7.689417] Control: 10c5383d Table: 2d8e806a DAC: 00000051 [ 7.689423] Process systemd-udevd (pid: 197, stack limit = 0xe9dbfb63) [ 7.689428] Stack: (0xed8e5c20 to 0xed8e6000) [ 7.689439] 5c20: ed853a80 eda73640 ed894000 ed8942c0 ed853a80 bf0b9e98 ed894070 bf0b9f10 [ 7.689449] 5c40: 00000000 00000000 bf08c17c c08dfc50 00000000 00000000 00000000 00000000 [ 7.689459] 5c60: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 7.689468] 5c80: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 7.689479] 5ca0: 00000000 00000000 ed8945c0 ed8942c0 ed894000 ed894830 bf0b9e98 00000000 [ 7.689490] 5cc0: ed894378 bf0a3cb4 bf0bc3b0 0000533b ed920540 00000000 00000034 bf0a6434 [ 7.689500] 5ce0: ee952070 ed826600 bf0a7038 bf0a2dd8 00000001 bf0a6768 bf0a2f90 ed8943c0 [ 7.689511] 5d00: 00000000 c08eca68 ed826620 ed826620 00000000 ee952070 bf0bc034 ee952000 [ 7.689521] 5d20: ed826600 bf0bb080 ffffffed c0aa9e9c c0aa9dac ed826620 c16edf6c c168c2c8 [ 7.689531] 5d40: c16edf70 00000000 bf0bc034 0000000d 00000000 c08e268c bf0bb080 ed826600 [ 7.689541] 5d60: bf0bc034 ed826654 ed826620 bf0bc034 c164c8bc 00000000 00000001 00000000 [ 7.689553] 5d80: 00000028 c08e2948 00000000 bf0bc034 c08e2848 c08e0778 ee9f0a58 ed88bab4 [ 7.689563] 5da0: bf0bc034 ed90ba80 c168c1f0 c08e1934 bf0bb3bc c17045ac bf0bc034 c164c8bc [ 7.689574] 5dc0: bf0bc034 bf0bb3bc ed91f564 c08e34ec bf0bc000 c164c8bc bf0bc034 c0aa8dc4 [ 7.689584] 5de0: ffffe000 00000000 bf0bf000 ed91f600 ed91f564 c03021e4 00000001 00000000 [ 7.689595] 5e00: c166e040 8040003f ed853a80 bf0bc448 00000000 c1678174 ed853a80 f0f22000 [ 7.689605] 5e20: f0f21fff 8040003f 014000c0 ed91e700 ed91e700 c16d8e68 00000001 ed91e6c0 [ 7.689615] 5e40: bf0bc400 00000001 bf0bc400 ed91f564 00000001 00000000 00000028 c03c9a24 [ 7.689625] 5e60: 00000001 c03c8c94 ed8e5f50 ed8e5f50 00000001 bf0bc400 ed91f540 c03c8cb0 [ 7.689637] 5e80: bf0bc40c 00007fff bf0bc400 c03c60b0 00000000 bf0bc448 00000028 c0e09684 [ 7.689647] 5ea0: 00000002 bf0bc530 c1234bf8 bf0bc5dc bf0bc514 c10ebbe8 ffffe000 bf000000 [ 7.689657] 5ec0: 00011538 00000000 ed8e5f48 00000000 00000000 00000000 00000000 00000000 [ 7.689666] 5ee0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 7.689676] 5f00: 00000000 00000000 7fffffff 00000000 00000013 b6e55a18 0000017b c0309104 [ 7.689686] 5f20: ed8e4000 00000000 00510af0 c03c9430 7fffffff 00000000 00000003 00000000 [ 7.689697] 5f40: 00000000 f0f0f000 00011538 00000000 f0f107b0 f0f0f000 00011538 f0f1fdb8 [ 7.689707] 5f60: f0f1fbe8 f0f1b974 00004000 000041e0 bf0bc3d0 00000001 00000000 000024c4 [ 7.689717] 5f80: 0000002d 0000002e 00000019 00000000 00000010 00000000 16894000 00000000 [ 7.689727] 5fa0: 00000000 c0308f20 16894000 00000000 00000013 b6e55a18 00000000 b6e5652c [ 7.689737] 5fc0: 16894000 00000000 00000000 0000017b 00020000 00508110 00000000 00510af0 [ 7.689748] 5fe0: bef68948 bef68938 b6e4d3d0 b6d32590 60000010 00000013 00000000 00000000 [ 7.689790] [<bf0ae1ec>] (m88ds3103_attach [m88ds3103]) from [<bf0b9f10>] (dvbsky_s960c_attach+0x78/0x280 [dvb_usb_dvbsky]) [ 7.689821] [<bf0b9f10>] (dvbsky_s960c_attach [dvb_usb_dvbsky]) from [<bf0a3cb4>] (dvb_usbv2_probe+0xa3c/0x1024 [dvb_usb_v2]) [ 7.689849] [<bf0a3cb4>] (dvb_usbv2_probe [dvb_usb_v2]) from [<c0aa9e9c>] (usb_probe_interface+0xf0/0x2a8) [ 7.689869] [<c0aa9e9c>] (usb_probe_interface) from [<c08e268c>] (driver_probe_device+0x2f8/0x4b4) [ 7.689881] [<c08e268c>] (driver_probe_device) from [<c08e2948>] (__driver_attach+0x100/0x11c) [ 7.689895] [<c08e2948>] (__driver_attach) from [<c08e0778>] (bus_for_each_dev+0x4c/0x9c) [ 7.689909] [<c08e0778>] (bus_for_each_dev) from [<c08e1934>] (bus_add_driver+0x1c0/0x264) [ 7.689919] [<c08e1934>] (bus_add_driver) from [<c08e34ec>] (driver_register+0x78/0xf4) [ 7.689931] [<c08e34ec>] (driver_register) from [<c0aa8dc4>] (usb_register_driver+0x70/0x134) [ 7.689946] [<c0aa8dc4>] (usb_register_driver) from [<c03021e4>] (do_one_initcall+0x44/0x168) [ 7.689963] [<c03021e4>] (do_one_initcall) from [<c03c9a24>] (do_init_module+0x64/0x1f4) [ 7.689979] [<c03c9a24>] (do_init_module) from [<c03c8cb0>] (load_module+0x20a0/0x25c8) [ 7.689993] [<c03c8cb0>] (load_module) from [<c03c9430>] (SyS_finit_module+0xb4/0xec) [ 7.690007] [<c03c9430>] (SyS_finit_module) from [<c0308f20>] (ret_fast_syscall+0x0/0x54) [ 7.690018] Code: bad PC value This may happen on normal circumstances, if, for some reason, the demod hangs and start returning an invalid chip ID: [ 10.394395] m88ds3103 3-0068: Unknown device. Chip_id=00 So, change the logic to cause probe to fail with -ENODEV, preventing the OOPS. Detected while testing DVB MMAP patches on Raspberry Pi 3 with DVBSky S960CI. Cc: stable@vger.kernel.org Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-09blk-mq: don't call io sched's .requeue_request when requeueing rq to ->dispatchMing Lei1-1/+3
commit 105976f517791aed3b11f8f53b308a2069d42055 upstream. __blk_mq_requeue_request() covers two cases: - one is that the requeued request is added to hctx->dispatch, such as blk_mq_dispatch_rq_list() - another case is that the request is requeued to io scheduler, such as blk_mq_requeue_request(). We should call io sched's .requeue_request callback only for the 2nd case. Cc: Paolo Valente <paolo.valente@linaro.org> Cc: Omar Sandoval <osandov@fb.com> Fixes: bd166ef183c2 ("blk-mq-sched: add framework for MQ capable IO schedulers") Cc: stable@vger.kernel.org Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com> Acked-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-09s390/qeth: fix IPA command submission raceJulian Wiedmann1-9/+10
[ Upstream commit d22ffb5a712f9211ffd104c38fc17cbfb1b5e2b0 ] If multiple IPA commands are build & sent out concurrently, fill_ipacmd_header() may assign a seqno value to a command that's different from what send_control_data() later assigns to this command's reply. This is due to other commands passing through send_control_data(), and incrementing card->seqno.ipa along the way. So one IPA command has no reply that's waiting for its seqno, while some other IPA command has multiple reply objects waiting for it. Only one of those waiting replies wins, and the other(s) times out and triggers a recovery via send_ipa_cmd(). Fix this by making sure that the same seqno value is assigned to a command and its reply object. Do so immediately before submitting the command & while holding the irq_pending "lock", to produce nicely ascending seqnos. As a side effect, *all* IPA commands now use a reply object that's waiting for its actual seqno. Previously, early IPA commands that were submitted while the card was still DOWN used the "catch-all" IDX seqno. Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-09s390/qeth: fix IP address lookup for L3 devicesJulian Wiedmann2-51/+74
[ Upstream commit c5c48c58b259bb8f0482398370ee539d7a12df3e ] Current code ("qeth_l3_ip_from_hash()") matches a queried address object against objects in the IP table by IP address, Mask/Prefix Length and MAC address ("qeth_l3_ipaddrs_is_equal()"). But what callers actually require is either a) "is this IP address registered" (ie. match by IP address only), before adding a new address. b) or "is this address object registered" (ie. match all relevant attributes), before deleting an address. Right now 1. the ADD path is too strict in its lookup, and eg. doesn't detect conflicts between an existing NORMAL address and a new VIPA address (because the NORMAL address will have mask != 0, while VIPA has a mask == 0), 2. the DELETE path is not strict enough, and eg. allows del_rxip() to delete a VIPA address as long as the IP address matches. Fix all this by adding helpers (_addr_match_ip() and _addr_match_all()) that do the appropriate checking. Note that the ADD path for NORMAL addresses is special, as qeth keeps track of how many times such an address is in use (and there is no immediate way of returning errors to the caller). So when a requested NORMAL address _fully_ matches an existing one, it's not considered a conflict and we merely increment the refcount. Fixes: 5f78e29ceebf ("qeth: optimize IP handling in rx_mode callback") Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-09Revert "s390/qeth: fix using of ref counter for rxip addresses"Julian Wiedmann1-5/+3
[ Upstream commit 4964c66fd49b2e2342da35358f2ff74614bcbaee ] This reverts commit cb816192d986f7596009dedcf2201fe2e5bc2aa7. The issue this attempted to fix never actually occurs. l3_add_rxip() checks (via l3_ip_from_hash()) if the requested address was previously added to the card. If so, it returns -EEXIST and doesn't call l3_add_ip(). As a result, the "address exists" path in l3_add_ip() is never taken for rxip addresses, and this patch had no effect. Fixes: cb816192d986 ("s390/qeth: fix using of ref counter for rxip addresses") Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-09s390/qeth: fix double-free on IP add/remove raceJulian Wiedmann1-1/+2
[ Upstream commit 14d066c3531a87f727968cacd85bd95c75f59843 ] Registering an IPv4 address with the HW takes quite a while, so we temporarily drop the ip_htable lock. Any concurrent add/remove of the same IP adjusts the IP's use count, and (on remove) is then blocked by addr->in_progress. After the register call has completed, we check the use count for concurrently attempted add/remove calls - and possibly straight-away deregister the IP again. This happens via l3_delete_ip(), which 1) looks up the queried IP in the htable (getting a reference to the *same* queried object), 2) deregisters the IP from the HW, and 3) frees the IP object. The caller in l3_add_ip() then does a second free on the same object. For this case, skip all the extra checks and lookups in l3_delete_ip() and just deregister & free the IP object ourselves. Fixes: 5f78e29ceebf ("qeth: optimize IP handling in rx_mode callback") Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-09s390/qeth: fix IP removal on offline cardsJulian Wiedmann1-11/+3
[ Upstream commit 98d823ab1fbdcb13abc25b420f9bb71bade42056 ] If the HW is not reachable, then none of the IPs in qeth's internal table has been registered with the HW yet. So when deleting such an IP, there's no need to stage it for deregistration - just drop it from the table. This fixes the "add-delete-add" scenario on an offline card, where the the second "add" merely increments the IP's use count. But as the IP is still set to DISP_ADDR_DELETE from the previous "delete" step, l3_recover_ip() won't register it with the HW when the card goes online. Fixes: 5f78e29ceebf ("qeth: optimize IP handling in rx_mode callback") Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-09s390/qeth: fix overestimated count of buffer elementsJulian Wiedmann2-9/+12
[ Upstream commit 12472af89632beb1ed8dea29d4efe208ca05b06a ] qeth_get_elements_for_range() doesn't know how to handle a 0-length range (ie. start == end), and returns 1 when it should return 0. Such ranges occur on TSO skbs, where the L2/L3/L4 headers (and thus all of the skb's linear data) are skipped when mapping the skb into regular buffer elements. This overestimation may cause several performance-related issues: 1. sub-optimal IO buffer selection, where the next buffer gets selected even though the skb would actually still fit into the current buffer. 2. forced linearization, if the element count for a non-linear skb exceeds QETH_MAX_BUFFER_ELEMENTS. Rather than modifying qeth_get_elements_for_range() and adding overhead to every caller, fix up those callers that are in risk of passing a 0-length range. Fixes: 2863c61334aa ("qeth: refactor calculation of SBALE count") Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-09s390/qeth: fix SETIP command handlingJulian Wiedmann2-6/+13
[ Upstream commit 1c5b2216fbb973a9410e0b06389740b5c1289171 ] send_control_data() applies some special handling to SETIP v4 IPA commands. But current code parses *all* command types for the SETIP command code. Limit the command code check to IPA commands. Fixes: 5b54e16f1a54 ("qeth: do not spin for SETIP ip assist command") Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-09s390/qeth: fix underestimated count of buffer elementsUrsula Braun1-1/+1
[ Upstream commit 89271c65edd599207dd982007900506283c90ae3 ] For a memory range/skb where the last byte falls onto a page boundary (ie. 'end' is of the form xxx...xxx001), the PFN_UP() part of the calculation currently doesn't round up to the next PFN due to an off-by-one error. Thus qeth believes that the skb occupies one page less than it actually does, and may select a IO buffer that doesn't have enough spare buffer elements to fit all of the skb's data. HW detects this as a malformed buffer descriptor, and raises an exception which then triggers device recovery. Fixes: 2863c61334aa ("qeth: refactor calculation of SBALE count") Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-09virtio-net: disable NAPI only when enabled during XDP setJason Wang1-3/+5
[ Upstream commit 4e09ff5362843dff3accfa84c805c7f3a99de9cd ] We try to disable NAPI to prevent a single XDP TX queue being used by multiple cpus. But we don't check if device is up (NAPI is enabled), this could result stall because of infinite wait in napi_disable(). Fixing this by checking device state through netif_running() before. Fixes: 4941d472bf95b ("virtio-net: do not reset during XDP set") Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-09tuntap: disable preemption during XDP processingJason Wang1-0/+6
[ Upstream commit 23e43f07f896f8578318cfcc9466f1e8b8ab21b6 ] Except for tuntap, all other drivers' XDP was implemented at NAPI poll() routine in a bh. This guarantees all XDP operation were done at the same CPU which is required by e.g BFP_MAP_TYPE_PERCPU_ARRAY. But for tuntap, we do it in process context and we try to protect XDP processing by RCU reader lock. This is insufficient since CONFIG_PREEMPT_RCU can preempt the RCU reader critical section which breaks the assumption that all XDP were processed in the same CPU. Fixing this by simply disabling preemption during XDP processing. Fixes: 761876c857cb ("tap: XDP support") Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-09tuntap: correctly add the missing XDP flushJason Wang1-0/+1
[ Upstream commit 1bb4f2e868a2891ab8bc668b8173d6ccb8c4ce6f ] We don't flush batched XDP packets through xdp_do_flush_map(), this will cause packets stall at TX queue. Consider we don't do XDP on NAPI poll(), the only possible fix is to call xdp_do_flush_map() immediately after xdp_do_redirect(). Note, this in fact won't try to batch packets through devmap, we could address in the future. Reported-by: Christoffer Dall <christoffer.dall@linaro.org> Fixes: 761876c857cb ("tap: XDP support") Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-09tcp: purge write queue upon RSTSoheil Hassas Yeganeh1-0/+1
[ Upstream commit a27fd7a8ed3856faaf5a2ff1c8c5f00c0667aaa0 ] When the connection is reset, there is no point in keeping the packets on the write queue until the connection is closed. RFC 793 (page 70) and RFC 793-bis (page 64) both suggest purging the write queue upon RST: https://tools.ietf.org/html/draft-ietf-tcpm-rfc793bis-07 Moreover, this is essential for a correct MSG_ZEROCOPY implementation, because userspace cannot call close(fd) before receiving zerocopy signals even when the connection is reset. Fixes: f214f915e7db ("tcp: enable MSG_ZEROCOPY") Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-09netlink: put module reference if dump start failsJason A. Donenfeld1-1/+3
[ Upstream commit b87b6194be631c94785fe93398651e804ed43e28 ] Before, if cb->start() failed, the module reference would never be put, because cb->cb_running is intentionally false at this point. Users are generally annoyed by this because they can no longer unload modules that leak references. Also, it may be possible to tediously wrap a reference counter back to zero, especially since module.c still uses atomic_inc instead of refcount_inc. This patch expands the error path to simply call module_put if cb->start() fails. Fixes: 41c87425a1ac ("netlink: do not set cb_running if dump's start() errs") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-09mlxsw: spectrum_router: Do not unconditionally clear route offload indicationIdo Schimmel1-0/+3
[ Upstream commit d1c95af366961101819f07e3c64d44f3be7f0367 ] When mlxsw replaces (or deletes) a route it removes the offload indication from the replaced route. This is problematic for IPv4 routes, as the offload indication is stored in the fib_info which is usually shared between multiple routes. Instead of unconditionally clearing the offload indication, only clear it if no other route is using the fib_info. Fixes: 3984d1a89fe7 ("mlxsw: spectrum_router: Provide offload indication using nexthop flags") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Alexander Petrovskiy <alexpe@mellanox.com> Tested-by: Alexander Petrovskiy <alexpe@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-09cls_u32: fix use after free in u32_destroy_key()Paolo Abeni1-10/+11
[ Upstream commit d7cdee5ea8d28ae1b6922deb0c1badaa3aa0ef8c ] Li Shuang reported an Oops with cls_u32 due to an use-after-free in u32_destroy_key(). The use-after-free can be triggered with: dev=lo tc qdisc add dev $dev root handle 1: htb default 10 tc filter add dev $dev parent 1: prio 5 handle 1: protocol ip u32 divisor 256 tc filter add dev $dev protocol ip parent 1: prio 5 u32 ht 800:: match ip dst\ 10.0.0.0/8 hashkey mask 0x0000ff00 at 16 link 1: tc qdisc del dev $dev root Which causes the following kasan splat: ================================================================== BUG: KASAN: use-after-free in u32_destroy_key.constprop.21+0x117/0x140 [cls_u32] Read of size 4 at addr ffff881b83dae618 by task kworker/u48:5/571 CPU: 17 PID: 571 Comm: kworker/u48:5 Not tainted 4.15.0+ #87 Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.1.7 06/16/2016 Workqueue: tc_filter_workqueue u32_delete_key_freepf_work [cls_u32] Call Trace: dump_stack+0xd6/0x182 ? dma_virt_map_sg+0x22e/0x22e print_address_description+0x73/0x290 kasan_report+0x277/0x360 ? u32_destroy_key.constprop.21+0x117/0x140 [cls_u32] u32_destroy_key.constprop.21+0x117/0x140 [cls_u32] u32_delete_key_freepf_work+0x1c/0x30 [cls_u32] process_one_work+0xae0/0x1c80 ? sched_clock+0x5/0x10 ? pwq_dec_nr_in_flight+0x3c0/0x3c0 ? _raw_spin_unlock_irq+0x29/0x40 ? trace_hardirqs_on_caller+0x381/0x570 ? _raw_spin_unlock_irq+0x29/0x40 ? finish_task_switch+0x1e5/0x760 ? finish_task_switch+0x208/0x760 ? preempt_notifier_dec+0x20/0x20 ? __schedule+0x839/0x1ee0 ? check_noncircular+0x20/0x20 ? firmware_map_remove+0x73/0x73 ? find_held_lock+0x39/0x1c0 ? worker_thread+0x434/0x1820 ? lock_contended+0xee0/0xee0 ? lock_release+0x1100/0x1100 ? init_rescuer.part.16+0x150/0x150 ? retint_kernel+0x10/0x10 worker_thread+0x216/0x1820 ? process_one_work+0x1c80/0x1c80 ? lock_acquire+0x1a5/0x540 ? lock_downgrade+0x6b0/0x6b0 ? sched_clock+0x5/0x10 ? lock_release+0x1100/0x1100 ? compat_start_thread+0x80/0x80 ? do_raw_spin_trylock+0x190/0x190 ? _raw_spin_unlock_irq+0x29/0x40 ? trace_hardirqs_on_caller+0x381/0x570 ? _raw_spin_unlock_irq+0x29/0x40 ? finish_task_switch+0x1e5/0x760 ? finish_task_switch+0x208/0x760 ? preempt_notifier_dec+0x20/0x20 ? __schedule+0x839/0x1ee0 ? kmem_cache_alloc_trace+0x143/0x320 ? firmware_map_remove+0x73/0x73 ? sched_clock+0x5/0x10 ? sched_clock_cpu+0x18/0x170 ? find_held_lock+0x39/0x1c0 ? schedule+0xf3/0x3b0 ? lock_downgrade+0x6b0/0x6b0 ? __schedule+0x1ee0/0x1ee0 ? do_wait_intr_irq+0x340/0x340 ? do_raw_spin_trylock+0x190/0x190 ? _raw_spin_unlock_irqrestore+0x32/0x60 ? process_one_work+0x1c80/0x1c80 ? process_one_work+0x1c80/0x1c80 kthread+0x312/0x3d0 ? kthread_create_worker_on_cpu+0xc0/0xc0 ret_from_fork+0x3a/0x50 Allocated by task 1688: kasan_kmalloc+0xa0/0xd0 __kmalloc+0x162/0x380 u32_change+0x1220/0x3c9e [cls_u32] tc_ctl_tfilter+0x1ba6/0x2f80 rtnetlink_rcv_msg+0x4f0/0x9d0 netlink_rcv_skb+0x124/0x320 netlink_unicast+0x430/0x600 netlink_sendmsg+0x8fa/0xd60 sock_sendmsg+0xb1/0xe0 ___sys_sendmsg+0x678/0x980 __sys_sendmsg+0xc4/0x210 do_syscall_64+0x232/0x7f0 return_from_SYSCALL_64+0x0/0x75 Freed by task 112: kasan_slab_free+0x71/0xc0 kfree+0x114/0x320 rcu_process_callbacks+0xc3f/0x1600 __do_softirq+0x2bf/0xc06 The buggy address belongs to the object at ffff881b83dae600 which belongs to the cache kmalloc-4096 of size 4096 The buggy address is located 24 bytes inside of 4096-byte region [ffff881b83dae600, ffff881b83daf600) The buggy address belongs to the page: page:ffffea006e0f6a00 count:1 mapcount:0 mapping: (null) index:0x0 compound_mapcount: 0 flags: 0x17ffffc0008100(slab|head) raw: 0017ffffc0008100 0000000000000000 0000000000000000 0000000100070007 raw: dead000000000100 dead000000000200 ffff880187c0e600 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff881b83dae500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff881b83dae580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff881b83dae600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff881b83dae680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff881b83dae700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== The problem is that the htnode is freed before the linked knodes and the latter will try to access the first at u32_destroy_key() time. This change addresses the issue using the htnode refcnt to guarantee the correct free order. While at it also add a RCU annotation, to keep sparse happy. v1 -> v2: use rtnl_derefence() instead of RCU read locks v2 -> v3: - don't check refcnt in u32_destroy_hnode() - cleaned-up u32_destroy() implementation - cleaned-up code comment v3 -> v4: - dropped unneeded comment Reported-by: Li Shuang <shuali@redhat.com> Fixes: c0d378ef1266 ("net_sched: use tcf_queue_work() in u32 filter") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-09amd-xgbe: Restore PCI interrupt enablement setting on resumeTom Lendacky1-0/+2
[ Upstream commit cfd092f2db8b4b6727e1c03ef68a7842e1023573 ] After resuming from suspend, the PCI device support must re-enable the interrupt setting so that interrupts are actually delivered. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-09net/mlx5e: Verify inline header size do not exceed SKB linear sizeEran Ben Elisha1-1/+1
[ Upstream commit f600c6088018d1dbc5777d18daa83660f7ea4a64 ] Driver tries to copy at least MLX5E_MIN_INLINE bytes into the control segment of the WQE. It assumes that the linear part contains at least MLX5E_MIN_INLINE bytes, which can be wrong. Cited commit verified that driver will not copy more bytes into the inline header part that the actual size of the packet. Re-factor this check to make sure we do not exceed the linear part as well. This fix is aligned with the current driver's assumption that the entire L2 will be present in the linear part of the SKB. Fixes: 6aace17e64f4 ("net/mlx5e: Fix inline header size for small packets") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-09bridge: Fix VLAN reference count problemIdo Schimmel1-0/+2
[ Upstream commit 0e5a82efda872c2469c210957d7d4161ef8f4391 ] When a VLAN is added on a port, a reference is taken on the corresponding master VLAN entry. If it does not already exist, then it is created and a reference taken. However, in the second case a reference is not really taken when CONFIG_REFCOUNT_FULL is enabled as refcount_inc() is replaced by refcount_inc_not_zero(). Fix this by using refcount_set() on a newly created master VLAN entry. Fixes: 251277598596 ("net, bridge: convert net_bridge_vlan.refcnt from atomic_t to refcount_t") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-09sctp: fix dst refcnt leak in sctp_v6_get_dst()Alexey Kodanev1-3/+7
[ Upstream commit 957d761cf91cdbb175ad7d8f5472336a4d54dbf2 ] When going through the bind address list in sctp_v6_get_dst() and the previously found address is better ('matchlen > bmatchlen'), the code continues to the next iteration without releasing currently held destination. Fix it by releasing 'bdst' before continue to the next iteration, and instead of introducing one more '!IS_ERR(bdst)' check for dst_release(), move the already existed one right after ip6_dst_lookup_flow(), i.e. we shouldn't proceed further if we get an error for the route lookup. Fixes: dbc2b5e9a09e ("sctp: fix src address selection if using secondary addresses for ipv6") Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-09net: ipv4: Set addr_type in hash_keys for forwarded caseDavid Ahern1-0/+2
[ Upstream commit 1fe4b1184c2ae2bfbf9e8b14c9c0c1945c98f205 ] The result of the skb flow dissect is copied from keys to hash_keys to ensure only the intended data is hashed. The original L4 hash patch overlooked setting the addr_type for this case; add it. Fixes: bf4e0a3db97eb ("net: ipv4: add support for ECMP hash policy choice") Reported-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-09mlxsw: spectrum_router: Fix error path in mlxsw_sp_vr_createJiri Pirko1-8/+11
[ Upstream commit 0f2d2b2736b08dafa3bde31d048750fbc8df3a31 ] Since mlxsw_sp_fib_create() and mlxsw_sp_mr_table_create() use ERR_PTR macro to propagate int err through return of a pointer, the return value is not NULL in case of failure. So if one of the calls fails, one of vr->fib4, vr->fib6 or vr->mr4_table is not NULL and mlxsw_sp_vr_is_used wrongly assumes that vr is in use which leads to crash like following one: [ 1293.949291] BUG: unable to handle kernel NULL pointer dereference at 00000000000006c9 [ 1293.952729] IP: mlxsw_sp_mr_table_flush+0x15/0x70 [mlxsw_spectrum] Fix this by using local variables to hold the pointers and set vr->* only in case everything went fine. Fixes: 76610ebbde18 ("mlxsw: spectrum_router: Refactor virtual router handling") Fixes: a3d9bc506d64 ("mlxsw: spectrum_router: Extend virtual routers with IPv6 support") Fixes: d42b0965b1d4 ("mlxsw: spectrum_router: Add multicast routes notification handling functionality") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>