summaryrefslogtreecommitdiff
path: root/arch/x86/include
AgeCommit message (Collapse)AuthorFilesLines
2019-04-10Merge branch 'linus' into locking/core, to pick up fixesIngo Molnar2-109/+36
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-07Merge tag 'for-linus-5.1b-rc4-tag' of ↵Linus Torvalds1-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: "One minor fix and a small cleanup for the xen privcmd driver" * tag 'for-linus-5.1b-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen: Prevent buffer overflow in privcmd ioctl xen: use struct_size() helper in kzalloc()
2019-04-05syscalls: Remove start and number from syscall_set_arguments() argsSteven Rostedt (VMware)1-53/+16
After removing the start and count arguments of syscall_get_arguments() it seems reasonable to remove them from syscall_set_arguments(). Note, as of today, there are no users of syscall_set_arguments(). But we are told that there will be soon. But for now, at least make it consistent with syscall_get_arguments(). Link: http://lkml.kernel.org/r/20190327222014.GA32540@altlinux.org Cc: Oleg Nesterov <oleg@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Dave Martin <dave.martin@arm.com> Cc: "Dmitry V. Levin" <ldv@altlinux.org> Cc: x86@kernel.org Cc: linux-snps-arc@lists.infradead.org Cc: linux-kernel@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-c6x-dev@linux-c6x.org Cc: uclinux-h8-devel@lists.sourceforge.jp Cc: linux-hexagon@vger.kernel.org Cc: linux-ia64@vger.kernel.org Cc: linux-mips@vger.kernel.org Cc: nios2-dev@lists.rocketboards.org Cc: openrisc@lists.librecores.org Cc: linux-parisc@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-riscv@lists.infradead.org Cc: linux-s390@vger.kernel.org Cc: linux-sh@vger.kernel.org Cc: sparclinux@vger.kernel.org Cc: linux-um@lists.infradead.org Cc: linux-xtensa@linux-xtensa.org Cc: linux-arch@vger.kernel.org Acked-by: Max Filippov <jcmvbkbc@gmail.com> # For xtensa changes Acked-by: Will Deacon <will.deacon@arm.com> # For the arm64 bits Reviewed-by: Thomas Gleixner <tglx@linutronix.de> # for x86 Reviewed-by: Dmitry V. Levin <ldv@altlinux.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-04-05syscalls: Remove start and number from syscall_get_arguments() argsSteven Rostedt (Red Hat)1-56/+17
At Linux Plumbers, Andy Lutomirski approached me and pointed out that the function call syscall_get_arguments() implemented in x86 was horribly written and not optimized for the standard case of passing in 0 and 6 for the starting index and the number of system calls to get. When looking at all the users of this function, I discovered that all instances pass in only 0 and 6 for these arguments. Instead of having this function handle different cases that are never used, simply rewrite it to return the first 6 arguments of a system call. This should help out the performance of tracing system calls by ptrace, ftrace and perf. Link: http://lkml.kernel.org/r/20161107213233.754809394@goodmis.org Cc: Oleg Nesterov <oleg@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Dave Martin <dave.martin@arm.com> Cc: "Dmitry V. Levin" <ldv@altlinux.org> Cc: x86@kernel.org Cc: linux-snps-arc@lists.infradead.org Cc: linux-kernel@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-c6x-dev@linux-c6x.org Cc: uclinux-h8-devel@lists.sourceforge.jp Cc: linux-hexagon@vger.kernel.org Cc: linux-ia64@vger.kernel.org Cc: linux-mips@vger.kernel.org Cc: nios2-dev@lists.rocketboards.org Cc: openrisc@lists.librecores.org Cc: linux-parisc@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-riscv@lists.infradead.org Cc: linux-s390@vger.kernel.org Cc: linux-sh@vger.kernel.org Cc: sparclinux@vger.kernel.org Cc: linux-um@lists.infradead.org Cc: linux-xtensa@linux-xtensa.org Cc: linux-arch@vger.kernel.org Acked-by: Paul Burton <paul.burton@mips.com> # MIPS parts Acked-by: Max Filippov <jcmvbkbc@gmail.com> # For xtensa changes Acked-by: Will Deacon <will.deacon@arm.com> # For the arm64 bits Reviewed-by: Thomas Gleixner <tglx@linutronix.de> # for x86 Reviewed-by: Dmitry V. Levin <ldv@altlinux.org> Reported-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-04-05xen: Prevent buffer overflow in privcmd ioctlDan Carpenter1-0/+3
The "call" variable comes from the user in privcmd_ioctl_hypercall(). It's an offset into the hypercall_page[] which has (PAGE_SIZE / 32) elements. We need to put an upper bound on it to prevent an out of bounds access. Cc: stable@vger.kernel.org Fixes: 1246ae0bb992 ("xen: add variable hypercall caller") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2019-04-03locking/rwsem: Remove arch specific rwsem filesWaiman Long1-237/+0
As the generic rwsem-xadd code is using the appropriate acquire and release versions of the atomic operations, the arch specific rwsem.h files will not be that much faster than the generic code as long as the atomic functions are properly implemented. So we can remove those arch specific rwsem.h and stop building asm/rwsem.h to reduce maintenance effort. Currently, only x86, alpha and ia64 have implemented architecture specific fast paths. I don't have access to alpha and ia64 systems for testing, but they are legacy systems that are not likely to be updated to the latest kernel anyway. By using a rwsem microbenchmark, the total locking rates on a 4-socket 56-core 112-thread x86-64 system before and after the patch were as follows (mixed means equal # of read and write locks): Before Patch After Patch # of Threads wlock rlock mixed wlock rlock mixed ------------ ----- ----- ----- ----- ----- ----- 1 29,201 30,143 29,458 28,615 30,172 29,201 2 6,807 13,299 1,171 7,725 15,025 1,804 4 6,504 12,755 1,520 7,127 14,286 1,345 8 6,762 13,412 764 6,826 13,652 726 16 6,693 15,408 662 6,599 15,938 626 32 6,145 15,286 496 5,549 15,487 511 64 5,812 15,495 60 5,858 15,572 60 There were some run-to-run variations for the multi-thread tests. For x86-64, using the generic C code fast path seems to be a little bit faster than the assembly version with low lock contention. Looking at the assembly version of the fast paths, there are assembly to/from C code wrappers that save and restore all the callee-clobbered registers (7 registers on x86-64). The assembly generated from the generic C code doesn't need to do that. That may explain the slight performance gain here. The generic asm rwsem.h can also be merged into kernel/locking/rwsem.h with no code change as no other code other than those under kernel/locking needs to access the internal rwsem macros and functions. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-c6x-dev@linux-c6x.org Cc: linux-m68k@lists.linux-m68k.org Cc: linux-riscv@lists.infradead.org Cc: linux-um@lists.infradead.org Cc: linux-xtensa@linux-xtensa.org Cc: linuxppc-dev@lists.ozlabs.org Cc: nios2-dev@lists.rocketboards.org Cc: openrisc@lists.librecores.org Cc: uclinux-h8-devel@lists.sourceforge.jp Link: https://lkml.kernel.org/r/20190322143008.21313-2-longman@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-03-31Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-3/+7
Pull KVM fixes from Paolo Bonzini: "A collection of x86 and ARM bugfixes, and some improvements to documentation. On top of this, a cleanup of kvm_para.h headers, which were exported by some architectures even though they not support KVM at all. This is responsible for all the Kbuild changes in the diffstat" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (28 commits) Documentation: kvm: clarify KVM_SET_USER_MEMORY_REGION KVM: doc: Document the life cycle of a VM and its resources KVM: selftests: complete IO before migrating guest state KVM: selftests: disable stack protector for all KVM tests KVM: selftests: explicitly disable PIE for tests KVM: selftests: assert on exit reason in CR4/cpuid sync test KVM: x86: update %rip after emulating IO x86/kvm/hyper-v: avoid spurious pending stimer on vCPU init kvm/x86: Move MSR_IA32_ARCH_CAPABILITIES to array emulated_msrs KVM: x86: Emulate MSR_IA32_ARCH_CAPABILITIES on AMD hosts kvm: don't redefine flags as something else kvm: mmu: Used range based flushing in slot_handle_level_range KVM: export <linux/kvm_para.h> and <asm/kvm_para.h> iif KVM is supported KVM: x86: remove check on nr_mmu_pages in kvm_arch_commit_memory_region() kvm: nVMX: Add a vmentry check for HOST_SYSENTER_ESP and HOST_SYSENTER_EIP fields KVM: SVM: Workaround errata#1096 (insn_len maybe zero on SMAP violation) KVM: Reject device ioctls from processes other than the VM's creator KVM: doc: Fix incorrect word ordering regarding supported use of APIs KVM: x86: fix handling of role.cr4_pae and rename it to 'gpte_size' KVM: nVMX: Do not inherit quadrant and invalid for the root shadow EPT ...
2019-03-29x86/realmode: Make set_real_mode_mem() static inlineMatteo Croce1-1/+5
Remove the unused @size argument and move it into a header file, so it can be inlined. [ bp: Massage. ] Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Mukesh Ojha <mojha@codeaurora.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-efi <linux-efi@vger.kernel.org> Cc: platform-driver-x86@vger.kernel.org Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190328114233.27835-1-mcroce@redhat.com
2019-03-28KVM: x86: update %rip after emulating IOSean Christopherson1-0/+1
Most (all?) x86 platforms provide a port IO based reset mechanism, e.g. OUT 92h or CF9h. Userspace may emulate said mechanism, i.e. reset a vCPU in response to KVM_EXIT_IO, without explicitly announcing to KVM that it is doing a reset, e.g. Qemu jams vCPU state and resumes running. To avoid corruping %rip after such a reset, commit 0967b7bf1c22 ("KVM: Skip pio instruction when it is emulated, not executed") changed the behavior of PIO handlers, i.e. today's "fast" PIO handling to skip the instruction prior to exiting to userspace. Full emulation doesn't need such tricks becase re-emulating the instruction will naturally handle %rip being changed to point at the reset vector. Updating %rip prior to executing to userspace has several drawbacks: - Userspace sees the wrong %rip on the exit, e.g. if PIO emulation fails it will likely yell about the wrong address. - Single step exits to userspace for are effectively dropped as KVM_EXIT_DEBUG is overwritten with KVM_EXIT_IO. - Behavior of PIO emulation is different depending on whether it goes down the fast path or the slow path. Rather than skip the PIO instruction before exiting to userspace, snapshot the linear %rip and cancel PIO completion if the current value does not match the snapshot. For a 64-bit vCPU, i.e. the most common scenario, the snapshot and comparison has negligible overhead as VMCS.GUEST_RIP will be cached regardless, i.e. there is no extra VMREAD in this case. All other alternatives to snapshotting the linear %rip that don't rely on an explicit reset announcenment suffer from one corner case or another. For example, canceling PIO completion on any write to %rip fails if userspace does a save/restore of %rip, and attempting to avoid that issue by canceling PIO only if %rip changed then fails if PIO collides with the reset %rip. Attempting to zero in on the exact reset vector won't work for APs, which means adding more hooks such as the vCPU's MP_STATE, and so on and so forth. Checking for a linear %rip match technically suffers from corner cases, e.g. userspace could theoretically rewrite the underlying code page and expect a different instruction to execute, or the guest hardcodes a PIO reset at 0xfffffff0, but those are far, far outside of what can be considered normal operation. Fixes: 432baf60eee3 ("KVM: VMX: use kvm_fast_pio_in for handling IN I/O") Cc: <stable@vger.kernel.org> Reported-by: Jim Mattson <jmattson@google.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-28KVM: x86: Emulate MSR_IA32_ARCH_CAPABILITIES on AMD hostsSean Christopherson1-0/+1
The CPUID flag ARCH_CAPABILITIES is unconditioinally exposed to host userspace for all x86 hosts, i.e. KVM advertises ARCH_CAPABILITIES regardless of hardware support under the pretense that KVM fully emulates MSR_IA32_ARCH_CAPABILITIES. Unfortunately, only VMX hosts handle accesses to MSR_IA32_ARCH_CAPABILITIES (despite KVM_GET_MSRS also reporting MSR_IA32_ARCH_CAPABILITIES for all hosts). Move the MSR_IA32_ARCH_CAPABILITIES handling to common x86 code so that it's emulated on AMD hosts. Fixes: 1eaafe91a0df4 ("kvm: x86: IA32_ARCH_CAPABILITIES is always supported") Cc: stable@vger.kernel.org Reported-by: Xiaoyao Li <xiaoyao.li@linux.intel.com> Cc: Jim Mattson <jmattson@google.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-28KVM: x86: remove check on nr_mmu_pages in kvm_arch_commit_memory_region()Wei Yang1-1/+1
* nr_mmu_pages would be non-zero only if kvm->arch.n_requested_mmu_pages is non-zero. * nr_mmu_pages is always non-zero, since kvm_mmu_calculate_mmu_pages() never return zero. Based on these two reasons, we can merge the two *if* clause and use the return value from kvm_mmu_calculate_mmu_pages() directly. This simplify the code and also eliminate the possibility for reader to believe nr_mmu_pages would be zero. Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-28KVM: SVM: Workaround errata#1096 (insn_len maybe zero on SMAP violation)Singh, Brijesh1-0/+2
Errata#1096: On a nested data page fault when CR.SMAP=1 and the guest data read generates a SMAP violation, GuestInstrBytes field of the VMCB on a VMEXIT will incorrectly return 0h instead the correct guest instruction bytes . Recommend Workaround: To determine what instruction the guest was executing the hypervisor will have to decode the instruction at the instruction pointer. The recommended workaround can not be implemented for the SEV guest because guest memory is encrypted with the guest specific key, and instruction decoder will not be able to decode the instruction bytes. If we hit this errata in the SEV guest then log the message and request a guest shutdown. Reported-by: Venkatesh Srinivas <venkateshs@google.com> Cc: Jim Mattson <jmattson@google.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Joerg Roedel <joro@8bytes.org> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-28KVM: x86: fix handling of role.cr4_pae and rename it to 'gpte_size'Sean Christopherson1-2/+2
The cr4_pae flag is a bit of a misnomer, its purpose is really to track whether the guest PTE that is being shadowed is a 4-byte entry or an 8-byte entry. Prior to supporting nested EPT, the size of the gpte was reflected purely by CR4.PAE. KVM fudged things a bit for direct sptes, but it was mostly harmless since the size of the gpte never mattered. Now that a spte may be tracking an indirect EPT entry, relying on CR4.PAE is wrong and ill-named. For direct shadow pages, force the gpte_size to '1' as they are always 8-byte entries; EPT entries can only be 8-bytes and KVM always uses 8-byte entries for NPT and its identity map (when running with EPT but not unrestricted guest). Likewise, nested EPT entries are always 8-bytes. Nested EPT presents a unique scenario as the size of the entries are not dictated by CR4.PAE, but neither is the shadow page a direct map. To handle this scenario, set cr0_wp=1 and smap_andnot_wp=1, an otherwise impossible combination, to denote a nested EPT shadow page. Use the information to avoid incorrectly zapping an unsync'd indirect page in __kvm_sync_page(). Providing a consistent and accurate gpte_size fixes a bug reported by Vitaly where fast_cr3_switch() always fails when switching from L2 to L1 as kvm_mmu_get_page() would force role.cr4_pae=0 for direct pages, whereas kvm_calc_mmu_role_common() would set it according to CR4.PAE. Fixes: 7dcd575520082 ("x86/kvm/mmu: check if tdp/shadow MMU reconfiguration is needed") Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com> Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-28x86/cpufeature: Fix __percpu annotation in this_cpu_has()Jann Horn1-2/+3
&cpu_info.x86_capability is __percpu, and the second argument of x86_this_cpu_test_bit() is expected to be __percpu. Don't cast the __percpu away and then implicitly add it again. This gets rid of 106 lines of sparse warnings with the kernel config I'm using. Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Nadav Amit <namit@vmware.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190328154948.152273-1-jannh@google.com
2019-03-22Merge branch 'x86/cpu' into x86/urgentThomas Gleixner1-16/+15
Merge the forgotten cleanup patch for the new file, so the mess does not propagate further.
2019-03-21x86/cpu/cyrix: Remove {get,set}Cx86_old macros used for Cyrix processorsMatthew Whitehead1-21/+0
The getCx86_old() and setCx86_old() macros have been replaced with correctly working getCx86() and setCx86(), so remove these unused macros. Signed-off-by: Matthew Whitehead <tedheadster@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: luto@kernel.org Link: https://lkml.kernel.org/r/1552596361-8967-3-git-send-email-tedheadster@gmail.com
2019-03-17Merge tag 'kbuild-v5.1-2' of ↵Linus Torvalds1-3/+0
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull more Kbuild updates from Masahiro Yamada: - add more Build-Depends to Debian source package - prefix header search paths with $(srctree)/ - make modpost show verbose section mismatch warnings - avoid hard-coded CROSS_COMPILE for h8300 - fix regression for Debian make-kpkg command - add semantic patch to detect missing put_device() - fix some warnings of 'make deb-pkg' - optimize NOSTDINC_FLAGS evaluation - add warnings about redundant generic-y - clean up Makefiles and scripts * tag 'kbuild-v5.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: kconfig: remove stale lxdialog/.gitignore kbuild: force all architectures except um to include mandatory-y kbuild: warn redundant generic-y Revert "modsign: Abort modules_install when signing fails" kbuild: Make NOSTDINC_FLAGS a simply expanded variable kbuild: deb-pkg: avoid implicit effects coccinelle: semantic code search for missing put_device() kbuild: pkg: grep include/config/auto.conf instead of $KCONFIG_CONFIG kbuild: deb-pkg: introduce is_enabled and if_enabled_echo to builddeb kbuild: deb-pkg: add CONFIG_ prefix to kernel config options kbuild: add workaround for Debian make-kpkg kbuild: source include/config/auto.conf instead of ${KCONFIG_CONFIG} unicore32: simplify linker script generation for decompressor h8300: use cc-cross-prefix instead of hardcoding h8300-unknown-linux- kbuild: move archive command to scripts/Makefile.lib modpost: always show verbose warning for section mismatch ia64: prefix header search path with $(srctree)/ libfdt: prefix header search paths with $(srctree)/ deb-pkg: generate correct build dependencies
2019-03-17Merge branch 'x86-asm-for-linus' of ↵Linus Torvalds3-125/+0
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 asm updates from Thomas Gleixner: "Two cleanup patches removing dead conditionals and unused code" * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/asm: Remove unused __constant_c_x_memset() macro and inlines x86/asm: Remove dead __GNUC__ conditionals
2019-03-17kbuild: force all architectures except um to include mandatory-yMasahiro Yamada1-2/+0
Currently, every arch/*/include/uapi/asm/Kbuild explicitly includes the common Kbuild.asm file. Factor out the duplicated include directives to scripts/Makefile.asm-generic so that no architecture would opt out of the mandatory-y mechanism. um is not forced to include mandatory-y since it is a very exceptional case which does not support UAPI. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2019-03-17kbuild: warn redundant generic-yMasahiro Yamada1-1/+0
The generic-y is redundant under the following condition: - arch has its own implementation - the same header is added to generated-y - the same header is added to mandatory-y If a redundant generic-y is found, the warning like follows is displayed: scripts/Makefile.asm-generic:20: redundant generic-y found in arch/arm/include/asm/Kbuild: timex.h I fixed up arch Kbuild files found by this. Suggested-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2019-03-16Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2-23/+44
Pull KVM updates from Paolo Bonzini: "ARM: - some cleanups - direct physical timer assignment - cache sanitization for 32-bit guests s390: - interrupt cleanup - introduction of the Guest Information Block - preparation for processor subfunctions in cpu models PPC: - bug fixes and improvements, especially related to machine checks and protection keys x86: - many, many cleanups, including removing a bunch of MMU code for unnecessary optimizations - AVIC fixes Generic: - memcg accounting" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (147 commits) kvm: vmx: fix formatting of a comment KVM: doc: Document the life cycle of a VM and its resources MAINTAINERS: Add KVM selftests to existing KVM entry Revert "KVM/MMU: Flush tlb directly in the kvm_zap_gfn_range()" KVM: PPC: Book3S: Add count cache flush parameters to kvmppc_get_cpu_char() KVM: PPC: Fix compilation when KVM is not enabled KVM: Minor cleanups for kvm_main.c KVM: s390: add debug logging for cpu model subfunctions KVM: s390: implement subfunction processor calls arm64: KVM: Fix architecturally invalid reset value for FPEXC32_EL2 KVM: arm/arm64: Remove unused timer variable KVM: PPC: Book3S: Improve KVM reference counting KVM: PPC: Book3S HV: Fix build failure without IOMMU support Revert "KVM: Eliminate extra function calls in kvm_get_dirty_log_protect()" x86: kvmguest: use TSC clocksource if invariant TSC is exposed KVM: Never start grow vCPU halt_poll_ns from value below halt_poll_ns_grow_start KVM: Expose the initial start value in grow_halt_poll_ns() as a module parameter KVM: grow_halt_poll_ns() should never shrink vCPU halt_poll_ns KVM: x86/mmu: Consolidate kvm_mmu_zap_all() and kvm_mmu_zap_mmio_sptes() KVM: x86/mmu: WARN if zapping a MMIO spte results in zapping children ...
2019-03-12Merge branch 'x86-tsx-for-linus' of ↵Linus Torvalds2-0/+7
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 tsx fixes from Thomas Gleixner: "This update provides kernel side handling for the TSX erratum of Intel Skylake (and later) CPUs. On these CPUs Intel Transactional Synchronization Extensions (TSX) functions can result in unpredictable system behavior under certain circumstances. The issue is mitigated with an microcode update which utilizes Performance Monitoring Counter (PMC) 3 when TSX functions are in use. This mitigation is enabled unconditionally by the updated microcode. As a consequence the usage of TSX functions can cause corrupted performance monitoring results for events which utilize PMC3. The corruption is silent on kernels which have no update for this issue. This update makes the kernel aware of the PMC3 utilization by the microcode: The microcode offers a possibility to enforce TSX abort which prevents the malfunction and frees up PMC3. The enforced TSX abort requires the TSX using application to have a software fallback path implemented; abort handlers which solely retry the transaction will fail over and over. The enforced TSX abort request is issued by the kernel when: - enforced TSX abort is enabled (PMU attribute) - A performance monitoring request needs PMC3 When PMC3 is not longer used by the kernel the TSX force abort request is cleared. The enforced TSX abort mechanism is enabled by default and can be controlled by the administrator via the new PMU attribute 'allow_tsx_force_abort'. This attribute is only visible when updated microcode is detected on affected systems. Writing '0' disables the enforced TSX abort mechanism, '1' enables it. As a result of disabling the enforced TSX abort mechanism, PMC3 is permanentely unavailable for performance monitoring which can cause performance monitoring requests to fail or switch to multiplexing mode" * branch 'x86-tsx-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel: Implement support for TSX Force Abort x86: Add TSX Force Abort CPUID/MSR perf/x86/intel: Generalize dynamic constraint creation perf/x86/intel: Make cpuc allocations consistent
2019-03-12Merge tag 'for-linus-5.1a-rc1-tag' of ↵Linus Torvalds1-11/+2
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen updates from Juergen Gross: "xen fixes and features: - remove fallback code for very old Xen hypervisors - three patches for fixing Xen dom0 boot regressions - an old patch for Xen PCI passthrough which was never applied for unknown reasons - some more minor fixes and cleanup patches" * tag 'for-linus-5.1a-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen: fix dom0 boot on huge systems xen, cpu_hotplug: Prevent an out of bounds access xen: remove pre-xen3 fallback handlers xen/ACPI: Switch to bitmap_zalloc() x86/xen: dont add memory above max allowed allocation x86: respect memory size limiting via mem= parameter xen/gntdev: Check and release imported dma-bufs on close xen/gntdev: Do not destroy context while dma-bufs are in use xen/pciback: Don't disable PCI_COMMAND on PCI device reset. xen-scsiback: mark expected switch fall-through xen: mark expected switch fall-through
2019-03-11Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds2-4/+6
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A set of fixes for x86: - Make the unwinder more robust when it encounters a NULL pointer call, so the backtrace becomes more useful - Fix the bogus ORC unwind table alignment - Prevent kernel panic during kexec on HyperV caused by a cleared but not disabled hypercall page. - Remove the now pointless stacksize increase for KASAN_EXTRA, as KASAN_EXTRA is gone. - Remove unused variables from the x86 memory management code" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/hyperv: Fix kernel panic when kexec on HyperV x86/mm: Remove unused variable 'old_pte' x86/mm: Remove unused variable 'cpu' Revert "x86_64: Increase stack size for KASAN_EXTRA" x86/unwind: Add hardcoded ORC entry for NULL x86/unwind: Handle NULL pointer calls better in frame unwinder x86/unwind/orc: Fix ORC unwind table alignment
2019-03-08Merge branch 'ras-core-for-linus' of ↵Linus Torvalds1-0/+7
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS updates from Borislav Petkov: "This time around we have in store: - Disable MC4_MISC thresholding banks on all AMD family 0x15 models (Shirish S) - AMD MCE error descriptions update and error decode improvements (Yazen Ghannam) - The usual smaller conversions and fixes" * 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Improve error message when kernel cannot recover, p2 EDAC/mce_amd: Decode MCA_STATUS in bit definition order EDAC/mce_amd: Decode MCA_STATUS[Scrub] bit EDAC, mce_amd: Print ExtErrorCode and description on a single line EDAC, mce_amd: Match error descriptions to latest documentation x86/MCE/AMD, EDAC/mce_amd: Add new error descriptions for some SMCA bank types x86/MCE/AMD, EDAC/mce_amd: Add new McaTypes for CS, PSP, and SMU units x86/MCE/AMD, EDAC/mce_amd: Add new MP5, NBIO, and PCIE SMCA bank types RAS: Add a MAINTAINERS entry RAS: Use consistent types for UUIDs x86/MCE/AMD: Carve out the MC4_MISC thresholding quirk x86/MCE/AMD: Turn off MC4_MISC thresholding on all family 0x15 models x86/MCE: Switch to use the new generic UUID API
2019-03-08Merge branch 'x86-uv-for-linus' of ↵Linus Torvalds1-5/+0
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 UV updates from Ingo Molnar: "Three UV related cleanups" * 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/platform/UV: Use efi_enabled() instead of test_bit() x86/platform/UV: Remove uv_bios_call_reentrant() x86/platform/UV: Remove unnecessary #ifdef CONFIG_EFI
2019-03-08Merge branch 'x86-mm-for-linus' of ↵Linus Torvalds1-2/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm cleanup from Ingo Molnar: "A single GUP cleanup" * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: mm/gup: Remove the 'write' parameter from gup_fast_permitted()
2019-03-08Merge branch 'x86-fpu-for-linus' of ↵Linus Torvalds2-44/+20
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fpu updates from Ingo Molnar: "Three changes: - preparatory patch for AVX state tracking that computing-cluster folks would like to use for user-space batching - but we are not happy about the related ABI yet so this is only the kernel tracking side - a cleanup for CR0 handling in do_device_not_available() - plus we removed a workaround for an ancient binutils version" * 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/fpu: Track AVX-512 usage of tasks x86/fpu: Get rid of CONFIG_AS_FXSAVEQ x86/traps: Have read_cr0() only once in the #NM handler
2019-03-08Merge branch 'x86-cleanups-for-linus' of ↵Linus Torvalds3-5/+0
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Ingo Molnar: "Various cleanups and simplifications, none of them really stands out, they are all over the place" * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/uaccess: Remove unused __addr_ok() macro x86/smpboot: Remove unused phys_id variable x86/mm/dump_pagetables: Remove the unused prev_pud variable x86/fpu: Move init_xstate_size() to __init section x86/cpu_entry_area: Move percpu_setup_debug_store() to __init section x86/mtrr: Remove unused variable x86/boot/compressed/64: Explain paging_prepare()'s return value x86/resctrl: Remove duplicate MSR_MISC_FEATURE_CONTROL definition x86/asm/suspend: Drop ENTRY from local data x86/hw_breakpoints, kprobes: Remove kprobes ifdeffery x86/boot: Save several bytes in decompressor x86/trap: Remove useless declaration x86/mm/tlb: Remove unused cpu variable x86/events: Mark expected switch-case fall-throughs x86/asm-prototypes: Remove duplicate include <asm/page.h> x86/kernel: Mark expected switch-case fall-throughs x86/insn-eval: Mark expected switch-case fall-through x86/platform/UV: Replace kmalloc() and memset() with k[cz]alloc() calls x86/e820: Replace kmalloc() + memcpy() with kmemdup()
2019-03-07Revert "x86_64: Increase stack size for KASAN_EXTRA"Qian Cai1-4/+0
This reverts commit a8e911d13540487942d53137c156bd7707f66e5d. KASAN_EXTRA was removed via the commit 7771bdbbfd3d ("kasan: remove use after scope bugs detection."), so this is no longer needed. Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: bp@alien8.de Cc: akpm@linux-foundation.org Cc: aryabinin@virtuozzo.com Cc: glider@google.com Cc: dvyukov@google.com Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/20190306213806.46139-1-cai@lca.pw
2019-03-07x86/unwind: Handle NULL pointer calls better in frame unwinderJann Horn1-0/+6
When the frame unwinder is invoked for an oops caused by a call to NULL, it currently skips the parent function because BP still points to the parent's stack frame; the (nonexistent) current function only has the first half of a stack frame, and BP doesn't point to it yet. Add a special case for IP==0 that calculates a fake BP from SP, then uses the real BP for the next frame. Note that this handles first_frame specially: Return information about the parent function as long as the saved IP is >=first_frame, even if the fake BP points below it. With an artificially-added NULL call in prctl_set_seccomp(), before this patch, the trace is: Call Trace: ? prctl_set_seccomp+0x3a/0x50 __x64_sys_prctl+0x457/0x6f0 ? __ia32_sys_prctl+0x750/0x750 do_syscall_64+0x72/0x160 entry_SYSCALL_64_after_hwframe+0x44/0xa9 After this patch, the trace is: Call Trace: prctl_set_seccomp+0x3a/0x50 __x64_sys_prctl+0x457/0x6f0 ? __ia32_sys_prctl+0x750/0x750 do_syscall_64+0x72/0x160 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: syzbot <syzbot+ca95b2b7aef9e7cbd6ab@syzkaller.appspotmail.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Michal Marek <michal.lkml@markovi.net> Cc: linux-kbuild@vger.kernel.org Link: https://lkml.kernel.org/r/20190301031201.7416-1-jannh@google.com
2019-03-06Merge branch 'akpm' (patches from Andrew)Linus Torvalds4-21/+24
Merge misc updates from Andrew Morton: - a few misc things - ocfs2 updates - most of MM * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (159 commits) tools/testing/selftests/proc/proc-self-syscall.c: remove duplicate include proc: more robust bulk read test proc: test /proc/*/maps, smaps, smaps_rollup, statm proc: use seq_puts() everywhere proc: read kernel cpu stat pointer once proc: remove unused argument in proc_pid_lookup() fs/proc/thread_self.c: code cleanup for proc_setup_thread_self() fs/proc/self.c: code cleanup for proc_setup_self() proc: return exit code 4 for skipped tests mm,mremap: bail out earlier in mremap_to under map pressure mm/sparse: fix a bad comparison mm/memory.c: do_fault: avoid usage of stale vm_area_struct writeback: fix inode cgroup switching comment mm/huge_memory.c: fix "orig_pud" set but not used mm/hotplug: fix an imbalance with DEBUG_PAGEALLOC mm/memcontrol.c: fix bad line in comment mm/cma.c: cma_declare_contiguous: correct err handling mm/page_ext.c: fix an imbalance with kmemleak mm/compaction: pass pgdat to too_many_isolated() instead of zone mm: remove zone_lru_lock() function, access ->lru_lock directly ...
2019-03-06Merge branch 'x86-alternatives-for-linus' of ↵Linus Torvalds2-9/+46
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 alternative instruction updates from Ingo Molnar: "Small RDTSCP opimization, enabled by the newly added ALTERNATIVE_3(), and other small improvements" * 'x86-alternatives-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/TSC: Use RDTSCP x86/alternatives: Add an ALTERNATIVE_3() macro x86/alternatives: Print containing function x86/alternatives: Add macro comments
2019-03-06Merge branch 'perf-core-for-linus' of ↵Linus Torvalds1-0/+28
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "Lots of tooling updates - too many to list, here's a few highlights: - Various subcommand updates to 'perf trace', 'perf report', 'perf record', 'perf annotate', 'perf script', 'perf test', etc. - CPU and NUMA topology and affinity handling improvements, - HW tracing and HW support updates: - Intel PT updates - ARM CoreSight updates - vendor HW event updates - BPF updates - Tons of infrastructure updates, both on the build system and the library support side - Documentation updates. - ... and lots of other changes, see the changelog for details. Kernel side updates: - Tighten up kprobes blacklist handling, reduce the number of places where developers can install a kprobe and hang/crash the system. - Fix/enhance vma address filter handling. - Various PMU driver updates, small fixes and additions. - refcount_t conversions - BPF updates - error code propagation enhancements - misc other changes" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (238 commits) perf script python: Add Python3 support to syscall-counts-by-pid.py perf script python: Add Python3 support to syscall-counts.py perf script python: Add Python3 support to stat-cpi.py perf script python: Add Python3 support to stackcollapse.py perf script python: Add Python3 support to sctop.py perf script python: Add Python3 support to powerpc-hcalls.py perf script python: Add Python3 support to net_dropmonitor.py perf script python: Add Python3 support to mem-phys-addr.py perf script python: Add Python3 support to failed-syscalls-by-pid.py perf script python: Add Python3 support to netdev-times.py perf tools: Add perf_exe() helper to find perf binary perf script: Handle missing fields with -F +.. perf data: Add perf_data__open_dir_data function perf data: Add perf_data__(create_dir|close_dir) functions perf data: Fail check_backup in case of error perf data: Make check_backup work over directories perf tools: Add rm_rf_perf_data function perf tools: Add pattern name checking to rm_rf perf tools: Add depth checking to rm_rf perf data: Add global path holder ...
2019-03-06Merge branch 'locking-core-for-linus' of ↵Linus Torvalds1-4/+18
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: "The biggest part of this tree is the new auto-generated atomics API wrappers by Mark Rutland. The primary motivation was to allow instrumentation without uglifying the primary source code. The linecount increase comes from adding the auto-generated files to the Git space as well: include/asm-generic/atomic-instrumented.h | 1689 ++++++++++++++++-- include/asm-generic/atomic-long.h | 1174 ++++++++++--- include/linux/atomic-fallback.h | 2295 +++++++++++++++++++++++++ include/linux/atomic.h | 1241 +------------ I preferred this approach, so that the full call stack of the (already complex) locking APIs is still fully visible in 'git grep'. But if this is excessive we could certainly hide them. There's a separate build-time mechanism to determine whether the headers are out of date (they should never be stale if we do our job right). Anyway, nothing from this should be visible to regular kernel developers. Other changes: - Add support for dynamic keys, which removes a source of false positives in the workqueue code, among other things (Bart Van Assche) - Updates to tools/memory-model (Andrea Parri, Paul E. McKenney) - qspinlock, wake_q and lockdep micro-optimizations (Waiman Long) - misc other updates and enhancements" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (48 commits) locking/lockdep: Shrink struct lock_class_key locking/lockdep: Add module_param to enable consistency checks lockdep/lib/tests: Test dynamic key registration lockdep/lib/tests: Fix run_tests.sh kernel/workqueue: Use dynamic lockdep keys for workqueues locking/lockdep: Add support for dynamic keys locking/lockdep: Verify whether lock objects are small enough to be used as class keys locking/lockdep: Check data structure consistency locking/lockdep: Reuse lock chains that have been freed locking/lockdep: Fix a comment in add_chain_cache() locking/lockdep: Introduce lockdep_next_lockchain() and lock_chain_count() locking/lockdep: Reuse list entries that are no longer in use locking/lockdep: Free lock classes that are no longer in use locking/lockdep: Update two outdated comments locking/lockdep: Make it easy to detect whether or not inside a selftest locking/lockdep: Split lockdep_free_key_range() and lockdep_reset_lock() locking/lockdep: Initialize the locks_before and locks_after lists earlier locking/lockdep: Make zap_class() remove all matching lock order entries locking/lockdep: Reorder struct lock_class members locking/lockdep: Avoid that add_chain_cache() adds an invalid chain to the cache ...
2019-03-06Merge branch 'efi-core-for-linus' of ↵Linus Torvalds1-1/+0
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI updates from Ingo Molnar: "The main EFI changes in this cycle were: - Use 32-bit alignment for efi_guid_t - Allow the SetVirtualAddressMap() call to be omitted - Implement earlycon=efifb based on existing earlyprintk code - Various minor fixes and code cleanups from Sai, Ard and me" * 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efi: Fix build error due to enum collision between efi.h and ima.h efi/x86: Convert x86 EFI earlyprintk into generic earlycon implementation x86: Make ARCH_USE_MEMREMAP_PROT a generic Kconfig symbol efi/arm/arm64: Allow SetVirtualAddressMap() to be omitted efi: Replace GPL license boilerplate with SPDX headers efi/fdt: Apply more cleanups efi: Use 32-bit alignment for efi_guid_t efi/memattr: Don't bail on zero VA if it equals the region's PA x86/efi: Mark can_free_region() as an __init function
2019-03-06x86: Add TSX Force Abort CPUID/MSRPeter Zijlstra (Intel)2-0/+7
Skylake systems will receive a microcode update to address a TSX errata. This microcode will (by default) clobber PMC3 when TSX instructions are (speculatively or not) executed. It also provides an MSR to cause all TSX transaction to abort and preserve PMC3. Add the CPUID enumeration and MSR definition. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2019-03-06docs/core-api/mm: fix user memory accessors formattingMike Rapoport1-12/+12
The descriptions of userspace memory access functions had minor issues with formatting that made kernel-doc unable to properly detect the function/macro names and the return value sections: ./arch/x86/include/asm/uaccess.h:80: info: Scanning doc for ./arch/x86/include/asm/uaccess.h:139: info: Scanning doc for ./arch/x86/include/asm/uaccess.h:231: info: Scanning doc for ./arch/x86/include/asm/uaccess.h:505: info: Scanning doc for ./arch/x86/include/asm/uaccess.h:530: info: Scanning doc for ./arch/x86/lib/usercopy_32.c:58: info: Scanning doc for ./arch/x86/lib/usercopy_32.c:69: warning: No description found for return value of 'clear_user' ./arch/x86/lib/usercopy_32.c:78: info: Scanning doc for ./arch/x86/lib/usercopy_32.c:90: warning: No description found for return value of '__clear_user' Fix the formatting. Link: http://lkml.kernel.org/r/1549549644-4903-3-git-send-email-rppt@linux.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-06mm: update ptep_modify_prot_commit to take old pte value as argAneesh Kumar K.V1-1/+1
Architectures like ppc64 require to do a conditional tlb flush based on the old and new value of pte. Enable that by passing old pte value as the arg. Link: http://lkml.kernel.org/r/20190116085035.29729-3-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-06mm: update ptep_modify_prot_start/commit to take vm_area_struct as argAneesh Kumar K.V2-7/+9
Patch series "NestMMU pte upgrade workaround for mprotect", v5. We can upgrade pte access (R -> RW transition) via mprotect. We need to make sure we follow the recommended pte update sequence as outlined in commit bd5050e38aec ("powerpc/mm/radix: Change pte relax sequence to handle nest MMU hang") for such updates. This patch series does that. This patch (of 5): Some architectures may want to call flush_tlb_range from these helpers. Link: http://lkml.kernel.org/r/20190116085035.29729-2-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-06mm: replace all open encodings for NUMA_NO_NODEAnshuman Khandual1-1/+2
Patch series "Replace all open encodings for NUMA_NO_NODE", v3. All these places for replacement were found by running the following grep patterns on the entire kernel code. Please let me know if this might have missed some instances. This might also have replaced some false positives. I will appreciate suggestions, inputs and review. 1. git grep "nid == -1" 2. git grep "node == -1" 3. git grep "nid = -1" 4. git grep "node = -1" This patch (of 2): At present there are multiple places where invalid node number is encoded as -1. Even though implicitly understood it is always better to have macros in there. Replace these open encodings for an invalid node number with the global macro NUMA_NO_NODE. This helps remove NUMA related assumptions like 'invalid node' from various places redirecting them to a common definition. Link: http://lkml.kernel.org/r/1545127933-10711-2-git-send-email-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> [ixgbe] Acked-by: Jens Axboe <axboe@kernel.dk> [mtip32xx] Acked-by: Vinod Koul <vkoul@kernel.org> [dmaengine.c] Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc] Acked-by: Doug Ledford <dledford@redhat.com> [drivers/infiniband] Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Hans Verkuil <hverkuil@xs4all.nl> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-06Merge branch 'timers-2038-for-linus' of ↵Linus Torvalds1-4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull year 2038 updates from Thomas Gleixner: "Another round of changes to make the kernel ready for 2038. After lots of preparatory work this is the first set of syscalls which are 2038 safe: 403 clock_gettime64 404 clock_settime64 405 clock_adjtime64 406 clock_getres_time64 407 clock_nanosleep_time64 408 timer_gettime64 409 timer_settime64 410 timerfd_gettime64 411 timerfd_settime64 412 utimensat_time64 413 pselect6_time64 414 ppoll_time64 416 io_pgetevents_time64 417 recvmmsg_time64 418 mq_timedsend_time64 419 mq_timedreceiv_time64 420 semtimedop_time64 421 rt_sigtimedwait_time64 422 futex_time64 423 sched_rr_get_interval_time64 The syscall numbers are identical all over the architectures" * 'timers-2038-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits) riscv: Use latest system call ABI checksyscalls: fix up mq_timedreceive and stat exceptions unicore32: Fix __ARCH_WANT_STAT64 definition asm-generic: Make time32 syscall numbers optional asm-generic: Drop getrlimit and setrlimit syscalls from default list 32-bit userspace ABI: introduce ARCH_32BIT_OFF_T config option compat ABI: use non-compat openat and open_by_handle_at variants y2038: add 64-bit time_t syscalls to all 32-bit architectures y2038: rename old time and utime syscalls y2038: remove struct definition redirects y2038: use time32 syscall names on 32-bit syscalls: remove obsolete __IGNORE_ macros y2038: syscalls: rename y2038 compat syscalls x86/x32: use time64 versions of sigtimedwait and recvmmsg timex: change syscalls to use struct __kernel_timex timex: use __kernel_timex internally sparc64: add custom adjtimex/clock_adjtime functions time: fix sys_timer_settime prototype time: Add struct __kernel_timex time: make adjtime compat handling available for 32 bit ...
2019-03-05a.out: remove core dumping supportLinus Torvalds1-67/+0
We're (finally) phasing out a.out support for good. As Borislav Petkov points out, we've supported ELF binaries for about 25 years by now, and coredumping in particular has bitrotted over the years. None of the tool chains even support generating a.out binaries any more, and the plan is to deprecate a.out support entirely for the kernel. But I want to start with just removing the core dumping code, because I can still imagine that somebody actually might want to support a.out as a simpler biinary format. Particularly if you generate some random binaries on the fly, ELF is a much more complicated format (admittedly ELF also does have a lot of toolchain support, mitigating that complexity a lot and you really should have moved over in the last 25 years). So it's at least somewhat possible that somebody out there has some workflow that still involves generating and running a.out executables. In contrast, it's very unlikely that anybody depends on debugging any legacy a.out core files. But regardless, I want this phase-out to be done in two steps, so that we can resurrect a.out support (if needed) without having to resurrect the core file dumping that is almost certainly not needed. Jann Horn pointed to the <asm/a.out-core.h> file that my first trivial cut at this had missed. And Alan Cox points out that the a.out binary loader _could_ be done in user space if somebody wants to, but we might keep just the loader in the kernel if somebody really wants it, since the loader isn't that big and has no really odd special cases like the core dumping does. Acked-by: Borislav Petkov <bp@alien8.de> Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk> Cc: Jann Horn <jannh@google.com> Cc: Richard Weinberger <richard@nod.at> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds2-1/+1
Pull networking updates from David Miller: "Here we go, another merge window full of networking and #ebpf changes: 1) Snoop DHCPACKS in batman-adv to learn MAC/IP pairs in the DHCP range without dealing with floods of ARP traffic, from Linus Lüssing. 2) Throttle buffered multicast packet transmission in mt76, from Felix Fietkau. 3) Support adaptive interrupt moderation in ice, from Brett Creeley. 4) A lot of struct_size conversions, from Gustavo A. R. Silva. 5) Add peek/push/pop commands to bpftool, as well as bash completion, from Stanislav Fomichev. 6) Optimize sk_msg_clone(), from Vakul Garg. 7) Add SO_BINDTOIFINDEX, from David Herrmann. 8) Be more conservative with local resends due to local congestion, from Yuchung Cheng. 9) Allow vetoing of unsupported VXLAN FDBs, from Petr Machata. 10) Add health buffer support to devlink, from Eran Ben Elisha. 11) Add TXQ scheduling API to mac80211, from Toke Høiland-Jørgensen. 12) Add statistics to basic packet scheduler filter, from Cong Wang. 13) Add GRE tunnel support for mlxsw Spectrum-2, from Nir Dotan. 14) Lots of new IP tunneling forwarding tests, also from Nir Dotan. 15) Add 3ad stats to bonding, from Nikolay Aleksandrov. 16) Lots of probing improvements for bpftool, from Quentin Monnet. 17) Various nfp drive #ebpf JIT improvements from Jakub Kicinski. 18) Allow #ebpf programs to access gso_segs from skb shared info, from Eric Dumazet. 19) Add sock_diag support for AF_XDP sockets, from Björn Töpel. 20) Support 22260 iwlwifi devices, from Luca Coelho. 21) Use rbtree for ipv6 defragmentation, from Peter Oskolkov. 22) Add JMP32 instruction class support to #ebpf, from Jiong Wang. 23) Add spinlock support to #ebpf, from Alexei Starovoitov. 24) Support 256-bit keys and TLS 1.3 in ktls, from Dave Watson. 25) Add device infomation API to devlink, from Jakub Kicinski. 26) Add new timestamping socket options which are y2038 safe, from Deepa Dinamani. 27) Add RX checksum offloading for various sh_eth chips, from Sergei Shtylyov. 28) Flow offload infrastructure, from Pablo Neira Ayuso. 29) Numerous cleanups, improvements, and bug fixes to the PHY layer and many drivers from Heiner Kallweit. 30) Lots of changes to try and make packet scheduler classifiers run lockless as much as possible, from Vlad Buslov. 31) Support BCM957504 chip in bnxt_en driver, from Erik Burrows. 32) Add concurrency tests to tc-tests infrastructure, from Vlad Buslov. 33) Add hwmon support to aquantia, from Heiner Kallweit. 34) Allow 64-bit values for SO_MAX_PACING_RATE, from Eric Dumazet. And I would be remiss if I didn't thank the various major networking subsystem maintainers for integrating much of this work before I even saw it. Alexei Starovoitov, Daniel Borkmann, Pablo Neira Ayuso, Johannes Berg, Kalle Valo, and many others. Thank you!" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2207 commits) net/sched: avoid unused-label warning net: ignore sysctl_devconf_inherit_init_net without SYSCTL phy: mdio-mux: fix Kconfig dependencies net: phy: use phy_modify_mmd_changed in genphy_c45_an_config_aneg net: dsa: mv88e6xxx: add call to mv88e6xxx_ports_cmode_init to probe for new DSA framework selftest/net: Remove duplicate header sky2: Disable MSI on Dell Inspiron 1545 and Gateway P-79 net/mlx5e: Update tx reporter status in case channels were successfully opened devlink: Add support for direct reporter health state update devlink: Update reporter state to error even if recover aborted sctp: call iov_iter_revert() after sending ABORT team: Free BPF filter when unregistering netdev ip6mr: Do not call __IP6_INC_STATS() from preemptible context isdn: mISDN: Fix potential NULL pointer dereference of kzalloc net: dsa: mv88e6xxx: support in-band signalling on SGMII ports with external PHYs cxgb4/chtls: Prefix adapter flags with CXGB4 net-sysfs: Switch to bitmap_zalloc() mellanox: Switch to bitmap_zalloc() bpf: add test cases for non-pointer sanitiation logic mlxsw: i2c: Extend initialization by querying resources data ...
2019-03-05xen: remove pre-xen3 fallback handlersArnd Bergmann1-11/+2
The legacy hypercall handlers were originally added with a comment explaining that "copying the argument structures in HYPERVISOR_event_channel_op() and HYPERVISOR_physdev_op() into the local variable is sufficiently safe" and only made sure to not write past the end of the argument structure, the checks in linux/string.h disagree with that, when link-time optimizations are used: In function 'memcpy', inlined from 'pirq_query_unmask' at drivers/xen/fallback.c:53:2, inlined from '__startup_pirq' at drivers/xen/events/events_base.c:529:2, inlined from 'restore_pirqs' at drivers/xen/events/events_base.c:1439:3, inlined from 'xen_irq_resume' at drivers/xen/events/events_base.c:1581:2: include/linux/string.h:350:3: error: call to '__read_overflow2' declared with attribute error: detected read beyond size of object passed as 2nd parameter __read_overflow2(); ^ Further research turned out that only Xen 3.0.2 or earlier required the fallback at all, while all versions in use today don't need it. As far as I can tell, it is not even possible to run a mainline kernel on those old Xen releases, at the time when they were in use, only a patched kernel was supported anyway. Fixes: cf47a83fb06e ("xen/hypercall: fix hypercall fallback code for very old hypervisors") Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Jan Beulich <JBeulich@suse.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Juergen Gross <jgross@suse.com>
2019-03-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2-3/+5
2019-03-04get rid of legacy 'get_ds()' functionLinus Torvalds1-1/+0
Every in-kernel use of this function defined it to KERNEL_DS (either as an actual define, or as an inline function). It's an entirely historical artifact, and long long long ago used to actually read the segment selector valueof '%ds' on x86. Which in the kernel is always KERNEL_DS. Inspired by a patch from Jann Horn that just did this for a very small subset of users (the ones in fs/), along with Al who suggested a script. I then just took it to the logical extreme and removed all the remaining gunk. Roughly scripted with git grep -l '(get_ds())' -- :^tools/ | xargs sed -i 's/(get_ds())/(KERNEL_DS)/' git grep -lw 'get_ds' -- :^tools/ | xargs sed -i '/^#define get_ds()/d' plus manual fixups to remove a few unusual usage patterns, the couple of inline function cases and to fix up a comment that had become stale. The 'get_ds()' function remains in an x86 kvm selftest, since in user space it actually does something relevant. Inspired-by: Jann Horn <jannh@google.com> Inspired-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-02Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds2-3/+5
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "Two last minute fixes: - Prevent value evaluation via functions happening in the user access enabled region of __put_user() (put another way: make sure to evaluate the value to be stored in user space _before_ enabling user space accesses) - Correct the definition of a Hyper-V hypercall constant" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/hyper-v: Fix definition of HV_MAX_FLUSH_REP_COUNT x86/uaccess: Don't leak the AC flag into __put_user() value evaluation
2019-02-28x86/hyper-v: Fix definition of HV_MAX_FLUSH_REP_COUNTLan Tianyu1-1/+1
The max flush rep count of HvFlushGuestPhysicalAddressList hypercall is equal with how many entries of union hv_gpa_page_range can be populated into the input parameter page. The code lacks parenthesis around PAGE_SIZE - 2 * sizeof(u64) which results in bogus computations. Add them. Fixes: cc4edae4b924 ("x86/hyper-v: Add HvFlushGuestAddressList hypercall support") Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: kys@microsoft.com Cc: haiyangz@microsoft.com Cc: sthemmin@microsoft.com Cc: sashal@kernel.org Cc: bp@alien8.de Cc: hpa@zytor.com Cc: gregkh@linuxfoundation.org Cc: devel@linuxdriverproject.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20190225143114.5149-1-Tianyu.Lan@microsoft.com
2019-02-28Merge branch 'linus' into perf/core, to pick up fixesIngo Molnar4-2/+12
Signed-off-by: Ingo Molnar <mingo@kernel.org>