summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)AuthorFilesLines
2019-04-30x86/alternatives: Add text_poke_kgdb() to not assert the lock when debuggingNadav Amit3-19/+45
text_mutex is currently expected to be held before text_poke() is called, but kgdb does not take the mutex, and instead *supposedly* ensures the lock is not taken and will not be acquired by any other core while text_poke() is running. The reason for the "supposedly" comment is that it is not entirely clear that this would be the case if gdb_do_roundup is zero. Create two wrapper functions, text_poke() and text_poke_kgdb(), which do or do not run the lockdep assertion respectively. While we are at it, change the return code of text_poke() to something meaningful. One day, callers might actually respect it and the existing BUG_ON() when patching fails could be removed. For kgdb, the return value can actually be used. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Nadav Amit <namit@vmware.com> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Jiri Kosina <jkosina@suse.cz> Cc: <akpm@linux-foundation.org> Cc: <ard.biesheuvel@linaro.org> Cc: <deneen.t.dock@intel.com> Cc: <kernel-hardening@lists.openwall.com> Cc: <kristen@linux.intel.com> Cc: <linux_dti@icloud.com> Cc: <will.deacon@arm.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 9222f606506c ("x86/alternatives: Lockdep-enforce text_mutex in text_poke*()") Link: https://lkml.kernel.org/r/20190426001143.4983-2-namit@vmware.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-30Merge tag 'v5.1-rc7' into x86/mm, to pick up fixesIngo Molnar3-6/+12
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-29x86: make ZERO_PAGE() at least parse its argumentLinus Torvalds1-1/+1
This doesn't really do anything, but at least we now parse teh ZERO_PAGE() address argument so that we'll catch the most obvious errors in usage next time they'll happen. See commit 6a5c5d26c4c6 ("rdma: fix build errors on s390 and MIPS due to bad ZERO_PAGE use") what happens when we don't have any use of the macro argument at all. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-29x86/stacktrace: Use common infrastructureThomas Gleixner2-97/+20
Replace the stack_trace_save*() functions with the new arch_stack_walk() interfaces. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: linux-arch@vger.kernel.org Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Alexander Potapenko <glider@google.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: linux-mm@kvack.org Cc: David Rientjes <rientjes@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: kasan-dev@googlegroups.com Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Akinobu Mita <akinobu.mita@gmail.com> Cc: Christoph Hellwig <hch@lst.de> Cc: iommu@lists.linux-foundation.org Cc: Robin Murphy <robin.murphy@arm.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: David Sterba <dsterba@suse.com> Cc: Chris Mason <clm@fb.com> Cc: Josef Bacik <josef@toxicpanda.com> Cc: linux-btrfs@vger.kernel.org Cc: dm-devel@redhat.com Cc: Mike Snitzer <snitzer@redhat.com> Cc: Alasdair Kergon <agk@redhat.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: intel-gfx@lists.freedesktop.org Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: dri-devel@lists.freedesktop.org Cc: David Airlie <airlied@linux.ie> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Cc: Miroslav Benes <mbenes@suse.cz> Link: https://lkml.kernel.org/r/20190425094803.816485461@linutronix.de
2019-04-29perf/x86: Make perf callchains work without CONFIG_FRAME_POINTERKairui Song3-23/+18
Currently perf callchain doesn't work well with ORC unwinder when sampling from trace point. We'll get useless in kernel callchain like this: perf 6429 [000] 22.498450: kmem:mm_page_alloc: page=0x176a17 pfn=1534487 order=0 migratetype=0 gfp_flags=GFP_KERNEL ffffffffbe23e32e __alloc_pages_nodemask+0x22e (/lib/modules/5.1.0-rc3+/build/vmlinux) 7efdf7f7d3e8 __poll+0x18 (/usr/lib64/libc-2.28.so) 5651468729c1 [unknown] (/usr/bin/perf) 5651467ee82a main+0x69a (/usr/bin/perf) 7efdf7eaf413 __libc_start_main+0xf3 (/usr/lib64/libc-2.28.so) 5541f689495641d7 [unknown] ([unknown]) The root cause is that, for trace point events, it doesn't provide a real snapshot of the hardware registers. Instead perf tries to get required caller's registers and compose a fake register snapshot which suppose to contain enough information for start a unwinding. However without CONFIG_FRAME_POINTER, if failed to get caller's BP as the frame pointer, so current frame pointer is returned instead. We get a invalid register combination which confuse the unwinder, and end the stacktrace early. So in such case just don't try dump BP, and let the unwinder start directly when the register is not a real snapshot. Use SP as the skip mark, unwinder will skip all the frames until it meet the frame of the trace point caller. Tested with frame pointer unwinder and ORC unwinder, this makes perf callchain get the full kernel space stacktrace again like this: perf 6503 [000] 1567.570191: kmem:mm_page_alloc: page=0x16c904 pfn=1493252 order=0 migratetype=0 gfp_flags=GFP_KERNEL ffffffffb523e2ae __alloc_pages_nodemask+0x22e (/lib/modules/5.1.0-rc3+/build/vmlinux) ffffffffb52383bd __get_free_pages+0xd (/lib/modules/5.1.0-rc3+/build/vmlinux) ffffffffb52fd28a __pollwait+0x8a (/lib/modules/5.1.0-rc3+/build/vmlinux) ffffffffb521426f perf_poll+0x2f (/lib/modules/5.1.0-rc3+/build/vmlinux) ffffffffb52fe3e2 do_sys_poll+0x252 (/lib/modules/5.1.0-rc3+/build/vmlinux) ffffffffb52ff027 __x64_sys_poll+0x37 (/lib/modules/5.1.0-rc3+/build/vmlinux) ffffffffb500418b do_syscall_64+0x5b (/lib/modules/5.1.0-rc3+/build/vmlinux) ffffffffb5a0008c entry_SYSCALL_64_after_hwframe+0x44 (/lib/modules/5.1.0-rc3+/build/vmlinux) 7f71e92d03e8 __poll+0x18 (/usr/lib64/libc-2.28.so) 55a22960d9c1 [unknown] (/usr/bin/perf) 55a22958982a main+0x69a (/usr/bin/perf) 7f71e9202413 __libc_start_main+0xf3 (/usr/lib64/libc-2.28.so) 5541f689495641d7 [unknown] ([unknown]) Co-developed-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Kairui Song <kasong@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Young <dyoung@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190422162652.15483-1-kasong@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-27Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds2-1/+7
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: - Fix an early boot crash in the RSDP parsing code by effectively turning off the parsing call - we ran out of time but want to fix the regression. The more involved fix is being worked on. - Fix a crash that can trigger in the kmemlek code. * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Fix a crash with kmemleak_scan() x86/boot: Disable RSDP parsing temporarily
2019-04-27KVM: VMX: Move RSB stuffing to before the first RET after VM-ExitRick Edgecombe2-3/+12
The not-so-recent change to move VMX's VM-Exit handing to a dedicated "function" unintentionally exposed KVM to a speculative attack from the guest by executing a RET prior to stuffing the RSB. Make RSB stuffing happen immediately after VM-Exit, before any unpaired returns. Alternatively, the VM-Exit path could postpone full RSB stuffing until its current location by stuffing the RSB only as needed, or by avoiding returns in the VM-Exit path entirely, but both alternatives are beyond ugly since vmx_vmexit() has multiple indirect callers (by way of vmx_vmenter()). And putting the RSB stuffing immediately after VM-Exit makes it much less likely to be re-broken in the future. Note, the cost of PUSH/POP could be avoided in the normal flow by pairing the PUSH RAX with the POP RAX in __vmx_vcpu_run() and adding an a POP to nested_vmx_check_vmentry_hw(), but such a weird/subtle dependency is likely to cause problems in the long run, and PUSH/POP will take all of a few cycles, which is peanuts compared to the number of cycles required to fill the RSB. Fixes: 453eafbe65f7 ("KVM: VMX: Move VM-Enter + VM-Exit handling to non-inline sub-routines") Reported-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-04-26x86/mm/tlb: Remove 'struct flush_tlb_info' from the stackNadav Amit1-34/+82
Move flush_tlb_info variables off the stack. This allows to align flush_tlb_info to cache-line and avoid potentially unnecessary cache line movements. It also allows to have a fixed virtual-to-physical translation of the variables, which reduces TLB misses. Use per-CPU struct for flush_tlb_mm_range() and flush_tlb_kernel_range(). Add debug assertions to ensure there are no nested TLB flushes that might overwrite the per-CPU data. For arch_tlbbatch_flush() use a const struct. Results when running a microbenchmarks that performs 10^6 MADV_DONTEED operations and touching a page, in which 3 additional threads run a busy-wait loop (5 runs, PTI and retpolines are turned off): base off-stack ---- --------- avg (usec/op) 1.629 1.570 (-3%) stddev 0.014 0.009 Signed-off-by: Nadav Amit <namit@vmware.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20190425230143.7008-1-namit@vmware.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-26Merge branch 'linus' into x86/mm, to pick up dependent fixIngo Molnar36-426/+686
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller17-57/+157
Two easy cases of overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-26x86: tsc: Rework time_cpufreq_notifier()Rafael J. Wysocki1-15/+14
There are problems with running time_cpufreq_notifier() on SMP systems. First off, the rdtsc() called from there runs on the CPU executing that code and not necessarily on the CPU whose sched_clock() rate is updated which is questionable at best. Second, in the cases when the frequencies of all CPUs in an SMP system are always in sync, it is not sufficient to update just one of them or the set associated with a given cpufreq policy on frequency changes - all CPUs in the system should be updated and that would require more than a simple transition notifier. Note, however, that the underlying issue (the TSC rate depending on the CPU frequency) has not been present in hardware shipping for the last few years and in quite a few relevant cases (acpi-cpufreq in particular) running time_cpufreq_notifier() will cause the TSC to be marked as unstable anyway. For this reason, make time_cpufreq_notifier() simply mark the TSC as unstable and give up when run on SMP and only try to carry out any adjustments otherwise. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
2019-04-25x86/apic: Unify duplicated local apic timer clockevent initializationJacob Pan1-26/+31
Local APIC timer clockevent parameters can be calculated based on platform specific methods. However the code is mostly duplicated with the interrupt based calibration. The commit which increased the max_delta parameter updated only one place and made the implementations diverge. Unify it to prevent further damage. [ tglx: Rename function to lapic_init_clockevent() and adjust changelog a bit ] Fixes: 4aed89d6b515 ("x86, lapic-timer: Increase the max_delta to 31 bits") Reported-by: Daniel Drake <drake@endlessm.com> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Len Brown <lenb@kernel.org> Link: https://lkml.kernel.org/r/1556213272-63568-1-git-send-email-jacob.jun.pan@linux.intel.com
2019-04-25xen/pvh: correctly setup the PV EFI interface for dom0Roger Pau Monne5-14/+18
This involves initializing the boot params EFI related fields and the efi global variable. Without this fix a PVH dom0 doesn't detect when booted from EFI, and thus doesn't support accessing any of the EFI related data. Reported-by: PGNet Dev <pgnet.dev@gmail.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: stable@vger.kernel.org # 4.19+
2019-04-25xen/pvh: set xen_domain_type to HVM in xen_pvh_initRoger Pau Monne1-0/+1
Or else xen_domain() returns false despite xen_pvh being set. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: stable@vger.kernel.org # 4.19+
2019-04-25crypto: shash - remove shash_desc::flagsEric Biggers2-3/+0
The flags field in 'struct shash_desc' never actually does anything. The only ostensibly supported flag is CRYPTO_TFM_REQ_MAY_SLEEP. However, no shash algorithm ever sleeps, making this flag a no-op. With this being the case, inevitably some users who can't sleep wrongly pass MAY_SLEEP. These would all need to be fixed if any shash algorithm actually started sleeping. For example, the shash_ahash_*() functions, which wrap a shash algorithm with the ahash API, pass through MAY_SLEEP from the ahash API to the shash API. However, the shash functions are called under kmap_atomic(), so actually they're assumed to never sleep. Even if it turns out that some users do need preemption points while hashing large buffers, we could easily provide a helper function crypto_shash_update_large() which divides the data into smaller chunks and calls crypto_shash_update() and cond_resched() for each chunk. It's not necessary to have a flag in 'struct shash_desc', nor is it necessary to make individual shash algorithms aware of this at all. Therefore, remove shash_desc::flags, and document that the crypto_shash_*() functions can be called from any context. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-04-25x86/Kconfig: Deprecate DISCONTIGMEM support for 32-bit x86Mike Rapoport1-1/+2
Mel Gorman says: "32-bit NUMA systems should be non-existent in practice. The last NUMA system I'm aware of that was both NUMA and 32-bit only died somewhere between 2004 and 2007. If someone is running a 64-bit capable system in 32-bit mode with NUMA, they really are just punishing themselves for fun." Mark DISCONTIGMEM broken for now as suggested by Christoph Hellwig, and (hopefully) remove it in a couple of releases. Suggested-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/1556112252-9339-3-git-send-email-rppt@linux.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-25x86/Kconfig: Make SPARSEMEM default for 32-bit x86Mike Rapoport1-6/+1
Sparsemem has been a default memory model for x86-64 for over a decade, since: b263295dbffd ("x86: 64-bit, make sparsemem vmemmap the only memory model"). Make it the default for 32-bit NUMA systems (if there any left) as well. Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/1556112252-9339-2-git-send-email-rppt@linux.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-25perf/x86/intel: Update KBL Package C-state events to also include ↵Harry Pan1-5/+5
PC8/PC9/PC10 counters Kaby Lake (and Coffee Lake) has PC8/PC9/PC10 residency counters. This patch updates the list of Kaby/Coffee Lake PMU event counters from the snb_cstates[] list of events to the hswult_cstates[] list of events, which keeps all previously supported events and also adds the PKG_C8, PKG_C9 and PKG_C10 residency counters. This allows user space tools to profile them through the perf interface. Signed-off-by: Harry Pan <harry.pan@intel.com> Cc: <stable@vger.kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: gs0622@gmail.com Link: http://lkml.kernel.org/r/20190424145033.1924-1-harry.pan@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-24x86/pci: Clean up usage of X86_DEV_DMA_OPSChristoph Hellwig1-3/+0
We have supported per-device dma_map_ops in generic code for a long time, and this symbol just guards the inclusion of the dma_map_ops registry used for vmd. Stop enabling it for anything but vmd. No change in functionality intended. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20190410080220.21705-3-hch@lst.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-24x86/build: Move _etext to actual end of .textKees Cook1-3/+3
When building x86 with Clang LTO and CFI, CFI jump regions are automatically added to the end of the .text section late in linking. As a result, the _etext position was being labelled before the appended jump regions, causing confusion about where the boundaries of the executable region actually are in the running kernel, and broke at least the fault injection code. This moves the _etext mark to outside (and immediately after) the .text area, as it already the case on other architectures (e.g. arm64, arm). Reported-and-tested-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20190423183827.GA4012@beast Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-24x86/um/vdso: Drop unnecessary cc-ldoptionNick Desaulniers1-1/+1
Towards the goal of removing cc-ldoption, it seems that --hash-style= was added to binutils 2.17.50.0.2 in 2006. The minimal required version of binutils for the kernel according to Documentation/process/changes.rst is 2.20. Suggested-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: anton.ivanov@cambridgegreys.com Cc: clang-built-linux@googlegroups.com Cc: jdike@addtoit.com Cc: linux-um@lists.infradead.org Cc: richard@nod.at Link: http://lkml.kernel.org/r/20190423211554.1594-1-ndesaulniers@google.com Link: https://gcc.gnu.org/ml/gcc/2007-01/msg01141.html Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-24x86/mm: Remove in_nmi() warning from 64-bit implementation of vmalloc_fault()Jiri Kosina1-2/+0
In-NMI warnings have been added to vmalloc_fault() via: ebc8827f75 ("x86: Barf when vmalloc and kmemcheck faults happen in NMI") back in the time when our NMI entry code could not cope with nested NMIs. These days, it's perfectly fine to take a fault in NMI context and we don't have to care about the fact that IRET from the fault handler might cause NMI nesting. This warning has already been removed from 32-bit implementation of vmalloc_fault() in: 6863ea0cda8 ("x86/mm: Remove in_nmi() warning from vmalloc_fault()") but the 64-bit version was omitted. Remove the bogus warning also from 64-bit implementation of vmalloc_fault(). Reported-by: Nicolai Stange <nstange@suse.de> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 6863ea0cda8 ("x86/mm: Remove in_nmi() warning from vmalloc_fault()") Link: http://lkml.kernel.org/r/nycvar.YFH.7.76.1904240902280.9803@cbobk.fhfr.pm Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-24x86/uaccess: Dont leak the AC flag into __put_user() argument evaluationPeter Zijlstra1-3/+4
The __put_user() macro evaluates it's @ptr argument inside the __uaccess_begin() / __uaccess_end() region. While this would normally not be expected to be an issue, an UBSAN bug (it ignored -fwrapv, fixed in GCC 8+) would transform the @ptr evaluation for: drivers/gpu/drm/i915/i915_gem_execbuffer.c: if (unlikely(__put_user(offset, &urelocs[r-stack].presumed_offset))) { into a signed-overflow-UB check and trigger the objtool AC validation. Finish this commit: 2a418cf3f5f1 ("x86/uaccess: Don't leak the AC flag into __put_user() value evaluation") and explicitly evaluate all 3 arguments early. Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: luto@kernel.org Fixes: 2a418cf3f5f1 ("x86/uaccess: Don't leak the AC flag into __put_user() value evaluation") Link: http://lkml.kernel.org/r/20190424072208.695962771@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-24x86/mm: Fix a crash with kmemleak_scan()Qian Cai1-0/+6
The first kmemleak_scan() call after boot would trigger the crash below because this callpath: kernel_init free_initmem mem_encrypt_free_decrypted_mem free_init_pages unmaps memory inside the .bss when DEBUG_PAGEALLOC=y. kmemleak_init() will register the .data/.bss sections and then kmemleak_scan() will scan those addresses and dereference them looking for pointer references. If free_init_pages() frees and unmaps pages in those sections, kmemleak_scan() will crash if referencing one of those addresses: BUG: unable to handle kernel paging request at ffffffffbd402000 CPU: 12 PID: 325 Comm: kmemleak Not tainted 5.1.0-rc4+ #4 RIP: 0010:scan_block Call Trace: scan_gray_list kmemleak_scan kmemleak_scan_thread kthread ret_from_fork Since kmemleak_free_part() is tolerant to unknown objects (not tracked by kmemleak), it is fine to call it from free_init_pages() even if not all address ranges passed to this function are known to kmemleak. [ bp: Massage. ] Fixes: b3f0907c71e0 ("x86/mm: Add .bss..decrypted section to hold shared variables") Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190423165811.36699-1-cai@lca.pw
2019-04-24Merge tag 'drm-intel-next-2019-04-17' of ↵Dave Airlie1-1/+3
git://anongit.freedesktop.org/drm/drm-intel into drm-next UAPI Changes: - uAPI "Fixes:" patch for the upcoming kernel 5.1, included here too We have an Ack from the media folks (only current user) for this late tweak Cross-subsystem Changes: - ALSA: hda: Fix racy display power access (Takashi, Chris) Driver Changes: - DDI and MIPI-DSI clocks fixes for Icelake (Vandita) - Fix Icelake frequency change/locking (RPS) (Mika) - Temporarily disable ppGTT read-only bit on Icelake (Mika) - Add missing Icelake W/As (Mika) - Enable 12 deep CSB status FIFO on Icelake (Mika) - Inherit more Icelake code for Elkhartlake (Bob, Jani) - Handle catastrophic error on engine reset (Mika) - Shortcut readiness to reset check (Mika) - Regression fix for GEM_BUSY causing us to report a mixed uabi-class request as not busy (Chris) - Revert back to max link rate and lane count on eDP (Jani) - Fix pipe BPP readout for BXT/GLK DSI (Ville) - Set DP min_bpp to 8*3 for non-RGB output formats (Ville) - Enable coarse preemption boundaries for Gen8 (Chris) - Do not enable FEC without DSC (Ville) - Restore correct BXT DDI latency optim setting calculation (Ville) - Always reset context's RING registers to avoid running workload twice during reset (Chris) - Set GPU wedged on driver unload (Janusz) - Consolidate two similar barries from timeline into one (Chris) - Only reset the pinned kernel contexts on resume (Chris) - Wakeref tracking improvements (Chris, Imre) - Lockdep fixes for shrinker interactions (Chris) - Bump ready tasks ahead of busywaits in prep of semaphore use (Chris) - Huge step in splitting display code into fine grained files (Jani) - Refactor the IRQ init/reset macros for code saving (Paulo) - Convert IRQ initialization code to uncore MMIO access (Paulo) - Convert workarounds code to use uncore MMIO access (Chris) - Nuke drm_crtc_state and use intel_atomic_state instead (Manasi) - Update SKL clock-gating WA (Radhakrishna, Ville) - Isolate GuC reset code flow (Chris) - Expose force_dsc_enable through debugfs (Manasi) - Header standalone compile testing framework (Jani) - Code cleanups to reduce driver footprint (Chris) - PSR code fixes and cleanups (Jose) - Sparse and kerneldoc updates (Chris) - Suppress spurious combo PHY B warning (Vile) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190418080426.GA6409@jlahtine-desk.ger.corp.intel.com
2019-04-23x86/MCE/AMD: Don't report L1 BTB MCA errors on some family 17h modelsYazen Ghannam3-13/+48
AMD family 17h Models 10h-2Fh may report a high number of L1 BTB MCA errors under certain conditions. The errors are benign and can safely be ignored. However, the high error rate may cause the MCA threshold counter to overflow causing a high rate of thresholding interrupts. In addition, users may see the errors reported through the AMD MCE decoder module, even with the interrupt disabled, due to MCA polling. Clear the "Counter Present" bit in the Instruction Fetch bank's MCA_MISC0 register. This will prevent enabling MCA thresholding on this bank which will prevent the high interrupt rate due to this error. Define an AMD-specific function to filter these errors from the MCE event pool so that they don't get reported during early boot. Rename filter function in EDAC/mce_amd to avoid a naming conflict, while at it. [ bp: Move function prototype to the internal header and massage/cleanup, fix typos. ] Reported-by: Rafał Miłecki <rafal@milecki.pl> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "clemej@gmail.com" <clemej@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Morse <james.morse@arm.com> Cc: Kees Cook <keescook@chromium.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Pu Wen <puwen@hygon.cn> Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: Shirish S <Shirish.S@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: x86-ml <x86@kernel.org> Cc: <stable@vger.kernel.org> # 5.0.x: c95b323dcd35: x86/MCE/AMD: Turn off MC4_MISC thresholding on all family 0x15 models Cc: <stable@vger.kernel.org> # 5.0.x: 30aa3d26edb0: x86/MCE/AMD: Carve out the MC4_MISC thresholding quirk Cc: <stable@vger.kernel.org> # 5.0.x: 9308fd407455: x86/MCE: Group AMD function prototypes in <asm/mce.h> Cc: <stable@vger.kernel.org> # 5.0.x Link: https://lkml.kernel.org/r/20190325163410.171021-2-Yazen.Ghannam@amd.com
2019-04-23x86/MCE: Add an MCE-record filtering functionYazen Ghannam3-0/+11
Some systems may report spurious MCA errors. In general, spurious MCA errors may be disabled by clearing a particular bit in MCA_CTL. However, clearing a bit in MCA_CTL may not be recommended for some errors, so the only option is to ignore them. An MCA error is printed and handled after it has been added to the MCE event pool. So an MCA error can be ignored by not adding it to that pool in the first place. Add such a filtering function. [ bp: Move function prototype to the internal header and massage. ] Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: "clemej@gmail.com" <clemej@gmail.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Pu Wen <puwen@hygon.cn> Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: "rafal@milecki.pl" <rafal@milecki.pl> Cc: Shirish S <Shirish.S@amd.com> Cc: <stable@vger.kernel.org> # 5.0.x Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190325163410.171021-1-Yazen.Ghannam@amd.com
2019-04-23x86/xen: Add "xen_timer_slop" command line optionRyan Thibodeaux1-3/+17
Add a new command-line option "xen_timer_slop=<INT>" that sets the minimum delta of virtual Xen timers. This commit does not change the default timer slop value for virtual Xen timers. Lowering the timer slop value should improve the accuracy of virtual timers (e.g., better process dispatch latency), but it will likely increase the number of virtual timer interrupts (relative to the original slop setting). The original timer slop value has not changed since the introduction of the Xen-aware Linux kernel code. This commit provides users an opportunity to tune timer performance given the refinements to hardware and the Xen event channel processing. It also mirrors a feature in the Xen hypervisor - the "timer_slop" Xen command line option. [boris: updated comment describing TIMER_SLOP] Signed-off-by: Ryan Thibodeaux <ryan.thibodeaux@starlab.io> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2019-04-22x86/irq: Fix outdated commentsJiang Biao1-2/+2
INVALIDATE_TLB_VECTOR_START has been removed by: 52aec3308db8("x86/tlb: replace INVALIDATE_TLB_VECTOR by CALL_FUNCTION_VECTOR") while VSYSCALL_EMU_VECTO(204) has also been removed, by: 3ae36655b97a("x86-64: Rework vsyscall emulation and add vsyscall= parameter") so update the comments in <asm/irq_vectors.h> accordingly. Signed-off-by: Jiang Biao <benbjiang@tencent.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bp@alien8.de Link: http://lkml.kernel.org/r/20190422024943.71918-1-benbjiang@tencent.com [ Improved the changelog. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-22x86/boot: Disable RSDP parsing temporarilyBorislav Petkov1-1/+1
The original intention to move RDSP parsing very early, before KASLR does its ranges selection, was to accommodate movable memory regions machines (CONFIG_MEMORY_HOTREMOVE) to still be able to do memory hotplug. However, that broke kexec'ing a kernel on EFI machines because depending on where the EFI systab was mapped, on at least one machine it isn't present in the kexec mapping of the second kernel, leading to a triple fault in the early code. Fixing this properly requires significantly involved surgery and we cannot allow ourselves to do that, that close to the merge window. So disable the RSDP parsing code temporarily until it is fixed properly in the next release cycle. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Baoquan He <bhe@redhat.com> Cc: Chao Fan <fanc.fnst@cn.fujitsu.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: indou.takao@jp.fujitsu.com Cc: Ingo Molnar <mingo@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: kasong@redhat.com Cc: Kees Cook <keescook@chromium.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: msys.mizuma@gmail.com Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190419141952.GE10324@zn.tnic
2019-04-22x86/kdump: Fall back to reserve high crashkernel memoryDave Young1-8/+14
crashkernel=xM tries to reserve memory for the crash kernel under 4G, which is enough, usually. But this could fail sometimes, for example when one tries to reserve a big chunk like 2G, for example. So let the crashkernel=xM just fall back to use high memory in case it fails to find a suitable low range. Do not set the ,high as default because it allocates extra low memory for DMA buffers and swiotlb, and this is not always necessary for all machines. Typically, crashkernel=128M usually works with low reservation under 4G, so keep <4G as default. [ bp: Massage. ] Signed-off-by: Dave Young <dyoung@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Baoquan He <bhe@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: linux-doc@vger.kernel.org Cc: "Paul E. McKenney" <paulmck@linux.ibm.com> Cc: Petr Tesarik <ptesarik@suse.cz> Cc: piliu@redhat.com Cc: Ram Pai <linuxram@us.ibm.com> Cc: Sinan Kaya <okaya@codeaurora.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thymo van Beers <thymovanbeers@gmail.com> Cc: vgoyal@redhat.com Cc: x86-ml <x86@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Zhimin Gu <kookoo.gu@intel.com> Link: https://lkml.kernel.org/r/20190422031905.GA8387@dhcp-128-65.nay.redhat.com
2019-04-22x86/kdump: Have crashkernel=X reserve under 4G by defaultDave Young1-5/+5
The kdump crashkernel low reservation is limited to under 896M even for X86_64. This obscure and miserable limitation exists for compatibility with old kexec-tools but the reason is not documented anywhere. Some more tests/investigations about the background: a) Previously, old kexec-tools could only load purgatory to memory under 2G. Eric removed that limitation in 2012 in kexec-tools: b4f9f8599679 ("kexec x86_64: Make purgatory relocatable anywhere in the 64bit address space.") b) Back in 2013 Yinghai removed all the limitations in new kexec-tools, bzImage64 can be loaded anywhere: 82c3dd2280d2 ("kexec, x86_64: Load bzImage64 above 4G") c) Test results with old kexec-tools with old and latest kernels: 1. Old kexec-tools can not build with modern toolchain anymore, I built it in a RHEL6 vm. 2. 2.0.0 kexec-tools does not work with the latest kernel even with memory under 896M and gives an error: "ELF core (kcore) parse failed" For that it needs below kexec-tools fix: ed15ba1b9977 ("build_mem_phdrs(): check if p_paddr is invalid") 3. Even with patched kexec-tools which fixes 2), it still needs some other fixes to work correctly for KASLR-enabled kernels. So the situation is: * Old kexec-tools is already broken with latest kernels. * We can not keep these limitations forever just for compatibility with very old kexec-tools. * If one must use old tools then he/she can choose crashkernel=X@Y. * People have reported bugs where crashkernel=384M failed because KASLR makes the 0-896M space sparse. * Crashkernel can reserve in low or high area, it is natural to understand low as memory under 4G. Hence drop the 896M limitation and change crashkernel low reservation to reserve under 4G by default. Signed-off-by: Dave Young <dyoung@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Baoquan He <bhe@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: David Howells <dhowells@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Petr Tesarik <ptesarik@suse.cz> Cc: piliu@redhat.com Cc: Ram Pai <linuxram@us.ibm.com> Cc: Sinan Kaya <okaya@codeaurora.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: vgoyal@redhat.com Cc: x86-ml <x86@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Zhimin Gu <kookoo.gu@intel.com> Link: https://lkml.kernel.org/r/20190421035058.943630505@redhat.com
2019-04-21x86/fault: Make fault messages more succinctBorislav Petkov1-3/+3
So we are going to be staring at those in the next years, let's make them more succinct. In particular: - change "address = " to "address: " - "-privileged" reads funny. It should be simply "kernel" or "user" - "from kernel code" reads funny too. "kernel mode" or "user mode" is more natural. An actual example says more than 1000 words, of course: [ 0.248370] BUG: kernel NULL pointer dereference, address: 00000000000005b8 [ 0.249120] #PF: supervisor write access in kernel mode [ 0.249717] #PF: error_code(0x0002) - not-present page Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dave.hansen@linux.intel.com Cc: luto@kernel.org Cc: riel@surriel.com Cc: sean.j.christopherson@intel.com Cc: yu-cheng.yu@intel.com Link: http://lkml.kernel.org/r/20190421183524.GC6048@zn.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-20Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds4-31/+92
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Misc fixes: - various tooling fixes - kretprobe fixes - kprobes annotation fixes - kprobes error checking fix - fix the default events for AMD Family 17h CPUs - PEBS fix - AUX record fix - address filtering fix" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/kprobes: Avoid kretprobe recursion bug kprobes: Mark ftrace mcount handler functions nokprobe x86/kprobes: Verify stack frame on kretprobe perf/x86/amd: Add event map for AMD Family 17h perf bpf: Return NULL when RB tree lookup fails in perf_env__find_btf() perf tools: Fix map reference counting perf evlist: Fix side band thread draining perf tools: Check maps for bpf programs perf bpf: Return NULL when RB tree lookup fails in perf_env__find_bpf_prog_info() tools include uapi: Sync sound/asound.h copy perf top: Always sample time to satisfy needs of use of ordered queuing perf evsel: Use hweight64() instead of hweight_long(attr.sample_regs_user) tools lib traceevent: Fix missing equality check for strcmp perf stat: Disable DIR_FORMAT feature for 'perf stat record' perf scripts python: export-to-sqlite.py: Fix use of parent_id in calls_view perf header: Fix lock/unlock imbalances when processing BPF/BTF info perf/x86: Fix incorrect PEBS_REGS perf/ring_buffer: Fix AUX record suppression perf/core: Fix the address filtering fix kprobes: Fix error check when reusing optimized probes
2019-04-20Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds11-14/+41
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Misc fixes all over the place: a console spam fix, section attributes fixes, a KASLR fix, a TLB stack-variable alignment fix, a reboot quirk, boot options related warnings fix, an LTO fix, a deadlock fix and an RDT fix" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu/intel: Lower the "ENERGY_PERF_BIAS: Set to normal" message's log priority x86/cpu/bugs: Use __initconst for 'const' init data x86/mm/KASLR: Fix the size of the direct mapping section x86/Kconfig: Fix spelling mistake "effectivness" -> "effectiveness" x86/mm/tlb: Revert "x86/mm: Align TLB invalidation info" x86/reboot, efi: Use EFI reboot for Acer TravelMate X514-51T x86/mm: Prevent bogus warnings with "noexec=off" x86/build/lto: Fix truncated .bss with -fdata-sections x86/speculation: Prevent deadlock on ssb_state::lock x86/resctrl: Do not repeat rdtgroup mode initialization
2019-04-20asm-generic: generalize asm/sockios.hArnd Bergmann1-1/+0
ia64, parisc and sparc just use a copy of the generic version of asm/sockios.h, and x86 is a redirect to the same file, so we can just let the header file be generated. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-19x86/vdso: Rename variable to fix -Wshadow warningLeonardo Brás1-6/+7
The go32() and go64() functions has an argument and a local variable called ‘name’. Rename both to clarify the code and to fix a warning with -Wshadow. Signed-off-by: Leonardo Brás <leobras.c@gmail.com> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: David.Laight@aculab.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Michal Marek <michal.lkml@markovi.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: helen@koikeco.de Cc: linux-kbuild@vger.kernel.org Cc: lkcamp@lists.libreplanetbr.org Link: http://lkml.kernel.org/r/20181023011022.GA6574@WindFlash Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-19x86/tools/relocs: Fix big section header tablesArtem Savkov1-29/+45
In case when the number of entries in the section header table is larger then or equal to SHN_LORESERVE the size of the table is held in the sh_size member of the initial entry in section header table instead of e_shnum. Same with the string table index which is located in sh_link instead of e_shstrndx. This case is easily reproducible with KCFLAGS="-ffunction-sections", bzImage build fails with "String table index out of bounds" error. Signed-off-by: Artem Savkov <asavkov@redhat.com> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Joe Lawrence <joe.lawrence@redhat.com> Cc: Eric W . Biederman <ebiederm@xmission.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20181129155615.2594-1-asavkov@redhat.com [ Simplify the die() lines. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-19x86/topology: Make DEBUG_HOTPLUG_CPU0 pr_info() more descriptiveJuri Lelli1-1/+1
DEBUG_HOTPLUG_CPU0 debug feature offlines a CPU as early as possible allowing userspace to boot up without that CPU (so that it is possible to check for unwanted dependencies towards the offlined CPU). After doing so it emits a "CPU %u is now offline" pr_info, which is not enough descriptive of why the CPU was offlined (e.g., one might be running with a config that triggered some problem, not being aware that CONFIG_DEBUG_ HOTPLUG_CPU0 is set). Add a bit more of informative text to the pr_info, so that it is immediately obvious why a CPU has been offlined in early boot stages. Background: Got to scratch my head a bit while debugging a WARNING splat related to the offlining of CPU0. Without being aware yet of this debug option it wasn't immediately obvious why CPU0 was being offlined by the kernel. Signed-off-by: Juri Lelli <juri.lelli@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: fenghua.yu@intel.com Link: http://lkml.kernel.org/r/20181219151647.15073-1-juri.lelli@redhat.com [ Merge line-broken line. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-19x86/fault: Decode and print #PF oops in human readable formSean Christopherson1-31/+11
Linus pointed out that deciphering the raw #PF error code and printing a more human readable message are two different things, and also that printing the negative cases is mostly just noise[1]. For example, the USER bit doesn't mean the fault originated in user code and stating that an oops wasn't due to a protection keys violation isn't interesting since an oops on a keys violation is a one-in-a-million scenario. Remove the per-bit decoding of the error code and instead print: - the raw error code - why the fault occurred - the effective privilege level of the access - the type of access - whether the fault originated in user code or kernel code This provides the user with the information needed to triage 99.9% of oopses without polluting the log with useless information or conflating the error_code with the CPL. Sample output: BUG: kernel NULL pointer dereference, address = 0000000000000008 #PF: supervisor-privileged instruction fetch from kernel code #PF: error_code(0x0010) - not-present page BUG: unable to handle page fault for address = ffffbeef00000000 #PF: supervisor-privileged instruction fetch from kernel code #PF: error_code(0x0010) - not-present page BUG: unable to handle page fault for address = ffffc90000230000 #PF: supervisor-privileged write access from kernel code #PF: error_code(0x000b) - reserved bit violation [1] https://lkml.kernel.org/r/CAHk-=whk_fsnxVMvF1T2fFCaP2WrvSybABrLQCWLJyCvHw6NKA@mail.gmail.com Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/20181221213657.27628-3-sean.j.christopherson@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-19x86/fault: Reword initial BUG message for unhandled page faultsSean Christopherson1-3/+6
Reword the NULL pointer dereference case to simply state that a NULL pointer was dereferenced, i.e. drop "unable to handle" as that implies that there are instances where the kernel actual does handle NULL pointer dereferences, which is not true barring funky exception fixup. For the non-NULL case, replace "kernel paging request" with "page fault" as the kernel can technically oops on faults that originated in user code. Dropping "kernel" also allows future patches to provide detailed information on where the fault occurred, e.g. user vs. kernel, without conflicting with the initial BUG message. In both cases, replace "at address=" with wording more appropriate to the oops, as "at" may be interpreted as stating that the address is the RIP of the instruction that faulted. Last, and probably least, further qualify the NULL-pointer path by checking that the fault actually originated in kernel code. It's technically possible for userspace to map address 0, and not printing a super specific message is the least of our worries if the kernel does manage to oops on an actual NULL pointer dereference from userspace. Before: BUG: unable to handle kernel NULL pointer dereference at ffffbeef00000000 BUG: unable to handle kernel paging request at ffffbeef00000000 After: BUG: kernel NULL pointer dereference, address = 0000000000000008 BUG: unable to handle page fault for address = ffffbeef00000000 Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/20181221213657.27628-2-sean.j.christopherson@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-19x86/power: Optimize C3 entry on Centaur CPUsDavid Wang1-0/+12
For new Centaur CPUs the ucode will take care of the preservation of cache coherence between CPU cores in C-states regardless of how deep the C-states are. So, it is not necessary to flush the caches in software befor entering C3. This useless operation will cause performance drop for the cores which share some caches with the idling core. Signed-off-by: David Wang <davidwang@zhaoxin.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: brucechang@via-alliance.com Cc: cooperyan@zhaoxin.com Cc: len.brown@intel.com Cc: linux-pm@kernel.org Cc: qiyuanwang@zhaoxin.com Cc: rjw@rjwysocki.net Cc: timguo@zhaoxin.com Link: http://lkml.kernel.org/r/1545900110-2757-1-git-send-email-davidwang@zhaoxin.com [ Tidy up the comment. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-19x86/mce: Fix debugfs_simple_attr.cocci warningsYueHaibing1-4/+4
Use DEFINE_DEBUGFS_ATTRIBUTE() rather than DEFINE_SIMPLE_ATTRIBUTE() for debugfs files. Semantic patch information: Rationale: DEFINE_SIMPLE_ATTRIBUTE + debugfs_create_file() imposes some significant overhead as compared to DEFINE_DEBUGFS_ATTRIBUTE + debugfs_create_file_unsafe(). Generated by: scripts/coccinelle/api/debugfs/debugfs_simple_attr.cocci Signed-off-by: YueHaibing <yuehaibing@huawei.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Yazen Ghannam <yazen.ghannam@amd.com> Cc: kernel-janitors@vger.kernel.org Link: http://lkml.kernel.org/r/1545981853-70877-1-git-send-email-yuehaibing@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-19x86/cpu/intel: Lower the "ENERGY_PERF_BIAS: Set to normal" message's log ↵Hans de Goede1-2/+2
priority The "ENERGY_PERF_BIAS: Set to 'normal', was 'performance'" message triggers on pretty much every Intel machine. The purpose of log messages with a warning level is to notify the user of something which potentially is a problem, or at least somewhat unexpected. This message clearly does not match those criteria, so lower its log priority from warning to info. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20181230172715.17469-1-hdegoede@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-19x86/kvm: Make steal_time visibleAndi Kleen1-1/+1
This per cpu variable is accessed from assembler code, so it needs to be visible for LTO. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: pbonzini@redhat.com Link: https://lkml.kernel.org/r/20190330004743.29541-8-andi@firstfloor.org
2019-04-19x86/hyperv: Make hv_vcpu_is_preempted() visibleAndi Kleen1-1/+1
This function is referrenced from assembler, so it needs to be marked visible for LTO. Fixes: 3a025de64bf8 ("x86/hyperv: Enable PV qspinlock for Hyper-V") Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Yi Sun <yi.y.sun@linux.intel.com> Cc: kys@microsoft.com Cc: haiyangz@microsoft.com Link: https://lkml.kernel.org/r/20190330004743.29541-6-andi@firstfloor.org
2019-04-19x86/timer: Don't inline __const_udelay()Andi Kleen1-1/+1
LTO will happily inline __const_udelay() everywhere it is used. Forcing it noinline saves ~44k text in a LTO build. 13999560 1740864 1499136 17239560 1070e08 vmlinux-with-udelay-inline 13954764 1736768 1499136 17190668 1064f0c vmlinux-wo-udelay-inline Even without LTO this function should never be inlined. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190330004743.29541-4-andi@firstfloor.org
2019-04-19x86/cpu/amd: Exclude 32bit only assembler from 64bit buildAndi Kleen1-0/+2
The "vide" inline assembler is only needed on 32bit kernels for old 32bit only CPUs. Guard it with an #ifdef so it's not included in 64bit builds. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190330004743.29541-2-andi@firstfloor.org
2019-04-19x86/asm: Mark all top level asm statements as .textAndi Kleen3-1/+4
With gcc toplevel assembler statements that do not mark themselves as .text may end up in other sections. This causes LTO boot crashes because various assembler statements ended up in the middle of the initcall section. It's also a latent problem without LTO, although it's currently not known to cause any real problems. According to the gcc team it's expected behavior. Always mark all the top level assembler statements as text so that they switch to the right section. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190330004743.29541-1-andi@firstfloor.org
2019-04-19x86/cpu/bugs: Use __initconst for 'const' init dataAndi Kleen1-3/+3
Some of the recently added const tables use __initdata which causes section attribute conflicts. Use __initconst instead. Fixes: fa1202ef2243 ("x86/speculation: Add command line control") Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20190330004743.29541-9-andi@firstfloor.org