path: root/arch/x86/kernel
Age  Commit message  (Author, files changed, lines -/+)
2020-10-07  x86/mce: Decode a kernel instruction to determine if it is copying from user  (Tony Luck, 2 files, -4/+60)
All instructions copying data between kernel and user memory are tagged with either _ASM_EXTABLE_UA or _ASM_EXTABLE_CPY entries in the exception table. ex_fault_handler_type() returns EX_HANDLER_UACCESS for both of these. Recovery is only possible when the machine check was triggered on a read from user memory. In this case the same strategy for recovery applies as if the user had made the access in ring3. If the fault was in kernel memory while copying to user there is no current recovery plan. For MOV and MOVZ instructions a full decode of the instruction is done to find the source address. For MOVS instructions the source address is in the %rsi register. The function fault_in_kernel_space() determines whether the source address is kernel or user, upgrade it from "static" so it can be used here. Co-developed-by: Youquan Song <youquan.song@intel.com> Signed-off-by: Youquan Song <youquan.song@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20201006210910.21062-7-tony.luck@intel.com
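A minimal sketch of that decision flow, in the spirit of the description above rather than the literal patch; the helper name is_copy_from_user() and the exact decoder calls are assumptions here, while the MOV/MOVZ/MOVS cases, %rsi, and fault_in_kernel_space() come from the text:

    /*
     * Sketch only: decide whether a machine check hit while the kernel
     * was reading from user memory.  is_copy_from_user() is a
     * hypothetical name used for illustration.
     */
    static bool is_copy_from_user(struct pt_regs *regs)
    {
            u8 insn_buf[MAX_INSN_SIZE];
            struct insn insn;
            unsigned long addr;

            /* Fetch and decode the faulting kernel instruction. */
            if (copy_from_kernel_nofault(insn_buf, (void *)regs->ip, MAX_INSN_SIZE))
                    return false;

            kernel_insn_init(&insn, insn_buf, MAX_INSN_SIZE);
            insn_get_opcode(&insn);

            switch (insn.opcode.value) {
            case 0x8A: case 0x8B:           /* MOV mem, reg */
            case 0xB60F: case 0xB70F:       /* MOVZ mem, reg */
                    /* Full decode to find the source (memory) operand. */
                    addr = (unsigned long)insn_get_addr_ref(&insn, regs);
                    break;
            case 0xA4: case 0xA5:           /* MOVS: source address is in %rsi */
                    addr = regs->si;
                    break;
            default:
                    return false;
            }

            /* Recovery is only possible if the copy source was user memory. */
            return !fault_in_kernel_space(addr);
    }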
2020-10-07  x86/mce: Recover from poison found while copying from user space  (Tony Luck, 1 file, -7/+20)
Existing kernel code can only recover from a machine check on code that is tagged in the exception table with a fault handling recovery path. Add two new fields in the task structure to pass information from machine check handler to the "task_work" that is queued to run before the task returns to user mode: + mce_vaddr: will be initialized to the user virtual address of the fault in the case where the fault occurred in the kernel copying data from a user address. This is so that kill_me_maybe() can provide that information to the user SIGBUS handler. + mce_kflags: copy of the struct mce.kflags needed by kill_me_maybe() to determine if mce_vaddr is applicable to this error. Add code to recover from a machine check while copying data from user space to the kernel. Action for this case is the same as if the user touched the poison directly; unmap the page and send a SIGBUS to the task. Use a new helper function to share common code between the "fault in user mode" case and the "fault while copying from user" case. New code paths will be activated by the next patch which sets MCE_IN_KERNEL_COPYIN. Suggested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20201006210910.21062-6-tony.luck@intel.com
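Sketched in kernel style, the two new members amount to something like the following (placement, the config guard, and the exact types are illustrative assumptions):

    struct task_struct {
            /* ... existing members ... */
    #ifdef CONFIG_X86_MCE
            void __user     *mce_vaddr;     /* user address of the poisoned copy source */
            __u64           mce_kflags;     /* copy of struct mce.kflags for kill_me_maybe() */
    #endif
            /* ... */
    };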
2020-10-07  x86/mce: Provide method to find out the type of an exception handler  (Tony Luck, 1 file, -1/+4)
Avoid a proliferation of ex_has_*_handler() functions by having just one function that returns the type of the handler (if any). Drop the __visible attribute for this function. It is not called from assembler so the attribute is not necessary. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20201006210910.21062-3-tony.luck@intel.com
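Roughly, the single query looks like the sketch below; EX_HANDLER_UACCESS and ex_fault_handler_type() are named in this series, the other enumerators and the exact prototype are assumptions for illustration:

    enum handler_type {
            EX_HANDLER_NONE,        /* no exception table entry for this IP */
            EX_HANDLER_FAULT,       /* plain fault fixup */
            EX_HANDLER_UACCESS,     /* user-access fixup (_ASM_EXTABLE_UA / _ASM_EXTABLE_CPY) */
            EX_HANDLER_OTHER,       /* any other handler */
    };

    enum handler_type ex_fault_handler_type(unsigned long ip);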
2020-10-07  x86/mce: Pass pointer to saved pt_regs to severity calculation routines  (Youquan Song, 3 files, -14/+17)
New recovery features require additional information about processor state when a machine check occurred. Pass pt_regs down to the routines that need it. No functional change. Signed-off-by: Youquan Song <youquan.song@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20201006210910.21062-2-tony.luck@intel.com
2020-10-07  x86/platform/uv: Update Copyrights to conform to HPE standards  (Mike Travis, 1 file, -0/+1)
Add Copyrights to those files that have been updated for UV5 changes. Signed-off-by: Mike Travis <mike.travis@hpe.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20201005203929.148656-14-mike.travis@hpe.com
2020-10-07  x86/platform/uv: Update UV5 TSC checking  (Mike Travis, 1 file, -14/+10)
Update check of BIOS TSC sync status to include both possible "invalid" states provided by newer UV5 BIOS. Signed-off-by: Mike Travis <mike.travis@hpe.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Steve Wahl <steve.wahl@hpe.com> Link: https://lkml.kernel.org/r/20201005203929.148656-12-mike.travis@hpe.com
2020-10-07  x86/platform/uv: Update node present counting  (Mike Travis, 1 file, -7/+19)
The changes in the UV5 arch shrunk the NODE PRESENT table to just 2x64 entries (128 total), so it now lives in two 64-bit MMRs instead of an array 64 bits deep. Adjust the references when counting up the nodes present. Signed-off-by: Mike Travis <mike.travis@hpe.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Dimitri Sivanich <dimitri.sivanich@hpe.com> Reviewed-by: Steve Wahl <steve.wahl@hpe.com> Link: https://lkml.kernel.org/r/20201005203929.148656-11-mike.travis@hpe.com
2020-10-07  x86/platform/uv: Update UV5 MMR references in UV GRU  (Mike Travis, 1 file, -6/+24)
Make modifications to the GRU mappings to accommodate changes for UV5. Signed-off-by: Mike Travis <mike.travis@hpe.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Dimitri Sivanich <dimitri.sivanich@hpe.com> Reviewed-by: Steve Wahl <steve.wahl@hpe.com> Link: https://lkml.kernel.org/r/20201005203929.148656-10-mike.travis@hpe.com
2020-10-07  x86/platform/uv: Adjust GAM MMR references affected by UV5 updates  (Mike Travis, 1 file, -5/+25)
Make modifications to the GAM MMR mappings to accommodate changes for UV5. Signed-off-by: Mike Travis <mike.travis@hpe.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Dimitri Sivanich <dimitri.sivanich@hpe.com> Reviewed-by: Steve Wahl <steve.wahl@hpe.com> Link: https://lkml.kernel.org/r/20201005203929.148656-9-mike.travis@hpe.com
2020-10-07  x86/platform/uv: Update MMIOH references based on new UV5 MMRs  (Mike Travis, 1 file, -68/+144)
Make modifications to the MMIOH mappings to accommodate changes for UV5. [ Fix W=1 build warnings. ] Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Mike Travis <mike.travis@hpe.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Steve Wahl <steve.wahl@hpe.com> Link: https://lkml.kernel.org/r/20201005203929.148656-8-mike.travis@hpe.com
2020-10-07  x86/platform/uv: Add and decode Arch Type in UVsystab  (Mike Travis, 1 file, -19/+116)
When the UV BIOS starts the kernel it passes the UVsystab info struct to the kernel which contains information elements more specific than ACPI, and generally pertinent only to the MMRs. These are read only fields so information is passed one way only. A new field starting with UV5 is the UV architecture type so the ACPI OEM_ID field can be used for other purposes going forward. The UV Arch Type selects the entirety of the MMRs available, with their addresses and fields defined in uv_mmrs.h. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Mike Travis <mike.travis@hpe.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Dimitri Sivanich <dimitri.sivanich@hpe.com> Reviewed-by: Steve Wahl <steve.wahl@hpe.com> Link: https://lkml.kernel.org/r/20201005203929.148656-7-mike.travis@hpe.com
2020-10-07  x86/platform/uv: Add UV5 direct references  (Mike Travis, 1 file, -27/+70)
Add new references to UV5 (and UVY class) system MMR addresses and fields primarily caused by the expansion from 46 to 52 bits of physical memory address. Signed-off-by: Mike Travis <mike.travis@hpe.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Dimitri Sivanich <dimitri.sivanich@hpe.com> Reviewed-by: Steve Wahl <steve.wahl@hpe.com> Link: https://lkml.kernel.org/r/20201005203929.148656-6-mike.travis@hpe.com
2020-10-07  x86/platform/uv: Update UV MMRs for UV5  (Mike Travis, 1 file, -112/+155)
Update UV MMRs in uv_mmrs.h for UV5 based on Verilog output from the UV Hub hardware design files. This is the next UV architecture with a new class (UVY) being defined for 52 bit physical address masks. Uses a bitmask for UV arch identification so a single test can cover multiple versions. Includes other adjustments to match the uv_mmrs.h file to keep from encountering compile errors. New UV5 functionality is added in the patches that follow. [ Fix W=1 build warnings. ] Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Mike Travis <mike.travis@hpe.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Steve Wahl <steve.wahl@hpe.com> Link: https://lkml.kernel.org/r/20201005203929.148656-5-mike.travis@hpe.com
2020-10-07  x86/platform/uv: Remove SCIR MMR references for UV systems  (Mike Travis, 1 file, -82/+0)
UV class systems no longer use System Controller for monitoring of CPU activity provided by this driver. Other methods have been developed for BIOS and the management controller (BMC). Remove that supporting code. Signed-off-by: Mike Travis <mike.travis@hpe.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Dimitri Sivanich <dimitri.sivanich@hpe.com> Link: https://lkml.kernel.org/r/20201005203929.148656-3-mike.travis@hpe.com
2020-10-07  x86/platform/uv: Remove UV BAU TLB Shootdown Handler  (Mike Travis, 1 file, -3/+0)
The Broadcast Assist Unit (BAU) TLB shootdown handler is being rewritten to become the UV BAU APIC driver. It is designed to speed up sending IPIs to selective CPUs within the system. Remove the current TLB shootdown handler (tlb_uv.c) file and a couple of kernel hooks in the interim. Signed-off-by: Mike Travis <mike.travis@hpe.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Dimitri Sivanich <dimitri.sivanich@hpe.com> Link: https://lkml.kernel.org/r/20201005203929.148656-2-mike.travis@hpe.com
2020-10-06  x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user, kernel}()  (Dan Williams, 2 files, -13/+5)
In reaction to a proposal to introduce a memcpy_mcsafe_fast() implementation Linus points out that memcpy_mcsafe() is poorly named relative to communicating the scope of the interface. Specifically what addresses are valid to pass as source, destination, and what faults / exceptions are handled. Of particular concern is that even though x86 might be able to handle the semantics of copy_mc_to_user() with its common copy_user_generic() implementation other archs likely need / want an explicit path for this case: On Fri, May 1, 2020 at 11:28 AM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > On Thu, Apr 30, 2020 at 6:21 PM Dan Williams <dan.j.williams@intel.com> wrote: > > > > However now I see that copy_user_generic() works for the wrong reason. > > It works because the exception on the source address due to poison > > looks no different than a write fault on the user address to the > > caller, it's still just a short copy. So it makes copy_to_user() work > > for the wrong reason relative to the name. > > Right. > > And it won't work that way on other architectures. On x86, we have a > generic function that can take faults on either side, and we use it > for both cases (and for the "in_user" case too), but that's an > artifact of the architecture oddity. > > In fact, it's probably wrong even on x86 - because it can hide bugs - > but writing those things is painful enough that everybody prefers > having just one function. Replace a single top-level memcpy_mcsafe() with either copy_mc_to_user(), or copy_mc_to_kernel(). Introduce an x86 copy_mc_fragile() name as the rename for the low-level x86 implementation formerly named memcpy_mcsafe(). It is used as the slow / careful backend that is supplanted by a fast copy_mc_generic() in a follow-on patch. One side-effect of this reorganization is that separating copy_mc_64.S to its own file means that perf no longer needs to track dependencies for its memcpy_64.S benchmarks. [ bp: Massage a bit. ] Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: <stable@vger.kernel.org> Link: http://lore.kernel.org/r/CAHk-=wjSqtXAqfUJxFtWNwmguFASTgB0dz1dT3V-78Quiezqbg@mail.gmail.com Link: https://lkml.kernel.org/r/160195561680.2163339.11574962055305783722.stgit@dwillia2-desk3.amr.corp.intel.com
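The renamed interfaces end up with prototypes along these lines (a sketch; the __must_check/__user annotations and the size type are assumptions here):

    unsigned long __must_check
    copy_mc_to_kernel(void *dst, const void *src, size_t len);

    unsigned long __must_check
    copy_mc_to_user(void __user *dst, const void *src, size_t len);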
2020-10-06  dma-mapping: move dma-debug.h to kernel/dma/  (Christoph Hellwig, 1 file, -1/+0)
Most of dma-debug.h is not required by anything outside of kernel/dma. Move the four declarations needed by dma-mapping.h or dma-ops providers into dma-mapping.h and dma-map-ops.h, and move the remainder of the file to kernel/dma/debug.h. Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-06  dma-mapping: merge <linux/dma-contiguous.h> into <linux/dma-map-ops.h>  (Christoph Hellwig, 1 file, -1/+1)
Merge dma-contiguous.h into dma-map-ops.h, after moving the comment describing the contiguous allocator into kernel/dma/contiguous.c. Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-06  dma-mapping: split <linux/dma-mapping.h>  (Christoph Hellwig, 3 files, -0/+4)
Split out all the bits that are purely for dma_map_ops implementations and related code into a new <linux/dma-map-ops.h> header so that they don't get pulled into all the drivers. That also means the architecture-specific <asm/dma-mapping.h> is no longer pulled in by <linux/dma-mapping.h>, which leads to missing includes that were previously pulled in by the x86 or arm versions in a few not overly portable drivers. Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-02  x86: Support Generic Initiator only proximity domains  (Jonathan Cameron, 1 file, -0/+1)
In common with memoryless domains, only register GI domains if the proximity node is not online. If a domain is already a memory-containing domain, or a memoryless domain, there is nothing to do just because it also contains a Generic Initiator. Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-10-02  Merge branch 'for-next/mte' into for-next/core  (Will Deacon, 1 file, -1/+1)
Add userspace support for the Memory Tagging Extension introduced by Armv8.5. (Catalin Marinas and others) * for-next/mte: (30 commits) arm64: mte: Fix typo in memory tagging ABI documentation arm64: mte: Add Memory Tagging Extension documentation arm64: mte: Kconfig entry arm64: mte: Save tags when hibernating arm64: mte: Enable swap of tagged pages mm: Add arch hooks for saving/restoring tags fs: Handle intra-page faults in copy_mount_options() arm64: mte: ptrace: Add NT_ARM_TAGGED_ADDR_CTRL regset arm64: mte: ptrace: Add PTRACE_{PEEK,POKE}MTETAGS support arm64: mte: Allow {set,get}_tagged_addr_ctrl() on non-current tasks arm64: mte: Restore the GCR_EL1 register after a suspend arm64: mte: Allow user control of the generated random tags via prctl() arm64: mte: Allow user control of the tag check mode via prctl() mm: Allow arm64 mmap(PROT_MTE) on RAM-based files arm64: mte: Validate the PROT_MTE request via arch_validate_flags() mm: Introduce arch_validate_flags() arm64: mte: Add PROT_MTE support to mmap() and mprotect() mm: Introduce arch_calc_vm_flag_bits() arm64: mte: Tags-aware aware memcmp_pages() implementation arm64: Avoid unnecessary clear_user_page() indirection ...
2020-10-02  x86/dumpstack: Fix misleading instruction pointer error message  (Mark Mossberg, 1 file, -1/+2)
Printing "Bad RIP value" if copy_code() fails can be misleading for userspace pointers, since copy_code() can fail if the instruction pointer is valid but the code is paged out. This is because copy_code() calls copy_from_user_nmi() for userspace pointers, which disables page fault handling. This is reproducible in OOM situations, where it's plausible that the code may be reclaimed in the time between entry into the kernel and when this message is printed. This leaves a misleading log in dmesg that suggests instruction pointer corruption has occurred, which may alarm users. Change the message to state the error condition more precisely. [ bp: Massage a bit. ] Signed-off-by: Mark Mossberg <mark.mossberg@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20201002042915.403558-1-mark.mossberg@gmail.com
2020-10-01  x86/nmi: Fix nmi_handle() duration miscalculation  (Libing Zhou, 1 file, -3/+2)
When nmi_check_duration() is checking the time an NMI handler took to execute, the whole_msecs value used should be read from the @duration argument, not from the ->max_duration, the latter being used to store the current maximal duration. [ bp: Rewrite commit message. ] Fixes: 248ed51048c4 ("x86/nmi: Remove irq_work from the long duration NMI handler") Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Libing Zhou <libing.zhou@nokia-sbell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Changbin Du <changbin.du@gmail.com> Link: https://lkml.kernel.org/r/20200820025641.44075-1-libing.zhou@nokia-sbell.com
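A sketch of the corrected helper, assuming the structure introduced by the commit referenced in the Fixes: tag; the essential change is that the reported time is derived from the @duration argument that was just measured:

    static void nmi_check_duration(struct nmiaction *action, u64 duration)
    {
            int remainder_ns, decimal_msecs;
            u64 whole_msecs = duration;     /* previously read from action->max_duration */

            if (duration < nmi_longest_ns || duration < action->max_duration)
                    return;

            action->max_duration = duration;

            remainder_ns = do_div(whole_msecs, (1000 * 1000));
            decimal_msecs = remainder_ns / 1000;

            printk_ratelimited(KERN_INFO
                    "INFO: NMI handler (%ps) took too long to run: %lld.%03d msecs\n",
                    action->handler, whole_msecs, decimal_msecs);
    }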
2020-10-01  x86/asm: Replace __force_order with a memory clobber  (Arvind Sankar, 1 file, -2/+2)
The CRn accessor functions use __force_order as a dummy operand to prevent the compiler from reordering CRn reads/writes with respect to each other. The fact that the asm is volatile should be enough to prevent this: volatile asm statements should be executed in program order. However GCC 4.9.x and 5.x have a bug that might result in reordering. This was fixed in 8.1, 7.3 and 6.5. Versions prior to these, including 5.x and 4.9.x, may reorder volatile asm statements with respect to each other. There are some issues with __force_order as implemented: - It is used only as an input operand for the write functions, and hence doesn't do anything additional to prevent reordering writes. - It allows memory accesses to be cached/reordered across write functions, but CRn writes affect the semantics of memory accesses, so this could be dangerous. - __force_order is not actually defined in the kernel proper, but the LLVM toolchain can in some cases require a definition: LLVM (as well as GCC 4.9) requires it for PIE code, which is why the compressed kernel has a definition, but also the clang integrated assembler may consider the address of __force_order to be significant, resulting in a reference that requires a definition. Fix this by: - Using a memory clobber for the write functions to additionally prevent caching/reordering memory accesses across CRn writes. - Using a dummy input operand with an arbitrary constant address for the read functions, instead of a global variable. This will prevent reads from being reordered across writes, while allowing memory loads to be cached/reordered across CRn reads, which should be safe. Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Tested-by: Nathan Chancellor <natechancellor@gmail.com> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82602 Link: https://lore.kernel.org/lkml/20200527135329.1172644-1-arnd@arndb.de/ Link: https://lkml.kernel.org/r/20200902232152.3709896-1-nivedita@alum.mit.edu
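The resulting accessors look roughly like the sketch below; the __FORCE_ORDER macro name and the arbitrary 0x1000 address are illustrative, while the approach (dummy constant-address input for reads, memory clobber for writes) is the one described above:

    /* Dummy input at an arbitrary constant address: keeps CRn reads ordered
     * after CRn writes without forcing all memory accesses to be reloaded
     * around every read. */
    #define __FORCE_ORDER "m"(*(unsigned int *)0x1000UL)

    static inline unsigned long native_read_cr0(void)
    {
            unsigned long val;

            asm volatile("mov %%cr0,%0\n\t" : "=r" (val) : __FORCE_ORDER);
            return val;
    }

    /* CRn writes change the semantics of memory accesses, so they get a full
     * memory clobber: nothing may be cached or reordered across them. */
    static inline void native_write_cr0(unsigned long val)
    {
            asm volatile("mov %0,%%cr0" : : "r" (val) : "memory");
    }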
2020-09-30  x86/mce: Use idtentry_nmi_enter/exit()  (Thomas Gleixner, 1 file, -2/+4)
The recent fix for NMI vs. IRQ state tracking missed to apply the cure to the MCE handler. Fixes: ba1f2b2eaa2a ("x86/entry: Fix NMI vs IRQ state tracking") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/87mu17ism2.fsf@nanos.tec.linutronix.de
2020-09-30  x86/mce: Drop AMD-specific "DEFERRED" case from Intel severity rule list  (Tony Luck, 1 file, -4/+0)
Way back in v3.19 Intel and AMD shared the same machine check severity grading code. So it made sense to add a case for AMD DEFERRED errors in commit e3480271f592 ("x86, mce, severity: Extend the the mce_severity mechanism to handle UCNA/DEFERRED error") But later in v4.2 AMD switched to a separate grading function in commit bf80bbd7dcf5 ("x86/mce: Add an AMD severities-grading function") Belatedly drop the DEFERRED case from the Intel rule list. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200930021313.31810-3-tony.luck@intel.com
2020-09-30  x86/mce: Add Skylake quirk for patrol scrub reported errors  (Borislav Petkov, 1 file, -2/+26)
The patrol scrubber in Skylake and Cascade Lake systems can be configured to report uncorrected errors using a special signature in the machine check bank and to signal using CMCI instead of machine check. Update the severity calculation mechanism to allow specifying the model, minimum stepping and range of machine check bank numbers. Add a new rule to detect the special signature (on model 0x55, stepping >=4 in any of the memory controller banks). [ bp: Rewrite it. aegl: Productize it. ] Suggested-by: Youquan Song <youquan.song@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Co-developed-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200930021313.31810-2-tony.luck@intel.com
2020-09-28  cpuidle-haltpoll: fix error comments in arch_haltpoll_disable  (Li Qiang, 1 file, -1/+1)
The 'arch_haltpoll_disable' is used to disable guest halt poll. Correct the comments. Fixes: a1c4423b02b21 ("cpuidle-haltpoll: disable host side polling when kvm virtualized") Signed-off-by: Li Qiang <liq3ea@163.com> Message-Id: <20200924155800.4939-1-liq3ea@163.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28  x86/hyperv: Remove aliases with X64 in their name  (Joseph Salisbury, 1 file, -4/+4)
In the architecture independent version of hyperv-tlfs.h, commit c55a844f46f958b removed the "X64" in the symbol names so they would make sense for both x86 and ARM64. That commit added aliases with the "X64" in the x86 version of hyperv-tlfs.h so that existing x86 code would continue to compile. As a cleanup, update the x86 code to use the symbols without the "X64", then remove the aliases. There's no functional change. Signed-off-by: Joseph Salisbury <joseph.salisbury@microsoft.com> Link: https://lore.kernel.org/r/1601130386-11111-1-git-send-email-jsalisbury@linux.microsoft.com Reviewed-by: Michael Kelley <mikelley@microsoft.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2020-09-27  x86/apic/msi: Unbreak DMAR and HPET MSI  (Thomas Gleixner, 1 file, -0/+2)
Switching the DMAR and HPET MSI code to use the generic MSI domain ops missed to add the flag which tells the core code to update the domain operations with the defaults. As a consequence the core code crashes when an interrupt in one of those domains is allocated. Add the missing flags. Fixes: 9006c133a422 ("x86/msi: Use generic MSI domain ops") Reported-by: Qian Cai <cai@redhat.com> Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/87wo0fli8b.fsf@nanos.tec.linutronix.de
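The missing piece is roughly the following (a sketch; the surrounding initializers are elided and the variable name is illustrative):

    static struct msi_domain_info dmar_msi_domain_info = {
            /* ... ops/chip initializers elided ... */
            .flags = MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS,
    };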
2020-09-27  x86/hyperv: Remove aliases with X64 in their name  (Joseph Salisbury, 1 file, -4/+4)
In the architecture independent version of hyperv-tlfs.h, commit c55a844f46f958b removed the "X64" in the symbol names so they would make sense for both x86 and ARM64. That commit added aliases with the "X64" in the x86 version of hyperv-tlfs.h so that existing x86 code would continue to compile. As a cleanup, update the x86 code to use the symbols without the "X64", then remove the aliases. There's no functional change. Signed-off-by: Joseph Salisbury <joseph.salisbury@microsoft.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/1601130386-11111-1-git-send-email-jsalisbury@linux.microsoft.com
2020-09-25  x86/sev-es: Use GHCB accessor for setting the MMIO scratch buffer  (Tom Lendacky, 1 file, -1/+1)
Use ghcb_set_sw_scratch() to set the GHCB scratch field, which will also set the corresponding bit in the GHCB valid_bitmap field to denote that sw_scratch is actually valid. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Joerg Roedel <jroedel@suse.de> Link: https://lkml.kernel.org/r/ba84deabdf44a7a880454fb351d189c6ad79d4ba.1601041106.git.thomas.lendacky@amd.com
2020-09-25  dma-mapping: add a new dma_alloc_pages API  (Christoph Hellwig, 1 file, -0/+2)
This API is the equivalent of alloc_pages, except that the returned memory is guaranteed to be DMA addressable by the passed in device. The implementation will also be used to provide a more sensible replacement for DMA_ATTR_NON_CONSISTENT flag. Additionally dma_alloc_noncoherent is switched over to use dma_alloc_pages as its backend. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> (MIPS part)
2020-09-25  Merge branch 'master' of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux into dma-mapping-for-next  (Christoph Hellwig, 3 files, -38/+34)
Pull in the latest 5.9 tree for the commit to revert the V4L2_FLAG_MEMORY_NON_CONSISTENT uapi addition.
2020-09-23  x86/ioapic: Unbreak check_timer()  (Thomas Gleixner, 1 file, -0/+1)
Several people reported in the kernel bugzilla that between v4.12 and v4.13 the magic which works around broken hardware and BIOSes to find the proper timer interrupt delivery mode stopped working for some older affected platforms which need to fall back to ExtINT delivery mode. The reason is that the core code changed to keep track of the masked and disabled state of an interrupt line more accurately to avoid the expensive hardware operations. That broke an assumption in i8259_make_irq() which invokes disable_irq_nosync(); irq_set_chip_and_handler(); enable_irq(); Up to v4.12 this worked because enable_irq() unconditionally unmasked the interrupt line, but after the state tracking improvements this is no longer the case because the IO/APIC uses lazy disabling. So the line state is unmasked which means that enable_irq() does not call into the new irq chip to unmask it. In principle this is a shortcoming of the core code, but it's more than unclear whether the core code should try to reset state. At least this cannot be done unconditionally as that would break other existing use cases where the chip type is changed, e.g. when changing the trigger type, but the callers expect the state to be preserved. As the way check_timer() switches the delivery modes is truly unique, the obvious fix is to simply unmask the i8259 manually after changing the mode to ExtINT delivery and switching the irq chip to the legacy PIC. Note that the fixes tag is not really precise, but identifies the commit which broke the assumptions in the IO/APIC and i8259 code and that's the kernel version to which this needs to be backported. Fixes: bf22ff45bed6 ("genirq: Avoid unnecessary low level irq function calls") Reported-by: p_c_chan@hotmail.com Reported-by: ecm4@mail.com Reported-by: perdigao1@yahoo.com Reported-by: matzes@users.sourceforge.net Reported-by: rvelascog@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: p_c_chan@hotmail.com Tested-by: matzes@users.sourceforge.net Cc: stable@vger.kernel.org Link: https://bugzilla.kernel.org/show_bug.cgi?id=197769
2020-09-22  x86/irq: Make run_on_irqstack_cond() typesafe  (Thomas Gleixner, 2 files, -2/+2)
Sami reported that run_on_irqstack_cond() requires the caller to cast functions to mismatching types, which trips indirect call Control-Flow Integrity (CFI) in Clang. Instead of disabling CFI on that function, provide proper helpers for the three call variants. The actual ASM code stays the same as that is out of reach. [ bp: Fix __run_on_irqstack() prototype to match. ] Fixes: 931b94145981 ("x86/entry: Provide helpers for executing on the irqstack") Reported-by: Nathan Chancellor <natechancellor@gmail.com> Reported-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Sami Tolvanen <samitolvanen@google.com> Cc: <stable@vger.kernel.org> Link: https://github.com/ClangBuiltLinux/linux/issues/1052 Link: https://lkml.kernel.org/r/87pn6eb5tv.fsf@nanos.tec.linutronix.de
2020-09-22  Merge branch 'x86-seves-for-paolo' of https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into HEAD  (Paolo Bonzini, 19 files, -82/+105)
2020-09-22  x86/fpu: Handle FPU-related and clearcpuid command line arguments earlier  (Mike Hommey, 2 files, -55/+55)
The FPU-related and clearcpuid= command line arguments are currently handled by the FPU initialization code. However, in the case of clearcpuid=, some other early initialization code may check for features before the FPU initialization code is called. Handling the argument earlier allows the command line to influence those early initializations. Signed-off-by: Mike Hommey <mh@glandium.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200921215638.37980-1-mh@glandium.org
2020-09-21  Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm  (Linus Torvalds, 1 file, -19/+3)
Pull kvm fixes from Paolo Bonzini: "ARM: - fix fault on page table writes during instruction fetch s390: - doc improvement x86: - The obvious patches are always the ones that turn out to be completely broken. /me hangs his head in shame" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: Revert "KVM: Check the allocation of pv cpu mask" KVM: arm64: Remove S1PTW check from kvm_vcpu_dabt_iswrite() KVM: arm64: Assume write fault on S1PTW permission fault on instruction fetch docs: kvm: add documentation for KVM_CAP_S390_DIAG318
2020-09-21  Merge tag 'x86_urgent_for_v5.9_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds, 1 file, -1/+2)
Pull x86 fixes from Borislav Petkov: - A defconfig fix (Daniel Díaz) - Disable relocation relaxation for the compressed kernel when not built as -pie as in that case kernels built with clang and linked with LLD fail to boot due to the linker optimizing some instructions in non-PIE form; the gory details in the commit message (Arvind Sankar) - A fix for the "bad bp value" warning issued by the frame-pointer unwinder (Josh Poimboeuf) * tag 'x86_urgent_for_v5.9_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/unwind/fp: Fix FP unwinding in ret_from_fork x86/boot/compressed: Disable relocation relaxation x86/defconfigs: Explicitly unset CONFIG_64BIT in i386_defconfig
2020-09-21  Revert "KVM: Check the allocation of pv cpu mask"  (Vitaly Kuznetsov, 1 file, -19/+3)
The commit 0f990222108d ("KVM: Check the allocation of pv cpu mask") we have in 5.9-rc5 has two issues: 1) Compilation fails for !CONFIG_SMP, see: https://bugzilla.kernel.org/show_bug.cgi?id=209285 2) This commit completely disables PV TLB flush, see https://lore.kernel.org/kvm/87y2lrnnyf.fsf@vitty.brq.redhat.com/ The allocation problem is likely a theoretical one: if we don't have memory that early in the boot process, we're likely doomed anyway. Let's solve it properly later. This reverts commit 0f990222108d214a0924d920e6095b58107d7b59. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-18  stacktrace: Remove reliable argument from arch_stack_walk() callback  (Mark Brown, 1 file, -5/+5)
Currently the callback passed to arch_stack_walk() has an argument called reliable passed to it to indicate if the stack entry is reliable, a comment says that this is used by some printk() consumers. However in the current kernel none of the arch_stack_walk() implementations ever set this flag to true and the only callback implementation we have is in the generic stacktrace code which ignores the flag. It therefore appears that this flag is redundant so we can simplify and clarify things by removing it. Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Miroslav Benes <mbenes@suse.cz> Link: https://lore.kernel.org/r/20200914153409.25097-2-broonie@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
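After the change, the consumer callback reduces to the shape sketched below (the typedef name follows the generic stacktrace code and should be treated as an assumption):

    typedef bool (*stack_trace_consume_fn)(void *cookie, unsigned long addr);

    void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
                         struct task_struct *task, struct pt_regs *regs);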
2020-09-18  x86/mce: Annotate mce_rd/wrmsrl() with noinstr  (Borislav Petkov, 1 file, -6/+21)
They do get called from the #MC handler which is already marked "noinstr". Commit e2def7d49d08 ("x86/mce: Make mce_rdmsrl() panic on an inaccessible MSR") already got rid of the instrumentation in the MSR accessors, fix the annotation now too, in order to get rid of: vmlinux.o: warning: objtool: do_machine_check()+0x4a: call to mce_rdmsrl() leaves .noinstr.text section Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200915194020.28807-1-bp@alien8.de
2020-09-18  x86/cpu: Add hardware-enforced cache coherency as a CPUID feature  (Krish Sadhukhan, 1 file, -0/+1)
In some hardware implementations, coherency between the encrypted and unencrypted mappings of the same physical page is enforced. In such a system, it is not required for software to flush the page from all CPU caches in the system prior to changing the value of the C-bit for a page. This hardware-enforced cache coherency is indicated by EAX[10] in CPUID leaf 0x8000001f. [ bp: Use one of the free slots in word 3. ] Suggested-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200917212038.5090-2-krish.sadhukhan@oracle.com
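Detection boils down to a single CPUID bit, roughly as sketched below (the function name is hypothetical; the leaf and bit are the ones named above):

    /* Sketch: hardware-enforced cache coherency for encrypted/unencrypted
     * mappings of the same page, per CPUID leaf 0x8000001f, EAX[10]. */
    static bool sme_hw_cache_coherent(void)
    {
            unsigned int eax, ebx, ecx, edx;

            cpuid(0x8000001f, &eax, &ebx, &ecx, &edx);
            return eax & BIT(10);
    }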
2020-09-18  x86/unwind/fp: Fix FP unwinding in ret_from_fork  (Josh Poimboeuf, 1 file, -1/+2)
There have been some reports of "bad bp value" warnings printed by the frame pointer unwinder: WARNING: kernel stack regs at 000000005bac7112 in sh:1014 has bad 'bp' value 0000000000000000 This warning happens when unwinding from an interrupt in ret_from_fork(). If entry code gets interrupted, the state of the frame pointer (rbp) may be undefined, which can confuse the unwinder, resulting in warnings like the above. There's an in_entry_code() check which normally silences such warnings for entry code. But in this case, ret_from_fork() is getting interrupted. It recently got moved out of .entry.text, so the in_entry_code() check no longer works. It could be moved back into .entry.text, but that would break the noinstr validation because of the call to schedule_tail(). Instead, initialize each new task's RBP to point to the task's entry regs via an encoded frame pointer. That will allow the unwinder to reach the end of the stack gracefully. Fixes: b9f6976bfb94 ("x86/entry/64: Move non entry code into .text section") Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Reported-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/f366bbf5a8d02e2318ee312f738112d0af74d16f.1600103007.git.jpoimboe@redhat.com
2020-09-17  x86/mmu: Allocate/free a PASID  (Fenghua Yu, 1 file, -0/+57)
A PASID is allocated for an "mm" the first time any thread binds to an SVA-capable device and is freed from the "mm" when the SVA is unbound by the last thread. It's possible for the "mm" to have different PASID values in different binding/unbinding SVA cycles. The mm's PASID (non-zero for valid PASID or 0 for invalid PASID) is propagated to a per-thread PASID MSR for all threads within the mm through IPI, context switch, or inherited. This is done to ensure that a running thread has the right PASID in the MSR matching the mm's PASID. [ bp: s/SVM/SVA/g; massage. ] Suggested-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/1600187413-163670-10-git-send-email-fenghua.yu@intel.com
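The propagation step amounts to something like the sketch below on context switch or IPI (the helper and the MSR constant names are assumptions; an mm PASID of 0 means "no valid PASID", as described above):

    static void update_pasid_msr(struct mm_struct *mm)
    {
            if (mm->pasid)
                    wrmsrl(MSR_IA32_PASID, mm->pasid | MSR_IA32_PASID_VALID);
            else
                    wrmsrl(MSR_IA32_PASID, 0);      /* invalid PASID */
    }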
2020-09-17  x86/fpu/xstate: Add supervisor PASID state for ENQCMD  (Yu-cheng Yu, 1 file, -1/+5)
The ENQCMD instruction reads a PASID from the IA32_PASID MSR. The MSR is stored in the task's supervisor XSAVE* PASID state and is context-switched by XSAVES/XRSTORS. [ bp: Add (in-)definite articles and massage. ] Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Co-developed-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/1600187413-163670-6-git-send-email-fenghua.yu@intel.com
2020-09-17  x86/cpufeatures: Enumerate ENQCMD and ENQCMDS instructions  (Fenghua Yu, 1 file, -0/+1)
Work submission instruction comes in two flavors. ENQCMD can be called both in ring 3 and ring 0 and always uses the contents of a PASID MSR when shipping the command to the device. ENQCMDS allows a kernel driver to submit commands on behalf of a user process. The driver supplies the PASID value in ENQCMDS. There isn't any usage of ENQCMD in the kernel as of now. The CPU feature flag is shown as "enqcmd" in /proc/cpuinfo. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/1600187413-163670-5-git-send-email-fenghua.yu@intel.com
2020-09-16  efi: Support for MOK variable config table  (Lenny Szubowicz, 1 file, -0/+1)
Because of system-specific EFI firmware limitations, EFI volatile variables may not be capable of holding the required contents of the Machine Owner Key (MOK) certificate store when the certificate list grows above some size. Therefore, an EFI boot loader may pass the MOK certs via a EFI configuration table created specifically for this purpose to avoid this firmware limitation. An EFI configuration table is a much more primitive mechanism compared to EFI variables and is well suited for one-way passage of static information from a pre-OS environment to the kernel. This patch adds initial kernel support to recognize, parse, and validate the EFI MOK configuration table, where named entries contain the same data that would otherwise be provided in similarly named EFI variables. Additionally, this patch creates a sysfs binary file for each EFI MOK configuration table entry found. These files are read-only to root and are provided for use by user space utilities such as mokutil. A subsequent patch will load MOK certs into the trusted platform key ring using this infrastructure. Signed-off-by: Lenny Szubowicz <lszubowi@redhat.com> Link: https://lore.kernel.org/r/20200905013107.10457-2-lszubowi@redhat.com Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-09-16  x86/irq: Cleanup the arch_*_msi_irqs() leftovers  (Thomas Gleixner, 2 files, -40/+0)
Get rid of all the gunk and remove the 'select PCI_MSI_ARCH_FALLBACK' from the x86 Kconfig so the weak functions in the PCI core are replaced by stubs which emit a warning. This ensures that any failure to set the irq domain pointer results in a warning when the device is used. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20200826112334.086003720@linutronix.de