summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)AuthorFilesLines
2021-06-01kprobes: Remove kprobe::fault_handlerPeter Zijlstra1-10/+0
The reason for kprobe::fault_handler(), as given by their comment: * We come here because instructions in the pre/post * handler caused the page_fault, this could happen * if handler tries to access user space by * copy_from_user(), get_user() etc. Let the * user-specified handler try to fix it first. Is just plain bad. Those other handlers are ran from non-preemptible context and had better use _nofault() functions. Also, there is no upstream usage of this. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Link: https://lore.kernel.org/r/20210525073213.561116662@infradead.org
2021-06-01perf/x86/intel/uncore: Fix M2M event umask for Ice Lake serverKan Liang1-1/+2
Perf tool errors out with the latest event list for the Ice Lake server. event syntax error: 'unc_m2m_imc_reads.to_pmm' \___ value too big for format, maximum is 255 The same as the Snow Ridge server, the M2M uncore unit in the Ice Lake server has the unit mask extension field as well. Fixes: 2b3b76b5ec67 ("perf/x86/intel/uncore: Add Ice Lake server uncore support") Reported-by: Jin Yao <yao.jin@linux.intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1622552943-119174-1-git-send-email-kan.liang@linux.intel.com
2021-05-31x86/thermal: Fix LVT thermal setup for SMI delivery modeBorislav Petkov2-1/+12
There are machines out there with added value crap^WBIOS which provide an SMI handler for the local APIC thermal sensor interrupt. Out of reset, the BSP on those machines has something like 0x200 in that APIC register (timestamps left in because this whole issue is timing sensitive): [ 0.033858] read lvtthmr: 0x330, val: 0x200 which means: - bit 16 - the interrupt mask bit is clear and thus that interrupt is enabled - bits [10:8] have 010b which means SMI delivery mode. Now, later during boot, when the kernel programs the local APIC, it soft-disables it temporarily through the spurious vector register: setup_local_APIC: ... /* * If this comes from kexec/kcrash the APIC might be enabled in * SPIV. Soft disable it before doing further initialization. */ value = apic_read(APIC_SPIV); value &= ~APIC_SPIV_APIC_ENABLED; apic_write(APIC_SPIV, value); which means (from the SDM): "10.4.7.2 Local APIC State After It Has Been Software Disabled ... * The mask bits for all the LVT entries are set. Attempts to reset these bits will be ignored." And this happens too: [ 0.124111] APIC: Switch to symmetric I/O mode setup [ 0.124117] lvtthmr 0x200 before write 0xf to APIC 0xf0 [ 0.124118] lvtthmr 0x10200 after write 0xf to APIC 0xf0 This results in CPU 0 soft lockups depending on the placement in time when the APIC soft-disable happens. Those soft lockups are not 100% reproducible and the reason for that can only be speculated as no one tells you what SMM does. Likely, it confuses the SMM code that the APIC is disabled and the thermal interrupt doesn't doesn't fire at all, leading to CPU 0 stuck in SMM forever... Now, before 4f432e8bb15b ("x86/mce: Get rid of mcheck_intel_therm_init()") due to how the APIC_LVTTHMR was read before APIC initialization in mcheck_intel_therm_init(), it would read the value with the mask bit 16 clear and then intel_init_thermal() would replicate it onto the APs and all would be peachy - the thermal interrupt would remain enabled. But that commit moved that reading to a later moment in intel_init_thermal(), resulting in reading APIC_LVTTHMR on the BSP too late and with its interrupt mask bit set. Thus, revert back to the old behavior of reading the thermal LVT register before the APIC gets initialized. Fixes: 4f432e8bb15b ("x86/mce: Get rid of mcheck_intel_therm_init()") Reported-by: James Feeney <james@nurealm.net> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Cc: Zhang Rui <rui.zhang@intel.com> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Link: https://lkml.kernel.org/r/YKIqDdFNaXYd39wz@zn.tnic
2021-05-31x86/cstate: Allow ACPI C1 FFH MWAIT use on Hygon systemsPu Wen1-1/+2
Hygon systems support the MONITOR/MWAIT instructions and these can be used for ACPI C1 in the same way as on AMD and Intel systems. The BIOS declares a C1 state in _CST to use FFH and CPUID_Fn00000005_EDX is non-zero on Hygon systems. Allow ffh_cstate_init() to succeed on Hygon systems to default using FFH MWAIT instead of HALT for ACPI C1. Signed-off-by: Pu Wen <puwen@hygon.cn> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210528081417.31474-1-puwen@hygon.cn
2021-05-31perf/x86/intel/uncore: Fix a kernel WARNING triggered by maxcpus=1Kan Liang1-2/+4
A kernel WARNING may be triggered when setting maxcpus=1. The uncore counters are Die-scope. When probing a PCI device, only the BUS information can be retrieved. The uncore driver has to maintain a mapping table used to calculate the logical Die ID from a given BUS#. Before the patch ba9506be4e40, the mapping table stores the mapping information from the BUS# -> a Physical Socket ID. To calculate the logical die ID, perf does, - In snbep_pci2phy_map_init(), retrieve the BUS# -> a Physical Socket ID from the UBOX PCI configure space. - Calculate the mapping information (a BUS# -> a Physical Socket ID) for the other PCI BUS. - In the uncore_pci_probe(), get the physical Socket ID from a given BUS and the mapping table. - Calculate the logical Die ID Since only the logical Die ID is required, with the patch ba9506be4e40, the mapping table stores the mapping information from the BUS# -> a logical Die ID. Now perf does, - In snbep_pci2phy_map_init(), retrieve the BUS# -> a Physical Socket ID from the UBOX PCI configure space. - Calculate the logical Die ID - Calculate the mapping information (a BUS# -> a logical Die ID) for the other PCI BUS. - In the uncore_pci_probe(), get the logical die ID from a given BUS and the mapping table. When calculating the logical Die ID, -1 may be returned, especially when maxcpus=1. Here, -1 means the logical Die ID is not found. But when calculating the mapping information for the other PCI BUS, -1 indicates that it's the other PCI BUS that requires the calculation of the mapping. The driver will mistakenly do the calculation. Uses the -ENODEV to indicate the case which the logical Die ID is not found. The driver will not mess up the mapping table anymore. Fixes: ba9506be4e40 ("perf/x86/intel/uncore: Store the logical die id instead of the physical die id.") Reported-by: John Donnelly <john.p.donnelly@oracle.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: John Donnelly <john.p.donnelly@oracle.com> Tested-by: John Donnelly <john.p.donnelly@oracle.com> Link: https://lkml.kernel.org/r/1622037527-156028-1-git-send-email-kan.liang@linux.intel.com
2021-05-29Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds15-36/+67
Pull KVM fixes from Paolo Bonzini: "ARM fixes: - Another state update on exit to userspace fix - Prevent the creation of mixed 32/64 VMs - Fix regression with irqbypass not restarting the guest on failed connect - Fix regression with debug register decoding resulting in overlapping access - Commit exception state on exit to usrspace - Fix the MMU notifier return values - Add missing 'static' qualifiers in the new host stage-2 code x86 fixes: - fix guest missed wakeup with assigned devices - fix WARN reported by syzkaller - do not use BIT() in UAPI headers - make the kvm_amd.avic parameter bool PPC fixes: - make halt polling heuristics consistent with other architectures selftests: - various fixes - new performance selftest memslot_perf_test - test UFFD minor faults in demand_paging_test" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (44 commits) selftests: kvm: fix overlapping addresses in memslot_perf_test KVM: X86: Kill off ctxt->ud KVM: X86: Fix warning caused by stale emulation context KVM: X86: Use kvm_get_linear_rip() in single-step and #DB/#BP interception KVM: x86/mmu: Fix comment mentioning skip_4k KVM: VMX: update vcpu posted-interrupt descriptor when assigning device KVM: rename KVM_REQ_PENDING_TIMER to KVM_REQ_UNBLOCK KVM: x86: add start_assignment hook to kvm_x86_ops KVM: LAPIC: Narrow the timer latency between wait_lapic_expire and world switch selftests: kvm: do only 1 memslot_perf_test run by default KVM: X86: Use _BITUL() macro in UAPI headers KVM: selftests: add shared hugetlbfs backing source type KVM: selftests: allow using UFFD minor faults for demand paging KVM: selftests: create alias mappings when using shared memory KVM: selftests: add shmem backing source type KVM: selftests: refactor vm_mem_backing_src_type flags KVM: selftests: allow different backing source types KVM: selftests: compute correct demand paging size KVM: selftests: simplify setup_demand_paging error handling KVM: selftests: Print a message if /dev/kvm is missing ...
2021-05-29x86/apic: Mark _all_ legacy interrupts when IO/APIC is missingThomas Gleixner3-0/+22
PIC interrupts do not support affinity setting and they can end up on any online CPU. Therefore, it's required to mark the associated vectors as system-wide reserved. Otherwise, the corresponding irq descriptors are copied to the secondary CPUs but the vectors are not marked as assigned or reserved. This works correctly for the IO/APIC case. When the IO/APIC is disabled via config, kernel command line or lack of enumeration then all legacy interrupts are routed through the PIC, but nothing marks them as system-wide reserved vectors. As a consequence, a subsequent allocation on a secondary CPU can result in allocating one of these vectors, which triggers the BUG() in apic_update_vector() because the interrupt descriptor slot is not empty. Imran tried to work around that by marking those interrupts as allocated when a CPU comes online. But that's wrong in case that the IO/APIC is available and one of the legacy interrupts, e.g. IRQ0, has been switched to PIC mode because then marking them as allocated will fail as they are already marked as system vectors. Stay consistent and update the legacy vectors after attempting IO/APIC initialization and mark them as system vectors in case that no IO/APIC is available. Fixes: 69cde0004a4b ("x86/vector: Use matrix allocator for vector assignment") Reported-by: Imran Khan <imran.f.khan@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20210519233928.2157496-1-imran.f.khan@oracle.com
2021-05-28KVM: X86: Kill off ctxt->udWanpeng Li3-7/+5
ctxt->ud is consumed only by x86_decode_insn(), we can kill it off by passing emulation_type to x86_decode_insn() and dropping ctxt->ud altogether. Tracking that info in ctxt for literally one call is silly. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <1622160097-37633-2-git-send-email-wanpengli@tencent.com>
2021-05-28KVM: X86: Fix warning caused by stale emulation contextWanpeng Li1-5/+5
Reported by syzkaller: WARNING: CPU: 7 PID: 10526 at linux/arch/x86/kvm//x86.c:7621 x86_emulate_instruction+0x41b/0x510 [kvm] RIP: 0010:x86_emulate_instruction+0x41b/0x510 [kvm] Call Trace: kvm_mmu_page_fault+0x126/0x8f0 [kvm] vmx_handle_exit+0x11e/0x680 [kvm_intel] vcpu_enter_guest+0xd95/0x1b40 [kvm] kvm_arch_vcpu_ioctl_run+0x377/0x6a0 [kvm] kvm_vcpu_ioctl+0x389/0x630 [kvm] __x64_sys_ioctl+0x8e/0xd0 do_syscall_64+0x3c/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xae Commit 4a1e10d5b5d8 ("KVM: x86: handle hardware breakpoints during emulation()) adds hardware breakpoints check before emulation the instruction and parts of emulation context initialization, actually we don't have the EMULTYPE_NO_DECODE flag here and the emulation context will not be reused. Commit c8848cee74ff ("KVM: x86: set ctxt->have_exception in x86_decode_insn()) triggers the warning because it catches the stale emulation context has #UD, however, it is not during instruction decoding which should result in EMULATION_FAILED. This patch fixes it by moving the second part emulation context initialization into init_emulate_ctxt() and before hardware breakpoints check. The ctxt->ud will be dropped by a follow-up patch. syzkaller source: https://syzkaller.appspot.com/x/repro.c?x=134683fdd00000 Reported-by: syzbot+71271244f206d17f6441@syzkaller.appspotmail.com Fixes: 4a1e10d5b5d8 (KVM: x86: handle hardware breakpoints during emulation) Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <1622160097-37633-1-git-send-email-wanpengli@tencent.com>
2021-05-28KVM: X86: Use kvm_get_linear_rip() in single-step and #DB/#BP interceptionYuan Yao2-5/+3
The kvm_get_linear_rip() handles x86/long mode cases well and has better readability, __kvm_set_rflags() also use the paired function kvm_is_linear_rip() to check the vcpu->arch.singlestep_rip set in kvm_arch_vcpu_ioctl_set_guest_debug(), so change the "CS.BASE + RIP" code in kvm_arch_vcpu_ioctl_set_guest_debug() and handle_exception_nmi() to this one. Signed-off-by: Yuan Yao <yuan.yao@intel.com> Message-Id: <20210526063828.1173-1-yuan.yao@linux.intel.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-28x86/mce: Include a MCi_MISC value in faked mce logsTony Luck1-1/+2
When BIOS reports memory errors to Linux using the ACPI/APEI error reporting method Linux creates a "struct mce" to pass to the normal reporting code path. The constructed record doesn't include a value for the "misc" field of the structure, and so mce_usable_address() says this record doesn't include a valid address. Net result is that functions like uc_decode_notifier() will just ignore this record instead of taking action to offline a page. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210527222846.931851-1-tony.luck@intel.com
2021-05-28x86/pci: Return true/false (not 1/0) from bool functionsYang Li1-5/+5
Return boolean values ("true" or "false") instead of 1 or 0 from bool functions. This fixes the following warnings from coccicheck: ./arch/x86/pci/mmconfig-shared.c:464:9-10: WARNING: return of 0/1 in function 'is_mmconf_reserved' with return type bool ./arch/x86/pci/mmconfig-shared.c:493:5-6: WARNING: return of 0/1 in function 'is_mmconf_reserved' with return type bool ./arch/x86/pci/mmconfig-shared.c:501:9-10: WARNING: return of 0/1 in function 'is_mmconf_reserved' with return type bool ./arch/x86/pci/mmconfig-shared.c:522:5-6: WARNING: return of 0/1 in function 'is_mmconf_reserved' with return type bool Link: https://lore.kernel.org/r/1615794000-102771-1-git-send-email-yang.lee@linux.alibaba.com Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Krzysztof Wilczyński <kw@linux.com>
2021-05-27x86/MCE/AMD, EDAC/mce_amd: Add new SMCA bank typesMuralidhara M K2-25/+43
Add the (HWID, MCATYPE) tuples and names for new SMCA bank types. Also, add their respective error descriptions to the MCE decoding module edac_mce_amd. Also while at it, optimize the string names for some SMCA banks. [ bp: Drop repeated comments, explain why UMC_V2 is a separate entry. ] Signed-off-by: Muralidhara M K <muralimk@amd.com> Signed-off-by: Naveen Krishna Chatradhi <nchatrad@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Link: https://lkml.kernel.org/r/20210526164601.66228-1-nchatrad@amd.com
2021-05-27KVM: x86/mmu: Fix comment mentioning skip_4kDavid Matlack1-3/+3
This comment was left over from a previous version of the patch that introduced wrprot_gfn_range, when skip_4k was passed in instead of min_level. Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20210526163227.3113557-1-dmatlack@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-27KVM: VMX: update vcpu posted-interrupt descriptor when assigning deviceMarcelo Tosatti3-0/+16
For VMX, when a vcpu enters HLT emulation, pi_post_block will: 1) Add vcpu to per-cpu list of blocked vcpus. 2) Program the posted-interrupt descriptor "notification vector" to POSTED_INTR_WAKEUP_VECTOR With interrupt remapping, an interrupt will set the PIR bit for the vector programmed for the device on the CPU, test-and-set the ON bit on the posted interrupt descriptor, and if the ON bit is clear generate an interrupt for the notification vector. This way, the target CPU wakes upon a device interrupt and wakes up the target vcpu. Problem is that pi_post_block only programs the notification vector if kvm_arch_has_assigned_device() is true. Its possible for the following to happen: 1) vcpu V HLTs on pcpu P, kvm_arch_has_assigned_device is false, notification vector is not programmed 2) device is assigned to VM 3) device interrupts vcpu V, sets ON bit (notification vector not programmed, so pcpu P remains in idle) 4) vcpu 0 IPIs vcpu V (in guest), but since pi descriptor ON bit is set, kvm_vcpu_kick is skipped 5) vcpu 0 busy spins on vcpu V's response for several seconds, until RCU watchdog NMIs all vCPUs. To fix this, use the start_assignment kvm_x86_ops callback to kick vcpus out of the halt loop, so the notification vector is properly reprogrammed to the wakeup vector. Reported-by: Pei Zhang <pezhang@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Message-Id: <20210526172014.GA29007@fuller.cnet> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-27KVM: rename KVM_REQ_PENDING_TIMER to KVM_REQ_UNBLOCKMarcelo Tosatti2-2/+2
KVM_REQ_UNBLOCK will be used to exit a vcpu from its inner vcpu halt emulation loop. Rename KVM_REQ_PENDING_TIMER to KVM_REQ_UNBLOCK, switch PowerPC to arch specific request bit. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Message-Id: <20210525134321.303768132@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-27KVM: x86: add start_assignment hook to kvm_x86_opsMarcelo Tosatti3-1/+4
Add a start_assignment hook to kvm_x86_ops, which is called when kvm_arch_start_assignment is done. The hook is required to update the wakeup vector of a sleeping vCPU when a device is assigned to the guest. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Message-Id: <20210525134321.254128742@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-27KVM: LAPIC: Narrow the timer latency between wait_lapic_expire and world switchWanpeng Li1-3/+11
Let's treat lapic_timer_advance_ns automatic tuning logic as hypervisor overhead, move it before wait_lapic_expire instead of between wait_lapic_expire and the world switch, the wait duration should be calculated by the up-to-date guest_tsc after the overhead of automatic tuning logic. This patch reduces ~30+ cycles for kvm-unit-tests/tscdeadline-latency when testing busy waits. Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1621339235-11131-5-git-send-email-wanpengli@tencent.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-27KVM: X86: hyper-v: Task srcu lock when accessing kvm_memslots()Wanpeng Li1-0/+8
WARNING: suspicious RCU usage 5.13.0-rc1 #4 Not tainted ----------------------------- ./include/linux/kvm_host.h:710 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by hyperv_clock/8318: #0: ffffb6b8cb05a7d8 (&hv->hv_lock){+.+.}-{3:3}, at: kvm_hv_invalidate_tsc_page+0x3e/0xa0 [kvm] stack backtrace: CPU: 3 PID: 8318 Comm: hyperv_clock Not tainted 5.13.0-rc1 #4 Call Trace: dump_stack+0x87/0xb7 lockdep_rcu_suspicious+0xce/0xf0 kvm_write_guest_page+0x1c1/0x1d0 [kvm] kvm_write_guest+0x50/0x90 [kvm] kvm_hv_invalidate_tsc_page+0x79/0xa0 [kvm] kvm_gen_update_masterclock+0x1d/0x110 [kvm] kvm_arch_vm_ioctl+0x2a7/0xc50 [kvm] kvm_vm_ioctl+0x123/0x11d0 [kvm] __x64_sys_ioctl+0x3ed/0x9d0 do_syscall_64+0x3d/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae kvm_memslots() will be called by kvm_write_guest(), so we should take the srcu lock. Fixes: e880c6ea5 (KVM: x86: hyper-v: Prevent using not-yet-updated TSC page by secondary CPUs) Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1621339235-11131-4-git-send-email-wanpengli@tencent.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-27KVM: X86: Fix vCPU preempted state from guest's point of viewWanpeng Li1-0/+2
Commit 66570e966dd9 (kvm: x86: only provide PV features if enabled in guest's CPUID) avoids to access pv tlb shootdown host side logic when this pv feature is not exposed to guest, however, kvm_steal_time.preempted not only leveraged by pv tlb shootdown logic but also mitigate the lock holder preemption issue. From guest's point of view, vCPU is always preempted since we lose the reset of kvm_steal_time.preempted before vmentry if pv tlb shootdown feature is not exposed. This patch fixes it by clearing kvm_steal_time.preempted before vmentry. Fixes: 66570e966dd9 (kvm: x86: only provide PV features if enabled in guest's CPUID) Reviewed-by: Sean Christopherson <seanjc@google.com> Cc: stable@vger.kernel.org Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1621339235-11131-3-git-send-email-wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-27KVM: X86: Bail out of direct yield in case of under-committed scenariosWanpeng Li1-0/+3
In case of under-committed scenarios, vCPUs can be scheduled easily; kvm_vcpu_yield_to adds extra overhead, and it is also common to see when vcpu->ready is true but yield later failing due to p->state is TASK_RUNNING. Let's bail out in such scenarios by checking the length of current cpu runqueue, which can be treated as a hint of under-committed instead of guarantee of accuracy. 30%+ of directed-yield attempts can now avoid the expensive lookups in kvm_sched_yield() in an under-committed scenario. Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1621339235-11131-2-git-send-email-wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-27Merge v5.13-rc3 into drm-nextDaniel Vetter11-74/+136
drm/i915 is extremely on fire without the below revert from -rc3: commit 293837b9ac8d3021657f44c9d7a14948ec01c5d0 Author: Linus Torvalds <torvalds@linux-foundation.org> Date: Wed May 19 05:55:57 2021 -1000 Revert "i915: fix remap_io_sg to verify the pgprot" Backmerge so we don't have a too wide bisect window for anything that's a more involved workload than booting the driver. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2021-05-26kbuild: require all architectures to have arch/$(SRCARCH)/KbuildMasahiro Yamada1-3/+0
arch/$(SRCARCH)/Kbuild is useful for Makefile cleanups because you can use the obj-y syntax. Add an empty file if it is missing in arch/$(SRCARCH)/. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2021-05-26locking/atomic: delete !ARCH_ATOMIC remnantsMark Rutland1-1/+0
Now that all architectures implement ARCH_ATOMIC, we can make it mandatory, removing the Kconfig symbol and logic for !ARCH_ATOMIC. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20210525140232.53872-33-mark.rutland@arm.com
2021-05-26locking/atomic: make ARCH_ATOMIC a Kconfig symbolMark Rutland2-2/+1
Subsequent patches will move architectures over to the ARCH_ATOMIC API, after preparing the asm-generic atomic implementations to function with or without ARCH_ATOMIC. As some architectures use the asm-generic implementations exclusively (and don't have a local atomic.h), and to avoid the risk that ARCH_ATOMIC isn't defined in some cases we expect, let's make the ARCH_ATOMIC macro a Kconfig symbol instead, so that we can guarantee it is consistently available where needed. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20210525140232.53872-2-mark.rutland@arm.com
2021-05-25x86/syscalls: Don't adjust CFLAGS for syscall tablesBrian Gerst1-6/+0
The syscall_*.c files only contain data (the syscall tables). There is no need to adjust CFLAGS for tracing and stack protector since they contain no code. Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Masahiro Yamada <masahiroy@kernel.org> Link: https://lore.kernel.org/r/20210524181707.132844-4-brgerst@gmail.com
2021-05-25x86/syscalls: Remove -Wno-override-init for syscall tablesBrian Gerst1-4/+0
Commit 44fe4895f47c ("Stop filling syscall arrays with *_sys_ni_syscall") removes the need for -Wno-override-init, since the table is now filled sequentially instead of overriding a default value. Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Masahiro Yamada <masahiroy@kernel.org> Link: https://lore.kernel.org/r/20210524181707.132844-3-brgerst@gmail.com
2021-05-25x86/uml/syscalls: Remove array index from syscall initializersBrian Gerst2-2/+2
The recent syscall table generator rework removed the index from the initializers for native x86 syscall tables, but missed the UML syscall tables. Fixes: 44fe4895f47c ("Stop filling syscall arrays with *_sys_ni_syscall") Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Masahiro Yamada <masahiroy@kernel.org> Link: https://lore.kernel.org/r/20210524181707.132844-2-brgerst@gmail.com
2021-05-25x86/syscalls: Clear 'offset' and 'prefix' in case they are set in envMasahiro Yamada1-0/+2
If the environment variable 'prefix' is set on the build host, it is wrongly used as syscall macro prefixes. $ export prefix=/usr $ make -s defconfig all In file included from ./arch/x86/include/asm/unistd.h:20, from <stdin>:2: ./arch/x86/include/generated/uapi/asm/unistd_64.h:4:9: warning: missing whitespace after the macro name 4 | #define __NR_/usrread 0 | ^~~~~ arch/x86/entry/syscalls/Makefile should clear 'offset' and 'prefix'. Fixes: 3cba325b358f ("x86/syscalls: Switch to generic syscallhdr.sh") Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210525115420.679416-1-masahiroy@kernel.org
2021-05-25x86/entry: Use int everywhere for system call numbersH. Peter Anvin (Intel)2-29/+60
System call numbers are defined as int, so use int everywhere for system call numbers. This is strictly a cleanup; it should not change anything user visible; all ABI changes have been done in the preceeding patches. [ tglx: Replaced the unsigned long cast ] Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210518191303.4135296-7-hpa@zytor.com
2021-05-24KVM: SVM: make the avic parameter a boolPaolo Bonzini2-3/+3
Make it consistent with kvm_intel.enable_apicv. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-24KVM: VMX: Drop unneeded CONFIG_X86_LOCAL_APIC checkVitaly Kuznetsov1-2/+1
CONFIG_X86_LOCAL_APIC is always on when CONFIG_KVM (on x86) since commit e42eef4ba388 ("KVM: add X86_LOCAL_APIC dependency"). Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210518144339.1987982-3-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com>
2021-05-24KVM: SVM: Drop unneeded CONFIG_X86_LOCAL_APIC checkVitaly Kuznetsov2-5/+1
AVIC dependency on CONFIG_X86_LOCAL_APIC is dead code since commit e42eef4ba388 ("KVM: add X86_LOCAL_APIC dependency"). Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210518144339.1987982-2-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com>
2021-05-23Merge tag 'perf-urgent-2021-05-23' of ↵Linus Torvalds4-9/+31
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Thomas Gleixner: "Two perf fixes: - Do not check the LBR_TOS MSR when setting up unrelated LBR MSRs as this can cause malfunction when TOS is not supported - Allocate the LBR XSAVE buffers along with the DS buffers upfront because allocating them when adding an event can deadlock" * tag 'perf-urgent-2021-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/lbr: Remove cpuc->lbr_xsave allocation from atomic context perf/x86: Avoid touching LBR_TOS MSR for Arch LBR
2021-05-23Merge tag 'x86_urgent_for_v5.13_rc3' of ↵Linus Torvalds3-57/+92
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - Fix how SEV handles MMIO accesses by forwarding potential page faults instead of killing the machine and by using the accessors with the exact functionality needed when accessing memory. - Fix a confusion with Clang LTO compiler switches passed to the it - Handle the case gracefully when VMGEXIT has been executed in userspace * tag 'x86_urgent_for_v5.13_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/sev-es: Use __put_user()/__get_user() for data accesses x86/sev-es: Forward page-faults which happen during emulation x86/sev-es: Don't return NULL from sev_es_get_ghcb() x86/build: Fix location of '-plugin-opt=' flags x86/sev-es: Invalidate the GHCB after completing VMGEXIT x86/sev-es: Move sev_es_put_ghcb() in prep for follow on patch
2021-05-23Merge tag 'efi-next-for-v5.14' of ↵Ingo Molnar1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi into efi/core Pull EFI updates for v5.14 from Ard Biesheuvel: "First microbatch of EFI updates - not a lot going on these days." Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-05-22Merge tag 'for-linus-5.13b-rc3-tag' of ↵Linus Torvalds1-4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: - a fix for a boot regression when running as PV guest on hardware without NX support - a small series fixing a bug in the Xen pciback driver when configuring a PCI card with multiple virtual functions * tag 'for-linus-5.13b-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen-pciback: reconfigure also from backend watch handler xen-pciback: redo VF placement in the virtual topology x86/Xen: swap NX determination and GDT setup on BSP
2021-05-22x86/efi: Log 32/64-bit mismatch with kernel as an errorPaul Menzel1-1/+1
Log the message No EFI runtime due to 32/64-bit mismatch with kernel as an error condition, as several things like efivarfs won’t work without the EFI runtime. Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2021-05-21Merge branch 'for-v5.13-rc3' of ↵Linus Torvalds1-2/+7
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull siginfo fix from Eric Biederman: "During the merge window an issue with si_perf and the siginfo ABI came up. The alpha and sparc siginfo structure layout had changed with the addition of SIGTRAP TRAP_PERF and the new field si_perf. The reason only alpha and sparc were affected is that they are the only architectures that use si_trapno. Looking deeper it was discovered that si_trapno is used for only a few select signals on alpha and sparc, and that none of the other _sigfault fields past si_addr are used at all. Which means technically no regression on alpha and sparc. While the alignment concerns might be dismissed the abuse of si_errno by SIGTRAP TRAP_PERF does have the potential to cause regressions in existing userspace. While we still have time before userspace starts using and depending on the new definition siginfo for SIGTRAP TRAP_PERF this set of changes cleans up siginfo_t. - The si_trapno field is demoted from magic alpha and sparc status and made an ordinary union member of the _sigfault member of siginfo_t. Without moving it of course. - si_perf is replaced with si_perf_data and si_perf_type ending the abuse of si_errno. - Unnecessary additions to signalfd_siginfo are removed" * 'for-v5.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: signalfd: Remove SIL_PERF_EVENT fields from signalfd_siginfo signal: Deliver all of the siginfo perf data in _perf signal: Factor force_sig_perf out of perf_sigtrap signal: Implement SIL_FAULT_TRAPNO siginfo: Move si_trapno inside the union inside _si_fault
2021-05-21x86/kexec: Set_[gi]dt() -> native_[gi]dt_invalidate() in machine_kexec_*.cH. Peter Anvin (Intel)2-44/+4
These files contain private set_gdt() functions which are only used to invalid the gdt; machine_kexec_64.c also contains a set_idt() function to invalidate the idt. phys_to_virt(0) *really* doesn't make any sense for creating an invalid GDT. A NULL pointer (virtual 0) makes a lot more sense; although neither will allow any actual memory reference, a NULL pointer stands out more. Replace these calls with native_[gi]dt_invalidate(). Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210519212154.511983-7-hpa@zytor.com
2021-05-21x86: Add native_[ig]dt_invalidate()H. Peter Anvin (Intel)1-0/+20
In some places, the native forms of descriptor table invalidation is required. Rather than open-coding them, add explicitly native functions to invalidate the GDT and IDT. Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210519212154.511983-6-hpa@zytor.com
2021-05-21x86/idt: Remove address argument from idt_invalidate()H. Peter Anvin (Intel)4-6/+5
There is no reason to specify any specific address to idt_invalidate(). It looks mostly like an artifact of unifying code done differently by accident. The most "sensible" address to set here is a NULL pointer - virtual address zero, just as a visual marker. This also makes it possible to mark the struct desc_ptr in idt_invalidate() as static const. Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210519212154.511983-5-hpa@zytor.com
2021-05-21x86/irq: Add and use NR_EXTERNAL_VECTORS and NR_SYSTEM_VECTORSH. Peter Anvin (Intel)2-2/+5
Add defines for the number of external vectors and number of system vectors instead of requiring the use of (FIRST_SYSTEM_VECTOR - FIRST_EXTERNAL_VECTOR) and (NR_VECTORS - FIRST_SYSTEM_VECTOR) respectively. Clean up the usage sites. Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lore.kernel.org/r/20210519212154.511983-3-hpa@zytor.com
2021-05-21x86/irq: Remove unused vectors definesH. Peter Anvin (Intel)1-2/+2
UV_BAU_MESSAGE is defined but not used anywhere in the kernel. Presumably this is a stale vector number that can be reclaimed. MCE_VECTOR is not an actual vector: #MC is an exception, not an interrupt vector, and as such is correctly described as X86_TRAP_MC. MCE_VECTOR is not used anywhere is the kernel. Note that NMI_VECTOR *is* used; specifically it is the vector number programmed into the APIC LVT when an NMI interrupt is configured. At the moment it is always numerically identical to X86_TRAP_NMI, that is not necessarily going to be the case indefinitely. Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Steve Wahl <steve.wahl@hpe.com> Link: https://lore.kernel.org/r/20210519212154.511983-4-hpa@zytor.com
2021-05-21x86/amd_nb: Add AMD family 19h model 50h PCI idsDavid Bartley1-0/+3
This is required to support Zen3 APUs in k10temp. Signed-off-by: David Bartley <andareed@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Wei Huang <wei.huang2@amd.com> Link: https://lkml.kernel.org/r/20210520174130.94954-1-andareed@gmail.com
2021-05-21x86/elf: Use _BITUL() macro in UAPI headersJoe Richey1-2/+4
Replace BIT() in x86's UAPI header with _BITUL(). BIT() is not defined in the UAPI headers and its usage may cause userspace build errors. Fixes: 742c45c3ecc9 ("x86/elf: Enumerate kernel FSGSBASE capability in AT_HWCAP2") Signed-off-by: Joe Richey <joerichey@google.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210521085849.37676-2-joerichey94@gmail.com
2021-05-21x86/Xen: swap NX determination and GDT setup on BSPJan Beulich1-4/+4
xen_setup_gdt(), via xen_load_gdt_boot(), wants to adjust page tables. For this to work when NX is not available, x86_configure_nx() needs to be called first. [jgross] Note that this is a revert of 36104cb9012a82e73 ("x86/xen: Delay get_cpu_cap until stack canary is established"), which is possible now that we no longer support running as PV guest in 32-bit mode. Cc: <stable.vger.kernel.org> # 5.9 Fixes: 36104cb9012a82e73 ("x86/xen: Delay get_cpu_cap until stack canary is established") Reported-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/12a866b0-9e89-59f7-ebeb-a2a6cec0987a@suse.com Signed-off-by: Juergen Gross <jgross@suse.com>
2021-05-21Merge tag 'drm-intel-next-2021-05-19-1' of ↵Dave Airlie1-0/+1
git://anongit.freedesktop.org/drm/drm-intel into drm-next Core Changes: - drm: Rename DP_PSR_SELECTIVE_UPDATE to better mach eDP spec (Jose). Driver Changes: - Display plane clock rates fixes and improvements (Ville). - Uninint DMC FW loader state during shutdown (Imre). - Convert snprintf to sysfs_emit (Xuezhi). - Fix invalid access to ACPI _DSM objects (Takashi). - A big refactor around how i915 addresses the graphics and display IP versions. (Matt, Lucas). - Backlight fix (Lyude). - Display watermark and DBUF fixes (Ville). - HDCP fix (Anshuman). - Improve cases where display is not available (Jose). - Defeature PSR2 for RKL and ALD-S (Jose). - VLV DSI panel power fixes and improvements (Hans). - display-12 workaround (Jose). - Fix modesetting (Imre). - Drop redundant address-of op before lttpr_common_caps array (Imre). - Fix compiler checks (Jose, Jason). - GLK display fixes (Ville). - Fix error code returns (Dan). - eDP novel: back again to slow and wide link training everywhere (Kai-Heng). - Abstract DMC FW path (Rodrigo). - Preparation and changes for upcoming XeLPD display IP (Jose, Matt, Ville, Juha-Pekka, Animesh). - Fix comment typo in DSI code (zuoqilin). - Simplify CCS and UV plane alignment handling (Imre). - PSR Fixes on TGL (Gwan-gyeong, Jose). - Add intel_dp_hdcp.h and rename init (Jani). - Move crtc and dpll declarations around (Jani). - Fix pre-skl DP AUX precharge length (Ville). - Remove stray newlines from random files (Ville). - crtc->index and intel_crtc+drm_crtc pointer clean-up (Ville). - Add frontbuffer tracking tracepoints (Ville). - ADL-S PCI ID updates (Anand). - Use unique backlight device names (Jani). - A few clean-ups on i915/audio (Jani). - Use intel_framebuffer instead of drm one on intel_fb functions (Imre). - Add the missing MC CCS/XYUV8888 format support on display >= 12 (Imre). - Nuke display error state (Ville). - ADL-P initial enablement patches starting to land (Clint, Imre, Jose, Umesh, Vandita, Mika). - Display clean-up around VBT and the strap bits (Lucas). - Try YCbCr420 color when RGB fails (Werner). - More PSR fixes and improvements (Jose). - Other generic display code clean-up (Jose, Ville). - Use correct downstream caps for check Src-Ctl mode for PCON (Ankit). - Disable HiZ Raw Stall Optimization on broken gen7 (Simon). Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/YKVioeu0JkUAlR7y@intel.com
2021-05-20Merge tag 'quota_for_v5.13-rc3' of ↵Linus Torvalds2-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull quota fixes from Jan Kara: "The most important part in the pull is disablement of the new syscall quotactl_path() which was added in rc1. The reason is some people at LWN discussion pointed out dirfd would be useful for this path based syscall and Christian Brauner agreed. Without dirfd it may be indeed problematic for containers. So let's just disable the syscall for now when it doesn't have users yet so that we have more time to mull over how to best specify the filesystem we want to work on" * tag 'quota_for_v5.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: quota: Disable quotactl_path syscall quota: Use 'hlist_for_each_entry' to simplify code
2021-05-20x86/entry: Treat out of range and gap system calls the sameH. Peter Anvin (Intel)1-0/+4
The current 64-bit system call entry code treats out-of-range system calls differently than system calls that map to a hole in the system call table. This is visible to the user if system calls are intercepted via ptrace or seccomp and the return value (regs->ax) is modified: in the former case, the return value is preserved, and in the latter case, sys_ni_syscall() is called and the return value is forced to -ENOSYS. The API spec in <asm-generic/syscalls.h> is very clear that only (int)-1 is the non-system-call sentinel value, so make the system call behavior consistent by calling sys_ni_syscall() for all invalid system call numbers except for -1. Although currently sys_ni_syscall() simply returns -ENOSYS, calling it explicitly is friendly for tracing and future possible extensions, and as this is an error path there is no reason to optimize it. Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210518191303.4135296-6-hpa@zytor.com