path: root/arch/x86
Age | Commit message | Author | Files | Lines
2022-11-29 | x86/hyperv: Remove unregister syscore call from Hyper-V cleanup | Gaurav Kohli | 1 | -2/+0
Hyper-V cleanup code runs in the panic path, where preemption and IRQs are already disabled. Calling unregister_syscore_ops() there might schedule out the thread even when the mutex is free:

  hyperv_cleanup
    unregister_syscore_ops
      mutex_lock(&syscore_ops_lock)
        might_sleep

When voluntary preemption is configured, might_sleep() might schedule out this thread, and it will never come back. The call was added earlier only to maintain symmetry, which is not required because this code runs only in the crash shutdown path. To prevent this, remove the unregister_syscore_ops() call.

Signed-off-by: Gaurav Kohli <gauravkohli@linux.microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/1669443291-2575-1-git-send-email-gauravkohli@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-11-28 | iommu/hyper-v: Allow hyperv irq remapping without x2apic | Nuno Das Neves | 1 | -0/+6
If x2apic is not available, hyperv-iommu skips remapping irqs. This breaks root partition which always needs irqs remapped. Fix this by allowing irq remapping regardless of x2apic, and change hyperv_enable_irq_remapping() to return IRQ_REMAP_XAPIC_MODE in case x2apic is missing. Tested with root and non-root hyperv partitions. Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Reviewed-by: Tianyu Lan <Tianyu.Lan@microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/1668715899-8971-1-git-send-email-nunodasneves@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-11-28 | clocksource: hyper-v: Add TSC page support for root partition | Stanislav Kinsburskiy | 1 | -0/+2
Microsoft Hypervisor root partition has to map the TSC page specified by the hypervisor, instead of providing the page to the hypervisor like it's done in the guest partitions. However, it's too early to map the page when the clock is initialized, so, the actual mapping is happening later. Signed-off-by: Stanislav Kinsburskiy <stanislav.kinsburskiy@gmail.com> CC: "K. Y. Srinivasan" <kys@microsoft.com> CC: Haiyang Zhang <haiyangz@microsoft.com> CC: Wei Liu <wei.liu@kernel.org> CC: Dexuan Cui <decui@microsoft.com> CC: Thomas Gleixner <tglx@linutronix.de> CC: Ingo Molnar <mingo@redhat.com> CC: Borislav Petkov <bp@alien8.de> CC: Dave Hansen <dave.hansen@linux.intel.com> CC: x86@kernel.org CC: "H. Peter Anvin" <hpa@zytor.com> CC: Daniel Lezcano <daniel.lezcano@linaro.org> CC: linux-hyperv@vger.kernel.org CC: linux-kernel@vger.kernel.org Reviewed-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Anirudh Rayabharam <anrayabh@linux.microsoft.com> Link: https://lore.kernel.org/r/166759443644.385891.15921594265843430260.stgit@skinsburskii-cloud-desktop.internal.cloudapp.net Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-11-28 | clocksource: hyper-v: Use TSC PFN getter to map vvar page | Stanislav Kinsburskiy | 1 | -4/+3
Instead of converting the virtual address to physical directly. This is a precursor patch for the upcoming support for TSC page mapping into Microsoft Hypervisor root partition, where TSC PFN will be defined by the hypervisor and thus can't be obtained by linear translation of the physical address. Signed-off-by: Stanislav Kinsburskiy <stanislav.kinsburskiy@gmail.com> CC: Andy Lutomirski <luto@kernel.org> CC: Thomas Gleixner <tglx@linutronix.de> CC: Ingo Molnar <mingo@redhat.com> CC: Borislav Petkov <bp@alien8.de> CC: Dave Hansen <dave.hansen@linux.intel.com> CC: x86@kernel.org CC: "H. Peter Anvin" <hpa@zytor.com> CC: "K. Y. Srinivasan" <kys@microsoft.com> CC: Haiyang Zhang <haiyangz@microsoft.com> CC: Wei Liu <wei.liu@kernel.org> CC: Dexuan Cui <decui@microsoft.com> CC: Daniel Lezcano <daniel.lezcano@linaro.org> CC: linux-kernel@vger.kernel.org CC: linux-hyperv@vger.kernel.org Reviewed-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Anirudh Rayabharam <anrayabh@linux.microsoft.com> Link: https://lore.kernel.org/r/166749833939.218190.14095015146003109462.stgit@skinsburskii-cloud-desktop.internal.cloudapp.net Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-11-28 | x86/hyperv: Expand definition of struct hv_vp_assist_page | Saurabh Sengar | 1 | -1/+10
The struct hv_vp_assist_page has 24 bytes which are defined as u64[3]; expand that to expose the vtl_entry_reason, vtl_ret_x64rax and vtl_ret_x64rcx fields. vtl_entry_reason is updated by the hypervisor with the reason why the VTL was entered on the virtual processor. The guest updates the vtl_ret_* fields to provide the register values to restore on VTL return; the values to be restored are placed in vtl_ret_x64rax and vtl_ret_x64rcx. Also add the missing fields synthetic_time_unhalted_timer_expired, virtualization_fault_information and intercept_message. Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com> Reviewed-by: <anrayabh@linux.microsoft.com> Link: https://lore.kernel.org/r/1667587123-31645-1-git-send-email-ssengar@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
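As a rough illustration of the layout change described above, the sketch below models only the 24-byte region in user space. The struct names, the `raw` array name, and the `vtl_reserved` padding field are assumptions made for this example; only vtl_entry_reason, vtl_ret_x64rax and vtl_ret_x64rcx come from the commit message, and the field order is a guess rather than the kernel's definition.

```c
#include <assert.h>
#include <stdint.h>

/* Before: the 24-byte area as an opaque array (hypothetical name). */
struct toy_vtl_area_old {
	uint64_t raw[3];
};

/* After: the same 24 bytes with named fields (assumed order). */
struct toy_vtl_area_new {
	uint32_t vtl_entry_reason; /* written by the hypervisor */
	uint32_t vtl_reserved;     /* hypothetical padding, not from the source */
	uint64_t vtl_ret_x64rax;   /* written by the guest before VTL return */
	uint64_t vtl_ret_x64rcx;   /* written by the guest before VTL return */
};

/* Expanding the definition must not change the size of the region. */
_Static_assert(sizeof(struct toy_vtl_area_new) == sizeof(struct toy_vtl_area_old),
	       "expanded layout must keep the original size");

/* Returns 1 if both layouts occupy exactly 24 bytes. */
static int toy_vtl_layouts_match(void)
{
	return sizeof(struct toy_vtl_area_old) == 24 &&
	       sizeof(struct toy_vtl_area_new) == 24;
}
```

The compile-time assertion is the useful part: naming fields inside a hypervisor-shared page is only safe if the overall size and offsets stay unchanged.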
2022-11-28 | x86/resctrl: Move MSR defines into msr-index.h | Borislav Petkov | 5 | -24/+21
msr-index.h should contain all MSRs for easier grepping for MSR numbers when dealing with unchecked MSR access warnings, for example. Move the resctrl ones. Prefix IA32_PQR_ASSOC with "MSR_" while at it. No functional changes. Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20221106212923.20699-1-bp@alien8.de
2022-11-27 | Merge tag 'x86_urgent_for_v6.1_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds | 4 | -30/+42
Pull x86 fixes from Borislav Petkov:

- ioremap: mask out the bits which are not part of the physical address *after* the size computation is done to prevent any hypothetical ioremap failures
- Change the MSR save/restore functionality during suspend to rely on flags denoting that the related MSRs are actually supported vs reading them and assuming they are (an Atom one allows reading but not writing, thus breaking this scheme at resume time)
- prevent IV reuse in the AES-GCM communication scheme between SNP guests and the AMD secure processor

* tag 'x86_urgent_for_v6.1_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/ioremap: Fix page aligned size calculation in __ioremap_caller()
  x86/pm: Add enumeration check before spec MSRs save/restore setup
  x86/tsx: Add a feature bit for TSX control MSR support
  virt/sev-guest: Prevent IV reuse in the SNP guest driver
2022-11-27 | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm | Linus Torvalds | 6 | -42/+64
Pull kvm fixes from Paolo Bonzini:

"x86:

- Fixes for Xen emulation. While nobody should be enabling it in the kernel (the only public users of the feature are the selftests), the bug effectively allows userspace to read arbitrary memory.
- Correctness fixes for nested hypervisors that do not intercept INIT or SHUTDOWN on AMD; the subsequent CPU reset can cause a use-after-free when it disables virtualization extensions. While downgrading the panic to a WARN is quite easy, the full fix is a bit more laborious; there are also tests. This is the bulk of the pull request.
- Fix race condition due to incorrect mmu_lock use around make_mmu_pages_available().

Generic:

- Obey changes to the kvm.halt_poll_ns module parameter in VMs not using KVM_CAP_HALT_POLL, restoring behavior from before the introduction of the capability"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: Update gfn_to_pfn_cache khva when it moves within the same page
  KVM: x86/xen: Only do in-kernel acceleration of hypercalls for guest CPL0
  KVM: x86/xen: Validate port number in SCHEDOP_poll
  KVM: x86/mmu: Fix race condition in direct_page_fault
  KVM: x86: remove exit_int_info warning in svm_handle_exit
  KVM: selftests: add svm part to triple_fault_test
  KVM: x86: allow L1 to not intercept triple fault
  kvm: selftests: add svm nested shutdown test
  KVM: selftests: move idt_entry to header
  KVM: x86: forcibly leave nested mode on vCPU reset
  KVM: x86: add kvm_leave_nested
  KVM: x86: nSVM: harden svm_free_nested against freeing vmcb02 while still in use
  KVM: x86: nSVM: leave nested mode on vCPU free
  KVM: Obey kvm.halt_poll_ns in VMs not using KVM_CAP_HALT_POLL
  KVM: Avoid re-reading kvm->max_halt_poll_ns during halt-polling
  KVM: Cap vcpu->halt_poll_ns before halting rather than after
2022-11-27 | Merge tag 'kbuild-fixes-v6.1-4' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild | Linus Torvalds | 1 | -1/+1
Pull Kbuild fixes from Masahiro Yamada:

- Fix CC_HAS_ASM_GOTO_TIED_OUTPUT test in Kconfig
- Fix noisy "No such file or directory" message when KBUILD_BUILD_VERSION is passed
- Include rust/ in source tarballs
- Fix missing FORCE for ARCH=nios2 builds

* tag 'kbuild-fixes-v6.1-4' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  nios2: add FORCE for vmlinuz.gz
  scripts: add rust in scripts/Makefile.package
  kbuild: fix "cat: .version: No such file or directory"
  init/Kconfig: fix CC_HAS_ASM_GOTO_TIED_OUTPUT test with dash
2022-11-25 | Merge tag 'hyperv-fixes-signed-20221125' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux | Linus Torvalds | 1 | -28/+26
Pull hyperv fixes from Wei Liu:

- Fix IRTE allocation in Hyper-V PCI controller (Dexuan Cui)
- Fix handling of SCSI srb_status and capacity change events (Michael Kelley)
- Restore VP assist page after CPU offlining and onlining (Vitaly Kuznetsov)
- Fix some memory leak issues in VMBus (Yang Yingliang)

* tag 'hyperv-fixes-signed-20221125' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
  Drivers: hv: vmbus: fix possible memory leak in vmbus_device_register()
  Drivers: hv: vmbus: fix double free in the error path of vmbus_add_channel_work()
  PCI: hv: Only reuse existing IRTE allocation for Multi-MSI
  scsi: storvsc: Fix handling of srb_status and capacity change events
  x86/hyperv: Restore VP assist page after cpu offlining/onlining
2022-11-25 | use less confusing names for iov_iter direction initializers | Al Viro | 2 | -2/+2
READ/WRITE proved to be actively confusing - the meanings are "data destination, as used with read(2)" and "data source, as used with write(2)", but people keep interpreting those as "we read data from it" and "we write data to it", i.e. exactly the wrong way. Call them ITER_DEST and ITER_SOURCE - at least that is harder to misinterpret... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-11-25 | [elf][non-regset] uninline elf_core_copy_task_fpregs() (and lose pt_regs argument) | Al Viro | 1 | -4/+0
Don't bother with pointless macros - we are not sharing it with aout coredumps anymore. Just convert the underlying functions to the same arguments (nobody uses regs, actually) and call them elf_core_copy_task_fpregs(). And unexport the entire bunch, while we are at it.

[added missing includes in arch/{csky,m68k,um}/kernel/process.c to avoid extra warnings about the lack of externs getting added to huge piles for those files. Pointless, but...]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-11-24 | perf/x86/intel/uncore: Fix reference count leak in __uncore_imc_init_box() | Xiongfeng Wang | 1 | -0/+3
pci_get_device() will increase the reference count for the returned pci_dev, so tgl_uncore_get_mc_dev() will return a pci_dev with its reference count increased. We need to call pci_dev_put() to decrease the reference count before exiting from __uncore_imc_init_box(). Add pci_dev_put() for both normal and error path. Fixes: fdb64822443e ("perf/x86: Add Intel Tiger Lake uncore support") Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20221118063137.121512-5-wangxiongfeng2@huawei.com
2022-11-24 | perf/x86/intel/uncore: Fix reference count leak in snr_uncore_mmio_map() | Xiongfeng Wang | 1 | -0/+2
pci_get_device() will increase the reference count for the returned pci_dev, so snr_uncore_get_mc_dev() will return a pci_dev with its reference count increased. We need to call pci_dev_put() to decrease the reference count. Let's add the missing pci_dev_put(). Fixes: ee49532b38dd ("perf/x86/intel/uncore: Add IMC uncore support for Snow Ridge") Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20221118063137.121512-4-wangxiongfeng2@huawei.com
2022-11-24 | perf/x86/intel/uncore: Fix reference count leak in hswep_has_limit_sbox() | Xiongfeng Wang | 1 | -0/+1
pci_get_device() will increase the reference count for the returned 'dev'. We need to call pci_dev_put() to decrease the reference count. Since 'dev' is only used in pci_read_config_dword(), let's add pci_dev_put() right after it. Fixes: 9d480158ee86 ("perf/x86/intel/uncore: Remove uncore extra PCI dev HSWEP_PCI_PCU_3") Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20221118063137.121512-3-wangxiongfeng2@huawei.com
2022-11-24 | perf/x86/intel/uncore: Fix reference count leak in sad_cfg_iio_topology() | Xiongfeng Wang | 1 | -0/+2
pci_get_device() will increase the reference count for the returned pci_dev, and also decrease the reference count for the input parameter *from* if it is not NULL. If we break the loop in sad_cfg_iio_topology() with 'dev' not NULL. We need to call pci_dev_put() to decrease the reference count. Since pci_dev_put() can handle the NULL input parameter, we can just add one pci_dev_put() right before 'return ret'. Fixes: c1777be3646b ("perf/x86/intel/uncore: Enable I/O stacks to IIO PMON mapping on SNR") Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20221118063137.121512-2-wangxiongfeng2@huawei.com
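The iteration contract described above (take a reference on the returned device, drop the one held on *from*) can be modelled in user space. The sketch below is a toy model only: `toy_get_device()`, `toy_dev_put()`, and the three-entry `toy_bus` are hypothetical stand-ins invented for this example, not the kernel PCI API.

```c
#include <assert.h>
#include <stddef.h>

struct toy_dev {
	int id;
	int refcount;
};

static struct toy_dev toy_bus[] = { { 0, 0 }, { 1, 0 }, { 2, 0 } };
#define TOY_BUS_LEN (sizeof(toy_bus) / sizeof(toy_bus[0]))

/* Drop one reference; accepts NULL, like the put helper in the commit text. */
static void toy_dev_put(struct toy_dev *dev)
{
	if (dev)
		dev->refcount--;
}

/*
 * Return the device after 'from' (or the first one if 'from' is NULL),
 * taking a reference on the result and dropping the reference on 'from'.
 */
static struct toy_dev *toy_get_device(struct toy_dev *from)
{
	size_t next = from ? (size_t)(from - toy_bus) + 1 : 0;

	toy_dev_put(from);
	if (next >= TOY_BUS_LEN)
		return NULL;
	toy_bus[next].refcount++;
	return &toy_bus[next];
}

/* Search with an early break; the single put before return balances it. */
static int toy_find(int wanted_id)
{
	struct toy_dev *dev = NULL;
	int ret = -1;

	while ((dev = toy_get_device(dev)) != NULL) {
		if (dev->id == wanted_id) {
			ret = 0;
			break; /* 'dev' still holds a reference here */
		}
	}
	toy_dev_put(dev); /* no-op when the loop ran to completion */
	return ret;
}

/* Sum of all outstanding references; 0 means nothing leaked. */
static int toy_total_refs(void)
{
	int total = 0;

	for (size_t i = 0; i < TOY_BUS_LEN; i++)
		total += toy_bus[i].refcount;
	return total;
}
```

Because the put helper tolerates NULL, one call before the return covers both the early-break path and the path where the loop runs off the end of the bus.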
2022-11-24 | perf/x86/intel/uncore: Make set_mapping() procedure void | Alexander Antonov | 2 | -23/+20
The return value of set_mapping() no longer needs to be checked, so make this procedure void. Signed-off-by: Alexander Antonov <alexander.antonov@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20221117122833.3103580-12-alexander.antonov@linux.intel.com
2022-11-24 | perf/x86/intel/uncore: Enable UPI topology discovery for Sapphire Rapids | Alexander Antonov | 1 | -1/+42
UPI topology discovery on SPR is same as in ICX but UBOX device has different Device ID 0x3250. This patch enables /sys/devices/uncore_upi_*/die* attributes on SPR. Signed-off-by: Alexander Antonov <alexander.antonov@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20221117122833.3103580-10-alexander.antonov@linux.intel.com
2022-11-24 | perf/x86/intel/uncore: Enable UPI topology discovery for Icelake Server | Alexander Antonov | 1 | -0/+75
UPI topology discovery relies on data from KTILP0 (offset 0x94) and KTIPCSTS (offset 0x120) as well as on SKX but on Icelake Server these registers reside under UBOX (Device ID 0x3450) bus. This patch enables /sys/devices/uncore_upi_*/die* attributes on Icelake Server. Signed-off-by: Alexander Antonov <alexander.antonov@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20221117122833.3103580-9-alexander.antonov@linux.intel.com
2022-11-24 | perf/x86/intel/uncore: Get UPI NodeID and GroupID | Alexander Antonov | 1 | -8/+25
The GIDNIDMAP register of UBOX device is used to get the topology information in the snbep_pci2phy_map_init(). The same approach will be used to discover UPI topology for ICX and SPR platforms. Move common code that will be reused in next patches. Signed-off-by: Alexander Antonov <alexander.antonov@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20221117122833.3103580-8-alexander.antonov@linux.intel.com
2022-11-24 | perf/x86/intel/uncore: Enable UPI topology discovery for Skylake Server | Alexander Antonov | 1 | -0/+130
UPI topology discovery relies on data from the KTILP0 (offset 0x94) and KTIPCSTS (offset 0x120) registers, which reside under IIO bus(3) on SKX/CLX. This patch enables UPI topology discovery on Skylake Server. Topology is exposed through the attributes /sys/devices/uncore_upi_<pmu_idx>/dieX, where dieX is a file which holds the "upi_<idx1>:die_<idx2>" connected to this UPI link. Signed-off-by: Alexander Antonov <alexander.antonov@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20221117122833.3103580-7-alexander.antonov@linux.intel.com
2022-11-24 | perf/x86/intel/uncore: Generalize get_topology() for SKX PMUs | Alexander Antonov | 1 | -10/+28
Factor out a generic code from skx_iio_get_topology() to skx_pmu_get_topology() to avoid code duplication. This code will be used by get_topology() procedure for SKX UPI PMUs in the further patch. Signed-off-by: Alexander Antonov <alexander.antonov@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20221117122833.3103580-6-alexander.antonov@linux.intel.com
2022-11-24 | perf/x86/intel/uncore: Disable I/O stacks to PMU mapping on ICX-D | Alexander Antonov | 2 | -0/+6
Current implementation of I/O stacks to PMU mapping doesn't support ICX-D. Detect ICX-D system to disable mapping. Fixes: 10337e95e04c ("perf/x86/intel/uncore: Enable I/O stacks to IIO PMON mapping on ICX") Signed-off-by: Alexander Antonov <alexander.antonov@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20221117122833.3103580-5-alexander.antonov@linux.intel.com
2022-11-24 | perf/x86/intel/uncore: Clear attr_update properly | Alexander Antonov | 1 | -1/+16
The current clear_attr_update procedure in pmu_set_mapping() sets the attr_update field to NULL, which is not correct because intel_uncore_type pmu types can contain several groups in the attr_update field. For example, the SPR platform already has uncore_alias_group to update, and the UPI topology group will be added in the next patches. Fix the current behavior and clear only the attr_update group related to mapping. Fixes: bb42b3d39781 ("perf/x86/intel/uncore: Expose an Uncore unit to IIO PMON mapping") Signed-off-by: Alexander Antonov <alexander.antonov@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20221117122833.3103580-4-alexander.antonov@linux.intel.com
2022-11-24 | perf/x86/intel/uncore: Introduce UPI topology type | Alexander Antonov | 2 | -1/+10
This patch introduces new 'uncore_upi_topology' topology type to support UPI topology discovery. Signed-off-by: Alexander Antonov <alexander.antonov@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20221117122833.3103580-3-alexander.antonov@linux.intel.com
2022-11-24 | perf/x86/intel/uncore: Generalize IIO topology support | Alexander Antonov | 2 | -44/+122
The current implementation of uncore mapping doesn't support different types of uncore PMUs which have their own topology context. This patch generalizes the Intel uncore topology implementation to make it easy to introduce support for new uncore blocks. Signed-off-by: Alexander Antonov <alexander.antonov@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20221117122833.3103580-2-alexander.antonov@linux.intel.com
2022-11-24 | perf/amd/ibs: Make IBS a core pmu | Ravi Bangoria | 1 | -2/+2
So far, only one pmu was allowed to be registered as core pmu and thus IBS pmus were being registered as uncore. However, with the event context rewrite, that limitation no longer exists and thus IBS pmus can also be registered as core pmu. This makes IBS much more usable; for example, a user will be able to do per-process precise monitoring on AMD:

Before patch:
  $ sudo perf record -e cycles:pp ls
  Error: Invalid event (cycles:pp) in per-thread mode, enable system wide with '-a'

After patch:
  $ sudo perf record -e cycles:pp ls
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.017 MB perf.data (33 samples) ]

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Ian Rogers <irogers@google.com>
Link: https://lkml.kernel.org/r/20221115093904.1799-1-ravi.bangoria@amd.com
2022-11-24 | perf/x86/amd: Remove the repeated declaration | Shaokun Zhang | 1 | -1/+0
The function 'amd_brs_disable_all' is declared twice in commit ada543459cab ("perf/x86/amd: Add AMD Fam19h Branch Sampling support"). Remove one of them. Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20221108104117.46642-1-zhangshaokun@hisilicon.com
2022-11-24 | kbuild: fix "cat: .version: No such file or directory" | Masahiro Yamada | 1 | -1/+1
Since commit 2df8220cc511 ("kbuild: build init/built-in.a just once"), the .version file is not touched at all when KBUILD_BUILD_VERSION is given. If KBUILD_BUILD_VERSION is specified and the .version file is missing (for example right after 'make mrproper'), "No such file or directory" is shown. Even if the .version file exists, it is irrelevant to the version of the current build.

  $ make -j$(nproc) KBUILD_BUILD_VERSION=100 mrproper defconfig all
    [ snip ]
    BUILD arch/x86/boot/bzImage
  cat: .version: No such file or directory
  Kernel: arch/x86/boot/bzImage is ready (#)

Show KBUILD_BUILD_VERSION if it is given.

Fixes: 2df8220cc511 ("kbuild: build init/built-in.a just once")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
2022-11-24 | Merge branch 'kvm-dwmw2-fixes' into HEAD | Paolo Bonzini | 1 | -9/+23
This brings in a few important fixes for Xen emulation. While nobody should be enabling it, the bug effectively allows userspace to read arbitrary memory. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-24 | KVM: x86/xen: Only do in-kernel acceleration of hypercalls for guest CPL0 | David Woodhouse | 1 | -1/+11
There are almost no hypercalls which are valid from CPL > 0, and definitely none which are handled by the kernel. Fixes: 2fd6df2f2b47 ("KVM: x86/xen: intercept EVTCHNOP_send from guests") Reported-by: Michal Luczaj <mhal@rbox.co> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Sean Christopherson <seanjc@google.com> Cc: stable@kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-24 | KVM: x86/xen: Validate port number in SCHEDOP_poll | David Woodhouse | 1 | -8/+12
We shouldn't allow guests to poll on arbitrary port numbers off the end of the event channel table. Fixes: 1a65105a5aba ("KVM: x86/xen: handle PV spinlocks slowpath") [dwmw2: my bug though; the original version did check the validity as a side-effect of an idr_find() which I ripped out in refactoring.] Reported-by: Michal Luczaj <mhal@rbox.co> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Sean Christopherson <seanjc@google.com> Cc: stable@kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-24 | KVM: x86/mmu: Fix race condition in direct_page_fault | Kazuki Takiguchi | 1 | -6/+7
make_mmu_pages_available() must be called with mmu_lock held for write. However, if the TDP MMU is used, it will be called with mmu_lock held for read. This function does nothing unless shadow pages are used, so there is no race unless nested TDP is used. Since nested TDP uses shadow pages, old shadow pages may be zapped by this function even when the TDP MMU is enabled. Since shadow pages are never allocated by kvm_tdp_mmu_map(), a race condition can be avoided by not calling make_mmu_pages_available() if the TDP MMU is currently in use. I encountered this when repeatedly starting and stopping nested VM. It can be artificially caused by allocating a large number of nested TDP SPTEs. For example, the following BUG and general protection fault are caused in the host kernel. pte_list_remove: 00000000cd54fc10 many->many ------------[ cut here ]------------ kernel BUG at arch/x86/kvm/mmu/mmu.c:963! invalid opcode: 0000 [#1] PREEMPT SMP NOPTI RIP: 0010:pte_list_remove.cold+0x16/0x48 [kvm] Call Trace: <TASK> drop_spte+0xe0/0x180 [kvm] mmu_page_zap_pte+0x4f/0x140 [kvm] __kvm_mmu_prepare_zap_page+0x62/0x3e0 [kvm] kvm_mmu_zap_oldest_mmu_pages+0x7d/0xf0 [kvm] direct_page_fault+0x3cb/0x9b0 [kvm] kvm_tdp_page_fault+0x2c/0xa0 [kvm] kvm_mmu_page_fault+0x207/0x930 [kvm] npf_interception+0x47/0xb0 [kvm_amd] svm_invoke_exit_handler+0x13c/0x1a0 [kvm_amd] svm_handle_exit+0xfc/0x2c0 [kvm_amd] kvm_arch_vcpu_ioctl_run+0xa79/0x1780 [kvm] kvm_vcpu_ioctl+0x29b/0x6f0 [kvm] __x64_sys_ioctl+0x95/0xd0 do_syscall_64+0x5c/0x90 general protection fault, probably for non-canonical address 0xdead000000000122: 0000 [#1] PREEMPT SMP NOPTI RIP: 0010:kvm_mmu_commit_zap_page.part.0+0x4b/0xe0 [kvm] Call Trace: <TASK> kvm_mmu_zap_oldest_mmu_pages+0xae/0xf0 [kvm] direct_page_fault+0x3cb/0x9b0 [kvm] kvm_tdp_page_fault+0x2c/0xa0 [kvm] kvm_mmu_page_fault+0x207/0x930 [kvm] npf_interception+0x47/0xb0 [kvm_amd] CVE: CVE-2022-45869 Fixes: a2855afc7ee8 ("KVM: x86/mmu: Allow parallel page faults for the TDP MMU") 
Signed-off-by: Kazuki Takiguchi <takiguchi.kazuki171@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-23 | ACPI: make remove callback of ACPI driver void | Dawei Li | 1 | -2/+1
For bus-based driver, device removal is implemented as: 1 device_remove()-> 2 bus->remove()-> 3 driver->remove() Driver core needs no inform from callee(bus driver) about the result of remove callback. In that case, commit fc7a6209d571 ("bus: Make remove callback return void") forces bus_type::remove be void-returned. Now we have the situation that both 1 & 2 of calling chain are void-returned, so it does not make much sense for 3(driver->remove) to return non-void to its caller. So the basic idea behind this change is making remove() callback of any bus-based driver to be void-returned. This change, for itself, is for device drivers based on acpi-bus. Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Acked-by: Lee Jones <lee@kernel.org> Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Dawei Li <set_pte_at@outlook.com> Reviewed-by: Maximilian Luz <luzmaximilian@gmail.com> # for drivers/platform/surface/* Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-11-22 | x86/fpu: Use _Alignof to avoid undefined behavior in TYPE_ALIGN | YingChi Long | 1 | -5/+2
WG14 N2350 specifies that it is undefined behavior to have type definitions within offsetof; see https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2350.htm. This specification is also part of C23. Therefore, replace the TYPE_ALIGN macro with the _Alignof builtin to avoid undefined behavior. (_Alignof itself is C11 and the kernel is built with -std=gnu11.) ISO C11 _Alignof is subtly different from the GNU C extension __alignof__: the latter is the preferred alignment and _Alignof the minimal alignment. For long long on x86 these are 8 and 4 respectively. The macro TYPE_ALIGN's behavior matches _Alignof rather than __alignof__. [ bp: Massage commit message. ] Signed-off-by: YingChi Long <me@inclyc.cn> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Link: https://lore.kernel.org/r/20220925153151.2467884-1-me@inclyc.cn
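To make the alignment comparison concrete, here is a small user-space sketch. It deliberately declares the probe structs outside offsetof(), so it does not reproduce the construct that N2350 flags; the struct and function names are invented for this example, and `__alignof__` assumes a GCC- or Clang-compatible compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Probe structs declared up front, so offsetof() only names existing types. */
struct probe_int    { char x; int test; };
struct probe_ll     { char x; long long test; };
struct probe_double { char x; double test; };

/* Returns 1 if the offsetof()-based probe agrees with _Alignof for each type. */
static int offsetof_probe_matches_alignof(void)
{
	return offsetof(struct probe_int, test) == _Alignof(int) &&
	       offsetof(struct probe_ll, test) == _Alignof(long long) &&
	       offsetof(struct probe_double, test) == _Alignof(double);
}

/* Returns 1 if the GNU preferred alignment is at least the minimal one. */
static int preferred_is_at_least_minimal(void)
{
	return __alignof__(long long) >= _Alignof(long long);
}
```

On a 64-bit build the two alignments of long long normally coincide, so the difference the commit message mentions is expected to show up only on 32-bit x86 targets.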
2022-11-22 | x86/alternative: Consistently patch SMP locks in vmlinux and modules | Julian Pidancet | 1 | -6/+5
alternatives_smp_module_add() restricts patching of SMP lock prefixes to the text address range passed as an argument. For vmlinux, patching all the instructions located between the _text and _etext symbols is allowed. That includes the .text section but also other sections such as .text.hot and .text.unlikely. As per the comment inside the 'struct smp_alt_module' definition, the original purpose of this restriction is to avoid patching the init code because in the case when one boots with a single CPU, the LOCK prefixes to the locking primitives are removed. Later on, when other CPUs are onlined, those LOCK prefixes get added back in but by that time the .init code is very likely removed so patching that would be a bad idea. For modules, the current code only allows patching instructions located inside the .text segment, excluding other sections such as .text.hot or .text.unlikely, which may need patching. Make patching of the kernel core and modules more consistent by allowing all text sections of modules except .init.text to be patched in module_finalize(). For that, use mod->core_layout.base/mod->core_layout.text_size as the address range allowed to be patched, which include all the code sections except the init code. [ bp: Massage and expand commit message. ] Signed-off-by: Julian Pidancet <julian.pidancet@oracle.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20221027204906.511277-1-julian.pidancet@oracle.com
2022-11-22 | x86/ioremap: Fix page aligned size calculation in __ioremap_caller() | Michael Kelley | 1 | -1/+7
Current code re-calculates the size after aligning the starting and ending physical addresses on a page boundary. But the re-calculation also embeds the masking of high order bits that exceed the size of the physical address space (via PHYSICAL_PAGE_MASK). If the masking removes any high order bits, the size calculation results in a huge value that is likely to immediately fail. Fix this by re-calculating the page-aligned size first. Then mask any high order bits using PHYSICAL_PAGE_MASK. Fixes: ffa71f33a820 ("x86, ioremap: Fix incorrect physical address handling in PAE mode") Signed-off-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: <stable@kernel.org> Link: https://lore.kernel.org/r/1668624097-14884-2-git-send-email-mikelley@microsoft.com
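The ordering problem can be reproduced with plain integer arithmetic. The following is a user-space sketch, not the kernel code: the `SK_`-prefixed macros mirror the kernel names but are defined here for illustration, and the 46-bit physical address width is an arbitrary assumption.

```c
#include <assert.h>
#include <stdint.h>

#define SK_PAGE_SIZE ((uint64_t)1 << 12)
#define SK_PAGE_MASK (~(SK_PAGE_SIZE - 1))
#define SK_PAGE_ALIGN(x) (((x) + SK_PAGE_SIZE - 1) & SK_PAGE_MASK)
/* Assumption: 46 physical address bits; higher bits are not address bits. */
#define SK_PHYSICAL_PAGE_MASK (SK_PAGE_MASK & (((uint64_t)1 << 46) - 1))

/* Old order: strip the high bits first, then compute the size. */
static uint64_t sk_size_old(uint64_t phys_addr, uint64_t len)
{
	uint64_t last_addr = phys_addr + len - 1;

	phys_addr &= SK_PHYSICAL_PAGE_MASK;
	return SK_PAGE_ALIGN(last_addr + 1) - phys_addr;
}

/* New order: page-align and compute the size, mask high bits afterwards. */
static uint64_t sk_size_new(uint64_t phys_addr, uint64_t len)
{
	uint64_t last_addr = phys_addr + len - 1;
	uint64_t size;

	phys_addr &= SK_PAGE_MASK;
	size = SK_PAGE_ALIGN(last_addr + 1) - phys_addr;
	phys_addr &= SK_PHYSICAL_PAGE_MASK; /* would be used for the mapping */
	(void)phys_addr;
	return size;
}
```

With an address that carries a bit above the assumed physical width, the old order subtracts a masked start from an unmasked end and yields an enormous size, while the new order returns a single page.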
2022-11-21 | x86/pm: Add enumeration check before spec MSRs save/restore setup | Pawan Gupta | 1 | -8/+15
pm_save_spec_msr() keeps a list of all the MSRs which _might_ need to be saved and restored at hibernate and resume. However, it has zero awareness of CPU support for these MSRs. It mostly works by unconditionally attempting to manipulate these MSRs and relying on rdmsrl_safe() being able to handle a #GP on CPUs where the support is unavailable. However, it's possible for reads (RDMSR) to be supported for a given MSR while writes (WRMSR) are not. In this case, msr_build_context() sees a successful read (RDMSR) and marks the MSR as valid. Then, later, a write (WRMSR) fails, producing a nasty (but harmless) error message. This causes restore_processor_state() to try and restore it, but writing this MSR is not allowed on the Intel Atom N2600 leading to: unchecked MSR access error: WRMSR to 0x122 (tried to write 0x0000000000000002) \ at rIP: 0xffffffff8b07a574 (native_write_msr+0x4/0x20) Call Trace: <TASK> restore_processor_state x86_acpi_suspend_lowlevel acpi_suspend_enter suspend_devices_and_enter pm_suspend.cold state_store kernfs_fop_write_iter vfs_write ksys_write do_syscall_64 ? do_syscall_64 ? up_read ? lock_is_held_type ? asm_exc_page_fault ? lockdep_hardirqs_on entry_SYSCALL_64_after_hwframe To fix this, add the corresponding X86_FEATURE bit for each MSR. Avoid trying to manipulate the MSR when the feature bit is clear. This required adding a X86_FEATURE bit for MSRs that do not have one already, but it's a small price to pay. [ bp: Move struct msr_enumeration inside the only function that uses it. ] Fixes: 73924ec4d560 ("x86/pm: Save the MSR validity status at context setup") Reported-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: <stable@kernel.org> Link: https://lore.kernel.org/r/c24db75d69df6e66c0465e13676ad3f2837a2ed8.1668539735.git.pawan.kumar.gupta@linux.intel.com
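The enumeration check in the x86/pm commit above amounts to pairing each MSR with a feature bit and skipping the MSR when the bit is clear. A minimal user-space sketch follows; the `demo_*` names, the feature enum, and the table contents are assumptions for illustration (only MSR number 0x122 appears in the commit message), and this is not the kernel's actual table.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature identifiers; the kernel uses X86_FEATURE_* bits. */
enum demo_feature {
	DEMO_FEAT_SPEC_CTRL,
	DEMO_FEAT_TSX_CTRL,
	DEMO_FEAT_COUNT
};

/* Each candidate MSR is paired with the feature that enumerates it. */
struct demo_msr_enumeration {
	uint32_t msr_no;
	enum demo_feature feature;
};

static const struct demo_msr_enumeration demo_msr_enum[] = {
	{ 0x48,  DEMO_FEAT_SPEC_CTRL }, /* assumed MSR number */
	{ 0x122, DEMO_FEAT_TSX_CTRL },  /* the MSR named in the error message */
};

/* Copy into 'out' only those MSRs whose feature is present on this CPU.
 * 'cpu_has' is indexed by enum demo_feature. Returns the number stored. */
size_t demo_collect_msrs(const bool *cpu_has, uint32_t *out, size_t out_len)
{
	size_t n = 0;

	for (size_t i = 0; i < sizeof(demo_msr_enum) / sizeof(demo_msr_enum[0]); i++) {
		if (!cpu_has[demo_msr_enum[i].feature])
			continue; /* never touch an MSR that is not enumerated */
		if (n < out_len)
			out[n++] = demo_msr_enum[i].msr_no;
	}
	return n;
}
```

On a CPU that lacks the TSX control feature, 0x122 never enters the save/restore list, so no write to it is attempted on resume.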
2022-11-21x86/tsx: Add a feature bit for TSX control MSR supportPawan Gupta2-21/+20
Support for the TSX control MSR is enumerated in MSR_IA32_ARCH_CAPABILITIES. This is different from how other CPU features are enumerated i.e. via CPUID. Currently, a call to tsx_ctrl_is_supported() is required for enumerating the feature. In the absence of a feature bit for TSX control, any code that relies on checking feature bits directly will not work. In preparation for adding a feature bit check in MSR save/restore during suspend/resume, set a new feature bit X86_FEATURE_TSX_CTRL when MSR_IA32_TSX_CTRL is present. Also make tsx_ctrl_is_supported() use the new feature bit to avoid any overhead of reading the MSR. [ bp: Remove tsx_ctrl_is_supported(), add room for two more feature bits in word 11 which are coming up in the next merge window. ] Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: <stable@kernel.org> Link: https://lore.kernel.org/r/de619764e1d98afbb7a5fa58424f1278ede37b45.1668539735.git.pawan.kumar.gupta@linux.intel.com
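The idea of turning an MSR-enumerated capability into an ordinary feature bit can be sketched as below. This is an illustration only: the `demo_*` names and the struct are hypothetical, and the position of the TSX control bit in the capabilities value (bit 7) is stated here as an assumption rather than taken from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: bit 7 of the capabilities MSR value reports TSX control. */
#define DEMO_ARCH_CAP_TSX_CTRL_MSR (UINT64_C(1) << 7)
/* Hypothetical index of the synthetic feature bit. */
#define DEMO_FEATURE_TSX_CTRL 0u

struct demo_cpu {
	uint64_t arch_capabilities; /* value read once from the MSR at boot */
	uint32_t feature_bits;      /* cached feature word */
};

/* Run once during CPU setup: translate the MSR bit into a feature bit. */
void demo_tsx_init(struct demo_cpu *c)
{
	if (c->arch_capabilities & DEMO_ARCH_CAP_TSX_CTRL_MSR)
		c->feature_bits |= UINT32_C(1) << DEMO_FEATURE_TSX_CTRL;
}

/* Later callers test the cached bit instead of re-reading the MSR. */
bool demo_cpu_has(const struct demo_cpu *c, unsigned int bit)
{
	return (c->feature_bits >> bit) & 1u;
}
```

Code that only knows how to check feature bits, such as a table-driven MSR save/restore path, can then cover TSX control without a special case.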
2022-11-20Merge tag 'x86_urgent_for_v6.1_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds2-1/+4
Pull x86 fixes from Borislav Petkov: - Do not hold fpregs lock when inheriting FPU permissions because the fpregs lock disables preemption on RT but fpu_inherit_perms() does spin_lock_irq(), which, on RT, uses rtmutexes and they need to be preemptible. - Check the page offset and the length of the data supplied by userspace for overflow when specifying a set of pages to add to an SGX enclave * tag 'x86_urgent_for_v6.1_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/fpu: Drop fpregs lock before inheriting FPU permissions x86/sgx: Add overflow check in sgx_validate_offset_length()
2022-11-20Merge tag 'perf_urgent_for_v6.1_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds3-3/+12
Pull perf fixes from Borislav Petkov: - Fix an Intel PT erratum where CPUs do not support single range output for more than 4K - Fix a NULL ptr dereference which can happen after an NMI interferes with the event enabling dance in amd_pmu_enable_all() - Free the events array too when freeing uncore contexts on CPU online, thereby fixing a memory leak - Improve the pending SIGTRAP check * tag 'perf_urgent_for_v6.1_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel/pt: Fix sampling using single range output perf/x86/amd: Fix crash due to race between amd_pmu_enable_all, perf NMI and throttling perf/x86/amd/uncore: Fix memory leak for events array perf: Improve missing SIGTRAP checking
2022-11-20Merge tag 'locking_urgent_for_v6.1_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-1/+1
Pull locking fix from Borislav Petkov: - Fix a build error with clang 11 * tag 'locking_urgent_for_v6.1_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking: Fix qspinlock/x86 inline asm error
2022-11-19x86/kaslr: Fix process_mem_region()'s return valueJiapeng Chong1-1/+1
Fix the following coccicheck warning: ./arch/x86/boot/compressed/kaslr.c:670:8-9: WARNING: return of 0/1 in function 'process_mem_region' with return type bool. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220421202556.129799-1-jiapeng.chong@linux.alibaba.com
2022-11-19perf/x86: Make struct p4_event_bind::cntr signed arrayAlexey Dobriyan1-1/+1
struct p4_event_bind::cntr[][] should be signed because of the following code: int i, j; for (i = 0; i < P4_CNTR_LIMIT; i++) { ---> j = bind->cntr[thread][i]; if (j != -1 && !test_bit(j, used_mask)) return j; } Making this member unsigned will make "j" 255 and fail "j != -1" comparison. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
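The signedness issue can be shown outside the kernel. In this minimal sketch the `demo_*` names are hypothetical and the `test_bit()` check against `used_mask` is left out to keep the example short; the loop is otherwise the same shape as the one quoted above.

```c
#define DEMO_CNTR_LIMIT 2

/* Signed element type: a stored -1 reads back as -1 and is skipped. */
int demo_pick_signed(const signed char cntr[DEMO_CNTR_LIMIT])
{
	for (int i = 0; i < DEMO_CNTR_LIMIT; i++) {
		int j = cntr[i];

		if (j != -1)
			return j;
	}
	return -1;
}

/* Unsigned element type: a stored -1 reads back as 255, so the
 * "no counter" marker is mistaken for a valid counter index. */
int demo_pick_unsigned(const unsigned char cntr[DEMO_CNTR_LIMIT])
{
	for (int i = 0; i < DEMO_CNTR_LIMIT; i++) {
		int j = cntr[i];

		if (j != -1)
			return j;
	}
	return -1;
}
```

Whether plain `char` is signed is implementation-defined in C, which is why spelling the element type as `signed char` makes the sentinel comparison reliable regardless of compiler flags.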
2022-11-18ftrace: abstract DYNAMIC_FTRACE_WITH_ARGS accessesMark Rutland1-0/+16
In subsequent patches we'll arrange for architectures to have an ftrace_regs which is entirely distinct from pt_regs. In preparation for this, we need to minimize the use of pt_regs to where strictly necessary in the core ftrace code. This patch adds new ftrace_regs_{get,set}_*() helpers which can be used to manipulate ftrace_regs. When CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=y, these can always be used on any ftrace_regs, and when CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=n these can be used when regs are available. A new ftrace_regs_has_args(fregs) helper is added which code can use to check when these are usable. Co-developed-by: Florent Revest <revest@chromium.org> Signed-off-by: Florent Revest <revest@chromium.org> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20221103170520.931305-4-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-11-18ftrace: rename ftrace_instruction_pointer_set() -> ftrace_regs_set_instruction_pointer()Mark Rutland1-1/+1
In subsequent patches we'll add a set of ftrace_regs_{get,set}_*() helpers. In preparation, this patch renames ftrace_instruction_pointer_set() to ftrace_regs_set_instruction_pointer(). There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Florent Revest <revest@chromium.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20221103170520.931305-3-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-11-18ftrace: pass fregs to arch_ftrace_set_direct_caller()Mark Rutland1-13/+18
In subsequent patches we'll arrange for architectures to have an ftrace_regs which is entirely distinct from pt_regs. In preparation for this, we need to minimize the use of pt_regs to where strictly necessary in the core ftrace code. This patch changes the prototype of arch_ftrace_set_direct_caller() to take ftrace_regs rather than pt_regs, and moves the extraction of the pt_regs into arch_ftrace_set_direct_caller(). On x86, arch_ftrace_set_direct_caller() can be used even when CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=n, and <linux/ftrace.h> defines struct ftrace_regs. Due to this, it's necessary to define arch_ftrace_set_direct_caller() as a macro to avoid using an incomplete type. I've also moved the body of arch_ftrace_set_direct_caller() after the CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=y definition of struct ftrace_regs. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Florent Revest <revest@chromium.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20221103170520.931305-2-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-11-18stackprotector: actually use get_random_canary()Jason A. Donenfeld1-13/+1
The RNG always mixes in the Linux version extremely early in boot. It also always includes a cycle counter, not only during early boot, but each and every time it is invoked prior to being fully initialized. Together, this means that the use of additional xors inside of the various stackprotector.h files is superfluous and over-complicated. Instead, we can get exactly the same thing, but better, by just calling `get_random_canary()`. Acked-by: Guo Ren <guoren@kernel.org> # for csky Acked-by: Catalin Marinas <catalin.marinas@arm.com> # for arm64 Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-11-18stackprotector: move get_random_canary() into stackprotector.hJason A. Donenfeld4-3/+4
This has nothing to do with random.c and everything to do with stack protectors. Yes, it uses randomness. But many things use randomness. random.h and random.c are concerned with the generation of randomness, not with each and every use. So move this function into the more specific stackprotector.h file where it belongs. Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-11-18treewide: use get_random_u32_inclusive() when possibleJason A. Donenfeld1-1/+1
These cases were done with this Coccinelle: @@ expression H; expression L; @@ - (get_random_u32_below(H) + L) + get_random_u32_inclusive(L, H + L - 1) @@ expression H; expression L; expression E; @@ get_random_u32_inclusive(L, H - + E - - E ) @@ expression H; expression L; expression E; @@ get_random_u32_inclusive(L, H - - E - + E ) @@ expression H; expression L; expression E; expression F; @@ get_random_u32_inclusive(L, H - - E + F - + E ) @@ expression H; expression L; expression E; expression F; @@ get_random_u32_inclusive(L, H - + E + F - - E ) And then subsequently cleaned up by hand, with several automatic cases rejected if it didn't make sense contextually. Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> # for infiniband Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
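The first rule of the semantic patch rewrites `get_random_u32_below(H) + L` as `get_random_u32_inclusive(L, H + L - 1)`. The arithmetic behind that can be checked with deterministic stand-ins. In this sketch the `demo_*` functions are hypothetical: they take the raw random value as a parameter and use a plain modulo, which is not how the kernel reduces its random numbers; only the range relationship is being illustrated.

```c
#include <stdint.h>

/* Stand-in for a bounded draw: maps raw value r into [0, ceil). */
uint32_t demo_below(uint32_t r, uint32_t ceil)
{
	return r % ceil;
}

/* Stand-in for an inclusive draw: maps raw value r into [floor, ceil]. */
uint32_t demo_inclusive(uint32_t r, uint32_t floor, uint32_t ceil)
{
	return floor + demo_below(r, ceil - floor + 1);
}
```

For any raw value `r`, `demo_below(r, H) + L` equals `demo_inclusive(r, L, H + L - 1)`, because the inclusive range `[L, H + L - 1]` contains exactly `H` values. The later rules in the script only simplify the resulting upper bound when a term is added and then subtracted again.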