summaryrefslogtreecommitdiff
path: root/arch/x86/include/asm
AgeCommit message (Collapse)AuthorFilesLines
2022-12-13Merge tag 'pull-elfcore' of ↵Linus Torvalds1-1/+0
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull elf coredumping updates from Al Viro: "Unification of regset and non-regset sides of ELF coredump handling. Collecting per-thread register values is the only thing that needs to be ifdefed there..." * tag 'pull-elfcore' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: [elf] get rid of get_note_info_size() [elf] unify regset and non-regset cases [elf][non-regset] use elf_core_copy_task_regs() for dumper as well [elf][non-regset] uninline elf_core_copy_task_fpregs() (and lose pt_regs argument) elf_core_copy_task_regs(): task_pt_regs is defined everywhere [elf][regset] simplify thread list handling in fill_note_info() [elf][regset] clean fill_note_info() a bit kill extern of vsyscall32_sysctl kill coredump_params->regs kill signal_pt_regs()
2022-12-13Merge tag 'random-6.2-rc1-for-linus' of ↵Linus Torvalds1-13/+1
git://git.kernel.org/pub/scm/linux/kernel/git/crng/random Pull random number generator updates from Jason Donenfeld: - Replace prandom_u32_max() and various open-coded variants of it, there is now a new family of functions that uses fast rejection sampling to choose properly uniformly random numbers within an interval: get_random_u32_below(ceil) - [0, ceil) get_random_u32_above(floor) - (floor, U32_MAX] get_random_u32_inclusive(floor, ceil) - [floor, ceil] Coccinelle was used to convert all current users of prandom_u32_max(), as well as many open-coded patterns, resulting in improvements throughout the tree. I'll have a "late" 6.1-rc1 pull for you that removes the now unused prandom_u32_max() function, just in case any other trees add a new use case of it that needs to converted. According to linux-next, there may be two trivial cases of prandom_u32_max() reintroductions that are fixable with a 's/.../.../'. So I'll have for you a final conversion patch doing that alongside the removal patch during the second week. This is a treewide change that touches many files throughout. - More consistent use of get_random_canary(). - Updates to comments, documentation, tests, headers, and simplification in configuration. - The arch_get_random*_early() abstraction was only used by arm64 and wasn't entirely useful, so this has been replaced by code that works in all relevant contexts. - The kernel will use and manage random seeds in non-volatile EFI variables, refreshing a variable with a fresh seed when the RNG is initialized. The RNG GUID namespace is then hidden from efivarfs to prevent accidental leakage. These changes are split into random.c infrastructure code used in the EFI subsystem, in this pull request, and related support inside of EFISTUB, in Ard's EFI tree. These are co-dependent for full functionality, but the order of merging doesn't matter. - Part of the infrastructure added for the EFI support is also used for an improvement to the way vsprintf initializes its siphash key, replacing an sleep loop wart. - The hardware RNG framework now always calls its correct random.c input function, add_hwgenerator_randomness(), rather than sometimes going through helpers better suited for other cases. - The add_latent_entropy() function has long been called from the fork handler, but is a no-op when the latent entropy gcc plugin isn't used, which is fine for the purposes of latent entropy. But it was missing out on the cycle counter that was also being mixed in beside the latent entropy variable. So now, if the latent entropy gcc plugin isn't enabled, add_latent_entropy() will expand to a call to add_device_randomness(NULL, 0), which adds a cycle counter, without the absent latent entropy variable. - The RNG is now reseeded from a delayed worker, rather than on demand when used. Always running from a worker allows it to make use of the CPU RNG on platforms like S390x, whose instructions are too slow to do so from interrupts. It also has the effect of adding in new inputs more frequently with more regularity, amounting to a long term transcript of random values. Plus, it helps a bit with the upcoming vDSO implementation (which isn't yet ready for 6.2). - The jitter entropy algorithm now tries to execute on many different CPUs, round-robining, in hopes of hitting even more memory latencies and other unpredictable effects. It also will mix in a cycle counter when the entropy timer fires, in addition to being mixed in from the main loop, to account more explicitly for fluctuations in that timer firing. And the state it touches is now kept within the same cache line, so that it's assured that the different execution contexts will cause latencies. * tag 'random-6.2-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: (23 commits) random: include <linux/once.h> in the right header random: align entropy_timer_state to cache line random: mix in cycle counter when jitter timer fires random: spread out jitter callback to different CPUs random: remove extraneous period and add a missing one in comments efi: random: refresh non-volatile random seed when RNG is initialized vsprintf: initialize siphash key using notifier random: add back async readiness notifier random: reseed in delayed work rather than on-demand random: always mix cycle counter in add_latent_entropy() hw_random: use add_hwgenerator_randomness() for early entropy random: modernize documentation comment on get_random_bytes() random: adjust comment to account for removed function random: remove early archrandom abstraction random: use random.trust_{bootloader,cpu} command line option only stackprotector: actually use get_random_canary() stackprotector: move get_random_canary() into stackprotector.h treewide: use get_random_u32_inclusive() when possible treewide: use get_random_u32_{above,below}() instead of manual loop treewide: use get_random_u32_below() instead of deprecated function ...
2022-12-13Merge tag 'x86_cache_for_6.2' of ↵Linus Torvalds2-11/+18
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cache resource control updates from Dave Hansen: "These declare the resource control (rectrl) MSRs a bit more normally and clean up an unnecessary structure member: - Remove unnecessary arch_has_empty_bitmaps structure memory - Move rescrtl MSR defines into msr-index.h, like normal MSRs" * tag 'x86_cache_for_6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/resctrl: Move MSR defines into msr-index.h x86/resctrl: Remove arch_has_empty_bitmaps
2022-12-13Merge tag 'x86_tdx_for_6.2' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 tdx updates from Dave Hansen: "This includes a single chunk of new functionality for TDX guests which allows them to talk to the trusted TDX module software and obtain an attestation report. This report can then be used to prove the trustworthiness of the guest to a third party and get access to things like storage encryption keys" * tag 'x86_tdx_for_6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: selftests/tdx: Test TDX attestation GetReport support virt: Add TDX guest driver x86/tdx: Add a wrapper to get TDREPORT0 from the TDX Module
2022-12-13Merge tag 'x86_sgx_for_6.2' of ↵Linus Torvalds2-7/+27
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 sgx updates from Dave Hansen: "The biggest deal in this series is support for a new hardware feature that allows enclaves to detect and mitigate single-stepping attacks. There's also a minor performance tweak and a little piece of the kmap_atomic() -> kmap_local() transition. Summary: - Introduce a new SGX feature (Asynchrounous Exit Notification) for bare-metal enclaves and KVM guests to mitigate single-step attacks - Increase batching to speed up enclave release - Replace kmap/kunmap_atomic() calls" * tag 'x86_sgx_for_6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/sgx: Replace kmap/kunmap_atomic() calls KVM/VMX: Allow exposing EDECCSSA user leaf function to KVM guest x86/sgx: Allow enclaves to use Asynchrounous Exit Notification x86/sgx: Reduce delay and interference of enclave release
2022-12-13Merge tag 'x86-misc-2022-12-10' of ↵Linus Torvalds5-17/+10
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 updates from Thomas Gleixner: "Updates for miscellaneous x86 areas: - Reserve a new boot loader type for barebox which is usally used on ARM and MIPS, but can also be utilized as EFI payload on x86 to provide watchdog-supervised boot up. - Consolidate the native and compat 32bit signal handling code and split the 64bit version out into a separate source file - Switch the ESPFIX random usage to get_random_long()" * tag 'x86-misc-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/espfix: Use get_random_long() rather than archrandom x86/signal/64: Move 64-bit signal code to its own file x86/signal/32: Merge native and compat 32-bit signal code x86/signal: Add ABI prefixes to frame setup functions x86/signal: Merge get_sigframe() x86: Remove __USER32_DS signal/compat: Remove compat_sigset_t override x86/signal: Remove sigset_t parameter from frame setup functions x86/signal: Remove sig parameter from frame setup functions Documentation/x86/boot: Reserve type_of_loader=13 for barebox
2022-12-12Merge tag 'x86-apic-2022-12-10' of ↵Linus Torvalds1-2/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 apic update from Thomas Gleixner: "A set of changes for the x86 APIC code: - Handle the case where x2APIC is enabled and locked by the BIOS on a kernel with CONFIG_X86_X2APIC=n gracefully. Instead of a panic which does not make it to the graphical console during very early boot, simply disable the local APIC completely and boot with the PIC and very limited functionality, which allows to diagnose the issue - Convert x86 APIC device tree bindings to YAML - Extend x86 APIC device tree bindings to configure interrupt delivery mode and handle this in during init. This allows to boot with device tree on platforms which lack a legacy PIC" * tag 'x86-apic-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/of: Add support for boot time interrupt delivery mode configuration x86/of: Replace printk(KERN_LVL) with pr_lvl() dt-bindings: x86: apic: Introduce new optional bool property for lapic dt-bindings: x86: apic: Convert Intel's APIC bindings to YAML schema x86/of: Remove unused early_init_dt_add_memory_arch() x86/apic: Handle no CONFIG_X86_X2APIC on systems with x2APIC enabled by BIOS
2022-12-12Merge tag 'irq-core-2022-12-10' of ↵Linus Torvalds6-11/+19
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "Updates for the interrupt core and driver subsystem: The bulk is the rework of the MSI subsystem to support per device MSI interrupt domains. This solves conceptual problems of the current PCI/MSI design which are in the way of providing support for PCI/MSI[-X] and the upcoming PCI/IMS mechanism on the same device. IMS (Interrupt Message Store] is a new specification which allows device manufactures to provide implementation defined storage for MSI messages (as opposed to PCI/MSI and PCI/MSI-X that has a specified message store which is uniform accross all devices). The PCI/MSI[-X] uniformity allowed us to get away with "global" PCI/MSI domains. IMS not only allows to overcome the size limitations of the MSI-X table, but also gives the device manufacturer the freedom to store the message in arbitrary places, even in host memory which is shared with the device. There have been several attempts to glue this into the current MSI code, but after lengthy discussions it turned out that there is a fundamental design problem in the current PCI/MSI-X implementation. This needs some historical background. When PCI/MSI[-X] support was added around 2003, interrupt management was completely different from what we have today in the actively developed architectures. Interrupt management was completely architecture specific and while there were attempts to create common infrastructure the commonalities were rudimentary and just providing shared data structures and interfaces so that drivers could be written in an architecture agnostic way. The initial PCI/MSI[-X] support obviously plugged into this model which resulted in some basic shared infrastructure in the PCI core code for setting up MSI descriptors, which are a pure software construct for holding data relevant for a particular MSI interrupt, but the actual association to Linux interrupts was completely architecture specific. This model is still supported today to keep museum architectures and notorious stragglers alive. In 2013 Intel tried to add support for hot-pluggable IO/APICs to the kernel, which was creating yet another architecture specific mechanism and resulted in an unholy mess on top of the existing horrors of x86 interrupt handling. The x86 interrupt management code was already an incomprehensible maze of indirections between the CPU vector management, interrupt remapping and the actual IO/APIC and PCI/MSI[-X] implementation. At roughly the same time ARM struggled with the ever growing SoC specific extensions which were glued on top of the architected GIC interrupt controller. This resulted in a fundamental redesign of interrupt management and provided the today prevailing concept of hierarchical interrupt domains. This allowed to disentangle the interactions between x86 vector domain and interrupt remapping and also allowed ARM to handle the zoo of SoC specific interrupt components in a sane way. The concept of hierarchical interrupt domains aims to encapsulate the functionality of particular IP blocks which are involved in interrupt delivery so that they become extensible and pluggable. The X86 encapsulation looks like this: |--- device 1 [Vector]---[Remapping]---[PCI/MSI]--|... |--- device N where the remapping domain is an optional component and in case that it is not available the PCI/MSI[-X] domains have the vector domain as their parent. This reduced the required interaction between the domains pretty much to the initialization phase where it is obviously required to establish the proper parent relation ship in the components of the hierarchy. While in most cases the model is strictly representing the chain of IP blocks and abstracting them so they can be plugged together to form a hierarchy, the design stopped short on PCI/MSI[-X]. Looking at the hardware it's clear that the actual PCI/MSI[-X] interrupt controller is not a global entity, but strict a per PCI device entity. Here we took a short cut on the hierarchical model and went for the easy solution of providing "global" PCI/MSI domains which was possible because the PCI/MSI[-X] handling is uniform across the devices. This also allowed to keep the existing PCI/MSI[-X] infrastructure mostly unchanged which in turn made it simple to keep the existing architecture specific management alive. A similar problem was created in the ARM world with support for IP block specific message storage. Instead of going all the way to stack a IP block specific domain on top of the generic MSI domain this ended in a construct which provides a "global" platform MSI domain which allows overriding the irq_write_msi_msg() callback per allocation. In course of the lengthy discussions we identified other abuse of the MSI infrastructure in wireless drivers, NTB etc. where support for implementation specific message storage was just mindlessly glued into the existing infrastructure. Some of this just works by chance on particular platforms but will fail in hard to diagnose ways when the driver is used on platforms where the underlying MSI interrupt management code does not expect the creative abuse. Another shortcoming of today's PCI/MSI-X support is the inability to allocate or free individual vectors after the initial enablement of MSI-X. This results in an works by chance implementation of VFIO (PCI pass-through) where interrupts on the host side are not set up upfront to avoid resource exhaustion. They are expanded at run-time when the guest actually tries to use them. The way how this is implemented is that the host disables MSI-X and then re-enables it with a larger number of vectors again. That works by chance because most device drivers set up all interrupts before the device actually will utilize them. But that's not universally true because some drivers allocate a large enough number of vectors but do not utilize them until it's actually required, e.g. for acceleration support. But at that point other interrupts of the device might be in active use and the MSI-X disable/enable dance can just result in losing interrupts and therefore hard to diagnose subtle problems. Last but not least the "global" PCI/MSI-X domain approach prevents to utilize PCI/MSI[-X] and PCI/IMS on the same device due to the fact that IMS is not longer providing a uniform storage and configuration model. The solution to this is to implement the missing step and switch from global PCI/MSI domains to per device PCI/MSI domains. The resulting hierarchy then looks like this: |--- [PCI/MSI] device 1 [Vector]---[Remapping]---|... |--- [PCI/MSI] device N which in turn allows to provide support for multiple domains per device: |--- [PCI/MSI] device 1 |--- [PCI/IMS] device 1 [Vector]---[Remapping]---|... |--- [PCI/MSI] device N |--- [PCI/IMS] device N This work converts the MSI and PCI/MSI core and the x86 interrupt domains to the new model, provides new interfaces for post-enable allocation/free of MSI-X interrupts and the base framework for PCI/IMS. PCI/IMS has been verified with the work in progress IDXD driver. There is work in progress to convert ARM over which will replace the platform MSI train-wreck. The cleanup of VFIO, NTB and other creative "solutions" are in the works as well. Drivers: - Updates for the LoongArch interrupt chip drivers - Support for MTK CIRQv2 - The usual small fixes and updates all over the place" * tag 'irq-core-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (134 commits) irqchip/ti-sci-inta: Fix kernel doc irqchip/gic-v2m: Mark a few functions __init irqchip/gic-v2m: Include arm-gic-common.h irqchip/irq-mvebu-icu: Fix works by chance pointer assignment iommu/amd: Enable PCI/IMS iommu/vt-d: Enable PCI/IMS x86/apic/msi: Enable PCI/IMS PCI/MSI: Provide pci_ims_alloc/free_irq() PCI/MSI: Provide IMS (Interrupt Message Store) support genirq/msi: Provide constants for PCI/IMS support x86/apic/msi: Enable MSI_FLAG_PCI_MSIX_ALLOC_DYN PCI/MSI: Provide post-enable dynamic allocation interfaces for MSI-X PCI/MSI: Provide prepare_desc() MSI domain op PCI/MSI: Split MSI-X descriptor setup genirq/msi: Provide MSI_FLAG_MSIX_ALLOC_DYN genirq/msi: Provide msi_domain_alloc_irq_at() genirq/msi: Provide msi_domain_ops:: Prepare_desc() genirq/msi: Provide msi_desc:: Msi_data genirq/msi: Provide struct msi_map x86/apic/msi: Remove arch_create_remap_msi_irq_domain() ...
2022-12-12Merge tag 'arm64-upstream' of ↵Linus Torvalds1-14/+35
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Will Deacon: "The highlights this time are support for dynamically enabling and disabling Clang's Shadow Call Stack at boot and a long-awaited optimisation to the way in which we handle the SVE register state on system call entry to avoid taking unnecessary traps from userspace. Summary: ACPI: - Enable FPDT support for boot-time profiling - Fix CPU PMU probing to work better with PREEMPT_RT - Update SMMUv3 MSI DeviceID parsing to latest IORT spec - APMT support for probing Arm CoreSight PMU devices CPU features: - Advertise new SVE instructions (v2.1) - Advertise range prefetch instruction - Advertise CSSC ("Common Short Sequence Compression") scalar instructions, adding things like min, max, abs, popcount - Enable DIT (Data Independent Timing) when running in the kernel - More conversion of system register fields over to the generated header CPU misfeatures: - Workaround for Cortex-A715 erratum #2645198 Dynamic SCS: - Support for dynamic shadow call stacks to allow switching at runtime between Clang's SCS implementation and the CPU's pointer authentication feature when it is supported (complete with scary DWARF parser!) Tracing and debug: - Remove static ftrace in favour of, err, dynamic ftrace! - Seperate 'struct ftrace_regs' from 'struct pt_regs' in core ftrace and existing arch code - Introduce and implement FTRACE_WITH_ARGS on arm64 to replace the old FTRACE_WITH_REGS - Extend 'crashkernel=' parameter with default value and fallback to placement above 4G physical if initial (low) allocation fails SVE: - Optimisation to avoid disabling SVE unconditionally on syscall entry and just zeroing the non-shared state on return instead Exceptions: - Rework of undefined instruction handling to avoid serialisation on global lock (this includes emulation of user accesses to the ID registers) Perf and PMU: - Support for TLP filters in Hisilicon's PCIe PMU device - Support for the DDR PMU present in Amlogic Meson G12 SoCs - Support for the terribly-named "CoreSight PMU" architecture from Arm (and Nvidia's implementation of said architecture) Misc: - Tighten up our boot protocol for systems with memory above 52 bits physical - Const-ify static keys to satisty jump label asm constraints - Trivial FFA driver cleanups in preparation for v1.1 support - Export the kernel_neon_* APIs as GPL symbols - Harden our instruction generation routines against instrumentation - A bunch of robustness improvements to our arch-specific selftests - Minor cleanups and fixes all over (kbuild, kprobes, kfence, PMU, ...)" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (151 commits) arm64: kprobes: Return DBG_HOOK_ERROR if kprobes can not handle a BRK arm64: kprobes: Let arch do_page_fault() fix up page fault in user handler arm64: Prohibit instrumentation on arch_stack_walk() arm64:uprobe fix the uprobe SWBP_INSN in big-endian arm64: alternatives: add __init/__initconst to some functions/variables arm_pmu: Drop redundant armpmu->map_event() in armpmu_event_init() kselftest/arm64: Allow epoll_wait() to return more than one result kselftest/arm64: Don't drain output while spawning children kselftest/arm64: Hold fp-stress children until they're all spawned arm64/sysreg: Remove duplicate definitions from asm/sysreg.h arm64/sysreg: Convert ID_DFR1_EL1 to automatic generation arm64/sysreg: Convert ID_DFR0_EL1 to automatic generation arm64/sysreg: Convert ID_AFR0_EL1 to automatic generation arm64/sysreg: Convert ID_MMFR5_EL1 to automatic generation arm64/sysreg: Convert MVFR2_EL1 to automatic generation arm64/sysreg: Convert MVFR1_EL1 to automatic generation arm64/sysreg: Convert MVFR0_EL1 to automatic generation arm64/sysreg: Convert ID_PFR2_EL1 to automatic generation arm64/sysreg: Convert ID_PFR1_EL1 to automatic generation arm64/sysreg: Convert ID_PFR0_EL1 to automatic generation ...
2022-12-12Merge tag 'hyperv-next-signed-20221208' of ↵Linus Torvalds1-1/+10
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv updates from Wei Liu: - Drop unregister syscore from hyperv_cleanup to avoid hang (Gaurav Kohli) - Clean up panic path for Hyper-V framebuffer (Guilherme G. Piccoli) - Allow IRQ remapping to work without x2apic (Nuno Das Neves) - Fix comments (Olaf Hering) - Expand hv_vp_assist_page definition (Saurabh Sengar) - Improvement to page reporting (Shradha Gupta) - Make sure TSC clocksource works when Linux runs as the root partition (Stanislav Kinsburskiy) * tag 'hyperv-next-signed-20221208' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: x86/hyperv: Remove unregister syscore call from Hyper-V cleanup iommu/hyper-v: Allow hyperv irq remapping without x2apic clocksource: hyper-v: Add TSC page support for root partition clocksource: hyper-v: Use TSC PFN getter to map vvar page clocksource: hyper-v: Introduce TSC PFN getter clocksource: hyper-v: Introduce a pointer to TSC page x86/hyperv: Expand definition of struct hv_vp_assist_page PCI: hv: update comment in x86 specific hv_arch_irq_unmask hv: fix comment typo in vmbus_channel/low_latency drivers: hv, hyperv_fb: Untangle and refactor Hyper-V panic notifiers video: hyperv_fb: Avoid taking busy spinlock on panic path hv_balloon: Add support for configurable order free page reporting mm/page_reporting: Add checks for page_reporting_order param
2022-12-06x86/apic/msi: Enable MSI_FLAG_PCI_MSIX_ALLOC_DYNThomas Gleixner1-1/+1
x86 MSI irqdomains can handle MSI-X allocation post MSI-X enable just out of the box - on the vector domain and on the remapping domains, Add the feature flag to the supported feature list Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221124232326.787373104@linutronix.de
2022-12-06x86/apic/msi: Remove arch_create_remap_msi_irq_domain()Thomas Gleixner1-4/+0
and related code which is not longer required now that the interrupt remap code has been converted to MSI parent domains. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221124232326.267353814@linutronix.de
2022-12-06x86/apic/vector: Provide MSI parent domainThomas Gleixner2-0/+7
Enable MSI parent domain support in the x86 vector domain and fixup the checks in the iommu implementations to check whether device::msi::domain is the default MSI parent domain. That keeps the existing logic to protect e.g. devices behind VMD working. The interrupt remap PCI/MSI code still works because the underlying vector domain still provides the same functionality. None of the other x86 PCI/MSI, e.g. XEN and HyperV, implementations are affected either. They still work the same way both at the low level and the PCI/MSI implementations they provide. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221124232326.034672592@linutronix.de
2022-12-03x86/bugs: Make sure MSR_SPEC_CTRL is updated properly upon resume from S3Pawan Gupta1-1/+1
The "force" argument to write_spec_ctrl_current() is currently ambiguous as it does not guarantee the MSR write. This is due to the optimization that writes to the MSR happen only when the new value differs from the cached value. This is fine in most cases, but breaks for S3 resume when the cached MSR value gets out of sync with the hardware MSR value due to S3 resetting it. When x86_spec_ctrl_current is same as x86_spec_ctrl_base, the MSR write is skipped. Which results in SPEC_CTRL mitigations not getting restored. Move the MSR write from write_spec_ctrl_current() to a new function that unconditionally writes to the MSR. Update the callers accordingly and rename functions. [ bp: Rework a bit. ] Fixes: caa0ff24d5d0 ("x86/bugs: Keep a per-CPU IA32_SPEC_CTRL value") Suggested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@kernel.org> Link: https://lore.kernel.org/r/806d39b0bfec2fe8f50dc5446dff20f5bb24a959.1669821572.git.pawan.kumar.gupta@linux.intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-12-03Merge tag 'mm-hotfixes-stable-2022-12-02' of ↵Linus Torvalds1-0/+9
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc hotfixes from Andrew Morton: "15 hotfixes, 11 marked cc:stable. Only three or four of the latter address post-6.0 issues, which is hopefully a sign that things are converging" * tag 'mm-hotfixes-stable-2022-12-02' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: revert "kbuild: fix -Wimplicit-function-declaration in license_is_gpl_compatible" Kconfig.debug: provide a little extra FRAME_WARN leeway when KASAN is enabled drm/amdgpu: temporarily disable broken Clang builds due to blown stack-frame mm/khugepaged: invoke MMU notifiers in shmem/file collapse paths mm/khugepaged: fix GUP-fast interaction by sending IPI mm/khugepaged: take the right locks for page table retraction mm: migrate: fix THP's mapcount on isolation mm: introduce arch_has_hw_nonleaf_pmd_young() mm: add dummy pmd_young() for architectures not having it mm/damon/sysfs: fix wrong empty schemes assumption under online tuning in damon_sysfs_set_schemes() tools/vm/slabinfo-gnuplot: use "grep -E" instead of "egrep" nilfs2: fix NULL pointer dereference in nilfs_palloc_commit_free_entry() hugetlb: don't delete vma_lock in hugetlb MADV_DONTNEED processing madvise: use zap_page_range_single for madvise dontneed mm: replace VM_WARN_ON to pr_warn if the node is offline with __GFP_THISNODE
2022-12-02x86/apic: Handle no CONFIG_X86_X2APIC on systems with x2APIC enabled by BIOSMateusz Jończyk1-2/+1
A kernel that was compiled without CONFIG_X86_X2APIC was unable to boot on platforms that have x2APIC already enabled in the BIOS before starting the kernel. The kernel was supposed to panic with an approprite error message in validate_x2apic() due to the missing X2APIC support. However, validate_x2apic() was run too late in the boot cycle, and the kernel tried to initialize the APIC nonetheless. This resulted in an earlier panic in setup_local_APIC() because the APIC was not registered. In my experiments, a panic message in setup_local_APIC() was not visible in the graphical console, which resulted in a hang with no indication what has gone wrong. Instead of calling panic(), disable the APIC, which results in a somewhat working system with the PIC only (and no SMP). This way the user is able to diagnose the problem more easily. Disabling X2APIC mode is not an option because it's impossible on systems with locked x2APIC. The proper place to disable the APIC in this case is in check_x2apic(), which is called early from setup_arch(). Doing this in __apic_intr_mode_select() is too late. Make check_x2apic() unconditionally available and remove the empty stub. Reported-by: Paul Menzel <pmenzel@molgen.mpg.de> Reported-by: Robert Elliott (Servers) <elliott@hpe.com> Signed-off-by: Mateusz Jończyk <mat.jonczyk@o2.pl> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/lkml/d573ba1c-0dc4-3016-712a-cc23a8a33d42@molgen.mpg.de Link: https://lore.kernel.org/lkml/20220911084711.13694-3-mat.jonczyk@o2.pl Link: https://lore.kernel.org/all/20221129215008.7247-1-mat.jonczyk@o2.pl
2022-12-01mm: introduce arch_has_hw_nonleaf_pmd_young()Juergen Gross1-0/+8
When running as a Xen PV guests commit eed9a328aa1a ("mm: x86: add CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG") can cause a protection violation in pmdp_test_and_clear_young(): BUG: unable to handle page fault for address: ffff8880083374d0 #PF: supervisor write access in kernel mode #PF: error_code(0x0003) - permissions violation PGD 3026067 P4D 3026067 PUD 3027067 PMD 7fee5067 PTE 8010000008337065 Oops: 0003 [#1] PREEMPT SMP NOPTI CPU: 7 PID: 158 Comm: kswapd0 Not tainted 6.1.0-rc5-20221118-doflr+ #1 RIP: e030:pmdp_test_and_clear_young+0x25/0x40 This happens because the Xen hypervisor can't emulate direct writes to page table entries other than PTEs. This can easily be fixed by introducing arch_has_hw_nonleaf_pmd_young() similar to arch_has_hw_pte_young() and test that instead of CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG. Link: https://lkml.kernel.org/r/20221123064510.16225-1-jgross@suse.com Fixes: eed9a328aa1a ("mm: x86: add CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG") Signed-off-by: Juergen Gross <jgross@suse.com> Reported-by: Sander Eikelenboom <linux@eikelenboom.it> Acked-by: Yu Zhao <yuzhao@google.com> Tested-by: Sander Eikelenboom <linux@eikelenboom.it> Acked-by: David Hildenbrand <david@redhat.com> [core changes] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-12-01mm: add dummy pmd_young() for architectures not having itJuergen Gross1-0/+1
In order to avoid #ifdeffery add a dummy pmd_young() implementation as a fallback. This is required for the later patch "mm: introduce arch_has_hw_nonleaf_pmd_young()". Link: https://lkml.kernel.org/r/fd3ac3cd-7349-6bbd-890a-71a9454ca0b3@suse.com Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Yu Zhao <yuzhao@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Sander Eikelenboom <linux@eikelenboom.it> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-28x86/hyperv: Expand definition of struct hv_vp_assist_pageSaurabh Sengar1-1/+10
The struct hv_vp_assist_page has 24 bytes which is defined as u64[3], expand that to expose vtl_entry_reason, vtl_ret_x64rax and vtl_ret_x64rcx field. vtl_entry_reason is updated by hypervisor for the entry reason as to why the VTL was entered on the virtual processor. Guest updates the vtl_ret_* fields to provide the register values to restore on VTL return. The specific register values that are restored which will be updated on vtl_ret_x64rax and vtl_ret_x64rcx. Also added the missing fields for synthetic_time_unhalted_timer_expired, virtualization_fault_information and intercept_message. Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com> Reviewed-by: <anrayabh@linux.microsoft.com> Link: https://lore.kernel.org/r/1667587123-31645-1-git-send-email-ssengar@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-11-28x86/resctrl: Move MSR defines into msr-index.hBorislav Petkov2-11/+18
msr-index.h should contain all MSRs for easier grepping for MSR numbers when dealing with unchecked MSR access warnings, for example. Move the resctrl ones. Prefix IA32_PQR_ASSOC with "MSR_" while at it. No functional changes. Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20221106212923.20699-1-bp@alien8.de
2022-11-21x86/tsx: Add a feature bit for TSX control MSR supportPawan Gupta1-0/+3
Support for the TSX control MSR is enumerated in MSR_IA32_ARCH_CAPABILITIES. This is different from how other CPU features are enumerated i.e. via CPUID. Currently, a call to tsx_ctrl_is_supported() is required for enumerating the feature. In the absence of a feature bit for TSX control, any code that relies on checking feature bits directly will not work. In preparation for adding a feature bit check in MSR save/restore during suspend/resume, set a new feature bit X86_FEATURE_TSX_CTRL when MSR_IA32_TSX_CTRL is present. Also make tsx_ctrl_is_supported() use the new feature bit to avoid any overhead of reading the MSR. [ bp: Remove tsx_ctrl_is_supported(), add room for two more feature bits in word 11 which are coming up in the next merge window. ] Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: <stable@kernel.org> Link: https://lore.kernel.org/r/de619764e1d98afbb7a5fa58424f1278ede37b45.1668539735.git.pawan.kumar.gupta@linux.intel.com
2022-11-20Merge tag 'locking_urgent_for_v6.1_rc6' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fix from Borislav Petkov: - Fix a build error with clang 11 * tag 'locking_urgent_for_v6.1_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking: Fix qspinlock/x86 inline asm error
2022-11-18ftrace: abstract DYNAMIC_FTRACE_WITH_ARGS accessesMark Rutland1-0/+16
In subsequent patches we'll arrange for architectures to have an ftrace_regs which is entirely distinct from pt_regs. In preparation for this, we need to minimize the use of pt_regs to where strictly necessary in the core ftrace code. This patch adds new ftrace_regs_{get,set}_*() helpers which can be used to manipulate ftrace_regs. When CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=y, these can always be used on any ftrace_regs, and when CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=n these can be used when regs are available. A new ftrace_regs_has_args(fregs) helper is added which code can use to check when these are usable. Co-developed-by: Florent Revest <revest@chromium.org> Signed-off-by: Florent Revest <revest@chromium.org> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20221103170520.931305-4-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-11-18ftrace: rename ftrace_instruction_pointer_set() -> ↵Mark Rutland1-1/+1
ftrace_regs_set_instruction_pointer() In subsequent patches we'll add a sew of ftrace_regs_{get,set}_*() helpers. In preparation, this patch renames ftrace_instruction_pointer_set() to ftrace_regs_set_instruction_pointer(). There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Florent Revest <revest@chromium.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20221103170520.931305-3-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-11-18ftrace: pass fregs to arch_ftrace_set_direct_caller()Mark Rutland1-13/+18
In subsequent patches we'll arrange for architectures to have an ftrace_regs which is entirely distinct from pt_regs. In preparation for this, we need to minimize the use of pt_regs to where strictly necessary in the core ftrace code. This patch changes the prototype of arch_ftrace_set_direct_caller() to take ftrace_regs rather than pt_regs, and moves the extraction of the pt_regs into arch_ftrace_set_direct_caller(). On x86, arch_ftrace_set_direct_caller() can be used even when CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=n, and <linux/ftrace.h> defines struct ftrace_regs. Due to this, it's necessary to define arch_ftrace_set_direct_caller() as a macro to avoid using an incomplete type. I've also moved the body of arch_ftrace_set_direct_caller() after the CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=y defineidion of struct ftrace_regs. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Florent Revest <revest@chromium.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20221103170520.931305-2-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-11-18stackprotector: actually use get_random_canary()Jason A. Donenfeld1-13/+1
The RNG always mixes in the Linux version extremely early in boot. It also always includes a cycle counter, not only during early boot, but each and every time it is invoked prior to being fully initialized. Together, this means that the use of additional xors inside of the various stackprotector.h files is superfluous and over-complicated. Instead, we can get exactly the same thing, but better, by just calling `get_random_canary()`. Acked-by: Guo Ren <guoren@kernel.org> # for csky Acked-by: Catalin Marinas <catalin.marinas@arm.com> # for arm64 Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-11-17x86/tdx: Add a wrapper to get TDREPORT0 from the TDX ModuleKuppuswamy Sathyanarayanan1-0/+2
To support TDX attestation, the TDX guest driver exposes an IOCTL interface to allow userspace to get the TDREPORT0 (a.k.a. TDREPORT subtype 0) from the TDX module via TDG.MR.TDREPORT TDCALL. In order to get the TDREPORT0 in the TDX guest driver, instead of using a low level function like __tdx_module_call(), add a tdx_mcall_get_report0() wrapper function to handle it. This is a preparatory patch for adding attestation support. Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Wander Lairson Costa <wander@redhat.com> Link: https://lore.kernel.org/all/20221116223820.819090-2-sathyanarayanan.kuppuswamy%40linux.intel.com
2022-11-17x86/apic: Remove X86_IRQ_ALLOC_CONTIGUOUS_VECTORSThomas Gleixner1-3/+1
Now that the PCI/MSI core code does early checking for multi-MSI support X86_IRQ_ALLOC_CONTIGUOUS_VECTORS is not required anymore. Remove the flag and rely on MSI_FLAG_MULTI_PCI_MSI. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20221111122015.865042356@linutronix.de
2022-11-17PCI/MSI: Get rid of PCI_MSI_IRQ_DOMAINThomas Gleixner1-2/+2
What a zoo: PCI_MSI select GENERIC_MSI_IRQ PCI_MSI_IRQ_DOMAIN def_bool y depends on PCI_MSI select GENERIC_MSI_IRQ_DOMAIN Ergo PCI_MSI enables PCI_MSI_IRQ_DOMAIN which in turn selects GENERIC_MSI_IRQ_DOMAIN. So all the dependencies on PCI_MSI_IRQ_DOMAIN are just an indirection to PCI_MSI. Match the reality and just admit that PCI_MSI requires GENERIC_MSI_IRQ_DOMAIN. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/r/20221111122014.467556921@linutronix.de
2022-11-17clocksource/drivers/hyper-v: Include asm/hyperv-tlfs.h not asm/mshyperv.hThomas Gleixner2-2/+9
clocksource/hyperv_timer.h is included into the VDSO build. It includes asm/mshyperv.h which in turn includes the world and some more. This worked so far by chance, but any subtle change in the include chain results in a build breakage because VDSO builds are building user space libraries. Include asm/hyperv-tlfs.h instead which contains everything what the VDSO build needs except the hv_get_raw_timer() define. Move this define into a separate header file, which contains the prerequisites (msr.h) and is included by clocksource/hyperv_timer.h. Fixup drivers/hv/vmbus_drv.c which relies on the indirect include of asm/mshyperv.h. With that the VDSO build only pulls in the minimum requirements. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/87fsemtut0.ffs@tglx
2022-11-16locking: Fix qspinlock/x86 inline asm errorGuo Jin1-1/+1
When compiling linux 6.1.0-rc3 configured with CONFIG_64BIT=y and CONFIG_PARAVIRT_SPINLOCKS=y on x86_64 using LLVM 11.0, an error: "<inline asm> error: changed section flags for .spinlock.text, expected:: 0x6" occurred. The reason is the .spinlock.text in kernel/locking/qspinlock.o is used many times, but its flags are omitted in subsequent use. LLVM 11.0 assembler didn't permit to leave out flags in subsequent uses of the same sections. So this patch adds the corresponding flags to avoid above error. Fixes: 501f7f69bca1 ("locking: Add __lockfunc to slow path functions") Signed-off-by: Guo Jin <guoj17@chinatelecom.cn> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/20221108060126.2505-1-guoj17@chinatelecom.cn
2022-11-15x86/cpu: Restore AMD's DE_CFG MSR after resumeBorislav Petkov1-3/+5
DE_CFG contains the LFENCE serializing bit, restore it on resume too. This is relevant to older families due to the way how they do S3. Unify and correct naming while at it. Fixes: e4d0e84e4907 ("x86/cpu/AMD: Make LFENCE a serializing instruction") Reported-by: Andrew Cooper <Andrew.Cooper3@citrix.com> Reported-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-11-09Merge tag 'kvm-s390-master-6.1-1' of ↵Paolo Bonzini2-2/+11
https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD A PCI allocation fix and a PV clock fix.
2022-11-09KVM: x86/pmu: Limit the maximum number of supported AMD GP countersLike Xu1-0/+1
The AMD PerfMonV2 specification allows for a maximum of 16 GP counters, but currently only 6 pairs of MSRs are accepted by KVM. While AMD64_NUM_COUNTERS_CORE is already equal to 6, increasing without adjusting msrs_to_save_all[] could result in out-of-bounds accesses. Therefore introduce a macro (named KVM_AMD_PMC_MAX_GENERIC) to refer to the number of counters supported by KVM. Signed-off-by: Like Xu <likexu@tencent.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <20220919091008.60695-3-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-09KVM: x86/pmu: Limit the maximum number of supported Intel GP countersLike Xu1-1/+5
The Intel Architectural IA32_PMCx MSRs addresses range allows for a maximum of 8 GP counters, and KVM cannot address any more. Introduce a local macro (named KVM_INTEL_PMC_MAX_GENERIC) and use it consistently to refer to the number of counters supported by KVM, thus avoiding possible out-of-bound accesses. Suggested-by: Jim Mattson <jmattson@google.com> Signed-off-by: Like Xu <likexu@tencent.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <20220919091008.60695-2-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-09x86, KVM: remove unnecessary argument to x86_virt_spec_ctrl and callersPaolo Bonzini1-5/+5
x86_virt_spec_ctrl only deals with the paravirtualized MSR_IA32_VIRT_SPEC_CTRL now and does not handle MSR_IA32_SPEC_CTRL anymore; remove the corresponding, unused argument. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-06Merge tag 'x86_urgent_for_v6.1_rc4' of ↵Linus Torvalds2-2/+11
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - Add new Intel CPU models - Enforce that TDX guests are successfully loaded only on TDX hardware where virtualization exception (#VE) delivery on kernel memory is disabled because handling those in all possible cases is "essentially impossible" - Add the proper include to the syscall wrappers so that BTF can see the real pt_regs definition and not only the forward declaration * tag 'x86_urgent_for_v6.1_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu: Add several Intel server CPU model numbers x86/tdx: Panic on bad configs that #VE on "private" memory access x86/tdx: Prepare for using "INFO" call for a second purpose x86/syscall: Include asm/ptrace.h in syscall_wrapper header
2022-11-05KVM/VMX: Allow exposing EDECCSSA user leaf function to KVM guestKai Huang1-0/+1
The new Asynchronous Exit (AEX) notification mechanism (AEX-notify) allows one enclave to receive a notification in the ERESUME after the enclave exit due to an AEX. EDECCSSA is a new SGX user leaf function (ENCLU[EDECCSSA]) to facilitate the AEX notification handling. The new EDECCSSA is enumerated via CPUID(EAX=0x12,ECX=0x0):EAX[11]. Besides Allowing reporting the new AEX-notify attribute to KVM guests, also allow reporting the new EDECCSSA user leaf function to KVM guests so the guest can fully utilize the AEX-notify mechanism. Similar to existing X86_FEATURE_SGX1 and X86_FEATURE_SGX2, introduce a new scattered X86_FEATURE_SGX_EDECCSSA bit for the new EDECCSSA, and report it in KVM's supported CPUIDs. Note, no additional KVM enabling is required to allow the guest to use EDECCSSA. It's impossible to trap ENCLU (without completely preventing the guest from using SGX). Advertise EDECCSSA as supported purely so that userspace doesn't need to special case EDECCSSA, i.e. doesn't need to manually check host CPUID. The inability to trap ENCLU also means that KVM can't prevent the guest from using EDECCSSA, but that virtualization hole is benign as far as KVM is concerned. EDECCSSA is simply a fancy way to modify internal enclave state. More background about how do AEX-notify and EDECCSSA work: SGX maintains a Current State Save Area Frame (CSSA) for each enclave thread. When AEX happens, the enclave thread context is saved to the CSSA and the CSSA is increased by 1. For a normal ERESUME which doesn't deliver AEX notification, it restores the saved thread context from the previously saved SSA and decreases the CSSA. If AEX-notify is enabled for one enclave, the ERESUME acts differently. Instead of restoring the saved thread context and decreasing the CSSA, it acts like EENTER which doesn't decrease the CSSA but establishes a clean slate thread context using the CSSA for the enclave to handle the notification. After some handling, the enclave must discard the "new-established" SSA and switch back to the previously saved SSA (upon AEX). Otherwise, the enclave will run out of SSA space upon further AEXs and eventually fail to run. To solve this problem, the new EDECCSSA essentially decreases the CSSA. It can be used by the enclave notification handler to switch back to the previous saved SSA when needed, i.e. after it handles the notification. Signed-off-by: Kai Huang <kai.huang@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Sean Christopherson <seanjc@google.com> Acked-by: Jarkko Sakkinen <jarkko@kernel.org> Link: https://lore.kernel.org/all/20221101022422.858944-1-kai.huang%40intel.com
2022-11-05x86/sgx: Allow enclaves to use Asynchrounous Exit NotificationDave Hansen1-7/+26
Short Version: Allow enclaves to use the new Asynchronous EXit (AEX) notification mechanism. This mechanism lets enclaves run a handler after an AEX event. These handlers can run mitigations for things like SGX-Step[1]. AEX Notify will be made available both on upcoming processors and on some older processors through microcode updates. Long Version: == SGX Attribute Background == The SGX architecture includes a list of SGX "attributes". These attributes ensure consistency and transparency around specific enclave features. As a simple example, the "DEBUG" attribute allows an enclave to be debugged, but also destroys virtually all of SGX security. Using attributes, enclaves can know that they are being debugged. Attributes also affect enclave attestation so an enclave can, for instance, be denied access to secrets while it is being debugged. The kernel keeps a list of known attributes and will only initialize enclaves that use a known set of attributes. This kernel policy eliminates the chance that a new SGX attribute could cause undesired effects. For example, imagine a new attribute was added called "PROVISIONKEY2" that provided similar functionality to "PROVISIIONKEY". A kernel policy that allowed indiscriminate use of unknown attributes and thus PROVISIONKEY2 would undermine the existing kernel policy which limits use of PROVISIONKEY enclaves. == AEX Notify Background == "Intel Architecture Instruction Set Extensions and Future Features - Version 45" is out[2]. There is a new chapter: Asynchronous Enclave Exit Notify and the EDECCSSA User Leaf Function. Enclaves exit can be either synchronous and consensual (EEXIT for instance) or asynchronous (on an interrupt or fault). The asynchronous ones can evidently be exploited to single step enclaves[1], on top of which other naughty things can be built. AEX Notify will be made available both on upcoming processors and on some older processors through microcode updates. == The Problem == These attacks are currently entirely opaque to the enclave since the hardware does the save/restore under the covers. The Asynchronous Enclave Exit Notify (AEX Notify) mechanism provides enclaves an ability to detect and mitigate potential exposure to these kinds of attacks. == The Solution == Define the new attribute value for AEX Notification. Ensure the attribute is cleared from the list reserved attributes. Instead of adding to the open-coded lists of individual attributes, add named lists of privileged (disallowed by default) and unprivileged (allowed by default) attributes. Add the AEX notify attribute as an unprivileged attribute, which will keep the kernel from rejecting enclaves with it set. 1. https://github.com/jovanbulck/sgx-step 2. https://cdrdv2.intel.com/v1/dl/getContent/671368?explicitVersion=true Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Jarkko Sakkinen <jarkko@kernel.org> Tested-by: Haitao Huang <haitao.huang@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/all/20220720191347.1343986-1-dave.hansen%40linux.intel.com
2022-11-04x86/cpu: Add several Intel server CPU model numbersTony Luck1-1/+10
These servers are all on the public versions of the roadmap. The model numbers for Grand Ridge, Granite Rapids, and Sierra Forest were included in the September 2022 edition of the Instruction Set Extensions document. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/r/20221103203310.5058-1-tony.luck@intel.com
2022-10-30Merge tag 'mm-hotfixes-stable-2022-10-28' of ↵Linus Torvalds2-10/+14
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc hotfixes from Andrew Morton: "Eight fix pre-6.0 bugs and the remainder address issues which were introduced in the 6.1-rc merge cycle, or address issues which aren't considered sufficiently serious to warrant a -stable backport" * tag 'mm-hotfixes-stable-2022-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (23 commits) mm: multi-gen LRU: move lru_gen_add_mm() out of IRQ-off region lib: maple_tree: remove unneeded initialization in mtree_range_walk() mmap: fix remap_file_pages() regression mm/shmem: ensure proper fallback if page faults mm/userfaultfd: replace kmap/kmap_atomic() with kmap_local_page() x86: fortify: kmsan: fix KMSAN fortify builds x86: asm: make sure __put_user_size() evaluates pointer once Kconfig.debug: disable CONFIG_FRAME_WARN for KMSAN by default x86/purgatory: disable KMSAN instrumentation mm: kmsan: export kmsan_copy_page_meta() mm: migrate: fix return value if all subpages of THPs are migrated successfully mm/uffd: fix vma check on userfault for wp mm: prep_compound_tail() clear page->private mm,madvise,hugetlb: fix unexpected data loss with MADV_DONTNEED on hugetlbfs mm/page_isolation: fix clang deadcode warning fs/ext4/super.c: remove unused `deprecated_msg' ipc/msg.c: fix percpu_counter use after free memory tier, sysfs: rename attribute "nodes" to "nodelist" MAINTAINERS: git://github.com -> https://github.com for nilfs2 mm/kmemleak: prevent soft lockup in kmemleak_scan()'s object iteration loops ...
2022-10-28x86: fortify: kmsan: fix KMSAN fortify buildsAlexander Potapenko1-4/+7
Ensure that KMSAN builds replace memset/memcpy/memmove calls with the respective __msan_XXX functions, and that none of the macros are redefined twice. This should allow building kernel with both CONFIG_KMSAN and CONFIG_FORTIFY_SOURCE. Link: https://lkml.kernel.org/r/20221024212144.2852069-5-glider@google.com Link: https://github.com/google/kmsan/issues/89 Signed-off-by: Alexander Potapenko <glider@google.com> Reported-by: Tamas K Lengyel <tamas.lengyel@zentific.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-28x86: asm: make sure __put_user_size() evaluates pointer onceAlexander Potapenko1-6/+7
User access macros must ensure their arguments are evaluated only once if they are used more than once in the macro body. Adding instrument_put_user() to __put_user_size() resulted in double evaluation of the `ptr` argument, which led to correctness issues when performing e.g. unsafe_put_user(..., p++, ...). To fix those issues, evaluate the `ptr` argument of __put_user_size() at the beginning of the macro. Link: https://lkml.kernel.org/r/20221024212144.2852069-4-glider@google.com Fixes: 888f84a6da4d ("x86: asm: instrument usercopy in get_user() and put_user()") Signed-off-by: Alexander Potapenko <glider@google.com> Reported-by: youling257 <youling257@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-24x86/syscall: Include asm/ptrace.h in syscall_wrapper headerJiri Olsa1-1/+1
With just the forward declaration of the 'struct pt_regs' in syscall_wrapper.h, the syscall stub functions: __[x64|ia32]_sys_*(struct pt_regs *regs) will have different definition of 'regs' argument in BTF data based on which object file they are defined in. If the syscall's object includes 'struct pt_regs' definition, the BTF argument data will point to a 'struct pt_regs' record, like: [226] STRUCT 'pt_regs' size=168 vlen=21 'r15' type_id=1 bits_offset=0 'r14' type_id=1 bits_offset=64 'r13' type_id=1 bits_offset=128 ... If not, it will point to a fwd declaration record: [15439] FWD 'pt_regs' fwd_kind=struct and make bpf tracing program hooking on those functions unable to access fields from 'struct pt_regs'. Include asm/ptrace.h directly in syscall_wrapper.h to make sure all syscalls see 'struct pt_regs' definition. This then results in BTF for '__*_sys_*(struct pt_regs *regs)' functions to point to the actual struct, not just the forward declaration. [ bp: No Fixes tag as this is not really a bug fix but "adjustment" so that BTF is happy. ] Reported-by: Akihiro HARAI <jharai0815@gmail.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Andrii Nakryiko <andrii@kernel.org> Cc: <stable@vger.kernel.org> # this is needed only for BTF so kernels >= 5.15 Link: https://lore.kernel.org/r/20221018122708.823792-1-jolsa@kernel.org
2022-10-24kill extern of vsyscall32_sysctlAl Viro1-1/+0
it's been dead for years. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-10-21iommu/vt-d: Allow NVS regions in arch_rmrr_sanity_check()Charlotte Tan1-1/+3
arch_rmrr_sanity_check() warns if the RMRR is not covered by an ACPI Reserved region, but it seems like it should accept an NVS region as well. The ACPI spec https://uefi.org/specs/ACPI/6.5/15_System_Address_Map_Interfaces.html uses similar wording for "Reserved" and "NVS" region types; for NVS regions it says "This range of addresses is in use or reserved by the system and must not be used by the operating system." There is an old comment on this mailing list that also suggests NVS regions should pass the arch_rmrr_sanity_check() test: The warnings come from arch_rmrr_sanity_check() since it checks whether the region is E820_TYPE_RESERVED. However, if the purpose of the check is to detect RMRR has regions that may be used by OS as free memory, isn't E820_TYPE_NVS safe, too? This patch overlaps with another proposed patch that would add the region type to the log since sometimes the bug reporter sees this log on the console but doesn't know to include the kernel log: https://lore.kernel.org/lkml/20220611204859.234975-3-atomlin@redhat.com/ Here's an example of the "Firmware Bug" apparent false positive (wrapped for line length): DMAR: [Firmware Bug]: No firmware reserved region can cover this RMRR [0x000000006f760000-0x000000006f762fff], contact BIOS vendor for fixes DMAR: [Firmware Bug]: Your BIOS is broken; bad RMRR [0x000000006f760000-0x000000006f762fff] This is the snippet from the e820 table: BIOS-e820: [mem 0x0000000068bff000-0x000000006ebfefff] reserved BIOS-e820: [mem 0x000000006ebff000-0x000000006f9fefff] ACPI NVS BIOS-e820: [mem 0x000000006f9ff000-0x000000006fffefff] ACPI data Fixes: f036c7fa0ab6 ("iommu/vt-d: Check VT-d RMRR region in BIOS is reported as reserved") Cc: Will Mortensen <will@extrahop.com> Link: https://lore.kernel.org/linux-iommu/64a5843d-850d-e58c-4fc2-0a0eeeb656dc@nec.com/ Link: https://bugzilla.kernel.org/show_bug.cgi?id=216443 Signed-off-by: Charlotte Tan <charlotte@extrahop.com> Reviewed-by: Aaron Tomlin <atomlin@redhat.com> Link: https://lore.kernel.org/r/20220929044449.32515-1-charlotte@extrahop.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-10-19x86/signal/32: Merge native and compat 32-bit signal codeBrian Gerst1-0/+1
There are significant differences between signal handling on 32-bit vs. 64-bit, like different structure layouts and legacy syscalls. Instead of duplicating that code for native and compat, merge both versions into one file. Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Link: https://lore.kernel.org/r/20220606203802.158958-8-brgerst@gmail.com Signed-off-by: Borislav Petkov <bp@suse.de>
2022-10-19x86/signal: Add ABI prefixes to frame setup functionsBrian Gerst2-5/+5
Add ABI prefixes to the frame setup functions that didn't already have them. To avoid compiler warnings and prepare for moving these functions to separate files, make them non-static. Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Link: https://lore.kernel.org/r/20220606203802.158958-7-brgerst@gmail.com Signed-off-by: Borislav Petkov <bp@suse.de>
2022-10-19x86/signal: Merge get_sigframe()Brian Gerst1-0/+4
Adapt the native get_sigframe() function so that the compat signal code can use it. Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Link: https://lore.kernel.org/r/20220606203802.158958-6-brgerst@gmail.com Signed-off-by: Borislav Petkov <bp@suse.de>
2022-10-19x86: Remove __USER32_DSBrian Gerst2-5/+0
Replace all users with the equivalent __USER_DS, which will make merging native and compat code simpler. Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Link: https://lore.kernel.org/r/20220606203802.158958-5-brgerst@gmail.com Signed-off-by: Borislav Petkov <bp@suse.de>