summaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)AuthorFilesLines
2025-11-05KVM: TDX: Guard VM state transitions with "all" the locksSean Christopherson1-10/+49
Acquire kvm->lock, kvm->slots_lock, and all vcpu->mutex locks when servicing ioctls that (a) transition the TD to a new state, i.e. when doing INIT or FINALIZE or (b) are only valid if the TD is in a specific state, i.e. when initializing a vCPU or memory region. Acquiring "all" the locks fixes several KVM_BUG_ON() situations where a SEAMCALL can fail due to racing actions, e.g. if tdh_vp_create() contends with either tdh_mr_extend() or tdh_mr_finalize(). For all intents and purposes, the paths in question are fully serialized, i.e. there's no reason to try and allow anything remotely interesting to happen. Smack 'em with a big hammer instead of trying to be "nice". Acquire kvm->lock to prevent VM-wide things from happening, slots_lock to prevent kvm_mmu_zap_all_fast(), and _all_ vCPU mutexes to prevent vCPUs from interefering. Use the recently-renamed kvm_arch_vcpu_unlocked_ioctl() to service the vCPU-scoped ioctls to avoid a lock inversion problem, e.g. due to taking vcpu->mutex outside kvm->lock. See also commit ecf371f8b02d ("KVM: SVM: Reject SEV{-ES} intra host migration if vCPU creation is in-flight"), which fixed a similar bug with SEV intra-host migration where an in-flight vCPU creation could race with a VM-wide state transition. Define a fancy new CLASS to handle the lock+check => unlock logic with guard()-like syntax: CLASS(tdx_vm_state_guard, guard)(kvm); if (IS_ERR(guard)) return PTR_ERR(guard); to simplify juggling the many locks. Note! Take kvm->slots_lock *after* all vcpu->mutex locks, as per KVM's soon-to-be-documented lock ordering rules[1]. Link: https://lore.kernel.org/all/20251016235538.171962-1-seanjc@google.com [1] Reported-by: Yan Zhao <yan.y.zhao@intel.com> Closes: https://lore.kernel.org/all/aLFiPq1smdzN3Ary@yzhao56-desk.sh.intel.com Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251030200951.3402865-27-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-05KVM: TDX: Don't copy "cmd" back to userspace for KVM_TDX_CAPABILITIESSean Christopherson1-3/+3
Don't copy the kvm_tdx_cmd structure back to userspace when handling KVM_TDX_CAPABILITIES, as tdx_get_capabilities() doesn't modify hw_error or any other fields. Opportunistically hoist the call to tdx_get_capabilities() outside of the kvm->lock critical section, as getting the capabilities doesn't touch the VM in any way, e.g. doesn't even take @kvm. Suggested-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251030200951.3402865-26-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-05KVM: TDX: Use guard() to acquire kvm->lock in tdx_vm_ioctl()Sean Christopherson1-6/+3
Use guard() in tdx_vm_ioctl() to tidy up the code a small amount, but more importantly to minimize the diff of a future change, which will use guard-like semantics to acquire and release multiple locks. No functional change intended. Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251030200951.3402865-25-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-05KVM: TDX: Convert INIT_MEM_REGION and INIT_VCPU to "unlocked" vCPU ioctlSean Christopherson6-6/+55
Handle the KVM_TDX_INIT_MEM_REGION and KVM_TDX_INIT_VCPU vCPU sub-ioctls in the unlocked variant, i.e. outside of vcpu->mutex, in anticipation of taking kvm->lock along with all other vCPU mutexes, at which point the sub-ioctls _must_ start without vcpu->mutex held. No functional change intended. Reviewed-by: Kai Huang <kai.huang@intel.com> Co-developed-by: Yan Zhao <yan.y.zhao@intel.com> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251030200951.3402865-24-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-05KVM: TDX: Add tdx_get_cmd() helper to get and validate sub-ioctl commandSean Christopherson1-13/+20
Add a helper to copy a kvm_tdx_cmd structure from userspace and verify that must-be-zero fields are indeed zero. No functional change intended. Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251030200951.3402865-23-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-05KVM: TDX: Add macro to retry SEAMCALLs when forcing vCPUs out of guestSean Christopherson1-49/+33
Add a macro to handle kicking vCPUs out of the guest and retrying SEAMCALLs on TDX_OPERAND_BUSY instead of providing small helpers to be used by each SEAMCALL. Wrapping the SEAMCALLs in a macro makes it a little harder to tease out which SEAMCALL is being made, but significantly reduces the amount of copy+paste code, and makes it all but impossible to leave an elevated wait_for_sept_zap. Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251030200951.3402865-22-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-05Merge tag 'rust-fixes-6.18' of ↵Linus Torvalds2-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux Pull rust fixes from Miguel Ojeda: - Fix/workaround a couple Rust 1.91.0 build issues when sanitizers are enabled due to extra checking performed by the compiler and an upstream issue already fixed for Rust 1.93.0 - Fix future Rust 1.93.0 builds by supporting the stabilized name for the 'no-jump-tables' flag - Fix a couple private/broken intra-doc links uncovered by the future move of pin-init to 'syn' * tag 'rust-fixes-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux: rust: kbuild: support `-Cjump-tables=n` for Rust 1.93.0 rust: kbuild: workaround `rustdoc` doctests modifier bug rust: kbuild: treat `build_error` and `rustdoc` as kernel objects rust: condvar: fix broken intra-doc link rust: devres: fix private intra-doc link
2025-11-05KVM: TDX: Assert that mmu_lock is held for write when removing S-EPT entriesSean Christopherson1-0/+7
Unconditionally assert that mmu_lock is held for write when removing S-EPT entries, not just when removing S-EPT entries triggers certain conditions, e.g. needs to do TDH_MEM_TRACK or kick vCPUs out of the guest. Conditionally asserting implies that it's safe to hold mmu_lock for read when those paths aren't hit, which is simply not true, as KVM doesn't support removing S-EPT entries under read-lock. Only two paths lead to remove_external_spte(), and both paths asserts that mmu_lock is held for write (tdp_mmu_set_spte() via lockdep, and handle_removed_pt() via KVM_BUG_ON()). Deliberately leave lockdep assertions in the "no vCPUs" helpers to document that wait_for_sept_zap is guarded by holding mmu_lock for write, and keep the conditional assert in tdx_track() as well, but with a comment to help explain why holding mmu_lock for write matters (above and beyond why tdx_sept_remove_private_spte()'s requirements). Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251030200951.3402865-21-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-05KVM: TDX: Derive error argument names from the local variable namesSean Christopherson1-6/+7
When printing SEAMCALL errors, use the name of the variable holding an error parameter instead of the register from whence it came, so that flows which use descriptive variable names will similarly print descriptive error messages. Suggested-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251030200951.3402865-20-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-05KVM: TDX: Combine KVM_BUG_ON + pr_tdx_error() into TDX_BUG_ON()Sean Christopherson1-64/+46
Add TDX_BUG_ON() macros (with varying numbers of arguments) to deduplicate the myriad flows that do KVM_BUG_ON()/WARN_ON_ONCE() followed by a call to pr_tdx_error(). In addition to reducing boilerplate copy+paste code, this also helps ensure that KVM provides consistent handling of SEAMCALL errors. Opportunistically convert a handful of bare WARN_ON_ONCE() paths to the equivalent of KVM_BUG_ON(), i.e. have them terminate the VM. If a SEAMCALL error is fatal enough to WARN on, it's fatal enough to terminate the TD. Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251030200951.3402865-19-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-05KVM: TDX: Fold tdx_sept_zap_private_spte() into tdx_sept_remove_private_spte()Sean Christopherson1-30/+11
Do TDH_MEM_RANGE_BLOCK directly in tdx_sept_remove_private_spte() instead of using a one-off helper now that the nr_premapped tracking is gone. Opportunistically drop the WARN on hugepages, which was dead code (see the KVM_BUG_ON() in tdx_sept_remove_private_spte()). No functional change intended. Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251030200951.3402865-18-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-05KVM: TDX: ADD pages to the TD image while populating mirror EPT entriesSean Christopherson2-72/+44
When populating the initial memory image for a TDX guest, ADD pages to the TD as part of establishing the mappings in the mirror EPT, as opposed to creating the mappings and then doing ADD after the fact. Doing ADD in the S-EPT callbacks eliminates the need to track "premapped" pages, as the mirror EPT (M-EPT) and S-EPT are always synchronized, e.g. if ADD fails, KVM reverts to the previous M-EPT entry (guaranteed to be !PRESENT). Eliminating the hole where the M-EPT can have a mapping that doesn't exist in the S-EPT in turn obviates the need to handle errors that are unique to encountering a missing S-EPT entry (see tdx_is_sept_zap_err_due_to_premap()). Keeping the M-EPT and S-EPT synchronized also eliminates the need to check for unconsumed "premap" entries during tdx_td_finalize(), as there simply can't be any such entries. Dropping that check in particular reduces the overall cognitive load, as the management of nr_premapped with respect to removal of S-EPT is _very_ subtle. E.g. successful removal of an S-EPT entry after it completed ADD doesn't adjust nr_premapped, but it's not clear why that's "ok" but having half-baked entries is not (it's not truly "ok" in that removing pages from the image will likely prevent the guest from booting, but from KVM's perspective it's "ok"). Doing ADD in the S-EPT path requires passing an argument via a scratch field, but the current approach of tracking the number of "premapped" pages effectively does the same. And the "premapped" counter is much more dangerous, as it doesn't have a singular lock to protect its usage, since nr_premapped can be modified as soon as mmu_lock is dropped, at least in theory. I.e. nr_premapped is guarded by slots_lock, but only for "happy" paths. Note, this approach was used/tried at various points in TDX development, but was ultimately discarded due to a desire to avoid stashing temporary state in kvm_tdx. But as above, KVM ended up with such state anyways, and fully committing to using temporary state provides better access rules (100% guarded by slots_lock), and makes several edge cases flat out impossible. Note #2, continue to extend the measurement outside of mmu_lock, as it's a slow operation (typically 16 SEAMCALLs per page whose data is included in the measurement), and doesn't *need* to be done under mmu_lock, e.g. for consistency purposes. However, MR.EXTEND isn't _that_ slow, e.g. ~1ms latency to measure a full page, so if it needs to be done under mmu_lock in the future, e.g. because KVM gains a flow that can remove S-EPT entries during KVM_TDX_INIT_MEM_REGION, then extending the measurement can also be moved into the S-EPT mapping path (again, only if absolutely necessary). P.S. _If_ MR.EXTEND is moved into the S-EPT path, take care not to return an error up the stack if TDH_MR_EXTEND fails, as removing the M-EPT entry but not the S-EPT entry would result in inconsistent state! Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251030200951.3402865-17-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-05KVM: TDX: Fold tdx_mem_page_record_premap_cnt() into its sole callerSean Christopherson1-28/+21
Fold tdx_mem_page_record_premap_cnt() into tdx_sept_set_private_spte() as providing a one-off helper for effectively three lines of code is at best a wash, and splitting the code makes the comment for smp_rmb() _extremely_ confusing as the comment talks about reading kvm->arch.pre_fault_allowed before kvm_tdx->state, but the immediately visible code does the exact opposite. Opportunistically rewrite the comments to more explicitly explain who is checking what, as well as _why_ the ordering matters. No functional change intended. Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251030200951.3402865-16-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-05KVM: TDX: Use atomic64_dec_return() instead of a poor equivalentSean Christopherson1-4/+2
Use atomic64_dec_return() when decrementing the number of "pre-mapped" S-EPT pages to ensure that the count can't go negative without KVM noticing. In theory, checking for '0' and then decrementing in a separate operation could miss a 0=>-1 transition. In practice, such a condition is impossible because nr_premapped is protected by slots_lock, i.e. doesn't actually need to be an atomic (that wart will be addressed shortly). Don't bother trying to keep the count non-negative, as the KVM_BUG_ON() ensures the VM is dead, i.e. there's no point in trying to limp along. Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251030200951.3402865-15-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-05KVM: TDX: Avoid a double-KVM_BUG_ON() in tdx_sept_zap_private_spte()Sean Christopherson1-2/+4
Return -EIO immediately from tdx_sept_zap_private_spte() if the number of to-be-added pages underflows, so that the following "KVM_BUG_ON(err, kvm)" isn't also triggered. Isolating the check from the "is premap error" if-statement will also allow adding a lockdep assertion that premap errors are encountered if and only if slots_lock is held. Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251030200951.3402865-14-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-05KVM: TDX: WARN if mirror SPTE doesn't have full RWX when creating S-EPT mappingSean Christopherson3-4/+7
Pass in the mirror_spte to kvm_x86_ops.set_external_spte() to provide symmetry with .remove_external_spte(), and assert in TDX that the mirror SPTE is shadow-present with full RWX permissions (the TDX-Module doesn't allow the hypervisor to control protections). Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251030200951.3402865-13-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-05KVM: x86/mmu: Drop the return code from kvm_x86_ops.remove_external_spte()Sean Christopherson3-17/+12
Drop the return code from kvm_x86_ops.remove_external_spte(), a.k.a. tdx_sept_remove_private_spte(), as KVM simply does a KVM_BUG_ON() failure, and that KVM_BUG_ON() is redundant since all error paths in TDX also do a KVM_BUG_ON(). Opportunistically pass the spte instead of the pfn, as the API is clearly about removing an spte. Suggested-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251030200951.3402865-12-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-05KVM: TDX: Fold tdx_sept_drop_private_spte() into tdx_sept_remove_private_spte()Sean Christopherson1-50/+40
Fold tdx_sept_drop_private_spte() into tdx_sept_remove_private_spte() as a step towards having "remove" be the one and only function that deals with removing/zapping/dropping a SPTE, e.g. to avoid having to differentiate between "zap", "drop", and "remove". Eliminating the "drop" helper also gets rid of what is effectively dead code due to redundant checks, e.g. on an HKID being assigned. No functional change intended. Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251030200951.3402865-11-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-05KVM: TDX: Return -EIO, not -EINVAL, on a KVM_BUG_ON() conditionSean Christopherson1-6/+6
Return -EIO when a KVM_BUG_ON() is tripped, as KVM's ABI is to return -EIO when a VM has been killed due to a KVM bug, not -EINVAL. Note, many (all?) of the affected paths never propagate the error code to userspace, i.e. this is about internal consistency more than anything else. Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251030200951.3402865-10-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-05KVM: TDX: Drop superfluous page pinning in S-EPT managementYan Zhao1-24/+4
Don't explicitly pin pages when mapping pages into the S-EPT, guest_memfd doesn't support page migration in any capacity, i.e. there are no migrate callbacks because guest_memfd pages *can't* be migrated. See the WARN in kvm_gmem_migrate_folio(). Eliminating TDX's explicit pinning will also enable guest_memfd to support in-place conversion between shared and private memory[1][2]. Because KVM cannot distinguish between speculative/transient refcounts and the intentional refcount for TDX on private pages[3], failing to release private page refcount in TDX could cause guest_memfd to indefinitely wait on decreasing the refcount for the splitting. Under normal conditions, not holding an extra page refcount in TDX is safe because guest_memfd ensures pages are retained until its invalidation notification to KVM MMU is completed. However, if there're bugs in KVM/TDX module, not holding an extra refcount when a page is mapped in S-EPT could result in a page being released from guest_memfd while still mapped in the S-EPT. But, doing work to make a fatal error slightly less fatal is a net negative when that extra work adds complexity and confusion. Several approaches were considered to address the refcount issue, including - Attempting to modify the KVM unmap operation to return a failure, which was deemed too complex and potentially incorrect[4]. - Increasing the folio reference count only upon S-EPT zapping failure[5]. - Use page flags or page_ext to indicate a page is still used by TDX[6], which does not work for HVO (HugeTLB Vmemmap Optimization). - Setting HWPOISON bit or leveraging folio_set_hugetlb_hwpoison()[7]. Due to the complexity or inappropriateness of these approaches, and the fact that S-EPT zapping failure is currently only possible when there are bugs in the KVM or TDX module, which is very rare in a production kernel, a straightforward approach of simply not holding the page reference count in TDX was chosen[8]. When S-EPT zapping errors occur, KVM_BUG_ON() is invoked to kick off all vCPUs and mark the VM as dead. Although there is a potential window that a private page mapped in the S-EPT could be reallocated and used outside the VM, the loud warning from KVM_BUG_ON() should provide sufficient debug information. To be robust against bugs, the user can enable panic_on_warn as normal. Link: https://lore.kernel.org/all/cover.1747264138.git.ackerleytng@google.com [1] Link: https://youtu.be/UnBKahkAon4 [2] Link: https://lore.kernel.org/all/CAGtprH_ypohFy9TOJ8Emm_roT4XbQUtLKZNFcM6Fr+fhTFkE0Q@mail.gmail.com [3] Link: https://lore.kernel.org/all/aEEEJbTzlncbRaRA@yzhao56-desk.sh.intel.com [4] Link: https://lore.kernel.org/all/aE%2Fq9VKkmaCcuwpU@yzhao56-desk.sh.intel.com [5] Link: https://lore.kernel.org/all/aFkeBtuNBN1RrDAJ@yzhao56-desk.sh.intel.com [6] Link: https://lore.kernel.org/all/diqzy0tikran.fsf@ackerleytng-ctop.c.googlers.com [7] Link: https://lore.kernel.org/all/53ea5239f8ef9d8df9af593647243c10435fd219.camel@intel.com [8] Suggested-by: Vishal Annapurve <vannapurve@google.com> Suggested-by: Ackerley Tng <ackerleytng@google.com> Suggested-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> [sean: extract out of hugepage series, massage changelog accordingly] Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251030200951.3402865-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-05KVM: x86/mmu: Rename kvm_tdp_map_page() to kvm_tdp_page_prefault()Sean Christopherson1-3/+3
Rename kvm_tdp_map_page() to kvm_tdp_page_prefault() now that it's used only by kvm_arch_vcpu_pre_fault_memory(). No functional change intended. Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251030200951.3402865-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-05Revert "KVM: x86/tdp_mmu: Add a helper function to walk down the TDP MMU"Sean Christopherson3-36/+7
Remove the helper and exports that were added to allow TDX code to reuse kvm_tdp_map_page() for its gmem post-populate flow now that a dedicated TDP MMU API is provided to install a mapping given a gfn+pfn pair. This reverts commit 2608f105760115e94a03efd9f12f8fbfd1f9af4b. Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251030200951.3402865-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-05KVM: x86/mmu: WARN if KVM attempts to map into an invalid TDP MMU rootSean Christopherson1-0/+2
When mapping into the TDP MMU, WARN (if KVM_PROVE_MMU=y) if the root is invalid, e.g. if KVM is attempting to insert a mapping without checking if the information and MMU context is fresh. Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251030200951.3402865-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-05KVM: x86/mmu: Add dedicated API to map guest_memfd pfn into TDP MMUSean Christopherson3-8/+84
Add and use a new API for mapping a private pfn from guest_memfd into the TDP MMU from TDX's post-populate hook instead of partially open-coding the functionality into the TDX code. Sharing code with the pre-fault path sounded good on paper, but it's fatally flawed as simulating a fault loses the pfn, and calling back into gmem to re-retrieve the pfn creates locking problems, e.g. kvm_gmem_populate() already holds the gmem invalidation lock. Providing a dedicated API will also removing several MMU exports that ideally would not be exposed outside of the MMU, let alone to vendor code. On that topic, opportunistically drop the kvm_mmu_load() export. Leave kvm_tdp_mmu_gpa_is_mapped() alone for now; the entire commit that added kvm_tdp_mmu_gpa_is_mapped() will be removed in the near future. Gate the API on CONFIG_KVM_GUEST_MEMFD=y as private memory _must_ be backed by guest_memfd. Add a lockdep-only assert to that the incoming pfn is indeed backed by guest_memfd, and that the gmem instance's invalidate lock is held (which, combined with slots_lock being held, obviates the need to check for a stale "fault"). Cc: Michael Roth <michael.roth@amd.com> Cc: Yan Zhao <yan.y.zhao@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Vishal Annapurve <vannapurve@google.com> Cc: Rick Edgecombe <rick.p.edgecombe@intel.com> Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/all/20250709232103.zwmufocd3l7sqk7y@amd.com Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251030200951.3402865-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-05KVM: TDX: Drop PROVE_MMU=y sanity check on to-be-populated mappingsSean Christopherson1-14/+0
Drop TDX's sanity check that a mirror EPT mapping isn't zapped between creating said mapping and doing TDH.MEM.PAGE.ADD, as the check is simultaneously superfluous and incomplete. Per commit 2608f1057601 ("KVM: x86/tdp_mmu: Add a helper function to walk down the TDP MMU"), the justification for introducing kvm_tdp_mmu_gpa_is_mapped() was to check that the target gfn was pre-populated, with a link that points to this snippet: : > One small question: : > : > What if the memory region passed to KVM_TDX_INIT_MEM_REGION hasn't been pre- : > populated? If we want to make KVM_TDX_INIT_MEM_REGION work with these regions, : > then we still need to do the real map. Or we can make KVM_TDX_INIT_MEM_REGION : > return error when it finds the region hasn't been pre-populated? : : Return an error. I don't love the idea of bleeding so many TDX details into : userspace, but I'm pretty sure that ship sailed a long, long time ago. But that justification makes little sense for the final code, as the check on nr_premapped after TDH.MEM.PAGE.ADD will detect and return an error if KVM attempted to zap a S-EPT entry (tdx_sept_zap_private_spte() will fail on TDH.MEM.RANGE.BLOCK due lack of a valid S-EPT entry). And as evidenced by the "is mapped?" code being guarded with CONFIG_KVM_PROVE_MMU=y, KVM is NOT relying on the check for general correctness. The sanity check is also incomplete in the sense that mmu_lock is dropped between the check and TDH.MEM.PAGE.ADD, i.e. will only detect KVM bugs that zap SPTEs in a very specific window (note, this also applies to the check on nr_premapped). Removing the sanity check will allow removing kvm_tdp_mmu_gpa_is_mapped(), which has no business being exposed to vendor code, and more importantly will pave the way for eliminating the "pre-map" approach entirely in favor of doing TDH.MEM.PAGE.ADD under mmu_lock. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251030200951.3402865-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-05KVM: Rename kvm_arch_vcpu_async_ioctl() to kvm_arch_vcpu_unlocked_ioctl()Sean Christopherson7-14/+14
Rename the "async" ioctl API to "unlocked" so that upcoming usage in x86's TDX code doesn't result in a massive misnomer. To avoid having to retry SEAMCALLs, TDX needs to acquire kvm->lock *and* all vcpu->mutex locks, and acquiring all of those locks after/inside the current vCPU's mutex is a non-starter. However, TDX also needs to acquire the vCPU's mutex and load the vCPU, i.e. the handling is very much not async to the vCPU. No functional change intended. Acked-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251030200951.3402865-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-05KVM: Make support for kvm_arch_vcpu_async_ioctl() mandatorySean Christopherson7-5/+12
Implement kvm_arch_vcpu_async_ioctl() "natively" in x86 and arm64 instead of relying on an #ifdef'd stub, and drop HAVE_KVM_VCPU_ASYNC_IOCTL in anticipation of using the API on x86. Once x86 uses the API, providing a stub for one architecture and having all other architectures opt-in requires more code than simply implementing the API in the lone holdout. Eliminating the Kconfig will also reduce churn if the API is renamed in the future (spoiler alert). No functional change intended. Acked-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251030200951.3402865-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-05arm64: dts: broadcom: bcm2712: Add watchdog DT nodeStanimir Varbanov1-0/+9
Add watchdog device-tree node for bcm2712 SoC. Signed-off-by: Stanimir Varbanov <svarbanov@suse.de> Link: https://lore.kernel.org/r/20251031183309.1163384-5-svarbanov@suse.de Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
2025-11-05arm64: dts: broadcom: bcm2712: rpi-5: Add ethernet0 aliasLaurent Pinchart1-0/+6
The RP1 ethernet controller DT node contains a local-mac-address property to pass the MAC address from the boot loader to the kernel. The boot loader does not fill the MAC address as the ethernet0 alias is missing. Add it. Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Reviewed-by: Andrea della Porta <andrea.porta@suse.com> Link: https://lore.kernel.org/r/20251102111443.18206-1-laurent.pinchart@ideasonboard.com Fixes: 43456fdfc014 ("arm64: dts: broadcom: Enable RP1 ethernet for Raspberry Pi 5") Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
2025-11-05arm64: dts: broadcom: Assign clock rates in eth node for RPi5Andrea della Porta1-0/+4
In Raspberry Pi 5 DTS, the Ethernet clock rates must be assigned as the default clock register values are not valid for the Ethernet interface to function. This can be done either in rp1_clocks node or in rp1_eth node. Define the rates in rp1_eth node, as those clocks are 'leaf' clocks used specifically by the Ethernet device only. Fixes: 43456fdfc014 ("arm64: dts: broadcom: Enable RP1 ethernet for Raspberry Pi 5") Signed-off-by: Andrea della Porta <andrea.porta@suse.com> Link: https://lore.kernel.org/r/20251021135533.5517-1-andrea.porta@suse.com Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
2025-11-05arm64: dts: broadcom: bcm2712: Enable RNGPeter Robinson1-0/+6
The RNG is the same IP as in the bcm2711 so add the device tree block to enable the device. Signed-off-by: Peter Robinson <pbrobinson@gmail.com> Reviewed-by: Stefan Wahren <wahrenst@gmx.net> Link: https://lore.kernel.org/r/20250927075643.716179-1-pbrobinson@gmail.com Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
2025-11-05x86/boot: Fix page table access in 5-level to 4-level paging transitionUsama Arif1-4/+7
When transitioning from 5-level to 4-level paging, the existing code incorrectly accesses page table entries by directly dereferencing CR3 and applying PAGE_MASK. This approach has several issues: - __native_read_cr3() returns the raw CR3 register value, which on x86_64 includes not just the physical address but also flags. Bits above the physical address width of the system i.e. above __PHYSICAL_MASK_SHIFT) are also not masked. - The PGD entry is masked by PAGE_SIZE which doesn't take into account the higher bits such as _PAGE_BIT_NOPTISHADOW. Replace this with proper accessor functions: - native_read_cr3_pa(): Uses CR3_ADDR_MASK to additionally mask metadata out of CR3 (like SME or LAM bits). All remaining bits are real address bits or reserved and must be 0. - mask pgd value with PTE_PFN_MASK instead of PAGE_MASK, accounting for flags above bit 51 (_PAGE_BIT_NOPTISHADOW in particular). Bits below 51, but above the max physical address are reserved and must be 0. Fixes: e9d0e6330eb8 ("x86/boot/compressed/64: Prepare new top-level page table for trampoline") Reported-by: Michael van der Westhuizen <rmikey@meta.com> Reported-by: Tobias Fleig <tfleig@meta.com> Co-developed-by: Kiryl Shutsemau <kas@kernel.org> Signed-off-by: Kiryl Shutsemau <kas@kernel.org> Signed-off-by: Usama Arif <usamaarif642@gmail.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/r/a482fd68-ce54-472d-8df1-33d6ac9f6bb5@intel.com
2025-11-05x86/mce/amd: Enable interrupt vectors once per-CPU on SMCA systemsYazen Ghannam1-68/+53
Scalable MCA systems have a per-CPU register that gives the APIC LVT offset for the thresholding and deferred error interrupts. Currently, this register is read once to set up the deferred error interrupt and then read again for each thresholding block. Furthermore, the APIC LVT registers are configured each time, but they only need to be configured once per-CPU. Move the APIC LVT setup to the early part of CPU init, so that the registers are set up once. Also, this ensures that the kernel is ready to service the interrupts before the individual error sources (each MCA bank) are enabled. Apply this change only to SMCA systems to avoid breaking any legacy behavior. The deferred error interrupt is technically advertised by the SUCCOR feature. However, this was first made available on SMCA systems. Therefore, only set up the deferred error interrupt on SMCA systems and simplify the code. Guidance from hardware designers is that the LVT offsets provided from the platform should be used. The kernel should not try to enforce specific values. However, the kernel should check that an LVT offset is not reused for multiple sources. Therefore, remove the extra checking and value enforcement from the MCE code. The "reuse/conflict" case is already handled in setup_APIC_eilvt(). Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/20251104-wip-mca-updates-v8-0-66c8eacf67b9@amd.com
2025-11-05x86/mce: Unify AMD DFR handler with MCA PollingYazen Ghannam3-99/+49
AMD systems optionally support a deferred error interrupt. The interrupt should be used as another signal to trigger MCA polling. This is similar to how other MCA interrupts are handled. Deferred errors do not require any special handling related to the interrupt, e.g. resetting or rearming the interrupt, etc. However, Scalable MCA systems include a pair of registers, MCA_DESTAT and MCA_DEADDR, that should be checked for valid errors. This check should be done whenever MCA registers are polled. Currently, the deferred error interrupt does this check, but the MCA polling function does not. Call the MCA polling function when handling the deferred error interrupt. This keeps all "polling" cases in a common function. Add an SMCA status check helper. This will do the same status check and register clearing that the interrupt handler has done. And it extends the common polling flow to find AMD deferred errors. Clear the MCA_DESTAT register at the end of the handler rather than the beginning. This maintains the procedure that the 'status' register must be cleared as the final step. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/20251104-wip-mca-updates-v8-0-66c8eacf67b9@amd.com
2025-11-05arm64: tegra: Add pinctrl definitions for pcie-ep nodesNiklas Cassel1-0/+61
When the PCIe controller is running in endpoint mode, the controller initialization is triggered by a PERST# (PCIe reset) GPIO deassertion. The driver has configured an IRQ to trigger when the PERST# GPIO changes state. Without the pinctrl definition, we do not get an IRQ when PERST# is deasserted, so the PCIe controller never gets initialized. Add the missing definitions, so that the controller actually gets initialized. Fixes: ec142c44b026 ("arm64: tegra: Add P2U and PCIe controller nodes to Tegra234 DT") Fixes: 0580286d0d22 ("arm64: tegra: Add Tegra234 PCIe C4 EP definition") Signed-off-by: Niklas Cassel <cassel@kernel.org> Reviewed-by: Manikanta Maddireddy <mmaddireddy@nvidia.com> [treding@nvidia.com: add blank lines to separate blocks] Signed-off-by: Thierry Reding <treding@nvidia.com>
2025-11-05x86/mce: Unify AMD THR handler with MCA PollingYazen Ghannam1-28/+23
AMD systems optionally support an MCA thresholding interrupt. The interrupt should be used as another signal to trigger MCA polling. This is similar to how the Intel Corrected Machine Check interrupt (CMCI) is handled. AMD MCA thresholding is managed using the MCA_MISC registers within an MCA bank. The OS will need to modify the hardware error count field in order to reset the threshold limit and rearm the interrupt. Management of the MCA_MISC register should be done as a follow up to the basic MCA polling flow. It should not be the main focus of the interrupt handler. Furthermore, future systems will have the ability to send an MCA thresholding interrupt to the OS even when the OS does not manage the feature, i.e. MCA_MISC registers are Read-as-Zero/Locked. Call the common MCA polling function when handling the MCA thresholding interrupt. This will allow the OS to find any valid errors whether or not the MCA thresholding feature is OS-managed. Also, this allows the common MCA polling options and kernel parameters to apply to AMD systems. Add a callback to the MCA polling function to check and reset any threshold blocks that have reached their threshold limit. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/20251104-wip-mca-updates-v8-0-66c8eacf67b9@amd.com
2025-11-05x86/msr: Add CPU_OUT_OF_SPEC taint name to "unrecognized" pr_warn(msg)Marc Herbert1-1/+1
While restricting access, a7e1f67ed29f ("x86/msr: Filter MSR writes") also added warning and started tainting the kernel. But the warning message never mentioned tainting. Moreover, this uses the "CPU_OUT_OF_SPEC" flag which is not clearly related to MSRs: that flag is overloaded by several, fairly different situations, including some much scarier ones. So, without an expert around (thank you Dave Hansen), it would have been practically impossible to root cause the tainting from just the log file at hand. So it would be prudent to explicitly mention in the logs when the tainting happens so that debugging crashes can be made easier. Fix this by simply appending the CPU_OUT_OF_SPEC flag to the warning message. This readability issue happened when staring at logs involving the Intel Memory Latency Checker (among many other things going on in that log). The MLC disables hardware prefetch. [ bp: Massage and extend commit message. ] Signed-off-by: Marc Herbert <marc.herbert@linux.intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20251101-tainted-msr-v1-1-e00658ba04d4@linux.intel.com
2025-11-05arm64: dts: mediatek: Add GCE header for MT8196Jason-JH Lin1-0/+612
Add GCE header define for GCE Thread Priority and GCE Event IDs that used in the MT8196 dtsi. Signed-off-by: Jason-JH Lin <jason-jh.lin@mediatek.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2025-11-05arm64: dts: mediatek: mt7981b: Add reserved memory for TF-ASjoerd Simons1-0/+12
Add memory range handled by ARM Trusted Firmware Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Signed-off-by: Sjoerd Simons <sjoerd@collabora.com> Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2025-11-05arm64: dts: mediatek: mt7981b: Configure UART0 pinmuxSjoerd Simons1-0/+9
Add explicit pinctrl configuration for UART0 Signed-off-by: Sjoerd Simons <sjoerd@collabora.com> Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2025-11-05arm64: dts: exynos7870-j6lte: enable display panel supportKaustabh Chakraborty1-14/+24
Enable DECON and DSI nodes, and add the compatible display panel and appropriate panel timings for this device. Also, remove the simple-framebuffer node in favor of the panel. This device has a 720x1480 AMOLED Samsung AMS561RA01 panel with S6E8AA5X01 controller. Signed-off-by: Kaustabh Chakraborty <kauschluss@disroot.org> Link: https://patch.msgid.link/20251031-exynos7870-drm-dts-v4-5-c1f77fb16b87@disroot.org Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
2025-11-05arm64: dts: exynos7870-a2corelte: enable display panel supportKaustabh Chakraborty1-14/+43
Enable DECON and DSI nodes, and add the compatible display panel and appropriate panel timings for this device. Also, remove the simple-framebuffer node in favor of the panel. This device has a 540x960 Synaptics TD4101 display panel. Signed-off-by: Kaustabh Chakraborty <kauschluss@disroot.org> Link: https://patch.msgid.link/20251031-exynos7870-drm-dts-v4-4-c1f77fb16b87@disroot.org Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
2025-11-05arm64: dts: exynos7870-on7xelte: enable display panel supportKaustabh Chakraborty1-14/+43
Enable DECON and DSI nodes, and add the compatible display panel and appropriate panel timings for this device. Also, remove the simple-framebuffer node in favor of the panel. This device has a 1080x1920 Synaptics TD4300 display panel. Signed-off-by: Kaustabh Chakraborty <kauschluss@disroot.org> Link: https://patch.msgid.link/20251031-exynos7870-drm-dts-v4-3-c1f77fb16b87@disroot.org Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
2025-11-05arm64: dts: exynos7870: add DSI supportKaustabh Chakraborty1-0/+84
Add devicetree nodes for MIPI PHYs, Samsung's DECON and DSIM blocks, and DECON IOMMU devicetree nodes. Enables SoC support for hardware to be able to drive a MIPI DSI display. Signed-off-by: Kaustabh Chakraborty <kauschluss@disroot.org> Link: https://patch.msgid.link/20251031-exynos7870-drm-dts-v4-2-c1f77fb16b87@disroot.org Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
2025-11-05MIPS: BCM47XX: remove creating a fixed phyHeiner Kallweit1-7/+0
Now that b44 ethernet driver creates a fixed phy if needed, we can remove this here. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://patch.msgid.link/8983b705-6bca-4728-9283-7aa60f49340f@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-05m68k: coldfire: remove creating a fixed phyHeiner Kallweit1-15/+0
Now that the fec ethernet driver creates a fixed phy if needed, we can remove this here. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://patch.msgid.link/212e0cb5-a2f5-460f-8e03-3c3369d0acf1@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-05x86: uaccess: don't use runtime-const rewriting in modulesLinus Torvalds3-6/+14
The runtime-const infrastructure was never designed to handle the modular case, because the constant fixup is only done at boot time for core kernel code. But by the time I used it for the x86-64 user space limit handling in commit 86e6b1547b3d ("x86: fix user address masking non-canonical speculation issue"), I had completely repressed that fact. And it all happens to work because the only code that currently actually gets inlined by modules is for the access_ok() limit check, where the default constant value works even when not fixed up. Because at least I had intentionally made it be something that is in the non-canonical address space region. But it's technically very wrong, and it does mean that at least in theory, the use of 'access_ok()' + '__get_user()' can trigger the same speculation issue with non-canonical addresses that the original commit was all about. The pattern is unusual enough that this probably doesn't matter in practice, but very wrong is still very wrong. Also, let's fix it before the nice optimized scoped user accessor helpers that Thomas Gleixner is working on cause this pseudo-constant to then be more widely used. This all came up due to an unrelated discussion with Mateusz Guzik about using the runtime const infrastructure for names_cachep accesses too. There the modular case was much more obviously broken, and Mateusz noted it in his 'v2' of the patch series. That then made me notice how broken 'access_ok()' had been in modules all along. Mea culpa, mea maxima culpa. Fix it by simply not using the runtime-const code in modules, and just using the USER_PTR_MAX variable value instead. This is not performance-critical like the core user accessor functions (get_user() and friends) are. Also make sure this doesn't get forgotten the next time somebody wants to do runtime constant optimizations by having the x86 runtime-const.h header file error out if included by modules. Fixes: 86e6b1547b3d ("x86: fix user address masking non-canonical speculation issue") Acked-by: Borislav Petkov <bp@alien8.de> Acked-by: Sean Christopherson <seanjc@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Triggered-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/all/20251030105242.801528-1-mjguzik@gmail.com/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-11-05arm: dts: ti: omap: Drop unnecessary properties for SDHCI nodeCharan Pedumuru7-10/+0
Remove the "ti,needs-special-reset", "ti,needs-special-hs-handling", and "cap-mmc-dual-data-rate" properties from the DTS for the sdhci nodes, as the sdhci-omap driver does not depend on these properties. Signed-off-by: Charan Pedumuru <charan.pedumuru@gmail.com> Link: https://lore.kernel.org/r/20251024-ti-sdhci-omap-v5-2-df5f6f033a38@gmail.com Signed-off-by: Kevin Hilman <khilman@baylibre.com>
2025-11-05arm: dts: ti: omap: am335x-pepper: Fix vmmc-supply property typoCharan Pedumuru1-1/+1
Rectify a typo for the property "vmmc-supply" to resolve the errors detected by dtb_check. Signed-off-by: Charan Pedumuru <charan.pedumuru@gmail.com> Link: https://lore.kernel.org/r/20251024-ti-sdhci-omap-v5-1-df5f6f033a38@gmail.com Signed-off-by: Kevin Hilman <khilman@baylibre.com>
2025-11-05ARM: dts: omap3: n900: Correct obsolete TWL4030 power compatibleJihed Chaibi1-1/+1
The "ti,twl4030-power-n900" compatible string is obsolete and is not supported by any in-kernel driver. Currently, the kernel falls back to the second entry, "ti,twl4030-power-idle-osc-off", to bind a driver to this node. Make this fallback explicit by removing the obsolete board-specific compatible. This preserves the existing functionality while making the DTS compliant with the new, stricter 'ti,twl.yaml' binding. Fixes: daebabd578647 ("mfd: twl4030-power: Fix PM idle pin configuration to not conflict with regulators") Signed-off-by: Jihed Chaibi <jihed.chaibi.dev@gmail.com> Link: https://lore.kernel.org/r/20250914192516.164629-4-jihed.chaibi.dev@gmail.com Signed-off-by: Kevin Hilman <khilman@baylibre.com>