kernel/linux.git/drivers/iommu, branch v6.6.142

iommu/vt-d: Disable DMAR for Intel Q35 IGFX

2026-05-23T11:03:34+00:00

commit 2cda2e10dc8343ae01eae9e999a876b7e7d37861 upstream. Intel Q35 integrated graphics (8086:29b2) exhibits broken DMAR behaviour similar to other G4x/GM45 devices for which DMAR is already disabled via quirks. When DMAR is enabled, the system may hard lock up during boot or early device initialization, requiring a reset. Add the missing PCI ID to the existing quirk list to disable DMAR for this device. Fixes: 1f76249cc3be ("iommu/vt-d: Declare Broadwell igfx dmar support snafu") Cc: stable@vger.kernel.org Closes: https://bugzilla.kernel.org/show_bug.cgi?id=201185 Closes: https://bugzilla.kernel.org/show_bug.cgi?id=216064 Signed-off-by: Naval Alcalá Link: https://lore.kernel.org/r/20260410161622.13549-1-ari@naval.cat Signed-off-by: Lu Baolu Signed-off-by: Joerg Roedel Signed-off-by: Greg Kroah-Hartman

iommufd: vfio compatibility extension check for noiommu mode

2026-05-23T11:03:16+00:00

[ Upstream commit 7147ec874ea08c322d779d8eba28946e294ed1f3 ] VFIO_CHECK_EXTENSION should return false for TYPE1_IOMMU variants when in NO-IOMMU mode and IOMMUFD compat container is set. This change makes the behavior match VFIO_CONTAINER in noiommu mode. It also prevents userspace from incorrectly attempting to use TYPE1 IOMMU operations in a no-iommu context. Fixes: d624d6652a65 ("iommufd: vfio container FD ioctl compatibility") Link: https://patch.msgid.link/r/20260213183636.3340-1-jacob.pan@linux.microsoft.com Signed-off-by: Jacob Pan Signed-off-by: Jason Gunthorpe Signed-off-by: Sasha Levin

iommu/amd: serialize sequence allocation under concurrent TLB invalidations

2026-05-17T15:13:35+00:00

commit 9e249c48412828e807afddc21527eb734dc9bd3d upstream. With concurrent TLB invalidations, completion wait randomly gets timed out because cmd_sem_val was incremented outside the IOMMU spinlock, allowing CMD_COMPL_WAIT commands to be queued out of sequence and breaking the ordering assumption in wait_on_sem(). Move the cmd_sem_val increment under iommu->lock so completion sequence allocation is serialized with command queuing. And remove the unnecessary return. Fixes: d2a0cac10597 ("iommu/amd: move wait_on_sem() out of spinlock") Tested-by: Srikanth Aithal Reported-by: Srikanth Aithal Signed-off-by: Ankit Soni Reviewed-by: Vasant Hegde Signed-off-by: Joerg Roedel [Salvatore Bonaccorso: Backport to v6.12.y where f32fe7cb0198 ("iommu/amd: Add support to remap/unmap IOMMU buffers for kdump") is not present] Signed-off-by: Salvatore Bonaccorso Signed-off-by: Sasha Levin

iommu/amd: Use atomic64_inc_return() in iommu.c

2026-05-17T15:13:35+00:00

commit 5ce73c524f5fb5abd7b1bfed0115474b4fb437b4 upstream. Use atomic64_inc_return(&ref) instead of atomic64_add_return(1, &ref) to use optimized implementation and ease register pressure around the primitive for targets that implement optimized variant. Signed-off-by: Uros Bizjak Cc: Joerg Roedel Cc: Suravee Suthikulpanit Cc: Will Deacon Cc: Robin Murphy Reviewed-by: Jason Gunthorpe Link: https://lore.kernel.org/r/20241007084356.47799-1-ubizjak@gmail.com Signed-off-by: Joerg Roedel Signed-off-by: Salvatore Bonaccorso Stable-dep-of: 9e249c48412828e807afddc21527eb734dc9bd3d ("iommu/amd: serialize sequence allocation under concurrent TLB invalidations") Signed-off-by: Sasha Levin

iommufd: Fix a race with concurrent allocation and unmap

2026-05-17T15:13:34+00:00

commit 8602018b1f17fbdaa5e5d79f4c8603ad20640c12 upstream. iopt_unmap_iova_range() releases the lock on iova_rwsem inside the loop body when getting to the more expensive unmap operations. This is fine on its own, except the loop condition is based on the first area that matches the unmap address range. If a concurrent call to map picks an area that was unmapped in previous iterations, the loop mistakenly tries to unmap it. This is reproducible by having one userspace thread map buffers and pass them to another thread that unmaps them. The problem manifests as EBUSY errors with single page mappings. Fix this by advancing the start pointer after unmapping an area. This ensures each iteration only examines the IOVA range that remains mapped, which is guaranteed not to have overlaps. Cc: stable@vger.kernel.org Fixes: 51fe6141f0f6 ("iommufd: Data structure to provide IOVA to PFN mapping") Link: https://patch.msgid.link/r/CAAJpGJSR4r_ds1JOjmkqHtsBPyxu8GntoeW08Sk5RNQPmgi+tg@mail.gmail.com Signed-off-by: Sina Hassani Reviewed-by: Kevin Tian Signed-off-by: Jason Gunthorpe Signed-off-by: Greg Kroah-Hartman

iommu/vt-d: Fix intel iommu iotlb sync hardlockup and retry

2026-03-25T10:06:04+00:00

commit fe89277c9ceb0d6af0aa665bcf24a41d8b1b79cd upstream. During the qi_check_fault process after an IOMMU ITE event, requests at odd-numbered positions in the queue are set to QI_ABORT, only satisfying single-request submissions. However, qi_submit_sync now supports multiple simultaneous submissions, and can't guarantee that the wait_desc will be at an odd-numbered position. Therefore, if an item times out, IOMMU can't re-initiate the request, resulting in an infinite polling wait. This modifies the process by setting the status of all requests already fetched by IOMMU and recorded as QI_IN_USE status (including wait_desc requests) to QI_ABORT, thus enabling multiple requests to be resubmitted. Fixes: 8a1d82462540 ("iommu/vt-d: Multiple descriptors per qi_submit_sync()") Cc: stable@vger.kernel.org Signed-off-by: Guanghui Feng Tested-by: Shuai Xue Reviewed-by: Shuai Xue Reviewed-by: Samiullah Khawaja Link: https://lore.kernel.org/r/20260306101516.3885775-1-guanghuifeng@linux.alibaba.com Signed-off-by: Lu Baolu Fixes: 8a1d82462540 ("iommu/vt-d: Multiple descriptors per qi_submit_sync()") Signed-off-by: Joerg Roedel Signed-off-by: Greg Kroah-Hartman

iommu/vt-d: Flush dev-IOTLB only when PCIe device is accessible in scalable mode

2026-03-04T12:21:11+00:00

[ Upstream commit 10e60d87813989e20eac1f3eda30b3bae461e7f9 ] Commit 4fc82cd907ac ("iommu/vt-d: Don't issue ATS Invalidation request when device is disconnected") relies on pci_dev_is_disconnected() to skip ATS invalidation for safely-removed devices, but it does not cover link-down caused by faults, which can still hard-lock the system. For example, if a VM fails to connect to the PCIe device, "virsh destroy" is executed to release resources and isolate the fault, but a hard-lockup occurs while releasing the group fd. Call Trace: qi_submit_sync qi_flush_dev_iotlb intel_pasid_tear_down_entry device_block_translation blocking_domain_attach_dev __iommu_attach_device __iommu_device_set_domain __iommu_group_set_domain_internal iommu_detach_group vfio_iommu_type1_detach_group vfio_group_detach_container vfio_group_fops_release __fput Although pci_device_is_present() is slower than pci_dev_is_disconnected(), it still takes only ~70 µs on a ConnectX-5 (8 GT/s, x2) and becomes even faster as PCIe speed and width increase. Besides, devtlb_invalidation_with_pasid() is called only in the paths below, which are far less frequent than memory map/unmap. 1. mm-struct release 2. {attach,release}_dev 3. set/remove PASID 4. dirty-tracking setup The gain in system stability far outweighs the negligible cost of using pci_device_is_present() instead of pci_dev_is_disconnected() to decide when to skip ATS invalidation, especially under GDR high-load conditions. Fixes: 4fc82cd907ac ("iommu/vt-d: Don't issue ATS Invalidation request when device is disconnected") Cc: stable@vger.kernel.org Signed-off-by: Jinhui Guo Link: https://lore.kernel.org/r/20251211035946.2071-3-guojinhui.liam@bytedance.com Signed-off-by: Lu Baolu Signed-off-by: Joerg Roedel Signed-off-by: Sasha Levin

iommu/amd: move wait_on_sem() out of spinlock

2026-03-04T12:20:45+00:00

[ Upstream commit d2a0cac10597068567d336e85fa3cbdbe8ca62bf ] With iommu.strict=1, the existing completion wait path can cause soft lockups under stressed environment, as wait_on_sem() busy-waits under the spinlock with interrupts disabled. Move the completion wait in iommu_completion_wait() out of the spinlock. wait_on_sem() only polls the hardware-updated cmd_sem and does not require iommu->lock, so holding the lock during the busy wait unnecessarily increases contention and extends the time with interrupts disabled. Signed-off-by: Ankit Soni Reviewed-by: Vasant Hegde Signed-off-by: Joerg Roedel Signed-off-by: Sasha Levin

iommu/arm-smmu-v3: Improve CMDQ lock fairness and efficiency

2026-03-04T12:20:44+00:00

[ Upstream commit df180b1a4cc51011c5f8c52c7ec02ad2e42962de ] The SMMU CMDQ lock is highly contentious when there are multiple CPUs issuing commands and the queue is nearly full. The lock has the following states: - 0: Unlocked - >0: Shared lock held with count - INT_MIN+N: Exclusive lock held, where N is the # of shared waiters - INT_MIN: Exclusive lock held, no shared waiters When multiple CPUs are polling for space in the queue, they attempt to grab the exclusive lock to update the cons pointer from the hardware. If they fail to get the lock, they will spin until either the cons pointer is updated by another CPU. The current code allows the possibility of shared lock starvation if there is a constant stream of CPUs trying to grab the exclusive lock. This leads to severe latency issues and soft lockups. Consider the following scenario where CPU1's attempt to acquire the shared lock is starved by CPU2 and CPU0 contending for the exclusive lock. CPU0 (exclusive) | CPU1 (shared) | CPU2 (exclusive) | `cmdq->lock` -------------------------------------------------------------------------- trylock() //takes | | | 0 | shared_lock() | | INT_MIN | fetch_inc() | | INT_MIN | no return | | INT_MIN + 1 | spins // VAL >= 0 | | INT_MIN + 1 unlock() | spins... | | INT_MIN + 1 set_release(0) | spins... | | 0 see[NOTE] (done) | (sees 0) | trylock() // takes | 0 | *exits loop* | cmpxchg(0, INT_MIN) | 0 | | *cuts in* | INT_MIN | cmpxchg(0, 1) | | INT_MIN | fails // != 0 | | INT_MIN | spins // VAL >= 0 | | INT_MIN | *starved* | | INT_MIN [NOTE] The current code resets the exclusive lock to 0 regardless of the state of the lock. This causes two problems: 1. It opens the possibility of back-to-back exclusive locks and the downstream effect of starving shared lock. 2. The count of shared lock waiters are lost. To mitigate this, we release the exclusive lock by only clearing the sign bit while retaining the shared lock waiter count as a way to avoid starving the shared lock waiters. Also deleted cmpxchg loop while trying to acquire the shared lock as it is not needed. The waiters can see the positive lock count and proceed immediately after the exclusive lock is released. Exclusive lock is not starved in that submitters will try exclusive lock first when new spaces become available. Reviewed-by: Mostafa Saleh Reviewed-by: Nicolin Chen Signed-off-by: Alexander Grest Signed-off-by: Jacob Pan Signed-off-by: Will Deacon Signed-off-by: Sasha Levin

iommu/vt-d: Flush cache for PASID table before using it

2026-03-04T12:19:46+00:00

[ Upstream commit 22d169bdd2849fe6bd18c2643742e1c02be6451c ] When writing the address of a freshly allocated zero-initialized PASID table to a PASID directory entry, do that after the CPU cache flush for this PASID table, not before it, to avoid the time window when this PASID table may be already used by non-coherent IOMMU hardware while its contents in RAM is still some random old data, not zero-initialized. Fixes: 194b3348bdbb ("iommu/vt-d: Fix PASID directory pointer coherency") Signed-off-by: Dmytro Maluka Reviewed-by: Kevin Tian Link: https://lore.kernel.org/r/20251221123508.37495-1-dmaluka@chromium.org Signed-off-by: Lu Baolu Signed-off-by: Joerg Roedel Signed-off-by: Sasha Levin