kernel/linux.git/include/linux/iommu.h, branch v7.2-rc1

Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd

2026-06-17T19:33:23+00:00

Pull iommufd updates from Jason Gunthorpe: "All various fixes: - Typo breaking the veventq uAPI for 32 bit userspace - Several Sashiko found errors in the veventq and fault fd paths - Fix incorrect use of dmabuf locks, and possible races with iommufd destroy and dmabuf revoke - Sashiko errors found in the uAPI validation for IOMMU_HWPT_INVALIDATE" * tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: iommu: Avoid copying the user array twice in the full-array copy helper iommufd/selftest: Add invalidation entry_num and entry_len boundary tests iommufd: Set upper bounds on cache invalidation entry_num and entry_len iommufd: Clarify IOAS_MAP_FILE dma-buf support iommufd: Destroy the pages content after detaching from dmabuf iommufd: Take dma_resv lock before dma_buf_unpin() in release path iommufd/selftest: Cover invalid read counts on vEVENTQ FD iommufd: Avoid partial fault group delivery in iommufd_fault_fops_read() iommufd: Break the loop on failure in iommufd_fault_fops_read() iommufd: Reject invalid read count in iommufd_fault_fops_read() iommufd: Propagate allocation failure in iommufd_veventq_deliver_fetch() iommufd: Reject invalid read count in iommufd_veventq_fops_read() iommufd: Rewind header length in done if iommufd_veventq_fops_read() fails iommufd/selftest: Add boundary tests for veventq_depth iommufd: Set veventq_depth upper bound iommufd: Move vevent memory allocation outside spinlock iommufd: Fix data_len byte-count vs element-count mismatch iommufd: Use sizeof(*hdr) instead of sizeof(hdr) in veventq read

iommu: Avoid copying the user array twice in the full-array copy helper

2026-06-12T13:44:44+00:00

iommu_copy_struct_from_full_user_array() copies a whole user array into a kernel buffer. In the common case, where user entry_len equals destination entry size, it takes a fast path and copies the whole array with a single copy_from_user(). That fast path does not return, so it falls through into the item-by-item copy_struct_from_user() loop and copies every entry a second time. For an equal entry_len that loop is just a copy_from_user() of the same bytes, so the whole array is copied twice for no benefit. Return right after the bulk copy. The per-item loop then runs only on the slow path, where entry_len differs and each entry needs size adaption. Fixes: 4f2e59ccb698 ("iommu: Add iommu_copy_struct_from_full_user_array helper") Link: https://patch.msgid.link/r/6c9eca4ff584cb977661e97799ac6fe934e7f51c.1780521606.git.nicolinc@nvidia.com Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Nicolin Chen Reviewed-by: Lu Baolu Signed-off-by: Jason Gunthorpe

iommupt: Add PT_FEAT_DETAILED_GATHER

2026-05-19T08:45:38+00:00

Generating the ARM SMMUv3 and RISC-V invalidation commands optimally requires some additional details from iommupt: - leaf_levels_bitmap is used to compute the ARM Range Invalidation Table Top Level hint - leaf_levels_bitmap is also used to compute the stride when generating single invalidations to invalidate once per leaf - table_levels_bitmap also computes the ARM TTL for future cases when there are no leaves Put these under a feature since only two drivers need to calculate them. This is also useful for the coming kunit iotlb invalidation test to know more about what invalidation is happening. Signed-off-by: Jason Gunthorpe Reviewed-by: Pranjal Shrivastava Tested-by: Andrew Jones Signed-off-by: Joerg Roedel

iommu: Split the kdoc comment for struct iommu_iotlb_gather

2026-05-19T08:45:38+00:00

Use in-line member documentation and add some small clarifications to the members. This is preparation to add more members. - Note that pgsize is only used by arm-smmuv3 - Note that freelist is only used by iommupt - Reword queued to emphasize the flush-all behavior Signed-off-by: Jason Gunthorpe Reviewed-by: Pranjal Shrivastava Tested-by: Andrew Jones Signed-off-by: Joerg Roedel

Merge branches 'fixes', 'arm/smmu/updates', 'arm/smmu/bindings', 'riscv', 'intel/vt-d', 'amd/amd-vi' and 'core' into next

2026-04-09T12:18:27+00:00

iommu: Do not call drivers for empty gathers

2026-03-27T08:07:13+00:00

An empty gather is coded with start=U64_MAX, end=0 and several drivers go on to convert that to a size with: end - start + 1 Which gives 2 for an empty gather. This then causes Weird Stuff to happen (for example an UBSAN splat in VT-d) that is hopefully harmless, but maybe not. Prevent drivers from being called right in iommu_iotlb_sync(). Auditing shows that AMD, Intel, Mediatek and RSIC-V drivers all do things on these empty gathers. Further, there are several callers that can trigger empty gathers, especially in unusual conditions. For example iommu_map_nosync() will call a 0 size unmap on some error paths. Also in VFIO, iommupt and other places. Cc: stable@vger.kernel.org Reported-by: Janusz Krzysztofik Closes: https://lore.kernel.org/r/11145826.aFP6jjVeTY@jkrzyszt-mobl2.ger.corp.intel.com Signed-off-by: Jason Gunthorpe Reviewed-by: Lu Baolu Reviewed-by: Samiullah Khawaja Reviewed-by: Robin Murphy Reviewed-by: Vasant Hegde Signed-off-by: Joerg Roedel

iommu: Add device ATS supported capability

2026-03-17T13:05:05+00:00

PCIe ATS may be disabled by platform firmware, root complex limitations, or kernel policy even when a device advertises the ATS capability in its PCI configuration space. Add a new IOMMU_CAP_PCI_ATS_SUPPORTED capability to allow IOMMU drivers to report the effective ATS decision for a device. When this capability is true for a device, ATS may be enabled for that device, but it does not imply that ATS is currently enabled. A subsequent patch will extend iommufd to expose the effective ATS status to userspace. Suggested-by: Jason Gunthorpe Reviewed-by: Jason Gunthorpe Signed-off-by: Shameer Kolothum Signed-off-by: Joerg Roedel

iommupt: Directly call iommupt's unmap_range()

2026-03-17T12:57:39+00:00

The common algorithm in iommupt does not require the iommu_pgsize() calculations, it can directly unmap any arbitrary range. Add a new function pointer to directly call an iommupt unmap_range op and make __iommu_unmap() call it directly. Gives about a 5% gain on single page unmappings. The function pointer is run through pt_iommu_ops instead of iommu_domain_ops to discourage using it outside iommupt. All drivers with their own page tables should continue to use the simplified map/unmap_pages() style interfaces. Reviewed-by: Samiullah Khawaja Reviewed-by: Kevin Tian Signed-off-by: Jason Gunthorpe Reviewed-by: Lu Baolu Signed-off-by: Joerg Roedel

iommu: Introduce pci_dev_reset_iommu_prepare/done()

2026-01-10T09:26:44+00:00

PCIe permits a device to ignore ATS invalidation TLPs while processing a reset. This creates a problem visible to the OS where an ATS invalidation command will time out. E.g. an SVA domain will have no coordination with a reset event and can racily issue ATS invalidations to a resetting device. The OS should do something to mitigate this as we do not want production systems to be reporting critical ATS failures, especially in a hypervisor environment. Broadly, OS could arrange to ignore the timeouts, block page table mutations to prevent invalidations, or disable and block ATS. The PCIe r6.0, sec 10.3.1 IMPLEMENTATION NOTE recommends SW to disable and block ATS before initiating a Function Level Reset. It also mentions that other reset methods could have the same vulnerability as well. Provide a callback from the PCI subsystem that will enclose the reset and have the iommu core temporarily change all the attached RID/PASID domains group->blocking_domain so that the IOMMU hardware would fence any incoming ATS queries. And IOMMU drivers should also synchronously stop issuing new ATS invalidations and wait for all ATS invalidations to complete. This can avoid any ATS invaliation timeouts. However, if there is a domain attachment/replacement happening during an ongoing reset, ATS routines may be re-activated between the two function calls. So, introduce a new resetting_domain in the iommu_group structure to reject any concurrent attach_dev/set_dev_pasid call during a reset for a concern of compatibility failure. Since this changes the behavior of an attach operation, update the uAPI accordingly. Note that there are two corner cases: 1. Devices in the same iommu_group Since an attachment is always per iommu_group, this means that any sibling devices in the iommu_group cannot change domain, to prevent race conditions. 2. An SR-IOV PF that is being reset while its VF is not In such case, the VF itself is already broken. So, there is no point in preventing PF from going through the iommu reset. Reviewed-by: Lu Baolu Reviewed-by: Kevin Tian Reviewed-by: Jason Gunthorpe Tested-by: Dheeraj Kumar Srivastava Signed-off-by: Nicolin Chen Signed-off-by: Joerg Roedel

iommu: Add iommu_driver_get_domain_for_dev() helper

2026-01-10T09:26:43+00:00

There is a need to stage a resetting PCI device to temporarily the blocked domain and then attach back to its previously attached domain after reset. This can be simply done by keeping the "previously attached domain" in the iommu_group->domain pointer while adding an iommu_group->resetting_domain, which gives troubles to IOMMU drivers using the iommu_get_domain_for_dev() for a device's physical domain in order to program IOMMU hardware. And in such for-driver use cases, the iommu_group->mutex must be held, so it doesn't fit in external callers that don't hold the iommu_group->mutex. Introduce a new iommu_driver_get_domain_for_dev() helper, exclusively for driver use cases that hold the iommu_group->mutex, to separate from those external use cases. Add a lockdep_assert_not_held to the existing iommu_get_domain_for_dev() and highlight that in a kdoc. Reviewed-by: Kevin Tian Reviewed-by: Lu Baolu Reviewed-by: Jason Gunthorpe Tested-by: Dheeraj Kumar Srivastava Signed-off-by: Nicolin Chen Signed-off-by: Joerg Roedel