| author | Jason Gunthorpe <jgg@nvidia.com> | 2025-11-04 21:30:05 +0300 |
|---|---|---|
| committer | Joerg Roedel <joerg.roedel@amd.com> | 2025-11-05 11:07:10 +0300 |
| commit | dcd6a011a8d523a114af2360a8753de5bd60c139 | |
| tree | 41c1bc03c2c8835769d0b7e472a023171c2ecd00 /include/linux | |
| parent | 7c53f4238aa8bfb476e177263133ead2eeb8d55d | |
iommupt: Add map_pages op
map is slightly complicated because it has to handle a number of special
edge cases:
- Overmapping a previously shared, but now empty, table level with an OA.
  Requires validating and freeing the possibly-empty tables
- Doing the above across an entire to-be-created contiguous entry
- Installing a new shared table level concurrently with another thread
- Expanding the table by adding more top levels
Table expansion is a unique feature of AMDv1; this version is quite
similar, except that it also handles racing concurrent lockless maps. The
table top pointer and starting level are encoded in a single uintptr_t,
which ensures it can be READ_ONCE()'d without tearing. Every op does the
READ_ONCE() and uses that fixed point as its starting point. Concurrent
expansion is handled with a table-global spinlock.
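
A minimal sketch of the encoded-top idea, assuming a hypothetical
pt_common_sketch structure and helper names (pt_top_encode(), pt_top_table(),
pt_top_level(), pt_start_walk(), pt_expand_top()); in the real code the
expansion lock is supplied by the driver via get_top_lock() (see the header
diff below) and the layout differs, but the single-word READ_ONCE()/WRITE_ONCE()
publication is the point being illustrated:

```c
#include <linux/kernel.h>
#include <linux/spinlock.h>

#define PT_TOP_LEVEL_MASK 0x7UL /* low bits of an aligned table pointer */

struct pt_common_sketch {
        uintptr_t top_of_table;   /* table pointer | starting level */
        spinlock_t expand_lock;   /* serializes table expansion */
};

static inline uintptr_t pt_top_encode(void *table, unsigned int level)
{
        /* Table memory is page aligned, so the low bits can carry the level */
        return (uintptr_t)table | level;
}

static inline void *pt_top_table(uintptr_t top)
{
        return (void *)(top & ~PT_TOP_LEVEL_MASK);
}

static inline unsigned int pt_top_level(uintptr_t top)
{
        return top & PT_TOP_LEVEL_MASK;
}

/* Every op samples the top exactly once and walks from that fixed point */
static unsigned int pt_start_walk(struct pt_common_sketch *common, void **table)
{
        uintptr_t top = READ_ONCE(common->top_of_table);

        *table = pt_top_table(top);
        return pt_top_level(top);
}

/* Expansion publishes the new (table, level) pair in one untorn store */
static void pt_expand_top(struct pt_common_sketch *common, void *new_top,
                          unsigned int new_level)
{
        spin_lock(&common->expand_lock);
        /* ... link the old top underneath new_top before publishing it ... */
        WRITE_ONCE(common->top_of_table, pt_top_encode(new_top, new_level));
        spin_unlock(&common->expand_lock);
}
```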
When inserting a new table entry, map checks that the entire portion of
the table covered by the new entry is empty. This includes freeing any
empty lower tables that will be overwritten by an OA. A separate free list
is used while checking and collecting all the empty lower tables so that
writing the new entry is uninterrupted: either the new entry is fully
written or nothing changes.
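
The collect-then-commit ordering can be sketched as below;
pt_collect_empty_tables(), pt_write_oa_entry() and pt_free_collected() are
hypothetical stand-ins for the real walker, which also has to deal with the
contiguous-entry and locking details:

```c
#include <linux/errno.h>
#include <linux/list.h>
#include <linux/types.h>

/* Hypothetical stand-ins for the real walker helpers */
int pt_collect_empty_tables(void *entry, struct list_head *free_list);
void pt_write_oa_entry(void *entry, phys_addr_t oa, unsigned int prot);
void pt_free_collected(struct list_head *free_list);

static int pt_overmap_entry(void *entry, phys_addr_t oa, unsigned int prot)
{
        LIST_HEAD(free_list);
        int ret;

        /*
         * Walk everything below this entry: fail if any OA is present,
         * otherwise remember the empty lower tables on a private free
         * list. The live table is not modified by this step.
         */
        ret = pt_collect_empty_tables(entry, &free_list);
        if (ret)
                return ret;

        /*
         * A single store installs the OA and disconnects the old empty
         * sub-tables at once: either the new entry fully writes or the
         * table is left unchanged.
         */
        pt_write_oa_entry(entry, oa, prot);

        /* Only now is it safe to free the collected table memory */
        pt_free_collected(&free_list);
        return 0;
}
```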
A special fast path for PAGE_SIZE is implemented that does a direct walk
to the leaf level and installs a single entry. This gives ~15% improvement
for iommu_map() when mapping lists of single pages.
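
The fast path is roughly the shape below (helper names hypothetical): one
descent, one emptiness check, one store, with none of the range-splitting
machinery on the way:

```c
#include <linux/errno.h>
#include <linux/types.h>

struct pt_iommu;

/* Hypothetical stand-ins for the real walker helpers */
void *pt_walk_to_leaf(struct pt_iommu *iommu_table, unsigned long iova,
                      gfp_t gfp);
bool pt_entry_is_empty(const void *entry);
void pt_write_leaf_entry(void *entry, phys_addr_t paddr, int prot);

/* Single PAGE_SIZE mapping: skip the general range machinery entirely */
static int pt_map_single_page(struct pt_iommu *iommu_table, unsigned long iova,
                              phys_addr_t paddr, int prot, gfp_t gfp)
{
        void *entry = pt_walk_to_leaf(iommu_table, iova, gfp);

        if (!entry)
                return -ENOMEM;
        if (!pt_entry_is_empty(entry))
                return -EADDRINUSE;
        pt_write_leaf_entry(entry, paddr, prot);
        return 0;
}
```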
This version sits under the iommu_domain_ops as map_pages() but does not
require the external page size calculation. The implementation is actually
map_range() and can do arbitrary ranges, internally handling all the
validation and supporting any arrangement of page sizes. A future series
can optimize iommu_map() to take advantage of this.
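
For illustration, a map_pages() op over a range-based implementation could
look like the sketch below; pt_iommu_map_range() and domain_to_pt_iommu() are
hypothetical names, while the op signature matches the pt_iommu_<fmt>_map_pages()
prototypes added in the header diff below:

```c
#include <linux/iommu.h>
#include <linux/generic_pt/iommu.h>

/* Hypothetical internal range implementation and domain accessor */
int pt_iommu_map_range(struct pt_iommu *iommu_table, unsigned long iova,
                       phys_addr_t paddr, size_t len, int prot, gfp_t gfp);
struct pt_iommu *domain_to_pt_iommu(struct iommu_domain *domain);

static int example_map_pages(struct iommu_domain *domain, unsigned long iova,
                             phys_addr_t paddr, size_t pgsize, size_t pgcount,
                             int prot, gfp_t gfp, size_t *mapped)
{
        size_t len = pgsize * pgcount;
        int ret;

        /*
         * The range implementation validates iova/paddr/len itself and
         * chooses whatever mix of page sizes fits, so the caller does not
         * have to split the request by supported page size.
         */
        ret = pt_iommu_map_range(domain_to_pt_iommu(domain), iova, paddr, len,
                                 prot, gfp);
        if (ret)
                return ret;
        *mapped = len;
        return 0;
}
```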
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Samiullah Khawaja <skhawaja@google.com>
Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Tested-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Diffstat (limited to 'include/linux')
| -rw-r--r-- | include/linux/generic_pt/iommu.h | 59 |
1 file changed, 59 insertions, 0 deletions
diff --git a/include/linux/generic_pt/iommu.h b/include/linux/generic_pt/iommu.h
index ceb6bc9cea37..0d59423024d5 100644
--- a/include/linux/generic_pt/iommu.h
+++ b/include/linux/generic_pt/iommu.h
@@ -11,6 +11,7 @@
 struct iommu_iotlb_gather;
 struct pt_iommu_ops;
+struct pt_iommu_driver_ops;
 
 /**
  * DOC: IOMMU Radix Page Table
@@ -44,6 +45,12 @@ struct pt_iommu {
        const struct pt_iommu_ops *ops;
 
        /**
+        * @driver_ops: Function pointers provided by the HW driver to help
+        * manage HW details like caches.
+        */
+       const struct pt_iommu_driver_ops *driver_ops;
+
+       /**
         * @nid: Node ID to use for table memory allocations. The IOMMU driver
         * may want to set the NID to the device's NID, if there are multiple
         * table walkers.
@@ -84,6 +91,53 @@ struct pt_iommu_ops {
        void (*deinit)(struct pt_iommu *iommu_table);
 };
 
+/**
+ * struct pt_iommu_driver_ops - HW IOTLB cache flushing operations
+ *
+ * The IOMMU driver should implement these using container_of(iommu_table) to
+ * get to it's iommu_domain derived structure. All ops can be called in atomic
+ * contexts as they are buried under DMA API calls.
+ */
+struct pt_iommu_driver_ops {
+       /**
+        * @change_top: Update the top of table pointer
+        * @iommu_table: Table to operate on
+        * @top_paddr: New CPU physical address of the top pointer
+        * @top_level: IOMMU PT level of the new top
+        *
+        * Called under the get_top_lock() spinlock. The driver must update all
+        * HW references to this domain with a new top address and
+        * configuration. On return mappings placed in the new top must be
+        * reachable by the HW.
+        *
+        * top_level encodes the level in IOMMU PT format, level 0 is the
+        * smallest page size increasing from there. This has to be translated
+        * to any HW specific format. During this call the new top will not be
+        * visible to any other API.
+        *
+        * This op is only used by PT_FEAT_DYNAMIC_TOP, and is required if
+        * enabled.
+        */
+       void (*change_top)(struct pt_iommu *iommu_table, phys_addr_t top_paddr,
+                          unsigned int top_level);
+
+       /**
+        * @get_top_lock: lock to hold when changing the table top
+        * @iommu_table: Table to operate on
+        *
+        * Return a lock to hold when changing the table top page table from
+        * being stored in HW. The lock will be held prior to calling
+        * change_top() and released once the top is fully visible.
+        *
+        * Typically this would be a lock that protects the iommu_domain's
+        * attachment list.
+        *
+        * This op is only used by PT_FEAT_DYNAMIC_TOP, and is required if
+        * enabled.
+        */
+       spinlock_t *(*get_top_lock)(struct pt_iommu *iommu_table);
+};
+
 static inline void pt_iommu_deinit(struct pt_iommu *iommu_table)
 {
        /*
@@ -120,6 +174,10 @@ struct pt_iommu_cfg {
 #define IOMMU_PROTOTYPES(fmt)                                                 \
        phys_addr_t pt_iommu_##fmt##_iova_to_phys(struct iommu_domain *domain, \
                                                  dma_addr_t iova);            \
+       int pt_iommu_##fmt##_map_pages(struct iommu_domain *domain,            \
+                                      unsigned long iova, phys_addr_t paddr,  \
+                                      size_t pgsize, size_t pgcount,          \
+                                      int prot, gfp_t gfp, size_t *mapped);   \
        size_t pt_iommu_##fmt##_unmap_pages(                                   \
                struct iommu_domain *domain, unsigned long iova,               \
                size_t pgsize, size_t pgcount,                                 \
@@ -142,6 +200,7 @@ struct pt_iommu_cfg {
  */
 #define IOMMU_PT_DOMAIN_OPS(fmt)                                              \
        .iova_to_phys = &pt_iommu_##fmt##_iova_to_phys,                        \
+       .map_pages = &pt_iommu_##fmt##_map_pages,                              \
        .unmap_pages = &pt_iommu_##fmt##_unmap_pages
 
 /*
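
As a usage note, a driver enabling PT_FEAT_DYNAMIC_TOP would supply the two
new ops roughly as sketched below; struct example_domain, its devices_lock
and example_write_hw_top() are hypothetical, while struct pt_iommu,
@driver_ops and the container_of() pattern come from the header above:

```c
#include <linux/container_of.h>
#include <linux/spinlock.h>
#include <linux/generic_pt/iommu.h>

struct example_domain {
        struct pt_iommu iommu;          /* embedded generic_pt table */
        spinlock_t devices_lock;        /* protects the attached-device list */
        /* ... attached device list, HW context ... */
};

/* Hypothetical: push a new table root/level to every attached device */
void example_write_hw_top(struct example_domain *domain, phys_addr_t top_paddr,
                          unsigned int top_level);

static void example_change_top(struct pt_iommu *iommu_table,
                               phys_addr_t top_paddr, unsigned int top_level)
{
        struct example_domain *domain =
                container_of(iommu_table, struct example_domain, iommu);

        /*
         * Runs under the lock returned by get_top_lock(). On return the
         * mappings placed in the new top must be reachable by the HW.
         */
        example_write_hw_top(domain, top_paddr, top_level);
}

static spinlock_t *example_get_top_lock(struct pt_iommu *iommu_table)
{
        struct example_domain *domain =
                container_of(iommu_table, struct example_domain, iommu);

        /* Typically the lock protecting the domain's attachment list */
        return &domain->devices_lock;
}

static const struct pt_iommu_driver_ops example_driver_ops = {
        .change_top = example_change_top,
        .get_top_lock = example_get_top_lock,
};
```

The driver would point iommu_table->driver_ops at example_driver_ops before
initializing the table so the expansion path can find both the lock and the
HW update hook.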
