<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/include/linux/generic_pt, branch v6.19.11</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.19.11</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.19.11'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2025-11-28T07:43:55+00:00</updated>
<entry>
<title>iommupt/vtd: Support mgaw's less than a 4 level walk for first stage</title>
<updated>2025-11-28T07:43:55+00:00</updated>
<author>
<name>Jason Gunthorpe</name>
<email>jgg@nvidia.com</email>
</author>
<published>2025-11-27T23:54:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1eb0ae6fbd544619c50b4a4d96ccb4676cac03cb'/>
<id>urn:sha1:1eb0ae6fbd544619c50b4a4d96ccb4676cac03cb</id>
<content type='text'>
If the IOVA is limited to less than 48 the page table will be constructed
with a 3 level configuration which is unsupported by hardware.

Like the second stage the caller needs to pass in both the top_level an
the vasz to specify a table that has more levels than required to hold the
IOVA range.

Fixes: 6cbc09b7719e ("iommu/vt-d: Restore previous domain::aperture_end calculation")
Reported-by: Calvin Owens &lt;calvin@wbinvd.org&gt;
Closes: https://lore.kernel.org/r/8f257d2651eb8a4358fcbd47b0145002e5f1d638.1764237717.git.calvin@wbinvd.org
Signed-off-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Reviewed-by: Lu Baolu &lt;baolu.lu@linux.intel.com&gt;
Tested-by: Calvin Owens &lt;calvin@wbinvd.org&gt;
Signed-off-by: Joerg Roedel &lt;joerg.roedel@amd.com&gt;
</content>
</entry>
<entry>
<title>iommupt/vtd: Allow VT-d to have a larger table top than the vasz requires</title>
<updated>2025-11-28T07:43:03+00:00</updated>
<author>
<name>Jason Gunthorpe</name>
<email>jgg@nvidia.com</email>
</author>
<published>2025-11-27T23:54:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d856f9d27885c499d96ab7fe506083346ccf145d'/>
<id>urn:sha1:d856f9d27885c499d96ab7fe506083346ccf145d</id>
<content type='text'>
VT-d second stage HW specifies both the maximum IOVA and the supported
table walk starting points. Weirdly there is HW that only supports a 4
level walk but has a maximum IOVA that only needs 3.

The current code miscalculates this and creates a wrongly sized page table
which ultimately fails the compatibility check for number of levels.

This is fixed by allowing the page table to be created with both a vasz
and top_level input. The vasz will set the aperture for the domain while
the top_level will set the page table geometry.

Add top_level to vtdss and correct the logic in VT-d to generate the right
top_level and vasz from mgaw and sagaw.

Fixes: d373449d8e97 ("iommu/vt-d: Use the generic iommu page table")
Reported-by: Calvin Owens &lt;calvin@wbinvd.org&gt;
Closes: https://lore.kernel.org/r/8f257d2651eb8a4358fcbd47b0145002e5f1d638.1764237717.git.calvin@wbinvd.org
Signed-off-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Reviewed-by: Lu Baolu &lt;baolu.lu@linux.intel.com&gt;
Tested-by: Calvin Owens &lt;calvin@wbinvd.org&gt;
Signed-off-by: Joerg Roedel &lt;joerg.roedel@amd.com&gt;
</content>
</entry>
<entry>
<title>iommupt: Add the Intel VT-d second stage page table format</title>
<updated>2025-11-05T08:50:17+00:00</updated>
<author>
<name>Jason Gunthorpe</name>
<email>jgg@nvidia.com</email>
</author>
<published>2025-10-23T18:22:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5448c1558f60d4051c90938f2878c6fb20e2982a'/>
<id>urn:sha1:5448c1558f60d4051c90938f2878c6fb20e2982a</id>
<content type='text'>
The VT-d second stage format is almost the same as the x86 PAE format,
except the bit encodings in the PTE are different and a few new PTE
features, like force coherency are present.

Among all the formats it is unique in not having a designated present bit.

Comparing the performance of several operations to the existing version:

iommu_map()
   pgsz  ,avg new,old ns, min new,old ns  , min % (+ve is better)
     2^12,     53,66    ,      50,64      ,  21.21
     2^21,     59,70    ,      56,67      ,  16.16
     2^30,     54,66    ,      52,63      ,  17.17
 256*2^12,    384,524   ,     337,516     ,  34.34
 256*2^21,    387,632   ,     336,626     ,  46.46
 256*2^30,    376,629   ,     323,623     ,  48.48

iommu_unmap()
   pgsz  ,avg new,old ns, min new,old ns  , min % (+ve is better)
     2^12,     67,86    ,      63,84      ,  25.25
     2^21,     64,84    ,      59,80      ,  26.26
     2^30,     59,78    ,      56,74      ,  24.24
 256*2^12,    216,335   ,     198,317     ,  37.37
 256*2^21,    245,350   ,     232,344     ,  32.32
 256*2^30,    248,345   ,     226,339     ,  33.33

Cc: Tina Zhang &lt;tina.zhang@intel.com&gt;
Cc: Kevin Tian &lt;kevin.tian@intel.com&gt;
Cc: Lu Baolu &lt;baolu.lu@linux.intel.com&gt;
Reviewed-by: Lu Baolu &lt;baolu.lu@linux.intel.com&gt;
Reviewed-by: Kevin Tian &lt;kevin.tian@intel.com&gt;
Signed-off-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Signed-off-by: Joerg Roedel &lt;joerg.roedel@amd.com&gt;
</content>
</entry>
<entry>
<title>iommupt: Use the incoherent start/stop functions for PT_FEAT_DMA_INCOHERENT</title>
<updated>2025-11-05T08:47:44+00:00</updated>
<author>
<name>Jason Gunthorpe</name>
<email>jgg@nvidia.com</email>
</author>
<published>2025-10-23T18:22:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=aefd967dab6469f5b827b59e50016a760dcc1fbc'/>
<id>urn:sha1:aefd967dab6469f5b827b59e50016a760dcc1fbc</id>
<content type='text'>
This is the first step to supporting an incoherent walker, start and stop
the incoherence around the allocation and frees of the page table memory.

The iommu_pages API maps this to dma_map/unmap_single(), or arch cache
flushing calls.

Reviewed-by: Lu Baolu &lt;baolu.lu@linux.intel.com&gt;
Reviewed-by: Kevin Tian &lt;kevin.tian@intel.com&gt;
Signed-off-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Signed-off-by: Joerg Roedel &lt;joerg.roedel@amd.com&gt;
</content>
</entry>
<entry>
<title>iommupt: Add the x86 64 bit page table format</title>
<updated>2025-11-05T08:07:14+00:00</updated>
<author>
<name>Jason Gunthorpe</name>
<email>jgg@nvidia.com</email>
</author>
<published>2025-11-04T18:30:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=aef5de756ea871ab44e3a1a87be6c944e6587c51'/>
<id>urn:sha1:aef5de756ea871ab44e3a1a87be6c944e6587c51</id>
<content type='text'>
This is used by x86 CPUs and can be used in AMD/VT-d x86 IOMMUs. When a
x86 IOMMU is running SVA the MM will be using this format.

This implementation follows the AMD v2 io-pgtable version.

There is nothing remarkable here, the format can have 4 or 5 levels and
limited support for different page sizes. No contiguous pages support.

x86 uses a sign extension mechanism where the top bits of the VA must
match the sign bit. The core code supports this through
PT_FEAT_SIGN_EXTEND which creates and upper and lower VA range. All the
new operations will work correctly in both spaces, however currently there
is no way to report the upper space to other layers. Future patches can
improve that.

In principle this can support 3 page tables levels matching the 32 bit PAE
table format, but no iommu driver needs this. The focus is on the modern
64 bit 4 and 5 level formats.

Comparing the performance of several operations to the existing version:

iommu_map()
   pgsz  ,avg new,old ns, min new,old ns  , min % (+ve is better)
     2^12,     71,61    ,      66,58      , -13.13
     2^21,     66,60    ,      61,55      , -10.10
     2^30,     59,56    ,      56,54      ,  -3.03
 256*2^12,    392,1360  ,     345,1289    ,  73.73
 256*2^21,    383,1159  ,     335,1145    ,  70.70
 256*2^30,    378,965   ,     331,892     ,  62.62

iommu_unmap()
   pgsz  ,avg new,old ns, min new,old ns  , min % (+ve is better)
     2^12,     77,71    ,      73,68      ,  -7.07
     2^21,     76,70    ,      70,66      ,  -6.06
     2^30,     69,66    ,      66,63      ,  -4.04
 256*2^12,    225,899   ,     210,870     ,  75.75
 256*2^21,    262,722   ,     248,710     ,  65.65
 256*2^30,    251,643   ,     244,634     ,  61.61

The small -ve values in the iommu_unmap() are due to the core code calling
iommu_pgsize() before invoking the domain op. This is unncessary with this
implementation. Future work optimizes this and gets to 2%, 4%, 3%.

Reviewed-by: Kevin Tian &lt;kevin.tian@intel.com&gt;
Reviewed-by: Vasant Hegde &lt;vasant.hegde@amd.com&gt;
Tested-by: Alejandro Jimenez &lt;alejandro.j.jimenez@oracle.com&gt;
Tested-by: Pasha Tatashin &lt;pasha.tatashin@soleen.com&gt;
Signed-off-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Signed-off-by: Joerg Roedel &lt;joerg.roedel@amd.com&gt;
</content>
</entry>
<entry>
<title>iommufd: Change the selftest to use iommupt instead of xarray</title>
<updated>2025-11-05T08:07:13+00:00</updated>
<author>
<name>Jason Gunthorpe</name>
<email>jgg@nvidia.com</email>
</author>
<published>2025-11-04T18:30:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e93d5945ed5bb086431e83eed7ab98b6c058cc0b'/>
<id>urn:sha1:e93d5945ed5bb086431e83eed7ab98b6c058cc0b</id>
<content type='text'>
The iommufd self test uses an xarray to store the pfns and their orders to
emulate a page table. Make it act more like a real iommu driver by
replacing the xarray with an iommupt based page table. The new AMDv1 mock
format behaves similarly to the xarray.

Add set_dirty() as a iommu_pt operation to allow the test suite to
simulate HW dirty.

Userspace can select between several formats including the normal AMDv1
format and a special MOCK_IOMMUPT_HUGE variation for testing huge page
dirty tracking. To make the dirty tracking test work the page table must
only store exactly 2M huge pages otherwise the logic the test uses
fails. They cannot be broken up or combined.

Aside from aligning the selftest with a real page table implementation,
this helps test the iommupt code itself.

Reviewed-by: Kevin Tian &lt;kevin.tian@intel.com&gt;
Reviewed-by: Samiullah Khawaja &lt;skhawaja@google.com&gt;
Tested-by: Alejandro Jimenez &lt;alejandro.j.jimenez@oracle.com&gt;
Tested-by: Pasha Tatashin &lt;pasha.tatashin@soleen.com&gt;
Signed-off-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Signed-off-by: Joerg Roedel &lt;joerg.roedel@amd.com&gt;
</content>
</entry>
<entry>
<title>iommupt: Add a mock pagetable format for iommufd selftest to use</title>
<updated>2025-11-05T08:07:13+00:00</updated>
<author>
<name>Jason Gunthorpe</name>
<email>jgg@nvidia.com</email>
</author>
<published>2025-11-04T18:30:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e5359dcc617a2174d834bab4083340196615d8bd'/>
<id>urn:sha1:e5359dcc617a2174d834bab4083340196615d8bd</id>
<content type='text'>
The iommufd self test uses an xarray to store the pfns and their orders to
emulate a page table. Slightly modify the amdv1 page table to create a
real page table that has similar properties:

 - 2k base granule to simulate something like a 4k page table on a 64K
   PAGE_SIZE ARM system
 - Contiguous page support for every PFN order
 - Dirty tracking

AMDv1 is the closest format, as it is the only one that already supports
every page size. Tweak it to have only 5 levels and an 11 bit base granule
and compile it separately as a format variant.

Reviewed-by: Kevin Tian &lt;kevin.tian@intel.com&gt;
Reviewed-by: Samiullah Khawaja &lt;skhawaja@google.com&gt;
Tested-by: Alejandro Jimenez &lt;alejandro.j.jimenez@oracle.com&gt;
Tested-by: Pasha Tatashin &lt;pasha.tatashin@soleen.com&gt;
Signed-off-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Signed-off-by: Joerg Roedel &lt;joerg.roedel@amd.com&gt;
</content>
</entry>
<entry>
<title>iommupt: Add read_and_clear_dirty op</title>
<updated>2025-11-05T08:07:11+00:00</updated>
<author>
<name>Jason Gunthorpe</name>
<email>jgg@nvidia.com</email>
</author>
<published>2025-11-04T18:30:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4a00f943489103b4b9edff9f39bd484efbfb15fa'/>
<id>urn:sha1:4a00f943489103b4b9edff9f39bd484efbfb15fa</id>
<content type='text'>
IOMMU HW now supports updating a dirty bit in an entry when a DMA writes
to the entry's VA range. iommufd has a uAPI to read and clear the dirty
bits from the tables.

This is a trivial recursive descent algorithm to read and optionally clear
the dirty bits. The format needs a function to tell if a contiguous entry
is dirty, and a function to clear a contiguous entry back to clean.

Reviewed-by: Kevin Tian &lt;kevin.tian@intel.com&gt;
Reviewed-by: Samiullah Khawaja &lt;skhawaja@google.com&gt;
Tested-by: Alejandro Jimenez &lt;alejandro.j.jimenez@oracle.com&gt;
Tested-by: Pasha Tatashin &lt;pasha.tatashin@soleen.com&gt;
Signed-off-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Signed-off-by: Joerg Roedel &lt;joerg.roedel@amd.com&gt;
</content>
</entry>
<entry>
<title>iommupt: Add map_pages op</title>
<updated>2025-11-05T08:07:10+00:00</updated>
<author>
<name>Jason Gunthorpe</name>
<email>jgg@nvidia.com</email>
</author>
<published>2025-11-04T18:30:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=dcd6a011a8d523a114af2360a8753de5bd60c139'/>
<id>urn:sha1:dcd6a011a8d523a114af2360a8753de5bd60c139</id>
<content type='text'>
map is slightly complicated because it has to handle a number of special
edge cases:
 - Overmapping a previously shared, but now empty, table level with an OA.
   Requries validating and freeing the possibly empty tables
 - Doing the above across an entire to-be-created contiguous entry
 - Installing a new shared table level concurrently with another thread
 - Expanding the table by adding more top levels

Table expansion is a unique feature of AMDv1, this version is quite
similar except we handle racing concurrent lockless map. The table top
pointer and starting level are encoded in a single uintptr_t which ensures
we can READ_ONCE() without tearing. Any op will do the READ_ONCE() and use
that fixed point as its starting point. Concurrent expansion is handled
with a table global spinlock.

When inserting a new table entry map checks that the entire portion of the
table is empty. This includes freeing any empty lower tables that will be
overwritten by an OA. A separate free list is used while checking and
collecting all the empty lower tables so that writing the new entry is
uninterrupted, either the new entry fully writes or nothing changes.

A special fast path for PAGE_SIZE is implemented that does a direct walk
to the leaf level and installs a single entry. This gives ~15% improvement
for iommu_map() when mapping lists of single pages.

This version sits under the iommu_domain_ops as map_pages() but does not
require the external page size calculation. The implementation is actually
map_range() and can do arbitrary ranges, internally handling all the
validation and supporting any arrangment of page sizes. A future series
can optimize iommu_map() to take advantage of this.

Reviewed-by: Kevin Tian &lt;kevin.tian@intel.com&gt;
Reviewed-by: Samiullah Khawaja &lt;skhawaja@google.com&gt;
Tested-by: Alejandro Jimenez &lt;alejandro.j.jimenez@oracle.com&gt;
Tested-by: Pasha Tatashin &lt;pasha.tatashin@soleen.com&gt;
Signed-off-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Signed-off-by: Joerg Roedel &lt;joerg.roedel@amd.com&gt;
</content>
</entry>
<entry>
<title>iommupt: Add unmap_pages op</title>
<updated>2025-11-05T08:07:10+00:00</updated>
<author>
<name>Jason Gunthorpe</name>
<email>jgg@nvidia.com</email>
</author>
<published>2025-11-04T18:30:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7c53f4238aa8bfb476e177263133ead2eeb8d55d'/>
<id>urn:sha1:7c53f4238aa8bfb476e177263133ead2eeb8d55d</id>
<content type='text'>
unmap_pages removes mappings and any fully contained interior tables from
the given range. This follows the now-standard iommu_domain API definition
where it does not split up larger page sizes into smaller. The caller must
perform unmap only on ranges created by map or it must have somehow
otherwise determined safe cut points (eg iommufd/vfio use iova_to_phys to
scan for them)

A future work will provide 'cut' which explicitly does the page size split
if the HW can support it.

unmap is implemented with a recursive descent of the tree. If the caller
provides a VA range that spans an entire table item then the table memory
can be freed as well.

If an entire table item can be freed then this version will also check the
leaf-only level of the tree to ensure that all entries are present to
generate -EINVAL. Many of the existing drivers don't do this extra check.

This version sits under the iommu_domain_ops as unmap_pages() but does not
require the external page size calculation. The implementation is actually
unmap_range() and can do arbitrary ranges, internally handling all the
validation and supporting any arrangment of page sizes. A future series
can optimize __iommu_unmap() to take advantage of this.

Freed page table memory is batched up in the gather and will be freed in
the driver's iotlb_sync() callback after the IOTLB flush completes.

Reviewed-by: Kevin Tian &lt;kevin.tian@intel.com&gt;
Reviewed-by: Pasha Tatashin &lt;pasha.tatashin@soleen.com&gt;
Reviewed-by: Samiullah Khawaja &lt;skhawaja@google.com&gt;
Tested-by: Alejandro Jimenez &lt;alejandro.j.jimenez@oracle.com&gt;
Tested-by: Pasha Tatashin &lt;pasha.tatashin@soleen.com&gt;
Signed-off-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Signed-off-by: Joerg Roedel &lt;joerg.roedel@amd.com&gt;
</content>
</entry>
</feed>
