kernel/linux.git/drivers/iommu/intel, branch v6.1.124

iommu/vt-d: Fix checks and print in pgtable_walk()

2024-12-14T18:53:40+00:00

[ Upstream commit f1645676f25d2c846798f0233c3a953efd62aafb ] There are some issues in pgtable_walk(): 1. Super page is dumped as non-present page 2. dma_pte_superpage() should not check against leaf page table entries 3. Pointer pte is never NULL so checking it is meaningless 4. When an entry is not present, it still makes sense to dump the entry content. Fix 1,2 by checking dma_pte_superpage()'s returned value after level check. Fix 3 by removing pte check. Fix 4 by checking present bit after printing. By this chance, change to print "page table not present" instead of "PTE not present" to be clearer. Fixes: 914ff7719e8a ("iommu/vt-d: Dump DMAR translation structure when DMA fault occurs") Signed-off-by: Zhenzhong Duan Link: https://lore.kernel.org/r/20241024092146.715063-3-zhenzhong.duan@intel.com Signed-off-by: Lu Baolu Signed-off-by: Joerg Roedel Signed-off-by: Sasha Levin

iommu/vt-d: Fix checks and print in dmar_fault_dump_ptes()

2024-12-14T18:53:39+00:00

[ Upstream commit 6ceb93f952f6ca34823ce3650c902c31b8385b40 ] There are some issues in dmar_fault_dump_ptes(): 1. return value of phys_to_virt() is used for checking if an entry is present. 2. dump is confusing, e.g., "pasid table entry is not present", confusing by unpresent pasid table vs. unpresent pasid table entry. Current code means the former. 3. pgtable_walk() is called without checking if page table is present. Fix 1 by checking present bit of an entry before dump a lower level entry. Fix 2 by removing "entry" string, e.g., "pasid table is not present". Fix 3 by checking page table present before walk. Take issue 3 for example, before fix: [ 442.240357] DMAR: pasid dir entry: 0x000000012c83e001 [ 442.246661] DMAR: pasid table entry[0]: 0x0000000000000000 [ 442.253429] DMAR: pasid table entry[1]: 0x0000000000000000 [ 442.260203] DMAR: pasid table entry[2]: 0x0000000000000000 [ 442.266969] DMAR: pasid table entry[3]: 0x0000000000000000 [ 442.273733] DMAR: pasid table entry[4]: 0x0000000000000000 [ 442.280479] DMAR: pasid table entry[5]: 0x0000000000000000 [ 442.287234] DMAR: pasid table entry[6]: 0x0000000000000000 [ 442.293989] DMAR: pasid table entry[7]: 0x0000000000000000 [ 442.300742] DMAR: PTE not present at level 2 After fix: ... [ 357.241214] DMAR: pasid table entry[6]: 0x0000000000000000 [ 357.248022] DMAR: pasid table entry[7]: 0x0000000000000000 [ 357.254824] DMAR: scalable mode page table is not present Fixes: 914ff7719e8a ("iommu/vt-d: Dump DMAR translation structure when DMA fault occurs") Signed-off-by: Zhenzhong Duan Link: https://lore.kernel.org/r/20241024092146.715063-2-zhenzhong.duan@intel.com Signed-off-by: Lu Baolu Signed-off-by: Joerg Roedel Signed-off-by: Sasha Levin

iommu/vt-d: Fix incorrect pci_for_each_dma_alias() for non-PCI devices

2024-10-22T13:56:44+00:00

commit 6e02a277f1db24fa039e23783c8921c7b0e5b1b3 upstream. Previously, the domain_context_clear() function incorrectly called pci_for_each_dma_alias() to set up context entries for non-PCI devices. This could lead to kernel hangs or other unexpected behavior. Add a check to only call pci_for_each_dma_alias() for PCI devices. For non-PCI devices, domain_context_clear_one() is called directly. Reported-by: Todd Brandt Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219363 Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219349 Fixes: 9a16ab9d6402 ("iommu/vt-d: Make context clearing consistent with context mapping") Cc: stable@vger.kernel.org Signed-off-by: Lu Baolu Link: https://lore.kernel.org/r/20241014013744.102197-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel Signed-off-by: Greg Kroah-Hartman

Revert "iommu/vt-d: Retrieve IOMMU perfmon capability information"

2024-10-17T13:22:28+00:00

This reverts commit 586e19c88a0cb58b6ff45ae085b3dd200d862153 which is commit a6a5006dad572a53b5df3f47e1471d207ae9ba49 upstream. This commit is pulled in due to dependency for: 8c91a4bfc7f8 ("iommu: Fix compilation without CONFIG_IOMMU_INTEL") But the patch itself is part of a patchset, should not only include one, and it lead to boot hang on on Kernel 6.1.83+ with Dell PowerEdge R770 and Intel Xeon 6710E, so revert it for stable 6.1.112 Signed-off-by: Jack Wang Signed-off-by: Greg Kroah-Hartman

iommu/vt-d: Fix potential lockup if qi_submit_sync called with 0 count

2024-10-17T13:21:42+00:00

[ Upstream commit 3cf74230c139f208b7fb313ae0054386eee31a81 ] If qi_submit_sync() is invoked with 0 invalidation descriptors (for instance, for DMA draining purposes), we can run into a bug where a submitting thread fails to detect the completion of invalidation_wait. Subsequently, this led to a soft lockup. Currently, there is no impact by this bug on the existing users because no callers are submitting invalidations with 0 descriptors. This fix will enable future users (such as DMA drain) calling qi_submit_sync() with 0 count. Suppose thread T1 invokes qi_submit_sync() with non-zero descriptors, while concurrently, thread T2 calls qi_submit_sync() with zero descriptors. Both threads then enter a while loop, waiting for their respective descriptors to complete. T1 detects its completion (i.e., T1's invalidation_wait status changes to QI_DONE by HW) and proceeds to call reclaim_free_desc() to reclaim all descriptors, potentially including adjacent ones of other threads that are also marked as QI_DONE. During this time, while T2 is waiting to acquire the qi->q_lock, the IOMMU hardware may complete the invalidation for T2, setting its status to QI_DONE. However, if T1's execution of reclaim_free_desc() frees T2's invalidation_wait descriptor and changes its status to QI_FREE, T2 will not observe the QI_DONE status for its invalidation_wait and will indefinitely remain stuck. This soft lockup does not occur when only non-zero descriptors are submitted.In such cases, invalidation descriptors are interspersed among wait descriptors with the status QI_IN_USE, acting as barriers. These barriers prevent the reclaim code from mistakenly freeing descriptors belonging to other submitters. Considered the following example timeline: T1 T2 ======================================== ID1 WD1 while(WD1!=QI_DONE) unlock lock WD1=QI_DONE* WD2 while(WD2!=QI_DONE) unlock lock WD1==QI_DONE? ID1=QI_DONE WD2=DONE* reclaim() ID1=FREE WD1=FREE WD2=FREE unlock soft lockup! T2 never sees QI_DONE in WD2 Where: ID = invalidation descriptor WD = wait descriptor * Written by hardware The root of the problem is that the descriptor status QI_DONE flag is used for two conflicting purposes: 1. signal a descriptor is ready for reclaim (to be freed) 2. signal by the hardware that a wait descriptor is complete The solution (in this patch) is state separation by using QI_FREE flag for #1. Once a thread's invalidation descriptors are complete, their status would be set to QI_FREE. The reclaim_free_desc() function would then only free descriptors marked as QI_FREE instead of those marked as QI_DONE. This change ensures that T2 (from the previous example) will correctly observe the completion of its invalidation_wait (marked as QI_DONE). Signed-off-by: Sanjay K Kumar Signed-off-by: Jacob Pan Reviewed-by: Kevin Tian Link: https://lore.kernel.org/r/20240728210059.1964602-1-jacob.jun.pan@linux.intel.com Signed-off-by: Lu Baolu Signed-off-by: Joerg Roedel Signed-off-by: Sasha Levin

iommu/vt-d: Always reserve a domain ID for identity setup

2024-10-17T13:21:42+00:00

[ Upstream commit 2c13012e09190174614fd6901857a1b8c199e17d ] We will use a global static identity domain. Reserve a static domain ID for it. Signed-off-by: Lu Baolu Reviewed-by: Jason Gunthorpe Reviewed-by: Kevin Tian Reviewed-by: Jerry Snitselaar Link: https://lore.kernel.org/r/20240809055431.36513-4-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel Signed-off-by: Sasha Levin

iommu/vt-d: Handle volatile descriptor status read

2024-09-12T09:10:22+00:00

[ Upstream commit b5e86a95541cea737394a1da967df4cd4d8f7182 ] Queued invalidation wait descriptor status is volatile in that IOMMU hardware writes the data upon completion. Use READ_ONCE() to prevent compiler optimizations which ensures memory reads every time. As a side effect, READ_ONCE() also enforces strict types and may add an extra instruction. But it should not have negative performance impact since we use cpu_relax anyway and the extra time(by adding an instruction) may allow IOMMU HW request cacheline ownership easier. e.g. gcc 12.3 BEFORE: 81 38 ad de 00 00 cmpl $0x2,(%rax) AFTER (with READ_ONCE()) 772f: 8b 00 mov (%rax),%eax 7731: 3d ad de 00 00 cmp $0x2,%eax //status data is 32 bit Signed-off-by: Jacob Pan Reviewed-by: Kevin Tian Reviewed-by: Yi Liu Link: https://lore.kernel.org/r/20240607173817.3914600-1-jacob.jun.pan@linux.intel.com Signed-off-by: Lu Baolu Link: https://lore.kernel.org/r/20240702130839.108139-2-baolu.lu@linux.intel.com Signed-off-by: Will Deacon Signed-off-by: Sasha Levin

iommu/vt-d: Fix identity map bounds in si_domain_init()

2024-08-03T06:49:23+00:00

[ Upstream commit 31000732d56b43765d51e08cccb68818fbc0032c ] Intel IOMMU operates on inclusive bounds (both generally aas well as iommu_domain_identity_map()). Meanwhile, for_each_mem_pfn_range() uses exclusive bounds for end_pfn. This creates an off-by-one error when switching between the two. Fixes: c5395d5c4a82 ("intel-iommu: Clean up iommu_domain_identity_map()") Signed-off-by: Jon Pan-Doh Tested-by: Sudheer Dantuluri Suggested-by: Gary Zibrat Reviewed-by: Lu Baolu Reviewed-by: Kevin Tian Link: https://lore.kernel.org/r/20240709234913.2749386-1-pandoh@google.com Signed-off-by: Will Deacon Signed-off-by: Sasha Levin

iommu/vt-d: Fix to convert mm pfn to dma pfn

2024-08-03T06:49:23+00:00

[ Upstream commit fb5f50a43d9fd44fd6bc4f4dbcf9d3ec5b556558 ] For the case that VT-d page is smaller than mm page, converting dma pfn should be handled in two cases which are for start pfn and for end pfn. Currently the calculation of end dma pfn is incorrect and the result is less than real page frame number which is causing the mapping of iova always misses some page frames. Rename the mm_to_dma_pfn() to mm_to_dma_pfn_start() and add a new helper for converting end dma pfn named mm_to_dma_pfn_end(). Signed-off-by: Yanfei Xu Link: https://lore.kernel.org/r/20230625082046.979742-1-yanfei.xu@intel.com Signed-off-by: Lu Baolu Signed-off-by: Joerg Roedel Stable-dep-of: 31000732d56b ("iommu/vt-d: Fix identity map bounds in si_domain_init()") Signed-off-by: Sasha Levin

iommu/vt-d: Allocate local memory for page request queue

2024-04-17T09:18:26+00:00

[ Upstream commit a34f3e20ddff02c4f12df2c0635367394e64c63d ] The page request queue is per IOMMU, its allocation should be made NUMA-aware for performance reasons. Fixes: a222a7f0bb6c ("iommu/vt-d: Implement page request handling") Signed-off-by: Jacob Pan Reviewed-by: Kevin Tian Link: https://lore.kernel.org/r/20240403214007.985600-1-jacob.jun.pan@linux.intel.com Signed-off-by: Lu Baolu Signed-off-by: Joerg Roedel Signed-off-by: Sasha Levin