summaryrefslogtreecommitdiff
path: root/drivers/iommu/intel
AgeCommit message (Collapse)AuthorFilesLines
2021-10-18iommu/vt-d: Avoid duplicate removing in __domain_mapping()Longpeng(Mike)1-5/+6
The __domain_mapping() always removes the pages in the range from 'iov_pfn' to 'end_pfn', but the 'end_pfn' is always the last pfn of the range that the caller wants to map. This would introduce too many duplicated removing and leads the map operation take too long, for example: Map iova=0x100000,nr_pages=0x7d61800 iov_pfn: 0x100000, end_pfn: 0x7e617ff iov_pfn: 0x140000, end_pfn: 0x7e617ff iov_pfn: 0x180000, end_pfn: 0x7e617ff iov_pfn: 0x1c0000, end_pfn: 0x7e617ff iov_pfn: 0x200000, end_pfn: 0x7e617ff ... it takes about 50ms in total. We can reduce the cost by recalculate the 'end_pfn' and limit it to the boundary of the end of this pte page. Map iova=0x100000,nr_pages=0x7d61800 iov_pfn: 0x100000, end_pfn: 0x13ffff iov_pfn: 0x140000, end_pfn: 0x17ffff iov_pfn: 0x180000, end_pfn: 0x1bffff iov_pfn: 0x1c0000, end_pfn: 0x1fffff iov_pfn: 0x200000, end_pfn: 0x23ffff ... it only need 9ms now. This also removes a meaningless BUG_ON() in __domain_mapping(). Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com> Tested-by: Liujunjie <liujunjie23@huawei.com> Link: https://lore.kernel.org/r/20211008000433.1115-1-longpeng2@huawei.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20211014053839.727419-10-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-10-18iommu/vt-d: Clean up unused PASID updating functionsFenghua Yu1-23/+1
update_pasid() and its call chain are currently unused in the tree because Thomas disabled the ENQCMD feature. The feature will be re-enabled shortly using a different approach and update_pasid() and its call chain will not be used in the new approach. Remove the useless functions. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20210920192349.2602141-1-fenghua.yu@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20211014053839.727419-8-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-10-18iommu/vt-d: Delete dev_has_feat callbackLu Baolu1-54/+5
The commit 262948f8ba573 ("iommu: Delete iommu_dev_has_feature()") has deleted the iommu_dev_has_feature() interface. Remove the dev_has_feat callback to avoid dead code. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20210929072030.1330225-1-baolu.lu@linux.intel.com Link: https://lore.kernel.org/r/20211014053839.727419-7-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-10-18iommu/vt-d: Use second level for GPA->HPA translationLu Baolu1-3/+12
The IOMMU VT-d implementation uses the first level for GPA->HPA translation by default. Although both the first level and the second level could handle the DMA translation, they're different in some way. For example, the second level translation has separate controls for the Access/Dirty page tracking. With the first level translation, there's no such control. On the other hand, the second level translation has the page-level control for forcing snoop, but the first level only has global control with pasid granularity. This uses the second level for GPA->HPA translation so that we can provide a consistent hardware interface for use cases like dirty page tracking for live migration. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20210926114535.923263-1-baolu.lu@linux.intel.com Link: https://lore.kernel.org/r/20211014053839.727419-6-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-10-18iommu/vt-d: Check FL and SL capability sanity in scalable modeLu Baolu2-0/+14
An iommu domain could be allocated and mapped before it's attached to any device. This requires that in scalable mode, when the domain is allocated, the format (FL or SL) of the page table must be determined. In order to achieve this, the platform should support consistent SL or FL capabilities on all IOMMU's. This adds a check for this and aborts IOMMU probing if it doesn't meet this requirement. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20210926114535.923263-1-baolu.lu@linux.intel.com Link: https://lore.kernel.org/r/20211014053839.727419-5-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-10-18iommu/vt-d: Remove duplicate identity domain flagLu Baolu1-5/+4
The iommu_domain data structure already has the "type" field to keep the type of a domain. It's unnecessary to have the DOMAIN_FLAG_STATIC_IDENTITY flag in the vt-d implementation. This cleans it up with no functionality change. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20210926114535.923263-1-baolu.lu@linux.intel.com Link: https://lore.kernel.org/r/20211014053839.727419-4-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-10-18iommu/vt-d: Dump DMAR translation structure when DMA fault occursKyung Min Park3-2/+125
When the dmar translation fault happens, the kernel prints a single line fault reason with corresponding hexadecimal code defined in the Intel VT-d specification. Currently, when user wants to debug the translation fault in detail, debugfs is used for dumping the dmar_translation_struct, which is not available when the kernel failed to boot. Dump the DMAR translation structure, pagewalk the IO page table and print the page table entry when the fault happens. This takes effect only when CONFIG_DMAR_DEBUG is enabled. Signed-off-by: Kyung Min Park <kyung.min.park@intel.com> Link: https://lore.kernel.org/r/20210815203845.31287-1-kyung.min.park@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20211014053839.727419-3-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-10-18iommu/vt-d: Do not falsely log intel_iommu is unsupported kernel optionTvrtko Ursulin1-1/+5
Handling of intel_iommu kernel command line option should return "true" to indicate option is valid and so avoid logging it as unknown by the core parsing code. Also log unknown sub-options at the notice level to let user know of potential typos or similar. Reported-by: Eero Tamminen <eero.t.tamminen@intel.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://lore.kernel.org/r/20210831112947.310080-1-tvrtko.ursulin@linux.intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20211014053839.727419-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-09-28iommu/vt-d: Drop "0x" prefix from PCI bus & device addressesBjorn Helgaas1-3/+3
719a19335692 ("iommu/vt-d: Tweak the description of a DMA fault") changed the DMA fault reason from hex to decimal. It also added "0x" prefixes to the PCI bus/device, e.g., - DMAR: [INTR-REMAP] Request device [00:00.5] + DMAR: [INTR-REMAP] Request device [0x00:0x00.5] These no longer match dev_printk() and other similar messages in dmar_match_pci_path() and dmar_acpi_insert_dev_scope(). Drop the "0x" prefixes from the bus and device addresses. Fixes: 719a19335692 ("iommu/vt-d: Tweak the description of a DMA fault") Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/r/20210903193711.483999-1-helgaas@kernel.org Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20210922054726.499110-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-09-09iommu/vt-d: Fix a deadlock in intel_svm_drain_prq()Fenghua Yu1-0/+12
pasid_mutex and dev->iommu->param->lock are held while unbinding mm is flushing IO page fault workqueue and waiting for all page fault works to finish. But an in-flight page fault work also need to hold the two locks while unbinding mm are holding them and waiting for the work to finish. This may cause an ABBA deadlock issue as shown below: idxd 0000:00:0a.0: unbind PASID 2 ====================================================== WARNING: possible circular locking dependency detected 5.14.0-rc7+ #549 Not tainted [ 186.615245] ---------- dsa_test/898 is trying to acquire lock: ffff888100d854e8 (&param->lock){+.+.}-{3:3}, at: iopf_queue_flush_dev+0x29/0x60 but task is already holding lock: ffffffff82b2f7c8 (pasid_mutex){+.+.}-{3:3}, at: intel_svm_unbind+0x34/0x1e0 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (pasid_mutex){+.+.}-{3:3}: __mutex_lock+0x75/0x730 mutex_lock_nested+0x1b/0x20 intel_svm_page_response+0x8e/0x260 iommu_page_response+0x122/0x200 iopf_handle_group+0x1c2/0x240 process_one_work+0x2a5/0x5a0 worker_thread+0x55/0x400 kthread+0x13b/0x160 ret_from_fork+0x22/0x30 -> #1 (&param->fault_param->lock){+.+.}-{3:3}: __mutex_lock+0x75/0x730 mutex_lock_nested+0x1b/0x20 iommu_report_device_fault+0xc2/0x170 prq_event_thread+0x28a/0x580 irq_thread_fn+0x28/0x60 irq_thread+0xcf/0x180 kthread+0x13b/0x160 ret_from_fork+0x22/0x30 -> #0 (&param->lock){+.+.}-{3:3}: __lock_acquire+0x1134/0x1d60 lock_acquire+0xc6/0x2e0 __mutex_lock+0x75/0x730 mutex_lock_nested+0x1b/0x20 iopf_queue_flush_dev+0x29/0x60 intel_svm_drain_prq+0x127/0x210 intel_svm_unbind+0xc5/0x1e0 iommu_sva_unbind_device+0x62/0x80 idxd_cdev_release+0x15a/0x200 [idxd] __fput+0x9c/0x250 ____fput+0xe/0x10 task_work_run+0x64/0xa0 exit_to_user_mode_prepare+0x227/0x230 syscall_exit_to_user_mode+0x2c/0x60 do_syscall_64+0x48/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae other info that might help us debug this: Chain exists of: &param->lock --> &param->fault_param->lock --> pasid_mutex Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(pasid_mutex); lock(&param->fault_param->lock); lock(pasid_mutex); lock(&param->lock); *** DEADLOCK *** 2 locks held by dsa_test/898: #0: ffff888100cc1cc0 (&group->mutex){+.+.}-{3:3}, at: iommu_sva_unbind_device+0x53/0x80 #1: ffffffff82b2f7c8 (pasid_mutex){+.+.}-{3:3}, at: intel_svm_unbind+0x34/0x1e0 stack backtrace: CPU: 2 PID: 898 Comm: dsa_test Not tainted 5.14.0-rc7+ #549 Hardware name: Intel Corporation Kabylake Client platform/KBL S DDR4 UD IMM CRB, BIOS KBLSE2R1.R00.X050.P01.1608011715 08/01/2016 Call Trace: dump_stack_lvl+0x5b/0x74 dump_stack+0x10/0x12 print_circular_bug.cold+0x13d/0x142 check_noncircular+0xf1/0x110 __lock_acquire+0x1134/0x1d60 lock_acquire+0xc6/0x2e0 ? iopf_queue_flush_dev+0x29/0x60 ? pci_mmcfg_read+0xde/0x240 __mutex_lock+0x75/0x730 ? iopf_queue_flush_dev+0x29/0x60 ? pci_mmcfg_read+0xfd/0x240 ? iopf_queue_flush_dev+0x29/0x60 mutex_lock_nested+0x1b/0x20 iopf_queue_flush_dev+0x29/0x60 intel_svm_drain_prq+0x127/0x210 ? intel_pasid_tear_down_entry+0x22e/0x240 intel_svm_unbind+0xc5/0x1e0 iommu_sva_unbind_device+0x62/0x80 idxd_cdev_release+0x15a/0x200 pasid_mutex protects pasid and svm data mapping data. It's unnecessary to hold pasid_mutex while flushing the workqueue. To fix the deadlock issue, unlock pasid_pasid during flushing the workqueue to allow the works to be handled. Fixes: d5b9e4bfe0d8 ("iommu/vt-d: Report prq to io-pgfault framework") Reported-and-tested-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: https://lore.kernel.org/r/20210826215918.4073446-1-fenghua.yu@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20210828070622.2437559-3-baolu.lu@linux.intel.com [joro: Removed timing information from kernel log messages] Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-09-09iommu/vt-d: Fix PASID leak in intel_svm_unbind_mm()Fenghua Yu1-3/+0
The mm->pasid will be used in intel_svm_free_pasid() after load_pasid() during unbinding mm. Clearing it in load_pasid() will cause PASID cannot be freed in intel_svm_free_pasid(). Additionally mm->pasid was updated already before load_pasid() during pasid allocation. No need to update it again in load_pasid() during binding mm. Don't update mm->pasid to avoid the issues in both binding mm and unbinding mm. Fixes: 4048377414162 ("iommu/vt-d: Use iommu_sva_alloc(free)_pasid() helpers") Reported-and-tested-by: Dave Jiang <dave.jiang@intel.com> Co-developed-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: https://lore.kernel.org/r/20210826215918.4073446-1-fenghua.yu@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20210828070622.2437559-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-08-20Merge branches 'apple/dart', 'arm/smmu', 'iommu/fixes', 'x86/amd', ↵Joerg Roedel7-136/+135
'x86/vt-d' and 'core' into next
2021-08-19iommu/vt-d: Add present bit check in pasid entry setup helpersLiu Yi L1-0/+16
The helper functions should not modify the pasid entries which are still in use. Add a check against present bit. Signed-off-by: Liu Yi L <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20210817042425.1784279-1-yi.l.liu@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20210818134852.1847070-10-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-08-19iommu/vt-d: Use pasid_pte_is_present() helper functionLiu Yi L1-1/+1
Use the pasid_pte_is_present() helper for present bit check in the intel_pasid_tear_down_entry(). Signed-off-by: Liu Yi L <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20210817042425.1784279-1-yi.l.liu@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20210818134852.1847070-9-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-08-19iommu/vt-d: Drop the kernel doc annotationAndy Shevchenko1-1/+1
Kernel doc validator is unhappy with the following .../perf.c:16: warning: Function parameter or member 'latency_lock' not described in 'DEFINE_SPINLOCK' .../perf.c:16: warning: expecting prototype for perf.c(). Prototype was for DEFINE_SPINLOCK() instead Drop kernel doc annotation since the top comment is not in the required format. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20210729163538.40101-1-andriy.shevchenko@linux.intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20210818134852.1847070-8-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-08-19iommu/vt-d: Allow devices to have more than 32 outstanding PRsLu Baolu2-5/+2
The minimum per-IOMMU PRQ queue size is one 4K page, this is more entries than the hardcoded limit of 32 in the current VT-d code. Some devices can support up to 512 outstanding PRQs but underutilized by this limit of 32. Although, 32 gives some rough fairness when multiple devices share the same IOMMU PRQ queue, but far from optimal for customized use case. This extends the per-IOMMU PRQ queue size to four 4K pages and let the devices have as many outstanding page requests as they can. Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20210720013856.4143880-1-baolu.lu@linux.intel.com Link: https://lore.kernel.org/r/20210818134852.1847070-7-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-08-19iommu/vt-d: Preset A/D bits for user space DMA usageLu Baolu1-7/+3
We preset the access and dirty bits for IOVA over first level usage only for the kernel DMA (i.e., when domain type is IOMMU_DOMAIN_DMA). We should also preset the FL A/D for user space DMA usage. The idea is that even the user space A/D bit memory write is unnecessary. We should avoid it to minimize the overhead. Suggested-by: Sanjay Kumar <sanjay.k.kumar@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20210720013856.4143880-1-baolu.lu@linux.intel.com Link: https://lore.kernel.org/r/20210818134852.1847070-6-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-08-19iommu/vt-d: Enable Intel IOMMU scalable mode by defaultLu Baolu2-1/+5
The commit 8950dcd83ae7d ("iommu/vt-d: Leave scalable mode default off") leaves the scalable mode default off and end users could turn it on with "intel_iommu=sm_on". Using the Intel IOMMU scalable mode for kernel DMA, user-level device access and Shared Virtual Address have been enabled. This enables the scalable mode by default if the hardware advertises the support and adds kernel options of "intel_iommu=sm_on/sm_off" for end users to configure it through the kernel parameters. Suggested-by: Ashok Raj <ashok.raj@intel.com> Suggested-by: Sanjay Kumar <sanjay.k.kumar@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Cc: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20210720013856.4143880-1-baolu.lu@linux.intel.com Link: https://lore.kernel.org/r/20210818134852.1847070-5-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-08-19iommu/vt-d: Refactor Kconfig a bitLu Baolu2-19/+12
Put all sub-options inside a "if INTEL_IOMMU" so that they don't need to always depend on INTEL_IOMMU. Use IS_ENABLED() instead of #ifdef as well. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20210720013856.4143880-1-baolu.lu@linux.intel.com Link: https://lore.kernel.org/r/20210818134852.1847070-4-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-08-19iommu/vt-d: Remove unnecessary oom messageZhen Lei2-7/+1
Fixes scripts/checkpatch.pl warning: WARNING: Possible unnecessary 'out of memory' message Remove it can help us save a bit of memory. Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Link: https://lore.kernel.org/r/20210609124937.14260-1-thunder.leizhen@huawei.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20210818134852.1847070-3-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-08-19iommu/vt-d: Update the virtual command related registersLu Baolu1-5/+5
The VT-d spec Revision 3.3 updated the virtual command registers, virtual command opcode B register, virtual command response register and virtual command capability register (Section 10.4.43, 10.4.44, 10.4.45, 10.4.46). This updates the virtual command interface implementation in the Intel IOMMU driver accordingly. Fixes: 24f27d32ab6b7 ("iommu/vt-d: Enlightened PASID allocation") Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Sanjay Kumar <sanjay.k.kumar@intel.com> Cc: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20210713042649.3547403-1-baolu.lu@linux.intel.com Link: https://lore.kernel.org/r/20210818134852.1847070-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-08-18iommu/vt-d: Prepare for multiple DMA domain typesRobin Murphy1-9/+6
In preparation for the strict vs. non-strict decision for DMA domains to be expressed in the domain type, make sure we expose our flush queue awareness by accepting the new domain type, and test the specific feature flag where we want to identify DMA domains in general. The DMA ops reset/setup can simply be made unconditional, since iommu-dma already knows only to touch DMA domains. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/31a8ef868d593a2f3826a6a120edee81815375a7.1628682049.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-08-18iommu/vt-d: Drop IOVA cookie managementRobin Murphy1-8/+0
The core code bakes its own cookies now. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/e9dbe3b6108f8538e17e0c5f59f8feeb714f51a4.1628682048.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-08-18iommu/vt-d: Fix incomplete cache flush in intel_pasid_tear_down_entry()Liu Yi L2-2/+14
This fixes improper iotlb invalidation in intel_pasid_tear_down_entry(). When a PASID was used as nested mode, released and reused, the following error message will appear: [ 180.187556] Unexpected page request in Privilege Mode [ 180.187565] Unexpected page request in Privilege Mode [ 180.279933] Unexpected page request in Privilege Mode [ 180.279937] Unexpected page request in Privilege Mode Per chapter 6.5.3.3 of VT-d spec 3.3, when tear down a pasid entry, the software should use Domain selective IOTLB flush if the PGTT of the pasid entry is SL only or Nested, while for the pasid entries whose PGTT is FL only or PT using PASID-based IOTLB flush is enough. Fixes: 2cd1311a26673 ("iommu/vt-d: Add set domain DOMAIN_ATTR_NESTING attr") Signed-off-by: Kumar Sanjay K <sanjay.k.kumar@intel.com> Signed-off-by: Liu Yi L <yi.l.liu@intel.com> Tested-by: Yi Sun <yi.y.sun@intel.com> Link: https://lore.kernel.org/r/20210817042425.1784279-1-yi.l.liu@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20210817124321.1517985-3-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-08-18iommu/vt-d: Fix PASID reference leakFenghua Yu1-1/+2
A PASID reference is increased whenever a device is bound to an mm (and its PASID) successfully (i.e. the device's sdev user count is increased). But the reference is not dropped every time the device is unbound successfully from the mm (i.e. the device's sdev user count is decreased). The reference is dropped only once by calling intel_svm_free_pasid() when there isn't any device bound to the mm. intel_svm_free_pasid() drops the reference and only frees the PASID on zero reference. Fix the issue by dropping the PASID reference and freeing the PASID when no reference on successful unbinding the device by calling intel_svm_free_pasid() . Fixes: 4048377414162 ("iommu/vt-d: Use iommu_sva_alloc(free)_pasid() helpers") Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: https://lore.kernel.org/r/20210813181345.1870742-1-fenghua.yu@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20210817124321.1517985-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-07-26iommu/vt-d: Move clflush'es from iotlb_sync_map() to map_pages()Lu Baolu1-41/+7
As the Intel VT-d driver has switched to use the iommu_ops.map_pages() callback, multiple pages of the same size will be mapped in a call. There's no need to put the clflush'es in iotlb_sync_map() callback. Move them back into __domain_mapping() to simplify the code. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20210720020615.4144323-4-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-07-26iommu/vt-d: Implement map/unmap_pages() iommu_ops callbackLu Baolu1-2/+35
Implement the map_pages() and unmap_pages() callback for the Intel IOMMU driver to allow calls from iommu core to map and unmap multiple pages of the same size in one call. With map/unmap_pages() implemented, the prior map/unmap callbacks are deprecated. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20210720020615.4144323-3-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-07-26iommu/vt-d: Report real pgsize bitmap to iommu coreLu Baolu1-19/+19
The pgsize bitmap is used to advertise the page sizes our hardware supports to the IOMMU core, which will then use this information to split physically contiguous memory regions it is mapping into page sizes that we support. Traditionally the IOMMU core just handed us the mappings directly, after making sure the size is an order of a 4KiB page and that the mapping has natural alignment. To retain this behavior, we currently advertise that we support all page sizes that are an order of 4KiB. We are about to utilize the new IOMMU map/unmap_pages APIs. We could change this to advertise the real page sizes we support. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20210720020615.4144323-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-07-26iommu: Remove mode argument from iommu_set_dma_strict()John Garry1-3/+3
We only ever now set strict mode enabled in iommu_set_dma_strict(), so just remove the argument. Signed-off-by: John Garry <john.garry@huawei.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/1626088340-5838-7-git-send-email-john.garry@huawei.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-07-26iommu/vt-d: Add support for IOMMU default DMA mode build optionsZhen Lei1-9/+6
Make IOMMU_DEFAULT_LAZY default for when INTEL_IOMMU config is set, as is current behaviour. Also delete global flag intel_iommu_strict: - In intel_iommu_setup(), call iommu_set_dma_strict(true) directly. Also remove the print, as iommu_subsys_init() prints the mode and we have already marked this param as deprecated. - For cap_caching_mode() check in intel_iommu_setup(), call iommu_set_dma_strict(true) directly; also reword the accompanying print with a level downgrade and also add the missing '\n'. - For Ironlake GPU, again call iommu_set_dma_strict(true) directly and keep the accompanying print. [jpg: Remove intel_iommu_strict] Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/1626088340-5838-5-git-send-email-john.garry@huawei.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-07-26iommu: Deprecate Intel and AMD cmdline methods to enable strict modeJohn Garry1-0/+1
Now that the x86 drivers support iommu.strict, deprecate the custom methods. Signed-off-by: John Garry <john.garry@huawei.com> Acked-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/1626088340-5838-2-git-send-email-john.garry@huawei.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-07-14iommu/vt-d: Fix clearing real DMA device's scalable-mode context entriesLu Baolu1-3/+2
The commit 2b0140c69637e ("iommu/vt-d: Use pci_real_dma_dev() for mapping") fixes an issue of "sub-device is removed where the context entry is cleared for all aliases". But this commit didn't consider the PASID entry and PASID table in VT-d scalable mode. This fix increases the coverage of scalable mode. Suggested-by: Sanjay Kumar <sanjay.k.kumar@intel.com> Fixes: 8038bdb855331 ("iommu/vt-d: Only clear real DMA device's context entries") Fixes: 2b0140c69637e ("iommu/vt-d: Use pci_real_dma_dev() for mapping") Cc: stable@vger.kernel.org # v5.6+ Cc: Jon Derrick <jonathan.derrick@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20210712071712.3416949-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-07-14iommu/vt-d: Global devTLB flush when present context entry changedSanjay Kumar1-9/+22
This fixes a bug in context cache clear operation. The code was not following the correct invalidation flow. A global device TLB invalidation should be added after the IOTLB invalidation. At the same time, it uses the domain ID from the context entry. But in scalable mode, the domain ID is in PASID table entry, not context entry. Fixes: 7373a8cc38197 ("iommu/vt-d: Setup context and enable RID2PASID support") Cc: stable@vger.kernel.org # v5.0+ Signed-off-by: Sanjay Kumar <sanjay.k.kumar@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20210712071315.3416543-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-06-25Merge branches 'iommu/fixes', 'arm/rockchip', 'arm/smmu', 'x86/vt-d', ↵Joerg Roedel9-403/+825
'x86/amd', 'virtio' and 'core' into next
2021-06-25iommu/dma: Pass address limit rather than size to iommu_setup_dma_ops()Jean-Philippe Brucker1-4/+1
Passing a 64-bit address width to iommu_setup_dma_ops() is valid on virtual platforms, but isn't currently possible. The overflow check in iommu_dma_init_domain() prevents this even when @dma_base isn't 0. Pass a limit address instead of a size, so callers don't have to fake a size to work around the check. The base and limit parameters are being phased out, because: * they are redundant for x86 callers. dma-iommu already reserves the first page, and the upper limit is already in domain->geometry. * they can now be obtained from dev->dma_range_map on Arm. But removing them on Arm isn't completely straightforward so is left for future work. As an intermediate step, simplify the x86 callers by passing dummy limits. Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/20210618152059.1194210-5-jean-philippe@linaro.org Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-06-18iommu/vt-d: Fix dereference of pointer info before it is null checkedColin Ian King1-2/+6
The assignment of iommu from info->iommu occurs before info is null checked hence leading to a potential null pointer dereference issue. Fix this by assigning iommu and checking if iommu is null after null checking info. Addresses-Coverity: ("Dereference before null check") Fixes: 4c82b88696ac ("iommu/vt-d: Allocate/register iopf queue for sva devices") Signed-off-by: Colin Ian King <colin.king@canonical.com> Link: https://lore.kernel.org/r/20210611135024.32781-1-colin.king@canonical.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-06-10iommu/vt-d: Fix linker error on 32-bitJoerg Roedel1-3/+3
A recent commit broke the build on 32-bit x86. The linker throws these messages: ld: drivers/iommu/intel/perf.o: in function `dmar_latency_snapshot': perf.c:(.text+0x40c): undefined reference to `__udivdi3' ld: perf.c:(.text+0x458): undefined reference to `__udivdi3' The reason are the 64-bit divides in dmar_latency_snapshot(). Use the div_u64() helper function for those. Fixes: 55ee5e67a59a ("iommu/vt-d: Add common code for dmar latency performance monitors") Signed-off-by: Joerg Roedel <jroedel@suse.de> Link: https://lore.kernel.org/r/20210610083120.29224-1-joro@8bytes.org
2021-06-10iommu/vt-d: No need to typecastParav Pandit1-3/+2
Page directory assignment by alloc_pgtable_page() or phys_to_virt() doesn't need typecasting as both routines return void*. Hence, remove typecasting from both the calls. Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20210530075053.264218-1-parav@nvidia.com Link: https://lore.kernel.org/r/20210610020115.1637656-24-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-06-10iommu/vt-d: Remove unnecessary bracesParav Pandit1-2/+1
No need for braces for single line statement under if() block. Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20210530075053.264218-1-parav@nvidia.com Link: https://lore.kernel.org/r/20210610020115.1637656-22-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-06-10iommu/vt-d: Removed unused iommu_count in dmar domainParav Pandit1-8/+3
DMAR domain uses per DMAR refcount. It is indexed by iommu seq_id. Older iommu_count is only incremented and decremented but no decisions are taken based on this refcount. This is not of much use. Hence, remove iommu_count and further simplify domain_detach_iommu() by returning void. Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20210530075053.264218-1-parav@nvidia.com Link: https://lore.kernel.org/r/20210610020115.1637656-21-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-06-10iommu/vt-d: Use bitfields for DMAR capabilitiesParav Pandit1-9/+9
IOTLB device presence, iommu coherency and snooping are boolean capabilities. Use them as bits and keep them adjacent. Structure layout before the reorg. $ pahole -C dmar_domain drivers/iommu/intel/dmar.o struct dmar_domain { int nid; /* 0 4 */ unsigned int iommu_refcnt[128]; /* 4 512 */ /* --- cacheline 8 boundary (512 bytes) was 4 bytes ago --- */ u16 iommu_did[128]; /* 516 256 */ /* --- cacheline 12 boundary (768 bytes) was 4 bytes ago --- */ bool has_iotlb_device; /* 772 1 */ /* XXX 3 bytes hole, try to pack */ struct list_head devices; /* 776 16 */ struct list_head subdevices; /* 792 16 */ struct iova_domain iovad __attribute__((__aligned__(8))); /* 808 2320 */ /* --- cacheline 48 boundary (3072 bytes) was 56 bytes ago --- */ struct dma_pte * pgd; /* 3128 8 */ /* --- cacheline 49 boundary (3136 bytes) --- */ int gaw; /* 3136 4 */ int agaw; /* 3140 4 */ int flags; /* 3144 4 */ int iommu_coherency; /* 3148 4 */ int iommu_snooping; /* 3152 4 */ int iommu_count; /* 3156 4 */ int iommu_superpage; /* 3160 4 */ /* XXX 4 bytes hole, try to pack */ u64 max_addr; /* 3168 8 */ u32 default_pasid; /* 3176 4 */ /* XXX 4 bytes hole, try to pack */ struct iommu_domain domain; /* 3184 72 */ /* size: 3256, cachelines: 51, members: 18 */ /* sum members: 3245, holes: 3, sum holes: 11 */ /* forced alignments: 1 */ /* last cacheline: 56 bytes */ } __attribute__((__aligned__(8))); After arranging it for natural padding and to make flags as u8 bits, it saves 8 bytes for the struct. struct dmar_domain { int nid; /* 0 4 */ unsigned int iommu_refcnt[128]; /* 4 512 */ /* --- cacheline 8 boundary (512 bytes) was 4 bytes ago --- */ u16 iommu_did[128]; /* 516 256 */ /* --- cacheline 12 boundary (768 bytes) was 4 bytes ago --- */ u8 has_iotlb_device:1; /* 772: 0 1 */ u8 iommu_coherency:1; /* 772: 1 1 */ u8 iommu_snooping:1; /* 772: 2 1 */ /* XXX 5 bits hole, try to pack */ /* XXX 3 bytes hole, try to pack */ struct list_head devices; /* 776 16 */ struct list_head subdevices; /* 792 16 */ struct iova_domain iovad __attribute__((__aligned__(8))); /* 808 2320 */ /* --- cacheline 48 boundary (3072 bytes) was 56 bytes ago --- */ struct dma_pte * pgd; /* 3128 8 */ /* --- cacheline 49 boundary (3136 bytes) --- */ int gaw; /* 3136 4 */ int agaw; /* 3140 4 */ int flags; /* 3144 4 */ int iommu_count; /* 3148 4 */ int iommu_superpage; /* 3152 4 */ /* XXX 4 bytes hole, try to pack */ u64 max_addr; /* 3160 8 */ u32 default_pasid; /* 3168 4 */ /* XXX 4 bytes hole, try to pack */ struct iommu_domain domain; /* 3176 72 */ /* size: 3248, cachelines: 51, members: 18 */ /* sum members: 3236, holes: 3, sum holes: 11 */ /* sum bitfield members: 3 bits, bit holes: 1, sum bit holes: 5 bits */ /* forced alignments: 1 */ /* last cacheline: 48 bytes */ } __attribute__((__aligned__(8))); Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20210530075053.264218-1-parav@nvidia.com Link: https://lore.kernel.org/r/20210610020115.1637656-20-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-06-10iommu/vt-d: Use DEVICE_ATTR_RO macroYueHaibing1-24/+18
Use DEVICE_ATTR_RO() helper instead of plain DEVICE_ATTR(), which makes the code a bit shorter and easier to read. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20210528130229.22108-1-yuehaibing@huawei.com Link: https://lore.kernel.org/r/20210610020115.1637656-19-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-06-10iommu/vt-d: Fix out-bounds-warning in intel/svm.cGustavo A. R. Silva1-10/+16
Replace a couple of calls to memcpy() with simple assignments in order to fix the following out-of-bounds warning: drivers/iommu/intel/svm.c:1198:4: warning: 'memcpy' offset [25, 32] from the object at 'desc' is out of the bounds of referenced subobject 'qw2' with type 'long long unsigned int' at offset 16 [-Warray-bounds] The problem is that the original code is trying to copy data into a couple of struct members adjacent to each other in a single call to memcpy(). This causes a legitimate compiler warning because memcpy() overruns the length of &desc.qw2 and &resp.qw2, respectively. This helps with the ongoing efforts to globally enable -Warray-bounds and get us closer to being able to tighten the FORTIFY_SOURCE routines on memcpy(). Link: https://github.com/KSPP/linux/issues/109 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20210414201403.GA392764@embeddedor Link: https://lore.kernel.org/r/20210610020115.1637656-18-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-06-10iommu/vt-d: Add PRQ handling latency samplingLu Baolu1-3/+13
The execution time for page fault request handling is performance critical and needs to be monitored. This adds code to sample the execution time of page fault request handling. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20210520031531.712333-1-baolu.lu@linux.intel.com Link: https://lore.kernel.org/r/20210610020115.1637656-17-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-06-10iommu/vt-d: Add cache invalidation latency samplingLu Baolu1-0/+31
Queued invalidation execution time is performance critical and needs to be monitored. This adds code to sample the execution time of IOTLB/ devTLB/ICE cache invalidation. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20210520031531.712333-1-baolu.lu@linux.intel.com Link: https://lore.kernel.org/r/20210610020115.1637656-16-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-06-10iommu/vt-d: Expose latency monitor data through debugfsLu Baolu2-0/+112
A debugfs interface /sys/kernel/debug/iommu/intel/dmar_perf_latency is created to control and show counts of execution time ranges for various types per DMAR. The interface may help debug any potential performance issue. By default, the interface is disabled. Possible write value of /sys/kernel/debug/iommu/intel/dmar_perf_latency 0 - disable sampling all latency data 1 - enable sampling IOTLB invalidation latency data 2 - enable sampling devTLB invalidation latency data 3 - enable sampling intr entry cache invalidation latency data 4 - enable sampling prq handling latency data Read /sys/kernel/debug/iommu/intel/dmar_perf_latency gives a snapshot of sampling result of all enabled monitors. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20210520031531.712333-1-baolu.lu@linux.intel.com Link: https://lore.kernel.org/r/20210610020115.1637656-15-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-06-10iommu/vt-d: Add common code for dmar latency performance monitorsLu Baolu4-0/+243
The execution time of some operations is very performance critical, such as cache invalidation and PRQ processing time. This adds some common code to monitor the execution time range of those operations. The interfaces include enabling/disabling, checking status, updating sampling data and providing a common string format for users. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20210520031531.712333-1-baolu.lu@linux.intel.com Link: https://lore.kernel.org/r/20210610020115.1637656-14-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-06-10iommu/vt-d: Add prq_report trace eventLu Baolu1-0/+7
This adds a new trace event to track the page fault request report. This event will provide almost all information defined in a page request descriptor. A sample output: | prq_report: dmar0/0000:00:0a.0 seq# 1: rid=0x50 addr=0x559ef6f97 r---- pasid=0x2 index=0x1 | prq_report: dmar0/0000:00:0a.0 seq# 2: rid=0x50 addr=0x559ef6f9c rw--l pasid=0x2 index=0x1 | prq_report: dmar0/0000:00:0a.0 seq# 3: rid=0x50 addr=0x559ef6f98 r---- pasid=0x2 index=0x1 | prq_report: dmar0/0000:00:0a.0 seq# 4: rid=0x50 addr=0x559ef6f9d rw--l pasid=0x2 index=0x1 | prq_report: dmar0/0000:00:0a.0 seq# 5: rid=0x50 addr=0x559ef6f99 r---- pasid=0x2 index=0x1 | prq_report: dmar0/0000:00:0a.0 seq# 6: rid=0x50 addr=0x559ef6f9e rw--l pasid=0x2 index=0x1 | prq_report: dmar0/0000:00:0a.0 seq# 7: rid=0x50 addr=0x559ef6f9a r---- pasid=0x2 index=0x1 | prq_report: dmar0/0000:00:0a.0 seq# 8: rid=0x50 addr=0x559ef6f9f rw--l pasid=0x2 index=0x1 This will be helpful for I/O page fault related debugging. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20210520031531.712333-1-baolu.lu@linux.intel.com Link: https://lore.kernel.org/r/20210610020115.1637656-13-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-06-10iommu/vt-d: Report prq to io-pgfault frameworkLu Baolu2-81/+17
Let the IO page fault requests get handled through the io-pgfault framework. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20210520031531.712333-1-baolu.lu@linux.intel.com Link: https://lore.kernel.org/r/20210610020115.1637656-12-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-06-10iommu/vt-d: Allocate/register iopf queue for sva devicesLu Baolu2-26/+77
This allocates and registers the iopf queue infrastructure for devices which want to support IO page fault for SVA. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20210520031531.712333-1-baolu.lu@linux.intel.com Link: https://lore.kernel.org/r/20210610020115.1637656-11-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>