Age | Commit message (Collapse) | Author | Files | Lines |
|
The __domain_mapping() always removes the pages in the range from
'iov_pfn' to 'end_pfn', but the 'end_pfn' is always the last pfn
of the range that the caller wants to map.
This would introduce too many duplicated removing and leads the
map operation take too long, for example:
Map iova=0x100000,nr_pages=0x7d61800
iov_pfn: 0x100000, end_pfn: 0x7e617ff
iov_pfn: 0x140000, end_pfn: 0x7e617ff
iov_pfn: 0x180000, end_pfn: 0x7e617ff
iov_pfn: 0x1c0000, end_pfn: 0x7e617ff
iov_pfn: 0x200000, end_pfn: 0x7e617ff
...
it takes about 50ms in total.
We can reduce the cost by recalculate the 'end_pfn' and limit it
to the boundary of the end of this pte page.
Map iova=0x100000,nr_pages=0x7d61800
iov_pfn: 0x100000, end_pfn: 0x13ffff
iov_pfn: 0x140000, end_pfn: 0x17ffff
iov_pfn: 0x180000, end_pfn: 0x1bffff
iov_pfn: 0x1c0000, end_pfn: 0x1fffff
iov_pfn: 0x200000, end_pfn: 0x23ffff
...
it only need 9ms now.
This also removes a meaningless BUG_ON() in __domain_mapping().
Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
Tested-by: Liujunjie <liujunjie23@huawei.com>
Link: https://lore.kernel.org/r/20211008000433.1115-1-longpeng2@huawei.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20211014053839.727419-10-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
The first_pte_in_page() returns true or false, so let's convert its
return type to bool. In addition, use 'IS_ALIGNED' to make the
code more readable and neater.
Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
Link: https://lore.kernel.org/r/20211008000433.1115-1-longpeng2@huawei.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20211014053839.727419-9-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
The iommu_domain data structure already has the "type" field to keep the
type of a domain. It's unnecessary to have the DOMAIN_FLAG_STATIC_IDENTITY
flag in the vt-d implementation. This cleans it up with no functionality
change.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20210926114535.923263-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20211014053839.727419-4-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
The VT-d spec Revision 3.3 updated the virtual command registers, virtual
command opcode B register, virtual command response register and virtual
command capability register (Section 10.4.43, 10.4.44, 10.4.45, 10.4.46).
This updates the virtual command interface implementation in the Intel
IOMMU driver accordingly.
Fixes: 24f27d32ab6b7 ("iommu/vt-d: Enlightened PASID allocation")
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Sanjay Kumar <sanjay.k.kumar@intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20210713042649.3547403-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20210818134852.1847070-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Avoid below checkpatch warning.
WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
+ unsigned iommu_refcnt[DMAR_UNITS_SUPPORTED];
Fixes: 29a27719abaa ("iommu/vt-d: Replace iommu_bmp with a refcount")
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210530075053.264218-1-parav@nvidia.com
Link: https://lore.kernel.org/r/20210610020115.1637656-23-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
DMAR domain uses per DMAR refcount. It is indexed by iommu seq_id.
Older iommu_count is only incremented and decremented but no decisions
are taken based on this refcount. This is not of much use.
Hence, remove iommu_count and further simplify domain_detach_iommu()
by returning void.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210530075053.264218-1-parav@nvidia.com
Link: https://lore.kernel.org/r/20210610020115.1637656-21-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
IOTLB device presence, iommu coherency and snooping are boolean
capabilities. Use them as bits and keep them adjacent.
Structure layout before the reorg.
$ pahole -C dmar_domain drivers/iommu/intel/dmar.o
struct dmar_domain {
int nid; /* 0 4 */
unsigned int iommu_refcnt[128]; /* 4 512 */
/* --- cacheline 8 boundary (512 bytes) was 4 bytes ago --- */
u16 iommu_did[128]; /* 516 256 */
/* --- cacheline 12 boundary (768 bytes) was 4 bytes ago --- */
bool has_iotlb_device; /* 772 1 */
/* XXX 3 bytes hole, try to pack */
struct list_head devices; /* 776 16 */
struct list_head subdevices; /* 792 16 */
struct iova_domain iovad __attribute__((__aligned__(8)));
/* 808 2320 */
/* --- cacheline 48 boundary (3072 bytes) was 56 bytes ago --- */
struct dma_pte * pgd; /* 3128 8 */
/* --- cacheline 49 boundary (3136 bytes) --- */
int gaw; /* 3136 4 */
int agaw; /* 3140 4 */
int flags; /* 3144 4 */
int iommu_coherency; /* 3148 4 */
int iommu_snooping; /* 3152 4 */
int iommu_count; /* 3156 4 */
int iommu_superpage; /* 3160 4 */
/* XXX 4 bytes hole, try to pack */
u64 max_addr; /* 3168 8 */
u32 default_pasid; /* 3176 4 */
/* XXX 4 bytes hole, try to pack */
struct iommu_domain domain; /* 3184 72 */
/* size: 3256, cachelines: 51, members: 18 */
/* sum members: 3245, holes: 3, sum holes: 11 */
/* forced alignments: 1 */
/* last cacheline: 56 bytes */
} __attribute__((__aligned__(8)));
After arranging it for natural padding and to make flags as u8 bits, it
saves 8 bytes for the struct.
struct dmar_domain {
int nid; /* 0 4 */
unsigned int iommu_refcnt[128]; /* 4 512 */
/* --- cacheline 8 boundary (512 bytes) was 4 bytes ago --- */
u16 iommu_did[128]; /* 516 256 */
/* --- cacheline 12 boundary (768 bytes) was 4 bytes ago --- */
u8 has_iotlb_device:1; /* 772: 0 1 */
u8 iommu_coherency:1; /* 772: 1 1 */
u8 iommu_snooping:1; /* 772: 2 1 */
/* XXX 5 bits hole, try to pack */
/* XXX 3 bytes hole, try to pack */
struct list_head devices; /* 776 16 */
struct list_head subdevices; /* 792 16 */
struct iova_domain iovad __attribute__((__aligned__(8)));
/* 808 2320 */
/* --- cacheline 48 boundary (3072 bytes) was 56 bytes ago --- */
struct dma_pte * pgd; /* 3128 8 */
/* --- cacheline 49 boundary (3136 bytes) --- */
int gaw; /* 3136 4 */
int agaw; /* 3140 4 */
int flags; /* 3144 4 */
int iommu_count; /* 3148 4 */
int iommu_superpage; /* 3152 4 */
/* XXX 4 bytes hole, try to pack */
u64 max_addr; /* 3160 8 */
u32 default_pasid; /* 3168 4 */
/* XXX 4 bytes hole, try to pack */
struct iommu_domain domain; /* 3176 72 */
/* size: 3248, cachelines: 51, members: 18 */
/* sum members: 3236, holes: 3, sum holes: 11 */
/* sum bitfield members: 3 bits, bit holes: 1, sum bit holes: 5 bits */
/* forced alignments: 1 */
/* last cacheline: 48 bytes */
} __attribute__((__aligned__(8)));
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210530075053.264218-1-parav@nvidia.com
Link: https://lore.kernel.org/r/20210610020115.1637656-20-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
The execution time of some operations is very performance critical, such
as cache invalidation and PRQ processing time. This adds some common code
to monitor the execution time range of those operations. The interfaces
include enabling/disabling, checking status, updating sampling data and
providing a common string format for users.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210520031531.712333-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20210610020115.1637656-14-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
This adds a new trace event to track the page fault request report.
This event will provide almost all information defined in a page
request descriptor.
A sample output:
| prq_report: dmar0/0000:00:0a.0 seq# 1: rid=0x50 addr=0x559ef6f97 r---- pasid=0x2 index=0x1
| prq_report: dmar0/0000:00:0a.0 seq# 2: rid=0x50 addr=0x559ef6f9c rw--l pasid=0x2 index=0x1
| prq_report: dmar0/0000:00:0a.0 seq# 3: rid=0x50 addr=0x559ef6f98 r---- pasid=0x2 index=0x1
| prq_report: dmar0/0000:00:0a.0 seq# 4: rid=0x50 addr=0x559ef6f9d rw--l pasid=0x2 index=0x1
| prq_report: dmar0/0000:00:0a.0 seq# 5: rid=0x50 addr=0x559ef6f99 r---- pasid=0x2 index=0x1
| prq_report: dmar0/0000:00:0a.0 seq# 6: rid=0x50 addr=0x559ef6f9e rw--l pasid=0x2 index=0x1
| prq_report: dmar0/0000:00:0a.0 seq# 7: rid=0x50 addr=0x559ef6f9a r---- pasid=0x2 index=0x1
| prq_report: dmar0/0000:00:0a.0 seq# 8: rid=0x50 addr=0x559ef6f9f rw--l pasid=0x2 index=0x1
This will be helpful for I/O page fault related debugging.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210520031531.712333-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20210610020115.1637656-13-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
This allocates and registers the iopf queue infrastructure for devices
which want to support IO page fault for SVA.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210520031531.712333-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20210610020115.1637656-11-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Align the pasid alloc/free code with the generic helpers defined in the
iommu core. This also refactored the SVA binding code to improve the
readability.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210520031531.712333-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20210610020115.1637656-8-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
When the Intel IOMMU is operating in the scalable mode, some information
from the root and context table may be used to tag entries in the PASID
cache. Software should invalidate the PASID-cache when changing root or
context table entries.
Suggested-by: Ashok Raj <ashok.raj@intel.com>
Fixes: 7373a8cc38197 ("iommu/vt-d: Setup context and enable RID2PASID support")
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210320025415.641201-4-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
The svm_dev_ops has never been referenced in the tree, and there's no
plan to have anything to use it. Remove it to make the code neat.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210323010600.678627-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
When the invalidation queue errors are encountered, dump the information
logged by the VT-d hardware together with the pending queue invalidation
descriptors.
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Tested-by: Guo Kaijie <Kaijie.Guo@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210318005340.187311-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Audit IOMMU Capability/Extended Capability and check if the IOMMUs have
the consistent value for features. Report out or scale to the lowest
supported when IOMMU features have incompatibility among IOMMUs.
Report out features when below features are mismatched:
- First Level 5 Level Paging Support (FL5LP)
- First Level 1 GByte Page Support (FL1GP)
- Read Draining (DRD)
- Write Draining (DWD)
- Page Selective Invalidation (PSI)
- Zero Length Read (ZLR)
- Caching Mode (CM)
- Protected High/Low-Memory Region (PHMR/PLMR)
- Required Write-Buffer Flushing (RWBF)
- Advanced Fault Logging (AFL)
- RID-PASID Support (RPS)
- Scalable Mode Page Walk Coherency (SMPWC)
- First Level Translation Support (FLTS)
- Second Level Translation Support (SLTS)
- No Write Flag Support (NWFS)
- Second Level Accessed/Dirty Support (SLADS)
- Virtual Command Support (VCS)
- Scalable Mode Translation Support (SMTS)
- Device TLB Invalidation Throttle (DIT)
- Page Drain Support (PDS)
- Process Address Space ID Support (PASID)
- Extended Accessed Flag Support (EAFS)
- Supervisor Request Support (SRS)
- Execute Request Support (ERS)
- Page Request Support (PRS)
- Nested Translation Support (NEST)
- Snoop Control (SC)
- Pass Through (PT)
- Device TLB Support (DT)
- Queued Invalidation (QI)
- Page walk Coherency (C)
Set capability to the lowest supported when below features are mismatched:
- Maximum Address Mask Value (MAMV)
- Number of Fault Recording Registers (NFR)
- Second Level Large Page Support (SLLPS)
- Fault Recording Offset (FRO)
- Maximum Guest Address Width (MGAW)
- Supported Adjusted Guest Address Width (SAGAW)
- Number of Domains supported (NDOMS)
- Pasid Size Supported (PSS)
- Maximum Handle Mask Value (MHMV)
- IOTLB Register Offset (IRO)
Signed-off-by: Kyung Min Park <kyung.min.park@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210130184452.31711-1-kyung.min.park@intel.com
Link: https://lore.kernel.org/r/20210204014401.2846425-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Fix misspellings of "physical".
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/linux-iommu/20210126211738.2920789-1-helgaas@kernel.org
Link: https://lore.kernel.org/r/20210204014401.2846425-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
The Access/Dirty bits in the first level page table entry will be set
whenever a page table entry was used for address translation or write
permission was successfully translated. This is always true when using
the first-level page table for kernel IOVA. Instead of wasting hardware
cycles to update the certain bits, it's better to set them up at the
beginning.
Suggested-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210115004202.953965-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
The aux-domain attach/detach are not tracked, some data structures might
be used after free. This causes general protection faults when multiple
subdevices are created and assigned to a same guest machine:
| general protection fault, probably for non-canonical address 0xdead000000000100: 0000 [#1] SMP NOPTI
| RIP: 0010:intel_iommu_aux_detach_device+0x12a/0x1f0
| [...]
| Call Trace:
| iommu_aux_detach_device+0x24/0x70
| vfio_mdev_detach_domain+0x3b/0x60
| ? vfio_mdev_set_domain+0x50/0x50
| iommu_group_for_each_dev+0x4f/0x80
| vfio_iommu_detach_group.isra.0+0x22/0x30
| vfio_iommu_type1_detach_group.cold+0x71/0x211
| ? find_exported_symbol_in_section+0x4a/0xd0
| ? each_symbol_section+0x28/0x50
| __vfio_group_unset_container+0x4d/0x150
| vfio_group_try_dissolve_container+0x25/0x30
| vfio_group_put_external_user+0x13/0x20
| kvm_vfio_group_put_external_user+0x27/0x40 [kvm]
| kvm_vfio_destroy+0x45/0xb0 [kvm]
| kvm_put_kvm+0x1bb/0x2e0 [kvm]
| kvm_vm_release+0x22/0x30 [kvm]
| __fput+0xcc/0x260
| ____fput+0xe/0x10
| task_work_run+0x8f/0xb0
| do_exit+0x358/0xaf0
| ? wake_up_state+0x10/0x20
| ? signal_wake_up_state+0x1a/0x30
| do_group_exit+0x47/0xb0
| __x64_sys_exit_group+0x18/0x20
| do_syscall_64+0x57/0x1d0
| entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fix the crash by tracking the subdevices when attaching and detaching
aux-domains.
Fixes: 67b8e02b5e76 ("iommu/vt-d: Aux-domain specific domain attach/detach")
Co-developed-by: Xin Zeng <xin.zeng@intel.com>
Signed-off-by: Xin Zeng <xin.zeng@intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Acked-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/1609949037-25291-3-git-send-email-yi.l.liu@intel.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
'struct intel_svm' is shared by all devices bound to a give process,
but records only a single pointer to a 'struct intel_iommu'. Consequently,
cache invalidations may only be applied to a single DMAR unit, and are
erroneously skipped for the other devices.
In preparation for fixing this, rework the structures so that the iommu
pointer resides in 'struct intel_svm_dev', allowing 'struct intel_svm'
to track them in its device list.
Fixes: 1c4f88b7f1f9 ("iommu/vt-d: Shared virtual address in scalable mode")
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Raj Ashok <ashok.raj@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Reported-by: Guo Kaijie <Kaijie.Guo@intel.com>
Reported-by: Xin Zeng <xin.zeng@intel.com>
Signed-off-by: Guo Kaijie <Kaijie.Guo@intel.com>
Signed-off-by: Xin Zeng <xin.zeng@intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Tested-by: Guo Kaijie <Kaijie.Guo@intel.com>
Cc: stable@vger.kernel.org # v5.0+
Acked-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/1609949037-25291-2-git-send-email-yi.l.liu@intel.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
"intel_iommu=off" command line is used to disable iommu but iommu is force
enabled in a tboot system for security reason.
However for better performance on high speed network device, a new option
"intel_iommu=tboot_noforce" is introduced to disable the force on.
By default kernel should panic if iommu init fail in tboot for security
reason, but it's unnecessory if we use "intel_iommu=tboot_noforce,off".
Fix the code setting force_on and move intel_iommu_tboot_noforce
from tboot code to intel iommu code.
Fixes: 7304e8f28bb2 ("iommu/vt-d: Correctly disable Intel IOMMU force on")
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@gmail.com>
Tested-by: Lukasz Hawrylko <lukasz.hawrylko@linux.intel.com>
Acked-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20201110071908.3133-1-zhenzhong.duan@gmail.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu updates from Joerg Roedel:
- ARM-SMMU Updates from Will:
- Continued SVM enablement, where page-table is shared with CPU
- Groundwork to support integrated SMMU with Adreno GPU
- Allow disabling of MSI-based polling on the kernel command-line
- Minor driver fixes and cleanups (octal permissions, error
messages, ...)
- Secure Nested Paging Support for AMD IOMMU. The IOMMU will fault when
a device tries DMA on memory owned by a guest. This needs new
fault-types as well as a rewrite of the IOMMU memory semaphore for
command completions.
- Allow broken Intel IOMMUs (wrong address widths reported) to still be
used for interrupt remapping.
- IOMMU UAPI updates for supporting vSVA, where the IOMMU can access
address spaces of processes running in a VM.
- Support for the MT8167 IOMMU in the Mediatek IOMMU driver.
- Device-tree updates for the Renesas driver to support r8a7742.
- Several smaller fixes and cleanups all over the place.
* tag 'iommu-updates-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (57 commits)
iommu/vt-d: Gracefully handle DMAR units with no supported address widths
iommu/vt-d: Check UAPI data processed by IOMMU core
iommu/uapi: Handle data and argsz filled by users
iommu/uapi: Rename uapi functions
iommu/uapi: Use named union for user data
iommu/uapi: Add argsz for user filled data
docs: IOMMU user API
iommu/qcom: add missing put_device() call in qcom_iommu_of_xlate()
iommu/arm-smmu-v3: Add SVA device feature
iommu/arm-smmu-v3: Check for SVA features
iommu/arm-smmu-v3: Seize private ASID
iommu/arm-smmu-v3: Share process page tables
iommu/arm-smmu-v3: Move definitions to a header
iommu/io-pgtable-arm: Move some definitions to a header
iommu/arm-smmu-v3: Ensure queue is read after updating prod pointer
iommu/amd: Re-purpose Exclusion range registers to support SNP CWWB
iommu/amd: Add support for RMP_PAGE_FAULT and RMP_HW_ERR
iommu/amd: Use 4K page for completion wait write-back semaphore
iommu/tegra-smmu: Allow to group clients in same swgroup
iommu/tegra-smmu: Fix iova->phys translation
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 irq updates from Thomas Gleixner:
"Surgery of the MSI interrupt handling to prepare the support of
upcoming devices which require non-PCI based MSI handling:
- Cleanup historical leftovers all over the place
- Rework the code to utilize more core functionality
- Wrap XEN PCI/MSI interrupts into an irqdomain to make irqdomain
assignment to PCI devices possible.
- Assign irqdomains to PCI devices at initialization time which
allows to utilize the full functionality of hierarchical
irqdomains.
- Remove arch_.*_msi_irq() functions from X86 and utilize the
irqdomain which is assigned to the device for interrupt management.
- Make the arch_.*_msi_irq() support conditional on a config switch
and let the last few users select it"
* tag 'x86-irq-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
PCI: MSI: Fix Kconfig dependencies for PCI_MSI_ARCH_FALLBACKS
x86/apic/msi: Unbreak DMAR and HPET MSI
iommu/amd: Remove domain search for PCI/MSI
iommu/vt-d: Remove domain search for PCI/MSI[X]
x86/irq: Make most MSI ops XEN private
x86/irq: Cleanup the arch_*_msi_irqs() leftovers
PCI/MSI: Make arch_.*_msi_irq[s] fallbacks selectable
x86/pci: Set default irq domain in pcibios_add_device()
iommm/amd: Store irq domain in struct device
iommm/vt-d: Store irq domain in struct device
x86/xen: Wrap XEN MSI management into irqdomain
irqdomain/msi: Allow to override msi_domain_alloc/free_irqs()
x86/xen: Consolidate XEN-MSI init
x86/xen: Rework MSI teardown
x86/xen: Make xen_msi_init() static and rename it to xen_hvm_msi_init()
PCI/MSI: Provide pci_dev_has_special_msi_domain() helper
PCI_vmd_Mark_VMD_irqdomain_with_DOMAIN_BUS_VMD_MSI
irqdomain/msi: Provide DOMAIN_BUS_VMD_MSI
x86/irq: Initialize PCI/MSI domain at PCI init time
x86/pci: Reducde #ifdeffery in PCI init code
...
|
|
"flags" passed to intel_svm_bind_mm() is a bit mask and should be
defined as "unsigned int" instead of "int".
Change its type to "unsigned int".
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Acked-by: Joerg Roedel <jroedel@suse.de>
Link: https://lkml.kernel.org/r/1600187413-163670-3-git-send-email-fenghua.yu@intel.com
|
|
PASID is defined as a few different types in iommu including "int",
"u32", and "unsigned int". To be consistent and to match with uapi
definitions, define PASID and its variations (e.g. max PASID) as "u32".
"u32" is also shorter and a little more explicit than "unsigned int".
No PASID type change in uapi although it defines PASID as __u64 in
some places.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Joerg Roedel <jroedel@suse.de>
Link: https://lkml.kernel.org/r/1600187413-163670-2-git-send-email-fenghua.yu@intel.com
|
|
As a first step to make X86 utilize the direct MSI irq domain operations
store the irq domain pointer in the device struct when a device is probed.
This is done from dmar_pci_bus_add_dev() because it has to work even when
DMA remapping is disabled. It only overrides the irqdomain of devices which
are handled by a regular PCI/MSI irq domain which protects PCI devices
behind special busses like VMD which have their own irq domain.
No functional change. It just avoids the redirection through
arch_*_msi_irqs() and allows the PCI/MSI core to directly invoke the irq
domain alloc/free functions instead of having to look up the irq domain for
every single MSI interupt.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Joerg Roedel <jroedel@suse.de>
Link: https://lore.kernel.org/r/20200826112333.714566121@linutronix.de
|
|
Static analyzer is not happy about intel_iommu_gfx_mapped declaration:
.../drivers/iommu/intel/iommu.c:364:5: warning: symbol 'intel_iommu_gfx_mapped' was not declared. Should it be static?
Move its declaration to Intel IOMMU header file.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20200828161212.71294-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
The VT-d spec requires (10.4.4 Global Command Register, TE field) that:
Hardware implementations supporting DMA draining must drain any in-flight
DMA read/write requests queued within the Root-Complex before completing
the translation enable command and reflecting the status of the command
through the TES field in the Global Status register.
Unfortunately, some integrated graphic devices fail to do so after some
kind of power state transition. As the result, the system might stuck in
iommu_disable_translation(), waiting for the completion of TE transition.
This provides a quirk list for those devices and skips TE disabling if
the qurik hits.
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=208363
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=206571
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Tested-by: Koba Ko <koba.ko@canonical.com>
Tested-by: Jun Miao <jun.miao@windriver.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200723013437.2268-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
After page requests are handled, software must respond to the device
which raised the page request with the result. This is done through
the iommu ops.page_response if the request was reported to outside of
vendor iommu driver through iommu_report_device_fault(). This adds the
VT-d implementation of page_response ops.
Co-developed-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Co-developed-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20200724014925.15523-12-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
It is refactored in two ways:
- Make it global so that it could be used in other files.
- Make bus/devfn optional so that callers could ignore these two returned
values when they only want to get the coresponding iommu pointer.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20200724014925.15523-9-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Global pages support is removed from VT-d spec 3.0 for dev TLB
invalidation. This patch is to remove the bits for vSVA. Similar change
already made for the native SVA. See the link below.
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/linux-iommu/20190830142919.GE11578@8bytes.org/T/
Link: https://lore.kernel.org/r/20200724014925.15523-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Set proper masks to avoid invalid input spillover to reserved bits.
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/20200724014925.15523-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
When using first-level translation for IOVA, currently the U/S bit in the
page table is cleared which implies DMA requests with user privilege are
blocked. As the result, following error messages might be observed when
passing through a device to user level:
DMAR: DRHD: handling fault status reg 3
DMAR: [DMA Read] Request device [41:00.0] PASID 1 fault addr 7ecdcd000
[fault reason 129] SM: U/S set 0 for first-level translation
with user privilege
This fixes it by setting U/S bit in the first level page table and makes
IOVA over first level compatible with previous second-level translation.
Fixes: b802d070a52a1 ("iommu/vt-d: Use iova over first level")
Reported-by: Xin Zeng <xin.zeng@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20200622231345.29722-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Sub-devices of a real DMA device might exist on a separate segment than
the real DMA device and its IOMMU. These devices should still have a
valid device_domain_info, but the current dma alias model won't
allocate info for the subdevice.
This patch adds a segment member to struct device_domain_info and uses
the sub-device's BDF so that these sub-devices won't alias to other
devices.
Fixes: 2b0140c69637e ("iommu/vt-d: Use pci_real_dma_dev() for mapping")
Cc: stable@vger.kernel.org # v5.6+
Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
Acked-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20200527165617.297470-3-jonathan.derrick@intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
When a PASID is stopped or terminated, there can be pending PRQs
(requests that haven't received responses) in remapping hardware.
This adds the interface to drain page requests and call it when a
PASID is terminated.
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20200516062101.29541-16-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Current qi_submit_sync() only supports single invalidation descriptor
per submission and appends wait descriptor after each submission to
poll the hardware completion. This extends the qi_submit_sync() helper
to support multiple descriptors, and add an option so that the caller
could specify the Page-request Drain (PD) bit in the wait descriptor.
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20200516062101.29541-13-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
This patch is an initial step to replace Intel SVM code with the
following IOMMU SVA ops:
intel_svm_bind_mm() => iommu_sva_bind_device()
intel_svm_unbind_mm() => iommu_sva_unbind_device()
intel_svm_is_pasid_valid() => iommu_sva_get_pasid()
The features below will continue to work but are not included in this patch
in that they are handled mostly within the IOMMU subsystem.
- IO page fault
- mmu notifier
Consolidation of the above will come after merging generic IOMMU sva
code[1]. There should not be any changes needed for SVA users such as
accelerator device drivers during this time.
[1] http://jpbrucker.net/sva/
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20200516062101.29541-12-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Add a get_domain_info() helper to retrieve the valid per-device
iommu private data.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20200516062101.29541-10-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
When VT-d driver runs in the guest, PASID allocation must be
performed via virtual command interface. This patch registers a
custom IOASID allocator which takes precedence over the default
XArray based allocator. The resulting IOASID allocation will always
come from the host. This ensures that PASID namespace is system-
wide.
Virtual command registers are used in the guest only, to prevent
vmexit cost, we cache the capability and store it during initialization.
Signed-off-by: Liu, Yi L <yi.l.liu@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/20200516062101.29541-9-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Enabling IOMMU in a guest requires communication with the host
driver for certain aspects. Use of PASID ID to enable Shared Virtual
Addressing (SVA) requires managing PASID's in the host. VT-d 3.0 spec
provides a Virtual Command Register (VCMD) to facilitate this.
Writes to this register in the guest are trapped by vIOMMU which
proxies the call to the host driver.
This virtual command interface consists of a capability register,
a virtual command register, and a virtual response register. Refer
to section 10.4.42, 10.4.43, 10.4.44 for more information.
This patch adds the enlightened PASID allocation/free interfaces
via the virtual command interface.
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20200516062101.29541-8-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
When Shared Virtual Memory is exposed to a guest via vIOMMU, scalable
IOTLB invalidation may be passed down from outside IOMMU subsystems.
This patch adds invalidation functions that can be used for additional
translation cache types.
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/20200516062101.29541-6-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
When supporting guest SVA with emulated IOMMU, the guest PASID
table is shadowed in VMM. Updates to guest vIOMMU PASID table
will result in PASID cache flush which will be passed down to
the host as bind guest PASID calls.
For the SL page tables, it will be harvested from device's
default domain (request w/o PASID), or aux domain in case of
mediated device.
.-------------. .---------------------------.
| vIOMMU | | Guest process CR3, FL only|
| | '---------------------------'
.----------------/
| PASID Entry |--- PASID cache flush -
'-------------' |
| | V
| | CR3 in GPA
'-------------'
Guest
------| Shadow |--------------------------|--------
v v v
Host
.-------------. .----------------------.
| pIOMMU | | Bind FL for GVA-GPA |
| | '----------------------'
.----------------/ |
| PASID Entry | V (Nested xlate)
'----------------\.------------------------------.
| | |SL for GPA-HPA, default domain|
| | '------------------------------'
'-------------'
Where:
- FL = First level/stage one page tables
- SL = Second level/stage two page tables
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20200516062101.29541-5-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Nested translation mode is supported in VT-d 3.0 Spec.CH 3.8.
With PASID granular translation type set to 0x11b, translation
result from the first level(FL) also subject to a second level(SL)
page table translation. This mode is used for SVA virtualization,
where FL performs guest virtual to guest physical translation and
SL performs guest physical to host physical translation.
This patch adds a helper function for setting up nested translation
where second level comes from a domain and first level comes from
a guest PGD.
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/20200516062101.29541-4-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Move domain helper to header to be used by SVA code.
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20200516062101.29541-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Commit 6825d3ea6cde ("iommu/vt-d: Add debugfs support to show register
contents") dumps the register contents for all IOMMU devices.
Currently, a 64 bit read(dmar_readq) is done for all the IOMMU registers,
even though some of the registers are 32 bits, which is incorrect.
Use the correct read function variant (dmar_readl/dmar_readq) while
reading the contents of 32/64 bit registers respectively.
Signed-off-by: Megha Dey <megha.dey@linux.intel.com>
Link: https://lore.kernel.org/r/1583784587-26126-2-git-send-email-megha.dey@linux.intel.com
Acked-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Export page table internals of the domain attached to each device.
Example of such dump on a Skylake machine:
$ sudo cat /sys/kernel/debug/iommu/intel/domain_translation_struct
[ ... ]
Device 0000:00:14.0 with pasid 0 @0x15f3d9000
IOVA_PFN PML5E PML4E
0x000000008ced0 | 0x0000000000000000 0x000000015f3da003
0x000000008ced1 | 0x0000000000000000 0x000000015f3da003
0x000000008ced2 | 0x0000000000000000 0x000000015f3da003
0x000000008ced3 | 0x0000000000000000 0x000000015f3da003
0x000000008ced4 | 0x0000000000000000 0x000000015f3da003
0x000000008ced5 | 0x0000000000000000 0x000000015f3da003
0x000000008ced6 | 0x0000000000000000 0x000000015f3da003
0x000000008ced7 | 0x0000000000000000 0x000000015f3da003
0x000000008ced8 | 0x0000000000000000 0x000000015f3da003
0x000000008ced9 | 0x0000000000000000 0x000000015f3da003
PDPE PDE PTE
0x000000015f3db003 0x000000015f3dc003 0x000000008ced0003
0x000000015f3db003 0x000000015f3dc003 0x000000008ced1003
0x000000015f3db003 0x000000015f3dc003 0x000000008ced2003
0x000000015f3db003 0x000000015f3dc003 0x000000008ced3003
0x000000015f3db003 0x000000015f3dc003 0x000000008ced4003
0x000000015f3db003 0x000000015f3dc003 0x000000008ced5003
0x000000015f3db003 0x000000015f3dc003 0x000000008ced6003
0x000000015f3db003 0x000000015f3dc003 0x000000008ced7003
0x000000015f3db003 0x000000015f3dc003 0x000000008ced8003
0x000000015f3db003 0x000000015f3dc003 0x000000008ced9003
[ ... ]
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
When software has changed first-level tables, it should invalidate
the affected IOTLB and the paging-structure-caches using the PASID-
based-IOTLB Invalidate Descriptor defined in spec 6.5.2.4.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Intel VT-d in scalable mode supports two types of page tables for
IOVA translation: first level and second level. The IOMMU driver
can choose one from both for IOVA translation according to the use
case. This sets up the pasid entry if a domain is selected to use
the first-level page table for iova translation.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Shared Virtual Memory(SVM) is based on a collective set of hardware
features detected at runtime. There are requirements for matching CPU
and IOMMU capabilities.
The current code checks CPU and IOMMU feature set for SVM support but
the result is never stored nor used. Therefore, SVM can still be used
even when these checks failed. The consequences can be:
1. CPU uses 5-level paging mode for virtual address of 57 bits, but
IOMMU can only support 4-level paging mode with 48 bits address for DMA.
2. 1GB page size is used by CPU but IOMMU does not support it. VT-d
unrecoverable faults may be generated.
The best solution to fix these problems is to prevent them in the first
place.
This patch consolidates code for checking PASID, CPU vs. IOMMU paging
mode compatibility, as well as provides specific error messages for
each failed checks. On sane hardware configurations, these error message
shall never appear in kernel log.
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
For both PASID-based-Device-TLB Invalidate Descriptor and
Device-TLB Invalidate Descriptor, the Physical Function Source-ID
value is split according to this layout:
PFSID[3:0] is set at offset 12 and PFSID[15:4] is put at offset 52.
Fix the part laid out at offset 52.
Fixes: 0f725561e1684 ("iommu/vt-d: Add definitions for PFSID")
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Acked-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: stable@vger.kernel.org # v4.19+
Acked-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
'arm/qcom', 'arm/renesas', 'x86/amd', 'x86/vt-d' and 'core' into next
|