Age | Commit message (Collapse) | Author | Files | Lines |
|
This fixes improper iotlb invalidation in intel_pasid_tear_down_entry().
When a PASID was used as nested mode, released and reused, the following
error message will appear:
[ 180.187556] Unexpected page request in Privilege Mode
[ 180.187565] Unexpected page request in Privilege Mode
[ 180.279933] Unexpected page request in Privilege Mode
[ 180.279937] Unexpected page request in Privilege Mode
Per chapter 6.5.3.3 of VT-d spec 3.3, when tear down a pasid entry, the
software should use Domain selective IOTLB flush if the PGTT of the pasid
entry is SL only or Nested, while for the pasid entries whose PGTT is FL
only or PT using PASID-based IOTLB flush is enough.
Fixes: 2cd1311a26673 ("iommu/vt-d: Add set domain DOMAIN_ATTR_NESTING attr")
Signed-off-by: Kumar Sanjay K <sanjay.k.kumar@intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Tested-by: Yi Sun <yi.y.sun@intel.com>
Link: https://lore.kernel.org/r/20210817042425.1784279-1-yi.l.liu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210817124321.1517985-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
A PASID reference is increased whenever a device is bound to an mm (and
its PASID) successfully (i.e. the device's sdev user count is increased).
But the reference is not dropped every time the device is unbound
successfully from the mm (i.e. the device's sdev user count is decreased).
The reference is dropped only once by calling intel_svm_free_pasid() when
there isn't any device bound to the mm. intel_svm_free_pasid() drops the
reference and only frees the PASID on zero reference.
Fix the issue by dropping the PASID reference and freeing the PASID when
no reference on successful unbinding the device by calling
intel_svm_free_pasid() .
Fixes: 4048377414162 ("iommu/vt-d: Use iommu_sva_alloc(free)_pasid() helpers")
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20210813181345.1870742-1-fenghua.yu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210817124321.1517985-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
The commit 2b0140c69637e ("iommu/vt-d: Use pci_real_dma_dev() for mapping")
fixes an issue of "sub-device is removed where the context entry is cleared
for all aliases". But this commit didn't consider the PASID entry and PASID
table in VT-d scalable mode. This fix increases the coverage of scalable
mode.
Suggested-by: Sanjay Kumar <sanjay.k.kumar@intel.com>
Fixes: 8038bdb855331 ("iommu/vt-d: Only clear real DMA device's context entries")
Fixes: 2b0140c69637e ("iommu/vt-d: Use pci_real_dma_dev() for mapping")
Cc: stable@vger.kernel.org # v5.6+
Cc: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210712071712.3416949-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
This fixes a bug in context cache clear operation. The code was not
following the correct invalidation flow. A global device TLB invalidation
should be added after the IOTLB invalidation. At the same time, it
uses the domain ID from the context entry. But in scalable mode, the
domain ID is in PASID table entry, not context entry.
Fixes: 7373a8cc38197 ("iommu/vt-d: Setup context and enable RID2PASID support")
Cc: stable@vger.kernel.org # v5.0+
Signed-off-by: Sanjay Kumar <sanjay.k.kumar@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210712071315.3416543-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
'x86/amd', 'virtio' and 'core' into next
|
|
Passing a 64-bit address width to iommu_setup_dma_ops() is valid on
virtual platforms, but isn't currently possible. The overflow check in
iommu_dma_init_domain() prevents this even when @dma_base isn't 0. Pass
a limit address instead of a size, so callers don't have to fake a size
to work around the check.
The base and limit parameters are being phased out, because:
* they are redundant for x86 callers. dma-iommu already reserves the
first page, and the upper limit is already in domain->geometry.
* they can now be obtained from dev->dma_range_map on Arm.
But removing them on Arm isn't completely straightforward so is left for
future work. As an intermediate step, simplify the x86 callers by
passing dummy limits.
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/20210618152059.1194210-5-jean-philippe@linaro.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
The assignment of iommu from info->iommu occurs before info is null checked
hence leading to a potential null pointer dereference issue. Fix this by
assigning iommu and checking if iommu is null after null checking info.
Addresses-Coverity: ("Dereference before null check")
Fixes: 4c82b88696ac ("iommu/vt-d: Allocate/register iopf queue for sva devices")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20210611135024.32781-1-colin.king@canonical.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
A recent commit broke the build on 32-bit x86. The linker throws these
messages:
ld: drivers/iommu/intel/perf.o: in function `dmar_latency_snapshot':
perf.c:(.text+0x40c): undefined reference to `__udivdi3'
ld: perf.c:(.text+0x458): undefined reference to `__udivdi3'
The reason are the 64-bit divides in dmar_latency_snapshot(). Use the
div_u64() helper function for those.
Fixes: 55ee5e67a59a ("iommu/vt-d: Add common code for dmar latency performance monitors")
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Link: https://lore.kernel.org/r/20210610083120.29224-1-joro@8bytes.org
|
|
Page directory assignment by alloc_pgtable_page() or phys_to_virt()
doesn't need typecasting as both routines return void*. Hence, remove
typecasting from both the calls.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210530075053.264218-1-parav@nvidia.com
Link: https://lore.kernel.org/r/20210610020115.1637656-24-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
No need for braces for single line statement under if() block.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210530075053.264218-1-parav@nvidia.com
Link: https://lore.kernel.org/r/20210610020115.1637656-22-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
DMAR domain uses per DMAR refcount. It is indexed by iommu seq_id.
Older iommu_count is only incremented and decremented but no decisions
are taken based on this refcount. This is not of much use.
Hence, remove iommu_count and further simplify domain_detach_iommu()
by returning void.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210530075053.264218-1-parav@nvidia.com
Link: https://lore.kernel.org/r/20210610020115.1637656-21-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
IOTLB device presence, iommu coherency and snooping are boolean
capabilities. Use them as bits and keep them adjacent.
Structure layout before the reorg.
$ pahole -C dmar_domain drivers/iommu/intel/dmar.o
struct dmar_domain {
int nid; /* 0 4 */
unsigned int iommu_refcnt[128]; /* 4 512 */
/* --- cacheline 8 boundary (512 bytes) was 4 bytes ago --- */
u16 iommu_did[128]; /* 516 256 */
/* --- cacheline 12 boundary (768 bytes) was 4 bytes ago --- */
bool has_iotlb_device; /* 772 1 */
/* XXX 3 bytes hole, try to pack */
struct list_head devices; /* 776 16 */
struct list_head subdevices; /* 792 16 */
struct iova_domain iovad __attribute__((__aligned__(8)));
/* 808 2320 */
/* --- cacheline 48 boundary (3072 bytes) was 56 bytes ago --- */
struct dma_pte * pgd; /* 3128 8 */
/* --- cacheline 49 boundary (3136 bytes) --- */
int gaw; /* 3136 4 */
int agaw; /* 3140 4 */
int flags; /* 3144 4 */
int iommu_coherency; /* 3148 4 */
int iommu_snooping; /* 3152 4 */
int iommu_count; /* 3156 4 */
int iommu_superpage; /* 3160 4 */
/* XXX 4 bytes hole, try to pack */
u64 max_addr; /* 3168 8 */
u32 default_pasid; /* 3176 4 */
/* XXX 4 bytes hole, try to pack */
struct iommu_domain domain; /* 3184 72 */
/* size: 3256, cachelines: 51, members: 18 */
/* sum members: 3245, holes: 3, sum holes: 11 */
/* forced alignments: 1 */
/* last cacheline: 56 bytes */
} __attribute__((__aligned__(8)));
After arranging it for natural padding and to make flags as u8 bits, it
saves 8 bytes for the struct.
struct dmar_domain {
int nid; /* 0 4 */
unsigned int iommu_refcnt[128]; /* 4 512 */
/* --- cacheline 8 boundary (512 bytes) was 4 bytes ago --- */
u16 iommu_did[128]; /* 516 256 */
/* --- cacheline 12 boundary (768 bytes) was 4 bytes ago --- */
u8 has_iotlb_device:1; /* 772: 0 1 */
u8 iommu_coherency:1; /* 772: 1 1 */
u8 iommu_snooping:1; /* 772: 2 1 */
/* XXX 5 bits hole, try to pack */
/* XXX 3 bytes hole, try to pack */
struct list_head devices; /* 776 16 */
struct list_head subdevices; /* 792 16 */
struct iova_domain iovad __attribute__((__aligned__(8)));
/* 808 2320 */
/* --- cacheline 48 boundary (3072 bytes) was 56 bytes ago --- */
struct dma_pte * pgd; /* 3128 8 */
/* --- cacheline 49 boundary (3136 bytes) --- */
int gaw; /* 3136 4 */
int agaw; /* 3140 4 */
int flags; /* 3144 4 */
int iommu_count; /* 3148 4 */
int iommu_superpage; /* 3152 4 */
/* XXX 4 bytes hole, try to pack */
u64 max_addr; /* 3160 8 */
u32 default_pasid; /* 3168 4 */
/* XXX 4 bytes hole, try to pack */
struct iommu_domain domain; /* 3176 72 */
/* size: 3248, cachelines: 51, members: 18 */
/* sum members: 3236, holes: 3, sum holes: 11 */
/* sum bitfield members: 3 bits, bit holes: 1, sum bit holes: 5 bits */
/* forced alignments: 1 */
/* last cacheline: 48 bytes */
} __attribute__((__aligned__(8)));
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210530075053.264218-1-parav@nvidia.com
Link: https://lore.kernel.org/r/20210610020115.1637656-20-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Use DEVICE_ATTR_RO() helper instead of plain DEVICE_ATTR(),
which makes the code a bit shorter and easier to read.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210528130229.22108-1-yuehaibing@huawei.com
Link: https://lore.kernel.org/r/20210610020115.1637656-19-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Replace a couple of calls to memcpy() with simple assignments in order
to fix the following out-of-bounds warning:
drivers/iommu/intel/svm.c:1198:4: warning: 'memcpy' offset [25, 32] from
the object at 'desc' is out of the bounds of referenced subobject
'qw2' with type 'long long unsigned int' at offset 16 [-Warray-bounds]
The problem is that the original code is trying to copy data into a
couple of struct members adjacent to each other in a single call to
memcpy(). This causes a legitimate compiler warning because memcpy()
overruns the length of &desc.qw2 and &resp.qw2, respectively.
This helps with the ongoing efforts to globally enable -Warray-bounds
and get us closer to being able to tighten the FORTIFY_SOURCE routines
on memcpy().
Link: https://github.com/KSPP/linux/issues/109
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210414201403.GA392764@embeddedor
Link: https://lore.kernel.org/r/20210610020115.1637656-18-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
The execution time for page fault request handling is performance critical
and needs to be monitored. This adds code to sample the execution time of
page fault request handling.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210520031531.712333-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20210610020115.1637656-17-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Queued invalidation execution time is performance critical and needs
to be monitored. This adds code to sample the execution time of IOTLB/
devTLB/ICE cache invalidation.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210520031531.712333-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20210610020115.1637656-16-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
A debugfs interface /sys/kernel/debug/iommu/intel/dmar_perf_latency is
created to control and show counts of execution time ranges for various
types per DMAR. The interface may help debug any potential performance
issue.
By default, the interface is disabled.
Possible write value of /sys/kernel/debug/iommu/intel/dmar_perf_latency
0 - disable sampling all latency data
1 - enable sampling IOTLB invalidation latency data
2 - enable sampling devTLB invalidation latency data
3 - enable sampling intr entry cache invalidation latency data
4 - enable sampling prq handling latency data
Read /sys/kernel/debug/iommu/intel/dmar_perf_latency gives a snapshot
of sampling result of all enabled monitors.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210520031531.712333-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20210610020115.1637656-15-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
The execution time of some operations is very performance critical, such
as cache invalidation and PRQ processing time. This adds some common code
to monitor the execution time range of those operations. The interfaces
include enabling/disabling, checking status, updating sampling data and
providing a common string format for users.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210520031531.712333-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20210610020115.1637656-14-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
This adds a new trace event to track the page fault request report.
This event will provide almost all information defined in a page
request descriptor.
A sample output:
| prq_report: dmar0/0000:00:0a.0 seq# 1: rid=0x50 addr=0x559ef6f97 r---- pasid=0x2 index=0x1
| prq_report: dmar0/0000:00:0a.0 seq# 2: rid=0x50 addr=0x559ef6f9c rw--l pasid=0x2 index=0x1
| prq_report: dmar0/0000:00:0a.0 seq# 3: rid=0x50 addr=0x559ef6f98 r---- pasid=0x2 index=0x1
| prq_report: dmar0/0000:00:0a.0 seq# 4: rid=0x50 addr=0x559ef6f9d rw--l pasid=0x2 index=0x1
| prq_report: dmar0/0000:00:0a.0 seq# 5: rid=0x50 addr=0x559ef6f99 r---- pasid=0x2 index=0x1
| prq_report: dmar0/0000:00:0a.0 seq# 6: rid=0x50 addr=0x559ef6f9e rw--l pasid=0x2 index=0x1
| prq_report: dmar0/0000:00:0a.0 seq# 7: rid=0x50 addr=0x559ef6f9a r---- pasid=0x2 index=0x1
| prq_report: dmar0/0000:00:0a.0 seq# 8: rid=0x50 addr=0x559ef6f9f rw--l pasid=0x2 index=0x1
This will be helpful for I/O page fault related debugging.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210520031531.712333-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20210610020115.1637656-13-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Let the IO page fault requests get handled through the io-pgfault
framework.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210520031531.712333-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20210610020115.1637656-12-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
This allocates and registers the iopf queue infrastructure for devices
which want to support IO page fault for SVA.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210520031531.712333-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20210610020115.1637656-11-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Refactor prq_event_thread() by moving handling single prq event out of
the main loop.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210520031531.712333-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20210610020115.1637656-10-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
It's common to iterate the svm device list and find a matched device. Add
common helpers to do this and consolidate the code.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210520031531.712333-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20210610020115.1637656-9-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Align the pasid alloc/free code with the generic helpers defined in the
iommu core. This also refactored the SVA binding code to improve the
readability.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210520031531.712333-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20210610020115.1637656-8-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
We are about to use iommu_sva_alloc/free_pasid() helpers in iommu core.
That means the pasid life cycle will be managed by iommu core. Use a
local array to save the per pasid private data instead of attaching it
the real pasid.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210520031531.712333-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20210610020115.1637656-7-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Current VT-d implementation supports nested translation only if all
underlying IOMMUs support the nested capability. This is unnecessary
as the upper layer is allowed to create different containers and set
them with different type of iommu backend. The IOMMU driver needs to
guarantee that devices attached to a nested mode iommu_domain should
support nested capabilility.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210517065701.5078-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20210610020115.1637656-6-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
The Intel VT-d implementation supports device TLB management. Select
PCI_ATS explicitly so that the pci_ats helpers are always available.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210512065313.3441309-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20210610020115.1637656-5-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
The Intel IOMMU driver reports the DMA fault reason in a decimal number
while the VT-d specification uses a hexadecimal one. It's inconvenient
that users need to covert them everytime before consulting the spec.
Let's use hexadecimal number for a DMA fault reason.
The fault message uses 0xffffffff as PASID for DMA requests w/o PASID.
This is confusing. Tweak this by adding "NO_PASID" explicitly.
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210517065425.4953-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20210610020115.1637656-4-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
The opening comment mark '/**' is used for highlighting the beginning of
kernel-doc comments.
The header for drivers/iommu/intel/pasid.c follows this syntax, but
the content inside does not comply with kernel-doc.
This line was probably not meant for kernel-doc parsing, but is parsed
due to the presence of kernel-doc like comment syntax(i.e, '/**'), which
causes unexpected warnings from kernel-doc:
warning: Function parameter or member 'fmt' not described in 'pr_fmt'
Provide a simple fix by replacing this occurrence with general comment
format, i.e. '/*', to prevent kernel-doc from parsing it.
Signed-off-by: Aditya Srivastava <yashsri421@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210523143245.19040-1-yashsri421@gmail.com
Link: https://lore.kernel.org/r/20210610020115.1637656-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
The variable agaw is initialized with a value that is never read and it
is being updated later with a new value as a counter in a for-loop. The
initialization is redundant and can be removed.
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210416171826.64091-1-colin.king@canonical.com
Link: https://lore.kernel.org/r/20210610020115.1637656-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
iommu_device_sysfs_add() is called before, so is has to be cleaned on subsequent
errors.
Fixes: 39ab9555c2411 ("iommu: Add sysfs bindings for struct iommu_device")
Cc: stable@vger.kernel.org # 4.11.x
Signed-off-by: Rolf Eike Beer <eb@emlix.com>
Acked-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/17411490.HIIP88n32C@mobilepool36.emlix.com
Link: https://lore.kernel.org/r/20210525070802.361755-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
When first-level page tables are used for IOVA translation, we use user
privilege by setting U/S bit in the page table entry. This is to make it
consistent with the second level translation, where the U/S enforcement
is not available. Clear the SRE (Supervisor Request Enable) field in the
pasid table entry of RID2PASID so that requests requesting the supervisor
privilege are blocked and treated as DMA remapping faults.
Fixes: b802d070a52a1 ("iommu/vt-d: Use iova over first level")
Suggested-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210512064426.3440915-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20210519015027.108468-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
In current kernels small allocations never fail, but checking for
allocation failure is the correct thing to do.
Fixes: 18abda7a2d55 ("iommu/vt-d: Fix general protection fault in aux_detach_device()")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/YJuobKuSn81dOPLd@mwanda
Link: https://lore.kernel.org/r/20210519015027.108468-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull pci updates from Bjorn Helgaas:
"Enumeration:
- Release OF node when pci_scan_device() fails (Dmitry Baryshkov)
- Add pci_disable_parity() (Bjorn Helgaas)
- Disable Mellanox Tavor parity reporting (Heiner Kallweit)
- Disable N2100 r8169 parity reporting (Heiner Kallweit)
- Fix RCiEP device to RCEC association (Qiuxu Zhuo)
- Convert sysfs "config", "rom", "reset", "label", "index",
"acpi_index" to static attributes to help fix races in device
enumeration (Krzysztof Wilczyński)
- Convert sysfs "vpd" to static attribute (Heiner Kallweit, Krzysztof
Wilczyński)
- Use sysfs_emit() in "show" functions (Krzysztof Wilczyński)
- Remove unused alloc_pci_root_info() return value (Krzysztof
Wilczyński)
PCI device hotplug:
- Fix acpiphp reference count leak (Feilong Lin)
Power management:
- Fix acpi_pci_set_power_state() debug message (Rafael J. Wysocki)
- Fix runtime PM imbalance (Dinghao Liu)
Virtualization:
- Increase delay after FLR to work around Intel DC P4510 NVMe erratum
(Raphael Norwitz)
MSI:
- Convert rcar, tegra, xilinx to MSI domains (Marc Zyngier)
- For rcar, xilinx, use controller address as MSI doorbell (Marc
Zyngier)
- Remove unused hv msi_controller struct (Marc Zyngier)
- Remove unused PCI core msi_controller support (Marc Zyngier)
- Remove struct msi_controller altogether (Marc Zyngier)
- Remove unused default_teardown_msi_irqs() (Marc Zyngier)
- Let host bridges declare their reliance on MSI domains (Marc
Zyngier)
- Make pci_host_common_probe() declare its reliance on MSI domains
(Marc Zyngier)
- Advertise mediatek lack of built-in MSI handling (Thomas Gleixner)
- Document ways of ending up with NO_MSI (Marc Zyngier)
- Refactor HT advertising of NO_MSI flag (Marc Zyngier)
VPD:
- Remove obsolete Broadcom NIC VPD length-limiting quirk (Heiner
Kallweit)
- Remove sysfs VPD size checking dead code (Heiner Kallweit)
- Convert VPF sysfs file to static attribute (Heiner Kallweit)
- Remove unnecessary pci_set_vpd_size() (Heiner Kallweit)
- Tone down "missing VPD" message (Heiner Kallweit)
Endpoint framework:
- Fix NULL pointer dereference when epc_features not implemented
(Shradha Todi)
- Add missing destroy_workqueue() in endpoint test (Yang Yingliang)
Amazon Annapurna Labs PCIe controller driver:
- Fix compile testing without CONFIG_PCI_ECAM (Arnd Bergmann)
- Fix "no symbols" warnings when compile testing with
CONFIG_TRIM_UNUSED_KSYMS (Arnd Bergmann)
APM X-Gene PCIe controller driver:
- Fix cfg resource mapping regression (Dejin Zheng)
Broadcom iProc PCIe controller driver:
- Return zero for success of iproc_msi_irq_domain_alloc() (Pali
Rohár)
Broadcom STB PCIe controller driver:
- Add reset_control_rearm() stub for !CONFIG_RESET_CONTROLLER (Jim
Quinlan)
- Fix use of BCM7216 reset controller (Jim Quinlan)
- Use reset/rearm for Broadcom STB pulse reset instead of
deassert/assert (Jim Quinlan)
- Fix brcm_pcie_probe() error return for unsupported revision (Wei
Yongjun)
Cavium ThunderX PCIe controller driver:
- Fix compile testing (Arnd Bergmann)
- Fix "no symbols" warnings when compile testing with
CONFIG_TRIM_UNUSED_KSYMS (Arnd Bergmann)
Freescale Layerscape PCIe controller driver:
- Fix ls_pcie_ep_probe() syntax error (comma for semicolon)
(Krzysztof Wilczyński)
- Remove layerscape-gen4 dependencies on OF and ARM64, add dependency
on ARCH_LAYERSCAPE (Geert Uytterhoeven)
HiSilicon HIP PCIe controller driver:
- Remove obsolete HiSilicon PCIe DT description (Dongdong Liu)
Intel Gateway PCIe controller driver:
- Remove unused pcie_app_rd() (Jiapeng Chong)
Intel VMD host bridge driver:
- Program IRTE with Requester ID of VMD endpoint, not child device
(Jon Derrick)
- Disable VMD MSI-X remapping when possible so children can use more
MSI-X vectors (Jon Derrick)
MediaTek PCIe controller driver:
- Configure FC and FTS for functions other than 0 (Ryder Lee)
- Add YAML schema for MediaTek (Jianjun Wang)
- Export pci_pio_to_address() for module use (Jianjun Wang)
- Add MediaTek MT8192 PCIe controller driver (Jianjun Wang)
- Add MediaTek MT8192 INTx support (Jianjun Wang)
- Add MediaTek MT8192 MSI support (Jianjun Wang)
- Add MediaTek MT8192 system power management support (Jianjun Wang)
- Add missing MODULE_DEVICE_TABLE (Qiheng Lin)
Microchip PolarFlare PCIe controller driver:
- Make several symbols static (Wei Yongjun)
NVIDIA Tegra PCIe controller driver:
- Add MCFG quirks for Tegra194 ECAM errata (Vidya Sagar)
- Make several symbols const (Rikard Falkeborn)
- Fix Kconfig host/endpoint typo (Wesley Sheng)
SiFive FU740 PCIe controller driver:
- Add pcie_aux clock to prci driver (Greentime Hu)
- Use reset-simple in prci driver for PCIe (Greentime Hu)
- Add SiFive FU740 PCIe host controller driver and DT binding (Paul
Walmsley, Greentime Hu)
Synopsys DesignWare PCIe controller driver:
- Move MSI Receiver init to dw_pcie_host_init() so it is
re-initialized along with the RC in resume (Jisheng Zhang)
- Move iATU detection earlier to fix regression (Hou Zhiqiang)
TI J721E PCIe driver:
- Add DT binding and TI j721e support for refclk to PCIe connector
(Kishon Vijay Abraham I)
- Add host mode and endpoint mode DT bindings for TI AM64 SoC (Kishon
Vijay Abraham I)
TI Keystone PCIe controller driver:
- Use generic config accessors for TI AM65x (K3) to fix regression
(Kishon Vijay Abraham I)
Xilinx NWL PCIe controller driver:
- Add support for coherent PCIe DMA traffic using CCI (Bharat Kumar
Gogada)
- Add optional "dma-coherent" DT property (Bharat Kumar Gogada)
Miscellaneous:
- Fix kernel-doc warnings (Krzysztof Wilczyński)
- Remove unused MicroGate SyncLink device IDs (Jiri Slaby)
- Remove redundant dev_err() for devm_ioremap_resource() failure
(Chen Hui)
- Remove redundant initialization (Colin Ian King)
- Drop redundant dev_err() for platform_get_irq() errors (Krzysztof
Wilczyński)"
* tag 'pci-v5.13-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (98 commits)
riscv: dts: Add PCIe support for the SiFive FU740-C000 SoC
PCI: fu740: Add SiFive FU740 PCIe host controller driver
dt-bindings: PCI: Add SiFive FU740 PCIe host controller
MAINTAINERS: Add maintainers for SiFive FU740 PCIe driver
clk: sifive: Use reset-simple in prci driver for PCIe driver
clk: sifive: Add pcie_aux clock in prci driver for PCIe driver
PCI: brcmstb: Use reset/rearm instead of deassert/assert
ata: ahci_brcm: Fix use of BCM7216 reset controller
reset: add missing empty function reset_control_rearm()
PCI: Allow VPD access for QLogic ISP2722
PCI/VPD: Add helper pci_get_func0_dev()
PCI/VPD: Remove pci_vpd_find_tag() SRDT handling
PCI/VPD: Remove pci_vpd_find_tag() 'offset' argument
PCI/VPD: Change pci_vpd_init() return type to void
PCI/VPD: Make missing VPD message less alarming
PCI/VPD: Remove pci_set_vpd_size()
x86/PCI: Remove unused alloc_pci_root_info() return value
MAINTAINERS: Add Jianjun Wang as MediaTek PCI co-maintainer
PCI: mediatek-gen3: Add system PM support
PCI: mediatek-gen3: Add MSI support
...
|
|
Rather than have separate opaque setter functions that are easy to
overlook and lead to repetitive boilerplate in drivers, let's pass the
relevant initialisation parameters directly to iommu_device_register().
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/ab001b87c533b6f4db71eb90db6f888953986c36.1617285386.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
'unisoc', 'x86/vt-d', 'x86/amd' and 'core' into next
|
|
The translation caches may preserve obsolete data when the
mapping size is changed, suppose the following sequence which
can reveal the problem with high probability.
1.mmap(4GB,MAP_HUGETLB)
2.
while (1) {
(a) DMA MAP 0,0xa0000
(b) DMA UNMAP 0,0xa0000
(c) DMA MAP 0,0xc0000000
* DMA read IOVA 0 may failure here (Not present)
* if the problem occurs.
(d) DMA UNMAP 0,0xc0000000
}
The page table(only focus on IOVA 0) after (a) is:
PML4: 0x19db5c1003 entry:0xffff899bdcd2f000
PDPE: 0x1a1cacb003 entry:0xffff89b35b5c1000
PDE: 0x1a30a72003 entry:0xffff89b39cacb000
PTE: 0x21d200803 entry:0xffff89b3b0a72000
The page table after (b) is:
PML4: 0x19db5c1003 entry:0xffff899bdcd2f000
PDPE: 0x1a1cacb003 entry:0xffff89b35b5c1000
PDE: 0x1a30a72003 entry:0xffff89b39cacb000
PTE: 0x0 entry:0xffff89b3b0a72000
The page table after (c) is:
PML4: 0x19db5c1003 entry:0xffff899bdcd2f000
PDPE: 0x1a1cacb003 entry:0xffff89b35b5c1000
PDE: 0x21d200883 entry:0xffff89b39cacb000 (*)
Because the PDE entry after (b) is present, it won't be
flushed even if the iommu driver flush cache when unmap,
so the obsolete data may be preserved in cache, which
would cause the wrong translation at end.
However, we can see the PDE entry is finally switch to
2M-superpage mapping, but it does not transform
to 0x21d200883 directly:
1. PDE: 0x1a30a72003
2. __domain_mapping
dma_pte_free_pagetable
Set the PDE entry to ZERO
Set the PDE entry to 0x21d200883
So we must flush the cache after the entry switch to ZERO
to avoid the obsolete info be preserved.
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Gonglei (Arei) <arei.gonglei@huawei.com>
Fixes: 6491d4d02893 ("intel-iommu: Free old page tables before creating superpage")
Cc: <stable@vger.kernel.org> # v3.0+
Link: https://lore.kernel.org/linux-iommu/670baaf8-4ff8-4e84-4be3-030b95ab5a5e@huawei.com/
Suggested-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
Acked-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210415004628.1779-1-longpeng2@huawei.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
If 'intel_cap_audit()' fails, we should return directly, as already done in
the surrounding error handling path.
Fixes: ad3d19029979 ("iommu/vt-d: Audit IOMMU Capabilities and add helper functions")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/98d531caabe66012b4fffc7813fd4b9470afd517.1618124777.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Commit f68c7f539b6e9 ("iommu/vt-d: Enable write protect for supervisor
SVM") added pasid_enable_wpe() which hit below compile error with !X86.
../drivers/iommu/intel/pasid.c: In function 'pasid_enable_wpe':
../drivers/iommu/intel/pasid.c:554:22: error: implicit declaration of function 'read_cr0' [-Werror=implicit-function-declaration]
554 | unsigned long cr0 = read_cr0();
| ^~~~~~~~
In file included from ../include/linux/build_bug.h:5,
from ../include/linux/bits.h:22,
from ../include/linux/bitops.h:6,
from ../drivers/iommu/intel/pasid.c:12:
../drivers/iommu/intel/pasid.c:557:23: error: 'X86_CR0_WP' undeclared (first use in this function)
557 | if (unlikely(!(cr0 & X86_CR0_WP))) {
| ^~~~~~~~~~
../include/linux/compiler.h:78:42: note: in definition of macro 'unlikely'
78 | # define unlikely(x) __builtin_expect(!!(x), 0)
| ^
../drivers/iommu/intel/pasid.c:557:23: note: each undeclared identifier is reported only once for each function it appears in
557 | if (unlikely(!(cr0 & X86_CR0_WP))) {
| ^~~~~~~~~~
../include/linux/compiler.h:78:42: note: in definition of macro 'unlikely'
78 | # define unlikely(x) __builtin_expect(!!(x), 0)
|
Add the missing dependency.
Cc: Sanjay Kumar <sanjay.k.kumar@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Fixes: f68c7f539b6e9 ("iommu/vt-d: Enable write protect for supervisor SVM")
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Link: https://lore.kernel.org/r/20210411062312.3057579-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
When a present pasid entry is disassembled, all kinds of pasid related
caches need to be flushed. But when a pasid entry is not being used
(PRESENT bit not set), we don't need to do this. Check the PRESENT bit
in intel_pasid_tear_down_entry() and avoid flushing caches if it's not
set.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210320025415.641201-6-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
When the Intel IOMMU is operating in the scalable mode, some information
from the root and context table may be used to tag entries in the PASID
cache. Software should invalidate the PASID-cache when changing root or
context table entries.
Suggested-by: Ashok Raj <ashok.raj@intel.com>
Fixes: 7373a8cc38197 ("iommu/vt-d: Setup context and enable RID2PASID support")
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210320025415.641201-4-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
When the first level page table is used for IOVA translation, it only
supports Read-Only and Read-Write permissions. The Write-Only permission
is not supported as the PRESENT bit (implying Read permission) should
always set. When using second level, we still give separate permissions
that allows WriteOnly which seems inconsistent and awkward. We want to
have consistent behavior. After moving to 1st level, we don't want things
to work sometimes, and break if we use 2nd level for the same mappings.
Hence remove this configuration.
Suggested-by: Ashok Raj <ashok.raj@intel.com>
Fixes: b802d070a52a1 ("iommu/vt-d: Use iova over first level")
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210320025415.641201-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
The Address field of the Page Request Descriptor only keeps bit [63:12]
of the offending address. Convert it to a full address before reporting
it to device drivers.
Fixes: eb8d93ea3c1d3 ("iommu/vt-d: Report page request faults for guest SVA")
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210320025415.641201-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Instead make the global iommu_dma_strict paramete in iommu.c canonical by
exporting helpers to get and set it and use those directly in the drivers.
This make sure that the iommu.strict parameter also works for the AMD and
Intel IOMMU drivers on x86. As those default to lazy flushing a new
IOMMU_CMD_LINE_STRICT is used to turn the value into a tristate to
represent the default if not overriden by an explicit parameter.
[ported on top of the other iommu_attr changes and added a few small
missing bits]
Signed-off-by: Robin Murphy <robin.murphy@arm.com>.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210401155256.298656-19-hch@lst.de
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Use an explicit enable_nesting method instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Li Yang <leoyang.li@nxp.com>
Link: https://lore.kernel.org/r/20210401155256.298656-17-hch@lst.de
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Allow drivers to query and enable IOMMU_DEV_FEAT_IOPF, which amounts to
checking whether PRI is enabled.
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Link: https://lore.kernel.org/r/20210401154718.307519-5-jean-philippe@linaro.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
The Intel VT-d driver checks wrong register to report snoop capablility
when using first level page table for GPA to HPA translation. This might
lead the IOMMU driver to say that it supports snooping control, but in
reality, it does not. Fix this by always setting PASID-table-entry.PGSNP
whenever a pasid entry is setting up for GPA to HPA translation so that
the IOMMU driver could report snoop capability as long as it runs in the
scalable mode.
Fixes: b802d070a52a1 ("iommu/vt-d: Use iova over first level")
Suggested-by: Rajesh Sankaran <rajesh.sankaran@intel.com>
Suggested-by: Kevin Tian <kevin.tian@intel.com>
Suggested-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210330021145.13824-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Now that the core code handles flushing per-IOVA domain CPU rcaches,
remove the handling here.
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Link: https://lore.kernel.org/r/1616675401-151997-3-git-send-email-john.garry@huawei.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Make some functions static as they are only used inside pasid.c.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210323010600.678627-6-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Some functions have been removed. Remove the remaining function
delarations.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210323010600.678627-5-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|