<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/iommu/arm, branch v6.18.21</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.18.21</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.18.21'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2026-03-04T12:21:14+00:00</updated>
<entry>
<title>iommu/arm-smmu-v3: Do not set disable_ats unless vSTE is Translate</title>
<updated>2026-03-04T12:21:14+00:00</updated>
<author>
<name>Nicolin Chen</name>
<email>nicolinc@nvidia.com</email>
</author>
<published>2026-01-15T01:12:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a50006e7ba0e31b49a00a3d4fef89161298f866e'/>
<id>urn:sha1:a50006e7ba0e31b49a00a3d4fef89161298f866e</id>
<content type='text'>
[ Upstream commit a45dd34663025c75652b27e384e91c9c05ba1d80 ]

A vSTE may have three configuration types: Abort, Bypass, and Translate.

An Abort vSTE wouldn't enable ATS, but the other two might.

It makes sense for a Transalte vSTE to rely on the guest vSTE.EATS field.

For a Bypass vSTE, it would end up with an S2-only physical STE, similar
to an attachment to a regular S2 domain. However, the nested case always
disables ATS following the Bypass vSTE, while the regular S2 case always
enables ATS so long as arm_smmu_ats_supported(master) == true.

Note that ATS is needed for certain VM centric workloads and historically
non-vSMMU cases have relied on this automatic enablement. So, having the
nested case behave differently causes problems.

To fix that, add a condition to disable_ats, so that it might enable ATS
for a Bypass vSTE, aligning with the regular S2 case.

Fixes: f27298a82ba0 ("iommu/arm-smmu-v3: Allow ATS for IOMMU_DOMAIN_NESTED")
Cc: stable@vger.kernel.org
Suggested-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Signed-off-by: Nicolin Chen &lt;nicolinc@nvidia.com&gt;
Reviewed-by: Pranjal Shrivastava &lt;praan@google.com&gt;
Reviewed-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Signed-off-by: Will Deacon &lt;will@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>iommu/arm-smmu-v3: Mark EATS_TRANS safe when computing the update sequence</title>
<updated>2026-03-04T12:21:14+00:00</updated>
<author>
<name>Jason Gunthorpe</name>
<email>jgg@nvidia.com</email>
</author>
<published>2026-01-15T18:23:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=779e0c250a6b8a0a2c1e2f39bedfb159100d113d'/>
<id>urn:sha1:779e0c250a6b8a0a2c1e2f39bedfb159100d113d</id>
<content type='text'>
[ Upstream commit 7cad800485956a263318930613f8f4a084af8c70 ]

If VM wants to toggle EATS_TRANS off at the same time as changing the CFG,
hypervisor will see EATS change to 0 and insert a V=0 breaking update into
the STE even though the VM did not ask for that.

In bare metal, EATS_TRANS is ignored by CFG=ABORT/BYPASS, which is why this
does not cause a problem until we have the nested case where CFG is always
a variation of S2 trans that does use EATS_TRANS.

Relax the rules for EATS_TRANS sequencing, we don't need it to be exact as
the enclosing code will always disable ATS at the PCI device when changing
EATS_TRANS. This ensures there are no ATS transactions that can race with
an EATS_TRANS change so we don't need to carefully sequence these bits.

Fixes: 1e8be08d1c91 ("iommu/arm-smmu-v3: Support IOMMU_DOMAIN_NESTED")
Cc: stable@vger.kernel.org
Signed-off-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Reviewed-by: Shuai Xue &lt;xueshuai@linux.alibaba.com&gt;
Signed-off-by: Nicolin Chen &lt;nicolinc@nvidia.com&gt;
Signed-off-by: Will Deacon &lt;will@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>iommu/arm-smmu-v3: Mark STE MEV safe when computing the update sequence</title>
<updated>2026-03-04T12:21:14+00:00</updated>
<author>
<name>Jason Gunthorpe</name>
<email>jgg@nvidia.com</email>
</author>
<published>2026-01-15T18:23:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3f73325f4548beee0a1f1d378ffe74ca1748570d'/>
<id>urn:sha1:3f73325f4548beee0a1f1d378ffe74ca1748570d</id>
<content type='text'>
[ Upstream commit f3c1d372dbb8e5a86923f20db66deabef42bfc9d ]

Nested CD tables set the MEV bit to try to reduce multi-fault spamming on
the hypervisor. Since MEV is in STE word 1 this causes a breaking update
sequence that is not required and impacts real workloads.

For the purposes of STE updates the value of MEV doesn't matter, if it is
set/cleared early or late it just results in a change to the fault reports
that must be supported by the kernel anyhow. The spec says:

 Note: Software must expect, and be able to deal with, coalesced fault
 records even when MEV == 0.

So mark STE MEV safe when computing the update sequence, to avoid creating
a breaking update.

Fixes: da0c56520e88 ("iommu/arm-smmu-v3: Set MEV bit in nested STE for DoS mitigations")
Cc: stable@vger.kernel.org
Signed-off-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Reviewed-by: Shuai Xue &lt;xueshuai@linux.alibaba.com&gt;
Reviewed-by: Mostafa Saleh &lt;smostafa@google.com&gt;
Reviewed-by: Pranjal Shrivastava &lt;praan@google.com&gt;
Signed-off-by: Nicolin Chen &lt;nicolinc@nvidia.com&gt;
Signed-off-by: Will Deacon &lt;will@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>iommu/arm-smmu-v3: Add update_safe bits to fix STE update sequence</title>
<updated>2026-03-04T12:21:14+00:00</updated>
<author>
<name>Jason Gunthorpe</name>
<email>jgg@nvidia.com</email>
</author>
<published>2026-01-15T18:23:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=efd59553e477fe139f9981ab36a899869dae9294'/>
<id>urn:sha1:efd59553e477fe139f9981ab36a899869dae9294</id>
<content type='text'>
[ Upstream commit 2781f2a930abb5d27f80b8afbabfa19684833b65 ]

C_BAD_STE was observed when updating nested STE from an S1-bypass mode to
an S1DSS-bypass mode. As both modes enabled S2, the used bit is slightly
different than the normal S1-bypass and S1DSS-bypass modes. As a result,
fields like MEV and EATS in S2's used list marked the word1 as a critical
word that requested a STE.V=0. This breaks a hitless update.

However, both MEV and EATS aren't critical in terms of STE update. One
controls the merge of the events and the other controls the ATS that is
managed by the driver at the same time via pci_enable_ats().

Add an arm_smmu_get_ste_update_safe() to allow STE update algorithm to
relax those fields, avoiding the STE update breakages.

After this change, entry_set has no caller checking its return value, so
change it to void.

Note that this change is required by both MEV and EATS fields, which were
introduced in different kernel versions. So add get_update_safe() first.
MEV and EATS will be added to arm_smmu_get_ste_update_safe() separately.

Fixes: 1e8be08d1c91 ("iommu/arm-smmu-v3: Support IOMMU_DOMAIN_NESTED")
Cc: stable@vger.kernel.org
Signed-off-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Reviewed-by: Shuai Xue &lt;xueshuai@linux.alibaba.com&gt;
Reviewed-by: Mostafa Saleh &lt;smostafa@google.com&gt;
Reviewed-by: Pranjal Shrivastava &lt;praan@google.com&gt;
Signed-off-by: Nicolin Chen &lt;nicolinc@nvidia.com&gt;
Signed-off-by: Will Deacon &lt;will@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>iommu/arm-smmu-v3: Improve CMDQ lock fairness and efficiency</title>
<updated>2026-03-04T12:20:12+00:00</updated>
<author>
<name>Alexander Grest</name>
<email>Alexander.Grest@microsoft.com</email>
</author>
<published>2025-12-08T21:28:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c7e5d819388e525196a340a0634e45d715d0272d'/>
<id>urn:sha1:c7e5d819388e525196a340a0634e45d715d0272d</id>
<content type='text'>
[ Upstream commit df180b1a4cc51011c5f8c52c7ec02ad2e42962de ]

The SMMU CMDQ lock is highly contentious when there are multiple CPUs
issuing commands and the queue is nearly full.

The lock has the following states:
 - 0:		Unlocked
 - &gt;0:		Shared lock held with count
 - INT_MIN+N:	Exclusive lock held, where N is the # of shared waiters
 - INT_MIN:	Exclusive lock held, no shared waiters

When multiple CPUs are polling for space in the queue, they attempt to
grab the exclusive lock to update the cons pointer from the hardware. If
they fail to get the lock, they will spin until either the cons pointer
is updated by another CPU.

The current code allows the possibility of shared lock starvation
if there is a constant stream of CPUs trying to grab the exclusive lock.
This leads to severe latency issues and soft lockups.

Consider the following scenario where CPU1's attempt to acquire the
shared lock is starved by CPU2 and CPU0 contending for the exclusive
lock.

CPU0 (exclusive)  | CPU1 (shared)     | CPU2 (exclusive)    | `cmdq-&gt;lock`
--------------------------------------------------------------------------
trylock() //takes |                   |                     | 0
                  | shared_lock()     |                     | INT_MIN
                  | fetch_inc()       |                     | INT_MIN
                  | no return         |                     | INT_MIN + 1
                  | spins // VAL &gt;= 0 |                     | INT_MIN + 1
unlock()          | spins...          |                     | INT_MIN + 1
set_release(0)    | spins...          |                     | 0 see[NOTE]
(done)            | (sees 0)          | trylock() // takes  | 0
                  | *exits loop*      | cmpxchg(0, INT_MIN) | 0
                  |                   | *cuts in*           | INT_MIN
                  | cmpxchg(0, 1)     |                     | INT_MIN
                  | fails // != 0     |                     | INT_MIN
                  | spins // VAL &gt;= 0 |                     | INT_MIN
                  | *starved*         |                     | INT_MIN

[NOTE] The current code resets the exclusive lock to 0 regardless of the
state of the lock. This causes two problems:
1. It opens the possibility of back-to-back exclusive locks and the
   downstream effect of starving shared lock.
2. The count of shared lock waiters are lost.

To mitigate this, we release the exclusive lock by only clearing the sign
bit while retaining the shared lock waiter count as a way to avoid
starving the shared lock waiters.

Also deleted cmpxchg loop while trying to acquire the shared lock as it
is not needed. The waiters can see the positive lock count and proceed
immediately after the exclusive lock is released.

Exclusive lock is not starved in that submitters will try exclusive lock
first when new spaces become available.

Reviewed-by: Mostafa Saleh &lt;smostafa@google.com&gt;
Reviewed-by: Nicolin Chen &lt;nicolinc@nvidia.com&gt;
Signed-off-by: Alexander Grest &lt;Alexander.Grest@microsoft.com&gt;
Signed-off-by: Jacob Pan &lt;jacob.pan@linux.microsoft.com&gt;
Signed-off-by: Will Deacon &lt;will@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>iommu/arm-smmu-qcom: do not register driver in probe()</title>
<updated>2026-02-19T15:31:36+00:00</updated>
<author>
<name>Danilo Krummrich</name>
<email>dakr@kernel.org</email>
</author>
<published>2026-01-21T14:12:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f611dafe0ffd33bc3fe522506af38903f34ffcc1'/>
<id>urn:sha1:f611dafe0ffd33bc3fe522506af38903f34ffcc1</id>
<content type='text'>
commit ed1ac3c977dd6b119405fa36dd41f7151bd5b4de upstream.

Commit 0b4eeee2876f ("iommu/arm-smmu-qcom: Register the TBU driver in
qcom_smmu_impl_init") intended to also probe the TBU driver when
CONFIG_ARM_SMMU_QCOM_DEBUG is disabled, but also moved the corresponding
platform_driver_register() call into qcom_smmu_impl_init() which is
called from arm_smmu_device_probe().

However, it neither makes sense to register drivers from probe()
callbacks of other drivers, nor does the driver core allow registering
drivers with a device lock already being held.

The latter was revealed by commit dc23806a7c47 ("driver core: enforce
device_lock for driver_match_device()") leading to a deadlock condition
described in [1].

Additionally, it was noted by Robin that the current approach is
potentially racy with async probe [2].

Hence, fix this by registering the qcom_smmu_tbu_driver from
module_init(). Unfortunately, due to the vendoring of the driver, this
requires an indirection through arm-smmu-impl.c.

Reported-by: Mark Brown &lt;broonie@kernel.org&gt;
Closes: https://lore.kernel.org/lkml/7ae38e31-ef31-43ad-9106-7c76ea0e8596@sirena.org.uk/
Link: https://lore.kernel.org/lkml/DFU7CEPUSG9A.1KKGVW4HIPMSH@kernel.org/ [1]
Link: https://lore.kernel.org/lkml/0c0d3707-9ea5-44f9-88a1-a65c62e3df8d@arm.com/ [2]
Fixes: dc23806a7c47 ("driver core: enforce device_lock for driver_match_device()")
Fixes: 0b4eeee2876f ("iommu/arm-smmu-qcom: Register the TBU driver in qcom_smmu_impl_init")
Acked-by: Robin Murphy &lt;robin.murphy@arm.com&gt;
Tested-by: Bjorn Andersson &lt;andersson@kernel.org&gt;
Reviewed-by: Bjorn Andersson &lt;andersson@kernel.org&gt;
Acked-by: Konrad Dybcio &lt;konradybcio@kernel.org&gt;
Reviewed-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Tested-by: Ioana Ciornei &lt;ioana.ciornei@nxp.com&gt; #LX2160ARDB
Tested-by: Wang Jiayue &lt;akaieurus@gmail.com&gt;
Reviewed-by: Wang Jiayue &lt;akaieurus@gmail.com&gt;
Tested-by: Mark Brown &lt;broonie@kernel.org&gt;
Acked-by: Joerg Roedel &lt;joerg.roedel@amd.com&gt;
Link: https://patch.msgid.link/20260121141215.29658-1-dakr@kernel.org
Signed-off-by: Danilo Krummrich &lt;dakr@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>iommu/tegra241-cmdqv: Reset VCMDQ in tegra241_vcmdq_hw_init_user()</title>
<updated>2026-02-06T15:57:44+00:00</updated>
<author>
<name>Nicolin Chen</name>
<email>nicolinc@nvidia.com</email>
</author>
<published>2026-01-29T22:43:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=84df65fcfbff150ba16e6f697f0cbbdbc297ba24'/>
<id>urn:sha1:84df65fcfbff150ba16e6f697f0cbbdbc297ba24</id>
<content type='text'>
commit 80f1a2c2332fee0edccd006fe87fc8a6db94bab3 upstream.

The Enable bits in CMDQV/VINTF/VCMDQ_CONFIG registers do not actually reset
the HW registers. So, the driver explicitly clears all the registers when a
VINTF or VCMDQ is being initialized calling its hw_deinit() function.

However, a userspace VCMDQ is not properly reset, unlike an in-kernel VCMDQ
getting reset in tegra241_vcmdq_hw_init().

Meanwhile, tegra241_vintf_hw_init() calling tegra241_vintf_hw_deinit() will
not deinit any VCMDQ, since there is no userspace VCMDQ mapped to the VINTF
at that stage.

Then, this may result in dirty VCMDQ registers, which can fail the VM.

Like tegra241_vcmdq_hw_init(), reset a VCMDQ in tegra241_vcmdq_hw_init() to
fix this bug. This is required by a host kernel.

Fixes: 6717f26ab1e7 ("iommu/tegra241-cmdqv: Add user-space use support")
Cc: stable@vger.kernel.org
Reported-by: Bao Nguyen &lt;ncqb@google.com&gt;
Signed-off-by: Nicolin Chen &lt;nicolinc@nvidia.com&gt;
Signed-off-by: Joerg Roedel &lt;joerg.roedel@amd.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>iommu/qcom: fix device leak on of_xlate()</title>
<updated>2026-01-08T09:16:56+00:00</updated>
<author>
<name>Johan Hovold</name>
<email>johan@kernel.org</email>
</author>
<published>2025-10-20T04:53:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6b8390fcef610432686b464079955381dcce2c78'/>
<id>urn:sha1:6b8390fcef610432686b464079955381dcce2c78</id>
<content type='text'>
commit 6a3908ce56e6879920b44ef136252b2f0c954194 upstream.

Make sure to drop the reference taken to the iommu platform device when
looking up its driver data during of_xlate().

Note that commit e2eae09939a8 ("iommu/qcom: add missing put_device()
call in qcom_iommu_of_xlate()") fixed the leak in a couple of error
paths, but the reference is still leaking on success and late failures.

Fixes: 0ae349a0f33f ("iommu/qcom: Add qcom_iommu")
Cc: stable@vger.kernel.org	# 4.14: e2eae09939a8
Cc: Rob Clark &lt;robin.clark@oss.qualcomm.com&gt;
Cc: Yu Kuai &lt;yukuai3@huawei.com&gt;
Acked-by: Robin Murphy &lt;robin.murphy@arm.com&gt;
Signed-off-by: Johan Hovold &lt;johan@kernel.org&gt;
Signed-off-by: Joerg Roedel &lt;joerg.roedel@amd.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>iommu/arm-smmu-qcom: Enable use of all SMR groups when running bare-metal</title>
<updated>2025-12-18T13:03:18+00:00</updated>
<author>
<name>Stephan Gerhold</name>
<email>stephan.gerhold@linaro.org</email>
</author>
<published>2025-08-21T08:33:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=cefb67ea27f95715a1e33e0e9ca0488f6495e051'/>
<id>urn:sha1:cefb67ea27f95715a1e33e0e9ca0488f6495e051</id>
<content type='text'>
[ Upstream commit 5583a55e074b33ccd88ac0542fd7cd656a7e2c8c ]

Some platforms (e.g. SC8280XP and X1E) support more than 128 stream
matching groups. This is more than what is defined as maximum by the ARM
SMMU architecture specification. Commit 122611347326 ("iommu/arm-smmu-qcom:
Limit the SMR groups to 128") disabled use of the additional groups because
they don't exhibit the same behavior as the architecture supported ones.

It seems like this is just another quirk of the hypervisor: When running
bare-metal without the hypervisor, the additional groups appear to behave
just like all others. The boot firmware uses some of the additional groups,
so ignoring them in this situation leads to stream match conflicts whenever
we allocate a new SMR group for the same SID.

The workaround exists primarily because the bypass quirk detection fails
when using a S2CR register from the additional matching groups, so let's
perform the test with the last reliable S2CR (127) and then limit the
number of SMR groups only if we detect that we are running below the
hypervisor (because of the bypass quirk).

Fixes: 122611347326 ("iommu/arm-smmu-qcom: Limit the SMR groups to 128")
Signed-off-by: Stephan Gerhold &lt;stephan.gerhold@linaro.org&gt;
Signed-off-by: Will Deacon &lt;will@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>iommu/arm-smmu-v3: Fix error check in arm_smmu_alloc_cd_tables</title>
<updated>2025-12-18T13:03:16+00:00</updated>
<author>
<name>Ryan Huang</name>
<email>tzukui@google.com</email>
</author>
<published>2025-11-07T19:09:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3d12cc5d13f6dc98b128e5c666417b6e2df6406a'/>
<id>urn:sha1:3d12cc5d13f6dc98b128e5c666417b6e2df6406a</id>
<content type='text'>
[ Upstream commit 5941f0e0c1e0be03ebc15b461f64208f5250d3d9 ]

In arm_smmu_alloc_cd_tables(), the error check following the
dma_alloc_coherent() for cd_table-&gt;l2.l1tab incorrectly tests
cd_table-&gt;l2.l2ptrs.

This means an allocation failure for l1tab goes undetected, causing
the function to return 0 (success) erroneously.

Correct the check to test cd_table-&gt;l2.l1tab.

Fixes: e3b1be2e73db ("iommu/arm-smmu-v3: Reorganize struct arm_smmu_ctx_desc_cfg")
Signed-off-by: Daniel Mentz &lt;danielmentz@google.com&gt;
Signed-off-by: Ryan Huang &lt;tzukui@google.com&gt;
Reviewed-by: Nicolin Chen &lt;nicolinc@nvidia.com&gt;
Reviewed-by: Pranjal Shrivastava &lt;praan@google.com&gt;
Reviewed-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Signed-off-by: Will Deacon &lt;will@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
</feed>
