kernel/linux.git/drivers/iommu/arm, branch v6.18.21

iommu/arm-smmu-v3: Do not set disable_ats unless vSTE is Translate

2026-03-04T12:21:14+00:00

[ Upstream commit a45dd34663025c75652b27e384e91c9c05ba1d80 ] A vSTE may have three configuration types: Abort, Bypass, and Translate. An Abort vSTE wouldn't enable ATS, but the other two might. It makes sense for a Transalte vSTE to rely on the guest vSTE.EATS field. For a Bypass vSTE, it would end up with an S2-only physical STE, similar to an attachment to a regular S2 domain. However, the nested case always disables ATS following the Bypass vSTE, while the regular S2 case always enables ATS so long as arm_smmu_ats_supported(master) == true. Note that ATS is needed for certain VM centric workloads and historically non-vSMMU cases have relied on this automatic enablement. So, having the nested case behave differently causes problems. To fix that, add a condition to disable_ats, so that it might enable ATS for a Bypass vSTE, aligning with the regular S2 case. Fixes: f27298a82ba0 ("iommu/arm-smmu-v3: Allow ATS for IOMMU_DOMAIN_NESTED") Cc: stable@vger.kernel.org Suggested-by: Jason Gunthorpe Signed-off-by: Nicolin Chen Reviewed-by: Pranjal Shrivastava Reviewed-by: Jason Gunthorpe Signed-off-by: Will Deacon Signed-off-by: Sasha Levin

iommu/arm-smmu-v3: Mark EATS_TRANS safe when computing the update sequence

2026-03-04T12:21:14+00:00

[ Upstream commit 7cad800485956a263318930613f8f4a084af8c70 ] If VM wants to toggle EATS_TRANS off at the same time as changing the CFG, hypervisor will see EATS change to 0 and insert a V=0 breaking update into the STE even though the VM did not ask for that. In bare metal, EATS_TRANS is ignored by CFG=ABORT/BYPASS, which is why this does not cause a problem until we have the nested case where CFG is always a variation of S2 trans that does use EATS_TRANS. Relax the rules for EATS_TRANS sequencing, we don't need it to be exact as the enclosing code will always disable ATS at the PCI device when changing EATS_TRANS. This ensures there are no ATS transactions that can race with an EATS_TRANS change so we don't need to carefully sequence these bits. Fixes: 1e8be08d1c91 ("iommu/arm-smmu-v3: Support IOMMU_DOMAIN_NESTED") Cc: stable@vger.kernel.org Signed-off-by: Jason Gunthorpe Reviewed-by: Shuai Xue Signed-off-by: Nicolin Chen Signed-off-by: Will Deacon Signed-off-by: Sasha Levin

iommu/arm-smmu-v3: Mark STE MEV safe when computing the update sequence

2026-03-04T12:21:14+00:00

[ Upstream commit f3c1d372dbb8e5a86923f20db66deabef42bfc9d ] Nested CD tables set the MEV bit to try to reduce multi-fault spamming on the hypervisor. Since MEV is in STE word 1 this causes a breaking update sequence that is not required and impacts real workloads. For the purposes of STE updates the value of MEV doesn't matter, if it is set/cleared early or late it just results in a change to the fault reports that must be supported by the kernel anyhow. The spec says: Note: Software must expect, and be able to deal with, coalesced fault records even when MEV == 0. So mark STE MEV safe when computing the update sequence, to avoid creating a breaking update. Fixes: da0c56520e88 ("iommu/arm-smmu-v3: Set MEV bit in nested STE for DoS mitigations") Cc: stable@vger.kernel.org Signed-off-by: Jason Gunthorpe Reviewed-by: Shuai Xue Reviewed-by: Mostafa Saleh Reviewed-by: Pranjal Shrivastava Signed-off-by: Nicolin Chen Signed-off-by: Will Deacon Signed-off-by: Sasha Levin

iommu/arm-smmu-v3: Add update_safe bits to fix STE update sequence

2026-03-04T12:21:14+00:00

[ Upstream commit 2781f2a930abb5d27f80b8afbabfa19684833b65 ] C_BAD_STE was observed when updating nested STE from an S1-bypass mode to an S1DSS-bypass mode. As both modes enabled S2, the used bit is slightly different than the normal S1-bypass and S1DSS-bypass modes. As a result, fields like MEV and EATS in S2's used list marked the word1 as a critical word that requested a STE.V=0. This breaks a hitless update. However, both MEV and EATS aren't critical in terms of STE update. One controls the merge of the events and the other controls the ATS that is managed by the driver at the same time via pci_enable_ats(). Add an arm_smmu_get_ste_update_safe() to allow STE update algorithm to relax those fields, avoiding the STE update breakages. After this change, entry_set has no caller checking its return value, so change it to void. Note that this change is required by both MEV and EATS fields, which were introduced in different kernel versions. So add get_update_safe() first. MEV and EATS will be added to arm_smmu_get_ste_update_safe() separately. Fixes: 1e8be08d1c91 ("iommu/arm-smmu-v3: Support IOMMU_DOMAIN_NESTED") Cc: stable@vger.kernel.org Signed-off-by: Jason Gunthorpe Reviewed-by: Shuai Xue Reviewed-by: Mostafa Saleh Reviewed-by: Pranjal Shrivastava Signed-off-by: Nicolin Chen Signed-off-by: Will Deacon Signed-off-by: Sasha Levin

iommu/arm-smmu-v3: Improve CMDQ lock fairness and efficiency

2026-03-04T12:20:12+00:00

[ Upstream commit df180b1a4cc51011c5f8c52c7ec02ad2e42962de ] The SMMU CMDQ lock is highly contentious when there are multiple CPUs issuing commands and the queue is nearly full. The lock has the following states: - 0: Unlocked - >0: Shared lock held with count - INT_MIN+N: Exclusive lock held, where N is the # of shared waiters - INT_MIN: Exclusive lock held, no shared waiters When multiple CPUs are polling for space in the queue, they attempt to grab the exclusive lock to update the cons pointer from the hardware. If they fail to get the lock, they will spin until either the cons pointer is updated by another CPU. The current code allows the possibility of shared lock starvation if there is a constant stream of CPUs trying to grab the exclusive lock. This leads to severe latency issues and soft lockups. Consider the following scenario where CPU1's attempt to acquire the shared lock is starved by CPU2 and CPU0 contending for the exclusive lock. CPU0 (exclusive) | CPU1 (shared) | CPU2 (exclusive) | `cmdq->lock` -------------------------------------------------------------------------- trylock() //takes | | | 0 | shared_lock() | | INT_MIN | fetch_inc() | | INT_MIN | no return | | INT_MIN + 1 | spins // VAL >= 0 | | INT_MIN + 1 unlock() | spins... | | INT_MIN + 1 set_release(0) | spins... | | 0 see[NOTE] (done) | (sees 0) | trylock() // takes | 0 | *exits loop* | cmpxchg(0, INT_MIN) | 0 | | *cuts in* | INT_MIN | cmpxchg(0, 1) | | INT_MIN | fails // != 0 | | INT_MIN | spins // VAL >= 0 | | INT_MIN | *starved* | | INT_MIN [NOTE] The current code resets the exclusive lock to 0 regardless of the state of the lock. This causes two problems: 1. It opens the possibility of back-to-back exclusive locks and the downstream effect of starving shared lock. 2. The count of shared lock waiters are lost. To mitigate this, we release the exclusive lock by only clearing the sign bit while retaining the shared lock waiter count as a way to avoid starving the shared lock waiters. Also deleted cmpxchg loop while trying to acquire the shared lock as it is not needed. The waiters can see the positive lock count and proceed immediately after the exclusive lock is released. Exclusive lock is not starved in that submitters will try exclusive lock first when new spaces become available. Reviewed-by: Mostafa Saleh Reviewed-by: Nicolin Chen Signed-off-by: Alexander Grest Signed-off-by: Jacob Pan Signed-off-by: Will Deacon Signed-off-by: Sasha Levin

iommu/arm-smmu-qcom: do not register driver in probe()

2026-02-19T15:31:36+00:00

commit ed1ac3c977dd6b119405fa36dd41f7151bd5b4de upstream. Commit 0b4eeee2876f ("iommu/arm-smmu-qcom: Register the TBU driver in qcom_smmu_impl_init") intended to also probe the TBU driver when CONFIG_ARM_SMMU_QCOM_DEBUG is disabled, but also moved the corresponding platform_driver_register() call into qcom_smmu_impl_init() which is called from arm_smmu_device_probe(). However, it neither makes sense to register drivers from probe() callbacks of other drivers, nor does the driver core allow registering drivers with a device lock already being held. The latter was revealed by commit dc23806a7c47 ("driver core: enforce device_lock for driver_match_device()") leading to a deadlock condition described in [1]. Additionally, it was noted by Robin that the current approach is potentially racy with async probe [2]. Hence, fix this by registering the qcom_smmu_tbu_driver from module_init(). Unfortunately, due to the vendoring of the driver, this requires an indirection through arm-smmu-impl.c. Reported-by: Mark Brown Closes: https://lore.kernel.org/lkml/7ae38e31-ef31-43ad-9106-7c76ea0e8596@sirena.org.uk/ Link: https://lore.kernel.org/lkml/DFU7CEPUSG9A.1KKGVW4HIPMSH@kernel.org/ [1] Link: https://lore.kernel.org/lkml/0c0d3707-9ea5-44f9-88a1-a65c62e3df8d@arm.com/ [2] Fixes: dc23806a7c47 ("driver core: enforce device_lock for driver_match_device()") Fixes: 0b4eeee2876f ("iommu/arm-smmu-qcom: Register the TBU driver in qcom_smmu_impl_init") Acked-by: Robin Murphy Tested-by: Bjorn Andersson Reviewed-by: Bjorn Andersson Acked-by: Konrad Dybcio Reviewed-by: Greg Kroah-Hartman Tested-by: Ioana Ciornei #LX2160ARDB Tested-by: Wang Jiayue Reviewed-by: Wang Jiayue Tested-by: Mark Brown Acked-by: Joerg Roedel Link: https://patch.msgid.link/20260121141215.29658-1-dakr@kernel.org Signed-off-by: Danilo Krummrich Signed-off-by: Greg Kroah-Hartman

iommu/tegra241-cmdqv: Reset VCMDQ in tegra241_vcmdq_hw_init_user()

2026-02-06T15:57:44+00:00

commit 80f1a2c2332fee0edccd006fe87fc8a6db94bab3 upstream. The Enable bits in CMDQV/VINTF/VCMDQ_CONFIG registers do not actually reset the HW registers. So, the driver explicitly clears all the registers when a VINTF or VCMDQ is being initialized calling its hw_deinit() function. However, a userspace VCMDQ is not properly reset, unlike an in-kernel VCMDQ getting reset in tegra241_vcmdq_hw_init(). Meanwhile, tegra241_vintf_hw_init() calling tegra241_vintf_hw_deinit() will not deinit any VCMDQ, since there is no userspace VCMDQ mapped to the VINTF at that stage. Then, this may result in dirty VCMDQ registers, which can fail the VM. Like tegra241_vcmdq_hw_init(), reset a VCMDQ in tegra241_vcmdq_hw_init() to fix this bug. This is required by a host kernel. Fixes: 6717f26ab1e7 ("iommu/tegra241-cmdqv: Add user-space use support") Cc: stable@vger.kernel.org Reported-by: Bao Nguyen Signed-off-by: Nicolin Chen Signed-off-by: Joerg Roedel Signed-off-by: Greg Kroah-Hartman

iommu/qcom: fix device leak on of_xlate()

2026-01-08T09:16:56+00:00

commit 6a3908ce56e6879920b44ef136252b2f0c954194 upstream. Make sure to drop the reference taken to the iommu platform device when looking up its driver data during of_xlate(). Note that commit e2eae09939a8 ("iommu/qcom: add missing put_device() call in qcom_iommu_of_xlate()") fixed the leak in a couple of error paths, but the reference is still leaking on success and late failures. Fixes: 0ae349a0f33f ("iommu/qcom: Add qcom_iommu") Cc: stable@vger.kernel.org # 4.14: e2eae09939a8 Cc: Rob Clark Cc: Yu Kuai Acked-by: Robin Murphy Signed-off-by: Johan Hovold Signed-off-by: Joerg Roedel Signed-off-by: Greg Kroah-Hartman

iommu/arm-smmu-qcom: Enable use of all SMR groups when running bare-metal

2025-12-18T13:03:18+00:00

[ Upstream commit 5583a55e074b33ccd88ac0542fd7cd656a7e2c8c ] Some platforms (e.g. SC8280XP and X1E) support more than 128 stream matching groups. This is more than what is defined as maximum by the ARM SMMU architecture specification. Commit 122611347326 ("iommu/arm-smmu-qcom: Limit the SMR groups to 128") disabled use of the additional groups because they don't exhibit the same behavior as the architecture supported ones. It seems like this is just another quirk of the hypervisor: When running bare-metal without the hypervisor, the additional groups appear to behave just like all others. The boot firmware uses some of the additional groups, so ignoring them in this situation leads to stream match conflicts whenever we allocate a new SMR group for the same SID. The workaround exists primarily because the bypass quirk detection fails when using a S2CR register from the additional matching groups, so let's perform the test with the last reliable S2CR (127) and then limit the number of SMR groups only if we detect that we are running below the hypervisor (because of the bypass quirk). Fixes: 122611347326 ("iommu/arm-smmu-qcom: Limit the SMR groups to 128") Signed-off-by: Stephan Gerhold Signed-off-by: Will Deacon Signed-off-by: Sasha Levin

iommu/arm-smmu-v3: Fix error check in arm_smmu_alloc_cd_tables

2025-12-18T13:03:16+00:00

[ Upstream commit 5941f0e0c1e0be03ebc15b461f64208f5250d3d9 ] In arm_smmu_alloc_cd_tables(), the error check following the dma_alloc_coherent() for cd_table->l2.l1tab incorrectly tests cd_table->l2.l2ptrs. This means an allocation failure for l1tab goes undetected, causing the function to return 0 (success) erroneously. Correct the check to test cd_table->l2.l1tab. Fixes: e3b1be2e73db ("iommu/arm-smmu-v3: Reorganize struct arm_smmu_ctx_desc_cfg") Signed-off-by: Daniel Mentz Signed-off-by: Ryan Huang Reviewed-by: Nicolin Chen Reviewed-by: Pranjal Shrivastava Reviewed-by: Jason Gunthorpe Signed-off-by: Will Deacon Signed-off-by: Sasha Levin