summaryrefslogtreecommitdiff
path: root/drivers/iommu
AgeCommit message (Collapse)AuthorFilesLines
2024-05-02iommu/vt-d: Allocate local memory for page request queueJacob Pan1-1/+1
[ Upstream commit a34f3e20ddff02c4f12df2c0635367394e64c63d ] The page request queue is per IOMMU, its allocation should be made NUMA-aware for performance reasons. Fixes: a222a7f0bb6c ("iommu/vt-d: Implement page request handling") Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20240403214007.985600-1-jacob.jun.pan@linux.intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-27iommu/vt-d: Don't issue ATS Invalidation request when device is disconnectedEthan Zhao1-0/+3
[ Upstream commit 4fc82cd907ac075648789cc3a00877778aa1838b ] For those endpoint devices connect to system via hotplug capable ports, users could request a hot reset to the device by flapping device's link through setting the slot's link control register, as pciehp_ist() DLLSC interrupt sequence response, pciehp will unload the device driver and then power it off. thus cause an IOMMU device-TLB invalidation (Intel VT-d spec, or ATS Invalidation in PCIe spec r6.1) request for non-existence target device to be sent and deadly loop to retry that request after ITE fault triggered in interrupt context. That would cause following continuous hard lockup warning and system hang [ 4211.433662] pcieport 0000:17:01.0: pciehp: Slot(108): Link Down [ 4211.433664] pcieport 0000:17:01.0: pciehp: Slot(108): Card not present [ 4223.822591] NMI watchdog: Watchdog detected hard LOCKUP on cpu 144 [ 4223.822622] CPU: 144 PID: 1422 Comm: irq/57-pciehp Kdump: loaded Tainted: G S OE kernel version xxxx [ 4223.822623] Hardware name: vendorname xxxx 666-106, BIOS 01.01.02.03.01 05/15/2023 [ 4223.822623] RIP: 0010:qi_submit_sync+0x2c0/0x490 [ 4223.822624] Code: 48 be 00 00 00 00 00 08 00 00 49 85 74 24 20 0f 95 c1 48 8b 57 10 83 c1 04 83 3c 1a 03 0f 84 a2 01 00 00 49 8b 04 24 8b 70 34 <40> f6 c6 1 0 74 17 49 8b 04 24 8b 80 80 00 00 00 89 c2 d3 fa 41 39 [ 4223.822624] RSP: 0018:ffffc4f074f0bbb8 EFLAGS: 00000093 [ 4223.822625] RAX: ffffc4f040059000 RBX: 0000000000000014 RCX: 0000000000000005 [ 4223.822625] RDX: ffff9f3841315800 RSI: 0000000000000000 RDI: ffff9f38401a8340 [ 4223.822625] RBP: ffff9f38401a8340 R08: ffffc4f074f0bc00 R09: 0000000000000000 [ 4223.822626] R10: 0000000000000010 R11: 0000000000000018 R12: ffff9f384005e200 [ 4223.822626] R13: 0000000000000004 R14: 0000000000000046 R15: 0000000000000004 [ 4223.822626] FS: 0000000000000000(0000) GS:ffffa237ae400000(0000) knlGS:0000000000000000 [ 4223.822627] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 4223.822627] CR2: 00007ffe86515d80 CR3: 000002fd3000a001 CR4: 0000000000770ee0 [ 4223.822627] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 4223.822628] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 [ 4223.822628] PKRU: 55555554 [ 4223.822628] Call Trace: [ 4223.822628] qi_flush_dev_iotlb+0xb1/0xd0 [ 4223.822628] __dmar_remove_one_dev_info+0x224/0x250 [ 4223.822629] dmar_remove_one_dev_info+0x3e/0x50 [ 4223.822629] intel_iommu_release_device+0x1f/0x30 [ 4223.822629] iommu_release_device+0x33/0x60 [ 4223.822629] iommu_bus_notifier+0x7f/0x90 [ 4223.822630] blocking_notifier_call_chain+0x60/0x90 [ 4223.822630] device_del+0x2e5/0x420 [ 4223.822630] pci_remove_bus_device+0x70/0x110 [ 4223.822630] pciehp_unconfigure_device+0x7c/0x130 [ 4223.822631] pciehp_disable_slot+0x6b/0x100 [ 4223.822631] pciehp_handle_presence_or_link_change+0xd8/0x320 [ 4223.822631] pciehp_ist+0x176/0x180 [ 4223.822631] ? irq_finalize_oneshot.part.50+0x110/0x110 [ 4223.822632] irq_thread_fn+0x19/0x50 [ 4223.822632] irq_thread+0x104/0x190 [ 4223.822632] ? irq_forced_thread_fn+0x90/0x90 [ 4223.822632] ? irq_thread_check_affinity+0xe0/0xe0 [ 4223.822633] kthread+0x114/0x130 [ 4223.822633] ? __kthread_cancel_work+0x40/0x40 [ 4223.822633] ret_from_fork+0x1f/0x30 [ 4223.822633] Kernel panic - not syncing: Hard LOCKUP [ 4223.822634] CPU: 144 PID: 1422 Comm: irq/57-pciehp Kdump: loaded Tainted: G S OE kernel version xxxx [ 4223.822634] Hardware name: vendorname xxxx 666-106, BIOS 01.01.02.03.01 05/15/2023 [ 4223.822634] Call Trace: [ 4223.822634] <NMI> [ 4223.822635] dump_stack+0x6d/0x88 [ 4223.822635] panic+0x101/0x2d0 [ 4223.822635] ? ret_from_fork+0x11/0x30 [ 4223.822635] nmi_panic.cold.14+0xc/0xc [ 4223.822636] watchdog_overflow_callback.cold.8+0x6d/0x81 [ 4223.822636] __perf_event_overflow+0x4f/0xf0 [ 4223.822636] handle_pmi_common+0x1ef/0x290 [ 4223.822636] ? __set_pte_vaddr+0x28/0x40 [ 4223.822637] ? flush_tlb_one_kernel+0xa/0x20 [ 4223.822637] ? __native_set_fixmap+0x24/0x30 [ 4223.822637] ? ghes_copy_tofrom_phys+0x70/0x100 [ 4223.822637] ? __ghes_peek_estatus.isra.16+0x49/0xa0 [ 4223.822637] intel_pmu_handle_irq+0xba/0x2b0 [ 4223.822638] perf_event_nmi_handler+0x24/0x40 [ 4223.822638] nmi_handle+0x4d/0xf0 [ 4223.822638] default_do_nmi+0x49/0x100 [ 4223.822638] exc_nmi+0x134/0x180 [ 4223.822639] end_repeat_nmi+0x16/0x67 [ 4223.822639] RIP: 0010:qi_submit_sync+0x2c0/0x490 [ 4223.822639] Code: 48 be 00 00 00 00 00 08 00 00 49 85 74 24 20 0f 95 c1 48 8b 57 10 83 c1 04 83 3c 1a 03 0f 84 a2 01 00 00 49 8b 04 24 8b 70 34 <40> f6 c6 10 74 17 49 8b 04 24 8b 80 80 00 00 00 89 c2 d3 fa 41 39 [ 4223.822640] RSP: 0018:ffffc4f074f0bbb8 EFLAGS: 00000093 [ 4223.822640] RAX: ffffc4f040059000 RBX: 0000000000000014 RCX: 0000000000000005 [ 4223.822640] RDX: ffff9f3841315800 RSI: 0000000000000000 RDI: ffff9f38401a8340 [ 4223.822641] RBP: ffff9f38401a8340 R08: ffffc4f074f0bc00 R09: 0000000000000000 [ 4223.822641] R10: 0000000000000010 R11: 0000000000000018 R12: ffff9f384005e200 [ 4223.822641] R13: 0000000000000004 R14: 0000000000000046 R15: 0000000000000004 [ 4223.822641] ? qi_submit_sync+0x2c0/0x490 [ 4223.822642] ? qi_submit_sync+0x2c0/0x490 [ 4223.822642] </NMI> [ 4223.822642] qi_flush_dev_iotlb+0xb1/0xd0 [ 4223.822642] __dmar_remove_one_dev_info+0x224/0x250 [ 4223.822643] dmar_remove_one_dev_info+0x3e/0x50 [ 4223.822643] intel_iommu_release_device+0x1f/0x30 [ 4223.822643] iommu_release_device+0x33/0x60 [ 4223.822643] iommu_bus_notifier+0x7f/0x90 [ 4223.822644] blocking_notifier_call_chain+0x60/0x90 [ 4223.822644] device_del+0x2e5/0x420 [ 4223.822644] pci_remove_bus_device+0x70/0x110 [ 4223.822644] pciehp_unconfigure_device+0x7c/0x130 [ 4223.822644] pciehp_disable_slot+0x6b/0x100 [ 4223.822645] pciehp_handle_presence_or_link_change+0xd8/0x320 [ 4223.822645] pciehp_ist+0x176/0x180 [ 4223.822645] ? irq_finalize_oneshot.part.50+0x110/0x110 [ 4223.822645] irq_thread_fn+0x19/0x50 [ 4223.822646] irq_thread+0x104/0x190 [ 4223.822646] ? irq_forced_thread_fn+0x90/0x90 [ 4223.822646] ? irq_thread_check_affinity+0xe0/0xe0 [ 4223.822646] kthread+0x114/0x130 [ 4223.822647] ? __kthread_cancel_work+0x40/0x40 [ 4223.822647] ret_from_fork+0x1f/0x30 [ 4223.822647] Kernel Offset: 0x6400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) Such issue could be triggered by all kinds of regular surprise removal hotplug operation. like: 1. pull EP(endpoint device) out directly. 2. turn off EP's power. 3. bring the link down. etc. this patch aims to work for regular safe removal and surprise removal unplug. these hot unplug handling process could be optimized for fix the ATS Invalidation hang issue by calling pci_dev_is_disconnected() in function devtlb_invalidation_with_pasid() to check target device state to avoid sending meaningless ATS Invalidation request to iommu when device is gone. (see IMPLEMENTATION NOTE in PCIe spec r6.1 section 10.3.1) For safe removal, device wouldn't be removed until the whole software handling process is done, it wouldn't trigger the hard lock up issue caused by too long ATS Invalidation timeout wait. In safe removal path, device state isn't set to pci_channel_io_perm_failure in pciehp_unconfigure_device() by checking 'presence' parameter, calling pci_dev_is_disconnected() in devtlb_invalidation_with_pasid() will return false there, wouldn't break the function. For surprise removal, device state is set to pci_channel_io_perm_failure in pciehp_unconfigure_device(), means device is already gone (disconnected) call pci_dev_is_disconnected() in devtlb_invalidation_with_pasid() will return true to break the function not to send ATS Invalidation request to the disconnected device blindly, thus avoid to trigger further ITE fault, and ITE fault will block all invalidation request to be handled. furthermore retry the timeout request could trigger hard lockup. safe removal (present) & surprise removal (not present) pciehp_ist() pciehp_handle_presence_or_link_change() pciehp_disable_slot() remove_board() pciehp_unconfigure_device(presence) { if (!presence) pci_walk_bus(parent, pci_dev_set_disconnected, NULL); } this patch works for regular safe removal and surprise removal of ATS capable endpoint on PCIe switch downstream ports. Fixes: 6f7db75e1c46 ("iommu/vt-d: Add second level page table interface") Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org> Tested-by: Haorong Ye <yehaorong@bytedance.com> Signed-off-by: Ethan Zhao <haifeng.zhao@linux.intel.com> Link: https://lore.kernel.org/r/20240301080727.3529832-3-haifeng.zhao@linux.intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-27iommu/amd: Mark interrupt as managedMario Limonciello1-0/+3
[ Upstream commit 0feda94c868d396fac3b3cb14089d2d989a07c72 ] On many systems that have an AMD IOMMU the following sequence of warnings is observed during bootup. ``` pci 0000:00:00.2 can't derive routing for PCI INT A pci 0000:00:00.2: PCI INT A: not connected ``` This series of events happens because of the IOMMU initialization sequence order and the lack of _PRT entries for the IOMMU. During initialization the IOMMU driver first enables the PCI device using pci_enable_device(). This will call acpi_pci_irq_enable() which will check if the interrupt is declared in a PCI routing table (_PRT) entry. According to the PCI spec [1] these routing entries are only required under PCI root bridges: The _PRT object is required under all PCI root bridges The IOMMU is directly connected to the root complex, so there is no parent bridge to look for a _PRT entry. The first warning is emitted since no entry could be found in the hierarchy. The second warning is then emitted because the interrupt hasn't yet been configured to any value. The pin was configured in pci_read_irq() but the byte in PCI_INTERRUPT_LINE return 0xff which means "Unknown". After that sequence of events pci_enable_msi() is called and this will allocate an interrupt. That is both of these warnings are totally harmless because the IOMMU uses MSI for interrupts. To avoid even trying to probe for a _PRT entry mark the IOMMU as IRQ managed. This avoids both warnings. Link: https://uefi.org/htmlspecs/ACPI_Spec_6_4_html/06_Device_Configuration/Device_Configuration.html?highlight=_prt#prt-pci-routing-table [1] Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Fixes: cffe0a2b5a34 ("x86, irq: Keep balance of IOAPIC pin reference count") Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/20240122233400.1802-1-mario.limonciello@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-26iommu/arm-smmu-qcom: Add missing GMU entry to match tableRob Clark1-0/+1
commit afc95681c3068956fed1241a1ff1612c066c75ac upstream. In some cases the firmware expects cbndx 1 to be assigned to the GMU, so we also want the default domain for the GMU to be an identy domain. This way it does not get a context bank assigned. Without this, both of_dma_configure() and drm/msm's iommu_domain_attach() will trigger allocating and configuring a context bank. So GMU ends up attached to both cbndx 1 and later cbndx 2. This arrangement seemingly confounds and surprises the firmware if the GPU later triggers a translation fault, resulting (on sc8280xp / lenovo x13s, at least) in the SMMU getting wedged and the GPU stuck without memory access. Cc: stable@vger.kernel.org Signed-off-by: Rob Clark <robdclark@chromium.org> Tested-by: Johan Hovold <johan+linaro@kernel.org> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/20231210180655.75542-1-robdclark@gmail.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-08iommu/vt-d: Add MTL to quirk list to skip TE disablingAbdul Halim, Mohd Syazwan1-1/+1
commit 85b80fdffa867d75dfb9084a839e7949e29064e8 upstream. The VT-d spec requires (10.4.4 Global Command Register, TE field) that: Hardware implementations supporting DMA draining must drain any in-flight DMA read/write requests queued within the Root-Complex before switching address translation on or off and reflecting the status of the command through the TES field in the Global Status register. Unfortunately, some integrated graphic devices fail to do so after some kind of power state transition. As the result, the system might stuck in iommu_disable_translation(), waiting for the completion of TE transition. Add MTL to the quirk list for those devices and skips TE disabling if the qurik hits. Fixes: b1012ca8dc4f ("iommu/vt-d: Skip TE disabling on quirky gfx dedicated iommu") Cc: stable@vger.kernel.org Signed-off-by: Abdul Halim, Mohd Syazwan <mohd.syazwan.abdul.halim@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20231116022324.30120-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-09-19iommu/vt-d: Fix to flush cache of PASID directory tableYanfei Xu1-1/+1
[ Upstream commit 8a3b8e63f8371c1247b7aa24ff9c5312f1a6948b ] Even the PCI devices don't support pasid capability, PASID table is mandatory for a PCI device in scalable mode. However flushing cache of pasid directory table for these devices are not taken after pasid table is allocated as the "size" of table is zero. Fix it by calculating the size by page order. Found this when reading the code, no real problem encountered for now. Fixes: 194b3348bdbb ("iommu/vt-d: Fix PASID directory pointer coherency") Suggested-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Yanfei Xu <yanfei.xu@intel.com> Link: https://lore.kernel.org/r/20230616081045.721873-1-yanfei.xu@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-19iommu/qcom: Disable and reset context bank before programmingAngeloGioacchino Del Regno1-0/+7
[ Upstream commit 9f3fef23d9b5a858a6e6d5f478bb1b6b76265e76 ] Writing the new TTBRs, TCRs and MAIRs on a previously enabled context bank may trigger a context fault, resulting in firmware driven AP resets: change the domain initialization programming sequence to disable the context bank(s) and to also clear the related fault address (CB_FAR) and fault status (CB_FSR) registers before writing new values to TTBR0/1, TCR/TCR2, MAIR0/1. Fixes: 0ae349a0f33f ("iommu/qcom: Add qcom_iommu") Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org> Link: https://lore.kernel.org/r/20230622092742.74819-4-angelogioacchino.delregno@collabora.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-09iommu/amd: Don't block updates to GATag if guest mode is onJoao Martins1-2/+1
[ Upstream commit ed8a2f4ddef2eaaf864ab1efbbca9788187036ab ] On KVM GSI routing table updates, specially those where they have vIOMMUs with interrupt remapping enabled (to boot >255vcpus setups without relying on KVM_FEATURE_MSI_EXT_DEST_ID), a VMM may update the backing VF MSIs with a new VCPU affinity. On AMD with AVIC enabled, the new vcpu affinity info is updated via: avic_pi_update_irte() irq_set_vcpu_affinity() amd_ir_set_vcpu_affinity() amd_iommu_{de}activate_guest_mode() Where the IRTE[GATag] is updated with the new vcpu affinity. The GATag contains VM ID and VCPU ID, and is used by IOMMU hardware to signal KVM (via GALog) when interrupt cannot be delivered due to vCPU is in blocking state. The issue is that amd_iommu_activate_guest_mode() will essentially only change IRTE fields on transitions from non-guest-mode to guest-mode and otherwise returns *with no changes to IRTE* on already configured guest-mode interrupts. To the guest this means that the VF interrupts remain affined to the first vCPU they were first configured, and guest will be unable to issue VF interrupts and receive messages like this from spurious interrupts (e.g. from waking the wrong vCPU in GALog): [ 167.759472] __common_interrupt: 3.34 No irq handler for vector [ 230.680927] mlx5_core 0000:00:02.0: mlx5_cmd_eq_recover:247:(pid 3122): Recovered 1 EQEs on cmd_eq [ 230.681799] mlx5_core 0000:00:02.0: wait_func_handle_exec_timeout:1113:(pid 3122): cmd[0]: CREATE_CQ(0x400) recovered after timeout [ 230.683266] __common_interrupt: 3.34 No irq handler for vector Given the fact that amd_ir_set_vcpu_affinity() uses amd_iommu_activate_guest_mode() underneath it essentially means that VCPU affinity changes of IRTEs are nops. Fix it by dropping the check for guest-mode at amd_iommu_activate_guest_mode(). Same thing is applicable to amd_iommu_deactivate_guest_mode() although, even if the IRTE doesn't change underlying DestID on the host, the VFIO IRQ handler will still be able to poke at the right guest-vCPU. Fixes: b9c6ff94e43a ("iommu/amd: Re-factor guest virtual APIC (de-)activation code") Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Link: https://lore.kernel.org/r/20230419201154.83880-2-joao.m.martins@oracle.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-09iommu/rockchip: Fix unwind goto issueChao Wang1-6/+8
[ Upstream commit ec014683c564fb74fc68e8f5e84691d3b3839d24 ] Smatch complains that drivers/iommu/rockchip-iommu.c:1306 rk_iommu_probe() warn: missing unwind goto? The rk_iommu_probe function, after obtaining the irq value through platform_get_irq, directly returns an error if the returned value is negative, without releasing any resources. Fix this by adding a new error handling label "err_pm_disable" and use a goto statement to redirect to the error handling process. In order to preserve the original semantics, set err to the value of irq. Fixes: 1aa55ca9b14a ("iommu/rockchip: Move irq request past pm_runtime_enable") Signed-off-by: Chao Wang <D202280639@hust.edu.cn> Reviewed-by: Dongliang Mu <dzm91@hust.edu.cn> Reviewed-by: Heiko Stuebner <heiko@sntech.de> Link: https://lore.kernel.org/r/20230417030421.2777-1-D202280639@hust.edu.cn Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-30iommu/arm-smmu-v3: Acknowledge pri/event queue overflow if anyTomas Krcka1-5/+14
[ Upstream commit 67ea0b7ce41844eae7c10bb04dfe66a23318c224 ] When an overflow occurs in the PRI queue, the SMMU toggles the overflow flag in the PROD register. To exit the overflow condition, the PRI thread is supposed to acknowledge it by toggling this flag in the CONS register. Unacknowledged overflow causes the queue to stop adding anything new. Currently, the priq thread always writes the CONS register back to the SMMU after clearing the queue. The writeback is not necessary if the OVFLG in the PROD register has not been changed, no overflow has occured. This commit checks the difference of the overflow flag between CONS and PROD register. If it's different, toggles the OVACKFLG flag in the CONS register and write it to the SMMU. The situation is similar for the event queue. The acknowledge register is also toggled after clearing the event queue but never propagated to the hardware. This would only be done the next time when executing evtq thread. Unacknowledged event queue overflow doesn't affect the event queue, because the SMMU still adds elements to that queue when the overflow condition is active. But it feel nicer to keep SMMU in sync when possible, so use the same way here as well. Signed-off-by: Tomas Krcka <krckatom@amazon.de> Link: https://lore.kernel.org/r/20230329123420.34641-1-tomas.krcka@gmail.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-30iommu/arm-smmu-qcom: Limit the SMR groups to 128Manivannan Sadhasivam1-1/+15
[ Upstream commit 12261134732689b7e30c59db9978f81230965181 ] Some platforms support more than 128 stream matching groups than what is defined by the ARM SMMU architecture specification. But due to some unknown reasons, those additional groups don't exhibit the same behavior as the architecture supported ones. For instance, the additional groups will not detect the quirky behavior of some firmware versions intercepting writes to S2CR register, thus skipping the quirk implemented in the driver and causing boot crash. So let's limit the groups to 128 for now until the issue with those groups are fixed and issue a notice to users in that case. Reviewed-by: Johan Hovold <johan+linaro@kernel.org> Tested-by: Johan Hovold <johan+linaro@kernel.org> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Link: https://lore.kernel.org/r/20230327080029.11584-1-manivannan.sadhasivam@linaro.org [will: Reworded the comment slightly] Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-17iommu/amd: Fix "Guest Virtual APIC Table Root Pointer" configuration in IRTEKishon Vijay Abraham I1-2/+2
commit ccc62b827775915a9b82db42a29813d04f92df7a upstream. commit b9c6ff94e43a ("iommu/amd: Re-factor guest virtual APIC (de-)activation code") while refactoring guest virtual APIC activation/de-activation code, stored information for activate/de-activate in "struct amd_ir_data". It used 32-bit integer data type for storing the "Guest Virtual APIC Table Root Pointer" (ga_root_ptr), though the "ga_root_ptr" is actually a 40-bit field in IRTE (Interrupt Remapping Table Entry). This causes interrupts from PCIe devices to not reach the guest in the case of PCIe passthrough with SME (Secure Memory Encryption) enabled as _SME_ bit in the "ga_root_ptr" is lost before writing it to the IRTE. Fix it by using 64-bit data type for storing the "ga_root_ptr". While at that also change the data type of "ga_tag" to u32 in order to match the IOMMU spec. Fixes: b9c6ff94e43a ("iommu/amd: Re-factor guest virtual APIC (de-)activation code") Cc: stable@vger.kernel.org # v5.4+ Reported-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Kishon Vijay Abraham I <kvijayab@amd.com> Link: https://lore.kernel.org/r/20230405130317.9351-1-kvijayab@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-17iommu/amd: Add a length limitation for the ivrs_acpihid command-line parameterGavrilov Ilia1-1/+15
[ Upstream commit b6b26d86c61c441144c72f842f7469bb686e1211 ] The 'acpiid' buffer in the parse_ivrs_acpihid function may overflow, because the string specifier in the format string sscanf() has no width limitation. Found by InfoTeCS on behalf of Linux Verification Center (linuxtesting.org) with SVACE. Fixes: ca3bf5d47cec ("iommu/amd: Introduces ivrs_acpihid kernel parameter") Cc: stable@vger.kernel.org Signed-off-by: Ilia.Gavrilov <Ilia.Gavrilov@infotecs.ru> Reviewed-by: Kim Phillips <kim.phillips@amd.com> Link: https://lore.kernel.org/r/20230202082719.1513849-1-Ilia.Gavrilov@infotecs.ru Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-17iommu/vt-d: Fix PASID directory pointer coherencyJacob Pan1-0/+7
[ Upstream commit 194b3348bdbb7db65375c72f3f774aee4cc6614e ] On platforms that do not support IOMMU Extended capability bit 0 Page-walk Coherency, CPU caches are not snooped when IOMMU is accessing any translation structures. IOMMU access goes only directly to memory. Intel IOMMU code was missing a flush for the PASID table directory that resulted in the unrecoverable fault as shown below. This patch adds clflush calls whenever allocating and updating a PASID table directory to ensure cache coherency. On the reverse direction, there's no need to clflush the PASID directory pointer when we deactivate a context entry in that IOMMU hardware will not see the old PASID directory pointer after we clear the context entry. PASID directory entries are also never freed once allocated. DMAR: DRHD: handling fault status reg 3 DMAR: [DMA Read NO_PASID] Request device [00:0d.2] fault addr 0x1026a4000 [fault reason 0x51] SM: Present bit in Directory Entry is clear DMAR: Dump dmar1 table entries for IOVA 0x1026a4000 DMAR: scalable mode root entry: hi 0x0000000102448001, low 0x0000000101b3e001 DMAR: context entry: hi 0x0000000000000000, low 0x0000000101b4d401 DMAR: pasid dir entry: 0x0000000101b4e001 DMAR: pasid table entry[0]: 0x0000000000000109 DMAR: pasid table entry[1]: 0x0000000000000001 DMAR: pasid table entry[2]: 0x0000000000000000 DMAR: pasid table entry[3]: 0x0000000000000000 DMAR: pasid table entry[4]: 0x0000000000000000 DMAR: pasid table entry[5]: 0x0000000000000000 DMAR: pasid table entry[6]: 0x0000000000000000 DMAR: pasid table entry[7]: 0x0000000000000000 DMAR: PTE not present at level 4 Cc: <stable@vger.kernel.org> Fixes: 0bbeb01a4faf ("iommu/vt-d: Manage scalalble mode PASID tables") Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reported-by: Sukumar Ghorai <sukumar.ghorai@intel.com> Signed-off-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Link: https://lore.kernel.org/r/20230209212843.1788125-1-jacob.jun.pan@linux.intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-17iommu/vt-d: Fix lockdep splat in intel_pasid_get_entry()Lu Baolu1-8/+13
[ Upstream commit 803766cbf85fb8edbf896729bbefc2d38dcf1e0a ] The pasid_lock is used to synchronize different threads from modifying a same pasid directory entry at the same time. It causes below lockdep splat. [ 83.296538] ======================================================== [ 83.296538] WARNING: possible irq lock inversion dependency detected [ 83.296539] 5.12.0-rc3+ #25 Tainted: G W [ 83.296539] -------------------------------------------------------- [ 83.296540] bash/780 just changed the state of lock: [ 83.296540] ffffffff82b29c98 (device_domain_lock){..-.}-{2:2}, at: iommu_flush_dev_iotlb.part.0+0x32/0x110 [ 83.296547] but this lock took another, SOFTIRQ-unsafe lock in the past: [ 83.296547] (pasid_lock){+.+.}-{2:2} [ 83.296548] and interrupts could create inverse lock ordering between them. [ 83.296549] other info that might help us debug this: [ 83.296549] Chain exists of: device_domain_lock --> &iommu->lock --> pasid_lock [ 83.296551] Possible interrupt unsafe locking scenario: [ 83.296551] CPU0 CPU1 [ 83.296552] ---- ---- [ 83.296552] lock(pasid_lock); [ 83.296553] local_irq_disable(); [ 83.296553] lock(device_domain_lock); [ 83.296554] lock(&iommu->lock); [ 83.296554] <Interrupt> [ 83.296554] lock(device_domain_lock); [ 83.296555] *** DEADLOCK *** Fix it by replacing the pasid_lock with an atomic exchange operation. Reported-and-tested-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20210320020916.640115-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Stable-dep-of: 194b3348bdbb ("iommu/vt-d: Fix PASID directory pointer coherency") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-18iommu/mediatek-v1: Fix an error handling path in mtk_iommu_v1_probe()Christophe JAILLET1-1/+3
[ Upstream commit 142e821f68cf5da79ce722cb9c1323afae30e185 ] A clk, prepared and enabled in mtk_iommu_v1_hw_init(), is not released in the error handling path of mtk_iommu_v1_probe(). Add the corresponding clk_disable_unprepare(), as already done in the remove function. Fixes: b17336c55d89 ("iommu/mediatek: add support for mtk iommu generation one HW") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Yong Wu <yong.wu@mediatek.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com> Link: https://lore.kernel.org/r/593e7b7d97c6e064b29716b091a9d4fd122241fb.1671473163.git.christophe.jaillet@wanadoo.fr Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-18iommu/mediatek-v1: Add error handle for mtk_iommu_probeYong Wu1-4/+18
[ Upstream commit ac304c070c54413efabf29f9e73c54576d329774 ] In the original code, we lack the error handle. This patch adds them. Signed-off-by: Yong Wu <yong.wu@mediatek.com> Link: https://lore.kernel.org/r/20210412064843.11614-2-yong.wu@mediatek.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Stable-dep-of: 142e821f68cf ("iommu/mediatek-v1: Fix an error handling path in mtk_iommu_v1_probe()") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-18iommu/amd: Fix ill-formed ivrs_ioapic, ivrs_hpet and ivrs_acpihid optionsKim Phillips1-25/+54
[ Upstream commit 1198d2316dc4265a97d0e8445a22c7a6d17580a4 ] Currently, these options cause the following libkmod error: libkmod: ERROR ../libkmod/libkmod-config.c:489 kcmdline_parse_result: \ Ignoring bad option on kernel command line while parsing module \ name: 'ivrs_xxxx[XX:XX' Fix by introducing a new parameter format for these options and throw a warning for the deprecated format. Users are still allowed to omit the PCI Segment if zero. Adding a Link: to the reason why we're modding the syntax parsing in the driver and not in libkmod. Fixes: ca3bf5d47cec ("iommu/amd: Introduces ivrs_acpihid kernel parameter") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/linux-modules/20200310082308.14318-2-lucas.demarchi@intel.com/ Reported-by: Kim Phillips <kim.phillips@amd.com> Co-developed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Kim Phillips <kim.phillips@amd.com> Link: https://lore.kernel.org/r/20220919155638.391481-2-kim.phillips@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-18iommu/amd: Add PCI segment support for ivrs_[ioapic/hpet/acpihid] commandsSuravee Suthikulpanit1-17/+27
[ Upstream commit bbe3a106580c21bc883fb0c9fa3da01534392fe8 ] By default, PCI segment is zero and can be omitted. To support system with non-zero PCI segment ID, modify the parsing functions to allow PCI segment ID. Co-developed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Link: https://lore.kernel.org/r/20220706113825.25582-33-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Stable-dep-of: 1198d2316dc4 ("iommu/amd: Fix ill-formed ivrs_ioapic, ivrs_hpet and ivrs_acpihid options") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-14iommu/amd: Fix ivrs_acpihid cmdline parsing codeKim Phillips1-0/+7
commit 5f18e9f8868c6d4eae71678e7ebd4977b7d8c8cf upstream. The second (UID) strcmp in acpi_dev_hid_uid_match considers "0" and "00" different, which can prevent device registration. Have the AMD IOMMU driver's ivrs_acpihid parsing code remove any leading zeroes to make the UID strcmp succeed. Now users can safely specify "AMDxxxxx:00" or "AMDxxxxx:0" and expect the same behaviour. Fixes: ca3bf5d47cec ("iommu/amd: Introduces ivrs_acpihid kernel parameter") Signed-off-by: Kim Phillips <kim.phillips@amd.com> Cc: stable@vger.kernel.org Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Cc: Joerg Roedel <jroedel@suse.de> Link: https://lore.kernel.org/r/20220919155638.391481-1-kim.phillips@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-01-14iommu/sun50i: Remove IOMMU_DOMAIN_IDENTITYJason Gunthorpe1-1/+0
[ Upstream commit ef5bb8e7a7127218f826b9ccdf7508e7a339f4c2 ] This driver treats IOMMU_DOMAIN_IDENTITY the same as UNMANAGED, which cannot possibly be correct. UNMANAGED domains are required to start out blocking all DMAs. This seems to be what this driver does as it allocates a first level 'dt' for the IO page table that is 0 filled. Thus UNMANAGED looks like a working IO page table, and so IDENTITY must be a mistake. Remove it. Fixes: 4100b8c229b3 ("iommu: Add Allwinner H6 IOMMU driver") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/0-v1-97f0adf27b5e+1f0-s50_identity_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-14iommu/fsl_pamu: Fix resource leak in fsl_pamu_probe()Yuan Can1-1/+1
[ Upstream commit 73f5fc5f884ad0c5f7d57f66303af64f9f002526 ] The fsl_pamu_probe() returns directly when create_csd() failed, leaving irq and memories unreleased. Fix by jumping to error if create_csd() returns error. Fixes: 695093e38c3e ("iommu/fsl: Freescale PAMU driver and iommu implementation.") Signed-off-by: Yuan Can <yuancan@huawei.com> Link: https://lore.kernel.org/r/20221121082022.19091-1-yuancan@huawei.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-14iommu/amd: Fix pci device refcount leak in ppr_notifier()Yang Yingliang1-0/+1
[ Upstream commit 6cf0981c2233f97d56938d9d61845383d6eb227c ] As comment of pci_get_domain_bus_and_slot() says, it returns a pci device with refcount increment, when finish using it, the caller must decrement the reference count by calling pci_dev_put(). So call it before returning from ppr_notifier() to avoid refcount leak. Fixes: daae2d25a477 ("iommu/amd: Don't copy GCR3 table root pointer") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Link: https://lore.kernel.org/r/20221118093604.216371-1-yangyingliang@huawei.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-14iommu/sun50i: Fix flush sizeJernej Skrabec1-1/+1
[ Upstream commit 67a8a67f9eceb72e4c73d1d09ed9ab04f4b8e12d ] Function sun50i_table_flush() takes number of entries as an argument, not number of bytes. Fix that mistake in sun50i_dte_get_page_table(). Fixes: 4100b8c229b3 ("iommu: Add Allwinner H6 IOMMU driver") Signed-off-by: Jernej Skrabec <jernej.skrabec@gmail.com> Link: https://lore.kernel.org/r/20221025165415.307591-5-jernej.skrabec@gmail.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-14iommu/sun50i: Fix R/W permission checkJernej Skrabec1-1/+1
[ Upstream commit eac0104dc69be50bed86926d6f32e82b44f8c921 ] Because driver has enum type permissions and iommu subsystem has bitmap type, we have to be careful how check for combined read and write permissions is done. In such case, we have to mask both permissions and check that both are set at the same time. Current code just masks both flags but doesn't check that both are set. In short, it always sets R/W permission, regardles if requested permissions were RO, WO or RW. Fix that. Fixes: 4100b8c229b3 ("iommu: Add Allwinner H6 IOMMU driver") Signed-off-by: Jernej Skrabec <jernej.skrabec@gmail.com> Link: https://lore.kernel.org/r/20221025165415.307591-4-jernej.skrabec@gmail.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-14iommu/sun50i: Consider all fault sources for resetJernej Skrabec1-2/+6
[ Upstream commit cef20703e2b2276aaa402ec5a65ec9a09963b83e ] We have to reset masters for all faults - permissions, L1 fault or L2 fault. Currently it's done only for permissions. If other type of fault happens, master is in locked up state. Fix that by really considering all fault sources. Fixes: 4100b8c229b3 ("iommu: Add Allwinner H6 IOMMU driver") Signed-off-by: Jernej Skrabec <jernej.skrabec@gmail.com> Link: https://lore.kernel.org/r/20221025165415.307591-3-jernej.skrabec@gmail.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-14iommu/sun50i: Fix reset releaseJernej Skrabec1-1/+2
[ Upstream commit 9ad0c1252e84dbc664f0462707182245ed603237 ] Reset signal is asserted by writing 0 to the corresponding locations of masters we want to reset. So in order to deassert all reset signals, we should write 1's to all locations. Current code writes 1's to locations of masters which were just reset which is good. However, at the same time it also writes 0's to other locations and thus asserts reset signals of remaining masters. Fix code by writing all 1's when we want to deassert all reset signals. This bug was discovered when working with Cedrus (video decoder). When it faulted, display went blank due to reset signal assertion. Fixes: 4100b8c229b3 ("iommu: Add Allwinner H6 IOMMU driver") Signed-off-by: Jernej Skrabec <jernej.skrabec@gmail.com> Link: https://lore.kernel.org/r/20221025165415.307591-2-jernej.skrabec@gmail.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-08iommu/vt-d: Fix PCI device refcount leak in dmar_dev_scope_init()Xiongfeng Wang1-0/+1
[ Upstream commit 4bedbbd782ebbe7287231fea862c158d4f08a9e3 ] for_each_pci_dev() is implemented by pci_get_device(). The comment of pci_get_device() says that it will increase the reference count for the returned pci_dev and also decrease the reference count for the input pci_dev @from if it is not NULL. If we break for_each_pci_dev() loop with pdev not NULL, we need to call pci_dev_put() to decrease the reference count. Add the missing pci_dev_put() for the error path to avoid reference count leak. Fixes: 2e4552893038 ("iommu/vt-d: Unify the way to process DMAR device scope array") Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Link: https://lore.kernel.org/r/20221121113649.190393-3-wangxiongfeng2@huawei.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-08iommu/vt-d: Fix PCI device refcount leak in has_external_pci()Xiongfeng Wang1-1/+3
[ Upstream commit afca9e19cc720bfafc75dc5ce429c185ca93f31d ] for_each_pci_dev() is implemented by pci_get_device(). The comment of pci_get_device() says that it will increase the reference count for the returned pci_dev and also decrease the reference count for the input pci_dev @from if it is not NULL. If we break for_each_pci_dev() loop with pdev not NULL, we need to call pci_dev_put() to decrease the reference count. Add the missing pci_dev_put() before 'return true' to avoid reference count leak. Fixes: 89a6079df791 ("iommu/vt-d: Force IOMMU on for platform opt in hint") Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Link: https://lore.kernel.org/r/20221121113649.190393-2-wangxiongfeng2@huawei.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-11-25iommu/vt-d: Set SRE bit only when hardware has SRS capTina Zhang1-2/+3
commit 7fc961cf7ffcb130c4e93ee9a5628134f9de700a upstream. SRS cap is the hardware cap telling if the hardware IOMMU can support requests seeking supervisor privilege or not. SRE bit in scalable-mode PASID table entry is treated as Reserved(0) for implementation not supporting SRS cap. Checking SRS cap before setting SRE bit can avoid the non-recoverable fault of "Non-zero reserved field set in PASID Table Entry" caused by setting SRE bit while there is no SRS cap support. The fault messages look like below: DMAR: DRHD: handling fault status reg 2 DMAR: [DMA Read NO_PASID] Request device [00:0d.0] fault addr 0x1154e1000 [fault reason 0x5a] SM: Non-zero reserved field set in PASID Table Entry Fixes: 6f7db75e1c46 ("iommu/vt-d: Add second level page table interface") Cc: stable@vger.kernel.org Signed-off-by: Tina Zhang <tina.zhang@intel.com> Link: https://lore.kernel.org/r/20221115070346.1112273-1-tina.zhang@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20221116051544.26540-3-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-10-30iommu/vt-d: Clean up si_domain in the init_dmars() error pathJerry Snitselaar1-0/+5
[ Upstream commit 620bf9f981365c18cc2766c53d92bf8131c63f32 ] A splat from kmem_cache_destroy() was seen with a kernel prior to commit ee2653bbe89d ("iommu/vt-d: Remove domain and devinfo mempool") when there was a failure in init_dmars(), because the iommu_domain cache still had objects. While the mempool code is now gone, there still is a leak of the si_domain memory if init_dmars() fails. So clean up si_domain in the init_dmars() error path. Cc: Lu Baolu <baolu.lu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Will Deacon <will@kernel.org> Cc: Robin Murphy <robin.murphy@arm.com> Fixes: 86080ccc223a ("iommu/vt-d: Allocate si_domain in init_dmars()") Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20221010144842.308890-1-jsnitsel@redhat.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-26iommu/omap: Fix buffer overflow in debugfsDan Carpenter1-3/+3
[ Upstream commit 184233a5202786b20220acd2d04ddf909ef18f29 ] There are two issues here: 1) The "len" variable needs to be checked before the very first write. Otherwise if omap2_iommu_dump_ctx() with "bytes" less than 32 it is a buffer overflow. 2) The snprintf() function returns the number of bytes that *would* have been copied if there were enough space. But we want to know the number of bytes which were *actually* copied so use scnprintf() instead. Fixes: bd4396f09a4a ("iommu/omap: Consolidate OMAP IOMMU modules") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Link: https://lore.kernel.org/r/YuvYh1JbE3v+abd5@kili Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-09-28iommu/vt-d: Check correct capability for sagaw determinationYi Liu1-1/+1
commit 154897807050c1161cb2660e502fc0470d46b986 upstream. Check 5-level paging capability for 57 bits address width instead of checking 1GB large page capability. Fixes: 53fc7ad6edf2 ("iommu/vt-d: Correctly calculate sagaw value of IOMMU") Cc: stable@vger.kernel.org Reported-by: Raghunathan Srinivasan <raghunathan.srinivasan@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Raghunathan Srinivasan <raghunathan.srinivasan@intel.com> Link: https://lore.kernel.org/r/20220916071212.2223869-2-yi.l.liu@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-20iommu/vt-d: Correctly calculate sagaw value of IOMMULu Baolu1-3/+25
[ Upstream commit 53fc7ad6edf210b497230ce74b61b322a202470c ] The Intel IOMMU driver possibly selects between the first-level and the second-level translation tables for DMA address translation. However, the levels of page-table walks for the 4KB base page size are calculated from the SAGAW field of the capability register, which is only valid for the second-level page table. This causes the IOMMU driver to stop working if the hardware (or the emulated IOMMU) advertises only first-level translation capability and reports the SAGAW field as 0. This solves the above problem by considering both the first level and the second level when calculating the supported page table levels. Fixes: b802d070a52a1 ("iommu/vt-d: Use iova over first level") Cc: stable@vger.kernel.org Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20220817023558.3253263-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-09-15iommu/amd: use full 64-bit value in build_completion_wait()John Sperbeck1-1/+2
[ Upstream commit 94a568ce32038d8ff9257004bb4632e60eb43a49 ] We started using a 64 bit completion value. Unfortunately, we only stored the low 32-bits, so a very large completion value would never be matched in iommu_completion_wait(). Fixes: c69d89aff393 ("iommu/amd: Use 4K page for completion wait write-back semaphore") Signed-off-by: John Sperbeck <jsperbeck@google.com> Link: https://lore.kernel.org/r/20220801192229.3358786-1-jsperbeck@google.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-21iommu/vt-d: avoid invalid memory access via node_online(NUMA_NO_NODE)Alexander Lobakin1-1/+1
[ Upstream commit b0b0b77ea611e3088e9523e60860f4f41b62b235 ] KASAN reports: [ 4.668325][ T0] BUG: KASAN: wild-memory-access in dmar_parse_one_rhsa (arch/x86/include/asm/bitops.h:214 arch/x86/include/asm/bitops.h:226 include/asm-generic/bitops/instrumented-non-atomic.h:142 include/linux/nodemask.h:415 drivers/iommu/intel/dmar.c:497) [ 4.676149][ T0] Read of size 8 at addr 1fffffff85115558 by task swapper/0/0 [ 4.683454][ T0] [ 4.685638][ T0] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.19.0-rc3-00004-g0e862838f290 #1 [ 4.694331][ T0] Hardware name: Supermicro SYS-5018D-FN4T/X10SDV-8C-TLN4F, BIOS 1.1 03/02/2016 [ 4.703196][ T0] Call Trace: [ 4.706334][ T0] <TASK> [ 4.709133][ T0] ? dmar_parse_one_rhsa (arch/x86/include/asm/bitops.h:214 arch/x86/include/asm/bitops.h:226 include/asm-generic/bitops/instrumented-non-atomic.h:142 include/linux/nodemask.h:415 drivers/iommu/intel/dmar.c:497) after converting the type of the first argument (@nr, bit number) of arch_test_bit() from `long` to `unsigned long`[0]. Under certain conditions (for example, when ACPI NUMA is disabled via command line), pxm_to_node() can return %NUMA_NO_NODE (-1). It is valid 'magic' number of NUMA node, but not valid bit number to use in bitops. node_online() eventually descends to test_bit() without checking for the input, assuming it's on caller side (which might be good for perf-critical tasks). There, -1 becomes %ULONG_MAX which leads to an insane array index when calculating bit position in memory. For now, add an explicit check for @node being not %NUMA_NO_NODE before calling test_bit(). The actual logics didn't change here at all. [0] https://github.com/norov/linux/commit/0e862838f290147ea9c16db852d8d494b552d38d Fixes: ee34b32d8c29 ("dmar: support for parsing Remapping Hardware Static Affinity structure") Cc: stable@vger.kernel.org # 2.6.33+ Reported-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-21iommu/arm-smmu: qcom_iommu: Add of_node_put() when breaking out of loopLiang He1-2/+5
[ Upstream commit a91eb6803c1c715738682fece095145cbd68fe0b ] In qcom_iommu_has_secure_context(), we should call of_node_put() for the reference 'child' when breaking out of for_each_child_of_node() which will automatically increase and decrease the refcount. Fixes: d051f28c8807 ("iommu/qcom: Initialize secure page table") Signed-off-by: Liang He <windhl@126.com> Link: https://lore.kernel.org/r/20220719124955.1242171-1-windhl@126.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-21iommu/exynos: Handle failed IOMMU device registration properlySam Protsenko1-1/+5
[ Upstream commit fce398d2d02c0a9a2bedf7c7201b123e153e8963 ] If iommu_device_register() fails in exynos_sysmmu_probe(), the previous calls have to be cleaned up. In this case, the iommu_device_sysfs_add() should be cleaned up, by calling its remove counterpart call. Fixes: d2c302b6e8b1 ("iommu/exynos: Make use of iommu_device_register interface") Signed-off-by: Sam Protsenko <semen.protsenko@linaro.org> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Acked-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20220714165550.8884-3-semen.protsenko@linaro.org Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-07-12iommu/vt-d: Fix PCI bus rescan device hot addYian Chen1-1/+1
commit 316f92a705a4c2bf4712135180d56f3cca09243a upstream. Notifier calling chain uses priority to determine the execution order of the notifiers or listeners registered to the chain. PCI bus device hot add utilizes the notification mechanism. The current code sets low priority (INT_MIN) to Intel dmar_pci_bus_notifier and postpones DMAR decoding after adding new device into IOMMU. The result is that struct device pointer cannot be found in DRHD search for the new device's DMAR/IOMMU. Subsequently, the device is put under the "catch-all" IOMMU instead of the correct one. This could cause system hang when device TLB invalidation is sent to the wrong IOMMU. Invalidation timeout error and hard lockup have been observed and data inconsistency/crush may occur as well. This patch fixes the issue by setting a positive priority(1) for dmar_pci_bus_notifier while the priority of IOMMU bus notifier uses the default value(0), therefore DMAR decoding will be in advance of DRHD search for a new device to find the correct IOMMU. Following is a 2-step example that triggers the bug by simulating PCI device hot add behavior in Intel Sapphire Rapids server. echo 1 > /sys/bus/pci/devices/0000:6a:01.0/remove echo 1 > /sys/bus/pci/rescan Fixes: 59ce0515cdaf ("iommu/vt-d: Update DRHD/RMRR/ATSR device scope") Cc: stable@vger.kernel.org # v3.15+ Reported-by: Zhang, Bernice <bernice.zhang@intel.com> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Yian Chen <yian.chen@intel.com> Link: https://lore.kernel.org/r/20220521002115.1624069-1-yian.chen@intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-14iommu/arm-smmu-v3: check return value after calling platform_get_resource()Yang Yingliang1-0/+2
[ Upstream commit b131fa8c1d2afd05d0b7598621114674289c2fbb ] It will cause null-ptr-deref if platform_get_resource() returns NULL, we need check the return value. Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Link: https://lore.kernel.org/r/20220425114525.2651143-1-yangyingliang@huawei.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14iommu/arm-smmu: fix possible null-ptr-deref in arm_smmu_device_probe()Yang Yingliang1-3/+2
[ Upstream commit d9ed8af1dee37f181096631fb03729ece98ba816 ] It will cause null-ptr-deref when using 'res', if platform_get_resource() returns NULL, so move using 'res' after devm_ioremap_resource() that will check it to avoid null-ptr-deref. And use devm_platform_get_and_ioremap_resource() to simplify code. Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Link: https://lore.kernel.org/r/20220425114136.2649310-1-yangyingliang@huawei.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09iommu/msm: Fix an incorrect NULL check on list iteratorXiaomeng Tong1-4/+7
commit 8b9ad480bd1dd25f4ff4854af5685fa334a2f57a upstream. The bug is here: if (!iommu || iommu->dev->of_node != spec->np) { The list iterator value 'iommu' will *always* be set and non-NULL by list_for_each_entry(), so it is incorrect to assume that the iterator value will be NULL if the list is empty or no element is found (in fact, it will point to a invalid structure object containing HEAD). To fix the bug, use a new value 'iter' as the list iterator, while use the old value 'iommu' as a dedicated variable to point to the found one, and remove the unneeded check for 'iommu->dev->of_node != spec->np' outside the loop. Cc: stable@vger.kernel.org Fixes: f78ebca8ff3d6 ("iommu/msm: Add support for generic master bindings") Signed-off-by: Xiaomeng Tong <xiam0nd.tong@gmail.com> Link: https://lore.kernel.org/r/20220501132823.12714-1-xiam0nd.tong@gmail.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-09iommu/amd: Increase timeout waiting for GA log enablementJoerg Roedel1-1/+1
[ Upstream commit 42bb5aa043382f09bef2cc33b8431be867c70f8e ] On some systems it can take a long time for the hardware to enable the GA log of the AMD IOMMU. The current wait time is only 0.1ms, but testing showed that it can take up to 14ms for the GA log to enter running state after it has been enabled. Sometimes the long delay happens when booting the system, sometimes only on resume. Adjust the timeout accordingly to not print a warning when hardware takes a longer than usual. There has already been an attempt to fix this with commit 9b45a7738eec ("iommu/amd: Fix loop timeout issue in iommu_ga_log_enable()") But that commit was based on some wrong math and did not fix the issue in all cases. Cc: "D. Ziegfeld" <dzigg@posteo.de> Cc: Jörg-Volker Peetz <jvpeetz@web.de> Fixes: 8bda0cfbdc1a ("iommu/amd: Detect and initialize guest vAPIC log") Signed-off-by: Joerg Roedel <jroedel@suse.de> Link: https://lore.kernel.org/r/20220520102214.12563-1-joro@8bytes.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09iommu/mediatek: Add list_del in mtk_iommu_removeYong Wu1-2/+1
[ Upstream commit ee55f75e4bcade81d253163641b63bef3e76cac4 ] Lack the list_del in the mtk_iommu_remove, and remove bus_set_iommu(*, NULL) since there may be several iommu HWs. we can not bus_set_iommu null when one iommu driver unbind. This could be a fix for mt2712 which support 2 M4U HW and list them. Fixes: 7c3a2ec02806 ("iommu/mediatek: Merge 2 M4U HWs into one iommu domain") Signed-off-by: Yong Wu <yong.wu@mediatek.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com> Link: https://lore.kernel.org/r/20220503071427.2285-6-yong.wu@mediatek.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09iommu/vt-d: Add RPLS to quirk list to skip TE disablingTejas Upadhyay1-1/+1
[ Upstream commit 0a967f5bfd9134b89681cae58deb222e20840e76 ] The VT-d spec requires (10.4.4 Global Command Register, TE field) that: Hardware implementations supporting DMA draining must drain any in-flight DMA read/write requests queued within the Root-Complex before completing the translation enable command and reflecting the status of the command through the TES field in the Global Status register. Unfortunately, some integrated graphic devices fail to do so after some kind of power state transition. As the result, the system might stuck in iommu_disable_translati on(), waiting for the completion of TE transition. This adds RPLS to a quirk list for those devices and skips TE disabling if the qurik hits. Link: https://gitlab.freedesktop.org/drm/intel/-/issues/4898 Tested-by: Raviteja Goud Talla <ravitejax.goud.talla@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Tejas Upadhyay <tejaskumarx.surendrakumar.upadhyay@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220302043256.191529-1-tejaskumarx.surendrakumar.upadhyay@intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-05-12iommu/vt-d: Calculate mask for non-aligned flushesDavid Stevens1-3/+24
commit 59bf3557cf2f8a469a554aea1e3d2c8e72a579f7 upstream. Calculate the appropriate mask for non-size-aligned page selective invalidation. Since psi uses the mask value to mask out the lower order bits of the target address, properly flushing the iotlb requires using a mask value such that [pfn, pfn+pages) all lie within the flushed size-aligned region. This is not normally an issue because iova.c always allocates iovas that are aligned to their size. However, iovas which come from other sources (e.g. userspace via VFIO) may not be aligned. To properly flush the IOTLB, both the start and end pfns need to be equal after applying the mask. That means that the most efficient mask to use is the index of the lowest bit that is equal where all higher bits are also equal. For example, if pfn=0x17f and pages=3, then end_pfn=0x181, so the smallest mask we can use is 8. Any differences above the highest bit of pages are due to carrying, so by xnor'ing pfn and end_pfn and then masking out the lower order bits based on pages, we get 0xffffff00, where the first set bit is the mask we want to use. Fixes: 6fe1010d6d9c ("vfio/type1: DMA unmap chunking") Cc: stable@vger.kernel.org Signed-off-by: David Stevens <stevensd@chromium.org> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20220401022430.1262215-1-stevensd@google.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20220410013533.3959168-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-13iommu/omap: Fix regression in probe for NULL pointer dereferenceTony Lindgren1-1/+1
[ Upstream commit 71ff461c3f41f6465434b9e980c01782763e7ad8 ] Commit 3f6634d997db ("iommu: Use right way to retrieve iommu_ops") started triggering a NULL pointer dereference for some omap variants: __iommu_probe_device from probe_iommu_group+0x2c/0x38 probe_iommu_group from bus_for_each_dev+0x74/0xbc bus_for_each_dev from bus_iommu_probe+0x34/0x2e8 bus_iommu_probe from bus_set_iommu+0x80/0xc8 bus_set_iommu from omap_iommu_init+0x88/0xcc omap_iommu_init from do_one_initcall+0x44/0x24 This is caused by omap iommu probe returning 0 instead of ERR_PTR(-ENODEV) as noted by Jason Gunthorpe <jgg@ziepe.ca>. Looks like the regression already happened with an earlier commit 6785eb9105e3 ("iommu/omap: Convert to probe/release_device() call-backs") that changed the function return type and missed converting one place. Cc: Drew Fustini <dfustini@baylibre.com> Cc: Lu Baolu <baolu.lu@linux.intel.com> Cc: Suman Anna <s-anna@ti.com> Suggested-by: Jason Gunthorpe <jgg@ziepe.ca> Fixes: 6785eb9105e3 ("iommu/omap: Convert to probe/release_device() call-backs") Fixes: 3f6634d997db ("iommu: Use right way to retrieve iommu_ops") Signed-off-by: Tony Lindgren <tony@atomide.com> Tested-by: Drew Fustini <dfustini@baylibre.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20220331062301.24269-1-tony@atomide.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-13iommu/arm-smmu-v3: fix event handling soft lockupZhou Guanghui1-0/+1
[ Upstream commit 30de2b541af98179780054836b48825fcfba4408 ] During event processing, events are read from the event queue one by one until the queue is empty.If the master device continuously requests address access at the same time and the SMMU generates events, the cyclic processing of the event takes a long time and softlockup warnings may be reported. arm-smmu-v3 arm-smmu-v3.34.auto: event 0x0a received: arm-smmu-v3 arm-smmu-v3.34.auto: 0x00007f220000280a arm-smmu-v3 arm-smmu-v3.34.auto: 0x000010000000007e arm-smmu-v3 arm-smmu-v3.34.auto: 0x00000000034e8670 watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [irq/268-arm-smm:247] Call trace: _dev_info+0x7c/0xa0 arm_smmu_evtq_thread+0x1c0/0x230 irq_thread_fn+0x30/0x80 irq_thread+0x128/0x210 kthread+0x134/0x138 ret_from_fork+0x10/0x1c Kernel panic - not syncing: softlockup: hung tasks Fix this by calling cond_resched() after the event information is printed. Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com> Link: https://lore.kernel.org/r/20220119070754.26528-1-zhouguanghui1@huawei.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-08iommu/ipmmu-vmsa: Check for error num after setting maskJiasheng Jiang1-1/+3
[ Upstream commit 1fdbbfd5099f797a4dac05e7ef0192ba4a9c39b4 ] Because of the possible failure of the dma_supported(), the dma_set_mask_and_coherent() may return error num. Therefore, it should be better to check it and return the error if fails. Fixes: 1c894225bf5b ("iommu/ipmmu-vmsa: IPMMU device is 40-bit bus master") Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn> Reviewed-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com> Link: https://lore.kernel.org/r/20220106024302.2574180-1-jiasheng@iscas.ac.cn Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-08iommu/iova: Improve 32-bit free space estimateRobin Murphy1-2/+3
commit 5b61343b50590fb04a3f6be2cdc4868091757262 upstream. For various reasons based on the allocator behaviour and typical use-cases at the time, when the max32_alloc_size optimisation was introduced it seemed reasonable to couple the reset of the tracked size to the update of cached32_node upon freeing a relevant IOVA. However, since subsequent optimisations focused on helping genuine 32-bit devices make best use of even more limited address spaces, it is now a lot more likely for cached32_node to be anywhere in a "full" 32-bit address space, and as such more likely for space to become available from IOVAs below that node being freed. At this point, the short-cut in __cached_rbnode_delete_update() really doesn't hold up any more, and we need to fix the logic to reliably provide the expected behaviour. We still want cached32_node to only move upwards, but we should reset the allocation size if *any* 32-bit space has become available. Reported-by: Yunfei Wang <yf.wang@mediatek.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Miles Chen <miles.chen@mediatek.com> Link: https://lore.kernel.org/r/033815732d83ca73b13c11485ac39336f15c3b40.1646318408.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Cc: Miles Chen <miles.chen@mediatek.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>