Age | Commit message (Collapse) | Author | Files | Lines |
|
git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd
Pull iommufd fixes from Jason Gunthorpe:
"Two very minor fixes:
- Fix mismatched kvalloc()/kfree()
- Spelling fixes in documentation"
* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd:
iommufd: Fix spelling errors in iommufd.rst
iommufd: viommu: free memory allocated by kvcalloc() using kvfree()
|
|
The riscv_iommu_pte_fetch() function returns either NULL for
unmapped/never-mapped iova, or a valid leaf pte pointer that
requires no further validation.
riscv_iommu_iova_to_phys() failed to handle NULL returns.
Prevent null pointer dereference in
riscv_iommu_iova_to_phys(), and remove the pte validation.
Fixes: 488ffbf18171 ("iommu/riscv: Paging domain support")
Cc: Tomasz Jeznach <tjeznach@rivosinc.com>
Signed-off-by: XianLiang Huang <huangxianliang@lanxincomputing.com>
Link: https://lore.kernel.org/r/20250820072248.312-1-huangxianliang@lanxincomputing.com
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
|
|
Much like arm-smmu in commit 7d835134d4e1 ("iommu/arm-smmu: Make
instance lookup robust"), virtio-iommu appears to have the same issue
where iommu_device_register() makes the IOMMU instance visible to other
API callers (including itself) straight away, but internally the
instance isn't ready to recognise itself for viommu_probe_device() to
work correctly until after viommu_probe() has returned. This matters a
lot more now that bus_iommu_probe() has the DT/VIOT knowledge to probe
client devices the way that was always intended. Tweak the lookup and
initialisation in much the same way as for arm-smmu, to ensure that what
we register is functional and ready to go.
Cc: stable@vger.kernel.org
Fixes: bcb81ac6ae3c ("iommu: Get DT/ACPI parsing into the proper probe path")
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/308911aaa1f5be32a3a709996c7bd6cf71d30f33.1755190036.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
|
|
The arm_smmu_attach_commit() updates master->ats_enabled before calling
arm_smmu_remove_master_domain() that is supposed to clean up everything
in the old domain, including the old domain's nr_ats_masters. So, it is
supposed to use the old ats_enabled state of the device, not an updated
state.
This isn't a problem if switching between two domains where:
- old ats_enabled = false; new ats_enabled = false
- old ats_enabled = true; new ats_enabled = true
but can fail cases where:
- old ats_enabled = false; new ats_enabled = true
(old domain should keep the counter but incorrectly decreased it)
- old ats_enabled = true; new ats_enabled = false
(old domain needed to decrease the counter but incorrectly missed it)
Update master->ats_enabled after arm_smmu_remove_master_domain() to fix
this.
Fixes: 7497f4211f4f ("iommu/arm-smmu-v3: Make changing domains be hitless for ATS")
Cc: stable@vger.kernel.org
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Acked-by: Will Deacon <will@kernel.org>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Link: https://lore.kernel.org/r/20250801030127.2006979-1-nicolinc@nvidia.com
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
|
|
Use kvfree() instead of kfree() to free pages allocated by kvcalloc()
in iommufs_hw_queue_alloc_phys() to fix potential memory corruption.
Ensure the memory is properly freed, as kvcalloc may internally use
vmalloc or kmalloc depending on available memory in the system.
Fixes: 2238ddc2b056 ("iommufd/viommu: Add IOMMUFD_CMD_HW_QUEUE_ALLOC ioctl")
Link: https://patch.msgid.link/r/aJifyVV2PL6WGEs6@bhairav-test.ee.iitb.ac.in
Signed-off-by: Akhilesh Patil <akhilesh@ee.iitb.ac.in>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Sparse reported a warning:
drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c:305:47:
sparse: expected restricted __le64
sparse: got unsigned long long
Add cpu_to_le64() to fix that.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202508142105.Jb5Smjsg-lkp@intel.com/
Suggested-by: Pranjal Shrivastava <praan@google.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Link: https://lore.kernel.org/r/20250814193039.2265813-1-nicolinc@nvidia.com
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
|
|
While the kernel command line is considered trusted in most environments,
avoid writing 1 byte past the end of "acpiid" if the "str" argument is
maximum length.
Reported-by: Simcha Kosman <simcha.kosman@cyberark.com>
Closes: https://lore.kernel.org/all/AS8P193MB2271C4B24BCEDA31830F37AE84A52@AS8P193MB2271.EURP193.PROD.OUTLOOK.COM
Fixes: b6b26d86c61c ("iommu/amd: Add a length limitation for the ivrs_acpihid command-line parameter")
Signed-off-by: Kees Cook <kees@kernel.org>
Reviewed-by: Ankit Soni <Ankit.Soni@amd.com>
Link: https://lore.kernel.org/r/20250804154023.work.970-kees@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull PCI updates from Bjorn Helgaas:
"Enumeration:
- Allow built-in drivers, not just modular drivers, to use async
initial probing (Lukas Wunner)
- Support Immediate Readiness even on devices with no PM Capability
(Sean Christopherson)
- Consolidate definition of PCIE_RESET_CONFIG_WAIT_MS (100ms), the
required delay between a reset and sending config requests to a
device (Niklas Cassel)
- Add pci_is_display() to check for "Display" base class and use it
in ALSA hda, vfio, vga_switcheroo, vt-d (Mario Limonciello)
- Allow 'isolated PCI functions' (multi-function devices without a
function 0) for LoongArch, similar to s390 and jailhouse (Huacai
Chen)
Power control:
- Add ability to enable optional slot clock for cases where the PCIe
host controller and the slot are supplied by different clocks
(Marek Vasut)
PCIe native device hotplug:
- Fix runtime PM ref imbalance on Hot-Plug Capable ports caused by
misinterpreting a config read failure after a device has been
removed (Lukas Wunner)
- Avoid creating a useless PCIe port service device for pciehp if the
slot is handled by the ACPI hotplug driver (Lukas Wunner)
- Ignore ACPI hotplug slots when calculating depth of pciehp hotplug
ports (Lukas Wunner)
Virtualization:
- Save VF resizable BAR state and restore it after reset (Michał
Winiarski)
- Allow IOV resources (VF BARs) to be resized (Michał Winiarski)
- Add pci_iov_vf_bar_set_size() so drivers can control VF BAR size
(Michał Winiarski)
Endpoint framework:
- Add RC-to-EP doorbell support using platform MSI controller,
including a test case (Frank Li)
- Allow BAR assignment via configfs so platforms have flexibility in
determining BAR usage (Jerome Brunet)
Native PCIe controller drivers:
- Convert amazon,al-alpine-v[23]-pcie, apm,xgene-pcie,
axis,artpec6-pcie, marvell,armada-3700-pcie, st,spear1340-pcie to
DT schema format (Rob Herring)
- Use dev_fwnode() instead of of_fwnode_handle() to remove OF
dependency in altera (fixes an unused variable), designware-host,
mediatek, mediatek-gen3, mobiveil, plda, xilinx, xilinx-dma,
xilinx-nwl (Jiri Slaby, Arnd Bergmann)
- Convert aardvark, altera, brcmstb, designware-host, iproc,
mediatek, mediatek-gen3, mobiveil, plda, rcar-host, vmd, xilinx,
xilinx-dma, xilinx-nwl from using pci_msi_create_irq_domain() to
using msi_create_parent_irq_domain() instead; this makes the
interrupt controller per-PCI device, allows dynamic allocation of
vectors after initialization, and allows support of IMS (Nam Cao)
APM X-Gene PCIe controller driver:
- Rewrite MSI handling to MSI CPU affinity, drop useless CPU hotplug
bits, use device-managed memory allocations, and clean things up
(Marc Zyngier)
- Probe xgene-msi as a standard platform driver rather than a
subsys_initcall (Marc Zyngier)
Broadcom STB PCIe controller driver:
- Add optional DT 'num-lanes' property and if present, use it to
override the Maximum Link Width advertised in Link Capabilities
(Jim Quinlan)
Cadence PCIe controller driver:
- Use PCIe Message routing types from the PCI core rather than
defining private ones (Hans Zhang)
Freescale i.MX6 PCIe controller driver:
- Add IMX8MQ_EP third 64-bit BAR in epc_features (Richard Zhu)
- Add IMX8MM_EP and IMX8MP_EP fixed 256-byte BAR 4 in epc_features
(Richard Zhu)
- Configure LUT for MSI/IOMMU in Endpoint mode so Root Complex can
trigger doorbel on Endpoint (Frank Li)
- Remove apps_reset (LTSSM_EN) from
imx_pcie_{assert,deassert}_core_reset(), which fixes a hotplug
regression on i.MX8MM (Richard Zhu)
- Delay Endpoint link start until configfs 'start' written (Richard
Zhu)
Intel VMD host bridge driver:
- Add Intel Panther Lake (PTL)-H/P/U Vendor ID (George D Sworo)
Qualcomm PCIe controller driver:
- Add DT binding and driver support for SA8255p, which supports ECAM
for Configuration Space access (Mayank Rana)
- Update DT binding and driver to describe PHYs and per-Root Port
resets in a Root Port stanza and deprecate describing them in the
host bridge; this makes it possible to support multiple Root Ports
in the future (Krishna Chaitanya Chundru)
- Add Qualcomm QCS615 to SM8150 DT binding (Ziyue Zhang)
- Add Qualcomm QCS8300 to SA8775p DT binding (Ziyue Zhang)
- Drop TBU and ref clocks from Qualcomm SM8150 and SC8180x DT
bindings (Konrad Dybcio)
- Document 'link_down' reset in Qualcomm SA8775P DT binding (Ziyue
Zhang)
- Add required PCIE_RESET_CONFIG_WAIT_MS delay after Link up IRQ
(Niklas Cassel)
Rockchip PCIe controller driver:
- Drop unused PCIe Message routing and code definitions (Hans Zhang)
- Remove several unused header includes (Hans Zhang)
- Use standard PCIe config register definitions instead of
rockchip-specific redefinitions (Geraldo Nascimento)
- Set Target Link Speed to 5.0 GT/s before retraining so we have a
chance to train at a higher speed (Geraldo Nascimento)
Rockchip DesignWare PCIe controller driver:
- Prevent race between link training and register update via DBI by
inhibiting link training after hot reset and link down (Wilfred
Mallawa)
- Add required PCIE_RESET_CONFIG_WAIT_MS delay after Link up IRQ
(Niklas Cassel)
Sophgo PCIe controller driver:
- Add DT binding and driver for Sophgo SG2044 PCIe controller driver
in Root Complex mode (Inochi Amaoto)
Synopsys DesignWare PCIe controller driver:
- Add required PCIE_RESET_CONFIG_WAIT_MS after waiting for Link up on
Ports that support > 5.0 GT/s. Slower Ports still rely on the
not-quite-correct PCIE_LINK_WAIT_SLEEP_MS 90ms default delay while
waiting for the Link (Niklas Cassel)"
* tag 'pci-v6.17-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (116 commits)
dt-bindings: PCI: qcom,pcie-sa8775p: Document 'link_down' reset
dt-bindings: PCI: Remove 83xx-512x-pci.txt
dt-bindings: PCI: Convert amazon,al-alpine-v[23]-pcie to DT schema
dt-bindings: PCI: Convert marvell,armada-3700-pcie to DT schema
dt-bindings: PCI: Convert apm,xgene-pcie to DT schema
dt-bindings: PCI: Convert axis,artpec6-pcie to DT schema
dt-bindings: PCI: Convert st,spear1340-pcie to DT schema
PCI: Move is_pciehp check out of pciehp_is_native()
PCI: pciehp: Use is_pciehp instead of is_hotplug_bridge
PCI/portdrv: Use is_pciehp instead of is_hotplug_bridge
PCI/ACPI: Fix runtime PM ref imbalance on Hot-Plug Capable ports
selftests: pci_endpoint: Add doorbell test case
misc: pci_endpoint_test: Add doorbell test case
PCI: endpoint: pci-epf-test: Add doorbell test support
PCI: endpoint: Add pci_epf_align_inbound_addr() helper for inbound address alignment
PCI: endpoint: pci-ep-msi: Add checks for MSI parent and mutability
PCI: endpoint: Add RC-to-EP doorbell support using platform MSI controller
PCI: dwc: Add Sophgo SG2044 PCIe controller driver in Root Complex mode
PCI: vmd: Switch to msi_create_parent_irq_domain()
PCI: vmd: Convert to lock guards
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd
Pull iommufd updates from Jason Gunthorpe:
"This broadly brings the assigned HW command queue support to iommufd.
This feature is used to improve SVA performance in VMs by avoiding
paravirtualization traps during SVA invalidations.
Along the way I think some of the core logic is in a much better state
to support future driver backed features.
Summary:
- IOMMU HW now has features to directly assign HW command queues to a
guest VM. In this mode the command queue operates on a limited set
of invalidation commands that are suitable for improving guest
invalidation performance and easy for the HW to virtualize.
This brings the generic infrastructure to allow IOMMU drivers to
expose such command queues through the iommufd uAPI, mmap the
doorbell pages, and get the guest physical range for the command
queue ring itself.
- An implementation for the NVIDIA SMMUv3 extension "cmdqv" is built
on the new iommufd command queue features. It works with the
existing SMMU driver support for cmdqv in guest VMs.
- Many precursor cleanups and improvements to support the above
cleanly, changes to the general ioctl and object helpers, driver
support for VDEVICE, and mmap pgoff cookie infrastructure.
- Sequence VDEVICE destruction to always happen before VFIO device
destruction. When using the above type features, and also in future
confidential compute, the internal virtual device representation
becomes linked to HW or CC TSM configuration and objects. If a VFIO
device is removed from iommufd those HW objects should also be
cleaned up to prevent a sort of UAF. This became important now that
we have HW backing the VDEVICE.
- Fix one syzkaller found error related to math overflows during iova
allocation"
* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: (57 commits)
iommu/arm-smmu-v3: Replace vsmmu_size/type with get_viommu_size
iommu/arm-smmu-v3: Do not bother impl_ops if IOMMU_VIOMMU_TYPE_ARM_SMMUV3
iommufd: Rename some shortterm-related identifiers
iommufd/selftest: Add coverage for vdevice tombstone
iommufd/selftest: Explicitly skip tests for inapplicable variant
iommufd/vdevice: Remove struct device reference from struct vdevice
iommufd: Destroy vdevice on idevice destroy
iommufd: Add a pre_destroy() op for objects
iommufd: Add iommufd_object_tombstone_user() helper
iommufd/viommu: Roll back to use iommufd_object_alloc() for vdevice
iommufd/selftest: Test reserved regions near ULONG_MAX
iommufd: Prevent ALIGN() overflow
iommu/tegra241-cmdqv: import IOMMUFD module namespace
iommufd: Do not allow _iommufd_object_alloc_ucmd if abort op is set
iommu/tegra241-cmdqv: Add IOMMU_VEVENTQ_TYPE_TEGRA241_CMDQV support
iommu/tegra241-cmdqv: Add user-space use support
iommu/tegra241-cmdqv: Do not statically map LVCMDQs
iommu/tegra241-cmdqv: Simplify deinit flow in tegra241_cmdqv_remove_vintf()
iommu/tegra241-cmdqv: Use request_threaded_irq
iommu/arm-smmu-v3-iommufd: Add hw_info to impl_ops
...
|
|
Pull kvm updates from Paolo Bonzini:
"ARM:
- Host driver for GICv5, the next generation interrupt controller for
arm64, including support for interrupt routing, MSIs, interrupt
translation and wired interrupts
- Use FEAT_GCIE_LEGACY on GICv5 systems to virtualize GICv3 VMs on
GICv5 hardware, leveraging the legacy VGIC interface
- Userspace control of the 'nASSGIcap' GICv3 feature, allowing
userspace to disable support for SGIs w/o an active state on
hardware that previously advertised it unconditionally
- Map supporting endpoints with cacheable memory attributes on
systems with FEAT_S2FWB and DIC where KVM no longer needs to
perform cache maintenance on the address range
- Nested support for FEAT_RAS and FEAT_DoubleFault2, allowing the
guest hypervisor to inject external aborts into an L2 VM and take
traps of masked external aborts to the hypervisor
- Convert more system register sanitization to the config-driven
implementation
- Fixes to the visibility of EL2 registers, namely making VGICv3
system registers accessible through the VGIC device instead of the
ONE_REG vCPU ioctls
- Various cleanups and minor fixes
LoongArch:
- Add stat information for in-kernel irqchip
- Add tracepoints for CPUCFG and CSR emulation exits
- Enhance in-kernel irqchip emulation
- Various cleanups
RISC-V:
- Enable ring-based dirty memory tracking
- Improve perf kvm stat to report interrupt events
- Delegate illegal instruction trap to VS-mode
- MMU improvements related to upcoming nested virtualization
s390x
- Fixes
x86:
- Add CONFIG_KVM_IOAPIC for x86 to allow disabling support for I/O
APIC, PIC, and PIT emulation at compile time
- Share device posted IRQ code between SVM and VMX and harden it
against bugs and runtime errors
- Use vcpu_idx, not vcpu_id, for GA log tag/metadata, to make lookups
O(1) instead of O(n)
- For MMIO stale data mitigation, track whether or not a vCPU has
access to (host) MMIO based on whether the page tables have MMIO
pfns mapped; using VFIO is prone to false negatives
- Rework the MSR interception code so that the SVM and VMX APIs are
more or less identical
- Recalculate all MSR intercepts from scratch on MSR filter changes,
instead of maintaining shadow bitmaps
- Advertise support for LKGS (Load Kernel GS base), a new instruction
that's loosely related to FRED, but is supported and enumerated
independently
- Fix a user-triggerable WARN that syzkaller found by setting the
vCPU in INIT_RECEIVED state (aka wait-for-SIPI), and then putting
the vCPU into VMX Root Mode (post-VMXON). Trying to detect every
possible path leading to architecturally forbidden states is hard
and even risks breaking userspace (if it goes from valid to valid
state but passes through invalid states), so just wait until
KVM_RUN to detect that the vCPU state isn't allowed
- Add KVM_X86_DISABLE_EXITS_APERFMPERF to allow disabling
interception of APERF/MPERF reads, so that a "properly" configured
VM can access APERF/MPERF. This has many caveats (APERF/MPERF
cannot be zeroed on vCPU creation or saved/restored on suspend and
resume, or preserved over thread migration let alone VM migration)
but can be useful whenever you're interested in letting Linux
guests see the effective physical CPU frequency in /proc/cpuinfo
- Reject KVM_SET_TSC_KHZ for vm file descriptors if vCPUs have been
created, as there's no known use case for changing the default
frequency for other VM types and it goes counter to the very reason
why the ioctl was added to the vm file descriptor. And also, there
would be no way to make it work for confidential VMs with a
"secure" TSC, so kill two birds with one stone
- Dynamically allocation the shadow MMU's hashed page list, and defer
allocating the hashed list until it's actually needed (the TDP MMU
doesn't use the list)
- Extract many of KVM's helpers for accessing architectural local
APIC state to common x86 so that they can be shared by guest-side
code for Secure AVIC
- Various cleanups and fixes
x86 (Intel):
- Preserve the host's DEBUGCTL.FREEZE_IN_SMM when running the guest.
Failure to honor FREEZE_IN_SMM can leak host state into guests
- Explicitly check vmcs12.GUEST_DEBUGCTL on nested VM-Enter to
prevent L1 from running L2 with features that KVM doesn't support,
e.g. BTF
x86 (AMD):
- WARN and reject loading kvm-amd.ko instead of panicking the kernel
if the nested SVM MSRPM offsets tracker can't handle an MSR (which
is pretty much a static condition and therefore should never
happen, but still)
- Fix a variety of flaws and bugs in the AVIC device posted IRQ code
- Inhibit AVIC if a vCPU's ID is too big (relative to what hardware
supports) instead of rejecting vCPU creation
- Extend enable_ipiv module param support to SVM, by simply leaving
IsRunning clear in the vCPU's physical ID table entry
- Disable IPI virtualization, via enable_ipiv, if the CPU is affected
by erratum #1235, to allow (safely) enabling AVIC on such CPUs
- Request GA Log interrupts if and only if the target vCPU is
blocking, i.e. only if KVM needs a notification in order to wake
the vCPU
- Intercept SPEC_CTRL on AMD if the MSR shouldn't exist according to
the vCPU's CPUID model
- Accept any SNP policy that is accepted by the firmware with respect
to SMT and single-socket restrictions. An incompatible policy
doesn't put the kernel at risk in any way, so there's no reason for
KVM to care
- Drop a superfluous WBINVD (on all CPUs!) when destroying a VM and
use WBNOINVD instead of WBINVD when possible for SEV cache
maintenance
- When reclaiming memory from an SEV guest, only do cache flushes on
CPUs that have ever run a vCPU for the guest, i.e. don't flush the
caches for CPUs that can't possibly have cache lines with dirty,
encrypted data
Generic:
- Rework irqbypass to track/match producers and consumers via an
xarray instead of a linked list. Using a linked list leads to
O(n^2) insertion times, which is hugely problematic for use cases
that create large numbers of VMs. Such use cases typically don't
actually use irqbypass, but eliminating the pointless registration
is a future problem to solve as it likely requires new uAPI
- Track irqbypass's "token" as "struct eventfd_ctx *" instead of a
"void *", to avoid making a simple concept unnecessarily difficult
to understand
- Decouple device posted IRQs from VFIO device assignment, as binding
a VM to a VFIO group is not a requirement for enabling device
posted IRQs
- Clean up and document/comment the irqfd assignment code
- Disallow binding multiple irqfds to an eventfd with a priority
waiter, i.e. ensure an eventfd is bound to at most one irqfd
through the entire host, and add a selftest to verify eventfd:irqfd
bindings are globally unique
- Add a tracepoint for KVM_SET_MEMORY_ATTRIBUTES to help debug issues
related to private <=> shared memory conversions
- Drop guest_memfd's .getattr() implementation as the VFS layer will
call generic_fillattr() if inode_operations.getattr is NULL
- Fix issues with dirty ring harvesting where KVM doesn't bound the
processing of entries in any way, which allows userspace to keep
KVM in a tight loop indefinitely
- Kill off kvm_arch_{start,end}_assignment() and x86's associated
tracking, now that KVM no longer uses assigned_device_count as a
heuristic for either irqbypass usage or MDS mitigation
Selftests:
- Fix a comment typo
- Verify KVM is loaded when getting any KVM module param so that
attempting to run a selftest without kvm.ko loaded results in a
SKIP message about KVM not being loaded/enabled (versus some random
parameter not existing)
- Skip tests that hit EACCES when attempting to access a file, and
print a "Root required?" help message. In most cases, the test just
needs to be run with elevated permissions"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (340 commits)
Documentation: KVM: Use unordered list for pre-init VGIC registers
RISC-V: KVM: Avoid re-acquiring memslot in kvm_riscv_gstage_map()
RISC-V: KVM: Use find_vma_intersection() to search for intersecting VMAs
RISC-V: perf/kvm: Add reporting of interrupt events
RISC-V: KVM: Enable ring-based dirty memory tracking
RISC-V: KVM: Fix inclusion of Smnpm in the guest ISA bitmap
RISC-V: KVM: Delegate illegal instruction fault to VS mode
RISC-V: KVM: Pass VMID as parameter to kvm_riscv_hfence_xyz() APIs
RISC-V: KVM: Factor-out g-stage page table management
RISC-V: KVM: Add vmid field to struct kvm_riscv_hfence
RISC-V: KVM: Introduce struct kvm_gstage_mapping
RISC-V: KVM: Factor-out MMU related declarations into separate headers
RISC-V: KVM: Use ncsr_xyz() in kvm_riscv_vcpu_trap_redirect()
RISC-V: KVM: Implement kvm_arch_flush_remote_tlbs_range()
RISC-V: KVM: Don't flush TLB when PTE is unchanged
RISC-V: KVM: Replace KVM_REQ_HFENCE_GVMA_VMID_ALL with KVM_REQ_TLB_FLUSH
RISC-V: KVM: Rename and move kvm_riscv_local_tlb_sanitize()
RISC-V: KVM: Drop the return value of kvm_riscv_vcpu_aia_init()
RISC-V: KVM: Check kvm_riscv_vcpu_alloc_vector_context() return value
KVM: arm64: selftests: Add FEAT_RAS EL2 registers to get-reg-list
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux
Pull iommu updates from Will Deacon:
"Core:
- Remove the 'pgsize_bitmap' member from 'struct iommu_ops'
- Convert the x86 drivers over to msi_create_parent_irq_domain()
AMD-Vi:
- Add support for examining driver/device internals via debugfs
- Add support for "HATDis" to disable host translation when it is not
supported
- Add support for limiting the maximum host translation level based
on EFR[HATS]
Apple DART:
- Don't enable as built-in by default when ARCH_APPLE is selected
Arm SMMU:
- Devicetree bindings update for the Qualcomm SMMU in the "Milos" SoC
- Support for Qualcomm SM6115 MDSS parts
- Disable PRR on Qualcomm SM8250 as using these bits causes the
hypervisor to explode
Intel VT-d:
- Reorganize Intel VT-d to be ready for iommupt
- Optimize iotlb_sync_map for non-caching/non-RWBF modes
- Fix missed PASID in dev TLB invalidation in cache_tag_flush_all()
Mediatek:
- Fix build warnings when W=1
Samsung Exynos:
- Add support for reserved memory regions specified by the bootloader
TI OMAP:
- Use syscon_regmap_lookup_by_phandle_args() instead of parsing the
node manually
Misc:
- Cleanups and minor fixes across the board"
* tag 'iommu-updates-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: (48 commits)
iommu/vt-d: Fix UAF on sva unbind with pending IOPFs
iommu/vt-d: Make iotlb_sync_map a static property of dmar_domain
dt-bindings: arm-smmu: Remove sdm845-cheza specific entry
iommu/amd: Fix geometry.aperture_end for V2 tables
iommu/amd: Wrap debugfs ABI testing symbols snippets in literal code blocks
iommu/amd: Add documentation for AMD IOMMU debugfs support
iommu/amd: Add debugfs support to dump IRT Table
iommu/amd: Add debugfs support to dump device table
iommu/amd: Add support for device id user input
iommu/amd: Add debugfs support to dump IOMMU command buffer
iommu/amd: Add debugfs support to dump IOMMU Capability registers
iommu/amd: Add debugfs support to dump IOMMU MMIO registers
iommu/amd: Refactor AMD IOMMU debugfs initial setup
dt-bindings: arm-smmu: document the support on Milos
iommu/exynos: add support for reserved regions
iommu/arm-smmu: disable PRR on SM8250
iommu/arm-smmu-v3: Revert vmaster in the error path
iommu/io-pgtable-arm: Remove unused macro iopte_prot
iommu/arm-smmu-qcom: Add SM6115 MDSS compatible
iommu/qcom: Fix pgsize_bitmap
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
"A quick summary: perf support for Branch Record Buffer Extensions
(BRBE), typical PMU hardware updates, small additions to MTE for
store-only tag checking and exposing non-address bits to signal
handlers, HAVE_LIVEPATCH enabled on arm64, VMAP_STACK forced on.
There is also a TLBI optimisation on hardware that does not require
break-before-make when changing the user PTEs between contiguous and
non-contiguous.
More details:
Perf and PMU updates:
- Add support for new (v3) Hisilicon SLLC and DDRC PMUs
- Add support for Arm-NI PMU integrations that share interrupts
between clock domains within a given instance
- Allow SPE to be configured with a lower sample period than the
minimum recommendation advertised by PMSIDR_EL1.Interval
- Add suppport for Arm's "Branch Record Buffer Extension" (BRBE)
- Adjust the perf watchdog period according to cpu frequency changes
- Minor driver fixes and cleanups
Hardware features:
- Support for MTE store-only checking (FEAT_MTE_STORE_ONLY)
- Support for reporting the non-address bits during a synchronous MTE
tag check fault (FEAT_MTE_TAGGED_FAR)
- Optimise the TLBI when folding/unfolding contiguous PTEs on
hardware with FEAT_BBM (break-before-make) level 2 and no TLB
conflict aborts
Software features:
- Enable HAVE_LIVEPATCH after implementing arch_stack_walk_reliable()
and using the text-poke API for late module relocations
- Force VMAP_STACK always on and change arm64_efi_rt_init() to use
arch_alloc_vmap_stack() in order to avoid KASAN false positives
ACPI:
- Improve SPCR handling and messaging on systems lacking an SPCR
table
Debug:
- Simplify the debug exception entry path
- Drop redundant DBG_MDSCR_* macros
Kselftests:
- Cleanups and improvements for SME, SVE and FPSIMD tests
Miscellaneous:
- Optimise loop to reduce redundant operations in contpte_ptep_get()
- Remove ISB when resetting POR_EL0 during signal handling
- Mark the kernel as tainted on SEA and SError panic
- Remove redundant gcs_free() call"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (93 commits)
arm64/gcs: task_gcs_el0_enable() should use passed task
arm64: Kconfig: Keep selects somewhat alphabetically ordered
arm64: signal: Remove ISB when resetting POR_EL0
kselftest/arm64: Handle attempts to disable SM on SME only systems
kselftest/arm64: Fix SVE write data generation for SME only systems
kselftest/arm64: Test SME on SME only systems in fp-ptrace
kselftest/arm64: Test FPSIMD format data writes via NT_ARM_SVE in fp-ptrace
kselftest/arm64: Allow sve-ptrace to run on SME only systems
arm64/mm: Drop redundant addr increment in set_huge_pte_at()
kselftest/arm4: Provide local defines for AT_HWCAP3
arm64: Mark kernel as tainted on SAE and SError panic
arm64/gcs: Don't call gcs_free() when releasing task_struct
drivers/perf: hisi: Support PMUs with no interrupt
drivers/perf: hisi: Relax the event number check of v2 PMUs
drivers/perf: hisi: Add support for HiSilicon SLLC v3 PMU driver
drivers/perf: hisi: Use ACPI driver_data to retrieve SLLC PMU information
drivers/perf: hisi: Add support for HiSilicon DDRC v3 PMU driver
drivers/perf: hisi: Simplify the probe process for each DDRC version
perf/arm-ni: Support sharing IRQs within an NI instance
perf/arm-ni: Consolidate CPU affinity handling
...
|
|
KVM IRQ changes for 6.17
- Rework irqbypass to track/match producers and consumers via an xarray
instead of a linked list. Using a linked list leads to O(n^2) insertion
times, which is hugely problematic for use cases that create large numbers
of VMs. Such use cases typically don't actually use irqbypass, but
eliminating the pointless registration is a future problem to solve as it
likely requires new uAPI.
- Track irqbypass's "token" as "struct eventfd_ctx *" instead of a "void *",
to avoid making a simple concept unnecessarily difficult to understand.
- Add CONFIG_KVM_IOAPIC for x86 to allow disabling support for I/O APIC, PIC,
and PIT emulation at compile time.
- Drop x86's irq_comm.c, and move a pile of IRQ related code into irq.c.
- Fix a variety of flaws and bugs in the AVIC device posted IRQ code.
- Inhibited AVIC if a vCPU's ID is too big (relative to what hardware
supports) instead of rejecting vCPU creation.
- Extend enable_ipiv module param support to SVM, by simply leaving IsRunning
clear in the vCPU's physical ID table entry.
- Disable IPI virtualization, via enable_ipiv, if the CPU is affected by
erratum #1235, to allow (safely) enabling AVIC on such CPUs.
- Dedup x86's device posted IRQ code, as the vast majority of functionality
can be shared verbatime between SVM and VMX.
- Harden the device posted IRQ code against bugs and runtime errors.
- Use vcpu_idx, not vcpu_id, for GA log tag/metadata, to make lookups O(1)
instead of O(n).
- Generate GA Log interrupts if and only if the target vCPU is blocking, i.e.
only if KVM needs a notification in order to wake the vCPU.
- Decouple device posted IRQs from VFIO device assignment, as binding a VM to
a VFIO group is not a requirement for enabling device posted IRQs.
- Clean up and document/comment the irqfd assignment code.
- Disallow binding multiple irqfds to an eventfd with a priority waiter, i.e.
ensure an eventfd is bound to at most one irqfd through the entire host,
and add a selftest to verify eventfd:irqfd bindings are globally unique.
|
|
It's more flexible to have a get_viommu_size op. Replace static vsmmu_size
and vsmmu_type with that.
Link: https://patch.msgid.link/r/20250724221002.1883034-3-nicolinc@nvidia.com
Suggested-by: Will Deacon <will@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
When viommu type is IOMMU_VIOMMU_TYPE_ARM_SMMUV3, always return or init the
standard struct arm_vsmmu, instead of going through impl_ops that must have
its own viommu type than the standard IOMMU_VIOMMU_TYPE_ARM_SMMUV3.
Given that arm_vsmmu_init() is called after arm_smmu_get_viommu_size(), any
unsupported viommu->type must be a corruption. And it must be a driver bug
that its vsmmu_size and vsmmu_init ops aren't paired. Warn these two cases.
Link: https://patch.msgid.link/r/20250724221002.1883034-2-nicolinc@nvidia.com
Suggested-by: Will Deacon <will@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
* arm/smmu/updates:
iommu/arm-smmu: disable PRR on SM8250
iommu/arm-smmu-v3: Revert vmaster in the error path
iommu/io-pgtable-arm: Remove unused macro iopte_prot
|
|
* arm/smmu/bindings:
dt-bindings: arm-smmu: Remove sdm845-cheza specific entry
dt-bindings: arm-smmu: document the support on Milos
iommu/arm-smmu-qcom: Add SM6115 MDSS compatible
|
|
* apple/dart:
iommu/apple-dart: Drop default ARCH_APPLE in Kconfig
|
|
* ti/omap:
iommu/omap: Use syscon_regmap_lookup_by_phandle_args
iommu/omap: Drop redundant check if ti,syscon-mmuconfig exists
|
|
* mediatek:
iommu/mediatek-v1: Tidy up probe_finalize
|
|
* amd/amd-vi:
iommu/amd: Fix geometry.aperture_end for V2 tables
iommu/amd: Wrap debugfs ABI testing symbols snippets in literal code blocks
iommu/amd: Add documentation for AMD IOMMU debugfs support
iommu/amd: Add debugfs support to dump IRT Table
iommu/amd: Add debugfs support to dump device table
iommu/amd: Add support for device id user input
iommu/amd: Add debugfs support to dump IOMMU command buffer
iommu/amd: Add debugfs support to dump IOMMU Capability registers
iommu/amd: Add debugfs support to dump IOMMU MMIO registers
iommu/amd: Refactor AMD IOMMU debugfs initial setup
iommu/amd: Enable PASID and ATS capabilities in the correct order
iommu/amd: Add efr[HATS] max v1 page table level
iommu/amd: Add HATDis feature support
|
|
* intel/vt-d:
iommu/vt-d: Fix UAF on sva unbind with pending IOPFs
iommu/vt-d: Make iotlb_sync_map a static property of dmar_domain
iommu/vt-d: Deduplicate cache_tag_flush_all by reusing flush_range
iommu/vt-d: Fix missing PASID in dev TLB flush with cache_tag_flush_all
iommu/vt-d: Split paging_domain_compatible()
iommu/vt-d: Split intel_iommu_enforce_cache_coherency()
iommu/vt-d: Create unique domain ops for each stage
iommu/vt-d: Split intel_iommu_domain_alloc_paging_flags()
iommu/vt-d: Do not wipe out the page table NID when devices detach
iommu/vt-d: Fold domain_exit() into intel_iommu_domain_free()
iommu/vt-d: Lift the __pa to domain_setup_first_level/intel_svm_set_dev_pasid()
iommu/vt-d: Optimize iotlb_sync_map for non-caching/non-RWBF modes
iommu/vt-d: Remove the CONFIG_X86 wrapping from iommu init hook
|
|
* samsung/exynos:
iommu/exynos: add support for reserved regions
|
|
Commit 17fce9d2336d ("iommu/vt-d: Put iopf enablement in domain attach
path") disables IOPF on device by removing the device from its IOMMU's
IOPF queue when the last IOPF-capable domain is detached from the device.
Unfortunately, it did this in a wrong place where there are still pending
IOPFs. As a result, a use-after-free error is potentially triggered and
eventually a kernel panic with a kernel trace similar to the following:
refcount_t: underflow; use-after-free.
WARNING: CPU: 3 PID: 313 at lib/refcount.c:28 refcount_warn_saturate+0xd8/0xe0
Workqueue: iopf_queue/dmar0-iopfq iommu_sva_handle_iopf
Call Trace:
<TASK>
iopf_free_group+0xe/0x20
process_one_work+0x197/0x3d0
worker_thread+0x23a/0x350
? rescuer_thread+0x4a0/0x4a0
kthread+0xf8/0x230
? finish_task_switch.isra.0+0x81/0x260
? kthreads_online_cpu+0x110/0x110
? kthreads_online_cpu+0x110/0x110
ret_from_fork+0x13b/0x170
? kthreads_online_cpu+0x110/0x110
ret_from_fork_asm+0x11/0x20
</TASK>
---[ end trace 0000000000000000 ]---
The intel_pasid_tear_down_entry() function is responsible for blocking
hardware from generating new page faults and flushing all in-flight
ones. Therefore, moving iopf_for_domain_remove() after this function
should resolve this.
Fixes: 17fce9d2336d ("iommu/vt-d: Put iopf enablement in domain attach path")
Reported-by: Ethan Milon <ethan.milon@eviden.com>
Closes: https://lore.kernel.org/r/e8b37f3e-8539-40d4-8993-43a1f3ffe5aa@eviden.com
Suggested-by: Ethan Milon <ethan.milon@eviden.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20250723072045.1853328-1-baolu.lu@linux.intel.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Commit 12724ce3fe1a ("iommu/vt-d: Optimize iotlb_sync_map for
non-caching/non-RWBF modes") dynamically set iotlb_sync_map. This causes
synchronization issues due to lack of locking on map and attach paths,
racing iommufd userspace operations.
Invalidation changes must precede device attachment to ensure all flushes
complete before hardware walks page tables, preventing coherence issues.
Make domain->iotlb_sync_map static, set once during domain allocation. If
an IOMMU requires iotlb_sync_map but the domain lacks it, attach is
rejected. This won't reduce domain sharing: RWBF and shadowing page table
caching are legacy uses with legacy hardware. Mixed configs (some IOMMUs
in caching mode, others not) are unlikely in real-world scenarios.
Fixes: 12724ce3fe1a ("iommu/vt-d: Optimize iotlb_sync_map for non-caching/non-RWBF modes")
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20250721051657.1695788-1-baolu.lu@linux.intel.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Rename the shortterm-related identifiers to wait-related.
The usage of shortterm_users refcount is now beyond its name. It is
also used for references which live longer than an ioctl execution.
E.g. vdev holds idev's shortterm_users refcount on vdev allocation,
releases it during idev's pre_destroy(). Rename the refcount as
wait_cnt, since it is always used to sync the referencing & the
destruction of the object by waiting for it to go to zero.
List all changed identifiers:
iommufd_object::shortterm_users -> iommufd_object::wait_cnt
REMOVE_WAIT_SHORTTERM -> REMOVE_WAIT
iommufd_object_dec_wait_shortterm() -> iommufd_object_dec_wait()
zerod_shortterm -> zerod_wait_cnt
No functional change intended.
Link: https://patch.msgid.link/r/20250716070349.1807226-9-yilun.xu@linux.intel.com
Suggested-by: Kevin Tian <kevin.tian@intel.com>
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Remove struct device *dev from struct vdevice.
The dev pointer is the Plan B for vdevice to reference the physical
device. As now vdev->idev is added without refcounting concern, just
use vdev->idev->dev when needed. To avoid exposing
struct iommufd_device in the public header, export a
iommufd_vdevice_to_device() helper.
Link: https://patch.msgid.link/r/20250716070349.1807226-6-yilun.xu@linux.intel.com
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Co-developed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Destroy iommufd_vdevice (vdev) on iommufd_idevice (idev) destruction so
that vdev can't outlive idev.
idev represents the physical device bound to iommufd, while the vdev
represents the virtual instance of the physical device in the VM. The
lifecycle of the vdev should not be longer than idev. This doesn't
cause real problem on existing use cases cause vdev doesn't impact the
physical device, only provides virtualization information. But to
extend vdev for Confidential Computing (CC), there are needs to do
secure configuration for the vdev, e.g. TSM Bind/Unbind. These
configurations should be rolled back on idev destroy, or the external
driver (VFIO) functionality may be impact.
The idev is created by external driver so its destruction can't fail.
The idev implements pre_destroy() op to actively remove its associated
vdev before destroying itself. There are 3 cases on idev pre_destroy():
1. vdev is already destroyed by userspace. No extra handling needed.
2. vdev is still alive. Use iommufd_object_tombstone_user() to
destroy vdev and tombstone the vdev ID.
3. vdev is being destroyed by userspace. The vdev ID is already
freed, but vdev destroy handler is not completed. This requires
multi-threads syncing - vdev holds idev's short term users
reference until vdev destruction completes, idev leverages
existing wait_shortterm mechanism for syncing.
idev should also block any new reference to it after pre_destroy(),
or the following wait shortterm would timeout. Introduce a 'destroying'
flag, set it to true on idev pre_destroy(). Any attempt to reference
idev should honor this flag under the protection of
idev->igroup->lock.
Link: https://patch.msgid.link/r/20250716070349.1807226-5-yilun.xu@linux.intel.com
Originally-by: Nicolin Chen <nicolinc@nvidia.com>
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Co-developed-by: "Aneesh Kumar K.V (Arm)" <aneesh.kumar@kernel.org>
Signed-off-by: "Aneesh Kumar K.V (Arm)" <aneesh.kumar@kernel.org>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Add a pre_destroy() op which gives objects a chance to clear their
short term users references before destruction. This op is intended for
external driver created objects (e.g. idev) which does deterministic
destruction.
In order to manage the lifecycle of interrelated objects as well as the
deterministic destruction (e.g. vdev can't outlive idev, and idev
destruction can't fail), short term users references are allowed to
live out of an ioctl execution. An immediate use case is, vdev holds
idev's short term user reference until vdev destruction completes, idev
leverages existing wait_shortterm mechanism to ensure it is destroyed
after vdev.
This extended usage makes the referenced object unable to just wait for
its reference gone. It needs to actively trigger the reference removal,
as well as prevent new references before wait. Should implement these
work in pre_destroy().
Link: https://patch.msgid.link/r/20250716070349.1807226-4-yilun.xu@linux.intel.com
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Add the iommufd_object_tombstone_user() helper, which allows the caller
to destroy an iommufd object created by userspace.
This is useful on some destroy paths when the kernel caller finds the
object should have been removed by userspace but is still alive. With
this helper, the caller destroys the object but leave the object ID
reserved (so called tombstone). The tombstone prevents repurposing the
object ID without awareness of the original user.
Since this happens for abnormal userspace behavior, for simplicity, the
tombstoned object ID would be permanently leaked until
iommufd_fops_release(). I.e. the original user gets an error when
calling ioctl(IOMMU_DESTROY) on that ID.
The first use case would be to ensure the iommufd_vdevice can't outlive
the associated iommufd_device.
Link: https://patch.msgid.link/r/20250716070349.1807226-3-yilun.xu@linux.intel.com
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Co-developed-by: "Aneesh Kumar K.V (Arm)" <aneesh.kumar@kernel.org>
Signed-off-by: "Aneesh Kumar K.V (Arm)" <aneesh.kumar@kernel.org>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
To solve the vdevice lifecycle issue, future patches make the vdevice
allocation protected by lock. That will make
_iommufd_object_alloc_ucmd() not applicable for vdevice. Roll back to
use _iommufd_object_alloc() for preparation.
Link: https://patch.msgid.link/r/20250716070349.1807226-2-yilun.xu@linux.intel.com
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
The inline pci_is_display() helper does the same thing. Use it.
Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Daniel Dadap <ddadap@nvidia.com>
Reviewed-by: Simona Vetter <simona.vetter@ffwll.ch>
Link: https://patch.msgid.link/20250717173812.3633478-5-superm1@kernel.org
|
|
When allocating IOVA the candidate range gets aligned to the target
alignment. If the range is close to ULONG_MAX then the ALIGN() can
wrap resulting in a corrupted iova.
Open code the ALIGN() using get_add_overflow() to prevent this.
This simplifies the checks as we don't need to check for length earlier
either.
Consolidate the two copies of this code under a single helper.
This bug would allow userspace to create a mapping that overlaps with some
other mapping or a reserved range.
Cc: stable@vger.kernel.org
Fixes: 51fe6141f0f6 ("iommufd: Data structure to provide IOVA to PFN mapping")
Reported-by: syzbot+c2f65e2801743ca64e08@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/r/685af644.a00a0220.2e5631.0094.GAE@google.com
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Link: https://patch.msgid.link/all/1-v1-7b4a16fc390b+10f4-iommufd_alloc_overflow_jgg@nvidia.com/
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
The AMD IOMMU documentation seems pretty clear that the V2 table follows
the normal CPU expectation of sign extension. This is shown in
Figure 25: AMD64 Long Mode 4-Kbyte Page Address Translation
Where bits Sign-Extend [63:57] == [56]. This is typical for x86 which
would have three regions in the page table: lower, non-canonical, upper.
The manual describes that the V1 table does not sign extend in section
2.2.4 Sharing AMD64 Processor and IOMMU Page Tables GPA-to-SPA
Further, Vasant has checked this and indicates the HW has an addtional
behavior that the manual does not yet describe. The AMDv2 table does not
have the sign extended behavior when attached to PASID 0, which may
explain why this has gone unnoticed.
The iommu domain geometry does not directly support sign extended page
tables. The driver should report only one of the lower/upper spaces. Solve
this by removing the top VA bit from the geometry to use only the lower
space.
This will also make the iommu_domain work consistently on all PASID 0 and
PASID != 1.
Adjust dma_max_address() to remove the top VA bit. It now returns:
5 Level:
Before 0x1ffffffffffffff
After 0x0ffffffffffffff
4 Level:
Before 0xffffffffffff
After 0x7fffffffffff
Fixes: 11c439a19466 ("iommu/amd/pgtbl_v2: Fix domain max address")
Link: https://lore.kernel.org/all/8858d4d6-d360-4ef0-935c-bfd13ea54f42@amd.com/
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/0-v2-0615cc99b88a+1ce-amdv2_geo_jgg@nvidia.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
In cases where we have an issue in the device interrupt path with IOMMU
interrupt remapping enabled, dumping valid IRT table entries for the device
is very useful and good input for debugging the issue.
eg.
-> To dump irte entries for a particular device
#echo "c4:00.0" > /sys/kernel/debug/iommu/amd/devid
#cat /sys/kernel/debug/iommu/amd/irqtbl | less
or
#echo "0000:c4:00.0" > /sys/kernel/debug/iommu/amd/devid
#cat /sys/kernel/debug/iommu/amd/irqtbl | less
Signed-off-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Link: https://lore.kernel.org/r/20250702093804.849-8-dheerajkumar.srivastava@amd.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
IOMMU uses device table data structure to get per-device information for
DMA remapping, interrupt remapping, and other functionalities. It's a
valuable data structure to visualize for debugging issues related to
IOMMU.
eg.
-> To dump device table entry for a particular device
#echo 0000:c4:00.0 > /sys/kernel/debug/iommu/amd/devid
#cat /sys/kernel/debug/iommu/amd/devtbl
or
#echo c4:00.0 > /sys/kernel/debug/iommu/amd/devid
#cat /sys/kernel/debug/iommu/amd/devtbl
Signed-off-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Link: https://lore.kernel.org/r/20250702093804.849-7-dheerajkumar.srivastava@amd.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Dumping IOMMU data structures like device table, IRT, etc., for all devices
on the system will be a lot of data dumped in a file. Also, user may want
to dump and analyze these data structures just for one or few devices. So
dumping IOMMU data structures like device table, IRT etc for all devices
is not a good approach.
Add "device id" user input to be used for dumping IOMMU data structures
like device table, IRT etc in AMD IOMMU debugfs.
eg.
1. # echo 0000:01:00.0 > /sys/kernel/debug/iommu/amd/devid
# cat /sys/kernel/debug/iommu/amd/devid
Output : 0000:01:00.0
2. # echo 01:00.0 > /sys/kernel/debug/iommu/amd/devid
# cat /sys/kernel/debug/iommu/amd/devid
Output : 0000:01:00.0
Signed-off-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Link: https://lore.kernel.org/r/20250702093804.849-6-dheerajkumar.srivastava@amd.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
IOMMU driver sends command to IOMMU hardware via command buffer. In cases
where IOMMU hardware fails to process commands in command buffer, dumping
it is a valuable input to debug the issue.
IOMMU hardware processes command buffer entry at offset equals to the head
pointer. Dumping just the entry at the head pointer may not always be
useful. The current head may not be pointing to the entry of the command
buffer which is causing the issue. IOMMU Hardware may have processed the
entry and updated the head pointer. So dumping the entire command buffer
gives a broad understanding of what hardware was/is doing. The command
buffer dump will have all entries from start to end of the command buffer.
Along with that, it will have a head and tail command buffer pointer
register dump to facilitate where the IOMMU driver and hardware are in
the command buffer for injecting and processing the entries respectively.
Command buffer is a per IOMMU data structure. So dumping on per IOMMU
basis.
eg.
-> To get command buffer dump for iommu<x> (say, iommu00)
#cat /sys/kernel/debug/iommu/amd/iommu00/cmdbuf
Signed-off-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Link: https://lore.kernel.org/r/20250702093804.849-5-dheerajkumar.srivastava@amd.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
IOMMU Capability registers defines capabilities of IOMMU and information
needed for initialising MMIO registers and device table. This is useful
to dump these registers for debugging IOMMU related issues.
e.g.
-> To get capability registers value at offset 0x10 for iommu<x> (say,
iommu00)
# echo "0x10" > /sys/kernel/debug/iommu/amd/iommu00/capability
# cat /sys/kernel/debug/iommu/amd/iommu00/capability
Signed-off-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Link: https://lore.kernel.org/r/20250702093804.849-4-dheerajkumar.srivastava@amd.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Analyzing IOMMU MMIO registers gives a view of what IOMMU is
configured with on the system and is helpful to debug issues
with IOMMU.
eg.
-> To get mmio registers value at offset 0x18 for iommu<x> (say, iommu00)
# echo "0x18" > /sys/kernel/debug/iommu/amd/iommu00/mmio
# cat /sys/kernel/debug/iommu/amd/iommu00/mmio
Signed-off-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Link: https://lore.kernel.org/r/20250702093804.849-3-dheerajkumar.srivastava@amd.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Rearrange initial setup of AMD IOMMU debugfs to segregate per IOMMU
setup and setup which is common for all IOMMUs. This ensures that common
debugfs paths (introduced in subsequent patches) are created only once
instead of being created for each IOMMU.
With the change, there is no need to use lock as amd_iommu_debugfs_setup()
will be called only once during AMD IOMMU initialization. So remove lock
acquisition in amd_iommu_debugfs_setup().
Signed-off-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com>
Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Link: https://lore.kernel.org/r/20250702093804.849-2-dheerajkumar.srivastava@amd.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The tegra variant of smmu-v3 now uses the iommufd mmap interface but
is missing the corresponding import:
ERROR: modpost: module arm_smmu_v3 uses symbol _iommufd_object_depend from namespace IOMMUFD, but does not import it.
ERROR: modpost: module arm_smmu_v3 uses symbol iommufd_viommu_report_event from namespace IOMMUFD, but does not import it.
ERROR: modpost: module arm_smmu_v3 uses symbol _iommufd_destroy_mmap from namespace IOMMUFD, but does not import it.
ERROR: modpost: module arm_smmu_v3 uses symbol _iommufd_object_undepend from namespace IOMMUFD, but does not import it.
ERROR: modpost: module arm_smmu_v3 uses symbol _iommufd_alloc_mmap from namespace IOMMUFD, but does not import it.
Fixes: b135de24cfc0 ("iommu/tegra241-cmdqv: Add user-space use support")
Link: https://patch.msgid.link/r/20250714205747.3475772-1-arnd@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
An abort op was introduced to allow its caller to invoke it within a lock
in the caller's function. On the other hand, _iommufd_object_alloc_ucmd()
would invoke the abort op in iommufd_object_abort_and_destroy() that must
be outside the caller's lock. So, these two cannot work together.
Add a validation in the _iommufd_object_alloc_ucmd(). Pick -EOPNOTSUPP to
reject the function call, indicating that the object allocator is buggy.
Link: https://patch.msgid.link/r/20250710202354.1658511-1-nicolinc@nvidia.com
Suggested-by: Xu Yilun <yilun.xu@linux.intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Xu Yilun <yilun.xu@linux.intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
The bootloader configures a reserved memory region for framebuffer,
which is protected by the IOMMU. The kernel-side driver is oblivious as
of which memory region is set up by the bootloader. In such case, the
IOMMU tries to reference the reserved region - which is not reserved in
the kernel anymore - and it results in an unrecoverable page fault. More
information about it is provided in [1].
Add support for reserved regions using iommu_dma_get_resv_regions().
For OF supported boards, this requires defining the region in the
iommu-addresses property of the IOMMU owner's node.
Link: https://lore.kernel.org/r/544ad69cba52a9b87447e3ac1c7fa8c3@disroot.org [1]
Signed-off-by: Kaustabh Chakraborty <kauschluss@disroot.org>
Link: https://lore.kernel.org/r/20250712-exynos-sysmmu-resv-regions-v1-1-e79681fcab1a@disroot.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
On SM8250 / QRB5165-RB5 using PRR bits resets the device, most likely
because of the hyp limitations. Disable PRR support on that platform.
Fixes: 7f2ef1bfc758 ("iommu/arm-smmu: Add support for PRR bit setup")
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Reviewed-by: Akhil P Oommen <akhilpo@oss.qualcomm.com>
Reviewed-by: Rob Clark <robin.clark@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250705-iommu-fix-prr-v2-1-406fecc37cf8@oss.qualcomm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The error path for err_free_master_domain leaks the vmaster. Move all
the kfrees for vmaster into the goto error section.
Fixes: cfea71aea921 ("iommu/arm-smmu-v3: Put iopf enablement in the domain attach path")
Cc: stable@vger.kernel.org
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Link: https://lore.kernel.org/r/20250711204020.1677884-1-nicolinc@nvidia.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Commit 33729a5fc0ca ("iommu/io-pgtable-arm: Remove split on unmap
behavior") removed the last user of the macro iopte_prot. Remove the
macro definition of iopte_prot as well as three other related
definitions.
Fixes: 33729a5fc0ca ("iommu/io-pgtable-arm: Remove split on unmap behavior")
Signed-off-by: Daniel Mentz <danielmentz@google.com>
Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20250708211705.1567787-1-danielmentz@google.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Add the SM6115 MDSS compatible to clients compatible list, as it also
needs that workaround.
Without this workaround, for example, QRB4210 RB2 which is based on
SM4250/SM6115 generates a lot of smmu unhandled context faults during
boot:
arm_smmu_context_fault: 116854 callbacks suppressed
arm-smmu c600000.iommu: Unhandled context fault: fsr=0x402,
iova=0x5c0ec600, fsynr=0x320021, cbfrsynra=0x420, cb=5
arm-smmu c600000.iommu: FSR = 00000402 [Format=2 TF], SID=0x420
arm-smmu c600000.iommu: FSYNR0 = 00320021 [S1CBNDX=50 PNU PLVL=1]
arm-smmu c600000.iommu: Unhandled context fault: fsr=0x402,
iova=0x5c0d7800, fsynr=0x320021, cbfrsynra=0x420, cb=5
arm-smmu c600000.iommu: FSR = 00000402 [Format=2 TF], SID=0x420
and also failed initialisation of lontium lt9611uxc, gpu and dpu is
observed:
(binding MDSS components triggered by lt9611uxc have failed)
------------[ cut here ]------------
!aspace
WARNING: CPU: 6 PID: 324 at drivers/gpu/drm/msm/msm_gem_vma.c:130 msm_gem_vma_init+0x150/0x18c [msm]
Modules linked in: ... (long list of modules)
CPU: 6 UID: 0 PID: 324 Comm: (udev-worker) Not tainted 6.15.0-03037-gaacc73ceeb8b #4 PREEMPT
Hardware name: Qualcomm Technologies, Inc. QRB4210 RB2 (DT)
pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : msm_gem_vma_init+0x150/0x18c [msm]
lr : msm_gem_vma_init+0x150/0x18c [msm]
sp : ffff80008144b280
...
Call trace:
msm_gem_vma_init+0x150/0x18c [msm] (P)
get_vma_locked+0xc0/0x194 [msm]
msm_gem_get_and_pin_iova_range+0x4c/0xdc [msm]
msm_gem_kernel_new+0x48/0x160 [msm]
msm_gpu_init+0x34c/0x53c [msm]
adreno_gpu_init+0x1b0/0x2d8 [msm]
a6xx_gpu_init+0x1e8/0x9e0 [msm]
adreno_bind+0x2b8/0x348 [msm]
component_bind_all+0x100/0x230
msm_drm_bind+0x13c/0x3d0 [msm]
try_to_bring_up_aggregate_device+0x164/0x1d0
__component_add+0xa4/0x174
component_add+0x14/0x20
dsi_dev_attach+0x20/0x34 [msm]
dsi_host_attach+0x58/0x98 [msm]
devm_mipi_dsi_attach+0x34/0x90
lt9611uxc_attach_dsi.isra.0+0x94/0x124 [lontium_lt9611uxc]
lt9611uxc_probe+0x540/0x5fc [lontium_lt9611uxc]
i2c_device_probe+0x148/0x2a8
really_probe+0xbc/0x2c0
__driver_probe_device+0x78/0x120
driver_probe_device+0x3c/0x154
__driver_attach+0x90/0x1a0
bus_for_each_dev+0x68/0xb8
driver_attach+0x24/0x30
bus_add_driver+0xe4/0x208
driver_register+0x68/0x124
i2c_register_driver+0x48/0xcc
lt9611uxc_driver_init+0x20/0x1000 [lontium_lt9611uxc]
do_one_initcall+0x60/0x1d4
do_init_module+0x54/0x1fc
load_module+0x1748/0x1c8c
init_module_from_file+0x74/0xa0
__arm64_sys_finit_module+0x130/0x2f8
invoke_syscall+0x48/0x104
el0_svc_common.constprop.0+0xc0/0xe0
do_el0_svc+0x1c/0x28
el0_svc+0x2c/0x80
el0t_64_sync_handler+0x10c/0x138
el0t_64_sync+0x198/0x19c
---[ end trace 0000000000000000 ]---
msm_dpu 5e01000.display-controller: [drm:msm_gpu_init [msm]] *ERROR* could not allocate memptrs: -22
msm_dpu 5e01000.display-controller: failed to load adreno gpu
platform a400000.remoteproc:glink-edge:apr:service@7:dais: Adding to iommu group 19
msm_dpu 5e01000.display-controller: failed to bind 5900000.gpu (ops a3xx_ops [msm]): -22
msm_dpu 5e01000.display-controller: adev bind failed: -22
lt9611uxc 0-002b: failed to attach dsi to host
lt9611uxc 0-002b: probe with driver lt9611uxc failed with error -22
Suggested-by: Bjorn Andersson <andersson@kernel.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Fixes: 3581b7062cec ("drm/msm/disp/dpu1: add support for display on SM6115")
Cc: stable@vger.kernel.org
Signed-off-by: Alexey Klimov <alexey.klimov@linaro.org>
Link: https://lore.kernel.org/r/20250613173238.15061-1-alexey.klimov@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
qcom uses the ARM_32_LPAE_S1 format which uses the ARM long descriptor
page table. Eventually arm_32_lpae_alloc_pgtable_s1() will adjust
the pgsize_bitmap with:
cfg->pgsize_bitmap &= (SZ_4K | SZ_2M | SZ_1G);
So the current declaration is nonsensical. Fix it to be just SZ_4K which
is what it has actually been using so far. Most likely the qcom driver
copy and pasted the pgsize_bitmap from something using the ARM_V7S format.
Fixes: db64591de4b2 ("iommu/qcom: Remove iommu_ops pgsize_bitmap")
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Closes: https://lore.kernel.org/all/CA+G9fYvif6kDDFar5ZK4Dff3XThSrhaZaJundjQYujaJW978yg@mail.gmail.com/
Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Link: https://lore.kernel.org/r/0-v1-65a7964d2545+195-qcom_pgsize_jgg@nvidia.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The logic in cache_tag_flush_all() to iterate over cache tags and issue
TLB invalidations is largely duplicated in cache_tag_flush_range(), with
the only difference being the range parameters.
Extend cache_tag_flush_range() to handle a full address space flush when
called with start = 0 and end = ULONG_MAX. This allows
cache_tag_flush_all() to simply delegate to cache_tag_flush_range()
Signed-off-by: Ethan Milon <ethan.milon@eviden.com>
Link: https://lore.kernel.org/r/20250708214821.30967-2-ethan.milon@eviden.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20250714045028.958850-12-baolu.lu@linux.intel.com
Signed-off-by: Will Deacon <will@kernel.org>
|