summaryrefslogtreecommitdiff
path: root/drivers/pci/p2pdma.c
AgeCommit message (Collapse)AuthorFilesLines
2019-07-02PCI/P2PDMA: use the dev_pagemap internal refcountChristoph Hellwig1-53/+4
The functionality is identical to the one currently open coded in p2pdma.c. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Tested-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-02memremap: pass a struct dev_pagemap to ->kill and ->cleanupChristoph Hellwig1-4/+5
Passing the actual typed structure leads to more understandable code vs just passing the ref member. Reported-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-02memremap: move dev_pagemap callbacks into a separate structureChristoph Hellwig1-2/+6
The dev_pagemap is a growing too many callbacks. Move them into a separate ops structure so that they are not duplicated for multiple instances, and an attacker can't easily overwrite them. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-22Merge tag 'pci-v5.2-fixes-1' of ↵Linus Torvalds1-0/+4
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI fix from Bjorn Helgaas: "If an IOMMU is present, ignore the P2PDMA whitelist we added for v5.2 because we don't yet know how to support P2PDMA in that case (Logan Gunthorpe)" * tag 'pci-v5.2-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: PCI/P2PDMA: Ignore root complex whitelist when an IOMMU is present
2019-06-20PCI/P2PDMA: Ignore root complex whitelist when an IOMMU is presentLogan Gunthorpe1-0/+4
Presently, there is no path to DMA map P2PDMA memory, so if a TLP targeting this memory hits the root complex and an IOMMU is present, the IOMMU will reject the transaction, even if the RC would support P2PDMA. So until the kernel knows to map these DMA addresses in the IOMMU, we should not enable the whitelist when an IOMMU is present. Link: https://lore.kernel.org/linux-pci/20190522201252.2997-1-logang@deltatee.com/ Fixes: 0f97da831026 ("PCI/P2PDMA: Allow P2P DMA between any devices under AMD ZEN Root Complex") Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Christoph Hellwig <hch@lst.de>
2019-06-14mm/devm_memremap_pages: fix final page put raceDan Williams1-14/+3
Logan noticed that devm_memremap_pages_release() kills the percpu_ref drops all the page references that were acquired at init and then immediately proceeds to unplug, arch_remove_memory(), the backing pages for the pagemap. If for some reason device shutdown actually collides with a busy / elevated-ref-count page then arch_remove_memory() should be deferred until after that reference is dropped. As it stands the "wait for last page ref drop" happens *after* devm_memremap_pages_release() returns, which is obviously too late and can lead to crashes. Fix this situation by assigning the responsibility to wait for the percpu_ref to go idle to devm_memremap_pages() with a new ->cleanup() callback. Implement the new cleanup callback for all devm_memremap_pages() users: pmem, devdax, hmm, and p2pdma. Link: http://lkml.kernel.org/r/155727339156.292046.5432007428235387859.stgit@dwillia2-desk3.amr.corp.intel.com Fixes: 41e94a851304 ("add devm_memremap_pages") Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reported-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-14PCI/P2PDMA: track pgmap references per resource, not globallyDan Williams1-43/+81
In preparation for fixing a race between devm_memremap_pages_release() and the final put of a page from the device-page-map, allocate a percpu-ref per p2pdma resource mapping. Link: http://lkml.kernel.org/r/155727338646.292046.9922678317501435597.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-14PCI/P2PDMA: fix the gen_pool_add_virt() failure pathDan Williams1-1/+3
The pci_p2pdma_add_resource() implementation immediately frees the pgmap if gen_pool_add_virt() fails. However, that means that when @dev triggers a devres release devm_memremap_pages_release() will crash trying to access the freed @pgmap. Use the new devm_memunmap_pages() to manually free the mapping in the error path. Link: http://lkml.kernel.org/r/155727337603.292046.13101332703665246702.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Fixes: 52916982af48 ("PCI/P2PDMA: Support peer-to-peer memory") Reviewed-by: Ira Weiny <ira.weiny@intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-02PCI/P2PDMA: Allow P2P DMA between any devices under AMD ZEN Root ComplexChristian König1-3/+35
The PCI specs say that peer-to-peer DMA should be supported between any two devices that have a common upstream PCI-to-PCI bridge. But devices under different Root Ports don't share a common upstream bridge, and PCIe r4.0, sec 1.3.1, says routing peer-to-peer transactions between Root Ports in the same Root Complex is optional. Many Root Complexes, including AMD ZEN, *do* support peer-to-peer DMA even between Root Ports. Add a whitelist and allow peer-to-peer DMA if both participants are attached to a Root Complex known to support it. Link: https://lore.kernel.org/linux-pci/20190418115859.2394-1-christian.koenig@amd.com Signed-off-by: Christian König <christian.koenig@amd.com> [bhelgaas: changelog] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
2019-01-06Merge tag 'pci-v4.21-changes' of ↵Linus Torvalds1-7/+7
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: - Remove unused lists from ASPM pcie_link_state (Frederick Lawler) - Fix Broadcom CNB20LE host bridge unintended sign extension (Colin Ian King) - Expand Kconfig "PF" acronyms (Randy Dunlap) - Update MAINTAINERS for arch/x86/kernel/early-quirks.c (Bjorn Helgaas) - Add missing include to drivers/pci.h (Alexandru Gagniuc) - Override Synopsys USB 3.x HAPS device class so dwc3-haps can claim it instead of xhci (Thinh Nguyen) - Clean up P2PDMA documentation (Randy Dunlap) - Allow runtime PM even if driver doesn't supply callbacks (Jarkko Nikula) - Remove status check after submitting Switchtec MRPC Firmware Download commands to avoid Completion Timeouts (Kelvin Cao) - Set Switchtec coherent DMA mask to allow 64-bit DMA (Boris Glimcher) - Fix Switchtec SWITCHTEC_IOCTL_EVENT_IDX_ALL flag overwrite issue (Joey Zhang) - Enable write combining for Switchtec MRPC Input buffers (Kelvin Cao) - Add Switchtec MRPC DMA mode support (Wesley Sheng) - Skip VF scanning on powerpc, which does this in firmware (Sebastian Ott) - Add Amlogic Meson PCIe controller driver and DT bindings (Yue Wang) - Constify histb dw_pcie_host_ops structure (Julia Lawall) - Support multiple power domains for imx6 (Leonard Crestez) - Constify layerscape driver data (Stefan Agner) - Update imx6 Kconfig to allow imx6 PCIe in imx7 kernel (Trent Piepho) - Support armada8k GPIO reset (Baruch Siach) - Support suspend/resume support on imx6 (Leonard Crestez) - Don't hard-code DesignWare DBI/ATU offst (Stephen Warren) - Skip i.MX6 PHY setup on i.MX7D (Andrey Smirnov) - Remove Jianguo Sun from HiSilicon STB maintainers (Lorenzo Pieralisi) - Mask DesignWare interrupts instead of disabling them to avoid lost interrupts (Marc Zyngier) - Add locking when acking DesignWare interrupts (Marc Zyngier) - Ack DesignWare interrupts in the proper callbacks (Marc Zyngier) - Use devm resource parser in mediatek (Honghui Zhang) - Remove unused mediatek "num-lanes" DT property (Honghui Zhang) - Add UniPhier PCIe controller driver and DT bindings (Kunihiko Hayashi) - Enable MSI for imx6 downstream components (Richard Zhu) * tag 'pci-v4.21-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (40 commits) PCI: imx: Enable MSI from downstream components s390/pci: skip VF scanning PCI/IOV: Add flag so platforms can skip VF scanning PCI/IOV: Factor out sriov_add_vfs() PCI: uniphier: Add UniPhier PCIe host controller support dt-bindings: PCI: Add UniPhier PCIe host controller description PCI: amlogic: Add the Amlogic Meson PCIe controller driver dt-bindings: PCI: meson: add DT bindings for Amlogic Meson PCIe controller arm64: dts: mt7622: Remove un-used property for PCIe arm: dts: mt7623: Remove un-used property for PCIe dt-bindings: PCI: MediaTek: Remove un-used property PCI: mediatek: Remove un-used variant in struct mtk_pcie_port MAINTAINERS: Remove Jianguo Sun from HiSilicon STB DWC entry PCI: dwc: Don't hard-code DBI/ATU offset PCI: imx: Add imx6sx suspend/resume support PCI: armada8k: Add support for gpio controlled reset signal PCI: dwc: Adjust Kconfig to allow IMX6 PCIe host on IMX7 PCI: dwc: layerscape: Constify driver data PCI: imx: Add multi-pd support PCI: Override Synopsys USB 3.x HAPS device class ...
2018-12-28mm, hmm: mark hmm_devmem_{add, add_resource} EXPORT_SYMBOL_GPLDan Williams1-8/+2
At Maintainer Summit, Greg brought up a topic I proposed around EXPORT_SYMBOL_GPL usage. The motivation was considerations for when EXPORT_SYMBOL_GPL is warranted and the criteria for taking the exceptional step of reclassifying an existing export. Specifically, I wanted to make the case that although the line is fuzzy and hard to specify in abstract terms, it is nonetheless clear that devm_memremap_pages() and HMM (Heterogeneous Memory Management) have crossed it. The devm_memremap_pages() facility should have been EXPORT_SYMBOL_GPL from the beginning, and HMM as a derivative of that functionality should have naturally picked up that designation as well. Contrary to typical rules, the HMM infrastructure was merged upstream with zero in-tree consumers. There was a promise at the time that those users would be merged "soon", but it has been over a year with no drivers arriving. While the Nouveau driver is about to belatedly make good on that promise it is clear that HMM was targeted first and foremost at an out-of-tree consumer. HMM is derived from devm_memremap_pages(), a facility Christoph and I spearheaded to support persistent memory. It combines a device lifetime model with a dynamically created 'struct page' / memmap array for any physical address range. It enables coordination and control of the many code paths in the kernel built to interact with memory via 'struct page' objects. With HMM the integration goes even deeper by allowing device drivers to hook and manipulate page fault and page free events. One interpretation of when EXPORT_SYMBOL is suitable is when it is exporting stable and generic leaf functionality. The devm_memremap_pages() facility continues to see expanding use cases, peer-to-peer DMA being the most recent, with no clear end date when it will stop attracting reworks and semantic changes. It is not suitable to export devm_memremap_pages() as a stable 3rd party driver API due to the fact that it is still changing and manipulates core behavior. Moreover, it is not in the best interest of the long term development of the core memory management subsystem to permit any external driver to effectively define its own system-wide memory management policies with no encouragement to engage with upstream. I am also concerned that HMM was designed in a way to minimize further engagement with the core-MM. That, with these hooks in place, device-drivers are free to implement their own policies without much consideration for whether and how the core-MM could grow to meet that need. Going forward not only should HMM be EXPORT_SYMBOL_GPL, but the core-MM should be allowed the opportunity and stimulus to change and address these new use cases as first class functionality. Original changelog: hmm_devmem_add(), and hmm_devmem_add_resource() duplicated devm_memremap_pages() and are now simple now wrappers around the core facility to inject a dev_pagemap instance into the global pgmap_radix and hook page-idle events. The devm_memremap_pages() interface is base infrastructure for HMM. HMM has more and deeper ties into the kernel memory management implementation than base ZONE_DEVICE which is itself a EXPORT_SYMBOL_GPL facility. Originally, the HMM page structure creation routines copied the devm_memremap_pages() code and reused ZONE_DEVICE. A cleanup to unify the implementations was discussed during the initial review: http://lkml.iu.edu/hypermail/linux/kernel/1701.2/00812.html Recent work to extend devm_memremap_pages() for the peer-to-peer-DMA facility enabled this cleanup to move forward. In addition to the integration with devm_memremap_pages() HMM depends on other GPL-only symbols: mmu_notifier_unregister_no_release percpu_ref region_intersects __class_create It goes further to consume / indirectly expose functionality that is not exported to any other driver: alloc_pages_vma walk_page_range HMM is derived from devm_memremap_pages(), and extends deep core-kernel fundamentals. Similar to devm_memremap_pages(), mark its entry points EXPORT_SYMBOL_GPL(). [logang@deltatee.com: PCI/P2PDMA: match interface changes to devm_memremap_pages()] Link: http://lkml.kernel.org/r/20181130225911.2900-1-logang@deltatee.com Link: http://lkml.kernel.org/r/154275560565.76910.15919297436557795278.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: Balbir Singh <bsingharora@gmail.com>, Cc: Michal Hocko <mhocko@suse.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-11PCI/P2PDMA: Clean up documentation and kernel-docRandy Dunlap1-7/+7
Fix typos, spellos, and grammar in p2pdma.rst and p2pdma.c. Fix return value(s) in function pci_p2pmem_alloc_sgl(). Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Logan Gunthorpe <logang@deltatee.com> Cc: Jonathan Corbet <corbet@lwn.net>
2018-10-17PCI/P2PDMA: Introduce configfs/sysfs enable attribute helpersLogan Gunthorpe1-0/+82
Users of the P2PDMA infrastructure will typically need a way for the user to tell the kernel to use P2P resources. Typically this will be a simple on/off boolean operation but sometimes it may be desirable for the user to specify the exact device to use for the P2P operation. Add new helpers for attributes which take a boolean or a PCI device. Any boolean as accepted by strtobool() turn P2P on or off (such as 'y', 'n', '1', '0', etc). Specifying a full PCI device name/BDF will select the specific device. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-10-17PCI/P2PDMA: Add PCI p2pmem DMA mappings to adjust the bus offsetLogan Gunthorpe1-0/+43
The DMA address used when mapping PCI P2P memory must be the PCI bus address. Thus, introduce pci_p2pmem_map_sg() to map the correct addresses when using P2P memory. Memory mapped in this way does not need to be unmapped and thus if we provided pci_p2pmem_unmap_sg() it would be empty. This breaks the expected balance between map/unmap but was left out as an empty function doesn't really provide any benefit. In the future, if this call becomes necessary it can be added without much difficulty. For this, we assume that an SGL passed to these functions contain all P2P memory or no P2P memory. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-10-17PCI/P2PDMA: Add sysfs group to display p2pmem statsLogan Gunthorpe1-0/+54
Add a sysfs group to display statistics about P2P memory that is registered in each PCI device. Attributes in the group display the total amount of P2P memory, the amount available and whether it is published or not. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-10-10PCI/P2PDMA: Support peer-to-peer memoryLogan Gunthorpe1-0/+626
Some PCI devices may have memory mapped in a BAR space that's intended for use in peer-to-peer transactions. To enable such transactions the memory must be registered with ZONE_DEVICE pages so it can be used by DMA interfaces in existing drivers. Add an interface for other subsystems to find and allocate chunks of P2P memory as necessary to facilitate transfers between two PCI peers: struct pci_dev *pci_p2pmem_find[_many](); int pci_p2pdma_distance[_many](); void *pci_alloc_p2pmem(); The new interface requires a driver to collect a list of client devices involved in the transaction then call pci_p2pmem_find() to obtain any suitable P2P memory. Alternatively, if the caller knows a device which provides P2P memory, they can use pci_p2pdma_distance() to determine if it is usable. With a suitable p2pmem device, memory can then be allocated with pci_alloc_p2pmem() for use in DMA transactions. Depending on hardware, using peer-to-peer memory may reduce the bandwidth of the transfer but can significantly reduce pressure on system memory. This may be desirable in many cases: for example a system could be designed with a small CPU connected to a PCIe switch by a small number of lanes which would maximize the number of lanes available to connect to NVMe devices. The code is designed to only utilize the p2pmem device if all the devices involved in a transfer are behind the same PCI bridge. This is because we have no way of knowing whether peer-to-peer routing between PCIe Root Ports is supported (PCIe r4.0, sec 1.3.1). Additionally, the benefits of P2P transfers that go through the RC is limited to only reducing DRAM usage and, in some cases, coding convenience. The PCI-SIG may be exploring adding a new capability bit to advertise whether this is possible for future hardware. This commit includes significant rework and feedback from Christoph Hellwig. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Logan Gunthorpe <logang@deltatee.com> [bhelgaas: fold in fix from Keith Busch <keith.busch@intel.com>: https://lore.kernel.org/linux-pci/20181012155920.15418-1-keith.busch@intel.com, to address comment from Dan Carpenter <dan.carpenter@oracle.com>, fold in https://lore.kernel.org/linux-pci/20181017160510.17926-1-logang@deltatee.com] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>