summaryrefslogtreecommitdiff
path: root/drivers/pci
AgeCommit message (Collapse)AuthorFilesLines
2020-05-22PCI: Don't disable bridge BARs when assigning bus resourcesLogan Gunthorpe1-4/+16
commit 9db8dc6d0785225c42a37be7b44d1b07b31b8957 upstream. Some PCI bridges implement BARs in addition to bridge windows. For example, here's a PLX switch: 04:00.0 PCI bridge: PLX Technology, Inc. PEX 8724 24-Lane, 6-Port PCI Express Gen 3 (8 GT/s) Switch, 19 x 19mm FCBGA (rev ca) (prog-if 00 [Normal decode]) Flags: bus master, fast devsel, latency 0, IRQ 30, NUMA node 0 Memory at 90a00000 (32-bit, non-prefetchable) [size=256K] Bus: primary=04, secondary=05, subordinate=0a, sec-latency=0 I/O behind bridge: 00002000-00003fff Memory behind bridge: 90000000-909fffff Prefetchable memory behind bridge: 0000380000800000-0000380000bfffff Previously, when the kernel assigned resource addresses (with the pci=realloc command line parameter, for example) it could clear the struct resource corresponding to the BAR. When this happened, lspci would report this BAR as "ignored": Region 0: Memory at <ignored> (32-bit, non-prefetchable) [size=256K] This is because the kernel reports a zero start address and zero flags in the corresponding sysfs resource file and in /proc/bus/pci/devices. Investigation with 'lspci -x', however, shows the BIOS-assigned address will still be programmed in the device's BAR registers. It's clearly a bug that the kernel lost track of the BAR value, but in most cases, this still won't result in a visible issue because nothing uses the memory, so nothing is affected. However, when an IOMMU is in use, it will not reserve this space in the IOVA because the kernel no longer thinks the range is valid. (See dmar_init_reserved_ranges() for the Intel implementation of this.) Without the proper reserved range, a DMA mapping may allocate an IOVA that matches a bridge BAR, which results in DMA accesses going to the BAR instead of the intended RAM. The problem was in pci_assign_unassigned_root_bus_resources(). When any resource from a bridge device fails to get assigned, the code set the resource's flags to zero. This makes sense for bridge windows, as they will be re-enabled later, but for regular BARs, it makes the kernel permanently lose track of the fact that they decode address space. Change pci_assign_unassigned_root_bus_resources() and pci_assign_unassigned_bridge_resources() so they only clear "res->flags" for bridge *windows*, not bridge BARs. Fixes: da7822e5ad71 ("PCI: update bridge resources to get more big ranges when allocating space (again)") Link: https://lore.kernel.org/r/20200108213208.4612-1-logang@deltatee.com [bhelgaas: commit log, check for pci_is_bridge()] Reported-by: Kit Chow <kchow@gigaio.com> Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2020-02-11PCI/MSI: Fix incorrect MSI-X masking on resumeJian-Hong Pan1-1/+2
commit e045fa29e89383c717e308609edd19d2fd29e1be upstream. When a driver enables MSI-X, msix_program_entries() reads the MSI-X Vector Control register for each vector and saves it in desc->masked. Each register is 32 bits and bit 0 is the actual Mask bit. When we restored these registers during resume, we previously set the Mask bit if *any* bit in desc->masked was set instead of when the Mask bit itself was set: pci_restore_state pci_restore_msi_state __pci_restore_msix_state for_each_pci_msi_entry msix_mask_irq(entry, entry->masked) <-- entire u32 word __pci_msix_desc_mask_irq(desc, flag) mask_bits = desc->masked & ~PCI_MSIX_ENTRY_CTRL_MASKBIT if (flag) <-- testing entire u32, not just bit 0 mask_bits |= PCI_MSIX_ENTRY_CTRL_MASKBIT writel(mask_bits, desc_addr + PCI_MSIX_ENTRY_VECTOR_CTRL) This means that after resume, MSI-X vectors were masked when they shouldn't be, which leads to timeouts like this: nvme nvme0: I/O 978 QID 3 timeout, completion polled On resume, set the Mask bit only when the saved Mask bit from suspend was set. This should remove the need for 19ea025e1d28 ("nvme: Add quirk for Kingston NVME SSD running FW E8FK11.T"). [bhelgaas: commit log, move fix to __pci_msix_desc_mask_irq()] Link: https://bugzilla.kernel.org/show_bug.cgi?id=204887 Link: https://lore.kernel.org/r/20191008034238.2503-1-jian-hong@endlessm.com Fixes: f2440d9acbe8 ("PCI MSI: Refactor interrupt masking code") Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2020-02-11PCI: Fix Intel ACS quirk UPDCR register addressSteffen Liebergeld1-1/+1
commit d8558ac8c93d429d65d7490b512a3a67e559d0d4 upstream. According to documentation [0] the correct offset for the Upstream Peer Decode Configuration Register (UPDCR) is 0x1014. It was previously defined as 0x1114. d99321b63b1f ("PCI: Enable quirks for PCIe ACS on Intel PCH root ports") intended to enforce isolation between PCI devices allowing them to be put into separate IOMMU groups. Due to the wrong register offset the intended isolation was not fully enforced. This is fixed with this patch. Please note that I did not test this patch because I have no hardware that implements this register. [0] https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/4th-gen-core-family-mobile-i-o-datasheet.pdf (page 325) Fixes: d99321b63b1f ("PCI: Enable quirks for PCIe ACS on Intel PCH root ports") Link: https://lore.kernel.org/r/7a3505df-79ba-8a28-464c-88b83eefffa6@kernkonzept.com Signed-off-by: Steffen Liebergeld <steffen.liebergeld@kernkonzept.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Andrew Murray <andrew.murray@arm.com> Acked-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-12-19PCI: PM: Fix pci_power_up()Rafael J. Wysocki1-13/+11
commit 45144d42f299455911cc29366656c7324a3a7c97 upstream. There is an arbitrary difference between the system resume and runtime resume code paths for PCI devices regarding the delay to apply when switching the devices from D3cold to D0. Namely, pci_restore_standard_config() used in the runtime resume code path calls pci_set_power_state() which in turn invokes __pci_start_power_transition() to power up the device through the platform firmware and that function applies the transition delay (as per PCI Express Base Specification Revision 2.0, Section 6.6.1). However, pci_pm_default_resume_early() used in the system resume code path calls pci_power_up() which doesn't apply the delay at all and that causes issues to occur during resume from suspend-to-idle on some systems where the delay is required. Since there is no reason for that difference to exist, modify pci_power_up() to follow pci_set_power_state() more closely and invoke __pci_start_power_transition() from there to call the platform firmware to power up the device (in case that's necessary). Fixes: db288c9c5f9d ("PCI / PM: restore the original behavior of pci_set_power_state()") Reported-by: Daniel Drake <drake@endlessm.com> Tested-by: Daniel Drake <drake@endlessm.com> Link: https://lore.kernel.org/linux-pm/CAD8Lp44TYxrMgPLkHCqF9hv6smEurMXvmmvmtyFhZ6Q4SE+dig@mail.gmail.com/T/#m21be74af263c6a34f36e0fc5c77c5449d9406925 Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-11-22PCI: tegra: Enable Relaxed Ordering only for Tegra20 & Tegra30Vidya Sagar1-2/+5
commit 7be142caabc4780b13a522c485abc806de5c4114 upstream. The PCI Tegra controller conversion to a device tree configurable driver in commit d1523b52bff3 ("PCI: tegra: Move PCIe driver to drivers/pci/host") implied that code for the driver can be compiled in for a kernel supporting multiple platforms. Unfortunately, a blind move of the code did not check that some of the quirks that were applied in arch/arm (eg enabling Relaxed Ordering on all PCI devices - since the quirk hook erroneously matches PCI_ANY_ID for both Vendor-ID and Device-ID) are now applied in all kernels that compile the PCI Tegra controlled driver, DT and ACPI alike. This is completely wrong, in that enablement of Relaxed Ordering is only required by default in Tegra20 platforms as described in the Tegra20 Technical Reference Manual (available at https://developer.nvidia.com/embedded/downloads#?search=tegra%202 in Section 34.1, where it is mentioned that Relaxed Ordering bit needs to be enabled in its root ports to avoid deadlock in hardware) and in the Tegra30 platforms for the same reasons (unfortunately not documented in the TRM). There is no other strict requirement on PCI devices Relaxed Ordering enablement on any other Tegra platforms or PCI host bridge driver. Fix this quite upsetting situation by limiting the vendor and device IDs to which the Relaxed Ordering quirk applies to the root ports in question, reported above. Signed-off-by: Vidya Sagar <vidyas@nvidia.com> [lorenzo.pieralisi@arm.com: completely rewrote the commit log/fixes tag] Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-11-01PCI: Do not poll for PME if the device is in D3coldMika Westerberg1-0/+7
commit 000dd5316e1c756a1c028f22e01d06a38249dd4d upstream. PME polling does not take into account that a device that is directly connected to the host bridge may go into D3cold as well. This leads to a situation where the PME poll thread reads from a config space of a device that is in D3cold and gets incorrect information because the config space is not accessible. Here is an example from Intel Ice Lake system where two PCIe root ports are in D3cold (I've instrumented the kernel to log the PMCSR register contents): [ 62.971442] pcieport 0000:00:07.1: Check PME status, PMCSR=0xffff [ 62.971504] pcieport 0000:00:07.0: Check PME status, PMCSR=0xffff Since 0xffff is interpreted so that PME is pending, the root ports will be runtime resumed. This repeats over and over again essentially blocking all runtime power management. Prevent this from happening by checking whether the device is in D3cold before its PME status is read. Fixes: 71a83bd727cc ("PCI/PM: add runtime PM support to PCIe port") Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Reviewed-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-09-23PCI: Reset Lenovo ThinkPad P50 nvgpu at boot if necessaryLyude Paul1-0/+58
commit e0547c81bfcfad01cbbfa93a5e66bb98ab932f80 upstream. On ThinkPad P50 SKUs with an Nvidia Quadro M1000M instead of the M2000M variant, the BIOS does not always reset the secondary Nvidia GPU during reboot if the laptop is configured in Hybrid Graphics mode. The reason is unknown, but the following steps and possibly a good bit of patience will reproduce the issue: 1. Boot up the laptop normally in Hybrid Graphics mode 2. Make sure nouveau is loaded and that the GPU is awake 3. Allow the Nvidia GPU to runtime suspend itself after being idle 4. Reboot the machine, the more sudden the better (e.g. sysrq-b may help) 5. If nouveau loads up properly, reboot the machine again and go back to step 2 until you reproduce the issue This results in some very strange behavior: the GPU will be left in exactly the same state it was in when the previously booted kernel started the reboot. This has all sorts of bad side effects: for starters, this completely breaks nouveau starting with a mysterious EVO channel failure that happens well before we've actually used the EVO channel for anything: nouveau 0000:01:00.0: disp: chid 0 mthd 0000 data 00000400 00001000 00000002 This causes a timeout trying to bring up the GR ctx: nouveau 0000:01:00.0: timeout WARNING: CPU: 0 PID: 12 at drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgf100.c:1547 gf100_grctx_generate+0x7b2/0x850 [nouveau] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET82W (1.55 ) 12/18/2018 Workqueue: events_long drm_dp_mst_link_probe_work [drm_kms_helper] ... nouveau 0000:01:00.0: gr: wait for idle timeout (en: 1, ctxsw: 0, busy: 1) nouveau 0000:01:00.0: gr: wait for idle timeout (en: 1, ctxsw: 0, busy: 1) nouveau 0000:01:00.0: fifo: fault 01 [WRITE] at 0000000000008000 engine 00 [GR] client 15 [HUB/SCC_NB] reason c4 [] on channel -1 [0000000000 unknown] The GPU never manages to recover. Booting without loading nouveau causes issues as well, since the GPU starts sending spurious interrupts that cause other device's IRQs to get disabled by the kernel: irq 16: nobody cared (try booting with the "irqpoll" option) ... handlers: [<000000007faa9e99>] i801_isr [i2c_i801] Disabling IRQ #16 ... serio: RMI4 PS/2 pass-through port at rmi4-00.fn03 i801_smbus 0000:00:1f.4: Timeout waiting for interrupt! i801_smbus 0000:00:1f.4: Transaction timeout rmi4_f03 rmi4-00.fn03: rmi_f03_pt_write: Failed to write to F03 TX register (-110). i801_smbus 0000:00:1f.4: Timeout waiting for interrupt! i801_smbus 0000:00:1f.4: Transaction timeout rmi4_physical rmi4-00: rmi_driver_set_irq_bits: Failed to change enabled interrupts! This causes the touchpad and sometimes other things to get disabled. Since this happens without nouveau, we can't fix this problem from nouveau itself. Add a PCI quirk for the specific P50 variant of this GPU. Make sure the GPU is advertising NoReset- so we don't reset the GPU when the machine is in Dedicated graphics mode (where the GPU being initialized by the BIOS is normal and expected). Map the GPU MMIO space and read the magic 0x2240c register, which will have bit 1 set if the device was POSTed during a previous boot. Once we've confirmed all of this, reset the GPU and re-disable it - bringing it back to a healthy state. Link: https://bugzilla.kernel.org/show_bug.cgi?id=203003 Link: https://lore.kernel.org/lkml/20190212220230.1568-1-lyude@redhat.com Signed-off-by: Lyude Paul <lyude@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: nouveau@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Cc: Karol Herbst <kherbst@redhat.com> Cc: Ben Skeggs <skeggsb@gmail.com> [bwh: Backported to 3.16: - Use dev_{err,info}() instead of pci_{err,info}() - Adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-09-23PCI: Mark Atheros AR9462 to avoid bus resetJames Prestwood1-0/+1
commit 6afb7e26978da5e86e57e540fdce65c8b04f398a upstream. When using PCI passthrough with this device, the host machine locks up completely when starting the VM, requiring a hard reboot. Add a quirk to avoid bus resets on this device. Fixes: c3e59ee4e766 ("PCI: Mark Atheros AR93xx to avoid bus reset") Link: https://lore.kernel.org/linux-pci/20190107213248.3034-1-james.prestwood@linux.intel.com Signed-off-by: James Prestwood <james.prestwood@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-09-23PCI: Work around Pericom PCIe-to-PCI bridge Retrain Link erratumStefan Mätje2-0/+26
commit 4ec73791a64bab25cabf16a6067ee478692e506d upstream. Due to an erratum in some Pericom PCIe-to-PCI bridges in reverse mode (conventional PCI on primary side, PCIe on downstream side), the Retrain Link bit needs to be cleared manually to allow the link training to complete successfully. If it is not cleared manually, the link training is continuously restarted and no devices below the PCI-to-PCIe bridge can be accessed. That means drivers for devices below the bridge will be loaded but won't work and may even crash because the driver is only reading 0xffff. See the Pericom Errata Sheet PI7C9X111SLB_errata_rev1.2_102711.pdf for details. Devices known as affected so far are: PI7C9X110, PI7C9X111SL, PI7C9X130. Add a new flag, clear_retrain_link, in struct pci_dev. Quirks for affected devices set this bit. Note that pcie_retrain_link() lives in aspm.c because that's currently the only place we use it, but this erratum is not specific to ASPM, and we may retrain links for other reasons in the future. Signed-off-by: Stefan Mätje <stefan.maetje@esd.eu> [bhelgaas: apply regardless of CONFIG_PCIEASPM] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> [bwh: Backported to 3.16: - Use dev_info() instead of pci_info() - Adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-09-23PCI: Factor out pcie_retrain_link() functionStefan Mätje1-16/+24
commit 86fa6a344209d9414ea962b1f1ac6ade9dd7563a upstream. Factor out pcie_retrain_link() to use for Pericom Retrain Link quirk. No functional change intended. Signed-off-by: Stefan Mätje <stefan.maetje@esd.eu> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-08-13PCI: Add function 1 DMA alias quirk for Marvell 9170 SATA controllerAndre Przywara1-0/+2
commit 9cde402a59770a0669d895399c13407f63d7d209 upstream. There is a Marvell 88SE9170 PCIe SATA controller I found on a board here. Some quick testing with the ARM SMMU enabled reveals that it suffers from the same requester ID mixup problems as the other Marvell chips listed already. Add the PCI vendor/device ID to the list of chips which need the workaround. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-02-11PCI: Add Device IDs for Intel GPU "spurious interrupt" quirkBin Meng1-0/+4
commit d0c9606b31a21028fb5b753c8ad79626292accfd upstream. Add Device IDs to the Intel GPU "spurious interrupt" quirk table. For these devices, unplugging the VGA cable and plugging it in again causes spurious interrupts from the IGD. Linux eventually disables the interrupt, but of course that disables any other devices sharing the interrupt. The theory is that this is a VGA BIOS defect: it should have disabled the IGD interrupt but failed to do so. See f67fd55fa96f ("PCI: Add quirk for still enabled interrupts on Intel Sandy Bridge GPUs") and 7c82126a94e6 ("PCI: Add new ID for Intel GPU "spurious interrupt" quirk") for some history. [bhelgaas: See link below for discussion about how to fix this more generically instead of adding device IDs for every new Intel GPU. I hope this is the last patch to add device IDs.] Link: https://lore.kernel.org/linux-pci/1537974841-29928-1-git-send-email-bmeng.cn@gmail.com Signed-off-by: Bin Meng <bmeng.cn@gmail.com> [bhelgaas: changelog] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-02-11PCI/ASPM: Fix link_state teardown on device removalLukas Wunner2-4/+2
commit aeae4f3e5c38d47bdaef50446dc0ec857307df68 upstream. Upon removal of the last device on a bus, the link_state of the bridge leading to that bus is sought to be torn down by having pci_stop_dev() call pcie_aspm_exit_link_state(). When ASPM was originally introduced by commit 7d715a6c1ae5 ("PCI: add PCI Express ASPM support"), it determined whether the device being removed is the last one by calling list_empty() on the bridge's subordinate devices list. That didn't work because the device is only removed from the list slightly later in pci_destroy_dev(). Commit 3419c75e15f8 ("PCI: properly clean up ASPM link state on device remove") attempted to fix it by calling list_is_last(), but that's not correct either because it checks whether the device is at the *end* of the list, not whether it's the last one *left* in the list. If the user removes the device which happens to be at the end of the list via sysfs but other devices are preceding the device in the list, the link_state is torn down prematurely. The real fix is to move the invocation of pcie_aspm_exit_link_state() to pci_destroy_dev() and reinstate the call to list_empty(). Remove a duplicate check for dev->bus->self because pcie_aspm_exit_link_state() already contains an identical check. Fixes: 7d715a6c1ae5 ("PCI: add PCI Express ASPM support") Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Shaohua Li <shaohua.li@intel.com> [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2018-12-17PCI: Reprogram bridge prefetch registers on resumeDaniel Drake1-8/+19
commit 083874549fdfefa629dfa752785e20427dde1511 upstream. On 38+ Intel-based ASUS products, the NVIDIA GPU becomes unusable after S3 suspend/resume. The affected products include multiple generations of NVIDIA GPUs and Intel SoCs. After resume, nouveau logs many errors such as: fifo: fault 00 [READ] at 0000005555555000 engine 00 [GR] client 04 [HUB/FE] reason 4a [] on channel -1 [007fa91000 unknown] DRM: failed to idle channel 0 [DRM] Similarly, the NVIDIA proprietary driver also fails after resume (black screen, 100% CPU usage in Xorg process). We shipped a sample to NVIDIA for diagnosis, and their response indicated that it's a problem with the parent PCI bridge (on the Intel SoC), not the GPU. Runtime suspend/resume works fine, only S3 suspend is affected. We found a workaround: on resume, rewrite the Intel PCI bridge 'Prefetchable Base Upper 32 Bits' register (PCI_PREF_BASE_UPPER32). In the cases that I checked, this register has value 0 and we just have to rewrite that value. Linux already saves and restores PCI config space during suspend/resume, but this register was being skipped because upon resume, it already has value 0 (the correct, pre-suspend value). Intel appear to have previously acknowledged this behaviour and the requirement to rewrite this register: https://bugzilla.kernel.org/show_bug.cgi?id=116851#c23 Based on that, rewrite the prefetch register values even when that appears unnecessary. We have confirmed this solution on all the affected models we have in-hands (X542UQ, UX533FD, X530UN, V272UN). Additionally, this solves an issue where r8169 MSI-X interrupts were broken after S3 suspend/resume on ASUS X441UAR. This issue was recently worked around in commit 7bb05b85bc2d ("r8169: don't use MSI-X on RTL8106e"). It also fixes the same issue on RTL6186evl/8111evl on an Aimfor-tech laptop that we had not yet patched. I suspect it will also fix the issue that was worked around in commit 7c53a722459c ("r8169: don't use MSI-X on RTL8168g"). Thomas Martitz reports that this change also solves an issue where the AMD Radeon Polaris 10 GPU on the HP Zbook 14u G5 is unresponsive after S3 suspend/resume. Link: https://bugzilla.kernel.org/show_bug.cgi?id=201069 Signed-off-by: Daniel Drake <drake@endlessm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-By: Peter Wu <peter@lekensteyn.nl> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2018-12-17PCI: mvebu: Fix I/O space end address calculationThomas Petazzoni1-1/+1
commit dfd0309fd7b30a5baffaf47b2fccb88b46d64d69 upstream. pcie->realio.end should be the address of last byte of the area, therefore using resource_size() of another resource is not correct, we must substract 1 to get the address of the last byte. Fixes: 11be65472a427 ("PCI: mvebu: Adapt to the new device tree layout") Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> [bwh: Backported to 3.16: adjust filename] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2018-12-17PCI: pciehp: Fix unprotected list iteration in IRQ handlerLukas Wunner1-10/+3
commit 1204e35bedf4e5015cda559ed8c84789a6dae24e upstream. Commit b440bde74f04 ("PCI: Add pci_ignore_hotplug() to ignore hotplug events for a device") iterates over the devices on a hotplug port's subordinate bus in pciehp's IRQ handler without acquiring pci_bus_sem. It is thus possible for a user to cause a crash by concurrently manipulating the device list, e.g. by disabling slot power via sysfs on a different CPU or by initiating a remove/rescan via sysfs. This can't be fixed by acquiring pci_bus_sem because it may sleep. The simplest fix is to avoid the list iteration altogether and just check the ignore_hotplug flag on the port itself. This works because pci_ignore_hotplug() sets the flag both on the device as well as on its parent bridge. We do lose the ability to print the name of the device blocking hotplug in the debug message, but that's probably bearable. Fixes: b440bde74f04 ("PCI: Add pci_ignore_hotplug() to ignore hotplug events for a device") Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> [bwh: Backported to 3.16: s/events/intr_loc/] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2018-12-17PCI: pciehp: Fix use-after-free on unplugLukas Wunner3-3/+8
commit 281e878eab191cce4259abbbf1a0322e3adae02c upstream. When pciehp is unbound (e.g. on unplug of a Thunderbolt device), the hotplug_slot struct is deregistered and thus freed before freeing the IRQ. The IRQ handler and the work items it schedules print the slot name referenced from the freed structure in various informational and debug log messages, each time resulting in a quadruple dereference of freed pointers (hotplug_slot -> pci_slot -> kobject -> name). At best the slot name is logged as "(null)", at worst kernel memory is exposed in logs or the driver crashes: pciehp 0000:10:00.0:pcie204: Slot((null)): Card not present An attacker may provoke the bug by unplugging multiple devices on a Thunderbolt daisy chain at once. Unplugging can also be simulated by powering down slots via sysfs. The bug is particularly easy to trigger in poll mode. It has been present since the driver's introduction in 2004: https://git.kernel.org/tglx/history/c/c16b4b14d980 Fix by rearranging teardown such that the IRQ is freed first. Run the work items queued by the IRQ handler to completion before freeing the hotplug_slot struct by draining the work queue from the ->release_slot callback which is invoked by pci_hp_deregister(). Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2018-12-17PCI: hotplug: Don't leak pci_slot on registration failureLukas Wunner1-0/+9
commit 4ce6435820d1f1cc2c2788e232735eb244bcc8a3 upstream. If addition of sysfs files fails on registration of a hotplug slot, the struct pci_slot as well as the entry in the slot_list is leaked. The issue has been present since the hotplug core was introduced in 2002: https://git.kernel.org/tglx/history/c/a8a2069f432c Perhaps the idea was that even though sysfs addition fails, the slot should still be usable. But that's not how drivers use the interface, they abort probe if a non-zero value is returned. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Greg Kroah-Hartman <greg@kroah.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2018-11-20PCI: shpchp: Fix AMD POGO identificationBjorn Helgaas1-4/+4
commit bed4e9cfab93a0f3d0144cb919820e6d5c40b8b1 upstream. The fix for an AMD POGO erratum related to SHPC incorrectly identified the device. The workaround should be applied only for AMD POGO devices, but it was instead applied to: - all AMD bridges, and - all devices from any vendor with device ID 0x7458 Fixes: 53044f357448 ("[PATCH] PCI Hotplug: shpchp: AMD POGO errata fix") Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2018-11-20PCI: pciehp: Clear Presence Detect and Data Link Layer Status Changed on resumeMika Westerberg3-3/+14
commit 13c65840feab8109194f9490c9870587173cb29d upstream. After a suspend/resume cycle the Presence Detect or Data Link Layer Status Changed bits might be set. If we don't clear them those events will not fire anymore and nothing happens for instance when a device is now hot-unplugged. Fix this by clearing those bits in a newly introduced function pcie_reenable_notification(). This should be fine because immediately after, we check if the adapter is still present by reading directly from the status register. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2018-11-20PCI: ibmphp: Fix use-before-set in get_max_bus_speed()Dan Carpenter1-1/+1
commit 4051f5ebb11c6ef4b0d3eac2fbbd187c070656c5 upstream. The "rc" variable is only initialized on the error path. The caller doesn't check the return but, if "rc" is non-zero, then this function is basically a no-op. Fixes: 3749c51ac6c1 ("PCI: Make current and maximum bus speeds part of the PCI core") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2018-10-21drm/radeon: make MacBook Pro d3_delay quirk more genericBjorn Helgaas1-0/+13
commit 5938628c51a711ae2169d68b2e3a4f7d93d4dbea upstream. The PCI Power Management Spec, r1.2, sec 5.6.1, requires a 10 millisecond delay when powering on a device, i.e., transitioning from state D3hot to D0. Apparently some devices require more time, and d1f9809ed131 ("drm/radeon: add quirk for d3 delay during switcheroo poweron for apple macbooks") added an additional delay for the Radeon device in a MacBook Pro. 4807c5a8a0c8 ("drm/radeon: add a PX quirk list") made the affected device more explicit. Add a generic PCI quirk to increase the d3_delay. This means we will use the additional delay for *all* wakeups from D3, not just those initiated by radeon_switcheroo_set_state(). Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Andreas Boll <andreas.boll.dev@gmail.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> CC: Maarten Lankhorst <maarten.lankhorst@canonical.com> [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2018-10-21ACPI / hotplug / PCI: Check presence of slot itself in get_slot_status()Mika Westerberg1-7/+16
commit 13d3047c81505cc0fb9bdae7810676e70523c8bf upstream. Mike Lothian reported that plugging in a USB-C device does not work properly in his Dell Alienware system. This system has an Intel Alpine Ridge Thunderbolt controller providing USB-C functionality. In these systems the USB controller (xHCI) is hotplugged whenever a device is connected to the port using ACPI-based hotplug. The ACPI description of the root port in question is as follows: Device (RP01) { Name (_ADR, 0x001C0000) Device (PXSX) { Name (_ADR, 0x02) Method (_RMV, 0, NotSerialized) { // ... } } Here _ADR 0x02 means device 0, function 2 on the bus under root port (RP01) but that seems to be incorrect because device 0 is the upstream port of the Alpine Ridge PCIe switch and it has no functions other than 0 (the bridge itself). When we get ACPI Notify() to the root port resulting from connecting a USB-C device, Linux tries to read PCI_VENDOR_ID from device 0, function 2 which of course always returns 0xffffffff because there is no such function and we never find the device. In Windows this works fine. Now, since we get ACPI Notify() to the root port and not to the PXSX device we should actually start our scan from there as well and not from the non-existent PXSX device. Fix this by checking presence of the slot itself (function 0) if we fail to do that otherwise. While there use pci_bus_read_dev_vendor_id() in get_slot_status(), which is the recommended way to read Device and Vendor IDs of devices on PCI buses. Link: https://bugzilla.kernel.org/show_bug.cgi?id=198557 Reported-by: Mike Lothian <mike@fireburn.co.uk> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2018-06-17PCI: Add function 1 DMA alias quirk for Highpoint RocketRAID 644LHans de Goede1-0/+2
commit 1903be8222b7c278ca897c129ce477c1dd6403a8 upstream. The Highpoint RocketRAID 644L uses a Marvel 88SE9235 controller, as with other Marvel controllers this needs a function 1 DMA alias quirk. Note the RocketRAID 642L uses the same Marvel 88SE9235 controller and already is listed with a function 1 DMA alias quirk. BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1534106 Signed-off-by: Hans de Goede <hdegoede@redhat.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2018-03-03PCI / PM: Force devices to D0 in pci_pm_thaw_noirq()Rafael J. Wysocki1-1/+6
commit 5839ee7389e893a31e4e3c9cf17b50d14103c902 upstream. It is incorrect to call pci_restore_state() for devices in low-power states (D1-D3), as that involves the restoration of MSI setup which requires MMIO to be operational and that is only the case in D0. However, pci_pm_thaw_noirq() may do that if the driver's "freeze" callbacks put the device into a low-power state, so fix it by making it force devices into D0 via pci_set_power_state() instead of trying to "update" their power state which is pointless. Fixes: e60514bd4485 (PCI/PM: Restore the status of PCI devices across hibernation) Reported-by: Thomas Gleixner <tglx@linutronix.de> Reported-by: Maarten Lankhorst <dev@mblankhorst.nl> Tested-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Maarten Lankhorst <dev@mblankhorst.nl> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2018-02-13PCI/AER: Report non-fatal errors only to the affected endpointGabriele Paoloni1-1/+8
commit 86acc790717fb60fb51ea3095084e331d8711c74 upstream. Previously, if an non-fatal error was reported by an endpoint, we called report_error_detected() for the endpoint, every sibling on the bus, and their descendents. If any of them did not implement the .error_detected() method, do_recovery() failed, leaving all these devices unrecovered. For example, the system described in the bugzilla below has two devices: 0000:74:02.0 [19e5:a230] SAS controller, driver has .error_detected() 0000:74:03.0 [19e5:a235] SATA controller, driver lacks .error_detected() When a device such as 74:02.0 reported a non-fatal error, do_recovery() failed because 74:03.0 lacked an .error_detected() method. But per PCIe r3.1, sec 6.2.2.2.2, such an error does not compromise the Link and does not affect 74:03.0: Non-fatal errors are uncorrectable errors which cause a particular transaction to be unreliable but the Link is otherwise fully functional. Isolating Non-fatal from Fatal errors provides Requester/Receiver logic in a device or system management software the opportunity to recover from the error without resetting the components on the Link and disturbing other transactions in progress. Devices not associated with the transaction in error are not impacted by the error. Report non-fatal errors only to the endpoint that reported them. We really want to check for AER_NONFATAL here, but the current code structure doesn't allow that. Looking for pci_channel_io_normal is the best we can do now. Link: https://bugzilla.kernel.org/show_bug.cgi?id=197055 Fixes: 6c2b374d7485 ("PCI-Express AER implemetation: AER core and aerdriver") Signed-off-by: Gabriele Paoloni <gabriele.paoloni@huawei.com> Signed-off-by: Dongdong Liu <liudongdong3@huawei.com> [bhelgaas: changelog] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2018-01-01PCI: Fix race condition with driver_overrideNicolai Stange1-2/+9
commit 9561475db680f7144d2223a409dd3d7e322aca03 upstream. The driver_override implementation is susceptible to a race condition when different threads are reading vs. storing a different driver override. Add locking to avoid the race condition. This is in close analogy to commit 6265539776a0 ("driver core: platform: fix race condition with driver_override") from Adrian Salido. Fixes: 782a985d7af2 ("PCI: Introduce new device binding path using pci_dev.driver_override") Signed-off-by: Nicolai Stange <nstange@suse.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-11-26PCI: shpchp: Enable bridge bus mastering if MSI is enabledAleksandr Bezzubikov1-0/+2
commit 48b79a14505349a29b3e20f03619ada9b33c4b17 upstream. An SHPC may generate MSIs to notify software about slot or controller events (SHPC spec r1.0, sec 4.7). A PCI device can only generate an MSI if it has bus mastering enabled. Enable bus mastering if the bridge contains an SHPC that uses MSI for event notifications. Signed-off-by: Aleksandr Bezzubikov <zuban32s@gmail.com> [bhelgaas: changelog] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Marcel Apfelbaum <marcel@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-10-12PCI: Limit config space size for Netronome NFP4000Simon Horman1-0/+1
commit c2e771b02792d222cbcd9617fe71482a64f52647 upstream. Like the NFP6000, the NFP4000 as an erratum where reading/writing to PCI config space addresses above 0x600 can cause the NFP to generate PCIe completion timeouts. Limit the NFP4000's PF's config space size to 0x600 bytes as is already done for the NFP6000. The NFP4000's VF is 0x6004 (PCI_DEVICE_ID_NETRONOME_NFP6000_VF), the same device ID as the NFP6000's VF. Thus, its config space is already limited by the existing use of quirk_nfp6000(). Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-10-12PCI: Limit config space size for Netronome NFP6000 familyJason S. McMullan1-0/+11
commit 9f33a2ae59f24452c1076749deb615bccd435ca9 upstream. The NFP6000 has an erratum where reading/writing to PCI config space addresses above 0x600 can cause the NFP to generate PCIe completion timeouts. Limit the NFP6000's config space size to 0x600 bytes. Signed-off-by: Jason S. McMullan <jason.mcmullan@netronome.com> [simon: edited changelog] Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-10-12PCI: Support PCIe devices with short cfg_sizeJason S. McMullan1-9/+9
commit c20aecf6963d1273d8f6d61c042b4845441ca592 upstream. If a device quirk modifies the pci_dev->cfg_size to be less than PCI_CFG_SPACE_EXP_SIZE (4096), but greater than PCI_CFG_SPACE_SIZE (256), the PCI sysfs interface truncates the readable size to PCI_CFG_SPACE_SIZE. Allow sysfs access to config space up to cfg_size, even if the device doesn't support the entire 4096-byte PCIe config space. Note that pci_read_config() and pci_write_config() limit access to dev->cfg_size even though pcie_config_attr contains 4096 (the maximum size). Signed-off-by: Jason S. McMullan <jason.mcmullan@netronome.com> [simon: edited changelog] Signed-off-by: Simon Horman <simon.horman@netronome.com> [bhelgaas: more changelog edits] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-10-12PCI/PM: Restore the status of PCI devices across hibernationChen Yu1-0/+1
commit e60514bd4485c0c7c5a7cf779b200ce0b95c70d6 upstream. Currently we saw a lot of "No irq handler" errors during hibernation, which caused the system hang finally: ata4.00: qc timeout (cmd 0xec) ata4.00: failed to IDENTIFY (I/O error, err_mask=0x4) ata4.00: revalidation failed (errno=-5) ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300) do_IRQ: 31.151 No irq handler for vector According to above logs, there is an interrupt triggered and it is dispatched to CPU31 with a vector number 151, but there is no handler for it, thus this IRQ will not get acked and will cause an IRQ flood which kills the system. To be more specific, the 31.151 is an interrupt from the AHCI host controller. After some investigation, the reason why this issue is triggered is because the thaw_noirq() function does not restore the MSI/MSI-X settings across hibernation. The scenario is illustrated below: 1. Before hibernation, IRQ 34 is the handler for the AHCI device, which is bound to CPU31. 2. Hibernation starts, the AHCI device is put into low power state. 3. All the nonboot CPUs are put offline, so IRQ 34 has to be migrated to the last alive one - CPU0. 4. After the snapshot has been created, all the nonboot CPUs are brought up again; IRQ 34 remains bound to CPU0. 5. AHCI devices are put into D0. 6. The snapshot is written to the disk. The issue is triggered in step 6. The AHCI interrupt should be delivered to CPU0, however it is delivered to the original CPU31 instead, which causes the "No irq handler" issue. Ying Huang has provided a clue that, in step 3 it is possible that writing to the register might not take effect as the PCI devices have been suspended. In step 3, the IRQ 34 affinity should be modified from CPU31 to CPU0, but in fact it is not. In __pci_write_msi_msg(), if the device is already in low power state, the low level MSI message entry will not be updated but cached. During the device restore process after a normal suspend/resume, pci_restore_msi_state() writes the cached MSI back to the hardware. But this is not the case for hibernation. pci_restore_msi_state() is not currently called in pci_pm_thaw_noirq(), although pci_save_state() has saved the necessary PCI cached information in pci_pm_freeze_noirq(). Restore the PCI status for the device during hibernation. Otherwise the status might be lost across hibernation (for example, settings for MSI, MSI-X, ATS, ACS, IOV, etc.), which might cause problems during hibernation. Suggested-by: Ying Huang <ying.huang@intel.com> Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Chen Yu <yu.c.chen@intel.com> [bhelgaas: changelog] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Len Brown <len.brown@intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Rui Zhang <rui.zhang@intel.com> Cc: Ying Huang <ying.huang@intel.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-10-12PCI: Correct PCI_STD_RESOURCE_END usageBjorn Helgaas1-1/+1
commit 2f686f1d9beee135de6d08caea707ec7bfc916d4 upstream. PCI_STD_RESOURCE_END is (confusingly) the index of the last valid BAR, not the *number* of BARs. To iterate through all possible BARs, we need to include PCI_STD_RESOURCE_END. Fixes: 9fe373f9997b ("PCI: Increase IBM ipr SAS Crocodile BARs to at least system page size") Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-08-26PCI: Disable boot interrupt quirk for ASUS M2N-LRStefan Assmann1-0/+24
commit c4e649b09f55595e6df6da5465a5b3cfc93557c1 upstream. The ASUS M2N-LR should not trigger boot interrupt quirks although it carries an Intel 6702PXH. On this board the boot interrupt quirks cause incorrect IRQ assignments and should be disabled. Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=43074 Tested-by: Solomon Peachy <pizza@shaftnet.org> Signed-off-by: Stefan Assmann <sassmann@kpanic.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-08-26PCI: Freeze PME scan before suspending devicesLukas Wunner1-4/+5
commit ea00353f36b64375518662a8ad15e39218a1f324 upstream. Laurent Pinchart reported that the Renesas R-Car H2 Lager board (r8a7790) crashes during suspend tests. Geert Uytterhoeven managed to reproduce the issue on an M2-W Koelsch board (r8a7791): It occurs when the PME scan runs, once per second. During PME scan, the PCI host bridge (rcar-pci) registers are accessed while its module clock has already been disabled, leading to the crash. One reproducer is to configure s2ram to use "s2idle" instead of "deep" suspend: # echo 0 > /sys/module/printk/parameters/console_suspend # echo s2idle > /sys/power/mem_sleep # echo mem > /sys/power/state Another reproducer is to write either "platform" or "processors" to /sys/power/pm_test. It does not (or is less likely) to happen during full system suspend ("core" or "none") because system suspend also disables timers, and thus the workqueue handling PME scans no longer runs. Geert believes the issue may still happen in the small window between disabling module clocks and disabling timers: # echo 0 > /sys/module/printk/parameters/console_suspend # echo platform > /sys/power/pm_test # Or "processors" # echo mem > /sys/power/state (Make sure CONFIG_PCI_RCAR_GEN2 and CONFIG_USB_OHCI_HCD_PCI are enabled.) Rafael Wysocki agrees that PME scans should be suspended before the host bridge registers become inaccessible. To that end, queue the task on a workqueue that gets frozen before devices suspend. Rafael notes however that as a result, some wakeup events may be missed if they are delivered via PME from a device without working IRQ (which hence must be polled) and occur after the workqueue has been frozen. If that turns out to be an issue in practice, it may be possible to solve it by calling pci_pme_list_scan() once directly from one of the host bridge's pm_ops callbacks. Stacktrace for posterity: PM: Syncing filesystems ... [ 38.566237] done. PM: Preparing system for sleep (mem) Freezing user space processes ... [ 38.579813] (elapsed 0.001 seconds) done. Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done. PM: Suspending system (mem) PM: suspend of devices complete after 152.456 msecs PM: late suspend of devices complete after 2.809 msecs PM: noirq suspend of devices complete after 29.863 msecs suspend debug: Waiting for 5 second(s). Unhandled fault: asynchronous external abort (0x1211) at 0x00000000 pgd = c0003000 [00000000] *pgd=80000040004003, *pmd=00000000 Internal error: : 1211 [#1] SMP ARM Modules linked in: CPU: 1 PID: 20 Comm: kworker/1:1 Not tainted 4.9.0-rc1-koelsch-00011-g68db9bc814362e7f #3383 Hardware name: Generic R8A7791 (Flattened Device Tree) Workqueue: events pci_pme_list_scan task: eb56e140 task.stack: eb58e000 PC is at pci_generic_config_read+0x64/0x6c LR is at rcar_pci_cfg_base+0x64/0x84 pc : [<c041d7b4>] lr : [<c04309a0>] psr: 600d0093 sp : eb58fe98 ip : c041d750 fp : 00000008 r10: c0e2283c r9 : 00000000 r8 : 600d0013 r7 : 00000008 r6 : eb58fed6 r5 : 00000002 r4 : eb58feb4 r3 : 00000000 r2 : 00000044 r1 : 00000008 r0 : 00000000 Flags: nZCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment user Control: 30c5387d Table: 6a9f6c80 DAC: 55555555 Process kworker/1:1 (pid: 20, stack limit = 0xeb58e210) Stack: (0xeb58fe98 to 0xeb590000) fe80: 00000002 00000044 fea0: eb6f5800 c041d9b0 eb58feb4 00000008 00000044 00000000 eb78a000 eb78a000 fec0: 00000044 00000000 eb9aff00 c0424bf0 eb78a000 00000000 eb78a000 c0e22830 fee0: ea8a6fc0 c0424c5c eaae79c0 c0424ce0 eb55f380 c0e22838 eb9a9800 c0235fbc ff00: eb55f380 c0e22838 eb55f380 eb9a9800 eb9a9800 eb58e000 eb9a9824 c0e02100 ff20: eb55f398 c02366c4 eb56e140 eb5631c0 00000000 eb55f380 c023641c 00000000 ff40: 00000000 00000000 00000000 c023a928 cd105598 00000000 40506a34 eb55f380 ff60: 00000000 00000000 dead4ead ffffffff ffffffff eb58ff74 eb58ff74 00000000 ff80: 00000000 dead4ead ffffffff ffffffff eb58ff90 eb58ff90 eb58ffac eb5631c0 ffa0: c023a844 00000000 00000000 c0206d68 00000000 00000000 00000000 00000000 ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ffe0: 00000000 00000000 00000000 00000000 00000013 00000000 3a81336c 10ccd1dd [<c041d7b4>] (pci_generic_config_read) from [<c041d9b0>] (pci_bus_read_config_word+0x58/0x80) [<c041d9b0>] (pci_bus_read_config_word) from [<c0424bf0>] (pci_check_pme_status+0x34/0x78) [<c0424bf0>] (pci_check_pme_status) from [<c0424c5c>] (pci_pme_wakeup+0x28/0x54) [<c0424c5c>] (pci_pme_wakeup) from [<c0424ce0>] (pci_pme_list_scan+0x58/0xb4) [<c0424ce0>] (pci_pme_list_scan) from [<c0235fbc>] (process_one_work+0x1bc/0x308) [<c0235fbc>] (process_one_work) from [<c02366c4>] (worker_thread+0x2a8/0x3e0) [<c02366c4>] (worker_thread) from [<c023a928>] (kthread+0xe4/0xfc) [<c023a928>] (kthread) from [<c0206d68>] (ret_from_fork+0x14/0x2c) Code: ea000000 e5903000 f57ff04f e3a00000 (e5843000) ---[ end trace 667d43ba3aa9e589 ]--- Fixes: df17e62e5bff ("PCI: Add support for polling PME state on suspended legacy PCI devices") Reported-and-tested-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Reported-and-tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Cc: Simon Horman <horms+renesas@verge.net.au> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-08-26PCI: Only allow WC mmap on prefetchable resourcesDavid Woodhouse1-5/+8
commit cef4d02305a06be581bb7f4353446717a1b319ec upstream. The /proc/bus/pci mmap interface allows the user to specify whether they want WC or not. Don't let them do so on non-prefetchable BARs. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-08-26PCI: Fix another sanity check bug in /proc/pci mmapDavid Woodhouse1-2/+8
commit 17caf56731311c9596e7d38a70c88fcb6afa6a1b upstream. Don't match MMIO maps with I/O BARs and vice versa. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-08-26PCI: Ignore write combining when mapping I/O port spaceBjorn Helgaas1-3/+6
commit 3a92c319c44a7bcee9f48dff9d97d001943b54c6 upstream. PCI exposes files like /proc/bus/pci/00/00.0 in procfs. These files support operations like this: ioctl(fd, PCIIOC_MMAP_IS_IO); # request I/O port space ioctl(fd, PCIIOC_WRITE_COMBINE, 1); # request write-combining mmap(fd, ...) Write combining is useful on PCI memory space, but I don't think it makes sense on PCI I/O port space. We *could* change proc_bus_pci_ioctl() to make it impossible to set mmap_state == pci_mmap_io and write_combine at the same time, but that would break the following sequence, which is currently legal: mmap(fd, ...) # default is I/O, non-combining ioctl(fd, PCIIOC_WRITE_COMBINE, 1); # request write-combining ioctl(fd, PCIIOC_MMAP_IS_MEM); # request memory space mmap(fd, ...) # get write-combining mapping Ignore the write-combining flag when mapping I/O port space. This patch should have no functional effect, based on this analysis of all implementations of pci_mmap_page_range(): - ia64 mips parisc sh unicore32 x86 do not support mapping of I/O port space at all. - arm cris microblaze mn10300 sparc xtensa support mapping of I/O port space, but ignore the write_combine argument to pci_mmap_page_range(). - powerpc supports mapping of I/O port space and uses write_combine, and it disables write combining for I/O port space in __pci_mmap_set_pgprot(). This patch makes it possible to remove __pci_mmap_set_pgprot() from powerpc, which simplifies that path. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-08-26PCI: Fix pci_mmap_fits() for HAVE_PCI_RESOURCE_TO_USER platformsDavid Woodhouse1-3/+7
commit 6bccc7f426abd640f08d8c75fb22f99483f201b4 upstream. In the PCI_MMAP_PROCFS case when the address being passed by the user is a 'user visible' resource address based on the bus window, and not the actual contents of the resource, that's what we need to be checking it against. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-08-26PCI: dwc: Fix uninitialized variable in dw_handle_msi_irq()Dan Carpenter1-3/+4
commit 1b497e6493c49bbb55c89f53562f7f853495e90d upstream. The bug is that "val" is unsigned long but we only initialize 32 bits of it. Then we test "if (val)" and that might be true not because we set the bits but because some were never initialized. Fixes: f342d940ee0e ("PCI: exynos: Add support for MSI") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> [bwh: Backported to 3.16: adjust filename, context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-08-26PCI: dwc: Unindent dw_handle_msi_irq() loopBjorn Helgaas1-12/+11
commit dbe4a09e8bbcf88809a8394d6a359d8cebd22a86 upstream. Use "continue" to skip rest of the loop when possible to save an indent level. No functional change intended. Suggested-by: walter harms <wharms@bfs.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> [bwh: Backported to 3.16: adjust filename, context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-03-16powerpc/pci/rpadlpar: Fix device reference leaksJohan Hovold1-1/+9
commit 99e5cde5eae78bef95bfe7c16ccda87fb070149b upstream. Make sure to drop any device reference taken by vio_find_node() when adding and removing virtual I/O slots. Fixes: 5eeb8c63a38f ("[PATCH] PCI Hotplug: rpaphp: Move VIO registration") Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-03-16PCI: Check for PME in targeted sleep stateAlan Stern1-0/+4
commit 6496ebd7edf446fccf8266a1a70ffcb64252593e upstream. One some systems, the firmware does not allow certain PCI devices to be put in deep D-states. This can cause problems for wakeup signalling, if the device does not support PME# in the deepest allowed suspend state. For example, Pierre reports that on his system, ACPI does not permit his xHCI host controller to go into D3 during runtime suspend -- but D3 is the only state in which the controller can generate PME# signals. As a result, the controller goes into runtime suspend but never wakes up, so it doesn't work properly. USB devices plugged into the controller are never detected. If the device relies on PME# for wakeup signals but is not capable of generating PME# in the target state, the PCI core should accurately report that it cannot do wakeup from runtime suspend. This patch modifies the pci_dev_run_wake() routine to add this check. Reported-by: Pierre de Villemereuil <flyos@mailoo.org> Tested-by: Pierre de Villemereuil <flyos@mailoo.org> Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> CC: Lukas Wunner <lukas@wunner.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-02-23PCI: Mark Atheros AR9580 to avoid bus resetMaik Broemme1-0/+1
commit 8e2e03179923479ca0c0b6fdc7c93ecf89bce7a8 upstream. Similar to the AR93xx and the AR94xx series, the AR95xx also have the same quirk for the Bus Reset. It will lead to instant system reset if the device is assigned via VFIO to a KVM VM. I've been able reproduce this behavior with a MikroTik R11e-2HnD. Fixes: c3e59ee4e766 ("PCI: Mark Atheros AR93xx to avoid bus reset") Signed-off-by: Maik Broemme <mbroemme@libmpq.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2016-11-20PCI: Mark Atheros AR9485 and QCA9882 to avoid bus resetChris Blake1-4/+6
commit 9ac0108c2bac3f1d0255f64fb89fc27e71131b24 upstream. Similar to the AR93xx series, the AR94xx and the Qualcomm QCA988x also have the same quirk for the Bus Reset. Fixes: c3e59ee4e766 ("PCI: Mark Atheros AR93xx to avoid bus reset") Signed-off-by: Chris Blake <chrisrblake93@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2016-08-23PCI: Disable all BAR sizing for devices with non-compliant BARsPrarit Bhargava1-3/+3
commit ad67b437f187ea818b2860524d10f878fadfdd99 upstream. b84106b4e229 ("PCI: Disable IO/MEM decoding for devices with non-compliant BARs") disabled BAR sizing for BARs 0-5 of devices that don't comply with the PCI spec. But it didn't do anything for expansion ROM BARs, so we still try to size them, resulting in warnings like this on Broadwell-EP: pci 0000:ff:12.0: BAR 6: failed to assign [mem size 0x00000001 pref] Move the non-compliant BAR check from __pci_read_base() up to pci_read_bases() so it applies to the expansion ROM BAR as well as to BARs 0-5. Note that direct callers of __pci_read_base(), like sriov_init(), will now bypass this check. We haven't had reports of devices with broken SR-IOV BARs yet. [bhelgaas: changelog] Fixes: b84106b4e229 ("PCI: Disable IO/MEM decoding for devices with non-compliant BARs") Signed-off-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> CC: Thomas Gleixner <tglx@linutronix.de> CC: Ingo Molnar <mingo@redhat.com> CC: "H. Peter Anvin" <hpa@zytor.com> CC: Andi Kleen <ak@linux.intel.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2016-08-23PCI: Supply CPU physical address (not bus address) to iomem_is_exclusive()Bjorn Helgaas1-4/+3
commit ca620723d4ff9ea7ed484eab46264c3af871b9ae upstream. iomem_is_exclusive() requires a CPU physical address, but on some arches we supplied a PCI bus address instead. On most arches, pci_resource_to_user(res) returns "res->start", which is a CPU physical address. But on microblaze, mips, powerpc, and sparc, it returns the PCI bus address corresponding to "res->start". The result is that pci_mmap_resource() may fail when it shouldn't (if the bus address happens to match an existing resource), or it may succeed when it should fail (if the resource is exclusive but the bus address doesn't match it). Call iomem_is_exclusive() with "res->start", which is always a CPU physical address, not the result of pci_resource_to_user(). Fixes: e8de1481fd71 ("resource: allow MMIO exclusivity for device drivers") Suggested-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> CC: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2016-05-01PCI: Disable IO/MEM decoding for devices with non-compliant BARsBjorn Helgaas1-0/+14
commit b84106b4e2290c081cdab521fa832596cdfea246 upstream. The PCI config header (first 64 bytes of each device's config space) is defined by the PCI spec so generic software can identify the device and manage its usage of I/O, memory, and IRQ resources. Some non-spec-compliant devices put registers other than BARs where the BARs should be. When the PCI core sizes these "BARs", the reads and writes it does may have unwanted side effects, and the "BAR" may appear to describe non-sensical address space. Add a flag bit to mark non-compliant devices so we don't touch their BARs. Turn off IO/MEM decoding to prevent the devices from consuming address space, since we can't read the BARs to find out what that address space would be. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Andi Kleen <ak@linux.intel.com> [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2016-05-01PCI: imx6: Move link up check into imx6_pcie_wait_for_link()Lucas Stach1-8/+24
commit 4d107d3b5a686b5834e533a00b73bf7b1cf59df7 upstream. imx6_pcie_link_up() previously used usleep_range() to wait for the link to come up. Since it may be called while holding the config spinlock, the sleep causes a "BUG: scheduling while atomic" error. Instead of waiting for the link to come up in imx6_pcie_link_up(), do the waiting in imx6_pcie_wait_for_link(), where we're not holding a lock and sleeping is allowed. [bhelgaas: changelog, references to bugzilla and f95d3ae77191] Link: https://bugzilla.kernel.org/show_bug.cgi?id=100031 Fixes: f95d3ae77191 ("PCI: imx6: Wait for retraining") Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> [bwh: Backported to 3.16: also update the retry loop in imx6_pcie_wait_for_link() as done upstream in commit 6cbb247e85eb ("PCI: designware: Wait for link to come up with consistent style")] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2016-05-01PCI: imx6: Remove broken Gen2 workaroundLucas Stach1-16/+1
commit a77c5422d7586003643377afdb9915e76d07d21c upstream. Remove the remnants of the workaround for erratum ERR005184 which was never completely implemented. The checks alone don't carry any value as we don't act properly on the result. A workaround should be added to the lane speed change in establish_link later. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>