diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2019-07-16 06:44:49 +0300 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2019-07-16 06:44:49 +0300 |
commit | fb4da215ed92f564f7ca090bb81a199b0d6cab8a (patch) | |
tree | 38d4e18e1db026bec42c8b58ee40a245db313af3 /Documentation/PCI/pcieaer-howto.rst | |
parent | 2a3c389a0fde49b241430df806a34276568cfb29 (diff) | |
parent | 7b4b0f6b34d893be569da81ffad865a9d3a7d014 (diff) | |
download | linux-fb4da215ed92f564f7ca090bb81a199b0d6cab8a.tar.xz |
Merge tag 'pci-v5.3-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI updates from Bjorn Helgaas:
"Enumeration changes:
- Evaluate PCI Boot Configuration _DSM to learn if firmware wants us
to preserve its resource assignments (Benjamin Herrenschmidt)
- Simplify resource distribution (Nicholas Johnson)
- Decode 32 GT/s link speed (Gustavo Pimentel)
Virtualization:
- Fix incorrect caching of VF config space size (Alex Williamson)
- Fix VF driver probing sysfs knobs (Alex Williamson)
Peer-to-peer DMA:
- Fix dma_virt_ops check (Logan Gunthorpe)
Altera host bridge driver:
- Allow building as module (Ley Foon Tan)
Armada 8K host bridge driver:
- add PHYs support (Miquel Raynal)
DesignWare host bridge driver:
- Export APIs to support removable loadable module (Vidya Sagar)
- Enable Relaxed Ordering erratum workaround only on Tegra20 &
Tegra30 (Vidya Sagar)
Hyper-V host bridge driver:
- Fix use-after-free in eject (Dexuan Cui)
Mobiveil host bridge driver:
- Clean up and fix many issues, including non-identify mapped
windows, 64-bit windows, multi-MSI, class code, INTx clearing (Hou
Zhiqiang)
Qualcomm host bridge driver:
- Use clk bulk API for 2.4.0 controllers (Bjorn Andersson)
- Add QCS404 support (Bjorn Andersson)
- Assert PERST for at least 100ms (Niklas Cassel)
R-Car host bridge driver:
- Add r8a774a1 DT support (Biju Das)
Tegra host bridge driver:
- Add support for Gen2, opportunistic UpdateFC and ACK (PCIe protocol
details) AER, GPIO-based PERST# (Manikanta Maddireddy)
- Fix many issues, including power-on failure cases, interrupt
masking in suspend, UPHY settings, AFI dynamic clock gating,
pending DLL transactions (Manikanta Maddireddy)
Xilinx host bridge driver:
- Fix NWL Multi-MSI programming (Bharat Kumar Gogada)
Endpoint support:
- Fix 64bit BAR support (Alan Mikhak)
- Fix pcitest build issues (Alan Mikhak, Andy Shevchenko)
Bug fixes:
- Fix NVIDIA GPU multi-function power dependencies (Abhishek Sahu)
- Fix NVIDIA GPU HDA enablement issue (Lukas Wunner)
- Ignore lockdep for sysfs "remove" (Marek Vasut)
Misc:
- Convert docs to reST (Changbin Du, Mauro Carvalho Chehab)"
* tag 'pci-v5.3-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (107 commits)
PCI: Enable NVIDIA HDA controllers
tools: PCI: Fix installation when `make tools/pci_install`
PCI: dwc: pci-dra7xx: Fix compilation when !CONFIG_GPIOLIB
PCI: Fix typos and whitespace errors
PCI: mobiveil: Fix INTx interrupt clearing in mobiveil_pcie_isr()
PCI: mobiveil: Fix infinite-loop in the INTx handling function
PCI: mobiveil: Move PCIe PIO enablement out of inbound window routine
PCI: mobiveil: Add upper 32-bit PCI base address setup in inbound window
PCI: mobiveil: Add upper 32-bit CPU base address setup in outbound window
PCI: mobiveil: Mask out hardcoded bits in inbound/outbound windows setup
PCI: mobiveil: Clear the control fields before updating it
PCI: mobiveil: Add configured inbound windows counter
PCI: mobiveil: Fix the valid check for inbound and outbound windows
PCI: mobiveil: Clean-up program_{ib/ob}_windows()
PCI: mobiveil: Remove an unnecessary return value check
PCI: mobiveil: Fix error return values
PCI: mobiveil: Refactor the MEM/IO outbound window initialization
PCI: mobiveil: Make some register updates more readable
PCI: mobiveil: Reformat the code for readability
dt-bindings: PCI: mobiveil: Change gpio_slave and apb_csr to optional
...
Diffstat (limited to 'Documentation/PCI/pcieaer-howto.rst')
-rw-r--r-- | Documentation/PCI/pcieaer-howto.rst | 311 |
1 files changed, 311 insertions, 0 deletions
diff --git a/Documentation/PCI/pcieaer-howto.rst b/Documentation/PCI/pcieaer-howto.rst new file mode 100644 index 000000000000..18bdefaafd1a --- /dev/null +++ b/Documentation/PCI/pcieaer-howto.rst @@ -0,0 +1,311 @@ +.. SPDX-License-Identifier: GPL-2.0 +.. include:: <isonum.txt> + +=========================================================== +The PCI Express Advanced Error Reporting Driver Guide HOWTO +=========================================================== + +:Authors: - T. Long Nguyen <tom.l.nguyen@intel.com> + - Yanmin Zhang <yanmin.zhang@intel.com> + +:Copyright: |copy| 2006 Intel Corporation + +Overview +=========== + +About this guide +---------------- + +This guide describes the basics of the PCI Express Advanced Error +Reporting (AER) driver and provides information on how to use it, as +well as how to enable the drivers of endpoint devices to conform with +PCI Express AER driver. + + +What is the PCI Express AER Driver? +----------------------------------- + +PCI Express error signaling can occur on the PCI Express link itself +or on behalf of transactions initiated on the link. PCI Express +defines two error reporting paradigms: the baseline capability and +the Advanced Error Reporting capability. The baseline capability is +required of all PCI Express components providing a minimum defined +set of error reporting requirements. Advanced Error Reporting +capability is implemented with a PCI Express advanced error reporting +extended capability structure providing more robust error reporting. + +The PCI Express AER driver provides the infrastructure to support PCI +Express Advanced Error Reporting capability. The PCI Express AER +driver provides three basic functions: + + - Gathers the comprehensive error information if errors occurred. + - Reports error to the users. + - Performs error recovery actions. + +AER driver only attaches root ports which support PCI-Express AER +capability. + + +User Guide +========== + +Include the PCI Express AER Root Driver into the Linux Kernel +------------------------------------------------------------- + +The PCI Express AER Root driver is a Root Port service driver attached +to the PCI Express Port Bus driver. If a user wants to use it, the driver +has to be compiled. Option CONFIG_PCIEAER supports this capability. It +depends on CONFIG_PCIEPORTBUS, so pls. set CONFIG_PCIEPORTBUS=y and +CONFIG_PCIEAER = y. + +Load PCI Express AER Root Driver +-------------------------------- + +Some systems have AER support in firmware. Enabling Linux AER support at +the same time the firmware handles AER may result in unpredictable +behavior. Therefore, Linux does not handle AER events unless the firmware +grants AER control to the OS via the ACPI _OSC method. See the PCI FW 3.0 +Specification for details regarding _OSC usage. + +AER error output +---------------- + +When a PCIe AER error is captured, an error message will be output to +console. If it's a correctable error, it is output as a warning. +Otherwise, it is printed as an error. So users could choose different +log level to filter out correctable error messages. + +Below shows an example:: + + 0000:50:00.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, id=0500(Requester ID) + 0000:50:00.0: device [8086:0329] error status/mask=00100000/00000000 + 0000:50:00.0: [20] Unsupported Request (First) + 0000:50:00.0: TLP Header: 04000001 00200a03 05010000 00050100 + +In the example, 'Requester ID' means the ID of the device who sends +the error message to root port. Pls. refer to pci express specs for +other fields. + +AER Statistics / Counters +------------------------- + +When PCIe AER errors are captured, the counters / statistics are also exposed +in the form of sysfs attributes which are documented at +Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats + +Developer Guide +=============== + +To enable AER aware support requires a software driver to configure +the AER capability structure within its device and to provide callbacks. + +To support AER better, developers need understand how AER does work +firstly. + +PCI Express errors are classified into two types: correctable errors +and uncorrectable errors. This classification is based on the impacts +of those errors, which may result in degraded performance or function +failure. + +Correctable errors pose no impacts on the functionality of the +interface. The PCI Express protocol can recover without any software +intervention or any loss of data. These errors are detected and +corrected by hardware. Unlike correctable errors, uncorrectable +errors impact functionality of the interface. Uncorrectable errors +can cause a particular transaction or a particular PCI Express link +to be unreliable. Depending on those error conditions, uncorrectable +errors are further classified into non-fatal errors and fatal errors. +Non-fatal errors cause the particular transaction to be unreliable, +but the PCI Express link itself is fully functional. Fatal errors, on +the other hand, cause the link to be unreliable. + +When AER is enabled, a PCI Express device will automatically send an +error message to the PCIe root port above it when the device captures +an error. The Root Port, upon receiving an error reporting message, +internally processes and logs the error message in its PCI Express +capability structure. Error information being logged includes storing +the error reporting agent's requestor ID into the Error Source +Identification Registers and setting the error bits of the Root Error +Status Register accordingly. If AER error reporting is enabled in Root +Error Command Register, the Root Port generates an interrupt if an +error is detected. + +Note that the errors as described above are related to the PCI Express +hierarchy and links. These errors do not include any device specific +errors because device specific errors will still get sent directly to +the device driver. + +Configure the AER capability structure +-------------------------------------- + +AER aware drivers of PCI Express component need change the device +control registers to enable AER. They also could change AER registers, +including mask and severity registers. Helper function +pci_enable_pcie_error_reporting could be used to enable AER. See +section 3.3. + +Provide callbacks +----------------- + +callback reset_link to reset pci express link +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +This callback is used to reset the pci express physical link when a +fatal error happens. The root port aer service driver provides a +default reset_link function, but different upstream ports might +have different specifications to reset pci express link, so all +upstream ports should provide their own reset_link functions. + +In struct pcie_port_service_driver, a new pointer, reset_link, is +added. +:: + + pci_ers_result_t (*reset_link) (struct pci_dev *dev); + +Section 3.2.2.2 provides more detailed info on when to call +reset_link. + +PCI error-recovery callbacks +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The PCI Express AER Root driver uses error callbacks to coordinate +with downstream device drivers associated with a hierarchy in question +when performing error recovery actions. + +Data struct pci_driver has a pointer, err_handler, to point to +pci_error_handlers who consists of a couple of callback function +pointers. AER driver follows the rules defined in +pci-error-recovery.txt except pci express specific parts (e.g. +reset_link). Pls. refer to pci-error-recovery.txt for detailed +definitions of the callbacks. + +Below sections specify when to call the error callback functions. + +Correctable errors +~~~~~~~~~~~~~~~~~~ + +Correctable errors pose no impacts on the functionality of +the interface. The PCI Express protocol can recover without any +software intervention or any loss of data. These errors do not +require any recovery actions. The AER driver clears the device's +correctable error status register accordingly and logs these errors. + +Non-correctable (non-fatal and fatal) errors +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +If an error message indicates a non-fatal error, performing link reset +at upstream is not required. The AER driver calls error_detected(dev, +pci_channel_io_normal) to all drivers associated within a hierarchy in +question. for example:: + + EndPoint<==>DownstreamPort B<==>UpstreamPort A<==>RootPort + +If Upstream port A captures an AER error, the hierarchy consists of +Downstream port B and EndPoint. + +A driver may return PCI_ERS_RESULT_CAN_RECOVER, +PCI_ERS_RESULT_DISCONNECT, or PCI_ERS_RESULT_NEED_RESET, depending on +whether it can recover or the AER driver calls mmio_enabled as next. + +If an error message indicates a fatal error, kernel will broadcast +error_detected(dev, pci_channel_io_frozen) to all drivers within +a hierarchy in question. Then, performing link reset at upstream is +necessary. As different kinds of devices might use different approaches +to reset link, AER port service driver is required to provide the +function to reset link. Firstly, kernel looks for if the upstream +component has an aer driver. If it has, kernel uses the reset_link +callback of the aer driver. If the upstream component has no aer driver +and the port is downstream port, we will perform a hot reset as the +default by setting the Secondary Bus Reset bit of the Bridge Control +register associated with the downstream port. As for upstream ports, +they should provide their own aer service drivers with reset_link +function. If error_detected returns PCI_ERS_RESULT_CAN_RECOVER and +reset_link returns PCI_ERS_RESULT_RECOVERED, the error handling goes +to mmio_enabled. + +helper functions +---------------- +:: + + int pci_enable_pcie_error_reporting(struct pci_dev *dev); + +pci_enable_pcie_error_reporting enables the device to send error +messages to root port when an error is detected. Note that devices +don't enable the error reporting by default, so device drivers need +call this function to enable it. + +:: + + int pci_disable_pcie_error_reporting(struct pci_dev *dev); + +pci_disable_pcie_error_reporting disables the device to send error +messages to root port when an error is detected. + +:: + + int pci_cleanup_aer_uncorrect_error_status(struct pci_dev *dev);` + +pci_cleanup_aer_uncorrect_error_status cleanups the uncorrectable +error status register. + +Frequent Asked Questions +------------------------ + +Q: + What happens if a PCI Express device driver does not provide an + error recovery handler (pci_driver->err_handler is equal to NULL)? + +A: + The devices attached with the driver won't be recovered. If the + error is fatal, kernel will print out warning messages. Please refer + to section 3 for more information. + +Q: + What happens if an upstream port service driver does not provide + callback reset_link? + +A: + Fatal error recovery will fail if the errors are reported by the + upstream ports who are attached by the service driver. + +Q: + How does this infrastructure deal with driver that is not PCI + Express aware? + +A: + This infrastructure calls the error callback functions of the + driver when an error happens. But if the driver is not aware of + PCI Express, the device might not report its own errors to root + port. + +Q: + What modifications will that driver need to make it compatible + with the PCI Express AER Root driver? + +A: + It could call the helper functions to enable AER in devices and + cleanup uncorrectable status register. Pls. refer to section 3.3. + + +Software error injection +======================== + +Debugging PCIe AER error recovery code is quite difficult because it +is hard to trigger real hardware errors. Software based error +injection can be used to fake various kinds of PCIe errors. + +First you should enable PCIe AER software error injection in kernel +configuration, that is, following item should be in your .config. + +CONFIG_PCIEAER_INJECT=y or CONFIG_PCIEAER_INJECT=m + +After reboot with new kernel or insert the module, a device file named +/dev/aer_inject should be created. + +Then, you need a user space tool named aer-inject, which can be gotten +from: + + https://git.kernel.org/cgit/linux/kernel/git/gong.chen/aer-inject.git/ + +More information about aer-inject can be found in the document comes +with its source code. |