<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/pci/pcie/err.c, branch linux-5.9.y</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=linux-5.9.y</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=linux-5.9.y'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2020-07-22T20:41:03+00:00</updated>
<entry>
<title>PCI/ERR: Clear PCIe Device Status errors only if OS owns AER</title>
<updated>2020-07-22T20:41:03+00:00</updated>
<author>
<name>Jonathan Cameron</name>
<email>Jonathan.Cameron@huawei.com</email>
</author>
<published>2020-06-22T11:35:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=068c29a248b6ddbfdf7bb150b547569759620d36'/>
<id>urn:sha1:068c29a248b6ddbfdf7bb150b547569759620d36</id>
<content type='text'>
pcie_clear_device_status() resets the error bits in the PCIe Device Status
Register (PCI_EXP_DEVSTA).

Previously we did this unconditionally, but on ACPI systems, the _OSC AER
bit negotiates control of the AER capability.  Per sec 4.5.1 of the System
Firmware Intermediary _OSC and DPC Updates ECN [1], this bit also covers
other error enable/status bits including the following:

  Correctable Error Reporting Enable
  Non-Fatal Error Reporting Enable
  Fatal Error Reporting Enable
  Unsupported Request Reporting Enable

These bits are all in the PCIe Device Control register (the ECN omitted
"Reporting", but I think that's a typo), so by implication the _OSC AER bit
also applies to the error status bits in the PCIe Device Status register:

  Correctable Error Detected
  Non-Fatal Error Detected
  Fatal Error Detected
  Unsupported Request Detected

Clear the PCIe Device Status error bits only when the OS controls the AER
capability and related error enable/status bits.  If platform firmware
controls the AER capability, firmware is responsible for clearing these
bits.

One call path leading here is:

  ghes_do_proc
    ghes_handle_aer
      aer_recover_queue
        schedule_work(&amp;aer_recover_work)
  ...
  aer_recover_work_func
    pcie_do_recovery
      pcie_clear_device_status

[1] System Firmware Intermediary (SFI) _OSC and DPC Updates ECN, Feb 24,
    2020, affecting PCI Firmware Specification, Rev. 3.2
    https://members.pcisig.com/wg/PCI-SIG/document/14076
[bhelgaas: commit log, move test from pcie_clear_device_status() to callers]
Link: https://lore.kernel.org/r/20200622113523.891666-1-Jonathan.Cameron@huawei.com
Signed-off-by: Jonathan Cameron &lt;Jonathan.Cameron@huawei.com&gt;
Signed-off-by: Bjorn Helgaas &lt;bhelgaas@google.com&gt;
</content>
</entry>
<entry>
<title>PCI/ERR: Rename pci_aer_clear_device_status() to pcie_clear_device_status()</title>
<updated>2020-07-22T20:38:35+00:00</updated>
<author>
<name>Bjorn Helgaas</name>
<email>bhelgaas@google.com</email>
</author>
<published>2020-07-16T22:34:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=600a5b4fc8e8f6dad098cd50e0e727cb2a16be46'/>
<id>urn:sha1:600a5b4fc8e8f6dad098cd50e0e727cb2a16be46</id>
<content type='text'>
pci_aer_clear_device_status() clears the error bits in the PCIe Device
Status Register (PCI_EXP_DEVSTA).  Every PCIe device has this register,
regardless of whether it supports AER.

Rename pci_aer_clear_device_status() to pcie_clear_device_status() to make
clear that it is PCIe-specific but not AER-specific.  Move it to
drivers/pci/pci.c, again since it's not AER-specific.  No functional change
intended.

Link: https://lore.kernel.org/r/20200717195619.766662-1-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas &lt;bhelgaas@google.com&gt;
</content>
</entry>
<entry>
<title>PCI: Use 'pci_channel_state_t' instead of 'enum pci_channel_state'</title>
<updated>2020-07-07T22:11:52+00:00</updated>
<author>
<name>Luc Van Oostenryck</name>
<email>luc.vanoostenryck@gmail.com</email>
</author>
<published>2020-07-02T16:26:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=16d79cd4e23b1964d36c041ab027505ceacbbeeb'/>
<id>urn:sha1:16d79cd4e23b1964d36c041ab027505ceacbbeeb</id>
<content type='text'>
The method struct pci_error_handlers.error_detected() is defined and
documented as taking an 'enum pci_channel_state' for the second argument,
but most drivers use 'pci_channel_state_t' instead.

This 'pci_channel_state_t' is not a typedef for the enum but a typedef for
a bitwise type in order to have better/stricter typechecking.

Consolidate everything by using 'pci_channel_state_t' in the method's
definition, in the related helpers and in the drivers.

Enforce use of 'pci_channel_state_t' by replacing 'enum pci_channel_state'
with an anonymous 'enum'.

Note: Currently, from a typechecking point of view this patch changes
nothing because only the constants defined by the enum are bitwise, not the
enum itself (sparse doesn't have the notion of 'bitwise enum'). This may
change in some not too far future, hence the patch.

[bhelgaas: squash in
  https://lore.kernel.org/r/20200702162651.49526-3-luc.vanoostenryck@gmail.com
  https://lore.kernel.org/r/20200702162651.49526-4-luc.vanoostenryck@gmail.com]
Link: https://lore.kernel.org/r/20200702162651.49526-2-luc.vanoostenryck@gmail.com
Signed-off-by: Luc Van Oostenryck &lt;luc.vanoostenryck@gmail.com&gt;
Signed-off-by: Bjorn Helgaas &lt;bhelgaas@google.com&gt;
</content>
</entry>
<entry>
<title>PCI/AER: Rationalize error status register clearing</title>
<updated>2020-03-28T18:19:05+00:00</updated>
<author>
<name>Kuppuswamy Sathyanarayanan</name>
<email>sathyanarayanan.kuppuswamy@linux.intel.com</email>
</author>
<published>2020-03-24T00:26:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=894020fdd88c1e9a74c60b67c0f19f1c7696ba2f'/>
<id>urn:sha1:894020fdd88c1e9a74c60b67c0f19f1c7696ba2f</id>
<content type='text'>
The AER interfaces to clear error status registers were a confusing mess:

  - pci_cleanup_aer_uncorrect_error_status() cleared non-fatal errors
    from the Uncorrectable Error Status register.

  - pci_aer_clear_fatal_status() cleared fatal errors from the
    Uncorrectable Error Status register.

  - pci_cleanup_aer_error_status_regs() cleared the Root Error Status
    register (for Root Ports), the Uncorrectable Error Status register,
    and the Correctable Error Status register.

Rename them to make them consistent:

  From                                     To
  ---------------------------------------- -------------------------------
  pci_cleanup_aer_uncorrect_error_status() pci_aer_clear_nonfatal_status()
  pci_aer_clear_fatal_status()             pci_aer_clear_fatal_status()
  pci_cleanup_aer_error_status_regs()      pci_aer_clear_status()

Since pci_cleanup_aer_error_status_regs() (renamed to
pci_aer_clear_status()) is only used within drivers/pci/, move the
declaration from &lt;linux/aer.h&gt; to drivers/pci/pci.h.

[bhelgaas: commit log, add renames]
Link: https://lore.kernel.org/r/d1310a75dc3d28f7e8da4e99c45fbd3e60fe238e.1585000084.git.sathyanarayanan.kuppuswamy@linux.intel.com
Signed-off-by: Kuppuswamy Sathyanarayanan &lt;sathyanarayanan.kuppuswamy@linux.intel.com&gt;
Signed-off-by: Bjorn Helgaas &lt;bhelgaas@google.com&gt;
</content>
</entry>
<entry>
<title>PCI/ERR: Return status of pcie_do_recovery()</title>
<updated>2020-03-28T18:19:01+00:00</updated>
<author>
<name>Kuppuswamy Sathyanarayanan</name>
<email>sathyanarayanan.kuppuswamy@linux.intel.com</email>
</author>
<published>2020-03-24T00:26:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e8e5ff2aeec19ade42f0535f4b554a3f6e1a58f7'/>
<id>urn:sha1:e8e5ff2aeec19ade42f0535f4b554a3f6e1a58f7</id>
<content type='text'>
As per the DPC Enhancements ECN [1], sec 4.5.1, table 4-4, if the OS
supports Error Disconnect Recover (EDR), it must invalidate the software
state associated with child devices of the port without attempting to
access the child device hardware. In addition, if the OS supports DPC, it
must attempt to recover the child devices if the port implements the DPC
Capability. If the OS continues operation, the OS must inform the firmware
of the status of the recovery operation via the _OST method.

Return the result of pcie_do_recovery() so we can report it to firmware via
_OST.

[1] Downstream Port Containment Related Enhancements ECN, Jan 28, 2019,
    affecting PCI Firmware Specification, Rev. 3.2
    https://members.pcisig.com/wg/PCI-SIG/document/12888

Link: https://lore.kernel.org/r/eb60ec89448769349c6722954ffbf2de163155b5.1585000084.git.sathyanarayanan.kuppuswamy@linux.intel.com
Signed-off-by: Kuppuswamy Sathyanarayanan &lt;sathyanarayanan.kuppuswamy@linux.intel.com&gt;
Signed-off-by: Bjorn Helgaas &lt;bhelgaas@google.com&gt;
</content>
</entry>
<entry>
<title>PCI/ERR: Remove service dependency in pcie_do_recovery()</title>
<updated>2020-03-28T18:18:54+00:00</updated>
<author>
<name>Kuppuswamy Sathyanarayanan</name>
<email>sathyanarayanan.kuppuswamy@linux.intel.com</email>
</author>
<published>2020-03-24T00:26:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b6cf1a42f916af0b056079c37fc5fa7bf8e4b2e2'/>
<id>urn:sha1:b6cf1a42f916af0b056079c37fc5fa7bf8e4b2e2</id>
<content type='text'>
Previously we passed the PCIe service type parameter to pcie_do_recovery(),
where reset_link() looked up the underlying pci_port_service_driver and its
.reset_link() function pointer. Instead of using this roundabout way, we
can just pass the driver-specific .reset_link() callback function when
calling pcie_do_recovery() function.

This allows us to call pcie_do_recovery() from code that is not a PCIe port
service driver, e.g., Error Disconnect Recover (EDR) support.

Remove pcie_port_find_service() and pcie_port_service_driver.reset_link
since they are now unused.

Link: https://lore.kernel.org/r/60e02b87b526cdf2930400059d98704bf0a147d1.1585000084.git.sathyanarayanan.kuppuswamy@linux.intel.com
Signed-off-by: Kuppuswamy Sathyanarayanan &lt;sathyanarayanan.kuppuswamy@linux.intel.com&gt;
Signed-off-by: Bjorn Helgaas &lt;bhelgaas@google.com&gt;
</content>
</entry>
<entry>
<title>PCI/ERR: Update error status after reset_link()</title>
<updated>2020-03-28T16:52:22+00:00</updated>
<author>
<name>Kuppuswamy Sathyanarayanan</name>
<email>sathyanarayanan.kuppuswamy@linux.intel.com</email>
</author>
<published>2020-03-24T00:25:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6d2c89441571ea534d6240f7724f518936c44f8d'/>
<id>urn:sha1:6d2c89441571ea534d6240f7724f518936c44f8d</id>
<content type='text'>
Commit bdb5ac85777d ("PCI/ERR: Handle fatal error recovery") uses
reset_link() to recover from fatal errors.  But during fatal error
recovery, if the initial value of error status is PCI_ERS_RESULT_DISCONNECT
or PCI_ERS_RESULT_NO_AER_DRIVER then even after successful recovery (using
reset_link()) pcie_do_recovery() will report the recovery result as
failure.  Update the status of error after reset_link().

You can reproduce this issue by triggering a SW DPC using "DPC Software
Trigger" bit in "DPC Control Register".  You should see recovery failed
dmesg log as below:

  pcieport 0000:00:16.0: DPC: containment event, status:0x1f27 source:0x0000
  pcieport 0000:00:16.0: DPC: software trigger detected
  pci 0000:04:00.0: AER: can't recover (no error_detected callback)
  pcieport 0000:00:16.0: AER: device recovery failed

Fixes: bdb5ac85777d ("PCI/ERR: Handle fatal error recovery")
Link: https://lore.kernel.org/r/a255fcb3a3fdebcd90f84e08b555f1786eb8eba2.1585000084.git.sathyanarayanan.kuppuswamy@linux.intel.com
[bhelgaas: split pci_channel_io_frozen simplification to separate patch]
Signed-off-by: Kuppuswamy Sathyanarayanan &lt;sathyanarayanan.kuppuswamy@linux.intel.com&gt;
Signed-off-by: Bjorn Helgaas &lt;bhelgaas@google.com&gt;
Acked-by: Keith Busch &lt;keith.busch@intel.com&gt;
Cc: Ashok Raj &lt;ashok.raj@intel.com&gt;
</content>
</entry>
<entry>
<title>PCI/ERR: Combine pci_channel_io_frozen cases</title>
<updated>2020-03-28T16:50:56+00:00</updated>
<author>
<name>Kuppuswamy Sathyanarayanan</name>
<email>sathyanarayanan.kuppuswamy@linux.intel.com</email>
</author>
<published>2020-03-27T22:33:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b5dfbeacf74865a8d62a4f70f501cdc61510f8e0'/>
<id>urn:sha1:b5dfbeacf74865a8d62a4f70f501cdc61510f8e0</id>
<content type='text'>
pcie_do_recovery() had two "if (state == pci_channel_io_frozen)" cases
right after each other.  Combine them to make this easier to read.  No
functional change intended.

Link: https://lore.kernel.org/r/20200317170654.GA23125@infradead.org
[bhelgaas: split from https://lore.kernel.org/r/a255fcb3a3fdebcd90f84e08b555f1786eb8eba2.1585000084.git.sathyanarayanan.kuppuswamy@linux.intel.com]
Signed-off-by: Kuppuswamy Sathyanarayanan &lt;sathyanarayanan.kuppuswamy@linux.intel.com&gt;
Signed-off-by: Bjorn Helgaas &lt;bhelgaas@google.com&gt;
</content>
</entry>
<entry>
<title>PCI/AER: Factor message prefixes with dev_fmt()</title>
<updated>2020-01-23T22:39:53+00:00</updated>
<author>
<name>Bjorn Helgaas</name>
<email>bhelgaas@google.com</email>
</author>
<published>2019-12-13T22:46:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8d077c3ce0109c406c265cafc334258caee47e6d'/>
<id>urn:sha1:8d077c3ce0109c406c265cafc334258caee47e6d</id>
<content type='text'>
Define dev_fmt() with the common prefix of log messages so we don't have to
repeat it in every printk.  No functional change intended.

Link: https://lore.kernel.org/r/20191213225709.GA213811@google.com
Signed-off-by: Bjorn Helgaas &lt;bhelgaas@google.com&gt;
</content>
</entry>
<entry>
<title>PCI/AER: Log which device prevents error recovery</title>
<updated>2020-01-23T22:39:02+00:00</updated>
<author>
<name>Yicong Yang</name>
<email>yangyicong@hisilicon.com</email>
</author>
<published>2019-12-13T11:44:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=01daacfb9035e5b86d43a01f11a0614648f306c1'/>
<id>urn:sha1:01daacfb9035e5b86d43a01f11a0614648f306c1</id>
<content type='text'>
PCI error recovery will fail if any device under the Root Port doesn't have
an error_detected callback.  Currently only the failure result is printed,
which is not enough to identify the driver that lacks the callback.

Log a message to identify the device with no error_detected callback.

[bhelgaas: tweak log message]
Link: https://lore.kernel.org/r/1576237474-32021-1-git-send-email-yangyicong@hisilicon.com
Signed-off-by: Yicong Yang &lt;yangyicong@hisilicon.com&gt;
Signed-off-by: Bjorn Helgaas &lt;bhelgaas@google.com&gt;
</content>
</entry>
</feed>
