<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/pci/hotplug, branch v5.4.113</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v5.4.113</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v5.4.113'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2021-03-24T10:26:43+00:00</updated>
<entry>
<title>PCI: rpadlpar: Fix potential drc_name corruption in store functions</title>
<updated>2021-03-24T10:26:43+00:00</updated>
<author>
<name>Tyrel Datwyler</name>
<email>tyreld@linux.ibm.com</email>
</author>
<published>2021-03-15T21:48:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=51a2b19b554c8c75ee2d253b87240309cd81f1fc'/>
<id>urn:sha1:51a2b19b554c8c75ee2d253b87240309cd81f1fc</id>
<content type='text'>
commit cc7a0bb058b85ea03db87169c60c7cfdd5d34678 upstream.

Both add_slot_store() and remove_slot_store() try to fix up the
drc_name copied from the store buffer by placing a NUL terminator at
nbyte + 1 or in place of a '\n' if present. However, the static buffer
that we copy the drc_name data into is not zeroed and can contain
anything past the n-th byte.

This is problematic if a '\n' byte appears in that buffer after nbytes
and the string copied into the store buffer was not NUL terminated to
start with as the strchr() search for a '\n' byte will mark this
incorrectly as the end of the drc_name string resulting in a drc_name
string that contains garbage data after the n-th byte.

Additionally it will cause us to overwrite that '\n' byte on the stack
with NUL, potentially corrupting data on the stack.

The following debugging shows an example of the drmgr utility writing
"PHB 4543" to the add_slot sysfs attribute, but add_slot_store()
logging a corrupted string value.

  drmgr: drmgr: -c phb -a -s PHB 4543 -d 1
  add_slot_store: drc_name = PHB 4543°|&lt;82&gt;!, rc = -19

Fix this by using strscpy() instead of memcpy() to ensure the string
is NUL terminated when copied into the static drc_name buffer.
Further, since the string is now NUL terminated the code only needs to
change '\n' to '\0' when present.

Cc: stable@vger.kernel.org
Signed-off-by: Tyrel Datwyler &lt;tyreld@linux.ibm.com&gt;
[mpe: Reformat change log and add mention of possible stack corruption]
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/20210315214821.452959-1-tyreld@linux.ibm.com
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>PCI: pciehp: Fix MSI interrupt race</title>
<updated>2020-10-01T11:17:52+00:00</updated>
<author>
<name>Stuart Hayes</name>
<email>stuart.w.hayes@gmail.com</email>
</author>
<published>2020-02-19T14:31:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4667358dab9cc07da044d5bc087065545b1000df'/>
<id>urn:sha1:4667358dab9cc07da044d5bc087065545b1000df</id>
<content type='text'>
[ Upstream commit 8edf5332c39340b9583cf9cba659eb7ec71f75b5 ]

Without this commit, a PCIe hotplug port can stop generating interrupts on
hotplug events, so device adds and removals will not be seen:

The pciehp interrupt handler pciehp_isr() reads the Slot Status register
and then writes back to it to clear the bits that caused the interrupt.  If
a different interrupt event bit gets set between the read and the write,
pciehp_isr() returns without having cleared all of the interrupt event
bits.  If this happens when the MSI isn't masked (which by default it isn't
in handle_edge_irq(), and which it will never be when MSI per-vector
masking is not supported), we won't get any more hotplug interrupts from
that device.

That is expected behavior, according to the PCIe Base Spec r5.0, section
6.7.3.4, "Software Notification of Hot-Plug Events".

Because the Presence Detect Changed and Data Link Layer State Changed event
bits can both get set at nearly the same time when a device is added or
removed, this is more likely to happen than it might seem.  The issue was
found (and can be reproduced rather easily) by connecting and disconnecting
an NVMe storage device on at least one system model where the NVMe devices
were being connected to an AMD PCIe port (PCI device 0x1022/0x1483).

Fix the issue by modifying pciehp_isr() to loop back and re-read the Slot
Status register immediately after writing to it, until it sees that all of
the event status bits have been cleared.

[lukas: drop loop count limitation, write "events" instead of "status",
don't loop back in INTx and poll modes, tweak code comment &amp; commit msg]
Link: https://lore.kernel.org/r/78b4ced5072bfe6e369d20e8b47c279b8c7af12e.1582121613.git.lukas@wunner.de
Tested-by: Stuart Hayes &lt;stuart.w.hayes@gmail.com&gt;
Signed-off-by: Stuart Hayes &lt;stuart.w.hayes@gmail.com&gt;
Signed-off-by: Lukas Wunner &lt;lukas@wunner.de&gt;
Signed-off-by: Bjorn Helgaas &lt;bhelgaas@google.com&gt;
Reviewed-by: Joerg Roedel &lt;jroedel@suse.de&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>PCI: hotplug: ACPI: Fix context refcounting in acpiphp_grab_context()</title>
<updated>2020-08-21T11:05:20+00:00</updated>
<author>
<name>Rafael J. Wysocki</name>
<email>rafael.j.wysocki@intel.com</email>
</author>
<published>2020-06-26T17:42:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ae86233204bab50945c1c18b8664ddae00517748'/>
<id>urn:sha1:ae86233204bab50945c1c18b8664ddae00517748</id>
<content type='text'>
commit dae68d7fd4930315389117e9da35b763f12238f9 upstream.

If context is not NULL in acpiphp_grab_context(), but the
is_going_away flag is set for the device's parent, the reference
counter of the context needs to be decremented before returning
NULL or the context will never be freed, so make that happen.

Fixes: edf5bf34d408 ("ACPI / dock: Use callback pointers from devices' ACPI hotplug contexts")
Reported-by: Vasily Averin &lt;vvs@virtuozzo.com&gt;
Cc: 3.15+ &lt;stable@vger.kernel.org&gt; # 3.15+
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>PCI: pciehp: Prevent deadlock on disconnect</title>
<updated>2020-04-29T14:33:04+00:00</updated>
<author>
<name>Mika Westerberg</name>
<email>mika.westerberg@linux.intel.com</email>
</author>
<published>2019-10-29T17:00:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=01fad934f1bdca50a952b3347f4e234a3e1e4ff8'/>
<id>urn:sha1:01fad934f1bdca50a952b3347f4e234a3e1e4ff8</id>
<content type='text'>
[ Upstream commit 87d0f2a5536fdf5053a6d341880f96135549a644 ]

This addresses deadlocks in these common cases in hierarchies containing
two switches:

  - All involved ports are runtime suspended and they are unplugged. This
    can happen easily if the drivers involved automatically enable runtime
    PM (xHCI for example does that).

  - System is suspended (e.g., closing the lid on a laptop) with a dock +
    something else connected, and the dock is unplugged while suspended.

These cases lead to the following deadlock:

  INFO: task irq/126-pciehp:198 blocked for more than 120 seconds.
  irq/126-pciehp  D    0   198      2 0x80000000
  Call Trace:
   schedule+0x2c/0x80
   schedule_timeout+0x246/0x350
   wait_for_completion+0xb7/0x140
   kthread_stop+0x49/0x110
   free_irq+0x32/0x70
   pcie_shutdown_notification+0x2f/0x50
   pciehp_remove+0x27/0x50
   pcie_port_remove_service+0x36/0x50
   device_release_driver+0x12/0x20
   bus_remove_device+0xec/0x160
   device_del+0x13b/0x350
   device_unregister+0x1a/0x60
   remove_iter+0x1e/0x30
   device_for_each_child+0x56/0x90
   pcie_port_device_remove+0x22/0x40
   pcie_portdrv_remove+0x20/0x60
   pci_device_remove+0x3e/0xc0
   device_release_driver_internal+0x18c/0x250
   device_release_driver+0x12/0x20
   pci_stop_bus_device+0x6f/0x90
   pci_stop_bus_device+0x31/0x90
   pci_stop_and_remove_bus_device+0x12/0x20
   pciehp_unconfigure_device+0x88/0x140
   pciehp_disable_slot+0x6a/0x110
   pciehp_handle_presence_or_link_change+0x263/0x400
   pciehp_ist+0x1c9/0x1d0
   irq_thread_fn+0x24/0x60
   irq_thread+0xeb/0x190
   kthread+0x120/0x140

  INFO: task irq/190-pciehp:2288 blocked for more than 120 seconds.
  irq/190-pciehp  D    0  2288      2 0x80000000
  Call Trace:
   __schedule+0x2a2/0x880
   schedule+0x2c/0x80
   schedule_preempt_disabled+0xe/0x10
   mutex_lock+0x2c/0x30
   pci_lock_rescan_remove+0x15/0x20
   pciehp_unconfigure_device+0x4d/0x140
   pciehp_disable_slot+0x6a/0x110
   pciehp_handle_presence_or_link_change+0x263/0x400
   pciehp_ist+0x1c9/0x1d0
   irq_thread_fn+0x24/0x60
   irq_thread+0xeb/0x190
   kthread+0x120/0x140

What happens here is that the whole hierarchy is runtime resumed and the
parent PCIe downstream port, which got the hot-remove event, starts
removing devices below it, taking pci_lock_rescan_remove() lock. When the
child PCIe port is runtime resumed it calls pciehp_check_presence() which
ends up calling pciehp_card_present() and pciehp_check_link_active().  Both
of these use pcie_capability_read_word(), which notices that the underlying
device is already gone and returns PCIBIOS_DEVICE_NOT_FOUND with the
capability value set to 0. When pciehp gets this value it thinks that its
child device is also hot-removed and schedules its IRQ thread to handle the
event.

The deadlock happens when the child's IRQ thread runs and tries to acquire
pci_lock_rescan_remove() which is already taken by the parent and the
parent waits for the child's IRQ thread to finish.

Prevent this from happening by checking the return value of
pcie_capability_read_word() and if it is PCIBIOS_DEVICE_NOT_FOUND stop
performing any hot-removal activities.

[bhelgaas: add common scenarios to commit log]
Link: https://lore.kernel.org/r/20191029170022.57528-2-mika.westerberg@linux.intel.com
Tested-by: Kai-Heng Feng &lt;kai.heng.feng@canonical.com&gt;
Signed-off-by: Mika Westerberg &lt;mika.westerberg@linux.intel.com&gt;
Signed-off-by: Bjorn Helgaas &lt;bhelgaas@google.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>PCI: pciehp: Fix indefinite wait on sysfs requests</title>
<updated>2020-04-17T08:50:10+00:00</updated>
<author>
<name>Lukas Wunner</name>
<email>lukas@wunner.de</email>
</author>
<published>2020-03-18T11:33:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=463181e64f5fe0ea4bf97e9e8d081b524f84c168'/>
<id>urn:sha1:463181e64f5fe0ea4bf97e9e8d081b524f84c168</id>
<content type='text'>
commit 3e487d2e4aa466decd226353755c9d423e8fbacc upstream.

David Hoyer reports that powering pciehp slots up or down via sysfs may
hang:  The call to wait_event() in pciehp_sysfs_enable_slot() and
_disable_slot() does not return because ctrl-&gt;ist_running remains true.

This flag, which was introduced by commit 157c1062fcd8 ("PCI: pciehp: Avoid
returning prematurely from sysfs requests"), signifies that the IRQ thread
pciehp_ist() is running.  It is set to true at the top of pciehp_ist() and
reset to false at the end.  However there are two additional return
statements in pciehp_ist() before which the commit neglected to reset the
flag to false and wake up waiters for the flag.

That omission opens up the following race when powering up the slot:

* pciehp_ist() runs because a PCI_EXP_SLTSTA_PDC event was requested
  by pciehp_sysfs_enable_slot()

* pciehp_ist() turns on slot power via the following call stack:
  pciehp_handle_presence_or_link_change() -&gt; pciehp_enable_slot() -&gt;
  __pciehp_enable_slot() -&gt; board_added() -&gt; pciehp_power_on_slot()

* after slot power is turned on, the link comes up, resulting in a
  PCI_EXP_SLTSTA_DLLSC event

* the IRQ handler pciehp_isr() stores the event in ctrl-&gt;pending_events
  and returns IRQ_WAKE_THREAD

* the IRQ thread is already woken (it's bringing up the slot), but the
  genirq code remembers to re-run the IRQ thread after it has finished
  (such that it can deal with the new event) by setting IRQTF_RUNTHREAD
  via __handle_irq_event_percpu() -&gt; __irq_wake_thread()

* the IRQ thread removes PCI_EXP_SLTSTA_DLLSC from ctrl-&gt;pending_events
  via board_added() -&gt; pciehp_check_link_status() in order to deal with
  presence and link flaps per commit 6c35a1ac3da6 ("PCI: pciehp:
  Tolerate initially unstable link")

* after pciehp_ist() has successfully brought up the slot, it resets
  ctrl-&gt;ist_running to false and wakes up the sysfs requester

* the genirq code re-runs pciehp_ist(), which sets ctrl-&gt;ist_running
  to true but then returns with IRQ_NONE because ctrl-&gt;pending_events
  is empty

* pciehp_sysfs_enable_slot() is finally woken but notices that
  ctrl-&gt;ist_running is true, hence continues waiting

The only way to get the hung task going again is to trigger a hotplug
event which brings down the slot, e.g. by yanking out the card.

The same race exists when powering down the slot because remove_board()
likewise clears link or presence changes in ctrl-&gt;pending_events per commit
3943af9d01e9 ("PCI: pciehp: Ignore Link State Changes after powering off a
slot") and thereby may cause a re-run of pciehp_ist() which returns with
IRQ_NONE without resetting ctrl-&gt;ist_running to false.

Fix by adding a goto label before the teardown steps at the end of
pciehp_ist() and jumping to that label from the two return statements which
currently neglect to reset the ctrl-&gt;ist_running flag.

Fixes: 157c1062fcd8 ("PCI: pciehp: Avoid returning prematurely from sysfs requests")
Link: https://lore.kernel.org/r/cca1effa488065cb055120aa01b65719094bdcb5.1584530321.git.lukas@wunner.de
Reported-by: David Hoyer &lt;David.Hoyer@netapp.com&gt;
Signed-off-by: Lukas Wunner &lt;lukas@wunner.de&gt;
Signed-off-by: Bjorn Helgaas &lt;bhelgaas@google.com&gt;
Reviewed-by: Keith Busch &lt;kbusch@kernel.org&gt;
Cc: stable@vger.kernel.org	# v4.19+
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>PCI: pciehp: Do not disable interrupt twice on suspend</title>
<updated>2020-01-17T18:48:51+00:00</updated>
<author>
<name>Mika Westerberg</name>
<email>mika.westerberg@linux.intel.com</email>
</author>
<published>2019-10-29T17:00:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ee30108f7a005f5cd4a07b921deb99c885109bc2'/>
<id>urn:sha1:ee30108f7a005f5cd4a07b921deb99c885109bc2</id>
<content type='text'>
commit 75fcc0ce72e5cea2e357cdde858216c5bad40442 upstream.

We try to keep PCIe hotplug ports runtime suspended when entering system
suspend. Because the PCIe portdrv sets the DPM_FLAG_NEVER_SKIP flag, the PM
core always calls system suspend/resume hooks even if the device is left
runtime suspended. Since PCIe hotplug driver re-used the same function for
both runtime suspend and system suspend, it ended up disabling hotplug
interrupt twice and the second time following was printed:

  pciehp 0000:03:01.0:pcie204: pcie_do_write_cmd: no response from device

Prevent this from happening by checking whether the device is already
runtime suspended when the system suspend hook is called.

Fixes: 9c62f0bfb832 ("PCI: pciehp: Implement runtime PM callbacks")
Link: https://lore.kernel.org/r/20191029170022.57528-1-mika.westerberg@linux.intel.com
Reported-by: Kai-Heng Feng &lt;kai.heng.feng@canonical.com&gt;
Tested-by: Kai-Heng Feng &lt;kai.heng.feng@canonical.com&gt;
Signed-off-by: Mika Westerberg &lt;mika.westerberg@linux.intel.com&gt;
Signed-off-by: Bjorn Helgaas &lt;bhelgaas@google.com&gt;
Reviewed-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>PCI: rpaphp: Correctly match ibm, my-drc-index to drc-name when using drc-info</title>
<updated>2020-01-04T18:18:00+00:00</updated>
<author>
<name>Tyrel Datwyler</name>
<email>tyreld@linux.ibm.com</email>
</author>
<published>2019-11-11T05:21:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=03c90248c574b002a588a97187000179db6aef7d'/>
<id>urn:sha1:03c90248c574b002a588a97187000179db6aef7d</id>
<content type='text'>
[ Upstream commit 4f9f2d3d7a434b7f882b72550194c9278f4a3925 ]

The newer ibm,drc-info property is a condensed description of the old
ibm,drc-* properties (ie. names, types, indexes, and power-domains).
When matching a drc-index to a drc-name we need to verify that the
index is within the start and last drc-index range and map it to a
drc-name using the drc-name-prefix and logical index.

Fix the mapping by checking that the index is within the range of the
current drc-info entry, and build the name from the drc-name-prefix
concatenated with the starting drc-name-suffix value and the sequential
index obtained by subtracting ibm,my-drc-index from this entries
drc-start-index.

Signed-off-by: Tyrel Datwyler &lt;tyreld@linux.ibm.com&gt;
Acked-by: Bjorn Helgaas &lt;bhelgaas@google.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/1573449697-5448-10-git-send-email-tyreld@linux.ibm.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>PCI: rpaphp: Annotate and correctly byte swap DRC properties</title>
<updated>2020-01-04T18:17:57+00:00</updated>
<author>
<name>Tyrel Datwyler</name>
<email>tyreld@linux.ibm.com</email>
</author>
<published>2019-11-11T05:21:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1b49ded4448ac0153656c44d1165e9f970714bb8'/>
<id>urn:sha1:1b49ded4448ac0153656c44d1165e9f970714bb8</id>
<content type='text'>
[ Upstream commit 0737686778c6dbe0908d684dd5b9c05b127526ba ]

The device tree is in big endian format and any properties directly
retrieved using OF helpers that don't explicitly byte swap should
be annotated. In particular there are several places where we grab
the opaque property value for the old ibm,drc-* properties and the
ibm,my-drc-index property.

Fix this for better static checking by annotating values we know to
explicitly big endian, and byte swap where appropriate.

Signed-off-by: Tyrel Datwyler &lt;tyreld@linux.ibm.com&gt;
Acked-by: Bjorn Helgaas &lt;bhelgaas@google.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/1573449697-5448-9-git-send-email-tyreld@linux.ibm.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>PCI: rpaphp: Don't rely on firmware feature to imply drc-info support</title>
<updated>2020-01-04T18:17:55+00:00</updated>
<author>
<name>Tyrel Datwyler</name>
<email>tyreld@linux.ibm.com</email>
</author>
<published>2019-11-11T05:21:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7aae44bbc91b90ae2b9fba573e51469ea4e1c7e4'/>
<id>urn:sha1:7aae44bbc91b90ae2b9fba573e51469ea4e1c7e4</id>
<content type='text'>
[ Upstream commit 52e2b0f16574afd082cff0f0e8567b2d9f68c033 ]

In the event that the partition is migrated to a platform with older
firmware that doesn't support the ibm,drc-info property the device
tree is modified to remove the ibm,drc-info property and replace it
with the older style ibm,drc-* properties for types, names, indexes,
and power-domains. One of the requirements of the drc-info firmware
feature is that the client is able to handle both the new property,
and old style properties at runtime. Therefore we can't rely on the
firmware feature alone to dictate which property is currently
present in the device tree.

Fix this short coming by checking explicitly for the ibm,drc-info
property, and falling back to the older ibm,drc-* properties if it
doesn't exist.

Signed-off-by: Tyrel Datwyler &lt;tyreld@linux.ibm.com&gt;
Acked-by: Bjorn Helgaas &lt;bhelgaas@google.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/1573449697-5448-6-git-send-email-tyreld@linux.ibm.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>PCI: rpaphp: Fix up pointer to first drc-info entry</title>
<updated>2020-01-04T18:17:44+00:00</updated>
<author>
<name>Tyrel Datwyler</name>
<email>tyreld@linux.ibm.com</email>
</author>
<published>2019-11-11T05:21:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b8361f9a864d954e563528f27769fae66f1881db'/>
<id>urn:sha1:b8361f9a864d954e563528f27769fae66f1881db</id>
<content type='text'>
[ Upstream commit 9723c25f99aff0451cfe6392e1b9fdd99d0bf9f0 ]

The first entry of the ibm,drc-info property is an int encoded count
of the number of drc-info entries that follow. The "value" pointer
returned by of_prop_next_u32() is still pointing at the this value
when we call of_read_drc_info_cell(), but the helper function
expects that value to be pointing at the first element of an entry.

Fix up by incrementing the "value" pointer to point at the first
element of the first drc-info entry prior.

Signed-off-by: Tyrel Datwyler &lt;tyreld@linux.ibm.com&gt;
Acked-by: Bjorn Helgaas &lt;bhelgaas@google.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/1573449697-5448-5-git-send-email-tyreld@linux.ibm.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
</feed>
