kernel/linux.git/drivers/misc/cxl, branch v3.18.76

cxl: Keep IRQ mappings on context teardown

2016-05-17T18:32:43+00:00

[ Upstream commit d6776bba44d9752f6cdf640046070e71ee4bba7b ] Keep IRQ mappings on context teardown. This won't leak IRQs as if we allocate the mapping again, the generic code will give the same mapping used last time. Doing this works around a race in the generic code. Masking the interrupt introduces a race which can crash the kernel or result in IRQ that is never EOIed. The lost of EOI results in all subsequent mappings to the same HW IRQ never receiving an interrupt. We've seen this race with cxl test cases which are doing heavy context startup and teardown at the same time as heavy interrupt load. A fix to the generic code is being investigated also. Signed-off-by: Michael Neuling Cc: stable@vger.kernel.org # 3.8 Tested-by: Andrew Donnellan Acked-by: Ian Munsie Tested-by: Vaibhav Jain Signed-off-by: Michael Ellerman Signed-off-by: Sasha Levin

cxl: Fix unbalanced pci_dev_get in cxl_probe

2015-10-27T13:33:08+00:00

[ Upstream commit 2925c2fdf1e0eb642482f5b30577e9435aaa8edb ] Currently the first thing we do in cxl_probe is to grab a reference on the pci device. Later on, we call device_register on our adapter. In our remove path, we call device_unregister, but we never call pci_dev_put. We therefore leak the device every time we do a reflash. device_register/unregister is sufficient to hold the reference. Therefore, drop the call to pci_dev_get. Here's why this is safe. The proposed cxl_probe(pdev) calls cxl_adapter_init: a) init calls cxl_adapter_alloc, which creates a struct cxl, conventionally called adapter. This struct contains a device entry, adapter->dev. b) init calls cxl_configure_adapter, where we set adapter->dev.parent = &dev->dev (here dev is the pci dev) So at this point, the cxl adapter's device's parent is the PCI device that I want to be refcounted properly. c) init calls cxl_register_adapter *) cxl_register_adapter calls device_register(&adapter->dev) So now we're in device_register, where dev is the adapter device, and we want to know if the PCI device is safe after we return. device_register(&adapter->dev) calls device_initialize() and then device_add(). device_add() does a get_device(). device_add() also explicitly grabs the device's parent, and calls get_device() on it: parent = get_device(dev->parent); So therefore, device_register() takes a lock on the parent PCI dev, which is what pci_dev_get() was guarding. pci_dev_get() can therefore be safely removed. Fixes: f204e0b8cedd ("cxl: Driver code for powernv PCIe based cards for userspace access") Cc: stable@vger.kernel.org Signed-off-by: Daniel Axtens Acked-by: Ian Munsie Signed-off-by: Michael Ellerman Signed-off-by: Sasha Levin

cxl: Add missing return statement after handling AFU errror

2015-03-24T01:02:54+00:00

commit a6130ed253a931d2169c26ab0958d81b0dce4d6e upstream. We were missing a return statement in the PSL interrupt handler in the case of an AFU error, which would trigger an "Unhandled CXL PSL IRQ" warning. We do actually handle these type of errors (by notifying userspace), so add the missing return IRQ_HANDLED so we don't throw unecessary warnings. Signed-off-by: Ian Munsie Signed-off-by: Michael Ellerman Signed-off-by: Greg Kroah-Hartman

cxl: Fix device_node reference counting

2015-03-24T01:02:54+00:00

commit 6f963ec2d6bf2476a16799eece920acb2100ff1c upstream. When unbinding and rebinding the driver on a system with a card in PHB0, this error condition is reached after a few attempts: ERROR: Bad of_node_put() on /pciex@3fffe40000000 CPU: 0 PID: 3040 Comm: bash Not tainted 3.18.0-rc3-12545-g3627ffe #152 Call Trace: [c000000721acb5c0] [c00000000086ef94] .dump_stack+0x84/0xb0 (unreliable) [c000000721acb640] [c00000000073a0a8] .of_node_release+0xd8/0xe0 [c000000721acb6d0] [c00000000044bc44] .kobject_release+0x74/0xe0 [c000000721acb760] [c0000000007394fc] .of_node_put+0x1c/0x30 [c000000721acb7d0] [c000000000545cd8] .cxl_probe+0x1a98/0x1d50 [c000000721acb900] [c0000000004845a0] .local_pci_probe+0x40/0xc0 [c000000721acb980] [c000000000484998] .pci_device_probe+0x128/0x170 [c000000721acba30] [c00000000052400c] .driver_probe_device+0xac/0x2a0 [c000000721acbad0] [c000000000522468] .bind_store+0x108/0x160 [c000000721acbb70] [c000000000521448] .drv_attr_store+0x38/0x60 [c000000721acbbe0] [c000000000293840] .sysfs_kf_write+0x60/0xa0 [c000000721acbc50] [c000000000292500] .kernfs_fop_write+0x140/0x1d0 [c000000721acbcf0] [c000000000208648] .vfs_write+0xd8/0x260 [c000000721acbd90] [c000000000208b18] .SyS_write+0x58/0x100 [c000000721acbe30] [c000000000009258] syscall_exit+0x0/0x98 We are missing a call to of_node_get(). pnv_pci_to_phb_node() should call of_node_get() otherwise np's reference count isn't incremented and it might go away. Rename pnv_pci_to_phb_node() to pnv_pci_get_phb_node() so it's clear it calls of_node_get(). Signed-off-by: Ryan Grimm Acked-by: Ian Munsie Signed-off-by: Michael Ellerman Signed-off-by: Greg Kroah-Hartman

cxl: Use image state defaults for reloading FPGA

2015-03-24T01:02:54+00:00

commit 4beb5421babee1204757b877622830c6aa31be6d upstream. Select defaults such that a PERST causes flash image reload. Select which image based on what the card is set up to load. CXL_VSEC_PERST_LOADS_IMAGE selects whether PERST assertion causes flash image load. CXL_VSEC_PERST_SELECT_USER selects which image is loaded on the next PERST. cxl_update_image_control writes these bits into the VSEC. Signed-off-by: Ryan Grimm Acked-by: Ian Munsie Signed-off-by: Michael Ellerman Signed-off-by: Greg Kroah-Hartman

cxl: Unmap MMIO regions when detaching a context

2015-01-27T16:29:36+00:00

commit b123429e6a9e8d03aacf888d23262835f0081448 upstream. If we need to force detach a context (e.g. due to EEH or simply force unbinding the driver) we should prevent the userspace contexts from being able to access the Problem State Area MMIO region further, which they may have mapped with mmap(). This patch unmaps any mapped MMIO regions when detaching a userspace context. Signed-off-by: Ian Munsie Signed-off-by: Michael Ellerman Signed-off-by: Greg Kroah-Hartman

cxl: Add timeout to process element commands

2015-01-27T16:29:36+00:00

commit a98e6e9f4e0224d85b4d951edc44af16dfe6094a upstream. In the event that something goes wrong in the hardware and it is unable to complete a process element comment we would end up polling forever, effectively making the associated process unkillable. This patch adds a timeout to the process element command code path, so that we will give up if the hardware does not respond in a reasonable time. Signed-off-by: Ian Munsie Signed-off-by: Michael Ellerman Signed-off-by: Greg Kroah-Hartman

cxl: Change contexts_lock to a mutex to fix sleep while atomic bug

2015-01-27T16:29:36+00:00

commit ee41d11d53c8fc4968f0816504651541d606cf40 upstream. We had a known sleep while atomic bug if a CXL device was forcefully unbound while it was in use. This could occur as a result of EEH, or manually induced with something like this while the device was in use: echo 0000:01:00.0 > /sys/bus/pci/drivers/cxl-pci/unbind The issue was that in this code path we iterated over each context and forcefully detached it with the contexts_lock spin lock held, however the detach also needed to take the spu_mutex, and call schedule. This patch changes the contexts_lock to a mutex so that we are not in atomic context while doing the detach, thereby avoiding the sleep while atomic. Also delete the related TODO comment, which suggested an alternate solution which turned out to not be workable. Signed-off-by: Ian Munsie Signed-off-by: Michael Ellerman Signed-off-by: Greg Kroah-Hartman

cxl: Fix PSL error due to duplicate segment table entries

2014-10-28T08:52:52+00:00

In certain circumstances the PSL (Power Service Layer, which provides translation services for CXL hardware) can send an interrupt for a segment miss that the kernel has already handled. This can happen if multiple translations for the same segment are queued in the PSL before the kernel has restarted the first translation. The CXL driver does not expect this situation and does not check if a segment had already been handled. This could cause a duplicate segment table entry which in turn caused a PSL error taking down the card. This patch fixes the issue by checking for existing entries in the segment table that match the segment we are trying to insert, so as to avoid inserting duplicate entries. Signed-off-by: Ian Munsie Reviewed-by: Aneesh Kumar K.V Signed-off-by: Michael Ellerman

cxl: Refactor cxl_load_segment() and find_free_sste()

2014-10-28T08:52:24+00:00

This moves the segment table hash calculation from cxl_load_segment() into find_free_sste() since that is the only place it is actually used. Signed-off-by: Ian Munsie Reviewed-by: Aneesh Kumar K.V Signed-off-by: Michael Ellerman