summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2016-03-22Merge branches 'i40iw', 'sriov' and 'hfi1' into k.o/for-4.6Doug Ledford224-17223/+24043
2016-03-22IB/ipoib: Allow mcast packets from other VFsEli Cohen1-7/+20
With SRIOV enabled, two VFs on the same HCA which have the same port LID and may have the same QP number. To enable receiving multicasts from such VFs, further qualify the check: ignore the receive only if, in addition, the packet source gid equals the receiving VF's source gid. Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-22IB/mlx5: Implement callbacks for manipulating VFsEli Cohen6-2/+223
Implement the IB defined callbacks used to manipulate the policy for the link state, set GUIDs or get statistics information. This functionality is added into a new file that will be used to add any SRIOV related functionality to the mlx5 IB layer. The following callbacks have been added: mlx5_ib_get_vf_config mlx5_ib_set_vf_link_state mlx5_ib_get_vf_stats mlx5_ib_set_vf_guid In addition, publish whether this device is based on a virtual function. In mlx5 supported devices, virtual functions are implemented as vHCAs. vHCAs have their own QP number space so it is possible that two vHCAs will use a QP with the same number at the same time. Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-22net/mlx5_core: Implement modify HCA vport commandEli Cohen3-0/+77
Implement the modify HCA vport commands used to modify the parameters of virtual HCA's ports. Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-22net/mlx5_core: Add VF param when querying vport counterEli Cohen3-4/+6
Add a vf parameter to mlx5_core_query_vport_counter so we can call it to query counters of virtual functions. Also update current users of the API. PFs may call mlx5_core_query_vport_counter with other_vport set to indicate that they are querying a virtual function. The virtual function to be queried is given by the vf parameter. Virtual function numbering is zero based so the first VF is 0 and so on. When a PF queries its own function, the other_vport parameter is cleared. Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-22IB/ipoib: Add ndo operations for configuring VFsEli Cohen1-2/+63
Add ndo operations to the network driver that enables configuring the following operations: ipoib_set_vf_link_state - configure the VF link policy ipoib_get_vf_config - get link state configuration ipoib_set_vf_guid - set a VF port or node GUID ipoib_get_vf_stats - get statistics of a VF Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-22IB/core: Add interfaces to control VF attributesEli Cohen2-0/+59
Following the practice exercised for network devices which allow the PF net device to configure attributes of its virtual functions, we introduce the following functions to be used by IPoIB which is the network driver implementation for IB devices. ib_set_vf_link_state - set the policy for a VF link. More below. ib_get_vf_config - read configuration information of a VF ib_get_vf_stats - read VF statistics ib_set_vf_guid - set the node or port GUID of a VF Also add an indication in the device cap flags that indicates that this IB devices is based on a virtual function. A VF shares the physical port with the PF and other VFs. When setting the link state we have three options: 1. Auto - in this mode, the virtual port follows the state of the physical port and becomes active only if the physical port's state is active. In all other cases it remains in a Down state. 2. Down - sets the state of the virtual port to Down 3. Up - causes the virtual port to transition into Initialize state if it was not already in this state. A virtualization aware subnet manager can then bring the state of the port into the Active state. Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21IB/core: Support accessing SA in virtualized environmentEli Cohen2-0/+11
Per the ongoing standardisation process, when virtual HCAs are present in a network, traffic is routed based on a destination GID. In order to access the SA we use the well known SA GID. We also add a GRH required boolean field to the port attributes which is used to report to the verbs consumer whether this port is connected to a virtual network. We use this field to realize whether we need to create an address vector with GRH to access the subnet administrator. We clear the port attributes struct before calling the hardware driver to make sure the default remains that GRH is not required. Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21IB/core: Add subnet prefix to port infoEli Cohen2-1/+15
The subnet prefix is a part of the port_info MAD returned and should be available at the ib_port_attr struct. We define it here and provide a default implementation in case the hardware driver does not provide one. The subnet prefix is required when creating the address vector to access the SA in networks where GRH must be used. Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21IB/mlx5: Fix decision on using MAD_IFCEli Cohen1-1/+1
Fix the condition that dictates when MAD_IFC should be used. According to firmware specifications, MAD_IFC commands must be used only if the ib_virt capability is off. Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21net/core: Add support for configuring VF GUIDsEli Cohen3-0/+46
Add two new NLAs to support configuration of Infiniband node or port GUIDs. New applications can choose to use this interface to configure GUIDs with iproute2 with commands such as: ip link set dev ib0 vf 0 node_guid 00:02:c9:03:00:21:6e:70 ip link set dev ib0 vf 0 port_guid 00:02:c9:03:00:21:6e:78 A new ndo, ndo_sef_vf_guid is introduced to notify the net device of the request to change the GUID. Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21IB/{core, ulp} Support above 32 possible device capability flagsLeon Romanovsky3-3/+3
The old bitwise device_cap_flags variable was limited to u32 which has all bits already defined. In order to overcome it, we converted device_cap_flags variable to be u64 type. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21IB/core: Replace setting the zero values in ib_uverbs_ex_query_deviceLeon Romanovsky1-12/+3
The setting to zero during variable initialization eliminates the need to explicitly set to zero variables and structures. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21net/mlx5_core: Introduce offload arithmetic hardware capabilitiesSagi Grimberg3-1/+42
Define the necessary hardware structures for the offload arithmetic capabilities and read/cache them on driver load. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21net/mlx5_core: Refactor device capability functionLeon Romanovsky3-62/+24
Device capability function was called similar in all places. It was called twice for every queried parameter, while the difference between calls was in HCA capability mode only. The change proposed unify these calls into one function. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21net/mlx5_core: Fix caching ATOMIC endian mode capabilityLeon Romanovsky1-0/+4
Add caching of maximum device capability of ATOMIC endian mode. Fixes: f91e6d8941bf ('net/mlx5_core: Add setting ATOMIC endian mode') Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21ib_srpt: fix a WARN_ON() messageDan Carpenter1-1/+1
The first argument of WARN_ON() is a condition, so it means the warning message here will just be the name without the ->qp_num information. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21i40iw: Replace the obsolete crypto hash interface with shashTatyana Nikolova5-21/+37
This patch replaces the obsolete crypto hash interface with shash and resolves a build failure after merge of the rdma tree which is caused by the removal of crypto hash interface Removing CRYPTO_ALG_ASYNC from crypto_alloc_shash(), because it is by definition sync only Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21IB/hfi1: Add SDMA cache eviction algorithmMitko Haralanov2-2/+62
This commit adds a cache eviction algorithm for the SDMA user buffer cache. Besides the interval RB tree used for node lookup, the cache nodes are also arranged in a doubly-linked list. When a node is used, it is put at the beginning of the list. Less frequently used nodes naturally move to the tail of the list. When the cache limit is reached, the eviction code starts traversing the linked list in reverse, freeing buffers until enough space has been freed to fit the new user buffer. This guarantees that only the least used cache nodes will be removed from the cache. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21IB/hfi1: Switch to using the pin query functionMitko Haralanov2-1/+8
Use the new function to query whether the expected receive user buffer can be pinned successfully. This requires that a new variable be added to the hfi1_filedata structure used to hold the number of pages pinned by the expected receive code. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21IB/hfi1: Specify mm when releasing pagesMitko Haralanov4-14/+22
This change adds a pointer to the process mm_struct when calling hfi1_release_user_pages(). Previously, the function used the mm_struct of the current process to adjust the number of pinned pages. However, is some cases, namely when unpinning pages due to a MMU notifier call, we want to drop into that code block as it will cause a deadlock (the MMU notifiers take the process' mmap_sem prior to calling the callbacks). By allowing to caller to specify the pointer to the mm_struct, the caller has finer control over that part of hfi1_release_user_pages(). Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21IB/hfi1: Add pin query functionMitko Haralanov2-6/+47
System administrators can use the locked memory ulimit setting to set the maximum amount of memory a user can lock/pin. However, this setting alone is not enough to guarantee good operation of the hfi1 driver due to the fact that the setting does not have fine enough granularity to account for the limit being used by multiple user processes and caches. Therefore, a better limiting algorithm is needed. This is where the new hfi1_can_pin_pages() function and the cache_size module parameter come in. The function works by looking at the ulimit and cache_size value to compute a cache size. The algorithm examines the ulimit value and, if it is not "unlimited", computes a per-cache limit based on the number of configured user contexts. After that, the lower of the two - cache_size and computed per-cache limit - is used. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21IB/hfi1: Implement SDMA-side buffer cachingMitko Haralanov2-101/+155
Add support for caching of user buffers used for SDMA transfers. This change improves performance by avoiding repeatedly pinning the pages of buffers, which are being re-used by the application. While the cost of the pinning operation has been made heavier by adding the extra code to search the cache tree, re-allocate pages arrays, and future cache evictions, that cost will be amortized against the savings when the same buffer is re-used. It is also worth noting that in most cases, the cost of pinning should be much lower due to the buffer already being in the cache. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21IB/hfi1: Adjust last address values for intervalsMitko Haralanov1-3/+3
Last address values for intervals in the interval RB tree nodes should be non-inclusive in order to avoid confusing ranges. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21IB/hfi1: Add filter callbackMitko Haralanov2-5/+15
This commit adds a filter callback, which can be used to filter out interval RB nodes matching a certain interval down to a single one. This is needed for the upcoming SDMA-side caching where buffers will need to be filtered by their virtual address. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21IB/hfi1: Remove compare callbackMitko Haralanov3-17/+1
Interval RB trees provide their own searching function, which also takes care of determining the path through the tree that should be taken. This make the compare callback unnecessary. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21IB/hfi1: Add MMU tracingMitko Haralanov3-0/+12
Add a new tracepoint type for the MMU functions and calls to that tracepoint to allow tracing of MMU functionality. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21IB/hfi1: Use interval RB treesMitko Haralanov2-75/+34
The interval RB trees can handle RB nodes which hold ranged information. This is exactly the usage for the buffer cache implemented in the expected receive code path. Convert the MMU/RB functions to use the interval RB tree API. This will help with future users of the caching API, as well. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21IB/hfi1: Notify remove MMU/RB callback of calling contextMitko Haralanov3-10/+11
Tell the remove MMU/RB callback if it's being called as part of a memory invalidation or not. This can be important in preventing a deadlock if the remove callback attempts to take the map_sem semaphore because the kernel's MMU invalidation functions have already taken it. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21IB/hfi1: Remove the use of add/remove RB function pointersMitko Haralanov2-13/+14
The usage of function pointers for RB node insertion and removal in the expected receive code path was meant to be a small performance optimization. However, maintaining it, especially with the new MMU API, would become more troublesome as the API is extended. Since the performance optimization is minor, remove the function pointers and replace with direct calls. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21IB/hfi1: Allow remove MMU callbacks to free nodesMitko Haralanov1-5/+7
In order to allow the remove MMU callbacks to free the RB nodes, it is necessary to prevent any references to the nodes after the remove callback has been called. Therefore, remove the node from the tree prior to calling the callback. In other words, the MMU/RB API now guarantees that all RB node operations it performs will be done prior to calling the remove callback and that the RB node will not be touched afterwards. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21IB/hfi1: Prevent NULL pointer dereferenceMitko Haralanov1-0/+3
Prevent a potential NULL pointer dereference (found by code inspection) when unregistering an MMU handler. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21IB/hfi1: Allow MMU function execution in IRQ contextMitko Haralanov1-15/+21
Future users of the MMU/RB functions might be searching or manipulating the MMU RB trees in interrupt context. Therefore, the MMU/RB functions need to be able to run in interrupt context. This requires that we use the IRQ-aware API for spin locks. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21IB/hfi1: Re-factor MMU notification codeMitko Haralanov6-259/+471
The MMU notification code added to the expected receive side has been re-factored and split into it's own file. This was done in order to make the code more general and, therefore, usable by other parts of the driver. The caching behavior remains the same. However, the handling of the RB tree (insertion, deletions, and searching) as well as the MMU invalidation processing is now handled by functions in the mmu_rb.[ch] files. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-17IB/rdmavt: Post receive for QP in ERR stateAlex Estrin1-9/+24
Accordingly IB Spec post WR to receive queue must complete with error if QP is in Error state. Please refer to C10-42, C10-97.2.1 Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Alex Estrin <alex.estrin@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-17IB/hfi1: Enable adaptive pio by defaultMike Marciniszyn1-1/+1
Set the piothreshold to the agreed upon default of 256B. Reviewed-by: Jubin John <jubin.john@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-17IB/hfi1: Fix adaptive pio packet corruptionMike Marciniszyn1-6/+13
The adaptive pio heuristic missed a case that causes a corrupted packet on the wire. The case is if SDMA egress had been chosen for a pio-able packet and then encountered a ring space wait, the packet is queued. The sge cursor had been incremented as part of the packet build out for SDMA. After the send engine restart, the heuristic might now chose pio based on the sdma count being zero and start the mmio copy using the already incremented sge cursor. Fix this by forcing SDMA egress when the SDMA descriptor has already been built. Additionally, the code to wait for a QPs pio count to zero when switching to SDMA was missing. Add it. There is also an issue with UD QPs, in that the different SLs can pick a different egress send context. For now, just insure the UD/GSI always go through SDMA. Reviewed-by: Vennila Megavannan <vennila.megavannan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-17IB/hfi1: Fix panic in adaptive pioMike Marciniszyn2-5/+3
The following panic occurs while running ib_send_bw -a with adaptive pio turned on: [ 8551.143596] BUG: unable to handle kernel NULL pointer dereference at (null) [ 8551.152986] IP: [<ffffffffa0902a94>] pio_wait.isra.21+0x34/0x190 [hfi1] [ 8551.160926] PGD 80db21067 PUD 80bb45067 PMD 0 [ 8551.166431] Oops: 0000 [#1] SMP [ 8551.276725] task: ffff880816bf15c0 ti: ffff880812ac0000 task.ti: ffff880812ac0000 [ 8551.285705] RIP: 0010:[<ffffffffa0902a94>] pio_wait.isra.21+0x34/0x190 [hfi1] [ 8551.296462] RSP: 0018:ffff880812ac3b58 EFLAGS: 00010282 [ 8551.303029] RAX: 000000000000002d RBX: 0000000000000000 RCX: 0000000000000800 [ 8551.311633] RDX: ffff880812ac3c08 RSI: 0000000000000000 RDI: ffff8800b6665e40 [ 8551.320228] RBP: ffff880812ac3ba0 R08: 0000000000001000 R09: ffffffffa09039a0 [ 8551.328820] R10: ffff880817a0c000 R11: 0000000000000000 R12: ffff8800b6665e40 [ 8551.337406] R13: ffff880817a0c000 R14: ffff8800b6665800 R15: ffff8800b6665e40 [ 8551.355640] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 8551.362674] CR2: 0000000000000000 CR3: 000000080abe8000 CR4: 00000000001406e0 [ 8551.371262] Stack: [ 8551.374119] ffff880812ac3bf0 ffff88080cf54010 ffff880800000800 ffff880812ac3c08 [ 8551.383036] ffff8800b6665800 ffff8800b6665e40 0000000000000202 ffffffffa08e7b80 [ 8551.391941] 00000001007de431 ffff880812ac3bc8 ffffffffa0904645 ffff8800b6665800 [ 8551.400859] Call Trace: [ 8551.404214] [<ffffffffa08e7b80>] ? hfi1_del_timers_sync+0x30/0x30 [hfi1] [ 8551.412417] [<ffffffffa0904645>] hfi1_verbs_send+0x215/0x330 [hfi1] [ 8551.420154] [<ffffffffa08ec126>] hfi1_do_send+0x166/0x350 [hfi1] [ 8551.427618] [<ffffffffa055a533>] rvt_post_send+0x533/0x6a0 [rdmavt] [ 8551.435367] [<ffffffffa050760f>] ib_uverbs_post_send+0x30f/0x530 [ib_uverbs] [ 8551.443999] [<ffffffffa0501367>] ib_uverbs_write+0x117/0x380 [ib_uverbs] [ 8551.452269] [<ffffffff815810ab>] ? sock_recvmsg+0x3b/0x50 [ 8551.459071] [<ffffffff81581152>] ? sock_read_iter+0x92/0xe0 [ 8551.466068] [<ffffffff81212857>] __vfs_write+0x37/0x100 [ 8551.472692] [<ffffffff81213532>] ? rw_verify_area+0x52/0xd0 [ 8551.479682] [<ffffffff81213782>] vfs_write+0xa2/0x1a0 [ 8551.486089] [<ffffffff81003176>] ? do_audit_syscall_entry+0x66/0x70 [ 8551.493891] [<ffffffff812146c5>] SyS_write+0x55/0xc0 [ 8551.500220] [<ffffffff816ae0ee>] entry_SYSCALL_64_fastpath+0x12/0x71 [ 8551.531284] RIP [<ffffffffa0902a94>] pio_wait.isra.21+0x34/0x190 [hfi1] [ 8551.539508] RSP <ffff880812ac3b58> [ 8551.544110] CR2: 0000000000000000 The priv s_sendcontext pointer was not setup properly. Fix with this patch by using the s_sendcontext and eliminating its send engine use. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-17IB/hfi1: Fix PIO wakeup timing holeMike Marciniszyn1-5/+7
There is a timing hole if there had been greater than PIO_WAIT_BATCH_SIZE waiters. This code will dispatch the first batch but leave the others in the queue. If the restarted waiters don't in turn wait on a buffer, there is a hang. Fix by forcing a return when the QP queue is non-empty. Reviewed-by: Vennila Megavannan <vennila.megavannan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-17IB/hfi1: Fix ordering of trace for accuracyMike Marciniszyn1-3/+6
The postitioning of the sdma ibhdr trace was causing an extra trace message when the tx send returned -EBUSY. Move the trace to just before the return and handle negative return values to avoid any trace. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-17IB/hfi1: Add unique trace point for pio and sdma sendMike Marciniszyn3-6/+14
This allows for separately enabling pio and sdma tracepoints to cut the volume of trace information. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-17IB/hfi1: Fix issues with qp_stats printMike Marciniszyn1-1/+1
The changes are to aid in coorelating trace information with QPs between the trace and qp_stats information Such changes include adds a space after QP and clarifying that the second QP is actually the remote QP. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-17IB/hfi1: Report pid in qp_stats to aid debugMike Marciniszyn3-2/+5
Tracking user/QP ownership is needed to debug issues with user ULPs like OpenMPI. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-17IB/hfi1: Improve LED beaconingEaswar Hariharan3-51/+38
The current LED beaconing code is unclear and uses the timer handler to turn off the timer. This patch simplifies the code by removing the special semantics of timeon = timeoff = 0 being interpreted as a request to turn off the beaconing. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-17IB/hfi1: Don't call cond_resched in atomic mode when sending packetsKaike Wan1-2/+5
This patch fixed the problem where the driver might reschedule in atomic mode when sending packets. This is due to the fact that the call to cond_resched() in hfi1_do_send() might occur in atomic mode and a check is required to avoid the warning message: "kernel: BUG: scheduling while atomic: swapper/2/0/0x10000100." Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-17IB/hfi1: Add adaptive cacheless verbs copyDean Luick3-2/+211
The kernel memcpy is faster than a cacheless copy. However, if too much of the L3 cache is overwritten by one-time copies then overall bandwidth suffers. Implement an adaptive scheme where full page copies are tracked and if the number of unique entries are larger than a threshold, verbs will use a cacheless copy. Tracked entries are gradually cleaned, allowing memcpy to resume once the larger copies have stopped. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-17IB/hfi1: Handle host handshake timeoutJubin John2-2/+4
Host handshake timeout can occur during the verify capability state. This is a LNI related failure and should be handled in the same way as other LNI failures. Reviewed-by: Dean Luick <dean.luick@intel.com> Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-17IB/hfi1: Add ASIC flag view/clearDean Luick1-0/+125
Different OSes using parts of the same hardware may leave cross-device flags set. Export a debugfs file to view and clear these flags if needed. Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com> Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-17IB/hfi1: Hold i2c resource across debugfs open/closeDean Luick1-21/+124
External i2c firmware updates are done in multiple steps and cannot have other things done in between. For debugfs files, acquire the resource on open and release it on close. Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com> Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-17IB/hfi1: Reduce hardware mutex timeoutDean Luick1-1/+1
The hardware mutex is now held only long enough to set or clear flags. Reduce the timeout to something more reasonable. Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com> Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>