summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2016-10-02IB/hfi1: Combine shift copy and byte copy for SGE readsSebastian Sanchez1-137/+23
Prevent over-reading the SGE length by using byte reads for non quad-word reads. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-02IB/hfi1: Do not read more than a SGE lengthSebastian Sanchez1-48/+40
In certain cases, if the tail of an SGE is not 8-byte aligned, bytes beyond the end to an 8-byte alignment can be read. Change the copy routine to avoid the over-read. Instead, stop on the final whole quad-word, then read the remaining bytes. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-02IB/hfi1: Extend i2c timeoutDean Luick1-1/+1
Allow a longer timeout for i2c due to clock stretching and inaccurate jiffy timing when under a spin lock. This timeout is consistent with other i2c-algo-bit users. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-02IB/hfi1: Increase default settings of max_cqes and max_qpsJianxin Xiong1-2/+2
The ib_write_bw test allows using up to 16384 QPs. When a relatively large number of QPs (within that range) is used, the test can fail because the number of CQ entries needed exceeds the limit set by the driver. This patch increases the default setting of max_cqes from 0x2FFFF (196607) to 0x2FFFFF(3145727), which is sufficient to cover the maximum number needed by the ib_write_bw test (2097152). The default setting of max_qps is also increased from 16384 to 32768 to allow the test to run successfully with 16383 or 16384 QPs. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-02IB/hfi1: Remove filtering of Set(PkeyTable) in HFI SMASebastian Sanchez1-6/+0
The FM should have full control to set the pkeys in the driver pkey table. Remove filtering done by the driver. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-02IB/qib: Remove qpt_mask globalDennis Dalessandro3-13/+3
There is no need to have a global qpt_mask as that does not support the multiple chip model which qib has. Instead rely on the value which exists already in the device data (dd). Fixes: 898fa52b4ac3 "IB/qib: Remove qpn, qp tables and related variables from qib" Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-02IB/hfi1: Consolidate pio control masks into single definitionMike Marciniszyn4-30/+32
This allows for adding additional pages of adaptive pio opcode control including manufacturer specific ones. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-02IB/rdmavt, IB/hfi1: Add lockdep asserts for lock debugMike Marciniszyn3-2/+30
This patch adds lockdep asserts in key code paths for insuring lock correctness. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-02IB/rdmavt: Add qp init functionMike Marciniszyn1-42/+58
Add an rvt_qp_init() to initialize specific common fields as the qp is created or reset. The routine is shared by the rvt_reset_qp() and the rvt_create_qp(). The intent is that lock dep assertions will only appear in the rvt_reset_qp(). Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-02IB/rdmavt: Move reset calldown to reset pathMike Marciniszyn1-6/+5
The reset calldown is misplaced. It should only be called in the code that actually transitions the QP to reset. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-02IB/hfi1: Move iowait_init() to priv allocateMike Marciniszyn1-7/+7
The call is misplaced in the reset calldown function and causes issues with lockdep assertions that are to be added. Fixes: Commit a2c2d608957c ("staging/rdma/hfi1: Remove create_qp functionality") Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-02IB/rdmavt: Correct sparse annotationMike Marciniszyn1-6/+3
The __must_hold() is sufficent to correct the sparse context imbalance inside a function. Per Documentation/sparse.txt: __must_hold - The specified lock is held on function entry and exit. Fixes: Commit c0a67f6ba356 ("IB/rdmavt: Annotate rvt_reset_qp()") Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-02IB/hfi1: Fix locking scheme for affinity settingsTadeusz Struk2-43/+51
Existing locking scheme in affinity.c file using the &node_affinity.lock spinlock is not very elegant. We acquire the lock to get hfi1_affinity_node entry, unlock, and then use the entry without the lock held. With more functions being added, which access and modify the entries, this can lead to race conditions. This patch makes this locking scheme more consistent. It changes the spinlock to mutex. Since all the code is executed in a user process context there is no need for a spinlock. This also allows to keep the lock not only while we look up for the node affinity entry, but over the whole section where the entry is being used. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Reviewed-by: Jianxin Xiong <jianxin.xiong@intel.com> Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-02IB/hfi1: Fix user-space buffers mapping with IOMMU enabledTymoteusz Kielan7-63/+79
The dma_XXX API functions return bus addresses which are physical addresses when IOMMU is disabled. Buffer mapping to user-space is done via remap_pfn_range() with PFN based on bus address instead of physical. This results in wrong pages being mapped to user-space when IOMMU is enabled. Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Tymoteusz Kielan <tymoteusz.kielan@intel.com> Signed-off-by: Andrzej Kacprowski <andrzej.kacprowski@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-02IB/hfi1: Fix the count of user packets submitted to an SDMA engineHarish Chegondi3-28/+30
Each user SDMA request coming into the driver may contain multiple packets. Each user packet may use multiple SDMA descriptors to fill the send buffer. The field seqsubmitted in struct user_sdma_request counts the number of user packets submitted to an SDMA engine. Sometimes, the intermediate count may not be updated properly. However, once all the packets' descriptors are successfully submitted to the SDMA engine, the final count is updated correctly. But, if only some of the packets are submitted to the engine due to an error, the intermediate count doesn't reflect the partial number of packets submitted to the SDMA engine. This can cause a hang later in the code as the count of packets submitted to the SDMA engine doesn't match the the count of packets processed by the SDMA engine. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-02IB/hfi1: Move serdes tune inside link start functionDean Luick2-15/+8
All calls to tune_serdes and start_link are paired. Move tune_serdes inside start_link. Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-02IB/qib,IB/hfi: Use core common header fileMike Marciniszyn23-346/+162
Use common header file structs, defines, and accessors in the drivers. The old declarations are removed. The repositioning of the includes allows for the removal of hfi1_message_header and replaces its use with ib_header. Also corrected are two issues with set_armed_to_active(): - The "packet" parameter is now a pointer as it should have been - The etype is validated to insure that the header is correct Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Don Hiatt <don.hiatt@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-02IB/core: Add ib headers for general useMike Marciniszyn1-0/+178
Add IB headers, defines, and accessors that are identical in both qib and hfi1 into the core includes. The accessors for be maintenance of __be64 fields since alignment is potentially invalid and can differ based on the presense of the GRH. {hfi1,qib}_ib_headers will be ib_headers. {hfi1,qib|_other_headers will be ib_other_headers. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Don Hiatt <don.hiatt@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-16IB/rdmavt, IB/qib, IB/hfi1: Use new QP put get routinesMike Marciniszyn7-26/+20
This improves readability and hides the reference count mechanism from the client drivers. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-16IB/rdmavt: Add functions to get and release QP referencesMike Marciniszyn1-0/+19
This centralizes the function and improves code readability. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-16IB/rdmavt: Don't vfree a kzalloc'ed memory regionColin Ian King1-1/+1
The userspace memory region 'mr' is allocated with kzalloc in __rvt_alloc_mr however it is incorrectly being freed with vfree in __rvt_free_mr. Fix this by using kfree to free it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-16IB/rxe: Fix kmem_cache leakYonatan Cohen1-0/+13
Decrement qp reference when handling error path in completer to prevent kmem_cache leak. Fixes: 8700e3e7c485 ("Soft RoCE driver") Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-16IB/rxe: Fix race condition between requester and completerYonatan Cohen1-13/+44
rxe_requester() is sending a pkt with rxe_xmit_packet() and then calls rxe_update() to update the wqe and qp's psn values. But sometimes the response is received before the requester had time to update the wqe in which case the completer acts on errornous wqe values. This fix updates the wqe and qp before actually sending the request and rolls back when xmit fails. Fixes: 8700e3e7c485 ("Soft RoCE driver") Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-16IB/rxe: Fix duplicate atomic request handlingYonatan Cohen1-5/+6
When handling ack for atomic opcodes like "fetch&add" or "cmp&swp", the method send_atomic_ack() saves the ack before sending it, in case it gets lost and never reach the requester. In which case the method duplicate_request() will need to find it using the duplicated request.psn. But send_atomic_ack() used a wrong psn value and thus the above ack was never found. This fix uses the ack.psn to locate the ack in case its needed. This fix also copies the ack packet to the skb's control buffer since duplicate_request() will need it when calling rxe_xmit_packet() Fixes: 8700e3e7c485 ("Soft RoCE driver") Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-16IB/rxe: Fix kernel panic in udp_setup_tunnelYonatan Cohen3-34/+51
Disable creation of a UDP socket for ipv6 when CONFIG_IPV6 is not enabeld. Since udp_sock_create6() returns 0 when CONFIG_IPV6 is not set [ 46.888632] IP: [<c220705a>] setup_udp_tunnel_sock+0x6/0x4f [ 46.891355] *pdpt = 0000000000000000 *pde = f000ff53f000ff53 [ 46.893918] Oops: 0002 [#1] PREEMPT [ 46.896014] CPU: 0 PID: 1 Comm: swapper Not tainted 4.7.0-rc4-00001-g8700e3e #1 [ 46.900280] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014 [ 46.904905] task: cf06c040 ti: cf05e000 task.ti: cf05e000 [ 46.907854] EIP: 0060:[<c220705a>] EFLAGS: 00210246 CPU: 0 [ 46.911137] EIP is at setup_udp_tunnel_sock+0x6/0x4f [ 46.914070] EAX: 00000044 EBX: 00000001 ECX: cf05fef0 EDX: ca8142e0 [ 46.917236] ESI: c2c4505b EDI: cf05fef0 EBP: cf05fed0 ESP: cf05fed0 [ 46.919836] DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068 [ 46.922046] CR0: 80050033 CR2: 000001fc CR3: 02cec000 CR4: 000006b0 [ 46.924550] Stack: [ 46.926014] cf05ff10 c1fd4657 ca8142e0 0000000a 00000000 00000000 0000b712 00000008 [ 46.931274] 00000000 6bb5bd01 c1fd48de 00000000 00000000 cf05ff1c 00000000 00000000 [ 46.936122] cf05ff1c c1fd4bdf 00000000 cf05ff28 c2c4507b ffffffff cf05ff88 c2bf1c74 [ 46.942350] Call Trace: [ 46.944403] [<c1fd4657>] rxe_setup_udp_tunnel+0x8f/0x99 [ 46.947689] [<c1fd48de>] ? net_to_rxe+0x4e/0x4e [ 46.950567] [<c1fd4bdf>] rxe_net_init+0xe/0xa4 [ 46.953147] [<c2c4507b>] rxe_module_init+0x20/0x4c [ 46.955448] [<c2bf1c74>] do_one_initcall+0x89/0x113 [ 46.957797] [<c2bf15eb>] ? set_debug_rodata+0xf/0xf [ 46.959966] [<c2bf1dbc>] ? kernel_init_freeable+0xbe/0x15b [ 46.962262] [<c2bf1ddc>] kernel_init_freeable+0xde/0x15b [ 46.964418] [<c232eb54>] kernel_init+0x8/0xd0 [ 46.966618] [<c2333122>] ret_from_kernel_thread+0xe/0x24 [ 46.969592] [<c232eb4c>] ? rest_init+0x6f/0x6f Fixes: 8700e3e7c485 ("Soft RoCE driver") Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-16IB/mlx5: Set source mac address in FTEMaor Gottlieb1-0/+7
Set the source mac address in the FTE when L2 specification is provided. Fixes: 038d2ef87572 ('IB/mlx5: Add flow steering support') Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-16IB/mlx5: Enable MAD_IFC commands for IB ports onlyNoa Osherovich1-1/+3
MAD_IFC command is supported only for physical functions (PF) and when physical port is IB. The proposed fix enforces it. Fixes: d603c809ef91 ("IB/mlx5: Fix decision on using MAD_IFC") Reported-by: David Chang <dchang@suse.com> Signed-off-by: Noa Osherovich <noaos@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-16IB/mlx4: Diagnostic HW counters are not supported in slave modeKamal Heib1-0/+3
Modify the mlx4_ib_diag_counters() to avoid the following error in the hypervisor when the slave tries to query the hardware counters in SR-IOV mode. mlx4_core 0000:81:00.0: Unknown command:0x30 accepted from slave:1 Fixes: 3f85f2aaabf7 ("IB/mlx4: Add diagnostic hardware counters") Signed-off-by: Kamal Heib <kamalh@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-16IB/mlx4: Use correct subnet-prefix in QP1 mads under SR-IOVJack Morgenstein3-3/+27
When sending QP1 MAD packets which use a GRH, the source GID (which consists of the 64-bit subnet prefix, and the 64 bit port GUID) must be included in the packet GRH. For SR-IOV, a GID cache is used, since the source GID needs to be the slave's source GID, and not the Hypervisor's GID. This cache also included a subnet_prefix. Unfortunately, the subnet_prefix field in the cache was never initialized (to the default subnet prefix 0xfe80::0). As a result, this field remained all zeroes. Therefore, when SR-IOV was active, all QP1 packets which included a GRH had a source GID subnet prefix of all-zeroes. However, the subnet-prefix should initially be 0xfe80::0 (the default subnet prefix). In addition, if OpenSM modifies a port's subnet prefix, the new subnet prefix must be used in the GRH when sending QP1 packets. To fix this we now initialize the subnet prefix in the SR-IOV GID cache to the default subnet prefix. We update the cached value if/when OpenSM modifies the port's subnet prefix. We take this cached value when sending QP1 packets when SR-IOV is active. Note that the value is stored as an atomic64. This eliminates any need for locking when the subnet prefix is being updated. Note also that we depend on the FW generating the "port management change" event for tracking subnet-prefix changes performed by OpenSM. If running early FW (before 2.9.4630), subnet prefix changes will not be tracked (but the default subnet prefix still will be stored in the cache; therefore users who do not modify the subnet prefix will not have a problem). IF there is a need for such tracking also for early FW, we will add that capability in a subsequent patch. Fixes: 1ffeb2eb8be9 ("IB/mlx4: SR-IOV IB context objects and proxy/tunnel SQP support") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-16IB/mlx4: Fix code indentation in QP1 MAD flowJack Morgenstein1-17/+19
The indentation in the QP1 GRH flow in procedure build_mlx_header is really confusing. Fix it, in preparation for a commit which touches this code. Fixes: 1ffeb2eb8be9 ("IB/mlx4: SR-IOV IB context objects and proxy/tunnel SQP support") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-16IB/mlx4: Fix incorrect MC join state bit-masking on SR-IOVAlex Vesker1-7/+7
Because of an incorrect bit-masking done on the join state bits, when handling a join request we failed to detect a difference between the group join state and the request join state when joining as send only full member (0x8). This caused the MC join request not to be sent. This issue is relevant only when SRIOV is enabled and SM supports send only full member. This fix separates scope bits and join states bits a nibble each. Fixes: b9c5d6a64358 ('IB/mlx4: Add multicast group (MCG) paravirtualization for SR-IOV') Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-16IB/ipoib: Don't allow MC joins during light MC flushAlex Vesker1-0/+9
This fix solves a race between light flush and on the fly joins. Light flush doesn't set the device to down and unset IPOIB_OPER_UP flag, this means that if while flushing we have a MC join in progress and the QP was attached to BC MGID we can have a mismatches when re-attaching a QP to the BC MGID. The light flush would set the broadcast group to NULL causing an on the fly join to rejoin and reattach to the BC MCG as well as adding the BC MGID to the multicast list. The flush process would later on remove the BC MGID and detach it from the QP. On the next flush the BC MGID is present in the multicast list but not found when trying to detach it because of the previous double attach and single detach. [18332.714265] ------------[ cut here ]------------ [18332.717775] WARNING: CPU: 6 PID: 3767 at drivers/infiniband/core/verbs.c:280 ib_dealloc_pd+0xff/0x120 [ib_core] ... [18332.775198] Hardware name: Red Hat KVM, BIOS Bochs 01/01/2011 [18332.779411] 0000000000000000 ffff8800b50dfbb0 ffffffff813fed47 0000000000000000 [18332.784960] 0000000000000000 ffff8800b50dfbf0 ffffffff8109add1 0000011832f58300 [18332.790547] ffff880226a596c0 ffff880032482000 ffff880032482830 ffff880226a59280 [18332.796199] Call Trace: [18332.798015] [<ffffffff813fed47>] dump_stack+0x63/0x8c [18332.801831] [<ffffffff8109add1>] __warn+0xd1/0xf0 [18332.805403] [<ffffffff8109aebd>] warn_slowpath_null+0x1d/0x20 [18332.809706] [<ffffffffa025d90f>] ib_dealloc_pd+0xff/0x120 [ib_core] [18332.814384] [<ffffffffa04f3d7c>] ipoib_transport_dev_cleanup+0xfc/0x1d0 [ib_ipoib] [18332.820031] [<ffffffffa04ed648>] ipoib_ib_dev_cleanup+0x98/0x110 [ib_ipoib] [18332.825220] [<ffffffffa04e62c8>] ipoib_dev_cleanup+0x2d8/0x550 [ib_ipoib] [18332.830290] [<ffffffffa04e656f>] ipoib_uninit+0x2f/0x40 [ib_ipoib] [18332.834911] [<ffffffff81772a8a>] rollback_registered_many+0x1aa/0x2c0 [18332.839741] [<ffffffff81772bd1>] rollback_registered+0x31/0x40 [18332.844091] [<ffffffff81773b18>] unregister_netdevice_queue+0x48/0x80 [18332.848880] [<ffffffffa04f489b>] ipoib_vlan_delete+0x1fb/0x290 [ib_ipoib] [18332.853848] [<ffffffffa04df1cd>] delete_child+0x7d/0xf0 [ib_ipoib] [18332.858474] [<ffffffff81520c08>] dev_attr_store+0x18/0x30 [18332.862510] [<ffffffff8127fe4a>] sysfs_kf_write+0x3a/0x50 [18332.866349] [<ffffffff8127f4e0>] kernfs_fop_write+0x120/0x170 [18332.870471] [<ffffffff81207198>] __vfs_write+0x28/0xe0 [18332.874152] [<ffffffff810e09bf>] ? percpu_down_read+0x1f/0x50 [18332.878274] [<ffffffff81208062>] vfs_write+0xa2/0x1a0 [18332.881896] [<ffffffff812093a6>] SyS_write+0x46/0xa0 [18332.885632] [<ffffffff810039b7>] do_syscall_64+0x57/0xb0 [18332.889709] [<ffffffff81883321>] entry_SYSCALL64_slow_path+0x25/0x25 [18332.894727] ---[ end trace 09ebbe31f831ef17 ]--- Fixes: ee1e2c82c245 ("IPoIB: Refresh paths instead of flushing them on SM change events") Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-16IB/rxe: fix GFP_KERNEL in spinlock contextAlexey Khoroshilov1-1/+1
There is skb_clone(skb, GFP_KERNEL) in spinlock context in rxe_rcv_mcast_pkt(). Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Acked-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-02IB/hfi1: Rework debugfs to use SRCUMike Marciniszyn1-80/+52
The debugfs RCU trips many debug kernel warnings because of potential sleeps with an RCU read lock held. This includes both user copy calls and slab allocations throughout the file. This patch switches the RCU to use SRCU for file remove/access race protection. In one case, the SRCU is implicit in the use of the raw debugfs file object and just works. In the seq_file case, a wrapper around seq_read() and seq_lseek() is used to enforce the SRCU using the debugfs supplied functions debugfs_use_file_start() and debugfs_use_file_stop(). The sychronize_rcu() is deleted since the SRCU prevents the remove access race. The RCU locking is kept for qp_stats since the QP hash list is protected using the non-sleepable RCU. Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-02IB/hfi1: Make n_krcvqs be an unsigned long integerHarish Chegondi3-5/+5
The global variable n_krcvqs stores the sum of the number of kernel receive queues of VLs 0-7 which the user can pass to the driver through the module parameter array krcvqs which is of type unsigned integer. If the user passes large value(s) into krcvqs parameter array, it can cause an arithmetic overflow while calculating n_krcvqs which is also of type unsigned int. The overflow results in an incorrect value of n_krcvqs which can lead to kernel crash while loading the driver. Fix by changing the data type of n_krcvqs to unsigned long. This patch also changes the data type of other variables that get their values from n_krcvqs. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-02IB/hfi1: Add QSFP sanity pre-checkDean Luick4-8/+82
Sometimes a QSFP device does not respond in the expected time after a power-on. Add a read pre-check/retry when starting the link on driver load. Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-02IB/hfi1: Fix AHG KDETH Intr shiftJubin John1-1/+4
In the set_txreq_header_ahg(), The KDETH Intr bit is obtained from the header in the user sdma request using a KDETH_GET shift and mask macro. This value is then futher right shifted by 16 causing us to lose the value i.e it is shifted to zero, leading to the following smatch warning: drivers/infiniband/hw/hfi1/user_sdma.c:1482 set_txreq_header_ahg() warn: mask and shift to zero The Intr bit should be left shifted into its correct position in the KDETH header before the AHG update. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com> Reviewed-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-02IB/hfi1: Fix SGE length for misaligned PIO copySebastian Sanchez1-0/+12
When trying to align the source pointer and there's a byte carry in an SGE copy, bytes are borrowed from the next quad-word X to complete the required quad-word copy. Then, the SGE length is reduced by the number of borrowed bytes. After this, if the remaining number of bytes from quad-word X (extra bytes) is greater than the new SGE length, the number of extra bytes needs to be updated to the new SGE length. Otherwise, when the SGE length gets updated again after the extra bytes are read to create the new byte carry, it goes negative, which then becomes a very large number as the SGE length is an unsigned integer. This causes SGE buffer to be over-read. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-02IB/mlx5: Don't return errors from poll_cqLeon Romanovsky1-20/+2
Remove returning errors from mlx5 poll_cq function. Polling CQ operation in kernel never fails by Mellanox HCA architecture and respective driver design. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-02IB/mlx5: Use TIR number based on selectorYishai Hadas3-1/+7
Use TIR number based on selector, it should be done to differentiate between RSS QP to RAW one. Reported-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Tested-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-02IB/mlx5: Simplify code by removing return variableLeon Romanovsky1-7/+3
Return variable was set in a line before the actual return was called in begin_wqe function. This patch removes such variable and simplifies the code. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-02IB/mlx5: Return EINVAL when caller specifies too many SGEsChuck Lever1-1/+1
The returned value should be EINVAL, because it is caused by wrong caller and not by internal overflow event. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-02IB/mlx4: Don't return errors from poll_cqLeon Romanovsky1-24/+2
Remove returning errors from mlx4 poll_cq function. Polling CQ operation in kernel never fails by Mellanox HCA architecture and respective driver design. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-02Revert "IB/mlx4: Return EAGAIN for any error in mlx4_ib_poll_one"Leon Romanovsky1-3/+3
By Mellanox HW design and SW implementation, poll_cq never fails and returns errors, so all these printks are to catch ULP bugs. In case of such bug, the reverted patch will cause reentry of the function, resulting in a printk storm. This reverts commit 5412352fcd8f ("IB/mlx4: Return EAGAIN for any error in mlx4_ib_poll_one") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-02IB/ipoib: Fix memory corruption in ipoib cm mode connect flowErez Shitrit3-1/+18
When a new CM connection is being requested, ipoib driver copies data from the path pointer in the CM/tx object, the path object might be invalid at the point and memory corruption will happened later when now the CM driver will try using that data. The next scenario demonstrates it: neigh_add_path --> ipoib_cm_create_tx --> queue_work (pointer to path is in the cm/tx struct) #while the work is still in the queue, #the port goes down and causes the ipoib_flush_paths: ipoib_flush_paths --> path_free --> kfree(path) #at this point the work scheduled starts. ipoib_cm_tx_start --> copy from the (invalid)path pointer: (memcpy(&pathrec, &p->path->pathrec, sizeof pathrec);) -> memory corruption. To fix that the driver now starts the CM/tx connection only if that specific path exists in the general paths database. This check is protected with the relevant locks, and uses the gid from the neigh member in the CM/tx object which is valid according to the ref count that was taken by the CM/tx. Fixes: 839fcaba35 ('IPoIB: Connected mode experimental support') Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-02IB/core: Fix use after free in send_leave functionErez Shitrit1-11/+2
The function send_leave sets the member: group->query_id (group->query_id = ret) after calling the sa_query, but leave_handler can be executed before the setting and it might delete the group object, and will get a memory corruption. Additionally, this patch gets rid of group->query_id variable which is not used. Fixes: faec2f7b96b5 ('IB/sa: Track multicast join/leave requests') Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-02IB/cxgb4: Make _free_qp static to silence build warningBaoyou Xie1-1/+1
We get 1 warning when build kernel with W=1: drivers/infiniband/hw/cxgb4/qp.c:686:6: warning: no previous prototype for '_free_qp' [-Wmissing-prototypes] In fact, this function is only used in the file in which it is declared and don't need a declaration, but can be made static. so this patch marks it 'static'. Signed-off-by: Baoyou Xie <baoyou.xie@linaro.org> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-02IB/isert: Properly release resources on DEVICE_REMOVALRaju Rangoju2-3/+22
When the low level driver exercises the hot unplug they would call rdma_cm cma_remove_one which would fire DEVICE_REMOVAL event to all cma consumers. Now, if consumer doesn't make sure they destroy all IB objects created on that IB device instance prior to finalizing all processing of DEVICE_REMOVAL callback, rdma_cm will let the lld to de-register with IB core and destroy the IB device instance. And if the consumer calls (say) ib_dereg_mr(), it will crash since that dev object is NULL. In the current implementation, iser-target just initiates the cleanup and returns from DEVICE_REMOVAL callback. This deferred work creates a race between iser-target cleaning IB objects(say MR) and lld destroying IB device instance. This patch includes the following fixes -> make sure that consumer frees all IB objects associated with device instance -> return non-zero from the callback to destroy the rdma_cm id Signed-off-by: Raju Rangoju <rajur@chelsio.com> Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-02IB/hfi1: Fix the size parameter to find_first_bitChristophe Jaillet1-4/+4
The 2nd parameter of 'find_first_bit' is the number of bits to search. In this case, we are passing 'sizeof(u64)' which is 8. It is likely that the number of bits of 'port_mask' was expected here. Use sizeof() * 8 to get the correct number. It has been spotted by the following coccinelle script: @@ expression ret, x; @@ * ret = \(find_first_bit \| find_first_zero_bit\) (x, sizeof(...)); Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-02IB/mlx5: Fix the size parameter to find_first_bitChristophe Jaillet1-3/+3
The 2nd parameter of 'find_first_bit' is the number of bits to search. In this case, we are passing 'sizeof(tmp)' which is likely to be 4 or 8 because 'tmp' is an 'unsigned long'. It is likely that the number of bits of 'tmp' was expected here. So use BITS_PER_LONG instead. It has been spotted by the following coccinelle script: @@ expression ret, x; @@ * ret = \(find_first_bit \| find_first_zero_bit\) (x, sizeof(...)); Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by: Majd Dibbiny <majd@mellanox.com> Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>