summaryrefslogtreecommitdiff
path: root/drivers/infiniband/sw/rdmavt
AgeCommit message (Collapse)AuthorFilesLines
2020-10-29IB/rdmavt: Fix sizeof mismatchColin Ian King1-3/+1
[ Upstream commit 8e71f694e0c819db39af2336f16eb9689f1ae53f ] An incorrect sizeof is being used, struct rvt_ibport ** is not correct, it should be struct rvt_ibport *. Note that since ** is the same size as * this is not causing any issues. Improve this fix by using sizeof(*rdi->ports) as this allows us to not even reference the type of the pointer. Also remove line breaks as the entire statement can fit on one line. Link: https://lore.kernel.org/r/20201008095204.82683-1-colin.king@canonical.com Addresses-Coverity: ("Sizeof not portable (SIZEOF_MISMATCH)") Fixes: ff6acd69518e ("IB/rdmavt: Add device structure allocation") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-06-25IB/{qib, hfi1, rdmavt}: Correct ibv_devinfo max_mr valueMike Marciniszyn1-0/+2
[ Upstream commit 35164f5259a47ea756fa1deb3e463ac2a4f10dc9 ] The command 'ibv_devinfo -v' reports 0 for max_mr. Fix by assigning the query values after the mr lkey_table has been built rather than early on in the driver. Fixes: 7b1e2099adc8 ("IB/rdmavt: Move memory registration into rdmavt") Reviewed-by: Josh Collier <josh.d.collier@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-06-25IB/rdmavt: Fix alloc_qpn() WARN_ON()Mike Marciniszyn1-1/+2
[ Upstream commit 2abae62a26a265129b364d8c1ef3be55e2c01309 ] The qpn allocation logic has a WARN_ON() that intends to detect the use of an index that will introduce bits in the lower order bits of the QOS bits in the QPN. Unfortunately, it has the following bugs: - it misfires when wrapping QPN allocation for non-QOS - it doesn't correctly detect low order QOS bits (despite the comment) The WARN_ON() should not be applied to non-QOS (qos_shift == 1). Additionally, it SHOULD test the qpn bits per the table below: 2 data VLs: [qp7, qp6, qp5, qp4, qp3, qp2, qp1] ^ [ 0, 0, 0, 0, 0, 0, sc0], qp bit 1 always 0* 3-4 data VLs: [qp7, qp6, qp5, qp4, qp3, qp2, qp1] ^ [ 0, 0, 0, 0, 0, sc1, sc0], qp bits [21] always 0 5-8 data VLs: [qp7, qp6, qp5, qp4, qp3, qp2, qp1] ^ [ 0, 0, 0, 0, sc2, sc1, sc0] qp bits [321] always 0 Fix by qualifying the warning for qos_shift > 1 and producing the correct mask to insure the above bits are zero without generating a superfluous warning. Fixes: 501edc42446e ("IB/rdmavt: Correct warning during QPN allocation") Reviewed-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-05-02IB/rdmavt: Fix frwr memory registrationJosh Collier1-7/+10
commit 7c39f7f671d2acc0a1f39ebbbee4303ad499bbfa upstream. Current implementation was not properly handling frwr memory registrations. This was uncovered by commit 27f26cec761das ("xprtrdma: Plant XID in on-the-wire RDMA offset (FRWR)") in which xprtrdma, which is used for NFS over RDMA, started failing as it was the first ULP to modify the ib_mr iova resulting in the NFS server getting REMOTE ACCESS ERROR when attempting to perform RDMA Writes to the client. The fix is to properly capture the true iova, offset, and length in the call to ib_map_mr_sg, and then update the iova when processing the IB_WR_REG_MEM on the send queue. Fixes: a41081aa5936 ("IB/rdmavt: Add support for ib_map_mr_sg") Cc: stable@vger.kernel.org Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Josh Collier <josh.d.collier@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-17RDMA/rdmavt: Fix rvt_create_ah function signatureKamal Heib2-2/+5
[ Upstream commit 4f32fb921b153ae9ea280e02a3e91509fffc03d3 ] rdmavt uses a crazy system that looses the type checking when assinging functions to struct ib_device function pointers. Because of this the signature to this function was not changed when the below commit revised things. Fix the signature so we are not calling a function pointer with a mismatched signature. Fixes: 477864c8fcd9 ("IB/core: Let create_ah return extended response to user") Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-07-03IB/hfi1: Optimize kthread pointer locking when queuing CQ entriesSebastian Sanchez1-12/+19
commit af8aab71370a692eaf7e7969ba5b1a455ac20113 upstream. All threads queuing CQ entries on different CQs are unnecessarily synchronized by a spin lock to check if the CQ kthread worker hasn't been destroyed before queuing an CQ entry. The lock used in 6efaf10f163d ("IB/rdmavt: Avoid queuing work into a destroyed cq kthread worker") is a device global lock and will have poor performance at scale as completions are entered from a large number of CPUs. Convert to use RCU where the read side of RCU is rvt_cq_enter() to determine that the worker is alive prior to triggering the completion event. Apply write side RCU semantics in rvt_driver_cq_init() and rvt_cq_exit(). Fixes: 6efaf10f163d ("IB/rdmavt: Avoid queuing work into a destroyed cq kthread worker") Cc: <stable@vger.kernel.org> # 4.14.x Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-12IB/rdmavt: Allocate CQ memory on the correct nodeMike Marciniszyn1-3/+7
[ Upstream commit db9a2c6f9b6196b889b98e961cb9a37617b11ccf ] CQ allocation does not ensure that completion queue entries and the completion queue structure are allocated on the correct numa node. Fix by allocating the rvt_cq and kernel CQ entries on the device node, leaving the user CQ entries on the default local node. Also ensure CQ resizes use the correct allocator when extending a CQ. Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-21RDMAVT: Fix synchronization around percpu_refTejun Heo1-4/+6
commit 74b44bbe80b4c62113ac1501482ea1ee40eb9d67 upstream. rvt_mregion uses percpu_ref for reference counting and RCU to protect accesses from lkey_table. When a rvt_mregion needs to be freed, it first gets unregistered from lkey_table and then rvt_check_refs() is called to wait for in-flight usages before the rvt_mregion is freed. rvt_check_refs() seems to have a couple issues. * It has a fast exit path which tests percpu_ref_is_zero(). However, a percpu_ref reading zero doesn't mean that the object can be released. In fact, the ->release() callback might not even have started executing yet. Proceeding with freeing can lead to use-after-free. * lkey_table is RCU protected but there is no RCU grace period in the free path. percpu_ref uses RCU internally but it's sched-RCU whose grace periods are different from regular RCU. Also, it generally isn't a good idea to depend on internal behaviors like this. To address the above issues, this patch removes the fast exit and adds an explicit synchronize_rcu(). Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Cc: Mike Marciniszyn <mike.marciniszyn@intel.com> Cc: linux-rdma@vger.kernel.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-29IB/rdmavt: Handle dereg of inuse MRs properlyMike Marciniszyn2-21/+212
A destroy of an MR prior to destroying the QP can cause the following diagnostic if the QP is referencing the MR being de-registered: hfi1 0000:05:00.0: hfi1_0: rvt_dereg_mr timeout mr ffff8808562108 00 pd ffff880859b20b00 The solution is to when the a non-zero refcount is encountered when the MR is destroyed the QPs needs to be iterated looking for QPs in the same PD as the MR. If rvt_qp_mr_clean() detects any such QP references the rkey/lkey, the QP needs to be put into an error state via a call to rvt_qp_error() which will trigger the clean up of any stuck references. This solution is as specified in IBTA 1.3 Volume 1 11.2.10.5. [This is reproduced with the 0.4.9 version of qperf and the rc_bw test] Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-29IB/rdmavt: Add QP iterator API for QPsMike Marciniszyn1-0/+144
There are currently 3 spots in the qib and hfi1 driver that have knowledge of the internal QP hash list that should only be in scope to rdmavt QP code. Add an iterator API for processing all QPs to hide the nature of the RCU hashlist. The API consists of: - rvt_qp_iter_init() * For iterating QPs one at a time for seq_file semantics - rvt_qp_iter_next() * For iterating QPs one at a time for seq_file semantics - rvt_qp_iter() * For iterating all QPs The first two are used for things like seq_file prints. The last is for code that just needs to iterate all QPs in the system. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-29IB/rdmavt: Use rvt_put_swqe() in rvt_clear_mr_ref()Mike Marciniszyn1-5/+1
hfi1 and qib were converted in previous patches, do the same for rdmavt. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-22IB/rdmavt, hfi1, qib: Modify check_ah() to account for extended LIDsDon Hiatt2-16/+23
rvt_check_ah() delegates lid verification to underlying driver. Underlying driver uses different conditions to check for dlid depending on whether the device supports extended LIDs Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Don Hiatt <don.hiatt@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-22IB/hfi1: Remove pmtu from the QP structureSebastian Sanchez1-2/+1
The pmtu field doens't have be stored in the QP structure as it can easily be calculated when needed. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-18Add OPA extended LID supportHiatt, Don1-1/+1
This patch series primarily increases sizes of variables that hold lid values from 16 to 32 bits. Additionally, it adds a check in the IB mad stack to verify a properly formatted MAD when OPA extended LIDs are used. Signed-off-by: Don Hiatt <don.hiatt@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-10Merge branches '32bit_lid' and 'irq_affinity' into k.o/merge-testDoug Ledford1-1/+1
Conflicts: drivers/infiniband/hw/mlx5/main.c - Both add new code include/rdma/ib_verbs.h - Both add new code Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-08IB/core: Change wc.slid from 16 to 32 bitsHiatt, Don1-1/+1
slid field in struct ib_wc is increased to 32 bits. This enables core components to use larger LIDs if needed. The user ABI is unchanged and return 16 bit values when queried. Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Don Hiatt <don.hiatt@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31IB/{rdmavt, hfi1, qib}: Fix panic with post receive and SGE compressionMike Marciniszyn2-14/+10
The server side of qperf panics as follows: [242446.336860] IP: report_bug+0x64/0x10 [242446.341031] PGD 1c0c067 [242446.341032] P4D 1c0c067 [242446.343951] PUD 1c0d063 [242446.346870] PMD 8587ea067 [242446.349788] PTE 800000083e14016 [242446.352901] [242446.358352] Oops: 0003 [#1] SM [242446.437919] CPU: 1 PID: 7442 Comm: irq/92-hfi1_0 k Not tainted 4.12.0-mam-asm #1 [242446.446365] Hardware name: Intel Corporation S2600WT2/S2600WT2, BIOS SE5C610.86B.01.01.0018.C4.072020161249 07/20/201 [242446.458397] task: ffff8808392d2b80 task.stack: ffffc9000664000 [242446.465097] RIP: 0010:report_bug+0x64/0x10 [242446.469859] RSP: 0018:ffffc900066439c0 EFLAGS: 0001000 [242446.475784] RAX: ffffffffa06647e4 RBX: ffffffffa06461e1 RCX: 000000000000000 [242446.483840] RDX: 0000000000000907 RSI: ffffffffa0675040 RDI: ffffffffffff740 [242446.491897] RBP: ffffc900066439e0 R08: 0000000000000001 R09: 000000000000025 [242446.499953] R10: ffffffff81a253df R11: 0000000000000133 R12: ffffc90006643b3 [242446.508010] R13: ffffffffa065bbf0 R14: 00000000000001e5 R15: 000000000000000 [242446.516067] FS: 0000000000000000(0000) GS:ffff88085f640000(0000) knlGS:000000000000000 [242446.525191] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003 [242446.531698] CR2: ffffffffa06647ee CR3: 0000000001c09000 CR4: 00000000001406e [242446.539756] Call Trace [242446.542582] fixup_bug+0x2c/0x5 [242446.546277] do_trap+0x12b/0x18 [242446.549972] do_error_trap+0x89/0x11 [242446.554171] ? hfi1_copy_sge+0x271/0x2b0 [hfi1 [242446.559324] ? ttwu_do_wakeup+0x1e/0x14 [242446.563795] ? ttwu_do_activate+0x77/0x8 [242446.568363] do_invalid_op+0x20/0x3 [242446.572448] invalid_op+0x1e/0x3 [242446.576247] RIP: 0010:hfi1_copy_sge+0x271/0x2b0 [hfi1 [242446.582075] RSP: 0018:ffffc90006643be8 EFLAGS: 0001004 [242446.587999] RAX: 0000000000000000 RBX: ffff88083e0fa240 RCX: 000000000000000 [242446.596058] RDX: 0000000000000000 RSI: ffff880842508000 RDI: ffff88083e0fa24 [242446.604116] RBP: ffffc90006643c28 R08: 0000000000000000 R09: 000000000000000 [242446.612172] R10: ffffc90009473640 R11: 0000000000000133 R12: 000000000000000 [242446.620228] R13: 0000000000000000 R14: 0000000000002000 R15: ffff88084250800 [242446.628293] ? hfi1_copy_sge+0x1a1/0x2b0 [hfi1 [242446.633449] hfi1_rc_rcv+0x3da/0x1270 [hfi1 [242446.638312] ? sc_buffer_alloc+0x113/0x150 [hfi1 [242446.643662] hfi1_ib_rcv+0x1c9/0x2e0 [hfi1 [242446.648428] process_receive_ib+0x19a/0x270 [hfi1 [242446.653866] ? process_rcv_qp_work+0xd2/0x160 [hfi1 [242446.659505] handle_receive_interrupt_nodma_rtail+0x184/0x2e0 [hfi1 [242446.666693] ? irq_finalize_oneshot+0x100/0x10 [242446.671846] receive_context_thread+0x1b/0x140 [hfi1 [242446.677576] irq_thread_fn+0x1e/0x4 [242446.681659] irq_thread+0x13c/0x1b [242446.685646] ? irq_forced_thread_fn+0x60/0x6 [242446.690604] kthread+0x112/0x15 [242446.694298] ? irq_thread_check_affinity+0xe0/0xe [242446.699738] ? kthread_park+0x60/0x6 [242446.703919] ? do_syscall_64+0x67/0x15 [242446.708292] ret_from_fork+0x25/0x3 [242446.712374] Code: 63 78 04 44 0f b7 70 08 41 89 d0 4c 8d 2c 38 41 83 e0 01 f6 c2 02 74 17 66 45 85 c0 74 11 f6 c2 04 b9 01 00 00 00 75 bb 83 ca 04 <66> 89 50 0a 66 45 85 c0 74 52 0f b6 48 0b 41 0f b7 f6 4d 89 e0 [242446.733527] RIP: report_bug+0x64/0x100 RSP: ffffc900066439c [242446.739935] CR2: ffffffffa06647e [242446.743763] ---[ end trace 0e90a20d0aa494f7 ]-- The root cause is that the qib/hfi1 post receive call to rvt_lkey_ok() doesn't interpret the new return value from rvt_lkey_ok() properly leading to an mr reference count underrun. Additionally, remove an unused argument in rvt_sge_adjacent() aw well as an unneeded incr local in rvt_post_one_wr(). Fixes: Commit 14fe13fcd3af ("IB/rdmavt: Compress adjacent SGEs in rvt_lkey_ok()") Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24Merge branch 'hfi1' into k.o/for-4.14Doug Ledford5-46/+147
2017-07-20IB/rdmavt: Setting of QP timeout can overflow jiffies computationKaike Wan1-3/+1
Current computation of qp->timeout_jiffies in rvt_modify_qp() will cause overflow due to the fact that the input to the function usecs_to_jiffies is only 32-bit ( unsigned int). Overflow will occur when attr->timeout is equal to or greater than 30. The consequence is unnecessarily excessive retry and thus degradation of the system performance. This patch fixes the problem by limiting the input to 5-bit and calling usecs_to_jiffies() before multiplying the scaling factor. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-18IB/{rdmavt, qib, hfi1}: Remove gfp flags argumentLeon Romanovsky1-35/+13
The caller to the driver marks GFP_NOIO allocations with help of memalloc_noio-* calls now. This makes redundant to pass down to the driver gfp flags, which can be GFP_KERNEL only. The patch removes the gfp flags argument and updates all driver paths. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-27IB/hfi1: Use QPN mask to avoid overflowDennis Dalessandro1-1/+1
Ensure we can't come up with an array size that is bigger than the array by applying the QPN mask before the divide in the free_qpn function. Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-27IB/rdmavt: Remove duplicated functionsDennis Dalessandro1-23/+14
The free_qpn() function from the hfi1/qib driver which was the basis for rdmavt_free_qpn() function was accidentally left in the code. Remove it. Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-27IB/core,rdmavt,hfi1,opa-vnic: Send OPA cap_mask3 in trapVishwanathapura, Niranjana1-2/+7
Provide the ability for IB clients to modify the OPA specific capability mask and include this mask in the subsequent trap data. Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Signed-off-by: Michael N. Henry <michael.n.henry@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-27IB/rdmavt: Compress adjacent SGEs in rvt_lkey_ok()Mike Marciniszyn4-21/+126
SGEs that are contiguous needlessly consume driver dependent TX resources. The lkey validation logic is enhanced to compress the SGE that ends up in the send wqe when consecutive addresses are detected. The lkey validation API used to return 1 (success) or 0 (fail). The return value is now an -errno, 0 (compressed), or 1 (uncompressed). A additional argument is added to pass the last SQE for the compression. Loopback callers always pass a NULL to last_sge since the optimization is of little benefit in that situation. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Brian Welty <brian.welty@intel.com> Signed-off-by: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01IB/core: Use rdma_ah_attr accessor functionsDasaratharaman Chandramouli2-18/+23
Modify core and driver components to use accessor functions introduced to access individual fields of rdma_ah_attr Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Don Hiatt <don.hiatt@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01IB/core: Rename ib_destroy_ah to rdma_destroy_ahDasaratharaman Chandramouli1-1/+1
Rename ib_destroy_ah to rdma_destroy_ah so its in sync with the rename of the ib address handle attribute Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Don Hiatt <don.hiatt@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01IB/core: Rename struct ib_ah_attr to rdma_ah_attrDasaratharaman Chandramouli2-7/+7
This patch simply renames struct ib_ah_attr to rdma_ah_attr as these fields specify attributes that are not necessarily specific to IB. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Don Hiatt <don.hiatt@intel.com> Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-28IB/rdmavt/hfi1/qib: Use the MGID and MLID for multicast addressingMichael J. Ruhl1-17/+44
The Infiniband spec defines "A multicast address is defined by a MGID and a MLID" (section 10.5). The current code only uses the MGID for identifying multicast groups. Update the driver to be compliant with this definition. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-28IB/rdmavt: restore IRQs on error path in rvt_create_ah()Dan Carpenter1-1/+1
We need to call spin_unlock_irqrestore() instead of vanilla spin_unlock() on this error path. Fixes: 119a8e708d16 ("IB/rdmavt: Add AH to rdmavt") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-25IB: Replace ib_umem page_size by page_shiftArtemy Kovalyov1-4/+4
Size of pages are held by struct ib_umem in page_size field. It is better to store it as an exponent, because page size by nature is always power-of-two and used as a factor, divisor or ilog2's argument. The conversion of page_size to be page_shift allows to have portable code and avoid following error while compiling on ARM: ERROR: "__aeabi_uldivmod" [drivers/infiniband/core/ib_core.ko] undefined! CC: Selvin Xavier <selvin.xavier@broadcom.com> CC: Steve Wise <swise@chelsio.com> CC: Lijun Ou <oulijun@huawei.com> CC: Shiraz Saleem <shiraz.saleem@intel.com> CC: Adit Ranadive <aditr@vmware.com> CC: Dennis Dalessandro <dennis.dalessandro@intel.com> CC: Ram Amrani <Ram.Amrani@Cavium.com> Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Acked-by: Ram Amrani <Ram.Amrani@cavium.com> Acked-by: Shiraz Saleem <shiraz.saleem@intel.com> Acked-by: Selvin Xavier <selvin.xavier@broadcom.com> Acked-by: Selvin Xavier <selvin.xavier@broadcom.com> Acked-by: Adit Ranadive <aditr@vmware.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20Merge branch 'k.o/for-4.12' into k.o/for-4.12-rdma-netdeviceDoug Ledford7-41/+317
2017-04-05IB/hfi1: Eliminate synchronize_rcu() in mr deleteMike Marciniszyn1-16/+33
The synchronize_rcu() call can be eliminated to improve memory deregistration performance. There are two key fields involved: - The rcu pointer itself - the lkey_published field To close the window between the rcu read of the mregion pointer and the reference count the code should: 1. To lkey/rkey validation (reader) Read the rcu pointer. If the pointer is non-NULL, get a reference. To the current validation tests use a READ_ONCE() on the lkey_published. Upon any failure release the reference. 2. To the remove logic (delete) Insure the published is zeroed prior to setting the pointer to NULL. This requires using rcu_assign_pointer() to insure lkey_published is written prior to the NULL. 3. To the insert logic (add) Insure the published is set use an rcu_assign_pointer() to insure the pointer is after all MR fields. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-05IB/rdmavt: Avoid reseting wqe send_flags in unreserveMike Marciniszyn1-2/+5
The wqe should be read only and in fact the superfluous reset of the RVT_SEND_RESERVE_USED flag causes an issue where reserved operations elicit a bad completion to the ULP. The maintenance of the flag is now entirely within rvt_post_one_wr() where a reserved operation will set the flag and a non-reserved operation will insure the operation that is about to be posted has the flag reset. Fixes: Commit 856cc4c237ad ("IB/hfi1: Add the capability for reserved operations") Reviewed-by: Don Hiatt <don.hiatt@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-05IB/rdmavt, IB/hfi1: Fix timer migration regressionsSebastian Sanchez3-2/+116
RC timeout counter isn't getting incremented. Increment counter and add the trace for it. Fixes: 87c23b4ab018 ("IB/rdmavt: Adding timer logic to rdmavt") Reviewed-by: Brian Welty <brian.welty@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-05IB/rdmavt: Add tracing for cq entry and pollMike Marciniszyn3-0/+131
The following fields are defined for filtering and triggering: - wr_id - status - opcode - qpn - length - idx Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-05IB/rdmavt: Add additional fields to post send traceMike Marciniszyn2-4/+32
This fix is to get additional debugging information. The following fields are added: - wqe - qpt - num_sge - ssn - pid - send_flags These additional fields provide for more focused filtering and triggering. The patch also moves the trace to just before the wqe is posted to get the most accurate information and future proofs the code to trace all possible reserved opcodes. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-05IB/rdmavt, IB/hfi1, IB/qib: Make wc opcode translation driver dependentMike Marciniszyn1-17/+0
The work to create a completion helper moved the translation of send wqe operations to completion opcodes to rdmvat. This precludes having driver dependent operations. Make the translation driver dependent by doing the translation in the driver prior to the rvt_qp_swqe_complete() call using restored translation tables. Fixes: Commit f2dc9cdce83c ("IB/rdmavt: Add a send completion helper") Fixes: Commit 0771da5a6e9d ("IB/hfi1,IB/qib: Use new send completion helper") Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-03-24infiniband: Fix alignment of mmap cookies to support VIPT cachingJason Gunthorpe1-2/+2
When vmalloc_user is used to create memory that is supposed to be mmap'd to user space, it is necessary for the mmap cookie (eg the offset) to be aligned to SHMLBA. This creates a situation where all virtual mappings of the same physical page share the same virtual cache index and guarantees VIPT coherence. Otherwise the cache is non-coherent and the kernel will not see writes by userspace when reading the shared page (or vice-versa). Reported-by: Josh Beavers <josh.beavers@gmail.com> Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-28scripts/spelling.txt: add "therfore" pattern and fix typo instancesMasahiro Yamada1-3/+3
Fix typos and add the following to the scripts/spelling.txt: therfore||therefore Besides, tidy up comment blocks for 80-col wrapping. Link: http://lkml.kernel.org/r/1481573103-11329-31-git-send-email-yamada.masahiro@socionext.com Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-26Merge tag 'for-next-dma_ops' of ↵Linus Torvalds7-259/+8
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull rdma DMA mapping updates from Doug Ledford: "Drop IB DMA mapping code and use core DMA code instead. Bart Van Assche noted that the ib DMA mapping code was significantly similar enough to the core DMA mapping code that with a few changes it was possible to remove the IB DMA mapping code entirely and switch the RDMA stack to use the core DMA mapping code. This resulted in a nice set of cleanups, but touched the entire tree and has been kept separate for that reason." * tag 'for-next-dma_ops' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (37 commits) IB/rxe, IB/rdmavt: Use dma_virt_ops instead of duplicating it IB/core: Remove ib_device.dma_device nvme-rdma: Switch from dma_device to dev.parent RDS: net: Switch from dma_device to dev.parent IB/srpt: Modify a debug statement IB/srp: Switch from dma_device to dev.parent IB/iser: Switch from dma_device to dev.parent IB/IPoIB: Switch from dma_device to dev.parent IB/rxe: Switch from dma_device to dev.parent IB/vmw_pvrdma: Switch from dma_device to dev.parent IB/usnic: Switch from dma_device to dev.parent IB/qib: Switch from dma_device to dev.parent IB/qedr: Switch from dma_device to dev.parent IB/ocrdma: Switch from dma_device to dev.parent IB/nes: Remove a superfluous assignment statement IB/mthca: Switch from dma_device to dev.parent IB/mlx5: Switch from dma_device to dev.parent IB/mlx4: Switch from dma_device to dev.parent IB/i40iw: Remove a superfluous assignment statement IB/hns: Switch from dma_device to dev.parent ...
2017-02-23Merge tag 'for-linus' of ↵Linus Torvalds1-3/+4
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull Mellanox rdma updates from Doug Ledford: "Mellanox specific updates for 4.11 merge window Because the Mellanox code required being based on a net-next tree, I keept it separate from the remainder of the RDMA stack submission that is based on 4.10-rc3. This branch contains: - Various mlx4 and mlx5 fixes and minor changes - Support for adding a tag match rule to flow specs - Support for cvlan offload operation for raw ethernet QPs - A change to the core IB code to recognize raw eth capabilities and enumerate them (touches non-Mellanox code) - Implicit On-Demand Paging memory registration support" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (40 commits) IB/mlx5: Fix configuration of port capabilities IB/mlx4: Take source GID by index from HW GID table IB/mlx5: Fix blue flame buffer size calculation IB/mlx4: Remove unused variable from function declaration IB: Query ports via the core instead of direct into the driver IB: Add protocol for USNIC IB/mlx4: Support raw packet protocol IB/mlx5: Support raw packet protocol IB/core: Add raw packet protocol IB/mlx5: Add implicit MR support IB/mlx5: Expose MR cache for mlx5_ib IB/mlx5: Add null_mkey access IB/umem: Indicate that process is being terminated IB/umem: Update on demand page (ODP) support IB/core: Add implicit MR flag IB/mlx5: Support creation of a WQ with scatter FCS offload IB/mlx5: Enable QP creation with cvlan offload IB/mlx5: Enable WQ creation and modification with cvlan offload IB/mlx5: Expose vlan offloads capabilities IB/uverbs: Enable QP creation with cvlan offload ...
2017-02-19IB/hfi1, qib, rdmavt: Move AETH defines to rdma/ib_hdrs.hDon Hiatt2-11/+11
Rename RVT AETH defines and export in rdma/ib_hdrs.h Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Don Hiatt <don.hiatt@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19IB/hfi1: Add rvt_rnr_tbl_to_usec functionDon Hiatt1-0/+11
Return usec from an index into ib_rvt_rnr_table. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Don Hiatt <don.hiatt@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19IB/hfi1, rdmavt: Update copy_sge to use boolean argumentsBrian Welty1-1/+1
Convert copy_sge and related SGE state functions to use boolean. For determining if QP is in user mode, add helper function in rdmavt_qp.h. This is used to determine if QP needs the last byte ordering. While here, change rvt_pd.user to a boolean. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Brian Welty <brian.welty@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19IB/rdmavt: Adding timer logic to rdmavtVenkata Sandeep Dhanalakota1-2/+181
To move common code across target to rdmavt for code reuse. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Brian Welty <brian.welty@intel.com> Signed-off-by: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19IB/hfi1, qib, rdmavt: Move AETH credit functions into rdmavtBrian Welty2-2/+192
Add rvt_compute_aeth() and rvt_get_credit() as shared functions in rdmavt, moved from hfi1/qib logic. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Brian Welty <brian.welty@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19IB/hfi1, qib, rdmavt: Move two IB event functions into rdmavtBrian Welty1-0/+38
Add rvt_rc_error() and rvt_comm_est() as shared functions in rdmavt, moved from hfi1/qib logic. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Brian Welty <brian.welty@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19IB/rdmavt: Use per-CPU reference count for MRsSebastian Sanchez1-21/+38
Having per-CPU reference count for each MR prevents cache-line bouncing across the system. Thus, it prevents bottlenecks. Use per-CPU reference counts per MR. The per-CPU reference count for FMRs is used in atomic mode to allow accurate testing of the busy state. Other MR types run in per-CPU mode MR until they're freed. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-14IB: Query ports via the core instead of direct into the driverOr Gerlitz1-3/+4
Change the drivers to call ib_query_port in their get port immutable handler instead of their own query port handler. Doing this required to set the core cap flags of this device before the ib_query_port call is made, since the IB core might need these caps to serve the port query. Drivers are ensured by the IB core that the port attributes passed to the port query verb implementation are zero, and hence we removed the zeroing from the drivers. This patch doesn't add any new functionality. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Acked-by: Adit Ranadive <aditr@vmware.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24IB/rxe, IB/rdmavt: Use dma_virt_ops instead of duplicating itBart Van Assche7-259/+8
Make the rxe and rdmavt drivers use dma_virt_ops. Update the comments that refer to the source files removed by this patch. Remove struct ib_dma_mapping_ops. Remove ib_device.dma_ops. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Andrew Boyer <andrew.boyer@dell.com> Cc: Dennis Dalessandro <dennis.dalessandro@intel.com> Cc: Jonathan Toppins <jtoppins@redhat.com> Cc: Alex Estrin <alex.estrin@intel.com> Cc: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>