path: root/drivers/infiniband/sw/rdmavt
Age  Commit message  Author  Files  Lines
2018-05-24  IB/{rdmavt,hfi1}: Change hrtimer add to use pinned version  Mike Marciniszyn  1  -1/+1
Given we are dealing with nano-second level timers, when the timer pops, ensure it happens on the CPU which caused the timer to be set in the first place. This avoids excessive jitter from the desired expiration time by avoiding the cost of switching our context to another CPU that is cache cold for this given timer. Reviewed-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-15  IB: Fix RDMA_RXE and INFINIBAND_RDMAVT dependencies for DMA_VIRT_OPS  Ben Hutchings  1  -1/+1
DMA_VIRT_OPS requires that dma_addr_t is at least as wide as a pointer, which is expressed as a dependency on !64BIT || ARCH_DMA_ADDR_T_64BIT. For parisc64 this is not true, and if these IB modules are enabled, kconfig warns:
WARNING: unmet direct dependencies detected for DMA_VIRT_OPS
  Depends on [n]: HAS_DMA [=y] && (!64BIT [=y] || ARCH_DMA_ADDR_T_64BIT)
  Selected by [m]:
  - INFINIBAND_RDMAVT [=m] && INFINIBAND [=m] && 64BIT [=y] && PCI [=y]
  - RDMA_RXE [=m] && INET [=y] && PCI [=y] && INFINIBAND [=m]
Add dependencies to fix this.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09  IB/{hfi1, qib, rdmavt}: Move logic to allocate receive WQE into rdmavt  Brian Welty  1  -0/+149
Moving receive-side WQE allocation logic into rdmavt will allow further code reuse between qib and hfi1 drivers. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Brian Welty <brian.welty@intel.com> Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09  IB/{hfi1, rdmavt, qib}: Implement CQ completion vector support  Sebastian Sanchez  4  -75/+82
Currently the driver doesn't support completion vectors. These are used to indicate which sets of CQs should be grouped together into the same vector. A vector is a CQ processing thread that runs on a specific CPU. If an application has several CQs bound to different completion vectors, and each completion vector runs on a different CPU, then the completion queue workload is balanced. This helps scale as more nodes are used.
Implement CQ completion vector support using a global workqueue where a CQ entry is queued to the CPU corresponding to the CQ's completion vector. Since the workqueue is global, it's guaranteed to always be there when queueing CQ entries; therefore, the RCU locking for cq->rdi->worker in the hot path is superfluous.
Each completion vector is assigned to a different CPU. The number of completion vectors available is computed by taking the number of online, physical CPUs from the local NUMA node and subtracting the CPUs used for kernel receive queues and the general interrupt.
Special use cases:
* If there are no CPUs left for completion vectors, the same CPU for the general interrupt is used; therefore, there would only be one completion vector available.
* For multi-HFI systems, the number of completion vectors available for each device is the total number of completion vectors in the local NUMA node divided by the number of devices in the same NUMA node. If there's a division remainder, the first device to get initialized gets an extra completion vector.
Upon a CQ creation, an invalid completion vector could be specified. Handle it as follows:
* If the completion vector is less than 0, set it to 0.
* Set the completion vector to the passed completion vector modulo the number of device completion vectors available.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
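The invalid-vector handling and the multi-HFI split described in the commit message above can be sketched in userspace C. This is a minimal illustration, not the driver code: `normalize_comp_vector()` and `vectors_for_device()` are hypothetical names, and handing exactly one extra vector to the first device is an assumption read from the wording of the message.

```c
#include <assert.h>

/*
 * Hypothetical helper: map a caller-supplied completion vector onto the
 * range [0, num_vectors). Negative values are clamped to 0 first, then
 * the value is wrapped with a modulo, as the commit message describes.
 */
static inline int normalize_comp_vector(int comp_vector, int num_vectors)
{
    assert(num_vectors > 0);
    if (comp_vector < 0)
        comp_vector = 0;
    return comp_vector % num_vectors;
}

/*
 * Hypothetical helper: number of completion vectors a device receives
 * when node_vectors are shared by num_devices on one NUMA node.
 * Assumption: only the first device to initialize gets one extra vector
 * when the division leaves a remainder.
 */
static inline int vectors_for_device(int node_vectors, int num_devices,
                                     int first_device)
{
    int share;

    assert(num_devices > 0);
    share = node_vectors / num_devices;
    if (first_device && node_vectors % num_devices != 0)
        share++;
    return share;
}
```

With 7 vectors on a node and 2 devices, this sketch gives the first device 4 vectors and the second 3.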
2018-05-09  IB/hfi1: Optimize kthread pointer locking when queuing CQ entries  Sebastian Sanchez  1  -12/+19
All threads queuing CQ entries on different CQs are unnecessarily synchronized by a spin lock to check that the CQ kthread worker hasn't been destroyed before queuing a CQ entry. The lock used in 6efaf10f163d ("IB/rdmavt: Avoid queuing work into a destroyed cq kthread worker") is a device global lock and will have poor performance at scale as completions are entered from a large number of CPUs. Convert to use RCU where the read side of RCU is rvt_cq_enter() to determine that the worker is alive prior to triggering the completion event. Apply write side RCU semantics in rvt_driver_cq_init() and rvt_cq_exit(). Fixes: 6efaf10f163d ("IB/rdmavt: Avoid queuing work into a destroyed cq kthread worker") Cc: <stable@vger.kernel.org> # 4.14.x Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-19  IB/uverbs: Extend uverbs_ioctl header with driver_id  Matan Barak  1  -1/+2
Extending uverbs_ioctl header with driver_id and another reserved field. driver_id should be used in order to identify the driver. Since every driver could have its own parsing tree, this is necessary for strace support. Downstream patches take off the EXPERIMENTAL flag from the ioctl() IB support and thus we add some reserved fields for future usage. Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-16  RDMAVT: Fix synchronization around percpu_ref  Tejun Heo  1  -4/+6
rvt_mregion uses percpu_ref for reference counting and RCU to protect accesses from lkey_table. When a rvt_mregion needs to be freed, it first gets unregistered from lkey_table and then rvt_check_refs() is called to wait for in-flight usages before the rvt_mregion is freed. rvt_check_refs() seems to have a couple of issues.
* It has a fast exit path which tests percpu_ref_is_zero(). However, a percpu_ref reading zero doesn't mean that the object can be released. In fact, the ->release() callback might not even have started executing yet. Proceeding with freeing can lead to use-after-free.
* lkey_table is RCU protected but there is no RCU grace period in the free path. percpu_ref uses RCU internally but it's sched-RCU whose grace periods are different from regular RCU. Also, it generally isn't a good idea to depend on internal behaviors like this.
To address the above issues, this patch removes the fast exit and adds an explicit synchronize_rcu().
Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Cc: Mike Marciniszyn <mike.marciniszyn@intel.com> Cc: linux-rdma@vger.kernel.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-13  IB: remove duplicate header files  Zhu Yanjun  1  -1/+0
In hfi.h, the header file opa_addr.h is included twice. In vt.h, the header file mmap.h is included twice. Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-07  infiniband: remove redundant assignment to pointer 'rdi'  Colin Ian King  1  -1/+1
The pointer rdi is being initialized with a value that is never read and re-assigned immediately after, hence the initialization is redundant and can be removed. Cleans up clang warning: drivers/infiniband/sw/rdmavt/vt.c:94:23: warning: Value stored to 'rdi' during its initialization is never read Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-31  Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma  Linus Torvalds  9  -40/+86
Pull RDMA subsystem updates from Jason Gunthorpe:
"Overall this cycle did not have any major excitement, and did not require any shared branch with netdev. Lots of driver updates, particularly of the scale-up and performance variety. The largest body of core work was Parav's patches fixing and restructuring some of the core code to make way for future RDMA containerization.
Summary:
- misc small driver fixups to bnxt_re/hfi1/qib/hns/ocrdma/rdmavt/vmw_pvrdma/nes
- several major feature adds to bnxt_re driver: SRIOV VF RoCE support, HugePages support, extended hardware stats support, and SRQ support
- a notable number of fixes to the i40iw driver from debugging scale up testing
- more work to enable the new hip08 chip in the hns driver
- misc small ULP fixups to srp/srpt/ipoib
- preparation for srp initiator and target to support the RDMA-CM protocol for connections
- add RDMA-CM support to srp initiator, srp target is still a WIP
- fixes for a couple of places where ipoib could spam the dmesg log
- fix encode/decode of FDR/EDR data rates in the core
- many patches from Parav with ongoing work to clean up inconsistencies and bugs in RoCE support around the rdma_cm
- mlx5 driver support for the userspace features 'thread domain', 'wallclock timestamps' and 'DV Direct Connected transport'. Support for the firmware dual port RoCE capability
- core support for more than 32 rdma devices in the char dev allocation
- kernel doc updates from Randy Dunlap
- new netlink uAPI for inspecting RDMA objects similar in spirit to 'ss'
- one minor change to the kobject code acked by Greg KH"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (259 commits)
  RDMA/nldev: Provide detailed QP information
  RDMA/nldev: Provide global resource utilization
  RDMA/core: Add resource tracking for create and destroy PDs
  RDMA/core: Add resource tracking for create and destroy CQs
  RDMA/core: Add resource tracking for create and destroy QPs
  RDMA/restrack: Add general infrastructure to track RDMA resources
  RDMA/core: Save kernel caller name when creating PD and CQ objects
  RDMA/core: Use the MODNAME instead of the function name for pd callers
  RDMA: Move enum ib_cq_creation_flags to uapi headers
  IB/rxe: Change RDMA_RXE kconfig to use select
  IB/qib: remove qib_keys.c
  IB/mthca: remove mthca_user.h
  RDMA/cm: Fix access to uninitialized variable
  RDMA/cma: Use existing netif_is_bond_master function
  IB/core: Avoid SGID attributes query while converting GID from OPA to IB
  RDMA/mlx5: Avoid memory leak in case of XRCD dealloc failure
  IB/umad: Fix use of unprotected device pointer
  IB/iser: Combine substrings for three messages
  IB/iser: Delete an unnecessary variable initialisation in iser_send_data_out()
  IB/iser: Delete an error message for a failed memory allocation in iser_send_data_out()
  ...
2018-01-16  RDMA: Mark imm_data as be32 in the verbs uapi header  Jason Gunthorpe  1  -2/+1
This matches what the userspace copy of this header has been doing for a while. imm_data is an opaque 4 byte array carried over the network, and invalidate_rkey is in CPU byte order. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-11  infiniband: fix sw/rdmavt/* kernel-doc notation  Randy Dunlap  5  -17/+18
Use correct parameter names and formatting in function kernel-doc notation to eliminate warnings from scripts/kernel-doc.
../drivers/infiniband/sw/rdmavt/mr.c:784: warning: Excess function parameter 'ibmfr' description in 'rvt_map_phys_fmr'
../drivers/infiniband/sw/rdmavt/vt.c:234: warning: Excess function parameter 'intex' description in 'rvt_query_pkey'
../drivers/infiniband/sw/rdmavt/vt.c:266: warning: Excess function parameter 'index' description in 'rvt_query_gid'
../drivers/infiniband/sw/rdmavt/vt.c:306: warning: Excess function parameter 'data' description in 'rvt_alloc_ucontext'
../drivers/infiniband/sw/rdmavt/cq.c:65: warning: Excess function parameter 'sig' description in 'rvt_cq_enter'
../drivers/infiniband/sw/rdmavt/qp.c:279: warning: Excess function parameter 'qpt' description in 'rvt_free_all_qps'
../drivers/infiniband/sw/rdmavt/mcast.c:282: warning: Excess function parameter 'igd' description in 'rvt_attach_mcast'
../drivers/infiniband/sw/rdmavt/mcast.c:345: warning: Excess function parameter 'igd' description in 'rvt_detach_mcast'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Doug Ledford <dledford@redhat.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: linux-doc@vger.kernel.org Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-05  IB/rdmavt: Add trace for RNRNAK timer  Kaike Wan  2  -5/+48
This patch adds static trace for RNRNAK timer. Currently the output from hrtimer static trace only shows the addresses of hrtimers in the system and there is no easy way to correlate an RNRNAK timer with its entries in the hrtimer trace. This patch adds the correlation among a QP, its RNRNAK timer, and its entries in the hrtimer trace. This correlation will be enormously helpful when debugging RNRNAK related issues. In addition, this patch cleans up rvt_stop_rnr_timer() to be void while here. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-05  IB/rdmavt: Allocate CQ memory on the correct node  Mike Marciniszyn  1  -3/+7
CQ allocation does not ensure that completion queue entries and the completion queue structure are allocated on the correct numa node. Fix by allocating the rvt_cq and kernel CQ entries on the device node, leaving the user CQ entries on the default local node. Also ensure CQ resizes use the correct allocator when extending a CQ. Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-05  IB/rdmavt: Use correct numa node for SRQ allocation  Mike Marciniszyn  1  -7/+9
Normal receive queue allocation ensures that kernel receive queues are allocated on the local numa node. Shared receive queues do not behave the same way. Ensure that kernel shared receive queues are allocated on the device local node. Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-05  IB/{rdmavt, hfi1, qib}: Remove get_card_name() downcall  Michael J. Ruhl  3  -6/+5
rdmavt has a down call to client drivers to retrieve a crafted card name. This name should be the IB defined name. Rather than craft the name each time it is needed, simply retrieve the IB allocated name from the IB device. Update the function name to reflect its application. Clean up driver code to match this change. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-05  IB/rdmavt: No need to cancel RNRNAK retry timer when it is running  Kaike Wan  1  -3/+1
When the rdmavt's RNRNAK timer is fired, it tries to cancel the timer by calling hrtimer_try_to_cancel(), which always returns -1 because the timer is currently running. This patch removes this useless call. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-12-05  drivers/infiniband: Remove now-redundant smp_read_barrier_depends()  Paul E. McKenney  1  -1/+0
The smp_read_barrier_depends() does nothing at all except on DEC Alpha, and no current DEC Alpha systems use Infiniband: lkml.kernel.org/r/20171023085921.jwbntptn6ictbnvj@tower This commit therefore makes Infiniband depend on !ALPHA and removes the now-ineffective invocations of smp_read_barrier_depends() from the InfiniBand driver. Please note that this patch should not be construed as my saying that InfiniBand's memory ordering is correct, but rather that this patch does not in any way affect InfiniBand's correctness. In other words, the result of applying this patch is bug-for-bug compatible with the original. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Doug Ledford <dledford@redhat.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Michael Cree <mcree@orcon.net.nz> Cc: Andrea Parri <parri.andrea@gmail.com> Cc: <linux-rdma@vger.kernel.org> Cc: <linux-alpha@vger.kernel.org> [ paulmck: Removed drivers/dma/ioat/dma.c per Jason Gunthorpe's feedback. ] Acked-by: Jason Gunthorpe <jgg@mellanox.com>
2017-11-16  Merge branch 'akpm' (patches from Andrew)  Linus Torvalds  1  -1/+1
Merge updates from Andrew Morton:
- a few misc bits
- ocfs2 updates
- almost all of MM
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (131 commits)
  memory hotplug: fix comments when adding section
  mm: make alloc_node_mem_map a void call if we don't have CONFIG_FLAT_NODE_MEM_MAP
  mm: simplify nodemask printing
  mm,oom_reaper: remove pointless kthread_run() error check
  mm/page_ext.c: check if page_ext is not prepared
  writeback: remove unused function parameter
  mm: do not rely on preempt_count in print_vma_addr
  mm, sparse: do not swamp log with huge vmemmap allocation failures
  mm/hmm: remove redundant variable align_end
  mm/list_lru.c: mark expected switch fall-through
  mm/shmem.c: mark expected switch fall-through
  mm/page_alloc.c: broken deferred calculation
  mm: don't warn about allocations which stall for too long
  fs: fuse: account fuse_inode slab memory as reclaimable
  mm, page_alloc: fix potential false positive in __zone_watermark_ok
  mm: mlock: remove lru_add_drain_all()
  mm, sysctl: make NUMA stats configurable
  shmem: convert shmem_init_inodecache() to void
  Unify migrate_pages and move_pages access checks
  mm, pagevec: rename pagevec drained field
  ...
2017-11-16  drivers/infiniband/sw/rdmavt/qp.c: use kmalloc_array_node()  Johannes Thumshirn  1  -1/+1
Now that we have a NUMA-aware version of kmalloc_array() we can use it instead of kmalloc_node() without an overflow check in the size calculation. Link: http://lkml.kernel.org/r/20170927082038.3782-5-jthumshirn@suse.de Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Lameter <cl@linux.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Damien Le Moal <damien.lemoal@wdc.com> Cc: David Rientjes <rientjes@google.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Doug Ledford <dledford@redhat.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mike Marciniszyn <infinipath@intel.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Santosh Shilimkar <santosh.shilimkar@oracle.com> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
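The point of the array allocator is that the `n * size` multiplication is checked for overflow before memory is requested. A minimal userspace sketch of that idea follows; `alloc_array_checked()` is a hypothetical stand-in built on `malloc()`, not the kernel's kmalloc_array_node(), and it ignores NUMA placement entirely.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for an overflow-checked array allocation.
 * Returns NULL instead of allocating when n * size would wrap around.
 */
static inline void *alloc_array_checked(size_t n, size_t size)
{
    if (size != 0 && n > SIZE_MAX / size)
        return NULL;            /* n * size would overflow size_t */
    return malloc(n * size);
}

/* Usage example: a sane request succeeds, an overflowing one is refused. */
static inline void alloc_array_example(void)
{
    uint64_t *ok = alloc_array_checked(16, sizeof(*ok));

    assert(ok != NULL);
    free(ok);
    assert(alloc_array_checked(SIZE_MAX, sizeof(uint64_t)) == NULL);
}
```

An open-coded `malloc(n * size)` would silently request a too-small buffer in the second case, which is the class of bug the checked variant is meant to rule out.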
2017-11-16  Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma  Linus Torvalds  3  -8/+8
Pull rdma updates from Doug Ledford:
"This is a fairly plain pull request. Lots of driver updates across the stack, a huge number of static analysis cleanups including a close to 50 patch series from Bart Van Assche, and a number of new features inside the stack such as general CQ moderation support. Nothing really stands out, but there might be a few conflicts as you take things in. In particular, the cleanups touched some of the same lines as the new timer_setup changes. Everything in this pull request has been through 0day and at least two days of linux-next (since Stephen doesn't necessarily flag new errors/warnings until day2). A few more items (about 30 patches) from Intel and Mellanox showed up on the list on Tuesday. I've excluded those from this pull request, and I'm sure some of them qualify as fixes suitable to send any time, but I still have to review them fully. If they contain mostly fixes and little or no new development, then I will probably send them through by the end of the week just to get them out of the way. There was a break in my acceptance of patches which coincides with the computer problems I had, and then when I got things mostly back under control I had a backlog of patches to process, which I did mostly last Friday and Monday. So there is a larger number of patches processed in that timeframe than I was striving for.
Summary:
- Add iWARP support to qedr driver
- Lots of misc fixes across subsystem
- Multiple update series to hns roce driver
- Multiple update series to hfi1 driver
- Updates to vnic driver
- Add kref to wait struct in cxgb4 driver
- Updates to i40iw driver
- Mellanox shared pull request
- timer_setup changes
- massive cleanup series from Bart Van Assche
- Two series of SRP/SRPT changes from Bart Van Assche
- Core updates from Mellanox
- i40iw updates
- IPoIB updates
- mlx5 updates
- mlx4 updates
- hns updates
- bnxt_re fixes
- PCI write padding support
- Sparse/Smatch/warning cleanups/fixes
- CQ moderation support
- SRQ support in vmw_pvrdma"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (296 commits)
  RDMA/core: Rename kernel modify_cq to better describe its usage
  IB/mlx5: Add CQ moderation capability to query_device
  IB/mlx4: Add CQ moderation capability to query_device
  IB/uverbs: Add CQ moderation capability to query_device
  IB/mlx5: Exposing modify CQ callback to uverbs layer
  IB/mlx4: Exposing modify CQ callback to uverbs layer
  IB/uverbs: Allow CQ moderation with modify CQ
  iw_cxgb4: atomically flush the qp
  iw_cxgb4: only call the cq comp_handler when the cq is armed
  iw_cxgb4: Fix possible circular dependency locking warning
  RDMA/bnxt_re: report vlan_id and sl in qp1 recv completion
  IB/core: Only maintain real QPs in the security lists
  IB/ocrdma_hw: remove unnecessary code in ocrdma_mbx_dealloc_lkey
  RDMA/core: Make function rdma_copy_addr return void
  RDMA/vmw_pvrdma: Add shared receive queue support
  RDMA/core: avoid uninitialized variable warning in create_udata
  RDMA/bnxt_re: synchronize poll_cq and req_notify_cq verbs
  RDMA/bnxt_re: Flush CQ notification Work Queue before destroying QP
  RDMA/bnxt_re: Set QP state in case of response completion errors
  RDMA/bnxt_re: Add memory barriers when processing CQ/EQ entries
  ...
2017-10-25  locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE()  Mark Rutland  1  -3/+3
Please do not apply this to mainline directly, instead please re-run the coccinelle script shown below and apply its output. For several reasons, it is desirable to use {READ,WRITE}_ONCE() in preference to ACCESS_ONCE(), and new code is expected to use one of the former. So far, there's been no reason to change most existing uses of ACCESS_ONCE(), as these aren't harmful, and changing them results in churn. However, for some features, the read/write distinction is critical to correct operation. To distinguish these cases, separate read/write accessors must be used. This patch migrates (most) remaining ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following coccinelle script:
----
// Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and
// WRITE_ONCE()

// $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch

virtual patch

@ depends on patch @
expression E1, E2;
@@

- ACCESS_ONCE(E1) = E2
+ WRITE_ONCE(E1, E2)

@ depends on patch @
expression E;
@@

- ACCESS_ONCE(E)
+ READ_ONCE(E)
----
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: davem@davemloft.net Cc: linux-arch@vger.kernel.org Cc: mpe@ellerman.id.au Cc: shuah@kernel.org Cc: snitzer@redhat.com Cc: thor.thayer@linux.intel.com Cc: tj@kernel.org Cc: viro@zeniv.linux.org.uk Cc: will.deacon@arm.com Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
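The effect of the two rewrite rules can be illustrated with simplified userspace stand-ins for the accessors. The macros below are an assumption for illustration only: they rely on the GCC/Clang `__typeof__` extension and are much simpler than the kernel's real READ_ONCE()/WRITE_ONCE() definitions. `shared_flag` and `flag_roundtrip()` are hypothetical names.

```c
#include <assert.h>

/* Simplified stand-ins: force a single volatile access to x. */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

static int shared_flag;

/* Store v and read it back through the separate write and read accessors. */
static inline int flag_roundtrip(int v)
{
    WRITE_ONCE(shared_flag, v);     /* was: ACCESS_ONCE(shared_flag) = v; */
    return READ_ONCE(shared_flag);  /* was: ACCESS_ONCE(shared_flag)      */
}

/* Usage example. */
static inline void flag_example(void)
{
    assert(flag_roundtrip(7) == 7);
}
```

Splitting the single ACCESS_ONCE() form into two accessors makes the direction of each access explicit, which is the read/write distinction the commit message refers to.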
2017-10-18  Merge branch 'timer_setup' into for-next  Doug Ledford  1  -6/+4
Conflicts:
  drivers/infiniband/hw/cxgb4/cm.c
  drivers/infiniband/hw/qib/qib_driver.c
  drivers/infiniband/hw/qib/qib_mad.c
There were minor fixups needed in these files. Just minor context diffs due to patches from independent sources touching the same basic area.
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-18  IB/rdmavt: Convert timers to use timer_setup()  Kees Cook  1  -6/+4
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. setup_timer() was already being called before the open-coded init_timer() and .data assignment. These are removed as well. Cc: Dennis Dalessandro <dennis.dalessandro@intel.com> Cc: Doug Ledford <dledford@redhat.com> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Cc: linux-rdma@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
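The pattern behind this conversion is that the callback receives the timer pointer itself and recovers the enclosing object from it, instead of being handed an opaque data value. Below is a userspace sketch of that pattern under stated assumptions: `fake_timer`, `fake_qp`, `fake_timer_setup()` and `fake_container_of()` are hypothetical stand-ins, not the kernel's struct timer_list, timer_setup() or from_timer().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a timer that passes itself to its callback. */
struct fake_timer {
    void (*function)(struct fake_timer *t);
};

/* Hypothetical driver object that embeds the timer. */
struct fake_qp {
    int timeouts;
    struct fake_timer s_timer;
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define fake_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static inline void fake_timer_setup(struct fake_timer *t,
                                    void (*fn)(struct fake_timer *))
{
    t->function = fn;
}

/* Callback: gets the timer, derives the owning object, no .data field. */
static void qp_timer_expired(struct fake_timer *t)
{
    struct fake_qp *qp = fake_container_of(t, struct fake_qp, s_timer);

    qp->timeouts++;
}

/* Arm the callback and fire it once by hand; returns the timeout count. */
static inline int fire_once(void)
{
    struct fake_qp qp = { 0 };

    fake_timer_setup(&qp.s_timer, qp_timer_expired);
    assert(qp.s_timer.function != NULL);
    qp.s_timer.function(&qp.s_timer);
    return qp.timeouts;
}
```

Because the owner is derived from the timer's address, there is no separately assigned data pointer that could go stale or be cast to the wrong type.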
2017-10-18  Merge branch 'hfi1' into k.o/for-next  Doug Ledford  2  -2/+2
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-18  IB/rdmavt: Don't wait for resources in QP reset  Alex Estrin  2  -2/+2
Per the IBTA spec, QP destroy shall fail if the QP is attached to multicast groups, although the spec is silent on modify_qp to reset state. It implies that the ULP must deregister the QP from all mcast groups for destroy to succeed. The faulty patch "IB/ipoib: Update broadcast object if PKey value was changed in index 0" exposed two issues in rdmavt:
1. Rvt QP reset waits for qp references to go to zero. This will hang if the QP is attached to multicast groups.
2. The mcast group detach will fail for a QP in reset state, therefore preventing the ULP from correcting the issue.
This patch moves the reference count wait to the destroy QP path and allows a QP mcast detach to work in the reset state.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Alex Estrin <alex.estrin@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-15  RDMA/rdmavt: Suppress gcc 7 fall-through complaints  Bart Van Assche  1  -0/+1
Avoid that gcc 7 reports the following warning when building with W=1: warning: this statement may fall through [-Wimplicit-fallthrough=] Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-27  IB: Move PCI dependency from root KConfig to HW's KConfigs  Yuval Shaia  1  -0/+1
There is no reason to have a dependency on PCI for the entire infiniband stack, so move it to the KConfig of only the drivers that actually use PCI. Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-29  IB/rdmavt: Handle dereg of inuse MRs properly  Mike Marciniszyn  2  -21/+212
A destroy of an MR prior to destroying the QP can cause the following diagnostic if the QP is referencing the MR being de-registered:
hfi1 0000:05:00.0: hfi1_0: rvt_dereg_mr timeout mr ffff8808562108 00 pd ffff880859b20b00
The solution: when a non-zero refcount is encountered while the MR is being destroyed, iterate over the QPs, looking for QPs in the same PD as the MR. If rvt_qp_mr_clean() detects that any such QP references the rkey/lkey, the QP needs to be put into an error state via a call to rvt_qp_error(), which will trigger the cleanup of any stuck references. This solution is as specified in IBTA 1.3 Volume 1 11.2.10.5.
[This is reproduced with the 0.4.9 version of qperf and the rc_bw test]
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-29  IB/rdmavt: Add QP iterator API for QPs  Mike Marciniszyn  1  -0/+144
There are currently 3 spots in the qib and hfi1 driver that have knowledge of the internal QP hash list that should only be in scope to rdmavt QP code. Add an iterator API for processing all QPs to hide the nature of the RCU hashlist. The API consists of:
- rvt_qp_iter_init()
  * For iterating QPs one at a time for seq_file semantics
- rvt_qp_iter_next()
  * For iterating QPs one at a time for seq_file semantics
- rvt_qp_iter()
  * For iterating all QPs
The first two are used for things like seq_file prints. The last is for code that just needs to iterate all QPs in the system.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
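The idea of an "iterate all QPs" call that hides the hash list can be sketched with a callback-based walker. Everything below is a hypothetical userspace illustration: the `sketch_*` types and functions are invented names, the signature is not that of rvt_qp_iter(), and the RCU protection that the real hash list uses is omitted.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_QP_BUCKETS 4

/* Hypothetical QP entry chained into a small hash table. */
struct sketch_qp {
    unsigned int qpn;
    struct sketch_qp *next;
};

struct sketch_qp_table {
    struct sketch_qp *bucket[SKETCH_QP_BUCKETS];
};

static inline void sketch_qp_insert(struct sketch_qp_table *t,
                                    struct sketch_qp *qp)
{
    unsigned int b = qp->qpn % SKETCH_QP_BUCKETS;

    qp->next = t->bucket[b];
    t->bucket[b] = qp;
}

/* Visit every QP once; callers never see buckets or chain pointers. */
static inline void sketch_qp_iter(struct sketch_qp_table *t, void *priv,
                                  void (*cb)(struct sketch_qp *qp, void *priv))
{
    assert(cb != NULL);
    for (size_t b = 0; b < SKETCH_QP_BUCKETS; b++)
        for (struct sketch_qp *qp = t->bucket[b]; qp; qp = qp->next)
            cb(qp, priv);
}

/* Example callback: count the QPs that were visited. */
static void sketch_count_cb(struct sketch_qp *qp, void *priv)
{
    (void)qp;
    (*(int *)priv)++;
}

/* Usage example: three QPs, two of them sharing a bucket. */
static inline int sketch_count_qps(void)
{
    struct sketch_qp_table t = { { NULL } };
    struct sketch_qp a = { 1, NULL }, b = { 5, NULL }, c = { 2, NULL };
    int n = 0;

    sketch_qp_insert(&t, &a);
    sketch_qp_insert(&t, &b);
    sketch_qp_insert(&t, &c);
    sketch_qp_iter(&t, &n, sketch_count_cb);
    return n;
}
```

Keeping the bucket walk inside one function is what lets the table layout change later without touching the callers.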
2017-08-29  IB/rdmavt: Use rvt_put_swqe() in rvt_clear_mr_ref()  Mike Marciniszyn  1  -5/+1
hfi1 and qib were converted in previous patches, do the same for rdmavt. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-22  IB/rdmavt, hfi1, qib: Modify check_ah() to account for extended LIDs  Don Hiatt  2  -16/+23
rvt_check_ah() delegates LID verification to the underlying driver. The underlying driver uses different conditions to check the dlid depending on whether the device supports extended LIDs. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Don Hiatt <don.hiatt@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-22  IB/hfi1: Remove pmtu from the QP structure  Sebastian Sanchez  1  -2/+1
The pmtu field doesn't have to be stored in the QP structure as it can easily be calculated when needed. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-18  Add OPA extended LID support  Hiatt, Don  1  -1/+1
This patch series primarily increases sizes of variables that hold lid values from 16 to 32 bits. Additionally, it adds a check in the IB mad stack to verify a properly formatted MAD when OPA extended LIDs are used. Signed-off-by: Don Hiatt <don.hiatt@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-10  Merge branches '32bit_lid' and 'irq_affinity' into k.o/merge-test  Doug Ledford  1  -1/+1
Conflicts:
  drivers/infiniband/hw/mlx5/main.c - Both add new code
  include/rdma/ib_verbs.h - Both add new code
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-08  IB/core: Change wc.slid from 16 to 32 bits  Hiatt, Don  1  -1/+1
The slid field in struct ib_wc is increased to 32 bits. This enables core components to use larger LIDs if needed. The user ABI is unchanged and returns 16-bit values when queried. Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Don Hiatt <don.hiatt@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31IB/{rdmavt, hfi1, qib}: Fix panic with post receive and SGE compressionMike Marciniszyn2-14/+10
The server side of qperf panics as follows: [242446.336860] IP: report_bug+0x64/0x10 [242446.341031] PGD 1c0c067 [242446.341032] P4D 1c0c067 [242446.343951] PUD 1c0d063 [242446.346870] PMD 8587ea067 [242446.349788] PTE 800000083e14016 [242446.352901] [242446.358352] Oops: 0003 [#1] SM [242446.437919] CPU: 1 PID: 7442 Comm: irq/92-hfi1_0 k Not tainted 4.12.0-mam-asm #1 [242446.446365] Hardware name: Intel Corporation S2600WT2/S2600WT2, BIOS SE5C610.86B.01.01.0018.C4.072020161249 07/20/201 [242446.458397] task: ffff8808392d2b80 task.stack: ffffc9000664000 [242446.465097] RIP: 0010:report_bug+0x64/0x10 [242446.469859] RSP: 0018:ffffc900066439c0 EFLAGS: 0001000 [242446.475784] RAX: ffffffffa06647e4 RBX: ffffffffa06461e1 RCX: 000000000000000 [242446.483840] RDX: 0000000000000907 RSI: ffffffffa0675040 RDI: ffffffffffff740 [242446.491897] RBP: ffffc900066439e0 R08: 0000000000000001 R09: 000000000000025 [242446.499953] R10: ffffffff81a253df R11: 0000000000000133 R12: ffffc90006643b3 [242446.508010] R13: ffffffffa065bbf0 R14: 00000000000001e5 R15: 000000000000000 [242446.516067] FS: 0000000000000000(0000) GS:ffff88085f640000(0000) knlGS:000000000000000 [242446.525191] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003 [242446.531698] CR2: ffffffffa06647ee CR3: 0000000001c09000 CR4: 00000000001406e [242446.539756] Call Trace [242446.542582] fixup_bug+0x2c/0x5 [242446.546277] do_trap+0x12b/0x18 [242446.549972] do_error_trap+0x89/0x11 [242446.554171] ? hfi1_copy_sge+0x271/0x2b0 [hfi1 [242446.559324] ? ttwu_do_wakeup+0x1e/0x14 [242446.563795] ? 
ttwu_do_activate+0x77/0x8 [242446.568363] do_invalid_op+0x20/0x3 [242446.572448] invalid_op+0x1e/0x3 [242446.576247] RIP: 0010:hfi1_copy_sge+0x271/0x2b0 [hfi1 [242446.582075] RSP: 0018:ffffc90006643be8 EFLAGS: 0001004 [242446.587999] RAX: 0000000000000000 RBX: ffff88083e0fa240 RCX: 000000000000000 [242446.596058] RDX: 0000000000000000 RSI: ffff880842508000 RDI: ffff88083e0fa24 [242446.604116] RBP: ffffc90006643c28 R08: 0000000000000000 R09: 000000000000000 [242446.612172] R10: ffffc90009473640 R11: 0000000000000133 R12: 000000000000000 [242446.620228] R13: 0000000000000000 R14: 0000000000002000 R15: ffff88084250800 [242446.628293] ? hfi1_copy_sge+0x1a1/0x2b0 [hfi1 [242446.633449] hfi1_rc_rcv+0x3da/0x1270 [hfi1 [242446.638312] ? sc_buffer_alloc+0x113/0x150 [hfi1 [242446.643662] hfi1_ib_rcv+0x1c9/0x2e0 [hfi1 [242446.648428] process_receive_ib+0x19a/0x270 [hfi1 [242446.653866] ? process_rcv_qp_work+0xd2/0x160 [hfi1 [242446.659505] handle_receive_interrupt_nodma_rtail+0x184/0x2e0 [hfi1 [242446.666693] ? irq_finalize_oneshot+0x100/0x10 [242446.671846] receive_context_thread+0x1b/0x140 [hfi1 [242446.677576] irq_thread_fn+0x1e/0x4 [242446.681659] irq_thread+0x13c/0x1b [242446.685646] ? irq_forced_thread_fn+0x60/0x6 [242446.690604] kthread+0x112/0x15 [242446.694298] ? irq_thread_check_affinity+0xe0/0xe [242446.699738] ? kthread_park+0x60/0x6 [242446.703919] ? do_syscall_64+0x67/0x15 [242446.708292] ret_from_fork+0x25/0x3 [242446.712374] Code: 63 78 04 44 0f b7 70 08 41 89 d0 4c 8d 2c 38 41 83 e0 01 f6 c2 02 74 17 66 45 85 c0 74 11 f6 c2 04 b9 01 00 00 00 75 bb 83 ca 04 <66> 89 50 0a 66 45 85 c0 74 52 0f b6 48 0b 41 0f b7 f6 4d 89 e0 [242446.733527] RIP: report_bug+0x64/0x100 RSP: ffffc900066439c [242446.739935] CR2: ffffffffa06647e [242446.743763] ---[ end trace 0e90a20d0aa494f7 ]-- The root cause is that the qib/hfi1 post receive call to rvt_lkey_ok() doesn't interpret the new return value from rvt_lkey_ok() properly leading to an mr reference count underrun. 
Additionally, remove an unused argument in rvt_sge_adjacent() as well as an unneeded incr local in rvt_post_one_wr(). Fixes: Commit 14fe13fcd3af ("IB/rdmavt: Compress adjacent SGEs in rvt_lkey_ok()") Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24Merge branch 'hfi1' into k.o/for-4.14Doug Ledford5-46/+147
2017-07-20IB/rdmavt: Setting of QP timeout can overflow jiffies computationKaike Wan1-3/+1
The current computation of qp->timeout_jiffies in rvt_modify_qp() overflows because the input to usecs_to_jiffies() is only 32 bits wide (unsigned int). Overflow occurs when attr->timeout is equal to or greater than 30. The consequence is excessive retries and thus degraded system performance. This patch fixes the problem by limiting the input to 5 bits and calling usecs_to_jiffies() before multiplying by the scaling factor. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
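The overflow described above can be reproduced in userspace. The following is a minimal sketch, not the kernel code: model_usecs_to_jiffies() is a hypothetical stand-in for usecs_to_jiffies(), HZ is assumed to be 1000, and the 4.096 usec * 2^timeout scaling is the InfiniBand local ACK timeout encoding. The two helper names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: HZ = 1000, so one jiffy is 1000 microseconds. */
#define MODEL_HZ 1000u

/*
 * Hypothetical userspace stand-in for the kernel's usecs_to_jiffies().
 * Like the kernel helper it accepts only a 32-bit argument; it rounds up.
 */
static uint64_t model_usecs_to_jiffies(uint32_t usecs)
{
	const uint64_t usecs_per_jiffy = 1000000u / MODEL_HZ;

	return ((uint64_t)usecs + usecs_per_jiffy - 1) / usecs_per_jiffy;
}

/*
 * Scale first, convert second: 4.096 usec * 2^timeout no longer fits in
 * 32 bits once timeout reaches 30, so the argument is silently truncated.
 */
static uint64_t timeout_jiffies_scale_first(uint8_t timeout)
{
	uint64_t usecs;

	assert(timeout < 32);	/* a valid timeout is a 5-bit value */
	usecs = (4096ull << timeout) / 1000ull;

	return model_usecs_to_jiffies((uint32_t)usecs);	/* truncation */
}

/*
 * Clamp to 5 bits, convert 2^timeout (which always fits in 32 bits),
 * then apply the 4.096 scaling factor in a wider type.
 */
static uint64_t timeout_jiffies_convert_first(uint8_t timeout)
{
	if (timeout > 31)
		timeout = 31;

	return model_usecs_to_jiffies(1u << timeout) * 4096ull / 1000ull;
}
```

With these assumptions, the scale-first variant returns a smaller value for timeout 30 than for timeout 29, which is the wrap-around that leads to premature retries, while the convert-first variant keeps growing up to timeout 31.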
2017-07-18IB/{rdmavt, qib, hfi1}: Remove gfp flags argumentLeon Romanovsky1-35/+13
The caller to the driver now marks GFP_NOIO allocations with the help of memalloc_noio-* calls. This makes it redundant to pass gfp flags down to the driver, since they can only be GFP_KERNEL. The patch removes the gfp flags argument and updates all driver paths. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-27IB/hfi1: Use QPN mask to avoid overflowDennis Dalessandro1-1/+1
Ensure we can't come up with an array size that is bigger than the array by applying the QPN mask before the divide in the free_qpn function. Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
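As an illustration of why the mask has to be applied before the divide, here is a small userspace sketch. The constants are assumptions chosen for the example (24-bit QPNs, bitmap pages of 4 KiB) and the function names are hypothetical; this is not the driver's free_qpn().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumptions for illustration: 24-bit QPNs and 4 KiB bitmap pages. */
#define SKETCH_QPN_MASK      0x00FFFFFFu
#define SKETCH_BITS_PER_PAGE (4096u * 8u)
#define SKETCH_MAP_ENTRIES   ((SKETCH_QPN_MASK + 1u) / SKETCH_BITS_PER_PAGE)

/* Without the mask, a QPN with high bits set indexes past the map array. */
static size_t qpn_map_index_unmasked(uint32_t qpn)
{
	return qpn / SKETCH_BITS_PER_PAGE;
}

/* Masking before the divide keeps the index inside the map array. */
static size_t qpn_map_index(uint32_t qpn)
{
	size_t idx = (qpn & SKETCH_QPN_MASK) / SKETCH_BITS_PER_PAGE;

	assert(idx < SKETCH_MAP_ENTRIES);
	return idx;
}
```

Under these assumptions the map has 512 entries, so any 32-bit input maps to an index in the range 0..511 once the mask is applied.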
2017-06-27IB/rdmavt: Remove duplicated functionsDennis Dalessandro1-23/+14
The free_qpn() function from the hfi1/qib driver which was the basis for rdmavt_free_qpn() function was accidentally left in the code. Remove it. Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-27IB/core,rdmavt,hfi1,opa-vnic: Send OPA cap_mask3 in trapVishwanathapura, Niranjana1-2/+7
Provide the ability for IB clients to modify the OPA specific capability mask and include this mask in the subsequent trap data. Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Signed-off-by: Michael N. Henry <michael.n.henry@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-27IB/rdmavt: Compress adjacent SGEs in rvt_lkey_ok()Mike Marciniszyn4-21/+126
SGEs that are contiguous needlessly consume driver-dependent TX resources. The lkey validation logic is enhanced to compress the SGE that ends up in the send wqe when consecutive addresses are detected. The lkey validation API used to return 1 (success) or 0 (fail). The return value is now an -errno, 0 (compressed), or 1 (uncompressed). An additional argument is added to pass the last SGE for the compression. Loopback callers always pass a NULL to last_sge since the optimization is of little benefit in that situation. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Brian Welty <brian.welty@intel.com> Signed-off-by: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
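The three-way return convention is easy to misread as a boolean, so a small userspace model may help. This is a sketch under stated assumptions: struct sketch_sge, sketch_lkey_ok() and sketch_build_sg_list() are hypothetical names, the "lkey == 0 is invalid" check stands in for real lkey validation, and adjacency is reduced to "same lkey and consecutive addresses". It is not the rdmavt implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified scatter/gather element. */
struct sketch_sge {
	uint64_t addr;
	uint32_t length;
	uint32_t lkey;
};

/*
 * Models the convention described above:
 *   < 0  -errno, the entry was rejected
 *     0  the entry was merged into *last (compressed)
 *     1  the entry was stored in *slot (uncompressed)
 * A NULL last disables compression, as the loopback callers do.
 */
static int sketch_lkey_ok(struct sketch_sge *slot, struct sketch_sge *last,
			  const struct sketch_sge *in)
{
	assert(slot && in);

	if (in->lkey == 0)	/* stand-in for real lkey validation */
		return -EINVAL;

	if (last && last->lkey == in->lkey &&
	    last->addr + last->length == in->addr) {
		last->length += in->length;
		return 0;
	}

	*slot = *in;
	return 1;
}

/*
 * Caller side: only a negative value is an error, and the return value
 * doubles as the number of output slots consumed. out[] must have room
 * for n entries. Returns the number of entries used or a negative errno.
 */
static int sketch_build_sg_list(struct sketch_sge *out,
				const struct sketch_sge *in, int n)
{
	struct sketch_sge *last = NULL;
	int used = 0;

	for (int i = 0; i < n; i++) {
		int ret = sketch_lkey_ok(&out[used], last, &in[i]);

		if (ret < 0)
			return ret;
		if (ret)
			last = &out[used];
		used += ret;
	}
	return used;
}
```

A caller still written for the old 1/0 convention would treat the compressed case (0) as a failure, which is why every call site has to test for ret < 0 instead of !ret.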
2017-05-01IB/core: Use rdma_ah_attr accessor functionsDasaratharaman Chandramouli2-18/+23
Modify core and driver components to use accessor functions introduced to access individual fields of rdma_ah_attr. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Don Hiatt <don.hiatt@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01IB/core: Rename ib_destroy_ah to rdma_destroy_ahDasaratharaman Chandramouli1-1/+1
Rename ib_destroy_ah to rdma_destroy_ah so it's in sync with the rename of the ib address handle attribute. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Don Hiatt <don.hiatt@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01IB/core: Rename struct ib_ah_attr to rdma_ah_attrDasaratharaman Chandramouli2-7/+7
This patch simply renames struct ib_ah_attr to rdma_ah_attr as these fields specify attributes that are not necessarily specific to IB. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Don Hiatt <don.hiatt@intel.com> Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-28IB/rdmavt/hfi1/qib: Use the MGID and MLID for multicast addressingMichael J. Ruhl1-17/+44
The Infiniband spec defines "A multicast address is defined by a MGID and a MLID" (section 10.5). The current code only uses the MGID for identifying multicast groups. Update the driver to be compliant with this definition. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
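To make the difference concrete, the sketch below compares multicast addresses on both fields instead of the MGID alone. The struct layout, the 16-bit MLID width, and the function name are assumptions made for this example and are not taken from the driver.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical multicast address: a 128-bit MGID plus an MLID. */
struct sketch_mcast_addr {
	uint8_t  mgid[16];
	uint16_t mlid;	/* assumption: 16-bit LID */
};

/*
 * Ordering usable as a lookup key: MGID first, MLID as tie-breaker.
 * Returns <0, 0 or >0. Two addresses match only if both fields match.
 */
static int sketch_mcast_cmp(const struct sketch_mcast_addr *a,
			    const struct sketch_mcast_addr *b)
{
	int ret;

	assert(a && b);

	ret = memcmp(a->mgid, b->mgid, sizeof(a->mgid));
	if (ret)
		return ret;

	return (a->mlid > b->mlid) - (a->mlid < b->mlid);
}
```

With an MGID-only comparison, two attach requests that share an MGID but carry different MLIDs would be treated as the same group; with the two-field comparison they are distinct.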
2017-04-28IB/rdmavt: restore IRQs on error path in rvt_create_ah()Dan Carpenter1-1/+1
We need to call spin_unlock_irqrestore() instead of vanilla spin_unlock() on this error path. Fixes: 119a8e708d16 ("IB/rdmavt: Add AH to rdmavt") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-25IB: Replace ib_umem page_size by page_shiftArtemy Kovalyov1-4/+4
The page size is held by struct ib_umem in the page_size field. It is better to store it as an exponent, because a page size is by nature always a power of two and is used as a factor, a divisor, or an argument to ilog2. Converting page_size to page_shift makes the code portable and avoids the following error while compiling on ARM: ERROR: "__aeabi_uldivmod" [drivers/infiniband/core/ib_core.ko] undefined! CC: Selvin Xavier <selvin.xavier@broadcom.com> CC: Steve Wise <swise@chelsio.com> CC: Lijun Ou <oulijun@huawei.com> CC: Shiraz Saleem <shiraz.saleem@intel.com> CC: Adit Ranadive <aditr@vmware.com> CC: Dennis Dalessandro <dennis.dalessandro@intel.com> CC: Ram Amrani <Ram.Amrani@Cavium.com> Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Acked-by: Ram Amrani <Ram.Amrani@cavium.com> Acked-by: Shiraz Saleem <shiraz.saleem@intel.com> Acked-by: Selvin Xavier <selvin.xavier@broadcom.com> Acked-by: Selvin Xavier <selvin.xavier@broadcom.com> Acked-by: Adit Ranadive <aditr@vmware.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
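The ARM link error arises because a 64-bit division by a run-time value is compiled into a call to a libgcc helper (__aeabi_uldivmod) that the kernel does not provide. A shift or mask needs no such helper. The sketch below shows the equivalent computations in userspace; the function names are hypothetical and do not come from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Division form: on 32-bit ARM this needs a 64-bit divide helper. */
static uint64_t pages_by_division(uint64_t length, uint64_t page_size)
{
	assert(page_size != 0);
	return (length + page_size - 1) / page_size;
}

/* Shift form: same result for power-of-two page sizes, no divide. */
static uint64_t pages_by_shift(uint64_t length, unsigned int page_shift)
{
	uint64_t page_size;

	assert(page_shift < 64);
	page_size = 1ull << page_shift;

	return (length + page_size - 1) >> page_shift;
}

/* The remainder becomes a mask instead of a modulo. */
static uint64_t offset_in_page(uint64_t addr, unsigned int page_shift)
{
	assert(page_shift < 64);
	return addr & ((1ull << page_shift) - 1);
}
```

For a 4 KiB page (page_shift = 12), a length of 8193 bytes spans three pages with either form.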