summaryrefslogtreecommitdiff
path: root/drivers/infiniband/hw/bnxt_re
AgeCommit message (Collapse)AuthorFilesLines
2018-12-17RDMA/bnxt_re: Avoid accessing the device structure after it is freedSelvin Xavier1-0/+2
[ Upstream commit a6c66d6a08b88cc10aca9d3f65cfae31e7652a99 ] When bnxt_re_ib_reg returns failure, the device structure gets freed. Driver tries to access the device pointer after it is freed. [ 4871.034744] Failed to register with netedev: 0xffffffa1 [ 4871.034765] infiniband (null): Failed to register with IB: 0xffffffea [ 4871.046430] ================================================================== [ 4871.046437] BUG: KASAN: use-after-free in bnxt_re_task+0x63/0x180 [bnxt_re] [ 4871.046439] Write of size 4 at addr ffff880fa8406f48 by task kworker/u48:2/17813 [ 4871.046443] CPU: 20 PID: 17813 Comm: kworker/u48:2 Kdump: loaded Tainted: G B OE 4.20.0-rc1+ #42 [ 4871.046444] Hardware name: Dell Inc. PowerEdge R730/0599V5, BIOS 1.0.4 08/28/2014 [ 4871.046447] Workqueue: bnxt_re bnxt_re_task [bnxt_re] [ 4871.046449] Call Trace: [ 4871.046454] dump_stack+0x91/0xeb [ 4871.046458] print_address_description+0x6a/0x2a0 [ 4871.046461] kasan_report+0x176/0x2d0 [ 4871.046463] ? bnxt_re_task+0x63/0x180 [bnxt_re] [ 4871.046466] bnxt_re_task+0x63/0x180 [bnxt_re] [ 4871.046470] process_one_work+0x216/0x5b0 [ 4871.046471] ? process_one_work+0x189/0x5b0 [ 4871.046475] worker_thread+0x4e/0x3d0 [ 4871.046479] kthread+0x10e/0x140 [ 4871.046480] ? process_one_work+0x5b0/0x5b0 [ 4871.046482] ? kthread_stop+0x220/0x220 [ 4871.046486] ret_from_fork+0x3a/0x50 [ 4871.046492] The buggy address belongs to the page: [ 4871.046494] page:ffffea003ea10180 count:0 mapcount:0 mapping:0000000000000000 index:0x0 [ 4871.046495] flags: 0x57ffffc0000000() [ 4871.046498] raw: 0057ffffc0000000 0000000000000000 ffffea003ea10188 0000000000000000 [ 4871.046500] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 [ 4871.046501] page dumped because: kasan: bad access detected Avoid accessing the device structure once it is freed. Fixes: 497158aa5f52 ("RDMA/bnxt_re: Fix the ib_reg failure cleanup") Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-12-17RDMA/bnxt_re: Fix system hang when registration with L2 driver failsSelvin Xavier1-0/+1
[ Upstream commit 3c4b1419c33c2417836a63f8126834ee36968321 ] Driver doesn't release rtnl lock if registration with L2 driver (bnxt_re_register_netdev) fais and this causes hang while requesting for the next lock. [ 371.635416] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 371.635417] kworker/u48:1 D 0 634 2 0x80000000 [ 371.635423] Workqueue: bnxt_re bnxt_re_task [bnxt_re] [ 371.635424] Call Trace: [ 371.635426] ? __schedule+0x36b/0xbd0 [ 371.635429] schedule+0x39/0x90 [ 371.635430] schedule_preempt_disabled+0x11/0x20 [ 371.635431] __mutex_lock+0x45b/0x9c0 [ 371.635433] ? __mutex_lock+0x16d/0x9c0 [ 371.635435] ? bnxt_re_ib_reg+0x2b/0xb30 [bnxt_re] [ 371.635438] ? wake_up_klogd+0x37/0x40 [ 371.635442] bnxt_re_ib_reg+0x2b/0xb30 [bnxt_re] [ 371.635447] bnxt_re_task+0xfd/0x180 [bnxt_re] [ 371.635449] process_one_work+0x216/0x5b0 [ 371.635450] ? process_one_work+0x189/0x5b0 [ 371.635453] worker_thread+0x4e/0x3d0 [ 371.635455] kthread+0x10e/0x140 [ 371.635456] ? process_one_work+0x5b0/0x5b0 [ 371.635458] ? kthread_stop+0x220/0x220 [ 371.635460] ret_from_fork+0x3a/0x50 [ 371.635477] INFO: task NetworkManager:1228 blocked for more than 120 seconds. [ 371.635478] Tainted: G B OE 4.20.0-rc1+ #42 [ 371.635479] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Release the rtnl_lock correctly in the failure path. Fixes: de5c95d0f518 ("RDMA/bnxt_re: Fix system crash during RDMA resource initialization") Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-11-13RDMA/bnxt_re: Fix recursive lock warning in debug kernelSelvin Xavier1-2/+11
[ Upstream commit d455f29f6d76a5f94881ca1289aaa1e90617ff5d ] Fix possible recursive lock warning. Its a false warning as the locks are part of two differnt HW Queue data structure - cmdq and creq. Debug kernel is throwing the following warning and stack trace. [ 783.914967] ============================================ [ 783.914970] WARNING: possible recursive locking detected [ 783.914973] 4.19.0-rc2+ #33 Not tainted [ 783.914976] -------------------------------------------- [ 783.914979] swapper/2/0 is trying to acquire lock: [ 783.914982] 000000002aa3949d (&(&hwq->lock)->rlock){..-.}, at: bnxt_qplib_service_creq+0x232/0x350 [bnxt_re] [ 783.914999] but task is already holding lock: [ 783.915002] 00000000be73920d (&(&hwq->lock)->rlock){..-.}, at: bnxt_qplib_service_creq+0x2a/0x350 [bnxt_re] [ 783.915013] other info that might help us debug this: [ 783.915016] Possible unsafe locking scenario: [ 783.915019] CPU0 [ 783.915021] ---- [ 783.915034] lock(&(&hwq->lock)->rlock); [ 783.915035] lock(&(&hwq->lock)->rlock); [ 783.915037] *** DEADLOCK *** [ 783.915038] May be due to missing lock nesting notation [ 783.915039] 1 lock held by swapper/2/0: [ 783.915040] #0: 00000000be73920d (&(&hwq->lock)->rlock){..-.}, at: bnxt_qplib_service_creq+0x2a/0x350 [bnxt_re] [ 783.915044] stack backtrace: [ 783.915046] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.19.0-rc2+ #33 [ 783.915047] Hardware name: Dell Inc. PowerEdge R730/0599V5, BIOS 1.0.4 08/28/2014 [ 783.915048] Call Trace: [ 783.915049] <IRQ> [ 783.915054] dump_stack+0x90/0xe3 [ 783.915058] __lock_acquire+0x106c/0x1080 [ 783.915061] ? sched_clock+0x5/0x10 [ 783.915063] lock_acquire+0xbd/0x1a0 [ 783.915065] ? bnxt_qplib_service_creq+0x232/0x350 [bnxt_re] [ 783.915069] _raw_spin_lock_irqsave+0x4a/0x90 [ 783.915071] ? bnxt_qplib_service_creq+0x232/0x350 [bnxt_re] [ 783.915073] bnxt_qplib_service_creq+0x232/0x350 [bnxt_re] [ 783.915078] tasklet_action_common.isra.17+0x197/0x1b0 [ 783.915081] __do_softirq+0xcb/0x3a6 [ 783.915084] irq_exit+0xe9/0x100 [ 783.915085] do_IRQ+0x6a/0x120 [ 783.915087] common_interrupt+0xf/0xf [ 783.915088] </IRQ> Use nested notation for the spin_lock to avoid this warning. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-13RDMA/bnxt_re: Avoid accessing nq->bar_reg_iomem in failure caseSelvin Xavier1-1/+2
[ Upstream commit ed51efd2ce44091a858ad829f666727e7c95695e ] In the failure path, nq->bar_reg_iomem gets accessed without initializing. Avoid this by calling the bnxt_qplib_nq_stop_irq only if the initialization is complete. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver") Fixes: 6e04b1035689 ("RDMA/bnxt_re: Fix broken RoCE driver due to recent L2 driver changes") Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-24RDMA/bnxt_re: Fix system crash during RDMA resource initializationSelvin Xavier1-55/+38
bnxt_re_ib_reg acquires and releases the rtnl lock whenever it accesses the L2 driver. The following sequence can trigger a crash Acquires the rtnl_lock -> Registers roce driver callback with L2 driver -> release the rtnl lock bnxt_re acquires the rtnl_lock -> Request for MSIx vectors -> release the rtnl_lock Issue happens when bnxt_re proceeds with remaining part of initialization and L2 driver invokes bnxt_ulp_irq_stop as a part of bnxt_open_nic. The crash is in bnxt_qplib_nq_stop_irq as the NQ structures are not initialized yet, <snip> [ 3551.726647] BUG: unable to handle kernel NULL pointer dereference at (null) [ 3551.726656] IP: [<ffffffffc0840ee9>] bnxt_qplib_nq_stop_irq+0x59/0xb0 [bnxt_re] [ 3551.726674] PGD 0 [ 3551.726679] Oops: 0002 1 SMP ... [ 3551.726822] Hardware name: Dell Inc. PowerEdge R720/08RW36, BIOS 2.4.3 07/09/2014 [ 3551.726826] task: ffff97e30eec5ee0 ti: ffff97e3173bc000 task.ti: ffff97e3173bc000 [ 3551.726829] RIP: 0010:[<ffffffffc0840ee9>] [<ffffffffc0840ee9>] bnxt_qplib_nq_stop_irq+0x59/0xb0 [bnxt_re] ... [ 3551.726872] Call Trace: [ 3551.726886] [<ffffffffc082cb9e>] bnxt_re_stop_irq+0x4e/0x70 [bnxt_re] [ 3551.726899] [<ffffffffc07d6a53>] bnxt_ulp_irq_stop+0x43/0x70 [bnxt_en] [ 3551.726908] [<ffffffffc07c82f4>] bnxt_reserve_rings+0x174/0x1e0 [bnxt_en] [ 3551.726917] [<ffffffffc07cafd8>] __bnxt_open_nic+0x368/0x9a0 [bnxt_en] [ 3551.726925] [<ffffffffc07cb62b>] bnxt_open_nic+0x1b/0x50 [bnxt_en] [ 3551.726934] [<ffffffffc07cc62f>] bnxt_setup_mq_tc+0x11f/0x260 [bnxt_en] [ 3551.726943] [<ffffffffc07d5f58>] bnxt_dcbnl_ieee_setets+0xb8/0x1f0 [bnxt_en] [ 3551.726954] [<ffffffff890f983a>] dcbnl_ieee_set+0x9a/0x250 [ 3551.726966] [<ffffffff88fd6d21>] ? __alloc_skb+0xa1/0x2d0 [ 3551.726972] [<ffffffff890f72fa>] dcb_doit+0x13a/0x210 [ 3551.726981] [<ffffffff89003ff7>] rtnetlink_rcv_msg+0xa7/0x260 [ 3551.726989] [<ffffffff88ffdb00>] ? rtnl_unicast+0x20/0x30 [ 3551.726996] [<ffffffff88bf9dc8>] ? __kmalloc_node_track_caller+0x58/0x290 [ 3551.727002] [<ffffffff890f7326>] ? dcb_doit+0x166/0x210 [ 3551.727007] [<ffffffff88fd6d0d>] ? __alloc_skb+0x8d/0x2d0 [ 3551.727012] [<ffffffff89003f50>] ? rtnl_newlink+0x880/0x880 ... [ 3551.727104] [<ffffffff8911f7d5>] system_call_fastpath+0x1c/0x21 ... [ 3551.727164] RIP [<ffffffffc0840ee9>] bnxt_qplib_nq_stop_irq+0x59/0xb0 [bnxt_re] [ 3551.727175] RSP <ffff97e3173bf788> [ 3551.727177] CR2: 0000000000000000 Avoid this inconsistent state and system crash by acquiring the rtnl lock for the entire duration of device initialization. Re-factor the code to remove the rtnl lock from the individual function and acquire and release it from the caller. Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver") Fixes: 6e04b1035689 ("RDMA/bnxt_re: Fix broken RoCE driver due to recent L2 driver changes") Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-06bnxt_re: Fix couple of memory leaks that could lead to IOMMU call tracesSomnath Kotur2-1/+3
1. DMA-able memory allocated for Shadow QP was not being freed. 2. bnxt_qplib_alloc_qp_hdr_buf() had a bug wherein the SQ pointer was erroneously pointing to the RQ. But since the corresponding free_qp_hdr_buf() was correct, memory being free was less than what was allocated. Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver") Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-31RDMA/providers: Fix return value from create_srq callbacksKamal Heib1-1/+1
The proper return code is "-EOPNOTSUPP" when the create_srq() callback is not supported. Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-31RDMA, core and ULPs: Declare ib_post_send() and ib_post_recv() arguments constBart Van Assche2-14/+14
Since neither ib_post_send() nor ib_post_recv() modify the data structure their second argument points at, declare that argument const. This change makes it necessary to declare the 'bad_wr' argument const too and also to modify all ULPs that call ib_post_send(), ib_post_recv() or ib_post_srq_recv(). This patch does not change any functionality but makes it possible for the compiler to verify whether the ib_post_(send|recv|srq_recv) really do not modify the posted work request. To make this possible, only one cast had to be introduce that casts away constness, namely in rpcrdma_post_recvs(). The only way I can think of to avoid that cast is to introduce an additional loop in that function or to change the data type of bad_wr from struct ib_recv_wr ** into int (an index that refers to an element in the work request list). However, both approaches would require even more extensive changes than this patch. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-31RDMA: Constify the argument of the work request conversion functionsBart Van Assche1-9/+9
When posting a send work request, the work request that is posted is not modified by any of the RDMA drivers. Make this explicit by constifying most ib_send_wr pointers in RDMA transport drivers. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-24RDMA/bnxt_re: Modify a fall-through annotationBart Van Assche1-1/+1
This patch avoids that gcc reports the following warning when building with W=1: drivers/infiniband/hw/bnxt_re/ib_verbs.c:2404:4: warning: this statement may fall through [-Wimplicit-fallthrough=] Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-10RDMA: Fix storage of PortInfo CapabilityMask in the kernelJason Gunthorpe1-2/+2
The internal flag IP_BASED_GIDS was added to a field that was being used to hold the port Info CapabilityMask without considering the effects this will have. Since most drivers just use the value from the HW MAD it means IP_BASED_GIDS will also become set on any HW that sets the IBA flag IsOtherLocalChangesNoticeSupported - which is not intended. Fix this by keeping port_cap_flags only for the IBA CapabilityMask value and store unrelated flags externally. Move the bit definitions for this to ib_mad.h to make it clear what is happening. To keep the uAPI unchanged define a new set of flags in the uapi header that are only used by ib_uverbs_query_port_resp.port_cap_flags which match the current flags supported in rdma-core, and the values exposed by the current kernel. Fixes: b4a26a27287a ("IB: Report using RoCE IP based gids in port caps") Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-07-04RDMA/bnxt_re: Fix a bunch of off by one bugs in qplib_fp.cDan Carpenter1-6/+6
The srq->swq[] is allocated in bnxt_qplib_create_srq(). It has srq->hwq.max_elements elements so these tests should be > instead of >= or we might go beyond the end of the array. Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-04RDMA/bnxt_re: Fix a couple off by one bugsDan Carpenter1-2/+2
The sgid_tbl->tbl[] array is allocated in bnxt_qplib_alloc_sgid_tbl(). It has sgid_tbl->max elements. So the > should be >= to prevent accessing one element beyond the end of the array. Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-18IB/core: add max_send_sge and max_recv_sge attributesSteve Wise1-1/+2
This patch replaces the ib_device_attr.max_sge with max_send_sge and max_recv_sge. It allows ulps to take advantage of devices that have very different send and recv sge depths. For example cxgb4 has a max_recv_sge of 4, yet a max_send_sge of 16. Splitting out these attributes allows much more efficient use of the SQ for cxgb4 with ulps that use the RDMA_RW API. Consider a large RDMA WRITE that has 16 scattergather entries. With max_sge of 4, the ulp would send 4 WRITE WRs, but with max_sge of 16, it can be done with 1 WRITE WR. Acked-by: Sagi Grimberg <sagi@grimberg.me> Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: Selvin Xavier <selvin.xavier@broadcom.com> Acked-by: Shiraz Saleem <shiraz.saleem@intel.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-18RDMA: Convert drivers to use the AH's sgid_attr in post_wr pathsParav Pandit1-23/+10
For UD the drivers were doing a sgid_index lookup into the cache to get the attrs, however we can now directly access the same attrs stores in the ib_ah instead and remove the lookup. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-06-18RDMA: Convert drivers to use sgid_attr instead of sgid_indexParav Pandit1-41/+22
The core code now ensures that all driver callbacks that receive an rdma_ah_attrs will have a sgid_attr's pointer if there is a GRH present. Drivers can use this pointer instead of calling a query function with sgid_index. This simplifies the drivers and also avoids races where a gid_index lookup may return different data if it is changed. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-06-18RDMA: Use GID from the ib_gid_attr during the add_gid() callbackParav Pandit2-5/+3
Now that ib_gid_attr contains the GID, make use of that in the add_gid() callback functions for the provider drivers to simplify the add_gid() implementations. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-25RDMA/bnxt_re: Fix broken RoCE driver due to recent L2 driver changesDevesh Sharma5-53/+163
The recent changes in Broadcom's ethernet driver(L2 driver) broke RoCE functionality in terms of MSIx vector allocation and de-allocation. There is a possibility that L2 driver would initiate MSIx vector reallocation depending upon the requests coming from administrator. In such cases L2 driver needs to free up all the MSIx vectors allocated previously and reallocate/initialize those. If RoCE driver is loaded and reshuffling is attempted, there will be kernel crashes because RoCE driver would still be holding the MSIx vectors but L2 driver would attempt to free in-use vectors. Thus leading to a kernel crash. Making changes in roce driver to fix crashes described above. As part of solution L2 driver tells RoCE driver to release the MSIx vector whenever there is a need. When RoCE driver get message it sync up with all the running tasklets and IRQ handlers and releases the vectors. L2 driver send one more message to RoCE driver to resume the MSIx vectors. L2 driver guarantees that RoCE vector do not change during reshuffling. Fixes: ec86f14ea506 ("bnxt_en: Add ULP calls to stop and restart IRQs.") Fixes: 08654eb213a8 ("bnxt_en: Change IRQ assignment for RDMA driver.") Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04RDMA: Use ib_gid_attr during GID modificationParav Pandit2-10/+6
Now that ib_gid_attr contains device, port and index, simplify the provider APIs add_gid() and del_gid() to use device, port and index fields from the ib_gid_attr attributes structure. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04IB/providers: Avoid null netdev check for RoCEParav Pandit1-3/+2
Now that IB core GID cache ensures that all RoCE entries have an associated netdev remove null checks from the provider drivers for clarity. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04RDMA/providers: Simplify query_gid callback of RoCE providersParav Pandit1-1/+0
ib_query_gid() fetches the GID from the software cache maintained in ib_core for RoCE ports. Therefore, simplify the provider drivers for RoCE to treat query_gid() callback as never called for RoCE, and only require non-RoCE devices to implement it. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19IB/uverbs: Extend uverbs_ioctl header with driver_idMatan Barak1-0/+1
Extending uverbs_ioctl header with driver_id and another reserved field. driver_id should be used in order to identify the driver. Since every driver could have its own parsing tree, this is necessary for strace support. Downstream patches take off the EXPERIMENTAL flag from the ioctl() IB support and thus we add some reserved fields for future usage. Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15Merge branch 'k.o/wip/dl-for-rc' into k.o/wip/dl-for-nextDoug Ledford9-98/+102
Due to bug fixes found by the syzkaller bot and taken into the for-rc branch after development for the 4.17 merge window had already started being taken into the for-next branch, there were fairly non-trivial merge issues that would need to be resolved between the for-rc branch and the for-next branch. This merge resolves those conflicts and provides a unified base upon which ongoing development for 4.17 can be based. Conflicts: drivers/infiniband/hw/mlx5/main.c - Commit 42cea83f9524 (IB/mlx5: Fix cleanup order on unload) added to for-rc and commit b5ca15ad7e61 (IB/mlx5: Add proper representors support) add as part of the devel cycle both needed to modify the init/de-init functions used by mlx5. To support the new representors, the new functions added by the cleanup patch needed to be made non-static, and the init/de-init list added by the representors patch needed to be modified to match the init/de-init list changes made by the cleanup patch. Updates: drivers/infiniband/hw/mlx5/mlx5_ib.h - Update function prototypes added by representors patch to reflect new function names as changed by cleanup patch drivers/infiniband/hw/mlx5/ib_rep.c - Update init/de-init stage list to match new order from cleanup patch Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-15infiniband: bnxt_re: use BIT_ULL() for 64-bit bit masksArnd Bergmann2-3/+3
On 32-bit targets, we otherwise get a warning about an impossible constant integer expression: In file included from include/linux/kernel.h:11, from include/linux/interrupt.h:6, from drivers/infiniband/hw/bnxt_re/ib_verbs.c:39: drivers/infiniband/hw/bnxt_re/ib_verbs.c: In function 'bnxt_re_query_device': include/linux/bitops.h:7:24: error: left shift count >= width of type [-Werror=shift-count-overflow] #define BIT(nr) (1UL << (nr)) ^~ drivers/infiniband/hw/bnxt_re/bnxt_re.h:61:34: note: in expansion of macro 'BIT' #define BNXT_RE_MAX_MR_SIZE_HIGH BIT(39) ^~~ drivers/infiniband/hw/bnxt_re/bnxt_re.h:62:30: note: in expansion of macro 'BNXT_RE_MAX_MR_SIZE_HIGH' #define BNXT_RE_MAX_MR_SIZE BNXT_RE_MAX_MR_SIZE_HIGH ^~~~~~~~~~~~~~~~~~~~~~~~ drivers/infiniband/hw/bnxt_re/ib_verbs.c:149:25: note: in expansion of macro 'BNXT_RE_MAX_MR_SIZE' ib_attr->max_mr_size = BNXT_RE_MAX_MR_SIZE; ^~~~~~~~~~~~~~~~~~~ Fixes: 872f3578241d ("RDMA/bnxt_re: Add support for MRs with Huge pages") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15infiniband: qplib_fp: fix pointer castArnd Bergmann1-2/+2
Building for a 32-bit target results in a couple of warnings from casting between a 32-bit pointer and a 64-bit integer: drivers/infiniband/hw/bnxt_re/qplib_fp.c: In function 'bnxt_qplib_service_nq': drivers/infiniband/hw/bnxt_re/qplib_fp.c:333:23: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast] bnxt_qplib_arm_srq((struct bnxt_qplib_srq *)q_handle, ^ drivers/infiniband/hw/bnxt_re/qplib_fp.c:336:12: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast] (struct bnxt_qplib_srq *)q_handle, ^ In file included from include/linux/byteorder/little_endian.h:5, from arch/arm/include/uapi/asm/byteorder.h:22, from include/asm-generic/bitops/le.h:6, from arch/arm/include/asm/bitops.h:342, from include/linux/bitops.h:38, from include/linux/kernel.h:11, from include/linux/interrupt.h:6, from drivers/infiniband/hw/bnxt_re/qplib_fp.c:39: drivers/infiniband/hw/bnxt_re/qplib_fp.c: In function 'bnxt_qplib_create_srq': include/uapi/linux/byteorder/little_endian.h:31:43: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast] #define __cpu_to_le64(x) ((__force __le64)(__u64)(x)) ^ include/linux/byteorder/generic.h:86:21: note: in expansion of macro '__cpu_to_le64' #define cpu_to_le64 __cpu_to_le64 ^~~~~~~~~~~~~ drivers/infiniband/hw/bnxt_re/qplib_fp.c:569:19: note: in expansion of macro 'cpu_to_le64' req.srq_handle = cpu_to_le64(srq); Using a uintptr_t as an intermediate works on all architectures. Fixes: 37cb11acf1f7 ("RDMA/bnxt_re: Add SRQ support for Broadcom adapters") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-13RDMA/bnxt_re: Remove an unused variableBart Van Assche1-3/+2
This patch does not change any functionality. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Selvin Xavier <selvin.xavier@broadcom.com> Cc: Devesh Sharma <devesh.sharma@broadcom.com> Cc: Somnath Kotur <somnath.kotur@broadcom.com> Cc: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-07RDMA/bnxt_re: Avoid Hard lockup during error CQE processingSelvin Xavier6-90/+55
Hitting the following hardlockup due to a race condition in error CQE processing. [26146.879798] bnxt_en 0000:04:00.0: QPLIB: FP: CQ Processed Req [26146.886346] bnxt_en 0000:04:00.0: QPLIB: wr_id[1251] = 0x0 with status 0xa [26156.350935] NMI watchdog: Watchdog detected hard LOCKUP on cpu 4 [26156.357470] Modules linked in: nfsd auth_rpcgss nfs_acl lockd grace [26156.447957] CPU: 4 PID: 3413 Comm: kworker/4:1H Kdump: loaded [26156.457994] Hardware name: Dell Inc. PowerEdge R430/0CN7X8, [26156.466390] Workqueue: ib-comp-wq ib_cq_poll_work [ib_core] [26156.472639] Call Trace: [26156.475379] <NMI> [<ffffffff98d0d722>] dump_stack+0x19/0x1b [26156.481833] [<ffffffff9873f775>] watchdog_overflow_callback+0x135/0x140 [26156.489341] [<ffffffff9877f237>] __perf_event_overflow+0x57/0x100 [26156.496256] [<ffffffff98787c24>] perf_event_overflow+0x14/0x20 [26156.502887] [<ffffffff9860a580>] intel_pmu_handle_irq+0x220/0x510 [26156.509813] [<ffffffff98d16031>] perf_event_nmi_handler+0x31/0x50 [26156.516738] [<ffffffff98d1790c>] nmi_handle.isra.0+0x8c/0x150 [26156.523273] [<ffffffff98d17be8>] do_nmi+0x218/0x460 [26156.528834] [<ffffffff98d16d79>] end_repeat_nmi+0x1e/0x7e [26156.534980] [<ffffffff987089c0>] ? native_queued_spin_lock_slowpath+0x1d0/0x200 [26156.543268] [<ffffffff987089c0>] ? native_queued_spin_lock_slowpath+0x1d0/0x200 [26156.551556] [<ffffffff987089c0>] ? native_queued_spin_lock_slowpath+0x1d0/0x200 [26156.559842] <EOE> [<ffffffff98d083e4>] queued_spin_lock_slowpath+0xb/0xf [26156.567555] [<ffffffff98d15690>] _raw_spin_lock+0x20/0x30 [26156.573696] [<ffffffffc08381a1>] bnxt_qplib_lock_buddy_cq+0x31/0x40 [bnxt_re] [26156.581789] [<ffffffffc083bbaa>] bnxt_qplib_poll_cq+0x43a/0xf10 [bnxt_re] [26156.589493] [<ffffffffc083239b>] bnxt_re_poll_cq+0x9b/0x760 [bnxt_re] The issue happens if RQ poll_cq or SQ poll_cq or Async error event tries to put the error QP in flush list. Since SQ and RQ of each error qp are added to two different flush list, we need to protect it using locks of corresponding CQs. Difference in order of acquiring the lock in SQ poll_cq and RQ poll_cq can cause a hard lockup. Revisits the locking strategy and removes the usage of qplib_cq.hwq.lock. Instead of this lock, introduces qplib_cq.flush_lock to handle addition/deletion of QPs in flush list. Also, always invoke the flush_lock in order (SQ CQ lock first and then RQ CQ lock) to avoid any potential deadlock. Other than the poll_cq context, the movement of QP to/from flush list can be done in modify_qp context or from an async error event from HW. Synchronize these operations using the bnxt_re verbs layer CQ locks. To achieve this, adds a call back to the HW abstraction layer(qplib) to bnxt_re ib_verbs layer in case of async error event. Also, removes the buddy cq functions as it is no longer required. Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-07RDMA/bnxt_re/qplib_sp: Use true and false for boolean valuesGustavo A. R. Silva1-1/+1
Assign true or false to boolean variables instead of an integer value. This issue was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Acked-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-28infiniband: bnxt_re: use BIT_ULL() for 64-bit bit masksArnd Bergmann2-3/+3
On 32-bit targets, we otherwise get a warning about an impossible constant integer expression: In file included from include/linux/kernel.h:11, from include/linux/interrupt.h:6, from drivers/infiniband/hw/bnxt_re/ib_verbs.c:39: drivers/infiniband/hw/bnxt_re/ib_verbs.c: In function 'bnxt_re_query_device': include/linux/bitops.h:7:24: error: left shift count >= width of type [-Werror=shift-count-overflow] #define BIT(nr) (1UL << (nr)) ^~ drivers/infiniband/hw/bnxt_re/bnxt_re.h:61:34: note: in expansion of macro 'BIT' #define BNXT_RE_MAX_MR_SIZE_HIGH BIT(39) ^~~ drivers/infiniband/hw/bnxt_re/bnxt_re.h:62:30: note: in expansion of macro 'BNXT_RE_MAX_MR_SIZE_HIGH' #define BNXT_RE_MAX_MR_SIZE BNXT_RE_MAX_MR_SIZE_HIGH ^~~~~~~~~~~~~~~~~~~~~~~~ drivers/infiniband/hw/bnxt_re/ib_verbs.c:149:25: note: in expansion of macro 'BNXT_RE_MAX_MR_SIZE' ib_attr->max_mr_size = BNXT_RE_MAX_MR_SIZE; ^~~~~~~~~~~~~~~~~~~ Fixes: 872f3578241d ("RDMA/bnxt_re: Add support for MRs with Huge pages") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-28infiniband: qplib_fp: fix pointer castArnd Bergmann1-2/+2
Building for a 32-bit target results in a couple of warnings from casting between a 32-bit pointer and a 64-bit integer: drivers/infiniband/hw/bnxt_re/qplib_fp.c: In function 'bnxt_qplib_service_nq': drivers/infiniband/hw/bnxt_re/qplib_fp.c:333:23: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast] bnxt_qplib_arm_srq((struct bnxt_qplib_srq *)q_handle, ^ drivers/infiniband/hw/bnxt_re/qplib_fp.c:336:12: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast] (struct bnxt_qplib_srq *)q_handle, ^ In file included from include/linux/byteorder/little_endian.h:5, from arch/arm/include/uapi/asm/byteorder.h:22, from include/asm-generic/bitops/le.h:6, from arch/arm/include/asm/bitops.h:342, from include/linux/bitops.h:38, from include/linux/kernel.h:11, from include/linux/interrupt.h:6, from drivers/infiniband/hw/bnxt_re/qplib_fp.c:39: drivers/infiniband/hw/bnxt_re/qplib_fp.c: In function 'bnxt_qplib_create_srq': include/uapi/linux/byteorder/little_endian.h:31:43: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast] #define __cpu_to_le64(x) ((__force __le64)(__u64)(x)) ^ include/linux/byteorder/generic.h:86:21: note: in expansion of macro '__cpu_to_le64' #define cpu_to_le64 __cpu_to_le64 ^~~~~~~~~~~~~ drivers/infiniband/hw/bnxt_re/qplib_fp.c:569:19: note: in expansion of macro 'cpu_to_le64' req.srq_handle = cpu_to_le64(srq); Using a uintptr_t as an intermediate works on all architectures. Fixes: 37cb11acf1f7 ("RDMA/bnxt_re: Add SRQ support for Broadcom adapters") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-28RDMA/bnxt_re: Fix the ib_reg failure cleanupSelvin Xavier1-1/+4
Release the netdev references in the cleanup path. Invokes the cleanup routines if bnxt_re_ib_reg fails. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-28RDMA/bnxt_re: Fix incorrect DB offset calculationDevesh Sharma4-3/+32
To support host systems with non 4K page size, l2_db_size shall be calculated with 4096 instead of PAGE_SIZE. Also, supply the host page size to FW during initialization. Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-28RDMA/bnxt_re: Unconditionly fence non wire memory operationsDevesh Sharma1-4/+11
HW requires an unconditonal fence for all non-wire memory operations through SQ. This guarantees the completions of these memory operations. Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-20RDMA/bnxt_re: Avoid system hang during device un-regSelvin Xavier2-5/+4
BNXT_RE_FLAG_TASK_IN_PROG doesn't handle multiple work requests posted together. Track schedule of multiple workqueue items by maintaining a per device counter and proceed with IB dereg only if this counter is zero. flush_workqueue is no longer required from NETDEV_UNREGISTER path. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-20RDMA/bnxt_re: Fix system crash during load/unloadSelvin Xavier1-0/+5
During driver unload, the driver proceeds with cleanup without waiting for the scheduled events. So the device pointers get freed up and driver crashes when the events are scheduled later. Flush the bnxt_re_task work queue before starting device removal. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-20RDMA/bnxt_re: Synchronize destroy_qp with poll_cqSelvin Xavier4-19/+47
Avoid system crash when destroy_qp is invoked while the driver is processing the poll_cq. Synchronize these functions using the cq_lock. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-20RDMA/bnxt_re: Unpin SQ and RQ memory if QP create failsDevesh Sharma1-1/+8
Driver leaves the QP memory pinned if QP create command fails from the FW. Avoids this scenario by adding a proper exit path if the FW command fails. Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-20RDMA/bnxt_re: Disable atomic capability on bnxt_re adaptersDevesh Sharma2-17/+3
More testing needs to be done before enabling this feature. Disabling the feature temporarily Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-02RDMA/bnxt_re: Use common error handling code in bnxt_qplib_alloc_dpi_tbl()Markus Elfring1-6/+7
Add a jump target so that a bit of exception handling can be better reused at the end of this function. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Acked-by: Devesh Sharma <devesh.sharma@broadcom.com> Acked-by: Jonathan Toppins <jtoppins@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-02RDMA/bnxt_re: Delete two error messages for a failed memory allocation in ↵Markus Elfring1-5/+0
bnxt_qplib_alloc_dpi_tbl() Omit extra messages for a memory allocation failure in this function. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Acked-by: Devesh Sharma <devesh.sharma@broadcom.com> Acked-by: Jonathan Toppins <jtoppins@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-01RDMA/bnxt_re: Fix an error code in bnxt_qplib_create_srq()Dan Carpenter1-1/+3
We should return -ENOMEM if the allocation fails. (The current code returns succees). Fixes: 37cb11acf1f7 ("RDMA/bnxt_re: Add SRQ support for Broadcom adapters") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-By: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-01RDMA/bnxt_re: Fix static checker warningDoug Ledford1-5/+2
If there is ever any error while creating srq->umem, we return that error, we don't store it in srq->umem, so any check of srq->umem for IS_ERR is pointless. Further, checking udata is unnecessary as srq->umem is always either NULL or valid, without respect to udata. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-18RDMA/bnxt_re: Add SRQ support for Broadcom adaptersDevesh Sharma7-98/+816
Shared receive queue (SRQ) is defined as a pool of receive buffers shared among multiple QPs which belong to same protection domain in a given process context. Use of SRQ reduces the memory foot print of IB applications. Broadcom adapters support SRQ, adding code-changes to enable shared receive queue. Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-18RDMA/bnxt_re: expose detailed stats retrieved from HWSelvin Xavier7-10/+417
Broadcom's adapter supports more granular statistics to allow better understanding about the state of the chip when data traffic is flowing. Exposing the detailed stats to the consumer through the standard hook available in the kverbs interface. In order to retrieve all the information, driver implements a firmware command. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-18RDMA/bnxt_re: Add support for MRs with Huge pagesSomnath Kotur5-57/+140
Depending on the OS page-table configurations, applications may request MRs which has page size alignment other than 4K Underlying provider driver needs to adjust its PBL boundaries according to the incoming page boundaries in the PA list. Adding a capability to register MRs having pages-sizes other than 4K (Hugepages). Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-17RDMA/bnxt_re: Add support for query firmware versionSelvin Xavier6-15/+39
The device now reports firmware version thus, removing the hard coded values of the FW version string and redundant fw_rev hook from sysfs. Adding code to query firmware version from underlying device and report it through the kernel verb to get firmware version string. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-17RDMA/bnxt_re: Enable RoCE on virtual functionsSelvin Xavier6-37/+182
RoCE can be used by virtual functions (VFs) as well. Adding code changes to allow resource reservation, initialization and avail the resources to the RDMA applications running on those VFs. Currently, fifty percent of the total available resources are reserved for PF and remaining are equally divided among active VFs. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-05bnxt_re: report RoCE device support at info levelJonathan Toppins1-1/+1
Reporting that a device doesn't support RoCE seems like a valuable piece of information to have when trying to determine why a driver is not binding to a device. Better to report this at info log level instead of requiring a user to enable all debug messages in the driver. Signed-off-by: Jonathan Toppins <jtoppins@redhat.com> Acked-By: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-02RDMA/bnxt_re: Use zeroing memory allocator than allocator/memsetHimanshu Jha1-5/+4
Use dma_zalloc_coherent for allocating zeroed memory and remove unnecessary memset function. Done using Coccinelle. Generated-by: scripts/coccinelle/api/alloc/kzalloc-simple.cocci 0-day tested with no failures. Suggested-by: Luis R. Rodriguez <mcgrof@kernel.org> Signed-off-by: Himanshu Jha <himanshujha199640@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-13RDMA/bnxt_re: Remove redundant bnxt_qplib_disable_nq() callArvind Yadav1-1/+0
The bnxt_qplib_disable_nq() call is redundant as it occurs after 'goto fail' and hence it called twice. Remove it. Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>