summaryrefslogtreecommitdiff
path: root/drivers/infiniband/hw/mlx4/main.c
AgeCommit message (Collapse)AuthorFilesLines
2020-05-05RDMA/mlx4: Initialize ib_spec on the stackAlaa Hleihel1-1/+2
commit c08cfb2d8d78bfe81b37cc6ba84f0875bddd0d5c upstream. Initialize ib_spec on the stack before using it, otherwise we will have garbage values that will break creating default rules with invalid parsing error. Fixes: a37a1a428431 ("IB/mlx4: Add mechanism to support flow steering over IB links") Link: https://lore.kernel.org/r/20200413132235.930642-1-leon@kernel.org Signed-off-by: Alaa Hleihel <alaa@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-12IB/mlx4: Follow mirror sequence of device add during device removalParav Pandit1-4/+5
[ Upstream commit 89f988d93c62384758b19323c886db917a80c371 ] Current code device add sequence is: ib_register_device() ib_mad_init() init_sriov_init() register_netdev_notifier() Therefore, the remove sequence should be, unregister_netdev_notifier() close_sriov() mad_cleanup() ib_unregister_device() However it is not above. Hence, make do above remove sequence. Fixes: fa417f7b520ee ("IB/mlx4: Add support for IBoE") Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20191212091214.315005-3-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-08-06infiniband: fix race condition between infiniband mlx4, mlx5 driver and core ↵Ajay Kaher1-1/+3
dumping This patch is the extension of following upstream commit to fix the race condition between get_task_mm() and core dumping for IB->mlx4 and IB->mlx5 drivers: commit 04f5866e41fb ("coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping")' Thanks to Jason for pointing this. Signed-off-by: Ajay Kaher <akaher@vmware.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30IB/mlx4: Include GID type when deleting GIDs from HW table under RoCEJack M1-2/+7
[ Upstream commit a18177925c252da7801149abe217c05b80884798 ] The commit cited below added a gid_type field (RoCEv1 or RoCEv2) to GID properties. When adding GIDs, this gid_type field was copied over to the hardware gid table. However, when deleting GIDs, the gid_type field was not copied over to the hardware gid table. As a result, when running RoCEv2, all RoCEv2 gids in the hardware gid table were set to type RoCEv1 when any gid was deleted. This problem would persist until the next gid was added (which would again restore the gid_type field for all the gids in the hardware gid table). Fix this by copying over the gid_type field to the hardware gid table when deleting gids, so that the gid_type of all remaining gids is preserved when a gid is deleted. Fixes: b699a859d17b ("IB/mlx4: Add gid_type to GID properties") Reviewed-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30IB/mlx4: Fix corruption of RoCEv2 IPv4 GIDsJack Morgenstein1-2/+0
[ Upstream commit 0077416a3d529baccbe07ab3242e8db541cfadf6 ] When using IPv4 addresses in RoCEv2, the GID format for the mapped IPv4 address should be: ::ffff:<4-byte IPv4 address>. In the cited commit, IPv4 mapped IPV6 addresses had the 3 upper dwords zeroed out by memset, which resulted in deleting the ffff field. However, since procedure ipv6_addr_v4mapped() already verifies that the gid has format ::ffff:<ipv4 address>, no change is needed for the gid, and the memset can simply be removed. Fixes: 7e57b85c444c ("IB/mlx4: Add support for setting RoCEv2 gids in hardware") Reviewed-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-24IB/mlx4: Change vma from shared to privateMaor Gottlieb1-0/+2
[ Upstream commit ca37a664a8e4e9988b220988ceb4d79e3316f195 ] Anonymous VMA (->vm_ops == NULL) cannot be shared, otherwise it would lead to SIGBUS. Remove the shared flags from the vma after we change it to be anonymous. This is easily reproduced by doing modprobe -r while running a user-space application such as raw_ethernet_bw. Fixes: ae184ddeca5db ('IB/mlx4_ib: Disassociate support') Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-24IB/mlx4: Take write semaphore when changing the vma structMaor Gottlieb1-2/+2
[ Upstream commit 22c3653d04bd0c67b75e99d85e0c0bdf83947df5 ] When the driver disassociate user context, it changes the vma to anonymous by setting the vm_ops to null and zap the vma ptes. In order to avoid race in the kernel, we need to take write lock before we change the vma entries. Fixes: ae184ddeca5db ('IB/mlx4_ib: Disassociate support') Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-22IB/mlx4: Fix incorrectly releasing steerable UD QPs when have only ETH portsJack Morgenstein1-8/+5
commit 852f6927594d0d3e8632c889b2ab38cbc46476ad upstream. Allocating steerable UD QPs depends on having at least one IB port, while releasing those QPs does not. As a result, when there are only ETH ports, the IB (RoCE) driver requests releasing a qp range whose base qp is zero, with qp count zero. When SR-IOV is enabled, and the VF driver is running on a VM over a hypervisor which treats such qp release calls as errors (rather than NOPs), we see lines in the VM message log like: mlx4_core 0002:00:02.0: Failed to release qp range base:0 cnt:0 Fix this by adding a check for a zero count in mlx4_release_qp_range() (which thus treats releasing 0 qps as a nop), and eliminating the check for device managed flow steering when releasing steerable UD QPs. (Freeing ib_uc_qpns_bitmap unconditionally is also OK, since it remains NULL when steerable UD QPs are not allocated). Fixes: 4196670be786 ("IB/mlx4: Don't allocate range of steerable UD QPs for Ethernet-only device") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-07net/mlx4_core: Fix raw qp flow steering rules under SRIOVJack Morgenstein1-2/+12
[ Upstream commit 10b1c04e92229ebeb38ccd0dcf2b6d3ec73c0575 ] Demoting simple flow steering rule priority (for DPDK) was achieved by wrapping FW commands MLX4_QP_FLOW_STEERING_ATTACH/DETACH for the PF as well, and forcing the priority to MLX4_DOMAIN_NIC in the wrapper function for the PF and all VFs. In function mlx4_ib_create_flow(), this change caused the main rule creation for the PF to be wrapped, while it left the associated tunnel steering rule creation unwrapped for the PF. This mismatch caused rule deletion failures in mlx4_ib_destroy_flow() for the PF when the detach wrapper function did not find the associated tunnel-steering rule (since creation of that rule for the PF did not go through the wrapper function). Fix this by setting MLX4_QP_FLOW_STEERING_ATTACH/DETACH to be "native" (so that the PF invocation does not go through the wrapper), and perform the required priority demotion for the PF in the mlx4_ib_create_flow() code path. Fixes: 48564135cba8 ("net/mlx4_core: Demote simple multicast and broadcast flow steering rules") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20IB/mlx4: Fix ib device initialization error flowJack Morgenstein1-0/+1
commit 99e68909d5aba1861897fe7afc3306c3c81b6de0 upstream. In mlx4_ib_add, procedure mlx4_ib_alloc_eqs is called to allocate EQs. However, in the mlx4_ib_add error flow, procedure mlx4_ib_free_eqs is not called to free the allocated EQs. Fixes: e605b743f33d ("IB/mlx4: Increase the number of vectors (EQs) available for ULPs") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-26IB/mlx4: When no DMFS for IPoIB, don't allow NET_IF QPsEran Ben Elisha1-8/+13
commit 1f22e454df2eb99ba6b7ace3f594f6805cdf5cbc upstream. According to the firmware spec, FLOW_STEERING_IB_UC_QP_RANGE command is supported only if dmfs_ipoib bit is set. If it isn't set we want to ensure allocating NET_IF QPs fail. We do so by filling out the allocation bitmap. By thus, the NET_IF QPs allocating function won't find any free QP and will fail. Fixes: c1c98501121e ('IB/mlx4: Add support for steerable IB UD QPs') Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-26IB/mlx4: Fix port query for 56Gb Ethernet linksSaeed Mahameed1-3/+5
commit 6fa26208206c406fa529cd73f7ae6bf4181e270b upstream. Report the correct speed in the port attributes when using a 56Gbps ethernet link. Without this change the field is incorrectly set to 10. Fixes: a9c766bb75ee ('IB/mlx4: Fix info returned when querying IBoE ports') Fixes: 2e96691c31ec ('IB: Use central enum for speed instead of hard-coded values') Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-10Merge tag 'for-linus' of ↵Linus Torvalds1-8/+139
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull main rdma updates from Doug Ledford: "This is the main pull request for the rdma stack this release. The code has been through 0day and I had it tagged for linux-next testing for a couple days. Summary: - updates to mlx5 - updates to mlx4 (two conflicts, both minor and easily resolved) - updates to iw_cxgb4 (one conflict, not so obvious to resolve, proper resolution is to keep the code in cxgb4_main.c as it is in Linus' tree as attach_uld was refactored and moved into cxgb4_uld.c) - improvements to uAPI (moved vendor specific API elements to uAPI area) - add hns-roce driver and hns and hns-roce ACPI reset support - conversion of all rdma code away from deprecated create_singlethread_workqueue - security improvement: remove unsafe ib_get_dma_mr (breaks lustre in staging)" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (75 commits) staging/lustre: Disable InfiniBand support iw_cxgb4: add fast-path for small REG_MR operations cxgb4: advertise support for FR_NSMR_TPTE_WR IB/core: correctly handle rdma_rw_init_mrs() failure IB/srp: Fix infinite loop when FMR sg[0].offset != 0 IB/srp: Remove an unused argument IB/core: Improve ib_map_mr_sg() documentation IB/mlx4: Fix possible vl/sl field mismatch in LRH header in QP1 packets IB/mthca: Move user vendor structures IB/nes: Move user vendor structures IB/ocrdma: Move user vendor structures IB/mlx4: Move user vendor structures IB/cxgb4: Move user vendor structures IB/cxgb3: Move user vendor structures IB/mlx5: Move and decouple user vendor structures IB/{core,hw}: Add constant for node_desc ipoib: Make ipoib_warn ratelimited IB/mlx4/alias_GUID: Remove deprecated create_singlethread_workqueue IB/ipoib_verbs: Remove deprecated create_singlethread_workqueue IB/ipoib: Remove deprecated create_singlethread_workqueue ...
2016-10-07IB/mlx4: Fix possible vl/sl field mismatch in LRH header in QP1 packetsJack Morgenstein1-2/+108
In MLX qp packets, the LRH (built by the driver) has both a VL field and an SL field. When building a QP1 packet, the VL field should reflect the SLtoVL mapping and not arbitrarily contain zero (as is done now). This bug causes credit problems in IB switches at high rates of QP1 packets. The fix is to cache the SL to VL mapping in the driver, and look up the VL mapped to the SL provided in the send request when sending QP1 packets. For FW versions which support generating a port_management_config_change event with subtype sl-to-vl-table-change, the driver uses that event to update its sl-to-vl mapping cache. Otherwise, the driver snoops incoming SMP mads to update the cache. There remains the case where the FW is running in secure-host mode (so no QP0 packets are delivered to the driver), and the FW does not generate the sl2vl mapping change event. To support this case, the driver updates (via querying the FW) its sl2vl mapping cache when running in secure-host mode when it receives either a Port Up event or a client-reregister event (where the port is still up, but there may have been an opensm failover). OpenSM modifies the sl2vl mapping before Port Up and Client-reregister events occur, so if there is a mapping change the driver's cache will be properly updated. Fixes: 225c7b1feef1 ("IB/mlx4: Add a driver Mellanox ConnectX InfiniBand adapters") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-07IB/mlx4: Move user vendor structuresLeon Romanovsky1-1/+1
This patch moves mlx4 vendor's specific structures to common UAPI folder which will be visible to all consumers. These structures are used by user-space library driver (libmlx4) and currently manually copied to that library. This move will allow cross-compile against these files and simplify introduction of vendor specific data. Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-07IB/{core,hw}: Add constant for node_descYuval Shaia1-3/+3
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-07IB/mlx4: Remove deprecated create_singlethread_workqueueBhaktipriya Shridhar1-1/+1
alloc_ordered_workqueue() with WQ_MEM_RECLAIM set, replaces deprecated create_singlethread_workqueue(). This is the identity conversion. The workqueue "wq" queues work items &dm[i]->work, &ew->work. It has been identity converted. WQ_MEM_RECLAIM has been set to ensure forward progress under memory pressure. Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-07IB/mlx4: Add validation to flow specifications parsingMaor Gottlieb1-0/+25
Add validation check that all set fields in flow specification are supported by vendor. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-23IB/core: add support to create a unsafe global rkey to ib_create_pdChristoph Hellwig1-1/+1
Instead of exposing ib_get_dma_mr to ULPs and letting them use it more or less unchecked, this moves the capability of creating a global rkey into the RDMA core, where it can be easily audited. It also prints a warning everytime this feature is used as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-16IB/mlx4: Diagnostic HW counters are not supported in slave modeKamal Heib1-0/+3
Modify the mlx4_ib_diag_counters() to avoid the following error in the hypervisor when the slave tries to query the hardware counters in SR-IOV mode. mlx4_core 0000:81:00.0: Unknown command:0x30 accepted from slave:1 Fixes: 3f85f2aaabf7 ("IB/mlx4: Add diagnostic hardware counters") Signed-off-by: Kamal Heib <kamalh@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-05Merge tag 'for-linus' of ↵Linus Torvalds1-13/+209
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull base rdma updates from Doug Ledford: "Round one of 4.8 code: while this is mostly normal, there is a new driver in here (the driver was hosted outside the kernel for several years and is actually a fairly mature and well coded driver). It amounts to 13,000 of the 16,000 lines of added code in here. Summary: - Updates/fixes for iw_cxgb4 driver - Updates/fixes for mlx5 driver - Add flow steering and RSS API - Add hardware stats to mlx4 and mlx5 drivers - Add firmware version API for RDMA driver use - Add the rxe driver (this is a software RoCE driver that makes any Ethernet device a RoCE device) - Fixes for i40iw driver - Support for send only multicast joins in the cma layer - Other minor fixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (72 commits) Soft RoCE driver IB/core: Support for CMA multicast join flags IB/sa: Add cached attribute containing SM information to SA port IB/uverbs: Fix race between uverbs_close and remove_one IB/mthca: Clean up error unwind flow in mthca_reset() IB/mthca: NULL arg to pci_dev_put is OK IB/hfi1: NULL arg to sc_return_credits is OK IB/mlx4: Add diagnostic hardware counters net/mlx4: Query performance and diagnostics counters net/mlx4: Add diagnostic counters capability bit Use smaller 512 byte messages for portmapper messages IB/ipoib: Report SG feature regardless of HW UD CSUM capability IB/mlx4: Don't use GFP_ATOMIC for CQ resize struct IB/hfi1: Disable by default IB/rdmavt: Disable by default IB/mlx5: Fix port counter ID association to QP offset IB/mlx5: Fix iteration overrun in GSI qps i40iw: Add NULL check for puda buffer i40iw: Change dup_ack_thresh to u8 i40iw: Remove unnecessary check for moving CQ head ...
2016-08-04Merge branches 'misc' and 'rxe' into k.o/for-4.8-1Doug Ledford1-1/+197
2016-08-04IB/mlx4: Add diagnostic hardware countersMark Bloch1-1/+197
Expose IB diagnostic hardware counters. The counters count IB events and are applicable for IB and RoCE. The counters can be divided into two groups, per device and per port. Device counters are always exposed. Port counters are exposed only if the firmware supports per port counters. rq_num_dup and sq_num_to are only exposed if we have firmware support for them, if we do, we expose them per device and per port. rq_num_udsdprd and num_cqovf are device only counters. rq - denotes responder. sq - denotes requester. |-----------------------|---------------------------------------| | Name | Description | |-----------------------|---------------------------------------| |rq_num_lle | Number of local length errors | |-----------------------|---------------------------------------| |sq_num_lle | number of local length errors | |-----------------------|---------------------------------------| |rq_num_lqpoe | Number of local QP operation errors | |-----------------------|---------------------------------------| |sq_num_lqpoe | Number of local QP operation errors | |-----------------------|---------------------------------------| |rq_num_lpe | Number of local protection errors | |-----------------------|---------------------------------------| |sq_num_lpe | Number of local protection errors | |-----------------------|---------------------------------------| |rq_num_wrfe | Number of CQEs with error | |-----------------------|---------------------------------------| |sq_num_wrfe | Number of CQEs with error | |-----------------------|---------------------------------------| |sq_num_mwbe | Number of Memory Window bind errors | |-----------------------|---------------------------------------| |sq_num_bre | Number of bad response errors | |-----------------------|---------------------------------------| |sq_num_rire | Number of Remote Invalid request | | | errors | |-----------------------|---------------------------------------| |rq_num_rire | Number of Remote Invalid request | | | errors | |-----------------------|---------------------------------------| |sq_num_rae | Number of remote access errors | |-----------------------|---------------------------------------| |rq_num_rae | Number of remote access errors | |-----------------------|---------------------------------------| |sq_num_roe | Number of remote operation errors | |-----------------------|---------------------------------------| |sq_num_tree | Number of transport retries exceeded | | | errors | |-----------------------|---------------------------------------| |sq_num_rree | Number of RNR NAK retries exceeded | | | errors | |-----------------------|---------------------------------------| |rq_num_rnr | Number of RNR NAKs sent | |-----------------------|---------------------------------------| |sq_num_rnr | Number of RNR NAKs received | |-----------------------|---------------------------------------| |rq_num_oos | Number of Out of Sequence requests | | | received | |-----------------------|---------------------------------------| |sq_num_oos | Number of Out of Sequence NAKs | | | received | |-----------------------|---------------------------------------| |rq_num_udsdprd | Number of UD packets silently | | | discarded on the Receive Queue due to | | | lack of receive descriptor | |-----------------------|---------------------------------------| |rq_num_dup | Number of duplicate requests received | |-----------------------|---------------------------------------| |sq_num_to | Number of time out received | |-----------------------|---------------------------------------| |num_cqovf | Number of CQ overflows | |-----------------------|---------------------------------------| Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23IB/mlx4: Support device FW version stringIra Weiny1-12/+12
And remove the sysfs in favor of common core version. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23IB/mlx4: Verify port number in flow steering create flowYishai Hadas1-0/+3
In procedure mlx4_ib_create_flow, passing an invalid port number will cause an out-of-bounds array access. Data passed to this procedure can come from user-space. Therefore, need to validate port number before proceeding onwards. Note that we check against the number of physical ports declared at the verbs (ib core) level; When bonding is active, the verbs level sees one physical port, even though the low-level driver sees two ports. Fixes: f77c0162a339 ("IB/mlx4: Add receive flow steering support") Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Reviewed-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-07IB/mlx4: Fix device managed flow steering support testBart Van Assche1-2/+2
Perform the test for device managed flow steering support even if memory windows are not supported. I noticed this because smatch reported inconsistent indentation for the device managed flow steering support test. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Cc: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-04-28IB/mlx4: printk fixColin Ian King1-1/+1
fix spelling mistake, argumant -> argument Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2016-03-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds1-0/+7
Pull networking updates from David Miller: "Highlights: 1) Support more Realtek wireless chips, from Jes Sorenson. 2) New BPF types for per-cpu hash and arrap maps, from Alexei Starovoitov. 3) Make several TCP sysctls per-namespace, from Nikolay Borisov. 4) Allow the use of SO_REUSEPORT in order to do per-thread processing of incoming TCP/UDP connections. The muxing can be done using a BPF program which hashes the incoming packet. From Craig Gallek. 5) Add a multiplexer for TCP streams, to provide a messaged based interface. BPF programs can be used to determine the message boundaries. From Tom Herbert. 6) Add 802.1AE MACSEC support, from Sabrina Dubroca. 7) Avoid factorial complexity when taking down an inetdev interface with lots of configured addresses. We were doing things like traversing the entire address less for each address removed, and flushing the entire netfilter conntrack table for every address as well. 8) Add and use SKB bulk free infrastructure, from Jesper Brouer. 9) Allow offloading u32 classifiers to hardware, and implement for ixgbe, from John Fastabend. 10) Allow configuring IRQ coalescing parameters on a per-queue basis, from Kan Liang. 11) Extend ethtool so that larger link mode masks can be supported. From David Decotigny. 12) Introduce devlink, which can be used to configure port link types (ethernet vs Infiniband, etc.), port splitting, and switch device level attributes as a whole. From Jiri Pirko. 13) Hardware offload support for flower classifiers, from Amir Vadai. 14) Add "Local Checksum Offload". Basically, for a tunneled packet the checksum of the outer header is 'constant' (because with the checksum field filled into the inner protocol header, the payload of the outer frame checksums to 'zero'), and we can take advantage of that in various ways. From Edward Cree" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1548 commits) bonding: fix bond_get_stats() net: bcmgenet: fix dma api length mismatch net/mlx4_core: Fix backward compatibility on VFs phy: mdio-thunder: Fix some Kconfig typos lan78xx: add ndo_get_stats64 lan78xx: handle statistics counter rollover RDS: TCP: Remove unused constant RDS: TCP: Add sysctl tunables for sndbuf/rcvbuf on rds-tcp socket net: smc911x: convert pxa dma to dmaengine team: remove duplicate set of flag IFF_MULTICAST bonding: remove duplicate set of flag IFF_MULTICAST net: fix a comment typo ethernet: micrel: fix some error codes ip_tunnels, bpf: define IP_TUNNEL_OPTS_MAX and use it bpf, dst: add and use dst_tclassid helper bpf: make skb->tc_classid also readable net: mvneta: bm: clarify dependencies cls_bpf: reset class and reuse major in da ldmvsw: Checkpatch sunvnet.c and sunvnet_common.c ldmvsw: Add ldmvsw.c driver code ...
2016-03-02mlx4: Implement devlink interfaceJiri Pirko1-0/+7
Implement newly introduced devlink interface. Add devlink port instances for every port and set the port types accordingly. Signed-off-by: Jiri Pirko <jiri@mellanox.com> v2->v3: -add dev param to devlink_register (api change) Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01IB/mlx4: Add support for the don't trap ruleMarina Varshaver1-4/+67
Add support for receiving multicast/unicast traffic with the don't trap rule. Sniffing these packets requires a flow steering rule of type NORMAL at priority 0 with flag IB_FLOW_ATTR_FLAGS_DONT_TRAP set. Choosing between multicast or unicast is done via ethernet L2 dest_mac mask and value: - If mask is all zeros - unicast and multicast are set. - If mask non zero - only mask with multicast bit 1 and rest 0 is supported, the mac value will choose if it is multicast or unicast rule. If the mask multicast bit is on and some other bits are on too, it means a request for specific multicast or unicast, this is not supported, either receive all multicast or all unicast. Only when limitations are met registered QP will receive requested type but other QPs can receive same traffic if registered for it. Otherwise, if limitations are not met, an error will be returned. Limitations: - Rule must be with priority 0. - A0 mode is not supported. - Sniffer QP cannot appear in any other flow steering rule. Signed-off-by: Marina Varshaver <marinav@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-01IB/core: Add don't trap flag to flow creationMarina Varshaver1-0/+3
Don't trap flag (i.e. IB_FLOW_ATTR_FLAGS_DONT_TRAP) indicates that QP will receive traffic, but will not steal it. When a packet matches a flow steering rule that was created with the don't trap flag, the QPs assigned to this rule will get this packet, but matching will continue to other equal/lower priority rules. This will let other QPs assigned to those rules to get the packet too. If both don't trap rule and other rules have the same priority and match the same packet, the behavior is undefined. The don't trap flag can't be set with default rule types (i.e. IB_FLOW_ATTR_ALL_DEFAULT, IB_FLOW_ATTR_MC_DEFAULT) as default rules don't have rules after them and don't trap has no meaning here. Signed-off-by: Marina Varshaver <marinav@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-19IB/mlx4: Advertise RoCE v2 supportMatan Barak1-3/+9
Advertise RoCE v2 support in port_immutable attributes according to the hardware's capabilities. This enables the verbs stack to use RoCE v2 mode. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-19IB/mlx4: Enable RoCE v2 when the IB device is addedMoni Shoua1-1/+8
If the hardware supports RoCE v2, we configure the hardware UDP port according to the RoCE v2 Annex when mlx4_ib device is added. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-19IB/mlx4: Add support for setting RoCEv2 gids in hardwareMoni Shoua1-3/+60
To tell hardware about a gid with type RoCEv2, software needs a new modifier to the SET_PORT command: MLX4_SET_PORT_ROCE_ADDR. This can replace the old method, MLX4_SET_PORT_GID_TABLE, for RoCEv1 gids. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-19IB/mlx4: Add gid_type to GID propertiesMoni Shoua1-4/+13
IB core driver adds a property of type to struct ib_gid_attr. The mlx4 driver should take that in consideration when modifying or querying the hardware gid table. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23IB: remove in-kernel support for memory windowsChristoph Hellwig1-1/+0
Remove the unused ib_allow_mw and ib_bind_mw functions, remove the unused IB_WR_BIND_MW and IB_WC_BIND_MW opcodes and move ib_dealloc_mw into the uverbs module. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> [core] Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-08mlx4: Expose correct max_sge_rd limitSagi Grimberg1-1/+1
mlx4 devices (ConnectX-2, ConnectX-3) has a limitation where rdma read work queue entries cannot exceed 512 bytes. A rdma_read wqe needs to fit in 512 bytes: - wqe control segment (16 bytes) - rdma segment (16 bytes) - scatter elements (16 bytes each) So max_sge_rd should be: (512 - 16 - 16) / 16 = 30. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-29IB/mlx4: Remove old FRWR API supportSagi Grimberg1-2/+0
No ULP uses it anymore, go ahead and remove it. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-29IB/mlx4: Support the new memory registration APISagi Grimberg1-0/+1
Support the new memory registration API by allocating a private page list array in mlx4_ib_mr and populate it when mlx4_ib_map_mr_sg is invoked. Also, support IB_WR_REG_MR by setting the exact WQE as IB_WR_FAST_REG_MR, just take the needed information from different places: - page_size, iova, length, access flags (ib_mr) - page array (mlx4_ib_mr) - key (ib_reg_wr) The IB_WR_FAST_REG_MR handlers will be removed later when all the ULPs will be converted. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Tested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-22IB/core: Add netdev and gid attributes paramteres to cacheMatan Barak1-2/+2
Adding an ability to query the IB cache by a netdev and get the attributes of a GID. These parameters are necessary in order to successfully resolve the required GID (when the netdevice is known) and get the Ethernet L2 attributes from a GID. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-By: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-22IB/mlx4: Add support for blocking multicast loopback QP creation user flagEran Ben Elisha1-1/+2
MLX4_IB_QP_BLOCK_MULTICAST_LOOPBACK is now supported downstream. In addition, this flag was supported only for IB_QPT_UD, now, with the new implementation it is supported for all QP types. Support IB_USER_VERBS_EX_CMD_CREATE_QP in order to get the flag from user space using the extension create qp command. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-22IB/mlx4: Add IB counters tableEran Ben Elisha1-13/+50
This is an infrastructure step for allocating and attaching more than one counter to QPs on the same port. Allocate a counters table and manage the insertion and removals of the counters in load and unload of mlx4 IB. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-09-29IB/mlx4: Report checksum offload cap for RAW QP when query deviceBodong Wang1-0/+2
Signed-off-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-31IB/mlx4_ib: Disassociate supportYishai Hadas1-2/+137
Implements the IB core disassociate_ucontext API. The driver detaches the HW resources for a given user context to prevent a dependency between application termination and device disconnecting. This is done by managing the VMAs that were mapped to the HW bars such as door bell and blueflame. When need to detach remap them to an arbitrary kernel page returned by the zap API. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-31IB/mlx4: Replace mechanism for RoCE GID managementMoni Shoua1-486/+27
Manage RoCE gid table with logic in IB/core, which is common to all vendors, and remove the mechanism from the mlx4 IB driver. Since management of the GID cache may lead to index mismatch with the hardware GID table, a translation between indexes is required when modifying a QP or creating an address handle. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-31IB/mlx4: Implement ib_device callbacksMoni Shoua1-2/+234
get_netdev: get the net_device on the physical port of the IB transport port. In port aggregation mode it is required to return the netdev of the active port. modify_gid: note for a change in the RoCE gid cache. Handle this by writing to the harsware GID table. It is possible that indexes in cahce and hardware tables won't match so a translation is required when modifying a QP or creating an address handle. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-31mlx4: Support ib_alloc_mr verbSagi Grimberg1-1/+1
Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-29mlx4, mlx5, mthca: Expose max_sge_rd correctlySagi Grimberg1-0/+1
Applications must not assume that max_sge and max_sge_rd are the same, Hence expose max_sge_rd correctly as well. Reported-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-07-14IB/mlx4: Optimize do_slave_initDoug Ledford1-7/+9
There is little chance our memory allocation will fail, so we can combine initializing the work structs with allocating them instead of looping through all of them once to allocate and again to initialize. Then when we need to actually find out if our device is up or in the process of going down, have all of our work structs batched up, take the spin_lock once and only once, and do all of the batch under the one spin_lock invocation instead of incurring all of the locked memory cycles we would otherwise incur to take/release the spin_lock over and over again. Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-07-14IB/mlx4: Fix memory leak in do_slave_initDoug Ledford1-0/+2
We create a number of work structs to be queued up to a workqueue, and on completion of the workqueue handler, the workqueue handler frees the allocated memory. If, however, we don't queue the work struct because the device is going down, then we need to free the memory ourselves. Signed-off-by: Doug Ledford <dledford@redhat.com>