summaryrefslogtreecommitdiff
path: root/drivers/infiniband/hw/mlx4
AgeCommit message (Collapse)AuthorFilesLines
2015-05-12infiniband: Remove duplicated KERN_<LEVEL> from pr_<level> usesJoe Perches1-2/+1
These KERN_<LEVEL> uses are unnecessary with pr_<level> and cause bad logging output so remove them. Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15Merge branches 'cve-fixup', 'ipoib', 'iser', 'misc-4.1', 'or-mlx4' and 'srp' ↵Doug Ledford6-166/+391
into for-4.1
2015-04-15infiniband/mlx4: check for mapping errorSebastian Ott2-0/+13
Since ib_dma_map_single can fail use ib_dma_mapping_error to check for errors. Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Acked-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15IB/mlx4: Fix WQE LSO segment calculationErez Shitrit1-2/+1
The current code decreases from the mss size (which is the gso_size from the kernel skb) the size of the packet headers. It shouldn't do that because the mss that comes from the stack (e.g IPoIB) includes only the tcp payload without the headers. The result is indication to the HW that each packet that the HW sends is smaller than what it could be, and too many packets will be sent for big messages. An easy way to demonstrate one more aspect of the problem is by configuring the ipoib mtu to be less than 2*hlen (2*56) and then run app sending big TCP messages. This will tell the HW to send packets with giant (negative value which under unsigned arithmetics becomes a huge positive one) length and the QP moves to SQE state. Fixes: b832be1e4007 ('IB/mlx4: Add IPoIB LSO support') Reported-by: Matthew Finlay <matt@mellanox.com> Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15IB/mlx4: Change alias guids default to be host assignedYishai Hadas1-2/+2
Change the default mode to be HOST assigned instead of SM assigned. This is the expected operational mode, because it doesn't depend on SM availability. As PF generates random GUIDs as the initial admin values, this gives out of the box experience. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15IB/mlx4: Request alias GUID on demandYishai Hadas3-0/+75
Request GIDs from the SM on demand, i.e., when a VF actually needs them, and release them when the GIDs are no longer in use. In cloud environments, this is useful for GID migrations, in which a GID is assigned to a VF on the destination HCA, while the VF on the source HCA is shutdown (but the GID was not administratively released). Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15IB/mlx4: Change init flow to request alias GUIDs for active VFsYishai Hadas3-79/+50
Change the init flow to ask GUIDs only for active VFs. This is done for both SM & HOST modes so that there is no need any more to maintain the ownership record type. In case SM mode is used, the initial value will be 0, ask the SM to assign, for the HOST mode the initial value will be the HOST generated GUID. This will enable out of the box experience for both probed and attached VFs. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15IB/mlx4: Manage admin alias GUID upon admin requestYishai Hadas2-9/+15
Set the admin alias GUID per the administrator's request via the sysfs mechanism into the core layer. The "get" request returns the current value. However, if the administrator requests the SM to assign a new value by requesting 0, the SM assigned GUID is returned. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15IB/mlx4: Alias GUID adding persistency supportYishai Hadas3-84/+245
If the SM rejects an alias GUID request the PF driver keeps trying to acquire the specified GUID indefinitely, utilizing an exponential backoff scheme. Retrying is managed per GUID entry. Each entry that wasn't applied holds its next retry information. Retry requests to the SM consist of records of 8 consecutive GUIDS. Each record that contains GUIDs requiring retries holds its next time-to-run based on the retry information of all its GUID entries. The record having the lowest retry time will run first when that retry time arrives. Since the method (SET or DELETE) as sent to the SM applies to all the GUIDs in the record, we must handle SET requests and DELETE requests in separate SM messages (one for SETs and the other for DELETEs). To avoid race conditions where a GUID entry request (set or delete) was modified after the SM request was sent, we save the method and the requested indices as part of the callback's context -- thus, only the requested indexes are evaluated when the response is received. When an GUID entry is approved we turn off its retry-required bit, this prevents redundant SM retries from occurring on that record. The port down event should be sent only when previously it was up. Likewise, the port up event should be sent only if previously the port was down. Synchronization was added around the flows that change entries and record state to prevent race conditions. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-02net/mlx4: Add SET_PORT opcode modifiers enumerationIdo Shamay1-5/+6
The calls to SET_PORT used hard-code numbers, when supplying command's opcode modifiers, fix that to use well defined constants. Signed-off-by: Ido Shamay <idos@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-18IB/mlx4: Saturate RoCE port PMA counters in case of overflowMajd Dibbiny1-4/+16
For RoCE ports, we set the u32 PMA values based on u64 HCA counters. In case of overflow, according to the IB spec, we have to saturate a counter to its max value, do that. Fixes: c37791349cc7 ('IB/mlx4: Support PMA counters for IBoE') Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-18IB/mlx4: Verify net device validity on port change eventMoni Shoua1-1/+5
Processing an event is done in a different context from the one when the event was dispatched. This requires a check that the slave net device is still valid when the event is being processed. The check is done under the iboe lock which ensure correctness. Fixes: a57500903093 ('IB/mlx4: Add port aggregation support') Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-21Merge tag 'rdma-for-linus' of ↵Linus Torvalds4-13/+13
git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband Pull InfiniBand/RDMA updates from Roland Dreier: - Re-enable on-demand paging changes with stable ABI - Fairly large set of ocrdma HW driver fixes - Some qib HW driver fixes - Other miscellaneous changes * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (43 commits) IB/qib: Add blank line after declaration IB/qib: Fix checkpatch warnings IB/mlx5: Enable the ODP capability query verb IB/core: Add on demand paging caps to ib_uverbs_ex_query_device IB/core: Add support for extended query device caps RDMA/cxgb4: Don't hang threads forever waiting on WR replies RDMA/ocrdma: Fix off by one in ocrdma_query_gid() RDMA/ocrdma: Use unsigned for bit index RDMA/ocrdma: Help gcc generate better code for ocrdma_srq_toggle_bit RDMA/ocrdma: Update the ocrdma module version string RDMA/ocrdma: set vlan present bit for user AH RDMA/ocrdma: remove reference of ocrdma_dev out of ocrdma_qp structure RDMA/ocrdma: Add support for interrupt moderation RDMA/ocrdma: Honor return value of ocrdma_resolve_dmac RDMA/ocrdma: Allow expansion of the SQ CQEs via buddy CQ expansion of the QP RDMA/ocrdma: Discontinue support of RDMA-READ-WITH-INVALIDATE RDMA/ocrdma: Host crash on destroying device resources RDMA/ocrdma: Report correct state in ibv_query_qp RDMA/ocrdma: Debugfs enhancments for ocrdma driver RDMA/ocrdma: Report correct count of interrupt vectors while registering ocrdma device ...
2015-02-18IB/mlx4: In mlx4_ib_demux_cm, print out GUID in host-endian orderJack Morgenstein1-1/+1
If a GUID is not found, the 64-bit GUID printed in the message log warning should converted to host-endian order for printing. Found by Doug Ledford and Hal Rosenstock. Fix suggested by Hal. Signed-off-by: Hal Rosenstock <hal@dev.mellanox.co.il> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2015-02-18IB/mlx4: Bug fixes in mlx4_ib_resize_cqMajd Dibbiny1-4/+3
1. Before the entries alignment, we need to check that the entries doesn't exceed the device's max cqe. 2. After the alignment, we need to make sure that the aligned number doesn't exceed the max cqes+1. The additional cqe is used to denote that the resizing operation has completed. 3. If the users asks to resize the CQ with entries less than the oustanding cqes we should fail instead of returning 0. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2015-02-18IB/mlx4: Fix memory leak in __mlx4_ib_modify_qpMajd Dibbiny1-2/+4
In case handle_eth_ud_smac_index fails, we need to free the allocated resources. Fixes: 2f5bb473681b ("mlx4: Add ref counting to port MAC table for RoCE") Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2015-02-17IB/mlx4: Fix wrong usage of IPv4 protocol for multicast attach/detachOr Gerlitz1-5/+5
The MLX4_PROT_IB_IPV4 protocol should only be used with RoCEv2 and such. Removing this wrong usage allows to run multicast applications over RoCE. Fixes: d487ee77740c ("IB/mlx4: Use IBoE (RoCE) IP based GIDs in the port GID table") Reported-by: Carol Soto <clsoto@linux.vnet.ibm.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2015-02-10IB/mlx4: Reset flow support for IB kernel ULPsYishai Hadas5-6/+191
The driver exposes interfaces that directly relate to HW state. Upon fatal error, consumers of these interfaces (ULPs) that rely on completion of all their posted work-request could hang, thereby introducing dependencies in shutdown order. To prevent this from happening, we manage the relevant resources (CQs, QPs) that are used by the device. Upon a fatal error, we now generate simulated completions for outstanding WQEs that were not completed at the time the HW was reset. It includes invoking the completion event handler for all involved CQs so that the ULPs will poll those CQs. When polled we return simulated CQEs with IB_WC_WR_FLUSH_ERR return code enabling ULPs to clean up their resources and not wait forever for completions upon receiving remove_one. The above change requires an extra check in the data path to make sure that when device is in error state, the simulated CQEs will be returned and no further WQEs will be posted. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-10IB/mlx4: Always use the correct port for mirrored multicast attachmentsMoni Shoua1-1/+5
When attaching a QP to a multicast address in bonded mode, there was an assumption that the port of the QP must be #1. This assumption isn't the case under the flow which enables maximal usage of the physical ports. Fix it by always checking the port of the original flow and create the mirrored flow on the other port. Fixes: c6215745b66a ('IB/mlx4: Load balance ports in port aggregation mode') Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-05IB/mlx4: Load balance ports in port aggregation modeMoni Shoua4-0/+29
When the mlx4 IB (RoCE) device works in link aggregation mode, it exposes a single port to upper layers. Therefore, applications always set '1' in port_num attribute when modifying a QP or creating an address handle. To make sure that a node uses all available ports the mlx4 driver will override the port_num attribute with a round robin policy. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-05IB/mlx4: Create mirror flows in port aggregation modeMoni Shoua2-13/+80
In port aggregation mode flows for port #1 (the only port) should be mirrored on port #2. This is because packets can arrive from either physical ports. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-05IB/mlx4: Add port aggregation supportMoni Shoua1-6/+70
Register the interface with the mlx4 core driver with port aggregation support and check for port aggregation mode when the 'add' function is called. In this mode, only one physical port is reported to the upper layer (RoCE/IB core stack and ULPs). Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-05IB/mlx4: Reuse mlx4_mac_to_u64()Moni Shoua1-11/+1
This function is implemented twice... get rid of one copy. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-1/+2
Conflicts: arch/arm/boot/dts/imx6sx-sdb.dts net/sched/cls_bpf.c Two simple sets of overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-26net/mlx4_core: Maintain a persistent memory for mlx4 deviceYishai Hadas5-14/+20
Maintain a persistent memory that should survive reset flow/PCI error. This comes as a preparation for coming series to support above flows. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-16net/mlx4: Don't disable vxlan offloads under DMFS-A0 optimized steeringOr Gerlitz1-1/+2
Except for VXLAN steering rules, all offloads should work as they were under plain DMFS mode. Fix that by enabling all the offloads under DMFS-A0 mode, except for VXLAN steering rules. Fixes: d57febe1a478 "net/mlx4: Add A0 hybrid steering" Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-16IB/mlx4: Fix an incorrectly shadowed variable in mlx4_ib_rereg_user_mrJack Morgenstein1-1/+0
This error was detected by sparse static checker: drivers/infiniband/hw/mlx4/mr.c:226:21: warning: symbol 'err' shadows an earlier one drivers/infiniband/hw/mlx4/mr.c:197:13: originally declared here Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-11net/mlx4: Add A0 hybrid steeringMatan Barak1-2/+4
A0 hybrid steering is a form of high performance flow steering. By using this mode, mlx4 cards use a fast limited table based steering, in order to enable fast steering of unicast packets to a QP. In order to implement A0 hybrid steering we allocate resources from different zones: (1) General range (2) Special MAC-assigned QPs [RSS, Raw-Ethernet] each has its own region. When we create a rss QP or a raw ethernet (A0 steerable and BF ready) QP, we try hard to allocate the QP from range (2). Otherwise, we try hard not to allocate from this range. However, when the system is pushed to its limits and one needs every resource, the allocator uses every region it can. Meaning, when we run out of raw-eth qps, the allocator allocates from the general range (and the special-A0 area is no longer active). If we run out of RSS qps, the mechanism tries to allocate from the raw-eth QP zone. If that is also exhausted, the allocator will allocate from the general range (and the A0 region is no longer active). Note that if a raw-eth qp is allocated from the general range, it attempts to allocate the range such that bits 6 and 7 (blueflame bits) in the QP number are not set. When the feature is used in SRIOV, the VF has to notify the PF what kind of QP attributes it needs. In order to do that, along with the "Eth QP blueflame" bit, we reserve a new "A0 steerable QP". According to the combination of these bits, the PF tries to allocate a suitable QP. In order to maintain backward compatibility (with older PFs), the PF notifies which QP attributes it supports via QUERY_FUNC_CAP command. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-11net/mlx4: Change QP allocation schemeEugenia Emantayev2-5/+8
When using BF (Blue-Flame), the QPN overrides the VLAN, CV, and SV fields in the WQE. Thus, BF may only be used for QPNs with bits 6,7 unset. The current Ethernet driver code reserves a Tx QP range with 256b alignment. This is wrong because if there are more than 64 Tx QPs in use, QPNs >= base + 65 will have bits 6/7 set. This problem is not specific for the Ethernet driver, any entity that tries to reserve more than 64 BF-enabled QPs should fail. Also, using ranges is not necessary here and is wasteful. The new mechanism introduced here will support reservation for "Eth QPs eligible for BF" for all drivers: bare-metal, multi-PF, and VFs (when hypervisors support WC in VMs). The flow we use is: 1. In mlx4_en, allocate Tx QPs one by one instead of a range allocation, and request "BF enabled QPs" if BF is supported for the function 2. In the ALLOC_RES FW command, change param1 to: a. param1[23:0] - number of QPs b. param1[31-24] - flags controlling QPs reservation Bit 31 refers to Eth blueflame supported QPs. Those QPs must have bits 6 and 7 unset in order to be used in Ethernet. Bits 24-30 of the flags are currently reserved. When a function tries to allocate a QP, it states the required attributes for this QP. Those attributes are considered "best-effort". If an attribute, such as Ethernet BF enabled QP, is a must-have attribute, the function has to check that attribute is supported before trying to do the allocation. In a lower layer of the code, mlx4_qp_reserve_range masks out the bits which are unsupported. If SRIOV is used, the PF validates those attributes and masks out unsupported attributes as well. In order to notify VFs which attributes are supported, the VF uses QUERY_FUNC_CAP command. This command's mailbox is filled by the PF, which notifies which QP allocation attributes it supports. Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-11net/mlx4_core: Use tasklet for user-space CQ completion eventsMatan Barak1-1/+4
Previously, we've fired all our completion callbacks straight from our ISR. Some of those callbacks were lightweight (for example, mlx4_en's and IPoIB napi callbacks), but some of them did more work (for example, the user-space RDMA stack uverbs' completion handler). Besides that, doing more than the minimal work in ISR is generally considered wrong, it could even lead to a hard lockup of the system. Since when a lot of completion events are generated by the hardware, the loop over those events could be so long, that we'll get into a hard lockup by the system watchdog. In order to avoid that, add a new way of invoking completion events callbacks. In the interrupt itself, we add the CQs which receive completion event to a per-EQ list and schedule a tasklet. In the tasklet context we loop over all the CQs in the list and invoke the user callback. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-13net/mlx4_core: Flexible (asymmetric) allocation of EQs and MSI-X vectors for ↵Matan Barak1-2/+1
PF/VFs Previously, the driver queried the firmware in order to get the number of supported EQs. Under SRIOV, since this was done before the driver notified the firmware how many VFs it actually needs, the firmware had to take into account a worst case scenario and always allocated four EQs per VF, where one was used for events while the others were used for completions. Now, when the firmware supports the asymmetric allocation scheme, denoted by exposing num_sys_eqs > 0 (--> MLX4_DEV_CAP_FLAG2_SYS_EQS), we use the QUERY_FUNC command to query the firmware before enabling SRIOV. Thus we can get more EQs and MSI-X vectors per function. Moreover, when running in the new firmware/driver mode, the limitation that the number of EQs should be a power of two is lifted. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-31mlx4: Avoid leaking steering rules on flow creation error flowOr Gerlitz1-2/+8
If mlx4_ib_create_flow() attempts to create > 1 rules with the firmware, and one of these registrations fail, we leaked the already created flow rules. One example of the leak is when the registration of the VXLAN ghost steering rule fails, we didn't unregister the original rule requested by the user, introduced in commit d2fce8a9060d "mlx4: Set user-space raw Ethernet QPs to properly handle VXLAN traffic". While here, add dump of the VXLAN portion of steering rules so it can actually be seen when flow creation fails. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-24Merge tag 'rdma-for-linus' of ↵Linus Torvalds4-78/+159
git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband Pull infiniband/rdma fixes from Roland Dreier: "Last late set of InfiniBand/RDMA fixes for 3.17: - fixes for the new memory region re-registration support - iSER initiator error path fixes - grab bag of small fixes for the qib and ocrdma hardware drivers - larger set of fixes for mlx4, especially in RoCE mode" * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (26 commits) IB/mlx4: Fix VF mac handling in RoCE IB/mlx4: Do not allow APM under RoCE IB/mlx4: Don't update QP1 in native mode IB/mlx4: Avoid accessing netdevice when building RoCE qp1 header mlx4: Fix mlx4 reg/unreg mac to work properly with 0-mac addresses IB/core: When marshaling uverbs path, clear unused fields IB/mlx4: Avoid executing gid task when device is being removed IB/mlx4: Fix lockdep splat for the iboe lock IB/mlx4: Get upper dev addresses as RoCE GIDs when port comes up IB/mlx4: Reorder steps in RoCE GID table initialization IB/mlx4: Don't duplicate the default RoCE GID IB/mlx4: Avoid null pointer dereference in mlx4_ib_scan_netdevs() IB/iser: Bump version to 1.4.1 IB/iser: Allow bind only when connection state is UP IB/iser: Fix RX/TX CQ resource leak on error flow RDMA/ocrdma: Use right macro in query AH RDMA/ocrdma: Resolve L2 address when creating user AH mlx4: Correct error flows in rereg_mr IB/qib: Correct reference counting in debugfs qp_stats IPoIB: Remove unnecessary port query ...
2014-09-22IB/mlx4: Fix VF mac handling in RoCEJack Morgenstein2-11/+20
We had several problems here. First, a race condition on QP1 mac handling between mlx4_ib_update_qps and mlx4_ib_modify_qp, which is fixed by taking the qp mutex in mlx4_ib_update_qps. Also, qp->pri.smac_port was not updated in mlx4_ib_update_qps. Last, in __mlx4_ib_modify_qp we did not properly handle the case where the mac is zero, but port is non-zero. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-09-22IB/mlx4: Do not allow APM under RoCEJack Morgenstein2-1/+21
Automatic Path Migration is not supported under RoCE. Therefore, return a "not-supported" error if the caller attempts to set an alternate path in a QP context. In addition, if there are no IB ports configured, do not report APM capability in the device flags returned by mlx4_ib_query_device. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-09-22IB/mlx4: Don't update QP1 in native modeJack Morgenstein1-0/+4
For native functions (non-SR-IOV), there's no reason to update the smac_index, as QP1 is a GSI QP. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-09-22IB/mlx4: Avoid accessing netdevice when building RoCE qp1 headerJack Morgenstein3-16/+28
The source MAC is needed in RoCE when building the QP1 header. Currently, this is obtained from the source net device. However, the net device may not yet exist, or can be destroyed in parallel to this QP1 send operation (e.g through the VPI port change flow) so accessing it may cause a kernel crash. To fix this, we maintain a source MAC cache per port for the net device in struct mlx4_ib_roce. This cached MAC is initialized to be the default MAC address obtained during HCA initialization via QUERY_PORT. This cached MAC is updated via the netdev event notifier handler. Since the cached MAC is held in an atomic64 object, we do not need locking when accessing it. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-09-22IB/mlx4: Avoid executing gid task when device is being removedMoni Shoua1-0/+9
When device is being removed (e.g during VPI port link type change from ETH to IB), tasks for gid table changes should not be executed. Flush the current queue of tasks and block further tasks from entering the queue. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-09-22IB/mlx4: Fix lockdep splat for the iboe lockJack Morgenstein1-12/+12
Chuck Lever reported the following stack trace: ================================= [ INFO: inconsistent lock state ] 3.16.0-rc2-00024-g2e78883 #17 Tainted: G E --------------------------------- inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. swapper/0/0 [HC0[0]:SC1[1]:HE1:SE0] takes: (&(&iboe->lock)->rlock){+.?...}, at: [<ffffffffa065f68b>] mlx4_ib_addr_event+0xdb/0x1a0 [mlx4_ib] {SOFTIRQ-ON-W} state was registered at: [<ffffffff810b3110>] mark_irqflags+0x110/0x170 [<ffffffff810b4806>] __lock_acquire+0x2c6/0x5b0 [<ffffffff810b4bd9>] lock_acquire+0xe9/0x120 [<ffffffff815f7f6e>] _raw_spin_lock+0x3e/0x80 [<ffffffffa0661084>] mlx4_ib_scan_netdevs+0x34/0x260 [mlx4_ib] [<ffffffffa06612db>] mlx4_ib_netdev_event+0x2b/0x40 [mlx4_ib] [<ffffffff81522219>] register_netdevice_notifier+0x99/0x1e0 [<ffffffffa06626e3>] mlx4_ib_add+0x743/0xbc0 [mlx4_ib] [<ffffffffa05ec168>] mlx4_add_device+0x48/0xa0 [mlx4_core] [<ffffffffa05ec2c3>] mlx4_register_interface+0x73/0xb0 [mlx4_core] [<ffffffffa05c505e>] cm_req_handler+0x13e/0x460 [ib_cm] [<ffffffff810002e2>] do_one_initcall+0x112/0x1c0 [<ffffffff810e8264>] do_init_module+0x34/0x190 [<ffffffff810ea62f>] load_module+0x5cf/0x740 [<ffffffff810ea939>] SyS_init_module+0x99/0xd0 [<ffffffff815f8fd2>] system_call_fastpath+0x16/0x1b irq event stamp: 336142 hardirqs last enabled at (336142): [<ffffffff810612f5>] __local_bh_enable_ip+0xb5/0xc0 hardirqs last disabled at (336141): [<ffffffff81061296>] __local_bh_enable_ip+0x56/0xc0 softirqs last enabled at (336004): [<ffffffff8106123a>] _local_bh_enable+0x4a/0x50 softirqs last disabled at (336005): [<ffffffff810617a4>] irq_exit+0x44/0xd0 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&(&iboe->lock)->rlock); <Interrupt> lock(&(&iboe->lock)->rlock); *** DEADLOCK *** The above problem was caused by the spin lock being taken both in the process context and in a soft-irq context (in a netdev notifier handler). The required fix is to use spin_lock/unlock_bh() instead of spin_lock/unlock on the iboe lock. Reported-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-09-22IB/mlx4: Get upper dev addresses as RoCE GIDs when port comes upMoni Shoua1-8/+17
When a RoCE port becomes active and the netdev of the port has upper device (e.g bond/team), GIDs derived from the upper dev should appear in the port's RoCE GID table. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-09-22IB/mlx4: Reorder steps in RoCE GID table initializationMoni Shoua1-12/+17
There's no need to reset the gid table twice and we need to do it only for Ethernet ports. Also, no need to actively scan ndetdevs since it's being done immediatly after we register netdev notifiers. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-09-22IB/mlx4: Don't duplicate the default RoCE GIDMoni Shoua1-0/+4
When reading the IPv6 addresses from the net-device, make sure to avoid adding a duplicate entry to the GID table because of equality between the default GID we generate and the default IPv6 link-local address of the device. Fixes: acc4fccf4eff ("IB/mlx4: Make sure GID index 0 is always occupied") Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-09-22IB/mlx4: Avoid null pointer dereference in mlx4_ib_scan_netdevs()Moni Shoua1-23/+26
When Ethernet netdev is not present for a port (e.g. when the link layer type of the port is InfiniBand) it's possible to dereference a null pointer when we do netdevice scanning. To fix that, we move a section of code that needs to run only when netdev is present to a proper if () statement. Fixes: ad4885d279b6 ("IB/mlx4: Build the port IBoE GID table properly under bonding") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-09-22mlx4: Correct error flows in rereg_mrMatan Barak1-2/+5
This patch addresses feedback from Sagi Grimberg on the rereg_mr implementation of mlx4. The following are fixed: 1. Set the correct pd_flags 2. Make sure we change the iova and size MR fields only after successful write and allocation of the MTTs. 3. Make the error checking more robust Fixes: e630664c8383 ("mlx4_core: Add helper functions to support MR re-registration") Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-09-19IB/mlx4: Disable TSO for Connect-X rev. A0 HCAsMarkus Stockhausen1-1/+4
According to <http://marc.info/?t=138347640900004&r=1&w=2>, revision A0 of Connect-X does not correctly assemble TSO packets. Disable that feature on that hardware revision. Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-09-11mlx4: Fix wrong endianess access with QP context flagsOr Gerlitz1-1/+1
We wrongly tested QP context bits without BE conversion as was spotted by sparse... drivers/infiniband/hw/mlx4/qp.c:1685:38: sparse: restricted __be32 degrades to integer Fix that! Fixes: d2fce8a ('mlx4: Set user-space raw Ethernet QPs to properly handle VXLAN traffic') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-11net/mlx4: Set vlan stripping policy by the right commandMatan Barak1-1/+1
Changing the vlan stripping policy of the QP isn't supported by older firmware versions for the INIT2RTR command. Nevertheless, we've used it. Fix that by doing this policy change using INIT2RTR only if the firmware supports it, otherwise, we call UPDATE_QP command to do the task. Fixes: 7677fc9 ('net/mlx4: Strengthen VLAN tags/priorities enforcement in VST mode') Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-30mlx4: Set user-space raw Ethernet QPs to properly handle VXLAN trafficOr Gerlitz2-1/+37
Raw Ethernet QPs opened from user-space lack the proper setup to recieve/handle VXLAN traffic when VXLAN offloads are enabled. Fix that by adding a tunnel steering rule on top of the normal unicast steering rule and set the tunnel_type field in the QP context. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-14Merge branches 'core', 'cxgb4', 'ipoib', 'iser', 'iwcm', 'mad', 'misc', ↵Roland Dreier2-5/+3
'mlx4', 'mlx5', 'ocrdma' and 'srp' into for-next
2014-08-13IB/mlx4: Use ARRAY_SIZE instead of sizeof/sizeof[0]Fabian Frederick1-4/+2
Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Roland Dreier <roland@purestorage.com>