summaryrefslogtreecommitdiff
path: root/drivers/infiniband
AgeCommit message (Collapse)AuthorFilesLines
2012-07-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds8-33/+97
Pull networking changes from David S Miller: 1) Remove the ipv4 routing cache. Now lookups go directly into the FIB trie and use prebuilt routes cached there. No more garbage collection, no more rDOS attacks on the routing cache. Instead we now get predictable and consistent performance, no matter what the pattern of traffic we service. This has been almost 2 years in the making. Special thanks to Julian Anastasov, Eric Dumazet, Steffen Klassert, and others who have helped along the way. I'm sure that with a change of this magnitude there will be some kind of fallout, but such things ought the be simple to fix at this point. Luckily I'm not European so I'll be around all of August to fix things :-) The major stages of this work here are each fronted by a forced merge commit whose commit message contains a top-level description of the motivations and implementation issues. 2) Pre-demux of established ipv4 TCP sockets, saves a route demux on input. 3) TCP SYN/ACK performance tweaks from Eric Dumazet. 4) Add namespace support for netfilter L4 conntrack helpers, from Gao Feng. 5) Add config mechanism for Energy Efficient Ethernet to ethtool, from Yuval Mintz. 6) Remove quadratic behavior from /proc/net/unix, from Eric Dumazet. 7) Support for connection tracker helpers in userspace, from Pablo Neira Ayuso. 8) Allow userspace driven TX load balancing functions in TEAM driver, from Jiri Pirko. 9) Kill off NLMSG_PUT and RTA_PUT macros, more gross stuff with embedded gotos. 10) TCP Small Queues, essentially minimize the amount of TCP data queued up in the packet scheduler layer. Whereas the existing BQL (Byte Queue Limits) limits the pkt_sched --> netdevice queuing levels, this controls the TCP --> pkt_sched queueing levels. From Eric Dumazet. 11) Reduce the number of get_page/put_page ops done on SKB fragments, from Alexander Duyck. 12) Implement protection against blind resets in TCP (RFC 5961), from Eric Dumazet. 13) Support the client side of TCP Fast Open, basically the ability to send data in the SYN exchange, from Yuchung Cheng. Basically, the sender queues up data with a sendmsg() call using MSG_FASTOPEN, then they do the connect() which emits the queued up fastopen data. 14) Avoid all the problems we get into in TCP when timers or PMTU events hit a locked socket. The TCP Small Queues changes added a tcp_release_cb() that allows us to queue work up to the release_sock() caller, and that's what we use here too. From Eric Dumazet. 15) Zero copy on TX support for TUN driver, from Michael S. Tsirkin. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1870 commits) genetlink: define lockdep_genl_is_held() when CONFIG_LOCKDEP r8169: revert "add byte queue limit support". ipv4: Change rt->rt_iif encoding. net: Make skb->skb_iif always track skb->dev ipv4: Prepare for change of rt->rt_iif encoding. ipv4: Remove all RTCF_DIRECTSRC handliing. ipv4: Really ignore ICMP address requests/replies. decnet: Don't set RTCF_DIRECTSRC. net/ipv4/ip_vti.c: Fix __rcu warnings detected by sparse. ipv4: Remove redundant assignment rds: set correct msg_namelen openvswitch: potential NULL deref in sample() tcp: dont drop MTU reduction indications bnx2x: Add new 57840 device IDs tcp: avoid oops in tcp_metrics and reset tcpm_stamp niu: Change niu_rbr_fill() to use unlikely() to check niu_rbr_add_page() return value niu: Fix to check for dma mapping errors. net: Fix references to out-of-scope variables in put_cmsg_compat() net: ethernet: davinci_emac: add pm_runtime support net: ethernet: davinci_emac: Remove unnecessary #include ...
2012-07-23Merge branch 'for-next' of ↵Linus Torvalds1-5/+10
git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending Pull target updates from Nicholas Bellinger: "There have been lots of work in a number of areas this past round. The highlights include: - Break out target_core_cdb.c emulation into SPC/SBC ops (hch) - Add a parse_cdb method to target backend drivers (hch) - Move sync_cache + write_same + unmap into spc_ops (hch) - Use target_execute_cmd for WRITEs in iscsi_target + srpt (hch) - Offload WRITE I/O backend submission in tcm_qla2xxx + tcm_fc (hch + nab) - Refactor core_update_device_list_for_node() into enable/disable funcs (agrover) - Replace the TCM processing thread with a TMR work queue (hch) - Fix regression in transport_add_device_to_core_hba from TMR conversion (DanC) - Remove racy, now-redundant check of sess_tearing_down with qla2xxx (roland) - Add range checking, fix reading of data len + possible underflow in UNMAP (roland) - Allow for target_submit_cmd() returning errors + convert fabrics (roland + nab) - Drop bogus struct file usage for iSCSI/SCTP (viro)" * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (54 commits) iscsi-target: Drop bogus struct file usage for iSCSI/SCTP target: NULL dereference on error path target: Allow for target_submit_cmd() returning errors target: Check number of unmap descriptors against our limit target: Fix possible integer underflow in UNMAP emulation target: Fix reading of data length fields for UNMAP commands target: Add range checking to UNMAP emulation target: Add generation of LOGICAL BLOCK ADDRESS OUT OF RANGE target: Make unnecessarily global se_dev_align_max_sectors() static target: Remove se_session.sess_wait_list qla2xxx: Remove racy, now-redundant check of sess_tearing_down target: Check sess_tearing_down in target_get_sess_cmd() sbp-target: Consolidate duplicated error path code in sbp_handle_command() target: Un-export target_get_sess_cmd() qla2xxx: Get rid of redundant qla_tgt_sess.tearing_down target: Make core_disable_device_list_for_node use pre-refactoring lock ordering target: refactor core_update_device_list_for_node() target: Eliminate else using boolean logic target: Misc retval cleanups target: Remove hba param from core_dev_add_lun ...
2012-07-19{NET,IB}/mlx4: Add rmap support to mlx4_assign_eqAmir Vadai1-1/+2
Enable callers of mlx4_assign_eq to supply a pointer to cpu_rmap. If supplied, the assigned IRQ is tracked using rmap infrastructure. Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-17net: Pass optional SKB and SK arguments to dst_ops->{update_pmtu,redirect}()David S. Miller1-1/+1
This will be used so that we can compose a full flow key. Even though we have a route in this context, we need more. In the future the routes will be without destination address, source address, etc. keying. One ipv4 route will cover entire subnets, etc. In this environment we have to have a way to possess persistent storage for redirects and PMTU information. This persistent storage will exist in the FIB tables, and that's why we'll need to be able to rebuild a full lookup flow key here. Using that flow key will do a fib_lookup() and create/update the persistent entry. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-17srpt: use target_execute_cmd for WRITEs in srpt_handle_rdma_compChristoph Hellwig1-5/+10
srpt_handle_rdma_comp is called from kthread context and thus can execute target_execute_cmd directly. srpt_abort_cmd sets the CMD_T_LUN_STOP flag directly, and thus the abuse of transport_generic_handle_data can be replaced with an opencoded variant of that code path. I'm still not happy about a fabric driver poking into target core internals like this, but let's defer the bigger architecture changes for now. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2012-07-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-4/+8
Conflicts: net/batman-adv/bridge_loop_avoidance.c net/batman-adv/bridge_loop_avoidance.h net/batman-adv/soft-interface.c net/mac80211/mlme.c With merge help from Antonio Quartulli (batman-adv) and Stephen Rothwell (drivers/net/usb/qmi_wwan.c). The net/mac80211/mlme.c conflict seemed easy enough, accounting for a conversion to some new tracing macros. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-11IPoIB: fix skb truesize underestimatiomEric Dumazet1-4/+8
Or Gerlitz reported triggering of WARN_ON_ONCE(delta < len); in skb_try_coalesce() This warning tracks drivers that incorrectly set skb->truesize IPoIB indeed allocates a full page to store a fragment, but only accounts in skb->truesize the used part of the page (frame length) This patch fixes skb truesize underestimation, and also fixes a performance issue, because RX skbs have not enough tailroom to allow IP and TCP stacks to pull their header in skb linear part without an expensive call to pskb_expand_head() Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Or Gerlitz <ogerlitz@mellanox.com> Cc: Erez Shitrit <erezsh@mellanox.com> Cc: Shlomo Pongartz <shlomop@mellanox.com> Cc: Roland Dreier <roland@purestorage.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-08{NET, IB}/mlx4: Add device managed flow steering firmware APIHadar Hen Zion3-7/+57
The driver is modified to support three operation modes. If supported by firmware use the device managed flow steering API, that which we call device managed steering mode. Else, if the firmware supports the B0 steering mode use it, and finally, if none of the above, use the A0 steering mode. When the steering mode is device managed, the code is modified such that L2 based rules set by the mlx4_en driver for Ethernet unicast and multicast, and the IB stack multicast attach calls done through the mlx4_ib driver are all routed to use the device managed API. When attaching rule using device managed flow steering API, the firmware returns a 64 bit registration id, which is to be provided during detach. Currently the firmware is always programmed during HCA initialization to use standard L2 hashing. Future work should be done to allow configuring the flow-steering hash function with common, non proprietary means. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-06ipoib: Need to do dst_neigh_lookup_skb() outside of priv->lock.David S. Miller1-7/+8
Otherwise local_bh_enable() complains. Reported-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-05cxgb3: Convert t3_l2t_get() over to dst_neigh_lookup().David S. Miller1-2/+3
This means passing in a suitable destination address. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-05ipoib: Convert over to dev_lookup_neigh_skb().David S. Miller2-10/+16
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-30netlink: add netlink_kernel_cfg parameter to netlink_kernel_createPablo Neira Ayuso1-2/+5
This patch adds the following structure: struct netlink_kernel_cfg { unsigned int groups; void (*input)(struct sk_buff *skb); struct mutex *cb_mutex; }; That can be passed to netlink_kernel_create to set optional configurations for netlink kernel sockets. I've populated this structure by looking for NULL and zero parameters at the existing code. The remaining parameters that always need to be set are still left in the original interface. That includes optional parameters for the netlink socket creation. This allows easy extensibility of this interface in the future. This patch also adapts all callers to use this new interface. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller6-46/+55
Conflicts: drivers/net/caif/caif_hsi.c drivers/net/usb/qmi_wwan.c The qmi_wwan merge was trivial. The caif_hsi.c, on the other hand, was not. It's a conflict between 1c385f1fdf6f9c66d982802cd74349c040980b50 ("caif-hsi: Replace platform device with ops structure.") in the net-next tree and commit 39abbaef19cd0a30be93794aa4773c779c3eb1f3 ("caif-hsi: Postpone init of HIS until open()") in the net tree. I did my best with that one and will ask Sjur to check it out. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-27infiniband: netlink: Move away from NLMSG_NEW().David S. Miller1-4/+6
And use nlmsg_data() while we're here too. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-24Merge branches 'cma' and 'ocrdma' into for-linusRoland Dreier5-45/+54
2012-06-20RDMA/cma: QP type check on received REQs should be AND not ORSean Hefty1-1/+1
Change || check to the intended && when checking the QP type in a received connection request against the listening endpoint. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-06-15RDMA/ocrdma: Fix off by one in ocrdma_query_gid()Dan Carpenter1-1/+1
The dev->sgid_tbl[] array is allocated in ocrdma_alloc_resources(). It has OCRDMA_MAX_SGID elements so the test here is off by one. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-06-11RDMA/ocrdma: Fixed RQ error CQE pollingParav Pandit1-1/+3
Fix RQ/SRQ error CQE polling. Return error CQE to consumer for error case which was not returned previously. Signed-off-by: Parav Pandit <parav.pandit@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-06-11RDMA/ocrdma: Correct queue SGE calculationMahesh Vardhamanaiah4-3/+10
Fix max sge calculation for sq, rq, srq for all hardware types. Signed-off-by: Mahesh Vardhamanaiah <mahesh.vardhamanaiah@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-06-11RDMA/ocrdma: Correct reported max queue sizesMahesh Vardhamanaiah2-11/+6
Fix code to read the max wqe and max rqe values from mailbox response. Signed-off-by: Mahesh Vardhamanaiah <mahesh.vardhamanaiah@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-06-11RDMA/ocrdma: Fixed GID table for vlan and eventsParav Pandit1-29/+34
1. Fix reporting GID table addition events. 2. Enable vlan based GID entries only when VLAN is enabled at compile time (test CONFIG_VLAN_8021Q / CONFIG_VLAN_8021Q_MODULE). Signed-off-by: Parav Pandit <parav.pandit@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-06-06Merge branches 'cxgb4', 'mlx4' and 'ocrdma' into for-linusRoland Dreier9-39/+33
2012-06-06IB/mlx4: Fix max_wqe capacity reported from query deviceSagi Grimberg3-7/+24
1. Limit the max number of WQEs per QP reported when querying the device, so that ib_create_qp() will not fail for a QP size that the device claimed to support due to additional headroom WQEs being allocated. 2. Limit qp resources accepted for ib_create_qp() to the limits reported in ib_query_device(). In kernel space, make sure that the limits returned to the caller following qp creation also lie within the reported device limits. For userspace, report as before, and do adjustment in libmlx4 (so as not to break ABI). Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Sagi Grimberg <sagig@mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-06-04IB/mlx4: Fix EQ deallocation in legacy modeShlomo Pongratz1-12/+7
Commit e605b743f33d ("IB/mlx4: Increase the number of vectors (EQs) available for ULPs") didn't handle correctly the case where there aren't enough MSI-X vectors to increase the number of EQs, so only the legacy EQs are allocated. This results in an attempt to memset() to zero the EQ table which was never allocated and a kernel crash. Fix this by checking in the teardown flow if the table of EQs was ever allocated. Also remove some unneeded setting to zero of the EQ related fields in struct mlx4_ib_dev. Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-06-04RDMA/cxgb4: Fix crash when peer address is 0.0.0.0Thadeu Lima de Souza Cascardo1-0/+4
When using rping -c -a 0.0.0.0 with iw_cxgb4, the system crashes when rdma_connect() is called. ip_dev_find() will return NULL, but pdev is accessed anyway. Checking that pdev is NULL and returning -ENODEV prevents the system from crashing. Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-05-29RDMA/ocrdma: Remove unnecessary version.h includesDevendra Naga2-2/+0
"make versioncheck" shows: drivers/infiniband/hw/ocrdma/ocrdma_main.c: 29 linux/version.h not needed. drivers/infiniband/hw/ocrdma/ocrdma_verbs.h: 31 linux/version.h not needed. Signed-off-by: Devendra Naga <devendra.aaru@gmail.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-05-29RDMA/ocrdma: Fix signaled event for SRQ_LIMIT_REACHEDParav Pandit1-1/+1
Signed-off-by: Parav Pandit <parav.pandit@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-05-29RDMA/ocrdma: Correct queue free count mathParav Pandit4-17/+1
Correct queue free count math for SQ, RQ for all hardware type. Update user-kernel ABI interface. Signed-off-by: Parav Pandit <parav.pandit@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-05-22Merge tag 'rdma-for-3.5' of ↵Linus Torvalds56-408/+9679
git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband Pull InfiniBand/RDMA changes from Roland Dreier: - Add ocrdma hardware driver for Emulex IB-over-Ethernet adapters - Add generic and mlx4 support for "raw" QPs: allow suitably privileged applications to send and receive arbitrary packets directly to/from the hardware - Add "doorbell drop" handling to the cxgb4 driver - A fairly large batch of qib hardware driver changes - A few fixes for lockdep-detected issues - A few other miscellaneous fixes and cleanups Fix up trivial conflict in drivers/net/ethernet/emulex/benet/be.h. * tag 'rdma-for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (53 commits) RDMA/cxgb4: Include vmalloc.h for vmalloc and vfree IB/mlx4: Fix mlx4_ib_add() error flow IB/core: Fix IB_SA_COMP_MASK macro IB/iser: Fix error flow in iser ep connection establishment IB/mlx4: Increase the number of vectors (EQs) available for ULPs RDMA/cxgb4: Add query_qp support RDMA/cxgb4: Remove kfifo usage RDMA/cxgb4: Use vmalloc() for debugfs QP dump RDMA/cxgb4: DB Drop Recovery for RDMA and LLD queues RDMA/cxgb4: Disable interrupts in c4iw_ev_dispatch() RDMA/cxgb4: Add DB Overflow Avoidance RDMA/cxgb4: Add debugfs RDMA memory stats cxgb4: DB Drop Recovery for RDMA and LLD queues cxgb4: Common platform specific changes for DB Drop Recovery cxgb4: Detect DB FULL events and notify RDMA ULD RDMA/cxgb4: Drop peer_abort when no endpoint found RDMA/cxgb4: Always wake up waiters in c4iw_peer_abort_intr() mlx4_core: Change bitmap allocator to work in round-robin fashion RDMA/nes: Don't call event handler if pointer is NULL RDMA/nes: Fix for the ORD value of the connecting peer ...
2012-05-22Merge branch 'for-next' of ↵Linus Torvalds1-7/+3
git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending Pull scsi-target changes from Nicholas Bellinger: "There has been lots of work in existing code in a number of areas this past cycle. The major highlights have been: * Removal of transport_do_task_sg_chain() from core + fabrics (Roland) * target-core: Removal of se_task abstraction from target-core and enforce hw_max_sectors for pSCSI backends (hch) * Re-factoring of iscsi-target tx immediate/response queues (agrover) * Conversion of iscsi-target back to using target core memory allocation logic (agrover) We've had one last minute iscsi-target patch go into for-next to address a nasty regression bug related to the target core allocation logic conversion from agrover that is not included in friday's linux-next build, but has been included in this series. On the new fabric module code front for-3.5, here is a brief status update for the three currently in flight this round: * usb-gadget target driver: Sebastian Siewior's driver for supporting usb-gadget target mode operation. This will be going out as a separate PULL request from target-pending/usb-target-merge with subsystem maintainer ACKs. There is one minor target-core patch in this series required to function. * sbp ieee-1394/firewire target driver: Chris Boot's driver for supportting the Serial Block Protocol (SBP) across IEEE-1394 Firewire hardware. This will be going out as a separate PULL request from target-pending/sbp-target-merge with two additional drivers/firewire/ patches w/ subsystem maintainer ACKs. * qla2xxx LLD target mode infrastructure changes + tcm_qla2xxx: The Qlogic >= 24xx series HW target mode LLD infrastructure patch-set and tcm_qla2xxx fabric driver. Support for FC target mode using qla2xxx LLD code has been officially submitted by Qlogic to James below, and is currently outstanding but not yet merged into scsi.git/for-next.. [PATCH 00/22] qla2xxx: Updates for scsi "misc" branch http://www.spinics.net/lists/linux-scsi/msg59350.html Note there are *zero* direct dependencies upon this for-next series for the qla2xxx LLD target + tcm_qla2xxx patches submitted above, and over the last days the target mode team has been tracking down an tcm_qla2xxx specific active I/O shutdown bug that appears to now be almost squashed for 3.5-rc-fixes." * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (47 commits) iscsi-target: Fix iov_count calculation bug in iscsit_allocate_iovecs iscsi-target: remove dead code in iscsi_check_valuelist_for_support target: Handle ATA_16 passthrough for pSCSI backend devices target: Add MI_REPORT_TARGET_PGS ext. header + implict_trans_secs attribute target: Fix MAINTENANCE_IN service action CDB checks to use lower 5 bits target: add support for the WRITE_VERIFY command target: make target_put_session void target: cleanup transport_execute_tasks() target: Remove max_sectors device attribute for modern se_task less code target: lock => unlock typo in transport_lun_wait_for_tasks target: Enforce hw_max_sectors for SCF_SCSI_DATA_SG_IO_CDB target: remove the t_se_count field in struct se_cmd target: remove the t_task_cdbs_ex_left field in struct se_cmd target: remove the t_task_cdbs_left field in struct se_cmd target: remove struct se_task target: move the state and execute lists to the command target: simplify command to task linkage target: always allocate a single task target: replace ->execute_task with ->execute_cmd target: remove the task_sectors field in struct se_task ...
2012-05-21Merge branches 'core', 'cxgb4', 'ipath', 'iser', 'lockdep', 'mlx4', 'nes', ↵Roland Dreier55-405/+9666
'ocrdma', 'qib' and 'raw-qp' into for-linus
2012-05-21RDMA/cxgb4: Include vmalloc.h for vmalloc and vfreeVipul Pandya1-0/+1
Signed-off-by: Vipul Pandya <vipul@chelsio.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-05-19IB/mlx4: Fix mlx4_ib_add() error flowJack Morgenstein1-3/+3
We need to use a different loop index for mlx4_counter_alloc() and for device_create_file() iterations: the mlx4_counter_alloc() loop index is used in the error flow to free counters. If the same loop index is used for device_create_file() and, say, the device_create_file() loop fails on the first iteration, the allocated counters will not be freed. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-05-19IB/iser: Fix error flow in iser ep connection establishmentOr Gerlitz2-4/+4
The current error flow code was releasing the IB connection object and calling iscsi_destroy_endpoint() directly without going through the reference counting mechanism introduced in commit 39ff05d ("IB/iser: Enhance disconnection logic for multi-pathing"). This resulted in a double free of the iscsi endpoint object, which causes a kernel NULL pointer dereference. Fix that by plugging into the IB conn reference counting correctly. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-05-19IB/mlx4: Increase the number of vectors (EQs) available for ULPsShlomo Pongratz3-0/+89
Enable IB ULPs to use a larger portion of the device EQs (which map to IRQs). The mlx4_ib driver follows the mlx4_core framework of the EQs to be divided among the device ports. In this scheme, for each IB port, the number of allocated EQs follows the number of cores, subject to other system constraints, such as number available MSI-X vectors. Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-05-19RDMA/cxgb4: Add query_qp supportVipul Pandya3-0/+32
This allows querying the QP state before flushing. Signed-off-by: Vipul Pandya <vipul@chelsio.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-05-19RDMA/cxgb4: Remove kfifo usageVipul Pandya7-150/+203
Using kfifos for ID management was limiting the number of QPs and preventing NP384 MPI jobs. So replace it with a simple bitmap allocator. Remove IDs from the IDR tables before deallocating them. This bug was causing the BUG_ON() in insert_handle() to fire because the ID was getting reused before being removed from the IDR table. Signed-off-by: Vipul Pandya <vipul@chelsio.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-05-19RDMA/cxgb4: Use vmalloc() for debugfs QP dumpVipul Pandya2-2/+20
This allows dumping thousands of QPs. Log active open failures of interest. Signed-off-by: Vipul Pandya <vipul@chelsio.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-05-19RDMA/cxgb4: DB Drop Recovery for RDMA and LLD queuesVipul Pandya4-12/+259
Add module option db_fc_threshold which is the count of active QPs that trigger automatic db flow control mode. Automatically transition to/from flow control mode when the active qp count crosses db_fc_theshold. Add more db debugfs stats On DB DROP event from the LLD, recover all the iwarp queues. Signed-off-by: Vipul Pandya <vipul@chelsio.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-05-19RDMA/cxgb4: Disable interrupts in c4iw_ev_dispatch()Vipul Pandya3-6/+9
Use GFP_ATOMIC in _insert_handle() if ints are disabled. Don't panic if we get an abort with no endpoint found. Just log a warning. Signed-off-by: Vipul Pandya <vipul@chelsio.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-05-19RDMA/cxgb4: Add DB Overflow AvoidanceVipul Pandya4-12/+162
Get FULL/EMPTY/DROP events from LLD. On FULL event, disable normal user mode DB rings. Add modify_qp semantics to allow user processes to call into the kernel to ring doobells without overflowing. Add DB Full/Empty/Drop stats. Mark queues when created indicating the doorbell state. If we're in the middle of db overflow avoidance, then newly created queues should start out in this mode. Bump the C4IW_UVERBS_ABI_VERSION to 2 so the user mode library can know if the driver supports the kernel mode db ringing. Signed-off-by: Vipul Pandya <vipul@chelsio.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-05-19RDMA/cxgb4: Add debugfs RDMA memory statsVipul Pandya5-3/+155
Signed-off-by: Vipul Pandya <vipul@chelsio.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-05-15RDMA/cxgb4: Drop peer_abort when no endpoint foundSteve Wise1-0/+6
Log a warning and drop the abort message. Otherwise we will do a bogus wake_up() and crash. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Cc: <stable@vger.kernel.org> Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-05-15RDMA/cxgb4: Always wake up waiters in c4iw_peer_abort_intr()Steve Wise1-4/+1
This fixes a race where an ingress abort fails to wake up the thread blocked in rdma_init() causing the app to hang. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Cc: <stable@vger.kernel.org> Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-05-14RDMA/nes: Don't call event handler if pointer is NULLTatyana Nikolova1-1/+2
Don't call the ibqp event_handler pointer in the case it wasn't initialized. Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com> Signed-off-by: Donald Wood <Donald.E.Wood@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-05-14RDMA/nes: Fix for the ORD value of the connecting peerTatyana Nikolova1-0/+4
Set ORD value of the connecting peer to be at least one in order to accommodate an RDMA READ Request message. Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com> Signed-off-by: Donald Wood <Donald.E.Wood@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-05-14IB/qib: Add cache line awareness to qib_qp and qib_devdata structuresMike Marciniszyn7-94/+120
This patch reorganizes the QP and devdata files to be more cache line aware. qib_qp fields in particular are split into read-mostly, send, and receive fields. qib_devdata fields are split into read-mostly and read/write fields Testing has show that bidirectional tests improve by as much as 100% with this patch. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-05-14IB/qib: MADs with misset M_Keys should return failureJim Foraker1-1/+3
If a MAD is sent directly to the local HCA rather than placed on a QP and the MAD fails M_Key checks, there is no means to generate a timeout for the client, which may hang. Instead we report IB_MAD_RESULT_FAILURE, which operates the same for on-the-wire packets, but will generate a send failure back to the client. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Jim Foraker <foraker1@llnl.gov> Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-05-14IB/qib: Fix M_Key lease timeout handlingJim Foraker1-15/+29
If a port has an M_Key lease set, the M_Key protect bits set to 1, and a SubnSet arrives with an invalid M_Key, an M_Key mismatch trap is generated, the lease timer begins as expected, and eventually the M_Key protect bits will be set back to 0 as per the spec. However, if any other SMP with an invalid M_Key arrives, the lease timer is expired and the M_Key protect bits remain in force. This is not according to to spec. In particular, C14-17 says that a lease timer that is underway is not affected by protection level checks (ie, at protection level 1, a SubnGet with a bad M_Key may be successful, but does not stop the timer), and C14-19 says that the timer shall stop when a valid M_Key has been received. C14-19 is the only compliance statement that specifies a stopping condition for the timer. This behavior is magnified if the port's Master SM LID attribute points at itself. In that case, the M_Key mismatch trap is sufficient to expire the timer, and the mkey lease attribute is rendered useless. Reviewed-by: Ram Vepa <ram.vepa@qlogic.com> Signed-off-by: Jim Foraker <foraker1@llnl.gov> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-05-14IB/qib: Fix QLE734X link cyclingMitko Haralanov1-1/+1
The SERDES was using the incorrect Frequency Loop Bandwidth setting causing the link to cycle through the Physical link negotiation state machine. Fixing the Frequency Loop Bandwidth setting in the SERDES helps the link come up faster and more reliably. Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>