summaryrefslogtreecommitdiff
path: root/drivers/infiniband/hw/cxgb4/provider.c
AgeCommit message (Collapse)AuthorFilesLines
2017-08-10RDMA: Simplify get firmware interfaceLeon Romanovsky1-3/+2
There is a need to forward FW version to user space application through RDMA netlink. In order to make it safe, there is need to declare nla_policy and limit the size of FW string. The new define IB_FW_VERSION_NAME_MAX will limit the size of FW version string. That define was chosen to be equal to ETHTOOL_FWVERS_LEN, because many drivers anyway are limited by that value indirectly. The introduction of this define allows us to remove the string size from get_fw_str function signature. Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2017-05-01IB/core: Rename struct ib_ah_attr to rdma_ah_attrDasaratharaman Chandramouli1-1/+1
This patch simply renames struct ib_ah_attr to rdma_ah_attr as these fields specify attributes that are not necessarily specific to IB. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Don Hiatt <don.hiatt@intel.com> Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20cxgb4: Convert PDBG to pr_debugJoe Perches1-19/+19
Use a more typical logging style. Miscellanea: o Obsolete the c4iw_debug module parameter o Coalesce formats o Realign arguments Signed-off-by: Joe Perches <joe@perches.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20cxgb4: Use more common logging styleJoe Perches1-3/+1
Convert printks to pr_<level> Miscellanea: o Coalesce formats o Realign arguments Signed-off-by: Joe Perches <joe@perches.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-26Merge tag 'for-next-dma_ops' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull rdma DMA mapping updates from Doug Ledford: "Drop IB DMA mapping code and use core DMA code instead. Bart Van Assche noted that the ib DMA mapping code was significantly similar enough to the core DMA mapping code that with a few changes it was possible to remove the IB DMA mapping code entirely and switch the RDMA stack to use the core DMA mapping code. This resulted in a nice set of cleanups, but touched the entire tree and has been kept separate for that reason." * tag 'for-next-dma_ops' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (37 commits) IB/rxe, IB/rdmavt: Use dma_virt_ops instead of duplicating it IB/core: Remove ib_device.dma_device nvme-rdma: Switch from dma_device to dev.parent RDS: net: Switch from dma_device to dev.parent IB/srpt: Modify a debug statement IB/srp: Switch from dma_device to dev.parent IB/iser: Switch from dma_device to dev.parent IB/IPoIB: Switch from dma_device to dev.parent IB/rxe: Switch from dma_device to dev.parent IB/vmw_pvrdma: Switch from dma_device to dev.parent IB/usnic: Switch from dma_device to dev.parent IB/qib: Switch from dma_device to dev.parent IB/qedr: Switch from dma_device to dev.parent IB/ocrdma: Switch from dma_device to dev.parent IB/nes: Remove a superfluous assignment statement IB/mthca: Switch from dma_device to dev.parent IB/mlx5: Switch from dma_device to dev.parent IB/mlx4: Switch from dma_device to dev.parent IB/i40iw: Remove a superfluous assignment statement IB/hns: Switch from dma_device to dev.parent ...
2017-02-23Merge tag 'for-linus' of ↵Linus Torvalds1-4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull Mellanox rdma updates from Doug Ledford: "Mellanox specific updates for 4.11 merge window Because the Mellanox code required being based on a net-next tree, I keept it separate from the remainder of the RDMA stack submission that is based on 4.10-rc3. This branch contains: - Various mlx4 and mlx5 fixes and minor changes - Support for adding a tag match rule to flow specs - Support for cvlan offload operation for raw ethernet QPs - A change to the core IB code to recognize raw eth capabilities and enumerate them (touches non-Mellanox code) - Implicit On-Demand Paging memory registration support" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (40 commits) IB/mlx5: Fix configuration of port capabilities IB/mlx4: Take source GID by index from HW GID table IB/mlx5: Fix blue flame buffer size calculation IB/mlx4: Remove unused variable from function declaration IB: Query ports via the core instead of direct into the driver IB: Add protocol for USNIC IB/mlx4: Support raw packet protocol IB/mlx5: Support raw packet protocol IB/core: Add raw packet protocol IB/mlx5: Add implicit MR support IB/mlx5: Expose MR cache for mlx5_ib IB/mlx5: Add null_mkey access IB/umem: Indicate that process is being terminated IB/umem: Update on demand page (ODP) support IB/core: Add implicit MR flag IB/mlx5: Support creation of a WQ with scatter FCS offload IB/mlx5: Enable QP creation with cvlan offload IB/mlx5: Enable WQ creation and modification with cvlan offload IB/mlx5: Expose vlan offloads capabilities IB/uverbs: Enable QP creation with cvlan offload ...
2017-02-14IB: Query ports via the core instead of direct into the driverOr Gerlitz1-4/+4
Change the drivers to call ib_query_port in their get port immutable handler instead of their own query port handler. Doing this required to set the core cap flags of this device before the ib_query_port call is made, since the IB core might need these caps to serve the port query. Drivers are ensured by the IB core that the port attributes passed to the port query verb implementation are zero, and hence we removed the zeroing from the drivers. This patch doesn't add any new functionality. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Acked-by: Adit Ranadive <aditr@vmware.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24RDMA/core: Add the function ib_mtu_int_to_enumAmrani, Ram1-10/+1
As the functionality to convert the MTU from a number to enum_ib_mtu is ubiquitous, define a dedicated function and remove the duplicated code. Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24IB/cxgb4: Set dev.parent instead of dma_deviceBart Van Assche1-1/+1
Prepare for removal of ib_device.dma_device. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Hariprasad S <hariprasad@chelsio.com> Acked-by: Steve Wise <swise@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-10iw_cxgb4: free EQ queue memory on last derefSteve Wise1-4/+16
Commit ad61a4c7a9b7 ("iw_cxgb4: don't block in destroy_qp awaiting the last deref") introduced a bug where the RDMA QP EQ queue memory (and QIDs) are possibly freed before the underlying connection has been fully shutdown. The result being a possible DMA read issued by HW after the queue memory has been unmapped and freed. This results in possible WR corruption in the worst case, system bus errors if an IOMMU is in use, and SGE "bad WR" errors reported in the very least. The fix is to defer unmap/free of queue memory and QID resources until the QP struct has been fully dereferenced. To do this, the c4iw_ucontext must also be kept around until the last QP that references it is fully freed. In addition, since the last QP deref can happen in an IRQ disabled context, we need a new workqueue thread to do the final unmap/free of the EQ queue memory. Fixes: ad61a4c7a9b7 ("iw_cxgb4: don't block in destroy_qp awaiting the last deref") Cc: stable@vger.kernel.org Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-10iw_cxgb4: refactor sq/rq drain logicSteve Wise1-2/+0
With the addition of the IB/Core drain API, iw_cxgb4 supported drain by watching the CQs when the QP was out of RTS and signalling "drain complete" when the last CQE is polled. This, however, doesn't fully support the drain semantics. Namely, the drain logic is supposed to signal "drain complete" only when the application has _processed_ the last CQE, not just removed them from the CQ. Thus a small timing hole exists that can cause touch after free type bugs in applications using the drain API (nvmf, iSER, for example). So iw_cxgb4 needs a better solution. The iWARP Verbs spec mandates that "_at some point_ after the QP is moved to ERROR", the iWARP driver MUST synchronously fail post_send and post_recv calls. iw_cxgb4 was currently not allowing any posts once the QP is in ERROR. This was in part due to the fact that the HW queues for the QP in ERROR state are disabled at this point, so there wasn't much else to do but fail the post operation synchronously. This restriction is what drove the first drain implementation in iw_cxgb4 that has the above mentioned flaw. This patch changes iw_cxgb4 to allow post_send and post_recv WRs after the QP is moved to ERROR state for kernel mode users, thus still adhering to the Verbs spec for user mode users, but allowing flush WRs for kernel users. Since the HW queues are disabled, we just synthesize a CQE for this post, queue it to the SW CQ, and then call the CQ event handler. This enables proper drain operations for the various storage applications. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-13IB/core: Let create_ah return extended response to userMoni Shoua1-1/+3
Add struct ib_udata to the signature of create_ah callback that is implemented by IB device drivers. This allows HW drivers to return extra data to the userspace library. This patch prepares the ground for mlx5 driver to resolve destination mac address for a given GID and return it to userspace. This patch was previously submitted by Knut Omang as a part of the patch set to support Oracle's Infiniband HCA (SIF). Signed-off-by: Knut Omang <knut.omang@oracle.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-07IB/{core,hw}: Add constant for node_descYuval Shaia1-0/+1
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23IB/cxgb4: Support device FW version stringIra Weiny1-16/+15
And remove sysfs fw_ver in favor of the core. Reviewed-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-26IB/core: Make device counter infrastructure dynamicChristoph Lameter1-9/+49
In practice, each RDMA device has a unique set of counters that the hardware implements. Having a central set of counters that they must all adhere to is limiting and causes many useful counters to not be available. Therefore we create a dynamic counter registration infrastructure. The driver must implement a stats structure allocation routine, in which the driver must place the directory name it wants, a list of names for all of the counters, an array of u64 counters themselves, plus a few generic configuration options. We then implement a core routine to create a sysfs file for each of the named stats elements, and a core routine to retrieve the stats when any of the sysfs attribute files are read. To avoid excessive beating on the stats generation routine in the drivers, the core code also caches the stats for a short period of time so that someone attempting to read all of the stats in a given device's directory will not result in a stats generation call per file read. Future work will attempt to standardize just the shared stats elements, and possibly add a method to get the stats via netlink in addition to sysfs. Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com> [ Add caching, make structure names more informative, add i40iw support, other significant rewrites from the original patch ]
2016-04-26iw_cxgb4: initialize ibdev.iwcm->ifname for port mappingSteve Wise1-0/+2
The IWCM uses ibdev.iwcm->ifname for registration with the iwarp port map daemon. But iw_cxgb4 did not initialize this field which causes intermittent registration failures based on the contents of the uninitialized memory. Fixes: 170003c894d9 ("iw_cxgb4: remove port mapper related code") Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-16Merge branches 'nes', 'cxgb4' and 'iwpm' into k.o/for-4.6Doug Ledford1-1/+2
2016-03-01iw_cxgb4: add queue drain functionsSteve Wise1-0/+2
Add completion objects, named sq_drained and rq_drained, to the c4iw_qp struct. The queue-specific completion object is signaled when the last CQE is drained from the CQ for that queue. Add c4iw_drain_sq() to block until qp->rq_drained is completed. Add c4iw_drain_rq() to block until qp->sq_drained is completed. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-01iw_cxgb4: Max fastreg depth depends on DSGL supportHariprasad S1-1/+2
The max depth of a fastreg mr depends on whether the device supports DSGL or not. So compute it dynamically based on the device support and the module use_dsgl option. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23IB: remove in-kernel support for memory windowsChristoph Hellwig1-1/+0
Remove the unused ib_allow_mw and ib_bind_mw functions, remove the unused IB_WR_BIND_MW and IB_WC_BIND_MW opcodes and move ib_dealloc_mw into the uverbs module. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> [core] Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23IB: remove support for phys MRsChristoph Hellwig1-2/+0
We have stopped using phys MRs in the kernel a while ago, so let's remove all the cruft used to implement them. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> [core] Reviewed-By: Devesh Sharma<devesh.sharma@avagotech.com> [ocrdma] Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-29iw_cxgb4: Remove old FRWR APISagi Grimberg1-2/+0
No ULP uses it anymore, go ahead and remove it. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-29iw_cxgb4: Support the new memory registration APISagi Grimberg1-0/+1
Support the new memory registration API by allocating a private page list array in c4iw_mr and populate it when c4iw_map_mr_sg is invoked. Also, support IB_WR_REG_MR by duplicating build_fastreg just take the needed information from different places: - page_size, iova, length (ib_mr) - page array (c4iw_mr) - key, access flags (ib_reg_wr) The IB_WR_FAST_REG_MR handlers will be removed later when all the ULPs will be converted. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Acked-by: Christoph Hellwig <hch@lst.de> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-22iw_cxgb4: Adds support for T6 adapterHariprasad S1-1/+1
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-31iw_cxgb4: Support ib_alloc_mr verbSagi Grimberg1-1/+1
Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-06-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds1-4/+4
Pull networking updates from David Miller: 1) Add TX fast path in mac80211, from Johannes Berg. 2) Add TSO/GRO support to ibmveth, from Thomas Falcon 3) Move away from cached routes in ipv6, just like ipv4, from Martin KaFai Lau. 4) Lots of new rhashtable tests, from Thomas Graf. 5) Run ingress qdisc lockless, from Alexei Starovoitov. 6) Allow servers to fetch TCP packet headers for SYN packets of new connections, for fingerprinting. From Eric Dumazet. 7) Add mode parameter to pktgen, for testing receive. From Alexei Starovoitov. 8) Cache access optimizations via simplifications of build_skb(), from Alexander Duyck. 9) Move page frag allocator under mm/, also from Alexander. 10) Add xmit_more support to hv_netvsc, from KY Srinivasan. 11) Add a counter guard in case we try to perform endless reclassify loops in the packet scheduler. 12) Extern flow dissector to be programmable and use it in new "Flower" classifier. From Jiri Pirko. 13) AF_PACKET fanout rollover fixes, performance improvements, and new statistics. From Willem de Bruijn. 14) Add netdev driver for GENEVE tunnels, from John W Linville. 15) Add ingress netfilter hooks and filtering, from Pablo Neira Ayuso. 16) Fix handling of epoll edge triggers in TCP, from Eric Dumazet. 17) Add an ECN retry fallback for the initial TCP handshake, from Daniel Borkmann. 18) Add tail call support to BPF, from Alexei Starovoitov. 19) Add several pktgen helper scripts, from Jesper Dangaard Brouer. 20) Add zerocopy support to AF_UNIX, from Hannes Frederic Sowa. 21) Favor even port numbers for allocation to connect() requests, and odd port numbers for bind(0), in an effort to help avoid ip_local_port_range exhaustion. From Eric Dumazet. 22) Add Cavium ThunderX driver, from Sunil Goutham. 23) Allow bpf programs to access skb_iif and dev->ifindex SKB metadata, from Alexei Starovoitov. 24) Add support for T6 chips in cxgb4vf driver, from Hariprasad Shenai. 25) Double TCP Small Queues default to 256K to accomodate situations like the XEN driver and wireless aggregation. From Wei Liu. 26) Add more entropy inputs to flow dissector, from Tom Herbert. 27) Add CDG congestion control algorithm to TCP, from Kenneth Klette Jonassen. 28) Convert ipset over to RCU locking, from Jozsef Kadlecsik. 29) Track and act upon link status of ipv4 route nexthops, from Andy Gospodarek. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1670 commits) bridge: vlan: flush the dynamically learned entries on port vlan delete bridge: multicast: add a comment to br_port_state_selection about blocking state net: inet_diag: export IPV6_V6ONLY sockopt stmmac: troubleshoot unexpected bits in des0 & des1 net: ipv4 sysctl option to ignore routes when nexthop link is down net: track link-status of ipv4 nexthops net: switchdev: ignore unsupported bridge flags net: Cavium: Fix MAC address setting in shutdown state drivers: net: xgene: fix for ACPI support without ACPI ip: report the original address of ICMP messages net/mlx5e: Prefetch skb data on RX net/mlx5e: Pop cq outside mlx5e_get_cqe net/mlx5e: Remove mlx5e_cq.sqrq back-pointer net/mlx5e: Remove extra spaces net/mlx5e: Avoid TX CQE generation if more xmit packets expected net/mlx5e: Avoid redundant dev_kfree_skb() upon NOP completion net/mlx5e: Remove re-assignment of wq type in mlx5e_enable_rq() net/mlx5e: Use skb_shinfo(skb)->gso_segs rather than counting them net/mlx5e: Static mapping of netdev priv resources to/from netdev TX queues net/mlx4_en: Use HW counters for rx/tx bytes/packets in PF device ...
2015-06-12IB/mad: Add support for additional MAD info to/from driversIra Weiny1-2/+5
In order to support alternate sized MADs (and variable sized MADs on OPA devices) add in/out MAD size parameters to the process_mad core call. In addition, add an out_mad_pkey_index to communicate the pkey index the driver wishes the MAD stack to use when sending OPA MAD responses. The out MAD size and the out MAD PKey index are required by the MAD stack to generate responses on OPA devices. Furthermore, the in and out MAD parameters are made generic by specifying them as ib_mad_hdr rather than ib_mad. Drivers are modified as needed and are protected by BUG_ON flags if the MAD sizes passed to them is incorrect. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-06-12IB/core: Pass hardware specific data in query_deviceMatan Barak1-2/+6
Vendors should be able to pass vendor specific data to/from user-space via query_device uverb. In order to do this, we need to pass the vendors' specific udata. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-06-04cxgb4: Add ethtool support to get adapter statsHariprasad Shenai1-4/+4
Add ethtool support to get adapter specific hardware statistics Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-02IB/core cleanup: Add const on args - device->process_madIra Weiny1-2/+3
The process_mad device function declares some parameters as "in". Make those parameters const and adjust the call tree under process_mad in the various drivers accordingly. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Hal Rosenstock <hal@mellanox.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-05-20IB/core: Convert core to use bitfield for capsIra Weiny1-7/+1
Remove query_protocol callback Use the new Core Capability bits for: rdma_protocol_* rdma_cap_ib_mad rdma_cap_ib_smi rdma_cap_ib_cm rdma_cap_iw_cm rdma_cap_ib_sa rdma_cap_ib_mcast rdma_cap_af_ib rdma_cap_eth_ah Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-05-20IB/core: Add per port immutable struct to ib_deviceIra Weiny1-0/+17
As of commit 5eb620c81ce3 "IB/core: Add helpers for uncached GID and P_Key searches"; pkey_tbl_len and gid_tbl_len are immutable data which are stored in the ib_device. The per port core capability flags to be added later are also immutable data to be stored in the ib_device object. In preparation for this create a structure for per port immutable data and place the pkey and gid table lengths within this structure. "get_port_immutable" is added as a mandatory device function to allow the drivers to fill in this data. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-05-18IB/Verbs: Implement new callback query_protocol()Michael Wang1-0/+7
Add new callback query_protocol() and implement for each HW. Mapping List: node-type link-layer transport protocol nes RNIC ETH IWARP IWARP amso1100 RNIC ETH IWARP IWARP cxgb3 RNIC ETH IWARP IWARP cxgb4 RNIC ETH IWARP IWARP usnic USNIC_UDP ETH USNIC_UDP USNIC_UDP ocrdma IB_CA ETH IB IBOE mlx4 IB_CA IB/ETH IB IB/IBOE mlx5 IB_CA IB IB IB ehca IB_CA IB IB IB ipath IB_CA IB IB IB mthca IB_CA IB IB IB qib IB_CA IB IB IB Signed-off-by: Michael Wang <yun.wang@profitbricks.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Tested-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Tested-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2014-11-23RDMA/cxgb4/cxgb4vf/csiostor: Cleanup macros/register defines related to ↵Hariprasad Shenai1-4/+4
PCIE, RSS and FW This patch cleanups all PCIE, RSS & FW related macros/register defines that are defined in t4fw_api.h and the affected files. Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-22iw_cxgb4: advertise the correct device max attributesHariprasad Shenai1-2/+2
Advertise the actual max limits for things like qp depths, number of qps, cqs, etc. Clean up the queue allocation for qps and cqs. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-16cxgb4/iw_cxgb4: use firmware ord/ird resource limitsHariprasad Shenai1-2/+4
Advertise a larger max read queue depth for qps, and gather the resource limits from fw and use them to avoid exhaustinq all the resources. Design: cxgb4: Obtain the max_ordird_qp and max_ird_adapter device params from FW at init time and pass them up to the ULDs when they attach. If these parameters are not available, due to older firmware, then hard-code the values based on the known values for older firmware. iw_cxgb4: Fix the c4iw_query_device() to report these correct values based on adapter parameters. ibv_query_device() will always return: max_qp_rd_atom = max_qp_init_rd_atom = min(module_max, max_ordird_qp) max_res_rd_atom = max_ird_adapter Bump up the per qp max module option to 32, allowing it to be increased by the user up to the device max of max_ordird_qp. 32 seems to be sufficient to maximize throughput for streaming read benchmarks. Fail connection setup if the negotiated IRD exhausts the available adapter ird resources. So the driver will track the amount of ird resource in use and not send an RI_WR/INIT to FW that would reduce the available ird resources below zero. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-16iw_cxgb4: Detect Ing. Padding Boundary at run-timeHariprasad Shenai1-2/+2
Updates iw_cxgb4 to determine the Ingress Padding Boundary from cxgb4_lld_info, and take subsequent actions. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds1-1/+1
Pull networking updates from David Miller: 1) Seccomp BPF filters can now be JIT'd, from Alexei Starovoitov. 2) Multiqueue support in xen-netback and xen-netfront, from Andrew J Benniston. 3) Allow tweaking of aggregation settings in cdc_ncm driver, from Bjørn Mork. 4) BPF now has a "random" opcode, from Chema Gonzalez. 5) Add more BPF documentation and improve test framework, from Daniel Borkmann. 6) Support TCP fastopen over ipv6, from Daniel Lee. 7) Add software TSO helper functions and use them to support software TSO in mvneta and mv643xx_eth drivers. From Ezequiel Garcia. 8) Support software TSO in fec driver too, from Nimrod Andy. 9) Add Broadcom SYSTEMPORT driver, from Florian Fainelli. 10) Handle broadcasts more gracefully over macvlan when there are large numbers of interfaces configured, from Herbert Xu. 11) Allow more control over fwmark used for non-socket based responses, from Lorenzo Colitti. 12) Do TCP congestion window limiting based upon measurements, from Neal Cardwell. 13) Support busy polling in SCTP, from Neal Horman. 14) Allow RSS key to be configured via ethtool, from Venkata Duvvuru. 15) Bridge promisc mode handling improvements from Vlad Yasevich. 16) Don't use inetpeer entries to implement ID generation any more, it performs poorly, from Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1522 commits) rtnetlink: fix userspace API breakage for iproute2 < v3.9.0 tcp: fixing TLP's FIN recovery net: fec: Add software TSO support net: fec: Add Scatter/gather support net: fec: Increase buffer descriptor entry number net: fec: Factorize feature setting net: fec: Enable IP header hardware checksum net: fec: Factorize the .xmit transmit function bridge: fix compile error when compiling without IPv6 support bridge: fix smatch warning / potential null pointer dereference via-rhine: fix full-duplex with autoneg disable bnx2x: Enlarge the dorq threshold for VFs bnx2x: Check for UNDI in uncommon branch bnx2x: Fix 1G-baseT link bnx2x: Fix link for KR with swapped polarity lane sctp: Fix sk_ack_backlog wrap-around problem net/core: Add VF link state control policy net/fsl: xgmac_mdio is dependent on OF_MDIO net/fsl: Make xgmac_mdio read error message useful net_sched: drr: warn when qdisc is not work conserving ...
2014-06-11iw_cxgb4: Allocate and use IQs specifically for indirect interruptsHariprasad Shenai1-1/+1
Currently indirect interrupts for RDMA CQs funnel through the LLD's RDMA RXQs, which also handle direct interrupts for offload CPLs during RDMA connection setup/teardown. The intended T4 usage model, however, is to have indirect interrupts flow through dedicated IQs. IE not to mix indirect interrupts with CPL messages in an IQ. This patch adds the concept of RDMA concentrator IQs, or CIQs, setup and maintained by the LLD and exported to iw_cxgb4 for use when creating CQs. RDMA CPLs will flow through the LLD's RDMA RXQs, and CQ interrupts flow through the CIQs. Design: cxgb4 creates and exports an array of CIQs for the RDMA ULD. These IQs are sized according to the max available CQs available at adapter init. In addition, these IQs don't need FL buffers since they only service indirect interrupts. One CIQ is setup per RX channel similar to the RDMA RXQs. iw_cxgb4 will utilize these CIQs based on the vector value passed into create_cq(). The num_comp_vectors advertised by iw_cxgb4 will be the number of CIQs configured, and thus the vector value will be the index into the array of CIQs. Based on original work by Steve Wise <swise@opengridcomputing.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-05RDMA/cxgb4: add missing padding at end of struct c4iw_alloc_ucontext_respYann Droneaud1-2/+3
The i386 ABI disagrees with most other ABIs regarding alignment of data types larger than 4 bytes: on most ABIs a padding must be added at end of the structures, while it is not required on i386. So for most ABI struct c4iw_alloc_ucontext_resp gets implicitly padded to be aligned on a 8 bytes multiple, while for i386, such padding is not added. The tool pahole can be used to find such implicit padding: $ pahole --anon_include \ --nested_anon_include \ --recursive \ --class_name c4iw_alloc_ucontext_resp \ drivers/infiniband/hw/cxgb4/iw_cxgb4.o Then, structure layout can be compared between i386 and x86_64: +++ obj-i386/drivers/infiniband/hw/cxgb4/iw_cxgb4.o.pahole.txt 2014-03-28 11:43:05.547432195 +0100 --- obj-x86_64/drivers/infiniband/hw/cxgb4/iw_cxgb4.o.pahole.txt 2014-03-28 10:55:10.990133017 +0100 @@ -2,9 +2,8 @@ struct c4iw_alloc_ucontext_resp { __u64 status_page_key; /* 0 8 */ __u32 status_page_size; /* 8 4 */ - /* size: 12, cachelines: 1, members: 2 */ - /* last cacheline: 12 bytes */ + /* size: 16, cachelines: 1, members: 2 */ + /* padding: 4 */ + /* last cacheline: 16 bytes */ }; This ABI disagreement will make an x86_64 kernel try to write past the buffer provided by an i386 binary. When boundary check will be implemented, the x86_64 kernel will refuse to write past the i386 userspace provided buffer and the uverbs will fail. If the structure is on a page boundary and the next page is not mapped, ib_copy_to_udata() will fail and the uverb will fail. Additionally, as reported by Dan Carpenter, without the implicit padding being properly cleared, an information leak would take place in most architectures. This patch adds an explicit padding to struct c4iw_alloc_ucontext_resp, and, like 92b0ca7cb149 ("IB/mlx5: Fix stack info leak in mlx5_ib_alloc_ucontext()"), makes function c4iw_alloc_ucontext() not writting this padding field to userspace. This way, x86_64 kernel will be able to write struct c4iw_alloc_ucontext_resp as expected by unpatched and patched i386 libcxgb4. Link: http://marc.info/?i=cover.1399309513.git.ydroneaud@opteya.com Link: http://marc.info/?i=1395848977.3297.15.camel@localhost.localdomain Link: http://marc.info/?i=20140328082428.GH25192@mwanda Cc: <stable@vger.kernel.org> Fixes: 05eb23893c2c ("cxgb4/iw_cxgb4: Doorbell Drop Avoidance Bug Fixes") Reported-by: Yann Droneaud <ydroneaud@opteya.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-11RDMA/cxgb4: Max fastreg depth depends on DSGL supportSteve Wise1-1/+1
The max depth of a fastreg mr depends on whether the device supports DSGL or not. So compute it dynamically based on the device support and the module use_dsgl option. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-28RDMA/cxgb4: set error code on kmalloc() failureYann Droneaud1-1/+3
If kmalloc() fails in c4iw_alloc_ucontext(), the function leaves but does not set an error code in ret variable: it will return 0 to the caller. This patch set ret to -ENOMEM in such case. Cc: Steve Wise <swise@opengridcomputing.com> Cc: Steve Wise <swise@chelsio.com> Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-15cxgb4/iw_cxgb4: Doorbell Drop Avoidance Bug FixesSteve Wise1-2/+41
The current logic suffers from a slow response time to disable user DB usage, and also fails to avoid DB FIFO drops under heavy load. This commit fixes these deficiencies and makes the avoidance logic more optimal. This is done by more efficiently notifying the ULDs of potential DB problems, and implements a smoother flow control algorithm in iw_cxgb4, which is the ULD that puts the most load on the DB fifo. Design: cxgb4: Direct ULD callback from the DB FULL/DROP interrupt handler. This allows the ULD to stop doing user DB writes as quickly as possible. While user DB usage is disabled, the LLD will accumulate DB write events for its queues. Then once DB usage is reenabled, a single DB write is done for each queue with its accumulated write count. This reduces the load put on the DB fifo when reenabling. iw_cxgb4: Instead of marking each qp to indicate DB writes are disabled, we create a device-global status page that each user process maps. This allows iw_cxgb4 to only set this single bit to disable all DB writes for all user QPs vs traversing the idr of all the active QPs. If the libcxgb4 doesn't support this, then we fall back to the old approach of marking each QP. Thus we allow the new driver to work with an older libcxgb4. When the LLD upcalls iw_cxgb4 indicating DB FULL, we disable all DB writes via the status page and transition the DB state to STOPPED. As user processes see that DB writes are disabled, they call into iw_cxgb4 to submit their DB write events. Since the DB state is in STOPPED, the QP trying to write gets enqueued on a new DB "flow control" list. As subsequent DB writes are submitted for this flow controlled QP, the amount of writes are accumulated for each QP on the flow control list. So all the user QPs that are actively ringing the DB get put on this list and the number of writes they request are accumulated. When the LLD upcalls iw_cxgb4 indicating DB EMPTY, which is in a workq context, we change the DB state to FLOW_CONTROL, and begin resuming all the QPs that are on the flow control list. This logic runs on until the flow control list is empty or we exit FLOW_CONTROL mode (due to a DB DROP upcall, for example). QPs are removed from this list, and their accumulated DB write counts written to the DB FIFO. Sets of QPs, called chunks in the code, are removed at one time. The chunk size is 64. So 64 QPs are resumed at a time, and before the next chunk is resumed, the logic waits (blocks) for the DB FIFO to drain. This prevents resuming to quickly and overflowing the FIFO. Once the flow control list is empty, the db state transitions back to NORMAL and user QPs are again allowed to write directly to the user DB register. The algorithm is designed such that if the DB write load is high enough, then all the DB writes get submitted by the kernel using this flow controlled approach to avoid DB drops. As the load lightens though, we resume to normal DB writes directly by user applications. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-14RDMA/cxgb4: Add Support for Chelsio T5 adapterVipul Pandya1-4/+11
Adds support for Chelsio T5 adapter. Enables T5's Write Combining feature. Signed-off-by: Vipul Pandya <vipul@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-19RDMA/cxgb4: Add query_qp supportVipul Pandya1-0/+2
This allows querying the QP state before flushing. Signed-off-by: Vipul Pandya <vipul@chelsio.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-05-19RDMA/cxgb4: Remove kfifo usageVipul Pandya1-6/+3
Using kfifos for ID management was limiting the number of QPs and preventing NP384 MPI jobs. So replace it with a simple bitmap allocator. Remove IDs from the IDR tables before deallocating them. This bug was causing the BUG_ON() in insert_handle() to fire because the ID was getting reused before being removed from the IDR table. Signed-off-by: Vipul Pandya <vipul@chelsio.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-05-19RDMA/cxgb4: Add debugfs RDMA memory statsVipul Pandya1-0/+8
Signed-off-by: Vipul Pandya <vipul@chelsio.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-03-05IB: Use central enum for speed instead of hard-coded valuesOr Gerlitz1-1/+1
The kernel IB stack uses one enumeration for IB speed, which wasn't explicitly specified in the verbs header file. Add that enum, and use it all over the code. The IB speed/width notation is also used by iWARP and IBoE HW drivers, which use the convention of rate = speed * width to advertise their port link rate. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-07-19RDMA: Allow for NULL .modify_device() and .modify_port() methodsBart Van Assche1-8/+0
These methods don't make sense for iWARP devices, so rather than forcing them to implement stubs, just return -ENOSYS in the core if the hardware driver doesn't set .modify_device and/or .modify_port. Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-05-10RDMA/cxgb4: EEH errors can hang the driverSteve Wise1-2/+0
A few more EEH fixes: c4iw_wait_for_reply(): detect fatal EEH condition on timeout and return an error. The iw_cxgb4 driver was only calling ib_deregister_device() on an EEH event followed by a ib_register_device() when the device was reinitialized. However, the RDMA core doesn't allow multiple iterations of register/deregister by the provider. See drivers/infiniband/core/sysfs.c: ib_device_unregister_sysfs() where the kobject ref is held until the device is deallocated in ib_deallocate_device(). Calling deregister adds this kobj reference, and then a subsequent register call will generate a WARN_ON() from the kobject subsystem because the kobject is being initialized but is already initialized with the ref held. So the provider must deregister and dealloc when resetting for an EEH event, then alloc/register to re-initialize. To do this, we cannot use the device ptr as our ULD handle since it will change with each reallocation. This commit adds a ULD context struct which is used as the ULD handle, and then contains the device pointer and other state needed. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>