summaryrefslogtreecommitdiff
path: root/drivers/infiniband/hw/mlx5
AgeCommit message (Collapse)AuthorFilesLines
2019-07-03IB/mlx5: Register DEVX with mlx5_core to get async eventsYishai Hadas3-2/+48
Register DEVX with with mlx5_core to get async events. This will enable to dispatch the applicable events to its consumers in down stream patches. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-03IB/mlx5: Introduce MLX5_IB_OBJECT_DEVX_ASYNC_EVENT_FDYishai Hadas1-0/+95
Introduce MLX5_IB_OBJECT_DEVX_ASYNC_EVENT_FD and its initial implementation. This object is from type class FD and will be used to read DEVX async events. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-03net/mlx5: Introduce and use mlx5_eswitch_get_total_vports()Parav Pandit1-1/+1
Instead MLX5_TOTAL_VPORTS, use mlx5_eswitch_get_total_vports(). mlx5_eswitch_get_total_vports() in subsequent patch accounts for SF vports as well. Expanding MLX5_TOTAL_VPORTS macro would require exposing SF internals to more generic vport.h header file. Such exposure is not desired. Hence a mlx5_eswitch_get_total_vports() is introduced. Given that mlx5_eswitch_get_total_vports() API wants to work on const mlx5_core_dev*, change its helper functions also to accept const *dev. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-03Merge mlx5-next into rdma for-nextJason Gunthorpe10-70/+103
From git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Required for dependencies in the next patches. Resolved the conflicts: - esw_destroy_offloads_acl_tables() use the newer mlx5_esw_for_all_vports() version - esw_offloads_steering_init() drop the cap test - esw_offloads_init() drop the extra function arguments * branch 'mlx5-next': (39 commits) net/mlx5: Expose device definitions for object events net/mlx5: Report EQE data upon CQ completion net/mlx5: Report a CQ error event only when a handler was set net/mlx5: mlx5_core_create_cq() enhancements net/mlx5: Expose the API to register for ANY event net/mlx5: Use event mask based on device capabilities net/mlx5: Fix mlx5_core_destroy_cq() error flow net/mlx5: E-Switch, Handle UC address change in switchdev mode net/mlx5: E-Switch, Consider host PF for inline mode and vlan pop net/mlx5: E-Switch, Use iterator for vlan and min-inline setups net/mlx5: E-Switch, Reg/unreg function changed event at correct stage net/mlx5: E-Switch, Consolidate eswitch function number of VFs net/mlx5: E-Switch, Refactor eswitch SR-IOV interface net/mlx5: Handle host PF vport mac/guid for ECPF net/mlx5: E-Switch, Use correct flags when configuring vlan net/mlx5: Reduce dependency on enabled_vfs counter and num_vfs net/mlx5: Don't handle VF func change if host PF is disabled net/mlx5: Limit scope of mlx5_get_next_phys_dev() to PCI PF devices net/mlx5: Move pci status reg access mutex to mlx5_pci_init net/mlx5: Rename mlx5_pci_dev_type to mlx5_coredev_type ... Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-03IB/mlx5: Fixed reporting counters on 2nd port for Dual port RoCEParav Pandit1-24/+36
Currently during dual port IB device registration in below code flow, ib_register_device() ib_device_register_sysfs() ib_setup_port_attrs() add_port() get_counter_table() get_perf_mad() process_mad() mlx5_ib_process_mad() mlx5_ib_process_mad() fails on 2nd port when both the ports are not fully setup at the device level (because 2nd port is unaffiliated). As a result, get_perf_mad() registers different PMA counter group for 1st and 2nd port, namely pma_counter_ext and pma_counter. However both ports have the same capability and counter offsets. Due to this when counters are read by the user via sysfs in below code flow, counters are queried from wrong location from the device mainly from PPCNT instead of VPORT counters. show_pma_counter() get_perf_mad() process_mad() mlx5_ib_process_mad() process_pma_cmd() This shows all zero counters for 2nd port. To overcome this, process_pma_cmd() is invoked, and when unaffiliated port is not yet setup during device registration phase, make the query on the first port. while at it, only process_pma_cmd() needs to work on the native port number and underlying mdev, so shift the get, put calls to where its needed inside process_pma_cmd(). Fixes: 212f2a87b74f ("IB/mlx5: Route MADs for dual port RoCE") Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-03net/mlx5: Report EQE data upon CQ completionYishai Hadas3-3/+3
Report EQE data upon CQ completion to let upper layers use this data. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-07-03net/mlx5: mlx5_core_create_cq() enhancementsYishai Hadas1-1/+2
Enhance mlx5_core_create_cq() to get the command out buffer from the callers to let them use the output. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-07-03net/mlx5: Use event mask based on device capabilitiesYishai Hadas1-1/+1
Use the reported device capabilities for the supported user events (i.e. affiliated and un-affiliated) to set the EQ mask. As the event mask can be up to 256 defined by 4 entries of u64 change the applicable code to work accordingly. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-07-02net/mlx5: E-Switch, Refactor eswitch SR-IOV interfaceBodong Wang2-2/+2
Devlink eswitch mode is not necessarily related to SR-IOV, e.g, ECPF can be at offload mode when SR-IOV is not enabled. Rename the interface and eswitch mode names to decouple from SR-IOV, and cleanup eswitch messages accordingly. Signed-off-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-02RDMA/mlx5: Cleanup rep when doing unloadBodong Wang1-7/+11
When an IB rep is loaded, netdev for the same vport is saved for later reference. However, it's not cleaned up when doing unload. For ECPF, kernel crashes when driver is referring to the already removed netdev. Following steps lead to a shown call trace: 1. Create n VFs from host PF 2. Distroy the VFs 3. Run "rdma link" from ARM Call trace: mlx5_ib_get_netdev+0x9c/0xe8 [mlx5_ib] mlx5_query_port_roce+0x268/0x558 [mlx5_ib] mlx5_ib_rep_query_port+0x14/0x34 [mlx5_ib] ib_query_port+0x9c/0xfc [ib_core] fill_port_info+0x74/0x28c [ib_core] nldev_port_get_doit+0x1a8/0x1e8 [ib_core] rdma_nl_rcv_msg+0x16c/0x1c0 [ib_core] rdma_nl_rcv+0xe8/0x144 [ib_core] netlink_unicast+0x184/0x214 netlink_sendmsg+0x288/0x354 sock_sendmsg+0x18/0x2c __sys_sendto+0xbc/0x138 __arm64_sys_sendto+0x28/0x34 el0_svc_common+0xb0/0x100 el0_svc_handler+0x6c/0x84 el0_svc+0x8/0xc Cleanup the rep and netdev reference when unloading IB rep. Fixes: 26628e2d58c9 ("RDMA/mlx5: Move to single device multiport ports in switchdev mode") Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-02{IB, net}/mlx5: E-Switch, Use index of rep for vport to IB port mappingBodong Wang2-3/+2
In the single IB device mode, the mapping between vport number and rep relies on a counter. However for dynamic vport allocation, it is desired to keep consistent map of eswitch vport and IB port. Hence, simplify code to remove the free running counter and instead use the available vport index during load/unload sequence from the eswitch. Signed-off-by: Bodong Wang <bodong@mellanox.com> Suggested-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-29Merge tag 'v5.2-rc6' into rdma.git for-nextJason Gunthorpe6-15/+23
For dependencies in next patches. Resolve conflicts: - Use uverbs_get_cleared_udata() with new cq allocation flow - Continue to delete nes despite SPDX conflict - Resolve list appends in mlx5_command_str() - Use u16 for vport_rule stuff - Resolve list appends in struct ib_client Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-26RDMA/mlx5: Add vport metadata matching for IB representorsJianbo Liu1-9/+36
If vport metadata matching is enabled in eswitch, the rule created must be changed to match on the metadata, instead of source port. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-26net/mlx5: Add flow context for flow tagJianbo Liu3-16/+28
Refactor the flow data structures, add new flow_context and move flow_tag into it, as flow_tag doesn't belong to the rule action. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-25net/mlx5: Convert mkey_table to XArrayMatthew Wilcox4-28/+18
The lock protecting the data structure does not need to be an rwlock. The only read access to the lock is in an error path, and if that's limiting your scalability, you have bigger performance problems. Eliminate mlx5_mkey_table in favour of using the xarray directly. reg_mr_callback must use GFP_ATOMIC for allocating XArray nodes as it may be called in interrupt context. This also fixes a minor bug where SRCU locking was being used on the radix tree read side, when RCU was needed too. Signed-off-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-24RDMA/mlx5: Refactor MR descriptors allocationMax Gurtovoy1-133/+157
Improve code readability using static helpers for each memory region type. Re-use the common logic to get smaller functions that are easy to maintain and reduce code duplication. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24RDMA/mlx5: Use PA mapping for PI handoverMax Gurtovoy3-30/+114
If possibe, avoid doing a UMR operation to register data and protection buffers (via MTT/KLM mkeys). Instead, use the local DMA key and map the SG lists using PA access. This is safe, since the internal key for data and protection never exposed to the remote server (only signature key might be exposed). If PA mappings are not possible, perform mapping using MTT/KLM descriptors. The setup of the tested benchmark (using iSER ULP): - 2 servers with 24 cores (1 initiator and 1 target) - ConnectX-4/ConnectX-5 adapters - 24 target sessions with 1 LUN each - ramdisk backstore - PI active Performance results running fio (24 jobs, 128 iodepth) using write_generate=1 and read_verify=1 (w/w.o patch): bs IOPS(read) IOPS(write) ---- ---------- ---------- 512 1266.4K/1262.4K 1720.1K/1732.1K 4k 793139/570902 1129.6K/773982 32k 72660/72086 97229/96164 Using write_generate=0 and read_verify=0 (w/w.o patch): bs IOPS(read) IOPS(write) ---- ---------- ---------- 512 1590.2K/1600.1K 1828.2K/1830.3K 4k 1078.1K/937272 1142.1K/815304 32k 77012/77369 98125/97435 Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Israel Rukshin <israelr@mellanox.com> Suggested-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24RDMA/mlx5: Improve PI handover performanceIsrael Rukshin3-24/+164
In some loads, there is performance degradation when using KLM mkey instead of MTT mkey. This is because KLM descriptor access is via indirection that might require more HW resources and cycles. Using KLM descriptor is not necessary when there are no gaps at the data/metadata sg lists. As an optimization, use MTT mkey whenever it is possible. For that matter, allocate internal MTT mkey and choose the effective pi_mr for in transaction according to the required mapping scheme. The setup of the tested benchmark (using iSER ULP): - 2 servers with 24 cores (1 initiator and 1 target) - ConnectX-4/ConnectX-5 adapters - 24 target sessions with 1 LUN each - ramdisk backstore - PI active Performance results running fio (24 jobs, 128 iodepth) using write_generate=1 and read_verify=1 (w/w.o/baseline): bs IOPS(read) IOPS(write) ---- ---------- ---------- 512 1262.4K/1243.3K/1147.1K 1732.1K/1725.1K/1423.8K 4k 570902/571233/457874 773982/743293/642080 32k 72086/72388/71933 96164/71789/93249 Using write_generate=0 and read_verify=0 (w/w.o patch): bs IOPS(read) IOPS(write) ---- ---------- ---------- 512 1600.1K/1572.1K/1393.3K 1830.3K/1823.5K/1557.2K 4k 937272/921992/762934 815304/753772/646071 32k 77369/75052/72058 97435/73180/94612 Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Suggested-by: Max Gurtovoy <maxg@mellanox.com> Suggested-by: Idan Burstein <idanb@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24RDMA/mlx5: Remove unused IB_WR_REG_SIG_MR codeIsrael Rukshin2-153/+16
IB_WR_REG_SIG_MR is not needed after IB_WR_REG_MR_INTEGRITY was used. Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24RDMA/core: Validate integrity handover device capMax Gurtovoy2-7/+2
Protect the case that a ULP tries to allocate a QP with signature enabled flag while the LLD doesn't support this feature. While we're here, also move integrity_en attribute from mlx5_qp to ib_qp as a preparation for adding new integrity API to the rw-API (that is part of ib_core module). Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24RDMA/core: Rename signature qp create flag and signature device capabilityIsrael Rukshin3-10/+9
Rename IB_QP_CREATE_SIGNATURE_EN to IB_QP_CREATE_INTEGRITY_EN and IB_DEVICE_SIGNATURE_HANDOVER to IB_DEVICE_INTEGRITY_HANDOVER. Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24RDMA/mlx5: Introduce and implement new IB_WR_REG_MR_INTEGRITY work requestMax Gurtovoy1-20/+198
This new WR will be used to perform PI (protection information) handover using the new API. Using the new API, the user will post a single WR that will internally perform all the needed actions to complete PI operation. This new WR will use a memory region that was allocated as IB_MR_TYPE_INTEGRITY and was mapped using ib_map_mr_sg_pi to perform the registration. In the old API, in order to perform a signature handover operation, each ULP should perform the following: 1. Map and register the data buffers. 2. Map and register the protection buffers. 3. Post a special reg WR to configure the signature handover operation layout. 4. Invalidate the signature memory key. 5. Invalidate protection buffers memory key. 6. Invalidate data buffers memory key. In the new API, the mapping of both data and protection buffers is performed using a single call to ib_map_mr_sg_pi function. Also the registration of the buffers and the configuration of the signature operation layout is done by a single new work request called IB_WR_REG_MR_INTEGRITY. This patch implements this operation for mlx5 devices that are capable to offload data integrity generation/validation while performing the actual buffer transfer. This patch will not remove the old signature API that is used by the iSER initiator and target drivers. This will be done in the future. In the internal implementation, for each IB_WR_REG_MR_INTEGRITY work request, we are using a single UMR operation to register both data and protection buffers using KLM's. Afterwards, another UMR operation will describe the strided block format. These will be followed by 2 SET_PSV operations to set the memory/wire domains initial signature parameters passed by the user. In the end of the whole transaction, only the signature memory key (the one that exposed for the RDMA operation) will be invalidated. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24RDMA/mlx5: Update set_sig_data_segment attribute for new signature APIMax Gurtovoy1-6/+5
Explicitly pass the sig_mr and the access flags for the mkey segment configuration. This function will be used also in the new signature API, so modify it in order to use it in both APIs. This is a preparation commit before adding new signature API. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24RDMA/mlx5: Pass UMR segment flags instead of booleanMax Gurtovoy1-7/+12
UMR ctrl segment flags can vary between UMR operations. for example, using inline UMR or adding free/not-free checks for a memory key. This is a preparation commit before adding new signature API that will not need not-free checks for the internal memory key during the UMR operation. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24RDMA/mlx5: Add attr for max number page list length for PI operationMax Gurtovoy1-0/+2
PI offload (protection information) is a feature that each RDMA provider can implement differently. Thus, introduce new device attribute to define the maximal length of the page list for PI fast registration operation. For example, mlx5 driver uses a single internal MR to map both data and protection SGL's, so it's equal to max_fast_reg_page_list_len / 2. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24RDMA/mlx5: Implement mlx5_ib_map_mr_sg_pi and mlx5_ib_alloc_mr_integrityMax Gurtovoy3-11/+189
mlx5_ib_map_mr_sg_pi() will map the PI and data dma mapped SG lists to the mlx5 memory region prior to the registration operation. In the new API, the mlx5 driver will allocate an internal memory region for the UMR operation to register both PI and data SG lists. The internal MR will use KLM mode in order to map 2 (possibly non-contiguous/non-align) SG lists using 1 memory key. In the new API, each ULP will use 1 memory region for the signature operation (instead of 3 in the old API). This memory region will have a key that will be exposed to remote server to perform RDMA operation. The internal memory key that will map the SG lists will stay private. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-20RDMA: Check umem pointer validity prior to releaseLeon Romanovsky3-28/+14
Update ib_umem_release() to behave similarly to kfree() and allow submitting NULL pointer as safe input to this function. Fixes: a52c8e2469c3 ("RDMA: Clean destroy CQ in drivers do not return errors") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-20RDMA: Convert destroy_wq to be voidLeon Romanovsky2-4/+2
All callers of destroy WQ are always success and there is no need to check their return value, so convert destroy_wq to be void. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-19RDMA/mlx5: Enable decap and packet reformat on FDBMaor Gottlieb1-0/+5
If FDB flow tables support decap operation, enable it on creation, This allows to perform decapsulation of tunnelled packets by steering rules. If FDB flow tables support reformat operation, enable it on creation as well. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Petr Vorel <pvorel@suse.cz> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-19RDMA/mlx5: Consider eswitch encap modeMaor Gottlieb1-6/+14
When flow steering is created, then the encap support should consider the eswitch encap mode. If the eswitch flow table (FDB) supports encap then it shouldn't be supported on NIC RX flow tables. Fixes: 4adda1122c490 ('RDMA/mlx5: Enable decap and packet reformat on flow tables') Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Petr Vorel <pvorel@suse.cz> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-19Merge remote-tracking branch 'mlx5-next/mlx5-next' into HEADDoug Ledford4-16/+27
Take mlx5-next so we can take a dependent two patch series next. Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-13net/mlx5: Add EQ enable/disable APIYuval Avnery1-1/+8
Previously, EQ joined the chain notifier on creation. This forced the caller to be ready to handle events before creating the EQ through eq_create_generic interface. To help the caller control when the created EQ will be attached to the IRQ, add enable/disable API. Signed-off-by: Yuval Avnery <yuvalav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-13net/mlx5: Use a single IRQ for all async EQsAriel Levkovich1-1/+1
The patch modifies the IRQ allocation so that all async EQs are assigned to the same IRQ resulting in more available IRQs for completion EQs. The changes are using the support for IRQ sharing and EQ polling budget that was introduced in previous patches so when the shared interrupt is triggered, the kernel will serially call the handler of each of the sharing EQs with a certain budget of EQEs to poll in order to prevent starvation. Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-13net/mlx5: Separate IRQ request/free from EQ life cycleYuval Avnery1-1/+1
Instead of requesting IRQ with eq creation, IRQs will be requested before EQ table creation. Instead of freeing the IRQs after EQ destroy, free IRQs after eq table destroy. Signed-off-by: Yuval Avnery <yuvalav@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-13net/mlx5: Change interrupt handler to call chain notifierYuval Avnery2-4/+7
Multiple EQs may share the same IRQ in subsequent patches. Instead of calling the IRQ handler directly, the EQ will register to an atomic chain notfier. The Linux built-in shared IRQ is not used because it forces the caller to disable the IRQ and clear affinity before free_irq() can be called. This patch is the first step in the separation of IRQ and EQ logic. Signed-off-by: Yuval Avnery <yuvalav@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-11RDMA: Convert CQ allocations to be under core responsibilityLeon Romanovsky3-32/+26
Ensure that CQ is allocated and freed by IB/core and not by drivers. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Gal Pressman <galpress@amazon.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Tested-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-11RDMA: Clean destroy CQ in drivers do not return errorsLeon Romanovsky2-4/+2
Like all other destroy commands, .destroy_cq() call is not supposed to fail. In all flows, the attempt to return earlier caused to memory leaks. This patch converts .destroy_cq() to do not return any errors. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Gal Pressman <galpress@amazon.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-10RDMA: Move owner into struct ib_device_opsJason Gunthorpe1-1/+1
This more closely follows how other subsytems work, with owner being a member of the structure containing the function pointers. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-10RDMA: Move uverbs_abi_ver into struct ib_device_opsJason Gunthorpe1-1/+1
No reason for every driver to emit code to set this, just make it part of the driver's existing static const ops structure. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-10RDMA: Move driver_id into struct ib_device_opsJason Gunthorpe1-1/+2
No reason for every driver to emit code to set this, just make it part of the driver's existing static const ops structure. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-07Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds2-3/+8
Pull rdma fixes from Jason Gunthorpe: "Things are looking pretty quiet here in RDMA, not too many bug fixes rolling in right now. The usual driver bug fixes and fixes for a couple of regressions introduced in 5.2: - Fix a race on bootup with RDMA device renaming and srp. SRP also needs to rename its internal sys files - Fix a memory leak in hns - Don't leak resources in efa on certain error unwinds - Don't panic in certain error unwinds in ib_register_device - Various small user visible bug fix patches for the hfi and efa drivers - Fix the 32 bit compilation break" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/efa: Remove MAYEXEC flag check from mmap flow mlx5: avoid 64-bit division IB/hfi1: Validate page aligned for a given virtual address IB/{qib, hfi1, rdmavt}: Correct ibv_devinfo max_mr value IB/hfi1: Insure freeze_work work_struct is canceled on shutdown IB/rdmavt: Fix alloc_qpn() WARN_ON() RDMA/core: Fix panic when port_data isn't initialized RDMA/uverbs: Pass udata on uverbs error unwind RDMA/core: Clear out the udata before error unwind RDMA/hns: Fix PD memory leak for internal allocation RDMA/srp: Rename SRP sysfs name after IB device rename trigger
2019-05-31{IB,net}/mlx5: Constify rep ops functions pointersParav Pandit2-10/+11
Currently for every representor type and for every single vport, representer function pointers copy is stored even though they don't change from one to other vport. Additionally priv data entry for the rep is not passed during registration, but its copied. It is used (set and cleared) by the user of the reps. As we want to scale vports, to simplify and also to split constants from data, 1. Rename mlx5_eswitch_rep_if to mlx5_eswitch_rep_ops as to match _ops prefix with other standard netdev, ibdev ops. 2. Constify the IB and Ethernet rep ops structure. 3. Instead of storing copy of all rep function pointers, store copy per eswitch rep type. 4. Split data and function pointers to mlx5_eswitch_rep_ops and mlx5_eswitch_rep_data. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-05-31{IB, net}/mlx5: No need to typecast from void* to mlx5_ib_dev*Parav Pandit1-1/+1
Avoid typecasting from void* to mlx5_ib_dev* or mlx5e_rep_priv* as it is not needed. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-05-29mlx5: avoid 64-bit divisionMichal Kubecek2-3/+8
Commit 25c13324d03d ("IB/mlx5: Add steering SW ICM device memory type") breaks i386 build by introducing three 64-bit divisions. As the divisor is MLX5_SW_ICM_BLOCK_SIZE() which is always a power of 2, we can replace the division with bit operations. Fixes: 25c13324d03d ("IB/mlx5: Add steering SW ICM device memory type") Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-21Merge tag 'spdx-5.2-rc2' of ↵Linus Torvalds2-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull SPDX update from Greg KH: "Here is a series of patches that add SPDX tags to different kernel files, based on two different things: - SPDX entries are added to a bunch of files that we missed a year ago that do not have any license information at all. These were either missed because the tool saw the MODULE_LICENSE() tag, or some EXPORT_SYMBOL tags, and got confused and thought the file had a real license, or the files have been added since the last big sweep, or they were Makefile/Kconfig files, which we didn't touch last time. - Add GPL-2.0-only or GPL-2.0-or-later tags to files where our scan tools can determine the license text in the file itself. Where this happens, the license text is removed, in order to cut down on the 700+ different ways we have in the kernel today, in a quest to get rid of all of these. These patches have been out for review on the linux-spdx@vger mailing list, and while they were created by automatic tools, they were hand-verified by a bunch of different people, all whom names are on the patches are reviewers. The reason for these "large" patches is if we were to continue to progress at the current rate of change in the kernel, adding license tags to individual files in different subsystems, we would be finished in about 10 years at the earliest. There will be more series of these types of patches coming over the next few weeks as the tools and reviewers crunch through the more "odd" variants of how to say "GPLv2" that developers have come up with over the years, combined with other fun oddities (GPL + a BSD disclaimer?) that are being unearthed, with the goal for the whole kernel to be cleaned up. These diffstats are not small, 3840 files are touched, over 10k lines removed in just 24 patches" * tag 'spdx-5.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (24 commits) treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 25 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 24 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 23 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 22 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 21 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 20 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 19 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 18 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 17 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 15 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 14 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 13 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 12 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 11 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 10 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 9 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 7 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 5 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 4 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 3 ...
2019-05-21RDMA/umem: Move page_shift from ib_umem to ib_odp_umemJason Gunthorpe3-25/+23
This value has always been set to PAGE_SHIFT in the core code, the only thing that does differently was the ODP path. Move the value into the ODP struct and still use it for ODP, but change all the non-ODP things to just use PAGE_SHIFT/PAGE_SIZE/PAGE_MASK directly. Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-05-21treewide: Add SPDX license identifier - Makefile/KconfigThomas Gleixner2-0/+2
Add SPDX license identifiers to all Make/Kconfig files which: - Have no license information of any form These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2-12/+13
Pull networking fixes from David Miller:1) Use after free in __dev_map_entry_free(), from Eric Dumazet. 1) Use after free in __dev_map_entry_free(), from Eric Dumazet. 2) Fix TCP retransmission timestamps on passive Fast Open, from Yuchung Cheng. 3) Orphan NFC, we'll take the patches directly into my tree. From Johannes Berg. 4) We can't recycle cloned TCP skbs, from Eric Dumazet. 5) Some flow dissector bpf test fixes, from Stanislav Fomichev. 6) Fix RCU marking and warnings in rhashtable, from Herbert Xu. 7) Fix some potential fib6 leaks, from Eric Dumazet. 8) Fix a _decode_session4 uninitialized memory read bug fix that got lost in a merge. From Florian Westphal. 9) Fix ipv6 source address routing wrt. exception route entries, from Wei Wang. 10) The netdev_xmit_more() conversion was not done %100 properly in mlx5 driver, fix from Tariq Toukan. 11) Clean up botched merge on netfilter kselftest, from Florian Westphal. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (74 commits) of_net: fix of_get_mac_address retval if compiled without CONFIG_OF net: fix kernel-doc warnings for socket.c net: Treat sock->sk_drops as an unsigned int when printing kselftests: netfilter: fix leftover net/net-next merge conflict mlxsw: core: Prevent reading unsupported slave address from SFP EEPROM mlxsw: core: Prevent QSFP module initialization for old hardware vsock/virtio: Initialize core virtio vsock before registering the driver net/mlx5e: Fix possible modify header actions memory leak net/mlx5e: Fix no rewrite fields with the same match net/mlx5e: Additional check for flow destination comparison net/mlx5e: Add missing ethtool driver info for representors net/mlx5e: Fix number of vports for ingress ACL configuration net/mlx5e: Fix ethtool rxfh commands when CONFIG_MLX5_EN_RXNFC is disabled net/mlx5e: Fix wrong xmit_more application net/mlx5: Fix peer pf disable hca command net/mlx5: E-Switch, Correct type to u16 for vport_num and int for vport_index net/mlx5: Add meaningful return codes to status_to_err function net/mlx5: Imply MLXFW in mlx5_core Revert "tipc: fix modprobe tipc failed after switch order of device registration" vsock/virtio: free packets during the socket release ...
2019-05-17net/mlx5: E-Switch, Correct type to u16 for vport_num and int for vport_indexParav Pandit2-12/+13
To avoid any ambiguity between vport index and vport number, rename functions that had vport, to vport_num or vport_index appropriately. vport_num is u16 hence change mlx5_eswitch_index_to_vport_num() return type to u16. vport_index is an int in vport array. Hence change input type of vport index in mlx5_eswitch_index_to_vport_num() to int. Correct multiple eswitch representor interfaces use type u16 of rep->vport as type int vport_index. Send vport FW commands with correct eswitch u16 vport_num instead host int vport_index. Fixes: 5ae5162066d8 ("net/mlx5: E-Switch, Assign a different position for uplink rep and vport") Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Vu Pham <vuhuong@mellanox.com> Reviewed-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-05-15Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds1-3/+10
Pull more rdma updates from Jason Gunthorpe: "This is being sent to get a fix for the gcc 9.1 build warnings, and I've also pulled in some bug fix patches that were posted in the last two weeks. - Avoid the gcc 9.1 warning about overflowing a union member - Fix the wrong callback type for a single response netlink to doit - Bug fixes from more usage of the mlx5 devx interface" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: net/mlx5: Set completion EQs as shared resources IB/mlx5: Verify DEVX general object type correctly RDMA/core: Change system parameters callback from dumpit to doit RDMA: Directly cast the sockaddr union to sockaddr