summaryrefslogtreecommitdiff
path: root/drivers/infiniband/hw
AgeCommit message (Collapse)AuthorFilesLines
2024-11-14RDMA/bnxt_re: Refactor NQ allocationKalesh AP3-33/+60
Move NQ related data structures from rdev to a new structure named "struct bnxt_re_nq_record" by keeping a pointer to in the rdev structure. Allocate the memory for it dynamically. This change is needed for subsequent patches in the series. Also, removed the nq_task variable from rdev structure as it is redundant and no longer used. This change would help to reduce the size of the driver private structure as well. Reviewed-by: Chandramohan Akula <chandramohan.akula@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1731577748-1804-3-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-14RDMA/bnxt_re: Fail probe early when not enough MSI-x vectors are reservedKalesh AP2-10/+14
L2 driver allocates and populates the MSI-x vector details for RoCE in the en_dev structure. RoCE driver requires minimum 2 MSIx vectors. Hence during probe, driver has to check and bail out if there are not enough MSI-x vectors reserved for it before proceeding further initialization. Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com> Reviewed-by: Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com> Reviewed-by: Kashyap Desai <kashyap.desai@broadcom.com> Reviewed-by: Chandramohan Akula <chandramohan.akula@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1731577748-1804-2-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-14RDMA/hns: Fix different dgids mapping to the same dip_idxFeng Fang5-44/+75
DIP algorithm requires a one-to-one mapping between dgid and dip_idx. Currently a queue 'spare_idx' is used to store QPN of QPs that use DIP algorithm. For a new dgid, use a QPN from spare_idx as dip_idx. This method lacks a mechanism for deduplicating QPN, which may result in different dgids sharing the same dip_idx and break the one-to-one mapping requirement. This patch replaces spare_idx with xarray and introduces a refcnt of a dip_idx to indicate the number of QPs that using this dip_idx. The state machine for dip_idx management is implemented as: * The entry at an index in xarray is empty -- This indicates that the corresponding dip_idx hasn't been created. * The entry at an index in xarray is not empty but with 0 refcnt -- This indicates that the corresponding dip_idx has been created but not used as dip_idx yet. * The entry at an index in xarray is not empty and with non-0 refcnt -- This indicates that the corresponding dip_idx is being used by refcnt number of DIP QPs. Fixes: eb653eda1e91 ("RDMA/hns: Bugfix for incorrect association between dip_idx and dgid") Fixes: f91696f2f053 ("RDMA/hns: Support congestion control type selection according to the FW") Signed-off-by: Feng Fang <fangfeng4@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20241112055553.3681129-1-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-12RDMA/bnxt_re: Add set_func_resources support for P5/P7 adaptersKalesh AP2-15/+7
Enable set_func_resources for P5 and P7 adapters to handle VF resource distribution. Remove setting max resources per VF during PF initialization. This change is required for firmwares which does not support RoCE VF resource management by NIC driver. The code is same for all adapters now. Reviewed-by: Stephen Shi <stephen.shi@broadcom.com> Reviewed-by: Rukhsana Ansari <rukhsana.ansari@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1730882676-24434-4-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-12RDMA/bnxt_re: Enhance RoCE SRIOV resource configuration designBhargava Chenna Marreddy4-5/+14
Refine RoCE SRIOV resource configuration design, using the INITIALIZE_FW's flag as an indication for the new design to the firmware. RoCE driver does not have to provision resources to VF when firmware advertises support for RoCE resource management by NIC driver. Signed-off-by: Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com> Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com> CC: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1730882676-24434-3-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-10RDMA/hns: Fix NULL pointer derefernce in hns_roce_map_mr_sg()Junxian Huang1-3/+4
ib_map_mr_sg() allows ULPs to specify NULL as the sg_offset argument. The driver needs to check whether it is a NULL pointer before dereferencing it. Fixes: d387d4b54eb8 ("RDMA/hns: Fix missing pagesize and alignment check in FRMR") Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20241108075743.2652258-3-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-10RDMA/hns: Fix out-of-order issue of requester when setting FENCEJunxian Huang2-1/+2
The FENCE indicator in hns WQE doesn't ensure that response data from a previous Read/Atomic operation has been written to the requester's memory before the subsequent Send/Write operation is processed. This may result in the subsequent Send/Write operation accessing the original data in memory instead of the expected response data. Unlike FENCE, the SO (Strong Order) indicator blocks the subsequent operation until the previous response data is written to memory and a bresp is returned. Set the SO indicator instead of FENCE to maintain strict order. Fixes: 9a4435375cd1 ("IB/hns: Add driver files for hns RoCE driver") Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20241108075743.2652258-2-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-05sysfs: treewide: constify attribute callback of bin_is_visible()Thomas Weißschuh1-1/+1
The is_bin_visible() callbacks should not modify the struct bin_attribute passed as argument. Enforce this by marking the argument as const. As there are not many callback implementers perform this change throughout the tree at once. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Ira Weiny <ira.weiny@intel.com> Acked-by: Krzysztof Wilczyński <kw@linux.com> Link: https://lore.kernel.org/r/20241103-sysfs-const-bin_attr-v2-5-71110628844c@weissschuh.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-11-04RDMA/mlx5: Add implementation for ufile_hw_cleanup device operationPatrisious Haddad3-1/+97
Implement the device API for ufile_hw_cleanup operation, which iterates over the ufile uobjects lists, and attempts to destroy DevX QPs, by issuing up to 8 commands in parallel. This function is responsible only for cleaning the FW resources of the QP, and doesn't necessarily cleanup all of its resources. Hence the normal serialized cleanup flow is still executed after it in __uverbs_cleanup_ufile() to cleanup the remaining resources and handle the cleanup of SW objects. In order to avoid double cleanup for the FW resources, new DevX flag was added DEVX_OBJ_FLAGS_HW_FREED, which marks the object's FW resources as already freed. Since QP destruction is the most time-consuming operation in FW, parallelizing it reduces the cleanup time of applications that use DevX QPs. Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Link: https://patch.msgid.link/2f82675d0412542cba1c47a6b86f589521ae41e1.1730373303.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-04RDMA/mlx5: Ensure active slave attachment to the bond IB deviceChiara Meiohas1-10/+18
Fix a race condition when creating a lag bond in active backup mode where after the bond creation the backup slave was attached to the IB device, instead of the active slave. This caused stale entries in the GID table, as the gid updating mechanism relies on ib_device_get_netdev(), which would return the backup slave. Send an MLX5_DRIVER_EVENT_ACTIVE_BACKUP_LAG_CHANGE_LOWERSTATE event when activating the lag, additionally to when modifying the lag. This ensures that eventually the active netdevice is stored in the bond IB device. When handling this event remove the GIDs of the previously attached netdevice in this port and rescan the GIDs of the newly attached netdevice. This ensures that eventually the active slave netdevice is correctly stored in the IB device port. While there might be a brief moment where the backup slave GIDs appear in the GID table, it will eventually stabilize with the correct GIDs (of the bond and the active slave). Fixes: 8d159eb2117b ("RDMA/mlx5: Use IB set_netdev and get_netdev functions") Signed-off-by: Chiara Meiohas <cmeiohas@nvidia.com> Link: https://patch.msgid.link/91fc2cb24f63add266a528c1c702668a80416d9f.1730381292.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-04RDMA/mlx5: Call dev_put() after the blocking notifierChiara Meiohas1-1/+0
Move dev_put() call to occur directly after the blocking notifier, instead of within the event handler. Fixes: 8d159eb2117b ("RDMA/mlx5: Use IB set_netdev and get_netdev functions") Signed-off-by: Chiara Meiohas <cmeiohas@nvidia.com> Link: https://patch.msgid.link/342ff94b3dcbb07da1c7dab862a73933d604b717.1730381292.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-04RDMA/mlx5: Support querying per-plane IB PortCountersMark Zhang1-1/+7
On a SMI device, set requested plane_num when querying PPCNT register with the PortCounters Attribute group. Signed-off-by: Mark Zhang <markzhang@nvidia.com> Reviewed-by: Maher Sanalla <msanalla@nvidia.com> Link: https://patch.msgid.link/828d57444a0a41042556bb0a4394ecf2fcaed639.1730368052.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-04RDMA/mlx5: Support OOO RX WQE consumptionEdward Srouji3-5/+55
Support QP with out-of-order (OOO) capabilities enabled. This allows WRs on the receiver side of the QP to be consumed OOO, permitting the sender side to transmit messages without guaranteeing arrival order on the receiver side. When enabled, the completion ordering of WRs remains in-order, regardless of the Receive WRs consumption order. RDMA Read and RDMA Atomic operations on the responder side continue to be executed in-order, while the ordering of data placement for RDMA Write and Send operations is not guaranteed. Atomic operations larger than 8 bytes are currently not supported. Therefore, when this feature is enabled, the created QP restricts its atomic support to 8 bytes at most. In addition, when querying the device, a new flag is returned in response to indicate that the Kernel supports OOO QP. Signed-off-by: Edward Srouji <edwards@nvidia.com> Reviewed-by: Yishai Hadas <yishaih@nvidia.com> Link: https://patch.msgid.link/06ac609a5f358c8fb0a090d22c61a2f9329d82e6.1725362773.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-04RDMA/bnxt_re: Add debugfs hook in the driverKalesh AP7-2/+180
Adding support for a per device debugfs folder for exporting some of the device specific debug information. Added support to get QP info for now. The same folder can be used to export other debug features in future. Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1730428483-17841-5-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-04RDMA/bnxt_re: Support raw data query for each resourcesKashyap Desai1-0/+118
Support interfaces to get the raw data for each of the resources. Use this interface to get some of the HW structures from active resources. Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1730428483-17841-4-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-04RDMA/bnxt_re: Add support for querying HW contextsKashyap Desai5-0/+98
Implements support for querying the hardware resource contexts. This raw data can be used for the debugging of the field issues. Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1730428483-17841-3-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-04RDMA/bnxt_re: Support driver specific data collection using rdma toolKashyap Desai1-0/+141
Allow users to dump driver specific resource details when queried through rdma tool. This supports the driver data for QP, CQ, MR and SRQ. Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1730428483-17841-2-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-03RDMA/bnxt_re: Remove some dead codeChristophe JAILLET1-19/+0
If the probe succeeds, then auxiliary_get_drvdata() can't return a NULL pointer. So several NULL checks can be removed to simplify code. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://patch.msgid.link/f02eb630734ee530315dce9f60b078f631ae93d0.1730477345.git.christophe.jaillet@wanadoo.fr Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-03RDMA/bnxt_re: Fix some error handling paths in bnxt_re_probe()Christophe JAILLET1-0/+8
If bnxt_re_add_device() fails, 'en_info' still needs to be freed, as already done in the .remove() function. The commit in Fixes incorrectly removed this call, certainly because it was expecting the .remove() function was called anyway. But if the probe fails, the remove function is not called. There is no need to call bnxt_re_remove() as it was done before, kfree() is enough. Fixes: a5e099e0c464 ("RDMA/bnxt_re: Fix an error path in bnxt_re_add_device") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://patch.msgid.link/9e48ff955ae55fc39a9eb1eb590d374539eab5ba.1730477345.git.christophe.jaillet@wanadoo.fr Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-10-30RDMA/efa: Report link speed according to device attributesMichael Margolin4-2/+54
Set port link speed and width based on max bandwidth acquired from the device instead of using constant 100 Gbps. Use a default value in case the device didn't set the field. Reviewed-by: Daniel Kranzdorf <dkkranzd@amazon.com> Reviewed-by: Firas Jahjah <firasj@amazon.com> Signed-off-by: Michael Margolin <mrgolin@amazon.com> Link: https://patch.msgid.link/20241030093006.21352-1-mrgolin@amazon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-10-30RDMA/bnxt_re: Check cqe flags to know imm_data vs inv_irkeyKashyap Desai2-3/+6
Invalidate rkey is cpu endian and immediate data is in big endian format. Both immediate data and invalidate the remote key returned by HW is in little endian format. While handling the commit in fixes tag, the difference between immediate data and invalidate rkey endianness was not considered. Without changes of this patch, Kernel ULP was failing while processing inv_rkey. dmesg log snippet - nvme nvme0: Bogus remote invalidation for rkey 0x2000019Fix in this patch Do endianness conversion based on completion queue entry flag. Also, the HW completions are already converted to host endianness in bnxt_qplib_cq_process_res_rc and bnxt_qplib_cq_process_res_ud and there is no need to convert it again in bnxt_re_poll_cq. Modified the union to hold the correct data type. Fixes: 95b087f87b78 ("bnxt_re: Fix imm_data endianness") Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1730110014-20755-1-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-10-30RDMA/hns: Fix cpu stuck caused by printings during resetwenglianfa5-48/+41
During reset, cmd to destroy resources such as qp, cq, and mr may fail, and error logs will be printed. When a large number of resources are destroyed, there will be lots of printings, and it may lead to a cpu stuck. Delete some unnecessary printings and replace other printing functions in these paths with the ratelimited version. Fixes: 9a4435375cd1 ("IB/hns: Add driver files for hns RoCE driver") Fixes: c7bcb13442e1 ("RDMA/hns: Add SRQ support for hip08 kernel mode") Fixes: 70f92521584f ("RDMA/hns: Use the reserved loopback QPs to free MR before destroying MPT") Fixes: 926a01dc000d ("RDMA/hns: Add QP operations support for hip08 SoC") Signed-off-by: wenglianfa <wenglianfa@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20241024124000.2931869-6-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-10-30RDMA/hns: Use dev_* printings in hem code instead of ibdev_*Junxian Huang1-22/+22
The hem code is executed before ib_dev is registered, so use dev_* printing instead of ibdev_* to avoid log like this: (null): set HEM address to HW failed! Fixes: 2f49de21f3e9 ("RDMA/hns: Optimize mhop get flow for multi-hop addressing") Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20241024124000.2931869-5-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-10-30RDMA/hns: Modify debugfs nameYuyu Li1-1/+2
The sub-directory of hns_roce debugfs is named after the device's kernel name currently, but it will be inconvenient to use when the device is renamed. Modify the name to pci name as users can always easily find the correspondence between an RDMA device and its pci name. Fixes: eb7854d63db5 ("RDMA/hns: Support SW stats with debugfs") Signed-off-by: Yuyu Li <liyuyu6@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20241024124000.2931869-4-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-10-30RDMA/hns: Fix flush cqe error when racing with destroy qpwenglianfa3-2/+22
QP needs to be modified to IB_QPS_ERROR to trigger HW flush cqe. But when this process races with destroy qp, the destroy-qp process may modify the QP to IB_QPS_RESET first. In this case flush cqe will fail since it is invalid to modify qp from IB_QPS_RESET to IB_QPS_ERROR. Add lock and bit flag to make sure pending flush cqe work is completed first and no more new works will be added. Fixes: ffd541d45726 ("RDMA/hns: Add the workqueue framework for flush cqe handler") Signed-off-by: wenglianfa <wenglianfa@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20241024124000.2931869-3-huangjunxian6@hisilicon.com Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-10-30RDMA/hns: Fix an AEQE overflow error caused by untimely update of eq_db_ciwenglianfa4-44/+91
eq_db_ci is updated only after all AEQEs are processed in the AEQ interrupt handler, which is not timely enough and may result in AEQ overflow. Two optimization methods are proposed: 1. Set an upper limit for AEQE processing. 2. Move time-consuming operations such as printings to the bottom half of the interrupt. cmd events and flush_cqe events are still fully processed in the top half to ensure timely handling. Fixes: a5073d6054f7 ("RDMA/hns: Add eq support of hip08") Signed-off-by: wenglianfa <wenglianfa@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20241024124000.2931869-2-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-10-28RDMA/bnxt_re: Fix access flags for MR and QP modifyHongguang Gao1-9/+50
Access flag definition in MR and QP is different in FW. Currently both reg/bind MR and modify/query QP uses the same flags. Add a different function to map the QP access flags for newer adapters. Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Hongguang Gao <hongguang.gao@broadcom.com> Reviewed-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1729065346-1364-6-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-10-28RDMA/bnxt_re: Add support for modify_device hookKalesh AP3-0/+20
Adds support for modify_device in the driver for node desc changes. Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1729065346-1364-5-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-10-28RDMA/bnxt_re: Add support for CQ rx coalescingChandramohan Akula7-1/+76
RoCE message rate performance is heavily degraded without the use of cq coalescing. With proper coalescing, message rates get better. Furthermore, coalescing significantly reduces contention on the PCIe Root Complex/Memory subsystems. Add the changes to configure CQ rx colascing parameters based on adapter revision when CQ is created. Signed-off-by: Chandramohan Akula <chandramohan.akula@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1729065346-1364-4-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-10-28RDMA/bnxt_re: Add support for optimized modify QPKalesh AP4-1/+52
Modify QP improvements are for state transitions from INIT -> RTR and RTR -> RTS. In order to support the Modify QP Optimization feature, the driver is expected to check for the feature support in the CMDQ_QUERY_FUNC and register its support for this feature with the FW in CMDQ_INITIALIZE_FIRMWARE. Additionally, the driver is required to specify the new fields and attribute masks for the transitions as follows: 1. INIT -> RTR: - New fields: srq_used, type. - enable srq_used when RC QP is configured to use SRQ. - set the type based on the QP type. - Mandatory masks: - RC: CMDQ_MODIFY_QP_MODIFY_MASK_ACCESS, CMDQ_MODIFY_QP_MODIFY_MASK_PKEY - UD QP and QP1: CMDQ_MODIFY_QP_MODIFY_MASK_PKEY, CMDQ_MODIFY_QP_MODIFY_MASK_QKEY 2. RTR -> RTS: - New fields: type - set the type based on the QP type. - Mandatory masks: - RC: CMDQ_MODIFY_QP_MODIFY_MASK_ACCESS - UD QP and QP1: CMDQ_MODIFY_QP_MODIFY_MASK_QKEY Reviewed-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com> Reviewed-by: Tushar Rane <tushar.rane@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1729065346-1364-2-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-10-21RDMA/efa: Add option to set QP service level on createMichael Margolin3-1/+5
Using modify QP with AH attributes and IB_QP_AV flag set doesn't make much sense for connectionless QP types like SRD. Add SL parameter to EFA create QP user ABI and pass it to the device. Link: https://patch.msgid.link/r/20241015174242.3490-3-mrgolin@amazon.com Reviewed-by: Firas Jahjah <firasj@amazon.com> Reviewed-by: Yonatan Nachum <ynachum@amazon.com> Signed-off-by: Michael Margolin <mrgolin@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-10-21RDMA/efa: Update device interfaceMichael Margolin6-23/+149
Update device interface header files. Link: https://patch.msgid.link/r/20241015174242.3490-2-mrgolin@amazon.com Reviewed-by: Daniel Kranzdorf <dkkranzd@amazon.com> Reviewed-by: Yonatan Nachum <ynachum@amazon.com> Signed-off-by: Michael Margolin <mrgolin@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-10-21RDMA/bnxt_re: synchronize the qp-handle table arraySelvin Xavier3-4/+15
There is a race between the CREQ tasklet and destroy qp when accessing the qp-handle table. There is a chance of reading a valid qp-handle in the CREQ tasklet handler while the QP is already moving ahead with the destruction. Fixing this race by implementing a table-lock to synchronize the access. Fixes: f218d67ef004 ("RDMA/bnxt_re: Allow posting when QPs are in error") Fixes: 84cf229f4001 ("RDMA/bnxt_re: Fix the qp table indexing") Link: https://patch.msgid.link/r/1728912975-19346-3-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-10-21RDMA/bnxt_re: Fix the usage of control path spin locksSelvin Xavier1-15/+10
Control path completion processing always runs in tasklet context. To synchronize with the posting thread, there is no need to use the irq variant of spin lock. Use spin_lock_bh instead. Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver") Link: https://patch.msgid.link/r/1728912975-19346-2-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-10-21RDMA/mlx5: Round max_rd_atomic/max_dest_rd_atomic up instead of downPatrisious Haddad1-2/+2
After the cited commit below max_dest_rd_atomic and max_rd_atomic values are being rounded down to the next power of 2. As opposed to the old behavior and mlx4 driver where they used to be rounded up instead. In order to stay consistent with older code and other drivers, revert to using fls round function which rounds up to the next power of 2. Fixes: f18e26af6aba ("RDMA/mlx5: Convert modify QP to use MLX5_SET macros") Link: https://patch.msgid.link/r/d85515d6ef21a2fa8ef4c8293dce9b58df8a6297.1728550179.git.leon@kernel.org Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Reviewed-by: Maher Sanalla <msanalla@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-10-21RDMA/cxgb4: Dump vendor specific QP detailsLeon Romanovsky1-0/+1
Restore the missing functionality to dump vendor specific QP details, which was mistakenly removed in the commit mentioned in Fixes line. Fixes: 5cc34116ccec ("RDMA: Add dedicated QP resource tracker function") Link: https://patch.msgid.link/r/ed9844829135cfdcac7d64285688195a5cd43f82.1728323026.git.leonro@nvidia.com Reported-by: Dr. David Alan Gilbert <linux@treblig.org> Closes: https://lore.kernel.org/all/Zv_4qAxuC0dLmgXP@gallifrey Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-10-12RDMA/bnxt_re: Fix the GID table lengthKalesh AP1-1/+8
GID table length is reported by FW. The gid index which is passed to the driver during modify_qp/create_ah is restricted by the sgid_index field of struct ib_global_route. sgid_index is u8 and the max sgid possible is 256. Each GID entry in HW will have 2 GID entries in the kernel gid table. So we can support twice the gid table size reported by FW. Also, restrict the max GID to 256 also. Fixes: 847b97887ed4 ("RDMA/bnxt_re: Restrict the max_gids to 256") Link: https://patch.msgid.link/r/1728373302-19530-11-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-10-12RDMA/bnxt_re: Fix a bug while setting up Level-2 PBL pagesBhargava Chenna Marreddy1-16/+3
Avoid memory corruption while setting up Level-2 PBL pages for the non MR resources when num_pages > 256K. There will be a single PDE page address (contiguous pages in the case of > PAGE_SIZE), but, current logic assumes multiple pages, leading to invalid memory access after 256K PBL entries in the PDE. Fixes: 0c4dcd602817 ("RDMA/bnxt_re: Refactor hardware queue memory allocation") Link: https://patch.msgid.link/r/1728373302-19530-10-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-10-12RDMA/bnxt_re: Change the sequence of updating the CQ toggle valueChandramohan Akula2-7/+6
Currently the CQ toggle value in the shared page (read by the userlib) is updated as part of the cqn_handler. There is a potential race of application calling the CQ ARM doorbell immediately and using the old toggle value. Change the sequence of updating CQ toggle value to update in the bnxt_qplib_service_nq function immediately after reading the toggle value to be in sync with the HW updated value. Fixes: e275919d9669 ("RDMA/bnxt_re: Share a page to expose per CQ info with userspace") Link: https://patch.msgid.link/r/1728373302-19530-9-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Chandramohan Akula <chandramohan.akula@broadcom.com> Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-10-12RDMA/bnxt_re: Fix an error path in bnxt_re_add_deviceKalesh AP1-9/+3
In bnxt_re_add_device(), when register netdev notifier fails, driver is not unregistering the IB device in the error cleanup path. Also, removed the duplicate cleanup in error path of bnxt_re_probe. Fixes: 94a9dc6ac8f7 ("RDMA/bnxt_re: Group all operations under add_device and remove_device") Link: https://patch.msgid.link/r/1728373302-19530-8-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-10-12RDMA/bnxt_re: Avoid CPU lockups due fifo occupancy check loopSelvin Xavier1-0/+9
Driver waits indefinitely for the fifo occupancy to go below a threshold as soon as the pacing interrupt is received. This can cause soft lockup on one of the processors, if the rate of DB is very high. Add a loop count for FPGA and exit the __wait_for_fifo_occupancy_below_th if the loop is taking more time. Pacing will be continuing until the occupancy is below the threshold. This is ensured by the checks in bnxt_re_pacing_timer_exp and further scheduling the work for pacing based on the fifo occupancy. Fixes: 2ad4e6303a6d ("RDMA/bnxt_re: Implement doorbell pacing algorithm") Link: https://patch.msgid.link/r/1728373302-19530-7-git-send-email-selvin.xavier@broadcom.com Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Chandramohan Akula <chandramohan.akula@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-10-12RDMA/bnxt_re: Fix a possible NULL pointer dereferenceKalesh AP1-3/+3
There is a possibility of a NULL pointer dereference in the failure path of bnxt_re_add_device(). To address that, moved the update of "rdev->adev" to bnxt_re_dev_add(). Fixes: dee3da3422d5 ("RDMA/bnxt_re: Change aux driver data to en_info to hold more information") Link: https://patch.msgid.link/r/1728373302-19530-6-git-send-email-selvin.xavier@broadcom.com Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/linux-rdma/CAH-L+nMCwymKGqf5pd8-FZNhxEkDD=kb6AoCaE6fAVi7b3e5Qw@mail.gmail.com/T/#t Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-10-12RDMA/bnxt_re: Return more meaningful errorKalesh AP1-1/+1
When the HWRM command fails, driver currently returns -EFAULT(Bad address). This does not look correct. Modified to return -EIO(I/O error). Fixes: cc1ec769b87c ("RDMA/bnxt_re: Fixing the Control path command and response handling") Fixes: 65288a22ddd8 ("RDMA/bnxt_re: use shadow qd while posting non blocking rcfw command") Link: https://patch.msgid.link/r/1728373302-19530-5-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-10-12RDMA/bnxt_re: Fix incorrect dereference of srq in async eventKashyap Desai1-2/+5
Currently driver is not getting correct srq. Dereference only if qplib has a valid srq. Fixes: b02fd3f79ec3 ("RDMA/bnxt_re: Report async events and errors") Link: https://patch.msgid.link/r/1728373302-19530-4-git-send-email-selvin.xavier@broadcom.com Reviewed-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com> Reviewed-by: Chandramohan Akula <chandramohan.akula@broadcom.com> Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-10-12RDMA/bnxt_re: Fix out of bound checkKalesh AP1-1/+1
Driver exports pacing stats only on GenP5 and P7 adapters. But while parsing the pacing stats, driver has a check for "rdev->dbr_pacing". This caused a trace when KASAN is enabled. BUG: KASAN: slab-out-of-bounds in bnxt_re_get_hw_stats+0x2b6a/0x2e00 [bnxt_re] Write of size 8 at addr ffff8885942a6340 by task modprobe/4809 Fixes: 8b6573ff3420 ("bnxt_re: Update the debug counters for doorbell pacing") Link: https://patch.msgid.link/r/1728373302-19530-3-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-10-12RDMA/bnxt_re: Fix the max CQ WQEs for older adaptersAbhishek Mohapatra2-0/+3
Older adapters doesn't support the MAX CQ WQEs reported by older FW. So restrict the value reported to 1M always for older adapters. Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver") Link: https://patch.msgid.link/r/1728373302-19530-2-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Abhishek Mohapatra<abhishek.mohapatra@broadcom.com> Reviewed-by: Chandramohan Akula <chandramohan.akula@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-10-11RDMA/irdma: Fix misspelling of "accept*"Alexander Zubkov1-1/+1
There is "accept*" misspelled as "accpet*" in the comments. Fix the spelling. Fixes: 146b9756f14c ("RDMA/irdma: Add connection manager") Link: https://patch.msgid.link/r/20241008161913.19965-1-green@qrator.net Signed-off-by: Alexander Zubkov <green@qrator.net> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-10-11RDMA/cxgb4: Fix RDMA_CM_EVENT_UNREACHABLE error for iWARPAnumula Murali Mohan Reddy1-5/+4
ip_dev_find() always returns real net_device address, whether traffic is running on a vlan or real device, if traffic is over vlan, filling endpoint struture with real ndev and an attempt to send a connect request will results in RDMA_CM_EVENT_UNREACHABLE error. This patch fixes the issue by using vlan_dev_real_dev(). Fixes: 830662f6f032 ("RDMA/cxgb4: Add support for active and passive open connection with IPv6 address") Link: https://patch.msgid.link/r/20241007132311.70593-1-anumula@chelsio.com Signed-off-by: Anumula Murali Mohan Reddy <anumula@chelsio.com> Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-10-08IB/hfi1: make clear_all_interrupts staticDr. David Alan Gilbert2-2/+1
clear_all_interrupts() in hw/hfi1/chip.c is currently global but only used in the same file, so make it static. There are also 'clear_all_interrupts' functions in i2c-nomadik and emif.c but fortunately they're already static. (Build and boot tested only, I don't have this hardware) Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Link: https://patch.msgid.link/20241007235327.128613-1-linux@treblig.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-10-08RDMA/bnxt_re: Fix the max WQEs used in Static WQE modeSelvin Xavier1-1/+5
max_sw_wqe used for static wqe mode should be same as the max_wqe. Calculate the max_sw_wqe only for the variable WQE mode. Fixes: de1d364c3815 ("RDMA/bnxt_re: Add support for Variable WQE in Genp7 adapters") Link: https://patch.msgid.link/r/1726715161-18941-7-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>