kernel/linux.git/drivers/infiniband/hw, branch v6.6.141

RDMA/mana: Fix error unwind in mana_ib_create_qp_rss()

2026-05-23T11:03:35+00:00

[ Upstream commit 6aaa978c6b6218cfac15fe1dab17c76fe229ce3f ] Sashiko points out that mana_ib_cfg_vport_steering() is leaked, the normal destroy path cleans it up. Cc: stable@vger.kernel.org Fixes: 0266a177631d ("RDMA/mana_ib: Add a driver for Microsoft Azure Network Adapter") Link: https://sashiko.dev/#/patchset/0-v1-e911b76a94d1%2B65d95-rdma_udata_rep_jgg%40nvidia.com?part=4 Link: https://patch.msgid.link/r/7-v1-41f3135e5565+9d2-rdma_ai_fixes1_jgg@nvidia.com Reviewed-by: Long Li Signed-off-by: Jason Gunthorpe Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman

RDMA/mana: Validate rx_hash_key_len

2026-05-23T11:03:34+00:00

[ Upstream commit 6dd2d4ad9c8429523b1c220c5132bd551c006425 ] Sashiko points out that rx_hash_key_len comes from a uAPI structure and is blindly passed to memcpy, allowing the userspace to trash kernel memory. Bounds check it so the memcpy cannot overflow. Cc: stable@vger.kernel.org Fixes: 0266a177631d ("RDMA/mana_ib: Add a driver for Microsoft Azure Network Adapter") Link: https://sashiko.dev/#/patchset/0-v2-1c49eeb88c48%2B91-rdma_udata_rep_jgg%40nvidia.com?part=1 Link: https://patch.msgid.link/r/4-v1-41f3135e5565+9d2-rdma_ai_fixes1_jgg@nvidia.com Reviewed-by: Long Li Signed-off-by: Jason Gunthorpe [ kept the stable branch's existing `req_buf_size` calculation instead of upstream's `struct_size(req, indir_tab, ...)` form ] Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman

RDMA/mana_ib: Disable RX steering on RSS QP destroy

2026-05-17T15:13:49+00:00

[ Upstream commit dbeb256e8dd87233d891b170c0b32a6466467036 ] When an RSS QP is destroyed (e.g. DPDK exit), mana_ib_destroy_qp_rss() destroys the RX WQ objects but does not disable vPort RX steering in firmware. This leaves stale steering configuration that still points to the destroyed RX objects. If traffic continues to arrive (e.g. peer VM is still transmitting) and the VF interface is subsequently brought up (mana_open), the firmware may deliver completions using stale CQ IDs from the old RX objects. These CQ IDs can be reused by the ethernet driver for new TX CQs, causing RX completions to land on TX CQs: WARNING: mana_poll_tx_cq+0x1b8/0x220 [mana] (is_sq == false) WARNING: mana_gd_process_eq_events+0x209/0x290 (cq_table lookup fails) Fix this by disabling vPort RX steering before destroying RX WQ objects. Note that mana_fence_rqs() cannot be used here because the fence completion is delivered on the CQ, which is polled by user-mode (e.g. DPDK) and not visible to the kernel driver. Refactor the disable logic into a shared mana_disable_vport_rx() in mana_en, exported for use by mana_ib, replacing the duplicate code. The ethernet driver's mana_dealloc_queues() is also updated to call this common function. Fixes: 0266a177631d ("RDMA/mana_ib: Add a driver for Microsoft Azure Network Adapter") Cc: stable@vger.kernel.org Signed-off-by: Long Li Link: https://patch.msgid.link/20260325194100.1929056-1-longli@microsoft.com Signed-off-by: Leon Romanovsky [ kept early-return error handling and used unquoted NET_MANA namespace in EXPORT_SYMBOL_NS ] Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman

RDMA/vmw_pvrdma: Fix double free on pvrdma_alloc_ucontext() error path

2026-05-17T15:13:42+00:00

commit e38e86995df27f1f854063dab1f0c6a513db3faf upstream. Sashiko points out that pvrdma_uar_free() is already called within pvrdma_dealloc_ucontext(), so calling it before triggers a double free. Cc: stable@vger.kernel.org Fixes: 29c8d9eba550 ("IB: Add vmw_pvrdma driver") Link: https://sashiko.dev/#/patchset/0-v1-e911b76a94d1%2B65d95-rdma_udata_rep_jgg%40nvidia.com?part=4 Link: https://patch.msgid.link/r/10-v1-41f3135e5565+9d2-rdma_ai_fixes1_jgg@nvidia.com Signed-off-by: Jason Gunthorpe Signed-off-by: Greg Kroah-Hartman

RDMA/ocrdma: Don't NULL deref uctx on errors in ocrdma_copy_pd_uresp()

2026-05-17T15:13:42+00:00

commit 34fbf48cf3b410d2a6e8c586fa952a36331ca5ba upstream. Sashiko points out that pd->uctx isn't initialized until late in the function so all these error flow references are NULL and will crash. Use the uctx that isn't NULL. Cc: stable@vger.kernel.org Fixes: fe2caefcdf58 ("RDMA/ocrdma: Add driver for Emulex OneConnect IBoE RDMA adapter") Link: https://sashiko.dev/#/patchset/0-v1-e911b76a94d1%2B65d95-rdma_udata_rep_jgg%40nvidia.com?part=4 Link: https://patch.msgid.link/r/9-v1-41f3135e5565+9d2-rdma_ai_fixes1_jgg@nvidia.com Signed-off-by: Jason Gunthorpe Signed-off-by: Greg Kroah-Hartman

RDMA/mlx5: Fix error path fall-through in mlx5_ib_dev_res_srq_init()

2026-05-17T15:13:42+00:00

commit c488df06bd552bb8b6e14fa0cfd5ad986c6e9525 upstream. mlx5_ib_dev_res_srq_init() allocates two SRQs, s0 and s1. When ib_create_srq() fails for s1, the error branch destroys s0 but falls through and unconditionally assigns the freed s0 and the ERR_PTR s1 to devr->s0 and devr->s1. This leads to several problems: the lock-free fast path checks "if (devr->s1) return 0;" and treats the ERR_PTR as already initialised; users in mlx5_ib_create_qp() dereference the freed SRQ or ERR_PTR via to_msrq(devr->s0)->msrq.srqn; and mlx5_ib_dev_res_cleanup() dereferences the ERR_PTR and double-frees s0 on teardown. Fix by adding the same `goto unlock` in the s1 failure path. Cc: stable@vger.kernel.org Fixes: 5895e70f2e6e ("IB/mlx5: Allocate resources just before first QP/SRQ is created") Link: https://patch.msgid.link/r/SYBPR01MB7881E1E0970268BD69C0BA75AF2B2@SYBPR01MB7881.ausprd01.prod.outlook.com Reported-by: Yuhao Jiang Signed-off-by: Junrui Luo Signed-off-by: Jason Gunthorpe Signed-off-by: Greg Kroah-Hartman

RDMA/mlx4: Fix resource leak on error in mlx4_ib_create_srq()

2026-05-17T15:13:42+00:00

commit c54c7e4cb679c0aaa1cb489b9c3f2cd98e63a44c upstream. Sashiko points out that mlx4_srq_alloc() was not undone during error unwind, add the missing call to mlx4_srq_free(). Cc: stable@vger.kernel.org Fixes: 225c7b1feef1 ("IB/mlx4: Add a driver Mellanox ConnectX InfiniBand adapters") Link: https://sashiko.dev/#/patchset/0-v1-e911b76a94d1%2B65d95-rdma_udata_rep_jgg%40nvidia.com?part=8 Link: https://patch.msgid.link/r/11-v1-41f3135e5565+9d2-rdma_ai_fixes1_jgg@nvidia.com Signed-off-by: Jason Gunthorpe Signed-off-by: Greg Kroah-Hartman

RDMA/hns: Fix unlocked call to hns_roce_qp_remove()

2026-05-17T15:13:41+00:00

commit 0c99acbc8b6c6dd526ae475a48ee1897b61072fb upstream. Sashiko points out that hns_roce_qp_remove() requires the caller to hold locks. The error flow in hns_roce_create_qp_common() doesn't hold those locks for the error unwind so it risks corrupting memory. Grab the same locks the other two callers use. Cc: stable@vger.kernel.org Fixes: e088a685eae9 ("RDMA/hns: Support rq record doorbell for the user space") Link: https://sashiko.dev/#/patchset/0-v2-1c49eeb88c48%2B91-rdma_udata_rep_jgg%40nvidia.com?part=9 Link: https://patch.msgid.link/r/15-v1-41f3135e5565+9d2-rdma_ai_fixes1_jgg@nvidia.com Reviewed-by: Junxian Huang Signed-off-by: Jason Gunthorpe Signed-off-by: Greg Kroah-Hartman

RDMA/irdma: Fix double free related to rereg_user_mr

2026-04-27T13:23:25+00:00

[ Upstream commit 29a3edd7004bb635d299fb9bc6f0ea4ef13ed5a2 ] If IB_MR_REREG_TRANS is set during rereg_user_mr, the umem will be released and a new one will be allocated in irdma_rereg_mr_trans. If any step of irdma_rereg_mr_trans fails after the new umem is allocated, it releases the umem, but does not set iwmr->region to NULL. The problem is that this failure is propagated to the user, who will then call ibv_dereg_mr (as they should). Then, the dereg_mr path will see a non-NULL umem and attempt to call ib_umem_release again. Fix this by setting iwmr->region to NULL after ib_umem_release. Fixed: 5ac388db27c4 ("RDMA/irdma: Add support to re-register a memory region") Signed-off-by: Jacob Moroni Link: https://patch.msgid.link/20260227152743.1183388-1-jmoroni@google.com Signed-off-by: Leon Romanovsky Signed-off-by: Sasha Levin

RDMA/irdma: Return EINVAL for invalid arp index error

2026-04-02T11:07:22+00:00

[ Upstream commit 7221f581eefa79ead06e171044f393fb7ee22f87 ] When rdma_connect() fails due to an invalid arp index, user space rdma core reports ENOMEM which is confusing. Modify irdma_make_cm_node() to return the correct error code. Fixes: 146b9756f14c ("RDMA/irdma: Add connection manager") Signed-off-by: Tatyana Nikolova Signed-off-by: Leon Romanovsky Signed-off-by: Sasha Levin