kernel/linux.git/drivers/infiniband/core/addr.c, branch linux-7.1.y

IB/core: Fix IPv6 netlink message size in ib_nl_ip_send_msg()

2026-04-29T19:37:12+00:00

When resolving an RDMA-CM IPv6 address, ib_nl_ip_send_msg() sends a netlink request to the userspace daemon to perform IP-to-GID resolution in certain cases. The function allocates the netlink message buffer using nla_total_size(sizeof(size)), which passes 8 bytes (the size of size_t on 64-bit systems) instead of 16 bytes (the size of an IPv6 address). This results in an 8-byte under-allocation.

This is currently masked by the extra space nlmsg_new() allocates for the skb internally. However, the code remains incorrect.

Fix the issue by supplying the proper IPv6 address length to nla_total_size().

Fixes: ae43f8286730 ("IB/core: Add IP to GID netlink offload")
Link: https://patch.msgid.link/r/20260427-security-bug-fixes-v3-3-4621fa52de0e@nvidia.com
Signed-off-by: Maher Sanalla
Reviewed-by: Patrisious Haddad
Signed-off-by: Edward Srouji
Signed-off-by: Jason Gunthorpe
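A small userspace sketch lets you check the arithmetic behind the 8-byte figure. It re-derives the netlink attribute sizing rules (a 4-byte attribute header, with the total padded to a 4-byte boundary) under hypothetical sk_ names and does not use the kernel's nla_total_size() itself. sizeof(struct in6_addr) comes from the standard <netinet/in.h>.

```c
#include <netinet/in.h>	/* struct in6_addr */
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace re-derivation of the netlink attribute sizing rules, for
 * illustration only; the sk_ names are hypothetical, not kernel API.
 */
#define SK_NLA_ALIGNTO 4U
#define SK_NLA_ALIGN(len) \
	(((size_t)(len) + SK_NLA_ALIGNTO - 1) & ~(size_t)(SK_NLA_ALIGNTO - 1))

/* Same layout as the kernel's struct nlattr: 16-bit length, 16-bit type. */
struct sk_nlattr {
	uint16_t nla_len;
	uint16_t nla_type;
};

#define SK_NLA_HDRLEN SK_NLA_ALIGN(sizeof(struct sk_nlattr))	/* 4 */

/* Bytes one attribute occupies in the message: header + payload, padded. */
static size_t sk_nla_total_size(size_t payload)
{
	return SK_NLA_ALIGN(SK_NLA_HDRLEN + payload);
}

/* Space reserved by the buggy call: payload sized as a size_t. */
static size_t sk_buggy_space(void)
{
	return sk_nla_total_size(sizeof(size_t));
}

/* Space reserved after the fix: payload sized as an IPv6 address. */
static size_t sk_fixed_space(void)
{
	return sk_nla_total_size(sizeof(struct in6_addr));
}
```

On a 64-bit build, sizeof(size_t) is 8. The buggy call therefore reserves 4 + 8 = 12 bytes for the attribute, while the fixed call reserves 4 + 16 = 20 bytes.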

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

2026-04-20T18:20:35+00:00

Pull rdma updates from Jason Gunthorpe:
 "The usual collection of driver changes, more core infrastructure updates than typical this cycle:

   - Minor cleanups and kernel-doc fixes in bnxt_re, hns, rdmavt, efa, ocrdma, erdma, rtrs, hfi1, ionic, and pvrdma
   - New udata validation framework and driver updates
   - Modernize CQ creation interface in mlx4 and mlx5, manage CQ umem in core
   - Promote UMEM to a core component, split out DMA block iterator logic
   - Introduce FRMR pools with aging, statistics, pinned handles, and netlink control and use it in mlx5
   - Add PCIe TLP emulation support in mlx5
   - Extend umem to work with revocable pinned dmabufs and use it in irdma
   - More net namespace improvements for rxe
   - GEN4 hardware support in irdma
   - First steps to MW and UC support in mana_ib
   - Support for CQ umem and doorbells in bnxt_re
   - Drop opa_vnic driver from hfi1

  Fixes:
   - IB/core zero dmac neighbor resolution race
   - GID table memory free
   - rxe pad/ICRC validation and r_key async errors
   - mlx4 external umem for CQ
   - umem DMA attributes on unmap
   - mana_ib RX steering on RSS QP destroy"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (116 commits)
  RDMA/core: Fix user CQ creation for drivers without create_cq
  RDMA/ionic: bound node_desc sysfs read with %.64s
  IB/core: Fix zero dmac race in neighbor resolution
  RDMA/mana_ib: Support memory windows
  RDMA/rxe: Validate pad and ICRC before payload_size() in rxe_rcv
  RDMA/core: Prefer NLA_NUL_STRING
  RDMA/core: Fix memory free for GID table
  RDMA/hns: Remove the duplicate calls to ib_copy_validate_udata_in()
  RDMA: Remove redundant = {} for udata req structs
  RDMA/irdma: Add missing comp_mask check in alloc_ucontext
  RDMA/hns: Add missing comp_mask check in create_qp
  RDMA/mlx5: Pull comp_mask validation into ib_copy_validate_udata_in_cm()
  RDMA: Use ib_copy_validate_udata_in_cm() for zero comp_mask
  RDMA/hns: Use ib_copy_validate_udata_in()
  RDMA/mlx4: Use ib_copy_validate_udata_in() for QP
  RDMA/mlx4: Use ib_copy_validate_udata_in()
  RDMA/mlx5: Use ib_copy_validate_udata_in() for MW
  RDMA/mlx5: Use ib_copy_validate_udata_in() for SRQ
  RDMA/pvrdma: Use ib_copy_validate_udata_in() for srq
  RDMA: Use ib_copy_validate_udata_in() for implicit full structs
  ...

IB/core: Fix zero dmac race in neighbor resolution

2026-04-09T15:22:02+00:00

dst_fetch_ha() checks nud_state without holding the neighbor lock, then copies ha under the seqlock. A race in __neigh_update(), where nud_state is set to NUD_REACHABLE before ha is written, allows dst_fetch_ha() to read a zero MAC address while the seqlock reports no concurrent writer.

netevent_callback amplifies this by waking ALL pending addr_req workers when ANY neighbor becomes NUD_VALID. At scale (N peers resolving ARP concurrently), the hit probability scales as N^2, making it near-certain for large RDMA workloads.

    N(A): neigh_update(A)              W(A): addr_resolve(A)
                                       |
                                       [sleep]
                                       |
    write_lock_bh(&A->lock)            |
    A->nud_state = NUD_REACHABLE       |
    // A->ha is still 0                |
                                       [woken by netevent_cb() of
                                        another neighbour]
                                       dst_fetch_ha(A)
                                       A->nud_state & NUD_VALID
                                       read_seqbegin(&A->ha_lock)
                                       snapshot = A->ha /* 0 */
                                       read_seqretry(&A->ha_lock)
                                       return snapshot
    seqlock(&A->ha_lock)
    A->ha = mac_A /* too late */
    sequnlock(&A->ha_lock)
    write_unlock_bh(&A->lock)

The incorrect (zero) MAC is read and programmed into the device QP before it has been updated. This causes silent packet loss and, eventually, RETRY_EXC_ERR.

Fix by holding the neighbor read lock across the nud_state check and the ha copy in dst_fetch_ha(). This makes it synchronize with __neigh_update(), which performs the update while holding the write lock.

Cc: stable@vger.kernel.org
Fixes: 92ebb6a0a13a ("IB/cm: Remove now useless rcu_lock in dst_fetch_ha")
Link: https://patch.msgid.link/r/20260405-fix-dmac-race-v1-1-cfa1ec2ce54a@nvidia.com
Signed-off-by: Chen Zhao
Reviewed-by: Parav Pandit
Signed-off-by: Leon Romanovsky
Signed-off-by: Jason Gunthorpe

drivers: net: drop ipv6_stub usage and use direct function calls

2026-03-29T18:21:23+00:00

As IPv6 is built-in only, the ipv6_stub infrastructure is no longer necessary. Convert all drivers currently utilizing ipv6_stub to make direct function calls. The fallback functions introduced previously will prevent linkage errors when CONFIG_IPV6 is disabled.

Signed-off-by: Fernando Fernandez Mancera
Tested-by: Ricardo B. Marlière
Reviewed-by: Jason A. Donenfeld
Reviewed-by: Antonio Quartulli
Reviewed-by: Edward Cree
Link: https://patch.msgid.link/20260325120928.15848-7-fmancera@suse.de
Signed-off-by: Jakub Kicinski
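The commit relies on a direct call that still links when IPv6 is compiled out. That pattern can be sketched with a config switch and a static inline fallback. Every name here (SKETCH_CONFIG_IPV6, struct sk_dst, sk_ip6_dst_lookup(), sk_driver_route6()) is hypothetical. The -EAFNOSUPPORT return value is an assumed choice for the fallback and not necessarily what the real fallbacks return.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for CONFIG_IPV6; left undefined to model an IPv6-less build. */
/* #define SKETCH_CONFIG_IPV6 1 */

struct sk_dst;	/* hypothetical opaque route handle */

#ifdef SKETCH_CONFIG_IPV6
/* Real implementation provided by the (hypothetical) IPv6 code. */
int sk_ip6_dst_lookup(struct sk_dst **out);
#else
/* Fallback: keeps direct callers compiling and linking without IPv6. */
static inline int sk_ip6_dst_lookup(struct sk_dst **out)
{
	*out = NULL;
	return -EAFNOSUPPORT;	/* assumed error code */
}
#endif

/* Driver code calls the function directly instead of via a stub table. */
static int sk_driver_route6(struct sk_dst **out)
{
	return sk_ip6_dst_lookup(out);
}
```

Compared with an ops-table indirection, the caller has no NULL function pointer to check, and the compiler can inline the fallback away.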

Convert 'alloc_obj' family to use the new default GFP_KERNEL argument

2026-02-22T01:09:51+00:00

This was done entirely with mindless brute force, using git grep -l '\

treewide: Replace kmalloc with kmalloc_obj for non-scalar types

2026-02-21T09:02:28+00:00

This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances:

Single allocations:      kmalloc(sizeof(TYPE), ...)
are replaced with:       kmalloc_obj(TYPE, ...)

Array allocations:       kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with:       kmalloc_objs(TYPE, COUNT, ...)

Flex array allocations:  kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with:       kmalloc_flex(*PTR, FAM, COUNT, ...)

(where TYPE may also be *VAR)

The resulting allocations no longer return "void *", instead returning "TYPE *".

Signed-off-by: Kees Cook
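A userspace approximation of the three replacement forms shows why the results can be typed. Each macro receives the type (or an expression such as *bag) and can cast its result to a pointer to that type. These sketch_* macros are hypothetical and rely on the GNU C __typeof__ extension. Unlike the kernel's kmalloc_array() and struct_size(), they perform no overflow checking on the size computation.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical analogues of kmalloc_obj/kmalloc_objs/kmalloc_flex (no overflow checks). */
#define sketch_obj(TYPE) ((__typeof__(TYPE) *)malloc(sizeof(TYPE)))
#define sketch_objs(TYPE, COUNT) \
	((__typeof__(TYPE) *)calloc((COUNT), sizeof(TYPE)))
#define sketch_flex(TYPE, FAM, COUNT) \
	((__typeof__(TYPE) *)malloc(sizeof(TYPE) + \
		(size_t)(COUNT) * sizeof(((__typeof__(TYPE) *)0)->FAM[0])))

struct sketch_item {
	int id;
};

/* Struct ending in a flexible array member. */
struct sketch_bag {
	size_t n;
	struct sketch_item items[];
};

/* kmalloc_flex(*PTR, FAM, COUNT) style: TYPE is the expression *bag. */
static struct sketch_bag *sketch_bag_new(size_t n)
{
	struct sketch_bag *bag = sketch_flex(*bag, items, n);

	if (bag)
		bag->n = n;
	return bag;
}
```

Assigning the result of sketch_obj(struct sketch_item) to an incompatible pointer type now draws a compiler warning, which a "void *" return would have hidden.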

RDMA/core: Check for the presence of LS_NLA_TYPE_DGID correctly

2025-12-17T01:30:56+00:00

The netlink response for RDMA_NL_LS_OP_IP_RESOLVE should always have a LS_NLA_TYPE_DGID attribute; it is invalid if it does not.

Use the nl parsing logic properly and call nla_parse_deprecated() to fill the nlattrs array and then directly index that array to get the data for the DGID. Just fail if it is NULL.

Remove the for loop searching for the nla, and squash the validation and parsing into one function.

Fixes an uninitialized read from the stack triggered by userspace if it does not provide the DGID to a kernel-initiated RDMA_NL_LS_OP_IP_RESOLVE query.

    BUG: KMSAN: uninit-value in hex_byte_pack include/linux/hex.h:13 [inline]
    BUG: KMSAN: uninit-value in ip6_string+0xef4/0x13a0 lib/vsprintf.c:1490
     hex_byte_pack include/linux/hex.h:13 [inline]
     ip6_string+0xef4/0x13a0 lib/vsprintf.c:1490
     ip6_addr_string+0x18a/0x3e0 lib/vsprintf.c:1509
     ip_addr_string+0x245/0xee0 lib/vsprintf.c:1633
     pointer+0xc09/0x1bd0 lib/vsprintf.c:2542
     vsnprintf+0xf8a/0x1bd0 lib/vsprintf.c:2930
     vprintk_store+0x3ae/0x1530 kernel/printk/printk.c:2279
     vprintk_emit+0x307/0xcd0 kernel/printk/printk.c:2426
     vprintk_default+0x3f/0x50 kernel/printk/printk.c:2465
     vprintk+0x36/0x50 kernel/printk/printk_safe.c:82
     _printk+0x17e/0x1b0 kernel/printk/printk.c:2475
     ib_nl_process_good_ip_rsep drivers/infiniband/core/addr.c:128 [inline]
     ib_nl_handle_ip_res_resp+0x963/0x9d0 drivers/infiniband/core/addr.c:141
     rdma_nl_rcv_msg drivers/infiniband/core/netlink.c:-1 [inline]
     rdma_nl_rcv_skb drivers/infiniband/core/netlink.c:239 [inline]
     rdma_nl_rcv+0xefa/0x11c0 drivers/infiniband/core/netlink.c:259
     netlink_unicast_kernel net/netlink/af_netlink.c:1320 [inline]
     netlink_unicast+0xf04/0x12b0 net/netlink/af_netlink.c:1346
     netlink_sendmsg+0x10b3/0x1250 net/netlink/af_netlink.c:1896
     sock_sendmsg_nosec net/socket.c:714 [inline]
     __sock_sendmsg+0x333/0x3d0 net/socket.c:729
     ____sys_sendmsg+0x7e0/0xd80 net/socket.c:2617
     ___sys_sendmsg+0x271/0x3b0 net/socket.c:2671
     __sys_sendmsg+0x1aa/0x300 net/socket.c:2703
     __compat_sys_sendmsg net/compat.c:346 [inline]
     __do_compat_sys_sendmsg net/compat.c:353 [inline]
     __se_compat_sys_sendmsg net/compat.c:350 [inline]
     __ia32_compat_sys_sendmsg+0xa4/0x100 net/compat.c:350
     ia32_sys_call+0x3f6c/0x4310 arch/x86/include/generated/asm/syscalls_32.h:371
     do_syscall_32_irqs_on arch/x86/entry/syscall_32.c:83 [inline]
     __do_fast_syscall_32+0xb0/0x150 arch/x86/entry/syscall_32.c:306
     do_fast_syscall_32+0x38/0x80 arch/x86/entry/syscall_32.c:331
     do_SYSENTER_32+0x1f/0x30 arch/x86/entry/syscall_32.c:3

Link: https://patch.msgid.link/r/0-v1-3fbaef094271+2cf-rdma_op_ip_rslv_syz_jgg@nvidia.com
Cc: stable@vger.kernel.org
Fixes: ae43f8286730 ("IB/core: Add IP to GID netlink offload")
Reported-by: syzbot+938fcd548c303fe33c1a@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/r/68dc3dac.a00a0220.102ee.004f.GAE@google.com
Signed-off-by: Jason Gunthorpe
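The parse-then-index approach described in the commit can be sketched in userspace with a minimal attribute walker. The names sk_put(), sk_parse(), sk_get_dgid() and SK_TYPE_DGID are hypothetical stand-ins for nla_parse_deprecated() and LS_NLA_TYPE_DGID, and the attribute numbering is invented. The point is that a missing or wrongly sized DGID becomes -EINVAL instead of leaving the caller with uninitialized bytes.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same layout as struct nlattr; helpers below are hypothetical. */
struct sk_nlattr { uint16_t nla_len; uint16_t nla_type; };

#define SK_ALIGN(len) (((size_t)(len) + 3) & ~(size_t)3)
#define SK_HDRLEN SK_ALIGN(sizeof(struct sk_nlattr))

/* Invented attribute numbering, not the real LS_NLA_TYPE_* values. */
enum { SK_TYPE_UNSPEC, SK_TYPE_DGID, SK_TYPE_MAX = SK_TYPE_DGID };

/* Payload view of one parsed attribute; data == NULL if absent. */
struct sk_attr { const uint8_t *data; uint16_t len; };

/* Append one attribute at buf + off (caller sizes buf); returns new offset. */
static size_t sk_put(uint8_t *buf, size_t off, uint16_t type,
		     const void *data, uint16_t len)
{
	struct sk_nlattr hdr = { (uint16_t)(SK_HDRLEN + len), type };

	memcpy(buf + off, &hdr, sizeof(hdr));
	memcpy(buf + off + SK_HDRLEN, data, len);
	return off + SK_ALIGN(hdr.nla_len);
}

/* Fill tb[] indexed by attribute type, like nla_parse() does. */
static int sk_parse(const uint8_t *buf, size_t len,
		    struct sk_attr tb[SK_TYPE_MAX + 1])
{
	size_t off = 0;

	memset(tb, 0, sizeof(tb[0]) * (SK_TYPE_MAX + 1));
	while (off + SK_HDRLEN <= len) {
		struct sk_nlattr hdr;

		memcpy(&hdr, buf + off, sizeof(hdr));
		if (hdr.nla_len < SK_HDRLEN || off + hdr.nla_len > len)
			return -EINVAL;	/* malformed attribute */
		if (hdr.nla_type <= SK_TYPE_MAX) {
			tb[hdr.nla_type].data = buf + off + SK_HDRLEN;
			tb[hdr.nla_type].len = (uint16_t)(hdr.nla_len - SK_HDRLEN);
		}
		off += SK_ALIGN(hdr.nla_len);
	}
	return 0;
}

/* Index the table directly; an absent or short DGID is an error. */
static int sk_get_dgid(const uint8_t *buf, size_t len, uint8_t dgid[16])
{
	struct sk_attr tb[SK_TYPE_MAX + 1];
	int ret = sk_parse(buf, len, tb);

	if (ret)
		return ret;
	if (!tb[SK_TYPE_DGID].data || tb[SK_TYPE_DGID].len != 16)
		return -EINVAL;
	memcpy(dgid, tb[SK_TYPE_DGID].data, 16);
	return 0;
}
```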

RDMA/core: Use route entry flag to decide on loopback traffic

2025-09-18T09:20:35+00:00

addr_resolve() considers a destination to be local if the next-hop device of the resolved route for the destination is the loopback netdevice. This fails when the source and destination IP addresses belong to a netdev enslaved to a VRF netdev. In this case, the next-hop device is the VRF itself:

  $ ip link add name myvrf up type vrf table 100
  $ ip link set ens2f0np0 master myvrf up
  $ ip addr add 192.168.1.1/24 dev ens2f0np0
  $ ip route get 192.168.1.1 oif myvrf
  local 192.168.1.1 dev myvrf table 100 src 192.168.1.1
      uid 0
      cache

As a result, packets are generated with an incorrect destination MAC (that of the VRF netdevice), and ib_write_bw fails with a timeout.

Solve this by deciding whether a destination is local based on the resolved route's type rather than on the loopback flag of its next-hop netdevice. This allows loopback traffic to be resolved uniformly with and without VRF configurations.

Signed-off-by: Parav Pandit
Reviewed-by: Vlad Dumitrescu
Signed-off-by: Edward Srouji
Link: https://patch.msgid.link/20250916111103.84069-4-edwards@nvidia.com
Signed-off-by: Leon Romanovsky
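A few lines capture the difference between the two locality tests. The types and constants below (struct sk_route, SK_RTN_LOCAL, SK_IFF_LOOPBACK) are hypothetical stand-ins for the kernel's route type and device flag, so this models the decision only, not addr_resolve() itself. In the VRF case shown in the `ip route get` output, the route is local but its next hop is not a loopback device.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the route type and the loopback device flag. */
enum sk_rt_type { SK_RTN_UNICAST, SK_RTN_LOCAL };
#define SK_IFF_LOOPBACK 0x8u

/* Hypothetical summary of a route lookup result. */
struct sk_route {
	enum sk_rt_type type;		/* type of the resolved route */
	unsigned int nexthop_flags;	/* flags of the next-hop netdev */
};

/* Old test: local only if the next hop is the loopback device. */
static bool sk_is_local_by_dev(const struct sk_route *rt)
{
	return rt->nexthop_flags & SK_IFF_LOOPBACK;
}

/* New test: local if the route itself is local, whatever the device. */
static bool sk_is_local_by_type(const struct sk_route *rt)
{
	return rt->type == SK_RTN_LOCAL;
}
```

The two tests agree for plain loopback and for remote destinations. They diverge only when a local route resolves through a non-loopback device such as a VRF.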

RDMA/core: Resolve MAC of next-hop device without ARP support

2025-09-18T09:20:20+00:00

Currently, if the next-hop netdevice does not support ARP resolution, the destination MAC address is silently set to zero without reporting an error. This leads to incorrect behavior and may result in packet transmission failures.

Fix this by deferring MAC resolution to the IP stack via neighbour lookup, allowing proper resolution or error reporting as appropriate.

Fixes: 7025fcd36bd6 ("IB: address translation to map IP toIB addresses (GIDs)")
Signed-off-by: Parav Pandit
Reviewed-by: Vlad Dumitrescu
Signed-off-by: Edward Srouji
Link: https://patch.msgid.link/20250916111103.84069-3-edwards@nvidia.com
Signed-off-by: Leon Romanovsky
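The before/after behaviour can be modelled under stated assumptions. dev_noarp stands for a next-hop device without ARP support. The lookup callback stands for the IP stack's neighbour resolution; sk_neigh_lookup_fn is hypothetical, and the example callback always fails with -EHOSTUNREACH. In the model, the old path reports success with an all-zero MAC, while the new path lets the lookup's error reach the caller.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SK_ETH_ALEN 6

/* Hypothetical neighbour lookup: fills mac and returns 0, or a negative errno. */
typedef int (*sk_neigh_lookup_fn)(uint8_t mac[SK_ETH_ALEN]);

/* Hypothetical lookup that always fails, e.g. no neighbour can be resolved. */
static int sk_lookup_unreachable(uint8_t mac[SK_ETH_ALEN])
{
	(void)mac;
	return -EHOSTUNREACH;
}

/* Before the fix (model): a no-ARP device silently gets a zero MAC. */
static int sk_resolve_mac_old(bool dev_noarp, sk_neigh_lookup_fn lookup,
			      uint8_t mac[SK_ETH_ALEN])
{
	if (dev_noarp) {
		memset(mac, 0, SK_ETH_ALEN);
		return 0;	/* "success" with an all-zero destination MAC */
	}
	return lookup(mac);
}

/* After the fix (model): always defer to the neighbour lookup. */
static int sk_resolve_mac_new(bool dev_noarp, sk_neigh_lookup_fn lookup,
			      uint8_t mac[SK_ETH_ALEN])
{
	(void)dev_noarp;
	return lookup(mac);
}
```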

RDMA/core: Squash a single user static function

2025-09-18T09:15:39+00:00

To reduce dependencies on IFF_LOOPBACK in the route and neighbour resolution steps, squash the static function into its single caller and simplify the code.

Until now, the network field was set even when neighbour resolution failed. With this change, the dev_addr output fields are valid only when resolution succeeds.

Signed-off-by: Parav Pandit
Reviewed-by: Vlad Dumitrescu
Signed-off-by: Edward Srouji
Link: https://patch.msgid.link/20250916111103.84069-2-edwards@nvidia.com
Signed-off-by: Leon Romanovsky