<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/infiniband/core/cache.c, branch v5.10.257</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v5.10.257</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v5.10.257'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2026-03-04T12:20:11+00:00</updated>
<entry>
<title>RDMA/core: Fix stale RoCE GIDs during netdev events at registration</title>
<updated>2026-03-04T12:20:11+00:00</updated>
<author>
<name>Jiri Pirko</name>
<email>jiri@nvidia.com</email>
</author>
<published>2026-01-27T09:38:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3a75de01cf47321cfa7384ec6b963fd4866ea320'/>
<id>urn:sha1:3a75de01cf47321cfa7384ec6b963fd4866ea320</id>
<content type='text'>
[ Upstream commit 9af0feae8016ba58ad7ff784a903404986b395b1 ]

RoCE GID entries become stale when netdev properties change during the
IB device registration window. This is reproducible with a udev rule
that sets a MAC address when a VF netdev appears:

  ACTION=="add", SUBSYSTEM=="net", KERNEL=="eth4", \
    RUN+="/sbin/ip link set eth4 address 88:22:33:44:55:66"

After VF creation, show_gids displays GIDs derived from the original
random MAC rather than the configured one.

The root cause is a race between netdev event processing and device
registration:

  CPU 0 (driver)                    CPU 1 (udev/workqueue)
  ──────────────                    ──────────────────────
  ib_register_device()
    ib_cache_setup_one()
      gid_table_setup_one()
        _gid_table_setup_one()
          ← GID table allocated
        rdma_roce_rescan_device()
          ← GIDs populated with
            OLD MAC
                                    ip link set eth4 addr NEW_MAC
                                    NETDEV_CHANGEADDR queued
                                    netdevice_event_work_handler()
                                      ib_enum_all_roce_netdevs()
                                        ← Iterates DEVICE_REGISTERED
                                        ← Device NOT marked yet, SKIP!
    enable_device_and_get()
      xa_set_mark(DEVICE_REGISTERED)
          ← Too late, event was lost

The netdev event handler uses ib_enum_all_roce_netdevs() which only
iterates devices marked DEVICE_REGISTERED. However, this mark is set
late in the registration process, after the GID cache is already
populated. Events arriving in this window are silently dropped.

Fix this by introducing a new xarray mark DEVICE_GID_UPDATES that is
set immediately after the GID table is allocated and initialized. Use
the new mark in ib_enum_all_roce_netdevs() function to iterate devices
instead of DEVICE_REGISTERED.

This is safe because:
- After _gid_table_setup_one(), all required structures exist (port_data,
  immutable, cache.gid)
- The GID table mutex serializes concurrent access between the initial
  rescan and event handlers
- Event handlers correctly update stale GIDs even when racing with rescan
- The mark is cleared in ib_cache_cleanup_one() before teardown

This also fixes similar races for IP address events (inetaddr_event,
inet6addr_event) which use the same enumeration path.

Fixes: 0df91bb67334 ("RDMA/devices: Use xarray to store the client_data")
Signed-off-by: Jiri Pirko &lt;jiri@nvidia.com&gt;
Link: https://patch.msgid.link/20260127093839.126291-1-jiri@resnulli.us
Reported-by: syzbot+881d65229ca4f9ae8c84@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=881d65229ca4f9ae8c84
Signed-off-by: Leon Romanovsky &lt;leon@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>RDMA/core: Rate limit GID cache warning messages</title>
<updated>2025-08-28T14:22:29+00:00</updated>
<author>
<name>Maor Gottlieb</name>
<email>maorg@nvidia.com</email>
</author>
<published>2025-06-16T08:26:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a36f8f4c072d4c7005be09d07721c8b063336e78'/>
<id>urn:sha1:a36f8f4c072d4c7005be09d07721c8b063336e78</id>
<content type='text'>
[ Upstream commit 333e4d79316c9ed5877d7aac8b8ed22efc74e96d ]

The GID cache warning messages can flood the kernel log when there are
multiple failed attempts to add GIDs. This can happen when creating many
virtual interfaces without having enough space for their GIDs in the GID
table.

Change pr_warn to pr_warn_ratelimited to prevent log flooding while still
maintaining visibility of the issue.

Link: https://patch.msgid.link/r/fd45ed4a1078e743f498b234c3ae816610ba1b18.1750062357.git.leon@kernel.org
Signed-off-by: Maor Gottlieb &lt;maorg@nvidia.com&gt;
Signed-off-by: Leon Romanovsky &lt;leonro@nvidia.com&gt;
Signed-off-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>RDMA/core: Fix empty gid table for non IB/RoCE devices</title>
<updated>2020-12-07T20:32:04+00:00</updated>
<author>
<name>Gal Pressman</name>
<email>galpress@amazon.com</email>
</author>
<published>2020-12-06T15:32:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e432c04c17993011b2a2f59dcb5738e604bd552e'/>
<id>urn:sha1:e432c04c17993011b2a2f59dcb5738e604bd552e</id>
<content type='text'>
The query_gid_table ioctl skips non IB/RoCE ports, which as a result
returns an empty gid table for devices such as EFA which have a GID table,
but are not IB/RoCE.

Fixes: c4b4d548fabc ("RDMA/core: Introduce new GID table query API")
Link: https://lore.kernel.org/r/20201206153238.34878-1-galpress@amazon.com
Signed-off-by: Gal Pressman &lt;galpress@amazon.com&gt;
Signed-off-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
</content>
</entry>
<entry>
<title>Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma</title>
<updated>2020-10-17T18:18:18+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2020-10-17T18:18:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a1e16bc7d5f7ca3599d8a7f061841c93a563665e'/>
<id>urn:sha1:a1e16bc7d5f7ca3599d8a7f061841c93a563665e</id>
<content type='text'>
Pull rdma updates from Jason Gunthorpe:
 "A usual cycle for RDMA with a typical mix of driver and core subsystem
  updates:

   - Driver minor changes and bug fixes for mlx5, efa, rxe, vmw_pvrdma,
     hns, usnic, qib, qedr, cxgb4, hns, bnxt_re

   - Various rtrs fixes and updates

   - Bug fix for mlx4 CM emulation for virtualization scenarios where
     MRA wasn't working right

   - Use tracepoints instead of pr_debug in the CM code

   - Scrub the locking in ucma and cma to close more syzkaller bugs

   - Use tasklet_setup in the subsystem

   - Revert the idea that 'destroy' operations are not allowed to fail
     at the driver level. This proved unworkable from a HW perspective.

   - Revise how the umem API works so drivers make fewer mistakes using
     it

   - XRC support for qedr

   - Convert uverbs objects RWQ and MW to new the allocation scheme

   - Large queue entry sizes for hns

   - Use hmm_range_fault() for mlx5 On Demand Paging

   - uverbs APIs to inspect the GID table instead of sysfs

   - Move some of the RDMA code for building large page SGLs into
     lib/scatterlist"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (191 commits)
  RDMA/ucma: Fix use after free in destroy id flow
  RDMA/rxe: Handle skb_clone() failure in rxe_recv.c
  RDMA/rxe: Move the definitions for rxe_av.network_type to uAPI
  RDMA: Explicitly pass in the dma_device to ib_register_device
  lib/scatterlist: Do not limit max_segment to PAGE_ALIGNED values
  IB/mlx4: Convert rej_tmout radix-tree to XArray
  RDMA/rxe: Fix bug rejecting all multicast packets
  RDMA/rxe: Fix skb lifetime in rxe_rcv_mcast_pkt()
  RDMA/rxe: Remove duplicate entries in struct rxe_mr
  IB/hfi,rdmavt,qib,opa_vnic: Update MAINTAINERS
  IB/rdmavt: Fix sizeof mismatch
  MAINTAINERS: CISCO VIC LOW LATENCY NIC DRIVER
  RDMA/bnxt_re: Fix sizeof mismatch for allocation of pbl_tbl.
  RDMA/bnxt_re: Use rdma_umem_for_each_dma_block()
  RDMA/umem: Move to allocate SG table from pages
  lib/scatterlist: Add support in dynamic allocation of SG table from pages
  tools/testing/scatterlist: Show errors in human readable form
  tools/testing/scatterlist: Rejuvenate bit-rotten test
  RDMA/ipoib: Set rtnl_link_ops for ipoib interfaces
  RDMA/uverbs: Expose the new GID query API to user space
  ...
</content>
</entry>
<entry>
<title>RDMA/core: Introduce new GID table query API</title>
<updated>2020-10-02T00:20:11+00:00</updated>
<author>
<name>Avihai Horon</name>
<email>avihaih@nvidia.com</email>
</author>
<published>2020-09-23T16:50:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c4b4d548fabc15b9c5db9f61204dd0c608414d2d'/>
<id>urn:sha1:c4b4d548fabc15b9c5db9f61204dd0c608414d2d</id>
<content type='text'>
Introduce rdma_query_gid_table which enables querying all the GID tables
of a given device and copying the attributes of all valid GID entries to a
provided buffer.

This API provides a faster way to query a GID table using single call and
will be used in libibverbs to improve current approach that requires
multiple calls to open, close and read multiple sysfs files for a single
GID table entry.

Link: https://lore.kernel.org/r/20200923165015.2491894-4-leon@kernel.org
Signed-off-by: Avihai Horon &lt;avihaih@nvidia.com&gt;
Signed-off-by: Leon Romanovsky &lt;leonro@nvidia.com&gt;
Signed-off-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
</content>
</entry>
<entry>
<title>RDMA/core: Modify enum ib_gid_type and enum rdma_network_type</title>
<updated>2020-10-02T00:20:11+00:00</updated>
<author>
<name>Avihai Horon</name>
<email>avihaih@nvidia.com</email>
</author>
<published>2020-09-23T16:50:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1c15b4f2a42ff6697767c22c8ff5f9bcc22fdbe5'/>
<id>urn:sha1:1c15b4f2a42ff6697767c22c8ff5f9bcc22fdbe5</id>
<content type='text'>
Separate IB_GID_TYPE_IB and IB_GID_TYPE_ROCE to two different values, so
enum ib_gid_type will match the gid types of the new query GID table API
which will be introduced in the following patches.

This change in enum ib_gid_type requires to change also enum
rdma_network_type by separating RDMA_NETWORK_IB and RDMA_NETWORK_ROCE_V1
values.

Link: https://lore.kernel.org/r/20200923165015.2491894-3-leon@kernel.org
Signed-off-by: Avihai Horon &lt;avihaih@nvidia.com&gt;
Signed-off-by: Leon Romanovsky &lt;leonro@nvidia.com&gt;
Signed-off-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
</content>
</entry>
<entry>
<title>RDMA/core: Change rdma_get_gid_attr returned error code</title>
<updated>2020-10-02T00:20:11+00:00</updated>
<author>
<name>Avihai Horon</name>
<email>avihaih@nvidia.com</email>
</author>
<published>2020-09-23T16:50:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3ff4de8f6030cbcabb918a920037721bbc38e8f5'/>
<id>urn:sha1:3ff4de8f6030cbcabb918a920037721bbc38e8f5</id>
<content type='text'>
Change the error code returned from rdma_get_gid_attr when the GID entry
is invalid but the GID index is in the gid table size range to -ENODATA
instead of -EINVAL.

This change is done in order to provide a more accurate error reporting to
be used by the new GID query API in user space. Nevertheless, -EINVAL is
still returned from sysfs in the aforementioned case to maintain
compatibility with user space that expects -EINVAL.

Link: https://lore.kernel.org/r/20200923165015.2491894-2-leon@kernel.org
Signed-off-by: Avihai Horon &lt;avihaih@nvidia.com&gt;
Signed-off-by: Leon Romanovsky &lt;leonro@nvidia.com&gt;
Signed-off-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
</content>
</entry>
<entry>
<title>net: core: introduce struct netdev_nested_priv for nested interface infrastructure</title>
<updated>2020-09-28T22:00:15+00:00</updated>
<author>
<name>Taehee Yoo</name>
<email>ap420073@gmail.com</email>
</author>
<published>2020-09-25T18:13:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=eff7423365a6938d2d34dbce989febed2ae1f957'/>
<id>urn:sha1:eff7423365a6938d2d34dbce989febed2ae1f957</id>
<content type='text'>
Functions related to nested interface infrastructure such as
netdev_walk_all_{ upper | lower }_dev() pass both private functions
and "data" pointer to handle their own things.
At this point, the data pointer type is void *.
In order to make it easier to expand common variables and functions,
this new netdev_nested_priv structure is added.

In the following patch, a new member variable will be added into this
struct to fix the lockdep issue.

Signed-off-by: Taehee Yoo &lt;ap420073@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>RDMA/core: Allocate the pkey cache only if the pkey_tbl_len is set</title>
<updated>2020-07-20T19:18:16+00:00</updated>
<author>
<name>Kamal Heib</name>
<email>kamalheib1@gmail.com</email>
</author>
<published>2020-07-14T18:34:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6f38efca9bf09b7e4faf8cc4d221e4336aa303ef'/>
<id>urn:sha1:6f38efca9bf09b7e4faf8cc4d221e4336aa303ef</id>
<content type='text'>
Allocate the pkey cache only if the pkey_tbl_len is set by the provider,
also add checks to avoid accessing the pkey cache when it not initialized.

Link: https://lore.kernel.org/r/20200714183414.61069-3-kamalheib1@gmail.com
Signed-off-by: Kamal Heib &lt;kamalheib1@gmail.com&gt;
Signed-off-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
</content>
</entry>
<entry>
<title>IB/core: Fix potential NULL pointer dereference in pkey cache</title>
<updated>2020-05-12T14:47:48+00:00</updated>
<author>
<name>Jack Morgenstein</name>
<email>jackm@dev.mellanox.co.il</email>
</author>
<published>2020-05-07T07:10:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1901b91f99821955eac2bd48fe25ee983385dc00'/>
<id>urn:sha1:1901b91f99821955eac2bd48fe25ee983385dc00</id>
<content type='text'>
The IB core pkey cache is populated by procedure ib_cache_update().
Initially, the pkey cache pointer is NULL. ib_cache_update allocates a
buffer and populates it with the device's pkeys, via repeated calls to
procedure ib_query_pkey().

If there is a failure in populating the pkey buffer via ib_query_pkey(),
ib_cache_update does not replace the old pkey buffer cache with the
updated one -- it leaves the old cache as is.

Since initially the pkey buffer cache is NULL, when calling
ib_cache_update the first time, a failure in ib_query_pkey() will cause
the pkey buffer cache pointer to remain NULL.

In this situation, any calls subsequent to ib_get_cached_pkey(),
ib_find_cached_pkey(), or ib_find_cached_pkey_exact() will try to
dereference the NULL pkey cache pointer, causing a kernel panic.

Fix this by checking the ib_cache_update() return value.

Fixes: 8faea9fd4a39 ("RDMA/cache: Move the cache per-port data into the main ib_port_data")
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Link: https://lore.kernel.org/r/20200507071012.100594-1-leon@kernel.org
Signed-off-by: Jack Morgenstein &lt;jackm@dev.mellanox.co.il&gt;
Signed-off-by: Leon Romanovsky &lt;leonro@mellanox.com&gt;
Signed-off-by: Jason Gunthorpe &lt;jgg@mellanox.com&gt;
</content>
</entry>
</feed>
