<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/infiniband/core/device.c, branch v6.6.134</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.6.134</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.6.134'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2026-03-04T12:21:01+00:00</updated>
<entry>
<title>RDMA/core: Fix stale RoCE GIDs during netdev events at registration</title>
<updated>2026-03-04T12:21:01+00:00</updated>
<author>
<name>Jiri Pirko</name>
<email>jiri@nvidia.com</email>
</author>
<published>2026-01-27T09:38:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=52d469319ced49ed3bab628ea51e001afcf5ca69'/>
<id>urn:sha1:52d469319ced49ed3bab628ea51e001afcf5ca69</id>
<content type='text'>
[ Upstream commit 9af0feae8016ba58ad7ff784a903404986b395b1 ]

RoCE GID entries become stale when netdev properties change during the
IB device registration window. This is reproducible with a udev rule
that sets a MAC address when a VF netdev appears:

  ACTION=="add", SUBSYSTEM=="net", KERNEL=="eth4", \
    RUN+="/sbin/ip link set eth4 address 88:22:33:44:55:66"

After VF creation, show_gids displays GIDs derived from the original
random MAC rather than the configured one.

The root cause is a race between netdev event processing and device
registration:

  CPU 0 (driver)                    CPU 1 (udev/workqueue)
  ──────────────                    ──────────────────────
  ib_register_device()
    ib_cache_setup_one()
      gid_table_setup_one()
        _gid_table_setup_one()
          ← GID table allocated
        rdma_roce_rescan_device()
          ← GIDs populated with
            OLD MAC
                                    ip link set eth4 addr NEW_MAC
                                    NETDEV_CHANGEADDR queued
                                    netdevice_event_work_handler()
                                      ib_enum_all_roce_netdevs()
                                        ← Iterates DEVICE_REGISTERED
                                        ← Device NOT marked yet, SKIP!
    enable_device_and_get()
      xa_set_mark(DEVICE_REGISTERED)
          ← Too late, event was lost

The netdev event handler uses ib_enum_all_roce_netdevs() which only
iterates devices marked DEVICE_REGISTERED. However, this mark is set
late in the registration process, after the GID cache is already
populated. Events arriving in this window are silently dropped.

Fix this by introducing a new xarray mark DEVICE_GID_UPDATES that is
set immediately after the GID table is allocated and initialized. Use
the new mark in ib_enum_all_roce_netdevs() function to iterate devices
instead of DEVICE_REGISTERED.

This is safe because:
- After _gid_table_setup_one(), all required structures exist (port_data,
  immutable, cache.gid)
- The GID table mutex serializes concurrent access between the initial
  rescan and event handlers
- Event handlers correctly update stale GIDs even when racing with rescan
- The mark is cleared in ib_cache_cleanup_one() before teardown

This also fixes similar races for IP address events (inetaddr_event,
inet6addr_event) which use the same enumeration path.

Fixes: 0df91bb67334 ("RDMA/devices: Use xarray to store the client_data")
Signed-off-by: Jiri Pirko &lt;jiri@nvidia.com&gt;
Link: https://patch.msgid.link/20260127093839.126291-1-jiri@resnulli.us
Reported-by: syzbot+881d65229ca4f9ae8c84@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=881d65229ca4f9ae8c84
Signed-off-by: Leon Romanovsky &lt;leon@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>RDMA/rxe: Remove the direct link to net_device</title>
<updated>2026-01-11T14:22:31+00:00</updated>
<author>
<name>Zhu Yanjun</name>
<email>yanjun.zhu@linux.dev</email>
</author>
<published>2026-01-08T10:05:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=32ca3557d968e662957131374a5f81c9c9cdbba8'/>
<id>urn:sha1:32ca3557d968e662957131374a5f81c9c9cdbba8</id>
<content type='text'>
[ Upstream commit 2ac5415022d16d63d912a39a06f32f1f51140261 ]

The similar patch in siw is in the link:
https://git.kernel.org/rdma/rdma/c/16b87037b48889

This problem also occurred in RXE. The following analyze this problem.
In the following Call Traces:
"
BUG: KASAN: slab-use-after-free in dev_get_flags+0x188/0x1d0 net/core/dev.c:8782
Read of size 4 at addr ffff8880554640b0 by task kworker/1:4/5295

CPU: 1 UID: 0 PID: 5295 Comm: kworker/1:4 Not tainted
6.12.0-rc3-syzkaller-00399-g9197b73fd7bb #0
Hardware name: Google Compute Engine/Google Compute Engine,
BIOS Google 09/13/2024
Workqueue: infiniband ib_cache_event_task
Call Trace:
 &lt;TASK&gt;
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
 print_address_description mm/kasan/report.c:377 [inline]
 print_report+0x169/0x550 mm/kasan/report.c:488
 kasan_report+0x143/0x180 mm/kasan/report.c:601
 dev_get_flags+0x188/0x1d0 net/core/dev.c:8782
 rxe_query_port+0x12d/0x260 drivers/infiniband/sw/rxe/rxe_verbs.c:60
 __ib_query_port drivers/infiniband/core/device.c:2111 [inline]
 ib_query_port+0x168/0x7d0 drivers/infiniband/core/device.c:2143
 ib_cache_update+0x1a9/0xb80 drivers/infiniband/core/cache.c:1494
 ib_cache_event_task+0xf3/0x1e0 drivers/infiniband/core/cache.c:1568
 process_one_work kernel/workqueue.c:3229 [inline]
 process_scheduled_works+0xa65/0x1850 kernel/workqueue.c:3310
 worker_thread+0x870/0xd30 kernel/workqueue.c:3391
 kthread+0x2f2/0x390 kernel/kthread.c:389
 ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
 &lt;/TASK&gt;
"

1). In the link [1],

"
 infiniband syz2: set down
"

This means that on 839.350575, the event ib_cache_event_task was sent andi
queued in ib_wq.

2). In the link [1],

"
 team0 (unregistering): Port device team_slave_0 removed
"

It indicates that before 843.251853, the net device should be freed.

3). In the link [1],

"
 BUG: KASAN: slab-use-after-free in dev_get_flags+0x188/0x1d0
"

This means that on 850.559070, this slab-use-after-free problem occurred.

In all, on 839.350575, the event ib_cache_event_task was sent and queued
in ib_wq,

before 843.251853, the net device veth was freed.

on 850.559070, this event was executed, and the mentioned freed net device
was called. Thus, the above call trace occurred.

[1] https://syzkaller.appspot.com/x/log.txt?x=12e7025f980000

Reported-by: syzbot+4b87489410b4efd181bf@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=4b87489410b4efd181bf
Fixes: 8700e3e7c485 ("Soft RoCE driver")
Signed-off-by: Zhu Yanjun &lt;yanjun.zhu@linux.dev&gt;
Link: https://patch.msgid.link/20241220222325.2487767-1-yanjun.zhu@linux.dev
Signed-off-by: Leon Romanovsky &lt;leon@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
[Shivani: - exported ib_device_get_netdev() function.
          - added ib_device_get_netdev() to ib_verbs.h.]
Signed-off-by: Shivani Agarwal &lt;shivani.agarwal@broadcom.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>RDMA/core: Fix "KASAN: slab-use-after-free Read in ib_register_device" problem</title>
<updated>2026-01-11T14:22:30+00:00</updated>
<author>
<name>Zhu Yanjun</name>
<email>yanjun.zhu@linux.dev</email>
</author>
<published>2025-05-06T15:10:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=17d3103325e891e10994e7aa28d12bea04dc2c60'/>
<id>urn:sha1:17d3103325e891e10994e7aa28d12bea04dc2c60</id>
<content type='text'>
commit d0706bfd3ee40923c001c6827b786a309e2a8713 upstream.

Call Trace:

 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120
 print_address_description mm/kasan/report.c:408 [inline]
 print_report+0xc3/0x670 mm/kasan/report.c:521
 kasan_report+0xe0/0x110 mm/kasan/report.c:634
 strlen+0x93/0xa0 lib/string.c:420
 __fortify_strlen include/linux/fortify-string.h:268 [inline]
 get_kobj_path_length lib/kobject.c:118 [inline]
 kobject_get_path+0x3f/0x2a0 lib/kobject.c:158
 kobject_uevent_env+0x289/0x1870 lib/kobject_uevent.c:545
 ib_register_device drivers/infiniband/core/device.c:1472 [inline]
 ib_register_device+0x8cf/0xe00 drivers/infiniband/core/device.c:1393
 rxe_register_device+0x275/0x320 drivers/infiniband/sw/rxe/rxe_verbs.c:1552
 rxe_net_add+0x8e/0xe0 drivers/infiniband/sw/rxe/rxe_net.c:550
 rxe_newlink+0x70/0x190 drivers/infiniband/sw/rxe/rxe.c:225
 nldev_newlink+0x3a3/0x680 drivers/infiniband/core/nldev.c:1796
 rdma_nl_rcv_msg+0x387/0x6e0 drivers/infiniband/core/netlink.c:195
 rdma_nl_rcv_skb.constprop.0.isra.0+0x2e5/0x450
 netlink_unicast_kernel net/netlink/af_netlink.c:1313 [inline]
 netlink_unicast+0x53a/0x7f0 net/netlink/af_netlink.c:1339
 netlink_sendmsg+0x8d1/0xdd0 net/netlink/af_netlink.c:1883
 sock_sendmsg_nosec net/socket.c:712 [inline]
 __sock_sendmsg net/socket.c:727 [inline]
 ____sys_sendmsg+0xa95/0xc70 net/socket.c:2566
 ___sys_sendmsg+0x134/0x1d0 net/socket.c:2620
 __sys_sendmsg+0x16d/0x220 net/socket.c:2652
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xcd/0x260 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

This problem is similar to the problem that the
commit 1d6a9e7449e2 ("RDMA/core: Fix use-after-free when rename device name")
fixes.

The root cause is: the function ib_device_rename() renames the name with
lock. But in the function kobject_uevent(), this name is accessed without
lock protection at the same time.

The solution is to add the lock protection when this name is accessed in
the function kobject_uevent().

Fixes: 779e0bf47632 ("RDMA/core: Do not indicate device ready when device enablement fails")
Link: https://patch.msgid.link/r/20250506151008.75701-1-yanjun.zhu@linux.dev
Reported-by: syzbot+e2ce9e275ecc70a30b72@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=e2ce9e275ecc70a30b72
Signed-off-by: Zhu Yanjun &lt;yanjun.zhu@linux.dev&gt;
Signed-off-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
[ Ajay: Modified to apply on v5.10.y-v6.6.y
        ib_device_notify_register() not present in v5.10.y-v6.6.y,
        so directly added lock for kobject_uevent() ]
Signed-off-by: Ajay Kaher &lt;ajay.kaher@broadcom.com&gt;
Signed-off-by: Shivani Agarwal &lt;shivani.agarwal@broadcom.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>RDMA/core: Don't expose hw_counters outside of init net namespace</title>
<updated>2025-04-10T12:37:30+00:00</updated>
<author>
<name>Roman Gushchin</name>
<email>roman.gushchin@linux.dev</email>
</author>
<published>2025-02-27T16:54:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=0cf80f924aecb5b2bebd4f4ad11b2efc676a0b78'/>
<id>urn:sha1:0cf80f924aecb5b2bebd4f4ad11b2efc676a0b78</id>
<content type='text'>
[ Upstream commit a1ecb30f90856b0be4168ad51b8875148e285c1f ]

Commit 467f432a521a ("RDMA/core: Split port and device counter sysfs
attributes") accidentally almost exposed hw counters to non-init net
namespaces. It didn't expose them fully, as an attempt to read any of
those counters leads to a crash like this one:

[42021.807566] BUG: kernel NULL pointer dereference, address: 0000000000000028
[42021.814463] #PF: supervisor read access in kernel mode
[42021.819549] #PF: error_code(0x0000) - not-present page
[42021.824636] PGD 0 P4D 0
[42021.827145] Oops: 0000 [#1] SMP PTI
[42021.830598] CPU: 82 PID: 2843922 Comm: switchto-defaul Kdump: loaded Tainted: G S      W I        XXX
[42021.841697] Hardware name: XXX
[42021.849619] RIP: 0010:hw_stat_device_show+0x1e/0x40 [ib_core]
[42021.855362] Code: 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 49 89 d0 4c 8b 5e 20 48 8b 8f b8 04 00 00 48 81 c7 f0 fa ff ff &lt;48&gt; 8b 41 28 48 29 ce 48 83 c6 d0 48 c1 ee 04 69 d6 ab aa aa aa 48
[42021.873931] RSP: 0018:ffff97fe90f03da0 EFLAGS: 00010287
[42021.879108] RAX: ffff9406988a8c60 RBX: ffff940e1072d438 RCX: 0000000000000000
[42021.886169] RDX: ffff94085f1aa000 RSI: ffff93c6cbbdbcb0 RDI: ffff940c7517aef0
[42021.893230] RBP: ffff97fe90f03e70 R08: ffff94085f1aa000 R09: 0000000000000000
[42021.900294] R10: ffff94085f1aa000 R11: ffffffffc0775680 R12: ffffffff87ca2530
[42021.907355] R13: ffff940651602840 R14: ffff93c6cbbdbcb0 R15: ffff94085f1aa000
[42021.914418] FS:  00007fda1a3b9700(0000) GS:ffff94453fb80000(0000) knlGS:0000000000000000
[42021.922423] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[42021.928130] CR2: 0000000000000028 CR3: 00000042dcfb8003 CR4: 00000000003726f0
[42021.935194] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[42021.942257] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[42021.949324] Call Trace:
[42021.951756]  &lt;TASK&gt;
[42021.953842]  [&lt;ffffffff86c58674&gt;] ? show_regs+0x64/0x70
[42021.959030]  [&lt;ffffffff86c58468&gt;] ? __die+0x78/0xc0
[42021.963874]  [&lt;ffffffff86c9ef75&gt;] ? page_fault_oops+0x2b5/0x3b0
[42021.969749]  [&lt;ffffffff87674b92&gt;] ? exc_page_fault+0x1a2/0x3c0
[42021.975549]  [&lt;ffffffff87801326&gt;] ? asm_exc_page_fault+0x26/0x30
[42021.981517]  [&lt;ffffffffc0775680&gt;] ? __pfx_show_hw_stats+0x10/0x10 [ib_core]
[42021.988482]  [&lt;ffffffffc077564e&gt;] ? hw_stat_device_show+0x1e/0x40 [ib_core]
[42021.995438]  [&lt;ffffffff86ac7f8e&gt;] dev_attr_show+0x1e/0x50
[42022.000803]  [&lt;ffffffff86a3eeb1&gt;] sysfs_kf_seq_show+0x81/0xe0
[42022.006508]  [&lt;ffffffff86a11134&gt;] seq_read_iter+0xf4/0x410
[42022.011954]  [&lt;ffffffff869f4b2e&gt;] vfs_read+0x16e/0x2f0
[42022.017058]  [&lt;ffffffff869f50ee&gt;] ksys_read+0x6e/0xe0
[42022.022073]  [&lt;ffffffff8766f1ca&gt;] do_syscall_64+0x6a/0xa0
[42022.027441]  [&lt;ffffffff8780013b&gt;] entry_SYSCALL_64_after_hwframe+0x78/0xe2

The problem can be reproduced using the following steps:
  ip netns add foo
  ip netns exec foo bash
  cat /sys/class/infiniband/mlx4_0/hw_counters/*

The panic occurs because of casting the device pointer into an
ib_device pointer using container_of() in hw_stat_device_show() is
wrong and leads to a memory corruption.

However the real problem is that hw counters should never been exposed
outside of the non-init net namespace.

Fix this by saving the index of the corresponding attribute group
(it might be 1 or 2 depending on the presence of driver-specific
attributes) and zeroing the pointer to hw_counters group for compat
devices during the initialization.

With this fix applied hw_counters are not available in a non-init
net namespace:
  find /sys/class/infiniband/mlx4_0/ -name hw_counters
    /sys/class/infiniband/mlx4_0/ports/1/hw_counters
    /sys/class/infiniband/mlx4_0/ports/2/hw_counters
    /sys/class/infiniband/mlx4_0/hw_counters

  ip netns add foo
  ip netns exec foo bash
  find /sys/class/infiniband/mlx4_0/ -name hw_counters

Fixes: 467f432a521a ("RDMA/core: Split port and device counter sysfs attributes")
Signed-off-by: Roman Gushchin &lt;roman.gushchin@linux.dev&gt;
Cc: Jason Gunthorpe &lt;jgg@ziepe.ca&gt;
Cc: Leon Romanovsky &lt;leon@kernel.org&gt;
Cc: Maher Sanalla &lt;msanalla@nvidia.com&gt;
Cc: linux-rdma@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Link: https://patch.msgid.link/20250227165420.3430301-1-roman.gushchin@linux.dev
Reviewed-by: Parav Pandit &lt;parav@nvidia.com&gt;
Signed-off-by: Leon Romanovsky &lt;leon@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>RDMA: Fix netdev tracker in ib_device_set_netdev</title>
<updated>2024-08-03T06:54:04+00:00</updated>
<author>
<name>David Ahern</name>
<email>dsahern@kernel.org</email>
</author>
<published>2024-07-10T20:33:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ac45a09a19ca90e8932b8ff9cdd8933424bcd07f'/>
<id>urn:sha1:ac45a09a19ca90e8932b8ff9cdd8933424bcd07f</id>
<content type='text'>
[ Upstream commit 2043a14fb3de9d88440b21590f714306fcbbd55f ]

If a netdev has already been assigned, ib_device_set_netdev needs to
release the reference on the older netdev but it is mistakenly being
called for the new netdev. Fix it and in the process use netdev_put
to be symmetrical with the netdev_hold.

Fixes: 09f530f0c6d6 ("RDMA: Add netdevice_tracker to ib_device_set_netdev()")
Signed-off-by: David Ahern &lt;dsahern@kernel.org&gt;
Link: https://lore.kernel.org/r/20240710203310.19317-1-dsahern@kernel.org
Signed-off-by: Leon Romanovsky &lt;leon@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>RDMA/core: Remove NULL check before dev_{put, hold}</title>
<updated>2024-08-03T06:54:03+00:00</updated>
<author>
<name>Jules Irenge</name>
<email>jbi.octave@gmail.com</email>
</author>
<published>2024-04-30T22:47:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=60708cb279f347eef72c609f75dfe14eda72545c'/>
<id>urn:sha1:60708cb279f347eef72c609f75dfe14eda72545c</id>
<content type='text'>
[ Upstream commit 7a1c2abf9a2be7d969b25e8d65567933335ca88e ]

The call netdev_{put, hold} of dev_{put, hold} will check NULL,
so there is no need to check before using dev_{put, hold},
remove it to silence the warning:

./drivers/infiniband/core/nldev.c:375:2-9: WARNING: NULL check before dev_{put, hold} functions is not needed.

Reported-by: Abaci Robot &lt;abaci@linux.alibaba.com&gt;
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=7047
Signed-off-by: Yang Li &lt;yang.lee@linux.alibaba.com&gt;
Link: https://lore.kernel.org/r/20231024003815.89742-1-yang.lee@linux.alibaba.com
Signed-off-by: Leon Romanovsky &lt;leon@kernel.org&gt;
Stable-dep-of: 2043a14fb3de ("RDMA: Fix netdev tracker in ib_device_set_netdev")
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>RDMA/device: Return error earlier if port in not valid</title>
<updated>2024-08-03T06:54:00+00:00</updated>
<author>
<name>Leon Romanovsky</name>
<email>leonro@nvidia.com</email>
</author>
<published>2024-06-24T13:24:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=480d281743ded3a47ce4a80b16c355536beb7db8'/>
<id>urn:sha1:480d281743ded3a47ce4a80b16c355536beb7db8</id>
<content type='text'>
[ Upstream commit 917918f57a7b139c043e78c502876f2c286f4f0a ]

There is no need to allocate port data if port provided is not valid.

Fixes: c2261dd76b54 ("RDMA/device: Add ib_device_set_netdev() as an alternative to get_netdev")
Link: https://lore.kernel.org/r/022047a8b16988fc88d4426da50bf60a4833311b.1719235449.git.leon@kernel.org
Signed-off-by: Leon Romanovsky &lt;leonro@nvidia.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>RDMA/device: Fix a race between mad_client and cm_client init</title>
<updated>2024-03-26T22:20:01+00:00</updated>
<author>
<name>Shifeng Li</name>
<email>lishifeng@sangfor.com.cn</email>
</author>
<published>2024-02-03T03:53:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=034977c33b4ad137f0013745a7313a75a3da0254'/>
<id>urn:sha1:034977c33b4ad137f0013745a7313a75a3da0254</id>
<content type='text'>
[ Upstream commit 7a8bccd8b29c321ac181369b42b04fecf05f98e2 ]

The mad_client will be initialized in enable_device_and_get(), while the
devices_rwsem will be downgraded to a read semaphore. There is a window
that leads to the failed initialization for cm_client, since it can not
get matched mad port from ib_mad_port_list, and the matched mad port will
be added to the list after that.

    mad_client    |                       cm_client
------------------|--------------------------------------------------------
ib_register_device|
enable_device_and_get
down_write(&amp;devices_rwsem)
xa_set_mark(&amp;devices, DEVICE_REGISTERED)
downgrade_write(&amp;devices_rwsem)
                  |
                  |ib_cm_init
                  |ib_register_client(&amp;cm_client)
                  |down_read(&amp;devices_rwsem)
                  |xa_for_each_marked (&amp;devices, DEVICE_REGISTERED)
                  |add_client_context
                  |cm_add_one
                  |ib_register_mad_agent
                  |ib_get_mad_port
                  |__ib_get_mad_port
                  |list_for_each_entry(entry, &amp;ib_mad_port_list, port_list)
                  |return NULL
                  |up_read(&amp;devices_rwsem)
                  |
add_client_context|
ib_mad_init_device|
ib_mad_port_open  |
list_add_tail(&amp;port_priv-&gt;port_list, &amp;ib_mad_port_list)
up_read(&amp;devices_rwsem)
                  |

Fix it by using down_write(&amp;devices_rwsem) in ib_register_client().

Fixes: d0899892edd0 ("RDMA/device: Provide APIs from the core code to help unregistration")
Link: https://lore.kernel.org/r/20240203035313.98991-1-lishifeng@sangfor.com.cn
Suggested-by: Jason Gunthorpe &lt;jgg@ziepe.ca&gt;
Signed-off-by: Shifeng Li &lt;lishifeng@sangfor.com.cn&gt;
Signed-off-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>RDMA/core: Use size_{add,sub,mul}() in calls to struct_size()</title>
<updated>2023-11-20T10:59:18+00:00</updated>
<author>
<name>Gustavo A. R. Silva</name>
<email>gustavoars@kernel.org</email>
</author>
<published>2023-09-17T21:21:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=119d4350a535bf6aea3ec60d548840bc1602e45a'/>
<id>urn:sha1:119d4350a535bf6aea3ec60d548840bc1602e45a</id>
<content type='text'>
[ Upstream commit 81760bedc65194ff38e1e4faefd5f9f0c95c19a4 ]

If, for any reason, the open-coded arithmetic causes a wraparound,
the protection that `struct_size()` provides against potential integer
overflows is defeated. Fix this by hardening calls to `struct_size()`
with `size_add()`, `size_sub()` and `size_mul()`.

Fixes: 467f432a521a ("RDMA/core: Split port and device counter sysfs attributes")
Fixes: a4676388e2e2 ("RDMA/core: Simplify how the gid_attrs sysfs is created")
Fixes: e9dd5daf884c ("IB/umad: Refactor code to use cdev_device_add()")
Fixes: 324e227ea7c9 ("RDMA/device: Add ib_device_get_by_netdev()")
Fixes: 5aad26a7eac5 ("IB/core: Use struct_size() in kzalloc()")
Signed-off-by: Gustavo A. R. Silva &lt;gustavoars@kernel.org&gt;
Link: https://lore.kernel.org/r/ZQdt4NsJFwwOYxUR@work
Signed-off-by: Leon Romanovsky &lt;leon@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'driver-core-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core</title>
<updated>2022-12-16T11:54:54+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2022-12-16T11:54:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=71a7507afbc3f27c346898f13ab9bfd918613c34'/>
<id>urn:sha1:71a7507afbc3f27c346898f13ab9bfd918613c34</id>
<content type='text'>
Pull driver core updates from Greg KH:
 "Here is the set of driver core and kernfs changes for 6.2-rc1.

  The "big" change in here is the addition of a new macro,
  container_of_const() that will preserve the "const-ness" of a pointer
  passed into it.

  The "problem" of the current container_of() macro is that if you pass
  in a "const *", out of it can comes a non-const pointer unless you
  specifically ask for it. For many usages, we want to preserve the
  "const" attribute by using the same call. For a specific example, this
  series changes the kobj_to_dev() macro to use it, allowing it to be
  used no matter what the const value is. This prevents every subsystem
  from having to declare 2 different individual macros (i.e.
  kobj_const_to_dev() and kobj_to_dev()) and having the compiler enforce
  the const value at build time, which having 2 macros would not do
  either.

  The driver for all of this have been discussions with the Rust kernel
  developers as to how to properly mark driver core, and kobject,
  objects as being "non-mutable". The changes to the kobject and driver
  core in this pull request are the result of that, as there are lots of
  paths where kobjects and device pointers are not modified at all, so
  marking them as "const" allows the compiler to enforce this.

  So, a nice side affect of the Rust development effort has been already
  to clean up the driver core code to be more obvious about object
  rules.

  All of this has been bike-shedded in quite a lot of detail on lkml
  with different names and implementations resulting in the tiny version
  we have in here, much better than my original proposal. Lots of
  subsystem maintainers have acked the changes as well.

  Other than this change, included in here are smaller stuff like:

   - kernfs fixes and updates to handle lock contention better

   - vmlinux.lds.h fixes and updates

   - sysfs and debugfs documentation updates

   - device property updates

  All of these have been in the linux-next tree for quite a while with
  no problems"

* tag 'driver-core-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (58 commits)
  device property: Fix documentation for fwnode_get_next_parent()
  firmware_loader: fix up to_fw_sysfs() to preserve const
  usb.h: take advantage of container_of_const()
  device.h: move kobj_to_dev() to use container_of_const()
  container_of: add container_of_const() that preserves const-ness of the pointer
  driver core: fix up missed drivers/s390/char/hmcdrv_dev.c class.devnode() conversion.
  driver core: fix up missed scsi/cxlflash class.devnode() conversion.
  driver core: fix up some missing class.devnode() conversions.
  driver core: make struct class.devnode() take a const *
  driver core: make struct class.dev_uevent() take a const *
  cacheinfo: Remove of_node_put() for fw_token
  device property: Add a blank line in Kconfig of tests
  device property: Rename goto label to be more precise
  device property: Move PROPERTY_ENTRY_BOOL() a bit down
  device property: Get rid of __PROPERTY_ENTRY_ARRAY_EL*SIZE*()
  kernfs: fix all kernel-doc warnings and multiple typos
  driver core: pass a const * into of_device_uevent()
  kobject: kset_uevent_ops: make name() callback take a const *
  kobject: kset_uevent_ops: make filter() callback take a const *
  kobject: make kobject_namespace take a const *
  ...
</content>
</entry>
</feed>
