<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/include/rdma/ib_verbs.h, branch v6.12.80</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.12.80</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.12.80'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2025-04-10T12:39:19+00:00</updated>
<entry>
<title>RDMA/core: Don't expose hw_counters outside of init net namespace</title>
<updated>2025-04-10T12:39:19+00:00</updated>
<author>
<name>Roman Gushchin</name>
<email>roman.gushchin@linux.dev</email>
</author>
<published>2025-02-27T16:54:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=df45ae2a4f1cdfda00c032839e12092e1f32c05e'/>
<id>urn:sha1:df45ae2a4f1cdfda00c032839e12092e1f32c05e</id>
<content type='text'>
[ Upstream commit a1ecb30f90856b0be4168ad51b8875148e285c1f ]

Commit 467f432a521a ("RDMA/core: Split port and device counter sysfs
attributes") accidentally almost exposed hw counters to non-init net
namespaces. It didn't expose them fully, as an attempt to read any of
those counters leads to a crash like this one:

[42021.807566] BUG: kernel NULL pointer dereference, address: 0000000000000028
[42021.814463] #PF: supervisor read access in kernel mode
[42021.819549] #PF: error_code(0x0000) - not-present page
[42021.824636] PGD 0 P4D 0
[42021.827145] Oops: 0000 [#1] SMP PTI
[42021.830598] CPU: 82 PID: 2843922 Comm: switchto-defaul Kdump: loaded Tainted: G S      W I        XXX
[42021.841697] Hardware name: XXX
[42021.849619] RIP: 0010:hw_stat_device_show+0x1e/0x40 [ib_core]
[42021.855362] Code: 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 49 89 d0 4c 8b 5e 20 48 8b 8f b8 04 00 00 48 81 c7 f0 fa ff ff &lt;48&gt; 8b 41 28 48 29 ce 48 83 c6 d0 48 c1 ee 04 69 d6 ab aa aa aa 48
[42021.873931] RSP: 0018:ffff97fe90f03da0 EFLAGS: 00010287
[42021.879108] RAX: ffff9406988a8c60 RBX: ffff940e1072d438 RCX: 0000000000000000
[42021.886169] RDX: ffff94085f1aa000 RSI: ffff93c6cbbdbcb0 RDI: ffff940c7517aef0
[42021.893230] RBP: ffff97fe90f03e70 R08: ffff94085f1aa000 R09: 0000000000000000
[42021.900294] R10: ffff94085f1aa000 R11: ffffffffc0775680 R12: ffffffff87ca2530
[42021.907355] R13: ffff940651602840 R14: ffff93c6cbbdbcb0 R15: ffff94085f1aa000
[42021.914418] FS:  00007fda1a3b9700(0000) GS:ffff94453fb80000(0000) knlGS:0000000000000000
[42021.922423] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[42021.928130] CR2: 0000000000000028 CR3: 00000042dcfb8003 CR4: 00000000003726f0
[42021.935194] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[42021.942257] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[42021.949324] Call Trace:
[42021.951756]  &lt;TASK&gt;
[42021.953842]  [&lt;ffffffff86c58674&gt;] ? show_regs+0x64/0x70
[42021.959030]  [&lt;ffffffff86c58468&gt;] ? __die+0x78/0xc0
[42021.963874]  [&lt;ffffffff86c9ef75&gt;] ? page_fault_oops+0x2b5/0x3b0
[42021.969749]  [&lt;ffffffff87674b92&gt;] ? exc_page_fault+0x1a2/0x3c0
[42021.975549]  [&lt;ffffffff87801326&gt;] ? asm_exc_page_fault+0x26/0x30
[42021.981517]  [&lt;ffffffffc0775680&gt;] ? __pfx_show_hw_stats+0x10/0x10 [ib_core]
[42021.988482]  [&lt;ffffffffc077564e&gt;] ? hw_stat_device_show+0x1e/0x40 [ib_core]
[42021.995438]  [&lt;ffffffff86ac7f8e&gt;] dev_attr_show+0x1e/0x50
[42022.000803]  [&lt;ffffffff86a3eeb1&gt;] sysfs_kf_seq_show+0x81/0xe0
[42022.006508]  [&lt;ffffffff86a11134&gt;] seq_read_iter+0xf4/0x410
[42022.011954]  [&lt;ffffffff869f4b2e&gt;] vfs_read+0x16e/0x2f0
[42022.017058]  [&lt;ffffffff869f50ee&gt;] ksys_read+0x6e/0xe0
[42022.022073]  [&lt;ffffffff8766f1ca&gt;] do_syscall_64+0x6a/0xa0
[42022.027441]  [&lt;ffffffff8780013b&gt;] entry_SYSCALL_64_after_hwframe+0x78/0xe2

The problem can be reproduced using the following steps:
  ip netns add foo
  ip netns exec foo bash
  cat /sys/class/infiniband/mlx4_0/hw_counters/*

The panic occurs because of casting the device pointer into an
ib_device pointer using container_of() in hw_stat_device_show() is
wrong and leads to a memory corruption.

However the real problem is that hw counters should never been exposed
outside of the non-init net namespace.

Fix this by saving the index of the corresponding attribute group
(it might be 1 or 2 depending on the presence of driver-specific
attributes) and zeroing the pointer to hw_counters group for compat
devices during the initialization.

With this fix applied hw_counters are not available in a non-init
net namespace:
  find /sys/class/infiniband/mlx4_0/ -name hw_counters
    /sys/class/infiniband/mlx4_0/ports/1/hw_counters
    /sys/class/infiniband/mlx4_0/ports/2/hw_counters
    /sys/class/infiniband/mlx4_0/hw_counters

  ip netns add foo
  ip netns exec foo bash
  find /sys/class/infiniband/mlx4_0/ -name hw_counters

Fixes: 467f432a521a ("RDMA/core: Split port and device counter sysfs attributes")
Signed-off-by: Roman Gushchin &lt;roman.gushchin@linux.dev&gt;
Cc: Jason Gunthorpe &lt;jgg@ziepe.ca&gt;
Cc: Leon Romanovsky &lt;leon@kernel.org&gt;
Cc: Maher Sanalla &lt;msanalla@nvidia.com&gt;
Cc: linux-rdma@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Link: https://patch.msgid.link/20250227165420.3430301-1-roman.gushchin@linux.dev
Reviewed-by: Parav Pandit &lt;parav@nvidia.com&gt;
Signed-off-by: Leon Romanovsky &lt;leon@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>RDMA/core: Implement RoCE GID port rescan and export delete function</title>
<updated>2024-12-05T13:02:08+00:00</updated>
<author>
<name>Chiara Meiohas</name>
<email>cmeiohas@nvidia.com</email>
</author>
<published>2024-10-31T13:36:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3a03f5f2e05cf30501c7b3258aba7e3158e5aa3b'/>
<id>urn:sha1:3a03f5f2e05cf30501c7b3258aba7e3158e5aa3b</id>
<content type='text'>
[ Upstream commit af7a35bf6c36a77624d3abe46b3830b7c2a5f20c ]

rdma_roce_rescan_port() scans all network devices in
the system and adds the gids if relevant to the RoCE device
port. When not in bonding mode it adds the GIDs of the
netdevice in this port. When in bonding mode it adds the
GIDs of both the port's netdevice and the bond master
netdevice.

Export roce_del_all_netdev_gids(), which  removes all GIDs
associated with a specific netdevice for a given port.

Signed-off-by: Chiara Meiohas &lt;cmeiohas@nvidia.com&gt;
Link: https://patch.msgid.link/674d498da4637a1503ff1367e28bd09ff942fd5e.1730381292.git.leon@kernel.org
Signed-off-by: Leon Romanovsky &lt;leon@kernel.org&gt;
Stable-dep-of: 0bd2c61df953 ("RDMA/mlx5: Ensure active slave attachment to the bond IB device")
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>RDMA/core: Provide rdma_user_mmap_disassociate() to disassociate mmap pages</title>
<updated>2024-12-05T13:02:02+00:00</updated>
<author>
<name>Chengchang Tang</name>
<email>tangchengchang@huawei.com</email>
</author>
<published>2024-09-27T10:33:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=86092a4e46d161d24f8106df082e24b8efe87382'/>
<id>urn:sha1:86092a4e46d161d24f8106df082e24b8efe87382</id>
<content type='text'>
[ Upstream commit 51976c6cd786151b6a1bdf8b8b3334beac0ba99c ]

Provide a new api rdma_user_mmap_disassociate() for drivers to
disassociate mmap pages for a device.

Since drivers can now disassociate mmaps by calling this api,
introduce a new disassociation_lock to specifically prevent
races between this disassociation process and new mmaps. And
thus the old hw_destroy_rwsem is not needed in this api.

Signed-off-by: Chengchang Tang &lt;tangchengchang@huawei.com&gt;
Signed-off-by: Junxian Huang &lt;huangjunxian6@hisilicon.com&gt;
Link: https://patch.msgid.link/20240927103323.1897094-2-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky &lt;leon@kernel.org&gt;
Stable-dep-of: 615b94746a54 ("RDMA/hns: Disassociate mmap pages for all uctx when HW is being reset")
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>RDMA/mlx5: Use IB set_netdev and get_netdev functions</title>
<updated>2024-09-13T05:27:40+00:00</updated>
<author>
<name>Chiara Meiohas</name>
<email>cmeiohas@nvidia.com</email>
</author>
<published>2024-09-09T17:30:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8d159eb2117b2e3697a31785662b653938f007cb'/>
<id>urn:sha1:8d159eb2117b2e3697a31785662b653938f007cb</id>
<content type='text'>
The IB layer provides a common interface to store and get net
devices associated to an IB device port (ib_device_set_netdev()
and ib_device_get_netdev()).
Previously, mlx5_ib stored and managed the associated net devices
internally.

Replace internal net device management in mlx5_ib with
ib_device_set_netdev() when attaching/detaching  a net device and
ib_device_get_netdev() when retrieving the net device.

Export ib_device_get_netdev().

For mlx5 representors/PFs/VFs and lag creation we replace the netdev
assignments with the IB set/get netdev functions.

In active-backup mode lag the active slave net device is stored in the
lag itself. To assure the net device stored in a lag bond IB device is
the active slave we implement the following:
- mlx5_core: when modifying the slave of a bond we send the internal driver event
  MLX5_DRIVER_EVENT_ACTIVE_BACKUP_LAG_CHANGE_LOWERSTATE.
- mlx5_ib: when catching the event call ib_device_set_netdev()

This patch also ensures the correct IB events are sent in switchdev lag.

While at it, when in multiport eswitch mode, only a single IB device is
created for all ports. The said IB device will receive all netdev events
of its VFs once loaded, thus to avoid overwriting the mapping of PF IB
device to PF netdev, ignore NETDEV_REGISTER events if the ib device has
already been mapped to a netdev.

Signed-off-by: Chiara Meiohas &lt;cmeiohas@nvidia.com&gt;
Signed-off-by: Michael Guralnik &lt;michaelgur@nvidia.com&gt;
Link: https://patch.msgid.link/20240909173025.30422-6-michaelgur@nvidia.com
Signed-off-by: Leon Romanovsky &lt;leon@kernel.org&gt;
</content>
</entry>
<entry>
<title>RDMA: Pass uverbs_attr_bundle as part of '.reg_user_mr_dmabuf' API</title>
<updated>2024-08-11T08:12:50+00:00</updated>
<author>
<name>Yishai Hadas</name>
<email>yishaih@nvidia.com</email>
</author>
<published>2024-08-01T12:05:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3aa73c6b795b9aaaf933f3c95495d85fc0de39e3'/>
<id>urn:sha1:3aa73c6b795b9aaaf933f3c95495d85fc0de39e3</id>
<content type='text'>
Pass uverbs_attr_bundle as part of '.reg_user_mr_dmabuf' API instead of
udata.

This enables passing some new ioctl attributes to the drivers, as will
be introduced in the next patches for mlx5 driver.

Change the involved drivers accordingly.

Signed-off-by: Yishai Hadas &lt;yishaih@nvidia.com&gt;
Link: https://patch.msgid.link/9a25b2fc02443f7c36c2d93499ae25252b6afd40.1722512548.git.leon@kernel.org
Signed-off-by: Leon Romanovsky &lt;leon@kernel.org&gt;
</content>
</entry>
<entry>
<title>RDMA/core: Introduce "name_assign_type" for an IB device</title>
<updated>2024-07-04T04:59:53+00:00</updated>
<author>
<name>Mark Zhang</name>
<email>markzhang@nvidia.com</email>
</author>
<published>2024-07-01T12:40:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=af48f95492dc1af36d9636a750ec492035c0ed7d'/>
<id>urn:sha1:af48f95492dc1af36d9636a750ec492035c0ed7d</id>
<content type='text'>
The name_assign_type indicates how the name is provided. Currently
these types are supported:
- RDMA_NAME_ASSIGN_TYPE_UNKNOWN: Unknown or not set;
- RDMA_NAME_ASSIGN_TYPE_USER: Name is provided by the user; The
  user-created sub device, rxe and siw device has this type.

When filling nl device info, it is set in the new attribute
RDMA_NLDEV_ATTR_NAME_ASSIGN_TYPE. User-space tools like udev
"rdma_rename" could check this attribute to determine if this
device needs to be renamed or not.

Signed-off-by: Mark Zhang &lt;markzhang@nvidia.com&gt;
Link: https://lore.kernel.org/r/522591bef9a369cc8e5dcb77787e017bffee37fe.1719837610.git.leon@kernel.org
Signed-off-by: Leon Romanovsky &lt;leon@kernel.org&gt;
</content>
</entry>
<entry>
<title>RDMA: Set type of rdma_ah to IB for a SMI sub device</title>
<updated>2024-07-01T12:38:04+00:00</updated>
<author>
<name>Mark Zhang</name>
<email>markzhang@nvidia.com</email>
</author>
<published>2024-06-16T16:08:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=36e97bbc2dca61751c5b4b7f248cd850a7654a19'/>
<id>urn:sha1:36e97bbc2dca61751c5b4b7f248cd850a7654a19</id>
<content type='text'>
An address handle created on a SMI port has type IB, as a SMI
port it's used for SMI management through umad.

Signed-off-by: Mark Zhang &lt;markzhang@nvidia.com&gt;
Link: https://lore.kernel.org/r/195be77aae0cce93522269f22f1303d2ccbef605.1718553901.git.leon@kernel.org
Signed-off-by: Leon Romanovsky &lt;leonro@nvidia.com&gt;
</content>
</entry>
<entry>
<title>RDMA/core: Support IB sub device with type "SMI"</title>
<updated>2024-07-01T12:38:04+00:00</updated>
<author>
<name>Mark Zhang</name>
<email>markzhang@nvidia.com</email>
</author>
<published>2024-06-16T16:08:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=bca51197620a257e2954be99b16f05115c3b2630'/>
<id>urn:sha1:bca51197620a257e2954be99b16f05115c3b2630</id>
<content type='text'>
This patch adds 2 APIs, as well as driver operations to support adding
and deleting an IB sub device, which provides part of functionalities
of it's parent.

A sub device has a type; for a sub device with type "SMI", it provides
the smi capability through umad for its parent, meaning uverb is not
supported.

A sub device cannot live without a parent. So when a parent is
released, all it's sub devices are released as well.

Signed-off-by: Mark Zhang &lt;markzhang@nvidia.com&gt;
Link: https://lore.kernel.org/r/44253f7508b21eb2caefea3980c2bc072869116c.1718553901.git.leon@kernel.org
Signed-off-by: Leon Romanovsky &lt;leonro@nvidia.com&gt;
</content>
</entry>
<entry>
<title>RDMA: Pass entire uverbs attr bundle to create cq function</title>
<updated>2024-06-27T19:28:21+00:00</updated>
<author>
<name>Akiva Goldberger</name>
<email>agoldberger@nvidia.com</email>
</author>
<published>2024-06-27T18:23:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=dd6d7f8574d7f8b6a0bf1aeef0b285d2706b8c2a'/>
<id>urn:sha1:dd6d7f8574d7f8b6a0bf1aeef0b285d2706b8c2a</id>
<content type='text'>
Changes the create_cq verb signature by sending the entire uverbs attr
bundle as a parameter. This allows drivers to send driver specific attrs
through ioctl for the create_cq verb and access them in their driver
specific code.

Also adds a new enum value for driver specific ioctl attributes for
methods already supporting UHW.

Link: https://lore.kernel.org/r/ed147343987c0d43fd391c1b2f85e2f425747387.1719512393.git.leon@kernel.org
Signed-off-by: Akiva Goldberger &lt;agoldberger@nvidia.com&gt;
Signed-off-by: Leon Romanovsky &lt;leonro@nvidia.com&gt;
Signed-off-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
</content>
</entry>
<entry>
<title>IB/core: add support for draining Shared receive queues</title>
<updated>2024-06-26T13:53:29+00:00</updated>
<author>
<name>Max Gurtovoy</name>
<email>mgurtovoy@nvidia.com</email>
</author>
<published>2024-06-19T17:11:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=844bc12e6da3e2d8ab0cd96d049fc695d5d8ba68'/>
<id>urn:sha1:844bc12e6da3e2d8ab0cd96d049fc695d5d8ba68</id>
<content type='text'>
To avoid leakage for QPs assocoated with SRQ, according to IB spec
(section 10.3.1):

"Note, for QPs that are associated with an SRQ, the Consumer should take
the QP through the Error State before invoking a Destroy QP or a Modify
QP to the Reset State. The Consumer may invoke the Destroy QP without
first performing a Modify QP to the Error State and waiting for the Affiliated
Asynchronous Last WQE Reached Event. However, if the Consumer
does not wait for the Affiliated Asynchronous Last WQE Reached Event,
then WQE and Data Segment leakage may occur. Therefore, it is good
programming practice to teardown a QP that is associated with an SRQ
by using the following process:
 - Put the QP in the Error State;
 - wait for the Affiliated Asynchronous Last WQE Reached Event;
 - either:
   - drain the CQ by invoking the Poll CQ verb and either wait for CQ
     to be empty or the number of Poll CQ operations has exceeded
     CQ capacity size; or
   - post another WR that completes on the same CQ and wait for this
     WR to return as a WC;
 - and then invoke a Destroy QP or Reset QP."

Catch the Last WQE Reached Event in the core layer during drain QP flow.

Signed-off-by: Max Gurtovoy &lt;mgurtovoy@nvidia.com&gt;
Link: https://lore.kernel.org/r/20240619171153.34631-2-mgurtovoy@nvidia.com
Reviewed-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
Signed-off-by: Leon Romanovsky &lt;leon@kernel.org&gt;
Signed-off-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
</content>
</entry>
</feed>
