<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/net/ethernet/microsoft, branch v6.19.11</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.19.11</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.19.11'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2026-03-25T10:13:26+00:00</updated>
<entry>
<title>net: mana: fix use-after-free in mana_hwc_destroy_channel() by reordering teardown</title>
<updated>2026-03-25T10:13:26+00:00</updated>
<author>
<name>Dipayaan Roy</name>
<email>dipayanroy@linux.microsoft.com</email>
</author>
<published>2026-03-11T19:22:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=05d345719d85b927cba74afac4d5322de3aa4256'/>
<id>urn:sha1:05d345719d85b927cba74afac4d5322de3aa4256</id>
<content type='text'>
[ Upstream commit fa103fc8f56954a60699a29215cb713448a39e87 ]

A potential race condition exists in mana_hwc_destroy_channel() where
hwc-&gt;caller_ctx is freed before the HWC's Completion Queue (CQ) and
Event Queue (EQ) are destroyed. This allows an in-flight CQ interrupt
handler to dereference freed memory, leading to a use-after-free or
NULL pointer dereference in mana_hwc_handle_resp().

mana_smc_teardown_hwc() signals the hardware to stop but does not
synchronize against IRQ handlers already executing on other CPUs. The
IRQ synchronization only happens in mana_hwc_destroy_cq() via
mana_gd_destroy_eq() -&gt; mana_gd_deregister_irq(). Since this runs
after kfree(hwc-&gt;caller_ctx), a concurrent mana_hwc_rx_event_handler()
can dereference freed caller_ctx (and rxq-&gt;msg_buf) in
mana_hwc_handle_resp().

Fix this by reordering teardown to reverse-of-creation order: destroy
the TX/RX work queues and CQ/EQ before freeing hwc-&gt;caller_ctx. This
ensures all in-flight interrupt handlers complete before the memory they
access is freed.

Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
Reviewed-by: Haiyang Zhang &lt;haiyangz@microsoft.com&gt;
Signed-off-by: Dipayaan Roy &lt;dipayanroy@linux.microsoft.com&gt;
Reviewed-by: Simon Horman &lt;horms@kernel.org&gt;
Link: https://patch.msgid.link/abHA3AjNtqa1nx9k@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: mana: Ring doorbell at 4 CQ wraparounds</title>
<updated>2026-03-19T15:15:18+00:00</updated>
<author>
<name>Long Li</name>
<email>longli@microsoft.com</email>
</author>
<published>2026-02-26T19:28:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e07c00d20342d4af174e7f9094219b5e22635fee'/>
<id>urn:sha1:e07c00d20342d4af174e7f9094219b5e22635fee</id>
<content type='text'>
commit dabffd08545ffa1d7183bc45e387860984025291 upstream.

MANA hardware requires at least one doorbell ring every 8 wraparounds
of the CQ. The driver rings the doorbell as a form of flow control to
inform hardware that CQEs have been consumed.

The NAPI poll functions mana_poll_tx_cq() and mana_poll_rx_cq() can
poll up to CQE_POLLING_BUFFER (512) completions per call. If the CQ
has fewer than 512 entries, a single poll call can process more than
4 wraparounds without ringing the doorbell. The doorbell threshold
check also uses "&gt;" instead of "&gt;=", delaying the ring by one extra
CQE beyond 4 wraparounds. Combined, these issues can cause the driver
to exceed the 8-wraparound hardware limit, leading to missed
completions and stalled queues.

Fix this by capping the number of CQEs polled per call to 4 wraparounds
of the CQ in both TX and RX paths. Also change the doorbell threshold
from "&gt;" to "&gt;=" so the doorbell is rung as soon as 4 wraparounds are
reached.

Cc: stable@vger.kernel.org
Fixes: 58a63729c957 ("net: mana: Fix doorbell out of order violation and avoid unnecessary doorbell rings")
Signed-off-by: Long Li &lt;longli@microsoft.com&gt;
Reviewed-by: Haiyang Zhang &lt;haiyangz@microsoft.com&gt;
Reviewed-by: Vadim Fedorenko &lt;vadim.fedorenko@linux.dev&gt;
Link: https://patch.msgid.link/20260226192833.1050807-1-longli@microsoft.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>net/mana: Null service_wq on setup error to prevent double destroy</title>
<updated>2026-03-19T15:14:58+00:00</updated>
<author>
<name>Shiraz Saleem</name>
<email>shirazsaleem@microsoft.com</email>
</author>
<published>2026-03-09T17:24:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6c92392602b451e3869f15ab685f8f650e942b13'/>
<id>urn:sha1:6c92392602b451e3869f15ab685f8f650e942b13</id>
<content type='text'>
[ Upstream commit 87c2302813abc55c46485711a678e3c312b00666 ]

In mana_gd_setup() error path, set gc-&gt;service_wq to NULL after
destroy_workqueue() to match the cleanup in mana_gd_cleanup().
This prevents a use-after-free if the workqueue pointer is checked
after a failed setup.

Fixes: f975a0955276 ("net: mana: Fix double destroy_workqueue on service rescan PCI path")
Signed-off-by: Shiraz Saleem &lt;shirazsaleem@microsoft.com&gt;
Signed-off-by: Konstantin Taranov &lt;kotaranov@microsoft.com&gt;
Reviewed-by: Simon Horman &lt;horms@kernel.org&gt;
Link: https://patch.msgid.link/20260309172443.688392-1-kotaranov@linux.microsoft.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: mana: Fix double destroy_workqueue on service rescan PCI path</title>
<updated>2026-03-04T12:20:57+00:00</updated>
<author>
<name>Dipayaan Roy</name>
<email>dipayanroy@linux.microsoft.com</email>
</author>
<published>2026-02-24T12:38:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a9a7c3203fdc4d4a8d8a7a3b1ed05d2bb4c6e77e'/>
<id>urn:sha1:a9a7c3203fdc4d4a8d8a7a3b1ed05d2bb4c6e77e</id>
<content type='text'>
[ Upstream commit f975a0955276579e2176a134366ed586071c7c6a ]

While testing corner cases in the driver, a use-after-free crash
was found on the service rescan PCI path.

When mana_serv_reset() calls mana_gd_suspend(), mana_gd_cleanup()
destroys gc-&gt;service_wq. If the subsequent mana_gd_resume() fails
with -ETIMEDOUT or -EPROTO, the code falls through to
mana_serv_rescan() which triggers pci_stop_and_remove_bus_device().
This invokes the PCI .remove callback (mana_gd_remove), which calls
mana_gd_cleanup() a second time, attempting to destroy the already-
freed workqueue. Fix this by NULL-checking gc-&gt;service_wq in
mana_gd_cleanup() and setting it to NULL after destruction.

Call stack of issue for reference:
[Sat Feb 21 18:53:48 2026] Call Trace:
[Sat Feb 21 18:53:48 2026]  &lt;TASK&gt;
[Sat Feb 21 18:53:48 2026]  mana_gd_cleanup+0x33/0x70 [mana]
[Sat Feb 21 18:53:48 2026]  mana_gd_remove+0x3a/0xc0 [mana]
[Sat Feb 21 18:53:48 2026]  pci_device_remove+0x41/0xb0
[Sat Feb 21 18:53:48 2026]  device_remove+0x46/0x70
[Sat Feb 21 18:53:48 2026]  device_release_driver_internal+0x1e3/0x250
[Sat Feb 21 18:53:48 2026]  device_release_driver+0x12/0x20
[Sat Feb 21 18:53:48 2026]  pci_stop_bus_device+0x6a/0x90
[Sat Feb 21 18:53:48 2026]  pci_stop_and_remove_bus_device+0x13/0x30
[Sat Feb 21 18:53:48 2026]  mana_do_service+0x180/0x290 [mana]
[Sat Feb 21 18:53:48 2026]  mana_serv_func+0x24/0x50 [mana]
[Sat Feb 21 18:53:48 2026]  process_one_work+0x190/0x3d0
[Sat Feb 21 18:53:48 2026]  worker_thread+0x16e/0x2e0
[Sat Feb 21 18:53:48 2026]  kthread+0xf7/0x130
[Sat Feb 21 18:53:48 2026]  ? __pfx_worker_thread+0x10/0x10
[Sat Feb 21 18:53:48 2026]  ? __pfx_kthread+0x10/0x10
[Sat Feb 21 18:53:48 2026]  ret_from_fork+0x269/0x350
[Sat Feb 21 18:53:48 2026]  ? __pfx_kthread+0x10/0x10
[Sat Feb 21 18:53:48 2026]  ret_from_fork_asm+0x1a/0x30
[Sat Feb 21 18:53:48 2026]  &lt;/TASK&gt;

Fixes: 505cc26bcae0 ("net: mana: Add support for auxiliary device servicing events")
Reviewed-by: Haiyang Zhang &lt;haiyangz@microsoft.com&gt;
Signed-off-by: Dipayaan Roy &lt;dipayanroy@linux.microsoft.com&gt;
Reviewed-by: Simon Horman &lt;horms@kernel.org&gt;
Link: https://patch.msgid.link/aZ2bzL64NagfyHpg@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: mana: Fix use-after-free in reset service rescan path</title>
<updated>2025-12-28T09:34:00+00:00</updated>
<author>
<name>Dipayaan Roy</name>
<email>dipayanroy@linux.microsoft.com</email>
</author>
<published>2025-12-18T13:10:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3387a7ad478b46970ae8254049167d166e398aeb'/>
<id>urn:sha1:3387a7ad478b46970ae8254049167d166e398aeb</id>
<content type='text'>
When mana_serv_reset() encounters -ETIMEDOUT or -EPROTO from
mana_gd_resume(), it performs a PCI rescan via mana_serv_rescan().

mana_serv_rescan() calls pci_stop_and_remove_bus_device(), which can
invoke the driver's remove path and free the gdma_context associated
with the device. After returning, mana_serv_reset() currently jumps to
the out label and attempts to clear gc-&gt;in_service, dereferencing a
freed gdma_context.

The issue was observed with the following call logs:
[  698.942636] BUG: unable to handle page fault for address: ff6c2b638088508d
[  698.943121] #PF: supervisor write access in kernel mode
[  698.943423] #PF: error_code(0x0002) - not-present page
[S[  698.943793] Pat Dec  6 07:GD5 100000067 P4D 1002f7067 PUD 1002f8067 PMD 101bef067 PTE 0
0:56 2025] hv_[n e 698.944283] Oops: Oops: 0002 [#1] SMP NOPTI
tvsc f8615163-00[  698.944611] CPU: 28 UID: 0 PID: 249 Comm: kworker/28:1
...
[Sat Dec  6 07:50:56 2025] R10: [  699.121594] mana 7870:00:00.0 enP30832s1: Configured vPort 0 PD 18 DB 16
000000000000001b R11: 0000000000000000 R12: ff44cf3f40270000
[Sat Dec  6 07:50:56 2025] R13: 0000000000000001 R14: ff44cf3f402700c8 R15: ff44cf3f4021b405
[Sat Dec  6 07:50:56 2025] FS:  0000000000000000(0000) GS:ff44cf7e9fcf9000(0000) knlGS:0000000000000000
[Sat Dec  6 07:50:56 2025] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Sat Dec  6 07:50:56 2025] CR2: ff6c2b638088508d CR3: 000000011fe43001 CR4: 0000000000b73ef0
[Sat Dec  6 07:50:56 2025] Call Trace:
[Sat Dec  6 07:50:56 2025]  &lt;TASK&gt;
[Sat Dec  6 07:50:56 2025]  mana_serv_func+0x24/0x50 [mana]
[Sat Dec  6 07:50:56 2025]  process_one_work+0x190/0x350
[Sat Dec  6 07:50:56 2025]  worker_thread+0x2b7/0x3d0
[Sat Dec  6 07:50:56 2025]  kthread+0xf3/0x200
[Sat Dec  6 07:50:56 2025]  ? __pfx_worker_thread+0x10/0x10
[Sat Dec  6 07:50:56 2025]  ? __pfx_kthread+0x10/0x10
[Sat Dec  6 07:50:56 2025]  ret_from_fork+0x21a/0x250
[Sat Dec  6 07:50:56 2025]  ? __pfx_kthread+0x10/0x10
[Sat Dec  6 07:50:56 2025]  ret_from_fork_asm+0x1a/0x30
[Sat Dec  6 07:50:56 2025]  &lt;/TASK&gt;

Fix this by returning immediately after mana_serv_rescan() to avoid
accessing GC state that may no longer be valid.

Fixes: 9bf66036d686 ("net: mana: Handle hardware recovery events when probing the device")
Reviewed-by: Simon Horman &lt;horms@kernel.org&gt;
Reviewed-by: Long Li &lt;longli@microsoft.com&gt;
Signed-off-by: Dipayaan Roy &lt;dipayanroy@linux.microsoft.com&gt;
Link: https://patch.msgid.link/20251218131054.GA3173@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;

</content>
</entry>
<entry>
<title>net: mana: Handle hardware recovery events when probing the device</title>
<updated>2025-12-01T21:53:53+00:00</updated>
<author>
<name>Long Li</name>
<email>longli@microsoft.com</email>
</author>
<published>2025-11-26T21:45:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9bf66036d686b9a67000ba22bd94be13a4ea79ac'/>
<id>urn:sha1:9bf66036d686b9a67000ba22bd94be13a4ea79ac</id>
<content type='text'>
When MANA is being probed, it's possible that hardware is in recovery
mode and the device may get GDMA_EQE_HWC_RESET_REQUEST over HWC in the
middle of the probe. Detect such condition and go through the recovery
service procedure.

Signed-off-by: Long Li &lt;longli@microsoft.com&gt;
Reviewed-by: Haiyang Zhang &lt;haiyangz@microsoft.com&gt;
Link: https://patch.msgid.link/1764193552-9712-1-git-send-email-longli@linux.microsoft.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: mana: Drop TX skb on post_work_request failure and unmap resources</title>
<updated>2025-11-20T04:11:57+00:00</updated>
<author>
<name>Aditya Garg</name>
<email>gargaditya@linux.microsoft.com</email>
</author>
<published>2025-11-18T11:11:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=45120304e84171fd215c1b57b15b285446d15106'/>
<id>urn:sha1:45120304e84171fd215c1b57b15b285446d15106</id>
<content type='text'>
Drop TX packets when posting the work request fails and ensure DMA
mappings are always cleaned up.

Signed-off-by: Aditya Garg &lt;gargaditya@linux.microsoft.com&gt;
Reviewed-by: Haiyang Zhang &lt;haiyangz@microsoft.com&gt;
Link: https://patch.msgid.link/1763464269-10431-3-git-send-email-gargaditya@linux.microsoft.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: mana: Handle SKB if TX SGEs exceed hardware limit</title>
<updated>2025-11-20T04:11:57+00:00</updated>
<author>
<name>Aditya Garg</name>
<email>gargaditya@linux.microsoft.com</email>
</author>
<published>2025-11-18T11:11:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=934fa943b53795339486cc0026b3ab7ad39dc600'/>
<id>urn:sha1:934fa943b53795339486cc0026b3ab7ad39dc600</id>
<content type='text'>
The MANA hardware supports a maximum of 30 scatter-gather entries (SGEs)
per TX WQE. Exceeding this limit can cause TX failures.
Add ndo_features_check() callback to validate SKB layout before
transmission. For GSO SKBs that would exceed the hardware SGE limit, clear
NETIF_F_GSO_MASK to enforce software segmentation in the stack.
Add a fallback in mana_start_xmit() to linearize non-GSO SKBs that still
exceed the SGE limit.

Also, Add ethtool counter for SKBs linearized

Co-developed-by: Dipayaan Roy &lt;dipayanroy@linux.microsoft.com&gt;
Signed-off-by: Dipayaan Roy &lt;dipayanroy@linux.microsoft.com&gt;
Signed-off-by: Aditya Garg &lt;gargaditya@linux.microsoft.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reviewed-by: Haiyang Zhang &lt;haiyangz@microsoft.com&gt;
Link: https://patch.msgid.link/1763464269-10431-2-git-send-email-gargaditya@linux.microsoft.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: mana: Add standard counter rx_missed_errors</title>
<updated>2025-11-18T03:52:30+00:00</updated>
<author>
<name>Erni Sri Satya Vennela</name>
<email>ernis@linux.microsoft.com</email>
</author>
<published>2025-11-14T11:43:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=be4f1d67ec56f23f37714ac73c01094e63c7ff28'/>
<id>urn:sha1:be4f1d67ec56f23f37714ac73c01094e63c7ff28</id>
<content type='text'>
Report standard counter stats-&gt;rx_missed_errors
using hc_rx_discards_no_wqe from the hardware.

Add a global workqueue to periodically run
mana_query_gf_stats every 2 seconds to get the latest
info in eth_stats and define a driver capability flag
to notify hardware of the periodic queries.

To avoid repeated failures and log flooding, the workqueue
is not rescheduled if mana_query_gf_stats fails on HWC timeout
error and the stats are reset to 0. Other errors are transient
which will not need a VF reset for recovery.

Signed-off-by: Erni Sri Satya Vennela &lt;ernis@linux.microsoft.com&gt;
Reviewed-by: Haiyang Zhang &lt;haiyangz@microsoft.com&gt;
Link: https://patch.msgid.link/1763120599-6331-3-git-send-email-ernis@linux.microsoft.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: mana: Move hardware counter stats from per-port to per-VF context</title>
<updated>2025-11-18T03:52:30+00:00</updated>
<author>
<name>Erni Sri Satya Vennela</name>
<email>ernis@linux.microsoft.com</email>
</author>
<published>2025-11-14T11:43:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e275d9091c01b3b46f3ec534ce4ac77cffc9e3ae'/>
<id>urn:sha1:e275d9091c01b3b46f3ec534ce4ac77cffc9e3ae</id>
<content type='text'>
Move hardware counter (HC) statistics from mana_port_context to
mana_context to enable sharing stats across multiple network ports
on the same MANA VF. Previously, each network port queried
hardware counters independently using MANA_QUERY_GF_STAT command
(GF = Generic Function stats from GDMA hardware), resulting in
redundant queries when multiple ports existed on the same device.

Isolate hardware counter stats by introducing mana_ethtool_hc_stats
in mana_context and update the code to ensure all stats are properly
reported via ethtool -S &lt;interface&gt;, maintaining consistency with
previous behavior.

Signed-off-by: Erni Sri Satya Vennela &lt;ernis@linux.microsoft.com&gt;
Reviewed-by: Haiyang Zhang &lt;haiyangz@microsoft.com&gt;
Link: https://patch.msgid.link/1763120599-6331-2-git-send-email-ernis@linux.microsoft.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
</feed>
