diff options
| author | Nilay Shroff <nilay@linux.ibm.com> | 2025-06-26 08:19:19 +0300 | 
|---|---|---|
| committer | Christoph Hellwig <hch@lst.de> | 2025-06-30 09:31:49 +0300 | 
| commit | ba806c900379899e5cdd6ca165b900e2081e1c99 (patch) | |
| tree | b1f1dceb1c2b44173f8e3c732cbeba1b5b502713 /rust/helpers/workqueue.c | |
| parent | 2e96d2d8c2a7a6c2cef45593c028d9c5ef180316 (diff) | |
| download | linux-ba806c900379899e5cdd6ca165b900e2081e1c99.tar.xz | |
nvme: correctly account for namespace head reference counter
The blktests nvme/058 manifests an issue where the NVMe subsystem
kobject entry remains stale in sysfs, causing a failure during
subsequent NVMe module reloads[1]. Specifically, when attempting to
register a new NVMe subsystem, the driver encounters a kobejct name
collision because a stale kobject still exists. Though, please note
that nvme/058 doesn't report any failure and test case passes and
it's only during subsequent NVMe module reloads, the stale nvme sub-
system kobject entry in sysfs causes the observed symptom[1].
This issue stems from an imbalance in the get/put usage of the namespace
head (nshead) reference counter. The nshead holds a reference to the
associated NVMe subsystem. If the nshead reference is not properly
released, it prevents the cleanup of the subsystem's kobject, leaving
nvme subsystem stale entry behind in sysfs.
During the failure case, the last namespace path referencing a nshead
is removed, but the nshead reference was not released. This occurs
because the release logic currently only puts the nshead reference
when its state is LIVE. However, in configurations where ANA (Asymmetric
Namespace Access) is enabled, a namespace may be associated with an ANA
state that is neither optimized nor non-optimized. In this case, the
nshead may never transition to LIVE, and the corresponding nshead
reference is then never dropped. In fact nvme/058 associates some of
nvme namespaces to an inaccessible ANA state and with that nshead is
created but it's state is not transitioned to LIVE. So the current
logic would then causes nshead reference to be leaked for non-LIVE
states.
Another scenario, during namespace allocation, the driver first
allocates a nshead and then issues an Identify Namespace command. If
this command fails — which can happen in tests like nvme/058 that
rapidly enables and disables namespaces — we must release the reference
to the newly allocated nshead. However this reference release is
currently missing in the failure, causing a nshead reference leak.
To fix this, we now unconditionally release the nshead reference when
the last nvme path referencing to the nshead is removed, regardless of
the head’s state. Also during identify namespace failure case we now
properly release the nshead reference. So this ensures proper cleanup
of the nshead, and consequently, the NVMe subsystem and its associated
kobject.
This change prevents stale kobject entries from lingering in sysfs and
eliminates the module reload failures observed just after running
nvme/058.
[1] https://lore.kernel.org/all/CAHj4cs8fOBS-eSjsd5LUBzy7faKXJtgLkCN+mDy_-ezCLLLq+Q@mail.gmail.com/
Reported-by: yi.zhang@redhat.com
Closes: https://lore.kernel.org/all/CAHj4cs8fOBS-eSjsd5LUBzy7faKXJtgLkCN+mDy_-ezCLLLq+Q@mail.gmail.com/
Fixes: 62188639ec16 ("nvme-multipath: introduce delayed removal of the multipath head node")
Tested-by: yi.zhang@redhat.com
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Diffstat (limited to 'rust/helpers/workqueue.c')
0 files changed, 0 insertions, 0 deletions
