summaryrefslogtreecommitdiff
path: root/drivers/nvme/host/pci.c
AgeCommit message (Collapse)AuthorFilesLines
2023-02-17nvme-pci: refresh visible attrs for cmb attributesKeith Busch1-0/+8
The sysfs group containing the cmb attributes is registered before the driver knows if they need to be visible or not. Update the group when cmb attributes are known to exist so the visibility setting is correct. Link: https://bugzilla.kernel.org/show_bug.cgi?id=217037 Fixes: 86adbf0cdb9ec65 ("nvme: simplify transport specific device attribute handling") Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-02-14nvme-pci: remove iod use_sglsKeith Busch1-3/+1
It's not used anywhere anymore, so remove it. Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-02-14nvme-pci: fix freeing single sglKeith Busch1-1/+1
There may only be a single DMA mapped entry from multiple physical segments, which means we don't allocate a separte SGL list. Check the number of allocations prior to know if we need to free something. Freeing a single list allocation is the same for both PRP and SGL usages, so we don't need to check the use_sgl flag anymore. Fixes: 01df742d8c5c0 ("nvme-pci: remove SGL segment descriptors") Reported-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
2023-02-14nvme-pci: always return an ERR_PTR from nvme_pci_alloc_devIrvin Cote1-3/+3
Don't mix NULL and ERR_PTR returns. Fixes: 2e87570be9d2 ("nvme-pci: factor out a nvme_pci_alloc_dev helper") Signed-off-by: Irvin Cote <irvin.cote@insa-lyon.fr> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-02-14nvme-pci: set the DMA mask earlierChristoph Hellwig1-7/+5
Set the DMA mask before calling dma_addressing_limited, which depends on it. Note that this stop checking the return value of dma_set_mask_and_coherent as this function can only fail for masks < 32-bit. Fixes: 3f30a79c2e2c ("nvme-pci: set constant paramters in nvme_pci_alloc_ctrl") Reported-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Tested-by: Michael Kelley <mikelley@microsoft.com>
2023-02-13nvme-pci: add bogus ID quirk for ADATA SX6000PNPDaniel Wagner1-0/+2
Yet another device which needs a quirk: nvme nvme1: globally duplicate IDs for nsid 1 nvme nvme1: VID:DID 10ec:5763 model:ADATA SX6000PNP firmware:V9002s94 Link: http://bugzilla.opensuse.org/show_bug.cgi?id=1207827 Reported-by: Gustavo Freitas <freitasmgustavo@gmail.com> Signed-off-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-02-01nvme-pci: place descriptor addresses in iodKeith Busch1-31/+18
The 'struct nvme_iod' space is appended at the end of the preallocated 'struct request', and padded to the cache line size. This leaves some free memory (in most kernel configs) up for grabs. Instead of appending the nvme data descriptor addresses after the scatterlist, inline these for free within struct nvme_iod. There is now enough space in the mempool for 128 possibe segments. And without increasing the size of the preallocated requests, we can hold up to 5 PRP descriptor elements, allowing the driver to increase its max transfer size to 8MB. Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-02-01nvme-pci: use mapped entries for sgl decisionKeith Busch1-3/+3
The driver uses the dma entries for setting up its command's SGL/PRP lists. The dma mapping might have fewer entries than the physical segments, so check the dma mapped count to determine which nvme data layout method is more optimal. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-02-01nvme-pci: remove SGL segment descriptorsKeith Busch1-42/+5
The max segments this driver can see is 127, well below the 256 threshold needed to add an nvme sgl segment descriptor. Remove all the useless checks and dead code. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-01-24nvme-pci: flush initial scan_work for async probeKeith Busch1-0/+1
The nvme device may have a namespace with the root partition, so make sure we've completed scanning before returning from the async probe. Fixes: eac3ef262941 ("nvme-pci: split the initial probe from the rest path") Reported-by: Klaus Jensen <its@irrelevant.dk> Signed-off-by: Keith Busch <kbusch@kernel.org> Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Tested-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-01-19nvme-pci: fix timeout request state checkKeith Busch1-1/+1
Polling the completion can progress the request state to IDLE, either inline with the completion, or through softirq. Either way, the state may not be COMPLETED, so don't check for that. We only care if the state isn't IN_FLIGHT. This is fixing an issue where the driver aborts an IO that we just completed. Seeing the "aborting" message instead of "polled" is very misleading as to where the timeout problem resides. Fixes: bf392a5dc02a9b ("nvme-pci: Remove tag from process cq") Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-01-10nvme-pci: fix error handling in nvme_pci_enable()Tong Zhang1-2/+7
There are two issues in nvme_pci_enable(): 1) If pci_alloc_irq_vectors() fails, device is left enabled. Fix this by adding a goto disable statement. 2) nvme_pci_configure_admin_queue could return -ENODEV, in this case, we will need to free IRQ properly. Otherwise the following warning could be triggered: [ 5.286752] WARNING: CPU: 0 PID: 33 at kernel/irq/irqdomain.c:253 irq_domain_remove+0x12d/0x140 [ 5.290547] Call Trace: [ 5.290626] <TASK> [ 5.290695] msi_remove_device_irq_domain+0xc9/0xf0 [ 5.290843] msi_device_data_release+0x15/0x80 [ 5.290978] release_nodes+0x58/0x90 [ 5.293788] WARNING: CPU: 0 PID: 33 at kernel/irq/msi.c:276 msi_device_data_release+0x76/0x80 [ 5.297573] Call Trace: [ 5.297651] <TASK> [ 5.297719] release_nodes+0x58/0x90 [ 5.297831] devres_release_all+0xef/0x140 [ 5.298339] device_unbind_cleanup+0x11/0xc0 [ 5.298479] really_probe+0x296/0x320 Fixes: a6ee7f19ebfd ("nvme-pci: call nvme_pci_configure_admin_queue from nvme_pci_enable") Co-developed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Tong Zhang <ztong0001@gmail.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-01-10nvme-pci: add NVME_QUIRK_IDENTIFY_CNS quirk to Apple T2 controllersHector Martin1-1/+2
This mirrors the quirk added to Apple Silicon controllers in apple.c. These controllers do not support the Active NS ID List command and behave identically to the SoC version judging by existing user reports/syslogs, so will need the same fix. This quirk reverts back to NVMe 1.0 behavior and disables the broken commands. Fixes: 811f4de0344d ("nvme: avoid fallback to sequential scan due to transient issues") Signed-off-by: Hector Martin <marcan@marcan.st> Tested-by: Orlando Chamberlain <orlandoch.dev@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-12-26nvme-pci: update sqsize when adjusting the queue depthChristoph Hellwig1-4/+5
Update the core sqsize field in addition to the PCIe-specific q_depth field as the core tagset allocation helpers rely on it. Fixes: 0da7feaa5913 ("nvme-pci: use the tagset alloc/free helpers") Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Hugh Dickins <hughd@google.com> Link: https://lore.kernel.org/r/20221225103234.226794-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-12-21nvme-pci: fix page size checksKeith Busch1-4/+4
The size allocated out of the dma pool is at most NVME_CTRL_PAGE_SIZE, which may be smaller than the PAGE_SIZE. Fixes: c61b82c7b7134 ("nvme-pci: fix PRP pool size") Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-12-21nvme-pci: fix mempool alloc sizeKeith Busch1-2/+2
Convert the max size to bytes to match the units of the divisor that calculates the worst-case number of PRP entries. The result is used to determine how many PRP Lists are required. The code was previously rounding this to 1 list, but we can require 2 in the worst case. In that scenario, the driver would corrupt memory beyond the size provided by the mempool. While unlikely to occur (you'd need a 4MB in exactly 127 phys segments on a queue that doesn't support SGLs), this memory corruption has been observed by kfence. Cc: Jens Axboe <axboe@kernel.dk> Fixes: 943e942e6266f ("nvme-pci: limit max IO size and segments to avoid high order allocations") Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-12-21nvme-pci: fix doorbell buffer value endiannessKlaus Jensen1-12/+13
When using shadow doorbells, the event index and the doorbell values are written to host memory. Prior to this patch, the values written would erroneously be written in host endianness. This causes trouble on big-endian platforms. Fix this by adding missing endian conversions. This issue was noticed by Guenter while testing various big-endian platforms under QEMU[1]. A similar fix required for hw/nvme in QEMU is up for review as well[2]. [1]: https://lore.kernel.org/qemu-devel/20221209110022.GA3396194@roeck-us.net/ [2]: https://lore.kernel.org/qemu-devel/20221212114409.34972-4-its@irrelevant.dk/ Fixes: f9f38e33389c ("nvme: improve performance for virtual NVMe devices") Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-12-13Merge tag 'for-6.2/block-2022-12-08' of git://git.kernel.dk/linuxLinus Torvalds1-338/+268
Pull block updates from Jens Axboe: - NVMe pull requests via Christoph: - Support some passthrough commands without CAP_SYS_ADMIN (Kanchan Joshi) - Refactor PCIe probing and reset (Christoph Hellwig) - Various fabrics authentication fixes and improvements (Sagi Grimberg) - Avoid fallback to sequential scan due to transient issues (Uday Shankar) - Implement support for the DEAC bit in Write Zeroes (Christoph Hellwig) - Allow overriding the IEEE OUI and firmware revision in configfs for nvmet (Aleksandr Miloserdov) - Force reconnect when number of queue changes in nvmet (Daniel Wagner) - Minor fixes and improvements (Uros Bizjak, Joel Granados, Sagi Grimberg, Christoph Hellwig, Christophe JAILLET) - Fix and cleanup nvme-fc req allocation (Chaitanya Kulkarni) - Use the common tagset helpers in nvme-pci driver (Christoph Hellwig) - Cleanup the nvme-pci removal path (Christoph Hellwig) - Use kstrtobool() instead of strtobool (Christophe JAILLET) - Allow unprivileged passthrough of Identify Controller (Joel Granados) - Support io stats on the mpath device (Sagi Grimberg) - Minor nvmet cleanup (Sagi Grimberg) - MD pull requests via Song: - Code cleanups (Christoph) - Various fixes - Floppy pull request from Denis: - Fix a memory leak in the init error path (Yuan) - Series fixing some batch wakeup issues with sbitmap (Gabriel) - Removal of the pktcdvd driver that was deprecated more than 5 years ago, and subsequent removal of the devnode callback in struct block_device_operations as no users are now left (Greg) - Fix for partition read on an exclusively opened bdev (Jan) - Series of elevator API cleanups (Jinlong, Christoph) - Series of fixes and cleanups for blk-iocost (Kemeng) - Series of fixes and cleanups for blk-throttle (Kemeng) - Series adding concurrent support for sync queues in BFQ (Yu) - Series bringing drbd a bit closer to the out-of-tree maintained version (Christian, Joel, Lars, Philipp) - Misc drbd fixes (Wang) - blk-wbt fixes and tweaks for enable/disable (Yu) - Fixes for mq-deadline for zoned devices (Damien) - Add support for read-only and offline zones for null_blk (Shin'ichiro) - Series fixing the delayed holder tracking, as used by DM (Yu, Christoph) - Series enabling bio alloc caching for IRQ based IO (Pavel) - Series enabling userspace peer-to-peer DMA (Logan) - BFQ waker fixes (Khazhismel) - Series fixing elevator refcount issues (Christoph, Jinlong) - Series cleaning up references around queue destruction (Christoph) - Series doing quiesce by tagset, enabling cleanups in drivers (Christoph, Chao) - Series untangling the queue kobject and queue references (Christoph) - Misc fixes and cleanups (Bart, David, Dawei, Jinlong, Kemeng, Ye, Yang, Waiman, Shin'ichiro, Randy, Pankaj, Christoph) * tag 'for-6.2/block-2022-12-08' of git://git.kernel.dk/linux: (247 commits) blktrace: Fix output non-blktrace event when blk_classic option enabled block: sed-opal: Don't include <linux/kernel.h> sed-opal: allow using IOC_OPAL_SAVE for locking too blk-cgroup: Fix typo in comment block: remove bio_set_op_attrs nvmet: don't open-code NVME_NS_ATTR_RO enumeration nvme-pci: use the tagset alloc/free helpers nvme: add the Apple shared tag workaround to nvme_alloc_io_tag_set nvme: only set reserved_tags in nvme_alloc_io_tag_set for fabrics controllers nvme: consolidate setting the tagset flags nvme: pass nr_maps explicitly to nvme_alloc_io_tag_set block: bio_copy_data_iter nvme-pci: split out a nvme_pci_ctrl_is_dead helper nvme-pci: return early on ctrl state mismatch in nvme_reset_work nvme-pci: rename nvme_disable_io_queues nvme-pci: cleanup nvme_suspend_queue nvme-pci: remove nvme_pci_disable nvme-pci: remove nvme_disable_admin_queue nvme: merge nvme_shutdown_ctrl into nvme_disable_ctrl nvme: use nvme_wait_ready in nvme_shutdown_ctrl ...
2022-12-07nvme-pci: use the tagset alloc/free helpersChristoph Hellwig1-71/+18
Use the common helpers to allocate and free the tagsets. To make this work the generic nvme_ctrl now needs to be stored in the hctx private data instead of the nvme_dev. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
2022-12-06nvme-pci: split out a nvme_pci_ctrl_is_dead helperChristoph Hellwig1-22/+25
Clean up nvme_dev_disable by splitting the logic to detect if a controller is dead into a separate helper. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
2022-12-06nvme-pci: return early on ctrl state mismatch in nvme_reset_workChristoph Hellwig1-2/+1
The only way nvme_reset_work could be called when not in resetting state is if a reset and remove happen near the same time. This should not happen, but if it did we don't want the reset work to disable the controller because the remove is already doing that. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
2022-12-06nvme-pci: rename nvme_disable_io_queuesChristoph Hellwig1-10/+10
This function really deletes the I/O queues, so rename it to match the functionality. Also move the main wrapper right next to the actual underlying implementation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
2022-12-06nvme-pci: cleanup nvme_suspend_queueChristoph Hellwig1-10/+7
Remove the unused returne value, pass a dev + qid instead of the queue as that is better for the callers as well as the function itself, and remove the entirely pointless kerneldoc comment. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
2022-12-06nvme-pci: remove nvme_pci_disableChristoph Hellwig1-13/+5
nvme_pci_disable has a single caller, fold it into that. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Eric Curtin <ecurtin@redhat.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
2022-12-06nvme-pci: remove nvme_disable_admin_queueChristoph Hellwig1-9/+2
nvme_disable_admin_queue has only a single caller, and just calls two other funtions, so remove it to clean up the remove path a little more. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Eric Curtin <ecurtin@redhat.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
2022-12-06nvme: merge nvme_shutdown_ctrl into nvme_disable_ctrlChristoph Hellwig1-6/+9
Many of the callers decide which one to use based on a bool argument and there is at least some code to be shared, so merge these two. Also move a comment specific to a single callsite to that callsite. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Hector Martin <marcan@marcan.st>
2022-12-06nvme: introduce nvme_start_requestSagi Grimberg1-1/+1
In preparation for nvme-multipath IO stats accounting, we want the accounting to happen in a centralized place. The request completion is already centralized, but we need a common helper to request I/O start. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de>
2022-12-06nvme: use kstrtobool() instead of strtobool()Christophe JAILLET1-1/+2
strtobool() is the same as kstrtobool(). However, the latter is more used within the kernel. In order to remove strtobool() and slightly simplify kstrtox.h, switch to the other function name. While at it, include the corresponding header file (<linux/kstrtox.h>) Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-11-30nvme-pci: clear the prp2 field when not usedLei Rao1-0/+2
If the prp2 field is not filled in nvme_setup_prp_simple(), the prp2 field is garbage data. According to nvme spec, the prp2 is reserved if the data transfer does not cross a memory page boundary, so clear it to zero if it is not used. Signed-off-by: Lei Rao <lei.rao@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-11-18nvme: rename the queue quiescing helpersChristoph Hellwig1-8/+8
Naming the nvme helpers that wrap the block quiesce functionality _start/_stop is rather confusing. Switch to using the quiesce naming used by the block layer instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
2022-11-16nvme-pci: don't unbind the driver on reset failureChristoph Hellwig1-30/+10
Unbind a device driver when a reset fails is very unusual behavior. Just shut the controller down and leave it in dead state if we fail to reset it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
2022-11-16nvme-pci: split the initial probe from the rest pathChristoph Hellwig1-53/+80
nvme_reset_work is a little fragile as it needs to handle both resetting a live controller and initializing one during probe. Split out the initial probe and open code it in nvme_probe and leave nvme_reset_work to just do the live controller reset. This fixes a recently introduced bug where nvme_dev_disable causes a NULL pointer dereferences in blk_mq_quiesce_tagset because the tagset pointer is not set when the reset state is entered directly from the new state. The separate probe code can skip the reset state and probe directly and fixes this. To make sure the system isn't single threaded on enabling nvme controllers, set the PROBE_PREFER_ASYNCHRONOUS flag in the device_driver structure so that the driver core probes in parallel. Fixes: 98d81f0df70c ("nvme: use blk_mq_[un]quiesce_tagset") Reported-by: Gerd Bayer <gbayer@linux.ibm.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Tested-by Gerd Bayer <gbayer@linxu.ibm.com>
2022-11-16nvme-pci: move the HMPRE check into nvme_setup_host_memChristoph Hellwig1-5/+6
Check that a HMB is wanted into the allocation helper instead of the caller. This makes life simpler for an upcoming second caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
2022-11-16nvme-pci: add NVME_QUIRK_BOGUS_NID for Netac NV7000Tiago Dias Ferreira1-0/+2
Added a quirk to fix the Netac NV7000 SSD reporting duplicate NGUIDs. Cc: <stable@vger.kernel.org> Signed-off-by: Tiago Dias Ferreira <tiagodfer@gmail.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-11-15nvme-pci: simplify nvme_dbbuf_dma_allocChristoph Hellwig1-16/+16
Move the OACS check and the error checking into nvme_dbbuf_dma_alloc so that an upcoming second caller doesn't have to duplicate this boilerplate code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
2022-11-15nvme-pci: call nvme_pci_configure_admin_queue from nvme_pci_enableChristoph Hellwig1-5/+2
nvme_pci_configure_admin_queue is called right after nvme_pci_enable, and it's work is undone by nvme_dev_disable. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Tested-by Gerd Bayer <gbayer@linxu.ibm.com>
2022-11-15nvme-pci: set constant paramters in nvme_pci_alloc_ctrlChristoph Hellwig1-21/+17
Move setting of low-level constant parameters from nvme_reset_work to nvme_pci_alloc_ctrl. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Tested-by Gerd Bayer <gbayer@linxu.ibm.com>
2022-11-15nvme-pci: factor out a nvme_pci_alloc_dev helperChristoph Hellwig1-35/+46
Add a helper that allocates the nvme_dev structure up to the point where we can call nvme_init_ctrl. This pairs with the free_ctrl method and can thus be used to cleanup the teardown path and make it more symmetric. Note that this now calls nvme_init_ctrl a lot earlier during probing, which also means the per-controller character device shows up earlier. Due to the controller state no commnds can be send on it, but it might make sense to delay the cdev registration until nvme_init_ctrl_finish. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Tested-by Gerd Bayer <gbayer@linxu.ibm.com>
2022-11-15nvme-pci: factor the iod mempool creation into a helperChristoph Hellwig1-23/+18
Add a helper to create the iod mempool. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Tested-by Gerd Bayer <gbayer@linxu.ibm.com>
2022-11-15nvme-pci: move more teardown work to nvme_removeChristoph Hellwig1-2/+2
nvme_dbbuf_dma_free frees dma coherent memory, so it must not be called after ->remove has returned. Fortunately there is no way to use it after shutdown as no more I/O is possible so it can be moved. Similarly the iod_mempool can't be used for a device kept alive after shutdown, so move it next to freeing the PRP pools. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Tested-by Gerd Bayer <gbayer@linxu.ibm.com>
2022-11-15nvme-pci: put the admin queue in nvme_dev_remove_adminChristoph Hellwig1-2/+1
Once the controller is shutdown no one can access the admin queue. Tear it down in nvme_dev_remove_admin, which matches the flow in the other drivers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Tested-by Gerd Bayer <gbayer@linxu.ibm.com>
2022-11-15nvme: simplify transport specific device attribute handlingChristoph Hellwig1-15/+8
Allow the transport driver to override the attribute groups for the control device, so that the PCIe driver doesn't manually have to add a group after device creation and keep track of it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Tested-by Gerd Bayer <gbayer@linxu.ibm.com>
2022-11-15nvme: move OPAL setup from PCIe to coreChristoph Hellwig1-13/+1
Nothing about the TCG Opal support is PCIe transport specific, so move it to the core code. For this nvme_init_ctrl_finish grows a new was_suspended argument that allows the transport driver to tell the OPAL code if the controller came out of a suspend cycle. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: James Smart <jsmart2021@gmail.com> Tested-by Gerd Bayer <gbayer@linxu.ibm.com>
2022-11-15nvme-pci: add NVME_QUIRK_BOGUS_NID for Micron NitroBean Huo1-0/+2
Added a quirk to fix Micron Nitro NVMe reporting duplicate NGUIDs. Cc: <stable@vger.kernel.org> Signed-off-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-11-09nvme: quiet user passthrough command errorsKeith Busch1-2/+0
The driver is spamming the kernel logs for entirely harmless errors from user space submitting unsupported commands. Just silence the errors. The application has direct access to command status, so there's no need to log these. And since every passthrough command now uses the quiet flag, move the setting to the common initializer. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Alan Adamson <alan.adamson@oracle.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Daniel Wagner <dwagner@suse.de> Tested-by: Alan Adamson <alan.adamson@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-11-02nvme-pci: don't unquiesce the I/O queues in nvme_remove_dead_ctrlChristoph Hellwig1-1/+0
nvme_remove_dead_ctrl schedules nvme_remove to be called, which will call nvme_dev_disable and unquiesce the I/O queues. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20221101150050.3510-9-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-02nvme: split nvme_kill_queuesChristoph Hellwig1-2/+4
nvme_kill_queues does two things: 1) mark the gendisk of all namespaces dead 2) unquiesce all I/O queues These used to be be intertwined due to block layer issues, but aren't any more. So move the unquiscing of the I/O queues into the callers, and rename the rest of the function to the now more descriptive nvme_mark_namespaces_dead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20221101150050.3510-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-02nvme-pci: refactor the tagset handling in nvme_reset_workChristoph Hellwig1-16/+28
The code to create, update or delete a tagset and namespaces in nvme_reset_work is a bit convoluted. Refactor it with a two high-level conditionals for first probe vs reset and I/O queues vs no I/O queues to make the code flow more clear. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20221101150050.3510-3-hch@lst.de [axboe: fix whitespace issue] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-10-25nvme-pci: remove an extra queue referenceChristoph Hellwig1-6/+0
Now that blk_mq_destroy_queue does not release the queue reference, there is no need for a second admin queue reference to be held by the nvme_dev. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20221018135720.670094-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-10-25blk-mq: move the call to blk_put_queue out of blk_mq_destroy_queueChristoph Hellwig1-0/+1
The fact that blk_mq_destroy_queue also drops a queue reference leads to various places having to grab an extra reference. Move the call to blk_put_queue into the callers to allow removing the extra references. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20221018135720.670094-2-hch@lst.de [axboe: fix fabrics_q vs admin_q conflict in nvme core.c] Signed-off-by: Jens Axboe <axboe@kernel.dk>