summaryrefslogtreecommitdiff
path: root/drivers/nvme/host/pci.c
AgeCommit message (Collapse)AuthorFilesLines
2016-01-22Merge branch 'for-4.5/nvme' of git://git.kernel.dk/linux-blockLinus Torvalds1-1981/+719
Pull NVMe updates from Jens Axboe: "Last branch for this series is the nvme changes. It's in a separate branch to avoid splitting too much between core and NVMe changes, since NVMe is still helping drive some blk-mq changes. That said, not a huge amount of core changes in here. The grunt of the work is the continued split of the code" * 'for-4.5/nvme' of git://git.kernel.dk/linux-block: (67 commits) uapi: update install list after nvme.h rename NVMe: Export NVMe attributes to sysfs group NVMe: Shutdown controller only for power-off NVMe: IO queue deletion re-write NVMe: Remove queue freezing on resets NVMe: Use a retryable error code on reset NVMe: Fix admin queue ring wrap nvme: make SG_IO support optional nvme: fixes for NVME_IOCTL_IO_CMD on the char device nvme: synchronize access to ctrl->namespaces nvme: Move nvme_freeze/unfreeze_queues to nvme core PCI/AER: include header file NVMe: Export namespace attributes to sysfs NVMe: Add pci error handlers block: remove REQ_NO_TIMEOUT flag nvme: merge iod and cmd_info nvme: meta_sg doesn't have to be an array nvme: properly free resources for cancelled command nvme: simplify completion handling nvme: special case AEN requests ...
2016-01-20Merge branch 'for-4.5/core' of git://git.kernel.dk/linux-blockLinus Torvalds1-5/+6
Pull core block updates from Jens Axboe: "We don't have a lot of core changes this time around, it's mostly in drivers, which will come in a subsequent pull. The cores changes include: - blk-mq - Prep patch from Christoph, changing blk_mq_alloc_request() to take flags instead of just using gfp_t for sleep/nosleep. - Doc patch from me, clarifying the difference between legacy and blk-mq for timer usage. - Fixes from Raghavendra for memory-less numa nodes, and a reuse of CPU masks. - Cleanup from Geliang Tang, using offset_in_page() instead of open coding it. - From Ilya, rename request_queue slab to it reflects what it holds, and a fix for proper use of bdgrab/put. - A real fix for the split across stripe boundaries from Keith. We yanked a broken version of this from 4.4-rc final, this one works. - From Mike Krinkin, emit a trace message when we split. - From Wei Tang, two small cleanups, not explicitly clearing memory that is already cleared" * 'for-4.5/core' of git://git.kernel.dk/linux-block: block: use bd{grab,put}() instead of open-coding block: split bios to max possible length block: add call to split trace point blk-mq: Avoid memoryless numa node encoded in hctx numa_node blk-mq: Reuse hardware context cpumask for tags blk-mq: add a flags parameter to blk_mq_alloc_request Revert "blk-flush: Queue through IO scheduler when flush not required" block: clarify blk_add_timer() use case for blk-mq bio: use offset_in_page macro block: do not initialise statics to 0 or NULL block: do not initialise globals to 0 or NULL block: rename request_queue slab cache
2016-01-13NVMe: Shutdown controller only for power-offKeith Busch1-21/+19
We don't need to shutdown a controller for a reset. A controller in a shutdown state may take longer to become ready than one that was simply disabled. This patch has the driver shut down a controller only if the device is about to be powered off or being removed. When taking the controller down for a reset reason, the controller will be disabled instead. Function names have been updated in this patch to reflect their changed semantics. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-01-13NVMe: IO queue deletion re-writeKeith Busch1-170/+81
The nvme driver deletes IO queues asynchronously since this operation may potentially take an undesirable amount of time with a large number of queues if done serially. The driver used to manage coordinating asynchronous deletions. This patch simplifies that by leveraging the block layer rather than using kthread workers and chaining more complicated callbacks. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-01-12NVMe: Remove queue freezing on resetsKeith Busch1-4/+4
NVMe submits all commands through the block layer now. This means we can let requests queue at the blk-mq hardware context since there is no path that bypasses this anymore so we don't need to freeze the queues anymore. The driver can simply stop the h/w queues from running during a reset instead. This also fixes a WARN in percpu_ref_reinit when the queue was unfrozen with requeued requests. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-01-12NVMe: Use a retryable error code on resetKeith Busch1-1/+1
A negative status has the "do not retry" bit set, which makes it not retryable. Use a fake status that can potentially be retried on reset. An aborted command's status is overridden by the timeout handler so that it won't be retried, which is necessary to keep initialization from getting into a reset loop. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-01-12NVMe: Fix admin queue ring wrapKeith Busch1-1/+6
The tag set queue depth needs to be one less than the h/w queue depth so we don't wrap the circular buffer. This conforms to the specification defined "Full Queue" condition. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-01-12nvme: Move nvme_freeze/unfreeze_queues to nvme coreSagi Grimberg1-30/+2
Nothing pci specific about them and We'll need them exported in other transports too. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-22NVMe: IO ending fixes on surprise removalKeith Busch1-1/+19
This patch fixes a lost request discovered during IO + hot removal. The driver's pci removal deletes gendisks prior to shutting down the controller to allow dirty data to sync. Dirty data can not be synced on a surprise removal, though, and would potentially block indefinitely. The driver previously had marked the queue as dying in this scenario to prevent new requests from attempting, however it will still block for requests that already entered the queue. This patch fixes this by quiescing IO first, then aborting the requeued requests before deleting disks. Reported-by: Sujith Pandel <sujith_pandel@dell.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Tested-by: Sujith Pandel <sujith_pandel@dell.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-22NVMe: Add pci error handlersKeith Busch1-10/+44
Requests enabling pcie aer support. Shuts down the controller on error detected with io frozen state prior to requesting slot reset; resumes controller after reset completes. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-22nvme: merge iod and cmd_infoChristoph Hellwig1-111/+73
Merge the two per-request structures in the nvme driver into a single one. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-22nvme: meta_sg doesn't have to be an arrayChristoph Hellwig1-6/+6
Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-22nvme: properly free resources for cancelled commandChristoph Hellwig1-39/+40
We need to move freeing of resources to the ->complete handler to ensure they are also freed when we cancel the command. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-22nvme: simplify completion handlingChristoph Hellwig1-115/+26
Now that all commands are executed as block layer requests we can remove the internal completion in the NVMe driver. Note that we can simply call blk_mq_complete_request to abort commands as the block layer will protect against double copletions internally. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-22nvme: special case AEN requestsChristoph Hellwig1-35/+40
AEN requests are different from other requests in that they don't time out or can easily be cancelled. Because of that we should not use the blk-mq infrastructure but just special case them in the completion path. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-22nvme: switch abort to blk_execute_rq_nowaitChristoph Hellwig1-35/+26
And remove the now unused nvme_submit_cmd helper. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-22nvme: switch delete SQ/CQ to blk_execute_rq_nowaitChristoph Hellwig1-34/+15
Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-22nvme: factor out a few helpers from req_completionChristoph Hellwig1-10/+2
We'll need them in other places later. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-22nvme: fix admin queue depthChristoph Hellwig1-1/+1
The number in tag_set->queue depth includes the reserved tags. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-22NVMe: Remove device management handles on removeKeith Busch1-0/+1
We don't want to allow new references to open on a device that is removed. This ties the lifetime of these handles to the physical device's presence rather than to the open reference count. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-22NVMe: Use unbounded work queue for all workKeith Busch1-6/+6
Removes all usage of the global work queue so work can't be scheduled on two different work queues, and removes nvme's work queue singlethreadedness so controllers can be driven in parallel. Signed-off-by: Keith Busch <keith.busch@intel.com> [hch: keep the dead controller removal on the system workqueue to avoid deadlocks] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-22NVMe: Implement namespace list scanningKeith Busch1-0/+2
The NVMe 1.1 specification provides an identify mode to return a list of active namespaces. This is more efficient to discover which namespace identifiers are active on a controller, providing potentially significant improvement in scan time for controllers with sparesly populated namespaces. Signed-off-by: Keith Busch <keith.busch@intel.com> [hch: add quirk for the broken Qemu Identify implementation. To be relaxed later] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-22nvme: switch abort_limit to an atomic_tChristoph Hellwig1-4/+5
There is no lock to sychronize access to the abort_limit field of struct nvme_ctrl, so switch it to an atomic_t. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-22nvme: remove dead controllers from a work itemChristoph Hellwig1-13/+9
Compared to the kthread this gives us multiple call prevention for free. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-22nvme: merge probe_work and reset_workChristoph Hellwig1-35/+39
If we're using two work queues we're always going to run into races where one item is tearing down what the other one is initializing. So insted merge the two work queues, and let the old probe_work also tear the controller down first if it was alive. Together with the better detection of the probe path using a flag this gives us a properly serialized reset/probe path that also doesn't accidentally trigger when two commands time out and the second one tries to reset the controller while the first reset is still in progress. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-22nvme: do not restart the request timeout if we're resetting the controllerKeith Busch1-9/+16
Otherwise we're never going to complete a command when it is restarted just after we completed all other outstanding commands in nvme_clear_queue. The controller must be disabled prior to completing a presumed lost command, do this by directly shutting down the controller before queueing the reset work, and return EH_HANDLED from the timeout handler after we shut the controller down. Signed-off-by: Keith Busch <keith.busch@intel.com> [hch: split and rebase] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-22nvme: simplify resetsChristoph Hellwig1-26/+13
Don't delete the controller from dev_list before queuing a reset, instead just check for it being reset in the polling kthread. This allows to remove the dev_list_lock in various places, and in addition we can simply rely on checking the queue_work return value to see if we could reset a controller. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-22nvme: add NVME_SC_CANCELLEDChristoph Hellwig1-1/+1
To properly document how we are using a negative Linux error value to communicate request cancellations inside the driver. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-22nvme: merge nvme_abort_req and nvme_timeoutChristoph Hellwig1-29/+18
We want to be able to return bettern error values frmo nvme_timeout, which is significantly easier if the two functions are merged. Also clean up and reduce the printk spew so that we only get one message per abort. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-22nvme: don't take the I/O queue q_lock in nvme_timeoutChristoph Hellwig1-4/+2
There is nothing it protects, but it makes lockdep unhappy in many different ways. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-22nvme: protect against simultaneous shutdown invocationsKeith Busch1-0/+5
Signed-off-by: Keith Busch <keith.busch@intel.com> [hch: split from a larger patch] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-22nvme: only add a controller to dev_list after it's been fully initializedChristoph Hellwig1-21/+30
Without this we can easily get bad derferences on nvmeq->d_db when the nvme kthread tries to poll the CQs for controllers that are in half initialized state. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-22nvme: only ignore hardware errors in nvme_create_io_queuesChristoph Hellwig1-15/+20
Half initialized queues due to kernel error returns or timeout are still a good reason to give up on initializing a controller. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-01nvme: temporary fix for Apple controller resetStephan Günther1-0/+12
Recent patches added basic support for the Apple NVMe controller but still cause resets and data corruption on that particular controller when a specific pattern of read/flush commands occurs. Limiting the queue depth to 2 works around that issue. This patch enforces that limit only for the Apple controller and is considered a temporary fix until we find the root source of that problem. Signed-off-by: Stephan Günther <guenther@tum.de> Signed-off-by: Maurice Leclaire <leclaire@in.tum.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-01nvme: refactor set_queue_countChristoph Hellwig1-21/+13
Split out a helper that just issues the Set Features and interprets the result which can go to common code, and document why we are ignoring non-timeout error returns in the PCIe driver. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-01nvme: move chardev and sysfs interface to common codeChristoph Hellwig1-180/+15
For this we need to add a proper controller init routine and a list of all controllers that is in addition to the list of PCIe controllers, which stays in pci.c. Note that we remove the sysfs device when the last reference to a controller is dropped now - the old code would have kept it around longer, which doesn't make much sense. This requires a new ->reset_ctrl operation to implement controleller resets, and a new ->write_reg32 operation that is required to implement subsystem resets. We also now store caches copied of the NVMe compliance version and the flag if a controller is attached to a subsystem or not in the generic controller structure now. Signed-off-by: Christoph Hellwig <hch@lst.de> [Fixes for pr merge] Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-01nvme: move namespace scanning to common codeChristoph Hellwig1-190/+30
The namespace scanning code has been mostly generic already, we just need to store a pointer to the tagset in the nvme_ctrl structure, and add a method to check if a controller is I/O incapable. The latter will hopefully be replaced by a proper controller state machine soon. Signed-off-by: Christoph Hellwig <hch@lst.de> [Fixed pr conflicts] Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-01nvme: move the call to nvme_init_identify earlierChristoph Hellwig1-6/+4
We want to record the identify and CAP values even if no I/O queue is available. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-01nvme: add a common helper to read Identify Controller dataChristoph Hellwig1-38/+15
And add the 64-bit register read operation for it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-01nvme: move nvme_{enable,disable,shutdown}_ctrl to common codeChristoph Hellwig1-109/+24
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-01nvme: move remaining CC setup into nvme_enable_ctrlChristoph Hellwig1-23/+21
Remove the calculation of all the bits written into the CC register into nvme_enable_ctrl, so that they can be moved into the core NVMe driver in the future. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-01nvme: add explicit quirk handlingChristoph Hellwig1-3/+5
Add an enum for all workarounds not in the spec and identify the affected controllers at probe time. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-01nvme: move block_device_operations and ns/ctrl freeing to common codeChristoph Hellwig1-400/+12
This moves the block_device_operations over to common code mostly as-is. The only change is that the ns and ctrl refcounting got some small refcounting to have wrappers around the kref_put operations. A new free_ctrl operation is added to allow the PCI driver to free it's ressources on the final drop. Signed-off-by: Christoph Hellwig <hch@lst.de> [Moved the integrity and pr changes due to merge conflict] Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-01nvme: use the block layer for userspace passthrough metadataKeith Busch1-34/+5
Use the integrity API to pass through metadata from userspace. For PI enabled devices this means that we now validate the reftag, which seems like an unintentional ommission in the old code. Thanks to Keith Busch for testing and fixes. Signed-off-by: Christoph Hellwig <hch@lst.de> [Skip metadata setup on admin commands] Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-01nvme: split __nvme_submit_sync_cmdChristoph Hellwig1-3/+3
Add a separate nvme_submit_user_cmd for commands that directly DMA to or from userspace. We'll add metadata support to that soon and the common version would become too messy. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-01nvme: move nvme_setup_flush and nvme_setup_rw to common codeChristoph Hellwig1-49/+0
And mark them inline so that we don't slow down the I/O submission path by having to turn it into a forced out of line call. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-01nvme: move nvme_error_status to common codeChristoph Hellwig1-12/+0
And mark it inline so that we don't slow down the completion path by having to turn it into a forced out of line call. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-01nvme: factor out a nvme_unmap_data helperChristoph Hellwig1-18/+25
This is the counter part to nvme_map_data. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-01nvme: refactor nvme_queue_rqChristoph Hellwig1-122/+97
This "backports" the structure I've used for the fabrics driver. It mostly started out as a cleanup so that I could actually understand the code, but I think it also qualifies as a micro-optimization due to the reduced time we hold q_lock and disable interrupts. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-01nvme: simplify nvme_setup_prps calling conventionChristoph Hellwig1-12/+10
Pass back a true/false value instead of the length which needs a compare with the bytes in the request and drop the pointless gfp_t argument. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>