kernel/linux.git/drivers/nvme, branch v4.4.235

nvme-pci: initialize queue memory before interrupts

2018-07-11T14:03:47+00:00

commit 161b8be2bd6abad250d4b3f674bdd5480f15beeb upstream.

A spurious interrupt before the nvme driver has initialized the completion queue may inadvertently cause the driver to believe it has a completion to process. This may result in a NULL dereference since the nvmeq's tags are not set at this point.

The patch initializes the host's CQ memory so that a spurious interrupt isn't mistaken for a real completion.

Signed-off-by: Keith Busch
Reviewed-by: Johannes Thumshirn
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
[bwh: Backported to 4.4: adjust context]
Cc: Ben Hutchings
Signed-off-by: Greg Kroah-Hartman
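
To illustrate why zeroed CQ memory matters, here is a minimal user-space sketch, not the driver code. It assumes the host decides that a completion is pending by comparing the phase bit (bit 0 of the status word) with the phase it expects, and that the expected phase starts at 1. All `demo_` names and the queue depth are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_Q_DEPTH 4 /* hypothetical queue depth */

/* Simplified stand-in for a completion queue entry. */
struct demo_cqe {
    uint16_t command_id;
    uint16_t status; /* bit 0 is treated as the phase tag */
};

struct demo_queue {
    struct demo_cqe cqes[DEMO_Q_DEPTH];
    uint16_t cq_head;
    uint8_t cq_phase; /* phase value the host expects next */
};

/* True if the entry at cq_head looks like a new completion. */
static bool demo_cqe_pending(const struct demo_queue *q)
{
    assert(q != NULL);
    return (q->cqes[q->cq_head].status & 1) == q->cq_phase;
}

/* Imitates CQ memory that was never cleared (assumption: stale 0xff bytes). */
static void demo_fill_garbage(struct demo_queue *q)
{
    assert(q != NULL);
    memset(q->cqes, 0xff, sizeof(q->cqes));
    q->cq_head = 0;
    q->cq_phase = 1;
}

/* Clears the CQ so that no entry carries the expected phase. */
static void demo_init_queue(struct demo_queue *q)
{
    assert(q != NULL);
    memset(q->cqes, 0, sizeof(q->cqes));
    q->cq_head = 0;
    q->cq_phase = 1;
}
```

With stale contents, a handler that runs on a spurious interrupt would see a matching phase bit and try to process a completion; after `demo_init_queue()` the same check reports nothing pending.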

nvme-pci: Fix nvme queue cleanup if IRQ setup fails

2018-05-30T05:49:01+00:00

[ Upstream commit f25a2dfc20e3a3ed8fe6618c331799dd7bd01190 ]

This patch fixes nvme queue cleanup if requesting an IRQ handler for the queue's vector fails. It does this by resetting the cq_vector to the uninitialized value of -1 so it is ignored for a controller reset.

Signed-off-by: Jianchao Wang
[changelog updates, removed misc whitespace changes]
Signed-off-by: Keith Busch
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
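
The idea can be sketched in plain C as follows. This is an illustration under assumptions, not the patched function: the `demo_` names, the `qid - 1` vector assignment, and the injected IRQ callbacks are all hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_VECTOR_UNSET (-1) /* the "uninitialized" value named in the commit */

struct demo_queue {
    int qid;
    int cq_vector;
};

/* Hypothetical IRQ request hook so that failure can be simulated. */
typedef int (*demo_request_irq_fn)(int vector);

static int demo_request_irq_ok(int vector)   { (void)vector; return 0; }
static int demo_request_irq_fail(int vector) { (void)vector; return -16; }

/* Assigns a vector, then undoes the assignment if the IRQ request fails. */
static int demo_create_queue(struct demo_queue *q, int qid,
                             demo_request_irq_fn request_irq)
{
    int ret;

    assert(q != NULL && request_irq != NULL);
    q->qid = qid;
    q->cq_vector = qid - 1;

    ret = request_irq(q->cq_vector);
    if (ret < 0) {
        /* Without this reset, later cleanup would treat the vector as live. */
        q->cq_vector = DEMO_VECTOR_UNSET;
        return ret;
    }
    return 0;
}

/* Cleanup paths skip queues whose vector was never successfully set up. */
static bool demo_queue_needs_teardown(const struct demo_queue *q)
{
    assert(q != NULL);
    return q->cq_vector != DEMO_VECTOR_UNSET;
}
```

A reset path that consults `demo_queue_needs_teardown()` then ignores a queue whose IRQ setup failed, instead of trying to free an interrupt that was never requested.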

nvme: Fix managing degraded controllers

2018-02-16T19:09:47+00:00

commit 5bae7f73d378a986 upstream

Upstream is a near rewrite of the async nvme probe that ultimately didn't even cleanly merge in 4.5. This patch is a much smaller change targeted to the regression introduced in 4.4.

If a controller is in a degraded mode that needs admin assistance to recover, we need to leave the controller running. We just want to disable namespace access without shutting the controller down.

Fixes: 3cf519b5a8d4 ("nvme: merge nvme_dev_start, nvme_dev_resume and nvme_async_probe")
Signed-off-by: Keith Busch
Signed-off-by: Greg Kroah-Hartman

nvme: Fix memory order on async queue deletion

2017-11-24T07:32:25+00:00

This patch is a fix specific to the 3.19 - 4.4 kernels. The 4.5 kernel inadvertently fixed this bug differently (db3cbfff5bcc0), but is not a stable candidate due to it being a complicated re-write of the entire feature.

This patch fixes a potential timing bug with nvme's asynchronous queue deletion, which causes an allocated request to be accidentally released due to the ordering of the shared completion context among the sq/cq pair. The completion context saves the request that issued the queue deletion. If the submission side deletion happens to reset the active request, the completion side will release the wrong request tag back into the pool of available tags. This means the driver will create multiple commands with the same tag, corrupting the queue context.

The error is observable in the kernel logs like:

  "nvme XX:YY:ZZ completed id XX twice on qid:0"

In this particular case, this message occurs because the queue is corrupted.

The following timing sequence demonstrates the error:

  CPU A                                 CPU B
  -----------------------               -----------------------------
  nvme_irq
   nvme_process_cq
    async_completion
     queue_kthread_work  ----------->   nvme_del_sq_work_handler
                                         nvme_delete_cq
                                          adapter_async_del_queue
                                           nvme_submit_admin_async_cmd
                                            cmdinfo->req = req;

     blk_mq_free_request(cmdinfo->req); <-- wrong request!!!

This patch fixes the bug by releasing the request in the completion side prior to waking the submission thread, such that that thread can't muck with the shared completion context.

Fixes: a4aea5623d4a5 ("NVMe: Convert to blk-mq")
Signed-off-by: Keith Busch
Signed-off-by: Greg Kroah-Hartman
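
The ordering problem can be replayed deterministically in a single thread. The sketch below is only a model of the race described in this commit message, not the driver code: `demo_submit_next()` stands in for the woken submission thread overwriting the shared context, and every `demo_` name and the pool size are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_NR_TAGS 8 /* hypothetical tag pool size */

/* Shared completion context: remembers which request issued the deletion. */
struct demo_ctx {
    int req_tag;
};

struct demo_pool {
    bool in_use[DEMO_NR_TAGS];
};

static void demo_free_tag(struct demo_pool *pool, int tag)
{
    assert(pool != NULL && tag >= 0 && tag < DEMO_NR_TAGS);
    pool->in_use[tag] = false;
}

/* Models the woken thread submitting the next command through the same ctx. */
static void demo_submit_next(struct demo_ctx *ctx, struct demo_pool *pool,
                             int new_tag)
{
    assert(ctx != NULL && pool != NULL);
    assert(new_tag >= 0 && new_tag < DEMO_NR_TAGS);
    pool->in_use[new_tag] = true;
    ctx->req_tag = new_tag;
}

/* Buggy order: wake first, then free whatever the context now points at. */
static void demo_complete_buggy(struct demo_ctx *ctx, struct demo_pool *pool,
                                int next_tag)
{
    demo_submit_next(ctx, pool, next_tag);
    demo_free_tag(pool, ctx->req_tag); /* releases the new, still active tag */
}

/* Fixed order: release the finished request before the wake-up. */
static void demo_complete_fixed(struct demo_ctx *ctx, struct demo_pool *pool,
                                int next_tag)
{
    demo_free_tag(pool, ctx->req_tag);
    demo_submit_next(ctx, pool, next_tag);
}
```

In the buggy order the finished tag stays allocated while the tag of the command still in flight returns to the pool, which is how two commands can end up sharing one tag.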

nvme: protect against simultaneous shutdown invocations

2017-10-12T09:27:35+00:00

commit 77bf25ea70200cddf083f74b7f617e5f07fac8bd upstream.

[Back-ported to 4.4. The difference is the file location of the struct definition that's adding the mutex. This fixes reported kernel panics in 4.4-stable from simultaneous controller resets that were never supposed to be allowed to happen.]

Signed-off-by: Keith Busch
[hch: split from a larger patch]
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
Signed-off-by: Greg Kroah-Hartman

nvme: apply DELAY_BEFORE_CHK_RDY quirk at probe time too

2017-06-29T10:48:53+00:00

commit b5a10c5f7532b7473776da87e67f8301bbc32693 upstream.

Commit 54adc01055b7 ("nvme/quirk: Add a delay before checking for adapter readiness") introduced a quirk for adapters that cannot read the bit NVME_CSTS_RDY right after register NVME_REG_CC is set; these adapters need a delay or else the action of reading the bit NVME_CSTS_RDY could somehow corrupt the adapter's register state and it never recovers.

When this quirk was added, we checked ctrl->tagset in order to avoid quirking at probe time, supposing we would never require such a delay during probe. Well, it was too optimistic; we in fact need this quirk at probe time in some cases, like after a kexec.

In some experiments, after an abnormal shutdown of the machine (aka power cord unplug), we booted into our bootloader in Power, which is a Linux kernel, and kexec'ed into another distro. If this kexec is too quick, we end up reaching the probe of the NVMe adapter in that distro when the adapter is in a bad state (not fully initialized on our bootloader). What happens next is that nvme_wait_ready() is unable to complete, except if the quirk is enabled.

So, this patch removes the original ctrl->tagset verification in order to enable the quirk even at probe time.

Fixes: 54adc01055b7 ("nvme/quirk: Add a delay before checking for adapter readiness")
Reported-by: Andrew Byrne
Reported-by: Jaime A. H. Gomez
Reported-by: Zachary D. Myers
Signed-off-by: Guilherme G. Piccoli
Acked-by: Jeffrey Lien
Signed-off-by: Christoph Hellwig
[mauricfo: backport to v4.4.70 without nvme quirk handling & nvme_ctrl]
Signed-off-by: Mauricio Faria de Oliveira
Tested-by: Narasimhan Vaidyanathan
Signed-off-by: Greg Kroah-Hartman

nvme/quirk: Add a delay before checking for adapter readiness

2017-06-29T10:48:53+00:00

commit 54adc01055b75ec8769c5a36574c7a0895c0c0b2 upstream.

When disabling the controller, the specification says the register NVME_REG_CC should be written and then the driver needs to wait for the adapter to be ready, which is checked by reading another register bit (NVME_CSTS_RDY). There's a timeout validation in this checking, so in case this timeout is reached the driver gives up and removes the adapter from the system.

After a firmware activation procedure, the PCI_DEVICE(0x1c58, 0x0003) (HGST adapter) ends up being removed if we issue a reset_controller, because the driver keeps verifying the NVME_REG_CSTS until the timeout is reached. This patch adds a necessary quirk for this adapter, by introducing a delay before nvme_wait_ready(), so the reset procedure is able to be completed. This quirk is needed because just increasing the timeout is not enough in the case of this adapter - the driver must wait before it starts reading the NVME_REG_CSTS register on this specific device.

Signed-off-by: Guilherme G. Piccoli
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe
[mauricfo: backport to v4.4.70 without nvme quirk handling & nvme_ctrl]
Signed-off-by: Mauricio Faria de Oliveira
Tested-by: Narasimhan Vaidyanathan
Signed-off-by: Greg Kroah-Hartman
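
A small simulation can show why a longer timeout does not help while an up-front delay does. This is a sketch under assumptions: the simulated device wedges permanently if its status is polled before a settle time has passed, time is a counter rather than a real clock, and the `demo_` names and all millisecond values are hypothetical rather than taken from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_QUIRK_DELAY_BEFORE_CHK_RDY 0x1u
#define DEMO_DELAY_MS   2000u /* hypothetical quirk delay */
#define DEMO_TIMEOUT_MS 5000u /* hypothetical readiness timeout */
#define DEMO_POLL_MS    100u  /* hypothetical polling interval */

struct demo_dev {
    unsigned quirks;
    unsigned now_ms;    /* simulated clock */
    unsigned settle_ms; /* device must not be polled before this time */
    bool wedged;        /* set once the device was polled too early */
};

static void demo_sleep(struct demo_dev *d, unsigned ms)
{
    d->now_ms += ms;
}

/* Simulated status read: polling too early wedges the device for good. */
static bool demo_read_rdy(struct demo_dev *d)
{
    if (d->now_ms < d->settle_ms)
        d->wedged = true;
    return !d->wedged;
}

/* Returns 0 once the device reports the expected state, -1 on timeout. */
static int demo_wait_ready(struct demo_dev *d)
{
    unsigned deadline;

    assert(d != NULL);
    /* The quirk: wait before the first status read, not just longer overall. */
    if (d->quirks & DEMO_QUIRK_DELAY_BEFORE_CHK_RDY)
        demo_sleep(d, DEMO_DELAY_MS);

    deadline = d->now_ms + DEMO_TIMEOUT_MS;
    while (!demo_read_rdy(d)) {
        if (d->now_ms >= deadline)
            return -1;
        demo_sleep(d, DEMO_POLL_MS);
    }
    return 0;
}
```

Without the quirk flag the very first read happens too early and the loop runs into the timeout regardless of how large the timeout is; with the flag set, the first read happens after the settle time and succeeds.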

nvme: Call pci_disable_device on the error path.

2016-09-15T06:27:51+00:00

Commit 5706aca74fe4 ("NVMe: Don't unmap controller registers on reset"), which backported b00a726a9fd8 to the 4.4.y kernel introduced a regression in which it didn't call pci_disable_device in the error path of nvme_pci_enable.

Reported-by: Jiri Slaby
Embarassed-developer: Gabriel Krisman Bertazi
Signed-off-by: Gabriel Krisman Bertazi
Signed-off-by: Greg Kroah-Hartman
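
The shape of such an error path, with the enable step undone when a later step fails, can be sketched like this. It is an illustration only: the `demo_` helpers stand in for the PCI enable and disable calls, and the failing register check is a hypothetical placeholder for whatever step fails after the device has been enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_pdev {
    bool enabled; /* tracks whether the device is currently enabled */
    bool regs_ok; /* whether the simulated later step succeeds */
};

/* Stand-in for enabling the PCI device. */
static int demo_enable_device(struct demo_pdev *p)
{
    p->enabled = true;
    return 0;
}

/* Stand-in for pci_disable_device(). */
static void demo_disable_device(struct demo_pdev *p)
{
    p->enabled = false;
}

/* Hypothetical later step that may fail after the device was enabled. */
static int demo_check_regs(const struct demo_pdev *p)
{
    return p->regs_ok ? 0 : -19;
}

static int demo_pci_enable(struct demo_pdev *p)
{
    int ret;

    assert(p != NULL);
    ret = demo_enable_device(p);
    if (ret)
        return ret;

    ret = demo_check_regs(p);
    if (ret)
        goto disable; /* the regression was skipping this unwind */

    return 0;

disable:
    demo_disable_device(p);
    return ret;
}
```

After a failed call the device is left disabled again, so enable and disable calls stay balanced.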

NVMe: Don't unmap controller registers on reset

2016-09-07T06:32:37+00:00

Commit b00a726a9fd82ddd4c10344e46f0d371e1674303 upstream.

Unmapping the registers on reset or shutdown is not necessary. Keeping the mapping simplifies reset handling.

This was backported to 4.4 stable tree because it prevents a race between the reset_work and the shutdown hook, that may provoke the Oops below, in the nvme_wait_ready function.

The Oops is easily reproducible on systems that will kexec/reboot immediately after booting, which is actually the common use case for kexec based bootloaders, like Petitboot. This patch removes the unnecessary early unmapping of the PCI configuration in the shutdown hook, allowing a proper handling of the reset work.

Unable to handle kernel paging request for data at address 0x0000001c
Faulting instruction address: 0xd000000000720b38
cpu 0x1b: Vector: 300 (Data Access) at [c000007f7a9a38a0]
    pc: d000000000720b38: nvme_wait_ready+0x50/0x120 [nvme]
    lr: d000000000720b7c: nvme_wait_ready+0x94/0x120 [nvme]
    sp: c000007f7a9a3b20
   msr: 9000000000009033
   dar: 1c
 dsisr: 40000000
  current = 0xc000007f7a926c80
  paca    = 0xc00000000fe85100   softe: 0   irq_happened: 0x01
    pid   = 2608, comm = kworker/27:1
enter ? for help
[c000007f7a9a3bb0] d00000000072572c nvme_setup_io_queues+0xc08/0x1218 [nvme]
[c000007f7a9a3c70] c00000000006bbd8 process_one_work+0x228/0x378
[c000007f7a9a3d00] c00000000006c050 worker_thread+0x2e0/0x420
[c000007f7a9a3d80] c00000000007161c kthread+0xfc/0x108
[c000007f7a9a3e30] c0000000000094b4 ret_from_kernel_thread+0x5c/0xa8

Signed-off-by: Keith Busch
Reviewed-by: Johannes Thumshirn
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe
Signed-off-by: Gabriel Krisman Bertazi
[Backport to v4.4.y]
Signed-off-by: Greg Kroah-Hartman

NVMe: IO ending fixes on surprise removal

2015-12-22T17:12:04+00:00

This patch fixes a lost request discovered during IO + hot removal.

The driver's pci removal deletes gendisks prior to shutting down the controller to allow dirty data to sync. Dirty data can not be synced on a surprise removal, though, and would potentially block indefinitely.

The driver previously had marked the queue as dying in this scenario to prevent new requests from attempting, however it will still block for requests that already entered the queue. This patch fixes this by quiescing IO first, then aborting the requeued requests before deleting disks.

Reported-by: Sujith Pandel
Signed-off-by: Keith Busch
Tested-by: Sujith Pandel
Signed-off-by: Jens Axboe