path: root/drivers/nvme/host
Age  Commit message  Author  Files  Lines
2021-03-05  nvme-fabrics: fix kato initialization  (Martin George; 1 file, -1/+4)
Currently kato is initialized to NVME_DEFAULT_KATO for both discovery and I/O controllers. This is a problem specifically for non-persistent discovery controllers, since they always end up with a non-zero kato value. Fix this by initializing kato to zero instead, and ensure the various controllers are assigned appropriate kato values as follows:
- non-persistent discovery controllers: kato set to zero
- persistent discovery controllers: kato set to NVMF_DEV_DISC_TMO (or any positive int via nvme-cli)
- i/o controllers: kato set to NVME_DEFAULT_KATO (or any positive int via nvme-cli)
Signed-off-by: Martin George <marting@netapp.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-03-05  nvme-hwmon: Return error code when registration fails  (Daniel Wagner; 1 file, -0/+1)
The hwmon pointer won't be NULL if the registration fails, yet the exit code path still assigns it to ctrl->hwmon_device, and nvme_hwmon_exit() will later try to free the invalid pointer. Avoid this by returning the error code from hwmon_device_register_with_info(). Fixes: ed7770f66286 ("nvme/hwmon: rework to avoid devm allocation") Signed-off-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-03-05  nvme-pci: add quirks for Lexar 256GB SSD  (Pascal Terjan; 1 file, -0/+3)
Add the NVME_QUIRK_NO_NS_DESC_LIST and NVME_QUIRK_IGNORE_DEV_SUBNQN quirks for this buggy device. Reported and tested in https://bugs.mageia.org/show_bug.cgi?id=28417 Signed-off-by: Pascal Terjan <pterjan@google.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
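For context, PCI-ID based quirks like these are wired up as entries in the nvme_id_table in drivers/nvme/host/pci.c. A minimal sketch of such an entry is shown below; the vendor/device IDs are placeholders, since the summary above does not quote the real Lexar IDs:

    /* Sketch only: 0x0000/0x0000 are placeholder IDs, not the real Lexar ones. */
    { PCI_DEVICE(0x0000, 0x0000),   /* Lexar 256GB SSD */
        .driver_data = NVME_QUIRK_NO_NS_DESC_LIST |
                       NVME_QUIRK_IGNORE_DEV_SUBNQN, },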
2021-03-05  nvme-pci: mark Kingston SKC2000 as not supporting the deepest power state  (Zoltán Böszörményi; 1 file, -0/+2)
My 2TB SKC2000 showed the exact same symptoms as described in 538e4a8c57 ("nvme-pci: avoid the deepest sleep state on Kingston A2000 SSDs"), i.e. a complete NVMe lockup that needed a cold boot to recover. According to some sources, the A2000 is simply a rebadged SKC2000 with slightly optimized firmware. Adding the SKC2000 PCI ID to the quirk list with the same workaround as the A2000 made my laptop survive a five-hour-long Yocto bootstrap buildfest which previously triggered the SSD lockup reliably. Signed-off-by: Zoltán Böszörményi <zboszor@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-03-05  nvme-pci: mark Seagate Nytro XM1440 as QUIRK_NO_NS_DESC_LIST.  (Julian Einwag; 1 file, -1/+2)
The kernel fails to fully detect these SSDs; only the character devices are present:
[ 10.785605] nvme nvme0: pci function 0000:04:00.0
[ 10.876787] nvme nvme1: pci function 0000:81:00.0
[ 13.198614] nvme nvme0: missing or invalid SUBNQN field.
[ 13.198658] nvme nvme1: missing or invalid SUBNQN field.
[ 13.206896] nvme nvme0: Shutdown timeout set to 20 seconds
[ 13.215035] nvme nvme1: Shutdown timeout set to 20 seconds
[ 13.225407] nvme nvme0: 16/0/0 default/read/poll queues
[ 13.233602] nvme nvme1: 16/0/0 default/read/poll queues
[ 13.239627] nvme nvme0: Identify Descriptors failed (8194)
[ 13.246315] nvme nvme1: Identify Descriptors failed (8194)
Adding the NVME_QUIRK_NO_NS_DESC_LIST quirk fixes this problem. BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=205679 Signed-off-by: Julian Einwag <jeinwag-nvme@marcapo.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org>
2021-02-27  Merge branch 'stable/for-linus-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb  (Linus Torvalds; 1 file, -0/+1)
Pull swiotlb updates from Konrad Rzeszutek Wilk:
"Two memory encryption related patches (SWIOTLB is enabled by default for AMD-SEV):
- Add support for alignment so that NVME can properly work
- Keep track of requested DMA buffers length, as underlying hardware devices can trip SWIOTLB to bounce too much and crash the kernel
And a tiny fix to use proper APIs in drivers"
* 'stable/for-linus-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb:
  swiotlb: Validate bounce size in the sync/unmap path
  nvme-pci: set min_align_mask
  swiotlb: respect min_align_mask
  swiotlb: don't modify orig_addr in swiotlb_tbl_sync_single
  swiotlb: refactor swiotlb_tbl_map_single
  swiotlb: clean up swiotlb_tbl_unmap_single
  swiotlb: factor out a nr_slots helper
  swiotlb: factor out an io_tlb_offset helper
  swiotlb: add a IO_TLB_SIZE define
  driver core: add a min_align_mask field to struct device_dma_parameters
  sdhci: stop poking into swiotlb internals
2021-02-26  nvme-pci: set min_align_mask  (Jianxiong Gao; 1 file, -0/+1)
The PRP addressing scheme requires all PRP entries except for the first one to have a zero offset into the NVMe controller pages (which can be different from the Linux PAGE_SIZE). Use the min_align_mask device parameter so that swiotlb does not change the address of the buffer modulo the device page size, ensuring that the PRPs won't be malformed. Signed-off-by: Jianxiong Gao <jxgao@google.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Jianxiong Gao <jxgao@google.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
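As a rough sketch, the driver-side change described here amounts to a single call during PCI setup; the exact function it lives in (somewhere in the nvme_probe()/nvme_pci_enable() path) is an assumption, not quoted from the patch:

    /* Keep swiotlb bounce buffers aligned to the controller page size, so PRP
     * entries after the first keep a zero offset within the controller page. */
    dma_set_min_align_mask(&pdev->dev, NVME_CTRL_PAGE_SIZE - 1);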
2021-02-21  Merge tag 'for-5.12/drivers-2021-02-17' of git://git.kernel.dk/linux-block  (Linus Torvalds; 10 files, -70/+212)
Pull block driver updates from Jens Axboe:
- Remove the skd driver. It's been EOL for a long time (Damien)
- NVMe pull requests
  - fix multipath handling of ->queue_rq errors (Chao Leng)
  - nvmet cleanups (Chaitanya Kulkarni)
  - add a quirk for buggy Amazon controller (Filippo Sironi)
  - avoid devm allocations in nvme-hwmon that don't interact well with fabrics (Hannes Reinecke)
  - sysfs cleanups (Jiapeng Chong)
  - fix nr_zones for multipath (Keith Busch)
  - nvme-tcp crash fix for no-data commands (Sagi Grimberg)
  - nvmet-tcp fixes (Sagi Grimberg)
  - add a missing __rcu annotation (Christoph)
  - failed reconnect fixes (Chao Leng)
  - various tracing improvements (Michal Krakowiak, Johannes Thumshirn)
  - switch the nvmet-fc assoc_list to use RCU protection (Leonid Ravich)
  - resync the status codes with the latest spec (Max Gurtovoy)
  - minor nvme-tcp improvements (Sagi Grimberg)
  - various cleanups (Rikard Falkeborn, Minwoo Im, Chaitanya Kulkarni, Israel Rukshin)
- Floppy O_NDELAY fix (Denis)
- MD pull request
  - raid5 chunk_sectors fix (Guoqing)
- Use lore links (Kees)
- Use DEFINE_SHOW_ATTRIBUTE for nbd (Liao)
- loop lock scaling (Pavel)
- mtip32xx PCI fixes (Bjorn)
- bcache fixes (Kai, Dongdong)
- Misc fixes (Tian, Yang, Guoqing, Joe, Andy)
* tag 'for-5.12/drivers-2021-02-17' of git://git.kernel.dk/linux-block: (64 commits)
  lightnvm: pblk: Replace guid_copy() with export_guid()/import_guid()
  lightnvm: fix unnecessary NULL check warnings
  nvme-tcp: fix crash triggered with a dataless request submission
  block: Replace lkml.org links with lore
  nbd: Convert to DEFINE_SHOW_ATTRIBUTE
  nvme: add 48-bit DMA address quirk for Amazon NVMe controllers
  nvme-hwmon: rework to avoid devm allocation
  nvmet: remove else at the end of the function
  nvmet: add nvmet_req_subsys() helper
  nvmet: use min of device_path and disk len
  nvmet: use invalid cmd opcode helper
  nvmet: use invalid cmd opcode helper
  nvmet: add helper to report invalid opcode
  nvmet: remove extra variable in id-ns handler
  nvmet: make nvmet_find_namespace() req based
  nvmet: return uniform error for invalid ns
  nvmet: set status to 0 in case for invalid nsid
  nvmet-fc: add a missing __rcu annotation to nvmet_fc_tgt_assoc.queues
  nvme-multipath: set nr_zones for zoned namespaces
  nvmet-tcp: fix potential race of tcp socket closing accept_work
  ...
2021-02-21  Merge tag 'for-5.12/block-2021-02-17' of git://git.kernel.dk/linux-block  (Linus Torvalds; 6 files, -33/+28)
Pull core block updates from Jens Axboe:
"Another nice round of removing more code than what is added, mostly due to Christoph's relentless pursuit of tech debt removal/cleanups. This pull request contains:
- Two series of BFQ improvements (Paolo, Jan, Jia)
- Block iov_iter improvements (Pavel)
- bsg error path fix (Pan)
- blk-mq scheduler improvements (Jan)
- -EBUSY discard fix (Jan)
- bvec allocation improvements (Ming, Christoph)
- bio allocation and init improvements (Christoph)
- Store bdev pointer in bio instead of gendisk + partno (Christoph)
- Block trace point cleanups (Christoph)
- hard read-only vs read-only split (Christoph)
- Block based swap cleanups (Christoph)
- Zoned write granularity support (Damien)
- Various fixes/tweaks (Chunguang, Guoqing, Lei, Lukas, Huhai)"
* tag 'for-5.12/block-2021-02-17' of git://git.kernel.dk/linux-block: (104 commits)
  mm: simplify swapdev_block
  sd_zbc: clear zone resources for non-zoned case
  block: introduce blk_queue_clear_zone_settings()
  zonefs: use zone write granularity as block size
  block: introduce zone_write_granularity limit
  block: use blk_queue_set_zoned in add_partition()
  nullb: use blk_queue_set_zoned() to setup zoned devices
  nvme: cleanup zone information initialization
  block: document zone_append_max_bytes attribute
  block: use bi_max_vecs to find the bvec pool
  md/raid10: remove dead code in reshape_request
  block: mark the bio as cloned in bio_iov_bvec_set
  block: set BIO_NO_PAGE_REF in bio_iov_bvec_set
  block: remove a layer of indentation in bio_iov_iter_get_pages
  block: turn the nr_iovecs argument to bio_alloc* into an unsigned short
  block: remove the 1 and 4 vec bvec_slabs entries
  block: streamline bvec_alloc
  block: factor out a bvec_alloc_gfp helper
  block: move struct biovec_slab to bio.c
  block: reuse BIO_INLINE_VECS for integrity bvecs
  ...
2021-02-11  nvme-tcp: fix crash triggered with a dataless request submission  (Sagi Grimberg; 1 file, -1/+1)
write-zeros has a bio, but does not have any data buffers associated with it. Hence we should not initialize the request iter for it, since that attempts to reference the bi_io_vec and crashes.
--
run blktests nvme/012 at 2021-02-05 21:53:34
BUG: kernel NULL pointer dereference, address: 0000000000000008
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] SMP NOPTI
CPU: 15 PID: 12069 Comm: kworker/15:2H Tainted: G S I 5.11.0-rc6+ #1
Hardware name: Dell Inc. PowerEdge R640/06NR82, BIOS 2.10.0 11/12/2020
Workqueue: kblockd blk_mq_run_work_fn
RIP: 0010:nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp]
RSP: 0018:ffffbd084447bd18 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa0bba9f3ce80 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000002000000
RBP: ffffa0ba8ac6fec0 R08: 0000000002000000 R09: 0000000000000000
R10: 0000000002800809 R11: 0000000000000000 R12: 0000000000000000
R13: ffffa0bba9f3cf90 R14: 0000000000000000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffffa0c9ff9c0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 00000001c9c6c005 CR4: 00000000007706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 nvme_tcp_queue_rq+0xef/0x330 [nvme_tcp]
 blk_mq_dispatch_rq_list+0x11c/0x7c0
 ? blk_mq_flush_busy_ctxs+0xf6/0x110
 __blk_mq_sched_dispatch_requests+0x12b/0x170
 blk_mq_sched_dispatch_requests+0x30/0x60
 __blk_mq_run_hw_queue+0x2b/0x60
 process_one_work+0x1cb/0x360
 ? process_one_work+0x360/0x360
 worker_thread+0x30/0x370
 ? process_one_work+0x360/0x360
 kthread+0x116/0x130
 ? kthread_park+0x80/0x80
 ret_from_fork+0x1f/0x30
--
Fixes: cb9b870fba3e ("nvme-tcp: fix wrong setting of request iov_iter") Reported-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Tested-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-10  nvme: add 48-bit DMA address quirk for Amazon NVMe controllers  (Filippo Sironi; 2 files, -1/+26)
Some Amazon NVMe controllers do not follow the NVMe specification and are limited to 48-bit DMA addresses. Add a quirk to force bounce buffering if needed and limit the IOVA allocation for these devices. This affects all current Amazon NVMe controllers that expose EBS volumes (0x0061, 0x0065, 0x8061) and local instance storage (0xcd00, 0xcd01, 0xcd02). Signed-off-by: Filippo Sironi <sironi@amazon.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
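A hedged sketch of how such a quirk is typically honoured when the DMA mask is set up (the quirk flag name and its placement are inferred from the description, not quoted from the diff):

    /* Sketch: use 48-bit DMA addressing when the quirk is present, 64-bit otherwise. */
    int bits = (dev->ctrl.quirks & NVME_QUIRK_DMA_ADDRESS_BITS_48) ? 48 : 64;

    if (dma_set_mask_and_coherent(dev->dev, DMA_BIT_MASK(bits)))
        return -ENODEV;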
2021-02-10  nvme-hwmon: rework to avoid devm allocation  (Hannes Reinecke; 3 files, -10/+30)
The original design of using device-managed resource allocation doesn't really work, as the NVMe controller has a vastly different lifetime than the hwmon sysfs attributes, causing warnings about duplicate sysfs entries upon reconnection. This patch reworks the hwmon allocation to avoid device-managed resource allocation and uses the NVMe controller as parent for the sysfs attributes. Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Hannes Reinecke <hare@suse.de> Tested-by: Enzo Matsumiya <ematsumiya@suse.de> Tested-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-10  nvme-multipath: set nr_zones for zoned namespaces  (Keith Busch; 1 file, -0/+4)
The bio-based drivers only require that the request_queue's nr_zones is set, so set this field in the head if the namespace path is zoned. Fixes: 240e6ee272c07 ("nvme: support for zoned namespaces") Reported-by: Minwoo Im <minwoo.im.dev@gmail.com> Cc: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
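A minimal sketch of the propagation described above (the exact spot in the namespace update path is an assumption):

    /* Sketch: mirror the per-path zone count into the multipath head's queue. */
    if (blk_queue_is_zoned(ns->queue) && ns->head->disk)
        ns->head->disk->queue->nr_zones = ns->queue->nr_zones;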
2021-02-10  nvme-rdma: handle nvme_rdma_post_send failures better  (Chao Leng; 1 file, -1/+3)
A failure of nvme_rdma_post_send is a path-related error and should bounce to another path when using nvme-multipath. Call nvme_host_path_error when nvme_rdma_post_send returns -EIO to ensure nvme_complete_rq gets invoked to fail over to another path if there is one. Signed-off-by: Chao Leng <lengchao@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-10  nvme-fabrics: avoid double completions in nvmf_fail_nonready_command  (Chao Leng; 1 file, -5/+1)
When reconnecting, the request may be completed with NVME_SC_HOST_PATH_ERROR in nvmf_fail_nonready_command, which currently sets the state of the request to MQ_RQ_IN_FLIGHT before calling nvme_complete_rq. When this happens for a request that is freed by the caller, such as nvme_submit_user_cmd, in the worst case the request could be completed again in the teardown process. Instead of calling blk_mq_start_request from nvmf_fail_nonready_command, just use the new nvme_host_path_error helper to complete the command without starting it. Signed-off-by: Chao Leng <lengchao@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-10  nvme: introduce a nvme_host_path_error helper  (Chao Leng; 2 files, -0/+16)
When using nvme native multipathing, if a path-related error occurs during ->queue_rq, the request needs to be completed with NVME_SC_HOST_PATH_ERROR so that the request can be failed over. Introduce a helper to complete the command from ->queue_rq in a way that invokes nvme_complete_rq. Signed-off-by: Chao Leng <lengchao@huawei.com> [hch: renamed, added a return value to clean up the callers a bit] Signed-off-by: Christoph Hellwig <hch@lst.de>
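Based on the description, the helper plausibly has this shape (a sketch, not necessarily the exact patch):

    blk_status_t nvme_host_path_error(struct request *req)
    {
        /* Mark the command as a host path error and complete it without it
         * ever having been started, so multipath failover can kick in. */
        nvme_req(req)->status = NVME_SC_HOST_PATH_ERROR;
        blk_mq_set_request_complete(req);
        nvme_complete_rq(req);
        return BLK_STS_OK;
    }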
2021-02-10  nvme: convert sysfs sprintf/snprintf family to sysfs_emit  (Jiapeng Chong; 1 file, -5/+5)
Fix the following coccicheck warnings:
./drivers/nvme/host/core.c:3580:8-16: WARNING: use scnprintf or sprintf.
./drivers/nvme/host/core.c:3570:8-16: WARNING: use scnprintf or sprintf.
./drivers/nvme/host/core.c:3560:8-16: WARNING: use scnprintf or sprintf.
./drivers/nvme/host/core.c:3526:8-16: WARNING: use scnprintf or sprintf.
./drivers/nvme/host/core.c:2833:8-16: WARNING: use scnprintf or sprintf.
Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
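The conversion pattern in the affected sysfs show() callbacks generally looks like the diff below; the attribute shown (cntlid) is only an example, not necessarily one of the five call sites:

    -    return snprintf(buf, PAGE_SIZE, "%d\n", ctrl->cntlid);
    +    return sysfs_emit(buf, "%d\n", ctrl->cntlid);

sysfs_emit() already knows that the buffer is a full PAGE_SIZE page and validates the offset, so the open-coded size argument goes away.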
2021-02-10  nvme: cleanup zone information initialization  (Damien Le Moal; 2 files, -13/+9)
For a zoned namespace, in nvme_update_ns_info(), call nvme_update_zone_info() after executing nvme_update_disk_info() so that the namespace queue logical and physical block size limits are set. This allows setting the namespace queue max_zone_append_sectors limit in nvme_update_zone_info() instead of nvme_revalidate_zones(), simplifying this function. Also use blk_queue_set_zoned() to set the namespace zoned model. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@edc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-02  nvme-pci: ignore the subsystem NQN on Phison E16  (Claus Stovgaard; 1 file, -0/+2)
Tested with both Corsair firmware 11.3 and 13.0 for the Corsair MP600, and both exhibit the issue reported by the kernel: nvme nvme0: missing or invalid SUBNQN field. Signed-off-by: Claus Stovgaard <claus.stovgaard@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-02  nvme-pci: avoid the deepest sleep state on Kingston A2000 SSDs  (Thorsten Leemhuis; 1 file, -0/+2)
Some Kingston A2000 NVMe SSDs sooner or later get confused and stop working when they use the deepest APST sleep while running Linux. The system then crashes and one has to cold boot it to get the SSD working again. Kingston seems to have known about this since at least mid-September 2020:
https://bbs.archlinux.org/viewtopic.php?pid=1926994#p1926994
Someone working for a German company representing Kingston to the German press confirmed to me that Kingston engineering is aware of the issue and investigating; the person stated that to their current knowledge only the deepest APST sleep state causes trouble. Therefore, make Linux avoid it for now by applying the NVME_QUIRK_NO_DEEPEST_PS quirk to this SSD. I have two such SSDs, but it seems the problem doesn't occur with them. I hence couldn't verify if this patch really fixes the problem, but all the data in front of me suggests it should. This patch can easily be reverted or improved upon if a better solution surfaces. FWIW, there are many reports about the issue scattered around the web; most of the users disabled APST completely to make things work, some just made Linux avoid the deepest sleep state:
https://bugzilla.kernel.org/show_bug.cgi?id=195039#c65
https://bugzilla.kernel.org/show_bug.cgi?id=195039#c73
https://bugzilla.kernel.org/show_bug.cgi?id=195039#c74
https://bugzilla.kernel.org/show_bug.cgi?id=195039#c78
https://bugzilla.kernel.org/show_bug.cgi?id=195039#c79
https://bugzilla.kernel.org/show_bug.cgi?id=195039#c80
https://askubuntu.com/questions/1222049/nvmekingston-a2000-sometimes-stops-giving-response-in-ubuntu-18-04dell-inspir
https://community.acer.com/en/discussion/604326/m-2-nvme-ssd-aspire-517-51g-issue-compatibility-kingston-a2000-linux-ubuntu
For the record, some data from 'nvme id-ctrl /dev/nvme0':
NVME Identify Controller:
vid : 0x2646
ssvid : 0x2646
mn : KINGSTON SA2000M81000G
fr : S5Z42105
[...]
ps 0 : mp:9.00W operational enlat:0 exlat:0 rrt:0 rrl:0 rwt:0 rwl:0 idle_power:- active_power:-
ps 1 : mp:4.60W operational enlat:0 exlat:0 rrt:1 rrl:1 rwt:1 rwl:1 idle_power:- active_power:-
ps 2 : mp:3.80W operational enlat:0 exlat:0 rrt:2 rrl:2 rwt:2 rwl:2 idle_power:- active_power:-
ps 3 : mp:0.0450W non-operational enlat:2000 exlat:2000 rrt:3 rrl:3 rwt:3 rwl:3 idle_power:- active_power:-
ps 4 : mp:0.0040W non-operational enlat:15000 exlat:15000 rrt:4 rrl:4 rwt:4 rwl:4 idle_power:- active_power:-
Cc: stable@vger.kernel.org # 4.14+ Signed-off-by: Thorsten Leemhuis <linux@leemhuis.info> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-02  nvme-tcp: use cancel tagset helper for tear down  (Chao Leng; 1 file, -10/+2)
Use nvme_cancel_tagset and nvme_cancel_admin_tagset to clean up the code of the tear down process. Signed-off-by: Chao Leng <lengchao@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-02  nvme-rdma: use cancel tagset helper for tear down  (Chao Leng; 1 file, -10/+2)
Use nvme_cancel_tagset and nvme_cancel_admin_tagset to clean up the code of the tear down process. Signed-off-by: Chao Leng <lengchao@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-02  nvme-tcp: add clean action for failed reconnection  (Chao Leng; 1 file, -2/+16)
If a reconnect fails after the io queues have been started, the queues are unquiesced and new requests continue to be delivered. The reconnection error handling path directly frees the queues without cancelling suspended requests. The suspended requests then time out, and the driver crashes because it uses the queue after it has been freed. Add queue synchronization and cancel suspended requests in the reconnection error handling. Signed-off-by: Chao Leng <lengchao@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-02  nvme-rdma: add clean action for failed reconnection  (Chao Leng; 1 file, -2/+16)
A crash happens when injecting a failed reconnection. If a reconnect fails after the io queues have been started, the queues are unquiesced and new requests continue to be delivered. The reconnection error handling path directly frees the queues without cancelling suspended requests. The suspended requests then time out, and the driver crashes because it uses the queue after it has been freed. Add queue synchronization and cancel suspended requests in the reconnection error handling. Signed-off-by: Chao Leng <lengchao@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-02  nvme-core: add cancel tagset helpers  (Chao Leng; 2 files, -0/+22)
Add nvme_cancel_tagset and nvme_cancel_admin_tagset for tear down and reconnection error handling. Signed-off-by: Chao Leng <lengchao@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
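A plausible shape for these helpers, built on the existing blk-mq tagset API (treat this as a sketch rather than the verbatim patch):

    void nvme_cancel_tagset(struct nvme_ctrl *ctrl)
    {
        if (ctrl->tagset) {
            /* Fail all outstanding I/O requests and wait for them to finish. */
            blk_mq_tagset_busy_iter(ctrl->tagset, nvme_cancel_request, ctrl);
            blk_mq_tagset_wait_completed_request(ctrl->tagset);
        }
    }

    /* nvme_cancel_admin_tagset() would do the same for ctrl->admin_tagset. */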
2021-02-02  nvme-core: get rid of the extra space  (Chaitanya Kulkarni; 1 file, -1/+1)
Remove the extra space in nvme_free_cels() when calling the xa_for_each loop, which is not a common practice (except in drivers/infiniband/core/, for unclear reasons). Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-02  nvme: add tracing of zns commands  (Johannes Thumshirn; 1 file, -0/+34)
When support for the NVMe ZNS commands was merged, tracing of these was omitted. Add nvme_cmd_zone_mgmt_send, nvme_cmd_zone_mgmt_recv as well as nvme_cmd_zone_append to the nvme driver's tracing facility. Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-02  nvme: parse format nvm command details when tracing  (Michal Krakowiak; 1 file, -0/+19)
Add detailed parsing of the Format NVM admin command to make the trace log more consistent and human-readable. Signed-off-by: Michal Krakowiak <michal.krakowiak@intel.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-02  nvme: refactor ns->ctrl by request  (Minwoo Im; 1 file, -3/+3)
For the current code in nvme_cleanup_cmd(), we don't need the namespace instance, only the controller instance. The controller instance can be retrieved through the namespace instance, but it can also be accessed directly from the nvme_request instance of the request: ctrl = nvme_req(req)->ctrl; That way we don't have to go to the namespace instance from the request instance through the gendisk. Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-02  nvme-tcp: pass multipage bvec to request iov_iter  (Sagi Grimberg; 1 file, -4/+9)
iov_iter uses the right helpers, so we should be able to pass in a multipage bvec. Right now the iov_iter is initialized with more segments than it needs, which doesn't fail because the iov_iter is capped by byte count, but it is better to use a full multipage bvec iter. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-02  nvme-tcp: get rid of unused helper function  (Sagi Grimberg; 1 file, -5/+0)
Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-02  nvme-tcp: fix wrong setting of request iov_iter  (Sagi Grimberg; 1 file, -5/+2)
We might set the iov_iter direction wrong, which is harmless for this use-case, but get it right. Also this makes the code slightly cleaner. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-02  nvme: support command retry delay for admin command  (Minwoo Im; 1 file, -3/+2)
The controller can request a delay before retrying a failed command by setting the Command Retry Delay (CRD) field in the Completion Queue Entry. Currently this feature is only applied to commands on the I/O queue, but not to commands on the admin queue. Retrieve the nvme_ctrl from the request so that no namespace is required, and apply the feature to all commands. Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-02  nvme: constify static attribute_group structs  (Rikard Falkeborn; 2 files, -3/+3)
The only usage of these is to put their addresses in arrays of pointers to const attribute_groups. Make them const to allow the compiler to put them in read-only memory. Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
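Illustrative shape of the change (the group name below is an example; the commit touches two such definitions):

    -static struct attribute_group nvme_dev_attrs_group = {
    +static const struct attribute_group nvme_dev_attrs_group = {
         .attrs = nvme_dev_attrs,
     };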
2021-01-28  nvme-core: use list_add_tail_rcu instead of list_add_tail for nvme_init_ns_head  (Chao Leng; 1 file, -1/+1)
The "list" of nvme_ns_head is used as an RCU list, but nvme_init_ns_head currently uses list_add_tail to add ns->siblings to it, which is not safe. Use list_add_tail_rcu instead of list_add_tail. Signed-off-by: Chao Leng <lengchao@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
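The described change boils down to a one-liner of this shape (sketch):

    -    list_add_tail(&ns->siblings, &head->list);
    +    list_add_tail_rcu(&ns->siblings, &head->list);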
2021-01-28  nvme-multipath: Early exit if no path is available  (Daniel Wagner; 1 file, -1/+1)
nvme_round_robin_path() should test whether the returned ns pointer is valid, as nvme_next_ns() will return a NULL pointer if there is no path left. Fixes: 75c10e732724 ("nvme-multipath: round-robin I/O policy") Signed-off-by: Daniel Wagner <dwagner@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-01-28  nvme-pci: add the DISABLE_WRITE_ZEROES quirk for a SPCC device  (Chaitanya Kulkarni; 1 file, -0/+2)
This adds a quirk for the SPCC 256GB NVMe 1.3 drive which fixes timeouts and I/O errors caused by the controller not handling the Write Zeroes command properly:
[ 2745.659527] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G E 5.10.6-BET #1
[ 2745.659528] Hardware name: System manufacturer System Product Name/PRIME X570-P, BIOS 3001 12/04/2020
[ 2776.138874] nvme nvme1: I/O 414 QID 3 timeout, aborting
[ 2776.138886] nvme nvme1: I/O 415 QID 3 timeout, aborting
[ 2776.138891] nvme nvme1: I/O 416 QID 3 timeout, aborting
[ 2776.138895] nvme nvme1: I/O 417 QID 3 timeout, aborting
[ 2776.138912] nvme nvme1: Abort status: 0x0
[ 2776.138921] nvme nvme1: I/O 428 QID 3 timeout, aborting
[ 2776.138922] nvme nvme1: Abort status: 0x0
[ 2776.138925] nvme nvme1: Abort status: 0x0
[ 2776.138974] nvme nvme1: Abort status: 0x0
[ 2776.138977] nvme nvme1: Abort status: 0x0
[ 2806.346792] nvme nvme1: I/O 414 QID 3 timeout, reset controller
[ 2806.363566] nvme nvme1: 15/0/0 default/read/poll queues
[ 2836.554298] nvme nvme1: I/O 415 QID 3 timeout, disable controller
[ 2836.672064] blk_update_request: I/O error, dev nvme1n1, sector 16350 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
[ 2836.672072] blk_update_request: I/O error, dev nvme1n1, sector 16093 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
[ 2836.672074] blk_update_request: I/O error, dev nvme1n1, sector 15836 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
[ 2836.672076] blk_update_request: I/O error, dev nvme1n1, sector 15579 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
[ 2836.672078] blk_update_request: I/O error, dev nvme1n1, sector 15322 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
[ 2836.672080] blk_update_request: I/O error, dev nvme1n1, sector 15065 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
[ 2836.672082] blk_update_request: I/O error, dev nvme1n1, sector 14808 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
[ 2836.672083] blk_update_request: I/O error, dev nvme1n1, sector 14551 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
[ 2836.672085] blk_update_request: I/O error, dev nvme1n1, sector 14294 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
[ 2836.672087] blk_update_request: I/O error, dev nvme1n1, sector 14037 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
[ 2836.672121] nvme nvme1: failed to mark controller live state
[ 2836.672123] nvme nvme1: Removing after probe failure status: -19
[ 2836.689016] Aborting journal on device dm-0-8.
[ 2836.689024] Buffer I/O error on dev dm-0, logical block 25198592, lost sync page write
[ 2836.689027] JBD2: Error -5 detected when updating journal superblock for dm-0-8.
Reported-by: Bradley Chapman <chapman6235@comcast.net> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Tested-by: Bradley Chapman <chapman6235@comcast.net> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-01-27  nvme-core: check bdev value for NULL  (Chaitanya Kulkarni; 1 file, -1/+2)
The nvme-core sets the bdev to NULL when an admin command is issued from an IOCTL, e.g. nvme list, in the following path:
block_ioctl()
 blkdev_ioctl()
  nvme_ioctl()
   nvme_user_cmd()
    nvme_submit_user_cmd()
The commit 309dca309fc3 ("block: store a block_device pointer in struct bio") now uses bdev unconditionally in the macro bio_set_dev() and assumes that the bdev value is not NULL, which results in the following crash, since that is where bdev is actually accessed:
void bio_associate_blkg_from_css(struct bio *bio,
				 struct cgroup_subsys_state *css)
{
	if (bio->bi_blkg)
		blkg_put(bio->bi_blkg);
	if (css && css->parent) {
		bio->bi_blkg = blkg_tryget_closest(bio, css);
	} else {
-------------->	blkg_get(bio->bi_bdev->bd_disk->queue->root_blkg);
		bio->bi_blkg = bio->bi_bdev->bd_disk->queue->root_blkg;
	}
}
EXPORT_SYMBOL_GPL(bio_associate_blkg_from_css);
[ 345.385947] BUG: kernel NULL pointer dereference, address: 0000000000000690
[ 345.387103] #PF: supervisor read access in kernel mode
[ 345.387894] #PF: error_code(0x0000) - not-present page
[ 345.388756] PGD 162a2b067 P4D 162a2b067 PUD 1633eb067 PMD 0
[ 345.389625] Oops: 0000 [#1] SMP NOPTI
[ 345.390206] CPU: 15 PID: 4100 Comm: nvme Tainted: G OE 5.11.0-rc5blk+ #141
[ 345.391377] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba52764
[ 345.393074] RIP: 0010:bio_associate_blkg_from_css.cold.47+0x58/0x21f
[ 345.396362] RSP: 0018:ffffc90000dbbce8 EFLAGS: 00010246
[ 345.397078] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000027
[ 345.398114] RDX: 0000000000000000 RSI: ffff888813be91f0 RDI: ffff888813be91f8
[ 345.399039] RBP: ffffc90000dbbd30 R08: 0000000000000001 R09: 0000000000000001
[ 345.399950] R10: 0000000064c66670 R11: 00000000ef955201 R12: ffff888812d32800
[ 345.401031] R13: 0000000000000000 R14: ffff888113e51540 R15: ffff888113e51540
[ 345.401976] FS: 00007f3747f1d780(0000) GS:ffff888813a00000(0000) knlGS:0000000000000000
[ 345.402997] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 345.403737] CR2: 0000000000000690 CR3: 000000081a4bc000 CR4: 00000000003506e0
[ 345.404685] Call Trace:
[ 345.405031] bio_associate_blkg+0x71/0x1c0
[ 345.405649] nvme_submit_user_cmd+0x1aa/0x38e [nvme_core]
[ 345.406348] nvme_user_cmd.isra.73.cold.98+0x54/0x92 [nvme_core]
[ 345.407117] nvme_ioctl+0x226/0x260 [nvme_core]
[ 345.407707] blkdev_ioctl+0x1c8/0x2b0
[ 345.408183] block_ioctl+0x3f/0x50
[ 345.408627] __x64_sys_ioctl+0x84/0xc0
[ 345.409117] do_syscall_64+0x33/0x40
[ 345.409592] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 345.410233] RIP: 0033:0x7f3747632107
[ 345.413125] RSP: 002b:00007ffe461b6648 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
[ 345.414086] RAX: ffffffffffffffda RBX: 00000000007b7fd0 RCX: 00007f3747632107
[ 345.414998] RDX: 00007ffe461b6650 RSI: 00000000c0484e41 RDI: 0000000000000004
[ 345.415966] RBP: 0000000000000004 R08: 00000000007b7fe8 R09: 00000000007b9080
[ 345.416883] R10: 00007ffe461b62c0 R11: 0000000000000206 R12: 00000000007b7fd0
[ 345.417808] R13: 0000000000000000 R14: 0000000000000003 R15: 0000000000000000
Add a NULL check before we set the bdev for the bio. This issue was found on the block/for-next tree. Fixes: 309dca309fc3 ("block: store a block_device pointer in struct bio") Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
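The fix described amounts to guarding the bio_set_dev() call in nvme_submit_user_cmd(); a sketch of the shape:

    /* Admin passthrough commands have no bdev, so only attach one when present. */
    if (bdev)
        bio_set_dev(bio, bdev);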
2021-01-26  nvme: use bio_set_dev to assign ->bi_bdev  (Christoph Hellwig; 3 files, -4/+4)
Always use the bio_set_dev helper to assign ->bi_bdev to make sure other state related to the device is uptodate. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-25  block: remove unnecessary argument from blk_execute_rq  (Guoqing Jiang; 2 files, -3/+3)
We can remove 'q' from blk_execute_rq as well after the previous change in blk_execute_rq_nowait. More importantly, it never really was needed to start with, given that we can trivially derive it from struct request. Cc: linux-scsi@vger.kernel.org Cc: virtualization@lists.linux-foundation.org Cc: linux-ide@vger.kernel.org Cc: linux-mmc@vger.kernel.org Cc: linux-nvme@lists.infradead.org Cc: linux-nfs@vger.kernel.org Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-25  block: remove unnecessary argument from blk_execute_rq_nowait  (Guoqing Jiang; 3 files, -5/+5)
The 'q' argument has been unused since commit a1ce35fa4985 ("block: remove dead elevator code"), so also update the comment of the function. More importantly, it never really was needed to start with, given that we can trivially derive it from struct request. Cc: target-devel@vger.kernel.org Cc: linux-scsi@vger.kernel.org Cc: virtualization@lists.linux-foundation.org Cc: linux-ide@vger.kernel.org Cc: linux-mmc@vger.kernel.org Cc: linux-nvme@lists.infradead.org Cc: linux-nfs@vger.kernel.org Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-25  block: store a block_device pointer in struct bio  (Christoph Hellwig; 4 files, -9/+8)
Replace the gendisk pointer in struct bio with a pointer to the newly improved struct block device. From that the gendisk can be trivially accessed with an extra indirection, but it also allows directly looking up all information related to partition remapping. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-25  nvme: allow revalidate to set a namespace read-only  (Christoph Hellwig; 1 file, -3/+2)
Unconditionally call set_disk_ro now that it only updates the hardware state. This allows the Linux devices to be properly set up read-only when the controller turns a previously writable namespace read-only. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-20  nvme-pci: fix error unwind in nvme_map_data  (Christoph Hellwig; 1 file, -10/+18)
Properly unwind step by step using refactored helpers from nvme_unmap_data to avoid a potential double dma_unmap on a mapping failure. Fixes: 7fe07d14f71f ("nvme-pci: merge nvme_free_iod into nvme_unmap_data") Reported-by: Marc Orr <marcorr@google.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Marc Orr <marcorr@google.com>
2021-01-20  nvme-pci: refactor nvme_unmap_data  (Christoph Hellwig; 1 file, -28/+49)
Split out three helpers from nvme_unmap_data that will allow finer grained unwinding from nvme_map_data. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Marc Orr <marcorr@google.com>
2021-01-18  nvme-pci: allow use of cmb on v1.4 controllers  (Klaus Jensen; 1 file, -0/+14)
Since NVMe v1.4 the Controller Memory Buffer must be explicitly enabled by the host. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> [hch: avoid a local variable and add a comment] Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-01-18  nvme-tcp: avoid request double completion for concurrent nvme_tcp_timeout  (Chao Leng; 1 file, -4/+10)
Each namespace has a request queue. If request completion takes a long time, multiple request queues may have timed-out requests at the same time, and nvme_tcp_timeout will execute concurrently. Requests in different request queues may be queued on the same tcp queue, so multiple nvme_tcp_timeout calls may invoke nvme_tcp_stop_queue at the same time. The first nvme_tcp_stop_queue will clear NVME_TCP_Q_LIVE and continue stopping the tcp queue (cancelling io_work), but the others see that NVME_TCP_Q_LIVE is already cleared and then directly complete the requests; completing requests before the io work is completely cancelled may lead to a use-after-free condition. Add a mutex to serialize nvme_tcp_stop_queue. Signed-off-by: Chao Leng <lengchao@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
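A sketch of the serialization described (the mutex field name is an assumption; the flag and helper names follow the existing nvme-tcp code):

    static void nvme_tcp_stop_queue(struct nvme_ctrl *nctrl, int qid)
    {
        struct nvme_tcp_queue *queue = &to_tcp_ctrl(nctrl)->queues[qid];

        /* Serialize concurrent stop attempts from multiple timeout handlers. */
        mutex_lock(&queue->queue_lock);
        if (test_and_clear_bit(NVME_TCP_Q_LIVE, &queue->flags))
            __nvme_tcp_stop_queue(queue);
        mutex_unlock(&queue->queue_lock);
    }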
2021-01-18  nvme-rdma: avoid request double completion for concurrent nvme_rdma_timeout  (Chao Leng; 1 file, -4/+11)
A crash happens when injecting long request completion times (nearly 30s). Each namespace has a request queue; when request completion takes that long, multiple request queues may have timed-out requests at the same time, and nvme_rdma_timeout will execute concurrently. Requests in different request queues may be queued on the same rdma queue, so multiple nvme_rdma_timeout calls may invoke nvme_rdma_stop_queue at the same time. The first nvme_rdma_timeout will clear NVME_RDMA_Q_LIVE and continue stopping the rdma queue (draining the qp), but the others see that NVME_RDMA_Q_LIVE is already cleared and then directly complete the requests; completing requests before the qp is fully drained may lead to a use-after-free condition. Add a mutex to serialize nvme_rdma_stop_queue. Signed-off-by: Chao Leng <lengchao@huawei.com> Tested-by: Israel Rukshin <israelr@nvidia.com> Reviewed-by: Israel Rukshin <israelr@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-01-18  nvme: check the PRINFO bit before deciding the host buffer length  (Revanth Rajashekar; 1 file, -2/+15)
According to NVMe spec v1.4, section 8.3.1, the PRINFO bit and the metadata size play a vital role in determining the host buffer size. If the PRINFO bit is set and MS==8, the host doesn't add the metadata buffer; instead the controller adds it. Signed-off-by: Revanth Rajashekar <revanth.rajashekar@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
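A hedged sketch of the resulting buffer-length decision in the ioctl path (field and constant names follow the driver, but treat this as illustrative rather than the exact diff):

    length = (io.nblocks + 1) << ns->lba_shift;
    if ((io.control & NVME_RW_PRINFO_PRACT) &&
        ns->ms == sizeof(struct t10_pi_tuple)) {
        /* The controller inserts/strips the protection information itself,
         * so the host buffer carries no separate metadata. */
        meta_len = 0;
    } else {
        meta_len = (io.nblocks + 1) * ns->ms;
        metadata = nvme_to_user_ptr(io.metadata);
    }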
2021-01-14  nvme: don't initialize hwmon for discovery controllers  (Sagi Grimberg; 1 file, -3/+8)
Discovery controllers usually don't support the smart log page command, so when we connect to a discovery controller we see this warning:
nvme nvme0: Failed to read smart log (error 24577)
nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 192.168.123.1:8009
nvme nvme0: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
Introduce a new helper to determine whether the controller is a discovery controller and use it to skip nvme_init_hwmon (also use it in the other places where we check if the controller is a discovery controller). Fixes: 400b6a7b13a3 ("nvme: Add hardware monitoring support") Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
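The new helper plausibly looks like this, together with the hwmon call site it gates (a sketch based on the description above, not necessarily the exact patch):

    static inline bool nvme_discovery_ctrl(struct nvme_ctrl *ctrl)
    {
        /* PCIe controllers have no fabrics options, hence the NULL check. */
        return ctrl->opts && ctrl->opts->discovery_nqn;
    }

    /* ... and in the controller initialization path: */
    if (!nvme_discovery_ctrl(ctrl))
        nvme_hwmon_init(ctrl);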